Trend Analysis Philosophy

Claudia Stubenrauch, Joel Norris

Returning to the first issue, I have heard it asserted that a trend (measured as b * [t_end - t_beg]) is not statistically different from zero if the trend value is less than the uncertainty of an individual data value. This view is incorrect because multiple independent measurements can provide information with substantially less uncertainty than any individual measurement. According to the least squares method, the uncertainty of a trend value (sigma_trend expressed as a 95% confidence interval) is

sigma_trend = sigma_y * [t_end - t_beg] / (sqrt[N] * stddev[t])

where sigma_y is the uncertainty of an individual y value (expressed as a 95% confidence interval and assumed to be constant), N is the number of independent data points, and stddev[t] is the standard deviation of the time points. Note that N may differ from the nominal number of data points if autocorrelation is present in the time series. If the time points are uniformly distributed, stddev[t] is approximately [t_end - t_beg] / sqrt[12], and the above equation simplifies to

sigma_trend = sigma_y * sqrt[12] / sqrt[N]

Thus, a trend based on more than twelve points can still be statistically different from zero even if it is less than the uncertainty of an individual point.

Another concern is the common meteorological practice of determining trend uncertainty without explicit reference to the uncertainty of individual data points. In that approach, the trend uncertainty is computed as

sigma_trend = 2 * stddev[e] * [t_end - t_beg] / (sqrt[N] * stddev[t])

where stddev[e] is the standard deviation of the residuals from the trend line and the factor of 2 converts this to a 95% confidence interval. (Note that for simplicity of presentation I have been treating N-2, N-1, and N as equivalent wherever they appear in the full equations.)
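The formulas above can be sketched in Python as follows. This is only a minimal illustration, not the procedure used for any particular dataset: the function names and the n_independent parameter are hypothetical, NumPy is assumed to be available, and the population standard deviation is used throughout (in keeping with treating N-1 and N as equivalent).

```python
import numpy as np


def trend_uncertainty(sigma_y, t, n_independent=None):
    """95% uncertainty of the total trend b * (t_end - t_beg).

    sigma_y is the 95% uncertainty of an individual y value (assumed constant).
    n_independent (hypothetical parameter) lets the caller supply a reduced N
    when autocorrelation is present; by default the nominal count is used.
    """
    t = np.asarray(t, dtype=float)
    n = t.size if n_independent is None else n_independent
    span = t.max() - t.min()
    # sigma_trend = sigma_y * [t_end - t_beg] / (sqrt[N] * stddev[t])
    return sigma_y * span / (np.sqrt(n) * np.std(t))


def trend_uncertainty_from_residuals(t, y):
    """Same formula, but with 2 * stddev[e] standing in for sigma_y."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(t, y, 1)  # least squares trend line
    residuals = y - (slope * t + intercept)
    return trend_uncertainty(2.0 * np.std(residuals), t)


# Illustrative values only: 30 evenly spaced time points, sigma_y = 1.
t = np.arange(30)
sigma_y = 1.0
full = trend_uncertainty(sigma_y, t)
simplified = sigma_y * np.sqrt(12.0) / np.sqrt(t.size)
print("full formula:      ", full)
print("uniform shortcut:  ", simplified)
```

With more than twelve evenly spaced points, both numbers come out smaller than sigma_y, which is the point made above. The two values differ slightly because the sqrt[12] shortcut is exact only in the limit of many points.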
Although trend uncertainty in this case is calculated only from the y values and not sigma_y, it is essential to keep in mind that this procedure includes an implicit assumption about the observational uncertainty of individual data points, namely that it is equivalent to 2 * stddev[e].

Although one might assume that taking the explicit observational uncertainty (sigma_y) into account would result in a larger value of sigma_trend, that is not necessarily the case. It instead depends on whether sigma_y is larger than 2 * stddev[e]. Although it may seem counterintuitive, I expect that trend uncertainty calculated only from the spread of the residuals will generally be larger than the trend uncertainty calculated from the uncertainty of individual data points. This is because the spread of the residuals arises from true natural variability in addition to observational uncertainty and consequently will usually be larger than the stated observational uncertainty.

Returning to the issue of whether a trend is a good statistical model for the data, the method of calculating trend uncertainty based on the spread of the residuals works by assuming that a trend is a good model and from that determines what presumed value of observational uncertainty is consistent with it. On the other hand, the method of calculating trend uncertainty based on the previously specified uncertainty of individual data points can result in a trend that is defined with great precision but is nonetheless a poor statistical model for the data.

Here are my recommendations for trend analysis: