Trend Analysis Philosophy

Claudia Stubenrauch
Anomaly.pdf -- an explanation of how Claudia's analyses were done.

Joel Norris
Comprehensive trend analysis involves two issues. The first is the value of the trend and its uncertainty, and the second is whether a trend is a good statistical model for the data (i.e., do the data fit y = a + b * t + e[t], where the residuals e[t] come from nothing more than observational uncertainty). Meteorological trend analyses universally ignore the latter issue, presumably because a trend is almost never a good statistical model for the data (i.e., even when a trend is present, the residuals include real weather variability in addition to observational uncertainty). Moreover, in most circumstances there is no physical basis to expect that the data would exclusively follow a trend. Thus, trend analysis in meteorological research generally ends up being nothing more than a convenient means of summarizing low-frequency variability. I do not say this as a criticism because trend analysis often does provide a useful simplification of the data, but rather as a reminder that it is a simplification. Also, we must keep in mind that the trend will depend on the choice of endpoints.
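One simple way to probe whether y = a + b * t + e[t] is an adequate model is to fit the trend and look at the residuals: if e[t] were nothing more than observational uncertainty, the residuals should resemble white noise. The sketch below (my own illustration, with made-up numbers, not a method from this discussion) uses the lag-1 autocorrelation of the residuals as one rough diagnostic:

```python
import numpy as np

# Illustrative sketch: fit y = a + b*t + e[t] and check whether the
# residuals look like pure noise. Their lag-1 autocorrelation should
# be near zero if e[t] were only observational uncertainty.
rng = np.random.default_rng(0)
t = np.arange(100, dtype=float)
y = 0.1 * t + rng.normal(0.0, 1.0, t.size)   # trend plus white noise

b, a = np.polyfit(t, y, 1)                   # least-squares slope, intercept
e = y - (a + b * t)                          # residuals from the trend line
d = e - e.mean()
r1 = np.dot(d[:-1], d[1:]) / np.dot(d, d)    # lag-1 autocorrelation
print(r1)                                    # near zero for this synthetic case
```

With real weather variability in the residuals, r1 would typically be well away from zero, which is exactly the sense in which a trend is a simplification rather than a complete statistical model.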

Returning to the first issue, I have heard it asserted that a trend (measured as b * [t_end - t_beg]) is not statistically different from zero if the trend value is less than the uncertainty of an individual data value. This view is incorrect because multiple independent measurements can provide information with substantially less uncertainty than any individual measurement. According to the least squares method, the uncertainty of a trend value (sigma_trend expressed as a 95% confidence interval) is

sigma_trend = sigma_y * [t_end - t_beg] / (sqrt[N] * stddev[t])

where sigma_y is the uncertainty of an individual y value (expressed as a 95% confidence interval and assumed to be constant), N is the number of independent data points, and stddev[t] is the standard deviation of the time points. Note that N may differ from the nominal number of data points if autocorrelation is present in the time series. If the time points are uniformly spaced, the above equation simplifies to

sigma_trend = sigma_y * sqrt[12] / sqrt[N]

Thus, a trend based on more than twelve points can still be statistically different from zero even if it is less than the uncertainty of an individual point.
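A quick numerical check of the two expressions above (the values of N and sigma_y are illustrative, not from any real dataset):

```python
import numpy as np

N = 30                        # number of independent data points
t = np.arange(N, dtype=float) # uniformly spaced time points
sigma_y = 2.0                 # 95% uncertainty of an individual y value

# Full formula: sigma_trend = sigma_y * (t_end - t_beg) / (sqrt(N) * stddev(t))
sigma_trend_full = sigma_y * (t[-1] - t[0]) / (np.sqrt(N) * t.std())

# Simplified form for uniformly spaced points: sigma_y * sqrt(12) / sqrt(N)
sigma_trend_simple = sigma_y * np.sqrt(12.0) / np.sqrt(N)

print(sigma_trend_full, sigma_trend_simple)
```

The two values agree to within a few percent (the small difference comes from treating N-1 and N as equivalent), and both are well below sigma_y once N exceeds twelve, as noted above.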

Another concern is the common meteorological practice of determining trend uncertainty without explicit reference to the uncertainty of individual data points:

sigma_trend = 2 * stddev[e] * [t_end - t_beg] / (sqrt[N] * stddev[t])

where stddev[e] is the standard deviation of the residuals from the trend line and the factor of 2 converts this to a 95% confidence interval. (Note that for simplicity of presentation I have been treating N-2, N-1, and N as equivalent wherever they appear in the full equations.) Although trend uncertainty in this case is calculated only from the y values and not sigma_y, it is essential to keep in mind that this procedure includes an implicit assumption about the observational uncertainty of individual data points, namely that it is equivalent to 2 * stddev[e].

Although one might assume that taking the explicit observational uncertainty (sigma_y) into account would result in a larger value of sigma_trend, that is not necessarily the case. It instead depends on whether sigma_y is larger than 2 * stddev[e]. It may seem counterintuitive, but I expect that trend uncertainty calculated only from the spread of the residuals will generally be larger than trend uncertainty calculated from the uncertainty of individual data points. This is because the spread of the residuals arises from true natural variability in addition to observational uncertainty and consequently will usually be larger than the stated observational uncertainty.
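This comparison is easy to illustrate with synthetic data. In the sketch below the sizes of sigma_y, the natural variability, and the record length are my own assumptions, not values from any real cloud dataset:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 40
t = np.arange(N, dtype=float)
sigma_y = 1.0                                   # stated 95% uncertainty of a point
obs_noise = rng.normal(0.0, sigma_y / 2.0, N)   # 1-sigma is half the 95% interval
natural = rng.normal(0.0, 1.5, N)               # real weather variability (assumed)
y = 0.5 + 0.02 * t + natural + obs_noise        # trend + variability + noise

b, a = np.polyfit(t, y, 1)                      # least-squares fit
e = y - (a + b * t)                             # residuals from the trend line

span = t[-1] - t[0]
denom = np.sqrt(N) * t.std()
sigma_trend_obs = sigma_y * span / denom        # from stated observational uncertainty
sigma_trend_res = 2.0 * e.std() * span / denom  # from spread of the residuals

print(sigma_trend_obs, sigma_trend_res)
```

Because the residuals here contain natural variability on top of the observational noise, 2 * stddev[e] exceeds sigma_y and the residual-based trend uncertainty comes out larger, as argued above.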

Returning to the issue of whether a trend is a good statistical model for the data, the method of calculating trend uncertainty based on the spread of the residuals works by assuming that a trend is a good model and from that determines what presumed value of observational uncertainty is consistent with it. On the other hand, the method of calculating trend uncertainty based on the previously specified uncertainty of individual data points can result in a trend that is defined with great precision but is nonetheless a poor statistical model for the data.

Here are my recommendations for trend analysis:

  1. Trend analysis can be a useful way to summarize differences in low-frequency variability in different cloud datasets, but it is essential to use the same beginning and end points in any intercomparisons.
  2. Trend uncertainty should be calculated using both methods (from the specified uncertainty of individual data points and from the standard deviation of the residuals), and the larger value should be chosen.
  3. Trend uncertainty calculations should take into account that the number of independent data points in a time series may be fewer than the nominal number of points.
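On the third recommendation, one common adjustment (my suggestion for how this might be done, not something prescribed above) is to replace N with an effective sample size based on the lag-1 autocorrelation r1, namely N_eff = N * (1 - r1) / (1 + r1):

```python
import numpy as np

def effective_sample_size(y):
    """Effective number of independent points, using the common lag-1
    autocorrelation adjustment N_eff = N * (1 - r1) / (1 + r1).
    This is one standard choice among several possible ones."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    d = y - y.mean()
    r1 = np.dot(d[:-1], d[1:]) / np.dot(d, d)   # lag-1 autocorrelation
    r1 = max(0.0, r1)                           # ignore negative autocorrelation
    return n * (1.0 - r1) / (1.0 + r1)

# A strongly autocorrelated series has far fewer independent points
rng = np.random.default_rng(1)
white = rng.normal(size=200)
red = np.empty(200)
red[0] = white[0]
for i in range(1, 200):
    red[i] = 0.8 * red[i - 1] + white[i]        # AR(1) series, r1 near 0.8

print(effective_sample_size(white), effective_sample_size(red))
```

For the AR(1) series the effective sample size is only a small fraction of the nominal 200 points, which is why using the nominal N in the sigma_trend formulas above would understate the trend uncertainty.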