Andrew C. Comrie
Dept. of Geography and Regional Development
University of Arizona, Tucson, AZ 85721, U.S.A.
Abstract: The dangers of assuming apparent trends to be real are highlighted via examination of a 40-year random time series of simulated temperature. The overall trend was examined using linear regression, and shorter-term trends were identified using a 9-year moving average. The likelihood of the observed trend having occurred by chance was evaluated by comparison to trends in 100 random series. The example series displayed annual and decadal variability, as well as a clear upward overall trend (slope = 0.0089; R2 = 0.1616) with a 2 percent chance of occurrence. The findings underscore the care that should be taken when evaluating trends in data for which the controlling processes are not fully understood.
Introduction
Many climate studies have examined trends in quantities such as temperature, precipitation, and carbon dioxide (CO2) based on time series of data collected over the last 50 to 100 years (e.g., Cayan et al., 1998; Peterson and Vose, 1997; Keeling and Whorf, 1998). These studies frequently include time-series plots showing, for example, increases since the middle of the twentieth century. In some cases, these figures include trend lines or smoothed curves to highlight the nature of a particular trend.
The statistical strength or weakness of any such trend is usually detailed in the paper. However, it is not uncommon for a graph of an especially newsworthy trend to be reproduced in the media. Figure 1 shows two examples of such widely reproduced graphs: the annual Mauna Loa CO2 curve and the annual mean minimum temperature for Tucson, Arizona. While trends published in scientific articles have undergone review for scientific and statistical robustness, it is easy for the untrained eye to see apparent trends that may not be real in other, similar, relatively short time series.

Figure 1. Two examples of climatic time series and trend lines for (a) Mauna Loa CO2 data from Keeling and Whorf (1998) and (b) mean minimum temperature data at Tucson International Airport from Peterson and Vose (1997).
The aim of this paper is to examine the apparent trend in a simulated annual climatic time series using random numbers. Any trends present in the data will have occurred by chance, and will highlight the level of caution required for interpretation.
To illustrate the likelihood of the observed trend having occurred by chance, 100 versions of the random series were generated for comparison. Slope coefficients were calculated for each series and tabulated by frequency of occurrence.
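The comparison described above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure actually used in the study: it assumes each annual value is drawn independently and uniformly between 0 and 1, and the function names, the seed, and the choice of bin edges spanning the observed slopes are all hypothetical.

```python
import random

def ols_slope(values):
    """Least-squares slope of values regressed on year numbers 1..n."""
    n = len(values)
    years = range(1, n + 1)
    mean_x = (n + 1) / 2
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx

def simulate_slopes(n_series=100, n_years=40, seed=0):
    """Slopes of n_series independent uniform(0, 1) series (seed is arbitrary)."""
    rng = random.Random(seed)
    return [ols_slope([rng.random() for _ in range(n_years)])
            for _ in range(n_series)]

def tabulate(slopes, n_bins=10):
    """Percentage of slopes in n_bins equal-width categories spanning the data."""
    lo, hi = min(slopes), max(slopes)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for s in slopes:
        # The largest slope would index one past the end, so clamp it.
        idx = min(int((s - lo) / width), n_bins - 1)
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, 100 * c / len(slopes))
            for i, c in enumerate(counts)]

slopes = simulate_slopes()
table = tabulate(slopes)
for low, high, pct in table:
    print(f"{low:+.4f} to {high:+.4f}: {pct:5.1f}%")
```

Different seeds give different tables, so the percentages from a run of this sketch should not be expected to match Table 1 exactly.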
Results
Figure 2 illustrates the simulated time series and results. Visual inspection shows a clear upward trend in the 40-year series, although there is noticeable annual and decadal variability within the overall trend. The regression line has a calculated slope of 0.0089 and an intercept of 0.401, and it explains about 16 percent of the variance in the data (R2 = 0.1616). The moving average highlights two apparent cycles of rising and falling simulated annual temperatures, with a decrease in the middle of the series that lasts for more than a decade. While the overall spread of data covers the range between 0 and 1, the higher values tend to fall (randomly) in the middle and later parts of this particular series, thereby leading to an apparent upward trend.

Figure 2. Random time series of 40 simulated annual temperatures, showing the raw annual values, the smoothed series using a 9-year moving average to highlight decadal variability, and the best-fit regression line highlighting the apparent long-term trend.
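The regression and smoothing summarized in Figure 2 can be recomputed from the raw values listed in the Appendix. The sketch below is one plausible way to do so, assuming an ordinary least-squares fit against year number and a centered 9-year window; the function names are hypothetical. Under those assumptions it gives the slope, intercept, and R2 reported above to the quoted precision, and its first smoothed value matches the Appendix entry for year 5.

```python
# Raw annual values copied from the Appendix (years 1 to 40).
raw = [
    0.497138259, 0.167629143, 0.594334213, 0.25999216, 0.571433527,
    0.24064353, 0.598618197, 0.181487353, 0.562974255, 0.42534891,
    0.365361371, 0.706562879, 0.63012883, 0.364567641, 0.871413628,
    0.782675907, 0.974806165, 0.690842427, 0.883563743, 0.711808598,
    0.399093828, 0.356500129, 0.050416981, 0.62425813, 0.372101917,
    0.546396576, 0.386552802, 0.653515449, 0.644785886, 0.324349197,
    0.993291215, 0.844189183, 0.548990211, 0.896582595, 0.985407317,
    0.938577953, 0.622515757, 0.840994928, 0.231405118, 0.990736186,
]

def linear_fit(values):
    """Least-squares slope, intercept, and R2 against year numbers 1..n."""
    n = len(values)
    years = list(range(1, n + 1))
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    syy = sum((y - mean_y) ** 2 for y in values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2

def centered_moving_average(values, window=9):
    """Mean of each full window, assigned to the window's middle year."""
    half = window // 2
    return [sum(values[i - half:i + half + 1]) / window
            for i in range(half, len(values) - half)]

slope, intercept, r2 = linear_fit(raw)
smoothed = centered_moving_average(raw)  # 32 values, for years 5 to 36

print(f"slope = {slope:.4f}, intercept = {intercept:.3f}, R2 = {r2:.4f}")
print(f"first smoothed value (year 5) = {smoothed[0]:.9f}")
```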
If these data were actual temperatures, this would be the point to consider explanations for the observed trends. However, these data are randomly generated, so any trends in them are known to have occurred by chance. To examine the likelihood of the strong apparent trend in Figure 2, the frequency distribution of slope values representing the long-term trends from 100 simulated data series is provided in Table 1. It can be seen that the chance of the 0.0089 slope in Figure 2 is about 2 percent, or 1 in 50 occurrences.
Table 1: Percentage frequency of slope coefficients in 10 equal-sized categories from 100 series of 40 years each.
Discussion and Conclusion
The results show that a remarkably strong apparent trend occurs in this example of a 40-year random time series of simulated temperature. There are also strong apparent shorter-term trends visible in the data. While these trends are real in the sense that they exist for these specific data, the more refined question is to what degree the observed long-term trend might have occurred by chance. The simulation of 100 time series provides an answer to this question, and it mimics what would normally be calculated with some basic statistics.
For these 100 trials, 67 percent of the slopes fell between -0.004 and 0.004 (a range corresponding to about one standard deviation either side of zero). The frequency distribution is approximately normal (bell shaped), with few slope values near its upper and lower tails, and the trend in this example is relatively unusual: slopes of this magnitude occur with a frequency of only about 2 percent. This may seem small, but to put it in perspective, if individual random time series were assigned to each member of a class of 25 students, there would be roughly a 40 percent chance (1 - 0.98^25 ≈ 0.40) of at least one student having a series displaying a trend as strong as this example.
Notice also that each simulated temperature in the series (or the temperature for the next year, year 41) is equally likely to fall anywhere between 0 and 1. Yet the slope values calculated from the time series are approximately normally distributed, and they have a much greater chance of being near zero.
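The spread of slopes can also be estimated without simulation. The sketch below is a back-of-envelope check, not part of the original analysis: it assumes independent uniform values between 0 and 1 (variance 1/12) and treats the slope as normally distributed. Under those assumptions the standard deviation of the slope comes out close to 0.004, in line with the range quoted above, and a slope of 0.0089 lies a little over two standard deviations from zero, giving a tail probability of roughly 1 percent for one tail and 2 percent for both.

```python
import math

n_years = 40

# Standard deviation of a single uniform(0, 1) value (assumed noise model).
sigma_y = math.sqrt(1 / 12)

# Sum of squared deviations of the year numbers 1..40 from their mean.
mean_x = (n_years + 1) / 2
sxx = sum((x - mean_x) ** 2 for x in range(1, n_years + 1))

# Standard deviation of the least-squares slope for pure noise.
slope_sd = sigma_y / math.sqrt(sxx)

# How unusual is the example slope under a normal approximation?
observed_slope = 0.0089
z = observed_slope / slope_sd
upper_tail = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z), one tail

print(f"slope SD = {slope_sd:.5f}")
print(f"z = {z:.2f}, one-tail = {upper_tail:.4f}, two-tail = {2 * upper_tail:.4f}")
```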
In conclusion, this paper has examined trends in a simulated annual climatic time series using random numbers. The study identified a clear long-term trend in an example series that is known to have occurred by chance, and it highlights the caution that should be used when interpreting trends in situations where the underlying processes are not fully understood.
References
Cayan, D.R., M.D. Dettinger, H.F. Diaz, and N.E. Graham, 1998: Decadal variability of precipitation over western North America. Journal of Climate, 11, 3148-3166.
Keeling, C.D. and T.P. Whorf, 1998: Atmospheric CO2 concentrations -- Mauna Loa Observatory, Hawaii, 1958-1997. Technical Report NDP-001, Carbon Dioxide Information Analysis Center, Oak Ridge National Laboratory, Oak Ridge, Tennessee.
Peterson, T.C. and R.S. Vose, 1997: An overview of the Global Historical Climatology Network temperature data base. Bulletin of the American Meteorological Society, 78, 2837-2849.
Appendix
Raw data for the example 40-year time series used in the study and illustrated in Figure 2.
Year Raw_Data 9-yr_Average
1 0.497138259
2 0.167629143
3 0.594334213
4 0.25999216
5 0.571433527 0.408250071
6 0.24064353 0.400273476
7 0.598618197 0.422243724
8 0.181487353 0.434713576
9 0.562974255 0.475839872
10 0.42534891 0.452854774
11 0.365361371 0.52294034
12 0.706562879 0.543391197
13 0.63012883 0.631537732
14 0.364567641 0.645745307
15 0.871413628 0.696658066
16 0.782675907 0.735152202
17 0.974806165 0.700988974
18 0.690842427 0.670585785
19 0.883563743 0.635680156
20 0.711808598 0.608218434
21 0.399093828 0.562599102
22 0.356500129 0.514998037
23 0.050416981 0.481188078
24 0.62425813 0.455627157
25 0.372101917 0.448180189
26 0.546396576 0.43987523
27 0.386552802 0.510629795
28 0.653515449 0.598826706
29 0.644785886 0.590463604
30 0.324349197 0.648739235
31 0.993291215 0.697518206
32 0.844189183 0.758854334
33 0.548990211 0.755409924
34 0.896582595 0.777210928
35 0.985407317 0.766883809
36 0.938577953 0.766599916
37 0.622515757
38 0.840994928
39 0.231405118
40 0.990736186