This post examines the issue of scientific inference when using imprecise data, using two unrelated examples. The first is about whether cutting toes off frogs reduces their return rate, and the second examines the correlation between contemporary warming and CO_{2}.

A few years ago, I wrote a paper with Kirsten Parris about how toe clipping, a method for marking amphibians, reduces the chances of frogs being recaptured in subsequent surveys. Up to that point, various studies had claimed there were inconsistent effects.

However, our paper demonstrated that the apparent inconsistency could be simply explained by the studies having different sample sizes. Studies with large numbers of frogs had sufficiently precise estimates of return rate that they consistently showed negative effects. Studies with smaller numbers of frogs had low precision – sometimes they demonstrated clear negative effects and sometimes not. The latter results had been misinterpreted as showing that there was no effect when the results were actually also consistent with large negative effects.

The misinterpretation arose because null hypothesis significance testing, a standard statistical approach to data analysis, had been interpreted poorly in the previous studies. Null hypothesis significance testing cannot confirm a null hypothesis, only reject it. But the previous papers had incorrectly interpreted a failure to reject the null hypothesis (in this case of no effect) as evidence that the null hypothesis was true.
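The sample-size point can be illustrated with a minimal sketch. The counts below are invented for illustration and are not taken from any of the toe-clipping studies, and the simple Wald interval for a difference in proportions is an assumption chosen for brevity rather than the method used in our papers. Both hypothetical studies have the same observed difference in return rate; only the number of frogs differs.

```python
import math


def diff_in_return_rate_ci(returned_clipped, n_clipped,
                           returned_control, n_control, z=1.96):
    """Approximate 95% Wald interval for (clipped - control) return rate."""
    p_clipped = returned_clipped / n_clipped
    p_control = returned_control / n_control
    diff = p_clipped - p_control
    # Standard error of a difference between two independent proportions.
    se = math.sqrt(p_clipped * (1 - p_clipped) / n_clipped
                   + p_control * (1 - p_control) / n_control)
    return diff - z * se, diff, diff + z * se


# HYPOTHETICAL counts: return rates of 20% (clipped) and 30% (control).
small = diff_in_return_rate_ci(10, 50, 15, 50)          # 50 frogs per group
large = diff_in_return_rate_ci(200, 1000, 300, 1000)    # 1000 frogs per group

for label, (low, diff, high) in (("small", small), ("large", large)):
    print(f"{label} study: difference {diff:+.3f}, 95% interval {low:+.3f} to {high:+.3f}")
```

The small study's interval includes zero, so a significance test would fail to reject the null hypothesis, yet the same interval also includes negative effects larger than anything the large study allows. Failing to reject is not the same as showing no effect.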

We followed up our original paper with a second analysis that focused on measuring the size of the effect of toe clipping rather than using null hypothesis significance testing. It showed that the effect of toe clipping might actually be very similar in all studies (Fig. 1).

##### Fig. 1. Predicted change in return rate of frogs for each toe removed, allowing for linear changes in the effect of toe clipping with each toe removed for four different datasets. Circles are the means of the fitted values and the crosses represent the limits of the 95% credibility intervals. Negative values represent an adverse effect of toe clipping (from McCarthy and Parris 2004). The substantial overlap of the 95% intervals suggests that the responses to toe clipping could be very similar among studies.

Despite this, I continue to review and read papers that commit the same fallacy of interpreting failure to reject a null hypothesis as evidence that the null hypothesis is true, even in papers that cite our work on the topic. I don’t know how to correct it, but it seems to be a fallacy that is embedded in several fields of science. It is a basic issue about how variation in data should be interpreted.

Errors of logic when interpreting variation in data also seem to occur in the climate change debate. This topic has more global relevance than the question of whether frogs are harmed when their toes are cut off, so I will switch to a seemingly more controversial track.

In mid-2009, Penny Wong, then the Australian minister for climate change, and Steve Fielding, a senator in the Australian federal parliament, were debating whether global temperatures were increasing and the role of CO_{2}. Steve Fielding asked why, if CO_{2} drives temperature, temperatures had not risen in step with atmospheric CO_{2} concentrations, which had been increasing over the previous ten years. He argued that the world had stopped warming after 1998.

Below are some of the basic data relevant to the debate for the period up to 2008, showing the annual atmospheric CO_{2} concentrations measured at Mauna Loa, and the HadCRUT3 temperature anomalies for the same period.

##### Fig. 2. Annual average atmospheric CO_{2} concentration as measured at Mauna Loa.

##### Fig. 3. Annual average global temperature anomaly (HadCRUT3) over the same period that CO_{2} has been measured at Mauna Loa.

The temperature record is clearly imprecise – the temperature fluctuates due to measurement error, but also because factors other than CO_{2} drive changes in temperature. Is there sufficient evidence of a decoupling between the CO_{2} concentration and temperature as claimed by Steve Fielding and others? Given the imprecision, what data would indicate that the world had stopped warming despite increases in CO_{2}? To analyse that, I have constructed a simple statistical model where the annual global average temperature is linearly related to the CO_{2} concentration.

If we take the data above, we get a relationship that can be modelled using standard linear regression (Fig. 4). In this first case, I have only used the data up to 1998 to fit the model (black points), the point at which Steve Fielding claimed the world appeared to have stopped warming. The solid line is the line of best fit, and the dashed lines show the 95% prediction interval. This interval is the region within which we would expect 95% of the data points to fall.

##### Fig. 4. Annual average global temperature anomaly (HadCRUT3) versus that year’s CO_{2} concentration. The linear regression for data up to 1998 is displayed, with the dashed lines being the 95% prediction interval. The blue crosses are the data for subsequent years.
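For readers who want to see the mechanics, here is a minimal sketch of a least-squares fit with a 95% prediction interval. It runs on SYNTHETIC data generated inside the script, not on the Mauna Loa or HadCRUT3 records, and the made-up slope, intercept and noise level are assumptions chosen only to give numbers of a plausible size. To stay dependency-free it uses the normal quantile (about 1.96) in place of the Student-t quantile, which makes the interval slightly too narrow for a sample of about 40 years.

```python
import math
import random
from statistics import NormalDist


def fit_line(xs, ys):
    """Ordinary least squares; returns what a prediction interval needs."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
    intercept = y_mean - slope * x_mean
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    resid_sd = math.sqrt(sse / (n - 2))  # residual standard deviation
    return {"n": n, "x_mean": x_mean, "sxx": sxx,
            "slope": slope, "intercept": intercept, "resid_sd": resid_sd}


def prediction_interval(fit, x_new, level=0.95):
    """Return (lower, fitted, upper) for one new observation at x_new."""
    # Normal approximation to the t quantile (an assumption, see lead-in).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    fitted = fit["intercept"] + fit["slope"] * x_new
    half_width = z * fit["resid_sd"] * math.sqrt(
        1 + 1 / fit["n"] + (x_new - fit["x_mean"]) ** 2 / fit["sxx"])
    return fitted - half_width, fitted, fitted + half_width


# SYNTHETIC stand-in for 40 years of (CO2 in ppm, anomaly in deg C).
rng = random.Random(1)
co2 = [316 + 1.3 * i for i in range(40)]
temp = [-3.0 + 0.009 * c + rng.gauss(0, 0.1) for c in co2]

fit = fit_line(co2, temp)
inside = prediction_interval(fit, fit["x_mean"])   # middle of the data
outside = prediction_interval(fit, 390.0)          # extrapolation
print("interval width at mean CO2:", inside[2] - inside[0])
print("interval width at 390 ppm: ", outside[2] - outside[0])
```

The term under the square root is dominated by the leading 1, which represents year-to-year variation; the other two terms capture uncertainty in the fitted line and grow only gradually as the new CO_{2} value moves away from the mean of the data.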

Note that as we extrapolate to higher CO_{2} concentrations, the predictions are more uncertain, but only slightly so. The data suggest a relatively clear pattern such that the linear trend is estimated quite reliably; the main uncertainty (beyond the assumption of linearity) is the annual variation. If the future data consistently fall outside the 95% interval, then we can be confident that the apparent linear relationship between CO_{2} and temperature up to 1998 has broken down.

So, what happened for the data between 1999 and 2011? These points are shown in blue (Fig. 4). They show that the temperatures in these years are consistent with the previous pattern; global temperatures have remained within the bounds predicted using data for the period 1959 to 1998. We can’t be sure the world has stopped warming.

If we build a model with the most recent data, we then have a model that can be falsified. If future data fall outside the 95% region, we should start suspecting that the apparent linear relationship between CO_{2} and temperature has broken down. We would occasionally expect data for one year to fall slightly outside the region, but two consecutive years would be rather compelling.

So we are close to a potentially interesting point in the temperature record. If the average global temperature anomaly in 2012 is less than 0.4°C (and assuming CO_{2} concentrations increase by 2 ppm from 2011), then the data will be suggestive. Consecutive years of low temperature anomalies (e.g., 0.4°C in 2012 and 0.35°C in 2013) would place the data squarely outside the range predicted and indicate that the apparent linear relationship does not hold.

##### Fig. 5. Annual average global temperature anomaly (HadCRUT3) versus that year’s CO_{2} concentration. The linear regression for the full dataset (1959-2011) is displayed, with the dashed lines being the 95% prediction interval.

Based on this analysis, there is not compelling evidence that the world has stopped warming since 1998. That is, it is not possible to reject the null hypothesis that the world continues to warm. But of course, neither is it possible to say that the world has continued to warm. We are dealing with imprecise data where the answers have shades of grey. But the observed temperatures from 1999 onwards (blue crosses in Fig. 4) do look remarkably close to the extrapolation from previous years.

However, let’s now consider the possibility that the temperature has stopped increasing. Steve Fielding suggested that this had occurred from 1998, so we will use that as the hypothesis in a second analysis. For this model we assume that the average temperature does not change from 1998 onwards, but it changes linearly with CO_{2} concentration prior to that point. Fig. 6 shows the fit of that model (purplish-grey), superimposed on the fit that assumes a continuing linear increase (from Fig. 5; black lines).

##### Fig. 6. Annual average global temperature anomaly (HadCRUT3) versus that year’s CO_{2} concentration. The linear regression for data up to 2011 is displayed, with the dashed lines being the 95% prediction interval. The model in which the temperature stopped increasing is also shown (purplish-grey lines).
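A minimal sketch of how a "stopped warming" model of this kind could be fitted, under one assumed reading of the description: the expected temperature rises linearly with CO_{2} up to the concentration at the break year and stays flat at that level afterwards. Fitting it then reduces to ordinary least squares on a capped predictor. This is not necessarily how the fit in Fig. 6 was produced, and the data here are noiseless and SYNTHETIC, generated only to check that the function recovers the values it was given.

```python
def fit_line(xs, ys):
    """Ordinary least squares; returns (intercept, slope)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
    return y_mean - slope * x_mean, slope


def fit_plateau_model(co2, temp, co2_break):
    """Linear in CO2 up to co2_break, constant beyond it (assumed form)."""
    # Capping the predictor makes every year after the break share the
    # same expected temperature as the break year itself.
    capped = [min(c, co2_break) for c in co2]
    intercept, slope = fit_line(capped, temp)
    plateau = intercept + slope * co2_break
    return intercept, slope, plateau


# SYNTHETIC, noiseless check data with a hypothetical break at 365 ppm.
co2 = [320 + 5 * i for i in range(13)]            # 320, 325, ..., 380
temp = [-3.0 + 0.01 * min(c, 365) for c in co2]

intercept, slope, plateau = fit_plateau_model(co2, temp, co2_break=365)
print("slope before break:", slope)
print("plateau after break:", plateau)
```

With real, noisy data the same residual-based prediction interval used for the straight-line model could be placed around the plateau.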

OK, this graph is a little busy, but let’s see what it says. Firstly, the data are consistent with a world that has stopped warming from 1998; the data fall within the 95% prediction intervals. So both models, one in which the world is warming and one in which it stopped warming in 1998, are consistent with the data. That is a little disappointing. It would be nice if the data could reliably distinguish between the hypotheses. But that is life with imprecise data – data are sometimes not compelling. But let’s think – what data in the future would help us distinguish between the models?

If we assume that the world stopped warming after 1998, then we expect 95% of the future temperature anomalies to lie in a region between approximately 0.2 and 0.85. If the world continued to warm in a manner that is linearly related to CO_{2}, then we would expect the temperature anomaly to be between 0.46 and 1.07 in 2012, and slightly higher in 2013. These regions overlap, but there are areas where they do not. Data in these non-overlapping regions (anomalies <0.46 or >0.85) would be reasonably strong evidence to distinguish between the two hypotheses. However, anomalies in the region 0.46 to 0.85 would continue to provide equivocal support for both (although the expected region for anomalies continues to increase under the linearly increasing model). I’ll plot anomalies on this blog in coming years, and see what the evidence says.
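The decision rule in the paragraph above can be written down directly. This sketch uses the approximate interval limits quoted there for 2012 (0.2 to 0.85 and 0.46 to 1.07); the function name and the labels it returns are hypothetical, and treating the limits as inclusive is an assumption.

```python
# Approximate 95% prediction intervals quoted in the text (deg C anomaly).
STOPPED = (0.20, 0.85)          # warming stopped after 1998
CONTINUED_2012 = (0.46, 1.07)   # warming continues linearly with CO2, 2012


def classify_anomaly(anomaly, stopped=STOPPED, continued=CONTINUED_2012):
    """Say which model(s) an observed anomaly is consistent with."""
    in_stopped = stopped[0] <= anomaly <= stopped[1]
    in_continued = continued[0] <= anomaly <= continued[1]
    if in_stopped and in_continued:
        return "equivocal"
    if in_stopped:
        return "favours stopped warming"
    if in_continued:
        return "favours continued warming"
    return "outside both intervals"


for value in (0.40, 0.60, 0.90):
    print(value, "->", classify_anomaly(value))
```

For later years the second interval would need to be recomputed, because the expected anomaly under the linear model keeps rising with CO_{2}.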

A major caveat of this is that my two models are statistical representations of hypotheses. The model with a linear relationship between temperature and CO_{2} is elementary, and it is a caricature of the position of Penny Wong. It would be good if we used a proper climate model, or several models. Similarly, the model in which warming stopped in 1998 is a caricature of the position of Steve Fielding. Nevertheless, the models seem to approximate the different points of view.

A second caveat is that this analysis ignores all other evidence that might exist about global warming. The surface temperature record is only a small component of the uptake of heat by the planet. Melting ice caps and thermal expansion of water are also evidence to be considered. But I am going to ignore those here.

So, what does this tell us about how to analyse data? It emphasises that failing to reject the null hypothesis does not lend support to the null hypothesis. Null hypothesis significance testing is not designed to do that. The data do not allow us to reject the hypothesis that the world has continued to warm (Fig. 4). The data do not allow us to reject the hypothesis that the world stopped warming in 1998 (Fig. 6). Neither analysis allows us to reject the null hypotheses – the data are equivocal.

We would be wrong to conclude that a good fit of the data to a model demonstrates the validity of the model. All models are wrong – they are meant to be wrong because they are supposed to exclude less important aspects of reality to help focus on the aspects that matter. It is valuable to test the fit of a model, because a poor fit is instructive. But in the presence of a good fit, the interesting question is whether one model works better than another. This is something I forgot in a paper that I recently submitted on evaluating some biodiversity indices. Luckily a reviewer was sharp enough to point it out to me.

Of course, the above is a rather roundabout way of assessing the relative support for the two hypotheses. At what point would we conclude that one hypothesis has more support than another? To do that, we could turn to Bayes’ rule, to determine the probability of the hypothesis being true given the data. But that will need to wait until some other time. But speaking of Bayes’ rule, I have a strong prior belief that toe clipping of frogs has not caused global warming.
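As a rough indication of what a Bayes' rule calculation might look like, here is a sketch under several loudly stated assumptions: each model's prediction for 2012 is treated as a normal distribution whose central 95% matches the approximate intervals quoted earlier (0.2 to 0.85 and 0.46 to 1.07), the two hypotheses are given equal prior probability, and no other hypotheses or evidence are considered. It is an illustration of the arithmetic, not a proper analysis.

```python
from statistics import NormalDist


def normal_from_interval(low, high, z=1.96):
    """Normal distribution whose central 95% spans (low, high)."""
    return NormalDist(mu=(low + high) / 2, sigma=(high - low) / (2 * z))


# ASSUMED predictive distributions built from the quoted 95% intervals.
stopped = normal_from_interval(0.20, 0.85)
continued = normal_from_interval(0.46, 1.07)


def posterior_stopped(anomaly, prior_stopped=0.5):
    """Posterior probability of 'stopped warming' after one observation."""
    like_stopped = stopped.pdf(anomaly) * prior_stopped
    like_continued = continued.pdf(anomaly) * (1 - prior_stopped)
    return like_stopped / (like_stopped + like_continued)


for value in (0.30, 0.65, 1.00):
    print(f"anomaly {value:.2f}: P(stopped | data) = {posterior_stopped(value):.2f}")
```

Anomalies in the overlapping region shift the probabilities only a little, which is the Bayesian counterpart of calling the data equivocal.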

Edit: This is a brief video of Richard Alley discussing short-term versus long-term trends in temperatures and cherry-picking of data:
