## Saturday, June 12, 2010

### Illinois temperature data

Today I am going to return to looking at state temperatures over the past 110 years. It is something that I started doing on Saturdays a while ago, since there has been some debate in the blogosphere about the validity of certain choices of stations used in the Goddard Institute for Space Studies (GISS) network for the United States. In the not-too-distant past they have truncated the number of stations that they use to determine temperature, and I became curious as to why they chose the stations that they did. So, recognizing that it is a longhand way of doing this, I put together a procedure for looking at individual state data, which now covers a number of states (see the list on the lower right side of the page).

And in that procedure I write the post as I find and evaluate the numbers, so that as I start I don’t actually know what I am going to find. The first step in the process is to download the USHCN data for the sites that they list in Illinois and add them to the spreadsheet. There are 36 sites, and the data is smoothly transferred.

A quick check, and Chiefio states that GISS uses three stations for Illinois: Chicago, Peoria and Moline. This immediately poses a problem, because when you look for Chicago on the GISS site it gives you two alternatives, Midway and O’Hare. The data for Midway runs from 1880 to 1981, and that for O’Hare from 1958 to 2010. For those not familiar with the city, I seem to recall that Midway actually sits on the lake, while O’Hare is a bit further inland. But being curious I downloaded both sets of data, with the intent of correlating the concurrent years and seeing if there was a correction I could then apply to the O’Hare data to give a full suite of data over the period of interest (1895 to 2008).

Peoria is fine, with a full set of data, but Moline is also a problem, since GISS doesn’t have a station there. Hmmm! Odd! Well, there is one for Davenport, Iowa, which is just across the river, so I’ll use this for now. But then I check on the stations nearest to Davenport and lo! There is a Moline station, but it doesn’t start until 1944. So I’ll patch it together with the Davenport data from before that period. But first a check to ensure that they are similar over the common period, looking at the two Chicago stations and then the Davenport/Moline pair. Well, it turns out that even though the stations are close, the two temperatures don’t exactly follow each other in either case. I have plotted the differences (Midway – O’Hare, Davenport – Moline) over the common time periods.

There is a very slight upward trend in the difference with time for Chicago, but it is not great. On average Midway is 1.26 deg F warmer than O’Hare, and Moline is 0.97 deg F cooler than Davenport. It is a bit of a kludge to put the two records together in each case, but with that adjustment, which allows a full set of data for the 3 GISS stations, the GISS station average is a fairly consistent 1.13 deg F lower than the USHCN station average (with no statistically significant change over time). Which is not surprising, really, given that the 3 GISS stations are all located in the northern part of the state.
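For anyone wanting to repeat the patching step outside a spreadsheet, here is a minimal Python sketch of one way it might be done: take the mean difference over the common years and add it to the later record. The function names (`mean_offset`, `splice`) and the temperatures are made up for illustration; this is not the spreadsheet procedure itself.

```python
from statistics import mean

def mean_offset(a, b):
    """Mean of (a - b) over the years present in both records."""
    common = sorted(set(a) & set(b))
    if not common:
        raise ValueError("records share no years")
    return mean(a[y] - b[y] for y in common)

def splice(early, late, switch_year):
    """Keep `early` before switch_year, then use `late` shifted by the mean offset."""
    offset = mean_offset(early, late)
    merged = {y: t for y, t in early.items() if y < switch_year}
    merged.update({y: t + offset for y, t in late.items() if y >= switch_year})
    return merged, offset

# Made-up annual means in deg F, for illustration only (not the real station data).
midway = {1958: 51.0, 1959: 51.4, 1960: 50.2}
ohare = {1959: 50.0, 1960: 49.0, 1961: 49.5}

series, offset = splice(midway, ohare, switch_year=1960)
print(f"offset applied to the later record: {offset:.2f} deg F")
print(series)
```

The same two functions would serve for the Davenport/Moline pair, with Davenport as the early record and 1944 as the switch year. A constant offset ignores the slight trend in the difference noted above, which is part of why this remains a kludge.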

As for the state as a whole, looking at the temperature trend for the past 115-odd years we get:

Not a very strong increase in temperature with time, in fact barely significant, and the trend is moving back towards the historical average. (I am beginning to suspect that we might see this in the Midwest states. Maybe I should pose it as a hypothesis in the next state I do?)

Well I said that the reason that the GISS averages were low was that they were in the Northern part of the state, which would mean a strong correlation of temperature with latitude. And, as usual, there is.
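The kind of check behind a statement like this can be sketched in a few lines of Python: a least-squares line of station temperature against latitude, with the R^2 of the fit. The helper name `linear_fit` and the four data points are invented for illustration; they are not the Illinois values.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = intercept + slope * x, plus R^2."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two or more paired points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy * sxy / (sxx * syy)
    return slope, intercept, r_squared

# Invented station latitudes (deg N) and mean temperatures (deg F).
latitudes = [37.0, 39.0, 41.0, 42.0]
temperatures = [57.0, 54.0, 50.5, 49.5]

slope, intercept, r2 = linear_fit(latitudes, temperatures)
print(f"slope = {slope:.2f} deg F per degree of latitude, R^2 = {r2:.3f}")
```

Swapping longitude or elevation in for latitude gives the other plots in this post.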

On the other hand there isn’t much of a correlation with Longitude – not a whole lot of change in the state, going from one side to the other.

And here is the first surprise of the evening: with relatively little elevation change, I didn’t think that there would be much correlation with elevation. Wrong!

So even though the change in height is small, it is consistent with what we saw for states with much greater elevation change out west.

The premise of the next plot was that conditions around stations have deteriorated with time and that, as a result, the standard deviation of the station data would increase with time. With one or two exceptions that hasn’t proved true in the states so far, and this plot shows steady improvement, which may be due to the automation of the data gathering.
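As a rough sketch of the calculation behind such a plot, and assuming that "standard deviation of the station data" means the spread across stations within each year, something like the following would do. The function name `yearly_spread` and the station values are hypothetical.

```python
from statistics import stdev

def yearly_spread(records):
    """records: {station: {year: temp}} -> {year: sample std dev across stations}."""
    years = sorted({year for rec in records.values() for year in rec})
    spread = {}
    for year in years:
        values = [rec[year] for rec in records.values() if year in rec]
        if len(values) >= 2:  # a standard deviation needs two or more stations
            spread[year] = stdev(values)
    return spread

# Hypothetical annual means (deg F) for three made-up stations.
records = {
    "A": {1900: 50.0, 1901: 52.0},
    "B": {1900: 52.0, 1901: 52.0},
    "C": {1900: 54.0},
}
for year, sd in yearly_spread(records).items():
    print(year, round(sd, 2))
```

Plotting the resulting values against year, and fitting a line to them, shows whether the spread is growing or shrinking.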

So now it is time to get the population data (since this is one of the larger disagreements with GISS). I use the information from a Google search, and the city-data values, where available, which in this case is for all the sites in this survey. And, whoops, the second surprise of the night.

This is about the first time that there has not been a good correlation with a log-plot, and perhaps it is because of the geography of the state, where the communities are scattered around the state, with the largest city in the North.

Well I could take out latitude, and see what that did, but since I am just getting back into this, I think, for now, we’ll leave it where it is.
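For the curious, "taking out latitude" could be sketched as follows: fit temperature against latitude, then correlate what is left over (the residuals) with the log of population. This is only one way of doing it, it is not something I have run on the Illinois data, and every name and number below is made up.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept of y against x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def population_signal(lats, temps, pops):
    """Correlate log10(population) with temperature after removing the latitude fit."""
    slope, intercept = fit_line(lats, temps)
    residuals = [t - (intercept + slope * lat) for lat, t in zip(lats, temps)]
    return pearson_r([math.log10(p) for p in pops], residuals)

# Made-up stations: latitude (deg N), mean temperature (deg F), population.
lats = [38.0, 40.0, 38.0, 40.0]
temps = [63.5, 61.5, 64.5, 62.5]
pops = [1_000, 1_000, 100_000, 100_000]
print(round(population_signal(lats, temps, pops), 3))
```

If the big cities all sit at one end of the state, latitude and population are tangled together, and a residual plot like this is one way to see what population adds on its own.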

Now which way to go next?



H.O.

I could be wrong, but I don't think you know how to perform the test for statistical significance on a linear regression. I think you're thinking that you look for whether the R^2 is less than 5% but that's not right. I suggest consulting:

http://en.wikipedia.org/wiki/Simple_linear_regression

for the correct procedure - at least the basic case assuming the residuals are normally distributed and not auto-correlated (which last they might be, and that could be tested for separately). You want the section down at the bottom on confidence intervals.
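For readers following along, the basic test in that section can be sketched in Python as below: estimate the slope, its standard error, and a confidence interval, then see whether the interval excludes zero. Two assumptions to flag loudly: the residuals are treated as independent and normally distributed, and the interval uses the normal quantile as a large-sample stand-in for the Student t quantile with n - 2 degrees of freedom (reasonable for a hundred or so years of data, too narrow for short series). The function name `slope_test` and the data are invented.

```python
import math
from statistics import NormalDist

def slope_test(xs, ys, level=0.95):
    """Slope, its standard error, t statistic, and an approximate confidence interval."""
    n = len(xs)
    if n < 3:
        raise ValueError("need three or more points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    # Residual sum of squares about the fitted line.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    t_stat = slope / std_err
    # Normal quantile used in place of the t quantile (large-sample approximation).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return slope, std_err, t_stat, (slope - z * std_err, slope + z * std_err)

# Invented annual means (deg F), for illustration only.
years = list(range(1895, 1905))
temps = [52.1, 51.8, 52.4, 52.0, 52.6, 52.3, 52.2, 52.9, 52.5, 52.7]

slope, std_err, t_stat, (low, high) = slope_test(years, temps)
print(f"slope = {slope:.4f} deg F/yr, standard error = {std_err:.4f}, t = {t_stat:.2f}")
print(f"approximate 95% interval: {low:.4f} to {high:.4f}")
```

If the interval contains zero, the trend is not distinguishable from no trend at that confidence level, whatever the R^2 happens to be.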

Stuart:

Recognizing that life is more complicated than I have made out, my initial intent was to slowly walk the discussion into the use and understanding of more complicated statistical treatments.

The best intentions have a habit of getting lost in the discussions, and there have been so many other distractions from what the data is starting to reveal that the initial intent has been sidetracked.

I have got into the bad habit of using something that I planned to be only a temporary measure for longer than I intended. It's what happens when an oil spill takes over your plans.

Having now looked at your reference, I don't think that I have changed my mind that what I am doing, as a primary filter and evaluation, is problematic.

Basically, at this stage I am looking for factors that influence change. As a simple filtering mechanism, this has the advantage of being relatively straightforward, though I gather that there is a reluctance among climate change advocates to use the R^2 values.
