Saturday, June 19, 2010

Indiana Temperature data

Saturday is when I look at State Temperature data, and after looking at Illinois last week, I thought Indiana might be a reasonable next choice, moving East after the sojourn out West that started the series. Having found there, as with Missouri, little correlation of temperature with time, this week’s hypothesis is that temperatures in the Central States have changed less over time than those in the rest of the country. So let’s see if it is true. (For those new to this, I try to pose a hypothesis at the start of the post and then, writing the post as I run the calculations, find out if it is true. Some are, some aren’t.) (And for those who have been reading these for a while, there was a significant surprise this week – see if you spot it – grin – it proves I write these as I do them.)

It is a little irritating to note that the temperature values I am using are not actually raw data. As Dr. Hansen noted recently, the USHCN data is already adjusted for urban heat island effects:
“We use the unadjusted version of GHCN. However, note that a subset of GHCN, the United States Historical Climatology Network (USHCN), has been adjusted via a homogenization intended to remove urban warming and other artifacts [Karl et al., 1990; Peterson and Vose, 1997].”
It is interesting to note that this current series shows that an effect of population size remains present in the data for many states, and we will now see if that holds true in Indiana. But at some stage I suppose I will (once I get sufficient states under my belt to be credible) have to see about the raw data issue.


So starting off for Indiana with the USHCN Network for that state, there are 36 stations so there will be a slight pause while I create the spreadsheet and import the data. Oh, and our first surprise of the day. The web site for the data has been reconfigured, so that the new page (this is going for Angola, which is the second USHCN site in Indiana) carries a number of options, rather than just the one set of adjusted data that I had been using until now:


So the question is whether we want the fully adjusted data, the raw data, or the time of observation adjusted data. Well I should run a series of tests to see what the difference between the sets turns out to be – but 36 stations is a lot of data to load, so I will probably do that at a later date for a smaller state, and for now use the data set that is merely adjusted for time of observation bias (The TOBS data set). So let’s get those into the spreadsheet.

And to remind you how I am doing that, you select the middle file option to download:


Clicking on that brings up this screen:


Select the average mean temperature (TOBS) and click the submit button at the bottom.

This brings up the download page:


Click on the blue line, and this downloads the file, which I then save as Angola Ann Temps on my Desktop, since this makes it easier to then find, and download the data into the spreadsheet. (See Missouri file for how I do that. The states I have covered already are in a list at the very bottom of the right-hand column on the site).

Well, having done that I discover, using Angola for the review, that the RAW and TOBS data sets have years missing, and that it is only the fully adjusted file that has the data for every year. Tsk! Had I more time today I would do a little adjusting of my own, but since I don’t, I will set that to one side and go back to using the adjusted mean values for today. (Too many other things to do. Well, at least I caught this early.)

But just to give an example of the differences, here are the RAW, TOBS and adjusted data for the first 12 years of each data set. (There is then a gap of several years in the RAW and TOBS files – though not the ADJ file).


You will notice that the RAW and TOBS files have several temperatures in the 50’s, while the ADJ file does not. Hmm! The average RAW data temperature is 48.7 deg; the average TOBS temperature is 49.4 degrees: the average ADJ temperature is 46.3 deg. Well how does that compare with more recent temperatures – let’s look at the most recent 12 years:


You can see that the ADJ values (the ones that are used by GISS) are much closer to the RAW and TOBS values, with the average RAW temperature being 48.3 deg F, the TOBS value being 48.8 deg F, and the ADJ data being 48.7 deg F. Now if I were tempted to be suspicious, it would seem a little odd that the adjustment to get the used data lowers the historic temperatures while leaving current values more alone. To substantiate that I would need to look at a lot more data than I have time to examine today, but it does cast some slight suspicion on the data as I move forward, for now using the Adjusted data.
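To put numbers on that comparison, here is a minimal sketch in Python (my choice for illustration; the post itself was done in a spreadsheet). It uses only the six period averages quoted above for Angola. The dictionary and function names are made up for this example.

```python
# Period averages (deg F) quoted in the post for Angola, Indiana.
early = {"RAW": 48.7, "TOBS": 49.4, "ADJ": 46.3}    # first 12 years
recent = {"RAW": 48.3, "TOBS": 48.8, "ADJ": 48.7}   # most recent 12 years


def adj_offset(period, base):
    """Return the ADJ average minus the `base` average, rounded to 0.1 deg F."""
    return round(period["ADJ"] - period[base], 1)


early_vs_raw = adj_offset(early, "RAW")
recent_vs_raw = adj_offset(recent, "RAW")
early_vs_tobs = adj_offset(early, "TOBS")
recent_vs_tobs = adj_offset(recent, "TOBS")

print("ADJ - RAW :", early_vs_raw, "(early)", recent_vs_raw, "(recent)")
print("ADJ - TOBS:", early_vs_tobs, "(early)", recent_vs_tobs, "(recent)")
```

On these figures the adjusted average sits 2.4 deg F below RAW in the early period and 0.4 deg F above it in the recent one. This is a comparison of two 12-year windows at a single station, so it describes the size of the adjustment there and nothing more general.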

So now we move back to getting the data, and that means re-acquiring Anderson as an Adjusted Mean temperature. And so we get the data from the USHCN site, now carefully marking it as adjusted, and go to get the GISS data. There are four GISS stations in Indiana, according to Chiefio, and these are in Evansville, Indianapolis, Fort Wayne and South Bend.

There is no data for 1895 or 1896 for Evansville and Fort Wayne, and South Bend data does not start until 1944. In the past I have interpolated data, or used adjacent sites to generate those numbers. But this time I will just average the GISS stations that have a value in each year, dividing by the number of stations reporting in that year. Doing that clearly shows that the number of stations is small enough that each one has a significant impact. Consider the difference between the GISS data and the USHCN data:
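The averaging rule described above (divide by however many stations report in a given year) can be sketched as follows. This is an illustration in Python, not the spreadsheet used for the post, and the temperatures in the example are placeholder values invented to show the mechanics; only the station names and the missing years come from the text.

```python
def yearly_mean(stations, year):
    """Average the stations that report a value for `year`.

    `stations` maps a station name to a {year: temperature} dict.
    Returns None when no station has a value for that year.
    """
    values = [
        series[year]
        for series in stations.values()
        if series.get(year) is not None
    ]
    if not values:
        return None
    return sum(values) / len(values)


# Placeholder temperatures (deg F), NOT real data: they only mimic the
# pattern of gaps described in the post.
stations = {
    "Evansville":   {1897: 56.0, 1944: 57.0},              # no 1895 or 1896
    "Indianapolis": {1895: 52.0, 1897: 53.0, 1944: 53.0},
    "Fort Wayne":   {1897: 50.0, 1944: 50.0},              # no 1895 or 1896
    "South Bend":   {1944: 48.0},                          # starts in 1944
}

for year in (1895, 1897, 1944):
    print(year, yearly_mean(stations, year))
```

With so few stations, the average moves whenever one joins or drops out, even if no individual station changes. That is the kind of step to expect in a plot at the year a station is added.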


The change when the fourth GISS station is added is clear. Looking at the two segments of the curve, however, it is also clear that there is an increasing difference between the GISS stations and the USHCN stations, in both. This would indicate (and remember that the cities GISS chose are some of the largest in the state) that they are warming more than the average for the state.

For the state as a whole (now averaging all stations, for each year) the influence of the lack of one station becomes much less. But, as predicted by the hypothesis at the beginning of the post, there has not been a significant (i.e. R-squared greater than 0.05) increase in temperature over the 110 years.
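For readers who want to reproduce the R-squared figure outside a spreadsheet, here is a minimal sketch in Python of the coefficient of determination for a straight-line fit of temperature against year. The function name is illustrative, and the 0.05 cutoff is simply the screening threshold used in this series, not a formal significance test.

```python
def r_squared(xs, ys):
    """R-squared of the least-squares straight line fitted to ys against xs.

    For a single predictor this equals the squared Pearson correlation.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series has no linear relationship to report
    return (sxy * sxy) / (sxx * syy)


THRESHOLD = 0.05  # the post's working cutoff


def passes_screen(xs, ys):
    """True when the fit's R-squared exceeds the post's 0.05 cutoff."""
    return r_squared(xs, ys) > THRESHOLD
```

The same function applies unchanged to the latitude, longitude, elevation and population comparisons: pass the station values for that variable as `xs` and the station mean temperatures as `ys`.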


The state is fairly flat, and longer North-South so that there is enough scatter for a good correlation with Latitude, and as with every other state, that is evident from the data.


And since there aren’t any strong geological features East-West I wasn’t looking for a strong correlation with longitude, and there isn’t one:


However, even though the state elevation only fluctuates over a couple of hundred meters, there is a strong correlation with elevation.


The first of the other two graphs that I usually plot is the standard deviation of the data over time (to see if Anthony Watts’ concern about station deterioration is true – if so, SD should perhaps increase over time).


And it does appear that, within the past decade or two, quality is beginning to suffer.
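A minimal sketch of that calculation in Python, assuming the standard deviation is taken across the stations within each year (that reading is my assumption) and using the sample formula (n - 1 in the denominator), which is what `statistics.stdev` computes. The temperatures shown are placeholders, not Indiana data.

```python
import statistics


def spread_by_year(readings):
    """Return {year: sample standard deviation across stations}.

    `readings` maps a year to the list of station temperatures for that year.
    Years with fewer than two stations are skipped, since a sample standard
    deviation is undefined for a single value.
    """
    return {
        year: statistics.stdev(temps)
        for year, temps in sorted(readings.items())
        if len(temps) >= 2
    }


# Placeholder values (deg F), NOT real data.
readings = {
    1900: [50.0, 52.0, 54.0],
    2000: [48.0, 52.0, 56.0],
    2005: [51.0],               # only one station: skipped
}

spread = spread_by_year(readings)
print(spread)
```

Plotting the resulting values against year gives the kind of graph described above. A rising line means the stations are spreading further apart from one another.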

And so the question is, with the effect of population being “factored out” of the data by the USHCN, is there a population trend? Bear in mind that the states where this was evident were states where there were a lot of very small communities. In Indiana the communities with stations are increasingly larger.

Hmm! As with Illinois last week it appears that there is no longer a correlation with population. Maybe the USHCN correction is a good one? (Perhaps, since the tabulation is new, I should go back and see if the earlier state data is still the same?)


Well, more interesting results. The story seems to be getting curiouser and curiouser the more we look. And we aren’t even half-way done yet!

11 comments:

  1. H.O. See my comment to your last temperature post - R^2 > 0.05 is not a valid way of assessing statistical significance on a linear regression.

  2. Stuart:
    Actually now that I think about it a bit more I don't think you quite understand what I am doing.

    I am using the measurement, which is a valid one, as sort of a filter to see what has most effect, and also as a simple initial guide to try and work out the major parameters. Vide the latitude relationship and the impact of longitude where it is, for example, actually tied to elevation.

  3. HO, I'm wondering if you're using the conventional calculation for the standard deviation. If you are, and the population mean is gradually increasing over the same period, the estimated standard deviation will be larger than its true value.

  4. Porsena:
    Since this is just a "for curiosity" calculation I am just using the Standard Deviation formula that exists on Excel.

  5. H.O. I'm specifically referring to the sentence "But, as predicted by the hypothesis at the beginning of the post, there has not been a significant (i.e. R-squared greater than 0.05) increase in temperature over the 110 years.". I think any passing reader would assume that you meant "significance" in the sense of statistical significance - that is, allowing for noise, does Indiana appear to have been getting warmer since 1890. But the size of the R^2 has to do with the degree of correlation between two variables, and doesn't tell you whether or not the degree of linear trend is statistically significant. So it's not helpful in assessing whether or not Indiana is warming. For that, you have to use a proper linear regression significance test.
