Data Massage Isn't Science

It's hard for me to take much of the statistical analysis coming out of the climate debate seriously. The game seems to be to massage your data until you get the result you are looking for. Here's an example pointed out by Climate Audit. Kudos to the folks at that site, by the way; they are performing an invaluable service and taking a lot of heat for their efforts.
Here's an equation from that article:



... an estimate of a seasonal or quarterly temperature when one month is missing
from the record depends heavily on averages for all three months in that quarter. This can be expressed by the following equation, where the three months of the quarter are taken in no particular order and one of them is missing:

In the above, T is temperature, q is the given quarter, n is the given year, and N is all years of the record.
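The equation itself didn't survive in this copy of the excerpt. Here is one formula that fits the definitions above. It may not be the exact form used in the Climate Audit article. The labels are my own assumption: m_1 and m_2 are the two recorded months, m_3 is the missing one, and a bar means the average of that month over all years N. The missing month is estimated as its long-term mean plus the average anomaly of the two recorded months, and then the three values are averaged:

```latex
T_{q,n} = \frac{1}{3}\left[ T_{m_1,n} + T_{m_2,n} + \bar{T}_{m_3,N}
          + \frac{(T_{m_1,n} - \bar{T}_{m_1,N}) + (T_{m_2,n} - \bar{T}_{m_2,N})}{2} \right]

% Collecting terms gives an equivalent form:
T_{q,n} = \frac{T_{m_1,n} + T_{m_2,n}}{2}
          + \frac{1}{3}\left[ \bar{T}_{m_3,N} - \frac{\bar{T}_{m_1,N} + \bar{T}_{m_2,N}}{2} \right]
```

The second form is a weighted combination of this year's recorded months plus a term for the historical difference between the missing month and the recorded ones. When T_{m_1,n} and T_{m_2,n} equal their long-term means, it reduces to the historical quarterly mean.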


In theory you could come up with a dozen formulas for this. Why is this one preferred? It looks like they are filling in the missing month with a weighted linear combination of the neighboring months' data and the historical difference between the missing month and the neighboring months. When the neighboring months' data match their historical averages, the formula returns the historical quarterly mean. Seems reasonable!
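To check that behavior concretely, here is a minimal Python sketch. Because the article's exact equation isn't reproduced here, it rests on an assumption: the missing month is estimated as its long-term mean plus the average anomaly of the two recorded months, and the quarter is then the mean of the three values. The function name `infill_quarter` and the sample temperatures are hypothetical, not data from any record.

```python
from statistics import mean


def infill_quarter(present: list[float], present_means: list[float], missing_mean: float) -> float:
    """Estimate a quarterly mean temperature when one of its three months is missing.

    present:       this year's values for the two recorded months
    present_means: long-term (all-years) means for those same two months
    missing_mean:  long-term mean for the missing month

    Assumption (hypothetical, for illustration): the missing month is estimated
    as its long-term mean plus the average anomaly of the two recorded months.
    """
    if len(present) != 2 or len(present_means) != 2:
        raise ValueError("expected exactly two recorded months")
    # Anomaly of each recorded month relative to its own long-term mean, averaged.
    avg_anomaly = mean(t - m for t, m in zip(present, present_means))
    # Infilled value for the missing month.
    estimated_missing = missing_mean + avg_anomaly
    # Quarterly mean of the two recorded months plus the infilled estimate.
    return (present[0] + present[1] + estimated_missing) / 3


# Hypothetical long-term monthly means (deg C) for one quarter.
means = [10.0, 14.0, 18.0]

# Recorded months exactly at their long-term means: result is the historical quarterly mean.
print(infill_quarter([10.0, 14.0], means[:2], means[2]))  # 14.0

# Both recorded months 1.0 above their means: the quarter comes out 1.0 above too.
print(infill_quarter([11.0, 15.0], means[:2], means[2]))  # 15.0
```

In this sketch, the quarter's anomaly is just the average anomaly of the two months that were actually recorded. A different infill formula would weight them differently and give a different answer for the same data.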

It's not clear to me why they need to fill in the missing data in the first place.
