This study tested the accuracy of five methods for estimating missing hourly near-surface air temperature from available observations and compared the results in terms of mean error and bias. Each method was tested while varying the number of available stations and the duration of the missing period to be estimated. The study also tested the impact of snow cover on observed air temperature after accounting for elevation with a lapse rate. Three of the estimation methods were elevation-based interpolations using a lapse rate: an hourly lapse rate, an hourly lapse rate combined with kriging interpolation, and a long-term constant lapse rate. The other two methods were diurnal cycle interpolation within a single time series and spatio-temporal correlations among multiple stations using empirical orthogonal functions (EOFs). The comparison used dense networks of hourly surface temperature observations from five regions of complex terrain to determine the accuracy of each method. The largest dataset of observations was from the NOAA Hydrometeorological Testbed (HMT) in the American River Basin, Northern Sierra Nevada, California.
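As an illustration of the hourly lapse rate approach described above, the following is a minimal sketch, not the study's implementation. It assumes the lapse rate for a given hour is obtained by an ordinary least-squares fit of temperature against elevation across the stations reporting at that hour; the function name, argument names, and example values are hypothetical.

```python
import numpy as np


def fill_with_hourly_lapse_rate(elev_m, temp_c, target_elev_m):
    """Estimate temperature at target_elev_m for a single hour.

    Assumption (not taken from the study): the hourly lapse rate is the
    slope of a least-squares line fitted to temperature vs. elevation.
    Returns (estimated temperature in deg C, lapse rate in deg C per m).
    """
    elev = np.asarray(elev_m, dtype=float)
    temp = np.asarray(temp_c, dtype=float)

    # Use only the stations that actually reported this hour.
    ok = ~np.isnan(temp)
    if ok.sum() < 2 or np.unique(elev[ok]).size < 2:
        raise ValueError("need at least two stations at distinct elevations")

    # Degree-1 polynomial fit: temp = slope * elev + intercept.
    slope, intercept = np.polyfit(elev[ok], temp[ok], 1)
    return slope * target_elev_m + intercept, slope


# Hypothetical hour: four reporting stations, one station missing (NaN).
elevations = [500.0, 1000.0, 1500.0, 2000.0, 1200.0]
temperatures = [11.75, 8.5, 5.25, 2.0, np.nan]
estimate, lapse_rate = fill_with_hourly_lapse_rate(elevations, temperatures, 1200.0)
```

A long-term constant lapse rate would replace the per-hour fit with a single fixed slope. The kriging variant mentioned above is not sketched here.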
The results indicated that spatio-temporal correlations using EOFs were more accurate than lapse rates or temporal interpolation, provided that multiple stations were available. Specifically, EOF-based filling performed better when roughly 10 or more stations were available. Temporal interpolation was the most accurate method when only one or two stations were available, or when only one hour of data was missing. Errors in lapse rate estimation were related to the presence of cold air pools and other surface features. In the Northern Sierra dataset, snow cover suppressed near-surface air temperature by approximately 1°C after controlling for elevation. None of the tested methods showed significant bias in estimating temperatures. From these findings, we present guidelines for choosing an estimation method based on the duration of the missing data and the number of stations.
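The guidelines themselves are not reproduced in this abstract, but the thresholds it does state can be encoded as a simple rule. The sketch below is only a reading of the sentences above. The function name and return labels are hypothetical, the 10-station cutoff is treated as exact although the abstract calls it approximate, and the case of 3 to 9 stations is left unspecified because the abstract gives no recommendation for it.

```python
def suggest_method(n_stations, gap_hours, eof_min_stations=10):
    """Suggest a gap-filling method from the thresholds stated in the abstract.

    n_stations: number of available stations (assumed to be a simple count).
    gap_hours: duration of the missing period in hours.
    eof_min_stations: assumed hard cutoff for the "approximately 10" figure.
    """
    if n_stations < 1 or gap_hours < 1:
        raise ValueError("n_stations and gap_hours must be at least 1")

    # Temporal interpolation was most accurate for one-hour gaps
    # or when only one or two stations were available.
    if gap_hours == 1 or n_stations <= 2:
        return "temporal_interpolation"

    # EOF-based filling performed better with roughly 10 or more stations.
    if n_stations >= eof_min_stations:
        return "eof"

    # The abstract does not state a preferred method for this range.
    return "unspecified"


choice = suggest_method(n_stations=15, gap_hours=24)
```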