Completing the Meteorological Archive Missing Data at the Daily and Subdaily Time Scales

Osetinsky-Tzidaki, Isabella

Completing missing data in meteorological archives at daily and sub-daily time scales is highly important for numerous applications, such as: (a) climatological tables, atlases and maps, which require complete observational data for all participating stations over a common period of years; (b) heating and cooling degree-day and degree-hour indices; (c) hourly-based high-percentile indices for assessing climate change in the occurrence of extreme temperatures.

Algorithms for completing missing data at daily and sub-daily time scales were developed and implemented for the Israel Meteorological Service Archive, which includes observations recorded at manual and automatic weather stations.

The daily missing data completing algorithm. Each daily element, such as maximum or minimum temperature, is completed separately, per station-element-month-year. The core principle of the algorithm is to find a few best-correlated reference stations for the given station and calendar month in a reference year. Because of Israel's complex topography, its various climate zones and the diversity of synoptic conditions, no station is predefined as the best reference station for any station in any calendar month. Instead, each station is considered a possible candidate to serve as a reference station for any other station in any month. Completing a missing element for the given station in the given month and year is based on the statistical relation between the given station and a reference station in a reference year. The reference year is either the given year itself, if no more than one-third of the given station's daily data are missing in that month, or another year in which the given station has almost all daily data for the given calendar month. In the latter case, the algorithm searches for a reference year as close as possible to the given year, to minimize the possible effect on the completed data of (a) climate variations and (b) changes in the station's environment, equipment and maintenance. Once a reference year is found, correlation coefficients are calculated between the daily data of the given station and those of each candidate station that has at least two-thirds of its daily data in the given calendar month of the reference year. All candidate stations are then ranked in decreasing order of their correlation coefficients.
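The reference-year search and the ranking of candidate stations described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Service's implementation: the data layout (a dictionary of station, then year, then a list of daily values for one element and one calendar month, with None marking a missing day), all function names, the 0.9 threshold standing in for "almost all daily data", and the preference for the earlier year when two years are equally close are hypothetical choices.

```python
from math import sqrt

def coverage(series):
    """Fraction of days in the month that have an observation."""
    return sum(v is not None for v in series) / len(series)

def pearson(x, y):
    """Pearson correlation over the days on which both series have data."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 3:
        return None
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)

def choose_reference_year(target_by_year, year, min_same=2 / 3, min_other=0.9):
    """Use the given year if at most one-third of its days are missing;
    otherwise take the closest year with nearly complete data.
    min_other=0.9 is an assumed reading of 'almost all daily data'."""
    if coverage(target_by_year[year]) >= min_same:
        return year
    candidates = [y for y, s in target_by_year.items()
                  if y != year and coverage(s) >= min_other]
    # Assumed tie-break: on equal distance, prefer the earlier year.
    return min(candidates, key=lambda y: (abs(y - year), y), default=None)

def rank_candidates(data, target, ref_year, min_cov=2 / 3):
    """Return [(station, r), ...] sorted by decreasing correlation with target."""
    ranked = []
    for station, by_year in data.items():
        if station == target or ref_year not in by_year:
            continue
        series = by_year[ref_year]
        if coverage(series) < min_cov:
            continue  # candidate needs at least two-thirds of its daily data
        r = pearson(data[target][ref_year], series)
        if r is not None:
            ranked.append((r, station))
    ranked.sort(reverse=True)
    return [(station, r) for r, station in ranked]
```

Every station other than the target is tried as a candidate, so the ranking is rebuilt for each station-element-month-year rather than read from a fixed table of neighbours.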

After the candidate reference stations have been ranked, starting from the best-correlated one, all missing daily data for the given station in the given month and year are filled in with the following completing procedure:

(i) Calculate the linear regression coefficients between the daily data of the given station and those of the best-correlated station in the reference year.

(ii) Apply these coefficients to the reference station's data in the given month and year, to fill in the missing data for the given station.

(iii) If not all missing daily data for the given station have been completed with the first reference station, because the reference station itself has missing data in the given month and year, repeat the procedure with the next best-correlated station, and so on, until all missing daily data for the given station in the given month and year have been completed.

The algorithm then proceeds to the next month for the given station-element-year and starts again by finding a reference year as described above.
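Steps (i) to (iii) can be sketched as a cascade over the ranked reference stations. The sketch below assumes ordinary least-squares regression of the given station on the reference station, lists of daily values with None for missing days, and hypothetical function and argument names; it illustrates the described procedure and is not the operational code.

```python
def linear_fit(x, y):
    """Least-squares fit y = intercept + slope * x over days where both exist."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    if sxx == 0:
        return None
    slope = sum((a - mx) * (b - my) for a, b in pairs) / sxx
    return slope, my - slope * mx

def fill_month(target_given, target_ref, references):
    """Fill missing days of the target station for one month.

    target_given: target station in the given year (None = missing)
    target_ref:   target station in the reference year
    references:   list of (ref_in_reference_year, ref_in_given_year) pairs,
                  ordered from the best-correlated station downwards
    """
    filled = list(target_given)
    for ref_in_ref_year, ref_in_given_year in references:
        if all(v is not None for v in filled):
            break  # nothing left to complete
        fit = linear_fit(ref_in_ref_year, target_ref)  # step (i)
        if fit is None:
            continue
        slope, intercept = fit
        for day, value in enumerate(filled):
            # step (ii): only fill days the reference station actually observed
            if value is None and ref_in_given_year[day] is not None:
                filled[day] = intercept + slope * ref_in_given_year[day]
        # step (iii): remaining gaps fall through to the next reference station
    return filled
```

Days that stay None after the last reference station has been tried are left missing in this sketch.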

The sub-daily missing data completing algorithm. This algorithm is intended for filling in missing data in the records of automatic weather stations. It is essentially identical to the daily algorithm, but additionally completes missing data at hourly and 10-minute time scales. The completing scheme is therefore supplemented with preprocessing and post-processing procedures.

Preprocessing. All small gaps, of up to nine consecutive missing values in the 10-minute data records, are filled by spline or linear interpolation. The hourly temperatures are then calculated by averaging each group of six 10-minute observations.
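The preprocessing step can be illustrated with the linear-interpolation variant. The sketch assumes a 10-minute record stored as a list with None for missing values and aligned to whole hours; the function names, and the choice to leave an hour missing when any of its six values is still missing, are assumptions made for illustration.

```python
def fill_small_gaps(values, max_gap=9):
    """Linearly interpolate runs of at most max_gap consecutive missing values.
    Longer gaps, and gaps touching either end of the record, are left as None."""
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        end = i  # index of the first value after the gap
        gap_len = end - start
        if start > 0 and end < n and gap_len <= max_gap:
            left, right = filled[start - 1], filled[end]
            for k in range(start, end):
                frac = (k - start + 1) / (gap_len + 1)
                filled[k] = left + frac * (right - left)
    return filled

def hourly_means(ten_minute):
    """Average each block of six 10-minute values; None if any value is missing."""
    hours = []
    for h in range(0, len(ten_minute) - 5, 6):
        block = ten_minute[h:h + 6]
        if any(v is None for v in block):
            hours.append(None)
        else:
            hours.append(sum(block) / 6)
    return hours
```

For example, the record `[10.0, None, None, 13.0, 14.0, 15.0]` is filled to a straight line between 10.0 and 13.0 before the six values are averaged into one hourly temperature.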

Completing. The algorithm for completing the missing hourly data is similar to the one described above, except that instead of the daily maximum or minimum in the given month, the elements are the 24 hourly temperatures, each completed per station-hour-month-year. Applying the completing algorithm directly to the 10-minute data records is not recommended, because it produces very noisy time series. Completing the hourly data, by contrast, gives much more robust and reliable results.
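Treating each hour of the day as a separate element amounts to splitting one month of hourly data into 24 daily series, completing each series, and interleaving the results again. The sketch below shows only that bookkeeping; `fill_daily` is a hypothetical callback standing in for whatever daily completing routine is used, and the record is assumed to start at hour 0 and to contain whole days.

```python
def split_by_hour(hourly):
    """Split an hourly record (whole days) into 24 per-hour daily series."""
    return [hourly[h::24] for h in range(24)]

def merge_by_hour(per_hour):
    """Inverse of split_by_hour: interleave 24 daily series into one record."""
    n_days = len(per_hour[0])
    return [per_hour[h][d] for d in range(n_days) for h in range(24)]

def complete_hourly(hourly, fill_daily):
    """Complete each hour of the day separately.

    fill_daily(hour, series) is a hypothetical callback that returns the
    completed daily series for that hour (station-hour-month-year)."""
    completed = [fill_daily(hour, series)
                 for hour, series in enumerate(split_by_hour(hourly))]
    return merge_by_hour(completed)
```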

Post-processing. The fully completed hourly temperature time series are downscaled back to the 10-minute time resolution. This is done by applying a low-pass Butterworth filter to a 10-minute series in which each hourly temperature value is simply repeated six times. Then, in order to retain the recorded 10-minute observations, they are substituted into the filtered 10-minute time series. In this way, full 10-minute data records are produced. The results have many applications, among them: (a) filling in missing data in the climatological archives at the standard synoptic times, 00Z, 03Z, ..., 21Z; (b) updating the daily maximum and minimum.
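The downscaling step can be sketched as follows. The text specifies only a low-pass Butterworth filter, so the filter order (second), the cutoff (0.5 cycles per hour) and the forward-backward (zero-phase) application are assumptions of this sketch, as are all function names. The filter is written out with the standard library to keep the example self-contained; in practice a signal-processing library routine would normally be used instead.

```python
from math import pi, sqrt, tan

def butter2_lowpass(cutoff, fs):
    """Second-order Butterworth low-pass coefficients (bilinear transform).
    Returns (b0, b1, b2), (a1, a2) for
    y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]."""
    k = tan(pi * cutoff / fs)
    norm = 1.0 / (1.0 + sqrt(2.0) * k + k * k)
    b0 = k * k * norm
    a1 = 2.0 * (k * k - 1.0) * norm
    a2 = (1.0 - sqrt(2.0) * k + k * k) * norm
    return (b0, 2.0 * b0, b0), (a1, a2)

def _filter_once(b, a, x):
    # Start from the steady state of the first sample to avoid an edge transient.
    x1 = x2 = y1 = y2 = x[0]
    out = []
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        out.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return out

def zero_phase(b, a, x):
    """Filter forwards, then backwards, so the smoothed series is not delayed."""
    forward = _filter_once(b, a, x)
    backward = _filter_once(b, a, forward[::-1])
    return backward[::-1]

def downscale_to_10min(hourly, observed_10min, cutoff_per_hour=0.5):
    """Repeat each hourly value six times, smooth the staircase, then put the
    recorded 10-minute observations (non-None entries) back in place."""
    steps = [t for t in hourly for _ in range(6)]
    b, a = butter2_lowpass(cutoff_per_hour, fs=6.0)  # six samples per hour
    smooth = zero_phase(b, a, steps)
    return [obs if obs is not None else s
            for obs, s in zip(observed_10min, smooth)]
```

A lower cutoff gives a smoother transition between neighbouring hours, while a higher one keeps the series closer to the original staircase.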

Missing humidity data may be completed in the same way. The algorithm described above for the dry-bulb temperature Tdry may be applied to the wet-bulb temperature Twet as well. After the missing Twet data have been completed, the missing relative humidity (RH) data are calculated as a function of [Tdry, Twet, pressure at station]. Completing the RH data directly is not recommended. Instead, if the automatic station is equipped with a sensor that reports RH, it is advisable first to transform RH into Twet as a function of [Tdry, RH, pressure at station], then to apply the described completing algorithm, and finally to transform Twet back into RH.
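The two humidity transformations can be sketched with a common textbook form of the psychrometric equation. The text does not state which formulas are used, so the Magnus-type saturation vapour pressure formula over water, the psychrometer coefficient of 6.62e-4 per kelvin (a typical value for a ventilated psychrometer), the bisection solver and all function names are assumptions of this sketch.

```python
from math import exp

PSYCHROMETER_COEFF = 6.62e-4  # 1/K, assumed value for a ventilated psychrometer

def saturation_vapour_pressure(t_c):
    """Saturation vapour pressure over water in hPa (Magnus-type formula)."""
    return 6.112 * exp(17.62 * t_c / (243.12 + t_c))

def rh_from_twet(tdry, twet, pressure_hpa):
    """Relative humidity (%) from dry-bulb and wet-bulb temperature (deg C)."""
    e = (saturation_vapour_pressure(twet)
         - PSYCHROMETER_COEFF * pressure_hpa * (tdry - twet))
    return 100.0 * e / saturation_vapour_pressure(tdry)

def twet_from_rh(tdry, rh, pressure_hpa, tol=1e-6):
    """Invert rh_from_twet by bisection; RH increases monotonically with Twet."""
    lo, hi = tdry - 60.0, tdry  # Twet cannot exceed Tdry
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rh_from_twet(tdry, mid, pressure_hpa) < rh:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With these two functions an RH record would be converted to Twet, completed like a temperature series, and converted back to RH using the completed Tdry and the station pressure.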

The presented algorithms allow: (1) completing missing data in any data record; (2) filling in years that lie beyond a station's life span, in order to extend stations with short records to the common observational period when required.

