Lessons in Diversity: How 40 Different Data Sources Were Combined to Create Version 2 of the Integrated Global Radiosonde Archive

Durre, Imke

Assessments of climatic trends require reliable long-term records of meteorological observations. A frequently overlooked aspect of creating such records is the need to merge data from multiple sources into one coherent dataset that contains exactly one time series for each unique observing site. If the merge is done incorrectly, the resulting dataset may contain either overlapping duplicate records for the same location or single time series comprising data from multiple clearly distinct stations.
Version 2 of the Integrated Global Radiosonde Archive (IGRA), for example, was constructed from 40 distinct data sources that differed in period of record, spatial extent, availability and precision of variables, and station metadata. Each source contained at least some unique observations while also partially overlapping in space and time with at least one other source. To facilitate the correct integration of this diverse set of source data records, a decision-making algorithm was designed that uses multiple pieces of information about the data and metadata. The process first identifies all matching pairs of stations on the basis of data similarity, station identifiers, distance, and station names; it then removes stations with significant data or metadata problems; and finally it determines how the remaining source stations should be combined and which should be added as single-source stations. In this presentation, we will describe the types of problems encountered, illustrate how they were handled, and present the result of applying the algorithm to the IGRA data sources.
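The pair-matching and grouping steps described above can be illustrated with a minimal Python sketch. It is not the IGRA algorithm itself: the `SourceStation` record, the `is_match` rule, and the thresholds `max_km` and `min_name_ratio` are hypothetical choices made for illustration. The sketch also omits the data-similarity comparison and the screening of stations with data or metadata problems; it only shows how identifier, distance, and name criteria might be combined to decide which source stations belong together and which remain single-source.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class SourceStation:
    source: str      # hypothetical label of the contributing data source
    station_id: str  # identifier as given by that source
    name: str
    lat: float       # degrees north
    lon: float       # degrees east


def distance_km(a, b):
    """Great-circle (haversine) distance between two stations in km."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = (sin(dlat / 2) ** 2
         + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(h))


def is_match(a, b, max_km=5.0, min_name_ratio=0.8):
    """Assumed rule: different sources, close together, and same ID or similar name."""
    if a.source == b.source:
        return False
    if distance_km(a, b) > max_km:
        return False
    name_ratio = SequenceMatcher(None, a.name.lower(), b.name.lower()).ratio()
    return a.station_id == b.station_id or name_ratio >= min_name_ratio


def group_stations(stations):
    """Union matching pairs into groups; unmatched stations stay single-source."""
    parent = list(range(len(stations)))

    def find(i):
        # Follow parent links to the group root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(stations)), 2):
        if is_match(stations[i], stations[j]):
            parent[find(i)] = find(j)

    groups = {}
    for i, station in enumerate(stations):
        groups.setdefault(find(i), []).append(station)
    return list(groups.values())


# Made-up example records, not taken from any IGRA source.
stations = [
    SourceStation("src_a", "72201", "KEY WEST", 24.550, -81.750),
    SourceStation("src_b", "72201", "KEY WEST INTL", 24.553, -81.755),
    SourceStation("src_b", "72202", "MIAMI", 25.750, -80.380),
]
groups = group_stations(stations)
```

Because the grouping is transitive, a chain of pairwise matches can pull clearly distinct stations into one group, which is exactly the failure mode the abstract warns about. A real implementation would therefore need additional checks, such as the data-similarity test, before combining records.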
