Development of the International Surface Temperature Initiative's Global Land Surface Databank

Rennie, Jared

The International Surface Temperature Initiative (ISTI) is an effort to create an end-to-end process for land surface air temperature analyses. The foundation of this process is the establishment of a global land surface databank. The databank builds upon the groundbreaking work of scientists who led efforts to construct global land surface datasets in the 1980s and 1990s. A primary aim of the databank is to improve data provenance, version control, openness and transparency, and temporal and spatial coverage, and to develop better methods for bringing the dozens of source datasets together into an integrated merged dataset. The databank consists of six stages, with each successive stage providing a higher level of processing, quality and integration. A databank working group is focused on establishing Stage 0 (original observation forms) through Stage 3 (merged dataset without quality control) data. Quality-controlled (Stage 4) and bias-corrected (Stage 5) data, although not the focus of this working group, will be integrated as they are developed by other working groups within the ISTI.

More than 40 sources of data have been added to the databank. Although collection of new source datasets is ongoing, efforts have been made to develop the initial version of the Stage 3 merged dataset. This involves developing automated algorithms for removing duplicate station records, identifying two or more station records that can be merged into a single record, and incorporating new and unique stations. The merge program runs iteratively through all the sources, which are ordered based upon criteria established by the ISTI. The most preferred source, known as the master, is compared against each candidate source in turn, and station comparisons are calculated to identify those acceptable for merging. The process is Bayesian in approach, and the final fate of a candidate station is based upon metadata matching and data equivalence criteria. If there is not enough information, the station is withheld for further investigation. The algorithm has been validated using a pseudo-source of stations with a known time-of-observation bias, and correct matches have been made nearly 95% of the time.
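
The three possible fates of a candidate station (merged, added as unique, or withheld) can be illustrated with a minimal Python sketch. This is not the ISTI merge program: the scoring functions, the product used to combine them, the neutral value used when records do not overlap, and every threshold and name below (`metadata_score`, `data_score`, `classify`, `merge_thresh`, `unique_thresh`) are hypothetical choices made only to show the shape of the decision.

```python
import math
from difflib import SequenceMatcher


def metadata_score(a, b, scale_km=10.0):
    """Hypothetical 0-1 metadata similarity from distance and station name."""
    lat1, lon1, lat2, lon2 = map(
        math.radians, (a["lat"], a["lon"], b["lat"], b["lon"])
    )
    # Haversine great-circle distance in kilometres.
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    dist_km = 2 * 6371.0 * math.asin(math.sqrt(h))
    dist_sim = math.exp(-dist_km / scale_km)  # decays with separation
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return dist_sim * name_sim


def data_score(a, b, tol=0.1, min_overlap=12):
    """Fraction of overlapping months agreeing within tol, or None if the
    overlap is too short to judge (assumed minimum: 12 months)."""
    common = a["data"].keys() & b["data"].keys()
    if len(common) < min_overlap:
        return None
    agree = sum(abs(a["data"][k] - b["data"][k]) <= tol for k in common)
    return agree / len(common)


def classify(candidate, masters, merge_thresh=0.5, unique_thresh=0.1):
    """Return (decision, master_id) where decision is 'merge', 'unique'
    or 'withhold'. Thresholds are illustrative assumptions."""
    best_score, best_id = 0.0, None
    for m in masters:
        meta = metadata_score(candidate, m)
        data = data_score(candidate, m)
        # Assumption: with no usable overlap, data evidence is neutral (0.5).
        score = meta * (0.5 if data is None else data)
        if score > best_score:
            best_score, best_id = score, m["id"]
    if best_score >= merge_thresh:
        return "merge", best_id
    if best_score <= unique_thresh:
        return "unique", None
    return "withhold", best_id  # not enough information either way


months = {(2000, m): float(m) for m in range(1, 13)}
master = {"id": "M1", "name": "BOULDER", "lat": 40.0, "lon": -105.0, "data": months}
candidate = {"id": "C1", "name": "BOULDER", "lat": 40.0, "lon": -105.0, "data": months}
print(classify(candidate, [master]))
```

In this sketch, a candidate whose best score falls between the two thresholds is neither merged nor added, mirroring the case described above in which a station is withheld for further investigation.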

The final Stage 3 product contains over 40,000 stations; however, slight changes in the algorithm can perturb results. Subjective decisions, such as the ordering of the sources or the choice of metadata and data matching thresholds, can yield a different outcome. To address this uncertainty, multi-member ensembles of the merge program have been produced based upon expert decisions from the databank working group. All data and code will be provided openly and without charge, facilitating access and use by anyone in the international community. We strongly encourage the use of these data and welcome feedback on any relevant aspect of the databank effort from interested parties.
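
One way to picture a multi-member ensemble is as a set of configurations, each with its own source ordering and matching thresholds, that are each passed to the merge program. The Python sketch below draws these settings at random purely for illustration; the ensembles described above are based on expert decisions, not random sampling, and the function name `ensemble_configs`, the parameter names, and the threshold ranges are all hypothetical.

```python
import random


def ensemble_configs(sources, n_members, seed=0,
                     merge_range=(0.4, 0.6), unique_range=(0.05, 0.15)):
    """Build n_members hypothetical merge configurations.

    Each member gets a shuffled source ordering and thresholds drawn
    uniformly from the given (assumed) ranges.
    """
    rng = random.Random(seed)  # seeded so an ensemble can be reproduced
    configs = []
    for _ in range(n_members):
        order = list(sources)
        rng.shuffle(order)
        configs.append({
            "source_order": order,
            "merge_thresh": rng.uniform(*merge_range),
            "unique_thresh": rng.uniform(*unique_range),
        })
    return configs


for member in ensemble_configs(["source_a", "source_b", "source_c"], n_members=3):
    print(member)
```

Running the merge once per configuration and comparing the resulting station counts would give a rough sense of how sensitive the merged dataset is to these subjective choices.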
