2002 Annual

Thursday, 17 January 2002: 10:59 AM
Station history database architectural techniques
Jeffrey Arnfield, NOAA/NCDC, Asheville, NC; and G. Shears
Poster PDF (118.8 kB)
As part of its Climate Database Modernization Program (CDMP) to increase the scope, accuracy and accessibility of its data holdings, the National Climatic Data Center (NCDC) worked with contractors to create Mi3, a new system to maintain and query station history information. A number of technical issues had to be addressed in order to develop an effective database architecture. This presentation describes the approaches taken to meet several such technical challenges.

The issue of valid date ranges was a critical concern, and was a shortcoming in previous systems. Each component of a station - its location, equipment, observers, reporting methods, even its identity - can change independently of other components, so each component record has its own period of validity in the form of beginning and ending dates. These date ranges may overlap the valid dates for other data items, making queries and reporting complex. The paper will discuss in depth the approach taken to managing and querying these date pairs.
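As a rough illustration of the date-pair problem described above, the following sketch models each component record with its own period of validity and reconstructs a station's state on a given day. It does not reproduce the Mi3 design: the names `ComponentRecord`, `valid_on`, `overlaps`, and `snapshot` are hypothetical, as are the choices to treat end dates as inclusive and to represent a still-current record with an end date of `None`.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ComponentRecord:
    """One station component with its own period of validity (hypothetical model)."""
    component: str               # e.g. "location", "equipment", "observer"
    value: str
    begin: date
    end: Optional[date] = None   # assumption: None means the record is still valid


def valid_on(record: ComponentRecord, day: date) -> bool:
    """True if the record's validity period contains the given day (end inclusive)."""
    return record.begin <= day and (record.end is None or day <= record.end)


def overlaps(a: ComponentRecord, b: ComponentRecord) -> bool:
    """True if the two validity periods share at least one day."""
    a_end = a.end if a.end is not None else date.max
    b_end = b.end if b.end is not None else date.max
    return a.begin <= b_end and b.begin <= a_end


def snapshot(records: Iterable[ComponentRecord], day: date) -> Dict[str, str]:
    """Collect the value of every component that was valid on the given day."""
    return {r.component: r.value for r in records if valid_on(r, day)}
```

Because each component carries its own begin and end dates, a question such as "what did this station look like on a given day" becomes a filter over all component records rather than a lookup of a single station row.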

Station information comes from many sources, and knowing the source for a given piece of information is important. Some sources are formal, like a National Weather Service Form B-44; others are less so, like ad hoc research or e-mail confirmation.

The best-intended corrections sometimes overwrite valid data. While mistakes must be corrected, discarding original data values may be perilous. There is no audit trail unless the previous value and its information source are retained when erroneous data values are corrected. The technique used to maintain a change log is discussed.
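One way to picture such a change log is the minimal sketch below, in which a correction never discards the value it replaces: the previous value and its source are copied into a log entry before the new value is stored. This is an assumption-laden illustration, not the technique used in Mi3; the class names `ChangeEntry` and `StationRecord`, the methods `set_initial` and `correct`, and the field name in the usage note are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ChangeEntry:
    """One audit-trail entry: what was replaced, by what, and on whose authority."""
    field_name: str
    old_value: str
    old_source: str
    new_value: str
    new_source: str
    changed_at: datetime


@dataclass
class StationRecord:
    """Hypothetical station record that keeps every value together with its source."""
    values: Dict[str, Tuple[str, str]] = field(default_factory=dict)  # name -> (value, source)
    log: List[ChangeEntry] = field(default_factory=list)

    def set_initial(self, name: str, value: str, source: str) -> None:
        """Store the first value for a field; nothing exists yet to be logged."""
        self.values[name] = (value, source)

    def correct(self, name: str, new_value: str, new_source: str) -> None:
        """Replace a value, first recording the previous value and its source."""
        old_value, old_source = self.values[name]
        self.log.append(
            ChangeEntry(name, old_value, old_source, new_value, new_source,
                        datetime.now(timezone.utc))
        )
        self.values[name] = (new_value, new_source)
```

For example, after `set_initial("elevation_m", "650", "NWS Form B-44")` and `correct("elevation_m", "656", "e-mail confirmation")`, the record holds the corrected value while the log still shows the original value and the form it came from, so a mistaken correction can be traced and reversed.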

While a well-normalized relational structure provides good data integrity, it can present performance challenges due to the large number of tables involved in queries. With a nod to the realities of using a system in a production environment, we look at some of the techniques used to improve query performance.
