Monday, 13 May 2002: 1:15 PM
Data Mining: Gold or Pyrite
Daily weather observations in Kentucky have been digitized for the period extending backward from 1896 to 1825. The digitized data include temperatures, precipitation amounts, and snowfall. Most of these data were read from microfilm copies of the original observer forms, which reside in the National Archives. Some data were read from original forms held by the Kentucky Archives and Library in Frankfort, Kentucky. The newly produced digital data set is being evaluated by the Midwestern Regional Climate Center. It is the first readily available data set for this early period in Kentucky, and it offers one of the first looks at daily weather data in early America during those seventy-one years. Even so, enormous quantities of data and information in these records have yet to be digitized. For example, the Smithsonian Institution's observer forms from the mid-1860s solicited thirty-four weather data entries for each day. In addition, the observers were encouraged to provide written comments to amplify the observations or to describe other weather-related phenomena or impacts. These handwritten observer comments are a prospective target for data mining. This paper presents an assay of the mined data from the Kentucky digitization effort. The data impurities are identified, their recognition characteristics are defined, and their potential impacts on the value of the data are discussed. The observers' comments promise to be an unexpectedly rich lode because of their detailed, qualitative descriptions of both climate and its impacts. However, their subjective nature means that they must go through a refining process before their value can be realized. These comments, yet to be recovered in any systematic way, may turn out to be the most valuable to future researchers because of their scope and the variety of potential applications.