89th American Meteorological Society Annual Meeting

Tuesday, 13 January 2009: 8:30 AM
A Data Mining Approach to Soil Temperature and Moisture Prediction
Room 125A (Phoenix Convention Center)
William Myers, NCAR, Boulder, CO; and S. Linden and G. Wiener
Agriculture is a critical sector of the US economy. Both weather and soil conditions are important inputs to the agricultural decision-making process. However, the weather and soil predictions needed to adequately model the agricultural environment at field scales are currently lacking. For example, phenological pest models predict the evolution of an organism's life stages based on the temperature of its environment. These models generally use only daily maximum and minimum air temperatures to estimate the continuum of conditions affecting the organism. These gross temperature bounds can be poor surrogates for air and soil temperature forecasts of higher temporal resolution that are specific to a farm's microclimate.

The current generation of physically based land-surface models has produced mixed results in soil forecasting across a variety of observational sites. Many of the complexities must be parameterized, for example through a limited set of land-use and soil type categories and through roughness length. These parameters can be tuned at a particular site to obtain improved soil forecasts. However, this tuning is a complex, time-consuming process.

This paper describes a data mining approach to soil temperature and moisture prediction at observational sites. Historical observation data from a number of field sites were input into a data mining package called Cubist. Cubist was used to generate a set of site-specific rules and regression equations for predicting soil temperature and moisture based on the current state of the soil and the driving atmospheric conditions. Cubist is a regression tree package that partitions its input data into a number of subsets. For each subset, it develops a multivariate linear regression equation that approximates the complex non-linear processes inherent in the corresponding data. Applying the resulting rules to evaluation data involves automatically matching each input case to its appropriate subset and then applying the associated multivariate linear equation.
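
The match-then-apply step described above can be illustrated with a minimal Python sketch. It is not Cubist's implementation or output format: the `Rule` structure, the predictor names (`soil_temp_c`, `air_temp_c`), the split on air temperature, and all coefficients are hypothetical and invented for illustration, and the sketch simply takes the first rule that covers a case.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Case = Dict[str, float]  # one input case: predictor name -> value


@dataclass
class Rule:
    """One subset of the input space plus its linear equation (hypothetical structure)."""
    matches: Callable[[Case], bool]  # condition defining the subset
    intercept: float
    coefs: Dict[str, float]          # predictor name -> regression coefficient

    def apply(self, case: Case) -> float:
        # Multivariate linear equation: intercept + sum of coefficient * predictor.
        return self.intercept + sum(w * case[name] for name, w in self.coefs.items())


def predict(rules: List[Rule], case: Case) -> float:
    """Match the case to the first rule that covers it, then apply that rule's equation."""
    for rule in rules:
        if rule.matches(case):
            return rule.apply(case)
    raise ValueError("no rule covers this case")


# Made-up rules and coefficients, NOT fitted to any real observations.
rules = [
    Rule(lambda c: c["air_temp_c"] <= 0.0, 0.5, {"soil_temp_c": 0.9, "air_temp_c": 0.05}),
    Rule(lambda c: c["air_temp_c"] > 0.0, 0.2, {"soil_temp_c": 0.8, "air_temp_c": 0.2}),
]

# Predict the next soil temperature from the current soil state and the air temperature.
forecast = predict(rules, {"soil_temp_c": 10.0, "air_temp_c": 15.0})
print(f"predicted soil temperature: {forecast:.2f} C")
```

Each rule is linear on its own, but because different subsets carry different equations, the overall prediction is piecewise linear and can approximate a non-linear response.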

The data mining forecasts out to 60 hours show errors that are roughly half as large as those of the physically based model. While this is encouraging, the data mining approach has several drawbacks relative to physically based models. These results and caveats are discussed in detail in the paper.
