The current generation of physically based land-surface models produces mixed results in soil forecasting across a variety of observational sites. Many of the underlying complexities must be parameterized, for example through a limited set of land-use and soil-type categories and through the roughness length. These parameters can be tuned at a particular site to improve soil forecasts; however, tuning is a complex and time-consuming process.
This paper describes a data mining approach to predicting soil temperature and moisture at observational sites. Historical observations from a number of field sites were input into Cubist, a data mining package, which was used to generate a set of site-specific rules and regression equations for predicting soil temperature and moisture from the current state of the soil and the driving atmospheric conditions. Cubist is a regression-tree package that partitions its input data into a number of subsets. For each subset, it develops a set of regression equations that approximate the complex nonlinear processes inherent in the corresponding data. To apply the resulting model to evaluation data, Cubist automatically matches each input case to its appropriate subset and then applies the associated multivariate linear equation.
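The matching-and-apply step can be illustrated with a minimal Python sketch. The variable names (`soil_temp_now`, `air_temp`), thresholds, and coefficients below are hypothetical and chosen only for illustration; they are not rules produced in this study. Following Cubist's documented behavior, when a case satisfies more than one rule the individual predictions are averaged; the fallback to a default model when no rule applies is a simplification of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Case = Dict[str, float]


@dataclass
class Rule:
    """A hypothetical Cubist-style rule: conditions defining a subset plus a linear model."""
    conditions: List[Callable[[Case], bool]]
    intercept: float
    coefficients: Dict[str, float]

    def matches(self, case: Case) -> bool:
        # A case falls in this rule's subset only if every condition holds.
        return all(cond(case) for cond in self.conditions)

    def predict(self, case: Case) -> float:
        # Multivariate linear equation associated with this subset.
        return self.intercept + sum(w * case[name] for name, w in self.coefficients.items())


def predict(rules: List[Rule], default: Rule, case: Case) -> float:
    """Average the predictions of all matching rules; use the default model if none match."""
    matched = [rule.predict(case) for rule in rules if rule.matches(case)]
    if not matched:
        return default.predict(case)
    return sum(matched) / len(matched)


# Illustrative (made-up) rules for next-step soil temperature in degrees C.
# The two rules overlap for 10 < air_temp <= 15 to show averaging.
RULES = [
    Rule([lambda c: c["air_temp"] > 10.0], 0.5,
         {"soil_temp_now": 0.8, "air_temp": 0.15}),
    Rule([lambda c: c["air_temp"] <= 15.0, lambda c: c["air_temp"] > -5.0], 0.2,
         {"soil_temp_now": 0.9, "air_temp": 0.05}),
]
# Hypothetical persistence model used when no rule covers the case.
DEFAULT = Rule([], 0.0, {"soil_temp_now": 1.0})

if __name__ == "__main__":
    # Covered by both rules, so the two linear predictions are averaged.
    print(predict(RULES, DEFAULT, {"soil_temp_now": 10.0, "air_temp": 12.0}))
```

Because each subset carries an ordinary linear equation, the fitted model remains inspectable: the coefficients for any rule can be read directly.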
Data mining forecasts out to 60 hours have errors roughly half as large as those of the physically based model. Although this is encouraging, the data mining approach has several drawbacks relative to physically based models. These results and caveats are discussed in detail in the paper.
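How the 60-hour forecasts are generated is not specified here. One plausible scheme, sketched below purely as an assumption, steps a one-hour model forward repeatedly, feeding each predicted soil state back in as the next "current state" together with the atmospheric forcing for that hour. The hourly step, the names `rollout` and `toy_step`, and the relaxation coefficient in `toy_step` are all hypothetical; `toy_step` merely stands in for a fitted rule set.

```python
from typing import Callable, Dict, List

State = Dict[str, float]
Forcing = Dict[str, float]


def rollout(step: Callable[[State, Forcing], State], initial: State,
            forcings: List[Forcing], horizon_hours: int = 60) -> List[State]:
    """Iterate a (hypothetical) one-hour model out to the forecast horizon.

    Each predicted state is fed back in as the next step's current state, so
    the forecast needs one atmospheric forcing record per forecast hour.
    """
    if len(forcings) < horizon_hours:
        raise ValueError("need one forcing record per forecast hour")
    states: List[State] = []
    state = dict(initial)
    for hour in range(horizon_hours):
        state = step(state, forcings[hour])
        states.append(state)
    return states


def toy_step(state: State, forcing: Forcing) -> State:
    # Hypothetical stand-in for a fitted model: relax soil temperature
    # 10% of the way toward the air temperature each hour.
    return {"soil_temp": state["soil_temp"] + 0.1 * (forcing["air_temp"] - state["soil_temp"])}


if __name__ == "__main__":
    forecast = rollout(toy_step, {"soil_temp": 10.0}, [{"air_temp": 20.0}] * 60)
    print(len(forecast), forecast[-1]["soil_temp"])
```

Because each step consumes the previous prediction rather than a new observation, errors can compound with lead time, which makes it useful to examine forecast errors as a function of forecast hour.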