Tuesday, 25 January 2011: 1:30 PM
2A (Washington State Convention Center)
When applying machine learning methods such as artificial neural networks (ANN) and support vector machines (SVM) to nonlinear regression problems in the environmental sciences, it is common to encounter a large number of correlated predictors; hence there is a need to effectively prescreen the set of predictors down to a smaller subset, especially when the training sample size is not large. Three atmospheric forecast problems were tested: surface air temperature, precipitation and sulphur dioxide concentration. These problems involve different degrees of nonlinearity and noise. The first two problems each had 106 predictors and the third had 27, and all models were trained on 500 observations. A variety of prescreening techniques was tested, including a genetic algorithm (GA), stepwise linear regression and the M5 regression tree. SVM generally outperformed ANN, though for temperature forecasting even SVM was only marginally ahead of multiple linear regression (MLR), while ANN trailed both. For precipitation, both nonlinear methods outperformed MLR, SVM especially, and prescreening by GA very effectively reduced the number of predictors. Similarly, for sulphur dioxide, SVM outperformed MLR, and GA prescreening was effective for both SVM and ANN: it not only drastically reduced the number of predictors but also reduced the mean absolute error of the SVM and ANN forecasts. We conclude that when there is significant nonlinearity in the predictor-predictand relation, SVM tends to outperform ANN and both tend to outperform MLR, and that GA is generally effective in prescreening the predictors, yielding a smaller predictor set and smaller forecast errors.
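To illustrate the kind of workflow the abstract describes, the following is a minimal sketch (not the authors' code) of GA-based predictor prescreening wrapped around support vector regression, scored by cross-validated mean absolute error. The data here are a synthetic stand-in for the actual forecast datasets, and all parameter values (population size, mutation rate, SVR settings) are illustrative assumptions.

```python
# Minimal sketch of GA predictor prescreening around SVR (assumed setup,
# synthetic data stand-in for the forecast problems in the abstract).
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 500 observations, 106 correlated predictors.
n_obs, n_pred = 500, 106
X = rng.standard_normal((n_obs, n_pred))
y = np.tanh(X[:, 0]) + 0.5 * X[:, 1] * X[:, 2] + 0.1 * rng.standard_normal(n_obs)

def fitness(mask):
    """Negative cross-validated MAE of an SVR trained on the selected predictors."""
    if mask.sum() == 0:
        return -np.inf
    scores = cross_val_score(SVR(kernel="rbf", C=1.0), X[:, mask], y,
                             scoring="neg_mean_absolute_error", cv=3)
    return scores.mean()

# Simple generational GA over binary predictor-selection masks.
pop_size, n_gen, mut_rate = 20, 15, 0.02
pop = rng.random((pop_size, n_pred)) < 0.2           # start with sparse subsets
for gen in range(n_gen):
    fit = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(fit)[::-1][: pop_size // 2]]   # truncation selection
    children = []
    for _ in range(pop_size - len(parents)):
        a, b = parents[rng.integers(len(parents), size=2)]
        cut = rng.integers(1, n_pred)                # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        child ^= rng.random(n_pred) < mut_rate       # bit-flip mutation
        children.append(child)
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print(f"selected {best.sum()} of {n_pred} predictors, "
      f"CV MAE = {-fitness(best):.3f}")
```

The same loop could be rerun with an MLPRegressor (ANN) or a plain linear model in place of the SVR to mirror the SVM/ANN/MLR comparison reported in the abstract.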