92nd American Meteorological Society Annual Meeting (January 22-26, 2012)

Wednesday, 25 January 2012: 11:30 AM
Comparing the Performance of Several Machine Learning Methods in Environmental Problems
Room 242 (New Orleans Convention Center)
Aranildo Rodrigues Lima Jr., Univ. of British Columbia, Vancouver, BC, Canada; and A. J. Cannon and W. W. Hsieh

In machine learning, computer algorithms attempt to automatically distill knowledge from data, so as to construct a model capable of making predictions from novel data in the future. However, building a successful predictive model is not easy, as environmental modeling problems are typically very noisy. Furthermore, in many environmental problems (e.g. statistical downscaling), there is a large number of potential predictors, many of which are irrelevant or redundant. Thus, a researcher is confronted with a choice among many types of machine learning methods and often a large number of potential predictors. Even for a particular machine learning method, there is often more than one way to build the model. For example, in artificial neural network (ANN) models, the number of hidden processing units, the choice of activation functions and the regularization parameter all need to be specified. Similarly, in support vector machines (SVM) for regression (SVR), typically three hyper-parameters have to be tuned. Common approaches to tuning the SVM hyper-parameters include evolutionary algorithms (EA), particle swarm optimization (PSO), and evolutionary strategies (ES).
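For an RBF-kernel SVR, the three hyper-parameters in question are the penalty C, the epsilon-insensitive tube width, and the kernel width gamma. A minimal sketch using scikit-learn (our illustrative choice of library, not necessarily the authors' implementation):

```python
# Minimal sketch: an RBF-kernel SVR whose three hyper-parameters
# (C, epsilon, gamma) must all be tuned for good performance.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=200)  # noisy 1-D target

# The three hyper-parameters referred to in the text:
model = SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma=0.5)
model.fit(X, y)
```

Each of these settings trades off differently against noise and model complexity, which is why a systematic search (EA, PSO, ES, or an analytic estimate) is needed rather than ad hoc choices.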

In this work, our main goals are to reduce the number of parameters requiring adjustment, to use an accurate initialization of the parameter search, and to discard irrelevant and redundant predictors. We propose a hybrid algorithm, SVR-ES, which uses a simple evolutionary strategy called "uncorrelated mutation with one step size" to find the optimal SVR hyper-parameters. We also combine SVR-ES with stepwise linear regression (SLR), using SLR to screen out irrelevant predictors.
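In the "uncorrelated mutation with one step size" operator, a single self-adapted step size sigma is multiplied by a log-normal factor and then used to perturb every coordinate of the candidate solution. A sketch of the standard textbook form of this operator (parameter names and the log-scale encoding of the SVR hyper-parameters are our assumptions, not taken from the paper):

```python
# Sketch of "uncorrelated mutation with one step size" from evolution
# strategies: one shared step size sigma is self-adapted log-normally,
# then all coordinates are perturbed with that step size.
import numpy as np

def mutate(x, sigma, rng, tau=None, sigma_min=1e-6):
    """Return a mutated copy of x and the self-adapted step size."""
    n = len(x)
    if tau is None:
        tau = 1.0 / np.sqrt(n)                      # common learning-rate choice
    sigma_new = sigma * np.exp(tau * rng.normal())  # log-normal self-adaptation
    sigma_new = max(sigma_new, sigma_min)           # enforce a lower bound
    x_new = x + sigma_new * rng.normal(size=n)      # perturb every coordinate
    return x_new, sigma_new

rng = np.random.default_rng(42)
parent = np.array([0.0, 0.0, 0.0])  # e.g. log C, log gamma, log epsilon
child, step = mutate(parent, sigma=0.5, rng=rng)
```

Because sigma evolves together with the solution, the search can widen or narrow its own mutation scale, which is what makes the hyper-parameter search nearly automatic.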

Three atmospheric forecast problems -- surface air temperature (TEMP), precipitation (PRECIP) and sulphur dioxide (SO2) concentration -- were tested. These three problems contained different amounts of nonlinearity and noise. The first two problems each had 106 predictors and the third problem, 27 predictors. A variety of machine learning techniques were compared, including bagging and ensemble ANN, SVR with the Cherkassky-Ma estimate of the hyper-parameters, the M5 regression tree, random forest (RF), and these techniques combined with SLR to reduce the number of predictors.

We found different optimal (or sub-optimal) SVR hyper-parameters depending on the objective function being minimized, i.e. the mean absolute error (MAE) or the mean squared error (MSE). The Cherkassky-Ma estimate is a useful approach to minimizing the MAE without requiring a costly hyper-parameter search. However, for the MSE, it was necessary to run an optimization algorithm with MSE as the objective function, and for this task we used the SVR-ES approach.
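The Cherkassky-Ma approach replaces the hyper-parameter search with closed-form estimates: C from the mean and spread of the training targets, and epsilon from the noise level and the sample size. A sketch of these analytic prescriptions (the noise standard deviation is passed in here; in practice it must itself be estimated from the data, e.g. from nearest-neighbour regression residuals):

```python
# Sketch of the Cherkassky-Ma analytic hyper-parameter estimates:
#   C   = max(|y_mean + 3*y_std|, |y_mean - 3*y_std|)
#   eps = 3 * noise_std * sqrt(ln(n) / n)
import numpy as np

def cherkassky_ma(y, noise_std):
    n = len(y)
    y_mean, y_std = y.mean(), y.std()
    C = max(abs(y_mean + 3 * y_std), abs(y_mean - 3 * y_std))
    eps = 3 * noise_std * np.sqrt(np.log(n) / n)
    return C, eps

rng = np.random.default_rng(1)
y = np.sin(np.linspace(0, 3, 100)) + rng.normal(0, 0.1, 100)
C, eps = cherkassky_ma(y, noise_std=0.1)
```

Since both quantities are computed directly from the training data, the estimate costs almost nothing compared with an evolutionary search, which is why it is attractive when MAE is the target criterion.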

We concluded that SVR tends to outperform the other techniques given the right heuristics: an efficient search algorithm, a good initialization of the hyper-parameter search, and an appropriate objective function. In particular, SVR-ES is attractive because it can be implemented in an almost automatic way. However, it is more time consuming than the other methods, especially when there is a large number of data points. The ANN was also a good option for outperforming multiple linear regression (MLR), although the techniques used to avoid local minima (bagging and ensembling) increased the training time. The M5 tree had poor accuracy compared to the other techniques. On the other hand, RF was a strong competitor, with a lower computational cost than ANN and SVR-ES and an accuracy always better than MLR. On the SO2 data set, RF actually finished first among all the methods.

In conclusion, the nonlinear methods all outperformed MLR (with M5 the exception). SVR-ES is a promising method, outperforming the other methods on the PRECIP and TEMP data sets and losing only to RF on SO2. Finally, using SLR for predictor selection can dramatically reduce computational time and often helps to enhance accuracy.
