In this work, our main goals were to reduce the number of parameters requiring adjustment, to initialize the parameter search accurately, and to discard irrelevant and redundant predictors. We proposed a hybrid algorithm, SVR-ES, which uses a simple evolutionary strategy known as "uncorrelated mutation with one step size" to find the optimal SVR hyper-parameters. We also combined SVR-ES with stepwise linear regression (SLR), using SLR to screen out irrelevant predictors.
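For illustration, the sketch below shows how an evolutionary strategy with uncorrelated mutation and a single step size could tune the SVR hyper-parameters (C, gamma, epsilon). The (1+1) selection scheme, log-space search, toy data, and budget of 50 generations are assumptions made for the example, not necessarily the exact configuration used in this work.

```python
# Sketch: "uncorrelated mutation with one step size" ES tuning SVR
# hyper-parameters; toy data and settings are illustrative assumptions.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] + 0.5 * np.sin(X[:, 1]) + 0.1 * rng.normal(size=200)

def cv_mse(log_params):
    C, gamma, eps = np.exp(log_params)          # search in log space
    model = SVR(C=C, gamma=gamma, epsilon=eps)
    # cross_val_score returns negative MSE; flip the sign to get a loss
    return -cross_val_score(model, X, y,
                            scoring="neg_mean_squared_error", cv=5).mean()

n = 3                        # number of hyper-parameters
tau = 1.0 / np.sqrt(n)       # learning rate for the single step size
x, sigma = np.log([1.0, 0.1, 0.1]), 0.5   # initial point and step size
best = cv_mse(x)
for _ in range(50):
    # one shared step size, mutated log-normally, applied to every gene
    sigma_child = sigma * np.exp(tau * rng.standard_normal())
    child = x + sigma_child * rng.standard_normal(n)
    f = cv_mse(child)
    if f <= best:            # (1+1) selection: keep the child if no worse
        x, sigma, best = child, sigma_child, f
print("best (C, gamma, epsilon):", np.exp(x), "CV MSE:", best)
```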
Three atmospheric forecast problems -- surface air temperature (TEMP), precipitation (PRECIP) and sulphur dioxide (SO2) concentration -- were tested. These three problems contain different degrees of nonlinearity and noise. The first two problems each had 106 predictors and the third had 27. A variety of machine learning techniques were compared, including artificial neural networks (ANN) with bagging and ensembling, SVR with the Cherkassky-Ma estimate of the hyper-parameters, the M5 regression tree, random forest (RF), and these techniques combined with SLR to reduce the number of predictors (a sketch of the SLR screening step follows).
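As a sketch of the SLR screening step, the forward-selection routine below greedily adds the predictor that most reduces the residual sum of squares and stops when a new candidate fails a partial F-test. The significance threshold and implementation details are illustrative assumptions, not the paper's exact procedure.

```python
# Sketch: forward stepwise linear regression as a predictor screen.
import numpy as np
from scipy import stats

def slr_screen(X, y, alpha=0.05):   # alpha is an assumed threshold
    n, p = X.shape
    selected, remaining = [], list(range(p))
    while remaining:
        best_j, best_rss = None, None
        for j in remaining:
            A = np.column_stack([np.ones(n), X[:, selected + [j]]])
            rss = np.sum((y - A @ np.linalg.lstsq(A, y, rcond=None)[0]) ** 2)
            if best_rss is None or rss < best_rss:
                best_j, best_rss = j, rss
        # partial F-test: does adding best_j significantly reduce the RSS?
        A0 = (np.column_stack([np.ones(n), X[:, selected]])
              if selected else np.ones((n, 1)))
        rss0 = np.sum((y - A0 @ np.linalg.lstsq(A0, y, rcond=None)[0]) ** 2)
        df2 = n - len(selected) - 2          # residual df of the larger model
        F = (rss0 - best_rss) / (best_rss / df2)
        if stats.f.sf(F, 1, df2) > alpha:    # candidate not significant: stop
            break
        selected.append(best_j)
        remaining.remove(best_j)
    return selected
```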
We found that the optimal (or sub-optimal) SVR hyper-parameters differ depending on the objective function being minimized -- i.e. the mean absolute error (MAE) or the mean squared error (MSE). The Cherkassky-Ma estimate is a useful approach to minimizing the MAE without requiring a costly hyper-parameter search. For the MSE, however, it was necessary to run an optimization algorithm with MSE as the objective function, and for this task we used SVR-ES.
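For reference, the Cherkassky-Ma estimates (Cherkassky and Ma, 2004) can be computed analytically from the training data: C from the spread of the targets and epsilon from an estimated noise level. In the sketch below, the k-nearest-neighbour noise estimate is one common choice, not necessarily the exact variant used in this work; the kernel width gamma is not covered by these two formulas.

```python
# Sketch: analytic Cherkassky-Ma estimates of C and epsilon for SVR.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def cherkassky_ma(X, y, k=3):        # k for the noise estimate is assumed
    n = len(y)
    y_mean, y_std = y.mean(), y.std()
    # C from the range of the targets: max(|mean + 3 std|, |mean - 3 std|)
    C = max(abs(y_mean + 3 * y_std), abs(y_mean - 3 * y_std))
    # noise variance from k-NN regression residuals, with small-sample factor
    resid = y - KNeighborsRegressor(n_neighbors=k).fit(X, y).predict(X)
    sigma2 = (n ** 0.2 * k / (n ** 0.2 * k - 1)) * np.mean(resid ** 2)
    # epsilon = 3 * sigma * sqrt(ln n / n)
    epsilon = 3 * np.sqrt(sigma2) * np.sqrt(np.log(n) / n)
    return C, epsilon

# usage: C, eps = cherkassky_ma(X, y); model = SVR(C=C, epsilon=eps)
```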
We concluded that SVR tends to outperform the other techniques when given a suitable heuristic: an efficient search algorithm, a good initialization of the hyper-parameter search, and an appropriate objective function. The use of SVR-ES is particularly attractive because it can be implemented almost automatically; however, it is more time consuming than the other methods, especially for large numbers of data points. The ANN was also a good option for outperforming multiple linear regression (MLR), although the techniques used to avoid local minima (bagging and ensembling) increased the training time. The M5 tree had poor accuracy compared to the other techniques. RF, on the other hand, was a strong competitor, with lower computational cost than ANN and SVR-ES and accuracy always better than MLR; for the SO2 data set, RF actually finished first among all the methods.
In conclusion, the nonlinear methods all outperformed MLR, with M5 being the exception. SVR-ES is a promising method, outperforming the other methods on the TEMP and PRECIP data sets and losing only to RF on the SO2 data set. Finally, using SLR for predictor selection can dramatically reduce computational time and often enhances accuracy.