The random forest technique is a machine learning method that improves on the decision tree approach with two key adjustments. A traditional decision tree is a hierarchical set of questions that partitions the predictor space into discrete values, with the questions determined through an exhaustive search of the predictors in a training set. Traditional decision trees are very sensitive to small changes in the training data and can easily overfit. Random forests address this problem in two ways: an ensemble of trees is trained on resampled versions of the original training set to capture the variability of the data, and the questions in each tree are selected from a random subset of the predictors. The random forest model produces more accurate predictions, is less likely to overfit, and accounts for nonlinear interactions among predictor variables. It also trains quickly and has built-in variable selection.
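A minimal sketch of the two adjustments, using scikit-learn; the library choice, parameter values, and synthetic data are illustrative assumptions, not the presentation's configuration:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                  # 500 samples, 10 predictors
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)

# A single decision tree: exhaustive search over all predictors at each split.
tree = DecisionTreeRegressor().fit(X, y)

# A random forest: an ensemble of trees, each grown on a bootstrap resample
# of the training set (bootstrap=True) and splitting on a random subset of
# the predictors at each node (max_features).
forest = RandomForestRegressor(
    n_estimators=200,      # number of trees in the ensemble
    max_features="sqrt",   # predictors considered per split
    bootstrap=True,        # resample the training set for each tree
).fit(X, y)
```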
The random forest technique was used to directly predict power production for a group of wind farms in Texas from wind forecasts at wind turbine hub height, typically 80 m above ground level. The technique was applied to day-ahead forecasts for individual wind farms, and forecast performance was examined both for individual wind farms and for regional aggregates of wind farms. A large set of model state variables, including wind speed, temperature, and geopotential height at various levels, was used to train the random forest.
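A hedged sketch of this training setup; the column names and the idealized power curve are hypothetical stand-ins for the model state variables and observed farm output named above:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 1000  # forecast hours in the training sample
nwp = pd.DataFrame({
    "wind_speed_80m": rng.uniform(0, 25, n),        # hub-height wind speed (m/s)
    "temp_850hPa": rng.normal(280, 8, n),           # temperature aloft (K)
    "geopotential_500hPa": rng.normal(5600, 80, n), # geopotential height (m)
})
# Idealized power curve standing in for observed power production.
power = np.clip((nwp["wind_speed_80m"] - 3) / 9, 0, 1)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(nwp, power)               # train on the NWP state variables
print(model.feature_importances_)   # built-in variable selection signal
```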
For this presentation, results using the random forest were compared with two linear regression-based model output statistics (MOS) approaches: one trained on observed power production and a second trained on observed wind speed. Both approaches were run using training samples of different sizes. Initial results show that random forest forecasts decrease forecast mean absolute error when given a sufficiently large and diverse training set. Adding NWP forecast variables beyond those derived from 80-m hub height wind speed also improved the forecast. The presentation will highlight the sensitivity of forecast performance to the number of decision trees, the number of variables per node, training sample size, variable selection, and the use of regime-based predictors, as well as a comparison to a screening multiple linear regression approach with a similar sample size.
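A minimal sketch of the mean absolute error comparison between a random forest and a linear regression baseline; the synthetic data, split, and model settings are illustrative assumptions, not the presentation's experiment:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.uniform(0, 25, size=(2000, 3))                # NWP predictors
y = np.clip((X[:, 0] - 3) / 9, 0, 1) + 0.05 * rng.normal(size=2000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

mos = LinearRegression().fit(X_tr, y_tr)              # regression-based baseline
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

print("Linear MAE:", mean_absolute_error(y_te, mos.predict(X_te)))
print("Forest MAE:", mean_absolute_error(y_te, rf.predict(X_te)))
```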