Statistical Modeling of Wind Plant Power Production: Contrasting Different Machine Learning Algorithms and Atmospheric Features

Optis, Mike

Machine learning is often applied in the wind energy industry to statistically model wind plant performance using atmospheric data as input features. As machine learning sees more widespread use, it is important to understand how different algorithms and different combinations of atmospheric features impact the predictions. It is important not only to choose the best-performing model but also to quantify the uncertainty associated with different model choices.

In this work, we examine the impact of model choice when statistically modelling hourly power production from an operational wind plant. Such models are commonly applied in the wind industry both to estimate long-term annual energy production based on a shorter period of operational data, and to estimate lost energy due to plant downtime (e.g. maintenance, grid curtailment). Given that wind plant revenue is typically on the order of $10M annually for an average-sized (e.g. 100 MW) wind farm, small differences in predicted plant energy production from different statistical models can have considerable financial implications.

Data sources for this study were downtime-corrected turbine power output for a wind farm in the Pacific Northwest and meteorological measurements from an on-site met mast over the period July 2016 to March 2017. We considered several atmospheric measurements as input features, including wind speed, wind direction, air density, standard deviation of wind speed, wind shear, potential temperature gradient, temperature, turbulent kinetic energy, and the Obukhov length. Measurement heights were at 17 m and 80 m above ground level. We considered four machine learning algorithms: a neural network (NN), a generalized additive model (GAM), and two regression tree-based algorithms, gradient boosting (GB) and extra trees regression (EX).

Statistical models of wind plant energy production were built for each of the four algorithms first by using a single atmospheric feature (wind speed) and then successively adding features in the order listed in the preceding paragraph (i.e. final model setup had 9 features). For each setup, algorithm parameters (e.g. neural network inner layer size, maximum number of splits in regression tree) were optimized using a cross-validated randomized grid search over the parameter grid.
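The successive feature setups described above can be enumerated in a few lines. The following is a minimal sketch, not the code used in the study: the feature identifiers and the helper name `nested_feature_sets` are hypothetical labels chosen for illustration, and only the feature order and the four algorithm abbreviations are taken from the text.

```python
from itertools import product

# Feature order as listed in the data description.
# The identifier spellings are illustrative assumptions.
FEATURES = [
    "wind_speed",
    "wind_direction",
    "air_density",
    "wind_speed_std",
    "wind_shear",
    "potential_temperature_gradient",
    "temperature",
    "turbulent_kinetic_energy",
    "obukhov_length",
]

ALGORITHMS = ["NN", "GAM", "GB", "EX"]


def nested_feature_sets(features):
    """Return setups of growing size: first feature only, first two, and so on."""
    return [features[: n + 1] for n in range(len(features))]


setups = nested_feature_sets(FEATURES)

# Every algorithm is paired with every feature setup.
configurations = list(product(ALGORITHMS, setups))

print(len(setups))          # 9
print(len(configurations))  # 36
```

Each of the 36 algorithm and feature-set pairs would then receive its own hyperparameter search, which is why the text describes the optimization as being run "for each setup".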

Figure 1 shows mean modelled and observed diurnal power output for each model setup trained on the entire dataset. The pronounced diurnal cycle featuring a noon minimum and early evening peak is characteristic of thermally driven flow in the Columbia Valley. Not surprisingly, wind speed was the dominant atmospheric feature driving the predicted result, while the remaining features had lesser but still considerable importance. In general, models with more input features tended to capture the diurnal cycle more accurately. The regression tree-based models (GB and EX) showed strong agreement with observations on average when all atmospheric features were considered, whereas the NN and GAM models showed bias. Considering all algorithms and the 9 combinations of input features, uncertainty in the mean diurnal power profile was lowest at 1.5% in the evening and highest at 4.5% at noon. Considering only the 9-feature model setups, root-mean-square error (RMSE) normalized by mean power was highest for NN and GAM (4.6% and 4.8%, respectively) and lowest for the GB and EX regression tree-based models (1.6% and 2.1%, respectively).
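The two quantities discussed above, a mean diurnal profile and an RMSE normalized by mean power, can be sketched as follows. This is an illustration under stated assumptions rather than the study's implementation: the function names are hypothetical, the sample numbers are made up for the example, and the abstract does not say whether the normalization used the mean of the observed or the modelled power (the observed mean is assumed here).

```python
import math
from collections import defaultdict


def mean_diurnal_profile(hours, power):
    """Average power by hour of day; hours with no samples are omitted."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for hour, value in zip(hours, power):
        sums[hour] += value
        counts[hour] += 1
    return {hour: sums[hour] / counts[hour] for hour in sorted(sums)}


def normalized_rmse(observed, modelled):
    """RMSE divided by the mean of the observed values (assumed normalization)."""
    if len(observed) != len(modelled) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((m - o) ** 2 for o, m in zip(observed, modelled)) / len(observed)
    return math.sqrt(mse) / (sum(observed) / len(observed))


# Made-up hourly samples: two at midnight, two at noon.
hours = [0, 0, 12, 12]
observed = [40.0, 60.0, 20.0, 20.0]
modelled = [45.0, 55.0, 22.0, 18.0]

observed_profile = mean_diurnal_profile(hours, observed)
modelled_profile = mean_diurnal_profile(hours, modelled)
nrmse = normalized_rmse(observed, modelled)
```

In this toy case the modelled and observed diurnal profiles coincide even though the hourly errors are nonzero, which illustrates why a profile-level comparison and an hourly error metric can give different impressions of model skill.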

Each model was then trained on 80% of the concurrent data, and the trained model was used to estimate power production in the remaining 20% of the concurrent period (a typical approach when estimating lost energy due to curtailment). Only the 80-20% split was considered here, although other combinations (e.g. 90-10%, 95-5%) would certainly be relevant in this context. Considering all algorithms and the 9 combinations of input features, uncertainty in predicted power during the 20% test period was lowest at 1.0% in the summer and highest at 3.8% in the winter. These differences were driven by seasonal changes in the wind speed-power correlation, prevailing wind direction, and mean stratification. Considering only the 9-feature model setups, normalized RMSE was slightly higher for NN and GAM (both 5.4%) compared to GB and EX (5.2% and 4.8%, respectively) over the whole dataset. However, these errors were highly seasonal. For example, normalized RMSE for GB was 1.5% in July and August but 8.2% from October through January.
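A hold-out split of this kind can be sketched as below. The abstract does not state how the 20% test period was selected, so the contiguous-block choice here is an assumption made to mimic a curtailment window, and `holdout_block_split` is a hypothetical helper name.

```python
def holdout_block_split(n_samples, test_fraction=0.2, start=None):
    """Return (train_idx, test_idx) with one contiguous test block.

    Assumption: the held-out period is a single block of consecutive
    time steps. By default the block is placed at the end of the record.
    """
    n_test = round(n_samples * test_fraction)
    if not 0 < n_test < n_samples:
        raise ValueError("test block must be non-empty and smaller than the record")
    if start is None:
        start = n_samples - n_test
    if start < 0 or start + n_test > n_samples:
        raise ValueError("test block falls outside the record")
    test_idx = list(range(start, start + n_test))
    train_idx = list(range(0, start)) + list(range(start + n_test, n_samples))
    return train_idx, test_idx


# Ten time steps with the held-out block starting at index 4.
train_idx, test_idx = holdout_block_split(10, test_fraction=0.2, start=4)
print(test_idx)   # [4, 5]
print(train_idx)  # [0, 1, 2, 3, 6, 7, 8, 9]
```

Moving `start` through the record places the test block in different seasons, which is one way to expose the seasonal dependence of the errors reported above.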

The range in model performance demonstrated in this study highlights the importance of building a robustly validated statistical model using a representative set of atmospheric features. In particular, the use of turbulence and stratification measurements was found to considerably improve model accuracy. Regression tree-based models performed relatively well in this single case study; however, extending this analysis to a larger set of wind plants is necessary to develop more representative comparative statistics. Furthermore, the neural network algorithm considered here was fairly basic, and improvements in model accuracy would likely be obtained by using more modern deep learning algorithms.

J3.2 Statistical Modeling of Wind Plant Power Production: Contrasting Different Machine Learning Algorithms and Atmospheric Features