This study assesses the utility of various data inputs and machine learning techniques for the prediction of 15-minute average power and the 60-minute power ramp rate at lead times of 0 to 3 hours for various aggregates of wind farms in the Tehachapi Pass region of California. Input data include 1) observations of wind speed and power from 17 wind farms in the Tehachapi region as well as 4 sub-aggregates and the total aggregate, and 2) offsite observations from instruments deployed specifically to improve the very short-term wind power forecast.
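As an illustration of the second predictand, the following minimal sketch derives a 60-minute ramp rate from a series of 15-minute average power values. It assumes the ramp rate is the change in 15-minute average power over the following 60 minutes (four steps); the abstract does not give the exact definition, and the function name and sample values are hypothetical.

```python
def ramp_rate_60min(power_15min, steps_per_hour=4):
    """Change in 15-minute average power over the following 60 minutes.

    Entry i is power_15min[i + steps_per_hour] - power_15min[i]. The last
    steps_per_hour samples have no value 60 minutes ahead and are omitted.
    This definition is an assumption, not taken from the study.
    """
    return [
        later - now
        for now, later in zip(power_15min, power_15min[steps_per_hour:])
    ]


# Hypothetical 15-minute average power values (MW) for one aggregate.
power = [100.0, 110.0, 125.0, 150.0, 160.0, 155.0, 140.0]
ramps = ramp_rate_60min(power)
```

With seven samples, only the first three have a value 60 minutes ahead, so `ramps` holds three entries.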
The specially deployed observing platforms include: 2 SODARs that observe the wind profile up to 600 m, 1 mini SODAR (200 m range), 2 radiometers (temperature and moisture profiles) and 1 radar wind profiler/RASS that can observe winds up to 3500 m and temperature up to 1500 m. These platforms are sited roughly along a line that extends about 80 km upstream, along the prevailing northwesterly flow direction, from the area where most of the wind turbines are located. Predictor variables include both the unprocessed observations and various derived values, including vertical and horizontal differences, temporal differences and regional averages.
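The derived predictors named above can be sketched as simple differences and averages. The example below is only an illustration: the function names, heights, sign conventions and sample values are hypothetical and are not taken from the study.

```python
def vertical_difference(profile, lower_m, upper_m):
    """Difference between two heights of one profile (upper minus lower)."""
    return profile[upper_m] - profile[lower_m]


def horizontal_difference(upstream_value, downstream_value):
    """Difference between two sites along the flow (upstream minus downstream)."""
    return upstream_value - downstream_value


def temporal_difference(series, lag=1):
    """Change of a variable over `lag` time steps."""
    return [later - earlier for earlier, later in zip(series, series[lag:])]


def regional_average(values):
    """Unweighted mean of one variable across several sites."""
    return sum(values) / len(values)


# Hypothetical wind speeds (m/s), keyed by height above ground (m).
sodar_profile = {100: 8.0, 200: 10.5}
shear = vertical_difference(sodar_profile, lower_m=100, upper_m=200)
gradient = horizontal_difference(upstream_value=12.0, downstream_value=9.0)
tendency = temporal_difference([8.0, 9.0, 11.0])
farm_mean = regional_average([6.0, 8.0, 10.0])
```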
Forecasts are generated using the following methods: 1) Analog Ensemble (AE), 2) Random Forest (RF) and 3) Gradient Boosted Model (GBM). The AE method builds an ensemble of similar cases by selecting, from a historical sample, the cases whose input variables most closely resemble those of the current case. RF creates a prediction from the mean of the predictions of a number of decision trees. GBM is a machine learning technique that uses decision trees to generate a weighted prediction model in which each tree is used to predict the residual model error of the previous iteration.
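The analog selection step can be sketched as a nearest-neighbor search over the historical sample. This is a simplified illustration rather than the study's implementation: it assumes an unweighted Euclidean distance on predictors that are already on comparable scales, and the function name, ensemble size and sample data are hypothetical.

```python
import math
import statistics


def analog_ensemble(history, current, k=3):
    """Return the observed outcomes of the k historical cases closest to `current`.

    history: list of (predictors, observed_power) pairs.
    current: predictor tuple for the case being forecast.
    Similarity is plain Euclidean distance, which is an assumption here.
    """
    ranked = sorted(history, key=lambda case: math.dist(case[0], current))
    return [outcome for _, outcome in ranked[:k]]


# Hypothetical history: (wind speed in m/s, vertical shear in m/s) -> power in MW.
history = [
    ((5.0, 1.0), 40.0),
    ((5.2, 0.9), 44.0),
    ((9.0, 3.0), 120.0),
    ((4.8, 1.1), 36.0),
    ((8.5, 2.8), 110.0),
]

members = analog_ensemble(history, current=(5.1, 1.0), k=3)
ae_mean = statistics.mean(members)      # deterministic forecast: ensemble mean
ae_median = statistics.median(members)  # 50% probability of exceedance
```

The ensemble members also give an empirical distribution, so the 50% probability of exceedance value is simply their median.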
Observed data are available for all of 2015 and for 2016 through at least July. Since AE, RF and GBM tend to perform best when a long training period is used, forecasts are evaluated from January 1 through July 30, 2016 with a rolling training sample that starts at the beginning of the data-availability period and extends to just prior to the forecast issue time.
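The rolling training sample described above amounts to an expanding window: every forecast is trained on all cases from the start of the record up to, but not including, its issue time. A minimal sketch, with a hypothetical function name and integer indices standing in for time stamps:

```python
def expanding_window_splits(n_samples, first_issue_index):
    """Yield (train_indices, issue_index) pairs for an expanding training window.

    The training indices always start at 0 (the beginning of the data record)
    and end just before the issue index, so no future data is used.
    """
    for issue in range(first_issue_index, n_samples):
        yield range(0, issue), issue


# Hypothetical record of 6 time steps, with evaluation starting at step 3.
splits = list(expanding_window_splits(n_samples=6, first_issue_index=3))
for train, issue in splits:
    print(f"issue time {issue}: train on steps {list(train)}")
```

In the study's setup, the first evaluated issue time would correspond to January 1, 2016, with the training window reaching back to the start of 2015.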
Deterministic forecasts consisting of the AE mean and 50% probability of exceedance values, as well as the RF and GBM forecasts, were compared to the following benchmarks: a simple persistence forecast, a persistence-of-trend forecast, a climatological trend forecast and a standard NWP forecast bias-corrected using screening multiple linear regression.
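The two simplest benchmarks can be sketched as follows. The persistence-of-trend form shown here, a linear extrapolation of the most recent change, is an assumption, as is the use of mean absolute error as the comparison metric; the abstract specifies neither, and the sample values are hypothetical.

```python
def persistence(series, t, lead):
    """Forecast for time t + lead: the latest observed value at issue time t."""
    return series[t]


def persistence_of_trend(series, t, lead, lookback=1):
    """Forecast for time t + lead: extrapolate the most recent change linearly."""
    trend_per_step = (series[t] - series[t - lookback]) / lookback
    return series[t] + trend_per_step * lead


def mean_absolute_error(forecasts, observations):
    """Average absolute difference between forecasts and observations."""
    return sum(abs(f - o) for f, o in zip(forecasts, observations)) / len(forecasts)


# Hypothetical 15-minute average power values (MW).
power = [100.0, 110.0, 120.0, 130.0, 128.0, 126.0]
lead = 2                 # two 15-minute steps ahead
issue_times = [1, 2, 3]  # indices at which forecasts are issued

observed = [power[t + lead] for t in issue_times]
pers = [persistence(power, t, lead) for t in issue_times]
trend = [persistence_of_trend(power, t, lead) for t in issue_times]

mae_pers = mean_absolute_error(pers, observed)
mae_trend = mean_absolute_error(trend, observed)
```

In this small example the trend extrapolation overshoots once the ramp levels off, which illustrates why both benchmarks are useful reference points for ramp forecasts.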
The conference presentation will focus on the overall performance of the machine learning methods relative to the benchmarks and to each other. It will also examine the effect of various input parameters and the selection of training variables on the performance of each technique.