J7.1 The Performance Impacts of Machine Learning Design Choices for Gridded Solar Irradiance Forecasting

Thursday, 26 January 2017: 3:30 PM
606 (Washington State Convention Center)
David John Gagne II, NCAR, Boulder, CO; and S. E. Haupt, A. McGovern, J. K. Williams, and S. Linden

In the rapidly growing solar energy sector, there is an increasing need for solar forecasts at locations with limited or no history of observations. Gridded statistical forecasting systems are designed to integrate numerical model output with observations and static datasets to produce calibrated forecasts on a regular grid. The accuracy of the gridded predictions depends on both the choice and the configuration of the statistical models used in the forecasting system. This project compares multiple configurations of machine learning models to determine which one produces the lowest solar irradiance forecast error and best captures the distribution of observations.

Numerical model output from the GFS and a convection-allowing WRF run was used as input to the machine learning models. Observations of solar irradiance came from the Oklahoma Mesonet. The machine learning models were trained to predict the clearness index, the ratio of observed to top-of-atmosphere irradiance. Multiple configurations of random forest, gradient boosted regression, and Lasso linear regression were evaluated, along with two data aggregation and interpolation configurations. In the Multi Site configuration, models were trained on pooled data from all training sites and applied using local model output near the testing sites. In the Single Site configuration, a separate model was trained at each training site, and its predictions were interpolated to the testing sites using a Cressman scheme.
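As a rough illustration of the Single Site pathway, the sketch below computes a clearness index and interpolates per-site predictions to a testing site with the classic Cressman weight, w = (R^2 - d^2) / (R^2 + d^2) for d < R and 0 otherwise. It is not the authors' implementation: the planar site coordinates, the radius of influence, the predicted values, and all function names are hypothetical, since the abstract does not specify them.

```python
import math


def clearness_index(observed, toa):
    """Ratio of observed to top-of-atmosphere irradiance (same units)."""
    if toa <= 0:
        raise ValueError("top-of-atmosphere irradiance must be positive")
    return observed / toa


def cressman_weight(distance, radius):
    """Classic Cressman weight: (R^2 - d^2) / (R^2 + d^2) inside R, else 0."""
    if distance >= radius:
        return 0.0
    r2, d2 = radius ** 2, distance ** 2
    return (r2 - d2) / (r2 + d2)


def cressman_interpolate(target, sites, values, radius):
    """Cressman-weighted mean of site values at the target location.

    Returns None when no site lies within the radius of influence.
    """
    weights = [
        cressman_weight(math.hypot(target[0] - x, target[1] - y), radius)
        for x, y in sites
    ]
    total = sum(weights)
    if total == 0.0:
        return None
    return sum(w * v for w, v in zip(weights, values)) / total


# Hypothetical inputs: planar coordinates in km and made-up predictions.
train_sites = [(0.0, 0.0), (30.0, 0.0), (0.0, 40.0)]
predicted_k = [0.70, 0.50, 0.60]  # per-site clearness index predictions
test_site = (10.0, 10.0)

# Assumed radius of influence of 100 km (not stated in the abstract).
k_test = cressman_interpolate(test_site, train_sites, predicted_k, radius=100.0)

# Convert back to irradiance with an assumed top-of-atmosphere value (W/m^2).
irradiance_test = k_test * 900.0
```

Interpolating the clearness index rather than raw irradiance keeps the interpolated field bounded by the site predictions, and the result is rescaled by the top-of-atmosphere irradiance at the testing site afterward.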

The results of the experiment showed that the Multi Site configuration generally outperformed the Single Site configuration for all machine learning models tested. Gradient boosted regression with a mean absolute error loss function outperformed the other model types. Performance at each site was correlated with its average clearness and its distance from the training sites. All models tended to underestimate the effects of cloud cover on irradiance.
