Numerical model output from the GFS and from a convection-allowing WRF run was used as input to the machine learning models. Observations of solar irradiance came from the Oklahoma Mesonet. The machine learning models were trained to predict the clearness index, the ratio of observed irradiance to top-of-atmosphere irradiance. Multiple configurations of random forest, gradient boosted regression, and Lasso linear regression were evaluated. Two data aggregation and interpolation configurations were used. In the Multi Site configuration, machine learning models were trained on data from all training sites and applied using local model output near the testing sites. In the Single Site configuration, a separate model was trained at each training site, and its predictions were interpolated to the testing sites using a Cressman scheme.
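As an illustration of two quantities named above, the following is a minimal sketch of a clearness index calculation and a single-pass, Cressman-weighted interpolation of site predictions to a testing site. It is not the implementation used in this study. The function names, the planar coordinates, and the radius of influence are hypothetical, and a full Cressman analysis typically applies successive correction passes rather than the single weighted average shown here.

```python
import math


def clearness_index(observed, toa):
    """Ratio of observed to top-of-atmosphere irradiance.

    Returns None when top-of-atmosphere irradiance is not positive
    (for example at night), where the ratio is undefined.
    """
    if toa <= 0:
        return None
    return observed / toa


def cressman_weight(distance, radius):
    """Cressman weight (R^2 - d^2) / (R^2 + d^2) inside the radius, else 0."""
    if distance >= radius:
        return 0.0
    r2 = radius ** 2
    d2 = distance ** 2
    return (r2 - d2) / (r2 + d2)


def cressman_interpolate(target, sites, radius):
    """Weighted average of site values at a target point.

    target: (x, y) in planar coordinates (assumed, e.g. km).
    sites: list of ((x, y), value) pairs, e.g. Single Site predictions.
    radius: hypothetical radius of influence, same units as coordinates.
    Returns None when no site lies within the radius.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in sites:
        distance = math.hypot(x - target[0], y - target[1])
        weight = cressman_weight(distance, radius)
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        return None
    return weighted_sum / weight_total
```

In this sketch, sites closer to the testing site receive larger weights, and sites at or beyond the radius of influence contribute nothing.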
The Multi Site configuration generally outperformed the Single Site configuration for all machine learning models tested. Gradient boosted regression with a mean absolute error loss function outperformed the other model types. Site performance was correlated with the average clearness at the site and with its distance from training sites. All models tended to underestimate the effects of cloud cover on irradiance.