This effort builds upon two complementary activities. The first is ongoing work at IBM Research, dubbed Deep Thunder, which connects weather models to their business implications. In particular, the ability to predict specific events or combinations of weather conditions with sufficient spatial and temporal precision and lead time, coupled to their operational impacts, enables proactive optimization of resources to mitigate the effects of variability and uncertainty in weather. The second is ongoing work at the Instituto de Ingeniería del Conocimiento on applications of machine learning (ML) methods to weather data that address the large dimensionality of NWP output to improve the predictability of wind power. The goal is targeted machine learning predictions driven by specific model physics.
To investigate the application of these methods to isolated wind power systems, Deep Thunder is being deployed at turbulence scale for the Canary Islands to enable more accurate wind power forecasting across the 45 wind farms situated on six of the seven islands of the archipelago. To resolve the turbulent eddies that contribute to ramp-up and ramp-down events, Deep Thunder utilizes the Large Eddy Simulation component of the WRF-ARW (version 3.3.1) community NWP model at 668 m horizontal resolution. Fifty vertical levels are used, with at least ten in the planetary boundary layer, to capture conditions above, below, and through the blade extent of the deployed turbines.
Several model configurations were evaluated to generate numerical experiments for retrospective analysis of significant ramping events in 2010 and 2011, identified via hourly power data from the wind-farm operators. The effort focuses on recent events to minimize the impact of any changes in the wind farms and their operational conditions. Given the eventual goal of enabling operational forecasting, the experiments were run as hindcasts; accordingly, the model is initialized with 0.5-degree data from NCEP's Global Forecast System. In addition, scaling experiments were performed to maximize the efficiency of the model configuration on a small Power7-based HPC cluster. The first phase is to build a hindcast-based climatology of six months to one year, with one 24-hour forecast per day, initialized at 00 UTC, and output available every five minutes to capture the transient nature of wind events. Output variables include those that drive the turbine energy-extraction process, namely turbulent kinetic energy, volumetric vorticity, horizontal and vertical wind velocities, as well as near-surface wind gusts and a clear-air turbulence index.
The complex topography of the Canary Islands warrants 90 m-resolution terrain data from NASA's Shuttle Radar Topography Mission. The oceanic influence on coastal winds and convection requires NASA's 1 km-resolution sea surface temperature analyses for model initialization. Since data from automated weather sensors are limited, both validation and variational assimilation are somewhat problematic. The large domain size, high resolution, and grid aspect ratio pose additional challenges: avoiding numerical instability and containing the computational cost of hindcast generation, given the eventual transition from research to operations. These issues can be addressed through domain splitting and parallelization, one-way nesting, and adaptive time-stepping.
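The idea behind adaptive time-stepping can be illustrated with a minimal sketch: shrink the time step when the Courant number approaches the stability limit, and grow it cautiously otherwise. The function name, thresholds, and growth factor below are illustrative assumptions, not the WRF-ARW implementation.

```python
def adapt_time_step(dt, max_wind_speed, dx, target_cfl=0.8,
                    dt_min=0.5, dt_max=4.0, growth=1.05):
    """Sketch of CFL-based adaptive time-stepping (illustrative values).

    Shrinks dt when the Courant number exceeds the target, to avoid
    numerical instability; otherwise grows dt slowly toward dt_max,
    to keep the run computationally efficient.
    """
    cfl = max_wind_speed * dt / dx       # Courant number for this step
    if cfl > target_cfl:
        dt = target_cfl * dx / max_wind_speed   # cut back to the target CFL
    else:
        dt = min(dt * growth, dt_max)           # cautious growth
    return max(dt, dt_min)
```

For example, on a 668 m grid a 4 s step is safe for moderate winds, but a strong gust front forces the step down until the domain-maximum wind relaxes again.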
ML techniques will be applied to transform NWP output into energy forecasts. Currently, ML-based energy forecasts are derived from numerical patterns obtained from NWP output at the synoptic scale. The approach followed here implies a large increase in both the dimension of the NWP patterns and the sample size, which grows from eight to at least 24 patterns per day. Even so, sample size and dimension will have the same order of magnitude, contrary to the ML rule of thumb that the sample size be an order of magnitude greater than the dimension. The ML algorithms used must be able to cope with this situation. Support Vector Machines (SVMs) are therefore a natural choice, as SVM models do not rely on individual pattern features, but rather on the overall pattern distribution.
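Why kernel methods tolerate this regime can be seen in a minimal NumPy sketch of kernel ridge regression, a close cousin of SVM regression: the model sees training patterns only through their pairwise similarities (the Gram matrix), so the number of fitted coefficients equals the sample size, not the feature dimension. The shapes, kernel width, and synthetic data below are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1e-3):
    # Gaussian (RBF) similarities between all rows of A and all rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def fit(X, y, gamma=1e-3, lam=1e-2):
    # Dual coefficients: one per training pattern, regardless of dimension.
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict(X_train, alpha, X_new, gamma=1e-3):
    return rbf_kernel(X_new, X_train, gamma) @ alpha

# Toy setting mimicking the regime described above: 24 daily patterns,
# each with a comparable number of NWP-derived features (values synthetic).
rng = np.random.default_rng(0)
X = rng.normal(size=(24, 30))              # 24 samples, 30 features
y = X[:, 0] + 0.1 * rng.normal(size=24)    # stand-in power signal
alpha = fit(X, y)
pred = predict(X, alpha, X)
```

Note that `alpha` has 24 entries even though each pattern has 30 features; a primal linear model would instead need 30 coefficients from only 24 samples.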
On the other hand, given the emphasis on wind and turbulence data, we expect large correlations between NWP features, so that the effective NWP pattern dimension will be smaller. This suggests applying dimension-reduction techniques, such as Principal Component Analysis, before model construction. An alternative that we are considering is to use models with a built-in capability for dimensionality reduction, particularly linear regression coupled with sparsity-enforcing regularization, such as the Lasso, Group Lasso, and Elastic Net. These methods have a strong theoretical foundation and can be cast in the unifying framework of proximal optimization, which yields efficient training algorithms.
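The proximal-optimization framework can be sketched for the simplest case, the Lasso, whose proximal map is soft-thresholding; Group Lasso and Elastic Net only change this map. The ISTA iteration below, with illustrative dimensions and synthetic data, shows the built-in dimensionality reduction: most coefficients are driven exactly to zero.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal map of the l1 penalty: shrink toward zero, snap small values to 0.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=1000):
    """Proximal-gradient (ISTA) sketch for min_w 0.5*||Xw - y||^2 + lam*||w||_1."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2     # 1 / Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)               # gradient of the smooth least-squares term
        w = soft_threshold(w - step * grad, step * lam)
    return w

# Synthetic problem with more features than samples, as in the text:
# only the first three features actually drive the target.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 100))
w_true = np.zeros(100)
w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.01 * rng.normal(size=40)
w = lasso_ista(X, y, lam=10.0)
```

The l1 penalty zeroes the coefficients of the redundant features, performing feature selection during training rather than as a separate PCA-style preprocessing step.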
We will outline the research objectives of this work and the challenges of enabling it. We will discuss some of the scientific and computational results to date and the lessons learned. Since this is work in progress, we will present its current status and our plans for future work.