ML forecasts are made in two phases: first for individual storm cells (the non-spatial phase), then spatially, for each 1-km grid cell.
The ML system ingests three types of input data: radar images from the Multi-year Reanalysis of Remotely Sensed Storms (MYRORSS), forecast soundings from the Rapid Refresh (RAP) model, and surface wind observations from both weather stations and human reports (National Weather Service 2016).
The input data are transformed in four ways. First, storm cells are detected and tracked in time, using the radar images. Second, wind observations are linked to nearby storm cells. This helps to determine which storm cell, if any, is responsible for each wind observation. Third, predictor variables are computed for each storm object (one “storm object” = one storm cell at one time step). Predictor variables are based on the distribution of radar variables (e.g., composite reflectivity, -10 °C reflectivity, vertically integrated liquid, and maximum estimated hail size) inside the storm cell, the storm cell's shape and velocity, and the RAP sounding interpolated to the time and position of the storm cell. Fourth, each storm object is labeled (either “yes” or “no”), indicating whether or not it is responsible for severe wind.
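The linking and labeling steps can be illustrated with a minimal sketch. It is not the system's actual procedure: it assumes each storm object is reduced to a centroid, that an observation is linked to the nearest storm cell at the same time step within a hypothetical 10-km cutoff, and that "severe" means at least 50 kt. The dictionary keys and function names are invented for illustration, and lead-time windows and distance buffers are ignored.

```python
import math

SEVERE_WIND_KT = 50.0    # assumed severity threshold (knots)
MAX_LINK_DIST_KM = 10.0  # assumed maximum linkage distance (hypothetical)


def link_observation(obs, storm_objects, max_dist_km=MAX_LINK_DIST_KM):
    """Return the ID of the nearest storm cell at the observation time, or None."""
    best_id, best_dist = None, max_dist_km
    for storm in storm_objects:
        if storm["time"] != obs["time"]:
            continue  # only compare objects valid at the same time step
        dist = math.hypot(storm["x_km"] - obs["x_km"], storm["y_km"] - obs["y_km"])
        if dist <= best_dist:
            best_id, best_dist = storm["cell_id"], dist
    return best_id


def label_storm_objects(storm_objects, observations):
    """Map (cell_id, time) to 'yes' or 'no' based on linked severe observations."""
    labels = {(s["cell_id"], s["time"]): "no" for s in storm_objects}
    for obs in observations:
        cell_id = link_observation(obs, storm_objects)
        if cell_id is not None and obs["speed_kt"] >= SEVERE_WIND_KT:
            labels[(cell_id, obs["time"])] = "yes"
    return labels


storms = [
    {"cell_id": "A", "time": 0, "x_km": 0.0, "y_km": 0.0},
    {"cell_id": "B", "time": 0, "x_km": 30.0, "y_km": 0.0},
]
observations = [
    {"time": 0, "x_km": 3.0, "y_km": 4.0, "speed_kt": 55.0},      # 5 km from A
    {"time": 0, "x_km": 100.0, "y_km": 100.0, "speed_kt": 60.0},  # too far away
    {"time": 0, "x_km": 29.0, "y_km": 0.0, "speed_kt": 30.0},     # near B, not severe
]
labels = label_storm_objects(storms, observations)
```

In this toy case only storm object A receives a “yes” label: the second observation is severe but has no storm cell within the cutoff, and the third is linked to B but is below the threshold.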
In the non-spatial phase of the forecasting system, ML models are used to forecast the probability of severe wind for each storm object at different time horizons (0-15, 15-30, 30-45, 45-60, 60-90, and 90-120 minutes ahead) and distance buffers (inside, 0-5 km outside, and 5-10 km outside the storm cell).
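The combinations of time horizon and distance buffer listed above can be enumerated as follows. This is only a bookkeeping sketch: the constant and function names are hypothetical, and whether the system trains one model per combination is an assumption not stated here.

```python
from itertools import product

# Time horizons in minutes ahead, as listed in the text.
LEAD_TIME_WINDOWS_MIN = [(0, 15), (15, 30), (30, 45), (45, 60), (60, 90), (90, 120)]

# Distance buffers in km outside the storm cell; None means "inside the storm cell".
DISTANCE_BUFFERS_KM = [None, (0, 5), (5, 10)]


def forecast_targets():
    """Return every (time horizon, distance buffer) pair to be forecast."""
    return list(product(LEAD_TIME_WINDOWS_MIN, DISTANCE_BUFFERS_KM))


def target_name(window_min, buffer_km):
    """Build a readable label for one forecast target (hypothetical naming)."""
    where = "inside" if buffer_km is None else f"{buffer_km[0]}-{buffer_km[1]}km-outside"
    return f"{window_min[0]}-{window_min[1]}min_{where}"


targets = forecast_targets()
names = [target_name(w, b) for w, b in targets]
```

Six time horizons and three distance buffers give 18 forecast targets per storm object.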
In the spatial phase, a 1-km-resolution probability map is created for each time horizon. This involves four steps, repeated for each storm object S and distance buffer D.
1. Normalize the forecast probability by area. Specifically, normalized_probability = original_probability * area_of_distance_buffer / [π * (10 km)²]. Thus, the forecast at each grid cell is the probability of severe wind within a 10-km radius.
2. Sample many (thousands of) storm-motion vectors from a probability distribution.
3. For each sampled motion vector, extrapolate the storm object over the time horizon (e.g., 30-45 minutes into the future).
4. For each sampled motion vector, add the normalized probability to all grid cells inside the distance buffer D, then recompute the average probabilities at these grid cells.
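The four gridding steps above can be sketched for a single storm object and distance buffer. This is a simplified illustration under several assumptions that are not taken from the system itself: the distance buffer is an annulus around the storm centroid (inner_km to outer_km), motion vectors are drawn from an isotropic Gaussian, a single representative lead time stands in for the time horizon, and "average" is read as the mean over all sampled motion vectors. All function and parameter names are hypothetical.

```python
import math
import random

REFERENCE_AREA_KM2 = math.pi * 10.0 ** 2  # area of a circle with 10-km radius


def normalize_probability(prob, buffer_area_km2):
    """Step 1: scale the probability by buffer area relative to the 10-km circle."""
    return prob * buffer_area_km2 / REFERENCE_AREA_KM2


def grid_probabilities(x0_km, y0_km, inner_km, outer_km, prob,
                       mean_u_kmh, mean_v_kmh, sigma_kmh, lead_time_h,
                       grid_size_km, num_samples=1000, seed=0):
    """Return a square grid (1-km cells) of sample-averaged probabilities."""
    rng = random.Random(seed)
    buffer_area_km2 = math.pi * (outer_km ** 2 - inner_km ** 2)
    norm_prob = normalize_probability(prob, buffer_area_km2)
    sums = [[0.0] * grid_size_km for _ in range(grid_size_km)]

    for _ in range(num_samples):
        # Step 2: sample a motion vector (Gaussian spread is an assumption).
        u = rng.gauss(mean_u_kmh, sigma_kmh)
        v = rng.gauss(mean_v_kmh, sigma_kmh)
        # Step 3: extrapolate the storm centroid over the lead time.
        xc = x0_km + u * lead_time_h
        yc = y0_km + v * lead_time_h
        # Step 4: add the normalized probability to cells inside the buffer.
        # Only cells in the bounding box of the buffer are visited.
        row_min = max(0, math.floor(yc - outer_km - 0.5))
        row_max = min(grid_size_km - 1, math.ceil(yc + outer_km))
        col_min = max(0, math.floor(xc - outer_km - 0.5))
        col_max = min(grid_size_km - 1, math.ceil(xc + outer_km))
        for i in range(row_min, row_max + 1):
            for j in range(col_min, col_max + 1):
                dist = math.hypot(j + 0.5 - xc, i + 0.5 - yc)  # cell-centre distance
                if inner_km <= dist <= outer_km:
                    sums[i][j] += norm_prob

    # Average over all sampled motion vectors.
    return [[total / num_samples for total in row] for row in sums]


# Deterministic check: zero spread, storm moving east at 20 km/h for 0.5 h.
grid = grid_probabilities(10.0, 10.0, 0.0, 5.0, 0.4, 20.0, 0.0, 0.0, 0.5,
                          grid_size_km=40, num_samples=10)
# With spread in the motion vectors, the probability is smeared over more cells.
smeared = grid_probabilities(10.0, 10.0, 0.0, 5.0, 0.4, 20.0, 0.0, 8.0, 0.5,
                             grid_size_km=40, num_samples=200)
```

As a worked instance of step 1, a buffer of radius 5 km has area 25π km², so an original probability of 0.4 becomes 0.4 * 25π / 100π = 0.1. In the full system the per-object grids would still need to be combined over all storm objects and buffers, which this sketch does not attempt.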
Finally, the forecast probability at each grid cell is tuned by isotonic regression, which corrects bias and makes the probabilities more reliable.
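To illustrate the idea behind this calibration step, the sketch below fits an isotonic (non-decreasing) mapping from raw probabilities to observed event frequencies with the pool-adjacent-violators algorithm. It is a minimal standard-library version, not the system's implementation: the function names are hypothetical, the training pairs are toy values, and the fitted mapping is a step function (library implementations often interpolate instead).

```python
from bisect import bisect_left


def fit_isotonic(raw_probs, outcomes):
    """Fit a non-decreasing step function by pool-adjacent-violators.

    raw_probs: uncalibrated forecast probabilities.
    outcomes: 1 if severe wind occurred, else 0.
    Returns (thresholds, values): thresholds[k] is the largest raw
    probability in block k, values[k] is the calibrated probability.
    """
    # Pool tied raw probabilities first so each x value appears once.
    grouped = {}
    for x, y in zip(raw_probs, outcomes):
        total, count = grouped.get(x, (0.0, 0))
        grouped[x] = (total + y, count + 1)

    blocks = []  # each block: [sum of outcomes, count, largest x in block]
    for x in sorted(grouped):
        total, count = grouped[x]
        blocks.append([total, count, x])
        # Merge backwards while the previous block's mean exceeds this one's.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count, right = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = right

    thresholds = [b[2] for b in blocks]
    values = [b[0] / b[1] for b in blocks]
    return thresholds, values


def apply_isotonic(thresholds, values, raw_prob):
    """Look up the calibrated probability for one raw probability."""
    idx = bisect_left(thresholds, raw_prob)
    return values[min(idx, len(values) - 1)]


# Toy training data: the pair at 0.2 and 0.3 violates monotonicity and is pooled.
thresholds, values = fit_isotonic([0.1, 0.2, 0.3, 0.4], [0, 1, 0, 1])
calibrated = apply_isotonic(thresholds, values, 0.2)
```

Because the fitted mapping never decreases, a higher raw probability can never yield a lower calibrated one, which is why the method can correct bias without reordering the forecasts.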
The ML system described above was deployed in the 2017 Spring Forecasting Experiment (SFE), with forecast maps updated every two minutes. Since the SFE, we have explored deep learning, which can learn from 4-D spatiotemporal data without precomputed predictor variables. We will compare results from the two approaches.
Blouin, Karen D., et al. "Ensemble lightning prediction models for the province of Alberta, Canada." International Journal of Wildland Fire 25.4 (2016): 421-432.
Gagne II, David John, et al. "Day-Ahead Hail Prediction Integrating Machine Learning with Storm-Scale Numerical Weather Models." AAAI. 2015.
McGovern, Amy, D.H. Rosendahl, and R.A. Brown. “Toward Understanding Tornado Formation Through Spatiotemporal Data Mining.” In: Data Mining for Geoinformatics: Methods and Applications, eds. Cervone, Guido, Jessica Lin, and Nigel Waters. New York: Springer. 29-47.
McGovern, Amy, K. Elmore, D.J. Gagne II, S.E. Haupt, C.D. Karstens, R. Lagerquist, T. Smith, and J.K. Williams. “Using artificial intelligence to improve real-time decision making for high-impact weather.” Bulletin of the American Meteorological Society (2017).
National Weather Service, 2016: Storm Data preparation. National Weather Service Instruction 10-1605. Tech. rep., available at http://www.nws.noaa.gov/directives.
Williams, John K. "Using random forests to diagnose aviation turbulence." Machine Learning 95.1 (2014): 51-70.