4.3 Using Machine Learning to Predict Straight-line Convective Wind Hazards Throughout the Continental United States

Tuesday, 24 January 2017: 11:00 AM
310 (Washington State Convention Center)
Ryan A. Lagerquist, University of Oklahoma, Norman, OK; and A. McGovern and T. Smith

In the past decade, machine learning (ML) has led to significant improvements in the prediction of thunderstorm hazards such as tornadoes (McGovern et al. 2014), hail (Gagne et al. 2015), aircraft turbulence (Williams 2014), and lightning (Blouin et al. 2016). However, very few studies have focused on straight-line wind, for which hazardous events are much more common than tornadoes (National Severe Storms Laboratory n.d.). We have developed an ML system that predicts the probability of moderate (> 30 kt) and severe (> 50 kt) wind at lead times up to 90 minutes. Predictions are made for each storm cell in the continental United States, then storm-cell-wise probabilities are interpolated to a grid.

We use three types of input data: radar images from the Multi-year Reanalysis of Remotely Sensed Storms (MYRORSS); model soundings from the Rapid Update Cycle (RUC); and surface wind observations from the Meteorological Assimilation Data Ingest System (MADIS), Oklahoma Mesonet, one-minute meteorological aerodrome reports (METARs), and National Weather Service local storm reports (LSRs). Radar images and model soundings are used to create predictors, while surface wind observations are used to create verification data (knowledge of when and where hazardous winds occurred).

Input data are processed in four steps. First, storm cells are identified and tracked through time by w2segmotionll and w2besttrack. w2segmotionll outlines storm cells and creates preliminary tracks, which are then corrected by w2besttrack.

Second, wind observations are linked to nearby storm cells. Each wind observation is linked to the cell with the nearest boundary, as long as the boundary is within 10 km.
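
The linkage rule can be sketched as follows. This is a minimal illustration, not the code used in this work: it assumes that observation and cell-boundary coordinates have already been projected to planar x-y coordinates in km, that each cell boundary is a simple polygon, and that an observation falling inside a cell counts as distance zero. The function names are hypothetical.

```python
import math

MAX_LINK_DISTANCE_KM = 10.0  # linkage cutoff stated in the text


def _point_to_segment_km(px, py, ax, ay, bx, by):
    """Distance (km) from point P to segment AB in planar coordinates."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Fraction along AB of the projection of P, clamped to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def _is_inside(px, py, vertices):
    """Ray-casting point-in-polygon test."""
    inside = False
    n = len(vertices)
    for i in range(n):
        ax, ay = vertices[i]
        bx, by = vertices[(i + 1) % n]
        if (ay > py) != (by > py):
            x_cross = ax + (py - ay) * (bx - ax) / (by - ay)
            if px < x_cross:
                inside = not inside
    return inside


def distance_to_cell_km(px, py, vertices):
    """Distance to the cell boundary; zero inside the cell (an assumption)."""
    if _is_inside(px, py, vertices):
        return 0.0
    n = len(vertices)
    return min(
        _point_to_segment_km(px, py, *vertices[i], *vertices[(i + 1) % n])
        for i in range(n)
    )


def link_observation(px, py, cells):
    """Return the ID of the nearest cell, or None if none is within 10 km.

    `cells` maps a cell ID to a list of (x, y) boundary vertices in km.
    """
    best_id, best_dist = None, float("inf")
    for cell_id, vertices in cells.items():
        dist = distance_to_cell_km(px, py, vertices)
        if dist < best_dist:
            best_id, best_dist = cell_id, dist
    return best_id if best_dist <= MAX_LINK_DISTANCE_KM else None
```

For example, with two 10-km square cells whose nearest edges are 20 km apart, an observation 5 km east of the first cell links to that cell, while an observation more than 10 km from both cells is left unlinked.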

Third, predictors are calculated for each storm object (a “storm object” is one cell at one time step). The four types of predictors are radar statistics, storm motion, shape parameters, and sounding indices. For 12 different variables in MYRORSS (0-2-km azimuthal shear, 3-6-km azimuthal shear, 18-dBZ echo tops, 50-dBZ echo tops, maximum estimated hail size, -20 °C reflectivity, -10 °C reflectivity, 0 °C reflectivity, composite reflectivity, lowest-altitude reflectivity, severe-hail index, and vertically integrated liquid), 11 statistics (0th, 5th, 25th, 50th, 75th, 95th, and 100th percentiles; mean; standard deviation; skewness; and kurtosis) are calculated for values inside the storm object. For each of these 12 variables, the same statistics are calculated for gradient magnitudes inside the storm object. Storm motion (speed and direction) is calculated from w2besttrack. Shape parameters (area, orientation, eccentricity, etc.) are calculated for the boundary of the storm object. Finally, RUC soundings are interpolated to the time and center position of the storm object, and 97 indices (wind shears, mean layer winds, moisture variables, composite indices, etc.) are calculated with the SHARPpy software (Halbert et al. 2015). Overall, there are 431 predictors.
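
The radar statistics account for 12 variables x 11 statistics x 2 (values and gradient magnitudes) = 264 of the predictors. The sketch below shows how the 11 statistics might be computed for one variable with NumPy. It is illustrative only: the function names, the use of NaN for missing pixels, the population (ddof = 0) standard deviation, the non-excess kurtosis convention, and the uniform grid spacing are all assumptions, since the abstract does not specify them.

```python
import numpy as np

PERCENTILE_LEVELS = (0, 5, 25, 50, 75, 95, 100)


def storm_object_statistics(values):
    """Return the 11 statistics for the values inside one storm object."""
    values = np.asarray(values, dtype=float)
    values = values[~np.isnan(values)]  # assumption: missing pixels are NaN

    stats = {
        f"percentile_{level:03d}": float(np.percentile(values, level))
        for level in PERCENTILE_LEVELS
    }
    mean = values.mean()
    std = values.std()  # population standard deviation (an assumption)
    stats["mean"] = float(mean)
    stats["standard_deviation"] = float(std)

    # Moment-based skewness and non-excess kurtosis; undefined if std == 0.
    if std > 0:
        z = (values - mean) / std
        stats["skewness"] = float(np.mean(z ** 3))
        stats["kurtosis"] = float(np.mean(z ** 4))
    else:
        stats["skewness"] = float("nan")
        stats["kurtosis"] = float("nan")
    return stats


def radar_predictors(field, mask, grid_spacing_km=1.0):
    """Statistics of a 2-D radar field and of its gradient magnitude.

    `mask` is a boolean array that is True inside the storm object.
    """
    field = np.asarray(field, dtype=float)
    d_dy, d_dx = np.gradient(field, grid_spacing_km)
    gradient_magnitude = np.hypot(d_dx, d_dy)
    return {
        "value": storm_object_statistics(field[mask]),
        "gradient": storm_object_statistics(gradient_magnitude[mask]),
    }
```

Calling `radar_predictors` once per MYRORSS variable would yield 22 numbers per variable, which matches the count given above.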

The fourth and last processing step is labeling each storm object. Labels are created for each threshold (30 kt for moderate, 50 kt for severe); spatial buffer (inside, 0-5 km around, and 5-10 km around the moving storm cell); and temporal buffer (0-15, 15-30, 30-45, 45-60, and 60-90 minutes into the future). For threshold U, the label is 1 if a wind gust > U occurred within the spatial and temporal buffer; otherwise, the label is 0.
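
With 2 thresholds, 3 spatial buffers, and 5 temporal buffers, each storm object receives 30 binary labels. A minimal sketch of the labeling rule follows. It assumes each linked observation has been reduced to a tuple of gust speed (kt), distance from the moving cell at the observation time (km, zero if inside), and lead time (minutes after the storm object's time). The half-open handling of buffer edges and all names are assumptions made for illustration.

```python
from itertools import product

THRESHOLDS_KT = {"moderate": 30.0, "severe": 50.0}
# None marks the "inside the cell" buffer; the others are (lower, upper] in km.
SPATIAL_BUFFERS_KM = {"inside": None, "0-5 km": (0.0, 5.0), "5-10 km": (5.0, 10.0)}
# Lead-time windows in minutes, treated as [start, end).
TEMPORAL_BUFFERS_MIN = [(0, 15), (15, 30), (30, 45), (45, 60), (60, 90)]


def _in_spatial_buffer(distance_km, buffer_km):
    """True if a distance falls in the given spatial buffer."""
    if buffer_km is None:
        return distance_km == 0.0
    lower, upper = buffer_km
    return lower < distance_km <= upper


def label_storm_object(observations):
    """Return a dict of binary labels for one storm object.

    `observations` is an iterable of (gust_kt, distance_km, lead_time_min).
    Keys are (threshold name, spatial buffer name, (start_min, end_min)).
    """
    observations = list(observations)
    labels = {}
    for (thr_name, threshold), (buf_name, buffer_km), (start, end) in product(
        THRESHOLDS_KT.items(), SPATIAL_BUFFERS_KM.items(), TEMPORAL_BUFFERS_MIN
    ):
        # Label is 1 if any gust exceeds the threshold within both buffers.
        labels[(thr_name, buf_name, (start, end))] = int(
            any(
                gust > threshold
                and _in_spatial_buffer(dist, buffer_km)
                and start <= lead < end
                for gust, dist, lead in observations
            )
        )
    return labels
```

Note that a gust of exactly 30 kt does not satisfy the moderate threshold, since the rule requires a gust strictly greater than U.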

For each spatial and temporal buffer, we use an ensemble of gradient-boosted trees (GBTs) to forecast the probability of both moderate and severe wind. Then we use isotonic regression to calibrate these probabilities, which makes them more reliable. Finally, for each threshold and temporal buffer (lead-time window), we interpolate storm-cell-wise probabilities to a grid. Thus, at lead times up to 90 minutes, we forecast the probability of both moderate and severe straight-line convective wind everywhere in the continental United States.
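
The calibration step can be illustrated in isolation. The sketch below does not reproduce the GBT ensemble; it takes raw forecast probabilities and observed labels as given and fits a non-decreasing step function with the pool-adjacent-violators algorithm, which is one standard way to perform isotonic regression. The abstract does not say which implementation was used, so the function names and the step-function (rather than interpolated) prediction are assumptions.

```python
import bisect


def fit_isotonic(raw_probs, labels):
    """Fit a non-decreasing step function by pool-adjacent-violators.

    Returns (upper_bounds, calibrated): block i covers raw probabilities up to
    upper_bounds[i] and maps them to the event frequency calibrated[i].
    """
    blocks = []  # each block: [label_sum, count, max_raw_prob]
    for prob, label in sorted(zip(raw_probs, labels)):
        if blocks and blocks[-1][2] == prob:
            # Tied raw probabilities share one block.
            blocks[-1][0] += label
            blocks[-1][1] += 1
        else:
            blocks.append([float(label), 1, prob])
        # Pool backwards while the event frequency decreases between blocks.
        while (
            len(blocks) > 1
            and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            label_sum, count, max_prob = blocks.pop()
            blocks[-1][0] += label_sum
            blocks[-1][1] += count
            blocks[-1][2] = max_prob
    upper_bounds = [block[2] for block in blocks]
    calibrated = [block[0] / block[1] for block in blocks]
    return upper_bounds, calibrated


def calibrate(raw_prob, upper_bounds, calibrated):
    """Map one raw probability to its calibrated value."""
    index = bisect.bisect_left(upper_bounds, raw_prob)
    return calibrated[min(index, len(calibrated) - 1)]
```

Because the fitted values are observed event frequencies within pooled groups of forecasts, the calibrated probabilities are non-decreasing in the raw probability and agree with the observed frequencies on the fitting data. In practice the fit would be made on data held out from GBT training.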


Blouin, Karen D., et al. "Ensemble lightning prediction models for the province of Alberta, Canada." International Journal of Wildland Fire 25.4 (2016): 421-432.

Gagne II, David John, et al. "Day-Ahead Hail Prediction Integrating Machine Learning with Storm-Scale Numerical Weather Models." AAAI. 2015.

Halbert, K.T., W.G. Blumberg, and P.T. Marsh, 2015. "SHARPpy: Fueling the Python Cult". Preprints, 5th Symposium on Advances in Modeling and Analysis Using Python, Phoenix, AZ.

Lakshmanan, Valliappa, and Travis Smith. "Evaluating a Storm Tracking Algorithm." 26th Conference on Interactive Information and Processing Systems (IIPS) for Meteorology, Oceanography, and Hydrology. 2010.

Lakshmanan, Valliappa, Benjamin Herzog, and Darrel Kingfield. "A Method for Extracting Postevent Storm Tracks." Journal of Applied Meteorology and Climatology 54.2 (2015): 451-462.

McGovern, Amy, D.H. Rosendahl, and R.A. Brown. “Toward Understanding Tornado Formation Through Spatiotemporal Data Mining.” In: Data Mining for Geoinformatics: Methods and Applications, eds. Cervone, Guido, Jessica Lin, and Nigel Waters. New York: Springer. 29-47.

National Severe Storms Laboratory. “Severe Weather 101: Damaging Winds Basics.” n.d. Website. 11 August 2015.

Williams, John K. "Using random forests to diagnose aviation turbulence." Machine Learning 95.1 (2014): 51-70.
