ML models are trained with three types of data: radar grids from the Multi-year Reanalysis of Remotely Sensed Storms (MYRORSS); model soundings from the Rapid Update Cycle (RUC); and surface wind observations from the Meteorological Assimilation Data Ingest System (MADIS), Oklahoma Mesonet, one-minute meteorological aerodrome reports (METARs), and National Weather Service local storm reports (LSRs). Radar grids indicate the structural and hydrometeorological properties of storm cells, and model soundings indicate the near-storm environment, both of which dictate how storm cells will evolve. Thus, both of these data types are used to create predictors. Meanwhile, surface wind observations are used to determine when and where damaging winds occurred.
Training data are processed in four steps. First, storm cells are identified and tracked through time with the algorithms w2segmotionll (Lakshmanan and Smith 2010) and w2besttrack (Lakshmanan et al. 2015): w2segmotionll performs identification and real-time tracking, then w2besttrack performs post-event tracking, which extends falsely truncated tracks left by w2segmotionll.
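The core idea behind post-event track extension — joining a track whose start lies close in space and time to another track's end — can be sketched as follows. This is a minimal greedy illustration with hypothetical names and thresholds, not the actual w2besttrack algorithm, which uses a more sophisticated fitting procedure:

```python
import numpy as np

def join_truncated_tracks(tracks, max_gap_min=10.0, max_dist_km=20.0):
    """tracks: list of tracks, each a time-sorted list of (time_min, x_km, y_km)
    centroids.  Greedily join a track whose first centroid is close in space
    and time to another track's last centroid, repeating until no joins remain.
    Thresholds here are illustrative, not those of w2besttrack."""
    tracks = [list(t) for t in tracks]
    merged = True
    while merged:
        merged = False
        for i, a in enumerate(tracks):
            for j, b in enumerate(tracks):
                if i == j:
                    continue
                t_end, x_end, y_end = a[-1]
                t_start, x_start, y_start = b[0]
                gap = t_start - t_end
                dist = np.hypot(x_start - x_end, y_start - y_end)
                if 0 < gap <= max_gap_min and dist <= max_dist_km:
                    tracks[i] = a + b  # append b to the end of a
                    del tracks[j]
                    merged = True
                    break
            if merged:
                break
    return tracks
```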
Second, wind observations are linked (causally attributed) to storm cells: each wind observation is linked to the nearest storm cell, or discarded if no storm cell lies within 10 km.
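The linkage rule can be sketched as follows. This is a minimal illustration with hypothetical names and flat Cartesian coordinates in km; the real system works with latitude-longitude observations and storm-cell outlines rather than points:

```python
import numpy as np

def link_observations_to_cells(obs_xy, cell_xy, max_distance_km=10.0):
    """For each wind observation, return the index of the nearest storm cell,
    or -1 if no storm cell lies within max_distance_km."""
    obs_xy = np.atleast_2d(obs_xy)
    cell_xy = np.atleast_2d(cell_xy)
    # Pairwise Euclidean distances (km): observations on rows, cells on columns.
    distances = np.linalg.norm(obs_xy[:, None, :] - cell_xy[None, :, :], axis=-1)
    nearest = distances.argmin(axis=1)
    nearest[distances.min(axis=1) > max_distance_km] = -1  # discard unlinked obs
    return nearest
```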
Third, predictors are calculated for each storm object (a “storm object” is one storm cell at one time step). The four types of predictors are radar statistics, storm motion, shape parameters, and sounding parameters. For each of the 12 variables taken from MYRORSS (0-2-km azimuthal shear, 3-6-km azimuthal shear, 18-dBZ echo tops, 50-dBZ echo tops, maximum estimated hail size, -20 °C reflectivity, -10 °C reflectivity, 0 °C reflectivity, composite reflectivity, lowest-altitude reflectivity, severe-hail index, and vertically integrated liquid), 11 statistics (0th, 5th, 25th, 50th, 75th, 95th, and 100th percentiles; mean; standard deviation; skewness; and kurtosis) are calculated on all values inside the storm object. Then the gradient field of each variable is calculated, and the same 11 statistics are calculated on the gradient magnitudes of all 12 variables inside the storm object. This procedure yields 264 radar statistics. Storm motion (speed and direction) is given by w2segmotionll. Shape parameters (area, orientation, eccentricity, solidity, extent, curvature, bending energy, and compactness) are calculated on the bounding polygon of the storm object. Finally, RUC soundings are interpolated to the time and centroid of the storm object, and 97 sounding parameters (e.g., convective available potential energy, convective inhibition, and storm-relative helicity) are calculated by SHARPpy (Halbert et al. 2015). In total, there are 431 predictors.
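The radar statistics can be sketched as follows — a minimal illustration (function names are ours) of the 11 statistics, computed once on raw values and once on gradient magnitudes, which gives 12 variables × 11 statistics × 2 fields = 264 numbers:

```python
import numpy as np
from scipy.stats import kurtosis, skew

PERCENTILES = [0, 5, 25, 50, 75, 95, 100]

def radar_statistics(values):
    """The 11 statistics (7 percentiles, mean, standard deviation, skewness,
    kurtosis) over values of one radar variable inside a storm object."""
    values = np.asarray(values, dtype=float)
    stats = list(np.percentile(values, PERCENTILES))
    stats += [values.mean(), values.std(), skew(values), kurtosis(values)]
    return np.array(stats)

def gradient_statistics(grid):
    """The same 11 statistics on gradient magnitudes of a 2-D radar grid."""
    dy, dx = np.gradient(np.asarray(grid, dtype=float))
    return radar_statistics(np.hypot(dx, dy).ravel())
```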
The fourth and final processing step is the creation of labels (indicating whether or not damaging winds occurred). Our goal is to predict damaging winds at three buffer distances (0, 5, and 10 km around the storm object) and five lead times (0-15, 15-30, 30-45, 45-60, and 60-90 minutes). For each buffer distance and lead time, the storm object is assigned a label of 1 if it is linked to a wind observation > 50 kt within that buffer distance and lead-time window, and 0 otherwise.
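The labeling step can be sketched as follows — a minimal illustration with hypothetical names, assuming each linked observation carries its distance from the storm object, its lead time, and its wind speed:

```python
BUFFER_DISTANCES_KM = [0, 5, 10]
LEAD_TIME_WINDOWS_MIN = [(0, 15), (15, 30), (30, 45), (45, 60), (60, 90)]

def label_storm_object(linked_obs, threshold_kt=50.0):
    """linked_obs: list of (distance_km, lead_time_min, speed_kt) tuples for
    wind observations linked to one storm object.  Returns a dict mapping
    each (buffer distance, lead-time window) to a 0/1 label: 1 if any linked
    observation within the buffer and window exceeds the speed threshold."""
    labels = {}
    for buffer_km in BUFFER_DISTANCES_KM:
        for t_min, t_max in LEAD_TIME_WINDOWS_MIN:
            labels[(buffer_km, (t_min, t_max))] = int(any(
                speed > threshold_kt
                for distance, lead_time, speed in linked_obs
                if distance <= buffer_km and t_min <= lead_time < t_max))
    return labels
```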
For each buffer distance and lead time, two models are trained: a base model, which is an ensemble of gradient-boosted trees (GBTs), and an isotonic-regression model, which calibrates the GBT probabilities to make forecasts more reliable.
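One way to sketch this pairing in scikit-learn (an assumption on our part — the paper does not name its software) is to wrap a gradient-boosting classifier in `CalibratedClassifierCV` with isotonic calibration; the synthetic data below stand in for the real predictors and labels:

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the predictor matrix and 0/1 damaging-wind labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
y = (X[:, 0] + rng.normal(scale=0.5, size=2000) > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base = GradientBoostingClassifier(random_state=0)
# Isotonic regression is fit on held-out folds to calibrate GBT probabilities.
model = CalibratedClassifierCV(base, method="isotonic", cv=3)
model.fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]  # calibrated event probabilities
```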
Forecasts are evaluated with receiver operating characteristic (ROC) curves, performance diagrams, and attributes diagrams, which we show for each buffer distance and lead time. From these diagrams we focus mainly on three numbers: area under the ROC curve (AUC), maximum critical success index (CSI), and reliability. All three range from 0 to 1; the perfect score is 0 for reliability and 1 for the other two. For the best model (buffer of 0 km and lead time of 0-15 minutes) we achieve an AUC of 0.97, maximum CSI of 0.91, and reliability of 0.0036. For the worst model (10 km and 60-90 minutes) we achieve an AUC of 0.87, maximum CSI of 0.18, and reliability of 0.0001.
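The two less standard scores can be sketched as follows (a minimal illustration with our own function names; AUC itself is available off the shelf, e.g. `sklearn.metrics.roc_auc_score`). Maximum CSI is the best CSI over probability thresholds, and reliability is the corresponding term of the Brier-score decomposition:

```python
import numpy as np

def max_csi(y_true, probs, thresholds=np.linspace(0, 1, 101)):
    """Maximum critical success index over probability thresholds,
    where CSI = hits / (hits + misses + false alarms)."""
    best = 0.0
    for t in thresholds:
        pred = probs >= t
        hits = np.sum(pred & (y_true == 1))
        misses = np.sum(~pred & (y_true == 1))
        false_alarms = np.sum(pred & (y_true == 0))
        denom = hits + misses + false_alarms
        if denom > 0:
            best = max(best, hits / denom)
    return best

def reliability(y_true, probs, num_bins=10):
    """Reliability term of the Brier-score decomposition: squared difference
    between mean forecast probability and conditional event frequency in each
    bin, weighted by bin population (0 is a perfect score)."""
    bin_ids = np.minimum((probs * num_bins).astype(int), num_bins - 1)
    total = 0.0
    for b in range(num_bins):
        mask = bin_ids == b
        if mask.any():
            total += mask.sum() * (probs[mask].mean() - y_true[mask].mean()) ** 2
    return total / len(y_true)
```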
Finally, we present estimates of predictor importance from J-measure ranking and sequential forward selection, which provide physical insight into how our ML models work.
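Sequential forward selection can be sketched as follows — a generic greedy version with hypothetical names, using a simple logistic-regression scorer rather than the paper's actual model and scoring rule:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def sequential_forward_selection(X, y, num_predictors, make_model=None):
    """Greedily add the predictor column that most improves cross-validated
    score; returns the selected column indices in order of addition.  The
    order of addition is itself a ranking of predictor importance."""
    if make_model is None:
        make_model = lambda: LogisticRegression(max_iter=1000)
    selected = []
    remaining = list(range(X.shape[1]))
    while len(selected) < num_predictors and remaining:
        # Score each candidate predictor together with those already chosen.
        scores = {
            j: cross_val_score(make_model(), X[:, selected + [j]], y, cv=3).mean()
            for j in remaining}
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected
```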
Blouin, Karen D., et al. "Ensemble Lightning Prediction Models for the Province of Alberta, Canada." International Journal of Wildland Fire 25.4 (2016): 421-432.
Gagne II, David John, et al. "Day-Ahead Hail Prediction Integrating Machine Learning with Storm-Scale Numerical Weather Models." AAAI. 2015.
Halbert, K. T., W. G. Blumberg, and P. T. Marsh. "SHARPpy: Fueling the Python Cult." Preprints, 5th Symposium on Advances in Modeling and Analysis Using Python, Phoenix, AZ. 2015.
Lakshmanan, Valliappa, and Travis Smith. "Evaluating a Storm Tracking Algorithm." 26th Conference on Interactive Information and Processing Systems (IIPS) for Meteorology, Oceanography, and Hydrology. 2010.
Lakshmanan, Valliappa, Benjamin Herzog, and Darrel Kingfield. "A Method for Extracting Postevent Storm Tracks." Journal of Applied Meteorology and Climatology 54.2 (2015): 451-462.
McGovern, Amy, D. H. Rosendahl, and R. A. Brown. "Toward Understanding Tornado Formation Through Spatiotemporal Data Mining." Data Mining for Geoinformatics: Methods and Applications, eds. Guido Cervone, Jessica Lin, and Nigel Waters. New York: Springer. 29-47.
National Severe Storms Laboratory. "Severe Weather 101: Damaging Winds Basics." n.d. Web. Accessed 11 August 2015.
Williams, John K. "Using Random Forests to Diagnose Aviation Turbulence." Machine Learning 95.1 (2014): 51-70.