15.3 Producing Machine Learning-Based Severe Weather Guidance at Watch-to-Warning Lead Times

Thursday, 20 July 2023: 2:30 PM
Madison Ballroom A (Monona Terrace)
Sam Varga, Univ. of Oklahoma, Norman, OK; Cooperative Institute for Severe and High-Impact Weather Research and Operations, Univ. of Oklahoma, Norman, OK; NSSL, Norman, OK; and M. L. Flora and C. K. Potvin

Prior research by Clark & Loken (2022) and Flora et al. (2021) demonstrated the efficacy of machine learning (ML) techniques for producing severe weather probabilities from the Warn-on-Forecast System, an experimental ensemble-based convection-allowing model system. However, these studies focused primarily on short lead times (0-3 h). In this study, we aim to extend ML-based severe weather guidance to longer lead times (2-6 h) by adopting previously applied techniques and implementing novel approaches. These lead times are critical because they fall within the watch-to-warning range, where numerical guidance is limited.

For this study, we employ two feature engineering approaches: one previously developed by Clark & Loken (2022) and a new approach that mimics the multi-resolution feature engineering used in deep learning methods. Clark & Loken’s (2022) approach uses a combination of features at a single scale and fields that are spatially diffused with Gaussian smoothers; the smoothed 2-5 km updraft helicity (UH) fields from individual ensemble members are included as predictors. Our novel feature generation approach pools grid points over multiple spatial scales and then filters the upscaled fields, applying maximum-value filters to intrastorm variables and average-value filters to environmental variables (see the sketch below). These feature sets are used to train three archetypes of ML models: logistic regression (LR), random forests (RF), and histogram-based gradient-boosted trees (HGBT). Our goal is to predict any severe weather (hail, wind, or tornado), and we use NOAA's Storm Events Database as the target data. To assess how ML model skill varies with scale and to determine spatial predictability limits, we use three target matching distances: 36 km, 18 km, and 9 km.
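For illustration only, the following minimal Python sketch shows one way the multi-resolution feature engineering described above could be implemented, assuming the fields are available as 2-D NumPy arrays on a 3-km grid. The field names, scales, and synthetic data are hypothetical placeholders, not the study's actual configuration.

```python
# Hypothetical sketch of multi-scale feature pooling and HGBT training.
import numpy as np
from scipy.ndimage import maximum_filter, uniform_filter
from sklearn.ensemble import HistGradientBoostingClassifier

def upscale_features(intrastorm, environment, scales_km=(9, 18, 36), dx_km=3.0):
    """Pool intrastorm fields with a max filter and environmental fields with
    an average filter over several spatial scales (illustrative values)."""
    features = {}
    for name, field in intrastorm.items():
        for scale in scales_km:
            size = max(1, int(scale / dx_km))
            features[f"{name}_max_{scale}km"] = maximum_filter(field, size=size)
    for name, field in environment.items():
        for scale in scales_km:
            size = max(1, int(scale / dx_km))
            features[f"{name}_mean_{scale}km"] = uniform_filter(field, size=size)
    return features

# Synthetic data standing in for WoFS ensemble output on a 300 x 300 grid.
rng = np.random.default_rng(0)
intrastorm = {"uh_2to5km": rng.gamma(2.0, 20.0, size=(300, 300))}
environment = {"cape_ml": rng.uniform(0, 4000, size=(300, 300))}
feats = upscale_features(intrastorm, environment)

# Stack per-grid-point feature vectors and fit one of the three model archetypes.
X = np.stack([f.ravel() for f in feats.values()], axis=1)
y = rng.integers(0, 2, size=X.shape[0])        # placeholder severe-weather labels
model = HistGradientBoostingClassifier(max_iter=100).fit(X, y)
probs = model.predict_proba(X)[:, 1]           # gridded severe-weather probabilities
```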

Our models’ performance is compared to a baseline: the neighborhood maximum ensemble probability (NMEP) of exceeding the most skillful 2-5 km UH threshold. An example of output produced by the novel framework is shown in the accompanying figure. Model performance is evaluated with receiver operating characteristic diagrams, performance diagrams, and attributes diagrams. Preliminary results indicate that the two feature engineering frameworks yield little difference in skill, with both outperforming the baseline. Of the three ML archetypes evaluated, LR and HGBT consistently show the highest performance in both frameworks; these models are more likely than the UH baseline to correctly predict the occurrence of an event while maintaining the same reliability. However, predictive skill decreases at smaller target matching scales. Future research directions include training ML models for individual convective hazards and exploring their behavior with explainability techniques.
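As a point of reference, the sketch below illustrates one common way to compute an NMEP baseline and score it alongside ML probabilities with ROC area. The UH threshold, neighborhood radius, ensemble size, and synthetic labels are illustrative assumptions, not values from the study.

```python
# Illustrative NMEP baseline: fraction of ensemble members whose
# neighborhood-maximum 2-5 km UH exceeds a threshold (assumed values).
import numpy as np
from scipy.ndimage import maximum_filter
from sklearn.metrics import roc_auc_score

def nmep(uh_members, threshold=60.0, radius_km=18.0, dx_km=3.0):
    """uh_members: array of shape (n_members, ny, nx) of 2-5 km UH."""
    size = int(2 * radius_km / dx_km) + 1       # neighborhood width in grid points
    exceed = np.stack([maximum_filter(m, size=size) >= threshold for m in uh_members])
    return exceed.mean(axis=0)                  # gridded probability in [0, 1]

# Synthetic example standing in for an 18-member ensemble and matched reports.
rng = np.random.default_rng(1)
uh = rng.gamma(2.0, 25.0, size=(18, 300, 300))
baseline = nmep(uh)
y_true = rng.integers(0, 2, size=baseline.size)  # placeholder observed events
print("Baseline ROC AUC:", roc_auc_score(y_true, baseline.ravel()))
```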
