The research begins with a “cell track” dataset containing 72 potential predictor fields for convective storms that did or did not produce severe weather (hail ≥0.25” in diameter, winds ≥25 m s⁻¹, or tornadoes). A total of 2,559 individual storm cells were tracked at 5-min intervals on 8 separate days in 2013, 2014 and 2015. Each cell lasted from 15 min to over 2 hours. A machine learning approach (random forest; RF) was used to assess predictor importance for the occurrence of severe weather. Within the RF classification, each 5-min time step is treated as an “event” in the variable importance portion of the analysis, so 114,265 separate events are considered. The radar data in the cell track dataset are WSR-88D observations processed through the GridRad algorithm (GridRad.org) and include a wide variety of reflectivity and polarimetric fields that describe storm-scale processes, including updraft strength, hail signatures, and kinematics. GOES-14 data at 1-min resolution were used to develop overshooting cloud-top (Bedka and Khlopenkov 2016) and cloud-top divergence and vorticity (Apke et al. 2016) fields for the tracked convective cells, while all lightning data were from the Earth Networks Total Lightning Network (ENTLN). NWP fields were obtained from the North American Regional Reanalysis (NARR) and were limited to fields that help characterize the storm-scale environment (e.g., 0-1 km shear, helicity). The RF classification algorithm used is part of the TreeBagger package in MATLAB 2017a.
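The variable importance step can be illustrated with a small sketch. The Python below is not the TreeBagger code used in the study. It swaps in an ensemble of bagged one-split decision stumps as a stand-in for full trees, runs on synthetic data, and measures importance as the mean increase in out-of-bag (OOB) error when one predictor is permuted. All function names, ensemble settings, and data values are hypothetical.

```python
import random

def predict(stump, x):
    """Classify one event: 1 = severe, 0 = non-severe."""
    f, t, polarity = stump
    # polarity 1 predicts severe when x[f] > t; polarity 0 predicts the reverse
    return int((x[f] > t) == bool(polarity))

def error_rate(stump, X, y):
    """Fraction of events the stump misclassifies."""
    return sum(predict(stump, x) != yi for x, yi in zip(X, y)) / len(y)

def fit_stump(X, y, rows, features):
    """Exhaustively pick the (feature, threshold, polarity) with fewest errors on `rows`."""
    best = None
    for f in features:
        for t in sorted({X[i][f] for i in rows}):
            for polarity in (1, 0):
                errors = sum(predict((f, t, polarity), X[i]) != y[i] for i in rows)
                if best is None or errors < best[0]:
                    best = (errors, f, t, polarity)
    return best[1:]

def oob_permutation_importance(X, y, n_trees=20, mtry=2, seed=0):
    """Mean increase in OOB error per predictor after permuting that predictor."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    totals = [0.0] * p
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]      # bootstrap sample of events
        in_bag = set(boot)
        oob = [i for i in range(n) if i not in in_bag]   # events this stump never saw
        stump = fit_stump(X, y, boot, rng.sample(range(p), mtry))
        y_oob = [y[i] for i in oob]
        base = error_rate(stump, [X[i] for i in oob], y_oob)
        for f in range(p):
            shuffled = [X[i][f] for i in oob]
            rng.shuffle(shuffled)                        # break the link between f and y
            X_perm = [X[i][:f] + [v] + X[i][f + 1:] for i, v in zip(oob, shuffled)]
            totals[f] += error_rate(stump, X_perm, y_oob) - base
    return [t / n_trees for t in totals]

def make_synthetic_events(n=150, seed=1):
    """Synthetic events: column 0 determines the label, columns 1 and 2 are noise."""
    rng = random.Random(seed)
    X = [[rng.random(), rng.random(), rng.random()] for _ in range(n)]
    y = [int(row[0] > 0.5) for row in X]
    return X, y

X, y = make_synthetic_events()
importance = oob_permutation_importance(X, y)
print(importance)
```

In this toy setup the informative column should receive a clearly larger score than the two noise columns. A predictor that a stump never uses scores exactly zero for that stump, which is why noise predictors stay near zero.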
In the presentation, several approaches to down-selecting predictor fields that show a stronger relationship to observed severe weather will be discussed: (a) evaluating all 72 predictor fields at once, (b) evaluating all radar fields alone, (c) evaluating all satellite and lightning fields together, and (d) performing a statistically based feature down-selection process. Results to date show that five fields are the most important for identifying and/or monitoring severe storms: correlation coefficient at high (>45 dBZ) reflectivity, NEXRAD radar-retrieved implied ascent, Storm Labeling in Three Dimensions (SL3D; Starzec et al. 2017)-classified convection, hail volume, and hail presence at 3 km AGL. Lightning flash density, tall overshooting tops, satellite-estimated anvil-level vorticity and divergence, and NARR 0-1 km and 0-3 km helicity were also found to be important. The end goal of this project is to couple the RF-based algorithm to the University of Alabama in Huntsville (UAH) Severe CI algorithm to predict pending severe storm development and to determine the likelihood of severe weather for ongoing convection.
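The grouping of predictors into experiments (a) through (c), and one possible form of a statistical down-selection for (d), can be sketched as follows. The abstract does not say which statistical procedure was used, so the greedy correlation filter here is purely an assumption for illustration. The field names, source labels, threshold, and data values are likewise hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (v - mb) for x, v in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / sqrt(va * vb) if va and vb else 0.0

def experiment_sets(sources):
    """Group predictor names by data source, mirroring experiments (a) to (c)."""
    return {
        "a_all": list(sources),
        "b_radar": [k for k, s in sources.items() if s == "radar"],
        "c_satellite_lightning": [k for k, s in sources.items()
                                  if s in ("satellite", "lightning")],
    }

def drop_redundant(columns, threshold=0.9):
    """Assumed stand-in for (d): keep a field only if |r| with every kept field is below threshold."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical predictor names and made-up values, one value per event.
sources = {
    "hail_volume": "radar",
    "hail_volume_scaled": "radar",
    "flash_density": "lightning",
    "anvil_divergence": "satellite",
}
columns = {
    "hail_volume": [1, 2, 3, 4, 5],
    "hail_volume_scaled": [2, 4, 6, 8, 10],   # exact multiple of hail_volume
    "flash_density": [5, 1, 4, 2, 3],
    "anvil_divergence": [2, 5, 1, 3, 4],
}

sets = experiment_sets(sources)
selected = drop_redundant(columns)
print(sets)
print(selected)
```

Each resulting list of names would then define the predictor columns passed to a separate RF run, so that importance rankings can be compared across experiments.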