Tornadic supercell analysis from Oklahoma mesonet and proximity sounding observations: a spatiotemporal relational data mining approach

Tuesday, 25 January 2011: 11:45 AM
2A (Washington State Convention Center)
David John Gagne II, University of Oklahoma, Norman, OK; and A. McGovern, J. B. Basara, and R. A. Brown

The complex relationships between a supercell thunderstorm and its environment have a major influence on whether the storm will produce a tornado. Examining observations of the supercell environment can reveal some of these relationships, but what can be resolved is limited by the spatial and temporal distributions of the observations. Surface observations provide wide spatial and high temporal coverage of the lowest level of the environment but cannot reveal any of the upper-level complexity. Proximity soundings capture a snapshot of a storm's surrounding environment but are limited to a single location at a single time. To balance the trade-offs of the two observing systems, this study combines surface observations surrounding the storm and nearby surface boundaries with proximity soundings for 926 cases of tornadic and non-tornadic supercells in Oklahoma from 1994 to 2003. Surface observations from the Oklahoma Mesonet capture the variations of the environment from an hour prior to initiation through the entire life of the storm. The closest representative sounding to each storm is also sampled. The data are stored in a spatiotemporal relational framework where the storm, boundaries, and soundings are objects containing temporally and spatially varying attributes. The objects are also connected to one another by spatial relationships that vary over time.
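
As a rough illustration of the spatiotemporal relational framework described above, the following Python sketch shows one way such a case could be laid out: objects with time-indexed attributes, linked by relationships whose values also change over time. The class names (STObject, STRelation, Case), the attribute names, and all numeric values are hypothetical placeholders chosen for this example; they are not taken from the study's software or data.

```python
from dataclasses import dataclass, field


@dataclass
class STObject:
    """A storm, boundary, or sounding with time-varying attributes (hypothetical layout)."""
    name: str
    kind: str
    # attribute name -> {time in minutes relative to storm initiation: value}
    attributes: dict = field(default_factory=dict)

    def value_at(self, attr, t):
        """Return the most recent value of attr observed at or before time t."""
        series = self.attributes.get(attr, {})
        times = [s for s in series if s <= t]
        return series[max(times)] if times else None


@dataclass
class STRelation:
    """A spatial relationship between two objects whose value varies over time."""
    source: str
    target: str
    kind: str
    values: dict = field(default_factory=dict)  # time in minutes -> value


@dataclass
class Case:
    """One supercell case: its objects, their relationships, and the label."""
    objects: dict
    relations: list
    tornadic: bool


# Placeholder values for illustration only; not observations from the study.
storm = STObject("storm_1", "storm", {"temperature_c": {0: 24.0, 5: 24.5}})
boundary = STObject("boundary_1", "boundary", {"orientation_deg": {0: 45.0}})
nearby = STRelation("storm_1", "boundary_1", "distance_km", {0: 30.0, 5: 22.0})
case = Case({"storm_1": storm, "boundary_1": boundary}, [nearby], tornadic=False)
```

Keeping each attribute as a time-indexed series, rather than a single value, is what lets a query ask about a storm's environment at a given moment, for example storm.value_at("temperature_c", 7).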

The spatiotemporal relational cases are used to train a Spatiotemporal Relational Random Forest (SRRF), an ensemble machine learning model. The SRRF is built from Spatiotemporal Relational Probability Trees (SRPTs), each trained on a bootstrap resample of the cases and grown using a standard greedy decision tree algorithm. Each tree provides a probability that a storm is tornadic, and the forest assigns the storm to the class with the higher mean probability across all trees. SRRFs were evaluated based on their Gerrity Skill Score (GSS) and Area Under the relative operating characteristic Curve (AUC). With under-sampling of the negative cases to balance the training data, the best SRRFs produced a mean GSS greater than 0.3 and an AUC greater than 0.7. Variable importance evaluation found that temperature, moisture variables, storm duration, and the movement of storms relative to nearby boundaries had the most impact in differentiating tornadic from non-tornadic supercells. The importance measures indicate that the forests can distinguish different supercell environments, such as moist, unstable, moderate-shear environments, and, given such an environment, provide a reliable estimate of the chance of a tornadic supercell.
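
The training and evaluation steps named above (under-sampling the negative cases, bootstrap resampling, averaging the trees' probabilities, and scoring with AUC) can be sketched in plain Python. This is a minimal illustration under strong simplifying assumptions, not the SRRF itself: each "tree" here is a one-split stump on a single numeric feature, standing in for a Spatiotemporal Relational Probability Tree, and every function name is hypothetical.

```python
import random


def undersample(cases, rng):
    """Randomly drop negative cases until the two classes are balanced."""
    pos = [c for c in cases if c[1] == 1]
    neg = [c for c in cases if c[1] == 0]
    if len(neg) > len(pos):
        neg = rng.sample(neg, len(pos))
    return pos + neg


def bootstrap(cases, rng):
    """Draw len(cases) cases with replacement."""
    return [rng.choice(cases) for _ in cases]


def fit_stump(cases):
    """Greedy single split on a scalar feature (a stand-in for an SRPT).

    Returns a function mapping a feature value to P(tornadic).
    """
    best = None
    for thr in sorted({x for x, _ in cases}):
        left = [y for x, y in cases if x <= thr]
        right = [y for x, y in cases if x > thr]
        if not left or not right:
            continue
        p_l, p_r = sum(left) / len(left), sum(right) / len(right)
        # Size-weighted Gini-style impurity of the two branches.
        score = len(left) * p_l * (1 - p_l) + len(right) * p_r * (1 - p_r)
        if best is None or score < best[0]:
            best = (score, thr, p_l, p_r)
    if best is None:  # no valid split: predict the base rate
        p = sum(y for _, y in cases) / len(cases)
        return lambda x: p
    _, thr, p_l, p_r = best
    return lambda x: p_l if x <= thr else p_r


def fit_forest(cases, n_trees, rng):
    """Train each stump on its own bootstrap resample of the cases."""
    return [fit_stump(bootstrap(cases, rng)) for _ in range(n_trees)]


def forest_probability(forest, x):
    """Mean of the trees' tornadic probabilities."""
    return sum(tree(x) for tree in forest) / len(forest)


def auc(scores, labels):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In this sketch a storm would be called tornadic when forest_probability exceeds 0.5, which for two classes is the same as choosing the class with the higher mean probability. Under-sampling is applied to the training cases only, so the evaluation scores still reflect the original class imbalance.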