Tornadic supercell analysis from Oklahoma mesonet and proximity sounding observations: a spatiotemporal relational data mining approach
The spatiotemporal relational cases are used to train a Spatiotemporal Relational Random Forest (SRRF), an ensemble machine learning model. The SRRF is built from Spatiotemporal Relational Probability Trees (SRPTs), each trained on a bootstrap resample of the cases and grown using a standard greedy decision tree algorithm. Each tree provides a probability that a storm is tornadic, and the class with the higher mean probability across all trees determines the forest's decision. SRRFs were evaluated using the Gerrity Skill Score (GSS) and the Area Under the relative operating characteristic Curve (AUC). When the negative cases were under-sampled to balance the training data, the best SRRFs produced a mean GSS greater than 0.3 and an AUC greater than 0.7. Variable importance evaluation found that temperature, moisture variables, storm duration, and the movement of storms relative to nearby boundaries had the most impact in differentiating tornadic from non-tornadic supercells. The importance measures indicate that the forests can distinguish different supercell environments, such as moist, unstable, moderate-shear environments, and, given that environment, provide a reliable estimate of the chance of a tornadic supercell.
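
The forest vote, the under-sampling step, and the two scores described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SRRF implementation: the trees are represented only by the probabilities they output, all function names are hypothetical, the 0.5 threshold assumes each tree's tornadic and non-tornadic probabilities sum to one, and the GSS is computed in its two-category form, where it reduces to the Peirce skill score (probability of detection minus probability of false detection).

```python
import random


def undersample(cases, labels, rng):
    """Randomly drop negative cases until both classes are the same size."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    keep = pos + rng.sample(neg, min(len(pos), len(neg)))
    return [cases[i] for i in keep], [labels[i] for i in keep]


def forest_probability(tree_probs):
    """Mean of the per-tree tornadic probabilities for one storm."""
    return sum(tree_probs) / len(tree_probs)


def forest_decision(tree_probs):
    """Pick the class with the higher mean probability (assumed two-class)."""
    return 1 if forest_probability(tree_probs) > 0.5 else 0


def gerrity_two_class(y_true, y_pred):
    """Two-category GSS, which equals the Peirce skill score: POD - POFD."""
    pairs = list(zip(y_true, y_pred))
    hits = sum(1 for t, p in pairs if t == 1 and p == 1)
    misses = sum(1 for t, p in pairs if t == 1 and p == 0)
    false_alarms = sum(1 for t, p in pairs if t == 0 and p == 1)
    correct_nulls = sum(1 for t, p in pairs if t == 0 and p == 0)
    pod = hits / (hits + misses)
    pofd = false_alarms / (false_alarms + correct_nulls)
    return pod - pofd


def auc(y_true, scores):
    """Chance that a random tornadic case outscores a random non-tornadic one.

    Ties count as half a win; this equals the area under the ROC curve.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up per-tree probabilities for four storms (illustrative only).
    storms = [[0.9, 0.6, 0.3], [0.2, 0.4, 0.3], [0.7, 0.8, 0.6], [0.1, 0.5, 0.3]]
    truth = [1, 1, 1, 0]
    scores = [forest_probability(s) for s in storms]
    decisions = [forest_decision(s) for s in storms]
    print("GSS:", gerrity_two_class(truth, decisions))
    print("AUC:", auc(truth, scores))
```

Averaging probabilities, rather than counting hard votes, keeps a continuous forest score, which is what the AUC needs; the GSS is then computed from the thresholded decisions.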