Abstract: Feature selection of radar-derived tornado attributes with support vector machines (85th AMS Annual Meeting)

Tuesday, 11 January 2005: 9:00 AM


Michael B. Richman, University of Oklahoma, Norman, OK; and B. Santosa and T. B. Trafalis


Tornado circulation attributes derived largely from the National Severe Storms Laboratory Mesocyclone Detection Algorithm (MDA) have been investigated for their efficacy in distinguishing mesocyclones that become tornadic from those that do not. Using a subset of the MDA attributes associated with velocity yields 23 potential predictors. Previous research has shown that several of the predictors discriminate poorly and that the predictor pool contains several highly associated subsets of variables. Despite these drawbacks, artificial neural networks (ANN) and support vector machines (SVM) have met with success in correctly predicting pre-tornadic circulations. One of the largest challenges in this regard is to maintain a high probability of detection (POD) while simultaneously minimizing the false alarm rate (FAR).
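For reference, POD and FAR can be computed from a 2x2 contingency table of predicted versus observed tornadic circulations. The sketch below assumes the false alarm ratio definition common in forecast verification, false alarms / (hits + false alarms); the abstract does not state which definition was used. The function name and the counts are illustrative only and are not taken from the study.

```python
def pod_far(hits, misses, false_alarms):
    """Return (POD, FAR) from contingency-table counts.

    Assumption: FAR is the false alarm ratio,
    false_alarms / (hits + false_alarms).
    """
    observed_events = hits + misses        # all observed tornadic cases
    predicted_events = hits + false_alarms  # all cases predicted tornadic
    pod = hits / observed_events if observed_events else float("nan")
    far = false_alarms / predicted_events if predicted_events else float("nan")
    return pod, far


# Illustrative counts, not data from the study.
pod, far = pod_far(hits=40, misses=10, false_alarms=10)
print(f"POD = {pod:.2f}, FAR = {far:.2f}")
```

Raising POD usually means flagging more circulations as tornadic, which tends to raise FAR as well; this is the trade-off described above.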

SVM is a non-linear classifier in the input space and, accordingly, the use of linear statistics to screen the predictor pool a priori may not be logically consistent. In this research, the impact of removing individual predictors on the training and testing errors is examined. The model was trained on a 50 percent tornado, 50 percent non-tornado ratio and was tested on a 2 percent tornado, 98 percent non-tornado ratio. Results were encouraging, as exclusion of specific variables had a notable impact on the ability to accurately distinguish the tornadic from the non-tornadic circulations when viewed in terms of misclassification rates, POD, FAR, and Heidke skill. A key finding is that the most accurate set of features tested combines the current month number (1 = January, 2 = February, …) in the testing data with a subset of the MDA predictors used in SVM. The methodology used for feature selection outperforms SVM based on the MDA alone, achieving a Heidke skill of 0.84 with a POD of 0.82 and a FAR of 0.14.
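The idea of examining the impact of removing individual predictors can be sketched as a leave-one-predictor-out loop scored with the Heidke skill score. This is a minimal illustration, not the authors' implementation: `evaluate` is a hypothetical user-supplied callable that would train and test a classifier (for example an SVM) on the given features and return a skill score, and the feature names other than the month number, as well as the toy scores, are invented for the example.

```python
def heidke_skill(hits, false_alarms, misses, correct_nulls):
    """Heidke skill score for a 2x2 contingency table."""
    a, b, c, d = hits, false_alarms, misses, correct_nulls
    denom = (a + c) * (c + d) + (a + b) * (b + d)
    return 2.0 * (a * d - b * c) / denom if denom else float("nan")


def leave_one_out_ranking(features, evaluate):
    """Score the full feature set, then each set with one feature removed.

    Returns (baseline, deltas), where deltas is a list of
    (feature, score_without_feature - baseline), sorted so that the
    features whose removal improves the score most come first.
    """
    baseline = evaluate(list(features))
    deltas = []
    for name in features:
        reduced = [f for f in features if f != name]
        deltas.append((name, evaluate(reduced) - baseline))
    deltas.sort(key=lambda item: item[1], reverse=True)
    return baseline, deltas


def toy_evaluate(feats):
    """Toy stand-in for training and testing a classifier (hypothetical scores)."""
    return 0.5 + 0.1 * ("month" in feats) - 0.05 * ("noisy" in feats)


# "shear" and "noisy" are made-up predictor names for illustration.
baseline, deltas = leave_one_out_ranking(["month", "shear", "noisy"], toy_evaluate)
for name, delta in deltas:
    print(f"removing {name}: change in score {delta:+.2f}")
```

In practice `evaluate` would compute the contingency table on a held-out test set and return `heidke_skill(...)`; a positive change indicates a predictor whose removal helps, and a negative change one worth keeping.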
