A Review of Support Vector Machine Performance on Tropical Cyclone Intensity Prediction with Imbalanced Datasets

Ko, Mu-Chieh

In tropical cyclone (TC) cases, rapid intensification (RI) events are relatively rare, so any machine learning (ML) approach to RI prediction must deal with imbalanced classes. Numerous techniques address this issue: under-sampling of the majority class (e.g., Tomek Links, Edited Nearest Neighbor, and Neighborhood Cleaning Rule), over-sampling of the minority class (e.g., direct duplication, Boosting, and SMOTE), and combinations of under- and over-sampling. This study investigates these methods, seeking ways to minimize errors (especially false negatives) when predicting TC intensity changes with Support Vector Machines (SVM), one of the most powerful ML techniques currently in use. The goal is to train and test an SVM model with analysis data from the 2018 version of the operational Hurricane Weather Research and Forecasting (HWRF) model. The dataset contains more than 4000 cycles from all TCs in the period from 2015 to 2018. It is storm-centered and assimilates the majority of the available observational data onto a 3-km grid. The data are categorized into two classes, intensified vs. non-intensified, based on the maximum wind-speed changes within 24 hours in the HURDAT2 dataset provided by the National Hurricane Center. This study further inspects SVM's potential for distinguishing between the RI and non-RI classes. The dataset with the best performance will serve as a standard dataset for further studies on ML approaches in TC intensification research.
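The labeling and over-sampling steps described above can be illustrated with a minimal sketch. It is not the study's implementation: the 30-kt threshold, the two-feature sample points, and the function names `label_ri`, `smote_like`, and `false_negative_rate` are all assumptions made for illustration, and the interpolation routine is a simplified stand-in for SMOTE rather than a library implementation.

```python
import math
import random


def label_ri(wind_change_24h_kt, threshold_kt=30.0):
    """Return 1 (RI) if the 24-h maximum wind change meets the threshold, else 0.

    The 30-kt default is an assumption; the abstract does not state a threshold.
    """
    return 1 if wind_change_24h_kt >= threshold_kt else 0


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points (needs at least two samples).

    Each new point is interpolated between a randomly chosen minority sample
    and one of its k nearest minority neighbours, as in SMOTE.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(b + t * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def false_negative_rate(y_true, y_pred):
    """Fraction of true RI cases (label 1) that were predicted as non-RI."""
    predictions_for_ri = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not predictions_for_ri:
        return 0.0
    return predictions_for_ri.count(0) / len(predictions_for_ri)


# Hypothetical 24-h wind changes in knots; RI cases are the minority.
changes = [5.0, 35.0, -10.0, 30.0, 12.0, 40.0, 0.0, 8.0]
labels = [label_ri(c) for c in changes]

# Hypothetical two-feature minority samples, over-sampled with two new points.
minority = [(35.0, 0.2), (30.0, 0.4), (40.0, 0.1)]
synthetic = smote_like(minority, n_new=2)
balanced_minority = minority + synthetic
```

Because every synthetic point lies on a segment between two existing RI samples, the minority class grows without exact duplication. The false-negative rate is the natural score to track here, since missed RI events are the errors the study most wants to reduce.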

J43.3 A Review of Support Vector Machine Performance on Tropical Cyclone Intensity Prediction with Imbalanced Datasets