Labels for the training data were provided by Smith et al. (2012) and Obermeier. Storm characteristics were drawn from radar data provided by the Multi-Year Reanalysis of Remotely Sensed Storms (MYRORSS, Ortega et al. 2012). Near-Storm Environment (NSE) data were also created by interpolating soundings from RUC/RAP archives to the storm location using SHARPpy (Blumberg et al. 2017).
This talk is the first of two highlighting different machine learning approaches to classification. In this talk, we focus on tree-based classification systems, including Random Forests (Breiman 2001) and Gradient Boosted Classifiers (Friedman 2002). These systems achieve a preliminary Peirce Skill Score of 0.54 when trained and tested on data from 2011. We are currently augmenting the training set to include data from 2008 through 2010.
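For readers unfamiliar with the metric, the Peirce Skill Score can be sketched as follows. This is a minimal illustration of the standard two-class form (probability of detection minus probability of false detection); the abstract does not state whether the two-class or a multi-category form was used, so the binary form here is an assumption. The function name and the counts in the usage lines are hypothetical and are not results from this work.

```python
def peirce_skill_score(hits, misses, false_alarms, correct_negatives):
    """Two-class Peirce Skill Score: PSS = POD - POFD.

    POD  = hits / (hits + misses)
    POFD = false_alarms / (false_alarms + correct_negatives)
    """
    events = hits + misses
    non_events = false_alarms + correct_negatives
    if events == 0 or non_events == 0:
        # The score is undefined if either class is absent from the sample.
        raise ValueError("PSS is undefined when either class is absent")
    pod = hits / events
    pofd = false_alarms / non_events
    return pod - pofd


# Made-up counts for illustration only (not data from this study).
score = peirce_skill_score(hits=40, misses=10,
                           false_alarms=20, correct_negatives=80)
print(round(score, 2))
```

A score of 1 indicates perfect discrimination, 0 indicates no skill relative to random or constant forecasts, and negative values indicate worse-than-random performance.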
A preliminary version of this system was tested in real time in NOAA’s Hazardous Weather Testbed (HWT) in the summer of 2017, specifically within Probabilistic Hazard Information (PHI; Karstens et al. 2015).