Our training period consists of all days during 2004-11, excluding 2009 (the seven years of best data availability), with at least 100 SPC (Storm Prediction Center) reports of damaging non-tornadic winds in the continental United States. There are 315 such days. Three types of training data are used: radar data from MYRORSS (Multi-Year Reanalysis of Remotely Sensed Storms); mesoscale model data from NARR (North American Regional Reanalysis); and surface wind observations from MADIS (Meteorological Assimilation Data Ingest System), one-minute METARs, and the Oklahoma Mesonet.
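The day-selection criterion above can be sketched as a simple per-day count and filter. This is a minimal illustration, not the actual processing code; the table layout and column names (`date`, `magnitude_kt`) are assumptions.

```python
import pandas as pd

# Hypothetical SPC wind-report table: one row per non-tornadic wind report.
# Column names are illustrative assumptions, not the real SPC schema.
reports = pd.DataFrame({
    "date": ["2004-05-30"] * 120 + ["2004-06-01"] * 40,
    "magnitude_kt": 50.0,
})

# Count reports per day and keep only days with at least 100 reports.
reports_per_day = reports.groupby("date").size()
training_days = reports_per_day[reports_per_day >= 100].index.tolist()
print(training_days)  # -> ['2004-05-30']
```

Applied to the full 2004-11 report archive (minus 2009), this filter would yield the 315 training days described above.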
First, storm cells are identified and tracked through time, using the algorithm w2segmotionll (Lakshmanan and Smith 2010) from WDSS-II (Warning Decision Support System – Integrated Information). At each time step, a storm cell is described by its centroid and bounding polygon. Second, for each storm cell, surface wind observations within 10 km of the bounding polygon are attributed to that cell. Finally, four types of features are calculated: statistics for radar variables within the bounding polygon; statistics describing the shape of the bounding polygon; basic storm information (e.g., area, speed of motion, direction of motion); and sounding indices. Sounding indices are calculated by the SHARPpy software (Halbert et al. 2015), from a proxy sounding created by interpolating NARR data to the storm centroid. Over 400 features are calculated and used as predictors in machine learning.
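The attribution step (observations within 10 km of a cell's bounding polygon) can be sketched with a point-to-polygon distance test. This is a toy illustration in a local projected coordinate system with units of km; the polygon, observations, and function names are hypothetical, and the real pipeline would work with geographic coordinates.

```python
import math

def point_in_polygon(px, py, verts):
    """Ray-casting test: is (px, py) inside the simple polygon `verts`?"""
    inside = False
    n = len(verts)
    for i in range(n):
        (x1, y1), (x2, y2) = verts[i], verts[(i + 1) % n]
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside

def distance_to_polygon(px, py, verts):
    """0 if the point is inside; otherwise min distance to any polygon edge."""
    if point_in_polygon(px, py, verts):
        return 0.0
    best = math.inf
    n = len(verts)
    for i in range(n):
        (x1, y1), (x2, y2) = verts[i], verts[(i + 1) % n]
        dx, dy = x2 - x1, y2 - y1
        t = max(0.0, min(1.0, ((px - x1) * dx + (py - y1) * dy) / (dx * dx + dy * dy)))
        best = min(best, math.hypot(px - (x1 + t * dx), py - (y1 + t * dy)))
    return best

# Toy bounding polygon (km) and hypothetical observations: (x_km, y_km, speed_kt).
cell_polygon = [(0, 0), (20, 0), (20, 15), (0, 15)]
observations = [(5, 5, 45.0), (25, 5, 55.0), (60, 60, 70.0)]

# Attribute each observation within 10 km of the polygon to the cell.
attributed = [ob for ob in observations
              if distance_to_polygon(ob[0], ob[1], cell_polygon) <= 10.0]
print(attributed)  # -> [(5, 5, 45.0), (25, 5, 55.0)]
```

The first observation lies inside the polygon, the second is 5 km outside its edge, and the third is far away and dropped.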
At each time step in the life span of a storm cell, winds are predicted at lead times of 15, 30, 45, and 60 min. Both regression and classification are used. In regression, the 90th-percentile wind from the storm cell (over the given lead time) is predicted. In classification, the probability of exceeding each of three thresholds (30, 50, and 70 kt) is predicted. Three types of models are used for both regression and classification: elastic nets, decision trees, and neural networks.
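The regression and classification targets for one cell and one lead-time window can be sketched as below. The wind values are synthetic, and treating "exceedance" as meeting or exceeding the threshold is an assumption.

```python
import numpy as np

# Hypothetical winds (kt) attributed to one cell over one lead-time window.
winds_kt = np.array([22.0, 31.0, 28.0, 55.0, 34.0, 47.0, 30.0, 26.0])

# Regression target: 90th-percentile wind over the window.
regression_target = np.percentile(winds_kt, 90.0)

# Classification targets: did the cell meet or exceed each threshold?
# (>= is an assumption about how exceedance is defined.)
classification_targets = {thr: bool((winds_kt >= thr).any())
                          for thr in (30.0, 50.0, 70.0)}
print(regression_target, classification_targets)
```

With these toy values the 90th-percentile wind is 49.4 kt (linear interpolation between the two largest values), and the cell counts as an event for the 30- and 50-kt thresholds but not the 70-kt one.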
A two-phase experiment is conducted to find the best models and predictors. In the first phase, a large number of models is run on all 400+ predictors, each with different model parameters. Performance is evaluated with root-mean-square error (RMSE) for regression and cross-entropy for classification. For each combination of learning goal (regression; classification with a 30-, 50-, or 70-kt threshold) and lead time, a small number of top models is chosen. For each of these top models, predictor importance is ranked using the permutation method (Lakshmanan et al. 2015). Predictor rank is averaged across these models, and a small number of top predictors is chosen. In the second phase, the same models are run again, this time using only the selected predictors and slightly different model parameters (to allow deeper learning). The best results from the second phase are presented.
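The rank-and-average predictor-selection step can be sketched with scikit-learn's permutation-importance utility, standing in for the method of Lakshmanan et al. (2015). The data are synthetic, and the model settings are illustrative assumptions, not the ones used in the experiment.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import ElasticNet
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in data: 200 samples, 5 predictors; only the first two matter.
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Two of the three model families mentioned above, with illustrative settings.
models = [ElasticNet(alpha=0.01).fit(X, y),
          DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)]

# Rank predictors within each model by permutation importance (rank 1 = most
# important), then average the ranks across models.
ranks = []
for model in models:
    result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
    order = np.argsort(-result.importances_mean)   # best first
    rank = np.empty(X.shape[1], dtype=float)
    rank[order] = np.arange(1, X.shape[1] + 1)
    ranks.append(rank)

mean_rank = np.mean(ranks, axis=0)
top_predictors = np.argsort(mean_rank)[:2]         # keep the best 2
print(top_predictors)
```

On this synthetic data the two informative predictors (indices 0 and 1) come out on top; in the real experiment the same averaging is applied across the chosen top models to pick a small predictor subset for phase two.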
Eventually, we hope to run the best models in real time in the WDSS-II environment, which is used by NSSL forecasters; this would provide them with guidance on the threat of damaging straight-line winds.
Works Cited
Gagne II, David John, et al. "Analyzing the effects of low level boundaries on tornadogenesis through spatiotemporal relational data mining." 2010.
Gagne II, David John, et al. "Tornadic supercell environments analyzed using surface and reanalysis data: a spatiotemporal relational data-mining approach." Journal of Applied Meteorology and Climatology 51.12 (2012): 2203-2217.
Gagne II, David John, et al. "Severe hail prediction within a spatiotemporal relational data mining framework." Data Mining Workshops (ICDMW), 2013 IEEE 13th International Conference on. IEEE, 2013.
Halbert, K. T., W. G. Blumberg, and P. T. Marsh. "SHARPpy: Fueling the Python Cult." Preprints, 5th Symposium on Advances in Modeling and Analysis Using Python, Phoenix, AZ, 2015.
Lakshmanan, Valliappa, and Travis Smith. "An objective method of evaluating and devising storm-tracking algorithms." Weather and Forecasting 25.2 (2010): 701-709.
Lakshmanan, Valliappa, et al. "Which polarimetric variables are important for weather/no-weather discrimination?" Journal of Atmospheric and Oceanic Technology 32.6 (2015): 1209-1223.
Manzato, Agostino. "Hail in northeast Italy: climatology and bivariate analysis with the sounding-derived indices." Journal of Applied Meteorology and Climatology 51.3 (2012): 449-467.
National Severe Storms Laboratory. "Severe Weather 101: Damaging Winds Types." n.d. Web. Accessed 11 August 2015.
Palencia, Covadonga, et al. "Maximum hailstone size: relationship with meteorological variables." Atmospheric Research 96.2 (2010): 256-265.
Trafalis, Theodore B., Indra Adrianto, and Michael B. Richman. "Active learning with support vector machines for tornado prediction." Computational Science–ICCS 2007. Springer Berlin Heidelberg, 2007. 1130-1137.