1.1
Hail Size Prediction with Machine Learning Applied to Storm-Scale Ensembles: Spring 2014 Evaluation and Physical Understanding

Monday, 5 January 2015: 11:00 AM
124B (Phoenix Convention Center - West and North Buildings)
David John Gagne, NCAR, Boulder, CO; and A. McGovern, J. Brotzge, M. C. Coniglio, J. Correia Jr., and M. Xue

Multiple severe hailstorms struck the United States in May and June 2014. Colorado, Nebraska, and Texas received some of the worst damage, with insured losses totaling nearly $1 billion. Some of these losses could be prevented with skilled forecasts of the location, timing, and size of hail. Ensembles of storm-scale numerical weather prediction models can provide forecasts of hail-producing storms up to two days in advance, but the ensembles do not predict hail size explicitly. During the 2014 NOAA Hazardous Weather Testbed Experimental Forecast Program, machine-learning-based hail size regression techniques were compared with the HAILCAST physics-based hail growth model. The National Severe Storms Laboratory Multi-Radar Multi-Sensor Maximum Estimated Size of Hail mosaic provided hourly hail size observations over the United States. The Center for Analysis and Prediction of Storms Storm-Scale Ensemble Forecast generated hourly forecasts at 4 km grid spacing over the contiguous United States from May through early June 2014. Likely storms were identified in each ensemble member using the enhanced watershed object identification technique. These storms were matched with nearby observed hail swaths, and gridded fields of storm and environmental variables were extracted within each storm object. Machine learning and statistical algorithms were trained to predict whether each storm would produce hail and, if so, the maximum hail size it would produce. Gradient boosting regression trees, random forest, and ridge/logistic regression were evaluated with this process. The maximum HAILCAST predicted size from each storm was used for comparison. Results showed that all machine learning approaches produced statistically significantly better hail size predictions than HAILCAST. The forecast errors from the machine learning methods were statistically indistinguishable from each other.
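The two prediction tasks described above (whether a storm produces hail, then how large the hail is) can be illustrated with a minimal scikit-learn sketch. This is not the authors' code: the predictors are synthetic random numbers rather than ensemble output, the pairing of a random forest classifier with a gradient boosting regressor is only one of the combinations the abstract lists, and the names `fit_two_stage`, `predict_two_stage`, and the 0.5 probability threshold are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for per-storm predictors (NOT real ensemble data).
n_storms = 400
X = rng.normal(size=(n_storms, 3))
# Made-up relationships so the models have something to learn.
produced_hail = (X[:, 0] + 0.3 * rng.normal(size=n_storms)) > 0
raw_size = 25.0 + 10.0 * X[:, 1] + rng.normal(size=n_storms)
hail_size_mm = np.where(produced_hail, np.clip(raw_size, 5.0, None), 0.0)


def fit_two_stage(X, produced_hail, hail_size_mm):
    """Fit a hail/no-hail classifier on all storms and a size
    regressor on the hail-producing storms only."""
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    clf.fit(X, produced_hail)
    reg = GradientBoostingRegressor(n_estimators=50, random_state=0)
    reg.fit(X[produced_hail], hail_size_mm[produced_hail])
    return clf, reg


def predict_two_stage(clf, reg, X, threshold=0.5):
    """Return hail probability and a size forecast that is zero
    wherever the hail probability falls below the threshold."""
    hail_prob = clf.predict_proba(X)[:, 1]
    size = np.where(hail_prob >= threshold, reg.predict(X), 0.0)
    return hail_prob, size


clf, reg = fit_two_stage(X, produced_hail, hail_size_mm)
hail_prob, size_mm = predict_two_stage(clf, reg, X)
```

Training the regressor only on storms that produced hail keeps the many zero-size cases from pulling the size estimates toward zero; whether the original study handled this the same way is not stated in the abstract.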
The neighborhood ensemble probabilities generated from the machine learning methods were generally reliable, but all methods produced a large number of false alarms relative to the number of hits. The physical components of the machine learning methods are compared using variable importance scores and partial dependence plots. The strengths and weaknesses of the forecasts are displayed in a case study.
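One common way to compute a neighborhood ensemble probability is sketched below with NumPy. The abstract does not give the exact definition used, so this is an assumption: a grid point counts as a "hit" for a member if any point within a square neighborhood exceeds the size threshold, and the probability is the fraction of members with a hit. The function name, the square (rather than circular) neighborhood, and the toy 5 x 5 grid are all hypothetical.

```python
import numpy as np


def neighborhood_ensemble_probability(forecasts, threshold, radius):
    """Fraction of members with a threshold exceedance within
    `radius` grid points (square neighborhood) of each grid point.

    forecasts: array of shape (members, ny, nx).
    """
    exceed = forecasts >= threshold
    _, ny, nx = exceed.shape
    # Pad the spatial dimensions so edge points have full neighborhoods.
    padded = np.pad(
        exceed,
        ((0, 0), (radius, radius), (radius, radius)),
        constant_values=False,
    )
    nearby = np.zeros_like(exceed)
    # OR together every shifted copy of the exceedance field.
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            nearby |= padded[:, dy:dy + ny, dx:dx + nx]
    return nearby.mean(axis=0)


# Toy example: two members on a 5 x 5 grid; only member 0 has hail.
forecasts = np.zeros((2, 5, 5))
forecasts[0, 2, 2] = 30.0  # mm, illustrative value
prob = neighborhood_ensemble_probability(forecasts, threshold=25.0, radius=1)
# prob is 0.5 in the 3 x 3 block centered on (2, 2) and 0 elsewhere.
```

Because a single exceedance spreads to every point in its neighborhood, this kind of probability tolerates small displacement errors in storm location, at the cost of enlarging the forecast area.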