Hail Forecasting with Interpretable Deep Learning

Gagne, David John

Hail growth and melting depend on the spatial structure of a storm and its environment. Hail growth is enhanced by favorable flow trajectories that increase time spent in regions of supercooled liquid water. Current storm-based machine learning hail models ingest spatially aggregated storm environment statistics, which can obscure the information associated with favorable wind and temperature fields. However, deep learning models, such as convolutional neural networks, can identify combinations of spatial structural patterns at multiple scales. They can utilize this spatial information to produce hail predictions with increased skill. Once the deep learning models are trained, forecasters and researchers need to learn which input features were important for identifying severe hailstorms in order to validate that the top features are physically relevant. In this study, a suite of post-hoc interpretation techniques is applied to the deep learning models to rank important input fields, visualize the structure of an ideal hailstorm, and identify the storm structures most associated with severe hail.

The deep learning models are trained on a large set of hailstorms simulated by numerical weather prediction (NWP) models. One set of storms is extracted from the NCAR convection-allowing ensemble over the period from 3 May to 3 June 2016. Other storms are extracted from deterministic 1 and 3 km WRF runs on major storm event days from 2010 to 2016. Temperature, dewpoint, geopotential height, and horizontal wind fields at multiple pressure levels within the vicinity of each storm are fed into convolutional neural networks, generative adversarial networks, principal component analysis logistic regressions, and spatial mean logistic regressions to predict the probability of hail at least 25 mm in diameter, which is considered severe hail by the National Weather Service. Multiple models of each type are trained with different subsets of the storm data randomly selected by storm day. A probabilistic evaluation of the model forecasts finds that the convolutional neural networks perform significantly better than the other models in terms of Brier Skill Score and Area Under the ROC Curve. The models are interpreted by ranking each input variable through a permutation feature importance procedure. The convolutional neural network and spatial mean logistic regression have similar feature rankings, but models with an unsupervised encoding procedure assign similar importance to all inputs. Important spatial features in the convolutional neural network are identified by performing gradient descent on the input fields to maximize the probability of the output layer. Other features are identified by comparing the storms that maximize the activation of neurons with high weights. These interpretation methods reveal physically relevant features consistent with observational and modeling studies of hailstorms.
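
The permutation feature importance procedure mentioned above can be illustrated with a minimal sketch. It is not the study's implementation: the function names, the array layout (storm, y, x, field), the choice of Brier Skill Score as the scoring metric, and the toy model standing in for a trained network are all assumptions made for illustration.

```python
import numpy as np


def brier_skill_score(y_true, y_prob):
    """Brier Skill Score relative to a constant base-rate forecast.

    Assumes y_true contains both classes; otherwise the reference
    Brier score is zero and the ratio is undefined.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    brier = np.mean((y_prob - y_true) ** 2)
    brier_ref = np.mean((y_true.mean() - y_true) ** 2)
    return 1.0 - brier / brier_ref


def permutation_importance(predict_fn, x, y, n_repeats=5, seed=0):
    """Mean drop in BSS when one input field is shuffled across storms.

    x is assumed to have shape (n_storms, ny, nx, n_fields).
    predict_fn is any callable mapping x to hail probabilities.
    """
    rng = np.random.default_rng(seed)
    base_score = brier_skill_score(y, predict_fn(x))
    drops = np.zeros(x.shape[-1])
    for f in range(x.shape[-1]):
        for _ in range(n_repeats):
            x_perm = x.copy()
            # Shuffle whole 2D patches of field f among storms, which
            # breaks the link between this field and the label while
            # keeping each patch's spatial structure intact.
            order = rng.permutation(x.shape[0])
            x_perm[..., f] = x[..., f][order]
            drops[f] += base_score - brier_skill_score(y, predict_fn(x_perm))
    return drops / n_repeats


# Synthetic demonstration: only field 0 carries information about the label.
rng = np.random.default_rng(42)
x = rng.normal(size=(500, 8, 8, 3))
y = (x[..., 0].mean(axis=(1, 2)) > 0).astype(int)


def toy_model(fields):
    """Hypothetical stand-in for a trained model; it uses field 0 only."""
    return 1.0 / (1.0 + np.exp(-20.0 * fields[..., 0].mean(axis=(1, 2))))


importance = permutation_importance(toy_model, x, y)
ranking = np.argsort(importance)[::-1]  # most important field first
```

In this toy setting the field the model actually uses shows a large skill drop when shuffled, while the ignored fields show none. With a real network, the same loop would be run on held-out storms, and correlated input fields can share or mask importance.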

9A.2 Hail Forecasting with Interpretable Deep Learning