The deep learning models are trained on a large set of hailstorms simulated by numerical weather prediction (NWP) models. One set of storms is extracted from the NCAR convection-allowing ensemble over the period from 3 May to 3 June 2016. Other storms are extracted from deterministic 1 and 3 km WRF runs on major storm event days from 2010 to 2016. Temperature, dewpoint, geopotential height, and horizontal wind fields at multiple pressure levels in the vicinity of each storm are fed into convolutional neural networks, generative adversarial networks, principal component analysis logistic regressions, and spatial mean logistic regressions to predict the probability of hail at least 25 mm in diameter, which the National Weather Service considers severe hail. Multiple models of each type are trained on different subsets of the storm data, randomly selected by storm day. A probabilistic evaluation of the model forecasts finds that the convolutional neural networks perform significantly better than the other models in terms of Brier Skill Score and Area Under the ROC Curve. The models are interpreted by ranking each input variable through a permutation feature importance procedure. The convolutional neural network and spatial mean logistic regression have similar feature rankings, but models with an unsupervised encoding procedure assign similar importance to all inputs. Important spatial features in the convolutional neural network are identified by performing gradient descent on the input fields to maximize the probability produced by the output layer. Other features are identified by comparing the storms that maximize the activation of neurons with high weights. These interpretation methods reveal physically relevant features consistent with observational and modeling studies of hailstorms.
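
The permutation feature importance procedure and the Brier Skill Score mentioned above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy data, the `toy_model` function, the choice of Brier Skill Score as the importance metric, the climatological reference forecast, and the number of repeats are all assumptions made for illustration. The idea is to shuffle one input at a time across storms and record how much the skill score drops; a larger drop suggests the model relies more heavily on that input.

```python
import random


def brier_score(probs, labels):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def brier_skill_score(probs, labels):
    """Skill relative to a constant forecast of the observed base rate (assumed reference)."""
    base_rate = sum(labels) / len(labels)
    reference = brier_score([base_rate] * len(labels), labels)
    return 1.0 - brier_score(probs, labels) / reference


def permutation_importance(predict, rows, labels, n_repeats=10, seed=0):
    """Mean drop in Brier Skill Score when each input column is shuffled."""
    rng = random.Random(seed)
    baseline = brier_skill_score([predict(r) for r in rows], labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j across examples, leaving other columns intact.
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            score = brier_skill_score([predict(r) for r in shuffled], labels)
            drops.append(baseline - score)
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: two inputs per "storm"; only input 0 determines the label.
data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [1 if r[0] > 0.5 else 0 for r in rows]


def toy_model(row):
    """Stand-in for a trained model; it uses input 0 and ignores input 1."""
    return 0.9 if row[0] > 0.5 else 0.1


importances = permutation_importance(toy_model, rows, labels)
print(importances)
```

In this toy setup, shuffling the ignored input leaves the forecasts unchanged, so its importance is zero, while shuffling the informative input sharply reduces skill. For gridded storm patches such as those described above, one would presumably shuffle an entire input field (for example, one variable at one pressure level) across storms rather than a single scalar.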