Rather than limiting the training dataset, the random forests are trained on all hail-producing CONUS storms from 1 April to 31 July 2017. Storms are weighted by their proximity to regional domains that experience greater climatological hail frequencies. For testing, only storms identified within the regional domains are examined. The weighted severe hail forecasts are compared with ML predictions produced without any weights to determine whether localized modeling yields superior forecasting performance.
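As an illustration of proximity-based weighting, the sketch below assigns each storm a weight that decays with great-circle distance from a regional center. The Gaussian form, the 500 km length scale, the approximate Dallas coordinates, and the names `haversine_km` and `proximity_weight` are all assumptions made for this example; the weighting function actually used is not specified here.

```python
import math

EARTH_RADIUS_KM = 6371.0
DALLAS = (32.78, -96.80)  # approximate (lat, lon); assumed domain center


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def proximity_weight(lat, lon, center=DALLAS, length_scale_km=500.0):
    """Hypothetical Gaussian weight: 1 at the center, decaying with distance."""
    d = haversine_km(lat, lon, center[0], center[1])
    return math.exp(-0.5 * (d / length_scale_km) ** 2)


# Example storm locations (approximate city coordinates, for illustration only)
storms = {
    "Dallas": DALLAS,
    "Oklahoma City": (35.47, -97.52),
    "Denver": (39.74, -104.99),
}
weights = {name: proximity_weight(lat, lon) for name, (lat, lon) in storms.items()}
```

Weights of this kind could be supplied as per-sample weights when fitting a random forest, for example through the `sample_weight` argument of scikit-learn's `RandomForestClassifier.fit`, while the unweighted baseline would simply omit them. The choice of library is likewise an assumption.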
Of the chosen regions, the difference between regionally trained and CONUS-trained hail forecasts was greatest for the southern plains (trained around Dallas, Texas) in terms of objective, subjective, and statistical measures. In addition, rankings from permutation variable importance, a model interpretation technique, indicate that low-level temperature and dewpoint are more important in the southern plains than across the CONUS. Preliminary analysis suggests that the greater forecasting skill in the southern plains, in contrast to the lack of substantial improvement in other regions, results from the large number of severe hail events in the southern plains within the training period. Examining a larger, multiyear dataset and different weighting functions could improve forecast performance in other regions.
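For readers unfamiliar with permutation variable importance, the following minimal sketch shows the general idea: shuffle one predictor at a time and record the mean drop in a skill score. It uses accuracy and a toy stand-in model rather than a random forest, and the names `permutation_importance` and `toy_predict`, along with the synthetic data, are hypothetical; the scoring metric and configuration used in this work may differ.

```python
import random


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each column of X is shuffled in turn."""
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(predict(r) == t for r, t in zip(rows, y)) / len(y)

    baseline = accuracy(X)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, breaking its link to the target
            column = [row[j] for row in X]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(shuffled))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_predict(row):
    """Toy stand-in model: the outcome depends only on feature 0."""
    return row[0] > 0.5


# Synthetic data: two predictors, only the first one matters
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [row[0] > 0.5 for row in X]

scores = permutation_importance(toy_predict, X, y)
# Feature indices ordered from most to least important
ranks = sorted(range(len(scores)), key=lambda j: -scores[j])
```

Comparing such rankings between a regionally weighted model and a CONUS-wide model is one way to see which predictors matter more in a given region.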