In this study, a random forest (RF) algorithm is used to produce probabilistic 12–36-hour precipitation forecasts for the contiguous United States using input data from the High-Resolution Ensemble Forecast System Version 2 (HREFv2). Predictor variables include member forecasts and ensemble statistics of a variety of fields: temperature and dewpoint temperature at multiple vertical levels, simulated 1-km above-ground reflectivity, surface-based convective available potential energy (CAPE) and convective inhibition (CIN), precipitable water, maximum hourly wind components, and forecast 24-hour precipitation. National Centers for Environmental Prediction (NCEP) Stage IV data are used as the observational dataset.
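As an illustration of how member forecasts and ensemble statistics might be combined into a single predictor vector at one grid point, a minimal sketch follows. The function name `build_predictors`, the field names, the toy values, and the particular statistics (mean, standard deviation, minimum, maximum) are assumptions made for the example; the study's actual predictor set may differ.

```python
from statistics import mean, pstdev

def build_predictors(member_values):
    """Flatten per-member forecasts plus ensemble statistics into one dict.

    member_values maps a field name to a list of per-member forecast
    values at a single grid point (hypothetical layout).
    """
    features = {}
    for field, values in member_values.items():
        # Keep each member's forecast as its own predictor.
        for i, value in enumerate(values):
            features[f"{field}_m{i}"] = value
        # Add ensemble summary statistics (assumed choice of statistics).
        features[f"{field}_mean"] = mean(values)
        features[f"{field}_std"] = pstdev(values)
        features[f"{field}_min"] = min(values)
        features[f"{field}_max"] = max(values)
    return features

# Toy four-member ensemble with two fields (made-up values).
example = build_predictors({
    "precip24_in": [0.2, 1.1, 0.7, 1.4],
    "pwat_mm": [38.0, 41.0, 40.0, 45.0],
})
```

A vector of this kind, built for every grid point and forecast day, could then be passed to any RF implementation as one training row.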
Results using data from late April 2017 to late March 2018 suggest that probabilistic RF forecasts at the 1-inch precipitation threshold have excellent reliability and discrimination ability, especially compared to raw ensemble probabilities. However, the RF forecasts are only marginally superior to spatially smoothed raw ensemble probabilities. These findings demonstrate the importance of comparing machine learning (ML) results to meaningful baseline forecasts and suggest a need for large high-resolution ensemble datasets.
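To make the two baselines concrete, the sketch below computes a raw ensemble probability (the fraction of members reaching a threshold), a spatially smoothed version of it, and a Brier score against binary observations. The box-filter smoother, the helper names, and the 3 x 3 toy grids are assumptions chosen for brevity; the smoothing method and verification metrics used in the study are not specified here and may differ.

```python
def raw_probability(members, threshold):
    """Fraction of members at or above threshold at each grid point."""
    n = len(members)
    rows, cols = len(members[0]), len(members[0][0])
    return [[sum(m[i][j] >= threshold for m in members) / n
             for j in range(cols)] for i in range(rows)]

def box_smooth(grid, radius):
    """Mean over a (2*radius+1)-wide square window, truncated at the edges.

    A simple stand-in for spatial smoothing (assumed, not the study's filter).
    """
    rows, cols = len(grid), len(grid[0])
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            window = [grid[a][b]
                      for a in range(max(0, i - radius), min(rows, i + radius + 1))
                      for b in range(max(0, j - radius), min(cols, j + radius + 1))]
            row.append(sum(window) / len(window))
        out.append(row)
    return out

def brier_score(prob, obs):
    """Mean squared difference between forecast probability and binary outcome."""
    pairs = [(p, o) for p_row, o_row in zip(prob, obs)
             for p, o in zip(p_row, o_row)]
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)

# Toy two-member ensemble of 24-hour precipitation in inches (made-up values).
members = [
    [[0.0, 0.0, 0.0], [0.0, 1.2, 0.0], [0.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [0.0, 0.0, 1.5], [0.0, 0.0, 0.0]],
]
# Binary observations: 1 where observed precipitation reached 1 inch.
obs = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]

raw = raw_probability(members, threshold=1.0)
smoothed = box_smooth(raw, radius=1)
bs_raw = brier_score(raw, obs)
bs_smoothed = brier_score(smoothed, obs)
```

Scoring an ML forecast against both `raw` and `smoothed` in the same way is what allows a statement such as "only marginally superior to the smoothed baseline" to be checked.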