92nd American Meteorological Society Annual Meeting (January 22-26, 2012)

Tuesday, 24 January 2012: 9:00 AM
Machine Learning Enhancement of Storm Scale Ensemble Precipitation Forecasts
Room 242 (New Orleans Convention Center)
David John Gagne II, Univ. of Oklahoma, Norman, OK; and A. McGovern and M. Xue

The Center for Analysis and Prediction of Storms (CAPS) Storm Scale Ensemble Forecast (SSEF) system, a multi-model, multi-physics, convection-allowing ensemble of NWP models, includes two types of probabilistic forecasts. The first are grid-point probabilities based on the fraction of ensemble members exceeding a threshold. The second are neighborhood probabilities based on thresholds exceeded within an area of influence among all members. Before calibration, neither product accounts for the biases of the ensemble system or for the relationships between predicted precipitation and other model variables. This project has built machine-learning models that derive probabilities of precipitation from a dataset of ensemble model variables and radar-derived precipitation estimates from the 2010 SSEF Spring Forecast Experiment. For each ensemble run during the experiment, a weighted stratified random sample was drawn from all grid points, stratified by thresholds of observed precipitation. Variables extracted include accumulated precipitation, composite radar reflectivity, precipitable water, surface temperature, surface dew point, surface and 700 mb winds, and maximum vertical velocity. Verification data come from the National Severe Storms Laboratory (NSSL) National Mosaic and Multisensor Quantitative Precipitation Estimate, a CONUS-wide, high-resolution, quality-controlled, radar-derived precipitation mosaic. Three types of machine-learning models were trained on the data: logistic regression, random forest, and Bayesian network. Logistic regressions map a weighted subset of the input variables onto a curve bounded between 0 and 1. Random forests are ensembles of decision trees that produce probabilities by averaging the probabilities of the individual trees. Bayesian networks are directed acyclic graphs that produce probabilities from the conditional probabilities of related attributes.
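The two uncalibrated probability products can be sketched as follows; the four-member ensemble, the tiny grids, and the 25.4 mm (1 in) threshold are illustrative assumptions, not the actual SSEF configuration:

```python
def gridpoint_probability(ensemble, threshold):
    """Fraction of ensemble members exceeding `threshold` at one grid point.

    `ensemble` is a list of precipitation values, one per member.
    """
    exceed = sum(1 for v in ensemble if v > threshold)
    return exceed / len(ensemble)


def neighborhood_probability(fields, threshold, i, j, radius):
    """Fraction of (member, neighbor-point) pairs exceeding `threshold`
    within `radius` grid points of (i, j), pooled over all members.

    `fields` is a list of 2-D precipitation grids, one per member.
    """
    hits = total = 0
    for field in fields:
        for di in range(-radius, radius + 1):
            for dj in range(-radius, radius + 1):
                ii, jj = i + di, j + dj
                if 0 <= ii < len(field) and 0 <= jj < len(field[0]):
                    total += 1
                    hits += field[ii][jj] > threshold
    return hits / total


# Toy four-member ensemble at a single point: two members exceed 25.4 mm,
# so the grid-point probability is 0.5.
members = [30.0, 10.0, 26.0, 5.0]
p_point = gridpoint_probability(members, 25.4)
```

The neighborhood product smooths the spatial uncertainty of convection-allowing forecasts: a near-miss in one member still contributes to the probability at nearby points.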
These models can correct biases, account for different weather regimes, and select the most relevant predictive variables and ensemble members. Each machine-learning model was tested with multiple configurations, varying training set sizes, and differing numbers of variables. Feature selection from the logistic regression and variable importance from the random forest are used to determine which variables in the dataset are most significant. Ultimately, the machine-learning methods will provide forecasters with a more reliable ensemble forecast, and modelers with more guidance on where to focus further model development.
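As a rough illustration of the logistic-regression component, the sketch below fits a two-predictor logistic curve by gradient descent; the predictor pairing (member QPF and precipitable water), the toy sample, and all hyperparameters are assumptions for illustration, not the abstract's actual predictor set or training procedure:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.01, epochs=5000):
    """Plain stochastic-gradient-descent logistic regression.

    Returns (weights, bias); weight magnitudes give a crude signal of
    which predictors matter most, akin to feature selection.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log loss w.r.t. the logit
            for j in range(n_features):
                w[j] -= lr * err * xi[j]
            b -= lr * err
    return w, b


# Toy sample: [member QPF (mm), precipitable water (cm)] -> rain observed?
X = [[0.0, 1.0], [2.0, 1.5], [20.0, 3.0], [35.0, 3.5]]
y = [0, 0, 1, 1]
w, b = fit_logistic(X, y)

# Probability of rain for a wet, moist case: should be well above 0.5.
p_rain = sigmoid(sum(wj * xj for wj, xj in zip(w, [25.0, 3.0])) + b)
```

In practice a regularized solver (e.g. scikit-learn's `LogisticRegression`) and a proper verification dataset would replace this hand-rolled loop; the point is only that the fitted curve maps ensemble predictors to a calibrated probability between 0 and 1.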
