92nd American Meteorological Society Annual Meeting (January 22-26, 2012)

Wednesday, 25 January 2012: 11:45 AM
Comparison of Machine Learning Techniques for the Prediction of Thunderstorm Location
Room 242 (New Orleans Convention Center)
Philippe E. Tissot, Texas A&M Univ., Corpus Christi, TX; and W. G. Collins

This presentation compares the performance and features of two machine learning techniques, Artificial Neural Network (ANN) and Random Forest (RF), for predicting the occurrence of thunderstorms within the next 3 to 12 hours. The models are developed for a South Texas grid of 13 x 22 equidistant 20-km x 20-km box regions. While the methodology is the same across the grid, models are trained separately for each box to account for local characteristics. Inputs to the models combine predictions from Numerical Weather Prediction (NWP) mesoscale models (Eta, WRF-NMM) with high-resolution/subgrid scale data that have a documented correlation with convective initiation. The NWP model predictions provide information on the likely mesoscale environment, while the subgrid scale data provide information on the meso- scale microscale forcing. The meso- scale input is composed of statistics (mean, maximum, maximum gradient, and cluster-based measures) computed from the Antecedent Precipitation Index (API) map, a proxy for the soil moisture pattern, over a 4-km grid. The API was computed from the 4-km Multi-Sensor Precipitation Estimator (MPE) output from the National Weather Service (NWS). Other subgrid scale inputs include the number of dry days over the past 10 days in the box and the 4-km aerosol optical depth (AOD).
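The API-based inputs described above can be illustrated with a minimal sketch. It assumes the common recursive form of the API (each day's value is the previous value multiplied by a decay constant, plus that day's precipitation). The abstract does not state the formula, the decay constant, or the gradient definition actually used, so `k = 0.9`, the adjacent-cell gradient, and the function names `api_map` and `box_statistics` are all hypothetical. The cluster-based measures are omitted.

```python
def api_map(daily_precip_maps, k=0.9):
    """Accumulate an API map from daily precipitation maps (oldest first).

    Assumed recursion: API_today = k * API_yesterday + precip_today.
    The decay constant k is an illustrative assumption, not a value
    taken from the abstract.
    """
    rows, cols = len(daily_precip_maps[0]), len(daily_precip_maps[0][0])
    api = [[0.0] * cols for _ in range(rows)]
    for day in daily_precip_maps:
        api = [[k * api[i][j] + day[i][j] for j in range(cols)]
               for i in range(rows)]
    return api


def box_statistics(api, cell_km=4.0):
    """Mean, maximum, and maximum gradient of the API within one box.

    The gradient is approximated here as the largest absolute difference
    between horizontally or vertically adjacent cells, divided by the
    cell spacing (an assumed definition).
    """
    rows, cols = len(api), len(api[0])
    values = [v for row in api for v in row]
    diffs = [0.0]
    for i in range(rows):
        for j in range(cols):
            if i + 1 < rows:
                diffs.append(abs(api[i + 1][j] - api[i][j]))
            if j + 1 < cols:
                diffs.append(abs(api[i][j + 1] - api[i][j]))
    return {
        "mean": sum(values) / len(values),
        "max": max(values),
        "max_gradient": max(diffs) / cell_km,
    }
```

With a 4-km grid, each 20-km x 20-km box would hold a 5 x 5 block of cells, so `box_statistics` would be called once per box on the corresponding slice of the API map.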

Project data covers March 2004 to December 2010. The models were developed and calibrated over the periods March 2004 to December 2006 and January 2009 to December 2010. The period January 2007 to December 2008 was used for performance evaluation and comparison with actual NWS forecasts. The performance of the calibrated ANN models was evaluated by comparing ANN predictions with observations and actual NWS forecasts for various configurations. The performance parameters used included the Heidke, Peirce, and Yule's Q skill scores. A threshold for the occurrence of a thunderstorm was determined subjectively based on the respective Probability of Detection (POD), False Alarm Rate (F), and Critical Success Index (CSI) output at each point on the receiver/relative operating characteristic (ROC) curve generated from the testing data set during calibration. ANN model performance for the area around Victoria, Texas indicates Heidke, Peirce, and Yule's Q skill scores ranging from 0.17 to 0.10, 0.71 to 0.55, and 0.96 to 0.85, with POD varying from 93% to 78% and F from 22% to 33%, respectively, for 3-, 6-, 9-, and 12-hour predictions. Variable importance, evaluated as part of the RF model calibration, indicates that the most important variables for thunderstorm prediction are the NWP model outputs convective precipitation (cp), precipitable water (pw), and lifted condensation level (LCL), but that several other variables, including subgrid scale inputs, contribute to model performance.
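The verification measures named above have standard definitions in terms of a 2 x 2 contingency table of forecasts versus observations. The sketch below applies those textbook definitions. It is not the authors' evaluation code, and the function name `skill_scores` and its argument names are hypothetical. F is taken here as the fraction of non-events that were incorrectly forecast, which is the quantity plotted on a ROC curve. Under these definitions the Peirce score equals POD minus F, which matches the 3-hour figures reported above (0.93 - 0.22 = 0.71).

```python
def skill_scores(hits, false_alarms, misses, correct_negatives):
    """Standard dichotomous-forecast scores from a 2 x 2 contingency table."""
    a, b, c, d = (float(x) for x in
                  (hits, false_alarms, misses, correct_negatives))
    pod = a / (a + c)            # Probability of Detection
    f = b / (b + d)              # False Alarm Rate (fraction of non-events)
    csi = a / (a + b + c)        # Critical Success Index
    peirce = pod - f             # Peirce skill score
    heidke = 2.0 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d))
    yule_q = (a * d - b * c) / (a * d + b * c)   # Yule's Q (odds ratio skill)
    return {
        "POD": pod,
        "F": f,
        "CSI": csi,
        "Peirce": peirce,
        "Heidke": heidke,
        "YuleQ": yule_q,
    }
```

Because thunderstorms are rare in any single box, a table dominated by correct negatives can combine a high POD and Yule's Q with a comparatively low Heidke score, which is the pattern seen in the reported values.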
