Using a Random Forest Model to Assess Flash Flood Probability across Southern Utah

Seaman, Michael P.

Accurately identifying environments favorable for flash flooding across southern Utah is challenging for several reasons, including a sparse observing network, poor radar coverage, and limited means of notifying people who are potentially in harm's way. In addition, recreationists in remote areas of Utah lack telecommunication access and are therefore unable to receive the typical short-term warnings and products issued by the National Weather Service. Consequently, the Salt Lake City Weather Forecast Office has emphasized developing a flash flood potential forecast that extends out to two days and provides hazard information to these groups before they head into the backcountry. Traditionally, this forecast has been produced from deterministic model fields such as precipitable water and storm motion. This study approaches the problem more rigorously by using machine learning to identify the environmental parameters most conducive to flash flooding and then applying those parameters in a probabilistic flash flood potential algorithm for operational use. The ensemble machine learning technique of Random Forests (RFs) was selected for this project because RFs have been shown to perform well relative to other predictive techniques (e.g., regression or neural networks) when analyzing large datasets, generate native probabilistic output, and provide feature importances.
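To make concrete how an RF yields a native probability and feature importances, the sketch below builds a deliberately tiny, dependency-free forest of depth-one trees ("stumps"). Each stump is fit to a bootstrap sample using a randomly chosen feature subset; the class probability is the mean of the per-tree leaf class fractions (the same averaging scikit-learn's `RandomForestClassifier.predict_proba` performs), and each feature's importance is the Gini impurity decrease it produces, normalized to sum to one. This is an illustrative assumption, not the author's implementation: real RFs grow much deeper trees, and the function names and the two-feature synthetic data are hypothetical.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def fit_stump(X, y, features):
    """Depth-one tree: choose the (feature, threshold) split with the largest Gini decrease.

    Returns (impurity_decrease, feature, threshold, p_left, p_right), where p_* is the
    fraction of label-1 ("flooding") samples on each side of the split.
    """
    n = len(y)
    parent = gini(y)
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            gain = parent - child
            if best is None or gain > best[0]:
                best = (gain, f, t, sum(left) / len(left), sum(right) / len(right))
    if best is None:  # no usable split: fall back to a single leaf
        p = sum(y) / n
        best = (0.0, None, None, p, p)
    return best

def fit_forest(X, y, n_trees=50, max_features=1, seed=0):
    """Bagged stumps, each restricted to a random subset of `max_features` features."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    stumps, importances = [], [0.0] * d
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample (with replacement)
        feats = rng.sample(range(d), max_features)   # random feature subset
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx], feats)
        stumps.append(stump)
        if stump[1] is not None:
            importances[stump[1]] += stump[0]        # accumulate impurity decrease
    total = sum(importances)
    if total > 0:
        importances = [v / total for v in importances]  # normalize to sum to 1
    return stumps, importances

def predict_proba(stumps, row):
    """Probability of the flooding class: mean of the per-stump leaf fractions."""
    probs = [p_left if f is None or row[f] <= t else p_right
             for _, f, t, p_left, p_right in stumps]
    return sum(probs) / len(probs)

if __name__ == "__main__":
    # Synthetic, hypothetical data: feature 0 separates the classes, feature 1 is noise.
    X = [[i, (i * 7) % 10] for i in range(20)]
    y = [1 if i >= 10 else 0 for i in range(20)]
    stumps, importances = fit_forest(X, y, n_trees=50, max_features=1, seed=1)
    print(predict_proba(stumps, [19, 3]), predict_proba(stumps, [0, 3]), importances)
```

On this synthetic data the informative feature should receive most of the importance, which is the same kind of ranking the study used to compare moisture, instability, and kinematic parameters. For operational work, a tested library implementation such as scikit-learn's is the practical choice.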

For this study, the RF algorithm was trained with three-dimensional thermodynamic and kinematic data from NAM BUFR soundings at four environmentally representative sites in southern Utah over the 2010-2015 warm seasons (May-October inclusive). These daily environmental data were combined with storm data on the number of large basins around each of the four BUFR sites that experienced flash flooding on a given day. Once the model was trained, both the predicted class (i.e., flooding or no flooding) and the probability of each class were evaluated on an independent test dataset containing the same variables for 2016-2018, and various performance statistics were calculated. Overall, the model performed better than SLC's legacy flash flood algorithm at every test site, with larger improvements at the two sites that had the most widespread flash flooding days. The RF also highlighted the environmental parameters most important in distinguishing widespread flash flood days in southern Utah. Moisture and instability parameters such as most-unstable CAPE (MUCAPE), warm cloud depth, and surface dewpoint ranked as the most important, while kinematic parameters such as storm motion and Corfidi parameters were of lesser importance. Results from testing this model during the summer 2019 flash flood season over southern Utah will be shown, along with plans to expand the project in the future.
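The abstract does not specify which performance statistics were computed. As one hedged illustration, the sketch below assumes each day-level record carries a year, the storm-data count of flooded large basins, and the model's flooding probability. It holds out 2016-2018 by year, as the study did; converts the basin count into a flood/no-flood label with a hypothetical `min_basins` threshold; and scores the class forecasts with the probability of detection (POD), false alarm ratio (FAR), and critical success index (CSI), and the probabilities with the Brier score. The `Day` record, both thresholds, and all function names are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Day:
    """Hypothetical day-level record for one BUFR site."""
    year: int
    n_flooded_basins: int  # storm-data count of large basins that flash flooded
    flood_prob: float      # model probability of the "flooding" class

def label(day, min_basins=1):
    # Hypothetical rule: a flooding day if at least `min_basins` basins flooded.
    return 1 if day.n_flooded_basins >= min_basins else 0

def split_by_year(days, test_years):
    # Hold out whole seasons so that no test day comes from a training season.
    train = [d for d in days if d.year not in test_years]
    test = [d for d in days if d.year in test_years]
    return train, test

def ratio(num, den):
    # Undefined scores (zero denominator) are reported as NaN.
    return num / den if den else math.nan

def verify(days, prob_threshold=0.5, min_basins=1):
    """POD, FAR, and CSI for the class forecasts; Brier score for the probabilities."""
    hits = misses = false_alarms = 0
    brier = 0.0
    for d in days:
        obs = label(d, min_basins)
        fcst = 1 if d.flood_prob >= prob_threshold else 0
        if fcst and obs:
            hits += 1
        elif obs:
            misses += 1
        elif fcst:
            false_alarms += 1
        brier += (d.flood_prob - obs) ** 2  # squared probability error
    return {
        "POD": ratio(hits, hits + misses),
        "FAR": ratio(false_alarms, hits + false_alarms),
        "CSI": ratio(hits, hits + misses + false_alarms),
        "Brier": ratio(brier, len(days)),
    }

if __name__ == "__main__":
    # Made-up records for illustration only.
    days = [Day(2012, 1, 0.7), Day(2016, 3, 0.8), Day(2016, 0, 0.6),
            Day(2017, 2, 0.3), Day(2018, 0, 0.1)]
    train, test = split_by_year(days, test_years=range(2016, 2019))
    print(len(train), len(test), verify(test))
```

Splitting by year rather than by randomly sampled days keeps consecutive, correlated days from the same season from appearing in both the training and test sets.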

J71.4 Using a Random Forest Model to Assess Flash Flood Probability across Southern Utah