A Machine Learning Tool to Provide Probabilities That Thunderstorm Wind Damage Reports Are Due to Severe Intensity Winds

Tirone, Elizabeth

Both the evaluation of severe thunderstorm warning skill and the development of machine learning-based techniques for improving severe storm forecasting require an accurate verification dataset. Problems are known to exist in the official local storm reports database often used for these purposes, for both tornadoes and hail, but the worst problems are likely in the reports of severe thunderstorm winds. Of the 186,181 thunderstorm wind damage reports in the official Storm Data database during 2007-2018, 89.1% were estimates and only 10.9% were measured winds. A large portion of the reports assigned an estimated speed involve damage to trees. Frequently, when tree damage is reported, an estimated wind gust of at least 50 knots is assigned, as that value is sufficient to verify a severe thunderstorm warning. Yet it is well understood that tree damage also depends on the age and health of the tree, and many of the tree damage reports likely occur with winds below the severe criterion of 50 knots. Measured winds exceeding the 50-knot threshold make up only a small fraction of the severe thunderstorm wind database.

We present the beginning stages of a machine learning tool that is used diagnostically to assign to each thunderstorm wind damage report the probability that the wind speed exceeded the 50-knot threshold that defines a thunderstorm as severe, and the probability that it exceeded the 65-knot threshold used to indicate a significant severe wind report. The tool is built from the Storm Data database together with measured data: environmental fields from RUC and RAP analyses (lapse rates, specific humidity, winds, CAPE, CIN, and shear, among others), measured winds from surface station networks near the location and time of the storm reports, and radar reflectivity and Doppler velocity from the NEXRAD radar nearest each report (over a 30-minute period centered on the report and within a 25 km radius of it). Initially, spatial and temporal statistical analysis will be performed on the storm reports, allowing them to be clustered and assigned a probability value. Semi-supervised clustering, with the environmental data as predictors, will permit clustering regression. A major challenge in developing the diagnostic tool is that the majority of the storm reports are not verifiable, because no measured wind speed is associated with them; the final tool will therefore use techniques that assign a level of confidence, or probability, that the storm report wind speed reached the severe or significant severe threshold.
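
As an illustration only, the space-time matching and threshold labeling described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the tool's actual implementation: the record layout (dictionaries with `lat`, `lon`, `time` keys) and the function names are hypothetical, the 25 km and 30-minute window that the abstract states for radar data is applied here to generic point observations, and "reaching" a threshold is taken to mean a gust greater than or equal to it.

```python
import math
from datetime import timedelta

SEVERE_KT = 50.0                     # severe threshold (from the abstract)
SIG_SEVERE_KT = 65.0                 # significant severe threshold (from the abstract)
RADIUS_KM = 25.0                     # search radius around the report
HALF_WINDOW = timedelta(minutes=15)  # half of the 30-minute window centered on the report
EARTH_RADIUS_KM = 6371.0             # mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def observations_near_report(report, observations):
    """Keep observations within 25 km and +/- 15 minutes of the report.

    `report` and each observation are hypothetical dicts with
    'lat', 'lon' (degrees) and 'time' (datetime) keys.
    """
    return [
        ob for ob in observations
        if abs(ob["time"] - report["time"]) <= HALF_WINDOW
        and great_circle_km(report["lat"], report["lon"], ob["lat"], ob["lon"]) <= RADIUS_KM
    ]


def threshold_labels(measured_gust_kt):
    """Return (severe, significant_severe) flags for a measured gust in knots.

    Returns None when no measurement exists, i.e. the report is unlabeled
    and would have to be handled by the semi-supervised step.
    Assumption: a threshold counts as reached when gust >= threshold.
    """
    if measured_gust_kt is None:
        return None
    return (measured_gust_kt >= SEVERE_KT, measured_gust_kt >= SIG_SEVERE_KT)
```

Under this layout, reports with a measured gust receive explicit labels for both thresholds, while estimated reports return `None` and remain unlabeled, which mirrors the split between verifiable and non-verifiable reports that motivates the semi-supervised approach.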

We will present the results of preliminary testing and discuss the development process, along with lessons learned.
