We present the early stages of a diagnostic machine learning tool that assigns each thunderstorm wind damage report two probabilities. The first is the probability that the wind speed reached the 50 knot threshold that defines a thunderstorm as severe. The second is the probability that it reached the 65 knot threshold used to designate a significant severe wind report. The tool is built from the Storm Data database together with measured data from three sources. The first is environmental fields from RUC and RAP analyses, such as lapse rates, specific humidity, winds, CAPE, CIN, and shear. The second is measured winds from surface station networks near the location and time of each storm report. The third is radar reflectivity and Doppler velocity from the NEXRAD radar nearest each report, taken over a 30-minute window centered on the report time and within a 25 km radius of the report. Initially, spatial and temporal statistical analyses will be performed on the storm reports so that they can be clustered and assigned probability values. Semi-supervised clustering, with the environmental data as predictors, will then permit clustering regression. A major challenge in developing the diagnostic tool is that most storm reports cannot be verified, because no measured wind speed is associated with them. The final tool will therefore use techniques that assign a level of confidence, or probability, that each report's wind speed reached the severe or significant severe threshold.
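The following is a minimal sketch of how a storm report might be matched to nearby measured winds and given a training label. Because most reports have no measured wind speed, this kind of labeling would produce a semi-supervised data set. Several parts of the sketch are assumptions. The `Obs` record, the helper names, and the use of the peak gust in the window are hypothetical. The abstract gives the 30-minute, 25 km window for radar data only, so applying it to surface-station winds here is also an assumption. The label scheme (0 for sub-severe, 1 for severe, 2 for significant severe, and -1 for unverified) is illustrative. The -1 value follows the convention scikit-learn's semi-supervised estimators use to mark unlabeled samples.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta

# Thresholds from the abstract, in knots (severe and significant severe).
SEVERE_KT = 50.0
SIGNIFICANT_KT = 65.0
# Matching window from the abstract: 25 km radius, 30 minutes centered on the report.
# ASSUMPTION: the same window is applied to surface-station winds.
RADIUS_KM = 25.0
HALF_WINDOW = timedelta(minutes=15)
EARTH_RADIUS_KM = 6371.0


@dataclass
class Obs:
    """Hypothetical record for one measured wind (e.g., a surface-station gust)."""
    lat: float
    lon: float
    time: datetime
    gust_kt: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    # Clamp to guard against floating-point values slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def matched_obs(rep_lat: float, rep_lon: float, rep_time: datetime,
                obs: list[Obs]) -> list[Obs]:
    """Return observations within RADIUS_KM and +/- HALF_WINDOW of the report."""
    return [o for o in obs
            if abs(o.time - rep_time) <= HALF_WINDOW
            and haversine_km(rep_lat, rep_lon, o.lat, o.lon) <= RADIUS_KM]


def label_report(rep_lat: float, rep_lon: float, rep_time: datetime,
                 obs: list[Obs]) -> int:
    """Label a report: 0 sub-severe, 1 severe, 2 significant severe, -1 unverified."""
    near = matched_obs(rep_lat, rep_lon, rep_time, obs)
    if not near:
        return -1  # no measured wind: left unlabeled for semi-supervised learning
    peak = max(o.gust_kt for o in near)  # ASSUMPTION: peak gust in the window
    if peak >= SIGNIFICANT_KT:
        return 2
    if peak >= SEVERE_KT:
        return 1
    return 0
```

For example, suppose a report at (35.0, -97.0) has a 67 kt gust recorded about 12 km away and 10 minutes earlier. That report would be labeled 2. A report with no observations inside the window would be labeled -1. These unlabeled reports are the ones for which a clustering or regression step would have to supply a probability.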
We will present the results of preliminary testing and discuss the development process, along with lessons learned.