6B.4 Tornado Database Cleansing and Augmentation for Use in Tornado Risk Modeling

Tuesday, 8 November 2016: 11:15 AM
Pavilion Ballroom West (Hilton Portland)
Melissa K. Faletra, Applied Research Associates, Raleigh, NC; and L. A. Twisdale Jr., M. B. Hardy, M. Levitan, and L. Phan

The Storm Prediction Center (SPC) tornado database provides the data typically used in tornado climatology and risk modeling. The SPC links the tornado county segment data obtained from Storm Data to produce full tornado track data. Over the years the SPC has edited and corrected the data as it saw fit, and Storm Data and its sources have evolved over time. This complex history, along with the transition from printed to digital records, has introduced errors into the data.

The SPC database contains biases, errors, and default values, and as a result it cannot be used in its raw form for tornado risk modeling. An analysis of the SPC tornado database was completed that focused on locating and correcting errors and default values in order to create a “cleansed” database with augmented data fields. The tornado county segments in the National Centers for Environmental Information (NCEI) Storm Events Database were linked into full tornado tracks and, where possible, matched to their corresponding tornadoes in the SPC database. The following fields, consisting of computed parameters and fields obtained from the NCEI Storm Events Database, were added to the cleansed database: (1) tornado path direction, (2) the Weather Forecast Office (WFO) that produced the rating, (3) the source of the rating, (4) the tornado ending date and time, and (5) the narrative written about the tornado.
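The segment-linking and path-direction steps can be illustrated with a minimal Python sketch. It assumes a hypothetical `Segment` record with begin/end coordinates (these are not the actual NCEI column names), chains time-ordered segments when one begins within a small, arbitrarily chosen tolerance of where the previous one ended, and takes path direction as the initial great-circle bearing from the track's start point to its end point. This is an illustration under those assumptions, not the authors' matching procedure.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical county-segment record; field names are illustrative only.
    begin_time: float  # e.g. minutes since an arbitrary reference time
    begin_lat: float
    begin_lon: float
    end_lat: float
    end_lon: float


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(x, y)) % 360.0


def link_segments(segments, tol_deg=0.01):
    """Chain segments in time order when each begins (within tol_deg) where the last one ended.

    Assumes the input holds candidate segments from one storm; tol_deg is a placeholder.
    """
    tracks = []
    for seg in sorted(segments, key=lambda s: s.begin_time):
        if tracks:
            last = tracks[-1][-1]
            if (abs(seg.begin_lat - last.end_lat) <= tol_deg
                    and abs(seg.begin_lon - last.end_lon) <= tol_deg):
                tracks[-1].append(seg)  # continuation of the current track
                continue
        tracks.append([seg])  # start a new track
    return tracks


def track_direction(track):
    """Path direction of a linked track: bearing from first begin point to last end point."""
    first, last = track[0], track[-1]
    return bearing_deg(first.begin_lat, first.begin_lon, last.end_lat, last.end_lon)
```

A real implementation would also need rules for gaps or offsets at county borders and for matching the linked tracks to SPC entries, neither of which is modeled here.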

Data analysis was also paired with research into the history of the data to help identify errors, determine their meaning, and decide how to correct them. Corrections resulting from this process include: (1) path width corrections to 350 tornadoes in 1999 whose path widths had been changed to zero when the original width ended in the digit ‘5’; (2) path length and width corrections to individual F/EF4 and F/EF5 events with unrealistically small path lengths; (3) corrections to tornadoes with unrealistic aspect ratios (path length/path width); (4) adjustments for default path length and path width entries; and (5) removal of mislabeled hurricane data from the database. These corrections substantially change the overall characteristics of the database. For example, the path length and width corrections to the F/EF4 and F/EF5 events increased total F/EF4 area by 5.4% and total F/EF5 area by 2.3%.
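Several of these checks can be expressed as simple screening rules. The Python sketch below is hypothetical throughout: the `Tornado` record, the assumed units (length in miles, width in yards), the aspect-ratio bounds, and the frequency heuristic for spotting candidate default values are illustrative choices, not the thresholds or methods used in the study. It only flags records and computes a rectangle-approximated area (length × width) and its percent change; choosing the corrected values still requires the historical research described above.

```python
from collections import Counter
from dataclasses import dataclass

YARDS_PER_MILE = 1760.0


@dataclass
class Tornado:
    # Hypothetical record; length in miles and width in yards are assumed units.
    year: int
    rating: int  # F/EF scale value
    length_mi: float
    width_yd: float


def aspect_ratio(t):
    """Path length / path width, both in yards; None when the width is zero or negative."""
    if t.width_yd <= 0:
        return None
    return (t.length_mi * YARDS_PER_MILE) / t.width_yd


def screen(t, min_ratio=1.0, max_ratio=5000.0):
    """Return review flags for one record; the ratio bounds are placeholder values."""
    flags = []
    # In 1999 a zero width may be a width ending in '5' that was zeroed; flag for review.
    if t.year == 1999 and t.width_yd == 0:
        flags.append("zero_width_1999")
    ratio = aspect_ratio(t)
    if ratio is not None and not (min_ratio <= ratio <= max_ratio):
        flags.append("aspect_ratio")
    return flags


def candidate_defaults(values, top_n=3):
    """Most frequent values; an unusual spike at one value may hint at a default entry."""
    return [v for v, _ in Counter(values).most_common(top_n)]


def total_area_sq_mi(tornadoes, rating):
    """Sum of length x width (rectangle approximation) in square miles for one rating."""
    return sum(t.length_mi * (t.width_yd / YARDS_PER_MILE)
               for t in tornadoes if t.rating == rating)


def percent_change(before, after):
    """Percent change from before to after (e.g. total area before vs. after correction)."""
    return 100.0 * (after - before) / before
```

Under these assumptions, an area change such as the reported F/EF4 increase would be computed as `percent_change(total_area_sq_mi(raw, 4), total_area_sq_mi(cleansed, 4))` for hypothetical `raw` and `cleansed` record lists.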

Many errors and biases in tornado data, such as time trends and population bias, must instead be corrected through a probabilistic modeling approach. The cleansed SPC database with augmented data fields described in this paper provides a better starting point for modeling tornado climatology and for understanding reporting issues and potential biases in the database.
