Modeling and Correcting Reporting Biases in the SPC Tornado Database

Potvin, Corey K.

Maximizing the value of tornado climatologies requires accounting for unreported or mischaracterized tornadoes, especially where people and damage indicators are sparse. Previous attempts to spatially model tornado reporting bias have used either a single variable (e.g., population density) or a combination of variables implicitly assumed to have mutually independent effects. The proposed methodology instead uses multivariable polynomial regression to account for several variables simultaneously, along with potentially important variable interactions (e.g., reduced importance of population density to reporting bias for tornadoes occurring near major highways). Reporting bias is first estimated at each grid point by treating tornado counts over 100K cities (i.e., population > 100,000) or NWS Weather Forecast Offices (WFOs) as unbiased estimates of the true tornado counts. The regression model is then applied to these estimates, which are noisy due to sampling error, to produce new reporting bias estimates that eliminate most of the noise while preserving important spatial relationships with the geopolitical variables. Although the approach is developed using the U.S. Storm Prediction Center (SPC) tornado database, it can be applied anywhere tornado locations are recorded and GIS data on population, cities, and/or roads are available.
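The regression step described above can be illustrated with a minimal sketch. It is not the author's implementation: the two predictors (a scaled population density and a scaled distance to the nearest highway), the synthetic "true" reporting rate, the noise level, and the helper name poly_design_matrix are all assumptions made for illustration. The sketch fits a degree-2 polynomial with a cross term by ordinary least squares, so that the effect of population density is allowed to depend on distance to a highway.

```python
import numpy as np
from itertools import combinations_with_replacement


def poly_design_matrix(X, degree=2):
    """Build columns for every monomial of the inputs up to `degree`.

    Cross terms (e.g. x0 * x1) are included, which is what lets the
    regression represent interactions between variables.
    """
    n, k = X.shape
    cols = [np.ones(n)]  # intercept
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(k), d):
            cols.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(cols)


# Synthetic grid points (hypothetical data, not from the SPC database).
rng = np.random.default_rng(0)
n_points = 2000
X = rng.uniform(0.0, 1.0, size=(n_points, 2))  # [pop_density, dist_highway], scaled to [0, 1]

# Assumed "true" reporting rate: population density matters only far from highways.
true_rate = 0.8 - 0.5 * X[:, 1] + 0.5 * X[:, 0] * X[:, 1]

# Noisy per-grid-point estimates, standing in for sampling error.
noisy_rate = true_rate + rng.normal(0.0, 0.1, size=n_points)

# Fit the polynomial regression and evaluate it at every grid point.
A = poly_design_matrix(X, degree=2)  # columns: 1, x0, x1, x0^2, x0*x1, x1^2
coef, *_ = np.linalg.lstsq(A, noisy_rate, rcond=None)
smoothed_rate = A @ coef


def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))


rmse_noisy = rmse(noisy_rate, true_rate)
rmse_smoothed = rmse(smoothed_rate, true_rate)
print(f"RMSE of noisy estimates:    {rmse_noisy:.3f}")
print(f"RMSE of smoothed estimates: {rmse_smoothed:.3f}")
print(f"Fitted interaction coefficient (x0*x1): {coef[4]:.2f}")
```

In this toy setting the fitted surface sits much closer to the assumed true rate than the raw estimates do, which is the sense in which the regression removes noise while keeping the relationship with the predictors.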

We model tornado reporting bias east of the Rocky Mountains during 1975-2014 with various combinations of the following variables: 1) population density; 2) terrain ruggedness; 3) road density; and distance to 4) nearest 100K city, 5) 5K city, 6) WFO, 7) interstate, 8) WFO or 100K city, and 9) interstate or 5K city. In cross-validation tests, the combination of variables 1, 2, 4, 6, and 9 accounts for the most variance in reporting bias. Estimates of large-scale [O(1000 km)] reporting bias are not unduly sensitive to the number of regression variables, indicating useful information can be gained from limited geopolitical data. However, cross-validation tests and geographic maps of modeled bias suggest more complex regressions substantially improve bias estimates at smaller scales. The resulting improvements to tornado hazard models would be valuable to forecasters, severe storm and climate scientists, and insurance/reinsurance companies.
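Choosing among variable combinations by cross-validation can be sketched as follows. This is an illustrative simplification, not the procedure used in the study: it uses a plain linear fit rather than a polynomial one, only four hypothetical predictors, and synthetic data in which the response is constructed to depend on just two of them. The function name cv_r2 and the variable names are assumptions.

```python
import numpy as np
from itertools import combinations


def cv_r2(X, y, n_folds=5, seed=0):
    """Out-of-sample R^2 of a linear least-squares fit, via k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    ss_res = 0.0
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        A_train = np.column_stack([np.ones(train.sum()), X[train]])
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        ss_res += np.sum((y[test_idx] - A_test @ coef) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Hypothetical predictors, each scaled to [0, 1].
names = ["pop_density", "ruggedness", "dist_100k_city", "dist_wfo"]
rng = np.random.default_rng(1)
n_points = 400
X = rng.uniform(0.0, 1.0, size=(n_points, len(names)))

# Synthetic reporting rate that depends only on two of the four predictors.
y = 0.7 + 0.2 * X[:, 0] - 0.3 * X[:, 2] + rng.normal(0.0, 0.05, size=n_points)

# Score every non-empty subset of predictors.
scores = {}
for size in range(1, len(names) + 1):
    for cols in combinations(range(len(names)), size):
        subset = tuple(names[c] for c in cols)
        scores[subset] = cv_r2(X[:, list(cols)], y)

best_subset = max(scores, key=scores.get)
print("Best subset:", best_subset)
print(f"Cross-validated R^2: {scores[best_subset]:.3f}")
```

Scoring on held-out folds rather than on the training data penalizes variables that only fit noise, which is why cross-validation is a reasonable basis for comparing regressions of different complexity.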

The regressions suggest only 46% of tornadoes that actually occurred in the analysis domain were reported, with the reporting rate decreasing by half as distance to the nearest 100K+ city increases to 50 km. Reporting biases are especially pronounced earlier in the record and for shorter-track tornadoes, but remain nontrivial even for more recent and longer-track tornadoes. Underestimation of tornado frequency increases with damage rating; for example, the actual frequency of EF/F 3-5 tornadoes appears to be nearly three times that in the record. This underscores the problem of under-rating tornadoes in rural areas.
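As a rough worked example of what a 46% reporting rate implies, the sketch below divides a reported count by an estimated reporting rate to obtain a bias-corrected count. The function name and the count of 100 reported tornadoes are hypothetical, and applying one domain-wide rate is a simplifying assumption: the reporting rate described here varies from grid point to grid point.

```python
def corrected_count(reported, reporting_rate):
    """Estimate the true count implied by a reported count and a reporting rate in (0, 1]."""
    if not 0.0 < reporting_rate <= 1.0:
        raise ValueError("reporting_rate must be in (0, 1]")
    return reported / reporting_rate


# Hypothetical example: 100 reported tornadoes at the domain-wide rate of 46%.
estimate = corrected_count(100, 0.46)
print(f"Estimated true count: {estimate:.0f}")      # about 217
print(f"Implied inflation factor: {1 / 0.46:.2f}")  # about 2.17
```

By the same arithmetic, a true frequency nearly three times the recorded one, as stated for EF/F 3-5 tornadoes, corresponds to an effective reporting rate of a little over one third for that rating class.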

5.5 Modeling and Correcting Reporting Biases in the SPC Tornado Database