An Investigation of Population Bias in Tornado Records

Snow, John T.

Efforts have been made since the early 1950s to compile a comprehensive record of the occurrences of tornadoes in the United States. These efforts have resulted in a large database of more than 30,000 records of tornado occurrences maintained at the National Weather Service's Storm Prediction Center. Investigators using this database quickly come to appreciate that it contains both errors and biases. Research on one of the latter, population bias, is the subject of this presentation. In its simplest form, population bias results from some areas of the country having too few people to ensure that all tornado occurrences in those areas are observed and reported. It manifests itself in apparent concentrations of tornadoes around urban areas and along major transportation corridors. Population bias is further complicated by being time-dependent. In the last five decades, not only has the absolute population of the United States increased significantly, but the rural population has also declined while urbanization and urban sprawl have grown.

In the research reported here, principal component analysis was used to identify large regions of the nation wherein tornado statistics were estimated to be homogeneous. The occurrences of tornadoes in these regions by decade were then analyzed over time, using counties as basic area elements. In the course of this analysis, plots of accumulated area versus accumulated number of tornadoes were prepared. In these summations, the counties can be ordered in many ways. An obvious ordering is alphabetical by county name, which turns out to be essentially random with respect to size and population. As expected, the resulting plot is a nearly straight line from the origin, [zero area, zero tornadoes], to [total area of the region, total tornadoes reported in the region during the decade]. In contrast, if the data are ordered by county population density (county populations being obtained from national census data), the plot produces a characteristic bow-shaped curve. This curve lies below the straight line resulting from the random ordering, but has the same endpoints. The degree of separation between the two curves appears to be directly related to the population bias in the data being considered.
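
The accumulation procedure described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used in the study: the `County` record, the five county entries, and the `max_gap_below_diagonal` separation measure are all hypothetical, and the numbers are invented to show the mechanics. Counties are sorted by increasing population density, so the sparsely populated (and presumably under-reported) counties are accumulated first.

```python
from dataclasses import dataclass
from itertools import accumulate


@dataclass
class County:
    name: str
    area_km2: float
    population: int
    tornadoes: int  # tornadoes reported in the decade (invented values)

    @property
    def density(self) -> float:
        return self.population / self.area_km2


def cumulative_curve(counties, key):
    """Sort counties by `key`, then return (accumulated area, accumulated
    tornado count) as two lists that both start at the origin."""
    ordered = sorted(counties, key=key)
    areas = [0.0] + list(accumulate(c.area_km2 for c in ordered))
    counts = [0] + list(accumulate(c.tornadoes for c in ordered))
    return areas, counts


def max_gap_below_diagonal(areas, counts):
    """Hypothetical separation measure: the largest vertical shortfall of the
    curve below the straight line joining its two endpoints."""
    total_area, total_count = areas[-1], counts[-1]
    return max(total_count * a / total_area - n for a, n in zip(areas, counts))


# Invented example region; none of these values come from the tornado database.
counties = [
    County("Adams", 2000.0, 4000, 2),
    County("Baker", 1500.0, 150000, 9),
    County("Clark", 2500.0, 12500, 5),
    County("Dixon", 1000.0, 50000, 6),
    County("Ellis", 3000.0, 3000, 2),
]

areas_by_name, counts_by_name = cumulative_curve(counties, key=lambda c: c.name)
areas_by_density, counts_by_density = cumulative_curve(counties, key=lambda c: c.density)
gap_by_density = max_gap_below_diagonal(areas_by_density, counts_by_density)
```

Both orderings share the same endpoints by construction. With only five invented counties the alphabetical curve is not close to a straight line; the near-linear behavior reported above would be expected only when a region contains many counties.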

In light of this empirical finding, a simple heuristic model has been developed to relate population density to reported tornado occurrences. This model contains two parameters to be estimated from the data: a critical population density, above which all tornadoes that occur are assumed to be recorded; and a total number of tornadoes in the region for the decade. These two parameters are estimated by means of a nonlinear best fit to the data. In general, the fit to the observed data is good. The results indicate that many more tornadoes occurred than are recorded in the data set. Further, the fact that population bias can be represented by a simple model suggests that some tornado statistics can be corrected to give a better estimate of the risk of a tornado occurring at a particular location.
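
To make the two-parameter idea concrete, the following Python sketch fits a critical density and a regional total to county counts. The functional form is an assumption, since the abstract does not state it: here the fraction of tornadoes reported is taken to rise linearly with population density up to the critical density and to equal one above it, and tornadoes are assumed to occur uniformly by area within a homogeneous region. The fitting target (per-county counts), the grid search, and all names and numbers are likewise hypothetical. For a fixed critical density the model is linear in the total, so the total has a closed-form least-squares value and only the critical density needs to be searched.

```python
def report_fraction(density, critical_density):
    # ASSUMPTION: linear ramp below the critical density, full reporting above it.
    return min(1.0, density / critical_density)


def expected_shares(areas, densities, critical_density):
    """Per-county share of the regional total expected to be reported, assuming
    tornadoes occur uniformly by area across the region."""
    total_area = sum(areas)
    return [a / total_area * report_fraction(d, critical_density)
            for a, d in zip(areas, densities)]


def fit_model(areas, densities, reported, candidate_densities):
    """Grid search over the critical density; for each candidate the regional
    total is the closed-form least-squares estimate. Returns the tuple
    (critical_density, total_tornadoes, sum_of_squared_errors)."""
    best = None
    for rho_c in candidate_densities:
        g = expected_shares(areas, densities, rho_c)
        total = sum(y * gi for y, gi in zip(reported, g)) / sum(gi * gi for gi in g)
        sse = sum((y - total * gi) ** 2 for y, gi in zip(reported, g))
        if best is None or sse < best[2]:
            best = (rho_c, total, sse)
    return best


# Synthetic, noise-free check: generate "reported" counts from known parameters
# (critical density 20 persons per km^2, 100 tornadoes) and recover them.
areas = [3000.0, 2000.0, 2500.0, 1000.0, 1500.0]   # km^2, invented
densities = [1.0, 2.0, 5.0, 50.0, 100.0]           # persons per km^2, invented
reported = [100.0 * g for g in expected_shares(areas, densities, 20.0)]

fitted_density, fitted_total, fit_error = fit_model(
    areas, densities, reported, candidate_densities=[5.0, 10.0, 20.0, 40.0, 80.0]
)
```

In this synthetic case the reported counts sum to well under the assumed true total of 100, which mirrors the qualitative conclusion above that more tornadoes occurred than were recorded. With real data, a finer grid or a general nonlinear optimizer would be a natural replacement for the coarse candidate list.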

5A.3 An Investigation of Population Bias in Tornado Records