In the research reported here, principal component analysis was used to identify large regions of the nation within which tornado statistics were estimated to be homogeneous. Tornado occurrences in these regions were then analyzed by decade, using counties as the basic area elements. In the course of this analysis, plots of accumulated area versus accumulated number of tornadoes were prepared. The counties can be summed in many possible orders. An obvious ordering is alphabetical by county name, which turns out to be essentially random with respect to county size and population. As expected, the resulting plot is a nearly straight line from the origin [zero area, zero tornadoes] to [total area of the region, total tornadoes reported in the region for the decade]. In contrast, if the counties are ordered by population density (county populations being obtained from national census data), the plot is a characteristic bow-shaped curve. This curve lies below the straight line resulting from the random ordering but has the same endpoints. The degree of separation between the two curves appears to be directly related to the population bias in the data being considered.
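The accumulation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the county records are invented toy values, the field names and the helper `max_gap_below_chord` are hypothetical, and counties are assumed to be sorted by increasing population density, which is the ordering that would place the curve below the straight line.

```python
from itertools import accumulate

def cumulative_curve(counties, key):
    """Accumulate area and tornado counts over counties sorted by `key`.

    Both returned lists start at the origin.
    """
    ordered = sorted(counties, key=key)
    cum_area = [0.0] + list(accumulate(c["area"] for c in ordered))
    cum_tornadoes = [0] + list(accumulate(c["tornadoes"] for c in ordered))
    return cum_area, cum_tornadoes

def max_gap_below_chord(cum_area, cum_tornadoes):
    """Largest vertical distance from the origin-to-endpoint line down to the curve.

    Hypothetical separation measure, not taken from the source.
    """
    slope = cum_tornadoes[-1] / cum_area[-1]
    return max(slope * a - t for a, t in zip(cum_area, cum_tornadoes))

# Invented toy data: four equal-area counties (not real tornado records).
counties = [
    {"name": "Adams", "area": 1000.0, "population": 5000, "tornadoes": 1},
    {"name": "Baker", "area": 1000.0, "population": 200000, "tornadoes": 6},
    {"name": "Clark", "area": 1000.0, "population": 20000, "tornadoes": 3},
    {"name": "Dunn", "area": 1000.0, "population": 80000, "tornadoes": 5},
]

# Ordering by name (treated as effectively random) and by population density.
alphabetical = cumulative_curve(counties, key=lambda c: c["name"])
by_density = cumulative_curve(counties, key=lambda c: c["population"] / c["area"])

gap = max_gap_below_chord(*by_density)
```

With only four counties the alphabetical curve is far from straight; the near-linear behavior reported above would be expected only when many counties are summed. Both orderings end at the same point, and every interior point of the density-ordered curve falls below the straight line.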
In light of this empirical finding, a simple heuristic model has been developed to relate population density to reported tornado occurrences. This model contains two parameters to be estimated from the data: a critical population density, above which all tornadoes that occur are assumed to be recorded; and the total number of tornadoes in the region for the decade. These two parameters are estimated by a non-linear best fit to the data. In general, the fit to the observed data is good. The results indicate that many more tornadoes occurred than are recorded in the data set. Further, the fact that population bias can be represented by a simple model suggests that some tornado statistics can be corrected to give a better estimate of the risk of a tornado occurring at a particular location.
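One way such a two-parameter fit could look is sketched below. The functional form is an assumption, since the text above does not specify it: tornadoes are taken to occur uniformly per unit area, and the reported fraction is taken to rise linearly with population density up to the critical density and to equal one above it. The names `weights`, `fit`, `d_crit`, and `n_total` are hypothetical, the grid search stands in for whatever non-linear fitting procedure was actually used, and the data are synthetic values generated from the assumed model itself.

```python
def weights(counties, d_crit):
    """Expected share of the regional total that is reported from each county.

    ASSUMPTION (not from the source): occurrence is uniform per unit area, and
    the reported fraction is min(1, density / d_crit).
    """
    total_area = sum(c["area"] for c in counties)
    return [
        (c["area"] / total_area) * min(1.0, (c["population"] / c["area"]) / d_crit)
        for c in counties
    ]

def fit(counties, observed, d_crit_grid):
    """Least-squares estimate of (n_total, d_crit), profiling over a d_crit grid."""
    best = None
    for d_crit in d_crit_grid:
        w = weights(counties, d_crit)
        # For a fixed d_crit the model is linear in n_total, so the
        # least-squares value of n_total has a closed form.
        n_total = sum(o * wi for o, wi in zip(observed, w)) / sum(wi * wi for wi in w)
        sse = sum((o - n_total * wi) ** 2 for o, wi in zip(observed, w))
        if best is None or sse < best[0]:
            best = (sse, n_total, d_crit)
    return best[1], best[2]

# Synthetic check: six equal-area counties, with reports generated from the
# assumed model using n_total = 100 and d_crit = 50 (invented values).
densities = [5, 10, 25, 50, 100, 200]
counties = [{"area": 1000.0, "population": d * 1000.0} for d in densities]
observed = [100.0 * w for w in weights(counties, 50.0)]

n_hat, d_hat = fit(counties, observed, [float(d) for d in range(1, 301)])
recorded_total = sum(observed)
```

In this synthetic case the fit recovers the generating parameters, and the recorded total is well below the estimated total, which mirrors the qualitative conclusion above. Real county data would be noisy integer counts, so a weighted or likelihood-based fit might be preferable to plain least squares.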