9.3 "The Stippling Shows Statistically Significant Gridpoints": How Research Results are Routinely Overstated and Over-interpreted, and What to Do About It (Invited Presentation)

Wednesday, 13 January 2016: 11:30 AM
Room 226/227 (New Orleans Ernest N. Morial Convention Center)
Daniel S. Wilks, Cornell University, Ithaca, NY
Manuscript (705.3 kB)

The problem of simultaneously evaluating the results of multiple hypothesis tests, often at a large network of gridpoints or other geographic locations, is widespread in meteorology and climatology. Unfortunately, the dominant approach to this problem in the literature is to naively examine each gridpoint test in isolation, and then to report as "significant" any result for which a local null hypothesis is rejected, with no adjustment for the effects of test multiplicity on the overall result. As a consequence, language similar to the hypothetical quotation in the title of this paper is distressingly common, and it immediately flags the portrayed results as almost certainly overstated. This statistically unprincipled practice should be unacceptable to reviewers and editors of scientific papers.
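The effect of test multiplicity can be made concrete with a little arithmetic. The following is a minimal sketch, not taken from the paper: the function names are hypothetical, and the second function assumes independent tests, which gridded atmospheric data generally do not satisfy. If every local null hypothesis is true and each test is run at level alpha, the expected number of spurious local rejections is simply the number of tests times alpha, whatever the spatial correlation.

```python
def expected_false_rejections(n_tests: int, alpha: float) -> float:
    """Expected number of spurious local rejections when every local null
    hypothesis is true and each test is run at level alpha. This holds
    regardless of correlation among the tests (linearity of expectation)."""
    return n_tests * alpha


def prob_at_least_one_false_rejection(n_tests: int, alpha: float) -> float:
    """Probability of at least one spurious rejection when every local null
    hypothesis is true. ASSUMPTION: the tests are mutually independent,
    which is an idealization for spatially correlated gridpoints."""
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    # Hypothetical example: 1000 gridpoints, each tested at the 5% level.
    n_gridpoints, alpha_local = 1000, 0.05
    print(expected_false_rejections(n_gridpoints, alpha_local))
    print(prob_at_least_one_false_rejection(n_gridpoints, alpha_local))
```

For a hypothetical grid of 1000 points tested at the 5% level, about 50 gridpoints would be expected to appear "significant" even if no real signal were present anywhere.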

Controlling the False Discovery Rate (FDR) has many favorable attributes, including only modest sensitivity to spatial autocorrelation in the underlying data. Perhaps the greatest advantage of the FDR approach is that, by design, a control limit is placed on the fraction of significant gridpoint test results that are spurious, which greatly enhances the interpretability of the spatial patterns of significant results. Because the FDR approach is not only effective but also easy and computationally fast, it should be adopted whenever the results of simultaneous multiple hypothesis tests are reported or interpreted in the literature. Its main computational demand is only that the individual gridpoint p-values be sorted and examined. The usual strong spatial correlation encountered in gridded atmospheric data can be accommodated. The consequence of employing this statistically principled procedure, in stark contrast to the all-too-common naive approach, is that there is much less scope for overstatement and over-interpretation of the results. In particular, the analyst is not tempted to construct possibly fanciful rationalizations for the many spurious local test rejections that competing methods produce, which may appear to be physically coherent structures because of the strong spatial autocorrelation.
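The "sort and examine" step can be sketched as follows. This is an illustrative sketch only: it assumes the FDR control meant here is the standard Benjamini-Hochberg step-up procedure, which the abstract does not name, and the function name `fdr_reject`, the parameter `alpha_fdr`, and its default value are hypothetical choices. It also assumes the p-value field contains no missing values.

```python
import numpy as np


def fdr_reject(p_values, alpha_fdr=0.05):
    """Benjamini-Hochberg style FDR screening of a field of local p-values.

    ASSUMPTIONS: the standard Benjamini-Hochberg step-up rule is used, and
    p_values contains no NaNs. alpha_fdr is the chosen control level for
    the expected fraction of rejections that are spurious.

    Returns a boolean mask (same shape as p_values) of rejected local null
    hypotheses, and the p-value threshold that was applied.
    """
    p = np.asarray(p_values, dtype=float)
    flat = p.ravel()
    n = flat.size

    # Sort the local p-values in ascending order: p_(1) <= ... <= p_(n).
    sorted_p = np.sort(flat)

    # Compare each sorted p-value with its sliding limit (i / n) * alpha_fdr.
    limits = alpha_fdr * np.arange(1, n + 1) / n
    below = np.nonzero(sorted_p <= limits)[0]

    # If no sorted p-value falls under its limit, nothing is rejected.
    if below.size == 0:
        return np.zeros(p.shape, dtype=bool), 0.0

    # The threshold is the largest sorted p-value that satisfies its limit;
    # every local test with a p-value at or below it is rejected.
    p_star = sorted_p[below[-1]]
    return p <= p_star, p_star


if __name__ == "__main__":
    # Hypothetical p-values from ten local tests.
    p_vals = [0.001, 0.008, 0.039, 0.041, 0.042,
              0.060, 0.074, 0.205, 0.212, 0.216]
    mask, threshold = fdr_reject(p_vals, alpha_fdr=0.05)
    print("naive rejections:", int(np.sum(np.asarray(p_vals) <= 0.05)))
    print("FDR rejections:  ", int(mask.sum()), "threshold:", threshold)
```

With these ten hypothetical p-values, the naive approach at the 5% level rejects five local null hypotheses, whereas the FDR screening at the same nominal level retains only the two smallest.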
