8.2 Statistical Significance: the Wrong Answer to the Wrong Question (Invited Presentation)

Wednesday, 13 January 2016: 9:00 AM
Room 226/227 (New Orleans Ernest N. Morial Convention Center)
Simon J. Mason, Columbia University/IRI, Palisades, NY

Statistical significance tests have become a widely adopted procedure for assessing whether a particular research result is "meaningful" or "correct". The basic idea involves calculating a test statistic (perhaps a correlation, or a difference in means), and then calculating the probability that the result could have been equaled or bettered by chance. If this probability is sufficiently low (less than 5%, or in some cases 10%) then the result is considered sufficiently strong to be proof of ..., well, at this point the logic typically starts to become a bit hazy and questionable! What does a significance test actually tell us, and is what it tells us even interesting? Problems with significance testing are beginning to be seen as so egregious that some journals, notably in the statistics and medical literature, will not publish articles that use such tests. The problems are so severe that it can be demonstrated that most claimed research findings are false (although results in some disciplines are more susceptible to being falsely positive than in others). Why is this true? Some of the problems with significance tests are widely recognised (correlation does not imply causation, for example), but others are apparently less widely acknowledged. In this presentation I will detail what the p-value does mean, and why it remains possible for such a large proportion of published research results to be false despite apparently rigorous significance testing. I will outline the reasons why significance testing should be discouraged, pointing to two main issues: first, that the p-value does not really address the question we are ultimately interested in; and second, that the tests are invariably invalid, whether because of violated assumptions or because of inherent biases in the way science proceeds (we are much more inclined to look for relationships between two or more sets of data than to demonstrate that such relationships do not exist). Of course, the fact that statistical significance testing does not work very well is not an excuse for ignoring the questions we are falteringly trying to address with such tests, and so some alternative procedures for assessing whether our research results are "meaningful" will be proposed.
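To make concrete what the p-value described above is (and is not), here is a minimal sketch of a significance test as a permutation test for a difference in means. The data, sample sizes, and random seed are all invented for illustration and are not taken from the presentation.

```python
# A minimal sketch of what a p-value actually is, using a permutation test for a
# difference in means. All data here are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical samples whose means we want to compare.
a = rng.normal(loc=0.3, scale=1.0, size=30)
b = rng.normal(loc=0.0, scale=1.0, size=30)

observed = a.mean() - b.mean()

# Under the null hypothesis that both samples come from the same population,
# the group labels are arbitrary, so shuffle them many times and count how often
# a difference at least as large as the observed one arises by chance.
pooled = np.concatenate([a, b])
n_perm = 10_000
count = 0
for _ in range(n_perm):
    perm = rng.permutation(pooled)
    diff = perm[: a.size].mean() - perm[a.size :].mean()
    if abs(diff) >= abs(observed):
        count += 1

p_value = count / n_perm
print(f"observed difference = {observed:.3f}, p-value = {p_value:.4f}")
# Note what this number is: the probability of a result this extreme *given that
# the null hypothesis is true*. It is not the probability that the null is true,
# nor the probability that the finding will replicate.
```

The closing comment is the crux of the abstract's first issue: the p-value conditions on the null hypothesis, which is rarely the question researchers ultimately want answered.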
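The claim that most published findings can be false despite "significant" p-values follows from simple base-rate arithmetic. The sketch below uses assumed values for the prior probability of a true effect, statistical power, and the significance threshold; these numbers are illustrative only and are not from the presentation.

```python
# Back-of-the-envelope arithmetic for how most "significant" findings can be false.
# The numbers below (alpha, power, prior) are illustrative assumptions.
alpha = 0.05   # significance threshold: false-positive rate when the null is true
power = 0.5    # probability of detecting a real effect when one exists
prior = 0.1    # fraction of tested hypotheses that are actually true

true_positives = power * prior          # real effects that reach significance
false_positives = alpha * (1.0 - prior) # null effects that reach significance anyway

# Proportion of "significant" results that reflect a real effect.
ppv = true_positives / (true_positives + false_positives)
print(f"Share of significant results that are true:  {ppv:.2f}")
print(f"Share of significant results that are false: {1 - ppv:.2f}")
# With these assumptions roughly half of all significant results are false, and
# the false share grows with lower power, rarer true effects, multiple testing,
# or selective reporting.
```

Lower power, rarer true hypotheses, and the search biases mentioned in the abstract all push this false share higher, which is how "rigorous" significance testing can coexist with a literature dominated by false findings.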