Overestimating Forecast Skill Through Improper Application of Verification Metrics: Simpson's Paradox in Meteorology
Thomas M. Hamill, NOAA/CIRES/CDC, Boulder, CO; and J. Juras
It is common practice to summarize the skill of weather forecasts by pooling samples that span many locations and dates. Many of the verification metrics calculated this way implicitly assume that the climatological frequency of event occurrence is the same for all samples. If the event frequency actually varies among the samples, the scores may report fictitiously high skill. This is an example of the previously described statistical conundrum known as “Simpson's Paradox.” Many common deterministic verification metrics, such as threat scores, are subject to this overestimation of skill, and probabilistic forecast metrics such as the Brier skill score and the relative operating characteristic are also affected. Demonstrations of the false skill are provided, and guidelines are suggested for adapting these diagnostics to avoid the problem.
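The effect described above can be illustrated with a minimal sketch. It is not taken from the paper: the two locations, their event frequencies (0.1 and 0.5), the sample sizes, and the function names are all assumptions chosen for illustration. Each location receives a forecast equal to its own climatological frequency, so the forecast has no skill relative to local climatology, yet the Brier skill score computed against the pooled climatology comes out positive.

```python
# Illustrative only: hypothetical data, not the authors' demonstration.

def brier_score(probs, obs):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, obs)) / len(obs)

def brier_skill_score(probs, obs):
    """BSS relative to the climatological frequency of the samples supplied."""
    clim = sum(obs) / len(obs)
    bs_ref = brier_score([clim] * len(obs), obs)
    return 1.0 - brier_score(probs, obs) / bs_ref

# Assumed location A: event occurs in 1 of 10 samples (frequency 0.1).
obs_a = [1] + [0] * 9
# Assumed location B: event occurs in 5 of 10 samples (frequency 0.5).
obs_b = [1] * 5 + [0] * 5

# Forecasts simply repeat each location's own climatology (no real skill).
probs_a = [0.1] * 10
probs_b = [0.5] * 10

# Skill computed separately at each location, against local climatology.
bss_a = brier_skill_score(probs_a, obs_a)
bss_b = brier_skill_score(probs_b, obs_b)

# Skill computed after pooling, against the pooled climatology (0.3).
bss_pooled = brier_skill_score(probs_a + probs_b, obs_a + obs_b)

print(f"BSS at A: {bss_a:.3f}")
print(f"BSS at B: {bss_b:.3f}")
print(f"BSS pooled: {bss_pooled:.3f}")
```

In this sketch the per-location scores are zero while the pooled score is about 0.19. The apparent skill comes entirely from the forecast distinguishing a low-frequency location from a high-frequency one, not from predicting individual events.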
Session 1, Forecast Evaluation
Monday, 30 January 2006, 9:00 AM-11:45 AM, A304