1.4
Overestimating Forecast Skill Through Improper Application of Verification Metrics: Simpson's Paradox in Meteorology

Monday, 30 January 2006: 9:45 AM
A304 (Georgia World Congress Center)
Thomas M. Hamill, Physical Sciences Division/ESRL/NOAA, Boulder, CO; and J. Juras

It is common practice to summarize the skill of weather forecasts by pooling samples that span many locations and dates. Many of the verification metrics computed this way implicitly assume that the climatological frequency of event occurrence is the same for all samples. If the event frequency actually varies among the samples, the scores may report fictitiously high skill. This is an example of the previously described statistical conundrum known as “Simpson's Paradox.” Many common deterministic verification metrics, such as threat scores, are subject to this overestimation of skill, and probabilistic forecast metrics such as the Brier skill score and the relative operating characteristic are also affected. Demonstrations of the false skill are provided, and guidelines are suggested for adapting these diagnostics to avoid the problem.
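The effect described above can be illustrated with a minimal sketch. It is not the authors' demonstration: the two sites, their event frequencies (0.05 and 0.50), the sample size, and all function names are assumptions chosen for illustration. Each site is given a forecast that simply repeats its own climatological frequency, so the forecast has no skill at either site. Pooling the samples and scoring them against a single pooled climatology nevertheless yields a Brier skill score of roughly 0.25, while scoring against each sample's own site climatology returns zero.

```python
import random


def brier_score(probs, outcomes):
    """Mean squared difference between forecast probability and outcome (0 or 1)."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)


def brier_skill_score(forecast, reference, outcomes):
    """Skill of `forecast` relative to `reference`; 0 means no improvement."""
    return 1.0 - brier_score(forecast, outcomes) / brier_score(reference, outcomes)


random.seed(0)
n = 50_000  # samples per site (assumed)

# Hypothetical sites with very different climatological event frequencies.
climatology = {"dry_site": 0.05, "wet_site": 0.50}

# Observations are drawn independently of anything but the site climatology.
observed = {
    site: [1.0 if random.random() < p else 0.0 for _ in range(n)]
    for site, p in climatology.items()
}

# The "forecast" just repeats the site climatology: no real skill.
forecast = {site: [p] * n for site, p in climatology.items()}

# Scored site by site against the site's own climatology: zero skill.
per_site_bss = {
    site: brier_skill_score(forecast[site], forecast[site], observed[site])
    for site in climatology
}

# Pool all samples and score against one pooled climatological frequency.
all_obs = [o for site in climatology for o in observed[site]]
all_fcst = [f for site in climatology for f in forecast[site]]
pooled_freq = sum(all_obs) / len(all_obs)
pooled_bss = brier_skill_score(all_fcst, [pooled_freq] * len(all_obs), all_obs)

# Same pooled samples, but the reference uses each sample's own site climatology.
site_reference = [climatology[site] for site in climatology for _ in range(n)]
stratified_bss = brier_skill_score(all_fcst, site_reference, all_obs)

print("per-site BSS:", per_site_bss)
print("pooled BSS, single climatology:", round(pooled_bss, 3))
print("pooled BSS, site-specific climatology:", round(stratified_bss, 3))
```

The apparent skill in the pooled score comes entirely from the forecast "knowing" which site each sample belongs to, not from any ability to distinguish event days from non-event days at a given site. Using a reference that varies with the sample's climatology is one possible way to remove that artifact; it is shown here only as an assumption-laden sketch, not as the specific guideline proposed in the paper.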