14.3 Towards Improving the Framework for Probabilistic Forecast Evaluation

Thursday, 14 January 2016: 4:00 PM
Room 226/227 (New Orleans Ernest N. Morial Convention Center)
Hailiang Du, University of Chicago, Chicago, IL; and L. Smith, E. B. Suckling, E. L. Thompson, and T. Maynard

The evaluation of forecast performance plays a central role both in the interpretation and use of forecast systems and in their development. Many forecast systems are available, but evaluations of their performance are not standardized, with many different scores being used to measure different aspects of performance. Ensemble interpretations which treat a probability forecast as a single delta function (such as the ensemble mean) or as a collection of delta functions (reflecting, for example, the position of each ensemble member), rather than considering all the probabilistic information available, may provide misleading estimates of skill in nonlinear systems. Even when the discussion is restricted to proper scores, there remains considerable variability between scores in terms of their sensitivity to outcomes in regions of low (or vanishing) probability; proper scores need not rank competing forecast systems in the same order when each forecast system is imperfect. The properties of several proper scores for probabilistic forecast evaluation are contrasted and then used to interpret decadal probability hindcasts of global mean temperature. The Continuous Ranked Probability Score (CRPS), the Proper Linear (PL) score, and I. J. Good's logarithmic score (also referred to as Ignorance) are compared; although information from all three may be useful, the logarithmic score has an immediate interpretation and is not insensitive to forecast busts. Neither CRPS nor PL is local; this is shown to produce counterintuitive evaluations by CRPS. Comparing scores for forecast systems based on physical models (for example HadCM3 from the CMIP5 decadal archive) against benchmark forecasts from empirical models is more informative than comparing systems based on similar physical simulation models with each other. Physically inspired empirical models are shown to display probabilistic skill comparable to that of today's state-of-the-art simulation models.
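As an illustration of how the three scores respond to outcomes in low-probability regions, the following is a minimal sketch that evaluates each score for a single Gaussian forecast density. The Gaussian form, the function names, and the example outcomes are assumptions made for illustration only; the abstract does not state how forecast densities were constructed. All scores are taken as negatively oriented (lower is better), with Ignorance as -log2 p(y), PL in the commonly used form of the integral of p squared minus 2 p(y), and CRPS in its closed form for a normal distribution. Only Ignorance depends solely on the density at the outcome, which is what makes it local.

```python
import math

def gaussian_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def gaussian_cdf(y, mu, sigma):
    """Cumulative distribution of N(mu, sigma^2) at y."""
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))

def ignorance(y, mu, sigma):
    """Logarithmic score -log2 p(y), computed analytically to avoid underflow."""
    z = (y - mu) / sigma
    return 0.5 * z * z / math.log(2.0) + math.log2(sigma * math.sqrt(2.0 * math.pi))

def proper_linear(y, mu, sigma):
    """Proper linear score: integral of p(x)^2 dx minus 2 p(y).
    For a Gaussian, the integral of p^2 equals 1 / (2 sigma sqrt(pi))."""
    return 1.0 / (2.0 * sigma * math.sqrt(math.pi)) - 2.0 * gaussian_pdf(y, mu, sigma)

def crps(y, mu, sigma):
    """Closed-form CRPS for a Gaussian forecast."""
    z = (y - mu) / sigma
    phi = gaussian_pdf(z, 0.0, 1.0)
    big_phi = gaussian_cdf(z, 0.0, 1.0)
    return sigma * (z * (2.0 * big_phi - 1.0) + 2.0 * phi - 1.0 / math.sqrt(math.pi))

if __name__ == "__main__":
    mu, sigma = 0.0, 1.0  # hypothetical forecast: standard normal
    for y in (0.0, 2.0, 6.0):  # hypothetical outcomes; 6.0 mimics a forecast bust
        print(f"y={y:4.1f}  IGN={ignorance(y, mu, sigma):8.3f}  "
              f"PL={proper_linear(y, mu, sigma):7.3f}  CRPS={crps(y, mu, sigma):6.3f}")
```

In this sketch the penalty for an outcome far in the tail grows quadratically under Ignorance, roughly linearly under CRPS, and saturates under PL, since the PL score can never exceed the integral of the squared density.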