2.1 Verifying probabilistic forecasts: calibration and sharpness
Tilmann Gneiting, University of Washington, Seattle, WA; and A. E. Raftery, F. Balabdaoui, and A. Westveld
During the past decade, ensemble forecasts have become the preferred operational tool for generating probabilistic forecasts of future weather events. Consequently, tools for assessing and comparing ensemble forecasts and, more generally, probabilistic forecasting techniques are in high demand.
A method of probabilistic forecasting is calibrated if events that are declared to have probability p occur with relative frequency p. It is sharp if the average length of the resulting prediction intervals is shorter than that obtained from naive probabilistic forecasts, such as those based on climatology. A perfect probabilistic forecaster maximizes sharpness subject to calibration. We propose a diagnostic approach to assessing and comparing probabilistic forecasters that is based on this principle of maximizing sharpness subject to calibration and uses diagnostic graphs and plots as well as summary measures. The notions of probabilistic calibration, exceedance calibration, and marginal calibration are introduced. Special emphasis is placed on ramifications of the verification rank histogram and the multicategory reliability diagram.
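The abstract itself contains no code; purely as an illustration of the two quantities just defined, the following minimal Python sketch computes a verification rank histogram (a check of probabilistic calibration) and the average width of central prediction intervals (a measure of sharpness) from an ensemble. The function names and the toy data are hypothetical, not taken from the paper.

```python
import numpy as np

def verification_rank_histogram(ensemble, obs):
    """Relative frequency of the observation's rank among the ensemble
    members; approximately flat for a calibrated forecaster.
    ensemble: (n_cases, n_members), obs: (n_cases,).
    (Illustrative helper; tie-breaking is omitted for simplicity.)"""
    n_cases, n_members = ensemble.shape
    # Rank of each observation within its ensemble, from 1 to n_members + 1.
    ranks = 1 + np.sum(ensemble < obs[:, None], axis=1)
    counts = np.bincount(ranks, minlength=n_members + 2)[1:]
    return counts / n_cases

def mean_interval_width(ensemble, level=0.9):
    """Sharpness: average width of central prediction intervals
    obtained from the ensemble quantiles."""
    lo, hi = np.quantile(ensemble, [(1 - level) / 2, (1 + level) / 2], axis=1)
    return float(np.mean(hi - lo))

# Toy example: a calibrated forecaster whose ensemble members are drawn
# from the same distribution as the observations.
rng = np.random.default_rng(0)
mu = rng.normal(size=1000)
obs = mu + rng.normal(size=1000)
ens = mu[:, None] + rng.normal(size=(1000, 50))
print(verification_rank_histogram(ens, obs))  # roughly uniform, near 1/51
print(mean_interval_width(ens))               # near 2 * 1.645 ~ 3.29 for N(mu, 1)
```

In this setting a calibrated forecaster produces a roughly uniform rank histogram; among calibrated forecasters, the sharper one yields the smaller mean interval width.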
The various diagnostic tools are introduced in a simulation study and then applied to assess and compare methods of probabilistic temperature forecasting over the Pacific Northwest, based on the University of Washington MM5 mesoscale ensemble.
Session 2, Forecast Evaluation: Tuesday, 13 January 2004, 8:30 AM-2:30 PM, Room 3A