A method of probabilistic forecasting is calibrated if events that are declared to have probability p occur with relative frequency p. It is sharp if the resulting prediction intervals are shorter, on average, than those obtained from naive probabilistic forecasts, such as forecasts based on climatology. A perfect probabilistic forecaster maximizes sharpness subject to calibration. We propose a diagnostic approach to assessing and comparing probabilistic forecasters that is based on this principle of maximizing sharpness subject to calibration and that uses diagnostic graphs and plots as well as summary measures. The notions of probabilistic calibration, exceedance calibration, and marginal calibration are introduced. Special emphasis is placed on ramifications of the verification rank histogram and the multicategory reliability diagram.
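As a minimal sketch of these two ideas in Python, using hypothetical synthetic data rather than the MM5 ensemble, the verification rank histogram checks the calibration of an ensemble forecast (uniform ranks indicate calibration), while the average width of the central prediction intervals measures sharpness. The data-generating setup and all variable names below are illustrative assumptions, not part of the study itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic setup (illustrative only): a predictable signal mu,
# a verifying observation, and an ensemble whose members are exchangeable
# with the observation, i.e., a calibrated ensemble forecast.
n_cases, n_members = 5000, 10
mu = rng.normal(size=n_cases)
obs = mu + rng.normal(size=n_cases)
ensemble = mu[:, None] + rng.normal(size=(n_cases, n_members))

# Verification rank histogram: the rank of each observation among its
# ensemble members. For a calibrated ensemble, the ranks 1, ..., n_members + 1
# are (approximately) uniformly distributed; U-shaped or hump-shaped
# histograms indicate underdispersion or overdispersion, respectively.
ranks = 1 + (ensemble < obs[:, None]).sum(axis=1)
counts = np.bincount(ranks - 1, minlength=n_members + 1)
print("rank histogram counts:", counts)

# Sharpness: the average width of the central 90% prediction interval,
# read off the ensemble's empirical 5th and 95th percentiles case by case.
lo, hi = np.percentile(ensemble, [5, 95], axis=1)
print("mean 90% interval width:", (hi - lo).mean())
```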
The various diagnostic tools are introduced in a simulation study and then applied to assess and compare methods of probabilistic temperature forecasting over the Pacific Northwest that are based on the University of Washington MM5 mesoscale ensemble.