Abstract: Evaluating verification procedures for ensemble precipitation predictions (2002 Annual)

Monday, 14 January 2002: 2:43 PM

Evaluating verification procedures for ensemble precipitation predictions

Edward I. Tollerud, NOAA/FSL, Boulder, CO; and A. F. Loughe

With increasing operational emphasis on ensemble predictions, users of model results are now faced with the requirement to assess a range of predicted gridpoint values rather than a single deterministic value. Furthermore, since observations themselves cannot be considered perfectly accurate or well-sampled, verification fields should also be considered to possess uncertainty. The credibility of verification scores is potentially sensitive to both of these sources of uncertainty. For instance, the conclusion that an improved score indicates better model performance is complicated by the possibility that the improvement lies within the variability to be expected when individual ensemble members are verified against uncertain observations.

Using Eta model ensemble predictions of rainfall produced by the NCEP Short Range Ensemble Forecast system, several standard precipitation verification scores (including frequency and magnitude bias, equitable threat score, and simple average absolute difference), and the operational daily raingage network, we estimate confidence intervals for scores under several different scenarios. First, to determine the effect of observation quality on verification, we compare verification results produced subject to different levels of raingage quality control. Second, to gain a more general idea of observation variability and its effect on scores, we compute a range ("ensemble", if you will) of possible observed precipitation fields using bootstrapping methods to resample the raingage observations. A byproduct of these latter computations is the determination of confidence intervals for the actual verification scores and a partial answer to the question: How large must a difference in these scores be to justify claims of model superiority or model improvement? With the results of these comparisons in mind, we qualitatively address the legitimacy of estimates of verification scores computed using average or median ensemble predictions and average or median ensemble observations.
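
As an illustration only, the bootstrap idea described above can be sketched in Python. This is not the authors' procedure: the function names, the single precipitation threshold, the pairing of one forecast value with each gage, and the simple percentile interval are all assumptions made for the example. The sketch resamples gage locations with replacement, recomputes the equitable threat score for each resample, and reads a confidence interval off the sorted scores.

```python
import random


def contingency(forecast, observed, threshold):
    """Count hits, false alarms, and misses for one precipitation threshold."""
    hits = false_alarms = misses = 0
    for f, o in zip(forecast, observed):
        f_yes, o_yes = f >= threshold, o >= threshold
        if f_yes and o_yes:
            hits += 1
        elif f_yes:
            false_alarms += 1
        elif o_yes:
            misses += 1
    return hits, false_alarms, misses, len(observed)


def frequency_bias(hits, false_alarms, misses):
    """Forecast event count divided by observed event count (None if undefined)."""
    observed_events = hits + misses
    if observed_events == 0:
        return None
    return (hits + false_alarms) / observed_events


def equitable_threat_score(hits, false_alarms, misses, total):
    """Threat score corrected for hits expected by chance (None if undefined)."""
    hits_random = (hits + false_alarms) * (hits + misses) / total
    denom = hits + false_alarms + misses - hits_random
    if denom == 0:
        return None
    return (hits - hits_random) / denom


def bootstrap_interval(forecast, observed, threshold,
                       n_boot=1000, alpha=0.05, seed=0):
    """Percentile interval for the equitable threat score.

    Assumption: each gage is paired with one forecast value, and gages are
    resampled with replacement as if they were independent.
    """
    rng = random.Random(seed)
    n = len(observed)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        counts = contingency([forecast[i] for i in idx],
                             [observed[i] for i in idx], threshold)
        score = equitable_threat_score(*counts)
        if score is not None:  # skip resamples where the score is undefined
            scores.append(score)
    if not scores:
        return None
    scores.sort()
    last = len(scores) - 1
    lower = scores[int((alpha / 2) * last)]
    upper = scores[int((1 - alpha / 2) * last)]
    return lower, upper
```

Under these assumptions, a score difference between two forecasts that falls inside the interval returned by `bootstrap_interval` would not by itself support a claim of model improvement. Real gage networks are spatially correlated, so independent resampling of gages is likely to understate the true uncertainty.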
