13.2 The Daily Variation in Verification Scores

Thursday, 14 January 2016: 2:00 PM
Room 226/227 (New Orleans Ernest N. Morial Convention Center)
Huug van den Dool, NOAA/NWS/NCEP/CPC, College Park, MD; and S. Saha and Å. Johansson

Although verification metrics are closely scrutinized when major institutional decisions about model implementation are made, these metrics hold many secrets that have yet to be understood. The anomaly correlation (AC) for Z500 at day 5 from 20°N to the pole is one such metric. How the AC varies as a function of space is very rarely reported, presumably because the results are bewildering.

Another mystery, and the subject of this paper, is the daily variation in verification scores. While the RMSE metric tends to increase with forecast lead time, one can see irregularities superimposed on its growth that look like some sort of diurnal variation. To study this, one needs sufficient temporal resolution, i.e. all four 'cycles' (00, 06, 12 and 18Z starts, or more), output data every 6 hours (or better), and a large enough sample of hindcasts. One needs to study this feature for many variables at many levels in the vertical because the results (and their interpretation) are far from simple. For Z500 day-5 scores one may find that only the arrival time matters, not the departure time, i.e. scores for forecasts arriving at 00Z and 12Z are better than those arriving at 06Z and 18Z, regardless of departure time. This suggests an explanation that relates to the physical phenomenon of the diurnal cycle. Closer to the surface, and especially for the volatile temperature, the departure time matters as well, pointing instead to a non-physical diurnal effect: the amount of data ingested is larger at 00Z and 12Z.

One must distinguish short-range and medium-range verification here, because in the all-important short range (where the growth of small errors takes place, and the onset of bias can be studied) the verification cannot be done against analyses. The only alternative is to verify against observations (either at stations or analyzed in univariate fashion to a grid).
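To make the two metrics concrete, the sketch below (not from the authors; a minimal Python illustration with invented function names) computes a centered, area-weighted anomaly correlation over a field such as Z500 north of 20°N, and groups RMSE values by the forecast "arrival" (verification) hour, i.e. (start hour + lead time) mod 24:

```python
import numpy as np

def anomaly_correlation(forecast, verification, climatology, weights=None):
    """Centered anomaly correlation (AC) over a spatial field.

    forecast, verification, climatology: 1-D arrays of grid-point values
    (e.g. Z500 from 20N to the pole, flattened). weights: optional area
    weights, typically cos(latitude).
    """
    fa = forecast - climatology       # forecast anomaly
    va = verification - climatology   # verifying-analysis anomaly
    if weights is None:
        weights = np.ones_like(fa)
    # Remove the weighted mean so the correlation is centered.
    fa = fa - np.average(fa, weights=weights)
    va = va - np.average(va, weights=weights)
    num = np.average(fa * va, weights=weights)
    den = np.sqrt(np.average(fa ** 2, weights=weights) *
                  np.average(va ** 2, weights=weights))
    return num / den

def scores_by_arrival_hour(rmse_values, start_hours, lead_hours):
    """Group scores by verification ('arrival') hour = (start + lead) % 24.

    Returns a dict mapping arrival hour (0, 6, 12, 18, ...) to mean RMSE,
    so one can compare e.g. 00Z/12Z arrivals against 06Z/18Z arrivals.
    """
    buckets = {}
    for r, s, l in zip(rmse_values, start_hours, lead_hours):
        buckets.setdefault((s + l) % 24, []).append(r)
    return {h: float(np.mean(v)) for h, v in sorted(buckets.items())}
```

Stratifying the same scores by start hour instead of arrival hour would then separate the physical diurnal signal from the data-ingestion effect described above.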
Furthermore, there is a need to adjust both of these verification scores (against analyses and against observations) so that they give approximately the same result. In other words, verification has to be done against our best estimate of the "truth". An attempt in this direction is made by applying the theory of Desroziers et al. (2005).
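The Desroziers et al. (2005) diagnostics rest on observation-space identities: with background innovation d_ob = y − Hx_b, analysis residual d_oa = y − Hx_a, and analysis increment d_ab = Hx_a − Hx_b, one has E[d_oa d_obᵀ] ≈ R (observation error covariance) and E[d_ab d_obᵀ] ≈ HBHᵀ (background error covariance in observation space) when the analysis gain is near-optimal. A minimal scalar sketch with synthetic data (all numbers invented for illustration, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000                              # number of simulated cases
b_true, r_true = 4.0, 1.0                # background / observation error variances

truth = np.zeros(n)                      # take truth = 0 without loss of generality
x_b = truth + rng.normal(0.0, np.sqrt(b_true), n)   # background
y   = truth + rng.normal(0.0, np.sqrt(r_true), n)   # observations

k = b_true / (b_true + r_true)           # optimal scalar gain
x_a = x_b + k * (y - x_b)                # analysis

d_ob = y - x_b                           # background innovation
d_oa = y - x_a                           # analysis residual
d_ab = x_a - x_b                         # analysis increment (obs space, H = identity)

r_hat = np.mean(d_oa * d_ob)             # Desroziers estimate of R
b_hat = np.mean(d_ab * d_ob)             # Desroziers estimate of HBH^T
```

With the optimal gain, `r_hat` recovers the prescribed observation error variance and `b_hat` the background error variance, which is what makes these diagnostics useful for reconciling verification against analyses with verification against observations.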


Desroziers, G., L. Berre, B. Chapnik, and P. Poli, 2005: Diagnosis of observation, background and analysis error statistics in observation space. Quart. J. Roy. Meteor. Soc., 131, 3385-3396.
