A Non-Parametric Definition of Summary NWP Forecast Assessment Metrics

Hoffman, Ross N.

Modern NWP systems produce a large number of quantitative assessment metrics, such as the forecast anomaly correlation (AC) and the forecast root mean square error (RMSE). Here we propose the empirical cumulative distribution function (ECDF) approach to combine multiple, diverse assessment metrics into summary assessment metrics (SAMs) for analyzing the results of impact experiments and pre-operational implementation testing with NWP models.

The calculation of SAMs involves three steps. 1) Define appropriate subsets. For example, one subset could contain the NHX AC of 2-day 500-hPa height forecasts for all experiments and all initial times. Under H₀, the null hypothesis, all the metrics within a subset are drawn from the same reference distribution. 2) Normalize. Each original, or primary, assessment metric (PAM) is normalized, and the resulting normalized assessment metrics (NAMs) range from 0 (poor) to 1 (excellent). The normalization is different for each subset: ECDF normalization maps each PAM to a value proportional to its rank within the reference sample for that subset. Under H₀, the NAMs are uniformly distributed on [0, 1]. 3) Average. Since the NAMs are comparable, we may average them for each experiment over some or all of the subset dimensions: variables, levels, forecast times, geographic domains, initial times, and metrics (e.g., AC and RMSE). Under H₀, the averages are approximately Gaussian with mean 0.5 and variance 1/(12n), where n is the number of NAMs averaged.
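The three steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the function names `ecdf_normalize` and `summary_metrics`, the record layout, the rank convention NAM = (r - 0.5)/N with tied values sharing their average rank, and the use of each subset itself as its reference sample are all hypothetical choices. Metrics for which smaller is better (such as RMSE) are negated before ranking so that 1 always means excellent.

```python
from collections import defaultdict
from statistics import mean


def ecdf_normalize(values, higher_is_better=True):
    """Map the PAMs of one subset to NAMs in (0, 1).

    Assumed convention (hypothetical, not from the source): NAM = (r - 0.5) / N,
    where r is the rank of a value within its subset (1 = worst, N = best) and
    tied values share their average rank. The NAMs of a subset average 0.5.
    """
    n = len(values)
    # Orient scores so that larger is always better (e.g., negate RMSE).
    scores = [v if higher_is_better else -v for v in values]
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the run of tied scores that starts at sorted position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return [(r - 0.5) / n for r in ranks]


def summary_metrics(records):
    """Compute one SAM per experiment from (experiment, subset, value, higher_is_better).

    Step 1 groups PAMs by subset, step 2 ECDF-normalizes within each subset
    (the subset itself serves as the reference sample), and step 3 averages
    each experiment's NAMs over all subsets.
    """
    subsets = defaultdict(list)
    for exp, subset, value, hib in records:
        subsets[subset].append((exp, value, hib))
    nams = defaultdict(list)
    for members in subsets.values():
        hib = members[0][2]  # assumes one orientation per subset
        normalized = ecdf_normalize([v for _, v, _ in members], hib)
        for (exp, _, _), nam in zip(members, normalized):
            nams[exp].append(nam)
    return {exp: mean(vals) for exp, vals in nams.items()}


# Hypothetical data: two experiments, two initial times each, two subsets.
records = [
    ("control", "NHX AC day-2 Z500", 0.95, True),
    ("control", "NHX AC day-2 Z500", 0.93, True),
    ("denial", "NHX AC day-2 Z500", 0.92, True),
    ("denial", "NHX AC day-2 Z500", 0.90, True),
    ("control", "NHX RMSE day-2 Z500", 20.0, False),
    ("control", "NHX RMSE day-2 Z500", 22.0, False),
    ("denial", "NHX RMSE day-2 Z500", 24.0, False),
    ("denial", "NHX RMSE day-2 Z500", 23.0, False),
]
sams = summary_metrics(records)
print(sams)  # {'control': 0.75, 'denial': 0.25}
```

Because every subset's NAMs average exactly 0.5 under this convention, a SAM above 0.5 means an experiment ranks better than the pooled reference sample on average. Other reference samples (for example, only the control experiment) would need a different `ecdf_normalize` that ranks each value against that sample instead.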

The main advantages of the ECDF approach are that it is amenable to statistical significance testing and produces results that are easy to interpret because the SAMs for various subsets tend to vary smoothly and in a consistent manner. In addition, the ECDF approach can be applied in various contexts thanks to the flexibility allowed in the definition of the reference sample.
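One way to exploit the H₀ distribution for significance testing is a two-sided z-test of a SAM against mean 0.5 and variance 1/(12n). The sketch below is illustrative only; the function name `sam_z_test` is hypothetical, and the test assumes the n averaged NAMs are independent, which the 1/(12n) variance requires. If the NAMs are correlated (for example, across adjacent levels or forecast times), the true variance is larger and this test will overstate significance.

```python
import math


def sam_z_test(sam, n):
    """Two-sided test of H0 for a SAM that averages n NAMs (hypothetical helper).

    Under H0 each NAM is uniform on [0, 1] (mean 0.5, variance 1/12), so,
    assuming the n NAMs are independent, their average is approximately
    Gaussian with mean 0.5 and variance 1/(12 n).
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    sd = math.sqrt(1.0 / (12.0 * n))
    z = (sam - 0.5) / sd
    # Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Example: a SAM of 0.55 averaged over 300 NAMs has standard deviation 1/60 under H0.
z, p = sam_z_test(0.55, 300)
print(f"z = {z:.2f}, p = {p:.4f}")  # z = 3.00, p = 0.0027
```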

Examples of the impact of potential future data gaps are consistent with previously reported results. Interestingly, the impact of observations decreases with increasing forecast time. We interpret this as a masking effect: NWP model errors grow with forecast time until they become the dominant source of forecast error. Additional examples will compare the recent forecast skill of several operational systems.
