The calculation of SAMs involves the following three steps:

1) Define appropriate subsets. For example, one subset could be all experiments and all initial times for the NHX AC of 2-day forecasts of 500 hPa height. Under the null hypothesis H0, all the metrics within a subset are drawn from the same reference distribution.

2) Normalize. Each original or primary assessment metric (PAM) is normalized, and the resulting normalized assessment metrics (NAMs) range from 0 (poor) to 1 (excellent). The normalization is different for each subset: ECDF normalization assigns each PAM a value proportional to its rank within the subset of the reference sample. Under H0, the NAMs are uniformly distributed on [0,1].

3) Average. Since the NAMs are comparable, we may average them for each experiment over some or all of the subset dimensions: variables, levels, forecast times, geographic domains, initial times, and metrics (e.g., AC and RMSE). Under H0, the averages are approximately Gaussian with mean 0.5 and variance 1/(12n), where n is the number of NAMs averaged.
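The normalization and averaging steps above can be sketched as follows; this is a minimal illustration, not the authors' implementation, and the function names `ecdf_normalize` and `sam_z` are hypothetical:

```python
from bisect import bisect_right
from math import sqrt

def ecdf_normalize(pams, reference):
    """Map each primary assessment metric (PAM) to [0, 1] by its rank
    in the sorted reference sample (ECDF normalization).
    Under H0, the resulting NAMs are uniform on [0, 1]."""
    ref = sorted(reference)
    n_ref = len(ref)
    return [bisect_right(ref, p) / n_ref for p in pams]

def sam_z(nams):
    """Average NAMs into a summary assessment metric (SAM) and compute
    a z-score under H0: the mean of n uniform [0, 1] NAMs is
    approximately Gaussian with mean 0.5 and variance 1/(12 n)."""
    n = len(nams)
    sam = sum(nams) / n
    z = (sam - 0.5) / sqrt(1.0 / (12 * n))
    return sam, z

# Example: a PAM equal to the median of a 4-member reference sample
# normalizes to rank 2 out of 4, i.e. a NAM of 0.5.
nams = ecdf_normalize([2.0], [1.0, 2.0, 3.0, 4.0])
sam, z = sam_z([0.5] * 12)  # NAMs exactly at 0.5 give z = 0
```

A SAM whose |z| exceeds the usual Gaussian critical value (e.g. 1.96 for a two-sided 5% test) would then indicate a departure from H0 for that experiment.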
The main advantages of the ECDF approach are that it is amenable to statistical significance testing and produces results that are easy to interpret because the SAMs for various subsets tend to vary smoothly and in a consistent manner. In addition, the ECDF approach can be applied in various contexts thanks to the flexibility allowed in the definition of the reference sample.
Examples of the impact of potential future data gaps are consistent with previously reported results. An interesting finding is that the impact of observations decreases with increasing forecast time; we interpret this as a masking effect, in which growing NWP model errors become the dominant source of forecast error. Additional examples will be shown comparing the recent forecast skill of several operational systems.