5.3 Assessing the Use of Prior Skill for Merging NMME Ensembles

Tuesday, 30 January 2024: 9:00 AM
302/303 (The Baltimore Convention Center)
William D. Scheftic, The Univ. of Arizona, Tucson, AZ; and X. Zeng, M. A. Brunke, M. J. DeFlorio, A. Ouyed, and E. Sanden

Significant improvement has been made in numerical modeling of subseasonal-to-seasonal (S2S) climate over the past few decades. However, models still contain inherent biases and systematic errors. Consequently, forecasting centers around the world have produced long reforecast datasets for their operational models to remove systematic errors, and multi-model efforts such as the North American Multi-Model Ensemble (NMME) and the Copernicus Climate Change Service (C3S) provide common output that allows model forecasts to be compared and merged into a final super-ensemble forecast.

In this study, we explore experimental ensemble seasonal forecasts of precipitation (P) and 2-m temperature (T2m) over hydrologically defined regions of the western U.S. Each of the six NMME models selected for the study is post-processed by reducing bias in the mean and variance through quantile matching and then correcting the ensemble spread with a simple spread-error adjustment. We then test several methods of generating weights, primarily from cross-validated skill metrics, to merge the model output and compare them with weighting each model equally (i.e., equal weighting). We ask the following questions: Do prior-skill weighting schemes outperform equal weighting when merging NMME models? Can weighting schemes be improved by using different metrics or aggregation periods? Are there specific conditions under which weighting schemes perform better or worse than equal weighting?
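As a rough illustration of the post-processing steps described above, the sketch below applies quantile matching against an observed climatology and a simple multiplicative spread adjustment about the ensemble mean. This is a minimal Python example using synthetic data; the function names, the inflation factor, and the NumPy-based implementation are illustrative assumptions, not the authors' code.

import numpy as np

def quantile_match(fcst, model_clim, obs_clim):
    """Map forecast values onto the observed climatology via their empirical
    quantiles in the model climatology (corrects bias in mean and variance)."""
    model_sorted = np.sort(model_clim)
    # Empirical quantile of each forecast value within the model climatology
    q = np.searchsorted(model_sorted, fcst, side="right") / model_sorted.size
    q = np.clip(q, 1e-6, 1 - 1e-6)
    # Look up the same quantiles in the observed climatology
    return np.quantile(obs_clim, q)

def spread_adjust(ens, inflation):
    """Scale the ensemble spread about the ensemble mean by a factor estimated
    offline (e.g., from the ratio of RMSE to mean ensemble spread)."""
    mean = ens.mean()
    return mean + inflation * (ens - mean)

# Synthetic example: a biased, over-dispersive 20-member forecast
rng = np.random.default_rng(0)
model_clim = rng.normal(2.0, 1.5, 600)   # hindcast climatology
obs_clim = rng.normal(0.0, 1.0, 600)     # observed climatology
raw_ens = rng.normal(2.5, 1.5, 20)       # one raw ensemble forecast
corrected = spread_adjust(quantile_match(raw_ens, model_clim, obs_clim), inflation=1.1)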

Deterministic and probabilistic metrics are used to compare the merged ensemble forecasts against equal weighting. Our results suggest that merging the NMME models, even with equal weighting, yields more skillful and reliable seasonal forecasts of P and T2m than any individual model ensemble. Across the western U.S., however, the weighting schemes provide little benefit over equal weighting. When skill scores such as the ranked probability skill score are used as prior weights, adding an offset to the skill score improved the performance of the merged forecasts, suggesting that even models with negative skill during the training period still contain valuable information about the forecast. Random forest and multiple linear regression did not perform as well as the weighting schemes tested, but may still be useful for merging under different forecasting paradigms.
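For concreteness, the following sketch shows one way prior-skill weights with an offset might be formed and used to merge probabilistic forecasts, in the spirit of the offset result above. It is a minimal Python example; the offset value, function name, and placeholder probabilities are illustrative assumptions rather than the scheme evaluated in the study.

import numpy as np

def skill_weights(rpss, offset=0.1):
    """Convert cross-validated skill scores (e.g., RPSS) into merging weights.
    The offset keeps models with modestly negative training-period skill from
    being zeroed out; the value 0.1 is purely illustrative."""
    w = np.maximum(np.asarray(rpss, dtype=float) + offset, 0.0)
    if w.sum() == 0.0:
        return np.full(w.size, 1.0 / w.size)  # fall back to equal weighting
    return w / w.sum()

# Made-up cross-validated RPSS values for six models in one region/season
rpss = [0.12, 0.05, -0.03, 0.08, 0.00, -0.10]
weights = skill_weights(rpss)

# Merge per-model tercile probabilities (placeholder values) with the weights
model_probs = np.array([[0.40, 0.33, 0.27]] * 6)
merged = np.average(model_probs, axis=0, weights=weights)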
