Quantitative evaluations of seasonal forecasts demonstrate that simple approaches, using hit-or-miss criteria (e.g., probability of detection, false alarm rate) or traditional summary statistics (e.g., root mean squared error, correlation), neglect important aspects of forecast performance and can even be misleading, potentially affecting resource management decisions. Further, some evaluation approaches are clearly inappropriate for probabilistic predictions, particularly the application of traditional metrics of model performance, designed for deterministic predictions of continuous variables, to the mean or median of an ensemble. Distributions-oriented evaluation criteria are more informative and allow decision makers to target the aspects of forecast performance that matter for their situation. For example, using the criterion of discrimination, predictions of seasonal streamflow volumes for Colorado River tributaries are shown to convey usable information not revealed by other criteria, even at lead times of several months. While user-centric metrics may not constitute the standards for hydrologic model evaluation, they do represent best practices for use with real-world decision makers. From a decision maker's perspective, it is important that forecast evaluations be updated frequently and target the locations, time periods, lead times, and criteria important to specific decision-making situations. From an operational perspective, more information needs to be archived than has been traditional practice, especially for probabilistic predictions.
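To make the contrast concrete, the following is a minimal sketch, using synthetic ensemble forecasts and observations (all data, thresholds, and variable names here are illustrative assumptions, not the verification data or code underlying the results above). It computes hit-or-miss criteria from a deterministic reduction of the ensemble, traditional summary statistics applied to the ensemble mean, and a simple distributions-oriented view of discrimination: the forecast-probability distributions conditioned on whether the event actually occurred.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic illustration only: 30 years of seasonal volume forecasts,
# each a 50-member ensemble, plus the observed volumes.
n_years, n_members = 30, 50
obs = rng.gamma(shape=4.0, scale=250.0, size=n_years)                # observed seasonal volumes
ens = obs[:, None] * rng.lognormal(0.0, 0.3, (n_years, n_members))   # ensemble forecasts

threshold = np.percentile(obs, 33)          # "dry" event: volume below the lower tercile
event_obs = obs < threshold                 # did the event occur?
prob_fcst = (ens < threshold).mean(axis=1)  # forecast probability of the event

# 1. Hit-or-miss criteria from a deterministic reduction (ensemble median)
det_fcst = np.median(ens, axis=1) < threshold
hits = np.sum(det_fcst & event_obs)
misses = np.sum(~det_fcst & event_obs)
false_alarms = np.sum(det_fcst & ~event_obs)
pod = hits / (hits + misses)
far = false_alarms / max(hits + false_alarms, 1)

# 2. Traditional summary statistics applied to the ensemble mean
ens_mean = ens.mean(axis=1)
rmse = np.sqrt(np.mean((ens_mean - obs) ** 2))
corr = np.corrcoef(ens_mean, obs)[0, 1]

# 3. Distributions-oriented discrimination: compare the forecast-probability
#    distributions conditioned on the observed outcome. Well-separated
#    conditional distributions indicate usable information even when
#    POD/FAR or RMSE of the ensemble mean look unimpressive.
p_given_event = prob_fcst[event_obs]
p_given_no_event = prob_fcst[~event_obs]

print(f"POD={pod:.2f}  FAR={far:.2f}  RMSE={rmse:.0f}  r={corr:.2f}")
print(f"mean forecast prob | event occurred     : {p_given_event.mean():.2f}")
print(f"mean forecast prob | event did not occur: {p_given_no_event.mean():.2f}")
```

In practice the conditional forecast distributions would be examined in full (e.g., as discrimination diagrams or relative operating characteristic curves) rather than reduced to conditional means, but the sketch shows how discrimination asks a different question than hit-or-miss or error statistics.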
User-centric tools for self-directed learning, customized evaluation of predictions, placement of predictions in context with supporting information, and use of best practices in communicating uncertainty offer a practical pathway for meeting the needs of diverse decision makers in assessing whether hydrologic forecasts, or any predictions, are "good" enough.