11.3 Verification of Probabilistic Temperature Forecasts in the NWS Graphical Forecast Editor (GFE)

Wednesday, 31 January 2024: 2:15 PM
302/303 (The Baltimore Convention Center)
Leslie R. Colin, NWS, Boise, ID

The National Weather Service continues to move toward a probabilistic forecasting framework, with an ever-increasing demand for accurate probabilistic forecast guidance. One form of this guidance is the National Blend of Models (NBM), which includes a full probabilistic distribution of forecast weather elements. For example, the 90th percentile of NBM members (NBM90Pct) for the forecast high temperature indicates a 10 percent chance that the observed maximum temperature (MaxT) will be warmer, or, equivalently, a 90 percent chance that NBM90Pct will be warmer than the observed MaxT.

But how accurate is the forecast guidance received by NWS field offices? Is the NBM90Pct actually warmer than the observed MaxT 90 percent of the time over many forecasts, or warmer than the observed MaxT at 90 percent of the grid points in a single forecast?

In many cases, model guidance will not be perfectly calibrated against observations. For example, a given NBM90Pct forecast may be warmer than observed only 84 percent of the time rather than 90 percent. We can record that departure (here, minus 6 percentage points).
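This calibration check reduces to counting how often the percentile forecast exceeds the observation and comparing that frequency with the nominal percentile. A minimal sketch, using hypothetical forecast/observation pairs (the values are illustrative, not from the study):

```python
# Hypothetical verification pairs: (90th-percentile MaxT forecast, observed MaxT), in deg F
pairs = [(78.0, 74.0), (81.0, 83.0), (69.0, 66.0), (72.0, 70.0), (75.0, 76.0),
         (80.0, 77.0), (68.0, 64.0), (73.0, 71.0), (77.0, 78.0), (70.0, 67.0)]

nominal = 0.90  # the exceedance frequency the guidance claims to represent

# Count forecasts that were warmer than the observation.
hits = sum(1 for fcst, obs in pairs if fcst > obs)
observed_freq = hits / len(pairs)

# The departure is the calibration error we record for later comparison.
departure = observed_freq - nominal
print(f"exceedance frequency: {observed_freq:.2f}, departure: {departure:+.2f}")
```

Repeating this over many verification days yields the set of departures that the abstract later compares across guidance sources.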

As an alternative, a 90Pct (i.e., 90th percentile) gridded forecast can be constructed from the errors made by similar forecasts in the past (called analogs), under the assumption that the current forecast will make the same errors as the analogs. Those errors usually have a non-zero bias; after subtracting the bias, the remaining errors should be approximately normally distributed, and their standard deviation can be calculated. The standard deviation is then used in the error function (erf) to produce the 90Pct current forecast, which can in turn be verified to see how close its exceedance frequency is to 90 percent. As with the NBM90Pct, we can record the departure.
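The analog construction above can be sketched as follows, using hypothetical analog errors and an assumed deterministic forecast value (none of these numbers come from the study). `NormalDist.inv_cdf` evaluates the inverse normal cumulative distribution, which is the same computation the abstract describes via erf:

```python
from statistics import NormalDist, mean, stdev

# Hypothetical errors (forecast minus observed, deg F) from past analog forecasts
analog_errors = [2.1, -0.5, 1.3, 0.8, -1.2, 2.7, 0.4, 1.9, -0.3, 1.0]

bias = mean(analog_errors)                     # non-zero bias of the analog errors
residuals = [e - bias for e in analog_errors]  # assumed approximately normal
sigma = stdev(residuals)                       # spread of the bias-corrected errors

current_forecast = 75.0  # assumed deterministic MaxT forecast for today, deg F

# If observed = forecast - error, the observed MaxT is modeled as
# Normal(forecast - bias, sigma); its 90th percentile is the 90Pct forecast.
fcst_90pct = NormalDist(mu=current_forecast - bias, sigma=sigma).inv_cdf(0.90)
print(f"90th-percentile MaxT forecast: {fcst_90pct:.1f} F")
```

The resulting 90Pct value is then verified against observations exactly as the NBM90Pct is, and its departure recorded.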

After repeating this process over many days, the two sets of departures are compared. The set with the smaller departures identifies the better-calibrated forecast, while the better forecast overall is the one whose departures have the smaller mean bias and standard deviation. Note that while there are many different intervals that can contain, say, 90% of the observed values, the optimum interval is the smallest one, namely the interval located at the center of the normal distribution.
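The claim that the centered interval is the narrowest one containing a given probability mass can be checked numerically. For a standard normal distribution, any interval from the a-quantile to the (a + 0.90)-quantile contains 90% of the mass; sweeping a shows the width is minimized when the interval is centered (a = 0.05):

```python
from statistics import NormalDist

nd = NormalDist()  # standard normal distribution
coverage = 0.90

# Sweep the lower tail probability a over 0.01..0.09 and compare interval widths.
best_a, best_width = None, float("inf")
for i in range(1, 10):
    a = i / 100.0
    width = nd.inv_cdf(a + coverage) - nd.inv_cdf(a)
    if width < best_width:
        best_a, best_width = a, width

print(f"narrowest 90% interval starts at a = {best_a:.2f}")
```

By the symmetry and unimodality of the normal density, the centered interval (equal 5% tails) is the one with the smallest width, and hence the smallest implied standard deviation for a given coverage.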

This article compares analog-based probabilistic forecasts to NBM probabilistic forecasts at various probability thresholds, for forecast periods out to seven days. Results are shown for the official NWS forecasts produced by WFO Boise, ID, for NBM 4.1 forecasts, and for an analog-derived NBM. Additionally, a systematic adjustment using the analog approach is shown to improve the accuracy of the NBM probabilistic forecast envelope for specific elements (e.g., MaxT). For example, from 01 Sep 2022 to 09 Aug 2023, for MaxT at the 50th percentile, NBM 4.1 was warmer than observed 54.99% of the time, whereas the analog-derived NBM (uncorrected for bias) was warmer than observed 52.06% of the time, with consistent improvement at every forecast period through seven days.

The goals of this approach are to precisely calibrate probabilistic temperature forecast thresholds and ranges, remove bias, and systematically reduce the standard deviation of the errors.

The attached figure shows increasing probabilistic range (i.e., decreasing forecast accuracy) from day 1 (left) to day 7 (right), where zero bias is the thick horizontal black line, and the vertical scale is degrees F.
