20th Conference on Weather Analysis and Forecasting/16th Conference on Numerical Weather Prediction


Evaluation of the National Marine Verification Program at WFO Key West

Matt C. Parke, NOAA/NWSFO, Key West, FL; and A. Devanas

The need for forecast verification has been well documented over the last several decades. Numerous measures have been developed to help forecast offices assess the strengths and weaknesses of their marine forecasts. These range from measures of forecast accuracy (RMS error, mean absolute error) to measures of skill relative to a standard set of forecasts (e.g., the equitable skill score). Around 1982, a National Marine Verification Program was instituted to collect statistics on the overall performance of Weather Service Forecast Office (WFO) marine forecasts and of model (AVN/GFS, ETA, NGM) guidance, in order to aid in evaluating the performance of NWS Marine Services. These statistics are available for download at the National Weather Service Verification website.
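
As an illustration of the two accuracy measures named above, the following minimal Python sketch computes mean absolute error and RMS error for paired forecast and observed values. The function names and the sample wind speeds (in knots) are hypothetical and are not taken from the verification program's data or software.

```python
import math


def mean_absolute_error(forecasts, observations):
    """Average magnitude of the forecast errors."""
    if not forecasts or len(forecasts) != len(observations):
        raise ValueError("need two non-empty sequences of equal length")
    errors = [f - o for f, o in zip(forecasts, observations)]
    return sum(abs(e) for e in errors) / len(errors)


def rms_error(forecasts, observations):
    """Root-mean-square error; penalizes large misses more than MAE does."""
    if not forecasts or len(forecasts) != len(observations):
        raise ValueError("need two non-empty sequences of equal length")
    errors = [f - o for f, o in zip(forecasts, observations)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical wind speeds in knots, for illustration only.
forecast_kt = [15, 15, 20, 10]
observed_kt = [12, 18, 20, 14]

print(mean_absolute_error(forecast_kt, observed_kt))
print(rms_error(forecast_kt, observed_kt))
```

Both measures describe accuracy only. Neither compares the forecast against a reference such as climatology, which is the role of a skill score.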

Using this information, it was found that the equitable skill score (ESS) used by the National Verification Program is overly sensitive to improbable events. In fact, the ESS rewards forecasts that differ markedly from climatology. In a tropical marine environment, the ESS lacks the resistance needed to provide a stable, long-term measure from which to judge trends in WFO forecast accuracy. The ESS, which was designed to prevent forecaster “gaming”, does not adequately address the slight, but operationally relevant, changes in a tropical wind field. This is especially true at WFO Key West, where easterly trade winds dominate the marine environment for the great majority of the year. In addition, the number of verification sites available was found to be too small relative to the size of the marine zones to provide a meaningful measure of forecast accuracy, so care must be taken when extrapolating general results from the available data. Therefore, the ESS cannot be considered a viable method of measuring forecast skill. Because the observation network used for verification is sparse, other verification methods need to be investigated that emphasize the synoptic flow and do not heavily weight transitory mesoscale events.
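
The sensitivity to improbable events can be illustrated with a small Python sketch. The abstract does not give the formulation of the ESS used by the program, so the sketch assumes the simplest equitable score: a two-category ("common" versus "rare") scoring table whose weights depend on the observed frequency of the rare category, which in this case is equivalent to the Peirce (Hanssen-Kuipers) skill score. The function names and the counts are hypothetical.

```python
def two_category_equitable_weights(p_rare):
    """Scoring weights w[forecast][observed], with 0 = common and 1 = rare.

    Assumed form: a constant forecast of either category scores 0 on
    average and a perfect set of forecasts scores 1.
    """
    if not 0.0 < p_rare < 1.0:
        raise ValueError("p_rare must lie strictly between 0 and 1")
    p_common = 1.0 - p_rare
    return [
        [p_rare / p_common, -1.0],  # forecast common: small reward if correct
        [-1.0, p_common / p_rare],  # forecast rare: large reward if correct
    ]


def equitable_score(counts):
    """Score a 2x2 table where counts[f][o] is the number of cases with
    forecast category f and observed category o.

    Raises ValueError if the rare category was never or always observed.
    """
    total = sum(sum(row) for row in counts)
    p_rare = (counts[0][1] + counts[1][1]) / total
    weights = two_category_equitable_weights(p_rare)
    weighted = sum(
        weights[f][o] * counts[f][o] for f in range(2) for o in range(2)
    )
    return weighted / total


# Hypothetical sample: the rare category is observed in 5 of 100 cases.
always_common = [[95, 5], [0, 0]]  # never forecasts the rare category
one_rare_hit = [[95, 4], [0, 1]]   # same, but one rare event is caught

print(equitable_score(always_common))
print(equitable_score(one_rare_hit))
```

Under these assumptions, a correct forecast of a category observed 5 percent of the time is weighted 19, while a correct forecast of the common category is weighted about 0.05. A single rare-event hit therefore moves the score far more than many correct forecasts of the prevailing regime, which is consistent with the instability described above for a trade-wind climate.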

