Customized Verification of High-resolution WRF-ARW Forecasts using the Model Evaluation Tools (MET) Package

Tuesday, 6 January 2015
James P. Cipriani, IBM Research, Yorktown Heights, NY; and L. A. Treinish and A. Praino

Forecast verification is a key component of both research and operational weather modeling. Generating and understanding a wide range of skill scores over a forecast database can help to identify potential biases and outliers, allow for fine-tuning of the model configuration, and build user confidence. For continuous variables (such as temperature, dew point, and wind speed), typical scores include mean absolute error (MAE), root mean squared error (RMSE), and mean error (ME, also known as additive bias), which are based on direct (point-to-point) comparisons of forecast vs. observed values. For categorical variables (such as accumulated precipitation), scores can include critical success index (CSI, also known as threat score), probability of detection (POD), and accuracy (ACC), and are based on a contingency-table analysis (hits, misses, false alarms, and correct negatives). Given the strengths and weaknesses of each metric, it is often desirable (and necessary) to utilize multiple scores in order to assess the overall quality of the forecasts.
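The scores named above follow directly from their standard definitions. As an illustration (this is a minimal sketch, not code from the MET package; the function names and sample data are hypothetical), the continuous scores can be computed from paired forecast/observation values, and the categorical scores from a 2x2 contingency table built by thresholding both series:

```python
def continuous_scores(fcst, obs):
    """MAE, RMSE, and ME (additive bias) from paired forecast/observed values."""
    n = len(fcst)
    errors = [f - o for f, o in zip(fcst, obs)]
    me = sum(errors) / n                          # mean error (bias)
    mae = sum(abs(e) for e in errors) / n         # mean absolute error
    rmse = (sum(e * e for e in errors) / n) ** 0.5
    return mae, rmse, me

def categorical_scores(fcst, obs, threshold):
    """CSI, POD, and ACC from a contingency table at a given event threshold."""
    hits = misses = false_alarms = correct_negs = 0
    for f, o in zip(fcst, obs):
        f_yes, o_yes = f >= threshold, o >= threshold
        if f_yes and o_yes:
            hits += 1
        elif o_yes:
            misses += 1
        elif f_yes:
            false_alarms += 1
        else:
            correct_negs += 1
    csi = hits / (hits + misses + false_alarms)   # critical success index
    pod = hits / (hits + misses)                  # probability of detection
    acc = (hits + correct_negs) / (hits + misses + false_alarms + correct_negs)
    return csi, pod, acc
```

Because each score summarizes a different aspect of the error distribution (ME can be near zero while RMSE is large; CSI ignores correct negatives while ACC rewards them), computing several at once, as above, supports the point that no single metric suffices.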

IBM has developed a high temporal and spatial resolution weather forecasting capability, known as Deep Thunder, which is customized for particular geographies and client requirements. Typical horizontal resolution is 1-2 km, with lead times out to 72 hours. Current deployments include the Detroit and New York metropolitan areas, the southeastern and northeastern U.S., the city of Rio de Janeiro, and the country of Brunei. Validation is often performed according to client specifications and is based on comparisons against surface observations, which include both point (weather observing system) and gridded data.

The NCAR Developmental Testbed Center (DTC) has developed the Model Evaluation Tools (MET) verification package, which is highly configurable and operates on post-processed output from WRF-ARW (and can be applied to other model output as well). It includes standard scores for point-based (model grid to point), grid-point to grid-point, spatial, ensemble, and probabilistic verification.

To better validate the operational Deep Thunder forecasts, the METv4.0 package has been implemented and utilized for both point-based and grid-point to grid-point comparisons. We will discuss aspects of the verification process, data and customization, some results thus far (for specific geographies), challenges, and future work.