Customized Verification Applied to High-Resolution WRF-ARW Forecasts for Rio de Janeiro

Thursday, 6 February 2014: 8:45 AM
Room C205 (The Georgia World Congress Center)
James P. Cipriani, IBM Thomas J. Watson Research Center, Yorktown Heights, NY; and L. A. Treinish, A. P. Praino, R. Cerqueira, M. N. Santos, V. C. Segura, I. C. Oliveira, K. Mantripragada, and P. Jourdan

Verification is an integral part of weather modeling. A continuous database of statistics based on operational forecasts is particularly useful for (1) analyzing a current model configuration, (2) fine-tuning a configuration for improved future deployments, and (3) building confidence in the model's ability to predict various scenarios. In the meteorological community, verification often consists of standard scores based on point observations, covering both (a) continuous variables (e.g., 2-meter temperature and dew point, 10-meter wind speed) and (b) categorical variables (e.g., precipitation, visibility, ceiling). Typical continuous metrics include mean absolute error (MAE), root mean squared error (RMSE), and mean error (ME, i.e., additive bias), all derived from a direct comparison of each forecast value with the corresponding observed value. Categorical metrics, in contrast, include the critical success index (CSI, also known as the threat score), Heidke skill score (HSS), probability of detection (POD), and accuracy (ACC), which are derived from a contingency table of hits, misses, false alarms, and correct negatives. Since each score has its own strengths and weaknesses, a comprehensive assessment of a forecast model benefits from using several of them.
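The standard scores named above are simple to compute. The following is a minimal Python sketch, not the MET implementation, assuming paired forecast/observation values for the continuous scores and the four counts of a 2x2 contingency table for the categorical ones; the function names are illustrative only.

```python
import math


def continuous_scores(forecasts, observations):
    """MAE, RMSE and mean error (additive bias) for paired values."""
    if len(forecasts) != len(observations) or not forecasts:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [f - o for f, o in zip(forecasts, observations)]
    n = len(errors)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "ME": sum(errors) / n,  # > 0 means the forecast is too high on average
    }


def categorical_scores(hits, misses, false_alarms, correct_negatives):
    """CSI, POD, ACC and HSS from the four cells of a 2x2 contingency table."""
    a, b, c, d = hits, false_alarms, misses, correct_negatives
    n = a + b + c + d
    nan = float("nan")
    csi = a / (a + b + c) if a + b + c else nan
    pod = a / (a + c) if a + c else nan
    acc = (a + d) / n if n else nan
    # Number of correct forecasts expected by chance, from the table marginals
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n if n else nan
    hss = (a + d - expected) / (n - expected) if n and n != expected else nan
    return {"CSI": csi, "POD": pod, "ACC": acc, "HSS": hss}
```

For example, `categorical_scores(30, 10, 20, 140)` gives CSI 0.5, POD 0.75, ACC 0.85 and HSS of about 0.571; the high ACC relative to CSI shows how correct negatives can inflate accuracy for rare events, one reason to report several scores.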

IBM has a state-of-the-art weather forecasting capability with high temporal and spatial resolution, known as Deep Thunder, which is customized for a particular geography and for client requirements (i.e., local weather sensitivity). For the City Government of Rio de Janeiro, the primary concerns are flash flooding and mudslides, given the steep topography (i.e., high aspect ratios) throughout the city, the subtropical environment, and coastal influences (land and sea breezes). Since May 2011, Deep Thunder has been running in a production environment, generating 48-hour forecasts for the metropolitan area at 1-km horizontal resolution, updated twice daily and tailored to the city's needs. Because of (1) the customized configuration, (2) the city's operational procedures for managing significant weather impacts, and (3) the focus on precipitation, traditional verification approaches are not sufficient, and further customization is needed.

As an alternative, IBM Research has worked with the City Government of Rio de Janeiro to define appropriate metrics for validating the model-based forecasts against accumulated precipitation at 46 observing sites, where applicable: 33 from AlertaRio (city-operated), eight from INEA (state-operated), and five from INMET (national). Several metrics have been developed as part of this effort since late December 2011. Their implementation uses version 4.0 of the Model Evaluation Tools (MET) package, developed by the Developmental Testbed Center (DTC), for the basic statistical calculations, combined with custom pre- and post-processing to generate the required data. For example, from early January 2012 through mid-April 2013, we employed a methodology based on six-hour accumulated precipitation with a certain level of tolerance between forecast and observed conditions, which was used to generate weekly reports of an accuracy statistic. More recently (May 2013 through the present), we identified the need for a metric more representative of the weather impacts on the city and therefore developed an alternative approach: it compares the maximum hourly accumulated precipitation within each three-hour forecast period, aggregated over each entire 48-hour forecast. The precipitation values are divided into seven categories (<=1, 1-5, 5-15, 15-25, 25-50, 50-75, and >75 mm in one hour) to create a multi-category contingency table. The elements of the table are weighted to focus the analysis and to produce an average CSI score. Specifically, the weighting gives higher weights to stronger events (i.e., “weak” rain is not as important as “very strong” rain). The individual score for each category, computed from that category's hits, misses, and false alarms, ranges between zero and one, and these scores are aggregated to produce an overall score.
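The data preparation described above (maximum hourly accumulation per three-hour period, seven intensity categories, and a multi-category contingency table) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the operational MET-based code: each site is assumed to supply 48 hourly accumulations (mm) for both forecast and observation; a value exactly on a category boundary is assumed to fall into the lower category (the bins "1-5" and "5-15" do not say which side is closed); and counts from all sites are simply summed. All names here are hypothetical.

```python
from bisect import bisect_left

# Assumed upper bounds (mm in one hour) of the first six categories;
# anything above 75 mm falls into the seventh (> 75 mm) category.
CATEGORY_BOUNDS = [1.0, 5.0, 15.0, 25.0, 50.0, 75.0]
N_CATEGORIES = len(CATEGORY_BOUNDS) + 1  # 7


def max_hourly_per_period(hourly_mm, period_hours=3):
    """Maximum hourly accumulation within each consecutive period."""
    if len(hourly_mm) % period_hours:
        raise ValueError("series length must be a multiple of the period")
    return [max(hourly_mm[i:i + period_hours])
            for i in range(0, len(hourly_mm), period_hours)]


def categorize(rain_mm):
    """Map an hourly accumulation to a category index 0..6.

    bisect_left places a value equal to a bound in the lower category,
    so 1.0 -> "<=1" and 5.0 -> "1-5" (an assumed convention).
    """
    return bisect_left(CATEGORY_BOUNDS, rain_mm)


def contingency_table(site_series, period_hours=3):
    """Multi-category table summed over sites.

    site_series: iterable of (forecast_hourly, observed_hourly) pairs,
    one pair per site. Rows = forecast category, columns = observed.
    """
    table = [[0] * N_CATEGORIES for _ in range(N_CATEGORIES)]
    for forecast_hourly, observed_hourly in site_series:
        fc = max_hourly_per_period(forecast_hourly, period_hours)
        ob = max_hourly_per_period(observed_hourly, period_hours)
        for f, o in zip(fc, ob):
            table[categorize(f)][categorize(o)] += 1
    return table
```

A 48-hour forecast yields 16 three-hour periods per site, so each site contributes 16 entries to the table. Summing over sites treats every site/period pair equally; weighting or stratifying by site would be a separate design choice.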
We have experimented with different types of weighting among the categories (including linear and quadratic). In addition, we have developed customizable support for tolerances between rain intensities, so that an error between the forecast and observed values may be treated non-uniformly, depending on the difference between the two.
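The exact weighted-CSI formula is not given here, so the following Python sketch shows one plausible construction, with every choice an assumption: per-category CSI is computed from a square multi-category table (rows = forecast category, columns = observed category) and averaged with weights that grow linearly (k + 1) or quadratically ((k + 1)^2) with category index k. The tolerance is modeled as a hypothetical `credit` mapping from the category difference to partial credit between 0 and 1. The functions `category_weights` and `weighted_csi` are illustrative names, not part of MET.

```python
def category_weights(n_categories=7, scheme="linear"):
    """Hypothetical weighting: stronger categories receive larger weights."""
    if scheme == "uniform":
        return [1.0] * n_categories
    if scheme == "linear":
        return [float(k + 1) for k in range(n_categories)]
    if scheme == "quadratic":
        return [float((k + 1) ** 2) for k in range(n_categories)]
    raise ValueError(f"unknown weighting scheme: {scheme!r}")


def weighted_csi(table, weights, credit=None):
    """Weighted mean of per-category CSI from a square table
    (rows = forecast category, columns = observed category).

    `credit` maps |forecast - observed| (in categories) to a value c in
    [0, 1]. A pair earning credit c adds c to the hits of the observed
    category, (1 - c) to that category's misses, and (1 - c) to the
    forecast category's false alarms. The default {0: 1.0} gives
    ordinary exact-match CSI.
    """
    credit = {0: 1.0} if credit is None else credit
    n = len(table)
    hits, misses, false_alarms = [0.0] * n, [0.0] * n, [0.0] * n
    for f in range(n):
        for o in range(n):
            count = table[f][o]
            if count == 0:
                continue
            c = credit.get(abs(f - o), 0.0)
            hits[o] += c * count
            misses[o] += (1.0 - c) * count
            false_alarms[f] += (1.0 - c) * count
    numerator = denominator = 0.0
    for k in range(n):
        total = hits[k] + misses[k] + false_alarms[k]
        if total == 0:
            continue  # never forecast or observed: CSI undefined, so skip it
        numerator += weights[k] * hits[k] / total
        denominator += weights[k]
    return numerator / denominator if denominator else float("nan")
```

In this sketch a category that is never forecast or observed is skipped rather than scored as zero; whether to skip or penalize such categories is itself a choice that changes the overall score. Passing, for example, `credit={0: 1.0, 1: 0.5}` gives half credit to forecasts one category away from the observation.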

It is important to note that the forecasts are compared against point data and that, unlike the model grid, the observing sites are not uniformly distributed across Rio de Janeiro. The sites are concentrated mainly in the eastern portion of the city, which makes it difficult to assess forecast quality in areas with sparse coverage. In addition, most of the sites do not report temperature, humidity, pressure, or wind speed and direction. As a result, more comprehensive verification is not feasible, and the verification results pertain only to accumulated precipitation. Spatial verification techniques are also a challenge, given the lack of high-resolution gridded precipitation analyses for direct comparison.

We will discuss aspects of the verification process, including the use of the METv4.0 package and our customized approach, as well as give a brief overview of operational considerations, some of the results thus far, challenges, and future work.