Post-processing of Model data with Machine-Learning Techniques for Operational Precipitation Forecast

Tuesday, 6 January 2015: 11:30 AM
124B (Phoenix Convention Center - West and North Buildings)
Igor Oliveira, IBM Research, Rio de Janeiro, RJ, Brazil; and C. N. dos Santos

Weather forecasting is one of the key challenges that environmental research has been trying to solve since the earliest attempts to understand the planet's atmosphere. Numerical Weather Prediction (NWP) techniques have long been applied successfully and are established as the main tool for weather forecasting. NWP models have evolved from very simple models to fully coupled physical models that can simulate multi-scale processes in the atmosphere, ocean, biosphere, cryosphere, etc. Despite this evolution, gaps in accuracy remain. Machine-Learning (ML) techniques can be an effective way to improve model performance, especially in forecasting highly nonlinear variables such as precipitation.

This work applies ML techniques to different types of model output data in order to forecast precipitation at short-term operational scales (typically less than 24 hours). At these scales, many weather-sensitive operations can be heavily impacted by large amounts of rainfall, and an accurate forecast is essential for effective management. Combining NWP and ML aims to draw on the strengths of both techniques: NWP model analyses provide large amounts of data, which are used for the supervised learning of an artificial neural network (ANN). The model data are a source of nonlinear atmospheric patterns that can be useful for effectively training ANNs for precipitation forecasting. Other observation sources can also be ingested to make these nonlinear patterns more realistic wherever good-quality data are available from surface networks, satellite or radar. In this work, we apply the proposed method to forecast the occurrence of severe precipitation events in the region of Rio de Janeiro.

The model data used for training the ANN come from two sources: the ECMWF operational model analysis and the NCEP Final Analysis (FNL) data. Both datasets provide near-real-time atmospheric data, which allows the proposed method to be applied in an operational environment. The ECMWF dataset has a higher resolution (0.125° x 0.125°) than the FNL dataset (1° x 1°). At the present stage, features from the input dataset are selected manually by choosing variables that can explain precipitation formation and propagation processes. Surface and upper-air data for the following variables are selected as features: sea level pressure, cloud water, boundary layer height, u and v wind components, vertical motion, air temperature, dewpoint temperature, precipitable water, vapor mixing ratio and soil water volume. In order to capture the atmospheric patterns that are relevant for precipitation, the features are selected over a large area surrounding the city of Rio de Janeiro. Given the number of features, vertical levels and grid points available, especially in the high-resolution ECMWF dataset, the dimension of the input data can quickly become very high, which makes training the neural network more challenging in terms of both computational cost and model generalization. To deal with this issue, we apply Principal Component Analysis (PCA) to reduce the dimensionality of the input data while preserving as much of the variance of the original input data as possible.
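The dimensionality-reduction step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic input matrix, the 95% variance target and the helper names `fit_pca` and `apply_pca` are all assumptions made for illustration. Each row stands for one analysis time and each column for one feature at one grid point and level.

```python
import numpy as np

def fit_pca(X, variance_to_keep=0.95):
    """Fit PCA via SVD; keep the fewest components reaching the variance target."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0                      # guard against constant columns
    Z = (X - mean) / std                     # standardize so units do not dominate
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(cumulative, variance_to_keep)) + 1
    k = min(k, len(s))
    return mean, std, Vt[:k], float(cumulative[k - 1])

def apply_pca(X, mean, std, components):
    """Project (new) samples onto the retained principal components."""
    return ((X - mean) / std) @ components.T

# Synthetic stand-in for the model analysis data (assumption): 500 analysis
# times, 200 columns driven by 5 underlying patterns plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 5))
mixing = rng.normal(size=(5, 200))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 200))

mean, std, components, kept = fit_pca(X, variance_to_keep=0.95)
reduced = apply_pca(X, mean, std, components)
print(X.shape, "->", reduced.shape, f"variance kept: {kept:.3f}")
```

In an operational setting the mean, standard deviation and components would be fitted once on the training period and then reused unchanged on each new analysis, so that the ANN always sees inputs in the same reduced space.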

Since the prediction target is precipitation, data from a rain-gauge network in Rio de Janeiro were used for the supervised learning of the ANN. The rain-gauge data are publicly available, span 17 years and are reported in real time, which is necessary for operational use. The method proposed in this work aims only to forecast the occurrence of rainfall (yes/no); further work, however, can include the quantitative forecast of precipitation. We will discuss the results of applying the proposed technique to precipitation forecasting in Rio de Janeiro, and the impact of choosing different ANN architectures, different PCA criteria and different data periods for training, validation and testing of the ANN. For future work, more advanced ML techniques are being considered, such as deep learning approaches that perform unsupervised feature extraction.
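To make the yes/no formulation concrete, the following is a minimal sketch of a one-hidden-layer network trained as a binary rain/no-rain classifier. The abstract does not specify the architecture, so everything here is an assumption: the hidden-layer size, learning rate, synthetic inputs (standing in for PCA-reduced model data) and synthetic labels (standing in for rain-gauge observations), as well as the helper names `train_ann` and `predict_rain`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_ann(X, y, hidden=8, lr=0.5, epochs=500, seed=0):
    """Train a tanh hidden layer + sigmoid output with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.5, size=(d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=hidden)
    b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)             # hidden activations, shape (n, hidden)
        p = sigmoid(H @ W2 + b2)             # predicted rain probability, shape (n,)
        err = (p - y) / n                    # gradient of mean cross-entropy w.r.t. logit
        dH = np.outer(err, W2) * (1.0 - H**2)
        W2 -= lr * (H.T @ err)
        b2 -= lr * err.sum()
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return W1, b1, W2, b2

def predict_rain(X, params, threshold=0.5):
    """Return True where the predicted rain probability reaches the threshold."""
    W1, b1, W2, b2 = params
    return sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2) >= threshold

# Synthetic stand-ins (assumption): 4 reduced features per time step and a
# rain/no-rain label that depends on the first two of them.
rng = np.random.default_rng(1)
X = rng.normal(size=(600, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Split by order rather than at random, mimicking separate periods for
# training and testing.
X_train, y_train = X[:400], y[:400]
X_test, y_test = X[400:], y[400:]

params = train_ann(X_train, y_train)
accuracy = float(np.mean(predict_rain(X_test, params) == (y_test == 1.0)))
print(f"test accuracy: {accuracy:.2f}")
```

Splitting by period rather than at random keeps temporally correlated weather situations from leaking between training and test data, which is one reason the choice of periods can affect the reported skill.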