Post-processing of Model Data with Machine-Learning Techniques for Operational Precipitation Forecast
This work applies machine-learning (ML) techniques to different types of model output data in order to forecast precipitation at short-term operational scales (typically less than 24 hours). At these scales, many weather-sensitive operations can be heavily impacted by high rainfall amounts, and an accurate forecast is essential for effective management. Combining numerical weather prediction (NWP) with ML aims to draw on the strengths of both techniques: NWP model analyses provide large amounts of data that is used for the supervised learning of an artificial neural network (ANN). The model data is a source of nonlinear atmospheric patterns that can be useful for effectively training ANNs for precipitation forecasting. Where good-quality data is available from surface networks, satellites or radar, other observation sources can also be ingested to make these nonlinear patterns more realistic. In this work, we apply the proposed method to forecast the occurrence of severe precipitation events in the region of Rio de Janeiro.
The model data used for training the ANN comes from two sources: the ECMWF operational model analysis and the NCEP Final Analysis (FNL) data. Both datasets provide near-real-time atmospheric data, which allows the proposed method to be applied in an operational environment; the ECMWF dataset has a higher resolution (0.125° x 0.125°) than the FNL dataset (1° x 1°). At the present stage, features from the input dataset are manually selected by choosing variables that can explain precipitation formation and propagation processes. Surface and upper-air data of the following variables are selected as features: sea level pressure, cloud water, boundary layer height, u and v wind components, vertical motion, air temperature, dewpoint temperature, precipitable water, vapor mixing ratio and soil water volume. In order to capture the atmospheric patterns that are relevant for precipitation, the features are selected over an area covering a larger region around the city of Rio de Janeiro. Considering the number of features, vertical levels and grid points available, especially in the high-resolution ECMWF dataset, the dimension of the input data can quickly become very high, which makes training the neural network more challenging in terms of both computational cost and model generalization. To deal with this issue, we apply Principal Component Analysis (PCA) to reduce the dimensionality of the input data while preserving as much of the variance of the original input data as possible.
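The dimensionality reduction step described above can be sketched as follows. This is only a minimal illustration, not the configuration used in this work: the synthetic input matrix, the 95% variance threshold, the per-feature standardization and the helper names `fit_pca` and `apply_pca` are all assumptions made for the example. Each row stands for one analysis time, and each column for one variable at one level and grid point.

```python
import numpy as np

def fit_pca(X, variance_to_keep=0.95):
    """Fit PCA on X (n_samples, n_features) and keep the fewest components
    whose cumulative explained variance reaches `variance_to_keep`."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant features: avoid division by zero
    # Standardize so that variables with large units do not dominate (assumption).
    Z = (X - mean) / std
    # Rows of Vt are the principal directions; s**2 is proportional to variance.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    k = int(np.searchsorted(np.cumsum(explained), variance_to_keep)) + 1
    k = min(k, len(s))
    return mean, std, Vt[:k], explained[:k]

def apply_pca(X, mean, std, components):
    """Project X onto previously fitted principal components."""
    return ((X - mean) / std) @ components.T

# Synthetic stand-in for flattened analysis fields: 500 times x 200 features,
# generated from 3 hidden patterns plus a small amount of noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 200))
X = latent @ mixing + 0.05 * rng.normal(size=(500, 200))

# Fit on the earlier (training) period only, then reuse the same
# transformation for later data, as an operational setup would require.
X_train, X_test = X[:400], X[400:]
mean, std, components, explained = fit_pca(X_train, variance_to_keep=0.95)
Z_train = apply_pca(X_train, mean, std, components)
Z_test = apply_pca(X_test, mean, std, components)
print(X_train.shape[1], "features reduced to", components.shape[0], "components")
```

Fitting the transformation on the training period alone keeps information from the validation and test periods out of the inputs. The variance threshold is one possible PCA criterion; a fixed number of components would be another.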
Since the prediction target is precipitation, data from a rain-gauge network in Rio de Janeiro was used for the supervised learning of the ANN. The rain-gauge data is publicly available, spans 17 years and is reported in real time, which is necessary for operational use. The method proposed in this work aims only to forecast the occurrence of rainfall (yes/no); further work can include the quantitative forecast of precipitation. We will discuss the results of applying the proposed technique to precipitation forecasting in Rio de Janeiro, as well as the impacts of choosing different ANN architectures, different PCA criteria and different periods of data for training, validation and testing. For future work, more advanced ML techniques are being considered, such as deep learning approaches to perform unsupervised feature extraction.
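As a rough illustration of the yes/no occurrence forecast and of the comparison between ANN architectures, the sketch below trains a small one-hidden-layer network on synthetic data. Everything specific in it is an assumption rather than the setup used in this work: the four input columns standing in for PCA scores, the synthetic rain labels, the candidate hidden-layer sizes, the learning rate, the 60/20/20 chronological split, and the use of probability of detection (POD) and false alarm ratio (FAR) as verification scores.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def train_ann(X, y, n_hidden, lr=0.1, epochs=3000, seed=0):
    """One hidden tanh layer, sigmoid output, full-batch gradient descent
    on the mean cross-entropy loss."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, size=(X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, size=n_hidden)
    b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                    # hidden activations
        p = sigmoid(H @ W2 + b2)                    # predicted rain probability
        d_out = (p - y) / len(y)                    # loss gradient w.r.t. output logit
        d_hid = np.outer(d_out, W2) * (1.0 - H**2)  # backpropagate through tanh
        W2 -= lr * (H.T @ d_out)
        b2 -= lr * d_out.sum()
        W1 -= lr * (X.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)
    return W1, b1, W2, b2

def predict_rain(params, X, threshold=0.5):
    W1, b1, W2, b2 = params
    return sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2) >= threshold

def accuracy(pred, obs):
    return float(np.mean(pred == obs))

# Synthetic stand-in: rows are in time order, columns play the role of PCA scores.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4))
rain = np.tanh(X[:, 0]) + 0.5 * X[:, 1] > 0         # hypothetical yes/no labels
y = rain.astype(float)

# Chronological split: oldest data for training, most recent for testing.
train, val, test = slice(0, 600), slice(600, 800), slice(800, 1000)

# Compare two architectures (hidden-layer sizes) on the validation period.
candidates = {h: train_ann(X[train], y[train], n_hidden=h) for h in (2, 8)}
best_h = max(candidates,
             key=lambda h: accuracy(predict_rain(candidates[h], X[val]), rain[val]))

# Verify the selected network on the held-out test period.
pred = predict_rain(candidates[best_h], X[test])
hits = int(np.sum(pred & rain[test]))
misses = int(np.sum(~pred & rain[test]))
false_alarms = int(np.sum(pred & ~rain[test]))
pod = hits / max(hits + misses, 1)                  # probability of detection
far = false_alarms / max(hits + false_alarms, 1)    # false alarm ratio
test_accuracy = accuracy(pred, rain[test])
print(f"hidden units: {best_h}, accuracy: {test_accuracy:.2f}, "
      f"POD: {pod:.2f}, FAR: {far:.2f}")
```

Splitting by time rather than at random mimics operational use, where the network is always applied to data more recent than anything it was trained on. For rare severe events, scores such as POD and FAR are usually more informative than accuracy alone.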