Data-Driven Approaches for Simulating Rainfall in Climate Models

Saravanan, R.

Climate model simulations of rainfall in the tropics suffer from pervasive biases. Through atmospheric teleconnections, these biases degrade climate simulations in other regions of the globe. Over the past two decades, high-resolution satellite measurements of tropical rainfall have become available. These data are most commonly used to constrain physics-based climate model simulations by validating statistical properties of rainfall such as temporal means and variances. Temporal correlation properties of rainfall are also sometimes validated, but typically on daily or longer timescales, such as those associated with equatorial waves. However, satellite rainfall measurements contain a wealth of spatio-temporal information on sub-diurnal timescales that can be used to construct predictive models of rainfall. This study explores the feasibility of predicting rain characteristics in the tropical Pacific from atmospheric profiles using a hierarchy of statistical models.

Our statistical approach is superficially similar to the physics-based approach in that vertical profiles of temperature and humidity at a particular instant serve as the primary predictors, and rainfall over a subsequent period is the predictand. However, we allow the statistical model to “learn” from the data by letting the data determine the most important predictors as well as the parameters of the statistical model. Beyond temperature and humidity profiles, we also admit further predictors, such as vertical wind shear and surface variables. Empirical Orthogonal Function (EOF) decomposition is applied to vertical profiles from the NASA MERRA-2 atmospheric reanalysis to select the dominant predictor modes at the 00 UTC analysis time. Rainfall in the subsequent 6-hour period (00-06 UTC), taken from TRMM satellite data, is separated into three types: stratiform, deep convective, and shallow convective. For each rain type, two generalized linear models (logistic regression for rain occurrence and gamma regression for rain amount) are trained on 2003 data and used to predict 2004 rain occurrence and rate, respectively. The first EOF of humidity and the second EOF of temperature contribute most to the prediction in both statistical models. The logistic regression generally performs well for all rain types, but does better in the East Pacific than in the West Pacific. The gamma regression predicts reasonable geographical distributions of rain amount, but rain rate probability distributions are not predicted as well, suggesting the need for higher-order models. In addition to generalized linear models, other common machine learning techniques (support vector machine and random forest) are compared, but the improvement, if any, is slight. Furthermore, marginal nonlinear relationships between the predictand and individual predictors are explored via nonparametric regression techniques. Interestingly, incorporating the identified marginal nonlinear relationships into the generalized linear model does not improve the prediction, suggesting that these marginal nonlinear effects are explained by other predictors in the model.
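
The pipeline described above (EOF reduction of vertical profiles, then a logistic model for rain occurrence and a gamma model for rain amount) can be illustrated with a minimal sketch. It is not the study's code. It runs on synthetic stand-in data rather than MERRA-2 or TRMM, keeps two EOF modes, uses a log link for the gamma regression, and fits both models with a hand-written iteratively reweighted least squares routine; all of these choices, and every function and variable name, are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Synthetic stand-in data (assumption: NOT MERRA-2 or TRMM) ---
n_samples, n_levels = 2000, 20
level = np.linspace(0.0, 1.0, n_levels)
h = rng.standard_normal(n_samples)   # hidden driver of rainfall
g = rng.standard_normal(n_samples)   # hidden mode unrelated to rainfall
profiles = (3.0 * np.outer(h, np.exp(-3.0 * level))
            + np.outer(g, np.sin(np.pi * level))
            + 0.3 * rng.standard_normal((n_samples, n_levels)))
true_prob = 1.0 / (1.0 + np.exp(-(-0.5 + 1.5 * h)))
rained = (rng.random(n_samples) < true_prob).astype(float)
true_mean = np.exp(0.3 + 0.5 * h)    # mean amount given that it rains
amount = rained * rng.gamma(shape=2.0, scale=true_mean / 2.0)


def fit_eofs(train_profiles, n_modes):
    """Return the training mean and the leading EOFs (rows) from an SVD."""
    mean = train_profiles.mean(axis=0)
    _, _, vt = np.linalg.svd(train_profiles - mean, full_matrices=False)
    return mean, vt[:n_modes]


def fit_glm(X, y, family, n_iter=25):
    """Fit a GLM by iteratively reweighted least squares (IRLS).

    family="logistic": Bernoulli response, logit link.
    family="gamma":    gamma response, log link (an assumed link choice).
    """
    X1 = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(X1.shape[1])
    if family == "gamma":
        beta[0] = np.log(y.mean())   # start from the overall mean
    for _ in range(n_iter):
        eta = np.clip(X1 @ beta, -30.0, 30.0)
        if family == "logistic":
            mu = 1.0 / (1.0 + np.exp(-eta))
            dmu = np.clip(mu * (1.0 - mu), 1e-9, None)  # d(mu)/d(eta)
            var = dmu                                   # Bernoulli variance
        elif family == "gamma":
            mu = np.exp(eta)
            dmu = mu                                    # d(mu)/d(eta)
            var = mu ** 2                               # gamma variance function
        else:
            raise ValueError(f"unknown family: {family}")
        w = dmu ** 2 / var            # IRLS weights
        z = eta + (y - mu) / dmu      # working response
        beta = np.linalg.solve(X1.T @ (w[:, None] * X1), X1.T @ (w * z))
    return beta


def predict_mean(X, beta, family):
    """Predicted mean: probability (logistic) or amount (gamma)."""
    eta = np.clip(beta[0] + X @ beta[1:], -30.0, 30.0)
    return 1.0 / (1.0 + np.exp(-eta)) if family == "logistic" else np.exp(eta)


# Train on the first half, predict the second half (mimicking 2003 -> 2004).
train, test = slice(0, 1000), slice(1000, None)
mean, eofs = fit_eofs(profiles[train], n_modes=2)
pcs_train = (profiles[train] - mean) @ eofs.T
scale = pcs_train.std(axis=0)
X_train = pcs_train / scale
X_test = ((profiles[test] - mean) @ eofs.T) / scale

beta_occ = fit_glm(X_train, rained[train], "logistic")
wet = rained[train] == 1.0            # fit amounts on rainy samples only
beta_amt = fit_glm(X_train[wet], amount[train][wet], "gamma")

p_rain = predict_mean(X_test, beta_occ, "logistic")
mu_wet = predict_mean(X_test, beta_amt, "gamma")
expected_rain = p_rain * mu_wet       # occurrence probability times amount

brier_model = np.mean((p_rain - rained[test]) ** 2)
brier_clim = np.mean((rained[train].mean() - rained[test]) ** 2)
print(f"Brier score: model {brier_model:.3f}, climatology {brier_clim:.3f}")
```

Splitting the prediction into occurrence and amount keeps the many zero-rain samples out of the gamma fit, which is defined only for positive values. Their product gives an expected rain amount for each sample, and averaging that product over many samples gives a time-averaged estimate.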

The results of this study suggest that statistical models applied to TRMM radar observations and MERRA-2 environmental parameters can predict the spatial patterns and amplitudes of tropical rainfall in a time-averaged sense. Comparing the observationally trained models with models trained on NCAR CAM5 simulations points to possible deficiencies in the convection parameterization used in CAM5.

J66.4 Data-Driven Approaches for Simulating Rainfall in Climate Models