To address these limitations, we propose to extend a successful attempt in the data-driven equation discovery of cloud cover (Grundner et al., 2023) to the subgrid parameterization of surface precipitation in the Tropics. We ask: Can we find an analytic equation linking the vertical profile of thermodynamic variables and surface properties (land-sea fraction, statistical moments of the orography, etc.) to surface precipitation?
Following this earlier work, we first train simple feed-forward neural networks on meteorological reanalysis data, then distill the neural networks’ added value into symbolic equations using multi-population evolutionary algorithms. Unlike our previous attempts, which were local in space or deterministic, modeling precipitation introduces two additional challenges: (1) non-locality in the vertical dimension, e.g., to capture condensation throughout the vertical column; and (2) stochasticity, e.g., to capture precipitation onset and the spread of possible precipitation values in moist atmospheric columns.
To tackle the non-locality challenge, we found that using analytic kernels with three adjustable parameters for the vertical integration of relevant predictors across the atmospheric column strikes an effective balance between performance and interpretability. Our strategy first models the onset of precipitation as a binary classification between rainy and non-rainy days, and then uses precipitation on rainy days as the regression target. This approach also allows us to address the stochastic nature of precipitation by targeting the conditional moments of, e.g., a Bernoulli-Gamma distribution, using neural networks as a first step and symbolic regression as a second step.
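The two ingredients described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the Gaussian kernel shape, its three parameters (`amplitude`, `center`, `width`), the trapezoidal rule, and all function names are assumptions chosen for illustration, since the abstract does not specify the kernel family. The Bernoulli-Gamma functions use the standard mixture of a rain/no-rain Bernoulli variable with a Gamma-distributed rain amount.

```python
import math


def gaussian_kernel(p, amplitude, center, width):
    """Hypothetical three-parameter kernel evaluated at pressure p."""
    return amplitude * math.exp(-0.5 * ((p - center) / width) ** 2)


def kernel_integral(pressure, profile, amplitude, center, width):
    """Trapezoidal estimate of the integral of kernel(p) * profile(p) dp.

    Assumes `pressure` is sorted in ascending order and has the same
    length as `profile`.
    """
    weighted = [
        gaussian_kernel(p, amplitude, center, width) * x
        for p, x in zip(pressure, profile)
    ]
    total = 0.0
    for i in range(len(pressure) - 1):
        dp = pressure[i + 1] - pressure[i]
        total += 0.5 * (weighted[i] + weighted[i + 1]) * dp
    return total


def bernoulli_gamma_mean(p_rain, shape, scale):
    """Mean of the mixture: P(rain) times the Gamma mean (shape * scale)."""
    return p_rain * shape * scale


def bernoulli_gamma_nll(y, p_rain, shape, scale):
    """Negative log-likelihood of a single precipitation value y >= 0."""
    if y <= 0.0:
        # Dry case: only the Bernoulli part contributes.
        return -math.log(1.0 - p_rain)
    # Wet case: Bernoulli probability of rain times the Gamma density.
    log_gamma_pdf = (
        (shape - 1.0) * math.log(y)
        - y / scale
        - math.lgamma(shape)
        - shape * math.log(scale)
    )
    return -(math.log(p_rain) + log_gamma_pdf)
```

In such a setup, the kernel parameters would be fitted jointly with the downstream model, so that each vertically integrated predictor is a single interpretable scalar per column. The negative log-likelihood could serve as a training loss for the networks that predict `p_rain`, `shape`, and `scale`.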
Preliminary results show that simple equations with only two vertically integrated predictors and one surface predictor are sufficient to outperform current analytic baselines used to model precipitation in idealized studies (e.g., Ahmed et al., 2020). Coupled with the appropriate vertical integral of humidity and equivalent potential temperature, the subgrid standard deviation of orography gives our model the ability to generalize from ocean to land surfaces. Current research focuses on using transfer learning to carry the relationship learned on ERA5 meteorological reanalysis over to IMERG observations, taking advantage of our parameterization’s stochastic nature to tolerate slight mismatches between drivers derived from reanalysis and observed precipitation. Our findings affirm the role of symbolic regression in discovering accurate yet interpretable equations to parameterize atmospheric processes, even when these processes are non-local and stochastic.
Bibliography
Ahmed, F., Adames, Á. F., & Neelin, J. D. (2020). Deep convective adjustment of temperature and moisture. Journal of the Atmospheric Sciences, 77(6), 2163-2186.
Grundner, A., Beucler, T., et al. (2023). Data-driven equation discovery of a cloud cover parameterization. Journal of Advances in Modeling Earth Systems (accepted, in press).

