The most common strategy for training a parameterization is to formulate an offline (supervised) training task. In this setup, a neural network's parameters (i.e., weights and biases) can be trained straightforwardly using stochastic gradient descent, with gradients computed through backpropagation. However, one key challenge in training a GWP offline is the limited availability of GW observations, compounded by their sparsity and noise. The other key challenge is the need to separate GWs from the large-scale flow and accurately determine the GW drag (GWD). This task is far from straightforward and has proven sensitive to the choice of separation method, as well as to filtering and coarse-graining operations. Furthermore, the unresolved and under-resolved portion of the GWD, which is what must be added to a low-resolution general circulation model (GCM), depends on the GCM's effective resolution. This effective resolution is influenced not only by the model's horizontal and vertical resolutions, but also by its implicit numerical and explicit dissipation.
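As a concrete illustration of this offline setup, the minimal sketch below fits a small 1D CNN that maps a resolved wind profile to a GWD profile using stochastic gradient descent and backpropagation. The architecture, layer sizes, and synthetic data are assumptions for illustration only, not the configuration used in this work.

```python
import torch
import torch.nn as nn

# Illustrative 1D CNN: resolved wind profile in, GW drag profile out.
model = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.ReLU(),
    nn.Conv1d(16, 16, kernel_size=5, padding=2), nn.ReLU(),
    nn.Conv1d(16, 1, kernel_size=5, padding=2),
)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# u: (batch, 1, n_levels) resolved wind; gwd: matching "true" GWD targets.
# Random placeholders stand in for data diagnosed from a high-resolution run.
u = torch.randn(32, 1, 64)
gwd = torch.randn(32, 1, 64)

for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(u), gwd)
    loss.backward()    # gradients via backpropagation
    optimizer.step()   # stochastic gradient descent update
```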
Given all these challenges, there is a clear need for methods to calibrate or train a GWP online once it is integrated into a climate model. This can be achieved by formulating the training task as an inverse problem, which can be solved with the ensemble Kalman inversion (EKI) algorithm. EKI offers several advantages over other training algorithms. First, it enables the use of statistically averaged data, bypassing the need for trajectory data and thereby addressing the main challenges mentioned above. Second, the training is gradient-free, making it suitable for climate models for which obtaining derivatives may be challenging or infeasible.
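To make the EKI step concrete, the sketch below implements a single ensemble Kalman inversion update in NumPy, assuming a forward model G that maps a candidate parameter vector (for example, a subset of the GWP's weights) to time-averaged statistics. The function name and array layouts are illustrative assumptions, not the implementation used in this work.

```python
import numpy as np

def eki_update(thetas, G_thetas, y, Gamma, rng):
    """One ensemble Kalman inversion step (minimal sketch).

    thetas   : (J, p) ensemble of parameter vectors
    G_thetas : (J, d) forward-model outputs G(theta_j), e.g. time-averaged
               statistics produced by running the climate model
    y        : (d,)   target statistics (e.g. observed QBO period/amplitude)
    Gamma    : (d, d) observation-noise covariance
    """
    J = thetas.shape[0]
    theta_mean = thetas.mean(axis=0)
    g_mean = G_thetas.mean(axis=0)

    # Ensemble (cross-)covariances; no derivatives of the model are needed.
    C_tg = (thetas - theta_mean).T @ (G_thetas - g_mean) / (J - 1)   # (p, d)
    C_gg = (G_thetas - g_mean).T @ (G_thetas - g_mean) / (J - 1)     # (d, d)

    # Kalman-type update of each member toward perturbed observations.
    noise = rng.multivariate_normal(np.zeros(len(y)), Gamma, size=J)
    K = C_tg @ np.linalg.inv(C_gg + Gamma)                           # (p, d)
    return thetas + (y + noise - G_thetas) @ K.T
```

Because the update relies only on ensemble covariances and perturbed observations, it can be iterated with repeated model evaluations and never requires gradients of the climate model.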
In this work, we use a one-dimensional (1D) model of the quasi-biennial oscillation (QBO) as a testbed to demonstrate an offline-online strategy for training a convolutional neural network (CNN)-based GWP. The QBO is characterized by the downward propagation of successive westerly and easterly wind regimes with an average period of about 28 months. It is the primary mode of interannual variability in the tropical stratosphere, with links to subseasonal-to-seasonal forecast skill. GWs are believed to contribute significantly to the forcing of the QBO, and most GCMs rely on a GWP to simulate the QBO.
We show that offline training of this CNN-based GWP within a big-data regime spanning 100 years successfully reproduces a realistic QBO when the CNN is used in place of the existing physics-based GWP. Conversely, offline training of the CNN within a small-data regime covering only 18 consecutive months yields an unrealistic QBO. However, we demonstrate that this unrealistic QBO can be rectified through online re-training of only the first and last hidden layers, using EKI and only time-averaged statistics such as the period and amplitude of the QBO (Figure 1).
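As an illustration of how such time-averaged targets might be computed, the sketch below estimates the QBO period from the dominant spectral peak of an equatorial wind time series and its amplitude from the wind's standard deviation. The specific diagnostics, variable names, and reference level are assumptions for this example, not necessarily those used in this study.

```python
import numpy as np

def qbo_statistics(u, dt_months=1.0):
    """Time-averaged QBO statistics usable as EKI targets (illustrative).

    u : (nt,) zonal wind at a reference stratospheric level, sampled every
        dt_months months.
    """
    u = u - u.mean()
    # Amplitude: a simple measure of the oscillation's strength.
    amplitude = np.sqrt(2.0) * u.std()
    # Period: dominant spectral peak of the wind time series (in months).
    freqs = np.fft.rfftfreq(len(u), d=dt_months)
    power = np.abs(np.fft.rfft(u)) ** 2
    period = 1.0 / freqs[1:][np.argmax(power[1:])]
    return np.array([period, amplitude])
```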
Spectral analysis of the CNN's kernels trained within the big-data regime reveals a notable dominance of low- and high-pass spectral filters. This finding is consistent with the dynamics of wave propagation and dissipation, in which both local and non-local processes play significant roles. Online re-training of the CNN initially trained in the small-data regime brings the frequencies of these dominant filters closer to those found in the big-data regime.
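A simple way to perform such a spectral analysis is to take the Fourier transform of each trained 1D convolution kernel and record its dominant frequency, as sketched below. The array layout and zero-padding length are illustrative assumptions.

```python
import numpy as np

def kernel_spectra(kernels, n_fft=64):
    """Dominant frequency of each 1D convolution kernel (sketch).

    kernels : (n_kernels, k) array of trained CNN filter weights.
    Returns the peak frequency (cycles per grid point) of each kernel:
    values near zero indicate low-pass (smoothing) behavior, values near
    the Nyquist frequency indicate high-pass behavior.
    """
    spectra = np.abs(np.fft.rfft(kernels, n=n_fft, axis=-1))  # zero-padded FFT
    freqs = np.fft.rfftfreq(n_fft)
    return freqs[np.argmax(spectra, axis=-1)]
```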
Our study, although primarily centered on the parameterization of GWs, illustrates a strategy that could be applied effectively to other subgrid-scale parameterization tasks. Our findings underscore the potential of online re-training of data-driven parameterization schemes once they are incorporated into climate models.

