Improving Serial and Parallel Efficiency of NWP Models with Differential Transforms and ADER Time Integration

- Indicates paper has been withdrawn from meeting
- Indicates an Award Winner
Thursday, 6 February 2014
Hall C3 (The Georgia World Congress Center )
Matthew R. Norman, ORNL, Oak Ridge, TN

Successful atmospheric simulation must respect nonlinear interactions between PDE terms in multiple spatial dimensions and in time. Also, model throughput must accommodate ensemble and lead time requirements for applications of interest to stakeholders. Therefore, optimality is neither straightforward nor unique, leading to the wide range of assumptions and tradeoffs considered as acceptable within the NWP community. To respect this wide range of modeling choices, we propose a flexible and extensible toolset that can improve the efficiency of any time-explicit atmospheric model or model component while respecting accuracy, often more so than the original methods used. At the core of this toolset are Differential Transforms (DTs), which are also used extensively in so-called ``automatic'' or ``algorithmic'' differentiation. Herein, we present how DTs may improve spatial and temporal operator efficiency for a range of modeling choices including Finite-Volume and Galerkin.

The i-th and j-th DT of $u(x,t)$ is defined as

\begin{displaymath}  U\left(i,j\right)=\frac{1}{i!j!}\frac{\partial^{\left(i+j\right)}u}{\partial x^{i}\partial t^{j}}  \end{displaymath} (1)

, which is also a coefficient in the Taylor expansion of $u$, providing a trivial inverse transform. Novel to DTs is that even complex, non-linear PDE systems are transformed into simple recurrence relations over derivatives. For instance, the DT of the 1-D inviscid Burger's equation, $\partial_{t}u\left(x,t\right)+\partial_{x}f\left(u\right)=0$, where $f\left(u\right)=\frac{1}{2}u^{2}$, is:

\begin{displaymath}  U\left(i,j+1\right)=-\left(\frac{i+1}{j+1}\right)F\left(i+1,j\right)  \end{displaymath} (2)

\begin{displaymath}  F\left(i,j\right)=\frac{1}{2}\sum_{s=0}^{j}\sum_{r=0}^{i}U\left(r,s\right)U\left(i-r,j-s\right)  \end{displaymath} (3)

From this, we note two levels of applying DTs to NWP models. First, from (3), we see that, given known derivatives of the fluid state (e.g., from Galerkin basis functions or a reconstructed stencil of cells), the derivatives of even complex, non-linear PDE terms can be obtained with simple recurrence relations. Since all PDE terms can then be expanded directly as polynomials, any combination of integration, differentiation, or sampling operators can be combined efficiently into a single dot product, significantly improving upon the costs of quadrature methods in multiple spatial dimensions. It is important to note that all derivatives obtained using DTs are analytical (accurate to machine precision), and they fully couple all space and time dimensions to any arbitrary order of accuracy including all cross-terms. Order of accuracy is increased merely by changing the loop bounds of the summations involved. For complex PDE terms, analytically integrating DTs is more accurate than quadrature.

Second, from (2), we note that we can also obtain time and space-time derivatives from a given set of spatial derivatives. For instance, if we know $u$, $\partial_{x}u$, and $\partial_{xx}u$ (e.g., from reconstruction), we can then calculate $\partial_{t}u$, $\partial_{xt}u$, and $\partial_{tt}u$, which gives us third-order-accurate, space-time polynomials, fully and non-linearly coupled in space and time, of $u$ and $f$ across the cell / element in question and over the length of a time step. This allows direct space-time integration, differentiation, and sampling of any PDE term to provide a fully coupled, arbitrarily high-order-accurate, one-step time integration scheme suitable for any set of moments (point values, averages, derivatives), any evolution constrain (strong, weak, variational), and any grid (structured, unstructured, curvilinear). This time discretization is identical to the ADER (Arbitrary DERivatives) philosophy, but the expense of the Cauchy-Kowalewski procedure is avoided by the use of DTs. Using iterations to avoid C-K expense is not unique, but DTs are more flexible and extensible than existing variational techniques. Not only is this usually faster in serial than multi-stage methods, but it is also advantageous in massively parallel computing environments because inter-node data transfer frequency and volume are reduced.

Due to power constraints, computers are becoming increasingly fast at computation while decreasingly generous with local cache space. What this means is that in order to realize the computations per second possible on modern chips, we must do more computations per data access. One way to accomplish this is to use higher-order-accurate methods, for which adjacent cells / elements share moments / stencils during reconstruction and can re-use data for significantly greater amounts of computation per data access. These DT-based ADER methods trivially allow for arbitrarily high-order-accurate and fully coupled methods in multiple spatial dimensions and in time, paving an easier path toward high-order-accuracy in NWP models.

The ADER time discretization maintains non-oscillatory properties of robust spatial reconstruction operators (e.g., Piecewise Parabolic Method, Weighted Essentially Non-Oscillatory, Hermite WENO, etc.). Because the spatial derivatives sampled from a robust spatial reconstruction are also robustly limited to shocks and discontinuities, the resulting analytically evaluated time and space-time derivatives are also well-bounded and robust to shocks. We include some tests with discontinuities and shocks to demonstrate this.

We apply DTs at two different levels for 2-D transport and 3-D compressible, non-hydrostatic Euler dynamics for reconstructed Finite-Volume operators and for Spectral Element / Discontinuous Galerkin operators. The first level of application is to simplify the cost of spatial operators by removing the need for quadrature by expanding all PDE terms directly as spatial polynomials. The second level of application is to use DTs not only to remove quadrature but to also expand all PDE terms directly as polynomials in space and in time, leading to an ADER time discretization to replace multi-stage time integrators. Standard test cases are run, and results are discussed quantitatively and qualitatively from viewpoints of extensibility, flexibility, accuracy, and time-to-solution. Linear analyses reveal stable time step regions, which for Finite-Volume methods follow the typical linear CFL criterion of $1/D$, where $D$ is the number of spatial dimensions coupled in a time step.

Finally, it is important to couple as many spatial dimensions as possible. For models with uniform aspect ratio, all three spatial dimensions may be fully coupled with time. For typical NWP regimes, while the vertical dimension requires separate treatment (hydrostasis or semi-implicit), the two horizontal dimensions can still be fully coupled. DTs and ADER time integration, along with smart multi-dimensional recovery operators that form fully tensored multi-dimensional polynomials in $\mathcal{O}\left(n^{D+1}\right)$ complexity ($n$ is order of accuracy and $D$ is number of spatial dimensions), may encourage more modeling applications to employ full coupling of spatial dimensions.