spectral element method. I'll present performance results from our
Fortran and C++/Kokkos implementations of this dynamical core on a
range of processors (Ivy Bridge, Haswell, KNL, and P100 and V100
GPUs). With a careful implementation, the C++ code is competitive with
the Fortran code on all processors and also supports GPUs.
This work allows us to evaluate the effectiveness of GPU
architectures for modern climate models. After normalizing for
power consumption, we see that with sufficient work per node, the
V100 GPU can obtain significant speedups over conventional Xeons.
But in the strong-scaling limit, we see little or no improvement over
conventional architectures.
Unfortunately, climate models are run close to their scaling limit in
order to meet throughput requirements, and thus porting to GPUs is not
cost effective. Instead, GPU systems should be used in other
regimes. Two regimes where GPU systems can provide a real benefit
are super-parameterization and ultra-high-resolution simulations
running at throughput rates suitable for short process studies.