The work described here extends these previous efforts to an additional parameterization scheme: the MYNN surface scheme. Algorithmically, the MYNN surface scheme is relatively simple to port to GPUs because it consists mostly of large loops over many independent 1D columns comprising the lowest two model layers. Because these loops are large and their iterations are independent, the data map easily onto single-instruction multiple-data (SIMD) architectures such as the GPU. The scheme was validated against baseline output from the CCPP Single Column Model; results differed only slightly, in the least significant digit of real-valued variables, primarily because of differences in rounding between the CPU and GPU. Computational performance of the new GPU-capable code was evaluated by comparing one 10-core Intel Haswell CPU with one discrete Nvidia P100 PCIe GPU. To replicate real-world load on the scheme, the number of columns was varied from 150,000 to 750,000, as may be encountered in a modern high-resolution global model. Because data movement between the host and device is a significant bottleneck in GPU computing, performance ranged from a 2-3x slowdown with unoptimized data movement to a 12-42x speedup with fully optimized data movement.
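The column independence described above can be illustrated with a minimal Python sketch. This is not the MYNN code: the function `surface_column`, its placeholder arithmetic, and the input values are all hypothetical, chosen only to show that when each column depends solely on its own two lowest layers, the loop gives the same result whether it runs serially or is split across workers.

```python
from concurrent.futures import ThreadPoolExecutor


def surface_column(column):
    """Hypothetical per-column kernel that reads only the two lowest layers."""
    lowest, second = column
    # Placeholder arithmetic, NOT the MYNN physics: a simple layer difference.
    return second - lowest


def run_serial(columns):
    # One loop iteration per column; no iteration reads another column's data.
    return [surface_column(c) for c in columns]


def run_parallel(columns, workers=4):
    # Because iterations are independent, they can be handed to workers
    # in any order. Executor.map returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(surface_column, columns))


# Made-up input: 1000 columns, each holding values for two layers.
columns = [(280.0 + 0.001 * i, 279.5 + 0.001 * i) for i in range(1000)]
serial = run_serial(columns)
parallel = run_parallel(columns)
```

Here the serial and parallel results match exactly because every column undergoes the same operations on the same inputs. On a GPU the same independence is what lets each column be assigned to its own thread, although, as noted above, rounding can then differ slightly from the CPU.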