156 Enhancing Efficiency of the RRTMG Radiation Code with GPU and MIC Approaches for Numerical Weather Prediction Models

Monday, 7 July 2014
Michael J. Iacono, AER, Lexington, MA; and D. Berthiaume and J. Michalakes
Manuscript (1.7 MB)

Handout (2.1 MB)

The RRTMG radiative transfer options in the Weather Research and Forecasting (WRF) model have been modified to demonstrate the improvement in computational performance of the radiation physics that can be attained using Graphics Processing Unit (GPU) and Many Integrated Core (MIC) technology. New versions of the longwave and shortwave RRTMG codes have been developed and tested in WRF on a GPU-enabled computer system (caldera) at NCAR and on MIC-enabled workstations. The GPU system uses Intel Xeon (Sandy Bridge) processors, has two NVIDIA Tesla M2070-Q GPUs per node, and supports the PGI compilers that are currently necessary to run the GPU-accelerated codes using CUDA Fortran. Although the radiation models have been restructured for this application, the high accuracy of the radiative transfer is shown to be unaffected. To fully utilize the potential of GPU processing, the codes were transformed from operating on a single atmospheric column per call to running in parallel on multiple GPU threads over blocks of horizontal grid cells, vertical layers, and the RRTMG pseudo-spectral g-point dimension. In stand-alone mode, the GPU-accelerated radiation codes show a speed-up on the NCAR system relative to the original codes for a large set of profiles, though the speed-up depends on the specific GPU hardware on caldera. Further enhancement is potentially attainable with newer GPU hardware. The specific improvement in computational performance of the RRTMG radiation options attained within WRF will be demonstrated. Additional work to improve the efficiency of RRTMG within the NCEP GFS and NMMB forecast models will also be described. This effort has focused on applying specific optimizations, including vectorization and dynamic thread scheduling, to use the Intel MIC system configuration more effectively.
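The restructuring described above, from one atmospheric column per call to a block of columns processed over the column, layer, and g-point dimensions, can be illustrated with a minimal Python sketch. This is not the RRTMG or CUDA Fortran code: the function names, the data layout, and the placeholder calculation exp(-tau) are all assumptions made for illustration, and a sequential loop stands in for the GPU threads. It shows only how a flat thread index could map onto the three dimensions, and that the blocked form reproduces the per-column result.

```python
import math

# Toy illustration only: exp(-tau) is a placeholder calculation, not RRTMG physics.
# All names below are hypothetical and do not come from the RRTMG source.

def transmittance_single_column(tau_column):
    """Original style: one call handles one column, indexed tau_column[layer][gpoint]."""
    return [[math.exp(-tau) for tau in layer] for layer in tau_column]

def thread_to_indices(tid, n_layers, n_gpoints):
    """Map a flat thread id to (column, layer, gpoint), with g-point varying fastest."""
    col, rem = divmod(tid, n_layers * n_gpoints)
    lay, g = divmod(rem, n_gpoints)
    return col, lay, g

def transmittance_block(tau_block):
    """Restructured style: one call handles a block of columns.

    Each loop iteration stands in for one GPU thread working on a single
    (column, layer, gpoint) element; on a GPU these would run in parallel.
    """
    n_cols = len(tau_block)
    n_layers = len(tau_block[0])
    n_g = len(tau_block[0][0])
    out = [[[0.0] * n_g for _ in range(n_layers)] for _ in range(n_cols)]
    for tid in range(n_cols * n_layers * n_g):
        c, l, g = thread_to_indices(tid, n_layers, n_g)
        out[c][l][g] = math.exp(-tau_block[c][l][g])
    return out
```

Because every element is independent in this toy calculation, the order in which the "threads" run does not affect the result, which is why the blocked version matches the column-by-column version exactly. Real radiative transfer has dependencies between vertical layers that this sketch does not represent.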