1.5
Next Generation HPC and Forecast Model Application Readiness at NCEP

Thursday, 8 January 2015: 9:30 AM
128AB (Phoenix Convention Center - West and North Buildings)
John Michalakes, NOAA; and M. J. Iacono

We describe efforts at the NOAA National Centers for Environmental Prediction to ready operational forecast models for new generations of HPC systems expected in the 2018-20 time frame and beyond. These systems will provide dramatically higher peak floating-point rates (multi-petascale, verging on exascale) but proportionately less of everything else that matters for performance: memory bandwidth, cache capacity, thread speed, and I/O bandwidth. Efficient use of the Intel Xeon Phi Many Integrated Core (MIC) architecture, Graphics Processing Units (GPUs) from NVIDIA, and successive generations of "conventional" multi-core processors will require greater concurrency, more fine-grained parallelism, and better memory-system performance. We characterize existing performance bottlenecks in these areas by direct measurement using hardware counters, analysis of run-time traces and timers, and static analysis (compiler-generated reports and manual inspection) of loops and data structures.

Detailed analysis of column-physics components such as RRTMG radiation and other model components shows that threads executing these packages are state-heavy and thereby exhaust local storage: cache memory on conventional CPU and MIC cores, or shared memory on GPUs. Fine-grained parallelism (vectorization on Xeon and thread parallelism on GPUs) is inhibited by data dependencies in the vertical dimension of column physics. And while there is fine-grained parallelism over the dependency-free horizontal dimensions of weather model domains, processing a vector of state-heavy grid columns per thread only exacerbates the aforementioned per-thread pressure on local storage. In spite of these constraints, various code and data restructuring techniques have yielded performance gains for RRTMG on next-generation processors. Better still, these changes also improve performance on the host processor.
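To make the vertical-dependency point concrete, here is a minimal sketch of one generic restructuring: move the dependent vertical (k) loop outward and make the dependency-free horizontal (i) loop innermost, over a bounded block of columns. The kernel is a hypothetical downward transmittance recurrence, `trans[k] = trans[k-1] * exp(-tau[k])`, not actual RRTMG code and not necessarily the restructuring the authors applied; the function names and the `block` parameter are illustrative assumptions. Plain Python is used only to show the loop and data layout; in compiled Fortran or C, the inner `i` loop is the one a compiler could vectorize.

```python
import math


def transmittance_column_major(tau):
    """Column-at-a-time version; tau is indexed tau[i][k] (one list per column).

    The inner k loop carries a dependency (each level needs the previous
    level's value), so it cannot be vectorized along k.
    """
    ncol, nlev = len(tau), len(tau[0])
    trans = [[0.0] * nlev for _ in range(ncol)]
    for i in range(ncol):
        t = 1.0
        for k in range(nlev):
            t *= math.exp(-tau[i][k])
            trans[i][k] = t
    return trans


def transmittance_blocked(tau_lev, block=4):
    """Level-outer, column-inner version; tau_lev is indexed tau_lev[k][i].

    Storing levels outermost keeps the columns of one level contiguous.
    The inner loop over columns has no dependency, and processing columns
    in blocks bounds the running state to one value per column in the block.
    """
    nlev, ncol = len(tau_lev), len(tau_lev[0])
    trans = [[0.0] * ncol for _ in range(nlev)]
    for start in range(0, ncol, block):
        stop = min(start + block, ncol)
        t = [1.0] * (stop - start)  # running state for this block only
        for k in range(nlev):
            row = tau_lev[k]
            for j, i in enumerate(range(start, stop)):
                t[j] *= math.exp(-row[i])
                trans[k][i] = t[j]
    return trans


# Small demonstration: 7 columns, 5 levels, with made-up optical depths.
tau = [[0.1 * (i + 1) * (k + 1) for k in range(5)] for i in range(7)]
tau_lev = [list(level) for level in zip(*tau)]  # transpose to [k][i]
a = transmittance_column_major(tau)
b = transmittance_blocked(tau_lev, block=3)
```

Both versions perform the same multiplications in the same order for each column, so they agree; only the loop nesting, memory layout, and per-block working set differ. The block size trades vector length against the amount of per-column state that must stay resident in cache (or in GPU shared memory).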

We will also touch on efforts to improve software architecture and processes at NCEP for developing, maintaining, and using high-performance codes for operations and research, including the inherent trade-offs between modularity and performance. Finally, we will provide an overview of efforts underway to strengthen connections to the research community as we evaluate new dynamics, grid systems, and scale-appropriate physics to inform development of a Next Generation Global Prediction System (NGGPS) at NCEP.