Abstract: Parallelization and Performance of the NIM for CPU, GPU and MIC (95th American Meteorological Society Annual Meeting)

1.3
Parallelization and Performance of the NIM for CPU, GPU and MIC

Thursday, 8 January 2015: 9:00 AM

128AB (Phoenix Convention Center - West and North Buildings)

Mark W. Govett, NOAA/ESRL/GSD, Boulder, CO; and T. Henderson, J. Rosinski, J. Middlecoff, and R. A. Madden

The Non-hydrostatic Icosahedral Model (NIM) is a global, non-hydrostatic model being developed at ESRL with a goal of running at 3.5 km resolution in 2015. NIM dynamics code is performance-portable across CPU, GPU and MIC architectures, using OpenMP (CPU, MIC), OpenACC (GPU), and F2C-ACC (GPU) directives. The F2C-ACC compiler, developed at ESRL, is the primary compiler for execution on NVIDIA GPUs and serves as a benchmark for ongoing evaluation of commercial OpenACC compilers from Cray and PGI. For MPI-based parallelization, ESRL's Scalable Modeling System (SMS) directives handle domain decomposition, inter-process communication, and I/O operations. Collectively, these directives allow a single source code to be maintained that runs on CPU, GPU and MIC processors in serial or multi-node execution.

Both model performance and portability will be discussed in this presentation. Evaluation of CPU, GPU and MIC performance for the NIM is an ongoing activity, and results change constantly as (1) new optimizations are added to the code, (2) new systems or architectures emerge, and (3) performance is compared in different ways. Because the code is performance-portable, any optimization that benefits one architecture (e.g., MIC) must not degrade performance on the others (CPU, GPU). A test suite, developed at ESRL, automates testing to ensure that OpenMP, OpenACC, F2C-ACC, and SMS parallelization remain valid for CPU, GPU, and MIC execution.

Finally, we will report on efforts to improve single-node performance by using both CPU and GPU resources (symmetric mode). Performance tests have been made on systems with up to 6 GPU or 2 MIC accelerators attached to a single node. Using these results as a guide, we will make some general comparisons between CPU, MIC and GPU systems based on cost, energy, and model performance.

95th American Meteorological Society Annual Meeting

January 04 - 08, 2015
