1.3
Parallelization and Performance of the NIM for CPU, GPU and MIC
This presentation discusses both the performance and the portability of the model. Evaluation of CPU, GPU and MIC performance for the NIM is an ongoing activity, and the results change constantly as (1) new optimizations are added to the code, (2) new systems or architectures emerge, and (3) performance is compared in different ways. The code is performance-portable: any optimization that benefits one architecture (e.g., MIC) must not degrade performance on the others (CPU, GPU). A test suite developed at ESRL automates testing to ensure that the OpenMP, OpenACC, F2C-ACC, and SMS parallelizations remain valid for CPU, GPU, and MIC execution.
Finally, we will report on efforts to improve single-node performance by using CPU and GPU resources together (symmetric mode). Performance tests have been run on systems with up to six GPUs or two MIC accelerators attached to a single node. Using these results as a guide, we will make some general comparisons of CPU, MIC and GPU systems based on cost, energy, and model performance.