Component Concurrency: Coarse-Grained Parallelism for Earth System Models

Thursday, 8 January 2015: 2:45 PM
128AB (Phoenix Convention Center - West and North Buildings)
Rusty Benson, NOAA/GFDL, Princeton, NJ; and V. Balaji

Over the last several years at the NOAA Geophysical Fluid Dynamics Lab (GFDL) in Princeton, NJ, we have been modifying and updating our Flexible Modeling System (FMS) infrastructure and model components to extend our pure MPI-based parallelism with hybrid MPI-OpenMP techniques. The hybrid programming model has allowed us to incrementally increase the performance of our models, measured in simulation years per calendar day (SYPD), beyond the weak-scaling limits imposed by fixed-size datasets when using MPI alone. The same framework is also being used to experiment with other fine-grained parallelism methods. It is clear, however, that the approach to exascale will require discovering and extracting parallelism at many levels.

Recently, the Modeling Services group undertook an even more ambitious, science-based project whose chosen approach had the added benefit of preparing the model infrastructure for current and future hardware paradigms. This effort focused on coupling algorithms rather than programming models, and aimed to increase coarse-grained concurrency at the granularity of model components. Concurrency at this level allows each component to use timestepping based on its intrinsic timescales.

An initial effort focused on the atmospheric radiative transfer component. The scientific goal was to shorten the interval at which the GFDL atmospheric radiation model is calculated, from some fixed time period to every atmospheric time step, without impacting the SYPD performance of the model. The solution was to re-architect the flow of computation to enhance component-level concurrency, and to do so in a way that exploits the tightly coupled, many-core architectures that are finally coming to fruition. Using OpenMP at a high level, the dynamics, radiation, and atmospheric physics can run concurrently in dedicated shared-memory threads (MIMD), with all data exchange occurring via memory copies. The radiation and atmospheric physics are column-independent processes that can be further blocked and parallelized using a nested, second level of OpenMP (SIMD). The dynamics contain both column and slab computations, and the nested OpenMP is applied differently to each and at a finer level. Performance data will be presented demonstrating the effectiveness of the approach and its suitability for planned self-booting, many-core systems. We will also show that, despite changes to the coupling algorithms, the new system produces valid climates when run in AMIP mode.
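
The two-level structure described above can be illustrated with a small, purely structural sketch. It is not the FMS code, which uses OpenMP: here Python thread pools stand in for the outer (component-level) and nested (column-block) levels. Every name in it (`dynamics`, `radiation`, `radiation_block`, `NCOLS`, `BLOCK`) and the toy arithmetic are hypothetical. The one-step lag between computing the radiative heating and applying it is likewise an assumption made to keep the example simple, not a statement of the coupling algorithm actually used. Because of Python's global interpreter lock, the sketch shows only the flow of control and data, not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

NCOLS = 8   # hypothetical number of atmospheric columns
BLOCK = 4   # hypothetical block size for column-independent work


def dynamics(state):
    # Toy stand-in for the dynamical core: advance every column by one step.
    return [t + 1.0 for t in state]


def radiation_block(columns):
    # Toy stand-in for a column-independent radiative heating calculation.
    return [0.1 * t for t in columns]


def radiation(state, inner_pool):
    # Nested (second) level: split the columns into blocks, process the
    # blocks independently, then reassemble the results in column order.
    blocks = [state[i:i + BLOCK] for i in range(0, len(state), BLOCK)]
    results = inner_pool.map(radiation_block, blocks)
    return [h for block in results for h in block]


def step(state, heating, outer_pool, inner_pool):
    # Data exchange by memory copy: each component gets a private copy.
    dyn_in = list(state)
    rad_in = list(state)
    # Outer (component) level: dynamics and radiation run concurrently.
    dyn_future = outer_pool.submit(dynamics, dyn_in)
    rad_future = outer_pool.submit(radiation, rad_in, inner_pool)
    # Assumption: heating from the previous step is applied to this step.
    new_state = [t + h for t, h in zip(dyn_future.result(), heating)]
    return new_state, rad_future.result()


def run(nsteps):
    state = [float(i) for i in range(NCOLS)]
    heating = [0.0] * NCOLS
    # Separate pools per level, so nested tasks never wait on outer workers.
    with ThreadPoolExecutor(max_workers=2) as outer, \
            ThreadPoolExecutor(max_workers=2) as inner:
        for _ in range(nsteps):
            state, heating = step(state, heating, outer, inner)
    return state, heating


if __name__ == "__main__":
    final_state, final_heating = run(2)
    print(final_state)
    print(final_heating)
```

Keeping a separate pool for each level mirrors the idea of dedicating threads to components while still allowing finer-grained work inside each one. In this sketch it also prevents a nested task from waiting on a worker that is already occupied by its parent.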