4.3 Parallel Performance Analysis of Two Infrastructure Frameworks for GMI Chemistry

Thursday, 26 January 2017: 2:00 PM
Conference Center: Chelan 2 (Washington State Convention Center)
Jules Kouatchou, GSFC, Greenbelt, MD; and T. L. Clune, B. M. Auer, J. E. Nielsen, and A. Molod

We have recently developed a generalized Chemistry Transport Model (CTM), the Goddard Earth Observing System CTM (GEOS-5 CTM), to facilitate the integration of several Chemistry modules within a single code base. GEOS-5 CTM uses a cubed-sphere grid. It relies on the Earth System Modeling Framework (ESMF), a newly implemented mass-conserving advection scheme (AdvCore), and a software component (ExtData) that can read external files at any horizontal and temporal resolution and from any source, as long as the data are on a latitude-longitude grid. GEOS-5 CTM is meant to replace several offline CTMs and has the capability to run under the Global Modeling Initiative (GMI) scenario. The "offline" version of GMI (oGMI) has been in existence for over sixteen years; it is based on a latitude-longitude grid and is driven by a multi-dimensional flux-form semi-Lagrangian advection scheme (LL-TpCore). This grid formulation has shown limitations in oGMI at high spatial resolutions and with an increasing number of transported tracers.
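To illustrate the kind of operation ExtData performs, the sketch below (Python, with hypothetical field values and grid coordinates; it is not the actual ExtData/ESMF implementation, and simple bilinear interpolation is used purely for illustration, whereas the framework's own regridding methods may differ) reads a field defined on a latitude-longitude grid and interpolates it onto arbitrary model grid locations such as cubed-sphere cell centers.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Hypothetical source field on a 1° x 1.25° latitude-longitude grid
# (288 longitudes x 181 latitudes), similar to the oGMI input grid.
lons = np.linspace(0.0, 358.75, 288)
lats = np.linspace(-90.0, 90.0, 181)
field = np.random.rand(181, 288)            # placeholder meteorological field

# Bilinear interpolator over the latitude-longitude source grid.
interp = RegularGridInterpolator((lats, lons), field,
                                 method="linear", bounds_error=False,
                                 fill_value=None)

# Hypothetical target locations: cell-center coordinates of the model grid
# (e.g., cubed-sphere C90 centers), given as (lat, lon) pairs.
target_lat = np.random.uniform(-90.0, 90.0, size=90 * 90 * 6)
target_lon = np.random.uniform(0.0, 358.75, size=90 * 90 * 6)
regridded = interp(np.column_stack([target_lat, target_lon]))
print(regridded.shape)                      # one value per model grid column
```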

The GMI configuration of GEOS-5 CTM contains the components (Emission, Deposition, Diffusion, Convection and Chemistry) that originated in oGMI. oGMI has 124 tracers (of which 69 are advected), whereas the GMI configuration of GEOS-5 CTM has 122 tracers (72 of which are advected). GEOS-5 CTM has fewer tracers because the GMI Chemistry is also used in the GEOS-5 Chemistry Climate Model, which does not require the full set of tracers present in oGMI. Overall, oGMI and GEOS-5 CTM share 121 identical tracers and associated chemical reactions.

 

We carry out a series of one-day numerical integrations of oGMI (at 1°x1.25° horizontal resolution, or 288x181 grid points) and GEOS-5 CTM (at C90 horizontal resolution, or 90x90x6 grid points) to analyze the parallel performance of both implementations (GEOS-5 CTM being under the GMI configuration). We are interested in comparing the computational speed of GEOS-5 CTM with respect to oGMI, and in particular the performance of their most computationally intensive components. All the experiments were done on Intel Xeon Haswell processor nodes, where each node has 28 cores (2.6 GHz each) and 128 GB of available memory.
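As a rough guide to how the two resolutions compare, the short sketch below (Python; the wall clock time is a made-up placeholder, not a measured result) computes the number of horizontal grid columns in each configuration and illustrates the per-tracer, per-grid-point normalization used later for the component timings.

```python
# Horizontal grid columns in each configuration (numbers from the text).
ogmi_columns = 288 * 181          # 1° x 1.25° latitude-longitude grid
geos5_columns = 90 * 90 * 6       # C90 cubed-sphere grid (6 faces)
print(ogmi_columns, geos5_columns)   # 52128 vs 48600 columns

# Per-tracer, per-grid-point normalization of a component wall clock time.
def time_per_tracer_per_point(wall_clock_ms, n_tracers, n_columns):
    return wall_clock_ms / (n_tracers * n_columns)

# The wall clock time below is a hypothetical placeholder, not a measurement.
example = time_per_tracer_per_point(wall_clock_ms=5.0e6,
                                    n_tracers=69, n_columns=ogmi_columns)
print(f"{example:.4f} ms per tracer per grid point")
```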

 

In Figure 1, we plot the total wall clock times for both models as the number of processors varies. The integration with GEOS-5 CTM is faster than that of oGMI, and the wall clock time steadily decreases as the number of processors increases. If we examine the distribution of the overall wall clock time, we note that Advection and Chemistry together take up to 90% of the time in oGMI and up to 76% in GEOS-5 CTM. We plot the wall clock times (in milliseconds) per tracer and per grid point for Advection and Chemistry in both models, and we also present the parallel scaling of each of the two components. Figure 2 shows that AdvCore is at least four times faster than LL-TpCore, and AdvCore has more favorable parallel scalability (Figure 3). The clustering of grid points at the poles in LL-TpCore requires special attention to avoid violating the CFL stability requirement; this introduces additional computational overhead and load imbalance in a parallel execution. The quasi-uniform nature of the cubed-sphere grid in AdvCore eliminates such problems at the poles. If we continue to increase the GEOS-5 CTM horizontal resolution, we expect AdvCore to achieve even better parallel efficiency.
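The parallel scaling shown in Figures 1-3 can be summarized with the usual speedup and efficiency measures; the sketch below (Python, with made-up wall clock times used only as placeholders) shows how such numbers are typically derived from timings at different processor counts.

```python
# Hypothetical wall clock times (seconds) at several processor counts.
# These values are placeholders for illustration, not measured results.
timings = {28: 1200.0, 56: 640.0, 112: 350.0, 224: 210.0}

base_procs = min(timings)
base_time = timings[base_procs]

for procs, t in sorted(timings.items()):
    speedup = base_time / t
    # Efficiency relative to the smallest processor count used as reference.
    efficiency = speedup * base_procs / procs
    print(f"{procs:4d} cores: speedup {speedup:5.2f}, efficiency {efficiency:5.2f}")
```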

Chemistry in GEOS-5 CTM costs less per tracer and per grid point (Figure 4). However, as shown in Figure 5, the scaling is better with oGMI because it has more grid points, which leads to better load balancing in its Chemistry.

As the number of processors increases, LL-TpCore gradually becomes the most time-consuming component in oGMI and will quickly reach a limit on its parallel efficiency. At a given model resolution, using more processors in oGMI will not lead to a gain in time because of the communication overhead in LL-TpCore. AdvCore does not have a similar problem in GEOS-5 CTM. However, we have observed that the ExtData component, which is critical in the GEOS-5 CTM implementation, contributes to an increase in the overall wall clock time. ExtData is time consuming at least in part because it opens a file for each required meteorological field, performs automatic regridding, and performs time interpolation. In oGMI there is only one file opening for all the fields, no regridding, and little time interpolation. As shown in Table 1, the time spent in ExtData remains essentially flat regardless of the number of processors used; as the number of processors increases, ExtData may therefore become the dominant component. To limit the number of file openings by ExtData, we have started modifying the code so that each external file is opened only once. Instead of at least 50 file openings (one for each meteorological field), there are now only 11 (one for each file collection). We conducted additional experiments with the modified code and recorded the ExtData wall clock times. The results in Table 1 show at least a 30% reduction in the time spent reading external data files. We are currently exploring other options to further decrease the ExtData time requirement.
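The code change described above amounts to caching one open file handle per collection instead of reopening a file for every field; a minimal sketch of that idea is shown below (Python with netCDF4, using hypothetical collection and field names, not the actual ExtData code).

```python
from netCDF4 import Dataset

# Hypothetical mapping of fields to the file collection they live in.
FIELD_TO_COLLECTION = {"U": "met_winds.nc4", "V": "met_winds.nc4",
                       "T": "met_thermo.nc4", "QV": "met_thermo.nc4"}

_open_collections = {}   # cache: one open handle per file collection

def read_field(name):
    """Read a field, opening each file collection only once."""
    path = FIELD_TO_COLLECTION[name]
    if path not in _open_collections:          # open on first use only
        _open_collections[path] = Dataset(path)
    return _open_collections[path].variables[name][:]
```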

Our experiments show that the most time-consuming components of GMI (Advection and Chemistry) are faster in GEOS-5 CTM. Overall, GEOS-5 CTM is a better alternative to oGMI as far as parallel performance is concerned. The advantage of GEOS-5 CTM is not limited to its computational speed: it does not require any pre-processing (regridding) of input data files, it can be run without any restart input file, and it has the ability to integrate various CTMs using the same executable.
