J10.6 Running Climate Model in the Commercial Cloud Computing Environment: A Case Study Using Community Earth System Model (CESM)

Wednesday, 25 January 2017: 11:45 AM
611 (Washington State Convention Center)
Xiuhong Chen, Univ. of Michigan, Ann Arbor, MI; and X. Huang, C. Jiao, M. G. Flanner, T. Raeker, and B. Palen

Numerical models are the major tool used in studies of climate change and climate projection. Because of the enormous complexity of such models, they are usually run at supercomputing centers or at least on high-performance computing (HPC) clusters. Cloud computing, however, offers an alternative option for running climate models. Compared with a traditional supercomputing environment, cloud computing offers more flexibility but also poses extra technical challenges. This study investigates that alternative, i.e., carrying out climate model simulations in a commercially available cloud computing environment. We test the performance and reliability of running the Community Earth System Model (CESM), a flagship climate model in the United States developed by the National Center for Atmospheric Research (NCAR), on Amazon Web Services (AWS) EC2, the cloud computing environment of Amazon.com, Inc. StarCluster is used to create a virtual computing cluster on AWS EC2 for the CESM simulations. The wall-clock time for one year of CESM simulation on the AWS EC2 virtual cluster is comparable to the time spent on the same simulation on a local, dedicated HPC cluster with InfiniBand interconnects operated by the University of Michigan. The CESM simulation scales efficiently with the number of CPU cores on the AWS EC2 virtual cluster up to 64 cores. For the standard CESM configuration at a spatial resolution of 1.9° latitude by 2.5° longitude, increasing the number of cores from 16 to 64 reduces the wall-clock running time by more than 50%, and the speedup is nearly linear. Beyond 64 cores, communication latency starts to outweigh the benefit of distributed computing, and the parallel speedup levels off.
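The scaling statements above rest on two standard quantities: speedup (the ratio of wall-clock times at two core counts) and parallel efficiency (speedup divided by the ratio of core counts). The following is a minimal Python sketch of how such numbers can be computed, assuming wall-clock times are available for each core count. The function names, the timing values in the usage block, and the `toy_runtime` cost model (compute time falling as 1/p plus a communication term growing linearly with p) are illustrative assumptions, not the study's method or data.

```python
"""Sketch: parallel speedup and efficiency from wall-clock times.

All numbers used here are hypothetical placeholders, not measurements
from the CESM/AWS EC2 study.
"""


def speedup(t_base: float, t_p: float) -> float:
    """Speedup of a run relative to a baseline run (ratio of wall-clock times)."""
    return t_base / t_p


def efficiency(t_base: float, p_base: int, t_p: float, p: int) -> float:
    """Parallel efficiency relative to the baseline core count.

    1.0 means perfectly linear scaling from p_base to p cores.
    """
    return speedup(t_base, t_p) / (p / p_base)


def time_reduction(t_base: float, t_p: float) -> float:
    """Fractional reduction in wall-clock time (0.5 means the run takes half as long)."""
    return 1.0 - t_p / t_base


def toy_runtime(p: int, work: float = 64.0, comm: float = 0.01) -> float:
    """Hypothetical cost model: compute shrinks as 1/p, communication grows with p.

    Not the model used in the study; it only illustrates why adding cores
    eventually stops helping once communication cost dominates.
    """
    return work / p + comm * p


if __name__ == "__main__":
    # Hypothetical wall-clock hours for one simulated year at 16 and 64 cores.
    t16, t64 = 10.0, 4.0
    print(f"speedup 16->64 cores: {speedup(t16, t64):.2f}")
    print(f"parallel efficiency: {efficiency(t16, 16, t64, 64):.2f}")
    print(f"wall-clock reduction: {time_reduction(t16, t64):.0%}")
    # Toy model: runtime falls, bottoms out, then rises as p grows.
    for p in (16, 32, 64, 128, 256):
        print(p, round(toy_runtime(p), 3))
```

In the toy model the runtime is minimized near p = sqrt(work / comm) = 80 cores, after which the communication term dominates. Real crossover points depend on the model configuration and the network, so they have to be measured rather than assumed.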