Monday, 29 January 2024: 1:45 PM
324 (The Baltimore Convention Center)
Shuxia Zhang, Engelhart CTP (US) LLC, Stamford, CT; and J. Belanger
Cloud computing has become an increasingly common approach for private enterprises to integrate the Global Ensemble Forecast System (GEFS) into their business services and decision-making processes. Because of the extremely large data volumes, complicated parameterization schemes, and fluid dynamical processes involved, specific instance types and extensive computing resources are required to run the GEFS efficiently and reliably. The complexity and size of the GEFS itself are likely to grow as well, as is evident from recent changes and trends in the ECMWF ensembles. However, a single community cloud provider is unlikely to reliably supply the compute resources needed to guarantee operational uptime and meet common SLA requirements. Intercloud computing, on the other hand, provides unique benefits: better operational continuity through additional fallback options, reduced business risk by avoiding cloud provider lock-in, and greater economies of scale achieved collectively.
From the HPC perspective, the current version of the GEFS fits well into the universe of intercloud computing because each ensemble member can be run independently, starting with the download of its own initial conditions. However, this opportunity raises a new set of technological and logistical challenges for the cloud HPC community, such as executing the GEFS successfully across multiple clouds concurrently and maintaining cohesive collaboration among the stakeholders involved.
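The sketch below illustrates this embarrassingly parallel structure: ensemble members are round-robined across providers, and a failed member is retried on a second cloud, reflecting the fallback benefit noted above. It is a minimal illustration under stated assumptions only; the provider names and the stubbed helpers (fetch_initial_conditions, integrate_forecast) are hypothetical placeholders, not real GEFS tooling or cloud-SDK calls.

```python
"""Minimal sketch: dispatching independent GEFS ensemble members
across multiple clouds. Provider names and helpers are hypothetical
placeholders for a real deployment's batch/HPC submission layer."""
from concurrent.futures import ThreadPoolExecutor, as_completed

CLOUDS = ["cloud_a", "cloud_b", "cloud_c"]  # hypothetical providers
N_MEMBERS = 31                              # control + 30 perturbed members

def fetch_initial_conditions(member: int, cloud: str) -> None:
    # Placeholder: each member independently pulls its own initial
    # conditions onto the assigned cloud before the model run starts.
    pass

def integrate_forecast(member: int, cloud: str) -> bool:
    # Placeholder: submit the forecast integration to the cloud's
    # batch/HPC service and wait for completion; True means success.
    return True

def run_member(member: int, cloud: str, clouds: list[str]) -> tuple:
    """Run one member end to end. Members need no cross-member
    communication, so each can land on any cloud independently."""
    fetch_initial_conditions(member, cloud)
    if integrate_forecast(member, cloud):
        return member, cloud
    # Operational-continuity fallback: retry on the next provider.
    alt = clouds[(clouds.index(cloud) + 1) % len(clouds)]
    fetch_initial_conditions(member, alt)
    integrate_forecast(member, alt)
    return member, alt

with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [
        pool.submit(run_member, m, CLOUDS[m % len(CLOUDS)], CLOUDS)
        for m in range(N_MEMBERS)
    ]
    for fut in as_completed(futures):
        member, cloud = fut.result()
        print(f"member {member:02d} completed on {cloud}")
```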
In our talk, we will propose a potential framework for distributing GEFS workloads across multiple clouds, discuss the technological challenges expected at individual stages of the workflow, and present technical tactics to address them.
