Quantifying Weather and Climate Simulation Reproducibility in the Cloud

Shepherd, T. J.; Shepherd, T. J.

Cloud computing is an agile information technology model that enables remote access to shared pools of computers, often referred to as virtual machines. Recent advances in cloud computing technology, infrastructure and design mean that it is now possible to conduct high-resolution weather and climate simulations on cloud platforms as an alternative to more traditional high-performance computing (HPC) environments. The idea of sustainable computing has gained traction in recent times, as researchers look to undertake complex weather and climate simulations on the cloud for a fraction of the cost of running on traditional HPC systems. The saving arises because cloud-based instances can be suspended between simulations, require minimal management, and have less overhead than traditional HPC.

It is generally accepted that, as on a traditional HPC system, simulations conducted on a cloud should be reproducible: if the same simulation is run a second time, the result should be identical, or very nearly so, to that of the first run. This concept of reproducibility, however, has not been extensively verified or quantified in weather and climate studies.

Analysis of Weather Research and Forecasting (WRF) model simulations conducted on a federated cloud computing resource at the Cornell University Center for Advanced Computing has revealed that sensitivities exist between simulations conducted on the same node with the same CPUs. This sensitivity was discovered during a major cloud update in October 2017, when the cloud management software, Eucalyptus, was upgraded from version 4.0 to 4.4.

At the time of the upgrade, a long-term WRF climate simulation (2001–2017) of atmospheric flow was running on an instance. The simulation had reached its midway point (2007), and the upgrade could not be delayed until the simulation completed. A decision was made to suspend the simulation, take an image of the instance, migrate the Docker container running WRF, and restart the simulation on a new node running the upgraded cloud software.

Before recommencing the long-term simulation on the new node, three months of 2007 were repeated to test the reproducibility of results. These three months were compared with the output produced on the old node under the previous version of the cloud software. As stated, it is generally accepted that a simulation can be reproduced with identical output on a traditional HPC system. Results indicate that this might not be the case for cloud-based systems. Here, we present results of an investigation to determine what range of sensitivities might exist on cloud systems, in order to inform best practices and to mitigate, or even eliminate, sensitivities (or the potential for them to be introduced into model results) in this computing environment.

In this study, cloud-based computing is applied to weather and climate modeling in order to assess the fidelity of the simulated North American wind climate for the wind energy industry. Accurate quantification of the wind resource is of considerable financial benefit to the industry, as even the smallest errors in modeled wind speed can have a significant impact on investment capital, for example through the placement of future wind farms and the likelihood of a return on investment from annual energy production. Thus, quantifying the potential uncertainties of computing in the cloud could significantly improve our understanding and interpretation of model results.

This study is therefore motivated by several research questions:

1. When the cloud management software was migrated to Eucalyptus 4.4, was the output from continuing the run on the upgraded system the same as the output from the run that was already underway? Specifically, did those changes affect the WRF simulation of wind fields?
2. Do kernel-based virtual machine updates, and other software updates that occur on a cloud-based compute node during a simulation or between simulations, introduce sensitivities in the model results?
3. If the CPUs on a node differ (i.e., different generations of CPU or CPUs with different clock speeds), does this introduce sensitivities between two simulations with the same base climate? That is, can a simulation be repeated with exactly the same result if nothing is changed between the two simulations, as would be expected on a traditional HPC system? Further, if CPU generations or clock speeds differ between nodes (e.g., between nodes on system 1 and system 2), does this introduce sensitivities when the same experiment is run once on each system?
4. Can we establish that our existing settings generate precisely the same data when run again on the same machine? This might reveal non-hardware stochastic effects if they are inherent to the software. For example, run a simulation (A) on system 1, run it again (A1), and test for sensitivity.
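
A bit-for-bit check of the kind raised in the last research question could be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: the function name `field_digest` and the sample values are hypothetical, and real WRF output would be read from the model's output files rather than from Python lists.

```python
import hashlib
from array import array


def field_digest(values):
    """Return a SHA-256 digest of a sequence of floats packed as 64-bit doubles."""
    # array("d", ...) stores the exact IEEE 754 bit pattern of each value,
    # so two fields hash identically only if every value matches bit for bit.
    return hashlib.sha256(array("d", values).tobytes()).hexdigest()


# Hypothetical wind speeds (m/s) from run A and its repeat A1 (assumed values).
run_a = [7.25, 8.5, 6.125]
run_a1 = [7.25, 8.5, 6.125]
# The same field with the last value perturbed at rounding-error scale.
run_b = [7.25, 8.5, 6.125 + 2.0 ** -40]

identical = field_digest(run_a) == field_digest(run_a1)
perturbed_differs = field_digest(run_a) != field_digest(run_b)
```

Comparing digests only answers whether two runs are identical; it says nothing about how large any difference is, which calls for a separate analysis of the difference fields.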

Initial results for question 1 indicate that there is an impact in the geospatial maps of mean wind speed and of the mean difference in each grid cell. These impacts, however, appear to be stochastic: they are symmetric about zero, modest onshore (where current wind turbine deployments exist), and small relative to mean wind speeds. This finding increases confidence that, despite the migration to the upgraded cloud software and the change of node midway through a long-term climate simulation, the fidelity of the simulation was not compromised, and thus that the entire output can be used for weather and climate research applications.
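
The kind of summary described above (differences centered on zero and small relative to mean wind speed) could be computed along the following lines. This is a hedged sketch with made-up numbers: `difference_summary` and its inputs are hypothetical, each input is assumed to be a flat list of per-grid-cell mean wind speeds of equal length, and the fraction of positive differences is only a crude proxy for symmetry. The study's actual analysis may differ.

```python
from statistics import mean


def difference_summary(old_run, new_run):
    """Summarize per-grid-cell differences between two runs (new minus old)."""
    diffs = [n - o for o, n in zip(old_run, new_run)]
    nonzero = [d for d in diffs if d != 0.0]
    return {
        # A mean difference near zero is consistent with no systematic bias.
        "mean_diff": mean(diffs),
        # Close to 0.5 when positive and negative differences are balanced.
        "frac_positive": sum(d > 0 for d in nonzero) / len(nonzero) if nonzero else None,
        # Largest absolute difference relative to the mean wind speed of the old run.
        "max_rel_diff": max(abs(d) for d in diffs) / mean(old_run),
    }


# Hypothetical mean wind speeds (m/s) in four grid cells (assumed values).
old_run = [8.0, 6.0, 10.0, 8.0]
new_run = [8.25, 5.75, 10.5, 7.5]
summary = difference_summary(old_run, new_run)
```

With these illustrative inputs, the differences cancel on average and the largest one is a small fraction of the mean wind speed, which is the pattern a purely stochastic sensitivity would be expected to show.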
