One of the main challenges the BWW addresses is the efficient sharing of the ensemble’s output, and cloud-based methods are a natural fit for this problem. Thanks to an education grant from Amazon Web Services (AWS), the BWW is using AWS resources to store the model output data. The model data are first generated on hardware hosted at the individual university nodes and then uploaded to AWS Simple Storage Service (S3) buckets. Postprocessing (using, for example, NCEP’s Unified Post Processor, UPP) takes place on the AWS Elastic Compute Cloud (EC2) service; the EC2 instance can mount the S3-hosted model output using the s3fs file system.
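As a minimal sketch of the upload step, the snippet below pushes WRF output files from a university node to S3 using boto3, assuming AWS credentials are already configured on that node. The bucket name, key prefix, and file pattern are hypothetical placeholders and do not describe the BWW’s actual storage layout.

```python
import glob
import os

import boto3

s3 = boto3.client("s3")
bucket = "bww-ensemble-output"   # hypothetical bucket name
prefix = "member01/2016061500"   # hypothetical key prefix (member/forecast cycle)

# Upload each wrfout file produced by this ensemble member.
for path in glob.glob("wrfout_d01_*"):
    key = f"{prefix}/{os.path.basename(path)}"
    s3.upload_file(path, bucket, key)
    print(f"uploaded {path} -> s3://{bucket}/{key}")
```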
Data sharing is being explored using two catalog services that leverage the OPeNDAP data access protocol: Unidata’s Thematic Real-time Environmental Distributed Data Services (THREDDS) Data Server (TDS) and Geode Systems’ Repository for Archiving, Managing and Accessing Diverse DAta (RAMADDA), both of which are easily deployed as Tomcat-served web applications on the EC2 instance. Users can explore the BWW data archive with a web browser, or use OPeNDAP client tools such as NCL, Python, and the Integrated Data Viewer (IDV) to analyze and visualize the cloud-hosted BWW data.
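To illustrate the Python route, the sketch below opens a remote dataset over OPeNDAP with the netCDF4 library (assuming it was built with DAP support). The server URL and variable name are hypothetical placeholders for a dataset published by the TDS or RAMADDA instance, not actual BWW endpoints.

```python
from netCDF4 import Dataset

# Hypothetical OPeNDAP endpoint served by the TDS on the EC2 instance.
url = ("http://bww.example.edu/thredds/dodsC/"
       "bww/member01/2016061500/wrfout_d01_2016-06-15_00:00:00")

ds = Dataset(url)                  # open the remote dataset via OPeNDAP
t2 = ds.variables["T2"]            # 2-m temperature in WRF output
print(t2.shape, t2.units)

# Only the requested subset is transferred over the network.
first_time_slice = t2[0, :, :]
ds.close()
```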
Besides raw compute power and storage (provided by the EC2 and S3 services, respectively), the cloud also provides an ideal platform for deploying container-based technologies such as Docker. This presentation will include a brief synopsis of the BWW project team’s work with containers, particularly with regard to configuring, running, and postprocessing WRF.
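Purely as an illustration of what a containerized run might look like on an EC2 instance, the sketch below launches a WRF container through the Docker CLI from Python. The image name, volume paths, and entrypoint script are hypothetical and do not describe the BWW team’s actual container setup.

```python
import subprocess

cmd = [
    "docker", "run", "--rm",
    "-v", "/data/wrf/run:/wrf/run",   # hypothetical host/container run directory
    "bww/wrf:latest",                 # hypothetical WRF image
    "./run_wrf.sh",                   # hypothetical entrypoint inside the image
]
subprocess.run(cmd, check=True)       # raise an error if the run fails
```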
There has understandably been much excitement, but also much uncertainty, about using cloud-based services such as AWS, not only in our meteorological community but in the wider world as well. This presentation will include a discussion of some of these issues, including:
- Cost
- Permanence of the data, especially with regard to constraints imposed by cost
- Reproducibility
- Security