J8.2 Cloud-Based Sharing, Analysis, and Visualization of NWP Model Output for the Big Weather Web

Wednesday, 25 January 2017: 8:45 AM
611 (Washington State Convention Center)
Kevin R. Tyle, SUNY, Albany, NY

The Big Weather Web (BWW; http://www.bigweatherweb.org) is an NSF-funded multi-university effort whose primary goal is to develop a common and sustainable Big Data infrastructure in support of weather prediction research and education in universities.  Currently, seven university partners (Colorado State University; Pennsylvania State University; South Dakota School of Mines & Technology; Texas Tech University; University at Albany-SUNY; University of North Dakota; and University of Wisconsin-Milwaukee) generate NWP model output with the Advanced Research Weather Research and Forecasting (WRF) model.  While the model domain and run time are fixed, each university node runs one or more unique instances of WRF.  Specifically, each run may use a different set of physics, convective, boundary-layer, and/or radiative parameterization schemes, or a different set of initial conditions.  The result is a moderately large (currently 47-member) ensemble of CONUS-centered WRF forecasts, each of which produces model output every 3 hours up to the model run time of 84 hours.
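To give a sense of the scale these figures imply, the following minimal Python sketch enumerates the output times of one forecast cycle. The 3-hour interval, 84-hour run time, and 47 members come from the text above; the assumption that an hour-0 output is also written, and the initialization time used, are illustrative only.

```python
from datetime import datetime, timedelta

OUTPUT_INTERVAL_H = 3   # from the abstract: output every 3 hours
RUN_LENGTH_H = 84       # from the abstract: 84-hour forecasts
N_MEMBERS = 47          # from the abstract: current ensemble size


def lead_hours(interval=OUTPUT_INTERVAL_H, length=RUN_LENGTH_H):
    """Forecast lead times in hours, assuming hour 0 is also written."""
    return list(range(0, length + 1, interval))


def valid_times(init, hours):
    """Valid time of each output, given the initialization time."""
    return [init + timedelta(hours=h) for h in hours]


hours = lead_hours()
init = datetime(2017, 1, 25, 0)  # hypothetical cycle, for illustration only
times = valid_times(init, hours)

print(len(hours), "output times per member")
print(len(hours) * N_MEMBERS, "member-times per cycle")
print("last valid time:", times[-1].isoformat())
```

Under the hour-0 assumption, each member contributes 29 output times, so a single cycle of the full ensemble yields well over a thousand member-times to store and share.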

One of the main challenges that the BWW addresses is the efficient sharing of the ensemble’s output.  Cloud-based methods are a natural fit for this problem.  Thanks to an education grant from Amazon Web Services (AWS), the BWW stores its model output on AWS resources.  The model data are first generated on hardware hosted at the individual university nodes and are then uploaded to AWS Simple Storage Service (S3) buckets.  Meanwhile, postprocessing (using, for example, NCEP’s Unified Post Processor, UPP) takes place on the AWS Elastic Compute Cloud (EC2) service.  The EC2 instance can mount the S3-hosted model output using the s3fs file system.
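The upload step from a university node to S3 might look something like the sketch below. It is not the BWW project's actual workflow: the bucket name, the node/member/cycle key layout, and the helper names `s3_key` and `upload_run` are hypothetical. The only library call used is `upload_file` on a boto3 S3 client, and AWS credentials are assumed to be configured in the environment.

```python
from pathlib import Path

BUCKET = "bww-example-bucket"  # hypothetical bucket name


def s3_key(node, member, init, filename):
    """Build an object key, e.g. 'albany/mem01/2017012500/wrfout_d01.nc'.

    The node/member/cycle layout is an assumption made for this sketch.
    """
    return f"{node}/{member}/{init}/{filename}"


def upload_run(run_dir, node, member, init, bucket=BUCKET):
    """Upload every regular file in run_dir to S3 under the layout above."""
    import boto3  # third-party AWS SDK, imported lazily so s3_key works without it

    s3 = boto3.client("s3")
    for path in sorted(Path(run_dir).iterdir()):
        if path.is_file():
            key = s3_key(node, member, init, path.name)
            s3.upload_file(str(path), bucket, key)
```

A node would call, for example, `upload_run("/data/wrf/run", "albany", "mem01", "2017012500")` once a run completes. With the bucket mounted through s3fs on the EC2 instance, the same objects then appear as ordinary files under the mount point.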

Data sharing is being explored using two catalog services that leverage the OPeNDAP data access protocol.  Unidata’s Thematic Real-time Environmental Distributed Data Services (THREDDS) Data Server (TDS) and Geode Systems’ Repository for Archiving, Managing and Accessing Diverse DAta (RAMADDA) are both easily deployed as Tomcat-served webapps on the EC2 instance.  A user can explore the BWW data archive with nothing more than a web browser, or can use OPeNDAP client tools such as NCL, Python, and the Integrated Data Viewer (IDV) to analyze and visualize the cloud-hosted BWW data.
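As an illustration of OPeNDAP access from Python, the sketch below builds a URL in the default TDS form (`/thredds/dodsC/` followed by the dataset path) and reads a single time step of one variable through the netCDF4 package. The host name, dataset path, and variable name are hypothetical, and the netCDF4 installation is assumed to have been built with DAP support.

```python
def opendap_url(host, dataset_path):
    """Return the OPeNDAP endpoint of a dataset on a TDS using default paths."""
    return f"http://{host}/thredds/dodsC/{dataset_path}"


def read_first_time(url, varname):
    """Fetch only the first time step of one variable over OPeNDAP."""
    from netCDF4 import Dataset  # third-party; imported lazily

    with Dataset(url) as ds:
        # Indexing the remote variable transfers just the requested slice.
        return ds.variables[varname][0, ...]


# Hypothetical host and dataset path, for illustration only.
url = opendap_url("tds.example.edu", "bww/albany/mem01/2017012500/wrfout_d01.nc")
print(url)
```

Calling `read_first_time(url, "T2")` against a real server would return an array for the first output time without downloading the whole file, which is the main appeal of this access pattern for a large ensemble.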

Besides raw compute power and storage (provided by the EC2 and S3 services, respectively), the Cloud also presents an ideal platform to deploy container-based technologies, such as Docker. This presentation will include a brief synopsis of the BWW project team’s work using containers, particularly with regard to configuring, running, and postprocessing WRF.

There has understandably been a lot of excitement, but also a lot of uncertainty, about using cloud-based services such as AWS, not only in the meteorological community but in the wider world as well.  This presentation will include a discussion of some of these issues, including:

  1. Cost
  2. Permanence of the data, especially with regard to constraints caused by (1)
  3. Reproducibility
  4. Security