There is also broad consensus that, as data volumes grow rapidly, it is important to reduce data movement and bring processing and computation to the data. We need to give scientists an ecosystem that includes data, tools, models, and modern workflows, all residing in a cloud-like environment. Instead of moving data to users, data providers need to bring analysis, visualization, and other applications and tools to the data: so-called data-proximate workflow capabilities.
Unidata, a cyberinfrastructure facility, has been developing big-data infrastructure and data-driven scientific workflows that use cloud computing technologies for accessing, analyzing, and visualizing geoscience data. Unidata has implemented the aforementioned services on the Unidata Science Gateway (http://science-gateway.unidata.ucar.edu), hosted on the Jetstream cloud (https://jetstream-cloud.org), a cloud facility funded by the U.S. National Science Foundation. Through the Unidata Science Gateway, researchers can make use of well-integrated resources. Currently, the Unidata Science Gateway provides the following capabilities:
- A data ingest service that moves real-time data from more than 30 meteorological data streams, totaling in excess of 1 TB/day, into the Unidata Science Gateway via the Local Data Manager software, a TCP/IP-based data transfer technology. Those data include radar, satellite, surface, upper-air, ship, aircraft, and other observations, as well as forecast model output from several operational weather prediction centers;
- Remote access to the above data via different protocols, including access to subsets of data;
- Data transformation and format conversion;
- Extensive data analysis capabilities;
- Visualization of meteorological data;
- A collection of Jupyter Notebooks for data analysis;
- Pre-configured Docker images of tools;
- Access to the Advanced Weather Interactive Processing System (AWIPS) for remote data analysis and visualization.
In concert with the above efforts, Unidata has developed techniques that combine robust access to well-documented datasets with easy-to-use tools, using workflow technologies such as JupyterHub and Docker containers. In addition to fostering the adoption of technologies like Jupyter notebooks and pre-configured computing environments delivered as Docker containers, Unidata enables other computational and analytic methods through “Software as a Service” and “Data as a Service” techniques, via the deployment of the Cloud IDV, AWIPS EDEX Servers, and the THREDDS Data Server in the cloud. Together, these services and tools enable scientists not only to conduct their research on the Unidata Science Gateway but also to share and collaborate with other researchers, advancing the intertwined goals of Reproducibility of Science and Open Science and, in the process, truly enabling “Science as a Service”.
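To make the idea of data-proximate subsetting concrete, the following is a minimal sketch of how a client might compose a subset request so that only a small region and time window leave the server. It assumes a THREDDS-style NetCDF Subset Service endpoint; the host, dataset path, and variable name are hypothetical placeholders, not actual Unidata Science Gateway resources, and the helper `build_subset_url` is introduced here purely for illustration.

```python
from urllib.parse import urlencode


def build_subset_url(dataset_endpoint, variables, north, south, east, west,
                     time_start, time_end, accept="netcdf"):
    """Compose a subset-request URL for a NetCDF Subset Service style endpoint.

    `dataset_endpoint` is the full service URL of one dataset. The endpoint
    used in the usage example below is a hypothetical placeholder.
    """
    if south >= north:
        raise ValueError("south must be less than north")
    if not variables:
        raise ValueError("at least one variable is required")

    params = [
        ("var", ",".join(variables)),   # comma-separated variable names
        ("north", north),               # bounding box, degrees
        ("south", south),
        ("east", east),
        ("west", west),
        ("time_start", time_start),     # ISO 8601 timestamps
        ("time_end", time_end),
        ("accept", accept),             # requested return format
    ]
    # Keep commas and colons readable in the query string.
    return dataset_endpoint + "?" + urlencode(params, safe=",:")


if __name__ == "__main__":
    # Hypothetical endpoint and variable name, for illustration only.
    url = build_subset_url(
        "https://example.org/thredds/ncss/grid/hypothetical/model.nc",
        ["Temperature_isobaric"],
        north=50, south=30, east=-90, west=-110,
        time_start="2020-01-01T00:00:00Z",
        time_end="2020-01-01T06:00:00Z",
    )
    print(url)
```

Because the request carries only a bounding box, a time range, and a variable list, the server performs the extraction next to the data and returns just the requested slice, rather than the client downloading full files and subsetting locally.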
In this talk, we will present our work to date in developing the Unidata Science Gateway and the hosted services therein, as well as our future directions to benefit the atmospheric sciences community.