The NCCS High Performance Data Analysis System and Climate Model Data Services - Supporting Collaborative Climate Research

Wednesday, 7 January 2015
Laura Carriere, NASA, Greenbelt, MD; and G. L. Potter, M. McInerney, S. Ambrose, D. Duffy, J. L. Schnase, T. P. Maxwell, and B. Huffer

The NASA Center for Climate Simulation (NCCS) and the Climate Model Data Services (CDS) are partnering to use the NCCS High Performance Data Analysis System as the infrastructure for a collaborative climate research environment. The NCCS has repurposed compute nodes from its Discover HPC system to build the High Performance Data Analysis System, an environment designed to support data-centric, medium-sized codes that are inherently parallel, for example, analysis of large quantities of high-resolution satellite imagery or intercomparisons of large climate model outputs. The system utilizes (1) virtualized high-speed InfiniBand networks, (2) a combination of high-performance file systems and object storage, and (3) virtual system environments tailored for data-intensive science. Large, commonly used data sets such as Landsat, MODIS, and the major climate reanalysis projects (e.g., NASA/GMAO's MERRA, ECMWF's ERA-Interim, NOAA/NCEP's CFSR, NOAA/ESRL's 20CR, and JMA's JRA-25 and JRA-55) will be available in the system, allowing scientists to bring their analysis to the data. Access to compute resources will be made available both to authorized users, via virtual containers built with standard climate tools and managed by Data Analysis System administrators, and to external users, via the CDS API. The CDS provides access to climate model data through a variety of services, including access, visualization, compute, and comparison, to facilitate advances in climate research. These services (data, analytic, and knowledge-based) include data services such as the Earth System Grid Federation (ESGF), THREDDS Data Server (TDS), Web Map Services (WMS), and ArcGIS. Climate Analytics as a Service (CAaaS) provides access to MERRA AS (analysis of MERRA data using a Hadoop/MapReduce approach) and is planning a web-based implementation of the Ultrascale Visualization Climate Data Analysis Tools (UV-CDAT).
Knowledge services will be provided through the Ontology-Driven Interactive Search Environment for Earth Sciences (ODISEES), which contains metadata associated with the major reanalysis projects mentioned above. We will present the experience of two projects that are using the High Performance Data Analysis System. The first is the Arctic Boreal Vulnerability Experiment (ABoVE), a major NASA Terrestrial Ecology Program field campaign to study climate change in Alaska and western Canada. The ABoVE Data Analysis System will provide a centralized location to share field measurements and remote observations, including weather products, as well as to run models. CDS will provide internal data management services as well as both internal and external data sharing services such as ArcGIS. We will also present our support of the Collaborative REAnalysis Technical Environment for Intercomparison (CREATE-Intercomparison Project). This support will include hosting approximately 0.5 PB of reanalysis output, gridded observations, and innovations; providing download access through ESGF and TDS; writing routines to convert the data to a geospatial format for ArcGIS access; providing compute and analytics through the Data Analysis System and the CDS API, including MERRA AS; and maintaining the reanalysis metadata in the ODISEES ontology. This will support the reanalysis science community by providing identically formatted reanalysis input, output, and innovation data for intercomparison and the determination of uncertainty. Access to the data through ArcGIS and the API will support members of the geospatial community interested in using climate data generated by the reanalysis projects in their research.