10 The OCC NOAA Data Commons: First Year Experiences

Monday, 23 January 2017
4E (Washington State Convention Center)
Zachary L. Flamig, Univ. of Chicago, Chicago, IL; and M. Patterson, W. Wells, and R. Grossman

The NOAA Big Data Project (BDP), announced on April 21, 2015, comprises five Data Alliances anchored by Google, Microsoft, Amazon, IBM, and the Open Commons Consortium (OCC). This study presents lessons learned from the first year of the BDP from OCC’s perspective as the collaboration primarily serving the academic and nonprofit communities. So far, OCC has established a pilot data commons with some initial datasets, including NEXRAD data, and a digital ID service.

The pilot OCC NOAA Data Commons contains Level II NEXRAD data, which NOAA made available to the BDP partners. Approximately 50 TiB of NEXRAD data representing the year 2015 was incorporated into the data commons. The digital ID service, “signpost,” supports a common persistent data ID that resolves to copies of the data held at multiple locations. With this service, users can access the NEXRAD data in the same manner whether it comes from the OCC NOAA Data Commons, Amazon’s NEXRAD data holdings, or any other compatible data holding.
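The idea of one persistent ID resolving to several data locations can be illustrated with a minimal sketch. This is not signpost's actual interface, which the abstract does not describe; the class `DigitalIdRegistry`, its methods, the ID string, and the `example.org` URLs are all hypothetical and chosen only for illustration.

```python
class DigitalIdRegistry:
    """Hypothetical in-memory stand-in for a digital ID service.

    Maps one persistent data ID to the URLs of equivalent copies
    held at different locations. Not the real signpost API.
    """

    def __init__(self):
        # data_id -> {location name -> URL}, in registration order
        self._records = {}

    def register(self, data_id, location, url):
        """Record that `location` holds a copy of `data_id` at `url`."""
        self._records.setdefault(data_id, {})[location] = url

    def resolve(self, data_id, preferred=None):
        """Return a URL for `data_id`, honoring the caller's location order.

        Falls back to the first registered copy when none of the
        preferred locations holds the data. Raises KeyError for an
        unknown ID.
        """
        locations = self._records[data_id]
        for name in preferred or []:
            if name in locations:
                return locations[name]
        return next(iter(locations.values()))


# Illustrative usage with made-up identifiers and placeholder URLs.
registry = DigitalIdRegistry()
registry.register("nexrad-volume-0001", "occ", "https://occ.example.org/volume-0001")
registry.register("nexrad-volume-0001", "aws", "https://aws.example.org/volume-0001")

url = registry.resolve("nexrad-volume-0001", preferred=["aws", "occ"])
```

The caller's code stays the same whichever holding serves the file; only the preference list changes.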

To demonstrate the concept further, a sample Jupyter notebook was created that uses the NEXRAD data. The notebook uses the Py-ART Python package to create an animated loop of a June 2015 mayfly hatch in Wisconsin. It also demonstrates a basic quality control procedure on the radar data, in this instance removing meteorological echoes to showcase the biological scatterers. Open Science Data Cloud grantees have access to additional premade resources, such as virtual machine images preloaded with the tools needed to access the NEXRAD data.
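The abstract does not say which quality control procedure the notebook applies. As a rough sketch of one common approach with dual-polarization radar data, gates can be screened on the correlation coefficient, which tends to be close to 1 in precipitation and lower for biological scatterers. The function name `keep_biological`, the plain-list data layout, and the 0.9 threshold are assumptions made for illustration, not values taken from the notebook.

```python
def keep_biological(reflectivity, rho_hv, rho_max=0.9):
    """Mask gates that look meteorological, keeping likely biological ones.

    reflectivity: reflectivity values (dBZ), one per gate
    rho_hv:       correlation coefficient for the same gates
    rho_max:      illustrative threshold (an assumption); gates with
                  rho_hv above it are treated as precipitation

    Returns a new list in which masked gates are replaced by None.
    """
    if len(reflectivity) != len(rho_hv):
        raise ValueError("reflectivity and rho_hv must have the same length")
    return [
        z if rho <= rho_max else None
        for z, rho in zip(reflectivity, rho_hv)
    ]


# Made-up values for three gates: the middle gate has a high correlation
# coefficient and is masked as a likely meteorological echo.
filtered = keep_biological([10.0, 35.0, 5.0], [0.60, 0.99, 0.85])
```

A real workflow would apply the same test to whole radar sweeps and would likely combine it with other fields; the threshold would need tuning for the radar and the event.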
