6.5 Pangeo: A Community Platform for Big Data Geoscience

Wednesday, 9 January 2019: 9:45 AM
North 129B (Phoenix Convention Center - West and North Buildings)
Kevin Paul, UCAR, Boulder, CO

Pangeo is a community dedicated to providing a scalable, flexible, easy-to-use platform for analyzing large geoscience datasets. It has mobilized around the development and better integration of the Python packages Xarray, Dask, and Jupyter. Xarray provides an easy-to-use interface to, and abstraction of, data conforming to the Common Data Model (e.g., NetCDF). Dask is used "under the hood" in Xarray to provide parallelism that is abstracted away from the user. Jupyter provides the "user interface." Together, these three packages constitute a platform for Big Data geoscientific analytics that provides a uniform user experience on both high-performance computing (HPC) and cloud platforms. In this presentation, I will cover the basics of how the Pangeo community functions as an organization, how the packages are developed under an open-source model, and how the platform works in both cloud and HPC environments.
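As a rough illustration of Dask working "under the hood" in Xarray, the following minimal sketch builds a small synthetic dataset, splits it into Dask chunks, and computes a time mean lazily. The variable name `temperature`, the grid sizes, and the chunk size are assumptions made for this example and do not come from the presentation; a real workflow would more likely open NetCDF files with `xr.open_dataset(..., chunks=...)`. It assumes the `numpy`, `xarray`, and `dask` packages are installed.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a NetCDF-style variable: temperature(time, lat, lon).
# All names and sizes here are illustrative assumptions.
times = np.arange(365)
lats = np.linspace(-90.0, 90.0, 19)
lons = np.linspace(0.0, 350.0, 36)
rng = np.random.default_rng(0)

temperature = xr.DataArray(
    rng.normal(288.0, 10.0, size=(times.size, lats.size, lons.size)),
    dims=("time", "lat", "lon"),
    coords={"time": times, "lat": lats, "lon": lons},
    name="temperature",
)
ds = temperature.to_dataset()

# Chunking along time turns the variable into a Dask-backed array
# (this step requires dask to be installed).
chunked = ds.chunk({"time": 73})

# The reduction is written with the usual Xarray API; nothing is
# computed yet, only a task graph is built.
lazy_mean = chunked["temperature"].mean(dim="time")

# compute() executes the task graph, in parallel where possible,
# and returns an in-memory DataArray.
result = lazy_mean.compute()
print(result.dims, result.shape)
```

The analysis code is the same whether the data are in memory or chunked; only the `chunk` call and the final `compute` differ, which is the sense in which the parallelism is hidden from the user.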