PODPAC: A Python Library for Automatic Geospatial Data Harmonization and Seamless Transition to Cloud-Based Processing

Shapiro, Marc

PODPAC, the Pipeline for Observation Data Processing Analysis and Collaboration, is a Python software library for automated data harmonization and seamless transition to cloud processing. Data sources encapsulated by PODPAC are automatically projected and interpolated to a user-specified geospatial reference system, which allows plug-and-play development of processing pipelines that combine multi-scale and multi-source data. Pipelines can be developed in Jupyter notebooks on local machines, then exported as an automatically generated text-based description and run on massively distributed remote cloud servers. PODPAC is being developed under a permissive open-source license and is available at https://github.com/creare-com/podpac.
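The harmonization idea described above can be illustrated with a minimal sketch. It does not use PODPAC's API: the function `nearest_interp`, the one-dimensional latitude grids, and all data values are hypothetical, and nearest-neighbor resampling is assumed here only as the simplest interpolation choice. Reprojection between coordinate reference systems is left out entirely.

```python
from bisect import bisect_left


def nearest_interp(src_coords, src_values, target_coords):
    """Resample a 1-D source onto target coordinates by nearest neighbor.

    src_coords must be sorted in ascending order. On an exact tie the
    lower-coordinate neighbor is used.
    """
    resampled = []
    for t in target_coords:
        i = bisect_left(src_coords, t)
        if i == 0:
            j = 0
        elif i == len(src_coords):
            j = len(src_coords) - 1
        else:
            # Pick whichever of the two bracketing points is closer to t.
            closer_right = (src_coords[i] - t) < (t - src_coords[i - 1])
            j = i if closer_right else i - 1
        resampled.append(src_values[j])
    return resampled


# Hypothetical sources on different grids: a coarse soil moisture product
# and a finer elevation model (values are made up for illustration).
coarse_lat = [40.0, 41.0, 42.0]
coarse_soil_moisture = [0.10, 0.20, 0.30]
fine_lat = [40.0, 40.25, 40.5, 40.75, 41.0, 41.25, 41.5, 41.75, 42.0]
fine_elevation = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0, 160.0, 170.0, 180.0]

# The user-specified target grid that both sources are harmonized to.
target_lat = [40.0, 40.5, 41.0, 41.5, 42.0]

sm_on_target = nearest_interp(coarse_lat, coarse_soil_moisture, target_lat)
elev_on_target = nearest_interp(fine_lat, fine_elevation, target_lat)

# Once on the same grid, the sources can be combined point by point.
for lat, sm, elev in zip(target_lat, sm_on_target, elev_on_target):
    print(f"lat={lat:5.2f}  soil_moisture={sm:.2f}  elevation={elev:.0f}")
```

After resampling, both sources share the target grid, so downstream processing steps no longer need to know the native resolution of either input.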

This paper demonstrates PODPAC usage through example applications that combine multiple data sources and run on the AWS commercial cloud. In particular, we will show applications involving NASA observational data products such as SMAP (Soil Moisture Active-Passive), distributed sensor networks such as the COSMOS soil moisture networks, and digital terrain model data. These data sources will be encapsulated using PODPAC to demonstrate its automated data harmonization features. We will also show how to generate a PODPAC processing pipeline and then execute it both locally and remotely using serverless AWS Lambda functions, and we will describe the cloud-based architecture that supports this. Finally, we will report on our progress and on the planned development goals for the PODPAC software.
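The notion of a pipeline that is exported as text and then executed elsewhere can be sketched as follows. This is an illustration under stated assumptions, not PODPAC's pipeline format: the JSON layout, the `OPERATIONS` registry, and `run_pipeline` are all hypothetical, and no AWS calls are made. The point is only that a pipeline serialized to plain text can be rebuilt and evaluated by any worker that receives that text.

```python
import json

# Hypothetical registry of operations a worker knows how to run.
OPERATIONS = {
    "constant": lambda inputs, attrs: attrs["value"],
    "add": lambda inputs, attrs: sum(inputs),
    "scale": lambda inputs, attrs: inputs[0] * attrs["factor"],
}

# Hypothetical pipeline: each named node lists its operation, the nodes it
# depends on, and any attributes the operation needs.
pipeline = {
    "a": {"op": "constant", "attrs": {"value": 2.0}},
    "b": {"op": "constant", "attrs": {"value": 3.0}},
    "total": {"op": "add", "inputs": ["a", "b"]},
    "result": {"op": "scale", "inputs": ["total"], "attrs": {"factor": 10.0}},
}


def run_pipeline(definition_text, output):
    """Rebuild a pipeline from its text description and evaluate one node."""
    definition = json.loads(definition_text)
    cache = {}

    def evaluate(name):
        # Evaluate dependencies first, caching each node's value.
        if name not in cache:
            spec = definition[name]
            inputs = [evaluate(dep) for dep in spec.get("inputs", [])]
            cache[name] = OPERATIONS[spec["op"]](inputs, spec.get("attrs", {}))
        return cache[name]

    return evaluate(output)


# Export the pipeline as text, then execute it from that text alone.
definition_text = json.dumps(pipeline, indent=2)
local_result = run_pipeline(definition_text, "result")
print(local_result)
```

Because `run_pipeline` depends only on the text description, the same string could in principle be passed as the payload to a remote function that hosts an equivalent runner.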
