PODPAC: The Easy Way to Analyze Earth Science Data in the Cloud

Ueckermann, Mattheus P.

Cloud-based analysis and analytics promise significant benefits to researchers, including (1) lower cost and effort of accessing data stored in the cloud, (2) the ability to use massive computing resources for a short time, and (3) lower costs compared to maintaining a local cluster. In reality, significant barriers remain to migrating existing earth science analysis to the cloud, including (1) the need for expertise in a particular cloud vendor’s services, (2) the difficulty of optimizing costs for a cloud-hosted solution given the many services and options available, (3) administrative barriers to funding cloud infrastructure, and (4) the continued burden of maintaining cloud infrastructure. What we need is a cloud-ready workflow for earth science data research that just works.
In response to these needs, we are developing PODPAC, a cloud-ready, Python-based workflow solution that uses serverless AWS Lambda functions to deploy custom applications. PODPAC is built around the tools of the Python data ecosystem (NumPy, SciPy, xarray) and aims to bridge the gap between data sources, analysis, and the cloud. By using a preconfigured, general AWS Lambda implementation, PODPAC simplifies setting up and maintaining cloud infrastructure. We will demonstrate on-demand cloud computation of a drought-monitor index derived from NASA SMAP data. We will show the steps taken to develop the application in a Jupyter Notebook, and the steps needed to deploy it for on-demand computation of our SMAP-derived drought index.
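To make the idea of a soil-moisture-derived drought index concrete, the following is a minimal sketch of one common approach: ranking the current soil moisture against a historical record and binning the percentile into severity classes. It is not PODPAC's API and not necessarily the index used in this work. It uses only NumPy, and the function name `drought_category`, the array layout, and the percentile thresholds are all illustrative assumptions.

```python
import numpy as np

# Hypothetical percentile thresholds (assumption, for illustration only).
# A pixel at or below the 2nd percentile is the most severe class (5).
THRESHOLDS = np.array([2.0, 5.0, 10.0, 20.0, 30.0])


def drought_category(current, climatology):
    """Classify soil moisture by percentile rank in a historical record.

    current:      array (lat, lon) of soil moisture for the date of interest.
    climatology:  array (time, lat, lon) of historical values.
    Returns an integer array (lat, lon): 0 = no drought, 1..5 = increasing
    severity, -1 = no data.
    """
    current = np.asarray(current, dtype=float)
    climatology = np.asarray(climatology, dtype=float)

    # Count valid historical samples and how many fall below the current value.
    valid = ~np.isnan(climatology)
    below = np.sum((climatology < current) & valid, axis=0)
    count = np.sum(valid, axis=0)

    # Percentile rank per pixel; pixels with no history become NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        pct = 100.0 * below / count

    # Severity = number of thresholds the percentile does not exceed.
    category = np.sum(pct[..., np.newaxis] <= THRESHOLDS, axis=-1)

    # Mark pixels lacking either history or a current observation.
    missing = (count == 0) | np.isnan(current)
    return np.where(missing, -1, category)


# Synthetic example: 100 historical values (0..99) at each of three pixels.
history = np.broadcast_to(
    np.arange(100, dtype=float).reshape(100, 1, 1), (100, 1, 3)
)
today = np.array([[0.5, 60.0, np.nan]])
result = drought_category(today, history)
```

In a serverless deployment, a function of this kind would be the per-request computation, with the historical and current arrays supplied by the data-access layer rather than generated synthetically as they are here.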
