365046 PODPAC: The Easy Way to Analyze Earth Science Data in the Cloud

Tuesday, 14 January 2020
Hall B1 (Boston Convention and Exhibition Center)
Jerry Bieszczad, Creare LLC, Hanover, NH; and M. P. Ueckermann, M. Shapiro, D. R. Callender, D. Sullivan, and D. Entekhabi

Cloud-based analysis and analytics promise significant benefits to researchers, including (1) lower cost and effort of accessing data stored in the cloud, (2) the ability to use massive computing resources for a short time, and (3) lower costs compared to maintaining a local cluster. In reality, significant barriers remain to migrating existing earth science analyses to the cloud, including (1) the need for expertise in a particular cloud vendor’s services, (2) the difficulty of optimizing costs for a cloud-hosted solution given the many services and options available, (3) administrative barriers to funding cloud infrastructure, and (4) the continued burden of maintaining cloud infrastructure. What we need is a cloud-ready workflow for earth science data research that just works.
In response to these needs, we are developing PODPAC, a cloud-ready, Python-based workflow solution that uses serverless AWS Lambda functions to deploy custom applications. PODPAC is built around the tools of the Python data ecosystem (NumPy, SciPy, xarray) and aims to bridge the gap between data sources, analysis, and the cloud. By using a preconfigured, general AWS Lambda implementation, PODPAC simplifies setting up and maintaining cloud infrastructure. We will demonstrate on-demand cloud computation of a drought-monitor index derived from NASA SMAP data. We will show the steps taken to develop the application in a Jupyter Notebook and the steps needed to deploy it for on-demand computation of our SMAP-derived drought index.
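To give a flavor of what an on-demand drought-index computation might look like, the following is a minimal sketch in plain NumPy. It does not use the PODPAC API and is not the index presented in this talk: the percentile thresholds, the function names (`percentile_rank`, `drought_category`, `handler`), and the event fields (`history`, `current`) are all assumptions made for illustration. The `handler(event, context)` function follows the general shape of a Python AWS Lambda entry point.

```python
import numpy as np

# Assumed percentile upper bounds for each drought class, loosely modeled on
# percentile-based drought classifications. Illustrative only; not the
# thresholds used by the authors.
THRESHOLDS = [(2.0, "D4"), (5.0, "D3"), (10.0, "D2"), (20.0, "D1"), (30.0, "D0")]


def percentile_rank(history, current):
    """Return the percentage of valid historical values at or below `current`."""
    history = np.asarray(history, dtype=float)
    history = history[~np.isnan(history)]  # ignore missing retrievals
    if history.size == 0:
        raise ValueError("no valid historical values")
    return 100.0 * np.count_nonzero(history <= current) / history.size


def drought_category(history, current):
    """Map a soil moisture value to a drought class via its percentile rank."""
    rank = percentile_rank(history, current)
    for upper, label in THRESHOLDS:  # ordered from driest class to mildest
        if rank <= upper:
            return label
    return "none"


def handler(event, context=None):
    """Lambda-style entry point; the event fields are hypothetical."""
    category = drought_category(event["history"], event["current"])
    return {"category": category}
```

Called locally, `handler({"history": past_values, "current": todays_value})` returns a small dictionary with the drought class for one location; a deployed version would instead read soil moisture from the data source for the requested coordinates.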