7A.2 Cloud Native Data Processing and Visualizations Techniques for Earth Science Data

Tuesday, 14 January 2020: 3:15 PM
157C (Boston Convention and Exhibition Center)
Ajinkya Kulkarni, Univ. of Alabama, Huntsville, AL; and H. Conover, A. Marouane, T. Berendes, B. Ellingson, G. T. Stano, and S. J. Graves

The integration of cloud-native technologies has the potential to greatly alter how data are archived, analyzed, and cross-referenced with other datasets. This is due, in part, to the expandability of cloud storage, the co-location of data and analysis code, and efforts to create unified standards across the multiple organizations storing data in the cloud. The Information Technology and Systems Center (ITSC) at the University of Alabama in Huntsville (UAH) has developed a cloud-native data processing, cataloging, and visualization framework that takes advantage of these advances in cloud computing. The VISAGE (Visualization for Integrated Satellite, Airborne and Ground-based data Exploration) project at UAH/ITSC is working to provide three-dimensional visualization and basic analytics capabilities for diverse datasets in an interactive, web-based user interface. The Field Campaign eXplorer (FCX) of the Global Hydrology Resource Center (GHRC) Distributed Active Archive Center (DAAC), also developed by UAH/ITSC, is a related project with similar goals and shared technology. To cope better with large data volumes and to align with current NASA data management advances, both tools are built on cloud-native technologies. Serverless computing tools from Amazon Web Services (AWS) allow flexible scaling to support on-demand data rendering, interrogation, and analytics, with the project paying for cloud resources only when they are in use. VISAGE uses Amazon's Athena stateless query service as the interactive interface to the data repository, with Apache Parquet for analysis-optimized data storage. AWS Glue crawls the partitioned Parquet files and constructs a data catalog, which provides a single view into the data regardless of where or how they are stored. FCX, by contrast, is exploring cloud data storage technologies such as Zarr and Cloud Optimized GeoTIFF, together with scalable Dask clusters for parallel on-demand data processing.
In this presentation, we demonstrate both the VISAGE and FCX tools, present their detailed architectures, and discuss the pros and cons of the various cloud technologies.
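
The partitioned Parquet layout mentioned in the abstract can be illustrated with a small sketch. It assumes a Hive-style key=value directory convention, which AWS Glue crawlers recognize as partition columns and which lets a query engine skip files whose partition values do not match a filter. The partition columns (campaign, instrument, date), the object keys, and the helper names below are hypothetical and are not taken from the VISAGE or FCX implementations; the sketch uses only the Python standard library and does not call any AWS service.

```python
from pathlib import PurePosixPath


def partition_prefix(campaign: str, instrument: str, date: str) -> str:
    """Build a Hive-style partition prefix made of key=value path segments."""
    return f"campaign={campaign}/instrument={instrument}/date={date}"


def parse_partitions(object_key: str) -> dict:
    """Recover partition columns from an object key, as a catalog crawler might."""
    segments = PurePosixPath(object_key).parts
    return dict(seg.split("=", 1) for seg in segments if "=" in seg)


def prune(object_keys, **filters):
    """Keep only the keys whose partition values match every filter.

    This mimics partition pruning: files in non-matching partitions are
    never opened, so less data is scanned.
    """
    return [
        key
        for key in object_keys
        if all(parse_partitions(key).get(col) == val for col, val in filters.items())
    ]


# Hypothetical object keys; real bucket layouts will differ.
KEYS = [
    partition_prefix("campaign_a", "radar", "2019-05-01") + "/part-0000.parquet",
    partition_prefix("campaign_a", "lidar", "2019-05-01") + "/part-0000.parquet",
    partition_prefix("campaign_b", "radar", "2019-06-10") + "/part-0000.parquet",
]

if __name__ == "__main__":
    for key in prune(KEYS, instrument="radar"):
        print(key)
```

With a layout of this kind, a filter on instrument or date maps directly onto a subset of object prefixes, which is one reason partitioned Parquet suits pay-per-use query services.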