7A.3 Distributing WDSS-II data on Google Cloud

Tuesday, 14 January 2020: 3:30 PM
157C (Boston Convention and Exhibition Center)
Valliappa Lakshmanan, Bellevue, WA; and S. Glass, T. Smith, and A. Campbell

We describe an experiment with a new way of providing WDSS-II reflectivity data on the public cloud. This allows users to download subsets and slices of the data, and to carry out both longitudinal and individual case studies. The traditional approach would have involved provisioning network bandwidth and setting up a server to provide all this functionality. Both are expensive in terms of computational and personnel resources, and neither scales.

We set up a process to upload the data to the cloud from the cluster on which the data is created, and a serverless pipeline to publish the data both as NetCDF grids on Google Cloud Storage (a blob store) and as a GIS tabular dataset in Google BigQuery (a serverless data warehouse). Anyone wishing to download the grids as-is can access them directly from Google Cloud Storage. Anyone wishing to download subsets or slices of the grids can run a Cloud Dataflow pipeline to do the slicing on demand, potentially carrying out analytics along the way, and then download the results. We published the Dataflow pipelines both as ready-to-run templates and as open-source software that can be adapted and run. Finally, we enabled GIS queries in SQL so that users can carry out longitudinal and near-real-time analyses without downloading any of the data. Because Google Cloud allows each user to analyze 1 TB of data in BigQuery per month for free, this opens up the radar data to many research and occasional users at no cost.
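
To make the on-demand slicing step concrete, the following is a minimal sketch of subsetting a regular latitude-longitude reflectivity grid to a bounding box and computing a simple statistic along the way. It is not the published Dataflow pipeline: the `Grid` class, its field names, the function names, and the convention that row 0 is the northernmost row are all assumptions made for illustration, and the sketch uses plain Python lists rather than NetCDF or Dataflow.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Grid:
    """Hypothetical regular lat-lon grid; row 0 is assumed to be northernmost."""
    north: float               # latitude of the cell centers in row 0 (degrees)
    west: float                # longitude of the cell centers in column 0 (degrees)
    dlat: float                # row spacing in degrees (positive)
    dlon: float                # column spacing in degrees (positive)
    values: List[List[float]]  # reflectivity in dBZ, indexed values[row][col]


def slice_grid(grid: Grid, south: float, north: float,
               west: float, east: float) -> Grid:
    """Return the sub-grid whose cell centers lie inside the bounding box."""
    # Keep the rows whose center latitude falls within [south, north].
    rows = [r for r in range(len(grid.values))
            if south <= grid.north - r * grid.dlat <= north]
    # Keep the columns whose center longitude falls within [west, east].
    cols = [c for c in range(len(grid.values[0]))
            if west <= grid.west + c * grid.dlon <= east]
    if not rows or not cols:
        raise ValueError("bounding box does not overlap the grid")
    sub = [[grid.values[r][c] for c in cols] for r in rows]
    return Grid(north=grid.north - rows[0] * grid.dlat,
                west=grid.west + cols[0] * grid.dlon,
                dlat=grid.dlat, dlon=grid.dlon, values=sub)


def max_reflectivity(grid: Grid) -> float:
    """Example of an analytic carried out on the slice: the peak value."""
    return max(max(row) for row in grid.values)


# Usage: a synthetic 4 x 4 grid with 1-degree spacing (made-up values).
full = Grid(north=40.0, west=-100.0, dlat=1.0, dlon=1.0,
            values=[[float(10 * r + c) for c in range(4)] for r in range(4)])
subset = slice_grid(full, south=37.5, north=39.5, west=-99.5, east=-97.5)
peak = max_reflectivity(subset)
```

In a real pipeline the same selection would be applied per grid file, so that only the much smaller slice and any derived statistics need to be downloaded.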

In the talk, we discuss our system architecture, the Dataflow template, and the BigQuery GIS schema, and show some example analyses that can be carried out.
