We set up a process to upload the data from the cluster on which it is created to the cloud, along with a serverless pipeline that publishes the data both as NetCDF grids on Google Cloud Storage (a blob store) and as a GIS tabular dataset in Google BigQuery (a serverless data warehouse). Anyone who wants the grids as-is can download them directly from Google Cloud Storage. Anyone who wants only subsets or slices of the grids can run a Cloud Dataflow pipeline that does the slicing on demand, optionally carrying out analytics along the way, and then download the results. We published the Dataflow pipelines both as ready-to-run templates and as open-source software that can be adapted and run. Finally, we enabled GIS queries in SQL so that users can carry out longitudinal and near-real-time analyses without downloading any of the data. Because the BigQuery free tier covers the first 1 TB of query processing each month, this opens up the radar data to many researchers and occasional users at no cost.
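As a rough illustration of how a grid can be turned into a GIS table, the following Python sketch flattens a 2-D latitude/longitude grid into one row per valid cell, encoding each location as a WKT point that BigQuery's `ST_GEOGFROMTEXT` can parse into a `GEOGRAPHY` value. The function name `grid_to_rows`, the `geom`/`value` column names, and the optional bounding-box filter (a stand-in for on-demand slicing) are hypothetical assumptions; the published pipeline and schema may differ.

```python
from typing import Iterator, Optional, Sequence, Tuple

# Hypothetical bounding box: (min_lat, min_lon, max_lat, max_lon)
BBox = Tuple[float, float, float, float]


def grid_to_rows(
    lats: Sequence[float],
    lons: Sequence[float],
    values: Sequence[Sequence[Optional[float]]],
    bbox: Optional[BBox] = None,
) -> Iterator[dict]:
    """Flatten a 2-D lat/lon grid into one row per valid cell.

    values[i][j] is the value at (lats[i], lons[j]); None marks missing data.
    Illustrative sketch only, not the pipeline described in the talk.
    """
    for i, lat in enumerate(lats):
        for j, lon in enumerate(lons):
            value = values[i][j]
            if value is None:
                continue  # skip empty cells so the table stays sparse
            if bbox is not None:
                min_lat, min_lon, max_lat, max_lon = bbox
                if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon):
                    continue  # outside the requested slice
            # WKT lists longitude before latitude
            yield {"geom": f"POINT({lon} {lat})", "value": value}


if __name__ == "__main__":
    lats = [35.0, 35.5]
    lons = [-97.5, -97.0]
    values = [[10.0, None], [20.0, 30.0]]
    for row in grid_to_rows(lats, lons, values, bbox=(35.2, -98.0, 36.0, -97.2)):
        print(row)
```

Keeping one row per non-empty cell makes the table sparse, which matters for radar grids where most cells carry no echo; the bounding-box filter shows the kind of subsetting a slicing pipeline would push down before writing results.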
In the talk, we discuss our system architecture, the Dataflow template, and the BigQuery GIS schema, and we show some example analyses that can be carried out.