2.2 Analyzing Large Radar Datasets Using Python

Monday, 8 January 2018: 9:45 AM
Room 8 ABC (ACC) (Austin, Texas)
Robert Jackson, ANL, Argonne, IL; and S. Collis, Z. Sherman, G. Palanisamy, S. Giangrande, J. Kumar, and J. C. Hardin

Global climate model (GCM) developers looking to improve parameterizations in climate models need large observational datasets from instruments such as radars, wind profilers, and aircraft. Observations spanning a large number of cases and a wide variety of meteorological conditions provide the best basis for developing climatologies to guide GCM parameterization. In this study, the use of Python to analyze tens of terabytes of radar data on Argonne National Laboratory's Bebop supercomputer and the U.S. Department of Energy (DOE) Atmospheric Radiation Measurement (ARM) facility's Stratus supercomputer is demonstrated. In particular, the Python ARM Radar Toolkit (Py-ART), Python software for gridding, processing, and plotting radar files, is adapted to process thousands of scans in parallel using distributed computing frameworks. Several such frameworks, including Dask, IPython clusters, PySpark, and joblib, are explored, and the advantages and disadvantages of each are shown. In particular, the performance of standard radar data analysis techniques, such as phase processing, gridding, deriving quasi-vertical profiles, plotting, and calculating statistical coverage, will be compared across the differing distributed computing frameworks for different radar systems. Early results indicate that multiple months of radar data can be processed on a small cluster (1000+ cores) in on the order of 10 minutes.
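The per-scan processing described above is embarrassingly parallel: each radar file can be read, corrected, and gridded independently, so the workload maps naturally onto any of the frameworks named in the abstract. The sketch below is a minimal, hypothetical illustration of that pattern. It uses Python's standard-library concurrent.futures as a framework-agnostic stand-in for Dask, joblib, PySpark, or IPython clusters, and process_scan is a placeholder for the real Py-ART work (e.g., pyart.io.read followed by gridding or QVP derivation), which is not reproduced here.

```python
# Hypothetical sketch: mapping an independent per-scan processing step
# over a list of radar files in parallel. The executor here is a stand-in;
# at supercomputer scale one would use a Dask distributed scheduler,
# joblib.Parallel, or a PySpark RDD map instead.
from concurrent.futures import ThreadPoolExecutor


def process_scan(path):
    """Placeholder for per-scan work (read, phase processing, gridding, QVP).

    In a real pipeline this would call Py-ART, e.g.:
        radar = pyart.io.read(path)
        grid = pyart.map.grid_from_radars((radar,), ...)
    Here it just returns a summary record so the pattern runs anywhere.
    """
    return {"file": path, "status": "processed"}


def process_archive(paths, max_workers=8):
    """Process every scan in `paths` concurrently and collect the results.

    Because scans are independent, swapping this executor for another
    distributed framework changes only these few lines, not process_scan.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_scan, paths))
```

For example, `process_archive(["scan_001.nc", "scan_002.nc"])` returns one result record per input file; the file names here are invented for illustration.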