4.5 Machine Learning Method of Weather Radar Data Quality Control Using Python and PySpark

Tuesday, 24 January 2017: 11:30 AM
Conference Center: Chelan 5 (Washington State Convention Center)
Jingyin Tang, Univ. of Florida, Gainesville, FL; and C. J. Matyas

Quality control (QC) of radar reflectivity data to filter out non-meteorological echoes is a constant requirement. Non-meteorological echoes may be caused by biological targets, sun strobes, anomalous propagation, or ground clutter. Although a forecaster can identify such contamination without much difficulty, the large volume of radar data now available makes automated quality control with real-time performance both desirable and challenging.

In this research, we present a binary classification framework to discriminate between “good” and “bad” gates. The framework is based on a support vector machine (SVM) with a quadratic kernel. Each gate is converted to a pattern vector containing basic Doppler moments (i.e., reflectivity, spectrum width), dual-pol moments (correlation coefficient, differential reflectivity, differential phase) when available, spatial properties (position, spatial connectivity, elevation, etc.), and the local variance of these features. The patterns are then classified as either “good” or “bad”. The SVM-based classification model is built from an existing QC dataset using a supervised training procedure. Finally, the model is evaluated with both legacy and dual-pol radar data. We implement the model using the Python language, the Apache Spark distributed computing framework, and Spark's machine learning library (MLlib). We demonstrate that this QC model performs well in a distributed computing environment when handling large volumes of radar data. With the QCed radar data, we show improved quality in a multi-radar mosaic scenario after the “bad” gates are removed.
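
The abstract does not give implementation details, so the following is only a minimal sketch of how a per-gate pattern vector with local variance might be assembled and then expanded to degree 2. The explicit quadratic expansion is a substitute technique chosen here for illustration: a linear SVM (Spark MLlib's `SVMWithSGD` trains a linear model) fitted on the expanded features can represent a quadratic decision boundary. All function names, the 3x3 window, and the toy values are assumptions, not the authors' code or data.

```python
from itertools import combinations_with_replacement
from statistics import pvariance


def local_variance(field, row, col, radius=1):
    """Population variance of the values in a window centred on one gate.

    `field` is a list of rays, each a list of gate values. The window is
    clipped at the array edges (azimuth wrap-around is ignored here).
    """
    values = []
    for r in range(max(0, row - radius), min(len(field), row + radius + 1)):
        ray = field[r]
        for c in range(max(0, col - radius), min(len(ray), col + radius + 1)):
            values.append(ray[c])
    return pvariance(values) if len(values) > 1 else 0.0


def pattern_vector(moments, row, col):
    """Concatenate each moment's gate value with its local variance.

    `moments` maps a (hypothetical) moment name to a 2-D field; names are
    sorted so the feature order is stable across gates.
    """
    vec = []
    for name in sorted(moments):
        field = moments[name]
        vec.append(field[row][col])
        vec.append(local_variance(field, row, col))
    return vec


def quadratic_expand(vec):
    """Explicit degree-2 feature map: the features plus all pairwise products."""
    products = [a * b for a, b in combinations_with_replacement(vec, 2)]
    return list(vec) + products


# Toy 3x3 fields (made-up values, for illustration only).
moments = {
    "reflectivity": [[10.0, 12.0, 11.0],
                     [10.0, 50.0, 11.0],
                     [9.0, 12.0, 10.0]],
    "rho_hv": [[0.98, 0.97, 0.98],
               [0.99, 0.60, 0.98],
               [0.97, 0.98, 0.99]],
}

# Expanded features for the centre gate; these could be paired with a
# "good"/"bad" label and passed to a linear SVM trainer.
features = quadratic_expand(pattern_vector(moments, 1, 1))
```

For n base features the expansion yields n + n(n+1)/2 values, so the feature count grows quickly as more moments and spatial properties are added.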
