3A.2 Efficient and Rigorous Data Quality Checking for Training a Machine Learning Postprocessing Algorithm on High-Resolution NWP Data and Surface Observations

Monday, 29 January 2024: 2:00 PM
337 (The Baltimore Convention Center)
Ashley Elizabeth Payne, Tomorrow.io, Golden, CO; and K. Keshavamurthy, L. Conibear, A. Reed Harris, T. McCandless, M. E. Green, MA, and S. Flampouris

High-quality data is central to any machine learning-driven weather forecasting system. We present our approach to data cleaning and quality checks for a machine learning model that post-processes high-resolution, deterministic weather forecasts to produce probabilistic forecasts over the contiguous United States and globally. As target observations, we use data from the NOAA Integrated Surface Database (ISD), the ECMWF Reanalysis v5 (ERA5), NCEP Stage IV precipitation, and the NASA Integrated Multi-satellitE Retrievals for GPM (IMERG). Each data source requires advanced quality control, along with applications for ingesting and cleaning the data, so that the machine learning model can capture the predictive signals appropriately. In addition to this initial quality control, we apply long-term climatological bounds derived from ERA5 to ensure that results remain physically meaningful. Here, we present these quality control methodologies in detail, along with their impacts on machine learning results.
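The abstract does not specify how the climatological bounds are computed, so the following is only a minimal sketch of one plausible form of such a check. It assumes the bounds are taken as extreme quantiles of a long-term reference series (standing in for ERA5) with an optional safety margin; the function names, the quantile levels, and the margin are all hypothetical choices made for illustration, not the authors' method.

```python
import numpy as np


def climatological_bounds(reference, lower_q=0.001, upper_q=0.999, margin=0.0):
    """Estimate lower/upper bounds from a long-term reference series.

    `reference` has time along axis 0 (e.g. shape (time,) for one site or
    (time, lat, lon) for a grid). The quantile levels and `margin` are
    illustrative assumptions, not values from the abstract.
    """
    reference = np.asarray(reference, dtype=float)
    lo = np.nanquantile(reference, lower_q, axis=0) - margin
    hi = np.nanquantile(reference, upper_q, axis=0) + margin
    return lo, hi


def flag_out_of_bounds(obs, lo, hi):
    """Return a boolean mask that is True where a finite value violates the bounds.

    Missing values (NaN) compare as False and are therefore not flagged here.
    """
    obs = np.asarray(obs, dtype=float)
    return (obs < lo) | (obs > hi)


def mask_out_of_bounds(obs, lo, hi):
    """Replace out-of-bounds values with NaN so they drop out of training."""
    obs = np.asarray(obs, dtype=float)
    return np.where(flag_out_of_bounds(obs, lo, hi), np.nan, obs)


if __name__ == "__main__":
    # Synthetic 2 m temperature "climatology" in kelvin (placeholder for ERA5).
    reference = np.linspace(250.0, 310.0, 1000)
    lo, hi = climatological_bounds(reference)

    # Station observations containing two implausible values and one gap.
    obs = np.array([280.0, 400.0, 100.0, np.nan])
    print(flag_out_of_bounds(obs, lo, hi))
    print(mask_out_of_bounds(obs, lo, hi))
```

Computing the bounds along the time axis means the same functions work for a single station series or a full grid, where each grid cell gets its own bounds. Whether to mask, clip, or drop the flagged samples is a separate design decision that this sketch leaves open.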