9A.2 Improving Geophysical Air Quality Forecasts With Machine Learning Algorithms

Wednesday, 15 January 2020: 1:45 PM
Hervé Petetin, Barcelona Supercomputing Center, Barcelona, Spain; and A. Soret, M. Guevara, K. Serradell, and C. Pérez García-Pando

Despite major improvements over the past decades, air quality forecasts (AQF) remain subject to substantial systematic and random errors arising from numerous sources of uncertainty (e.g. emissions, meteorology, physical and chemical parameterisations, initialisation). In practice, operational AQF systems therefore often rely on so-called model output statistics (MOS) methods, which improve the raw forecasts using past observations. Because they mainly reduce the bias, these methods are also known as bias-correction methods. The most common MOS methods include the Kalman filter and the analog method.
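As an illustration of the Kalman filter approach to bias correction, the following is a minimal sketch of a scalar filter that tracks the forecast bias as a slowly varying random walk. It is not the configuration used in this work: the function name, the noise variances q and r, and the synthetic data are all assumptions chosen for illustration.

```python
def kalman_bias_correct(forecasts, observations, q=0.01, r=1.0):
    """Bias-correct a forecast series with a scalar Kalman filter.

    The hidden state is the forecast bias (forecast minus observation),
    modelled as a random walk. q and r are assumed process and
    observation noise variances (illustrative values, not tuned).
    """
    bias, p = 0.0, 1.0  # initial bias estimate and its error variance
    corrected = []
    for f, o in zip(forecasts, observations):
        # Correct the current forecast using past information only.
        corrected.append(f - bias)
        # Predict step: bias persists, uncertainty grows by q.
        p = p + q
        # Update step: blend in the newly observed forecast error.
        k = p / (p + r)  # Kalman gain
        bias = bias + k * ((f - o) - bias)
        p = (1.0 - k) * p
    return corrected


# Synthetic example (made-up values): a forecast with a constant +10 bias.
observations = [50.0] * 60
forecasts = [o + 10.0 for o in observations]
corrected = kalman_bias_correct(forecasts, observations)
```

In this toy case the first forecast is left uncorrected, and the corrected series then converges towards the observations as the bias estimate approaches 10.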

In this work, we explore the use of machine learning (ML) for improving the geophysical AQF produced by chemistry-transport models, using ancillary meteorological information in addition to the chemical forecasts. As a case study, we consider an operational AQF system for Mexico City (based on the WRF and CMAQ models) that continuously produces 48-hour forecasts for the main pollutants, and we focus on O3 and PM2.5. The analysis is performed with several algorithms, including the popular gradient boosting machine. The AQF are corrected at the 35 individual surface monitoring stations available in the agglomeration. Results are compared with both the Kalman filter and analog methods.
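The idea of correcting a raw forecast with gradient boosting can be sketched as follows. This is a toy, standard-library-only implementation (decision stumps, squared loss) rather than the production-grade library an operational study would use, and it is not the authors' setup: the feature choice (raw forecast, temperature, wind speed), the hyperparameters and the synthetic data are all assumptions. The model learns the forecast error (observation minus raw forecast), which is then added back to the raw forecast.

```python
def fit_stump(X, residuals):
    """Find the one-feature threshold split that minimises squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = 0.5 * (lo + hi)
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - ml) ** 2 for r in left)
                   + sum((r - mr) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, ml, mr)
    if best is None:  # every feature is constant: predict the mean
        m = sum(residuals) / len(residuals)
        return (0, float("inf"), m, m)
    return best[1:]


def stump_predict(stump, row):
    j, thr, ml, mr = stump
    return ml if row[j] <= thr else mr


def fit_gbm(X, y, n_rounds=50, learning_rate=0.1):
    """Boost stumps on the residuals of the current prediction."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * stump_predict(stump, row)
                for pi, row in zip(pred, X)]
    return base, stumps, learning_rate


def gbm_predict(model, row):
    base, stumps, lr = model
    return base + lr * sum(stump_predict(s, row) for s in stumps)


# Synthetic data (made up): the raw forecast has a temperature-dependent bias.
X, raw, obs = [], [], []
for i in range(40):
    f = 40.0 + (7 * i) % 50       # raw pollutant forecast
    temp = 15.0 + (3 * i) % 15    # forecast temperature
    wind = 1.0 + (5 * i) % 6      # forecast wind speed
    X.append([f, temp, wind])
    raw.append(f)
    obs.append(f - 8.0 + 0.5 * (temp - 20.0))

errors = [o - f for o, f in zip(obs, raw)]  # target: observation minus forecast
model = fit_gbm(X, errors)
corrected = [f + gbm_predict(model, row) for f, row in zip(raw, X)]

mse_raw = sum((f - o) ** 2 for f, o in zip(raw, obs)) / len(obs)
mse_corrected = sum((c - o) ** 2 for c, o in zip(corrected, obs)) / len(obs)
```

Note that the comparison here is in-sample; a real evaluation would train on past days and score on forecasts not seen during training.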

Our first results indicate that the best performance is currently obtained with the analog and ML methods: most of the bias is removed, correlations are substantially improved and errors are strongly reduced. However, these average statistics hide deficiencies at the upper end of the concentration distributions. This is problematic since AQF systems aim at predicting exceedances of ambient air quality standards sufficiently in advance to warn the population and possibly set up short-term emission reduction plans (e.g. traffic restrictions, temporary shutdown of some industrial facilities). As pollution episodes remain relatively infrequent, ML models trained on the resulting unbalanced datasets miss many of these important events. To improve their ability to predict highly polluted situations, we explore various approaches, including under-/oversampling and sample weighting.
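Two of the rebalancing strategies mentioned above, sample weighting and random oversampling, can be sketched as follows. This is only an illustration under stated assumptions: the function names, the weight of 5, the threshold of 100 (in arbitrary concentration units, not an actual air quality standard) and the data are all hypothetical.

```python
import random


def exceedance_weights(observed, threshold, high_weight=5.0):
    """Give samples above the threshold a larger training weight."""
    return [high_weight if o > threshold else 1.0 for o in observed]


def oversample_exceedances(features, observed, threshold, seed=0):
    """Randomly duplicate exceedance samples until both classes are equal in size."""
    rng = random.Random(seed)
    high = [i for i, o in enumerate(observed) if o > threshold]
    low = [i for i, o in enumerate(observed) if o <= threshold]
    if not high or len(high) >= len(low):
        return list(features), list(observed)  # nothing to rebalance
    extra = [rng.choice(high) for _ in range(len(low) - len(high))]
    keep = low + high + extra
    return [features[i] for i in keep], [observed[i] for i in keep]


# Synthetic observations (made up): 2 exceedances out of 10 samples.
observed = [40.0, 55.0, 62.0, 48.0, 120.0, 51.0, 70.0, 135.0, 45.0, 58.0]
features = [[o * 0.9] for o in observed]  # placeholder single-feature rows
THRESHOLD = 100.0  # hypothetical exceedance threshold

weights = exceedance_weights(observed, THRESHOLD)
X_bal, y_bal = oversample_exceedances(features, observed, THRESHOLD)
```

The weights would be passed to a learner that accepts per-sample weights, while the oversampled set can be fed to any learner unchanged. Undersampling would instead drop non-exceedance rows, at the cost of discarding training data.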
