Enhancing Air Quality Prediction through Bias Correction using Machine Learning on CMAQ

Dang, Quang; Dang, Quang

Poor air quality is a significant threat to human health, so accurate air quality predictions are critical for mitigating its adverse effects. The National Oceanic and Atmospheric Administration (NOAA) relies on the Community Multiscale Air Quality (CMAQ) model to forecast air quality conditions. However, these predictions often require bias correction to address systematic errors in the model's output. The conventional bias-correction method, the Kalman filter, assumes linearity and specific distribution characteristics, which can limit its effectiveness. In recent years, machine learning has emerged as a promising alternative for improving air quality forecasts, particularly during extreme events, because it can handle nonlinear problems and generalize patterns effectively. This study assesses machine learning techniques, specifically Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, and a CNN-based U-Net architecture, against a traditional Kalman filter approach for bias correction of CMAQ predictions. The comparison is motivated by the complexity of air quality data, which may not adhere to linear assumptions and can be influenced by many factors. Each machine learning model offers distinct advantages: CNNs capture spatial dependencies in the data, LSTMs focus on temporal patterns, and U-Net leverages spatial context effectively. To facilitate training, we first use K-means clustering to group stations with similar geographic characteristics (latitude, longitude, and elevation). Grouping stations that behave similarly simplifies training and improves learning. The machine learning models are then used to bias-correct the CMAQ forecasts.
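The station-grouping step can be illustrated with a minimal sketch. It assumes plain Lloyd's-algorithm K-means on standardized (latitude, longitude, elevation) features; the station coordinates, the choice of k = 2, and the function names are hypothetical and are not taken from the study, which does not state its implementation or number of clusters.

```python
import math
import random

def standardize(rows):
    """Scale each feature column to zero mean and unit variance,
    so elevation (metres) does not dominate latitude/longitude (degrees)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels

# Hypothetical stations: (latitude, longitude, elevation in metres).
stations = [
    (34.05, -118.24, 90.0), (33.77, -118.19, 10.0), (32.72, -117.16, 20.0),
    (39.74, -104.99, 1610.0), (40.01, -105.27, 1650.0), (38.83, -104.82, 1840.0),
]
labels = kmeans(standardize(stations), k=2)
print(labels)
```

In this toy example the three low-elevation stations and the three high-elevation stations end up in separate clusters; a model could then be trained per cluster rather than per station.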
We trained the machine learning models on 9 months of prediction data and evaluated their performance for September 2022 on ozone and PM2.5, using the Root Mean Squared Error (RMSE) and Pearson correlation coefficient (PCC) as performance metrics. For ozone, the LSTM reduced RMSE by 37.3% and achieved a PCC of 0.852. The U-Net architecture performed slightly better, with a 38.1% reduction in RMSE and a PCC of 0.859. The sequential CNN model was the most effective, with a 40.5% reduction in RMSE and a PCC of 0.863. These findings underscore the potential of machine learning for bias correction in air quality prediction: the CNN, LSTM, and U-Net models all improved forecast accuracy. These improvements have significant implications for public health, environmental management, and decision-making, particularly during extreme air quality events. Ultimately, this research highlights the importance of adopting advanced techniques to better address the multifaceted nature of air quality data and improve public health outcomes.
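As a minimal sketch of the two evaluation metrics, the snippet below computes RMSE, PCC, and a percentage RMSE reduction. The numbers are hypothetical illustration values, not data from the study, and the reduction is assumed here to be measured relative to the uncorrected CMAQ forecast, which the abstract does not state explicitly.

```python
import math

def rmse(pred, obs):
    """Root Mean Squared Error between forecasts and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def pearson(pred, obs):
    """Pearson correlation coefficient (PCC)."""
    n = len(obs)
    mp, mo = sum(pred) / n, sum(obs) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    return cov / (sp * so)

def rmse_reduction_pct(raw, corrected, obs):
    """Percentage RMSE reduction of the corrected forecast
    relative to the raw forecast (assumed baseline)."""
    base = rmse(raw, obs)
    return 100.0 * (base - rmse(corrected, obs)) / base

# Hypothetical ozone values (ppb): observations, raw forecast, corrected forecast.
obs = [40.0, 50.0, 60.0, 70.0]
raw = [50.0, 60.0, 70.0, 80.0]        # constant +10 bias
corrected = [42.0, 49.0, 61.0, 68.0]

print(rmse(raw, obs), rmse(corrected, obs))
print(pearson(corrected, obs))
print(rmse_reduction_pct(raw, corrected, obs))
```

Note that a constant bias leaves the PCC of the raw forecast at 1 while its RMSE is large, which is one reason to report both metrics together.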
