898 Development of U.S. Based 20-Year Daily, 1-Km-Resolution, Multi-Species (PM2.5, O3, NO2, Dust) Surface Air Pollution Estimates By Machine Learning Data Fusion: Incorporating 1-Km Emission and 12-Km CMAQ

Thursday, 1 February 2024
Hall E (The Baltimore Convention Center)
Beiming Tang, NOAA, Fairfax, VA; and D. Tong, Y. Li, S. Ma, C. Chang, B. H. Baek, Y. Tang, and B. Baker

Exposure to ambient air pollutants causes up to 4 million deaths each year globally. In the U.S., ground observations are distributed unevenly, with more stations along the east and west coasts and fewer in the Midwest and mountain regions. Long-term records are also limited, because fewer stations have been built and maintained throughout the last 30 years. To analyze and prevent air pollution-related disease, flexible tools are needed to estimate major pollutants by fusing available local observations, global and regional chemical transport model (CTM) simulations, high-resolution emission datasets, and both geostationary and polar-orbiting satellite products. We present a machine learning fusion approach that estimates fine particulate matter (PM2.5), ozone, nitrogen dioxide (NO2), and dust concentrations at fine resolution (1 km x 1 km) and daily scale for the continental U.S. domain over the last 20 years. These estimates facilitate exposure studies for regions lacking extensive ground station observations and provide multi-pollutant trends in U.S. air quality in unprecedented detail. As novel features, we report the relative performance of the model when it uses coarse global reanalysis (CAMS) versus 12-km regional CMAQ modeling, and we evaluate model performance when incorporating the 1-km Neighborhood Emission Mapping Operation (NEMO) emission dataset developed at GMU. The random forest machine learning approach is tested for the last 30 years. Predictor variables (meteorology, satellite products, CTM output, fine-resolution emissions, elevation, population, and percent urban land cover) are processed at daily time intervals for predictions at daily, weekly, and monthly time scales.
The ML model's initial evaluation for November 2018 (during the California Camp Fire event) shows promising results: for PM2.5, the 10-fold cross-validation correlation coefficient (R) is 0.94, the root mean square error (RMSE) is 6.6 µg/m3, and the mean bias is 0.03 µg/m3. We demonstrate model accuracy and robustness by further applying the model to South Korea, Vietnam, and Thailand, where there is a strong need for air pollution exposure estimates for health studies but very few surface observations.
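
The evaluation statistics quoted above (R, RMSE, and mean bias from 10-fold cross-validation) can be illustrated with a minimal sketch. It is not the authors' code. The function names (`kfold_indices`, `cross_validated_predictions`, `evaluation_metrics`, `linear_fit_predict`) are hypothetical, the data are synthetic, and a one-predictor least-squares fit stands in for the random forest only to keep the example dependency-free. Mean bias is assumed here to be mean(predicted) minus mean(observed).

```python
import math
import random


def kfold_indices(n, k=10, seed=0):
    """Split the indices 0..n-1 into k shuffled, nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_predictions(x, y, fit_predict, k=10, seed=0):
    """Pool held-out predictions so each sample is predicted by a model
    that was trained without it."""
    preds = [None] * len(y)
    for held_out in kfold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        out = fit_predict(
            [x[i] for i in train],
            [y[i] for i in train],
            [x[i] for i in held_out],
        )
        for i, p in zip(held_out, out):
            preds[i] = p
    return preds


def evaluation_metrics(observed, predicted):
    """Return (R, RMSE, mean bias) for paired observations and predictions."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_o * var_p)  # Pearson correlation coefficient
    rmse = math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)
    bias = mean_p - mean_o  # assumed sign convention: predicted minus observed
    return r, rmse, bias


def linear_fit_predict(x_train, y_train, x_test):
    """Stand-in model (NOT a random forest): one-predictor least squares."""
    n = len(x_train)
    mx = sum(x_train) / n
    my = sum(y_train) / n
    sxx = sum((v - mx) ** 2 for v in x_train)
    slope = sum((a - mx) * (b - my) for a, b in zip(x_train, y_train)) / sxx
    return [my + slope * (v - mx) for v in x_test]


if __name__ == "__main__":
    # Synthetic data: one predictor and a noisy "observed" concentration.
    rng = random.Random(42)
    x = [rng.uniform(0.0, 50.0) for _ in range(200)]
    y = [2.0 * v + 1.0 + rng.gauss(0.0, 3.0) for v in x]
    preds = cross_validated_predictions(x, y, linear_fit_predict, k=10)
    r, rmse, bias = evaluation_metrics(y, preds)
    print(f"R={r:.3f} RMSE={rmse:.3f} bias={bias:.3f}")
```

Pooling the held-out predictions from all ten folds before computing the statistics is one common convention. Averaging per-fold statistics is another, and the abstract does not say which was used.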