365608 Application of Machine Learning to Classify and Predict Events of Severe PM2.5 Pollution in Taiwan

Wednesday, 15 January 2020
Wei-Ting Chen, National Taiwan University, Taipei City, Taiwan; and C. W. Chang, P. J. Chen, T. S. Yo, S. H. Su, C. Y. Su, and C. M. Wu

Direct numerical simulation and forecasting of air pollution events over areas with complex terrain and coastlines are challenging, as the transport and accumulation of air pollutants are strongly controlled by the synoptic-scale environment as well as by the local circulation associated with fine-scale processes such as boundary layer development, topographic effects, and land-sea breezes. In this work, we explore the application of machine learning (ML) to classify and predict the occurrence of severe PM2.5 pollution events over Taiwan, a subtropical island with complicated orography. Using the hourly PM2.5 data observed at 78 Taiwan Environmental Protection Administration ground stations over 2008-2015, a daily pollution index is defined as the total number of station hours with PM2.5 > 54.5 μg/m3 each day. The time series of the daily pollution index over a 10-day running window are then objectively classified using the agglomerative clustering algorithm, an unsupervised, hierarchical cluster analysis. The pollution events are clustered into three types, each with a distinct temporal evolution pattern. The type with the highest average pollution index is taken to represent the severe PM2.5 pollution events. Finally, to predict the occurrence of these events, generalized linear models based on logistic regression were trained with the daily mean NCEP CFSR reanalysis data. The variables initially taken into consideration were the 40 leading principal component analysis (PCA) modes of geopotential height, relative humidity, temperature, divergence, equivalent potential temperature (θe), wind fields, potential vorticity, and low-level stability. Using recursive feature elimination with cross-validation, we identified specific PCA modes of θe in the lower troposphere and of geopotential height at 500 hPa as important features for predicting severe events. The prediction achieves 81% accuracy, an 85% hit ratio, a 42% false alarm rate, and 40% true predictability.
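As an illustration of how skill scores of this kind are typically derived, the sketch below computes accuracy, hit ratio, and a false alarm score from paired observed and predicted event flags. The abstract does not state the exact definitions used, so the formulas here are assumptions: in particular, the false alarm score is taken to be the false alarm ratio FP / (TP + FP), and "true predictability" is left out because its definition is not given. The function name `skill_scores` is hypothetical.

```python
def skill_scores(observed, predicted):
    """Contingency-table scores for binary event forecasts.

    observed, predicted: equal-length sequences of truthy/falsy flags
    (True = severe event). Definitions are assumptions, not taken from
    the abstract.
    """
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o and p)          # hits
    fp = sum(1 for o, p in pairs if not o and p)      # false alarms
    fn = sum(1 for o, p in pairs if o and not p)      # misses
    tn = sum(1 for o, p in pairs if not o and not p)  # correct negatives

    total = tp + fp + fn + tn
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / total if total else nan,
        # Hit ratio: fraction of observed events that were predicted.
        "hit_ratio": tp / (tp + fn) if (tp + fn) else nan,
        # Assumed definition: fraction of predicted events that did not occur.
        "false_alarm_ratio": fp / (tp + fp) if (tp + fp) else nan,
    }


if __name__ == "__main__":
    obs = [1, 1, 0, 0, 1, 0, 0, 0]
    pred = [1, 1, 1, 0, 0, 0, 0, 0]
    print(skill_scores(obs, pred))
```

If the alternative definition FP / (FP + TN) is intended instead, only the last entry of the returned dictionary would change.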
The seasonal contrast in the occurrence of severe events is closely reproduced. During winter, the predicted severe events are highly associated with the mid-latitude synoptic systems affecting Taiwan. The results demonstrate that ML can be used to predict severe PM2.5 pollution events over complex terrain based on global model outputs. Our prediction model can be used with inputs from future climate projections to explore long-term changes in air pollution events over Taiwan.
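The daily pollution index described in the abstract (station hours per day with PM2.5 above 54.5 μg/m3) and the 10-day running windows fed to the clustering step can be sketched as follows. This is a minimal illustration and not the authors' code: the input layout (a dictionary of per-station hourly lists), the treatment of missing values as `None`, and the function names `daily_pollution_index` and `running_windows` are all assumptions.

```python
THRESHOLD = 54.5  # μg/m3, exceedance threshold quoted in the abstract


def daily_pollution_index(hourly_by_station, threshold=THRESHOLD):
    """Count station hours per day with PM2.5 above the threshold.

    hourly_by_station: dict mapping a station id to a list of hourly
    PM2.5 values, starting at the first hour of day 0. Missing
    observations are None and are skipped (an assumption).
    """
    n_hours = min(len(series) for series in hourly_by_station.values())
    n_days = n_hours // 24  # ignore any trailing partial day
    index = [0] * n_days
    for series in hourly_by_station.values():
        for hour in range(n_days * 24):
            value = series[hour]
            if value is not None and value > threshold:
                index[hour // 24] += 1
    return index


def running_windows(index, length=10):
    """Return every consecutive window of `length` days from the index."""
    return [index[i:i + length] for i in range(len(index) - length + 1)]


if __name__ == "__main__":
    # Two hypothetical stations, two days of hourly data.
    station_a = [60.0] * 3 + [10.0] * 21 + [10.0] * 24
    station_b = [54.5] + [None] + [10.0] * 22 + [100.0] * 2 + [10.0] * 22
    index = daily_pollution_index({"A": station_a, "B": station_b})
    print(index)
    print(running_windows(index, length=2))
```

Each window returned by `running_windows` is one fixed-length feature vector, which is the kind of input a hierarchical clustering routine would expect.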