Calculation of PAH Maps using SVM in Urban Areas

Wednesday, 7 January 2015: 11:00 AM
124B (Phoenix Convention Center - West and North Buildings)
Armando Pelliccioni, INAIL, Monteporzio Catone, Rome, Italy; and A. Cristofari and S. E. Haupt

Epidemiological studies of the health effects of air quality often rely on data from monitoring stations, so constructing pollutant exposure maps is crucial for improving such studies. Assessing exposure to Polycyclic Aromatic Hydrocarbons (PAHs) in urban areas is the major goal of the EXPAH LIFE+ Project; to that end, an integrated approach based on measurements and modeling techniques has been applied to simulate PAH concentrations in the metropolitan area of Rome over a period of one year (June 2011 - May 2012). In particular, a Machine Learning method, Support Vector Machines (SVMs), has been used to forecast atmospheric pollution. After an initial feature selection process, the SVM was trained and then tested on blind samples, with very good results. The same SVM was then used to build daily PAH exposure maps. Because measurements were not available over the whole mapped domain, new indices were considered for assessing the maps. All the outputs produced by the SVM were also compared with those obtained from two applications of chemical transport models (FARM bc and FARM fc).

The results are divided into two parts: 1) the performance of the SVM model in the test phase and 2) the maps obtained by applying the SVM. In the test phase, the SVM model appears to provide much better results than the two chemical models. While FARM bc tends to overestimate (slope = 2.0) and FARM fc tends to underestimate (slope = 0.78), the SVM model avoids both of these distortions (slope = 0.96) and also shows a better correlation (R2 = 0.93 against an average R2 of 0.82).

With regard to the daily exposure maps produced by the SVM, note that the model was built (and tested) to reproduce period concentrations rather than daily ones, so a small amount of forcing had to be added. Generally, for large-area simulations, not all pixels include measurements, which makes it difficult to test maps derived from air dispersion results. Thus, indirect performance indices were developed: R_neg measures the percentage of negative values, and R_(U-NU) indicates the percentage of days on which pollutant concentrations are lower in the urban area than in the non-urban area. These indices were chosen because negative concentrations are not possible and because pollutant concentrations are expected to be higher in an urban area than in a non-urban one. To define R_(U-NU), three pixels were fixed: one on the sea (South-West region of the domain), one on the lake (South-East region) and one in the center of Rome. We obtained R_neg = 0, R_(U-NU) = 3.29% comparing the city with the sea, and R_(U-NU) = 2.74% comparing the city with the lake.
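The two indirect indices described above can be illustrated with a minimal sketch. The function names `r_neg` and `r_u_nu` and the sample concentrations are hypothetical and chosen only for illustration; the abstract does not give the authors' implementation, and the strict "lower than" comparison is an assumption.

```python
def r_neg(values):
    """Percentage of estimated concentrations that are negative."""
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    negatives = sum(1 for v in values if v < 0)
    return 100.0 * negatives / len(values)


def r_u_nu(urban, non_urban):
    """Percentage of days on which the urban pixel is strictly lower
    than the non-urban pixel (assumed reading of R_(U-NU))."""
    urban = list(urban)
    non_urban = list(non_urban)
    if not urban or len(urban) != len(non_urban):
        raise ValueError("series must be non-empty and of equal length")
    lower_days = sum(1 for u, n in zip(urban, non_urban) if u < n)
    return 100.0 * lower_days / len(urban)


# Made-up daily values (ng/m3) for a city pixel and a sea pixel.
city = [2.0, 1.5, 0.8, 3.1]
sea = [0.5, 1.7, 0.3, 0.9]

print(r_neg(city + sea))   # share of negative estimates, in percent
print(r_u_nu(city, sea))   # share of days with city < sea, in percent
```

Under this reading, low values of both indices indicate physically plausible maps: no negative concentrations, and few days on which the city appears cleaner than the sea or the lake.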

The analysis of the estimates produced by FARM bc and by the SVM underlines the consistent behavior of the SVM model and its generalization capability. The SVM produces estimates that are generally lower over the lake and over the sea than in the city, even though only urban samples were used for training. All of the annual maps show higher values in the urban area than outside it. However, while the maps obtained by FARM bc and by FARM fc are strongly related, the maps produced by the SVM show a slightly different shape. The mean values over all maps are 2.23 ng/m3, 0.98 ng/m3 and 1.78 ng/m3 for FARM bc, FARM fc and SVM, respectively. These maps seem to further confirm the results obtained previously, in which the estimates produced by the SVM model lie between those of the FARM bc model (which tends to overestimate) and the FARM fc model (which tends to underestimate).
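The over- and underestimation tendencies mentioned above are summarised in the abstract by a regression slope and R2. The sketch below shows one common way to compute such statistics. It assumes an ordinary least-squares fit of modelled on observed concentrations, with an intercept, and takes R2 as the squared Pearson correlation; the abstract does not state which convention the authors used. The function name `slope_and_r2` and the sample values are hypothetical.

```python
def slope_and_r2(observed, modelled):
    """Least-squares slope of modelled vs. observed values, and the
    squared Pearson correlation (assumed definition of R2)."""
    observed = list(observed)
    modelled = list(modelled)
    n = len(observed)
    if n < 2 or n != len(modelled):
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(observed) / n
    mean_y = sum(modelled) / n
    sxx = sum((x - mean_x) ** 2 for x in observed)
    syy = sum((y - mean_y) ** 2 for y in modelled)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(observed, modelled))
    if sxx == 0 or syy == 0:
        raise ValueError("series must not be constant")
    slope = sxy / sxx
    r2 = sxy ** 2 / (sxx * syy)
    return slope, r2


# Made-up example: a model that doubles every observed value.
obs = [1.0, 2.0, 3.0, 4.0]
mod = [2.0, 4.0, 6.0, 8.0]
slope, r2 = slope_and_r2(obs, mod)
print(slope, r2)
```

With this convention, a slope above 1 corresponds to overestimation and a slope below 1 to underestimation, which matches how the slopes for FARM bc, FARM fc and the SVM are interpreted in the text.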