S44 Using Supervised Machine Learning and HYSPLIT Backwards Trajectories to Predict Airborne VOC Concentrations

Sunday, 28 January 2024
Hall E (The Baltimore Convention Center)
Victor William Geiser, NASA SARP, Belgium, WI

Supervised machine learning algorithms are becoming more accessible to researchers, and the predictive power they offer has important scientific implications. The use of machine learning to predict volatile organic compound (VOC) concentrations in whole air sampling (WAS) has not previously been explored. The model created in this study was initialized with trajectory data generated using the NOAA Hybrid Single-Particle Lagrangian Integrated Trajectory model (HYSPLIT) integrated with the Global Data Assimilation System (GDAS) to calculate meteorological variables along a 24-hour backwards trajectory path. The endpoints of these trajectories (at Time -24hr and Time 0hr) were then used as inputs to train the model with the Random Forest Regression algorithm provided by the Python module Scikit-Learn. The goal of this study is to predict the airborne concentrations of selected VOCs: dimethyl sulfide (DMS), methane, ethane, benzene, toluene, and isoprene. The predicted concentrations were compared against the gas chromatography results from WAS data collected on NASA Student Airborne Research Program (SARP) flights from 2009 to 2022. Averaged over 20 model runs, the Random Forest Regression model created for this study predicted the concentrations of the selected VOCs with R^2 values of: DMS (T-24: 0.66, T0: 0.72), methane (T-24: 0.56, T0: 0.73), ethane (T-24: 0.41, T0: 0.48), benzene (T-24: 0.34, T0: 0.38), toluene (T-24: 0.34, T0: 0.45), and isoprene (T-24: 0.13, T0: 0.35). These results indicate that airborne DMS and methane concentrations can be predicted with actionable confidence from the initial T-24 and T0 model parameters. The concentrations of ethane, benzene, toluene, and isoprene can be evaluated using the same process, but with a larger degree of uncertainty.
These results, in combination with an analysis of model sensitivity and performance, show trends consistent with known properties governing the locality of VOC concentrations and their lifetime in the atmosphere. The model created in this study is scalable and can be trained in any location where WAS data are collected. Possible future work on this model includes predicting other VOCs not included in this study, using longer backwards trajectories, and incorporating higher-resolution meteorological data.
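
The training and scoring procedure described in the abstract (a Random Forest Regression model from Scikit-Learn, scored by R^2 and averaged over 20 runs) can be illustrated with a minimal sketch. This is not the study's code: the feature matrix here is synthetic stand-in data rather than HYSPLIT endpoint variables, and the function name `mean_r2_over_runs`, the 75/25 train/test split, the number of trees, and the use of a different random seed per run are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def mean_r2_over_runs(X, y, n_runs=20, test_size=0.25):
    """Train a random forest n_runs times and return (mean R^2, list of R^2).

    Hypothetical helper: each run uses a different random train/test split
    and a different forest seed, which is one plausible reading of
    "averaged over 20 model runs", not a confirmed detail of the study.
    """
    scores = []
    for seed in range(n_runs):
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=test_size, random_state=seed
        )
        model = RandomForestRegressor(n_estimators=50, random_state=seed)
        model.fit(X_train, y_train)
        # R^2 is computed on held-out samples only.
        scores.append(r2_score(y_test, model.predict(X_test)))
    return float(np.mean(scores)), scores


# Synthetic stand-in data (assumption): four columns playing the role of
# meteorological variables at a trajectory endpoint, and a target playing
# the role of a measured VOC concentration.
rng = np.random.default_rng(0)
n_samples = 300
X = rng.normal(size=(n_samples, 4))
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.3, size=n_samples)

mean_r2, all_scores = mean_r2_over_runs(X, y)
print(f"mean R^2 over {len(all_scores)} runs: {mean_r2:.2f}")
```

In a setup like the one the abstract describes, the same function would presumably be called twice per compound, once with features taken from the T-24 endpoint and once with features from the T0 endpoint, to produce the paired R^2 values reported above.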

Supplementary URL: https://drive.google.com/file/d/1XNbiNHZbvRHVRyPTjpK_KiH2uK7WPwZ5/view?usp=sharing
