This study explores machine learning methods for estimating aerosol mixing state metrics, which we define here in three different ways: with respect to hygroscopicity, optical properties, or chemical species abundance. We adopted a data-driven approach, training deep learning and XGBoost models on PartMC simulations. Deep learning can model complex relationships between features and responses, whereas XGBoost, which outperforms several other well-known implementations of gradient tree boosting, offers advanced features for model tuning, computing environments, and algorithm enhancement. For the training, development, and testing datasets, we used the output from scenario libraries of particle-resolved simulations covering aerosol populations with different compositions and mixing states, representing a range of environmental conditions at the global scale. We evaluated numerous machine learning model configurations and identified a high-performing predictive model that accurately predicts the aerosol mixing state metrics.
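To make the target quantity concrete, the following is a minimal sketch of how a mixing state metric could be computed from per-particle composition, such as the output of a particle-resolved simulation. The text above does not give a formula, so the sketch assumes the entropy-based mixing state index of Riemer and West (2013), which ranges from 0 for a fully external mixture to 1 for a fully internal mixture. The function names and the input layout (one row per particle, one column per species group) are hypothetical and chosen for illustration only.

```python
import math


def shannon_entropy(fractions):
    """Shannon entropy of a list of fractions; zero entries contribute nothing."""
    return -sum(p * math.log(p) for p in fractions if p > 0.0)


def mixing_state_index(masses):
    """Entropy-based mixing state index (assumed metric, see lead-in).

    masses[i][a] is the mass of species group a in particle i.
    Returns a value between 0 (external mixture) and 1 (internal mixture).
    """
    total_mass = sum(sum(row) for row in masses)
    n_groups = len(masses[0])

    # Average per-particle entropy, weighted by each particle's mass fraction.
    h_alpha = 0.0
    for row in masses:
        particle_mass = sum(row)
        if particle_mass > 0.0:
            per_particle = shannon_entropy([m / particle_mass for m in row])
            h_alpha += (particle_mass / total_mass) * per_particle

    # Entropy of the bulk population composition.
    bulk = [sum(row[a] for row in masses) / total_mass for a in range(n_groups)]
    h_gamma = shannon_entropy(bulk)

    d_alpha = math.exp(h_alpha)  # average particle species diversity
    d_gamma = math.exp(h_gamma)  # bulk population species diversity
    if math.isclose(d_gamma, 1.0):
        raise ValueError("index is undefined when the bulk holds a single group")
    return (d_alpha - 1.0) / (d_gamma - 1.0)


# Two particles, each made of a single distinct group: external mixture.
external = mixing_state_index([[1.0, 0.0], [0.0, 1.0]])
# Two particles with identical composition: internal mixture.
internal = mixing_state_index([[1.0, 1.0], [2.0, 2.0]])
```

Under this assumption, the three metric definitions would differ only in how species are grouped into columns (by hygroscopicity, by optical properties, or by individual chemical species) before the index is computed.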
Our framework is designed to provide new fundamental understanding of (1) how machine learning can be applied to improve the representation of aerosol mixing state and (2) where inappropriate assumptions about aerosol mixing state may lead to large errors at the global scale. These novel capabilities help overcome some of the current limitations in atmospheric modeling and numerical weather prediction.