An increase in cloud condensation nuclei from biomass burning and urban pollution can increase the number of cloud droplets, thereby lowering the cloud effective radius (re) for a fixed liquid water content. This reduction in re is known as the first aerosol indirect effect, or "Twomey" effect, and it can delay precipitation, increasing cloud lifetime and thickness. This study assesses how well machine learning regression methods such as random forests, k-nearest neighbors, and support vector machines predict the first aerosol indirect effect in regions with varying levels of biomass burning and non-biomass burning pollution. We use in situ DC-8 aircraft data from NASA's Studies of Emissions and Atmospheric Composition, Clouds and Climate Coupling by Regional Surveys (NASA-SEAC4RS) project.
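The fixed-liquid-water-content argument behind the Twomey effect can be made explicit with a short derivation. The sketch below is illustrative only and is not taken from the study: the symbols L (liquid water content), N (droplet number concentration), \rho_w (density of liquid water), r_v (volume-mean radius), and k (a dimensionless factor set by the shape of the droplet size distribution, assumed constant) are assumptions introduced here, and r_e denotes the abstract's re.

```latex
% Assumed symbols: L = liquid water content, N = droplet number concentration,
% \rho_w = density of liquid water, r_v = volume-mean droplet radius,
% k = spectral-shape factor relating r_v to r_e (held constant here).
\begin{align}
  L &= \tfrac{4}{3}\pi \rho_w N r_v^{3}, \qquad r_e^{3} = \frac{r_v^{3}}{k} \\
  r_e &= \left( \frac{3L}{4\pi \rho_w k N} \right)^{1/3} \propto N^{-1/3}
        \quad \text{(fixed } L,\ k\text{)} \\
  \frac{r_e(2N)}{r_e(N)} &= 2^{-1/3} \approx 0.79
\end{align}
```

Under these assumptions, doubling the droplet number concentration shrinks re by roughly 21%.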
The data are standardized, filtered to include only in-cloud samples, and divided into four categories based on pollution amount and type: clean, biomass burning polluted, non-biomass burning polluted, and all-inclusive. We then use recursive feature elimination (RFE) to rank several atmospheric variables by their relative importance in predicting re in each of the four pollution categories. Early findings show that ice water content (IWC), temperature (T), and carbon monoxide (CO) are highly predictive of re, a result supported by current theoretical and observational studies. Using 100-fold cross-validation (CV) to estimate re in the all-inclusive category, random forest regression achieved a CV root mean squared error of 0.49 and a CV R2 of 0.75 with IWC, T, and CO as the predictor variables.
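The workflow described above (standardize, rank predictors with RFE, then score a random forest with k-fold cross-validation) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the study's code. The data are synthetic stand-ins rather than SEAC4RS measurements. The column names `noise_1` and `noise_2`, the linear relationship used to generate the target, and the hyperparameters are all hypothetical. The sketch uses 10 folds to keep the run short, whereas the abstract reports 100-fold CV.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for in-cloud samples (NOT SEAC4RS data).
# "noise_1" and "noise_2" are hypothetical, uninformative columns.
feature_names = ["IWC", "T", "CO", "noise_1", "noise_2"]
n_samples = 300
X = rng.normal(size=(n_samples, len(feature_names)))

# Hypothetical target: "re" depends on the first three columns plus noise.
y = (1.5 * X[:, 0] - 1.0 * X[:, 1] + 0.8 * X[:, 2]
     + rng.normal(scale=0.3, size=n_samples))

# Standardize predictors and target to zero mean and unit variance.
X_std = StandardScaler().fit_transform(X)
y_std = (y - y.mean()) / y.std()

# Recursive feature elimination: repeatedly fit the forest and drop the
# least important feature until three remain.
selector = RFE(
    RandomForestRegressor(n_estimators=50, random_state=0),
    n_features_to_select=3,
)
selector.fit(X_std, y_std)
selected = [name for name, keep in zip(feature_names, selector.support_) if keep]

# k-fold cross-validated predictions using only the selected features.
n_splits = 10  # assumption for speed; the abstract reports 100-fold CV
cv = KFold(n_splits=n_splits, shuffle=True, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0)
pred = cross_val_predict(model, X_std[:, selector.support_], y_std, cv=cv)

cv_rmse = float(np.sqrt(mean_squared_error(y_std, pred)))
cv_r2 = float(r2_score(y_std, pred))

print("Selected features:", selected)
print("CV RMSE (standardized units):", round(cv_rmse, 3))
print("CV R2:", round(cv_r2, 3))
```

For brevity, the scaling and feature selection here run once on the full data set before cross-validation, which can make the CV scores slightly optimistic. A more careful setup would wrap both steps and the regressor in a scikit-learn `Pipeline` so that they are refit inside each fold.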