440571 Downscaling SoilMERGE within an Open Science Framework using Machine Learning

Sunday, 28 January 2024
Hall E (The Baltimore Convention Center)
Daniela Esparza, Center for Earth and Environmental Studies (CEES), Laredo, TX; and M. Garcia and A. Perez

The purpose of this project is to downscale SoilMERGE (SMERGE) using machine learning algorithms. SMERGE is a root-zone soil moisture product that covers the continental United States. To facilitate open science, the SMERGE team is developing a cookbook in Project Pythia that provides full transparency into this downscaling effort. The SMERGE cookbook consists of three categories of machine learning (ML) scripts, one for each algorithm: Random Forest (RF), Extreme Gradient Boosting Regressor (XGBoost), and Gradient Boosting Regressor (GBoost). The scripts were generalized and cross-checked among team members for accuracy. The generic scripts were then reviewed and documented to make them accessible and easy to understand. Downscaling was performed at resolutions between 100 and 3000 m, from a default spatial resolution of 0.125 degrees. XGBoost, GBoost, and RF are used to develop a downscaled version of SMERGE, which is validated against several in situ datasets, with a focus on Oklahoma and Kansas, using correlation and unbiased root mean square error (ubRMSE). Internal metrics of variable sensitivity were also used. RF sensitivity was gauged using TensorFlow's Inverse Mean Minimum Depth (IMMD). For XGBoost and GBoost, independent variable sensitivity was evaluated using the interpretability tool SHapley Additive exPlanations (SHAP). The SMERGE cookbook makes these ML algorithms easy for any potential user to apply and uses visualizations to report and evaluate the results. Two students helped develop this cookbook and gained marketable ML skills that strengthen their professional background.
Those skills included applying machine learning algorithms to large datasets, handling spatial data, evaluating and validating model predictions, applying visualization tools (tables to show datasets, and graphs of prediction error and feature importance), and understanding the parameters that affect soil moisture. These practical skills are vital for addressing real-world challenges in today’s data-driven world.
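The two validation metrics named above, correlation and ubRMSE, can be illustrated with a minimal Python sketch. It is not taken from the SMERGE cookbook: the function names `ubrmse` and `pearson_r` and the handling of missing values are assumptions made for illustration, and the standard definition of ubRMSE (RMSE computed after removing the mean of each series) is assumed.

```python
import numpy as np


def _paired_valid(predicted, observed):
    """Return both series as float arrays, keeping only pairs without NaNs."""
    p = np.asarray(predicted, dtype=float)
    o = np.asarray(observed, dtype=float)
    valid = ~(np.isnan(p) | np.isnan(o))
    return p[valid], o[valid]


def ubrmse(predicted, observed):
    """Unbiased RMSE: RMSE of the two series after subtracting each one's mean."""
    p, o = _paired_valid(predicted, observed)
    anomaly_diff = (p - p.mean()) - (o - o.mean())
    return float(np.sqrt(np.mean(anomaly_diff ** 2)))


def pearson_r(predicted, observed):
    """Pearson correlation coefficient between the two series."""
    p, o = _paired_valid(predicted, observed)
    return float(np.corrcoef(p, o)[0, 1])
```

Because the mean of each series is removed first, a prediction that differs from the in situ observations only by a constant offset has a ubRMSE of zero and a correlation of one; the metric penalizes errors in variability, not mean bias.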