An approach towards including watershed traits in machine learning models for predictions in unmonitored basins (INVITED)

Varadharajan, Charuleka

Prediction in unmonitored basins (PUB) has been a longstanding area of research in hydrological modeling, and it is especially important for assessing the impacts of disturbances with unpredictable timing, duration, and spatial extent. Classical bottom-up approaches to PUB regionalize statistical or process-based models built at representative monitored sites, transferring them to unmonitored sites according to different measures of watershed similarity. More recent top-down machine learning (ML) models combine large continental-scale datasets with static site attributes (referred to here as watershed traits), and they increasingly outperform other approaches for prediction in unmonitored basins.

Both approaches rely on the concept of watershed similarity, defined by traits such as topography, geology, land cover, land use, and other human activities. These traits interact and coevolve with each other and with climate forcings to shape how watersheds function at different scales. Data on meteorology, watershed functions, and traits are now available at large spatial scales from monitoring networks (e.g., the USGS streamflow network), remote sensing, and derived geospatial products (e.g., Daymet, StreamCat), creating opportunities to use them in watershed- to continental-scale models of hydrologic function.
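To make trait-based similarity concrete, the sketch below shows one simple form of bottom-up regionalization: z-score each trait, then pick the monitored site closest in Euclidean trait space as the "donor" for an unmonitored site. The site names, trait values, and the choice of Euclidean distance on standardized traits are all hypothetical assumptions for illustration; the abstract does not specify which similarity measure is used.

```python
import math
from statistics import mean, pstdev


def standardize(traits_by_site):
    """Z-score each trait column so traits in large units (e.g. elevation in m)
    do not dominate the distance."""
    sites = list(traits_by_site)
    n_traits = len(traits_by_site[sites[0]])
    columns = [[traits_by_site[s][j] for s in sites] for j in range(n_traits)]
    means = [mean(col) for col in columns]
    # A constant trait has zero spread; use 1.0 so it contributes 0 instead of dividing by zero.
    sds = [pstdev(col) or 1.0 for col in columns]
    return {
        s: [(v - m) / sd for v, m, sd in zip(traits_by_site[s], means, sds)]
        for s in sites
    }


def nearest_donor(target, monitored, traits_by_site):
    """Return the monitored site whose standardized traits are closest to the target's."""
    z = standardize(traits_by_site)
    return min(monitored, key=lambda s: math.dist(z[target], z[s]))


# Hypothetical traits per site: [mean elevation (m), forest fraction]
traits = {
    "A": [100.0, 0.20],
    "B": [120.0, 0.25],
    "C": [900.0, 0.80],
    "X": [110.0, 0.22],  # unmonitored site
}
donor = nearest_donor("X", ["A", "B", "C"], traits)
print(donor)  # A
```

In a regionalization workflow, the model or parameters calibrated at the donor site would then be applied at the unmonitored site. Standardizing over all sites, including the unmonitored one, is a design choice here; other schemes compute the statistics from monitored sites only.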

In this study, we evaluate different approaches to building trait-based ML PUB models at different spatial scales across the continental United States. Top-down approaches include continental models that use data from all available sites with associated trait information, as well as models grouped by region or by trait similarity. Here, similarity is determined by a novel approach that uses networks to classify over 9000 watersheds using over 300 traits. These are compared with a bottom-up approach in which local ML models are built for individual monitored sites and a meta-transfer learning model that incorporates trait information is used to make predictions at unmonitored sites. Finally, we use several methods, including mutual information and feature importance, to determine the relationships between traits and hydrologic function, and ultimately to select the traits most relevant for predicting a given function. Our results help identify the predominant traits that influence hydrologic functions and can inform the design and feature selection of ML PUB models.
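Mutual information measures how much knowing a trait reduces uncertainty about a hydrologic function, and it captures nonlinear as well as linear dependence. A minimal sketch of ranking traits this way is shown below. It assumes equal-width binning of continuous values and a plug-in estimate from empirical frequencies; the trait names, the runoff-ratio target, and the data are hypothetical, and the abstract does not state which estimator or binning the study uses.

```python
import math
from collections import Counter


def discretize(values, n_bins=3):
    """Equal-width binning of a continuous variable into integer labels 0..n_bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # all-equal values collapse into bin 0
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in nats from paired discrete labels."""
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    # p(a,b) * log(p(a,b) / (p(a) p(b))), written with counts: (c/n) * log(c*n / (n_a*n_b))
    return sum(c / n * math.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def rank_traits(traits, target, n_bins=3):
    """Rank traits by mutual information with a target hydrologic function, highest first."""
    y = discretize(target, n_bins)
    scores = {name: mutual_information(discretize(vals, n_bins), y)
              for name, vals in traits.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical data for six watersheds
runoff_ratio = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
traits = {
    "slope": [10, 20, 30, 40, 50, 60],
    "random_trait": [5, 1, 5, 1, 5, 1],
}
for name, mi in rank_traits(traits, runoff_ratio):
    print(f"{name}: {mi:.3f}")
# slope: 1.099
# random_trait: 0.000
```

Plug-in estimates are biased upward for small samples and depend on the number of bins. scikit-learn's `mutual_info_regression`, which uses a nearest-neighbor estimator, avoids explicit binning and is one common alternative.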

7A.1 An approach towards including watershed traits in machine learning models for predictions in unmonitored basins (INVITED)