Both approaches rely on the concept of watershed similarity, defined in terms of watershed traits: properties such as topography, geology, land cover, land use, and other human activities. These traits interact and coevolve with each other and with climate forcings to influence how watersheds function at different scales. Data on meteorology, watershed functions, and watershed traits are now available at large spatial scales from monitoring networks (e.g., the USGS streamflow network), remote sensing, and derived geospatial products (e.g., Daymet, StreamCat), presenting opportunities to use them in watershed- to continental-scale models of hydrologic function.
In this study, we evaluate different approaches to building trait-based ML PUB models at different spatial scales across the continental United States. Top-down approaches include continental models that use data from all available sites with associated trait information, as well as models grouped by region or by trait similarity. Here, similarity is determined by a novel approach that uses networks to classify over 9000 watersheds using over 300 traits. These are compared to a bottom-up approach in which local ML models are built for individual monitored sites and a meta-transfer learning model that incorporates trait information is used to make predictions at unmonitored sites. Finally, we use different methods, including mutual information and feature importance, to determine the relationships between traits and hydrologic function and ultimately to select the traits most relevant for predicting a given function. Our results help clarify which traits predominantly influence hydrologic functions and can inform the design of, and feature selection for, ML PUB models.
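To illustrate the idea of ranking traits by mutual information with a hydrologic function, the following is a minimal sketch rather than the procedure used in this study. It assumes a simple equal-width histogram estimator, and the trait names (`mean_slope`, `forest_fraction`, `unrelated_trait`), the synthetic `runoff` target, and the helper functions are all hypothetical, introduced only for illustration.

```python
import math
import random
from collections import Counter


def equal_width_bins(values, n_bins=8):
    """Assign each value to one of n_bins equal-width bins (illustrative choice)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would fall in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y, n_bins=8):
    """Plug-in (histogram) estimate of mutual information in nats."""
    bx = equal_width_bins(x, n_bins)
    by = equal_width_bins(y, n_bins)
    n = len(bx)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    mi = 0.0
    for (i, j), count in joint.items():
        p_ij = count / n
        # p_ij / (p_i * p_j) simplifies to count * n / (count_i * count_j).
        mi += p_ij * math.log(count * n / (px[i] * py[j]))
    return mi


def rank_traits(traits, target, n_bins=8):
    """Return (trait name, MI score) pairs sorted from most to least informative."""
    scores = {
        name: mutual_information(values, target, n_bins)
        for name, values in traits.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Synthetic example: the target depends strongly on one hypothetical trait,
# weakly on a second, and not at all on a third.
rng = random.Random(0)
n_sites = 2000
mean_slope = [rng.uniform(0, 1) for _ in range(n_sites)]
forest_fraction = [rng.uniform(0, 1) for _ in range(n_sites)]
unrelated_trait = [rng.uniform(0, 1) for _ in range(n_sites)]
runoff = [
    2.0 * s + 0.3 * f + rng.gauss(0, 0.1)
    for s, f in zip(mean_slope, forest_fraction)
]

ranking = rank_traits(
    {
        "mean_slope": mean_slope,
        "forest_fraction": forest_fraction,
        "unrelated_trait": unrelated_trait,
    },
    runoff,
)
for name, score in ranking:
    print(f"{name}: {score:.3f} nats")
```

Histogram estimates of this kind are biased upward for finite samples and are sensitive to the number of bins, so in practice the scores are better read as a relative ranking than as absolute values.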

