J8A.3 An Open-Source Software Solution for Repeatable and Interpretable Geospatial Evaluations

Tuesday, 30 January 2024: 5:00 PM
337 (The Baltimore Convention Center)
Gregory Petrochenkov, Lynker, Leesburg, VA; and F. Aristizabal and F. Salas

Earth science tools that lower the barrier to entry for domain experts while supporting scalable computation are critical to accelerating scientific progress. The benefits of such progress include more productive science workflows, the ability to disseminate data with greater veracity, velocity, and variety, and the potential to make cross-cutting linkages in risk assessment and science communication. One common bottleneck to productivity is the evaluation of geospatial model output. Faced with a rapid proliferation of georeferenced datasets, researchers must evaluate them independently, using tools that are specific to a particular model or dataset. This leads to cumbersome, non-repeatable efforts to quantify the performance of modeling results. In turn, it significantly slows the dissemination of data that could support cross-cutting research and operations as well as community-supported science capabilities. To address these problems, we developed a Python software package for geospatial evaluations, which we call “GVAL”.

GVAL is an interoperable, scalable, and efficient Python framework for validating gridded datasets across a variety of applications. GVAL compares modeled output maps against either observations or alternative modeled output maps, in raster or vector formats, and produces agreement maps and metrics. Before comparison, maps are homogenized to ensure spatial, data-format, and numerical alignment. Comparisons are supported for two-class categorical, multi-class categorical, continuous, and probabilistic statistical data types. Libraries of standard metrics for each statistical data type are included, and users can register custom, user-defined statistical metrics. Functionality also includes visualization of agreement maps, subsampling for regions of interest or for masking, catalog comparisons (including cloud-native catalogs), and attribute tracking methods that facilitate readable metadata, statistical analysis, and hypothesis testing. By leveraging the Pangeo stack, GVAL supports local serial and parallel processing as well as distributed processing. The GVAL package is portable, open-source, and applicable to a variety of analysis domains involving geospatial model output.
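As a concrete illustration of a two-class categorical comparison, the sketch below builds an agreement map from a binary candidate grid and a binary benchmark grid, tallies the contingency table, and evaluates standard metrics (critical success index, probability of detection, false alarm ratio) plus one user-registered metric. It is a minimal NumPy sketch, not GVAL's API: the function names, the `2 * benchmark + candidate` cell encoding, the `-9999` nodata value, and the decorator-based metric registry are all hypothetical, and both maps are assumed to be already homogenized onto the same grid.

```python
import numpy as np

# Hypothetical metric registry (illustrative only, not GVAL's API):
# maps a metric name to a function of the contingency-table counts.
METRICS = {}


def register_metric(name):
    """Decorator that adds a metric function to the registry under `name`."""
    def decorator(func):
        METRICS[name] = func
        return func
    return decorator


def _ratio(num, den):
    # Metrics are undefined when the denominator is zero; return NaN instead of raising.
    return num / den if den else float("nan")


@register_metric("csi")
def critical_success_index(tp, fp, fn, tn):
    return _ratio(tp, tp + fp + fn)


@register_metric("pod")
def probability_of_detection(tp, fp, fn, tn):
    return _ratio(tp, tp + fn)


@register_metric("far")
def false_alarm_ratio(tp, fp, fn, tn):
    return _ratio(fp, tp + fp)


def agreement_map(candidate, benchmark, nodata=-9999):
    """Encode each cell as 2 * benchmark + candidate (assumed encoding):
    0 = true negative, 1 = false positive, 2 = false negative, 3 = true positive.
    Cells that are nodata in either input become -1."""
    candidate = np.asarray(candidate)
    benchmark = np.asarray(benchmark)
    if candidate.shape != benchmark.shape:
        raise ValueError("maps must be homogenized onto the same grid first")
    valid = (candidate != nodata) & (benchmark != nodata)
    return np.where(valid, 2 * benchmark + candidate, -1)


def compute_metrics(agreement):
    """Tally the contingency table from an agreement map and evaluate every registered metric."""
    tn, fp, fn, tp = (int(np.count_nonzero(agreement == code)) for code in range(4))
    return {name: func(tp, fp, fn, tn) for name, func in METRICS.items()}


# A user-defined metric registered alongside the built-in ones.
@register_metric("accuracy")
def overall_accuracy(tp, fp, fn, tn):
    return _ratio(tp + tn, tp + fp + fn + tn)


if __name__ == "__main__":
    candidate = np.array([[1, 1, 0], [0, 1, -9999]])
    benchmark = np.array([[1, 0, 0], [1, 1, 1]])
    agreement = agreement_map(candidate, benchmark)
    print(agreement.tolist())  # [[3, 1, 0], [2, 3, -1]]
    print(compute_metrics(agreement))
```

Keeping each metric as a plain function of the four counts means a custom metric only needs the same signature to be picked up by `compute_metrics`. In a real workflow, homogenization (reprojection, resampling, nodata harmonization) would happen before `agreement_map` is called.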

This work will demonstrate the utility of GVAL through example evaluations spanning several statistical data types and geospatial variables of interest, including fluvial inundation extent, storm surge, and total precipitation. We will also demonstrate GVAL’s scalability by batch processing evaluations of cloud datasets via cataloging, as well as the interpretability and reusability of its evaluation workflows via attribute tracking, with the intention of adhering to the Findable, Accessible, Interoperable, and Reusable (FAIR) data principles and best practices. Each example will also subsample the dataset with regions of interest representing vulnerable infrastructure, political boundaries, and wildlife refuges, respectively. Through these examples, we will articulate how advances in geospatial evaluation workflows provide detailed insight into the performance of gridded model datasets in cross-cutting contexts.
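For a continuous variable such as total precipitation, subsampling to a region of interest amounts to restricting the error statistics to cells inside a mask. The NumPy sketch below assumes the region (for example, a wildlife refuge boundary) has already been rasterized to a boolean array on the same grid as both maps, and that missing values are NaN. The function name and the choice of mean error, mean absolute error, and RMSE are illustrative assumptions, not GVAL's interface.

```python
import numpy as np


def continuous_metrics(candidate, benchmark, roi_mask=None):
    """Mean error (bias), mean absolute error, and RMSE between two aligned grids.

    If roi_mask is given, only cells where it is True are evaluated.
    Cells that are NaN in either grid are ignored.
    """
    candidate = np.asarray(candidate, dtype=float)
    benchmark = np.asarray(benchmark, dtype=float)
    if candidate.shape != benchmark.shape:
        raise ValueError("grids must be aligned to the same shape")

    # Start from cells that hold data in both grids.
    valid = ~np.isnan(candidate) & ~np.isnan(benchmark)
    if roi_mask is not None:
        # Subsample: keep only cells inside the (pre-rasterized) region of interest.
        valid &= np.asarray(roi_mask, dtype=bool)

    error = candidate[valid] - benchmark[valid]
    if error.size == 0:
        raise ValueError("no valid cells to evaluate")

    return {
        "mean_error": float(error.mean()),
        "mae": float(np.abs(error).mean()),
        "rmse": float(np.sqrt((error ** 2).mean())),
    }


if __name__ == "__main__":
    candidate = np.array([[10.0, 12.0, 5.0], [0.0, np.nan, 3.0]])
    benchmark = np.array([[8.0, 12.0, 7.0], [1.0, 4.0, 3.0]])
    roi = np.array([[True, True, False], [True, True, False]])
    print("whole domain:", continuous_metrics(candidate, benchmark))
    print("region only: ", continuous_metrics(candidate, benchmark, roi))
```

In practice, the boolean mask could be produced from vector boundaries with a rasterization routine such as rasterio's `features.rasterize`. Comparing the masked and unmasked results shows how model performance within the region differs from the domain as a whole.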
