Dredging Data Discovery with Datashader

Martin, Thomas Kallstrom

Dredging is one of the core business lines of the U.S. Army Corps of Engineers (USACE). Each year, more than 1.4 billion dollars of dredging activity takes place to ensure safe navigation and to support national and international commerce. Dredging data, including bathymetric surveys, GPS traces of dredge vessel activity, time series from onboard sensors, dredging activity reports, and sediment surveys, are vast and complex, and are typically archived in separate data silos.

In this paper we discuss early progress toward unifying all dredging data from the various USACE archival systems, along with supporting data from USGS and NOAA, to produce a single operational overview of historical USACE dredging activity.

To study how integration and visualization of dredging data scale, the team unified all dredging-related data for Galveston Bay and the Houston Ship Channel using Python tools: pandas, Matplotlib, and Jupyter Notebooks. This pilot study showed the value of unifying the related data, since the impact of major hurricane and flooding events was evident in the dredging surveys. It also exposed scaling problems: processing and visualization slowed dramatically as more data was included.

Integrating and visualizing all dredging data for the coastal waters and inland rivers of the United States required a more scalable solution. After the pilot study, the team moved to code that uses Dask for automated parallelization and chunking; GeoViews, part of the PyViz family of Bokeh extensions, for plotting; and Datashader, a Python library that aggregates huge out-of-memory datasets into a view-optimized heat map. With this scalable compute and visualization architecture, the team was able to unify more than half a billion data items into a single comprehensive view of USACE dredging activity.
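
The idea behind this architecture can be illustrated with a minimal sketch. It is not the team's code and it does not use the Dask or Datashader APIs. It only shows the underlying principle with NumPy: points are read one chunk at a time and counted into a fixed-size pixel grid, so memory use depends on the size of the view and not on the number of points. The function names, the synthetic coordinates (loosely placed near Galveston Bay), and the grid size are all assumptions made for illustration.

```python
import numpy as np


def aggregate_in_chunks(chunks, x_range, y_range, width=400, height=300):
    """Count points per pixel, one chunk at a time.

    `chunks` is any iterable of (x_array, y_array) pairs. Only one chunk
    and the (height, width) grid are held in memory at once.
    """
    grid = np.zeros((height, width), dtype=np.int64)
    for xs, ys in chunks:
        # histogram2d returns an array shaped (bins[0], bins[1]); passing
        # y first makes rows correspond to y and columns to x.
        counts, _, _ = np.histogram2d(
            ys, xs, bins=(height, width), range=(y_range, x_range)
        )
        grid += counts.astype(np.int64)
    return grid


def synthetic_chunks(n_chunks, chunk_size, seed=0):
    """Yield hypothetical (lon, lat) chunks; stands in for reading files."""
    rng = np.random.default_rng(seed)
    for _ in range(n_chunks):
        lon = rng.normal(-94.9, 0.1, chunk_size)
        lat = rng.normal(29.5, 0.1, chunk_size)
        yield lon, lat


# Hypothetical view extent in degrees (assumed, not from the paper).
x_range = (-95.4, -94.4)
y_range = (29.0, 30.0)

grid = aggregate_in_chunks(synthetic_chunks(10, 100_000), x_range, y_range)

# A log transform keeps sparse pixels visible next to dense ones
# before the counts are mapped to colors in a heat map.
shaded = np.log1p(grid)
```

In a pipeline like the one described above, Dask would take the role of the chunk loop, running the per-chunk aggregation in parallel, and Datashader would take the role of the binning and shading steps. Points that fall outside the view extent are simply not counted, which is what makes the result view-optimized.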
