2.3 Bytes from Petabytes: Extracting Information Out of Massive Meteorological Datasets

Monday, 29 January 2024: 11:15 AM
324 (The Baltimore Convention Center)
Tiago Quintino, ECMWF, Reading, United Kingdom; and M. Leuridan, J. N. Hawkes, C. Bradley, E. Betke, S. Smart, B. Raoult, and N. Wedi

Numerical Weather Prediction (NWP) is a highly data-intensive HPC application. Today, ECMWF operational weather forecasts generate massive amounts of I/O in short bursts, accumulating to 320 TiB per day in one-hour forecast cycle windows. With model improvements and higher-resolution forecasting, however, this raw data volume is expected to grow to over a petabyte per day over the next few years, especially under the European Union Destination Earth initiative.

While these improvements should truly help scientists better forecast weather events, distributing such vast amounts of data efficiently will prove increasingly difficult with the current data access mechanisms.

To tackle this challenge, ECMWF has developed a novel feature extraction concept, named “Polytope”. By leveraging tools from the field of higher-dimensional computational geometry, Polytope is able to efficiently cut a wide range of intricate n-dimensional shapes (polytopes) from ECMWF’s high-dimensional (6D/7D) weather forecast datacubes. Polytope can be used to perform server-side feature extraction, providing multiple orders of magnitude of data reduction before delivering the data to the user.
Most importantly, we couple this selection algorithm with new functionality that retrieves only the required bytes from the packed, gigabyte-size forecast fields, reducing the I/O needed to satisfy a user request to the bare minimum, even for compressed data. All of this happens directly on IFS model GRIB output, without any data preparation or processing.
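
The idea of reading only the required bytes can be illustrated with a minimal sketch. It is not Polytope's implementation and not the GRIB format: it assumes a toy field stored as fixed-width 16-bit unsigned integers behind a 16-byte placeholder header, decoded as `reference + scale * packed`. The names `pack_field`, `read_points`, `reference` and `scale` are hypothetical.

```python
import io
import struct

BYTES_PER_VALUE = 2  # assumption: every value is packed into 16 bits


def pack_field(values, reference, scale):
    """Toy encoder: store each value as a big-endian unsigned 16-bit integer."""
    packed = [round((v - reference) / scale) for v in values]
    return struct.pack(f">{len(packed)}H", *packed)


def read_points(stream, header_size, indices, reference, scale):
    """Seek to and decode only the bytes holding the requested grid points."""
    decoded = []
    bytes_read = 0
    for i in indices:
        stream.seek(header_size + i * BYTES_PER_VALUE)
        raw = stream.read(BYTES_PER_VALUE)
        bytes_read += len(raw)
        (packed_value,) = struct.unpack(">H", raw)
        decoded.append(reference + scale * packed_value)
    return decoded, bytes_read


# Build a toy "field" of 100,000 grid points in memory.
header = b"HDR0" * 4  # 16-byte placeholder header, purely illustrative
values = [250.0 + 0.01 * i for i in range(100_000)]
reference, scale = 250.0, 0.02
message = header + pack_field(values, reference, scale)

wanted = [12, 40_000, 99_999]
stream = io.BytesIO(message)
decoded, bytes_read = read_points(stream, len(header), wanted, reference, scale)
print(f"read {bytes_read} of {len(message)} bytes")
```

Because the packing width is fixed, the byte offset of any grid point follows directly from its index, so the reader never touches the rest of the field. Real packed or compressed fields need more bookkeeping (arbitrary bit widths, compressed blocks), which this sketch deliberately leaves out.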

Practical examples include requesting time series or vertical profiles for arbitrary points on the globe, without any special pre-processing. Alternatively, we can extract weather data along a 4-dimensional flight path, crossing the three spatial axes as well as the temporal axis. Instead of returning data for the entire bounding box of the flight path, Polytope returns only the few precise bytes of interest to the user.
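
The difference between a flight-path extraction and a bounding-box request can be sketched on a toy datacube. The example below is only an illustration under stated assumptions: the cube layout (time, level, lat, lon), the grid, the waypoints and the helper names `nearest_indices` and `extract_path` are all hypothetical, and it uses plain nearest-neighbour point selection with NumPy rather than Polytope's geometric slicing.

```python
import numpy as np

# Hypothetical toy datacube with axes (time, level, lat, lon).
axes = {
    "time": np.arange(24.0),                                  # forecast hour
    "level": np.array([1000.0, 850.0, 700.0, 500.0, 300.0, 250.0, 200.0]),  # hPa
    "lat": np.linspace(-90.0, 90.0, 91),                      # 2-degree grid
    "lon": np.linspace(0.0, 358.0, 180),                      # 2-degree grid
}
rng = np.random.default_rng(0)
cube = rng.random((24, 7, 91, 180), dtype=np.float32)


def nearest_indices(axis_values, targets):
    """Index of the nearest axis value for each target coordinate."""
    targets = np.asarray(targets, dtype=float)
    return np.abs(axis_values[None, :] - targets[:, None]).argmin(axis=1)


def extract_path(cube, axes, waypoints):
    """Return one value per (time, level, lat, lon) waypoint, plus the indices used."""
    idx = tuple(
        nearest_indices(axes[name], waypoints[name])
        for name in ("time", "level", "lat", "lon")
    )
    return cube[idx], idx


# Hypothetical flight path: climb, cruise, descend.
waypoints = {
    "time": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    "level": [1000.0, 500.0, 250.0, 250.0, 500.0, 1000.0],
    "lat": [50.0, 46.0, 40.0, 34.0, 28.0, 22.0],
    "lon": [10.0, 28.0, 46.0, 64.0, 82.0, 100.0],
}
path_values, idx = extract_path(cube, axes, waypoints)

# Number of grid points a bounding-box request around the same path would return.
box_points = int(np.prod([i.max() - i.min() + 1 for i in idx]))
print(f"path: {path_values.size} values, bounding box: {box_points} values")
```

Even on this small grid the bounding box contains several orders of magnitude more values than the path itself, which is the reduction the abstract describes.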

This presentation will introduce the Polytope concept and demonstrate its usage for different types of feature extraction directly on model output inside the HPC system, without requiring any further post-processing of the data.
