15.1 Building Real-Time Machine Learning Forecast and Visualization Systems

Thursday, 20 July 2023: 2:00 PM
Madison Ballroom A (Monona Terrace)
David John Gagne II, Ph.D., NCAR, Boulder, CO; and C. Becker, G. Gantos, R. A. Sobash, D. A. Ahijevych, N. A. Snook, R. Chase, L. K. Spychalla, M. Shotande, A. McGovern, C. D. Wirz, C. K. Potvin, and M. G. Cains

As machine learning systems are developed and mature into products that could be used operationally, the requirements and constraints on those systems change significantly. Rather than making design choices to maximize a single test set metric, system designers need to consider how their choices affect the stability, robustness, and latency of the whole machine learning and visualization pipeline. Because these considerations shift between initial training and the operational transition period, many machine learning systems have struggled to make the transition to an operational environment. In this presentation, I will discuss some of the specific challenges of building real-time machine learning weather prediction systems and how my team has overcome them in the context of two projects we have transitioned into real-time systems. The first project is a system for analyzing storm mode from convection-allowing models. Some of the key challenges for this project were transitioning models from a fixed WRF reforecast dataset to an operational convection-allowing NWP model, accounting for differences in the storm segmentation process between training and operations, building an inference pipeline that could operate on both traditional HPC and cloud platforms, and building an interactive visualization on a static website with cloud-hosted prediction data. The second project is a short-term deep learning tornado prediction system trained on gridded radar data and applied to the Warn-on-Forecast System (WoFS) ensemble. To run in real time, this system required a redesign to minimize latency in the data processing pipeline, coordination with the WoFS team to transfer full 3D WoFS files securely from one cloud instance to another while the ensemble is running, and compression and visualization design choices to minimize latency in visualizing a large volume of ensemble predictions and associated metadata.
For both systems, we used a mix of Python and JavaScript tools to enable fast development and iteration of the machine learning and visualization pipelines. We also found feedback from forecasters and other operations-focused collaborators to be critical in making the systems more robust and useful for operational tasks.