Thursday, 1 February 2024: 4:30 PM
327 (The Baltimore Convention Center)
As machine learning (ML) systems move from initial training and offline evaluation to real-time operation and use by different stakeholders, the priorities and constraints on the system change significantly, and issues masked by bulk statistics and by the differing culture of a research environment become readily apparent. In the past year, the NSF AI Institute for Research on Trustworthy AI in Weather, Climate, and Coastal Oceanography (AI2ES) has been developing real-time testbeds for some of the AI products developed as part of the institute’s research, in collaboration with both government and industry partners, including NOAA, The Weather Company, and Vaisala. Applications for the testbed include tornadic storms, winter precipitation type, frontal analysis, and coastal hazards (fog, flooding, and cold stunning of sea turtles). In building the testbed, we have learned many important lessons about transitioning these products closer to operations and about the central roles that communication and shared understanding between developers and stakeholders play. Providing potential adopters with a low-cost way to experience AI product output in real time facilitates feedback, creates unique opportunities for user-centered research, and ultimately helps justify the expense of a research-to-operations (R2O) transition into an operational government or industry environment. We have learned the value of developing ML systems that can be applied to multiple types of NWP models for inter-model comparison, of documenting and transferring ML model configurations, of developing low-latency visualization systems, and of identifying and fixing issues with ML models revealed by real-time predictions; we have also learned that the downstream added value of an ML system is not necessarily tied to its test-set score. These lessons and more recent updates to the testbed will be discussed.

