Fine-tuning AI Language Translation Models for NWS Applications

Stiefel, Phil

Of the 18 people killed by Hurricane Ida in 2021, more than half did not speak English or Spanish. In response to this tragedy, the National Weather Service (NWS) accelerated its efforts to identify emerging technologies that could:

extend existing Spanish translation to additional weather products that had not previously been translated and
provide at-scale language translation for multiple new languages that had never before been nationally supported by NWS.

The NWS faced numerous technical barriers as it sought to implement an at-scale translation strategy for Spanish, Samoan, and additional languages:

Existing Machine Translation (MT) systems did not meet NWS timeliness or accuracy requirements for translating high-impact weather alerts into Spanish and additional languages without further human review. NWS holds this task to a high quality bar because a software-generated mistranslation in a weather alert could cause recipients to take incorrect response actions, potentially leading to death, injury, or property loss.
While NWS possessed some language translation model training data for Spanish tropical products, it lacked Spanish training data for other weather types (e.g., snow squalls) and for additional text products.
NWS did not have an internal capability to produce sufficient quantities of training data for languages beyond Samoan and Spanish.
MT models would eventually go stale without extensive retraining, leading to a progressive degradation in translation accuracy.

To overcome these hurdles, NWS partnered with Lilt to fine-tune Lilt’s AI language translation models to correctly translate NWS-specific content. Lilt’s software is an AI-enabled supervised learning system that (a) interacts with human linguists as they translate text and (b) learns from those interactions to produce more accurate translations through instant, automatic, and continuous model retraining. The NWS and Lilt observed that this supervised feedback loop, driven by bilingual forecasters at NWS San Juan, could be used to train highly accurate MT models for translating NWS content across multiple languages. The accuracy of the trained system was assessed using BLEU scores and Word Prediction Accuracy (WPA), the percentage of words predicted by the system that a human translator accepts.
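To make the two accuracy measures concrete, here is a minimal sketch of each, written under simplifying assumptions and not taken from Lilt's or NWS's actual evaluation code. WPA is modeled by assuming that each system suggestion is aligned word-for-word with the word the linguist finally kept; real interactive-translation logging is likely more involved. BLEU is the standard single-reference, unsmoothed formulation (clipped n-gram precisions up to 4-grams, uniform weights, and a brevity penalty). The function names `word_prediction_accuracy` and `sentence_bleu` are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def word_prediction_accuracy(predicted: Sequence[str], accepted: Sequence[str]) -> float:
    """Percentage of system-predicted words that the translator kept.

    Simplifying assumption (hypothetical): predicted[i] is the system's
    suggestion for position i of the final translation, and accepted[i]
    is the word the linguist actually kept at that position.
    """
    if len(predicted) != len(accepted):
        raise ValueError("predicted and accepted must be aligned word-for-word")
    if not predicted:
        return 0.0
    hits = sum(p == a for p, a in zip(predicted, accepted))
    return 100.0 * hits / len(predicted)


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count every contiguous n-gram in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis: Sequence[str], reference: Sequence[str], max_n: int = 4) -> float:
    """Unsmoothed single-reference BLEU with uniform n-gram weights (0 to 1)."""
    if not hypothesis:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hypothesis, n)
        ref_counts = ngrams(reference, n)
        total = sum(hyp_counts.values())
        # Clip each hypothesis n-gram count at its count in the reference.
        clipped = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing: any zero n-gram precision makes BLEU zero
        log_precision_sum += math.log(clipped / total) / max_n
    c, r = len(hypothesis), len(reference)
    # Brevity penalty punishes hypotheses shorter than the reference.
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum)


if __name__ == "__main__":
    reference = "flash flood warning in effect until noon".split()
    hypothesis = "flash flood warning in effect until".split()
    print(sentence_bleu(hypothesis, reference))
    print(word_prediction_accuracy(
        ["Hurricane", "warning", "in", "effect"],
        ["Hurricane", "watch", "in", "effect"],
    ))
```

In the usage example, every n-gram of the truncated hypothesis appears in the reference, so the score is reduced only by the brevity penalty, exp(1 - 7/6). Published MT scores are usually corpus-level BLEU on a 0 to 100 scale computed with a standard tool such as sacreBLEU rather than a hand-rolled sentence-level function like this one.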

The accuracy improvements demonstrated in this experiment provided a pathway for NWS to scale translation to multiple languages that had never been supported at scale by NWS in the past, such as Simplified Chinese. In the future, the NWS plans to apply this approach to at least 8 additional languages. This talk will present further details about the Lilt AI system, an overview of available AI translation industry accuracy scores, and quantitative accuracy results achieved from the NWS language models and pilots.

2.4 Fine-tuning AI Language Translation Models for NWS Applications