J9B.4 Developing and Evaluating Meteorologically Fine-Tuned Large Language Models for Weather Safety Education

Wednesday, 31 January 2024: 9:15 AM
338 (The Baltimore Convention Center)
Armani L. Cassel, NWS, Shreveport, LA; and B. Thorne and D. C. Hilderbrand

In November 2022, the debut and popularity of ChatGPT, a chatbot interface to a large language model (LLM) from OpenAI, pushed an already active artificial intelligence research field and industry into a new frenzy of development. Several chatbots and LLMs have been released since then, with multiple versions designed to be deployed from the cloud or run locally without the need for internet access. Some chatbots, like Khanmigo from Khan Academy, have also been fine-tuned or tailored to produce text with the persona (and patience) of a tutor in order to improve subject comprehension.

With this ability to tailor chatbots for specific purposes, LLMs have the potential to be used for meteorological text output. It is also likely that a significant amount of publicly available meteorological text has already been collected into the data sets on which multiple LLMs were trained. This research explores the development of a meteorologically tailored chatbot and how it can deliver text output fine-tuned for citizen science and citizen safety. The exploration starts with the combined texts of the Skywarn and NWS JetStream literature, which are used to train and inform an interactive Owlie Skywarn chatbot interface. In the deployment of this Owlie chatbot, users can ask Owlie weather-related questions and learn weather information from its responses.

As with any model, the quality of LLM output cannot be taken for granted, and the spread of LLM output into more software calls for scrutiny in more areas. The need for trustworthiness in a chatbot like this is heightened by the fact that it distributes potentially life-saving weather information. Beyond evaluating question-answer interactions, the Owlie chatbot will also be evaluated on its ability to effectively guide users through storm safety and storm spotting curricula. Ultimately, we examine how meteorological LLMs and chatbots can be developed and evaluated for quality, and we begin to explore how and where their output can be used to inform and prepare a Weather-Ready Nation.
