Large Language Models for Classifying Flash Flood Impacts

Gourley, Jonathan

Floods are among the most pervasive and increasingly widespread natural hazards. Flash floods are dangerous
and often acute events that leave little time to alert the public to imminent hazards to life and property. In
late 2019, the National Weather Service (NWS) transitioned its warnings to an impact-based format
known as impact-based warnings (IBWs). This format applies to flash flood warnings and is intended to
provide detailed information about the hazard, its source, an impact narrative, and a flash flood damage
threat tag. These damage threat tags and the additional warning information aim to elicit different calls to
action for the emergency management community and the public (e.g., Wireless Emergency Alerts).
Earlier, arduous attempts used pre-trained deep learning-based language models
(e.g., BERT) for natural language processing (NLP) to classify local storm reports (LSRs) into impact
categories (i.e., damage threat tags). Because they relied on a severely limited number of impact-labeled
(IBW-based) reports, these efforts yielded results consistent with modeling a highly complex natural
language classification task under severe data scarcity. A turning point in methods and
results came with highly performant, widely available, and affordable pre-trained
large language models (LLMs) such as ChatGPT, used in conjunction with the Flash Flood Severity Index (FFSI): a
systematically conceived framework for assessing (and therefore classifying) flash flood severity from textual
flash flood reports (Schroeder et al., 2016).

This work presents a novel LLM-based workflow for classifying hazard reports into impact categories,
which yielded a new dataset of flash flood reports with their associated impact classifications.
Specifically, this work showcases the automatic classification of flash flood LSRs into FFSI impact categories using
ChatGPT and prompts engineered to incorporate textualized forms of the FFSI impact definitions from
the published literature. Report classifications were verified and contrasted with Flooded Locations and
Simulated Hydrographs (FLASH) product outputs for each LSR's flood event within a given spatio-temporal
3D window (search radius δr [km], search time ±δt [h]). This approach was first tested on a significant flash
flooding event that occurred in Kentucky in July 2022 and was subsequently used
to classify over 22,000 historical LSRs between May 2018 and June 2022. This unprecedented
dataset underpins the ongoing development of new experimental FLASH products,
which aim to inform forecasters of the anticipated impacts for any given flash flood forecast, in line with
the NWS's recently implemented IBW framework.
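
As a rough illustration of the two steps described above, the following Python sketch assembles a classification prompt from textual category definitions and selects the model outputs that fall inside a report's search window (δr, ±δt). It is a minimal sketch under stated assumptions, not the workflow used in this study: the function names, the record fields (lat, lon, time, id), the category labels and their placeholder definitions, and the example coordinates and times are all hypothetical. The great-circle (haversine) distance is also an assumption, since the abstract does not say how distance was measured. No LLM is called here; the prompt string would be sent to the model in a separate step.

```python
import math
from datetime import datetime, timedelta


def build_prompt(report_text, category_definitions):
    """Assemble one classification prompt from category definitions and a report."""
    lines = ["Classify the flash flood report into exactly one category.", "", "Categories:"]
    for label, definition in category_definitions.items():
        lines.append(f"- {label}: {definition}")
    lines += ["", f"Report: {report_text}", "Answer with the category label only."]
    return "\n".join(lines)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def in_window(report, sample, delta_r_km, delta_t_h):
    """True if the sample lies within delta_r_km and +/- delta_t_h of the report."""
    distance = haversine_km(report["lat"], report["lon"], sample["lat"], sample["lon"])
    time_gap = abs(sample["time"] - report["time"])
    return distance <= delta_r_km and time_gap <= timedelta(hours=delta_t_h)


# Placeholder definitions: NOT the published FFSI text, which would be
# substituted here from the literature.
definitions = {
    "cat_1": "<definition text for the first impact category>",
    "cat_2": "<definition text for the second impact category>",
}
prompt = build_prompt("Water over road, several cars stranded.", definitions)

# Made-up report and model samples, for illustration only.
t0 = datetime(2022, 7, 28, 6, 0)
report = {"lat": 37.50, "lon": -83.30, "time": t0}
samples = [
    {"id": "a", "lat": 37.52, "lon": -83.30, "time": t0 + timedelta(hours=1)},   # near in space and time
    {"id": "b", "lat": 39.50, "lon": -83.30, "time": t0 + timedelta(hours=1)},   # too far away
    {"id": "c", "lat": 37.50, "lon": -83.30, "time": t0 + timedelta(hours=12)},  # too late
]
matches = [s for s in samples if in_window(report, s, delta_r_km=10.0, delta_t_h=3.0)]
matched_ids = [s["id"] for s in matches]
```

Passing the definitions in as a dictionary keeps the prompt text separate from the code, so the category wording can be revised without touching the matching logic. The window values (10 km, 3 h) are arbitrary here and would be tuned for the application.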

Acknowledgements: This research was funded by the NOAA/OAR Joint Technology Transfer Initiative
(JTTI) program under Federal Award No. NA20OAR4590354, U.S. Department of Commerce, National
Oceanic and Atmospheric Administration.

J9B.3 Large Language Models for Classifying Flash Flood Impacts