J10B.1 Identifying Societal Vulnerabilities and Resilience Related to Weather Using Newspapers and Artificial Intelligence

Wednesday, 31 January 2024: 10:45 AM
338 (The Baltimore Convention Center)
Renee Sieber, McGill Univ., Montreal West, QC, Canada; and F. Fabry, V. Slonosky, M. Wang, and Y. Zhang

We sought to determine societal vulnerabilities and resilience to weather events, and how they have changed since the 19th century. In the absence of standardized records of these kinds of impacts (e.g., properties and people swept away by floods, breaks and falls due to ice accumulation, power outages and road closures), we hypothesized that 1) these events would be worth writing about in traditional or social media, and 2) the greater the discussion about an event, the more impactful it must have been. From these writings, we may even learn which aspects of weather events most affected people, as well as some of the failings in our preparation for, or response to, those events.

We used classification algorithms to analyze two 20-year tranches of newspaper articles from Southern Quebec, a region whose weather also reflects that of the North American Northeast. One tranche covers a recent period (1995-2014) and the other a historical period (1880-1899). We looked for 1) the types of events that concerned journalists then and now, 2) word clusters that may associate a type of event with suggested vulnerabilities, whether causes, impacts, or clues hinting at reasons for insufficient preparedness or resilience, and 3) comparisons of similar clusters to detect change over time. Our hope was that automation via these classification algorithms could reveal insights, free of preconceived ideas, about how people coped in the past and how those coping mechanisms have changed over time.
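The third aim, comparing similar clusters across the two tranches, can be illustrated with a small sketch. It assumes each cluster is summarized by its top words and that similarity is measured by Jaccard overlap; the abstract does not state which measure was used, and the cluster labels and word lists below are invented for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of words."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_clusters(historical, recent):
    """For each historical cluster, find the most similar recent cluster.

    Both arguments map a cluster label to its list of top words.
    Returns {historical_label: (best_recent_label, similarity)}.
    """
    matches = {}
    for h_label, h_words in historical.items():
        best = max(recent, key=lambda r: jaccard(h_words, recent[r]))
        matches[h_label] = (best, jaccard(h_words, recent[best]))
    return matches


# Hypothetical top words per cluster (not taken from the study).
HISTORICAL = {
    "h0": ["flood", "river", "ice", "bridge"],
    "h1": ["snow", "street", "sleigh", "cold"],
}
RECENT = {
    "r0": ["snow", "street", "removal", "cold"],
    "r1": ["flood", "river", "basement", "power"],
}

MATCHES = match_clusters(HISTORICAL, RECENT)
```

Words that appear in only one member of a matched pair (here "sleigh" versus "removal") are the kind of difference that could point to a change in impacts or coping mechanisms over time.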

To achieve this, we created a workflow that consisted of selecting newspapers, identifying weather types and associated keywords, filtering the newspapers, segmenting newspaper articles, and performing optical character recognition (OCR, which itself comprises multiple steps, including bounding-box detection and pixel-to-character comparison). We then conducted a preliminary AI-enabled visualization, preprocessed the corpus for classification (stemming, lemmatization), analyzed the corpus with classification methods (unsupervised, semi-supervised), and finally compared the results from the two temporal tranches. Our newspapers included one English-language newspaper (The Montreal Gazette) and two French-language newspapers (La Patrie and La Presse).
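The keyword-filtering step of such a workflow might look like the following minimal sketch. The keyword lists are hypothetical (the abstract does not give the terms actually used), and accent stripping is an assumed normalization meant to make matching more tolerant of OCR output in a bilingual corpus.

```python
import re
import unicodedata

# Hypothetical keyword lists, written without accents to match normalized text.
WEATHER_KEYWORDS = {
    "flood": {"flood", "flooding", "inondation", "inondations"},
    "snow": {"snow", "snowstorm", "neige"},
    "ice": {"ice", "freezing", "verglas"},
}


def normalize(text):
    """Lowercase the text and strip accents (e.g. 'causé' becomes 'cause')."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(text):
    """Split normalized text into alphabetic tokens."""
    return re.findall(r"[a-z]+", normalize(text))


def weather_types(article):
    """Return the sorted weather types whose keywords occur in the article."""
    tokens = set(tokenize(article))
    return sorted(t for t, kws in WEATHER_KEYWORDS.items() if tokens & kws)


def filter_articles(articles):
    """Keep only articles that mention at least one weather keyword."""
    kept = []
    for article in articles:
        types = weather_types(article)
        if types:
            kept.append((article, types))
    return kept
```

Exact token matching of this kind is deliberately simple; stemming or lemmatization, as mentioned above, would be needed to catch inflected forms that are not listed explicitly.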

Robust AI analysis requires large volumes of digitized text. Contrary to examples in the literature, building the corpus remained our largest challenge. We encountered issues arising from the digitization and OCR of print newspapers, both historical and current. The legibility of the text, and hence word recognition, depended heavily on the quality of the digital scan and the OCR processing. Obtaining a sufficient volume of relevant, good-quality digitized output (txt, xml) proved to be an unexpected obstacle that complicated our analysis.

Classification algorithms look for patterns in data and transform them into useful concepts. In our case, we were interested in clusters that we could label as types of vulnerabilities and resilience. We began with an unsupervised method, Latent Dirichlet Allocation (LDA), and had hoped to move on to supervised classification. Because of the specialized and targeted nature of our research question, it proved difficult to find enough articles within the two time periods to build a corpus large enough for many AI algorithms. Given this data sparsity, we moved to Bidirectional Encoder Representations from Transformers (BERT). One of the first large language models, BERT had the advantage of already being pre-trained on natural language text.
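To make the LDA step concrete, here is a minimal sketch of LDA fitted by collapsed Gibbs sampling in plain Python. It is an illustration only: the abstract does not say which implementation, hyperparameters, or number of topics were used, so the values of alpha, beta, and n_topics and the toy corpus below are all assumptions.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (vocab, phi, doc_topic), where phi[k][v] is the estimated
    probability of word v under topic k and doc_topic[d][k] counts the
    tokens of document d currently assigned to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    phi = [
        [(topic_word[k][v] + beta) / (topic_total[k] + n_words * beta)
         for v in range(n_words)]
        for k in range(n_topics)
    ]
    return vocab, phi, doc_topic


def top_words(vocab, phi, n=3):
    """Return the n most probable words for each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in phi
    ]


# Toy corpus of already-tokenized articles (invented for illustration).
DOCS = [
    ["flood", "river", "thaw", "ice", "flood"],
    ["snow", "street", "removal", "snow", "cold"],
    ["river", "flood", "bridge", "thaw"],
    ["cold", "snow", "heater", "street"],
]
```

Calling `top_words(*lda_gibbs(DOCS, n_topics=2)[:2])` lists the most probable words per topic, which an analyst would then inspect and label by hand. With a corpus this small the topics are unstable, which mirrors the data-sparsity problem described above.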

We report on preliminary results. Floods, rather than winter weather, proved to be the type of event discussed in the newspapers. However, flooding did not show up clearly in the classifications. We postulate that this is because several different meteorological events can lead to flooding (ice thaws, heavy precipitation), which makes flooding appear more generalized in the cluster analysis and spreads it over several clusters. Snow/thaw/freeze sequences and their impacts (e.g., “hospital power outage”, “snow removal”, “use of indoor heaters”) were good examples of impact events for which society is often unprepared. Preliminary results also show differences between the two corpora (e.g., increasing impacts on infrastructure).
