Computational Linguistics and the Communication of Weather Forecasts

Stern, Harvey

The Association for Computational Linguistics (ACL) defines the term Computational Linguistics as the scientific study of language from a computational perspective (https://www.aclweb.org/archive/misc/what.html). The ACL notes that computational linguists are interested in providing computational models of various kinds of linguistic phenomena and that these models may be knowledge-based (hand-crafted) or data-driven (statistical or empirical).

The ACL further notes that work in computational linguistics is in some cases motivated from a scientific perspective in that one is trying to provide a computational explanation for a particular linguistic or psycholinguistic phenomenon; and in other cases the motivation may be more purely technological in that one wants to provide a working component of a speech or natural language system.

The ACL observes that, indeed, the work of computational linguists is incorporated into many working systems today, including speech recognition systems, text-to-speech synthesizers, automated voice response systems, web search engines, text editors, and language instruction materials.

The current paper presents an analysis of the words used in a 12-year data set (2005-2017) of précis weather forecasts for Melbourne, Australia, with a view to:

Analysing the overall frequency of occurrence of particular words and phrases;
Noting any significant trends over the period in the nature of the language utilised to communicate the weather forecast information;
Establishing how one might best combine textual components of weather forecasts with numerical components (for example, precipitation amount and probability) in order to enhance the accuracy of the latter.

The ten most frequently occurring Day-1 précis weather forecasts issued by the Australian Bureau of Meteorology for Melbourne over the twelve years were PARTLY CLOUDY (8.4%), SHOWER OR TWO (7.8%), MOSTLY SUNNY (6.6%), FINE (6.2%), SUNNY (5.3%), FEW SHOWERS (3.3%), A FEW SHOWERS (3.1%), SHOWER OR TWO CLEARING (2.5%), BECOMING FINE (2.0%), and POSSIBLE SHOWER (1.7%).
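A frequency ranking of this kind can be sketched as a simple tally over the précis strings. The snippet below is only an illustration: the function name `precis_shares` and the sample list are assumptions made for this example, not the author's code or the Melbourne data set. Case and whitespace are normalised before counting, but differently worded forecasts (for instance FEW SHOWERS and A FEW SHOWERS) are kept distinct, as in the list above.

```python
from collections import Counter


def precis_shares(forecasts):
    """Return (précis, percentage) pairs, most frequent first."""
    # Normalise case and whitespace so 'Fine ' and 'FINE' count together.
    cleaned = [" ".join(f.upper().split()) for f in forecasts]
    counts = Counter(cleaned)
    total = len(cleaned)
    # An empty input yields an empty list, so no division by zero occurs.
    return [(text, 100.0 * n / total) for text, n in counts.most_common()]


# Made-up sample for illustration; not the Melbourne data set.
sample = ["Partly cloudy", "SHOWER OR TWO", "partly  cloudy", "Fine"]
shares = precis_shares(sample)
```

Taking the first ten entries of the returned list would give a top-ten table in the same form as the percentages quoted above.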

The most dramatic change in the language utilised relates to the précis FINE, which was used on 20% of occasions during the first year but was completely absent during the last year. By contrast, the précis PARTLY CLOUDY, which was not used at all during the first year, was used on 16% of occasions during the final year.
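A year-by-year comparison such as the one for FINE and PARTLY CLOUDY could be computed along the following lines. This is a minimal sketch under stated assumptions: the function name `yearly_share`, the `(year, précis)` record layout, and the sample records are all hypothetical, and the grouping key is assumed to be a calendar year, which may differ from the twelve-month periods used in the analysis.

```python
from collections import defaultdict


def yearly_share(records, phrase):
    """Map each year to the percentage of forecasts equal to `phrase`."""
    totals = defaultdict(int)  # forecasts issued per year
    hits = defaultdict(int)    # forecasts matching `phrase` per year
    target = " ".join(phrase.upper().split())
    for year, text in records:
        totals[year] += 1
        if " ".join(text.upper().split()) == target:
            hits[year] += 1
    return {year: 100.0 * hits[year] / totals[year] for year in sorted(totals)}


# Made-up (year, précis) records for illustration; not the Melbourne data set.
records = [
    (2005, "FINE"),
    (2005, "SUNNY"),
    (2017, "PARTLY CLOUDY"),
    (2017, "SUNNY"),
]
fine_trend = yearly_share(records, "fine")
```

A précis that disappears from use shows up as a share of 0.0 for the later years, and one that is newly introduced shows 0.0 for the earlier years.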

Preliminary work on how best to blend the textual and numerical components of weather forecasts, with a view to enhancing the accuracy of the numerical components, was presented by the author at last year's AMS Annual Meeting (https://ams.confex.com/ams/97Annual/webprogram/Paper315567.html).

The results of that work have been applied to the generation of forecasts by combining the output of an NWP model with the official Bureau of Meteorology predictions (both textual and numerical components).

FIGURE. Frequency of differently worded précis weather forecasts
