A Study on a Weather AI-Search Engine using Speech Recognition and Natural Language Processing

Kim, Byeongyeon; Kim, Byeongyeon

This study aims to develop a search engine that can quickly and accurately search for various weather data for forecasting tasks.
The data are provided by the Korea Meteorological Administration’s (KMA) Integrated Weather Information System (COMIS5), and the search engine uses speech recognition and natural language processing.
The natural language processing model was based on mT5-small (Xue et al., 2020) and the speech recognition model employed Whisper medium (Radford et al., 2022).
We created pseudo-URL training data by crawling the KMA webpages and mapping them to keywords.
To address the imbalance in search results, we augmented the pseudo URLs and additionally trained the model on synonym and spacing dictionaries for meteorological terms.
The speech recognition model was trained on a general Korean speech recognition dataset, forecast-domain speech data, and TTS-synthesized speech.
The speech recognition model's average inference time was 0.63 s, and its character error rate (CER) was 8.59 and 9.24 on the general and forecast-domain datasets, respectively.
The natural language processing model achieved a mean reciprocal rank (MRR) of 0.73 and a recall of 0.73.
This study is expected to reduce the time spent searching for reference materials needed for forecasting and to improve the efficiency of, and satisfaction with, forecasting tasks.
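
For readers unfamiliar with the reported metrics, the following is a minimal Python sketch of how a character error rate, MRR, and recall are commonly computed. It is not the authors' evaluation code. The function names are hypothetical, CER is expressed here as a percentage, and recall is taken to be recall over the top k results with a single relevant URL per query; the abstract does not state any of these choices, so they are assumptions.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level character error rate in percent (assumed convention)."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    return 100.0 * total_edits / total_chars


def mean_reciprocal_rank(ranked: Sequence[Sequence[str]],
                         relevant: Sequence[str]) -> float:
    """Mean of 1/rank of the relevant URL; a query with no hit contributes 0."""
    total = 0.0
    for results, target in zip(ranked, relevant):
        if target in results:
            total += 1.0 / (results.index(target) + 1)
    return total / len(ranked)


def recall_at_k(ranked: Sequence[Sequence[str]],
                relevant: Sequence[str], k: int) -> float:
    """Fraction of queries whose relevant URL appears in the top k results."""
    hits = sum(1 for results, target in zip(ranked, relevant)
               if target in results[:k])
    return hits / len(ranked)
```

Under these assumptions, an MRR of 0.73 would mean that the reciprocal of the rank of the correct page, averaged over the test queries, is 0.73.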

J9B.6 A Study on a Weather AI-Search Engine using Speech Recognition and Natural Language Processing