J21.4 Evaluating the Use of a Naïve Bayes Classification Algorithm to Identify Tweets Containing Observations of Flash Flooding

Thursday, 14 January 2016: 2:15 PM
Room 354 (New Orleans Ernest N. Morial Convention Center)
Brandon R. Smith, OU/CIMMS and NOAA/OAR/NSSL, Norman, OK; and J. J. Gourley, Y. Hong, C. Silva, and Z. L. Flamig

Although many observational databases currently collect information about flash flooding events, no single database correctly captures all of the associated characteristics. Using these databases in tandem can alleviate some of these issues, but drawbacks remain. An additional database built from reports gathered on social media, in this case Twitter, could help address them. Twitter can provide an easy and efficient way for individuals to distribute and collect information about high-impact events such as flash flooding. However, the large number of non-germane tweets typically obscures this information. This study examines the feasibility of using a Naïve Bayes machine-learning algorithm to classify tweets as either containing or not containing an observation of flash flooding. Using a total of six independent flash flooding events, the performance of three trained Naïve Bayes classification models in classifying tweets from these events is evaluated. Results show that the Naïve Bayes models typically perform well when classifying tweets from the local areas on which they were trained. When the models are tested on datasets from outside the geographic areas on which they were trained, their performance decreases. A combined Naïve Bayes model trained on classified tweets from different geographical areas helps to improve performance for events that occur outside its trained domain.
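
The classification step described above can be illustrated with a minimal sketch. The abstract does not state which Naïve Bayes variant, features, or smoothing the study used, so everything below is an assumption: a multinomial model over lowercased word counts with add-one (Laplace) smoothing, the hypothetical helper names `tokenize`, `train`, and `classify`, and a handful of invented example tweets that stand in for hand-labeled training data.

```python
import math
from collections import Counter, defaultdict

PUNCT = ".,!?:;#@\"'()"


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]


def train(labeled_tweets):
    """Count tweets and words per label from (text, label) pairs."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_tweets:
        doc_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return doc_counts, word_counts, vocab


def classify(model, text):
    """Return the label with the highest log posterior (assumed multinomial NB)."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Log prior: fraction of training tweets carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one smoothing keeps unseen (word, label) pairs above zero.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented tweets for illustration only; not data from the study.
TRAIN = [
    ("Water is over the road on Main St, cars stranded", "flood"),
    ("Creek is out of its banks and flooding the bridge", "flood"),
    ("Street flooding near campus, water rising fast", "flood"),
    ("Flash flood warning issued for the county until 9 PM", "other"),
    ("My inbox is flooded with email today", "other"),
    ("Great game last night, the crowd was loud", "other"),
]

model = train(TRAIN)
label = classify(model, "Water rising over the bridge, road is flooding")
```

Under these assumptions, a combined model of the kind the abstract mentions would amount to calling `train` on labeled tweets pooled from several regions rather than from a single one.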