J1.1 Wrangling Station Data with Big Data Tools and Approaches

Monday, 23 January 2017: 11:00 AM
Conference Center: Chelan 4 (Washington State Convention Center)
Eric Hewitt, Understory Weather, Madison, WI; and A. Kubicek, N. Homeier, S. Bussmann, and K. E. Willmot

As weather stations become smarter and more capable, it's important to understand the various approaches to dealing with big data. Data processing tools have grown rapidly in popularity and efficiency over the past decade. The big data ecosystem is a robust and diverse set of tools, communities, and techniques. Several architectures have been used to wrangle big data, from batch processing with MapReduce jobs on Hadoop to the Kappa architecture, which treats all data as a stream and can be built almost entirely on Spark. With such a vast ecosystem, architecting data infrastructure requires working knowledge of many different tools and approaches.
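To make the batch-versus-Kappa contrast concrete, here is a minimal, in-memory Python sketch; it does not use Hadoop or Spark. `batch_max` mimics the map, shuffle, and reduce phases of a MapReduce job, while `StreamingMax` keeps the same per-station result up to date one event at a time. In the Kappa style, reprocessing means replaying the append-only log through that same streaming code. The function names, station IDs, and temperature values are made up for illustration.

```python
from collections import defaultdict
from functools import reduce
from typing import Dict, Iterable, List, Tuple

# A reading is (station_id, value); IDs and values here are illustrative only.
Reading = Tuple[str, float]

def batch_max(readings: Iterable[Reading]) -> Dict[str, float]:
    """MapReduce-style batch job: map each reading to a key/value pair,
    group values by key (the shuffle), then reduce each group with max()."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for station, value in readings:  # map + shuffle
        groups[station].append(value)
    # reduce phase: collapse each station's values to a single maximum
    return {station: reduce(max, values) for station, values in groups.items()}

class StreamingMax:
    """Kappa-style processor: a single streaming code path that updates
    its state incrementally as each event arrives from the log."""

    def __init__(self) -> None:
        self.state: Dict[str, float] = {}

    def process(self, reading: Reading) -> None:
        station, value = reading
        self.state[station] = max(value, self.state.get(station, value))

def replay(log: Iterable[Reading]) -> Dict[str, float]:
    """Reprocess history by replaying the append-only log through the
    same streaming code, instead of running a separate batch job."""
    processor = StreamingMax()
    for reading in log:
        processor.process(reading)
    return processor.state

log = [("KC-001", 21.4), ("DAL-017", 30.2), ("KC-001", 23.9), ("DAL-017", 29.8)]
print(batch_max(log))
print(replay(log))
```

Both calls produce the same per-station maxima. The design point of Kappa is that only one code path exists, so there is no separate batch job whose logic must be kept in sync with the streaming job.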
Understory Weather has micronets in Kansas City and Dallas, covering the most densely populated parts of these metro areas with stations spaced every one to five kilometers. By blanketing a metro area with these stations, we can get a better picture of how mesoscale systems affect a city, accurately determine where hail fell, and assess the potential damage from a storm. The data these micronets generate, however, is immense and never stops coming. It is processed in real time on scalable cloud infrastructure, and email alerts are generated within seconds of severe weather being detected. Understory Weather partnered with broadcast meteorologists and insurance providers to pilot the real-time alerting and analysis capabilities of these micronets. Broadcasters were able to pinpoint the most damaging areas of a storm as it was occurring, and insurers were able to mobilize their catastrophe and claims teams faster and more precisely than had previously been possible.
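The alerting step could look roughly like the following sketch. The abstract does not describe Understory's detection logic, so this assumes a simple per-observation threshold check based on the National Weather Service severe thunderstorm criteria (gusts of at least 58 mph, or hail at least 1 inch in diameter). The `Observation` fields, station IDs, timestamps, and email addresses are all hypothetical. The message is only built here; `smtplib.SMTP.send_message` could deliver it.

```python
from dataclasses import dataclass
from email.message import EmailMessage
from typing import List, Optional

# Assumed thresholds from the NWS severe thunderstorm criteria:
# wind gusts >= 58 mph (about 25.9 m/s) or hail >= 1 inch (25.4 mm).
SEVERE_GUST_MS = 25.9
SEVERE_HAIL_MM = 25.4

@dataclass
class Observation:
    station_id: str   # hypothetical station naming scheme
    timestamp: str    # ISO 8601, UTC
    gust_ms: float    # peak wind gust, m/s
    hail_mm: float    # largest hailstone diameter, mm (0 if none)

def severe_reasons(obs: Observation) -> List[str]:
    """Return a human-readable reason for each severe threshold exceeded."""
    reasons = []
    if obs.gust_ms >= SEVERE_GUST_MS:
        reasons.append(f"wind gust {obs.gust_ms:.1f} m/s")
    if obs.hail_mm >= SEVERE_HAIL_MM:
        reasons.append(f"hail {obs.hail_mm:.0f} mm")
    return reasons

def build_alert(obs: Observation, recipient: str) -> Optional[EmailMessage]:
    """Build an email alert for a severe observation, or None if calm."""
    reasons = severe_reasons(obs)
    if not reasons:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"Severe weather at {obs.station_id}"
    msg["From"] = "alerts@example.com"
    msg["To"] = recipient
    msg.set_content(f"{obs.timestamp}: " + "; ".join(reasons))
    return msg

# Illustrative, made-up observations.
calm = Observation("KC-001", "2017-01-01T00:00:00Z", 8.0, 0.0)
storm = Observation("KC-042", "2017-01-01T00:00:00Z", 28.3, 31.0)
print(build_alert(calm, "claims@example.com"))
alert = build_alert(storm, "claims@example.com")
print(alert["Subject"])
print(alert.get_content())
```

Keeping detection (`severe_reasons`) separate from delivery (`build_alert`) lets the same checks feed other outputs, such as a broadcaster dashboard, without duplicating the thresholds.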
A modified Kappa architecture was used to support the real-time response of the Understory network. Data passes through a high-throughput message queue and a complex event processor, which routes it to the correct destination, in the right format, as quickly as possible. Applying the lessons of big data, we were able to build a fast, reliable, and, most importantly, scalable infrastructure to handle the ever-increasing load of sensor data.
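The queue-plus-router pattern can be sketched in the same spirit. Python's standard `queue.Queue` stands in for the high-throughput message queue (the abstract does not name a product), and a list of predicate/formatter/sink rules stands in for the complex event processor. This is a simplification: these rules inspect one event at a time, whereas a full complex event processing engine can also match patterns across many events over time windows. The sink names, field names, and the 25.4 mm hail threshold are assumptions.

```python
import json
import queue
from typing import Any, Callable, List, Tuple

# Hypothetical sinks: in a real deployment these would be a database, an
# alerting service, and so on. Here each one just collects formatted records.
archive: List[str] = []
alerts: List[dict] = []

# A routing rule pairs a predicate with a formatter and a destination sink.
Rule = Tuple[Callable[[dict], bool], Callable[[dict], Any], List[Any]]

SEVERE_HAIL_MM = 25.4  # assumed threshold: 1-inch hail

RULES: List[Rule] = [
    # Archive every event as a JSON line with sorted keys.
    (lambda e: True, lambda e: json.dumps(e, sort_keys=True), archive),
    # Forward only severe-hail events to the alert sink, in a trimmed format.
    (lambda e: e.get("hail_mm", 0.0) >= SEVERE_HAIL_MM,
     lambda e: {"station": e["station_id"], "hail_mm": e["hail_mm"]},
     alerts),
]

def route(inbox: "queue.Queue[dict]") -> None:
    """Drain the queue, sending each event to every sink whose rule matches."""
    while True:
        try:
            event = inbox.get_nowait()
        except queue.Empty:
            return
        for matches, fmt, sink in RULES:
            if matches(event):
                sink.append(fmt(event))
        inbox.task_done()

# Illustrative, made-up events.
inbox = queue.Queue()
inbox.put({"station_id": "DAL-003", "hail_mm": 12.0})
inbox.put({"station_id": "DAL-017", "hail_mm": 32.0})
route(inbox)
print(archive)
print(alerts)
```

Because an event can match several rules, the same observation is archived in full and, when it is severe, also forwarded in a compact alert format.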