J2.4 Geocaching with Geohashing—Scaling Weather APIs with Python and Spark for Big Data Machine Learning

Monday, 13 January 2020: 11:15 AM
157AB (Boston Convention and Exhibition Center)
Alexander Kalmikov, QuantumBlack, a McKinsey Company, Cambridge, MA; and Y. Zhu, L. Zhang, and J. Annor
Manuscript (13.3 MB)

Handout (13.3 MB)

Implementing machine learning pipelines for weather data-driven advanced analytics applications encounters a scalability bottleneck when existing weather API infrastructure is merged with the high-volume, high-speed data requirements of industrial AI systems. An algorithmic solution based on geospatial hash tables reduces the latency due to API access time and reduces the redundancy of pulling similar weather data at neighboring locations. We demonstrate implementations of geocaching solutions with Python and Spark, and quantify the resulting algorithmic efficiency gains. Multi-objective trade-offs in hashing function optimization are illustrated, highlighting the engineering considerations in Big Data and Cloud Computing strategies for scalable geospatial AI applications.
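The geocaching idea described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: it assumes the standard base-32 geohash encoding as the hashing function, and the names `geohash_encode`, `GeoCache`, and the `fetch` callable (a stand-in for any weather API client) are hypothetical. The `precision` parameter is the trade-off knob: shorter hashes give larger cells and more cache hits, at the cost of coarser spatial resolution.

```python
from typing import Any, Callable, Dict

# Standard geohash base-32 alphabet (digits plus letters, without a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 5) -> str:
    """Encode a latitude/longitude pair as a geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


class GeoCache:
    """Hash-table cache keyed by geohash cell (hypothetical helper, not a library class).

    `fetch` is an assumed callable (lat, lon) -> weather record; it is invoked
    only for the first request that falls into each geohash cell.
    """

    def __init__(self, fetch: Callable[[float, float], Any], precision: int = 5):
        self._fetch = fetch
        self._precision = precision
        self._store: Dict[str, Any] = {}
        self.hits = 0
        self.misses = 0

    def get(self, lat: float, lon: float) -> Any:
        key = geohash_encode(lat, lon, self._precision)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._fetch(lat, lon)
        return self._store[key]
```

With `cache = GeoCache(fetch=my_api_call, precision=5)`, repeated calls to `cache.get(lat, lon)` for locations in the same cell (roughly 5 km across at precision 5) trigger a single API request, and `cache.hits` and `cache.misses` give a simple measure of the efficiency gain. Note that a cached value is the record for the first location requested in a cell, so the precision should be chosen against the spatial variability of the weather variable.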