367071 A Serverless Architecture for NEXRAD Weather Radar Data Pipeline

Wednesday, 15 January 2020
Hall B1 (Boston Convention and Exhibition Center)
Jingyin Tang, IBM, Atlanta, GA; and S. Honey and P. O'Neil

On cloud platforms, traditional weather radar data processing usually involves setting up a cluster or group of virtual machine instances to operationally ingest and process data and to generate multiple levels of radar products, including volumetric and mosaic products. It is also possible to migrate the virtual machine instances to containerized images managed by a cluster manager (e.g., Kubernetes). In operations, we observed significant differences in system utilization between heavy and light weather conditions, and the utilization ratio can change rapidly on summer days. This makes it difficult to auto-scale systems responsively. At The Weather Company, an IBM Business, a pilot project has started to further decompose the containerized radar processing workflow into a serverless architecture. Under this architecture, single-radar processing functions are moved to platform-triggered cloud functions (e.g., Amazon Web Services Lambda) that are invoked automatically when the appropriate data arrive in certain S3 buckets. A cloud function may ingest the data, save its state in an external in-memory cache server, put the processed data into another S3 bucket, and optionally send a message via a message queue system (e.g., Amazon Simple Queue Service). The data pipeline for NEXRAD single-radar processing starts with receiving NEXRAD Level-II cuts in an S3 bucket. When a cut arrives in the S3 bucket, a decoding function is triggered and stores intermediate results in a Redis cluster. When a sweep is complete, a sweep-combining function is triggered to retrieve all intermediate decoded cuts, combine them into a complete sweep, and write it to a “sweep stack” S3 bucket in CF/Radial format. Sweep-based quality control is then triggered and applies a non-precipitation mask to the sweep. When the decoding function receives the final cut, it also emits a message notifying the mosaic processor to retrieve the processed Level-II data, which are then used to create the CONUS radar mosaic.
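The single-radar flow described above can be sketched roughly as follows. This is a minimal illustration, not the pilot project's code: plain Python containers stand in for the Redis cluster, the "sweep stack" S3 bucket, and the message queue, and the event fields (`radar`, `volume`, `sweep`, `cut`, `cuts_in_sweep`, `final_cut`, `raw`), the function names, and the `min_dbz` threshold are all hypothetical. The decoding and masking steps are placeholders.

```python
import json
from collections import defaultdict

# In-memory stand-ins for the external services (assumptions for this sketch).
CACHE = defaultdict(dict)  # plays the role of the Redis cluster
SWEEP_BUCKET = {}          # plays the role of the "sweep stack" S3 bucket
MOSAIC_QUEUE = []          # plays the role of the message queue


def quality_control(sweep, min_dbz=0.0):
    """Placeholder non-precipitation mask: values below min_dbz become None."""
    return [[v if v >= min_dbz else None for v in ray] for ray in sweep]


def combine_sweep(sweep_id):
    """Merge all cached cuts of one sweep, apply QC, and store the result."""
    cuts = CACHE.pop(sweep_id)  # the function keeps no state of its own
    sweep = [cuts[i] for i in sorted(cuts)]
    key = "/".join(str(part) for part in sweep_id)
    SWEEP_BUCKET[key] = quality_control(sweep)


def decode_cut(event):
    """Handler invoked once per arriving Level-II cut (hypothetical event shape)."""
    sweep_id = (event["radar"], event["volume"], event["sweep"])
    # Placeholder "decoding": copy the raw values into the shared cache.
    CACHE[sweep_id][event["cut"]] = list(event["raw"])
    # Once every cut of the sweep is cached, trigger the combining step.
    if len(CACHE[sweep_id]) == event["cuts_in_sweep"]:
        combine_sweep(sweep_id)
    # The final cut of a volume also notifies the mosaic processor.
    if event.get("final_cut"):
        message = {"radar": event["radar"], "volume": event["volume"]}
        MOSAIC_QUEUE.append(json.dumps(message))
```

Because all intermediate state lives in the shared cache rather than in the handler, cuts can arrive out of order and be processed by separate invocations.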
This serverless architecture brings two significant benefits: (1) simpler, faster software iteration and (2) cost savings. First, the serverless architecture allows us to deploy individual functions without rebuilding the whole software layer, and such single-function deployments can be fully automated through continuous integration and continuous deployment (CI/CD) practices. Second, serverless functions are billed only when they are invoked, which removes the need to budget for fluctuations in data volume. Scalability, fault tolerance, and redundancy are provided by the cloud service provider, which could save many hours of system maintenance. A preliminary estimate suggests that we could cut costs by half because of these benefits. The architecture currently has some limitations because a cloud function has no native way to keep state between invocations. Consequently, at this stage the national mosaic functionality still runs on standalone instances, owing to its legacy technology stack and the responsiveness required in operations. We are actively working on a better solution to make this architecture more scalable and reliable.
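The cost argument can be made concrete with a simple model. The sketch below is an illustration under stated assumptions, not the estimate made in the pilot: it assumes a provisioned fleet must be sized for the peak hour and paid for every hour, while cloud functions are charged a flat price per invocation. The function names and all parameters are hypothetical, and real pricing typically also depends on execution duration and memory.

```python
def provisioned_cost(hourly_load, capacity_per_instance, price_per_instance_hour):
    """Cost of a fixed fleet sized for the busiest hour and paid for every hour."""
    peak = max(hourly_load)
    instances = -(-peak // capacity_per_instance)  # ceiling division
    return instances * price_per_instance_hour * len(hourly_load)


def invocation_cost(hourly_load, price_per_invocation):
    """Cost when each processed item triggers one billed function invocation."""
    return sum(hourly_load) * price_per_invocation
```

Under this model the provisioned cost follows the peak load, whereas the invocation cost follows the total load, so the gap widens as the load becomes more bursty.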