In the spring of 2011 the CASA Engineering Research Center operated a four-radar network in southwest Oklahoma. An intensive operation period (IOP) was defined, beginning April 2nd and ending June 15th, for a total of 75 days (1800 hours), representing the climatological peak season for thunderstorms. During this time the network, covering 10,300 square kilometers, had ongoing convective precipitation for approximately 90 hours, or 5% of the IOP. Several of the derived products generated by CASA, including multi-Doppler winds and 15-minute reflectivity nowcasting, are only useful during these events, since X-band radars are not able to determine winds in clear air and nowcasting algorithms do not predict convective initiation. Our approach was to dedicate individual computers to each product despite the 95% idle rate and frequent over-provisioning during smaller-scale and weaker events. The machines necessary to process the data in a timely manner were purchased in 2011 and cost over $4,000 each, not including IT overhead expenses associated with their management.
As a result of this experience, we propose an Infrastructure-as-a-Service (IaaS) model, a more efficient compute-cloud-based architecture designed to procure computing resources on demand in an automated fashion. A lightweight command and control server differentiates between clear-air, stratiform rain, and convective regimes and issues Java-based in-line spot requests to Amazon's Elastic Compute Cloud. Disk images pre-configured with the various processing algorithms are uploaded in advance; instances are launched and released as weather enters and exits the radar domain. The routines responsible for triggering more resource-intensive algorithms are integrated on-board the radar and require no additional maintenance or siting overhead. These include reflectivity thresholding (RT) and storm cell identification and tracking (SCIT) using local radar data, as well as external monitoring routines such as XML-based RSS feeds for WFO-issued watches and warnings.
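The trigger logic can be sketched as follows. This is a minimal illustration, not the deployed CASA implementation: the reflectivity thresholds and the product names are assumptions chosen for clarity, and the actual spot-request call to EC2 is omitted.

```python
# Hypothetical sketch of the command-and-control regime classifier.
# Threshold values (10 and 40 dBZ) are illustrative assumptions.

CLEAR_AIR, STRATIFORM, CONVECTIVE = "clear-air", "stratiform", "convective"

def classify_regime(max_dbz):
    """Reflectivity thresholding (RT): map peak reflectivity to a regime."""
    if max_dbz < 10:        # little or no return: clear air
        return CLEAR_AIR
    if max_dbz < 40:        # widespread light/moderate rain
        return STRATIFORM
    return CONVECTIVE       # strong cells present

def products_to_launch(regime):
    """Resource-intensive products are only requested during convection."""
    if regime == CONVECTIVE:
        return ["multi-doppler-winds", "reflectivity-nowcast"]
    return []
```

In this sketch, a convective classification would be the point at which spot requests for the pre-configured disk images are issued; when the regime returns to clear air, the instances are released.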
Based on current spot prices for machines similar to those used in the 2011 IOP ($0.45/hour), 90 hours of active use would cost ~$40 per product, plus $2 per user to stream the resultant data out of the cloud. This represents significant cost savings over the dedicated compute model assuming a 5-year lifecycle. In addition to computing, long-term storage of the moment data is another substantial cost. The 90 hours of moment data containing storms from 4 radars, combined with the derived merged products, amounts to roughly 700 GB for the IOP. Current rates of 10 cents per GB per month yield an ongoing $70/month cost to keep this data online. Disk arrays are expensive to purchase and maintain, and the cloud storage model appears to be cheaper, though the advantage is smaller than for computing.
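The cost figures above follow from simple arithmetic, reproduced here as a back-of-the-envelope check. The annualization of the dedicated machine is an assumption for comparison (purchase price amortized evenly over the 5-year lifecycle, ignoring IT overhead and power).

```python
# Reproduce the quoted cost figures from the rates in the text.
SPOT_RATE = 0.45        # $/hour, spot price for a comparable machine
ACTIVE_HOURS = 90       # convective hours during the 75-day IOP
STORAGE_RATE = 0.10     # $/GB/month
IOP_DATA_GB = 700       # moment data + derived merged products

compute_cost_per_product = SPOT_RATE * ACTIVE_HOURS    # $40.50, ~$40
storage_cost_per_month = STORAGE_RATE * IOP_DATA_GB    # $70.00/month

# Assumed comparison: $4,000 dedicated machine over a 5-year lifecycle.
dedicated_cost_per_year = 4000 / 5                     # $800/year
cloud_cost_per_year = compute_cost_per_product         # one IOP per year
```

Even charging a full IOP of spot usage every year, the cloud compute cost per product is roughly a twentieth of the amortized dedicated-machine cost, before IT overhead is counted.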
In order to test the feasibility of streaming data to the cloud, it was necessary to verify that the network bandwidth was sufficient. We made measurements to a number of cloud providers, including Amazon's East and West Coast presences, Rackspace, and others. Our traces demonstrate that throughput from the radars to the data centers consistently met the 5 Mbps minimum requirement for transporting moment data without backlog, and in most cases far exceeded it.
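The sufficiency criterion amounts to checking each throughput trace against the 5 Mbps floor. A minimal sketch, with hypothetical sample values rather than our actual measurements:

```python
# Check whether a radar-to-datacenter throughput trace can carry the
# moment-data stream without backlog. 5 Mbps is the minimum rate stated
# in the text; the sample values below are illustrative, not real traces.

MIN_MBPS = 5.0

def meets_requirement(trace_mbps):
    """True if every sampled throughput in the trace clears the floor."""
    return min(trace_mbps) >= MIN_MBPS

sample_trace = [9.2, 11.5, 8.7, 14.0]   # hypothetical Mbps samples
```

Requiring the minimum sample (rather than the mean) to clear the floor is the conservative choice: a single sustained dip below 5 Mbps is what produces backlog.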
Beyond the immediate cost savings associated with cloud usage for the 4-radar network, empirical results suggest a significant benefit as the system grows. Benchmark tests suggest that as the number of radars in the domain increases, both network bandwidth and available CPU cycles become bottlenecks. As weather moves through the domain, it will generally encompass only a subset of radar coverage areas. The ability to dynamically partition the domain and spin off independent processes operating on the portions containing weather is an important consideration. Using a weather simulator to mimic weather advection through a domain of arbitrary size, we evaluate our ability to retain manageable workloads on a per-machine basis.
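The partitioning idea can be sketched with a simplified geometry. Here the radars are assumed to lie in a linear chain and adjacency means consecutive indices; the real domain geometry and adjacency graph differ, so this is an illustration of the grouping step only.

```python
# Sketch of dynamic partitioning: given which radar coverage areas
# currently contain weather, group contiguous active radars so that each
# group can be handed to an independent worker process or spot instance.
# Linear-chain adjacency is an illustrative assumption.

def partition_active(active_flags):
    """Split a row of radars into runs of consecutive active indices."""
    groups, current = [], []
    for idx, active in enumerate(active_flags):
        if active:
            current.append(idx)
        elif current:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups
```

For example, with weather over radars 0-1 and radar 3 but not radar 2, the function yields two independent partitions, each small enough to keep a per-machine workload bounded as the domain grows.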