Cloud-based opportunities in scientific computing: insights from processing Suomi National Polar-Orbiting Partnership (S-NPP) Direct Broadcast data
- Direct readout from a polar-orbiting satellite such as the Suomi National Polar-Orbiting Partnership (S-NPP) requires bursts of processing a few times a day, separated by lengthy quiet periods (when the satellite is out of receiving range). In the cloud, where virtual machines can be started and stopped in minutes, we can marshal significant computing resources quickly when they are needed and avoid paying for them when they are not. To take advantage of this capability, we are constructing a data-driven approach to the management of cloud computing resources: whenever new observations become available, an automated workflow manager creates one or more virtual machines (of varying size and processing power) to process the data, and terminates them when the processing is complete.
- “Spot instances” are virtual machines that run as long as one's asking price is higher than the provider's variable spot price. Spot instances can greatly reduce the cost of computing, but workloads running on them must withstand unpredictable interruptions (which occur when the spot price exceeds the asking price). We are implementing an approach to workflow management that allows data processing to resume with no loss and minimal delay after an interruption.
- Thanks to virtual machine images, we can easily launch multiple, identical machines differentiated only by “user data” containing individualized instructions (e.g., to fetch particular datasets or to perform certain workflows or algorithms). This is particularly useful when (as is the case with S-NPP data) we need to launch many very similar machines to process an unpredictable number of data files concurrently. Our experience shows the viability and flexibility of this approach to workflow management for scientific data processing.
- Finally, cloud computing may be a promising platform for distributed volunteer (“interstitial”) computing, via mechanisms such as the Berkeley Open Infrastructure for Network Computing (BOINC), popularized by the SETI@home project and others such as ClimatePrediction.net and NASA's Climate@Home. As commodity computing shifts from (always-on) desktop computers towards smartphones and tablets (running on scarce battery power), interstitial computing may rely increasingly on the cloud's slack capacity: virtual machines with unused RAM or underused CPUs; virtual storage volumes allocated (and paid for) but not full; and virtual machines that are paid up for the current hour but whose work is complete. We are devising ways to facilitate the reuse of these resources for satellite data processing and related analytical processes.
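As an illustration of the spot-instance item above, resuming after an interruption can be reduced to keeping a small manifest of completed work and skipping whatever it already lists. The sketch below is a minimal, hypothetical example and not the workflow manager described here: the function names, the JSON manifest, and the `process_one` callback are all assumptions, and it presumes the manifest lives on storage that outlives the virtual machine.

```python
import json
import os
import tempfile


def load_done(manifest_path):
    """Return the set of granule names already processed (empty if no manifest)."""
    if not os.path.exists(manifest_path):
        return set()
    with open(manifest_path) as f:
        return set(json.load(f))


def save_done(manifest_path, done):
    """Write the manifest via a temporary file and an atomic rename,
    so an interruption never leaves a half-written manifest behind."""
    directory = os.path.dirname(os.path.abspath(manifest_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(sorted(done), f)
    os.replace(tmp_path, manifest_path)


def process_all(granules, manifest_path, process_one):
    """Process each granule once, skipping those recorded in the manifest.

    `process_one` is a hypothetical callback that does the real work.
    If the machine is interrupted inside it, only that granule is redone.
    """
    done = load_done(manifest_path)
    for name in granules:
        if name in done:
            continue
        process_one(name)
        done.add(name)
        save_done(manifest_path, done)  # checkpoint after every granule
    return done
```

A replacement machine would call `process_all` with the same granule list and manifest path; completed granules are skipped, so at most one granule's work is repeated per interruption.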
We will present our findings and research directions on these and related topics.
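To make the “user data” idea concrete, the following sketch splits an arbitrary list of newly arrived files into batches and builds one small instruction document per machine. It is an assumption-laden illustration only: the JSON layout, the field names `workflow` and `granules`, and the function `plan_user_data` are hypothetical, and the actual launch call to a cloud provider is deliberately left out.

```python
import json


def plan_user_data(granules, per_machine, workflow):
    """Split granules into batches and build one user-data document per machine.

    Each returned string is what an identical machine image would read at
    boot to learn which files to fetch and which workflow to run.
    """
    if per_machine < 1:
        raise ValueError("per_machine must be at least 1")
    batches = [granules[i:i + per_machine]
               for i in range(0, len(granules), per_machine)]
    return [json.dumps({"workflow": workflow, "granules": batch}, sort_keys=True)
            for batch in batches]
```

The number of documents returned is the number of machines to launch, so the fleet size follows the number of files that arrived rather than being fixed in advance.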