Tuesday, 16 January 2007: 8:45 AM
DAP-enabled Server-side Data Reduction and Analysis
217A (Henry B. Gonzalez Convention Center)
Despite the inexorable advance of faster, better, and cheaper computing hardware, terascale data reduction and analysis remain elusive for most researchers. Massive amounts of netCDF data remain underutilized because scientists have limited network bandwidth and computational capacity. Our strategy moves computation closer to the data sources to minimize bandwidth usage and increase efficiency. We apply our server-side analysis framework to simulations prepared for the Intergovernmental Panel on Climate Change Fourth Assessment Report (IPCC AR4). Instead of running locally, scientists' netCDF Operator (NCO) data analysis scripts are processed and sent via the Data Access Protocol (DAP) to a modified OPeNDAP server, where they are parsed, optimized, scheduled, and executed. Rather than downloading raw input data and computing locally, scientists send the computation to the server and receive only the reduced output data, while continuing to use their legacy script-based analysis methods. Our benchmarks quantify the reduced bandwidth and improved execution time. Local computation and bandwidth requirements are drastically reduced, freeing the scientist to perform exploratory analysis at wider scopes and finer resolutions and shortening time to discovery.
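To make the "parsed, optimized, scheduled" step more concrete, the following Python sketch shows one way a server might group the commands of an NCO script into stages by file dependency, so that independent commands can run concurrently and only the final output is returned to the client. It is an illustration under stated assumptions, not the framework described in this abstract: the script contents, the file names, and the helper functions `parse_command`, `schedule`, and `final_outputs` are all hypothetical, and the sketch assumes each command's last `.nc` argument is its output and that no file is overwritten.

```python
import shlex

# Hypothetical NCO script; file names are illustrative only.
SCRIPT = """\
ncra y2000.nc y2001.nc mean_a.nc
ncra y2002.nc y2003.nc mean_b.nc
ncbo --op_typ=sub mean_b.nc mean_a.nc diff.nc
ncwa -a lat,lon diff.nc global_diff.nc
"""


def parse_command(line):
    """Split one command into (operator, input files, output file).

    Assumption: every argument ending in ".nc" is a file, and the last
    such argument is the output.
    """
    tokens = shlex.split(line)
    files = [t for t in tokens[1:] if t.endswith(".nc")]
    return tokens[0], files[:-1], files[-1]


def schedule(script):
    """Group commands into stages; commands in one stage share no dependencies."""
    produced_at = {}  # output file -> index of the stage that produces it
    stages = []
    for line in script.splitlines():
        if not line.strip():
            continue
        op, inputs, output = parse_command(line)
        # Run one stage after the latest stage that produces any input;
        # commands reading only pre-existing files land in stage 0.
        stage = 1 + max(
            (produced_at[f] for f in inputs if f in produced_at), default=-1
        )
        produced_at[output] = stage
        while len(stages) <= stage:
            stages.append([])
        stages[stage].append((op, output))
    return stages


def final_outputs(script):
    """Return outputs never read by another command: the only data sent back."""
    commands = [parse_command(l) for l in script.splitlines() if l.strip()]
    consumed = {f for _, inputs, _ in commands for f in inputs}
    return [out for _, _, out in commands if out not in consumed]


if __name__ == "__main__":
    for index, stage in enumerate(schedule(SCRIPT)):
        print(index, stage)
    print("returned to client:", final_outputs(SCRIPT))
```

In this sketch the two `ncra` averages fall into the same stage and could run in parallel, while the intermediate files `mean_a.nc`, `mean_b.nc`, and `diff.nc` never leave the server.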