Tuesday, 16 January 2007: 9:15 AM
High performance distributed data reduction and analysis with the netCDF Operators (NCO)
217A (Henry B. Gonzalez Convention Center)
The netCDF Operators (NCO) software facilitates manipulation and analysis of gridded geoscience data stored in the self-describing netCDF format. For about ten years NCO has helped researchers analyze, and data centers serve, netCDF data. Over that time NCO has acquired significant functionality, including shared-memory threading, a message-passing interface, network transparency, and an interpreted language parser. NCO is optimized for efficient analysis of large multi-dimensional datasets spanning many files on local and remote clusters. NCO treats data files as a high-level data type whose contents may be simultaneously manipulated by a single command. Institutions and data portals often use NCO as middleware to hyperslab and aggregate dataset requests, while scientific researchers use NCO to perform three general functions: arithmetic operations, data permutation and compression, and metadata editing. We give an overview of NCO's design philosophy, primary features, and future plans.
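To make the terms "hyperslab" and "aggregate" concrete, the following is a minimal sketch in plain Python rather than NCO itself. It uses hypothetical in-memory stand-ins for netCDF files, and the names `hyperslab`, `record_average`, `file_a`, and `file_b`, the inclusive index bounds, and the sample values are all assumptions made for illustration.

```python
# Hypothetical stand-ins for netCDF files: each "file" maps a variable name
# to a list of time records, and each record is a list of grid-point values.

def hyperslab(records, start, stop):
    """Return the records with index in [start, stop], inclusive (assumed)."""
    return records[start:stop + 1]


def record_average(files, name, start, stop):
    """Average variable `name` over the selected records of every file."""
    selected = []
    for f in files:
        # Subset each file first, then pool the subsets across files.
        selected.extend(hyperslab(f[name], start, stop))
    if not selected:
        raise ValueError("hyperslab selected no records")
    n = len(selected)
    width = len(selected[0])
    # Point-by-point mean over all pooled records.
    return [sum(rec[i] for rec in selected) / n for i in range(width)]


# Illustrative values only, not real model output.
file_a = {"temperature": [[280.0, 290.0], [282.0, 292.0], [284.0, 294.0]]}
file_b = {"temperature": [[281.0, 291.0], [283.0, 293.0], [285.0, 295.0]]}

mean = record_average([file_a, file_b], "temperature", 0, 1)
print(mean)
```

The point of the sketch is that one call operates on a list of files as a single unit: the subsetting (hyperslab) and the reduction across files (aggregation) happen together, which mirrors the idea of treating files as a high-level data type.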
The bulk of our presentation will describe how NCO can access and reduce distributed multi-file terascale datasets swiftly and with relative ease. The relevant features include 1) a server-side implementation of NCO that shares the analysis workload with the client to reduce the network bandwidth required and improve throughput; 2) automatic threading of workloads within each file; and 3) compatibility with multiple transfer protocols including OPeNDAP, FTP, and SFTP. We will illustrate this functionality using dataset reduction of climate simulations from the Intergovernmental Panel on Climate Change (IPCC) Fourth Assessment Report (AR4).
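The bandwidth argument for server-side reduction can be illustrated with back-of-the-envelope arithmetic. The sketch below is not a measurement and does not describe NCO's actual protocol; the file count, record count, grid size, and 4-byte value size are hypothetical numbers chosen only to show the scaling.

```python
# All sizes below are illustrative assumptions, not measured values.
N_FILES = 10               # files in the hypothetical dataset
RECORDS_PER_FILE = 120     # time records per file
GRID_POINTS = 128 * 256    # grid points per record
BYTES_PER_VALUE = 4        # single-precision float


def bytes_client_side():
    """Client reduces: every record of every file crosses the network."""
    return N_FILES * RECORDS_PER_FILE * GRID_POINTS * BYTES_PER_VALUE


def bytes_server_side():
    """Server computes the time average: only one reduced record is sent."""
    return GRID_POINTS * BYTES_PER_VALUE


client = bytes_client_side()
server = bytes_server_side()
ratio = client // server

print(f"client-side reduction transfers {client} bytes")
print(f"server-side reduction transfers {server} bytes")
print(f"transfer volume shrinks by a factor of {ratio}")
```

Under these assumptions the saving equals the number of records averaged away, so the benefit grows with the length of the time series and the number of files being reduced.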
Supplementary URL: http://nco.sf.net