User Driven Automatic Data Request Service - Providing User Access to Terabyte-sized Datasets

Wednesday, 5 February 2014: 4:00 PM
Room C106 (The Georgia World Congress Center)
Zaihua Ji, NCAR, Boulder, CO; and S. Worley and D. Schuster

It is a challenge for data service centers to provide user access to terabyte-sized dataset collections. Traditional data services support access either through online web interfaces or with computer nodes that connect directly to the storage device. Data files served this way are formatted and built in advance, and are normally large (multiple gigabytes). In reality, users often need data files built quite differently, such as temporal, spatial, and parameter subsets, and in a different data format that suits the local data analysis environment. It is important for data service centers to provide dynamic data services where users request and receive the data they need in a preferred format.

In the Research Data Archive (RDA, rda.ucar.edu) at NCAR we maintain many historical and ongoing, observational and model-produced, atmospheric and oceanographic data products. The data product files are archived in a tape-based High Performance Storage System (HPSS), and copies of the most active data files are also stored on a central disk-based file system. A stable, scalable, and distributed controller, DataSet ReQueST (DSRQST), has been designed and implemented to process user requests automatically, including data subsetting, format conversion, and data staging for individual users. The system runs unattended 24x7, has fault-resistant recovery procedures, and uses the HPSS and central file systems for data access, a MySQL database for record keeping and job control, and a large-capacity, multi-node, large-memory cluster for fast and efficient computations. The DSRQST workflow has been designed so that new data services can be implemented easily as user needs and resource availability dictate. For example, DSRQST could drive user-specified re-gridding of model data and the application of algorithms to native parameter fields to create additional products.
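The general pattern described above, a database table used for record keeping and job control with a controller that picks up queued requests and retries failures, can be illustrated with a minimal sketch. This is not the DSRQST implementation: the table layout, the status names, the retry limit, and the functions open_queue, submit, and process_next are all hypothetical, and SQLite from the Python standard library stands in for MySQL so that the example is self-contained.

```python
import sqlite3


def open_queue():
    """Create an in-memory request table (hypothetical schema, not DSRQST's)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE request ("
        " id INTEGER PRIMARY KEY,"
        " dataset TEXT NOT NULL,"
        " kind TEXT NOT NULL,"  # assumed kinds: 'subset', 'convert', 'stage'
        " status TEXT NOT NULL DEFAULT 'queued',"
        " attempts INTEGER NOT NULL DEFAULT 0)"
    )
    return db


def submit(db, dataset, kind):
    """Record a new user request and return its id."""
    cur = db.execute(
        "INSERT INTO request (dataset, kind) VALUES (?, ?)", (dataset, kind)
    )
    db.commit()
    return cur.lastrowid


def process_next(db, handlers, max_attempts=3):
    """Run the oldest queued request; requeue on failure until max_attempts."""
    row = db.execute(
        "SELECT id, dataset, kind, attempts FROM request"
        " WHERE status = 'queued' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None  # nothing waiting
    req_id, dataset, kind, attempts = row
    db.execute(
        "UPDATE request SET status = 'running', attempts = attempts + 1"
        " WHERE id = ?",
        (req_id,),
    )
    db.commit()
    try:
        handlers[kind](dataset)  # the handler does the actual processing
        status = "done"
    except Exception:
        # Simple recovery rule (an assumption): retry until the limit is hit.
        status = "queued" if attempts + 1 < max_attempts else "failed"
    db.execute("UPDATE request SET status = ? WHERE id = ?", (status, req_id))
    db.commit()
    return req_id, status
```

A caller would register one handler per request kind, for example process_next(db, {"subset": build_subset}), where build_subset is a placeholder for whatever routine produces the requested files. Keeping the request state in the database rather than in memory is what would let an unattended controller resume after a restart.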

DSRQST serves more than 300 RDA data products, in addition to the traditional access methods. In this presentation we will discuss the highlights of the DSRQST workflow and illustrate its positive impact on users, who submit more than 2000 individual requests per month.