A high performance LDM data cluster

- Indicates paper has been withdrawn from meeting
- Indicates an Award Winner
Monday, 18 January 2010
Robert C. Lipschutz, CIRA/Colorado State Univ., Boulder, CO; and D. Hagerty, P. Hamer, P. Lannigan, and C. MacDermaid

Handout (1.3 MB)

Data systems in the NOAA Earth System Research Laboratory (ESRL) Global Systems Division (GSD) Central Facility acquire, process, store, and distribute global meteorological data sets in support of GSD's various research and development projects. These systems deliver some 800 GB of data per day to GSD scientists and collaborators.

To improve its support to the division, the GSD Information and Technology Services (ITS) group has implemented a new clustered architecture for this data processing. The new system replaces a collection of aging Linux High-Availability pairs and stand-alone platforms with a scalable Linux cluster that offers high throughput performance and excellent reliability, resource utilization and configurability. The cluster comprises six compute hosts and a network-attached RAID Data Storage Server. Key software components of this system include the Red Hat Cluster Suite for managing cluster-wide application services and failovers, Sun Grid Engine (SGE) for job activation and load balancing, 'fcron' for time-based job triggering, and Unidata's Local Data Manager (LDM) for data transport and event-based job triggering. GSD-developed Object Data System (ODS) applications are responsible for such data processing tasks as converting GOES GVAR satellite data, Gridded Binary (GRIB) model data, and WSR-88D Level-II radar data into the netCDF formats needed by user applications.

The cluster's novel use of LDM provides for much of the system's performance and configurability. The LDM implementation, which required minor extensions to the standard LDM scripts, allows for multiple concurrent LDM instances to run on a host. Using this capability, the cluster defines relocatable LDM services that independently manage the high-volume WSR-88D and NOAAPORT data streams, data from other providers, and also a data archive function. In addition, each host runs an identically-configured "Notify" LDM that that watches a local notification queue and submits processing jobs to the cluster using data arrival messages emitted by each ODS data acquisition and processing job. In this way, cascading processing events are evenly distributed across the cluster.

In this paper, we detail the implementation of the GSD data cluster architecture, describing how the combined services of Cluster Suite, SGE, fcron, and LDM have enabled us to construct a high performance real-time data processing environment that serves the GSD community. Given the central role of LDM in the architecture, we particularly focus on the steps required to configure LDM as a cluster service, and on the use of event notifications in LDM.