Data access and storage in the LEAD cyberinfrastructure

Wednesday, 1 February 2006: 11:30 AM
A412 (Georgia World Congress Center)
Anne Wilson, UCAR, Boulder, CO; and D. Lindholm and T. Baltzer


The Linked Environments for Atmospheric Discovery (LEAD) project, funded by the National Science Foundation (NSF), is building a cyberinfrastructure for mesoscale meteorology research and education. In LEAD, users can build and execute orchestrations that combine multiple data sources with the web service-based tools that LEAD provides. LEAD tools and applications include data mining algorithms such as ADaM, assimilation tools such as ADAS, and forecast models such as WRF.

A major goal of LEAD is to allow users to query, import, and manage data in the LEAD domain for purposes such as visualization, storage, and input to LEAD applications. LEAD data can be individual files, collections of files, or streams. LEAD data sets may be very large, requiring distributed storage or a mass storage system. Their size also makes support for subsetting and aggregation necessary.

The LEAD data storage subsystem must handle three classes of data. Personal data is data that a user has brought into their own space within the LEAD Data Repository. Users may store data and other resources there, and orchestrations also store intermediate results in a user's personal space. Public LEAD data is made available to the LEAD community by cooperating data providers, such as universities interested in sharing the data that they receive or generate. External data is any other data that a LEAD user knows about and would like to access; NCDC archival data is one example.

Both personal and public LEAD data are catalogued within LEAD. Through the catalogs, users can query the metadata to discover data that meets specified temporal, spatial, and field requirements. Other relevant requirements pertain to metadata generation, data quality, metadata quality, access control, and handling of proprietary data.
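
The kind of catalog query described above (filtering on time, space, and field) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `CatalogEntry` type, the `query` function, the bounding-box convention, and the sample entries are hypothetical and do not represent the actual LEAD catalog schema or interface.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, List, Tuple

# Hypothetical bounding box convention: (west, south, east, north) in degrees.
BBox = Tuple[float, float, float, float]


@dataclass(frozen=True)
class CatalogEntry:
    """Illustrative metadata record; not the real LEAD metadata schema."""
    name: str
    data_class: str          # "personal" or "public" (the catalogued classes)
    start: datetime          # start of temporal coverage
    end: datetime            # end of temporal coverage
    bbox: BBox               # spatial coverage
    fields: FrozenSet[str]   # variables contained in the data set


def overlaps_time(entry: CatalogEntry, start: datetime, end: datetime) -> bool:
    # Two closed intervals overlap when each one starts before the other ends.
    return entry.start <= end and start <= entry.end


def overlaps_bbox(a: BBox, b: BBox) -> bool:
    # Boxes overlap when they overlap in both longitude and latitude.
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def query(catalog: List[CatalogEntry], start: datetime, end: datetime,
          bbox: BBox, required_fields: FrozenSet[str]) -> List[CatalogEntry]:
    """Return entries that overlap the time window and region and that
    contain every requested field."""
    return [
        e for e in catalog
        if overlaps_time(e, start, end)
        and overlaps_bbox(e.bbox, bbox)
        and required_fields <= e.fields
    ]


# Made-up sample entries, used only to exercise the query.
SAMPLE_CATALOG = [
    CatalogEntry("radar_sample", "public",
                 datetime(2006, 1, 1, 0), datetime(2006, 1, 1, 6),
                 (-100.0, 33.0, -94.0, 37.0), frozenset({"reflectivity"})),
    CatalogEntry("forecast_sample", "personal",
                 datetime(2006, 1, 1, 0), datetime(2006, 1, 2, 0),
                 (-105.0, 30.0, -90.0, 40.0),
                 frozenset({"temperature", "wind_u", "wind_v"})),
    CatalogEntry("surface_sample", "public",
                 datetime(2006, 1, 1, 0), datetime(2006, 1, 3, 0),
                 (-80.0, 35.0, -70.0, 45.0), frozenset({"temperature"})),
]

if __name__ == "__main__":
    hits = query(SAMPLE_CATALOG,
                 datetime(2006, 1, 1, 3), datetime(2006, 1, 1, 9),
                 (-98.0, 34.0, -96.0, 36.0), frozenset({"temperature"}))
    for entry in hits:
        print(entry.name, entry.data_class)
```

A linear scan keeps the sketch short; a real catalog holding many large data sets would more plausibly rely on indexed metadata storage, and would also need to account for the access control requirements mentioned above.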

This paper presents some of the requirements for data access, acquisition, storage, and retrieval that are shaping the architecture of the LEAD Data Subsystem. It also gives a high-level view of that architecture through a canonical use case and describes its current state of development.