The Research Data Archive at NCAR: A metadata system that enables discovery across a diverse archive

Dattore, Robert

Improving discovery of high-quality climate data is critical to advancing our understanding of the Earth's climate system and how it changes over time. The Research Data Archive (RDA), managed by the Computational and Information Systems Laboratory at NCAR, is a data resource designed to meet the needs of climate and weather research. Work on the RDA began in the late 1960s, when datasets were typically small and lacked global coverage, e.g., surface observations for a single country or balloon data from a single experiment. Many distinct, specialized data formats were also common. Today, datasets are generally larger, have global coverage, and are often stored in common data formats (GRIB, netCDF, BUFR, IMMA, etc.). Over time, the RDA has grown into a large, heterogeneous collection that requires a scalable system to manage its diverse and increasingly rich metadata.

Recent major changes in metadata collection have transformed the RDA into a climate data management system that is efficient, adaptable, and offers many benefits to data users. We now capture detailed standard discovery metadata, together with automatically harvested file content metadata, and store both in an integrated metadatabase. This metadatabase is the foundation for up-to-date, archive-wide data discovery; for user interfaces in which user-specified constraints determine which data files are needed from terabyte-sized collections; and for flexible metadata sharing via OAI-PMH (the Open Archives Initiative Protocol for Metadata Harvesting). We will illustrate the system with RDA examples and briefly discuss how it supports a scale-appropriate set of data access methods.
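The idea of letting constraints pick the needed files out of a large collection can be sketched as a query against harvested file content metadata. This is a minimal sketch under stated assumptions, not the RDA's implementation: the table layout, column names, parameter codes and filenames are all hypothetical, and an in-memory SQLite database stands in for whatever database actually backs the metadatabase. Each row records which parameter a file contains and the time span it covers; a request for parameters within a time window then returns only the files that overlap the window.

```python
import sqlite3
from datetime import date

# Hypothetical schema: one row per (file, parameter) with the time span the
# file covers. This is an illustration, NOT the RDA's actual metadatabase layout.
SCHEMA = """
CREATE TABLE file_content (
    filename   TEXT NOT NULL,
    parameter  TEXT NOT NULL,
    start_date TEXT NOT NULL,  -- ISO 8601 date (YYYY-MM-DD), first time in file
    end_date   TEXT NOT NULL   -- ISO 8601 date (YYYY-MM-DD), last time in file
)
"""

# Hypothetical harvested records for illustration only.
SAMPLE_RECORDS = [
    ("sfc_1979.grb", "TMP", "1979-01-01", "1979-12-31"),
    ("sfc_1980.grb", "TMP", "1980-01-01", "1980-12-31"),
    ("sfc_1980.grb", "PRES", "1980-01-01", "1980-12-31"),
    ("upa_1980.nc", "HGT", "1980-01-01", "1980-12-31"),
]


def build_metadatabase(records):
    """Load harvested file-content records into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO file_content VALUES (?, ?, ?, ?)", records)
    return conn


def select_files(conn, parameters, start, end):
    """Return sorted filenames that contain any requested parameter and whose
    time span overlaps the closed window [start, end]."""
    if not parameters:
        return []
    placeholders = ",".join("?" for _ in parameters)
    # Two intervals overlap when each one starts before the other ends.
    # ISO 8601 date strings compare correctly as plain text.
    sql = (
        "SELECT DISTINCT filename FROM file_content "
        f"WHERE parameter IN ({placeholders}) "
        "AND start_date <= ? AND end_date >= ? "
        "ORDER BY filename"
    )
    rows = conn.execute(sql, [*parameters, end.isoformat(), start.isoformat()])
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_metadatabase(SAMPLE_RECORDS)
    print(select_files(conn, ["TMP"], date(1980, 6, 1), date(1980, 6, 30)))
```

Run as a script, the example asks for temperature in June 1980 and selects only `sfc_1980.grb`; the 1979 file is skipped because its time span ends before the window begins. Pushing the constraint test into the database query, rather than opening files, is what keeps this kind of selection cheap even when the underlying collection is very large.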
