83rd Annual

Tuesday, 11 February 2003: 11:00 AM
The Data Discovery Support Repository—the next chapter in the science of data discovery
Roland H. Schweitzer, NOAA/OAR/CDC and Univ. of Colorado, Boulder, CO; and T. Stevens and T. Habermann
Helping users find the data they need is a critical part of NOAA's mission. One of the greatest barriers to providing effective search systems is the lack of accurate metadata. Since many users discover the data they need via public search engines, data providers are obligated to maintain HTML data descriptions that those search engines can index. Data providers are also required to maintain FGDC-compliant descriptions of their data. Domain-specific search engines such as the GCMD, the FGDC Clearinghouse, and NOAAServer use these FGDC descriptions. Other practical considerations may force data providers into maintaining still other forms of metadata, including data contained in self-describing netCDF files or relational database tables. The Data Discovery Support System is one approach to unifying metadata into a single system that stays synchronized and up to date automatically.

The central piece of the system is the Data Discovery Support Repository. This repository supports network connections for importing metadata from, and exporting metadata to, other parts of the system. This back-end tool is from Blue Angel Technologies (www.blueangeltech.com) and has already been used to manage metadata from NCDC, NGDC, and NODC. The first step in populating the repository was to build a Web interface that allows users to quickly fill in metadata that is not easily obtained from other sources or that is repeated for every data set in the provider's collection. Data and metadata contacts are an example of information that is inserted into the repository via the Web interface. In addition to this Web-based input facility, we have built connections between the repository and netCDF files (both local files and those served by DODS servers) and relational databases.

The connection between the repository and netCDF files was accomplished via a set of Java classes. These classes read a simple XML configuration file. The configuration file associates particular netCDF file attributes or netCDF data values with FGDC metadata elements. It also specifies a Java class that implements the logic needed to transform the content extracted from the netCDF file into content suitable for use in an FGDC description. Once that transformation is complete, the metadata content can be written out as an XML file or ingested directly into the repository using a Java class library provided by Blue Angel Technologies.
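
As an illustration only, the configuration-driven mapping described above might look roughly like the following sketch. The configuration format, the class name NetcdfToFgdcSketch, the transform names, and the sample attribute values are all hypothetical and are not taken from the project. The netCDF global attributes are represented by a plain map rather than read with a netCDF library, and simple functions stand in for the transformation classes that the real configuration file names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class NetcdfToFgdcSketch {

    // Hypothetical configuration: each <map> ties one netCDF global attribute
    // to one FGDC element path and names the transformation to apply.
    static final String SAMPLE_CONFIG =
        "<mappings>"
      + "<map attribute=\"title\" fgdc=\"idinfo/citation/citeinfo/title\" transform=\"trim\"/>"
      + "<map attribute=\"institution\" fgdc=\"idinfo/citation/citeinfo/origin\" transform=\"upper\"/>"
      + "</mappings>";

    // Assumed stand-ins for the transformation classes named in the configuration.
    static final Map<String, Function<String, String>> TRANSFORMS = Map.of(
        "trim", String::trim,
        "upper", s -> s.trim().toUpperCase());

    // Applies every mapping rule in the configuration to the supplied attributes
    // and returns FGDC element path -> transformed content, in rule order.
    static Map<String, String> map(String configXml, Map<String, String> attributes) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(configXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Unreadable mapping configuration", e);
        }
        NodeList rules = doc.getElementsByTagName("map");
        Map<String, String> fgdc = new LinkedHashMap<>();
        for (int i = 0; i < rules.getLength(); i++) {
            Element rule = (Element) rules.item(i);
            String value = attributes.get(rule.getAttribute("attribute"));
            if (value == null) {
                continue; // this attribute is absent from the file; skip the rule
            }
            // Unknown transform names fall back to passing the value through unchanged.
            Function<String, String> transform =
                TRANSFORMS.getOrDefault(rule.getAttribute("transform"), Function.identity());
            fgdc.put(rule.getAttribute("fgdc"), transform.apply(value));
        }
        return fgdc;
    }

    public static void main(String[] args) {
        // Made-up attribute values standing in for what a netCDF reader would return.
        Map<String, String> attributes = Map.of(
            "title", "  Monthly mean SST  ",
            "institution", "noaa/cdc");
        map(SAMPLE_CONFIG, attributes)
            .forEach((path, value) -> System.out.println(path + " = " + value));
    }
}
```

Keeping the attribute-to-element associations in a configuration file, as the abstract describes, means that a new netCDF convention can be supported by editing the mapping rather than the code. The resulting path-to-content map could then be serialized as XML or handed to a repository client.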

Once the information is in the repository, it can be easily extracted (on demand if desired) as HTML, XML, or conventional FGDC documents. Being able to extract the metadata in this way closes the loop back to the data provider by offering a single solution to the practical considerations that drove the provider toward many different forms of metadata in the first place. This paper will present details about the repository, describe the connections that have been implemented, and discuss future directions for the project.
