15A.1
A renewed look at Distributed Discovery of Earth Science Data; Leveraging the Open Geospatial Consortium's (OGC) Catalog Services for the Web (CS-W) to serve NASA's metadata to the Global Earth Observation System of Systems (GEOSS)
Scott A. Ritz, WYLE Information Systems, Greenbelt, MD; and L. M. Olsen, T. B. Stevens, and R. T. Northcutt
NASA's Global Change Master Directory (GCMD, gcmd.nasa.gov) has used the metadata harvesting model to serve NASA's climate change metadata to its partners for almost a decade. Harvesting is the physical transfer of formatted documents among metadata systems to facilitate system interoperability. Methods used for harvesting include web-accessible folders, FTP, Z39.50, and the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). Harvesting proved adequate when partners made only a limited number of requests to import GCMD metadata into their systems and the overall collection was small (7,000 documents in 1999). A staff member could manually edit a small metadata set in a reasonable amount of time to ensure compatibility with a partner's system. As the number of requests increased, along with the size of the metadata collection (over 20,000 documents in 2009), the task of translating, editing, and updating documents on heterogeneous metadata systems quickly became unmanageable. For this reason, a model was needed in which partners could both share and discover metadata without transferring documents among systems.
Distributed discovery of data is a method whereby a software client searches an external server using simple, standardized queries to locate metadata that meet specific semantic criteria and display them locally. A typical query would be: Query the GCMD server and display all documents locally that match Project Keyword: "GEOSS" and Science Keyword: "Atmosphere > Aerosols > Aerosol Optical Depth/Thickness > Angstrom Exponent". Earlier attempts at distributed discovery were hindered by slow query times related to factors such as inadequate Internet bandwidth. However, recent improvements in server technologies and software, enhanced Internet bandwidth, and protocol standardization have brought about a revival in distributed discovery.
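As an illustration only, the typical query above could be expressed as a CS-W 2.0.2 GetRecords request along the following lines. This is a minimal sketch, not the GCMD implementation: the queryable names ProjectKeyword and ScienceKeyword are hypothetical placeholders (a real server advertises its supported queryables in its GetCapabilities response), and the sketch only builds the XML request body without contacting any server.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the OGC CSW 2.0.2 and Filter Encoding 1.1.0 schemas.
CSW = "http://www.opengis.net/cat/csw/2.0.2"
OGC = "http://www.opengis.net/ogc"


def build_getrecords(filters, max_records=10):
    """Return a GetRecords request body that ANDs equality filters.

    `filters` is a list of (property_name, literal_value) pairs. The
    property names are assumptions made for this sketch, not names
    confirmed for any particular catalog.
    """
    ET.register_namespace("csw", CSW)
    ET.register_namespace("ogc", OGC)
    root = ET.Element(
        f"{{{CSW}}}GetRecords",
        {
            "service": "CSW",
            "version": "2.0.2",
            "resultType": "results",
            "maxRecords": str(max_records),
        },
    )
    query = ET.SubElement(root, f"{{{CSW}}}Query", {"typeNames": "csw:Record"})
    ET.SubElement(query, f"{{{CSW}}}ElementSetName").text = "summary"
    constraint = ET.SubElement(query, f"{{{CSW}}}Constraint", {"version": "1.1.0"})
    flt = ET.SubElement(constraint, f"{{{OGC}}}Filter")
    # A single condition sits directly under Filter; several are wrapped in And.
    parent = ET.SubElement(flt, f"{{{OGC}}}And") if len(filters) > 1 else flt
    for name, value in filters:
        eq = ET.SubElement(parent, f"{{{OGC}}}PropertyIsEqualTo")
        ET.SubElement(eq, f"{{{OGC}}}PropertyName").text = name
        ET.SubElement(eq, f"{{{OGC}}}Literal").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical queryable names mirroring the example query in the text.
    request_body = build_getrecords(
        [
            ("ProjectKeyword", "GEOSS"),
            (
                "ScienceKeyword",
                "Atmosphere > Aerosols > Aerosol Optical Depth/Thickness"
                " > Angstrom Exponent",
            ),
        ]
    )
    print(request_body)
```

A client would typically send this body in an HTTP POST to the catalog's CS-W endpoint and render the returned records locally, so no metadata documents need to be copied into the client's own system.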
Because distributed discovery does not require the synchronization of metadata content among systems, it eliminates the time-consuming translation and preparation of metadata for compliance with a partner's system. Furthermore, because metadata are served directly from the source, users always have access to the most recent version of the content. Finally, reducing metadata processing and handling permits critical climate change data to be made immediately available for discovery by the scientific community.
We will demonstrate distributed discovery of climate change data using the Open Geospatial Consortium's (OGC) Catalog Service for the Web (CS-W) to serve metadata to the Global Earth Observation System of Systems (GEOSS). We chose CS-W because it is open source, interoperable, and widely used within the GEOSS community. Topics addressed will include hardware and software architecture, results from performance tests, best practices, and an assessment of the resulting gains in the efficiency of metadata interoperability and management.
Session 15A, Challenges in Data Access, Distribution, and Use including, but not limited to, issues raised in the National Academy of Sciences report Observing Weather and Climate from the Ground Up - Part II
Thursday, 21 January 2010, 3:30 PM-4:30 PM, B217