The NCAR Research Data Archive's Hybrid Approach for Data Discovery and Access
The distributed search capability enabled by standards-based community tools, such as Geoportal and an OAI-PMH access point that serves multiple metadata standards, provide pathways for external users to initially discover RDA holdings. From here, in-house developed web interfaces leverage primary discovery level metadata databases that support keyword and faceted searches. Internal NCAR HPC users, or those familiar with the RDA, may go directly to the dataset collection of interest and refine their search based on rich file collection metadata. Multiple levels of metadata have proven to be invaluable for discovery within terabyte-sized archives composed of many atmospheric or oceanic levels, hundreds of parameters, and often numerous grid and time resolutions.
Once users find the data they want, their access needs may vary as well. A THREDDS data server running on targeted dataset collections enables remote file access through OPENDAP and other web based protocols primarily for external users. In-house developed tools give all users the capability to submit data subset extraction and format conversion requests through scalable, HPC based delayed mode batch processing. Users can monitor their RDA-based data processing progress and receive instructions on how to access the data when it is ready. External users are provided with RDA server generated scripts to download the resulting request output. Similarly they can download native dataset collection files or partial files using Wget or cURL based scripts supplied by the RDA server. Internal users can access the resulting request output or native dataset collection files directly from centralized file systems.