The Atmospheric Radiation Measurement (ARM) Program is a long-term DOE field measurement and modeling project that obtains precise measurements of atmospheric radiation-related processes such as scattering and absorption due to water, aerosols, and clouds. This information will improve the parameterization of the global circulation models (GCMs) that predict climate change. The ARM Program has generated more than 700,000 data files that are accessible to the scientific community from the Program's Data Archive on the Internet (www.archive.arm.gov). This file collection includes hundreds of combinations of data sources (instruments or algorithms), data levels, and geographic locations. The thematic, temporal, and spatial span of the ARM information is irregular because of the incremental implementation of the Program (field sites, instruments within a site, algorithms and data products within an instrument type) and operational interruptions (maintenance, upgrades, etc.). The size and irregularity of the ARM data collection present a 'Where's Waldo?' (find the data) challenge to potential users and to the user interface logic.
The Archive has provided a query-based interface on the Internet for web browsers since 1995. The query-based interface is efficient for users who want data that match exact specifications (time period, location, source). In contrast, finding data with the query interface can be tedious for the researcher who has less exact specifications and many acceptable data subsets. Because of the irregularity of the data collection, the query response is often 'no data found.' A catalog (i.e., summary tables by time period, data source, and location) was developed to show the extent of the data collection. The advent of HTML logic and Web applications inspired the integration of the catalog with a user interface for file selection. This catalog-based interface complements the query-based interface.
The catalog-based user interface is a series of tables that show the data density (number of files) within the major dimensions of the data collection (i.e., time period, site, data source, data level, etc.). The catalog tables give the user early indications of the patterns of data availability, and each layer of the catalog displays more detail about a smaller subset of the data collection. The structure of the catalog conforms to the irregular and changing scope of the data collection. The finest-grain level of the summary tables is linked to a process that records the user's selection of 'file blocks' in a 'shopping cart'. The contents of the shopping cart can be 'edited' and are ultimately submitted to a sub-process for file retrieval.
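The layered tables and the shopping cart described above can be pictured with a minimal Python sketch. It is illustrative only and is not the Archive's implementation: the record layout, the field names, the sample values, the `density` function, and the `Cart` class are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical file records: (site, source, level, month). Values are made up.
FIELDS = ("site", "source", "level", "month")
FILES = [
    ("SGP", "mwr", "b1", "1998-01"),
    ("SGP", "mwr", "b1", "1998-01"),
    ("SGP", "mwr", "b1", "1998-02"),
    ("SGP", "sonde", "a1", "1998-01"),
    ("TWP", "mwr", "b1", "1998-02"),
]


def density(files, group_by, **filters):
    """Count files per value of `group_by`, restricted to records that
    match every keyword filter (one catalog layer per call)."""
    column = FIELDS.index(group_by)
    counts = Counter()
    for record in files:
        if all(record[FIELDS.index(k)] == v for k, v in filters.items()):
            counts[record[column]] += 1
    return dict(counts)


class Cart:
    """Records the finest-grain 'file blocks' chosen by the user."""

    def __init__(self):
        self.blocks = set()

    def add(self, site, source, level, month):
        self.blocks.add((site, source, level, month))

    def remove(self, site, source, level, month):
        # Editing the cart: dropping a block that is absent is not an error.
        self.blocks.discard((site, source, level, month))

    def files(self, files):
        """Expand the selected blocks into the records to retrieve."""
        return [record for record in files if record in self.blocks]


top_layer = density(FILES, "site")                              # coarse view
next_layer = density(FILES, "month", site="SGP", source="mwr")  # drill-down

cart = Cart()
cart.add("SGP", "mwr", "b1", "1998-01")
cart.add("TWP", "mwr", "b1", "1998-02")
cart.remove("TWP", "mwr", "b1", "1998-02")
selected = cart.files(FILES)
```

Each call to `density` with one more filter corresponds to one layer deeper in the catalog, and an empty result shows a gap in the collection before the user commits to a request.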
The catalog contents are generated by statistical analysis software from the database table that lists all of the available data files, and they are updated periodically to include new data files. These tables are displayed by a CGI program that also records the 'file blocks' to be retrieved for the user. Catalogs of other data objects (e.g., graphs, summary tables, quality reports, operations logs, etc.) are being evaluated for development. The catalog concepts are potentially applicable to many other systems that collect and distribute data.
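As a rough illustration of deriving catalog counts from a file-listing table, the sketch below uses an SQL aggregate over an in-memory SQLite database in place of the statistical analysis software mentioned above. The table name `data_files`, its columns, the sample rows, and the `build_catalog` function are hypothetical and chosen only for this example.

```python
import sqlite3


def build_catalog(conn):
    """Summarize the file-listing table into
    (site, source, year, n_files) rows, one per catalog cell."""
    return conn.execute(
        """
        SELECT site, source, substr(file_date, 1, 4), COUNT(*)
        FROM data_files
        GROUP BY site, source, substr(file_date, 1, 4)
        ORDER BY site, source, substr(file_date, 1, 4)
        """
    ).fetchall()


# Hypothetical file listing; names and dates are made up for the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_files "
    "(filename TEXT, site TEXT, source TEXT, file_date TEXT)"
)
conn.executemany(
    "INSERT INTO data_files VALUES (?, ?, ?, ?)",
    [
        ("a.cdf", "SGP", "mwr", "1997-12-30"),
        ("b.cdf", "SGP", "mwr", "1998-01-02"),
        ("c.cdf", "SGP", "mwr", "1998-01-03"),
        ("d.cdf", "TWP", "sonde", "1998-03-10"),
    ],
)
catalog = build_catalog(conn)

# A periodic update amounts to rebuilding the summary after new files arrive.
conn.execute(
    "INSERT INTO data_files VALUES ('e.cdf', 'TWP', 'sonde', '1998-03-11')"
)
updated_catalog = build_catalog(conn)
```

Rebuilding the whole summary on each update keeps the sketch simple; a system of the Archive's size might instead update counts incrementally, which is a design choice this example does not address.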