The Atmospheric Radiation Measurement (ARM) Program is gathering hundreds of millions of data values to further two objectives. The first objective is to relate observed radiative fluxes, spectrally resolved and varying in space and time, to atmospheric temperature, composition, and surface radiative properties. The second objective is to develop parameterizations for general circulation models and related models. This ongoing collection draws on numerous data sources, including instruments, algorithms, and records of the operational conditions at the time of collection.
The initial phase of the ARM data system recorded the "measurement context" within the data files, using the global information and data definition sections of the netCDF format. The context available when the parametric data were collected was embedded within the data file. Some operational conditions and quality control notes (additional measurement context) were captured in databases, but these conditions and notes were neither directly bound to nor embedded in the data files.
As the data system matured, it became apparent that many users needed more direct access to the metadata. The next phase of ARM data system development addressed this need by creating numerous web pages describing the ARM Program and the measurements being collected. Maturing web technologies also made the metadata held in the databases accessible, and applications were built to let users interact with individual databases over the web. As the number of web pages grew from tens to hundreds to thousands, the number of databases from a few to tens, and the number of information contributors (people and processes) from tens to hundreds, the need for an overall information architecture became clear.
The current phase of ARM's data system development is the design and implementation of this "information architecture". At its core is a set of key data descriptors, such as instrument names, measurement names, geographic references, and data focus areas. These descriptors form the framework within which all other information, whether a web page or a database record, is categorized and linked.
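One way to picture such a descriptor framework is as an inverted index: every item (a web page or a database record) is tagged with descriptors, and items related to several descriptors are found by intersecting the corresponding sets. The Java sketch below is illustrative only; the names `Descriptor`, `InfoItem`, and `DescriptorIndex`, the sample descriptor values, and the in-memory index are hypothetical assumptions, not ARM's actual design.

```java
import java.util.*;

// Hypothetical descriptor categories, mirroring the kinds named in the text.
enum DescriptorType { INSTRUMENT, MEASUREMENT, LOCATION, FOCUS_AREA }

// A key data descriptor, e.g. (INSTRUMENT, "radiometer-1"); values are illustrative.
record Descriptor(DescriptorType type, String name) {}

// Any catalogued item: kind is a free-form label such as "web page" or "database record".
record InfoItem(String id, String kind) {}

// Hypothetical inverted index from each descriptor to the items tagged with it.
class DescriptorIndex {
    private final Map<Descriptor, Set<InfoItem>> index = new HashMap<>();

    // Tag an item with one or more descriptors.
    void tag(InfoItem item, Descriptor... descriptors) {
        for (Descriptor d : descriptors) {
            index.computeIfAbsent(d, k -> new LinkedHashSet<>()).add(item);
        }
    }

    // Return the items tagged with every given descriptor (set intersection).
    Set<InfoItem> findAll(Descriptor... descriptors) {
        Set<InfoItem> result = null;
        for (Descriptor d : descriptors) {
            Set<InfoItem> hits = index.getOrDefault(d, Set.of());
            if (result == null) {
                result = new LinkedHashSet<>(hits);
            } else {
                result.retainAll(hits);
            }
        }
        return result == null ? Set.of() : result;
    }
}
```

Because records supply value-based `equals` and `hashCode`, a descriptor rebuilt from its type and name finds the same entries, so independently maintained pages and records can be linked simply by sharing descriptors.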
The evolving information architecture must accommodate the millions of existing records, held in various formats on various platforms, as well as the tens of thousands of new records generated each day. Given the momentum and inertia of the existing ARM data systems, the information architecture is being implemented as a series of "services" that access the information sources through an API. The information sources continue to reside on distributed systems across the Internet. The services are being developed with Web and Internet technologies implemented in Java.
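The service pattern can be sketched in Java as follows, assuming a hypothetical `InformationSource` interface that each distributed source would expose. `MetadataService` forwards a descriptor query to every registered source through that interface and merges the answers, skipping any source that fails or is unreachable. All names here are hypothetical, and `StubSource` is an in-memory stand-in for a real remote database or web site.

```java
import java.util.*;

// Hypothetical API that each information source exposes; the sources
// themselves stay on their own hosts and are only referenced here.
interface InformationSource {
    String name();

    // Returns identifiers of records associated with a descriptor name.
    List<String> query(String descriptor) throws Exception;
}

// Hypothetical service that queries every registered source through the
// common API and merges the results, recording sources that failed.
class MetadataService {
    private final List<InformationSource> sources = new ArrayList<>();
    private final List<String> failures = new ArrayList<>();

    void register(InformationSource source) {
        sources.add(source);
    }

    List<String> lookup(String descriptor) {
        failures.clear();
        List<String> merged = new ArrayList<>();
        for (InformationSource s : sources) {
            try {
                // Prefix each identifier with its source so results stay traceable.
                for (String id : s.query(descriptor)) {
                    merged.add(s.name() + ":" + id);
                }
            } catch (Exception e) {
                // An unreachable source is skipped rather than failing the whole lookup.
                failures.add(s.name());
            }
        }
        return merged;
    }

    // Names of the sources that failed during the most recent lookup.
    List<String> lastFailures() {
        return List.copyOf(failures);
    }
}

// In-memory stand-in for a remote source, for demonstration only.
class StubSource implements InformationSource {
    private final String name;
    private final Map<String, List<String>> data;

    StubSource(String name, Map<String, List<String>> data) {
        this.name = name;
        this.data = data;
    }

    public String name() {
        return name;
    }

    public List<String> query(String descriptor) {
        return data.getOrDefault(descriptor, List.of());
    }
}
```

Keeping every source behind one interface lets existing databases and web sites stay where they are: a new source only needs an adapter that implements the interface, and the service code does not change.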