Our web portal, located at http://lidar.ssec.wisc.edu/syst/ahsrl/ahsrl_data.htm, attempts to perform the overhead tasks that most users encounter when accessing data. In particular, it reduces the work involved in using data from multiple sources. The current site provides easy mechanisms to browse data from one instrument or from combinations of instruments. It reads and concatenates data across file boundaries. It converts raw data to geophysical quantities on demand with user-selected averaging intervals. On-demand processing provides real-time access to data streams from operating instruments. It allows users to make custom images of selected data segments, with data from multiple instruments converted to a common grid. It accepts user-supplied data selection thresholds derived from any combination of the input data sets and provides images showing the selected points. Additionally, it provides synthesis products generated from combinations of data streams. Because the system processes data on demand, it is not necessary to store a new version of the processed data with each improvement of the processing algorithm. This greatly reduces data volume: only code updates need to be stored, rather than multiple copies of the data set. Finally, the system writes selected data from multiple sensors to a single NetCDF file with all data on a common grid. This file, which also includes program version numbers and the user inputs to the processing code, can then be downloaded over the web. In addition, data files for many data intervals can be requested at once. These are processed offline and made available at a public FTP site.
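The averaging and threshold-selection steps described above can be illustrated with a minimal sketch. It is not the portal's code: the function names `average_to_grid` and `mask_by_threshold`, the use of plain seconds for time, and the sample values are all assumptions made for illustration. Two data streams with different sampling are averaged onto the same time grid, and one stream is then used as the selection criterion for the other.

```python
import math

def average_to_grid(times, values, t0, interval, n_bins):
    """Average samples into n_bins consecutive bins of width `interval`.

    Bins start at t0. A bin that receives no valid sample holds NaN.
    (Hypothetical helper; names and units are illustrative only.)
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, v in zip(times, values):
        idx = math.floor((t - t0) / interval)
        if 0 <= idx < n_bins and not math.isnan(v):
            sums[idx] += v
            counts[idx] += 1
    return [s / c if c else math.nan for s, c in zip(sums, counts)]

def mask_by_threshold(target, selector, minimum):
    """Keep target values only where the selector, on the same grid, is >= minimum."""
    return [
        t if (not math.isnan(s) and s >= minimum) else math.nan
        for t, s in zip(target, selector)
    ]

# Two made-up streams with different sampling, placed on one 30-second grid.
lidar_grid = average_to_grid([0, 10, 20, 30, 40, 50],
                             [1.0, 3.0, 5.0, 7.0, 9.0, 11.0], 0, 30, 2)
radar_grid = average_to_grid([5, 35], [-20.0, 5.0], 0, 30, 2)

# Select lidar points only where the radar value meets a user-chosen threshold.
selected = mask_by_threshold(lidar_grid, radar_grid, 0.0)
```

Because both streams end up on the same grid, a threshold on one can be applied point by point to the other, which is the property that a single common-grid output file relies on.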
In order to perform these tasks, the current software provides a number of internal functions: 1) a catalog of available data, 2) programs to find data based on an external condition (e.g., find all data close to the overpasses of a particular satellite), 3) an archive of detailed quick-look and thumbnail browse images for each instrument, 4) a cursor function which uses the catalog to identify the files, and the locations of data within those files, that correspond to a particular time, 5) functions to read various data formats and decompress files when needed, 6) processing code that works with arbitrary time slices, allowing it to be independent of file boundaries, 7) functions to automatically retrieve auxiliary data from external web sites, 8) averaging and interpolation programs to convert data to a common coordinate system, 9) programs to convert raw data to geophysical quantities, 10) programs that use data from multiple sensors to compute geophysical quantities with user-selected processing assumptions, 11) display functions to provide images of processed data and images of data masked by selection criteria, 12) mechanisms to input data selection criteria, 13) programs to write output data files and return them to the user, and 14) web page interfaces for control, display, and data downloading.
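Items 1, 4, and 6 in the list above (the catalog, the cursor function, and processing that is independent of file boundaries) can be sketched as follows. This is an illustration under stated assumptions, not the system's implementation: `CatalogEntry`, `files_for_interval`, `read_slice`, and the `read_file` callback are hypothetical names, times are plain numbers, and the catalog is assumed to be sorted by start time with non-overlapping files.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass

@dataclass(frozen=True)
class CatalogEntry:
    start: float  # time of the first record in the file (illustrative unit)
    end: float    # time of the last record in the file
    path: str

def files_for_interval(catalog, t_begin, t_end):
    """Return the catalog entries whose time span overlaps [t_begin, t_end].

    Assumes `catalog` is sorted by start time and files do not overlap,
    so the list of end times is sorted as well.
    """
    ends = [e.end for e in catalog]
    starts = [e.start for e in catalog]
    first = bisect_left(ends, t_begin)   # first file ending at or after t_begin
    last = bisect_right(starts, t_end)   # one past the last file starting by t_end
    return catalog[first:last]

def read_slice(catalog, t_begin, t_end, read_file):
    """Concatenate (time, value) records across files for one time slice.

    `read_file` is a caller-supplied function mapping a path to an
    iterable of (time, value) pairs; it stands in for a format reader.
    """
    records = []
    for entry in files_for_interval(catalog, t_begin, t_end):
        records.extend(
            (t, v) for t, v in read_file(entry.path) if t_begin <= t <= t_end
        )
    return records
```

With this structure, the code that requests a time slice never needs to know how the data are split into files; only the catalog and the reader do.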
Originally, this data distribution system was pieced together from a combination of computer languages and scripts (MATLAB, Python, C++, Fortran, HTML, and C), and as it grew it became hard to maintain and difficult to extend. As a result, we are rewriting the system in Python with the Pyramid web framework, using well-organized interfaces and defined protocols, with the intention of releasing it as open source so that it can be applied by other organizations.
Many of the included functions are common to any data retrieval task, and the power of this approach is that each investigator does not have to duplicate this work independently. In addition, an investigator who adds a new data stream to the system immediately gains access to all preexisting data, while all other users gain access to the new data stream. This provides a much more efficient model than the usual roll-your-own approach to data retrieval. We believe that a generalized web-based system of this type could greatly improve access to many types of geophysical data and have a broad impact on the science by increasing utilization of the vast stores of data that are accumulating in geophysical archives.