2A.3 A Web Based Portal for Integrated Real Time Access to Multiple Data Streams

Monday, 7 January 2013: 4:30 PM
Room 11AB (Austin Convention Center)
Edwin W. Eloranta, University of Wisconsin Madison, Madison, WI; and J. P. Garcia and R. Garcia

As part of our NSF-funded development of the Arctic High Spectral Resolution Lidar, we have constructed a web site to disseminate data. Initially this provided a way for us to access and process our lidar data. In time we found it necessary to access data from other instruments at our site in Eureka, Nunavut. This has led to the development of a flexible, user-friendly data portal that provides easy access to data from a cloud radar, a microwave radiometer, a Fourier transform spectrometer, radiosondes, and our lidar. By simplifying data access, the portal has encouraged a flurry of scientific papers and conference presentations and has prompted enthusiastic endorsements from users. We believe that the concepts used on our site can be of general use and provide easier access to new and existing geophysical data sets.

Our web portal, located at http://lidar.ssec.wisc.edu/syst/ahsrl/ahsrl_data.htm, attempts to perform the overhead tasks that most users encounter in accessing data. It particularly reduces the tasks encountered when using data from multiple sources. The current site provides easy mechanisms to browse data from one instrument or from combinations of instruments. It reads and concatenates data across file boundaries. It converts raw data to geophysical quantities on demand with user-selected averaging intervals. On-demand processing provides real-time access to data streams from operating instruments. It allows users to make custom images of selected data segments, with data from multiple instruments converted to a common grid system. It provides for user input of data selection thresholds derived from any combination of the input data sets and provides images showing the selected points. Additionally, it provides synthesis products generated from combinations of data streams. Because the system provides on-demand processing, it is not necessary to store a new version of processed data with each improvement of the processing algorithm. This greatly reduces data volume; it is only necessary to store code updates rather than multiple copies of the data set. Finally, the system writes selected data from multiple sensors to a single NetCDF file with all data on a common grid. This file, which also includes program version numbers and user inputs to the processing code, can then be downloaded over the web. In addition, data files for many data intervals can be requested at once. These are processed offline and made available at a public ftp site.
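To make the common-grid NetCDF output concrete, the following is a minimal sketch, not the portal's actual code, of how two instrument streams might be interpolated onto a shared time grid and written to one file together with provenance attributes. The function, variable, and attribute names are illustrative assumptions, and the sketch assumes monotonically increasing time arrays.

```python
# Hypothetical sketch: regrid two instrument streams to a common time grid
# and write them to a single NetCDF file with processing metadata.
import numpy as np
from netCDF4 import Dataset


def write_common_grid(lidar_t, lidar_bs, radar_t, radar_z,
                      grid_t, out_path, code_version, user_inputs):
    """Interpolate both data streams onto grid_t and save one NetCDF file.

    Assumes all time arrays are monotonically increasing.
    """
    lidar_on_grid = np.interp(grid_t, lidar_t, lidar_bs)
    radar_on_grid = np.interp(grid_t, radar_t, radar_z)

    with Dataset(out_path, "w") as nc:
        nc.createDimension("time", len(grid_t))
        t_var = nc.createVariable("time", "f8", ("time",))
        bs_var = nc.createVariable("lidar_backscatter", "f8", ("time",))
        z_var = nc.createVariable("radar_reflectivity", "f8", ("time",))
        t_var[:] = grid_t
        bs_var[:] = lidar_on_grid
        z_var[:] = radar_on_grid
        # Record provenance so the file documents how it was produced,
        # mirroring the version numbers and user inputs described above.
        nc.processing_code_version = code_version
        nc.user_inputs = str(user_inputs)
```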

In order to perform these tasks, the current software provides a number of internal functions: 1) a catalog of available data, 2) programs to find data based on an external condition (e.g., find all data close to the overpasses of a particular satellite), 3) an archive of detailed quick-look and thumbnail browse images for each instrument, 4) a cursor function which uses the catalog to identify files, and the locations of data within those files, at a particular time (a sketch of this idea follows below), 5) functions to read various data formats and decompress files when needed, 6) processing code that works with arbitrary time slices, allowing it to be independent of file boundaries, 7) functions to automatically retrieve auxiliary data from external web sites, 8) averaging and interpolation programs to convert data to a common coordinate system, 9) programs to convert raw data to geophysical quantities, 10) programs using data from multiple sensors to compute geophysical quantities with user-selected processing assumptions, 11) display functions to provide images of processed data and images of data masked by selection criteria, 12) mechanisms to input data selection criteria, 13) programs to write output data files and return them to the user, and 14) web page interfaces for control, display, and data downloading.
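The cursor function of item 4 can be illustrated with a small sketch under simplifying assumptions: the catalog is represented here as a hypothetical sorted list of (file start time, path) pairs, which is not necessarily how the portal stores it.

```python
# Illustrative sketch of the "cursor" idea: map a requested time to the
# archive file that should contain it. The catalog structure shown here
# is a hypothetical sorted list of (file_start_time, path) pairs.
import bisect
from datetime import datetime


def locate(catalog, when):
    """Return the path whose start time is the latest one at or before
    `when`, or None if the request predates the archive."""
    starts = [entry[0] for entry in catalog]
    i = bisect.bisect_right(starts, when) - 1
    return catalog[i][1] if i >= 0 else None


catalog = [
    (datetime(2012, 7, 1, 0), "ahsrl_20120701T0000.nc"),
    (datetime(2012, 7, 1, 6), "ahsrl_20120701T0600.nc"),
]
print(locate(catalog, datetime(2012, 7, 1, 3, 15)))  # -> ahsrl_20120701T0000.nc
```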

Originally, this data distribution system was pasted together from a combination of computer languages and scripts (MATLAB, Python, C++, Fortran, HTML, and C), and as it grew it became hard to maintain and difficult to extend. As a result, we are in the process of rewriting the system in Python with the Pyramid web framework, using well-organized interfaces and defined protocols, with the intention of releasing it as open source so that it can be adopted by other organizations.
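For readers unfamiliar with Pyramid, the following is a minimal sketch of the kind of route an on-demand processing service might expose. The URL layout, view name, and placeholder response are assumptions for illustration only, not the portal's actual interface.

```python
# Hypothetical Pyramid sketch: a single route that would accept an
# instrument name and a time interval and return processed output.
from wsgiref.simple_server import make_server
from pyramid.config import Configurator
from pyramid.response import Response


def data_view(request):
    instrument = request.matchdict["instrument"]
    start = request.matchdict["start"]
    end = request.matchdict["end"]
    # Real code would consult the catalog, run the processing chain, and
    # stream back an image or NetCDF file for the requested interval.
    return Response(f"would process {instrument} from {start} to {end}")


if __name__ == "__main__":
    config = Configurator()
    config.add_route("data", "/data/{instrument}/{start}/{end}")
    config.add_view(data_view, route_name="data")
    app = config.make_wsgi_app()
    make_server("0.0.0.0", 6543, app).serve_forever()
```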

Many of the included functions are common to any data-retrieval task, and the power of this approach is that each investigator does not have to duplicate this work independently. In addition, an investigator who adds a new data stream to the system immediately gains access to all preexisting data, while all other users gain access to the new data stream. This provides a much more efficient model than the usual roll-your-own approach to data retrieval. We believe that a generalized web-based system of this type could greatly improve access to many types of geophysical data and have a broad impact on the science by increasing utilization of the vast stores of data accumulating in geophysical archives.
