11A.2
A General Purpose System for Server-side Analysis of Earth Science Data
Roland Schweitzer, Weathertop Consulting, LLC, College Station, TX; and K. M. O'Brien, J. Li, A. Manke, J. Malczyk, and S. Hankin
Many earth science data archives depend on netCDF for local data storage and expose their archives to outside users via OPeNDAP. The IPCC AR4 data collection is one prominent example of this strategy [Williams]. OPeNDAP servers make it possible for remote users to access data from archives as if that data were stored locally. Advances in recent years, such as OPeNDAP servers that can provide access to a collection of individual netCDF files from a time series as if all the time steps were stored in a single file (a technique called aggregation), have greatly improved the usability of data archives for scientists.
Even with these advances, barriers remain to the effective use of these large data archives via OPeNDAP. One barrier is the inefficiency of transferring large volumes of data in order to make a calculation. Even though the netCDF API and OPeNDAP allow efficient access to the relevant sub-set of data, significant overhead is still tied up in the network transfer of the data needed as input to the calculation.
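To give a rough sense of scale, the sketch below compares the bytes moved when a client downloads a full time series and averages it locally with the bytes moved when only the time-averaged field is returned. The grid dimensions and the 4-byte value size are illustrative assumptions, not figures from any particular archive.

```python
def transfer_bytes(nx, ny, nt, bytes_per_value=4):
    """Bytes needed to move an nx-by-ny grid with nt time steps."""
    return nx * ny * nt * bytes_per_value


# Hypothetical 1-degree global grid with 1200 monthly time steps.
NX, NY, NT = 360, 180, 1200

client_side = transfer_bytes(NX, NY, NT)  # download everything, average locally
server_side = transfer_bytes(NX, NY, 1)   # download only the averaged field

reduction_factor = client_side // server_side
print(client_side, server_side, reduction_factor)
```

Under these assumptions the transfer shrinks by a factor equal to the number of time steps, which is why reductions along a long axis benefit most from running next to the data.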
Many systems are being introduced to address this issue. Each such system is designed to allow the calculation to take place on servers local to the data, so that the expense of data transfer is incurred only on the (usually significantly smaller) result of the calculation. The SWAMP system [Wang], the GrADS Data Server [Wielgosz] and the data services provided by the INGRID archive [Blumenthal] are examples of systems that allow server-side calculations by scientists accessing data remotely. The Ferret-THREDDS Data Server (F-TDS) is another example of such a system and is the topic of this abstract.
The THREDDS Data Server (TDS) is a general purpose OPeNDAP data server that can be installed by data archives to provide aggregated access through the netCDF API to many different formats of local data files. TDS, which is written in Java, depends on the netCDF Java library for access to the local netCDF files being served. The netCDF Java library allows other software developers to build "plug-in" modules which can read other data formats and provide the data to the internal Common Data Model via the I/O Service Provider (IOSP) interface. Once built, an I/O Service Provider can be plugged into TDS and thereby provide netCDF API and OPeNDAP access to the data understood by that IOSP implementation.
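The plug-in idea can be illustrated with a minimal sketch. This is a conceptual stand-in written in Python, not the netCDF Java IOSP interface: the class names `FormatProvider`, `KeyValueProvider` and `DataServer`, their methods, and the toy text format are all hypothetical.

```python
from abc import ABC, abstractmethod


class FormatProvider(ABC):
    """Hypothetical stand-in for an I/O service provider."""

    @abstractmethod
    def can_read(self, text):
        """Return True if this provider recognizes the data."""

    @abstractmethod
    def read(self, text):
        """Return the variables as a name -> list-of-values mapping."""


class KeyValueProvider(FormatProvider):
    """Reads a toy text format: a magic first line, then 'name: v1 v2 ...'."""

    MAGIC = "#toy-kv"

    def can_read(self, text):
        return text.startswith(self.MAGIC)

    def read(self, text):
        variables = {}
        for line in text.splitlines()[1:]:  # skip the magic line
            if not line.strip():
                continue
            name, _, values = line.partition(":")
            variables[name.strip()] = [float(v) for v in values.split()]
        return variables


class DataServer:
    """Hypothetical server that delegates reading to registered providers."""

    def __init__(self):
        self._providers = []

    def register(self, provider):
        self._providers.append(provider)

    def open(self, text):
        # The first provider that recognizes the data supplies the variables.
        for provider in self._providers:
            if provider.can_read(text):
                return provider.read(text)
        raise ValueError("no registered provider recognizes this data")


server = DataServer()
server.register(KeyValueProvider())
dataset = server.open("#toy-kv\nsst: 28.1 27.9 28.4\n")
print(dataset)
```

The server itself never learns the details of the format; it only sees the common name-to-values model that every provider returns, which is the property that lets a new provider be added without changing the server.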
Ferret is a legacy command-line analysis and graphics package that reads COARDS and CF-1.0 netCDF files. We have written an IOSP implementation which can execute Ferret commands and place the results of those commands into the Common Data Model. After opening one or more netCDF files, the Ferret commands can define new virtual variables which represent the result of some analysis operation applied to one or more of the data variables. Once the Ferret IOSP is registered with the TDS, a Ferret script which reads netCDF data and defines virtual variables becomes an OPeNDAP data set. All of the variables defined by the script, both real and virtual, are visible to OPeNDAP clients through the netCDF API.
An F-TDS also has the capability to accept Ferret commands embedded in the OPeNDAP URL that define new variables as the result of a calculation. Variables defined in this way are visible to the remote software client as if the entire resulting data set existed, but the calculations needed to create them are performed only when the remote client actually asks for the data. When the data is requested, Ferret calculates only the sub-set needed to satisfy the request. Once calculated, the result is kept in a cache on the F-TDS server, and subsequent requests for the same data sub-set are satisfied from the cache.
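The deferred calculation and caching described above can be sketched as follows. This is a simplified illustration in Python, not the F-TDS implementation: the `VirtualVariable` class is hypothetical, the data is a plain list, and the cache is keyed on the exact requested index range, which is an assumption about how "the same data sub-set" is matched.

```python
class VirtualVariable:
    """A derived variable evaluated only when a subset is requested."""

    def __init__(self, source, func):
        self._source = source   # underlying "real" data values
        self._func = func       # calculation applied to each value
        self._cache = {}        # (start, stop) -> computed subset
        self.evaluations = 0    # number of values actually computed

    def __len__(self):
        # The full extent is visible without computing anything.
        return len(self._source)

    def read(self, start, stop):
        key = (start, stop)
        if key not in self._cache:
            # Compute only the requested subset, then remember it.
            subset = [self._func(v) for v in self._source[start:stop]]
            self.evaluations += len(subset)
            self._cache[key] = subset
        return self._cache[key]


sst = [26, 27, 28, 29, 30, 31]
anomaly = VirtualVariable(sst, lambda v: v - 28)

size_before_read = len(anomaly)       # metadata only, no calculation yet
evaluations_before_read = anomaly.evaluations

first = anomaly.read(1, 3)            # computes two values
second = anomaly.read(1, 3)           # served from the cache
print(first, anomaly.evaluations)
```

The client sees a six-element variable from the start, yet only the two requested values are ever computed, and the repeated request does no further work.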
In the first implementation of F-TDS, the calculations to be performed must be specified using Ferret syntax. This can be a disadvantage for scientists who are not familiar with Ferret. Recently we have started work to create a general syntax for the specification of well-known server-side calculations that can then be implemented on any OPeNDAP server.
The Live Access Server (LAS) is a well-established Web-application software system for display and analysis of geo-science data sets. The software, which can be downloaded and installed by anyone, gives data providers an easy way to establish services for their on-line data holdings so their users can make plots, create and download sub-sets in a variety of formats, and compare and analyze data. When LAS needs to request data that has been transformed (averaged, summed, interpolated to a new grid and so forth), it simply creates a new virtual data set and hands the transformation work to an F-TDS server installed alongside the LAS. This reduces the complexity of the part of LAS which creates the products, since it doesn't need any special knowledge of the transformation being requested to generate the plot or other product.
D. N. Williams, R. Ananthakrishnan, D. E. Bernholdt, S. Bharathi, D. Brown, M. Chen, A. L. Chervenak, L. Cinquini, R. Drach, I. T. Foster, P. Fox, D. Fraser, J. Garcia, S. Hankin, P. Jones, C. Kesselman, D. E. Middleton, J. Schwidder, R. Schweitzer, R. Schuler, A. Shoshani, F. Siebenlist, A. Sim, W. G. Strand, and N. Wilhelmi, 2008: The Earth System Grid: Enabling access to multi-model climate simulation data. Bulletin of the American Meteorological Society (in review).
Joe Wielgosz, Brian Doty, The GrADS-DODS Server: An Open-Source Tool For Distributed Data Access And Analysis, 19th Conference on IIPS
Richard Rogers, Steve Hankin and Ansley Manke, The Ferret DODS Server, 20th Conference on IIPS
Daniel L. Wang, Univ. of California, Irvine, CA; and C. S. Zender and S. F. Jenks, DAP-enabled Server-side Data Reduction and Analysis, 23rd Conference on IIPS
Benno Blumenthal, http://iridl.ldeo.columbia.edu/dochelp/QA/Expert/
Session 11A, Challenges in Data Access, Distribution, and Use - Part II
Wednesday, 14 January 2009, 4:00 PM-5:30 PM, Room 121BC