10.13: Data Management Support for Adaptive Analysis and Prediction of the Atmosphere in LEAD

Wednesday, 1 February 2006: 12:00 PM
A412 (Georgia World Congress Center)
Beth Plale, Indiana Univ., Bloomington, IN; and R. Ramachandran and S. Tanner


With the dominant role played by data in all aspects of mesoscale meteorology, it is reasonable to expect that a large number of the requirements for the cyberinfrastructure in development in the Linked Environments for Atmospheric Discovery project (LEAD) will be oriented towards data management support.

The meteorology community has benefited from a relatively long history of access to a large number of observational and model-generated data products. Because these products have existed for so long, and because the community generally agrees on their value, community-supported tools for data dissemination, access, and visualization were established early.

With this strong existing foundation, what then is needed in the way of data management tools and functionality to support the paradigm shift to the integrated, scalable framework for adaptively analyzing and predicting the atmosphere that LEAD envisions? The data subsystem challenges that LEAD is exploring to support adaptive analysis and prediction fall into three categories:

Automated data discovery – what we as computer scientists refer to colloquially as running a “weather forecast” is actually a complex sequence of steps: gathering data products, setting configuration parameters, assimilating the products into a single 3D volume, executing the model, and generating output products that are then analyzed by a statistical tool or visualized and analyzed by a human. This sequence is called a workflow. For a weather forecast workflow to be launched and executed automatically in response to early severe storm conditions, the manual data management tasks must be replaced with automated ones. This means that both the search for the input data products a workflow needs and the capture and storage of its output data products on the user's behalf must be automated.

Highly scalable data archiving system – introducing automated workflows as the means by which forecasting is done opens the opportunity to scale the forecast model to levels well beyond what is done today. Supporting forecasting at the envisioned scale requires considerable attention to the movement and storage of terabytes of data. It is no longer possible for a single user to organize on his/her own workstation all the data products generated during the runs. Storage facilities located on the computational grid need to be available to a user, providing the same guarantees of privacy and protection as his/her own file system.

Easy search and access to data – not every step of the forecast can be automated. The user must still indicate the starting conditions and specify the parameters of the run. Today this task is exceedingly difficult because it requires significant expertise to know which data products contain which kinds of data, where the products are located, and how they are to be used. In LEAD we are easing the task of locating data products by providing a search GUI, an ontology, and search services.
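
The workflow steps named under automated data discovery can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the LEAD implementation: the names `Product`, `Catalog`, and `run_forecast_workflow` are hypothetical, the catalog is an in-memory stand-in for a real search service, and the assimilation and model steps are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Product:
    """Hypothetical record describing one data product."""
    name: str
    kind: str  # e.g. "radar", "surface_obs", "forecast"


@dataclass
class Catalog:
    """Hypothetical in-memory stand-in for a searchable data catalog."""
    products: List[Product] = field(default_factory=list)

    def find(self, kind: str) -> List[Product]:
        # Automated discovery: look products up by kind, no manual browsing.
        return [p for p in self.products if p.kind == kind]

    def store(self, product: Product) -> None:
        # Automated capture: record an output product for the user.
        self.products.append(product)


def run_forecast_workflow(catalog: Catalog,
                          needed_kinds: List[str],
                          params: Dict[str, float]) -> Product:
    # Step 1: gather the input data products the workflow needs.
    inputs = [p for kind in needed_kinds for p in catalog.find(kind)]
    if not inputs:
        raise LookupError("no input products found for the requested kinds")

    # Step 2: set configuration parameters (copied so the caller's dict is untouched).
    config = dict(params)

    # Step 3: assimilate the inputs into one volume (placeholder structure only).
    volume = {"sources": [p.name for p in inputs], "config": config}

    # Step 4: execute the model (placeholder: the output is just named from the inputs).
    output = Product(name=f"forecast_from_{len(volume['sources'])}_inputs",
                     kind="forecast")

    # Step 5: capture and store the resulting product automatically.
    catalog.store(output)
    return output
```

Calling `run_forecast_workflow(catalog, ["radar", "surface_obs"], {"grid_km": 2})` on a catalog holding one product of each kind would store and return a single product of kind `"forecast"`. The point of the sketch is that both the input search and the output capture happen inside the workflow, with no manual step between them.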

In this paper we discuss three recent developments of the data subsystem that our groups are prototyping as solutions to one or more of the goals identified above. These are a metadata representation based on the FGDC standard, the OIS ontology, and the myLEAD personal workspace. These three developments in the LEAD data subsystem are key early outcomes of the ongoing fundamental research in creating an integrated, scalable data management framework for adaptively analyzing and predicting the atmosphere.