J56.3 Determining Best Practices for Archiving and Reproducibility of Model Data

Thursday, 16 January 2020: 9:00 AM
157C (Boston Convention and Exhibition Center)
Gretchen L. Mullendore, Univ. of North Dakota, Grand Forks, ND; and M. S. Mayernik and D. Schuster

Much of the research in the geosciences, such as projecting future changes in the environment and improving weather and flood forecasting, is conducted using computational models that simulate the Earth's atmosphere, oceans, and land surfaces. There is strong agreement across the sciences that reproducible workflows are needed for computational modeling. Open and reproducible workflows not only strengthen public confidence in the sciences, but also result in more efficient community science. However, recent efforts to standardize data sharing and archiving guidelines within research institutions, professional societies, and academic publishers make clear that the scientific community does not know what to do about data produced as output from computational models. To date, the rule for reproducibility is to “save all the data”, but model data can be prohibitively large, particularly in a field like atmospheric science. The massive size of the model outputs, as well as the large computational cost to produce these outputs, makes this not only a problem of reproducibility, but also a “big data” problem.

This presentation will discuss a newly funded project that brings modelers together to develop community guidelines for achieving open and reproducible workflows in geoscience modeling research. Preliminary discussions across different modeling communities suggest that the answer to “what to do about model data” will look different depending on model descriptors. Examples of important model descriptors include reproducibility, storage vs. computational costs, and value to the community. We will outline our approach to reaching community agreement on model data best practices and discuss the need for rubrics, based on these model descriptors, that help researchers and centers describe their model data in consistent terms so that sound archiving and retention decisions can be made.
