6B.1
Successful data curation practices for large data archives

Tuesday, 25 January 2011: 3:30 PM
607 (Washington State Convention Center)
Joseph L. Comeaux, NCAR, Boulder, CO; and S. Worley

The Research Data Archive (RDA) at NCAR has grown to more than 4 million files, 400 terabytes, and 600 datasets over the past 45 years. Data curation in support of climate and weather research has always been a primary focus. Drawing on this experience, we will highlight successful data curation practices: science-educated curators, facilities for data security, scalable systems, long-term preservation of diverse data, and the benefits of partnerships.

Staff members who are educated in the target discipline of the archive, and who gain computing and data management skills, make excellent curators. This helps ensure data organization suitable for users, properly quality-checked archives, and the availability of expert consultants. Reliable and secure data storage systems that evolve technologically over time are essential for successful curation. The NCAR storage systems have fulfilled this important need. As the scale of available data continues to increase, our methods for archiving data have evolved. This includes improving methods not only for applying updates and ingesting new datasets, but also for handling routine backups and disaster recovery. The results of these processes are integrated into databases where the integrity of the data and its curation can be verified. Diverse data formats are inherent in the RDA long-term archive. Awareness of, and strategies to avoid, potential loss of data access are a curation obligation, which is addressed with well-documented formats and with software assessments whenever computing systems change. With today's rapidly growing amounts of research data and the trend toward more multi-disciplinary studies, national and international partnerships are beneficial. Sharing data and data management experiences between centers improves the archives available to the user community, can provide mutual data preservation backup, and promotes teamwork to solve data questions and problems.