Community Earth System Science Datasets from NCAR

Gagne, David John

Benchmark datasets, such as ImageNet, have been critical to measuring and accelerating the advancement of deep learning over the past decade. These datasets provide a point of comparison for the approaches of multiple groups and enable a much broader community to contribute to solving a problem by greatly reducing the startup costs involved. The National Center for Atmospheric Research (NCAR) has recently invested in a project over fiscal year 2020 to develop multiple community machine learning benchmark datasets for a wide range of Earth system science problems, including cloud microphysics, atmospheric chemistry, processing of high-fidelity observations, and severe weather prediction. Once we have completed development of these datasets, we plan to release them on multiple platforms with extensive documentation, tutorials, and baseline machine learning models, both for comparison and to spur further research on these problems and data by the Earth system science and machine learning communities. We will discuss the current status of the project, describe the targeted training datasets, and provide a roadmap for its future directions.

5B.3 Community Earth System Science Datasets from NCAR