'ImageNet' refers to one of the largest labelled data repositories, which has been used successfully to track the progress of AI algorithms through the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [3, 4]. The database and challenge have led to successful commercial applications in the technology industry. However, the database design and challenge focused solely on the recognition of objects in human-captured imagery, which is not well suited to the geosciences; these require a new standard dataset and new challenge formulations. For example, geoscience objects include waves, flows, and structures in all phases of matter. Hence, from a purely object-recognition perspective, the structures and patterns of geoscience objects, which can exist at multiple scales in continuous spatiotemporal fields, are much more complex than those found in the discrete spaces that AI algorithms typically work with (e.g. items in a basket, or tagging dog and cat images). With geoscience changing from a data-poor to a data-rich field, algorithms such as deep convolutional neural networks, which dramatically reduced ILSVRC error rates in the 2010s, are increasingly being applied to geoscience problems. It is therefore conceivable that a similar ImageNet designed for the geosciences, together with a global challenge, may help bridge gaps with AI and advance both fields.
This work thus presents an ongoing proposal and a project update on the creation of an ImageNet analog for the environmental sciences. Our efforts will contribute a novel labelled dataset, called EnviroNet, for the wider community across both the geoscience and AI fields, along with the design of an EnviroNet-International planetary computing challenge (E-IPCC). At present, feedback on EnviroNet is being sought from various universities, industry, research labs and startups, with support from the American Meteorological Society Committee on Artificial Intelligence Applications to Environmental Science and collaborators. The EnviroNet project seeks to provide a benchmark and labelled dataset, currently lacking in geoscience, to track the progress of AI algorithms and to bridge gaps between the two fields. Its task categories are similar to those of ImageNet but are aimed instead at geoscience challenges: 1) localise model or satellite data against ground observations; 2) classify events, including extremes; 3) track events in spatiotemporal fields; and, in a category unique to the geosciences, 4) forecast future events. It is expected that the EnviroNet labelled dataset and challenge will be made available at groundobs.org.
[1] Rockström, J., Steffen, W., Noone, K., Persson, Å., Chapin III, F. S., Lambin, E. F., ... & Nykvist, B. (2009). A safe operating space for humanity. Nature, 461(7263), 472.
[2] Karpatne, A., Ebert-Uphoff, I., Ravela, S., Babaie, H. A., & Kumar, V. (2018). Machine Learning for the Geosciences: Challenges and Opportunities. IEEE Transactions on Knowledge and Data Engineering.
[3] Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009, June). ImageNet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on (pp. 248-255). IEEE.
[4] Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Berg, A. C. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211-252.