7.1 Using R to Promote Data Interoperability in the Atmospheric Science Community

Wednesday, 10 January 2018: 8:30 AM
Room 13AB (ACC) (Austin, Texas)
Joshua A. Roberti, National Ecological Observatory Network, Boulder, CO; and R. H. Lee, C. Flagg, and L. Stanish

An abundance of environmental data are freely available on the internet. Gathering these data can be arduous, however, when a researcher needs to compile several data types (e.g., precipitation, soil moisture, radiation) from networks managed by different governing bodies (e.g., NSF, USDA, NOAA). Data availability, metadata, sampling rates, and aggregated data products are not consistent among networks. As a result, researchers can spend many hours searching for data only to find that the variables of interest are not collected at specific sites, or that the sites reporting these variables were not active during the time period of interest.

At last year’s annual AMS meeting, we introduced the metScanR package (https://cran.r-project.org/package=metScanR) for the R programming language. This package enables users to search for data across numerous environmental networks based on key parameters such as data types and active observation dates. At the time of last year’s meeting, the package comprised metadata for ~13,000 stations within the USA. Since then, the metScanR database has been updated to include metadata from >107,000 environmental stations worldwide.
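To make the idea of a metadata search concrete, the following is a minimal sketch of filtering station records by measured variable and active observation dates. It is written in Python for illustration only and is not metScanR's API: the `Station` record, the `find_stations` function, and the three sample stations are all hypothetical, and "active" is assumed here to mean that the station covers the entire requested period.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence


@dataclass(frozen=True)
class Station:
    """Hypothetical station metadata record (not the metScanR schema)."""
    station_id: str
    network: str
    variables: Sequence[str]
    start: date
    end: Optional[date]  # None means the station is still reporting


# Invented example records, for illustration only.
STATIONS = [
    Station("A", "NET1", ("precipitation", "air temperature"), date(2000, 1, 1), None),
    Station("B", "NET2", ("precipitation",), date(2015, 6, 1), None),
    Station("C", "NET1", ("soil moisture",), date(1990, 1, 1), date(2010, 12, 31)),
]


def find_stations(stations, variable, period_start, period_end):
    """Return IDs of stations that measure `variable` and were active
    for the whole interval [period_start, period_end]."""
    matches = []
    for s in stations:
        # Treat a missing end date as "active indefinitely".
        station_end = s.end if s.end is not None else date.max
        covers_period = s.start <= period_start and station_end >= period_end
        if variable in s.variables and covers_period:
            matches.append(s.station_id)
    return matches


precip_sites = find_stations(
    STATIONS, "precipitation", date(2010, 1, 1), date(2012, 12, 31)
)
```

A looser definition of "active" (any overlap with the period rather than full coverage) would only change the `covers_period` condition.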

The metScanR package is a novel tool. To the authors’ knowledge, it represents one of the most robust and complete openly available databases of worldwide environmental station metadata. Its “fuzzy search” function allows users to enter basic environmental variables, such as temperature, precipitation, and wind (or more complex terms, such as soil moisture and ammonia gas concentration), and returns the environmental sites that monitor these variables, a useful feature given the vast number of environmental data types in the database. In addition to being a metadata hub, metScanR interfaces well with R packages that download environmental data, such as RNRCS and rnoaa. Along with this capability, we are developing new features that will enhance the intercomparability of datasets by allowing users to convert data from many networks to the same format, timescales, and units. We are also building a Shiny application (a web application built with R) that will enable people with no programming experience to use metScanR. With all of these components in place, we are hopeful that metScanR will be a valuable resource for finding and downloading a wide range of environmental datasets. An overview of the metScanR package, its functions, and its many applications, as well as a use case, will be presented.
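The fuzzy search idea described above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not metScanR's implementation: the `fuzzy_site_search` function, the site names, and the variable lists are hypothetical, and the matching rule (substring match, or a close match via the standard library's `difflib`) is one plausible way to tolerate partial terms and small misspellings.

```python
import difflib

# Invented mapping of site ID -> variables measured, for illustration only.
STATION_VARIABLES = {
    "SITE_A": ["air temperature", "precipitation"],
    "SITE_B": ["soil moisture", "soil temperature"],
    "SITE_C": ["wind speed", "wind direction"],
}


def fuzzy_site_search(term, station_variables, cutoff=0.8):
    """Return site IDs that report a variable resembling `term`.

    A variable matches when the search term is a substring of its name
    (so "temperature" finds "soil temperature") or when the two strings
    are similar enough according to difflib (so a small typo still matches).
    """
    term = term.lower().strip()
    hits = []
    for site, variables in station_variables.items():
        for var in variables:
            name = var.lower()
            is_close = difflib.get_close_matches(term, [name], n=1, cutoff=cutoff)
            if term in name or is_close:
                hits.append(site)
                break  # one matching variable is enough for this site
    return hits


temperature_sites = fuzzy_site_search("temperature", STATION_VARIABLES)
typo_sites = fuzzy_site_search("precipitaton", STATION_VARIABLES)  # misspelled on purpose
```

Lowering `cutoff` makes the search more forgiving at the cost of more false matches, which is the usual trade-off in any approximate string matching scheme.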
