A metadata extraction pipeline has been developed that provides an efficient, plug-in-based method for adding new parsers, a configuration system that lets non-developers customize how files are processed, and a system for identifying and logging metadata quality issues so that they are readily found and addressed. The pipeline identifies critical pieces of metadata needed to promote data FAIRness, including location, file revision, and measurement start and end datetimes, and can be easily modified to extract further information (such as variables). Because the datasets are wide-ranging, the pipeline has been extended to accommodate multiple file formats, including multiple versions of ICARTT (International Consortium for Atmospheric Research on Transport and Transformation), HDF (Hierarchical Data Format), netCDF (Network Common Data Form), and multiple versions of the Ames File Format. The pipeline also supports building metadata for file formats from which metadata cannot easily be extracted, such as PDF (Portable Document Format) and GIF (Graphics Interchange Format). The pipeline has allowed our team to maintain a consistent flow of data and metadata to archival and distribution services, ensuring that the ASDC meets the needs of the suborbital science community. This presentation will highlight the ASDC’s suborbital metadata extraction pipeline, its development, how it has been modified to support data FAIRness, and plans for maintaining the pipeline and adding new features.
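
As an illustration only, the plug-in approach described above could be sketched in Python as follows. This is not the ASDC implementation: every name here (`register_parser`, `extract`, `REQUIRED_FIELDS`, the `.ict` stub parser, and the `overrides` argument standing in for the configuration system) is a hypothetical chosen for the sketch. It shows parsers registered per file extension, configuration-supplied values for formats with no parser (such as PDF or GIF), and missing required fields logged as quality issues.

```python
import logging
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Dict, List, Optional

logger = logging.getLogger("metadata_pipeline")

# Hypothetical names for the critical fields listed in the abstract.
REQUIRED_FIELDS = ("location", "file_revision", "start_datetime", "end_datetime")

Metadata = Dict[str, Optional[str]]
Parser = Callable[[Path], Metadata]

# Registry mapping a lowercase file extension to its parser plug-in.
_PARSERS: Dict[str, Parser] = {}


@dataclass
class ExtractionResult:
    path: str
    metadata: Metadata
    issues: List[str] = field(default_factory=list)


def register_parser(*extensions: str) -> Callable[[Parser], Parser]:
    """Decorator that registers a parser for one or more file extensions."""
    def decorator(func: Parser) -> Parser:
        for ext in extensions:
            _PARSERS[ext.lower()] = func
        return func
    return decorator


@register_parser(".ict")
def parse_icartt_stub(path: Path) -> Metadata:
    # Placeholder values: a real parser would read the ICARTT file header.
    return {
        "location": None,  # deliberately missing, to trigger a quality issue
        "file_revision": "R0",
        "start_datetime": "2020-01-01T00:00:00Z",
        "end_datetime": "2020-01-01T01:00:00Z",
    }


def extract(path: str, overrides: Optional[Metadata] = None) -> ExtractionResult:
    """Run the registered parser (if any), apply overrides, and log issues."""
    file_path = Path(path)
    parser = _PARSERS.get(file_path.suffix.lower())
    # Formats without a parser start from an empty record.
    metadata: Metadata = dict(parser(file_path)) if parser else {}
    # Values supplied by configuration take precedence over parsed values.
    metadata.update(overrides or {})
    issues = [
        f"missing required field '{name}'"
        for name in REQUIRED_FIELDS
        if not metadata.get(name)
    ]
    for issue in issues:
        logger.warning("%s: %s", file_path.name, issue)
    return ExtractionResult(str(file_path), metadata, issues)
```

In this sketch, supporting a new format means writing one function and decorating it with `register_parser`; the extraction and quality-check logic is left unchanged.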

