S1 Enhancing Dataset Discovery with Knowledge Graph Link Prediction Techniques

Sunday, 28 January 2024
Hall E (The Baltimore Convention Center)
Sean Hughes, GSFC, College Park, MD; and I. Gerasimov, A. Mehrabian, and L. Pham



Sean Hughes (1,2), Irina Gerasimov (1,3), Armin Mehrabian (1,3), Long Pham (1)

(1) Code 619, NASA Goddard Space Flight Center, Greenbelt, MD, USA

(2) University of Maryland, College Park, MD, USA

(3) ADNET Systems Inc., Lanham, MD, USA

In the evolving landscape of open science, the ability to navigate data centers and discover pertinent datasets is increasingly important. Effective discovery depends primarily on detailed metadata that describes each dataset’s content and potential areas of application. Recognizing this central role of metadata, we are developing a methodology that uses knowledge graph link prediction algorithms to improve metadata quality, with the aim of making dataset discovery through user interface searches more efficient.

At the core of our ongoing project is the use of structured dataset metadata, organized under a comprehensive dictionary of terms. This structure allows us to integrate datasets and their attributes into a preliminary Knowledge Graph, which we envision as a key asset for understanding and exploring the relationships among the datasets.
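As an illustration of how structured metadata and a term dictionary might be turned into graph triples, here is a minimal sketch. It is not the project's implementation: the record layout, the field names, the `has_` relation naming, and the helper `build_triples` are all assumptions made for this example, and the dataset identifiers and terms are placeholders rather than real GES DISC entries.

```python
# Hypothetical metadata records; field names and values are placeholders.
RECORDS = [
    {"id": "dataset_A", "variable": ["precipitation"], "platform": ["satellite_X"]},
    {"id": "dataset_B", "variable": ["precipitation", "aerosol"], "platform": ["satellite_Y"]},
]

# Hypothetical controlled dictionary of terms that metadata values must come from.
VOCABULARY = {"precipitation", "aerosol", "satellite_X", "satellite_Y"}


def build_triples(records, vocabulary):
    """Turn each (dataset, field, value) entry into a (head, relation, tail) triple.

    Values that are not in the controlled vocabulary are skipped, so the
    graph only contains terms the dictionary recognizes.
    """
    triples = set()
    for record in records:
        for field, values in record.items():
            if field == "id":
                continue  # the identifier is the head node, not an attribute
            for value in values:
                if value in vocabulary:
                    triples.add((record["id"], "has_" + field, value))
    return triples


if __name__ == "__main__":
    for triple in sorted(build_triples(RECORDS, VOCABULARY)):
        print(triple)
```

Representing the graph as a set of triples keeps the sketch independent of any particular graph library; the same triples could be loaded into a graph database or an embedding toolkit.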

As an initial step, we are creating a model trained on relationships drawn from the Knowledge Graph to capture how datasets are interconnected. We anticipate that this approach could reveal novel associations, thereby refining the dataset search and retrieval process.
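The abstract does not name a specific link prediction algorithm. To make the idea concrete, the sketch below substitutes a simple neighborhood heuristic (Jaccard similarity over shared attributes) for a learned model: a dataset is proposed a missing attribute when similar datasets already carry it. The triples, the `has_keyword` relation, and the function `predict_links` are hypothetical and exist only for this example.

```python
from collections import defaultdict

# Hypothetical triples; identifiers and terms are placeholders.
TRIPLES = [
    ("dataset_A", "has_variable", "precipitation"),
    ("dataset_A", "has_platform", "satellite_X"),
    ("dataset_A", "has_keyword", "hydrology"),
    ("dataset_B", "has_variable", "precipitation"),
    ("dataset_B", "has_platform", "satellite_X"),
    ("dataset_C", "has_variable", "aerosol"),
]


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_links(triples, top_k=3):
    """Score candidate (dataset, relation, term) links that are missing from the graph.

    A candidate's score is the summed similarity of every other dataset
    that already has that (relation, term) edge.
    """
    attrs = defaultdict(set)
    for head, relation, tail in triples:
        attrs[head].add((relation, tail))

    candidates = []
    for target, own in attrs.items():
        scores = defaultdict(float)
        for other, theirs in attrs.items():
            if other == target:
                continue
            similarity = jaccard(own, theirs)
            if similarity == 0.0:
                continue  # unrelated datasets contribute no evidence
            for edge in theirs - own:
                scores[edge] += similarity
        for (relation, tail), score in scores.items():
            candidates.append((score, target, relation, tail))

    # Highest score first; ties are broken alphabetically for determinism.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2], c[3]))
    return candidates[:top_k]


if __name__ == "__main__":
    for score, head, relation, tail in predict_links(TRIPLES):
        print(f"{head} {relation} {tail} (score={score:.2f})")
```

With these toy triples, the only suggestion is that dataset_B may be missing the hydrology keyword, because it shares two of three attributes with dataset_A. A learned approach such as a knowledge graph embedding model would replace the similarity score with a trained scoring function, but the input triples and the ranked output would have the same shape.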

To assess the viability of our approach, we plan a testing phase using a selection of datasets from the Goddard Earth Sciences Data and Information Services Center (GES DISC). This stage will help determine whether our model can contribute to more streamlined dataset discovery, in line with the broader goal of advancing open science. We look forward to sharing our progress and insights at the student conference and are eager to engage with peers and mentors to further refine our approach.
