Bringing FAIR Principles to AI Models

The Science

Researchers proposed the original FAIR (findable, accessible, interoperable, and reusable) principles to define best practices for maximizing the use of datasets by both researchers and machines. Scientists have since adapted these principles to research software as well, with two broad goals: to increase the transparency, reproducibility, and reusability of research, and to encourage software reuse over redevelopment. Artificial intelligence (AI) models bring together several digital assets, including datasets, research software, and advanced computing. Extending FAIR principles to AI models therefore also requires a computational framework for evaluating how well those principles are met. A new paper introduces a set of practical, concise, and measurable FAIR principles for AI models, and describes how to combine FAIR AI models and datasets to accelerate scientific discovery.

The Impact

This work introduces a definition of FAIR principles for AI models and showcases how to apply them to a specialized X-ray microscopy technique. Specifically, it demonstrates how to combine FAIR datasets and FAIR AI models to characterize materials at Argonne National Laboratory's (ANL) Advanced Photon Source two orders of magnitude faster than traditional methods. It also shows how to link the Advanced Photon Source with the Argonne Leadership Computing Facility to accelerate scientific discovery. This approach transcends differences in computer hardware, allows researchers to speak a common AI language, and enables accelerated AI-driven discovery. These FAIR guidelines for AI models will catalyze the development of next-generation AI and help reveal connections between data, AI models, and high-performance computing.

Summary

In this research, scientists created a FAIR experimental dataset of Bragg diffraction peaks from an undeformed bi-crystal gold sample, collected at the Advanced Photon Source at Argonne National Laboratory. This FAIR and AI-ready dataset was published to the Materials Data Facility. The researchers then used it to train three types of AI models at the Argonne Leadership Computing Facility (ALCF): a traditional AI model built with the open-source framework PyTorch; an NVIDIA TensorRT-optimized version of that PyTorch model, run on the ThetaGPU supercomputer; and a model trained on the SambaNova DataScale system at the ALCF AI Testbed. These AI models incorporate uncertainty quantification metrics that indicate clearly when their predictions can be trusted.
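The highlight does not specify which uncertainty quantification method the models use, so the following is only a minimal sketch, assuming Monte Carlo dropout as the technique. The PeakLocator network, its layer sizes, and the 0.05 trust threshold are illustrative placeholders, not the authors' implementation.

    import torch
    import torch.nn as nn

    class PeakLocator(nn.Module):
        """Illustrative CNN that regresses Bragg-peak center coordinates
        from a small diffraction patch (architecture is hypothetical)."""
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                nn.Dropout2d(p=0.2),  # stochastic layer sampled at inference
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.head = nn.Linear(32, 2)  # (x, y) peak center

        def forward(self, x):
            return self.head(self.features(x).flatten(1))

    def mc_dropout_predict(model, x, n_samples=32):
        """Return predictive mean and spread by re-sampling the dropout mask."""
        model.train()  # keep dropout active during inference
        with torch.no_grad():
            samples = torch.stack([model(x) for _ in range(n_samples)])
        return samples.mean(dim=0), samples.std(dim=0)

    model = PeakLocator()
    patches = torch.randn(8, 1, 15, 15)  # batch of 15x15 diffraction patches
    mean, std = mc_dropout_predict(model, patches)
    trusted = std.max(dim=1).values < 0.05  # flag low-spread predictions as trustworthy

Sampling the dropout mask repeatedly yields a spread over predictions; a small spread is one simple, measurable signal that a prediction can be trusted, which is the role the uncertainty metrics play in the published models.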

These three models were then published in the Data and Learning Hub for Science following the researchers' proposed FAIR principles for AI models. The team then linked these resources, the FAIR AI models and datasets, and used the ThetaGPU supercomputer at the ALCF to conduct reproducible AI-driven inference. The entire workflow was orchestrated with Globus and executed with Globus Compute. The researchers developed software to automate this work and asked colleagues at the University of Illinois to independently verify the reproducibility of the findings.
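To illustrate the execution layer, here is a minimal sketch of dispatching a remote inference task with the Globus Compute Python SDK. The endpoint UUID, file names, and the body of run_inference are hypothetical placeholders; the authors' actual workflow involves additional steps, such as retrieving the published models from the Data and Learning Hub for Science.

    from globus_compute_sdk import Executor

    def run_inference(model_path, data_path):
        # Runs remotely on the Globus Compute endpoint (e.g., on ThetaGPU).
        # Imports live inside the function because it executes in the
        # endpoint's own Python environment.
        import torch
        model = torch.jit.load(model_path)  # hypothetical serialized model
        batch = torch.load(data_path)       # hypothetical staged input tensor
        with torch.no_grad():
            return model(batch).tolist()

    # Placeholder UUID for a registered Globus Compute endpoint.
    ENDPOINT_ID = "00000000-0000-0000-0000-000000000000"

    with Executor(endpoint_id=ENDPOINT_ID) as gce:
        future = gce.submit(run_inference, "bragg_model.pt", "peaks.pt")
        print(future.result())  # blocks until the remote task completes

Because the function is shipped to wherever the endpoint runs, the same script works unchanged across different computing systems, the hardware-transcending property highlighted above.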

Funding

This work was supported by the FAIR Data program and the Braid project of the Department of Energy (DOE) Office of Science, Advanced Scientific Computing Research. It used resources of the Argonne Leadership Computing Facility, a DOE Office of Science user facility. It was also supported by the Department of Commerce, National Institute of Standards and Technology, the National Science Foundation, Argonne National Laboratory’s Laboratory Directed Research and Development program, and resources of the Advanced Photon Source, a DOE Office of Science user facility at Argonne National Laboratory.
