The Science
The Impact
This work introduces a definition of FAIR (findable, accessible, interoperable, and reusable) principles for AI models. It also shows how to apply these principles to a specialized technique, high-energy diffraction microscopy. Specifically, this work demonstrates how combining FAIR datasets with FAIR AI models can characterize materials at Argonne National Laboratory's (ANL) Advanced Photon Source two orders of magnitude faster than traditional methods. It also shows how to link ANL's Advanced Photon Source with the Argonne Leadership Computing Facility to accelerate scientific discovery. This approach works across different computer hardware, gives researchers a common language for describing and sharing AI models, and enables accelerated AI-driven discovery. These FAIR guidelines for AI models will catalyze the development of next-generation AI and help connect data, AI models, and high-performance computing.
Summary
In this research, scientists created a FAIR experimental dataset of Bragg diffraction peaks from an undeformed bi-crystal gold sample, measured at the Advanced Photon Source at Argonne National Laboratory. This FAIR, AI-ready dataset was published in the Materials Data Facility. The researchers then used the dataset to train three types of AI models at the Argonne Leadership Computing Facility (ALCF): a conventional AI model built with the open-source PyTorch framework; an NVIDIA TensorRT version of that PyTorch model, built on the ThetaGPU supercomputer; and a model trained on the SambaNova DataScale® system at the ALCF AI Testbed. These AI models include uncertainty quantification metrics that indicate when their predictions can be trusted.
The researchers then published the three models in the Data and Learning Hub for Science, following their proposed FAIR principles for AI models. Next, they linked these FAIR AI models and datasets with computing resources and used the ThetaGPU supercomputer at the ALCF to run reproducible AI-driven inference. The entire workflow is orchestrated with Globus and executed with Globus Compute. The researchers developed software to automate this workflow and asked colleagues at the University of Illinois to independently verify that the findings are reproducible.
Funding
This work was supported by the FAIR Data program and the Braid project of the Department of Energy (DOE) Office of Science, Advanced Scientific Computing Research. It used resources of the Argonne Leadership Computing Facility, a DOE Office of Science user facility. Additional support came from the Department of Commerce, National Institute of Standards and Technology; the National Science Foundation; and Argonne National Laboratory's Laboratory Directed Research and Development program. The work also used resources of the Advanced Photon Source, a DOE Office of Science user facility at Argonne National Laboratory.