The AI-driven initiative that’s hastening the discovery of drugs to treat COVID-19

To find a drug that can stop the SARS-CoV-2 virus, scientists want to screen billions of molecules for the right combination of properties. The process is usually risky and slow, often taking several years. However, an international team of scientists says it has found a way to make the process 50,000 times faster using artificial intelligence (AI).

Ten organizations, including the U.S. Department of Energy’s (DOE) Argonne National Laboratory, have developed a pipeline of AI and simulation techniques to hasten the discovery of promising drug candidates for COVID-19, the disease caused by the SARS-CoV-2 virus. The pipeline is named IMPECCABLE, short for Integrated Modeling PipelinE for COVID Cure by Assessing Better Leads. 

“With the AI we’ve implemented, we’ve been able to screen four billion potential drug candidates in a matter of a day, while existing computational tools might only realistically screen one to 10 million,” said Thomas Brettin, strategic program manager at Argonne.

Why an integrated approach is needed

IMPECCABLE integrates multiple techniques for data processing, physics-based modeling and simulation, and machine learning, a form of AI that uses patterns in data to generate predictive models.

“We integrate multiple approaches because there’s no single algorithm or method that can single-handedly work with great efficiency and accuracy,” said Argonne computational biologist Arvind Ramanathan. “If we only relied on simulations, it would take us years to find a likely target, even with the fastest supercomputers.”

Components of the pipeline

At the start of the pipeline, computational techniques are used to calculate the basic properties of billions of molecules. This data is used in the next stage of the pipeline to create machine learning models that can predict how likely it is that a given molecule will bind with a known viral protein. Those found to be most promising are then simulated on high-performance computing systems.

“Proteins are fluid structures, and simulations show us new conformations for them. We use those to improve our machine learning models,” said Argonne computational scientist Austin Clyde. “The iterative process continues until we can validate that the molecules we’ve identified as likely to bind to SARS-CoV-2 proteins have promise.”
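
The loop described above (cheap properties for every molecule, a fast learned model to rank them, costly simulation only for the top few, and feedback from simulation into the model) can be illustrated with a toy sketch. This is not the IMPECCABLE code: every function name, the linear scoring model, the update rule and the random "properties" below are hypothetical stand-ins chosen only to show the shape of such a screening funnel.

```python
import heapq
import random

N_FEATURES = 4  # hypothetical number of basic properties per molecule


def basic_properties(mol_id):
    """Stage 1 stand-in: cheap descriptors for one molecule (toy random values)."""
    rng = random.Random(mol_id)  # seeded so a molecule always gets the same values
    return [rng.random() for _ in range(N_FEATURES)]


def predicted_binding(props, weights):
    """Stage 2 stand-in: a linear surrogate model that scores likely binding."""
    return sum(p * w for p, w in zip(props, weights))


def simulate_binding(mol_id):
    """Stage 3 stand-in: the 'expensive' simulation, faked with a fixed formula."""
    props = basic_properties(mol_id)
    return 0.7 * props[0] + 0.3 * props[1]


def run_pipeline(library_size=10_000, keep=20, rounds=3, learning_rate=0.1):
    """Score the whole library cheaply, simulate only the top few, then refine."""
    weights = [1.0 / N_FEATURES] * N_FEATURES
    shortlist = []
    for _ in range(rounds):
        # Rank every molecule with the fast surrogate model.
        scored = ((predicted_binding(basic_properties(m), weights), m)
                  for m in range(library_size))
        shortlist = [m for _, m in heapq.nlargest(keep, scored)]
        # Simulate only the shortlist and nudge the model toward those results.
        for m in shortlist:
            props = basic_properties(m)
            error = simulate_binding(m) - predicted_binding(props, weights)
            weights = [w + learning_rate * error * p
                       for w, p in zip(weights, props)]
    return shortlist


if __name__ == "__main__":
    print(run_pipeline())
```

The point of the design is that the slow step runs on only `keep` molecules per round, while the fast model touches the entire library. In a real pipeline the stand-ins would be replaced by actual descriptor calculations, trained machine learning models and physics-based simulations.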

Very large experimental data sets are also being gathered from thousands of protein crystals using X-rays at the Advanced Photon Source (APS), a DOE Office of Science User Facility on Argonne’s campus. The technique used to collect this data is known as X-ray crystallography. With it, researchers can capture detailed images of viral proteins and their chemical states, which in turn help improve the accuracy of the machine learning models.

“Since the beginning of the pandemic, we’ve been able to determine over 45 high-resolution crystal structures of SARS-CoV-2 proteins and their complexes with other compounds. This information, when combined with computational analysis, can provide critical insights for further structure-based drug design efforts and enable the design of higher-affinity inhibitors and, ultimately, therapeutics that can be used to treat COVID-19,” said Andrzej Joachimiak, director of the Structural Biology Center (SBC) at beamline 19-ID-D of the APS.

The ultimate goals of the pipeline are to (1) understand the function of viral proteins; (2) identify molecules with a high potential to bind with these proteins and, as a result, block SARS-CoV-2 proliferation; and (3) deliver this insight to drug designers and developers for further research and development.

“Unlike the traditional approach, where you rely on the scientist to think really hard and, based on what they know, come up with ideas for a molecule, with our pipeline you can screen huge numbers of molecules automatically, dramatically increasing your chance of finding a likely candidate,” said Ian Foster, director of Argonne’s Data Science and Learning division.

Organizations involved in this research include Argonne, Rutgers University, University College London, University of Chicago, DOE’s Brookhaven National Laboratory, Oak Ridge Leadership Computing Facility (OLCF), Leibniz Supercomputing Center, NVIDIA Corporation, University of Amsterdam and the University of Naples Federico II. The team performed its computations on a diverse range of high-performance computing platforms, including the Texas Advanced Computing Center’s Frontera, Lawrence Livermore National Laboratory’s Lassen, Leibniz Supercomputing Center’s SuperMUC-NG, OLCF’s Summit, and the Argonne Leadership Computing Facility’s Theta, which was recently expanded with GPU nodes. The ALCF and OLCF are DOE Office of Science User Facilities.

This research was supported by DOE’s Office of Science through the National Virtual Biotechnology Laboratory, and as part of the CANDLE project by the Exascale Computing Project, a collaborative effort of DOE’s Office of Science and the National Nuclear Security Administration.

This work has also been supported in part by the Joint Design of Advanced Computing Solutions for Cancer (JDACS4C) program established by DOE and the National Cancer Institute (NCI) of the National Institutes of Health. SBC is funded by the DOE Office of Science Biological and Environmental Research program.

The Advanced Photon Source is a U.S. Department of Energy (DOE) Office of Science User Facility operated for the DOE Office of Science by Argonne National Laboratory. Additional funding for beamlines used for COVID-19 research at the APS is provided by the National Institutes of Health (NIH) and by DOE Office of Science Biological and Environmental Research. The APS operated for 10 percent more hours in 2020 than usual to support COVID-19 research, with the additional time supported by the DOE Office of Science through the National Virtual Biotechnology Laboratory, a consortium of DOE national laboratories focused on response to COVID-19 with funding provided by the Coronavirus CARES Act.

The Argonne Leadership Computing Facility provides supercomputing capabilities to the scientific and engineering community to advance fundamental discovery and understanding in a broad range of disciplines. Supported by the U.S. Department of Energy’s (DOE’s) Office of Science, Advanced Scientific Computing Research (ASCR) program, the ALCF is one of two DOE Leadership Computing Facilities in the nation dedicated to open science.

About the Advanced Photon Source

The U.S. Department of Energy Office of Science’s Advanced Photon Source (APS) at Argonne National Laboratory is one of the world’s most productive X-ray light source facilities. The APS provides high-brightness X-ray beams to a diverse community of researchers in materials science, chemistry, condensed matter physics, the life and environmental sciences, and applied research. These X-rays are ideally suited for explorations of materials and biological structures; elemental distribution; chemical, magnetic, electronic states; and a wide range of technologically important engineering systems from batteries to fuel injector sprays, all of which are the foundations of our nation’s economic, technological, and physical well-being. Each year, more than 5,000 researchers use the APS to produce over 2,000 publications detailing impactful discoveries, and solve more vital biological protein structures than users of any other X-ray light source research facility. APS scientists and engineers innovate technology that is at the heart of advancing accelerator and light-source operations. This includes the insertion devices that produce extreme-brightness X-rays prized by researchers, lenses that focus the X-rays down to a few nanometers, instrumentation that maximizes the way the X-rays interact with samples being studied, and software that gathers and manages the massive quantity of data resulting from discovery research at the APS.

This research used resources of the Advanced Photon Source, a U.S. DOE Office of Science User Facility operated for the DOE Office of Science by Argonne National Laboratory under Contract No. DE-AC02-06CH11357.

Argonne National Laboratory seeks solutions to pressing national problems in science and technology. The nation’s first national laboratory, Argonne conducts leading-edge basic and applied scientific research in virtually every scientific discipline. Argonne researchers work closely with researchers from hundreds of companies, universities, and federal, state and municipal agencies to help them solve their specific problems, advance America’s scientific leadership and prepare the nation for a better future. With employees from more than 60 nations, Argonne is managed by UChicago Argonne, LLC for the U.S. Department of Energy’s Office of Science.

The U.S. Department of Energy’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, visit https://energy.gov/science.