As simple as it is, the SARS-CoV-2 virus, responsible for the COVID-19 pandemic, has researchers around the world grappling to explain how it infiltrates the human immune system, setting off a viral chain reaction throughout the body. Understanding how the process works could lead to more effective ways of targeting drugs to stop that reaction.
Supporting a large collaboration of research organizations and scientific disciplines, the U.S. Department of Energy’s (DOE) Argonne National Laboratory is piecing together this puzzle by exploring the use of artificial intelligence and high-performance computing resources to study, in great detail, the complex dynamics of the spike protein, one of the key proteins in the SARS-CoV-2 virus.
The team, composed of nearly 30 researchers across 10 organizations, is trying to understand how that protein binds to and interacts with one of the first points of contact with the human cell, the ACE2 receptor protein. That binding begins a cascade of events that eventually lets the viral and human cell membranes fuse, allowing the SARS-CoV-2 virus to enter and infect the host.
Proteins aren’t static; they have a wide range of motions that span multiple length scales and timescales, and it’s not always understood which motions are important, notes Arvind Ramanathan, an Argonne computational biologist and co-principal investigator on the project. Understanding and simulating those motions requires a huge amount of data and computing resources.
Even a reasonable simulation of the spike protein alone involves a huge system of approximately 1.8 million atoms, and such simulations can produce enormous datasets that tax the resources of even the largest supercomputers. To make that data more accessible for interpretation, the team developed a machine learning method that can summarize large volumes of data.
“One of the key things that this method allowed us to do was to determine what was interesting, what was important, even those things that were not obvious to the human eye,” said Ramanathan. “So, when you look deeper using the simulations, you start seeing significant changes in the protein structure, which told us something about how the spike protein opens up such that it can interact with the ACE2 receptor.”
As the size of the systems they were working on grew, the team faced challenges of scaling all of the data to run fluidly on today’s biggest and best supercomputing systems, as well as their key components.
Because many of the machine learning models they were training on these large simulations needed to be efficiently scaled for use on supercomputers, they partnered with NVIDIA, a leader in GPU and artificial intelligence design, to effectively run the models on Summit, at the DOE’s Oak Ridge National Laboratory. The team also used many of the top U.S. supercomputers, including Theta at Argonne; Frontera/Longhorn at the Texas Advanced Computing Center; Comet at the San Diego Supercomputer Center; and Lassen at the DOE’s Lawrence Livermore National Laboratory, to uncover alternate ways to handle the deluge of data.
“Given the complexity of the data, trying to understand the ACE2 receptor-spike interaction seemed almost impossible at this scale,” Ramanathan confided. “One of the things that we clearly showed was that we could actuate a sampling of these dynamical configurations, pushing the idea that we could use AI to bridge these different scales.”
The data generated so far is providing new insights into how the stalk region of the spike protein changes its overall motions when it interacts with the ACE2 receptor, he said. Eventually, insights like these, derived from the tight coupling of machine learning and simulation, will help facilitate antibody or vaccine discoveries.
The team’s paper, “AI-Driven Multiscale Simulations Illuminate Mechanisms of SARS-CoV-2 Spike Dynamics,” is a finalist for a prestigious 2020 Gordon Bell Special Prize, sponsored by the Association for Computing Machinery. The awards will be announced Nov. 18–19 at SC20, the International Conference for High Performance Computing, Networking, Storage, and Analysis, held virtually this year. The paper will appear in the International Journal of High Performance Computing Applications, 2020.
“Whether we win or not, the whole point is just pushing the boundaries of what we think we can do with AI,” said Ramanathan. “The ability to scale such a huge set of simulations on one hand, and also be able to use AI to drive some factors was key for this to work out.”
This research was supported by the Exascale Computing Project, a collaborative effort of the U.S. DOE Office of Science and the National Nuclear Security Administration, and the DOE’s National Virtual Biotechnology Laboratory with funding from the Coronavirus CARES Act. This work used resources, services, and support from the COVID-19 HPC Consortium.
Argonne National Laboratory seeks solutions to pressing national problems in science and technology. The nation’s first national laboratory, Argonne conducts leading-edge basic and applied scientific research in virtually every scientific discipline. Argonne researchers work closely with researchers from hundreds of companies, universities, and federal, state and municipal agencies to help them solve their specific problems, advance America’s scientific leadership and prepare the nation for a better future. With employees from more than 60 nations, Argonne is managed by UChicago Argonne, LLC for the U.S. Department of Energy’s Office of Science.
The U.S. Department of Energy’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, visit https://energy.gov/science.