Proteins are strings of amino acids linked together in a specific sequence, drawn from 20 possible amino acids. Each protein is defined by its one-dimensional sequence of amino acids, which can be anywhere from a few dozen to several thousand amino acids long. For a protein to function, that string of amino acids must fold into a complex three-dimensional shape, and from there it is ready to interact with different parts of cells or with other proteins. This makes proteins dynamic, coming together and apart to send signals, change cell states, fight infections, reproduce, even create new proteins.
Accurately predicting protein structures and interactions can dramatically accelerate biological research. Take viruses, for example: understanding the nature of the interaction between a virus and human proteins is the first step toward working out how an infection takes hold, and then how to interrupt that process. If researchers were to start with that information, bringing vaccines to market could be a much faster process. And vaccines are only one application. Developing drug and gene therapies, battling antibiotic resistance, and finding a solution for microplastic pollution are some of the other areas of research that could benefit.
Thanks to what is now heralded as the greatest contribution A.I. has made to the world to date, protein structures and interactions can be predicted by entering a sequence into a computer and letting machine learning do the rest. DeepMind, a British A.I. subsidiary of Alphabet Inc., is a research laboratory founded in September 2010 and acquired by Google in 2014. DeepMind developed AlphaFold, software that, at first, only predicted a protein’s 3D structure from its amino acid sequence. After DeepMind open-sourced the software and invited researchers around the world to access the code and the growing database of protein structures, the scientists trying out AlphaFold discovered it could do even more. Professor Yoshitaka Moriwaki from the University of Tokyo tweeted a picture of his screen after asking AlphaFold to predict the structure of two proteins that he connected with a long, thin, looping linker. That linker made AlphaFold think it was analyzing a single protein, and so it predicted the structure that the two proteins would form together.
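The linker trick described above can be sketched in a few lines of Python. This is only an illustration: the post does not say what the linker was made of or how long it was, so the repeating glycine-serine unit ("GGGGS"), the repeat count, and the function name join_with_linker are all assumptions made for the example, not details of Moriwaki's experiment or of AlphaFold itself.

```python
# The 20 standard amino acids, written as one-letter codes.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def join_with_linker(seq_a, seq_b, linker_unit="GGGGS", repeats=6):
    """Join two protein sequences into one chain with a flexible linker.

    The linker composition and length are assumptions for illustration;
    the source only describes a "long, thin, looping linker".
    """
    for name, seq in (("seq_a", seq_a), ("seq_b", seq_b)):
        invalid = set(seq) - STANDARD_AMINO_ACIDS
        if invalid:
            raise ValueError(f"{name} contains non-standard residues: {sorted(invalid)}")
    linker = linker_unit * repeats
    # The result looks like a single protein to a single-chain predictor.
    return seq_a + linker + seq_b


# Hypothetical toy sequences, far shorter than real proteins.
combined = join_with_linker("MKTAYIAK", "AVLSGDE", repeats=2)
print(combined)
```

The combined string is what would be submitted to a structure predictor as if it were one protein; the prediction step itself is outside this sketch.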
It’s Evolutionary, Dear Watson
This is where machine learning comes in. AlphaFold had the ability to predict how chains of amino acids fold together by looking at the known structure of proteins with similar amino acid sequences. It looked for “rules” – logic that the protein interactions followed. Although mutations may occur in a protein over eons of evolution, the structure largely stays the same so it can maintain its key function. The rules that AlphaFold followed were essentially observations of proteins of different sequences within the same family.
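To make the idea of "observations of proteins of different sequences within the same family" a little more concrete, here is a toy sketch that measures how conserved each position is across a set of already-aligned family members. It is not AlphaFold's algorithm, which is far more sophisticated. The function name column_conservation and the example sequences are hypothetical, and the sketch assumes the sequences have already been aligned to equal length.

```python
from collections import Counter


def column_conservation(aligned_sequences):
    """Return, for each position, the fraction of sequences that share
    the most common residue at that position (1.0 = fully conserved).

    Assumes the sequences are pre-aligned and therefore equal in length.
    """
    if not aligned_sequences:
        raise ValueError("need at least one sequence")
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("aligned sequences must all have the same length")

    scores = []
    # zip(*...) walks the alignment column by column.
    for column in zip(*aligned_sequences):
        most_common_count = Counter(column).most_common(1)[0][1]
        scores.append(most_common_count / len(column))
    return scores


# Hypothetical aligned family members (toy data, not real proteins).
family = ["MKTAY", "MKSAY", "MRTAY"]
scores = column_conservation(family)
print([round(score, 2) for score in scores])
```

Positions that stay the same across a family despite mutations elsewhere are one simple signal of the kind of evolutionary pattern the paragraph above describes.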
Pedro Beltrao, a cell biologist at the European Bioinformatics Institute, used AlphaFold to study known protein interactions in human cells. Beltrao and colleagues applied the software to more than 65,000 protein-protein interactions. The results, which appeared in a preprint late last year, show that they identified the structure of more than 3,000 complexes with a high degree of confidence. Before this, only 5% had been defined through methods such as X-ray crystallography.
Potential Applications
This has two applications for human health: understanding how mutations impact protein function, and finding structural pockets where drugs could be targeted. Roland Dunbrack, a structural biologist at the Fox Chase Cancer Center, put this into perspective in a press release, saying, “What we can do with AlphaFold, specifically in cancer, is predict the structure of the protein in which a mutation occurs. We can see where that mutation is, see how the mutated protein interacts with other molecules, and ultimately get an idea … why that mutation causes a problem for that cell. Once you have a mechanism for how a mutation causes a problem, you can begin to think about strategies for stopping that problem.”
With a more comprehensive image of the protein interactions formed by cancer-associated proteins such as BRCA1 and RAS, researchers are better positioned to detect their vulnerabilities. RAS has been particularly difficult to target, even as it plays a role in more than 30% of all cancers, including nearly half of all colon cancers and virtually all pancreatic cancers.
Complex Collaborations
David Baker, Head of the Institute for Protein Design at the University of Washington, and Minkyung Baek, a postdoctoral candidate in Baker’s lab, reverse-engineered DeepMind’s software before it was open-sourced. In June, the University released its own open-source, nearly as good version called RoseTTAFold that could also work for protein-protein complexes.
Baker collaborated broadly to advance the application of the software further. In a new paper, researchers used RoseTTAFold and AlphaFold to predict protein-protein complexes that scientists didn’t even know existed. They identified hundreds of new interactions, potentially pointing to new mechanisms at the center of core processes such as DNA repair, protein translation and replication.
“The exciting step forward here is that protein complexes provide considerably more insight into mechanism than [individual proteins],” Baker said in an email. “The biological impact of this second chapter is likely considerably greater than the first.”
Due Diligence
Dunbrack noted that you still want to validate structures experimentally before developing drugs for them. Even though AlphaFold can be highly accurate in its predictions, it still makes mistakes. Recently a cancer-associated protein called BRD4 made the software “crash and burn.”
Regardless of these caveats, this transformational tool represents a powerful new approach that is bridging the worlds of biology and technology, energizing scientists, and spurring productive collaborations. In addition to open access to the software, researchers can tap into the AlphaFold Protein Structure Database, a collaboration between the European Molecular Biology Laboratory and DeepMind. It holds structures for more than 350,000 proteins from 21 model organisms, with plans to expand predictions to millions of structures in 2022.
QPS is a GLP- and GCP-compliant contract research organization (CRO) delivering the highest grade of discovery, preclinical and clinical drug research development services. Since 1995, it has grown from a tiny bioanalysis shop to a full-service CRO with 1,100+ employees in the U.S., Europe and Asia. Today, QPS offers expanded pharmaceutical contract R&D services with special expertise in neuropharmacology, DMPK, toxicology, bioanalysis, translational medicine and clinical development. An award-winning leader focused on bioanalytics and clinical trials, QPS is known for proven quality standards, technical expertise, a flexible approach to research, client satisfaction and turnkey laboratories and facilities. Through continual enhancements in capacities and resources, QPS stands tall in its commitment to delivering superior quality, skilled performance and trusted service to its valued customers. For more information, visit www.qps.com or email info@qps.com.