More than 20 years ago, when researchers published the human genome sequence, they counted roughly 20,000 protein-coding genes: discrete sections that make up approximately 1% of our DNA. Since then, researchers have detected 90% of the proteins those genes encode. The missing portion of this catalog, labeled the “dark proteome,” has three parts: unidentified proteins from known genes, uncharacterized versions of modified proteins, and other proteins believed to exist but whose structure and function are unknown.
Scientists are searching for these undiscovered human proteins in the hope of better understanding human biology, especially disease mechanisms, and of devising ways to counter those diseases. The goal is to build a more complete human proteome, the set of all proteins expressed in the human body. A recent article in Chemical & Engineering News describes the current state of the search and where the gaps lie.
Undetectable … For Now
One reason some proteins remain hidden is simply that they elude the methods currently used to detect them. They may not be expressed by the cells being studied, or they may be expressed only in very small quantities at very specific times, which makes them hard to catch in typical analyses. Alternatively, they may lack the features that analysis requires. For example, if the enzyme trypsin can’t digest a protein, that protein can’t be detected in a typical mass spectrometry analysis. Other enzymes can also break proteins into peptides, but because they are more limited in application than trypsin, there is little motivation to depart from the standard protocol. Finally, it may be as simple as timing: if a protein isn’t in the sample, it can’t be detected.
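To make the trypsin point concrete, here is a minimal Python sketch of an in-silico tryptic digest. It assumes the simplified textbook rule that trypsin cuts after lysine (K) or arginine (R) unless the next residue is proline (P), and it ignores missed cleavages. The 7 to 30 residue window used to call a peptide “detectable” is a hypothetical threshold chosen for illustration, not a property of any particular instrument, and the function names are likewise hypothetical.

```python
import re

# Simplified trypsin rule (an assumption for illustration): cut after K or R,
# except when the next residue is P. Missed cleavages are not modeled.
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_digest(sequence: str) -> list[str]:
    """Split a one-letter protein sequence into tryptic peptides."""
    # Splitting on a zero-width pattern requires Python 3.7+; drop the empty
    # string produced when the sequence ends in K or R.
    return [peptide for peptide in TRYPSIN_SITE.split(sequence) if peptide]


def detectable_peptides(sequence: str, min_len: int = 7, max_len: int = 30) -> list[str]:
    """Keep only peptides inside a hypothetical length window for MS detection."""
    return [p for p in tryptic_digest(sequence) if min_len <= len(p) <= max_len]


if __name__ == "__main__":
    print(tryptic_digest("AKPGRCDK"))  # ['AKPGR', 'CDK']
    no_sites = "ACDEFGHILMNQSTVWY" * 3  # 51 residues, no K or R
    print(detectable_peptides(no_sites))  # []
```

A sequence with no K or R comes back as a single long peptide that falls outside the assumed window, which mirrors the situation described above: the protein is present, but the standard workflow produces nothing it can readily measure.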
The payoff for trying alternative protocols or dramatically increasing the sampling frequency isn’t guaranteed, so grants to research these missing proteins are in short supply. More often than not, scientists conduct this research in and around their other, funded work.
Proteoform: A Protein Modified
Protein variants, or proteoforms, result from differences in gene expression or from modifications that occur after a protein is produced. They add complexity to the dark proteome, because the roughly 20,000 human genes could give rise to millions of protein varieties.
As explained in the Chemical & Engineering News article, the protein-making process doesn’t simply start at one end of a gene and read through to the other end; the RNA is often spliced together in different ways. This means a gene might code for protein sections 1, 2, 3, and 4, but the spliced RNA could instruct the ribosome to make a protein consisting of 1-2-3-4, 1-3-4, or 1-2-4. These different forms of the expressed gene are called isoforms, and each can perform a different task in the body.
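The splicing example can be sketched in a few lines of Python. This is a deliberate simplification that models only exon skipping: the first and last sections are always kept (as in every combination listed above), each internal section is either included or skipped, and the original order is preserved. Real splicing follows other patterns too, and cells do not necessarily produce every combination; the function name is hypothetical.

```python
from itertools import combinations


def exon_skipping_isoforms(exons: list[int]) -> list[list[int]]:
    """Enumerate ordered isoforms by optionally skipping internal sections.

    Hypothetical helper: assumes the first and last sections are always kept
    and models exon skipping only (no other splicing patterns).
    """
    if len(exons) < 2:
        return [list(exons)]
    first, *internal, last = exons
    isoforms = []
    # List the full-length isoform first, then progressively shorter ones;
    # combinations() preserves the original order of the sections.
    for k in range(len(internal), -1, -1):
        for kept in combinations(internal, k):
            isoforms.append([first, *kept, last])
    return isoforms


if __name__ == "__main__":
    for isoform in exon_skipping_isoforms([1, 2, 3, 4]):
        print("-".join(map(str, isoform)))  # 1-2-3-4, 1-2-4, 1-3-4, 1-4
```

Under these assumptions, a gene with n sections yields 2^(n-2) possible combinations, which gives a sense of how a fixed set of genes can produce a much larger set of protein variants.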
Beyond isoforms, proteins can be chemically modified by various processes within the cell. Examples of these posttranslational modifications include the addition of methyl or acetyl groups or of sugar molecules. Another modification results from an enzyme that “snips” proteins into shorter ones so they can serve a different purpose; this is how the biologically active form of insulin is made, by cutting it from a longer precursor protein.
Parag Mallick cofounded Nautilus Biotechnology to create better tools for detecting and identifying protein variants. The Nautilus method uses fluorescent reagents that bind to specific protein structures and variants while the proteins are immobilized, rather than destroying the proteins in a sample as mass spectrometry does. This way, the same sample can be analyzed with multiple kinds of probes. The technology is not yet commercially available but may be in the future.
Using Artificial Intelligence to Search in the Dark
Proteins with unknown structures or functions represent another area of the dark proteome. Scientists are deploying new methods based on artificial intelligence (AI) to predict the structures of these unknown proteins. Starting from a dark protein’s amino acid sequence, these algorithms draw on data from proteins of known structure to predict how its amino acid chain might fold.
This folding into complex three-dimensional shapes determines how proteins interact with other proteins and with different parts of the cell. Protein structures are not always fixed or well defined, and research suggests that some proteins in this part of the dark proteome may fold in ways that have not yet been seen.
Uncovering these unknown proteins’ structures could hint at their functions. Other clues to function may come from gene silencing experiments and AI approaches. However, researchers caution that the data available to train AI algorithms are not yet sufficient to put AI fully to use in this application.
When discussing the dark proteome, researchers refer to “the known unknowns,” acknowledging how much more there may be to explore and uncover. As new technologies and approaches shed light on these unknowns, we hope to move that much closer to understanding human biology, and to learning how to restore or repair it as needed.
QPS is a GLP- and GCP-compliant contract research organization (CRO) delivering the highest grade of discovery, preclinical and clinical drug research and development services. Since 1995, it has grown from a tiny bioanalysis shop into a full-service CRO with 1,100+ employees in the U.S., Europe and Asia. Today, QPS offers expanded pharmaceutical contract R&D services with special expertise in neuropharmacology, DMPK, toxicology, bioanalysis, translational medicine and clinical development. An award-winning leader focused on bioanalytics and clinical trials, QPS is known for proven quality standards, technical expertise, a flexible approach to research, client satisfaction and turnkey laboratories and facilities. Through continual enhancements in capacities and resources, QPS stands tall in its commitment to delivering superior quality, skilled performance and trusted service to its valued customers. For more information, visit www.qps.com.