Deep Mind Knows how Proteins Fold

This week, Deep Mind, a London-based Google company, claims to have solved the single most consequential problem in computational biochemistry: the protein-folding problem. If true, this could be the start of something big.


What does it mean, and why is it important? Let’s start with signal transduction. This is the term for the body’s chemical computer. The nervous system, of course, constitutes a signal-processing and decision-making engine; and in parallel, there is a chemical computer. The body has molecules that talk to other molecules that talk to other molecules, sending a cascade of ifs and thens down a chain of logic. The way molecules with very complex shapes fit snugly together is the language of the chemical computer. These molecules with intricate shapes are proteins, and they are not formed in 3D. Rather, DNA provides instructions for a linear peptide chain: the DNA is transcribed into messenger RNA, which ribosomes (present in every cell) translate into a chain of amino acids chosen from a canonical set of 20. Each peptide chain folds into a protein with a characteristic shape, and it is these shapes that constitute the body’s signaling language. Most age-related diseases can be traced to an excess or a deficiency of these protein signal molecules.

So signal proteins are targets of medical research. Pharmaceutical interventions may modify signal transduction, perhaps by goosing signaling at some juncture, or by siphoning off a particular signal with another chemical designed to fit perfectly into its bumps and hollows. Up until now, there has been a lot of trial and error in the lab, looking for chemicals with complementary shapes. Imagine now that the Deep Mind press release is not exaggerating, and they really can reliably predict the shape that a peptide will take once it is folded. Then many months of laboratory experiments can be replaced with many hours of computation. All the trial-and-error work can be done in cyberspace. An inflection point in drug development, if it’s true.

Why it’s a Hard Problem

Computers solve large problems by breaking them down into a great many small ones. But protein folding can’t be solved by looking separately at each segment of the protein molecule. Everything affects everything else, and the optimal shape is a property of the whole. Proteins are typically huge molecules, with hundreds or thousands of amino acids chained together. The backbone bonds on either side of each peptide bond can rotate fairly freely. So the number of shapes you can form with a given chain is truly humongous. The sheer number of possibilities would overwhelm any computer program that tried to deal with the different shapes one at a time.
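To get a feel for the size of that number, here is a back-of-envelope sketch in Python. The counting rule is an illustrative assumption, not a measured property of any protein: it supposes two rotatable backbone bonds per amino acid, three allowed positions per bond, and a hypothetical computer that checks 10^15 shapes per second. The function names are invented for this example.

```python
import math

SECONDS_PER_YEAR = 3600 * 24 * 365

def log10_conformations(n_residues: int, states_per_bond: int = 3) -> float:
    """Log10 of the number of backbone shapes under the toy counting rule.

    Assumption: each link between neighboring residues contributes two
    rotatable bonds, and each bond can sit in `states_per_bond` positions.
    """
    rotatable_bonds = 2 * (n_residues - 1)
    return rotatable_bonds * math.log10(states_per_bond)

def log10_years_to_enumerate(n_residues: int, shapes_per_second: float = 1e15) -> float:
    """Log10 of the years needed to visit every shape once at a hypothetical rate."""
    log10_shapes_per_year = math.log10(shapes_per_second * SECONDS_PER_YEAR)
    return log10_conformations(n_residues) - log10_shapes_per_year

for n in (10, 100, 1000):
    print(f"{n} residues: about 10^{log10_conformations(n):.0f} shapes, "
          f"about 10^{log10_years_to_enumerate(n):.0f} years to try them all")
```

Working in logarithms keeps the numbers printable; the raw counts for a chain of a thousand amino acids are far too large to store as ordinary floating-point values. Even with generous assumptions about computer speed, checking shapes one at a time is hopeless for all but the shortest chains.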

The thing that stabilizes a given shape is, in large part, hydrogen bonding. Nominally, each hydrogen atom can form only one bond, but a hydrogen bound to nitrogen or oxygen is a closet bigamist, and it longs to couple with a nearby nitrogen or (better still) oxygen atom even as it is bound primarily to its LTR partner. Every twist and bend in the molecular chain allows some new opportunities for hydrogen bonding, while removing others.

The breakthrough in computing came from 1% inspiration and 99% perspiration (Edison’s recipe). A key input was the set of about 170,000 known, natural protein structures, which were used to train the computer to retrodict the known results. Then, when working with a new and unknown shape, the computer makes decisions that are based on its past success.

How does it make the decisions? No one knows. One of the most successful techniques in artificial intelligence uses generic layers of input and output with programmable maps, and the maps are trained to give the right answer in known cases. But the fundamental logic that drives these decisions remains opaque, even to the programmers. 
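For readers who want to see what "training a map on known cases" means in the smallest possible setting, here is a toy sketch in Python. It is emphatically not Deep Mind's method: it is a single adjustable map with two numbers (a weight and a bias), fitted by gradient descent to four made-up input/output pairs. The data, the learning rate, and the function name `train` are all assumptions chosen for illustration.

```python
def train(examples, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b to known (x, y) pairs by gradient descent.

    Each pass nudges w and b in the direction that reduces the mean
    squared error between the map's output and the known answers.
    """
    w, b = 0.0, 0.0
    n = len(examples)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in examples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in examples) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Made-up "known cases" that happen to follow y = 2x + 1.
known_cases = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train(known_cases)

# Apply the trained map to an input it has never seen.
prediction = w * 10.0 + b
print(f"w={w:.3f}, b={b:.3f}, prediction for x=10: {prediction:.3f}")
```

With two parameters we can read off what the map has learned. A real network stacks many such maps with millions of adjustable numbers, and that is where the opacity comes from: the training procedure is simple, but the resulting numbers carry no obvious explanation.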

 

It gets more complicated

Many proteins don’t have a unique folded state. They are in danger of folding the wrong way. So there are proteins called chaperones that help them to get it right. These chaperones don’t explicitly dictate the protein’s final structure, but rather they place the protein in a protected environment. There are about 20,000 different proteins needed in the human body, but only a handful of different chaperones.


Factoid: Most inorganic chemical reactions take place on a time scale of billionths of a second. Organic reactions are somewhat slower. But protein folding is slower still: small proteins can fold in milliseconds or less, while large ones take seconds, or even minutes, a human time scale.


The AI that finds a protein’s ultimate structure must have knowledge of the environment in which the protein folds. It is not merely computing something intrinsic to the sequence of amino acids that makes up the nascent protein. To underscore this problem, proteins fold incorrectly almost as often as they fold correctly. There is an army of caretaker proteins that inspect and correct already-folded proteins. Misfolded proteins tend to clump together and there are chemicals specialized in pulling them apart. For the lost causes, there are proteasomes, which break the peptide bonds and recycle a damaged protein into constituent parts. Proteins destined for recycling are first tagged with a small protein called ubiquitin, so named because it is found in virtually every cell.

The question arises, how do these caretaker proteins know what is the correct shape and what is a misfolded shape? Remember that the number of chaperones and caretakers is vastly smaller than the number of proteins that they attend to, so they cannot contain detailed information about the proper conformation of each protein they service. And this leads to a deep question for AI: It’s hard enough to know how a particular protein chain will fold into a conformation that is thermodynamically optimized. But the conformation optimized for least energy may or may not be the one that is useful to the body.

Prions are mysterious

In the late 1970s, a young neurologist named Stanley Prusiner began to suspect that misfolded proteins could be infectious agents. He coined the term prion for a misfolded protein that could cause other proteins to misfold. This idea defied prevailing views about how pathogens evolve, and in particular ran afoul of Francis Crick’s Central Dogma of Molecular Biology, which said that information was always stored in DNA and transferred downstream to proteins.

The evolutionary provenance of prions remains a mystery, but it is now well-established that certain misfolded proteins can cause a chain reaction of misfolding. The process is as mysterious as it is frightening. Neil Ferguson, who has become infamous this year for his apocalyptic COVID contagion models, frightened the UK in an earlier episode into slaughtering and incinerating more than 6 million cows and sheep, in a classic example of panic leading to overkill.

Prusiner had to wait less than 20 years before the medical community acceded to his heresy. He was awarded the Nobel Prize in 1997.

Example and Teaser

This example is from a review I am preparing for this space next week. I am reading two recent papers about proteins in the blood that change as we age. Assuming that these signals are drivers of aging, what can be done to enhance the action of those that we lose, or suppress the action of those that increase with age? The connection to the present column is that knowledge of protein folding can be used to engineer proteins that redirect the body’s chemical signal transduction at a given intervention point. For example, FSH (follicle-stimulating hormone) is needed for just a few days of a woman’s menstrual cycle, but FSH levels rise late in life, with disastrous consequences for health. FSH shoots up in female menopause, and in males it rises more gradually.

FSH drives the imbalance in blood lipids associated with heart disease and stroke. In lab rodents, FSH can be blocked with an antibody, or by genetic engineering, with consequent benefits for cardiovascular health [ref] and reduced loss of bone mass [ref]. The therapy also reduces body fat: “Here, we report that this antibody sharply reduces adipose tissue in wild-type mice, phenocopying genetic haploinsufficiency for the Fsh receptor gene Fshr. The antibody also causes profound beiging*, increases cellular mitochondrial density, activates brown adipose tissue and enhances thermogenesis.” [ref] In the near future, we may be able to use computer-assisted protein design to create a protein that blocks the FSH receptor and do safely in humans what was done with genetic engineering in mice.
_______________
*Beiging is turning white adipose tissue to brown. Briefly, the white fat cells are permanent and cause diabetes, while the brown are burned for fuel.

12 thoughts on “Deep Mind Knows how Proteins Fold”

  1. Josh:
    Your article was a masterpiece of communication of a complex issue in an
    understandable manner. Thanks for that!

    • Agreed, very nice distillation of very complex topic…..but regarding Prusiner (hadn’t heard of him previously) my editor’s eye noticed he’s described as making his discovery in late 1970s and got his Nobel less than 20 years later, so there’s a typo in the date, he got the prize in 1997, not 1977

  2. Wow, AlphaFold (DeepMind) is leaving everyone in the dust. Beating the rest by a good 20-30%. Reminds me of AlexNet in 2012 – the first breakthrough of deep learning in image recognition.
    Hopefully they can solve the binding problem as well.

  3. Josh, computational biology is where you have a distinct advantage over other researchers in molecular biology. Explaining the crux of the breakthrough gave a glimpse of your intelligence. I especially liked one line: “Most age-related diseases can be traced to an excess or a deficiency of these protein signal molecules.”

  4. Does it mean that, having identified the methylation site on the DNA, we will be able to create a fast artificial demethylation agent without an arduous search for a natural safe substance that will do the job?

    • You can’t just demethylate the whole genome; you’d probably want to target certain amino acid residues in particular histones. Or maybe target certain regulatory genes that maintain proper methylation levels, if we find that these ‘guardians’ themselves lose expression levels with age.

      • By “the methylation site” I mean a single point, like the one that keeps telomerase switched off in most cells most of the time, or the one that can’t switch the ISR site off once it gets activated in many cell types. Actually, telomerase plays a few roles – first it goes to mitochondria to repair them, then it lengthens the chromosomes’ caps, and the change of the methylation pattern on the whole chromosome to a young state is a side effect. So it is the safest possible way to demethylate the DNA – switch on the one single methylated point (that is persistently switched off) and the cell does all the rest by itself.

    • I think that’s a different problem. We could apply deep learning to find hidden guiding sequences that predispose certain promoters to age-dependent differential methylation.
      Probably there is an interplay of many binding factors. So experiments focusing on a few binding factors have a low chance of uncovering the mechanism. Then we could identify the signalling networks that drive the age related differential methylation and attack those to stop the clock.

  5. Hopefully we’ll soon see a vast improvement in the speed and accuracy of computational matching of pharmaceuticals to their protein targets. Right now we have lots of targets we want to hit but have to play roulette with a couple of million $ doing assays every time we look for a candidate drug.

  6. Interesting article and intriguing information, Josh.

    Just to note and you very likely already know this information… Inhibin B suppresses FSH

    Inhibins are heterodimeric protein hormones secreted by granulosa cells of the ovary in females and Sertoli cells of the testis in males. Inhibins selectively suppress the secretion of pituitary follicle-stimulating hormone (FSH) and also have local paracrine actions in the gonads.

    Also in women Estradiol/Estrogen therapy will lower FSH levels, although it will not stimulate egg production in the ovary.

    DHEA may also lower FSH levels.
