The Mother of All Clinical Trials, Part II

Part II: Why We Should Trust the Methylation Clocks to Measure Aging

Last week, I proposed that methylation age could be used to measure the benefits of putative anti-aging interventions.  This procedure has the potential to slash the cost and the duration of testing. The reason is that we don’t have to wait for a small percentage of experimental subjects to become sick or die.  The vast majority of subjects in a human anti-aging trial give us no information whatever.  In contrast, with the aging clock, every experimental subject is a data point, and the effect on his aging might be measured in a year or two.

The proposal depends critically on the assumption that whatever slows aging will slow the methylation clock, and the converse: whatever slows the methylation clock slows aging.  Some people will find this hard to believe, because their fundamental conception of aging is an accumulation of damage, so that any association with methylation will be incidental or worse!  (What if the changes in methylation that accompany aging tell the story of the body’s increasingly powerful efforts to repair the damage from aging?)

But for those of us who come to the table open to the idea that aging is an epigenetic program, a close (causal) association between methylation and aging seems utterly expected.  For decades, developmental biologists have assumed that development in childhood is driven by age-dependent gene expression. The only thing that prevents us from seeing that the same is true about aging is a kind of prejudice from evolutionary theory that I have described in my book and in this blog.

4. Brief History of the Horvath Clock

Of the many biochemical markers that cells use for epigenetic control, methylation of CpG sites is the best studied.  If you know what that is, all the better; if you don’t, all you need to know is that methylation is a modification of the DNA that adds a CH3 group to cytosine residues in a promoter region adjacent to a gene.  Regions with heavy methylation tend to suppress expression of the (usually) adjacent gene.  Methylation isn’t the only means by which gene expression is controlled; there are many others.  But it is by far the best-studied and, given present technology, it is the only epigenetic marker that can be routinely measured, for a few hundred dollars, in a small sample of blood, urine, or nanogram-scale biopsy of other tissue.

The clock was developed by Steve Horvath at UCLA and first published in 2013, building on an idea of Teschendorff from a few years earlier.  He identified patient records for methylation measurements of tissue samples from 8,000 individuals, with associated ages.  Methylation is recorded as a number between 0 and 1 for each cytosine, indicating the proportion of that site that is methylated.  He scanned the entire genome for sites that changed most with age, and varied least from one tissue type to another.  In this way, he identified 353 sites, and optimized a set of 353 multipliers, such that multiplying the level of methylation at each site by its multiplier and adding the products produced a number that could be mapped onto chronological age.  About 45% of the multipliers are negative (sites losing methylation with age) and 55% positive (gaining methylation).
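To make the arithmetic concrete, here is a minimal sketch of a clock of this weighted-sum form. The three sites, their multipliers, and the intercept are made up for illustration; they are not Horvath's published coefficients, and the final mapping onto chronological age is left out.

```python
# Hypothetical illustration: three invented CpG sites stand in for the 353
# sites of the real clock. None of these numbers is a published coefficient.

def linear_clock(methylation, multipliers, intercept=0.0):
    """Multiply each methylation fraction by its multiplier and add the products."""
    if len(methylation) != len(multipliers):
        raise ValueError("one multiplier per site is required")
    for beta in methylation:
        if not 0.0 <= beta <= 1.0:
            raise ValueError("methylation must be a fraction between 0 and 1")
    return intercept + sum(beta * w for beta, w in zip(methylation, multipliers))


# Two positive multipliers (sites gaining methylation with age) and one
# negative multiplier (a site losing methylation with age).
multipliers = [40.0, -25.0, 60.0]
intercept = 10.0

younger_sample = [0.20, 0.80, 0.10]
older_sample = [0.60, 0.40, 0.50]

younger_score = linear_clock(younger_sample, multipliers, intercept)
older_score = linear_clock(older_sample, multipliers, intercept)
print(younger_score, older_score)
```

With these invented numbers the second sample scores 50 units higher than the first, because it has gained methylation at the positively weighted sites and lost it at the negatively weighted one.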

The original Horvath clock correlates 0.95 with chronological age.  The standard error in predicting any one individual’s age is ± 4 years.  Averaging over N individuals improves the accuracy of the clock by a factor of √N, so that the average of 100 individuals is accurate to 0.4 years.  (This is a general statistical principle that is useful to remember.)  For our purposes, the relevant question is: measuring the same individual at two different times, how accurate is the difference in Horvath age compared to the elapsed time?  There is no data on this yet, but we might safely assume that it is well under 4 years, since the standard error of 4 years represents mostly individual departures from the average.
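The √N rule can be checked in a couple of lines. This is a small sketch that assumes the errors on different individuals are independent; the function name is mine.

```python
import math


def standard_error_of_mean(individual_error_years, n_individuals):
    """Standard error of an average over n independent measurements."""
    return individual_error_years / math.sqrt(n_individuals)


# Individual standard error of 4 years, as quoted for the original clock.
for n in (1, 25, 100, 400):
    print(n, standard_error_of_mean(4.0, n))
```

Note the diminishing returns: each halving of the error costs four times as many subjects.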

Five years after Horvath’s original publication, there are several other clocks based on methylation.  Just this spring, Horvath has developed a new clock, not yet published, which, to my knowledge, is the best standard we have.  This is the Levine/Horvath clock.  It is based on 513 methylation sites and it is calibrated not to chronological age, but to a tighter measure of age-based health, derived from blood lipid profiles, inflammatory markers, insulin resistance, etc., which Horvath calls “phenotypic age”.  Consequently, it is less well correlated with chronological age than the original, but it is better able to predict mortality than either the classic Horvath clock or chronological age itself. By this measure, the scatter has been greatly reduced.


There is statistical evidence that the Levine clock reliably reports phenotypic age, and there is theoretical reason to believe that what the clock measures is close to the root cause of aging.

5. Statistical evidence that the Levine Clock=PhenoAge reliably measures biological age

What I find most convincing is the meta-analysis based on historic data.  Levine and Horvath use old, frozen blood samples to calculate Horvath and Levine ages as they were at some past date.  These are people who have died since the blood was drawn, and Horvath Age accurately “predicts” the remaining life expectancy of the subjects [Chen, Aging 2016].  There is less data available for the new Levine clock, but there are strong indications that it performs much better than the Horvath clock for this purpose.

In addition, many of the life styles that promote long life have been confirmed to slow the Levine clock, while, conversely, obesity and high blood pressure and insulin resistance have been found to accelerate aging as measured by the Levine clock.

  • Epigenetic age correlates with progression of Alzheimer’s and Parkinson’s Disease [Levine 2016]
  • Same for Arthritis [Horvath 2015]
  • Menopause moves the methylation clock forward.  Early menopause is associated with accelerated methylation aging, and late menopause with younger methylation age.
  • Epigenetic age is accelerated by obesity, blood sugar, insulin, and inflammation
  • Epigenetic age is retarded by carotenoids*, exercise, education (!), and by diets high in vegetables, fruits and nuts.
  • Stem cell transplants lower epigenetic age more dramatically than anything (from a study of leukemia patients [Stolzel 2017]).  Epigenetic age is set back ~8 years for a short period, but then accelerates to a set-forward a few years after treatment.

6. Theoretical foundation of the Horvath Clock

The original Horvath clock was developed by a statistical process that took into account only chronological age.  But Horvath age turns out to be a better predictor than chronological age for risk of all the diseases of old age.  This is powerful evidence that methylation is measuring something fundamental about the aging process.  If an individual’s methylation age is higher or lower than his chronological age, the difference is a powerful predictor of his disease risk and how long he will live.  This can only be true if methylation is associated with a fundamental cause of age-related decline.

An emerging theory of the last 7 years is that aging proceeds under epigenetic control.  De Magalhaes, Rando, Blagosklonny, Johnson and Mitteldorf all have independently proposed an epigenetic basis for aging.  The root cause of aging (the reason our bodies are different at age 70 compared to age 20) is that different sets of genes are expressed at different times of life.  This principle is already well-accepted for growth and development [ref, ref].  During formation of the body in utero, gene expression rapidly changes, and in early childhood, the growth and maturation of the body are widely agreed to occur under epigenetic control. But now we know that much of the change in methylation is continuous, from development through aging [ref].  I call this programmed aging.  Blagosklonny hedges and says “quasi-programmed”.  The difference is about evolutionary purpose and whether function is related to natural selection.  My view is that we are programmed for a fixed lifespan for the stability of the community. Blagosklonny’s is that the epigenetic changes that start in development continue afterward through a kind of inertia because there isn’t enough natural selection to turn these changes around.

But for the sake of the reliability of the methylation clock in evaluation of anti-aging interventions, these two perspectives converge: they both support the expectation that methylation age will be an excellent criterion for trying and judging new ideas and combinations of old ideas.

Parabiosis experiments support the idea that factors circulating in the blood have a deep effect on the age of the body.  This is indirect support for the epigenetic foundations of aging, because these blood factors come from gene expression in cells–especially but not exclusively endocrine cells.

7. Counter-arguments

A) ‘Epigenetic drift’.  Many authors still write about changes in methylation during aging as “epigenetic drift”.  For those who cannot accept the idea that aging is programmed, it is much more palatable to imagine a loss of order in gene expression, a randomization of gene expression.  Indeed, this is true. It is part of the story that gene expression does become more random with age.  But it is also true that there are specific gene expression changes associated with aging–the methylation clock is based on such programmed changes.

B) Perhaps gene expression changes are a response to damage, the body’s attempt to mitigate aging.  This is the suspicion that haunts the aging clock.  If this is the case, interventions that thwart the mitigation would come out looking like age reversal, but in fact they’d have the opposite effect, increasing risk of disease and mortality.  Support for this idea comes from the prejudice that says “the body would never purposefully destroy itself.” But there is no evidence for this idea, and in fact many of the programmed changes have been shown to be detrimental.  For example, signals for inflammation are increased, DNA repair is slowed down, and the anti-oxidant metabolism is suppressed.

“DNA PhenoAge acceleration was found to be associated with increased activation of pro-inflammatory and interferon pathways and decreased activation of the transcriptional and translational machineries, the DNA damage response and nuclear mitochondrial signatures” [quote from Horvath 2018; footnote is to Levine 2018]

C) Not all anti-aging interventions affect the Levine or Horvath clocks.  This is a substantial problem if it turns out that there are real anti-aging strategies that work, and yet the Levine clock won’t tell us that they work.  But we don’t really know this yet, because we don’t really know what works. “For example, within a 9-month follow-up period, the substantial weight loss resulting from bariatric surgery was not associated with a reduction in epigenetic age of human liver tissue samples” [quote from Horvath 2018; footnote is to Horvath 2014].  To the extent we think that bariatric surgery is a legitimate anti-aging strategy, this is a problem.


8. Improvements and adaptations of the Horvath clock

The “clocks” we’re talking about are really mathematical operations.  Given the output of a blood (or urine) test that reports what percentage of the DNA is methylated at each of hundreds of  thousands of different CpG sites, the “clock” is a computer program that distills this information down to a single number, the predicted age.

The Levine clock is a substantial improvement on the original Horvath clock, attained by calibrating it against health indicators and not just chronological age.  For prediction, it leaves its predecessors in the dust.

There are three more ways in which the methylation age test can be improved, and I have begun working with the Horvath lab to do the number crunching in support of these changes.

A) The original clock and all its successors have thus far been based on combining information from 353 different methylation sites in the simplest possible way.  They simply have 353 different multipliers. It is these 353 (positive and negative) numbers that have been optimized by the statisticians, so that each multiplier can be multiplied by each methylation, and the 353 products are added up to make a single number that indicates age.  My suggestion is to combine the 353 sites in a more flexible way. Some change rapidly during youth and then remain constant. Some change continually over a lifetime. Some don’t change much at all until aging sets in. There is no reason that all the 353 sites have to be treated the same way.  Using non-linear math that’s just a little more complicated, the 353 sites can be tracked in a way that corresponds to their peculiar lifetime trajectories. This will improve the clock’s accuracy for any application.
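One way to picture the "more flexible" combination is to give each site its own transform before the multiplier is applied. The sketch below is only an illustration of that general shape. The transforms, site values, and multipliers are all hypothetical, and this is not necessarily the non-linear method I have in mind or one the Horvath lab uses.

```python
import math

# Hypothetical per-site transforms; none is taken from a published clock.
TRANSFORMS = {
    "identity": lambda beta: beta,     # treats the site exactly as a linear clock would
    "sqrt": math.sqrt,                 # stretches differences at low methylation
    "square": lambda beta: beta ** 2,  # stretches differences at high methylation
}


def flexible_clock(sites, intercept=0.0):
    """sites is a list of (methylation, multiplier, transform_name) tuples."""
    return intercept + sum(
        multiplier * TRANSFORMS[name](beta) for beta, multiplier, name in sites
    )


# Three invented sites, each tracked with a different transform.
sites = [
    (0.25, 40.0, "sqrt"),
    (0.50, -20.0, "identity"),
    (0.50, 80.0, "square"),
]
flexible_score = flexible_clock(sites, intercept=10.0)

# With every transform set to "identity" this reduces to the plain weighted sum.
linear_score = flexible_clock([(b, w, "identity") for b, w, _ in sites], intercept=10.0)
print(flexible_score, linear_score)
```

The design point is that the multipliers are still fitted statistically; the only new freedom is which curve each site is allowed to follow.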

B) The clock might be specialized to the application of testing anti-aging effects on individual humans, i.e., comparing biological age for the same individual at two different times.  Some of the scatter in the plot of DNAmAge is due to variation from one individual to another, and some is due to other random factors that don’t depend on the individual. In the past, there was little data available for the same individual at two different times, but this is changing, and now it is feasible to separate the two kinds of scatter.  The clock can then be specialized to report age differences even more accurately.

C) Again, for the particular application proposed, there is no need for a clock that works generally on any age, from pre-birth to centenarian.  If all of the people in the study are between the ages of 50 and 70, then the clock might be specialized to be more accurate in this age range, at the expense of losing accuracy for younger and older subjects–who aren’t part of the study.  It may be worthwhile to take this idea even further and have four sub-specialized clocks, calibrated for ages 50-55, 55-60, 60-65 and 65-70.

In my brief experimentation with the data, I was able to raise the correlation from 95% to 96% using technique A.  I’m guessing that with further work it can be raised to 98%. The reason that it pays to do this is that the cost of a human study depends on (A) how many people are studied and (B) how long a time they are followed.  As the scatter in the data is reduced by better statistical techniques, we can find out what we need to know with fewer subjects and a shorter study time. Raising the correlation from 96% to 98% will reduce the number of subjects needed for the experiment by a factor of 4.  Alternatively, for the same effort and expense, we will be able to derive more information.

If we can indeed construct a clock with 98% accuracy, a new benefit will be available:  It will be accurate enough to distinguish changes for a single individual with no statistical averaging necessary.  This will be a gateway to individualized medicine.  There will always be treatments that work for some people but not others, and the future of medicine is connected to knowing what works for you as an individual.  Each of us will be able to use the methylation clock to know how we are doing. You can try a new supplement for a year and if it doesn’t work for you as an individual, you’ll know it and switch to trying something else next year.


*carotenoids are molecules related to vitamin A, but vitamin A itself does not slow the aging clock.

The Mother of All Clinical Trials, Part I

Part I: An Incipient Revolution in Epidemiology

There are a great number of promising interventions that might have anti-aging benefits, singly and in combination.  There is a testing bottleneck, which means that we don’t know what works. By way of contrast, there is a well-documented catalog of life extension interventions in lab worms, but for humans we’re mostly in the dark.  To complicate things further, lab worms are clonal populations, while every human is different, and there are growing indications that many if not most medications work for some people and not others.

Horvath’s methylation clock is a disruptive technology that could make human testing of longevity interventions ten times faster and 100 times cheaper than it has been in the past.  No one is yet doing this kind of testing, but you and I should be advocating vigorously, and volunteering as subjects to help test whatever it is that we are already doing.

Let me begin with the punchline, and work backward to build a foundation under the idea.  I think we might learn a great deal and push the science of anti-aging medicine forward with a study encompassing about 10,000 people like you and me—people who are aware of the long-term consequences of their diet, exercise, supplements, and medications—10,000 people who are trying different combinations of things in a conscious effort to maintain long-term health and extend their lives.  We need a standard form for recording our individual habits and a standard measure of progress. Subjects will be required to

  • keep diaries of what they are doing for long-term health  (It would be helpful but not necessary that they keep to the same program for a year or two.)
  • send in blood or urine samples at the beginning and end of a year for methylation testing
  • sign up for a database so all their records can be compiled

Given a database like this, multivariate statistical techniques can, in principle, separate the effects of different interventions individually, and also their interactions.

The idea is only as good as the Horvath clock.  Can we detect differences in aging rate over a time period as short as a year or two?  And how sure are we that the Horvath clock really captures the differences that affect aging and long-term health?  That’s what next week’s article will be about.

The present cost of methylation testing is several hundred dollars, but a funder would only have to put up a fraction of that. The rest would be covered by participants themselves, and Zymo Research, the only company offering commercial testing of methylation age, would offer bulk discounts because they are investing in their future, and because their costs are likely to drop with volume.

So far, I’ve talked about this with Steve Horvath of UCLA, Brian Delaney of LEF, Larry Jia of Zymo, and Elissa Epel of UCSF.  All are enthusiastic about the idea. Though none is yet convinced to throw resources into the project, I believe that this trial or something much like it will begin within a year, as scientists and funders have a chance to recognize its potential and rearrange their plans appropriately.  (I will also approach Aubrey de Grey at SENS, but their primary commitment is to a different model, developing new interventions rather than testing what we have already.)



For several years I’ve been talking to anyone who will listen about the importance of testing.  (Here’s a 2015  link, and here’s an update from two months ago.)  Aging clocks based on DNA methylation are a disruptive technology which will change the way we screen putative longevity treatments.  We now have the potential to learn in a very few years what works and what does not.

There are a great number of promising interventions that might have anti-aging benefits, singly and in combination.  Some are already approved and safe for use in humans, yet we don’t know what will be most effective. Because human longevity studies are prohibitively slow and expensive, none have ever been funded or conducted.  (We know only accidentally that aspirin and metformin lower mortality rates in humans, because these drugs were prescribed to tens of millions of people beginning in the 1960s for cardiovascular disease and diabetes, respectively, with no premonition that they might extend lifespan.)

We have relied on animal tests, biochemical theory, and guesswork because testing in humans has been impractical.  Epidemiological studies require treating a very large population and following them over a course of decades. Even very substantial differences in mortality rates can be difficult to detect because the baseline mortality rate is low, because researchers inevitably lose track of some subjects over such long time scales, and because there are so many confounding variables that must be overcome with sheer numbers.

Testing of anti-aging interventions in humans has been so expensive and slow that we have been forced to make inferences from animal tests, supplemented by historic (human) data from drugs that happen to have a large user base going back decades.  As it turns out, it is much easier to extend lifespan in worms than in mammals, and even the interventions that work in rodents don’t always work in humans. Conversely, there are drugs that work in humans that don’t work in mice—how are we to find them?

We know so much more about life extension in C. elegans worms than in people because worms live only a few weeks, are easily cloned, and can be grown by the thousands in standard laboratory conditions.  Humans are not so easily controlled, they can’t be genetically engineered or cloned, and their lives can’t be manipulated in the interest of science.  It takes decades to document the long-term effects of dietary changes, drugs, supplements and exercise routines, and it generally requires thousands of people to separate the effects of one particular intervention from all the differences in genetics and lifestyle that distinguish human individuals.

Just this year, a test is available that is accurate enough to measure anti-aging benefits on short time scales, without waiting for subjects to die.  DNAm PhenoAge is a simple blood test developed at the UCLA lab of Steve Horvath.  It determines risk of age-related mortality accurate to about 1 year of biological age.  Averaging over just a hundred people pinpoints biological age with accuracy of one month.  This implies that an anti-aging benefit can be detected with high reliability using a test population of just a few hundred people, followed for two years, tested at the beginning and end of this period.  A study that might have required fifteen years and cost hundreds of millions of dollars can now be completed in two years at a cost of less than $1 million. When this new technology is embraced, we will have the means to separate the most effective treatment combinations from a large field of contenders.

1. Testing is Important

We have a program in basic science that will eventually lead to understanding of aging at a molecular level.  This will suggest molecular interventions that can alter the course of aging. This approach is a sure bet, and it will yield a great deal of interesting science and clinical applications along the way.  The drawback is that it is slow.  At least several decades will be required to understand aging from the system level down to the molecular level.  What can be done to accelerate progress toward substantial anti-aging remedies?

You might think that the bottleneck is in ideas.  What we need is a disruptive idea. Something like CRISPR or the Yamanaka factors, or maybe some engineered molecule that leaves rapamycin in the dust.

I don’t think so.  How would we recognize this great idea if we saw it?  If it were rather conventional, it’s unlikely it would produce revolutionary results.  On the other hand, if the idea were profoundly different and innovative, why would we believe in it without extensive testing?  And who would pay for the testing?

I believe that testing is really the bottleneck here.  We may well have our powerful anti-aging tonic already in hand, and we don’t know it.  And if the breakthrough is yet to come, we will need a way to recognize that it works.

Two years ago, I proposed that the best promise is in combinations of known therapies.

The listed interventions all have been shown to extend lifespan in rats or mice.

We know what they do individually, but we don’t know how they interact among themselves.  In reality, of course, we’ll never see that 172% life extension.  Almost all interactions are expected to be redundancy—in other words A and B together are a marginal improvement over either A or B separately.  But occasionally, we will discover that A and B synergize. A and B administered together yield life extension greater than the sum of what is available from each of them separately.
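The distinction between redundancy and synergy comes down to comparing the combined gain with the sum of the individual gains. Here is a minimal sketch of that comparison; the function names and the percentages in the example are hypothetical, not measured results.

```python
def interaction(gain_a, gain_b, gain_ab):
    """Combined gain minus the sum of the separate gains (percent life extension)."""
    return gain_ab - (gain_a + gain_b)


def classify(gain_a, gain_b, gain_ab):
    """Label a pair of interventions by how their combined gain compares to the sum."""
    excess = interaction(gain_a, gain_b, gain_ab)
    if excess > 0:
        return "synergy"
    if excess == 0:
        return "additive"
    return "redundant"


# Made-up gains for two interventions, A and B, alone and together.
print(classify(10, 15, 17))  # together only slightly better than B alone
print(classify(10, 15, 40))  # together better than the sum of the two
```

In a real study each gain carries its own statistical uncertainty, so the interaction term needs more subjects to pin down than either main effect does.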

But there are an enormous number of combinations to test.  How are we going to find those combinations that synergize together?

2. Testing in humans is slow and expensive

It’s not just because humans require a level of care and safety that you don’t worry about in animal tests.  It’s the length of the human lifespan.

If you’re studying an old drug like metformin or aspirin, then you have a database of people who have used it for decades, and you can look for small differences in their rates of disease or mortality.

But suppose you want to try a new remedy, or a new combination of remedies?  Typically, you would choose several thousand people as a test group. You need to wait for a substantial number of them to die, so you want to start as late as possible.  On the other hand, it’s easier to maintain the health of a younger person than to restore the health of an older person, so you want subjects as young as possible. So perhaps you compromise with an age around 50 or 60.

Then you administer the drug or combination of drugs in half the subjects and a placebo in the other half.  You follow them for a decade and monitor compliance. How many of them are still taking your placebo 10 years later?  Out of a sample of 1,000 sixty-year-olds, you expect 120 of them to die before their 70th birthday. Now suppose you had an intervention that would cut the death rate by 10%, so only 108 of them died.  The trouble is that statistically, you can’t tell the difference between 108 and 120.  The random fluctuations will overwhelm this difference.
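To see why 108 and 120 can't be told apart, compare the gap with the size of the random fluctuation in a death count. This sketch treats each death as an independent event with the same probability (a binomial model), which is a simplifying assumption.

```python
import math


def binomial_sd(n_subjects, death_probability):
    """Standard deviation of the number of deaths under a binomial model."""
    return math.sqrt(n_subjects * death_probability * (1 - death_probability))


n_subjects = 1000
expected_control_deaths = 120
expected_treated_deaths = 108

sd_deaths = binomial_sd(n_subjects, expected_control_deaths / n_subjects)
gap_in_sd = (expected_control_deaths - expected_treated_deaths) / sd_deaths
print(sd_deaths, gap_in_sd)
```

The fluctuation comes out at about 10 deaths, so a gap of 12 is barely more than one standard deviation, well short of the roughly two standard deviations usually required for 95% confidence.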

How large a sample would you have to start with in order to detect that difference with 95% confidence? For N=6,000 tests + 6,000 controls, you would detect a 10% difference with 95% confidence half the time.  If you wanted to be 90% sure that your results would be statistically significant, you would need 15,000 test subjects and 15,000 controls, tracked for 10 years.  The cost would be in the hundreds of millions  of dollars.
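These sample sizes can be approximated with the standard normal-approximation formula for comparing two proportions. The sketch below assumes a two-sided test and equal-sized arms; it is my reconstruction of this kind of calculation, not necessarily the exact method behind the figures above.

```python
import math
from statistics import NormalDist


def subjects_per_arm(p_control, p_treated, alpha=0.05, power=0.90):
    """Approximate subjects needed in each arm to detect p_control vs p_treated."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = NormalDist().inv_cdf(power)          # extra margin for the desired power
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    delta = p_control - p_treated
    return math.ceil((z_alpha + z_power) ** 2 * variance / delta ** 2)


# 12% ten-year mortality in controls, cut by 10% to 10.8% in the treated group.
n_for_half = subjects_per_arm(0.12, 0.108, power=0.50)
n_for_ninety = subjects_per_arm(0.12, 0.108, power=0.90)
print(n_for_half, n_for_ninety)
```

Under these assumptions the formula gives a little over 5,000 per arm for a 50% chance of a significant result and just under 15,000 per arm for a 90% chance, in the same range as the figures quoted above.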

Another way to think about the same example:  Imagine that the treatment you are testing does not immediately lower the mortality rate, but it slows the rate of aging by 20%.  The result is about the same—a 10% lower mortality over 10 years.

At New York’s Albert Einstein College of Medicine, Nir Barzilai is organizing the first ever clinical trial of an anti-aging drug.  Metformin is the drug he chose, based on lower rates of all-cause mortality, cancer, and Alzheimer’s disease among people who have been prescribed metformin to control diabetes.  The risk of Alzheimer’s between age 60 and 80 is about 10%. Data from people taking metformin suggest the drug could reduce this to 7% [Knowler, 2002].  Barzilai is still trying to fund this study with about $50 million.  For that, the TAME study hopes to recruit 3,000 subjects (1500+1500) [Sciencemag 2015].  What is the probability that they will have results significant at the (p<0.05) level?  Answer: 83%. You may think that’s pretty good. Or you may be horrified that he could spend $50 million and there’s a 1 in 6 chance that, just because of dumb luck, the trial wouldn’t produce significant results.  There’s a footnote to the 83% number: The 31% Alzheimer’s risk reduction comes from a study of younger people, but Barzilai is planning to recruit subjects from 65 to 79 years old because the rate of AD is higher.
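For readers who want to check the power figure, here is a sketch using the normal approximation for two proportions, with a two-sided test at p<0.05 and 1,500 subjects per arm. The function is my own illustration and the inputs are the rounded 10% and 7% risks from the text, so the result will not match the quoted figure exactly.

```python
import math
from statistics import NormalDist


def power_two_proportions(p_control, p_treated, n_per_arm, alpha=0.05):
    """Approximate chance of a significant result when the true rates differ."""
    standard_error = math.sqrt(
        (p_control * (1 - p_control) + p_treated * (1 - p_treated)) / n_per_arm
    )
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_effect = abs(p_control - p_treated) / standard_error
    return NormalDist().cdf(z_effect - z_alpha)


tame_power = power_two_proportions(0.10, 0.07, 1500)
print(round(tame_power, 2))
```

With these inputs the power comes out in the low-to-mid 80s percent. The answer is sensitive to the assumed risk reduction: a smaller true effect in older subjects would lower it quickly.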

3. Suppose we could accurately measure effects on aging without having to wait…

What’s the alternative?  I’m so glad you asked. Suppose we could actually measure aging.  We don’t have to wait for someone to die or be diagnosed with dementia.  We can do a blood test instead and determine that “this subject has aged 1.5 years”  or “this subject has been rejuvenated by 0.5 years”.

To reimagine the TAME protocol with an aging clock, we need to add an assumption about what the effect of metformin might be on the Horvath clock (or successor).  From reduction in mortality combined with an actuarial table, we might infer an age setback. Lamanna 2010 reports an OR=0.80.  Facila 2017 reports OR=1.34/2.24=0.60.  Bannister 2014 reports OR=0.85 when comparing diabetics on metformin with non-diabetics (yes, metformin in some studies reduces mortality for diabetics lower than it would have been if they didn’t have diabetes in the first place).  The logarithmic increase in mortality for a 60-year-old is about 0.075 per year, corresponding to a range for actuarial setback of 2 to 7 years for long-term metformin use.
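The conversion from a mortality ratio to an age setback is a one-line formula if mortality rises exponentially with age: setback = -ln(OR) / 0.075. The sketch below applies it to the three ratios quoted above, treating each odds ratio as a ratio of mortality rates, which is an approximation.

```python
import math

MORTALITY_LOG_SLOPE = 0.075  # yearly increase in log mortality around age 60, from the text


def actuarial_setback_years(mortality_ratio, log_slope=MORTALITY_LOG_SLOPE):
    """Years of age whose mortality difference matches the given ratio."""
    return -math.log(mortality_ratio) / log_slope


reported_ratios = [
    ("Bannister 2014", 0.85),
    ("Lamanna 2010", 0.80),
    ("Facila 2017", 0.60),
]
for study, ratio in reported_ratios:
    print(study, round(actuarial_setback_years(ratio), 1))
```

The three ratios map to setbacks of roughly 2.2, 3.0 and 6.8 years, which is where the range of 2 to 7 years comes from.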

Let’s say the experiment lasts 2 years and, after 2 years on metformin, the subjects have aged only 1¾ years. Very conservative, I think. Compared to 3000 subjects over 10 years, you could get equivalent results from a Horvath clock over 2 years’ time with 200 subjects.  The total cost of the study could be reduced from $50 million to less than $1 million.

These probabilities are not difficult to compute, but their inputs are very uncertain.  We don’t know how much scatter there will be in the difference between two Horvath clock readings when repeated for the same person.  I’ve assumed 1.414 years. It could well be better. We don’t know whether metformin will slow the epigenetic clock, and by how much. It may be that we will get that same 3-month benefit in one year instead of two.
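Here is one way to run the numbers. The sketch reads the 1.414-year scatter as √2 times a 1-year error on each single reading (independent errors add in quadrature) and compares the average aging of 200 treated subjects against the expected 2 years, with no separate control arm. Both readings are assumptions made for this sketch rather than a stated protocol.

```python
import math


def difference_sd(single_reading_sd):
    """Scatter of (second reading - first reading) when the two errors are independent."""
    return math.sqrt(2) * single_reading_sd


def z_score(effect_years, scatter_years, n_subjects):
    """How many standard errors the average effect stands above zero."""
    return effect_years / (scatter_years / math.sqrt(n_subjects))


scatter = 1.414            # years, the assumed scatter of a before/after difference
effect = 2.0 - 1.75        # a quarter-year less aging over the two-year study
z = z_score(effect, scatter, 200)
print(difference_sd(1.0), z)
```

Under those assumptions the quarter-year benefit stands about 2.5 standard errors above zero with 200 subjects. A design with a separate untreated arm of equal size would need more subjects for the same margin.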

The Bottom Line

Numbers are my thing, and I’m sorry if I’ve left your head spinning.  The take-home is that by switching from traditional epidemiological studies of mortality to the Horvath clock, we can get the same information five times faster and 100 times cheaper. 

For example: Barzilai’s TAME study is projected to cost $50 million, it will take 10 years, and it will teach us the benefits of just one drug. The study I’m proposing will cost less than $10 million and most of this will be covered by Zymo (as discounts) and by subjects themselves. It will take only 2 years, and we will learn about a dozen different interventions and their interactions.

Next week, Part II: Reasons to think that the Horvath Clock will be up to this task