At Ending Age-Related Diseases 2021, Daniel Belsky discussed the DunedinPACE and DunedinPoAm biomarkers and how biomarkers that measure the pace of aging might prove useful in the development of geroprotectors.
Thanks everyone for sticking it out for this talk. I’m very pleased to join you and be able to share some of the work we’ve been after for the last couple of years. I’m going to talk, in the beginning, a little bit about how we think about aging and how we went about measuring it, and then I’m going to introduce this new DNA methylation-based measure of aging we’ve been working on and show you some data from the CALERIE randomized trial, which is the first-ever human RCT of long-term calorie restriction in healthy, non-obese adults.
There’s a disclosure here. You’ll hear more about that in the next talk. I want to begin with acknowledgement to the funders and collaborators that have made all this work possible, especially Terrie Moffitt, Avshalom Caspi, and their group at Duke University. They’ve been my great mentors and collaborators these past many years.
Also David Corcoran and Karen Sugden, who’ve worked quite closely with me on many projects, especially development of these measurements that I’ll tell you about, and Richie Poulton, who directs the Dunedin study at the University of Otago that provided much of the data that went into the development of this measure.
Other folks on the slide are Andrea Baccarelli and Zhu Gao at Columbia University and Peking University, respectively, Michael Kobor and David Lin at the University of British Columbia, and then my collaborators on the CALERIE trial work, Bill Kraus, Virginia Kraus, and Kim Huffman, who is leading the team at Duke.
You’ve seen a lot of stuff over the past couple of days of this conference having to do with geroscience. I’m not gonna belabor that point, but I think it is useful orientation at the outset of talks like this one just to describe what we think is going on. This cartoon model essentially illustrates this idea that an accumulation of molecular changes, sometimes called hallmarks or pillars of aging, feeds forward to drive decline in the integrity of our many different organ systems, ultimately resulting in a decline in functional capacity, the onset of chronic disease, disability, and ultimately death.
Critically, the hypothesis in geroscience is that intervention to slow or reverse these molecular changes can forestall declines in system integrity and thereby delay or prevent functional decline, disease, disability, mortality, etc, etc, etc. The reason that’s hard, among others, since we do, in fact, have a number of geroprotective therapies that work to slow aging and extend healthy lifespan in C. elegans worms, in C57 Black 6 mice, in Drosophila, is that translating those measures to humans requires randomized trials, which would have to run for decades to detect effects of a mid-life intervention on prevention or delay of aging-related diseases.
It takes a very long time to test effects on healthspan. As a result, many of us have been working for a long time to develop surrogates for this healthspan construct, this idea of the years of life you get to live in good health that could be measured over a matter of years and would enable randomized clinical trials to test effects of some of these interventions. Broadly, these efforts to develop surrogate metrics for healthspan have focused on measuring something called biological age, which we might think of as the underlying biological substance of the process of aging, which ultimately disables and kills us.
This figure from a Nature Reviews molecular biology article from seven years back illustrates the general concept of quantification of biological age, which is that we build a model in some reference dataset or sample that describes the typical biological or physiological states of individuals at different points in the lifespan.
We can then take our research participants and benchmark them against this reference population to say, you may be 25, but you have the physiology of the average, in our reference dataset, 30-year-old, so you’re five years older biologically than you are chronologically and vice versa.
I’m sure you’ve heard a great deal about this so far today. Feel free to ask questions at the end if the idea is unclear, but ultimately, biological age is a summary of the extent to which the biological process of aging has unfolded in a participant’s body up to the time that we measure it and describes how much older or younger they are than we think they should be based on our model.
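As a rough illustration of that benchmarking idea, here is a minimal sketch in Python. It assumes a single made-up biomarker that rises linearly with age in a hypothetical reference sample; real biological age measures combine many markers, and all names and numbers below are invented for illustration.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares fit of ys = intercept + slope * xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical reference sample: chronological age and one made-up biomarker
# that rises by 1 unit per year in this toy data.
ref_ages = [20, 30, 40, 50, 60, 70]
ref_marker = [100, 110, 120, 130, 140, 150]

intercept, slope = fit_line(ref_ages, ref_marker)

def biological_age(marker_value):
    """Age at which the reference sample typically shows this marker value."""
    return (marker_value - intercept) / slope

def age_gap(marker_value, chronological_age):
    """Positive means biologically older than chronological age."""
    return biological_age(marker_value) - chronological_age

# A 25-year-old whose marker matches the typical 30-year-old in the reference.
gap = age_gap(110, 25)
```

In this toy setup the 25-year-old comes out five years older biologically than chronologically, which is the comparison described above.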
Now, there are some limitations to this biological age construct that I want to outline as preliminary arguments for the development of the measurements that we’ve been working on and that we were seeking to address. These are, briefly, mortality selection, cohort effects, and uncertain timing of the exposures that drive differences between biological age and chronological age. I’ll say a few additional words about each of these in turn.
Mortality selection, or sometimes called survival bias, depending on your discipline, ultimately represents the idea that older and younger individuals are drawn from different populations. Young people have experienced very little aging-related mortality, [but a cohort of older people does not include many less healthy] individuals born in the same year, as those folks have already died.
As a consequence, our models of biological age that are developed from comparisons of older and younger people may underestimate the true aging signal because we’re essentially comparing a true population average drawn from people in their 20s and 30s to this slower-ager average represented by people who we’re observing in their 70s and 80s.
A second limitation of the approaches to develop biological age is the presence of cohort effects. These are differences between individuals born in different years in exposure history. We might imagine that in our studies, when we’re looking at folks who were born in the 1920s and 1930s, and comparing them to people who were born in the 70s, 80s, 90s, and now the 2000s, there are dramatic differences in their exposure to undernutrition, to pathogens both in utero and postnatally, to a range of environmental toxicants, as well as certain health behaviors that have become dramatically less prevalent over time. In particular, we might think about tobacco use.
Finally, this question of uncertain timing, which has been brought into focus by recent work in the Gladyshev lab, illustrating the extent to which the very earliest phases of development can drive significant changes in molecular assessments of biological age. Ultimately, this question of uncertain timing revolves around the idea that when we measure a participant’s biological age at some point in adulthood, that measurement can tell us whether they look older or younger than the comparison individual in the control group, but it won’t tell us when that difference arose in development.
To the extent that those differences represent very early established differences in their biology, they may be ultimately less sensitive to the kinds of geroprotective interventions we’re seeking to deliver, particularly when those interventions are modifications to behavior, lifestyle, or some other intermediary.
The current measurements that we have, particularly at the molecular level of biological age, are a family of measurements known as DNA methylation clocks, and on the slide here is Steve Horvath, who I think you heard from earlier today. Back in 2013, he changed the world in terms of quantification of biological aging with his introduction of the multi-tissue DNA methylation clock.
For those of you who, for whatever reason, are just joining us now, these clocks are algorithms that combine information about DNA methylation states of loci across the human genome to infer the biological age of the individual, again, based on a model developed in a reference dataset. For those of you not familiar with DNA methylation, you can see the little cartoon on the slide. These are chemical tags on the nucleotide sequence of our genomes, and in classical genome biology, they act to silence the expression of genes.
In fact, we know that DNA methylation marks are gained and lost with the process of aging in such a regular way that these algorithms, these epigenetic clocks, can predict the chronological age of an individual with stunning accuracy. It’s on the basis of that initial observation that they’re very good at predicting age that Steve Horvath and his colleagues developed this epigenetic theory of aging, proposing that those whose clock ages were older than their chronological ages were, in fact, biologically older and that the reverse was also true.
The development of these DNA methylation clocks has gone through two phases so far. The first generation of clocks were based simply on models of differences between people who were younger and older in terms of their chronological age.
A more recent iteration in their development has focused on differences in mortality risk, often filtered through measurements of participants’ physiologies. So the PhenoAge clock and the GrimAge clock, about which I imagine you’ve heard a great deal so far at this meeting, were developed by modeling age and physiological differences from DNA methylation, and then, in turn, fitting that to a mortality risk model.
The second generation of clocks are far superior at predicting morbidity and mortality, but they remain subject to those core limitations of biological age measures that I articulated previously. Mortality selection is still a significant problem somewhat mitigated in the second-generation clocks, which model mortality risk differences rather than chronological age differences. The cohort effects and uncertain timing issues remain profound.
What I’m going to share with you now is the theory of our measurement approach, which we first published back in 2015 (that’s Terrie Moffitt and Avshalom Caspi at the bottom of the slide, with whom this work was developed and conducted). The theory is that aging is characterized by a gradual and progressive decline in system integrity. We can, therefore, infer the underlying rate of biological aging from observations of decline in integrity occurring across multiple organ systems.
Changes in any individual system could reflect a specific disease process or injury, but changes occurring across multiple systems are more likely to reflect that underlying process of aging, which we know drives cross-system vulnerability. Finally, that process of decline should be observable already by young adulthood, even in people who are still decades from typical age of onset of the range of chronic diseases that are the classical signs of advancing biological aging.
Very briefly, the way we went about designing this measure was to compile assessments of multiple different organ systems within a cohort of individuals who were all born in the same year. These are participants in the Dunedin Longitudinal Study, a birth cohort consisting of all the babies born in the Queen Mary Hospital in Dunedin, New Zealand between 1972 and 1973.
At the time we conducted this study, they had been followed up through the 38th birthday. At each of three assessments, at ages 26 years, 32 years and 38 years, they were brought into the laboratory at the University of Otago, and a range of measurements were taken of their lung function and their cardiovascular health; bloods were drawn, and DNA was assayed for telomere length, among other things.
What you can see in the middle figure on this slide is that we observe, even in these young, healthy individuals, broad, consistent changes across these many indicators in a direction associated with aging. For some of these measurements, like lung function or cardiorespiratory fitness that are known to decline with age, we reverse coded them for the figures. An upward slope indicates the direction of age-related change we expect.
Finally, we model the slopes of change in each of these biomarkers to estimate, for each participant, how fast their body was changing on each marker. We composited those estimates of change to form a single index, which we called Pace of Aging. It’s scaled so that a value of 1 corresponds to the expected biological change occurring in an individual over 12 calendar months. The histogram shows you that in this cohort, some people were aging slow, some people were aging fast, most of them were aging about as we expect.
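The steps just described (per-person slopes on each biomarker, a composite across biomarkers, and rescaling so that 1 is the cohort norm) can be sketched roughly as follows. This is a simplified illustration, not the study's actual procedure: the per-person least-squares slopes, the scaling by baseline standard deviation, and all data and names are assumptions made for the example.

```python
from statistics import mean, pstdev

AGES = [26, 32, 38]  # assessment ages mentioned in the talk

# Hypothetical data: data[person][marker] = values at each age in AGES,
# already coded so that a higher value means "older".
data = {
    "p1": {"marker_a": [10, 11, 12], "marker_b": [5.0, 5.5, 6.0]},
    "p2": {"marker_a": [12, 14, 16], "marker_b": [6.0, 7.0, 8.0]},
    "p3": {"marker_a": [14, 17, 20], "marker_b": [7.0, 8.5, 10.0]},
}

def slope(xs, ys):
    """Least-squares slope of ys on xs (change per year of age)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def pace_of_aging(data, ages):
    markers = sorted(next(iter(data.values())))
    totals = {person: 0.0 for person in data}
    for m in markers:
        # Put each marker on a common scale: SD units at the first assessment.
        baseline_sd = pstdev([data[p][m][0] for p in data])
        for p in data:
            totals[p] += slope(ages, data[p][m]) / baseline_sd
    cohort_mean = mean(list(totals.values()))
    # Rescale so the cohort average equals 1 "year" of change per calendar year.
    return {p: t / cohort_mean for p, t in totals.items()}

pace = pace_of_aging(data, AGES)
```

In this toy cohort, p1 comes out as a slow ager (0.5), p2 as average (1.0), and p3 as a fast ager (1.5).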
We, of course, couldn’t establish that these differences forecasted differences in mortality because the participants were still early in mid-life at the time the assessment was conducted, but we were able to test whether people whose bodies were changing more rapidly showed deficits in physical and cognitive functions as well as what we called subjective signs of aging: participants’ assessments of their own health, and then assessments made in this case by Duke University undergraduates of how old their faces looked. There have been subsequent studies published by Max Elliott illustrating that not only are the faster agers showing deficits in physical and cognitive functions, but, in fact, they show signs of accelerated aging of their brains based on MRI data.
We also performed an analysis that I’m calling a backward validation, and one that I think has gained some traction in the field; we’re seeing more of it. Here, what we’re doing is we’re simply testing, are people who we already know to be at high risk for early onset of aging-related disease and shorter healthy lifespan showing signs of faster aging by this measurement?
What each of these cells of this graph is showing you is that each of these early-life risk factors is forecasting either a faster or slower pace of aging. People with long-lived grandparents are aging more slowly. People who grow up in wealthy households tend to age more slowly. Those who have more adverse childhood experiences are aging more rapidly.
You didn’t come for the story about Pace of Aging, which is by now a little old. You’re here, perhaps, to hear about the molecular translation of this measure, because Pace of Aging, as I’ve just described it, requires many different types of assessments conducted in parallel at repeated intervals over a relatively long expanse of time: in this case, a little over a decade.
What we want is a test that we can administer at a single point in time that will tell us how fast a person is aging. We use the cartoon image of a speedometer, but in fact, what we’re using this test as is a radar gun. In any case, we published our first molecular translation of this measure last year; we named it DunedinPoAm for Dunedin Pace of Aging from DNA methylation.
In our eLife article, we report that we can perform this distillation of our three-time-point, 18-indicator measure of physiological change into a single-time-point DNA methylation blood test, and then we can take that blood test and apply it in a range of different studies and show that it does things like predict morbidity and mortality, record accelerated aging in people exposed to early-life adversity, and so forth.
What I want to spend the rest of my time on here is addressing our next iteration in the development of this measurement. The original DunedinPoAm measure was good proof of concept, but it suffered from two limitations. One was that the pace of aging on which it was based had a limited life course follow up: just three time points of measurement over only 12 years spanning this kind of young adulthood to midlife transition, so before processes of aging are perhaps accelerating in the way we expect them to as folks grow older.
A second limitation that has recently been pointed out in some work by my colleague Karen Sugden, as well as by Morgan Levine’s group at Yale, is that this measurement, like the other DNA methylation clocks, has moderate test-retest reliability. If you’re familiar with what an intraclass correlation coefficient is, the value is around .8. That’s a problem if we want to use a measurement like this in a randomized trial to test how treatment changes a person’s value of this measurement, from pre-treatment baseline to post-treatment follow up.
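For readers less familiar with the intraclass correlation coefficient, here is a minimal sketch of one common form, the one-way random-effects ICC(1,1), computed from repeat measurements of the same samples. Which ICC variant the reliability analyses actually used is not stated in the talk, so this particular form is an assumption, and the example values are invented.

```python
from statistics import mean

def icc_oneway(replicates):
    """ICC(1,1) for a list of samples, each a list of k repeat measurements."""
    n = len(replicates)
    k = len(replicates[0])
    grand = mean([v for row in replicates for v in row])
    row_means = [mean(row) for row in replicates]
    # Between-sample and within-sample mean squares from one-way ANOVA.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (v - m) ** 2 for row, m in zip(replicates, row_means) for v in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical duplicate measurements of the same three blood samples.
perfect = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
noisy = [[1.0, 1.2], [2.0, 1.8], [3.0, 3.1]]

icc_perfect = icc_oneway(perfect)
icc_noisy = icc_oneway(noisy)
```

An ICC of 1 means repeat measurements of a sample agree exactly; as technical noise grows relative to true between-person differences, the ICC falls, and so does the ability to detect within-person change in a trial.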
What I’m gonna tell you about now is this work to develop the second-generation, DNA methylation Pace of Aging, which we call DunedinPACE, and in this case, the acronym is Pace of Aging Calculated from the Epigenome. There are two significant developments in this further iteration of the measure.
The first, of course, is that it represents an extended follow-up of the pace of aging, now recording 20 years of physiological change across four time points of measurement. You can read more about the details of that assessment in Max’s paper in Nature Aging from earlier this year.
The second is that, based on some work that Karen Sugden did, which was published in Cell Patterns a couple of years ago, we conducted the analysis to distill the two-decade Pace of Aging into a DNA methylation blood test using only a subset of the DNA methylation probes included on both the Illumina 450K and EPIC arrays. Specifically, we screened the probes that were on both of the arrays to select the subset that showed relatively higher test-retest reliability; in this case, an ICC of .4 was the threshold used. A little under 100,000 probes were included in the training analysis, and that yielded an algorithm consisting of 173 CpG sites, distinct methylation locations on the genome, to produce this DunedinPACE algorithm.
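The probe-screening step can be sketched as a simple filter. This is only an illustration: the probe IDs and ICC values are made up, the function name is hypothetical, and treating the threshold as a strict "greater than .4" is an assumption, since the talk gives only the cut-off value.

```python
RELIABILITY_THRESHOLD = 0.4  # ICC cut-off mentioned in the talk

def select_reliable_probes(probes_450k, probes_epic, probe_icc,
                           threshold=RELIABILITY_THRESHOLD):
    """Keep probes present on both arrays whose ICC exceeds the threshold."""
    shared = set(probes_450k) & set(probes_epic)
    # Probes with no reliability estimate are dropped rather than assumed reliable.
    return sorted(
        p for p in shared if probe_icc.get(p, float("-inf")) > threshold
    )

# Hypothetical probe IDs and ICC values, for illustration only.
probes_450k = ["cg0001", "cg0002", "cg0003"]
probes_epic = ["cg0002", "cg0003", "cg0004"]
probe_icc = {"cg0001": 0.9, "cg0002": 0.35, "cg0003": 0.72, "cg0004": 0.8}

selected = select_reliable_probes(probes_450k, probes_epic, probe_icc)
```

Here only cg0003 survives: cg0001 and cg0004 are each missing from one array, and cg0002 falls below the reliability threshold. The surviving probes would then be the candidate inputs to whatever model is trained against the Pace of Aging.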
Just to begin with, that approach of using the subset of probes that Karen had identified to be reliable proved to be fairly effective in boosting the test-retest reliability of this measurement. You can see a scatterplot of 36 replicates on the left-hand side of the figure. On the right-hand side of the figure, I’m plotting ICCs for this DunedinPACE measurement, the original DunedinPoAm, and then these four benchmark DNA methylation clocks in each of three datasets: two small datasets comparing replicate 450K assays and replicate EPIC assays, and then a larger dataset comparing across 450K and EPIC assays.
In all of them, you can see that the DunedinPACE algorithm substantially outperforms the original DunedinPoAm as well as the other clocks, although GrimAge has a higher test-retest reliability in the within-array comparisons.
I’m going to share now with you a couple of validation observations of the DunedinPACE measure. The first is that DunedinPACE records a faster pace of aging both in people who are older as compared to younger in chronological terms and in people who are older as compared to younger according to certain biological age metrics. These are chronological age data from the Understanding Society cohort. What you’re seeing here is a correlation of about .3 between DunedinPACE and chronological age. Again, this is not a clock, so we’re not trying to predict chronological age here.
The data are illustrating a phenomenon well established in demography that older people appear to be aging biologically more rapidly than are younger people. In demography, the metric they use is the mortality rate, which accelerates with advancing chronological age. This isn’t something that’s been able to be studied using existing clock technology, because by construction, those measures are uncorrelated with chronological age. Because DunedinPACE is developed in a different way, we can use it to test the hypothesis that older people are, in fact, aging faster than younger people, and, here, the data suggests that they are.
We also see that people who are measured to be biologically older than their chronological age using the different DNA methylation clocks also exhibit a faster pace of aging, and that’s particularly true for the clock most predictive of morbidity and mortality, the GrimAge. The DunedinPACE, again, summarizes change occurring over the recent past. This is a measurement about how fast the person’s body has been changing over the past 10 or 20 years, perhaps, but certainly aims to be the rate of aging at the time of assessment.
What we see is that measurement of the rate of aging provides about as much information for prediction of morbidity and mortality as these benchmark epigenetic clocks, which summarize all the aging that has occurred across the life course. On the right-hand side of the slide are mortality effect sizes from the Normative Aging Study, which is 771 older men drawn from a Veterans Affairs study, and from the Framingham Heart Study Offspring cohort. For those of you who prefer to see survival plots, or Kaplan-Meier curves, those are on the left-hand side of the slide for slow, average, and fast agers.
Here’s the morbidity data. We’re looking at incident chronic disease on the left and prevalent chronic disease on the right across about a dozen years of follow-up in the Normative Aging Study. You can see DunedinPACE, the gold circle with the orangey outline, is performing as well as GrimAge and better than the other clocks. I think what is most interesting is how these measurements perform in the setting of a randomized controlled trial aiming to test the geroprotective effect of an intervention.
What I’m going to conclude with here is data from the CALERIE trial. This is a trial run by the National Institute on Aging between 2007 and 2009 in which 220 non-obese healthy adults were randomized to either 25% calorie restriction (CR) or an ad libitum (AL) control diet for 24 months. We have DNA methylation assays of their blood at pretreatment baseline, at a 12-month follow-up, and at a 24-month follow-up, and we have data for about 200 of the 220 folks in the trial. It’s worth noting that although CALERIE prescribed 25% CR, the achieved adherence through the second year of follow-up was only about half that.
This is what the data look like for our first- and second-generation clocks and for the Pace of Aging. What I’m graphing here is, on the Y axis, change from baseline. The blue line charts change from baseline in the control group, and the red line charts change from baseline in the calorie restriction treatment group.
What you can see here is that in the case of the first-generation clocks and the PhenoAge clock, there isn’t a difference. In fact, to the extent that there is some difference, the treated participants are actually experiencing a faster rate of epigenetic change than are the control participants, whereas in contrast, the GrimAge clock shows the expected pattern in which the increase is slower in the treatment group as compared to the control group.
In the case of our Pace of Aging measures, what we see is, we have an expectation that the pace of aging won’t change in the ad libitum control group. That’s roughly what we see, although perhaps there’s some acceleration, whereas there’s a clear deceleration of the pace of aging in the CR treatment group on both the original DunedinPoAm and the DunedinPACE measures.
You can see this a little bit more clearly on the next slide. These are standardized treatment effects in terms of Cohen’s d, so the measures are scaled by their standard deviation at baseline. In the case of the epigenetic clocks, we’ve scaled them by the standard deviation of age acceleration.
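To make the scaling concrete, here is a minimal sketch of a standardized treatment effect of this kind: the difference in mean change from baseline between the treatment and control groups, divided by the baseline standard deviation. Pooling both arms for the baseline SD is an assumption, the function name is hypothetical, and the numbers are toy values rather than CALERIE data.

```python
from statistics import mean, stdev

def standardized_effect(baseline, followup, treated):
    """Difference in mean change (treated minus control) in baseline-SD units."""
    changes_treated = [f - b for b, f, t in zip(baseline, followup, treated) if t]
    changes_control = [f - b for b, f, t in zip(baseline, followup, treated) if not t]
    # Assumption: the scaling SD is taken across all participants at baseline.
    return (mean(changes_treated) - mean(changes_control)) / stdev(baseline)

# Toy pace-of-aging values for four hypothetical participants.
baseline = [1.0, 1.2, 0.8, 1.0]
followup = [0.9, 1.1, 0.8, 1.0]
treated = [True, True, False, False]

d = standardized_effect(baseline, followup, treated)
```

A negative value means the measure fell (or rose less) in the treated group relative to controls, which is the direction expected for a slowing of aging.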
What you see here is that the GrimAge and the DunedinPoAm and DunedinPACE algorithms are recording a slower aging in the CR treatment group; that is, those plotted points fall below the zero line in contrast to the other clocks, but it’s really the DunedinPACE measure that proves to be most sensitive to the effects of treatment, showing consistent and statistically significant treatment effects. That begs the question of why should this measure perform so differently from the others.
One answer is it does have clearly superior test-retest reliability to the DunedinPoAm algorithm. These analyses of change depend very heavily on that reliability statistic. Relative to the GrimAge clock, which itself has a higher test-retest reliability, I think there may be something to this idea that focusing measurement on the pace of aging rather than the total progress of aging can yield a more sensitive metric for detecting effects of interventions, although that’s something that will ultimately have to be established in additional trials.
I’ll conclude there with just these final observations. We have our new tool for geroscience research in epidemiologic studies and RCTs. This work represents a partnership between measurement developers, cohort study researchers, and interventionists that I think represents the most productive path forward to rigorous science with real translational value.
That comes with a caveat: even though we’re bringing together these diverse groups of researchers, we’re still, unfortunately, studying cohorts that are mostly white. They’re socioeconomically diverse in the case of the Dunedin study, but because Dunedin, New Zealand is a mostly white place, the cohort itself is almost exclusively of white, European ethnicity.
Increasing the diversity in the samples we use to develop and evaluate our measurements as well as the samples we use to test our interventions, I think is a critical priority for geroscience that delivers health equity as well as increased healthspan. With that, I’ll say thank you and conclude.