Abstract Alzheimer’s disease, the most common form of dementia, is characterized by the emergence and spread of senile plaques and neurofibrillary tangles, causing widespread neurodegeneration. Though the progression of Alzheimer’s disease is considered to be stereotyped, the significant variability within clinical populations obscures this interpretation on the individual level. Of particular clinical importance is understanding where exactly pathology, e.g. tau, emerges in each patient and how the incipient atrophy pattern relates to future spread of disease. Here we demonstrate a newly developed graph theoretical method of inferring prior disease states in patients with Alzheimer’s disease and mild cognitive impairment using an established network diffusion model and an L1-penalized optimization algorithm. Although the ‘seeds’ of origin using our inference method successfully reproduce known trends in Alzheimer’s disease staging on a population level, we observed that the high degree of heterogeneity between patients at baseline is also reflected in their seeds. Additionally, the individualized seeds are significantly more predictive of future atrophy than a single seed placed at the hippocampus. Our findings illustrate that understanding where disease originates in individuals is critical to determining how it progresses and that our method allows us to infer early stages of disease from atrophy patterns observed at diagnosis. Alzheimer’s disease, brain atrophy, mild cognitive impairment, neuropathology, tractography Introduction Alzheimer’s disease-associated atrophy, tau and amyloid pathology and metabolic load all display highly stereotyped progression into brain circuits. Tau in particular is associated with regional atrophy, hypometabolism and cognitive decline (Forman et al., 2002; Reitz et al., 2009; Attems et al., 2012). Tau tangles appear first in locus coeruleus, then entorhinal cortex, followed by an orderly spread into the hippocampus, amygdala, temporal lobe, basal forebrain, and isocortical association areas (Braak and Braak, 1991). Several emerging lines of evidence suggest that these patterns of progression might occur via white matter fibre connections, most likely involving trans-synaptic transmission of toxic proteins along neuronal pathways (Clavaguera et al., 2009; Frost and Diamond, 2010; Jucker and Walker, 2013; Iba et al., 2015). Indeed, toxic tau protein induces misfolding in connecting neurons and slowly ramifies across widespread brain circuits. Despite these advances in our understanding of the pathophysiology of disease transmission and progression, many key questions relevant to Alzheimer’s disease remain unanswered: where does pathology and/or atrophy begin in a patient, and what is the relationship of the inception region to the eventual pattern of disease that develops in maturing stages? Although Alzheimer’s disease progression is considered stereotyped, in clinical populations a considerable variability exists around the static pattern. This question therefore assumes clinical relevance due to the large intersubject heterogeneity in the site and timing of disease onset and subsequent progression observed in patients. In this paper we present a network-based algorithm to infer the likely sites of early Alzheimer’s disease, which we assume related to tau pathology seeding, in Alzheimer’s disease-spectrum patients at all stages of disease. The basis of our approach is that trans-synaptic transmission of misfolded proteins along neuronal pathways can be deterministically modelled using graph theoretic models of network spread. Recently we published a mathematical ‘Network Diffusion’ model (NDM) that was shown to successfully recapitulate patterns of regional brain atrophy and metabolism in Alzheimer’s disease on cross-sectional Alzheimer’s disease imaging data (Raj et al., 2012). This dynamical model represents trans-synaptic protein transmission as a first-order, diffusive process. As it is conditional only on knowing the anatomic connectivity organization of the brain, the NDM can predict a patient’s longitudinal progression using only their empirically observed baseline pattern (Raj et al., 2012, 2015). Related studies and models were also successful in predicting the empirical patterns of Alzheimer’s disease (Iturria-Medina et al., 2014), lending further support to the applicability of networked spread to protein transmission through the brain. Therefore, using the NDM as the forward model, we propose an inverse inference algorithm involving objective function minimization that gives the most likely estimate of the seed regions from which that patient’s ongoing progression of longitudinal atrophy can be expected to proceed. The cost function is designed to optimize two constraints: (i) that the seed pattern, when extrapolated into future time points using the (forward) network diffusion model, must resemble the observed pattern of atrophy in the patient; and (ii) that the seed pattern must be sparse, so as to give the smallest set of regions that are necessary to explain the data. The former constraint is encoded by a cost term that is the L2-norm of the discrepancy between the forward model prediction and empirical pattern, and the latter constraint is imposed via a L1-norm penalty function. The latter constraint is needed because we expect Alzheimer’s disease seeding to involve only a few regions of early vulnerability. Without this constraint, we would get the trivial result that the seed configuration is identical to the baseline pattern. The cost function involves fitting the forward model to both baseline regional atrophy of a patient as well as its longitudinal change, in order to benefit from longitudinal atrophy data wherever available. Although the model is based on pathology spread, here we apply it to model patterns of atrophy, since (tau) pathology distribution is closely related to atrophy distribution, and the latter may be considered a direct consequence of the former (Arriagada et al., 1992; Nelson et al., 2012; Xia et al., 2017). Like the previous model, we continue to use a static healthy connectome, under the assumption that the structural network serves merely as a conduit for transmitting proteinopathies, rather than being the first to be impaired itself. We applied the algorithm to a large number of Alzheimer’s disease-spectrum subjects obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) public database (adni.loni.usc.edu). Our total sample size was 421 patients diagnosed with either Alzheimer’s disease, late mild cognitive impairment (MCI), or early MCI, along with 175 age- and sex-matched controls (Table 1). On these data we show that the proposed algorithm can successfully infer likely sites of disease initiation. Of course, there can be no ‘gold standard’ neuropathological validation of the inferred seed sites, as these are live patients at various stages of post-onset progression. Instead we test whether the group average behaviour of our inference correctly reproduces the most common sites of tau pathology known from autopsy series (Braak and Braak, 1996; Thal et al., 2002). We found that group average maps of inferred seed successfully recapitulated the strongest likelihood in the hippocampus and adjoining temporal cortices, which are well known as the sites of tau aggregation in early disease. However, despite this concordance on a population level, there is significant heterogeneity on an individual level. We find that a common initial state of neurodegeneration is insufficient for explaining the atrophy patterns observed in the patients at the time of diagnosis. In contrast, our inference algorithm shows wide divergence between patients in their seeding patterns, which are nonetheless highly predictive at the individual level of the patient’s empirical atrophy patterns. Taken together, these data suggest not only that inference of Alzheimer’s disease onset regions and hence putatively tau pathology seeding is possible from in vivo MRI scans, but also point to intersubject variability in the seeding pattern that could provide important insights into seeding and progression in Alzheimer’s disease and other neurodegenerative conditions. Table 1 Study demographics and clinical characteristics Cohort Group size (male/female) CSF amyloid-β (+/−ve) Age (µ ± σ) CDR MMSE Control 175 (78/97) 59/78 73.8 ± 6.7 0 to 0.5 24 to 30 Alzheimer’s disease 117 (60/57) 92/7 74.2 ± 7.6 0.5 to 2 4 to 29 Late MCI 156 (86/70) 100/34 73.0 ± 7.8 0.05 to 2 15 to 30 Early MCI 148 (82/66) 80/50 72.4 ± 6.3 0 to 1 22 to 30 Cohort Group size (male/female) CSF amyloid-β (+/−ve) Age (µ ± σ) CDR MMSE Control 175 (78/97) 59/78 73.8 ± 6.7 0 to 0.5 24 to 30 Alzheimer’s disease 117 (60/57) 92/7 74.2 ± 7.6 0.5 to 2 4 to 29 Late MCI 156 (86/70) 100/34 73.0 ± 7.8 0.05 to 2 15 to 30 Early MCI 148 (82/66) 80/50 72.4 ± 6.3 0 to 1 22 to 30 Each disease cohort is age- and sex-matched with respect to the control group. CDR = Clinical Dementia Rating Scale; MMSE = Mini-Mental State Examination. Materials and methods Participants All subject data were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu/). ADNI is a public-private private, large multisite longitudinal study with the goal of tracking Alzheimer’s disease biomarkers and accelerate prevention and treatment of the disease. For this study we have included all ADNI 2 and ADNI GO subjects from early 2011 to mid-2015. For our analysis we have excluded all subjects that did not meet pass and partial QC status for overall FreeSurfer segmentation. Subjects were diagnosed and grouped to Alzheimer’s disease, late MCI and early MCI according to ADNI data description. Demographic and clinical features of our study population are shown in Table 1. All subjects have volumetric MRI data. There was no significant difference in age between the control and each of the disease populations (P > 0.05), nor were there significant differences in gender composition (P > 0.05). FreeSurfer longitudinal MRI processing Automated cortical and subcortical volume measures were performed with FreeSurfer software package, version 5.3 (http://surfer.nmr.mgh.harvard.edu/fswiki) (Fischl et al., 2002, 2004). We chose Freesurfer because it is perhaps the most widely used brain segmentation and parcellation tool, and because of our previous experience with using it for NDM testing (Raj et al., 2012). We found also that the NDM predictions similar results, whether using Freesurfer or SPM, another widely used tool. To reduce the confounding effect of intra-participant morphological variability, each participant’s longitudinal data series was processed by FreeSurfer longitudinal workflow (http://surfer.nmr.mgh.harvard.edu/fswiki/LongitudinalProcessing). A previous test-retest study validated that the longitudinal processing provides consistent regions of interest (ROI) segmentation (Reuter et al., 2012). Estimated total intracranial volume (generated by FreeSurfer was used as an estimate for intracranial volume as a measure to normalize FreeSurfer data. All images underwent standardized quality control. Participants with complete segmentation failure or gross errors throughout all brain regions were rated as complete failure. Participants with complete segmentation failure or gross errors throughout all brain regions, as observed visually in Freesurfer’s tkmedit tool and Freesurfer quality status outputs, were rated as complete failure. Participants with gross errors in one or more specific brain regions (i.e. temporal lobe regions, superior regions, occipital regions, and insula) were given partial pass rating. Participants with partial pass rating were included in analyses, but those with gross errors were not. Detailed FreeSurfer segmentation and QC guidelines are available on UCSF Freesurfer mthods and QC official document (dated 01/31/2014) from the LONI-ADNI website. Using these criteria we removed 8 of 125 subjects from the Alzheimer’s disease cohort and 25 of 181 from the late MCI cohort. At the time of writing, of the 305 early MCI subjects, Freesurfer processing was available for 148 subjects, all of whom had passing or partial Freesurfer QC grade. The resulting size of the early MCI cohort is comparable to the other groups, and is age-matched to them. White matter connectome The present study used diffusion MRI data from 37 control subjects in the ADNI database to form a single control connectome. Raw diffusion-weighted images (DWIs) were corrected for image artefacts including Eddy current, motion, and echo planar imaging distortions using FSL toolbox (Jenkinson et al., 2012). A single diffusion tensor was modelled at each voxel in the brain from the corrected DWI scans using CAMINO toolbox (Cook et al., 2006). Afterward, the deterministic simple whole white matter streamlining was applied on the diffusion tensor images (DTI) using CAMINO software (Cook et al., 2006). The tissue masks from T1 image was rigidly registered to the first frame of the DWI and used in the white matter tractography. FreeSurfer cortical parcellations mapped in the DTI subject space is used to calculate the ROI-ROI connectivity matrix. The resulting matrices in the current study were of size 86 × 86, per the 86 cortical and subcortical structures from the FreeSurfer (Desikan-Killarney) grey matter parcellation. The control connectome was derived by taking the average of all individual 37 86 × 86 control subject matrices to form a single 86 × 86 control white matter connectome. Derivation of atrophy from FreeSurfer volumes Using the FreeSurfer (FS) volumes derived above from the control and disease populations, we calculated regional Z-scores for atrophy: zij=(vij−μi,control)σi,control (1) where zij and vij are the Z-score and FS volume for the i-th region of the j-th patient, respectively; and μi,control and σi,control are the age-matched control FS volume mean and standard deviation for the i-th region, respectively. We then used a weighted logistic transform to renormalize each patient’s atrophy within the range (0, 1): aj=e(zj−a0)/sσj1+e(zj−a0)/sσj (2) where aj and zj are the renormalized atrophy and Z-score vectors for patient j, respectively, σj is the standard deviation of zj, s is an empirically-determined constant, and a0=s2∑i=1nzijn;zij>0 (3) Participants and regions that showed hypertrophy (z-scores < 0) were assumed to be due to error or methodological limitations. Instead of excluding these regions and subjects in our calculation of longitudinal atrophy slopes, we simply replaced these z-scores with zero and computed slope as in the normal case. Similarly, negative slopes are also indicative of hypertrophy and were thresholded to zero. Network diffusion model The NDM was used as described in Raj et al. (2012). Briefly, Alzheimer’s disease-related disease pattern, given by the vector x(t), was modeled as a diffusive process: x(t)=e−βLtxseed (4) where xseed is the initial regional pattern of the disease, on which the term e−βLt acts as a spatial and temporal blurring operator. We therefore call e−βLt ‘the diffusion kernel’. We used an eigenvalue decomposition of the Laplacian to evaluate the network diffusion model. Seed inference method To infer the likely seed configuration subject to a tunable sparsity parameter, we implemented a constrained optimization routine using the graphical model described above. Importantly, we applied our method to each patient individually, obtaining unique seed vectors for each baseline configuration. To properly initialize our minimization problem, we used the following heuristic method to calculate a best first guess: (i) run the network diffusion model on each brain region individually using the rescaled atrophy as the end point, obtaining a vector R of maximum Pearson correlations for each region and an associated vector t of the model times at which these correlations were achieved; (ii) set R to 0 for all regions i for which ti = 0 or ti = tend, where tend is the final time point at which the model was evaluated. Call this new vector R′; (iii) set Ri to 0 if Ri<R¯′. Call this new vector R″; and (iv) normalize R″ to the unit norm. We then used R″ as x0 and create a narrower t-range over which to evaluate the inference method using the distribution of ti within t. The optimization occurs in two discrete steps as there are two active variables, xseed and t. We first find tmin, the t at which the following cost function is minimized: tmin=argmint(e−〈x*(t),xb*〉) (5) where x(t) is defined as in Equation 4, xb is the patient baseline atrophy pattern, and v*=v−v¯||v−v¯||2 〈v1*,v2*〉=(v1−v¯1)·(v2−v¯2)||v1−v¯1||2·||v2−v¯2||2=R(v1,v2) In other words, (tmin) is the atrophy pattern that best fits the patient baseline under the NDM, using the Pearson correlation coefficient R as the metric of similarity. Having chosen tmin, we now find the optimal xseed using the following L1-penalized cost function: cL1(xseed)=e−〈xmin*,xb*〉+λ||xseed||1 (6) where we adopt the shorthand xmin for x(tmin) as determined in the previous step. In the majority of cases, patients had follow-up scans in addition to the initial scans we used for the seed determination procedure described above. To incorporate this information into our optimization, we approximated the rate of atrophy spread using a least-squares linear fit of the sequence of scans. We then followed the two-step procedure outlined above, incorporating an additional term into our cost functions scaled by a tunable weight parameter, w: tmin=arg mint((1−w)e−〈x*(t),xb*〉+we−〈x˙*(t),xslope*〉) (7) cL1=(1−w)e−〈xmin*,xb*〉+we−〈x˙min*,xslope*〉+λ||xseed||1 (8) The new terms involve the Pearson correlation between the fitted atrophy rate, xslope, and the predicted rate using the NDM, x˙(t). Note that Equations 7 and 8 reduce to Equations 5 and 6 when the rate-fitting parameter, w, is set to 0. The details of the inference algorithm are in the Supplementary material. Inspection of the cost function (Equations 6 and 8) suggests that the cost function is quite close to convex, except when the NDM predictor is completely uncorrelated with baseline atrophy (which is by design never close to the optimum solution). We ignore here the slight non-linearity caused by the norm denominators inherent in Pearson’s R. To summarize the algorithm briefly, we first estimate a reasonable initial guess for xseed, which is here given by single-region seeds’ Rmax values. Using this estimate for xseed we estimate tmin by minimizing Equations S2 in the Supplementary material. Then the optimal seed vector is inferred from the L1 lasso algorithm using the above tmin and the initial guess for xseed. We note that due to the various heuristics used here, the overall algorithm cannot be said to have any guaranteed global optima or a guaranteed convergence speed. The most we can say is that it has been found to be empirically effective. Empirically, we were able to determine convexity and successful convergence indirectly by noting that the minima found using the heuristic for initial seed do not significantly differ from those found using random initialization, suggesting they are global minima. For cost function minimization we used the minConf_SPG algorithm (https://www.cs.ubc.ca/∼schmidtm/Software/minConf.html) to do the optimization and used the default settings. The algorithm was stopped when either the maximum number of iterations reached 500, or the maximum number of function calls reached 500, or the error tolerance (change of objective between consecutive iterations) dipped below 1 × 10−5. Convergence under these conditions was not an issue in all runs we observed. Results Group level regional atrophy patterns Figure 1 shows the mean baseline atrophy for the Alzheimer’s disease cohort. Almost all 86 regions exhibit decreased volume compared with age-matched controls, with the left and right hippocampi, the left and right amygdalae, and the left entorhinal cortex exhibiting both the greatest degrees of atrophy and the lowest coefficients of variation (COVs). In this ‘glassbrain’ rendering, each brain region is represented by a sphere: its radius is proportional to the region’s atrophy in comparison to healthy controls, and its shading is proportional to the COV. Outside of this relatively conserved set of regions, intersubject variability is higher, suggesting significant patient-to-patient heterogeneity. The late MCI cohort’s average atrophy map (Fig. 1, middle) is comparatively sparser with a lower degree of global atrophy, consistent with the less severe diagnosis. As with Alzheimer’s disease, the left and right hippocampi are affected most severely across all late MCI patients. In contrast, the early MCI group shows no global atrophy pattern (Fig. 1, bottom) and only the right hippocampus has significant degeneration. On average there are more regions exhibiting a greater volume than the controls (red spheres), though it is unclear whether this is true hypertrophy or statistical noise. Overall, our three patient groups have average atrophy patterns consistent with their diagnostic categories. Figure 1 View largeDownload slide Average global atrophy patterns. The size of the spheres corresponds to the magnitude of atrophy (Z-scores with respect to controls), and the colour of each node indicates whether there is hypotrophy (blue) or hypertrophy (red). The node size scale is linear. The shading as represented by the colour bars corresponds to the coefficient of variation, with lighter shades indicating a higher degree of variation. The scale of the spheres in the early MCI glass brains is two times larger than the reference for easier visualization. AD = Alzheimer’s disease; EMCI = early MCI; LMCI = late MCI. Figure 1 View largeDownload slide Average global atrophy patterns. The size of the spheres corresponds to the magnitude of atrophy (Z-scores with respect to controls), and the colour of each node indicates whether there is hypotrophy (blue) or hypertrophy (red). The node size scale is linear. The shading as represented by the colour bars corresponds to the coefficient of variation, with lighter shades indicating a higher degree of variation. The scale of the spheres in the early MCI glass brains is two times larger than the reference for easier visualization. AD = Alzheimer’s disease; EMCI = early MCI; LMCI = late MCI. Parameter choices, L curves and group level patterns of inferred seeds As described in the ‘Materials and methods’ section, we designed our algorithm to determine a consistent seeding pattern for each patient, subject to two parameters: the L1 penalty, λ, and the weight given to a linear fit of the rate of atrophy, w. We first needed to characterize the relationship between these parameters and the derived seeds, and then select an optimal (λ, w) pair for each cohort. Figure 2A and Supplementary Fig. 1 show the characteristic L-curves for a range of λ and w for the Alzheimer’s disease, late MCI, and early MCI cohorts, respectively. Since a consistent ‘elbow’ region is observed in these L-curves, we chose the corresponding (λ, w) values as optimal choices. In theory, parameter optimization can be performed on a per-subject basis, but we did not observe sufficient variability for this to be necessary. Henceforth, a single set of ‘optimal’ parameter set was chosen for each diagnostic condition, as indicated by the red circles in Fig. 2A and Supplementary Fig. 1. Since the exact ‘elbow’ depends on the two user-defined cost terms (x and y axes) and their relative weights, the optimal parameters are still somewhat subjective. Hence we allowed a narrow range around the visually identified elbow. For the late MCI group the elbow is in fact unambiguous, and was selected as such. For the Alzheimer’s disease and early MCI cohorts, the strict elbow occurs at w = 0, which is arguably suboptimal. For example, early MCI elbow is at (λ, w) = (0.04, 0), but we selected the close-by point (λ, w) = (0.02, 0.33) instead. Our intention was to opt for a non-zero ‘w’ weight with nearly the same mean-squared error as the strict elbow, in order to induce a longitudinal effect. At w = 0 the longitudinal slope will be completely ignored, which is suboptimal. Figure 2 View large Download slide Seed parameterization and distribution. (A) L-curves for each weighting parameter plotted over a range of penalty parameters for the Alzheimer’s disease cohort, with the best combination (λ = 0.06, w = 0.17) circled in red. See Supplementary Fig. 1 for the late MCI and early MCI L-curves. (B) Glass brain representations of the average seed atrophy patterns for each pair of parameters. The range closely corresponds to the elbows of the L-curves for each cohort. As either λ or w is increased, the sparsity of the average seed also increases until no regions show significant atrophy. The glass brains outlined in red reflect the parameter choices used for further analysis. (C) Distribution of seeds. We quantified the frequency with which patients in each cohort showed nonzero atrophy on a node-by-node basis. The results are visualized as a bar graph, where regions were binned based on gross anatomical location. Seeds are particularly enriched in the temporal lobe compared with the other regions of the brain for all three groups (P < 0.001). AD = Alzheimer’s disease; EMCI = early MCI; LMCI = late MCI. Figure 2 View large Download slide Seed parameterization and distribution. (A) L-curves for each weighting parameter plotted over a range of penalty parameters for the Alzheimer’s disease cohort, with the best combination (λ = 0.06, w = 0.17) circled in red. See Supplementary Fig. 1 for the late MCI and early MCI L-curves. (B) Glass brain representations of the average seed atrophy patterns for each pair of parameters. The range closely corresponds to the elbows of the L-curves for each cohort. As either λ or w is increased, the sparsity of the average seed also increases until no regions show significant atrophy. The glass brains outlined in red reflect the parameter choices used for further analysis. (C) Distribution of seeds. We quantified the frequency with which patients in each cohort showed nonzero atrophy on a node-by-node basis. The results are visualized as a bar graph, where regions were binned based on gross anatomical location. Seeds are particularly enriched in the temporal lobe compared with the other regions of the brain for all three groups (P < 0.001). AD = Alzheimer’s disease; EMCI = early MCI; LMCI = late MCI. Figure 2B shows the sagittal cross-sections of the non-trivial average inferred seed patterns across a range of parameter values for each patient group. As expected by the behaviour of L1 regularization, increasing the penalty parameter smoothly increases the sparsity of the solutions. Surprisingly, we observe a similar trend when we increase the slope weighting parameter. We hypothesize that the terms corresponding to the baseline and fitted slope constraints in our cost function (see ‘Materials and methods’ section) generally oppose each other, forcing more regions to 0 as w changes from 0 to 0.5. The optimal parameter set (Fig. 2B, top) for Alzheimer’s disease gives just three regions with significant non-zero seed atrophy: the left and right hippocampi and the right amygdala, agreeing with the prevailing theory that atrophy is first detected in this general area of the temporal lobe. The analogous cohort-level plots for the late MCI and early MCI patients exhibit the same sparsity trends with regards to λ and w, although the range of values for which there is at least one region with non-zero atrophy was smaller than the range for Alzheimer’s disease. The group level patterns of inferred seeds shown in Fig. 2B are consistent with the known prominence of hippocampus, entorhinal and adjoining temporal cortices in early disease, thereby validating our inference algorithm at least at the group level. Another way to assess this is to look at the anatomical distribution of the seeds to find ‘hot spots’ that exhibit the highest frequency among patients. Figure 2C shows the number of patients with seeds including each of the 86 regions, grouped by gross anatomical location. Notably the inferred seeding in the Alzheimer’s disease cohort is greatest in the temporal lobe, in agreement with current knowledge and the above results (χ2 goodness-of-fit, P < 0.001). The most prevalent regions within the seeds of the late and early MCI patients are also located within the temporal lobe, though the seed patterns are more diffuse than those of the Alzheimer’s disease group (P < 0.001). The trend is very similar to that of the baseline atrophy patterns (Fig. 1), with stage of disease correlating with higher concentrations of temporal atrophy. Table 2 lists the most prevalent regions among the inferred seeds in all three diagnostic groups, correctly imputing prominence to hippocampal, entorhinal and temporal structures. In particular, both hippocampi were among the five highest regions for Alzheimer’s disease, late and early MCI. The consistency of the seeds across cohorts suggests a common aetiology for these three sets of patients, further reinforcing the stereotypical pattern of tau spread. Table 2 Most frequent regions present in the inferred seed patterns for each cohort Alzheimer’s disease Highest frequency Late MCI Highest frequency Early MCI Highest frequency Left hippocampus 43% Right hippocampus 38% Left hippocampus 36% Right hippocampus 40% Left hippocampus 37% Left inferior temporal 33% Left entorhinal 35% Right entorhinal 37% Left parahippocampal 29% Right entorhinal 32% Right superior temporal 32% Right bank STS 28% Left inferior temporal 32% Right parahippocampal 29% Right entorhinal 27% Alzheimer’s disease Highest frequency Late MCI Highest frequency Early MCI Highest frequency Left hippocampus 43% Right hippocampus 38% Left hippocampus 36% Right hippocampus 40% Left hippocampus 37% Left inferior temporal 33% Left entorhinal 35% Right entorhinal 37% Left parahippocampal 29% Right entorhinal 32% Right superior temporal 32% Right bank STS 28% Left inferior temporal 32% Right parahippocampal 29% Right entorhinal 27% Each region is listed alongside the fraction of patients with seed configurations containing that region. Intersubject heterogeneity in inferred seeding Next, we explored the heterogeneity between individual subjects’ baseline atrophy patterns and between their inferred seed patterns. As an illustration, Fig. 3A shows two cases on opposite ends of the similarity spectrum: the baselines of Alzheimer’s disease Patients 82 and 95 are highly correlated (R = 0.76), whereas Alzheimer’s disease Patients 45 and 98 have atrophy patterns that oppose each other (R = −0.47). In the former case, the correlation between their derived seeds is still positive but slightly smaller in magnitude (R = 0.64), suggesting that the differences that separate these two patients are more exaggerated in their seed patterns than their baselines. The magnitude of the seed correlation in the anticorrelated case is much more dramatically shifted towards zero (R = −0.030). Figure 3 View largeDownload slide Individual seed investigation. (A) Baseline and seed atrophy patterns for two pairs of patients, one exhibiting high correlation (left) and the other high anticorrelation (right). (B) Histograms of the pairwise correlations between patients within the Alzheimer’s disease, late MCI, and early MCI cohorts, respectively. We calculated the correlations between baseline atrophy patterns and seed atrophy patterns for all pairs of patients. (C) The same data as in B represented as scatterplots, with each patient pair represented by a single point. There is a strong correlation between pairwise baseline comparisons and pairwise seed comparisons for all three cohorts, as shown by the best-fit lines and mean R-values between both sets of correlations. AD = Alzheimer’s disease; EMCI = early MCI; LMCI = late MCI. Figure 3 View largeDownload slide Individual seed investigation. (A) Baseline and seed atrophy patterns for two pairs of patients, one exhibiting high correlation (left) and the other high anticorrelation (right). (B) Histograms of the pairwise correlations between patients within the Alzheimer’s disease, late MCI, and early MCI cohorts, respectively. We calculated the correlations between baseline atrophy patterns and seed atrophy patterns for all pairs of patients. (C) The same data as in B represented as scatterplots, with each patient pair represented by a single point. There is a strong correlation between pairwise baseline comparisons and pairwise seed comparisons for all three cohorts, as shown by the best-fit lines and mean R-values between both sets of correlations. AD = Alzheimer’s disease; EMCI = early MCI; LMCI = late MCI. To explore intersubject heterogeneity in seeding and how it relates to heterogeneity in baseline atrophy, we performed a series of Pearson correlations on regional data, used here as a measure of similarity between any pair of subjects. We calculated the pairwise correlations between all patients within each cohort, displayed graphically in Fig. 4B and C. The histograms in Fig. 3B show the distributions of pairwise R-values for baseline atrophy and the inferred seeds. As we observed in the two cases above, the intersubject correlations between inferred seed patterns for the Alzheimer’s disease cohort are closer to zero compared to the correlation between their respective baseline atrophy patterns. Looking at all the patient groups together, the three histograms show several notable trends: (i) intra-cohort variability was high, reflecting the high degree of heterogeneity between patients’ baseline atrophy patterns; (ii) the mean pairwise correlations for both baselines and seeds are higher for Alzheimer’s disease (0.17 ± 0.17, 0.075 ± 0.19) than late MCI (0.052 ± 0.16, 0.030 ± 0.16) or early MCI (0.014 ± 0.15, 0.014 ± 0.15); and (iii) comparing between the baseline atrophy patterns and the inferred seeds, the means are significantly smaller for Alzheimer’s disease and late MCI (two-sample t-test, P < 0.001). Figure 3C transforms the same data as in Fig. 3B into a series of scatterplots for the three groups, with each pair of subjects represented by a single point. When there is strong similarity between two patients’ baseline atrophy patterns, there is also, on average, a strong similarity between their inferred seed patterns. Taken together, these observations suggest that (i) inferred seeds preserve important features of patient baseline data; and (ii) the increase in sparsity improves the distinguishability of patients within the same diagnostic group, as exhibited by a decrease in the correlations between them. As an aside, it is possible that similar results and trends could also be reproduced by simply thresholding the baseline atrophy patterns, without performing any inference algorithm at all. However, as shown in Supplementary Fig. 2, the divergence between these thresholded baselines and our seeds suggests that the inference algorithm nontrivially transforms atrophy patterns in addition to strongly enhancing sparsity. Figure 4 View large Download slide Forward predictions of baseline atrophy patterns with the inferred and hippocampal seeds. We initialize the forward NDM with either the inferred seed or a static seed located at the hippocampus and compare these predictions with baseline. The forward prediction using the inferred seeds performs well, while the hippocampal seed largely fails at reproducing each patient’s baseline. Note that the choice of parameters differs here from previous figures; see Supplementary Fig. 3. Figure 4 View large Download slide Forward predictions of baseline atrophy patterns with the inferred and hippocampal seeds. We initialize the forward NDM with either the inferred seed or a static seed located at the hippocampus and compare these predictions with baseline. The forward prediction using the inferred seeds performs well, while the hippocampal seed largely fails at reproducing each patient’s baseline. Note that the choice of parameters differs here from previous figures; see Supplementary Fig. 3. Inferred seeding, but not static hippocampal seeding, has high forward predictive power We also investigated how well our inferred seeds could predict future atrophy of individual subjects using network diffusion as the forward model. Given the role of hippocampus as one of the earliest and most atrophied brain regions in Alzheimer’s disease, we chose to benchmark the patient-derived seeds’ performance against a static seed located in the hippocampus. Figure 4 shows histograms summarizing the maximum correlation between patients’ baselines and the forward predictions of our two choices of seeds. For all three cohorts, our seeds performed well, with mean correlations with baseline around 0.4. Intriguingly, the hippocampal seed, when extrapolated in the forward time direction using the NDM, gave markedly poorer mean correlations (Alzheimer’s disease: 0.082 ± 0.13, late MCI: 0.074 ± 0.11, early MCI ± 0.11; two-sample t-test, P < 0.001). This is surprising since the hippocampus is considered a ‘hot spot’ of early atrophy in Alzheimer’s disease. Moreover, as mentioned above, the left and right hippocampi were among the most atrophied regions in the raw baseline atrophy patterns (Fig. 1) and significantly overrepresented among the inferred seeds for all three patient groups (Table 2). Principal component analysis reveals no underlying substructure Given the heterogeneity observed in above results, next we used principal component analysis (PCA) to investigate if the atrophy and seed patterns from different subgroups would lead to distinguishable and separate clusters. We formed the full-data matrix X = [atrophy, seed] with both atrophy and seed scores from all subjects. Then we performed singular value decomposition and projected the high-dimensional data X into the two largest principle components PC1 and PC2. Figure 5 (top) shows a scatter plot of PC1 and PC2 for all Alzheimer’s disease patients and all early MCI patients on atrophy scores and on seed patterns. While there were significant differences in the atrophy pattern between early MCI and Alzheimer’s disease (consistent with Noh et al., 2014; Dong et al., 2017; Park et al., 2017), PCA did not reveal distinct clusters for the seed patterns. These results are consistent with our hierarchical clustering results (Supplementary Figs 4 and 5), in which no significant subgroups emerged within any of the three cohorts for either the inferred seeds or the baselines. Additionally, when we applied PCA to the seeds and baseline atrophy of all three diagnostic groups (early MCI, late MCI, Alzheimer’s disease) combined (Fig. 5, bottom), the first two PCs show no evidence of clustering of atrophy based on diagnostic grouping, as the late MCI group overlaps the other two. Similarly, there is no evident clustering in the seeding pattern (Fig. 5 bottom right); the Alzheimer’s disease and MCI groups are almost indistinguishable, which was also suggested by hierarchical clustering (Supplementary Fig. 3). Ultimately these PCA results suggest that the data themselves lack substructure, and that the seeding patterns of ADNI subjects cannot be clustered into separate groups using disease stage. Figure 5 View largeDownload slide PCA inferred seeds on Alzheimer’s disease, early MCI, and late MCI groups. The top row shows only Alzheimer’s disease and early MCI and bottom row shows all three groups. Principal Component Analysis (PCA) for inferred seeds showed no distinct clusters within the Alzheimer’s disease, early MCI, and late MCI cohort. Seed values were calculated using optimal L-curve parameters for the each cohort as described earlier. Alzheimer’s disease patients are shown in magenta and early MCI patients in blue. While there were significant differences in the atrophy pattern (consistent with Noh et al., 2014; Dong et al., 2017; Park et al., 2017), PCA did not reveal distinct/separate clusters for the seed patterns. However, we recapitulate significant separation between patients with Alzheimer’s disease (in magenta) versus patients with early MCI (in blue) regarding atrophy scores. AD = Alzheimer’s disease; EMCI = early MCI; LMCI = late MCI. Figure 5 View largeDownload slide PCA inferred seeds on Alzheimer’s disease, early MCI, and late MCI groups. The top row shows only Alzheimer’s disease and early MCI and bottom row shows all three groups. Principal Component Analysis (PCA) for inferred seeds showed no distinct clusters within the Alzheimer’s disease, early MCI, and late MCI cohort. Seed values were calculated using optimal L-curve parameters for the each cohort as described earlier. Alzheimer’s disease patients are shown in magenta and early MCI patients in blue. While there were significant differences in the atrophy pattern (consistent with Noh et al., 2014; Dong et al., 2017; Park et al., 2017), PCA did not reveal distinct/separate clusters for the seed patterns. However, we recapitulate significant separation between patients with Alzheimer’s disease (in magenta) versus patients with early MCI (in blue) regarding atrophy scores. AD = Alzheimer’s disease; EMCI = early MCI; LMCI = late MCI. Comparison of inferred seeds for early mild cognitive impairment subgroups according to amyloid-β levels Given that intersubject heterogeneity in seeding patterns is high and does not cluster by diagnostic group, we next explored whether pathological heterogeneity might be responsible for this effect. In particular, the CSF amyloid status is an important feature of clinical Alzheimer’s disease progression (Mattsson et al., 2013). Hence it is possible that observed seeding heterogeneity is caused by differential amyloid status within diagnostic groupings. The ADNI dataset contains thorough amyloid-β biomarker scores, which allowed us to compare both atrophy and seed patterns for amyloid-positive subjects (score < 192, following published cut-off level from Mattsson et al., 2013) versus amyloid-negative subjects. Figure 6A shows glass brain renderings of the seeding pattern of early MCI subjects, separated into amyloid-positive and -negative, using optimal L-curve parameters for early MCI (λ = 0.02, w = 0.33). The locations of high probability seeds are somewhat similar for both patterns, but their likelihood values (given by R-max) exhibit more variability. Ctx-lh-entorhinal and right-pallidum regions were identified with higher seeds for amyloid-β-positive while, ctx-lh-temporalpole and ctx-lh-frontalpole regions were identified with higher seeds for amyloid-β-negative subgroups in early MCI subjects. See Supplementary Fig. 6 for a detailed comparison of seed likelihood values for all 86 brain regions. Histograms for R-max values are shown in Fig. 6B. The amyloid-β-positive distribution favours slightly higher R-max values and is appreciably more left-skewed than the amyloid-β-negative distribution. This analysis was limited to the early MCI group as the amyloid positive/negative ratio was roughly even in this group (80:50). This subgroup is also of special interest since it would be more clinically advantageous to identify seeds in the earlier stages of the disease. In comparison, within the Alzheimer’s disease subgroup, the ratio between amyloid-β-positive to amyloid-β-negative subjects is 92 to 7. Therefore we do not report corresponding data for the Alzheimer’s disease group, as the amyloid-β-negative data cannot be reliably reported. Limiting the analysis to the amyloid-β-positive patients in the Alzheimer’s disease group did not significantly change the results. Figure 6 View largeDownload slide Seed investigation for amyloid-β and PCA inferred seeds on distinct amyloid-β subgroups. (A) Seed investigation for amyloid-β. Comparison between amyloid-β-negative (red) and amyloid-β-positive (blue) early MCI cohorts. Ctx-lh-entorhinal region was identified with higher seeds for amyloid-β-positive versuss ctx-lh-temporal pole region was identified with higher seeds for amyloid-β-negative subgroups in early MCI subjects. Atrophy patterns on top depict (scaled) nodes with average Z-scores above 0.05 and seed patterns on bottom show (scaled) nodes with average seed values above 0.15. Seed values were calculated using optimal L-curve parameters for the early MCI cohort (λ = 0.02, w = 0.33). (B) Seed investigation for amyloid-β. Histograms (normalized probability) for different Rmax values for amyloid-β-negative (red) and amyloid-β-positive (blue) in early MCI cohorts. The positive distribution favors higher Rmax values and is significantly more left-skewed than the negative distribution. (C) PCA inferred seeds on distinct amyloid-β subgroups. PCA for inferred seeds showed no distinct clusters for different amyloid-β subgroups within the early MCI cohort. Seed values were calculated using optimal L-curve parameters for the early MCI cohort (λ = 0.02, w = 0.33). Ctx = cortex; lh = left hemisphere. Figure 6 View largeDownload slide Seed investigation for amyloid-β and PCA inferred seeds on distinct amyloid-β subgroups. (A) Seed investigation for amyloid-β. Comparison between amyloid-β-negative (red) and amyloid-β-positive (blue) early MCI cohorts. Ctx-lh-entorhinal region was identified with higher seeds for amyloid-β-positive versuss ctx-lh-temporal pole region was identified with higher seeds for amyloid-β-negative subgroups in early MCI subjects. Atrophy patterns on top depict (scaled) nodes with average Z-scores above 0.05 and seed patterns on bottom show (scaled) nodes with average seed values above 0.15. Seed values were calculated using optimal L-curve parameters for the early MCI cohort (λ = 0.02, w = 0.33). (B) Seed investigation for amyloid-β. Histograms (normalized probability) for different Rmax values for amyloid-β-negative (red) and amyloid-β-positive (blue) in early MCI cohorts. The positive distribution favors higher Rmax values and is significantly more left-skewed than the negative distribution. (C) PCA inferred seeds on distinct amyloid-β subgroups. PCA for inferred seeds showed no distinct clusters for different amyloid-β subgroups within the early MCI cohort. Seed values were calculated using optimal L-curve parameters for the early MCI cohort (λ = 0.02, w = 0.33). Ctx = cortex; lh = left hemisphere. The highest seed values for the amyloid-β-positive subgroup occurred in the left entorhinal cortex (seed = 0.55), right pallidum (seed = 0.36) and right hippocampus (seed = 0.34). For the amyloid-β-negative subgroup, however, the highest seed values occurred in the left temporal pole (seed = 0.54), left frontal pole (seed = 0.33) and left cuneus (seed = 0.32). The magnitudes of the top seed values were similar for both subgroups, and both had large seed values in the right hippocampus, left frontal pole and right temporal pole. Having the largest seed at the left entorhinal, however, seems to be a signature of the amyloid-β-positive subgroup. Amyloid negatives tend to have more widespread and diffuse patterns of atrophy compared to amyloid positives, who display the classic mesial temporal dominance. The seed patterns appear to be slightly more different between the amyloid ± subgroups compared to their baseline atrophy patterns, suggesting that perhaps seeding heterogeneity may be governed by amyloid status. Therefore to explore possible clustering by amyloid status, in Fig. 6C, we repeat the earlier PCA analysis for amyloid-β-positive patients versus amyloid-β-negative patients within the early MCI cohort on atrophy scores and on seed patterns. The subgroups are poorly separated in both cases. A possible explanation is that the amyloid-β-negative group gives the same patterns as amyloid-β-positive group, but is more widespread. This is true in both baseline atrophy and seeding patterns. This suggests that the origin region is either the same for both, or is not necessarily correlated to amyloid aetiology. Discussion The present study proposes a novel inference algorithm, based on the network diffusion model and sparsity-inducing L1 norm regularization, capable of inferring a patient’s past patterns of Alzheimer’s disease and MCI onset, which we believe is suggestive of tau pathology seeding. Specifically, we obtained sparse estimates of the patient’s most likely foci of disease initiation, referred to here as ‘seed’ regions, from baseline and longitudinal regional volumetric data obtained from MRI. The key assumption underlying this approach is that the current state of a patient’s regional atrophy patterns are a result of network transmission, following accumulating evidence of ‘trans-neuronal transmission’ of misfolded tau protein in the brain. In this work, regional atrophy is assumed to be a reasonable stand-in for regional tau distribution, backed by post-mortem and in vivo studies (Arriagada et al., 1992; Nelson et al., 2012; Xia et al., 2017). Our major findings are as follows. First, from each patient’s longitudinal MRI scans, our inference algorithm identified several seed regions from which Alzheimer’s disease or MCI tau pathology most likely originated in that patient. The sparsity of the inferred seeds is effectively controlled by the algorithm parameters λ and w, indicating that the algorithm functions as designed. Increasing either, the L1 constraint, or w, the emphasis on fitting the longitudinal rate of atrophy, results in seeds that have, on average, fewer nonzero regions. Consequently, our inference method is flexible in yielding a set of seeds at different levels of sparsity. We performed thorough L-curve analysis to obtain the optimal parameter choices. Second, we showed that the majority of these Alzheimer’s disease seeds lie in the temporal lobe, specifically the hippocampus, entorhinal cortices and surrounding areas in Alzheimer’s disease, late MCI and early MCI subjects. Given the prominence of these structures in early Alzheimer’s disease from autopsy series in human Alzheimer’s disease brains (Braak stages II–IV; Braak and Braak, 1996), this provides a level of support to the accuracy of the proposed algorithm. Third, we demonstrated that applying our previously published NDM in the forward direction from these seeds predicts a patient’s future atrophy (i.e. measured at baseline and follow-up visits in the ADNI study) with high predictability. We showed that the individualized seeds inferred by our algorithm are significantly better at predicting ‘future’ patterns of regional atrophy than static seeding of the hippocampus in all subjects. This suggests that hippocampus, or any other single structure, may not be the true region of disease onset in all or even most individual patients. Finally, we thoroughly characterized the intersubject heterogeneity in the seeding patterns in ADNI subjects, and showed that seed variability is quite high even amongst subjects in the same diagnostic group. This suggests that Alzheimer’s disease spectrum patients can have quite variable patterns of disease onset. It also suggests that correctly inferring a patient’s seeding pattern can be helpful in differential and personalized diagnosis. We showed that seeding variability is even higher than baseline atrophy variability between patients. One potential explanation of this could be that network transmission following disparate sources of regional onset can still converge to a common template of regional atrophy as disease progresses from onset to early MCI to late MCI to full Alzheimer’s disease. Taken together, these results provide preliminary support for the proposed inference algorithm, which may have future potential to characterize individual patterns of atrophy for clinical applications. Agreement and comparison with prior studies The average baseline atrophy Z-scores of our Alzheimer’s disease patient population (Fig. 1, top) is consistent with classic features of Alzheimer’s disease topography, including significant mesial temporal grey matter atrophy compared to controls. Mesial temporal atrophy is a widely replicated and accepted biomarker of Alzheimer’s disease (Baron et al., 2001; Thompson et al., 2003; Apostolova et al., 2007; Da et al., 2013; Peter et al., 2014; Fischer et al., 2016). Figure 2B agrees with prior findings of Alzheimer’s disease degeneration originating in the mesial temporal lobe. In particular, as we apply more stringent sparsity constraints by increasing λ and w, the average Alzheimer’s disease seed converges on hippocampal and entorhinal regions. We further evaluated all seeds on an individual-subject basis (Fig. 2C), confirming that atrophy originates most frequently from temporal regions. A closer look at seed distribution in Table 2 shows that the majority of our computationally-modelled seeds are in the hippocampus, entorhinal cortex, and immediate surrounding cortices consistently in Alzheimer’s disease and MCI patients. These seed results parallel classic post-mortem pathology studies of Alzheimer’s disease. The Braak staging model of Alzheimer’s disease shows deposition of misfolded proteins occurs in a stereotypical fashion, and early sites of pathological involvement are the hippocampus, entorhinal cortex, and limbic structures (Braak and Braak, 1991, 1996; Rüb et al., 2000; Sassin et al., 2000; Lace et al., 2009). Our group seeding results of Fig. 2 generally agree with the spatial pattern of tau deposition noted in recently emerging in vivo tau-binding PET scans. In particular, the THK5351 binding pattern of early MCI subjects reported by Kang et al. (2017), and of AV1451 binding pattern of Alzheimer’s disease subjects reported by Mishra et al. (2017), are strikingly similar to the early MCI glass brains shown in Fig. 6A. All have a strong medial and lateral temporal involvement, followed by more diffuse involvement of the neocortex. Intriguingly, our seeding results are much closer to these published tau patterns in early MCI than is the early MCI atrophy pattern, lending critical empirical support to our inferred seeds. A fundamental assumption of the proposed method is that pathological progression (of tau, not necessarily of amyloid-β) occurs through white matter tracts. This is justified by substantial emerging evidence in human studies (Raj et al., 2012, 2015) and observations of trans-neuronal tau spread in mouse models (Clavaguera et al., 2009). Hence we are confident that network-mediated spread can be effectively used in the proposed inference algorithm. However, other possibilities cannot be ruled out, for instance that tau might spread without the requirement of direct physical connections between neurons (e.g. through interstitial fluid; Wu et al., 2016). It is also possible that network mediation is through functional rather than anatomic connectivity (Seeley et al., 2009; Zhou et al., 2012), or through spatial spread. Interestingly, recent work in our laboratory supports that the anatomic connectome, rather than distance-based or interstitial fluid-mediated spread, is the most likely mediator of tau spread in mouse models (Mezias et al., 2017). Some prior work exists on solving the seed inference problem in the dementia context. Zhou et al. (2012) perform a brute-force search for the seed region that produces the best match between its functional connectivity pattern to the rest of the brain and regional atrophy patterns. This may be considered a variant of our seed initialization strategy, where we use as x0 each region’s Rmax after forward propagation using network diffusion. However, as we have seen, the final inference frequently veers away from the brute-force initial pattern, after the imposition of sparsity and other constraints. More recently, Hu et al. (2016), proposed a Monte Carlo sampling technique to infer seed patterns using the network diffusion model. This study focuses mainly on simulations, and reports interesting patterns that appear to be at variance to expected spatial patterns in Alzheimer’s disease. This might be due to the complexity of their inference problem, which has a large number of unknown parameters whose accurate estimation is difficult from either simulation data or from limited number of longitudinal data points available in ADNI. In contrast, our study considers only empirical atrophy and connectivity data, and proposes a cost function that is well suited to the limited number of time points available. It therefore succeeds in recapitulating the classic temporal-dominant seeding patterns at the group level. Iturria-Medina et al. (2017) have also reported related approaches. Individualized seeds outperform static seeds in predicting future atrophy Figure 4 lends further support to the seeds discovered by the inference algorithm by showing each subject’s individual seed performed significantly better than using the hippocampus as a static seed. In subjects whose seeds were not in the hippocampus, seeds were also found in the entorhinal cortex and surrounding cortices. This seeding heterogeneity reinforces the importance of inference on personalized patient as opposed to group-wise data. We chose to test the hippocampus as an a priori hypothesis, since exogenous seeding of pathogenic proteins in the hippocampus via injections in mouse models causes remote pathology in connected regions similar to that observed in post-mortem humans (Clavaguera et al., 2009; Ahmed et al., 2014). Even when we chose outside hippocampus, consisting of the 5–10 most common regions (Table 2), the results did not significantly improve (Supplementary Fig. 7). Also worth noting is that as the seeds are inferred in part from the baseline atrophy, the seed might itself be a strong predictor of baseline atrophy without needing to do any forward prediction using the NDM. However, the NDM predictions from the inferred seeds are more strongly correlated with the baseline than the seeds themselves (Supplementary Fig. 3), indicating that the seeds truly represent earlier time points along the neurodegeneration trajectory. Initial patterns of atrophy are highly heterogeneous Concordant with the observation that a group-level seed fails to explain the progression of atrophy in individual patients, the inferred seeds are highly heterogeneous, as shown by the distribution of pairwise Pearson correlations in Fig. 3B. This suggests that, although the progression of Alzheimer’s disease is well-characterized on a group level, the heterogeneity underlying the broad distributions of baseline pairwise correlations is also present at very early stages of disease. We therefore propose that the patient-to-patient variability dominates disease aetiology, even though hippocampus and adjoining temporal cortices are implicated as seed locations on a population level. Further, inferred seeds are more variable than the baseline atrophy patterns. The correlated and anticorrelated examples in Fig. 3A also illustrate the shifts towards no correlation. One explanation of this of this trend is that the Pearson’s R between two vectors relates to the cosine of the angle between them. Hence sparsity-enforcing terms will tend to make these vectors more orthogonal to each other, and reduce Pearson’s R. This enhanced ‘distinguishability’ of our inferred seeds may provide a more sensitive means of diagnosing neurodegenerative diseases than MRI alone. Additionally, it may be able to identify subtypes or new diagnostic classes sharing a common pattern of origin. We detail plans to more fully evaluate these hypotheses below. These data on heterogeneity are not inconsistent with prior literature. Alzheimer’s disease is well known to be a heterogeneous disease in terms of different cognitive presentations as well as neuropathological and structural heterogeneity (Whitwell et al., 2012; Noh et al., 2014). In an attempt to address these heterogeneities, Alzheimer’s disease patients were grouped into amnestic and non-amnestic types based on cognition and as typical, limbic-predominant and hippocampal-sparing based on MRI and pathology (Whitwell et al., 2012). Our results support and underscore the issue of phenotypic heterogeneity; however, unlike previous studies, we find no evidence for clustering based on inferred seed patterns. One possibility could be that the disease onset is highly heterogeneous, and that the heterogeneity decreases, rather than increases, over time. The challenges of validation There is no ‘gold standard’ against which we can validate our inference method in individuals. Figure 5 and Supplementary Fig. 2 confirm that a consensus seed, whether it was chosen using outside knowledge or derived from the most prevalent regions among the inferred seeds themselves, cannot reproduce individual baseline atrophy patterns under the NDM. The individual inferred seeds in contrast give strong and significant prediction under NDM of the subject’s baseline atrophy pattern (Fig. 4), suggesting that the seeds represent pre-baseline stages of atrophy. However, absent bona fide preclinical atrophy data to compare to the inferred seeds (which, if it existed, would eliminate the need to infer prior states at all), it is impossible to ascertain exactly how accurately they reproduce initial stages of pathology. An alternative, indirect validation is if our inferred seeds can distinguish (i) known population-level trends in early disease; and (ii) populations of patients with different foci of origin. Table 2 and Figs 2 and 3 confirm the first; with regards to the second, we used hierarchical clustering on the Alzheimer’s disease and MCI seeds to try to identify latent substructure. Since subtypes of Alzheimer’s disease have been well-described in the literature (Murray et al., 2011; Mattsson et al., 2016) we hypothesized that our inferred seeds can classify these subtypes. However, PCA suggested that there was a lack of inherent substructure to the seeds and baselines of the patient groups (Fig. 5). Hierarchical clustering of the seeds and baselines revealed no discernible outgroups in any of the three cohorts (Supplemental Figs 4 and 5). One possibility is that in fact our patient groups do not contain sufficient numbers of subjects for the rarer subtypes to be detected. As an additional test, we grouped all three of our cohorts into one group and attempted to reproduce the diagnostic classes using PCA and hierarchical clustering, but neither method yielded identifiable subgroups (Fig. 5 and Supplementary Fig. 3). This may reflect the fact that MCI resembles a less severe form of Alzheimer’s disease, and using our method to regress current regional atrophy patterns to earlier stages produces a similar distribution of seeds. Figure 2C and Table 2 suggest that this may be the case, given the overlap of the most highly represented regions in the seeds of all three groups. Thus, failure to distinguish clinically relevant subpopulations may simply be suggestive that MCI is not a topographically distinct phase than mature Alzheimer’s disease, just temporally earlier. Another, not mutually-exclusive possibility is that the origins of atrophy on an individual-patient level may truly be heterogeneous, and apparent randomness dominates any substructure within this population. We believe that the true diagnostic power of inferred seeding will be apparent between more diverse patient populations diagnosed with other proteinopathies such as Parkinson’s disease, frontotemporal dementia and posterior cortical atrophy—a subject of future work in our laboratory. Other limitations Our inference algorithm, as it is based on the NDM, uses a linear approximation to explain macroscopic trends in atrophy spread (Raj et al., 2012, 2015), which may obscure the effects of higher-level dynamics exhibited at finer size scales. The choice of Pearson correlation as the primary metric in our cost function can be suboptimal if the atrophy data are not Gaussian distributed or show outliers. While we did not observe outliers in our atrophy data, we did not explore other metrics such as Spearman correlation. Pearson was our preferred choice as it reduces the risk of non-convex optimization problem during the seed inference procedure. Spearman is unfortunately quite discontinuous with respect to the NDM quantities x and t. Additionally, we chose to use an L1 regularization constraint for its desirable soft-thresholding behaviour and ease of implementation, but further exploring other possibilities is a necessary next step in the validation of our inference method. However, we note that the inferred seeds differ considerably from L0 seeds (Supplementary Fig. 8) so the model is not simply picking the regions of highest atrophy at the expense of less prominent regions that may have instrumental roles in atrophy progression at earlier time points. Atrophy and white matter connectivity results in the current work share the same sensitivity and accuracy issues common to computational neuroanatomic data currently in the public domain. Specifically, there exist technical limitations of the volumetric and tractography processing pipelines include HARDI spatial and angular resolution, coregistration errors, low test-retest reliability of volumetric data, and the distance bias inherent in tractography. Finally, the use of pathology spread-based NDM on MRI-derived atrophy data assumes that two are co-localized; this is reasonable as tau neurofibrillary tangles are strongly associated with degeneration (Arriagada et al., 1992; Nelson et al., 2012; Xia et al., 2017). Acknowledgements All patient data used in this study were obtained from the public ADNI study. Users interested in obtaining these data can do so by visiting http://adni.loni.usc.edu. Funding This research was supported in part by the following grants: R01 NS092802 and R01 EB022717 from the National Institutes of Health (AR). F.P. was supported by the Ford Fellowship. None of the authors have a competing financial interest in relation to the work described herein. Supplementary material Supplementary material is available at Brain online. Abbreviations Abbreviations MCI mild cognitive impairment NDM network diffusion model PCA principal component analysis References Ahmed Z, Cooper J, Murray TK, Garn K, McNaughton E, Clarke H, et al. A novel in vivo model of tau propagation with rapid and progressive neurofibrillary tangle pathology: the pattern of spread is determined by connectivity, not proximity. Acta Neuropathol 2014; 127: 667– 83. Google Scholar CrossRef Search ADS PubMed Apostolova LG, Steiner CA, Akopyan GG, Dutton RA, Hayashi KM, Toga AW, et al. Three-dimensional gray matter atrophy mapping in mild cognitive impairment and mild Alzheimer disease. Arch Neurol 2007; 64: 1489– 95. Google Scholar CrossRef Search ADS PubMed Arriagada PV, Growdon JH, Hedley-Whyte ET, Hyman BT. Neurofibrillary tangles but not senile plaques parallel duration and severity of Alzheimer’s disease. Neurology 1992; 42: 631– 9. Google Scholar CrossRef Search ADS PubMed Attems J, Thal DR, Jellinger KA. The relationship between subcortical tau pathology and Alzheimer’s disease. Biochem Soc Trans 2012; 40: 711– 15. Google Scholar CrossRef Search ADS PubMed Baron JC, Chételat G, Desgranges B, Perchey G, Landeau B, de la Sayette V, et al. In vivo mapping of gray matter loss with voxel-based morphometry in mild Alzheimer’s disease. Neuroimage 2001; 14: 298– 309. Google Scholar CrossRef Search ADS PubMed Braak H, Braak E. Neuropathological stageing of Alzheimer-related changes. Acta Neuropathol 1991; 82: 239– 59. Google Scholar CrossRef Search ADS PubMed Braak H, Braak E. Evolution of the neuropathology of Alzheimer’s disease. Acta Neurol Scand Suppl 1996; 165: 3– 12. Google Scholar CrossRef Search ADS PubMed Clavaguera F, Bolmont T, Crowther RA, Abramowski D, Frank S, Probst A, et al. Transmission and spreading of tauopathy in transgenic mouse brain. Nat Cell Biol 2009; 11: 909– 13. Google Scholar CrossRef Search ADS PubMed Cook PA, Bai Y, Nedjati-Gilani S, Seunarine KK, Hall MG, Parker GJ, et al. Camino: open-source diffusion-MRI reconstruction and processing. In: 14th Scientific Meeting of the International Society for Magnetic Resonance in Medicine . Seattle; 2006. p. 2759. Da X, Toledo JB, Zee J, Wolk DA, Xie SX, Ou Y, et al. Integration and relative value of biomarkers for prediction of MCI to Alzheimer’s disease progression: spatial patterns of brain atrophy, cognitive scores, APOE genotype and CSF biomarkers. Neuroimage Clin 2013; 4: 164– 73. Google Scholar CrossRef Search ADS PubMed Dong A, Toledo JB, Honnorat N, Doshi J, Varol E, Sotiras A, et al. Heterogeneity of neuroanatomical patterns in prodromal Alzheimer’s disease: links to cognition, progression and biomarkers. Brain 2017; 140: 735– 47. Google Scholar PubMed Fischer CE, Ting WK, Millikin CP, Ismail Z, Schweizer TA. Gray matter atrophy in patients with mild cognitive impairment/Alzheimer’s disease over the course of developing delusions. Int J Geriatr Psychiatry 2016; 31: 76– 82. Google Scholar CrossRef Search ADS PubMed Fischl B, Salat DH, Busa E, Albert M, Dieterich M, Haselgrove C, et al. Whole brain segmentation: automated labeling of neuroanatomical structures in the human brain. Neuron 2002; 33: 341– 55. Google Scholar CrossRef Search ADS PubMed Fischl B, Salat DH, van der Kouwe AJ, Makris N, Ségonne F, Quinn BT, et al. Sequence-independent segmentation of magnetic resonance images. Neuroimage 2004; 23: S69– 84. Google Scholar CrossRef Search ADS PubMed Forman MS, Zhukareva V, Bergeron C, Chin SS, Grossman M, Clark C, et al. Signature tau neuropathology in gray and white matter of corticobasal degeneration. Am J Pathol 2002; 160: 2045– 53. Google Scholar CrossRef Search ADS PubMed Frost B, Diamond MI. Prion-like mechanisms in neurodegenerative diseases. Nat Rev Neurosci 2010; 11: 155– 9. Google Scholar CrossRef Search ADS PubMed Hu C, Hua X, Ying J, Thompson PM, Fakhri GE, Li Q. Localizing sources of brain disease progression with network diffusion model. IEEE J Sel Top Signal Process 2016; 10: 1214– 25. Google Scholar CrossRef Search ADS PubMed Iba M, McBride JD, Guo JL, Zhang B, Trojanowski JQ, Lee VM. Tau pathology spread in PS19 tau transgenic mice following locus coeruleus (LC) injections of synthetic tau fibrils is determined by the LC’s afferent and efferent connections. Acta Neuropathol 2015; 130: 349– 62. Google Scholar CrossRef Search ADS PubMed Iturria-Medina Y, Carbonell FM, Sotero RC, Chouinard-Decorte F, Evans AC. Multifactorial causal model of brain (dis)organization and therapeutic intervention: application to Alzheimer’s disease. Neuroimage 2017; 152: 60– 77. Google Scholar CrossRef Search ADS PubMed Iturria-Medina Y, Sotero RC, Toussaint PJ, Evans AC. Epidemic spreading model to characterize misfolded proteins propagation in aging and associated neurodegenerative disorders. PLoS Comput Biol 2014; 10: e1003956. Google Scholar CrossRef Search ADS PubMed Jenkinson M, Beckmann CF, Behrens TE, Woolrich MW, Smith SM. FSL. Neuroimage 2012; 62: 782– 90. Google Scholar CrossRef Search ADS PubMed Jucker M, Walker LC. Self-propagation of pathogenic protein aggregates in neurodegenerative diseases. Nature 2013; 501: 45– 51. Google Scholar CrossRef Search ADS PubMed Kang JM, Lee SY, Seo S, Jeong HJ, Woo SH, Lee H, et al. Tau positron emission tomography using [(18)F]THK5351 and cerebral glucose hypometabolism in Alzheimer’s disease. Neurobiol Aging 2017; 59: 210– 19. Google Scholar CrossRef Search ADS PubMed Lace G, Savva GM, Forster G, De Silva R, Brayne C, Matthews FE, et al. Hippocampal tau pathology is related to neuroanatomical connections: an ageing population-based study. Brain 2009; 132: 1324– 34. Google Scholar CrossRef Search ADS PubMed Mattsson N, Insel P, Tosun D, Zhang J, Jack CR, Galasko D, et al. Effects of baseline CSF α-synuclein on regional brain atrophy rates in healthy elders, mild cognitive impairment and Alzheimer’s disease. PLoS One 2013; 8: e85443. Google Scholar CrossRef Search ADS PubMed Mattsson N, Schott JM, Hardy J, Turner MR, Zetterberg H. Selective vulnerability in neurodegeneration: insights from clinical variants of Alzheimer’s disease. J Neurol Neurosurg Psychiatry 2016; 87: 1000– 4. Google Scholar CrossRef Search ADS PubMed Mezias C, LoCastro E, Xia C, Raj A. Connectivity, not region-intrinsic properties, predicts regional vulnerability to progressive tau pathology in mouse models of disease. Acta Neuropathol Commun 2017; 5: 61. Google Scholar CrossRef Search ADS PubMed Mishra S, Gordon BA, Su Y, Christensen J, Friedrichsen K, Jackson K, et al. AV-1451 PET imaging of tau pathology in preclinical Alzheimer disease: defining a summary measure. Neuroimage 2017; 161: 171– 78. Available from: http://www.ncbi.nlm.nih.gov/pubmed/28756238 Google Scholar CrossRef Search ADS PubMed Murray ME, Graff-Radford NR, Ross OA, Petersen RC, Duara R, Dickson DW. Neuropathologically defined subtypes of Alzheimer’s disease with distinct clinical characteristics: a retrospective study. Lancet Neurol 2011; 10: 785– 96. Google Scholar CrossRef Search ADS PubMed Nelson PT, Alafuzoff I, Bigio EH, Bouras C, Braak H, Cairns NJ, et al. Correlation of Alzheimer disease neuropathologic changes with cognitive status: a review of the literature. J Neuropathol Exp Neurol 2012; 71: 362– 81. Google Scholar CrossRef Search ADS PubMed Noh Y, Jeon S, Lee JM, Seo SW, Kim GH, Cho H, et al. Anatomical heterogeneity of Alzheimer disease: based on cortical thickness on MRIs. Neurology 2014; 83: 1936– 44. Google Scholar CrossRef Search ADS PubMed Park JY, Na HK, Kim S, Kim H, Kim HJ, Seo SW, et al. Robust identification of Alzheimer’s disease subtypes based on cortical atrophy patterns. Sci Rep 2017; 7: 43270. Google Scholar CrossRef Search ADS PubMed Peter J, Scheef L, Abdulkadir A, Boecker H, Heneka M, Wagner M, et al. Gray matter atrophy pattern in elderly with subjective memory impairment. Alzheimers Dement 2014; 10: 99– 108. Available from: http://linkinghub.elsevier.com/retrieve/pii/S1552526013024230 Google Scholar CrossRef Search ADS PubMed Raj A, Kuceyeski A, Weiner M. A network diffusion model of disease progression in dementia. Neuron 2012; 73: 1204– 15. Google Scholar CrossRef Search ADS PubMed Raj A, LoCastro E, Kuceyeski A, Tosun D, Relkin N, Weiner M, et al. Network diffusion model of progression predicts longitudinal patterns of atrophy and metabolism in Alzheimer’s disease. Cell Rep 2015; 10: 359– 69. Google Scholar CrossRef Search ADS Reitz C, Honig L, Vonsattel JP, Tang MX, Mayeux R. Memory performance is related to amyloid and tau pathology in the hippocampus. J Neurol Neurosurg Psychiatry 2009; 80: 715– 21. Google Scholar CrossRef Search ADS PubMed Reuter M, Schmansky NJ, Rosas HD, Fischl B. Within-subject template estimation for unbiased longitudinal image analysis. Neuroimage 2012; 61: 1402– 18. Available from: http://linkinghub.elsevier.com/retrieve/pii/S1053811912002765 Google Scholar CrossRef Search ADS PubMed Rüb U, Del Tredici K, Schultz C, Thal DR, Braak E, Braak H. The evolution of Alzheimer’s disease-related cytoskeletal pathology in the human raphe nuclei. Neuropathol Appl Neurobiol 2000; 26: 553– 67. Google Scholar CrossRef Search ADS PubMed Sassin I, Schultz C, Thal DR, Rüb U, Arai K, Braak E, et al. Evolution of Alzheimer’s disease-related cytoskeletal changes in the basal nucleus of Meynert. Acta Neuropathol 2000; 100: 259– 69. Google Scholar CrossRef Search ADS PubMed Seeley WW, Crawford RK, Zhou J, Miller BL, Greicius MD. Neurodegenerative diseases target large-scale human brain networks. Neuron 2009; 62: 42– 52. Google Scholar CrossRef Search ADS PubMed Thal DR, Rüb U, Orantes M, Braak H. Phases of A beta-deposition in the human brain and its relevance for the development of Alzheimer’s disease. Neurology 2002; 58: 1791– 800. Google Scholar CrossRef Search ADS PubMed Thompson PM, Hayashi KM, de Zubicaray G, Janke AL, Rose SE, Semple J, et al. Dynamics of gray matter loss in Alzheimer’s disease. J Neurosci 2003; 23: 994– 1005. Google Scholar PubMed Whitwell JL, Dickson DW, Murray ME, Weigand SD, Tosakulwong N, Senjem ML, et al. Neuroimaging correlates of pathologically defined subtypes of Alzheimer’s disease: a case-control study. Lancet Neurol 2012; 11: 868– 77. Google Scholar CrossRef Search ADS PubMed Wu JW, Hussaini SA, Bastille IM, Rodriguez GA, Mrejeru A, Rilett K, et al. Neuronal activity enhances tau propagation and tau pathology in vivo. Nat Neurosci 2016; 19: 1085– 92. Google Scholar CrossRef Search ADS PubMed Xia C, Makaretz SJ, Caso C, McGinnis S, Gomperts SN, Sepulcre J, et al. Association of in vivo [18 F]AV-1451 Tau PET imaging results with cortical atrophy and symptoms in typical and atypical Alzheimer disease. JAMA Neurol 2017; 74: 427. Google Scholar CrossRef Search ADS PubMed Zhou J, Gennatas ED, Kramer JH, Miller BL, Seeley WW. Predicting regional neurodegeneration from the healthy brain functional connectome. Neuron 2012; 73: 1216– 27. Google Scholar CrossRef Search ADS PubMed © The Author(s) (2018). Published by Oxford University Press on behalf of the Guarantors of Brain. All rights reserved. For Permissions, please email: firstname.lastname@example.org
Brain – Oxford University Press
Published: Mar 1, 2018
It’s your single place to instantly
discover and read the research
that matters to you.
Enjoy affordable access to
over 12 million articles from more than
10,000 peer-reviewed journals.
All for just $49/month
It’s easy to organize your research with our built-in tools.
All the latest content is available, no embargo periods.
“Whoa! It’s like Spotify but for academic articles.”@Phil_Robichaud