Boussau, Bastien

Abstract

Dating the tree of life is central to understanding the evolution of life on Earth. Molecular clocks calibrated with fossils represent the state of the art for inferring the ages of major groups. Yet other information on the timing of species diversification can be used to date the tree of life. For example, horizontal gene transfer events and ancient coevolutionary interactions such as (endo)symbioses occur between contemporaneous species and thus can imply temporal relationships between two nodes in a phylogeny. Temporal constraints from these alternative sources can be particularly helpful when the geological record is sparse, for example, for microorganisms, which represent the majority of extant and extinct biodiversity. Here, we present a new method to combine fossil calibrations and relative age constraints to estimate chronograms. We provide an implementation of relative age constraints in RevBayes that can be combined in a modular manner with the wide range of molecular dating methods available in the software. We use both realistic simulations and empirical datasets of 40 Cyanobacteria and 62 Archaea to evaluate our method. We show that the combination of relative age constraints with fossil calibrations significantly improves the estimation of node ages. [Archaea, Bayesian analysis, cyanobacteria, dating, endosymbiosis, lateral gene transfer, MCMC, molecular clock, phylogenetic dating, relaxed molecular clock, revbayes, tree of life.]

Dated species trees (chronograms or timetrees, in which branch lengths are measured in units of geological time) are used in all areas of evolutionary biology. Their construction typically involves collecting molecular sequence data, which are then analyzed using probabilistic models (Álvarez-Carretero and dos Reis 2020).
Commonly, in a clock-dating analysis, the assumption of a strict molecular clock (Zuckerkandl and Pauling 1962) is relaxed and variation in evolutionary rates is allowed. Such relaxed molecular clock methods combine three components: a model of sequence evolution, a model of clock rate variation across the phylogeny, and calibrations of node ages. Inference is typically performed using Bayesian Markov chain Monte Carlo (MCMC) algorithms (Mau and Newton 1997; Yang and Rannala 1997). Inferring the ages of speciation events from molecular data is challenging because it amounts to factoring the divergence between sequences, estimated in units of substitutions per site, into a product of time (ages of speciation events) and rates of evolution (Donoghue and Yang 2016). Additional information on ages and clock rates must therefore be provided. Information on node ages can be provided through calibrated nodes, that is, nodes that can be associated with a date in the past, usually with some uncertainty expressed through probability distributions (Yang and Rannala 2006). Node age calibrations are often derived from the ages of particular fossils or groups of fossils, but any information about dates in the past that can be associated with nodes (e.g., geochemical information such as the amount of oxygen in the atmosphere) can be used (Parham et al. 2012). By contrast, external data are rarely available to inform clock rates, especially over long timescales, where contemporary mutation rates, even when known, are not informative. Consequently, inferences of the rate of evolution combine information contained in the analyzed sequence data and in the node age calibrations, and are strongly dependent on the model of rate evolution along the phylogeny (Ho and Duchêne 2014). When rates can be considered constant throughout the phylogeny, that is, when the strict molecular clock hypothesis (Zuckerkandl and Pauling 1962) applies, only a single global rate needs to be estimated.
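The factoring problem described above can be made explicit with a one-line identity (a standard argument, not specific to this study). Write the length of branch $i$, in expected substitutions per site, as the product of its rate $r_i$ and its duration $t_i$. Rescaling all rates by a constant $c$ while rescaling all durations by $1/c$ leaves every branch length, and hence the likelihood of the sequences, unchanged:

$$
b_i = r_i \, t_i = (c\, r_i)\left(\frac{t_i}{c}\right) \quad \text{for every branch } i \text{ and any } c > 0.
$$

Sequence data alone therefore inform only the products $b_i$; calibrations, which constrain some of the $t_i$, are what restrict the common scale $c$.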
For datasets that do not fit the strict molecular clock hypothesis, rate variation needs to be modeled. Several such relaxed-clock models have been proposed (Thorne et al. 1998; Drummond et al. 2006; Lepage et al. 2007; Heath et al. 2012; Lartillot et al. 2016) to account for rate variation across the phylogeny. Some assume that branch-specific rates are drawn independently of each other from a common distribution with global parameters (Drummond et al. 2006; Lepage et al. 2007; Heath et al. 2012). Other models assume that neighboring branches have more similar rates than distant branches (Thorne et al. 1998), and a model that can accommodate both situations has recently been proposed (Lartillot et al. 2016). The sophistication and typically much better fit (Pybus 2006) of relaxed-clock models come at a price, however: inference is computationally more demanding than under the strict molecular clock. This is because relaxed-clock models contain a large number of parameters, some of which are highly correlated, so that special MCMC algorithms are required (Zhang and Drummond 2020). Because rates inferred under relaxed-clock models remain uncertain, dating a phylogeny relies heavily on node calibrations (Pybus 2006; dos Reis et al. 2015). Recent developments complement node age calibration with tip-dating, where fossil species are placed as tips (Pyron 2011; Ronquist et al. 2012; Gavryushkina et al. 2017) or sampled ancestors (Gavryushkina et al. 2014) in the phylogeny, with serially sampled phylogenies built from molecular sequences collected at different times (Drummond et al. 2002; Stadler and Yang 2013), and with biogeographic calibrations (Landis 2017; Landis et al. 2021). These developments considerably improve our ability to incorporate additional data and uncertainty in node age estimation.
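To make the contrast between these two families of relaxed clocks concrete, the sketch below draws branch rates under a simplified version of each: independent gamma draws (in the spirit of uncorrelated models such as UGAM) and a random walk on the log-rate from parent to child (in the spirit of autocorrelated models). The tree encoding (a `children` dict and per-branch `durations`), the function and parameter names, and the Brownian variance `sigma2 * duration` are illustrative assumptions, not the exact parameterizations of the cited models.

```python
import math
import random


def uncorrelated_gamma_rates(branches, shape, mean_rate, rng):
    """Uncorrelated (UGAM-like) sketch: each branch rate is an independent
    gamma draw with the given shape and mean, ignoring the tree structure."""
    scale = mean_rate / shape  # gamma mean = shape * scale
    return {b: rng.gammavariate(shape, scale) for b in branches}


def autocorrelated_lognormal_rates(children, durations, root, root_rate, sigma2, rng):
    """Autocorrelated sketch: the log-rate of the branch above each node is drawn
    around its parent's log-rate, with variance sigma2 * branch duration, so
    closely related branches tend to have similar rates."""
    rates = {root: root_rate}
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in children.get(parent, []):
            sd = math.sqrt(sigma2 * durations[child])
            rates[child] = math.exp(rng.gauss(math.log(rates[parent]), sd))
            stack.append(child)
    return rates


# Toy tree ((B, C)X, A)root; durations are branch lengths in time units.
rng = random.Random(1)
children = {"root": ["A", "X"], "X": ["B", "C"]}
durations = {"A": 2.0, "X": 1.0, "B": 1.0, "C": 1.0}
print(uncorrelated_gamma_rates(durations, shape=2.0, mean_rate=1.0, rng=rng))
print(autocorrelated_lognormal_rates(children, durations, "root", 1.0, 0.1, rng))
```

Under the uncorrelated draw, the rates of B and C are unrelated; under the autocorrelated walk, both start from the rate of X, so they tend to be close when `sigma2` is small.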
However, all these approaches require either fossil data with known phylogenetic placement (node-dating), associated morphological or molecular sequence data (tip-dating), or geographic or geological restrictions (biogeographic dating). Unfortunately, fossils are rare and unevenly distributed both in the geological record and across the tree of life. Microbes, in particular, have left few fossils that can be unambiguously assigned to known species or clades. Therefore, entire clades cannot be reliably dated. For example, a recent dating analysis encompassing the three domains of life (Betts et al. 2018) used only 11 fossil calibrations, 7 of which could be assigned to Eukaryotes, 3 to Bacteria, 1 to the root, and none to Archaea. Clearly, incorporating new sources of information into dating analyses would be very useful, especially for dating the microbial tree of life. Recently, it has been shown that gene transfers constitute a novel and abundant source of information about the temporal coexistence of lineages throughout the history of life (Szöllosi et al. 2012; Davín et al. 2018; Magnabosco et al. 2018; Wolfe and Fournier 2018). From the perspective of divergence time estimation, gene transfers provide node-order constraints, that is, they specify that a given node in the phylogeny is necessarily older than another node, even though the older node is not an ancestor of the younger node (Fig. 1c). Davín et al. (2018) showed that the dating information provided by these constraints was consistent with the information provided by (calibrated) relaxed molecular clocks, which suggests that node calibrations could be combined with node-order constraints to date species trees more accurately. The benefit of including transfer-based constraints may be particularly noticeable in microbial clades, where transfers can be frequent (Doolittle 1999; Abby et al. 2012; Szöllosi et al. 2012; Davín et al. 2018) and fossils are rare.
Constraints may also be derived from other events, such as the transfer of a parasite or symbiont between hosts, endosymbioses, or other obligate relationships.

Figure 1. Relative age constraints inform molecular clock-based dates. This conceptual figure illustrates how relative age constraints can affect a posteriori node age estimates. The amount of information used increases from a to c. Elements written in bold correspond to new information. a) Estimating divergence times from sequences requires at least one maximum age calibration, typically provided as a maximum age of the root. With only a single maximum age calibration, as illustrated here, the estimates are highly uncertain. b) Incorporating multiple minimum and maximum age calibrations, usually based on fossils from the geological record, can increase the resolution and accuracy of node ages, but well-resolved and accurate ages require large numbers of calibrations that are not always available. c) Incorporating relative age constraints, which specify that a given node in the phylogeny is necessarily older than another node even though the older node is not an ancestor of the younger node, can further improve the resolution and accuracy of molecular clock inferences.

The inclusion of relative age constraints into dating methods has so far involved ad hoc, multistep approaches (Davín et al. 2018; Magnabosco et al. 2018; Wolfe and Fournier 2018). A statistically correct two-step approach was proposed by Magnabosco et al. (2018). First, an MCMC chain is run with calibrations but without relative age constraints. Then the posterior sample of timetrees is filtered to remove timetrees that violate relative age constraints. This approach works for a small number of constraints but is difficult to scale to large numbers of constraints: as constraints are added, an increasing proportion of the sampled timetrees is rejected, leaving few or no timetrees in the filtered sample.
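The post hoc filtering step, and why it scales poorly, can be illustrated with a toy sketch. Each sampled timetree is reduced to a dict of node ages, and, purely as an assumption for illustration, node ages are drawn independently and uniformly, so that each additional constraint between previously unconstrained nodes keeps only about half of the remaining samples. The function names are hypothetical.

```python
import random


def satisfies(ages, constraints):
    """True if, for every (older, younger) pair, the first node is older."""
    return all(ages[old] > ages[young] for old, young in constraints)


def filter_posterior(samples, constraints):
    """Post hoc rejection: keep only sampled timetrees (dicts of node ages)
    that honour every relative age constraint; also return the kept fraction."""
    kept = [s for s in samples if satisfies(s, constraints)]
    return kept, len(kept) / len(samples)


# Toy "posterior": 20 node ages drawn independently and uniformly (an assumption).
rng = random.Random(42)
nodes = [f"n{i}" for i in range(20)]
samples = [{n: rng.random() for n in nodes} for _ in range(10_000)]
for k in (1, 5, 10):
    # k constraints on disjoint node pairs: each keeps about half of the samples
    constraints = [(nodes[2 * j], nodes[2 * j + 1]) for j in range(k)]
    _, frac = filter_posterior(samples, constraints)
    print(k, round(frac, 4))
```

In this toy setting the kept fraction should be close to 0.5^k, so with 10 constraints only about 10 of the 10,000 samples survive; real posteriors are correlated, but the same loss of samples is what limits the filtering approach.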
Here, we present a method to combine relative node age constraints with node age calibrations within the standard Bayesian relaxed molecular clock framework. The resulting method is statistically sound and can handle a large number of constraints. We examine its performance on realistic simulations and evaluate its benefits on two empirical datasets.

Materials and Methods

Bayesian MCMC Dating with Calibrations and Constraints

Informal description

Relaxed-clock dating methods are often implemented in a Bayesian MCMC framework. Briefly, prior distributions are specified for (1) a diversification process (e.g., a birth–death prior; Rannala and Yang 1996), (2) the parameters of a model of sequence evolution (e.g., the HKY model; Hasegawa et al. 1985), (3) calibration ages, and (4) the parameters of a model of rate heterogeneity along the tree. Such models may assume that neighboring branches have correlated rates of evolution (e.g., the autocorrelated lognormal model; Thorne et al. 1998), or that each branch is associated with a rate drawn from a shared distribution (e.g., the uncorrelated gamma [UGAM] model; Drummond et al. 2006). Calibrations specify prior distributions that account for the uncertainty associated with the corresponding node ages (dos Reis et al. 2015), and sometimes for the uncertainty associated with their position in the species tree (Heath et al. 2014). Our method introduces relative node age constraints as a new type of information that can be incorporated into this framework. We chose to treat node-order constraints as data without uncertainty, in the same way that topological constraints have been implemented in, for example, MrBayes (Ronquist and Huelsenbeck 2003; Bouckaert et al. 2019). Note that, unlike common node age calibrations, our approach does not model any uncertainty in the constraints.
This decision provides a simple way to incorporate constraints into the model: during the MCMC, any tree that does not satisfy a constraint is assigned a probability of 0 and is thus rejected during the Metropolis–Hastings step. Therefore, only trees that satisfy all relative node age constraints have a nonzero posterior probability.

Formal description

Let $A$ be the sequence alignment, Ca the set of fossil calibrations, and Co the set of node-order constraints. Further, let $\Psi_{\rm t}$ be the timetree, that is, a tree with branch lengths in units of time (e.g., years), and $\Psi_{\rm s}$ the tree with branch lengths measured in expected numbers of substitutions per site. Finally, let $\theta$ be the set of all other parameters. In particular, $\theta$ contains the parameters of the sequence evolution model, the parameters of the relaxed molecular clock model, and the rates of the timetree diversification model. The sets ($A$, Ca, Co) and ($\Psi_{\rm s}$, $\Psi_{\rm t}$, $\theta$) fully specify the data and the model, respectively. The posterior distribution is then

$$
P(\Psi_{\rm s}, \Psi_{\rm t}, \theta \mid A, \mathrm{Ca}, \mathrm{Co})
= \frac{P(A, \mathrm{Ca}, \mathrm{Co} \mid \Psi_{\rm s}, \Psi_{\rm t}, \theta) \times P(\Psi_{\rm s} \mid \Psi_{\rm t}, \theta) \times P(\Psi_{\rm t}, \theta)}{P(A, \mathrm{Ca}, \mathrm{Co})}. \tag{1}
$$

The likelihood consists of two terms, the first of which can be further factored as

$$
P(A, \mathrm{Ca}, \mathrm{Co} \mid \Psi_{\rm s}, \Psi_{\rm t}, \theta) = P(A \mid \Psi_{\rm s}, \theta) \times P(\mathrm{Ca} \mid \Psi_{\rm t}) \times P(\mathrm{Co} \mid \Psi_{\rm t}), \tag{2}
$$

where $P(A \mid \Psi_{\rm s}, \theta)$ is the phylogenetic likelihood, typically computed with the pruning algorithm (Felsenstein 1981).
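The rule above, under which a timetree that violates any node-order constraint receives probability zero and is therefore always rejected, can be sketched in a few lines. The sketch represents a timetree by a dict of node ages and each constraint as an (older, younger) pair of node names; these representations and the function names are hypothetical, and the code is not the RevBayes implementation. The indicator plays the role of $P(\mathrm{Co} \mid \Psi_{\rm t})$ in Equation (2), which on the log scale is either 0 or $-\infty$.

```python
import math
import random


def node_order_indicator(ages, constraints):
    """delta(Co, Psi_t): 1 if every (older, younger) constraint holds, else 0."""
    return 1 if all(ages[old] > ages[young] for old, young in constraints) else 0


def log_constraint_term(ages, constraints):
    """log P(Co | Psi_t): 0.0 when all constraints hold, -inf otherwise."""
    return 0.0 if node_order_indicator(ages, constraints) else -math.inf


def mh_accept(log_post_current, log_post_proposed, log_hastings=0.0, rng=random):
    """Metropolis-Hastings acceptance test on the log scale. A proposal whose
    log-posterior is -inf (e.g., one violating a constraint) is never accepted."""
    log_ratio = log_post_proposed - log_post_current + log_hastings
    return log_ratio >= 0.0 or rng.random() < math.exp(log_ratio)


# A proposed timetree in which node "x" is no longer older than node "y":
proposal = {"x": 1.2, "y": 1.5}
lp = -100.0 + log_constraint_term(proposal, [("x", "y")])
print(lp, mh_accept(-100.0, lp))  # -inf False
```

Because the constraint term only ever adds 0 or $-\infty$, it does not change the relative weights of trees that satisfy all constraints; it only removes the others from the posterior.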
The probability density $P(\mathrm{Ca} \mid \Psi_{\rm t})$ ensures that the node age calibrations Ca are honored by $\Psi_{\rm t}$, using distributions with hard or soft bounds (Yang and Rannala 2005). Node-order constraints are accounted for by $P(\mathrm{Co} \mid \Psi_{\rm t}) = \delta(\mathrm{Co}, \Psi_{\rm t})$, where $\delta(\mathrm{Co}, \Psi_{\rm t})$ is the indicator function that equals 1 if the node-order constraints Co are satisfied by $\Psi_{\rm t}$, and 0 otherwise. The second term of the likelihood in Equation (1), $P(\Psi_{\rm s} \mid \Psi_{\rm t}, \theta)$, describes the relaxed molecular clock model, which includes the rate modifiers relating the branch lengths of $\Psi_{\rm s}$ (in expected numbers of substitutions) to the branch lengths of $\Psi_{\rm t}$ (in units of time). Here, we use the UGAM-relaxed molecular clock model, but many other models, such as the lognormal-relaxed molecular clock model, are available (Lepage et al. 2007). Finally, the prior $P(\Psi_{\rm t}, \theta)$ is usually factored into a timetree prior $P(\Psi_{\rm t} \mid \theta)$, typically based on the birth–death process (Rannala and Yang 1996), and a prior $P(\theta)$ on the other parameters.

Two-Step Inference of Timetrees

Evaluating the phylogenetic likelihood $P(A \mid \Psi_{\rm s}, \theta)$ in Equation (2) is the most expensive operation in computing the posterior density, and it must be recomputed at each iteration of a Bayesian MCMC analysis. Because the Markov chain typically has to be run for many iterations to approximate the posterior distribution well, inference is computationally costly, even when the topology of $\Psi_{\rm s}$ is fixed. To reduce this cost, we approximate the phylogenetic likelihood within a two-step approach.
In the first step, the posterior distribution of branch lengths, measured in expected numbers of substitutions, is obtained for the fixed unrooted topology of $\Psi_{\rm s}$ using a standard MCMC analysis. This posterior distribution is used to calculate the posterior mean $\mu_i$ and posterior variance $v_i$ of the length of each branch $i \in I$ of the unrooted topology of $\Psi_{\rm s}$. In the second step, these posterior means and variances are used to approximate the phylogenetic likelihood by a product of normal densities:

$$
P(A \mid \Psi_{\rm s}, \theta) \approx \prod_{i \in I} N(\lambda_i; \mu_i, v_i), \tag{3}
$$

where $\lambda_i$, which is sampled during this second MCMC analysis, is the length, in expected numbers of substitutions, of branch $i$ of the unrooted topology of $\Psi_{\rm s}$, and $N(x; \mu, v)$ is the probability density of the normal distribution with mean $\mu$ and variance $v$ evaluated at $x$. Since the two branches leading to the root of $\Psi_{\rm t}$ correspond to a single branch of the unrooted topology of $\Psi_{\rm s}$, only their sum contributes to $P(A \mid \Psi_{\rm s}, \theta)$. The two-step approach has the same motivation as the penalized likelihood approach of Sanderson (2002) and is similar to the approximation of the phylogenetic likelihood performed by MCMCTree (dos Reis and Yang 2011). MCMCTree, however, uses a variable transformation together with a second-order Taylor expansion of the likelihood surface, and thereby also accounts for the covariance between branch lengths, which the product in Equation (3) ignores. The two-step approach reported here is fast for large datasets as well as complex models. In fact, state-of-the-art substitution models such as the CAT model, which is currently available only in PhyloBayes (Lartillot et al. 2013), can be used during the first step of the analysis.
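The two-step approximation of Equation (3) can be sketched as follows: step 1 reduces MCMC samples of unrooted branch lengths to per-branch means and variances, and step 2 scores a proposed set of rooted branch lengths (already in expected substitutions, i.e., rate times duration) by a sum of normal log-densities, adding the two root branches together because they correspond to one unrooted branch. Branch identifiers, the dict-based inputs, and the key `"root_branch"` for the unrooted branch spanning the root are assumptions of this sketch, not the format used by the authors' scripts.

```python
import math
from statistics import fmean, variance


def summarize_branch_samples(samples):
    """Step 1: per-branch posterior mean and variance of unrooted branch lengths.
    `samples` is a list of dicts {branch_id: length in substitutions per site}."""
    return {b: (fmean(s[b] for s in samples), variance([s[b] for s in samples]))
            for b in samples[0]}


def normal_logpdf(x, mu, var):
    """Log-density of N(mu, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def approx_log_likelihood(rooted_lengths, summary, root_children):
    """Step 2: log of prod_i N(lambda_i; mu_i, v_i), as in Equation (3).
    The unrooted branch spanning the root is stored under "root_branch"; its
    lambda is the sum of the two rooted branches below the root."""
    left, right = root_children
    total = 0.0
    for b, (mu, var) in summary.items():
        if b == "root_branch":
            lam = rooted_lengths[left] + rooted_lengths[right]
        else:
            lam = rooted_lengths[b]
        total += normal_logpdf(lam, mu, var)
    return total


# Step 1 output from (toy) MCMC samples, then scoring one proposal in step 2.
samples = [{"root_branch": 0.20, "a": 0.10}, {"root_branch": 0.40, "a": 0.30}]
summary = summarize_branch_samples(samples)
print(approx_log_likelihood({"L": 0.1, "R": 0.2, "a": 0.2}, summary, ("L", "R")))
```

Each evaluation in step 2 costs one normal density per branch instead of a pruning pass over the alignment, which is where the speed-up comes from; like Equation (3), the sketch treats branches as independent.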
Implementation

We implemented this model and the two-step approach in RevBayes so that it can be combined with the other relaxed molecular clock models, models of sequence evolution, and models of species diversification available in the software. Using the model in a RevScript requires calling two additional functions: one to read the constraints from a file and another to specify the timetree prior accounting for the constraints. Scripts are available at https://github.com/Boussau/DatingWithConsAndCal. We also provide a tutorial to guide RevBayes users: https://revbayes.github.io/tutorials/relative_time_constraints/

Evaluation of the Accuracy of the Two-Step Approach

We compared our two-step, composite-likelihood approach to the one-step, full Bayesian MCMC approach in combination with two different models of rate evolution, White Noise (WN) and UGAM (see Lepage et al. [2007] for a presentation of both). Analyses were performed in RevBayes (Höhna et al. 2016). We used an empirical sequence alignment and phylogeny of 36 mammalian species from dos Reis et al. (2012), with all their calibrations and no relative constraints.

Simulations to Evaluate the Usefulness of Relative Node Age Constraints

General framework

We generated an artificial timetree and extracted calibration points from its node ages. We also gathered node-order constraints by recording true node orders. Then we altered the branch lengths of the timetree to obtain branch lengths in expected numbers of substitutions (see Fig. S1 of the Supplementary material available on Dryad at https://doi.org/10.5061/dryad.s4mw6m958 for a description of our simulation protocol). Based on this substitution tree, we simulated a DNA sequence alignment, from which we inferred timetrees in RevBayes using the two-step approach described above. We then compared the reconstructed node ages to the true node ages of the artificial timetree to investigate the information provided by constraints.
Simulating an artificial timetree

To obtain a tree with realistic divergence times, we simulated a tree that has the same divergence times as the timetree from Betts et al. (2018). To do so, we gathered the divergence times from that timetree and produced an artificial tree by, first, randomly joining tips to produce speciation events and, second, assigning the divergence times from the empirical timetree to these speciation events. We call the resulting tree a “shuffled tree” (Fig. 2). Like the timetree from Betts et al. (2018), this shuffled tree has a total root-to-tip depth of 45.12 units of time.

Figure 2. Shuffled tree with 102 taxa, calibrated nodes, and node-order constraints. Calibrated nodes are shown with red dots when they are part of the set of 10 balanced calibrations, and with blue dots when they are part of the set of 10 unbalanced calibrations. Handpicked constraints have been numbered from 1 to 15, according to one of the orders in which they were introduced (e.g., constraint 1 was used when only one constraint was included, constraints 1 to 5 when 5 constraints were included, and so on). Constraints have been colored according to their characteristics: green constraints are the 5 constraints between nodes with the most similar ages (proximal), orange constraints are the 5 constraints between nodes with the least similar ages (distal), and purple constraints are in between.

Building calibration times and node-order constraints

We chose to use 10 internal node calibrations plus one calibration at the root node, as in Betts et al. (2018). We used two configurations: a balanced configuration, in which calibrations are placed on both sides of the root, and an unbalanced configuration, in which calibrations are found on only one side of the root (Fig. 2, red and blue dots, respectively). In both cases, calibrations were hand-picked. We hand-picked 15 constraints by gathering true node orders from the shuffled tree.
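One way to build such a shuffled tree is sketched below, under the assumption (ours, not necessarily the authors') that tips are joined in order of increasing divergence time: at each empirical node age, two randomly chosen existing lineages are merged, so every divergence time is reused exactly once and the oldest one becomes the root age. Tips are assumed to be contemporaneous (age 0), and the output is a Newick string with branch lengths in the same time units as the input ages; the function name is hypothetical.

```python
import random


def shuffled_tree(tip_names, internal_ages, rng):
    """Random ultrametric tree reusing a given set of divergence times.
    Moving from the youngest to the oldest age, two randomly chosen lineages are
    joined at each age, so the oldest age becomes the root age.
    Returns (newick_string, root_age)."""
    if len(internal_ages) != len(tip_names) - 1:
        raise ValueError("a binary tree with n tips has n - 1 internal nodes")
    # Each lineage is (newick of its subtree, age of its top node); tips have age 0.
    lineages = [(name, 0.0) for name in tip_names]
    for age in sorted(internal_ages):
        i, j = rng.sample(range(len(lineages)), 2)
        (a, age_a), (b, age_b) = lineages[i], lineages[j]
        merged = (f"({a}:{age - age_a},{b}:{age - age_b})", age)
        lineages = [lin for k, lin in enumerate(lineages) if k not in (i, j)]
        lineages.append(merged)
    newick, root_age = lineages[0]
    return newick + ";", root_age


newick, root_age = shuffled_tree(["A", "B", "C", "D"], [45.12, 10.0, 3.0], random.Random(1))
print(root_age)  # 45.12
print(newick)
```

Processing ages in increasing order guarantees that every branch length (parent age minus child age) is nonnegative, whatever pair of lineages is drawn.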
In choosing our sets of constraints, we avoided redundant constraints, that is, constraints already implied by previously included constraints (Fig. 2), and aimed to cover the phylogeny homogeneously. We performed one inference with 0 constraints and one inference with all 15 constraints. In addition, we ran 10 independent experiments. In each experiment, we performed inference 14 times, varying the number of constraints from 1 to 14. The order in which constraints were introduced varied between experiments. We built calibration times from the artificial tree by taking the true speciation time and associating it with a prior distribution to convey uncertainty. The prior distribution we chose is uniform on [true age − true age/5, true age + true age/5] and decays beyond these bounds according to the tails of a normal distribution with standard deviation 2.5 (with 2.5% of the prior weight in each tail). Ten calibration points were chosen in both the balanced and the unbalanced cases (Fig. 2). In addition, the root age was calibrated with a uniform distribution on [root age − root age/5, root age + root age/5].

Simulations of deviations from the clock

The shuffled tree was rescaled to yield branch lengths that can be interpreted as expected numbers of substitutions (its root-to-tip length was 0.451). Then, it was traversed from root to tips, and rate changes were randomly applied to the branches according to two Poisson processes, one for small and frequent rate changes and one for large and rare rate changes. The magnitudes of small and large rate changes were drawn from lognormal distributions with parameters (mean = 0.0, variance = 0.1) and (mean = 0.0, variance = 0.2), respectively, and their rates of occurrence were 33 and 1, respectively. After this process, branches shorter than 0.01 were set to 0.01. The trees at the various steps of this simulation pipeline are also represented in Fig.
S1 of the Supplementary material available on Dryad. We compared the extent of the deviations from ultrametricity introduced in our simulated tree to those of empirical trees from the Hogenom database (Penel et al. 2009). Fig. S2 of the Supplementary material available on Dryad shows that our simulated tree harbors a realistic amount of nonultrametricity.

Alignment simulation

The tree rescaled with deviations from the clock was used to simulate one alignment of 1000 sites under an HKY model (Hasegawa et al. 1985), with ACGT frequencies {0.18, 0.27, 0.33, 0.22} and a transition/transversion ratio of 3, both chosen arbitrarily.

Inference based on simulated data

Inference of timetrees based on the simulated alignment was performed in two steps as explained above. Both steps were performed in RevBayes (Höhna et al. 2016). We inferred branch length distributions under a Jukes–Cantor model (Jukes and Cantor 1969) to make our test more realistic, in that the reconstruction model is simpler than the process that generated the data. The tree topology was fixed to the true unrooted topology. The resulting posterior distributions of branch lengths were then summarized by their mean and variance per branch. These means and variances were given as input to a script that computes a posterior distribution of timetrees under a birth–death prior on the tree topology and node ages and a UGAM prior on the rate of sequence evolution through time (Lepage et al. 2007), using the calibrations and constraints gathered in previous steps (see above) and the Metropolis-coupled MCMC algorithm (Altekar et al. 2004). Python code using the ete3 library (Huerta-Cepas et al.
2016) and RevBayes code to simulate sequences and run the analyses are available at https://github.com/Boussau/DatingWithConsAndCal/blob/master/Scripts, along with a README file.

Empirical data analyses

We used alignments, tree topologies, and sets of constraints from Archaea and Cyanobacteria analyzed in Davín et al. (2018). In both cases, the constraints had been derived from transfers identified in the reconciliations of thousands of gene families with the species tree, and filtered to keep the largest consistent set of supported constraints. We used 431 constraints for Archaea and 144 for Cyanobacteria. In Cyanobacteria, the fossil calibration was a minimum age of 1.956 Gya for fossil akinetes. Reflecting our uncertainty regarding the age of the root, we tried two alternatives for the maximum root age (i.e., the age of crown Cyanobacteria), 2.45 Gya and 2.7 Gya, corresponding to the “Great Oxygenation Event” and the “whiff of oxygen” (Holland 2006), respectively. As the age of the root of Archaea is uncertain, we explored the impact of three different choices on our inferences: a relatively young estimate of 3.5 Gya from the analysis of Wolfe and Fournier (2018); the end of the late heavy bombardment at 3.85 Gya (Boussau and Gouy 2012); and the age of the solar system at 4.52 Gya (Barboni et al. 2017). Alignments, trees, and sets of constraints are available at https://doi.org/10.5061/dryad.s4mw6m958. We used the CAT-GTR model in PhyloBayes (Lartillot et al. 2013) to generate posterior distributions of branch lengths on a fixed topology, and our two-step approach in RevBayes (Höhna et al. 2016) to compute posterior distributions of timetrees under the UGAM model of rate evolution (Drummond et al. 2006).
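The procedure described under “Simulations of deviations from the clock” can be sketched as follows. Several details are our assumptions rather than statements from the text: the occurrence rates (33 and 1) are taken per unit of branch length on the rescaled tree, the lognormal “mean” and “variance” are taken on the log scale, each rate change multiplies the current rate, and a branch inherits the rate reached at the end of its parent branch. The two Poisson processes are simulated jointly by drawing exponential waiting times with the summed rate and then choosing which process fired in proportion to its rate. Function names and the tree encoding are hypothetical.

```python
import math
import random


def perturb_branch(length, start_rate, events, rng):
    """Walk along one branch while rate changes occur as superposed Poisson
    processes. `events` lists (occurrence rate per unit length, log-scale
    variance) pairs. Returns the new branch length and the rate at its end."""
    total = sum(r for r, _ in events)
    rate, pos, new_length = start_rate, 0.0, 0.0
    while True:
        step = rng.expovariate(total) if total > 0 else math.inf
        if pos + step >= length:  # no further change before the branch ends
            return new_length + rate * (length - pos), rate
        new_length += rate * step
        pos += step
        # choose which process fired, in proportion to its occurrence rate
        _, var = rng.choices(events, weights=[r for r, _ in events])[0]
        rate *= rng.lognormvariate(0.0, math.sqrt(var))


def deviate_from_clock(children, lengths, root, events, min_length=0.01, seed=0):
    """Traverse from the root to the tips, passing each branch's final rate on to
    its child branches, then floor every branch length at min_length."""
    rng = random.Random(seed)
    out, stack = {}, [(root, 1.0)]
    while stack:
        node, rate = stack.pop()
        for child in children.get(node, []):
            new_len, end_rate = perturb_branch(lengths[child], rate, events, rng)
            out[child] = max(new_len, min_length)
            stack.append((child, end_rate))
    return out


# Small frequent changes (rate 33, variance 0.1) and large rare ones (rate 1, variance 0.2).
children = {"root": ["A", "X"], "X": ["B", "C"]}
lengths = {"A": 0.451, "X": 0.2, "B": 0.251, "C": 0.251}
print(deviate_from_clock(children, lengths, "root", [(33.0, 0.1), (1.0, 0.2)], seed=1))
```

With all occurrence rates set to 0, the function returns the input branch lengths (floored at 0.01), which is a convenient check that the traversal itself preserves a clock-like tree.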
Results

Two-Step Inference Provides an Efficient and Flexible Method to Estimate Time Trees

We compared posterior distributions of node ages obtained using the classical full Bayesian MCMC approach to those obtained using our two-step approximation on a dataset of 36 mammalian species (dos Reis et al. 2012). As shown in Figs. S3–S6 of the Supplementary material available on Dryad, the two posterior distributions of node ages are practically indistinguishable. Further, the impact of the approximation is negligible in comparison to the choice of the model of rate evolution: switching between the UGAM and WN models, both uncorrelated, changes the estimated node ages more than switching between our two-step inference and the full Bayesian MCMC.

Simulations

Constraints improve dating accuracy

We used two statistics to evaluate the accuracy of node age estimates. Firstly, we computed the root mean square deviation (RMSD) between the true node ages used in the simulation and the node ages of the maximum a posteriori tree, normalized by the true node ages (Fig. 3a). This normalized RMSD expresses the error as a percentage of the true node ages. Secondly, we computed the coverage probability, that is, how frequently the 95% highest posterior density (HPD) intervals on node ages contained the true node ages (Fig. 3b).

Figure 3. Increasing the number of constraints improves node age estimation. a) Average normalized RMSD over all internal node ages, shown in orange for 10 balanced calibrations and in blue for 10 unbalanced calibrations. This is a measure of the error as a percentage of the true node ages. b) Percentage of nodes whose true age lies in the 95% HPD interval (colors as in a). Regression lines with confidence intervals in gray have been superimposed.
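The two accuracy statistics can be sketched as below. The exact normalization is an assumption on our part: here each node's error is divided by its true age before taking the root mean square, and the result is reported as a percentage. HPD intervals are computed as the shortest interval containing 95% of the posterior samples, a common estimator; node names, function names, and the dict-based inputs are hypothetical.

```python
import math


def normalized_rmsd(true_ages, est_ages):
    """100 * sqrt(mean(((estimated - true) / true)^2)) over internal nodes,
    i.e. the root mean square of relative errors, as a percentage."""
    rel_sq = [((est_ages[n] - t) / t) ** 2 for n, t in true_ages.items()]
    return 100.0 * math.sqrt(sum(rel_sq) / len(rel_sq))


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing a fraction `mass` of the posterior samples."""
    xs = sorted(samples)
    k = max(1, math.ceil(mass * len(xs)))  # number of samples inside the interval
    widths = [xs[i + k - 1] - xs[i] for i in range(len(xs) - k + 1)]
    i = widths.index(min(widths))
    return xs[i], xs[i + k - 1]


def coverage(true_ages, age_samples, mass=0.95):
    """Fraction of nodes whose true age lies inside their HPD interval."""
    hits = 0
    for node, true_age in true_ages.items():
        lo, hi = hpd_interval(age_samples[node], mass)
        hits += lo <= true_age <= hi
    return hits / len(true_ages)


true = {"n1": 10.0, "n2": 20.0}
print(normalized_rmsd(true, {"n1": 11.0, "n2": 18.0}))
```

For a well-calibrated method and a model that matches the simulation, `coverage` with `mass=0.95` should be close to 0.95, which is the benchmark against which the ∼55% coverage reported below is judged.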
As the number of constraints increases, the error in node ages decreases (Fig. 3a) and the 95% HPD intervals include the true node ages more often (Fig. 3b). When 0 or only 1 constraint is used, the true node age is contained in only ∼55% of the 95% HPD intervals, suggesting that the mismatch between the model used for simulation and the model used for inference has a noticeable impact. Poor mixing could also explain these results, but it is unlikely in our experiment for two reasons. First, the effective sample sizes for the node ages are typically above 300. Second, if the same moves are used in the MCMC but the simulation model is changed to match the inference model, about 95% of the true node ages fall in the 95% HPD intervals, as expected for well-calibrated Bayesian methods and well-mixing MCMC chains (see Fig. S8 of the Supplementary material available on Dryad and the associated section). Results improve with more constraints: the variation in normalized RMSD can be explained by a linear model (M1) including an intercept and the number of constraints, with an adjusted R-square of ∼0.63. However, the points in Figure 3a appear to fall into at least two clusters: those with normalized RMSD above ∼48% and those below. This suggests that some constraints have a bigger effect than others. In particular, constraint 5 (see Fig. 2) is absent from all runs with normalized RMSD above 48%, suggesting that it is highly informative (more on the informativeness of constraints below). The results obtained with the balanced set of calibrations are similar to those obtained with the unbalanced set: adding to model M1 a variable indicating whether the balanced or the unbalanced set was used does not improve the adjusted R-square.
Constraints reduce credibility intervals

The additional information provided by constraints results in narrower credibility intervals, as shown in Figure 4. The improvement in coverage probability observed in Figure 3b therefore occurs despite narrower credibility intervals.

Figure 4. The 95% HPD intervals on node ages become narrower as the number of constraints increases. Widths are given in units of time; for reference, the total depth of the true tree is 45.12 units of time. Colors as in Figure 3. A regression line with confidence intervals in gray has been superimposed.

Investigating the informativeness of constraints

To measure the informativeness of constraints, we developed a linear model predicting the normalized RMSD from whether or not each of the 15 constraints was used, using the results obtained with either the balanced or the unbalanced calibrations. This linear model improves upon M1, with an adjusted R-square of 0.91. Its coefficients provide a measure of the informativeness of each constraint (Fig. 5).

Figure 5. Contribution of individual constraints to dating error. Individual constraints reduce the normalized RMSD by up to 9.1 percentage points. Error bars correspond to twice the standard error. Stars indicate coefficients of the linear model that are significantly different from 0 at the 1% level. Computations were run with either the 10 balanced or the 10 unbalanced calibrations.

Some constraints are much more informative than others. Constraint 5 is the most informative one, as it reduces the normalized RMSD by 9.1 percentage points, followed by constraint 6, which reduces it by 5.5 points, and constraint 13, which reduces it by 4.4 points. These three constraints, along with constraints 2, 7, and 12, significantly reduce the normalized RMSD according to our linear model at the 1% level.
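The per-constraint linear model can be sketched with ordinary least squares: one 0/1 column per constraint, recording whether that constraint was used in a run, plus an intercept, with the adjusted R-square computed from the residuals. This is an illustrative NumPy sketch on synthetic data, not the authors' analysis code, and it omits the standard errors and significance tests reported in Figure 5. A negative coefficient corresponds to a reduction in normalized RMSD when the constraint is included.

```python
import numpy as np


def fit_indicator_model(used, rmsd):
    """OLS of normalized RMSD on an intercept plus one 0/1 column per constraint
    (1 if the constraint was included in that run). Returns (coefficients with
    the intercept first, adjusted R-square)."""
    y = np.asarray(rmsd, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(used, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape  # p includes the intercept
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p)
    return beta, adj_r2


# Synthetic runs with two constraints; constraint 1 lowers the RMSD more than constraint 2.
used = [[0, 0], [1, 0], [0, 1], [1, 1], [1, 0], [0, 1], [0, 0], [1, 1]]
rng = np.random.default_rng(0)
rmsd = [60 - 9 * c1 - 2 * c2 + rng.normal(0, 0.5) for c1, c2 in used]
beta, adj_r2 = fit_indicator_model(used, rmsd)
print(np.round(beta, 2), round(adj_r2, 3))
```

The recovered coefficients should be close to the synthetic effects (about −9 and −2 points), mirroring how the coefficients in Figure 5 are read as the contribution of each constraint.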
Constraints 1, 3, 4, 8, 9, 10, 11, 14, and 15 bring little information: they do not significantly affect the normalized RMSD at the 1% level. If the significance threshold is relaxed to 5%, constraints 3 and 14 appear to increase the normalized RMSD.

To understand what explains the difference in informativeness among our constraints, we computed statistics associated with each of them. We provide a more detailed discussion of what could make a constraint informative in the Supplementary Material; here we investigated eight different statistics computed on the true timetree. First, we computed three statistics between the two constrained nodes: the difference in true node ages, the nodal distance, and the sum of branch lengths. We also noted whether the constraint spanned the root node, and computed the number of leaves in the older and younger subtrees involved in the constraint and the number of nodes ancestral to the nodes involved in the constraint. We regressed the contribution of each constraint to the normalized RMSD (Fig. 5) against these eight statistics and obtained an adjusted R² of ~0.67. The number of leaves in the younger subtree was the only significant explanatory variable at the 5% threshold, and the sum of branch lengths between the two constrained nodes came second (p ≈ 0.067). A constraint whose younger node is the ancestor of a large subtree brings a lot of information because it provides an upper time bound on all the nodes in that subtree. This is particularly useful in our context, where all calibrations are lower time calibrations.

Analyses of Empirical Data

Davín et al. (2018) showed that gene transfers contain dating information that is consistent with relaxed molecular clock models. We used a phylogeny of cyanobacterial genomes presented in Davín et al. (2018) and a phylogeny of archaeal genomes from Williams et al.
(2017) to investigate the individual and cumulative impacts of fossil calibrations and relative constraints on the inference of time trees.

Relative constraints agree with fossil calibration on the age of akinete-forming multicellular Cyanobacteria

Davín et al. (2018) analyzed a set of 40 cyanobacteria spanning most of their species diversity. Cyanobacteria likely originated more than 2 billion years ago, but a review of the literature suggests that there is only a single reliable fossil calibration that we can place on the species tree: a minimum bound for akinete-forming multicellular Cyanobacteria from Tomitani et al. (2006). These authors reported a series of fossils that they assign to filamentous Cyanobacteria producing both specialized cells for nitrogen fixation (heterocysts) and resting cells able to endure environmental stress (akinetes). We investigated whether node-order constraints could recover the effect of the available fossil calibration by comparing several dating protocols, with or without fossil calibrations and node-order constraints (Fig. 6).

Figure 6. Relative age constraints agree with the akinete fossil calibration that akinete-forming multicellular Cyanobacteria are likely older than suggested by sequence data alone. We compared four dating protocols for the 40 cyanobacteria from Davín et al. (2018): a) fossil calibration (dashed red line) with no node-order constraints, b) no fossil calibration and no relative age constraints, c) 144 node-order constraints with no fossil calibration, and d) simultaneous fossil calibration and constraints. All four chronograms were inferred with a root maximum age of 2.45 Gya, a UGAM rate prior, and a birth–death prior on divergence times. The clade highlighted in green corresponds to akinete-forming multicellular cyanobacteria.
Comparison between Figure 6a and b shows that including the minimum calibration increases the age of the clade containing akinete-forming multicellular Cyanobacteria (green clade) by about 1 Gy. Notably, the inclusion of constraints partially compensates for the absence of a minimum calibration (Fig. 6c): it places the clade of akinete-forming multicellular Cyanobacteria significantly older, close to its age when a fossil-based minimum age calibration is used (Fig. 6a). This implies that the information provided by constraints is concordant with the fossil age for multicellular Cyanobacteria. Combining calibrations and constraints (Fig. 6d) produces a chronogram with similar ages but significantly smaller credibility intervals. To further characterize the effect of constraints on the age of akinete-forming multicellular Cyanobacteria, we plotted the distributions of its age based on different sources of dating information and for different choices of root maximum age. In Figure 7a, we show the age of akinete-forming multicellular Cyanobacteria (green clade in Fig. 6) estimated based on (i) only the rate and divergence time priors, (ii) priors and sequence divergence only, (iii) priors and relative age constraints only, and (iv) priors and both sequence divergence and relative age constraints. Comparison of the age distributions shows that relative age constraints convey information that complements sequence divergence and is coherent with the fossil record on the age of akinete-forming Cyanobacteria.

Figure 7. Distributions of key node ages according to different sources of dating information. We show the age of a) akinete-forming Cyanobacteria, b) Thaumarchaeota, and c) the most recent common ancestor of methanogenic Archaea.
Distributions in white are based solely on the maximum root age and the rate and divergence time priors, distributions in red are informed by sequence divergence, distributions in blue include relative age constraints but not sequence divergence, and distributions in green rely on both. Dashed lines indicate, respectively, a) the age of fossils of putative akinete-forming multicellular cyanobacteria, b) the age of Viridiplantae, and c) the age of evidence for biogenic methane. For the corresponding time trees with constraints, see Fig. S9 of the Supplementary material available on Dryad.

Relative constraints refine the time tree of Archaea

We next investigated divergence times of the Archaea, one of the primary domains of life (Woese et al. 1990). We used the data from Williams et al. (2017), comprising 62 species. Most analyses place the root of the entire tree of life between Archaea and Bacteria (Gogarten et al. 1989; Iwabe et al. 1989; Woese et al. 1990; Gouy et al. 2015), suggesting that the Archaea are likely an ancient group. However, there are no unambiguous fossil Archaea, so the history of the group in geological time is poorly constrained. Methanogenesis is a hallmark metabolism of some members of the Euryarchaeota, so the discovery of biogenic methane in 3.46 Gya rocks (Ueno et al. 2006) might indicate that Euryarchaeota already existed at that time. However, the genes required for methanogenesis have also been identified in genomes of other archaeal groups, including Korarchaeota (McKay et al. 2019) and Verstraetearchaeota (Vanwonterghem et al. 2016), and it is difficult to exclude the possibility that methanogenesis maps to the root of the Archaea (Berghuis et al. 2019). Thus, ancient methane might have been produced by Euryarchaeota, another extant archaeal group, a stem archaeon, or even Cyanobacteria (Bižić et al. 2020). In the absence of strong geochemical constraints, can relative constraints help to refine the time tree of Archaea?
We investigated two nodes on the archaeal tree from Williams et al. (2017): the common ancestor of ammonia-oxidizing (AOA) Thaumarchaeota and the common ancestor of methanogenic Euryarchaeota (i.e., the common ancestor of all Euryarchaeota except for the Thermococcus/Pyrococcus clade). While we lack absolute constraints for these lineages, dating hypotheses have been proposed on the basis of individually identified and curated gene transfers to, or from, other lineages for which fossil information does exist. These include the transfer of a DnaJ-Fer fusion gene from Viridiplantae (land plants and green algae) into the common ancestor of AOA Thaumarchaeota (Petitjean et al. 2012), and a transfer of three SMC complex genes from within one clade of Euryarchaeota (Methanotecta, including the class 2 methanogens) to the root of Cyanobacteria (Wolfe and Fournier 2018). Note that, in the following analyses, we did not use the two transfers listed above. Instead, we used 431 relative constraints derived from inferred within-Archaea gene transfers; therefore, these constraints are independent of the transfers used to propose the hypotheses we test. We found that, despite uncertainty in the age of the root, the estimated age of AOA Thaumarchaeota informed by relative age constraints is consistent with the hypothesis that AOA are younger than stem Viridiplantae (Petitjean et al. 2012), with a recent estimate for the age of Viridiplantae between 972.4 and 669.9 Mya (Morris et al. 2018; Fig. 7b). As in the case of Cyanobacteria, information from relative constraints had a substantial impact on the analysis; sequence data alone (in combination with the root age prior) suggest a somewhat older age of AOA Thaumarchaeota, consistent with recent molecular clock analyses (Ren et al. 2019). In the case of methanogenic Euryarchaeota, inference both with and without relative constraints was strongly influenced by the choice of root prior (Fig. 
7c), so the results do not clearly distinguish between hypotheses about the age of archaeal methanogenesis or the potential source of ancient biogenic methane. With those caveats in mind, the information from relative constraints supported moderately older age distributions than inference from sequence data alone, across all root priors. The results are consistent with an early origin of methanogenic Euryarchaeota within the archaeal domain (Wolfe and Fournier 2018) and, for the moderate (3.85 Gya) and older (4.52 Gya) root priors, indicate that these archaea are a potential source of the biogenic methane at 3.46 Gya (Ueno et al. 2006).

Discussion

Constraints Are a New and Reliable Source of Information for Dating Phylogenies

Davín et al. (2018) showed that gene transfers contain reliable information about node ages. They also used this information in an ad hoc two-step process to provide approximate age estimates for a few nodes in three clades. Here we built upon these results to develop a full Bayesian method that accounts for both node-order constraints and absolute time calibrations within the MCMC algorithm by extending the standard relaxed clock approach. We also introduced a fast and accurate two-step method for incorporating branch length distributions inferred under complex substitution models into relaxed molecular clock analyses. To test our method, we performed sequence simulations and analyzed two empirical datasets. We simulated sequences according to a model that differs from the inference model so as to emulate the typical situation with empirical data, where the process that generated the data differs from our inference models. As expected under these conditions, node age coverage probabilities, that is, the percentage of true node ages that fall within the inferred 95% credibility intervals, are much lower than 95%.
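The Results finding that constraints whose younger node heads a large subtree are the most informative can be illustrated combinatorially. The sketch below is a proxy of our own, not the measure used in the paper (which regresses normalized RMSD on constraint indicators): it counts how many new "older than" relations among internal nodes a single constraint adds to those already implied by the tree topology. The child-to-parent dictionary encoding, the toy tree, and the function names are hypothetical.

```python
def implied_internal_order(parent, constraints=()):
    """Pairs (a, b) of internal nodes with age[a] > age[b] implied by the tree
    (every parent is older than its children) plus (older, younger) constraints."""
    internal = set(parent.values())
    succ = {}
    for child, par in parent.items():
        succ.setdefault(par, set()).add(child)
    for older, younger in constraints:
        succ.setdefault(older, set()).add(younger)
    closure = set()
    for start in succ:  # depth-first search from every node with successors
        stack, seen = list(succ[start]), set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(succ.get(node, ()))
    if any(a == b for a, b in closure):
        raise ValueError("constraints contradict the tree (cycle in age order)")
    return {(a, b) for a, b in closure if a in internal and b in internal}


def constraint_gain(parent, constraint):
    """Number of new internal-node order relations added by one constraint."""
    base = implied_internal_order(parent)
    return len(implied_internal_order(parent, [constraint]) - base)


# Toy tree ((A,B)X,(C,(D,(E,F)W)Y)Z)R in Newick; keys are children, values parents.
parent = {"A": "X", "B": "X", "X": "R", "C": "Z", "Y": "Z", "Z": "R",
          "D": "Y", "W": "Y", "E": "W", "F": "W"}
print(constraint_gain(parent, ("X", "Z")))  # prints 3: Z heads a large subtree
print(constraint_gain(parent, ("X", "W")))  # prints 1: W heads a small subtree
```

Because "X older than Z" propagates to every descendant of Z, it bounds three internal nodes at once, whereas "X older than W" bounds only one. This mirrors, in a purely combinatorial way, why the number of leaves in the younger subtree predicted informativeness.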
To simulate sequences, we used a realistic phylogeny whose node ages were drawn from a previously published dated tree including representatives of Archaea, Bacteria, and Eukaryotes (Betts et al. 2018), but with a rearranged topology. We then investigated the effect of sampling node-age calibrations and relative node-order constraints on dating accuracy. We used a single tree topology and a single simulated alignment throughout, which may limit the generality of our results. However, this tree topology is large (102 tips) and realistic, and the results on empirical data suggest that our method is useful across the tree of life. Further, using a single alignment allowed us to estimate branch length distributions only once and then use our fast two-step inference to reduce our computational footprint. The simulations show that node-order constraints improve the accuracy of node ages and coverage probabilities. We further found that some constraints were more informative than others. In particular, constraints whose younger node was ancestral to many nodes tended to be more informative than other constraints. This is because such a constraint provides an upper time limit to all the nodes in the younger subtree, which complements the calibrations in our test, all of which provide lower time limits. Lower time calibrations are more frequent than upper time calibrations, which suggests that, in empirical data analyses, the most informative constraints are likely to involve younger nodes ancestral to a large subtree. Results obtained on empirical datasets show that node-order constraints extracted from dozens of gene transfers contain information that can compensate for the lack of fossil calibrations. This shows promise for dating phylogenies for which fossils are scant, that is, the great majority of the tree of life. One limitation of the method presented here is that relative constraints are treated as though they are known with certainty.
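Treating constraints as certain can be made concrete with a minimal sketch, which is our illustration rather than the RevBayes implementation. A branch is identified by its child node and spans the interval between the child's age and its parent's age. Assuming that a transfer requires the donor and recipient branches to overlap in time, each transfer yields two node-order constraints: the donor's parent must be older than the recipient's child node, and vice versa. Hard constraints then add 0 to the log prior of a timetree that satisfies all of them and −∞ otherwise. The toy tree, ages, and function names are hypothetical.

```python
import math


def transfer_constraints(donor, recipient, parent):
    """Node-order constraints (older, younger) implied by one transfer.

    A branch is named by its child node and spans (age[child], age[parent[child]]).
    Two branches overlap in time iff each parent is older than the other child.
    """
    return [(parent[donor], recipient), (parent[recipient], donor)]


def log_constraint_factor(age, constraints):
    """0.0 if every constraint holds, -inf otherwise (hard constraints)."""
    for older, younger in constraints:
        if not age[older] > age[younger]:
            return -math.inf
    return 0.0


# Toy timetree ((A,B)X,(C,(D,E)Y)Z)R with ages in arbitrary time units.
parent = {"A": "X", "B": "X", "X": "R", "C": "Z", "D": "Y", "E": "Y",
          "Y": "Z", "Z": "R"}
age = {"A": 0.0, "B": 0.0, "C": 0.0, "D": 0.0, "E": 0.0,
       "X": 2.0, "Y": 1.0, "Z": 3.0, "R": 5.0}

ok = transfer_constraints("X", "Y", parent)   # branches (2, 5) and (1, 3) overlap
bad = transfer_constraints("X", "D", parent)  # branches (2, 5) and (0, 1) do not
print(ok, log_constraint_factor(age, ok))    # [('R', 'Y'), ('Z', 'X')] 0.0
print(bad, log_constraint_factor(age, bad))  # [('R', 'D'), ('Y', 'X')] -inf
```

If `age` were the true timetree and the transfer from X to D a misinference, its constraints would give the true tree zero prior probability, however strongly the sequence data supported it.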
Only trees that satisfy all of the input constraints have nonzero probability, so incorrect input constraints will result in incorrect age estimates. We therefore suggest that only the most reliable constraints be used when dating a species tree using transfers. One practical approach, which we used in our empirical analyses of genomic data, is to retain only highly supported constraints (Davín et al. 2018). A clear direction for future work will be to treat relative constraints probabilistically, perhaps as a function of the number and quality of inferred gene transfers that support them, or with a probability P that constraints are matched, which would be estimated in the course of the MCMC.

Dating phylogenies is a challenging statistical problem because only fossils and rates of molecular evolution provide information. Here, we have developed a new method to exploit the information contained in gene transfers, which are particularly numerous in clades where fossil information is lacking. Gene transfers define node-order constraints. We have shown in simulations that using node-order constraints improves node age estimates and reduces credibility intervals. We have also used our method on two empirical datasets to show that node-order constraints can compensate for the absence of a fossil calibration: ages obtained without a fossil calibration but with constraints match those obtained with the fossil calibration, and incorporating both sources of time information further refines the inferred divergence times. Looking forward, we envision that our method will be useful for dating parts of the tree of life where node ages have so far remained very uncertain.

Data Availability

Scripts and data used to run the simulation analyses are available at https://github.com/Boussau/DatingWithConsAndCal. Data for the empirical data analysis have been deposited at https://doi.org/10.5061/dryad.s4mw6m958.
A tutorial covering both our two-step approach and dating with relative node age constraints is available at https://revbayes.github.io/tutorials/relative_time_constraints/

Author Contributions

G.J.S., V.D., and B.B. initiated the project. B.B., G.J.S., and S.H. implemented the model in RevBayes. G.J.S. ran the empirical analyses and analyzed them with T.A.W. B.B. ran the simulations. D.S., G.J.S., and B.B. wrote the tutorial. B.B., G.J.S., T.A.W., and V.D. wrote the manuscript. All authors read and commented on the manuscript.

Acknowledgements

The authors thank the reviewers and the editor for their comments on earlier versions of the manuscript. They also thank Eric Tannier for fruitful discussions. Version 8 of this preprint has been peer-reviewed and recommended by Peer Community In Evolutionary Biology (https://doi.org/10.24072/pci.evolbiol.100127).

Supplementary Material

Data available from the Dryad Digital Repository: https://doi.org/10.5061/dryad.s4mw6m958.

Conflict of Interest Statement

The authors of this preprint declare that they have no financial conflict of interest with the content of this article. G.J.S., T.A.W., and V.D. are PCI Evol. Biol. recommenders.

Funding

D.S. and G.J.S. received funding from the European Research Council under the European Union’s Horizon 2020 research and innovation program under Grant Agreement 714774. T.A.W. was supported by a Royal Society University Research Fellowship and NERC grant NE/P00251X/1. B.B. and T.A.W. acknowledge support from a “Projet de Recherche Collaborative” co-funded by the CNRS and the Royal Society. S.H. was supported by the Deutsche Forschungsgemeinschaft (DFG) Emmy Noether-Program HO 6201/1-1.

References

Abby S.S., Tannier E., Gouy M., Daubin V. 2012. Lateral gene transfer as a support for the tree of life. Proc. Natl Acad. Sci. USA 109(13):4962–4967.
Altekar G., Dwarkadas S., Huelsenbeck J.P., Ronquist F. 2004.
Parallel Metropolis coupled Markov chain Monte Carlo for Bayesian phylogenetic inference. Bioinformatics 20(3):407–415.
Álvarez-Carretero S., dos Reis M. 2020. Bayesian phylogenomic dating. In: Ho S.Y.W., editor. The molecular evolutionary clock. Cham: Springer International Publishing. p. 221–249. Available at: https://doi.org/10.1007/978-3-030-60181-2_13.
Barboni M., Boehnke P., Keller B., Kohl I.E., Schoene B., Young E.D., McKeegan K.D. 2017. Early formation of the moon 4.51 billion years ago. Sci. Adv. 3(1):e1602365.
Berghuis B.A., Yu F.B., Schulz F., Blainey P.C., Woyke T., Quake S.R. 2019. Hydrogenotrophic methanogenesis in Archaeal phylum Verstraetearchaeota reveals the shared ancestry of all methanogens. Proc. Natl Acad. Sci. USA 116(11):5037–5044.
Betts H.C., Puttick M.N., Clark J.W., Williams T.A., Donoghue P.C.J., Pisani D. 2018. Integrated genomic and fossil evidence illuminates life’s early evolution and eukaryote origin. Nat. Ecol. Evol. 2(10):1556–1562.
Bižić M., Klintzsch T., Ionescu D., Hindiyeh M.Y., Günthel M., Muro-Pastor A.M., Eckert W., Urich T., Keppler F., Grossart H.-P. 2020. Aquatic and terrestrial Cyanobacteria produce methane. Sci. Adv. 6(3):eaax5343.
Bouckaert R., Vaughan T.G., Barido-Sottani J., Duchêne S., Fourment M., Gavryushkina A., Heled J., Jones G., Kühnert D., De Maio N., Matschiner M., Mendes F.K., Müller N.F., Ogilvie H.A., du Plessis L., Popinga A., Rambaut A., Rasmussen D., Siveroni I., Suchard M.A., Wu C.-H., Xie D., Zhang C., Stadler T., Drummond A.J. 2019. BEAST 2.5: an advanced software platform for Bayesian evolutionary analysis. PLoS Comput. Biol. 15(4):e1006650.
Boussau B., Gouy M. 2012. What genomes have to say about the evolution of the earth. Gondwana Res. 21(2–3):483–494. Available at: https://doi.org/10.1016/j.gr.2011.08.002.
Davín A.A., Tannier E., Williams T.A., Boussau B., Daubin V., Szöllősi G.J. 2018. Gene transfers can date the tree of life. Nat. Ecol. Evol. 2(5):904–909.
Donoghue P.C.J., Yang Z. 2016. The evolution of methods for establishing evolutionary timescales. Phil. Trans. R. Soc. Lond. Ser. B Biol. Sci. 371(1699). https://doi.org/10.1098/rstb.2016.0020.
Doolittle W.F. 1999. Phylogenetic classification and the universal tree. Science 284(5423):2124–2129. https://doi.org/10.1126/science.284.5423.2124.
Drummond A.J., Ho S.Y.W., Phillips M.J., Rambaut A. 2006. Relaxed phylogenetics and dating with confidence. PLoS Biol. 4(5):e88.
Drummond A.J., Nicholls G.K., Rodrigo A.G., Solomon W. 2002. Estimating mutation parameters, population history and genealogy simultaneously from temporally spaced sequence data. Genetics 161(3):1307–1320.
Felsenstein J. 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17(6):368–376.
Gavryushkina A., Heath T.A., Ksepka D.T., Stadler T., Welch D., Drummond A.J. 2017. Bayesian total-evidence dating reveals the recent crown radiation of penguins. Syst. Biol. 66(1):57–73.
Gavryushkina A., Welch D., Stadler T., Drummond A.J. 2014. Bayesian inference of sampled ancestor trees for epidemiology and fossil calibration. PLoS Comput. Biol.
10(12):e1003919.
Gogarten J.P., Kibak H., Dittrich P., Taiz L., Bowman E.J., Bowman B.J., Manolson M.F., Poole R.J., Date T., Oshima T. 1989. Evolution of the vacuolar H+-ATPase: implications for the origin of eukaryotes. Proc. Natl Acad. Sci. USA 86(17):6661–6665.
Gouy R., Baurain D., Philippe H. 2015. Rooting the tree of life: the phylogenetic jury is still out. Phil. Trans. R. Soc. Lond. Ser. B Biol. Sci. 370(1678):20140329.
Hasegawa M., Kishino H., Yano T. 1985. Dating of the human–ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22(2):160–174.
Heath T.A., Holder M.T., Huelsenbeck J.P. 2012. A Dirichlet process prior for estimating lineage-specific substitution rates. Mol. Biol. Evol. 29(3):939–955.
Heath T.A., Huelsenbeck J.P., Stadler T. 2014. The fossilized birth–death process for coherent calibration of divergence-time estimates. Proc. Natl Acad. Sci. USA 111(29):E2957–E2966.
Höhna S., Landis M.J., Heath T.A., Boussau B., Lartillot N., Moore B.R., Huelsenbeck J.P., Ronquist F. 2016. RevBayes: Bayesian phylogenetic inference using graphical models and an interactive model-specification language. Syst. Biol. 65(4):726–736.
Holland H.D. 2006. The oxygenation of the atmosphere and oceans. Phil. Trans. R. Soc. Lond. Ser. B Biol. Sci. 361(1470):903–915.
Ho S.Y.W., Duchêne S. 2014. Molecular-clock methods for estimating evolutionary rates and timescales. Mol. Ecol. 23(24):5947–5965.
Huerta-Cepas J., Serra F., Bork P. 2016.
ETE 3: reconstruction, analysis, and visualization of phylogenomic data. Mol. Biol. Evol. 33(6):1635–1638.
Iwabe N., Kuma K., Hasegawa M., Osawa S., Miyata T. 1989. Evolutionary relationship of Archaebacteria, Eubacteria, and Eukaryotes inferred from phylogenetic trees of duplicated genes. Proc. Natl Acad. Sci. USA 86(23):9355–9359.
Jukes T.H., Cantor C.R. 1969. Evolution of protein molecules. Mamm. Prot. Metab. 3:21–132. https://doi.org/10.1016/b978-1-4832-3211-9.50009-7.
Landis M., Edwards E.J., Donoghue M.J. 2021. Modeling phylogenetic biome shifts on a planet with a past. Syst. Biol. 70(1):86–107.
Landis M.J. 2017. Biogeographic dating of speciation times using paleogeographically informed processes. Syst. Biol. 66(2):128–144.
Lartillot N., Phillips M.J., Ronquist F. 2016. A mixed relaxed clock model. Phil. Trans. R. Soc. Lond. Ser. B Biol. Sci. 371(1699). https://doi.org/10.1098/rstb.2015.0132.
Lartillot N., Rodrigue N., Stubbs D., Richer J. 2013. PhyloBayes MPI: phylogenetic reconstruction with infinite mixtures of profiles in a parallel environment. Syst. Biol. 62(4):611–615.
Lepage T., Bryant D., Philippe H., Lartillot N. 2007. A general comparison of relaxed molecular clock models. Mol. Biol. Evol. 24(12):2669–2680.
Magnabosco C., Moore K.R., Wolfe J.M., Fournier G.P. 2018. Dating phototrophic microbial lineages with reticulate gene histories. Geobiology 16(2):179–189.
Mau B., Newton M.A. 1997.
Phylogenetic inference for binary data on dendograms using Markov chain Monte Carlo. J. Comput. Graph. Statist. 6(1):122–131. https://doi.org/10.1080/10618600.1997.10474731.
McKay L.J., Dlakić M., Fields M.W., Delmont T.O., Eren A.M., Jay Z.J., Klingelsmith K.B., Rusch D.B., Inskeep W.P. 2019. Co-occurring genomic capacity for anaerobic methane and dissimilatory sulfur metabolisms discovered in the Korarchaeota. Nat. Microbiol. 4(4):614–622.
Morris J.L., Puttick M.N., Clark J.W., Edwards D., Kenrick P., Pressel S., Wellman C.H., Yang Z., Schneider H., Donoghue P.C.J. 2018. The timescale of early land plant evolution. Proc. Natl Acad. Sci. USA 115(10):E2274–E2283.
Parham J.F., Donoghue P.C.J., Bell C.J., Calway T.D., Head J.J., Holroyd P.A., Inoue J.G., Irmis R.B., Joyce W.G., Ksepka D.T., Patané J.S.L., Smith N.D., Tarver J.E., van Tuinen M., Yang Z., Angielczyk K.D., Greenwood J.M., Hipsley C.A., Jacobs L., Makovicky P.J., Müller J., Smith K.T., Theodor J.M., Warnock R.C.M., Benton M.J. 2012. Best practices for justifying fossil calibrations. Syst. Biol. 61(2):346–359.
Penel S., Arigon A.-M., Dufayard J.-F., Sertier A.-S., Daubin V., Duret L., Gouy M., Perrière G. 2009. Databases of homologous gene families for comparative genomics. BMC Bioinformatics 10(Suppl. 6):S3.
Petitjean C., Moreira D., López-García P., Brochier-Armanet C. 2012. Horizontal gene transfer of a chloroplast DnaJ-Fer protein to Thaumarchaeota and the evolutionary history of the DnaK chaperone system in Archaea. BMC Evol. Biol. 12:226.
Pybus O.G. 2006. Model selection and the molecular clock. PLoS Biol. 4(5):e151.
Pyron R.A. 2011. Divergence time estimation using fossils as terminal taxa and the origins of Lissamphibia. Syst. Biol. 60(4):466–481.
Rannala B., Yang Z. 1996. Probability distribution of molecular evolutionary trees: a new method of phylogenetic inference. J. Mol. Evol. 43(3):304–311.
dos Reis M., Donoghue P.C.J., Yang Z. 2015. Bayesian molecular clock dating of species divergences in the genomics era. Nat. Rev. Genet. 17(2):71–80.
dos Reis M., Inoue J., Hasegawa M., Asher R.J., Donoghue P.C.J., Yang Z. 2012. Phylogenomic datasets provide both precision and accuracy in estimating the timescale of placental mammal phylogeny. Proc. Biol. Sci. 279(1742):3491–3500.
dos Reis M., Yang Z. 2011. Approximate likelihood calculation on a phylogeny for Bayesian estimation of divergence times. Mol. Biol. Evol. 28(7):2161–2172.
Ren M., Feng X., Huang Y., Wang H., Hu Z., Clingenpeel S., Swan B.K., Fonseca M.M., Posada D., Stepanauskas R., Hollibaugh J.T., Foster P.G., Woyke T., Luo H. 2019. Phylogenomics suggests oxygen availability as a driving force in Thaumarchaeota evolution. ISME J. 13(9):2150–2161.
Ronquist F., Huelsenbeck J.P. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19(12):1572–1574.
Ronquist F., Klopfstein S., Vilhelmsen L., Schulmeister S., Murray D.L., Rasnitsyn A.P. 2012. A total-evidence approach to dating with fossils, applied to the early radiation of the Hymenoptera. Syst. Biol. 61(6):973–999.
Sanderson M.J. 2002. Estimating absolute rates of molecular evolution and divergence times: a penalized likelihood approach. Mol. Biol. Evol. 19(1):101–109.
Stadler T., Yang Z. 2013. Dating phylogenies with sequentially sampled tips. Syst. Biol. 62(5):674–688.
Szöllősi G.J., Boussau B., Abby S.S., Tannier E., Daubin V. 2012. Phylogenetic modeling of lateral gene transfer reconstructs the pattern and relative timing of speciations. Proc. Natl Acad. Sci. USA 109(43):17513–17518.
Thorne J.L., Kishino H., Painter I.S. 1998. Estimating the rate of evolution of the rate of molecular evolution. Mol. Biol. Evol. 15(12):1647–1657. https://doi.org/10.1093/oxfordjournals.molbev.a025892.
Tomitani A., Knoll A.H., Cavanaugh C.M., Ohno T. 2006. The evolutionary diversification of cyanobacteria: molecular–phylogenetic and paleontological perspectives. Proc. Natl Acad. Sci. USA 103(14):5442–5447.
Ueno Y., Yamada K., Yoshida N., Maruyama S., Isozaki Y. 2006. Evidence from fluid inclusions for microbial methanogenesis in the Early Archaean era. Nature 440(7083):516–519.
Vanwonterghem I., Evans P.N., Parks D.H., Jensen P.D., Woodcroft B.J., Hugenholtz P., Tyson G.W. 2016. Methylotrophic methanogenesis discovered in the Archaeal phylum Verstraetearchaeota. Nat. Microbiol. 1(12):1–9.
Williams T.A., Szöllősi G.J., Spang A., Foster P.G., Heaps S.E., Boussau B., Ettema T.J.G., Embley T.M. 2017. Integrative modeling of gene and genome evolution roots the Archaeal tree of life. Proc. Natl Acad. Sci. USA 114(23):E4602–E4611.
Woese C.R., Kandler O., Wheelis M.L. 1990. Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proc. Natl Acad. Sci. USA 87(12):4576–4579.
Wolfe J.M., Fournier G.P. 2018. Horizontal gene transfer constrains the timing of methanogen evolution. Nat. Ecol. Evol. 2(5):897–903.
Yang Z., Rannala B. 2005. Bayesian estimation of species divergence times under a molecular clock using multiple fossil calibrations with soft bounds. Mol. Biol. Evol. 23(1):212–226.
Yang Z., Rannala B. 2006. Bayesian estimation of species divergence times under a molecular clock using multiple fossil calibrations with soft bounds. Mol. Biol. Evol. 23(1):212–226. https://doi.org/10.1093/molbev/msj024.
Yang Z., Rannala B. 1997. Bayesian phylogenetic inference using DNA sequences: a Markov chain Monte Carlo method. Mol. Biol. Evol. 14(7):717–724. https://doi.org/10.1093/oxfordjournals.molbev.a025811.
Zhang R., Drummond A. 2020. Improving the performance of Bayesian phylogenetic inference under relaxed clock models. BMC Evol. Biol. 20(1):54.
Zuckerkandl E., Pauling L.B. 1962. Molecular disease, evolution, and genetic heterogeneity. In: Kasha M., Pullman B., editors. Horizons in Biochemistry. New York: Academic Press. p. 189–225.

© The Author(s) 2021. Published by Oxford University Press on behalf of the Society of Systematic Biologists.
This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com TI - Relative Time Constraints Improve Molecular Dating JF - Systematic Biology DO - 10.1093/sysbio/syab084 DA - 2022-06-16 UR - https://www.deepdyve.com/lp/oxford-university-press/relative-time-constraints-improve-molecular-dating-UvFYVRNlSn SP - 797 EP - 809 VL - 71 IS - 4 DP - DeepDyve ER -