Vol. 19 no. 12 2003, pages 1572–1574
BIOINFORMATICS APPLICATIONS NOTE
DOI: 10.1093/bioinformatics/btg180

MrBayes 3: Bayesian phylogenetic inference under mixed models

Fredrik Ronquist (1,*) and John P. Huelsenbeck (2)

(1) Department of Systematic Zoology, Evolutionary Biology Centre, Uppsala University, Norbyv. 18D, SE-752 36 Uppsala, Sweden and (2) Section of Ecology, Behavior and Evolution, Division of Biological Sciences, University of California, San Diego, La Jolla, CA 92093-0116, USA

(*) To whom correspondence should be addressed.

Received on December 20, 2002; revised on February 14, 2003; accepted on February 19, 2003

Bioinformatics 19(12) © Oxford University Press 2003; all rights reserved.

ABSTRACT

Summary: MrBayes 3 performs Bayesian phylogenetic analysis combining information from different data partitions or subsets evolving under different stochastic evolutionary models. This allows the user to analyze heterogeneous data sets consisting of different data types (e.g. morphological, nucleotide, and protein) and to explore a wide variety of structured models mixing partition-unique and shared parameters. The program employs MPI to parallelize Metropolis coupling on Macintosh or UNIX clusters.

Availability: http://morphbank.ebc.uu.se/mrbayes.

Contact: fredrik.ronquist@ebc.uu.se

Computational complexity has long been a major obstacle in the development of statistical approaches to phylogenetic inference. Even moderate-sized empirical problems have posed serious challenges to computational biologists, forcing compromises in analytical accuracy. However, the recent introduction of Bayesian inference and Markov chain Monte Carlo (MCMC) techniques to phylogenetics has changed this situation. Early Bayesian phylogenetics papers showed that Markov chains based on the Metropolis–Hastings algorithm were computationally more efficient than the standard Maximum Likelihood (ML) bootstrapping approach (Larget and Simon, 1999). It is now known that problems with more than 350 sequences (taxa) can be analyzed successfully with moderate computational effort using Bayesian inference and an MCMC convergence acceleration technique known as Metropolis coupling (Huelsenbeck et al., 2001). Such problems are set on tree spaces many orders of magnitude larger than those amenable to ML bootstrapping.

The increase in computational efficiency associated with the Bayesian MCMC approach makes it possible to analyze more complex and realistic evolutionary models than previously. Currently, an important but commonly invoked constraint on model complexity is the assumption of data homogeneity. Many phylogenetic data sets now include evidence from several different sources: morphology and molecules, amino acid and nucleotide data, or sequences from the mitochondrial, plastid and nuclear genomes. However, the available software commonly forces the investigator to either: (1) model the evolution of such data using a single stochastic model; (2) analyze the different data partitions or subsets separately and use ad hoc methods to obtain a summary result; or (3) resort to simple search algorithms or non-statistical methods. None of these alternatives is particularly attractive.

MrBayes 3 is a completely rewritten and restructured version of MrBayes, a command-driven program for Bayesian phylogenetic inference (Huelsenbeck and Ronquist, 2001). The hallmark of the new program is a powerful framework for phylogenetic inference under mixed models accommodating data heterogeneity. This framework will help the user to specify mixed models and exploit the computational efficiency of Bayesian MCMC analysis in dealing with composite data sets.

Bayesian phylogenetic inference is based on Bayes's rule. Applied to the phylogeny problem, the rule can be expressed as follows

$$f(\tau, v, \theta \mid X) = \frac{f(\tau, v, \theta)\, f(X \mid \tau, v, \theta)}{f(X)}$$

where $X$ is the data matrix, $\tau$ is the topology of the tree, $v$ is a vector of branch (or edge) lengths on the tree, and $\theta$ is a vector of substitution model parameters. The distribution $f(\tau, v, \theta)$ is referred to as the prior, and specifies the prior probability of different parameter values; $f(X \mid \tau, v, \theta)$ is the likelihood function, describing the probability of the data under different parameter values; and $f(X)$ is the total probability of the data summed and integrated over the parameter space. Bayesian inference is based on the so-called posterior distribution $f(\tau, v, \theta \mid X)$.

Typically, it is not possible to calculate the posterior probability distribution analytically; instead, MCMC techniques are used to obtain samples from it. MrBayes 3 uses a Metropolis–Hastings sampler and updates single parameters or blocks of related parameters in each move. Assume that, in the current generation, the Markov chain has parameter values $\tau$, $v$, and $\theta$ and that we are considering a change in $\theta$ to the new value $\theta^*$ picked from some proposal distribution $q(\theta^* \mid \theta)$. Then we accept the change with probability

$$r = \min\left[1,\; \frac{f(\theta^*)}{f(\theta)} \times \frac{f(X \mid \tau, v, \theta^*)}{f(X \mid \tau, v, \theta)} \times \frac{q(\theta \mid \theta^*)}{q(\theta^* \mid \theta)}\right]$$

Bayesian inference is easily extended to deal with heterogeneous models. Assume that we have two subsets of our data, $X_a$ and $X_b$, such that $X = X_a + X_b$. Assume further that it is likely that these data subsets evolved on the same phylogenetic tree but according to entirely different substitution models with parameters $\theta_a$ and $\theta_b$, respectively, such that $\theta = \theta_a + \theta_b$. Bayes's rule now becomes:

$$f(\tau, v, \theta_a, \theta_b \mid X) = \frac{f(\tau, v, \theta_a, \theta_b)\, f(X \mid \tau, v, \theta_a, \theta_b)}{f(X)} = \frac{f(\tau, v, \theta_a, \theta_b)\, f(X_a \mid \tau, v, \theta_a)\, f(X_b \mid \tau, v, \theta_b)}{f(X)}$$

During an MCMC run, we might consider a change of the substitution model parameters affecting one of the subsets, for instance the subset $X_a$. Then we have a potential change from $\theta_a$ to $\theta_a^*$ and the jumping kernel of the Metropolis–Hastings sampler simplifies to

$$r = \min\left[1,\; \frac{f(\theta_a^*)}{f(\theta_a)} \times \frac{f(X_a \mid \tau, v, \theta_a^*)}{f(X_a \mid \tau, v, \theta_a)} \times \frac{q(\theta_a \mid \theta_a^*)}{q(\theta_a^* \mid \theta_a)}\right]$$

Because we only need to calculate the likelihood ratio (the second ratio in the product) for the affected data subsets, the run will actually be faster (per generation) under a mixed than under a homogeneous model. Of course, since there are more parameters in the mixed model, we will be visiting each parameter more rarely and this is likely to lead to slower mixing and a need to run the chain longer before an adequate sample of the posterior distribution is obtained. The net effect is dependent on the particulars of the analysis but there is no obvious reason why the computational complexity would necessarily be much worse under a parameter-rich mixed model than under a homogeneous model for the same data set. MrBayes 3 now provides the tools needed to examine this question empirically.

MrBayes 3 implements a wide variety of stochastic models for nucleotide, protein, restriction site, and morphological (standard) data. Single, doublet (for stem regions) and codon (with or without variation in the non-synonymous/synonymous rate ratio across sites) models are all available for nucleotide data. Protein data can be analyzed using a range of fixed or variable rate matrices. Both the standard and restriction site models can include correction for coding biases. The standard model, appropriate for analysis of morphological data, allows up to ten different states and there are models for both unordered and linearly ordered characters. Rate variation across sites can be accommodated using a standard gamma or, for nucleotide and protein models, an autocorrelated gamma distribution. For nucleotide and protein data, it is possible to allow rate variation across the tree using a variant of the covarion/covariotide model, in which sites independently switch between an 'on' and an 'off' state (Huelsenbeck et al., 2001; Huelsenbeck and Ronquist, 2001). Available tree models include unconstrained, standard molecular clock, birth-and-death, and coalescent models.

MrBayes 3 reads data from a text file conforming to a modified NEXUS format allowing mixed data sets. By default, the data are partitioned according to data type. The user can further subdivide the data, most easily by specifying character sets. Once the data set has been partitioned appropriately, the user can set the model structure and priors of each data subset. Individual model parameters can be unlinked or linked across selected data subsets. The currently active subsets and the model parameters applying to these subsets can be listed, giving the user a means of checking that the mixed model is specified correctly.

In addition to allowing heterogeneity across data subsets in overall rate and in substitution model parameters, MrBayes 3 also allows the user to unlink topology and branch lengths. Different data subsets can thus have independent branch lengths or even different topologies.

Correct scaling of rate parameters is important in mixed models. MrBayes 3 scales rates such that branch lengths are measured in the expected number of changes per site. For instance, a change in a codon model is counted as one change per three sites. Partition-specific rate multipliers are scaled such that the mean rate per site across partitions is unity.

MrBayes 3 provides many options for summarizing and diagnosing the results of an MCMC analysis. Simple plots of overall likelihood and individual parameter values can be generated to determine burn-in and examine mixing, and the program will also estimate the model likelihood, used in Bayesian model testing. The parameter file is a tab-delimited text file, which can be imported into and analyzed with most standard statistical software packages. The program will summarize trees and branch lengths in the form of consensus trees, partition tables, and lists of trees with their estimated posterior probability. Consensus trees are written with both branch lengths and posterior clade probabilities, for easy graphical representation using software such as TreeView (Page, 1996).

By default, MrBayes 3 uses Metropolis coupling to accelerate convergence of the Markov chain (Huelsenbeck and Ronquist, 2001). Because of the relatively small amount of information communicated among the multiple chains during such a run, Metropolis coupling is well suited to parallel implementations in which chains are distributed among processors. Using the Message-Passing Interface (MPI), this type of parallelization has been implemented in MrBayes 3 for UNIX and Macintosh clusters. With large data sets, near linear speed-ups can be achieved using this approach (Altekar et al., 2003).

MrBayes 3 is written in ANSI C and is available free of charge from http://morphbank.ebc.uu.se/mrbayes/. The site provides precompiled versions for the MacOS and Windows platforms, and the source code for compilation on UNIX machines. The MPI-enabled parallel version of MrBayes 3 is available both precompiled for Macintosh OS X and through setting the relevant compiler switch before compilation for UNIX. The parallel Macintosh version requires installation of POOCH (http://www.daugerresearch.com/pooch/whatis.html) on all participating machines.

ACKNOWLEDGEMENTS

We are deeply indebted to the beta testers and users of previous versions of MrBayes for identifying bugs and proposing various improvements. F.R. was supported by the Swedish Research Council grant 621-2001-2963 and J.P.H. by NSF grants DEB-0075406 and MCB-0075404.

REFERENCES

Altekar et al. (2003) Parallel Metropolis-coupled Markov chain Monte Carlo for Bayesian phylogenetic inference. Bioinformatics, in press.

Huelsenbeck, J.P. and Ronquist, F. (2001) MrBayes: Bayesian inference of phylogeny. Bioinformatics, 17, 754–755.

Huelsenbeck, J.P., Ronquist, F., Nielsen, R. and Bollback, J.P. (2001) Bayesian inference of phylogeny and its impact on evolutionary biology. Science, 294, 2310–2314.

Larget, B. and Simon, D. (1999) Markov chain Monte Carlo algorithms for the Bayesian analysis of phylogenetic trees. Mol. Biol. Evol., 16, 750–759.

Page, R.D.M. (1996) TreeView: an application to display phylogenetic trees on personal computers. Computer Applications in the Biosciences, 12, 357–358.
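As an illustration of the partition-wise Metropolis–Hastings update described in the text, the following Python sketch updates one partition-specific parameter per generation and recomputes the likelihood only for the affected data subset. It is a toy under stated assumptions and is not MrBayes code: the tree and branch lengths are held fixed and omitted, each data subset is a list of counts with a Poisson likelihood standing in for a substitution model, the prior on each rate is Exponential(1), and the proposal is a multiplier move. The names `run_chain`, `log_likelihood`, `log_prior` and `EXAMPLE_PARTITIONS` are hypothetical.

```python
import math
import random

# Hypothetical toy data: two data subsets X_a and X_b (lists of counts).
EXAMPLE_PARTITIONS = {
    "a": [2, 3, 1, 4, 2, 3, 2, 3],
    "b": [9, 11, 10, 8, 12, 10],
}


def log_likelihood(counts, rate):
    """Poisson log-likelihood of one subset; a stand-in for f(X_a | tau, v, theta_a)."""
    return sum(k * math.log(rate) - rate - math.lgamma(k + 1) for k in counts)


def log_prior(rate):
    """Exponential(1) log-prior on a rate parameter (an assumed prior)."""
    return -rate


def run_chain(partitions, n_generations, seed=0, tuning=0.5):
    """Sample partition-specific rates with single-parameter Metropolis-Hastings moves.

    Returns the list of sampled states and the number of partition
    likelihood evaluations performed inside the loop.
    """
    rng = random.Random(seed)
    names = sorted(partitions)
    theta = {name: 1.0 for name in names}
    # Cache the log-likelihood of every subset under its current parameter.
    cached = {name: log_likelihood(partitions[name], theta[name]) for name in names}
    samples = []
    evaluations = 0
    for _ in range(n_generations):
        name = rng.choice(names)  # pick one parameter theta_a to update
        old = theta[name]
        # Multiplier proposal; its Hastings ratio q(old | new) / q(new | old) is new / old.
        new = old * math.exp(tuning * (rng.random() - 0.5))
        # Only the affected subset's likelihood is recomputed.
        new_loglik = log_likelihood(partitions[name], new)
        evaluations += 1
        log_r = (
            (log_prior(new) - log_prior(old))        # prior ratio
            + (new_loglik - cached[name])            # likelihood ratio
            + (math.log(new) - math.log(old))        # proposal ratio
        )
        if rng.random() < math.exp(min(0.0, log_r)):  # accept with probability min(1, r)
            theta[name] = new
            cached[name] = new_loglik
        samples.append(dict(theta))
    return samples, evaluations


if __name__ == "__main__":
    samples, evaluations = run_chain(EXAMPLE_PARTITIONS, 20000, seed=1)
    kept = samples[2000:]  # discard an arbitrary burn-in
    for name in sorted(EXAMPLE_PARTITIONS):
        mean = sum(s[name] for s in kept) / len(kept)
        print(name, round(mean, 2))
    print("partition likelihood evaluations:", evaluations)
```

In this toy model the posterior of each rate is a gamma distribution with mean (1 + sum of counts) / (1 + number of counts), that is 21/9 for subset a and 61/7 for subset b, which gives a simple check on the sampler. The evaluation counter equals the number of generations, whereas a sampler that recomputed the full likelihood at every move would evaluate both subsets each time.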
Bioinformatics – Oxford University Press
Published: Aug 12, 2003