Abstract

Building species phylogenies from genome data requires the evaluation of phylogenetic evidence from independent gene loci. We propose an approach to do this using consensus networks. We compare gene trees for eight yeast genomes and show that consensus networks have potential for helping to visualize contradictory evidence for species phylogenies.

Although gene trees are important for inferring species relationships, they differ in a number of ways from species trees. These differences are caused by the complex (nonbifurcating) evolutionary histories of species, as well as evolutionary processes that violate the assumptions of simple sequence substitution models. These processes may include gene conversion and concerted evolution (Aguilar, Rossello, and Feliner 1999; Buckler, Ippolito, and Holtsford 1997), as well as asymmetric sequence evolution (Mooers and Holmes 2000; Steel, Huson, and Lockhart 2000). Despite the difficulty of interpretation, gene trees are central to the problem of building a species phylogeny. An important issue is how to combine evidence from different genome loci without losing information about independent gene histories. This may not be achievable using current methods, including those that (1) reconstruct the optimal evolutionary tree for single genes and then find their consensus (Bryant 2003) or build a supertree (Bininda-Emonds, Gittleman, and Steel 2002), (2) identify optimal substitution models for different genes and then find the tree that best fits these mixed substitution models (Yang 1996; Ronquist and Huelsenbeck 2003), or (3) concatenate genes before fitting an evolutionary model (Huelsenbeck, Bull, and Cunningham 1996). The last method, in particular, is highly controversial.
Although concatenation can provide long enough sequences to overcome sampling error, the problem of model misspecification has the potential to lead tree-building methods to converge (e.g., with 100% nonparametric bootstrap support) to an incorrect tree.

Recently, a study was published by Rokas et al. (2003) that has allowed us to test a new approach to visualizing the extent of contradictory genome evidence for building a species phylogeny. Our approach uses consensus networks (Bandelt 1995; Holland and Moulton 2003) and involves combining "splits" (partitions of the taxa into two groups) from different genes into a potentially hyperdimensional graph. Computational details for constructing consensus networks are described in Holland and Moulton (2003), where they are used to visualize phylogenetic uncertainty in large collections of trees. The consensus network approach is implemented in SplitsTree version 4.0, which is freely available (http://www-ab.informatik.uni-tuebingen.de/software/jsplits/welcome_en.html).

In this brief communication, we construct a consensus network from the gene trees of Rokas et al. (2003) for 106 orthologs common to Candida albicans and seven species of Saccharomyces, and we compare our results with those obtained in their analyses. Rokas et al. (2003) concatenated the 106 genes from the yeast genomes and then exhaustively evaluated the best-fitting parsimony and maximum-likelihood trees. A single tree was identified with 100% nonparametric bootstrap support for all internal edges. These authors noted that this level of support was obtained even though some individual gene trees were not congruent with the tree derived from the concatenated sequences. To gain insight into the extent of incongruence between their 106 individual gene trees, we constructed a consensus network from the splits occurring in the different gene trees.
In doing this, we used median network construction (Bandelt 1995; Holland and Moulton 2003) and included in our consensus network all splits that occurred above a threshold of approximately 10% (i.e., all splits that occurred in at least 10 of the 106 gene trees). Ours is a novel application of median networks, which typically are used to analyze sequence site pattern variation in population studies (Bandelt, Forster, and Rohl 1999; Huber et al. 2001). Because splits are compared for a large number of independent gene loci, we show for the Rokas et al. (2003) data that our approach provides a more informative indicator of the species phylogeny than does concatenation of sequences.

Expectations are clear when reconstructing a species phylogeny from a collection of gene trees using consensus networks. At one extreme, in the absence of any common phylogenetic patterns among the gene trees, the consensus network will be a structure consisting of high-dimensional hypercubes (e.g., as in figure 1a). At the other extreme, and assuming no stochasticity associated with a bifurcating evolutionary process, the consensus network will be a unique bifurcating tree. The consensus networks for the 106 maximum-likelihood (ML) and maximum-parsimony (MP) trees of Rokas et al. (2003) (figure 1b and c, respectively) are both very similar to the concatenated gene tree reported earlier by these authors. The largely bifurcating nature of these consensus networks indicates that the data are very treelike and that there is common phylogenetic signal between a large number of independent genes. However, there is some uncertainty in the species phylogeny with respect to the relationship of S. castellii, S. kluyveri, and C. albicans, as well as uncertainty in the placement of S. bayanus and S. kudriavzevii. This uncertainty is hidden when genes are concatenated before phylogenetic analysis, as is done in the study of Rokas et al. (2003).
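The split-selection step described above can be illustrated with a short Python sketch. It is not the SplitsTree implementation: the nested-tuple tree representation, the function names, and the toy five-taxon trees are all assumptions made for illustration. The sketch counts how often each nontrivial split occurs across a collection of trees, keeps the splits that reach a minimum count (10 of 106 in the analysis above), and tests whether two retained splits are compatible. Pairs of incompatible splits are what give rise to boxes in a consensus network; drawing the network itself is not attempted here.

```python
from collections import Counter
from itertools import combinations


def leaf_set(node):
    """Return the frozenset of leaf names below a nested-tuple node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def nontrivial_splits(tree):
    """Collect the nontrivial splits of a tree given as nested tuples.

    Each split is stored as the side that does NOT contain a fixed
    reference taxon, so the same split always has the same key.
    """
    taxa = leaf_set(tree)
    anchor = min(taxa)  # arbitrary reference taxon (an assumption of this sketch)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return
        clade = leaf_set(node)
        side = taxa - clade if anchor in clade else clade
        # Keep only splits with at least two taxa on each side.
        if 1 < len(side) < len(taxa) - 1:
            splits.add(side)
        for child in node:
            visit(child)

    visit(tree)
    return splits


def consensus_splits(trees, min_count):
    """Return {split: count} for splits found in at least min_count trees."""
    counts = Counter(s for tree in trees for s in nontrivial_splits(tree))
    return {s: c for s, c in counts.items() if c >= min_count}


def compatible(a, b, taxa):
    """Two splits are compatible if one of the four side intersections is empty."""
    a_other, b_other = taxa - a, taxa - b
    return any(not (x & y) for x in (a, a_other) for y in (b, b_other))


# Toy input: three hypothetical gene trees on five taxa.
trees = [
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "C"), "B"), ("D", "E")),
]
taxa = leaf_set(trees[0])
kept = consensus_splits(trees, min_count=1)
for s1, s2 in combinations(kept, 2):
    status = "compatible" if compatible(s1, s2, taxa) else "incompatible"
    print(sorted(s1), sorted(s2), status)
```

With a collection of 106 gene trees, calling `consensus_splits(trees, min_count=10)` would correspond to the threshold used in the text. Raising the threshold removes weakly supported splits and makes the result more treelike; lowering it exposes more conflict.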
Evaluating the extent of the incongruence between optimal gene trees for many loci in yeast is also relevant to earlier work that sought to test the theory of evolution. The finding of highly similar gene trees for Candida albicans and seven species of Saccharomyces provides strong evidence that corroborates the conclusions of Penny, Foulds, and Hendy (1982). These authors sought to test the theory of evolution by asking whether gene trees for the same species of mammals were more similar than one would expect by chance. In doing so, and because of limited data at the time, they compared trees reconstructed from very few genes. The gene trees of Rokas et al. (2003) allow comparisons to be made for a very large number of genes (106). It is evident from comparing figures 1a–c that the gene trees of Rokas et al. (2003) are far more congruent with each other than would be expected by chance. As shown in table 1, this finding is also evident when the analytical approach of Penny, Foulds, and Hendy (1982) is used.

Our study highlights the potential of using consensus networks to visualize species phylogenies for large numbers of independent genes. Their main advantage is that, unlike other methods, they can display conflicting evolutionary hypotheses simultaneously. Such conflict might arise because of stochastic processes and sampling error. It may also arise because of complex biological processes such as hybridization or lateral gene transfer. The incongruence generated by these processes in the evolution of genomes is easily quantified using consensus networks.

Arndt von Haeseler, Associate Editor

Fig. 1. Consensus networks have been reconstructed for (a) 106 random bifurcating trees on eight taxa [A, B, C, … , H], (b) the 106 maximum-likelihood trees obtained in Rokas et al. (2003), and (c) the 106 maximum-parsimony trees obtained in Rokas et al. (2003). The presence of boxes in these networks indicates contradictory evidence for grouping certain species together. The lengths of the edges are proportional to the number of gene trees in which a particular edge occurs. Each network displays all those edges that are represented in at least 10 of the 106 trees.

Table 1. Distribution of the Robinson-Foulds Distance Between Pairs of Trees

Robinson-Foulds distance (a)       0        2        4        6        8       10
Expected (b)                    0.53     5.34    36.17   208.69  1113.56  4201.02
Observed MP                     1377     2652     2638     1589        0        0
Observed ML                      975     1387     1661     1187      355        0

(a) The Robinson-Foulds distance between two trees is the number of edges that appear in one of the trees but not in both.
(b) The expected number of pairs from 106 random bifurcating trees with a Robinson-Foulds distance from 0 to 10.

We thank the anonymous reviewers and the Associate Editor for their helpful and constructive comments. This work was financially supported by the New Zealand Marsden Fund (P.J.L.), the Swedish Research Council (K.H.), and the Swedish Foundation for International Cooperation in Research and Higher Education (STINT).

Literature Cited

Aguilar, J. F., J. A. Rossello, and G. N. Feliner. 1999. Nuclear ribosomal DNA (nrDNA) concerted evolution in natural and artificial hybrids of Armeria (Plumbaginaceae). Mol. Ecol. 8: 1341-1346.
Bandelt, H.-J. 1995. Combination of data in phylogenetic analysis. Plant Syst. Evol. (suppl) 9: 355-361.
Bandelt, H.-J., P. Forster, and A. Rohl. 1999. Median-joining networks for inferring intraspecific phylogenies. Mol. Biol. Evol. 16: 37-48.
Bininda-Emonds, O. R. P., J. L. Gittleman, and M. A. Steel. 2002. The (super) tree of life: procedures, problems, and prospects. Annu. Rev. Ecol. Syst. 33: 265-89.
Buckler, E. S., A. Ippolito, and T. P. Holtsford. 1997. The evolution of ribosomal DNA: divergent paralogues and phylogenetic implications. Genetics 145: 821-832.
Bryant, D. 2003. A classification of consensus methods for phylogenetics. Pp. 1–21 in M. Janowitz, F. J. Lapointe, F. McMorris, B. Mirkin, and F. Roberts, eds. Bioconsensus. American Mathematical Society Publications, DIMACS (Center for Discrete Mathematics and Theoretical Computer Science), Piscataway, NJ.
Holland, B., and V. Moulton. 2003. Consensus networks: a method for visualising incompatibilities in collections of trees. Pp. 165–176 in G. Benson and R. Page, eds. Algorithms in bioinformatics, WABI 2003. Springer-Verlag, Berlin, Germany.
Huber, K. T., V. Moulton, P. J. Lockhart, and A. Dress. 2001. Pruned median networks: a technique for reducing the complexity of median networks. Mol. Phylogenet. Evol. 19: 302-310.
Huelsenbeck, J. P., J. J. Bull, and C. W. Cunningham. 1996. Combining data in phylogenetic analysis. Trends Ecol. Evol. 11: 152-158.
Mooers, A., and E. C. Holmes. 2000. The evolution of base composition and phylogenetic inference. Trends Ecol. Evol. 15: 365-369.
Penny, D., L. R. Foulds, and M. D. Hendy. 1982. Testing the theory of evolution by comparing phylogenetic trees constructed from 5 different protein sequences. Nature 297: 197-200.
Rokas, A., B. L. Williams, N. King, and S. B. Carroll. 2003. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425: 798-803.
Ronquist, F., and J. P. Huelsenbeck. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19: 1572-1574.
Steel, M. A., D. Huson, and P. J. Lockhart. 2000. Invariable site models and their use in phylogeny reconstruction. Syst. Biol. 49: 225-232.
Yang, Z. H. 1996. Maximum-likelihood models for combined analyses of multiple sequence data. J. Mol. Evol. 42: 587-596.

Molecular Biology and Evolution vol. 21 no. 7 © Society for Molecular Biology and Evolution 2004; all rights reserved.
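The Robinson-Foulds comparison summarized in Table 1 can be sketched in a few lines of Python. This is an illustrative sketch rather than the procedure used by the authors: each tree is assumed to be already reduced to its set of nontrivial splits, the helper name `split` and the toy trees are hypothetical, and no attempt is made to reproduce the expected values for random trees. The distance is the size of the symmetric difference between two split sets, and the histogram tabulates that distance over all unordered pairs of trees.

```python
from collections import Counter
from itertools import combinations


def robinson_foulds(splits_a, splits_b):
    """Count the splits that occur in exactly one of the two trees."""
    return len(splits_a ^ splits_b)


def distance_histogram(trees):
    """Tabulate Robinson-Foulds distances over all unordered pairs of trees."""
    return Counter(robinson_foulds(a, b) for a, b in combinations(trees, 2))


def split(*taxa):
    """Hypothetical helper: one side of a split, stored as a frozenset."""
    return frozenset(taxa)


# Toy input: three trees on five taxa, each given by its nontrivial splits.
t1 = {split("D", "E"), split("C", "D", "E")}
t2 = {split("D", "E"), split("C", "D", "E")}
t3 = {split("D", "E"), split("B", "D", "E")}

hist = distance_histogram([t1, t2, t3])
for distance in sorted(hist):
    print(distance, hist[distance])
```

An unrooted bifurcating tree on eight taxa has five nontrivial splits, so the distance between two such trees is an even number between 0 and 10, which matches the columns of Table 1. A collection of 106 trees yields 106 × 105 / 2 = 5565 unordered pairs.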
Molecular Biology and Evolution – Oxford University Press
Published: Jul 1, 2004