Phylogenomics Places Orphan Protistan Lineages in a Novel Eukaryotic Super-Group

Abstract

Recent phylogenetic analyses position certain “orphan” protist lineages deep in the tree of eukaryotic life, but their exact placements are poorly resolved. We conducted phylogenomic analyses that incorporate deeply sequenced transcriptomes from representatives of collodictyonids (diphylleids), rigifilids, Mantamonas, and ancyromonads (planomonads). Analyses of 351 genes, using site-heterogeneous mixture models, strongly support a novel super-group-level clade that includes collodictyonids, rigifilids, and Mantamonas, which we name “CRuMs”. Further, they robustly place CRuMs as the closest branch to Amorphea (including animals and fungi). Ancyromonads are strongly inferred to be more distantly related to Amorphea than are CRuMs. They emerge either as sister to malawimonads, or as a separate deeper branch. CRuMs and ancyromonads represent two distinct major groups that branch deeply on the lineage that includes animals, near the most commonly inferred root of the eukaryote tree. This makes both groups crucial in examinations of the deepest-level history of extant eukaryotes.

Key words: eukaryote tree of life, concatenated phylogenetic analysis, protist, site-heterogeneous models.

© The Author(s) 2018. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

Brown et al. Genome Biol. Evol. 10(2):427–433. doi:10.1093/gbe/evy014. Advance Access publication January 19, 2018.

Introduction

Our understanding of the eukaryote tree of life has been revolutionized by genomic and transcriptomic investigations of diverse protists, which constitute the overwhelming majority of eukaryotic diversity (Burki 2014; Simpson and Eglit 2016). Phylogenetic analyses of super-matrices of proteins typically show a eukaryote tree consisting of five-to-eight “super-groups” that fall within three even-higher-order assemblages: 1) Amorphea (Amoebozoa plus Obazoa, the latter including animals and fungi), 2) Diaphoretickes (primarily Sar, Archaeplastida, Cryptista, and Haptophyta), and 3) Excavata (Discoba and Metamonada) (Adl et al. 2012). Recent analyses (Derelle et al. 2015) place the root of the eukaryote tree somewhere between Amorphea and the other two listed lineages; Derelle et al. (2015) termed this the “Opimoda-Diphoda” root. There is considerable debate over the position of the root, however (Cavalier-Smith 2010; Katz et al. 2012; He et al. 2014).

Nonetheless, there remain several “orphan” protist lineages that cannot be assigned to any super-group by cellular anatomy or ribosomal RNA phylogenies (Brugerolle et al. 2002; Glücksman et al. 2011; Heiss et al. 2011; Cavalier-Smith 2013; Pawlowski 2013; Yabuki, Eikrem, et al. 2013; Yabuki, Ishida, et al. 2013; Katz and Grant 2015). Recent phylogenomic analyses including Collodictyon, Mantamonas, and ancyromonads indicate that these particular “orphans” branch near the base of Amorphea (Zhao et al. 2012; Cavalier-Smith et al. 2014), the same general position as the purported Opimoda-Diphoda root. This implies 1) that these lineages are of special evolutionary importance, but also 2) that uncertainty over their phylogenetic positions will profoundly impact our understanding of deep eukaryote history. Unfortunately, their phylogenetic positions indeed remain unclear, with different phylogenomic analyses supporting incompatible topologies, and often showing low statistical support (Cavalier-Smith et al. 2014). This is likely due in part to the modest numbers of sampled genes for some/most species and generally poor taxon sampling (Cavalier-Smith et al. 2014; Torruella et al. 2015). Therefore, we undertook phylogenomic analyses that incorporated deeply sequenced transcriptome data from representatives of two collodictyonids, a Mantamonas, three ancyromonads, and a single rigifilid.

Materials and Methods

Details of experimental methods for culturing, nucleic acid extraction, and Illumina sequencing are described in the supplementary text, Supplementary Material online.

Phylogenomic Data Set Construction

A reference data set of 351 aligned proteins described in Kang et al. (2017) was used as the starting point for the current analysis, from which 61 or 64 taxa representing diverse eukaryotes were selected (see supplementary table S2, Supplementary Material online). Extensive efforts were made to exclude contamination and paralogs, as described in the supplementary text, Supplementary Material online. Poorly aligned sites were excluded using BMGE (Criscuolo and Gribaldo 2010), resulting in an alignment of 97,002 amino acid (AA) sites with <25% missing data for both 61- and 64-taxon data sets (supplementary table S2, Supplementary Material online).

Phylogenomic Tree Inference

Maximum likelihood (ML) trees were inferred using IQ-TREE v. 1.5.5 (Nguyen et al. 2015). The best-fitting available model based on the Akaike Information Criterion (AIC) was the LG+C60+F+Γ mixture model with class weights optimized from the data set and four discrete gamma (Γ) categories. ML trees were estimated under this model for both 61- and 64-taxon data sets. We then used this model and the best ML tree under the LG+C60+F+Γ model to estimate the “posterior mean site frequencies” (PMSF) model (Wang et al. 2017) for both the 61-taxon (fig. 1) and 64-taxon (supplementary fig. S1, Supplementary Material online) data sets. This LG+C60+F+Γ-PMSF model was used to re-estimate ML trees, and for a bootstrap analysis of the 61-taxon data set with 100 pseudoreplicates (fig. 1). AU topology tests under the LG+C60+F+Γ model were conducted with IQ-TREE to evaluate whether trees recovered by the Bayesian analyses or alternative placements of the orphan taxa (see supplementary table S1, Supplementary Material online, for hypotheses tested) could be rejected statistically.

Bayesian inferences were performed using PHYLOBAYES-MPI v1.6j (Rodrigue and Lartillot 2014), under the CAT-GTR+Γ model, with four discrete Γ categories. For the 61-taxon analysis, six independent Markov chain Monte Carlo chains were run for 4,000 generations, sampling every second generation. Two sets of two chains converged (at 800 and 2,000 generations, which were, respectively, used as the burnin), with the largest discrepancy in posterior probabilities (PPs) (maxdiff) < 0.05. The topologies of the converged chains are presented in supplementary figures S3 and S4, Supplementary Material online, and are mapped upon figure 1. For the 64-taxon analysis, four chains were run for 3,000 generations. Two chains converged at 200 generations, which was used as the burnin (maxdiff = 0), and the posterior probabilities are mapped upon the ML tree in supplementary figure S1, Supplementary Material online.

Fast-Site Removal and Gene Subsampling Analyses

For fast site removal, rates of evolution at each site of the 61-taxon data set were estimated with DIST_EST (Susko et al. 2003) under the LG model using discrete gamma probability estimation. A custom Python script was then used to remove the fastest evolving sites in 4,000-site steps. Random subsampling of 20%, 40%, 60%, or 80% of the genes in the 61-taxon data set was conducted using a custom Python script, with the number of replicates as given in figure 2B. In both cases each step or subsample was analyzed using 1,000 UFBOOT replicates in IQ-TREE under the LG+C60+F+Γ-PMSF model.

Results

Using a custom phylogenomic pipeline plus manual curation, we generated a data set of 351 orthologs. The data set was filtered of paralogs and potential cross-contamination by visualizing each protein’s phylogeny individually, then removing sequences whose positions conflicted with a conservative consensus phylogeny (as in Tice et al. 2016; Kang et al. 2017) (supplementary methods, Supplementary Material online). We selected data-rich species to represent the phylogenetic diversity of eukaryotes. Our primary data set retained 61 taxa, with metamonads represented by two short-branching taxa (Trimastix and Paratrimastix). We also analyzed a 64-taxon data set containing three additional longer branching metamonads. Maximum likelihood (ML) and Bayesian analyses were conducted using site-heterogeneous models: LG+C60+F+Γ and the associated PMSF model (LG+C60+F+Γ-PMSF) as implemented in IQ-TREE (Wang et al. 2017), and CAT-GTR+Γ in PHYLOBAYES-MPI, respectively. Such site-heterogeneous models are important for deep-level phylogenetic inference with numerous substitutions along branches (Lartillot et al. 2007; Le et al. 2008; Wang et al. 2008, 2017; Pisani et al. 2015).

Our analyses of both 61- and 64-taxon data sets robustly recover well-accepted major groups including Sar, Discoba, Metamonada, Obazoa, and Amoebozoa (fig. 1 and supplementary fig. S1, Supplementary Material online). Cryptista (e.g., cryptomonads and close relatives) branches with Haptophyta (fig. 1) in the LG+C60+F+Γ-PMSF analyses as well as in one set of two converged PHYLOBAYES-MPI chains under the CAT-GTR model (supplementary fig. S2, Supplementary Material online). However, another pair of converged chains places Haptophyta as sister to Sar while Cryptista nests within Archaeplastida (supplementary fig. S3, Supplementary Material online), which is largely consistent with some other recent phylogenomic studies (Burki et al. 2016). Excavata was never monophyletic, with Discoba forming a clan with Diaphoretickes taxa (Sar, Haptophyta, Archaeplastida+Cryptista) and Metamonada grouping with Amorphea plus the four orphan lineages targeted in this study (see below). Malawimonads, which are morphologically similar to certain metamonads and discobids (Simpson 2003), also branch among the “orphans” (see below).

FIG. 1. Phylogenetic tree for 61 eukaryotes, inferred from 351 proteins using Maximum Likelihood (LG+C60+F+Γ-PMSF model). The numbers on branches show (in order) support values from 100 real bootstrap replicates (LG+C60+F+Γ-PMSF model) and posterior probabilities from both sets of converged chains in PHYLOBAYES-MPI under the CAT-GTR+Γ model (i.e., MLBS/PP/PP). Filled circles represent maximum support with all methods; asterisks indicate a clade not recovered in the PHYLOBAYES analysis. The dashed arrow indicates the placement of malawimonads inferred with PHYLOBAYES-MPI (see also inset summary tree), and gray arrows indicate the placements of other lineages in the PHYLOBAYES-MPI analyses.

Phylogenies of both data sets place all four orphan taxa near the base of Amorphea (fig. 1 and supplementary fig. S1, Supplementary Material online). The uncertain position of the eukaryotic root (discussed earlier) therefore makes it unclear which bipartitions are truly clades, and which could be interrupted by the root. To allow efficient communication, we discuss the phylogenies as if the orphan taxa all lie on the Amorphea side of the root. We will also consider Amorphea as previously circumscribed (Adl et al. 2012): the least-inclusive clade or clan containing Amoebozoa and Opisthokonta.

Three of the orphan lineages are specifically related in our trees (fig. 1 and supplementary fig. S1, Supplementary Material online). In both 61- and 64-taxon analyses, Rigifila ramosa (representing Rigifilida) forms a maximally supported clade with the collodictyonids Collodictyon triciliatum and Diphylleia rotans. Mantamonas plastica then branches as their closest relative, with maximal support. This Collodictyonid+Rigifilida+Mantamonas clade (“CRuMs”) forms the sister group to Amorphea, again with maximal support.

ML analyses and the converged PHYLOBAYES chains grouped ancyromonads, malawimonads, and CRuMs with Amorphea, with strong bootstrap support and Bayesian posterior probability (fig. 1, 61 taxa; PMSF BS = 98%, PP = 1). Ancyromonads and malawimonads formed a clade in the ML analyses, but with equivocal support (fig. 1, 61 taxa; BS = 77%). Both sets of converged chains of the Bayesian analyses instead grouped malawimonads with CRuMs+Amorphea to the exclusion of ancyromonads (supplementary figs. S2 and S3, Supplementary Material online, PP = 1 for both); however, some unconverged chains support an ancyromonad+malawimonad clade (data not shown). Lack of convergence among multiple chains using the CAT-GTR+Γ model is unfortunately common for large data sets, and often cannot be resolved by increasing the number of generations of Markov chain Monte Carlo within a reasonable time frame (Pisani et al. 2015; Kang et al. 2017). Instead we treat the two topologies recovered in these analyses as candidate hypotheses requiring further investigation.

We conducted approximately unbiased (AU) topology tests on the 61-taxon data set under the LG+C60+F+Γ mixture model (supplementary table S1, Supplementary Material online). These tests rejected the PHYLOBAYES trees, as well as all trees optimized by enforcing constraints representing plausible alternative relative placements of ancyromonads, malawimonads, and metamonads.

The fastest evolving sites are expected to be the most prone to saturation and systematic error arising from model misspecification in phylogenomic analyses (Philippe et al. 2011). We conducted a “fast-site removal” analysis with the 61-taxon data set and generated ultrafast bootstrap support (UFBOOT) values (Minh et al. 2013) for relevant groups as sites were progressively removed from fastest to slowest (fig. 2A). All groups of interest receive reasonably strong support until 44,000–48,000 sites were removed, when support fell markedly for the ancyromonad+malawimonad clade and the Amorphea+CRuMs+ancyromonad+malawimonad clan. At this point, a notable proportion of the bootstrap trees show malawimonads and/or ancyromonads grouping with metamonads. This decline in support for the ancyromonad+malawimonad group reverses somewhat with further site removal, before support falls again as overall phylogenetic structure is lost when 76,000 sites are removed (fig. 2A).

To evaluate heterogeneity in phylogenetic signals among genes (Inagaki et al. 2009), we also inferred phylogenies from subsamples of the 351 examined genes (61-taxon data set; fig. 2B and C). For each subsample 20–80% of the genes were randomly selected, without replacement, with replication as per figure 2B (giving a >95% probability that a particular gene would be sampled at each level), and UFBOOT support for major clades was inferred (fig. 2C). The “80% retained” replicates gave nearly identical results to the full data set, indicating that there was little stochastic error associated with gene sampling at this level. Support for the CRuMs clade is almost always high when 40%+ of genes are retained, whereas subsamples containing 60% of genes still showed differing support for an ancyromonad-malawimonad clade (as opposed to, e.g., malawimonads branching with metamonads).

FIG. 2. Effects of fast evolving sites and random subsampling of genes on our phylogenomic analyses. (A) Sites were sorted based on their rates of evolution estimated under the LG+F+Γ model and removed from the data set from highest to lowest rate. Each step has 4,000 of the fastest evolving sites removed progressively. The bootstrap values (UFBOOT; LG+C60+F+Γ-PMSF model) for each bipartition of interest are plotted. The following bipartitions were examined but received nearly 100% support across the fast site deletion series (data not shown): Amorphea, Obazoa, Amoebozoa, Ancyromonads, and Sar. The following bipartitions were examined but received nearly 0% support across the fast site deletion series (data not shown): Amoebozoa+CRuMs, Metamonada+Ancyromonads, Excavata (No Malawimonads), Excavata+Malawimonads, and Ancyromonads+Malawimonads+CRuMs. (B and C) Effects of random subsampling of genes within the 351-gene data set on the bipartitions of interest. Inset panel is the calculation of the number of replicates (n) necessary for a 95% probability of sampling every gene when subsampling 20%, 40%, 60%, and 80% of genes using the formula $0.95 = 1 - (1 - x/100)^{n}$, where x is the percentage of genes subsampled. UFBOOT support values for all nodes of interest are shown, with the variability of support values illustrated by box-and-whisker plots.
[Figure 2 panels, as recoverable from the extracted text. Panel A, “Fast Site Deletion Assay”: ultrafast bootstrap support (0–100%) against stepwise deletion of the fastest sites (full data set, then steps 1–24 of 4,000 sites each) for Archaeplastida, Amorphea+CRuMs+Ancyromonads+Malawimonads, Amorphea+CRuMs, Metamonada+Malawimonads, Metamonada+Ancyromonads+Malawimonads, CRuMs, and Ancyromonads+Malawimonads. Panel B, “Random Resampling of Genes”: ultrafast bootstrap support against percentage of genes sampled, with 20% of genes [70], n = 14; 40% [140], n = 6; 60% [210], n = 5; 80% [280], n = 3; 100% [351], n = 1. Inset, probability against number of resampling replicates: n = 14, 95.6%; n = 6, 95.3%; n = 5, 99.0%; n = 3, 99.2%.]

We also investigated whether heterogeneity in amino acid composition among sequences in the data set had any impact on the branching order of the inferred phylogenies. Clustering on amino acid composition failed to recover any groupings that were inferred in our phylogenies (supplementary fig. S5, Supplementary Material online). As an alternative approach, we conducted analyses on a data set with the amino acid sequences recoded into fewer states, an approach that has been shown to ameliorate compositional bias problems (Feuda et al. 2017). We recoded the concatenated amino acid sequences of our 61-taxon data set into four states based on the saturation bins of Susko and Roger (2007). ML analyses of the recoded data set using the general-time-reversible (GTR)+C60+F+Γ model (with 4 states) recovered a phylogeny (supplementary fig. S6, Supplementary Material online) largely congruent with the foregoing analyses (e.g., fig. 1). Together, these analyses strongly suggest that our phylogenetic results cannot be attributed to sequences of similar amino acid composition being artificially grouped together and that compositional heterogeneity had minimal impact on our analyses.

Discussion

Our 351 protein (97,002 AA site) super-matrix places several orphan lineages in two separate clades emerging between Amorphea and all other major eukaryote groups. All methods recover a strongly supported clade comprising the free-swimming collodictyonid flagellates, the idiosyncratic filose protist Rigifila (Rigifilida), and the gliding flagellate Mantamonas. This clade is resilient to exclusion both of fast-evolving sites and of randomly selected genes. It is also consistently placed as the immediate sister taxon to Amorphea.

This represents the first robust estimate of the positions of these three taxonomically poor but phylogenetically deep clades. Previous phylogenomic analyses placed collodictyonids in various positions, such as sister to either malawimonads or Amoebozoa, but often with low statistical support (Zhao et al. 2012; Cavalier-Smith et al. 2014). Placements of Mantamonas have varied dramatically. A recent phylogenomic study recovered a weak Mantamonas+collodictyonid clade in some analyses, but other analyses in the same study instead recovered a weak Mantamonas+ancyromonad relationship (Cavalier-Smith et al. 2014), and SSU+LSU rRNA gene phylogenies strongly grouped Mantamonas with apusomonads (Glücksman et al. 2011; Yabuki, Ishida, et al. 2013). Our study decisively supports the first of these possibilities. This is the first phylogenomic analysis incorporating Rigifilida: previous SSU+LSU rRNA gene analyses recovered a negligibly supported collodictyonid+rigifilid clade, but not a relationship with Mantamonas (Yabuki, Ishida, et al. 2013).

Overall, the hypotheses that 1) collodictyonids, rigifilids, and Mantamonas form a major eukaryote clade, and 2) this clade is sister to Amorphea, are novel, plausible, and evolutionarily important. No name exists for this putative super-group, and it is obviously premature to propose a formal taxon. We suggest the place-holding moniker “CRuMs” (Collodictyonidae, Rigifilida, Mantamonas), which is euphonic and evokes the species-poor nature of these taxa.

Whether ancyromonads branch outside Amorphea or within it has been disputed (Paps et al. 2013; Cavalier-Smith et al. 2014). Our study strongly places ancyromonads outside Amorphea, more distantly related to it than are the CRuMs. Ancyromonads instead fall “among” the excavate lineages (Discoba, Metamonada, and Malawimonadidae). Resolving the relationships among “excavates” is extremely challenging (Hampl et al. 2009; Derelle et al. 2015), and this likely contributed to our difficulty in resolving the exact position of ancyromonads vis-a-vis malawimonads. A close relationship between ancyromonads and some/all excavates would be broadly consonant with the marked cytoskeletal similarity between Ancyromonas and “typical excavates” (Heiss et al. 2011). Certainly, our study flags ancyromonads as highly relevant to resolving relationships among excavates.

Both candidate positions for ancyromonads place them at the center of a crucial open question: locating the root of the eukaryote tree. As discussed earlier, the latest analyses (Derelle et al. 2015) locate the root between Discoba+Diaphoretickes (“Diphoda”) and a clade including Amorphea, collodictyonids, and malawimonads (“Opimoda”). Our phylogenies show the ancyromonad lineage emerging close to this split. One of the two positions we recovered would actually place ancyromonads either as the deepest branch within “Diphoda,” or the deepest branch within “Opimoda,” or even as sister to all other extant eukaryotes. This demonstrates the profound importance of including ancyromonads in future rooted phylogenies of eukaryotes, using data sets optimized for this purpose.

Supplementary Material

Supplementary data are available at Genome Biology and Evolution online.

Acknowledgments

The authors thank Tom Cavalier-Smith and Ed Glücksman (Oxford University) for supplying culture strains B-70 (Ancyromonas sigmoides), NYK3C (Fabomonas tropica), and Bass1 (Mantamonas plastica). The part of this work conducted at Dalhousie University was supported by NSERC Discovery grants awarded to A.G.B.S. (298366-2014) and A.J.R. (2016-06792), respectively. A.J.R. also acknowledges the Canada Research Chairs program for support. This project was supported in part by the National Science Foundation (NSF) Division of Environmental Biology (DEB) grant 1456054 (http://www.nsf.gov), awarded to M.W.B. Mississippi State University’s High Performance Computing Collaboratory provided some computational resources. The part of this work conducted at the University of Tsukuba was supported by grants from the Japan Society for the Promotion of Science (JSPS; 15H05606 and 15K14591 awarded to R.K., 23117006 and 16H04826 awarded to Y.I., 15H04411 awarded to K.I., and 15H05231 to T.H.) and by the “Tree of Life” research project (University of Tsukuba).

Literature Cited

Adl SM, et al. 2012. The revised classification of eukaryotes. J Eukaryot Microbiol. 59(5):429–493.
Brugerolle G, Bricheux G, Philippe H, Coffea G. 2002. Collodictyon triciliatum and Diphylleia rotans (=Aulacomonas submarina) form a new family of flagellates (Collodictyonidae) with tubular mitochondrial cristae that is phylogenetically distant from other flagellate groups. Protist 153(1):59–70.
Burki F. 2014. The eukaryotic tree of life from a global phylogenomic perspective. Cold Spring Harb Perspect Biol. 6(5):a016147.
Burki F, et al. 2016. Untangling the early diversification of eukaryotes: a phylogenomic study of the evolutionary origins of Centrohelida, Haptophyta and Cryptista. Proc Biol Sci. 283:20152802.
Cavalier-Smith T. 2010. Kingdoms Protozoa and Chromista and the eozoan root of the eukaryotic tree. Biol Lett. 6(3):342–345.
Cavalier-Smith T. 2013. Early evolution of eukaryote feeding modes, cell structural diversity, and classification of the protozoan phyla Loukozoa, Sulcozoa, and Choanozoa. Eur J Protistol. 49(2):115–178.
Cavalier-Smith T, et al. 2014. Multigene eukaryote phylogeny reveals the likely protozoan ancestors of opisthokonts (animals, fungi, choanozoans) and Amoebozoa. Mol Phylogenet Evol. 81:71–85.
Criscuolo A, Gribaldo S. 2010. BMGE (Block Mapping and Gathering with Entropy): a new software for selection of phylogenetic informative regions from multiple sequence alignments. BMC Evol Biol. 10:210.
Derelle R, et al. 2015. Bacterial proteins pinpoint a single eukaryotic root. Proc Natl Acad Sci U S A. 112(7):E693–E699.
Feuda R, et al. 2017. Improved modeling of compositional heterogeneity supports sponges as sister to all other animals. Curr Biol. 27(24):3864–3870.
Glücksman E, et al. 2011. The novel marine gliding zooflagellate genus Mantamonas (Mantamonadida ord. n.: Apusozoa). Protist 162(2):207–221.
Hampl V, et al. 2009. Phylogenomic analyses support the monophyly of Excavata and resolve relationships among eukaryotic “supergroups.” Proc Natl Acad Sci U S A. 106(10):3859–3864.
He D, et al. 2014. An alternative root for the eukaryote tree of life. Curr Biol. 24(4):465–470.
Heiss AA, Walker G, Simpson AGB. 2011. The ultrastructure of Ancyromonas, a eukaryote without supergroup affinities. Protist 162(3):373–393.
Inagaki Y, Nakajima Y, Sato M, Sakaguchi M, Hashimoto T. 2009. Gene sampling can bias multi-gene phylogenetic inferences: the relationship between red algae and green plants as a case study. Mol Biol Evol. 26(5):1171–1178.
Kang S, et al. 2017. Between a pod and a hard test: the deep evolution of amoebae. Mol Biol Evol. 34:2258–2270.
Katz LA, Grant JR, Parfrey LW, Burleigh JG. 2012. Turning the crown upside down: gene tree parsimony roots the eukaryotic tree of life. Syst Biol. 61(4):653–660.
Katz LA, Grant JR. 2015. Taxon-rich phylogenomic analyses resolve the eukaryotic tree of life and reveal the power of subsampling by sites. Syst Biol. 64(3):406–415.
Lartillot N, Brinkmann H, Philippe H. 2007. Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heterogeneous model. BMC Evol Biol. 7(Suppl 1):S4.
Le SQ, Lartillot N, Gascuel O. 2008. Phylogenetic mixture models for proteins. Philos Trans R Soc Lond B Biol Sci. 363(1512):3965–3976.
Minh BQ, Nguyen MAT, von Haeseler A. 2013. Ultrafast approximation for phylogenetic bootstrap. Mol Biol Evol. 30(5):1188–1195.
Nguyen L-T, Schmidt HA, von Haeseler A, Minh BQ. 2015. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol. 32(1):268–274.
Paps J, Medina-Chacón LA, Marshall W, Suga H, Ruiz-Trillo I. 2013. Molecular phylogeny of unikonts: new insights into the position of apusomonads and ancyromonads and the internal relationships of opisthokonts. Protist 164(1):2–12.
Pawlowski J. 2013. The new micro-kingdoms of eukaryotes. BMC Biol. 11:40.
Philippe H, et al. 2011. Resolving difficult phylogenetic questions: why more sequences are not enough. PLoS Biol. 9(3):e1000602.
Pisani D, et al. 2015. Genomic data do not support comb jellies as the sister group to all other animals. Proc Natl Acad Sci U S A. 112(50):15402–15407.
Rodrigue N, Lartillot N. 2014. Site-heterogeneous mutation-selection models within the PhyloBayes-MPI package. Bioinformatics 30(7):1020–1021.
Simpson AGB. 2003. Cytoskeletal organization, phylogenetic affinities and systematics in the contentious taxon Excavata (Eukaryota). Int J Syst Evol Microbiol. 53(6):1759–1777.
Simpson AGB, Eglit Y. 2016. Protist diversification. In: Kliman RM, editor. Encyclopedia of evolutionary biology. Vol. 3. Amsterdam: Elsevier. p. 344–360.
Susko E, Field C, Blouin C, Roger AJ. 2003. Estimation of rates-across-sites distributions in phylogenetic substitution models. Syst Biol. 52(5):594–603.
Susko E, Roger AJ. 2007. On reduced amino acid alphabets for phylogenetic inference. Mol Biol Evol. 24(9):2139–2150.
Tice AK, et al. 2016. Expansion of the molecular and morphological diversity of Acanthamoebidae (Centramoebida, Amoebozoa) and identification of a novel life cycle type within the group. Biol Direct. 11:69.
Torruella G, et al. 2015. Phylogenomics reveals convergent evolution of lifestyles in close relatives of animals and fungi. Curr Biol. 25(18):2404–2410.
Wang H, Minh B, Susko E, Roger AJ. 2017. Modeling site heterogeneity with posterior mean site frequency profiles accelerates accurate phylogenomic estimation. Syst Biol. doi: 10.1093/sysbio/syx068.
Wang H-C, Li K, Susko E, Roger AJ. 2008. A class frequency mixture model that adjusts for site-specific amino acid frequencies and improves inference of protein phylogeny. BMC Evol Biol. 8:331.
Yabuki A, Eikrem W, Takishita K, Patterson DJ. 2013. Fine structure of Telonema subtilis Griessmann, 1913: a flagellate with a unique cytoskeletal structure among eukaryotes. Protist 164(4):556–569.
Yabuki A, Ishida K-I, Cavalier-Smith T. 2013. Rigifila ramosa n. gen., n. sp., a filose apusozoan with a distinctive pellicle, is related to Micronuclearia. Protist 164:75–88.
Zhao S, et al. 2012. Collodictyon–an ancient lineage in the tree of eukaryotes. Mol Biol Evol. 29(6):1557–1568.

Associate editor: Laura Katz
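The fast-site removal procedure described in Materials and Methods (per-site rates estimated first, then the fastest evolving sites removed in 4,000-site steps) can be illustrated with a short sketch. This is not the authors' custom script, which is not reproduced in the article. The function name `fast_site_removal_steps`, the dictionary-based alignment, and the plain list of rates are all assumptions made for illustration, and the rates are taken as already computed (for example by DIST_EST).

```python
from typing import Dict, Iterator, List, Tuple


def fast_site_removal_steps(
    alignment: Dict[str, str],
    rates: List[float],
    step: int = 4000,
) -> Iterator[Tuple[int, Dict[str, str]]]:
    """Yield (sites_removed, trimmed_alignment) for each cumulative removal step.

    alignment: hypothetical mapping of taxon name -> aligned sequence.
    rates: one estimated rate per alignment column.
    step: number of additional fastest sites removed at each step.
    """
    n_sites = len(rates)
    for taxon, seq in alignment.items():
        if len(seq) != n_sites:
            raise ValueError(f"{taxon}: sequence length does not match rates")

    # Column indices ordered from the fastest to the slowest site.
    fastest_first = sorted(range(n_sites), key=lambda i: rates[i], reverse=True)

    # Remove step, 2*step, ... sites, always leaving at least one column.
    for n_removed in range(step, n_sites, step):
        dropped = set(fastest_first[:n_removed])
        kept = [i for i in range(n_sites) if i not in dropped]
        trimmed = {
            taxon: "".join(seq[i] for i in kept)
            for taxon, seq in alignment.items()
        }
        yield n_removed, trimmed
```

Under these assumptions, a 97,002-site alignment processed with `step=4000` gives 24 trimmed alignments (4,000 to 96,000 sites removed), which agrees with the 24 steps on the figure 2A axis. Each trimmed alignment would then be analyzed separately, as the Methods describe.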
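The random gene subsampling described in Materials and Methods (20%, 40%, 60%, or 80% of the 351 genes drawn without replacement, with several replicates per level) could be sketched as below. Again this is an illustrative assumption rather than the authors' script: the name `subsample_genes`, the integer `percent` argument, and the optional `seed` are hypothetical. Rounding the gene count down is also an assumption, chosen because it reproduces the gene counts printed in figure 2B (70, 140, 210, and 280).

```python
import random
from typing import List, Optional, Sequence


def subsample_genes(
    gene_names: Sequence[str],
    percent: int,
    n_replicates: int,
    seed: Optional[int] = None,
) -> List[List[str]]:
    """Draw n_replicates random gene subsets, each without replacement.

    percent is an integer percentage (for example 20 for 20% of genes).
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range 1..100")
    rng = random.Random(seed)
    genes = list(gene_names)
    # Integer floor of the requested share of genes (assumed rounding rule).
    k = len(genes) * percent // 100
    return [sorted(rng.sample(genes, k)) for _ in range(n_replicates)]
```

Each returned gene list would be concatenated into its own super-matrix and analyzed separately. Replicates are drawn independently, so the same gene can appear in several replicates even though no gene is repeated within one.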
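The number of subsampling replicates follows from the formula in the figure 2 legend, 0.95 = 1 − (1 − x/100)^n, which gives the probability that a particular gene appears in at least one of n independent subsamples of x% of the genes. A minimal sketch of that arithmetic follows; the function names are illustrative and not taken from the article.

```python
def coverage_probability(percent_sampled: float, n_replicates: int) -> float:
    """Probability that a given gene is drawn in at least one of n subsamples."""
    if not 0 < percent_sampled <= 100:
        raise ValueError("percent_sampled must be in the range (0, 100]")
    return 1.0 - (1.0 - percent_sampled / 100.0) ** n_replicates


def min_replicates(percent_sampled: float, target: float = 0.95) -> int:
    """Smallest n for which coverage_probability reaches the target."""
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")
    n = 1
    while coverage_probability(percent_sampled, n) < target:
        n += 1
    return n
```

With the replicate counts shown in figure 2B (n = 14, 6, 5, and 3 for 20%, 40%, 60%, and 80% of genes), `coverage_probability` gives about 95.6%, 95.3%, 99.0%, and 99.2%, matching the inset values. The smallest n that reaches 95% is 14 and 6 for the 20% and 40% levels, but only 4 and 2 for the 60% and 80% levels, so the counts used at those two levels exceed the minimum the formula requires.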

Publisher
Oxford University Press
Copyright
© The Author(s) 2018. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
ISSN
1759-6653
eISSN
1759-6653
D.O.I.
10.1093/gbe/evy014

Abstract

Recent phylogenetic analyses position certain “orphan” protist lineages deep in the tree of eukaryotic life, but their exact placements are poorly resolved. We conducted phylogenomic analyses that incorporate deeply sequenced transcriptomes from representatives of collodictyonids (diphylleids), rigifilids, Mantamonas, and ancyromonads (planomonads). Analyses of 351 genes, using site-heterogeneous mixture models, strongly support a novel super-group-level clade that includes collodictyonids, rigifilids, and Mantamonas, which we name “CRuMs”. Further, they robustly place CRuMs as the closest branch to Amorphea (including animals and fungi). Ancyromonads are strongly inferred to be more distantly related to Amorphea than are CRuMs. They emerge either as sister to malawimonads, or as a separate deeper branch. CRuMs and ancyromonads represent two distinct major groups that branch deeply on the lineage that includes animals, near the most commonly inferred root of the eukaryote tree. This makes both groups crucial in examinations of the deepest-level history of extant eukaryotes.

Key words: eukaryote tree of life, concatenated phylogenetic analysis, protist, site-heterogeneous models.

© The Author(s) 2018. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

Genome Biol. Evol. 10(2):427–433. doi:10.1093/gbe/evy014 Advance Access publication January 19, 2018

Introduction

Our understanding of the eukaryote tree of life has been revolutionized by genomic and transcriptomic investigations of diverse protists, which constitute the overwhelming majority of eukaryotic diversity (Burki 2014; Simpson and Eglit 2016). Phylogenetic analyses of super-matrices of proteins typically show a eukaryote tree consisting of five-to-eight “super-groups” that fall within three even-higher-order assemblages: 1) Amorphea (Amoebozoa plus Obazoa, the latter including animals and fungi), 2) Diaphoretickes (primarily Sar, Archaeplastida, Cryptista, and Haptophyta), and 3) Excavata (Discoba and Metamonada) (Adl et al. 2012). Recent analyses (Derelle et al. 2015) place the root of the eukaryote tree somewhere between Amorphea and the other two listed lineages; Derelle et al. (2015) termed this the “Opimoda-Diphoda” root. There is considerable debate over the position of the root, however (Cavalier-Smith 2010; Katz et al. 2012; He et al. 2014).

Nonetheless, there remain several “orphan” protist lineages that cannot be assigned to any super-group by cellular anatomy or ribosomal RNA phylogenies (Brugerolle et al. 2002; Glücksman et al. 2011; Heiss et al. 2011; Cavalier-Smith 2013; Pawlowski 2013; Yabuki, Eikrem, et al. 2013; Yabuki, Ishida, et al. 2013; Katz and Grant 2015). Recent phylogenomic analyses including Collodictyon, Mantamonas, and ancyromonads indicate that these particular “orphans” branch near the base of Amorphea (Zhao et al. 2012; Cavalier-Smith et al. 2014), the same general position as the purported Opimoda-Diphoda root. This implies 1) that these lineages are of special evolutionary importance, but also 2) that uncertainty over their phylogenetic positions will profoundly impact our understanding of deep eukaryote history. Unfortunately, their phylogenetic positions indeed remain unclear, with different phylogenomic analyses supporting incompatible topologies, and often showing low statistical support (Cavalier-Smith et al. 2014). This is likely due in part to the modest numbers of sampled genes for some/most species and generally poor taxon sampling (Cavalier-Smith et al. 2014; Torruella et al. 2015). Therefore, we undertook phylogenomic analyses that incorporated deeply sequenced transcriptome data from representatives of two collodictyonids, a Mantamonas, three ancyromonads, and a single rigifilid.

Materials and Methods

Details of experimental methods for culturing, nucleic acid extraction, and Illumina sequencing are described in the supplementary text, Supplementary Material online.

Phylogenomic Data Set Construction

A reference data set of 351 aligned proteins described in Kang et al. (2017) was used as the starting point for the current analysis, from which 61 or 64 taxa representing diverse eukaryotes were selected (see supplementary table S2, Supplementary Material online). Extensive efforts were made to exclude contamination and paralogs, as described in the supplementary text, Supplementary Material online. Poorly aligned sites were excluded using BMGE (Criscuolo and Gribaldo 2010), resulting in an alignment of 97,002 amino acid (AA) sites with <25% missing data for both 61- and 64-taxon data sets (supplementary table S2, Supplementary Material online).

Phylogenomic Tree Inference

Maximum likelihood (ML) trees were inferred using IQ-TREE v. 1.5.5 (Nguyen et al. 2015). The best-fitting available model based on the Akaike Information Criterion (AIC) was the LG+C60+F+Γ mixture model with class weights optimized from the data set and four discrete gamma (Γ) categories. ML trees were estimated under this model for both 61- and 64-taxon data sets. We then used this model and the best ML tree under the LG+C60+F+Γ model to estimate the “posterior mean site frequencies” (PMSF) model (Wang et al. 2017) for both the 61-taxon (fig. 1) and 64-taxon (supplementary fig. S1, Supplementary Material online) data sets. This LG+C60+F+Γ-PMSF model was used to re-estimate ML trees, and for a bootstrap analysis of the 61-taxon data set, with 100 pseudoreplicates (fig. 1). AU topology tests under LG+C60+F+Γ were conducted with IQ-TREE to evaluate whether trees recovered by the Bayesian analyses or alternative placements of the orphan taxa (see supplementary table S1, Supplementary Material online, for hypotheses tested) could be rejected statistically.

Bayesian inferences were performed using PhyloBayes-MPI v1.6j (Rodrigue and Lartillot 2014), under the CAT-GTR+Γ model, with four discrete Γ categories. For the 61-taxon analysis, 6 independent Markov chain Monte Carlo chains were run for 4,000 generations, sampling every second generation. Two sets of two chains converged (at 800 and 2,000 generations, which were, respectively, used as the burnin), with the largest discrepancy in posterior probabilities (PPs) (maxdiff) < 0.05. The topologies of the converged chains are presented in supplementary figures S3 and S4, Supplementary Material online, and are mapped upon figure 1. For the 64-taxon analysis, four chains were run for 3,000 generations. Two chains converged at 200 generations, which was used as the burnin (maxdiff = 0), and the posterior probabilities are mapped upon the ML tree in supplementary figure S1, Supplementary Material online.

Fast-Site Removal and Gene Subsampling Analyses

For fast site removal, rates of evolution at each site of the 61-taxon data set were estimated with DIST_EST (Susko et al. 2003) under the LG model using discrete gamma probability estimation. A custom Python script was then used to remove the fastest evolving sites in 4,000-site steps. Random subsampling of 20%, 40%, 60%, or 80% of the genes in the 61-taxon data set was conducted using a custom Python script, with the number of replicates as given in figure 2B. In both cases each step or subsample was analyzed using 1,000 UFBoot replicates in IQ-TREE under the LG+C60+F+Γ-PMSF model.

Results

Using a custom phylogenomic pipeline plus manual curation, we generated a data set of 351 orthologs. The data set was filtered of paralogs and potential cross-contamination by visualizing each protein’s phylogeny individually, then removing sequences whose positions conflicted with a conservative consensus phylogeny (as in Tice et al. 2016; Kang et al. 2017) (supplementary methods, Supplementary Material online). We selected data-rich species to represent the phylogenetic diversity of eukaryotes. Our primary data set retained 61 taxa, with metamonads represented by two short-branching taxa (Trimastix and Paratrimastix). We also analyzed a 64-taxon data set containing three additional longer branching metamonads. Maximum likelihood (ML) and Bayesian analyses were conducted using site-heterogeneous models: LG+C60+F+Γ and the associated PMSF model (LG+C60+F+Γ-PMSF) as implemented in IQ-TREE (Wang et al. 2017), and CAT-GTR+Γ in PhyloBayes-MPI, respectively. Such site-heterogeneous models are important for deep-level phylogenetic inference with numerous substitutions along branches (Lartillot et al. 2007; Le et al. 2008; Wang et al. 2008, 2017; Pisani et al. 2015).

FIG. 1. Phylogenetic tree for 61 eukaryotes, inferred from 351 proteins using Maximum Likelihood (LG+C60+F+Γ-PMSF model). The numbers on branches show (in order) support values from 100 real bootstrap replicates (LG+C60+F+Γ-PMSF model) and posterior probabilities from both sets of converged chains in PhyloBayes-MPI under the CAT-GTR+Γ model (i.e., MLBS/PP/PP). Filled circles represent maximum support with all methods; asterisks indicate a clade not recovered in the PhyloBayes analysis. The dashed arrow indicates the placement of malawimonads inferred with PhyloBayes-MPI (see also inset summary tree), and gray arrows indicate the placements of other lineages in the PhyloBayes-MPI analyses.

Our analyses of both 61- and 64-taxon data sets robustly recover well-accepted major groups including Sar, Discoba, Metamonada, Obazoa, and Amoebozoa (fig. 1 and supplementary fig. S1, Supplementary Material online). Cryptista (e.g., cryptomonads and close relatives) branches with Haptophyta (fig. 1) in the LG+C60+F+Γ-PMSF analyses as well as in one set of two converged PhyloBayes-MPI chains under the CAT-GTR model (supplementary fig. S2, Supplementary Material online). However, another pair of converged chains places Haptophyta as sister to Sar while Cryptista nests within Archaeplastida (supplementary fig. S3, Supplementary Material online), which is largely consistent with some other recent phylogenomic studies (Burki et al. 2016). Excavata was never monophyletic, with Discoba forming a clan with Diaphoretickes taxa (Sar, Haptophyta, Archaeplastida + Cryptista) and Metamonada grouping with Amorphea plus the four orphan lineages targeted in this study (see below). Malawimonads, which are morphologically similar to certain metamonads and discobids (Simpson 2003), also branch among the “orphans” (see below).

Phylogenies of both data sets place all four orphan taxa near the base of Amorphea (fig. 1 and supplementary fig. S1, Supplementary Material online). The uncertain position of the eukaryotic root (discussed earlier) therefore makes it unclear which bipartitions are truly clades, and which could be interrupted by the root. To allow efficient communication, we discuss the phylogenies as if the orphan taxa all lie on the Amorphea side of the root. We will also consider Amorphea as previously circumscribed (Adl et al. 2012): the least-inclusive clade or clan containing Amoebozoa and Opisthokonta.

Three of the orphan lineages are specifically related in our trees (fig. 1 and supplementary fig. S1, Supplementary Material online). In both 61- and 64-taxon analyses, Rigifila ramosa (representing Rigifilida) forms a maximally supported clade with the collodictyonids Collodictyon triciliatum and Diphylleia rotans. Mantamonas plastica then branches as their closest relative, with maximal support. This Collodictyonid + Rigifilida + Mantamonas clade (“CRuMs”) forms the sister group to Amorphea, again with maximal support.

ML analyses and the converged PhyloBayes chains grouped ancyromonads, malawimonads, and CRuMs with Amorphea, with strong bootstrap support and Bayesian posterior probability (fig. 1, 61 taxa; PMSF BS = 98%, PP = 1). Ancyromonads and malawimonads formed a clade in the ML analyses, but with equivocal support (fig. 1, 61 taxa; BS = 77%). Both sets of converged chains of the Bayesian analyses instead grouped malawimonads with CRuMs + Amorphea to the exclusion of ancyromonads (supplementary figs. S2 and S3, Supplementary Material online, PP = 1 for both); however, some unconverged chains support an ancyromonad + malawimonad clade (data not shown). Lack of convergence among multiple chains using the CAT-GTR+Γ model is unfortunately common for large data sets, and often cannot be resolved by increasing the number of generations of Markov chain Monte Carlo within a reasonable time frame (Pisani et al. 2015; Kang et al. 2017). Instead we treat the two topologies recovered in these analyses as candidate hypotheses requiring further investigation.

We conducted approximately unbiased (AU) topology tests on the 61-taxon data set under the LG+C60+F+Γ mixture model (supplementary table S1, Supplementary Material online). These tests rejected the PhyloBayes trees, as well as all trees optimized by enforcing constraints representing plausible alternative relative placements of ancyromonads, malawimonads, and metamonads.

The fastest evolving sites are expected to be the most prone to saturation and systematic error arising from model misspecification in phylogenomic analyses (Philippe et al. 2011). We conducted a “fast-site removal” analysis with the 61-taxon data set and generated ultrafast bootstrap support (UFBoot) values (Minh et al. 2013) for relevant groups as sites were progressively removed from fastest to slowest (fig. 2A). All groups of interest receive reasonably strong support until 44,000–48,000 sites were removed, when support fell markedly for the ancyromonad + malawimonad clade and the Amorphea + CRuMs + ancyromonad + malawimonad clan. At this point, a notable proportion of the bootstrap trees show malawimonads and/or ancyromonads grouping with metamonads. This decline in support for the ancyromonad + malawimonad group reverses somewhat with further site removal, before support falls again as overall phylogenetic structure is lost when 76,000 sites are removed (fig. 2A).

[Figure 2 graphic not reproduced. Panel A, “Fast Site Deletion Assay”: ultrafast bootstrap support (y-axis, 0–100%) against stepwise deletion of fastest sites in 4,000-site steps (x-axis, full data set followed by steps 1–24). Panels B and C, “Random Resampling of Genes”: ultrafast bootstrap support against the percentage of genes sampled, for 20% (70 genes, n = 14), 40% (140 genes, n = 6), 60% (210 genes, n = 5), 80% (280 genes, n = 3), and 100% (351 genes, n = 1) of genes. Inset values: 14 = 95.6%, 6 = 95.3%, 5 = 99.0%, 3 = 99.2%.]

FIG. 2. Effects of fast evolving sites and random subsampling of genes on our phylogenomic analyses. (A) Sites were sorted based on their rates of evolution estimated under the LG+F+Γ model and removed from the data set from highest to lowest rate. Each step has 4,000 of the fastest evolving sites removed progressively. The bootstrap values (UFBoot; LG+C60+F+Γ-PMSF model) for each bipartition of interest are plotted. (B and C) Effects of random subsampling of genes within the 351-gene data set. The following bipartitions were examined but received nearly 100% support across the fast site deletion series (data not shown): Amorphea, Obazoa, Amoebozoa, Ancyromonads, and Sar. The following bipartitions were examined but received nearly 0% support across the fast site deletion series (data not shown): Amoebozoa + CRuMs, Metamonada + Ancyromonads, Excavata (No Malawimonads), Excavata + Malawimonads, and Ancyromonads + Malawimonads + CRuMs. (B) Effects of random subsampling of genes on the bipartitions of interest. Inset panel is the calculation of the number of replicates (n) necessary for a 95% probability of sampling every gene when subsampling 20%, 40%, 60%, and 80% of genes using the formula $0.95 = 1 - (1 - x/100)^n$, where x is the percentage of genes subsampled. UFBoot support values for all nodes of interest with the variability of support values illustrated by box-and-whisker plots.

To evaluate heterogeneity in phylogenetic signals among genes (Inagaki et al. 2009), we also inferred phylogenies from subsamples of the 351 examined genes (61-taxon data set; fig. 2B and C). For each subsample 20–80% of the genes were randomly selected, without replacement, with replication as per figure 2B (giving a >95% probability that a particular gene would be sampled at each level), and UFBoot support for major clades was inferred (fig. 2C). The “80% retained” replicates gave nearly identical results to the full data set, indicating that there was little stochastic error associated with gene sampling at this level. Support for the CRuMs clade is almost always high when 40%+ of genes are retained, whereas subsamples containing 60% of genes still showed differing support for an ancyromonad + malawimonad clade (as opposed to, e.g., malawimonads branching with metamonads).

We also investigated whether heterogeneity in amino acid composition among sequences in the data set had any impact on the branching order of the inferred phylogenies. Clustering on amino acid composition failed to recover any groupings that were inferred in our phylogenies (supplementary fig. S5, Supplementary Material online). As an alternative approach, we conducted analyses on a data set with the amino acid sequences recoded into fewer states, an approach that has been shown to ameliorate compositional bias problems (Feuda et al. 2017). We recoded the concatenated amino acid sequences of our 61-taxon data set into four states based on the saturation bins of Susko and Roger (2007). ML analyses of the recoded data set using the general-time-reversible (GTR)+C60+F+Γ model (with 4 states) recovered a phylogeny (supplementary fig. S6, Supplementary Material online) largely congruent with the foregoing analyses (e.g., fig. 1). Together, these analyses strongly suggest that our phylogenetic results cannot be attributed to sequences of similar amino acid composition being artificially grouped together and that compositional heterogeneity had minimal impact on our analyses.

Discussion

Our 351 protein (97,002 AA site) super-matrix places several orphan lineages in two separate clades emerging between Amorphea and all other major eukaryote groups. All methods recover a strongly supported clade comprising the free-swimming collodictyonid flagellates, the idiosyncratic filose protist Rigifila (Rigifilida), and the gliding flagellate Mantamonas. This clade is resilient to exclusion both of fast-evolving sites and of randomly selected genes. It is also consistently placed as the immediate sister taxon to Amorphea. This represents the first robust estimate of the positions of these three taxonomically poor but phylogenetically deep clades. Previous phylogenomic analyses placed collodictyonids in various positions, such as sister to either malawimonads or Amoebozoa, but often with low statistical support (Zhao et al. 2012; Cavalier-Smith et al. 2014). Placements of Mantamonas have varied dramatically. A recent phylogenomic study recovered a weak Mantamonas + collodictyonid clade in some analyses, but other analyses in the same study instead recovered a weak Mantamonas + ancyromonad relationship (Cavalier-Smith et al. 2014), and SSU+LSU rRNA gene phylogenies strongly grouped Mantamonas with apusomonads (Glücksman et al. 2011; Yabuki, Ishida, et al. 2013). Our study decisively supports the first of these possibilities. This is the first phylogenomic analysis incorporating Rigifilida: previous SSU+LSU rRNA gene analyses recovered a negligibly supported collodictyonid + rigifilid clade, but not a relationship with Mantamonas (Yabuki, Ishida, et al. 2013).

Overall, the hypotheses that 1) collodictyonids, rigifilids, and Mantamonas form a major eukaryote clade, and 2) this clade is sister to Amorphea, are novel, plausible, and evolutionarily important. No name exists for this putative super-group, and it is obviously premature to propose a formal taxon. We suggest the place-holding moniker “CRuMs” (Collodictyonidae, Rigifilida, Mantamonas), which is euphonic and evokes the species-poor nature of these taxa.

Whether ancyromonads branch outside Amorphea or within it has been disputed (Paps et al. 2013; Cavalier-Smith et al. 2014). Our study strongly places ancyromonads outside Amorphea, more distantly related to it than are the CRuMs. Ancyromonads instead fall “among” the excavate lineages (Discoba, Metamonada, and Malawimonadidae). Resolving the relationships among “excavates” is extremely challenging (Hampl et al. 2009; Derelle et al. 2015), and this likely contributed to our difficulty in resolving the exact position of ancyromonads vis-a-vis malawimonads. A close relationship between ancyromonads and some/all excavates would be broadly consonant with the marked cytoskeletal similarity between Ancyromonas and “typical excavates” (Heiss et al. 2011). Certainly, our study flags ancyromonads as highly relevant to resolving relationships among excavates.

Both candidate positions for ancyromonads place them at the center of a crucial open question: locating the root of the eukaryote tree. As discussed earlier, the latest analyses (Derelle et al. 2015) locate the root between Discoba + Diaphoretickes (“Diphoda”) and a clade including Amorphea, collodictyonids, and malawimonads (“Opimoda”). Our phylogenies show the ancyromonad lineage emerging close to this split. One of the two positions we recovered would actually place ancyromonads either as the deepest branch within “Diphoda,” or the deepest branch within “Opimoda,” or even as sister to all other extant eukaryotes. This demonstrates the profound importance of including ancyromonads in future rooted phylogenies of eukaryotes, using data sets optimized for this purpose.

Supplementary Material

Supplementary data are available at Genome Biology and Evolution online.

Acknowledgments

The authors thank Tom Cavalier-Smith and Ed Glücksman (Oxford University) for supplying culture strains B-70 (Ancyromonas sigmoides), NYK3C (Fabomonas tropica), and Bass1 (Mantamonas plastica). The part of this work conducted at Dalhousie University was supported by NSERC Discovery grants awarded to A.G.B.S. (298366-2014) and A.J.R. (2016-06792), respectively. A.J.R. also acknowledges the Canada Research Chairs program for support. This project was supported in part by the National Science Foundation (NSF) Division of Environmental Biology (DEB) grant 1456054 (http://www.nsf.gov), awarded to M.W.B. Mississippi State University’s High Performance Computing Collaboratory provided some computational resources. The part of this work conducted at the University of Tsukuba was supported by grants from the Japan Society for the Promotion of Science (JSPS; 15H05606 and 15K14591 awarded to R.K., 23117006 and 16H04826 awarded to Y.I., 15H04411 awarded to K.I., and 15H05231 to T.H.) and by the “Tree of Life” research project (University of Tsukuba).

Literature Cited

Adl SM, et al. 2012. The revised classification of eukaryotes. J Eukaryot Microbiol. 59(5):429–493.
Brugerolle G, Bricheux G, Philippe H, Coffea G. 2002. Collodictyon triciliatum and Diphylleia rotans (=Aulacomonas submarina) form a new family of flagellates (Collodictyonidae) with tubular mitochondrial cristae that is phylogenetically distant from other flagellate groups. Protist 153(1):59–70.
Burki F. 2014. The eukaryotic tree of life from a global phylogenomic perspective. Cold Spring Harb Perspect Biol. 6(5):a016147.
Burki F, et al. 2016. Untangling the early diversification of eukaryotes: a phylogenomic study of the evolutionary origins of Centrohelida, Haptophyta and Cryptista. Proc Biol Sci. 283:20152802.
Cavalier-Smith T. 2010. Kingdoms Protozoa and Chromista and the eozoan root of the eukaryotic tree. Biol Lett. 6(3):342–345.
Cavalier-Smith T. 2013. Early evolution of eukaryote feeding modes, cell structural diversity, and classification of the protozoan phyla Loukozoa, Sulcozoa, and Choanozoa. Eur J Protistol. 49(2):115–178.
Cavalier-Smith T, et al. 2014. Multigene eukaryote phylogeny reveals the likely protozoan ancestors of opisthokonts (animals, fungi, choanozoans) and Amoebozoa. Mol Phylogenet Evol. 81:71–85.
Criscuolo A, Gribaldo S. 2010. BMGE (Block Mapping and Gathering with Entropy): a new software for selection of phylogenetic informative regions from multiple sequence alignments. BMC Evol Biol. 10:210.
Derelle R, et al. 2015. Bacterial proteins pinpoint a single eukaryotic root. Proc Natl Acad Sci U S A. 112(7):E693–E699.
Feuda R, et al. 2017. Improved modeling of compositional heterogeneity supports sponges as sister to all other animals. Curr Biol. 27(24):3864–3870.
Glücksman E, et al. 2011. The novel marine gliding zooflagellate genus Mantamonas (Mantamonadida ord. n.: Apusozoa). Protist 162(2):207–221.
Hampl V, et al. 2009. Phylogenomic analyses support the monophyly of Excavata and resolve relationships among eukaryotic “supergroups.” Proc Natl Acad Sci U S A. 106(10):3859–3864.
He D, et al. 2014. An alternative root for the eukaryote tree of life. Curr Biol. 24(4):465–470.
Heiss AA, Walker G, Simpson AGB. 2011. The ultrastructure of Ancyromonas, a eukaryote without supergroup affinities. Protist 162(3):373–393.
Inagaki Y, Nakajima Y, Sato M, Sakaguchi M, Hashimoto T. 2009. Gene sampling can bias multi-gene phylogenetic inferences: the relationship between red algae and green plants as a case study. Mol Biol Evol. 26(5):1171–1178.
Kang S, et al. 2017. Between a pod and a hard test: the deep evolution of amoebae. Mol Biol Evol. 34:2258–2270.
Katz LA, Grant JR, Parfrey LW, Burleigh JG. 2012. Turning the crown upside down: gene tree parsimony roots the eukaryotic tree of life. Syst Biol. 61(4):653–660.
Katz LA, Grant JR. 2015. Taxon-rich phylogenomic analyses resolve the eukaryotic tree of life and reveal the power of subsampling by sites. Syst Biol. 64(3):406–415.
Lartillot N, Brinkmann H, Philippe H. 2007. Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heterogeneous model. BMC Evol Biol. 7(Suppl 1):S4.
Le SQ, Lartillot N, Gascuel O. 2008. Phylogenetic mixture models for proteins. Philos Trans R Soc Lond B Biol Sci. 363(1512):3965–3976.
Minh BQ, Nguyen MAT, von Haeseler A. 2013. Ultrafast approximation for phylogenetic bootstrap. Mol Biol Evol. 30(5):1188–1195.
Nguyen L-T, Schmidt HA, von Haeseler A, Minh BQ. 2015. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol. 32(1):268–274.
Paps J, Medina-Chacón LA, Marshall W, Suga H, Ruiz-Trillo I. 2013. Molecular phylogeny of unikonts: new insights into the position of apusomonads and ancyromonads and the internal relationships of opisthokonts. Protist 164(1):2–12.
Pawlowski J. 2013. The new micro-kingdoms of eukaryotes. BMC Biol. 11:40.
Philippe H, et al. 2011. Resolving difficult phylogenetic questions: why more sequences are not enough. PLoS Biol. 9(3):e1000602.
Pisani D, et al. 2015. Genomic data do not support comb jellies as the sister group to all other animals. Proc Natl Acad Sci U S A. 112(50):15402–15407.
Rodrigue N, Lartillot N. 2014. Site-heterogeneous mutation-selection models within the PhyloBayes-MPI package. Bioinformatics 30(7):1020–1021.
Simpson AGB. 2003. Cytoskeletal organization, phylogenetic affinities and systematics in the contentious taxon Excavata (Eukaryota). Int J Syst Evol Microbiol. 53(6):1759–1777.
Simpson AGB, Eglit Y. 2016. Protist diversification. In: Kliman RM, editor. Encyclopedia of evolutionary biology. Vol. 3. Amsterdam: Elsevier. p. 344–360.
Susko E, Field C, Blouin C, Roger AJ. 2003. Estimation of rates-across-sites distributions in phylogenetic substitution models. Syst Biol. 52(5):594–603.
Susko E, Roger AJ. 2007. On reduced amino acid alphabets for phylogenetic inference. Mol Biol Evol. 24(9):2139–2150.
Tice AK, et al. 2016. Expansion of the molecular and morphological diversity of Acanthamoebidae (Centramoebida, Amoebozoa) and identification of a novel life cycle type within the group. Biol Direct. 11:69.
Torruella G, et al. 2015. Phylogenomics reveals convergent evolution of lifestyles in close relatives of animals and fungi. Curr Biol. 25(18):2404–2410.
Wang H, Minh B, Susko E, Roger AJ. 2017. Modeling site heterogeneity with posterior mean site frequency profiles accelerates accurate phylogenomic estimation. Syst Biol. doi: 10.1093/sysbio/syx068.
Wang H-C, Li K, Susko E, Roger AJ. 2008. A class frequency mixture model that adjusts for site-specific amino acid frequencies and improves inference of protein phylogeny. BMC Evol Biol. 8:331.
Yabuki A, Eikrem W, Takishita K, Patterson DJ. 2013. Fine structure of Telonema subtilis Griessmann, 1913: a flagellate with a unique cytoskeletal structure among eukaryotes. Protist 164(4):556–569.
Yabuki A, Ishida K-I, Cavalier-Smith T. 2013. Rigifila ramosa n. gen., n. sp., a filose apusozoan with a distinctive pellicle, is related to Micronuclearia. Protist 164:75–88.
Zhao S, et al. 2012. Collodictyon–an ancient lineage in the tree of eukaryotes. Mol Biol Evol. 29(6):1557–1568.

Associate editor: Laura Katz
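
The replicate counts reported for the gene-subsampling analysis (figure 2B) follow from the formula in that figure's caption, P = 1 - (1 - x/100)^n, where x is the percentage of genes drawn per replicate and n is the number of replicates. The sketch below is an illustration only, not the authors' script; the function names `coverage_probability` and `min_replicates` are hypothetical. It evaluates P for the replicate counts given in the figure (14, 6, 5, and 3 for 20%, 40%, 60%, and 80% of genes) and also computes the smallest n that reaches a 95% target. Read this way, P is the probability that one particular gene appears in at least one replicate, which matches the wording of the Results text.

```python
import math


def coverage_probability(percent_sampled: float, n_replicates: int) -> float:
    """P = 1 - (1 - x/100)^n: chance that a given gene is drawn in at
    least one of n independent subsamples, each taking x% of the genes."""
    return 1.0 - (1.0 - percent_sampled / 100.0) ** n_replicates


def min_replicates(percent_sampled: float, target: float = 0.95) -> int:
    """Smallest n for which coverage_probability reaches the target."""
    if not 0.0 < percent_sampled < 100.0:
        raise ValueError("percent_sampled must lie strictly between 0 and 100")
    miss = 1.0 - percent_sampled / 100.0  # chance one replicate misses the gene
    return math.ceil(math.log(1.0 - target) / math.log(miss))


# Replicate counts as listed in figure 2B (percentage of genes -> n).
FIGURE_2B_REPLICATES = {20: 14, 40: 6, 60: 5, 80: 3}

if __name__ == "__main__":
    for percent, n in FIGURE_2B_REPLICATES.items():
        p = coverage_probability(percent, n)
        print(f"{percent}% of genes, n = {n}: P = {p:.3f}, "
              f"minimum n for 95% = {min_replicates(percent)}")
```

Under these assumptions the figure's values for 20% and 40% of genes coincide with the minimum n, while the counts used for 60% and 80% (5 and 3) exceed the minimum of 4 and 2, so every level clears the stated 95% threshold.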

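The fast-site removal procedure described in the Methods (rank sites by estimated rate, then delete the fastest in 4,000-site steps) can be sketched as follows. The authors' custom Python script is not reproduced in the article, so this is a minimal illustration under stated assumptions: the alignment is held as a dictionary of equal-length sequence strings, the per-site rates (for example, as estimated by DIST_EST) are already available as a list of floats, and the function names `remove_fastest_sites` and `deletion_series` are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple


def remove_fastest_sites(
    alignment: Dict[str, str], rates: List[float], n_remove: int
) -> Dict[str, str]:
    """Return a copy of the alignment without its n_remove fastest columns."""
    n_sites = len(rates)
    if any(len(seq) != n_sites for seq in alignment.values()):
        raise ValueError("each sequence needs exactly one residue per rate")
    # Rank column indices from fastest to slowest; ties keep column order.
    ranked = sorted(range(n_sites), key=lambda i: rates[i], reverse=True)
    drop = set(ranked[:n_remove])
    keep = [i for i in range(n_sites) if i not in drop]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


def deletion_series(
    alignment: Dict[str, str], rates: List[float], step: int = 4000
) -> Iterator[Tuple[int, Dict[str, str]]]:
    """Yield (sites_removed, reduced_alignment) for each cumulative step."""
    removed = step
    while removed < len(rates):
        yield removed, remove_fastest_sites(alignment, rates, removed)
        removed += step


if __name__ == "__main__":
    # Toy data: six sites, two taxa, made-up rates.
    toy_alignment = {"taxon_a": "ACDEFG", "taxon_b": "ACDEYG"}
    toy_rates = [0.1, 2.0, 0.5, 3.0, 0.2, 1.0]
    for removed, reduced in deletion_series(toy_alignment, toy_rates, step=2):
        print(removed, reduced)
```

Each reduced alignment would then be written out and analyzed separately (in the article, with 1,000 UFBoot replicates in IQ-TREE); that step is omitted here because it depends on file formats and command-line options the article does not specify.
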
Journal

Genome Biology and Evolution, Oxford University Press

Published: Feb 1, 2018
