The genome of Plasmodium falciparum, the causative agent of malaria in Africa, has been extensively studied since it was first fully sequenced in 2002. However, many open questions remain, including understanding the chromosomal context of molecular evolutionary changes (e.g., relationship between chromosome map and phylogenetic conservation, patterns of gene duplication, and patterns of selection). Here, we present PhyloChromoMap, a method that generates a phylogenomic map of chromosomes from a custom-built bioinformatics pipeline. Using P. falciparum 3D7 as a model, we analyze 2,116 genes with homologs in up to 941 diverse eukaryotic, bacterial and archaeal lineages. We estimate the level of conservation along chromosomes based on conservation across clades, and identify “young” regions (i.e., those with recent or fast evolving genes) that are enriched in subtelomeric regions as compared with internal regions. We also demonstrate that patterns of molecular evolution for paralogous genes differ significantly depending on their location as younger paralogs tend to be found in subtelomeric regions whereas older paralogs are enriched in internal regions. Combining these observations with analyses of synteny, we demonstrate that subtelomeric regions are actively shuffled among chromosome ends, which is consistent with the hypothesis that these regions are prone to ectopic recombination. We also assess patterns of selection by comparing dN/dS ratios of gene family members in subtelomeric versus internal regions, and we include the important antigenic gene family var. These analyses illustrate the highly dynamic nature of the karyotype of P. falciparum, and provide a method for exploring genome dynamics in other lineages.

Key words: chromosomal mapping, Plasmodium falciparum, phylogenomics, karyotype evolution, antigenic genes.

© The Author(s) 2018. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact email@example.com
Ceron-Romero et al. Genome Biol. Evol. 10(2):553–561. doi:10.1093/gbe/evy017 Advance Access publication January 22, 2018

Introduction

Numerous studies of plants, animals, and fungi have informed the classical view of karyotypes as stable entities that have only minor variations within species (Hope et al. 1993; Sites and Reed 1994; Schubert and Vu 2016). However, an increasing number of studies of unicellular eukaryotes in the last decades have revealed that karyotypes are more dynamic than originally thought (McGrath and Katz 2004; Zufall et al. 2005; Parfrey et al. 2008; Katz 2012; Oliverio and Katz 2014). For instance, recombination between nonhomologous chromosomes (i.e., ectopic recombination) can lead to intraspecific variation of the karyotype in the model organism Saccharomyces cerevisiae (Loidl and Nairz 1997). In parasites such as Giardia lamblia, Encephalitozoon cuniculi (Biderre et al. 1999), Encephalitozoon hellem (Delarbre et al. 2001), and Plasmodium falciparum (Freitas-Junior et al. 2000; Scherf et al. 2008; Hernandez-Rivas et al. 2013; Claessens et al. 2014), the same type of chromosomal rearrangements contributes to antigenic variation, which allows escape from the host immune system. Most of these karyotype variations have been described using microscopy and/or analyses of limited sets of genes (Loidl and Nairz 1997; Biderre et al. 1999; Freitas-Junior et al. 2000; Delarbre et al. 2001).

The growing number of genomes that are available enables the development of new methods to explore patterns of karyotype evolution. Well-annotated genomes can be used to build physical maps in order to compare structural
characteristics such as gene content and synteny. For instance, genome maps have allowed detection of differences in synteny among species of the lineages Ostreococcus (Palenik et al. 2007), Plasmodium (Carlton et al. 1999; Kooij et al. 2005), Saccharomyces (Walther et al. 2014), and Trypanosoma (Ghedin et al. 2004). Likewise for phylogenomic analyses, the increase in genomic data provides more taxa and genes to compare. Combining these maps with analyses of the phylogenetic history of genes along chromosomes can yield important insights about the evolution of karyotypes.

Plasmodium falciparum, the most virulent of the human malaria parasites, is a good model to study karyotype evolution because its life cycle has been extensively studied and its genome has been fully sequenced (Gardner et al. 1998, 2002). The AT-rich genome of P. falciparum is divided among 14 chromosomes that harbor housekeeping genes in their internal regions and antigen genes at their ends (Gardner et al. 2002). Because antigenic variation is important for P. falciparum to evade the host immune system, the ends of the chromosomes (which are enriched for antigenic gene families) have been relatively well characterized (de Bruin et al. 1994; Pace et al. 1995). In P. falciparum, these regions are marked by telomeres, followed by a 40 kb region, the “telomere associated sequences,” containing a series of repeat sequences (Figueiredo et al. 2000, 2002; Figueiredo and Scherf 2005; Hernandez-Rivas et al. 2013). Antigen genes var, rif, and stevor are located beyond this 40 kb region, where the abundance of repeated genes makes the region prone to ectopic recombination (Scherf et al. 2001; Hernandez-Rivas et al. 2013). This observation has led to the proposal that subtelomeric regions in P. falciparum evolve through ectopic recombination between chromosomes (Freitas-Junior et al. 2000; Scherf et al. 2001; Hernandez-Rivas et al. 2013).

Genomes from other apicomplexans have been completed, enabling comparative genomic analyses between those lineages and P. falciparum. Previous studies comparing presence and absence of genes show high conservation in gene content among Plasmodium species (Carlton et al. 2002, 2008; Pain et al. 2008). In contrast, comparisons among apicomplexan species revealed that few genes are shared among all species (<34%; Kuo et al. 2008; Kissinger and DeBarry 2011).

In this study we explore further the evolution of the P. falciparum genome by analyzing the phylogenetic conservation of genes and gene families in their chromosomal context. In order to achieve this goal, we develop a method, PhyloChromoMap, to depict the evolutionary history of genes along a chromosomal map. Using P. falciparum as a case study, we infer the phylogeny of its genes with a taxon-rich phylogenomic pipeline (Grant and Katz 2014; Katz and Grant 2015). Then, we estimate the level of conservation of protein coding sequences by determining the presence or absence of homologs in other clades (i.e., Bacteria, Archaea, Opisthokonta, Archaeplastida, SAR [Stramenopiles, Alveolata, Rhizaria], Excavata, Amoebozoa, and other eukaryote lineages) in single gene trees. We also assess patterns of molecular evolution in paralogs across chromosomes, and provide a map that indicates putative origin of genes.

Materials and Methods

Development of PhyloChromoMap

Starting from a phylogenomic pipeline previously built in our lab (Grant and Katz 2014; Katz and Grant 2015), we develop PhyloChromoMap to map the evolutionary history of genes along chromosomes (https://github.com/Katzlab/PhyloChromoMap_py; last accessed January 2018). Our initial collection of homologs uses gene families defined in OrthoMCL (http://www.orthomcl.org/orthomcl/; last accessed January 2018) and as such, each of these clusters of homologs is referred to as an “orthologous group” or OG. We analyze a total of 5,336 putative coding genes from P. falciparum 3D7 (assembly ASM276v1) by BLAST (Altschul et al. 1990) against OrthoMCL (supplementary fig. S1, Supplementary Material online). This results in 2,116 genes falling in 1,962 OGs that are represented in our pipeline. The remaining OGs are not represented in our taxon-rich pipeline either because they contain very few homologs or because they produce very poor quality alignments that are discarded in subsequent steps of the pipeline; these are labeled as NIP (not in pipeline) in tables and figures. We represent graphically the number of minor clades (e.g., Apicomplexa) per major clade (e.g., SAR) for every OG in our pipeline (fig. 1 and supplementary figs. S1 and S2, Supplementary Material online). We then use the R “image” function (Team 2016), which uses a matrix to display spatial data, to display the phylogenomic history of genes along the chromosome map. In order to validate our method and results for P. falciparum, we also implement PhyloChromoMap in the model organism S. cerevisiae S288C, mapping 3,338 of its 5,893 ORFs (fig. 2 and supplementary fig. S3, Supplementary Material online).

FIG. 1. Exemplar phylogenomic maps of chromosomes 1, 2, and 7 of Plasmodium falciparum 3D7 highlighting “young” subtelomeric and internal regions (boxes). Black lines represent chromosomes of P. falciparum 3D7 and bars above reflect levels of conservation, with dashed boxes around “young” regions. First row from the bottom (NIP, “not in pipeline”) indicates ORFs that do not match our criteria for tree building (i.e., likely Plasmodium-specific or misannotated ORFs). The remaining rows (bottom to top) are heatmaps reflecting the proportion of lineages of SAR (Sr), Archaeplastida (Pl), Opisthokonta (Op), orphans (EE, “everything else”), Amoebozoa (Am), Excavata (Ex), Bacteria (Ba), and Archaea (Ar) that contain the indicated gene. Shorter lines below the chromosomes show the location of paralogs of Plasmodium specific gene family members involved in antigenic responses: var and rif.

FIG. 2. Exemplar phylogenomic maps of chromosomes 1–3 of Saccharomyces cerevisiae S288C. Black lines represent chromosomes of S. cerevisiae S288C and bars above reflect levels of conservation. First row from the bottom (NIP, “not in pipeline”) indicates ORFs that do not match our criteria for tree building (i.e., likely Saccharomyces-specific or misannotated ORFs). The remaining rows (bottom to top) are heatmaps reflecting the proportion of lineages of Opisthokonta (Op), Amoebozoa (Am), Excavata (Ex), orphans (EE, “everything else”), Archaeplastida (Pl), SAR (Sr), Archaea (Ar), Bacteria (Ba) that contain the indicated gene. Unlike all the other chromosomes (supplementary fig. S2), chromosome I exhibits large regions of low gene content toward the ends.

Definition of Subtelomeres and Detection of Young Portions and Centromeres

We define subtelomeric regions after producing the chromosome maps and observing that all chromosome ends contain well defined young regions. We then focus on subtelomeric regions that contain the most distal 15% of the chromosome or the final 200 kb (whichever is smaller) to capture these young regions. We use a custom Ruby script to walk the chromosomes and detect young portions in the subtelomeric and internal regions (supplementary fig. S1, Supplementary Material online). Young portions are defined as regions in which genes are found in fewer than three major eukaryotic clades, though we allow the presence of one gene conserved in three or more major clades. Moreover, we consider a gene as present in a major clade only if it is found in at least 25% of its minor clades, to account for spurious results and intradomain Lateral Gene Transfer (LGT; see supplementary Materials, Supplementary Material online, for more detail). We search for young portions in both subtelomeric and internal regions, only considering internal young portions that are ≥90 kb (supplementary table S1, Supplementary Material online). All chromosomes except chromosome 10 have an internal region of around 2–3 kb with the highest AT content, 94–98%. This region is assumed to be the centromere (Bowman et al. 1999; Hall et al. 2002). In chromosome 10 this region is less obvious, encompassing only around 1 kb with a 94% AT content (supplementary table S2, Supplementary Material online).

Analysis of Gene Family Members: Synteny, Gene Content, and dN/dS Ratios

We perform a synteny analysis of subtelomeric and internal young portions using SyMAP (Soderlund et al. 2006; supplementary fig. S1, Supplementary Material online). We explore different values for the minimum number of anchors to define a synteny block (i.e., from 3 to 7) and do not see any major differences (supplementary fig. S4, Supplementary Material online). We choose parameters to better retain duplications: N = 2 (retain the anchors with scores among the top 2) and anchor scores ≥80% of the second best anchor. Finally, overlapping synteny blocks are merged. We also survey the gene content of young portions, including Plasmodium specific coding domains (supplementary fig. S1, Supplementary Material online). We categorize the sequences by gene family when possible and plot their frequency as a heatmap (supplementary fig. S5, Supplementary Material online).
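The merging of overlapping synteny blocks in this analysis can be pictured as a standard interval merge. The following is a minimal sketch, not the SyMAP or PhyloChromoMap implementation: it assumes each synteny block has been reduced to a (start, end) coordinate pair on a single chromosome, and the function name `merge_overlapping_blocks` and the example coordinates are hypothetical.

```python
def merge_overlapping_blocks(blocks):
    """Merge (start, end) intervals that overlap on a single chromosome.

    Assumption: coordinates are inclusive base-pair positions with
    start <= end; blocks that share at least one position are merged.
    """
    merged = []
    for start, end in sorted(blocks):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous block: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical block coordinates (bp) within one subtelomeric region.
example_blocks = [(1000, 5000), (4000, 9000), (12000, 15000)]
print(merge_overlapping_blocks(example_blocks))  # [(1000, 9000), (12000, 15000)]
```

Sorting first means each block only has to be compared with the most recently merged one, so the whole pass is a single scan after the sort.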
We use CIRCOS plots (Krzywinski et al. 2009) to map paralogs of genes that match OGs (fig. 3 and supplementary fig. S1, Supplementary Material online). In CIRCOS, we choose the option “links” for representing these paralogs, with a single link connecting each pair of paralogs. The relative age of paralogs is calculated as the number of major clades that contain them and is also displayed in the plots. Additionally, pairwise dN/dS values are calculated for all paralogs using yn00, PAML (Yang 1997) and compared between subtelomeric and internal paralogs (fig. 4).

FIG. 3. Paralogs in (a) subtelomeric regions of Plasmodium falciparum 3D7 tend to be young whereas paralogs in (b) internal regions tend to be old. The 14 chromosomes of P. falciparum are displayed as a circle with the red portions of each chromosome indicating subtelomeric regions. The lines within the circles link pairs of paralogs and the color indicates how many eukaryotic major clades (MC, see notes in fig. 1) contain those paralogs (i.e., older paralogs are more blue and younger paralogs are more green).

FIG. 4. Paralogs from gene family var (blue) do not exhibit significant differences in selection intensity (i.e., dN/dS) according to location, whereas paralogs from other gene families (red and black) show significant differences between subtelomeric and internal regions. This graph depicts the dN/dS ratio for three data sets of paralogs, with the x-axis representing the percentage of length of each chromosome, and the graph represents the summary across all 14 chromosomes. Levels of conservation vary among subtelomeric paralogs (red), internal paralogs (black), and paralogs of the gene family var (blue). Paralogs exhibit significantly different dN/dS ratios according to their location (Kolmogorov–Smirnov, P < 0.05), with subtelomeric paralogs having the highest ranges of dN/dS ratios and internal paralogs being under relatively constant levels of constraint. In contrast, dN/dS in var paralogs are not affected by location (RELAX, k = 1.22, P > 0.05; supplementary fig. S6, Supplementary Material online) and are under less functional constraint than most internal paralogs.

We conduct a phylogenetic analysis for protein sequences of var using RAxML (Stamatakis 2014) and model of evolution WAG+I+G+F. The model of evolution is inferred using Prottest3 (Darriba et al. 2011). The resulting phylogenetic tree is used to calculate a dN/dS value (free ratio model) using codeML-PAML (Yang 1997) and HyPhy (Kosakovsky Pond et al. 2005; supplementary fig. S6, Supplementary Material online). Difference of selection intensity between internal and subtelomeric copies is analyzed using the software RELAX from the Datamonkey package (Wertheim et al. 2015). This analysis is not performed in other antigenic gene families such as rif and stevor, because there are few rif and no stevor paralogs in the internal regions of the chromosomes.

Analysis of Putative Origin of Genes

We use two approaches to detect both recent and old interdomain LGT events in P. falciparum, a parametric approach based on nucleotide composition and a phylogenetic approach (supplementary table S3, Supplementary Material online). For the parametric approach, we calculate the average GC content per chromosome and per gene; when the average GC content in a gene is 2 SD away from the chromosomal average GC content, the gene is considered as a candidate laterally transferred gene. Then, we use BLAST to assess whether the gene is shared only between Apicomplexa and prokaryotes. For the phylogenetic approach, we explore the topology of gene trees with custom python scripts that incorporate the phylogenetic toolkit P4 (Foster 2004). In the topology of the gene trees, we identify potential interdomain LGTs when: (1) the gene trees contain only prokaryotes and Apicomplexa; and (2) Apicomplexa lineages are monophyletic and nested or sister to a clade of Bacteria/Archaea.

We also estimate putative origin of genes by counting presence and absence of taxa in gene trees. Archaea, Bacteria, or major clades of Eukaryotes are considered as present in a gene tree if at least 25% of their minor clades are present. Genes that have bacteria and at least 5 of the 6 eukaryotic major clades (considering orphans [“EE”, everything else] as a major clade) are candidate Endosymbiotic Gene Transfers (EGTs) from mitochondria. Genes that have bacteria and at least 2 major clades of photosynthetic eukaryotes (i.e., SAR, Archaeplastida, some orphans) are candidate EGTs from the plastid. Genes that have at least 5 eukaryotic major clades and no prokaryotes are candidate conserved genes from the Last Eukaryotic Common Ancestor (LECA). Genes present in Archaea and at least 5 eukaryotic major clades are candidate conserved genes from the Last Archaeal Common Ancestor (LACA, which includes the ancestor of eukaryotes, Williams et al. 2013; Hug et al. 2016). Finally, genes present in Archaea, Bacteria and at least 5 eukaryotic major clades have a putative origin in the Last Universal Common Ancestor (LUCA). All these genes were mapped (fig. 5 and supplementary fig. S7, Supplementary Material online).

FIG. 5. Exemplar phylogenomic map of the chromosomes 1, 2, and 7 according to the putative origin of genes. The arrows are candidate LGTs from prokaryotes to Apicomplexa. NIP: Not in pipeline, likely young genes, are in black. Candidate EGTs from plastid (in at least 2 photosynthetic major clades [i.e., Sr, Pl, EE]) and mitochondria (in at least 4 eukaryotic major clades and Bacteria) are in green and orange, respectively. Candidate conserved genes from LECA (in at least 4 eukaryotic major clades), LACA (in at least 4 eukaryotic major clades and Archaea), and LUCA (in at least 4 eukaryotic major clades, Archaea and Bacteria) are in magenta, blue, and red, respectively.

Results

Development of PhyloChromoMap

We build PhyloChromoMap to map the evolutionary history of genes along chromosomes using P. falciparum as a test case. In sum, we start with a collection of 13,104 multisequence alignments generated in Guidance (Sela et al. 2015) and corresponding gene trees built in RAxML (Stamatakis 2014), which includes up to 519 Eukaryotes, 303 Bacteria and 119 Archaea (Grant and Katz 2014; Katz and Grant 2015). PhyloChromoMap estimates the phylogenetic conservation for every gene based on the presence/absence of major and minor lineages in single gene trees (see Materials and Methods, table 1). We then use the function “image” in R (Team 2016) to map the phylogenetic conservation of each gene along each chromosome.

We use PhyloChromoMap to estimate the level of conservation of 5,336 protein coding genes along the chromosomes of P. falciparum strain 3D7. The results indicate that 21% of the genes of P. falciparum are present in at least some representatives of all major eukaryotic clades (i.e., SAR, Archaeplastida, Excavata, Amoebozoa, and Opisthokonta; table 1). Some genes are more ancient/conserved as they are also shared with Archaea (3%), Bacteria (4%), or both Archaea and Bacteria (5%). In contrast, 2% of the genes are more recent as they are present only in Plasmodium and other members of the SAR clade. Roughly 60% of “genes” (i.e., ORFs) in the P. falciparum genome are fast evolving, unique to Plasmodium and/or are misannotated; these genes are considered “NIP” in our analyses as they do not pass our criteria for generation of multisequence alignments and trees (see Materials and Methods, table 1).

We build phylogenomic maps of the 14 chromosomes of P. falciparum 3D7 to illuminate patterns of conservation across different chromosomal regions (fig. 1 and supplementary fig. S2, Supplementary Material online). Distinct patterns of conservation are found across chromosomes. For instance, whereas internal regions contain primarily conserved genes (i.e., genes with many homologs in other lineages), subtelomeric regions contain almost exclusively young genes. We recognize that “young” genes will include both fast evolving genes (i.e., those whose identity to homologs is very low) as well as genes with recent origins. We determine the length of “young” regions (i.e., those containing genes shared with members of two or fewer major eukaryotic clades, allowing for a single “interrupting” gene) and find that subtelomeric young regions average 134 kb (range of 85–218 kb; supplementary table S1, Supplementary Material online), and internal young regions average 106 kb (range of 91–141 kb; supplementary table S1, Supplementary Material online). On the other hand, centromeric regions do not exhibit any clear pattern of gene conservation as these regions harbor young genes in some chromosomes (e.g., chromosomes 3 and 7) and old/conserved genes in others (e.g., chromosomes 2 and 5; fig. 1 and supplementary fig. S2, Supplementary Material online).
To exemplify further the power of PhyloChromoMap, we also generate the phylogenomic map of the chromosomes of S. cerevisiae in order to validate our method (fig. 2 and supplementary fig. S3, Supplementary Material online). Overall this map shows a higher density of genes than we observe for P. falciparum and here too we do not see any pattern of gene conservation near the centromeres (fig. 2 and supplementary fig. S3, Supplementary Material online). Unlike the pattern for P. falciparum, we find no evidence of young subtelomeric regions except for chromosome I, which contains a dense central region flanked by low gene density in the distal regions (fig. 2). Previous studies reveal that chromosome I is rich in rRNA genes (Seligy and James 1977) and unexpressed pseudogenes, suggesting that these regions represent the yeast equivalent of heterochromatin (Bussey et al. 1995).

Synteny and Gene Content Analyses in Young Portions

We test for recombination between subtelomeric (ST) regions and internal (IN) young portions of chromosomes through analysis of synteny (supplementary fig. S4, Supplementary Material online) and comparison of gene content (supplementary fig. S5, Supplementary Material online). Chromosomes share blocks of sequences in conserved order (i.e., synteny blocks) in subtelomeric regions with a few exceptions (14ST3′, 14ST5′, 5ST3′, and 11ST3′; supplementary fig. S4, Supplementary Material online). Some subtelomeric regions (e.g., 13ST3′, 1ST5′, 11ST5′) have complex patterns of synteny, sharing many blocks with other subtelomeric regions. In contrast, internal young regions do not share synteny blocks. In addition, although there are some gene family members shared between young portions of internal and subtelomeric regions, subtelomeric regions tend to harbor more antigenic genes such as var, rif, and stevor (supplementary fig. S5, Supplementary Material online).

Analysis of SAR-Specific and Older Paralogs

We compare the patterns of evolution of gene family members across subtelomeric and internal regions of the chromosomes. We analyze both levels of conservation and selection intensity, the latter estimated by dN/dS ratios (Yang 1997; Kosakovsky Pond et al. 2005; Wertheim et al. 2015). Maps of subtelomeric and internal paralogs demonstrate that while subtelomeric regions tend to accumulate more “young” or SAR-specific paralogs, internal regions tend to accumulate “old” paralogs that are conserved in five or more major clades (fig. 3). There is also a difference in the patterns of selection acting on subtelomeric and internal paralogs: subtelomeric paralogs tend to have higher and more variable dN/dS ratios (mean 0.48, 95% CI 0.42–0.53) than paralogs in internal regions (mean 0.15, 95% CI 0.13–0.16). This implies that paralogs in internal regions are more consistently subject to functional constraint than subtelomeric paralogs.

Paralogs of the gene family var, which encode PfEMP1 antigens, exhibit different patterns than paralogs of other genes. The var genes are young as they are specific to P. falciparum and are also frequently found in internal regions (fig. 1 and supplementary fig. S5, Supplementary Material online). Moreover, dN/dS ratios are relatively high for var genes (mean 0.5, 95% CI 0.46–0.54; fig. 4 and supplementary fig. S6, Supplementary Material online). In contrast to patterns for other gene families, there are no significant differences among dN/dS ratios between internal and subtelomeric var paralogs based on RELAX, a hypothesis testing framework for detecting relaxed selection (Wertheim et al. 2015). This suggests that natural selection coupled with recombination contributes to levels of variation among var genes, which in turn are important in enabling these parasites to escape host immune systems (Kyes et al. 2007).

Putative Gene Origin

Given that our novel method connects the physical chromosomal map with the evolutionary history of genes sampled from across the tree of life, we can map putative origins of genes along chromosome maps. Using an approach based on differences of GC content, we detect one possible case of a recent interdomain LGT event involving P. falciparum and prokaryotes (supplementary table S3, Supplementary Material online).
This candidate gene (FIRA) is an interspersed repeat antigen, which is involved in drug resistance (Stahl et al. 1987). Moreover, analyzing single gene trees, we detect nine possible cases of ancient LGT events involving prokaryotes and Apicomplexa (supplementary table S3, Supplementary Material online). Here, we identify cases where apicomplexan sequences are nested within bacterial clades in single gene trees (see Materials and Methods). These genes have varied function and do not display any distinctive pattern of distribution in the chromosomes (supplementary fig. S7, Supplementary Material online).

We also assign genes along our chromosome map to categories of putative origins, which can then be used for further investigation. For example, genes that are widely distributed in bacteria, archaea and eukaryotes may date to LUCA whereas genes found only in photosynthetic eukaryotes (and sometimes also some bacteria) may represent cases of EGT from plastids (fig. 5 and supplementary fig. S7, Supplementary Material online). On the basis of an analysis of presence/absence of taxa on gene trees, we detect 179 genes that are candidate cases of EGT from plastids and 148 genes that are candidate cases of EGT from mitochondria (or bacteria). We also detect 844 genes that may be conserved from LECA, 151 from LACA and 238 putatively from LUCA (fig. 5 and supplementary fig. S7, Supplementary Material online).

Table 1
Summary of Conservation of Genes in Plasmodium falciparum

Description                                            Number of Occurrences
Total in P. falciparum 3D7                             5,336
  Recent (NIP): In fewer than 10 species in pipeline   3,220 (60%)
  Older (IP): Phylogenomic pipeline                    2,116 (40%)
Distribution
  In all major clades of Eukaryotes                    1,144 (21%)
  In at least 4 major clades of Eukaryotes             1,440 (27%)
  In at least 3 major clades of Eukaryotes             1,644 (31%)
  In prokaryotes                                       635 (12%)
  In Bacteria and Archaea                              267 (5%)
  In Bacteria and not in Archaea                       202 (4%)
  In Archaea and not in Bacteria                       166 (3%)

NOTE: NIP, not in our pipeline, which required ≥10 species to build phylogeny; IP, in pipeline. A sequence is considered to be present in a major clade only if it is present in at least 25% of the clades from the next taxonomic rank (e.g., Apicomplexans, Ciliates, Animals, Fungi); sequences in only a few lineages may be contaminants or the result of gene transfers. The five major clades are: SAR (Sr), Archaeplastida (Pl), Opisthokonta (Op), Amoebozoa (Am), and Excavata (Ex).

Discussion

Patterns of Gene Conservation in P. falciparum and Other Eukaryotes

Here, we present PhyloChromoMap, a novel method that combines the power of phylogenomics and genome mapping to explore patterns of karyotype, gene and molecular evolution. Using P. falciparum as a model, we characterize the level of evolutionary conservation in genes along all fourteen chromosomes. This analysis demonstrates that subtelomeric regions are young as compared with internal chromosome regions, which contain a mixture of conserved and lineage-specific genes (fig. 1 and supplementary fig. S2, Supplementary Material online). These data and the evidence of syntenic blocks among subtelomeres (supplementary fig. S4, Supplementary Material online) are consistent with the hypothesis that chromosomes of P. falciparum are actively swapping subtelomeric regions due to frequent ectopic recombination (Freitas-Junior et al. 2000; Scherf et al. 2001, 2008; Hernandez-Rivas et al. 2013). Analyses using fluorescent in situ hybridization reveal that chromosomes of P. falciparum attach to the nuclear periphery in clusters, suggesting that these clusters may facilitate recombination across subtelomeric regions of chromosomes (Freitas-Junior et al. 2000).

Differences in levels of conservation across chromosomes exist in diverse lineages from across the tree of life. For instance, the soil bacterium Streptomyces also has more conserved genes in the internal part of its linear chromosomes and younger genes towards chromosome ends (Bentley et al. 2002; Ikeda et al. 2003; Chater 2016). As is the case for P. falciparum, young genes in Streptomyces evolve by recombination, mostly with linear plasmids or segments of chromosomes from other Streptomyces (Chater 2016). Eukaryotic lineages such as the yeast Saccharomyces and the parasites Giardia intestinalis and E. cuniculi also tend to have younger genes toward the chromosome ends (Kellis et al. 2003; Ankarklev et al. 2015; Dia et al. 2016). Chromosome ends in these lineages are also subject to rearrangements such as translocations or duplications, which promote diversity in telomeric and subtelomeric gene families (Kellis et al. 2003; Ankarklev et al. 2015). In contrast, the highly conserved ribosomal DNA loci are found in subtelomeric regions of the nucleomorph (remnant nuclei from algal symbionts) genomes in cryptomonads and chlorarachniophytes (Lane and Archibald 2006; Lane et al. 2006; Silver et al. 2010; Tanifuji et al. 2014).

Chromosome Swapping of Subtelomeric Regions and Evolution of Gene Families

We analyze the relationship between level of conservation of duplicated genes and chromosomal location, and find that paralogs in subtelomeric regions tend to be young as compared with those throughout the rest of the chromosome map (fig. 3). Mechanisms underlying gene duplication in eukaryotes include unequal crossing over, transposition/retrotransposition and genome or segmental duplication (Hahn 2009). The use of PhyloChromoMap reveals that gene duplication occurs frequently during the shuffling of subtelomeric regions between chromosomes, leading to differences of gene content between subtelomeric and internal regions in P. falciparum (supplementary fig. S5, Supplementary Material online). For instance, subtelomeric regions in P. falciparum are enriched for rapidly evolving immune response gene families such as var, rif, and stevor (Freitas-Junior et al. 2000; Kyes et al. 2007; Hernandez-Rivas et al. 2013); hence, the evolution of these gene families is linked to the mechanisms of karyotype variation.

Given the differences in history of duplicated genes in subtelomeric versus internal regions, we evaluate the level of functional constraints/selection in paralogs along chromosome maps using dN/dS ratios (fig. 4 and supplementary fig. S6, Supplementary Material online). We compare patterns for the var gene family, whose members are deployed as the parasite seeks to evade host immune responses (Su et al. 1995; Scherf et al. 2008; Claessens et al. 2014), to paralogs of other gene families in both subtelomeric and internal regions (fig. 4). Overall, paralogs of subtelomeric gene families are under less selective constraint than paralogs of internal regions as evidenced by dN/dS ratios (fig. 4). The varying levels of constraint observed between subtelomeric and internal gene families suggest that the mechanism of ectopic recombination introduces mutations into gene family members. In contrast, patterns for var paralogs are not affected by their position in the chromosome (fig. 4 and supplementary fig. S6, Supplementary Material online). The more constant level of constraint in the var gene family indicates that other forces are at play in diversifying members of this particular gene family, independent of the location along the chromosome.

Acknowledgments

We thank J.R. Grant (Smith College) and R. Dorit (Smith College) for help with the phylogenomic pipeline and LGT analysis, respectively; and M.M. Fonseca (Centre of Marine and Environmental Research of the University of Porto) and members of the Katz lab for comments on an earlier version of the manuscript. This work was supported by National Institutes of Health grant 1R15GM113177-01, and National Science Foundation grants DEB-1541511 and DEB-1208741 to L.A.K.

Literature Cited

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol. 215(3):403–410.
Ankarklev J, et al. 2015. Comparative genomic analyses of freshly isolated Giardia intestinalis assemblage A isolates. BMC Genomics. 16:697.
Bentley SD, et al. 2002. Complete genome sequence of the model actinomycete Streptomyces coelicolor A3(2). Nature 417(6885):141–147.
Biderre C, et al. 1999. Molecular karyotype diversity in the microsporidian Encephalitozoon cuniculi. Parasitology 118(5):439–445.
Bowman S, et al. 1999. The complete nucleotide sequence of chromosome 3 of Plasmodium falciparum. Nature 400(6744):532–538.
Bussey H, et al. 1995. The nucleotide sequence of chromosome I from Saccharomyces cerevisiae. Proc Natl Acad Sci U S A. 92(9):3809–3813.
Carlton JM, et al. 2008. Comparative genomics of the neglected human malaria parasite Plasmodium vivax. Nature 455(7214):757–763.
Carlton JM, et al. 2002. Genome sequence and comparative analysis of the model rodent malaria parasite Plasmodium yoelii yoelii. Nature 419(6906):512–519.
Carlton JM, Galinski MR, Barnwell JW, Dame JB. 1999. Karyotype and synteny among the chromosomes of all four species of human malaria parasite. Mol Biochem Parasitol. 101(1–2):23–32.
Chater KF.
2016. Recent advances in understanding Streptomyces. F1000Res 5:2795. Claessens A, et al. 2014. Generation of antigenic diversity in Plasmodium Putative Origin of Each Gene of P. falciparum falciparum by structured rearrangement of var genes during mitosis. PLoS Genet. 10(12):e1004812. PhyloChromoMap enables exploration of the age and origin Darriba D, Taboada GL, Doallo R, Posada D. 2011. ProtTest 3: fast selection of genes along chromosomes. For example, we identify three of best-ﬁt models of protein evolution. Bioinformatics candidate LGTs (i.e., 1-cys peroxiredoxin, ribosomal protein 27(8):1164–1165. de Bruin D, Lanzer M, Ravetch JV. 1994. The polymorphic subtelo- L35 precursor and holo-ACP synthase, supplementary table meric regions of Plasmodium falciparum chromosomes contain S3, Supplementary Material online) as potential EGTs as arrays of repetitive sequence elements. Proc Natl Acad Sci U S they encode for apicoplastic functions such as fatty acid syn- A. 91(2):619–623. thesis. We can then map cases of EGT and LGT along chro- Delarbre S, Gatti S, Scaglia M, Drancourt M. 2001. Genetic diversity in the mosomes of P. falciparum 3D7 (ﬁg. 5 and supplementary ﬁg. microsporidian Encephalitozoon hellem demonstrated by pulsed-ﬁeld gel electrophoresis. J Eukaryot Microbiol. 48(4):471–474. S7, Supplementary Material online). We also bind genes into Dia N, et al. 2016. Subtelomere organization in the genome of the micro- categories based on possible age (ﬁg. 5): LUCA indicates sporidian Encephalitozoon cuniculi: patterns of repeated sequences genes in bacteria, archaea, and many eukaryotes, LACA are and physicochemical signatures. BMC Genomics. 17:34. genes only in Archaea and Eukaryotes, and LECA are genes Figueiredo L, Scherf A. 2005. Plasmodium telomeres and telomerase: the found only among diverse eukaryotes. Importantly, these cat- usual actors in an unusual scenario. Chromosome Res. 13:517–524. 
Figueiredo LM, Freitas-Junior LH, Bottius E, Olivo-Marin JC, Scherf A. 2002. egorizations should be viewed as putative—they indicate hy- A central role for Plasmodium falciparum subtelomeric regions in spa- potheses and future directions for study. tial positioning and telomere length regulation. EMBO J. 21:815–824. Figueiredo LM, Pirrit LA, Scherf A. 2000. Genomic organisation and chro- matin structure of Plasmodium falciparum chromosome ends. Mol Supplementary Material Biochem Parasitol. 106:169–174. Supplementary data areavailableat Genome Biology and Foster PG. 2004. Modeling compositional heterogeneity. Syst Biol. 53(3):485–495. Evolution online. 560 Genome Biol. Evol. 10(2):553–561 doi:10.1093/gbe/evy017 Advance Access publication January 22, 2018 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/553/4819265 by Ed 'DeepDyve' Gillespie user on 16 March 2018 PhyloChromoMap GBE Freitas-Junior LH, et al. 2000. Frequent ectopic recombination of virulence Pace T, Ponzi M, Scotti R, Frontali C. 1995. Structure and superstructure of factor genes in telomeric chromosome clusters of P-falciparum.Nature Plasmodium falciparum subtelomeric regions. Mol Biochem Parasitol. 407(6807):1018–1022. 69(2):257–268. Gardner MJ, et al. 1998. Chromosome 2 sequence of the human malaria Pain A, et al. 2008. The genome of the simian and human malaria parasite parasite Plasmodium falciparum. Science 282(5391):1126–1132. Plasmodium knowlesi. Nature 455(7214):799–803. Gardner MJ, et al. 2002. Genome sequence of the human malaria parasite Palenik B, et al. 2007. The tiny eukaryote Ostreococcus provides genomic Plasmodium falciparum. Nature 419(6906):498–511. insights into the paradox of plankton speciation. Proc Natl Acad Sci U S Ghedin E, et al. 2004. Gene synteny and evolution of genome architecture A. 104(18):7705–7710. in trypanosomatids. Mol Biochem Parasitol. 134(2):183–191. Parfrey LW, Lahr DJG, Katz LA. 2008. The dynamic nature of eukaryotic Grant JR, Katz LA. 2014. 
Building a phylogenomic pipeline for the eukaryotic genomes. Mol Biol Evol. 25(4):787–794. tree of life – addressing deep phylogenies with genome-scale data. PLoS Scherf A, Figueiredo LM, Freitas-Junior LH. 2001. Plasmodium telomeres: a Curr. 6:ecurrents.tol.c24b6054aebf3602748ac042ccc8f2e9. pathogen’s perspective. Curr Opin Microbiol. 4(4):409–414. Hahn MW. 2009. Distinguishing among evolutionary models for the main- Scherf A, Lopez-Rubio JJ, Riviere L. 2008. Antigenic variation in tenance of gene duplicates. J Hered. 100(5):605–617. Plasmodium falciparum. Annu Rev Microbiol. 62:445–470. Hall N, et al. 2002. Sequence of Plasmodium falciparum chromosomes 1, Schubert I, Vu GTH. 2016. Genome stability and evolution: attempting a 3–9 and 13. Nature 419(6906):527–531. holistic view. Trends Plant Sci. 21(9):749–757. Hernandez-Rivas R, Herrera-Solorio AM, Sierra-Miranda M, Delgadillo DM, Sela I, Ashkenazy H, Katoh K, Pupko T. 2015. GUIDANCE2: accurate Vargas M. 2013. Impact of chromosome ends on the biology and viru- detection of unreliable alignment regions accounting for the un- lence of Plasmodium falciparum. Mol Biochem Parasitol. 187(2): certainty of multiple parameters. Nucleic Acids Res. 43(W1): 121–128. W7–14. Hope RM. 1993. Selected features of marsupial genetics. Genetica 90(2– Seligy VL, James AP. 1977. Multiplicity and distribution of rDNA cistrons 3):165–180. among chromosome I and VII aneuploids of Saccharomyces cerevisiae. Hug LA, et al. 2016. A new view of the tree of life. Nat Microbiol. 1:16048. Exp Cell Res. 105(1):63–72. Ikeda H, et al. 2003. Complete genome sequence and comparative anal- Silver TD, Moore CE, Archibald JM. 2010. Nucleomorph ribosomal DNA ysis of the industrial microorganism Streptomyces avermitilis.Nat and telomere dynamics in chlorarachniophyte algae. J Eukaryot Biotechnol. 21(5):526–531. Microbiol. 57(6):453–459. Katz LA. 2012. Origin and diversiﬁcation of eukaryotes. Ann Rev Microbiol. Sites JW, Reed KM. 1994. 
Chromosomal evolution, speciation, and sys- 66:411–427. tematics – some relevant issues. Herpetologica 50:237–249. Katz LA, Grant JR. 2015. Taxon-rich phylogenomic analyses resolve the Soderlund C, Nelson W, Shoemaker A, Paterson A. 2006. SyMAP: a sys- eukaryotic tree of life and reveal the power of subsampling by sites. tem for discovering and viewing syntenic regions of FPC maps. Syst Biol. 64(3):406–415. Genome Res. 16(9):1159–1168. Kellis M, Patterson N, Endrizzi M, Birren B, Lander ES. 2003. Sequencing Stahl HD, Crewther PE, Anders RF, Kemp DJ. 1987. Structure of the FIRA and comparison of yeast species to identify genes and regulatory gene of Plasmodium falciparum. Mol Biol Med. 4(4):199–211. elements. Nature 423(6937):241–254. Stamatakis A. 2014. RAxML version 8: a tool for phylogenetic analysis Kissinger JC, DeBarry J. 2011. Genome cartography: charting the apicom- and post-analysis of large phylogenies. Bioinformatics 30(9): plexan genome. Trends Parasitol. 27(8):345–354. 1312–1313. Kooij TW, et al. 2005. A Plasmodium whole-genome synteny map: indels Su XZ, et al. 1995. The large diverse gene family var encodes proteins and synteny breakpoints as foci for species-speciﬁc genes. PLoS involved in cytoadherence and antigenic variation of Plasmodium fal- Pathog. 1(4):e44. ciparum-infected erythrocytes. Cell 82(1):89–100. Pond SLK, Frost SDW, Muse SV. 2005. HyPhy: hypothesis testing using Tanifuji G, et al. 2014. Nucleomorph and plastid genome sequences of the phylogenies. Bioinformatics 21(5):676–679. chlorarachniophyte Lotharella oceanica: convergent reductive evolu- Krzywinski M, et al. 2009. Circos: an information aesthetic for comparative tion and frequent recombination in nucleomorph-bearing algae. BMC genomics. Genome Res. 19(9):1639–1645. Genomics. 15:374. Kuo CH, Wares JP, Kissinger JC. 2008. The Apicomplexan whole-genome Team RC. 2016. R: a language and environment for statistical computing phylogeny: an analysis of incongruence among gene trees. 
Mol Biol [Internet]. Vienna, Austria. Evol. 25(12):2689–2698. Walther A, Hesselbart A, Wendland J. 2014. Genome sequence of Kyes SA, Kraemer SM, Smith JD. 2007. Antigenic variation in Plasmodium Saccharomyces carlsbergensis, the world’s ﬁrst pure culture lager falciparum: gene organization and regulation of the var multigene yeast. G3 (Bethesda) 4(5):783–793. family. Eukaryot Cell. 6(9):1511–1520. Wertheim JO, Murrell B, Smith MD, Kosakovsky Pond SL, Schefﬂer K. Lane CE, Archibald JM. 2006. Novel nucleomorph genome architecture in 2015. RELAX: detecting relaxed selection in a phylogenetic frame- the cryptomonad genus hemiselmis. J Eukaryot Microbiol. 53(6): work. Mol Biol Evol. 32(3):820–832. 515–521. Williams TA, Foster PG, Cox CJ, Embley TM. 2013. An archaeal origin of Lane CE, et al. 2006. Proceedings of the SMBE Tri-National Young eukaryotes supports only two primary domains of life. Nature Investigators’ Workshop 2005. Insight into the diversity and evo- 504(7479):231–236. lution of the cryptomonad nucleomorph genome. Mol Biol Evol. Yang Z. 1997. PAML: a program package for phylogenetic analysis by 23(5):856–865. maximum likelihood. Comput Appl Biosci. 13(5):555–556. Loidl J, Nairz K. 1997. Karyotype variability in yeast caused by nonallelic Zufall RA, Robinson T, Katz LA. 2005. Evolution of developmentally regu- recombination in haploid meiosis. Genetics 146(1):79–88. lated genome rearrangements in eukaryotes. J Exp Zool B-Mol Dev McGrath CL, Katz LA. 2004. Genome diversity in microbial eukaryotes. Evol. 304B(5):448–455. Trends Ecol Evol. 19(1):32–38. Oliverio AM, Katz LA. 2014. The dynamic nature of genomes across the tree of life. Genome Biol Evol. 6(3):482–488. Associate editor: John Archibald Genome Biol. Evol. 10(2):553–561 doi:10.1093/gbe/evy017 Advance Access publication January 22, 2018 561 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/553/4819265 by Ed 'DeepDyve' Gillespie user on 16 March 2018
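As an illustration only, the presence rule stated in the table note (a sequence counts as present in a major clade when it occurs in at least 25% of the clades at the next taxonomic rank) and the age categories described under "Putative Origin of Each Gene of P. falciparum" (LUCA, LACA, LECA) can be sketched in Python. This is not the PhyloChromoMap implementation. The function names, the toy list of SAR subclades, and the cutoff `min_euk_clades` used to decide what counts as "diverse" eukaryotes are all assumptions introduced for this sketch.

```python
from typing import Set


def present_in_major_clade(hit_subclades: Set[str],
                           all_subclades: Set[str],
                           min_fraction: float = 0.25) -> bool:
    """Table-note rule: present if found in >= 25% of next-rank clades."""
    if not all_subclades:
        return False
    found = len(hit_subclades & all_subclades)
    return found / len(all_subclades) >= min_fraction


def age_category(major_clades_present: Set[str],
                 in_bacteria: bool,
                 in_archaea: bool,
                 min_euk_clades: int = 3) -> str:
    """Bin a gene by putative age following the definitions in the text.

    ASSUMPTION: 'many' or 'diverse' eukaryotes is approximated here as
    presence in at least `min_euk_clades` of the five major clades
    (Sr, Pl, Op, Am, Ex); this cutoff is hypothetical.
    """
    if len(major_clades_present) < min_euk_clades:
        return "unassigned"
    if in_bacteria and in_archaea:
        return "LUCA"   # bacteria, archaea, and many eukaryotes
    if in_archaea:
        return "LACA"   # only archaea and eukaryotes
    if not in_bacteria:
        return "LECA"   # only among diverse eukaryotes
    return "unassigned"  # bacteria plus eukaryotes: not binned by age here


# Hypothetical toy list of next-rank clades within SAR (illustration only).
SAR_SUBCLADES = {"Apicomplexans", "Ciliates", "Dinoflagellates",
                 "Stramenopiles", "Rhizaria"}

one_hit = present_in_major_clade({"Apicomplexans"}, SAR_SUBCLADES)
two_hits = present_in_major_clade({"Apicomplexans", "Ciliates"}, SAR_SUBCLADES)
category = age_category({"Sr", "Op", "Pl"}, in_bacteria=False, in_archaea=True)
```

With these toy inputs, a gene found in one of five SAR subclades falls below the 25% threshold, whereas a gene found in two of five passes it. Genes present in bacteria and eukaryotes but not archaea are deliberately left unassigned in this sketch, because the text treats such distributions as candidate EGT or LGT cases rather than as an age category.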
Genome Biology and Evolution – Oxford University Press
Published: Feb 1, 2018