Multiple and Independent Phases of Transposable Element Amplification in the Genomes of Piciformes (Woodpeckers and Allies)

Multiple and Independent Phases of Transposable Element Amplification in the Genomes of... The small and conserved genomes of birds are likely a result of flight-related metabolic constraints. Recombination-driven deletions and minimal transposable element (TE) expansions have led to continually shrinking genomes during evolution of many lineages of volant birds. Despite constraints of genome size in birds, we identified multiple waves of amplification of TEs in Piciformes (wood- peckers, honeyguides, toucans, and barbets). Relative to other bird species’ genomic TE abundance (< 10% of genome), we found 17–30% TE content in multiple clades within Piciformes. Several families of the retrotransposon superfamily chicken repeat 1 (CR1) expanded in at least three different waves of activity. The most recent CR1 expansions (4–7% of genome) preceded bursts of diversification in the woodpecker clade and in the American barbetsþ toucans clade. Additionally, we identified several thousand polymorphic CR1 insertions (hundreds per individual) in three closely related woodpecker species. Woodpecker CR1 insertion polymorphisms are maintained at lower frequencies than single nucleotide polymorphisms indicating that purifying selection is acting against additional CR1 copies and that these elements impose a fitness cost on their host. These findings provide evidence of large scale and ongoing TE activity in avian genomes despite continual constraint on genome size. Key words: transposable elements, CR1, genomics, woodpeckers, diversification. Introduction in the lineage leading to saurischian dinosaurs (Organ Birds, the only extant saurischian dinosaurs, have compar- et al. 2007). atively small genome sizes (0.9–2.1 Gb) relative to other Genome size is, however, fluctuating across avian line- tetrapods (Gregory et al. 2007; Wright et al. 2014). Flying ages (Hughes and Hughes 1995; Wright et al. 2014; organisms—including bats, birds, and pterosaurs—have Kapusta et al. 2017), butatalowerrelativepacecom- convergently evolved constricted genome sizes (Organ pared with most vertebrates because genome size evolu- and Shedlock 2009), and a growing body of evidence tion scales positively with genome size (Oliver et al. 2007). indicates that avian genome size is constrained by the Continual avian genome contraction is driven by metabolic requirements of powered flight. A reduced ge- recombination-caused deletions of small to large nome size in birds corresponds with decreased intron size (> 10 kb) segments of the genome (Nam and Ellegren (Zhang and Edwards 2012), decreased red blood cell and 2012; Kapusta et al. 2017), and a remarkable paucity of nucleus size (Gregory 2001, 2002), and increased flight transposable elements (TEs) (< 10% genomic content) ability (Wright et al. 2014). Among avian ancestors, initial (Chalopin et al. 2015; Kapusta and Suh 2017; Sotero- genomic contraction appears to have occurred> 200 Ma Caio et al. 2017). Reduced TE activity likely limits genome The Author(s) 2018. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non- commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com Genome Biol. Evol. 10(6):1445–1456. doi:10.1093/gbe/evy105 Advance Access publication May 29, 2018 1445 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1445/5020728 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Manthey et al. GBE TE Superfamilies TE Families Super- (a) (b) (c)(d) CR1 Families % Genome Estimate % Genome Estimate families 02 10 0 30 02 4 6 J3_Pass E_Pass J2_Pass Y4 Legend Bee-eater Merops = Estimates from full CR1 genome assemblies Jacamar Galbula ERV3 Asian barbet Psilopogon R2 African barbet Satellite Lybius Other Toucan Ramphastos American barbet Families Capito Legend Honeyguide Indicator CR1 - J3_Pass CR1 - E_Pass Sasia CR1 - J2_Pass Nesoctites CR1 - Y4 Chrysocolaptes R2 - 1_PPu Colaptes Coraciiformes (e) n = 11 Sphyrapicus Galbuliformes n = 6 Picoides Toucans + Barbets n = 17 Piciformes Dendropicos Honeyguides n = 2 Dryobates Woodpeckers n = 17 03 15 0 03 15 0 03 15 0 03 15 0 1.0 1.6 2.2 46 other bird species Genome Size (Gbp) % Divergent from Consensus TEs Jarvis et al. 2014 FIG.1.—Summary of Piciformes genomic transposable element content. (a) Phylogeny of samples in our study [based on phylogeny of Jetz et al. (2012)]. (b) Summary of TE superfamily content. Stars indicate estimates from fully assembled genomes (Jarvis et al. 2014) and show that our estimates are conservative and underestimate genomic TE content. (c) Summary of select TE families genomic content. (d) Divergence curves of select CR1 families based on BLASTing raw TE matches to a portion of the 3 end of species- and TE-specific consensus sequences (see supplementary table S1, Supplementary Material online). The y axis is the relative frequency of percent divergence, and the families are on different scales to make them all legible. (e) Genome size estimates of all sampled clades (Gregory et al. 2007). Gray boxes highlight the Picidae clade (woodpeckers) and the clade including barbets and toucans. size increases and genomic rearrangements, resulting in in Piciformes. With whole-genome shotgun sequencing large-scale synteny in macrochromosomes across highly data, we used two data sets to characterize transposable ele- divergent lineages in the avian clade (Ellegren 2010). ment evolution in Piciformes. First, we sequenced representa- An exception to this trend is the downy woodpecker tives of all major lineages of Piciformes (fig. 1a)—including (Dryobates pubescens), the only species of Piciformes with a woodpeckers, honeyguides, toucans, and barbets—and a sin- complete genome sequence. A major genomic explosion gle lineage of Galbuliformes—a jacamar—to characterize ge- event, that is, massive expansion of TEs (Belyayev 2014), oc- nomic TE expansions and periods of activity. In addition, we curred somewhere in the evolutionary lineage leading to this sequenced several individuals of closely related woodpeckers species (Zhang et al. 2014); its genome contains greater than (Genus: Dryobates) to describe TE polymorphisms. The results a 2-fold increase in genomic TE content relative to all other indicate at least three genomic expansions of different families sequenced birds, mostly due to expansion of the superfamily of CR1, which preceded diversification rate shifts in two clades. chicken-repeat 1 (CR1), a type of non-long terminal repeat In addition, several thousand polymorphic TEs are found at low (LTR) retrotransposon. Whereas TE insertions may occasionally frequencies relative to single nucleotide polymorphisms, sug- impact some aspect of an organism’s phenotype, periods of gestive of ongoing negative selection against TE expansions. intense TE amplification can potentially instigate profound genomic changes and may promote evolutionary diversifica- tion (Jurka et al. 2007; Belyayev 2014; Hoffmann et al. 2015). Materials and Methods Coincident bursts of lineage diversification and genomic Sampling explosions of TEs have been identified in radiations of fishes To characterize TE diversity and genomic content across (de Boer et al. 2007)and mammals (Pascale et al. 1990; Piciformes’ genomes, we partially sequenced the genome of Pritham and Feschotte 2007; Ray et al. 2008). 13 species in the avian orders Piciformes and Galbuliformes at Some evidence supports that genome size increased low coverage (0.75 following conservative quality trimming, slightly in the ancestor of Piciformes (Wright et al. 2014); assuming a 1.2 Gb genome) and downloaded raw sequenc- this is suggestive of potential TE expansions across the entire ing reads for the downy woodpecker (Dryobates pubescens) clade outpacing deletions that maintain small genome size. To and the northern carmine bee-eater (Merops nubicus)from date, only one piciform genome has been sequenced, and its avian phylogenomics projects (Jarvis et al. 2014)(table 1). We sister lineage (Galbuliformes) lacks any genome sequence, implemented this sampling scheme to maximize phylogenetic precluding inference about scale, temporal dynamics, and diversity in woodpeckers (family: Picidae) and to represent evolutionary context and extent of genomic TE expansions 1446 Genome Biol. Evol. 10(6):1445–1456 doi:10.1093/gbe/evy105 Advance Access publication May 29, 2018 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1445/5020728 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Transposable Element Explosions in Piciformes GBE Table 1 Sampling Information Common Name Species Name Sample # Set # Reads Prop. Reads Northern Carmine Bee-eater Merops nubicus SRR958514 Phy 12,000,000 0.51 Bluish-fronted Jacamar Galbula cyanescens KU 24566 Phy 14,193,044 0.77 Green-eared Barbet Psilopogon faiostrictus KU 33324 Phy 15,599,647 0.61 Vieillot’s Barbet Lybius vieilloti KU 15540 Phy 19,029,009 0.64 Green-billed Toucan Ramphastos dicolorus KU 3649 Phy 18,531,397 0.64 Gilded Barbet Capito auratus KU 18855 Phy 12,855,219 0.74 Spotted Honeyguide Indicator maculatus KU 29101 Phy 17,445,057 0.78 Rufous Piculet Sasia abnormis KU 24421 Phy 16,261,827 0.69 Antillean Piculet Nesoctites micromegas KU 8153 Phy 15,253,876 0.77 Greater Flameback Chrysocolaptes lucidus KU 25777 Phy 11,283,426 0.72 Gilded Flicker Colaptes chrysoides KU 30078 Phy 16,480,947 0.84 Yellow-bellied Sapsucker Sphyrapicus varius KU 21911 Phy 17,981,465 0.82 Eurasian Three-toed Woodpecker Picoides tridactylus KU 30447 Phy 10,515,156 0.67 Brown-backed Woodpecker Dendropicos obsoletus KU 20039 Phy 9,945,685 0.69 Downy Woodpecker Dryobates pubescens SRR949789 Phy 12,000,000 0.66 Nuttall’s Woodpecker Dryobates nuttallii KU 29815 Pol 39,814,228 0.94 Nuttall’s Woodpecker Dryobates nuttallii KU 29816 Pol 59,121,354 0.96 Ladder-backed Woodpecker Dryobates scalaris KU 29797 Pol 74,403,829 0.97 Ladder-backed Woodpecker Dryobates scalaris KU 30061 Pol 53,177,745 0.96 Downy Woodpecker Dryobates pubescens KU 11987 Pol 57,308,585 0.96 Downy Woodpecker Dryobates pubescens KU 15939 Pol 60,567,542 0.97 NOTE.—KU samples from University of Kansas Biodiversity Institute and SRR numbers from NCBI sequence read archive. Phy, phylogenetic data set; Pol, polymorphisms data set; # reads, number raw reads in FASTQ files; Prop. Reads, proportion of raw reads retained after quality trimming. other major lineages in Piciformes. For previously sequenced and then sequenced all individuals on either an Illumina genomes, we used the reads from library preparations with a HiSeq2500 (101-bp paired-end) or Illumina HiSeq3000 target insert size of 500 bp for direct comparability to our (151-bp paired-end). Sequencing was performed at the newly sequenced data. All tissue samples were provided by New York University Abu Dhabi Center for Genomics and the University of Kansas Biodiversity Institute. We used the Systems Biology (HiSeq2500) or at the Oklahoma Medical bee-eater as an outgroup, a jacamar as a more closely related Research Foundation Clinical Genomics Center (HiSeq3000). outgroup, and included representatives of major lineages within Piciformes, including major woodpecker clades Phylogeny Data Set Quality Checking and Repeat (Dufort 2016; Shakya et al. 2017), honeyguides, three line- Database Creation ages of barbets, and toucans (fig. 1a and table 1). Hereafter, we refer to these samples as the “phylogeny data set.” We used Trimmomatic v0.36 (Bolger et al. 2014)to remove We also sequenced the genome of two samples each of adapter contamination and trim reads of low-quality bases. downy woodpecker (Dryobates pubescens), ladder-backed We trimmed all bases below a threshold of Q20 on the ends woodpecker (D. scalaris), and Nuttall’s woodpecker (D. nuttal- of all sequences, and trimmed regions of the sequences if the lii) at moderate-coverage (4–8 following quality trimming) to average quality dropped below Q30 in sliding windows of investigate CR1 insertion polymorphisms in closely related indi- 15 bp. Following trimming, reads were retained if they were viduals and species (table 1 and fig. 2a). The ladder-backed still 75 bp to limit spurious short-read matches to TEs. woodpecker and Nuttall’s woodpecker are sister species, All reads were filtered for mitochondrial DNA sequences with both together the sister lineage of the downy woodpecker using the bbsplit.sh perl script implemented as part of BBMap (Weibel and Moore 2002; Dufort 2016). Hereafter, we refer to v36.x (Bushnell 2014). We used several mitogenomes of birds these samples as the “polymorphisms data set.” from the orders Coraciiformes, Galbuliformes, and Piciformes We extracted genomic DNA from all tissue samples using a to filter reads. We standardized the number of basepairs (bp) magnetic bead DNA extraction protocol (Rohland and Reich per individual, trimmed every sequence to 75 bp using the 2012). We quantified all DNA extractions—for concentration FASTX toolkit (Gordon and Hannon 2010) and then used standardization—using Qubit Fluorometric Quantitation (Life Seqtk v1.2-r95-dirty (Li 2015) to randomly sample six million Technologies). We used standard Illumina library preparation paired-end reads from each individual, which we used for all for each sample (mean insert size between 500 and 700 bp), downstream analyses. Genome Biol. Evol. 10(6):1445–1456 doi:10.1093/gbe/evy105 Advance Access publication May 29, 2018 1447 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1445/5020728 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Manthey et al. GBE (a) Shared Polymorphic CR1 J3_Pass insertions ≤ 10% divergence D.n. D.n. D.s. D.n. + from consensus + polymorphic + + + D.s. + reference TEs D.s. D.p. D.p. D.p. 325 370 Total #: 3485 D. nuttallii 54 219 648 436 D. scalaris 857 124 D. pubescens (b)(c) = TEs = SNPs 0.8 Max. Div. 0.6 0.4 0.2 0.0 nuttallii pubescens scalaris FIG.2.—Patterns of CR1 polymorphisms in three closely related woodpeckers (genus Dryobates). (a) Patterns of CR1 J3_Pass insertion polymorphisms. Numbers on branches indicate fixed differences in that lineage. (b) Variation of insertion detection using four sensitivity settings in the MELT analyses. (c) Pairwise comparisons of polymorphisms* in four CR1 families relative to single nucleotide polymorphisms (SNPs). *¼ [Fixed / BETWEEN (Fixed þ Polymorphic )]. BETWEEN WITHIN We used REPdenovo (Chu et al. 2016) with the default to 123 TE families, largely CR1 elements or endogenous ret- settings to identify putative transposable elements and repet- roviruses. In total, the vast majority (85%) of newly identi- itive sequences from the raw sequencing reads. REPdenovo is fied repeats for our custom database shared homology with a kmer-based approach that identifies overrepresented kmers CR1 elements. in the genome. Following initial kmer identification, these sequences are joined where possible to create longer contigs. Summarizing Transposable Element Content in Piciformes With the output contigs from REPdenovo, we used a (Phylogeny Data Set) homology-based approach to identify putative transposable elements. First, we downloaded the vertebrate database of We used six million paired-end sequencing reads per individ- repetitive elements from RepBase v21.10, accessed on ual and our custom repeat database to identify the proportion October 10, 2016 (Jurka et al. 2005). We used of each species’ genome comprised of transposable elements BLASTþ v2.6.0 (Camacho et al. 2009)and the function (fig. 3). We used BLASTþ to match raw reads to our custom rmblastn to match REPdenovo contigs to previously identified repeat database with the following requirements: maximum transposable elements and repeats. rmblastn is a modified e-value of 0.01 (1e-2), a minimum of 60% identity, and less function of blastn for use with RepeatMasker (Smit et al. than three gaps in the alignment. We minimized the number 2015). For annotation to transposable elements, we required of gaps allowed to minimize matching of spurious short por- the matches to have a minimum of 60% identity to a previ- tions of reads matching multiple regions of TE sequences with ously annotated element, alignment length of 50 bp, and wide-spanning gaps. an e-value< 1e-6. All BLAST matches >90% identity were We performed the above BLASTþ search with the RepBase removed because raw reads would be mappable to original database (i.e., not including our newly annotated TEs) to see RepBase sequences. We created a custom repeat database for how including species-specific repetitive sequence informa- further analyses by combining the newly annotated repeats tion improved overall TE identification across low-coverage with the RepBase database. After de novo repeat identifica- genomes. Additionally, we summarized results with various tion and sequence annotation, we included 5,732 novel re- maximum e-values (1e-2, 1e-4, and 1e-6) to investigate sen- peat sequences in our modified repeat database. We sitivity of genome-wide TE content estimation to this identified novel repeat sequences with homology matches parameter. 1448 Genome Biol. Evol. 10(6):1445–1456 doi:10.1093/gbe/evy105 Advance Access publication May 29, 2018 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1445/5020728 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Singletons Private (n > 1) J3_Pass J2_Pass E_Pass Y4 # Polymorphic TEs Insertions 3% 6% 10% 15% Fixed Differences / Total Segregating Sites Create Repeat Database Transposable Element Explosions in Piciformes GBE Shotgun Sequencing Trimmomatic Illumina HiSeq 2500 Trim ends, remove adapter contamination, Piciformes remove quality < Q30 Low-coverage BBMap Remove reads from Phylogeny mitochondrial genome Workflow Seqtk Standardize # of reads per species 6 million PE reads REPdenovo kmer-based approach to Summary of reads Dataset of 6 million identify repetitive elements matching repetitive PE reads per species from raw reads and elements join into contigs BLAST+ BLAST+ Match sequences and rmblastn to vertebrate portions of sequences to RepBase and annotate the repeat database newly identified elements Combine RepBase Extracted reads Repeat Database database with newly matching repetitive (de novo + previously annotated repetitive elements elements described) BLAST+ Geneious Divergence Match sequences to Reference-based TE consensus each species TE curves (RepBase sequences) sequences consensus sequences alignment of TEs FIG.3.—Workflow to detect genomic TE content using low-coverage shotgun sequencing of Piciformes genomes. Detailed information for all steps is described in Materials and Methods. Green shaded ovals indicate output used for comparisons and are summarized in figure 1. We extracted all reads matching several common CR1 fam- Woodpeckers Y4 Barbets + Toucans ilies (supplementary table S1, Supplementary Material online) Honeyguides for downstream use. We used all reads that completely Jacamar, Bee-eater matched TEs (i.e., 75 bp alignment) to perform reference- J2_Pass guided assemblies in Geneious v9.1.7 (BioMatters Ltd.). The RepBase reference sequences for each subfamily were used to guide assembly. From these assemblies, we created consensus sequences for each subfamily using a minimum of 10 Jacamar J2_Pass + coverage and a 25% threshold to call the consensus sequence J3_Pass bases. For each species, we created a separate consensus se- quence for each subfamily. We aligned all consensus sequen- ces for each of the four common CR1 families using MAFFT J3_Pass (Katoh and Standley 2013), implemented in Geneious. For each of these families, we extracted the longest stretch of 0.08 E_Pass aligned consensus sequences that was covered by each species (supplementary table S1, Supplementary Material online). FIG.4.—Neighbor-joining phylogeny of consensus TE sequences from From these alignments, we created a neighbor-joining phylog- CR1 families with large expansions in Piciformes genomes. The scale bar eny using Tamura–Nei distance matrices in Geneious (fig. 4). indicates estimated sequence divergence across the phylogeny. Our next goal with the common CR1 consensus sequences was to identify trends in percent sequence divergence to in- vestigate amplification patterns. CR1 insertions in the genome truncations) of the complete element (Burch et al. 1993). initiate on the 3 end of the full length element, with most Indeed, we observed a majority of the raw reads matching 0 0 insertions only including a small portion (i.e., severe 5 their respective consensus sequences near the 3 end. Because Genome Biol. Evol. 10(6):1445–1456 doi:10.1093/gbe/evy105 Advance Access publication May 29, 2018 1449 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1445/5020728 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Sequencing Quality Checking Manthey et al. GBE we did not want to bias our results if there might be sequences are typical representatives of their respective sub- differential selective pressures (or ages) for shorter or longer families, and because they generally reflect the phylogenetic CR1 insertions, as well as to keep analyses consistent between patterns in their hosts (fig. 4). We preprocessed BAM and TE different CR1 families, we limited our analysis to sequences consensus files with the “Preprocess” and matching 500 bp regions near the 3 end of the full-length “BuildTransposonZIP” utilities of MELT, respectively. CR1 consensus sequences (supplementary table S1, When detecting polymorphic TEs in multiple individuals, Supplementary Material online). We used raw reads that MELT is a multistep process: 1) Discovery of potential TEs in were initially identified as one of these four CR1 families to each individual (“IndivAnalysis”); 2) Output of all individuals’ measure percent sequence divergence from species-specific TE discovery analyses are compiled together to identify puta- consensus sequences using BLASTþ. We created divergence tive insertion breakpoints in the reference genome histograms from this output with reads containing 50 bp (“GroupAnalysis”); 3) Genotyping of all insertions for each matching the CR1 consensus sequences (fig. 1d and supple- individual (“Genotype”); and 4) Filtering of genotype files mentary table S1, Supplementary Material online). and variant call format (VCF) file creation (“MakeVCF”). We limited the MELT analyses to scaffolds of at least two mega- bases (n¼ 179; 52% of genome) because highly frag- CR1 Polymorphism Detection in Three Woodpecker mented scaffolds and contigs influence the performance of Species (Polymorphisms Data Set) the program. These four steps were run for each CR1 family We used Trimmomatic v0.36 (Bolger et al. 2014) for quality separately. MELT allows different sensitivity thresholds for TE filtering of all raw reads with less conservative filtering relative detection by changing the maximum amount of allowed di- to the phylogeny data set because of higher representative vergence between putative polymorphic TEs and the consen- coverage. We removed adapter contamination and trimmed sus sequence. We used four maximum divergence levels (3%, reads of low-quality bases using the following filters: we re- 6%, 10%, and 15%) to have conservative and liberal TE poly- moved all bases below a quality of Q20 on the ends of all morphism estimates (i.e., different sensitivity levels). Any poly- sequences, and trimmed regions in sliding windows of 25 bp morphisms called by MELT for more than one of the CR1 if the average quality dropped below Q20. We filtered mito- families (breakpoints within 100 bp to allow some error) chondrial DNA with the bbsplit.sh perl script implemented as were matched to the most similar CR1 consensus sequence part of BBMap v36.x (Bushnell 2014) with the downy wood- and removed from other VCF files. We additionally postpro- pecker mitochondrial genome as a reference. cessed the MELT output using several filtering steps by remov- We used the BWA-MEM implementation of the Burrows– ing: 1) putative insertions near scaffold breakpoints ( 10 kb) Wheeler transform algorithm in BWA (Li and Durbin 2009)to to limit any effects of misassembly on scaffold edges; 2) TEs align all quality-filtered sequences to the downy woodpecker called with limited evidence, for example, imprecise break- (Dryobates pubescens) genome (April 28, 2014 version, points due to ambiguous alignment (MELT ASSESS flag 3); dx.doi.org/10.5524/101012). We used samtools v1.4.1 (Li 3) reads not passing MELT’s suggested quality filters (MELT et al. 2009) to convert the BWA output SAM file to BAM for- FILTER flag¼¼ PASS); and 4) any polymorphic TEs with miss- mat. Next, we cleaned, sorted, added read groups to, and re- ing calls for any individuals. moved duplicates from the BAM file for each individual using To compare CR1 polymorphisms to putatively neutral geno- Picard (available at: http://broadinstitute.github.io/picard). mic patterns, we called single nucleotide polymorphisms (SNPs) We used the final cleaned and filtered BAM files for each from all six Dryobates individuals. From the filtered and cleaned individual as input for detecting polymorphic transposable BAM file, we used the Genome Analysis Toolkit (GATK) elements with the Mobile Element Locator Tool v2.1.2 (McKenna et al. 2010) HaplotypeCaller to create an intermedi- (MELT) (Gardner et al. 2017). MELT uses unaligned and split ate genomic variant call format (gVCF) file for each individual. reads from BWA alignments, a reference genome, and con- All gVCFs were used as input in GATK to group genotype all sensus TE sequences to identify polymorphic TEs. Because SNPs and small indels. We filtered all output SNPs in VCFtools MELT relies on sequence similarity for identifying TEs, we v0.1.14 (Danecek et al. 2011) to keep those with the following could not use reference elements from RepBase. Instead, conditions: 1) sites with quality 20; 2) genotypes with qual- we used the downy woodpecker partial consensus sequences ity 20; 3) biallelic SNPs; 4) minimum depth per individual 5; of the CR1 subfamilies J3_Pass (length¼ 3,720 bp), J2_Pass 5) maximum mean depth per individual< 20; and 6) the site is (2,934 bp), E_Pass (2,655 bp), and Y4 (1,214 bp) that we cre- covered in all individuals. Lastly, we only used SNPs present on ated for use in the phylogeny data set. While these are not the scaffolds used in MELT analyses for comparisons. full-length consensus sequences, we are confident they are representative of their respective CR1 subfamilies in the three Analysis of Diversification Rates in Piciformes species of Dryobates woodpeckers investigated here because of the large number of sequencing reads used to construct We used the program Bayesian Analysis of Macroevolutionary the consensus sequences, the fact that the consensus Mixtures (BAMM) v2.5.0 (Rabosky 2014; Rabosky et al. 2014; 1450 Genome Biol. Evol. 10(6):1445–1456 doi:10.1093/gbe/evy105 Advance Access publication May 29, 2018 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1445/5020728 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Transposable Element Explosions in Piciformes GBE Mitchell and Rabosky 2017) to assess whether bursts of TE Material online), due to a large expansion of satellite DNA amplification coincide with diversification in Piciformes. For (10.5% of genome; fig. 1b). The majority of the satellite the input phylogeny for BAMM, we obtained data for all indi- BLAST matches in Nesoctites were similar to a 153-bp repeat viduals of the Coraciiformes, Galbuliformes, and Piciformes identifiedwithREPdenovo(supplementary fig. S2a, with available genetic data from birdtree.org (Hackettetal. Supplementary Material online). We manually investigated 2008; Jetz et al. 2012) using the Hackett 6670 OTUs data the satellite sequences to ensure these results were not spu- set. We used the highest clade credibility phylogeny from the rious. First, we aligned 2,000 75-bp sequence matches using 10,000 tree set as input. We used the BAMMtools (Rabosky MAFFT (Katoh and Standley 2013) to create a consensus se- et al. 2014) function “setBAMMpriors” to determine all input quence of the satellite. Within the consensus, we used mreps prior settings for BAMM, that is, using empirically optimized (Kolpakov et al. 2003) to identify any repetitive elements of parameterization. We ran two iterations of BAMM with 100 the consensus. We found a 10-bp motif (supplementary fig. million MCMC generations, using 10% burn-in (supplemen- S2c, Supplementary Material online) with 3.5 repeats in the tary fig. S1, Supplementary Material online) and an assumption consensus. We searched all long Nesoctites satellite BLAST of 50% missing taxa. This resulted in effective sample sizes for matches ( 70 bp, n¼ 594,155) for the repeat motif using all estimated paramaters> 3,500, and estimates of 23 or 25 the Biostrings R package (Pages et al. 2017), while allowing variations of credible shifts in the 95% posterior distribution for one mutation maximum (i.e., one mismatch) in the repeat the two runs. The major distinctive shift configurations output motif. from the two BAMM iterations were largely the same; the dif- ferent configurations had slight variation of node placement Results (supplementary fig. S1, Supplementary Material online). For Characterizing TE Dynamics with Whole-Genome Shotgun example, one rate shift varied between the nodes of the wood- Sequencing pecker clade Picinae to the Picidae clade, which is inclusive of We used whole-genome shotgun sequencing data to charac- Picinae and a few more genera (supplementary fig. S1, terize TE abundance and diversity across the genomes of Supplementary Material online). Piciformes. We identified a massive expansion of the retro- Recently, a critical analysis of BAMM (Moore et al. 2016) transposon superfamily CR1 (13–19% genomic content) in identified several putative flaws with the methodologies and the woodpecker and toucansþ barbets clades, but not in consistency of the program, to which BAMM’s authors pro- honeyguides or jacamars (fig. 1b and supplementary table vided a rebuttal (Rabosky et al. 2017): 1) the posterior of S2, Supplementary Material online). Minor expansions BAMM has strong prior sensitivity, 2) a faulty nondefault (< 2% genomic content) of endogenous retrovirus (ERV3) extinction-rate calculation, 3) strong impact of unobserved and woodpecker-specific R2 elements were also identified lineages on final rate shift results, and 4) unreliable results (fig. 1b and supplementary table S2, Supplementary when many shifts are present relative to the number of extant Material online). One piculet species (Nesoctites micromegas), tips in the phylogeny. Rabosky et al. (2017) showed that these from a monotypic genus endemic to Hispaniola, displayed a arguments were either flawed or did not have significant massive expansion of satellite sequence in its genome effects on final results when reasonable considerations were (fig. 1b). With further investigation of these satellite sequen- taken: 1) checking whether the prior distribution systemati- ces, we found the sequences exhibited an abundance of a 10- cally shaped the posterior distribution, 2) use of nonhidden, bp repeat motif (median¼ three motifs per 75 bp sequence, that is, default, extinction parameterization, 3) the use of sd¼ 1.11; supplementary fig. S2, Supplementary Material on- empirically parameterized values for the calculations, and 4) line). Compared with the two previously assembled the use of phylogenies with a large tip to rate-shift ratio (> 10: genomes—northern carmine bee-eater and downy 1). We followed the protocols outlined by (Rabosky et al. woodpecker—our approach underestimated genomic TE 2017) and additional guidelines at (http://bamm-project. content by 2–5% (fig. 1b), suggesting our approach in iden- org), checked the distribution of the prior versus the posterior tifying genomic TE content is conservative. Using different (supplementary fig. S1, Supplementary Material online), used sensitivities for detecting TEs, we found small changes in empirically derived values for calculations as determined using the absolute levels of genomic TE content (up to 4% less), BAMMtools (Rabosky et al. 2014), and used a phylogeny with but not in relative differences between species (supplemen- hundreds of tips relative to only a few possible variations of tary fig. S3, Supplementary Material online). It was critical to estimated rate shifts (supplementary fig. S1, Supplementary incorporate the custom repeats into the repeat search data- Material online). base; using only the RepBase consensus sequences resulted in missing a large proportion of TEs (1–15% genomic con- Nesoctites Satellite Investigation tent), which positively scaled with genomic TE content, that is, more TEs were missed in genomes with more TE content In Nesoctites, we found a distinct pattern in the rank- (supplementary fig. S4, Supplementary Material online). abundance curves (supplementary fig. S2, Supplementary Genome Biol. Evol. 10(6):1445–1456 doi:10.1093/gbe/evy105 Advance Access publication May 29, 2018 1451 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1445/5020728 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Manthey et al. GBE Overall, this approach could be implemented with few reads CR1s were largely short (< 1,000 bp) and truncated on the per individual for overall genomic TE content and TE family 5 end (supplementary fig. S7, Supplementary Material on- content; we recovered consistent results when we used sub- line); 20–25% of polymorphic CR1s were intronic and sets of 100,000 sequencing reads (supplementary fig. S5 and had a random orientation relative to their respective gene table S2, Supplementary Material online). insertion site (50% same orientation as gene) (supplemen- The CR1 expansion in Piciformes is due to at least three tary table S4, Supplementary Material online). These general waves of activity in different CR1 families (fig. 1c and supple- patterns of the polymorphic CR1s mirror the downy wood- mentary table S3, Supplementary Material online): 1) All pecker genome-wide patterns of CR1 truncation, intronic Piciformes had modest expansions (1–2% of genome abundance, and orientation within genes, suggesting that each; i.e., tens of millions of bp) of J2_Pass and Y4; 2) The current patterns of CR1 activity mirror general trends of woodpeckers had a major expansion of J3_Pass (4–5%) and long-term CR1 accumulation. We compared putatively neu- a minor expansion of E_Pass (1–2%); 3) Lastly, the bar- tral genetic variation (i.e., genome-wide SNPs) with CR1 betsþ toucans clade had mixed patterns, but massive overall J3_Pass polymorphisms, the most abundant TEs in woodpeck- expansions of J3_Passþ E_Pass (8–10%). These results spe- ers. We found a much smaller proportion of CR1 J3_Pass cifically exemplify the dynamic nature of the E_Pass and polymorphisms to be fixed between species than SNPs J3_Pass families; these two families each amplified at least (fig. 2c); similarly, J3_Pass insertions had a much higher fre- twice and to different magnitudes (fig. 1c)in the bar- quency of singletons than SNPs (supplementary fig. S8, betsþ toucans and woodpecker clades. Apart from major Supplementary Material online). We looked at patterns of CR1 genomic explosions, all Piciformes genomes are generally observed heterozygosity across all individuals to see if neutral homogenous in TE richness (types of TEs) and evenness (sim- genetic diversity was related to diversity of TE insertion poly- ilarity of TE abundances across species) (supplementary fig. morphisms and found that observed heterozygosity was neg- S6, Supplementary Material online). To estimate divergence atively correlated between SNPs and CR1 insertions, but curves for the J3_Pass, E_Pass, J2_Pass, and Y4 families, we positively correlated among all CR1 families (supplementary created species-specific and CR1 family specific consensus table S5, Supplementary Material online). Because selection is sequences for each species (fig. 4 and supplementary table more effective in large populations relative to smaller popu- S1, Supplementary Material online) and estimated divergence lations, this pattern is consistent with selection acting against of sequence reads from consensus sequences. The clade- CR1 proliferation, because genomes with higher genetic specific expansions of J3_Pass and E_Pass occurred simulta- diversity—and likely higher recent population sizes—have neously, as well as more recently than Piciformes-wide expan- lower diversity of CR1 polymorphisms. sions of J2_Pass and Y4 (fig. 1d and supplementary table S1, Supplementary Material online). Because large expansions of TEs, and associated genomic Discussion changes, may promote evolutionary diversification (Jurka et al. Multiple Transposable Element Genomic Explosions in 2007; Belyayev 2014; Hoffmann et al. 2015) we looked for Piciformes bursts of speciation that may coincide with major TE expansions Our results demonstrate multiple, independent waves of CR1 in Piciformes. Using BAMM, we found two consistent shifts in amplification in the Piciformes, with clade-specific expansions diversification rate that follow clade-specific TE expansions: of a similar magnitude to total TE content identified in most 1) increased diversification in the early radiation of either bird genomes (Zhang et al. 2014). The history of TE prolifer- Picidae (all woodpeckers) or Picinae (all woodpeckers excluding ation and activity in Piciformes is dynamic (fig. 1); we found several genera of wrynecks and piculets: Jynx, Picumnus, Sasia, different temporal patterns of CR1 activity. The J2_Pass and Verreauxia,and Vivia); and 2) increased diversification in the Y4 families were active and amplifying at low levels during early radiation of either both Ramphastidaeþ Capitonidae (tou- early evolution of the Piciformes clade, but lineage-specific cans and American barbets, respectively) or only Ramphastidae amplification of different CR1 subsets occurred subsequently. (supplementary fig. S1, Supplementary Material online). The honeyguides exhibit more TE content than most birds, but have relatively small expansions of CR1s compared with CR1 Polymorphisms in Three Species of Closely Related all other Piciformes (fig. 1a). The toucansþ barbets clade dis- Woodpeckers played the highest amplitude of CR1 expansions from the J3_Pass and E_Pass families, with distinct patterns in different We searched for the presence of polymorphic CR1 insertions lineages. Lastly, we found remarkably consistent patterns of in the downy woodpecker and two close relatives to deter- TE superfamily and family abundance across woodpeckers mine if CR1 is still active in woodpeckers. We also character- (fig. 1) with major amplifications of the CR1 J3_Pass and a ized several traits of polymorphic TEs to determine whether clade-specific amplification of an R2 element described from polymorphic CR1s show different patterns relative to those the downy woodpecker genome. One major deviant from the exhibited in the downy woodpecker genome. Polymorphic 1452 Genome Biol. Evol. 10(6):1445–1456 doi:10.1093/gbe/evy105 Advance Access publication May 29, 2018 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1445/5020728 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Transposable Element Explosions in Piciformes GBE general trend in woodpeckers was Nesoctites, with expansions toucansþ barbets (but also see other slightly different possi- of satellite sequences (fig. 1a and supplementary fig. S2, bilities in supplementary fig. S1, Supplementary Material on- Supplementary Material online). Nesoctites is the lone repre- line). Transposable elements have long been recognized to sentative of a monotypic genus and will need to be investigated have potential for rearranging genomes, with phenotypic further with high-coverage genomic sequencing and assem- consequences (Feschotte and Pritham 2007; Feschotte bly. Similarly, a recent study identified a large genomic propor- 2008), and it was proposed that TEs could promote speciation tion of satellites in a North American bird species, the northern when bursts of diversification occur simultaneous with or af- spotted owl (Strix occidentalis caurina)(Hanna et al. 2017). ter TE genomic amplification (Jurka et al. 2007; Belyayev Large and variable clade-specific expansions of CR1s cause 2014; Hoffmann et al. 2015). This provides a chicken or the distinct evolutionary genomic patterns and lineage-specific egg type of question (the egg definitely came before the increases in genomic content. Overall, genome size in chicken.): Does rapid diversification induce rampant TE activity Piciformes (fig. 1e) likely increased relative to the clade’s com- or does TE amplification promote increased speciation rates mon ancestor (Wright et al. 2014), varies slightly among and consequent rapid diversification? Piciformes lineages, and differs by several hundred Mb within During rapid diversification, small populations—due to lineages (fig. 1e). Recently, genome-wide analyses of bird ge- founder events or other causes—will be more influenced by nome assemblies indicated that genome size remains rela- genetic drift and have less effective selection against TE tively static through evolutionary time despite TE activity due expansions (Tollis and Boissinot 2013; Ruggiero et al. 2017; to mid- and large-scale deletions (Kapusta et al. 2017). Trizzino et al. 2017). Similarly, populations experiencing novel However, these genome-size stability analyses relied on highly stressors, such as during population expansions to new envi- fragmented genome assemblies and taxonomic sampling at ronments, could experience disruption of TE epigenetic re- the level of avian orders. pression (Zeh et al. 2009). In contrast, the TE-thrust Despite the genome size variation across birds (range 0.9 hypothesis (Oliver and Greene 2011, 2012) posits that the to 2.1 Gb), the current collection of avian genome assemblies continuum of TE activity—from extinct to highly abundant is largely homogenous in size (Zhang et al. 2014), with and active—has various evolutionary consequences. With a moderate-coverage assemblies generally ranging between moderate TE abundance and activity in a lineage’s genomes, 1.0 and 1.3 Gb. The discrepancy between genome-size esti- the TE-thrust hypothesis suggests those genomes are more mates and genome-assembly sizes suggests we are missing a dynamic, adaptable, and may lead to increased rates of spe- large portion of avian genome size dynamics through the in- ciation (Oliver and Greene 2011, 2012). ability to accurately annotate and assemble large-scale inter- While the present data do not allow us to distinguish be- spersed repetitive elements in avian genomes. Although no tween these two possibilities, it is unlikely by chance that rapid genome size estimate exists for the downy woodpecker to diversification and large TE expansions coincided. A large compare with its genome assembly, one of its two closest body of evidence indicates that TE activity contributed to relatives (Nuttall’s woodpecker, Dryobates nuttallii) has a ge- novel phenotypes (Feschotte 2008; Belyayev 2014; Trizzino nomesizeof 1.48 Gb (Gregory et al. 2007), which indicates et al. 2017), and TE expansions occurring coincidently with an incongruence of at least 200 Mb. These gaps in genome diversification is documented in fishes (de Boer et al. 2007), assemblies are likely due to the most difficult regions to as- mammals (Pascale et al. 1990; Pritham and Feschotte 2007; semble: highly repetitive elements. The misrepresentation of Ray et al. 2008), and now birds (this study). Because the mas- genome size dynamics with highly fragmented assemblies is sive CR1 expansions we identified here precede diversification exemplified by two recent surveys with high-quality and rate shifts in Piciformes, we hypothesize that TEs may have chromosome-scale genome assemblies (chicken and crow) promoted genomic novelty in the woodpecker and bar- that have identified large genomic regions of repetitive ele- betsþ toucans clades and contributed to their bursts of in- ments that were undetected and unaccounted for in short- creased diversification rate, although we cannot imply read sequencing assemblies (Kapusta and Suh 2017; causation and this hypothesis will require further investigation Weissensteiner et al. 2017). This suggests that while some using high-coverage genomic data for this group. genome size variation in birds may be moderated with mid- and large-scale deletions (Kapusta et al. 2017), we are also Woodpeckers Have Moderate Levels of CR1 missing a substantial portion of the picture with most current Polymorphisms genome assemblies. We identified thousands of polymorphic CR1 elements in three woodpecker species in the genus Dryobates (fig. 2). Massive CR1 Expansions Precede Shifts in Diversification CR1 polymorphisms may be present in these species due to Rate maintained ancestral polymorphism since their most recent We found that two large bursts of CR1 activity preceded common ancestor, that is, at least several Ma (Shakya et al. bursts of diversification in the woodpeckers and in the 2017) or continually maintained through low but continual Genome Biol. Evol. 10(6):1445–1456 doi:10.1093/gbe/evy105 Advance Access publication May 29, 2018 1453 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1445/5020728 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Manthey et al. GBE CR1 activity. The magnitude of CR1 activity in woodpeckers barren landscape of fossil TEs, but a landscape under the in- produced a number of insertion polymorphisms similar to fluence of prolonged and ongoing TE action. those in other vertebrates with greater genomic TE content. For example, the woodpecker CR1 polymorphism level we Conclusions identified is greater in magnitude than LINE-1 polymorphisms Using whole-genome sequencing of several species of in humans (Stewart et al. 2011), as would be expected be- Piciformes (woodpeckers, toucans, barbets, and honey- cause most LINE-1 elements are inactive in humans. In con- guides), we identified massive TE expansions from at least trast, the green anole (Anolis carolinensis)has more than three waves of activity, resulting in doubling or tripling geno- double the number of CR1 polymorphisms per individual mic TE content in multiple Piciformes lineages. Additionally, than we found in our samples (Ruggiero et al. 2017). we found thousands of polymorphic CR1 insertions in wood- Similar to the overall pattern of the woodpecker genome, peckers maintained at low frequencies relative to single nu- CR1 polymorphisms showed a trend of extreme 5 truncation cleotide polymorphisms. Our findings show that TE activity in (supplementary fig. S7, Supplementary Material online), con- woodpeckers and closely related species has been prolonged sistent with patterns in anoles (Ruggiero et al. 2017). These and remains ongoing. highly truncated CR1 elements are essentially inactive on in- sertion, and are likely the result of inefficient reverse transcrip- tion or host interruption of reverse transcription (Levin and Supplementary Material Moran 2011). Consequently, CR1s are precluded from over- Supplementary data are available at Genome Biology and activity but maintain some genomic proliferation with few Evolution online. functional copies producing mainly truncated copies. J3_Pass CR1 polymorphism insertions are maintained at lower frequencies than SNP minor alleles, similar to patterns in Authors’ Contributions anoles (Ruggiero et al. 2017). The low to moderate, but non- J.D.M. and S.B. conceived and designed the study. J.D.M. and zero, frequency of CR1 insertions is strongly suggestive of R.G.M. provided samples. S.B. and R.G.M. funded the study. ongoing selection against additional CR1 genomic expansions J.D.M. performed laboratory work, and all bioinformatics and in the woodpecker lineage. Selection against ongoing expan- data analyses. J.D.M. wrote the original draft of the manu- sions may be due to continued constraints on genome size, script and all authors contributed to the review and editing of or, alternatively, due to negative effects of high retrotranspo- the final version of the article. sition rates, for example, a high prevalence of ectopic recombination. If woodpecker genomes exhibit continued selection against high levels of CR1 proliferation, the question remains Acknowledgments how CR1s amplified to such high numbers in the recent evo- We thank Mark Robbins at the University of Kansas lutionary past. Two mechanisms potentially explain CR1 pro- Biodiversity Institute for assistance with all tissue loans. liferation throughout the woodpecker genomes. First, novel Much of the data analyses were performed using the high- features of CR1s, such as new promoter classes (as observed performance computing cluster at New York University in in human LINE-1 elements; Khan et al. 2006), start amplifying Abu Dhabi. We thank Marc Arnoux from the Genome Core because host genomes lack strong defense mechanisms Facility at NYUAD for assistance with genome sequencing. against TEs with newly evolved features. This hypothesis is This work was supported by New York University Abu doubtful for woodpeckers, because all CR1 consensus Dhabi (NYUAD) research funds AD180 (to S.B.) and sequences look like normal CR1s relative to outgroup sequen- National Science Foundation DEB-1241181 and DEB- ces (fig. 4). Alternatively, and more likely, population frag- 1557053 (to R.G.M.). The NYUAD Sequencing Core is sup- mentation, founder events, or speciation may result in small ported by NYUAD Research Institute grant G1205-1205A to population sizes, and subsequently allow CR1 accumulation the NYUAD Center for Genomics and Systems Biology. through drift (Jurka et al. 2011), whereas in larger populations CR1 accumulation would be limited by selection. In particular, accumulation of relatively more deleterious full-length CR1s Literature Cited could cause higher retrotransposition rates and cause geno- Belyayev A. 2014. Bursts of transposable elements as an evolutionary driv- mic CR1 copy number increases. ing force. J Evol Biol. 27(12):2573–2584. Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Altogether, CR1 polymorphisms at low to moderate levels Illumina sequence data. Bioinformatics 30(15):2114–2120. and selection against large CR1 genomic expansions in the Burch J, Davis DL, Haas NB. 1993. Chicken repeat 1 elements contain a woodpecker lineage, together with high levels of TE amplifi- pol-like open reading frame and belong to the non-long terminal re- cation across Piciformes, provide multiple lines of evidence peat class of retrotransposons. Proc Natl Acad Sci U S A. that woodpeckers’ and related species’ genomes are not a 90(17):8199–8203. 1454 Genome Biol. Evol. 10(6):1445–1456 doi:10.1093/gbe/evy105 Advance Access publication May 29, 2018 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1445/5020728 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Transposable Element Explosions in Piciformes GBE Bushnell B. 2014. BBMap: a Fast, Accurate, Splice-Aware Aligner. Berkeley Khan H, Smit A, Boissinot S. 2006. Molecular evolution and tempo of (CA): Ernest Orlando Lawrence Berkeley National Laboratory. amplification of human LINE-1 retrotransposons since the origin of Camacho C, et al. 2009. BLASTþ: architecture and applications. BMC primates. Genome Res. 16(1):78–87. Bioinformatics 10(1):421. Kolpakov R, Bana G, Kucherov G. 2003. mreps: efficient and flexible de- Chalopin D, Naville M, Plard F, Galiana D, Volff J-N. 2015. Comparative tection of tandem repeats in DNA. Nucleic Acids Res. analysis of transposable elements highlights mobilome diversity and 31(13):3672–3678. evolution in vertebrates. Genome Biol Evol. 7(2):567–580. Levin HL, Moran JV. 2011. Dynamic interactions between transposable Chu C, Nielsen R, Wu Y. 2016. REPdenovo: inferring de novo repeat motifs elements and their hosts. Nat Rev Genet. 12(9):615. from short sequence reads. PLoS One 11(3):e0150719. Li H. 2015. Seqtk: a toolkit for processing sequences in FASTA/Q formats. Danecek P, et al. 2011. The variant call format and VCFtools. Available from: https://github.com/lh3/seqtk. Li H, Durbin R. 2009. Fast and accurate short read alignment with Bioinformatics 27(15):2156–2158. de Boer JG, Yazawa R, Davidson WS, Koop BF. 2007. Bursts and horizontal Burrows–Wheeler transform. Bioinformatics 25(14):1754–1760. evolution of DNA transposons in the speciation of pseudotetraploid Li H, et al. 2009. The sequence alignment/map format and SAMtools. salmonids. BMC Genomics 8:422. Bioinformatics 25(16):2078–2079. Dufort MJ. 2016. An augmented supermatrix phylogeny of the avian fam- McKenna A, et al. 2010. The Genome Analysis Toolkit: a MapReduce ily Picidae reveals uncertainty deep in the family tree. Mol Phylogenet framework for analyzing next-generation DNA sequencing data. Evol. 94(Pt A):313–326. Genome Res. 20(9):1297–1303. Ellegren H. 2010. Evolutionary stasis: the stable chromosomes of birds. Mitchell JS, Rabosky DL. 2017. Bayesian model selection with BAMM: Trends Ecol Evol. 25(5):283–291. effects of the model prior on the inferred number of diversification Feschotte C. 2008. The contribution of transposable elements to the evo- shifts. Methods Ecol Evol. 8(1):37–46. lution of regulatory networks. Nat Rev Genet. 9(5):397. Moore BR, Ho ¨ hna S, May MR, Rannala B, Huelsenbeck JP. 2016. Critically Feschotte C, Pritham EJ. 2007. DNA transposons and the evolution of evaluating the theory and performance of Bayesian analysis of mac- eukaryotic genomes. Annu Rev Genet. 41:331–368. roevolutionary mixtures. Proc Natl Acad Sci U S A. Gardner EJ, et al. 2017. The Mobile Element Locator Tool (MELT): 113(34):9569–9574. Population-scale mobile element discovery and biology. Genome Nam K, Ellegren H. 2012. Recombination drives vertebrate genome con- Res. 27:1916–1929. traction. PLoS Genet. 8(5):e1002680. Gordon A, Hannon G. 2010. Fastx-toolkit: FASTQ/A short-reads prepro- Oliver KR, Greene WK. 2011. Mobile DNA and the TE-Thrust hypothesis: cessing tools. Available from: http://hannonlab.cshl.edu/fastx_toolkit. supporting evidence from the primates. Mobile DNA 2(1):8. Oliver KR, Greene WK. 2012. Transposable elements and viruses as factors Gregory TR. 2001. The bigger the C-value, the larger the cell: genome size and red blood cell size in vertebrates. Blood Cells Mol Dis. in adaptation and evolution: an expansion and strengthening of the 27(5):830–843. TE-thrust hypothesis. Ecol Evol. 2(11):2912–2933. Gregory TR. 2002. A bird’s-eye view of the C-value enigma: genome size, Oliver MJ, Petrov D, Ackerly D, Falkowski P, Schofield OM. 2007. The cell size, and metabolic rate in the class Aves. Evolution mode and tempo of genome size evolution in eukaryotes. Genome 56(1):121–130. Res. 17(5):594–601. Gregory TR, et al. 2007. Eukaryotic genome size databases. Nucleic Acids Organ CL, Shedlock AM. 2009. Palaeogenomics of pterosaurs and the Res. 35(Database issue):D332–D338. evolution of small genome size in flying vertebrates. Biol Lett. Hackett SJ, et al. 2008. A phylogenomic study of birds reveals their evo- 5(1):47–50. lutionary history. Science 320(5884):1763–1768. Organ CL, Shedlock AM, Meade A, Pagel M, Edwards SV. 2007. Origin of Hanna ZR, et al. 2017. Northern spotted owl (Strix occidentalis caurina) avian genome size and structure in non-avian dinosaurs. Nature genome: divergence with the barred owl (Strix varia) and characteri- 446(7132):180. zation of light-associated genes. Genome Biol Evol. 9(10):2522–2545. Pages H, Aboyoun P, Gentleman R, DebRoy S. 2017. String objects rep- Hoffmann FG, McGuire LP, Counterman BA, Ray DA. 2015. Transposable resenting biological sequences, and matching algorithms. R package elements and small RNAs: genomic fuel for species diversity. Mobile version 2.44.2. Genet Elem. 5(5):63–66. Pascale E, Valle E, Furano AV. 1990. Amplification of an ancestral mam- Hughes AL, Hughes MK. 1995. Small genomes for better flyers. Nature malian L1 family of long interspersed repeated DNA occurred just 377(6548):391. before the murine radiation. Proc Natl Acad Sci U S A. Jarvis ED, et al. 2014. Whole-genome analyses resolve early branches in 87(23):9481–9485. the tree of life of modern birds. Science 346(6215):1320–1331. Pritham EJ, Feschotte C. 2007. Massive amplification of rolling-circle trans- Jetz W, Thomas G, Joy J, Hartmann K, Mooers A. 2012. The global diver- posons in the lineage of the bat Myotis lucifugus.Proc Natl Acad Sci U sity of birds in space and time. Nature 491(7424):444–448. S A. 104(6):1895–1900. Jurka J, Bao W, Kojima KK. 2011. Families of transposable elements, pop- Rabosky DL. 2014. Automatic detection of key innovations, rate shifts, ulation structure and the origin of species. Biol Dir. 6:44. and diversity-dependence on phylogenetic trees. PLoS One Jurka J, et al. 2005. Repbase Update, a database of eukaryotic repetitive 9(2):e89543. elements. Cytogenet Genome Res. 110(1–4):462–467. Rabosky DL, et al. 2014. BAMMtools: an R package for the analysis of Jurka J, Kapitonov VV, Kohany O, Jurka MV. 2007. Repetitive sequences in evolutionary dynamics on phylogenetic trees. Methods Ecol Evol. complex genomes: structure and evolution. Annu Rev Genomics Hum 5(7):701–707. Genet. 8:241–259. Rabosky DL, Mitchell JS, Chang J. 2017. Is BAMM flawed? Theoretical and Kapusta A, Suh A. 2017. Evolution of bird genomes—a transposon’s-eye practical concerns in the analysis of multi-rate diversification models. view. Ann N Y Acad Sci. 1389(1):164–185. Syst Biol. 66(4):477–498. Kapusta A, Suh A, Feschotte C. 2017. Dynamics of genome size evolution Ray DA, et al. 2008. Multiple waves of recent DNA transposon activity in in birds and mammals. Proc Natl Acad Sci U S A. 114(8):E1460–E1469. the bat, Myotis lucifugus. Genome Res. 18(5):717–728. Katoh K, Standley DM. 2013. MAFFT multiple sequence alignment soft- Rohland N, Reich D. 2012. Cost-effective, high-throughput DNA sequenc- ware version 7: improvements in performance and usability. Mol Biol ing libraries for multiplexed target capture. Genome Res. Evol. 30(4):772–780. 22(5):939–946. Genome Biol. Evol. 10(6):1445–1456 doi:10.1093/gbe/evy105 Advance Access publication May 29, 2018 1455 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1445/5020728 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Manthey et al. GBE Ruggiero RP, Bourgeois Y, Boissinot S. 2017. LINE insertion polymorphisms Weibel AC, Moore WS. 2002. Molecular phylogeny of a cosmopolitan are abundant but at low frequencies across populations of Anolis group of woodpeckers (genus Picoides) based on COI and cyt b mi- carolinensis. Front Genet. 8:44. tochondrial gene sequences. Mol Phylogenet Evol. 22(1):65–75. Shakya SB, Fuchs J, Pons J-M, Sheldon FH. 2017. Tapping the woodpecker Weissensteiner MH, et al. 2017. Combination of short-read, long-read, tree for evolutionary insight. Mol Phylogenet Evol. 116:182–191. and optical mapping assemblies reveals large-scale tandem repeat Smit A, Hubley R, Green P. 2015. RepeatMasker Open-4.0. 2013–2015. arrays with population genetic implications. Genome Res. Institute for Systems Biology. Available from: http://repeatmasker/org. 27(5):697–708. Sotero-Caio CG, Platt RN, Suh A, Ray DA. 2017. Evolution and diversity of Wright NA, Gregory TR, Witt CC. 2014. Metabolic ‘engines’ of flight drive transposable elements in vertebrate genomes. Genome Biol Evol. genome size reduction in birds. Proc R Soc Lond B Biol Sci. 9(1):161–177. 281(1779):20132780. Stewart C, et al. 2011. A comprehensive map of mobile element insertion Zeh DW, Zeh JA, Ishida Y. 2009. Transposable elements and an epigenetic polymorphisms in humans. PLoS Genet. 7(8):e1002236. basis for punctuated equilibria. Bioessays 31(7):715–726. Tollis M, Boissinot S. 2013. Lizards and LINEs: selection and Zhang G, et al. 2014. Comparative genomic data of the Avian demography affect the fate of L1 retrotransposons in the genome Phylogenomics Project. GigaScience 3(1):1. of the green anole (Anolis carolinensis). Genome Biol Evol. Zhang Q, Edwards SV. 2012. The evolution of intron size in amniotes: a 5(9):1754–1768. role for powered flight? Genome Biol Evol. 4(10):1033–1043. Trizzino M, et al. 2017. Transposable elements are the primary source of novelty in primate gene regulation. Genome Res. 27(10):1623–1633. Associate editor: Helen Piontkivska 1456 Genome Biol. Evol. 10(6):1445–1456 doi:10.1093/gbe/evy105 Advance Access publication May 29, 2018 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1445/5020728 by Ed 'DeepDyve' Gillespie user on 20 June 2018 http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Genome Biology and Evolution Oxford University Press

Multiple and Independent Phases of Transposable Element Amplification in the Genomes of Piciformes (Woodpeckers and Allies)

Free
12 pages

Loading next page...
 
/lp/ou_press/multiple-and-independent-phases-of-transposable-element-amplification-wRKosnk6A7
Publisher
Oxford University Press
Copyright
© The Author(s) 2018. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
ISSN
1759-6653
eISSN
1759-6653
D.O.I.
10.1093/gbe/evy105
Publisher site
See Article on Publisher Site

Abstract

The small and conserved genomes of birds are likely a result of flight-related metabolic constraints. Recombination-driven deletions and minimal transposable element (TE) expansions have led to continually shrinking genomes during evolution of many lineages of volant birds. Despite constraints of genome size in birds, we identified multiple waves of amplification of TEs in Piciformes (wood- peckers, honeyguides, toucans, and barbets). Relative to other bird species’ genomic TE abundance (< 10% of genome), we found 17–30% TE content in multiple clades within Piciformes. Several families of the retrotransposon superfamily chicken repeat 1 (CR1) expanded in at least three different waves of activity. The most recent CR1 expansions (4–7% of genome) preceded bursts of diversification in the woodpecker clade and in the American barbetsþ toucans clade. Additionally, we identified several thousand polymorphic CR1 insertions (hundreds per individual) in three closely related woodpecker species. Woodpecker CR1 insertion polymorphisms are maintained at lower frequencies than single nucleotide polymorphisms indicating that purifying selection is acting against additional CR1 copies and that these elements impose a fitness cost on their host. These findings provide evidence of large scale and ongoing TE activity in avian genomes despite continual constraint on genome size. Key words: transposable elements, CR1, genomics, woodpeckers, diversification. Introduction in the lineage leading to saurischian dinosaurs (Organ Birds, the only extant saurischian dinosaurs, have compar- et al. 2007). atively small genome sizes (0.9–2.1 Gb) relative to other Genome size is, however, fluctuating across avian line- tetrapods (Gregory et al. 2007; Wright et al. 2014). Flying ages (Hughes and Hughes 1995; Wright et al. 2014; organisms—including bats, birds, and pterosaurs—have Kapusta et al. 2017), butatalowerrelativepacecom- convergently evolved constricted genome sizes (Organ pared with most vertebrates because genome size evolu- and Shedlock 2009), and a growing body of evidence tion scales positively with genome size (Oliver et al. 2007). indicates that avian genome size is constrained by the Continual avian genome contraction is driven by metabolic requirements of powered flight. A reduced ge- recombination-caused deletions of small to large nome size in birds corresponds with decreased intron size (> 10 kb) segments of the genome (Nam and Ellegren (Zhang and Edwards 2012), decreased red blood cell and 2012; Kapusta et al. 2017), and a remarkable paucity of nucleus size (Gregory 2001, 2002), and increased flight transposable elements (TEs) (< 10% genomic content) ability (Wright et al. 2014). Among avian ancestors, initial (Chalopin et al. 2015; Kapusta and Suh 2017; Sotero- genomic contraction appears to have occurred> 200 Ma Caio et al. 2017). Reduced TE activity likely limits genome The Author(s) 2018. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non- commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com Genome Biol. Evol. 10(6):1445–1456. doi:10.1093/gbe/evy105 Advance Access publication May 29, 2018 1445 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1445/5020728 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Manthey et al. GBE TE Superfamilies TE Families Super- (a) (b) (c)(d) CR1 Families % Genome Estimate % Genome Estimate families 02 10 0 30 02 4 6 J3_Pass E_Pass J2_Pass Y4 Legend Bee-eater Merops = Estimates from full CR1 genome assemblies Jacamar Galbula ERV3 Asian barbet Psilopogon R2 African barbet Satellite Lybius Other Toucan Ramphastos American barbet Families Capito Legend Honeyguide Indicator CR1 - J3_Pass CR1 - E_Pass Sasia CR1 - J2_Pass Nesoctites CR1 - Y4 Chrysocolaptes R2 - 1_PPu Colaptes Coraciiformes (e) n = 11 Sphyrapicus Galbuliformes n = 6 Picoides Toucans + Barbets n = 17 Piciformes Dendropicos Honeyguides n = 2 Dryobates Woodpeckers n = 17 03 15 0 03 15 0 03 15 0 03 15 0 1.0 1.6 2.2 46 other bird species Genome Size (Gbp) % Divergent from Consensus TEs Jarvis et al. 2014 FIG.1.—Summary of Piciformes genomic transposable element content. (a) Phylogeny of samples in our study [based on phylogeny of Jetz et al. (2012)]. (b) Summary of TE superfamily content. Stars indicate estimates from fully assembled genomes (Jarvis et al. 2014) and show that our estimates are conservative and underestimate genomic TE content. (c) Summary of select TE families genomic content. (d) Divergence curves of select CR1 families based on BLASTing raw TE matches to a portion of the 3 end of species- and TE-specific consensus sequences (see supplementary table S1, Supplementary Material online). The y axis is the relative frequency of percent divergence, and the families are on different scales to make them all legible. (e) Genome size estimates of all sampled clades (Gregory et al. 2007). Gray boxes highlight the Picidae clade (woodpeckers) and the clade including barbets and toucans. size increases and genomic rearrangements, resulting in in Piciformes. With whole-genome shotgun sequencing large-scale synteny in macrochromosomes across highly data, we used two data sets to characterize transposable ele- divergent lineages in the avian clade (Ellegren 2010). ment evolution in Piciformes. First, we sequenced representa- An exception to this trend is the downy woodpecker tives of all major lineages of Piciformes (fig. 1a)—including (Dryobates pubescens), the only species of Piciformes with a woodpeckers, honeyguides, toucans, and barbets—and a sin- complete genome sequence. A major genomic explosion gle lineage of Galbuliformes—a jacamar—to characterize ge- event, that is, massive expansion of TEs (Belyayev 2014), oc- nomic TE expansions and periods of activity. In addition, we curred somewhere in the evolutionary lineage leading to this sequenced several individuals of closely related woodpeckers species (Zhang et al. 2014); its genome contains greater than (Genus: Dryobates) to describe TE polymorphisms. The results a 2-fold increase in genomic TE content relative to all other indicate at least three genomic expansions of different families sequenced birds, mostly due to expansion of the superfamily of CR1, which preceded diversification rate shifts in two clades. chicken-repeat 1 (CR1), a type of non-long terminal repeat In addition, several thousand polymorphic TEs are found at low (LTR) retrotransposon. Whereas TE insertions may occasionally frequencies relative to single nucleotide polymorphisms, sug- impact some aspect of an organism’s phenotype, periods of gestive of ongoing negative selection against TE expansions. intense TE amplification can potentially instigate profound genomic changes and may promote evolutionary diversifica- tion (Jurka et al. 2007; Belyayev 2014; Hoffmann et al. 2015). Materials and Methods Coincident bursts of lineage diversification and genomic Sampling explosions of TEs have been identified in radiations of fishes To characterize TE diversity and genomic content across (de Boer et al. 2007)and mammals (Pascale et al. 1990; Piciformes’ genomes, we partially sequenced the genome of Pritham and Feschotte 2007; Ray et al. 2008). 13 species in the avian orders Piciformes and Galbuliformes at Some evidence supports that genome size increased low coverage (0.75 following conservative quality trimming, slightly in the ancestor of Piciformes (Wright et al. 2014); assuming a 1.2 Gb genome) and downloaded raw sequenc- this is suggestive of potential TE expansions across the entire ing reads for the downy woodpecker (Dryobates pubescens) clade outpacing deletions that maintain small genome size. To and the northern carmine bee-eater (Merops nubicus)from date, only one piciform genome has been sequenced, and its avian phylogenomics projects (Jarvis et al. 2014)(table 1). We sister lineage (Galbuliformes) lacks any genome sequence, implemented this sampling scheme to maximize phylogenetic precluding inference about scale, temporal dynamics, and diversity in woodpeckers (family: Picidae) and to represent evolutionary context and extent of genomic TE expansions 1446 Genome Biol. Evol. 10(6):1445–1456 doi:10.1093/gbe/evy105 Advance Access publication May 29, 2018 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1445/5020728 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Transposable Element Explosions in Piciformes GBE Table 1 Sampling Information Common Name Species Name Sample # Set # Reads Prop. Reads Northern Carmine Bee-eater Merops nubicus SRR958514 Phy 12,000,000 0.51 Bluish-fronted Jacamar Galbula cyanescens KU 24566 Phy 14,193,044 0.77 Green-eared Barbet Psilopogon faiostrictus KU 33324 Phy 15,599,647 0.61 Vieillot’s Barbet Lybius vieilloti KU 15540 Phy 19,029,009 0.64 Green-billed Toucan Ramphastos dicolorus KU 3649 Phy 18,531,397 0.64 Gilded Barbet Capito auratus KU 18855 Phy 12,855,219 0.74 Spotted Honeyguide Indicator maculatus KU 29101 Phy 17,445,057 0.78 Rufous Piculet Sasia abnormis KU 24421 Phy 16,261,827 0.69 Antillean Piculet Nesoctites micromegas KU 8153 Phy 15,253,876 0.77 Greater Flameback Chrysocolaptes lucidus KU 25777 Phy 11,283,426 0.72 Gilded Flicker Colaptes chrysoides KU 30078 Phy 16,480,947 0.84 Yellow-bellied Sapsucker Sphyrapicus varius KU 21911 Phy 17,981,465 0.82 Eurasian Three-toed Woodpecker Picoides tridactylus KU 30447 Phy 10,515,156 0.67 Brown-backed Woodpecker Dendropicos obsoletus KU 20039 Phy 9,945,685 0.69 Downy Woodpecker Dryobates pubescens SRR949789 Phy 12,000,000 0.66 Nuttall’s Woodpecker Dryobates nuttallii KU 29815 Pol 39,814,228 0.94 Nuttall’s Woodpecker Dryobates nuttallii KU 29816 Pol 59,121,354 0.96 Ladder-backed Woodpecker Dryobates scalaris KU 29797 Pol 74,403,829 0.97 Ladder-backed Woodpecker Dryobates scalaris KU 30061 Pol 53,177,745 0.96 Downy Woodpecker Dryobates pubescens KU 11987 Pol 57,308,585 0.96 Downy Woodpecker Dryobates pubescens KU 15939 Pol 60,567,542 0.97 NOTE.—KU samples from University of Kansas Biodiversity Institute and SRR numbers from NCBI sequence read archive. Phy, phylogenetic data set; Pol, polymorphisms data set; # reads, number raw reads in FASTQ files; Prop. Reads, proportion of raw reads retained after quality trimming. other major lineages in Piciformes. For previously sequenced and then sequenced all individuals on either an Illumina genomes, we used the reads from library preparations with a HiSeq2500 (101-bp paired-end) or Illumina HiSeq3000 target insert size of 500 bp for direct comparability to our (151-bp paired-end). Sequencing was performed at the newly sequenced data. All tissue samples were provided by New York University Abu Dhabi Center for Genomics and the University of Kansas Biodiversity Institute. We used the Systems Biology (HiSeq2500) or at the Oklahoma Medical bee-eater as an outgroup, a jacamar as a more closely related Research Foundation Clinical Genomics Center (HiSeq3000). outgroup, and included representatives of major lineages within Piciformes, including major woodpecker clades Phylogeny Data Set Quality Checking and Repeat (Dufort 2016; Shakya et al. 2017), honeyguides, three line- Database Creation ages of barbets, and toucans (fig. 1a and table 1). Hereafter, we refer to these samples as the “phylogeny data set.” We used Trimmomatic v0.36 (Bolger et al. 2014)to remove We also sequenced the genome of two samples each of adapter contamination and trim reads of low-quality bases. downy woodpecker (Dryobates pubescens), ladder-backed We trimmed all bases below a threshold of Q20 on the ends woodpecker (D. scalaris), and Nuttall’s woodpecker (D. nuttal- of all sequences, and trimmed regions of the sequences if the lii) at moderate-coverage (4–8 following quality trimming) to average quality dropped below Q30 in sliding windows of investigate CR1 insertion polymorphisms in closely related indi- 15 bp. Following trimming, reads were retained if they were viduals and species (table 1 and fig. 2a). The ladder-backed still 75 bp to limit spurious short-read matches to TEs. woodpecker and Nuttall’s woodpecker are sister species, All reads were filtered for mitochondrial DNA sequences with both together the sister lineage of the downy woodpecker using the bbsplit.sh perl script implemented as part of BBMap (Weibel and Moore 2002; Dufort 2016). Hereafter, we refer to v36.x (Bushnell 2014). We used several mitogenomes of birds these samples as the “polymorphisms data set.” from the orders Coraciiformes, Galbuliformes, and Piciformes We extracted genomic DNA from all tissue samples using a to filter reads. We standardized the number of basepairs (bp) magnetic bead DNA extraction protocol (Rohland and Reich per individual, trimmed every sequence to 75 bp using the 2012). We quantified all DNA extractions—for concentration FASTX toolkit (Gordon and Hannon 2010) and then used standardization—using Qubit Fluorometric Quantitation (Life Seqtk v1.2-r95-dirty (Li 2015) to randomly sample six million Technologies). We used standard Illumina library preparation paired-end reads from each individual, which we used for all for each sample (mean insert size between 500 and 700 bp), downstream analyses. Genome Biol. Evol. 10(6):1445–1456 doi:10.1093/gbe/evy105 Advance Access publication May 29, 2018 1447 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1445/5020728 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Manthey et al. GBE (a) Shared Polymorphic CR1 J3_Pass insertions ≤ 10% divergence D.n. D.n. D.s. D.n. + from consensus + polymorphic + + + D.s. + reference TEs D.s. D.p. D.p. D.p. 325 370 Total #: 3485 D. nuttallii 54 219 648 436 D. scalaris 857 124 D. pubescens (b)(c) = TEs = SNPs 0.8 Max. Div. 0.6 0.4 0.2 0.0 nuttallii pubescens scalaris FIG.2.—Patterns of CR1 polymorphisms in three closely related woodpeckers (genus Dryobates). (a) Patterns of CR1 J3_Pass insertion polymorphisms. Numbers on branches indicate fixed differences in that lineage. (b) Variation of insertion detection using four sensitivity settings in the MELT analyses. (c) Pairwise comparisons of polymorphisms* in four CR1 families relative to single nucleotide polymorphisms (SNPs). *¼ [Fixed / BETWEEN (Fixed þ Polymorphic )]. BETWEEN WITHIN We used REPdenovo (Chu et al. 2016) with the default to 123 TE families, largely CR1 elements or endogenous ret- settings to identify putative transposable elements and repet- roviruses. In total, the vast majority (85%) of newly identi- itive sequences from the raw sequencing reads. REPdenovo is fied repeats for our custom database shared homology with a kmer-based approach that identifies overrepresented kmers CR1 elements. in the genome. Following initial kmer identification, these sequences are joined where possible to create longer contigs. Summarizing Transposable Element Content in Piciformes With the output contigs from REPdenovo, we used a (Phylogeny Data Set) homology-based approach to identify putative transposable elements. First, we downloaded the vertebrate database of We used six million paired-end sequencing reads per individ- repetitive elements from RepBase v21.10, accessed on ual and our custom repeat database to identify the proportion October 10, 2016 (Jurka et al. 2005). We used of each species’ genome comprised of transposable elements BLASTþ v2.6.0 (Camacho et al. 2009)and the function (fig. 3). We used BLASTþ to match raw reads to our custom rmblastn to match REPdenovo contigs to previously identified repeat database with the following requirements: maximum transposable elements and repeats. rmblastn is a modified e-value of 0.01 (1e-2), a minimum of 60% identity, and less function of blastn for use with RepeatMasker (Smit et al. than three gaps in the alignment. We minimized the number 2015). For annotation to transposable elements, we required of gaps allowed to minimize matching of spurious short por- the matches to have a minimum of 60% identity to a previ- tions of reads matching multiple regions of TE sequences with ously annotated element, alignment length of 50 bp, and wide-spanning gaps. an e-value< 1e-6. All BLAST matches >90% identity were We performed the above BLASTþ search with the RepBase removed because raw reads would be mappable to original database (i.e., not including our newly annotated TEs) to see RepBase sequences. We created a custom repeat database for how including species-specific repetitive sequence informa- further analyses by combining the newly annotated repeats tion improved overall TE identification across low-coverage with the RepBase database. After de novo repeat identifica- genomes. Additionally, we summarized results with various tion and sequence annotation, we included 5,732 novel re- maximum e-values (1e-2, 1e-4, and 1e-6) to investigate sen- peat sequences in our modified repeat database. We sitivity of genome-wide TE content estimation to this identified novel repeat sequences with homology matches parameter. 1448 Genome Biol. Evol. 10(6):1445–1456 doi:10.1093/gbe/evy105 Advance Access publication May 29, 2018 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1445/5020728 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Singletons Private (n > 1) J3_Pass J2_Pass E_Pass Y4 # Polymorphic TEs Insertions 3% 6% 10% 15% Fixed Differences / Total Segregating Sites Create Repeat Database Transposable Element Explosions in Piciformes GBE Shotgun Sequencing Trimmomatic Illumina HiSeq 2500 Trim ends, remove adapter contamination, Piciformes remove quality < Q30 Low-coverage BBMap Remove reads from Phylogeny mitochondrial genome Workflow Seqtk Standardize # of reads per species 6 million PE reads REPdenovo kmer-based approach to Summary of reads Dataset of 6 million identify repetitive elements matching repetitive PE reads per species from raw reads and elements join into contigs BLAST+ BLAST+ Match sequences and rmblastn to vertebrate portions of sequences to RepBase and annotate the repeat database newly identified elements Combine RepBase Extracted reads Repeat Database database with newly matching repetitive (de novo + previously annotated repetitive elements elements described) BLAST+ Geneious Divergence Match sequences to Reference-based TE consensus each species TE curves (RepBase sequences) sequences consensus sequences alignment of TEs FIG.3.—Workflow to detect genomic TE content using low-coverage shotgun sequencing of Piciformes genomes. Detailed information for all steps is described in Materials and Methods. Green shaded ovals indicate output used for comparisons and are summarized in figure 1. We extracted all reads matching several common CR1 fam- Woodpeckers Y4 Barbets + Toucans ilies (supplementary table S1, Supplementary Material online) Honeyguides for downstream use. We used all reads that completely Jacamar, Bee-eater matched TEs (i.e., 75 bp alignment) to perform reference- J2_Pass guided assemblies in Geneious v9.1.7 (BioMatters Ltd.). The RepBase reference sequences for each subfamily were used to guide assembly. From these assemblies, we created consensus sequences for each subfamily using a minimum of 10 Jacamar J2_Pass + coverage and a 25% threshold to call the consensus sequence J3_Pass bases. For each species, we created a separate consensus se- quence for each subfamily. We aligned all consensus sequen- ces for each of the four common CR1 families using MAFFT J3_Pass (Katoh and Standley 2013), implemented in Geneious. For each of these families, we extracted the longest stretch of 0.08 E_Pass aligned consensus sequences that was covered by each species (supplementary table S1, Supplementary Material online). FIG.4.—Neighbor-joining phylogeny of consensus TE sequences from From these alignments, we created a neighbor-joining phylog- CR1 families with large expansions in Piciformes genomes. The scale bar eny using Tamura–Nei distance matrices in Geneious (fig. 4). indicates estimated sequence divergence across the phylogeny. Our next goal with the common CR1 consensus sequences was to identify trends in percent sequence divergence to in- vestigate amplification patterns. CR1 insertions in the genome truncations) of the complete element (Burch et al. 1993). initiate on the 3 end of the full length element, with most Indeed, we observed a majority of the raw reads matching 0 0 insertions only including a small portion (i.e., severe 5 their respective consensus sequences near the 3 end. Because Genome Biol. Evol. 10(6):1445–1456 doi:10.1093/gbe/evy105 Advance Access publication May 29, 2018 1449 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1445/5020728 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Sequencing Quality Checking Manthey et al. GBE we did not want to bias our results if there might be sequences are typical representatives of their respective sub- differential selective pressures (or ages) for shorter or longer families, and because they generally reflect the phylogenetic CR1 insertions, as well as to keep analyses consistent between patterns in their hosts (fig. 4). We preprocessed BAM and TE different CR1 families, we limited our analysis to sequences consensus files with the “Preprocess” and matching 500 bp regions near the 3 end of the full-length “BuildTransposonZIP” utilities of MELT, respectively. CR1 consensus sequences (supplementary table S1, When detecting polymorphic TEs in multiple individuals, Supplementary Material online). We used raw reads that MELT is a multistep process: 1) Discovery of potential TEs in were initially identified as one of these four CR1 families to each individual (“IndivAnalysis”); 2) Output of all individuals’ measure percent sequence divergence from species-specific TE discovery analyses are compiled together to identify puta- consensus sequences using BLASTþ. We created divergence tive insertion breakpoints in the reference genome histograms from this output with reads containing 50 bp (“GroupAnalysis”); 3) Genotyping of all insertions for each matching the CR1 consensus sequences (fig. 1d and supple- individual (“Genotype”); and 4) Filtering of genotype files mentary table S1, Supplementary Material online). and variant call format (VCF) file creation (“MakeVCF”). We limited the MELT analyses to scaffolds of at least two mega- bases (n¼ 179; 52% of genome) because highly frag- CR1 Polymorphism Detection in Three Woodpecker mented scaffolds and contigs influence the performance of Species (Polymorphisms Data Set) the program. These four steps were run for each CR1 family We used Trimmomatic v0.36 (Bolger et al. 2014) for quality separately. MELT allows different sensitivity thresholds for TE filtering of all raw reads with less conservative filtering relative detection by changing the maximum amount of allowed di- to the phylogeny data set because of higher representative vergence between putative polymorphic TEs and the consen- coverage. We removed adapter contamination and trimmed sus sequence. We used four maximum divergence levels (3%, reads of low-quality bases using the following filters: we re- 6%, 10%, and 15%) to have conservative and liberal TE poly- moved all bases below a quality of Q20 on the ends of all morphism estimates (i.e., different sensitivity levels). Any poly- sequences, and trimmed regions in sliding windows of 25 bp morphisms called by MELT for more than one of the CR1 if the average quality dropped below Q20. We filtered mito- families (breakpoints within 100 bp to allow some error) chondrial DNA with the bbsplit.sh perl script implemented as were matched to the most similar CR1 consensus sequence part of BBMap v36.x (Bushnell 2014) with the downy wood- and removed from other VCF files. We additionally postpro- pecker mitochondrial genome as a reference. cessed the MELT output using several filtering steps by remov- We used the BWA-MEM implementation of the Burrows– ing: 1) putative insertions near scaffold breakpoints ( 10 kb) Wheeler transform algorithm in BWA (Li and Durbin 2009)to to limit any effects of misassembly on scaffold edges; 2) TEs align all quality-filtered sequences to the downy woodpecker called with limited evidence, for example, imprecise break- (Dryobates pubescens) genome (April 28, 2014 version, points due to ambiguous alignment (MELT ASSESS flag 3); dx.doi.org/10.5524/101012). We used samtools v1.4.1 (Li 3) reads not passing MELT’s suggested quality filters (MELT et al. 2009) to convert the BWA output SAM file to BAM for- FILTER flag¼¼ PASS); and 4) any polymorphic TEs with miss- mat. Next, we cleaned, sorted, added read groups to, and re- ing calls for any individuals. moved duplicates from the BAM file for each individual using To compare CR1 polymorphisms to putatively neutral geno- Picard (available at: http://broadinstitute.github.io/picard). mic patterns, we called single nucleotide polymorphisms (SNPs) We used the final cleaned and filtered BAM files for each from all six Dryobates individuals. From the filtered and cleaned individual as input for detecting polymorphic transposable BAM file, we used the Genome Analysis Toolkit (GATK) elements with the Mobile Element Locator Tool v2.1.2 (McKenna et al. 2010) HaplotypeCaller to create an intermedi- (MELT) (Gardner et al. 2017). MELT uses unaligned and split ate genomic variant call format (gVCF) file for each individual. reads from BWA alignments, a reference genome, and con- All gVCFs were used as input in GATK to group genotype all sensus TE sequences to identify polymorphic TEs. Because SNPs and small indels. We filtered all output SNPs in VCFtools MELT relies on sequence similarity for identifying TEs, we v0.1.14 (Danecek et al. 2011) to keep those with the following could not use reference elements from RepBase. Instead, conditions: 1) sites with quality 20; 2) genotypes with qual- we used the downy woodpecker partial consensus sequences ity 20; 3) biallelic SNPs; 4) minimum depth per individual 5; of the CR1 subfamilies J3_Pass (length¼ 3,720 bp), J2_Pass 5) maximum mean depth per individual< 20; and 6) the site is (2,934 bp), E_Pass (2,655 bp), and Y4 (1,214 bp) that we cre- covered in all individuals. Lastly, we only used SNPs present on ated for use in the phylogeny data set. While these are not the scaffolds used in MELT analyses for comparisons. full-length consensus sequences, we are confident they are representative of their respective CR1 subfamilies in the three Analysis of Diversification Rates in Piciformes species of Dryobates woodpeckers investigated here because of the large number of sequencing reads used to construct We used the program Bayesian Analysis of Macroevolutionary the consensus sequences, the fact that the consensus Mixtures (BAMM) v2.5.0 (Rabosky 2014; Rabosky et al. 2014; 1450 Genome Biol. Evol. 10(6):1445–1456 doi:10.1093/gbe/evy105 Advance Access publication May 29, 2018 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1445/5020728 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Transposable Element Explosions in Piciformes GBE Mitchell and Rabosky 2017) to assess whether bursts of TE Material online), due to a large expansion of satellite DNA amplification coincide with diversification in Piciformes. For (10.5% of genome; fig. 1b). The majority of the satellite the input phylogeny for BAMM, we obtained data for all indi- BLAST matches in Nesoctites were similar to a 153-bp repeat viduals of the Coraciiformes, Galbuliformes, and Piciformes identifiedwithREPdenovo(supplementary fig. S2a, with available genetic data from birdtree.org (Hackettetal. Supplementary Material online). We manually investigated 2008; Jetz et al. 2012) using the Hackett 6670 OTUs data the satellite sequences to ensure these results were not spu- set. We used the highest clade credibility phylogeny from the rious. First, we aligned 2,000 75-bp sequence matches using 10,000 tree set as input. We used the BAMMtools (Rabosky MAFFT (Katoh and Standley 2013) to create a consensus se- et al. 2014) function “setBAMMpriors” to determine all input quence of the satellite. Within the consensus, we used mreps prior settings for BAMM, that is, using empirically optimized (Kolpakov et al. 2003) to identify any repetitive elements of parameterization. We ran two iterations of BAMM with 100 the consensus. We found a 10-bp motif (supplementary fig. million MCMC generations, using 10% burn-in (supplemen- S2c, Supplementary Material online) with 3.5 repeats in the tary fig. S1, Supplementary Material online) and an assumption consensus. We searched all long Nesoctites satellite BLAST of 50% missing taxa. This resulted in effective sample sizes for matches ( 70 bp, n¼ 594,155) for the repeat motif using all estimated paramaters> 3,500, and estimates of 23 or 25 the Biostrings R package (Pages et al. 2017), while allowing variations of credible shifts in the 95% posterior distribution for one mutation maximum (i.e., one mismatch) in the repeat the two runs. The major distinctive shift configurations output motif. from the two BAMM iterations were largely the same; the dif- ferent configurations had slight variation of node placement Results (supplementary fig. S1, Supplementary Material online). For Characterizing TE Dynamics with Whole-Genome Shotgun example, one rate shift varied between the nodes of the wood- Sequencing pecker clade Picinae to the Picidae clade, which is inclusive of We used whole-genome shotgun sequencing data to charac- Picinae and a few more genera (supplementary fig. S1, terize TE abundance and diversity across the genomes of Supplementary Material online). Piciformes. We identified a massive expansion of the retro- Recently, a critical analysis of BAMM (Moore et al. 2016) transposon superfamily CR1 (13–19% genomic content) in identified several putative flaws with the methodologies and the woodpecker and toucansþ barbets clades, but not in consistency of the program, to which BAMM’s authors pro- honeyguides or jacamars (fig. 1b and supplementary table vided a rebuttal (Rabosky et al. 2017): 1) the posterior of S2, Supplementary Material online). Minor expansions BAMM has strong prior sensitivity, 2) a faulty nondefault (< 2% genomic content) of endogenous retrovirus (ERV3) extinction-rate calculation, 3) strong impact of unobserved and woodpecker-specific R2 elements were also identified lineages on final rate shift results, and 4) unreliable results (fig. 1b and supplementary table S2, Supplementary when many shifts are present relative to the number of extant Material online). One piculet species (Nesoctites micromegas), tips in the phylogeny. Rabosky et al. (2017) showed that these from a monotypic genus endemic to Hispaniola, displayed a arguments were either flawed or did not have significant massive expansion of satellite sequence in its genome effects on final results when reasonable considerations were (fig. 1b). With further investigation of these satellite sequen- taken: 1) checking whether the prior distribution systemati- ces, we found the sequences exhibited an abundance of a 10- cally shaped the posterior distribution, 2) use of nonhidden, bp repeat motif (median¼ three motifs per 75 bp sequence, that is, default, extinction parameterization, 3) the use of sd¼ 1.11; supplementary fig. S2, Supplementary Material on- empirically parameterized values for the calculations, and 4) line). Compared with the two previously assembled the use of phylogenies with a large tip to rate-shift ratio (> 10: genomes—northern carmine bee-eater and downy 1). We followed the protocols outlined by (Rabosky et al. woodpecker—our approach underestimated genomic TE 2017) and additional guidelines at (http://bamm-project. content by 2–5% (fig. 1b), suggesting our approach in iden- org), checked the distribution of the prior versus the posterior tifying genomic TE content is conservative. Using different (supplementary fig. S1, Supplementary Material online), used sensitivities for detecting TEs, we found small changes in empirically derived values for calculations as determined using the absolute levels of genomic TE content (up to 4% less), BAMMtools (Rabosky et al. 2014), and used a phylogeny with but not in relative differences between species (supplemen- hundreds of tips relative to only a few possible variations of tary fig. S3, Supplementary Material online). It was critical to estimated rate shifts (supplementary fig. S1, Supplementary incorporate the custom repeats into the repeat search data- Material online). base; using only the RepBase consensus sequences resulted in missing a large proportion of TEs (1–15% genomic con- Nesoctites Satellite Investigation tent), which positively scaled with genomic TE content, that is, more TEs were missed in genomes with more TE content In Nesoctites, we found a distinct pattern in the rank- (supplementary fig. S4, Supplementary Material online). abundance curves (supplementary fig. S2, Supplementary Genome Biol. Evol. 10(6):1445–1456 doi:10.1093/gbe/evy105 Advance Access publication May 29, 2018 1451 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1445/5020728 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Manthey et al. GBE Overall, this approach could be implemented with few reads CR1s were largely short (< 1,000 bp) and truncated on the per individual for overall genomic TE content and TE family 5 end (supplementary fig. S7, Supplementary Material on- content; we recovered consistent results when we used sub- line); 20–25% of polymorphic CR1s were intronic and sets of 100,000 sequencing reads (supplementary fig. S5 and had a random orientation relative to their respective gene table S2, Supplementary Material online). insertion site (50% same orientation as gene) (supplemen- The CR1 expansion in Piciformes is due to at least three tary table S4, Supplementary Material online). These general waves of activity in different CR1 families (fig. 1c and supple- patterns of the polymorphic CR1s mirror the downy wood- mentary table S3, Supplementary Material online): 1) All pecker genome-wide patterns of CR1 truncation, intronic Piciformes had modest expansions (1–2% of genome abundance, and orientation within genes, suggesting that each; i.e., tens of millions of bp) of J2_Pass and Y4; 2) The current patterns of CR1 activity mirror general trends of woodpeckers had a major expansion of J3_Pass (4–5%) and long-term CR1 accumulation. We compared putatively neu- a minor expansion of E_Pass (1–2%); 3) Lastly, the bar- tral genetic variation (i.e., genome-wide SNPs) with CR1 betsþ toucans clade had mixed patterns, but massive overall J3_Pass polymorphisms, the most abundant TEs in woodpeck- expansions of J3_Passþ E_Pass (8–10%). These results spe- ers. We found a much smaller proportion of CR1 J3_Pass cifically exemplify the dynamic nature of the E_Pass and polymorphisms to be fixed between species than SNPs J3_Pass families; these two families each amplified at least (fig. 2c); similarly, J3_Pass insertions had a much higher fre- twice and to different magnitudes (fig. 1c)in the bar- quency of singletons than SNPs (supplementary fig. S8, betsþ toucans and woodpecker clades. Apart from major Supplementary Material online). We looked at patterns of CR1 genomic explosions, all Piciformes genomes are generally observed heterozygosity across all individuals to see if neutral homogenous in TE richness (types of TEs) and evenness (sim- genetic diversity was related to diversity of TE insertion poly- ilarity of TE abundances across species) (supplementary fig. morphisms and found that observed heterozygosity was neg- S6, Supplementary Material online). To estimate divergence atively correlated between SNPs and CR1 insertions, but curves for the J3_Pass, E_Pass, J2_Pass, and Y4 families, we positively correlated among all CR1 families (supplementary created species-specific and CR1 family specific consensus table S5, Supplementary Material online). Because selection is sequences for each species (fig. 4 and supplementary table more effective in large populations relative to smaller popu- S1, Supplementary Material online) and estimated divergence lations, this pattern is consistent with selection acting against of sequence reads from consensus sequences. The clade- CR1 proliferation, because genomes with higher genetic specific expansions of J3_Pass and E_Pass occurred simulta- diversity—and likely higher recent population sizes—have neously, as well as more recently than Piciformes-wide expan- lower diversity of CR1 polymorphisms. sions of J2_Pass and Y4 (fig. 1d and supplementary table S1, Supplementary Material online). Because large expansions of TEs, and associated genomic Discussion changes, may promote evolutionary diversification (Jurka et al. Multiple Transposable Element Genomic Explosions in 2007; Belyayev 2014; Hoffmann et al. 2015) we looked for Piciformes bursts of speciation that may coincide with major TE expansions Our results demonstrate multiple, independent waves of CR1 in Piciformes. Using BAMM, we found two consistent shifts in amplification in the Piciformes, with clade-specific expansions diversification rate that follow clade-specific TE expansions: of a similar magnitude to total TE content identified in most 1) increased diversification in the early radiation of either bird genomes (Zhang et al. 2014). The history of TE prolifer- Picidae (all woodpeckers) or Picinae (all woodpeckers excluding ation and activity in Piciformes is dynamic (fig. 1); we found several genera of wrynecks and piculets: Jynx, Picumnus, Sasia, different temporal patterns of CR1 activity. The J2_Pass and Verreauxia,and Vivia); and 2) increased diversification in the Y4 families were active and amplifying at low levels during early radiation of either both Ramphastidaeþ Capitonidae (tou- early evolution of the Piciformes clade, but lineage-specific cans and American barbets, respectively) or only Ramphastidae amplification of different CR1 subsets occurred subsequently. (supplementary fig. S1, Supplementary Material online). The honeyguides exhibit more TE content than most birds, but have relatively small expansions of CR1s compared with CR1 Polymorphisms in Three Species of Closely Related all other Piciformes (fig. 1a). The toucansþ barbets clade dis- Woodpeckers played the highest amplitude of CR1 expansions from the J3_Pass and E_Pass families, with distinct patterns in different We searched for the presence of polymorphic CR1 insertions lineages. Lastly, we found remarkably consistent patterns of in the downy woodpecker and two close relatives to deter- TE superfamily and family abundance across woodpeckers mine if CR1 is still active in woodpeckers. We also character- (fig. 1) with major amplifications of the CR1 J3_Pass and a ized several traits of polymorphic TEs to determine whether clade-specific amplification of an R2 element described from polymorphic CR1s show different patterns relative to those the downy woodpecker genome. One major deviant from the exhibited in the downy woodpecker genome. Polymorphic 1452 Genome Biol. Evol. 10(6):1445–1456 doi:10.1093/gbe/evy105 Advance Access publication May 29, 2018 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1445/5020728 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Transposable Element Explosions in Piciformes GBE general trend in woodpeckers was Nesoctites, with expansions toucansþ barbets (but also see other slightly different possi- of satellite sequences (fig. 1a and supplementary fig. S2, bilities in supplementary fig. S1, Supplementary Material on- Supplementary Material online). Nesoctites is the lone repre- line). Transposable elements have long been recognized to sentative of a monotypic genus and will need to be investigated have potential for rearranging genomes, with phenotypic further with high-coverage genomic sequencing and assem- consequences (Feschotte and Pritham 2007; Feschotte bly. Similarly, a recent study identified a large genomic propor- 2008), and it was proposed that TEs could promote speciation tion of satellites in a North American bird species, the northern when bursts of diversification occur simultaneous with or af- spotted owl (Strix occidentalis caurina)(Hanna et al. 2017). ter TE genomic amplification (Jurka et al. 2007; Belyayev Large and variable clade-specific expansions of CR1s cause 2014; Hoffmann et al. 2015). This provides a chicken or the distinct evolutionary genomic patterns and lineage-specific egg type of question (the egg definitely came before the increases in genomic content. Overall, genome size in chicken.): Does rapid diversification induce rampant TE activity Piciformes (fig. 1e) likely increased relative to the clade’s com- or does TE amplification promote increased speciation rates mon ancestor (Wright et al. 2014), varies slightly among and consequent rapid diversification? Piciformes lineages, and differs by several hundred Mb within During rapid diversification, small populations—due to lineages (fig. 1e). Recently, genome-wide analyses of bird ge- founder events or other causes—will be more influenced by nome assemblies indicated that genome size remains rela- genetic drift and have less effective selection against TE tively static through evolutionary time despite TE activity due expansions (Tollis and Boissinot 2013; Ruggiero et al. 2017; to mid- and large-scale deletions (Kapusta et al. 2017). Trizzino et al. 2017). Similarly, populations experiencing novel However, these genome-size stability analyses relied on highly stressors, such as during population expansions to new envi- fragmented genome assemblies and taxonomic sampling at ronments, could experience disruption of TE epigenetic re- the level of avian orders. pression (Zeh et al. 2009). In contrast, the TE-thrust Despite the genome size variation across birds (range 0.9 hypothesis (Oliver and Greene 2011, 2012) posits that the to 2.1 Gb), the current collection of avian genome assemblies continuum of TE activity—from extinct to highly abundant is largely homogenous in size (Zhang et al. 2014), with and active—has various evolutionary consequences. With a moderate-coverage assemblies generally ranging between moderate TE abundance and activity in a lineage’s genomes, 1.0 and 1.3 Gb. The discrepancy between genome-size esti- the TE-thrust hypothesis suggests those genomes are more mates and genome-assembly sizes suggests we are missing a dynamic, adaptable, and may lead to increased rates of spe- large portion of avian genome size dynamics through the in- ciation (Oliver and Greene 2011, 2012). ability to accurately annotate and assemble large-scale inter- While the present data do not allow us to distinguish be- spersed repetitive elements in avian genomes. Although no tween these two possibilities, it is unlikely by chance that rapid genome size estimate exists for the downy woodpecker to diversification and large TE expansions coincided. A large compare with its genome assembly, one of its two closest body of evidence indicates that TE activity contributed to relatives (Nuttall’s woodpecker, Dryobates nuttallii) has a ge- novel phenotypes (Feschotte 2008; Belyayev 2014; Trizzino nomesizeof 1.48 Gb (Gregory et al. 2007), which indicates et al. 2017), and TE expansions occurring coincidently with an incongruence of at least 200 Mb. These gaps in genome diversification is documented in fishes (de Boer et al. 2007), assemblies are likely due to the most difficult regions to as- mammals (Pascale et al. 1990; Pritham and Feschotte 2007; semble: highly repetitive elements. The misrepresentation of Ray et al. 2008), and now birds (this study). Because the mas- genome size dynamics with highly fragmented assemblies is sive CR1 expansions we identified here precede diversification exemplified by two recent surveys with high-quality and rate shifts in Piciformes, we hypothesize that TEs may have chromosome-scale genome assemblies (chicken and crow) promoted genomic novelty in the woodpecker and bar- that have identified large genomic regions of repetitive ele- betsþ toucans clades and contributed to their bursts of in- ments that were undetected and unaccounted for in short- creased diversification rate, although we cannot imply read sequencing assemblies (Kapusta and Suh 2017; causation and this hypothesis will require further investigation Weissensteiner et al. 2017). This suggests that while some using high-coverage genomic data for this group. genome size variation in birds may be moderated with mid- and large-scale deletions (Kapusta et al. 2017), we are also Woodpeckers Have Moderate Levels of CR1 missing a substantial portion of the picture with most current Polymorphisms genome assemblies. We identified thousands of polymorphic CR1 elements in three woodpecker species in the genus Dryobates (fig. 2). Massive CR1 Expansions Precede Shifts in Diversification CR1 polymorphisms may be present in these species due to Rate maintained ancestral polymorphism since their most recent We found that two large bursts of CR1 activity preceded common ancestor, that is, at least several Ma (Shakya et al. bursts of diversification in the woodpeckers and in the 2017) or continually maintained through low but continual Genome Biol. Evol. 10(6):1445–1456 doi:10.1093/gbe/evy105 Advance Access publication May 29, 2018 1453 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1445/5020728 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Manthey et al. GBE CR1 activity. The magnitude of CR1 activity in woodpeckers barren landscape of fossil TEs, but a landscape under the in- produced a number of insertion polymorphisms similar to fluence of prolonged and ongoing TE action. those in other vertebrates with greater genomic TE content. For example, the woodpecker CR1 polymorphism level we Conclusions identified is greater in magnitude than LINE-1 polymorphisms Using whole-genome sequencing of several species of in humans (Stewart et al. 2011), as would be expected be- Piciformes (woodpeckers, toucans, barbets, and honey- cause most LINE-1 elements are inactive in humans. In con- guides), we identified massive TE expansions from at least trast, the green anole (Anolis carolinensis)has more than three waves of activity, resulting in doubling or tripling geno- double the number of CR1 polymorphisms per individual mic TE content in multiple Piciformes lineages. Additionally, than we found in our samples (Ruggiero et al. 2017). we found thousands of polymorphic CR1 insertions in wood- Similar to the overall pattern of the woodpecker genome, peckers maintained at low frequencies relative to single nu- CR1 polymorphisms showed a trend of extreme 5 truncation cleotide polymorphisms. Our findings show that TE activity in (supplementary fig. S7, Supplementary Material online), con- woodpeckers and closely related species has been prolonged sistent with patterns in anoles (Ruggiero et al. 2017). These and remains ongoing. highly truncated CR1 elements are essentially inactive on in- sertion, and are likely the result of inefficient reverse transcrip- tion or host interruption of reverse transcription (Levin and Supplementary Material Moran 2011). Consequently, CR1s are precluded from over- Supplementary data are available at Genome Biology and activity but maintain some genomic proliferation with few Evolution online. functional copies producing mainly truncated copies. J3_Pass CR1 polymorphism insertions are maintained at lower frequencies than SNP minor alleles, similar to patterns in Authors’ Contributions anoles (Ruggiero et al. 2017). The low to moderate, but non- J.D.M. and S.B. conceived and designed the study. J.D.M. and zero, frequency of CR1 insertions is strongly suggestive of R.G.M. provided samples. S.B. and R.G.M. funded the study. ongoing selection against additional CR1 genomic expansions J.D.M. performed laboratory work, and all bioinformatics and in the woodpecker lineage. Selection against ongoing expan- data analyses. J.D.M. wrote the original draft of the manu- sions may be due to continued constraints on genome size, script and all authors contributed to the review and editing of or, alternatively, due to negative effects of high retrotranspo- the final version of the article. sition rates, for example, a high prevalence of ectopic recombination. If woodpecker genomes exhibit continued selection against high levels of CR1 proliferation, the question remains Acknowledgments how CR1s amplified to such high numbers in the recent evo- We thank Mark Robbins at the University of Kansas lutionary past. Two mechanisms potentially explain CR1 pro- Biodiversity Institute for assistance with all tissue loans. liferation throughout the woodpecker genomes. First, novel Much of the data analyses were performed using the high- features of CR1s, such as new promoter classes (as observed performance computing cluster at New York University in in human LINE-1 elements; Khan et al. 2006), start amplifying Abu Dhabi. We thank Marc Arnoux from the Genome Core because host genomes lack strong defense mechanisms Facility at NYUAD for assistance with genome sequencing. against TEs with newly evolved features. This hypothesis is This work was supported by New York University Abu doubtful for woodpeckers, because all CR1 consensus Dhabi (NYUAD) research funds AD180 (to S.B.) and sequences look like normal CR1s relative to outgroup sequen- National Science Foundation DEB-1241181 and DEB- ces (fig. 4). Alternatively, and more likely, population frag- 1557053 (to R.G.M.). The NYUAD Sequencing Core is sup- mentation, founder events, or speciation may result in small ported by NYUAD Research Institute grant G1205-1205A to population sizes, and subsequently allow CR1 accumulation the NYUAD Center for Genomics and Systems Biology. through drift (Jurka et al. 2011), whereas in larger populations CR1 accumulation would be limited by selection. In particular, accumulation of relatively more deleterious full-length CR1s Literature Cited could cause higher retrotransposition rates and cause geno- Belyayev A. 2014. Bursts of transposable elements as an evolutionary driv- mic CR1 copy number increases. ing force. J Evol Biol. 27(12):2573–2584. Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Altogether, CR1 polymorphisms at low to moderate levels Illumina sequence data. Bioinformatics 30(15):2114–2120. and selection against large CR1 genomic expansions in the Burch J, Davis DL, Haas NB. 1993. Chicken repeat 1 elements contain a woodpecker lineage, together with high levels of TE amplifi- pol-like open reading frame and belong to the non-long terminal re- cation across Piciformes, provide multiple lines of evidence peat class of retrotransposons. Proc Natl Acad Sci U S A. that woodpeckers’ and related species’ genomes are not a 90(17):8199–8203. 1454 Genome Biol. Evol. 10(6):1445–1456 doi:10.1093/gbe/evy105 Advance Access publication May 29, 2018 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1445/5020728 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Transposable Element Explosions in Piciformes GBE Bushnell B. 2014. BBMap: a Fast, Accurate, Splice-Aware Aligner. Berkeley Khan H, Smit A, Boissinot S. 2006. Molecular evolution and tempo of (CA): Ernest Orlando Lawrence Berkeley National Laboratory. amplification of human LINE-1 retrotransposons since the origin of Camacho C, et al. 2009. BLASTþ: architecture and applications. BMC primates. Genome Res. 16(1):78–87. Bioinformatics 10(1):421. Kolpakov R, Bana G, Kucherov G. 2003. mreps: efficient and flexible de- Chalopin D, Naville M, Plard F, Galiana D, Volff J-N. 2015. Comparative tection of tandem repeats in DNA. Nucleic Acids Res. analysis of transposable elements highlights mobilome diversity and 31(13):3672–3678. evolution in vertebrates. Genome Biol Evol. 7(2):567–580. Levin HL, Moran JV. 2011. Dynamic interactions between transposable Chu C, Nielsen R, Wu Y. 2016. REPdenovo: inferring de novo repeat motifs elements and their hosts. Nat Rev Genet. 12(9):615. from short sequence reads. PLoS One 11(3):e0150719. Li H. 2015. Seqtk: a toolkit for processing sequences in FASTA/Q formats. Danecek P, et al. 2011. The variant call format and VCFtools. Available from: https://github.com/lh3/seqtk. Li H, Durbin R. 2009. Fast and accurate short read alignment with Bioinformatics 27(15):2156–2158. de Boer JG, Yazawa R, Davidson WS, Koop BF. 2007. Bursts and horizontal Burrows–Wheeler transform. Bioinformatics 25(14):1754–1760. evolution of DNA transposons in the speciation of pseudotetraploid Li H, et al. 2009. The sequence alignment/map format and SAMtools. salmonids. BMC Genomics 8:422. Bioinformatics 25(16):2078–2079. Dufort MJ. 2016. An augmented supermatrix phylogeny of the avian fam- McKenna A, et al. 2010. The Genome Analysis Toolkit: a MapReduce ily Picidae reveals uncertainty deep in the family tree. Mol Phylogenet framework for analyzing next-generation DNA sequencing data. Evol. 94(Pt A):313–326. Genome Res. 20(9):1297–1303. Ellegren H. 2010. Evolutionary stasis: the stable chromosomes of birds. Mitchell JS, Rabosky DL. 2017. Bayesian model selection with BAMM: Trends Ecol Evol. 25(5):283–291. effects of the model prior on the inferred number of diversification Feschotte C. 2008. The contribution of transposable elements to the evo- shifts. Methods Ecol Evol. 8(1):37–46. lution of regulatory networks. Nat Rev Genet. 9(5):397. Moore BR, Ho ¨ hna S, May MR, Rannala B, Huelsenbeck JP. 2016. Critically Feschotte C, Pritham EJ. 2007. DNA transposons and the evolution of evaluating the theory and performance of Bayesian analysis of mac- eukaryotic genomes. Annu Rev Genet. 41:331–368. roevolutionary mixtures. Proc Natl Acad Sci U S A. Gardner EJ, et al. 2017. The Mobile Element Locator Tool (MELT): 113(34):9569–9574. Population-scale mobile element discovery and biology. Genome Nam K, Ellegren H. 2012. Recombination drives vertebrate genome con- Res. 27:1916–1929. traction. PLoS Genet. 8(5):e1002680. Gordon A, Hannon G. 2010. Fastx-toolkit: FASTQ/A short-reads prepro- Oliver KR, Greene WK. 2011. Mobile DNA and the TE-Thrust hypothesis: cessing tools. Available from: http://hannonlab.cshl.edu/fastx_toolkit. supporting evidence from the primates. Mobile DNA 2(1):8. Oliver KR, Greene WK. 2012. Transposable elements and viruses as factors Gregory TR. 2001. The bigger the C-value, the larger the cell: genome size and red blood cell size in vertebrates. Blood Cells Mol Dis. in adaptation and evolution: an expansion and strengthening of the 27(5):830–843. TE-thrust hypothesis. Ecol Evol. 2(11):2912–2933. Gregory TR. 2002. A bird’s-eye view of the C-value enigma: genome size, Oliver MJ, Petrov D, Ackerly D, Falkowski P, Schofield OM. 2007. The cell size, and metabolic rate in the class Aves. Evolution mode and tempo of genome size evolution in eukaryotes. Genome 56(1):121–130. Res. 17(5):594–601. Gregory TR, et al. 2007. Eukaryotic genome size databases. Nucleic Acids Organ CL, Shedlock AM. 2009. Palaeogenomics of pterosaurs and the Res. 35(Database issue):D332–D338. evolution of small genome size in flying vertebrates. Biol Lett. Hackett SJ, et al. 2008. A phylogenomic study of birds reveals their evo- 5(1):47–50. lutionary history. Science 320(5884):1763–1768. Organ CL, Shedlock AM, Meade A, Pagel M, Edwards SV. 2007. Origin of Hanna ZR, et al. 2017. Northern spotted owl (Strix occidentalis caurina) avian genome size and structure in non-avian dinosaurs. Nature genome: divergence with the barred owl (Strix varia) and characteri- 446(7132):180. zation of light-associated genes. Genome Biol Evol. 9(10):2522–2545. Pages H, Aboyoun P, Gentleman R, DebRoy S. 2017. String objects rep- Hoffmann FG, McGuire LP, Counterman BA, Ray DA. 2015. Transposable resenting biological sequences, and matching algorithms. R package elements and small RNAs: genomic fuel for species diversity. Mobile version 2.44.2. Genet Elem. 5(5):63–66. Pascale E, Valle E, Furano AV. 1990. Amplification of an ancestral mam- Hughes AL, Hughes MK. 1995. Small genomes for better flyers. Nature malian L1 family of long interspersed repeated DNA occurred just 377(6548):391. before the murine radiation. Proc Natl Acad Sci U S A. Jarvis ED, et al. 2014. Whole-genome analyses resolve early branches in 87(23):9481–9485. the tree of life of modern birds. Science 346(6215):1320–1331. Pritham EJ, Feschotte C. 2007. Massive amplification of rolling-circle trans- Jetz W, Thomas G, Joy J, Hartmann K, Mooers A. 2012. The global diver- posons in the lineage of the bat Myotis lucifugus.Proc Natl Acad Sci U sity of birds in space and time. Nature 491(7424):444–448. S A. 104(6):1895–1900. Jurka J, Bao W, Kojima KK. 2011. Families of transposable elements, pop- Rabosky DL. 2014. Automatic detection of key innovations, rate shifts, ulation structure and the origin of species. Biol Dir. 6:44. and diversity-dependence on phylogenetic trees. PLoS One Jurka J, et al. 2005. Repbase Update, a database of eukaryotic repetitive 9(2):e89543. elements. Cytogenet Genome Res. 110(1–4):462–467. Rabosky DL, et al. 2014. BAMMtools: an R package for the analysis of Jurka J, Kapitonov VV, Kohany O, Jurka MV. 2007. Repetitive sequences in evolutionary dynamics on phylogenetic trees. Methods Ecol Evol. complex genomes: structure and evolution. Annu Rev Genomics Hum 5(7):701–707. Genet. 8:241–259. Rabosky DL, Mitchell JS, Chang J. 2017. Is BAMM flawed? Theoretical and Kapusta A, Suh A. 2017. Evolution of bird genomes—a transposon’s-eye practical concerns in the analysis of multi-rate diversification models. view. Ann N Y Acad Sci. 1389(1):164–185. Syst Biol. 66(4):477–498. Kapusta A, Suh A, Feschotte C. 2017. Dynamics of genome size evolution Ray DA, et al. 2008. Multiple waves of recent DNA transposon activity in in birds and mammals. Proc Natl Acad Sci U S A. 114(8):E1460–E1469. the bat, Myotis lucifugus. Genome Res. 18(5):717–728. Katoh K, Standley DM. 2013. MAFFT multiple sequence alignment soft- Rohland N, Reich D. 2012. Cost-effective, high-throughput DNA sequenc- ware version 7: improvements in performance and usability. Mol Biol ing libraries for multiplexed target capture. Genome Res. Evol. 30(4):772–780. 22(5):939–946. Genome Biol. Evol. 10(6):1445–1456 doi:10.1093/gbe/evy105 Advance Access publication May 29, 2018 1455 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1445/5020728 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Manthey et al. GBE Ruggiero RP, Bourgeois Y, Boissinot S. 2017. LINE insertion polymorphisms Weibel AC, Moore WS. 2002. Molecular phylogeny of a cosmopolitan are abundant but at low frequencies across populations of Anolis group of woodpeckers (genus Picoides) based on COI and cyt b mi- carolinensis. Front Genet. 8:44. tochondrial gene sequences. Mol Phylogenet Evol. 22(1):65–75. Shakya SB, Fuchs J, Pons J-M, Sheldon FH. 2017. Tapping the woodpecker Weissensteiner MH, et al. 2017. Combination of short-read, long-read, tree for evolutionary insight. Mol Phylogenet Evol. 116:182–191. and optical mapping assemblies reveals large-scale tandem repeat Smit A, Hubley R, Green P. 2015. RepeatMasker Open-4.0. 2013–2015. arrays with population genetic implications. Genome Res. Institute for Systems Biology. Available from: http://repeatmasker/org. 27(5):697–708. Sotero-Caio CG, Platt RN, Suh A, Ray DA. 2017. Evolution and diversity of Wright NA, Gregory TR, Witt CC. 2014. Metabolic ‘engines’ of flight drive transposable elements in vertebrate genomes. Genome Biol Evol. genome size reduction in birds. Proc R Soc Lond B Biol Sci. 9(1):161–177. 281(1779):20132780. Stewart C, et al. 2011. A comprehensive map of mobile element insertion Zeh DW, Zeh JA, Ishida Y. 2009. Transposable elements and an epigenetic polymorphisms in humans. PLoS Genet. 7(8):e1002236. basis for punctuated equilibria. Bioessays 31(7):715–726. Tollis M, Boissinot S. 2013. Lizards and LINEs: selection and Zhang G, et al. 2014. Comparative genomic data of the Avian demography affect the fate of L1 retrotransposons in the genome Phylogenomics Project. GigaScience 3(1):1. of the green anole (Anolis carolinensis). Genome Biol Evol. Zhang Q, Edwards SV. 2012. The evolution of intron size in amniotes: a 5(9):1754–1768. role for powered flight? Genome Biol Evol. 4(10):1033–1043. Trizzino M, et al. 2017. Transposable elements are the primary source of novelty in primate gene regulation. Genome Res. 27(10):1623–1633. Associate editor: Helen Piontkivska 1456 Genome Biol. Evol. 10(6):1445–1456 doi:10.1093/gbe/evy105 Advance Access publication May 29, 2018 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1445/5020728 by Ed 'DeepDyve' Gillespie user on 20 June 2018

Journal

Genome Biology and EvolutionOxford University Press

Published: May 29, 2018

There are no references for this article.

You’re reading a free preview. Subscribe to read the entire article.


DeepDyve is your
personal research library

It’s your single place to instantly
discover and read the research
that matters to you.

Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.

All for just $49/month

Explore the DeepDyve Library

Search

Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly

Organize

Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.

Access

Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals.

Your journals are on DeepDyve

Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more.

All the latest content is available, no embargo periods.

See the journals in your area

DeepDyve

Freelancer

DeepDyve

Pro

Price

FREE

$49/month
$360/year

Save searches from
Google Scholar,
PubMed

Create lists to
organize your research

Export lists, citations

Read DeepDyve articles

Abstract access only

Unlimited access to over
18 million full-text articles

Print

20 pages / month

PDF Discount

20% off