Highly conserved noncoding elements (CNEs) constitute a signiﬁcant proportion of the genomes of multicellular eukaryotes. The function of most CNEs remains elusive, but growing evidence indicates they are under some form of purifying selection. Noncoding regions in many species also harbor large numbers of transposable element (TE) insertions, which are typically lineage speciﬁc and depleted in exons because of their deleterious effects on gene function or expression. However, it is currently unknown whether the landscape of TE insertions in noncoding regions is random or inﬂuenced by purifying selection on CNEs. Here, we combine comparative and population genomic data in Drosophila melanogaster to show that the abundance of TE insertions in intronic and intergenic CNEs is reduced relative to random expectation, supporting the idea that selective constraints on CNEs eliminate a proportion of TE insertions in noncoding regions. However, we ﬁnd no evidence for differences in the allele frequency spectra for polymorphic TE insertions in CNEs versus those in unconstrained spacer regions, suggesting that the distri- bution of ﬁtness effects acting on observable TE insertions is similar across different functional compartments in noncoding DNA. Our results provide evidence that selective constraints on CNEs contribute to shaping the landscape of TE insertion in eukaryotic genomes, and provide further evidence that CNEs are indeed functionally constrained and not simply mutational cold spots. Key words: noncoding DNA, conserved noncoding elements, purifying selection, transposable elements, Drosophila. Introduction (assumed to be TEs) were rarely found in transcribed regions Transposable elements (TEs) are mobile DNA sequences that (Aquadro et al. 1986, 1992; Langley and Aquadro 1987; make up a signiﬁcant fraction of the genomes of many multi- Langley et al. 1988; Schaeffer et al. 1988). Subsequent anal- cellular organisms (Elliott and Gregory 2015), including the ysis of the D. melanogaster reference genome showed that model insect species, Drosophila melanogaster (Bergman the paucity of TEs in transcribed regions is primarily driven by a et al. 2006; Sackton et al. 2010). TEs are powerful mutagenic strong depletion of the number of TE insertions in exons com- agents that can affect gene expression and genome stability bined with a weaker reduction in introns (Kaminker et al. and are responsible for the majority of spontaneous muta- 2002; Lipatov et al. 2005). More recently, analysis of popula- tions in D. melanogaster (Ashburner et al. 2005). While many tion genomic data has conﬁrmed that TE insertions are rare in gaps remain in our understanding of the mechanisms that D. melanogaster exonic regions (Koﬂeretal. 2012; Cridland control TE content in natural populations of D. melanogaster, et al. 2013; Zhuang et al. 2014). it is well established that TE insertions in the D. melanogaster The underrepresentation of TEs in D. melanogaster exons is genome are largely restricted to noncoding DNA (reviewed in most likely explained by natural selection purging TE insertions Barron et al. 2014). Early restriction mapping studies on a that disrupt gene function from natural populations (Lipatov limited number of loci revealed that large DNA insertions et al. 2005; Petrov et al. 2011; Koﬂer et al. 2012). In general, The Author(s) 2018. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. Genome Biol. Evol. 10(6):1533–1545. doi:10.1093/gbe/evy104 Advance Access publication May 29, 2018 1533 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1533/5020726 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Manee et al. GBE TE insertions in D. melanogaster are thought to be under data sets allow unprecedented insight into this fundamental some form of purifying selection, based on the observation question by providing large samples of naturally occurring TE that they typically have lower allele frequencies relative to insertions mapped at nucleotide-level resolution in individual single nucleotide polymorphisms (SNPs) from the same pop- strains of D. melanogaster. We initially establish that signals ulation (Aquadro et al. 1986, 1992; Langley and Aquadro consistent with purifying selection can be observed in our 1987; Langley et al. 1988; Schaeffer et al. 1988; Cridland data by conﬁrming past results that the abundance of TE et al. 2013). However, few studies have directly investigated insertions is strongly reduced in exonic regions and weakly the allele frequency distribution of TE insertions in exons, prin- reduced in intronic regions relative to intergenic regions. We cipally because of the lack of data, and past studies have led then show that the abundance of TE insertions is signiﬁcantly to mixed conclusions. Analysis of a small sample of exonic TE reduced in both intronic and intergenic CNEs relative to ran- insertions using a pool-PCR strategy suggested their allele fre- dom expectations. However, the proportion of TE insertions quencies did not differ substantially from nonexonic TE inser- we estimate to be eliminated from CNEs is lower than in tions with similar genomic properties (Lipatov et al. 2005). In exonic regions, suggesting that many noncoding functional contrast, genome-wide analysis using pool-seq data showed a elements harbor TE insertion mutations in natural populations reduction in median allele frequencies for TE insertions in exons of D. melanogaster. We also ﬁnd no evidence that the derived relative those found in intergenic regions (Koﬂer et al. 2012). allele frequency (DAF) spectrum for TE insertions inferred from In addition to effects manifest at the RNA or protein level, it strain-speciﬁc genome sequences varies signiﬁcantly across is also possible TE insertions may be selected for their effects different functional compartments of the D. melanogaster at the DNA level in noncoding regions, for example, by inter- genome. Our results are consistent with selective constraints fering with cis-regulatory elements (Geyer et al. 1990; Lerman on CNEs in noncoding regions acting to inﬂuence the land- and Feder 2005). While comprehensive cis-regulatory maps scapeofTEinsertion in D. melanogaster. However, our results for D. melanogaster remain incomplete (Negre et al. 2011; also suggest that the evolutionary forces governing the abun- Arnold et al. 2013), it is well established that highly conserved dance of TE insertions in different functional compartments of noncoding elements (CNEs) are an abundant component of the D. melanogaster genome may be decoupled from those the D. melanogaster genome (Bergman and Kreitman 2001; controlling the allele frequency of observable TE insertions in Siepel et al. 2005) and that CNEs often overlap with known natural populations. cis-regulatory elements (Emberly et al. 2003; Brody et al. 2012). It has been estimated that 30–40% of sites in D. mel- Materials and Methods anogaster noncoding DNA are contained in CNEs (Siepel et al. Data Sets 2005), and population genetic analysis has shown that these Annotations of genes (ﬂyBaseGene), TEs in the reference ge- CNEs are maintained by purifying selection (Casillas et al. nome (rmsk), and conserved elements (phastCons15way) on 2007). Thus, CNEs represent an abundant class of noncoding Release 5 (dm3) coordinates of the D. melanogaster genome features under purifying selection that may inﬂuence the were obtained from UCSC Genome Browser (Siepel et al. 2005; landscape of TE insertions. Previous work showed that artiﬁ- http://www.repeatmasker.org/ (last accessed June 6, 2018); cially induced TE insertions are depleted in the most highly Gramates et al. 2017; Tyneretal. 2017). Annotations of non- conserved CNEs (so-called “ultra-conserved elements”) reference TE insertion in the Drosophila Genetic Reference (Makunin et al. 2013). However the nonrandom target pref- Panel (DGRP) of D. melanogaster strains from Raleigh, NC erences, requirement for marker gene activation in TE detec- (Mackay et al. 2012) were obtained from Supplementary tion, and experimental origin of the TEs analyzed by Makunin Material of papers describing two different TE detection meth- et al. (2013) do not allow conclusions to be drawn about CNE- ods: ngs_te_mapper (Linheiro and Bergman 2012)and TEMP based constraints on TE insertion for the endogenous set of TE (Zhuang et al. 2014). These data sets were chosen because both families in natural populations. Resolving whether CNEs inﬂu- ngs_te_mapper and TEMP take advantage of the TE-ﬂanking ence the landscape of TE insertion in natural populations of D. regions information contained in split reads and thus localize TE melanogaster will provide further insight into the factors gov- insertions to precise genomic coordinates. erning TE dynamics in this species, and contribute to our The ngs_te_mapper data set consists of nonreference TE broader understanding of the forces that shape genome or- insertions from 37 long terminal repeat (LTR) retrotransposon ganization and molecular evolution in general. and terminal inverted repeat (TIR) transposon families on the Here, we use genome-wide data sets of “nonreference” major chromosome arms (chrX, chr2L, chr2R, chr3L, chr3R, TE insertions (i.e., TEs identiﬁed in a resequenced sample that and chr4) identiﬁed using split-read information in whole- are not present in the reference genome) from a North genome Illumina shotgun sequence data from 166 DGRP American population of D. melanogaster (Linheiro and strains (Linheiro and Bergman 2012). Anew BEDﬁle for this Bergman 2012; Mackay et al. 2012; Zhuang et al. 2014)to data set was generated by Dr Linheiro R (personal communi- investigate whether selective constraints on CNEs inﬂuence cation) that encodes the number of DGRP strains in which the landscape of TE insertions in noncoding DNA. These 1534 Genome Biol. Evol. 10(6):1533–1545 doi:10.1093/gbe/evy104 Advance Access publication May 29, 2018 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1533/5020726 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Conserved Noncoding Elements Inﬂuence the Transposable Element Landscape in Drosophila GBE each insertion was found in the score column (https://ﬁg- Intronic and intergenic regions were further partitioned into share.com/articles/Alternate_version_of_File_S4_from_ CNEs and spacers using the dm3 phastCons15way track. Linheiro_amp_Bergman_2012/1168883; last accessed Spacers are deﬁned as the noncoding regions complementary June 6, 2018). The TEMP data set consists of nonreference to CNEs that exhibit low primary sequence conservation TE insertions from 56 LTR retrotransposon, non-LTR retro- (Bergman et al. 2002; Casillas et al. 2007). transposon, and TIR transposon families identiﬁed using We restricted our analysis to regions of the D. melanogaster read-pair and split-read information in whole-genome Release 5 genome sequence with normal rates of recombina- Illumina shotgun sequence data for 53 DGRP strains tion using criteria established in previous population genomic (Zhuang et al. 2014). We transformed the original TEMP analyses of TEs in D. melanogaster (Cridland et al. 2013, 2015): data set from https://zlab.umassmed.edu/TEMP/TEMP_ chrX: 300000–20800000, chr2L: 200000–20100000, chr2R: resources/DGRP_53lines_TE_polymorphisms.tar.gz (last 2300000–21000000, chr3L: 100000–21900000, chr3R: accessed June 6, 2018) to match the format of the ngs_te_- 600000–27800000. Low recombination regions on the major mapper data set as follows. TE insertions in the original *.inser- chromosome arms (including all of chr4) were excluded be- tion.reﬁned.bp.refsup TEMP output ﬁles were ﬁrst merged cause of the high density of reference TE sequences in these across all strains. Insertions supported by split-read data on regions (Bartolome et al. 2002; Bergman et al. 2006), which both ends of the TE (“1p1” ﬂag) that are mapped to precise poses challenges both to identifying nonreference TE insertions genomic coordinates on the major chromosome arms (chrX, and to deﬁning CNEs using comparative genomic data. chr2L, chr2R, chr3L, chr3R, and chr4) were then extracted and Additionally, we excluded regions of the reference genome converted to BED format. BED-formatted insertions were then identiﬁed by RepeatMasker as TE from normally recombining sorted and clustered using BEDtools complement (-s -d 0) regions because nonreference TEs are likewise systematically (Quinlan and Hall 2010). The number of strains per cluster con- underpredicted in these regions. Low-recombination and ref- taining a TE insertion for the same TE family on the same strand erence TE intervals were subtracted from all exonic, intronic, was then encoded in the score column of a BED-formatted ﬁle. intergenic, CNE, and spacer compartments. Normally recom- For both data sets, a small number of TE insertions were pre- bining regions excluding reference TE intervals occupy 86.6% dicted to occur at the same location, either from closely related of the 120 Mb D. melanogaster Release 5 genome. TE families (e.g., Stalker vs Stalker 4) or for TIR elements pre- Nonreference TE insertions in high recombination regions dicted on opposite strands at the same location (e.g., S ele- excluding reference TE intervals were then assigned to geno- ment). We kept one of these redundant annotations based mic compartments using overlapSelect (Kuhn et al. 2013). The on theﬁrstoccurrence in thedataset. Finally, weexcluded all locations of the nonreference TE insertions in the ngs_te_map- P element insertions from both data sets, since this TE family is per and TEMP data sets analyzed here are annotated as their known to have a strong nonrandom preference to insert around target site duplication (TSD) (Bergman 2012; Linheiro and transcriptional start sites (Spradling et al. 1995; Bellen et al. Bergman 2012; Zhuang et al. 2014), whichspansmall intervals 2004; Koﬂer et al. 2015). (typically<10 bp) on reference genome coordinates and can therefore overlap the boundaries of neighboring genomic compartments. To avoid counting TEs that overlap boundaries Assigning TE Insertions to Genomic Compartments multiply or partially in different compartments, a series of ﬁl- We partitioned regions of the D. melanogaster genome into tering steps was implemented to identify TE insertions that mutually exclusive exonic, intronic, and intergenic compart- overlap intronic/exonic, intergenic/exonic, and CNE/spacer ments based on the gene structures in the dm3 ﬂyBaseGene boundaries. Each distinct category of “overlapping” TE inser- track using the overlapSelect and BEDtools intersect, comple- tions is mutually exclusive with other overlapping or “pure” ment, and subtract tools (Quinlan and Hall 2010; Kuhn et al. compartments. TE insertions observed in low-recombination or 2013). Each tool was run using default parameter settings. Our reference TE intervals were eliminated from both data sets. The partitioning strategy follows Lipatov et al. (2005) and assumes majority of nonreference TEs in both data sets studied here a hierarchy of functional constraints for genomic regions that were located in normally recombining regions excluding refer- have multiple annotation states due to alternative splicing or ence TE intervals (ngs_te_mapper: n¼ 6,061/6,747, 89.8%; promoter usage: namely, functional constraints on exonic TEMP: 4,652/5,331, 87.3%). The ﬁnal data sets of nonrefer- regions take precedence over intronic regions, and constraints ence TE insertions in normally recombining regions excluding on intronic regions take precedence over intergenic regions. reference TE intervals are available in Additional Files 1 and 2, Exonic regions span the union of all exon intervals in the ge- for ngs_te_mapper and TEMP, respectively. nome and include both coding sequences (CDS) and untrans- lated regions (UTRs). Intronic regions were deﬁned as the Testing for Purifying Selection on TE Insertions complement of exonic regions in genomic intervals spanned by at least one transcript model. Intergenic regions were de- We tested for depletion of TE insertions in different genomic ﬁned as the complement of all exonic and intronic regions. compartments relative to random expectations using a Genome Biol. Evol. 10(6):1533–1545 doi:10.1093/gbe/evy104 Advance Access publication May 29, 2018 1535 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1533/5020726 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Manee et al. GBE permutation approach. In contrast to goodness-of-ﬁt tests exonic regions. Third, TE insertions observed in intronic based on expected proportions of genomic compartments, regions were allowed to randomly insert in intronic regions our permutation approach accommodates the fact that non- to test if TEs are depleted in intronic CNEs relative to intronic reference TEs can span multiple compartments (see above) spacers, independent of the effects of purifying selection on and accounts for the empirical length distributions of intervals exonic or intergenic regions but accounting for potential se- in different genomic compartments and the variable lengths lection on introns. Finally, TE insertions observed in intergenic of TSDs for nonreference TEs. Random TE insertion was sim- regions were allowed to randomly insert in intergenic regions ulated using BEDTools shufﬂe to permute the location of TE to test if TEs are depleted in intergenic CNEs relative to inter- insertions in different compartments of the Release 5 ge- genic spacers, independent of the effects of purifying selec- nome. Random TE insertions were required to be placed tion on exonic or intronic regions. For each test, 10,000 within their same chromosome (-chrom option) and were permutations were performed to provide a distribution of not allowed to overlap each other (-noOverlapping option). outcomes under the null hypothesis of random insertion. P Random insertions were not allowed to land in low- values under the null hypothesis of random insertion were recombination regions or regions of the reference genome estimated as the proportion of 10,000 permutations with annotated as TE by RepeatMasker (Smit et al. 2013) using the numbers of TE insertions in putatively selected compartments BEDtools shufﬂe -excl option. These constraints were imple- that were less than or equal to the observed data. We tested mented for several reasons. First, as noted above, nonrefer- the one-sided hypothesis that putatively functional categories ence TE detection systems systematically underpredict in should have a depletion of TE insertions. To conservatively repetitive regions like reference TEs, which are enriched in account for the effects of multiple tests (n¼ 16), we consider low recombination regions. Second, reference TE spans are P values smaller than an a-level of 0.0005 (0.01/20) as signif- found almost exclusively in noncoding regions (Kaminker icant. Fold enrichment or depletion of TE insertions in puta- et al. 2002; Lipatov et al. 2005) and, within noncoding tively selected compartments was estimated by comparing regions, reference TEs are found almost exclusively in spacers the observed values to the median value of random since few TE insertions in the D. melanogaster genome oc- outcomes. curred prior to speciation (Caspi and Pachter 2005; Bergman Additionally, we tested whether the derived allele fre- and Bensasson 2007; Sackton et al. 2010). If not controlled quency (DAF) of TE insertions in putatively selected geno- for, the combined effects of detection bias and nonrandom mic compartments (exonic regions, CNEs) differed from distribution of reference TEs would lead to an excess of non- control regions (intergenic spacers). The DAF for each in- reference insertions in regions enriched in reference TEs in sertion site was calculated by dividing the number of permuted data sets, even under the null hypothesis of random strains in which the insertion was present by the sample insertion. Finally, the efﬁcacy of natural selection on individual size of the data set (ngs_te_mapper: n¼ 166; TEMP: alleles is reduced in regions of the Drosophila genome with n¼ 53). Following previous efforts testing whether CNEs low rates of recombination because of the confounding are cold spots of point mutation (Drake et al. 2006; effects of selection on linked sites extending over larger Casillas et al. 2007), the null hypothesis of no difference regions (Presgraves 2005; Haddrill et al. 2007). The -seed op- in DAF between “selected” and “control” compartments tion was used to allow results of each run to be replicated. TE was tested using a nonparametric Wilcoxon rank sum test. insertions in permuted data sets were then assigned to geno- DAF tests of TE insertion allele frequencies in CNEs versus mic compartments as described earlier. spacers were performed separately for intronic and inter- A series of permutation tests were performed to test the genic regions. As in related work (Petrov et al. 2011; null hypothesis of random TE insertion across various sets of Koﬂer et al. 2012; Cridland et al. 2013), we assumed all genomic compartments. TE insertions and intervals for com- TE insertions represent the derived state since, with the ex- partments not included in a particular test were excluded us- ception of the INE-1 family that is not studied here (Singh ing the BEDtools shufﬂe -excl option. All permutation tests et al. 2005; Wang et al. 2007), few TE insertions in D. mel- were restricted to normally recombining regions of the ge- anogaster are thought to have occurred prior to speciation nome excluding reference TE intervals as described earlier. (Caspi and Pachter 2005; Bergman and Bensasson 2007; First, TE insertions observed in all compartments were allowed Sackton et al. 2010). Rare TE insertions spanning intron/ to randomly insert into all compartments to test if TEs are exon on intergenic/exon boundaries were excluded from depleted in pure and overlapping exonic regions relative to DAF analysis because of their low sample sizes. However, noncoding DNA. This analysis was performed as a positive TE insertions spanning CNE/spacer boundaries were rela- control to determine if our approach could replicate previously tively common, and thus were analyzed as distinct class reported results. Second, TE insertions observed in noncoding and compared with TEs contained fully within spacers. regions were allowed to randomly insert in noncoding regions All graphical and statistical analyses were performed in the to test if TEs are depleted in introns relative to intergenic R programming environment (version 3.4.0) (https://www. regions, independent of the effects of purifying selection on r-project.org/; last accessed June 6, 2018). 1536 Genome Biol. Evol. 10(6):1533–1545 doi:10.1093/gbe/evy104 Advance Access publication May 29, 2018 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1533/5020726 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Conserved Noncoding Elements Inﬂuence the Transposable Element Landscape in Drosophila GBE results, we analyzed both data sets independently to address Results how robust our results are to TE detection methods. The TE Insertions Are Depleted in Conserved Noncoding numbers of nonreference TE insertions, nucleotides, and pro- Elements portion of the genome spanned are shown for exons, introns, To understand whether selective constraints on noncoding and intergenic regions in table 1 and for CNEs and spacers in DNA inﬂuence patterns of TE insertion, we analyzed the noncoding regions in table 2. abundance of nonreference TEs insertions in different func- As a positive control, we ﬁrst tested whether the previously tional genomic compartments of the D. melanogaster ge- reported depletion of TE insertions in D. melanogaster exonic nome. We ﬁrst assigned nonreference TE insertions in regions (Lipatov et al. 2005; Koﬂer et al. 2012; Cridland et al. normally recombining regions to functional compartments 2013) could be observed in the ngs_te_mapper and TEMP based on gene and conserved element annotations (see data sets using our permutation procedure. As shown in ta- Materials and Methods for details). We then tested for deple- ble 1, several hundred TE insertions can be found in exonic tion of nonreference TE insertions in genomic regions with regions in natural populations of D. melanogaster (see also putatively higher levels of functional constraint (i.e., exonic Koﬂer et al. 2012; Cridland et al. 2013). Nevertheless, we regions, CNEs) by comparing observed numbers of TEs in observed a clear depletion of TE insertions in exonic regions these regions to an empirical null distribution based of relative to random expectations (ﬁg. 1A), coupled with a con- 10,000 random permutations of the observed TE insertion comitant excess in intronic regions (ﬁg. 1B) and intergenic data sets. Finally, we tested whether the DAF spectrum for regions (ﬁg. 1C). We estimatea4-fold (P< 1e-04) and TE insertions in genomic regions with putatively higher levels 4.35-fold (P< 1e-04) reduction in TEs in exonic regions rela- of functional constraint was skewed toward rarer alleles, as tive to the median of random outcomes for the ngs_te_map- would be expected if TE insertions in these regions were per and TEMP data sets, respectively (ﬁg. 1A). We also weakly negatively selected. detected evidence for a signiﬁcant depletion of TE insertions Recent studies have shown that no single bioinformatic spanning intron/exon boundaries (ﬁg. 1D) for both ngs_te_- system can comprehensively identify all nonreference TE inser- mapper (4.6-fold reduction, P¼ 1e-04) and TEMP (5.9-fold tions in resequencing data (Nelson et al. 2017; Rishishwar reduction, P< 1e-04), consistent with the presence of et al. 2017). Therefore, we used two independent nonrefer- “hazardous zones” for TE insertion near intron–exon junc- ence TE insertion data sets in our analysis, ngs_te_mapper tions shown previously in humans (Zhang et al. 2011). In con- (Linheiro and Bergman 2012)and TEMP (Zhuang et al. trast, we observed no signiﬁcant depletion of TEs at 2014), both derived from the same sample of strain-speciﬁc intergenic/exon boundaries (ﬁg. 1E; ngs_te_mapper: genome sequences isolated from a North American popula- P¼ 0.98; TEMP: P¼ 0.27). These results support previous tion of D. melanogaster (Mackay et al. 2012). Unlike related analyses that TEs are selectively eliminated from exonic data sets for the DGRP population that do not map TE inser- regions (Lipatov et al. 2005; Petrov et al. 2011; Koﬂer et al. tion breakpoints to exact locations (Cridland et al. 2013; 2012; Cridland et al. 2013), and demonstrate that our ap- Rahman et al. 2015), the ngs_te_mapper and TEMP data proach can detect selective constraints on TE insertions that sets analyzed here both use TE-ﬂanking region junction infor- are assumedtoexist in the D. melanogaster genome. mation contained in split reads to annotate TE insertions with We next investigated whether our data provide evidence highest possible resolution (the TSD; see Bergman 2012 for that purifying selection eliminates a higher proportion of TEs discussion). The high positional accuracy of the ngs_te_map- in intronic regions relative to intergenic regions, by permuting per and TEMP data sets improves identiﬁcation of allelic inser- the locations of TEs in noncoding regions only. We observed a tions occupying the same insertion site in different strains and trend toward fewer TE insertions in intronic regions relative to assignment of TE insertion sites to speciﬁc genomic compart- random expectation (ﬁg. 1F) with a corresponding excess in ments. We did not ﬁlter either data set to remove regions with intergenic regions (ﬁg. 1G) in both data sets. The magnitude identity-by-descent to another strain or residual heterozygos- of this effect was weak but signiﬁcant in the ngs_te_mapper ity within strain because these issues affect only 10% of data set (1.05-fold reduction, P¼ 3e-04), and of a similar sites in the DGRP genomes (Lack et al. 2015), are expected magnitude but not signiﬁcant in the TEMP data set (1.02- to inﬂuence our abundance and DAF analyses only by small fold reduction, P¼ 0.05). Our results support those of factors, and can only bias our results if these regions are non- Koﬂer et al. (2012) who similarly observed a weak but signif- randomly associated with functional compartments. The icant reduction in numbers of TE insertions in intronic regions ngs_te_mapper and TEMP data sets used here are largely relative to intergenic regions using pool-seq data, but differ nonoverlapping, with only 869 insertion sites in common from Cridland et al. (2013) who observed more TEs in intronic (14.3–18.7% of each data set). Because of the largely non- regions relative to intergenic regions using strain-speciﬁc ge- overlapping nature of these data sets, together with biases nome data. Together, these results suggest that the TE density associated with merging data sets and the inability to interpret in D. melanogaster intronic regions is weakly reduced relative merged data sets in the context of previous benchmarking to random expectations, but that the proportion of TEs Genome Biol. Evol. 10(6):1533–1545 doi:10.1093/gbe/evy104 Advance Access publication May 29, 2018 1537 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1533/5020726 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Manee et al. GBE Table 1 TE Insertions in Normal Recombination Regions Region Coverage (bp) % Normal Rec. Genome # ngs_te_mapper TE % ngs_te_mapper TE # TEMP TE % TEMP TE Exon 27502613 26.4 399 6.6 278 6 Intron 38960671 37.4 2,743 45.3 2,153 46.3 Intron/exon n.a. n.a. 5 0.1 7 0.2 Intergenic 37804929 36.3 2,905 47.9 2,210 47.5 Intergenic/exon n.a. n.a. 9 0.1 4 0.1 Total 104268213 100 6,061 100 4,652 100 NOTE.—Columns contain the coverage (in bp) and percent of the normally recombining genome covered for exonic, intronic, and intergenic regions followed by the number and percent of TE insertions found fully in exonic, intronic, and intergenic regions or spanning intron/exon and intergenic/exon boundaries for both ngs_te_mapper and TEMP. Overlap categories have “n.a.” for coverage and percent of the normally recombining genome covered since boundaries between compartments do not occupy any space. Regions of the reference genome identiﬁed by RepeatMasker as TE were subtracted from all compartments and any nonreference TE in these regions were excluded from all analyses. Regions of normal recombination were deﬁned by Cridland et al. (2013). Table 2 TE Insertions in Noncoding Regions with Normal Recombination Region Coverage (bp) % Normal Rec. Noncoding Genome # ngs_te_mapper TE % ngs_te_mapper TE # TEMP TE % TEMP TE Intronic CNE 14093340 18.4 747 13.2 500 11.5 Intronic spacer 24867331 32.4 1,842 32.6 1,458 33.4 Intronic CNE/spacer n.a. n.a. 154 2.7 195 4.5 Intergenic CNE 14749396 19.2 813 14.4 577 13.2 Intergenic spacer 23055533 30 1,928 34.1 1,447 33.2 Intergenic CNE/spacer n.a. n.a. 164 2.9 186 4.3 Total 76765600 100 5,648 100 4,363 100 NOTE.—Columns contain the coverage (in bp) and percent of the normally recombining noncoding genome covered by CNEs and spacers for introns and intergenicregions followed by the number and percent of TE insertions found fully in CNEs and spacers or spanning CNE/spacer boundaries for both ngs_te_mapper and TEMP. Overlap categories have “n.a.” for coverage and percent of the normally recombining noncoding genome covered since boundaries between compartments do not occupy any space. Regions of the reference genome identiﬁed by RepeatMasker as TE and any nonreference TE in these regions were excluded from all compartments. Regions of normal recombination were deﬁned by Cridland et al. (2013). eliminated from intronic regions is not sufﬁciently large for the Correspondingly, we also observe that TE insertions in both effect to be reliably identiﬁed in all population genomic data data sets are overrepresented in spacers in both intronic sets. regions (ﬁg. 2E; ngs_te_mapper: 1.11-fold excess; TEMP: Finally, we tested whether TE insertions were depleted in 1.15-fold excess) and intergenic regions (ﬁg. 2F; ngs_te_map- CNEs relative to spacer regions (ﬁg. 2). For this analysis, we per: 1.83-fold excess; TEMP: 1.17-fold excess). Overall, these permuted TE insertions separately within intronic regions and results suggest that while some CNEs tolerate disruption by within intergenic regions and accounted for TE insertions large TE insertions, constraints on CNEs are substantial spanning CNE/spacer boundaries. We identiﬁed several hun- enough to eliminate enough TE insertions in CNEs to bias dred TE insertions that exist in CNEs in both intronic and inter- the distribution of observed TE insertions toward spacers in genic regions (table 2). Nonetheless, we found evidence for a noncoding regions of the D. melanogaster genome. signiﬁcant depletion in the density of TEs in CNEs in both intronic regions (ﬁg. 2A; ngs_te_mapper: 1.21-fold reduction, Allele Frequencies of TE Insertions are Similar P< 1e-04; TEMP: 1.31-fold reduction, P< 1e-04) and inter- across Different Functional Compartments of the genic regions (ﬁg. 2B; ngs_te_mapper: 1.3-fold reduction, D. melanogaster Genome P< 1e-04; TEMP: 1.3-fold reduction, P< 1e-04). We also ob- served a weak but nonsigniﬁcant trend for fewer TE insertions Additional evidence for purifying selection acting to shape overlapping CNE/spacer boundaries relative to random expec- the landscape of TE insertions can potentially be obtained tation in intronic regions (ﬁg. 2C; ngs_te_mapper: 1.18-fold from investigating the allele frequencies of TE insertions in reduction, P¼ 0.04; TEMP: 1.23-fold reduction, P¼ 0.002). population samples. Population genetics theory predicts Fewer TE insertions overlapping CNE/spacer boundaries rela- that natural selection will prevent new deleterious alleles tive to expectations were also observed in intergenic regions, from reaching high population frequency (Fay et al. 2001). with data for TEMP but not ngs_te_mapper showing a signif- If polymorphic TE insertions are weakly negatively se- icanteffect(ﬁg. 2D; ngs_te_mapper: 1.16-fold reduction, lected, they should be skewed toward lower allele fre- P¼ 0.16; TEMP: 1.28-fold reduction, P¼ 1e-04). quencies in regions under higher selective constraint 1538 Genome Biol. Evol. 10(6):1533–1545 doi:10.1093/gbe/evy104 Advance Access publication May 29, 2018 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1533/5020726 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Conserved Noncoding Elements Inﬂuence the Transposable Element Landscape in Drosophila GBE ngs_te_mapper TEMP A Exonic 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500 1600 1700 1800 1900 # TE insertions B Intronic C Intergenic 1500 1700 1900 2100 2300 2500 2700 2900 1500 1700 1900 2100 2300 2500 2700 2900 3100 # TE insertions # TE insertions D Intronic/Exonic overlap E Intergenic/Exonic overlap 0 5 10 15 20 25 30 35 40 45 50 55 60 0 5 10 15 20 # TE insertions # TE insertions F Intronic (noncoding only) 2000 2100 2200 2300 2400 2500 2600 2700 2800 2900 3000 3100 # TE insertions G Intergenic (noncoding only) 2000 2100 2200 2300 2400 2500 2600 2700 2800 2900 3000 3100 # TE insertions FIG.1.—TEs in normally recombining regions of the Drosophila melanogaster genome are depleted in exonic and intronic regions. Observed numbers of TEs in different genomic compartments are shown as vertical lines for ngs_te_mapper (red) and TEMP (blue). Empirical null distributions of the numbers of TEs in different genomic compartments in 10,000 random permutations are shown as density plots for ngs_te_mapper (red) and TEMP (blue). All permu- tation analyses were restricted to normally recombining regions of the D. melanogaster genome as deﬁned by Cridland et al. (2013). Permutation analyses were conducted across all compartments (A–E), or in noncoding regions only (F and G). Observed and simulated numbers of TEs were counted in exonic regions (A), intronic regions (B and F), intergenic regions (C and G), intronic/exonic boundaries (D), and intergenic/exonic boundaries (E). Observed TEs overlapping intron/exon boundaries or intergenic/exon boundaries were excluded from permutation analyses in noncoding regions only (F and G). Regions of the reference genome identiﬁed by RepeatMasker as TE sequence and any nonreference TE in these regions were also excluded from all permutation analyses. such as exonic regions and CNEs relative to control spacers (Casillas et al. 2007) and in replacement sites rel- regions that have weaker functional constraint. A skew ative to silent sites (Huang et al. 2014). However, small in the frequency of D. melanogaster SNPs toward rarer indels showed no tendency to be skewed toward rarer alleles has previously been observed in CNEs relative to alleles in CNEs relative to spacers (Casillas et al. 2007), Genome Biol. Evol. 10(6):1533–1545 doi:10.1093/gbe/evy104 Advance Access publication May 29, 2018 1539 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1533/5020726 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Density Density Density Density Density 0.000 0.010 0.000 0.010 0.00 0.06 0.000 0.010 0.000 0.010 Density Density 0.00 0.15 0.000 0.010 Manee et al. GBE ngs_te_mapper TEMP A Intronic CNE B Intergenic CNE 400 500 600 700 800 900 1000 1100 500 600 700 800 900 1000 1100 1200 # TE insertions # TE insertions C Intronic CNE/spacer overlap D Intergenic CNE/spacer overlap 100 125 150 175 200 225 250 275 300 325 100 125 150 175 200 225 250 275 300 325 # TE insertions # TE insertions E Intronic spacer F Intergenic spacer 1000 1200 1400 1600 1800 2000 1000 1200 1400 1600 1800 2000 # TE insertions # TE insertions FIG.2.—TEs in normally recombining regions of the Drosophila melanogaster genome are depleted in conserved noncoding elements. Observed numbers of TEs in different noncoding compartments are shown as vertical lines for ngs_te_mapper (red) and TEMP (blue). Empirical null distributionsof the numbers of TEs in different noncoding compartments in 10,000 random permutations are shown as density plots for ngs_te_mapper (red) and TEMP (blue). All permutation analyses were restricted to normally recombining regions of the D. melanogaster genome as deﬁned by Cridland et al. (2013). Permutation analyses were conducted across intronic regions only (A, C,and E) or intergenic regions only (B, D,and F). Observed and simulated numbers of TEs were countedinCNEs (A and B), CNE/spacer boundaries (C and D), or spacers (E and F). The TEMP data set has higher number of observed and expected CNE/ spacer overlaps (C and D) despite having fewer TE insertions overall because of a larger average TSD length (7.71 bp) relative to ngs_te_mapper (4.73 bp). Observed TEs overlapping intron/exon boundaries or intergenic/exon boundaries were excluded from these analyses. Regions of the reference genome identiﬁed by RepeatMasker as TE sequence and any nonreference TE in these regions were also excluded from all permutation analyses. suggesting a similar distribution of ﬁtness effects for small be substantially compromised, since all compartments are af- indels in both types of noncoding region. fected by the similar methodological biases in TE detection. Figure 3 shows the DAF spectra for TE insertions in differ- We ﬁrst assessed whether the expected skew toward ent functional compartments across the D. melanogaster ge- lower allele frequencies could be observed for TE insertion nome. Consistent with classical restriction mapping and in situ in exonic regions. For this and all subsequent DAF spectra hybridization studies (reviewed in Charlesworth and Langley analyses, we used the frequency distribution of TE insertions 1989 and Nuzhdin 1999) and recent strain-speciﬁc population in intergenic spacers as a control, based on abundance results genomic data (Cridland et al. 2013), both data sets show the above showing this compartment was under the weakest expected pattern for TE insertion alleles to be skewed toward selective constraint for TE insertion. As shown in ﬁgure 3, rare alleles in all genomic compartments. However, clear dif- we ﬁnd no signiﬁcant differences between the DAF spectra ferences are observed between ngs_te_mapper (ﬁg. 3A)and for TEs in exonic regions in either data set: (ngs_te_mapper: TEMP (ﬁg. 3B) in the overall shape of the DAF spectra across W¼ 391,158.5, P¼ 0.43; TEMP: W¼ 205,299.5, P¼ 0.36). all compartments, with a skew toward more rare alleles in the One possibility for the lack of skew toward rarer alleles for TEs ngs_te_mapper data set relative to TEMP. We interpret overall in exonic regions is the presence of a small number of unusu- differences in DAF spectra between TE data sets to result pri- ally high-frequency exonic TE insertions that are potentially marily from the higher false negative rate for ngs_te_mapper involved in adaptation to insecticide resistance (arrows, relative to TEMP (Nelson et al. 2017) (see Discussion). ﬁg. 3A and B) (ngs_te_mapper: 1360 in sut1, Steele et al. Regardless of the cause(s) of systematic differences in the 2015;TEMP: 17.6 in cyp6a2, Waters et al. 1992; Delpuech DAF spectra across methods, comparison of DAF spectra et al. 1993; Wan et al. 2014, accord in cyp6g1, Daborn et al. across genomic compartments within a data set should not 2002; Chung et al. 2006). When these putatively adaptitive 1540 Genome Biol. Evol. 10(6):1533–1545 doi:10.1093/gbe/evy104 Advance Access publication May 29, 2018 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1533/5020726 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Density Density Density 0.000 0.015 0.000 0.025 0.000 0.015 Density Density Density 0.000 0.015 0.000 0.025 0.000 0.015 Conserved Noncoding Elements Inﬂuence the Transposable Element Landscape in Drosophila GBE ngs_te_mapper 0.99 Exon 0.98 0.97 Intronic CNE 0.96 0.95 Intronic CNE/spacer overlap 0.94 0.93 Intronic spacer 0.92 0.91 Intergenic CNE 0.9 0.89 Intergenic CNE/spacer overlap 0.88 0.87 Intergenic spacer 0.86 0.09 0.08 0.07 0.06 0.05 0.04 sut1−1360 0.03 0.02 0.01 0.025 0.075 0.125 0.175 0.225 0.275 0.325 0.375 0.425 0.475 Derived allele frequency TEMP 0.99 Exon 0.98 0.97 Intronic CNE 0.96 0.95 Intronic CNE/spacer overlap 0.94 0.93 Intronic spacer 0.92 0.91 Intergenic CNE 0.9 0.89 Intergenic CNE/spacer overlap 0.88 0.87 Intergenic spacer 0.86 0.09 0.08 0.07 0.06 0.05 0.04 cyp6a2−17.6 cyp6g1−accord 0.03 0.02 0.01 0.025 0.075 0.125 0.175 0.225 0.275 0.325 0.375 0.425 0.475 Derived allele frequency FIG.3.—The derived allele frequency (DAF) spectrum for TE insertions is similar across different compartments of the Drosophila melanogaster genome. DAF spectra are shown for TE insertions predicted by ngs_te_mapper (A) or TEMP (B). Allele frequency classes are shown on the X axis, and the proportion of TE insertions observed in a particular compartment of the genome at that allele frequency is shown on the Y axis. Note that the Y axis is split to allow better visualization of the proportion of higher allele frequency classes. outlier loci are excluded, TEs in exonic regions still do not show regions (ngs_te_mapper: W¼ 767,402.5, P¼ 0.2; TEMP: a consistent skew toward rarer alleles relative to those in inter- W¼ 411,058, P¼ 0.31). Likewise, the DAF spectra for TEs genic spacers regions: (ngs_te_mapper: W¼ 389,232.5, overlapping CNE/spacer boundaries did not differ from TEs P¼ 0.5; TEMP: W¼ 203,853.5, P¼ 0.27). These results sug- fully contained in spacer intervals in both intronic regions gest that the distribution of ﬁtness effects for exonic TE inser- (ngs_te_mapper: W¼ 141,937, P¼ 0.98; TEMP: tions that are not strongly deleterious does not differ W¼ 139,781.5, P¼ 0.46) and intergenic regions (ngs_te_- substantially from those in intergenic spacers (see also mapper: W¼ 157,028.5, P¼ 0.83; TEMP: W¼ 132,093, Lipatov et al. 2005). P¼ 0.44). Similar to previous results for small indels (Casillas Next, we tested whether the DAF spectrum for TE inser- et al. 2007), these results imply that the distribution of ﬁtness tions in CNEs differed from those in noncoding spacer effects on TE insertions wholly or partially contained in CNEs is regions. In this analysis, we also considered the DAF spectrum not substantially different from that operating on spacer of TE insertions that spanned CNE/spacer boundaries because regions in noncoding DNA. this overlap class is reasonably common and also exhibits a trend toward being depleted in TE insertions (see above). As Discussion shown in ﬁgure 3, we found no signiﬁcant differences in the Here, we show that the abundance of TE insertions is signif- DAF spectra for TEs in CNEs relative to those in spacer intervals icantly reduced relative to random expectation in two distinct in both intronic regions (ngs_te_mapper: W¼ 671,827, genomic compartments with known or suspected function: P¼ 0.19; TEMP: W¼ 358,690, P¼ 0.29) and intergenic Genome Biol. Evol. 10(6):1533–1545 doi:10.1093/gbe/evy104 Advance Access publication May 29, 2018 1541 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1533/5020726 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Proportion Proportion Manee et al. GBE exonic regions and CNEs. In contrast, we ﬁnd no clear signa- indicating they are not dependent on the idiosyncrasies of a ture for a skew toward lower allele frequencies for TEs in single method for calling TE insertions in short-read rese- these genomic compartments when compared with regions quencing data. Nevertheless, it is important to consider how of the genome under the lowest level of selective constraint. our results may be affected by the imperfect state of the art in Our results are consistent either with 1) nonrandom transpo- TE calling in terms of positional accuracy and false negative sition causing TEs to avoid functional compartments like ex- rates (Nelson et al. 2017; Rishishwar et al. 2017). It is unlikely onic regions and CNEs, or 2) a mode of purifying selection that the depletion of TE insertions we observe is due to im- that differentially eliminates TE insertions from functional precise annotation of the TE insertions analyzed here, since regions but leaves behind polymorphic TEs insertions that underrepresentation of TEs in exonic regions has been ob- have a similar distribution of ﬁtness effects across genomic served previously using a variety of different classical and ge- compartments. nomic approaches (Aquadro et al. 1986, 1992; Langley and Support for purifying selection driving the patterns we ob- Aquadro 1987; Langley et al. 1988; Schaeffer et al. 1988; serve comes from the facts that the majority of spontaneous Bartolome et al. 2002; Kaminker et al. 2002; Lipatov et al. mutations in D. melanogaster genes are caused by TEs 2005; Koﬂer et al. 2012; Cridland et al. 2013; Zhuang et al. (Ashburner et al. 2005) (proving that transposition can occur 2014). Likewise, if false negative rates are constant across in functional regions), and that TE insertions are skewed to- genomic compartments, false negatives are unlikely to gen- ward lower allele frequencies relative to SNPs from the same erate the abundance patterns we observe. For this to be the population (Aquadro et al. 1986, 1992; Langley and Aquadro case, the allele frequency of TE insertions would need to be 1987; Langley et al. 1988; Schaeffer et al. 1988; Cridland skewed toward lower frequencies in compartments with et al. 2013). Moreover, TEs in D. melanogaster only show higher levels of constraint, so that a higher relative proportion weak target site preferences for short AT-rich motifs of singleton TE insertion sites would fail to be detected in (Linheiro and Bergman 2012), which argues against the non- compartments under higher constraint (leading to an artifac- random transposition model. The only D. melanogaster TE tually lower number of insertion sites in high constraint family known to have strong nonrandom insertion biases— regions). We ﬁnd no evidence for a skew toward lower DAF the P element (Spradling et al. 1995; Bellen et al. 2004; Koﬂer in compartments with higher levels of constraint in our data et al. 2015)—was excluded from our analysis for this reason. (ﬁg. 3). False negative rates may, however, vary across func- Additionally, recent analysis of de novo TE insertion in D. tional compartments, for example, if higher SNP density in melanogaster mutation accumulation lines found no associa- regions under lower constraint reduces read mapping quality tion between transposition rate and exon content, and only and increase false negative rates. This potential bias cannot one TE family (copia) showed an association with chromatin explain our results since it would lead to an enrichment of TE state (Adrion et al. 2017). Adrion et al. (2017) did ﬁnd a insertions in regions with high constraint, which is the oppo- marginally signiﬁcant negative association between transpo- site of the pattern observed here. sition rate and GC-content at the 10-kb scale. Coupled with Although we observe the expected pattern of depletion of the weak AT-bias of TE target site motifs and the fact that TE insertions in regions with higher constraint, we ﬁnd no exons and CNEs are more GC-rich than their ﬂanking regions difference in the DAF spectra between highly constrained (Casillas et al. 2007; Zhu et al. 2009), it is possible that base and weakly constrained compartments within either the composition may contribute to the patterns of TE depletion ngs_te_mapper or TEMP data sets. As above, it is unlikely seen in these functional compartments. However, the mag- that positional inaccuracy or false negatives can explain the nitude of differences in GC-content in the high- lack of difference in the DAF spectra between exonic regions recombination regions studied here between noncoding or CNEs and spacers. The high positional accuracy of the regions (GC: 0.40) and exons (GC: 0.49) or between spacers ngs_te_mapper and TEMP data sets mitigates against mis- (GC: 0.39) and CNEs (GC: 0.42) does not appear sufﬁcient to assignment of TEs to the wrong compartment, which could explain the>14.1-fold increase in TE abundance in noncod- in principle cause the DAF spectra for different compartments ing regions relative to exons or the>2.3-fold increase in TE to appear more similar than they really are. Furthermore, in abundance in spacers relative to CNEs. On balance, we con- the case of CNEs, we accounted for potential blurring of clude that purifying selection is the more likely explanation for compartment assignment by showing that the DAF spectra the depletion of TEs observed in exons and CNEs. If this inter- of TEs spanning CNE/spacer boundaries have similar allele pretation is correct, our results provide the ﬁrst systematic frequencies to TEs fully contained within CNEs. Additionally, evidence that selective constraints on CNEs inﬂuence the while it is clear that false negatives distort the DAF spectrum landscape of TE insertion in a eukaryote genome, and provide toward rare alleles (Emerson et al. 2008), if the false negative new evidence supporting the conclusion that CNEs are func- rate is uniform across the genome, false negatives should tionally constrained and not mutational cold spots. affect the DAF spectra for all functional compartments in a Our conclusions are derived from two largely nonoverlap- similar way. It is formally possible that one reason we failed to ping TE insertion data sets (ngs_te_mapper and TEMP), detect a real skew toward lower DAF in more highly 1542 Genome Biol. Evol. 10(6):1533–1545 doi:10.1093/gbe/evy104 Advance Access publication May 29, 2018 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1533/5020726 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Conserved Noncoding Elements Inﬂuence the Transposable Element Landscape in Drosophila GBE constrained regions is because SNP-induced reduction in map- the depletion of TEs in exonic regions as being due to the ping quality increases false negative TE detection rates in direct effects of TE insertion (Petrov et al. 2011; Koﬂeretal. regions with lower constraint, although we are unaware of 2012) and the same logic should hold for depletion of TEs in any evidence supporting this possibility. It is also possible that CNEs. However, the similarity of DAF spectra in different ge- our analysis lacks power to detect a real skew toward rare nomic compartments is consistent with the remainder of TE alleles in the DAF for TE insertions in exons and CNEs. Previous insertions that are not eliminated from functional elements results studying TE insertions in D. melanogaster exons using being governed by a number of evolutionary mechanisms. pool-seq data showed a reduction in median allele frequen- Polymorphic TE insertions could be at similar allele frequencies cies relative to those found in intergenic regions (Koﬂer et al. in different compartments simply because they inserted at 2012), however exonic TE insertions studied using pool-PCR similar distributions of times in the past (Bergman and suggested their allele frequencies did not differ substantially Bensasson 2007; Koﬂeretal. 2012; Blumenstiel et al. from nonexonic TE insertions with similar genomic properties 2014). Alternatively, the similar DAF spectra of polymorphic (Lipatov et al. 2005). Future studies may reveal whether these TE insertions in different genomic compartments could reﬂect discrepancies are related to differences in methodology or similar distributions of selective effects that are independent truly reﬂect similarity in TE insertion allele frequencies across of the precise location of a TE insertion, which might be compartments. If clear differences can be identiﬁed in the expected if the deleterious effects of TE insertion are caused frequency of TE insertions in exons and CNEs relative to inter- by ectopic exchange events (Petrov et al. 2011; Koﬂeretal. genic spacers, it would be interesting to estimate the strength 2012) or local epigenetic silencing spreading from TE inser- of purifying selection acting on TE insertions in these compart- tions (Lee 2015; Lee and Karpen 2017). While our work does ments (Keightley and Eyre-Walker 2007). not resolve these widely debated alternatives, it does reveal Importantly, we observed systematic differences in the DAF that the selective effects of TE insertion on conserved ele- spectrum across different nonreference TE insertion data sets, ments in noncoding DNA should be factored into future mod- which has not been discussed sufﬁciently as an issue in pop- els explaining TE evolution in D. melanogaster and other ulation genomic analysis of TE evolution. Speciﬁcally, we ﬁnd species. that the DAF for ngs_te_mapper is skewed more toward lower frequencies that the DAF for TEMP (ﬁg. 3A vs B). We Supplementary Material do not interpret this difference among data sets to result from Supplementary data are available at Genome Biology and lower positional accuracy of ngs_te_mapper relative to TEMP Evolution online. artiﬁcially splitting alleles from the same insertion site into several different insertion sites each at lower allele frequency, since both data sets use split-read information. Rather it is more likely this difference in DAF among data sets results Acknowledgments from the higher false negative rate for ngs_te_mapper The authors would like to thank Raquel Linheiro, Michael (58% on simulated data; Nelson et al. 2017) relative to Nelson, Florence Gutzwiller, and Mar Marzo Llorca for their TEMP (10% on simulated data; Nelson et al. 2017). This ob- valuable suggestions throughout this project; and members of servation cautions against naive use of allele frequency data the Bergman, Dyer, Hall, and White Labs and two anonymous from short-read TE insertion detection methods to test pre- reviewers for comments on the article. This work was funded dictions of population genetic models, since the precise shape by the Life Science and Environment Research Institute and of the frequency spectrum may be determined by false neg- the Center of Excellence for Genomics (grant 20-0078), King ative rates of TE detection methods rather than any particular Abdulaziz City for Science and Technology, and the Georgia evolutionary force (Emerson et al. 2008). This result also moti- Research Foundation. vates more advanced methods to estimate the TE frequency spectra that incorporate false negative detection rates, similar to methods for estimating the frequency spectrum of SNPs Author Contributions that incorporate false positive rates due to sequencing error C.M.B. conceived and designed the experiments; M.M.M., (Kim et al. 2011; Nielsen et al. 2012). J.J., and C.M.B. carried out the experiments; M.M.M. and Our twin ﬁndings of depletion of TEs in functional elements C.M.B. analyzed the data; M.M.M. and C.M.B. wrote the like exonic regions and CNEs coupled with a lack of a skew article. All authors reviewed the article. toward rarer alleles in these regions suggests that the selective mechanism controlling location of TEs in the D. melanogaster genome may be decoupled from the forces governing allele Literature Cited frequencies of polymorphic alleles (Petrov et al. 2011). Among Adrion JR, Song MJ, Schrider DR, Hahn MW, Schaack S. 2017. Genome- competing theories for selective forces acting on TE insertions wide estimates of transposable element insertion and deletion rates in (Nuzhdin 1999; Lee and Langley 2010), it is easiest to interpret Drosophila melanogaster. Genome Biol Evol. 9(5):1329–1340. Genome Biol. Evol. 10(6):1533–1545 doi:10.1093/gbe/evy104 Advance Access publication May 29, 2018 1543 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1533/5020726 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Manee et al. GBE Aquadro CF, Desse SF, Bland MM, Langley CH, Laurie-Ahlberg CC. 1986. Delpuech JM, Aquadro CF, Roush RT. 1993. Noninvolvement of the long Molecular population genetics of the alcohol dehydrogenase gene terminal repeat of transposable element 17.6 in insecticide resistance region of Drosophila melanogaster. Genetics 114:1165–1190. in Drosophila. Proc Natl Acad Sci U S A. 90(12):5643–5647. Aquadro CF, Jennings RM, Bland MM, Laurie CC, Langley CH. 1992. Drake JA, et al. 2006. Conserved noncoding sequences are selectively Patterns of naturally occurring restriction map variation, dopa decar- constrained and not mutation cold spots. Nat Genet. 38(2):223–227. boxylase activity variation and linkage disequilibrium in the Ddc gene Elliott TA, Gregory TR. 2015. What’s in a genome? The C-value enigma region of Drosophila melanogaster. Genetics 132:443–452. and the evolution of eukaryotic genome content. Philos Trans R Soc B Arnold CD, et al. 2013. Genome-wide quantitative enhancer activity maps 370(1678):20140331. identiﬁed by STARR-seq. Science 339(6123):1074–1077. Emberly E, Rajewsky N, Siggia ED. 2003. Conservation of regulatory ele- Ashburner M, Golic KG, Hawley RS. 2005 Drosophila: a laboratory hand- ments between two species of Drosophila. BMC Bioinformatics 4:57. Emerson JJ, Cardoso-Moreira M, Borevitz JO, Long M. 2008. Natural se- book. Cold Spring Harbor (NY): Cold Spring Harbor Laboratory Press. Barron MG, Fiston-Lavier A-S, Petrov DA, Gonzalez J. 2014. Population lection shapes genome-wide patterns of copy-number polymorphism genomics of transposable elements in Drosophila. Annu Rev Genet. in Drosophila melanogaster. Science 320(5883):1629–1631. 48:561–581. Fay JC, Wyckoff GJ, Wu CI. 2001. Positive and negative selection on the Bartolome C, Maside X, Charlesworth B. 2002. On the abundance and human genome. Genetics 158(3):1227–1234. distribution of transposable elements in the genome of Drosophila Geyer P, Green M, Corces V. 1990. Tissue-speciﬁc transcriptional melanogaster. Mol Biol Evol. 19(6):926–937. enhancers may act in trans on the gene located in the homologous Bellen HJ, et al. 2004. The BDGP gene disruption project: single transposon chromosome: the molecular basis of transvection in Drosophila. EMBO insertions associated with 40% of Drosophila genes. Genetics J. 9:2247–2256. 167(2):761–781. Gramates LS, et al 2017. FlyBase at 25: looking to the future. Nucleic Acids Bergman CM. 2012. A proposal for the reference-based annotation of Res. 45(D1):D663–D671. de novo transposable element insertions. Mob Genet Elem. Haddrill PR, Halligan DL, Tomaras D, Charlesworth B. 2007. Reduced ef- 2(1):51–54. ﬁcacy of selection in regions of the Drosophila genome that lack cross- Bergman CM, Bensasson D. 2007. Recent LTR retrotransposon insertion ing over. Genome Biol. 8(2):R18. contrasts with waves of non-LTR insertion since speciation in Huang W, et al. 2014. Natural variation in genome architecture among Drosophila melanogaster. Proc Natl Acad Sci U S A. 205 Drosophila melanogaster Genetic Reference Panel lines. Genome 104(27):11340–11345. Res. 24(7):1193–1208. Bergman CM, Kreitman M. 2001. Analysis of conserved noncoding DNA Kaminker JS, et al. 2002. The transposable elements of the Drosophila in Drosophila reveals similar constraints in intergenic and intronic melanogaster euchromatin: a genomics perspective. Genome Biol. sequences. Genome Res. 11(8):1335–1345. 3(12):research0084.1. Bergman CM, et al. 2002. Assessing the impact of comparative genomic Keightley PD, Eyre-Walker A. 2007. Joint inference of the distribution of sequence data on the functional annotation of the Drosophila ge- ﬁtness effects of deleterious mutations and population demography nome. Genome Biol. 3(12):research0086.1. based on nucleotide polymorphism frequencies. Genetics Bergman CM, Quesneville H, Anxolabehere D, Ashburner M. 2006. 177(4):2251–2261. Recurrent insertion and duplication generate networks of transposable Kim SY, et al. 2011. Estimation of allele frequency and association map- element sequences in the Drosophila melanogaster genome. Genome ping using next-generation sequencing data. BMC Bioinformatics Biol. 7:R112. 12(1):231. Blumenstiel JP, Chen X, He M, Bergman CM. 2014. An age-of-allele test of Koﬂer R, Betancourt AJ, Schlotterer C. 2012. Sequencing of pooled DNA neutrality for transposable element insertions. Genetics samples (pool-seq) uncovers complex dynamics of transposable ele- 196(2):523–538. ment insertions in Drosophila melanogaster. PLoS Genet. Brody T, et al. 2012. Use of a Drosophila genome-wide conserved se- 8(1):e1002487. quence database to identify functionally related cis-regulatory Koﬂer R, Hill T, Nolte V, Betancourt AJ, Schlo ¨ tterer C. 2015. The recent enhancers. Dev Dyn. 241(1):169–189. invasion of natural Drosophila simulans populations by the P-element. Casillas S, Barbadilla A, Bergman CM. 2007. Purifying selection maintains Proc Natl Acad Sci U S A. 112(21):6659–6663. highly conserved noncoding sequences in Drosophila. Mol Biol Evol. Kuhn RM, Haussler D, Kent WJ. 2013. The UCSC genome browser and 24(10):2222–2234. associated tools. Brief Bioinform. 14(2):144–161. Caspi A, Pachter L. 2005. Identiﬁcation of transposable elements using Lack JB, et al. 2015. The Drosophila genome nexus: a population genomic multiple alignments of related genomes. Genome Res. resource of 623 Drosophila melanogaster genomes, including 197 16(2):260–270. from a single ancestral range population. Genetics 199(4):1229–1241. Charlesworth B, Langley CH. 1989. The population genetics of Drosophila Langley CH, Aquadro CF. 1987. Restriction-map variation in natural pop- transposable elements. Annu Rev Genet. 23:251–287. ulations of Drosophila melanogaster: white-locus region. Mol Biol Evol. Chung H, et al. 2006. Cis-regulatory elements in the Accord retrotrans- 4:651–663. poson result in tissue-speciﬁc expression of the Drosophila mela- Langley CH, et al. 1988. Naturally occurring variation in the restriction map nogaster insecticide resistance gene Cyp6g1. Genetics of the Amy region of Drosophila melanogaster. Genetics 175(3):1071–1077. 119:619–629. Cridland JM, Macdonald SJ, Long AD, Thornton KR. 2013. Abundance Lee YCG. 2015. The role of piRNA-mediated epigenetic silencing in the and distribution of transposable elements in two Drosophila QTL map- population dynamics of transposable elements in Drosophila mela- ping resources. Mol Biol Evol. 30(10):2311–2327. nogaster. PLoS Genet. 11(6):e1005269. Cridland JM, Thornton KR, Long AD. 2015. Gene expression variation in Lee YCG, Karpen GH. 2017. Pervasive epigenetic effects of Drosophila eu- Drosophila melanogaster due to rare transposable element insertion chromatic transposable elements impact their evolution. Elife 6:e25762. alleles of large effect. Genetics 199(1):85–93. Lee YCG, Langley CH. 2010. Transposable elements in natural populations Daborn PJ, et al. 2002. A single p450 allele associated with insecticide of Drosophila melanogaster. Philos Trans R Soc Lond B Biol Sci. resistance in Drosophila. Science 297(5590):2253–2256. 365(1544):1219–1228. 1544 Genome Biol. Evol. 10(6):1533–1545 doi:10.1093/gbe/evy104 Advance Access publication May 29, 2018 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1533/5020726 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Conserved Noncoding Elements Inﬂuence the Transposable Element Landscape in Drosophila GBE Lerman DN, Feder ME. 2005. Naturally occurring transposable elements Schaeffer SW, Aquadro CF, Langley CH. 1988. Restriction-map variation in disrupt hsp70 promoter function in Drosophila melanogaster. Mol Biol the Notch region of Drosophila melanogaster. Mol Biol Evol. Evol. 22(3):776–783. 5(1):30–40. Linheiro RS, Bergman CM. 2012. Whole genome resequencing reveals Siepel A, et al. 2005. Evolutionarily conserved elements in vertebrate, in- natural target site preferences of transposable elements in sect, worm, and yeast genomes. Genome Res. 15(8):1034–1050. Drosophila melanogaster. PLoS One 7(2):e30008. Singh ND, Arndt PF, Petrov DA. 2005. Genomic heterogeneity of back- Lipatov M, Lenkov K, Petrov DA, Bergman CM. 2005. Paucity of chimeric ground substitutional patterns in Drosophila melanogaster. Genetics gene-transposable element transcripts in the Drosophila melanogaster 169(2):709–722. genome. BMC Biol. 3:24. Spradling AC, et al. 1995. Gene disruptions using P transposable elements: Mackay TFC, et al. 2012. The Drosophila melanogaster genetic reference an integral component of the Drosophila genome project. Proc Natl panel. Nature 482(7384):173–178. Acad Sci U S A. 92(24):10824–10830. Makunin IV, Shloma VV, Stephen SJ, Pheasant M, Belyakin SN. 2013. Steele LD, et al. 2015. Selective sweep analysis in the genomes of the 91-R Comparison of ultra-conserved elements in Drosophilids and verte- and 91-C Drosophila melanogaster strains reveals few of the usual brates. PLoS One 8(12):e82362. suspects in dichlorodiphenyltrichloroethane (DDT) resistance. PLoS Negre N, et al. 2011. A cis-regulatory map of the Drosophila genome. One 10(3):e0123066. Nature 471(7339):527–531. Tyner C, et al. 2017. The UCSC Genome Browser database: 2017 update. Nelson MG, Linheiro RS, Bergman CM. 2017. McClintock: an integrated Nucleic Acids Res. 45:D626–D634. pipeline for detecting transposable element insertions in whole- Wan H, et al. 2014. Nrf2/Maf-binding-site-containing functional Cyp6a2 genome shotgun sequencing data. G3 (Bethesda) 7:2749–2762. allele is associated with DDT resistance in Drosophila melanogaster. Nielsen R, Korneliussen T, Albrechtsen A, Li Y, Wang J. 2012. SNP calling, Pest Manag Sci. 70(7):1048–1058. genotype calling, and sample allele frequency estimation from new- Wang J, Keightley PD, Halligan DL. 2007. Effect of divergence time and generation sequencing data. PLoS One 7(7):e37558. recombination rate on molecular evolution of Drosophila INE-1 trans- Nuzhdin SV. 1999. Sure facts, speculations, and open questions about the posable elements and other candidates for neutrally evolving sites. J evolution of transposable element copy number. Genetica 107(1–3):129. Mol Evol. 65(6):627. Petrov DA, Fiston-Lavier A-S, Lipatov M, Lenkov K, Gonzalez J. 2011. Waters LC, Zelhof AC, Shaw BJ, Ch’ang LY. 1992. Possible involvement of Population genomics of transposable elements in Drosophila mela- the long terminal repeat of transposable element 17.6 in regulating nogaster. Mol Biol Evol. 28(5):1633–1644. expression of an insecticide resistance-associated P450 gene in Presgraves DC. 2005. Recombination enhances protein adaptation in Drosophila. Proc Natl Acad Sci U S A. 89(11):4855–4859. Drosophila melanogaster. Curr Biol. 15(18):1651–1656. Zhang Y, Romanish MT, Mager DL. 2011. Distributions of transposable Quinlan AR, Hall IM. 2010. BEDTools: a ﬂexible suite of utilities for com- elements reveal hazardous zones in mammalian introns. PLoS Comput paring genomic features. Bioinformatics 26(6):841–842. Biol. 7(5):e1002046. Rahman R, et al. 2015. Unique transposon landscapes are pervasive across Zhu L, et al. 2009. Patterns of exon-intron architecture variation of genes in Drosophila melanogaster genomes. Nucleic Acids Res. eukaryotic genomes. BMC Genomics 10:47. 43(22):10655–10672. Zhuang J, Wang J, Theurkauf W, Weng Z. 2014. TEMP: a computational Rishishwar L, Mario-Ramrez L, Jordan IK. 2017. Benchmarking computa- method for analyzing transposable element polymorphism in popula- tional tools for polymorphic transposable element detection. Brief tions. Nucleic Acids Res. 42(11):6826–6838. Bioinformatics 18:908–918. Sackton TB, et al. 2010. Population genomic inferences from sparse high- throughput sequencing of two populations of Drosophila mela- nogaster. Genome Biol Evol. 1(0):449–465. Associate editor:Esther Betran Genome Biol. Evol. 10(6):1533–1545 doi:10.1093/gbe/evy104 Advance Access publication May 29, 2018 1545 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1533/5020726 by Ed 'DeepDyve' Gillespie user on 20 June 2018
Genome Biology and Evolution – Oxford University Press
Published: May 29, 2018
It’s your single place to instantly
discover and read the research
that matters to you.
Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.
All for just $49/month
Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly
Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.
Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals.
Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more.
All the latest content is available, no embargo periods.
“Hi guys, I cannot tell you how much I love this resource. Incredible. I really believe you've hit the nail on the head with this site in regards to solving the research-purchase issue.”Daniel C.
“Whoa! It’s like Spotify but for academic articles.”@Phil_Robichaud
“I must say, @deepdyve is a fabulous solution to the independent researcher's problem of #access to #information.”@deepthiw
“My last article couldn't be possible without the platform @deepdyve that makes journal papers cheaper.”@JoseServera