A major locus controls local adaptation and adaptive life history variation in a perennial plant

A major locus controls local adaptation and adaptive life history variation in a perennial plant Background: The initiation of growth cessation and dormancy represent critical life-history trade-offs between survival and growth and have important fitness effects in perennial plants. Such adaptive life-history traits often show strong local adaptation along environmental gradients but, despite their importance, the genetic architecture of these traits remains poorly understood. Results: We integrate whole genome re-sequencing with environmental and phenotypic data from common garden experiments to investigate the genomic basis of local adaptation across a latitudinal gradient in European aspen (Populus tremula). A single genomic region containing the PtFT2 gene mediates local adaptation in the timing of bud set and explains 65% of the observed genetic variation in bud set. This locus is the likely target of a recent selective sweep that originated right before or during colonization of northern Scandinavia following the last glaciation. Field and greenhouse experiments confirm that variation in PtFT2 gene expression affects the phenotypic variation in bud set that we observe in wild natural populations. Conclusions: Our results reveal a major effect locus that determines the timing of bud set and that has facilitated rapid adaptation to shorter growing seasons and colder climates in European aspen. The discovery of a single locus explaining a substantial fraction of the variation in a key life-history trait is remarkable, given that such traits are generally considered to be highly polygenic. These findings provide a dramatic illustration of how loci of large-effect for adaptive traits can arise and be maintained over large geographical scales in natural populations. Keywords: Populus tremula, Local adaptation, Genomic basis, PtFT2, Adaptive traits, Selective sweep Backgrounds investigating how local adaptation is established and Most species are distributed over heterogeneous envi- maintained at the molecular level in natural populations. ronments across their geographic range and spatially Many perennial plants, such as forest trees, have wide varying selection is known to induce adaptation to local geographic distributions and are consequently exposed environments [1]. Local adaptation thus provides an to a broad range of environmental conditions, making opportunity to study population genetic divergence in adaptation to diverse environmental and climate condi- action [2]. Although the interaction between gene flow tions crucial in these species [4–7]. Natural populations and natural selection is well studied from a theoretical of these plants are often locally adapted and display pro- point of view and makes a number of testable predic- nounced geographic clines in phenotypic traits related to tions [3], there are to date few empirical studies climatic adaptation even in the face of substantial gene flow [5, 6]. One of the most important traits mediating local adaptation is initiation of growth cessation at the end of the growing season, which represents a critical * Correspondence: jingwang368@gmail.com; par.ingvarsson@slu.se life history trade-off between survival and growth in Umeå Plant Science Centre, Department of Ecology and Environmental most perennial plants [8, 9]. Local adaptation in Science, Umeå University, 90187 Umeå, Sweden Full list of author information is available at the end of the article © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Wang et al. Genome Biology (2018) 19:72 Page 2 of 17 phenology traits, such as growth cessation, is well docu- according to latitude (r = 0.889, P < 0.001) but explaining mented at the phenotypic level in many long-lived per- only 1.3% of the total genetic variance (Fig. 1c; ennial species [2, 6]. Compared to traditional model and Additional file 2: Table S2). Consistent with this, a Man- crop species that are usually annuals, naturally inbred tel test also showed a weak pattern of isolation by dis- and have rich genomic resources available, the genomic tance (IBD; r = 0.210; P = 0.047; Additional file 3: Figure and evolutionary research in long-lived, outcrossing per- S1). Swedish populations of P. tremula have gone ennial species is much more difficult to conduct, and the through a recent admixture of divergent post-glacial lin- genetic architecture of adaptive traits in such species is eages following the Last Glacial Maximum (LGM) [14] therefore still rather poorly understood [5, 6]. and it is possible that this is capable of generating a Here we investigate the genomic signatures of local genome-wide pattern of clinal variation. However, exten- adaptation across a latitudinal gradient that limits the sive gene flow among populations of P. tremula, as sug- length of the growing season in European aspen (Popu- gested by the extremely low level of genome-wide lus tremula). P. tremula is a dioecious and obligately population genetic differentiation (mean F = 0.0021; ST outbreeding tree species; both seeds and pollen are Additional file 3: Figure S2), has almost eradicated any wind-dispersed and usually show weak population gen- such signal across the genome. etic structure [10, 11]. Despite low genetic differentiation at neutral molecular markers, local populations display Identifying genomic variants associated with local strong adaptive differentiation in phenology traits, such adaptation as the timing of bud set and growth cessation, across the We used three complementary approaches to identify latitudinal gradient [10]. In this study, we integrate candidate SNPs involved in local adaptation. First, we whole genome re-sequencing with field and greenhouse identified SNPs that were most strongly associated with experiments to characterize the genome-wide architec- the observed population structure using PCAdapt [15]. ture of local adaptation in P. tremula. Using a combin- Second, we identified SNPs showing strong associations ation of approaches, we identify a single genomic region, with environmental variables based on a latent factor centered on a P. tremula homolog of FLOWERING mixed-effect model (LFMM) [16]. Finally, we performed LOCUS T2(PtFT2), that controls a substantial fraction genome-wide association mapping (GWAS) on the tim- of the naturally occurring genetic variation in the timing ing of bud set, our target adaptive trait, using GEMMA of bud set. The region displays multiple signs of a recent (Fig. 2a,[16, 17]). SNPs identified as significant (false selective sweep that appears to have been restricted to discovery rate [FDR] < 0.05) by the three methods the northern-most populations. Our results provide evi- showed a large degree of overlap (Additional file 3: dence of a major locus that has facilitated rapid adapta- Figure S3) and for subsequent analyses we consider tion to shorter growing seasons and colder climates SNPs that were identified as significant by at least two of following post-glacial colonization. the three methods to be involved in local adaptation. In total, 99.2% of the 910 SNPs identified by all three Results methods and 89.1% of the additional 705 SNPs identified Genome sequencing, polymorphism detection, and by two methods were located in a single region spanning population structure c. 700 kbp on chromosome 10 (Fig. 2a, b; Additional file In this study, we used a total of 94 unrelated P. tremula 3: Figure S4; Additional file 4: Table S3). trees that were originally collected from 12 sites spanning SNPs associated with local adaptation displayed strong c. 10° of latitude (~ 56–66 °N) across Sweden (the SwAsp clinal patterns in allele frequencies with latitude, in stark collection from [12], see also Additional file 1: Table S1). contrast to 10,000 SNPs randomly selected from across Earlier studies have shown that the SwAsp collection dis- the genome that displayed no or negligible differences plays a strong latitudinal cline in the timing of bud set among populations (Additional file 3: Figure S5). The (Fig. 1a, b)[10–12]. We performed whole genome 700-kbp region on chromosome 10 encompasses 92 re-sequencing of all 94 aspens and obtained a total of genes and the most strongly associated variants for all 1139.2 Gb of sequence, with an average sequencing depth three tests are located in a region containing two P. tre- of ~ 30 × per individual covering > 88% of the reference mula homologs of the Arabidopsis FLOWERING genome (Additional file 1: Table S1). After stringent vari- LOCUS T (PtFT2; Potra001246g10694 and an unanno- ant calling and filtering, we identified a total of 4,425,109 tated copy located c. 20 kbp upstream of PtFT2, tenta- high-quality single nucleotide polymorphisms (SNPs) with tively named PtFT2β) (Fig. 2b, c). FT is known to be a minor allele frequency (MAF) > 5%. involved in controlling seasonal phenology in perennial We found very weak population structure across the plants [18] and has previously been implicated in regu- entire range using principal component analysis (PCA) lating short-day induced growth cessation, bud set, and [13], with a single significant axis separating individuals dormancy induction in Populus [19, 20]. Wang et al. Genome Biology (2018) 19:72 Page 3 of 17 a b Pop11 Pop12 Pop9 Pop10 56 58 60 62 64 66 Pop7 Pop8 Latitude 0.2 Pop6 Days Pop5 0.1 0.0 Pop3 Pop4 −0.1 −0.2 Pop1 −0.3 Pop2 −0.2 −0.1 0.0 0.1 0.2 PC1 (1.31%) Fig. 1 Geographic distribution and genetic structure of 94 aspen individuals. a Location of the 12 original sample sites of the SwAsp collection (circles) and the location of the two common garden sites (orange stars). The original collection sites span a latitudinal gradient of c. 10 latitude degrees across Sweden. b Genetic values for date of bud set for the 94 individuals included in the study across the two common gardens and three years (2005, 2006, and 2007). c Population structure in the SwAsp collection based on a PCA of 217,489 SNPs that were pruned to remove 2 −12 SNPs in high linkage disequilibrium (SNPs included all have r < 0.2). Although two axes are shown, only the first axis is significant (P =3.65×10 , Tacey-Widom test, 1.31% variance explained) We observed that structure of the PtFT2 locus is many SNPs that could individually contribute to bud set conserved across Populus species, but not between Popu- and hence could be involved in local adaptation. lus and Salix (Additional file 3: Figure S6). Although both copies of PtFT2 appear to be expressed (Additional file 3: Evidence of rapid adaptive evolution Figure S7), the SNP showing the strongest signal of local In order to gain further insight into the evolutionary history adaptation across all three methods (Potra001246:25256) of the PtFT2 locus, we performed several haplotype-based was located in the third intron of the previously annotated tests toexamine thepresenceofrecentpositiveselection in copy of PtFT2 (Potra001246g10694) (Fig. 2c). This SNP this region. We calculated the standardized integrated explain 65% of the observed genetic variation in the haplotype score (iHS) [21, 22] for all SNPs (8570 SNPs timing of bud set across years and sites. Furthermore, it where information of ancestral or derived states was avail- was identified as having highest probability of being the able) located in the 700-kbp region (Fig. 3a). Positive selec- causal variant within the 700-kbp region by CAVIAR [21] tion signals, revealed by |iHS| > 2.0, were observed for (Fig. 2b, c), a fine-mapping method that accounts for 20.6% of all tested SNPs. We found that the region sur- linkage disequilibrium (LD) and effect sizes to rank poten- rounding PtFT2 contained the highest concentration of sig- tial causal variants. Another potentially causal SNP nificant hits by the iHS test across the genome (Fig. 3b), (Potra001246:43095) in this region is in strong LD with confirming that PtFT2 locus as the strongest candidate for Potra001246:25256 (Fig. 2c). Therefore, we identify PtFT2 positive selection in the Swedish populations of P. tremula. as a candidate gene, and henceforth, we refer to the entire Similar results were found when the number of segregating ~ 700-kbp region centered on PtFT2 as the PtFT2 locus. sites by length (nSL) [23], which has proven sensitive for We note, however, that this region potentially harbors detecting incomplete selective sweeps, was calculated for PC2 (1.21%) Bud set (days) Wang et al. Genome Biology (2018) 19:72 Page 4 of 17 bc Fig. 2 Local adaptation signals across the genome. a Manhattan plots for SNPs associated with population structure (PCAdapt), climate variation (LFMM), and phenotype (GEMMA). The 700-kbp region surrounding PtFT2 gene (marked in red) is identified by all methods. The dashed line represents the significance threshold for each method. Quantile-quantile plot is displayed in the right panel,withsignificant SNPs highlighted in red. b Magnification of the phenotype association results (from GEMMA) for the region surrounding PtFT2 on Chr10. The coordinates correspond to the region 16.3 Mbp-17.0 Mbp on Chr10. Individual data points are colored according to LD with the most strongly associated SNP (Potra001246:25256). The two potential causal variants identified by CAVIAR within this region are marked by black circles. c Close-up view of the phenotype association results (from GEMMA) in a region corresponding to the blue bar in (b). This region contains the two PtFT2 homologs (red - exons, blue - UTRs) and several other genes (dark gray - exons, light grey -UTRs) these same loci (Additional file 3:FigureS8).Wefurther allele (Fig. 3e). Notably, the derived allele with high EHH is performed the extended haplotype homozygosity (EHH) largely restricted to the four high-latitude populations and test [24], centering on the most strongly associated SNP almost absent in the southern-most populations (Fig. 3c), (Potra001246:25256), to explore the extent of haplotype implying that PtFT2 locus has likely been subjected to geo- homozygosity around the selected region. The core haplo- graphically restricted selective sweeps [25]. type carrying the derived allele (G) had elevated EHH and To further understand the evolution of functional dif- exhibited long-range LD relative to haplotypes carrying the ferences between northern and southern PtFT2 alleles, ancestral allele (T) (Fig. 3d). Also, haplotypes carrying the we examined the patterns of genetic variation at the derived allele were longer than those carrying the ancestral PtFT2 locus separately for South (pop 1–6), Mid (pop Wang et al. Genome Biology (2018) 19:72 Page 5 of 17 a bc de f Fig. 3 Evidence of positive selection centered on the PtFT2 locus. a Patterns of normalized iHS scores (y-axes) across the ~ 700-kbp genomic region (x-axis)around the PtFT2 gene (vertical light gray bar). The dashed horizontal lines indicate the threshold of positive selection signal (|iHS| > 2). The red dot indicates the SNP (Potra001246:25256) showing the strongest signal of local adaptation. b A high concentration of significant |iHS| signals was found in the ~ 700-kbp region surrounding PtFT2 (marked as red line) compared to the genome-wide distribution (based on dividing the genome into non-overlapping windows of 700 kbp). The dashed lines represent the 95% and 99% quantiles, respectively. c Allele frequencies of the most strongly associated SNP Potra001246:25256 for the 12 original populations of the SwAsp collection. d The decay of extended haplotype homozygosity (EHH) of the derived (blue) and ancestral (red) alleles for the SNP Potra001246:25256. e The extent of the three most common haplotypes at Potra001246:25256. Rare recombinant haplotypes were pooled and are displayed in gray. f Joint inference of allele age and selection coefficient for the region surrounding PtFT2 7–8), and North (pop 9–12) populations. First, we found populations (Additional file 3: Figure S9). Finally, we per- that the nucleotide diversity at the PtFT2 locus was sig- formed a composite-likelihood based (CLR) test and sep- nificantly below the genome-wide averages in all groups of arately evaluated the evidence of positive selection in populations (Fig. 4a, b;Additional file 5: Table S4), which different groups of populations. As expected for positive was consistent with the expectation of a strong selective selection, a distorted site frequency spectrum with an ex- event [26]. In particular, northern populations were cess of rare and high frequency derived variants near the observed to have a much stronger reduction of genetic di- PtFT2 locus was only found in northern populations (Fig. versity relative to other populations (Fig. 4a, b). Addition- 4i, j;Additional file 5: Table S4). Overall, all these findings ally, the level of genetic differentiation among populations provide compelling evidence for the occurrence of a was exceptionally high at PtFT2 locus compared with gen- strong selection on a single variant at the PtFT2 locus in omic background, especially between southern and north- the northern-most Swedish populations of P. tremula. ern populations (Fig. 4c, d;Additional file 5: Table S4), The observation of a single adaptive haplotype rising implying that spatially varying selection has likely driven to high frequency in high-latitude populations (Fig. 4; latitudinal differentiation at this locus. Furthermore, high Additional file 3: Figure S9) is consistent with a selective H12 but low H2/H1 statistics [27] was only observed in sweep pattern, where adaptation can result either from a northern populations (Fig. 4e–h; Additional file 5:Table de novo mutation or from a low frequency standing vari- S4), providing a clear indication of a single adaptive haplo- ant that was already present in the population before the type that has risen to high frequency among these onset of selection [28]. Assuming the causal mutation Wang et al. Genome Biology (2018) 19:72 Page 6 of 17 Genome-wide Chr10-700kb ab 0.035 0.035 *** *** *** North 0.030 0.030 Mid 0.025 0.025 South 0.020 0.020 π π 0.015 0.015 0.010 0.010 0.005 0.005 0.000 0.000 North Mid South 0.5 0.5 *** *** *** N vs. S 0.4 0.4 N vs. M 0.3 0.3 S vs. M F F ST ST 0.2 0.2 0.1 0.1 0.0 0.0 N vs. S N vs. M S vs. M ef 0.8 1.0 n.s. *** *** North 0.8 0.6 Mid 0.6 South 0.4 H12 H12 0.4 0.2 0.2 0.0 0.0 North Mid South n.s. *** *** 1.0 1.0 0.8 0.8 0.6 0.6 H2/H1 H2/H1 0.4 0.4 North 0.2 0.2 Mid South 0.0 0.0 North Mid South n.s. n.s. *** North Mid CLR South CLR 0 100 200 300 400 500 600 700 North Mid South Position Fig. 4 Geographically restricted selective sweep in northern-most populations. Left panels: A magnified view of different summary statistics that are sensitive to the effects of a selective sweep for the ~ 700 kbp region surrounding PtFT2. The gray bar marks the location of the PtFT2 gene. Right panels: Comparison of these statistics between the PtFT2 region (colored boxplot) and the genome-wide averages (gray boxplot). Statistics were calculated separately for individuals from southern (population 1–6), middle (populations 7–8), and northern (populations 9–12) in Sweden. a, b Nucleotide diversity, π. c, d Genetic differentiation, F . e, f H12. g, h H2/H1. i, j Composite likelihood ratio (CLR) test for the presence of a ST selective sweep appeared near the time of the onset of selection, we used 95% credible interval = 719–114,122 years) and that an Approximate Bayesian Computation (ABC) method selection during the sweep has been relatively strong (s [29] to estimate jointly the age and strength of selection = 0.016, 95% credible interval = 0.006–0.192). This sug- acting on the northern allele. The results (Fig. 3f) point gests that the adaptive event that occurred in to a recent origin of the northern allele (T = 18,952 years, northern-most populations of P. tremula most likely Wang et al. Genome Biology (2018) 19:72 Page 7 of 17 represents an evolutionary response to the harsher of PtFT2 expression is involved in mediating local adap- environmental conditions experienced by these popula- tation, we selected two southern genotypes and two tions during the post-glacial colonization of northern northern genotypes for greenhouse and field experi- Scandinavia. ments in order to test whether PtFT2 expression regu- lates the timing of growth cessation and bud set. In greenhouse experiments, we found that the two north- PtFT2 regulates the timing of bud set ern genotypes showed rapid growth cessation and bud Although the extensive LD in the immediate vicinity of set following a shift from long (23-h day length) to short the PtFT2 locus (Fig. 2b) makes it hard to identify the day (19-h day length) conditions whereas the two south- true causal SNP(s) that are involved in mediating natural ern genotypes continued active growth under the same variation in bud set, we found that the significantly asso- conditions (Fig. 5a). Analyses of PtFT2 gene expression ciated SNPs are overall enriched in non-coding regions in these genotypes show a strong downregulation of located in and around genes and show a deficit in inter- PtFT2 in the northern genotypes in conjunction with genic regions (Additional file 3: Figure S10; Additional growth cessation and bud set (Fig. 5b; Additional file 6: file 4: Table S3). One possible way that functional vari- Table S5). Similarly, under field conditions we observe ation is mediated by these SNPs is thus by altering ex- that northern genotypes also show lower expression of pression patterns of related genes across the latitudinal PtFT2 even at a time point when all genotypes were ac- gradient. To further assess the possibility that patterns tively growing (Fig. 5c). ab 200 Low latitude sample High latitude sample SwAsp018 SwAsp023 048 12 16 20 c 20 ZT SwAsp112 SwAsp100 12.15 16.15 20.15 00.15 04.15 08.15 12.15 Time Fig. 5 PtFT2 expression affects short-day induced growth cessation and bud set in P. tremula. a Bud set phenotype under 19-h day-length conditions. Two southern clones (marked with a red box, SwAsp 018, Ronneby, latitude 56.2 °N; SwAsp 023, Vårgårda, latitudes 58 °N) and two northern clones (marked with a blue box, SwAsp 100, Umeå, latitude 63.9 °N; SwAsp 112, Luleå, latitudes 65.7 °N) were chosen to be analyzed. Trees were grown under 23-h day length for one month and then shifted to 19-h day length. Photos were taken one month after the shift to 19-h day length. b Dynamic expression analysis of PtFT2 in twosouthernclones (red, SwAsp018 and SwAsp023) and two northern clones (blue, SwAsp100 and SwAsp112) from the greenhouse experiment. The genotypes of these trees at the most strongly associated PtFT2 SNP are SwAsp018: T/T, SwAsp023: T/T, SwAsp100: not available and SwAsp 112: G/G. Samples for RT-PCR were taken two weeks after the trees were shifted to 19-h day length. Error bars, ±standard deviation. ZT zeitgeber time. c Dynamic expression analysis of PtFT2 in two southern clones (red, SwAsp005 and SwAsp023) and two northern clones (blue, SwAsp100 and SwAsp116) from common garden experiment. The genotypes of these trees at the most strongly associated PtFT2SNP are SwAsp005: T/T, SwAsp023: T/T, SwAsp100: not available and SwAsp 112: G/G. Samples were collected in the Sävar common garden in early July 2014 PtFT2/PtUBQ PtFT2/PtUBQ Wang et al. Genome Biology (2018) 19:72 Page 8 of 17 Furthermore, downregulation of the PtFT2 expression gene that plays a central and widely conserved role in using RNA interference (RNAi) to approximately 20% of day-length perception and seasonal regulation of photo- wild-type levels accelerates bud set by c. 23 days, a periodic responses [32]. In Populus, the FT gene is repre- difference that is comparable to the differences we ob- sented by two functionally diverged paralogs where serve between the most extreme phenotypes in our PtFT1 has been speculated to retain the function of re- field-collected trees (Fig. 6). For instance, wild-collected productive initiation whereas PtFT2 acts to maintain trees carrying the derived G allele in homozygous form growth and prevent bud set [18, 19]. We observe that for the most strongly associated SNP in PtFT2 differences in PtFT2 gene expression between genotypes (Potra001246:25256) set bud on average 28 days earlier from southern and northern Swedish populations are as- than those homozygous for the ancestral T allele, with sociated with the timing of bud set in response to vari- the derived G allele showing partial dominance (Fig. 6a). able day lengths in different environments (Fig. 5b, c). The RNAi experiment thus provides additional evidence Transgenic downregulation of PtFT2, under field condi- that differences in gene expression of PtFT2 are involved tions, yields a phenotype that closely mimics variation in mediating the phenotypic differences we observe in found in our wild collected trees, further implying that bud set between northern and southern genotypes. non-coding regulatory variation in or around PtFT2 me- diate local adaptation in bud set by altering the level and Discussion timing of PtFT2 expression. Moreover, a study in the re- To date, only a small number of candidate genes have lated species Populus trichocarpa also identified an asso- been used to identify potential loci linked to traits in- ciation between a non-coding variant at PtFT2, a SNP in volved in local adaptation in P. tremula [11, 30, 31]. the second intron, and naturally occurring variation in Here we have substantially expanded our earlier studies bud set [20]. Although the exact causal mutations differ, by utilizing data from whole genome re-sequencing to this demonstrates that parallel adaptive changes in the local environmental variables and phenotypic variation timing of bud set between P. tremula and P. trichocarpa, in a key adaptive life-history trait in order to investigate two species that diverged more than 7 million years ago the genomic basis of local adaptation in P. tremula.We and that occur on different continents, has involved identify a locus, centered on PtFT2, that has a major ef- changes in the same orthologous gene. fect on phenotypic variation in bud set and that has While PtFT2 has been shown to contribute to local played a key role in the establishment of local adaptation adaptation in Swedish populations of P. tremula, we only of P. tremula. The likely target of the selective sweep, observe a signal of a strong and recent selective sweep PtFT2,isa P. tremula homolog of the Arabidopsis FT at this locus in the four northern-most populations. This n=50 a b n=98 n=17 n=18 220 240 n=26 Control PtFT 2 RNAi TT GT GG PtFT2 Genotype PtFT2 Genotype Fig. 6 Phenotypic effects of PtFT2. a The timing of bud set for the three genotypes classes at the PtFT2 SNP (Potra001246:25256) that displays the strongest signal of local adaptation identified by all three methods as shown in Fig. 2a. The plot displays mean genotype bud set after correcting for common garden site, year, and block effects. The horizontal line indicates the median value and the vertical line marks the interquartile range. The number of genotypes in the respective classes is indicated above the figure. b The timing of bud set for wild type control lines and transgenic PtFT2 lines in the field experiments at Våxtorp. The structure of the plots is the same as in (a) Bud set (days) Bud set (days) Wang et al. Genome Biology (2018) 19:72 Page 9 of 17 selective event has likely been driven by adaptation in adaptation most likely arose from a selective sweep in- response to the substantially shorter growing seasons stead of an inversion [39]. Nonetheless, owing to the that P. tremula has encountered at northern latitudes limited ability to detect inversions using short-insert during the post-glacial colonization of northern Scandi- paired reads, future characterization of structural vari- navia following the last glaciation. One caveat concern- ation across the genome is clearly required to determine ing the selective scans performed in this study is that whether genomic rearrangements are involved in medi- splitting populations into groups along a geographic ating signals of adaptation in the Populus genome. Sec- transect (i.e. latitude) could confound inference of the ond, the establishment probability of additional adaptive underlying selective and demographic forces. For in- mutations can be increased in the vicinity of a locus stance, it is possible that adaptation to spatially varying undergoing strong divergent selection, leading to a gen- selection in Swedish populations of P. tremula have omic architecture where multiple, tightly linked loci are arisen in response to continuous rather than discrete controlling an adaptive trait [39, 40]. However, recent environment clines [10]. In addition, the estimated theoretical work has shown that the conditions for such age of the adaptive mutation at the PtFT2 locus establishment of de novo linked beneficial mutations are coincides with recent post-glacial re-colonization of rather restrictive [41]. Instead, another potentially more northern Scandinavia and it is thus possible that important mechanism for the formation of “genomic strong genetic drift at the front of the range expan- islands” of strong genetic differentiation is via secondary sion have promoted surfing of the adaptive allele in contact and the erosion of pre-existing genetic diver- the newly colonized regions [33, 34]. gence, which is a process that can be very rapid, espe- The weak population genetic structure we observe in cially compared to the alternative scenario that involves our samples, combined with the fact that both pollen the fixation of novel mutations [41]. This mechanism and seeds are wind dispersed in P. tremula, suggest that provides a tantalizing hypothesis for P. tremula where gene flow among Swedish populations of P. tremula is earlier studies have established the existence of a hybrid likely relatively high. In accordance with recent theoret- zone between divergent post-glacial lineages in Scandi- ical predictions [3], our findings show that despite the navia [14, 41]. The selective sweep at PtFT2 is geograph- relatively high, inferred rates of gene flow, strong selec- ically restricted and likely occurred before secondary tion for local adaptation is acting to maintain the contact. Therefore, the large genomic “island” of diver- large-effect beneficial alleles that underlie the locally gence that we observe surrounding the PtFT2 locus is a adaptive traits. Compared to small-effect loci that are strong candidate for having evolved via erosion following prone to swamping and only transiently contribute to secondary contact. local adaptation [3, 35], large-effect loci are more likely to establish and persist over longer time scales as they Conclusions are able to resist the homogenizing effect of migration Our study identifies a single genomic region containing [3]. The distribution of number and effect size for vari- the PtFT2 gene that has a major effect on regulating the ants controlling adaptive traits is therefore expected to timing of bud set and that has facilitated rapid local shift to few large-effect loci under persistent adaptation in P. tremula across a latitudinal gradient in migration-selection balance [3] compared with models Sweden. Natural selection is actively maintaining alter- from isolated populations [36]. Multiple mechanisms nate alleles at this locus despite low genetic differenti- can give rise to the characteristic pattern in P. tremula ation across the rest of the genome. In particular, we where a single locus explains most of the variation for a identify a strong and recent selective sweep that is re- key life-history trait and facilitates rapid adaptation. stricted to the northern-most populations. This adapta- First, the presence of genomic rearrangements, such as tion has thus likely arisen and been driven to fixation chromosomal inversions, that suppress recombination during the post-glacial colonization of northern Scandi- can be favored by natural selection and cause the clus- navia in response to the substantially shorter growing tering of SNPs associated with local adaptation at the seasons that are characteristic of northern latitudes. PtFT2 locus [37, 38]. However, in contrast to expecta- Although the FT gene has repeatedly gone through du- tions from the presence of an inversion, we did not ob- plications and functional diversifications in many plants, serve blocks of elevated LD around the PtFT2 locus variation within and around these FT-like genes are in- (Additional file 3: Figure S11). LD in this region decays volved in mediating adaptive responses to photoperiod rapidly and falls to background levels within a few thou- changes and altering overall fitness in a wide range of sand bases, similar to what is seen in other regions plant species [42]. Given the central role of FT as a key genome-wide (Additional file 3: Figure S11a). This indi- integrator of diverse environmental signals [32], it is per- cates that frequent recombination has occurred in this haps not surprising that FT is more likely to act like an region and that the clustering of SNPs involved in local evolutionary hotspot for rapid adaptation to changing Wang et al. Genome Biology (2018) 19:72 Page 10 of 17 environmental conditions compared to other genes in deletions (indels) were globally realigned using the Realig- the photoperiodic pathway (Additional file 3:Figure nerTargetCreator and IndelRealigner in the Genome Ana- S12) and that these adaptations are mediated through lysis Toolkit (GATK v3.2.2) [53]. To minimize the cis-regulatory changes [43, 44]. FT thus appears to influence of mapping bias, we further discarded the fol- serve as evolutionary “master switch” for adaptive lowing site types: (1) sites with extremely low (< 400× life-history variation, similar to what have been seen across all samples, i.e. less than an average of 4× per sam- for a few other loci in plants, such as FLC [45], FRI ple) or extremely high coverage (> 4500×, or approxi- [46], and DOG1[47, 48]. mately twice the mean depth at variant sites) across all samples after investigating the coverage distribution em- Methods pirically; (2) sites with a high number of reads (> 200×, Sample collection and sequencing that is on average > 2 reads per sample) with mapping We collected material from all available trees in the score equaling zero; (3) sites located within repetitive se- Swedish Aspen (SwAsp), which consists of 116 individ- quences as identified using RepeatMasker [54]; (4) sites uals collected from 12 different locations spanning the that were in genomic scaffolds with a length < 2 kbp. distribution range in Sweden [12] (Fig. 1a). Leaf material was sampled from one clonal replicate of each individual SNP and genotype calling growing at a common garden experiment located in SNP calling in each sample was performed using the Sävar, northern Sweden. Total genomic DNA for each GATK HaplotypeCaller and GenotypeGVCFs were then individual was extracted from frozen leaf tissue using used to perform the multi-sample joint aggregation, the DNeasy plant mini prep kit (QIAGEN, Valencia, CA, re-genotyping, and re-annotation of the newly merged USA). Paired-end sequencing libraries with an average records among all samples. We performed several filter- insert size of 650 bp were constructed for all samples ing steps to minimize SNP calling bias and to retain only according to the Illumina manufacturer’s instructions. high-quality SNPs: (1) remove SNPs at sites not passing Whole genome sequencing and base calling were all previous filtering criteria; (2) retain only bi-allelic performed on the Illumina HiSeq 2000 platform for all SNPs with a distance of > 5 bp away from any indels; (3) individuals to a mean, per-sample depth of approxi- remove SNPs for which the available information de- mately 30× at the Science for Life Laboratory, rived from < 70% of the sampled individuals after treat- Stockholm, Sweden. ing genotypes with quality score (GQ) < 10 as missing; (4) remove SNPs with an excess of heterozygotes and Sequence quality checking, read mapping, and deviates from Hardy–Weinberg equilibrium test (P value post-mapping filtering < 1e-8). After all steps of filtering, a total of 4,425,109 A total of 103 SwAsp individuals were successfully se- SNPs with minor allele frequency > 5% were left for quenced. Before read mapping, we used Trimmomatic downstream analysis. Finally, the effect of each SNP was v0.30 [49] to identify reads with adapter contamination and annotated using SnpEff version 3.6 [55] based on gene to trim adapter sequences from reads. After checking the models from the P. tremula reference genome (available quality of the raw sequencing data using FastQC (https:// at http://popgenie.org); the most deleterious effect was www.bioinformatics.babraham.ac.uk/projects/fastqc/), the selected if multiple effects occurred for the same SNP quality of sequencing reads was found to drop towards the using a custom Perl script. ends of reads (Additional file 3: Figure S13). We therefore used Trimmomatic v0.30 to trim bases from both ends of Relatedness, population structure, and isolation by the reads if their qualities were < 20. Reads < 36 bases after distance trimming were discarded completely. To identify closely related individuals and to infer popu- After quality control, all high-quality reads were mapped lation structure among the sampled individuals, we dis- to a de novo assembly of the P. tremula genome (available carded SNPs with missing rate > 10%, MAF < 5%, and at http://popgenie.org;[50]) using the BWA-MEM algo- that failed the Hardy–Weinberg equilibrium test (P < −6 rithm with default parameters using bwa-0.7.10 [51]. We 1×10 ) after all filtering steps as shown above. We also used MarkDuplicates methods from the Picard packages generated LD-trimmed SNP sets by removing one SNP (http://broadinstitute.github.io/picard/) to correct for the from each pair of SNPs when the correlation coefficients artifacts of PCR duplication by only keeping one read or (r ) between SNPs exceed 0.2 in blocks of 50 SNPs using read-pair with the highest summed base quality among PLINK v1.9 [56]. This yielded 217,489 independent SNPs those of identical external coordinates and/or same insert that were retained for downstream analyses of popula- lengths. Alignments of all paired-end and single-end reads tion structure. First, we used PLINK v1.9 to estimate for each sample were then merged using SAMtools 0.1.19 identity-by-state (IBS) scores among pairs of all individ- [52]. Sequencing reads in the vicinity of insertions and uals. Nine individuals were excluded from further Wang et al. Genome Biology (2018) 19:72 Page 11 of 17 analyses due to their high pairwise genetic similarity 2002–2012 for the original sample coordinates of each with another sampled individual (IBS > 0.8), leaving a SwAsp individual and the average values over years were total of 94 “unrelated” individuals for all subsequent then calculated. The environmental variables include analyses (Additional file 3: Figure S14). Then, we used latitude, longitude, altitude, the number of days with the smartpca program in EIGENSOFT v5.0 [13] to per- temperatures > 5 °C, UV irradiance, the photosynthetic form the PCA on the reduced set of genome-wide inde- photon flux density (PPFD), sunshine duration, monthly pendent SNPs. A Tracey-Widom test, implemented in and annual average precipitation, and temperature. Due the program twstats in EIGENSOFT v5.0, was used to to the high degree of correlation among these environ- determine the significance level of the eigenvectors. mental variables (Additional file 3: Figure S16a), we per- Finally, IBD analysis was computed based on the pair- formed a PCA on these variables using the “prcomp” wise comparison of the genetic and geographic distances function in R to identify PCs that best summarized the between populations. We calculated the population dif- range of environmental variation. The first environmen- ferentiation coefficient (F )[57] for each pair of the 12 tal PC, which explained > 60% of the total variance ST populations using VCFtools v0.1.12b [58]. The relation- (Additional file 3: Figure S16b,c) and had the strongest ship between genetic distance measured as F /(1-F ) loadings for the length of growing season (Additional file ST ST and geographic distance (km) was evaluated using 3: Figure S16d), was kept to represent our target envir- Mantel tests in the R package “vegan” [59]; the signifi- onmental variable for further analyses. We then used a cance of the correlation was estimated based on 9999 latent factor mixed-effect model (LFMM) implemented permutations. in the package LEA in R [63] to investigate associations between SNPs and the first environmental PC while sim- Screening for SNPs associated with local adaptation ultaneously accounting for population structure by We used three conceptually different approaches to test introducing unobserved latent factors into the model for genome-wide signatures of local adaptation. First, we [16]. Due to the weak population structure found in the detected candidate SNPs involved in local adaptation SwAsp collection (see “Results”), we ran the LEA func- using the PCA as implemented in PCAdapt [60]. PCA- tion lfmm with the number of latent factors (K) in the dapt examines the correlations (measured as the squared range of 1–3, using 5000 iterations as burn-in followed loadings ρ , which is the squared correlation between by 10,000 iterations to compute LFMM parameters for jk the jth SNP and the kth principal component [PC]) be- all SNPs. This was performed five times for each value tween genetic variants and specific PCs without any of K; we observed identical results across both different prior definition of populations. As only the first PC was values of K and across independent runs within each significant from the PCA (see “Results”), we only esti- value of K (data not shown). We only showed the results mated the squared loadings ρ with PC1 to identify using K = 2 to account for the background population j1 SNPs involved in local adaptation. Our results showed structure. LFMM outliers were detected as those SNPs that most outlier SNPs that were highly correlated with with FDR < 0.05 after using the method of Storey and the first population structure PC also had high F Tibshirani [61] to account for multiple testing. ST values between populations (Additional file 3: Figure Third, we obtained previously published measure- S15). Assuming a chi-square distribution (degree of free- ments of the timing of bud set, which is a highly herit- dom = 1) for the squared loadings ρ , as suggested by able trait that shows strong adaptive differentiation j1 [60], we used PCAdapt to compute P values for all SNPs along the latitudinal gradient [31]. To measure pheno- and then calculated the FDR using the method of Storey typic traits, all SwAsp individuals have previously been and Tibshirani [61] to generate a list of candidate SNPs clonally replicated (four ramets per individual) and showing significant associations to population structure. planted at two common garden sites in 2004 (Sävar, 63 ° Only SNPs with FDR < 5% were retained as those signifi- N, and Ekebo, 56 °N) (Fig. 1a). The common garden cantly involved in local adaptation. set-up is described in detail in Luquez et al. [12]. The Second, we tested for the presence of candidate SNPs timing of bud set was scored twice weekly starting from that exhibited high correlations with environmental gra- mid-July and continuing until all trees had set terminal dients. To do this, a total of 39 environmental variables buds. Bud set measurements were scored in three con- were analyzed (Additional file 7: Table S6). Precipitation secutive years, 2005–2007, in both common gardens and temperature values were retrieved from WorldClim [10]. A severe drought in Sävar caused most of the trees version 1 [62]. Sunshine hours, photosynthetically active to set bud prematurely in 2006 and we therefore ex- radiation, and ultraviolet (UV) radiation were obtained cluded data from Sävar in 2006 in all downstream ana- using the STRÅNG data model at the Swedish Meteoro- lyses (see Ingvarsson et al. [31] for further discussion). logical and Hydrological Institute (SMHI) (http:// We combined data on bud set from the two common strang.smhi.se). Values were collected from the years garden sites and years by predicting genetic values with Wang et al. Genome Biology (2018) 19:72 Page 12 of 17 best linear unbiased prediction (BLUP) for all individ- BLAST algorithm (E-value cut-off of 1e-50) and, finally, uals. ASReml [64] was used to fit Eq. 1 to the data for > 99% of the SNPs (4,397,537 out of 4,425,109) were an- calculating BLUP using restricted maximum-likelihood chored on the 19 pseudochromosomes. techniques: To test for the accuracy of imputation, and its relationship with the MAF cutoff and the missing rate of genotypes in our dataset, we selected 346,821 SNPs with z ¼ μ þ s þ b þ y þ β þ ε ð1Þ ijklm i jiðÞ ijklm kiðÞ l a rate of missing genotypes < 10% from the pseudo-chromosome 2 (~ 32.6 Mb) for the simulation where z is the phenotype of the mth individual in ijklm analysis. We randomly masked out varying proportions the jth block in the kth year of the lth clone from the ith (5–50%) of SNPs, which were treated as missing. BEA- site. In Eq. 1, μ denotes the grand mean and ε is the ijklm GLE v 4.1 was then used to impute genotypes at the residual term. The clone (β , BLUP) and residual term masked positions. We found high imputation accuracy (ε ) were modeled as random effects, whereas the site ijklm (> 0.97) across a wide range of MAF when rates of miss- (s ), site/block (b ), and site/year (y ) were treated as i j(i) k(i) ing genotypes were < 30% (Additional file 3: Figure S17), fixed effects. The genetic value of each individual was suggesting imputation and phasing by BEAGLE should then used as the dependent trait in a univariate linear not bias the accuracy of our results. We therefore mixed model for SNP-trait association analyses per- phased and imputed genotypes of the SNPs anchored on formed with GEMMA [17]. This method takes related- pseudo-chromosomes using BEAGLE v 4.1. ness among samples into account through the use of a kinship matrix. The mixed model approach imple- Estimation of ancestral states for all SNPs mented in GEMMA has been shown to outperform Since the ancestral states of SNPs are usually used for se- methods that try to correct for population structure by lection detection, for each SNP, we classified alleles as ei- including it as a fixed effect in the GWAS analyses [65]. ther ancestral or derived on the basis of comparisons with Given the extremely weak population structure we two outgroup species: P. tremuloides and P. trichocarpa. observe in our GWAS population (see “Results”), we did We obtained publicly available short read Illumina data not pursue any further corrections for population struc- for one P. tremuloides (SRA ID: SRR2749867) and one P. ture in the association analyses as this likely would trichocarpa (SRA ID: SRR1571343) individual from the severely reduce our power to detect significant associa- NCBI Sequence Read Archive (SRA) [69]. We individually tions. As described previously, we used a FDR < 5% [61] aligned the reads from these two samples to the de novo to control for the multiple testing across the 4,425,109 P. tremula assembly (Potra v1.1, available at PopGen SNPs. We calculated the proportion of variance in IE.org) and used UnifiedGenotyper in GATK to call SNPs phenotype explained by a given SNP (PVE) using the at all sites (−-output_mode EMIT_ALL_SITES). For each method of Shim et al. [66]: SNP, two procedures were performed to define their an- cestral states: (1) because P. trichocarpa is more distantly 2β MAFðÞ 1−MAF PVE ¼ 2 related to P. tremula compared to P. tremuloides [70]and ^ ^ 2β MAFðÞ 1−MAFþ se β 2NMAFðÞ 1−MAF from our previous study there were < 1% polymorphic sites shared between P. tremula and P. trichocarpa [69], ð2Þ we inferred the ancestral state as the P. trichocarpa allele where β and MAF is the effect size estimate and minor at sites where the P. trichocarpa individual was homozy- allele frequency for the SNP, N is sample size, and seðβÞ gous and matched one of the P. tremula alleles; otherwise, is standard error of effect size for the SNP. (2) we inferred the ancestral state as the P. tremuloides al- lele at sites where the P. tremuloides individual was homo- Genotype imputation zygous and matched one of the P. tremula alleles. If the For some haplotype-based selection tests, imputed and above two requirements were not met, the ancestral state phased datasets were needed. We therefore used BEA- was defined as missing. In total, we obtained information GLE v4.1 [67] to perform imputation and haplotype of ancestral states for 96.3% of all SNPs. phasing on genotypes of 94 individuals with default pa- rameters. Before performing genotype imputation, we Anchoring and orientation of SNPs associated with local first used Chromosemble from the Satsuma packages adaptation to a single region on chromosome 10 [68] to order and orient the scaffolds of the P. tremula As we found that a large majority of significant SNPs assembly to 19 pseudo-chromosomes according to syn- (> 90%) detected by at least two of the three methods teny with the P. trichocarpa genome. We then per- (PCAdapt, LFMM, and GEMMA) were clustered in a sin- formed pairwise genome alignment between scaffolds of gle genomic region on pseudo-chromosome 10, we per- P. tremula and the 19 pseudochromosomes using the formed several further steps to refine the anchoring and Wang et al. Genome Biology (2018) 19:72 Page 13 of 17 orientation of these SNPs within this region. First, we used sites by length (nS )[23], to test for possible positive se- ColaAlignSatsuma from the Satsuma packages [68]to lection. These statistics were calculated for all SNPs with align the genomes of P. tremula and P. trichocarpa using MAF > 0.05 and with information on ancestral state across default settings. The output was then converted and filtered the genome using the software selscan v1.1.0a [72]with into GBrowse synteny compatible format that was available its assumed default parameters. The iHS and the nS at http://popgenie.org [50]. Based on the alignment of the values were then normalized in frequency bins across the two genomes, 15 scaffolds from the P. tremula assembly whole genome (we used 100 bins). To test for whether that contain SNPs inferred to be associated with local adap- there is significant concentration of selection signals on tation were completely or partially mapped to a single re- the region surrounding the PtFT2, we divided the 19 gion on chromosome 10 of P. trichocarpa genome pseudo-chromosomes (without the seven scaffolds around (Additional file 4: Table S3). We then retained only seven the PtFT2 locus) into non-overlapping windows of 700 scaffolds that were completely mapped to the region and kbp and calculated the proportion of SNPs with |iHS| > 2 with length > 10 kbp. The seven scaffolds contained > 95% or with |nS | > 2 in each window. Statistical significance (1465 out of 1528) of the total number of significant SNPs was assessed using the ranking of genome-wide windows, in the single region of chromosome 10. Lastly, according to with windows having < 100 SNPs being excluded. the alignment results between the genome of P. tremula and P. trichocarpa, we re-ordered and re-oriented the seven Population-specific selective sweeps scaffolds to a ~ 700-kbp region for all downstream selection Several standard methods were further applied to search tests (Additional file 3:FigureS4). for signs of selective sweeps in different groups of popula- tions: (1) pairwise nucleotide diversity (π)[73], which is Linkage disequilibrium expected to have a local reduction following a selective To explore and compare patterns of LD between the sweep, was calculated using a sliding window approach ~ 700-kbp region on chromosome 10 and genome-wide with window size of 10 kbp and moving step of 5 kbp levels, we first calculated correlations (D’ and r )between using the software package - Analysis of Next-Generation all pairwise common SNPs (MAF > 5%, 9149 SNPs) in the Sequencing Data (ANGSD v0.602) [74] separately for ~ 700-kbp region using PLINK 1.9 [56]. Then we used South (pop 1-6), Mid (pop 7-8) and North (pop 9-12) PLINK 1.9 to randomly thin the number of common populations. Only the reads with mapping quality > 30 SNPs across the genome to 200,000 and calculated the and the bases with quality score > 20 were used in the ana- squared correlation coefficients (r )between allpairs of lysis. Windows with < 10% of covered sites remaining SNPs that were within a distance of 100 kbp. The decay of from the previous filtering steps (section 2.1) were ex- LD against physical distance was estimated using cluded; (2) Weir and Cockerham’s F , which measures ST non-linear regression of pairwise r vs the physical dis- genetic divergence between pairs of three groups of popu- tance between sites in base pairs [71]. lations, South, Mid, and North, was calculated using a sliding-window approach with window size of 10 kbp and Fine-mapping the causal variants using CAVIAR moving step of 5 kbp by VCFtools; (3) a combination of We utilized CAVIAR (CAusal Variants Identification in H12 and H2/H1 [27], which measures haplotype homozy- Associated Regions, v1.0) [21] to identify the potential gosity and can distinguish hard from soft selective sweeps, causal variants in the ~ 700-kbp region on chromosome was calculated in windows of 200 SNPs (~ 15 kbp) for 10. CAVIAR is a fine-mapping method that quantifies the common SNPs with MAF > 5% separately for South, Mid, probability of each variant in a locus to be causal and out- and North populations. As the mean LD (r )in P. tremula puts a set of variants that with a predefined probability decays to < 0.1 within 10 kbp (Additional file 3:Figure (e.g. 95% or 99%) contain all of causal variants at the S11a and [69]), the use of ~ 15 kbp windows should be locus. We created the LD structure by computing r be- large enough to differentiate the footprint of selective tween all pairwise significantly associated SNPs in the ~ sweeps from those caused by neutral processes. The H12 700-kbp region using PLINK 1.9. Marginal statistics for and H2/H1 values were then averaged using a sliding win- each significantly associated variant is the association sta- dow method with window size of 10 kbp and moving step tistics obtained from GWAS analysis by GEMMA. In our of 5 kbp; (4) a composite likelihood ratio statistic (CLR) analysis, we set the causal confidence as 99% (−r0.99) to [75], which contrasts the likelihood of the null hypothesis obtain a set of causal variants that capture all the causal based on the genome-wide site frequency spectrum with variants with the probability > 99%. the likelihood of a model where the site frequency has been altered by a recent selective sweep, was computed Positive selection detection using SweepFinder2 [76] separately for South, Mid, We measured two haplotype-based tests, integrated and North populations. SweepFinder2 is most efficient haplotype score (iHS) [22]and thenumber of segregating when information on the ancestral and derived states is Wang et al. Genome Biology (2018) 19:72 Page 14 of 17 available for SNPs and we therefore polarized SNPs as de- heterozygous and homozygous genotypes carrying the scribed above. The small fraction of SNPs (~ 3.7%) that selected (derived) allele were 1 + s/2 and 1 + s, res- could not be polarized was excluded from further analysis pectively. We simulated a chromosome region consisting using SweepFinder2. CLRs were calculated using of L = 25,000 sites and assumed a diploid effective popula- −8 non-overlapping windows with a spacing of 2 kbp; the tion size of N = 92,000, a mutation rate of μ =3.75 × 10 empirical site frequency spectrum across the whole P. tre- per base pair per generation [79], and a recombination −8 mula genome was estimated using the –f option in rate of r =0.729 ×10 per base pair per generation. To- SweepFinder2 after including all polymorphic sites in gether these parameters yielded a scaled population muta- the genome (a total of 8,007,303 SNPs). As recommended tion rate equal to Θ =4N μL = 86.27 and a scaled by Huber et al. [77], we only used sites that were poly- population recombination rate ρ =4N rL = 19.76. For each morphic or that represented fixed substitutions in each simulation, values for both s and T were drawn from uni- group of populations to scan for sweeps. To determine form prior distributions, log (T)~U(− 4,– 0.5) and whether there are significant differences of the above sta- log (s)~U(− 4,– 0.5). tistics between the 700-kbp region around PtFT2 gene on chromosome 10 and genome-wide estimates, we use the Gene expression of PtFT2 under active growth and during same strategy to divide the genome into the windows with growth cessation the same size for each test and calculated the above statis- Samples used for the expression analysis of PtFT2 were tics across the genome (results are shown in Fig. 4b, d, f, collected from both climate chamber and the field (Sävar, h, j and Additional file 5: Table S4). Significance for the 63.4 °N, Umeå) conditions. For treatment in the climate above statistical measurements was evaluated using chamber, two southern clones (SwAsp018, 56.2 °N, Mann–Whitney tests. Ronneby; SwAsp023, 56.2 °N, Ronneby) and two northern To assess the scale of a genomic region that is affected clones (SwAsp100, 63.9 °N, Umeå; SwAsp112, 65.6 °N, by a selective sweep, we ran coalescent simulations mod- Luleå) were selected. These plants were selected to repre- eling a selective sweep in the Northern populations. sent the northern-most and southern-most populations of Simulations were run assuming that the selected site the SwAsp collection that are experiencing the most was located at the center of the simulated region. Pa- diverged photoperiodic conditions. Plants were grown rameters for the simulations were taken from ABC cal- under 23-h day lengths for one month and then trans- culations dating the selective sweep inferred in the ferred to 19-h day-length conditions for two weeks before North populations (as shown below). Briefly, we used a the start of sampling. Leaves were harvested at 2-h inter- scaled population mutation rate (4N μ) of 0.0081/bp, vals for a total period of 24 h using three biological repli- which corresponds to the average observed diversity in cates of each genotype. Samples were subsequently the North populations. Similarly, we set the scaled popu- flash-frozen in liquid nitrogen and stored at − 80 °C until lation recombination rate (4N r) to 0.0019 to match the sample preparation. genome-wide ratio of r/μ = 0.229 in P. tremula [69]. Field samples were collected in the Sävar common Analyses of the simulated data using SweepFinder2 garden in early July 2014 and samples were taken from showed that a single selective sweep often yields mul- two southern clones (SwAsp005, 56.7 °N, Simlång; tiple significant peaks across a region spanning up to, SwAsp023, 56.2 °N, Ronneby) and two northern clones and even exceeding, 100 kbp (95% quartile: 148,221 bp; (SwAsp100, 63.9 °N, Umeå; SwAsp116, 65.6 °N, Luleå). Additional file 3: Figure S18). Leaves were harvested from three different clonal repli- cates planted in the common garden to serve as bio- Dating the selective sweep in the North populations logical repeats. Leaf samples were flash-frozen in in To date the inferred selective sweep in the North popula- liquid nitrogen and stored at − 80 °C until sample prep- tions, we used the ABC method described in Ormond et aration. Samples were collected at 2-h intervals for a al. [29] to jointly estimate s (the strength of selection on total period of 24 h. the beneficial mutation causing the sweep) and T (the RNA extraction for all samples was performed using a time since the beneficial allele fixed) assuming a model of CTAB-LiCl method [80]. Complementary DNA (cDNA) selection from a de novo mutation (hard selective sweep). synthesis was performed using the iScript cDNA Synthe- We simulated 5 × 10 independent selective sweep events sis Kit (BIO-RAD) according to the manufacturer’s in- using the coalescent simulation program msms [78]. For structions. Quantitative real-time PCR analyses were the coalescent simulations, the ancestries of samples were performed using a Roche LightCycler 480 II instrument, traced backwards in time using standard coalescent and the measurements were obtained using the relative methods and allowing for recombination. Selection was quantification method [81]. We used primers qFT2F modelled at a single site by applying forward simulations, (5’-AGCCCAAGGCCTACAGCAGGAA-3′) and qFT2R assuming additive selection so that the fitness of (5’-GGGAATCTTTCTCTCATGAT-3′) for amplifying Wang et al. Genome Biology (2018) 19:72 Page 15 of 17 the transcript of FT2 and qUBQF (5’-GTTGATTTT Additional file 7: Table S6. Average values of 39 environmental TGCTGGGAAGC-3′) and qUBQR (5’-GATCTTGGC variables over the years 2002–2012 for the original sample location of 94 P. tremula individuals used in this study. (XLSX 64 kb) CTTCACGTTGT-3′) for UBQ as the internal control. We assessed the presence of transcription of both PtFT2 (Potra001246g10694) and PtFT2β by digesting RT-PCR Acknowledgements We thank Carin Olofsson for extracting DNA for all samples used in this products with SacI that distinguish the two transcripts study. We thank three anonymous reviewers for their suggestions that (Additional file 3: Figure S7). helped improve the final version of the manuscript. STRÅNG data are obtained from the Swedish Meteorological and Hydrological Institute (SMHI), which were produced with support from the Swedish Radiation Protection Field experiment with transgenic PtFT2 lines Authority and the Swedish Environmental Agency. The authors also would Construction of the PtFT RNAi lines are described in de- like to acknowledge support from Science for Life Laboratory and the tail in [19]. Briefly, the clone used for transformations is a National Genomics Infrastructure (NGI) for providing assistance with massive parallel sequencing. All analyses were performed on resources provided by hybrid aspen, P. tremula × tremuloides, clone T89, that the Swedish National Infrastructure for Computing (SNIC) at Uppsala sets bud at 15-h day lengths [19] and this clone thus has a Multidisciplinary Center for Advanced Computational Science (UPPMAX) photoperiodic response that is comparable to SwAsp ge- under the projects b2010014 and b2011141. notypes from southern Sweden [82]. Transformed T89 Funding plants were planted together with wild type T89 (WT) The research was funded through grants from Vetenskapsrådet, Knut and controls in a common garden at Våxtorp, Halland (lati- Alice Wallenbergs stiftelse, and a Young Researcher Award from Umeå tude 56.4 N, longitude 13.1E) in 2014. Eighteen replicates University to PKI. JW was supported by a scholarship from the Chinese Scholarship Council. BT is supported by the UPSC “Industrial graduate school of each line were planted in a complete randomized block in forest genetics, biotechnology and breeding.” NRS is supported by the design together with six WT controls per block. Starting Trees and Crops for the Future (TC4F) project. in 2015, data were collected on growth cessation, bud for- mation, and bud set for all trees in the common garden. Availability of data and materials The whole genome sequencing (WGS) raw reads have been deposited in From early August, plants were visually inspected roughly NCBI’s sequence read archive (SRA) under accession number PRJNA297202 [83]. every five days and top shoots were scored according to a Background information, bud set genetic values (BLUPs), and environmental pre-determined scoring sheet (Additional file 3:Figure data at the site or origin for all clones used in the GWAS are available from Zendo [84] under a CC BY-SA 4.0 license. All scripts used for the analysis S19) and classified as active growth (score 3), growth ces- described are available on GitHub under a MIT License [85]. sation (score 2), bud formation (score 1), and bud set (score 0). Scoring was continued until all plants had com- Authors’ contributions pletely senesced in late October. Bud scoring data were JW, ON, SJ, NS, and PKI conceived of and designed the experiments. JW, BT, converted to Julian date of bud set and analyzed using the AJ, BN, DGS, NS, and PKI carried out all population genetic analyses. JD performed greenhouse and RT-PCR experiments. KMR and IHM collected following linear model: common garden data. JW and PKI wrote the paper. All authors commented on the manuscript. All authors read and approved the final manuscript. y ¼ μ þ α þ β þ ε i ij ij j Ethics approval and consent to participate where μ is an overall mean, α is the effect of treatment i Not applicable. (where i is either PtFT2 RNAi or WT), and β is the ef- fect of block j and ε are individual residual errors. ij Competing interests The authors declare that they have no competing interests. Additional files Publisher’sNote Additional file 1: Table S1. Geographical details of the 94 P. tremula Springer Nature remains neutral with regard to jurisdictional claims in published samples used in this study and the summary statistics of Illumina re- maps and institutional affiliations. sequencing data per sample. (DOCX 155 kb) Author details Additional file 2: Table S2.. Tracy-Widom statistics for the first three Umeå Plant Science Centre, Department of Ecology and Environmental eigenvalues in PCA. (DOCX 31 kb) Science, Umeå University, 90187 Umeå, Sweden. Centre for Integrative Additional file 3: Figures S1–S19. (PDF 7335 kb) Genetics, Department of Animal and Aquacultural Sciences, Faculty of Life Additional file 4: Table S3. List of the 1615 candidate SNPs associated Sciences, Norwegian University of Life Sciences, PO Box 5003, Ås, Norway. with local adaptation. (XLSX 234 kb) Umeå Plant Science Centre, Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, 901 83 Umeå, Additional file 5: Table S4. Summary statistics (median and central 4 5 Sweden. Stora Enso Biomaterials, 13104 Nacka, Sweden. Umeå Plant 95% range) for five selective sweep measures across the ~ 700-kbp Science Centre, Department of Plant Physiology, Umeå University, 90187 region around PtFT2 gene on chromosome 10 and genome-wide level. Umeå, Sweden. Wallenberg Advanced Bioinformatics Infrastructure, Science Pairwise nucleotide diversity (π), genetic divergence between groups of for Life Laboratory, Uppsala University, Uppsala, Sweden. Department of populations (F ), H12, H2/H1, and composite likelihood ratio (CLR) test ST Ecology and Genetics, Evolutionary Biology, Uppsala University, Uppsala, are compared for three groups of populations, South (pop 1–6), Mid (pop Sweden. Uppsala Multidisciplinary Center for Advanced Computational 7–8), and North (pop 9–12) corresponding to Fig. 4. (DOCX 95 kb) Science, Uppsala University, Uppsala, Sweden. Present address: Department Additional file 6: Table S5. ANOVA tables for analyses of gene of Plant Biology, Uppsala BioCenter, Swedish University of Agricultural expression in greenhouse and common garden experiments. (DOCX 51 kb) Sciences, PO Box 7080, 750 07 Uppsala, Sweden. Wang et al. Genome Biology (2018) 19:72 Page 16 of 17 Received: 4 December 2017 Accepted: 3 May 2018 27. Garud NR, Messer PW, Buzbas EO, Petrov DA. Recent selective sweeps in North American Drosophila melanogaster show signatures of soft sweeps. PLoS Genet. 2015;11:e1005004. 28. Hermisson J, Pennings PS. Soft sweeps and beyond: understanding the patterns and probabilities of selection footprints under rapid References adaptation. Methods Ecol Evol. 2017;8:700–16. 1. Richardson JL, Urban MC, Bolnick DI, Skelly DK. Microgeographic adaptation 29. Ormond L, Foll M, Ewing GB, Pfeifer SP, Jensen JD. Inferring the age of a and the spatial scale of evolution. Trends Ecol Evol. 2014;29:165–76. fixed beneficial allele. Mol Ecol. 2016;25:157–69. 2. Savolainen O, Lascoux M, Merilä J. Ecological genomics of local adaptation. 30. Ingvarsson PK, Garcia MV, Hall D, Luquez V, Jansson S. Clinal variation in Nat Rev Genet. 2013;14:807–20. phyB2, a candidate gene for day-length-induced growth cessation and bud 3. Yeaman S, Whitlock MC. The genetic architecture of adaptation under set, across a latitudinal gradient in European aspen (Populus tremula). migration-selection balance. Evolution. 2011;65:1897–911. Genetics. 2006;172:1845–53. 4. Neale DB, Ingvarsson PK. Population, quantitative and comparative 31. Ingvarsson PK, Garcia MV, Luquez V, Hall D, Jansson S. Nucleotide genomics of adaptation in forest trees. Curr Opin Plant Biol. 2008;11:149–55. polymorphism and phenotypic associations within and around the 5. Neale DB, Kremer A. Forest tree genomics: growing resources and phytochrome B2 Locus in European aspen (Populus tremula, Salicaceae). applications. Nat Rev Genet. 2011;12:111–22. Genetics. 2008;178:2217–26. 6. Savolainen O, Pyhajarvi T, Knurr T. Gene flow and local adaptation in trees. 32. Turck F, Fornara F, Coupland G. Regulation and identity of florigen: Annu Rev Ecol Evol Syst. 2007;21:5530–45. FLOWERING LOCUS T moves center stage. Annu Rev Plant Biol. 2008;59: 7. Aitken SN, Whitlock MC. Assisted gene flow to facilitate local adaptation to 573–94. climate change. Ann Rev Ecol Evol Syst. 2013;44:367–88. 33. Klopfstein S, Currat M, Excoffier L. The fate of mutations surfing on the wave 8. Rohde A, Bhalerao RP. Plant dormancy in the perennial context. Trends of a range expansion. Mol Biol Evol. 2006;23:482–90. Plant Sci. 2007;12:217–23. 34. Excoffier L, Ray N. Surfing during population expansions promotes genetic 9. Singh RK, Svystun T, AlDahmash B, Jönsson AM, Bhalerao RP. Photoperiod- revolutions and structuration. Trends Ecol Evol. 2008;23:347–51. and temperature-mediated control of phenology in trees - a molecular 35. Yeaman S. Local adaptation by alleles of small effect. Am Nat. 2015; perspective. New Phytol. 2017;213:511–24. 186(Suppl 1):S74–89. 10. Hall D, Luquez V, Garcia MV, St Onge KR, Jansson S, Ingvarsson PK. Adaptive 36. Orr HA. The population genetics of adaptation: the distribution of factors population differentiation in phenology across a latitudinal gradient in fixed during adaptive evolution. Evolution. 1998;52:935–49. European aspen (Populus tremula, L.): a comparison of neutral markers, 37. Kirkpatrick M, Barton N. Chromosome inversions, local adaptation and candidate genes and phenotypic traits. Evolution. 2007;61:2849–60. speciation. Genetics. 2006;173:419–34. 11. Ma X-F, Hall D, Onge KRS, Jansson S, Ingvarsson PK. Genetic differentiation, 38. Yeaman S. Genomic rearrangements and the evolution of clusters of locally clinal variation and phenotypic associations with growth cessation across adaptive loci. Proc Natl Acad Sci U S A. 2013;110:E1743–51. the Populus tremula photoperiodic pathway. Genetics. 2010;186:1033–44. 39. Supple MA, Hines HM, Dasmahapatra KK, Lewis JJ, Nielsen DM, Lavoie 12. Luquez V, Hall D, Albrectsen BR, Karlsson J, Ingvarsson P, Jansson S. Natural C, et al. Genomic architecture of adaptive color pattern divergence and phenological variation in aspen (Populus tremula): the SwAsp collection. convergence in Heliconius butterflies. Genome Res. 2013;23:gr150615. Tree Genet Genomes. 2008;4:279–92. 112-1257. 13. Patterson N, Price AL, Reich D. Population structure and eigenanalysis. PLoS 40. Feder JL, Gejji R, Yeaman S, Nosil P. Establishment of new mutations Genet. 2006;2:e190. under divergence and genome hitchhiking. Phil Trans Roy Soc B. 2012; 14. De Carvalho D, Ingvarsson PK, Joseph J, Suter L, Sedivy C, Macaya-Sanz D, et B367:461–74. al. Admixture facilitates adaptation from standing variation in the European 41. Yeaman S, Aeschbacher S, Bürger R. The evolution of genomic islands aspen (Populus tremula L.), a widespread forest tree. Mol Ecol. 2010;19:1638– by increased establishment probability of linked alleles. Mol Ecol. 2016; 25:2542–58. 15. Duforet-Frebourg N, Bazin É, Blum MGB. Genome scans for detecting footprints of local adaptation using a Bayesian factor model. Mol Biol Evol. 42. Pin PA, Nilsson O. The multifaceted roles of FLOWERING LOCUS T in plant 2014;31:2483–95. development. Plant Cell Environ. 2012;35:1742–55. 16. Frichot É, Schoville SD, Bouchard G, François O. Testing for associations 43. Stern DL, Orgogozo V. Is genetic evolution predictable? Science. 2009;323: between loci and environmental gradients using latent factor mixed 746–51. models. Mol Biol Evol. 2013;30:1687–99. 44. Stern DL, Orgogozo V. The loci of evolution: how predictable is genetic evolution? Evolution. 2008;62:2155–77. 17. Zhou X, Stephens M. Genome-wide efficient mixed-model analysis for 45. Li P, Filiault D, Box MS, Kerdaffrec E, van Oosterhout C, Wilczek AM, et al. association studies. Nat Genet. 2012;44:821–4. Multiple FLC haplotypes defined by independent cis-regulatory variation 18. Ding J, Nilsson O. Molecular regulation of phenology in trees-because the underpin life history diversity in Arabidopsis thaliana. Genes Dev. 2014;28: seasons they are a-changin. Curr Opin Plant Biol. 2016;29:73–9. 1635–40. 19. Böhlenius H, Huang T, Charbonnel-Campaa L, Brunner AM, Jansson S, Strauss SH, et al. CO/FT regulatory module controls timing of flowering and 46. Stinchcombe JR, Weinig C, Ungerer M, Olsen KM, Mays C, Halldorsdottir SS, seasonal growth cessation in trees. Science. 2006;312:1040–3. et al. A latitudinal cline in flowering time in Arabidopsis thaliana modulated 20. Evans LM, Slavov GT, Rodgers-Melnick E, Martin J, Ranjan P, Muchero W, et by the flowering time gene FRIGIDA. Proc Natls Acad Sci USA. 2004;101: 4712–7. al. Population genomics of Populus trichocarpa identifies signatures of selection and adaptive trait associations. Nat Genet. 2014;46:1089–96. 47. Huo H, Wei S, Bradford KJ. DELAY OF GERMINATION1(DOG1) regulates both 21. Hormozdiari F, Kostem E, Kang EY, Pasaniuc B, Eskin E. Identifying seed dormancy and flowering time through microRNA pathways. Proc Natl causal variants at loci with multiple signals of association. Genetics. Acad Sci U S A. 2016;113:E2199–206. 2014;198:497–508. 48. Kerdaffrec E, Filiault DL, Korte A, Sasaki E, Nizhynska V, Seren Ü, et al. 22. Voight BF, Kudaravalli S, Wen X, Pritchard JK. A map of recent positive Multiple alleles at a single locus control seed dormancy in Swedish selection in the human genome. PLoS Biol. 2006;4:e72. Arabidopsis. elife. 2016;5:e22502. 49. Bolger AM, Lohse M, Usadel B. Trimmomatic a flexible trimmer for Illumina 23. Ferrer-Admetlla A, Liang M, Korneliussen T, Nielsen R. On detecting sequence data. Bioinformatics. 2014;30:2114–20. incomplete soft or hard selective sweeps using haplotype structure. Mol Biol Evol. 2014;31:1275–91. 50. Sundell D, Mannapperuma C, Netotea S, Delhomme N, Lin Y-C, Sjödin A, et al. The Plant Genome Integrative Explorer Resource: PlantGenIE.org. New 24. Sabeti PC, Reich DE, Higgins JM, Levine HZP, Richter DJ, Schaffner SF, et al. Detecting recent positive selection in the human genome from haplotype Phytol. 2015;208:1149–56. structure. Nature. 2002;419:832–7. 51. Li H. Aligning sequence reads, clone sequences and assembly contigs with 25. Bragg JG, Supple MA, Andrew RL, Borevitz JO. Genomic variation across BWA-MEM. arXiv. 2013;1303:3997. http://arxiv.org/abs/1303.3997 landscapes: insights and applications. New Phytol. 2015;207:953–67. 52. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The 26. Nielsen R. Molecular signatures of natural selection. Annu Rev Genet. 2005; Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25: 39:197–218. 2078–9. Wang et al. Genome Biology (2018) 19:72 Page 17 of 17 53. DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, et al. A 79. Ingvarsson PK. Multilocus patterns of nucleotide polymorphism and the framework for variation discovery and genotyping using next-generation demographic history of Populus tremula. Genetics. 2008;180:329–40. DNA sequencing data. Nat Genet. 2011;43:491–8. 80. Xu M, Zang B, Yao HS, Huang MR. Isolation of high quality RNA and 54. Tarailo-Graovac M, Chen N. Using RepeatMasker to identify repetitive molecular manipulations with various tissues of Populus. Russ J Plant elements in genomic sequences. Curr Protoc Bioinformatics. 2009;Chapter 4: Physiol. 2009;56:716–9. Unit 4.10. 81. Livak KJ, Schmittgen TD. Analysis of relative gene expression data using 55. Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, et al. A program real-time quantitative PCR and the 2(−Delta Delta C(T)) Method. Methods. for annotating and predicting the effects of single nucleotide 2001;25:402–8. polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster 82. Michelson IH, Ingvarsson PK, Robinson KM, Edlund E, Eriksson ME, Nilsson O, strain w1118; iso-2; iso-3. Fly. 2012;6:80–92. et al. Autumn senescence in aspen is not triggered by day length. Physiol Plant. 2018;162:123–34. 56. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. 83. Wang J, Ding J, Tan B, Robinson KM, Michelson IH, Johansson A, et al. A PLINK: a tool set for whole-genome association and population-based major locus controls local adaptation and adaptive life history variation in a linkage analyses. Am J Hum Genet. 2007;81:559–75. perennial plant. NCBI SRA; 2017. BioProject Accession: PRJNA297202. https:// 57. Weir BS, Cockerham CC. Estimating F-statistics for the analysis of population www.ncbi.nlm.nih.gov/bioproject/PRJNA297202/. Accessed 4 Oct 2016. structure. Evolution. 1984;38:1358–70. 84. Ingvarsson PK. Data from SwAsp collection - environmental PCAs and bud 58. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The set BLUPs. https://doi.org/10.5281/zenodo.844372. Accessed 4 Dec 2017. variant call format and VCFtools. Bioinformatics. 2011;27:2156–8. 85. Wang J, Ding J, Tan B, Robinson KM, Michelson IH, Johansson A, et al. A 59. Oksanen J, Kindt R, Legendre P, OHara B, Simpson GL, Solymos P, et al. major locus controls local adaptation and adaptive life history variation in a vegan: Community Ecology Package. https://cran.r-project.org/web/ perennial plant. Github. 2018. https://github.com/parkingvarsson/ packages/vegan/index.html. PhotoperiodLocalAdaptation. Accessed 26 Mar 2018. 60. Duforet-Frebourg N, Luu K, Laval G, Bazin É, Blum MGB. Detecting genomic signatures of natural selection with principal component analysis: application to the 1000 Genomes Data. Mol Biol Evol. 2016;33:1082–93. 61. Storey JD, Tibshirani R. Statistical significance for genomewide studies. Proc Natl Acad Sci U S A. 2003;100:9440–5. 62. Hijmans RJ, Cameron SE, Parra JL, Jones PG, Jarvis A. Very high resolution interpolated climate surfaces for global land areas. Int J Climatol. 2005;25: 1965–78. 63. Frichot E, François O. LEA an R package for Landscape and Ecological Association studies. Methods Ecol Evol. 2015;6:925–9. 64. Gilmour AR, Gogel BJ, Cullis BR, Thompson R. ASReml User Guide Release 3. 0. 2009. http://www.vsni.co.uk/. 65. Vilhjálmsson BJ, Nordborg M. The nature of confounding in genome-wide association studies. Nat Rev Genet. 2013;14:1–2. 66. Shim H, Chasman DI, Smith JD, Mora S, Ridker PM, Nickerson DA, et al. A multivariate genome-wide association analysis of 10 LDL subfractions, and their response to statin treatment, in 1868 Caucasians. PLoS One. 2015;10: e0120758. 67. Browning BL, Browning SR. A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. Am J Hum Genet. 2009;84:210–23. 68. Grabherr MG, Russell P, Meyer M, Mauceli E, Alföldi J, Di Palma F, et al. Genome-wide synteny through highly sensitive sequence alignment: Satsuma. Bioinformatics. 2010;26:1145–51. 69. Wang J, Street NR, Scofield DG, PK I. Natural selection and recombination rate variation shape nucleotide polymorphism across the genomes of three related Populus species. Genetics. 2016;202:1185–200. 70. Wang Z, Du S, Dayanandan S, Wang D, Zeng Y, Zhang J. Phylogeny reconstruction and hybrid analysis of Populus (Salicaceae) based on nucleotide sequences of multiple single-copy nuclear genes and plastid fragments. PLoS One. 2014;9:e103645. 71. Remington DL, Thornsberry JM, Matsuoka Y, Wilson LM, Whitt SR, Doebley J, et al. Structure of linkage disequilibrium and phenotypic associations in the maize genome. Proc Natl Acad Sci U S A. 2001;98:11479–84. 72. Szpiech ZA, Hernandez RD. Selscan an efficient multi-threaded program to perform EHH-based scans for positive selection. Mol Biol Evol. 2014; 31:2824–7. 73. Tajima F. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 1989;123:585–95. 74. Korneliussen TS, Albrechtsen A, Nielsen R. ANGSD Analysis of Next Generation Sequencing Data. BMC Bioinformatics. 2014;15:356. 75. Kim Y, Stephan W. Detecting a local signature of genetic hitchhiking along a recombining chromosome. Genetics. 2002;160:765–77. 76. DeGiorgio M, Huber CD, Hubisz MJ, Hellmann I, Nielsen R. SweepFinder2: increased sensitivity, robustness and flexibility. Bioinformatics. 2016;32:1895–7. 77. Huber CD, DeGiorgio M, Hellmann I, Nielsen R. Detecting recent selective sweeps while controlling for mutation rate and background selection. Mol Ecol. 2016;25:142–56. 78. Ewing G, Hermisson J. MSMS a coalescent simulation program including recombination, demographic structure and selection at a single locus. Bioinformatics. 2010;26:2064–5. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Genome Biology Springer Journals

A major locus controls local adaptation and adaptive life history variation in a perennial plant

Free
17 pages

Loading next page...
 
/lp/springer_journal/a-major-locus-controls-local-adaptation-and-adaptive-life-history-Q64uuX5fMG
Publisher
BioMed Central
Copyright
Copyright © 2018 by The Author(s).
Subject
Life Sciences; Animal Genetics and Genomics; Human Genetics; Plant Genetics and Genomics; Microbial Genetics and Genomics; Bioinformatics; Evolutionary Biology
eISSN
1474-760X
D.O.I.
10.1186/s13059-018-1444-y
Publisher site
See Article on Publisher Site

Abstract

Background: The initiation of growth cessation and dormancy represent critical life-history trade-offs between survival and growth and have important fitness effects in perennial plants. Such adaptive life-history traits often show strong local adaptation along environmental gradients but, despite their importance, the genetic architecture of these traits remains poorly understood. Results: We integrate whole genome re-sequencing with environmental and phenotypic data from common garden experiments to investigate the genomic basis of local adaptation across a latitudinal gradient in European aspen (Populus tremula). A single genomic region containing the PtFT2 gene mediates local adaptation in the timing of bud set and explains 65% of the observed genetic variation in bud set. This locus is the likely target of a recent selective sweep that originated right before or during colonization of northern Scandinavia following the last glaciation. Field and greenhouse experiments confirm that variation in PtFT2 gene expression affects the phenotypic variation in bud set that we observe in wild natural populations. Conclusions: Our results reveal a major effect locus that determines the timing of bud set and that has facilitated rapid adaptation to shorter growing seasons and colder climates in European aspen. The discovery of a single locus explaining a substantial fraction of the variation in a key life-history trait is remarkable, given that such traits are generally considered to be highly polygenic. These findings provide a dramatic illustration of how loci of large-effect for adaptive traits can arise and be maintained over large geographical scales in natural populations. Keywords: Populus tremula, Local adaptation, Genomic basis, PtFT2, Adaptive traits, Selective sweep Backgrounds investigating how local adaptation is established and Most species are distributed over heterogeneous envi- maintained at the molecular level in natural populations. ronments across their geographic range and spatially Many perennial plants, such as forest trees, have wide varying selection is known to induce adaptation to local geographic distributions and are consequently exposed environments [1]. Local adaptation thus provides an to a broad range of environmental conditions, making opportunity to study population genetic divergence in adaptation to diverse environmental and climate condi- action [2]. Although the interaction between gene flow tions crucial in these species [4–7]. Natural populations and natural selection is well studied from a theoretical of these plants are often locally adapted and display pro- point of view and makes a number of testable predic- nounced geographic clines in phenotypic traits related to tions [3], there are to date few empirical studies climatic adaptation even in the face of substantial gene flow [5, 6]. One of the most important traits mediating local adaptation is initiation of growth cessation at the end of the growing season, which represents a critical * Correspondence: jingwang368@gmail.com; par.ingvarsson@slu.se life history trade-off between survival and growth in Umeå Plant Science Centre, Department of Ecology and Environmental most perennial plants [8, 9]. Local adaptation in Science, Umeå University, 90187 Umeå, Sweden Full list of author information is available at the end of the article © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Wang et al. Genome Biology (2018) 19:72 Page 2 of 17 phenology traits, such as growth cessation, is well docu- according to latitude (r = 0.889, P < 0.001) but explaining mented at the phenotypic level in many long-lived per- only 1.3% of the total genetic variance (Fig. 1c; ennial species [2, 6]. Compared to traditional model and Additional file 2: Table S2). Consistent with this, a Man- crop species that are usually annuals, naturally inbred tel test also showed a weak pattern of isolation by dis- and have rich genomic resources available, the genomic tance (IBD; r = 0.210; P = 0.047; Additional file 3: Figure and evolutionary research in long-lived, outcrossing per- S1). Swedish populations of P. tremula have gone ennial species is much more difficult to conduct, and the through a recent admixture of divergent post-glacial lin- genetic architecture of adaptive traits in such species is eages following the Last Glacial Maximum (LGM) [14] therefore still rather poorly understood [5, 6]. and it is possible that this is capable of generating a Here we investigate the genomic signatures of local genome-wide pattern of clinal variation. However, exten- adaptation across a latitudinal gradient that limits the sive gene flow among populations of P. tremula, as sug- length of the growing season in European aspen (Popu- gested by the extremely low level of genome-wide lus tremula). P. tremula is a dioecious and obligately population genetic differentiation (mean F = 0.0021; ST outbreeding tree species; both seeds and pollen are Additional file 3: Figure S2), has almost eradicated any wind-dispersed and usually show weak population gen- such signal across the genome. etic structure [10, 11]. Despite low genetic differentiation at neutral molecular markers, local populations display Identifying genomic variants associated with local strong adaptive differentiation in phenology traits, such adaptation as the timing of bud set and growth cessation, across the We used three complementary approaches to identify latitudinal gradient [10]. In this study, we integrate candidate SNPs involved in local adaptation. First, we whole genome re-sequencing with field and greenhouse identified SNPs that were most strongly associated with experiments to characterize the genome-wide architec- the observed population structure using PCAdapt [15]. ture of local adaptation in P. tremula. Using a combin- Second, we identified SNPs showing strong associations ation of approaches, we identify a single genomic region, with environmental variables based on a latent factor centered on a P. tremula homolog of FLOWERING mixed-effect model (LFMM) [16]. Finally, we performed LOCUS T2(PtFT2), that controls a substantial fraction genome-wide association mapping (GWAS) on the tim- of the naturally occurring genetic variation in the timing ing of bud set, our target adaptive trait, using GEMMA of bud set. The region displays multiple signs of a recent (Fig. 2a,[16, 17]). SNPs identified as significant (false selective sweep that appears to have been restricted to discovery rate [FDR] < 0.05) by the three methods the northern-most populations. Our results provide evi- showed a large degree of overlap (Additional file 3: dence of a major locus that has facilitated rapid adapta- Figure S3) and for subsequent analyses we consider tion to shorter growing seasons and colder climates SNPs that were identified as significant by at least two of following post-glacial colonization. the three methods to be involved in local adaptation. In total, 99.2% of the 910 SNPs identified by all three Results methods and 89.1% of the additional 705 SNPs identified Genome sequencing, polymorphism detection, and by two methods were located in a single region spanning population structure c. 700 kbp on chromosome 10 (Fig. 2a, b; Additional file In this study, we used a total of 94 unrelated P. tremula 3: Figure S4; Additional file 4: Table S3). trees that were originally collected from 12 sites spanning SNPs associated with local adaptation displayed strong c. 10° of latitude (~ 56–66 °N) across Sweden (the SwAsp clinal patterns in allele frequencies with latitude, in stark collection from [12], see also Additional file 1: Table S1). contrast to 10,000 SNPs randomly selected from across Earlier studies have shown that the SwAsp collection dis- the genome that displayed no or negligible differences plays a strong latitudinal cline in the timing of bud set among populations (Additional file 3: Figure S5). The (Fig. 1a, b)[10–12]. We performed whole genome 700-kbp region on chromosome 10 encompasses 92 re-sequencing of all 94 aspens and obtained a total of genes and the most strongly associated variants for all 1139.2 Gb of sequence, with an average sequencing depth three tests are located in a region containing two P. tre- of ~ 30 × per individual covering > 88% of the reference mula homologs of the Arabidopsis FLOWERING genome (Additional file 1: Table S1). After stringent vari- LOCUS T (PtFT2; Potra001246g10694 and an unanno- ant calling and filtering, we identified a total of 4,425,109 tated copy located c. 20 kbp upstream of PtFT2, tenta- high-quality single nucleotide polymorphisms (SNPs) with tively named PtFT2β) (Fig. 2b, c). FT is known to be a minor allele frequency (MAF) > 5%. involved in controlling seasonal phenology in perennial We found very weak population structure across the plants [18] and has previously been implicated in regu- entire range using principal component analysis (PCA) lating short-day induced growth cessation, bud set, and [13], with a single significant axis separating individuals dormancy induction in Populus [19, 20]. Wang et al. Genome Biology (2018) 19:72 Page 3 of 17 a b Pop11 Pop12 Pop9 Pop10 56 58 60 62 64 66 Pop7 Pop8 Latitude 0.2 Pop6 Days Pop5 0.1 0.0 Pop3 Pop4 −0.1 −0.2 Pop1 −0.3 Pop2 −0.2 −0.1 0.0 0.1 0.2 PC1 (1.31%) Fig. 1 Geographic distribution and genetic structure of 94 aspen individuals. a Location of the 12 original sample sites of the SwAsp collection (circles) and the location of the two common garden sites (orange stars). The original collection sites span a latitudinal gradient of c. 10 latitude degrees across Sweden. b Genetic values for date of bud set for the 94 individuals included in the study across the two common gardens and three years (2005, 2006, and 2007). c Population structure in the SwAsp collection based on a PCA of 217,489 SNPs that were pruned to remove 2 −12 SNPs in high linkage disequilibrium (SNPs included all have r < 0.2). Although two axes are shown, only the first axis is significant (P =3.65×10 , Tacey-Widom test, 1.31% variance explained) We observed that structure of the PtFT2 locus is many SNPs that could individually contribute to bud set conserved across Populus species, but not between Popu- and hence could be involved in local adaptation. lus and Salix (Additional file 3: Figure S6). Although both copies of PtFT2 appear to be expressed (Additional file 3: Evidence of rapid adaptive evolution Figure S7), the SNP showing the strongest signal of local In order to gain further insight into the evolutionary history adaptation across all three methods (Potra001246:25256) of the PtFT2 locus, we performed several haplotype-based was located in the third intron of the previously annotated tests toexamine thepresenceofrecentpositiveselection in copy of PtFT2 (Potra001246g10694) (Fig. 2c). This SNP this region. We calculated the standardized integrated explain 65% of the observed genetic variation in the haplotype score (iHS) [21, 22] for all SNPs (8570 SNPs timing of bud set across years and sites. Furthermore, it where information of ancestral or derived states was avail- was identified as having highest probability of being the able) located in the 700-kbp region (Fig. 3a). Positive selec- causal variant within the 700-kbp region by CAVIAR [21] tion signals, revealed by |iHS| > 2.0, were observed for (Fig. 2b, c), a fine-mapping method that accounts for 20.6% of all tested SNPs. We found that the region sur- linkage disequilibrium (LD) and effect sizes to rank poten- rounding PtFT2 contained the highest concentration of sig- tial causal variants. Another potentially causal SNP nificant hits by the iHS test across the genome (Fig. 3b), (Potra001246:43095) in this region is in strong LD with confirming that PtFT2 locus as the strongest candidate for Potra001246:25256 (Fig. 2c). Therefore, we identify PtFT2 positive selection in the Swedish populations of P. tremula. as a candidate gene, and henceforth, we refer to the entire Similar results were found when the number of segregating ~ 700-kbp region centered on PtFT2 as the PtFT2 locus. sites by length (nSL) [23], which has proven sensitive for We note, however, that this region potentially harbors detecting incomplete selective sweeps, was calculated for PC2 (1.21%) Bud set (days) Wang et al. Genome Biology (2018) 19:72 Page 4 of 17 bc Fig. 2 Local adaptation signals across the genome. a Manhattan plots for SNPs associated with population structure (PCAdapt), climate variation (LFMM), and phenotype (GEMMA). The 700-kbp region surrounding PtFT2 gene (marked in red) is identified by all methods. The dashed line represents the significance threshold for each method. Quantile-quantile plot is displayed in the right panel,withsignificant SNPs highlighted in red. b Magnification of the phenotype association results (from GEMMA) for the region surrounding PtFT2 on Chr10. The coordinates correspond to the region 16.3 Mbp-17.0 Mbp on Chr10. Individual data points are colored according to LD with the most strongly associated SNP (Potra001246:25256). The two potential causal variants identified by CAVIAR within this region are marked by black circles. c Close-up view of the phenotype association results (from GEMMA) in a region corresponding to the blue bar in (b). This region contains the two PtFT2 homologs (red - exons, blue - UTRs) and several other genes (dark gray - exons, light grey -UTRs) these same loci (Additional file 3:FigureS8).Wefurther allele (Fig. 3e). Notably, the derived allele with high EHH is performed the extended haplotype homozygosity (EHH) largely restricted to the four high-latitude populations and test [24], centering on the most strongly associated SNP almost absent in the southern-most populations (Fig. 3c), (Potra001246:25256), to explore the extent of haplotype implying that PtFT2 locus has likely been subjected to geo- homozygosity around the selected region. The core haplo- graphically restricted selective sweeps [25]. type carrying the derived allele (G) had elevated EHH and To further understand the evolution of functional dif- exhibited long-range LD relative to haplotypes carrying the ferences between northern and southern PtFT2 alleles, ancestral allele (T) (Fig. 3d). Also, haplotypes carrying the we examined the patterns of genetic variation at the derived allele were longer than those carrying the ancestral PtFT2 locus separately for South (pop 1–6), Mid (pop Wang et al. Genome Biology (2018) 19:72 Page 5 of 17 a bc de f Fig. 3 Evidence of positive selection centered on the PtFT2 locus. a Patterns of normalized iHS scores (y-axes) across the ~ 700-kbp genomic region (x-axis)around the PtFT2 gene (vertical light gray bar). The dashed horizontal lines indicate the threshold of positive selection signal (|iHS| > 2). The red dot indicates the SNP (Potra001246:25256) showing the strongest signal of local adaptation. b A high concentration of significant |iHS| signals was found in the ~ 700-kbp region surrounding PtFT2 (marked as red line) compared to the genome-wide distribution (based on dividing the genome into non-overlapping windows of 700 kbp). The dashed lines represent the 95% and 99% quantiles, respectively. c Allele frequencies of the most strongly associated SNP Potra001246:25256 for the 12 original populations of the SwAsp collection. d The decay of extended haplotype homozygosity (EHH) of the derived (blue) and ancestral (red) alleles for the SNP Potra001246:25256. e The extent of the three most common haplotypes at Potra001246:25256. Rare recombinant haplotypes were pooled and are displayed in gray. f Joint inference of allele age and selection coefficient for the region surrounding PtFT2 7–8), and North (pop 9–12) populations. First, we found populations (Additional file 3: Figure S9). Finally, we per- that the nucleotide diversity at the PtFT2 locus was sig- formed a composite-likelihood based (CLR) test and sep- nificantly below the genome-wide averages in all groups of arately evaluated the evidence of positive selection in populations (Fig. 4a, b;Additional file 5: Table S4), which different groups of populations. As expected for positive was consistent with the expectation of a strong selective selection, a distorted site frequency spectrum with an ex- event [26]. In particular, northern populations were cess of rare and high frequency derived variants near the observed to have a much stronger reduction of genetic di- PtFT2 locus was only found in northern populations (Fig. versity relative to other populations (Fig. 4a, b). Addition- 4i, j;Additional file 5: Table S4). Overall, all these findings ally, the level of genetic differentiation among populations provide compelling evidence for the occurrence of a was exceptionally high at PtFT2 locus compared with gen- strong selection on a single variant at the PtFT2 locus in omic background, especially between southern and north- the northern-most Swedish populations of P. tremula. ern populations (Fig. 4c, d;Additional file 5: Table S4), The observation of a single adaptive haplotype rising implying that spatially varying selection has likely driven to high frequency in high-latitude populations (Fig. 4; latitudinal differentiation at this locus. Furthermore, high Additional file 3: Figure S9) is consistent with a selective H12 but low H2/H1 statistics [27] was only observed in sweep pattern, where adaptation can result either from a northern populations (Fig. 4e–h; Additional file 5:Table de novo mutation or from a low frequency standing vari- S4), providing a clear indication of a single adaptive haplo- ant that was already present in the population before the type that has risen to high frequency among these onset of selection [28]. Assuming the causal mutation Wang et al. Genome Biology (2018) 19:72 Page 6 of 17 Genome-wide Chr10-700kb ab 0.035 0.035 *** *** *** North 0.030 0.030 Mid 0.025 0.025 South 0.020 0.020 π π 0.015 0.015 0.010 0.010 0.005 0.005 0.000 0.000 North Mid South 0.5 0.5 *** *** *** N vs. S 0.4 0.4 N vs. M 0.3 0.3 S vs. M F F ST ST 0.2 0.2 0.1 0.1 0.0 0.0 N vs. S N vs. M S vs. M ef 0.8 1.0 n.s. *** *** North 0.8 0.6 Mid 0.6 South 0.4 H12 H12 0.4 0.2 0.2 0.0 0.0 North Mid South n.s. *** *** 1.0 1.0 0.8 0.8 0.6 0.6 H2/H1 H2/H1 0.4 0.4 North 0.2 0.2 Mid South 0.0 0.0 North Mid South n.s. n.s. *** North Mid CLR South CLR 0 100 200 300 400 500 600 700 North Mid South Position Fig. 4 Geographically restricted selective sweep in northern-most populations. Left panels: A magnified view of different summary statistics that are sensitive to the effects of a selective sweep for the ~ 700 kbp region surrounding PtFT2. The gray bar marks the location of the PtFT2 gene. Right panels: Comparison of these statistics between the PtFT2 region (colored boxplot) and the genome-wide averages (gray boxplot). Statistics were calculated separately for individuals from southern (population 1–6), middle (populations 7–8), and northern (populations 9–12) in Sweden. a, b Nucleotide diversity, π. c, d Genetic differentiation, F . e, f H12. g, h H2/H1. i, j Composite likelihood ratio (CLR) test for the presence of a ST selective sweep appeared near the time of the onset of selection, we used 95% credible interval = 719–114,122 years) and that an Approximate Bayesian Computation (ABC) method selection during the sweep has been relatively strong (s [29] to estimate jointly the age and strength of selection = 0.016, 95% credible interval = 0.006–0.192). This sug- acting on the northern allele. The results (Fig. 3f) point gests that the adaptive event that occurred in to a recent origin of the northern allele (T = 18,952 years, northern-most populations of P. tremula most likely Wang et al. Genome Biology (2018) 19:72 Page 7 of 17 represents an evolutionary response to the harsher of PtFT2 expression is involved in mediating local adap- environmental conditions experienced by these popula- tation, we selected two southern genotypes and two tions during the post-glacial colonization of northern northern genotypes for greenhouse and field experi- Scandinavia. ments in order to test whether PtFT2 expression regu- lates the timing of growth cessation and bud set. In greenhouse experiments, we found that the two north- PtFT2 regulates the timing of bud set ern genotypes showed rapid growth cessation and bud Although the extensive LD in the immediate vicinity of set following a shift from long (23-h day length) to short the PtFT2 locus (Fig. 2b) makes it hard to identify the day (19-h day length) conditions whereas the two south- true causal SNP(s) that are involved in mediating natural ern genotypes continued active growth under the same variation in bud set, we found that the significantly asso- conditions (Fig. 5a). Analyses of PtFT2 gene expression ciated SNPs are overall enriched in non-coding regions in these genotypes show a strong downregulation of located in and around genes and show a deficit in inter- PtFT2 in the northern genotypes in conjunction with genic regions (Additional file 3: Figure S10; Additional growth cessation and bud set (Fig. 5b; Additional file 6: file 4: Table S3). One possible way that functional vari- Table S5). Similarly, under field conditions we observe ation is mediated by these SNPs is thus by altering ex- that northern genotypes also show lower expression of pression patterns of related genes across the latitudinal PtFT2 even at a time point when all genotypes were ac- gradient. To further assess the possibility that patterns tively growing (Fig. 5c). ab 200 Low latitude sample High latitude sample SwAsp018 SwAsp023 048 12 16 20 c 20 ZT SwAsp112 SwAsp100 12.15 16.15 20.15 00.15 04.15 08.15 12.15 Time Fig. 5 PtFT2 expression affects short-day induced growth cessation and bud set in P. tremula. a Bud set phenotype under 19-h day-length conditions. Two southern clones (marked with a red box, SwAsp 018, Ronneby, latitude 56.2 °N; SwAsp 023, Vårgårda, latitudes 58 °N) and two northern clones (marked with a blue box, SwAsp 100, Umeå, latitude 63.9 °N; SwAsp 112, Luleå, latitudes 65.7 °N) were chosen to be analyzed. Trees were grown under 23-h day length for one month and then shifted to 19-h day length. Photos were taken one month after the shift to 19-h day length. b Dynamic expression analysis of PtFT2 in twosouthernclones (red, SwAsp018 and SwAsp023) and two northern clones (blue, SwAsp100 and SwAsp112) from the greenhouse experiment. The genotypes of these trees at the most strongly associated PtFT2 SNP are SwAsp018: T/T, SwAsp023: T/T, SwAsp100: not available and SwAsp 112: G/G. Samples for RT-PCR were taken two weeks after the trees were shifted to 19-h day length. Error bars, ±standard deviation. ZT zeitgeber time. c Dynamic expression analysis of PtFT2 in two southern clones (red, SwAsp005 and SwAsp023) and two northern clones (blue, SwAsp100 and SwAsp116) from common garden experiment. The genotypes of these trees at the most strongly associated PtFT2SNP are SwAsp005: T/T, SwAsp023: T/T, SwAsp100: not available and SwAsp 112: G/G. Samples were collected in the Sävar common garden in early July 2014 PtFT2/PtUBQ PtFT2/PtUBQ Wang et al. Genome Biology (2018) 19:72 Page 8 of 17 Furthermore, downregulation of the PtFT2 expression gene that plays a central and widely conserved role in using RNA interference (RNAi) to approximately 20% of day-length perception and seasonal regulation of photo- wild-type levels accelerates bud set by c. 23 days, a periodic responses [32]. In Populus, the FT gene is repre- difference that is comparable to the differences we ob- sented by two functionally diverged paralogs where serve between the most extreme phenotypes in our PtFT1 has been speculated to retain the function of re- field-collected trees (Fig. 6). For instance, wild-collected productive initiation whereas PtFT2 acts to maintain trees carrying the derived G allele in homozygous form growth and prevent bud set [18, 19]. We observe that for the most strongly associated SNP in PtFT2 differences in PtFT2 gene expression between genotypes (Potra001246:25256) set bud on average 28 days earlier from southern and northern Swedish populations are as- than those homozygous for the ancestral T allele, with sociated with the timing of bud set in response to vari- the derived G allele showing partial dominance (Fig. 6a). able day lengths in different environments (Fig. 5b, c). The RNAi experiment thus provides additional evidence Transgenic downregulation of PtFT2, under field condi- that differences in gene expression of PtFT2 are involved tions, yields a phenotype that closely mimics variation in mediating the phenotypic differences we observe in found in our wild collected trees, further implying that bud set between northern and southern genotypes. non-coding regulatory variation in or around PtFT2 me- diate local adaptation in bud set by altering the level and Discussion timing of PtFT2 expression. Moreover, a study in the re- To date, only a small number of candidate genes have lated species Populus trichocarpa also identified an asso- been used to identify potential loci linked to traits in- ciation between a non-coding variant at PtFT2, a SNP in volved in local adaptation in P. tremula [11, 30, 31]. the second intron, and naturally occurring variation in Here we have substantially expanded our earlier studies bud set [20]. Although the exact causal mutations differ, by utilizing data from whole genome re-sequencing to this demonstrates that parallel adaptive changes in the local environmental variables and phenotypic variation timing of bud set between P. tremula and P. trichocarpa, in a key adaptive life-history trait in order to investigate two species that diverged more than 7 million years ago the genomic basis of local adaptation in P. tremula.We and that occur on different continents, has involved identify a locus, centered on PtFT2, that has a major ef- changes in the same orthologous gene. fect on phenotypic variation in bud set and that has While PtFT2 has been shown to contribute to local played a key role in the establishment of local adaptation adaptation in Swedish populations of P. tremula, we only of P. tremula. The likely target of the selective sweep, observe a signal of a strong and recent selective sweep PtFT2,isa P. tremula homolog of the Arabidopsis FT at this locus in the four northern-most populations. This n=50 a b n=98 n=17 n=18 220 240 n=26 Control PtFT 2 RNAi TT GT GG PtFT2 Genotype PtFT2 Genotype Fig. 6 Phenotypic effects of PtFT2. a The timing of bud set for the three genotypes classes at the PtFT2 SNP (Potra001246:25256) that displays the strongest signal of local adaptation identified by all three methods as shown in Fig. 2a. The plot displays mean genotype bud set after correcting for common garden site, year, and block effects. The horizontal line indicates the median value and the vertical line marks the interquartile range. The number of genotypes in the respective classes is indicated above the figure. b The timing of bud set for wild type control lines and transgenic PtFT2 lines in the field experiments at Våxtorp. The structure of the plots is the same as in (a) Bud set (days) Bud set (days) Wang et al. Genome Biology (2018) 19:72 Page 9 of 17 selective event has likely been driven by adaptation in adaptation most likely arose from a selective sweep in- response to the substantially shorter growing seasons stead of an inversion [39]. Nonetheless, owing to the that P. tremula has encountered at northern latitudes limited ability to detect inversions using short-insert during the post-glacial colonization of northern Scandi- paired reads, future characterization of structural vari- navia following the last glaciation. One caveat concern- ation across the genome is clearly required to determine ing the selective scans performed in this study is that whether genomic rearrangements are involved in medi- splitting populations into groups along a geographic ating signals of adaptation in the Populus genome. Sec- transect (i.e. latitude) could confound inference of the ond, the establishment probability of additional adaptive underlying selective and demographic forces. For in- mutations can be increased in the vicinity of a locus stance, it is possible that adaptation to spatially varying undergoing strong divergent selection, leading to a gen- selection in Swedish populations of P. tremula have omic architecture where multiple, tightly linked loci are arisen in response to continuous rather than discrete controlling an adaptive trait [39, 40]. However, recent environment clines [10]. In addition, the estimated theoretical work has shown that the conditions for such age of the adaptive mutation at the PtFT2 locus establishment of de novo linked beneficial mutations are coincides with recent post-glacial re-colonization of rather restrictive [41]. Instead, another potentially more northern Scandinavia and it is thus possible that important mechanism for the formation of “genomic strong genetic drift at the front of the range expan- islands” of strong genetic differentiation is via secondary sion have promoted surfing of the adaptive allele in contact and the erosion of pre-existing genetic diver- the newly colonized regions [33, 34]. gence, which is a process that can be very rapid, espe- The weak population genetic structure we observe in cially compared to the alternative scenario that involves our samples, combined with the fact that both pollen the fixation of novel mutations [41]. This mechanism and seeds are wind dispersed in P. tremula, suggest that provides a tantalizing hypothesis for P. tremula where gene flow among Swedish populations of P. tremula is earlier studies have established the existence of a hybrid likely relatively high. In accordance with recent theoret- zone between divergent post-glacial lineages in Scandi- ical predictions [3], our findings show that despite the navia [14, 41]. The selective sweep at PtFT2 is geograph- relatively high, inferred rates of gene flow, strong selec- ically restricted and likely occurred before secondary tion for local adaptation is acting to maintain the contact. Therefore, the large genomic “island” of diver- large-effect beneficial alleles that underlie the locally gence that we observe surrounding the PtFT2 locus is a adaptive traits. Compared to small-effect loci that are strong candidate for having evolved via erosion following prone to swamping and only transiently contribute to secondary contact. local adaptation [3, 35], large-effect loci are more likely to establish and persist over longer time scales as they Conclusions are able to resist the homogenizing effect of migration Our study identifies a single genomic region containing [3]. The distribution of number and effect size for vari- the PtFT2 gene that has a major effect on regulating the ants controlling adaptive traits is therefore expected to timing of bud set and that has facilitated rapid local shift to few large-effect loci under persistent adaptation in P. tremula across a latitudinal gradient in migration-selection balance [3] compared with models Sweden. Natural selection is actively maintaining alter- from isolated populations [36]. Multiple mechanisms nate alleles at this locus despite low genetic differenti- can give rise to the characteristic pattern in P. tremula ation across the rest of the genome. In particular, we where a single locus explains most of the variation for a identify a strong and recent selective sweep that is re- key life-history trait and facilitates rapid adaptation. stricted to the northern-most populations. This adapta- First, the presence of genomic rearrangements, such as tion has thus likely arisen and been driven to fixation chromosomal inversions, that suppress recombination during the post-glacial colonization of northern Scandi- can be favored by natural selection and cause the clus- navia in response to the substantially shorter growing tering of SNPs associated with local adaptation at the seasons that are characteristic of northern latitudes. PtFT2 locus [37, 38]. However, in contrast to expecta- Although the FT gene has repeatedly gone through du- tions from the presence of an inversion, we did not ob- plications and functional diversifications in many plants, serve blocks of elevated LD around the PtFT2 locus variation within and around these FT-like genes are in- (Additional file 3: Figure S11). LD in this region decays volved in mediating adaptive responses to photoperiod rapidly and falls to background levels within a few thou- changes and altering overall fitness in a wide range of sand bases, similar to what is seen in other regions plant species [42]. Given the central role of FT as a key genome-wide (Additional file 3: Figure S11a). This indi- integrator of diverse environmental signals [32], it is per- cates that frequent recombination has occurred in this haps not surprising that FT is more likely to act like an region and that the clustering of SNPs involved in local evolutionary hotspot for rapid adaptation to changing Wang et al. Genome Biology (2018) 19:72 Page 10 of 17 environmental conditions compared to other genes in deletions (indels) were globally realigned using the Realig- the photoperiodic pathway (Additional file 3:Figure nerTargetCreator and IndelRealigner in the Genome Ana- S12) and that these adaptations are mediated through lysis Toolkit (GATK v3.2.2) [53]. To minimize the cis-regulatory changes [43, 44]. FT thus appears to influence of mapping bias, we further discarded the fol- serve as evolutionary “master switch” for adaptive lowing site types: (1) sites with extremely low (< 400× life-history variation, similar to what have been seen across all samples, i.e. less than an average of 4× per sam- for a few other loci in plants, such as FLC [45], FRI ple) or extremely high coverage (> 4500×, or approxi- [46], and DOG1[47, 48]. mately twice the mean depth at variant sites) across all samples after investigating the coverage distribution em- Methods pirically; (2) sites with a high number of reads (> 200×, Sample collection and sequencing that is on average > 2 reads per sample) with mapping We collected material from all available trees in the score equaling zero; (3) sites located within repetitive se- Swedish Aspen (SwAsp), which consists of 116 individ- quences as identified using RepeatMasker [54]; (4) sites uals collected from 12 different locations spanning the that were in genomic scaffolds with a length < 2 kbp. distribution range in Sweden [12] (Fig. 1a). Leaf material was sampled from one clonal replicate of each individual SNP and genotype calling growing at a common garden experiment located in SNP calling in each sample was performed using the Sävar, northern Sweden. Total genomic DNA for each GATK HaplotypeCaller and GenotypeGVCFs were then individual was extracted from frozen leaf tissue using used to perform the multi-sample joint aggregation, the DNeasy plant mini prep kit (QIAGEN, Valencia, CA, re-genotyping, and re-annotation of the newly merged USA). Paired-end sequencing libraries with an average records among all samples. We performed several filter- insert size of 650 bp were constructed for all samples ing steps to minimize SNP calling bias and to retain only according to the Illumina manufacturer’s instructions. high-quality SNPs: (1) remove SNPs at sites not passing Whole genome sequencing and base calling were all previous filtering criteria; (2) retain only bi-allelic performed on the Illumina HiSeq 2000 platform for all SNPs with a distance of > 5 bp away from any indels; (3) individuals to a mean, per-sample depth of approxi- remove SNPs for which the available information de- mately 30× at the Science for Life Laboratory, rived from < 70% of the sampled individuals after treat- Stockholm, Sweden. ing genotypes with quality score (GQ) < 10 as missing; (4) remove SNPs with an excess of heterozygotes and Sequence quality checking, read mapping, and deviates from Hardy–Weinberg equilibrium test (P value post-mapping filtering < 1e-8). After all steps of filtering, a total of 4,425,109 A total of 103 SwAsp individuals were successfully se- SNPs with minor allele frequency > 5% were left for quenced. Before read mapping, we used Trimmomatic downstream analysis. Finally, the effect of each SNP was v0.30 [49] to identify reads with adapter contamination and annotated using SnpEff version 3.6 [55] based on gene to trim adapter sequences from reads. After checking the models from the P. tremula reference genome (available quality of the raw sequencing data using FastQC (https:// at http://popgenie.org); the most deleterious effect was www.bioinformatics.babraham.ac.uk/projects/fastqc/), the selected if multiple effects occurred for the same SNP quality of sequencing reads was found to drop towards the using a custom Perl script. ends of reads (Additional file 3: Figure S13). We therefore used Trimmomatic v0.30 to trim bases from both ends of Relatedness, population structure, and isolation by the reads if their qualities were < 20. Reads < 36 bases after distance trimming were discarded completely. To identify closely related individuals and to infer popu- After quality control, all high-quality reads were mapped lation structure among the sampled individuals, we dis- to a de novo assembly of the P. tremula genome (available carded SNPs with missing rate > 10%, MAF < 5%, and at http://popgenie.org;[50]) using the BWA-MEM algo- that failed the Hardy–Weinberg equilibrium test (P < −6 rithm with default parameters using bwa-0.7.10 [51]. We 1×10 ) after all filtering steps as shown above. We also used MarkDuplicates methods from the Picard packages generated LD-trimmed SNP sets by removing one SNP (http://broadinstitute.github.io/picard/) to correct for the from each pair of SNPs when the correlation coefficients artifacts of PCR duplication by only keeping one read or (r ) between SNPs exceed 0.2 in blocks of 50 SNPs using read-pair with the highest summed base quality among PLINK v1.9 [56]. This yielded 217,489 independent SNPs those of identical external coordinates and/or same insert that were retained for downstream analyses of popula- lengths. Alignments of all paired-end and single-end reads tion structure. First, we used PLINK v1.9 to estimate for each sample were then merged using SAMtools 0.1.19 identity-by-state (IBS) scores among pairs of all individ- [52]. Sequencing reads in the vicinity of insertions and uals. Nine individuals were excluded from further Wang et al. Genome Biology (2018) 19:72 Page 11 of 17 analyses due to their high pairwise genetic similarity 2002–2012 for the original sample coordinates of each with another sampled individual (IBS > 0.8), leaving a SwAsp individual and the average values over years were total of 94 “unrelated” individuals for all subsequent then calculated. The environmental variables include analyses (Additional file 3: Figure S14). Then, we used latitude, longitude, altitude, the number of days with the smartpca program in EIGENSOFT v5.0 [13] to per- temperatures > 5 °C, UV irradiance, the photosynthetic form the PCA on the reduced set of genome-wide inde- photon flux density (PPFD), sunshine duration, monthly pendent SNPs. A Tracey-Widom test, implemented in and annual average precipitation, and temperature. Due the program twstats in EIGENSOFT v5.0, was used to to the high degree of correlation among these environ- determine the significance level of the eigenvectors. mental variables (Additional file 3: Figure S16a), we per- Finally, IBD analysis was computed based on the pair- formed a PCA on these variables using the “prcomp” wise comparison of the genetic and geographic distances function in R to identify PCs that best summarized the between populations. We calculated the population dif- range of environmental variation. The first environmen- ferentiation coefficient (F )[57] for each pair of the 12 tal PC, which explained > 60% of the total variance ST populations using VCFtools v0.1.12b [58]. The relation- (Additional file 3: Figure S16b,c) and had the strongest ship between genetic distance measured as F /(1-F ) loadings for the length of growing season (Additional file ST ST and geographic distance (km) was evaluated using 3: Figure S16d), was kept to represent our target envir- Mantel tests in the R package “vegan” [59]; the signifi- onmental variable for further analyses. We then used a cance of the correlation was estimated based on 9999 latent factor mixed-effect model (LFMM) implemented permutations. in the package LEA in R [63] to investigate associations between SNPs and the first environmental PC while sim- Screening for SNPs associated with local adaptation ultaneously accounting for population structure by We used three conceptually different approaches to test introducing unobserved latent factors into the model for genome-wide signatures of local adaptation. First, we [16]. Due to the weak population structure found in the detected candidate SNPs involved in local adaptation SwAsp collection (see “Results”), we ran the LEA func- using the PCA as implemented in PCAdapt [60]. PCA- tion lfmm with the number of latent factors (K) in the dapt examines the correlations (measured as the squared range of 1–3, using 5000 iterations as burn-in followed loadings ρ , which is the squared correlation between by 10,000 iterations to compute LFMM parameters for jk the jth SNP and the kth principal component [PC]) be- all SNPs. This was performed five times for each value tween genetic variants and specific PCs without any of K; we observed identical results across both different prior definition of populations. As only the first PC was values of K and across independent runs within each significant from the PCA (see “Results”), we only esti- value of K (data not shown). We only showed the results mated the squared loadings ρ with PC1 to identify using K = 2 to account for the background population j1 SNPs involved in local adaptation. Our results showed structure. LFMM outliers were detected as those SNPs that most outlier SNPs that were highly correlated with with FDR < 0.05 after using the method of Storey and the first population structure PC also had high F Tibshirani [61] to account for multiple testing. ST values between populations (Additional file 3: Figure Third, we obtained previously published measure- S15). Assuming a chi-square distribution (degree of free- ments of the timing of bud set, which is a highly herit- dom = 1) for the squared loadings ρ , as suggested by able trait that shows strong adaptive differentiation j1 [60], we used PCAdapt to compute P values for all SNPs along the latitudinal gradient [31]. To measure pheno- and then calculated the FDR using the method of Storey typic traits, all SwAsp individuals have previously been and Tibshirani [61] to generate a list of candidate SNPs clonally replicated (four ramets per individual) and showing significant associations to population structure. planted at two common garden sites in 2004 (Sävar, 63 ° Only SNPs with FDR < 5% were retained as those signifi- N, and Ekebo, 56 °N) (Fig. 1a). The common garden cantly involved in local adaptation. set-up is described in detail in Luquez et al. [12]. The Second, we tested for the presence of candidate SNPs timing of bud set was scored twice weekly starting from that exhibited high correlations with environmental gra- mid-July and continuing until all trees had set terminal dients. To do this, a total of 39 environmental variables buds. Bud set measurements were scored in three con- were analyzed (Additional file 7: Table S6). Precipitation secutive years, 2005–2007, in both common gardens and temperature values were retrieved from WorldClim [10]. A severe drought in Sävar caused most of the trees version 1 [62]. Sunshine hours, photosynthetically active to set bud prematurely in 2006 and we therefore ex- radiation, and ultraviolet (UV) radiation were obtained cluded data from Sävar in 2006 in all downstream ana- using the STRÅNG data model at the Swedish Meteoro- lyses (see Ingvarsson et al. [31] for further discussion). logical and Hydrological Institute (SMHI) (http:// We combined data on bud set from the two common strang.smhi.se). Values were collected from the years garden sites and years by predicting genetic values with Wang et al. Genome Biology (2018) 19:72 Page 12 of 17 best linear unbiased prediction (BLUP) for all individ- BLAST algorithm (E-value cut-off of 1e-50) and, finally, uals. ASReml [64] was used to fit Eq. 1 to the data for > 99% of the SNPs (4,397,537 out of 4,425,109) were an- calculating BLUP using restricted maximum-likelihood chored on the 19 pseudochromosomes. techniques: To test for the accuracy of imputation, and its relationship with the MAF cutoff and the missing rate of genotypes in our dataset, we selected 346,821 SNPs with z ¼ μ þ s þ b þ y þ β þ ε ð1Þ ijklm i jiðÞ ijklm kiðÞ l a rate of missing genotypes < 10% from the pseudo-chromosome 2 (~ 32.6 Mb) for the simulation where z is the phenotype of the mth individual in ijklm analysis. We randomly masked out varying proportions the jth block in the kth year of the lth clone from the ith (5–50%) of SNPs, which were treated as missing. BEA- site. In Eq. 1, μ denotes the grand mean and ε is the ijklm GLE v 4.1 was then used to impute genotypes at the residual term. The clone (β , BLUP) and residual term masked positions. We found high imputation accuracy (ε ) were modeled as random effects, whereas the site ijklm (> 0.97) across a wide range of MAF when rates of miss- (s ), site/block (b ), and site/year (y ) were treated as i j(i) k(i) ing genotypes were < 30% (Additional file 3: Figure S17), fixed effects. The genetic value of each individual was suggesting imputation and phasing by BEAGLE should then used as the dependent trait in a univariate linear not bias the accuracy of our results. We therefore mixed model for SNP-trait association analyses per- phased and imputed genotypes of the SNPs anchored on formed with GEMMA [17]. This method takes related- pseudo-chromosomes using BEAGLE v 4.1. ness among samples into account through the use of a kinship matrix. The mixed model approach imple- Estimation of ancestral states for all SNPs mented in GEMMA has been shown to outperform Since the ancestral states of SNPs are usually used for se- methods that try to correct for population structure by lection detection, for each SNP, we classified alleles as ei- including it as a fixed effect in the GWAS analyses [65]. ther ancestral or derived on the basis of comparisons with Given the extremely weak population structure we two outgroup species: P. tremuloides and P. trichocarpa. observe in our GWAS population (see “Results”), we did We obtained publicly available short read Illumina data not pursue any further corrections for population struc- for one P. tremuloides (SRA ID: SRR2749867) and one P. ture in the association analyses as this likely would trichocarpa (SRA ID: SRR1571343) individual from the severely reduce our power to detect significant associa- NCBI Sequence Read Archive (SRA) [69]. We individually tions. As described previously, we used a FDR < 5% [61] aligned the reads from these two samples to the de novo to control for the multiple testing across the 4,425,109 P. tremula assembly (Potra v1.1, available at PopGen SNPs. We calculated the proportion of variance in IE.org) and used UnifiedGenotyper in GATK to call SNPs phenotype explained by a given SNP (PVE) using the at all sites (−-output_mode EMIT_ALL_SITES). For each method of Shim et al. [66]: SNP, two procedures were performed to define their an- cestral states: (1) because P. trichocarpa is more distantly 2β MAFðÞ 1−MAF PVE ¼ 2 related to P. tremula compared to P. tremuloides [70]and ^ ^ 2β MAFðÞ 1−MAFþ se β 2NMAFðÞ 1−MAF from our previous study there were < 1% polymorphic sites shared between P. tremula and P. trichocarpa [69], ð2Þ we inferred the ancestral state as the P. trichocarpa allele where β and MAF is the effect size estimate and minor at sites where the P. trichocarpa individual was homozy- allele frequency for the SNP, N is sample size, and seðβÞ gous and matched one of the P. tremula alleles; otherwise, is standard error of effect size for the SNP. (2) we inferred the ancestral state as the P. tremuloides al- lele at sites where the P. tremuloides individual was homo- Genotype imputation zygous and matched one of the P. tremula alleles. If the For some haplotype-based selection tests, imputed and above two requirements were not met, the ancestral state phased datasets were needed. We therefore used BEA- was defined as missing. In total, we obtained information GLE v4.1 [67] to perform imputation and haplotype of ancestral states for 96.3% of all SNPs. phasing on genotypes of 94 individuals with default pa- rameters. Before performing genotype imputation, we Anchoring and orientation of SNPs associated with local first used Chromosemble from the Satsuma packages adaptation to a single region on chromosome 10 [68] to order and orient the scaffolds of the P. tremula As we found that a large majority of significant SNPs assembly to 19 pseudo-chromosomes according to syn- (> 90%) detected by at least two of the three methods teny with the P. trichocarpa genome. We then per- (PCAdapt, LFMM, and GEMMA) were clustered in a sin- formed pairwise genome alignment between scaffolds of gle genomic region on pseudo-chromosome 10, we per- P. tremula and the 19 pseudochromosomes using the formed several further steps to refine the anchoring and Wang et al. Genome Biology (2018) 19:72 Page 13 of 17 orientation of these SNPs within this region. First, we used sites by length (nS )[23], to test for possible positive se- ColaAlignSatsuma from the Satsuma packages [68]to lection. These statistics were calculated for all SNPs with align the genomes of P. tremula and P. trichocarpa using MAF > 0.05 and with information on ancestral state across default settings. The output was then converted and filtered the genome using the software selscan v1.1.0a [72]with into GBrowse synteny compatible format that was available its assumed default parameters. The iHS and the nS at http://popgenie.org [50]. Based on the alignment of the values were then normalized in frequency bins across the two genomes, 15 scaffolds from the P. tremula assembly whole genome (we used 100 bins). To test for whether that contain SNPs inferred to be associated with local adap- there is significant concentration of selection signals on tation were completely or partially mapped to a single re- the region surrounding the PtFT2, we divided the 19 gion on chromosome 10 of P. trichocarpa genome pseudo-chromosomes (without the seven scaffolds around (Additional file 4: Table S3). We then retained only seven the PtFT2 locus) into non-overlapping windows of 700 scaffolds that were completely mapped to the region and kbp and calculated the proportion of SNPs with |iHS| > 2 with length > 10 kbp. The seven scaffolds contained > 95% or with |nS | > 2 in each window. Statistical significance (1465 out of 1528) of the total number of significant SNPs was assessed using the ranking of genome-wide windows, in the single region of chromosome 10. Lastly, according to with windows having < 100 SNPs being excluded. the alignment results between the genome of P. tremula and P. trichocarpa, we re-ordered and re-oriented the seven Population-specific selective sweeps scaffolds to a ~ 700-kbp region for all downstream selection Several standard methods were further applied to search tests (Additional file 3:FigureS4). for signs of selective sweeps in different groups of popula- tions: (1) pairwise nucleotide diversity (π)[73], which is Linkage disequilibrium expected to have a local reduction following a selective To explore and compare patterns of LD between the sweep, was calculated using a sliding window approach ~ 700-kbp region on chromosome 10 and genome-wide with window size of 10 kbp and moving step of 5 kbp levels, we first calculated correlations (D’ and r )between using the software package - Analysis of Next-Generation all pairwise common SNPs (MAF > 5%, 9149 SNPs) in the Sequencing Data (ANGSD v0.602) [74] separately for ~ 700-kbp region using PLINK 1.9 [56]. Then we used South (pop 1-6), Mid (pop 7-8) and North (pop 9-12) PLINK 1.9 to randomly thin the number of common populations. Only the reads with mapping quality > 30 SNPs across the genome to 200,000 and calculated the and the bases with quality score > 20 were used in the ana- squared correlation coefficients (r )between allpairs of lysis. Windows with < 10% of covered sites remaining SNPs that were within a distance of 100 kbp. The decay of from the previous filtering steps (section 2.1) were ex- LD against physical distance was estimated using cluded; (2) Weir and Cockerham’s F , which measures ST non-linear regression of pairwise r vs the physical dis- genetic divergence between pairs of three groups of popu- tance between sites in base pairs [71]. lations, South, Mid, and North, was calculated using a sliding-window approach with window size of 10 kbp and Fine-mapping the causal variants using CAVIAR moving step of 5 kbp by VCFtools; (3) a combination of We utilized CAVIAR (CAusal Variants Identification in H12 and H2/H1 [27], which measures haplotype homozy- Associated Regions, v1.0) [21] to identify the potential gosity and can distinguish hard from soft selective sweeps, causal variants in the ~ 700-kbp region on chromosome was calculated in windows of 200 SNPs (~ 15 kbp) for 10. CAVIAR is a fine-mapping method that quantifies the common SNPs with MAF > 5% separately for South, Mid, probability of each variant in a locus to be causal and out- and North populations. As the mean LD (r )in P. tremula puts a set of variants that with a predefined probability decays to < 0.1 within 10 kbp (Additional file 3:Figure (e.g. 95% or 99%) contain all of causal variants at the S11a and [69]), the use of ~ 15 kbp windows should be locus. We created the LD structure by computing r be- large enough to differentiate the footprint of selective tween all pairwise significantly associated SNPs in the ~ sweeps from those caused by neutral processes. The H12 700-kbp region using PLINK 1.9. Marginal statistics for and H2/H1 values were then averaged using a sliding win- each significantly associated variant is the association sta- dow method with window size of 10 kbp and moving step tistics obtained from GWAS analysis by GEMMA. In our of 5 kbp; (4) a composite likelihood ratio statistic (CLR) analysis, we set the causal confidence as 99% (−r0.99) to [75], which contrasts the likelihood of the null hypothesis obtain a set of causal variants that capture all the causal based on the genome-wide site frequency spectrum with variants with the probability > 99%. the likelihood of a model where the site frequency has been altered by a recent selective sweep, was computed Positive selection detection using SweepFinder2 [76] separately for South, Mid, We measured two haplotype-based tests, integrated and North populations. SweepFinder2 is most efficient haplotype score (iHS) [22]and thenumber of segregating when information on the ancestral and derived states is Wang et al. Genome Biology (2018) 19:72 Page 14 of 17 available for SNPs and we therefore polarized SNPs as de- heterozygous and homozygous genotypes carrying the scribed above. The small fraction of SNPs (~ 3.7%) that selected (derived) allele were 1 + s/2 and 1 + s, res- could not be polarized was excluded from further analysis pectively. We simulated a chromosome region consisting using SweepFinder2. CLRs were calculated using of L = 25,000 sites and assumed a diploid effective popula- −8 non-overlapping windows with a spacing of 2 kbp; the tion size of N = 92,000, a mutation rate of μ =3.75 × 10 empirical site frequency spectrum across the whole P. tre- per base pair per generation [79], and a recombination −8 mula genome was estimated using the –f option in rate of r =0.729 ×10 per base pair per generation. To- SweepFinder2 after including all polymorphic sites in gether these parameters yielded a scaled population muta- the genome (a total of 8,007,303 SNPs). As recommended tion rate equal to Θ =4N μL = 86.27 and a scaled by Huber et al. [77], we only used sites that were poly- population recombination rate ρ =4N rL = 19.76. For each morphic or that represented fixed substitutions in each simulation, values for both s and T were drawn from uni- group of populations to scan for sweeps. To determine form prior distributions, log (T)~U(− 4,– 0.5) and whether there are significant differences of the above sta- log (s)~U(− 4,– 0.5). tistics between the 700-kbp region around PtFT2 gene on chromosome 10 and genome-wide estimates, we use the Gene expression of PtFT2 under active growth and during same strategy to divide the genome into the windows with growth cessation the same size for each test and calculated the above statis- Samples used for the expression analysis of PtFT2 were tics across the genome (results are shown in Fig. 4b, d, f, collected from both climate chamber and the field (Sävar, h, j and Additional file 5: Table S4). Significance for the 63.4 °N, Umeå) conditions. For treatment in the climate above statistical measurements was evaluated using chamber, two southern clones (SwAsp018, 56.2 °N, Mann–Whitney tests. Ronneby; SwAsp023, 56.2 °N, Ronneby) and two northern To assess the scale of a genomic region that is affected clones (SwAsp100, 63.9 °N, Umeå; SwAsp112, 65.6 °N, by a selective sweep, we ran coalescent simulations mod- Luleå) were selected. These plants were selected to repre- eling a selective sweep in the Northern populations. sent the northern-most and southern-most populations of Simulations were run assuming that the selected site the SwAsp collection that are experiencing the most was located at the center of the simulated region. Pa- diverged photoperiodic conditions. Plants were grown rameters for the simulations were taken from ABC cal- under 23-h day lengths for one month and then trans- culations dating the selective sweep inferred in the ferred to 19-h day-length conditions for two weeks before North populations (as shown below). Briefly, we used a the start of sampling. Leaves were harvested at 2-h inter- scaled population mutation rate (4N μ) of 0.0081/bp, vals for a total period of 24 h using three biological repli- which corresponds to the average observed diversity in cates of each genotype. Samples were subsequently the North populations. Similarly, we set the scaled popu- flash-frozen in liquid nitrogen and stored at − 80 °C until lation recombination rate (4N r) to 0.0019 to match the sample preparation. genome-wide ratio of r/μ = 0.229 in P. tremula [69]. Field samples were collected in the Sävar common Analyses of the simulated data using SweepFinder2 garden in early July 2014 and samples were taken from showed that a single selective sweep often yields mul- two southern clones (SwAsp005, 56.7 °N, Simlång; tiple significant peaks across a region spanning up to, SwAsp023, 56.2 °N, Ronneby) and two northern clones and even exceeding, 100 kbp (95% quartile: 148,221 bp; (SwAsp100, 63.9 °N, Umeå; SwAsp116, 65.6 °N, Luleå). Additional file 3: Figure S18). Leaves were harvested from three different clonal repli- cates planted in the common garden to serve as bio- Dating the selective sweep in the North populations logical repeats. Leaf samples were flash-frozen in in To date the inferred selective sweep in the North popula- liquid nitrogen and stored at − 80 °C until sample prep- tions, we used the ABC method described in Ormond et aration. Samples were collected at 2-h intervals for a al. [29] to jointly estimate s (the strength of selection on total period of 24 h. the beneficial mutation causing the sweep) and T (the RNA extraction for all samples was performed using a time since the beneficial allele fixed) assuming a model of CTAB-LiCl method [80]. Complementary DNA (cDNA) selection from a de novo mutation (hard selective sweep). synthesis was performed using the iScript cDNA Synthe- We simulated 5 × 10 independent selective sweep events sis Kit (BIO-RAD) according to the manufacturer’s in- using the coalescent simulation program msms [78]. For structions. Quantitative real-time PCR analyses were the coalescent simulations, the ancestries of samples were performed using a Roche LightCycler 480 II instrument, traced backwards in time using standard coalescent and the measurements were obtained using the relative methods and allowing for recombination. Selection was quantification method [81]. We used primers qFT2F modelled at a single site by applying forward simulations, (5’-AGCCCAAGGCCTACAGCAGGAA-3′) and qFT2R assuming additive selection so that the fitness of (5’-GGGAATCTTTCTCTCATGAT-3′) for amplifying Wang et al. Genome Biology (2018) 19:72 Page 15 of 17 the transcript of FT2 and qUBQF (5’-GTTGATTTT Additional file 7: Table S6. Average values of 39 environmental TGCTGGGAAGC-3′) and qUBQR (5’-GATCTTGGC variables over the years 2002–2012 for the original sample location of 94 P. tremula individuals used in this study. (XLSX 64 kb) CTTCACGTTGT-3′) for UBQ as the internal control. We assessed the presence of transcription of both PtFT2 (Potra001246g10694) and PtFT2β by digesting RT-PCR Acknowledgements We thank Carin Olofsson for extracting DNA for all samples used in this products with SacI that distinguish the two transcripts study. We thank three anonymous reviewers for their suggestions that (Additional file 3: Figure S7). helped improve the final version of the manuscript. STRÅNG data are obtained from the Swedish Meteorological and Hydrological Institute (SMHI), which were produced with support from the Swedish Radiation Protection Field experiment with transgenic PtFT2 lines Authority and the Swedish Environmental Agency. The authors also would Construction of the PtFT RNAi lines are described in de- like to acknowledge support from Science for Life Laboratory and the tail in [19]. Briefly, the clone used for transformations is a National Genomics Infrastructure (NGI) for providing assistance with massive parallel sequencing. All analyses were performed on resources provided by hybrid aspen, P. tremula × tremuloides, clone T89, that the Swedish National Infrastructure for Computing (SNIC) at Uppsala sets bud at 15-h day lengths [19] and this clone thus has a Multidisciplinary Center for Advanced Computational Science (UPPMAX) photoperiodic response that is comparable to SwAsp ge- under the projects b2010014 and b2011141. notypes from southern Sweden [82]. Transformed T89 Funding plants were planted together with wild type T89 (WT) The research was funded through grants from Vetenskapsrådet, Knut and controls in a common garden at Våxtorp, Halland (lati- Alice Wallenbergs stiftelse, and a Young Researcher Award from Umeå tude 56.4 N, longitude 13.1E) in 2014. Eighteen replicates University to PKI. JW was supported by a scholarship from the Chinese Scholarship Council. BT is supported by the UPSC “Industrial graduate school of each line were planted in a complete randomized block in forest genetics, biotechnology and breeding.” NRS is supported by the design together with six WT controls per block. Starting Trees and Crops for the Future (TC4F) project. in 2015, data were collected on growth cessation, bud for- mation, and bud set for all trees in the common garden. Availability of data and materials The whole genome sequencing (WGS) raw reads have been deposited in From early August, plants were visually inspected roughly NCBI’s sequence read archive (SRA) under accession number PRJNA297202 [83]. every five days and top shoots were scored according to a Background information, bud set genetic values (BLUPs), and environmental pre-determined scoring sheet (Additional file 3:Figure data at the site or origin for all clones used in the GWAS are available from Zendo [84] under a CC BY-SA 4.0 license. All scripts used for the analysis S19) and classified as active growth (score 3), growth ces- described are available on GitHub under a MIT License [85]. sation (score 2), bud formation (score 1), and bud set (score 0). Scoring was continued until all plants had com- Authors’ contributions pletely senesced in late October. Bud scoring data were JW, ON, SJ, NS, and PKI conceived of and designed the experiments. JW, BT, converted to Julian date of bud set and analyzed using the AJ, BN, DGS, NS, and PKI carried out all population genetic analyses. JD performed greenhouse and RT-PCR experiments. KMR and IHM collected following linear model: common garden data. JW and PKI wrote the paper. All authors commented on the manuscript. All authors read and approved the final manuscript. y ¼ μ þ α þ β þ ε i ij ij j Ethics approval and consent to participate where μ is an overall mean, α is the effect of treatment i Not applicable. (where i is either PtFT2 RNAi or WT), and β is the ef- fect of block j and ε are individual residual errors. ij Competing interests The authors declare that they have no competing interests. Additional files Publisher’sNote Additional file 1: Table S1. Geographical details of the 94 P. tremula Springer Nature remains neutral with regard to jurisdictional claims in published samples used in this study and the summary statistics of Illumina re- maps and institutional affiliations. sequencing data per sample. (DOCX 155 kb) Author details Additional file 2: Table S2.. Tracy-Widom statistics for the first three Umeå Plant Science Centre, Department of Ecology and Environmental eigenvalues in PCA. (DOCX 31 kb) Science, Umeå University, 90187 Umeå, Sweden. Centre for Integrative Additional file 3: Figures S1–S19. (PDF 7335 kb) Genetics, Department of Animal and Aquacultural Sciences, Faculty of Life Additional file 4: Table S3. List of the 1615 candidate SNPs associated Sciences, Norwegian University of Life Sciences, PO Box 5003, Ås, Norway. with local adaptation. (XLSX 234 kb) Umeå Plant Science Centre, Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, 901 83 Umeå, Additional file 5: Table S4. Summary statistics (median and central 4 5 Sweden. Stora Enso Biomaterials, 13104 Nacka, Sweden. Umeå Plant 95% range) for five selective sweep measures across the ~ 700-kbp Science Centre, Department of Plant Physiology, Umeå University, 90187 region around PtFT2 gene on chromosome 10 and genome-wide level. Umeå, Sweden. Wallenberg Advanced Bioinformatics Infrastructure, Science Pairwise nucleotide diversity (π), genetic divergence between groups of for Life Laboratory, Uppsala University, Uppsala, Sweden. Department of populations (F ), H12, H2/H1, and composite likelihood ratio (CLR) test ST Ecology and Genetics, Evolutionary Biology, Uppsala University, Uppsala, are compared for three groups of populations, South (pop 1–6), Mid (pop Sweden. Uppsala Multidisciplinary Center for Advanced Computational 7–8), and North (pop 9–12) corresponding to Fig. 4. (DOCX 95 kb) Science, Uppsala University, Uppsala, Sweden. Present address: Department Additional file 6: Table S5. ANOVA tables for analyses of gene of Plant Biology, Uppsala BioCenter, Swedish University of Agricultural expression in greenhouse and common garden experiments. (DOCX 51 kb) Sciences, PO Box 7080, 750 07 Uppsala, Sweden. Wang et al. Genome Biology (2018) 19:72 Page 16 of 17 Received: 4 December 2017 Accepted: 3 May 2018 27. Garud NR, Messer PW, Buzbas EO, Petrov DA. Recent selective sweeps in North American Drosophila melanogaster show signatures of soft sweeps. PLoS Genet. 2015;11:e1005004. 28. Hermisson J, Pennings PS. Soft sweeps and beyond: understanding the patterns and probabilities of selection footprints under rapid References adaptation. Methods Ecol Evol. 2017;8:700–16. 1. Richardson JL, Urban MC, Bolnick DI, Skelly DK. Microgeographic adaptation 29. Ormond L, Foll M, Ewing GB, Pfeifer SP, Jensen JD. Inferring the age of a and the spatial scale of evolution. Trends Ecol Evol. 2014;29:165–76. fixed beneficial allele. Mol Ecol. 2016;25:157–69. 2. Savolainen O, Lascoux M, Merilä J. Ecological genomics of local adaptation. 30. Ingvarsson PK, Garcia MV, Hall D, Luquez V, Jansson S. Clinal variation in Nat Rev Genet. 2013;14:807–20. phyB2, a candidate gene for day-length-induced growth cessation and bud 3. Yeaman S, Whitlock MC. The genetic architecture of adaptation under set, across a latitudinal gradient in European aspen (Populus tremula). migration-selection balance. Evolution. 2011;65:1897–911. Genetics. 2006;172:1845–53. 4. Neale DB, Ingvarsson PK. Population, quantitative and comparative 31. Ingvarsson PK, Garcia MV, Luquez V, Hall D, Jansson S. Nucleotide genomics of adaptation in forest trees. Curr Opin Plant Biol. 2008;11:149–55. polymorphism and phenotypic associations within and around the 5. Neale DB, Kremer A. Forest tree genomics: growing resources and phytochrome B2 Locus in European aspen (Populus tremula, Salicaceae). applications. Nat Rev Genet. 2011;12:111–22. Genetics. 2008;178:2217–26. 6. Savolainen O, Pyhajarvi T, Knurr T. Gene flow and local adaptation in trees. 32. Turck F, Fornara F, Coupland G. Regulation and identity of florigen: Annu Rev Ecol Evol Syst. 2007;21:5530–45. FLOWERING LOCUS T moves center stage. Annu Rev Plant Biol. 2008;59: 7. Aitken SN, Whitlock MC. Assisted gene flow to facilitate local adaptation to 573–94. climate change. Ann Rev Ecol Evol Syst. 2013;44:367–88. 33. Klopfstein S, Currat M, Excoffier L. The fate of mutations surfing on the wave 8. Rohde A, Bhalerao RP. Plant dormancy in the perennial context. Trends of a range expansion. Mol Biol Evol. 2006;23:482–90. Plant Sci. 2007;12:217–23. 34. Excoffier L, Ray N. Surfing during population expansions promotes genetic 9. Singh RK, Svystun T, AlDahmash B, Jönsson AM, Bhalerao RP. Photoperiod- revolutions and structuration. Trends Ecol Evol. 2008;23:347–51. and temperature-mediated control of phenology in trees - a molecular 35. Yeaman S. Local adaptation by alleles of small effect. Am Nat. 2015; perspective. New Phytol. 2017;213:511–24. 186(Suppl 1):S74–89. 10. Hall D, Luquez V, Garcia MV, St Onge KR, Jansson S, Ingvarsson PK. Adaptive 36. Orr HA. The population genetics of adaptation: the distribution of factors population differentiation in phenology across a latitudinal gradient in fixed during adaptive evolution. Evolution. 1998;52:935–49. European aspen (Populus tremula, L.): a comparison of neutral markers, 37. Kirkpatrick M, Barton N. Chromosome inversions, local adaptation and candidate genes and phenotypic traits. Evolution. 2007;61:2849–60. speciation. Genetics. 2006;173:419–34. 11. Ma X-F, Hall D, Onge KRS, Jansson S, Ingvarsson PK. Genetic differentiation, 38. Yeaman S. Genomic rearrangements and the evolution of clusters of locally clinal variation and phenotypic associations with growth cessation across adaptive loci. Proc Natl Acad Sci U S A. 2013;110:E1743–51. the Populus tremula photoperiodic pathway. Genetics. 2010;186:1033–44. 39. Supple MA, Hines HM, Dasmahapatra KK, Lewis JJ, Nielsen DM, Lavoie 12. Luquez V, Hall D, Albrectsen BR, Karlsson J, Ingvarsson P, Jansson S. Natural C, et al. Genomic architecture of adaptive color pattern divergence and phenological variation in aspen (Populus tremula): the SwAsp collection. convergence in Heliconius butterflies. Genome Res. 2013;23:gr150615. Tree Genet Genomes. 2008;4:279–92. 112-1257. 13. Patterson N, Price AL, Reich D. Population structure and eigenanalysis. PLoS 40. Feder JL, Gejji R, Yeaman S, Nosil P. Establishment of new mutations Genet. 2006;2:e190. under divergence and genome hitchhiking. Phil Trans Roy Soc B. 2012; 14. De Carvalho D, Ingvarsson PK, Joseph J, Suter L, Sedivy C, Macaya-Sanz D, et B367:461–74. al. Admixture facilitates adaptation from standing variation in the European 41. Yeaman S, Aeschbacher S, Bürger R. The evolution of genomic islands aspen (Populus tremula L.), a widespread forest tree. Mol Ecol. 2010;19:1638– by increased establishment probability of linked alleles. Mol Ecol. 2016; 25:2542–58. 15. Duforet-Frebourg N, Bazin É, Blum MGB. Genome scans for detecting footprints of local adaptation using a Bayesian factor model. Mol Biol Evol. 42. Pin PA, Nilsson O. The multifaceted roles of FLOWERING LOCUS T in plant 2014;31:2483–95. development. Plant Cell Environ. 2012;35:1742–55. 16. Frichot É, Schoville SD, Bouchard G, François O. Testing for associations 43. Stern DL, Orgogozo V. Is genetic evolution predictable? Science. 2009;323: between loci and environmental gradients using latent factor mixed 746–51. models. Mol Biol Evol. 2013;30:1687–99. 44. Stern DL, Orgogozo V. The loci of evolution: how predictable is genetic evolution? Evolution. 2008;62:2155–77. 17. Zhou X, Stephens M. Genome-wide efficient mixed-model analysis for 45. Li P, Filiault D, Box MS, Kerdaffrec E, van Oosterhout C, Wilczek AM, et al. association studies. Nat Genet. 2012;44:821–4. Multiple FLC haplotypes defined by independent cis-regulatory variation 18. Ding J, Nilsson O. Molecular regulation of phenology in trees-because the underpin life history diversity in Arabidopsis thaliana. Genes Dev. 2014;28: seasons they are a-changin. Curr Opin Plant Biol. 2016;29:73–9. 1635–40. 19. Böhlenius H, Huang T, Charbonnel-Campaa L, Brunner AM, Jansson S, Strauss SH, et al. CO/FT regulatory module controls timing of flowering and 46. Stinchcombe JR, Weinig C, Ungerer M, Olsen KM, Mays C, Halldorsdottir SS, seasonal growth cessation in trees. Science. 2006;312:1040–3. et al. A latitudinal cline in flowering time in Arabidopsis thaliana modulated 20. Evans LM, Slavov GT, Rodgers-Melnick E, Martin J, Ranjan P, Muchero W, et by the flowering time gene FRIGIDA. Proc Natls Acad Sci USA. 2004;101: 4712–7. al. Population genomics of Populus trichocarpa identifies signatures of selection and adaptive trait associations. Nat Genet. 2014;46:1089–96. 47. Huo H, Wei S, Bradford KJ. DELAY OF GERMINATION1(DOG1) regulates both 21. Hormozdiari F, Kostem E, Kang EY, Pasaniuc B, Eskin E. Identifying seed dormancy and flowering time through microRNA pathways. Proc Natl causal variants at loci with multiple signals of association. Genetics. Acad Sci U S A. 2016;113:E2199–206. 2014;198:497–508. 48. Kerdaffrec E, Filiault DL, Korte A, Sasaki E, Nizhynska V, Seren Ü, et al. 22. Voight BF, Kudaravalli S, Wen X, Pritchard JK. A map of recent positive Multiple alleles at a single locus control seed dormancy in Swedish selection in the human genome. PLoS Biol. 2006;4:e72. Arabidopsis. elife. 2016;5:e22502. 49. Bolger AM, Lohse M, Usadel B. Trimmomatic a flexible trimmer for Illumina 23. Ferrer-Admetlla A, Liang M, Korneliussen T, Nielsen R. On detecting sequence data. Bioinformatics. 2014;30:2114–20. incomplete soft or hard selective sweeps using haplotype structure. Mol Biol Evol. 2014;31:1275–91. 50. Sundell D, Mannapperuma C, Netotea S, Delhomme N, Lin Y-C, Sjödin A, et al. The Plant Genome Integrative Explorer Resource: PlantGenIE.org. New 24. Sabeti PC, Reich DE, Higgins JM, Levine HZP, Richter DJ, Schaffner SF, et al. Detecting recent positive selection in the human genome from haplotype Phytol. 2015;208:1149–56. structure. Nature. 2002;419:832–7. 51. Li H. Aligning sequence reads, clone sequences and assembly contigs with 25. Bragg JG, Supple MA, Andrew RL, Borevitz JO. Genomic variation across BWA-MEM. arXiv. 2013;1303:3997. http://arxiv.org/abs/1303.3997 landscapes: insights and applications. New Phytol. 2015;207:953–67. 52. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The 26. Nielsen R. Molecular signatures of natural selection. Annu Rev Genet. 2005; Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25: 39:197–218. 2078–9. Wang et al. Genome Biology (2018) 19:72 Page 17 of 17 53. DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, et al. A 79. Ingvarsson PK. Multilocus patterns of nucleotide polymorphism and the framework for variation discovery and genotyping using next-generation demographic history of Populus tremula. Genetics. 2008;180:329–40. DNA sequencing data. Nat Genet. 2011;43:491–8. 80. Xu M, Zang B, Yao HS, Huang MR. Isolation of high quality RNA and 54. Tarailo-Graovac M, Chen N. Using RepeatMasker to identify repetitive molecular manipulations with various tissues of Populus. Russ J Plant elements in genomic sequences. Curr Protoc Bioinformatics. 2009;Chapter 4: Physiol. 2009;56:716–9. Unit 4.10. 81. Livak KJ, Schmittgen TD. Analysis of relative gene expression data using 55. Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, et al. A program real-time quantitative PCR and the 2(−Delta Delta C(T)) Method. Methods. for annotating and predicting the effects of single nucleotide 2001;25:402–8. polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster 82. Michelson IH, Ingvarsson PK, Robinson KM, Edlund E, Eriksson ME, Nilsson O, strain w1118; iso-2; iso-3. Fly. 2012;6:80–92. et al. Autumn senescence in aspen is not triggered by day length. Physiol Plant. 2018;162:123–34. 56. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. 83. Wang J, Ding J, Tan B, Robinson KM, Michelson IH, Johansson A, et al. A PLINK: a tool set for whole-genome association and population-based major locus controls local adaptation and adaptive life history variation in a linkage analyses. Am J Hum Genet. 2007;81:559–75. perennial plant. NCBI SRA; 2017. BioProject Accession: PRJNA297202. https:// 57. Weir BS, Cockerham CC. Estimating F-statistics for the analysis of population www.ncbi.nlm.nih.gov/bioproject/PRJNA297202/. Accessed 4 Oct 2016. structure. Evolution. 1984;38:1358–70. 84. Ingvarsson PK. Data from SwAsp collection - environmental PCAs and bud 58. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The set BLUPs. https://doi.org/10.5281/zenodo.844372. Accessed 4 Dec 2017. variant call format and VCFtools. Bioinformatics. 2011;27:2156–8. 85. Wang J, Ding J, Tan B, Robinson KM, Michelson IH, Johansson A, et al. A 59. Oksanen J, Kindt R, Legendre P, OHara B, Simpson GL, Solymos P, et al. major locus controls local adaptation and adaptive life history variation in a vegan: Community Ecology Package. https://cran.r-project.org/web/ perennial plant. Github. 2018. https://github.com/parkingvarsson/ packages/vegan/index.html. PhotoperiodLocalAdaptation. Accessed 26 Mar 2018. 60. Duforet-Frebourg N, Luu K, Laval G, Bazin É, Blum MGB. Detecting genomic signatures of natural selection with principal component analysis: application to the 1000 Genomes Data. Mol Biol Evol. 2016;33:1082–93. 61. Storey JD, Tibshirani R. Statistical significance for genomewide studies. Proc Natl Acad Sci U S A. 2003;100:9440–5. 62. Hijmans RJ, Cameron SE, Parra JL, Jones PG, Jarvis A. Very high resolution interpolated climate surfaces for global land areas. Int J Climatol. 2005;25: 1965–78. 63. Frichot E, François O. LEA an R package for Landscape and Ecological Association studies. Methods Ecol Evol. 2015;6:925–9. 64. Gilmour AR, Gogel BJ, Cullis BR, Thompson R. ASReml User Guide Release 3. 0. 2009. http://www.vsni.co.uk/. 65. Vilhjálmsson BJ, Nordborg M. The nature of confounding in genome-wide association studies. Nat Rev Genet. 2013;14:1–2. 66. Shim H, Chasman DI, Smith JD, Mora S, Ridker PM, Nickerson DA, et al. A multivariate genome-wide association analysis of 10 LDL subfractions, and their response to statin treatment, in 1868 Caucasians. PLoS One. 2015;10: e0120758. 67. Browning BL, Browning SR. A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. Am J Hum Genet. 2009;84:210–23. 68. Grabherr MG, Russell P, Meyer M, Mauceli E, Alföldi J, Di Palma F, et al. Genome-wide synteny through highly sensitive sequence alignment: Satsuma. Bioinformatics. 2010;26:1145–51. 69. Wang J, Street NR, Scofield DG, PK I. Natural selection and recombination rate variation shape nucleotide polymorphism across the genomes of three related Populus species. Genetics. 2016;202:1185–200. 70. Wang Z, Du S, Dayanandan S, Wang D, Zeng Y, Zhang J. Phylogeny reconstruction and hybrid analysis of Populus (Salicaceae) based on nucleotide sequences of multiple single-copy nuclear genes and plastid fragments. PLoS One. 2014;9:e103645. 71. Remington DL, Thornsberry JM, Matsuoka Y, Wilson LM, Whitt SR, Doebley J, et al. Structure of linkage disequilibrium and phenotypic associations in the maize genome. Proc Natl Acad Sci U S A. 2001;98:11479–84. 72. Szpiech ZA, Hernandez RD. Selscan an efficient multi-threaded program to perform EHH-based scans for positive selection. Mol Biol Evol. 2014; 31:2824–7. 73. Tajima F. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 1989;123:585–95. 74. Korneliussen TS, Albrechtsen A, Nielsen R. ANGSD Analysis of Next Generation Sequencing Data. BMC Bioinformatics. 2014;15:356. 75. Kim Y, Stephan W. Detecting a local signature of genetic hitchhiking along a recombining chromosome. Genetics. 2002;160:765–77. 76. DeGiorgio M, Huber CD, Hubisz MJ, Hellmann I, Nielsen R. SweepFinder2: increased sensitivity, robustness and flexibility. Bioinformatics. 2016;32:1895–7. 77. Huber CD, DeGiorgio M, Hellmann I, Nielsen R. Detecting recent selective sweeps while controlling for mutation rate and background selection. Mol Ecol. 2016;25:142–56. 78. Ewing G, Hermisson J. MSMS a coalescent simulation program including recombination, demographic structure and selection at a single locus. Bioinformatics. 2010;26:2064–5.

Journal

Genome BiologySpringer Journals

Published: Jun 4, 2018

References

You’re reading a free preview. Subscribe to read the entire article.


DeepDyve is your
personal research library

It’s your single place to instantly
discover and read the research
that matters to you.

Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.

All for just $49/month

Explore the DeepDyve Library

Search

Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly

Organize

Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.

Access

Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals.

Your journals are on DeepDyve

Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more.

All the latest content is available, no embargo periods.

See the journals in your area

DeepDyve

Freelancer

DeepDyve

Pro

Price

FREE

$49/month
$360/year

Save searches from
Google Scholar,
PubMed

Create lists to
organize your research

Export lists, citations

Read DeepDyve articles

Abstract access only

Unlimited access to over
18 million full-text articles

Print

20 pages / month

PDF Discount

20% off