Nonequilibrium Neutral Theory for Hitchhikers

Abstract

Selective sweep is a phenomenon of reduced variation at presumably neutrally evolving sites (hitchhikers) in the genome that is caused by the spread of a selected allele at a linked focal site, and it is widely used to test for the action of positive selection. Nonetheless, selective sweep may also provide an unprecedented opportunity for studying nonequilibrium properties of the neutral variation itself. We demonstrate this possibility in relation to ancient selective sweeps for modern human-specific changes and ongoing selective sweeps for local population-specific changes.

Neutral Theory and the Site Frequency Spectrum

A decade ago, Crow (2008) viewed the then-present status of the neutral theory of Kimura (1968) as a standard null model in evolutionary genetics: “Vast regions of the genome are near enough to being neutral for neutrality to be assumed. In fact virtually every study of molecular variability, evolutionary rates, and coalescence is based on this assumption.” At the same time, knowing Kimura’s devotion to the theory, he found it ironic that reduced neutral variation around a focal site often provided the strongest evidence for selection acting on the site. Although these aspects of the neutral theory are now taken for granted, it seems that much remains to be learned about neutral variation per se. Here, we are concerned with nonequilibrium properties of the site frequency spectrum (SFS) among selectively neutral hitchhikers in ancient and ongoing selective sweeps (Maynard Smith and Haigh 1974; Kaplan et al. 1989). In one of his seminal papers, Kimura (1969) proposed the infinite-sites model for DNA variation in a diploid population of constant size $N_e$ (see also Kimura 1971) and obtained the distribution of allele frequency under the steady flux of mutations at rate $u$ per region per generation.
With the scaled mutation parameter $\theta = 4N_e u$, Kimura’s distribution allowed the prediction of the average number $E(\xi_i) = \theta/i$ of segregating sites at each of which the derived allele is present in $i$ copies ($1 \le i \le n-1$) in a sample of size $n$ (Charlesworth and Charlesworth 2012). The unfolded SFS $\{\xi_i\}$ and its summary statistics such as

$$S = \sum_{i=1}^{n-1} \xi_i, \quad \Pi = \sum_{i=1}^{n-1} \frac{2i(n-i)}{n(n-1)}\,\xi_i, \quad L = \sum_{i=1}^{n-1} \frac{i}{n-1}\,\xi_i, \quad H = \sum_{i=1}^{n-1} \frac{2i^2}{n(n-1)}\,\xi_i$$

have since been extensively studied and used to test neutrality and detect selective sweeps (Watterson 1975; Tajima 1989; Fu 1995; Griffiths and Tavaré 1998; Fay and Wu 2000; Bustamante et al. 2001; Zeng et al. 2006; Coop and Ralph 2012; Enard et al. 2014; Ferretti et al. 2017). In the above, $S$ is the number of segregating sites per region, $\Pi$ is the mean number of pairwise differences, $L$ is the normalized number of derived alleles at the $S$ sites, and $H$ is related to Fay and Wu’s statistic.

We will explore one simple idea to quantify a selective sweep in terms of the SFS $\{\xi_i\}$ among hitchhikers when their linked selected allele has been segregating for the past $2N_e t$ generations or was fixed $2N_e \tau$ generations ago. We assume that the selected allele emerges with initial frequency $f_0$ in a population of constant size $N_e$ and is driven by genic selection (no dominance) with intensity $s$. While the allele is segregating at time $t$ after its emergence, its copy number in a sample of size $n$ is denoted by $n_r$ and its sample frequency is $f_r = n_r/n$. As $t$ increases, $f_r$ irreversibly increases due to selection and so does $n_r$. When fixation occurs, the fixation time is denoted by $t_{fix}$ and the total time elapsed by $t_{fix} + \tau$, both in units of $2N_e$ generations. Hence, for $0 < t < t_{fix}$, the sweep is ongoing and the sample size of the selected allele is specified by $n_r$, while for $\tau > 0$, the sweep is completed and the sample size is specified by $n$.
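The summary statistics $S$, $\Pi$, $L$, and $H$ defined above are all weighted sums over the unfolded SFS, so they can be computed in a few lines. The following is a minimal sketch rather than code from the study; the function names `sfs_summaries` and `neutral_expectation` and the list layout of the SFS are assumptions made for illustration.

```python
def sfs_summaries(xi):
    """Return (S, Pi, L, H) for an unfolded SFS.

    xi[i - 1] is the number of segregating sites at which the derived
    allele is present in i copies (i = 1, ..., n - 1), so the sample
    size is n = len(xi) + 1. The list layout is an assumption of this
    sketch.
    """
    if not xi:
        raise ValueError("the SFS must contain at least one class")
    n = len(xi) + 1
    pairs = n * (n - 1)
    counts = list(enumerate(xi, start=1))  # (i, xi_i) pairs
    S = sum(x for _, x in counts)
    Pi = sum(2 * i * (n - i) / pairs * x for i, x in counts)
    L = sum(i / (n - 1) * x for i, x in counts)
    H = sum(2 * i * i / pairs * x for i, x in counts)
    return S, Pi, L, H


def neutral_expectation(n, theta):
    """Expected unfolded SFS under the standard neutral model: E(xi_i) = theta / i."""
    return [theta / i for i in range(1, n)]


if __name__ == "__main__":
    # Sample size and theta taken from the simulation settings quoted in the text.
    S, Pi, L, H = sfs_summaries(neutral_expectation(68, 40.0))
    print(S, Pi, L, H)
```

Substituting $E(\xi_i) = \theta/i$ into the definitions shows that $\Pi$, $L$, and $H$ each have expectation $\theta$, while $S$ has expectation $\theta \sum_{i=1}^{n-1} 1/i$; departures among these estimates are what neutrality tests built on the SFS exploit.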
In accordance with this sampling of the selected allele, we consider linked neutral hitchhikers in a sample of size $n_r$ for an ongoing sweep and $n$ for an ancient sweep. In either case, since segregating sites among hitchhikers are characterized by “sizes” of $2(n_r-1)$ or $2(n-1)$ branches in a coalescent tree (Fu 1995) and many large sizes cannot be simultaneously present in a sample, we bin the segregating sites into such size-classes as $c_1=\{i=1\}$, $c_2=\{i=2,3\}$, $c_3=\{4\le i\le 9\}$, $c_4=\{10\le i\le 25\}$, $c_5=\{26\le i\le 68\}$, if $n$ or $n_r>68$. Using these size-classes and $E(\xi_i)$ under the standard neutral model, we define the normalized SFS $\tilde{\xi}_{c_j}$ of class $c_j$ (fig. 1). At equilibrium, $E\{\tilde{\xi}_{c_j}\}$ is $\sim 1$ in all size-classes, but the nonequilibrium $\tilde{\xi}_{c_j}$ behaves differently in different size-classes. This can be demonstrated by coalescent simulation; of the available simulators, we use discoal, which implements both soft and hard sweeps (Kern and Schrider 2016).

FIG. 1. (A) Cumulative distributions of the normalized SFS in size-class $c_j$, $\{\tilde{\xi}_{c_j}\}$, in ongoing (warm colored) and ancient (cold colored) sweeps. The number of segregating sites $\sum_{i\in c_j}\xi_i$ in size-class $c_j$ is normalized by the expected equilibrium value $\theta a_{c_j}$, where $a_{c_j}=\sum_{i\in c_j} 1/i$ and the summation is taken over all $i$ in $c_j$:

$$\tilde{\xi}_{c_j} = \frac{1}{\theta a_{c_j}} \sum_{i\in c_j} \xi_i \qquad (1)$$

Sample size $n$ in (1) is replaced by $n_r$ for a sample of hitchhikers in an ongoing sweep. In simulation, the selected allele trajectory is stochastic, but for sufficiently strong selection, it is approximated by

$$t = \frac{2}{a} \ln \frac{f_r(1-f_0)}{(1-f_r)f_0} \qquad (2)$$

The time before fixation is thus specified by $f_r=0.3$, 0.6, 0.9, while the time at and after fixation is specified by $\tau=0$, 0.5, 1, 2 in units of $2N_e$ generations. Two cases of $\rho$ per region are depicted: $\rho=0$ (left) and $\rho=40$ (right). For each case, we made $10^4$ replications. A dot on each curve indicates the mode. (B) Stairway plot based on the SFS produced by discoal for a sample of hitchhikers with $S=13$ when $f_r=0.9$ and $f_0=0.001$ (left panel, hard sweep) and for a sample with $S=31$ when $f_r=0.9$ and $f_0=0.1$ (right panel, soft sweep). An abrupt termination of basal coalescences is a characteristic of hard sweeps. The ordinate stands for the number of hitchhikers and the abscissa for time $t$ estimated under the assumption of a per-year mutation rate per bp of $0.5\times10^{-9}$. Other parameters are set as $n=68$, $\theta=40$, $a=2N_es=400$, and $\rho=0$ or 40. The command line is ./discoal 68 10000 70000 -t 40 -a 400 -ws τ/2 -r ρ -f f0 -x 0.5 -c fr.

Hitchhikers in an Ancient Selective Sweep

In the human population, it has been argued that any sweep is unlikely to leave significant signals beyond 250 thousand years (ky) (Enard et al. 2002; Przeworski 2002). Nevertheless, it is important to reveal genetic changes that have been fixed by positive selection in the stem lineage of modern humans. In a rejection sampling approach, Przeworski (2003) used $S$, $D\propto S-\Pi$, and the number ($K$) of haplotypes to estimate the time $\tau$ elapsed since an ancient sweep was completed. Recently, Racimo et al. (2014), Racimo (2016), and Peyrégne et al. (2017) took alternative approaches that use Denisovan and Neanderthal genomes and rely on distorted lineage sorting due to ancient sweeps that could occur in particular branches of the population tree. These approaches may identify candidate regions with sweep signals older than 400 ky, but do not necessarily pinpoint target sites of positive selection.

One way to develop an inference method for ancient sweeps is to focus on large size-classes of $\tilde{\xi}_{c_j}$, because the larger the size, the longer the time required for sweep signals to be erased by new mutations and/or recombination. For instance, the distribution of $\tilde{\xi}_{c_5}$ has its mode at $\tilde{\xi}_{c_5}<0.5$ when $\tau=1$, indicating that, unlike in the smaller size-classes, most sweep signals are still retained in $c_5$ even $2N_e$ generations after the fixation (fig. 1). If $2N_e$ generations amounts to $2\times10^4$ generations in the case of humans (Takahata 1993) and the generation time is 20 (or 30) years, then $\tau=1$ equals 400 (or 600) ky. Needless to say, as $\tau$ further increases up to 2, $\tilde{\xi}_{c_5}$ equilibrates like all other size-classes.
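Equations (1) and (2) in the legend of figure 1, together with the conversion of $\tau$ into years, can be made concrete with a short sketch. It is an illustration under stated assumptions, not the authors' code: the names `SIZE_CLASSES`, `normalized_sfs`, `sweep_time`, and `to_years` are hypothetical, and each size-class is truncated at the largest copy number present in the supplied SFS, which is a choice of this sketch.

```python
import math

# Size-classes c1..c5 as (lowest i, highest i), following the binning in the text.
SIZE_CLASSES = {"c1": (1, 1), "c2": (2, 3), "c3": (4, 9), "c4": (10, 25), "c5": (26, 68)}


def normalized_sfs(xi, theta, classes=SIZE_CLASSES):
    """Equation (1): class-wise SFS divided by its neutral expectation theta * a_cj.

    xi[i - 1] is the number of sites with i derived copies. Classes are
    truncated at len(xi) (an assumption of this sketch); a class with no
    observable copy number maps to None.
    """
    largest = len(xi)
    result = {}
    for name, (low, high) in classes.items():
        members = range(low, min(high, largest) + 1)
        a_c = sum(1.0 / i for i in members)
        if a_c == 0.0:
            result[name] = None  # class not observable in this sample
        else:
            result[name] = sum(xi[i - 1] for i in members) / (theta * a_c)
    return result


def sweep_time(f_r, f_0, a):
    """Equation (2): deterministic time, in units of 2Ne generations,
    for the selected allele to rise from frequency f_0 to f_r."""
    return (2.0 / a) * math.log(f_r * (1.0 - f_0) / ((1.0 - f_r) * f_0))


def to_years(t, two_ne=2e4, years_per_generation=20):
    """Convert time in units of 2Ne generations to years, using the
    human values quoted in the text as defaults."""
    return t * two_ne * years_per_generation


if __name__ == "__main__":
    neutral = [40.0 / i for i in range(1, 68)]  # E(xi_i) = theta / i, n = 68
    print(normalized_sfs(neutral, 40.0))        # every class is ~1 at equilibrium
    print(sweep_time(0.9, 0.001, 400))          # hard-sweep setting of fig. 1B
    print(to_years(1.0))                        # tau = 1 with a 20-year generation
```

Feeding the neutral expectation into `normalized_sfs` returns values of about 1 in every class, which is the equilibrium reference against which the simulated nonequilibrium distributions are compared.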
For larger sample size $n$, we can have even higher size-classes. Unfortunately, however, we cannot greatly improve the intrinsic ceiling in the rate of return to equilibrium by increasing the sample size (Wakeley 2009). On the other hand, the distribution of $\tilde{\xi}_{c_j}$ is sensitive to $\theta$: the smaller the $\theta$, the greater the stochastic error and the broader the distribution. For example, when $\tau=0.5$ and the population recombination rate per region $\rho=0$, the coefficient of variation of $\tilde{\xi}_{c_j}$ is more than 3 times greater when $\theta=10$ than when $\theta=100$. This suggests that a region of hitchhikers should be reasonably large to detect an ancient sweep. Although recombination narrows the distribution of $\tilde{\xi}_{c_j}$, it does not greatly alter the mode or the median (fig. 1). We can expect such a relatively minor effect on a sample of hitchhikers after the fixation of a linked selected allele because recombination occurs only among closely related hitchhikers. However, this is not the case under an ongoing selective sweep.

Hitchhikers in an Ongoing Selective Sweep

Fan et al. (2016) reviewed current efforts in establishing a complete picture of local adaptations after the out-of-Africa dispersal of modern humans. For such ongoing sweeps, it is important to consider how intra-allelic variability is measured (Slatkin and Rannala 1997, 2000). Fujito, Satta, Hane, et al. (2018) and Fujito, Satta, Hayakawa, et al. (2018) developed one such statistic, $F_c$, as the ratio of the number of derived alleles in a sample of hitchhikers to that in the whole sample. While hitchhikers can restore the intra-allelic variability in small size-classes relatively rapidly (Kimura and Ohta 1973), this is not the case in large size-classes when a sweep is still ongoing without recombination. The recovery becomes substantial only at some time after the selected allele has been fixed.
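Because the hitchhiker sample in an ongoing sweep has only $n_r = n f_r$ members, some of the size-classes defined earlier are truncated or cannot be observed at all. The sketch below tabulates which classes remain for a given $n$ and $f_r$. The function name `available_classes` is hypothetical, $n_r$ is obtained by rounding, and the largest copy number is taken to be $n_r - 1$ on the assumption that a site must be polymorphic within the hitchhiker sample to count as segregating.

```python
# Size-classes c1..c5 as (lowest i, highest i), following the binning in the text.
SIZE_CLASSES = {"c1": (1, 1), "c2": (2, 3), "c3": (4, 9), "c4": (10, 25), "c5": (26, 68)}


def available_classes(n, f_r, classes=SIZE_CLASSES):
    """Return n_r and the observable part of each size-class.

    n_r = round(n * f_r) is the number of sampled copies of the selected
    allele. A site segregating among these hitchhikers carries between 1
    and n_r - 1 derived copies (assumption of this sketch), so each class
    is clipped at n_r - 1 and classes lying entirely above it map to None.
    """
    n_r = round(n * f_r)
    largest = n_r - 1
    clipped = {}
    for name, (low, high) in classes.items():
        clipped[name] = (low, min(high, largest)) if low <= largest else None
    return n_r, clipped


if __name__ == "__main__":
    for f_r in (0.3, 0.6, 0.9):
        print(f_r, available_classes(68, f_r))
```

With $n=68$ and $f_r=0.3$ this gives $n_r=20$, a shortened $c_4$, and no $c_5$, so statistics based on the largest size-classes only become usable once the selected allele is at high frequency.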
In relation to this intra-allelic variability, if the initial frequency of the selected allele is as low as $f_0=0.001$, there is little chance for multiple ancestral lineages of hitchhikers to be traced back beyond the selection phase. This is almost equivalent to the hard sweep assumption (Hermisson and Pennings 2005), and in the absence of recombination, there is little or no intra-allelic variability in large size-classes before the fixation, resulting in a small $F_c$ value. Recombination changes these features dramatically. By allowing some of the ancestral lineages of hitchhikers to stop over (backward in time) or ride on the way (forward in time), recombination instantaneously creates deep genealogical separation among hitchhikers (see fig. 1 in Coop et al. 2008). Intriguingly, the result of this stopover process depends strongly on the current frequency ($f_r$) of a selected allele. When $f_r=0.3$ and $n=68$, $n_r=nf_r\approx20$ and recombination cannot generate segregating sites in $c_5$ or sites with any size greater than $i=20$: this accounts for the unusual pattern of the $f_r=0.3$ cumulative distribution in $c_4$ and its absence in $c_5$ (fig. 1). Although under ordinary circumstances recombination affects mostly lower branches or tips of a coalescent tree (Ferretti et al. 2017), it also affects the SFS in higher branches and larger size-classes when coupled with an ongoing sweep. Recombination thus leaves enhanced intra-allelic variability that persists even after the fixation of a selected allele.

Perspectives

It is not the purpose of this perspective to propose any statistical method that can be developed based on the hierarchical binning of segregating sites and/or linkage disequilibrium (see Fujito, Satta, Hane, et al. 2018; Fujito, Satta, Hayakawa, et al. 2018 for such a method based on $F_c$ and comparisons with available methods).
We have instead demonstrated that both nonequilibrium properties of selectively neutral hitchhikers and effects of recombination on them differ greatly among different size-classes of the SFS. We can naturally extend the same approach to such summaries as $S$, $\Pi$, $L$, $H$, and $K$ if they are appropriately defined in terms of size-classed segregating sites. There is another interesting application of the nonequilibrium SFS. To infer human demographic history, Liu and Fu (2015) proposed an SFS-based method called the stairway plot and applied it to the 1000 Genomes Project sequence data. It is an attractive and powerful composite likelihood method for inferring recent changes in population size. To examine whether the stairway plot can recover the frequency change of a selected allele, which imitates a population size change, we applied the method to an SFS among hitchhikers in an ongoing sweep. Despite the necessarily small number of segregating sites, we have noted that one characteristic of the hitchhikers’ SFS is that $\xi_k=0$ for all $k$ greater than a certain $j$. The principle used in the stairway plot must then infer a scaled population size of $\vartheta_k=0$ per bp at all (coalescent) states $k$ in $2\le k\le n_r-j$. The bootstrap resampling procedure does not change this inference. This would suggest that the state $n_r-j+1$ corresponds to the initial frequency of the selected allele as well as the time of its emergence. We used SFSs in a region of length 70 kb that were generated by discoal with $\rho=0$, $\theta=40$, $a=400$, and various combinations of $f_0$ and $f_r$. Surprisingly, the stairway plot with only $S=\sum\xi_i=13$ or 31 could depict the assumed frequency trajectory of the selected allele in a satisfactory manner (fig. 1). Unless this is fortuitous, it provides us with an excellent inference device for the time period and strength of positive selection inscribed in selectively neutral hitchhikers.
In particular, it may offer a clue to distinguish between soft and hard sweeps, and in either case, it appears that necessary conditions for detecting sweeps are roughly $t\ll1$ and $f_0\ll f_r$. However, its application to nonhitchhikers or the entire population might be more challenging.

“Increasingly, the neutral theory is not so much an end in itself, but is a way to study other evolutionary processes, such as migration.” Crow (2008) thus closed his reminiscence. Yet he did not forget to quote JBS Haldane for Kimura: the highest honor a scientist can have is for his theory to be so taken for granted that his name is no longer mentioned. These words must have sounded gracious to Kimura and seem most apropos for the 50th anniversary of his theory and the commemorating 2018 SMBE meeting held in his country.

Acknowledgments

We thank reviewers for constructive criticisms and Dr Quintin Lau for his useful comments on the English. This work was supported in part by the Japan Society for the Promotion of Science (JSPS) grant 16H04821 to Y.S., and the Scientific Research on Innovative Areas, a MEXT Grant-in-Aid Project FY2016-2020 16H06412 to N.T.

References

Bustamante CD, Wakeley J, Sawyer S, Hartl DL. 2001. Directional selection and the site-frequency spectrum. Genetics 159(4):1779–1788.
Charlesworth B, Charlesworth D. 2012. Elements of evolutionary genetics. 2nd ed. Colorado: Roberts and Company Publishers.
Coop G, Bullaughey K, Luca F, Przeworski M. 2008. The timing of selection at the human FOXP2 gene. Mol Biol Evol. 25(7):1257–1259.
Coop G, Ralph P. 2012. Patterns of neutral diversity under general models of selective sweeps. Genetics 192(1):205–224.
Crow JF. 2008. Motoo Kimura and the rise of neutralism. In: Harman O, Dietrich MR, editors. Rebels, mavericks, and heretics in biology. New Haven (CT): Yale University Press. p. 265–280.
Enard D, Messer PW, Petrov DA. 2014. Genome-wide signals of positive selection in human evolution. Genome Res. 24(6):885–895.
Enard W, Przeworski M, Fisher SE, Lai CSL, Wiebe V, Kitano T, Monaco AP, Pääbo S. 2002. Molecular evolution of FOXP2, a gene involved in speech and language. Nature 418(6900):869–872.
Fan S, Hansen ME, Lo Y, Tishkoff SA. 2016. Going global by adapting local: a review of recent human adaptation. Science 354(6308):54–59.
Fay JC, Wu CI. 2000. Hitchhiking under positive Darwinian selection. Genetics 155(3):1405–1413.
Ferretti L, Ledda A, Wiehe T, Achaz G, Ramos-Onsins SE. 2017. Decomposing the site frequency spectrum: the impact of tree topology on neutrality tests. Genetics 207(1):229–240.
Fu YX. 1995. Statistical properties of segregating sites. Theor Popul Biol. 48(2):172–197.
Fujito NT, Satta Y, Hane M, Matsui A, Yashima K, Kitajima K, Sato C, Takahata N, Hayakawa T. 2018. Positive selection on schizophrenia-associated ST8SIA2 gene in post-glacial Asia. PLoS One (submitted).
Fujito NT, Satta Y, Hayakawa T, Takahata N. 2018. A new inference method for ongoing selective sweep. Genes Genet Syst. (submitted).
Griffiths RC, Tavaré S. 1998. The age of a mutation in a general coalescent tree. Commun Stat Stochastic Models 14(1–2):273–295.
Hermisson J, Pennings PS. 2005. Soft sweeps: molecular population genetics of adaptation from standing genetic variation. Genetics 169(4):2335–2352.
Kaplan NL, Hudson RR, Langley CH. 1989. The “hitchhiking effect” revisited. Genetics 123(4):887–899.
Kern AD, Schrider DR. 2016. discoal: flexible coalescent simulations with selection. Bioinformatics 32(24):3839–3841.
Kimura M. 1968. Evolutionary rate at the molecular level. Nature 217(5129):624–626.
Kimura M. 1969. The number of heterozygous nucleotide sites maintained in a finite population due to steady flux of mutations. Genetics 61:893–903.
Kimura M. 1971. Theoretical foundations of population genetics at the molecular level. Theor Popul Biol. 2(2):174–208.
Kimura M, Ohta T. 1973. The age of a neutral mutant persisting in a finite population. Genetics 75:199–212.
Liu X, Fu YX. 2015. Exploring population size changes using SNP frequency spectra. Nat Genet. 47(5):555–559.
Maynard Smith J, Haigh J. 1974. The hitchhiking effect of a favorable gene. Genet Res. 23(1):23–35.
Peyrégne S, Boyle MJ, Dannemann M, Prüfer K. 2017. Detecting ancient positive selection in humans using extended lineage sorting. Genome Res. 27(9):1563–1572.
Przeworski M. 2002. The signature of positive selection at randomly chosen loci. Genetics 160(3):1179–1189.
Przeworski M. 2003. Estimating the time since the fixation of a beneficial allele. Genetics 164(4):1667–1676.
Racimo F. 2016. Testing for ancient selection using cross-population allele frequency differentiation. Genetics 202(2):733–750.
Racimo F, Kuhlwilm M, Slatkin M. 2014. A test for ancient selective sweeps and an application to candidate sites in modern humans. Mol Biol Evol. 31(12):3344–3358.
Slatkin M, Rannala B. 1997. Estimating the age of alleles by use of intra-allelic variability. Am J Hum Genet. 60:447–458.
Slatkin M, Rannala B. 2000. Estimating allele age. Annu Rev Genomics Hum Genet. 1:225–249.
Tajima F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123(3):585–595.
Takahata N. 1993. Allelic genealogy and human evolution. Mol Biol Evol. 10(1):2–22.
Wakeley J. 2009. Coalescent theory. An introduction. Greenwood Village (CO): Roberts & Company Publishers.
Watterson GA. 1975. On the number of segregating sites in genetical models without recombination. Theor Popul Biol. 7(2):256–276.
Zeng K, Fu YX, Shi S, Wu CI. 2006. Statistical tests for detecting positive selection by utilizing high-frequency variants. Genetics 174(3):1431–1439.

© The Author(s) 2018. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)
Molecular Biology and Evolution, Oxford University Press

Publisher
Oxford University Press
ISSN
0737-4038
eISSN
1537-1719
D.O.I.
10.1093/molbev/msy093

Abstract

Abstract Selective sweep is a phenomenon of reduced variation at presumably neutrally evolving sites (hitchhikers) in the genome that is caused by the spread of a selected allele at a linked focal site, and is widely used to test for action of positive selection. Nonetheless, selective sweep may also provide an unprecedented opportunity for studying nonequilibrium properties of the neutral variation itself. We have demonstrated this possibility in relation to ancient selective sweep for modern human-specific changes and ongoing selective sweep for local population-specific changes. Neutral Theory and the Site Frequency Spectrum A decade ago, Crow (2008) viewed the then present status of the neutral theory of Kimura (1968) as a standard null model in evolutionary genetics: “Vast regions of the genome are near enough to being neutral for neutrality to be assumed. In fact virtually every study of molecular variability, evolutionary rates, and coalescence is based on this assumption.” At the same time, since he knew Kimura’s devotion to the theory, he felt irony when seeing that reduced neutral variation around a focal site often provided the strongest evidence for selection acting on the site. Although these aspects of the neutral theory are now taken for granted, it seems that there still exit many to be learned about neutral variation per se. Here, we are concerned with nonequilibrium properties of the site frequency spectrum (SFS) among selectively neutral hitchhikers in ancient and ongoing selective sweeps (Maynard Smith and Haigh 1974; Kaplan et al. 1989). In one of his seminal papers, Kimura (1969) proposed the infinite-sites model for DNA variation in a diploid population of constant size Ne (see also Kimura 1971) and obtained the distribution of allele frequency under the steady-flux of mutations at rate u per region per generation. 
With the scaled mutation parameter θ=4Neu, the Kimura’s distribution allowed the prediction of the average number Eξi=θi of segregating sites at each of which the derived allele is present in i copies ( 1≤i≤n-1) in a sample of size n (Charlesworth and Charlesworth 2012). The unfolded SFS {ξi} and its summary statistics such as S=∑i=1n-1ξi, Π=∑i=1n-12in-inn-1ξi, L=∑i=1n-1in-1ξi, and H=∑i=1n-12i2nn-1ξi have since been extensively studied and used to test neutrality and detect selective sweeps (Watterson 1975; Tajima 1989; Fu 1995; Griffiths and Tavaré 1998; Fay and Wu 2000; Bustamante et al. 2001; Zeng et al. 2006; Coop and Ralph 2012; Enard et al. 2014; Ferretti et al. 2017). In the above, S is the number of segregating sites per region, Π is the mean pairwise differences, L is the normalized number of derived alleles at the S sites, and H is related to the Fay and Wu’s statistic. We will explore one simple idea to quantify a selective sweep in terms of the SFS {ξi} among hitchhikers when their linked selected allele has been segregating for the past 2 Net generations or was fixed 2 Neτ generations ago. We assume that the selected allele emerges with initial frequency f0 in a population of constant size Ne and is driven by genic selection (no dominance) with intensity s. While segregating at time t after the emergence, the copy number of the allele is represented by nr in a sample of size n and the sample frequency is given by fr=nrn. As t increases, fr irreversibly increases due to selection and so does nr. When the fixation occurs, the fixation time is denoted by tfix and the total time elapsed by tfix+τ in units of 2Ne generations. Hence, for 0<t<tfix, the sweep is ongoing and the sample size of the selected allele is specified by nr, while for τ>0, the sweep is completed and the sample size is specified by n. 
In accordance with this sampling of the selected allele, we consider linked neutral hitchhikers in a sample of size nr for an ongoing sweep and n for an ancient sweep. In either case, since segregating sites among hitchhikers are characterized by “sizes” of 2(nr-1) or 2(n-1) branches in a coalescent tree (Fu 1995) and many large sizes cannot be simultaneously present in a sample, we bin the segregating sites into such size-classes as c1={i=1}, c2={i=2, 3}, c3={4≤i≤9}, c4={10≤i≤25}, c5={26≤i≤68}, if n or nr>68. Using these size-classes and Eξi under the standard neutral model, we define the normalized SFS ξ˜cj of class cj (fig. 1). Equilibrium E{ξ˜cj} is ∼ 1 in all size-classes, but nonequilibrium ξ˜cj behave differently in different size-classes. This can be demonstrated by coalescent simulation, and of a number of available such simulators, we use discoal that implements both soft and hard sweeps (Kern and Schrider 2016). FIG. 1. View largeDownload slide (A) Cumulative distributions of the normalized SFS in size-class cj, {ξ˜cj}, in ongoing (warm colored) and ancient (cold colored) sweeps. The number of segregating sites ∑i∈cjξi in size-class cj is normalized by the expected equilibrium value of θacj where acj=∑i∈cj1i and the summation is taken for all i in cj:. ξ˜cj=1θacj∑i∈cjξi (1) Sample size n in (1) is replaced by nr for a sample of hitchhikers in an ongoing sweep. In simulation, the selected allele trajectory is stochastic, but for sufficiently strong selection, it is approximated by: t=2aln⁡fr(1-f0)(1-fr)f0 (2) The time before fixation is thus specified by fr=0.3, 0.6, 0.9, while the time at and after fixation by τ =0, 0.5, 1, 2 in units of 2Ne generations. Two cases of ρ per region are depicted: ρ=0 (left) and ρ=40 (right). For each case, we made 104 replications. A dot on each curve indicates the mode. 
(B) Stairway plot based on the SFS produced by discoal for a sample of hitchhikers with S=13 when fr=0.9 and f0=0.001 (a left panel for a hard sweep) and of those with S=31 when fr=0.9 and f0=0.1 (a right panel for a soft sweep). An abrupt termination of basal coalescences is a characteristic of hard sweeps. The ordinate stands for the number of hitchhikers and the abscissa for time t estimated under the assumption of the per-year mutation rate per bp of 0.5×10-9. Other parameters are set as n=68, θ=40, a=2Nes=400, ρ=0, and 40. The command line is ./discoal 68 10000 70000 -t 40 -a 400 -ws τ/2 -r ρ -f f0 -x 0.5 -c fr. FIG. 1. View largeDownload slide (A) Cumulative distributions of the normalized SFS in size-class cj, {ξ˜cj}, in ongoing (warm colored) and ancient (cold colored) sweeps. The number of segregating sites ∑i∈cjξi in size-class cj is normalized by the expected equilibrium value of θacj where acj=∑i∈cj1i and the summation is taken for all i in cj:. ξ˜cj=1θacj∑i∈cjξi (1) Sample size n in (1) is replaced by nr for a sample of hitchhikers in an ongoing sweep. In simulation, the selected allele trajectory is stochastic, but for sufficiently strong selection, it is approximated by: t=2aln⁡fr(1-f0)(1-fr)f0 (2) The time before fixation is thus specified by fr=0.3, 0.6, 0.9, while the time at and after fixation by τ =0, 0.5, 1, 2 in units of 2Ne generations. Two cases of ρ per region are depicted: ρ=0 (left) and ρ=40 (right). For each case, we made 104 replications. A dot on each curve indicates the mode. (B) Stairway plot based on the SFS produced by discoal for a sample of hitchhikers with S=13 when fr=0.9 and f0=0.001 (a left panel for a hard sweep) and of those with S=31 when fr=0.9 and f0=0.1 (a right panel for a soft sweep). An abrupt termination of basal coalescences is a characteristic of hard sweeps. 
The ordinate stands for the number of hitchhikers and the abscissa for time t estimated under the assumption of the per-year mutation rate per bp of 0.5×10-9. Other parameters are set as n=68, θ=40, a=2Nes=400, ρ=0, and 40. The command line is ./discoal 68 10000 70000 -t 40 -a 400 -ws τ/2 -r ρ -f f0 -x 0.5 -c fr. Hitchhikers in an Ancient Selective Sweep In the human population, it has been argued that any sweep is unlikely to leave significant signals beyond 250 thousand years (ky) (Enard et al. 2002; Przeworski 2002). Nevertheless, it is important to reveal genetic changes that have been fixed by positive selection in the stem lineage of modern humans. In the rejection sampling approach, Przeworski (2003) used S, D∝S-Π and the number ( K) of haplotypes to estimate the time τ elapsed since an ancient sweep was completed. Recently, Racimo et al. (2014), Racimo (2016) and Peyrée´gne et al. (2017) took alternative approaches that use Denisovan and Neanderthal genomes and rely on distorted lineage sorting due to ancient sweeps that could occur in particular branches of the population tree. These approaches may identify candidate regions with sweep signals older than 400 ky, but do not necessarily pinpoint target sites of positive selection. One way in developing an inference method for ancient sweep is to focus on large size-classes of ξ˜cj, because the larger the size, the longer the time required for sweep signals to be erased by new mutations and/or recombination. For instance, the distribution of ξ˜c5has its mode at ξ˜c5<0.5 when τ=1, indicating that unlike the remaining smaller size-classes, most sweep signals are still retained in c5 even 2Ne generations after the fixation (fig. 1). If 2Ne generations amounts to 2×104 generations in the case of humans (Takahata 1993) and the generation time is 20 (or 30) years, then τ=1 equals 400 (or 600) ky. Needless to say, as τ further increases up to 2, ξ˜c5 equilibrates like all other size-classes. 
For a larger sample size $n$, we can have even higher size-classes. Unfortunately, however, we cannot greatly improve the intrinsic ceiling in the rate of return to equilibrium by increasing the sample size (Wakeley 2009). On the other hand, the distribution of $\tilde{\xi}_{c_j}$ is sensitive to $\theta$; the smaller the $\theta$, the greater the stochastic error and the broader the distribution. For example, when $\tau=0.5$ and the population recombination rate per region is $\rho=0$, the coefficient of variation of $\tilde{\xi}_{c_j}$ is more than 3 times greater for $\theta=10$ than for $\theta=100$. This suggests that a region of hitchhikers should be reasonably large to detect an ancient sweep. Although recombination narrows the distribution of $\tilde{\xi}_{c_j}$, it does not greatly alter the mode or the median (fig. 1). We can expect such a relatively minor effect on a sample of hitchhikers after the fixation of a linked selected allele because recombination occurs only among closely related hitchhikers. However, this is not the case under an ongoing selective sweep.

Hitchhikers in an Ongoing Selective Sweep

Fan et al. (2016) reviewed current efforts to establish a complete picture of local adaptations after the out-of-Africa dispersal of modern humans. For such ongoing sweeps, it is important to consider how intra-allelic variability is measured (Slatkin and Rannala 1997, 2000). Fujito, Satta, Hane, et al. (2018) and Fujito, Satta, Hayakawa, et al. (2018) developed one such statistic, $F_c$, as the ratio of the number of derived alleles in a sample of hitchhikers to that in the whole sample. While hitchhikers can restore the intra-allelic variability in small size-classes relatively rapidly (Kimura and Ohta 1973), this is not the case in large size-classes when a sweep is still ongoing without recombination. The recovery becomes substantial only some time after the selected allele has been fixed.
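The normalized size-class count $\tilde{\xi}_{c_j}$ of equation (1), on which the discussion above relies, is simple to compute from an unfolded SFS. The sketch below is illustrative only: the function name `normalized_class_count` and the dyadic bin boundaries in `SIZE_CLASSES` are assumptions made for the example, since this section does not spell out which sizes $i$ belong to $c_1,\dots,c_5$.

```python
def normalized_class_count(sfs, size_class, theta):
    """Equation (1): number of segregating sites in a size-class,
    divided by its equilibrium expectation theta * a_c.

    sfs: dict mapping derived-allele count i to the number of sites xi_i
    size_class: iterable of the counts i that belong to the class
    theta: scaled mutation parameter per region
    """
    sizes = list(size_class)
    a_c = sum(1.0 / i for i in sizes)          # a_c = sum of 1/i over the class
    observed = sum(sfs.get(i, 0) for i in sizes)
    return observed / (theta * a_c)


# Hypothetical binning for n = 68 (NOT taken from the article).
SIZE_CLASSES = {
    "c1": range(1, 2),
    "c2": range(2, 4),
    "c3": range(4, 8),
    "c4": range(8, 16),
    "c5": range(16, 68),
}

if __name__ == "__main__":
    n, theta = 68, 40
    # Equilibrium expectation E[xi_i] = theta / i, so every class should give 1.
    equilibrium_sfs = {i: theta / i for i in range(1, n)}
    for name, cls in SIZE_CLASSES.items():
        value = normalized_class_count(equilibrium_sfs, cls, theta)
        print(name, round(value, 6))
```

Values well below 1 in a given class indicate that the class has not yet returned to equilibrium. For an ongoing sweep the same function applies to the hitchhiker sample, with sizes running up to $n_r-1$ instead of $n-1$.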
In relation to this intra-allelic variability, if the initial frequency of the selected allele is as low as $f_0=0.001$, there is little chance for multiple ancestral lineages of hitchhikers to be traced back beyond the selection phase. This is almost equivalent to the hard sweep assumption (Hermisson and Pennings 2005), and in the absence of recombination, there is little or no intra-allelic variability in large size-classes before the fixation, resulting in a small $F_c$ value. Recombination changes these features dramatically. By allowing some of the ancestral lineages of hitchhikers to stop over (backward in time) or ride on the way (forward in time), recombination instantaneously creates deep genealogical separation among hitchhikers (see Fig. 1 in Coop et al. 2008). Intriguingly, the result of this stopover process depends strongly on the current frequency ($f_r$) of a selected allele. When $f_r=0.3$ and $n=68$, $n_r=nf_r\approx20$, and recombination cannot generate segregating sites in $c_5$ or sites of any size greater than $i=20$; this explains the unusual pattern of the $f_r=0.3$ cumulative distribution in $c_4$ and its absence in $c_5$ (fig. 1). Although under ordinary circumstances recombination affects mostly the lower branches or tips of a coalescent tree (Ferretti et al. 2017), it also affects the SFS in higher branches and larger size-classes when coupled with an ongoing sweep. Recombination thus leaves enhanced intra-allelic variability that persists even after the fixation of a selected allele.

Perspectives

It is not the purpose of this perspective to propose a statistical method based on the hierarchical binning of segregating sites and/or linkage disequilibrium (see Fujito, Satta, Hane, et al. 2018; Fujito, Satta, Hayakawa, et al. 2018 for such a method based on $F_c$ and comparisons with available methods).
We have instead demonstrated that both the nonequilibrium properties of selectively neutral hitchhikers and the effects of recombination on them differ greatly among size-classes of the SFS. We can naturally extend the same approach to such summaries as $S$, $\Pi$, $L$, $H$, and $K$ if they are appropriately defined in terms of size-classed segregating sites. There is another interesting application of the nonequilibrium SFS. To infer human demographic history, Liu and Fu (2015) proposed an SFS-based method called the stairway plot and applied it to the 1000 Genomes Project sequence data. It is an attractive and powerful composite likelihood method for inferring recent changes in population size. To examine whether the stairway plot can recover the frequency change of a selected allele, which imitates a population size change, we applied the method to an SFS among hitchhikers in an ongoing sweep. Despite the necessarily small number of segregating sites, we noted that one characteristic of the hitchhikers' SFS is that $\xi_k=0$ for all $k$ greater than a certain $j$. The principle used in the stairway plot must then infer a scaled population size of $\vartheta_k=0$ per bp at all (coalescent) states $k$ in $2\le k\le n_r-j$. The bootstrap resampling procedure does not change this inference. This would suggest that the state $n_r-j+1$ corresponds to the initial frequency of the selected allele as well as the time of its emergence. We used SFSs in a region of length 70 kb that were generated by discoal with $\rho=0$, $\theta=40$, $a=400$, and various combinations of $f_0$ and $f_r$. Surprisingly, the stairway plot with only $S=\sum\xi_i=13$ or 31 could depict the assumed frequency trajectory of the selected allele in a satisfactory manner (fig. 1). Unless this is fortuitous, it provides us with an excellent inference device for the time period and strength of positive selection inscribed in selectively neutral hitchhikers.
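The bookkeeping behind the statement that $\xi_k=0$ for all $k>j$ forces $\vartheta_k=0$ at states $2\le k\le n_r-j$ can be sketched as follows. This is only an illustration of the index arithmetic described in the text, not the stairway plot software of Liu and Fu (2015); the function name `zero_size_states` and the toy SFS are hypothetical.

```python
def zero_size_states(sfs, n_r):
    """Locate the largest occupied size j in a hitchhiker SFS and list the
    coalescent states whose scaled population size is inferred as zero.

    sfs: dict mapping derived-allele count i (1 <= i <= n_r - 1) to xi_i
    n_r: number of sampled chromosomes carrying the selected allele
    Returns (j, zero_states, emergence_state).
    """
    occupied = [i for i in range(1, n_r) if sfs.get(i, 0) > 0]
    if not occupied:
        raise ValueError("SFS has no segregating sites")
    j = max(occupied)                          # xi_k = 0 for every k > j
    zero_states = list(range(2, n_r - j + 1))  # states 2 <= k <= n_r - j
    emergence_state = n_r - j + 1              # first state with nonzero size
    return j, zero_states, emergence_state


if __name__ == "__main__":
    # Toy example (made-up counts): n_r = 61, largest occupied size is 50.
    toy_sfs = {1: 6, 2: 3, 5: 2, 17: 1, 50: 1}
    j, zeros, emergence = zero_size_states(toy_sfs, n_r=61)
    print(j, zeros[0], zeros[-1], emergence)
```

In the toy example the states 2 through 11 carry no segregating sites of the required sizes, so state 12 is the one the text associates with the emergence of the selected allele.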
In particular, the stairway plot applied to hitchhikers may offer a clue for distinguishing between soft and hard sweeps, and in either case, it appears that the necessary conditions for detecting sweeps are roughly $t\ll1$ and $f_0\ll f_r$. However, its application to nonhitchhikers or the entire population might be more challenging.

"Increasingly, the neutral theory is not so much an end in itself, but is a way to study other evolutionary processes, such as migration." Crow (2008) thus closed his reminiscence. Yet he did not forget to quote JBS Haldane for Kimura: the highest honor a scientist can have is for his theory to be so taken for granted that his name is no longer mentioned. These words must have sounded gracious to Kimura and seem most apropos for the 50th anniversary of his theory and the commemorating 2018 SMBE meeting held in his country.

Acknowledgments

We thank reviewers for constructive criticisms and Dr Quintin Lau for his useful comments on the English. This work was supported in part by the Japan Society for the Promotion of Science (JSPS) grant 16H04821 to Y.S., and the Scientific Research on Innovative Areas, a MEXT Grant-in-Aid Project FY2016-2020 16H06412 to N.T.

References

Bustamante CD, Wakeley J, Sawyer S, Hartl DL. 2001. Directional selection and the site-frequency spectrum. Genetics 159(4):1779–1788.
Charlesworth B, Charlesworth D. 2012. Elements of evolutionary genetics. 2nd ed. Colorado: Roberts and Company Publishers.
Coop G, Bullaughey K, Luca F, Przeworski M. 2008. The timing of selection at the human FOXP2 gene. Mol Biol Evol. 25(7):1257–1259.
Coop G, Ralph P. 2012. Patterns of neutral diversity under general models of selective sweeps. Genetics 192(1):205–224.
Crow JF. 2008. Motoo Kimura and the rise of neutralism. In: Harman O, Dietrich MR, editors. Rebels, mavericks, and heretics in biology. New Haven (CT): Yale University Press. p. 265–280.
Enard D, Messer PW, Petrov DA. 2014. Genome-wide signals of positive selection in human evolution. Genome Res. 24(6):885–895.
Enard W, Przeworski M, Fisher SE, Lai CSL, Wiebe V, Kitano T, Monaco AP, Pääbo S. 2002. Molecular evolution of FOXP2, a gene involved in speech and language. Nature 418(6900):869–872.
Fan S, Hansen ME, Lo Y, Tishkoff SA. 2016. Going global by adapting local: a review of recent human adaptation. Science 354(6308):54–59.
Fay JC, Wu CI. 2000. Hitchhiking under positive Darwinian selection. Genetics 155(3):1405–1413.
Ferretti L, Ledda A, Wiehe T, Achaz G, Ramos-Onsins SE. 2017. Decomposing the site frequency spectrum: the impact of tree topology on neutrality tests. Genetics 207(1):229–240.
Fu YX. 1995. Statistical properties of segregating sites. Theor Popul Biol. 48(2):172–197.
Fujito NT, Satta Y, Hane M, Matsui A, Yashima K, Kitajima K, Sato C, Takahata N, Hayakawa T. 2018. Positive selection on schizophrenia-associated ST8SIA2 gene in post-glacial Asia. PLoS One (submitted).
Fujito NT, Satta Y, Hayakawa T, Takahata N. 2018. A new inference method for ongoing selective sweep. Genes Genet Syst. (submitted).
Griffiths RC, Tavaré S. 1998. The age of a mutation in a general coalescent tree. Commun Stat Stochastic Models 14(1–2):273–295.
Hermisson J, Pennings PS. 2005. Soft sweeps: molecular population genetics of adaptation from standing genetic variation. Genetics 169(4):2335–2352.
Kaplan NL, Hudson RR, Langley CH. 1989. The "hitchhiking effect" revisited. Genetics 123(4):887–899.
Kern AD, Schrider DR. 2016. discoal: flexible coalescent simulations with selection. Bioinformatics 32(24):3839–3841.
Kimura M. 1968. Evolutionary rate at the molecular level. Nature 217(5129):624–626.
Kimura M. 1969. The number of heterozygous nucleotide sites maintained in a finite population due to steady flux of mutations. Genetics 61:893–903.
Kimura M. 1971. Theoretical foundations of population genetics at the molecular level. Theor Popul Biol. 2(2):174–208.
Kimura M, Ohta T. 1973. The age of a neutral mutant persisting in a finite population. Genetics 75(2):199–212.
Liu X, Fu YX. 2015. Exploring population size changes using SNP frequency spectra. Nat Genet. 47(5):555–559.
Maynard Smith J, Haigh J. 1974. The hitchhiking effect of a favorable gene. Genet Res. 23(1):23–35.
Peyrégne S, Boyle MJ, Dannemann M, Prüfer K. 2017. Detecting ancient positive selection in humans using extended lineage sorting. Genome Res. 27(9):1563–1572.
Przeworski M. 2002. The signature of positive selection at randomly chosen loci. Genetics 160(3):1179–1189.
Przeworski M. 2003. Estimating the time since the fixation of a beneficial allele. Genetics 164(4):1667–1676.
Racimo F. 2016. Testing for ancient selection using cross-population allele frequency differentiation. Genetics 202(2):733–750.
Racimo F, Kuhlwilm M, Slatkin M. 2014. A test for ancient selective sweeps and an application to candidate sites in modern humans. Mol Biol Evol. 31(12):3344–3358.
Slatkin M, Rannala B. 1997. Estimating the age of alleles by use of intra-allelic variability. Am J Hum Genet. 60:447–458.
Slatkin M, Rannala B. 2000. Estimating allele age. Annu Rev Genomics Hum Genet. 1:225–249.
Tajima F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123(3):585–595.
Takahata N. 1993. Allelic genealogy and human evolution. Mol Biol Evol. 10(1):2–22.
Wakeley J. 2009. Coalescent theory: an introduction. Greenwood Village (CO): Roberts & Company Publishers.
Watterson GA. 1975. On the number of segregating sites in genetical models without recombination. Theor Popul Biol. 7(2):256–276.
Zeng K, Fu YX, Shi S, Wu CI. 2006. Statistical tests for detecting positive selection by utilizing high-frequency variants. Genetics 174(3):1431–1439.

© The Author(s) 2018. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)

Journal

Molecular Biology and Evolution, Oxford University Press

Published: May 2, 2018
