DeepDyve requires Javascript to function. Please enable Javascript on your browser to continue.

The Influence of Higher-Order Epistasis on Biological Fitness Landscape Topography
The Influence of Higher-Order Epistasis on Biological Fitness Landscape Topography
Weinreich, Daniel; Lan, Yinghong; Jaffe, Jacob; Heckendorn, Robert
2018-02-07 00:00:00
J Stat Phys (2018) 172:208–225 https://doi.org/10.1007/s10955-018-1975-3 The Inﬂuence of Higher-Order Epistasis on Biological Fitness Landscape Topography 1,2 1 Daniel M. Weinreich · Yinghong Lan · 1 3 Jacob Jaffe · Robert B. Heckendorn Received: 17 July 2017 / Accepted: 24 January 2018 / Published online: 7 February 2018 © The Author(s) 2018. This article is an open access publication Abstract The effect of a mutation on the organism often depends on what other mutations are already present in its genome. Geneticists refer to such mutational interactions as epista- sis. Pairwise epistatic effects have been recognized for over a century, and their evolutionary implications have received theoretical attention for nearly as long. However, pairwise epistatic interactions themselves can vary with genomic background. This is called higher-order epis- tasis, and its consequences for evolution are much less well understood. Here, we assess the inﬂuence that higher-order epistasis has on the topography of 16 published, biological ﬁtness landscapes. We ﬁnd that on average, their effects on ﬁtness landscape declines with order, and suggest that notable exceptions to this trend may deserve experimental scrutiny. We conclude by highlighting opportunities for further theoretical and experimental work dissecting the inﬂuence that epistasis of all orders has on ﬁtness landscape topography and on the efﬁciency of evolution by natural selection. Keywords Higher-order epistasis · Fitness landscapes topography · Natural selection · NK landscape · Sequence space combinatorics Electronic supplementary material The online version of this article (https://doi.org/10.1007/s10955- 018-1975-3) contains supplementary material, which is available to authorized users. Daniel M. Weinreich Daniel_Weinreich@Brown.edu http://brown.edu/research/labs/weinreich/home Department of Ecology and Evolutionary Biology, Brown University, Providence, RI 02912, USA Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA Computer Science Department, University of Idaho, 875 Perimeter Drive, MS 1010, Moscow, ID 83844, USA 123 The Inﬂuence of Higher-Order Epistasis on Biological... 209 1 Introduction One of the more evocative pictures of biological evolution is that of a population climbing the ﬁtness landscape [37,44]. This image was originally proposed by Wright [73] to build intuition into his [72]and Fisher’s [19] technical treatment of Darwin’s theory of natural selection in ﬁnite populations under Mendelian genetics [51]. The topography of the ﬁtness landscape represents the strength and direction of natural selection as local gradients that inﬂuence the direction and speed with which populations evolve. While several distinct framings of the ﬁtness landscape have been suggested [51], here we employ the projection of genotypic ﬁtness over Maynard Smith’s sequence space [36]. Sequence space is a discrete, high-dimensional space in which genotypes differing by exactly one point mutation are spatially adjacent. Thus, proximity on the ﬁtness landscape cor- responds to mutational accessibility, and selection will try to drive populations along the locally steepest mutational trajectory. (See [68] for several processes not readily captured by this construction.) The most obviously interesting topographic feature of the ﬁtness landscape is the number of maxima, a point already recognized by Wright [73]. Two (or more) maxima can constrain natural selection’s ability to discover highest-ﬁtness solutions, since populations may be required to transit lower-ﬁtness valleys on the landscape en route. (Though see [25,65]for the population genetics of that process, sometimes called stochastic tunneling [15,25].) 1.1 Epistasis and Fitness Landscape Topography Epistasis is the geneticist’s term for interactions among mutational effects on the organism [46]. For example, genetically disabling two genes whose products act in the same linear biochemical pathway can have a much more modest effect than the sum of the effects of disabling either gene in isolation. Alternatively, disabling two functionally redundant genes can have a much more substantial effect than expected. (Indeed, such observations have taught us quite a bit about the organization of biochemical pathways, e.g., [2].) Epistatic interactions between mutations can occur for any organismal trait, including ﬁtness. Importantly, epistasis for ﬁtness has an intimate connection to the topography of the ﬁtness landscape, a fact also already appreciated by Wright [73]. For example, multi- ple peaks require the presence of mutations that are only conditionally beneﬁcial (called sign epistasis [49,68]). More generally, an isomorphism exists between ﬁtness landscapes deﬁned by mutations at some L positions in the genome and the suite of epistatic interac- tions possible among them. This follows because, while any particular mutation can appear L−1 on 2 different genetic backgrounds (assuming two alternative genetic states, or alle- les, at each position), each such mutation-by-background pair corresponds to a distinct adjacency in sequence space. Consequently, arbitrary differences in the ﬁtness effect of a mutation across genetic backgrounds can generically be represented on the ﬁtness landscape [68]. 1.2 Higher Order Epistasis and Fitness Landscape Topography Widespread epistasis between pairs of mutations has been recognized in nature for over 100 years [46,67], and the corresponding evolutionary theory is fairly advanced (e.g., [5,71]). However, pairwise interactions can themselves vary with genetic background, called higher- order epistasis [13,67]. And while it is now becoming clear that higher-order interactions are commonplace in nature [32,42,61,67], their inﬂuence on natural selection is less well 123 210 D. M. Weinreich et al. understood (though see [55]). Here, we present a simple framework for assessing the inﬂu- ence on ﬁtness landscape topography of epistatic terms of arbitrary order. We speculate that epistatic inﬂuence on the topography of naturally occurring ﬁtness landscapes should decline with epistatic order. We tested this prediction using 16 published biological ﬁtness landscapes. 2 Methods 2.1 The Order of Epistatic Interactions L L Any set of L biallelic loci deﬁnes 2 genotypes, each with 2 potentially independent ﬁtness values. Simultaneously, there are distinct subsets of k mutations that in principle can also independently contribute to a genotype’s ﬁtness. In total, there are thus k=0 =2 subsets of mutations (i.e., the power set of L mutations). This counting reﬂects the isomorphism between any ﬁtness landscape and its corresponding suite of epistatic terms [67]. We designate interactions among any subset of k mutations as kth-order epistasis. Note that here ﬁrst-order “epistasis” is degenerate in the sense that it represents the ﬁtness effects of each of the L mutations in isolation. And our zeroth-order “epistatic” term is the benchmark, relative to which the effect of each subset of mutations is computed. 2.2 The Fourier–Walsh Transformation Following earlier work [22,41,59,64,67] we employ the Fourier–Walsh transformation (Fig. 1a) to convert between ﬁtness landscapes and their corresponding epistatic terms. This is a linear transformation written − → 1 − → E = W . (1) − → Here W is the vector of all 2 ﬁtness values arranged in the canonical order deﬁned by ascending L-bit binary numbers encoding the corresponding genotype with respect to the presence or absence of each mutation (e.g., [33]). (W is the traditional population genetics L L symbol for ﬁtness.) is the Hadamard matrix, the unique, symmetric 2 × 2 matrix whose entries are either +1 or −1 and whose rows (and columns) are mutually orthogonal. ( can be written for arbitrary L, as for example with the hadamard() function in the software − → package Matlab, Mathworks, Natick, MA.) Finally, E is the resulting vector of 2 epistatic terms arranged in the canonical order deﬁned by ascending L-bit binary numbers whose 1’s indicate the corresponding subset of interacting loci. Figure 1a illustrates this transformation − → using the data in [45]. For example, the fourth component of E (–0.1429) signals a negative epistatic interaction between the two most 3’ mutations in that dataset. (See Fig. 1 in [54] − → for a graphical representation of the elements of E ,and [50] for the relationship between Eq. (1) and other formalisms for computing epistatic terms.) T 2 L The orthogonality and symmetry of means that · = = 2 I,where I is the identity matrix. This means that, just as Eq. (1) converts any landscape into its epistatic 123 The Inﬂuence of Higher-Order Epistasis on Biological... 211 a Compute epistatic coefficients 1 Binary String Epistatic × Ψ × = W E W Encoding Order +1 +1 +1 +1 +1 +1 +1...+1 1.0137 2.2880 0 +1 -1 +1 -1 +1 -1 +1... -1 2.5027 0.0857 1 +1 +1 -1 -1 +1 +1 -1... -1 1.2963 -0.1007 1 +1 -1 -1 +1 +1 -1 -1...+1 2.9960 -0.1429 2 × × = +1 +1 +1 +1 -1 -1 -1... -1 2.8863 0.0201 1 +1 -1 +1 -1 -1 +1 +1... -1 -2 -0.4814 2 +1 +1 -1 -1 -1 -1 +1...+1 1.3387 -0.2020 2 +1 -1 -1 +1 -1 +1 +1... -1 2.7427 -0.1051 6 b Compute epistatic contributions Compute correlation to landscape topography coefficient τ 1. Record epistatic order of each element in 0 0 E after sorting from largest to smallest in 1 1 2 1 absolute value. τ (Obs,Exp) 4 1 2. For each 2 ≤ m ≤ 2 = 0.1921 3 1 (m) (m) a. Construct W = Ψ E using the m best W 4 1 largest elements in E . 1 1 P = 0.03639 b. Record residual variance between 3 4 (m) W and W. best 2 4 Observed Expected 10 36 35 0.5 4 Sorted τ values in 3 b 1 3 4 5 3 n = 10 permutations Experimental variance 3 0.25 0.1 2 b 0 0.01 -0.25 0.001 Log[Trimethoprim IC75 of E. coli DHFR allele] -0.5 0.0001 02468 10 0 8 16 24 32 40 48 56 64 Sorted τ Index (x10 ) Number of Parameters Fig. 1 Analytic pipeline, illustrated with data from Palmer et al. [45]. a For each dataset, published ﬁtness − → data (or a suitable proxy, written W ) were ﬁrst converted to the corresponding epistatic terms (E using the Fourier–Walsh transformation (Eq. 1). b Explanatory power of a succession of models using only the m − −− → (m) largest epistatic terms in absolute value (W ) were compared with the published data. For given value best of m, these models provably have the greatest explanatory power (smallest residual variance) of any model with exactly m parameters (Appendix). The symbols plotted represent the epistatic order (Sect. 1.2) of each successive parameter added to the model. c Rank correlation coefﬁcient (τ ) between the empirical sequence of epistatic orders and those of our naïve expectation (Eq. 2) were computed. In cases where experimental variance was reported, these sequences were truncated as soon as the remaining model variance was less than the experimental variance. For the data shown, that truncation occurred after the 55th epistatic term. Finally, statistical signiﬁcance was assessed by a permutation test that asked whether the observed sequence of epistatic orders was signiﬁcantly different than random. For the data shown (red arrow), the observed value of τ (0.1921) was smaller than only the 3639 largest of 10 values obtained by the permutation test, yielding P = 0.03639 Residual Variance (%) ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... 212 D. M. Weinreich et al. terms, so too can any vector of epistatic terms E be converted into its corresponding ﬁtness − → landscape as W = E . We take advantage of this fact next. 2.3 Subsetting Approximations of a Fitness Landscape − − → −−→ − → (m) (m) Given ﬁtness function W , we now introduce subsetting approximations W = E . − − → − → − → (m) L 1 Here, the E are constructed so that 0 ≤ m ≤ 2 of the components are from E = W W L L 2 (Eq. 1) and the remaining 2 − m components are set to zero. There are thus 2 subsetting − → approximations for any ﬁtness function W (corresponding to the power set of the 2 epistatic − → terms in E ). As a consequence of the orthogonality of the Fourier–Walsh transformation, the −−→ − → (m) sum of squares distance between ﬁtness function W and subsetting approximation W = − − → − − → (m) (m) E is minimized for given m if and only if E uses the m largest components in absolute W W − → value of E (see Appendix). We denote these 0 ≤ m ≤ 2 best subsetting approximations −−→ (m) W . best (Subsetting approximations deﬁned by interaction order rather than absolute magnitude of epistatic terms were recently employed elsewhere [55].) 2.4 Quantifying the Inﬂuence of Epistatic Terms on Empirical Fitness Landscape Topography To examine the inﬂuence of epistasis on ﬁtness landscape topography as a function of epistatic − → − → order, we ﬁrst used Eq. (1) to compute E for each W gleaned from the literature (Sect. −−→ (m) 2.7). For each 1 ≤ m ≤ 2 , we then iteratively constructed each W . Finally, for each best −−→ − → (m) m we recorded the residual variance between W ) and W (minimized by this subsetting best approximation; Sect. 2.3), together with the epistatic order of the mth-largest component of − → E . Figure 1b illustrates this process. 2.5 Statistics Our hypothesis is that the inﬂuence of an epistatic term on the ﬁtness landscape should decline with epistatic order. Put another way, we expected that after sorting the elements of − → E (Eq. 1) by their absolute magnitudes, the associated epistatic orders should be represented by a vector of 2 integers that reads: 0, 1, 1,..., 2, 2, 2,...,...L −−1, L −−1,..., L . (2) Speciﬁcally, this vector consists of one zero, followed by L ones, twos and in general k’s for all 0 ≤ k ≤ 2 . We tested this hypothesis for each dataset by ﬁrst computing Kendall’s τ correlation coefﬁcient [28] between this expectation and the epistatic orders observed among the elements − → in E sorted by absolute magnitude. τ is one (negative one) when the observed epistatic W b orders are perfectly correlated (anticorrelated) with expectation, and zero when they are uncorrelated. Note that Kendall’s τ statistic is appropriate because it accommodates ties. 123 The Inﬂuence of Higher-Order Epistasis on Biological... 213 For studies that also reported experimental variance, we computed the correlation coefﬁcient − → after discarding the epistatic orders of all jelements in E that reduced residual variance by less than experimental variance (see Fig. 1b and Table 1)aswellasthe last j epistatic order values in our expectation (given by Eq. 2). For each dataset, we then used a permutation test to test the null hypothesis that the corre- sponding correlation coefﬁcient is zero. Speciﬁcally, each dataset is characterized by some number of epistatic terms: 2 in cases where no experimental variance estimate is provided, or 2 – j in cases where we were able to identify non-signiﬁcant epistatic components (see previous paragraph and Table 1). For each of n = 10 replicates, we computed the rank cor- L L relation coefﬁcient between two random permutations of this number (2 or 2 – j ) of the epistatic order values drawn from Eq. (2) for given L. We then sorted correlation coefﬁcients, and the uncorrected P value reported for each dataset (Table 1) was taken as the fraction of permutations in which a correlation coefﬁcient greater than or equal to the empirical value was observed. Thus, ours is a one-tailed test of the hypothesis that no positive correlation is present. This process is illustrated in Fig. 1c. We used the Bonferroni–Holm method [24] to correct for multiple tests. In addition, under the null hypothesis that epistatic orders are uncorrelated with the naïve expectation given by Eq. (2), the distribution of P values observed across datasets should be uniformly distributed. We tested this hypothesis with a G-test after binning counts of empirically observed Pvalues. We assessed statistical signiﬁcance relative to the χ distribution [56]. 2.6 Empirical Datasets To compute all 2 epistatic terms in a ﬁtness landscape deﬁned over L biallelic loci requires data on the ﬁtness values (or suitable proxy) for each of the corresponding 2 genotypes. We previously designated such datasets combinatorially complete [67], and the datasets analyzed here are shown in Table 1. Several datasets [4,34,43,45] had a few loci with cardinality greater than two. In these cases, we examined one “slice” through the landscape deﬁned by randomly choosing just two alleles at those loci. Several studies examined multiple phenotypes for a single set of mutations, and follow- up studies sometimes presented additional phenotypes for a previously described set of mutations. Those cases are enumerated in Table 2; for each set of mutations we randomly sampled just one phenotype. Table 2 also lists all combinatorially complete datasets we know that are deﬁned over loci with cardinality greater than two. These were excluded here because the Fourier–Walsh framework doesn’t trivially generalize to higher cardinalities. Following [67], datasets reporting growth rates [4,10,14,16,20,21,70] and drug-resistance phenotypes [8,33,38,39,45,66] were log-transformed before analysis. Following [45], neg- ative two was used in place of log-transformed values when growth rate or drug resistance phenotypes of zero were observed. (In all cases, this is roughly one log order smaller than the smallest non-zero log-transformed value.) In cases where only mean and experimental variances (but not individual replicate observations) were provided, log transformations were 2 2 2 2 approximated by Taylor expansions: ln(x ) ≈ ln(x¯ ) − s /2x¯ and s ≈ s /x¯ . In cases ( ) ln(x ) where only means (but not variances) were provided, log transformations were approximated as ln(x ) ≈ ln(x¯ ). Following [45], for studies in which experimental variance estimates were provided, we recorded this quantity as a fraction of the total model variance. In one case [8], standard error was reported as standard error over “at least” two replicates; we therefore assumed n =2for each observation in that dataset. In one case [29], 95% experimental conﬁdence intervals were 123 214 D. M. Weinreich et al. Table 1 Analyses of published combinatorially complete empirical and simulated (NK) ﬁtness landscapes, sorted by P value associated with Kendall’s τ Phenotype [citation] Number of loci Number of Number of epistatic Kendall’s τ P value (L) maxima terms signiﬁcantly correlation different from zero coefﬁcient Log[S. cerevisiae HSP90 6 4 41 0.6202 < 0.00001*** mutant growth rates] [4] Log[diploid S. Cerevisiae 6 4 3 0.6667 <0.00001*** mutant growth rate] [21] Log[E. coli IMDH mutant 6 1 N.D. 0.7566 < 0.00001*** relative growth rates] [34] Avian lysozyme 314 0.5 <0.00001*** thermostability [35] N =5, K = 0 5 1 N.D. 0.3333 <0.00001*** Log[relative ﬁtness among 4 1 N.D. 0.7527 0.00001*** Methylobacterium extorquens mutants] [10] Log[HIV replicative capacity 5 3 25 0.5703 0.00002*** on CCR5+ cells] [14] Log[cefotaxime MIC of E. 5 1 29 0.5490 0.00011** coli TEM alleles] [66] Log[relative viability among 5 3 N.D. 0.4896 0.00027** fruit ﬂy mutants] [70] Log[cefalexin MIC of 4 1 N.D. 0.5376 0.00448 Bacillus cereus metallo-β-lactamase alleles] [38] N =5, K = 1 5 2 N.D. 0.5714 0.00474 Log[relative ﬁtness among 5 2 30 0.3486 0.01023 LTEE E. coli mutants in DM25 + EGTA] [20] N =5, K = 2 5 2 N.D. 0.4428 0.0156 The Inﬂuence of Higher-Order Epistasis on Biological... 215 Table 1 continued Phenotype [citation] Number of loci Number of Number of epistatic Kendall’s τ P value (L) maxima terms signiﬁcantly correlation different from zero coefﬁcient Log[relative colony growth 5 4 10 0.4387 0. 02002 rate among Aspergillus niger mutants] [16] Percent production of 6 10 N.D. 0.1974 0.02639 5-epi-aristolochene by sesquiterpene synthase mutants [43] Log[pyrimethamine IC50 of 4 2 14 0.4337 0.03133 P. falciparum DHFR alleles assayed in E. coli][33] Log[IC75 of E. coli DHFR 6 2 55 0.1921 0.03639 alleles against trimethoprim] [45] N =5, K = 4 5 5 N.D. 0.1735 0.1442 Mammalian glucocorticoid 4 4 N.D. 0.1075 0.30182 receptor cortisol sensitivity [7] Log[MIC of E. coli TEM 4 3 N.D. 0.0430 0.41356 alleles against ampicillin] [39] N =5, K = 5 5 7 N.D. −0.1632 0.86161 a L Maximum possible value is 2 b 5 Uncorrected value from permutation test (n =10 replicates). Bonferroni-corrected P values: *** ≤ 0.001; 0.001 < ** ≤ 0.01; 0.01 < * ≤ 0.05 No data because no experimental variance estimates provided with this dataset No data because simulated ﬁtness landscapes have no experimental variance 216 D. M. Weinreich et al. Table 2 Published combinatorially complete ﬁtness landscapes not examined here Phenotype [citation] Number of loci Number of genotypes Cycloguanil IC50 of S. cerevisiciae carrying P. 32 =8 falciparum DHFR alleles [11] Pyrimethamine IC50 of S. cerevisiciae carrying 32 =8 P. falciparum DHFR alleles [8] MIC against pyrimethamine of S. cerevisiae 32 =8 carrying P. falciparum DHFR alleles [8] Pyrimethamine IC50 of P. vivax DHFR alleles 42 =16 assayed in S. cerevisiae [26] 4 c MIC of TEM β-lactamase mutants against 15 42 =16 β-lactams [39] E. coli operator binding afﬁnities for 400 44 = 256 repressor sequences [48] Relative ﬁtness among LTEE E. coli mutants in 52 =32 DM25 [29] Relative ﬁtness among LTEE E. coli mutants in 52 =32 DM25 + guanazole [20] b 5 HIV replicative capacity on CXCR5+ cells [14]5 2 =32 2 3 d Transcription factor/response element speciﬁcity 54 × 2 = 128 in an ancient steroid hormone receptor [1] MIC of TEM-β-lactamase mutants against 62 =32 Piperacillin supplemented with clavulanic acid [63] b 6 Diploid S. cerevisiae mutant growth rate [21]6 2 =64 Percent production of minor products by 62 =64 sesquiterpene synthase mutants [43] Percent production of 4-EE by sesquiterpene 62 =64 synthase mutants [43] Percent production of PSD by sesquiterpene 62 =64 synthase mutants [43] 10 e 104 mouse DNA-binding proteins’ afﬁnity for 10 4 = 1,048,576 10 bp binding motifs [3] GFP afﬁnity for 10 nucleotide base pair binding 10 4 = 1,048,576 motifs [52] 10 f Afﬁnity of 1032 DNA-binding proteins spanning 10 4 = 1,048,576 eukaryotic diversity against 10 nucleotide base pair binding motifs [69] Written as the product of cardinalities across loci Another phenotype from this system is included in Table 1 In total, 16 × 15 distinct β-lactam compounds = 240 observations are reported in this study This study examined all combinations of 4 nucleotides at two key positions in the DNA response element together with all combinations of two amino acids at three key positions in the transcription factor In total 1,048,576 × 104 DNA-binding proteins = 109,051,904 observations are reported in this study In total 1,048,576 × 1,032 DNA-binding proteins = 1,082,130,432 observations are reported in this study reported, so variance estimates were computed under the assumption of normally distributed 2 2 noise as s = (n · CI 95/1.96) . 123 The Inﬂuence of Higher-Order Epistasis on Biological... 217 2.7 Simulated Fitness Landscapes We used NK ﬁtness landscapes [27] to test our hypothesis in a framework with explicitly tunable mutational interactions. Genomes in the NK model carry N loci. The ﬁtness contri- bution of each locus depends on its allelic state and that at K others. Thus 0 ≤ K ≤ N –1 represents a parameter that tunes the level of epistatic interaction in the landscape. (See [41] and references therein for a number of elegant statistical properties of NK ﬁtness landscapes.) We set N= 5 and generated one NK landscape for each 0 ≤ K ≤ N – 1, where interacting loci were assigned at random in the genome. Simulated data were then analyzed as described in Fig. 1. 2.8 Data and Software Archiving Input data ﬁles, together with purpose-built MatLab code to perform all analyses described are archived at https://github.com/weinreichlab/JStatPhys2018. Kendall’s τ correlation coefﬁ- cient was computed using MatLab code developed elsewhere [9]. NK ﬁtness landscapes were generated using code downloaded from https://github.com/qzcwx/NK-generator. 3 Results Epistasis can have profound consequences at many levels of biological organization [47, 53,60,71]. Here we tested the hypothesis that the inﬂuence of epistasis on empirical ﬁtness landscape topography should decline as a function of epistatic order. This study was originally stimulated by Fig. 2 in Palmer et al. [45], which examined six mutations in the dihydrofolate reductase (DHFR) gene of E. coli that contribute to increased resistance to an antimicrobial called trimethoprim. In that analysis, particular second- and third-order interactions were the third- and second-most inﬂuential epistatic terms for ﬁtness landscape topography respectively. Indeed, just two of the ﬁrst ten most inﬂuential epistatic terms were ﬁrst-order, and in aggregate ﬁrst-order terms explained just ∼ 28% of the variance in ﬁtness across the landscape. At ﬁrst blush, these results seem to challenge the hypothesis outlined in the previous paragraph, and we therefore sought to explore the pattern more broadly using published data from other systems. Fig. 2 Distribution of P value -5 -4 -3 -2 -1 0 uncorrected P values among 16 10 10 10 10 10 10 empirical datasets. Under any null model, P values are expected to be uniformly -1 distributed (black bars; note both axes are log-transformed). -2 Instead observed P values (grey bars) are sharply skewed toward small values (G = 143.77, -3 P 0.01) d.f.=5 Observed Expected -4 -5 Proportion of datasets with given P value 218 D. M. Weinreich et al. Figure 1 illustrates the application of our analytic pipeline (see Sect. 2) to these same data. Our Fig. 1b closely recapitulates Fig. 2ainPalmeretal. [45]. While the precise sequence of epistatic terms differs slightly (likely because the previous study employed a subtly different framework for computing epistatic terms), higher-order epistatic interac- tions are again responsible for some the largest reductions in residual variance. Indeed, as previously observed, just two of the ﬁrst ten terms are ﬁrst-order, and in aggregate and ﬁrst- order terms again explain just ∼ 28% of the variance in the data (Table 3a, compare the ﬁrst two columns with Fig. 2bin[45]). Importantly however, Fig. 1c illustrates that we ﬁnd a signiﬁcant, positive correlation between expectation (Eq. 2) and the observed inﬂuence of epistatic terms on landscape topography as a function of their order (τ = 0.1921, P = 0.03639). We next applied our pipeline to 15 other published, combinatorially complete datasets. Results are summarized in Table 1 and shown graphically in Fig. S1. Out of all 16 datasets examined, 14 exhibit a signiﬁcantly positive correlation between observation and the expec- tation, and eight of these remain signiﬁcant after Bonferroni correction for multiple tests. Moreover, across datasets Table 1 exhibits a bias toward small P values. Under the null hypothesis (no signiﬁcant correlation with expectation), we would expect a uniform distri- bution of P values. Instead, the observed distribution is sharply and signiﬁcantly skewed toward small values (Fig. 2, G = 143.77, P 0.01). d.f.=5 We also applied our pipeline to NK ﬁtness landscapes generated for N =5and0 ≤ K ≤ N –1.Weset N = 5 because the average size of the empirical datasets is 4.875 loci. Those results are also included in Table 1 (though omitted from Fig. 2). 4 Discussion Using a novel analytic pipeline (Fig. 1), we have examined 16 published, combinatorially complete biological datasets. This analysis broadly conﬁrms our intuition that the inﬂuence of epistatic terms on empirical ﬁtness landscape topography should decline with order, i.e., with the number of interacting mutations. Consistent with this intuition, observed ﬁt to expectation in our simulated (NK) ﬁtness landscapes deteriorates as the amount of epistasis (K ) goes up. In the limit of K = N – 1, ﬁtness values (and hence, our epistatic terms) are i.i.d., and consequently the correlation is ∼ 0. While considerable heterogeneity in effect exists among our empirical datasets (Table 1), eight of the 16 exhibit a Bonferroni-corrected, signiﬁcantly positive correlation with expecta- tion (Eq. 2). Moreover, across all 16 empirical datasets, we ﬁnd a sharp bias toward signiﬁcant P values (Fig. 2). Nor is there any correlation between the size of the dataset and uncorrected P value (not shown), suggesting that low statistical power is unlikely to contribute to the overall picture. The relative magnitudes of epistatic terms depend on the underlying ﬁtness scale employed [30,67]. Although we log-transformed growth rate and drug resistance data (see Sect. 2.6), we have otherwise overlooked this fact. Recently, approaches for systematically rescaling data to minimize higher-order epistatic effects have been introduced [54](seealso[41,62]). Applications of such methods would certainly have quantitative consequences for results presented here. However, because these approaches (on average) reduce higher-order epistatic terms, we believe this omission renders our conclusions conservative. We also acknowledge that we failed to honor experimental uncertainty in the magnitudes of epistatic effects observed, which would almost certainly weaken the signal reported in 123 The Inﬂuence of Higher-Order Epistasis on Biological... 219 Table 1. While we regard a rigorous treatment of experimental noise to be outside the scope of the present study, we note that the results presented in Fig. 2 are robust to its inﬂuence. Nevertheless, this is a serious concern for future consideration: because epistasis represents the difference between mutational ﬁtness effects on different genetic background, experi- mental variance in ﬁtness assays must be summed when computing variance in epistatic terms. For example, variance in epistatic terms computed with Eq. (1) will be roughly 2 as large as variance in the individual, underlying ﬁtness measurements. Recently, an alter- native, ranks-based approach to assessing epistatic interactions between mutations has been proposed [13], which appears to be less sensitive to this effect. 4.1 The Combinatorics of Higher-Order Epistasis This work was originally stimulated by a previous study [45] that examined six mutations in the DHFR gene responsible for increased trimethoprim resistance in E. coli.Atﬁrstblush, results summarized in Fig. 2 of that study called into question the hypothesis that higher- Table 3 Average epistatic inﬂuence on ﬁtness landscape topography as a function of epistatic order in select datasets Epistatic order Aggregate reduction in Number of epistatic terms Mean reduction in residual residual variance signiﬁcantly different from variance per epistatic term zero (a) Log[IC75 of E. coli DHFR alleles against trimethoprim] [45] First 0.279 5 0.056 Second 0.266 12 0.022 Third 0.233 18 0.013 Fourth 0.144 11 0.013 Fifth 0.0685 6 0.011 Sixth 0.0065 1 0.0065 (b) Mammalian glucocorticoid receptor cortisol sensitivity [7] First 0.171 4 0.043 Second .405 6 0.067 Third 0.420 4 0.105 Fourth 0.004 1 0.004 (c) Log[MIC of E. coli TEM alleles against ampicillin] [39] First 0.353 4 0.088 Second 0.278 6 0.043 Third 0.279 4 0.070 Fourth 0.091 1 0.091 (d) N =5, K =4 First 0.027 5 0.005 Second 0.315 10 0.315 Third 0.402 10 0.402 Fourth 0.241 5 0.412 Fifth 0.015 1 0.015 Largest value for each dataset shown in bold 123 220 D. M. Weinreich et al. order epistasis should only modestly inﬂuence naturally occurring ﬁtness landscapes. And the salient features of that ﬁgure were recapitulated by our treatment (Fig. 1b, Table 3a). However, our statistical analysis of those data reveals a strong positive correlation between epistatic inﬂuence on ﬁtness topography as a function of epistatic order, consistent with our hypothesis (Fig. 1c). Thus in this system, the substantial inﬂuence of a few high-order epistatic terms is nevertheless consistent with the idea that high-order epistatic terms should in general only modestly contribute to ﬁtness topography. The resolution to this puzzle resides in the combinatoric number of epistatic terms. As noted above, given L biallelic loci there are epistatic coefﬁcients of order k,and this quantity grows almost exponentially for k L. Indeed, after normalizing the summed inﬂuence of all epistatic terms of order k by the number of such terms, we observe that the per- term effect declines almost monotonically with order in this dataset (Table 3a; see also [67]). More generally, in all but three of the datasets examined, the normalized explanatory power is largest for ﬁrst-order epistatic terms. Intriguingly, those three exceptions (see Table 3b-d: mammalian glucocorticoid receptor cortisol sensitivity [7], log[MIC of E. coli TEM allele sensitivity to ampicillin] [39]and the N =5, K = 4 simulated ﬁtness landscape) correspond to the three datasets with the largest P values in Table 1. The consideration of the combinatorics of the problem is closely related to the Fourier spectrum of a ﬁtness landscape [41,57], namely the sum of squared epistatic coefﬁcients as a function of interaction order. (This connection derives formally from the Appendix, which implies that the squared magnitude of each epistatic coefﬁcient is monotonic in its inﬂuence on landscape topography.) The Fourier spectrum is proportional to the binomial coefﬁcient when each genotype’s ﬁtness is identically and independently distributed. This follows from the fact that on such landscapes all epistatic coefﬁcients are also i.i.d., together with the combinatorics outlined in the previous paragraph. But as already anticipated by results in Table 3, the Fourier spectrum for the DHFR datasets is sharply shifted toward lower-order terms (not shown), as has previously been reported for both sesquiterpene synthase and several others biological datasets [41]. Nevertheless, declining average epistatic effects notwithstanding we ﬁnd many examples of speciﬁc epistatic terms with anomalously large explanatory effects in many of the datasets examined here (Fig. S1). We suggest that these may reﬂect important mechanistic interactions among those particular mutations in the underlying biology of the system, thus representing potentially fruitful entry points for the molecular biologist [17]. 4.2 Epistasis and the Efﬁciency of Natural Selection Our observation that the inﬂuence of epistatic terms on naturally occurring ﬁtness landscapes declines with epistatic order raises the question of how epistatic terms inﬂuence the efﬁciency of natural selection. We lack a complete theoretical understanding of this connection. One well-developed result concerns the inﬂuence of epistasis on the selective accessibility of mutational trajectories to high ﬁtness genotypes. First, sign epistasis means that the sign of the ﬁtness effect of a mutation varies with genetic background [68], and it renders selectively inaccessible at least some mutational trajectories to high ﬁtness (e.g., [66]). But connections between sign epistasis and epistatic order are only now being developed [13]. Second, a subsetting approach similar to ours (Sect. 2.3) was recently used to examine the inﬂuence of epistatic interactions selectively accessible mutational trajectories to high ﬁtness genotypes [55] in six of the datasets described here. Those authors found that higher-order terms indeed substantially alter the identity of selectively favored mutational trajectories to high-ﬁtness 123 The Inﬂuence of Higher-Order Epistasis on Biological... 221 genotypes, as well as their probabilities of realization. Further and consistent with ﬁndings here, that study also noted that the absolute magnitude of epistatic terms had an even larger effect on realized mutational trajectories than did their interaction order. Moreover, pairwise epistasis has long been understood to inﬂuence not just the selective accessibility of high ﬁtness genotypes but also the pace at which natural selection both increases the frequency of beneﬁcial mutations (e.g., [18]) and at which it purges deleterious mutations (e.g., [31]). This work is closely related to the role that genetic recombination can play in “unlocking” epistatically interacting mutations (e.g., [5,40]). However, to our knowledge the relationship between these effects and higher-order epistasis remains entirely unexplored. In addition, we have only quantitatively examined the sequence of epistatic orders sorted by explanatory power (Fig. 1c). Thus, a great deal of information present in these data (e.g., the slopes in Figs. 1b and S1) remains to be examined. And of course, the number and size of available combinatorially complete datasets continues to grow, motivating further work in this regard. It seems reasonable to suppose that the development and testing of more nuanced theoretical predictions may be possible using data of the sort examined here. Finally, we note that the Fourier–Walsh framework employed here depends on the avail- ability of combinatorially complete datasets. But the experimental demands of this approach grow exponentially with the number of mutations examined. This fact sharply limits the scalability of analytic pipelines like ours. Recently, theoretical progress has been made in the analysis of less-than-complete datasets [6,12,13], and older work has also explored this idea [23,58]. Theory that allows inferences using sparse datasets is likely to be a key advance in our ability explore broad, evolutionarily fascinating questions such as those considered here. Acknowledgements We are grateful to Tony Dean, David Hall, Sebastian Matuszewski, and Vaughn Cooper for providing raw data ﬁles. We also acknowledge constructive feedback on an earlier draft of this manuscript from Guillaume Achaz, Kristina Crona, Inês Fragata, Roy Kishony Joachim Krug, Sebastian Matuszewski, Brandon Ogbunugafor, Adam Palmer and two anonymous reviewers. DMW is supported in part by National Science Foundation Grant DEB-1556300, FEC-1736253 and National institutes of Health Grant R01GM095728. RBH is supported in part by the National Science Foundation under Cooperative Agreement No. DBI-0939454. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 Interna- tional License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. Appendix: The Explanatory Power of Fourier–Walsh Coefﬁcients is Mono- tonic in Their Absolute Magnitude Assume two ﬁtness functions deﬁned over L biallelic loci are represented as column vectors − → − → − → W and X with Fourier–Walsh coefﬁcients E and E computed with Eq. (1). Deﬁne the sum W X − → − → − → − → 2 T of squares distance between W and X as W − X≡ (W − X ) = (W − X ) · W − X, i i i =1 − → where w and x are the ith components of W and X, respectively. i i Theorem 1 Sum of squares distance equivalence − → − → − → W − X= 2 E − E . W X 123 222 D. M. Weinreich et al. Proof By deﬁnition − → − → − → W = E and X = E W X where is the Hadamard matrix (see Sect. 2.2). Therefore − → − → − → W − X= E − E W X − → − → − → − → = E − E · E − E W X W X T T − → − → − → − → = E − E · E − E W X W X T T − → − → − → − → T T = E − E · E − E W X W X T T − → − → − → − → = E − E · E − E W X W X T L But recall that = 2 I, where I is the identity matrix. Thus T T − → − → − → − → − → W − X= 2 E − E · E − E W X W X − → − → − → − → − → − → L L = 2 E − E · E − E = 2 E − E . W X W X W X T L −1 L An interesting property of the Hadamard matrix is that = 2 . Without the 2 this equality is the hallmark of a rotational transformation. This means that Fourier–Walsh coefﬁcients are simply the result of a high dimensional axis rotation of the coordinates of function space, together with a uniform contraction. This provides intuition into Theorem 1: rotating the space and contracting it uniformly only changes the distance between two vectors in the space by the constant of contraction. Theorem 2 Minimizing the sum-of-squares distance of subsetting approximations − − → −−→ (m) (m) The subsetting approximation W = E that minimizes the sum of squares distance − − → − → (m) to function W is the one whose E uses the m largest components in absolute value in − → − → E = W. −−→ − −− → − → − → (m) Proof By Theorem 1, the sum of squares distance between W and W is W − W (m)= − − → − → (m) 2 E − E , which means that we can equivalently solve the minimization problem on either side of the equality. And trivially, the right-hand side is minimized when the m nonzero − − → − → (m) components in E are the m largest components in absolute value in E . (The squaring of − − → − → (m) differences in epistatic terms in the deﬁnition of E − E removes the signiﬁcance of W w their sign.) 123 The Inﬂuence of Higher-Order Epistasis on Biological... 223 References 1. Anderson, D.W., McKeown, A.N., Thornton, J.W.: Intermolecular epistasis shaped the function and evolution of an ancient transcription factor and its DNA binding sites. eLife 4, e07864 (2015) 2. Avery, L., Wasserman, S.: Ordering gene function: the interpretation of epistasis in regulatory hierarchies. Trends Genet. 8, 312–316 (1992) 3. Badis, G., Berger, M.F., Philippakis, A.A., Talukder, S., Gehrke, A.R., Jaeger, S.A., Chan, E.T., Metzler, G., Vedenko, A., Chen, X., Kuznetsov, H., Wang, C.E., Coburn, D., Newburger, D.E., Morris, Q., Hughes, T.R., Bulyk, M.L.: Diversity and complexity in DNA recognition by transcription factors. Science 324, 1720–1723 (2009) 4. Bank, C., Matuszewski, S., Hietpas, R.T., Jensen, J.D.: On the (un)predictability of a large intragenic ﬁtness landscape. Proc. Natl. Acad. Sci. 113, 14085–14090 (2016) 5. Barton, N.H.: Why sex and recombination? Cold Spring Harbor Symp. Quantit. Biol. 74, 187–195 (2009) 6. Beerenwinkel, N., LPachter, L., Sturmfels, B.: Epistasis and shapes of ﬁtness landscapes. Stat. Sin. 17, 1317–1342 (2007) 7. Bridgham, J.T., Ortlund, E.A., Thornton, J.W.: An epistatic ratchet constrains the direction of glucocor- ticoid receptor evolution. Nature 461, 515–519 (2009) 8. Brown, K.M., Costanzo, M.S., Xu, W., Roy, S., Lozovsky, E.R., Hartl, D.L.: Compensatory mutations restore ﬁtness during the evolution of dihydrofolate reductase. Mol. Biol. Evol. 27, 2682–2690 (2010) 9. Burkey, J.: A non-parametric monotonic trend test computing Mann-Kendall Tau, Tau-b, and Sen’s Slope written in Mathworks-MATLAB implemented using matrix rotations (2006) 10. Chou, H.-H., Chiu, H.-C., Delaney, N.F., Segrè, D., Marx, C.J.: Diminishing returns epistasis among beneﬁcial mutations decelarates adaptation. Science 322, 1190–1192 (2011) 11. Costanzo, M.S., Brown, K.M., Hartl, D.L.: Fitness trade-offs in the evolution of dihydrofolate reductase and drug rsistance in Plasmodium falciparum. PLoS ONE 6, e19636 (2011) 12. Crawford, L., Zeng, P., Mukherjee, S., Zhou, X.: Detecting epistasis with the marginal epistasis test in genetic mapping studies of quantitative traits. PLOS Genet. 13, e1006869 (2017) 13. Crona, K., Gavryushkin, A., Greene, D., Beerenwinkel, N.: Inferring genetic interactions from compar- ative ﬁtness data. eLife 6, e28629 (2017) 14. da Silva, J., Coetzer, M., Nedellec, R., Pastore, C., Mosier, D.E.: Fitness epistasis and constraints on adaptation in a human immunodeﬁciency virus type 1 protein region. Genetics 185, 293–303 (2010) 15. de Visser, J.A.G.M., Krug, J.: Empirical ﬁtness landscapes and the predictability of evolution. Nat. Rev. Genet. 15, 480–490 (2014) 16. de Visser, J.A.G.M., Park, S.-C., Krug, J.: Exploring the effect of sex on empirical ﬁtness landscapes. Am. Nat. 174, S15–S30 (2009) 17. Dean, A.M., Thornton, J.W.: Mechanistic approaches to the study of evolution: the functional synthesis. Nat. Rev. Genet. 8, 675–688 (2007) 18. Eshel, I., Feldman, M.W.: On the evolutionary effect of recombination. Theor. Popul. Biol. 1, 88–100 (1970) 19. Fisher, R.A.: The genetical theory of natural selection. Clarendon Press, Oxford (1930) 20. Flynn, K.M., Cooper, T.F., Moore, F.B.G., Cooper, V.S.: The environment affects epistatic interactions to alter the topology of an empirical ﬁtness landscape. PLOS Genet. 9, e1003426 (2013) 21. Hall, D.W., Agan, M., Pope, S.C.: Fitness epistasis among 6 biosynthtic loci in the budding yeast Sac- charomyces cervisiae. J. Hered. 1010, S75–S84 (2010) 22. Heckendorn, R.B., Whitley, D.: Predicting epistasis from mathematical models. Evol. Comput. 7, 69–101 (1997) 23. Heckendorn, R.B., Wright, A.H.: Efﬁcient linkage discovery by limited probing. Evol. Comput. 12, 517–545 (2004) 24. Holm, S.: A simple sequentially rejective multiple test procedure. Scand. J. Stat. 6, 65–70 (1979) 25. Iwasa, Y., Michor, F., Nowak, M.A.: Stochastic tunnels in evolutionary dynamics. Genetics 166, 1571– 1579 (2004) 26. Jiang, P.-P., Corbett-Detig, R.B., Hartl, D.L., Lozovsky, E.R.: Accessible mutational trajectories for the evolution of pyrimethamine resistance in the malaria parasite Plasmodium vivax.J.Mol.Evol. 77, 81–91 (2013) 27. Kauffman, S.A., Weinberger, E.D.: The NK model of rugged ﬁtness landscapes and its application to maturation of the immune response. J. Theor. Biol. 141, 211–245 (1989) 28. Kendall, M.G.: A new measure of rank correlation. Biometrika 30, 81–93 (1938) 29. Khan, A.I., Dinh, D.M., Schneider, D., Lenski, R.E., Cooper, T.F.: Negative epistasis between beneﬁcial mutations in an evolving bacterial population. Science 332, 1193–1196 (2011) 123 224 D. M. Weinreich et al. 30. Knies, J.L., Cai, F., Weinreich, D.M.: Enzyme efﬁciency but not thermostability drives cefotaxime resis- tance evolution in TEM-1 β-lactamase. Mol. Biol. Evol. 34, 1040–1054 (2017) 31. Kondrashov, A.S.: Deleterious mutations and the evolution of sex. Nature 336, 435–440 (1988) 32. Leem, S., Jeong, H.-H., Lee, J., Wee, K., Sohn, K.-A.: Fast detection of high-order epistatic interactions in genome-wide association studies using information theoretic measure. Comput. Biol. Chem. 50, 19–28 (2014) 33. Lozovsky, E.R., Chookajorn, T., Brown, K.M., Imwong, M., Shaw, P.J., Kamchonwongpaisan, S., Neafsey, D.E., Weinreich, D.M., Hartl, D.L.: Stepwise acquisition of pyrimethamine resistance in the malaria parasite. Proc. Natl. Acad. Sci. 106, 12025–12030 (2009) 34. Lunzer, M., Miller, S.P., Felsheim, R., Dean, A.M.: The biochemical architecture of an ancient adaptive landscape. Science 310, 499–501 (2005) 35. Malcolm, B.A., Wilson, K.P., Matthews, B.W., Kirsch, J.F., Wilson, A.C.: Ancestral lysozymes recon- structed, neutrality tested, and thermostability linked to hydrocarbon packing. Nature 345, 86–89 (1990) 36. Maynard Smith, J.: Natural selection and the concept of a protein space. Nature 225, 563–565 (1970) 37. McCandlish, D.M.: Visualizing ﬁtness landscapes. Evolution 65, 1544–1558 (2011) 38. Meini, M.-R., Tomatis, P.E., Weinreich, D.M., Vila, A.J.: Quantitative description of a protein ﬁtness landscape based on molecular features. Mol. Biol. Evol. 32, 1774–1787 (2015) 39. Mira, P.M., Meza, J.C., Nandipati, A., Barlow, M.: Adaptive landscapes of resistance genes change as antibiotic concentrations change. Mol. Biol. Evol. 32, 2707–2715 (2015) 40. Neher, R.A., Shraiman, B.I.: Competition between recombination and epistasis can cause a transition from allele to genotype selection. Proc. Natl. Acad. Sci. 106, 6866–6871 (2009) 41. Neidhart, J., Szendro, I.G., Krug, J.: Exact results for amplitude spectra of ﬁtness landscapes. J. Theor. Biol. 332, 218–227 (2013) 42. Nelson, R.M., Kierczak, M., Carlborg, Ö.: Higher Order Interactions: Detection of Epistasis Using Machine Learning and Evolutionary Computation. In: Gondro, C., van der Werf, J., Hayes, B. (eds.) Genome-Wide Association Studies and Genomic Prediction, pp. 499–518. Humana Press, Totowa (2013) 43. O’Maille, P.E., Malone, A., Dellas, N., Hess Jr., B.A., Smentek, L., Sheehan, I., Greenhagen, B.T., Chap- pell, J., Manning, G., Noel, J.P.: Quantitative exploration of the catalytic landscape separating divergent plant sesquiterpene synthases. Nat. Chem. Biol. 4, 617–623 (2008) 44. Orr, H.A.: Fitness and its role in evolutionary genetics. Nat. Rev. Genet. 10, 531–539 (2009) 45. Palmer, A.C., Toprak, E., Baym, M., Kim, S., Veres, A., Bershtein, S., Kishony, R.: Delayed commitment to evolutionary fate in antibiotic resistance ﬁtness landscapes. Nat. Commun. 6, 7385 (2015) 46. Phillips, P.C.: The language of gene interaction. Genetics 149, 1167–1171 (1998) 47. Phillips, P.C.: Epistasis—the essential role of gene interactions in the structure and evolution of genetic systems. Nat. Rev. Genet. 9, 855–867 (2008) 48. Poelwijk, F., Kiviet, D.J., Tans, S.J.: Evolutionary potential of a duplicated repressor-operator pair: sim- ulating pathways using mutational data. PLoS Comput. Biol. 2, e58 (2006) 49. Poelwijk, F., Kiviet, D.J., Weinreich, D.M., Tans, S.J.: Empirical ﬁtness landscapes reveal accessible evolutionary paths. Nature 445, 383–386 (2007) 50. Poelwijk, F.J., Krishna, V., Ranganathan, R.: The context-dependence of mutations: a linkage of for- malisms. PLoS Comput. Biol. 12, e1004771 (2016) 51. Provine, W.B.: Sewall Wright and Evolutionary Biology. University of Chicago Press, Chicago (1986) 52. Rowe, W., Platt, M., Wedge, D.C., Day, P.J., Kell, D.B., Knowles, J.: Analysis of a complete DNA-protein afﬁnity landscape. J. R. Soc. Interface 7, 397–408 (2010) 53. Sackton, T.B., Hartl, D.L.: Genotypic context and epistasis in individuals and populations. Cell 166, 279–287 (2016) 54. Sailer, Z.R., Harms, M.J.: Detecting high-order epistasis in nonlinear genotype-phenotype maps. Genetics 205, 1079–1088 (2017) 55. Sailer, Z.R., Harms, M.J.: High-order epistasis shapes evolutionary trajectories. PLOS Comput. Biol. 13, e1005541 (2017) 56. Sokal, R.R., Rohlf, F.J.: Biometry. W.H. Freeman and Company, New York (1995) 57. Stadler, P.F.: Landscapes and their correlation functions. J. Math. Chem. 20, 1–45 (1996) 58. Stadler, P.F.: Spectral Landscape Theory. In: Crutchﬁeld, J.P., Schuster, P. (eds.) Evolutionary Dynamics: Exploring the Interplay of Selection, Accident, Neutrality, and Function, pp. 221–272. Oxford University Press, Oxford (2003) 59. Stadler, P.F., Happel, R.: Random ﬁeld models for ﬁtness landscapes. J. Math. Biol. 38, 435–478 (1999) 60. Starr, T.N., Thornton, J.W.: Epistasis in protein evolution. Protein Sci. 25, 1204–1218 (2016) 61. Sun, X., Lu, Q., Mukherjee, S., Crane, P.K., Elston, R., Ritchie, M.D.: Analysis pipeline for the epistasis search – statistical versus biological ﬁltering. Frontiers in Genetics 5, (2014) 123 The Inﬂuence of Higher-Order Epistasis on Biological... 225 62. Szendro, I.G., Schenk, M., Franke, J., Krug, J., de Visser, J.A.G.M.: Quantitative analyses of empirical ﬁtness landscapes. J. Stat. Mech. P01, 005 (2013) 63. Tan, L., Serene, S., Chao, H.X., Gore, J.: Hidden randomness between ﬁtness landscapes limits reverse evolution. Physical Review Letters 106, 198102 (2011) 64. Weinberger, E.D.: Fourier and Taylor series on ﬁtness landescapes. Biol. Cybern. 65, 321–330 (1991) 65. Weinreich, D.M., Chao, L.: Rapid evolutionary escape by large populations from local ﬁtness peaks is likely in nature. Evolution 59, 1175–1182 (2005) 66. Weinreich, D.M., Delaney, N.F., DePristo, M.A., Hartl, D.L.: Darwinian evolution can follow only very few mutational paths to ﬁtter proteins. Science 312, 111–114 (2006) 67. Weinreich, D.M., Lan, Y., Wylie, C.S., Heckendorn, R.B.: Should evolutionary geneticists worry about high order epistasis? Curr. Opin. Dev. Genet. 23, 700–707 (2013) 68. Weinreich, D.M., Watson, R.A., Chao, L.: Perspective: sign epistasis and genetic constraint on evolution- ary trajectories. Evolution 59, 1165–1174 (2005) 69. Weirauch, M.T., Yang, A., Albu, M., Cote, A.G., Montenegro-Montero, A., Drewe, P., Najafabadi, H.S., Lambert, S.A., Mann, I., Cook, K., Zheng, H., Goity, A., van Bakel, H., Lozano, J.C., Galli, M., Lewsey, M.G., Huang, E., Mukherjee, T., Chen, X., Reece-Hoyes, J.S., Govindarajan, S., Shaulsky, G., Walhout, A.J.M., Bouget, E.Y., Ratsch, G., Larrondo, L.E., Ecker, J.R., Hughes, T.R.: Determination and inference of eukaryotic transcription factor sequence speciﬁcity. Cell 158, 1431–1443 (2014) 70. Whitlock, M.C., Bourguet, D.: Factors affecting the genetic load in Drosophila: synergistic epistasis and correlations among ﬁtness components. Evolution 54, 1654–1660 (2000) 71. Wolf, J.B., Brodie, E.D.I., Wade, M.J. (eds.): Epistasis and the Evolutionary Process. Oxford University Press, New York (2000) 72. Wright, S.: Evolution in Mendelian populations. Genetics 16, 97–159 (1931) 73. Wright, S.: The roles of mutation, inbreeding, crossbreeding and selection in evolution. In: D.F. Jones (eds.) Proceedings of the Sixth International Congress of Genetics, pp. 356–366. Brooklyn Botanic Garden, Menasha (1932)
http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png
Journal of Statistical Physics
Springer Journals
http://www.deepdyve.com/lp/springer-journals/the-influence-of-higher-order-epistasis-on-biological-fitness-Rs0MtUqV40