Network Analysis of Protein Adaptation: Modeling the Functional Impact of Multiple Mutations

Network Analysis of Protein Adaptation: Modeling the Functional Impact of Multiple Mutations Abstract The evolution of new biochemical activities frequently involves complex dependencies between mutations and rapid evolutionary radiation. Mutation co-occurrence and covariation have previously been used to identify compensating mutations that are the result of physical contacts and preserve protein function and fold. Here, we model pairwise functional dependencies and higher order interactions that enable evolution of new protein functions. We use a network model to find complex dependencies between mutations resulting from evolutionary trade-offs and pleiotropic effects. We present a method to construct these networks and to identify functionally interacting mutations in both extant and reconstructed ancestral sequences (Network Analysis of Protein Adaptation). The time ordering of mutations can be incorporated into the networks through phylogenetic reconstruction. We apply NAPA to three distantly homologous β-lactamase protein clusters (TEM, CTX-M-3, and OXA-51), each of which has experienced recent evolutionary radiation under substantially different selective pressures. By analyzing the network properties of each protein cluster, we identify key adaptive mutations, positive pairwise interactions, different adaptive solutions to the same selective pressure, and complex evolutionary trajectories likely to increase protein fitness. We also present evidence that incorporating information from phylogenetic reconstruction and ancestral sequence inference can reduce the number of spurious links in the network, whereas preserving overall network community structure. The analysis does not require structural or biochemical data. In contrast to function-preserving mutation dependencies, which are frequently from structural contacts, gain-of-function mutation dependencies are most commonly between residues distal in protein structure. functional impact of multiple mutations, network analysis, protein adaptation, beta lactamase, antibiotic resistance Introduction The acquisition of new functions, a process known as genetic adaptation, occurs through the accumulation of mutations. The functional impacts of mutations on organisms or their protein constituents depends on both the current selective pressures and on the specific sequence contexts in which these mutations occur (Soskine and Tawfik 2010). In proteins, complex functional codependencies between mutations are pervasive but are also challenging to study, due to the large number of combinations involved (Weinreich et al. 2013). A unique opportunity to advance our understanding of protein evolutionary dynamics presents itself in single-cell, haploid microorganisms. Large populations produce a high genetic diversity even when baseline mutation frequency is low, as is the case in bacteria and yeast (Lynch et al. 2016). Drawing from this genetic diversity, positive selection can lead to the fixation of a sequence of mutations increasing fitness (DePristo et al. 2005, 2007; Orr 2005; Unckless and Orr 2009). Organismal fitness can be used as a convenient proxy for activity in these model systems, allowing for genotype–phenotype profiling across large sections of sequence space. In this way, the principles governing key aspects of evolution, including genetic robustness, evolvability, and predictability of evolutionary trajectories can be studied (Bershtein et al. 2006; Carneiro and Hartl 2010; Saakian et al. 2012; Crona et al. 2013; Firnberg et al. 2014; Bank et al. 2016). In previous work, we used the evolution of the TEM β-lactamase cluster as a model system for the acquisition of new functions (Guthrie et al. 2011). Because of low genetic barriers (in some cases, a single mutation can result in the acquisition of resistance to a new β-lactam antibiotic), TEM β-lactamases are a convenient model system for genetic adaptation (Camps et al. 2007; Fogle et al. 2008; Tomatis et al. 2008; Jansen et al. 2013; de Visser and Krug 2014). We analyzed both the inhibitor-resistant β-lactamase and extended-spectrum β-lactamase (ESBL) phenotypes, with a greater focus on the ESBL phenotype. In TEM ESBLs, the acquisition of new extended-spectrum activity leads to broader substrate specificity, in terms of resistance to additional antibiotics such as cefotaxime, which can be used to monitor adaptation (Salverda et al. 2011; Schenk et al. 2012; Mira et al. 2015). In addition, experimental and clinical evolution of extended-spectrum resistance represent analogous adaptation events in this model system, which allows the use of hybrid (experimental and clinical) mutant databases (Salverda et al. 2010; Guthrie et al. 2011). Our basic hypothesis was that recent evolutionary radiation, particularly in the case of drug resistance genes, is indicative of strong positive selection dominated by one or a few discrete types of selections. We assumed that the co-occurrence of pairs of mutations in the same TEM sequence was an indication of a potential functional interaction between them. Pairwise mutation co-occurrence was used to construct a network model. Higher order (n = 3) combinations with a positive effect on fitness were then explored by analyzing the network structure. We showed that central network links represent pairs of mutations leading to increased protein fitness. More generally, central paths consisting of three linked mutations were helpful in identifying functionally important triple mutants in TEM (Guthrie et al. 2011). Although our approach had similarities to earlier network models of mutation covariation over long evolutionary distances (Lee et al. 2008; Chakrabarti and Panchenko 2010), these earlier studies focused on dependencies which preserve protein function and structural constraints across large protein families. In contrast, we focused on dependencies critical to the evolution of new, specialized functions evolving in small clusters of highly related proteins. A limitation of mutation co-occurrence network models, regardless of their focus on function preservation or function acquisition, is that without an underlying phylogenetic model, it is not possible to determine how many times a pair of mutations co-occurred independently (Kryazhimskiy et al. 2011). The NAPA method presented here extends our previous work to incorporate phylogenetic reconstruction into network modeling. We apply it to three functionally related clusters of β-lactamase sequences: to the original TEM ESBL, we added the CTX-M-3, and the OXA-51 clusters. These three recent evolutionary radiations represent parallel evolution events under selection by different types of β-lactam antibiotics: oximino-cephalosporins (TEM ESBL, CTX-M-3) and carbapenems (OXA-51). For each protein cluster, we reconstruct the evolution of extant sequences from a putative ancestor and count mutations that co-occur along unique paths from ancestral to descendent sequences on the phylogenetic trees. The resulting networks are described here as “phylogeny-based,” although they are distinct from the well-known “phylogenetic networks,” which generalize phylogenetic trees (Legendre and Makarenkov 2002). By reducing noise caused by overrepresentation in the population caused by clonal expansion, the phylogeny-based network approach can potentially produce a more accurate count of parallel evolution events (Pollock et al. 1999). Furthermore, this approach intrinsically contains information about likely temporal ordering of mutations. The three β-lactamase clusters studied here are serine hydrolases, in which an acyl-enzyme intermediate between the β-lactam moiety and the conserved active-site serine (S70) is hydrolyzed by a reactive water, releasing the product of hydrolysis to regenerate the active site for the next turnover (Galleni et al. 1995; Chen et al. 2007; Schneider et al. 2009). The first two clusters (TEM ESBL and CTX-M-3) are Class A serine β-lactamases but are distantly homologous, and the third (OXA-51) belongs to the highly heterogeneous Class D (Bush and Jacoby 2010; Bush 2013). Each of the three clusters is the result of rapid genetic diversification, in response to widespread clinical use of newer β-lactam antibiotics. The original TEM had broad spectrum β-lactam resistance, i.e. resistance to penicillins and to first-generation cephalosporins (Matagne et al. 1999). TEM diversification appears to have been driven by expanded-spectrum cephalosporins and monobactams (Jacoby-Bush 2be group) and by β-lactamase inhibitors such as clavulamic acid or tazobactam (Jacoby-Bush 2br group) (Bush and Jacoby 2010). In contrast, the original wild-type CTX-M was already intrinsically resistant to a third-generation cephalosporin, cefotaxime (Delmas et al. 2010). Its recent adaptation appears to have been driven by mutations increasing its hydrolytic activity against ceftazidime/ceftriaxone while preserving activity against cefotaxime (Bonnet 2004). The OXA-51 cluster is a member of the OXA subfamily of serine β-lactamases, which are characterized by high activity against oxacyclin. This subfamily contains >500 unique variants (Naas et al. 2017). Some of these variants (such as the OXA-48 cluster) have acquired ESBL activity, pointing to cephalosporins or monobactams as the drivers of rapid genetic diversification (Poirel et al. 2012). The OXA-51 cluster studied here consistently shows carbapenemase activity, suggesting that its genetic diversification was driven by carbapenem use in the clinic (Evans and Amyes 2014; Docquier and Mangani 2016). Thus, while OXA-51 β-lactamases are functionally analogous to TEM and CTX-M-3, they have diverged substantially from these clusters and have experienced a different selection as a source for rapid diversification. Inclusion of the Class D OXA-51 supports the generalizability of NAPA, as it allows us to contrast and compare patterns of ESBL evolution in this cluster to the two Class A clusters. The NAPA method provides novel tools to build phylogeny-based mutation co-occurrence networks, and to derive fundamental properties of protein evolution from these networks. It can also be used to build networks based only on protein sequence alignments. In the three clusters analyzed here, NAPA identified key adaptive mutations against a background of dozens or even hundreds of reported mutations, as well as pairwise and higher order functional interactions. We were also able to predict higher order trajectories likely to increase fitness. These observations were consistent across all three clusters and we believe that they are more broadly applicable to proteins that have experienced rapid diversification under a dominant positive selection. Importantly, although we interpret our findings with respect to functional and structural information from the literature, our network analysis does not rely on structural or biochemical data. This feature makes our tools applicable to a wide range of proteins, many of which have limited structural and/or biochemical information. Results and Discussion TEM ESBL Networks To ascertain the possible advantages of phylogeny-based mutation co-occurrence networks with respect to networks built without a phylogenetic model (alignment-based networks), we generated both networks for TEM ESBLs (fig. 1A and B). Node centralities in both the alignment-based and the phylogeny-based networks identified mutations known to be critical for ESBL adaptation: G238S, E104K, and E240K (Page 2008; Dellus-Gur et al. 2013,, 2015) (supplementary table 1, Supplementary Material online). A fourth position also known to be important for adaptation, R164, appeared in different locations within the network, depending on the amino acid substitution: R164S, H, or C (Salverda et al. 2011; Dellus-Gur et al. 2013,, 2015). Mutations in these positions (164, 238, and 240) either move the Ω-loop (164) or reposition the β-strand β3 within the active site (238, 240) (Matagne and Frere 1995; Matagne et al. 1998). Additionally, both networks identified mutations that compensate for pleiotropic effects of multiple adaptive mutations. Although not directly adaptive, these mutations are critical under strong selective pressure. Mutation M182T represents a global suppressor that compensates for decreased thermodynamic stability introduced by the adaptive mutations (Huang and Palzkill 1997; Brown et al. 2010). Mutations at residue E104 have both gain-of-function and compensatory effects (Petit et al. 1995; Schenk et al. 2015). Although both networks reinforce the identification of functionally important mutations, there are differences in the connectivity of specific mutations. In this framework, connectivity refers to “hub-ness” in the network, as measured by the weighted degree centrality. On one hand, the connectivity of a mutation known to have a minor role in adaptation, such as Q39K or T265M, tends to be notably lower in the phylogeny-based network. Q39K is mildly adaptive, and its high prevalence in the clinic is likely due to clonal expansion (Blazquez et al. 1995), and T265M is a global suppressor, which occurs infrequently (Brown et al. 2010; Salverda et al. 2010). On the other hand, the network connectivities (weighted degrees) of two of the critical adaptive mutations, R164C and R164H, are increased in the phylogeny-based network. More specifically, the link between R163H and R164C is also stronger in the phylogeny-based network, which makes functional sense as these are adjacent residues in the Ω-loop. Fig. 1. View largeDownload slide NAPA-generated mutation networks of TEM ESBL and CTX-M-3 adaptive evolution. (A) TEM ESBL alignment-based network. (B) TEM ESBL undirected phylogeny-based network. (C) CTX-M-3 alignment-based network. (D) CTX-M-3 undirected phylogeny-based network. Nodes represent individual mutations with respect to the ancestral protein sequence, and links represent predicted functional interactions (see Materials and Methods). Node size is proportional to the node’s connectivity (weighted degree centrality); color represents the assignment of a node to a densely linked community in the network. Link thickness is proportional to the link’s weight, which corresponds to mutation co-occurrence counts. In alignment-based networks, links are weighted by the number of times two mutations are seen together in the same protein sequence. In undirected phylogeny-based networks, links are weighted by the number of times two mutations are found in the same path leading from an ancestral to descendant sequence on the phylogenetic tree. In TEM, G238S, and R164S/H/C, and E240K are adaptive mutations, increasing resistance to extended-spectrum antibiotics, whereas M182T, and T265M are stabilizing and compensate for the decreased thermodynamic instability induced by some of the adaptive mutations. E104K is both adaptive and stabilizing. In CTX-M-3, D240G increases resistance to the extended-spectrum antibiotic cefotaxime while reducing resistance to ceftazidime. Mutations in P167 are antagonistic to D240G, increasing resistance to ceftazidime, with little effect on resistance to cefotaxime. A77V stabilizes the active site. N114D, A140S, and D288N are frequently found in combination with D240G or P167S/T leading to dual resistance to cefotaxime and ceftazidime (Results and Discussion, ESBL TEM β-lactamase networks). Fig. 1. View largeDownload slide NAPA-generated mutation networks of TEM ESBL and CTX-M-3 adaptive evolution. (A) TEM ESBL alignment-based network. (B) TEM ESBL undirected phylogeny-based network. (C) CTX-M-3 alignment-based network. (D) CTX-M-3 undirected phylogeny-based network. Nodes represent individual mutations with respect to the ancestral protein sequence, and links represent predicted functional interactions (see Materials and Methods). Node size is proportional to the node’s connectivity (weighted degree centrality); color represents the assignment of a node to a densely linked community in the network. Link thickness is proportional to the link’s weight, which corresponds to mutation co-occurrence counts. In alignment-based networks, links are weighted by the number of times two mutations are seen together in the same protein sequence. In undirected phylogeny-based networks, links are weighted by the number of times two mutations are found in the same path leading from an ancestral to descendant sequence on the phylogenetic tree. In TEM, G238S, and R164S/H/C, and E240K are adaptive mutations, increasing resistance to extended-spectrum antibiotics, whereas M182T, and T265M are stabilizing and compensate for the decreased thermodynamic instability induced by some of the adaptive mutations. E104K is both adaptive and stabilizing. In CTX-M-3, D240G increases resistance to the extended-spectrum antibiotic cefotaxime while reducing resistance to ceftazidime. Mutations in P167 are antagonistic to D240G, increasing resistance to ceftazidime, with little effect on resistance to cefotaxime. A77V stabilizes the active site. N114D, A140S, and D288N are frequently found in combination with D240G or P167S/T leading to dual resistance to cefotaxime and ceftazidime (Results and Discussion, ESBL TEM β-lactamase networks). Adaptive mutations are found in corresponding densely linked communities in both networks. One community contains E104K and G238S; a second contains E240K and R164S; and a third contains R164H. G238S and mutations at position R164 confer a fitness advantage but are known to be mutually exclusive (Dellus-Gur et al. 2015), consistent with the idea that network communities may reflect contingent evolution. Conversely, links between mutations in different communities point to overlaps in distinct adaptive trajectories. Such overlaps could result from functional interactions that are preserved in different sequence contexts, for example compensatory interactions mediated by structurally stabilizing mutations. Interestingly, there are fewer links between communities in the phylogeny-based networks, in and communities are better defined as measured by the network community modularity metric (see Materials and Methods, Network Analysis). The modularity of the alignment-based network is only 0.13, compared with 0.28 for the phylogeny-based network. CTX-M β-Lactamase Networks CTX-M β-lactamases are mechanistically similar to TEMs, but only distantly homologous. CTX-Ms have independently acquired ESBL resistance during their more recent evolutionary radiation (Bonnet 2004). Using NAPA, we generated a network model of the CTX-M-3 cluster, which contains the largest number of CTX-M sequences. In the alignment-based network (fig. 1C), five nodes stand out for their high connectivity: D240G, A77V, N114D, A140S, and D288N. Of these, D240G and A77V have the highest connectivity, suggesting their impacts on adaptive evolution of protein function are most prominent. Indeed, D240G is known to confer increased resistance to ceftazidime, by allowing this bulkier β-lactam to be more easily accommodated within the active site (Chen et al. 2005; Delmas et al. 2008). The A77 residue is known to contribute to the formation of the active site cavity, providing key hydrophobic interactions that stabilize its core architecture (He et al. 2015). In the CTX-M-3 phylogeny-based network (fig. 1D), when compared with the alignment-based network, both the number of links and link weights are dramatically reduced. At the same time, strong links between the known adaptive mutations A77V, D240G, and D288N are maintained. Both networks identified the same two communities, containing D240G, and A77V-D288N, respectively). However, the size of the A77V-D288N community was significantly reduced in the phylogeny-based network, from 16 to 8 linked nodes. Of the latter 8 nodes, five have been previously shown to occur in the two distinct adaptive solutions leading to increased ceftazidime resistance (Novais et al. 2010). One of these contains D240G, whereas the other requires P167S. P167S is preserved in both networks, but only a very weak signal is detected for trajectories containing this adaptive mutation. P167S shifts resistance from cefotaxime to ceftazidime and leads to a local fitness optimum. Therefore, mutants containing this mutation have very limited potential to continue to evolve higher levels of resistance to both antibiotics (Novais et al. 2010). The star-like topology in the D240G community is preserved in both the alignment-based and the phylogeny-based networks, and it reveals a set of mutations existing only in the context of this adaptive mutation. The increased mobility and flexibility of the D240G mutation is known to lead to a substantial side-chain reorganization (Delmas et al. 2008). The mutations that are weakly linked to D240G likely lead to minor structural readjustments, and possibly represent a variety of redundant solutions. OXA β-Lactamase Networks OXA β-lactamases are a diverse subfamily of serine β-lactamases characterized by high activity against oxacyclin and are undergoing very rapid evolutionary radiation (Barlow and Hall 2002; Evans and Amyes 2014). The OXA-51 cluster is the largest and most diverse within molecular class D and appears to be evolving under carbapenem selective pressure (Walther-Rasmussen and Hoiby 2006; Evans and Amyes 2014; Docquier and Mangani 2016). The OXA-51 alignment-based (fig. 2A) and undirected phylogeny-based (fig. 2B) networks are both much larger than for the TEM and CTX-M-3 clusters. This greater complexity reflects the increased genetic diversification of OXA-51 evolution. Compared with alignment-based network, OXA-51’s phylogeny-based network has fewer highly connected nodes. This appears to be a general trend, as this reduction was also seen in TEM and CTX-M (fig. 1B and D). Fig. 2. View largeDownload slide NAPA-generated undirected mutation networks of OXA-51 adaptive evolution. (A) OXA-51 alignment-based network. (B) OXA-51 undirected phylogeny-based network. Node and link representations are the same as figure 1. OXA-51, D117N, K146D/N, L187V, Q194P, and D225N mutations are under positive selection for resistance to carbapenem antibiotics, altering the conformation of the active site, and in the case of, Q194, affecting dimer formation (Results and Discussion, OXA β-lactamase networks). Fig. 2. View largeDownload slide NAPA-generated undirected mutation networks of OXA-51 adaptive evolution. (A) OXA-51 alignment-based network. (B) OXA-51 undirected phylogeny-based network. Node and link representations are the same as figure 1. OXA-51, D117N, K146D/N, L187V, Q194P, and D225N mutations are under positive selection for resistance to carbapenem antibiotics, altering the conformation of the active site, and in the case of, Q194, affecting dimer formation (Results and Discussion, OXA β-lactamase networks). Although network communities are largely preserved across all OXA-51-like networks, they are better defined in the phylogeny-based networks, for which the network modularity increases from 0.37 to 0.55 (see Materials and Methods, Network Analysis). Five large communities (containing five or more nodes) were detected in both the alignment- and the phylogeny-based networks. One community (green in fig. 2) contains the highly connected node D225N. A second community (red), contains D105N, D117N, and K146N. A third community (blue), contains E36D and Q57H. A fourth community (orange) clusters around A48V and Q194P, and the fifth community contains another mutation in residue Q194 (Q194L). According to a recent review, the D225N and D117N mutations are in positions likely to be involved in specificity for the carbapenem β-lactam antibiotic, and D225N increases the flexibility of the active site entrance. The L187V mutation (in the undirected phylogeny-based network green community) increases the space between the S80 catalytic residue and the end of the active site cleft; K146 is in the active site Ω-loop spanning positions 144 through 160, with mutations K146N and K146D affecting substrate specificity. Finally, Q194 is in a region known to be important for dimer formation (residues 180 through 200) (Evans and Amyes 2014). Directed Phylogeny-Based Networks Although mutation co-occurrence networks can point to functional dependencies, they do not include information about the temporal order in which two mutations were fixed. NAPA derives temporal ordering from inferred ordering of mutations along distinct paths from internal to leaf sequences in a phylogenetic tree (see Materials and Methods, Network Construction). This approach was most successful in the OXA-51-like network (fig. 3), whereas both the TEM and CTX-M-3 clusters exhibited low genetic diversity and convergent evolution. The phylogenetic trees for these clusters were very shallow and did not contain sufficient information to determine the preferred temporal ordering of mutations on the tree. For this reason, the phylogeny-based networks contained directed links pointing in both directions and with similar weights (supplementary figs. 1 and 2, Supplementary Material online). Fig. 3. View largeDownload slide NAPA-generated directed mutation network of OXA-51 adaptive evolution. Directed phylogeny-based network, where link direction represents the temporal ordering of mutations on the phylogenetic tree. Node and link representations are the same as figure 1, with addition of link direction. Link arrows start at the inferred earlier mutation and point to the inferred later mutation. As in figure 2, D117N, K146D/N, L187V, Q194P, and D225N are mutations under positive selection for resistance to carbapenem antibiotics, altering the conformation of the active site, and in the case of, Q194, affecting dimer formation (see Results and Discussion, OXA β-lactamase networks). Fig. 3. View largeDownload slide NAPA-generated directed mutation network of OXA-51 adaptive evolution. Directed phylogeny-based network, where link direction represents the temporal ordering of mutations on the phylogenetic tree. Node and link representations are the same as figure 1, with addition of link direction. Link arrows start at the inferred earlier mutation and point to the inferred later mutation. As in figure 2, D117N, K146D/N, L187V, Q194P, and D225N are mutations under positive selection for resistance to carbapenem antibiotics, altering the conformation of the active site, and in the case of, Q194, affecting dimer formation (see Results and Discussion, OXA β-lactamase networks). In all three protein clusters, directed phylogeny-based networks further reduced the number and strength of the links connecting most mutation pairs, whereas preserving strong links involving known adaptive mutations. In the case of the TEM and CTX-M-3 clusters, the reduction in number and weight of links was subtler because the corresponding phylogenetic trees were shallow, but in the OXA-51-like cluster the trend toward reduction while preserving key functional links was most striking. Indeed, OXA-51’s directed phylogeny-based network preserved strong links connecting D225N with E36V, E36V with L167V; and Q194P with A48V. These adaptive mutations are described in detail above. The higher proportion of known adaptive mutations among the most highly connected pairs suggests that the directed phylogeny-based network improves on the undirected phylogeny-based network by reducing the number of spurious interactions. Surprisingly, given the large size of the OXA-51 cluster, the directed phylogeny-based network largely preserved the mutation assignments to each of the four major communities seen in the alignment-based and in the undirected phylogeny-based networks. Central Links and Fitness-Increasing Double Mutants in the OXA-51 Cluster NAPA was used to identify linked pairs of mutations that were central in the OXA-51 alignment-based, and the undirected/directed phylogeny-based networks. We investigated whether there was enrichment for fitness-increasing double mutants in these central links. Because a double mutant found as a clinical isolate is highly likely to have been selected for increased fitness, we looked for pairs of mutations that occurred together against a wild-type background in the OXA-51 clinical isolates. We found that while the total number of central network links dramatically decreased from alignment-based to phylogeny-based undirected to phylogeny-based directed, the number of central links observed in clinical isolates was similar for all three networks (table 1). Therefore, the percent of central links that were clinically observed as double mutants increased (4%, 8%, and 14%, respectively). This observation suggests that undirected phylogeny-based networks are enriched for functionally relevant links, and that directed networks represent even further enrichment. Table 1. Network Link and Path Centralities Identify Fitness-Increasing Double Mutants in OXA-51. Network Type  Total Number of Network Links  # Central Network Links  Proportion of Double Mutants Observed as Clinical Isolates   By Network Link Centrality  By Frequency of Constituent Mutations  Alignment-based  413  408  3.9% (16 of 408)  2.9% (14 of 408)  Phylogeny-based, undirected  168  141  7.8% (11 of 141)  4.9% (7 of 141)  Phylogeny-based, directed  174  78  14% (11 of 78)  3.8% (3 of 78)  Network Type  Total Number of Network Links  # Central Network Links  Proportion of Double Mutants Observed as Clinical Isolates   By Network Link Centrality  By Frequency of Constituent Mutations  Alignment-based  413  408  3.9% (16 of 408)  2.9% (14 of 408)  Phylogeny-based, undirected  168  141  7.8% (11 of 141)  4.9% (7 of 141)  Phylogeny-based, directed  174  78  14% (11 of 78)  3.8% (3 of 78)  Note.—The alignment-based, phylogeny-based undirected, and phylogeny-based directed networks are compared by total number of links (pairs of linked nodes) in the network, number of links that are central to the network topology (see Materials and Methods), and the proportion of central links that occur as double mutants in clinical isolates. The hypothesis is that a double mutant that occurs against a wild-type background in OXA-51 clinical isolates represents a true positive fitness-increasing mutation pair. Ranking double mutants by frequency of their constituent mutations was a less effective predictor of fitness-increasing pairs than link centrality. The phylogeny-based networks and the directed phylogeny-based network in particular produced the highest proportion of network central links seen as clinical isolates. Table 1. Network Link and Path Centralities Identify Fitness-Increasing Double Mutants in OXA-51. Network Type  Total Number of Network Links  # Central Network Links  Proportion of Double Mutants Observed as Clinical Isolates   By Network Link Centrality  By Frequency of Constituent Mutations  Alignment-based  413  408  3.9% (16 of 408)  2.9% (14 of 408)  Phylogeny-based, undirected  168  141  7.8% (11 of 141)  4.9% (7 of 141)  Phylogeny-based, directed  174  78  14% (11 of 78)  3.8% (3 of 78)  Network Type  Total Number of Network Links  # Central Network Links  Proportion of Double Mutants Observed as Clinical Isolates   By Network Link Centrality  By Frequency of Constituent Mutations  Alignment-based  413  408  3.9% (16 of 408)  2.9% (14 of 408)  Phylogeny-based, undirected  168  141  7.8% (11 of 141)  4.9% (7 of 141)  Phylogeny-based, directed  174  78  14% (11 of 78)  3.8% (3 of 78)  Note.—The alignment-based, phylogeny-based undirected, and phylogeny-based directed networks are compared by total number of links (pairs of linked nodes) in the network, number of links that are central to the network topology (see Materials and Methods), and the proportion of central links that occur as double mutants in clinical isolates. The hypothesis is that a double mutant that occurs against a wild-type background in OXA-51 clinical isolates represents a true positive fitness-increasing mutation pair. Ranking double mutants by frequency of their constituent mutations was a less effective predictor of fitness-increasing pairs than link centrality. The phylogeny-based networks and the directed phylogeny-based network in particular produced the highest proportion of network central links seen as clinical isolates. High-frequency single mutations found in clinical isolates are likely to increase fitness in isolation or when combined with other mutations (Gerrish and Lenski 1998; Wilke and Adami 2001; Guthrie, et al. 2011; Jain, et al. 2011). However, our central link analysis was more effective at identifying fitness-increasing double mutants, than simply considering pair combinations of high-frequency mutations. We compared the enrichment for clinical isolates in central network links to enrichment for isolates in pairs of mutations that individually occurred at high-frequency (see Materials and Methods, A Mutation Frequency-based Control for Central Network Paths). A list of double mutants ranked according to network link centrality was compared with a list of double mutants ranked by the product of individual frequencies. The proportion of double mutants seen as clinical isolates was always many fold lower in the frequency-based list (table 1). In addition, the directed phylogeny-based network produced the highest proportion of network central links seen as clinical isolates, suggesting that consideration of the temporal ordering of adaptive trajectories provided additional information for identifying fitness increasing double mutants. Central Paths and Fitness-Increasing Triple Mutants in the OXA-51 Cluster Generalizing the hypothesis above from double mutants to triple mutants, we looked for fitness-increasing triple mutants, using paths through the network that consisted of three linked nodes, ranked by betweenness centrality (see Materials and Methods, Network Analysis). NAPA was used to compute triplet centrality for the OXA-51 alignment-based, undirected phylogeny-based, and directed phylogeny-based networks, and the frequency-based control was extended to triplets (table 2). Although the number of central triplets decreased as we moved from alignment-based to undirected and directed phylogeny-based networks, the fraction of triplets that had been observed in clinical isolates increased. The trend matched what was observed in the double mutants and suggests that our conclusions based on mutation pairs may be generalizable to higher order interactions. Table 2. Network Link and Path Centralities Identify Fitness-Increasing Triple Mutants in OXA-51. Network Type  Total Number of Network Triplet Paths  # Central Network Triplet Paths  Proportion of Triple Mutants Observed as Clinical Isolates   By Triplet Network Path Centrality  By Frequency of Constituent Mutations  Alignment-based  8184  6696  1.2% (80 of 6696)  0.16% (11 of 6696)  Phylogeny-based, undirected  1278  954  4.2% (40 of 954)  0.9% (8 of 954)  Phylogeny-based, directed  568  158  8.2% (13 of 158)  3.8% (6 of 158)  Network Type  Total Number of Network Triplet Paths  # Central Network Triplet Paths  Proportion of Triple Mutants Observed as Clinical Isolates   By Triplet Network Path Centrality  By Frequency of Constituent Mutations  Alignment-based  8184  6696  1.2% (80 of 6696)  0.16% (11 of 6696)  Phylogeny-based, undirected  1278  954  4.2% (40 of 954)  0.9% (8 of 954)  Phylogeny-based, directed  568  158  8.2% (13 of 158)  3.8% (6 of 158)  Note.—The network link analysis extended to central network paths consisting of two adjacent links (three nodes linked in a path) representing triple mutants. Table 2. Network Link and Path Centralities Identify Fitness-Increasing Triple Mutants in OXA-51. Network Type  Total Number of Network Triplet Paths  # Central Network Triplet Paths  Proportion of Triple Mutants Observed as Clinical Isolates   By Triplet Network Path Centrality  By Frequency of Constituent Mutations  Alignment-based  8184  6696  1.2% (80 of 6696)  0.16% (11 of 6696)  Phylogeny-based, undirected  1278  954  4.2% (40 of 954)  0.9% (8 of 954)  Phylogeny-based, directed  568  158  8.2% (13 of 158)  3.8% (6 of 158)  Network Type  Total Number of Network Triplet Paths  # Central Network Triplet Paths  Proportion of Triple Mutants Observed as Clinical Isolates   By Triplet Network Path Centrality  By Frequency of Constituent Mutations  Alignment-based  8184  6696  1.2% (80 of 6696)  0.16% (11 of 6696)  Phylogeny-based, undirected  1278  954  4.2% (40 of 954)  0.9% (8 of 954)  Phylogeny-based, directed  568  158  8.2% (13 of 158)  3.8% (6 of 158)  Note.—The network link analysis extended to central network paths consisting of two adjacent links (three nodes linked in a path) representing triple mutants. Comparing Predictions by Network Type Our metric of predictive success is the proportion of central network paths representing double or triple mutants that have been observed as clinical isolates (tables 1 and 2). We expect that combinations of mutations that appear as clinical isolates have been selected for conferring increased antibiotic resistance. Therefore, they can be used to evaluate the predictive success of a network centrality metric. However, they cannot be applied as a gold standard to compare false positive and false negative rates of different networks. Evolution has not necessarily discovered all possible fit sequences for a protein cluster, thus the number of sequenced clinical isolates increases over time and is incomplete. Network predictions that a central path is fitness-increasing are not guaranteed to be false positives if they do not match a clinical isolate in current databases. Although known clinical isolates that are not predicted by network centralities are indeed false negatives, the false negative rates of alignment-based, undirected phylogeny-based, and directed phylogeny-based networks cannot be reliably compared. The reason is that for a given protein cluster, each network type will have a different size, and larger networks are more likely to have lower false negative rates by chance (because they have more central links). By considering the proportion of central paths in clinical isolates for each network type, it is possible to compare networks of different types and sizes. Utility and Potential Pitfalls of Phylogeny-Based Networks Spurious interactions between mutations resulting from inheritance, rather than functional constraints, are a major issue for networks that only consider the alignment of extant sequences and not the inferred phylogeny. Indeed, if only alignments are used, a pair of mutations co-occurring in multiple sequences may be erroneously inferred to have been independently selected multiple times, when in fact it was fixed only once early on in evolution of the protein cluster. When NAPA is used to construct a phylogeny-based network, only co-occurrences within the same path from an ancestral to a root node in the phylogenetic tree are considered (see Materials and Methods, Network Construction). Therefore, a mutation pair that was fixed early on will receive a co-occurrence count (link weight) of 1, while a pair that is positively selected and recurs in multiple different sequence contexts (multiple nonoverlapping paths) will receive a higher co-occurrence count. Potential caveats with using phylogenetic trees to reconstruct networks of functionally interacting pairs of mutations occur in protein clusters exhibiting convergent evolution, where multiple sequences can be ancestral to a given descendant sequence. Convergent evolution cannot be represented with tree models, which require that every protein sequence have a single ancestor. However, in our directed mutation network model, the single ancestor constraint is lifted, and we allow mutations from multiple ancestors to be linked to the same descendant mutation. The only remaining caveat is that some temporal orderings of mutation pairs based on the phylogenetic tree may not have been properly resolved. For this reason, the directed phylogeny-based network for the TEM cluster, which is known to exhibit convergent evolution, has mutation pairs with links pointing in either direction, and often with similar weights (supplementary fig. 1, Supplementary Material online). Interpreting Network Output: Prioritizing Mutations and Central Paths The network models assign a network centrality score to each mutation, but the raw score may not be sufficient to identify a tractable set of prioritized mutations, particularly in a large sequence cluster. Mutation scores can be thresholded by transforming them to P values, which quantify the probability of observing the score (or a better score) by chance. This can be accomplished by generating an equivalent Erdős–Rényi (ER) random network model and calculating a distribution of degree centrality scores, which in turn can be used to compute the P value for any score of interest. Equivalently, the distribution can be calculated analytically (e.g., as in Garlaschelli 2009). NAPA provides P values for mutation degree centrality scores by default. A standard P value threshold of statistical significance, such as p ≤ 0.05, can then be applied. To our knowledge, there is no statistically principled method for fast computation of P values for path centralities. Using k-path centrality scores to rank double or complex mutants (links or paths) (see Materials and Methods, Network Analysis) is likely to yield an intractably long list, particularly for large networks. A much shorter list can be generated by using shortest-path centrality scores, and they can be simply prioritized by considering only paths with centrality scores >0. To reduce the number of paths further, paths that do not contain mutations with significant degree centralities, can be filtered out at this stage. Finally, as NAPA also finds communities of densely linked mutations, a researcher may choose to focus on a specific community of interest within a larger network and prioritize the central mutations/links/paths in that community. Key Properties of Adaptive Evolution Derived from NAPA We used NAPA to study protein adaptation in three clusters of β-lactamase homologs, which have all experienced recent evolutionary radiation (TEM ESBL, CTX-M-3, OXA-51). We hypothesized that each cluster originated in a diversification event that depended on a few key mutations, which would appear in the networks as highly connected nodes. Indeed, in TEM and CTX-M, the most highly connected nodes represented mutations well documented as playing important roles in adaptive evolution, based on functional and structural studies (fig. 1 and supplementary tables 1 and 2, Supplementary Material online). In OXA-51, which is the least studied of the three clusters, the highly connected nodes were mutations previously identified as positively selected by nonsynonymous to synonymous ratio (dN/dS) and whose functional importance has been rationalized by inspection of OXA-51 protein structures (fig. 2 and supplementary table 3, Supplementary Material online) (Evans and Amyes 2014). Additionally, in each of the clusters we analyzed, different adaptive mutations had a tendency to be assigned to different network communities. These large, well-defined communities may correspond to alternative evolutionary strategies, or to divergent selective pressures. There were notable differences in the global network structure between the three protein clusters, reflecting distinct characteristics of adaptive evolution in each cluster. In TEM and CTX-M, despite a small number of mutations, there were many strong links between mutations in different communities. This network structure could reflect convergent adaptive trajectories, which cover a small number of possible adaptive sequence solutions. In this scenario, evolution has explored numerous combinations of a small set of distinct mutations, including the major gain-of-function mutations at residues G238 and R164 in TEM, or D240 and P167 in CTX-M. It is possible further sequence space could not be explored due to genetic constraints. Escape from this limited region of sequence space may require engineered solutions, such as initial selection for decreased resistance in a directed evolution experiment (Steinberg and Ostermeier 2016). In OXA-51, there is a larger number of nodes, and network communities appear to be better separated, with fewer strong links across different communities. This global structure could reflect divergent adaptive trajectories resulting from a combination of multiple factors, such as equivalent adaptive solutions and complex selections (multiple hosts, multiple β-lactam drugs, and fluctuating selective pressures). OXA β-lactamases in general may be more versatile in their evolution of new functions, as other members of the OXA family have also evolved ESBL activity independently. This versatility may have arisen in response to exposure to different types of β-lactamases than TEM or CTX-M. Studies of mutation coevolution and covariation in proteins have largely focused on pairs of compensatory mutations that preserve protein function and fold (de Juan et al. 2013). These pairs tend to be in physical contact, and they play a critical role in the structural constraints governing protein folding and stability. Recently, improvements in their identification have advanced the field of protein structure prediction (Hopf et al. 2014). In contrast, NAPA identifies pairs and higher order combinations of mutations whose dependencies contribute to the evolution of new functions. Based on analysis of available high-resolution (<2 Å) protein X-ray crystal structures in the Protein Data Bank (Berman et al. 2000) (PDB ID 1m40 for TEM-1, Minasov et al. 2002; PDB ID 5kzh for OXA-51, June et al. 2016), we find that most of these mutations do not form physical contacts. The mean pairwise distance between mutated residues in the pairs predicted as coevolving by NAPA was 18.3 Å for TEM-1 and 27.7 Å for OXA-51. In summary, we have developed NAPA, a new method and open source software package to analyze protein evolution under strong selective pressures with network modeling. We demonstrate here how it can be applied to help characterize the evolutionary process of three β-lactamase protein clusters as they acquire mutations and develop resistance to antibiotic treatments in the clinic. NAPA was able to identify functionally important mutations and their interactions as well as characterize higher order interactions in likely trajectories leading to increased fitness. NAPA also revealed important differences between the clusters, with respect to mechanisms by which sequence space was explored in their adaptive evolution. In the future, we expect to apply NAPA to additional homologous clusters of proteins rapidly evolving new functions under strong selective pressures. Materials and Methods Data Collection TEM sequences were downloaded from the Lahey clinic database (http://www.lahey.org/Studies/; last accessed May 24, 2015). CTX-M Class A and OXA Class D sequences were downloaded from the Beta-Lactamase Data Resources section of the NCBI Pathogen Detection database (ftp://ftp.ncbi.nlm.nih.gov/pathogen/betalactamases/; last accessed April 10, 2016). To directly compare the alignment- and phylogeny-based approaches, we excluded laboratory-engineered sequences, which could not be incorporated into the same phylogenetic model as sequences from naturally occurring clinical isolates. Because we used coding DNA sequences for phylogenetic inference, only the mutants for which the nucleotide sequence was available were included. Therefore, the total number of sequences used was less than in our previous report (Guthrie et al. 2011). For Class A β-lactamase sequences, codon positions were renumbered to correspond to the standard Ambler numbering scheme (Ambler et al. 1991). For the Class D sequences, there is no single standard scheme and we renumbered codons based on a structural alignment, as suggested in (Evans and Amyes 2014). Clusters within CTX-M and OXA families were identified with a neighbor-joining tree and matched to known phylogenetic groups. Network Construction We constructed networks of functionally interacting mutations from either protein multiple sequence alignments, resulting in undirected alignment-based co-occurrence networks, or from a phylogeny with associated alignments of internal and leaf phylogeny node protein sequences, resulting in directed or undirected phylogeny-based networks (supplementary fig. 3, Supplementary Material online). The “build” module of NAPA was used to construct all networks, setting the network type (“net_type”) to either “aln” for alignment-based, or “phylo” for phylogeny-based networks. The link type (“edge_type”) was set to undirected (available for both, alignment- and phylogeny-based, networks) or directed (only available for phylogeny-based networks). To construct alignment-based networks, input protein sequences were aligned with ClustalW, using TEM-1, CTX-M-3, and OXA-269 as the ancestral sequence for each cluster (Thompson et al. 1994). The networks were constructed based on missense mutations only, so that each network node represents an amino acid substitution at a specific codon position. At each position in the alignment, amino acid residues that did not match the ancestral sequence were identified, and gapped positions were skipped. Each mutation was represented as a network node, and two nodes were linked in the network if the corresponding mutations co-occurred in at least one distinct alignment sequence. The link was weighted by the number of sequences in which the mutation pair co-occurred (supplementary fig. 4, Supplementary Material online). To construct phylogeny-based networks, we first built outgroup-rooted phylogenies with reconstructed ancestral sequences on the internal phylogeny nodes (see Phylogeny and Ancestral Sequence reconstruction). If coding DNA is used to build the phylogeny, every sequence at an internal or leaf node is translated to protein sequence. Given an input phylogeny with reconstructed internal sequences, we identified the missense mutations differentiating each internal sequence from its immediate descendants (can include other internal or leaf node sequences). Pairs of mutations co-occurring along a single path from an ancestor to a unique descendant were initially gathered. An empirically determined distance threshold was then applied, such that pairs of mutations that occurred on the same path but too far apart on the tree were not counted as co-occurring. For our choice of distance threshold, we wanted to ensure that all mutation pairs, in which one mutation immediately followed the other on a given tree path are included. The distance threshold was therefore defined as the maximum sum of the lengths of any pair of adjacent links in the phylogeny. For all three protein clusters 0.01 was used as the threshold that guaranteed inclusion of all adjacent links on the phylogeny. Similar to the alignment-based network, phylogeny-based network links were weighted by mutation co-occurrence counts and limited to mutation pairs co-occurring on a single phylogeny path, within the chosen distance threshold (supplementary fig. 5, Supplementary Material online). Network Visualization All visualization was done in Cytoscape 3.5.1 (Shannon et al. 2003), using default layouts and applying visual mappings to nodes and links. A continuous mapping was applied to node size, which is proportional to that node’s weighted degree centrality. Node colors are based on a discrete mapping of its community assignment. Link thickness proportional to that link’s weight in the network using a continuous mapping. Network Analysis All analysis was done in the “analyze” module of NAPA turning on both the options for community detection (“cluster_nodes” and “calculate_centralities” both set to “yes”). The community partitioning algorithms are based on Louvain hierarchical modularity optimization (Blondel et al. 2008) as implemented in NetworkX (Hagberg et al. 2008). Community-finding algorithms optimize a metric known as modularity, which quantifies the difference between link densities within versus between communities (Blondel et al. 2008). As fewer established methods for community detection exist in directed networks, these networks were automatically converted in NAPA to their undirected representation before community detection. For identification of putative functionally important mutations the weighted degree centrality of the corresponding nodes was used:   CSv=Σu=1Nauvwuv, (1) where u and v are nodes in the network, auv is 1 if u and v are linked, otherwise 0, and wuv is the weight of the link between u and v. We used an analytical null to assess the significance of a node’s weighted centrality. The null distribution is based on Erdős–Rényi (ER) random networks with constrained number of nodes and probability of link formation (Garlaschelli 2009). To find central sets of connected nodes in the network, the k-path betweenness centrality was applied (Alahakoon et al. 2011). Unlike the more widely used shortest-path betweenness centrality, k-path betweenness centrality is based on multiple random walks of up to k steps, rather than only the minimum distance walks through the network. For sets of network nodes connected in a network path P, both k-path and shortest-path betweenness centrality are defined as:   CBP=∑s,t ∉ Pσst(P)σst, (2) where σst(P) is the number of distinct paths (shortest or random, respectively) connecting all pairs of network nodes (s, t) passing through path P. σst is the total number of (shortest or random) paths connecting nodes s and t. P can consist of a single node, and edge, or multiple adjacent edges. A Mutation Frequency-Based Control for Central Network Paths To assess the contribution of network modeling to the prediction of fitness increasing combinations of mutations, we compared a list of double mutants ranked by network link centrality to a list of double mutants ranked by the product of individual frequencies. These top-ranked frequency-based double mutants were used as our control. Because the number of pairwise combinations of mutations is much greater than the network links with centrality >0, we applied a threshold to the ranked frequency-based list: We picked the top n mutation pairs, as ranked by frequency, with n being the number of links that were central in the network. This allowed us to directly compare the enrichment in both rankings. Phylogeny and Ancestral Sequence Reconstruction NAPA can build phylogeny-based networks from either a single phylogenetic tree or a Bayesian ensemble of trees sampled from the posterior probability distribution. Due to low-sequence divergence in the TEM and CTX-M-3 clusters, the coding DNA rather than protein sequence alignment was used as input for phylogeny inference and ancestral reconstruction. The phylogenetic trees in each protein cluster were outgroup rooted. A closely related protein cluster to the protein of interest was found by constructing an identity-based neighbor joining (NJ) tree from the alignment of the protein cluster with a set of homologous protein clusters. For TEM, the NJ trees included SHV and CARB-type β-lactamases; for CTX-M-3 and OXA-51-like the NJ tree included all CTX-M and OXA clusters, respectively. In this way, the SHV-1, CTX-M-14, and OXA-213 coding DNA sequences were identified as outgroups for the TEM, CTX-M-3, and CTX-M-51-like clusters, respectively. The TEM phylogeny was inferred using the MrBayes Metropolis-coupled MCMC method (Huelsenbeck and Ronquist 2001), whereas the CTX-M-3 and OXA-51-like phylogenies were constructed with GARLi, a genetic algorithm for maximum likelihood inference method (Zwickl 2006). For each cluster, the input coding DNA sequence alignment was partitioned by codon position (Lanfear et al. 2012) and the generalized time reversible model of nucleotide substitution with gamma-distributed rates and a proportion of invariant (invariable) sites (GTR + G+I) was used. For the TEM cluster, six independent runs were performed in MrBayes, each for 30 million generations, including 15 million generation burn in. Trees were thinned to every 20,000th generation, to remove autocorrelation between phylogeny parameters. Burn-in and thinning parameters were determined from standard MCMC convergence diagnostics. The genetic algorithm run parameters for the CTX-M-3 and OXA-51-like clusters were population size of 4 individuals, selection intensity of 0.25, 2 million generations. After phylogenies were reconstructed for each cluster, coding DNA sequences on the internal nodes of the trees (ancestral sequences) were inferred by maximum likelihood (Knight et al. 2007). The same nucleotide substitution model (GTR + G+I) was applied as the one used for phylogeny reconstruction. The leaf and reconstructed internal node sequences were translated to protein sequences and included in the input to NAPA, along with the phylogenetic trees (supplementary fig. 3, Supplementary Material online). Network Analysis of Protein Adaptation All methods described in this work, are available for nonprofit use in an open source software package Network Analysis of Protein Adaptation (NAPA) at https://github.com/KarchinLab/NAPA. For a protein cluster of interest, the user provides a multiple sequence alignment or an alignment with a corresponding phylogenetic tree or tree ensemble and reconstructed ancestral sequences. NAPA produces a network model of the cluster, where each mutation is represented as a network node, and where mutation co-occurrences are represented as links between pairs of nodes. The weight of each link corresponds to the inferred strength of functional association between the corresponding pair of mutations. Temporal ordering can be inferred from the phylogenetic tree, based on the order in which individual pairs of mutations are ordered on the tree. The network model specification is exported as a simple text file, which can be imported into network visualization software, such as Cytoscape (Shannon et al. 2003). Additionally, NAPA provides customized tools, based on graph theory, to identify highly connected nodes, mutation communities, and multiple network centrality metrics to assess interactions of likely functional relevance. Acknowledgments This work was supported by the National Science Foundation ABI Innovation award (#1262435). Supplementary Material Supplementary data are available at Molecular Biology and Evolution online. References Alahakoon T, Tripathi R, Kourtellis N, Simha R, Iamnitchi A. 2011. K-path centrality: A new centrality measure in social networks. Proceedings of the 4th Workshop on Social Network Systems. p. 1. ACM Ambler RP, Ambler RP, Coulson AF, Frere JM, Ghuysen JM, Joris B, Forsman M, Levesque RC, Tiraby G, Waley SG. 1991. A standard numbering scheme for the class A beta-lactamases. Philos Trans R Soc Lond B Biol Sci . 276 (Pt 1): 269– 270. Bank C, Matuszewski S, Hietpas RT, Jensen JD. 2016. On the (un)predictability of a large intragenic fitness landscape. Proc Natl Acad Sci U S A.  113( 49): 14085– 14090. Google Scholar CrossRef Search ADS PubMed  Barlow M, Hall BG. 2002. Phylogenetic analysis shows that the OXA beta-lactamase genes have been on plasmids for millions of years. J Mol Evol.  55( 3): 314– 321. Google Scholar CrossRef Search ADS PubMed  Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. 2000. The Protein Data Bank. Nucleic Acids Res.  28( 1): 235– 242. Google Scholar CrossRef Search ADS PubMed  Bershtein S, Segal M, Bekerman R, Tokuriki N, Tawfik DS. 2006. Robustness-epistasis link shapes the fitness landscape of a randomly drifting protein. Nature  444( 7121): 929– 932. Google Scholar CrossRef Search ADS PubMed  Blazquez J, Morosini MI, Negri MC, Gonzalez-Leiza M, Baquero F. 1995. Single amino acid replacements at positions altered in naturally occurring extended-spectrum TEM beta-lactamases. Antimicrob Agents Chemother . 39( 1): 145– 149. Google Scholar CrossRef Search ADS PubMed  Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre E. 2008. Fast unfolding of communities in large networks. J Stat Mech.  2008( 10): P10008. Google Scholar CrossRef Search ADS   Bonnet R. 2004. Growing group of extended-spectrum beta-lactamases: the CTX-M enzymes. Antimicrob Agents Chemother.  48( 1): 1– 14. Google Scholar CrossRef Search ADS PubMed  Brown NG, Pennington JM, Huang W, Ayvaz T, Palzkill T. 2010. Multiple global suppressors of protein stability defects facilitate the evolution of extended-spectrum TEM beta-lactamases. J Mol Biol.  404( 5): 832– 846. Google Scholar CrossRef Search ADS PubMed  Bush K. 2013. Proliferation and significance of clinically relevant beta-lactamases. Ann N Y Acad Sci.  1277: 84– 90. Google Scholar CrossRef Search ADS PubMed  Bush K, Jacoby GA. 2010. Updated functional classification of beta-lactamases. Antimicrob Agents Chemother.  54( 3): 969– 976. Google Scholar CrossRef Search ADS PubMed  Camps M, Herman A, Loh E, Loeb LA. 2007. Genetic constraints on protein evolution. Crit Rev Biochem Mol Biol.  42( 5): 313– 326. Google Scholar CrossRef Search ADS PubMed  Carneiro M, Hartl DL. 2010. Colloquium papers: adaptive landscapes and protein evolution. Proc Natl Acad Sci U S A.  107(Suppl 1): 1747– 1751. Google Scholar CrossRef Search ADS PubMed  Chakrabarti S, Panchenko AR. 2010. Structural and functional roles of coevolved sites in proteins. PLoS One  5( 1): e8591. Google Scholar CrossRef Search ADS PubMed  Chen Y, Bonnet R, Shoichet BK. 2007. The acylation mechanism of CTX-M beta-lactamase at 0.88 a resolution. J Am Chem Soc.  129( 17): 5378– 5380. Google Scholar CrossRef Search ADS PubMed  Chen Y, Delmas J, Sirot J, Shoichet B, Bonnet R. 2005. Atomic resolution structures of CTX-M beta-lactamases: extended spectrum activities from increased mobility and decreased stability. J Mol Biol.  348( 2): 349– 362. Google Scholar CrossRef Search ADS PubMed  Crona K, Greene D, Barlow M. 2013. The peaks and geometry of fitness landscapes. J Theor Biol.  317: 1– 10. Google Scholar CrossRef Search ADS PubMed  de Juan D, Pazos F, Valencia A. 2013. Emerging methods in protein co-evolution. Nat Rev Genet.  14( 4): 249– 261. Google Scholar CrossRef Search ADS PubMed  de Visser JA, Krug J. 2014. Empirical fitness landscapes and the predictability of evolution. Nat Rev Genet.  15( 7): 480– 490. Google Scholar CrossRef Search ADS PubMed  Dellus-Gur E, Elias M, Caselli E, Prati F, Salverda ML, de Visser JAG, Fraser JS, Tawfik DS. 2015. Negative epistasis and evolvability in TEM-1 β-lactamase–the thin line between an enzyme’s conformational freedom and disorder. J Mol Biol.  427( 14): 2396– 2409. Google Scholar CrossRef Search ADS PubMed  Dellus-Gur E, Toth-Petroczy A, Elias M, Tawfik DS. 2013. What makes a protein fold amenable to functional innovation? Fold polarity and stability trade-offs. J Mol Biol.  425( 14): 2609– 2621. Google Scholar CrossRef Search ADS PubMed  Delmas J, Chen Y, Prati F, Robin F, Shoichet BK, Bonnet R. 2008. Structure and dynamics of CTX-M enzymes reveal insights into substrate accommodation by extended-spectrum beta-lactamases. J Mol Biol.  375( 1): 192– 201. Google Scholar CrossRef Search ADS PubMed  Delmas J, Leyssene D, Dubois D, Birck C, Vazeille E, Robin F, Bonnet R. 2010. Structural insights into substrate recognition and product expulsion in CTX-M enzymes. J Mol Biol.  400( 1): 108– 120. Google Scholar CrossRef Search ADS PubMed  DePristo MA, Hartl DL, Weinreich DM. 2007. Mutational reversions during adaptive protein evolution. Mol Biol Evol.  24( 8): 1608– 1610. Google Scholar CrossRef Search ADS PubMed  DePristo MA, Weinreich DM, Hartl DL. 2005. Missense meanderings in sequence space: a biophysical view of protein evolution. Nat Rev Genet.  6( 9): 678– 687. Google Scholar CrossRef Search ADS PubMed  Docquier JD, Mangani S. 2016. Structure-Function Relationships of Class D Carbapenemases. Curr Drug Targets  17( 9): 1061– 1071. Google Scholar CrossRef Search ADS PubMed  Evans BA, Amyes SG. 2014. OXA beta-lactamases. Clin Microbiol Rev.  27( 2): 241– 263. Google Scholar CrossRef Search ADS PubMed  Firnberg E, Labonte JW, Gray JJ, Ostermeier M. 2014. A comprehensive, high-resolution map of a gene’s fitness landscape. Mol Biol Evol.  31( 6): 1581– 1592. Google Scholar CrossRef Search ADS PubMed  Fogle CA, Nagle JL, Desai MM. 2008. Clonal interference, multiple mutations and adaptation in large asexual populations. Genetics  180( 4): 2163– 2173. Google Scholar CrossRef Search ADS PubMed  Galleni M, Lamotte-Brasseur J, Raquet X, Dubus A, Monnaie D, Knox JR, Frere JM. 1995. The enigmatic catalytic mechanism of active-site serine beta-lactamases. Biochem Pharmacol.  49( 9): 1171– 1178. Google Scholar CrossRef Search ADS PubMed  Garlaschelli D. 2009. The weighted random graph model. N J Phys . 11( 7): 073005. Google Scholar CrossRef Search ADS   Gerrish PJ, Lenski RE. 1998. The fate of competing beneficial mutations in an asexual population. Genetica  102–103( 1–6): 127– 144. Google Scholar CrossRef Search ADS PubMed  Guthrie VB, Allen J, Camps M, Karchin R. 2011. Network models of TEM beta-lactamase mutations coevolving under antibiotic selection show modular structure and anticipate evolutionary trajectories. PLoS Comput Biol.  7( 9): e1002184. Google Scholar CrossRef Search ADS PubMed  Hagberg AA, Schult DA, Swart PJ. 2008. Exploring network structure, dynamics, and function using NetworkX. In: Proceedings of the 7th Python in Science Conference (SciPy2008) : 11– 15. He D, Chiou J, Zeng Z, Liu L, Chen X, Zeng L, Chan EW, Liu JH, Chen S. 2015. Residues distal to the active site contribute to enhanced catalytic activity of variant and hybrid beta-lactamases derived from CTX-M-14 and CTX-M-15. Antimicrob Agents Chemother.  59( 10): 5976– 5983. Google Scholar CrossRef Search ADS PubMed  Hopf TA, Scharfe CP, Rodrigues JP, Green AG, Kohlbacher O, Sander C, Bonvin AM, Marks DS. 2014. Sequence co-evolution gives 3D contacts and structures of protein complexes. eLife  3: e03430. Google Scholar CrossRef Search ADS   Huang W, Palzkill T. 1997. A natural polymorphism in beta-lactamase is a global suppressor. Proc Natl Acad Sci U S A.  94( 16): 8801– 8806. Google Scholar CrossRef Search ADS PubMed  Huelsenbeck JP, Ronquist F. 2001. MRBAYES: bayesian inference of phylogenetic trees. Bioinformatics  17( 8): 754– 755. Google Scholar CrossRef Search ADS PubMed  Jain K, Krug J, Park SC. 2011. Evolutionary advantage of small populations on complex fitness landscapes. Evolution  65( 7): 1945– 1955. Google Scholar CrossRef Search ADS PubMed  Jansen G, Barbosa C, Schulenburg H. 2013. Experimental evolution as an efficient tool to dissect adaptive paths to antibiotic resistance. Drug Resist Updat.  16( 6): 96– 107. Google Scholar CrossRef Search ADS PubMed  June CM, Muckenthaler TJ, Schroder EC, Klamer ZL, Wawrzak Z, Powers RA, Szarecka A, Leonard DA. 2016. The structure of a doripenem-bound OXA-51 class D beta-lactamase variant with enhanced carbapenemase activity. Protein Sci.  25( 12): 2152– 2163. Google Scholar CrossRef Search ADS PubMed  Knight R, Maxwell P, Birmingham A, Carnes J, Caporaso JG, Easton BC, Eaton M, Hamady M, Lindsay H, Liu Z, et al.   2007. PyCogent: a toolkit for making sense from sequence. Genome Biol.  8( 8): R171. Google Scholar CrossRef Search ADS PubMed  Kryazhimskiy S, Dushoff J, Bazykin GA, Plotkin JB. 2011. Prevalence of epistasis in the evolution of influenza A surface proteins. PLoS Genet.  7( 2): e1001301. Google Scholar CrossRef Search ADS PubMed  Lanfear R, Calcott B, Ho SY, Guindon S. 2012. Partitionfinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol Biol Evol.  29( 6): 1695– 1701. Google Scholar CrossRef Search ADS PubMed  Lee BC, Park K, Kim D. 2008. Analysis of the residue-residue coevolution network and the functionally important residues in proteins. Proteins  72( 3): 863– 872. Google Scholar CrossRef Search ADS PubMed  Legendre P, Makarenkov V. 2002. Reconstruction of biogeographic and evolutionary networks using reticulograms. Syst Biol.  51( 2): 199– 216. Google Scholar CrossRef Search ADS PubMed  Lynch M, Ackerman MS, Gout JF, Long H, Sung W, Thomas WK, Foster PL. 2016. Genetic drift, selection and the evolution of the mutation rate. Nat Rev Genet.  17( 11): 704– 714. Google Scholar CrossRef Search ADS PubMed  Matagne A, Dubus A, Galleni M, Frere JM. 1999. The beta-lactamase cycle: a tale of selective pressure and bacterial ingenuity. Nat Prod Rep.  16( 1): 1– 19. Google Scholar CrossRef Search ADS PubMed  Matagne A, Frere JM. 1995. Contribution of mutant analysis to the understanding of enzyme catalysis: the case of class A beta-lactamases. Biochim Biophys Acta  1246( 2): 109– 127. Google Scholar CrossRef Search ADS PubMed  Matagne A, Lamotte-Brasseur J, Frere JM. 1998. Catalytic properties of class A beta-lactamases: efficiency and diversity. Biochem J.  330 (Pt 2): 581– 598. Google Scholar CrossRef Search ADS PubMed  Minasov G, Wang X, Shoichet BK. 2002. An ultrahigh resolution structure of TEM-1 beta-lactamase suggests a role for Glu166 as the general base in acylation. J Am Chem Soc.  124( 19): 5333– 5340. Google Scholar CrossRef Search ADS PubMed  Mira PM, Crona K, Greene D, Meza JC, Sturmfels B, Barlow M. 2015. Rational design of antibiotic treatment plans: a treatment strategy for managing evolution and reversing resistance. PLoS One  10( 5): e0122283. Google Scholar CrossRef Search ADS PubMed  Naas T, Oueslati S, Bonnin RA, Dabos ML, Zavala A, Dortet L, Retailleau P, Iorga BI. 2017. Beta-lactamase database (BLDB) – structure and function. J Enzyme Inhib Med Chem.  32( 1): 917– 919. Google Scholar CrossRef Search ADS PubMed  Novais Â, Comas I, Baquero F, Cantón R, Coque TM, Moya A, González-Candelas F, Galán J-C. 2010. Evolutionary trajectories of beta-lactamase CTX-M-1 cluster enzymes: predicting antibiotic resistance. PLoS Pathog.  6( 1): e1000735. Google Scholar CrossRef Search ADS PubMed  Orr HA. 2005. The genetic theory of adaptation: a brief history. Nat Rev Genet.  6( 2): 119– 127. Google Scholar CrossRef Search ADS PubMed  Page MG. 2008. Extended-spectrum beta-lactamases: structure and kinetic mechanism. Clin Microbiol Infect . 14: 63– 74. Google Scholar CrossRef Search ADS PubMed  Petit A, Maveyraud L, Lenfant F, Samama JP, Labia R, Masson JM. 1995. Multiple substitutions at position 104 of beta-lactamase TEM-1: assessing the role of this residue in substrate specificity. Biochem J . 305( 1): 33– 40. Google Scholar CrossRef Search ADS PubMed  Poirel L, Potron A, Nordmann P. 2012. OXA-48-like carbapenemases: the phantom menace. J Antimicrob Chemother.  67( 7): 1597– 1606. Google Scholar CrossRef Search ADS PubMed  Pollock DD, Taylor WR, Goldman N. 1999. Coevolving protein residues: maximum likelihood identification and relationship to structure. J Mol Biol.  287( 1): 187– 198. Google Scholar CrossRef Search ADS PubMed  Saakian DB, Kirakosyan Z, Hu CK. 2012. Biological evolution in a multidimensional fitness landscape. Phys Rev E Stat Nonlin Soft Matter Phys.  86( 3): 031920. Google Scholar CrossRef Search ADS PubMed  Salverda ML, Dellus E, Gorter FA, Debets AJ, van der Oost J, Hoekstra RF, Tawfik DS, de Visser JA. 2011. Initial mutations direct alternative pathways of protein evolution. PLoS Genet.  7( 3): e1001321. Google Scholar CrossRef Search ADS PubMed  Salverda MLM, de Visser JAGM, Barlow M. 2010. Natural evolution of TEM-1 β-lactamase: experimental reconstruction and clinical relevance. FEMS Microbiol Rev . 34( 6): 1015– 1036. Google Scholar CrossRef Search ADS PubMed  Schenk MF, Szendro IG, Krug J, de Visser JA. 2012. Quantifying the adaptive potential of an antibiotic resistance enzyme. PLoS Genet.  8( 6): e1002783. Google Scholar CrossRef Search ADS PubMed  Schenk MF, Witte S, Salverda MLM, Koopmanschap B, Krug J, de Visser JAGM. 2015. Role of pleiotropy during adaptation of TEM-1 beta-lactamase to two novel antibiotics. Evol Appl.  8( 3): 248– 260. Google Scholar CrossRef Search ADS PubMed  Schneider KD, Karpen ME, Bonomo RA, Leonard DA, Powers RA. 2009. The 1.4 A crystal structure of the class D beta-lactamase OXA-1 complexed with doripenem. Biochemistry  48( 50): 11840– 11847. Google Scholar CrossRef Search ADS PubMed  Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T. 2003. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res.  13( 11): 2498– 2504. Google Scholar CrossRef Search ADS PubMed  Soskine M, Tawfik DS. 2010. Mutational effects and the evolution of new protein functions. Nat Rev Genet.  11( 8): 572– 582. Google Scholar CrossRef Search ADS PubMed  Steinberg B, Ostermeier M. 2016. Environmental changes bridge evolutionary valleys. Sci Adv.  2( 1): e1500921. Google Scholar CrossRef Search ADS PubMed  Thompson JD, Higgins DG, Gibson TJ. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res.  22( 22): 4673– 4680. Google Scholar CrossRef Search ADS PubMed  Tomatis PE, Fabiane SM, Simona F, Carloni P, Sutton BJ, Vila AJ. 2008. Adaptive protein evolution grants organismal fitness by improving catalysis and flexibility. Proc Natl Acad Sci U S A.  105( 52): 20605– 20610. Google Scholar CrossRef Search ADS PubMed  Unckless RL, Orr HA. 2009. The population genetics of adaptation: multiple substitutions on a smooth fitness landscape. Genetics  183( 3): 1079– 1086. Google Scholar CrossRef Search ADS PubMed  Walther-Rasmussen J, Hoiby N. 2006. OXA-type carbapenemases. J Antimicrob Chemother.  57( 3): 373– 383. Google Scholar CrossRef Search ADS PubMed  Weinreich DM, Lan Y, Wylie CS, Heckendorn RB. 2013. Should evolutionary geneticists worry about higher-order epistasis? Curr Opin Genet Dev  23: 700– 707. Google Scholar CrossRef Search ADS PubMed  Wilke CO, Adami C. 2001. Interaction between directional epistasis and average mutational effects. Proc Biol Sci  268: 1469– 1474. Google Scholar CrossRef Search ADS PubMed  Zwickl DJ. 2006. Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion. PhD diss. http://hdl.handle.net/2152/2666: [The University of Texas at Austin]. © The Author(s) 2018. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Molecular Biology and Evolution Oxford University Press

Network Analysis of Protein Adaptation: Modeling the Functional Impact of Multiple Mutations

Loading next page...
 
/lp/ou_press/network-analysis-of-protein-adaptation-modeling-the-functional-impact-DOWMGF0ukX
Publisher
Oxford University Press
Copyright
© The Author(s) 2018. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
ISSN
0737-4038
eISSN
1537-1719
D.O.I.
10.1093/molbev/msy036
Publisher site
See Article on Publisher Site

Abstract

Abstract The evolution of new biochemical activities frequently involves complex dependencies between mutations and rapid evolutionary radiation. Mutation co-occurrence and covariation have previously been used to identify compensating mutations that are the result of physical contacts and preserve protein function and fold. Here, we model pairwise functional dependencies and higher order interactions that enable evolution of new protein functions. We use a network model to find complex dependencies between mutations resulting from evolutionary trade-offs and pleiotropic effects. We present a method to construct these networks and to identify functionally interacting mutations in both extant and reconstructed ancestral sequences (Network Analysis of Protein Adaptation). The time ordering of mutations can be incorporated into the networks through phylogenetic reconstruction. We apply NAPA to three distantly homologous β-lactamase protein clusters (TEM, CTX-M-3, and OXA-51), each of which has experienced recent evolutionary radiation under substantially different selective pressures. By analyzing the network properties of each protein cluster, we identify key adaptive mutations, positive pairwise interactions, different adaptive solutions to the same selective pressure, and complex evolutionary trajectories likely to increase protein fitness. We also present evidence that incorporating information from phylogenetic reconstruction and ancestral sequence inference can reduce the number of spurious links in the network, whereas preserving overall network community structure. The analysis does not require structural or biochemical data. In contrast to function-preserving mutation dependencies, which are frequently from structural contacts, gain-of-function mutation dependencies are most commonly between residues distal in protein structure. functional impact of multiple mutations, network analysis, protein adaptation, beta lactamase, antibiotic resistance Introduction The acquisition of new functions, a process known as genetic adaptation, occurs through the accumulation of mutations. The functional impacts of mutations on organisms or their protein constituents depends on both the current selective pressures and on the specific sequence contexts in which these mutations occur (Soskine and Tawfik 2010). In proteins, complex functional codependencies between mutations are pervasive but are also challenging to study, due to the large number of combinations involved (Weinreich et al. 2013). A unique opportunity to advance our understanding of protein evolutionary dynamics presents itself in single-cell, haploid microorganisms. Large populations produce a high genetic diversity even when baseline mutation frequency is low, as is the case in bacteria and yeast (Lynch et al. 2016). Drawing from this genetic diversity, positive selection can lead to the fixation of a sequence of mutations increasing fitness (DePristo et al. 2005, 2007; Orr 2005; Unckless and Orr 2009). Organismal fitness can be used as a convenient proxy for activity in these model systems, allowing for genotype–phenotype profiling across large sections of sequence space. In this way, the principles governing key aspects of evolution, including genetic robustness, evolvability, and predictability of evolutionary trajectories can be studied (Bershtein et al. 2006; Carneiro and Hartl 2010; Saakian et al. 2012; Crona et al. 2013; Firnberg et al. 2014; Bank et al. 2016). In previous work, we used the evolution of the TEM β-lactamase cluster as a model system for the acquisition of new functions (Guthrie et al. 2011). Because of low genetic barriers (in some cases, a single mutation can result in the acquisition of resistance to a new β-lactam antibiotic), TEM β-lactamases are a convenient model system for genetic adaptation (Camps et al. 2007; Fogle et al. 2008; Tomatis et al. 2008; Jansen et al. 2013; de Visser and Krug 2014). We analyzed both the inhibitor-resistant β-lactamase and extended-spectrum β-lactamase (ESBL) phenotypes, with a greater focus on the ESBL phenotype. In TEM ESBLs, the acquisition of new extended-spectrum activity leads to broader substrate specificity, in terms of resistance to additional antibiotics such as cefotaxime, which can be used to monitor adaptation (Salverda et al. 2011; Schenk et al. 2012; Mira et al. 2015). In addition, experimental and clinical evolution of extended-spectrum resistance represent analogous adaptation events in this model system, which allows the use of hybrid (experimental and clinical) mutant databases (Salverda et al. 2010; Guthrie et al. 2011). Our basic hypothesis was that recent evolutionary radiation, particularly in the case of drug resistance genes, is indicative of strong positive selection dominated by one or a few discrete types of selections. We assumed that the co-occurrence of pairs of mutations in the same TEM sequence was an indication of a potential functional interaction between them. Pairwise mutation co-occurrence was used to construct a network model. Higher order (n = 3) combinations with a positive effect on fitness were then explored by analyzing the network structure. We showed that central network links represent pairs of mutations leading to increased protein fitness. More generally, central paths consisting of three linked mutations were helpful in identifying functionally important triple mutants in TEM (Guthrie et al. 2011). Although our approach had similarities to earlier network models of mutation covariation over long evolutionary distances (Lee et al. 2008; Chakrabarti and Panchenko 2010), these earlier studies focused on dependencies which preserve protein function and structural constraints across large protein families. In contrast, we focused on dependencies critical to the evolution of new, specialized functions evolving in small clusters of highly related proteins. A limitation of mutation co-occurrence network models, regardless of their focus on function preservation or function acquisition, is that without an underlying phylogenetic model, it is not possible to determine how many times a pair of mutations co-occurred independently (Kryazhimskiy et al. 2011). The NAPA method presented here extends our previous work to incorporate phylogenetic reconstruction into network modeling. We apply it to three functionally related clusters of β-lactamase sequences: to the original TEM ESBL, we added the CTX-M-3, and the OXA-51 clusters. These three recent evolutionary radiations represent parallel evolution events under selection by different types of β-lactam antibiotics: oximino-cephalosporins (TEM ESBL, CTX-M-3) and carbapenems (OXA-51). For each protein cluster, we reconstruct the evolution of extant sequences from a putative ancestor and count mutations that co-occur along unique paths from ancestral to descendent sequences on the phylogenetic trees. The resulting networks are described here as “phylogeny-based,” although they are distinct from the well-known “phylogenetic networks,” which generalize phylogenetic trees (Legendre and Makarenkov 2002). By reducing noise caused by overrepresentation in the population caused by clonal expansion, the phylogeny-based network approach can potentially produce a more accurate count of parallel evolution events (Pollock et al. 1999). Furthermore, this approach intrinsically contains information about likely temporal ordering of mutations. The three β-lactamase clusters studied here are serine hydrolases, in which an acyl-enzyme intermediate between the β-lactam moiety and the conserved active-site serine (S70) is hydrolyzed by a reactive water, releasing the product of hydrolysis to regenerate the active site for the next turnover (Galleni et al. 1995; Chen et al. 2007; Schneider et al. 2009). The first two clusters (TEM ESBL and CTX-M-3) are Class A serine β-lactamases but are distantly homologous, and the third (OXA-51) belongs to the highly heterogeneous Class D (Bush and Jacoby 2010; Bush 2013). Each of the three clusters is the result of rapid genetic diversification, in response to widespread clinical use of newer β-lactam antibiotics. The original TEM had broad spectrum β-lactam resistance, i.e. resistance to penicillins and to first-generation cephalosporins (Matagne et al. 1999). TEM diversification appears to have been driven by expanded-spectrum cephalosporins and monobactams (Jacoby-Bush 2be group) and by β-lactamase inhibitors such as clavulamic acid or tazobactam (Jacoby-Bush 2br group) (Bush and Jacoby 2010). In contrast, the original wild-type CTX-M was already intrinsically resistant to a third-generation cephalosporin, cefotaxime (Delmas et al. 2010). Its recent adaptation appears to have been driven by mutations increasing its hydrolytic activity against ceftazidime/ceftriaxone while preserving activity against cefotaxime (Bonnet 2004). The OXA-51 cluster is a member of the OXA subfamily of serine β-lactamases, which are characterized by high activity against oxacyclin. This subfamily contains >500 unique variants (Naas et al. 2017). Some of these variants (such as the OXA-48 cluster) have acquired ESBL activity, pointing to cephalosporins or monobactams as the drivers of rapid genetic diversification (Poirel et al. 2012). The OXA-51 cluster studied here consistently shows carbapenemase activity, suggesting that its genetic diversification was driven by carbapenem use in the clinic (Evans and Amyes 2014; Docquier and Mangani 2016). Thus, while OXA-51 β-lactamases are functionally analogous to TEM and CTX-M-3, they have diverged substantially from these clusters and have experienced a different selection as a source for rapid diversification. Inclusion of the Class D OXA-51 supports the generalizability of NAPA, as it allows us to contrast and compare patterns of ESBL evolution in this cluster to the two Class A clusters. The NAPA method provides novel tools to build phylogeny-based mutation co-occurrence networks, and to derive fundamental properties of protein evolution from these networks. It can also be used to build networks based only on protein sequence alignments. In the three clusters analyzed here, NAPA identified key adaptive mutations against a background of dozens or even hundreds of reported mutations, as well as pairwise and higher order functional interactions. We were also able to predict higher order trajectories likely to increase fitness. These observations were consistent across all three clusters and we believe that they are more broadly applicable to proteins that have experienced rapid diversification under a dominant positive selection. Importantly, although we interpret our findings with respect to functional and structural information from the literature, our network analysis does not rely on structural or biochemical data. This feature makes our tools applicable to a wide range of proteins, many of which have limited structural and/or biochemical information. Results and Discussion TEM ESBL Networks To ascertain the possible advantages of phylogeny-based mutation co-occurrence networks with respect to networks built without a phylogenetic model (alignment-based networks), we generated both networks for TEM ESBLs (fig. 1A and B). Node centralities in both the alignment-based and the phylogeny-based networks identified mutations known to be critical for ESBL adaptation: G238S, E104K, and E240K (Page 2008; Dellus-Gur et al. 2013,, 2015) (supplementary table 1, Supplementary Material online). A fourth position also known to be important for adaptation, R164, appeared in different locations within the network, depending on the amino acid substitution: R164S, H, or C (Salverda et al. 2011; Dellus-Gur et al. 2013,, 2015). Mutations in these positions (164, 238, and 240) either move the Ω-loop (164) or reposition the β-strand β3 within the active site (238, 240) (Matagne and Frere 1995; Matagne et al. 1998). Additionally, both networks identified mutations that compensate for pleiotropic effects of multiple adaptive mutations. Although not directly adaptive, these mutations are critical under strong selective pressure. Mutation M182T represents a global suppressor that compensates for decreased thermodynamic stability introduced by the adaptive mutations (Huang and Palzkill 1997; Brown et al. 2010). Mutations at residue E104 have both gain-of-function and compensatory effects (Petit et al. 1995; Schenk et al. 2015). Although both networks reinforce the identification of functionally important mutations, there are differences in the connectivity of specific mutations. In this framework, connectivity refers to “hub-ness” in the network, as measured by the weighted degree centrality. On one hand, the connectivity of a mutation known to have a minor role in adaptation, such as Q39K or T265M, tends to be notably lower in the phylogeny-based network. Q39K is mildly adaptive, and its high prevalence in the clinic is likely due to clonal expansion (Blazquez et al. 1995), and T265M is a global suppressor, which occurs infrequently (Brown et al. 2010; Salverda et al. 2010). On the other hand, the network connectivities (weighted degrees) of two of the critical adaptive mutations, R164C and R164H, are increased in the phylogeny-based network. More specifically, the link between R163H and R164C is also stronger in the phylogeny-based network, which makes functional sense as these are adjacent residues in the Ω-loop. Fig. 1. View largeDownload slide NAPA-generated mutation networks of TEM ESBL and CTX-M-3 adaptive evolution. (A) TEM ESBL alignment-based network. (B) TEM ESBL undirected phylogeny-based network. (C) CTX-M-3 alignment-based network. (D) CTX-M-3 undirected phylogeny-based network. Nodes represent individual mutations with respect to the ancestral protein sequence, and links represent predicted functional interactions (see Materials and Methods). Node size is proportional to the node’s connectivity (weighted degree centrality); color represents the assignment of a node to a densely linked community in the network. Link thickness is proportional to the link’s weight, which corresponds to mutation co-occurrence counts. In alignment-based networks, links are weighted by the number of times two mutations are seen together in the same protein sequence. In undirected phylogeny-based networks, links are weighted by the number of times two mutations are found in the same path leading from an ancestral to descendant sequence on the phylogenetic tree. In TEM, G238S, and R164S/H/C, and E240K are adaptive mutations, increasing resistance to extended-spectrum antibiotics, whereas M182T, and T265M are stabilizing and compensate for the decreased thermodynamic instability induced by some of the adaptive mutations. E104K is both adaptive and stabilizing. In CTX-M-3, D240G increases resistance to the extended-spectrum antibiotic cefotaxime while reducing resistance to ceftazidime. Mutations in P167 are antagonistic to D240G, increasing resistance to ceftazidime, with little effect on resistance to cefotaxime. A77V stabilizes the active site. N114D, A140S, and D288N are frequently found in combination with D240G or P167S/T leading to dual resistance to cefotaxime and ceftazidime (Results and Discussion, ESBL TEM β-lactamase networks). Fig. 1. View largeDownload slide NAPA-generated mutation networks of TEM ESBL and CTX-M-3 adaptive evolution. (A) TEM ESBL alignment-based network. (B) TEM ESBL undirected phylogeny-based network. (C) CTX-M-3 alignment-based network. (D) CTX-M-3 undirected phylogeny-based network. Nodes represent individual mutations with respect to the ancestral protein sequence, and links represent predicted functional interactions (see Materials and Methods). Node size is proportional to the node’s connectivity (weighted degree centrality); color represents the assignment of a node to a densely linked community in the network. Link thickness is proportional to the link’s weight, which corresponds to mutation co-occurrence counts. In alignment-based networks, links are weighted by the number of times two mutations are seen together in the same protein sequence. In undirected phylogeny-based networks, links are weighted by the number of times two mutations are found in the same path leading from an ancestral to descendant sequence on the phylogenetic tree. In TEM, G238S, and R164S/H/C, and E240K are adaptive mutations, increasing resistance to extended-spectrum antibiotics, whereas M182T, and T265M are stabilizing and compensate for the decreased thermodynamic instability induced by some of the adaptive mutations. E104K is both adaptive and stabilizing. In CTX-M-3, D240G increases resistance to the extended-spectrum antibiotic cefotaxime while reducing resistance to ceftazidime. Mutations in P167 are antagonistic to D240G, increasing resistance to ceftazidime, with little effect on resistance to cefotaxime. A77V stabilizes the active site. N114D, A140S, and D288N are frequently found in combination with D240G or P167S/T leading to dual resistance to cefotaxime and ceftazidime (Results and Discussion, ESBL TEM β-lactamase networks). Adaptive mutations are found in corresponding densely linked communities in both networks. One community contains E104K and G238S; a second contains E240K and R164S; and a third contains R164H. G238S and mutations at position R164 confer a fitness advantage but are known to be mutually exclusive (Dellus-Gur et al. 2015), consistent with the idea that network communities may reflect contingent evolution. Conversely, links between mutations in different communities point to overlaps in distinct adaptive trajectories. Such overlaps could result from functional interactions that are preserved in different sequence contexts, for example compensatory interactions mediated by structurally stabilizing mutations. Interestingly, there are fewer links between communities in the phylogeny-based networks, in and communities are better defined as measured by the network community modularity metric (see Materials and Methods, Network Analysis). The modularity of the alignment-based network is only 0.13, compared with 0.28 for the phylogeny-based network. CTX-M β-Lactamase Networks CTX-M β-lactamases are mechanistically similar to TEMs, but only distantly homologous. CTX-Ms have independently acquired ESBL resistance during their more recent evolutionary radiation (Bonnet 2004). Using NAPA, we generated a network model of the CTX-M-3 cluster, which contains the largest number of CTX-M sequences. In the alignment-based network (fig. 1C), five nodes stand out for their high connectivity: D240G, A77V, N114D, A140S, and D288N. Of these, D240G and A77V have the highest connectivity, suggesting their impacts on adaptive evolution of protein function are most prominent. Indeed, D240G is known to confer increased resistance to ceftazidime, by allowing this bulkier β-lactam to be more easily accommodated within the active site (Chen et al. 2005; Delmas et al. 2008). The A77 residue is known to contribute to the formation of the active site cavity, providing key hydrophobic interactions that stabilize its core architecture (He et al. 2015). In the CTX-M-3 phylogeny-based network (fig. 1D), when compared with the alignment-based network, both the number of links and link weights are dramatically reduced. At the same time, strong links between the known adaptive mutations A77V, D240G, and D288N are maintained. Both networks identified the same two communities, containing D240G, and A77V-D288N, respectively). However, the size of the A77V-D288N community was significantly reduced in the phylogeny-based network, from 16 to 8 linked nodes. Of the latter 8 nodes, five have been previously shown to occur in the two distinct adaptive solutions leading to increased ceftazidime resistance (Novais et al. 2010). One of these contains D240G, whereas the other requires P167S. P167S is preserved in both networks, but only a very weak signal is detected for trajectories containing this adaptive mutation. P167S shifts resistance from cefotaxime to ceftazidime and leads to a local fitness optimum. Therefore, mutants containing this mutation have very limited potential to continue to evolve higher levels of resistance to both antibiotics (Novais et al. 2010). The star-like topology in the D240G community is preserved in both the alignment-based and the phylogeny-based networks, and it reveals a set of mutations existing only in the context of this adaptive mutation. The increased mobility and flexibility of the D240G mutation is known to lead to a substantial side-chain reorganization (Delmas et al. 2008). The mutations that are weakly linked to D240G likely lead to minor structural readjustments, and possibly represent a variety of redundant solutions. OXA β-Lactamase Networks OXA β-lactamases are a diverse subfamily of serine β-lactamases characterized by high activity against oxacyclin and are undergoing very rapid evolutionary radiation (Barlow and Hall 2002; Evans and Amyes 2014). The OXA-51 cluster is the largest and most diverse within molecular class D and appears to be evolving under carbapenem selective pressure (Walther-Rasmussen and Hoiby 2006; Evans and Amyes 2014; Docquier and Mangani 2016). The OXA-51 alignment-based (fig. 2A) and undirected phylogeny-based (fig. 2B) networks are both much larger than for the TEM and CTX-M-3 clusters. This greater complexity reflects the increased genetic diversification of OXA-51 evolution. Compared with alignment-based network, OXA-51’s phylogeny-based network has fewer highly connected nodes. This appears to be a general trend, as this reduction was also seen in TEM and CTX-M (fig. 1B and D). Fig. 2. View largeDownload slide NAPA-generated undirected mutation networks of OXA-51 adaptive evolution. (A) OXA-51 alignment-based network. (B) OXA-51 undirected phylogeny-based network. Node and link representations are the same as figure 1. OXA-51, D117N, K146D/N, L187V, Q194P, and D225N mutations are under positive selection for resistance to carbapenem antibiotics, altering the conformation of the active site, and in the case of, Q194, affecting dimer formation (Results and Discussion, OXA β-lactamase networks). Fig. 2. View largeDownload slide NAPA-generated undirected mutation networks of OXA-51 adaptive evolution. (A) OXA-51 alignment-based network. (B) OXA-51 undirected phylogeny-based network. Node and link representations are the same as figure 1. OXA-51, D117N, K146D/N, L187V, Q194P, and D225N mutations are under positive selection for resistance to carbapenem antibiotics, altering the conformation of the active site, and in the case of, Q194, affecting dimer formation (Results and Discussion, OXA β-lactamase networks). Although network communities are largely preserved across all OXA-51-like networks, they are better defined in the phylogeny-based networks, for which the network modularity increases from 0.37 to 0.55 (see Materials and Methods, Network Analysis). Five large communities (containing five or more nodes) were detected in both the alignment- and the phylogeny-based networks. One community (green in fig. 2) contains the highly connected node D225N. A second community (red), contains D105N, D117N, and K146N. A third community (blue), contains E36D and Q57H. A fourth community (orange) clusters around A48V and Q194P, and the fifth community contains another mutation in residue Q194 (Q194L). According to a recent review, the D225N and D117N mutations are in positions likely to be involved in specificity for the carbapenem β-lactam antibiotic, and D225N increases the flexibility of the active site entrance. The L187V mutation (in the undirected phylogeny-based network green community) increases the space between the S80 catalytic residue and the end of the active site cleft; K146 is in the active site Ω-loop spanning positions 144 through 160, with mutations K146N and K146D affecting substrate specificity. Finally, Q194 is in a region known to be important for dimer formation (residues 180 through 200) (Evans and Amyes 2014). Directed Phylogeny-Based Networks Although mutation co-occurrence networks can point to functional dependencies, they do not include information about the temporal order in which two mutations were fixed. NAPA derives temporal ordering from inferred ordering of mutations along distinct paths from internal to leaf sequences in a phylogenetic tree (see Materials and Methods, Network Construction). This approach was most successful in the OXA-51-like network (fig. 3), whereas both the TEM and CTX-M-3 clusters exhibited low genetic diversity and convergent evolution. The phylogenetic trees for these clusters were very shallow and did not contain sufficient information to determine the preferred temporal ordering of mutations on the tree. For this reason, the phylogeny-based networks contained directed links pointing in both directions and with similar weights (supplementary figs. 1 and 2, Supplementary Material online). Fig. 3. View largeDownload slide NAPA-generated directed mutation network of OXA-51 adaptive evolution. Directed phylogeny-based network, where link direction represents the temporal ordering of mutations on the phylogenetic tree. Node and link representations are the same as figure 1, with addition of link direction. Link arrows start at the inferred earlier mutation and point to the inferred later mutation. As in figure 2, D117N, K146D/N, L187V, Q194P, and D225N are mutations under positive selection for resistance to carbapenem antibiotics, altering the conformation of the active site, and in the case of, Q194, affecting dimer formation (see Results and Discussion, OXA β-lactamase networks). Fig. 3. View largeDownload slide NAPA-generated directed mutation network of OXA-51 adaptive evolution. Directed phylogeny-based network, where link direction represents the temporal ordering of mutations on the phylogenetic tree. Node and link representations are the same as figure 1, with addition of link direction. Link arrows start at the inferred earlier mutation and point to the inferred later mutation. As in figure 2, D117N, K146D/N, L187V, Q194P, and D225N are mutations under positive selection for resistance to carbapenem antibiotics, altering the conformation of the active site, and in the case of, Q194, affecting dimer formation (see Results and Discussion, OXA β-lactamase networks). In all three protein clusters, directed phylogeny-based networks further reduced the number and strength of the links connecting most mutation pairs, whereas preserving strong links involving known adaptive mutations. In the case of the TEM and CTX-M-3 clusters, the reduction in number and weight of links was subtler because the corresponding phylogenetic trees were shallow, but in the OXA-51-like cluster the trend toward reduction while preserving key functional links was most striking. Indeed, OXA-51’s directed phylogeny-based network preserved strong links connecting D225N with E36V, E36V with L167V; and Q194P with A48V. These adaptive mutations are described in detail above. The higher proportion of known adaptive mutations among the most highly connected pairs suggests that the directed phylogeny-based network improves on the undirected phylogeny-based network by reducing the number of spurious interactions. Surprisingly, given the large size of the OXA-51 cluster, the directed phylogeny-based network largely preserved the mutation assignments to each of the four major communities seen in the alignment-based and in the undirected phylogeny-based networks. Central Links and Fitness-Increasing Double Mutants in the OXA-51 Cluster NAPA was used to identify linked pairs of mutations that were central in the OXA-51 alignment-based, and the undirected/directed phylogeny-based networks. We investigated whether there was enrichment for fitness-increasing double mutants in these central links. Because a double mutant found as a clinical isolate is highly likely to have been selected for increased fitness, we looked for pairs of mutations that occurred together against a wild-type background in the OXA-51 clinical isolates. We found that while the total number of central network links dramatically decreased from alignment-based to phylogeny-based undirected to phylogeny-based directed, the number of central links observed in clinical isolates was similar for all three networks (table 1). Therefore, the percent of central links that were clinically observed as double mutants increased (4%, 8%, and 14%, respectively). This observation suggests that undirected phylogeny-based networks are enriched for functionally relevant links, and that directed networks represent even further enrichment. Table 1. Network Link and Path Centralities Identify Fitness-Increasing Double Mutants in OXA-51. Network Type  Total Number of Network Links  # Central Network Links  Proportion of Double Mutants Observed as Clinical Isolates   By Network Link Centrality  By Frequency of Constituent Mutations  Alignment-based  413  408  3.9% (16 of 408)  2.9% (14 of 408)  Phylogeny-based, undirected  168  141  7.8% (11 of 141)  4.9% (7 of 141)  Phylogeny-based, directed  174  78  14% (11 of 78)  3.8% (3 of 78)  Network Type  Total Number of Network Links  # Central Network Links  Proportion of Double Mutants Observed as Clinical Isolates   By Network Link Centrality  By Frequency of Constituent Mutations  Alignment-based  413  408  3.9% (16 of 408)  2.9% (14 of 408)  Phylogeny-based, undirected  168  141  7.8% (11 of 141)  4.9% (7 of 141)  Phylogeny-based, directed  174  78  14% (11 of 78)  3.8% (3 of 78)  Note.—The alignment-based, phylogeny-based undirected, and phylogeny-based directed networks are compared by total number of links (pairs of linked nodes) in the network, number of links that are central to the network topology (see Materials and Methods), and the proportion of central links that occur as double mutants in clinical isolates. The hypothesis is that a double mutant that occurs against a wild-type background in OXA-51 clinical isolates represents a true positive fitness-increasing mutation pair. Ranking double mutants by frequency of their constituent mutations was a less effective predictor of fitness-increasing pairs than link centrality. The phylogeny-based networks and the directed phylogeny-based network in particular produced the highest proportion of network central links seen as clinical isolates. Table 1. Network Link and Path Centralities Identify Fitness-Increasing Double Mutants in OXA-51. Network Type  Total Number of Network Links  # Central Network Links  Proportion of Double Mutants Observed as Clinical Isolates   By Network Link Centrality  By Frequency of Constituent Mutations  Alignment-based  413  408  3.9% (16 of 408)  2.9% (14 of 408)  Phylogeny-based, undirected  168  141  7.8% (11 of 141)  4.9% (7 of 141)  Phylogeny-based, directed  174  78  14% (11 of 78)  3.8% (3 of 78)  Network Type  Total Number of Network Links  # Central Network Links  Proportion of Double Mutants Observed as Clinical Isolates   By Network Link Centrality  By Frequency of Constituent Mutations  Alignment-based  413  408  3.9% (16 of 408)  2.9% (14 of 408)  Phylogeny-based, undirected  168  141  7.8% (11 of 141)  4.9% (7 of 141)  Phylogeny-based, directed  174  78  14% (11 of 78)  3.8% (3 of 78)  Note.—The alignment-based, phylogeny-based undirected, and phylogeny-based directed networks are compared by total number of links (pairs of linked nodes) in the network, number of links that are central to the network topology (see Materials and Methods), and the proportion of central links that occur as double mutants in clinical isolates. The hypothesis is that a double mutant that occurs against a wild-type background in OXA-51 clinical isolates represents a true positive fitness-increasing mutation pair. Ranking double mutants by frequency of their constituent mutations was a less effective predictor of fitness-increasing pairs than link centrality. The phylogeny-based networks and the directed phylogeny-based network in particular produced the highest proportion of network central links seen as clinical isolates. High-frequency single mutations found in clinical isolates are likely to increase fitness in isolation or when combined with other mutations (Gerrish and Lenski 1998; Wilke and Adami 2001; Guthrie, et al. 2011; Jain, et al. 2011). However, our central link analysis was more effective at identifying fitness-increasing double mutants, than simply considering pair combinations of high-frequency mutations. We compared the enrichment for clinical isolates in central network links to enrichment for isolates in pairs of mutations that individually occurred at high-frequency (see Materials and Methods, A Mutation Frequency-based Control for Central Network Paths). A list of double mutants ranked according to network link centrality was compared with a list of double mutants ranked by the product of individual frequencies. The proportion of double mutants seen as clinical isolates was always many fold lower in the frequency-based list (table 1). In addition, the directed phylogeny-based network produced the highest proportion of network central links seen as clinical isolates, suggesting that consideration of the temporal ordering of adaptive trajectories provided additional information for identifying fitness increasing double mutants. Central Paths and Fitness-Increasing Triple Mutants in the OXA-51 Cluster Generalizing the hypothesis above from double mutants to triple mutants, we looked for fitness-increasing triple mutants, using paths through the network that consisted of three linked nodes, ranked by betweenness centrality (see Materials and Methods, Network Analysis). NAPA was used to compute triplet centrality for the OXA-51 alignment-based, undirected phylogeny-based, and directed phylogeny-based networks, and the frequency-based control was extended to triplets (table 2). Although the number of central triplets decreased as we moved from alignment-based to undirected and directed phylogeny-based networks, the fraction of triplets that had been observed in clinical isolates increased. The trend matched what was observed in the double mutants and suggests that our conclusions based on mutation pairs may be generalizable to higher order interactions. Table 2. Network Link and Path Centralities Identify Fitness-Increasing Triple Mutants in OXA-51. Network Type  Total Number of Network Triplet Paths  # Central Network Triplet Paths  Proportion of Triple Mutants Observed as Clinical Isolates   By Triplet Network Path Centrality  By Frequency of Constituent Mutations  Alignment-based  8184  6696  1.2% (80 of 6696)  0.16% (11 of 6696)  Phylogeny-based, undirected  1278  954  4.2% (40 of 954)  0.9% (8 of 954)  Phylogeny-based, directed  568  158  8.2% (13 of 158)  3.8% (6 of 158)  Network Type  Total Number of Network Triplet Paths  # Central Network Triplet Paths  Proportion of Triple Mutants Observed as Clinical Isolates   By Triplet Network Path Centrality  By Frequency of Constituent Mutations  Alignment-based  8184  6696  1.2% (80 of 6696)  0.16% (11 of 6696)  Phylogeny-based, undirected  1278  954  4.2% (40 of 954)  0.9% (8 of 954)  Phylogeny-based, directed  568  158  8.2% (13 of 158)  3.8% (6 of 158)  Note.—The network link analysis extended to central network paths consisting of two adjacent links (three nodes linked in a path) representing triple mutants. Table 2. Network Link and Path Centralities Identify Fitness-Increasing Triple Mutants in OXA-51. Network Type  Total Number of Network Triplet Paths  # Central Network Triplet Paths  Proportion of Triple Mutants Observed as Clinical Isolates   By Triplet Network Path Centrality  By Frequency of Constituent Mutations  Alignment-based  8184  6696  1.2% (80 of 6696)  0.16% (11 of 6696)  Phylogeny-based, undirected  1278  954  4.2% (40 of 954)  0.9% (8 of 954)  Phylogeny-based, directed  568  158  8.2% (13 of 158)  3.8% (6 of 158)  Network Type  Total Number of Network Triplet Paths  # Central Network Triplet Paths  Proportion of Triple Mutants Observed as Clinical Isolates   By Triplet Network Path Centrality  By Frequency of Constituent Mutations  Alignment-based  8184  6696  1.2% (80 of 6696)  0.16% (11 of 6696)  Phylogeny-based, undirected  1278  954  4.2% (40 of 954)  0.9% (8 of 954)  Phylogeny-based, directed  568  158  8.2% (13 of 158)  3.8% (6 of 158)  Note.—The network link analysis extended to central network paths consisting of two adjacent links (three nodes linked in a path) representing triple mutants. Comparing Predictions by Network Type Our metric of predictive success is the proportion of central network paths representing double or triple mutants that have been observed as clinical isolates (tables 1 and 2). We expect that combinations of mutations that appear as clinical isolates have been selected for conferring increased antibiotic resistance. Therefore, they can be used to evaluate the predictive success of a network centrality metric. However, they cannot be applied as a gold standard to compare false positive and false negative rates of different networks. Evolution has not necessarily discovered all possible fit sequences for a protein cluster, thus the number of sequenced clinical isolates increases over time and is incomplete. Network predictions that a central path is fitness-increasing are not guaranteed to be false positives if they do not match a clinical isolate in current databases. Although known clinical isolates that are not predicted by network centralities are indeed false negatives, the false negative rates of alignment-based, undirected phylogeny-based, and directed phylogeny-based networks cannot be reliably compared. The reason is that for a given protein cluster, each network type will have a different size, and larger networks are more likely to have lower false negative rates by chance (because they have more central links). By considering the proportion of central paths in clinical isolates for each network type, it is possible to compare networks of different types and sizes. Utility and Potential Pitfalls of Phylogeny-Based Networks Spurious interactions between mutations resulting from inheritance, rather than functional constraints, are a major issue for networks that only consider the alignment of extant sequences and not the inferred phylogeny. Indeed, if only alignments are used, a pair of mutations co-occurring in multiple sequences may be erroneously inferred to have been independently selected multiple times, when in fact it was fixed only once early on in evolution of the protein cluster. When NAPA is used to construct a phylogeny-based network, only co-occurrences within the same path from an ancestral to a root node in the phylogenetic tree are considered (see Materials and Methods, Network Construction). Therefore, a mutation pair that was fixed early on will receive a co-occurrence count (link weight) of 1, while a pair that is positively selected and recurs in multiple different sequence contexts (multiple nonoverlapping paths) will receive a higher co-occurrence count. Potential caveats with using phylogenetic trees to reconstruct networks of functionally interacting pairs of mutations occur in protein clusters exhibiting convergent evolution, where multiple sequences can be ancestral to a given descendant sequence. Convergent evolution cannot be represented with tree models, which require that every protein sequence have a single ancestor. However, in our directed mutation network model, the single ancestor constraint is lifted, and we allow mutations from multiple ancestors to be linked to the same descendant mutation. The only remaining caveat is that some temporal orderings of mutation pairs based on the phylogenetic tree may not have been properly resolved. For this reason, the directed phylogeny-based network for the TEM cluster, which is known to exhibit convergent evolution, has mutation pairs with links pointing in either direction, and often with similar weights (supplementary fig. 1, Supplementary Material online). Interpreting Network Output: Prioritizing Mutations and Central Paths The network models assign a network centrality score to each mutation, but the raw score may not be sufficient to identify a tractable set of prioritized mutations, particularly in a large sequence cluster. Mutation scores can be thresholded by transforming them to P values, which quantify the probability of observing the score (or a better score) by chance. This can be accomplished by generating an equivalent Erdős–Rényi (ER) random network model and calculating a distribution of degree centrality scores, which in turn can be used to compute the P value for any score of interest. Equivalently, the distribution can be calculated analytically (e.g., as in Garlaschelli 2009). NAPA provides P values for mutation degree centrality scores by default. A standard P value threshold of statistical significance, such as p ≤ 0.05, can then be applied. To our knowledge, there is no statistically principled method for fast computation of P values for path centralities. Using k-path centrality scores to rank double or complex mutants (links or paths) (see Materials and Methods, Network Analysis) is likely to yield an intractably long list, particularly for large networks. A much shorter list can be generated by using shortest-path centrality scores, and they can be simply prioritized by considering only paths with centrality scores >0. To reduce the number of paths further, paths that do not contain mutations with significant degree centralities, can be filtered out at this stage. Finally, as NAPA also finds communities of densely linked mutations, a researcher may choose to focus on a specific community of interest within a larger network and prioritize the central mutations/links/paths in that community. Key Properties of Adaptive Evolution Derived from NAPA We used NAPA to study protein adaptation in three clusters of β-lactamase homologs, which have all experienced recent evolutionary radiation (TEM ESBL, CTX-M-3, OXA-51). We hypothesized that each cluster originated in a diversification event that depended on a few key mutations, which would appear in the networks as highly connected nodes. Indeed, in TEM and CTX-M, the most highly connected nodes represented mutations well documented as playing important roles in adaptive evolution, based on functional and structural studies (fig. 1 and supplementary tables 1 and 2, Supplementary Material online). In OXA-51, which is the least studied of the three clusters, the highly connected nodes were mutations previously identified as positively selected by nonsynonymous to synonymous ratio (dN/dS) and whose functional importance has been rationalized by inspection of OXA-51 protein structures (fig. 2 and supplementary table 3, Supplementary Material online) (Evans and Amyes 2014). Additionally, in each of the clusters we analyzed, different adaptive mutations had a tendency to be assigned to different network communities. These large, well-defined communities may correspond to alternative evolutionary strategies, or to divergent selective pressures. There were notable differences in the global network structure between the three protein clusters, reflecting distinct characteristics of adaptive evolution in each cluster. In TEM and CTX-M, despite a small number of mutations, there were many strong links between mutations in different communities. This network structure could reflect convergent adaptive trajectories, which cover a small number of possible adaptive sequence solutions. In this scenario, evolution has explored numerous combinations of a small set of distinct mutations, including the major gain-of-function mutations at residues G238 and R164 in TEM, or D240 and P167 in CTX-M. It is possible further sequence space could not be explored due to genetic constraints. Escape from this limited region of sequence space may require engineered solutions, such as initial selection for decreased resistance in a directed evolution experiment (Steinberg and Ostermeier 2016). In OXA-51, there is a larger number of nodes, and network communities appear to be better separated, with fewer strong links across different communities. This global structure could reflect divergent adaptive trajectories resulting from a combination of multiple factors, such as equivalent adaptive solutions and complex selections (multiple hosts, multiple β-lactam drugs, and fluctuating selective pressures). OXA β-lactamases in general may be more versatile in their evolution of new functions, as other members of the OXA family have also evolved ESBL activity independently. This versatility may have arisen in response to exposure to different types of β-lactamases than TEM or CTX-M. Studies of mutation coevolution and covariation in proteins have largely focused on pairs of compensatory mutations that preserve protein function and fold (de Juan et al. 2013). These pairs tend to be in physical contact, and they play a critical role in the structural constraints governing protein folding and stability. Recently, improvements in their identification have advanced the field of protein structure prediction (Hopf et al. 2014). In contrast, NAPA identifies pairs and higher order combinations of mutations whose dependencies contribute to the evolution of new functions. Based on analysis of available high-resolution (<2 Å) protein X-ray crystal structures in the Protein Data Bank (Berman et al. 2000) (PDB ID 1m40 for TEM-1, Minasov et al. 2002; PDB ID 5kzh for OXA-51, June et al. 2016), we find that most of these mutations do not form physical contacts. The mean pairwise distance between mutated residues in the pairs predicted as coevolving by NAPA was 18.3 Å for TEM-1 and 27.7 Å for OXA-51. In summary, we have developed NAPA, a new method and open source software package to analyze protein evolution under strong selective pressures with network modeling. We demonstrate here how it can be applied to help characterize the evolutionary process of three β-lactamase protein clusters as they acquire mutations and develop resistance to antibiotic treatments in the clinic. NAPA was able to identify functionally important mutations and their interactions as well as characterize higher order interactions in likely trajectories leading to increased fitness. NAPA also revealed important differences between the clusters, with respect to mechanisms by which sequence space was explored in their adaptive evolution. In the future, we expect to apply NAPA to additional homologous clusters of proteins rapidly evolving new functions under strong selective pressures. Materials and Methods Data Collection TEM sequences were downloaded from the Lahey clinic database (http://www.lahey.org/Studies/; last accessed May 24, 2015). CTX-M Class A and OXA Class D sequences were downloaded from the Beta-Lactamase Data Resources section of the NCBI Pathogen Detection database (ftp://ftp.ncbi.nlm.nih.gov/pathogen/betalactamases/; last accessed April 10, 2016). To directly compare the alignment- and phylogeny-based approaches, we excluded laboratory-engineered sequences, which could not be incorporated into the same phylogenetic model as sequences from naturally occurring clinical isolates. Because we used coding DNA sequences for phylogenetic inference, only the mutants for which the nucleotide sequence was available were included. Therefore, the total number of sequences used was less than in our previous report (Guthrie et al. 2011). For Class A β-lactamase sequences, codon positions were renumbered to correspond to the standard Ambler numbering scheme (Ambler et al. 1991). For the Class D sequences, there is no single standard scheme and we renumbered codons based on a structural alignment, as suggested in (Evans and Amyes 2014). Clusters within CTX-M and OXA families were identified with a neighbor-joining tree and matched to known phylogenetic groups. Network Construction We constructed networks of functionally interacting mutations from either protein multiple sequence alignments, resulting in undirected alignment-based co-occurrence networks, or from a phylogeny with associated alignments of internal and leaf phylogeny node protein sequences, resulting in directed or undirected phylogeny-based networks (supplementary fig. 3, Supplementary Material online). The “build” module of NAPA was used to construct all networks, setting the network type (“net_type”) to either “aln” for alignment-based, or “phylo” for phylogeny-based networks. The link type (“edge_type”) was set to undirected (available for both, alignment- and phylogeny-based, networks) or directed (only available for phylogeny-based networks). To construct alignment-based networks, input protein sequences were aligned with ClustalW, using TEM-1, CTX-M-3, and OXA-269 as the ancestral sequence for each cluster (Thompson et al. 1994). The networks were constructed based on missense mutations only, so that each network node represents an amino acid substitution at a specific codon position. At each position in the alignment, amino acid residues that did not match the ancestral sequence were identified, and gapped positions were skipped. Each mutation was represented as a network node, and two nodes were linked in the network if the corresponding mutations co-occurred in at least one distinct alignment sequence. The link was weighted by the number of sequences in which the mutation pair co-occurred (supplementary fig. 4, Supplementary Material online). To construct phylogeny-based networks, we first built outgroup-rooted phylogenies with reconstructed ancestral sequences on the internal phylogeny nodes (see Phylogeny and Ancestral Sequence reconstruction). If coding DNA is used to build the phylogeny, every sequence at an internal or leaf node is translated to protein sequence. Given an input phylogeny with reconstructed internal sequences, we identified the missense mutations differentiating each internal sequence from its immediate descendants (can include other internal or leaf node sequences). Pairs of mutations co-occurring along a single path from an ancestor to a unique descendant were initially gathered. An empirically determined distance threshold was then applied, such that pairs of mutations that occurred on the same path but too far apart on the tree were not counted as co-occurring. For our choice of distance threshold, we wanted to ensure that all mutation pairs, in which one mutation immediately followed the other on a given tree path are included. The distance threshold was therefore defined as the maximum sum of the lengths of any pair of adjacent links in the phylogeny. For all three protein clusters 0.01 was used as the threshold that guaranteed inclusion of all adjacent links on the phylogeny. Similar to the alignment-based network, phylogeny-based network links were weighted by mutation co-occurrence counts and limited to mutation pairs co-occurring on a single phylogeny path, within the chosen distance threshold (supplementary fig. 5, Supplementary Material online). Network Visualization All visualization was done in Cytoscape 3.5.1 (Shannon et al. 2003), using default layouts and applying visual mappings to nodes and links. A continuous mapping was applied to node size, which is proportional to that node’s weighted degree centrality. Node colors are based on a discrete mapping of its community assignment. Link thickness proportional to that link’s weight in the network using a continuous mapping. Network Analysis All analysis was done in the “analyze” module of NAPA turning on both the options for community detection (“cluster_nodes” and “calculate_centralities” both set to “yes”). The community partitioning algorithms are based on Louvain hierarchical modularity optimization (Blondel et al. 2008) as implemented in NetworkX (Hagberg et al. 2008). Community-finding algorithms optimize a metric known as modularity, which quantifies the difference between link densities within versus between communities (Blondel et al. 2008). As fewer established methods for community detection exist in directed networks, these networks were automatically converted in NAPA to their undirected representation before community detection. For identification of putative functionally important mutations the weighted degree centrality of the corresponding nodes was used:   CSv=Σu=1Nauvwuv, (1) where u and v are nodes in the network, auv is 1 if u and v are linked, otherwise 0, and wuv is the weight of the link between u and v. We used an analytical null to assess the significance of a node’s weighted centrality. The null distribution is based on Erdős–Rényi (ER) random networks with constrained number of nodes and probability of link formation (Garlaschelli 2009). To find central sets of connected nodes in the network, the k-path betweenness centrality was applied (Alahakoon et al. 2011). Unlike the more widely used shortest-path betweenness centrality, k-path betweenness centrality is based on multiple random walks of up to k steps, rather than only the minimum distance walks through the network. For sets of network nodes connected in a network path P, both k-path and shortest-path betweenness centrality are defined as:   CBP=∑s,t ∉ Pσst(P)σst, (2) where σst(P) is the number of distinct paths (shortest or random, respectively) connecting all pairs of network nodes (s, t) passing through path P. σst is the total number of (shortest or random) paths connecting nodes s and t. P can consist of a single node, and edge, or multiple adjacent edges. A Mutation Frequency-Based Control for Central Network Paths To assess the contribution of network modeling to the prediction of fitness increasing combinations of mutations, we compared a list of double mutants ranked by network link centrality to a list of double mutants ranked by the product of individual frequencies. These top-ranked frequency-based double mutants were used as our control. Because the number of pairwise combinations of mutations is much greater than the network links with centrality >0, we applied a threshold to the ranked frequency-based list: We picked the top n mutation pairs, as ranked by frequency, with n being the number of links that were central in the network. This allowed us to directly compare the enrichment in both rankings. Phylogeny and Ancestral Sequence Reconstruction NAPA can build phylogeny-based networks from either a single phylogenetic tree or a Bayesian ensemble of trees sampled from the posterior probability distribution. Due to low-sequence divergence in the TEM and CTX-M-3 clusters, the coding DNA rather than protein sequence alignment was used as input for phylogeny inference and ancestral reconstruction. The phylogenetic trees in each protein cluster were outgroup rooted. A closely related protein cluster to the protein of interest was found by constructing an identity-based neighbor joining (NJ) tree from the alignment of the protein cluster with a set of homologous protein clusters. For TEM, the NJ trees included SHV and CARB-type β-lactamases; for CTX-M-3 and OXA-51-like the NJ tree included all CTX-M and OXA clusters, respectively. In this way, the SHV-1, CTX-M-14, and OXA-213 coding DNA sequences were identified as outgroups for the TEM, CTX-M-3, and CTX-M-51-like clusters, respectively. The TEM phylogeny was inferred using the MrBayes Metropolis-coupled MCMC method (Huelsenbeck and Ronquist 2001), whereas the CTX-M-3 and OXA-51-like phylogenies were constructed with GARLi, a genetic algorithm for maximum likelihood inference method (Zwickl 2006). For each cluster, the input coding DNA sequence alignment was partitioned by codon position (Lanfear et al. 2012) and the generalized time reversible model of nucleotide substitution with gamma-distributed rates and a proportion of invariant (invariable) sites (GTR + G+I) was used. For the TEM cluster, six independent runs were performed in MrBayes, each for 30 million generations, including 15 million generation burn in. Trees were thinned to every 20,000th generation, to remove autocorrelation between phylogeny parameters. Burn-in and thinning parameters were determined from standard MCMC convergence diagnostics. The genetic algorithm run parameters for the CTX-M-3 and OXA-51-like clusters were population size of 4 individuals, selection intensity of 0.25, 2 million generations. After phylogenies were reconstructed for each cluster, coding DNA sequences on the internal nodes of the trees (ancestral sequences) were inferred by maximum likelihood (Knight et al. 2007). The same nucleotide substitution model (GTR + G+I) was applied as the one used for phylogeny reconstruction. The leaf and reconstructed internal node sequences were translated to protein sequences and included in the input to NAPA, along with the phylogenetic trees (supplementary fig. 3, Supplementary Material online). Network Analysis of Protein Adaptation All methods described in this work, are available for nonprofit use in an open source software package Network Analysis of Protein Adaptation (NAPA) at https://github.com/KarchinLab/NAPA. For a protein cluster of interest, the user provides a multiple sequence alignment or an alignment with a corresponding phylogenetic tree or tree ensemble and reconstructed ancestral sequences. NAPA produces a network model of the cluster, where each mutation is represented as a network node, and where mutation co-occurrences are represented as links between pairs of nodes. The weight of each link corresponds to the inferred strength of functional association between the corresponding pair of mutations. Temporal ordering can be inferred from the phylogenetic tree, based on the order in which individual pairs of mutations are ordered on the tree. The network model specification is exported as a simple text file, which can be imported into network visualization software, such as Cytoscape (Shannon et al. 2003). Additionally, NAPA provides customized tools, based on graph theory, to identify highly connected nodes, mutation communities, and multiple network centrality metrics to assess interactions of likely functional relevance. Acknowledgments This work was supported by the National Science Foundation ABI Innovation award (#1262435). Supplementary Material Supplementary data are available at Molecular Biology and Evolution online. References Alahakoon T, Tripathi R, Kourtellis N, Simha R, Iamnitchi A. 2011. K-path centrality: A new centrality measure in social networks. Proceedings of the 4th Workshop on Social Network Systems. p. 1. ACM Ambler RP, Ambler RP, Coulson AF, Frere JM, Ghuysen JM, Joris B, Forsman M, Levesque RC, Tiraby G, Waley SG. 1991. A standard numbering scheme for the class A beta-lactamases. Philos Trans R Soc Lond B Biol Sci . 276 (Pt 1): 269– 270. Bank C, Matuszewski S, Hietpas RT, Jensen JD. 2016. On the (un)predictability of a large intragenic fitness landscape. Proc Natl Acad Sci U S A.  113( 49): 14085– 14090. Google Scholar CrossRef Search ADS PubMed  Barlow M, Hall BG. 2002. Phylogenetic analysis shows that the OXA beta-lactamase genes have been on plasmids for millions of years. J Mol Evol.  55( 3): 314– 321. Google Scholar CrossRef Search ADS PubMed  Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. 2000. The Protein Data Bank. Nucleic Acids Res.  28( 1): 235– 242. Google Scholar CrossRef Search ADS PubMed  Bershtein S, Segal M, Bekerman R, Tokuriki N, Tawfik DS. 2006. Robustness-epistasis link shapes the fitness landscape of a randomly drifting protein. Nature  444( 7121): 929– 932. Google Scholar CrossRef Search ADS PubMed  Blazquez J, Morosini MI, Negri MC, Gonzalez-Leiza M, Baquero F. 1995. Single amino acid replacements at positions altered in naturally occurring extended-spectrum TEM beta-lactamases. Antimicrob Agents Chemother . 39( 1): 145– 149. Google Scholar CrossRef Search ADS PubMed  Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre E. 2008. Fast unfolding of communities in large networks. J Stat Mech.  2008( 10): P10008. Google Scholar CrossRef Search ADS   Bonnet R. 2004. Growing group of extended-spectrum beta-lactamases: the CTX-M enzymes. Antimicrob Agents Chemother.  48( 1): 1– 14. Google Scholar CrossRef Search ADS PubMed  Brown NG, Pennington JM, Huang W, Ayvaz T, Palzkill T. 2010. Multiple global suppressors of protein stability defects facilitate the evolution of extended-spectrum TEM beta-lactamases. J Mol Biol.  404( 5): 832– 846. Google Scholar CrossRef Search ADS PubMed  Bush K. 2013. Proliferation and significance of clinically relevant beta-lactamases. Ann N Y Acad Sci.  1277: 84– 90. Google Scholar CrossRef Search ADS PubMed  Bush K, Jacoby GA. 2010. Updated functional classification of beta-lactamases. Antimicrob Agents Chemother.  54( 3): 969– 976. Google Scholar CrossRef Search ADS PubMed  Camps M, Herman A, Loh E, Loeb LA. 2007. Genetic constraints on protein evolution. Crit Rev Biochem Mol Biol.  42( 5): 313– 326. Google Scholar CrossRef Search ADS PubMed  Carneiro M, Hartl DL. 2010. Colloquium papers: adaptive landscapes and protein evolution. Proc Natl Acad Sci U S A.  107(Suppl 1): 1747– 1751. Google Scholar CrossRef Search ADS PubMed  Chakrabarti S, Panchenko AR. 2010. Structural and functional roles of coevolved sites in proteins. PLoS One  5( 1): e8591. Google Scholar CrossRef Search ADS PubMed  Chen Y, Bonnet R, Shoichet BK. 2007. The acylation mechanism of CTX-M beta-lactamase at 0.88 a resolution. J Am Chem Soc.  129( 17): 5378– 5380. Google Scholar CrossRef Search ADS PubMed  Chen Y, Delmas J, Sirot J, Shoichet B, Bonnet R. 2005. Atomic resolution structures of CTX-M beta-lactamases: extended spectrum activities from increased mobility and decreased stability. J Mol Biol.  348( 2): 349– 362. Google Scholar CrossRef Search ADS PubMed  Crona K, Greene D, Barlow M. 2013. The peaks and geometry of fitness landscapes. J Theor Biol.  317: 1– 10. Google Scholar CrossRef Search ADS PubMed  de Juan D, Pazos F, Valencia A. 2013. Emerging methods in protein co-evolution. Nat Rev Genet.  14( 4): 249– 261. Google Scholar CrossRef Search ADS PubMed  de Visser JA, Krug J. 2014. Empirical fitness landscapes and the predictability of evolution. Nat Rev Genet.  15( 7): 480– 490. Google Scholar CrossRef Search ADS PubMed  Dellus-Gur E, Elias M, Caselli E, Prati F, Salverda ML, de Visser JAG, Fraser JS, Tawfik DS. 2015. Negative epistasis and evolvability in TEM-1 β-lactamase–the thin line between an enzyme’s conformational freedom and disorder. J Mol Biol.  427( 14): 2396– 2409. Google Scholar CrossRef Search ADS PubMed  Dellus-Gur E, Toth-Petroczy A, Elias M, Tawfik DS. 2013. What makes a protein fold amenable to functional innovation? Fold polarity and stability trade-offs. J Mol Biol.  425( 14): 2609– 2621. Google Scholar CrossRef Search ADS PubMed  Delmas J, Chen Y, Prati F, Robin F, Shoichet BK, Bonnet R. 2008. Structure and dynamics of CTX-M enzymes reveal insights into substrate accommodation by extended-spectrum beta-lactamases. J Mol Biol.  375( 1): 192– 201. Google Scholar CrossRef Search ADS PubMed  Delmas J, Leyssene D, Dubois D, Birck C, Vazeille E, Robin F, Bonnet R. 2010. Structural insights into substrate recognition and product expulsion in CTX-M enzymes. J Mol Biol.  400( 1): 108– 120. Google Scholar CrossRef Search ADS PubMed  DePristo MA, Hartl DL, Weinreich DM. 2007. Mutational reversions during adaptive protein evolution. Mol Biol Evol.  24( 8): 1608– 1610. Google Scholar CrossRef Search ADS PubMed  DePristo MA, Weinreich DM, Hartl DL. 2005. Missense meanderings in sequence space: a biophysical view of protein evolution. Nat Rev Genet.  6( 9): 678– 687. Google Scholar CrossRef Search ADS PubMed  Docquier JD, Mangani S. 2016. Structure-Function Relationships of Class D Carbapenemases. Curr Drug Targets  17( 9): 1061– 1071. Google Scholar CrossRef Search ADS PubMed  Evans BA, Amyes SG. 2014. OXA beta-lactamases. Clin Microbiol Rev.  27( 2): 241– 263. Google Scholar CrossRef Search ADS PubMed  Firnberg E, Labonte JW, Gray JJ, Ostermeier M. 2014. A comprehensive, high-resolution map of a gene’s fitness landscape. Mol Biol Evol.  31( 6): 1581– 1592. Google Scholar CrossRef Search ADS PubMed  Fogle CA, Nagle JL, Desai MM. 2008. Clonal interference, multiple mutations and adaptation in large asexual populations. Genetics  180( 4): 2163– 2173. Google Scholar CrossRef Search ADS PubMed  Galleni M, Lamotte-Brasseur J, Raquet X, Dubus A, Monnaie D, Knox JR, Frere JM. 1995. The enigmatic catalytic mechanism of active-site serine beta-lactamases. Biochem Pharmacol.  49( 9): 1171– 1178. Google Scholar CrossRef Search ADS PubMed  Garlaschelli D. 2009. The weighted random graph model. N J Phys . 11( 7): 073005. Google Scholar CrossRef Search ADS   Gerrish PJ, Lenski RE. 1998. The fate of competing beneficial mutations in an asexual population. Genetica  102–103( 1–6): 127– 144. Google Scholar CrossRef Search ADS PubMed  Guthrie VB, Allen J, Camps M, Karchin R. 2011. Network models of TEM beta-lactamase mutations coevolving under antibiotic selection show modular structure and anticipate evolutionary trajectories. PLoS Comput Biol.  7( 9): e1002184. Google Scholar CrossRef Search ADS PubMed  Hagberg AA, Schult DA, Swart PJ. 2008. Exploring network structure, dynamics, and function using NetworkX. In: Proceedings of the 7th Python in Science Conference (SciPy2008) : 11– 15. He D, Chiou J, Zeng Z, Liu L, Chen X, Zeng L, Chan EW, Liu JH, Chen S. 2015. Residues distal to the active site contribute to enhanced catalytic activity of variant and hybrid beta-lactamases derived from CTX-M-14 and CTX-M-15. Antimicrob Agents Chemother.  59( 10): 5976– 5983. Google Scholar CrossRef Search ADS PubMed  Hopf TA, Scharfe CP, Rodrigues JP, Green AG, Kohlbacher O, Sander C, Bonvin AM, Marks DS. 2014. Sequence co-evolution gives 3D contacts and structures of protein complexes. eLife  3: e03430. Google Scholar CrossRef Search ADS   Huang W, Palzkill T. 1997. A natural polymorphism in beta-lactamase is a global suppressor. Proc Natl Acad Sci U S A.  94( 16): 8801– 8806. Google Scholar CrossRef Search ADS PubMed  Huelsenbeck JP, Ronquist F. 2001. MRBAYES: bayesian inference of phylogenetic trees. Bioinformatics  17( 8): 754– 755. Google Scholar CrossRef Search ADS PubMed  Jain K, Krug J, Park SC. 2011. Evolutionary advantage of small populations on complex fitness landscapes. Evolution  65( 7): 1945– 1955. Google Scholar CrossRef Search ADS PubMed  Jansen G, Barbosa C, Schulenburg H. 2013. Experimental evolution as an efficient tool to dissect adaptive paths to antibiotic resistance. Drug Resist Updat.  16( 6): 96– 107. Google Scholar CrossRef Search ADS PubMed  June CM, Muckenthaler TJ, Schroder EC, Klamer ZL, Wawrzak Z, Powers RA, Szarecka A, Leonard DA. 2016. The structure of a doripenem-bound OXA-51 class D beta-lactamase variant with enhanced carbapenemase activity. Protein Sci.  25( 12): 2152– 2163. Google Scholar CrossRef Search ADS PubMed  Knight R, Maxwell P, Birmingham A, Carnes J, Caporaso JG, Easton BC, Eaton M, Hamady M, Lindsay H, Liu Z, et al.   2007. PyCogent: a toolkit for making sense from sequence. Genome Biol.  8( 8): R171. Google Scholar CrossRef Search ADS PubMed  Kryazhimskiy S, Dushoff J, Bazykin GA, Plotkin JB. 2011. Prevalence of epistasis in the evolution of influenza A surface proteins. PLoS Genet.  7( 2): e1001301. Google Scholar CrossRef Search ADS PubMed  Lanfear R, Calcott B, Ho SY, Guindon S. 2012. Partitionfinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol Biol Evol.  29( 6): 1695– 1701. Google Scholar CrossRef Search ADS PubMed  Lee BC, Park K, Kim D. 2008. Analysis of the residue-residue coevolution network and the functionally important residues in proteins. Proteins  72( 3): 863– 872. Google Scholar CrossRef Search ADS PubMed  Legendre P, Makarenkov V. 2002. Reconstruction of biogeographic and evolutionary networks using reticulograms. Syst Biol.  51( 2): 199– 216. Google Scholar CrossRef Search ADS PubMed  Lynch M, Ackerman MS, Gout JF, Long H, Sung W, Thomas WK, Foster PL. 2016. Genetic drift, selection and the evolution of the mutation rate. Nat Rev Genet.  17( 11): 704– 714. Google Scholar CrossRef Search ADS PubMed  Matagne A, Dubus A, Galleni M, Frere JM. 1999. The beta-lactamase cycle: a tale of selective pressure and bacterial ingenuity. Nat Prod Rep.  16( 1): 1– 19. Google Scholar CrossRef Search ADS PubMed  Matagne A, Frere JM. 1995. Contribution of mutant analysis to the understanding of enzyme catalysis: the case of class A beta-lactamases. Biochim Biophys Acta  1246( 2): 109– 127. Google Scholar CrossRef Search ADS PubMed  Matagne A, Lamotte-Brasseur J, Frere JM. 1998. Catalytic properties of class A beta-lactamases: efficiency and diversity. Biochem J.  330 (Pt 2): 581– 598. Google Scholar CrossRef Search ADS PubMed  Minasov G, Wang X, Shoichet BK. 2002. An ultrahigh resolution structure of TEM-1 beta-lactamase suggests a role for Glu166 as the general base in acylation. J Am Chem Soc.  124( 19): 5333– 5340. Google Scholar CrossRef Search ADS PubMed  Mira PM, Crona K, Greene D, Meza JC, Sturmfels B, Barlow M. 2015. Rational design of antibiotic treatment plans: a treatment strategy for managing evolution and reversing resistance. PLoS One  10( 5): e0122283. Google Scholar CrossRef Search ADS PubMed  Naas T, Oueslati S, Bonnin RA, Dabos ML, Zavala A, Dortet L, Retailleau P, Iorga BI. 2017. Beta-lactamase database (BLDB) – structure and function. J Enzyme Inhib Med Chem.  32( 1): 917– 919. Google Scholar CrossRef Search ADS PubMed  Novais Â, Comas I, Baquero F, Cantón R, Coque TM, Moya A, González-Candelas F, Galán J-C. 2010. Evolutionary trajectories of beta-lactamase CTX-M-1 cluster enzymes: predicting antibiotic resistance. PLoS Pathog.  6( 1): e1000735. Google Scholar CrossRef Search ADS PubMed  Orr HA. 2005. The genetic theory of adaptation: a brief history. Nat Rev Genet.  6( 2): 119– 127. Google Scholar CrossRef Search ADS PubMed  Page MG. 2008. Extended-spectrum beta-lactamases: structure and kinetic mechanism. Clin Microbiol Infect . 14: 63– 74. Google Scholar CrossRef Search ADS PubMed  Petit A, Maveyraud L, Lenfant F, Samama JP, Labia R, Masson JM. 1995. Multiple substitutions at position 104 of beta-lactamase TEM-1: assessing the role of this residue in substrate specificity. Biochem J . 305( 1): 33– 40. Google Scholar CrossRef Search ADS PubMed  Poirel L, Potron A, Nordmann P. 2012. OXA-48-like carbapenemases: the phantom menace. J Antimicrob Chemother.  67( 7): 1597– 1606. Google Scholar CrossRef Search ADS PubMed  Pollock DD, Taylor WR, Goldman N. 1999. Coevolving protein residues: maximum likelihood identification and relationship to structure. J Mol Biol.  287( 1): 187– 198. Google Scholar CrossRef Search ADS PubMed  Saakian DB, Kirakosyan Z, Hu CK. 2012. Biological evolution in a multidimensional fitness landscape. Phys Rev E Stat Nonlin Soft Matter Phys.  86( 3): 031920. Google Scholar CrossRef Search ADS PubMed  Salverda ML, Dellus E, Gorter FA, Debets AJ, van der Oost J, Hoekstra RF, Tawfik DS, de Visser JA. 2011. Initial mutations direct alternative pathways of protein evolution. PLoS Genet.  7( 3): e1001321. Google Scholar CrossRef Search ADS PubMed  Salverda MLM, de Visser JAGM, Barlow M. 2010. Natural evolution of TEM-1 β-lactamase: experimental reconstruction and clinical relevance. FEMS Microbiol Rev . 34( 6): 1015– 1036. Google Scholar CrossRef Search ADS PubMed  Schenk MF, Szendro IG, Krug J, de Visser JA. 2012. Quantifying the adaptive potential of an antibiotic resistance enzyme. PLoS Genet.  8( 6): e1002783. Google Scholar CrossRef Search ADS PubMed  Schenk MF, Witte S, Salverda MLM, Koopmanschap B, Krug J, de Visser JAGM. 2015. Role of pleiotropy during adaptation of TEM-1 beta-lactamase to two novel antibiotics. Evol Appl.  8( 3): 248– 260. Google Scholar CrossRef Search ADS PubMed  Schneider KD, Karpen ME, Bonomo RA, Leonard DA, Powers RA. 2009. The 1.4 A crystal structure of the class D beta-lactamase OXA-1 complexed with doripenem. Biochemistry  48( 50): 11840– 11847. Google Scholar CrossRef Search ADS PubMed  Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T. 2003. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res.  13( 11): 2498– 2504. Google Scholar CrossRef Search ADS PubMed  Soskine M, Tawfik DS. 2010. Mutational effects and the evolution of new protein functions. Nat Rev Genet.  11( 8): 572– 582. Google Scholar CrossRef Search ADS PubMed  Steinberg B, Ostermeier M. 2016. Environmental changes bridge evolutionary valleys. Sci Adv.  2( 1): e1500921. Google Scholar CrossRef Search ADS PubMed  Thompson JD, Higgins DG, Gibson TJ. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res.  22( 22): 4673– 4680. Google Scholar CrossRef Search ADS PubMed  Tomatis PE, Fabiane SM, Simona F, Carloni P, Sutton BJ, Vila AJ. 2008. Adaptive protein evolution grants organismal fitness by improving catalysis and flexibility. Proc Natl Acad Sci U S A.  105( 52): 20605– 20610. Google Scholar CrossRef Search ADS PubMed  Unckless RL, Orr HA. 2009. The population genetics of adaptation: multiple substitutions on a smooth fitness landscape. Genetics  183( 3): 1079– 1086. Google Scholar CrossRef Search ADS PubMed  Walther-Rasmussen J, Hoiby N. 2006. OXA-type carbapenemases. J Antimicrob Chemother.  57( 3): 373– 383. Google Scholar CrossRef Search ADS PubMed  Weinreich DM, Lan Y, Wylie CS, Heckendorn RB. 2013. Should evolutionary geneticists worry about higher-order epistasis? Curr Opin Genet Dev  23: 700– 707. Google Scholar CrossRef Search ADS PubMed  Wilke CO, Adami C. 2001. Interaction between directional epistasis and average mutational effects. Proc Biol Sci  268: 1469– 1474. Google Scholar CrossRef Search ADS PubMed  Zwickl DJ. 2006. Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion. PhD diss. http://hdl.handle.net/2152/2666: [The University of Texas at Austin]. © The Author(s) 2018. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

Journal

Molecular Biology and EvolutionOxford University Press

Published: Mar 7, 2018

There are no references for this article.

You’re reading a free preview. Subscribe to read the entire article.


DeepDyve is your
personal research library

It’s your single place to instantly
discover and read the research
that matters to you.

Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.

All for just $49/month

Explore the DeepDyve Library

Search

Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly

Organize

Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.

Access

Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals.

Your journals are on DeepDyve

Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more.

All the latest content is available, no embargo periods.

See the journals in your area

DeepDyve

Freelancer

DeepDyve

Pro

Price

FREE

$49/month
$360/year

Save searches from
Google Scholar,
PubMed

Create lists to
organize your research

Export lists, citations

Read DeepDyve articles

Abstract access only

Unlimited access to over
18 million full-text articles

Print

20 pages / month

PDF Discount

20% off