HydroPaCe: understanding and predicting cross-inhibition in serine proteases through hydrophobic patch centroids

V. M. Gonçalves-Almeida; D. E. V. Pires; R. C. de Melo-Minardi; C. H. da Silveira; W. Meira; M. M. Santoro

doi:10.1093/bioinformatics/btr680

Bioinformatics, Vol. 28 no. 3 2012, pages 342–349
Original Paper: Structural bioinformatics
Advance Access publication December 9, 2011

V. M. Gonçalves-Almeida (1,2,*), D. E. V. Pires (1,2), R. C. de Melo-Minardi (1,*), C. H. da Silveira (3), W. Meira (1) and M. M. Santoro (2)
(1) Department of Computer Science, (2) Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte and (3) Advanced Campus at Itabira, Universidade Federal de Itajubá, Itajubá, Brazil

Associate Editor: Anna Tramontano

ABSTRACT

Motivation: Protein–protein interfaces contain important information about molecular recognition. The discovery of conserved patterns is essential for understanding how substrates and inhibitors are bound and for predicting molecular binding. When an inhibitor binds to different enzymes (e.g. with dissimilar sequences, structures or mechanisms, which we call cross-inhibition), identification of invariants is a difficult task for which traditional methods may fail.

Results: To clarify how cross-inhibition happens, we model the problem, and propose and evaluate a methodology called HydroPaCe to detect conserved patterns. Interfaces are modeled as graphs of atomic apolar interactions, hydrophobic patches are computed and summarized by centroids (HP-centroids), and their conservation is detected. Despite sequence and structure dissimilarity, our method achieves an appropriate level of abstraction to obtain invariant properties in cross-inhibition. We show examples in which HP-centroids successfully predicted enzymes that could be inhibited by the studied inhibitors according to the BRENDA database.

Availability: www.dcc.ufmg.br/∼raquelcm/hydropace

Contact: valdetemg@ufmg.br; raquelcm@dcc.ufmg.br; santoro@icb.ufmg.br

Supplementary information: Supplementary data are available at Bioinformatics online.

Received on July 25, 2011; revised on November 17, 2011; accepted on December 3, 2011

1 INTRODUCTION

Enzyme inhibition occurs when a molecule binds to an enzyme, thus decreasing its activity. Inhibitors may be proteic or non-proteic; they can decrease the enzyme's ability to bind substrates, lower the enzyme's catalytic activity, or both. Inhibition is an important biochemical mechanism that is involved in metabolism regulation. It controls many intra- and extracellular pathways, inflammatory and immunological processes, virus replication and many other biological functions (Barrett et al., 2004). Furthermore, once this natural phenomenon is understood, it might be used for biotechnological purposes including the development of drugs, insecticides, pesticides and disinfectants.

A particular case is the inhibition of peptidases; on this subject, the MEROPS database is currently one of the most important peptidase repositories (Rawlings et al., 2008). The MEROPS database groups both proteases and inhibitors hierarchically into families (sequence-related entities) and clans (structure-related entities). A careful MEROPS search highlighted a well-known but intriguing phenomenon: some protease inhibitors lack specificity and act on enzymes with different 3D structures and catalytic mechanisms. For instance, Turkey Ovomucoid and Eglin C act in different serine peptidase clans such as PA(S) (all β Trypsin-like folds) and SB (α/β Subtilisin-like folds), and soybean Kunitz trypsin inhibitor decreases proteolytic activity as much in serine peptidases as in metallopeptidases (which have very different enzymatic mechanisms). We call this lack of specificity cross-inhibition. Our main challenge in this article is to create a methodology that helps to understand and predict this phenomenon.

Protease–inhibitor recognition and binding are determined by a complex orchestration of interactions and entropic factors that involve the entire protease–inhibitor–solvent system. Fortunately, the experimental binding energetics of many protease–inhibitor complexes have already been thermodynamically determined. It is known, for example, that the binding of Turkey Ovomucoid with Elastase at 25 °C is characterized by a negative Gibbs free energy in which the enthalpy change is almost negligible but the entropy change is largely positive (Baker and Murphy, 1997). Furthermore, we spot a clear trend of higher apolar/polar accessible surface area ratio toward the interface (Supplementary Fig. S1), which is evidence of the importance of hydrophobic interactions in protease–inhibitor complex formation. That said, we focus our attention on the search for conserved hydrophobic interaction patterns. We define these patterns as invariant hydrophobic regions (or patches) that are in contact with the same apolar complementary parts of the inhibitor (Supplementary Figs 3 and 4). We show (Supplementary Fig. S2) a strong linear relationship (Pearson's correlation coefficient of 0.98) between the inferred solvation entropy change and the extension of hydrophobic patches, measured in terms of the number of hydrophobic atoms inside them.

Although there are many biochemical studies that analyze diversity in inhibition processes [e.g. (Bode et al., 1986; Chakrabarti and Janin, 2002a; Laskowski and Qasim, 2000; Qasim et al., 1997)], experimental characterization of inhibition is a labor-intensive process. The large number of possible inhibitors for a given enzyme can make tests costly; hence, in silico methods can contribute to predicting inhibitor–enzyme recognition.
Despite its evident importance, there are few models and algorithms that identify recognition and interaction patterns that could help to clarify how cross-inhibition occurs. In this context, a pattern is a conserved set of interface attributes that is used to explain or predict binding.

Traditionally, sequence comparison and/or structural alignment methods have been used in conservation detection (Melo-Minardi et al., 2007; Ribeiro et al., 2010; Zhang et al., 2011). According to Tuncbag et al. (2011), structures are more conserved than sequences, and interface-forming residues (IFRs) are even more conserved than the whole structure. However, these classical methods are inappropriate because in cross-inhibition we may deal with very dissimilar sequences and even completely distant folds. Indeed, in cross-inhibition pattern detection with traditional methods, we identify essentially known conserved residues that directly participate in the catalysis process, such as the catalytic triad, the specificity pocket and oxyanion-binding sites. We note that to correctly assess the eventual hydrophobic contribution of the entire protease–inhibitor interface, we should abstract the residue semantics and assess patches at the atomic level. A similar approach has been used to characterize the core of protein domains with similar folds but very divergent sequence compositions (Soundararajan et al., 2010). The atomic level is more appropriate because all residues have apolar portions. Lysine, for example, is considered a positively charged residue (at neutral pH), but it also has several hydrophobic methyl groups.

Enzyme–inhibitor recognition is determined by a network of interactions between atoms; hence, graph modeling is a straightforward approach. We model hydrophobic atoms as nodes of a graph and the contacts between them as the edges. We use the graph to obtain conserved hydrophobic patches or, in other words, connected components.

Supposing that the most important property of a hydrophobic patch is where it is positioned to interact with the ligand, we abstract from its composition, volume, shape and density, and we represent the patch as a geometric centroid that we call the HP-centroid (hydrophobic patch centroid). In this work, we propose a novel model and algorithms to detect conserved HP-centroids in cross-inhibition.

Finally, we present a qualitative case study that consists of two examples of cross-inhibition, Trypsin-like and Subtilisin-like enzymes, both of which belong to the serine proteases family. They present completely different 3D structures and the sequence identity is as low as 20% (Wallace et al., 1996). However, they possess exactly the same Ser-His-Asp triad on their active sites. In the first case, we have complexes of Trypsin-like and Subtilisin-like enzymes inhibited by Eglin C (Betzel et al., 1993), and in the second case, we have complexes of the same families with Turkey Ovomucoid (Papamokos et al., 1982). We verify that the HP-centroids obtained from the complexes are present in a set of sequence-diverse apo structures that are conserved throughout the family.

* To whom correspondence should be addressed.
© The Author 2011. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

2 MATERIALS AND METHODS

Each step of the proposed methodology, called HydroPaCe, is described below. A complete workflow of the methodology is presented in Figure 1.

Fig. 1. HydroPaCe workflow: We searched for 3D structures of subtilisin-like and trypsin-like families in the PDB database. The PDB ids were separated into protein–inhibitor complexes and apo proteins. The structures with sequence identities that were >50% identical to other selected sequences were discarded. The cross-inhibition complexes were aligned by the inhibitor's chain (a), and the interfaces of contact (also called IFRs) were identified using ASA methodology (b). The apo proteins were aligned by their single chains by using an enzyme–inhibitor complex to project the interface. Using DT, possible interatomic contacts were computed, resulting in the edges of a graph where nodes are atoms (c). We considered only the hydrophobic interactions between atoms and removed edges that represented covalent bonds. We then identified the connected components that represent the hydrophobic patches in these graphs (d). We propose two levels of abstractions to represent the hydrophobic patches, both of which are based on geometric centroids (HP-centroids). The first is a coarse-grained analysis that consists of computing a centroid for each connected component, and the second is a fine-grained analysis that searches for dense subregions using two different community detection algorithms and calculating the HP-centroids on communities. The obtained HP-centroids were clustered using the OM and AC methods (e). Finally, the HP-centroids were evaluated using PRM, which accounts positively for coverage in terms of enzymes and negatively for enzyme redundancy.

2.1 Data selection and preparation

As explained previously, we have chosen serine proteases to test our algorithm. We chose them because there are few other examples of cross-inhibition structures in the Protein Data Bank (PDB) (Berman et al., 2000). Moreover, this is a well-studied family that presents some peculiarities and similarities in catalytic sites (Page and Di Cera, 2008). Although Trypsin-like and Subtilisin-like enzymes have very different 3D structures, they hydrolyze their substrates by the same mechanism (Ekici et al., 2008; Lesk and Fordham, 1996; Siezen and Leunissen, 1997).

Enzyme–inhibitor complexes: we found five non-redundant complexes involving the Eglin C inhibitor: four bound to Subtilisin-like (PDB IDs: 1TEC, 1CSE, 1MEE and 1SBN) and one to Trypsin-like (PDB ID: 1ACB) enzymes. Likewise, we found four complexes involving the Ovomucoid inhibitor: three complexed with Trypsin-like (PDB IDs: 1CHO, 1PPF and 3SGB) and one with Subtilisin-like (PDB ID: 1R0R) enzymes. Despite the large amount of information on enzymatic complexes involving these two families, there is much redundant information regarding the sequence identities, and this leaves only a small number of non-redundant complexes to be analyzed.

Apo enzymes: we selected a set of non-redundant apo enzymes from the two families by removing enzymes that presented >50% of sequence identity. Hence, we use 9 samples from Subtilisin-like and 35 from Trypsin-like families. The complete list of PDB ids is presented in the Supplementary Material.

All the structures were submitted to standardization processes using the PDB Enhanced Structures Toolkit (PDBest) (Pires et al., 2007).

2.2 IFRs

The current analysis is restricted to regions of the molecular interface of the enzyme and its inhibitor. The IFRs can be determined by three different methods. The first defines the interface simply by using a cut-off distance between the residues of the interacting molecules (Chothia and Janin, 1975; Conte et al., 1999). The second approach computes the interactions based on differences in solvent-accessible surface area (ASA) when the monomers are separated (Chakrabarti and Janin, 2002b; Janin et al., 1990). Finally, the last approach defines interfaces through computational geometry using Voronoi diagrams and the alpha shapes theory (Pontius et al., 1996). We used the ASA method because it is the most widely used and is therefore more consolidated.

Enzyme–inhibitor IFRs: we computed the IFRs in the cross-inhibition complexes using the ASA approach with the STING Millennium Suite platform (SMS) (Neshich et al., 2003).

Projection of IFRs from complexes into apo enzymes: for the apo proteins, the projection was derived by structural alignment using an enzyme–inhibitor complex and the computed IFR. Moreover, the structures were solvated using Gromacs. After applying the treatment to PDB files, all structures, including the complex model, were superimposed using the program MultiProt. Finally, the residues that aligned with the interface of the complex model were considered the interfaces of the apo proteins. This process was performed for analysis of both sets (Trypsin-like and Subtilisin-like) of single-chain proteins in our database.

2.3 Problem modeling

The proposed method is based on the search for conserved hydrophobic patches (HP-centroids). In what follows, we detail each step of our model:

Graph construction: the first step of our model consists of the representation of IFRs as graphs. The nodes are atoms from the IFR residues, and the edges are the presumed contacts. According to our previous work (da Silveira et al., 2009), there are two main approaches to identify contacts in proteins: the first is cut-off dependent (CD), and the other is cut-off independent (CI). Although in the above-mentioned study we found that, at the residue level, the CD approach was a simpler, more complete and more reliable technique than some CI techniques, here we chose to use a CI methodology because we did not find a reliable cut-off value at the atomic level. This paradigm uses classical computational geometry algorithms to compute a Voronoi diagram (VD) (Poupon, 2004) and its dual problem, the Delaunay
tessellation (DT) (Dupuis et al., 2005). In the 3D view, the VD decomposes the volume by associating a polyhedron with each site (which is called a Voronoi cell). Each face of these polyhedrons lies on a plane that bisects the line linking a site to one of its near sites, thus mapping a neighborhood with the closest (not occluded) contacts (da Silveira et al., 2009).

In Figure 1 (a–d), the left-hand structure is Subtilisin-like and the right-hand structure is Trypsin-like.

Deletion of covalent edges: we are interested only in non-covalent interactions; hence, we remove covalent bond edges in a post-processing step.

Deletion of polar edges: once we have a geometrical inference of non-occluded interactions, we classify them into hydrophobic and polar interactions based on the classification rules proposed in Sobolev et al. (1999). The complete table with the classifications of all the atoms can be found in the Supplementary Material. As discussed previously, we restrict our analysis to hydrophobic interactions by removing polar contact edges. Nevertheless, the analysis can be extended to deal with polar areas.

Computation of hydrophobic patches: we use a depth-first search to efficiently detect the connected components, which are natural representations of the hydrophobic patches.

Abstraction of hydrophobic patches through centroids: hydrophobic patches may occur in different shapes and volumes; our model considers two levels of abstraction to represent them, both of which are based on geometric centroids (HP-centroids). The first, which we call the coarse-grained analysis, consists of computing a centroid for each connected component. The second is a fine-grained analysis that divides the original connected components into dense subgraphs or communities. A community is a subgraph in which the nodes are much more connected with the other nodes in the community than with the external nodes. In this approach, the HP-centroids are computed based on communities.

In conclusion, our method is based on the computation of hydrophobic patches and their abstraction through geometric centroids (HP-centroids) that can represent the entire patch (coarse-grained) or communities of these patches (fine-grained). Considering the HP-centroids of a set of cross-inhibition complexes, we propose algorithms to cluster the centroids and to detect those that are conserved across all of them. We describe the algorithms in the next section and then explain how to evaluate the clusters obtained.

2.4 Algorithms

Here, we describe in more detail the different approaches (coarse- and fine-grained) that we propose to abstract from the hydrophobic patches. We briefly describe the paradigms for community detection used in the fine-grained decomposition of hydrophobic patches. Finally, we explain the algorithms that we use to cluster the HP-centroids: one attempts to globally match similar centroids and the other locally clusters centroids in an agglomerative manner.

CCC: Connected Component Centroids is the name we give to the coarse-grained approach.

EBCC: the Edge Betweenness Community Centroid (EBCC) (Newman and Girvan, 2004) is a divisive approach in which the most central edges are broken one after another until the modularity of the graph is maximized. The edge centrality is computed through the edge betweenness, which counts the number of shortest paths that traverse that edge. The higher the value of edge betweenness, the more the edge is used or the more central it is. In other words, this value indicates when there are no redundant edges to cross between different communities and when the edge joins two different communities.

SGCC: the Spin Glass Community Centroid (SGCC) (Reichardt and Bornholdt, 2006) tries to find communities in graphs via a spin-glass model and simulated annealing. That is, it uses simulated annealing to maximize graph modularity. The modularity of a possible division of a graph into communities is defined as the fraction of edges that falls within a given community minus the expected value of this fraction if edges were randomly distributed. Commonly, the randomization of the edges is done in such a way as to preserve the degree of each vertex.

OM: we have developed a linear programming Optimization Model (OM) that is based on the transport problem and that attempts to match points by globally minimizing the differences between the edge sizes between all possible pairs of points. The optimization functions that we want to minimize, as well as the associated restrictions, are explained in detail in the Supplementary Material.

AC: this method is a local strategy based on Agglomerative Clustering (AC). It matches the closest HP-centroids through an iterative bottom-up agglomerative process. In this case, there is an important decision about when to stop the process to ensure that we have high-quality clusters. The strategy for determining this stopping point, and a detailed explanation of the algorithm, are presented in the Supplementary Material.

2.5 Evaluation

To perform a quantitative evaluation of the clusters formed by the matches, we propose a metric based on the concept of recall that is penalized when different HP-centroids of the same protein (redundant centroids) are grouped together. We call it the penalized recall metric (PRM), and it is formalized below:

\[
\mathrm{PRM} = \frac{C_2^{D}}{C_2^{P}} - \frac{C_2^{E}}{C_2^{P}} \qquad (1)
\]

where $C_2^{D}$ is the number of pairs of centroids from different enzymes in the same cluster, $C_2^{E}$ is the number of pairs of HP-centroids from the same protein in the same cluster, $C_2^{P}$ is the total number of pairs of HP-centroids in the cluster and the values of D and E are limited to P.

The metric produces values in the range of [−1; 1] where −1 is the worst case, with minimum recall and maximum redundancy. It will result in 1 when we have maximum recall and minimum redundancy. When we have similar values for recall and redundancy, the metric approaches 0.

The average of the PRM of the clusters was used to evaluate the three different approaches (CCC, EBCC and SGCC). It cannot be used to compare the OM with the AC. In the OM, clusters are formed with total variability by definition; in other words, there is no pair of centroids of the same enzyme in a cluster. In this case, we use traditional intra- and intercluster average distances.

A priori, a high-quality cluster must have low intracluster and high intercluster distances. That is because, in an ideal clustering, similar elements must be grouped together and dissimilar ones must be separated.

In conclusion, we compare the proposed approaches in the light of the PRM (the closer to 1, the better) and the average intra- and intercluster distances (it is better to have low intracluster and high intercluster distances).

3 RESULTS

In this section, we present and discuss the results of the case study of serine peptidases (Trypsin-like and Subtilisin-like) that are cross-inhibited by Eglin C and Turkey Ovomucoid. We also compare the quality of the conserved HP-centroids that are obtained by the different proposed methods.

3.1 The Eglin C Inhibitor

Eglin C is a small monomeric protein (70 residues) that belongs to the Potato Chymotrypsin Inhibitor I family of serine protease inhibitors and occurs naturally in the leech Hirudo medicinalis. Functionally, Eglin C can inhibit more than one proteinase family with non-homologous structures (Hyberts et al., 1992). In the BRENDA database (Scheer et al., 2011), we found 12 different EC numbers that are known to be inhibited by this molecule. In this section, we present the analysis with the five non-redundant existing experimental complexes, four of which are Subtilisin-like and one of which is Trypsin-like.

As explained previously, we use different approaches to find the HP-centroids. The OM has no parameters and it clusters all of the centroids. With the AC, we must supply the number of clusters as an input parameter. Figure 2a shows the distributions of mean PRM and intracluster distances. We observe that PRM is maximized and intradistance values are stable with 12 clusters. With this configuration, we obtain five high-quality clusters according to the PRM (Fig. 2b). This set of conserved HP-centroids presents a very high recall value (i.e. they are present in almost all the cross-inhibition complexes) and furthermore, there is only one case where two points in a cluster come from the same complex.

Fig. 2. The CCC approach. In (a), we present the distribution used to maximize the mean PRM metric as well as the respective mean intracluster distance distribution. (b) Shows the PRM distribution for the best configuration achieved, with 12 clusters.

The same experiment was performed using the fine-grained approach, as presented in Figure 3. At this level of abstraction, we could not identify a threshold that clearly distinguishes high-quality clusters from poor-quality ones. Since we aim to find as many conserved HP-centroids as possible, the coarse-grained approach systematically presents better results. This might indicate that the cross-inhibition pattern depends on the inhibitor-relative positions of the conserved HP-centroids regardless of their density.

Fig. 3. The SGCC approach. In (a), we present the distribution used to maximize the mean PRM metric as well as the respective mean intracluster distance distribution. (b) Shows the PRM distribution for the best configuration achieved, with 24 clusters.

Table 1 shows the complete set of results. AC performs better, especially in the coarse-grained analysis, achieving low intra- and high intercluster distances combined with a very high PRM value (0.98).

Table 1. Quantitative comparison of the proposed algorithms for Eglin C cross-inhibition.

Approach  Clustering  Mean intra (Å)  Mean inter (Å)  Mean PRM
CCC       AC          3.435           13.835          0.98
CCC       OM          5.460            9.294          –
SGCC      AC          2.450           13.138          0.82
SGCC      OM          5.093           11.231          –
EBCC      AC          2.679           12.670          0.90
EBCC      OM          5.339            9.986          –

The best mean PRM value (0.98, CCC with AC) is in bold in the original table.

The semantics of the five hydrophobic patches represented by the conserved HP-centroids is presented in Figure 4. We can see why the proposed method reaches an abstraction level that is useful for identifying relevant cross-inhibition patterns. When we compare the residues that compose cluster IV, we can see for a Trypsin-like enzyme the presence of LEU-143, THR-151, ALA-149, TYR-146, CYS-220, CYS-191 and MET-192. At the counterpart cluster in a Subtilisin-like enzyme, we find PHE-193, ASN-163 and THR-224. Despite the very dissimilar residue compositions, patch volumes and densities, the method selects HP-centroids that are spatially conserved according to the inhibitor. Additional graphs for the other three samples are presented in the Supplementary Material.

Fig. 4. Hydrophobic patches for cross-inhibition by Eglin C. In (a) PDB id 1ACB:E, we can see a sample with Trypsin-like enzyme and in (b) PDB id 1TEC:E, with Subtilisin-like enzyme (the hydrophobic patches for the five complexes are in the Supplementary Material). We show an atomic graph in which the residue types and numbers are presented and the red (a) and green (b) spheres are the HP-centroids that represent each of the patches. In the last part of the figure (c), we present the inhibitor (residues from 40 to 48) as gray sticks (in black, the apolar portions), and the five centroids are superposed in colors. The green shades are the Subtilisin-like HP-centroids and the red ones are Trypsin-like.

3.2 The Turkey Ovomucoid inhibitor

Ovomucoids are the glycoprotein protease inhibitors of avian egg whites. There are several protease inhibitors in egg white. The Turkey Ovomucoid is from a Kazal-type inhibitor family of serine protease inhibitors, which occurs naturally in Meleagris gallopavo. It is a significant contaminant of crude Ovomucoid preparations, and it acts on Bovine Trypsin and Chymotrypsin as well as on Porcine Elastase and Fungal Proteinase (Fujinaga et al., 1987; Robertson et al., 1988).

We analyze the four non-redundant existing complexes, of which three have Trypsin-like enzymes and one has a Subtilisin-like enzyme. By conducting similar experiments to those presented in the previous section, and by varying the number of clusters, we observe that the mean intradistance stabilizes from four clusters on. We obtain three high-quality clusters according to the PRM (Supplementary Material).

Table 2 shows the results for the algorithm comparisons. As in the previous analysis, AC presents a combination of low intracluster distances, high intercluster distances and the highest PRM value (0.94), indicating a consistent match of the patterns. According to these results, the coarse-grained approach once more achieved better results than the fine-grained approach.

Table 2. Quantitative comparison of the proposed algorithms for Turkey Ovomucoid cross-inhibition.

Approach  Clustering  Mean intra (Å)  Mean inter (Å)  Mean PRM
CCC       AC          4.803           13.239          0.94
CCC       OM          8.009           10.402          –
SGCC      AC          2.901           10.999          0.75
SGCC      OM          9.303           11.014          –
EBCC      AC          3.419           14.045          0.75
EBCC      OM          6.459           11.997          –

The best mean PRM value (0.94, CCC with AC) is in bold in the original table.

The three hydrophobic patches that were conserved in the Ovomucoid complexes are presented in Figure 5. Again, we can see a very dissimilar cluster composition and an interesting conservation of position according to the common inhibitor. We present additional graphs for Ovomucoid cross-inhibition in the Supplementary Material.

Fig. 5. Hydrophobic patches for cross-inhibition by Turkey Ovomucoid. In (a) PDB id 1R0R:E, we can see a sample with Subtilisin-like enzyme and in (b) PDB id 1PPF:E, a sample with Trypsin-like enzyme (the hydrophobic patches for the four complexes are shown in the Supplementary Material). We show an atomic graph in which the residue types and numbers are presented and the red (a) and green (b) spheres show the HP-centroids that represent each patch. In (c), we present the inhibitor (residues from 13 to 21) as gray sticks (apolar portions in black), and the five HP-centroids are superposed in colors. The green shades are the Subtilisin-like centroids and the red ones are the Trypsin-like centroids.

According to Baker and Murphy (1997), hydrophobic interactions are essential for explaining how inhibition happens in proteases. Our results are in agreement with this hypothesis. Searching for conserved abstractions of hydrophobic patches at the atomic level (HP-centroids) in protease–inhibitor interfaces, we proposed and evaluated a global and a local algorithm to cluster centroids. We aimed to find conserved centroids at coarse- and fine-grained levels. We conclude that the coarse-grained AC local algorithm was able to identify the more complete set of invariant HP-centroids across the protease–inhibitor cross-inhibition examples.

Certainly, the contribution of polar interactions must be studied in more detail in future work; interestingly, however, we have found a minimum of three invariant centroids in all cross-inhibition cases. As proteins are 3D objects, we conjecture that for a molecule to bind and to hold another one, there must exist at least three non-collinear contact points. It is possible that the conserved hydrophobic patches obtained are responsible for binding and holding inhibitors at the enzyme binding sites.

3.3 The use of HP-centroids for inhibition prediction

Since experimental complexes representing cross-inhibition examples are scarce, it is intriguing to ask whether we can generalize the conserved HP-centroids to binding sites of apo enzymes of the studied families. We extended the analysis to a set of non-redundant apo structures of serine proteases (a list of proteins is in the Supplementary Material). We project the IFR obtained from the cross-inhibition complexes to the apo enzymes by using structural alignments, and we verify a strong conservation of the HP-centroids found through complex analysis.

Due to the low conservation of residues, it is not possible to understand how inhibition occurs by examining only sequence-level conservation (even when sequence alignments are done by structural alignments as shown in Fig. 6). Notice that we can find some conserved residues (marked with *) that are known to participate in the catalysis (catalytic triad, oxyanion hole) or in the specificity binding sites. Apart from these residues, no other interesting conservation can be easily identified in these logos.

Fig. 6. IFR projections of HP-centroids found in serine proteases that are cross-inhibited by Ovomucoid. In (a), we show results for nine non-redundant superposed Subtilisin-like enzymes (residue numbers according to PDB id 1R0R:E) and in (b) we show results for 35 non-redundant superposed Trypsin-like enzymes (residue numbers according to PDB id 1PPF:E). On both sides, the bottom logos show the residues that are in the IFR but that are not part of a conserved cluster.

However, our hypothesis is that for inhibition to occur, we must have very conserved hydrophobic patches in specific positions to accommodate each of the inhibitors. For example, PHE-215 in the Trypsin-like enzymes in Figures 6b and 5b is a voluminous hydrophobic residue that is equivalent to the hydrophobic portions of residues in positions LEU-96, ILE-107 and LEU-126 in the Subtilisin-like enzymes in Figures 6a and 5a. This is an example in which conserved patterns cannot be inferred from the sequence or structure but are clearly identified in our conserved HP-centroid I.

Going further, we believe that these patterns could be used to predict inhibition for other enzymes for which structures are available but no experimental evidence of inhibition is known. For instance, we used eight samples of non-redundant Subtilisin-like apo enzymes (listed in the Supplementary Material) belonging to five different EC numbers (3.4.21.62 / 64 / 66 / 75 / 97). We considered only those enzymes for which the ECs are complete with the four levels of annotation. According to the BRENDA database, three of these are inhibited by Eglin C (3.4.21.62 / 66 / 75), and we can say that these constitute successful predictions. As far as we know, the other two enzymes (3.4.21.64 / 97), which represent Proteinase K and Assemblin Protease, are not mentioned in the literature but present the same pattern as do the other Subtilisin-like enzymes. It would be very interesting to verify experimentally whether they can be inhibited by Eglin C, as they present the same HP-centroids as do other complexes with this inhibitor. Similar analyses for Ovomucoid are presented in the Supplementary Material, and we can also verify successful predictions and several unknown inhibition possibilities.

4 CONCLUSIONS

In this work, we model the problem of understanding and predicting enzyme cross-inhibition. We propose and evaluate algorithms to detect conserved hydrophobic patch centroids (HP-centroids) to clarify how these centroids occur in proteases. Our model is based on the importance of apolar interactions to inhibition in this family and on the fact that these hydrophobic portions should be studied at an atomic level. We model the interfaces between enzymes and inhibitors as graphs of atomic apolar interactions, detect connected [...] apo enzymes representing entire families. By comparing with experimental data available in the BRENDA database, we also show some successful examples of how HP-centroids can be used to predict enzymes that could be inhibited by the studied inhibitors. Finally, we raise some questions about possible enzymes that might be inhibited by Eglin C and/or Turkey Ovomucoid and expose them to further experimental validation.

We believe that this work should be extended to other enzyme families for which entropic changes are known to be important factors in inhibition processes. It would also be interesting to verify whether this method could be used in other problems of protein–protein interaction pattern detection.

Funding: Brazilian agencies Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES); Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq); Fundação de Amparo a Pesquisa do Estado de Minas Gerais (FAPEMIG); Financiadora de Estudos e Projetos (FINEP).

Conflict of Interest: none declared.
components to represent hydrophobic patches, summarize them using centroids and show how to obtain as complete a set of REFERENCES conserved centroids as possible. One of the strengths of the method is that it achieves the appropriate level of abstraction to detect Baker,B.M. and Murphy,K.P. (1997) Dissecting the energetics of a protein-protein interaction: the binding of ovomucoid third domain to elastase. J. Mol. Biol., 268, the invariant properties involved in cross-inhibition. One of the 557–569. main difﬁculties in the study and understanding of this complex Barrett,A.J. et al. (eds) (2004) Handbook of Proteolytic Enzymes, vol. 1–2, 2 edn. phenomenon through classical methods is that dissimilar sequences Elsevier, London. and structures might be inhibited by the same inhibitor. Despite Berman,H.M. et al. (2000) The protein data bank. Nucleic Acids Res., 28, 235–242. the lack of conservation at the sequence and structure levels, the Betzel,C. et al. (1993) Structure of the proteinase inhibitor eglin c with hydrolysed proposed HP-centroids appear to be promising, as they are very reactive centre at 2.0 a resolution. FEBS Lett., 317, 185–188. conserved across the studied cases of cross-inhibition. Bode,W. et al. (1986) X-ray crystal structure of the complex of human leukocyte As we have few non-redundant experimental complexes available, elastase (pmn elastase) and the third domain of the turkey ovomucoid inhibitor. we test the generality of HP-centroids with a set of non-redundant EMBO J., 5, 2453–2458. [13:31 31/12/2011 Bioinformatics-btr680.tex] Page: 348 342–349 MANUSCRIPT CATEGORY: ORIGINAL PAPER HydroPaCe Chakrabarti,P. and Janin,J. (2002a) Dissecting protein-protein recognition sites. Pires,D.E.V. et al. (2007) Pdbest: Pdb enhanced structures toolkit. In Proceedings of Proteins, 47, 334–343. the 3rd International Conference of Brazil Association for Bioinformatics. AB3C Chakrabarti,P. and Janin,J. (2002b) Dissecting protein-protein recognition sites. 
Publishing, São Paulo, p. 39. Proteins Struct. Funct. Genet., 47, 334–343. Pontius,J. et al. (1996) Deviations from standard atomic volumes as a quality measure Chothia,C. and Janin,J. (1975) Principles of protein-protein recognition. Nature, 256, for protein crystal structures. J. Mol. Biol., 264, 121–136. 705–708. Poupon,A. (2004) Voronoi and voronoi-related tessellations in studies of protein Conte,L.L. et al. (1999) The atomic structure of protein-protein recognition sites. J. structure and interaction. Curr. Opin. Struct. Biol., 14, 233–241. Mol. Biol., 285, 2177–2198. Qasim,M.A. et al. (1997) Interscaffolding additivity. association of p1 variants of eglin da Silveira,C.H. et al. (2009) Protein cutoff scanning: a comparative analysis of cutoff c and of turkey ovomucoid third domain with serine proteinases. Biochemistry, 36, dependent and cutoff free methods for prospecting contacts in proteins. Proteins, 1598–1607. 74, 727–743. Rawlings,N.D. et al. (2008) Merops: the peptidase database. Nucleic Acids Res., 36, Dupuis,F. et al. (2005) Voro3d: 3d voronoi tessellations applied to protein structures. D320–D325. Bioinformatics, 21, 1715–1716. Reichardt,J. and Bornholdt,S. (2006) Statistical mechanics of community detection. Ekici,O.D. et al. (2008) Unconventional serine proteases: variations on the catalytic Phys. Rev. E, 74, 016110. ser/his/asp triad conﬁguration. Protein Sci., 17, 2023–2037. Ribeiro,C. et al. (2010) Analysis of binding properties and speciﬁcity through Fujinaga,M. et al. (1987) Crystal and molecular structures of the complex of alpha- identiﬁcation of the interface forming residues (ifr) for serine proteases in silico chymotrypsin with its inhibitor turkey ovomucoid third domain at 1.8 a resolution. docked to different inhibitors. BMC Struct. Biol., 10, 36. J. Mol. Biol., 195, 397–418. Robertson,A.D. et al. (1988) Two-dimensional NMR studies of kazal proteinase Hyberts,S.G. et al. 
(1992) The solution structure of eglin c based on measurements of inhibitors. 1. sequence-speciﬁc assignments and secondary structure of turkey many noes and coupling constants and its comparison with x-ray structures. Protein ovomucoid third domain. Biochemistry, 27, 2519–2529. Sci., 1, 736–751. Scheer,M. et al. (2011) BRENDA, the enzyme information system in 2011. Nucleic Janin,J. et al. (1990) The structure of protein-protein recognition sites. Structure, 265, Acids Res., 39, 670–676. 16027–16030. Siezen,R.J. and Leunissen,J.A. (1997) Subtilases: the superfamily of subtilisin-like Laskowski,M. and Qasim,M.A. (2000) What can the structures of enzyme-inhibitor serine proteases. Protein Sci., 6, 501–523. complexes tell us about the structures of enzyme substrate complexes? Biochim. Sobolev,V. et al. (1999) Automated analysis of interatomic contacts in proteins. Biophys. Acta, 1477, 324–337. Bioinformatics, 15, 327–332. Lesk,A.M. and Fordham,W.D. (1996) Conservation and variability in the structures of Soundararajan,V. et al. (2010) Atomic interaction networks in the core of protein serine proteinases of the chymotrypsin family. J. Mol. Biol., 258, 501–537. domains and their native folds. PLoS One, 5, e9391. Melo-Minardi,R.C. et al. (2007) Finding protein-protein interaction patterns by contact Tuncbag,N. et al. (2011) Prediction of protein-protein interactions: unifying evolution map matching. Genet. Mol. Res., 6, 946–963. and structure at protein interfaces. Phys. Biol., 8, 035006. Neshich,G. et al. (2003) Sting millennium: a web-based suite of programs for Wallace,A.C. et al. (1996) Derivation of 3D coordinate templates for searching comprehensive and simultaneous analysis of protein structure and sequence. Nucleic structural databases: application to ser-his-asp catalytic triads in the serine Acids Res., 31, 3386. proteinases and lipases. Protein Sci., 5, 1001–1013. Newman,M.E.J. and Girvan,M. (2004) Finding and evaluating community structure in Zhang,Z. et al. 
(2011) Identiﬁcation of cavities on protein surface using multiple networks. Phys. Rev. E, 69, 026113. computational approaches for drug binding site prediction. Bioinformatics, 27, Page,M.J. and Di Cera,E. (2008) Serine peptidases: classiﬁcation, structure and 2083–2088. function. Cell. Mol. Life Sci., 65, 1220–1236. Papamokos,E. et al. (1982) Crystallographic reﬁnement of japanese quail ovomucoid, a kazal-type inhibitor, and model building studies of complexes with serine proteases. J. Mol. Biol., 158, 515–537. [13:31 31/12/2011 Bioinformatics-btr680.tex] Page: 349 342–349 http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Bioinformatics Oxford University Press http://www.deepdyve.com/lp/oxford-university-press/hydropace-understanding-and-predicting-cross-inhibition-in-serine-s5ahagtqx2



Publisher: Oxford University Press
Copyright: © The Author 2011. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
ISSN: 1367-4803
eISSN: 1460-2059
DOI: 10.1093/bioinformatics/btr680
pmid: 22171332

Abstract

BIOINFORMATICS, ORIGINAL PAPER, Vol. 28 no. 3 2012, pages 342–349
doi:10.1093/bioinformatics/btr680
Structural bioinformatics. Advance Access publication December 9, 2011

HydroPaCe: understanding and predicting cross-inhibition in serine proteases through hydrophobic patch centroids

V. M. Gonçalves-Almeida (1,2,*), D. E. V. Pires (1,2), R. C. de Melo-Minardi (1,*), C. H. da Silveira (3), W. Meira (1) and M. M. Santoro (2)
(1) Department of Computer Science, (2) Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte and (3) Advanced Campus at Itabira, Universidade Federal de Itajubá, Itajubá, Brazil
(*) To whom correspondence should be addressed.
Associate Editor: Anna Tramontano

ABSTRACT

Motivation: Protein–protein interfaces contain important information about molecular recognition. The discovery of conserved patterns is essential for understanding how substrates and inhibitors are bound and for predicting molecular binding. When an inhibitor binds to different enzymes (e.g. with dissimilar sequences, structures or mechanisms, which we call cross-inhibition), identification of invariants is a difficult task for which traditional methods may fail.
Results: To clarify how cross-inhibition happens, we model the problem, and propose and evaluate a methodology called HydroPaCe to detect conserved patterns. Interfaces are modeled as graphs of atomic apolar interactions, hydrophobic patches are computed and summarized by centroids (HP-centroids), and their conservation is detected. Despite sequence and structure dissimilarity, our method achieves an appropriate level of abstraction to obtain invariant properties in cross-inhibition. We show examples in which HP-centroids successfully predicted enzymes that could be inhibited by the studied inhibitors according to the BRENDA database.
Availability: www.dcc.ufmg.br/~raquelcm/hydropace
Contact: valdetemg@ufmg.br; raquelcm@dcc.ufmg.br; santoro@icb.ufmg.br
Supplementary information: Supplementary data are available at Bioinformatics online.

Received on July 25, 2011; revised on November 17, 2011; accepted on December 3, 2011

1 INTRODUCTION

Enzyme inhibition occurs when a molecule binds to an enzyme, thus decreasing its activity. Inhibitors may be proteic or non-proteic; they can decrease the enzyme's ability to bind substrates, lower the enzyme's catalytic activity, or both. Inhibition is an important biochemical mechanism that is involved in metabolism regulation. It controls many intra- and extracellular pathways, inflammatory and immunological processes, virus replication and many other biological functions (Barrett et al., 2004). Furthermore, once this natural phenomenon is understood, it might be used for biotechnological purposes including the development of drugs, insecticides, pesticides and disinfectants.

A particular case is the inhibition of peptidases; on this subject, the MEROPS database is currently one of the most important peptidase repositories (Rawlings et al., 2008). The MEROPS database groups both proteases and inhibitors hierarchically into families (sequence-related entities) and clans (structure-related entities). A careful MEROPS search highlighted a well-known but intriguing phenomenon: some protease inhibitors lack specificity and act on enzymes with different 3D structures and catalytic mechanisms. For instance, Turkey Ovomucoid and Eglin C act in different serine peptidase clans such as PA(S) (all β Trypsin-like folds) and SB (α/β Subtilisin-like folds), and soybean Kunitz trypsin inhibitor decreases proteolytic activity as much in serine peptidases as in metallopeptidases (which have very different enzymatic mechanisms). We call this lack of specificity cross-inhibition. Our main challenge in this article is to create a methodology that helps to understand and predict this phenomenon.

Protease–inhibitor recognition and binding are determined by a complex orchestration of interactions and entropic factors that involve the entire protease–inhibitor–solvent system. Fortunately, the experimental binding energetics of many protease–inhibitor complexes have already been thermodynamically determined. It is known, for example, that the binding of Turkey Ovomucoid with Elastase at 25 °C is characterized by a negative Gibbs free energy in which the enthalpy change is almost negligible but the entropy change is largely positive (Baker and Murphy, 1997). Furthermore, we spot a clear trend of higher apolar/polar accessible surface area ratio toward the interface (Supplementary Fig. S1), which is evidence of the importance of the hydrophobic interactions in protease–inhibitor complex formation. That said, we particularly focus our attention on the search for conserved hydrophobic interaction patterns. We define these patterns as invariant hydrophobic regions (or patches) that are in contact with the same apolar complementary parts of the inhibitor (Supplementary Figs 3 and 4). We show (Supplementary Fig. S2) a strong linear relationship (Pearson's correlation coefficient of 0.98) between the inferred solvation entropy change and the extension of hydrophobic patches, measured in terms of the number of hydrophobic atoms inside them.

Although there are many biochemical studies that analyze diversity in inhibition processes (e.g. Bode et al., 1986; Chakrabarti and Janin, 2002a; Laskowski and Qasim, 2000; Qasim et al., 1997), experimental characterization of inhibition is a labor-intensive process. The large number of possible inhibitors for a given enzyme can make tests costly; hence, in silico methods can contribute to predicting inhibitor–enzyme recognition.

Despite its evident importance, there are few models and algorithms that identify recognition and interaction patterns that could help to clarify how cross-inhibition occurs. In this context, a pattern is a conserved set of interface attributes that is used to explain or predict binding.

Traditionally, sequence comparison and/or structural alignment methods have been used in conservation detection (Melo-Minardi et al., 2007; Ribeiro et al., 2010; Zhang et al., 2011). According to Tuncbag et al. (2011), structures are more conserved than sequences, and interface-forming residues (IFRs) are even more conserved than the whole structure. However, these classical methods are inappropriate because in cross-inhibition we may deal with very dissimilar sequences and even completely distant folds. Indeed, in cross-inhibition pattern detection with traditional methods, we identify essentially known conserved residues that directly participate in the catalysis process, such as the catalytic triad, the specificity pocket and oxyanion-binding sites. We note that to correctly assess the eventual hydrophobic contribution of the entire protease–inhibitor interface, we should abstract the residue semantics and should assess patches at the atomic level. A similar approach has been used to characterize the core of protein domains with similar folds but very divergent sequence compositions (Soundararajan et al., 2010). The atomic level is more appropriate because all residues have apolar portions. Lysine, for example, is considered a positively charged residue (at neutral pH), but its side chain also contains several hydrophobic methylene groups.

Enzyme–inhibitor recognition is determined by a network of interactions between atoms; hence, graph modeling is a straightforward approach. We model hydrophobic atoms as nodes of a graph and the contacts between them as the edges. We use the graph to obtain conserved hydrophobic patches or, in other words, connected components.

Supposing that the most important property of a hydrophobic patch is where it is positioned to interact with the ligand, we abstract from its composition, volume, shape and density, and we represent the patch as a geometric centroid that we call the HP-centroid (hydrophobic patch centroid). In this work, we propose a novel model and algorithms to detect conserved HP-centroids in cross-inhibition.

Finally, we present a qualitative case study that consists of two examples of cross-inhibition, Trypsin-like and Subtilisin-like enzymes, both of which belong to the serine proteases family. They present completely different 3D structures and the sequence identity is as low as 20% (Wallace et al., 1996). However, they possess exactly the same Ser-His-Asp triad on their active sites. In the first case, we have complexes of Trypsin-like and Subtilisin-like enzymes inhibited by Eglin C (Betzel et al., 1993), and in the second case, we have complexes of the same families with Turkey Ovomucoid (Papamokos et al., 1982). We verify that the HP-centroids obtained from the complexes are present in a set of sequence-diverse apo structures that are conserved throughout the family.

2 MATERIALS AND METHODS

Each step of the proposed methodology, called HydroPaCe, is described below. A complete workflow of the methodology is presented in Figure 1.

Fig. 1. HydroPaCe workflow: We searched for 3D structures of subtilisin-like and trypsin-like families in the PDB database. The PDB ids were separated into protein–inhibitor complexes and apo proteins. The structures with sequence identities that were >50% identical to other selected sequences were discarded. The cross-inhibition complexes were aligned by the inhibitor's chain (a), and the interfaces of contact (also called IFRs) were identified using ASA methodology (b). The apo proteins were aligned by their single chains by using an enzyme–inhibitor complex to project the interface. Using DT, possible interatomic contacts were computed, resulting in the edges of a graph where nodes are atoms (c). We considered only the hydrophobic interactions between atoms and removed edges that represented covalent bonds. We then identified the connected components that represent the hydrophobic patches in these graphs (d). We propose two levels of abstractions to represent the hydrophobic patches, both of which are based on geometric centroids (HP-centroids). The first is a coarse-grained analysis that consists of computing a centroid for each connected component, and the second is a fine-grained analysis that searches for dense subregions using two different community detection algorithms and calculates the HP-centroids on communities. The obtained HP-centroids were clustered using the OM and AC methods (e). Finally, the HP-centroids were evaluated using PRM, which accounts positively for coverage in terms of enzymes and negatively for enzyme redundancy. In (a–d), the left-hand structure is Subtilisin-like and the right-hand structure is Trypsin-like.

2.1 Data selection and preparation

As explained previously, we have chosen serine proteases to test our algorithm. We chose them because there are few other examples of cross-inhibition structures in the Protein Data Bank (PDB) (Berman et al., 2000). Moreover, this is a well-studied family that presents some peculiarities and similarities in catalytic sites (Page and Di Cera, 2008). Although Trypsin-like and Subtilisin-like enzymes have very different 3D structures, they hydrolyze their substrates by the same mechanism (Ekici et al., 2008; Lesk and Fordham, 1996; Siezen and Leunissen, 1997).

Enzyme–inhibitor complexes: we found five non-redundant complexes involving the Eglin C inhibitor: four bound to Subtilisin-like (PDB IDs: 1TEC, 1CSE, 1MEE and 1SBN) and one to Trypsin-like (PDB ID: 1ACB) enzymes. Likewise, we found four complexes involving the Ovomucoid inhibitor: three complexed with Trypsin-like (PDB IDs: 1CHO, 1PPF and 3SGB) and one with Subtilisin-like (PDB ID: 1R0R) enzymes. Despite the large amount of information on enzymatic complexes involving these two families, there is much redundant information regarding the sequence identities, and this leaves only a small number of non-redundant complexes to be analyzed.

Apo enzymes: we selected a set of non-redundant apo enzymes from the two families by removing enzymes that presented >50% sequence identity. Hence, we use 9 samples from the Subtilisin-like and 35 from the Trypsin-like families. The complete list of PDB ids is presented in the Supplementary Material.

All the structures were submitted to standardization processes using the PDB Enhanced Structures Toolkit (PDBest) (Pires et al., 2007).

2.2 IFRs

The current analysis is restricted to regions of the molecular interface of the enzyme and its inhibitor. The IFRs can be determined by three different methods. The first defines the interface simply by using a cut-off distance between the residues of the interacting molecules (Chothia and Janin, 1975; Conte et al., 1999). The second approach computes the interactions based on differences in solvent-accessible surface area (ASA) when the monomers are separated (Chakrabarti and Janin, 2002b; Janin et al., 1990). Finally, the last approach defines interfaces through computational geometry using Voronoi diagrams and the alpha shapes theory (Pontius et al., 1996). We used the ASA method because it is the most widely used and is therefore more consolidated.

Enzyme–inhibitor IFRs: we computed the IFRs in the cross-inhibition complexes using the ASA approach with the STING Millennium Suite platform (SMS) (Neshich et al., 2003).

Projection of IFRs from complexes into apo enzymes: for the apo proteins, the projection was derived by structural alignment using an enzyme–inhibitor complex and the computed IFR. Moreover, the structures were solvated using Gromacs. After applying the treatment to PDB files, all structures, including the complex model, were superimposed using the program MultiProt. Finally, the residues that aligned with the interface of the complex model were considered the interfaces of the apo proteins. This process was performed for analysis of both sets (Trypsin-like and Subtilisin-like) of single-chain proteins in our database.

2.3 Problem modeling

The proposed method is based on the search for conserved hydrophobic patches (HP-centroids). In what follows, we detail each step of our model:

Graph construction: the first step of our model consists of the representation of IFRs as graphs. The nodes are atoms from the IFR residues, and the edges are the presumed contacts. According to our previous work (da Silveira et al., 2009), there are two main approaches to identify contacts in proteins: the first is cut-off dependent (CD), and the other is cut-off independent (CI). Although in the above-mentioned study we found that, at the residue level, the CD approach was a simpler, more complete and more reliable technique than some CI techniques, here we chose to use a CI methodology because we did not find a reliable cut-off value at the atomic level. This paradigm uses classical computational geometry algorithms to compute a Voronoi diagram (VD) (Poupon, 2004) and its dual problem, the Delaunay
In (a–d), the left-hand structure is Subtilisin-like and the right-hand structure is Trypsin-like.

tessellation (DT) (Dupuis et al., 2005). In the 3D view, the VD decomposes the volume by associating a polyhedron with each site (which is called a Voronoi cell). Each face of these polyhedrons is composed of a plane that bisects the line that links each site to each of its near sites, thus mapping a neighborhood with the closest (not occluded) contacts (da Silveira et al., 2009).

Deletion of covalent edges: we are interested only in non-covalent interactions; hence, we remove covalent bond edges in a post-processing step.

Deletion of polar edges: once we have a geometrical inference of non-occluded interactions, we classify them into hydrophobic and polar interactions based on the classification rules proposed in Sobolev et al. (1999). The complete table with the classifications of all the atoms can be found in the Supplementary Material. As discussed previously, we restrict our analysis to hydrophobic interactions by removing polar contact edges. Nevertheless, the analysis can be extended to deal with polar areas.

Computation of hydrophobic patches: we use a depth-first search to efficiently detect the connected components, which are natural representations of the hydrophobic patches.

Abstraction of hydrophobic patches through centroids: hydrophobic patches may occur in different shapes and volumes; our model considers two levels of abstractions to represent them, both of which are based on geometric centroids (HP-centroid). The first, which we call the coarse-grained analysis, consists of computing a centroid for each connected component. The second is a fine-grained analysis that divides the original connected components into dense subgraphs or communities. A community is a subgraph in which the nodes are much more connected with the other nodes in the community than with the external nodes. In this approach, the HP-centroids are computed based on communities.

In conclusion, our method is based on the computation of hydrophobic patches and their abstraction through geometric centroids (HP-centroids) that can represent the entire patch (coarse-grained) or communities of these patches (fine-grained). Considering the HP-centroids of a set of cross-inhibition complexes, we propose algorithms to cluster the centroids and to detect those that are conserved across all of them. We describe the algorithms in the next section and then explain how to evaluate the clusters obtained.

2.4 Algorithms

Here, we describe in more detail the different approaches (coarse- and fine-grained) that we propose to abstract from the hydrophobic patches. We briefly describe the paradigms for community detection used in fine-grained decomposition of hydrophobic patches. Finally, we explain the algorithms that we use to cluster the HP-centroids: one attempts to globally match similar centroids and the other locally clusters centroids in an agglomerative manner.

CCC: Connected Component Centroids is the name we give to the coarse-grained approach.

EBCC: the Edge Betweenness Community Centroid (EBCC) (Newman and Girvan, 2004) is a divisive approach in which the most central edges are broken one after another until the modularity of the graph is maximized. The edge centrality is computed through the edge betweenness, which counts the number of shortest paths that traverse through that edge. The higher the value of edge betweenness, the more the edge is used or the more central it is. In other words, this value indicates when there are no redundant edges to cross between different communities and when the edge joins two different communities.

SGCC: the Spin Glass Community Centroid (SGCC) (Reichardt and Bornholdt, 2006) tries to find communities in graphs via a spin-glass model and simulated annealing. That is, it uses simulated annealing to maximize graph modularity. The modularity of a possible division of a graph into communities is defined as the fraction of edges that falls within a given community minus the expected value of this fraction if edges were randomly distributed. Commonly, the randomization of the edges is done in such a way as to preserve the degree of each vertex.

OM: we have developed a linear programming Optimization Model (OM) that is based on the transport problem and that attempts to match points by globally minimizing the differences between the edge sizes between all possible pairs of points. The optimization functions that we want to minimize, as well as the associated restrictions, are explained in detail in the Supplementary Material.

AC: this method is a local strategy based on Agglomerative Clustering (AC). It matches the closest HP-centroids through an iterative bottom-up agglomerative process. In this case, there is an important decision about when to stop the process to ensure that we have high-quality clusters. The strategy for determining this stopping point, and a detailed explanation of the algorithm, are presented in the Supplementary Material.

2.5 Evaluation

To perform a quantitative evaluation of the clusters formed by the matches, we propose a metric based on the concept of recall that is penalized when different HP-centroids of the same protein (redundant centroids) are grouped together. We have called it the penalized recall metric (PRM), and it is formalized below:

$$\mathrm{PRM} = \frac{C_2^{D}}{C_2^{P}} - \frac{C_2^{E}}{C_2^{P}} \qquad (1)$$

where $C_2^{D}$ is the number of pairs of centroids from different enzymes in the same cluster, $C_2^{E}$ is the number of pairs of HP-centroids from the same protein in the same cluster, $C_2^{P}$ is the total number of pairs of HP-centroids in the cluster and the values of D and E are limited to P.

The metric produces values in the range of [−1; 1] where −1 is the worst case, with minimum recall and maximum redundancy. It will result in 1 when we have maximum recall and minimum redundancy. When we have similar values for recall and redundancy, the metric approaches 0.

The average of the PRM of the clusters was used to evaluate the three different approaches (CCC, EBCC and SGCC). It cannot be used to compare between the OM and AC. In the OM, clusters are formed with total variability by definition; in other words, there is no pair of centroids of the same enzyme in a cluster. In this case, we use traditional intra- and intercluster average distances. A priori, a high-quality cluster must have low intracluster and high intercluster distances. That is because, in an ideal clustering, similar elements must be grouped together and dissimilar ones must be separated.

In conclusion, we compare the proposed approaches in the light of the PRM (the closer to 1, the better) and the average intra- and intercluster distances (it is better to have low intracluster and high intercluster distances).

3 RESULTS

In this section, we present and discuss the results of the case study of serine peptidases (Trypsin-like and Subtilisin-like) that are cross-inhibited by Eglin C and Turkey Ovomucoid. We also compare the quality of the conserved HP-centroids that are obtained by the different proposed methods.

3.1 The Eglin C Inhibitor

Eglin C is a small monomeric protein (70 residues) that belongs to the Potato Chymotrypsin Inhibitor I family of serine protease inhibitors and occurs naturally in the leech Hirudo medicinalis. Functionally, Eglin C can inhibit more than one proteinase family with non-homologous structures (Hyberts et al., 1992). In the BRENDA database (Scheer et al., 2011), we found 12 different EC numbers that are known to be inhibited by this molecule. In this section, we present the analysis with the five non-redundant existing experimental complexes, four of which are Subtilisin-like and one of which is Trypsin-like.

As explained previously, we use different approaches to find the HP-centroids. The OM has no parameters and it clusters all of the centroids. With the AC, we must supply the number of clusters as an input parameter. Figure 2a shows the distributions of mean PRM and intracluster distances. We observe that PRM is maximized and intradistance values are stable with 12 clusters. With this configuration, we obtain five high-quality clusters according to the PRM (Fig. 2b). This set of conserved HP-centroids presents a very high recall value (i.e. they are present in almost all the cross-inhibition complexes) and furthermore, there is only one case where two points in a cluster come from the same complex.

Fig. 2. The CCC approach. In (a), we present the distribution used to maximize the mean PRM metric as well as the respective mean intracluster distance distribution. (b) Shows the PRM distribution for the best configuration achieved, with 12 clusters.

The same experiment was performed using the fine-grained approach, as presented in Figure 3. At this level of abstraction, we could not identify a threshold that clearly distinguishes high-quality clusters from poor-quality ones. Since we aim to find as many conserved HP-centroids as possible, the coarse-grained approach systematically presents better results. This might indicate that the cross-inhibition pattern depends on the inhibitor-relative positions of the conserved HP-centroids regardless of their density.

Fig. 3. The SGCC approach. In (a), we present the distribution used to maximize the mean PRM metric as well as the respective mean intracluster distance distribution. (b) Shows the PRM distribution for the best configuration achieved, with 24 clusters.

Table 1 shows the complete set of results. AC performs better, especially in the coarse-grained analysis, achieving low intra- and high intercluster distances combined with a very high PRM value (0.98).

Table 1. Quantitative comparison of the proposed algorithms for Eglin C cross-inhibition.

Approach  Clustering  Mean intra (Å)  Mean inter (Å)  Mean PRM
CCC       AC          3.435           13.835          0.98
CCC       OM          5.460           9.294           –
SGCC      AC          2.450           13.138          0.82
SGCC      OM          5.093           11.231          –
EBCC      AC          2.679           12.670          0.90
EBCC      OM          5.339           9.986           –

The best mean PRM value is 0.98 (CCC with AC).

The semantics of the five hydrophobic patches represented by the conserved HP-centroids is presented in Figure 4. We can see why the proposed method reaches an abstraction level that is useful for identifying relevant cross-inhibition patterns. When we compare the residues that compose cluster IV, we can see for a Trypsin-like enzyme the presence of LEU-143, THR-151, ALA-149, TYR-146, CYS-220, CYS-191 and MET-192. At the counterpart cluster in a Subtilisin-like enzyme, we find PHE-193, ASN-163 and THR-224. Despite the very dissimilar residue compositions, patch volumes and densities, the method selects HP-centroids that are spatially conserved according to the inhibitor. Additional graphs for the other three samples are presented in the Supplementary Material.

3.2 The Turkey Ovomucoid inhibitor

Ovomucoids are the glycoprotein protease–inhibitors of avian egg whites. There are several protease inhibitors in egg white. The Turkey Ovomucoid is from a Kazal-type inhibitor family of serine protease inhibitors, which occurs naturally in Meleagris gallopavo. It is a significant contaminant of crude Ovomucoid preparations, and it acts on Bovine Trypsin and Chymotrypsin as well as on Porcine Elastase and Fungal Proteinase (Fujinaga et al., 1987; Robertson et al., 1988).

We analyze the four non-redundant existing complexes, of which three have Trypsin-like enzymes and one has Subtilisin-like enzymes. By conducting similar experiments to those presented in the previous section, and by varying the number of clusters, we observe that the mean intradistance stabilizes from four clusters on. We obtain three high-quality clusters according to the PRM (Supplementary Material).

Table 2 shows the results for the algorithm comparisons. As in the previous analysis, AC presents a combination of low intracluster distances, high intercluster distances and the highest PRM value (0.94), indicating a consistent match of the patterns. According to these results, the coarse-grained approach once more achieved better results than the fine-grained approach.

Table 2. Quantitative comparison of the proposed algorithms for Turkey Ovomucoid cross-inhibition.

Approach  Clustering  Mean intra (Å)  Mean inter (Å)  Mean PRM
CCC       AC          4.803           13.239          0.94
CCC       OM          8.009           10.402          –
SGCC      AC          2.901           10.999          0.75
SGCC      OM          9.303           11.014          –
EBCC      AC          3.419           14.045          0.75
EBCC      OM          6.459           11.997          –

The best mean PRM value is 0.94 (CCC with AC).

The three hydrophobic patches that were conserved in the Ovomucoid complexes are presented in Figure 5. Again, we can see a very dissimilar cluster composition and an interesting conservation of position according to the common inhibitor. We present additional graphs for Ovomucoid cross-inhibition in the Supplementary Material.

According to Baker and Murphy (1997), hydrophobic interactions are essential for explaining how inhibition happens in proteases. Our results are in agreement with this hypothesis. Searching for conserved abstractions of hydrophobic patches at the atomic level (HP-centroids) in protease–inhibitor interfaces, we proposed and evaluated a global and a local algorithm to cluster centroids. We aimed to find conserved centroids at coarse- and fine-grained levels.
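The coarse-grained pipeline evaluated above (hydrophobic contact graph, connected components, HP-centroids, agglomerative clustering, PRM) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy coordinates, the use of average linkage and the score of 0 for single-centroid clusters are all assumptions made for illustration, and the contact edges are taken as given rather than computed by Delaunay tessellation. The PRM function reads Equation (1) literally, as the difference between different-enzyme and same-enzyme pair counts divided by the total number of pairs in a cluster.

```python
from itertools import combinations
from math import dist


def hydrophobic_patches(edges):
    """Connected components of the hydrophobic contact graph, found by iterative DFS.

    `edges` holds (atom_id, atom_id) pairs that are assumed to have survived
    the removal of covalent and polar contacts.
    """
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    seen, patches = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, patch = [start], set()
        while stack:
            atom = stack.pop()
            if atom in seen:
                continue
            seen.add(atom)
            patch.add(atom)
            stack.extend(adjacency[atom] - seen)
        patches.append(patch)
    return patches


def centroid(points):
    """Geometric centroid (coordinate-wise mean) of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[axis] for p in points) / n for axis in range(3))


def agglomerate(points, n_clusters):
    """Naive bottom-up clustering; average linkage is an assumption of this sketch."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)
    return clusters


def prm(enzymes_in_cluster):
    """Equation (1) for one cluster, from the enzyme label of each HP-centroid."""
    pairs = list(combinations(enzymes_in_cluster, 2))
    if not pairs:
        return 0.0  # assumption: a singleton cluster has no pairs and scores 0
    different = sum(1 for a, b in pairs if a != b)
    same = len(pairs) - different
    return (different - same) / len(pairs)


# Toy data (hypothetical): two enzymes, each with two small hydrophobic patches.
coords = {
    "E1": {1: (0, 0, 0), 2: (2, 0, 0), 3: (10, 0, 0), 4: (12, 0, 0)},
    "E2": {1: (0, 1, 0), 2: (2, 1, 0), 3: (10, 1, 0), 4: (12, 1, 0)},
}
edges = {"E1": [(1, 2), (3, 4)], "E2": [(1, 2), (3, 4)]}

hp_centroids, labels = [], []
for enzyme, contact_edges in edges.items():
    for patch in hydrophobic_patches(contact_edges):
        hp_centroids.append(centroid([coords[enzyme][a] for a in patch]))
        labels.append(enzyme)

clusters = agglomerate(hp_centroids, n_clusters=2)
scores = [prm([labels[i] for i in cluster]) for cluster in clusters]
print(clusters, scores)
```

In this toy case each cluster pairs one HP-centroid from each enzyme, so both clusters reach the maximum PRM of 1; a cluster holding two centroids of the same enzyme would be penalized.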
We conclude that the coarse-grained AC local algorithm was able to identify the most complete set of invariant HP-centroids across the protease–inhibitor cross-inhibition examples.

Certainly, the contribution of polar interactions must be studied in more detail in future work; interestingly, however, we have found a minimum of three invariant centroids in all cross-inhibition cases. As proteins are 3D objects, we conjecture that for a molecule to bind and to hold another one, there must exist at least three non-collinear contact points. It is possible that the conserved hydrophobic patches obtained are responsible for binding and holding inhibitors at the enzyme binding sites.

Fig. 4. Hydrophobic patches for cross-inhibition by Eglin C. In (a) PDB id 1ACB:E, we can see a sample with Trypsin-like enzyme and in (b) PDB id 1TEC:E, with Subtilisin-like enzyme (the hydrophobic patches for the five complexes are in the Supplementary Material). We show an atomic graph in which the residue types and numbers are presented and the red (a) and green (b) spheres are the HP-centroids that represent each of the patches. In the last part of the figure (c), we present the inhibitor (residues from 40 to 48) as gray sticks (in black, the apolar portions), and the five centroids are superposed in colors. The green shades are the Subtilisin-like HP-centroids and the red ones are Trypsin-like.

Fig. 5. Hydrophobic patches for cross-inhibition by Turkey Ovomucoid. In (a) PDB id 1R0R:E, we can see a sample with Subtilisin-like enzyme and in (b) PDB id 1PPF:E, a sample with Trypsin-like enzyme (the hydrophobic patches for the four complexes are shown in the Supplementary Material). We show an atomic graph in which the residue types and numbers are presented and the red (a) and green (b) spheres show the HP-centroids that represent each patch. In (c), we present the inhibitor (residues from 13 to 21) as gray sticks (apolar portions in black), and the five HP-centroids are superposed in colors. The green shades are the Subtilisin-like centroids and the red ones are the Trypsin-like centroids.

3.3 The use of HP-centroids for inhibition prediction

Given the scarcity of experimental complexes representing cross-inhibition examples, it is intriguing to ask whether we can generalize the conserved HP-centroids to binding sites of apo enzymes of the studied families. We extended the analysis to a set of non-redundant apo structures of serine proteases (a list of proteins is in the Supplementary Material). We project the IFR obtained from the cross-inhibition complexes to the apo enzymes by using structural alignments, and we verify a strong conservation of the HP-centroids found through complex analysis.

Due to the low conservation of residues, it is not possible to understand how inhibition occurs by examining only sequence-level conservation (even when sequence alignments are done by structural alignments as shown in Fig. 6). Notice that we can find some conserved residues (marked with *) that are known to participate in the catalysis (catalytic triad, oxyanion hole) or in the specificity binding sites. Apart from these residues, no other interesting conservation can be easily identified in these logos.

Fig. 6. IFR projections of HP-centroids found in serine proteases that are cross-inhibited by Ovomucoid. In (a), we show results for nine non-redundant superposed Subtilisin-like enzymes (residue numbers according to PDB id 1R0R:E) and in (b) we show results for 35 non-redundant superposed Trypsin-like enzymes (residue numbers according to PDB id 1PPF:E). On both sides, the bottom logos show the residues that are in the IFR but that are not part of a conserved cluster.

However, our hypothesis is that for inhibition to occur, we must have very conserved hydrophobic patches in specific positions to accommodate each of the inhibitors. For example, PHE-215 in the Trypsin-like enzymes in Figures 6b and 5b is a voluminous hydrophobic residue that is equivalent to the hydrophobic portions of residues in positions LEU-96, ILE-107 and LEU-126 in the Subtilisin-like enzymes in Figures 6a and 5a. This is an example in which conserved patterns cannot be inferred from the sequence or structure but are clearly identified in our conserved HP-centroid I.

Going further, we believe that these patterns could be used to predict inhibition for other enzymes for which structures are available but no experimental evidence of inhibition is known. For instance, we used eight samples of non-redundant Subtilisin-like apo enzymes (listed in the Supplementary Material) belonging to five different EC numbers (3.4.21.62 / 64 / 66 / 75 / 97). We considered only those enzymes for which the ECs are complete with the four levels of annotation. According to the BRENDA database, three of these are inhibited by Eglin C (3.4.21.62 / 66 / 75), and we can say that these constitute successful predictions. As far as we know, the other two enzymes (3.4.21.64 / 97), which represent Proteinase K and Assemblin Protease, are not mentioned in the literature but present the same pattern as do the other Subtilisin-like enzymes. It would be very interesting to verify experimentally whether they can be inhibited by Eglin C, as they present the same HP-centroids as do other complexes with this inhibitor. Similar analyses for Ovomucoid are presented in the Supplementary Material, and we can also verify successful predictions and several unknown inhibition possibilities.

4 CONCLUSIONS

In this work, we model the problem of understanding and predicting enzyme cross-inhibition. We propose and evaluate algorithms to detect conserved hydrophobic patch centroids (HP-centroids) to clarify how these centroids occur in proteases. Our model is based on the importance of apolar interactions to inhibition in this family and on the fact that these hydrophobic portions should be studied at an atomic level. We model the interfaces between enzymes and inhibitors as graphs of atomic apolar interactions, detect connected components to represent hydrophobic patches, summarize them using centroids and show how to obtain as complete a set of conserved centroids as possible. One of the strengths of the method is that it achieves the appropriate level of abstraction to detect the invariant properties involved in cross-inhibition. One of the main difficulties in the study and understanding of this complex phenomenon through classical methods is that dissimilar sequences and structures might be inhibited by the same inhibitor. Despite the lack of conservation at the sequence and structure levels, the proposed HP-centroids appear to be promising, as they are very conserved across the studied cases of cross-inhibition.

As we have few non-redundant experimental complexes available, we test the generality of HP-centroids with a set of non-redundant apo enzymes representing entire families. By comparing with experimental data available in the BRENDA database, we also show some successful examples of how HP-centroids can be used to predict enzymes that could be inhibited by the studied inhibitors. Finally, we raise some questions about possible enzymes that might be inhibited by Eglin C and/or Turkey Ovomucoid and expose them to further experimental validation.

We believe that this work should be extended to other enzyme families for which entropic changes are known to be important factors in inhibition processes. It would also be interesting to verify whether this method could be used in other problems of protein–protein interaction pattern detection.

Funding: Brazilian agencies Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES); Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq); Fundação de Amparo a Pesquisa do Estado de Minas Gerais (FAPEMIG); Financiadora de Estudos e Projetos (FINEP).

Conflict of Interest: none declared.

REFERENCES

Baker,B.M. and Murphy,K.P. (1997) Dissecting the energetics of a protein-protein interaction: the binding of ovomucoid third domain to elastase. J. Mol. Biol., 268, 557–569.
Barrett,A.J. et al. (eds) (2004) Handbook of Proteolytic Enzymes, vol. 1–2, 2 edn. Elsevier, London.
Berman,H.M. et al. (2000) The protein data bank. Nucleic Acids Res., 28, 235–242.
Betzel,C. et al. (1993) Structure of the proteinase inhibitor eglin c with hydrolysed reactive centre at 2.0 a resolution. FEBS Lett., 317, 185–188.
Bode,W. et al. (1986) X-ray crystal structure of the complex of human leukocyte elastase (pmn elastase) and the third domain of the turkey ovomucoid inhibitor. EMBO J., 5, 2453–2458.
Chakrabarti,P. and Janin,J. (2002a) Dissecting protein-protein recognition sites. Proteins, 47, 334–343.
Chakrabarti,P. and Janin,J. (2002b) Dissecting protein-protein recognition sites. Proteins Struct. Funct. Genet., 47, 334–343.
Chothia,C. and Janin,J. (1975) Principles of protein-protein recognition. Nature, 256, 705–708.
Conte,L.L. et al. (1999) The atomic structure of protein-protein recognition sites. J. Mol. Biol., 285, 2177–2198.
da Silveira,C.H. et al. (2009) Protein cutoff scanning: a comparative analysis of cutoff dependent and cutoff free methods for prospecting contacts in proteins. Proteins, 74, 727–743.
Dupuis,F. et al. (2005) Voro3d: 3d voronoi tessellations applied to protein structures. Bioinformatics, 21, 1715–1716.
Ekici,O.D. et al. (2008) Unconventional serine proteases: variations on the catalytic ser/his/asp triad configuration. Protein Sci., 17, 2023–2037.
Fujinaga,M. et al. (1987) Crystal and molecular structures of the complex of alpha-chymotrypsin with its inhibitor turkey ovomucoid third domain at 1.8 a resolution. J. Mol. Biol., 195, 397–418.
Hyberts,S.G. et al. (1992) The solution structure of eglin c based on measurements of many noes and coupling constants and its comparison with x-ray structures. Protein Sci., 1, 736–751.
Janin,J. et al. (1990) The structure of protein-protein recognition sites. Structure, 265, 16027–16030.
Laskowski,M. and Qasim,M.A. (2000) What can the structures of enzyme-inhibitor complexes tell us about the structures of enzyme substrate complexes? Biochim. Biophys. Acta, 1477, 324–337.
Lesk,A.M. and Fordham,W.D. (1996) Conservation and variability in the structures of serine proteinases of the chymotrypsin family. J. Mol. Biol., 258, 501–537.
Melo-Minardi,R.C. et al. (2007) Finding protein-protein interaction patterns by contact map matching. Genet. Mol. Res., 6, 946–963.
Neshich,G. et al. (2003) Sting millennium: a web-based suite of programs for comprehensive and simultaneous analysis of protein structure and sequence. Nucleic Acids Res., 31, 3386.
Newman,M.E.J. and Girvan,M. (2004) Finding and evaluating community structure in networks. Phys. Rev. E, 69, 026113.
Page,M.J. and Di Cera,E. (2008) Serine peptidases: classification, structure and function. Cell. Mol. Life Sci., 65, 1220–1236.
Papamokos,E. et al. (1982) Crystallographic refinement of japanese quail ovomucoid, a kazal-type inhibitor, and model building studies of complexes with serine proteases. J. Mol. Biol., 158, 515–537.
Pires,D.E.V. et al. (2007) Pdbest: Pdb enhanced structures toolkit. In Proceedings of the 3rd International Conference of Brazil Association for Bioinformatics. AB3C Publishing, São Paulo, p. 39.
Pontius,J. et al. (1996) Deviations from standard atomic volumes as a quality measure for protein crystal structures. J. Mol. Biol., 264, 121–136.
Poupon,A. (2004) Voronoi and voronoi-related tessellations in studies of protein structure and interaction. Curr. Opin. Struct. Biol., 14, 233–241.
Qasim,M.A. et al. (1997) Interscaffolding additivity. association of p1 variants of eglin c and of turkey ovomucoid third domain with serine proteinases. Biochemistry, 36, 1598–1607.
Rawlings,N.D. et al. (2008) Merops: the peptidase database. Nucleic Acids Res., 36, D320–D325.
Reichardt,J. and Bornholdt,S. (2006) Statistical mechanics of community detection. Phys. Rev. E, 74, 016110.
Ribeiro,C. et al. (2010) Analysis of binding properties and specificity through identification of the interface forming residues (ifr) for serine proteases in silico docked to different inhibitors. BMC Struct. Biol., 10, 36.
Robertson,A.D. et al. (1988) Two-dimensional NMR studies of kazal proteinase inhibitors. 1. sequence-specific assignments and secondary structure of turkey ovomucoid third domain. Biochemistry, 27, 2519–2529.
Scheer,M. et al. (2011) BRENDA, the enzyme information system in 2011. Nucleic Acids Res., 39, 670–676.
Siezen,R.J. and Leunissen,J.A. (1997) Subtilases: the superfamily of subtilisin-like serine proteases. Protein Sci., 6, 501–523.
Sobolev,V. et al. (1999) Automated analysis of interatomic contacts in proteins. Bioinformatics, 15, 327–332.
Soundararajan,V. et al. (2010) Atomic interaction networks in the core of protein domains and their native folds. PLoS One, 5, e9391.
Tuncbag,N. et al. (2011) Prediction of protein-protein interactions: unifying evolution and structure at protein interfaces. Phys. Biol., 8, 035006.
Wallace,A.C. et al. (1996) Derivation of 3D coordinate templates for searching structural databases: application to ser-his-asp catalytic triads in the serine proteinases and lipases. Protein Sci., 5, 1001–1013.
Zhang,Z. et al. (2011) Identification of cavities on protein surface using multiple computational approaches for drug binding site prediction. Bioinformatics, 27, 2083–2088.

Journal

Bioinformatics – Oxford University Press

Published: Dec 9, 2011
