
A fast empirical approach to binding free energy calculations based on protein interface information

Abstract
Three useful variables from the interfaces of 20 protein–protein complexes were investigated: the side-chain accessible number (Nb), the number of hydrophilic pairs (Npair) and the buried apolar solvent-accessible surface area (ΔASAapol). An empirical model based on these three variables was developed to describe the free energy of protein association. The results show that the side-chain accessible number characterizes the loss of side-chain conformational entropy on protein binding, and that the empirical function presented here estimates the binding free energy well. The interface variables capture most of the significant features of protein–protein association. We also applied the model as a rescoring function in docking simulations and found that it has the potential to distinguish the ‘true’ binding mode. The simple empirical scale developed here is therefore an attractive target function for calculating binding free energies in applications ranging from various biological processes to rational protein design.
Introduction
Protein–protein interactions play a central role in protein function. Because the free energy is the key criterion for protein–protein binding, studying it is important both for understanding protein interactions and for applying this knowledge to protein engineering and drug design. Computer modeling makes it possible to study protein–protein association by direct simulation.
Accurate calculations of the free energy that drives protein–protein association are based on molecular dynamics or Monte Carlo simulations (Karplus and Petsko, 1990), with relative free energies obtained by perturbation or integration techniques (Mezei and Beveridge, 1986; Reynolds et al., 1992; Miyamoto and Kollman, 1993). However, these simulation methods are too computationally expensive for free energy calculations in conformational search, docking and drug design (Goodsell and Olson, 1990; Sezerman et al., 1993; Stoddard and Koshland, 1993). For simplicity, several groups have developed empirical functions to compute the binding free energy over the past decade (Novotny et al., 1989; Smith and Honig, 1994; Vajda et al., 1994, 1995, 1997; Jackson and Sternberg, 1995; Nauchitel et al., 1995; King et al., 1996; Weng et al., 1997; Xu et al., 1997; Zhang et al., 1997; Takamatsu and Itai, 1998; Camacho et al., 1999). For instance, Vajda and co-workers (Vajda et al., 1994) developed a relatively complete empirical free energy function:
\[\Delta G_{cal} = \Delta E_{el} + \Delta G_{d} - T\Delta S_{c} + \Delta G_{const}\] (1)
where ΔEel, ΔGd and ΔSc are the change in electrostatic energy, the desolvation free energy and the change in conformational entropy, respectively, and T is the absolute temperature. The last term, ΔGconst, includes all other free energy changes associated with translation, rotation, vibration and protonation/deprotonation effects. For complexes of proteases and their inhibitors, the average difference between calculated and measured free energies was ∼1.3 kcal/mol, an error of about 10% (Vajda et al., 1995; King et al., 1996). Subsequently, Zhang et al. (1997) proposed a binding free energy function based on the atomic contact energy.
The binding free energy is estimated by
\[\Delta G_{cal} = \Delta E_{c} + \Delta E_{el} - T\Delta S_{trv}\] (2)
where ΔEc is the change in atomic contact energy and ΔEel is the direct electrostatic interaction between the protease and its inhibitor. The term ΔStrv is the entropy change associated with the six rotational and translational degrees of freedom and with vibration. The deviation of ΔGcal from the experimental data ranged from ±0.1 to ±2 kcal/mol. In addition, Xu et al. (1997) devised a function of the buried molecular surface and the number of hydrophilic pairs:
\[\Delta G_{cal} = 0.0134\,S_{pho} + 0.0043\,S_{phi} + 0.3680\,N_{pair} + 0.81833\] (3)
where Spho and Sphi are the buried hydrophobic and hydrophilic molecular surfaces and Npair is the number of hydrophilic pairs in the complex, which reflect strong electrostatic interactions such as salt bridges, hydrogen bonds and other polar–polar interactions. In general, the entropy loss cannot be neglected in the binding free energy. However, entropy is difficult to calculate, because it depends on the complete phase space of a molecular system and is sensitive to the inclusion of correlations between motions along the many degrees of freedom (Karplus and Kushick, 1981; Di Nola et al., 1984). Pickett and Sternberg (1993) developed an empirical scale to estimate the side-chain conformational entropy loss.
In this scale, the maximum conformational entropy, Sc, of each side chain was calculated by the classical expression
\[S_{c} = -R\sum_{i=1}^{N} P_{ij}\ln P_{ij}\] (4)
where R is the gas constant and Pij is the probability of side chain j being in conformational state i, which can be estimated from the observed distributions of exposed side chains in proteins of known X-ray structure. To avoid the complicated calculation of conformational entropy while still accounting for its effect on the binding free energy, we derived a simple and effective empirical scale for the conformational entropy and the binding free energy from an analysis of protein interfaces. In this study, we analyzed the binding interfaces of 20 protein complexes and extracted three variables describing the interface: the side-chain accessible number (Nb), the number of hydrophilic pairs (Npair) and the buried apolar solvent-accessible surface area of the complex interface (ΔASAapol). An empirical scale in terms of these three variables was then established by linear fitting to experimental free energies. The scale was also applied as a scoring function in docking of 10 protein complexes. Finally, the feasibility and shortcomings of the empirical method are discussed.
Systems and methods
The X-ray structures of all 20 protein complexes were taken from the Protein Data Bank (Bernstein et al., 1977). Atoms missing from the deposited structures were built with the InsightII package on an SGI workstation, choosing extended conformations to avoid steric overlaps. The structures were then refined by energy minimization with the Gromacs programs (Berendsen et al., 1995), using an all-atom model.
The solvent-accessible surface area (ASA) was calculated with the method of Lee and Richards (1971), using atomic radii from the Gromacs force field parameters and a solvent probe radius of 1.4 Å. The change in ASA on complex formation, ΔASA, was calculated for each residue as the difference between its accessible surface area in the two isolated monomers and in the dimer. A residue was defined as an interface residue if its relative change in ASA exceeded 20%. The buried apolar area, ΔASAapol, was computed from the buried surface area of carbon atoms only (the contribution of sulfur atoms was omitted). The side-chain accessible number, Nb, is the number of contacted residues in the interface, where a contacted residue is defined by the effective accessibility change (ΔRA) of its side chain:
\[\Delta RA = \frac{\Delta A_{t}}{A^{*}_{t} \times 60\%} \times 100\%\] (5)
where ΔAt is the change in side-chain accessible surface area and A*t is the standard side-chain surface area. A residue across the interface was counted as side-chain accessible if its ΔRA was ≥1 (i.e. ≥100%), that is, if binding buries at least 60% of its standard side-chain surface area. In this work, 60% of the standard side-chain surface area in Equation 5 was approximated as 80 Å². The number of hydrophilic pairs, Npair, was defined from the distance between the critical points of hydrophilic atoms, which lie approximately at the centers of their contact surfaces (Lin et al., 1994). Two hydrophilic atoms closer than 2.8 Å (the diameter of the solvent probe) were treated as a hydrophilic pair.
To test the model, 10 complexes with experimentally determined structures were selected as a test set for molecular docking.
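The interface descriptors defined above can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Per-residue accessible surface areas (from a Lee and Richards calculation) and the hydrophilic critical-point coordinates of Lin et al. (1994) are assumed to be computed elsewhere. All function and constant names are hypothetical. Hydrophilic pairs are assumed to be counted across the interface, with one atom from each protein.

```python
import math
from typing import Iterable, Sequence, Tuple

# From the text: 60% of the standard side-chain area is approximated as 80 Å^2,
# so Equation 5 reduces to dA_t / 80 >= 1 for a side-chain accessible residue.
SIXTY_PERCENT_STANDARD_AREA = 80.0  # Å^2
# Diameter of the 1.4 Å solvent probe, used as the hydrophilic-pair cutoff.
HYDROPHILIC_PAIR_CUTOFF = 2.8  # Å

Point = Tuple[float, float, float]


def is_interface_residue(asa_monomer: float, asa_complex: float,
                         min_relative_change: float = 0.20) -> bool:
    """True if the residue's ASA drops by more than 20% on complex formation."""
    if asa_monomer <= 0.0:
        return False
    return (asa_monomer - asa_complex) / asa_monomer > min_relative_change


def effective_accessibility(delta_side_chain_asa: float,
                            denominator: float = SIXTY_PERCENT_STANDARD_AREA) -> float:
    """Equation 5 as a fraction: dA_t / (A*_t x 60%), with the 80 Å^2 approximation."""
    return delta_side_chain_asa / denominator


def side_chain_accessible_number(delta_side_chain_asas: Iterable[float]) -> int:
    """Nb: count of interface residues whose effective accessibility change is >= 1."""
    return sum(1 for d in delta_side_chain_asas if effective_accessibility(d) >= 1.0)


def hydrophilic_pairs(points_a: Sequence[Point], points_b: Sequence[Point],
                      cutoff: float = HYDROPHILIC_PAIR_CUTOFF) -> int:
    """Npair: cross-interface pairs of hydrophilic critical points closer than cutoff."""
    return sum(1 for p in points_a for q in points_b if math.dist(p, q) < cutoff)


# Hypothetical side-chain burials (Å^2) for four interface residues.
print(side_chain_accessible_number([95.0, 80.0, 42.5, 120.3]))  # prints 3
```

Here only the residue burying 42.5 Å² falls below the 80 Å² threshold, so Nb = 3. Using a single 80 Å² denominator follows the approximation stated in the text. A residue-specific A*t would simply replace the default `denominator`.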
The docking tests used the soft protein–protein docking algorithm developed in our group (C.H.Li et al., in preparation), which is based on the ‘simplified protein’ models of Janin’s rigid-body protein–protein docking algorithm (Cherfils et al., 1991, 1994; Cherfils and Janin, 1993). Part of the binding space, comprising part of the receptor surface and the complete ligand surface, was searched, yielding 3×10⁴ different modes of contact between the two proteins for each case. After filtering and cluster analysis, about 300 binding modes were retained, and these were scored by their calculated binding free energy.
Results and discussion
Correlation analysis of interface information
Conformational entropy affects the binding free energy of a protein and its ligand as well as protein folding. A major unfavorable entropic effect arises from the reduction in the number of conformations available to the protein backbone and side chains. As an approximation, we assume that the backbone has the same conformational entropy in all folded conformations. Therefore, only the side-chain entropy loss is taken into account, and only for side chains whose change in accessible area exceeds 60% of the standard side-chain surface area. When the side-chain accessible numbers, Nb, are fitted to the side-chain conformational entropy loss given by Pickett and Sternberg’s empirical scale, the linear fitting function is
\[T\Delta S = 1.17 - (3.78 \pm 0.26)\,N_{b}\] (6)
Figure 1 shows this linear fit of the side-chain conformational entropy (TΔS) versus Nb. Nb correlates very well with TΔS (R = 0.97), so Nb can be used to represent the side-chain conformational entropy loss in protein–protein binding.
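Equation 4 and the least-squares fit behind Equation 6 are both short enough to sketch directly. The sketch is illustrative only. `side_chain_entropy` evaluates Equation 4 for a hypothetical rotamer distribution, because Pickett and Sternberg's observed probabilities are not reproduced here. The gas constant is taken as 1.987×10⁻³ kcal/(mol·K). `linear_fit` is ordinary least squares applied to the Nb and TΔS columns of Table I. On those 20 rows it recovers the slope of Equation 6 (−3.78) and |R| ≈ 0.975, consistent with Figure 1, with an intercept of about 1.18 kcal/mol.

```python
import math
from typing import Sequence, Tuple

R_KCAL = 1.987e-3  # gas constant, kcal/(mol*K)


def side_chain_entropy(probabilities: Sequence[float]) -> float:
    """Equation 4: S_c = -R * sum_i P_i ln P_i, in kcal/(mol*K); P_i = 0 terms contribute 0."""
    return -R_KCAL * sum(p * math.log(p) for p in probabilities if p > 0.0)


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares y = a + b*x; returns (intercept a, slope b, Pearson r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope, sxy / math.sqrt(sxx * syy)


# Nb and TΔS (kcal/mol) for the 20 complexes, in the row order of Table I.
NB = [6, 5, 5, 7, 5, 5, 6, 5, 4, 2, 7, 4, 3, 3, 3, 5, 5, 4, 5, 5]
T_DS = [-19.0, -17.4, -17.9, -25.7, -18.6, -17.9, -21.0, -17.8, -16.8, -4.9,
        -26.0, -13.2, -9.9, -9.3, -11.7, -17.3, -18.4, -14.7, -17.3, -17.3]

intercept, slope, r = linear_fit(NB, T_DS)
print(f"TdS = {intercept:.2f} + ({slope:.2f}) * Nb, R = {r:.3f}")
# prints: TdS = 1.18 + (-3.78) * Nb, R = -0.975

# A hypothetical side chain with three equally likely rotamers: S_c = R ln 3.
print(f"{side_chain_entropy([1 / 3, 1 / 3, 1 / 3]) * 298.0:.3f} kcal/mol at 298 K")
```

The ±0.26 quoted in Equation 6 is not recomputed here, because the text does not say how that uncertainty was estimated.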
In addition to Nb and TΔS, Table I lists the buried apolar solvent-accessible areas ΔASAapol, the hydrophobic interaction energies ΔGd, the numbers of hydrophilic pairs Npair and the experimental binding free energies. The electrostatic interaction energies ΔEel of 13 complexes were taken from Zhang et al. (1997). Using these values, we analyzed the correlations between Npair and ΔEel and between ΔGd and ΔASAapol. Figure 2 shows the linear fit of the electrostatic interaction energies versus Npair (R = 0.92), and Figure 3 shows the linear fit of the hydrophobic energies ΔGd versus ΔASAapol (R = 0.94). Thus the quantities Nb, Npair and ΔASAapol capture most of the significant features of the interactions in these complexes.
Fast empirical calculation of binding free energy
As shown above, Nb, Npair and ΔASAapol are derived from the interfaces of protein complexes and correlate well with the conformational entropy change, the electrostatic interaction and the hydrophobic interaction, respectively. Writing the protein–protein binding free energy, ΔGcal, as a linear function of these three variables gives
\[\Delta G_{cal} = -0.87\,N_{b} - 0.35\,N_{pair} - 0.03\,\Delta ASA_{apol} + 0.92\] (7)
where the coefficients were obtained by multiple linear regression; their values are listed in the second column of Table II. The multiple correlation coefficient R is 0.95, so Nb, Npair and ΔASAapol derived from the interface describe the binding free energy of protein–protein association well. Table III compares the binding free energies calculated with different empirical functions. Our function correlates more highly with the experimental data (R = 0.95 over 20 complexes) than the other functions (R = 0.75, 0.70 and 0.86 over 13, 9 and 17 complexes, respectively).
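Because Equation 7 is linear, scoring a complex or a docked pose costs only three multiplications once Nb, Npair and ΔASAapol are known. The sketch below evaluates Equation 7 with the unrounded coefficients of Table II. It also includes a small normal-equations solver that illustrates the kind of multiple linear regression used to obtain those coefficients. Function names are hypothetical, and the solver is a generic textbook method, not necessarily the authors' software. For 1CHO (Nb = 6, Npair = 20, ΔASAapol = 121.72 Å²), it gives about −14.47 kcal/mol, close to the experimental −14.4 kcal/mol.

```python
from typing import List, Sequence

# Partial regression coefficients from Table II (Equation 7 shows them rounded).
COEF_NB = -0.87437
COEF_NPAIR = -0.35123
COEF_ASA_APOL = -0.02561
CONSTANT = 0.91791


def binding_free_energy(nb: int, npair: int, dasa_apol: float) -> float:
    """Equation 7 with the Table II coefficients; returns kcal/mol."""
    return COEF_NB * nb + COEF_NPAIR * npair + COEF_ASA_APOL * dasa_apol + CONSTANT


def fit_linear_model(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Multiple linear regression by solving the normal equations (X^T X) b = X^T y.

    Each row holds the predictors (e.g. Nb, Npair, dASA_apol). A constant column
    is appended, so the last returned coefficient is the intercept.
    """
    X = [list(r) + [1.0] for r in rows]
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    a = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


print(f"{binding_free_energy(nb=6, npair=20, dasa_apol=121.72):.2f}")  # prints -14.47
```

Passing the Nb, Npair and ΔASAapol columns of Table I as `rows`, with ΔGexp as `y`, performs the kind of fit reported in Table II. We have not checked that the tabulated values reproduce those coefficients exactly.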
This indicates that the three variables extracted from the interface can quantitatively represent the free energy of protein–protein association.
Application of the score function in protein–protein docking
Rescoring of docked conformations has made some progress and has been used to pick out conformations with low root mean square deviation (r.m.s.d.) from the native structure (Norel et al., 2001; Smith and Sternberg, 2002). The main terms used in such rescoring are statistics of residue–residue contacts across complex interfaces and electrostatics. The empirical method presented above is based on three variables extracted from the binding interface and calculates the free energy of protein–protein association quickly and accurately. In particular, it accounts for the conformational entropy, and the analysis above supports the accuracy of this term. We therefore applied it as a scoring function to rank putative docked structures. Table IV summarizes the docking results for the 10 protein–protein complexes: the name of each complex, the rank of the first near-native structure under our scoring function and its r.m.s.d. from the X-ray crystallographic complex. In the first six cases, the complexes were reconstructed from the structures of the co-crystallized proteins, so the conformations of the two molecules are already ‘adapted’ to each other; for this set, XX is appended to the PDB code in the ‘Protein’ column. In the next two cases, the complexes were reconstructed from one protein taken from the complex and the other in its free form.
For this set, FX or XF is appended to the PDB code, where F and X denote the free and co-crystallized forms, respectively. When both proteins were taken in their free forms, FF is appended. A docked geometry is considered near-native only if the r.m.s.d. of its backbone atoms from the X-ray structure is <4.0 Å. A near-native docked geometry was found for all 10 test complexes, and for six of them it was ranked within the top 10 solutions. This indicates that our scoring function is able to distinguish the ‘true’ binding mode from the remaining ‘false’ ones. Figure 4 compares the experimentally determined structures of four protein complexes with the best-ranked near-native predictions reported in Table IV. Although the r.m.s.d. between the predicted and X-ray structures reaches 3.00 Å for one of these cases (Figure 4C), the binding site is clearly identified in each. A reliable general form of rescoring function is still needed to distinguish the ‘true’ binding mode from the ‘false’ ones, and speed is also an important consideration. The rescoring function presented here is relatively fast and effective for scoring putative conformations, and we expect it to be applicable to protein–protein docking.
Conclusions
The interface information of protein–protein complexes is important for understanding protein–protein interactions and recognition. In this work, we identified useful variables from complex interfaces and developed a simple scale to calculate the binding free energy of protein–protein association; a model based on the same variables was used as a scoring function in protein–protein docking calculations. As discussed above, the side-chain accessible number, Nb, reasonably describes the loss of side-chain conformational entropy in the binding process.
The interface information of complexes has great potential for describing protein–protein association, and the corresponding three variables can be used to calculate the binding free energy. The model saves calculation time and is easy to use. However, the binding free energy function presented here relies on an approximation in which each molecule is treated as a ‘rigid body’. New docking methods that elucidate the details of specific interactions at the atomic level, and computational tools that provide information on protein–protein association in various environments, are still needed (Camacho and Vajda, 2002). Interface information from complexes may provide helpful hints on these problems and on specific associations. Work on improving the accuracy of the binding free energy and on treating molecular flexibility is under way.
Table I. Side-chain conformational entropy, binding free energies and fitting variables
Protein  Npair  Nb  ΔASAapol  TΔS     ΔGd     ΔGexp
1CHO     20     6   121.72    −19.0   −25.4   −14.4
1CSE     21     5   122.53    −17.4   −25.4   −13.1
1TEC     21     5   135.15    −17.9   −27.5   −14.0
1BRS     25     7   138.91    −25.7   −28.9   −17.3
2KAI     22     5   115.27    −18.6   −21.3   −12.5
2PTC     26     5   114.73    −17.9   −19.4   −18.1
2SNI     25     6   139.72    −21.0   −28.8   −15.8
2TGP     25     5   133.85    −17.8   −22.6   −17.8
2TPI     7      4   20.15     −16.8   −7.3    −5.8
3CPA     11     2   72.55     −4.9    −14.9   −5.3
3HFL     15     7   124.95    −26.0   −22.7   −14.2
3SGB     17     4   118.49    −13.2   −21.4   −12.7
4SGB     16     3   120.96    −9.9    −23.1   −11.7
4INS     6      3   135.80    −9.3    −27.5   −7.4
4CPA     15     3   133.81    −11.7   −26.3   −10.0
4TPI     34     5   119.62    −17.3   −19.8   −17.7
2SIC     20     5   142.67    −18.4   −30.3   −12.7
1PPF     20     4   149.84    −14.7   −27.9   −13.5
2SEC     21     5   120.38    −17.3   −24.3   −14.0
3TPI     32     5   111.51    −17.3   −19.3   −17.3
Experimental free energies of the 20 protein complexes are from Zhang et al. (1997) and Xu et al. (1997). All energies are in kcal/mol; areas are in Å².
Table II.
Results of multiple linear regression of binding free energy
Variable   Value(a)   Error(b)   t-Value(c)   Prob>|t|(d)
Npair      −0.35123   0.04734    −7.41995     <0.0001
Nb         −0.87437   0.25209    −3.46846     0.00317
ΔASAapol   −0.02561   0.01067    −2.39917     0.02897
Constant    0.91791   1.36731     0.66913     0.51295
(a) Partial regression coefficient. (b) Standard error of the value. (c) t-statistic, i.e. the value divided by its standard error. (d) Significance level of the t-value.
Table III. Comparison of four empirical methods for calculating binding free energy
Protein  ΔGexp   ΔG1cal   ΔG2cal   ΔG3cal   ΔG4cal
1CHO     −14.4   −14.52   −13.62   −14.06   −14.40
1CSE     −13.1   −16.50   n/a      −15.18   −13.51
1TEC     −14.0   −14.52   −12.89   −14.79   −13.92
1BRS     −17.3   n/a      n/a      −14.93   −17.68
2KAI     −12.5   −15.98   −13.77   −12.82   −13.03
2PTC     −18.1   −17.20   −15.19   −15.55   −15.97
2SNI     −15.8   −15.62   −15.65   −16.74   −16.45
2TGP     −17.8   −17.09   −15.65   −14.96   −15.56
2TPI     −5.8    n/a      n/a      −5.29    −6.47
3CPA     −5.3    n/a      n/a      −6.66    −6.33
3HFL     −14.2   −15.40   n/a      −12.54   −13.70
3SGB     −12.7   −10.58   −8.54    −11.90   −10.57
4SGB     −11.7   −13.50   −12.43   −12.50   −10.09
4INS     −7.4    n/a      n/a      −8.44    −8.45
4CPA     −10.0   n/a      n/a      −12.54   −10.50
4TPI     −17.7   −19.40   n/a      −20.98   −18.84
2SIC     −12.7   n/a      n/a      −15.22   −13.92
1PPF     −13.5   −10.5    n/a      n/a      −11.80
2SEC     −14.0   n/a      −12.2    n/a      −13.51
3TPI     −17.3   −21.2    n/a      n/a      −18.02
Correlation coefficient/No. of samples: ΔG1cal 0.75/13; ΔG2cal 0.70/9; ΔG3cal 0.86/17; ΔG4cal 0.95/20.
All values are in kcal/mol; n/a, no value reported. ΔG1cal are taken from Vajda et al. (1994), using the complete empirical free energy function; ΔG2cal are taken from Zhang et al. (1997), based on the atomic contact energy (ACE) method; ΔG3cal are taken from Xu et al. (1997), based on the hydrophilic pair model; ΔG4cal are calculated with Equation 7.
Table IV. Results of molecular docking calculations
Protein  Name of complex                              Rank  R.m.s.d. (Å)
4INSXX   Insulin dimer                                3     0.74
1ACBXX   α-Chymotrypsin/ovomucoid                     3     0.36
1CSEXX   Subtilisin/streptomyces inhibitor            8     3.50
1CHOXX   β-Trypsin/pancreatic trypsin inhibitor       2     3.30
2PTCXX   Cytochrome c peroxidase/iso-1-cytochrome c   19    0.85
2SICXX   Trypsinogen/pancreatic trypsin inhibitor     14    3.11
1PPEFX   Thermitase/eglin C                           4     2.30
2TECFX   Trypsinogen/pancreatic trypsin inhibitor     27    3.45
1BRBFF   β-Trypsin/pancreatic trypsin inhibitor       8     3.00
2PTCFF   Trypsin/pancreatic trypsin inhibitor         15    1.39
Fig. 1. Linear fitting of side-chain conformational entropy (TΔS) versus Nb. Results are calculated for the 20 complexes reported in Table I. The correlation coefficient, R, is 0.97.
Fig. 2. Linear fitting of electrostatic interaction energies versus Npair. Results are calculated for 13 complexes. The correlation coefficient, R, is 0.92.
Fig. 3. Linear fitting of hydrophobic energies versus ΔASAapol (in Å²). Results are calculated for the 20 complexes in Table I. The correlation coefficient, R, is 0.94.
Fig. 4. Superposition of the experimentally determined structures of four protein complexes and the best-ranked near-native predictions reported in Table IV.
Thick lines: Cα trace of the experimental structure. Thin lines: Cα trace of the predicted model. The four structures shown are taken from Table IV: (A) 1ACBXX; (B) 1PPEFX; (C) 1BRBFF; (D) 2PTCFF.
¹ To whom correspondence should be addressed. E-mail: cxwang@bjpu.edu.cn
We thank Professor J.Janin for providing the docking package. We also thank Dr Ben Zhuo Lu for helpful discussions. This work was supported in part by the Chinese Natural Science Foundation (Nos 29992590–2, 30170230 and 10174005).
References
Berendsen,H.J.C., van der Spoel,D. and van Drunen,R. (1995) Comput. Phys. Commun., 91, 43–56.
Bernstein,F.C., Koetzle,T.F., Williams,G.J.B., Meyer,E.F., Brice,M.D., Rodgers,J.R., Kennard,O., Shimanouchi,T. and Tasumi,M. (1977) J. Mol. Biol., 112, 535–542.
Camacho,C.J. and Vajda,S. (2002) Curr. Opin. Struct. Biol., 12, 36–40.
Camacho,C.J., Weng,Z., Vajda,S. and DeLisi,C. (1999) Biophys. J., 76, 1166–1178.
Cherfils,J. and Janin,J. (1993) Curr. Opin. Struct. Biol., 3, 265–269.
Cherfils,J., Duquerroy,S. and Janin,J. (1991) Proteins: Struct. Funct. Genet., 11, 271–280.
Cherfils,J., Bizebard,T., Knossow,M. and Janin,J. (1994) Proteins: Struct. Funct. Genet., 18, 8–18.
Di Nola,A., Berendsen,H.J.C. and Edholm,O. (1984) Macromolecules, 17, 2044–2050.
Goodsell,D.S. and Olson,A.J. (1990) Proteins: Struct. Funct. Genet., 8, 195–202.
Jackson,R.M. and Sternberg,M.J.E. (1995) J. Mol. Biol., 250, 258–275.
Karplus,M. and Kushick,J.N. (1981) Macromolecules, 14, 325–332.
Karplus,M. and Petsko,G.A. (1990) Nature, 347, 631–639.
King,B.L., Vajda,S. and DeLisi,C. (1996) FEBS Lett., 384, 87–91.
Lee,B. and Richards,F.M. (1971) J. Mol. Biol., 55, 379–400.
Lin,S.L., Nussinov,R., Fischer,D. and Wolfson,H.J. (1994) Proteins: Struct. Funct. Genet., 18, 94–101.
Mezei,M. and Beveridge,D.L. (1986) Ann. N. Y. Acad. Sci., 482, 1–23.
Miyamoto,S. and Kollman,P.A. (1993) Proteins: Struct. Funct. Genet., 16, 226–245.
Nauchitel,V., Villaverde,M.C. and Sussman,F. (1995) Protein Sci., 4, 1356–1364.
Norel,R., Sheinerman,F., Petrey,D. and Honig,B. (2001) Protein Sci., 10, 2147–2161.
Novotny,J., Bruccoleri,R.E. and Saul,F.A. (1989) Biochemistry, 28, 4735–4749.
Pickett,S.D. and Sternberg,M.J.E. (1993) J. Mol. Biol., 231, 825–839.
Reynolds,C.A., King,P.M. and Richards,W.G. (1992) Mol. Phys., 76, 251–275.
Sezerman,U., Vajda,S., Cornette,J. and DeLisi,C. (1993) Protein Sci., 2, 1827–1843.
Smith,G.R. and Sternberg,M.J.E. (2002) Curr. Opin. Struct. Biol., 12, 28–35.
Smith,K.C. and Honig,B. (1994) Proteins: Struct. Funct. Genet., 18, 119–132.
Stoddard,B.L. and Koshland,D.E.,Jr (1993) Proc. Natl Acad. Sci. USA, 90, 1146–1153.
Takamatsu,Y. and Itai,A. (1998) Proteins: Struct. Funct. Genet., 33, 62–73.
Vajda,S., Weng,Z.P., Rosenfeld,R. and DeLisi,C. (1994) Biochemistry, 33, 13977–13988.
Vajda,S., Weng,Z.P. and DeLisi,C. (1995) Protein Sci., 8, 1081–1092.
Vajda,S., Sippl,M. and Novotny,J. (1997) Curr. Opin. Struct. Biol., 2, 222–228.
Weng,Z.P., DeLisi,C. and Vajda,S. (1997) Protein Sci., 6, 1976–1984.
Xu,D., Lin,S.L. and Nussinov,R. (1997) J. Mol. Biol., 265, 68–84.
Zhang,C., Vasmatzis,G., Cornette,J.L. and DeLisi,C. (1997) J. Mol. Biol., 267, 707–726.
© Oxford University Press
Protein Engineering, Design and Selection, Oxford University Press

Publisher
Oxford University Press
Copyright
© Oxford University Press
ISSN
1741-0126
eISSN
1741-0134
DOI
10.1093/protein/15.8.677
Publisher site
See Article on Publisher Site

Abstract

Abstract Three useful variables from the interfaces of 20 protein–protein complexes were investigated. These variables are the side-chain accessible number (Nb), the number of hydrophilic pairs (Npair) and buried apolar solventaccessible surface areas (ΔΔASAapol). An empirical model based on the three variables was developed to describe the free energy of protein associations. As the results show, the side-chain accessible numbers characterize the loss of side-chain conformational entropy of protein interactions and the effective empirical function presented here has great capability for estimating the binding free energy. It was found that the variables of interface information capture most of the significant features of protein–protein association. Also, we applied the model based on the variables as a rescoring function to docking simulations and found that it has the potential to distinguish the ‘true’ binding mode. It is clear that the simple and empirical scale developed here is an attractive target function for calculating binding free energy for various biological processes to rational protein design. Introduction Protein–protein interactions play a central role in protein function. Owing to the free energy being the important criterion for protein–protein binding, research on it is important for a better understanding of protein interactions and for the subsequent application of this knowledge to protein engineering and drug design. Computer modeling makes it possible to perform direct simulations to study protein–protein associations. Accurate calculations of the free energy that drives the protein–protein association are based on molecular dynamics or Monte Carlo simulations (Karplus and Petsko, 1990) and the relative free energy is determined by perturbation or integration techniques (Mezei and Beveridge, 1986; Reynolds et al., 1992; Miyamoto and Kollman, 1993). 
However, these simulation methods are too computationally expensive for free energy calculations in conformational search, docking and drug design (Goodsell and Olson, 1990; Sezerman et al., 1993; Stoddard and Koshland, 1993). For simplicity, several groups have developed empirical functions to compute the binding free energy over the past decade (Novotny et al., 1989; Smith and Honig, 1994; Vajda et al., 1994, 1995, 1997; Jackson and Sternberg, 1995; Nauchitel et al., 1995; King et al., 1996; Weng et al., 1997; Xu et al., 1997; Zhang et al., 1997; Takamatsu and Itai, 1998; Camacho et al., 1999). For instance, Vajda and co-workers (Vajda et al., 1994) developed a relatively complete empirical free energy function:

\[\Delta G_{\mathrm{cal}} = \Delta E_{\mathrm{el}} + \Delta G_{\mathrm{d}} - T\Delta S_{\mathrm{c}} + \Delta G_{\mathrm{const}}\]  (1)

where ΔEel, ΔGd and ΔSc are the change in electrostatic energy, the desolvation free energy and the change in conformational entropy, respectively, and T is the absolute temperature. The last term, ΔGconst, collects all other free energy changes, associated with translation, rotation, vibration and protonation/deprotonation effects. For proteases and their inhibitors, the average difference between calculated and measured free energies was ∼1.3 kcal/mol, an error of about 10% (Vajda et al., 1995; King et al., 1996). Subsequently, Zhang et al. (1997) proposed a binding free energy function based on the atomic contact energy:

\[\Delta G_{\mathrm{cal}} = \Delta E_{\mathrm{c}} + \Delta E_{\mathrm{el}} - T\Delta S_{\mathrm{trv}}\]  (2)

where ΔEc is the change in atomic contact energy and ΔEel is the direct electrostatic interaction between the protease and its inhibitor.
The term ΔStrv is the entropy change associated with the six rotational and translational degrees of freedom and with vibration. The deviation of ΔGcal from the experimental data ranged from ±0.1 to ±2 kcal/mol. In addition, Xu et al. (1997) devised a function based on the number of hydrophilic pairs and the buried molecular surface:

\[\Delta G_{\mathrm{cal}} = 0.0134\,S_{\mathrm{pho}} + 0.0043\,S_{\mathrm{phi}} + 0.3680\,N_{\mathrm{pair}} + 0.81833\]  (3)

where Spho and Sphi are the buried hydrophobic and hydrophilic molecular surfaces and Npair is the number of hydrophilic pairs in the complex, which relate to strong electrostatic interactions such as salt bridges, hydrogen bonds and polar–polar interactions.

In general, the entropy loss is an indispensable contribution to the binding free energy. Calculating the entropy is difficult, however, because it depends on the complete phase space of the molecular system and is sensitive to the inclusion of correlations between motions along the many degrees of freedom (Karplus and Kushick, 1981; Di Nola et al., 1984). Pickett and Sternberg (1993) developed an empirical scale for the side-chain conformational entropy loss. In this scale the maximum conformational entropy, Sc, of each side chain was calculated by the classical expression

\[S_{\mathrm{c}} = -R \sum_{i=1}^{N} P_{ij} \ln P_{ij}\]  (4)

where R is the gas constant and Pij is the probability of side chain j being in conformational state i, which can be calculated from the observed distributions of exposed side chains in proteins of known X-ray structure.
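Equation 4 is the Gibbs–Shannon form of the entropy over discrete side-chain conformational states. The minimal Python sketch below evaluates it for one side chain, assuming the state probabilities Pij are already known. The populations in the example are hypothetical, not values from Pickett and Sternberg's scale, and R is taken as 1.987 × 10⁻³ kcal/(mol·K).

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def side_chain_entropy(probabilities):
    """Return S_c = -R * sum_i P_ij ln P_ij (Equation 4) for one side chain j.

    `probabilities` holds P_ij over the N conformational states i. States with
    zero probability contribute nothing, since p ln p -> 0 as p -> 0.
    """
    total = sum(probabilities)
    if not math.isclose(total, 1.0, rel_tol=1e-6):
        raise ValueError(f"probabilities must sum to 1, got {total}")
    return -R_KCAL * sum(p * math.log(p) for p in probabilities if p > 0.0)


if __name__ == "__main__":
    # Hypothetical rotamer populations for a side chain with three states.
    populations = [0.55, 0.30, 0.15]
    s_c = side_chain_entropy(populations)
    print(f"S_c = {s_c:.5f} kcal/(mol K); T*S_c at 298 K = {298.0 * s_c:.3f} kcal/mol")
```

For N equally populated states the sum reduces to R ln N, the largest value the expression can take for N states.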
To avoid the complicated calculation of conformational entropy while still accounting for its effect on the binding free energy, we derived a simple and effective empirical scale for the conformational entropy and the binding free energy from an analysis of protein interfaces. In this study, we analyzed the binding interfaces of 20 protein complexes and extracted three variables describing the interface: the side-chain accessible number (Nb), the number of hydrophilic pairs (Npair) and the buried apolar solvent-accessible surface area of the complex interface (ΔASAapol). The empirical scale in terms of these three variables was then established by linear fitting to experimental binding free energies. In addition, the scale was applied as a scoring function in docking of 10 protein complexes. Finally, the feasibility and shortcomings of the method are discussed.

Systems and methods

The X-ray structures of all 20 protein complexes were taken from the Protein Data Bank (Bernstein et al., 1977). Atoms not observed in each structure were built with the InsightII package on an SGI workstation, choosing extended conformations to avoid steric overlaps. The structures were then refined by energy minimization with the Gromacs programs (Berendsen et al., 1995), using an all-atom model. The solvent-accessible surface area (ASA) was calculated by the method of Lee and Richards (1971), with atomic radii taken from the Gromacs force field parameters and a solvent probe radius of 1.4 Å. The interface change of the complex, ΔASA, was calculated for each residue as the difference between its accessible surface area in the isolated monomer and in the dimer. A residue was defined as an interface residue if its relative change in ASA exceeded 20%.
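The interface-residue rule can be sketched as follows, assuming per-residue ASA values for the isolated monomers and for the complex have already been computed with a Lee–Richards-style calculation (1.4 Å probe). The function name, residue keys and ASA numbers are hypothetical, and "relative change" is read here as (ASA_monomer − ASA_complex)/ASA_monomer, which is our interpretation of the text.

```python
def interface_residues(asa_monomer, asa_complex, threshold=0.20):
    """Return residues whose ASA drops by more than `threshold` on complexation.

    asa_monomer and asa_complex map a residue key, e.g. (chain, resnum), to its
    solvent-accessible surface area in A^2 in the isolated monomer and in the
    complex, respectively. Insertion order of asa_monomer is preserved.
    """
    selected = []
    for key, free_asa in asa_monomer.items():
        if free_asa <= 0.0:
            continue  # fully buried already in the monomer: no relative change
        buried = free_asa - asa_complex[key]  # Delta ASA for this residue
        if buried / free_asa > threshold:
            selected.append(key)
    return selected


if __name__ == "__main__":
    # Hypothetical ASA values (A^2) for three residues.
    asa_free = {("A", 57): 120.0, ("A", 102): 40.0, ("I", 15): 150.0}
    asa_cplx = {("A", 57): 60.0, ("A", 102): 38.0, ("I", 15): 30.0}
    print(interface_residues(asa_free, asa_cplx))
```

Here residue A57 loses 50% and I15 loses 80% of their ASA, so both pass the 20% cut, while A102 (5%) does not.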
For the apolar groups, ΔASAapol was determined from the buried surface area of C atoms (the contribution of S atoms was omitted). The side-chain accessible number, Nb, is the number of contacting residues in the interface, where a contacting residue is defined by the effective accessibility change (ΔRA) of its side chain:

\[\Delta RA = \frac{\Delta A_{t}}{A^{*}_{t} \times 60\%} \times 100\%\]  (5)

where ΔAt is the change in the accessible surface area of the side chain and A*t is the standard side-chain surface area. A residue across the interface was taken as side-chain accessible if its ΔRA was ≥1 (i.e. ≥100%, so that ΔAt is at least 60% of A*t). In this work, 60% of the standard side-chain surface area in Equation 5 was set to 80 Å². The number of hydrophilic pairs, Npair, was defined from the distance between the critical points of hydrophilic atoms, which lie approximately at the centers of their contact surfaces (Lin et al., 1994). If the distance between two hydrophilic atoms was <2.8 Å (the diameter of the solvent probe), the two atoms were treated as a hydrophilic pair.

To test the model, 10 complexes with experimentally determined structures were selected as a test set for molecular docking. We used the soft protein–protein docking algorithm developed in our group (C.H.Li et al., in preparation), which is based on the ‘simplified protein’ models of Janin’s rigid-body protein–protein docking algorithm (Cherfils et al., 1991, 1994; Cherfils and Janin, 1993). A partial binding space, comprising part of the receptor surface and the complete ligand surface, was searched, yielding 3 × 10⁴ different modes of contact between the two proteins for each case. After filtering and cluster analysis, about 300 binding modes were retained, and these were then scored by their binding free energy.
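The definitions of Nb (Equation 5 with the 80 Å² reference area) and Npair (the 2.8 Å cutoff) given in Systems and methods can be sketched as two counting functions. This is an illustration under assumptions: the function names are hypothetical, the side-chain ΔAt values are assumed to be precomputed, and the input points stand in for the critical points of hydrophilic atoms, whose construction (Lin et al., 1994) is not reproduced here. Every cross-interface pair under the cutoff is counted.

```python
import math
from itertools import product

# 60% of the standard side-chain surface area, fixed at 80 A^2 in the text.
REFERENCE_AREA = 80.0
# Hydrophilic-pair cutoff in A (diameter of the 1.4 A solvent probe).
PAIR_CUTOFF = 2.8


def side_chain_accessible_number(delta_side_chain_asa):
    """Nb: count residues whose Delta RA ratio (Delta A_t / 80 A^2) is >= 1."""
    return sum(1 for d in delta_side_chain_asa if d / REFERENCE_AREA >= 1.0)


def hydrophilic_pair_count(points_a, points_b, cutoff=PAIR_CUTOFF):
    """Npair: count cross-interface point pairs closer than `cutoff` (in A)."""
    return sum(1 for p, q in product(points_a, points_b) if math.dist(p, q) < cutoff)


if __name__ == "__main__":
    # Hypothetical side-chain Delta A_t values (A^2) for four interface residues.
    print("Nb =", side_chain_accessible_number([95.0, 40.0, 80.0, 120.5]))
    # Hypothetical hydrophilic critical points on each side of the interface.
    side_a = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    side_b = [(2.0, 0.0, 0.0), (10.0, 2.9, 0.0), (10.0, 0.0, 2.5)]
    print("Npair =", hydrophilic_pair_count(side_a, side_b))
```

`math.dist` requires Python 3.8 or later; for large interfaces a spatial grid or k-d tree would replace the all-pairs loop.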
Results and discussion

Correlation analysis of interface information

Conformational entropy contributes both to the binding free energy of a protein and its ligand and to protein folding. A major unfavorable entropy effect arises from the reduction in the number of conformations accessible to the protein backbone and side chains. As an approximation, we assume that the backbone has the same conformational entropy in all folded conformations. Therefore, only the side-chain entropy loss is considered, for side chains whose accessibility is more than 60% of the standard side-chain surface area. When the side-chain accessible numbers, Nb, are fitted to the side-chain conformational entropy loss from Pickett and Sternberg’s empirical scale, the linear fit is

\[T\Delta S = 1.17 - (3.78 \pm 0.26)\,N_{b}\]  (6)

Figure 1 shows this linear fit of the side-chain conformational entropy (TΔS) against Nb. Nb correlates very well with TΔS (R = 0.97), so Nb can be used to represent the side-chain conformational entropy loss in protein–protein binding. Table I also lists the buried apolar solvent-accessible area ΔASAapol, the hydrophobic interaction energy ΔGd, the number of hydrophilic pairs Npair and the experimental binding free energies. The electrostatic interaction energies ΔEel of 13 complexes are taken from Zhang et al. (1997). Using these values, we analyzed the correlations between Npair and ΔEel and between ΔGd and ΔASAapol. Figure 2 shows the linear fit of the electrostatic interaction energies against Npair (R = 0.92, 13 complexes), and Figure 3 the linear fit of the hydrophobic energies ΔGd against ΔASAapol (R = 0.94, 20 complexes).
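Equation 6 is an ordinary least-squares line. As a check, the pure-Python sketch below refits it from the Nb and TΔS columns of Table I. With these 20 rows it gives a slope of about −3.78 kcal/mol, an intercept of about 1.18 kcal/mol and |R| ≈ 0.975, in line with Equation 6 and Figure 1. The helper name is our own.

```python
import math

# (Nb, T*Delta S in kcal/mol) for the 20 complexes of Table I.
TABLE_I = {
    "1CHO": (6, -19.0), "1CSE": (5, -17.4), "1TEC": (5, -17.9), "1BRS": (7, -25.7),
    "2KAI": (5, -18.6), "2PTC": (5, -17.9), "2SNI": (6, -21.0), "2TGP": (5, -17.8),
    "2TPI": (4, -16.8), "3CPA": (2, -4.9), "3HFL": (7, -26.0), "3SGB": (4, -13.2),
    "4SGB": (3, -9.9), "4INS": (3, -9.3), "4CPA": (3, -11.7), "4TPI": (5, -17.3),
    "2SIC": (5, -18.4), "1PPF": (4, -14.7), "2SEC": (5, -17.3), "3TPI": (5, -17.3),
}


def least_squares_line(xs, ys):
    """Return (intercept, slope, pearson_r) of the OLS fit y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope, sxy / math.sqrt(sxx * syy)


nb, tds = zip(*TABLE_I.values())
a, b, r = least_squares_line(nb, tds)
print(f"T*dS = {a:.2f} + ({b:.2f}) * Nb,  R = {r:.3f}")
```

R comes out negative because TΔS decreases as Nb increases; Figure 1 reports its magnitude.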
These results indicate that Nb, Npair and ΔASAapol capture most of the significant features of the interactions in these complexes.

Fast empirical calculation of binding free energy

As shown above, Nb, Npair and ΔASAapol are derived from the interface of protein complexes and correlate well with the conformational entropy change, the electrostatic interaction and the hydrophobic interaction, respectively. Writing the protein–protein binding free energy, ΔGcal, as a linear function of these three variables gives

\[\Delta G_{\mathrm{cal}} = -0.87\,N_{b} - 0.35\,N_{\mathrm{pair}} - 0.03\,\Delta \mathrm{ASA}_{\mathrm{apol}} + 0.92\]  (7)

where the coefficients were obtained by multiple linear regression; their unrounded values are listed in the second column of Table II. The multiple correlation coefficient R is 0.95, showing that Nb, Npair and ΔASAapol derived from the interface describe the binding free energy of protein–protein association well. Table III compares the binding free energies calculated with different empirical functions. Our function correlates more closely with the experimental data (R = 0.95 over the 20 complexes used in the fit) than the functions of Vajda et al. (0.75, 13 complexes), Zhang et al. (0.70, 9 complexes) and Xu et al. (0.86, 17 complexes), although each correlation is computed over a different set of complexes. This indicates that the three interface variables can quantitatively represent the free energy of protein–protein association.

Application of the scoring function in protein–protein docking

Rescoring of docked conformations has made some progress and has been used to pick out conformations with low root-mean-square deviation (r.m.s.d.) (Norel et al., 2001; Smith and Sternberg, 2002). The main terms used in rescoring are residue–residue contact statistics across the interfaces of complexes and electrostatics.
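Used as a scoring function, Equation 7 costs only three multiplications per docked mode once Nb, Npair and ΔASAapol are known. The sketch below uses the unrounded coefficients of Table II; the function name is hypothetical. For 1CHO (Nb = 6, Npair = 20, ΔASAapol = 121.72 from Table I) it returns about −14.47 kcal/mol, against ΔGexp = −14.4 kcal/mol. With the two-decimal coefficients printed in Equation 7, the same inputs give about −14.95 kcal/mol, mostly because 0.02561 is rounded to 0.03, so the Table II values are the safer choice for calculation.

```python
# Partial regression coefficients from Table II (kcal/mol per unit variable).
COEFFICIENTS = {"Nb": -0.87437, "Npair": -0.35123, "dASA_apol": -0.02561}
CONSTANT = 0.91791  # kcal/mol


def binding_free_energy(nb, npair, dasa_apol):
    """Equation 7: Delta G_cal in kcal/mol from the three interface variables.

    nb and npair are counts; dasa_apol is the buried apolar ASA in A^2.
    """
    return (COEFFICIENTS["Nb"] * nb
            + COEFFICIENTS["Npair"] * npair
            + COEFFICIENTS["dASA_apol"] * dasa_apol
            + CONSTANT)


if __name__ == "__main__":
    # 1CHO from Table I: Nb = 6, Npair = 20, Delta ASA_apol = 121.72 A^2.
    print(f"1CHO: {binding_free_energy(6, 20, 121.72):.2f} kcal/mol (exp. -14.4)")
```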
As discussed above, our empirical method is based on three variables extracted from the binding interface, and it calculates the free energy of protein–protein association quickly and accurately. In particular, it accounts for the conformational entropy, and the Nb-based entropy term agrees closely with Pickett and Sternberg’s scale (R = 0.97, Figure 1). We therefore tried to use this approach as a scoring function to rank putative docked structures in the protein–protein docking problem. Table IV summarizes the docking results for the 10 protein–protein complexes: the name of each complex, the rank of the first near-native structure under our scoring function and its r.m.s.d. from the X-ray crystallographic complex. In the first six cases, the complexes were reconstructed from the structures of the co-crystallized proteins, so the conformations of the two molecules are already ‘adapted’ to each other; these runs are marked XX after the PDB code in the ‘Protein’ column. In the next two cases, the complexes were reconstructed from one protein taken from the complex and the other in its free form; these are marked FX or XF, where F and X denote the free and co-crystallized forms, respectively. Complexes reconstructed from both proteins in their free forms are marked FF. A docked geometry is counted as near-native only if the r.m.s.d. of its backbone atoms from the X-ray structure is <4.0 Å. A near-native geometry was found for all 10 test complexes, and for six of them it ranked within the top 10 solutions. This indicates that our scoring function is able to distinguish the ‘true’ binding mode from the remaining ‘false’ ones.
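The evaluation protocol described above can be sketched as follows: rank the retained docking modes by ΔGcal, then report the rank of the first mode whose backbone r.m.s.d. from the X-ray complex is below 4.0 Å. This is an illustration under assumptions. Each mode is a hypothetical (score, coordinates) pair, coordinates are assumed to be already in the crystal-structure frame with atoms in matching order (no superposition is performed), and lower ΔGcal is taken as more favorable. The last lines recount the Table IV ranks.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of 3-D points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def first_near_native_rank(modes, reference, cutoff=4.0):
    """Return the 1-based rank of the best-scoring mode within `cutoff` A r.m.s.d.

    `modes` is a list of (delta_g_cal, backbone_coords); modes are ranked by
    increasing delta_g_cal (most favorable first). Returns None if no mode is
    near-native.
    """
    ranked = sorted(modes, key=lambda mode: mode[0])
    for rank, (_, coords) in enumerate(ranked, start=1):
        if rmsd(coords, reference) < cutoff:
            return rank
    return None


# Ranks of the first near-native solution reported in Table IV.
TABLE_IV_RANKS = [3, 3, 8, 2, 19, 14, 4, 27, 8, 15]
top10 = sum(rank <= 10 for rank in TABLE_IV_RANKS)
print(top10, "of", len(TABLE_IV_RANKS), "complexes have a near-native mode in the top 10")
```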
Figure 4 compares the experimentally determined structures of four protein complexes with the best-ranked near-native predictions reported in Table IV. Although the r.m.s.d. between the predicted and X-ray structures is as large as 3.00 Å (1BRBFF in Table IV), the binding site is satisfactorily identified in each case. A general form of rescoring function is needed to distinguish the ‘true’ binding mode reliably from the remaining ‘false’ ones, and speed is an important consideration for such functions. The results show that the rescoring function presented here is relatively fast and effective for scoring putative conformations, and we expect it to be applicable to protein–protein docking.

Conclusions

Interface information for protein–protein complexes is important for understanding protein–protein interactions and recognition. In this work, we investigated useful variables from the interfaces and developed a simple scale to calculate the binding free energy of protein–protein association; a function of these variables was also used as a scoring function in protein–protein docking. As discussed above, the side-chain accessible number, Nb, gives a reasonable description of the loss of side-chain conformational entropy on binding. Interface information has great potential for describing protein–protein association, and the three corresponding variables can be used to calculate the binding free energy. The model saves computation time and is easy to use. However, the binding free energy function presented here relies on an approximate treatment in which each molecule is treated as a ‘rigid body’.
Today it is necessary to develop both new docking methods that elucidate specific interactions at the atomic level and computational tools that provide information on protein–protein association in various environments (Camacho and Vajda, 2002). Interface information for complexes may offer helpful hints on this subject and insight into specific associations. Work on improving the accuracy of the binding free energy and on treating molecular flexibility is under way.

Table I. Side-chain conformational entropy, binding free energies and fitting variables

Protein ID  Npair  Nb  ΔASAapol (Å²)  TΔS  ΔGd  ΔGexp
1CHO  20  6  121.72  −19.0  −25.4  −14.4
1CSE  21  5  122.53  −17.4  −25.4  −13.1
1TEC  21  5  135.15  −17.9  −27.5  −14.0
1BRS  25  7  138.91  −25.7  −28.9  −17.3
2KAI  22  5  115.27  −18.6  −21.3  −12.5
2PTC  26  5  114.73  −17.9  −19.4  −18.1
2SNI  25  6  139.72  −21.0  −28.8  −15.8
2TGP  25  5  133.85  −17.8  −22.6  −17.8
2TPI  7  4  20.15  −16.8  −7.3  −5.8
3CPA  11  2  72.55  −4.9  −14.9  −5.3
3HFL  15  7  124.95  −26.0  −22.7  −14.2
3SGB  17  4  118.49  −13.2  −21.4  −12.7
4SGB  16  3  120.96  −9.9  −23.1  −11.7
4INS  6  3  135.80  −9.3  −27.5  −7.4
4CPA  15  3  133.81  −11.7  −26.3  −10.0
4TPI  34  5  119.62  −17.3  −19.8  −17.7
2SIC  20  5  142.67  −18.4  −30.3  −12.7
1PPF  20  4  149.84  −14.7  −27.9  −13.5
2SEC  21  5  120.38  −17.3  −24.3  −14.0
3TPI  32  5  111.51  −17.3  −19.3  −17.3

Experimental free energies of the 20 protein complexes are from Zhang et al. (1997) and Xu et al. (1997). All energies (TΔS, ΔGd, ΔGexp) are in kcal/mol; areas are in Å².
Table II. Results of multiple linear regression of binding free energy

Variable  Value(a)  Error(b)  t-Value(c)  Prob>|t|(d)
Npair  −0.35123  0.04734  −7.41995  <0.0001
Nb  −0.87437  0.25209  −3.46846  0.00317
ΔASAapol  −0.02561  0.01067  −2.39917  0.02897
Constant  0.91791  1.36731  0.66913  0.51295

(a) Partial regression coefficient. (b) Standard error of the value. (c) t statistic, i.e. the value divided by its standard error. (d) Significance level of the t-value.

Table III. Comparison of four empirical methods for calculating binding free energy

Protein  ΔGexp  ΔG1cal  ΔG2cal  ΔG3cal  ΔG4cal
1CHO  −14.4  −14.52  −13.62  −14.06  −14.40
1CSE  −13.1  −16.50  n/a  −15.18  −13.51
1TEC  −14.0  −14.52  −12.89  −14.79  −13.92
1BRS  −17.3  n/a  n/a  −14.93  −17.68
2KAI  −12.5  −15.98  −13.77  −12.82  −13.03
2PTC  −18.1  −17.20  −15.19  −15.55  −15.97
2SNI  −15.8  −15.62  −15.65  −16.74  −16.45
2TGP  −17.8  −17.09  −15.65  −14.96  −15.56
2TPI  −5.8  n/a  n/a  −5.29  −6.47
3CPA  −5.3  n/a  n/a  −6.66  −6.33
3HFL  −14.2  −15.40  n/a  −12.54  −13.70
3SGB  −12.7  −10.58  −8.54  −11.90  −10.57
4SGB  −11.7  −13.50  −12.43  −12.50  −10.09
4INS  −7.4  n/a  n/a  −8.44  −8.45
4CPA  −10.0  n/a  n/a  −12.54  −10.50
4TPI  −17.7  −19.40  n/a  −20.98  −18.84
2SIC  −12.7  n/a  n/a  −15.22  −13.92
1PPF  −13.5  −10.5  n/a  n/a  −11.80
2SEC  −14.0  n/a  −12.2  n/a  −13.51
3TPI  −17.3  −21.2  n/a  n/a  −18.02
Correlation coefficient/No. of samples  n/a  0.75/13  0.70/9  0.86/17  0.95/20

All values are in kcal/mol; n/a, no value reported. ΔG1cal are taken from Vajda et al. (1994), using the complete empirical free energy function; ΔG2cal are taken from Zhang et al. (1997), based on the atomic contact energy (ACE) method; ΔG3cal are taken from Xu et al. (1997), based on the hydrophilic pair model; ΔG4cal are calculated with Equation 7.

Table IV. Results of molecular docking calculations

Protein  Name of complex  Rank  R.m.s.d. (Å)
4INSXX  Insulin dimer  3  0.74
1ACBXX  α-Chymotrypsin/ovomucoid  3  0.36
1CSEXX  Subtilisin/streptomyces inhibitor  8  3.50
1CHOXX  β-Trypsin/pancreatic trypsin inhibitor  2  3.30
2PTCXX  Cytochrome c peroxidase/iso-1-cytochrome c  19  0.85
2SICXX  Trypsinogen/pancreatic trypsin inhibitor  14  3.11
1PPEFX  Thermitase/eglin C  4  2.30
2TECFX  Trypsinogen/pancreatic trypsin inhibitor  27  3.45
1BRBFF  β-Trypsin/pancreatic trypsin inhibitor  8  3.00
2PTCFF  Trypsin/pancreatic trypsin inhibitor  15  1.39

Fig. 1. Linear fit of side-chain conformational entropy (TΔS) versus Nb for the 20 complexes reported in Table I. The correlation coefficient, R, is 0.97.
Fig. 2. Linear fit of electrostatic interaction energies versus Npair for 13 complexes. The correlation coefficient, R, is 0.92.
Fig. 3. Linear fit of hydrophobic energies versus ΔASAapol (in Å²) for the 20 complexes in Table I. The correlation coefficient, R, is 0.94.
Fig. 4. Superposition of the experimentally determined structures of four protein complexes and the best-ranked near-native predictions reported in Table IV.
Thick lines: Cα trace of the experimental structure. Thin lines: Cα trace of the predicted model. The four structures shown are taken from Table IV: (A) 1ACBXX; (B) 1PPEFX; (C) 1BRBFF; (D) 2PTCFF.

1 To whom correspondence should be addressed. E-mail: cxwang@bjpu.edu.cn

We thank Professor J.Janin for providing the docking package. We also thank Dr Ben Zhuo Lu for helpful discussions. This work was supported in part by the Chinese Natural Science Foundation (Nos 29992590–2, 30170230 and 10174005).

References

Berendsen,H.J.C., van der Spoel,D. and van Drunen,R. (1995) Comput. Phys. Commun., 91, 43–56.
Bernstein,F.C., Koetzle,T.F., Williams,G.J.B., Meyer,E.F., Brice,M.D., Rodgers,J.R., Kennard,O., Shimanouchi,T. and Tasumi,M. (1977) J. Mol. Biol., 112, 535–542.
Camacho,C.J. and Vajda,S. (2002) Curr. Opin. Struct. Biol., 12, 36–40.
Camacho,C.J., Weng,Z., Vajda,S. and DeLisi,C. (1999) Biophys. J., 76, 1166–1178.
Cherfils,J. and Janin,J. (1993) Curr. Opin. Struct. Biol., 3, 265–269.
Cherfils,J., Duquerroy,S. and Janin,J. (1991) Proteins: Struct. Funct. Genet., 11, 271–280.
Cherfils,J., Bizebard,T., Knossow,M. and Janin,J. (1994) Proteins: Struct. Funct. Genet., 18, 8–18.
Di Nola,A., Berendsen,H.J.C. and Edholm,O. (1984) Macromolecules, 17, 2044–2050.
Goodsell,D.S. and Olson,A.J. (1990) Proteins: Struct. Funct. Genet., 8, 195–202.
Jackson,R.M. and Sternberg,M.J. (1995) J. Mol. Biol., 250, 258–275.
Karplus,M. and Kushick,J.N. (1981) Macromolecules, 14, 325–332.
Karplus,M. and Petsko,G.A. (1990) Nature, 347, 631–639.
King,B.L., Vajda,S. and DeLisi,C. (1996) FEBS Lett., 384, 87–91.
Lee,B. and Richards,F.M. (1971) J. Mol. Biol., 55, 379–400.
Lin,S.L., Nussinov,R., Fischer,D. and Wolfson,H.J. (1994) Proteins: Struct. Funct. Genet., 18, 94–101.
Mezei,M. and Beveridge,D.L. (1986) Ann. N. Y. Acad. Sci., 482, 1–23.
Miyamoto,S. and Kollman,P.A. (1993) Proteins: Struct. Funct. Genet., 16, 226–245.
Nauchitel,V., Villaverde,M.C. and Sussman,F. (1995) Protein Sci., 4, 1356–1364.
Norel,R., Sheinerman,F., Petrey,D. and Honig,B. (2001) Protein Sci., 10, 2147–2161.
Novotny,J., Bruccoleri,R.E. and Saul,F.A. (1989) Biochemistry, 28, 4735–4749.
Pickett,S.D. and Sternberg,M.J.E. (1993) J. Mol. Biol., 231, 825–839.
Reynolds,C.A., King,P.M. and Richards,W.G. (1992) Mol. Phys., 76, 251–275.
Sezerman,U., Vajda,S., Cornette,J. and DeLisi,C. (1993) Protein Sci., 2, 1827–1843.
Smith,G.R. and Sternberg,M.J.E. (2002) Curr. Opin. Struct. Biol., 12, 28–35.
Smith,K.C. and Honig,B. (1994) Proteins: Struct. Funct. Genet., 18, 119–132.
Stoddard,B.L. and Koshland,D.E.,Jr. (1993) Proc. Natl Acad. Sci. USA, 90, 1146–1153.
Takamatsu,Y. and Itai,A. (1998) Proteins: Struct. Funct. Genet., 33, 62–73.
Vajda,S., Weng,Z.P., Rosenfeld,R. and DeLisi,C. (1994) Biochemistry, 33, 13977–13988.
Vajda,S., Weng,Z.P. and DeLisi,C. (1995) Protein Sci., 8, 1081–1092.
Vajda,S., Sippl,M. and Novotny,J. (1997) Curr. Opin. Struct. Biol., 7, 222–228.
Weng,Z.P., DeLisi,C. and Vajda,S. (1997) Protein Sci., 6, 1976–1984.
Xu,D., Lin,S.L. and Nussinov,R. (1997) J. Mol. Biol., 265, 68–84.
Zhang,C., Vasmatzis,G., Cornette,J.L. and DeLisi,C. (1997) J. Mol. Biol., 267, 707–726.

© Oxford University Press

Journal: Protein Engineering, Design and Selection, Oxford University Press

Published: Aug 1, 2002
