TY - JOUR AU - Go, Nobuhiro AB - Abstract In order to search for a common structural motif in the phosphate-binding sites of protein–mononucleotide complexes, we investigated the structural variety of phosphate-binding schemes by an all-against-all comparison of 491 binding sites found in the Protein Data Bank. We found four frequently occurring structural motifs composed of protein atoms interacting with phosphate groups, each of which appears in different protein superfamilies with different folds. The most frequently occurring motif, which we call the structural P-loop, is shared by 13 superfamilies and is characterized by a four-residue fragment, GXXX, interacting with a phosphate group through the backbone atoms. Various sequence motifs, including Walker's A motif or the P-loop, turn out to be a structural P-loop found in a few specific superfamilies. The other three motifs are found in pairs of superfamilies: protein kinase and glutathione synthetase ATPase domain like, actin-like ATPase domain and nucleotidyltransferase, and FMN-linked oxidoreductase and PRTase.

Introduction
Molecular recognition is one of the key steps in numerous biological functions of proteins. Specificity of recognition is determined by the detailed compatibility between the spatial arrangement of protein and ligand atoms. The increasing number of three-dimensional structures of proteins, complexed with appropriate ligands, in the Protein Data Bank (Bernstein et al., 1977) provides important clues for understanding the mechanism of molecular recognition. In this study, we focused on protein–mononucleotide complexes. Because mononucleotides appear repeatedly in many biological processes, such as energy transfer, signal transduction and protein biosynthesis, the study of protein–mononucleotide complexes and their interactions is extremely important.
Comparative structural studies on mononucleotide-binding proteins have identified various sequence motifs (Traut, 1994) and a variety of protein folds (Schulz, 1992). Recently, structural analyses of atomic interactions between the base part of ATP or GTP and proteins revealed the structural variety of the purine base recognition (Kobayashi and Go, 1997) and showed that proteins, having totally different folds, may adopt similar recognition schemes (Kobayashi and Go, 1996). These studies indicate that sequence or protein fold information alone is not sufficient to account for the variations of the binding schemes; rather, comparison at the atomic structure level is required. Keeping this view in mind, we have extended the previous study (Kobayashi and Go, 1997) and analyzed, using an all-against-all comparison of the local atomic environments around the recognition site, the phosphate-binding sites taken from 491 coordinate sets in the Protein Data Bank (October 1997 release).

Materials and methods
The 491 PDB structures were taken from 339 entries (October 1997 release; some entries contain two or more mononucleotides). The comparison of the local environments around the phosphate group was carried out as follows. Atoms are chosen for comparison if they are within 7.0 Å of the nearest phosphorus atom of the mononucleotide. The spatial arrangement of the atoms is represented by a distance matrix containing the distances for all pairs of atoms. For two such distance matrices, a pair of sub-matrices that satisfies the following four conditions is searched for: (i) the difference between corresponding distances in the two sub-matrices is <1.5 Å for any pair of corresponding atoms; (ii) the number of corresponding atoms, Nc, is maximum; (iii) the root mean square displacement (r.m.s.d.) of the corresponding atoms after superposition is <1.0 Å; and (iv) the pair of corresponding atoms belongs to the same atom type according to the definition of Warme and Morgan (1978).
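As a rough illustration of the search described above, conditions (i), (ii) and (iv) can be posed as a maximum clique problem on a graph whose nodes are candidate atom correspondences. The following Python sketch is not the authors' program: the atom records (dictionaries with hypothetical `type` and `xyz` keys), the function names and the use of a pivoting variant of the Bron–Kerbosch recursion are all assumptions made for illustration, the `type` label merely stands in for the Warme and Morgan (1978) atom types, and the r.m.s.d. check of condition (iii) is omitted.

```python
import math
from itertools import combinations


def atoms_near_phosphorus(atoms, phosphorus, cutoff=7.0):
    """Keep atoms lying within `cutoff` angstroms of the nearest phosphorus atom."""
    return [a for a in atoms
            if min(math.dist(a["xyz"], p) for p in phosphorus) <= cutoff]


def distance_matrix(coords):
    """Distances for all pairs of atoms, as a list of lists."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def common_environment(site_a, site_b, tol=1.5):
    """Return the largest list of index pairs (i, j) meeting conditions (i), (ii), (iv).

    Condition (iii), the r.m.s.d. after superposition, is NOT checked here.
    """
    da = distance_matrix([a["xyz"] for a in site_a])
    db = distance_matrix([b["xyz"] for b in site_b])

    # Nodes: candidate correspondences between atoms of the same type (condition iv).
    nodes = [(i, j)
             for i, a in enumerate(site_a)
             for j, b in enumerate(site_b)
             if a["type"] == b["type"]]

    # Edges: two correspondences are compatible when the matching
    # intra-site distances differ by less than `tol` (condition i).
    adj = {n: set() for n in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i != k and j != l and abs(da[i][k] - db[j][l]) < tol:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))

    best = []

    def expand(r, p, x):
        """Bron-Kerbosch recursion with pivoting; records the largest clique (condition ii)."""
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = r
            return
        pivot = max(p | x, key=lambda n: len(adj[n] & p))
        for v in list(p - adj[pivot]):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand([], set(nodes), set())
    return sorted(best)


# Toy data (hypothetical): site_b is site_a translated by 10 angstroms along x,
# listed in reverse order, plus one atom of an unrelated type.
site_a = [
    {"type": "N", "xyz": (0.0, 0.0, 0.0)},
    {"type": "CA", "xyz": (1.5, 0.0, 0.0)},
    {"type": "C", "xyz": (2.0, 1.4, 0.0)},
    {"type": "O", "xyz": (3.2, 1.6, 0.0)},
]
site_b = [{"type": a["type"], "xyz": (a["xyz"][0] + 10.0, a["xyz"][1], a["xyz"][2])}
          for a in reversed(site_a)]
site_b.append({"type": "S", "xyz": (50.0, 50.0, 50.0)})

pairs = common_environment(site_a, site_b)
nc = len(pairs)  # number of corresponding atoms, Nc
```

In this sketch, `nc` plays the role of Nc. A fuller implementation would superpose the matched atoms and reject any clique whose r.m.s.d. is 1.0 Å or more.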
The maximum clique detection algorithm (Bron and Kerbosch, 1973) was used for finding such a pair of sub-matrices. The set of corresponding atoms thus found constitutes a common local environment whose level of similarity is measured by the number of the corresponding atoms, Nc. Since the number and the species of corresponding atoms differ from one common local environment to another, a cluster analysis based on a single similarity scale is not suitable for classifying the common local environments. Instead, we represent the results in a correlation map shown in Figure 1a. This is a 491 × 491 symmetric matrix, where a dot indicates a pair of binding sites, within the 491 structures, that have similar configurations of protein atoms interacting with a phosphate group. The 491 structures are aligned on the axis so that the dots (binding site pairs) pack as tightly as possible on the diagonal line. This procedure places structures sharing similar local environments close to each other on the axis and clusters of dots are formed on the diagonal line as in Figure 1a.

Results and discussion

Classification of phosphate-binding site
The results of the all-against-all comparison among the 491 mononucleotide-binding sites are summarized in the 491 × 491 correlation map in Figure 1a. Dots are colored according to the level of similarity of the pair of binding sites: red dots are for highly similar pairs (Nc ⩾ 30) and green dots are for less similar ones (23 ⩽ Nc ⩽ 29). The lower bound, 23, was chosen so that most of the common local environments defined between two superfamilies have a unique set of corresponding atoms. Below this threshold level, the common local environments for two superfamilies tend to have variations in the corresponding atoms. An example of the higher level similarity denoted by a red dot is found between entries (1) and (2) in Figure 1b, where atoms in a similar configuration exist in main or side chains indicated by red or green traces.
Metal cations, indicated by gray spheres, also occupy similar positions. Such similarity is found in square-shaped clusters of red dots in Figure 1a. A detailed analysis of the clusters showed that each of these clusters corresponds to a superfamily defined in the SCOP database (Murzin et al., 1995). In other words, each superfamily appears to assume a different phosphate-binding scheme. On the other hand, a lower level of similarity, denoted by green off-diagonal dots, is found in a more limited portion of the binding site, such as between entries (1) and (3) in Figure 1b, where the atoms with green traces assume similar configurations. We call such a fragment, having an inter-cluster or inter-superfamily similarity, a structural motif. Many structural motifs, represented by green off-diagonal dots, are found in a square enclosed by the solid black line containing 13 superfamilies. The structural motifs within this square have a common spatial arrangement of the backbone and phosphate atoms with some variations in their side chain structures. We call this common structure, characterizing the phosphate-binding sites of the 13 superfamilies, a structural P-loop. The other three sets of structural motifs enclosed by circles are shared by pairs of superfamilies: (1) protein kinase and glutathione synthetase ATP-binding domain like (Figure 3a); (2) nucleotidyltransferase and actin-like ATPase domain (Figure 3b); and (3) FMN-linked oxidoreductase and PRTase catalytic (C) site (Figure 3c). The other green dots in Figure 1a, those above circle 3 and on the left of circle 2, are found in only a limited number of the superfamilies. These do not represent any structural motif, i.e. a unique set of corresponding atoms for a pair of superfamilies.

Structural P-loop
The common structure shared by 13 superfamilies, the structural P-loop, contains a four-residue backbone fragment whose conformation is illustrated in Figure 1b.
The fragment forms hydrogen bonds between its backbone atoms and a phosphate group. The structural alignments, shown schematically in Figure 2, indicate that the first residue of the structural P-loop is always glycine, whereas the other three residues are not conserved at all. Since side chain atoms do not participate in binding of the common phosphate, it is reasonable that the amino acid sequences are not conserved. The importance of the main chain atoms in nucleotide binding has already been pointed out by Swindells (1993) in loop search studies of nucleotide-binding sites in doubly wound α/β proteins. However, the reason for the strict conservation of the glycine residue is not clear: the glycine (ϕ,ψ) angles are scattered all over the Ramachandran (ϕ,ψ) map and their supposed Cβ positions are not necessarily occupied by one of the phosphate atoms. Figure 2 also shows that the structural P-loop is mostly on a loop connecting a β-strand to an α-helix, with some exceptions. This observation is consistent with the tendency that a negatively charged group frequently binds to the N-terminus of an α-helix (Hol et al., 1978). Various sequence motifs have been proposed for the phosphate-binding site: Walker's A motif (Walker et al., 1982) or the P-loop (Saraste et al., 1990), the GXGXXG motif in the protein kinase family (Hanks et al., 1988) and the HSP70 protein family signatures (Bairoch et al., 1997). As can be seen in Figure 2, these sequence motifs always overlap with the structural P-loop. In contrast, there are many superfamilies with the structural P-loop that do not possess any sequence motif. Thus the structural P-loop covers a much wider variety of protein superfamilies than the sequence motifs.

Protein kinase versus glutathione synthetase ATP-binding domain like
Figure 3a shows a representative case of a common structural motif found in two superfamilies: protein kinase and glutathione synthetase ATP-binding domain like.
The divalent cations are coordinated by the two phosphate groups of ATP and Asp184 in cAPK. In DD-ligase, the divalent cations are coordinated by a phosphate group of ADP, Glu270 and a phosphino-phosphate 3 in PHY (an inhibitor). In addition, the phosphate groups are hydrogen-bonded with the backbone amides of Ser150 in cAPK and Ser53 in DD-ligase. The r.m.s.d. value is 0.85 Å and Nc = 32. In these proteins, the negative charges on the phosphate groups are neutralized by, among others, two divalent cations and an ε-NH3+ group of lysine, Lys72 in cAPK and Lys97 in DD-ligase. The lysine side chains in the two proteins have totally different orientations, while the ε-NH3+ groups occupy very similar positions, as illustrated in Figure 3a. It should be noted that these lysine residues do not have similar positions along the sequence. Although the local environments around the phosphate groups in cAPK and DD-ligase are strikingly similar to each other, the atomic configurations, other than the common phosphate-binding site, are totally different. This structural difference may reflect the difference of the target molecule to which a γ-phosphate is transferred, i.e. phosphorylase kinase in cAPK and a d-alanine in DD-ligase.

Nucleotidyltransferase versus actin-like ATPase domain
An example of the structural motif found both in nucleotidyltransferase and actin-like ATPase domain is shown in Figure 3b. The r.m.s.d. value after superposition is 0.92 Å and Nc = 28. This structural motif consists of three components which are not consecutive along the sequence, i.e. two backbone fragments (179A–181A, 187A–189A in Dpol B and 12–14, 201–203 in HSP70) and an aspartic acid (D192 in Dpol B and D10 in HSP70) coordinating a divalent cation (Mn2+ for Dpol B and Mg2+ for HSP70). The two backbone fragments are located in one case on a loop and in the other on two loops forming hydrogen bonds to the phosphate group through backbone atoms.
The divalent cation in Dpol B interacts with the α, β and γ phosphate groups in thymidine-5′-triphosphate (TTP), while the cation in HSP70 interacts with phosphoric acid and the β phosphate group in ATP. The positions of the sugar moieties in mononucleotides are totally different from each other. Such a situation is also observed in all the other structural motifs found in our study.

FMN-linked oxidoreductase versus PRTase C site
The structural motif shown in Figure 3c is found in the two superfamilies, FMN-linked oxidoreductase and PRTase C site. The r.m.s.d. value is 0.57 Å and Nc = 28. The key features of similarity are two backbone fragments (325–326 and 345–348 in OYE and 353–354 and 347–350 in PRTase), one of which contains an arginine residue at the end (R348 in OYE and R350 in PRTase) making hydrogen bonds to the phosphate groups. In addition to the arginine residue, the backbone atoms also interact with the phosphate group. It should be noted that the two corresponding loops are in the opposite order along the sequence.

Structural comparison versus sequence comparison
The structural comparisons have revealed striking similarities in the phosphate-binding sites beyond the level of superfamily. Sequence comparison alone cannot detect such similarities. It is found that the main chain atoms in the structural motifs are mostly responsible for the phosphate binding. This is the reason for the large variation in the local sequences (Figures 1 and 3). Furthermore, Figure 3 shows that the corresponding atoms in the structural motifs are not necessarily aligned along the sequence, and sometimes are even in the opposite order. All of these structural motifs found to be common among different superfamilies and folds should be the result of convergent evolution.
Comparison of the structural motifs with all PDB structures
As with sequence motifs, it should be possible to identify phosphate-binding sites by searching the Protein Data Bank with a query on the structural motifs. To examine this possibility, we compared 651 structures in the PDB-select dataset (25% list; October 1997 release) with the structural motifs, considering only the protein atoms. The result of the comparison was assessed by a Z-score, the normalized difference between two average values of Nc, one for proteins containing the structural motif and the other for all proteins. The normalization was done in terms of the standard deviation of Nc for all proteins. The Z-scores were calculated to be 0.19 for the structural P-loop (the backbone structure in Figure 1b; 16 protein atoms), 8.4 for protein kinase and glutathione synthetase ATP-binding domain like (Figure 3a; 31 protein atoms), 7.6 for nucleotidyltransferase and actin-like ATPase domain (Figure 3b; 38 protein atoms) and 3.9 for FMN-linked oxidoreductase and PRTase catalytic site (Figure 3c; 25 protein atoms). Each distribution for proteins containing one of the structural motifs, except the structural P-loop, is well separated from the distribution for all other proteins. The structural P-loop is only a necessary condition for phosphate binding, not a sufficient one. When non-protein atoms are ignored, the main chain trace of a β-turn of four residues becomes widespread, giving rise to the observed small Z-score for the structural P-loop. Therefore, to recognize the structural P-loop properly, it should ideally be defined in a similar manner as a template of a sequence motif, which contains not only the common part (or a product-set of various structures, structural P-loop in this case) but also some of the frequently occurring peripheral parts (or a sum-set).
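The Z-score described in this section can be written down in a few lines. The sketch below is only an illustration under stated assumptions: the function name `motif_z_score` and the toy Nc values are hypothetical, and the population standard deviation is used because the text does not say whether the population or sample form was applied.

```python
from statistics import mean, pstdev


def motif_z_score(nc_with_motif, nc_all):
    """Normalized difference between two average Nc values.

    (mean Nc of proteins containing the motif - mean Nc of all proteins)
    divided by the standard deviation of Nc over all proteins.
    Assumption: population standard deviation (pstdev).
    """
    sd = pstdev(nc_all)
    if sd == 0:
        raise ValueError("Nc has zero spread over all proteins; Z-score undefined")
    return (mean(nc_with_motif) - mean(nc_all)) / sd


# Toy numbers (hypothetical, not taken from the paper).
nc_all = [2, 4, 4, 4, 5, 5, 7, 9]   # Nc for every protein in the dataset
nc_with_motif = [7, 9]              # Nc for proteins known to contain the motif
z = motif_z_score(nc_with_motif, nc_all)
print(z)  # 1.5
```

A small value of `z`, like the 0.19 reported for the structural P-loop, means the motif-containing proteins are barely distinguishable from the background distribution of Nc.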
Actually, we found a high Z-score in an example containing both the structural P-loop and some peripheral parts; Z-score = 5.58 for the corresponding atoms between 6q21c and 1ayl shown in Figure 1b (drawn as red or green traces, 56 protein atoms). We are currently working on such a template representation, together with the appropriate search algorithm using the template.

Fig. 1. (a) Results from the all-against-all comparison among the 491 mononucleotide-binding sites are summarized in this correlation map. All binding sites examined here are aligned on the axes of the map. A dot indicates a pair of similar binding sites, a red dot for a high level of similarity (Nc ⩾ 30) and a green dot for a lower level of similarity (23 ⩽ Nc ⩽ 29). Examples of similar pairs are shown in (b). Each square-shaped cluster of red dots can be assigned to a superfamily with name shown. The regions of inter-cluster (inter-superfamily) similarities of green dots are highlighted by a black square and three circles. (b) Typical examples of the two levels of similarity of the local environments, where oxygen and phosphorus atoms are colored yellow and purple, respectively. (1) P-loop containing nucleotide triphosphate hydrolase (PDB: 6q21 chain C); (2) phosphoenolpyruvate carboxykinase (PDB: 1ayl); (3) phosphofructokinase (PDB: 1pfk). The spatial arrangement of the backbone atoms illustrated by green traces is shared by proteins in the square box in (a). We call it the structural P-loop. The values of Nc (the r.m.s.d. value) between (1) and (2) and between (1) and (3) are 60 (0.62 Å) and 25 (0.68 Å), respectively. The figures were drawn by MOLSCRIPT (Kraulis, 1991).

Fig. 2. Structural alignments of the fragments having the structural P-loop. The secondary structural elements are depicted by an arrow for a β-strand and a cylinder for an α-helix. The amino-acid sequence of the structural P-loop is indicated by a string of four large letters, and small letters represent the well known sequence motifs for the phosphate-binding site. The superfamilies shown here are: P-loop containing nucleotide triphosphate hydrolase (P1), another P-loop containing nucleotide triphosphate hydrolase (P2), phosphofructokinase (PFK), phosphoglycerate kinase (PGK), flavoprotein (FP), ferredoxin reductase (FR), cAMP-binding domain (CAMP), dihydrofolate reductase first fragment (DR1) and second fragment (DR2), protein kinase (PK), actin-like ATPase domain (ACT), sugar phosphatase (SP), a nucleotide-binding domain (AND) and phosphoribosyltransferase (PRTase). The residue number is given for the common glycine of the structural P-loop.

Fig. 3. Structural motifs of phosphate-binding sites illustrated by a superposition of a pair of local environments. (a) cAMP-dependent protein kinase catalytic subunit (cAPK; PDB:1atp; blue line) complexed with ATP (light blue line) and d-alanine–d-alanine ligase (DD-ligase; PDB: 2dln; red) complexed with ADP and PHY (light green line). This motif is denoted by the circle (1) in Figure 1a. (b) DNA polymerase β (Dpol B; PDB: 8icy; blue line) complexed with TPP (light blue line) and heat-shock cognate 70 kDa protein (HSP70; PDB: 1hpm; red line) complexed with ADP and a phosphate (light green line). This corresponds to the circle (2) in Fig. 1a. (c) Old yellow enzyme (OYE; PDB: 1oyb; blue line) complexed with FMN (light blue line) and glutamine phosphoribosylpyrophosphate amidotransferase (PRTase; PDB: 1gph; red line) and AMP (light green line). This is denoted by the circle (3) in Figure 1a.

1 To whom correspondence should be addressed. E-mail: kidera@qchem.kuchem.kyoto-u.ac.jp

We are grateful to Gautam Basu for reading the manuscript carefully. This work was supported by a grant from MESC to A.K and N.G. The computations were performed at the Computer Center of the Institute for Molecular Science, Center for Promotion of Computational Science and Engineering of JAERI and Data Processing Center, Kyoto University.

References
Bairoch,A., Bucher,P. and Hofmann,K. (1997) Nucleic Acids Res., 25, 217–221.
Bernstein,F.C., Koetzle,T.F., Williams,G.J., Meyer,E.E., Brice,M.D., Rodgers,J.R., Kennard,O., Shimanouchi,T. and Tasumi,M. (1977) J. Mol. Biol., 112, 535–542.
Bron,C. and Kerbosch,J. (1973) Commun. A.C.M., 16, 575–577.
Hanks,S.K., Quinn,A. and Hunter,T. (1988) Science, 241, 42–52.
Hol,W.G.J., van Duijnen,P.T. and Berendsen,H.J.C. (1978) Nature, 273, 443–446.
Kobayashi,N. and Go,N. (1996) Nature Struct. Biol., 4, 6–7.
Kobayashi,N. and Go,N. (1997) Eur.
Biophys. J., 26, 135–144.
Kraulis,P.J. (1991) J. Appl. Crystallogr., 24, 946–950.
Murzin,A.G., Brenner,S.E., Hubbard,T. and Chothia,C. (1995) J. Mol. Biol., 247, 536–540.
Saraste,M., Shibbald,P.R. and Wittinghofer,A. (1990) Trends Biochem. Sci., 15, 430–434.
Schulz,G.E. (1992) Curr. Opin. Struct. Biol., 2, 61–67.
Swindells,M.B. (1993) Protein Sci., 2, 2146–2153.
Traut,W.T. (1994) Eur. J. Biochem., 222, 9–19.
Walker,J.E., Saraste,M., Runswick,M.J. and Gay,N.J. (1982) EMBO J., 1, 945–951.
Warme,P.K. and Morgan,R.S. (1978) J. Mol. Biol., 118, 273–287.
© Oxford University Press TI - Structural motif of phosphate-binding site common to various protein superfamilies: all-against-all structural comparison of protein–mononucleotide complexes JO - Protein Engineering, Design and Selection DO - 10.1093/protein/12.1.11 DA - 1999-01-01 UR - https://www.deepdyve.com/lp/oxford-university-press/structural-motif-of-phosphate-binding-site-common-to-various-protein-YiIHTvRhas SP - 11 EP - 14 VL - 12 IS - 1 DP - DeepDyve ER -