pdbFun: mass selection and fast comparison of annotated PDB residues

Gabriele Ausiello; Andreas Zanzoni; Daniele Peluso; Allegra Via; Manuela Helmer-Citterich

doi:10.1093/nar/gki499

Nucleic Acids Research, 2005, Vol. 33, Web Server issue W133–W137
Published: 2005-07-01

Gabriele Ausiello*, Andreas Zanzoni, Daniele Peluso, Allegra Via and Manuela Helmer-Citterich

Centre for Molecular Bioinformatics, Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica, 00133 Rome, Italy

Received February 14, 2005; Accepted April 28, 2005

*To whom correspondence should be addressed. Tel: +39 06 72594314; Fax: +39 06 72594314; Email: [email protected]

© The Author 2005. Published by Oxford University Press. All rights reserved. The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact [email protected]

ABSTRACT

pdbFun (http://pdbfun.uniroma2.it) is a web server for structural and functional analysis of proteins at the residue level. pdbFun gives fast access to the whole Protein Data Bank (PDB) organized as a database of annotated residues. The available data (features) range from solvent exposure to ligand binding ability, location in a protein cavity, secondary structure, residue type, sequence functional pattern, protein domain and catalytic activity. Users can select any residue subset (even including any number of PDB structures) by combining the available features. Selections can be used as probe and target in multiple structure comparison searches. For example, a search could involve, as a query, all solvent-exposed, hydrophilic residues that are not in alpha-helices and are involved in nucleotide binding. Possible examples of targets are represented by another selection, a single structure or a dataset composed of many structures. The output is a list of aligned structural matches offered in tabular and also graphical format.

INTRODUCTION

Structural genomics projects (1) and the improvement of experimental techniques for structural analysis enrich the Protein Data Bank (PDB) (2) with structural data of very high quality and reliability. Nevertheless, few complete resources are available for analysing the connections between structural features and molecular functions that lie hidden in this huge amount of data. We identified some important characteristics that may be considered in the design and construction of a complete resource for establishing structure–function links: (i) presence of integrated data (number and type of different considered databases); (ii) level of the data integration detail (i.e. structure, domain, residue and atom); (iii) level of integration between data and computational tool(s) in the resource and (iv) wholeness (the quantity of data that can be analysed at the same time).

(i) Data integration can provide consistent advantages in the analysis of protein structures, as demonstrated and exemplified by PDBSUM (3), a database providing a vast amount of information on the PDB entries. At present, the huge MSD project (4), merging all main databases with the PDB, represents the best implementation of this concept.
(ii) Data integration can operate at different levels. Large volumes of data about protein structure and function are currently available in the biologically relevant databases. Such data can be integrated at the protein level. More effectively, for a focus on molecular function, they can be mapped onto protein residues. Data integration at the residue level is exemplified by the possibility of querying for solvent-exposed amino acids located in the alpha-helices of a protein structure. This feature has already been used in the SURFACE database (5) and has now been extended by MSDmine (unpublished resource, http://www.ebi.ac.uk/msd-srv/msdmine).
(iii) Integration between data and one or more computational methods is a fundamental task. Such a task is achieved in tools where simple or complex selections of the integrated data can be built and straightforwardly used as input to an embedded method (e.g. running a comparison program only on proteins sharing a specified function).
(iv) The last important property for a complete structural analysis tool is its being able to consider vast amounts of data at the same time, i.e. its wholeness or ability to work as a high-throughput resource. Queries can be formulated with more or even all the available data. A user may choose to focus on all proteins belonging to a specified SCOP class or to select all the tryptophan residues in the whole PDB catalytic sites.

In the perspective described here, we propose pdbFun as a fast and user-friendly integrated web server for structural analysis of local similarities among proteins. pdbFun collects annotations derived from different databases (data integration), maps them onto single residues (good level of integration detail) and runs a local structural comparison algorithm on the selected residues (data/method integration). Queries and comparisons are allowed on any sets of annotations or residues, even including the entire PDB (wholeness).

Overview

pdbFun is an integrated web tool for querying the PDB at the residue level and for local structural comparison. pdbFun integrates knowledge about single residues in protein structures from other databases or calculated with available instruments or instruments developed in-house for structural analysis. Each set of different annotations represents a feature. Typical features are secondary structure assignments or SMART domains (6), whose annotations are the H/T/E assignments or domain families, respectively, reported at the residue level. The user can build simple residue selections by including any number of annotations from a single feature, e.g. all residues belonging to any of three different SMART domains. The selections can be combined recursively to create more complex ones. The user is allowed, for example, to choose only the β-strand or turn residues of the previous three domains. Each selection can be manually refined by adding and removing single residues. Structural similarity can be searched between any pair of selections. All comparisons and queries are performed in real time with a fast program (Ausiello,G., Via,A. and Helmer-Citterich,M., manuscript submitted) running on the web server.

Features

The different features currently available are shown on, and can be accessed from, the homepage. The user can start creating one residue selection by choosing any one of the following (Figure 1):

(i) Structures. All residues belonging to one or many PDB structures can be selected, up to and including the whole database.
(ii) Chains. All residues belonging to one or more chains can be selected. Lists of non-redundant PDB chains are available here as pre-calculated selections.
(iii) Surfaces. Residues can be selected according to their solvent-exposed or buried status given by the NACCESS program (7).
(iv) Clefts. The SURFNET program (8) is used to assign surface residues to protein cavities. Cavities are sorted by size (number 1 refers to the biggest).
(v) Domains. Residues belonging to domains are annotated here using HMMER (9) on the SMART database.
(vi) Two-dimensional structures. Each residue is associated with the secondary structure assignment provided by the dssp (10) program (E: extended strand; H: alpha-helix; T: hydrogen bonded turn, etc.).
(vii) Motifs. PROSITE patterns (11) as found on the sequences of the PDB chains.
(viii) Binding sites. Users can select residues whose distance is <3.5 Å from any ligand molecule present in the PDB. Choosing ATP or ADP selects all residues found at a distance closer than the defined threshold from the ATP or ADP nucleotides.
(ix) Active sites. Active site residues in a set of enzyme structures obtained from the CatRes database (12).
(x) Residues. The 20 residue types [from A (alanine) to W (tryptophan)]. This feature helps the user to concentrate only on some kinds of residues, while ignoring all the others (e.g. all charged residues or aromatic residues).

Figure 1. A Selection table is shown. The user has created five selections: Selection 1, all PROSITE residues with the ATP keyword in the pattern description (using the motifs feature); Selection 2, all charged residues in the PDB (D, E, H, K and R in the residues feature); Selection 3, all exposed residues (surface feature); Selection 4, all charged residues in the selected motifs (Selection 1 INTERSECT Selection 2); Selection 5, all charged residues in the selected motifs that are not solvent-exposed (Selection 4 SUBTRACT Selection 3). The estimated time for comparing the first chain (see text) of Selection 5 as query and Selection 3 as target is 18 s.

Annotations

By selecting a feature from the pdbFun main page, the user accesses the annotation page where single annotations of that particular feature can be chosen to create a simple selection of residues. The total number of selected residues corresponds to the sum of all the residues selected by a single annotation. We describe in detail the Motifs feature page.

In the Motifs page, all PROSITE patterns are listed and represent the annotations. Fields duplicated locally are the pattern ‘id’, ‘name’ and ‘short description’. In addition, a ‘residues’ field indicates the number of annotated residues in the whole PDB. A ‘chains’ field indicates the number of chains containing at least one of the annotated residues. In order to facilitate searching, the annotations are organized in pages and can be sorted by any field.

Annotations (i.e. specific PROSITE motifs) can be added to the current selection in various ways: manually (using check-boxes), by text search (only the selected field will be searched) or by uploading a user flat file containing a list of PROSITE codes.

All the features available in pdbFun share identical organization. New features can therefore be added and annotations handled without the need to modify the code.

Let us take as an example how to select all PDB residues matching any of the PROSITE motifs involving ATP.

(i) In the Motifs page, sort the annotations by the ‘description’ field by clicking on the column header.
(ii) Type ‘ATP’ in the search box (the search will be automatically conducted on the sorted field) and press the search button.
(iii) All the 18 PROSITE motifs containing ‘ATP’ in the description are selected, and the user can go back to the main page and find the selection described as a row on the pdbFun main page.

Simple selections

Whenever a selection is made, pdbFun stores it as a row in a Selection table that can be visualized by going back to the main page. Each selection is identified by a unique name, by a type (the feature used to generate it), by the number of annotations selected in the feature and by the total number of chains and residues in the PDB that have been selected. New selections can be created by choosing one of the features available in the upper part of the screen. Existing selections can be accessed and modified via the ‘annotations’ field.

For example, see Figure 1. The selection created in the previous example now appears in the Selection table as ‘Selection 1’. The ‘feature type’ field is ‘motifs’. The number of annotations selected is 18 (the 18 PROSITE patterns whose description contains the ATP word). Such patterns have been found on 2952 different PDB chains and comprise a total of 31 801 residues.

Combining selections

All selections can be combined using the AND, OR and NOT boolean operators. The result is a new selection containing a combination of their residues. The two selections to combine are chosen with the ‘probe’ and ‘target’ radio buttons. Applying the ‘Intersect’ (AND) on Selections 1 and 2 (see Figure 1) creates a new selection including only the common residues (e.g. the PDB proline residues that are found in alpha-helices), whereas using ‘Add’ (OR) the two selections will be merged (e.g. all residues that are in a big surface cleft ‘or’ belong to some active site). The ‘Subtract’ (NOT) is also a binary operator and needs to be understood as an ‘AND’ between the probe and the complement of the target (e.g. all the charged residues which are ‘not’ exposed).

Each selection created can be, recursively, the object of a new combination.

The ‘residues’ and ‘chains’ columns of the Selection table contain useful statistical information on the PDB residues’ composition. Questions such as ‘How many charged residues are buried in the whole PDB, or in a certain type of domain?’ can be answered in a fraction of a second.

Structural comparison

Selections can be chosen as probe and target of a structural comparison procedure to find local similarities in residues’ spatial arrangements. The selected residues in each chain of the probe are searched against the selected residues in each chain of the target. The comparison algorithm is guaranteed to find the largest subset of matching residues between two structures. The matching condition is an RMSD (root mean square difference) <0.7 Å and a residue similarity >1.3 according to the Dayhoff substitution matrix. The algorithm is exhaustive, fast and sequence and fold independent.

All the probe (but not the target) residues must belong to a single PDB chain (if the probe is a multi-chain selection, only the first chain will be compared by default). Comparisons stop when a match is found comprising at least 10 residues. As soon as a new probe or target is chosen, an estimate of the comparison execution time is given at the bottom right of the screen.

Comparison results

When a comparison is started, the user is redirected to the Results page. Here new matches are immediately displayed as they are calculated. Matches are sorted by decreasing score and are displayed in pages. The probe chain matching residues are listed in the first column of the Results table. Each target chain is shown in a different column, together with the match length. Target residues are listed in the same rows as the probe residues to which they are structurally aligned (see Figure 2). Columns can be selected for a graphical view of the match in single or multiple alignment using a Java applet. A text file containing the results of the comparison is available for downloading.

Figure 2. The first Results page of a comparison. A manual selection of 5p21 (ras protein) residues involved in GTP binding was compared with the 5500 chains of a non-redundant PDB (50%). The output is shown in tabular and also graphic format. In the first column of the table, the matching residues of the query PDB chain are reported; in the adjacent columns, the other PDB chains follow, and the residues aligned in three dimensions appear in the same rows. The matched PDB chains are reported in the first row; the number of matched residues in the second one. Matching residues are also displayed upon selection (pressing on the ‘draw’ button) with a Java applet.

Manual selections

pdbFun allows the user to perform a manual selection of residues on a single PDB chain, according to his/her interest or personal knowledge (and not only by using the features calculated or extracted from pre-existing databases). Through the ‘chains’ field in the Selection table, the user accesses a page where he/she can choose the chain to work with manually. All the residues in the chain of interest will appear as a list, together with the available annotations. Sets of single residues can be chosen. A simple Java applet helps the user in selections. This selection appears in the Selection table as ‘manual selection’.

Non-redundant PDB sets

Non-redundant datasets of chains obtained from the PDB (2) at different (90, 70, 50 and 30%) redundancies are available and can be used to generate non-redundant selections of chains or as target datasets. These sets can be selected from the Chains feature page and modified manually or left as they are.

Implementation notes

In order to achieve high speed and a high level of interactivity, all residue data are stored in the server memory. A single C program executes both fast queries and structural comparisons, and a relational database is used only for the storage of the feature annotations list and for web users management. All selections can be run in a fraction of a second. Comparison times range from fractions of a second to minutes. No time limit is given to users (but a newly submitted job stops the running one). Web pages have been tested on the main browsers for the Windows and Linux platforms. Mac users should utilize Safari >1.2.

Future directions

Major future developments involve the addition of new features. Features in preparation are residue conservation derived by HSSP (13), presence in structural fold derived by CATH (14), user-defined sequence regular expressions and proximity of residues. Finally, to further improve the quality of integration among different data sources, part of the MSD data collection could be used.

Upload of user structures will be made possible and statistical significance of the matches introduced.

ACKNOWLEDGEMENTS

We thank Federico Fratticci for his useful contribution. This work was supported by Telethon project GGP04273, FIRB and Genefun. Funding to pay the Open Access publication charges for this article was provided by Telethon project GGP04273.

Conflict of interest statement. None declared.

REFERENCES

1. Skolnick,J., Fetrow,J.S. and Kolinski,A. (2000) Structural genomics and its importance for gene function analysis. Nat. Biotechnol., 18, 283–287.
2. Deshpande,N., Addess,K.J., Bluhm,W.F., Merino-Ott,J.C., Townsend-Merino,W., Zhang,Q., Knezevich,C., Xie,L., Chen,L., Feng,Z. et al. (2005) The RCSB Protein Data Bank: a redesigned query system and relational database based on the mmCIF schema. Nucleic Acids Res., 33, D233–D237.
3. Laskowski,R.A., Chistyakov,V.V. and Thornton,J.M. (2005) PDBsum more: new summaries and analyses of the known 3D structures of proteins and nucleic acids. Nucleic Acids Res., 33, D266–D268.
4. Velankar,S., McNeil,P., Mittard-Runte,V., Suarez,A., Barrell,D., Apweiler,R. and Henrick,K. (2005) E-MSD: an integrated data resource for bioinformatics. Nucleic Acids Res., 33, D262–D265.
5. Ferrè,F., Ausiello,G., Zanzoni,A. and Helmer-Citterich,M. (2004) SURFACE: a database of protein surface regions for functional annotation. Nucleic Acids Res., 32, 240–244.
6. Letunic,I., Copley,R.R., Schmidt,S., Ciccarelli,F.D., Doerks,T., Schultz,J., Ponting,C.P. and Bork,P. (2004) SMART 4.0: towards genomic data integration. Nucleic Acids Res., 32, D142–D144.
7. Hubbard,S.J. and Thornton,J.M. (1993) NACCESS Computer Program. Department of Biochemistry and Molecular Biology, University College London.
8. Laskowski,R.A. (1995) SURFNET: a program for visualizing molecular surfaces, cavities and intermolecular interactions. J. Mol. Graph., 13, 323–330.
9. Eddy,S.R. (1998) Profile hidden Markov models. Bioinformatics, 14, 755–763.
10. Kabsch,W. and Sander,C. (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22, 2577–2637.
11. Hulo,N., Sigrist,C.J.A., Le Saux,V., Langendijk-Genevaux,P.S., Bordoli,L., Gattiker,A., De Castro,E., Bucher,P. and Bairoch,A. (2004) Recent improvements to the PROSITE database. Nucleic Acids Res., 32, D134–D137.
12. Bartlett,G.J., Porter,C.T., Borkakoti,N. and Thornton,J.M. (2002) Analysis of catalytic residues in enzyme active sites. J. Mol. Biol., 324, 105–121.
13. Sander,C. and Schneider,R. (1991) Database of homology derived protein structures and the structural meaning of sequence alignment. Proteins, 9, 56–68.
14. Pearl,F.M.G., Lee,D., Bray,J.E., Sillitoe,I., Todd,A.E., Harrison,A.P., Thornton,J.M. and Orengo,C.A. (2000) Assigning genomic sequences to CATH. Nucleic Acids Res., 28, 277–282.
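
Supplementary sketch: selections as residue sets. The ‘Annotations’, ‘Simple selections’ and ‘Combining selections’ sections describe text search over annotations followed by ‘Intersect’, ‘Add’ and ‘Subtract’. The Python sketch below mirrors the five selections of Figure 1 on toy data. It is only an illustration under stated assumptions, not the C program used by the server: the `Selection` class, the `select_by_text` helper, the `(pdb, chain, residue number)` key, the case-insensitive search and all data values are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

# Assumed residue key for this sketch: (PDB code, chain id, residue number).
Residue = Tuple[str, str, int]


@dataclass(frozen=True)
class Selection:
    """A named set of residues, loosely modelled on a Selection table row."""
    name: str
    residues: FrozenSet[Residue]

    @property
    def chains(self) -> int:
        # Number of distinct chains holding at least one selected residue.
        return len({(pdb, chain) for pdb, chain, _ in self.residues})

    def intersect(self, target: "Selection") -> "Selection":
        # 'Intersect' (AND): residues common to probe and target.
        return Selection(f"({self.name} INTERSECT {target.name})",
                         self.residues & target.residues)

    def add(self, target: "Selection") -> "Selection":
        # 'Add' (OR): merge probe and target.
        return Selection(f"({self.name} ADD {target.name})",
                         self.residues | target.residues)

    def subtract(self, target: "Selection") -> "Selection":
        # 'Subtract' (NOT): probe AND the complement of the target.
        return Selection(f"({self.name} SUBTRACT {target.name})",
                         self.residues - target.residues)


def select_by_text(name: str, annotations: List[dict],
                   field: str, text: str) -> Selection:
    """Merge the residues of every annotation whose `field` contains `text`.

    Case-insensitive matching is an assumption of this sketch.
    """
    hits = [a for a in annotations if text.lower() in a[field].lower()]
    merged = frozenset().union(*(a["residues"] for a in hits))
    return Selection(name, merged)


# Toy annotations; identifiers and residues are invented for illustration.
motifs = [
    {"id": "TOY_ATP", "description": "Toy ATP-binding motif",
     "residues": {("1abc", "A", 15), ("1abc", "A", 16), ("2xyz", "B", 40)}},
    {"id": "TOY_ZN", "description": "Toy zinc-finger motif",
     "residues": {("1abc", "A", 80)}},
]
residue_type = {("1abc", "A", 15): "K", ("1abc", "A", 16): "G",
                ("2xyz", "B", 40): "D", ("1abc", "A", 80): "C"}
exposed = {("1abc", "A", 15), ("1abc", "A", 80)}

sel1 = select_by_text("Selection 1", motifs, "description", "ATP")
sel2 = Selection("Selection 2", frozenset(
    r for r, aa in residue_type.items() if aa in "DEHKR"))  # charged types
sel3 = Selection("Selection 3", frozenset(exposed))
sel4 = sel1.intersect(sel2)   # charged residues in the ATP motifs
sel5 = sel4.subtract(sel3)    # ...that are not solvent-exposed

print(sel5.name, len(sel5.residues), sel5.chains)
```

Because every operation returns a new `Selection`, results can be combined again, which matches the recursive combination described in the text; the residue and chain counts of each row fall out of the set sizes.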
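
Supplementary sketch: the matching condition. The ‘Structural comparison’ section states that a match requires an RMSD below 0.7 Å and a residue similarity above 1.3 on the Dayhoff substitution matrix. The Python sketch below only checks that condition for one candidate alignment. It assumes the paired coordinates are already superimposed, that the similarity threshold applies to every aligned pair, and that the substitution scores are supplied by the caller; none of these details is given in the text, and the search for the largest matching subset is not shown. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]

RMSD_MAX = 0.7        # angstroms, threshold quoted in the text
SIMILARITY_MIN = 1.3  # substitution-score threshold quoted in the text


def rmsd(probe: Sequence[Point], target: Sequence[Point]) -> float:
    """Root mean square distance between paired, pre-superimposed points."""
    if len(probe) != len(target) or not probe:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(probe, target))
    return math.sqrt(squared / len(probe))


def is_match(probe: Sequence[Point], target: Sequence[Point],
             pair_scores: Sequence[float]) -> bool:
    """Apply both thresholds to one candidate alignment.

    `pair_scores` holds one substitution score per aligned residue pair
    (assumption: the threshold is applied pair by pair).
    """
    if len(pair_scores) != len(probe):
        raise ValueError("one score per aligned pair is required")
    return (rmsd(probe, target) < RMSD_MAX
            and all(score > SIMILARITY_MIN for score in pair_scores))


# Toy alignment: the target is the probe shifted by 0.5 angstroms along z.
probe_xyz = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
target_xyz = [(x, y, z + 0.5) for x, y, z in probe_xyz]

print(rmsd(probe_xyz, target_xyz))
print(is_match(probe_xyz, target_xyz, [2.0, 1.5, 3.0]))
```

In a real comparison the coordinates would first have to be optimally superimposed and the scores looked up in the substitution matrix; both steps are left out here on purpose.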


Publisher: Oxford University Press
Copyright: © The Author 2005. Published by Oxford University Press. All rights reserved  The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact [email protected]
ISSN: 0305-1048
eISSN: 1362-4962
DOI: 10.1093/nar/gki499
pmid: 15980442

speciﬁc PROSITE motifs) can be added Structural comparison to the current selection in various ways: manually (using Selections can be chosen as probe and target of a structural check-boxes), by text search (only the selected ﬁeld will be comparison procedure to ﬁnd local similarities in residues’ searched) or by uploading a user ﬂat ﬁle containing a list of spatial arrangements. The selected residues in each chain of PROSITE codes. the probe are searched against the selected residues in each All the features available in pdbFun share identical organ- chain of the target. The comparison algorithm is guaranteed to ization. New features can therefore be added and annotations ﬁnd the largest subset of matching residues between two struc- handled without the need to modify the code. tures. The matching condition is an RMSD (root mean square Let us take as an example how to select all PDB residues difference) <0.7 A and a residue similarity >1.3 according to matching any of the PROSITE motifs involving ATP. the Dayhoff substitution matrix. The algorithm is exhaustive, (i) In the Motifs page, sort the annotations by the ‘description’ fast and sequence and fold independent. field by clicking on the column header. All the probe (but not the target) residues must belong to a (ii) Type ‘ATP’ in the search box (the search will be auto- single PDB chain (if the probe is a multi-chain selection, only matically conducted on the sorted field) and press the the ﬁrst chain will be compared by default). Comparisons stop search button. when a match is found comprising at least 10 residues. As soon (iii) All the 18 PROSITE motifs containing ‘ATP’ in the as a new probe or target is chosen, an estimate of the compar- description are selected, and the user can go back to the ison execution time is given at the bottom right of the screen. main page and find the selection described as a row on the pdbFun main page. 
Comparison results When a comparison is started, the user is redirected to the Simple selections Results page. Here new matches are immediately displayed as they are calculated. Matches are sorted by decreasing score Whenever a selection is made, pdbFun stores it as a row in a and are displayed in pages. The probe chain matching residues Selection table that can be visualized by going back to the are listed in the ﬁrst column of the Results table. Each target main page. Each selection is identiﬁed by a unique name, by a chain is shown in a different column, together with the match type (the feature used to generate it), by the number of annota- length. Target residues are listed in the same rows as the probe tions selected in the feature and by the total number of chains residues to which they are structurally aligned (see Figure 2). and residues in the PDB that have been selected. New selec- Columns can be selected for a graphical view of the match tions can be created by choosing one of the features available in single or multiple alignment using a Java applet. A text in the upper part of the screen. Existing selections can be ﬁle containing the results of the comparison is available for accessed and modiﬁed via the ‘annotations’ ﬁeld. downloading. For example, see Figure 1. The selection created in the previous example now appears in the Selection table as Manual selections ‘Selection 1’. The ‘feature type’ ﬁeld is ‘motifs’. The number of annotations selected is 18 (the 18 PROSITE patterns whose pdbFun allows the user to perform a manual selection of description contains the ATP word). Such patterns have been residues on a single PDB chain, according to his/her interest W136 Nucleic Acids Research, 2005, Vol. 33, Web Server issue Figure 2. The first Results page of a comparison. A manual selection of 5p21 (ras protein) residues involved in GTP binding was compared with the5500 chains of a non-redundant PDB (50%). 
The output is shown in tabular and also graphic format. In the first column of the table, the matching residues of the query PDB chain are reported; in the adjacent columns, the other PDB chains follow, and the residues aligned in three dimensions appear in the same rows. The matched PDB chains are reported in the first row; the number of matched residues in the second one. Matching residues are also displayed upon selection (pressing on the ‘draw’ button) with a Java applet. or personal knowledge (and not only by using the features Implementation notes calculated or extracted from pre-existing databases). Through In order to achieve high speed and a high level of interactivity, the ‘chains’ ﬁeld in the Selection table, the user accesses a page all residue data are stored in the server memory. A single C where he/she can choose the chain to work with manually. program executes both fast queries and structural comparis- All the residues in the chain of interest will appear as a list, ons, and a relational database is used only for the storage of the together with the available annotations. Sets of single residues feature annotations list and for web users management. All can be chosen. A simple Java applet helps the user in selec- selections can be run in a fraction of a second. Comparison tions. This selection appears in the Selection table as ‘manual times range from fractions of a second to minutes. No time selection’. limit is given to users (but a newly submitted job stops the running one). Web pages have been tested on the main brow- Non-redundant PDB sets sers for the Windows and Linux platforms. Mac users should utilize Safari >1.2. Non-redundant datasets of chains obtained from the PDB (2) at different (90, 70, 50 and 30%) redundancies are available and Future directions can be used to generate non-redundant selections of chains or as target datasets. 
These sets can be selected from the Chains Major future developments involve the addition of new feature page and modiﬁed manually or left as they are. features. Features in preparation are residue conservation Nucleic Acids Research, 2005, Vol. 33, Web Server issue W137 derived by HSSP (13), presence in structural fold derived by 4. Velankar,S., McNeil,P., Mittard-Runte1,V., Suarez,A., Barrell,D., Apweiler,R. and Henrick,K. (2005) E-MSD: an integrated data resource CATH (14), user-deﬁned sequence regular expressions and for bioinformatics. Nucleic Acids Res., 33, D262–D265. proximity of residues. Finally, to further improve the quality 5. Ferre `,F., Ausiello,G., Zanzoni,A. and Helmer-Citterich,M. (2004) of integration among different data sources, part of the MSD SURFACE: a database of protein surface regions for functional data collection could be used. annotation. Nucleic Acids Res., 32, 240–244. 6. Letunic,I., Copley,R.R., Schmidt,S., Ciccarelli,F.D., Doerks,T., Upload of user structures will be made possible and Schultz,J., Ponting,C.P. and Bork,P. (2004) SMART 4.0: towards statistical signiﬁcance of the matches introduced. genomic data integration. Nucleic Acids Res., 32, D142–D144. 7. Hubbard,S.J. and Thornton,J.M. (1993) NACCESS Computer Program. Department of Biochemistry and Molecular Biology, University College ACKNOWLEDGEMENTS London. 8. Laskowski,R.A. (1995) SURFNET: a program for visualizing molecular We thank Federico Fratticci for his useful contribution. This surfaces, cavities and intermolecular interactions. J. Mol. Graph., 13, work was supported by Telethon project GGP04273, FIRB and 323–330. Genefun. Funding to pay the Open Access publication charges 9. Eddy,S.R. (1998) Profile hidden Markov models. Bioinformatics, 14, 755–763. for this article was provided by Telethon project GGP04273. 10. Kabsch,W. and Sander,C. (1983) Dictionary of protein secondary Conflict of interest statement. None declared. 
structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22, 2577–2637. 11. Hulo,N., Sigrist,C.J.A., Le Saux,V., Langendijk-Genevaux,P.S., Bordoli,L., Gattiker,A., De Castro,E., Bucher,P. and Bairoch,A. (2004) REFERENCES Recent improvements to the PROSITE database. Nucleic Acids Res., 1. Skolnick,J., Fetrow,J.S. and Kolinski,A. (2000) Structural genomics and 32, D134–D137. its importance for gene function analysis. Nat. Biotechnol., 18, 283–287. 12. Bartlett,G.J., Porter,C.T., Borkakoti,N. and Thornton,J.M. (2002) 2. Deshpande,N., Addess,K.J., Bluhm,W.F., Merino-Ott,J.C., Analysis of catalytic residues in enzyme active sites. J. Mol. Biol., Townsend-Merino,W., Zhang,Q., Knezevich,C., Xie,L., Chen,L., 324, 105–121. Feng,Z. et al. (2005) The RCSB Protein Data Bank: a redesigned query 13. Sander,C. and Schneider,R. (1991) Database of homology derived protein system and relational database based on the mmCIF schema. Nucleic structures and the structural meaning of sequence alignment. Proteins, 9, 56–68. Acids Res., 33, D233–D237. 14. Pearl,F.M.G., Lee,D., Bray,J.E., Sillitoe,I., Todd,A.E., Harrison,A.P., 3. Laskowski,R.A., Chistyakov,V.V. and Thornton,J.M. (2005) PDBsum Thornton,J.M. and Orengo,C.A. (2000) Assigning genomic sequences to more: new summaries and analyses of the known 3D structures of proteins CATH. Nucleic Acids Res., 28, 277–282. and nucleic acids. Nucleic Acids Res., 33, D266–D268.
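
The ‘Intersect’, ‘Add’ and ‘Subtract’ operations described under Combining selections behave like set operations on residue identifiers. The following Python sketch illustrates this reading. It assumes, hypothetically, that a residue is identified by a (PDB code, chain, residue number) tuple; the identifiers and function names are invented for illustration, and the paper does not describe how the C program on the server actually represents selections.

```python
# Hypothetical residue identifiers: (pdb_code, chain_id, residue_number).
# The three selections loosely mirror Selections 1-3 of Figure 1,
# with invented members.
motif_residues = {("1abc", "A", 10), ("1abc", "A", 11), ("1abc", "A", 15)}
charged_residues = {("1abc", "A", 11), ("1abc", "A", 15), ("2xyz", "B", 7)}
exposed_residues = {("1abc", "A", 15), ("2xyz", "B", 7)}


def intersect(probe, target):
    """'Intersect' (AND): residues present in both selections."""
    return probe & target


def add(probe, target):
    """'Add' (OR): residues present in either selection."""
    return probe | target


def subtract(probe, target):
    """'Subtract' (NOT): the probe AND the complement of the target."""
    return probe - target


def summarize(selection):
    """Return (number of chains, number of residues) for a selection,
    analogous to the 'chains' and 'residues' columns of the Selection table."""
    chains = {(pdb, chain) for pdb, chain, _ in selection}
    return len(chains), len(selection)


# Selection 4 = Selection 1 INTERSECT Selection 2
selection_4 = intersect(motif_residues, charged_residues)
# Selection 5 = Selection 4 SUBTRACT Selection 3
selection_5 = subtract(selection_4, exposed_residues)
```

Because each operation returns a new set, a result can itself be used as probe or target of a further combination, which is the recursive behaviour the text describes.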

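The matching condition quoted under Structural comparison (RMSD below 0.7 Å and residue similarity above 1.3) can be illustrated with a short Python sketch. It is not the comparison algorithm itself, which is described elsewhere by the authors. The sketch assumes that residues have already been paired and superposed, that each residue is reduced to one representative point, and that the Dayhoff-based similarity has been computed separately and is passed in as a number. How pdbFun aggregates similarity over a match is not stated in the paper, and all function names are hypothetical.

```python
import math

RMSD_MAX = 0.7        # angstroms, threshold quoted in the text
SIMILARITY_MIN = 1.3  # Dayhoff-based threshold quoted in the text


def rmsd(coords_a, coords_b):
    """Root mean square difference between two equal-length lists of
    (x, y, z) points that are assumed to be paired and already superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


def is_match(coords_a, coords_b, similarity):
    """Apply both thresholds; `similarity` is a precomputed score (assumption)."""
    return rmsd(coords_a, coords_b) < RMSD_MAX and similarity > SIMILARITY_MIN
```

A full comparison would additionally have to search over residue pairings and superpositions; only the acceptance test is shown here.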

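The Binding sites feature selects residues lying within a distance threshold of a ligand. A minimal Python sketch of such a selection is given below. It assumes a 3.5 Å cutoff (the unit is taken to be ångströms) applied between any residue atom and any ligand atom; the paper does not say which atoms are used, and the data layout and function name are hypothetical.

```python
import math


def binding_site_residues(residue_atoms, ligand_atoms, threshold=3.5):
    """Return the ids of residues with at least one atom closer than
    `threshold` to any ligand atom.

    residue_atoms: dict mapping a residue id to a list of (x, y, z) tuples
    ligand_atoms:  list of (x, y, z) tuples for the ligand (e.g. ATP)
    """
    selected = set()
    for residue_id, atoms in residue_atoms.items():
        if any(
            math.dist(atom, ligand_atom) < threshold
            for atom in atoms
            for ligand_atom in ligand_atoms
        ):
            selected.add(residue_id)
    return selected
```

The all-against-all loop is adequate for a single chain; over the whole PDB a spatial index would be the more usual choice, although the paper does not say how the annotation was computed.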