IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content

Vol. 21 no. 16 2005, pages 3433–3434
BIOINFORMATICS APPLICATIONS NOTE
doi:10.1093/bioinformatics/bti541
Structural bioinformatics

Zsuzsanna Dosztányi*, Veronika Csizmok, Peter Tompa and István Simon
Institute of Enzymology, BRC, Hungarian Academy of Sciences, PO Box 7, H-1518 Budapest, Hungary

Received on March 24, 2005; revised on May 27, 2005; accepted on June 13, 2005
Advance Access publication June 14, 2005

*To whom correspondence should be addressed.

ABSTRACT

Summary: Intrinsically unstructured/disordered proteins and domains (IUPs) lack a well-defined three-dimensional structure under native conditions. The IUPred server presents a novel algorithm for predicting such regions from amino acid sequences by estimating their total pairwise interresidue interaction energy, based on the assumption that IUP sequences do not fold due to their inability to form sufficient stabilizing interresidue interactions. Built-in parameter sets optimized for predicting short or long disordered regions and structured domains are available as options.

Availability: The IUPred server is available for academic users at http://iupred.enzim.hu

Contact: [email protected]

INTRODUCTION

Intrinsically unstructured proteins exist as an ensemble of alternative conformations, in contrast to folded, globular proteins that have a unique native structure. A significant fraction of known genomes encodes proteins with regions of disordered structure. In some eukaryotic genomes >20% of the coded residues are predicted as disordered (Dunker et al., 2000; Ward et al., 2004a). In many cases a protein is fully disordered, while in many other cases there are long disordered segments in otherwise ordered, folded proteins (Tompa, 2002; Dyson and Wright, 2005). Despite their lack of a well-defined globular structure, these proteins carry out basic functions (Iakoucheva et al., 2002; Ward et al., 2004a), mostly associated with signal transduction, cell-cycle regulation and transcription.

Several methods have been developed to predict the disordered character from amino acid sequences. Some are based on the special amino acid composition of fully disordered proteins, i.e. the abundance of hydrophilic residues and a high net charge (Uversky et al., 2000; Vucetic et al., 2003), whereas others use various machine learning approaches trained on specific datasets (Obradovic et al., 2003; Ward et al., 2004a; Linding et al., 2003b). Recently, it was suggested that these sequences do not have the capacity to properly wrap backbone hydrogen bonds (Fernandez and Berry, 2004), which has also been shown to be important for protein stability.

BACKGROUND

Our method is based on a physical explanation of the ordered/disordered nature of proteins. Globular proteins make a large number of interresidue interactions, providing the stabilizing energy to overcome the entropy loss during folding (Garbuzynskiy et al., 2004). In contrast, intrinsically unstructured/disordered proteins and domains (IUPs) have special sequences that do not have the capacity to form sufficient interresidue interactions. To discriminate between ordered and disordered regions in proteins, we have developed a new approach that estimates the potential of polypeptides to form such stabilizing contacts by using a statistical interaction potential (Thomas and Dill, 1996; Dosztányi et al., 2005). It was shown that the sum of interaction energies can be estimated by a quadratic expression in the amino acid composition, which takes into account that the contribution of an amino acid to order/disorder depends not only on its own chemical type, but also on its potential interaction partners (Dosztányi et al., 2005).

The calculation involves a 20 × 20 energy predictor matrix, parameterized by a statistical method to approach the expected pairwise energy of globular proteins of known structure. Comparing globular proteins and disordered ones, a clear separation of their energy content is found (Dosztányi et al., 2005). As no training on disordered proteins is involved, this distinction underlines that the lack of a well-defined three-dimensional structure is an intrinsic property of certain evolved proteins. This approach was turned into a position-specific method to predict protein disorder by considering only the local sequential environment of residues within 2–100 residues in either direction. The score is then smoothed over a window size of 21.

This prediction method (IUPred), when tested on datasets of globular proteins and long disordered protein segments, showed improved performance over some other widely used methods, such as DISOPRED2 (Ward et al., 2004a,b) and PONDR VL3H (Obradovic et al., 2003).

THE IUPred SERVER

The web server takes a single amino acid sequence as input and calculates the pairwise energy profile along the sequence. The energy values are then transformed into a probabilistic score ranging from 0 (complete order) to 1 (complete disorder). Residues with a score above 0.5 can be regarded as disordered. The user can choose between the prediction of long disorder, short disorder and structured domains, each using slightly different parameters. The main profile of our server is to predict context-independent global disorder that encompasses at least 30 consecutive residues of predicted disorder. A different set of parameters is suited for predicting short, probably context-dependent, disordered regions such as missing residues in the X-ray structure of an otherwise globular protein. For this application the sequential neighborhood of only 25 residues is considered. As chain termini of globular proteins are often disordered in X-ray structures, this is taken into account by an end-adjustment parameter that favors disorder prediction at the ends.

The dependable identification of ordered regions is a crucial step in target selection for structural studies and structural genomics projects (Linding et al., 2003a). Finding putative structured domains suitable for structure determination is another potential application of this server. In this case the algorithm takes the energy profile and finds continuous regions confidently predicted as ordered. Neighboring regions close to each other are merged, while regions shorter than the minimal domain size of at least 30 residues are ignored. When this prediction type is selected, the region(s) predicted to correspond to structured/globular domains are returned.

The core program to calculate the pairwise energy profile and disorder probability is written in C; the web server is written in PHP. The calculation of the energy profile is based on a single sequence, without time-consuming alignment calculations. To further facilitate scripting, a simple text output is generated by default. However, the user can also request a graphical output. The plot shows the disorder tendency of each residue along the sequence. The plot is generated by the JpGraph software (JpGraph, 2005, http://www.aditus.nu/jpgraph/) on the fly, without storing the graphical images on the local machine. When the prediction type of structured domains is selected, these are highlighted on the plot by thick lines. For long sequences, the graph is shown for fragments of user-defined fixed length, 500 by default.

ACKNOWLEDGEMENTS

This work has been sponsored by grants GVOP-3.1.1.-2004-05-0143/3.0, OTKA F043609, T049073, and NKFP MediChem2 1/A/005/2004. Z.D. and P.T. were supported by the Bolyai János Scholarship. P.T. would like to acknowledge the support of the International Senior Research Fellowship GR067595 from the Wellcome Trust.

Conflict of Interest: none declared.

REFERENCES

Dosztányi,Z. et al. (2005) The pairwise energy content estimated from amino acid composition discriminates between folded and intrinsically unstructured proteins. J. Mol. Biol., 347, 827–839.
Dunker,A.K. et al. (2000) Intrinsic protein disorder in complete genomes. Genome Inform. Ser. Workshop Genome Inform., 11, 161–171.
Dyson,H.J. and Wright,P.E. (2005) Intrinsically unstructured proteins and their functions. Nat. Rev. Mol. Cell Biol., 6, 197–208.
Fernandez,A. and Berry,R.S. (2004) Molecular dimension explored in evolution to promote proteomic complexity. Proc. Natl Acad. Sci. USA, 101, 13460–13465.
Garbuzynskiy,S.O. et al. (2004) To be folded or to be unfolded? Protein Sci., 13, 2871–2877.
Iakoucheva,L.M. et al. (2002) Intrinsic disorder in cell-signaling and cancer-associated proteins. J. Mol. Biol., 323, 573–584.
JpGraph (2005) JpGraph. Aditus Consulting.
Linding,R. et al. (2003a) GlobPlot: exploring protein sequences for globularity and disorder. Nucleic Acids Res., 31, 3701–3708.
Linding,R. et al. (2003b) Protein disorder prediction: implications for structural proteomics. Structure (Camb), 11, 1453–1459.
Obradovic,Z. et al. (2003) Predicting intrinsic disorder from amino acid sequence. Proteins, 53 (Suppl. 6), 566–572.
Thomas,P.D. and Dill,K.A. (1996) An iterative method for extracting energy-like quantities from protein structures. Proc. Natl Acad. Sci. USA, 93, 11628–11633.
Tompa,P. (2002) Intrinsically unstructured proteins. Trends Biochem. Sci., 27, 527–533.
Uversky,V.N. et al. (2000) Why are ‘natively unfolded’ proteins unstructured under physiologic conditions? Proteins, 41, 415–427.
Vucetic,S. et al. (2003) Flavors of protein disorder. Proteins, 52, 573–584.
Ward,J.J. et al. (2004a) Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. J. Mol. Biol., 337, 635–645.
Ward,J.J. et al. (2004b) The DISOPRED server for the prediction of protein disorder. Bioinformatics, 20, 2138–2139.

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected]
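
The position-specific energy estimate described under BACKGROUND can be illustrated with a short Python sketch. This is not the authors' C implementation: the function names, the dictionary layout of the matrix, and the handling of chain termini (shrinking the averaging window at the ends) are assumptions made for illustration, and the toy matrix in the usage note is invented, not the published 20 × 20 energy predictor matrix. For each residue, the sketch takes the amino acid composition of the neighbours located 2–100 positions away in either direction, weights it by the matrix row of the residue's own type, and then smooths the resulting profile over a window of 21. The conversion of energies into a 0–1 disorder score is not specified in the text and is left out.

```python
from typing import Dict, List

# Hypothetical type: matrix[a][b] is the energy term for residue type a
# in an environment containing residue type b.
EnergyMatrix = Dict[str, Dict[str, float]]


def energy_profile(seq: str, matrix: EnergyMatrix,
                   min_sep: int = 2, max_sep: int = 100) -> List[float]:
    """Per-residue energy estimated from the local neighbourhood composition."""
    n = len(seq)
    profile: List[float] = []
    for i, res in enumerate(seq):
        # Neighbours between min_sep and max_sep positions away, both directions.
        neighbours = [seq[j]
                      for j in range(max(0, i - max_sep), min(n, i + max_sep + 1))
                      if abs(i - j) >= min_sep]
        if not neighbours:
            # Assumption: a residue with no eligible neighbours gets energy 0.
            profile.append(0.0)
            continue
        counts: Dict[str, int] = {}
        for aa in neighbours:
            counts[aa] = counts.get(aa, 0) + 1
        # Matrix row of the residue's own type, weighted by neighbour frequencies.
        energy = sum(matrix[res][aa] * cnt / len(neighbours)
                     for aa, cnt in counts.items())
        profile.append(energy)
    return profile


def smooth(values: List[float], window: int = 21) -> List[float]:
    """Moving average; the window is truncated at the termini (an assumption)."""
    half = window // 2
    out: List[float] = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out
```

A call such as `smooth(energy_profile("AABBA", toy))` with an invented two-letter matrix like `toy = {"A": {"A": -1.0, "B": 0.5}, "B": {"A": 0.5, "B": 1.0}}` exercises both steps; a real run would need the published matrix over all 20 amino acid types.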

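The structured-domain option described under THE IUPred SERVER (find continuous ordered regions, merge neighbouring regions, ignore regions shorter than the minimal domain size of 30) can be sketched in the same spirit. Several choices here are assumptions rather than the server's documented behaviour: the sketch works on the 0–1 disorder scores instead of the raw energy profile, treats a score of at most 0.5 as ordered, and uses a hypothetical `max_gap` parameter, since the text does not say how close two regions must be to be merged.

```python
from typing import List, Tuple


def find_ordered_regions(scores: List[float],
                         threshold: float = 0.5,
                         max_gap: int = 5,      # hypothetical merge distance
                         min_len: int = 30) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, 0-based and inclusive, of putative domains."""
    # Step 1: collect continuous runs of residues predicted as ordered.
    regions: List[Tuple[int, int]] = []
    start = None
    for i, score in enumerate(scores):
        if score <= threshold:
            if start is None:
                start = i
        elif start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:
        regions.append((start, len(scores) - 1))

    # Step 2: merge runs separated by at most max_gap disordered residues.
    merged: List[Tuple[int, int]] = []
    for region in regions:
        if merged and region[0] - merged[-1][1] - 1 <= max_gap:
            merged[-1] = (merged[-1][0], region[1])
        else:
            merged.append(region)

    # Step 3: drop regions shorter than the minimal domain size.
    return [(a, b) for a, b in merged if b - a + 1 >= min_len]
```

Merging before filtering matters: two ordered stretches of 20 residues separated by a three-residue gap would each be discarded on their own, but survive as a single 43-residue region when merged first.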


Publisher
Oxford University Press
Copyright
© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected]
ISSN
1367-4803
eISSN
1460-2059
DOI
10.1093/bioinformatics/bti541
pmid
15955779


Journal

Bioinformatics, Oxford University Press

Published: Jun 14, 2005
