IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content

Zsuzsanna Dosztányi; Veronika Csizmok; Peter Tompa; István Simon

doi:10.1093/bioinformatics/bti541

BIOINFORMATICS APPLICATIONS NOTE, Vol. 21 no. 16 2005, pages 3433–3434
Structural bioinformatics

Zsuzsanna Dosztányi*, Veronika Csizmok, Peter Tompa and István Simon
Institute of Enzymology, BRC, Hungarian Academy of Sciences, PO Box 7, H-1518 Budapest, Hungary
Received on March 24, 2005; revised on May 27, 2005; accepted on June 13, 2005
Advance Access publication June 14, 2005
*To whom correspondence should be addressed.

ABSTRACT

Summary: Intrinsically unstructured/disordered proteins and domains (IUPs) lack a well-defined three-dimensional structure under native conditions. The IUPred server presents a novel algorithm for predicting such regions from amino acid sequences by estimating their total pairwise interresidue interaction energy, based on the assumption that IUP sequences do not fold due to their inability to form sufficient stabilizing interresidue interactions. The server offers optional built-in parameter sets optimized for predicting short or long disordered regions and structured domains.
Availability: The IUPred server is available for academic users at http://iupred.enzim.hu
Contact: [email protected]

INTRODUCTION

Intrinsically unstructured proteins exist as an ensemble of alternative conformations, in contrast to folded, globular proteins that have a unique native structure. A significant fraction of known genomes encodes proteins with regions of disordered structure. In some eukaryotic genomes >20% of the coded residues are predicted as disordered (Dunker et al., 2000; Ward et al., 2004a). In many cases a protein is fully disordered, while in many other cases there are long disordered segments in otherwise ordered, folded proteins (Tompa, 2002; Dyson and Wright, 2005). Despite their lack of a well-defined globular structure, these proteins carry out basic functions (Iakoucheva et al., 2002; Ward et al., 2004a), mostly associated with signal transduction, cell-cycle regulation and transcription. Several methods have been developed to predict the disordered character from amino acid sequences. Some are based on the special amino acid composition of fully disordered proteins, i.e. the abundance of hydrophilic residues and a high net charge (Uversky et al., 2000; Vucetic et al., 2003), whereas others use various machine learning approaches trained on specific datasets (Obradovic et al., 2003; Ward et al., 2004a; Linding et al., 2003b). Recently, it was suggested that these sequences do not have the capacity to properly wrap backbone hydrogen bonds (Fernandez and Berry, 2004), which has also been shown to be important for protein stability.

BACKGROUND

Our method is founded on the physical explanation of the ordered/disordered nature of proteins. Globular proteins make a large number of interresidue interactions, providing the stabilizing energy to overcome the entropy loss during folding (Garbuzynskiy et al., 2004). In contrast, intrinsically unstructured/disordered proteins and domains (IUPs) have special sequences that do not have the capacity to form sufficient interresidue interactions. To discriminate between ordered and disordered regions in proteins, we have developed a new approach that estimates the potential of polypeptides to form such stabilizing contacts by using a statistical interaction potential (Thomas and Dill, 1996; Dosztányi et al., 2005). It was shown that the sum of interaction energies can be estimated by a quadratic expression in the amino acid composition, which takes into account that the contribution of an amino acid to order/disorder depends not only on its own chemical type, but also on its potential interaction partners (Dosztányi et al., 2005). The calculation involves a 20 × 20 energy predictor matrix, parameterized by a statistical method to approach the expected pairwise energy of globular proteins of known structure. Comparing globular proteins and disordered ones, a clear separation of their energy content is found (Dosztányi et al., 2005). As no training on disordered proteins is involved, this distinction underlines that the lack of a well-defined three-dimensional structure is an intrinsic property of certain evolved proteins. This approach was turned into a position-specific method to predict protein disorder by considering only the local sequential environment of residues within 2–100 residues in either direction. The score is then smoothed over a window size of 21. This prediction method (IUPred), when tested on datasets of globular proteins and long disordered protein segments, showed improved performance over some other widely used methods, such as DISOPRED2 (Ward et al., 2004a,b) and PONDR VL3H (Obradovic et al., 2003).

THE IUPred SERVER

The web server takes a single amino acid sequence as an input and calculates the pairwise energy profile along the sequence. The energy values are then transformed into a probabilistic score ranging from 0 (complete order) to 1 (complete disorder). Residues with a score above 0.5 can be regarded as disordered. Optionally, the server predicts long disorder, short disorder, or structured domains, each using slightly different parameters. The primary use of the server is to predict context-independent global disorder that encompasses at least 30 consecutive residues of predicted disorder. A different set of parameters is suited for predicting short, probably context-dependent, disordered regions such as missing residues in the X-ray structure of an otherwise globular protein. For this application the sequential neighborhood of only 25 residues is considered. As chain termini of globular proteins are often disordered in X-ray structures, this is taken into account by an end-adjustment parameter that favors disorder prediction at the ends.

The dependable identification of ordered regions is a crucial step in target selection for structural studies and structural genomics projects (Linding et al., 2003a). Finding putative structured domains suitable for structure determination is another potential application of this server. In this case the algorithm takes the energy profile and finds continuous regions confidently predicted to be ordered. Neighboring regions close to each other are merged, while regions shorter than the minimal domain size of at least 30 residues are ignored. When this prediction type is selected, the region(s) predicted to correspond to structured/globular domains are returned.

The core program to calculate the pairwise energy profile and disorder probability is written in C; the web server is written in PHP. The calculation of the energy profile is based on a single sequence, without time-consuming alignment calculations. To further facilitate easy accessibility for scripting, a simple text output is generated by default. However, the user can also request a graphical output. The plot shows the disorder tendency of each residue along the sequence. The plot is generated by the JpGraph software (JpGraph, 2005, http://www.aditus.nu/jpgraph/) on the fly, without storing the graphical images on the local machine. When the prediction type of structured domains is selected, these are highlighted on the plot by thick lines. For long sequences, the graph is shown for fragments of user-defined fixed length, 500 by default.

ACKNOWLEDGEMENTS

This work has been sponsored by grants GVOP-3.1.1.-2004-05-0143/3.0, OTKA F043609, T049073, and NKFP MediChem2 1/A/005/2004. Z.D. and P.T. were supported by the Bolyai János Scholarship. P.T. would like to acknowledge the support of the International Senior Research Fellowship GR067595 from the Wellcome Trust.

Conflict of Interest: none declared.

REFERENCES

Dosztányi,Z. et al. (2005) The pairwise energy content estimated from amino acid composition discriminates between folded and intrinsically unstructured proteins. J. Mol. Biol., 347, 827–839.
Dunker,A.K. et al. (2000) Intrinsic protein disorder in complete genomes. Genome Inform. Ser. Workshop Genome Inform., 11, 161–171.
Dyson,H.J. and Wright,P.E. (2005) Intrinsically unstructured proteins and their functions. Nat. Rev. Mol. Cell Biol., 6, 197–208.
Fernandez,A. and Berry,R.S. (2004) Molecular dimension explored in evolution to promote proteomic complexity. Proc. Natl Acad. Sci. USA, 101, 13460–13465.
Garbuzynskiy,S.O. et al. (2004) To be folded or to be unfolded? Protein Sci., 13, 2871–2877.
Iakoucheva,L.M. et al. (2002) Intrinsic disorder in cell-signaling and cancer-associated proteins. J. Mol. Biol., 323, 573–584.
JpGraph (2005) JpGraph. Aditus Consulting.
Linding,R. et al. (2003a) GlobPlot: exploring protein sequences for globularity and disorder. Nucleic Acids Res., 31, 3701–3708.
Linding,R. et al. (2003b) Protein disorder prediction: implications for structural proteomics. Structure (Camb), 11, 1453–1459.
Obradovic,Z. et al. (2003) Predicting intrinsic disorder from amino acid sequence. Proteins, 53 (Suppl. 6), 566–572.
Thomas,P.D. and Dill,K.A. (1996) An iterative method for extracting energy-like quantities from protein structures. Proc. Natl Acad. Sci. USA, 93, 11628–11633.
Tompa,P. (2002) Intrinsically unstructured proteins. Trends Biochem. Sci., 27, 527–533.
Uversky,V.N. et al. (2000) Why are ‘natively unfolded’ proteins unstructured under physiologic conditions? Proteins, 41, 415–427.
Vucetic,S. et al. (2003) Flavors of protein disorder. Proteins, 52, 573–584.
Ward,J.J. et al. (2004a) Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. J. Mol. Biol., 337, 635–645.
Ward,J.J. et al. (2004b) The DISOPRED server for the prediction of protein disorder. Bioinformatics, 20, 2138–2139.
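
As an illustration of the post-processing described under THE IUPred SERVER, the following minimal Python sketch smooths a per-residue score profile and then extracts putative structured regions by merging nearby ordered runs and discarding short ones. It is not the server's C implementation. The window size of 21 and the minimal domain size of 30 come from the text; the function names, the handling of chain ends in the smoothing, the use of 0.5 as the cutoff for "confidently ordered", and the merge distance `max_gap` are assumptions made only for this example.

```python
from typing import List, Tuple


def smooth(scores: List[float], window: int = 21) -> List[float]:
    """Centered moving average over `window` residues.

    Assumption: near the chain ends the window is simply truncated.
    """
    half = window // 2
    out = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        out.append(sum(scores[lo:hi]) / (hi - lo))
    return out


def ordered_regions(
    scores: List[float],
    cutoff: float = 0.5,   # assumed cutoff for "predicted ordered"
    max_gap: int = 5,      # hypothetical merge distance, not given in the text
    min_len: int = 30,     # minimal domain size stated in the text
) -> List[Tuple[int, int]]:
    """Return 0-based inclusive (start, end) spans of putative domains."""
    # Step 1: collect maximal runs of residues scoring below the cutoff.
    runs: List[Tuple[int, int]] = []
    start = None
    for i, s in enumerate(scores):
        if s < cutoff:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(scores) - 1))

    # Step 2: merge neighboring runs separated by at most max_gap residues.
    merged: List[Tuple[int, int]] = []
    for run in runs:
        if merged and run[0] - merged[-1][1] - 1 <= max_gap:
            merged[-1] = (merged[-1][0], run[1])
        else:
            merged.append(run)

    # Step 3: ignore regions shorter than the minimal domain size.
    return [(a, b) for a, b in merged if b - a + 1 >= min_len]


if __name__ == "__main__":
    # Synthetic profile: two ordered runs split by 3 residues, a long
    # disordered stretch, then a short ordered tail.
    profile = [0.2] * 20 + [0.8] * 3 + [0.2] * 20 + [0.9] * 40 + [0.1] * 10
    print(ordered_regions(profile))  # [(0, 42)]
```

In the synthetic profile the two ordered runs are merged across the three-residue gap, while the ten-residue tail is dropped because it falls below the 30-residue minimum. Merging before filtering follows the order in which the text describes the two steps.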

References (16)

Dunker Ak, Z. Obradovic, P. Romero, Ethan Garner, C. Brown (2000)
Intrinsic protein disorder in complete genomes.
Genome informatics. Workshop on Genome Informatics, 11
V. Uversky, J. Gillespie, A. Fink (2000)
Why are “natively unfolded” proteins unstructured under physiologic conditions?
Proteins: Structure, 41
R. Linding, R. Russell, Victor Neduva, T. Gibson (2003)
GlobPlot: exploring protein sequences for globularity and disorder
Nucleic acids research, 31 13
R. Linding, L. Jensen, F. Diella, P. Bork, T. Gibson, R. Russell (2003)
Protein disorder prediction: implications for structural proteomics.
Structure, 11 11
P. Tompa (2002)
Intrinsically unstructured proteins.
Trends in biochemical sciences, 27 10
(2005)
JpGraph
J. Ward, L. McGuffin, K. Bryson, B. Buxton, David Jones (2004)
The DISOPRED server for the prediction of protein disorder
Bioinformatics, 20 13
S. Garbuzynskiy, M. Lobanov, O. Galzitskaya (2004)
To be folded or to be unfolded?
Protein Science, 13
J. Ward, J. Sodhi, L. McGuffin, B. Buxton, David Jones (2004)
Prediction and functional analysis of native disorder in proteins from the three kingdoms of life.
Journal of molecular biology, 337 3
Ariel Fernández, R. Berry (2004)
Molecular dimension explored in evolution to promote proteomic complexity.
Proceedings of the National Academy of Sciences of the United States of America, 101 37
P. Thomas, Ken Dill (1996)
An iterative method for extracting energy-like quantities from protein structures.
Proceedings of the National Academy of Sciences of the United States of America, 93 21
J. Dyson, Peter Wright (2005)
Intrinsically unstructured proteins and their functions
Nature Reviews Molecular Cell Biology, 6
Z. Dosztányi, V. Csizmok, P. Tompa, I. Simon (2005)
The pairwise energy content estimated from amino acid composition discriminates between folded and intrinsically unstructured proteins.
Journal of molecular biology, 347 4
L. Iakoucheva, C. Brown, J. Lawson, Z. Obradovic, A. Dunker (2002)
Intrinsic disorder in cell-signaling and cancer-associated proteins.
Journal of molecular biology, 323 3
S. Vucetic, C. Brown, A. Dunker, Z. Obradovic (2003)
Flavors of protein disorder
Proteins: Structure, 52
Z. Obradovic, Kang Peng, S. Vucetic, P. Radivojac, C. Brown, A. Dunker (2003)
Predicting intrinsic disorder from amino acid sequence
Proteins: Structure, 53

Publisher: Oxford University Press
Copyright: © The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected]
ISSN: 1367-4803
eISSN: 1460-2059
DOI: 10.1093/bioinformatics/bti541
pmid: 15955779

Journal

Bioinformatics – Oxford University Press

Published: Jun 14, 2005
