PRALINE: a multiple sequence alignment toolbox that integrates homology-extended and secondary structure information

V. A. Simossis; J. Heringa

doi:10.1093/nar/gki390

Nucleic Acids Research, 2005, Vol. 33, Web Server issue W289–W294 (published 2005-07-01)

V. A. Simossis (1) and J. Heringa (1,2,*)

(1) Bioinformatics Section, Faculty of Sciences and (2) Centre for Integrative Bioinformatics VU (IBIVU), Faculty of Sciences and Faculty of Earth & Life Sciences, Vrije Universiteit, De Boelelaan 1081A, 1081 HV, Amsterdam, The Netherlands

Received February 11, 2005; Revised and Accepted March 10, 2005

*To whom correspondence should be addressed. Tel: +31 20 5987649; Fax: +31 20 5987653; Email: [email protected]

© The Author 2005. Published by Oxford University Press. All rights reserved. The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact [email protected]

ABSTRACT

PRofile ALIgNEment (PRALINE) is a fully customizable multiple sequence alignment application. In addition to a number of available alignment strategies, PRALINE can integrate information from database homology searches to generate a homology-extended multiple alignment. PRALINE also provides a choice of seven different secondary structure prediction programs that can be used individually or in combination as a consensus for integrating structural information into the alignment process. The program can be used through two separate interfaces: one has been designed to cater to more advanced needs of researchers in the field, and the other for standard construction of high confidence alignments. The web-based output is designed to facilitate the comprehensive visualization of the generated alignments by means of five default colour schemes based on: residue type, position conservation, position reliability, residue hydrophobicity and secondary structure, depending on the options set. A user can also define a custom colour scheme by selecting which colour will represent one or more amino acids in the alignment. All generated alignments are also made available in the PDF format for easy figure generation for publications. The grouping of sequences, on which the alignment is based, can also be visualized as a dendrogram. PRALINE is available at http://ibivu.cs.vu.nl/programs/pralinewww/.

INTRODUCTION

The alignment of two or more sequences has become an essential sequence analysis technique in biological research. State-of-the-art multiple sequence alignment (MSA) methods, such as T-COFFEE (1) and MUSCLE (2), as well as other MSA methods available to date, perform alignments by only using the sequences in the given set. Although they use profile technology to match distant sequence sets, they do not use further homology information for the sequences that are available in current sequence databases. The benefit of using homologous information to align distant sequences has been shown in a number of studies (3), while the use of profiles to represent the additional homologous information has been shown to have many advantages (4,5). For this reason, the PRALINE toolbox (6,7) has been recently re-designed to include homology-extended multiple alignment (8), where as an initial step a profile for each sequence in a given set is built by using PSI-BLAST (9,10) and the progressive alignment then proceeds using the PSI-BLAST profiles instead of the given sequences. This approach has been previously applied with success to local pairwise alignment methods for homology modelling (11–15) and is extended in PRALINE for global MSA. The recently updated MAFFT alignment tool (3,16) also uses homologous sequences to improve the alignment quality of distant sequences. However, in the MAFFT approach, the additional information is not incorporated in profiles for each of the query sequences, but homologous sequences are added to the original set and then aligned together using the various MAFFT alignment strategies. In the end, the homologous sequences are removed, leaving the aligned original sequences to form the final alignment.

In this paper we present the new web server for the PRALINE toolbox (6,7), where we have added two new alignment features: homology-extended multiple alignment (8) and the integration of predicted secondary structure information with iteration capabilities (V. A. Simossis and J. Heringa, submitted for publication). We show results for the cytochrome P450 HOMSTRAD (17) sequence set as an example to demonstrate how the homology-extended strategy and integrating secondary structure information, in combination with the visualization possibilities of the server output can lead to meaningful interpretations. Details about the PRALINE strategies and optimizations have been described previously (6–8,18).
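To make the notion of a profile concrete, the following is a minimal sketch of how a set of already aligned sequences (for example a query plus its database homologues) can be summarized as per-column residue frequencies, and how two profile columns can then be compared. It is an illustration only and not the PRALINE or PSI-BLAST implementation: the function names `build_profile` and `column_score` are hypothetical, and the identity-based match/mismatch weights are a placeholder assumption standing in for a real substitution matrix.

```python
from collections import Counter
from typing import Dict, List


def build_profile(aligned: List[str]) -> List[Dict[str, float]]:
    """Return per-column residue frequencies for equal-length aligned sequences.

    Gap characters ('-') are ignored; an all-gap column yields an empty dict.
    """
    if not aligned:
        raise ValueError("at least one aligned sequence is required")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned if seq[col] != "-")
        total = sum(counts.values())
        profile.append({aa: n / total for aa, n in counts.items()} if total else {})
    return profile


def column_score(col_a: Dict[str, float], col_b: Dict[str, float],
                 match: float = 1.0, mismatch: float = 0.0) -> float:
    """Expected pair score between two profile columns.

    Assumption: a toy identity scoring (match/mismatch) replaces a real
    exchange matrix such as BLOSUM62.
    """
    return sum(fa * fb * (match if a == b else mismatch)
               for a, fa in col_a.items()
               for b, fb in col_b.items())
```

Because every column carries the residue distribution of the whole homologue set rather than a single residue, two distant query sequences can still score well at a position where their homologues share residues.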
HOMOLOGY-EXTENDED MULTIPLE ALIGNMENT

The homology-extended MSA strategy enriches the information for each of the sequences in a given set by collecting putative homologous sequences. Each sequence is submitted as a query to PSI-BLAST over a database of choice [default: non-redundant (NR)]. The resulting PSI-BLAST alignments are then filtered for redundancy (100% sequence identity). In the event that no hits or only redundant hits are detected, the PSI-BLAST E-value threshold is automatically adjusted to a 10-fold less stringent setting (e.g. from 10 × 10^-6 to 10 × 10^-5) and the query is re-submitted. Once all the sequences to be aligned have at least one additional putative homologue, each PSI-BLAST alignment is converted into a profile and progressively aligned. A more detailed account of the PRALINE homology-extended multiple alignment algorithm and its performance is available in Ref. (8).

The advantage of this strategy is that it uses a much larger amount of position-specific information in the homology-extended profiles to score the alignment of two or more positions. As a result, the cases that benefit the most are those that evolution has changed so extensively (<30% identity) that the homology (common ancestry) between them is almost undetectable when compared directly (8).

In Table 1, the performance of the homology-extended alignment strategy on 254 HOMSTRAD (17) multiple alignment cases has been compared with the state-of-the-art methods T-COFFEEv2.03 and MUSCLEv3.51. The results show that for the strictest quality measure, column scoring, the overall improvement of the PRALINE_PSI strategy is >3.5% relative to T-COFFEE and MUSCLE. Moreover, the improvement is >5% for the most distant and difficult test cases with sequences <30% sequence identity. In addition, PRALINE_PSI has also been compared with the PRALINE standard global progressive alignment strategy (PRALINE_BASIC) (6) and the PRALINE_BASIC and PRALINE_PSI strategies with integrated predicted [PSIPRED (19) and YASPIN (20)] secondary structure information, respectively, named as PRALINE_BASIC-PSIPRED, PRALINE_BASIC-YASPIN, PRALINE_PSI-PSIPRED and PRALINE_PSI-YASPIN. The latter secondary structure-guided alignment strategies of PRALINE are discussed in the next section. As shown in Table 1, the improvement in alignment quality achieved by homology-extended alignment (PRALINE_PSI) as compared with other methods is significant in the more difficult alignment cases with average sequence identity percentages <60%. As would be expected, in the easier alignment cases that share >60% sequence identity, all the alignments are of comparable high quality.

When used as an option on the server, the homology-extended alignment strategy can further be customized by manually entering the desired iteration count, starting E-value cut-off and database to be searched by PSI-BLAST for the building of the homology-extended profiles (default: 3 iterations, starting with a cut-off of 10 × 10^-6 on the NR database). The default parameters have been optimized by testing different settings on the HOMSTRAD database of structural alignments (8).

Table 1. The quality assessment of 254 HOMSTRAD multiple alignment cases generated by different alignment strategies

Column score, by average sequence identity range:

Alignment method          Overall (%)   0–30 (%)   30–60 (%)   60–100 (%)   P (0–100)
PRALINE_BASIC             63.8          38.7       68.5        95.5         –
PRALINE_BASIC-YASPIN      68.0          45.3       72.2        96.3         0.106
PRALINE_BASIC-PSIPRED     67.4          43.5       72.1        95.9         0.337
PRALINE_PSI               70.2          50.2       73.6        96.7         0.025
PRALINE_PSI-YASPIN        70.0          49.7       73.6        96.5         0.042
PRALINE_PSI-PSIPRED       70.1          50.2       73.5        96.7         0.014
TCOFFEEv2.03              67.6          44.0       72.2        95.8         0.237
MUSCLEv3.51               67.5          45.0       71.6        96.3         0.461

The significance of the results (P-value from Kolmogorov–Smirnov test) is calculated with regard to the PRALINE_BASIC method. The column scores are the percentage of correctly aligned columns with regard to the HOMSTRAD structure alignment.

INTEGRATION OF SECONDARY STRUCTURE

The rule-of-thumb that structure is more conserved than sequence is a well-documented fact (21–24). As a result, many studies have shown that its use to guide sequence alignment improves alignment quality, especially between distant sequences (6–8,11–15,25). To this end, we have devised a secondary structure scoring scheme for the alignment algorithm that combines exchange weights from four types of matrices: sequence or profile positions that have not been assigned the same secondary structure class are scored using a generic matrix (default: BLOSUM62), otherwise the positions that have matching helix, strand or coil assignments use the Lüthy (26) helix-, strand- and coil-specific matrices, respectively. The use of the secondary structure information significantly improves the PRALINE_BASIC alignment quality and also boosts the PRALINE_PSI alignments in the very difficult alignment cases with <20% sequence identity (V. A. Simossis and J. Heringa, submitted for publication). In Table 1, it is clearly shown that the use of the secondary structure is beneficial for PRALINE_BASIC (>4% improvement in cases with <60% identity), albeit not as significant as the improvements seen with PRALINE_PSI.

The secondary structure integration options of PRALINE involve the use of any one of the seven prediction methods that are listed [PHDpsi (27), PROFsec (B. Rost, unpublished data), SSPRO 2.01 (28), YASPIN (20), PSIPRED (19), JNET (29) and PREDATOR (30,31)] to predict the secondary structure of the input sequences. In addition, the user can optionally select to also search the Protein Data Bank (PDB) to find 3D structure information for the input sequences and use the DSSP-derived secondary structure for the alignment. If both DSSP and a prediction method are selected, the predictions will only be integrated into the alignment for those sequences that do not have a PDB entry. Finally, in the same list as the seven prediction methods, an optimally segmented (24) or majority voting consensus can be alternatively used that currently combines the predictions of PROFsec, YASPIN and PSIPRED.
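The matrix selection rule of the secondary structure scoring scheme can be sketched as follows. This is a simplified illustration for single residue pairs, not the PRALINE code: the names `choose_matrix` and `pair_score`, the one-letter state codes H/E/C, and the dictionary layout of the matrices are all assumptions made for the example, and the caller is expected to supply the real exchange matrices (a generic one such as BLOSUM62 plus the three structure-specific ones).

```python
from typing import Dict, Tuple

# An exchange matrix is modelled as a mapping from residue pairs to weights.
Matrix = Dict[Tuple[str, str], float]

# Assumed state codes: H = helix, E = strand, C = coil.
SS_STATES = ("H", "E", "C")


def choose_matrix(ss_a: str, ss_b: str) -> str:
    """Pick the matrix key for two aligned positions.

    Matching helix, strand or coil states select the structure-specific
    matrix; any mismatch (or unknown state) falls back to the generic one.
    """
    if ss_a == ss_b and ss_a in SS_STATES:
        return ss_a
    return "generic"


def pair_score(res_a: str, ss_a: str, res_b: str, ss_b: str,
               matrices: Dict[str, Matrix]) -> float:
    """Score one residue pair with the matrix chosen by secondary structure."""
    matrix = matrices[choose_matrix(ss_a, ss_b)]
    # Exchange matrices are symmetric, so accept either key order.
    if (res_a, res_b) in matrix:
        return matrix[(res_a, res_b)]
    return matrix[(res_b, res_a)]
```

For profile positions the same selection would apply per column, with the chosen matrix used inside the profile scoring function; that extension is left out here to keep the rule itself visible.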
The proposed prediction method are selected, the predictions will only be maximum number of sequences that should be submitted to the integrated into the alignment for those sequences that do not server is set to 500 with length 2000, but this is mainly to limit have a PDB entry. Finally, in the same list as the seven pre- the server load and is not the limit of the PRALINE program. diction methods, an optimally segmented (24) or majority In addition, owing to the long running time needed for strat- voting consensus can be alternatively used that currently com- egies, such as PRALINE , an optional email notiﬁcation can PSI bines the predictions of PROFsec, YASPIN and PSIPRED. be requested that is delivered upon a completion of the job and contains the link to the results and some statistics on the resulting alignment. PROFILE PRE-PROCESSING AND ITERATION Similar to the previous version of the server (18), the gap opening and gap extension penalties and the amino acid sub- PRALINE provides a number of alignment strategies, such as stitution matrix can be manually set if needed [default: 12, 1 proﬁle pre-processing and iterative alignment optimization with BLOSUM62 (35)] for any of the PRALINE alignment (6,7). The secondary structure-guided strategies using PHD, strategies. The results page is automatically displayed once the PROFsec, JNET and SSPRO, and the proﬁle pre-processing job is complete and contains various sections depending on strategies can be set to use consistency information to drive the options selected (Figure 1). In order to provide all gener- subsequent alignment rounds (iterations), each time drawing ated ﬁles for the user, there is a link to download a compressed upon the theoretically higher quality information from the ﬁle with all the results in the job directory [Figure 1, (D)] and previous cycle. 
A detailed account of these strategies can also individual links that allow the user to download speciﬁc be found in previously published work (6,7,18,25,32,33). Figure 1. The PRALINE results page headers. A: The subtitle indicating which iteration results are presented on this page (only available if iteration >0 is selected). B: The time taken to run the job and statistics related to the visible alignment. C: The links to all other available iteration cycle results (only available if iteration >0is selected). D: The link to download all job files as a compressed file. E: Links to tabulated specific file types. F: Links to iteration-specific output files (only available if iteration >0 is selected). G: The button that hides/reveals the profile pre-processing scores of the sequence set (only available if profile pre-processing is selected). H: The buttons that switch between colour schemes. I: The button that generates and opens a PDF version of the alignment in the visible colour scheme. W292 Nucleic Acids Research, 2005, Vol. 33, Web Server issue Figure 2. The PRALINE P450 alignment using both PROFsec and DSSP secondary structure integration settings. The alignment has been sectioned to focus on the PSI regions containing the conserved motifs of the cytochrome P450 enzymes (signified by the black bars above the rulers). (A) The oxygen-binding motif, (B) the ExxR motif and (C) the haem-binding motif. For each section, the top colour scheme shows conservation levels according to the colour key and the bottom one shows the secondary structure each residue belongs to (red: helix; green: strand; and clear: coil). The ruler on top of each alignment block shows which parts of the alignment are visible. Nucleic Acids Research, 2005, Vol. 33, Web Server issue W293 ﬁles related to each sequence in the set (e.g. a PSI-BLAST are straightforwardly visualized in the PRALINE output con- proﬁle or a secondary structure ﬁle) [Figure 1, (E)]. 
servation colour scheme, while the secondary structure view If the iteration number selected is >0, a subtitle informs the allows us to relate them in a structural context. As stated in the user which iteration cycle results are presented on the page literature (37), the oxygen binding and ExxR motifs are each [Figure 1, (A)]. The alignment from each iteration cycle is part of two distinct C-terminal helices, while the haem-binding presented on a different page and is accessible by the corres- motif ﬂanks the N-terminal end of the last helix. Owing to ponding links [Figure 1, (C)]. In addition, it informs the user of space limitations the alignment has been sectioned to concen- the total time taken for the process to complete, provides some trate on these regions, but the full alignment can be viewed statistics related to the visible alignment [Figure 1, (B)] and if online in example 9 of the supplementary material. the iterations were halted due to alignment convergence or limit cycle convergence and which iteration was the last (not applicable in the Figure 1 example). In the case of iteration- ACKNOWLEDGEMENTS speciﬁc output, such as alignment of the iteration or secondary structure prediction, additional links are displayed The authors would like to thank the Vrije Universiteit [Figure 1, (F)]. Amsterdam for funding this project. Special thanks are also If proﬁle pre-processing is selected the user has the option of due to Drs Franca Fraternali, Jens Kleinjung and John Romein viewing the proﬁle pre-processing scores for all pairwise align- for help with debugging and server testing. Funding to pay the ments for deriving an optimum cut-off value [Figure 1, (G)]. Open Access publication charges of this article was provided Finally, depending on the selected parameters of the job, a by the Vrije Universiteit Amsterdam. series of buttons allows switching between the available Conflict of interest statement. None declared. 
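Before submitting a job, the input constraints stated for the server (two or more protein sequences in FASTA format, at most 500 sequences of length 2000) can be checked locally. The sketch below is a hypothetical client-side helper, not part of PRALINE: `parse_fasta` and `check_submission` are illustrative names, and treating both limits as inclusive is an assumption, since the text does not say how the server handles the boundary values.

```python
from typing import List, Tuple

MAX_SEQUENCES = 500  # proposed maximum number of sequences stated for the server
MAX_LENGTH = 2000    # proposed maximum sequence length (assumed inclusive)


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs, joining wrapped lines."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_submission(text: str) -> List[str]:
    """Return a list of problems; an empty list means the input fits the limits."""
    records = parse_fasta(text)
    problems: List[str] = []
    if len(records) < 2:
        problems.append("at least two sequences are required")
    if len(records) > MAX_SEQUENCES:
        problems.append(f"more than {MAX_SEQUENCES} sequences")
    for header, sequence in records:
        if len(sequence) > MAX_LENGTH:
            problems.append(f"{header}: longer than {MAX_LENGTH} residues")
    return problems
```

As the text notes, these limits exist to bound server load rather than reflecting a restriction of the PRALINE program itself, so a check of this kind only concerns web submissions.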
SAMPLE OUTPUTS

Owing to the large number of possible outputs, we have provided a set of nine representative sample outputs for the P450 alignment on the server, each one representing a different combination of PRALINE strategies and settings. These examples are intended as supplementary material to this article and can be accessed through a dedicated link on the server pages or directly at http://ibivu.cs.vu.nl/programs/pralinewww/example/. They can also be used as an indication of CPU times needed by each of the PRALINE strategies.

In Figure 2, we illustrate sections of the PRALINE_PSI alignment of the 'p450' HOMSTRAD sequence set (21% average sequence identity) using both DSSP (36) and PROFsec secondary structure integration settings. The colour schemes in the figure are for positional conservation and secondary structure. The secondary structure information for each sequence in this alignment has been derived by using DSSP, since all the sequences have a corresponding PDB structure.

The cytochrome P450 enzymes primarily act as oxidases in multi-component electron transport chains to break down naturally occurring toxins and mutagens. The structure is almost triangular, with the C-terminal part being mostly helical, while the N-terminal part is more β-sheet rich. The signature motif of P450 enzymes is the haem-binding site, which is often represented as FxxGxxxCxG (Figure 2C). Other conserved regions include the motif A(A/G)x(E/D)T (Figure 2A), where the threonine (T) residue is part of the oxygen-binding site, and an invariant ExxR sequence (Figure 2B). The ExxR and the C residue at the haem-binding site are the only completely conserved amino acids in P450s. These well-documented details are straightforwardly visualized in the PRALINE output conservation colour scheme, while the secondary structure view allows us to relate them in a structural context. As stated in the literature (37), the oxygen binding and ExxR motifs are each part of two distinct C-terminal helices, while the haem-binding motif flanks the N-terminal end of the last helix. Owing to space limitations the alignment has been sectioned to concentrate on these regions, but the full alignment can be viewed online in example 9 of the supplementary material.

Figure 2. The PRALINE_PSI P450 alignment using both PROFsec and DSSP secondary structure integration settings. The alignment has been sectioned to focus on the regions containing the conserved motifs of the cytochrome P450 enzymes (signified by the black bars above the rulers). (A) The oxygen-binding motif, (B) the ExxR motif and (C) the haem-binding motif. For each section, the top colour scheme shows conservation levels according to the colour key and the bottom one shows the secondary structure each residue belongs to (red: helix; green: strand; and clear: coil). The ruler on top of each alignment block shows which parts of the alignment are visible.
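The three P450 motifs quoted above translate directly into regular expressions, with x read as any residue. The sketch below locates them in a single unaligned sequence; it is only an illustration of the motif notation, the helper name `find_motifs` is hypothetical, and the example sequence is synthetic (constructed for this example, not a real P450). Short patterns such as ExxR will also match by chance in real proteins, so hits should be interpreted in the context of the alignment.

```python
import re
from typing import Dict, List, Tuple

# Motif notation from the text, with 'x' written as '.' (any residue).
MOTIFS = {
    "oxygen_binding": re.compile(r"A[AG].[ED]T"),  # A(A/G)x(E/D)T
    "ExxR": re.compile(r"E..R"),                   # ExxR
    "haem_binding": re.compile(r"F..G...C.G"),     # FxxGxxxCxG
}


def find_motifs(sequence: str) -> Dict[str, List[Tuple[int, str]]]:
    """Return 0-based start positions and matched text for each motif.

    Matches are non-overlapping per motif, as produced by re.finditer.
    """
    seq = sequence.upper()
    return {
        name: [(m.start(), m.group()) for m in pattern.finditer(seq)]
        for name, pattern in MOTIFS.items()
    }


if __name__ == "__main__":
    # Synthetic sequence built to contain one copy of each motif.
    synthetic = "MMAGTETLLETLRPPFSLGKRACIG"
    for name, hits in find_motifs(synthetic).items():
        print(name, hits)
```

On an ungapped sequence the reported positions are sequence coordinates; to compare them with the columns of a PRALINE alignment, gap characters would first have to be mapped back to alignment positions.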
ACKNOWLEDGEMENTS

The authors would like to thank the Vrije Universiteit Amsterdam for funding this project. Special thanks are also due to Drs Franca Fraternali, Jens Kleinjung and John Romein for help with debugging and server testing. Funding to pay the Open Access publication charges of this article was provided by the Vrije Universiteit Amsterdam.

Conflict of interest statement. None declared.

REFERENCES

1. Notredame,C., Higgins,D.G. and Heringa,J. (2000) T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol., 302, 205–217.
2. Edgar,R.C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res., 32, 1792–1797.
3. Katoh,K., Kuma,K., Toh,H. and Miyata,T. (2005) MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res., 33, 511–518.
4. Wang,G. and Dunbrack,R.L.,Jr (2004) Scoring profile-to-profile sequence alignments. Protein Sci., 13, 1612–1626.
5. Edgar,R.C. and Sjolander,K. (2004) A comparison of scoring functions for protein sequence profile alignment. Bioinformatics, 20, 1301–1308.
6. Heringa,J. (1999) Two strategies for sequence comparison: profile-preprocessed and secondary structure-induced multiple alignment. Comput. Chem., 23, 341–364.
7. Heringa,J. (2002) Local weighting schemes for protein multiple sequence alignment. Comput. Chem., 26, 459–477.
8. Simossis,V.A., Kleinjung,J. and Heringa,J. (2005) Homology-extended sequence alignment. Nucleic Acids Res., 33, 816–824.
9. Altschul,S.F. and Koonin,E.V. (1998) Iterated profile searches with PSI-BLAST--a tool for discovery in protein databases. Trends Biochem. Sci., 23, 444–447.
10. Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
11. Chung,R. and Yona,G. (2004) Protein family comparison using statistical models and predicted structural information. BMC Bioinformatics, 5.
12. Ginalski,K., Pas,J., Wyrwicz,L.S., von Grotthuss,M., Bujnicki,J.M. and Rychlewski,L. (2003) ORFeus: detection of distant homology using sequence profiles and predicted secondary structure. Nucleic Acids Res., 31, 3804–3807.
13. Ginalski,K., von Grotthuss,M., Grishin,N.V. and Rychlewski,L. (2004) Detecting distant homology with Meta-BASIC. Nucleic Acids Res., 32, W576–W581.
14. Soding,J. (2004) Protein homology detection by HMM–HMM comparison. Bioinformatics, 21, 951–960.
15. von Ohsen,N., Sommer,I., Zimmer,R. and Lengauer,T. (2004) Arby: automatic protein structure prediction using profile–profile alignment and confidence measures. Bioinformatics, 20, 2228–2235.
16. Katoh,K., Misawa,K., Kuma,K. and Miyata,T. (2002) MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res., 30, 3059–3066.
17. Mizuguchi,K., Deane,C.M., Blundell,T.L. and Overington,J.P. (1998) HOMSTRAD: a database of protein structure alignments for homologous families. Protein Sci., 7, 2469–2471.
18. Simossis,V.A. and Heringa,J. (2003) The PRALINE online server: optimising progressive multiple alignment on the web. Comput. Biol. Chem., 27, 511–519.
19. Jones,D.T. (1999) Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol., 292, 195–202.
20. Lin,K., Simossis,V.A., Taylor,W.R. and Heringa,J. (2005) A simple and fast secondary structure prediction method using hidden neural networks. Bioinformatics, 21, 152–159.
21. Chothia,C. and Lesk,A.M. (1986) The relation between the divergence of sequence and structure in proteins. EMBO J., 5, 823–826.
22. Rost,B. (1999) Twilight zone of protein sequence alignments. Protein Eng., 12, 85–94.
23. Sander,C. and Schneider,R. (1991) Database of homology-derived protein structures and the structural meaning of sequence alignment. Proteins, 9, 56–68.
24. Simossis,V.A. and Heringa,J. (2004) The influence of gapped positions in multiple sequence alignments on secondary structure prediction methods. Comput. Biol. Chem., 28, 351–366.
25. Heringa,J. (2000) Computational methods for protein secondary structure prediction using multiple sequence alignments. Curr. Protein Pept. Sci., 1, 273–301.
26. Lüthy,R., McLachlan,A.D. and Eisenberg,D. (1991) Secondary structure-based profiles: use of structure-conserving scoring tables in searching protein sequence databases for structural similarities. Proteins, 10, 229–239.
27. Przybylski,D. and Rost,B. (2002) Alignments grow, secondary structure prediction improves. Proteins, 46, 197–205.
28. Pollastri,G., Przybylski,D., Rost,B. and Baldi,P. (2002) Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles. Proteins, 47, 228–235.
29. Cuff,J.A. and Barton,G.J. (2000) Application of multiple sequence alignment profiles to improve protein secondary structure prediction. Proteins, 40, 502–511.
30. Frishman,D. and Argos,P. (1996) Incorporation of non-local interactions in protein secondary structure prediction from the amino acid sequence. Protein Eng., 9, 133–142.
31. Frishman,D. and Argos,P. (1997) Seventy-five percent accuracy in protein secondary structure prediction. Proteins, 27, 329–335.
32. Simossis,V.A. and Heringa,J. (2004) Integrating protein secondary structure prediction and multiple sequence alignment. Curr. Protein Pept. Sci., 5, 249–266.
33. Simossis,V.A., Kleinjung,J. and Heringa,J. (2003) An overview of multiple sequence alignment. In Baxevanis,A.D. (ed.), Current Protocols in Bioinformatics. John Wiley, NY, pp. 3.7.1–3.7.25.
34. Pearson,W.R. (2000) Flexible sequence similarity searching with the FASTA3 program package. Methods Mol. Biol., 132, 185–219.
35. Dayhoff,M.O., Barker,W.C. and Hunt,L.T. (1983) Establishing homologies in protein sequences. Methods Enzymol., 91, 524–545.
36. Kabsch,W. and Sander,C. (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22, 2577–2637.
37. In Ortiz de Montellano,P.R. (ed.), Cytochrome P450: Structure, Mechanism, and Biochemistry, 2nd edn. Plenum Press, NY.



Publisher: Oxford University Press
Copyright: © The Author 2005. Published by Oxford University Press. All rights reserved
ISSN: 0305-1048
eISSN: 1362-4962
DOI: 10.1093/nar/gki390
pmid: 15980472

Abstract

Nucleic Acids Research, 2005, Vol. 33, Web Server issue W289–W294
doi:10.1093/nar/gki390

PRALINE: a multiple sequence alignment toolbox that integrates homology-extended and secondary structure information

V. A. Simossis (1) and J. Heringa (1,2,*)

(1) Bioinformatics Section, Faculty of Sciences and (2) Centre for Integrative Bioinformatics VU (IBIVU), Faculty of Sciences and Faculty of Earth & Life Sciences, Vrije Universiteit, De Boelelaan 1081A, 1081 HV, Amsterdam, The Netherlands

Received February 11, 2005; Revised and Accepted March 10, 2005

*To whom correspondence should be addressed. Tel: +31 20 5987649; Fax: +31 20 5987653; Email: [email protected]

ABSTRACT

PRofile ALIgNEment (PRALINE) is a fully customizable multiple sequence alignment application. In addition to a number of available alignment strategies, PRALINE can integrate information from database homology searches to generate a homology-extended multiple alignment. PRALINE also provides a choice of seven different secondary structure prediction programs that can be used individually or in combination as a consensus for integrating structural information into the alignment process. The program can be used through two separate interfaces: one has been designed to cater to more advanced needs of researchers in the field, and the other for standard construction of high confidence alignments. The web-based output is designed to facilitate the comprehensive visualization of the generated alignments by means of five default colour schemes based on: residue type, position conservation, position reliability, residue hydrophobicity and secondary structure, depending on the options set. A user can also define a custom colour scheme by selecting which colour will represent one or more amino acids in the alignment. All generated alignments are also made available in the PDF format for easy figure generation for publications. The grouping of sequences, on which the alignment is based, can also be visualized as a dendrogram. PRALINE is available at http://ibivu.cs.vu.nl/programs/pralinewww/.

INTRODUCTION

The alignment of two or more sequences has become an essential sequence analysis technique in biological research. State-of-the-art multiple sequence alignment (MSA) methods, such as T-COFFEE (1) and MUSCLE (2), as well as other MSA methods available to date, perform alignments by only using the sequences in the given set. Although they use profile technology to match distant sequence sets, they do not use further homology information for the sequences that are available in current sequence databases. The benefit of using homologous information to align distant sequences has been shown in a number of studies (3), while the use of profiles to represent the additional homologous information has been shown to have many advantages (4,5). For this reason, the PRALINE toolbox (6,7) has been recently re-designed to include homology-extended multiple alignment (8), where as an initial step a profile for each sequence in a given set is built by using PSI-BLAST (9,10) and the progressive alignment then proceeds using the PSI-BLAST profiles instead of the given sequences. This approach has been previously applied with success to local pairwise alignment methods for homology modelling (11–15) and is extended in PRALINE for global MSA. The recently updated MAFFT alignment tool (3,16) also uses homologous sequences to improve the alignment quality of distant sequences. However, in the MAFFT approach, the additional information is not incorporated in profiles for each of the query sequences, but homologous sequences are added to the original set and then aligned together using the various MAFFT alignment strategies. In the end, the homologous sequences are removed, leaving the aligned original sequences to form the final alignment.

In this paper we present the new web server for the PRALINE toolbox (6,7), where we have added two new alignment features: homology-extended multiple alignment (8) and the integration of predicted secondary structure information with iteration capabilities (V. A. Simossis and J. Heringa, submitted for publication). We show results for the cytochrome P450 HOMSTRAD (17) sequence set as an example to demonstrate how the homology-extended strategy and integrating secondary structure information, in combination with the visualization possibilities of the server output can lead to meaningful interpretations. Details about the PRALINE strategies and optimizations have been described previously (6–8,18).

HOMOLOGY-EXTENDED MULTIPLE ALIGNMENT

The homology-extended MSA strategy enriches the information for each of the sequences in a given set by collecting putative homologous sequences. Each sequence is submitted as a query to PSI-BLAST over a database of choice [default: non-redundant (NR)]. The resulting PSI-BLAST alignments are then filtered for redundancy (100% sequence identity). In the event that no hits or only redundant hits are detected, the PSI-BLAST E-value threshold is automatically adjusted to a 10-fold less stringent setting (e.g. from 10 × 10^-6 to 10 × 10^-5) and the query is re-submitted. Once all the sequences to be aligned have at least one additional putative homologue, each PSI-BLAST alignment is converted into a profile and progressively aligned. A more detailed account of the PRALINE homology-extended multiple alignment algorithm and its performance is available in Ref. (8).

The advantage of this strategy is that it uses a much larger amount of position-specific information in the homology-extended profiles to score the alignment of two or more positions. As a result, the cases that benefit the most are those that evolution has changed so extensively (<30% identity) that the homology (common ancestry) between them is almost undetectable when compared directly (8).

In Table 1, the performance of the homology-extended alignment strategy on 254 HOMSTRAD (17) multiple alignment cases has been compared with the state-of-the-art methods T-COFFEEv2.03 and MUSCLEv3.51. The results show that for the strictest quality measure, column scoring, the overall improvement of the PRALINE_PSI strategy is >3.5% relative to T-COFFEE and MUSCLE. Moreover, the improvement is >5% for the most distant and difficult test cases with sequences <30% sequence identity. In addition, PRALINE_PSI has also been compared with the PRALINE standard global progressive alignment strategy (PRALINE_BASIC) (6) and the PRALINE_BASIC and PRALINE_PSI strategies with integrated predicted [PSIPRED (19) and YASPIN (20)] secondary structure information, respectively, named as PRALINE_BASIC-PSIPRED, PRALINE_BASIC-YASPIN, PRALINE_PSI-PSIPRED and PRALINE_PSI-YASPIN. The latter secondary structure-guided alignment strategies of PRALINE are discussed in the next section. As shown in Table 1, the improvement in alignment quality achieved by homology-extended alignment (PRALINE_PSI) as compared with other methods is significant in the more difficult alignment cases with average sequence identity percentages <60%. As would be expected, in the easier alignment cases that share >60% sequence identity, all the alignments are of comparable high quality.

When used as an option on the server, the homology-extended alignment strategy can further be customized by manually entering the desired iteration count, starting E-value cut-off and database to be searched by PSI-BLAST for the building of the homology-extended profiles (default: 3 iterations, starting with a cut-off of 10 × 10^[…] on the NR database). The default parameters have been optimized by testing different settings on the HOMSTRAD database of structural alignments (8).

INTEGRATION OF SECONDARY STRUCTURE

The rule-of-thumb that structure is more conserved than sequence is a well-documented fact (21–24). As a result, many studies have shown that its use to guide sequence alignment improves alignment quality, especially between distant sequences (6–8,11–15,25). To this end, we have devised a secondary structure scoring scheme for the alignment algorithm that combines exchange weights from four types of matrices: sequence or profile positions that have not been assigned the same secondary structure class are scored using a generic matrix (default: BLOSUM62), otherwise the positions that have matching helix, strand or coil assignments use the Lüthy (26) helix-, strand- and coil-specific matrices, respectively. The use of the secondary structure information significantly improves the PRALINE_BASIC alignment quality and also boosts the PRALINE_PSI alignments in the very difficult alignment cases <20% sequence identity (V. A. Simossis and J. Heringa, submitted for publication). In Table 1, it is clearly shown that the use of the secondary structure is beneficial for PRALINE_BASIC (>4% improvement in cases with <60% identity), albeit not as significant as the improvements seen with PRALINE_PSI.

The secondary structure integration options of PRALINE involve the use of any one of the seven prediction methods that are listed [PHDpsi (27), PROFsec (B. Rost, unpublished data), SSPRO 2.01 (28), YASPIN (20), PSIPRED (19), JNET (29) and PREDATOR (30,31)] to predict the secondary structure of the input sequences. In addition, the user can optionally select to also search the Protein Data Bank (PDB) to find 3D structure information for the input sequences and use the DSSP-derived secondary structure for the alignment. If both DSSP and a prediction method are selected, the predictions will only be integrated into the alignment for those sequences that do not have a PDB entry. Finally, in the same list as the seven prediction methods, an optimally segmented (24) or majority voting consensus can be alternatively used that currently combines the predictions of PROFsec, YASPIN and PSIPRED.

Table 1. The quality assessment of 254 HOMSTRAD multiple alignment cases generated by different alignment strategies

Column score
Alignment method         Overall (%)   0–30 (%)   30–60 (%)   60–100 (%)   P (0–100)
PRALINE_BASIC            63.8          38.7       68.5        95.5         –
PRALINE_BASIC-YASPIN     68.0          45.3       72.2        96.3         0.106
PRALINE_BASIC-PSIPRED    67.4          43.5       72.1        95.9         0.337
PRALINE_PSI              70.2          50.2       73.6        96.7         0.025
PRALINE_PSI-YASPIN       70.0          49.7       73.6        96.5         0.042
PRALINE_PSI-PSIPRED      70.1          50.2       73.5        96.7         0.014
TCOFFEEv2.03             67.6          44.0       72.2        95.8         0.237
MUSCLEv3.51              67.5          45.0       71.6        96.3         0.461

The significance of the results (P-value from Kolmogorov–Smirnov test) is calculated with regard to the PRALINE_BASIC method. The column scores are the percentage correctly aligned columns with regard to the HOMSTRAD structure alignment.

PROFILE PRE-PROCESSING AND ITERATION

PRALINE provides a number of alignment strategies, such as profile pre-processing and iterative alignment optimization (6,7). The secondary structure-guided strategies using PHD, PROFsec, JNET and SSPRO, and the profile pre-processing strategies can be set to use consistency information to drive subsequent alignment rounds (iterations), each time drawing upon the theoretically higher quality information from the previous cycle. A detailed account of these strategies can be found in previously published work (6,7,18,25,32,33).

THE NEW PRALINE SERVER

The PRALINE program is designed to use two or more input protein sequences in the FASTA format (34). The proposed maximum number of sequences that should be submitted to the server is set to 500 with length 2000, but this is mainly to limit the server load and is not the limit of the PRALINE program. In addition, owing to the long running time needed for strategies, such as PRALINE_PSI, an optional email notification can be requested that is delivered upon completion of the job and contains the link to the results and some statistics on the resulting alignment.

Similar to the previous version of the server (18), the gap opening and gap extension penalties and the amino acid substitution matrix can be manually set if needed [default: 12, 1 with BLOSUM62 (35)] for any of the PRALINE alignment strategies. The results page is automatically displayed once the job is complete and contains various sections depending on the options selected (Figure 1). In order to provide all generated files for the user, there is a link to download a compressed file with all the results in the job directory [Figure 1, (D)] and also individual links that allow the user to download specific files related to each sequence in the set (e.g. a PSI-BLAST profile or a secondary structure file) [Figure 1, (E)].

Figure 1. The PRALINE results page headers. A: The subtitle indicating which iteration results are presented on this page (only available if iteration >0 is selected). B: The time taken to run the job and statistics related to the visible alignment. C: The links to all other available iteration cycle results (only available if iteration >0 is selected). D: The link to download all job files as a compressed file. E: Links to tabulated specific file types. F: Links to iteration-specific output files (only available if iteration >0 is selected). G: The button that hides/reveals the profile pre-processing scores of the sequence set (only available if profile pre-processing is selected). H: The buttons that switch between colour schemes. I: The button that generates and opens a PDF version of the alignment in the visible colour scheme.

If the iteration number selected is >0, a subtitle informs the user which iteration cycle results are presented on the page [Figure 1, (A)]. The alignment from each iteration cycle is presented on a different page and is accessible by the corresponding links [Figure 1, (C)]. In addition, it informs the user of the total time taken for the process to complete, provides some statistics related to the visible alignment [Figure 1, (B)] and if the iterations were halted due to alignment convergence or limit cycle convergence and which iteration was the last (not applicable in the Figure 1 example). In the case of iteration-specific output, such as alignment of the iteration or secondary structure prediction, additional links are displayed [Figure 1, (F)].

If profile pre-processing is selected the user has the option of viewing the profile pre-processing scores for all pairwise alignments for deriving an optimum cut-off value [Figure 1, (G)]. Finally, depending on the selected parameters of the job, a series of buttons allows switching between the available colour-coded views [Figure 1, (H)] [details about the colour schemes are described in (18)]. At any point, the visible alignment can be converted into a PDF for printing or further manipulation [Figure 1, (I)]. The remainder of the results page consists of a short description of the visible colour scheme with a key to the colours, after which the colour-coded alignment follows (an example of the conservation and the secondary structure colour-coding is shown in Figure 2).

SAMPLE OUTPUTS

Owing to the large number of possible outputs, we have provided a set of nine representative sample outputs for the P450 alignment on the server, each one representing a different combination of PRALINE strategies and settings. These examples are intended as supplementary material to this article and can be accessed through a dedicated link on the server pages or directly at http://ibivu.cs.vu.nl/programs/pralinewww/example/. They can also be used as an indication of CPU times needed by each of the PRALINE strategies.

In Figure 2, we illustrate sections of the PRALINE_PSI alignment of the 'p450' HOMSTRAD sequence set (21% average sequence identity) using both DSSP (36) and PROFsec secondary structure integration settings. The colour schemes in the figure are for positional conservation and secondary structure. The secondary structure information for each sequence in this alignment has been derived by using DSSP, since all the sequences have a corresponding PDB structure.

Figure 2. The PRALINE_PSI P450 alignment using both PROFsec and DSSP secondary structure integration settings. The alignment has been sectioned to focus on the regions containing the conserved motifs of the cytochrome P450 enzymes (signified by the black bars above the rulers). (A) The oxygen-binding motif, (B) the ExxR motif and (C) the haem-binding motif. For each section, the top colour scheme shows conservation levels according to the colour key and the bottom one shows the secondary structure each residue belongs to (red: helix; green: strand; and clear: coil). The ruler on top of each alignment block shows which parts of the alignment are visible.

The cytochrome P450 enzymes primarily act as oxidases in multi-component electron transport chains to break down naturally occurring toxins and mutagens. The structure is almost triangular, with the C-terminal part being mostly helical, while the N-terminal part is more β-sheet rich. The signature motif of P450 enzymes is the haem-binding site, which is often represented as FxxGxxxCxG (Figure 2C). Other conserved regions include the motif A(A/G)x(E/D)T (Figure 2A) where the threonine (T) residue is part of the oxygen-binding site and an invariant ExxR sequence (Figure 2B). The ExxR and the C residue at the haem-binding site are the only completely conserved amino acids in P450s. These well-documented details are straightforwardly visualized in the PRALINE output conservation colour scheme, while the secondary structure view allows us to relate them in a structural context. As stated in the literature (37), the oxygen binding and ExxR motifs are each part of two distinct C-terminal helices, while the haem-binding motif flanks the N-terminal end of the last helix. Owing to space limitations the alignment has been sectioned to concentrate on these regions, but the full alignment can be viewed online in example 9 of the supplementary material.

ACKNOWLEDGEMENTS

The authors would like to thank the Vrije Universiteit Amsterdam for funding this project. Special thanks are also due to Drs Franca Fraternali, Jens Kleinjung and John Romein for help with debugging and server testing. Funding to pay the Open Access publication charges of this article was provided by the Vrije Universiteit Amsterdam.

Conflict of interest statement. None declared.

REFERENCES

1. Notredame,C., Higgins,D.G. and Heringa,J. (2000) T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol., 302, 205–217.
2. Edgar,R.C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res., 32, 1792–1797.
3. Katoh,K., Kuma,K., Toh,H. and Miyata,T. (2005) MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res., 33, 511–518.
4. Wang,G. and Dunbrack,R.L.,Jr (2004) Scoring profile-to-profile sequence alignments. Protein Sci., 13, 1612–1626.
5. Edgar,R.C. and Sjolander,K. (2004) A comparison of scoring functions for protein sequence profile alignment. Bioinformatics, 20, 1301–1308.
6. Heringa,J. (1999) Two strategies for sequence comparison: profile-preprocessed and secondary structure-induced multiple alignment. Comput. Chem., 23, 341–364.
7. Heringa,J. (2002) Local weighting schemes for protein multiple sequence alignment. Comput. Chem., 26, 459–477.
8. Simossis,V.A., Kleinjung,J. and Heringa,J. (2005) Homology-extended sequence alignment. Nucleic Acids Res., 33, 816–824.
9. Altschul,S.F. and Koonin,E.V. (1998) Iterated profile searches with PSI-BLAST—a tool for discovery in protein databases. Trends Biochem. Sci., 23, 444–447.
10. Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
11. Chung,R. and Yona,G. (2004) Protein family comparison using statistical models and predicted structural information. BMC Bioinformatics, 5.
12. Ginalski,K., Pas,J., Wyrwicz,L.S., von Grotthuss,M., Bujnicki,J.M. and Rychlewski,L. (2003) ORFeus: detection of distant homology using sequence profiles and predicted secondary structure. Nucleic Acids Res., 31, 3804–3807.
13. Ginalski,K., von Grotthuss,M., Grishin,N.V. and Rychlewski,L. (2004) Detecting distant homology with Meta-BASIC. Nucleic Acids Res., 32, W576–W581.
14. Soding,J. (2004) Protein homology detection by HMM–HMM comparison. Bioinformatics, 21, 951–960.
15. von Ohsen,N., Sommer,I., Zimmer,R. and Lengauer,T. (2004) Arby: automatic protein structure prediction using profile–profile alignment and confidence measures. Bioinformatics, 20, 2228–2235.
16. Katoh,K., Misawa,K., Kuma,K. and Miyata,T. (2002) MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res., 30, 3059–3066.
17. Mizuguchi,K., Deane,C.M., Blundell,T.L. and Overington,J.P. (1998) HOMSTRAD: a database of protein structure alignments for homologous families. Protein Sci., 7, 2469–2471.
18. Simossis,V.A. and Heringa,J. (2003) The PRALINE online server: optimising progressive multiple alignment on the web. Comput. Biol. Chem., 27, 511–519.
19. Jones,D.T. (1999) Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol., 292, 195–202.
20. Lin,K., Simossis,V.A., Taylor,W.R. and Heringa,J. (2005) A simple and fast secondary structure prediction method using hidden neural networks. Bioinformatics, 21, 152–159.
21. Chothia,C. and Lesk,A.M. (1986) The relation between the divergence of sequence and structure in proteins. EMBO J., 5, 823–826.
22. Rost,B. (1999) Twilight zone of protein sequence alignments. Protein Eng., 12, 85–94.
23. Sander,C. and Schneider,R. (1991) Database of homology-derived protein structures and the structural meaning of sequence alignment. Proteins, 9, 56–68.
24. Simossis,V.A. and Heringa,J. (2004) The influence of gapped positions in multiple sequence alignments on secondary structure prediction methods. Comput. Biol. Chem., 28, 351–366.
25. Heringa,J. (2000) Computational methods for protein secondary structure prediction using multiple sequence alignments. Curr. Protein Pept. Sci., 1, 273–301.
26. Lüthy,R., McLachlan,A.D. and Eisenberg,D. (1991) Secondary structure-based profiles: use of structure-conserving scoring tables in searching protein sequence databases for structural similarities. Proteins, 10, 229–239.
27. Przybylski,D. and Rost,B. (2002) Alignments grow, secondary structure prediction improves. Proteins, 46, 197–205.
28. Pollastri,G., Przybylski,D., Rost,B. and Baldi,P. (2002) Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles. Proteins, 47, 228–235.
29. Cuff,J.A. and Barton,G.J. (2000) Application of multiple sequence alignment profiles to improve protein secondary structure prediction. Proteins, 40, 502–511.
30. Frishman,D. and Argos,P. (1996) Incorporation of non-local interactions in protein secondary structure prediction from the amino acid sequence. Protein Eng., 9, 133–142.
31. Frishman,D. and Argos,P. (1997) Seventy-five percent accuracy in protein secondary structure prediction. Proteins, 27, 329–335.
32. Simossis,V.A. and Heringa,J. (2004) Integrating protein secondary structure prediction and multiple sequence alignment. Curr. Protein Pept. Sci., 5, 249–266.
33. Simossis,V.A., Kleinjung,J. and Heringa,J. (2003) An overview of multiple sequence alignment. In Baxevanis,A.D. (ed.), Current Protocols in Bioinformatics. John Wiley, NY, pp. 3.7.1–3.7.25.
34. Pearson,W.R. (2000) Flexible sequence similarity searching with the FASTA3 program package. Methods Mol. Biol., 132, 185–219.
35. Dayhoff,M.O., Barker,W.C. and Hunt,L.T. (1983) Establishing homologies in protein sequences. Methods Enzymol., 91, 524–545.
36. Kabsch,W. and Sander,C. (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22, 2577–2637.
37. In Ortiz de Montellano,P.R. (ed.), Cytochrome P450: Structure, Mechanism, and Biochemistry, 2nd edn. Plenum Press, NY.
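
The profile-building step described under HOMOLOGY-EXTENDED MULTIPLE ALIGNMENT (query each sequence, filter out redundant hits, and relax the E-value threshold 10-fold when nothing usable remains) can be illustrated with a minimal Python sketch. This is not the PRALINE implementation. The `search` callable is a hypothetical stand-in for a PSI-BLAST run, the hit record `(sequence, percent identity to the query)` is a simplification, redundancy is assumed to mean 100% identity to the query, and the `max_relaxations` cap is an added safety assumption that the text does not mention.

```python
from typing import Callable, List, Tuple

# (hit_sequence, percent_identity_to_query): an assumed, simplified hit record.
Hit = Tuple[str, float]
# Hypothetical stand-in for one PSI-BLAST run: (query, e_value_cutoff) -> hits.
SearchFn = Callable[[str, float], List[Hit]]


def collect_homologues(
    query: str,
    search: SearchFn,
    start_evalue: float,
    max_relaxations: int = 5,  # assumed safety cap, not taken from the paper
) -> Tuple[List[Hit], float]:
    """Return the non-redundant hits and the last E-value cut-off that was tried."""
    evalue = start_evalue
    for attempt in range(max_relaxations + 1):
        hits = search(query, evalue)
        # Redundancy filter: discard hits that are 100% identical to the query.
        kept = [hit for hit in hits if hit[1] < 100.0]
        if kept:
            return kept, evalue
        if attempt < max_relaxations:
            # No hits, or only redundant ones: relax the cut-off 10-fold and retry.
            evalue *= 10.0
    # Nothing usable was found within the allowed number of relaxations.
    return [], evalue
```

In a real pipeline `search` would wrap an external PSI-BLAST call and parse its output; passing it in as a parameter keeps the retry logic testable with a stub. Each retry multiplies the cut-off by 10, which mirrors the "10-fold less stringent setting" in the text.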

Journal

Nucleic Acids Research – Oxford University Press

Published: Jul 1, 2005
