ESPript: analysis of multiple sequence alignments in PostScript.

P Gouet; E Courcelle; D I Stuart; F Métoz

doi:10.1093/bioinformatics/15.4.305

Patrice Gouet, Emmanuel Courcelle, David I. Stuart and Frédéric Métoz

Laboratory of Molecular Biophysics, The Rex Richards Building, South Parks Road, Oxford, UK; Groupe de Cristallographie Biologique, Institut de Pharmacologie et de Biologie Structurale, 205 route de Narbonne, 31077 Toulouse Cedex, France; Oxford Centre for Molecular Sciences, New Chemistry Building, South Parks Road, Oxford, UK; and Institut de Biologie Structurale, avenue des Martyrs, Grenoble Cedex, France

*Present address: Institut de Pharmacologie et de Biologie Structurale, 205 route de Narbonne, 31077 Toulouse Cedex, France

Abstract

Motivation: The program ESPript (Easy Sequencing in PostScript) allows the rapid visualization, via PostScript output, of sequences aligned with popular programs such as CLUSTAL-W or GCG PILEUP. It can read secondary structure files (such as those created by the program DSSP) to produce a synthesis of both sequence and structural information.

Results: ESPript can be run via a command file or a friendly html-based user interface. The program calculates a homology score by columns of residues and can sort this calculation by groups of sequences. It offers a palette of markers to highlight important regions in the alignment. ESPript can also paste information on residue conservation into coordinate files, for subsequent visualization with a graphics program.

Availability: ESPript can be accessed on its Web site at http://www.ipbs.fr/ESPript. Sources and helpfiles can be downloaded via anonymous ftp from ftp.ipbs.fr. A tar file is held in the directory pub/ESPript.

Contact: gouet@ipbs.fr

Introduction

The Internet allows biologists to browse a variety of ever-growing databases on-line. This enables them to search, compare and retrieve protein sequences [e.g. from the SWISS-PROT bank (Bairoch and Apweiler, 1997)] and three-dimensional (3D) structures [from the Protein Data Bank (PDB) (Bernstein et al., 1977)]. The number of entries deposited with these databases is increasing rapidly (1500 additional entries in the PDB for 1997–98), increasing the probability that any sequence will be homologous to one whose 3D structure is known. In the absence of a determined tertiary structure, secondary structures can be predicted with reasonable reliability from amino acid sequences by software such as PHD (Rost, 1996).

ESPript, Easy Sequencing in PostScript, is a program in the tradition of ALSCRIPT (Barton, 1993), which renders sequence similarities and secondary structure information for analysis and publication purposes. Most of the assignments are made by default in ESPript and a user familiar with the program can obtain Figure 1 in a few minutes. ESPript is not a sequence editor like CINEMA (Attwood et al., 1997) or CLUSTAL X (Thompson et al., 1997), but it can help to optimize an alignment, by displaying on the same figure the secondary structure information (observed or predicted) of each aligned sequence.

A first version of the program was produced in 1993, at the Institut de Biologie Structurale, Grenoble. Since then, ESPript has been rewritten in the Laboratory of Molecular Biophysics, Oxford, and is now developed in the Groupe de Cristallographie Biologique, Toulouse. ESPript's input consists of pre-aligned sequences and files defining secondary structures. Its output is a colourful PostScript file. The distributed package consists of the program source, a cgi script using a library from Lincoln D. Stein, Cold Spring Harbor Laboratory, for use with a WWW server, a manual written in hypertext, and examples related to a study made on orbiviruses in Oxford (Grimes et al., 1998).

Fig. 1. An ESPript output, obtained from orbivirus sequences extracted from the SWISS-PROT data bank (Bairoch and Apweiler, 1997) and aligned with CLUSTAL-W (Thompson et al., 1994). Sequences are divided into three groups according to similarity. Residues strictly conserved have a black background, residues well conserved within a group according to a Risler matrix (Risler et al., 1988) are indicated by black bold letters and the remainder are in regular black (an inner group score cannot be calculated for group 3, which is made of a single sequence; no residue is written with black bold letters in this group); residues conserved between groups are boxed and residues conserved within a group, but showing significant differences between groups, are on a light grey background. Symbols above blocks of sequences correspond to the secondary structure of protein VP7 of bluetongue virus serotype 10 (Grimes et al., 1995). This protein consists of a helical domain and a beta domain, coloured in black and grey, respectively. VP7 of bluetongue virus serotype 1 from South Africa shares the same secondary structure (Grimes et al., 1998) and the names of the two sequences are in red. Symbols below blocks of sequences show (i) the limits of the two domains as triangles, (ii) an RGD tripeptide which may be important in cell entry as stars and (iii) the relative accessibility of BTV-10 VP7 as rectangles (accessible residues are in black, intermediate in grey and buried in white).

General description of the program

The program can read up to 98 sequences aligned on 2000 columns, retaining the alignment of the input file. Sequences can be displayed on up to 10 pages of PostScript. Parameters are fed in through the standard input, which is divided into seven steps.

1. The program asks first for the name of the multiple alignment file. Files generated by CLUSTAL-W (Thompson et al., 1994), GCG PILEUP (Wisconsin Package Version 9.0, GCG, Madison), MAXHOM (Sander and Schneider, 1991) and THREADER (Jones et al., 1992) are supported. The program offers the possibility of extracting a segment from the input sequences, and of choosing the number assigned to the first residue.

2. The names of one or two files containing secondary structure information can be specified. These files refer to the secondary structures of (i) the first sequence appearing on the PostScript output and (ii) one selected from the remaining sequences. Files generated by DSSP (Kabsch and Sander, 1983), STRIDE (Frishman and Argos, 1995) or PHD (Rost, 1996) are accepted. Helices are symbolized by squiggles, strands by arrows and turns by a letter T on the output. The program automatically numbers the secondary structural units. The relative accessibility can be indicated by symbols, if the secondary structure files were produced via DSSP or PHD.

3. The name of the PostScript output file is given. By default, the output name is that of the multiple sequence file with a '.ps' extension.

4. A scoring scheme for similarities is given. Fully conserved residues are shown on a red background. Similarity scores are calculated by extracting all possible pairs of residues and by using a Risler (Risler et al., 1988), PAM250 (Dayhoff, 1978), BLOSUM62 (Henikoff and Henikoff, 1996) or identity scoring matrix. If C(i,j) is the score for aligning a residue i with a residue j, a similarity score Sc = Σ C(i,j)/Σ (i,j) is calculated for each column, the sums running over all pairs of residues in that column. Residues with a score above a user-defined threshold are written in red and boxed in blue; others are in black. Groups of sequences can be defined in step 7 below and additional scores calculated: an inner group score, ISc, equal to Sc within a group; a cross group score, XSc, for all possible pairs between residues of different groups; a total group score, TSc = ISc + XSc; and a difference group score, DSc = ISc – XSc. Residues conserved within a group appear in red (ISc above threshold), residues conserved between groups are boxed in blue (TSc above threshold), and residues conserved within a group, but significantly different from one group to the other, are written on yellow boxes (DSc above threshold).

5. The plot layout is defined. The user can specify the size of the font, the number of residues per line and the centring of the alignment on the paper. Sequence names are written in Times and one-letter code residues are written in Courier. Figures can be in colour or black and white. Portrait or landscape orientation in A4 or A3 format is supported.

6. Symbols in different colours may be explicitly added at the bottom of the sequence blocks. Important residues, like the RGD segment in protein VP7 of bluetongue virus in Figure 1, can be highlighted (Grimes et al., 1995). It is also possible, at this stage, to change the default colours for sequence homology and secondary structure representation.

7. The last stage allows the user to define the displayed sequences and their order of appearance. Sequence groups can be selected to enhance striking similarities (Figure 1).

ESPript is easy to use. In the simplest case, the user merely runs the program on-line, specifies the name of the alignment file (step 1 above) and skips all other steps. This creates a PostScript file with information on sequence identities and similarities. For more complex cases, it is best to prepare a command file or use the html user interface.

Implementation

The source

ESPript is written in FORTRAN77 and is developed on Silicon Graphics and DEC workstations at the IPBS, Toulouse. The present version, ESPript 1.4, can be compiled using f77 or g77 and has been tested on most platforms (Unix, VMS, PC-Windows or Linux).

The html interface

A cgi script written in perl v 5.004 and relying upon CGI.pm v 2.39 or later is provided with the program. This script can be installed on a Web server, as has been done in Toulouse. Users can run ESPript by filling in the fields of an HTML form. Results are presented as hypertext links to PostScript files (or PDB files, if requested; see the discussion below). These files can be viewed before retrieving, if the browser is properly configured. PostScript files can be converted into other graphics formats (jpeg, tiff, png), using a program such as GHOSTSCRIPT. It is also possible to declare printers available to local users, whilst their access is denied to remote users.

Discussion and conclusion

ESPript offers a few tricks not routinely available in other programs, which can be used to build up information on the output file. It is possible to obtain an output from different files of aligned sequences (Figure 2a) or to select sequences for similarity calculations which are not displayed on the PostScript (Figure 2b). Information from two or more secondary structure files can be entered and related to a sequence chosen by the user (Figure 2a and b). In addition, a PDB file can be produced with temperature factors replaced by similarity scores calculated with ESPript. This file can be passed to a graphics program to represent conserved areas by a colour code.

Fig. 2. ESPript outputs extracted from a study made on orbiviruses showing (a) the capability of the program to generate a PostScript file from several multiple sequence alignment files and (b) its capability to map similarity information on a single sequence for conciseness.

One can get even more from the program. The PostScript generated by ESPript starts with the definition of a new font, where letters correspond to drawing commands. The program relies on the manipulation of arrays of characters. ESPript generates an output made up of a succession of lines, containing commands to draw letters, digits or symbols in PostScript. It is quite easy to insert a subroutine in the program, to translate new information such as a list of contacts generated by X-PLOR (Brünger et al., 1987) into an array of characters. Such a modification was used to generate a figure published in Grimes et al. (1998), where residues involved in intermolecular contacts are pointed out by a letter code. This feature has been implemented in the latest version of the program, ESPript 1.4, released in September 1998.

Acknowledgements

ESPript is available thanks to Jean-Pierre Samama, in charge of the Groupe de Cristallographie at the IPBS, Toulouse. Catherine Mazza, EMBL Grenoble, and Jean-Denis Pedelacq, IPBS Toulouse, helped to test the program.

References

Attwood, T.K., Payne, A.W.R., Michie, A.D. and Parry-Smith, D.J. (1997) A Colour INteractive Editor for Multiple Alignments—CINEMA. EMBnet.news, 3.

Bairoch, A. and Apweiler, R. (1997) The SWISS-PROT protein sequence data bank and its supplement TrEMBL. Nucleic Acids Res., 25, 31–36.

Barton, G.J. (1993) ALSCRIPT: a tool to format multiple sequence alignments. Protein Eng., 6, 37–40.

Bernstein, F.C., Koetzle, T.F., Williams, G.J.B., Meyer, E.F., Brice, M.D., Rodgers, J.R., Kennard, O., Shimanouchi, T. and Tasumi, M. (1977) The Protein Data Bank: a computer-based archival file for macromolecular structures. J. Mol. Biol., 112, 535–542.

Brünger, A.T., Kuriyan, J. and Karplus, M. (1987) Crystallographic R factor refinement by molecular dynamics. Science, 235, 458–460.

Dayhoff, M. (1978) Atlas of Protein Sequences and Structure. National Biomedical Research Foundation, Washington, DC, pp. 345

Frishman, D. and Argos, P. (1995) Knowledge-based secondary structure assignment. Proteins, 23, 566–579.

Grimes, J., Basak, A.K., Roy, P. and Stuart, D. (1995) The crystal structure of bluetongue virus VP7. Nature, 373, 167–170.

Grimes, J., Burroughs, N., Gouet, P., Diprose, J.M., Malby, R., Zientara, S., Mertens, P.P.C. and Stuart, D. (1998) The atomic structure of the bluetongue virus core. Nature, 395, 470–478.

Henikoff, J.G. and Henikoff, S. (1996) Blocks database and applications. Methods Enzymol., 266, 88–105.

Jones, D.T., Taylor, W.R. and Thornton, J.M. (1992) A new approach to protein fold recognition. Nature, 358, 86–89.

Kabsch, W. and Sander, C. (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22, 2577–2637.

Risler, J.L., Delorme, M.O., Delacroix, H. and Hénaut, A. (1988) Amino acid substitutions in structurally related proteins. A pattern recognition approach. Determination of a new and efficient scoring matrix. J. Mol. Biol., 204, 1019–1029.

Rost, B. (1996) PHD: predicting one-dimensional protein structure by profile-based neural networks. Methods Enzymol., 266, 525–539.

Sander, C. and Schneider, R. (1991) Database of homology-derived protein structures and the structural meaning of sequence alignment. Proteins, 9, 56–68.

Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680.

Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F. and Higgins, D.G. (1997) The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res., 25, 4876–4882.
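
The scoring scheme of step 4 in the general description can be illustrated with a short Python sketch. It is not taken from the ESPript source, which is FORTRAN77. The function names, the use of an identity matrix as the default C(i,j), the choice to compute XSc over all cross-group pairs in a column, and the handling of single-sequence groups (no inner score, as noted for group 3 in Figure 1) are assumptions made for illustration. Gaps are not treated specially.

```python
from itertools import combinations, product


def identity(a, b):
    """Identity scoring matrix (assumed default): 1 for a match, 0 otherwise."""
    return 1.0 if a == b else 0.0


def mean_pair_score(pairs, c=identity):
    """Sum of C(i,j) over the pairs divided by the number of pairs.

    Returns None when there are no pairs to score.
    """
    pairs = list(pairs)
    if not pairs:
        return None
    return sum(c(i, j) for i, j in pairs) / len(pairs)


def column_score(column, c=identity):
    """Sc for one alignment column: all possible pairs of residues."""
    return mean_pair_score(combinations(column, 2), c)


def group_scores(groups, c=identity):
    """Per-group ISc, XSc, TSc and DSc for one column.

    `groups` holds the residues of the column split by sequence group.
    XSc is computed here over every pair of residues taken from two
    different groups (an assumption about the exact definition).
    """
    cross_pairs = (
        pair
        for g1, g2 in combinations(groups, 2)
        for pair in product(g1, g2)
    )
    xsc = mean_pair_score(cross_pairs, c)
    results = []
    for group in groups:
        isc = column_score(group, c)  # Sc restricted to one group
        if isc is None or xsc is None:
            # A single-sequence group has no inner score.
            results.append({"ISc": isc, "XSc": xsc, "TSc": None, "DSc": None})
        else:
            results.append(
                {"ISc": isc, "XSc": xsc, "TSc": isc + xsc, "DSc": isc - xsc}
            )
    return results
```

A column that is conserved inside each group but differs between groups gives a high DSc, which corresponds to the yellow-box case in step 4. A substitution matrix such as BLOSUM62 could be passed as `c` in place of the identity function.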

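
The replacement of temperature factors by similarity scores, mentioned in the discussion, can be sketched in a similar way. This is a minimal illustration and not ESPript's own routine: the function name `paste_scores` and the per-residue score dictionary are hypothetical. Only the fixed-column layout of PDB ATOM and HETATM records (residue number in columns 23 to 26, temperature factor in columns 61 to 66) is taken from the PDB file format.

```python
def paste_scores(pdb_lines, scores):
    """Return PDB lines with temperature factors replaced by scores.

    `scores` maps a residue number to a similarity score (hypothetical
    input; how scores are matched to residues is an assumption).
    Lines for residues without a score are left unchanged.
    """
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 66:
            resnum = int(line[22:26])  # columns 23-26: residue number
            if resnum in scores:
                # columns 61-66: temperature factor, written as %6.2f
                line = line[:60] + f"{scores[resnum]:6.2f}" + line[66:]
        out.append(line)
    return out
```

A graphics program that colours atoms by temperature factor would then display conservation instead, which is the use described in the text.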
Publisher: Oxford University Press
Copyright: © Published by Oxford University Press.
ISSN: 1367-4803
eISSN: 1460-2059
DOI: 10.1093/bioinformatics/15.4.305


Journal

Bioinformatics – Oxford University Press

Published: Apr 1, 1999
