Visualizing profile–profile alignment: pairwise HMM logos

BIOINFORMATICS APPLICATIONS NOTE, Vol. 21 no. 12 2005, pages 2912–2913
doi:10.1093/bioinformatics/bti434

Sequence analysis

Benjamin Schuster-Böckler and Alex Bateman
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK

Received on February 8, 2005; revised on March 29, 2005; accepted on March 31, 2005
Advance Access publication April 12, 2005

ABSTRACT

Summary: The availability of advanced profile–profile comparison tools, such as PRC or HHsearch, demands sophisticated visualization tools not presently available. We introduce an approach built upon the concept of HMM logos. The method illustrates the similarities of pairs of protein family profiles in an intuitive way. Two HMM logos, one for each profile, are drawn one above the other. The aligned states are then highlighted and connected.

Availability: A web interface offering online creation of pairwise HMM logos is available at http://www.sanger.ac.uk/Software/analysis/logomat-p. Furthermore, software developers may download a Perl package that includes methods for creating pairwise HMM logos locally.

Contact: bsb@sanger.ac.uk

INTRODUCTION

The problem of profile–profile comparison has a long history but has received a lot of attention recently (Söding, 2005; Lyngsø et al., 1999; Madera, 2005; Edgar and Sjölander, 2004a). This is a result of the growing number of well characterized protein families in databases, such as Pfam (Bateman et al., 2004). By adding information about properties of the entire family, profile–profile methods have been shown to significantly increase sensitivity compared with profile–sequence comparison (Edgar and Sjölander, 2004b). Several different concepts for profile–profile comparison have been reported. We focused on the visualization of HMM–HMM alignments.

The algorithms behind all currently available HMM alignment programs are very similar. Newer approaches mainly differ in details of the scoring function and in the transitions that are taken into account. The approach is to find a sequence of state-to-state pairings that maximizes the probability of both HMMs emitting the same sequence (frequently called co-emission probability). This can be done efficiently by creating a pair HMM (Durbin et al., 1998; Söding, 2005) from the two source HMMs and using standard forward or Viterbi algorithms to search for an optimal solution.

Nevertheless, the raw output of the alignment tools can be difficult to understand. From the state-to-state pairings alone, it is not immediately obvious which features the two protein families have in common. It was our aim to develop a graphical representation of HMM–HMM alignments that resolves this issue.

FEATURES

Pairwise HMM Logos can currently be accessed in two different ways. First, they can be made online at http://www.sanger.ac.uk/Software/analysis/logomat-p. Second, they can be constructed locally by downloading and installing the Perl sources. In the near future, pairwise HMM Logos will also be added to the Pfam website.

A typical pairwise HMM Logo is shown in Figure 1. We intended pairwise HMM Logos to look as similar to HMM Logos as possible. This should facilitate their comprehension for users accustomed to HMM Logos. Therefore, we draw two HMM Logos, one for each aligned family. To illustrate individual aligned states, they are framed and connected by a block. Unaligned states are shaded in grey. In a local alignment, positions before the first and after the last aligned states are not shown. A brief summary of the features of simple HMM logos is given in the caption to Figure 1. A more detailed description can be found in Schuster-Böckler et al. (2004).

In our previous work (Schuster-Böckler et al., 2004), we introduced the HMM Perl package. It provides generalized methods to access and modify HMMs. Emission and transition probabilities are stored and retrieved as multidimensional matrices using PDL, the Perl Data Language. HMMER files can be parsed and written. It also allows the creation of HMM logos from profile HMMs. We added a class called HMM::Alignment to this existing framework that works as an abstraction layer to the HMM alignment program PRC (Madera, 2005, http://supfam.mrc-lmb.cam.ac.uk/PRC/). It can parse and write PRC output as well as run PRC directly if it is installed on the system. As it integrates into the HMM package, it takes HMM::Profile objects, HMMER files, Pfam IDs or combinations thereof as arguments for creating alignment objects.

REQUIREMENTS

On-the-fly creation of pairwise HMM Logos from HMMER files, multiple sequence alignments or Pfam IDs is available from the website http://www.sanger.ac.uk/Software/analysis/logomat-p. Uploaded HMMs are aligned directly using PRC. Multiple alignments in ClustalW, MSF or SELEX format are used to create HMMs using HMMER before aligning them. The plain PRC output can be downloaded separately.

Local installation of the HMM Perl package requires the PDL and Imager packages to be installed on the system together with a working PRC binary. Both Perl packages can be downloaded from http://www.cpan.org. PRC is available from http://supfam.mrc-lmb.cam.ac.uk/PRC/. This software was tested against PRC version 1.5.2.

Fig. 1. Alignment of the Toxin_7 against the Toxin_9 Pfam family. For each family, an HMM logo is drawn. The numbers above and below each logo show state positions in the HMM. The overall height of each letter stack represents the information content; the relative height of each letter corresponds to its emission probability. The column width denotes the relative contribution, i.e. the product of the probability that the state is traversed and the expected number of self transitions for the respective state. This accounts for the varying length of insertions. Insert states are drawn in red. Frequently, their relative contribution is very small, making them hard to see. In this picture, narrow insert states can be found, e.g. at positions 27 and 28 of the Toxin_7 family. The aligned states in each HMM are framed and connected by a block. Omitted states are shaded in grey.

ACKNOWLEDGEMENTS

We would like to thank Martin Madera and Robert Finn for the valuable information about theoretical and practical aspects of PRC. Johannes Söding kindly answered numerous questions about his HHsearch algorithm. The authors are grateful for the valuable suggestions and corrections made by the reviewers. B.S.-B. is funded by the Wellcome Trust.

REFERENCES

Bateman,A. et al. (2004) The Pfam protein families database. Nucleic Acids Res., 32, D138–D141.
Durbin,R., Eddy,S.R., Krogh,A. and Mitchison,G. (1998) Biological Sequence Analysis. Cambridge University Press, Cambridge, UK.
Eddy,S.R. (1998) Profile hidden Markov models. Bioinformatics, 14, 755–763.
Eddy,S.R. (2001) HMMER User’s Guide: Biological Sequence Analysis Using Profile Hidden Markov Models, Version 2.2. Washington University School of Medicine, http://hmmer.wustl.edu.
Edgar,R.C. and Sjölander,K. (2004a) COACH: profile–profile alignment of protein families using hidden Markov models. Bioinformatics, 20, 1309–1318.
Edgar,R.C. and Sjölander,K. (2004b) A comparison of scoring functions for protein sequence profile alignment. Bioinformatics, 20, 1301–1308.
Lyngsø,R. et al. (1999) Metrics and similarity measures for hidden Markov models. Proc. Int. Conf. Intell. Syst. Mol. Biol., 1999, 178–186.
Madera,M. (2005) PRC - the profile comparer.
Schneider,T.D. and Stephens,R. (1990) Sequence logos: A new way to display consensus sequences. Nucleic Acids Res., 18, 6097–6100.
Schuster-Böckler,B., Schultz,J. and Rahmann,S. (2004) HMM Logos for visualization of protein families. BMC Bioinformatics, 5, 7.
Söding,J. (2005) Protein homology detection by HMM–HMM comparison. Bioinformatics, 21, 951–960.
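The column geometry described in the caption to Fig. 1 can be illustrated with a short Python sketch. It is not taken from the HMM Perl package: the function names are hypothetical, and two details are assumptions. The information content is computed as relative entropy in bits against a background distribution (uniform unless one is supplied), and the expected length of a state with self-transition probability t is taken to be 1/(1 - t), which is one reading of "expected number of self transitions".

```python
import math


def information_content(emissions, background=None):
    """Relative entropy (bits) of an emission distribution versus a background.

    Assumption: the background defaults to a uniform distribution over the
    symbols present in `emissions`.
    """
    if background is None:
        background = {symbol: 1.0 / len(emissions) for symbol in emissions}
    return sum(
        p * math.log2(p / background[symbol])
        for symbol, p in emissions.items()
        if p > 0.0
    )


def letter_heights(emissions, background=None):
    """Split the stack height among letters in proportion to emission probability."""
    stack_height = information_content(emissions, background)
    return {symbol: p * stack_height for symbol, p in emissions.items()}


def column_width(hit_probability, self_transition=0.0):
    """Relative contribution of a state: P(state is traversed) * expected length.

    Assumption: the expected length is 1 / (1 - t) for self-transition
    probability t, so a match state (t = 0) has width equal to its hit
    probability.
    """
    if not 0.0 <= self_transition < 1.0:
        raise ValueError("self_transition must lie in [0, 1)")
    return hit_probability / (1.0 - self_transition)


if __name__ == "__main__":
    # Toy nucleotide state; a protein HMM would carry 20 symbols instead.
    state = {"A": 0.5, "C": 0.5, "G": 0.0, "T": 0.0}
    print(information_content(state))
    print(letter_heights(state))
    print(column_width(0.1, self_transition=0.5))
```

Under these assumptions a rarely visited insert state receives a narrow column even when its self-transition probability is high, which matches the caption's remark that insert states are often hard to see.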

Bioinformatics, Volume 21 (12): 2912–2913 – Apr 12, 2005

Publisher
Oxford University Press
Copyright
© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oupjournals.org
ISSN
1367-4803
eISSN
1460-2059
DOI
10.1093/bioinformatics/bti434
pmid
15827079

Journal

Bioinformatics, Oxford University Press

Published: Apr 12, 2005
