Pfam: A comprehensive database of protein domain families based on seed alignments

Publisher
Wiley
Copyright
Copyright © 1997 Wiley‐Liss, Inc.
ISSN
0887-3585
eISSN
1097-0134
DOI
10.1002/(SICI)1097-0134(199707)28:3<405::AID-PROT10>3.3.CO;2-Z

Abstract

Databases of multiple sequence alignments are a valuable aid to protein sequence classification and analysis. One of the main challenges when constructing such a database is to simultaneously satisfy the conflicting demands of completeness on the one hand and quality of alignment and domain definitions on the other. The latter properties are best dealt with by manual approaches, whereas completeness in practice is only amenable to automatic methods. Herein we present a database based on hidden Markov model profiles (HMMs), which combines high quality and completeness. Our database, Pfam, consists of parts A and B. Pfam‐A is curated and contains well‐characterized protein domain families with high‐quality alignments, which are maintained by using manually checked seed alignments and HMMs to find and align all members. Pfam‐B contains sequence families that were generated automatically by applying the Domainer algorithm to cluster and align the remaining protein sequences after removal of Pfam‐A domains. By using Pfam, a large number of previously unannotated proteins from the Caenorhabditis elegans genome project were classified. We have also identified many novel family memberships in known proteins, including new Kazal, fibronectin type III, and response regulator receiver domains. Pfam‐A families have permanent accession numbers and form a library of HMMs available for searching and automatic annotation of new protein sequences. Proteins: 28:405–420, 1997. © 1997 Wiley‐Liss, Inc.
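The Pfam‐A procedure summarized above builds a model from a curated seed alignment and then uses it to find new family members. The deliberately simplified sketch below illustrates that idea. It is not Pfam's or HMMER's method. It builds a gapless position‐specific profile with no insert or delete states, so it is not a true profile HMM. The add‐one pseudocounts, the uniform background, and the function names `build_profile`, `log_odds`, and `best_hit` are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

# Assumption: the 20 standard amino acids; other symbols (gaps, X) are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(seed, pseudocount=1.0):
    """Per-column residue probabilities from a gapless seed alignment.

    Simplified stand-in for HMM construction: no insert/delete states.
    """
    length = len(seed[0])
    if any(len(s) != length for s in seed):
        raise ValueError("seed sequences must be aligned to equal length")
    profile = []
    for col in range(length):
        counts = Counter(s[col] for s in seed)
        # Pseudocounts keep unseen residues from getting zero probability.
        total = sum(counts[a] for a in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
        profile.append({a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS})
    return profile


def log_odds(window, profile):
    """Log2-odds of an ungapped window against a uniform background."""
    background = 1.0 / len(AMINO_ACIDS)
    # Non-standard residues score 0 (probability treated as background).
    return sum(
        math.log2(col.get(aa, background) / background)
        for col, aa in zip(profile, window)
    )


def best_hit(sequence, profile):
    """Slide the profile along the sequence and return (start, score) of the
    best-scoring window, or None if the sequence is shorter than the profile."""
    width = len(profile)
    best = None
    for start in range(len(sequence) - width + 1):
        score = log_odds(sequence[start:start + width], profile)
        if best is None or score > best[1]:
            best = (start, score)
    return best
```

For example, a profile built from the toy seed `["ACDEF", "ACDEY", "ACNEF"]` places the best window in `"MMACDEFKK"` at position 2, with a positive score. A real profile HMM additionally models insertions and deletions and aligns each hit to the model, which is what allows Pfam‐A to "find and align all members" rather than only score fixed‐width windows.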

Journal

Proteins: Structure, Function, and Bioinformatics (Wiley)

Published: Jul 1, 1997

Keywords: classification; clustering; protein domains; genome annotation; hidden Markov model; Caenorhabditis elegans