MODBASE, a database of annotated comparative protein structure models and associated resources

Ursula Pieper; Narayanan Eswar; Ben M. Webb; David Eramian; Libusha Kelly; David T. Barkan; Hannah Carter; Parminder Mankoo; Rachel Karchin; Marc A. Marti-Renom; Fred P. Davis; Andrej Sali

doi:10.1093/nar/gkn791

Nucleic Acids Research, 2009, Vol. 37, Database issue D347–D354. Published online 23 October 2008.

Ursula Pieper (1), Narayanan Eswar (1), Ben M. Webb (1), David Eramian (1,2), Libusha Kelly (1,3), David T. Barkan (1,3), Hannah Carter (4), Parminder Mankoo (4), Rachel Karchin (4), Marc A. Marti-Renom (5), Fred P. Davis (6) and Andrej Sali (1,*)

(1) Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences, Byers Hall at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158
(2) Graduate Group in Biophysics and (3) Graduate Group in Bioinformatics, University of California at San Francisco, CA
(4) Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA
(5) Structural Genomics Unit, Bioinformatics & Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Avda. Autopista del Saler 16, Valencia 46012, Spain
(6) Howard Hughes Medical Institute, Janelia Farm, 19700 Helix Drive, Ashburn, VA 20147, USA

Received September 15, 2008; Accepted October 8, 2008

*To whom correspondence should be addressed. Tel: +1 415 514 4227; Fax: +1 415 514 4231; Email: [email protected]

© 2008 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT

MODBASE (http://salilab.org/modbase) is a database of annotated comparative protein structure models. The models are calculated by MODPIPE, an automated modeling pipeline that relies primarily on MODELLER for fold assignment, sequence–structure alignment, model building and model assessment (http://salilab.org/modeller). MODBASE currently contains 5 152 695 reliable models for domains in 1 593 209 unique protein sequences; only models based on statistically significant alignments and/or models assessed to have the correct fold are included. MODBASE also allows users to calculate comparative models on demand, through an interface to the MODWEB modeling server (http://salilab.org/modweb). Other resources integrated with MODBASE include databases of multiple protein structure alignments (DBAli), structurally defined ligand binding sites (LIGBASE), predicted ligand binding sites (AnnoLyze), structurally defined binary domain interfaces (PIBASE) and annotated single nucleotide polymorphisms and somatic mutations found in human proteins (LS-SNP, LS-Mut). MODBASE models are also available through the Protein Model Portal (http://www.proteinmodelportal.org/).

INTRODUCTION

The genome sequencing efforts are providing us with complete genetic blueprints for hundreds of organisms, including humans. We are now faced with the challenge of assigning, investigating and modifying the functions of proteins encoded by these genomes. This task is generally facilitated by 3D structures of the proteins (1–3), which are best determined by experimental methods such as X-ray crystallography and NMR spectroscopy. The number of experimentally determined structures deposited in the Protein Data Bank (PDB) more than doubled from 23 096 to 52 821 over the last 5 years (September 2008) (4). However, the number of sequences in comprehensive sequence databases, such as UniProt (5) and GenPept (6), continues to grow even more rapidly than the number of known protein structures; for example, the number of sequences in UniProt increased from 1.2 million to 6.4 million over the same period. Therefore, protein structure prediction is essential for structural characterization of sequences without experimentally determined structures.

The most accurate models are generally obtained by homology or comparative modeling (7–10), which is applicable when an experimentally determined structure related to the target sequence is available. The fraction of sequences in a genome for which comparative models can be obtained automatically varies from 20% to 75% (11).

The process of comparative modeling usually requires the use of a number of programs to identify template structures, to generate sequence–structure alignments, to build the models and to evaluate them. In addition, various sequence and structure databases that are accessed by these programs are needed. Once an initial model is calculated, it is generally refined and ultimately analyzed in the context of many other related proteins and their functional annotations. Here, we describe MODBASE, a database of comparative protein structure models, and several associated databases and servers that facilitate modeling and analysis tasks for both expert and novice users. We highlight the improvements of MODBASE that were implemented since the last report (11), including updates in the modeling software, user interface and associated annotation tools. We also illustrate the utility of MODBASE by describing several projects depending on large model sets.

CONTENTS

Comparative modeling (MODELLER and MODPIPE)

Models in MODBASE are calculated using MODPIPE, our automated software pipeline for comparative modeling (12). It relies primarily on the various modules of MODELLER (13) for its functionality and is adapted for large-scale operation on a cluster of PCs using scripts written in PERL and Python. Sequence–structure matches are established using a variety of fold-assignment methods, including sequence–sequence (14), profile–sequence (15,16) and profile–profile alignments (16,17). The odds of finding a template structure are increased by using an E-value threshold of 1.0. By default, 10 models are calculated for each of the alignments (13). A representative model for each alignment is then chosen by ranking based on the atomic distance-dependent statistical potential DOPE (18). Finally, the fold of each model is evaluated using a composite model quality criterion that includes the coverage of the modeled sequence, sequence identity implied by the sequence–structure alignment, the fraction of gaps in the alignment, the compactness of the model and various statistical potential Z-scores (18–20). Only models that are assessed to have the correct fold are included in the final model sets.

A key feature of the pipeline is not prejudging the validity of sequence–structure relationships at the fold-assignment stage; instead, sequence–structure matches are assessed after the construction of the models and their evaluation. This approach enables a thorough exploration of fold assignments, sequence–structure alignments and conformations, with the aim of finding the model with the best evaluation score.

Comparative modeling web server (MODWEB)

MODWEB is our comparative modeling web server that is an integral module of MODBASE (http://salilab.org/modweb) (12). MODWEB accepts one or more sequences in the FASTA format and calculates their models using MODPIPE based on the best available templates from the PDB. Alternatively, MODWEB also accepts a protein structure as input, calculates a profile for each identifiable sequence homolog in the UniProt database, and then models these homologs based on detectable templates in the PDB as well as the user-provided structure. Finally, MODWEB proposes a representative model based on model assessment. This module is a useful tool for measuring the impact of new structures, such as those generated by structural genomics efforts (21). The module allows us to assess the impact of a newly determined protein structure on the modeling of sequences of unknown structure. It is also used to identify new members of sequence superfamilies with at least one member of known structure. The results of MODWEB calculations are available to the users through the MODBASE interface as private datasets protected with passwords.

Pairwise and multiple structure alignments (DBAli)

DBAli (http://www.dbali.org/) stores pairwise comparisons of all structures in the PDB calculated using the program MAMMOTH (22), as well as multiple structure alignments generated by the SALIGN module of MODELLER-9 (23). DBAli contains approximately 1.7 billion pairwise comparisons and 12 732 family-based multiple structure alignments for 34 637 nonredundant protein chains out of 96 804 protein chains in the PDB. Additional information is provided by ModDom, which assigns domain boundaries from structure, and ModClus, which allows the user to generate clusters of similar protein structures. These DBAli tools help users to analyze the protein structure space by establishing relationships between protein structures and their fragments in a flexible and dynamic manner.

Ligand binding sites (LIGBASE and AnnoLyze)

The LIGBASE module stores a list of the binding sites of known structure for approximately 230 000 ligands found in the PDB (24). The ligands include small molecules, such as metal ions, nucleotides, saccharides and peptides. Binding sites in all known structures are defined to consist of residues with at least one atom within 5 Å of any ligand atom. For each template structure, MODBASE also contains a list of putative binding sites that were predicted by the AnnoLyze program (25). The predictions are based on inheriting an actual binding site from any related known structure if at least 75% of the binding site residues are within 4 Å of the template residues in a global superposition of the two structures in DBAli and if at least 75% of the binding site residue types are invariant. In addition, the putative ligand binding sites in the models are then mapped via the target–template alignments. The putative ligand binding sites are stored as SITE records and the binding site membership frequency per residue is indicated in the B-factor column of the model coordinate files. Sixty-five percent of MODBASE models have at least one predicted binding site.

Protein interactions (PIBASE)

PIBASE (http://pibase.janelia.org, http://salilab.org/pibase) is a comprehensive database of structurally defined protein interfaces (26). It is composed of binary interfaces between pairs of chains or domains extracted from structures in the PDB and the Probable Quaternary Structure server PQS using domain assignments from the Structural Classification of Proteins (SCOP) and CATH fold classification systems. PIBASE currently contains 269 821 SCOP, 269 438 CATH, and 216 739 chain binary interfaces. A diverse set of geometrical, physicochemical and topological properties is calculated for each complex, its domains, interfaces and binding sites. The database is accessible through the web server and can also be installed locally. The software used to build PIBASE is available for download under an open-source license.

PIBASE is a convenient resource for structural information on protein–protein interactions and is easily integrated with other databases. It is currently used by the AnnoLyze annotation program (27) and the LS-SNP annotation system (28). The complexes stored in PIBASE can also be used as templates to predict the composition and structure of protein complexes using comparative modeling followed by an assessment of the modeled interface (29). This approach was applied to predict host–pathogen interactions for 10 'neglected' human pathogens (30).

Single nucleotide polymorphisms and somatic mutations (LS-SNP and LS-Mut)

LS-SNP [http://karchinlab.org/LS-SNP, http://salilab.org/LS-SNP (28)] and LS-Mut [http://karchinlab.org/LS-Mut, (31,32)] are collections of annotated DNA sequence variants in protein-coding exons that result in an amino acid residue-type substitution. These resources focus on inherited genetic variants and tumor-derived somatic mutations, respectively. For LS-SNP, genomic locations of the variants are taken from the dbSNP database (33) and are mapped onto as many human proteins in the UniProt database (34) as possible. The mapping is achieved via a collection of protein-to-mRNA and mRNA-to-genome alignments produced with the Known Genes algorithm (35). For LS-Mut, somatic mutation data from tumor sequencing projects are used, consisting of transcript identifiers from RefSeq, CCDS and Ensembl (36,37), codon positions and amino acid residue-type substitutions. Our software then maps the mutations onto translated protein sequences. LS-Mut currently includes mutations from 24 advanced pancreatic cancers and 22 glioblastoma multiforme (brain) tumors. For both LS-SNP and LS-Mut, human protein sequences are aligned with homologous proteins of known structure from the PDB, to build comparative protein structure models using MODPIPE. Models are constructed for all significant alignments covering a distinct region of protein sequence (E-value cutoff 0.0001). UCSF Chimera (38) is used to visualize the location of the residue substitutions on the model. We use our software and DSSP (39) to identify secondary structure elements and relative solvent accessibility of the residue positions. Putative protein and small ligand binding sites on the models are annotated with PIBASE and the LIGBASE module of MODBASE, respectively, to infer which SNPs or somatic mutations may destabilize protein quaternary structure or interfere with small molecule ligand binding.

MODBASE MODEL SETS

Models in MODBASE are organized into a number of datasets. The largest dataset contains models of all sequences in the UniProt database that are detectably related to at least one known structure in the PDB from July 2005. Because of the rapid growth of the public sequence databases, we now concentrate our efforts on adding datasets that are useful for specific projects, rather than attempt to model all known protein sequences with detectable template structures. Currently, MODBASE includes datasets of nine archaeal genomes, 13 bacterial genomes and 18 eukaryotic genomes (Table 1). Together with other project-oriented datasets, MODBASE currently contains 5 152 695 models from domains in 1 593 209 unique sequences. Next, we illustrate the utility of MODBASE by outlining several recent projects.

Table 1. MODBASE datasets

Dataset/Project | Taxonomy ID | No. of transcripts | No. of sequences modeled | No. of models | Sequence source
Genomes
Archaea
Archaeoglobus fulgidus | 2234 | 2409 | 1794 | 3980 | NCBI
Methanococcus jannaschii | 2190 | 1785 | 1480 | 1707 | NCBI
Nanoarchaeum equitans | 160232 | 536 | 447 | 496 | NCBI
Picrophilus torridus | 82076 | 1535 | 1260 | 2902 | NCBI
Pyrobaculum aerophilum | 13773 | 2600 | 1566 | 3497 | NCBI
Pyrococcus furiosus | 2261 | 2113 | 1524 | 3373 | NCBI
Sulfolobus solfataricus | 2287 | 2922 | 2006 | 4451 | NCBI
Thermoplasma volcanium | 50339 | 1497 | 1204 | 2806 | NCBI
Thermoplasma acidophilum | | 1480 | 1220 | 2801 | NCBI
Bacteria
Bacillus subtilis | 1423 | 4105 | 3374 | 9245 | NCBI
Burkholderia mallei | 13373 | 4798 | 3910 | 23 219 | NCBI
Clostridium tetani | 1513 | 2413 | 2158 | 5864 | NCBI
Escherichia coli | 562 | 4206 | 3150 | 5994 | NCBI
Mycobacterium leprae | 1769 | 1605 | 1178 | 2493 | OrthoMCL-DB
Mycobacterium tuberculosis | 1773 | 3991 | 2808 | 5913 | TubercuList
Mycoplasma pneumoniae | 2104 | 687 | 426 | 857 | NCBI
Pseudomonas aeruginosa | 287 | 5559 | 3806 | 9222 | NCBI
Rickettsia prowazekii | 782 | 835 | 754 | 2136 | NCBI
Staphylococcus aureus MRSA252 | 282458 | 2635 | 1184 | 3161 | NCBI
Streptococcus pyogenes | 1314 | 1691 | 1440 | 3984 | NCBI
Wolbachia | 953 | 805 | 621 | 1873 | TIGR
Yersinia pestis | 632 | 3882 | 3215 | 8371 | NCBI
Eukaryota
Arabidopsis thaliana | 3702 | 30 707 | 23 807 | 70 494 | ENSEMBL
Brugia malayi | 6279 | 11 397 | 7850 | 23 219 | TIGR
Caenorhabditis elegans | 6239 | 22 698 | 18 996 | 52 235 | NCBI
Canis familiaris | 9615 | 30 264 | 22 614 | 65 617 | ENSEMBL
Cryptosporidium hominis | 237895 | 3886 | 1614 | 3287 | CryptoDB
Cryptosporidium parvum | 5807 | 3806 | 1918 | 3969 | CryptoDB
Danio rerio | Calculation in progress | | | | ENSEMBL
Drosophila melanogaster | 7227 | 17 104 | 9381 | 24 683 | NCBI
H. sapiens | 9606 | 32 010 | 21 270 | 51 084 | OrthoMCL-DB
Leishmania major | 5664 | 8274 | 3975 | 8285 | GeneDB
Mus musculus | 10090 | 30 133 | 25 338 | 70 783 | NCBI
Pan troglodytes | Calculation in progress | | | | ENSEMBL
Plasmodium falciparum | 5833 | 5363 | 2599 | 5053 | PlasmoDB
Plasmodium vivax | 5855 | 5342 | 2359 | 4670 | PlasmoDB
Rattus norvegicus | Calculation in progress | | | | ENSEMBL
Saccharomyces cerevisiae | 4932 | 6600 | 3035 | 5543 | NCBI
Schistosoma mansoni | 6183 | 25 304 | 8576 | 26 076 | GeneDB
Toxoplasma gondii | 5811 | 7793 | 1530 | 3064 | ToxoDB
Trypanosoma brucei | 5691 | 9210 | 3900 | 8054 | GeneDB
Trypanosoma cruzi | 5693 | 19 607 | 7390 | 14 858 | GeneDB
Xenopus laevis | 8355 | 27 952 | 25 457 | 69 191 | NCBI
Selected projects
CSMP datasets | | 195 235 | 184 139 | 690 255 | GENPEPT NR
NYSGXRC datasets | | 553 537 | 493 672 | 1 415 237 | GENPEPT NR
Enzyme Specificity Project | | 15 833 | 10 875 | 183 591 | SFLD/NR
ABC Transporter | | 152 | 85 | 85 |
GPCR | | 11 586 | 11 551 | 24 272 | UNIPROT
Datasets 2005 | | 1 742 816 | 1 025 196 | 2 146 830 | UNIPROT
Total (including other datasets) | | 2 608 987 | 1 593 209 | 5 152 695 |

The sequences were retrieved from ENSEMBL (36), TIGR (50), NCBI-Genbank (6), OrthoMCL-DB (51), TubercuList (52), CryptoDB (53), GeneDB (54), ToxoDB (55), SFLD (56) and UniProt (34).

Structural genomics of the enolase and amidohydrolase superfamilies

Comparative models of enzymes in the amidohydrolase and enolase superfamilies have contributed to studying their substrate specificity by the Enzyme Specificity Consortium (ENSPEC) as well as selecting targets for a structural genomics effort by the New York SGX Research Center for Structural Genomics (NYSGXRC). In particular, we selected 535 target proteins from 130 genomes for high-throughput structure determination by X-ray crystallography, resulting in 61 unique structures thus far. Both template-based modeling and sequence-based modeling were essential in identifying suitable targets.

Structural genomics of membrane proteins

Comparative modeling was also applied to inform target selection for the structural genomics of membrane proteins as part of the Center for Structures of Membrane Proteins (CSMP) at UCSF (40). The goal of CSMP is to express, purify and determine the structures of representative members of integral membrane protein classes. MODBASE models were combined with an interactive web-based target selection tool to facilitate selection of biologically interesting targets with little or no structural data available. In addition, template-based modeling in MODWEB is being used to calculate how many sequences can be modeled based on newly determined CSMP structures.

ABC transporters

ABC transporters are a large and diverse set of integral membrane proteins that couple the action of ATP binding, hydrolysis and release to substrate transport across a cellular membrane (41). Mutations in 13 of the 48 human ABC transporters are associated with monogenic human disease phenotypes (42). Additional variants are being identified in hundreds of individuals by the Pharmacogenomics of Membrane Transporters (PMT) consortium at UCSF (43). To annotate these variants, we modeled nucleotide binding and membrane spanning domains with detectably related template structures in all human ABC transporters. The dataset also includes models of sequences with disease-associated and polymorphic nonsynonymous SNPs found in the nucleotide binding domains. Finally, the incomplete or unsatisfactory modeling coverage was used to suggest specific targets for a structural genomics effort on ABC transporters by CSMP.

Human caspases

Caspases are cysteine proteases involved in multiple apoptotic pathways. An experimental approach was recently developed to identify caspase substrates by biotinylating natural protein N-termini and selecting protein fragments containing unblocked α-amines characteristically generated upon proteolytic cleavage (44). Likely high-accuracy models of protein substrates prior to cleavage were identified in the MODBASE human genome datasets and analysis of the structural properties of the cleavage sites was performed. While these sites often appeared in disordered, solvent accessible regions of the substrate as expected (45), a surprising number were found in α-helices and partially inaccessible regions, information which can now be incorporated into new algorithms for predicting additional caspase substrates.

Binding sites and ligands for the Tropical Disease Initiative

Open source drug discovery is an alternative avenue to conventional patent-based drug development, illustrated by the proposed Tropical Disease Initiative (TDI) (http://tropicaldisease.org) (46). Open source drug discovery involves a decentralized, web-based and community-wide collaboration, in which scientists from laboratories, universities, institutes and corporations volunteer to work together for a common cause. To contribute to this effort, we calculated comparative protein structure models for 10 genomes of organisms that cause 'neglected' tropical diseases (Table 1). We followed up by predicting binding sites for known drugs using the AnnoLyze program (25). These predictions may be used as a starting point for experimentally testing the biological functions of the target proteins and potentially even as leads for drug discovery.

Host–pathogen protein interactions for TDI

Pathogens have evolved numerous strategies to infect their hosts, while hosts have evolved immune responses and other defenses to these foreign challenges. The vast majority of host–pathogen interactions involve protein–protein recognition, yet our current understanding of these interactions is limited. We developed and applied a computational whole-genome protocol that generates testable predictions of host–pathogen protein interactions (30) (http://salilab.org/hostpathogen). The protocol first scans the host and pathogen genomes for proteins with similarity to known protein complexes, then assesses these putative interactions, using structure if available, and, finally, filters the remaining interactions using biological context, such as the stage-specific expression of pathogen proteins and tissue expression of host proteins. The technique was applied to 10 pathogens, using their MODBASE model datasets. Several specific predictions have been made that warrant experimental follow-up, including interactions from previously characterized mechanisms, such as cytoadhesion and protease inhibition, as well as suspected interactions in hypothesized networks, such as apoptotic pathways.

G-protein coupled receptors

G-protein coupled receptors (GPCR) are a large family of pharmacologically important transmembrane receptors that are involved in the recognition of a wide variety of extracellular ligands. It has been estimated that this family of proteins is the target for about half of all currently marketed drugs. Atomic structures are known for only three sub-families of GPCRs, including light-sensitive rhodopsins and β1 and β2 adrenergic receptors, which all belong to the Class A Rhodopsin-like family (GPCRDB nomenclature). The GPCR dataset in MODBASE consists of models for approximately 12 000 UniProt sequences that are related to one of these structures. The models span several sub-families of the Class A Rhodopsin-like family, including aminergic, peptide, hormone, opsin, olfactory and nucleotide receptors. These models are used for ligand docking and virtual screening computations by DOCK (47).

ACCESS AND INTERFACE

The main access to MODBASE is through its web interface at http://salilab.org/modbase, by querying with UniProt and GI identifiers, gene names, annotation keywords, PDB codes, datasets, organisms, sequence similarity to the modeled sequences (BLAST) and model-specific criteria such as model reliability, model size and target–template sequence identity. Additionally, it is possible to retrieve coordinate files, alignment files and ligand-binding information in text files. Select genome datasets are also available from our ftp server (ftp://salilab.org/databases/modbase/projects).

The output of a search is displayed on pages with varying amounts of information about the modeled sequences, template structures, alignments and functional annotations. An example of the output from a search resulting in one model is shown in Figure 1. A ribbon diagram of the model with the highest target–template sequence identity is displayed by default, together with details of the modeling calculation. Ribbon thumbnails of additional models for this sequence link to corresponding pages with more information. The ribbon diagrams are generated on the fly using Molscript (48) and Raster3D (49). A pull-down menu provides links to additional functionality: the ligand-binding module, the SNP module, retrieval of coordinate and alignment files, as well as molecular visualization by Chimera that allows the user to display template and model coordinates together with their alignment. If mutation information is available for a protein sequence, links to the details are provided in the cross-references section. Additionally, cross-references to various other databases, including PDB, UniProt, SwissProt/TrEMBL, PubMed and the UCSC Genome Browser, are given. Other MODBASE pages provide overviews of more than one sequence or structure. All MODBASE pages are interconnected to facilitate easy navigation between different views.

Figure 1. MODBASE Model Details page (Example Q9NP58 from the human genome dataset): this page provides links to all models for this specific sequence. A ribbon diagram of the primary model, database annotations and modeling details are displayed. Links to additional models for different target regions or models from other datasets are displayed as thumbnails. The pull-down menu provides access to alternative MODBASE views and other types of information (if available), such as data about mutations and putative ligand binding sites. The cross-references section contains links to relevant internal and external databases. For this particular sequence, mutation data are available from LS-Mut, LS-SNP and ABC SNPs.

Access through external databases

MODBASE models in academic and public datasets are directly accessible from several other databases, including the SwissProt/TrEMBL sequence pages, UniProt, PIR's iProClass, EBI's InterPro, the UCSC Genome Browser and PubMed (LinkOut). Importantly, MODBASE models are also accessible through the Protein Model Portal (http://proteinmodelportal.org), a module of the Protein Structure Initiative Knowledgebase (PSI KB). The Model Portal has the potential to become the single entry point for users interested in experimentally determined or computationally predicted models. For a user query, the portal will interrogate participating source model databases and modeling servers to provide a comprehensive view of all available models of the query sequence.

FUTURE DIRECTIONS

MODBASE will grow by adding models calculated on demand by external users (using MODWEB) as well as our own calculations of model datasets that are needed for our research projects (using MODPIPE, MODWEB or MODELLER). These updates will reflect improvements in the methods and software used for calculating the models as well as the new template structures in the PDB and new sequences in UniProt. In the future, we expect that most of the users will access MODBASE models through the Protein Model Portal.

CITATION

Users of MODBASE are requested to cite this article in their publications.

ACKNOWLEDGEMENTS

We are grateful to Tom Ferrin, Daniel Greenblatt, Conrad Huang and Tom Goddard for CHIMERA and contributing to the MODBASE/CHIMERA interface. For linking to MODBASE from their databases, we thank Torsten Schwede (Protein Model Portal), David Haussler and Jim Kent (UCSC Genome Browser), Amos Bairoch (SwissProt/TrEMBL), Rolf Apweiler (InterPro), Patsy Babbitt (SFLD) and Cathy Wu (PIR/iProClass). We are also grateful for computing hardware gifts from Mike Homer, Ron Conway, NetApp, IBM, Hewlett Packard and Intel.

FUNDING

National Institutes of Health (R01 GM54762, U54 GM074945, U54 GM074929, U01 GM61390, P01 GM71790 to A.S., GM08284 to D.E., NSF EF 0626651); the Sandler Family Supporting Foundation (to A.S.); Susan G. Komen Foundation (KG080137 to R.K.); Spanish Ministerio de Educacion y Ciencia (BIO2007/66670 to M.A.M-R). Funding for open access charge: U54 GM074945.

REFERENCES

1. Domingues,F.S., Koppensteiner,W.A. and Sippl,M.J. (2000) The role of protein structure in genomics. FEBS Lett., 476, 98–102.
2. Brenner,S.E. and Levitt,M. (2000) Expectations from structural genomics. Protein Sci., 9, 197–200.
3. Skolnick,J., Fetrow,J.S. and Kolinski,A. (2000) Structural genomics and its importance for gene function analysis. Nat. Biotechnol., 18, 283–287.
4. Deshpande,N., Addess,K.J., Bluhm,W.F., Merino-Ott,J.C., Townsend-Merino,W., Zhang,Q., Knezevich,C., Xie,L., Chen,L., Feng,Z. et al. (2005) The RCSB Protein Data Bank: a redesigned query system and relational database based on the mmCIF schema. Nucleic Acids Res., 33, D233–D237.
5. Bairoch,A., Apweiler,R., Wu,C.H., Barker,W.C., Boeckmann,B., Ferro,S., Gasteiger,E., Huang,H., Lopez,R., Magrane,M. et al. (2005) The Universal Protein Resource (UniProt). Nucleic Acids Res., 33, D154–D159.
6. Benson,D.A., Karsch-Mizrachi,I., Lipman,D.J., Ostell,J. and Wheeler,D.L. (2008) GenBank. Nucleic Acids Res., 36, D25–D30.
7. Baker,D. and Sali,A. (2001) Protein structure prediction and structural genomics. Science, 294, 93–96.
8. Wallner,B. and Elofsson,A. (2005) All are not equal: a benchmark of different homology modeling programs. Protein Sci., 14, 1315–1327.
9. Hillisch,A., Pineda,L.F. and Hilgenfeld,R. (2004) Utility of homology models in the drug discovery process. Drug Discov. Today, 9, 659–669.
10. Eswar,N., Webb,B., Marti-Renom,M.A., Madhusudhan,M.S., Eramian,D., Shen,M.Y., Pieper,U. and Sali,A. (2007) Comparative protein structure modeling using MODELLER. Curr. Protocols Protein Sci./editorial board, John E. Coligan ... et al., Chapter 2, Unit 29.
11. Pieper,U., Eswar,N., Davis,F.P., Braberg,H., Madhusudhan,M.S., Rossi,A., Marti-Renom,M., Karchin,R., Webb,B.M., Eramian,D. et al. (2006) MODBASE: a database of annotated comparative protein structure models and associated resources. Nucleic Acids Res., 34, D291–D295.
12. Eswar,N., John,B., Mirkovic,N., Fiser,A., Ilyin,V.A., Pieper,U., Stuart,A.C., Marti-Renom,M.A., Madhusudhan,M.S., Yerkovich,B. et al. (2003) Tools for comparative protein structure modeling and analysis. Nucleic Acids Res., 31, 3375–3380.
13. Sali,A. and Blundell,T.L. (1993) Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol., 234, 779–815.
14. Smith,T.F. and Waterman,M.S. (1981) Identification of common molecular subsequences. J. Mol. Biol., 147, 195–197.
15. Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
16. Eswar,N., Webb,B., Marti-Renom,M.A., Madhusudhan,M.S., Eramian,D., Shen,M.Y., Pieper,U. and Sali,A. (2006) Comparative protein structure modeling using Modeller. Curr. Protocols Bioinformatics/editorial board, Andreas D. Baxevanis ... et al., Chapter 5, Unit 56.
17. Marti-Renom,M.A., Madhusudhan,M.S. and Sali,A. (2004) Alignment of protein sequences by their profiles. Protein Sci., 13, 1071–1087.
18. Shen,M.Y. and Sali,A. (2006) Statistical potential for assessment and prediction of protein structures. Protein Sci., 15, 2507–2524.
19. Eramian,D., Shen,M.Y., Devos,D., Melo,F., Sali,A. and Marti-Renom,M.A. (2006) A composite score for predicting errors in protein structure models. Protein Sci., 15, 1653–1666.
20. Melo,F., Sanchez,R. and Sali,A. (2002) Statistical potentials for fold assessment. Protein Sci., 11, 430–448.
21. Chance,M.R., Fiser,A., Sali,A., Pieper,U., Eswar,N., Xu,G., Fajardo,J.E., Radhakannan,T. and Marinkovic,N. (2004) High-throughput computational and experimental techniques in structural genomics. Genome Res., 14, 2145–2154.
22. Ortiz,A.R., Strauss,C.E. and Olmea,O. (2002) MAMMOTH (matching molecular models obtained from theory): an automated method for model comparison. Protein Sci., 11, 2606–2621.
23. Marti-Renom,M.A., Ilyin,V.A. and Sali,A. (2001) DBAli: a database of protein structure alignments. Bioinformatics, 17, 746–747.
24. Stuart,A.C., Ilyin,V.A. and Sali,A. (2002) LigBase: a database of families of aligned ligand binding sites in known protein sequences and structures. Bioinformatics, 18, 200–201.
25. Marti-Renom,M.A., Rossi,A., Al-Shahrour,F., Davis,F.P., Pieper,U., Dopazo,J. and Sali,A. (2007) The AnnoLite and AnnoLyze programs for comparative annotation of protein structures. BMC Bioinformatics, 8(Suppl. 4), S4.
26. Davis,F.P. and Sali,A. (2005) PIBASE: a comprehensive database of structurally defined protein interfaces. Bioinformatics, 21, 1901–1907.
27. Marti-Renom,M.A., Pieper,U., Madhusudhan,M.S., Rossi,A., Eswar,N., Davis,F.P., Al-Shahrour,F., Dopazo,J. and Sali,A. (2007) DBAli tools: mining the protein structure space. Nucleic Acids Res., 35, D393–D397.
28. Karchin,R., Diekhans,M., Kelly,L., Thomas,D.J., Pieper,U., Eswar,N., Haussler,D. and Sali,A. (2005) LS-SNP: large-scale annotation of coding non-synonymous SNPs based on multiple information sources. Bioinformatics, 21, 2814–2820.
29. Davis,F.P., Braberg,H., Shen,M.Y., Pieper,U., Sali,A. and Madhusudhan,M.S. (2006) Protein complex compositions predicted by structural similarity. Nucleic Acids Res., 34, 2943–2952.
30. Davis,F.P., Barkan,D.T., Eswar,N., McKerrow,J.H. and Sali,A. (2007) Host pathogen protein interactions predicted by comparative modeling. Protein Sci., 16, 2585–2596.
31. Jones,S., Zhang,X., Parsons,D.W., Lin,J.C., Leary,R.J., Angenendt,P., Mankoo,P., Carter,H., Kamiyama,H., Jimeno,A. et al. (2008) Core signaling pathways in human pancreatic cancers revealed by global genomic analyses. Science, 321, 1801–1806.
32. Parsons,D.W., Jones,S., Zhang,X., Lin,J.C., Leary,R.J., Angenendt,P., Mankoo,P., Carter,H., Siu,I.M., Gallia,G.L. et al. (2008) An integrated genomic analysis of human Glioblastoma multiforme. Science, 321, 1807–1812.
33. Sherry,S.T., Ward,M.H., Kholodov,M., Baker,J., Phan,L., Smigielski,E.M. and Sirotkin,K. (2001) dbSNP: the NCBI database of genetic variation. Nucleic Acids Res., 29, 308–311.
34. Wu,C.H., Apweiler,R., Bairoch,A., Natale,D.A., Barker,W.C., Boeckmann,B., Ferro,S., Gasteiger,E., Huang,H., Lopez,R. et al. (2006) Nucleic Acids Res., 34, D187–191.
35. Hsu,F., Kent,W.J., Clawson,H., Kuhn,R.M., Diekhans,M. and Haussler,D. (2006) The UCSC known genes. Bioinformatics, 22, 1036–1046.
36. Flicek,P., Aken,B.L., Beal,K., Ballester,B., Caccamo,M., Chen,Y., Clarke,L., Coates,G., Cunningham,F., Cutts,T. et al. (2008) Ensembl 2008. Nucleic Acids Res., 36, D707–D714.
37. Wheeler,D.L., Barrett,T., Benson,D.A., Bryant,S.H., Canese,K., Chetvernin,V., Church,D.M., DiCuccio,M., Edgar,R., Federhen,S. et al. (2008) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., 36, D13–D21.
47. Hermann,J.C., Marti-Arbona,R., Fedorov,A.A., Fedorov,E., Almo,S.C., Shoichet,B.K. and Raushel,F.M. (2007) Structure-based activity prediction for an enzyme of unknown function. Nature, 448, 775–779.
38. Pettersen,E.F., Goddard,T.D., Huang,C.C., Couch,G.S.,
48. Kraulis,P.J.
(1991) MOLSCRIPT: a program to produce both Greenblatt,D.M., Meng,E.C. and Ferrin,T.E. (2004) UCSF detailed and schematic plorts of protein structures. J. Appl. Chimera—a visualization system for exploratory research and Crystallogr., 24, 946–950. analysis. J. Comput. Chem., 25, 1605–1612. 49. Merritt,E.A. and Bacon,D.J. (1997) Raster3D: photorealistic 39. Kabsch,W. and Sander,C. (1983) Dictionary of protein secondary molecular graphics. Methods Enzymol., 277, 505–524. structure: pattern recognition of hydrogen-bonded and geometrical 50. Ghedin,E., Wang,S., Spiro,D., Caler,E., Zhao,Q., Crabtree,J., features. Biopolymers, 22, 2577–2637. Allen,J.E., Delcher,A.L., Guiliano,D.B., Miranda-Saavedra,D. et al. 40. Li,M., Hays,F.A., Roe-Zurz,Z., Vuong,L., Kelly,L., Robbins,R., (2007) Draft genome of the ﬁlarial nematode parasite Brugia Ho,C.M., Pieper,U., O’Connell,J., Miercke,L.J. et al. (2008) malayi. Science, 317, 1756–1760. Eukaryotic Integral Membrane Protein Production For Structural 51. Chen,F., Mackey,A.J., Stoeckert,C.J. Jr. and Roos,D.S. (2006) Genomics. J. Mol. Biol., in press. OrthoMCL-DB: querying a comprehensive multi-species 41. Dean,M., Rzhetsky,A. and Allikmets,R. (2001) The human collection of ortholog groups. Nucleic Acids Res., 34, ATP-binding cassette (ABC) transporter superfamily. Genome Res., D363–D368. 11, 1156–1166. 52. Cole,S.T. (1999) Learning from the genome sequence of 42. Hamosh,A., Scott,A.F., Amberger,J.S., Bocchini,C.A. and Mycobacterium tuberculosis H37Rv. FEBS Lett., 452, 7–10. McKusick,V.A. (2005) Online Mendelian Inheritance in Man 53. Heiges,M., Wang,H., Robinson,E., Aurrecoechea,C., Gao,X., (OMIM), a knowledgebase of human genes and genetic disorders. Kaluskar,N., Rhodes,P., Wang,S., He,C.Z., Su,Y. et al. (2006) Nucleic Acids Res., 33, D514–D517. CryptoDB: a Cryptosporidium bioinformatics resource update. 43. Leabman,M.K., Huang,C.C., DeYoung,J., Carlson,E.J., Nucleic Acids Res., 34, D419–D422. 
Taylor,T.R., de la Cruz,M., Johns,S.J., Stryke,D., Kawamoto,M., 54. Hertz-Fowler,C., Peacock,C.S., Wood,V., Aslett,M., Kerhornou,A., Urban,T.J. et al. (2003) Natural variation in human membrane Mooney,P., Tivey,A., Berriman,M., Hall,N., Rutherford,K. et al. transporter genes reveals evolutionary and functional constraints. (2004) GeneDB: a resource for prokaryotic and eukaryotic Proc. Natl Acad. Sci. USA, 100, 5896–5901. organisms. Nucleic Acids Res., 32, D339–D343. 44. Mahrus,S., Trinidad,J.C., Barkan,D.T., Sali,A., Burlingame,A.L. 55. Gajria,B., Bahl,A., Brestelli,J., Dommer,J., Fischer,S., Gao,X., and Wells,J.A. (2008) Global sequencing of proteolytic cleavage Heiges,M., Iodice,J., Kissinger,J.C., Mackey,A.J. et al. (2008) sites in apoptosis by speciﬁc labeling of protein N termini. Cell, 134, ToxoDB: an integrated Toxoplasma gondii database resource. 866–876. Nucleic Acids Res., 36, D553–D556. 45. Hubbard,S.J., Campbell,S.F. and Thornton,J.M. (1991) Molecular 56. Pegg,S.C., Brown,S.D., Ojha,S., Seﬀernick,J., Meng,E.C., recognition. Conformational analysis of limited proteolytic sites Morris,J.H., Chang,P.J., Huang,C.C., Ferrin,T.E. and Babbitt,P.C. and serine proteinase protein inhibitors. J. Mol. Biol., 220, 507–530. (2006) Leveraging enzyme structure-function relationships for 46. Maurer,S.M., Rai,A. and Sali,A. (2004) Finding cures for tropical functional inference and experimental design: the structure- diseases: is open source an answer? PLoS Med., 1, e56. function linkage database. Biochemistry, 45, 2545–2555. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Nucleic Acids Research Oxford University Press http://www.deepdyve.com/lp/oxford-university-press/modbase-a-database-of-annotated-comparative-protein-structure-models-pFc80Leonh


References (158)

R. Hancock, F. Brinkman (2002)
Function of pseudomonas porins in uptake and efflux.
Annual review of microbiology, 56
J. Capra, R. Laskowski, J. Thornton, Mona Singh, T. Funkhouser (2009)
Predicting Protein Ligand Binding Sites by Combining Evolutionary Sequence Conservation and 3D Structure
PLoS Computational Biology, 5
T. Kaneko, Lei Li, S. Li (2008)
The SH3 domain--a family of versatile peptide- and protein-recognition module.
Frontiers in bioscience : a journal and virtual library, 13
T. Klein, J. Chang, M. Cho, K. Easton, R. Fergerson, M. Hewett, Z. Lin, Y. Liu, S. Liu, D. Oliver, D. Rubin, F. Shafa, Joshua Stuart, R. Altman (2001)
Integrating genotype and phenotype information: an overview of the PharmGKB project
The Pharmacogenomics Journal, 1
S. Hubbard, F. CampbellSimon, J. Thornton (1991)
Molecular recognition. Conformational analysis of limited proteolytic sites and serine proteinase protein inhibitors.
Journal of molecular biology, 220 2
S. Sankararaman, Bryan Kolaczkowski, Kimmen Sjölander (2009)
INTREPID: a web server for prediction of functionally important residues by evolutionary analysis
Nucleic Acids Research, 37
David Eramian, N. Eswar, Min-Yi Shen, A. Sali (2008)
How well can the accuracy of comparative protein structure models be predicted?
Protein Science, 17
J. Ward, L. McGuffin, K. Bryson, B. Buxton, David Jones (2004)
The DISOPRED server for the prediction of protein disorder
Bioinformatics, 20 13
Russ Altman (PharmGKB), and Kathy Wu (PIR/iProClass). The project was supported by grants from National Institutes of Health
David Eramian, Min-Yi Shen, D. Devos, F. Melo, A. Sali, M. Martí-Renom (2006)
A composite score for predicting errors in protein structure models
Protein Science, 15
M. Martí-Renom, A. Rossi, F. Al-Shahrour, F. Davis, U. Pieper, J. Dopazo, A. Sali (2007)
The AnnoLite and AnnoLyze programs for comparative annotation of protein structures
BMC Bioinformatics, 8
M Li, FA Hays, Z Roe-Zurz, L Vuong, L Kelly, R Robbins, CM Ho, U Pieper, O'C, J onnell, LJ Miercke (2008)
Eukaryotic Integral Membrane Protein Production For Structural Genomics
J. Mol. Biol., in press.
Zhan-yang Zhu, Andrej Sali, Tom Blundell (1992)
A variable gap penalty function and feature weights for protein 3-D structure comparisons.
Protein engineering, 5 1
F. Davis, Hannes Braberg, Min-Yi Shen, U. Pieper, A. Sali, M. Madhusudhan (2006)
Protein complex compositions predicted by structural similarity
Nucleic Acids Research, 34
P. Konarev, V. Volkov, Anna Sokolova, M. Koch, D. Svergun (2003)
PRIMUS: a Windows PC-based system for small-angle scattering data analysis
Journal of Applied Crystallography, 36
Francisco Melo, A. Sali (2007)
Fold assessment for comparative protein structure modeling
Protein Science, 16
D. Baker, A. Sali (2001)
Protein Structure Prediction and Structural Genomics
Science, 294
E. Freed (2001)
HIV-1 Replication
Somatic Cell and Molecular Genetics, 26
I. Mochalkin, J. Miller, A. Evdokimov, Sandra Lightle, Chunhong Yan, Charles Stover, G. Waldrop (2008)
Structural evidence for substrate‐induced synergism and half‐sites reactivity in biotin carboxylase
Protein Science, 17
Björn Wallner, A. Elofsson (2005)
All are not equal: A benchmark of different homology modeling programs
Protein Science, 14
N. Kohl, E. Emini, W. Schleif, L. Davis, J. Heimbach, R. Dixon, E. Scolnick, I. Sigal (1988)
Active human immunodeficiency virus protease is required for viral infectivity.
Proceedings of the National Academy of Sciences of the United States of America, 85 13
A. Morgat, R. Apweiler, M. Martin, C. O’Donovan, M. Magrane, Y. Alam-Faruque, R. Antunes, D. Barrell, B. Bely, M. Bingley, David Binns, Lawrence Bower, Paul Browne, Chan Wm, E. Dimmer, R. Eberhardt, F. Fazzini, A. Fedotov, R. Foulger, J. Garavelli, Castro Lg, R. Huntley, Julius Jacobsen, M. Kleen, K. Laiho, D. Legge, Quan Lin, W. Liu, J. Luo, S. Orchard, S. Patient, K. Pichler, D. Poggioli, Nikolas Pontikos, Manuela Pruess, S. Rosanoff, T. Sawford, H. Sehra, E. Turner, M. Corbett, M. Donnelly, P. VanRensburg, I. Xenarios, L. Bougueleret, A. Auchincloss, Ghislaine Argoud-Puy, K. Axelsen, A. Bairoch, Delphine Baratin, Blatter Mc, B. Boeckmann, Jerven Bolleman, L. Bollondi, E. Boutet, Quintaje Sb, L. Breuza, A. Bridge, E. Decastro, E. Coudert, Isabelle Cusin, M. Doche, D. Dornevil, S. Duvaud, A. Estreicher, L. Famiglietti, M. Feuermann, S. Gehant, Serenella Ferro, E. Gasteiger, A. Gateau, Vivienne Gerritsen, A. Gos, N. Gruaz-Gumowski, U. Hinz, C. Hulo, N. Hulo, J. James, S. Jimenez, F. Jungo, T. Kappler, G. Keller, V. Lara, P. Lemercier, D. Lieberherr, X. Martin, P. Masson, M. Moinat, S. Paesano, I. Pedruzzi, S. Pilbout, S. Poux, Monica Pozzato, Nicole Redaschi, C. Rivoire, B. Roechert, M. Schneider, Christian Sigrist, K. Sonesson, S. Staehli, E. Stanley, A. Stutz, S. Sundaram, M. Tognolli, L. Verbregue, V. Al, Wu Ch, Arighi Cn, L. Arminski, Barker Wc, Chuming Chen, Yingfei Chen, P. Dubey, He Huang, R. Mazumder, P. McGarvey, Natale Da, N. Tg, J. Nchoutmboube, Roberts Nv, Suzek Be, U. Ugochukwu, Vinayak Cr, Qiang Wang, Y. Wang, Yeh Ls, J. Zhang (2010)
Ongoing and future developments at the Universal Protein Resource
Nucleic Acids Research, 39
S. Sankararaman, Kimmen Sjölander (2008)
INTREPID—INformation-theoretic TREe traversal for Protein functional site IDentification
Bioinformatics, 24
A. Hamosh, A. Scott, J. Amberger, C. Bocchini, David Valle, V. McKusick (2004)
Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders
Nucleic Acids Research, 33
N. Eswar, Bino John, Nebojsa Mirkovic, A. Fiser, V. Ilyin, U. Pieper, A. Stuart, M. Martí-Renom, M. Madhusudhan, Bozidar Yerkovich, A. Sali (2003)
Tools for comparative protein structure modeling and analysis
Nucleic acids research, 31 13
D. Parsons, Siân Jones, Xiaosong Zhang, Jimmy Lin, R. Leary, P. Angenendt, P. Mankoo, H. Carter, I. Siu, G. Gallia, A. Olivi, R. McLendon, B. Rasheed, S. Keir, T. Nikolskaya, Y. Nikolsky, D. Busam, H. Tekleab, L. Diaz, James Hartigan, Douglas Smith, R. Strausberg, S. Marie, S. Shinjo, Hai Yan, G. Riggins, D. Bigner, R. Karchin, N. Papadopoulos, G. Parmigiani, B. Vogelstein, V. Velculescu, K. Kinzler (2008)
An Integrated Genomic Analysis of Human Glioblastoma Multiforme
Science, 321
W. Kabsch, C. Sander (1983)
Dictionary of protein secondary structure: Pattern recognition of hydrogen‐bonded and geometrical features
Biopolymers, 22
T. Schwede, A. Sali, B. Honig, M. Levitt, H. Berman, David Jones, S. Brenner, S. Burley, Rhiju Das, N. Dokholyan, Roland Dunbrack, K. Fidelis, A. Fiser, A. Godzik, Yuanpeng Huang, C. Humblet, M. Jacobson, A. Joachimiak, S. Krystek, T. Kortemme, A. Kryshtafovych, G. Montelione, J. Moult, D. Murray, R. Sanchez, T. Sosnick, D. Standley, T. Stouch, S. Vajda, Max Vasquez, J. Westbrook, I. Wilson (2009)
Outcome of a workshop on applications of protein models in biomedical research.
Structure, 17 2
Siân Jones, Xiaosong Zhang, D. Parsons, Jimmy Lin, R. Leary, P. Angenendt, P. Mankoo, H. Carter, H. Kamiyama, A. Jimeno, Seung‐Mo Hong, Baojin Fu, Ming-Tseh Lin, E. Calhoun, M. Kamiyama, K. Walter, T. Nikolskaya, Y. Nikolsky, James Hartigan, Douglas Smith, M. Hidalgo, S. Leach, A. Klein, E. Jaffee, M. Goggins, A. Maitra, C. Iacobuzio-Donahue, J. Eshleman, S. Kern, R. Hruban, R. Karchin, N. Papadopoulos, G. Parmigiani, B. Vogelstein, V. Velculescu, K. Kinzler (2008)
Core Signaling Pathways in Human Pancreatic Cancers Revealed by Global Genomic Analyses
Science, 321
F. Chen, A. Mackey, C. Stoeckert, D. Roos (2005)
OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups
Nucleic Acids Research, 34
N. Eswar, David Eramian, Ben Webb, Min-Yi Shen, A. Sali (2008)
Protein structure modeling with MODELLER.
Methods in molecular biology, 426
K. Murthy, E. Winborne, M. Minnich, J. Culp, C. Debouck (1994)
The crystal structures at 2.2-A resolution of hydroxyethylene-based inhibitors bound to human immunodeficiency virus type 1 protease show that the inhibitors are present in two distinct orientations.
The Journal of biological chemistry, 267 32
K. Arnold, Florian Kiefer, J. Kopp, J. Battey, Michael Podvinec, J. Westbrook, H. Berman, L. Bordoli, T. Schwede (2008)
The Protein Model Portal
Journal of Structural and Functional Genomics, 10
Greg Hura, A. Menon, M. Hammel, R. Rambo, F. Poole, S. Tsutakawa, F. Jenney, S. Classen, K. Frankel, R. Hopkins, Sung-jae Yang, Joseph Scott, B. Dillard, M. Adams, J. Tainer (2009)
Robust, high-throughput solution structural analyses by small angle X-ray scattering (SAXS)
Nature methods, 6
U. Pieper, Ranyee Chiang, J. Seffernick, Shoshana Brown, M. Glasner, L. Kelly, N. Eswar, J. Sauder, J. Bonanno, S. Swaminathan, S. Burley, Xiaojing Zheng, M. Chance, S. Almo, J. Gerlt, F. Raushel, M. Jacobson, P. Babbitt, A. Sali (2009)
Target selection and annotation for the structural genomics of the amidohydrolase and enolase superfamilies
Journal of Structural and Functional Genomics, 10
S. Maurer, A. Rai, A. Sali (2004)
Finding Cures for Tropical Diseases: Is Open Source an Answer?
PLoS Medicine, 1
A. Carr, D. Cooper (1996)
HIV protease inhibitors
AIDS, 10
S. Sherry, Minghong Ward, Michael Kholodov, J. Baker, Lon Phan, Elizabeth Smigielski, K. Sirotkin (2001)
dbSNP: the NCBI database of genetic variation
Nucleic acids research, 29 1
M. Martí-Renom, V. Ilyin, A. Sali (2001)
DBAli: a database of protein structure alignments
Bioinformatics, 17 8
The UCSC known genes
Ruth Seal, Susan Gordon, M. Lush, Mathew Wright, E. Bruford (2010)
genenames.org: the HGNC resources in 2011
Nucleic Acids Research, 39
Catherine Cormier, Jin Park, Michael Fiacco, J. Steel, Preston Hunter, Jason Kramer, Rajeev Singla, J. LaBaer (2011)
PSI:Biology-materials repository: a biologist’s resource for protein expression plasmids
Journal of Structural and Functional Genomics, 12
Johannes Hermann, Ricardo Martí-Arbona, A. Fedorov, E. Fedorov, S. Almo, B. Shoichet, F. Raushel (2007)
Structure-based activity prediction for an enzyme of unknown function
Nature, 448
Bindu Gajria, A. Bahl, John Brestelli, Jennifer Dommer, S. Fischer, Xin Gao, Mark Heiges, John Iodice, J. Kissinger, A. Mackey, D. Pinney, D. Roos, C. Stoeckert, Haiming Wang, B. Brunk (2007)
ToxoDB: an integrated Toxoplasma gondii database resource
Nucleic Acids Research, 36
Fernán Agüero, B. Al-Lazikani, M. Aslett, M. Berriman, F. Buckner, R. Campbell, Santiago Carmona, Ian Carruthers, A. Chan, Feng Chen, Gregory Crowther, M. Doyle, C. Hertz-Fowler, A. Hopkins, Gregg McAllister, S. Nwaka, John Overington, A. Pain, G. Paolini, U. Pieper, S. Ralph, Aaron Riechers, D. Roos, A. Sali, Dhanasekaran Shanmugam, Takashi Suzuki, W. Voorhis, C. Verlinde (2008)
Genomic-scale prioritization of drug targets: the TDR Targets database
Nature Reviews Drug Discovery, 7
D. Barkan, D. Hostetter, S. Mahrus, U. Pieper, J. Wells, C. Craik, A. Sali (2010)
Prediction of protease substrates using sequence and structure features
Bioinformatics, 26 14
L. Forrest, Christopher Tang, B. Honig (2006)
On the accuracy of homology modeling and sequence alignment methods applied to membrane proteins.
Biophysical journal, 91 2
(2010)
GenBank
K. Okazaki, N. Koga, S. Takada, J. Onuchic, P. Wolynes (2006)
Multiple-basin energy landscapes for large-amplitude conformational motions of proteins: Structure-based molecular dynamics simulations
Proceedings of the National Academy of Sciences, 103
Mark Heiges, Haiming Wang, Edward Robinson, C. Aurrecoechea, Xin Gao, Nivedita Kaluskar, Philippa Rhodes, Sammy Wang, Congzhou He, Yanqi Su, John Miller, Eileen Kraemer, J. Kissinger (2005)
CryptoDB: a Cryptosporidium bioinformatics resource update
Nucleic Acids Research, 34
Pankaj Daga, Ronak Patel, R. Doerksen (2010)
Template-based protein modeling: recent methodological advances.
Current topics in medicinal chemistry, 10 1
P. Flicek, Bronwen Aken, Kathryn Beal, B. Ballester, M. Cáccamo, Yuan Chen, Laura Clarke, Guy Coates, Fiona Cunningham, T. Cutts, T. Down, S. Dyer, T. Eyre, Stephen Fitzgerald, J. Fernandez-Banet, S. Gräf, Syed Haider, M. Hammond, Richard Holland, K. Howe, K. Howe, Nathan Johnson, Andrew Jenkinson, Andreas Kähäri, Damian Keefe, F. Kokocinski, Eugene Kulesha, D. Lawson, Ian Longden, K. Megy, P. Meidl, B. Overduin, Anne Parker, Bethan Pritchard, A. Prlić, S. Rice, Daniel Rios, Michael Schuster, I. Sealy, G. Slater, D. Smedley, Giulietta Spudich, S. Trevanion, Albert Vilella, J. Vogel, S. White, M. Wood, E. Birney, Tony Cox, V. Curwen, R. Durbin, X. Fernández-Suárez, Javier Herrero, T. Hubbard, A. Kasprzyk, G. Proctor, James Smith, A. Ureta-Vidal, S. Searle (2007)
Ensembl 2008
Nucleic Acids Research, 36
Ruchira Datta, C. Meacham, Bushra Samad, C. Neyer, Kimmen Sjölander (2009)
Berkeley PHOG: PhyloFacts orthology group prediction web server
Nucleic Acids Research, 37
F. Poitevin, H. Orland, S. Doniach, P. Koehl, M. Delarue (2011)
AquaSAXS: a web server for computation and fitting of SAXS profiles with non-uniformally hydrated atomic models
Nucleic Acids Research, 39
F. Davis, D. Barkan, N. Eswar, J. McKerrow, A. Sali (2007)
Host–pathogen protein interactions predicted by comparative modeling
Protein Science, 16
A. Fiser (2010)
Template-based protein structure modeling.
Methods in molecular biology, 673
P. Sampathkumar, F. Lu, Xun Zhao, Zhenzhen Li, J. Gilmore, K. Bain, M. Rutter, T. Gheyi, K. Schwinn, J. Bonanno, U. Pieper, J. Fajardo, A. Fiser, S. Almo, S. Swaminathan, M. Chance, D. Baker, S. Atwell, Devon Thompson, J. Emtage, S. Wasserman, A. Sali, J. Sauder, S. Burley (2010)
Structure of a putative BenF‐like porin from Pseudomonas fluorescens Pf‐5 at 2.6 Å resolution
Proteins: Structure, 78
A. Algeciras-Schimnich, Anne-Sophie Belzacq-Casagrande, G. Bren, Z. Nie, Julie Taylor, S. Rizza, C. Brenner, A. Badley (2007)
Analysis of HIV Protease Killing Through Caspase 8 Reveals a Novel Interaction Between Caspase 8 and Mitochondria
The Open Virology Journal, 1
M. Dean, Y. Hamon, G. Chimini (2001)
The human ATP-binding cassette (ABC) transporter superfamily.
Journal of lipid research, 42 7
C. Putnam, M. Hammel, Greg Hura, J. Tainer (2011)
X-ray solution scattering (SAXS) combined with crystallography and computation: defining accurate macromolecular structures, conformations and assemblies in solution.
Quarterly reviews of biophysics, 40 3
A. Sali, T. Blundell (1993)
Comparative protein modelling by satisfaction of spatial restraints.
Journal of molecular biology, 234 3
A. Hinrichs, D. Karolchik, R. Baertsch, G. Barber, G. Bejerano, H. Clawson, M. Diekhans, T. Furey, R. Harte, F. Hsu, Jennifer Hillman-Jackson, R. Kuhn, J. Pedersen, A. Pohl, B. Raney, K. Rosenbloom, A. Siepel, Kayla Smith, C. Sugnet, A. Sultan-Qurraie, D. Thomas, Heather Trumbower, R. Weber, M. Weirauch, A. Zweig, D. Haussler, W. Kent (2005)
The UCSC Genome Browser Database: update 2006
Nucleic Acids Research, 34
S. Pegg, Shoshana Brown, S. Ojha, J. Seffernick, E. Meng, J. Morris, Patricia Chang, Conrad Huang, T. Ferrin, P. Babbitt (2006)
Leveraging enzyme structure-function relationships for functional inference and experimental design: the structure-function linkage database.
Biochemistry, 45 8
L. Breiman (2001)
Random Forests
Machine Learning, 45
B. Pierce, Z. Weng (2007)
ZRANK: Reranking protein docking predictions with an optimized energy function
Proteins: Structure, 67
A. Stuart, V. Ilyin, A. Sali (2002)
LigBase: a database of families of aligned ligand binding sites in known protein sequences and structures
Bioinformatics, 18 1
B. Mahalingam, J. Louis, Jason Hung, R. Harrison, I. Weber (2001)
Structural implications of drug‐resistant mutants of HIV‐1 protease: High‐resolution crystal structures of the mutant protease/substrate analogue complexes
Proteins: Structure, 43
D. Schneidman-Duhovny, M. Hammel, A. Sali (2010)
FoXS: a web server for rapid computation and fitting of SAXS profiles
Nucleic Acids Research, 38
E. Giglia (2009)
New year, new PubMed.
European journal of physical and rehabilitation medicine, 45 1
M. Hediger, M. Romero, Ji-Bin Peng, A. Rolfs, H. Takanaga, E. Bruford (2004)
The ABCs of solute carriers: physiological, pathological and therapeutic implications of human membrane transport proteins
Pflügers Archiv, 447
I. Mochalkin, J. Miller, L. Narasimhan, V. Thanabal, P. Erdman, Philip Cox, J. Prasad, Sandra Lightle, M. Huband, C. Stover (2009)
Discovery of antibacterial biotin carboxylase inhibitors by virtual screening and fragment-based approaches.
ACS chemical biology, 4 6
Cathy Wu, R. Apweiler, A. Bairoch, D. Natale, W. Barker, B. Boeckmann, Serenella Ferro, E. Gasteiger, Hongzhan Huang, R. Lopez, M. Magrane, M. Martin, R. Mazumder, C. O’Donovan, Nicole Redaschi, Baris Suzek (2005)
The Universal Protein Resource (UniProt): an expanding universe of protein information
Nucleic Acids Research, 34
M. Prabu‐Jeyabalan, E. Nalivaika, C. Schiffer (2000)
How does a symmetric dimer recognize an asymmetric substrate? A substrate complex of HIV-1 protease.
Journal of molecular biology, 301 5
L. Ortí, R. Carbajo, U. Pieper, N. Eswar, S. Maurer, A. Rai, Ginger Taylor, M. Todd, A. Pineda-Lucena, A. Sali, M. Martí-Renom (2009)
A Kernel for Open Source Drug Discovery in Tropical Diseases
PLoS Neglected Tropical Diseases, 3
A. Schlessinger, P. Matsson, James Shima, U. Pieper, S. Yee, L. Kelly, Leonard Apeltsin, R. Stroud, T. Ferrin, K. Giacomini, A. Sali (2010)
Comparison of human solute carriers
Protein Science, 19
J. Söding (2005)
Protein homology detection by HMM?CHMM comparison
Bioinformatics, 21 7
S. Biswas, M. Mohammad, L. Movileanu, B. Berg (2008)
Crystal structure of the outer membrane protein OpdK from Pseudomonas aeruginosa.
Structure, 16 7
M. Petoukhov, D. Svergun (2007)
Analysis of X-ray and neutron scattering from biomacromolecular solutions.
Current opinion in structural biology, 17 5
Ramneek Gupta, E. Jung, A. Gooley, Keith Williams, S. Brunak, Jan Hansen (1999)
Scanning the available Dictyostelium discoideum proteome for O-linked GlcNAc glycosylation sites using neural networks.
Glycobiology, 9 10
F. Melo, R. Sánchez, A. Sali (2009)
Statistical potentials for fold assessment
Protein Science, 11
G. Temple, D. Gerhard, R. Rasooly, E. Feingold, P. Good, Cristen Robinson, Allison Mandich, J. Derge, J. Lewis, Debonny Shoaf, F. Collins, W. Jang, L. Wagner, C. Shenmen, L. Misquitta, C. Schaefer, K. Buetow, T. Bonner, Linda Yankie, Minghong Ward, Lon Phan, Alex Astashyn, Garth Brown, C. Farrell, Jennifer Hart, M. Landrum, B. Maidak, Mike Murphy, Terence Murphy, B. Rajput, Lillian Riddick, David Webb, Janet Weber, Wendy Wu, K. Pruitt, D. Maglott, A. Siepel, Broňa Brejová, M. Diekhans, R. Harte, R. Baertsch, Jim Kent, D. Haussler, M. Brent, Laura Langton, Charles Comstock, Michael Stevens, Chaochun Wei, M. Baren, K. Salehi-Ashtiani, Ryan Murray, L. Ghamsari, Elizabeth Mello, Chenwei Lin, C. Pennacchio, Kirsten Schreiber, N. Shapiro, A. Marsh, E. Pardes, T. Moore, A. Lebeau, M. Muratet, B. Simmons, D. Kloske, Stephanie Sieja, J. Hudson, P. Sethupathy, M. Brownstein, N. Bhat, J. Lázár, H. Jacob, C. Gruber, Mark Smith, J. McPherson, A. Garcia, P. Gunaratne, Jiaqian Wu, D. Muzny, R. Gibbs, Alice Young, G. Bouffard, R. Blakesley, J. Mullikin, E. Green, M. Dickson, Alex Rodriguez, J. Grimwood, J. Schmutz, R. Myers, M. Hirst, Thomas Zeng, Kane Tse, M. Moksa, Merinda Deng, Kevin Ma, Diana Mah, Johnson Pang, G. Taylor, E. Chuah, Athena Deng, K. Fichter, Anne Go, Stephanie Lee, Jing Wang, M. Griffith, Ryan Morin, Richard Moore, Michael Mayo, Sarah Munro, Susan Wagner, Steven Jones, R. Holt, M. Marra, Sun Lu, Shuwei Yang, James Hartigan, M. Graf, R. Wagner, Stan Letovksy, J. Pulido, K. Robison, D. Esposito, J. Hartley, Vanessa Wall, R. Hopkins, O. Ohara, S. Wiemann (2009)
The completion of the Mammalian Gene Collection (MGC).
Genome research, 19 12
P. Leverrier, J. Declercq, K. Denoncin, D. Vertommen, A. Hiniker, Seung-Hyun Cho, J. Collet (2011)
Crystal Structure of the Outer Membrane Protein RcsF, a New Substrate for the Periplasmic Protein-disulfide Isomerase DsbC*
The Journal of Biological Chemistry, 286
J. Pardo, J. Aguiló, A. Anel, P. Martin, L. Joeckel, C. Borner, R. Wallich, A. Müllbacher, C. Froelich, M. Simon (2009)
The biology of cytotoxic cell granule exocytosis pathway: granzymes have evolved to induce cell death and inflammation.
Microbes and infection, 11 4
M. Leabman, Conrad Huang, J. Deyoung, E. Carlson, T. Taylor, M. Cruz, S. Johns, D. Stryke, M. Kawamoto, T. Urban, D. Kroetz, T. Ferrin, A. Clark, N. Risch, I. Herskowitz, K. Giacomini (2003)
Natural variation in human membrane transporter genes reveals evolutionary and functional constraints
Proceedings of the National Academy of Sciences of the United States of America, 100
G. Leontiadis, P. Moayyedi, Alexander Ford (2009)
Helicobacter pylori infection.
BMJ clinical evidence, 2009
M. Pelikán, Greg Hura, M. Hammel (2009)
Structure and flexibility within proteins as identified through small angle X-ray scattering.
General physiology and biophysics, 28 2
M. Martí-Renom, U. Pieper, M. Madhusudhan, A. Rossi, N. Eswar, F. Davis, F. Al-Shahrour, J. Dopazo, A. Sali (2007)
DBAli tools: mining the protein structure space
Nucleic Acids Research, 35
B. Shoichet, I. Kuntz (1991)
Protein docking and complementarity.
Journal of molecular biology, 221 1
Alfredo Castello, David Franco, Pablo Moral-López, J. Berlanga, E. Álvarez, E. Wimmer, Luis Carrasco (2009)
HIV- 1 Protease Inhibits Cap- and Poly(A)-Dependent Translation upon eIF4GI and PABP Cleavage
PLoS ONE, 4
D. Gront, Daniel Kulp, R. Vernon, C. Strauss, D. Baker (2011)
Generalized Fragment Picking in Rosetta: Design, Protocols and Applications
PLoS ONE, 6
E. Ghedin, Shiliang Wang, D. Spiro, E. Caler, Qi Zhao, J. Crabtree, Jonathan Allen, A. Delcher, David Guiliano, Diego Miranda-Saavedra, Samuel Angiuoli, T. Creasy, P. Amedeo, B. Haas, N. El-Sayed, J. Wortman, T. Feldblyum, L. Tallon, M. Schatz, Martin Shumway, H. Koo, S. Salzberg, S. Schobel, M. Perțea, Mihai Pop, O. White, G. Barton, C. Carlow, M. Crawford, J. Daub, Matthew Dimmic, Chris Estes, J. Foster, M. Ganatra, W. Gregory, N. Johnson, Jiansheng Jin, R. Komuniecki, I. Korf, Sanjay Kumar, S. Laney, B. Li, Wen Li, T. Lindblom, S. Lustigman, D. Ma, C. Maina, David Martin, J. McCarter, L. McReynolds, M. Mitreva, T. Nutman, J. Parkinson, J. Peregrín-Alvarez, C. Poole, Q. Ren, Lori Saunders, A. Sluder, K. Smith, M. Stanke, T. Unnasch, J. Ware, A. Wei, G. Weil, Deryck Williams, Yinhua Zhang, Steven Williams, Claire Fraser-Liggett, B. Slatko, M. Blaxter, A. Scott (2007)
Draft Genome of the Filarial Nematode Parasite Brugia malayi
Science, 317
B. Boeckmann, A. Bairoch, R. Apweiler, M. Blatter, A. Estreicher, E. Gasteiger, M. Martin, Karine Michoud, C. O’Donovan, Isabelle Phan, S. Pilbout, Michel Schneider (2003)
The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003
Nucleic acids research, 31 1
Z. Nie, Z. Nie, B. Phenix, B. Phenix, J. Lum, J. Lum, A. Alam, D. Lynch, B. Beckett, P. Krammer, R. Sékaly, A. Badley, A. Badley, A. Badley (2002)
HIV-1 protease processes procaspase 8 to cause mitochondrial release of cytochrome c, caspase cleavage and nuclear fragmentation
Cell Death and Differentiation, 9
S. Cole (1999)
Learning from the genome sequence of Mycobacterium tuberculosis H37Rv
FEBS Letters, 452
(2009)
The Pharmacogenomics Center of the University of California, Sam Francisco: at the interface of genomics, biological mechanism and drug therapy
A. Ortiz, C. Strauss, Osvaldo Olmea (2002)
MAMMOTH (Matching molecular models obtained from theory): An automated method for model comparison
Protein Science, 11
M. Hewett, D. Oliver, D. Rubin, K. Easton, Joshua Stuart, R. Altman, T. Klein (2002)
PharmGKB: the Pharmacogenetics Knowledge Base
Nucleic acids research, 30 1
E. Merritt, David Bacon (1997)
Raster3D: photorealistic molecular graphics.
Methods in enzymology, 277
DL Wheeler, T Barrett, DA Benson, SH Bryant, K Canese, V Chetvernin, DM Church, M DiCuccio, R Edgar, S Federhen (2008)
Database resources of the National Center for Biotechnology Information
Nucleic Acids Res., 36
N Eswar, B Webb, MA Marti-Renom, MS Madhusudhan, D Eramian, MY Shen, U Pieper, A Sali (2007)
Comparative protein structure modeling using MODELLER
Curr. Protocols Protein Sci./editorial board, John E. Coligan … et al, Chapter 2
Mark Johnson, I. Zaretskaya, Yan Raytselis, Yuri Merezhuk, S. McGinnis, Thomas Madden (2008)
NCBI BLAST: a better web interface
Nucleic Acids Research, 36
A. Bairoch, R. Apweiler, Cathy Wu, W. Barker, B. Boeckmann, Serenella Ferro, E. Gasteiger, Hongzhan Huang, R. Lopez, M. Magrane, M. Martin, D. Natale, C. O’Donovan, Nicole Redaschi, L. Yeh (2004)
The Universal Protein Resource (UniProt)
Nucleic Acids Research, 33
M. Chance, A. Fiser, A. Sali, U. Pieper, N. Eswar, Guiping Xu, J. Fajardo, T. Radhakannan, N. Marinkovic (2004)
High-throughput computational and experimental techniques in structural genomics.
Genome research, 14 10B
Philippe Lamesch, Ning Li, S. Milstein, Changyu Fan, Tong Hao, Gábor Szabó, Zhenjun Hu, K. Venkatesan, G. Bethel, Paul Martin, J. Rogers, S. Lawlor, Stuart Mclaren, Amélie Dricot, Heather Borick, M. Cusick, J. Vandenhaute, I. Dunham, D. Hill, M. Vidal (2007)
hORFeome v3.1: A resource of human open reading frames representing over 10,000 human genes
Genomics, 89
M Dean, A Rzhetsky, R Allikmets (2001)
The human ATP-binding cassette (ABC) transporter superfamily
Genome Res., 11
C. Zmasek, S. Eddy (2002)
RIO: Analyzing proteomes by automated phylogenomics using resampled inference of orthologs
BMC Bioinformatics, 3
M. Madhusudhan, M. Martí-Renom, R. Sánchez, A. Sali (2006)
Variable gap penalty for protein sequence-structure alignment.
Protein engineering, design & selection : PEDS, 19 3
S. Hunter, R. Apweiler, T. Attwood, A. Bairoch, A. Bateman, David Binns, P. Bork, Ujjwal Das, L. Daugherty, Lauranne Duquenne, R. Finn, J. Gough, D. Haft, N. Hulo, D. Kahn, Elizabeth Kelly, A. Laugraud, Ivica Letunic, D. Lonsdale, R. Lopez, M. Madera, J. Maslen, C. McAnulla, J. McDowall, Jaina Mistry, A. Mitchell, N. Mulder, D. Natale, C. Orengo, Antony Quinn, J. Selengut, Christian Sigrist, Manjula Thimma, P. Thomas, F. Valentin, Derek Wilson, Cathy Wu, C. Yeats (2008)
InterPro: the integrative protein signature database
Nucleic Acids Research, 37
S. Biswas, M. Mohammad, D. Patel, L. Movileanu, B. Berg (2007)
Structural insight into OprD substrate specificity
Nature Structural & Molecular Biology, 14
D. Schneidman-Duhovny, M. Hammel, A. Sali (2011)
Macromolecular docking restrained by a small angle X-ray scattering profile.
Journal of structural biology, 173 3
David Jones (1999)
Protein secondary structure prediction based on position-specific scoring matrices.
Journal of molecular biology, 292 2
A. Fiser, A. Sali (2003)
ModLoop: automated modeling of loops in protein structures
Bioinformatics, 19 18
F. Domingues, W. Koppensteiner, M. Sippl (2000)
The role of protein structure in genomics
FEBS Letters, 476
R. Karchin, M. Diekhans, L. Kelly, D. Thomas, U. Pieper, N. Eswar, D. Haussler, A. Sali (2005)
LS-SNP: large-scale annotation of coding non-synonymous SNPs based on multiple information sources
Bioinformatics, 21 12
U. Pieper, N. Eswar, Hannes Braberg, M. Madhusudhan, F. Davis, A. Stuart, Nebojsa Mirkovic, A. Rossi, M. Martí-Renom, A. Fiser, Ben Webb, Daniel Greenblatt, Conrad Huang, T. Ferrin, A. Sali (2004)
MODBASE, a database of annotated comparative protein structure models, and associated resources.
Nucleic acids research, 32 Database issue
S. Mahrus, J. Trinidad, D. Barkan, A. Sali, A. Burlingame, J. Wells (2008)
Global Sequencing of Proteolytic Cleavage Sites in Apoptosis by Specific Labeling of Protein N Termini
Cell, 134
Z. Dosztányi, V. Csizmok, P. Tompa, I. Simon (2005)
IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content
Bioinformatics, 21 16
C. McDonald, D. Kuritzkes (1997)
Human immunodeficiency virus type 1 protease inhibitors.
Archives of internal medicine, 157 9
Olga Kotik-Kogan, Elizabeth Valentine, D. Sanfelice, M. Conte, S. Curry (2008)
Structural Analysis Reveals Conformational Plasticity in the Recognition of RNA 3′ Ends by the Human La Protein
Structure (London, England: 1993), 16
Hao Fan, D. Schneidman-Duhovny, J. Irwin, G. Dong, B. Shoichet, A. Sali (2011)
Statistical Potential for Modeling and Ranking of Protein-Ligand Interactions
Journal of chemical information and modeling, 51 12
G. Knudsen, K. Medzihradszky, K. Lim, E. Hansell, J. McKerrow (2005)
Proteomic Analysis of Schistosoma mansoni Cercarial Secretions
Molecular & Cellular Proteomics, 4
F. Davis, A. Sali (2005)
PIBASE: a comprehensive database of structurally defined protein interfaces
Bioinformatics, 21 9
Hiroki Kondo, Kazutaka Shiratsuchi, Tadashi Yoshimoto, Toyofumi Masuda, Ana Kitazono, D. Tsuru, Motoaki Anai, Mutsuo Sekiguchi, Tadashi Tanabe (1991)
Acetyl-CoA carboxylase from Escherichia coli: gene organization and nucleotide sequence of the biotin carboxylase subunit.
Proceedings of the National Academy of Sciences of the United States of America, 88
M. Jacobson, David Pincus, Chaya Rapp, T. Day, B. Honig, D. Shaw, R. Friesner (2004)
A hierarchical approach to all‐atom protein loop prediction
Proteins: Structure, 55
A. Hillisch, L. Pineda, R. Hilgenfeld (2004)
Utility of homology models in the drug discovery process
Drug Discovery Today, 9
M. Hammel, Martial Rey, Yaping Yu, R. Mani, S. Classen, Mona Liu, M. Pique, Shujuan Fang, Brandi Mahaney, M. Weinfeld, D. Schriemer, S. Lees-Miller, J. Tainer (2011)
XRCC4 Protein Interactions with XRCC4-like Factor (XLF) Create an Extended Grooved Scaffold for DNA Ligation and Double Strand Break Repair
The Journal of Biological Chemistry, 286
C. Hertz-Fowler, C. Peacock, V. Wood, M. Aslett, A. Kerhornou, P. Mooney, A. Tivey, M. Berriman, N. Hall, Kim Rutherford, J. Parkhill, A. Ivens, M. Rajandream, B. Barrell (2004)
GeneDB: a resource for prokaryotic and eukaryotic organisms
Nucleic acids research, 32 Database issue
Nelly Andrusier, R. Nussinov, H. Wolfson (2007)
FireDock: Fast interaction refinement in molecular docking
Proteins: Structure, 69
Nita Deshpande, K. Addess, Wolfgang Bluhm, Jeffrey Merino-Ott, Wayne Townsend-Merino, Qing Zhang, Charlie Knezevich, Lie Xie, Li Chen, Zukang Feng, Rachel Green, J. Flippen-Anderson, J. Westbrook, H. Berman, P. Bourne (2004)
The RCSB Protein Data Bank: a redesigned query system and relational database based on the mmCIF schema
Nucleic Acids Research, 33
I. Schomburg, Antje Chang, D. Schomburg (2002)
BRENDA, enzyme data and metabolic information
Nucleic acids research, 30 1
B. Rhead, D. Karolchik, R. Kuhn, A. Hinrichs, A. Zweig, P. Fujita, M. Diekhans, Kayla Smith, K. Rosenbloom, B. Raney, A. Pohl, Michael Pheasant, L. Meyer, K. Learned, F. Hsu, Jennifer Hillman-Jackson, R. Harte, B. Giardine, T. Dreszer, H. Clawson, G. Barber, D. Haussler, W. Kent (2009)
The UCSC Genome Browser database: update 2010
Nucleic Acids Research, 38
M. Madhusudhan, Ben Webb, M. Martí-Renom, N. Eswar, A. Sali (2009)
Alignment of multiple protein structures based on sequence and structure features.
Protein engineering, design & selection : PEDS, 22 9
P. Whitford, J. Noel, S. Gosavi, A. Schug, K. Sanbonmatsu, J. Onuchic (2009)
An all‐atom structure‐based potential for proteins: Bridging minimal models with all‐atom empirical forcefields
Proteins: Structure, 75
K. Nakai, A. Kidera, M. Kanehisa (1988)
Cluster analysis of amino acid indices for prediction of protein structure and function.
Protein engineering, 2 2
A Hamosh, AF Scott, JS Amberger, CA Bocchini, VA McKusick (2005)
Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders
Nucleic Acids Res., 33
S. Altschul, Thomas Madden, A. Schäffer, Jinghui Zhang, Zheng Zhang, W. Miller, D. Lipman (1997)
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.
Nucleic acids research, 25 17
S. Brenner, M. Levitt (2000)
Expectations from structural genomics
Protein Science, 9
E. Meng, E. Pettersen, Gregory Couch, Conrad Huang, T. Ferrin (2006)
Tools for integrated sequence-structure analysis with UCSF Chimera
BMC Bioinformatics, 7
M. Drąg, G. Salvesen (2010)
Emerging principles in protease-based drug discovery
Nature Reviews Drug Discovery, 9
K. Henrick, Zukang Feng, Wolfgang Bluhm, D. Dimitropoulos, J. Doreleijers, Shuchismita Dutta, J. Flippen-Anderson, J. Ionides, Chisa Kamada, E. Krissinel, C. Lawson, J. Markley, Haruki Nakamura, R. Newman, Yukiko Shimizu, Jawahar Swaminathan, S. Velankar, J. Ory, E. Ulrich, W. Vranken, J. Westbrook, R. Yamashita, Huanwang Yang, Jasmine Young, M. Yousufuddin, H. Berman (2007)
Remediation of the protein data bank archive
Nucleic Acids Research, 36
J. Skolnick, J. Fetrow, A. Kolinski (2000)
Structural genomics and its importance for gene function analysis
Nature Biotechnology, 18
N Eswar, B Webb, MA Marti-Renom, MS Madhusudhan, D Eramian, MY Shen, U Pieper, A Sali (2006)
Comparative protein structure modeling using Modeller
Curr. Protocols Bioinformatics/editorial board, Andreas D. Baxevanis … et al, Chapter 5
E. Pettersen, Thomas Goddard, Conrad Huang, Gregory Couch, Daniel Greenblatt, E. Meng, T. Ferrin (2004)
UCSF Chimera—A visualization system for exploratory research and analysis
Journal of Computational Chemistry, 25
Chi Zhang, Song Liu, Yaoqi Zhou (2004)
Accurate and efficient loop selections by the DFIRE‐based all‐atom statistical potential
Protein Science, 13
Min-Yi Shen, A. Sali (2006)
Statistical potential for assessment and prediction of protein structures
Protein Science, 15
M. Martí-Renom, M. Madhusudhan, A. Sali (2004)
Alignment of protein sequences by their profiles
Protein Science, 13
Tianyun Liu, Grace Tang, E. Capriotti (2011)
Comparative modeling: the state of the art and protein drug target structure prediction.
Combinatorial chemistry & high throughput screening, 14 6
Temple Smith, M. Waterman (1981)
Identification of common molecular subsequences.
Journal of molecular biology, 147 1
T. Grant, J. Luft, Jennifer Wolfley, H. Tsuruta, A. Martel, G. Montelione, E. Snell (2011)
Small angle X-ray scattering as a complementary tool for high-throughput structural studies.
Biopolymers, 95 8
Shuchismita Dutta, K. Burkhardt, Jasmine Young, G. Swaminathan, T. Matsuura, K. Henrick, Haruki Nakamura, H. Berman (2009)
Data Deposition and Annotation at the Worldwide Protein Data Bank
Molecular Biotechnology, 42
Yuzo Ueda, H. Taketomi, N. Go (1978)
Studies on protein folding, unfolding, and fluctuations by computer simulation. II. A. Three‐dimensional lattice model of lysozyme
Biopolymers, 17
(1997)
MOLSCRIPT: A program to produce both detailed and schematic plots of protein structures
K. Tomii, M. Kanehisa (1996)
Analysis of amino acid indices and mutation matrices for sequence comparison and structure prediction of proteins.
Protein engineering, 9 1
(1995)
CRYSOL-a program to evaluate X-ray solution scattering of biological macromolecules from atomic coordinates
Z. Nie, G. Bren, S. Rizza, A. Badley (2008)
HIV Protease Cleavage of Procaspase 8 is Necessary for Death of HIV-Infected Cells
The Open Virology Journal, 2
U Pieper, N Eswar, FP Davis, H Braberg, MS Madhusudhan, A Rossi, M Marti-Renom, R Karchin, BM Webb, D Eramian (2006)
MODBASE: a database of annotated comparative protein structure models and associated resources
Nucleic Acids Res., 34
G. Waldrop, I. Rayment, H. Holden (1994)
Three-dimensional structure of the biotin carboxylase subunit of acetyl-CoA carboxylase.
Biochemistry, 33 34

Publisher: Oxford University Press
Copyright: © 2008 The Author(s)
ISSN: 0305-1048
eISSN: 1362-4962
DOI: 10.1093/nar/gkn791
pmid: 18948282

Abstract

MODBASE (http://salilab.org/modbase) is a database of annotated comparative protein structure models. The models are calculated by MODPIPE, an automated modeling pipeline that relies primarily on MODELLER for fold assignment, sequence–structure alignment, model building and model assessment (http://salilab.org/modeller). MODBASE currently contains 5 152 695 reliable models for domains in 1 593 209 unique protein sequences; only models based on statistically significant alignments and/or models assessed to have the correct fold are included. MODBASE also allows users to calculate comparative models on demand, through an interface to the MODWEB modeling server (http://salilab.org/modweb). Other resources integrated with MODBASE include databases of multiple protein structure alignments (DBAli), structurally defined ligand binding sites (LIGBASE), predicted ligand binding sites (AnnoLyze), structurally defined binary domain interfaces (PIBASE) and annotated single nucleotide polymorphisms and somatic mutations found in human proteins (LS-SNP, LS-Mut). MODBASE models are also available through the Protein Model Portal (http://www.proteinmodelportal.org/).

*To whom correspondence should be addressed. Tel: +1 415 514 4227; Fax: +1 415 514 4231; Email: [email protected]

© 2008 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

INTRODUCTION

The genome sequencing efforts are providing us with complete genetic blueprints for hundreds of organisms, including humans. We are now faced with the challenge of assigning, investigating and modifying the functions of proteins encoded by these genomes. This task is generally facilitated by 3D structures of the proteins (1–3), which are best determined by experimental methods such as X-ray crystallography and NMR-spectroscopy. The number of experimentally determined structures deposited in the Protein Data Bank (PDB) more than doubled from 23 096 to 52 821 over the last 5 years (September 2008) (4). However, the number of sequences in comprehensive sequence databases, such as UniProt (5) and GenPept (6), continues to grow even more rapidly than the number of known protein structures; for example, the number of sequences in UniProt increased from 1.2 million to 6.4 million over the same period. Therefore, protein structure prediction is essential for structural characterization of sequences without experimentally determined structures.

The most accurate models are generally obtained by homology or comparative modeling (7–10), which is applicable when an experimentally determined structure related to the target sequence is available. The fraction of sequences in a genome for which comparative models can be obtained automatically varies from 20%–75% (11).

The process of comparative modeling usually requires the use of a number of programs to identify template structures, to generate sequence–structure alignments, to build the models and to evaluate them. In addition, various sequence and structure databases that are accessed by these programs are needed. Once an initial model is calculated, it is generally refined and ultimately analyzed in the context of many other related proteins and their functional annotations. Here, we describe MODBASE, a database of comparative protein structure models, and several associated databases and servers that facilitate modeling and analysis tasks for both expert and novice users. We highlight the improvements of MODBASE that were implemented since the last report (11), including updates in the modeling software, user interface and associated annotation tools. We also illustrate the utility of MODBASE by describing several projects depending on large model sets.

CONTENTS

Comparative modeling (MODELLER and MODPIPE)

Models in MODBASE are calculated using MODPIPE, our automated software pipeline for comparative modeling (12). It relies primarily on the various modules of MODELLER (13) for its functionality and is adapted for large-scale operation on a cluster of PCs using scripts written in PERL and Python. Sequence–structure matches are established using a variety of fold-assignment methods, including sequence–sequence (14), profile–sequence (15,16) and profile–profile alignments (16,17). Odds of finding a template structure are increased by using an E-value threshold of 1.0. By default, 10 models are calculated for each of the alignments (13). A representative model for each alignment is then chosen by ranking based on the atomic distance-dependent statistical potential DOPE (18). Finally, the fold of each model is evaluated using a composite model quality criterion that includes the coverage of the modeled sequence, sequence identity implied by the sequence–structure alignment, the fraction of gaps in the alignment, the compactness of the model and various statistical potential Z-scores (18–20). Only models that are assessed to have the correct fold were included in the final model sets.

A key feature of the pipeline is not prejudging the validity of sequence–structure relationships at the fold-assignment stage; instead, sequence–structure matches are assessed after the construction of the models and their evaluation. This approach enables a thorough exploration of fold assignments, sequence–structure alignments and conformations, with the aim of finding the model with the best evaluation score.

Comparative modeling web server (MODWEB)

MODWEB is our comparative modeling web server that is an integral module of MODBASE (http://salilab.org/modweb) (12). MODWEB accepts one or more sequences in the FASTA format and calculates their models using MODPIPE based on the best available templates from the PDB. Alternatively, MODWEB also accepts a protein structure as input, calculates a profile for each identifiable sequence homolog in the UniProt database, followed by modeling these homologs based on detectable templates in the PDB as well as the user-provided structure. Finally, MODWEB proposes a representative model based on model assessment. This module is a useful tool for measuring the impact of new structures, such as those generated by structural genomics efforts (21). The module allows us to assess the impact of a newly determined protein structure on the modeling of sequences of unknown structure. It is also used to identify new members of sequence superfamilies with at least one member of known structure. The results of MODWEB calculations are available to the users through the MODBASE interface as private datasets protected with passwords.

Pairwise and multiple structure alignments (DBAli)

DBAli (http://www.dbali.org/) stores pairwise comparisons of all structures in the PDB calculated using the program MAMMOTH (22), as well as multiple structure alignments generated by the SALIGN module of MODELLER-9 (23). DBAli contains approximately 1.7 billion pairwise comparisons and 12 732 family-based multiple structure alignments for 34 637 nonredundant protein chains out of 96 804 protein chains in the PDB. Additional information is provided by ModDom that assigns domain boundaries from structure and ModClus that allows the user to generate clusters of similar protein structures. These DBAli tools help users to analyze the protein structure space by establishing relationships between protein structures and their fragments in a flexible and dynamic manner.

Ligand binding sites (LIGBASE and AnnoLyze)

The LIGBASE module stores a list of the binding sites of known structure for approximately 230 000 ligands found in the PDB (24). The ligands include small molecules, such as metal ions, nucleotides, saccharides and peptides. Binding sites in all known structures are defined to consist of residues with at least one atom within 5 Å of any ligand atom. For each template structure, MODBASE also contains a list of putative binding sites that were predicted by the AnnoLyze program (25). The predictions are based on inheriting an actual binding site from any related known structure if at least 75% of the binding site residues are within 4 Å of the template residues in a global superposition of the two structures in DBALI and if at least 75% of the binding site residue types are invariant. In addition, the putative ligand binding sites in the models are then mapped via the target–template alignments. The putative ligand binding sites are stored as SITE records and the binding site membership frequency per residue is indicated in the B-factor column of the model coordinate files. Sixty-five percent of MODBASE models have at least one predicted binding site.

Protein interactions (PIBASE)

PIBASE (http://pibase.janelia.org, http://salilab.org/pibase) is a comprehensive database of structurally defined protein interfaces (26). It is composed of binary interfaces between pairs of chains or domains extracted from structures in the PDB and the Probable Quaternary Structure server PQS using domain assignments from the Structural Classification of Proteins and CATH fold classification systems. PIBASE currently contains 269 821 SCOP, 269 438 CATH, and 216 739 chain binary interfaces. A diverse set of geometrical, physiochemical and topological properties are calculated for each complex, its domains, interfaces and binding sites. The database is accessible through the web server and can also be installed locally. The software used to build PIBASE is available for download under an open-source license.

PIBASE is a convenient resource for structural information on protein–protein interactions and is easily integrated with other databases. It is currently used by the AnnoLyze annotation program (27) and the LS-SNP annotation system (28). The complexes stored in PIBASE can also be used as templates to predict the composition and structure of protein complexes using comparative modeling followed by an assessment of the modeled interface (29). This approach was applied to predict host–pathogen interactions for 10 ‘neglected’ human pathogens (30).

Single nucleotide polymorphisms and somatic mutations (LS-SNP and LS-Mut)

LS-SNP [http://karchinlab.org/LS-SNP, http://salilab.org/LS-SNP (28)] and LS-Mut [http://karchinlab.org/LS-Mut, (31,32)] are collections of annotated DNA sequence variants in protein-coding exons that result in an amino acid residue-type substitution. These resources focus on inherited genetic variants and tumor-derived somatic mutations, respectively. For LS-SNP, genomic locations of the variants are taken from the dbSNP database (33) and are mapped onto as many human proteins in the UniProt database (34) as possible. The mapping is achieved via a collection of protein-to-mRNA and mRNA-to-genome alignments produced with the Known Genes algorithm (35). For LS-Mut, somatic mutation data from tumor sequencing projects are used, consisting of transcript identifiers from RefSeq, CCDS and Ensembl (36,37), codon positions and amino acid residue-type substitutions. Our software then maps the mutations onto translated protein sequences. LS-Mut currently includes mutations from 24 advanced pancreatic cancers and 22 glioblastoma multiforme (brain) tumors. For both LS-SNP and LS-Mut, human protein sequences are aligned with homologous proteins of known structure from PDB, to build comparative protein structure models using MODPIPE. Models are constructed for all significant alignments covering a distinct region of protein sequence (E-value cutoff 0.0001). UCSF Chimera (38) is used to visualize the location of the residue substitutions on the model. We use our software and DSSP (39) to identify secondary structure elements and relative solvent accessibility of the residue positions. Putative protein and small ligand binding sites on the models are annotated with PIBASE and the LIGBASE module of MODBASE, respectively, to infer which SNPs or somatic mutations may destabilize protein quaternary structure or interfere with small molecule ligand binding.

MODBASE MODEL SETS

Models in MODBASE are organized into a number of datasets. The largest dataset contains models of all sequences in the UniProt database that are detectably related to at least one known structure in the PDB from July 2005. Because of the rapid growth of the public sequence databases, we now concentrate our efforts on adding datasets that are useful for specific projects, rather than attempt to model all known protein sequences with detectable template structures. Currently, MODBASE includes datasets of nine archaeal genomes, 13 bacterial genomes and 18 eukaryotic genomes (Table 1). Together with other project-oriented datasets, MODBASE currently contains 5 152 695 models from domains in 1 593 209 unique sequences. Next, we illustrate the utility of MODBASE by outlining several recent projects.

Structural genomics of the enolase and amidohydrolase superfamilies

Comparative models of enzymes in the amidohydrolase and enolase superfamilies have contributed to studying their substrate specificity by the Enzyme Specificity Consortium (ENSPEC) as well as selecting targets for a structural genomics effort by the New York SGX Research Center for Structural Genomics (NYSGXRC). In particular, we selected 535 target proteins from 130 genomes for high-throughput structure determination by X-ray crystallography, resulting in 61 unique structures thus far. Both template-based modeling and sequence-based modeling were essential in identifying suitable targets.

Structural genomics of membrane proteins

Comparative modeling was also applied to inform target selection for the structural genomics of membrane proteins as part of the Center for Structures of Membrane Proteins (CSMP) at UCSF (40). The goal of CSMP is to express, purify and determine the structures of representative members of integral membrane protein classes. MODBASE models were combined with an interactive web-based target selection tool to facilitate selection of biologically interesting targets with little or no structural data available. In addition, template-based modeling in MODWEB is being used to calculate how many sequences can be modeled based on newly determined CSMP structures.

ABC Transporters

ABC transporters are a large and diverse set of integral membrane proteins that couple the action of ATP binding, hydrolysis and release to substrate transport across a cellular membrane (41). Mutations in 13 of the 48 human ABC transporters are associated with monogenic human disease phenotypes (42). Additional variants are being identified in hundreds of individuals by the Pharmacogenomics of Membrane Transporters (PMT) consortium at UCSF (43). To annotate these variants, we modeled nucleotide binding and membrane spanning domains with detectably related template structures in all human ABC transporters. The dataset also includes models of sequences with disease-associated and polymorphic non-synonymous SNPs found in the nucleotide binding domains. Finally, the incomplete or unsatisfactory modeling coverage was used to suggest specific targets for a structural genomics effort on ABC transporters by CSMP.

Table 1. MODBASE datasets

Dataset/Project | Taxonomy ID | No. of Transcripts | No. of Sequences modeled | No. of Models | Sequence source
Genomes ( genomes for the TDI)
Archaea
Archaeoglobus fulgidus | 2234 | 2409 | 1794 | 3980 | NCBI
Methanococcus jannaschii | 2190 | 1785 | 1480 | 1707 | NCBI
Nanoarchaeum equitans | 160 232 | 536 | 447 | 496 | NCBI
Picrophilus torridus | 82 076 | 1535 | 1260 | 2902 | NCBI
Pyrobaculum aerophilum | 13 773 | 2600 | 1566 | 3497 | NCBI
Pyrococcus furiosus | 2261 | 2113 | 1524 | 3373 | NCBI
Sulfolobus solfataricus | 2287 | 2922 | 2006 | 4451 | NCBI
Thermoplasma volcanium | 50 339 | 1497 | 1204 | 2806 | NCBI
Thermoplasma acidophilum | | 1480 | 1220 | 2801 | NCBI
Bacteria
Bacillus subtilis | 1423 | 4105 | 3374 | 9245 | NCBI
Burkholderia mallei | 13 373 | 4798 | 3910 | 23 219 | NCBI
Clostridium tetani | 1513 | 2413 | 2158 | 5864 | NCBI
Escherichia coli | 562 | 4206 | 3150 | 5994 | NCBI
Mycobacterium leprae | 1769 | 1605 | 1178 | 2493 | OrthoMCL-DB
Mycobacterium tuberculosis | 1773 | 3991 | 2808 | 5913 | TubercuList
Mycoplasma pneumoniae | 2104 | 687 | 426 | 857 | NCBI
Pseudomonas aeruginosa | 287 | 5559 | 3806 | 9222 | NCBI
Rickettsia prowazekii | 782 | 835 | 754 | 2136 | NCBI
Staphylococcus aureus MRSA252 | 282 458 | 2635 | 1184 | 3161 | NCBI
Streptococcus pyogenes | 1314 | 1691 | 1440 | 3984 | NCBI
Wolbachia | 953 | 805 | 621 | 1873 | TIGR
Yersinia pestis | 632 | 3882 | 3215 | 8371 | NCBI
Eukaryota
Arabidopsis thaliana | 3702 | 30 707 | 23 807 | 70 494 | ENSEMBL
Brugia malayi | 6279 | 11 397 | 7850 | 23 219 | TIGR
Caenorhabditis elegans | 6239 | 22 698 | 18 996 | 52 235 | NCBI
Canis familiaris | 9615 | 30 264 | 22 614 | 65 617 | ENSEMBL
Cryptosporidium hominis | 237 895 | 3886 | 1614 | 3287 | CryptoDB
Cryptosporidium parvum | 5807 | 3806 | 1918 | 3969 | CryptoDB
Danio rerio | | Calculation in progress | | | ENSEMBL
Drosophila melanogaster | 7227 | 17 104 | 9381 | 24 683 | NCBI
H. sapiens | 9606 | 32 010 | 21 270 | 51 084 | OrthoMCL-DB
Leishmania major | 5664 | 8274 | 3975 | 8285 | GeneDB
Mus musculus | 10 090 | 30 133 | 25 338 | 70 783 | NCBI
Pan troglodytes | | Calculation in progress | | | ENSEMBL
Plasmodium falciparum | 5833 | 5363 | 2599 | 5053 | PlasmoDB
Plasmodium vivax | 5855 | 5342 | 2359 | 4670 | PlasmoDB
Rattus norvegicus | | Calculation in progress | | | ENSEMBL
Saccharomyces cerevisiae | 4932 | 6600 | 3035 | 5543 | NCBI
Schistosoma mansoni | 6183 | 25 304 | 8576 | 26 076 | GeneDB
Toxoplasma gondii | 5811 | 7793 | 1530 | 3064 | ToxoDB
Trypanosoma brucei | 5691 | 9210 | 3900 | 8054 | GeneDB
Trypanosoma cruzi | 5693 | 19 607 | 7390 | 14 858 | GeneDB
Xenopus laevis | 8355 | 27 952 | 25 457 | 69 191 | NCBI
Selected projects
CSMP datasets | | 195 235 | 184 139 | 690 255 | GENPEPT NR
NYSGXRC datasets | | 553 537 | 493 672 | 1 415 237 | GENPEPT NR
Enzyme Specificity Project | | 15 833 | 10 875 | 183 591 | SFLD/NR
ABC Transporter | | 152 | 85 | 85 |
GPCR | | 11 586 | 11 551 | 24 272 | UNIPROT
Datasets 2005 | | 1 742 816 | 1 025 196 | 2 146 830 | UNIPROT
Total (including other datasets) | | 2 608 987 | 1 593 209 | 5 152 695 |

The sequences were retrieved from ENSEMBL (36), TIGR (50), NCBI-Genbank (6), OrthoMCL-DB (51), TubercuList (52), CryptoDB (53), GeneDB (54), ToxoDB (55), SFLD (56) and UniProt (34).

Human caspases

Caspases are cysteine proteases involved in multiple apoptotic pathways. An experimental approach was recently developed to identify caspase substrates by biotinylating natural protein N-termini and selecting protein fragments containing unblocked α-amines characteristically generated upon proteolytic cleavage (44). Likely high accuracy models of protein substrates prior to cleavage were identified in the MODBASE human genome datasets and analysis of the structural properties of the cleavage sites was performed. While these sites often appeared in disordered, solvent accessible regions of the substrate as expected (45), a surprising number were found in α-helices and partially inaccessible regions, information which can now be incorporated into new algorithms for predicting additional caspase substrates.

Binding sites and ligands for the tropical disease initiative

Open source drug discovery is an alternative avenue to conventional patent-based drug development, illustrated by the proposed Tropical Disease Initiative (TDI) (http://tropicaldisease.org) (46). Open source drug discovery involves a decentralized, web-based and community-wide collaboration, in which scientists from laboratories, universities, institutes and corporations volunteer to work together for a common cause. To contribute to this effort, we calculated comparative protein structure models for 10 genomes of organisms that cause ‘neglected’ tropical diseases (Table 1). We followed up by predicting binding sites for known drugs using the AnnoLyze program (25). These predictions may be used as a starting point for experimentally testing the biological functions of the target proteins and potentially even as leads for drug discovery.

Host–pathogen protein interactions for TDI

Pathogens have evolved numerous strategies to infect their hosts, while hosts have evolved immune responses and other defenses to these foreign challenges. The vast majority of host–pathogen interactions involve protein–protein recognition, yet our current understanding of these interactions is limited. We developed and applied a computational whole-genome protocol that generates testable predictions of host–pathogen protein interactions (30) (http://salilab.org/hostpathogen). The protocol first scans the host and pathogen genomes for proteins with similarity to known protein complexes, then assesses these puta-

G-Protein Coupled receptors

G-protein coupled receptors (GPCR) are a large family of pharmacologically important transmembrane receptors that are involved in the recognition of a wide variety of extra-cellular ligands. It has been estimated that this family of proteins is the target for about half of all currently marketed drugs. Atomic structures are known for only three sub-families of GPCRs, including light-sensitive rhodopsins, β1 and β2 adrenergic receptors that all belong to the Class A Rhodopsin-like family (GPCRDB nomenclature). The GPCR dataset in MODBASE consists of models for approximately 12 000 UniProt sequences that are related to one of these structures. The models span several sub-families of the Class A Rhodopsin-like family, including aminergic, peptide, hormone, opsin, olfactory and nucleotide receptors. These models are used for ligand docking and virtual screening computations by DOCK (47).

ACCESS AND INTERFACE

The main access to MODBASE is through its web interface at http://salilab.org/modbase, by querying with Uniprot and GI identifiers, gene names, annotation keywords, PDB codes, datasets, organisms, sequence similarity to the modeled sequences (BLAST) and model-specific criteria such as model reliability, model size and target–template sequence identity. Additionally, it is possible to retrieve coordinate files, alignment files and ligand-binding information in text files. Select genome datasets are also available from our ftp server (ftp://salilab.org/databases/modbase/projects).

The output of a search is displayed on pages with varying amounts of information about the modeled sequences, template structures, alignments and functional annotations. An example of the output from a search resulting in one model is shown in Figure 1. A ribbon diagram of the model with the highest target–template sequence identity is displayed by default, together with details of the modeling calculation. Ribbon thumbprints of additional models for this sequence link to corresponding pages with more information. The ribbon diagrams are generated on the fly using Molscript (48) and Raster3D (49). A pull-down menu provides links to additional functionality: the ligand-binding module, the SNP module, retrieval of coordinate and alignment files, as well as molecular visualization by Chimera that allows the user to display template and model coordinates together with their alignment. If mutation information is available for a protein sequence, links to the details are provided in the cross-references section.
tive interactions, using structure if available, and, ﬁnally, Additionally, cross-references to various other databases, ﬁlters the remaining interactions using biological context, including PDB, UniProt, SwissProt/TrEMBL, PubMed such as the stage-speciﬁc expression of pathogen proteins and the UCSC Genome Browser, are given. Other and tissue expression of host proteins. The technique was MODBASE pages provide overviews of more than one applied to 10 pathogens, using their MODBASE model sequence or structure. All MODBASE pages are intercon- datasets. Several speciﬁc predictions have been made that nected to facilitate easy navigation between diﬀerent views. warrant experimental follow-up, including interactions from previously characterized mechanisms, such as Access through external databases cytoadhesion and protease inhibition, as well as suspected interactions in hypothesized networks, such as apoptotic MODBASE models in academic and public datasets are pathways. directly accessible from several other databases, including D352 Nucleic Acids Research, 2009, Vol. 37, Database issue Figure 1. MODBASE Model Details page (Example Q9NP58 from the human genome dataset): this page provides links to all models for this speciﬁc sequence. A ribbon diagram of the primary model, database annotations and modeling details are displayed. Links to additional models for diﬀerent target regions or models from other datasets are displayed as thumbprints. The pull-down menu provides access to alternative MODBASE views and other types of information (if available), such as data about mutations and putative ligand binding sites. The cross-references section contains links to relevant internal and external databases. For this particular sequence, mutation data are available from LS-Mut, LS-SNP and ABC SNPs. 
the SwissProt/TrEMBL sequence pages, UniProt, PIR's iProClass, EBI's InterPro, the UCSC Genome Browser and PubMed (LinkOut). Importantly, MODBASE models are also accessible through the Protein Model Portal (http://proteinmodelportal.org), a module of the Protein Structure Initiative Knowledgebase (PSI KB). The Model Portal has the potential to become the single entry point for users interested in experimentally determined or computationally predicted models. For a user query, the portal will interrogate participating source model databases and modeling servers to provide a comprehensive view of all available models of the query sequence.

FUTURE DIRECTIONS

MODBASE will grow by adding models calculated on demand by external users (using MODWEB) as well as our own calculations of model datasets that are needed for our research projects (using MODPIPE, MODWEB or MODELLER). These updates will reflect improvements in the methods and software used for calculating the models as well as the new template structures in the PDB and new sequences in UniProt. In the future, we expect that most of the users will access MODBASE models through the Protein Model Portal.

CITATION

Users of MODBASE are requested to cite this article in their publications.

ACKNOWLEDGEMENTS

We are grateful to Tom Ferrin, Daniel Greenblatt, Conrad Huang and Tom Goddard for CHIMERA and contributing to the MODBASE/CHIMERA interface. For linking to MODBASE from their databases, we thank Torsten Schwede (Protein Model Portal), David Haussler and Jim Kent (UCSC Genome Browser), Amos Bairoch (SwissProt/TrEMBL), Rolf Apweiler (InterPro), Patsy Babbitt (SFLD) and Cathy Wu (PIR/iProClass). We are also grateful for computing hardware gifts from Mike Homer, Ron Conway, NetApp, IBM, Hewlett Packard and Intel.

FUNDING

National Institutes of Health (R01 GM54762, U54 GM074945, U54 GM074929, U01 GM61390, P01 GM71790 to A.S., GM08284 to D.E., NSF EF 0626651); the Sandler Family Supporting Foundation (to A.S.); Susan G. Komen Foundation (KG080137 to R.K.); Spanish Ministerio de Educacion y Ciencia (BIO2007/66670 to M.A.M-R). Funding for open access charge: U54 GM074945.

REFERENCES

1. Domingues,F.S., Koppensteiner,W.A. and Sippl,M.J. (2000) The role of protein structure in genomics. FEBS Lett., 476, 98–102.
2. Brenner,S.E. and Levitt,M. (2000) Expectations from structural genomics. Protein Sci., 9, 197–200.
3. Skolnick,J., Fetrow,J.S. and Kolinski,A. (2000) Structural genomics and its importance for gene function analysis. Nat. Biotechnol., 18, 283–287.
4. Deshpande,N., Addess,K.J., Bluhm,W.F., Merino-Ott,J.C., Townsend-Merino,W., Zhang,Q., Knezevich,C., Xie,L., Chen,L., Feng,Z. et al. (2005) The RCSB Protein Data Bank: a redesigned query system and relational database based on the mmCIF schema. Nucleic Acids Res., 33, D233–D237.
5. Bairoch,A., Apweiler,R., Wu,C.H., Barker,W.C., Boeckmann,B., Ferro,S., Gasteiger,E., Huang,H., Lopez,R., Magrane,M. et al. (2005) The Universal Protein Resource (UniProt). Nucleic Acids Res., 33, D154–D159.
6. Benson,D.A., Karsch-Mizrachi,I., Lipman,D.J., Ostell,J. and Wheeler,D.L. (2008) GenBank. Nucleic Acids Res., 36, D25–D30.
7. Baker,D. and Sali,A. (2001) Protein structure prediction and structural genomics. Science, 294, 93–96.
8. Wallner,B. and Elofsson,A. (2005) All are not equal: a benchmark of different homology modeling programs. Protein Sci., 14, 1315–1327.
9. Hillisch,A., Pineda,L.F. and Hilgenfeld,R. (2004) Utility of homology models in the drug discovery process. Drug Discov. Today, 9, 659–669.
10. Eswar,N., Webb,B., Marti-Renom,M.A., Madhusudhan,M.S., Eramian,D., Shen,M.Y., Pieper,U. and Sali,A. (2007) Comparative protein structure modeling using MODELLER. Curr. Protocols Protein Sci./editorial board, John E. Coligan ... et al., Chapter 2, Unit 29.
11. Pieper,U., Eswar,N., Davis,F.P., Braberg,H., Madhusudhan,M.S., Rossi,A., Marti-Renom,M., Karchin,R., Webb,B.M., Eramian,D. et al. (2006) MODBASE: a database of annotated comparative protein structure models and associated resources. Nucleic Acids Res., 34, D291–D295.
12. Eswar,N., John,B., Mirkovic,N., Fiser,A., Ilyin,V.A., Pieper,U., Stuart,A.C., Marti-Renom,M.A., Madhusudhan,M.S., Yerkovich,B. et al. (2003) Tools for comparative protein structure modeling and analysis. Nucleic Acids Res., 31, 3375–3380.
13. Sali,A. and Blundell,T.L. (1993) Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol., 234, 779–815.
14. Smith,T.F. and Waterman,M.S. (1981) Identification of common molecular subsequences. J. Mol. Biol., 147, 195–197.
15. Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
16. Eswar,N., Webb,B., Marti-Renom,M.A., Madhusudhan,M.S., Eramian,D., Shen,M.Y., Pieper,U. and Sali,A. (2006) Comparative protein structure modeling using Modeller. Curr. Protocols Bioinformatics/editorial board, Andreas D. Baxevanis ... et al., Chapter 5, Unit 56.
17. Marti-Renom,M.A., Madhusudhan,M.S. and Sali,A. (2004) Alignment of protein sequences by their profiles. Protein Sci., 13, 1071–1087.
18. Shen,M.Y. and Sali,A. (2006) Statistical potential for assessment and prediction of protein structures. Protein Sci., 15, 2507–2524.
19. Eramian,D., Shen,M.Y., Devos,D., Melo,F., Sali,A. and Marti-Renom,M.A. (2006) A composite score for predicting errors in protein structure models. Protein Sci., 15, 1653–1666.
20. Melo,F., Sanchez,R. and Sali,A. (2002) Statistical potentials for fold assessment. Protein Sci., 11, 430–448.
21. Chance,M.R., Fiser,A., Sali,A., Pieper,U., Eswar,N., Xu,G., Fajardo,J.E., Radhakannan,T. and Marinkovic,N. (2004) High-throughput computational and experimental techniques in structural genomics. Genome Res., 14, 2145–2154.
22. Ortiz,A.R., Strauss,C.E. and Olmea,O. (2002) MAMMOTH (matching molecular models obtained from theory): an automated method for model comparison. Protein Sci., 11, 2606–2621.
23. Marti-Renom,M.A., Ilyin,V.A. and Sali,A. (2001) DBAli: a database of protein structure alignments. Bioinformatics, 17, 746–747.
24. Stuart,A.C., Ilyin,V.A. and Sali,A. (2002) LigBase: a database of families of aligned ligand binding sites in known protein sequences and structures. Bioinformatics, 18, 200–201.
25. Marti-Renom,M.A., Rossi,A., Al-Shahrour,F., Davis,F.P., Pieper,U., Dopazo,J. and Sali,A. (2007) The AnnoLite and AnnoLyze programs for comparative annotation of protein structures. BMC Bioinformatics, 8(Suppl. 4), S4.
26. Davis,F.P. and Sali,A. (2005) PIBASE: a comprehensive database of structurally defined protein interfaces. Bioinformatics, 21, 1901–1907.
27. Marti-Renom,M.A., Pieper,U., Madhusudhan,M.S., Rossi,A., Eswar,N., Davis,F.P., Al-Shahrour,F., Dopazo,J. and Sali,A. (2007) DBAli tools: mining the protein structure space. Nucleic Acids Res., 35, D393–D397.
28. Karchin,R., Diekhans,M., Kelly,L., Thomas,D.J., Pieper,U., Eswar,N., Haussler,D. and Sali,A. (2005) LS-SNP: large-scale annotation of coding non-synonymous SNPs based on multiple information sources. Bioinformatics, 21, 2814–2820.
29. Davis,F.P., Braberg,H., Shen,M.Y., Pieper,U., Sali,A. and Madhusudhan,M.S. (2006) Protein complex compositions predicted by structural similarity. Nucleic Acids Res., 34, 2943–2952.
30. Davis,F.P., Barkan,D.T., Eswar,N., McKerrow,J.H. and Sali,A. (2007) Host pathogen protein interactions predicted by comparative modeling. Protein Sci., 16, 2585–2596.
31. Jones,S., Zhang,X., Parsons,D.W., Lin,J.C., Leary,R.J., Angenendt,P., Mankoo,P., Carter,H., Kamiyama,H., Jimeno,A. et al. (2008) Core signaling pathways in human pancreatic cancers revealed by global genomic analyses. Science, 321, 1801–1806.
32. Parsons,D.W., Jones,S., Zhang,X., Lin,J.C., Leary,R.J., Angenendt,P., Mankoo,P., Carter,H., Siu,I.M., Gallia,G.L. et al. (2008) An integrated genomic analysis of human Glioblastoma multiforme. Science, 321, 1807–1812.
33. Sherry,S.T., Ward,M.H., Kholodov,M., Baker,J., Phan,L., Smigielski,E.M. and Sirotkin,K. (2001) dbSNP: the NCBI database of genetic variation. Nucleic Acids Res., 29, 308–311.
34. Wu,C.H., Apweiler,R., Bairoch,A., Natale,D.A., Barker,W.C., Boeckmann,B., Ferro,S., Gasteiger,E., Huang,H., Lopez,R. et al. (2006) Nucleic Acids Res., 34, D187–191.
35. Hsu,F., Kent,W.J., Clawson,H., Kuhn,R.M., Diekhans,M. and Haussler,D. (2006) The UCSC known genes. Bioinformatics, 22, 1036–1046.
36. Flicek,P., Aken,B.L., Beal,K., Ballester,B., Caccamo,M., Chen,Y., Clarke,L., Coates,G., Cunningham,F., Cutts,T. et al. (2008) Ensembl 2008. Nucleic Acids Res., 36, D707–D714.
37. Wheeler,D.L., Barrett,T., Benson,D.A., Bryant,S.H., Canese,K., Chetvernin,V., Church,D.M., DiCuccio,M., Edgar,R., Federhen,S. et al. (2008) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., 36, D13–D21.
38. Pettersen,E.F., Goddard,T.D., Huang,C.C., Couch,G.S., Greenblatt,D.M., Meng,E.C. and Ferrin,T.E. (2004) UCSF Chimera: a visualization system for exploratory research and analysis. J. Comput. Chem., 25, 1605–1612.
39. Kabsch,W. and Sander,C. (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22, 2577–2637.
40. Li,M., Hays,F.A., Roe-Zurz,Z., Vuong,L., Kelly,L., Robbins,R., Ho,C.M., Pieper,U., O'Connell,J., Miercke,L.J. et al. (2008) Eukaryotic Integral Membrane Protein Production For Structural Genomics. J. Mol. Biol., in press.
41. Dean,M., Rzhetsky,A. and Allikmets,R. (2001) The human ATP-binding cassette (ABC) transporter superfamily. Genome Res., 11, 1156–1166.
42. Hamosh,A., Scott,A.F., Amberger,J.S., Bocchini,C.A. and McKusick,V.A. (2005) Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res., 33, D514–D517.
43. Leabman,M.K., Huang,C.C., DeYoung,J., Carlson,E.J., Taylor,T.R., de la Cruz,M., Johns,S.J., Stryke,D., Kawamoto,M., Urban,T.J. et al. (2003) Natural variation in human membrane transporter genes reveals evolutionary and functional constraints. Proc. Natl Acad. Sci. USA, 100, 5896–5901.
44. Mahrus,S., Trinidad,J.C., Barkan,D.T., Sali,A., Burlingame,A.L. and Wells,J.A. (2008) Global sequencing of proteolytic cleavage sites in apoptosis by specific labeling of protein N termini. Cell, 134, 866–876.
45. Hubbard,S.J., Campbell,S.F. and Thornton,J.M. (1991) Molecular recognition. Conformational analysis of limited proteolytic sites and serine proteinase protein inhibitors. J. Mol. Biol., 220, 507–530.
46. Maurer,S.M., Rai,A. and Sali,A. (2004) Finding cures for tropical diseases: is open source an answer? PLoS Med., 1, e56.
47. Hermann,J.C., Marti-Arbona,R., Fedorov,A.A., Fedorov,E., Almo,S.C., Shoichet,B.K. and Raushel,F.M. (2007) Structure-based activity prediction for an enzyme of unknown function. Nature, 448, 775–779.
48. Kraulis,P.J. (1991) MOLSCRIPT: a program to produce both detailed and schematic plots of protein structures. J. Appl. Crystallogr., 24, 946–950.
49. Merritt,E.A. and Bacon,D.J. (1997) Raster3D: photorealistic molecular graphics. Methods Enzymol., 277, 505–524.
50. Ghedin,E., Wang,S., Spiro,D., Caler,E., Zhao,Q., Crabtree,J., Allen,J.E., Delcher,A.L., Guiliano,D.B., Miranda-Saavedra,D. et al. (2007) Draft genome of the filarial nematode parasite Brugia malayi. Science, 317, 1756–1760.
51. Chen,F., Mackey,A.J., Stoeckert,C.J. Jr. and Roos,D.S. (2006) OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups. Nucleic Acids Res., 34, D363–D368.
52. Cole,S.T. (1999) Learning from the genome sequence of Mycobacterium tuberculosis H37Rv. FEBS Lett., 452, 7–10.
53. Heiges,M., Wang,H., Robinson,E., Aurrecoechea,C., Gao,X., Kaluskar,N., Rhodes,P., Wang,S., He,C.Z., Su,Y. et al. (2006) CryptoDB: a Cryptosporidium bioinformatics resource update. Nucleic Acids Res., 34, D419–D422.
54. Hertz-Fowler,C., Peacock,C.S., Wood,V., Aslett,M., Kerhornou,A., Mooney,P., Tivey,A., Berriman,M., Hall,N., Rutherford,K. et al. (2004) GeneDB: a resource for prokaryotic and eukaryotic organisms. Nucleic Acids Res., 32, D339–D343.
55. Gajria,B., Bahl,A., Brestelli,J., Dommer,J., Fischer,S., Gao,X., Heiges,M., Iodice,J., Kissinger,J.C., Mackey,A.J. et al. (2008) ToxoDB: an integrated Toxoplasma gondii database resource. Nucleic Acids Res., 36, D553–D556.
56. Pegg,S.C., Brown,S.D., Ojha,S., Seffernick,J., Meng,E.C., Morris,J.H., Chang,P.J., Huang,C.C., Ferrin,T.E. and Babbitt,P.C. (2006) Leveraging enzyme structure-function relationships for functional inference and experimental design: the structure-function linkage database. Biochemistry, 45, 2545–2555.

Journal

Nucleic Acids Research – Oxford University Press

Published: Jan 23, 2009
