bioDBnet: the biological database network

BIOINFORMATICS APPLICATIONS NOTE
Vol. 25 no. 4 2009, pages 555–556
doi:10.1093/bioinformatics/btn654

Databases and ontologies

Uma Mudunuri, Anney Che, Ming Yi and Robert M. Stephens
Advanced Biomedical Computing Center, Advanced Technology Program, SAIC-Frederick Inc., NCI-Frederick, Frederick, MD 21702, USA

Received on October 7, 2008; revised on December 5, 2008; accepted on December 17, 2008
Advance Access publication January 7, 2009
Associate Editor: Alex Bateman

© The Author 2009. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

ABSTRACT

Summary: bioDBnet is an online web resource that provides interconnected access to many types of biological databases. It has integrated many of the most commonly used biological databases and in its current state has 153 database identifiers (nodes) covering all aspects of biology including genes, proteins, pathways and other biological concepts. bioDBnet offers various ways to work with these databases, including conversions, extensive database reports and custom navigation, and has various tools to enhance the quality of the results. Importantly, bioDBnet is updated regularly, providing access to the most recent releases of each individual database.

Availability: http://biodbnet.abcc.ncifcrf.gov

Contact: stephensr@mail.nih.gov

Supplementary information: Supplementary data are available at Bioinformatics online.

1 INTRODUCTION

Deriving maximal biological insights from diverse platforms of high-throughput data frequently requires the data to be converted amongst various database identifiers. There are many online resources which offer cross-references to various external databases, but the type and coverage of those vary depending on the resource (http://www.ensembl.org, http://www.genome.jp/kegg, http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene, http://www.uniprot.org). Also, the input identifiers for these resources are limited to a narrow subset of potential database types. Therefore, one has to visit many web resources before getting the required conversions, as the input data vary widely from one experimental platform to another and also depend on other factors such as the species.

bioDBnet offers a convenient solution for such database-related queries by integrating many biological databases. With bioDBnet, extensive biological reports can be obtained for many types of biological identifiers, and data can be converted between and within various biological identifiers. At the same time it does not require learning new technologies or terminologies, and it has extensive background information on the technique and the coverage of the databases so as to allow users to customize and use the resource in the most productive way.

2 FUNCTIONALITY

The Advanced Biomedical Computing Center maintains local copies of many widely used biological databases, and bioDBnet was created with the intention of integrating all of these databases (http://biodbnet.abcc.ncifcrf.gov/dbInfo/faq.php#net2). It is built by a data warehouse-based integration where the connections are formed by exploiting the existing cross-references in the local copies of various public data sources, mainly Ensembl, UniProt and EntrezGene. The current release of bioDBnet is built by integrating 20 biological databases and recognizes more than 100 different database types from the molecular biology database collection (http://www.oxfordjournals.org/nar/database/subcat/3/8). It has 153 database identifiers (nodes) connected by 554 cross-references (edges) (http://biodbnet.abcc.ncifcrf.gov/dbInfo/netGraph.php). It includes gene-centric database identifiers like EntrezGene Gene ID, Ensembl Gene ID; protein identifiers like UniProt Accession, Ensembl Protein ID; annotations like GO, InterPro; microarray identifiers from Affymetrix, Agilent; sequence identifiers from GenBank, RefSeq; and pathway identifiers from Biocarta and KEGG.

Various options within bioDBnet offer a variety of functionalities to suit different user needs. All of these tools support batch queries, and the results are downloadable as both Excel and text files. In addition, the identifiers in the results are linked to external resources wherever applicable.

Brief descriptions of the main menu options: 'db2db' is a conversion tool that lets users convert from one type of biological database identifier to another. 'dbFind' allows users to convert from one identifier to any of the standard identifiers in bioDBnet without specifying the actual type of input. It can be used when the exact type of input is not known or with a mixture of database identifiers (Fig. 1, i). 'dbReport' generates an all-inclusive report with every possible annotation for a given type of input (Fig. 1, ii). Wherever applicable the reports have links to polyBrowse (http://pbrowse2.abcc.ncifcrf.gov), a gbrowse-based browser (Stein et al., 2002), the UCSC genome browser (http://genome.ucsc.edu, Kent et al., 2002) for visualizing data on the chromosomes and to DAVID (http://david.abcc.ncifcrf.gov, Huang da et al., 2007) for functional annotation clustering. 'dbWalk' is a customizable database conversion tool giving the users total control of the type of conversion and the intermediate databases (Fig. 1, iii). This allows a user to incorporate preferences into the path followed, based on the data coverage (http://biodbnet.abcc.ncifcrf.gov/dbInfo/netGraphTbl.php) or the user's confidence in the data quality from a particular database.

Fig. 1. Partial screen shots of the results from bioDBnet. (i) dbFind to get the types of a mixture of identifiers and converting them to a single type, in this case Gene IDs. (ii) Partial report for Entrezgene Identifier '1' from dbReport. (iii) dbWalk results page displaying the path to get the mouse homologs for human Affymetrix Identifiers.

bioDBnet also offers various supporting tools to enhance connectivity of biological knowledge and annotations. 'bioText' can be used to text mine Gene, UniProt or GO (Gene Ontology) annotations. 'goTree' displays the GO hierarchy for any GO accession in a top-down manner starting with the input accession to all its parents. Given any type of database identifiers, the 'chrView' tool tries to find their chromosomal location and displays the results in a movable and zoomable SVG image. This provides for a whole-genome view with an ability to detect clusters. 'orgTaxon' provides an easy-to-use search interface to find the taxon ID of any organism.

3 ADVANTAGES

Compared with other integration approaches, bioDBnet is not developed by a federated architecture for integration nor is it a text index-based retrieval system like SRS (Etzold et al., 1996). The data warehouse-based integration of bioDBnet allows for batch queries of database identifiers using SQL and at the same time does not preclude linking over the web once the internal network is constructed.

Unlike other data warehouse-based integrations like Atlas (Shah et al., 2005) and BioWarehouse (Lee et al., 2006), bioDBnet does not have a common integration schema but incorporates the semantics of the data in a separate layer. This approach keeps the database layer independent of the integration layer, which in turn offers greater flexibility and also allows easy updates of the underlying databases. At any given point the databases in bioDBnet are at the most a week apart from the current version of their publicly available database counterpart (http://biodbdev.abcc.ncifcrf.gov/dbInfo/faq.php#data3).

The network-based approach of bioDBnet for linking disparate data sources is similar in part to BIOZON (Birkland and Yona, 2006) and GenMapper (Do and Rahm, 2004), but bioDBnet, compared with the current versions of both these integration tools (http://www.biozon.org/, http://ducati.izbi.unileipzig.de:8080/GenMapper/), has a far wider coverage, is more flexible and has an easy-to-use web interface.

None of the above mentioned integration tools, other than bioDBnet, offers a way to unify multiple types of identifiers. For example, database cross-references from GO to human protein identifiers contain a mixture of RefSeq, Ensembl, UniProt, H-Inv (Human-Invitational Database) and Vega protein identifiers (http://www.geneontology.org/GO.current.annotations.shtml). It would be highly beneficial to convert these into a unified identifier so as to use them in microarray or pathway analysis software. bioDBnet handles such conversions with ease through dbFind (Fig. 1, i). bioDBnet also allows for custom queries not performed by any other web integration tool, like retrieving homologs, obtained by the integration of the HomoloGene database, for any type of identifiers (Fig. 1, iii), not just Gene and Protein identifiers (refer to Supplementary Material for comparisons between bioDBnet and other database integration tools). To our knowledge, compared with any other integration tool, bioDBnet has the most coverage of biological databases (http://biodbnet.abcc.ncifcrf.gov/dbInfo/netNodes.php). This allows for more than 1000 types of conversions of biological annotations and identifiers through db2db and extensive reports for more than 40 different types of identifiers through dbReport.

4 CONCLUSIONS

We think that bioDBnet is going to be very useful for any kind of biological data analysis, as it offers a portal where biological summaries with identifiers for sequence, annotation and feature information can be obtained for multiple biological identifiers, both in terms of number and variety.

bioDBnet is easily extendable to include additional databases and, depending on user interest, many other databases will be added to the network. At this point bioDBnet can handle a few but not all obsolete identifiers. In the next release of bioDBnet we intend to have maximum possible coverage of these identifiers along with literature mining and the ability to extend user data files with additional annotations so as to provide for a complete biological database network.

ACKNOWLEDGEMENTS

The authors would like to thank all the members in the Bioinformatics Support Group at ABCC, especially Gary Smythers, David Liu and Natalia Volfovsky, for their help with bioDBnet.

Funding: National Cancer Institute; National Institutes of Health (contract N01-CO-12400).

Conflict of Interest: none declared.

REFERENCES

Birkland,A. and Yona,G. (2006) BIOZON: a hub of heterogenous biological data. Nucleic Acids Res., 34, D235–D242.
Do,H.H. and Rahm,E. (2004) Flexible integration of Molecular-biological Annotation Data: The GenMapper Approach. In Proceedings of Advances in Database Technology, Heraklion, Greece, Springer Berlin/Heidelberg.
Etzold,T. et al. (1996) SRS: information retrieval system for molecular biology data banks. Methods Enzymol., 266, 114–128.
Huang da,W. et al. (2007) The DAVID Gene Functional Classification Tool: a novel biological module-centric algorithm to functionally analyze large gene lists. Genome Biol., 8, R183.
Kent,W.J. et al. (2002) The human genome browser at UCSC. Genome Res., 12, 996–1006.
Lee,T.J. et al. (2006) BioWarehouse: a bioinformatics database warehouse toolkit. BMC Bioinformatics, 7, 170.
Shah,S.P. et al. (2005) Atlas - a data warehouse for integrative bioinformatics. BMC Bioinformatics, 6, 34.
Stein,L.D. et al. (2002) The generic genome browser: a building block for a model organism system database. Genome Res., 12, 1599–1610.
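
As an illustration of the network model described under 2 FUNCTIONALITY (identifier types as nodes, cross-references as edges), the following minimal Python sketch shows how a conversion path between two identifier types might be found by walking such a graph. It is not bioDBnet's implementation: the edge list is a hypothetical toy subset, the edges are assumed to be undirected, and breadth-first search is used as a stand-in because the source does not describe how db2db or dbWalk actually select a path.

```python
from collections import deque

# Hypothetical toy subset of an identifier network. The real bioDBnet
# network is reported to have 153 nodes and 554 edges; the specific
# edges listed here are assumptions made for illustration only.
EDGES = [
    ("Affy ID", "Gene ID"),
    ("Gene ID", "UniProt Accession"),
    ("Gene ID", "Ensembl Gene ID"),
    ("Gene ID", "HomoloGene ID"),
    ("Ensembl Gene ID", "Ensembl Protein ID"),
    ("UniProt Accession", "GO ID"),
]


def build_graph(edges):
    """Build an undirected adjacency map (assumption: edges work both ways)."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, source, target):
    """Return the shortest list of identifier types from source to target.

    Uses breadth-first search; returns None when either node is unknown
    or no path exists.
    """
    if source not in graph or target not in graph:
        return None
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        # Sort neighbours so the traversal order is deterministic.
        for neighbour in sorted(graph[node]):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


if __name__ == "__main__":
    network = build_graph(EDGES)
    print(shortest_path(network, "Affy ID", "Ensembl Protein ID"))
```

In this toy network the only route from "Affy ID" to "Ensembl Protein ID" passes through "Gene ID" and "Ensembl Gene ID". A dbWalk-style query would differ in that the user, rather than the search, chooses the intermediate nodes.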

Bioinformatics, Volume 25 (4): 2 – Jan 7, 2009

Publisher
Oxford University Press
ISSN
1367-4803
eISSN
1460-2059
DOI
10.1093/bioinformatics/btn654
pmid
19129209
Journal

Bioinformatics, Oxford University Press

Published: Jan 7, 2009
