DrugBank: a knowledgebase for drugs, drug actions and drug targets

Published online 29 November 2007
Nucleic Acids Research, 2008, Vol. 36, Database issue D901–D906
doi:10.1093/nar/gkm958

David S. Wishart*, Craig Knox, An Chi Guo, Dean Cheng, Savita Shrivastava, Dan Tzur, Bijaya Gautam and Murtaza Hassanali

Department of Computing Science and Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada T6G 2E8

Received September 15, 2007; Revised October 11, 2007; Accepted October 15, 2007

*To whom correspondence should be addressed. Tel: 780-492-0383; Fax: 780-492-1071; Email: [email protected]

© 2007 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT

DrugBank is a richly annotated resource that combines detailed drug data with comprehensive drug target and drug action information. Since its first release in 2006, DrugBank has been widely used to facilitate in silico drug target discovery, drug design, drug docking or screening, drug metabolism prediction, drug interaction prediction and general pharmaceutical education. The latest version of DrugBank (release 2.0) has been expanded significantly over the previous release. With 4900 drug entries, it now contains 60% more FDA-approved small molecule and biotech drugs including 10% more ‘experimental’ drugs. Significantly, more protein target data has also been added to the database, with the latest version of DrugBank containing three times as many non-redundant protein or drug target sequences as before (1565 versus 524). Each DrugCard entry now contains more than 100 data fields with half of the information being devoted to drug/chemical data and the other half devoted to pharmacological, pharmacogenomic and molecular biological data. A number of new data fields, including food–drug interactions, drug–drug interactions and experimental ADME data, have been added in response to numerous user requests. DrugBank has also significantly improved the power and simplicity of its structure query and text query searches. DrugBank is available at http://www.drugbank.ca

INTRODUCTION

There are essentially two kinds of online drug resources: (i) clinically oriented drug ‘encyclopedias’ and (ii) chemically oriented drug databases. Examples of some of the better clinically oriented drug resources include PharmGKB (1) and RxList (2). These knowledgebases tend to offer very detailed clinical information about selected drugs (their pharmacology, metabolism and indications) with their data content being targeted more towards pharmacists, physicians or consumers. Examples of chemically oriented drug (or small molecule) databases include the TTD (3), the Druggable Genome database (4), KEGG (5), PubChem (6) and ChEBI (7). These excellent databases provide synoptic data (5–10 data fields per entry) about the nomenclature, structure and/or physical properties of large numbers of small molecule drugs and, in some cases, their drug targets. Chemically oriented drug databases are typically oriented towards medicinal chemists, biochemists and molecular biologists. As a general rule, chemically oriented drug databases aim for very broad coverage at the expense of depth, while clinically oriented drug resources aim for far more depth (albeit in English sentences) at the expense of coverage.

In an effort to bridge the ‘depth versus breadth’ gap between clinically oriented drug resources and chemically oriented drug databases, we developed DrugBank (8). First released in 2006, DrugBank was designed to serve as a comprehensive, fully searchable in silico drug resource that linked sequence, structure and mechanistic data about drug molecules (including biotech drugs) with sequence, structure and mechanistic data about their drug targets. As a clinically oriented drug encyclopedia, DrugBank is able to provide detailed, up-to-date, quantitative, analytic or molecular-scale information about drugs, drug targets and the biological or physiological consequences of drug actions. As a chemically oriented drug database, DrugBank is able to provide many built-in tools for viewing, sorting, searching and extracting text, image, sequence or structure data. Since its initial release, DrugBank has been used in a wide range of applications including in silico drug discovery (9), drug ‘rejuvenation’ (10), drug docking or screening (11), drug metabolism prediction (12), drug target prediction (13) and general pharmaceutical education. Feedback from users has led to many excellent suggestions on how to expand and enhance DrugBank’s offerings. These requests also led to the development of several new software tools to improve the entry, export and annotation of DrugBank’s data. Here, we wish to report on these developments as well as many additions and improvements appearing in the latest version of DrugBank (release 2.0).

DATABASE ENHANCEMENTS

Details relating to DrugBank’s overall design, querying capabilities, curation protocols, quality assurance and drug selection criteria have been described previously (8). These have largely remained the same between release 1.0 and 2.0. Here, we shall focus primarily on describing the changes and enhancements made to the database and to the annotation processes for release 2.0. More specifically, we will describe: (i) enhancements to DrugBank’s size and coverage; (ii) expanded database linkages; (iii) data field additions; (iv) improvements in data querying and data viewing and (v) improvements to DrugBank’s data handling processes.

Expanded database size and coverage

A detailed content comparison between DrugBank (release 1.0) and DrugBank (release 2.0) is provided in Table 1. As seen here, the latest release of DrugBank now has detailed information on 1467 FDA-approved drugs corresponding to 28 447 brand names and synonyms. This represents an expansion of nearly 60% over what was previously contained in the database. The latest DrugBank release also includes 123 biotech (peptide or protein) drugs and 69 nutraceuticals (nutritional supplements), which corresponds to an increase of 10% over what was in the previous DrugBank release. While many of these additions represent newly approved drugs (about 50 new drugs are approved each year), a number of these new entries are little known, hard-to-find or infrequently prescribed drugs that are not contained in most drug databases. To the best of our knowledge, DrugBank now contains all (or almost all) drugs that have been approved in North America, Europe and Asia. In addition, DrugBank’s collection of experimental or unapproved drug (or drug-like) compounds, which is primarily derived from the PDB’s Ligand database, has expanded to include 3116 compounds, compared to 2896 compounds in the first release. We are pleased to note that these experimental drugs have now been more completely annotated, via BioSpider (14), than in the previous DrugBank release.

Table 1. Comparison between the data content in DrugBank (release 1.0) versus DrugBank (release 2.0)

Category                                          Release 1.0    Release 2.0
No. of FDA-approved small molecule drugs                  841           1344
No. of biotech drugs                                      113            123
No. of nutraceutical drugs                                 61             69
No. of withdrawn drugs                                      0             57
No. of illicit drugs                                        0            188
No. of experimental drugs                                2894           3116
No. of total small molecule drugs                        3796           4774
No. of total drugs                                       3909           4897
No. of names/brand names/synonyms                      18 304         28 447
No. of data fields                                         88            108
No. of food–drug interactions                               0            714
No. of drug–drug interactions                               0         13 242
No. of ADMET parameters (Caco-2, LogS)                      0            276
No. of approved drug targets (non-redundant)              524           1565
No. of all drug targets (non-redundant)                  2133           3037
No. of search types                                         8             12

In response to many user requests, we have also added two new drug categories: (i) Withdrawn drugs and (ii) Illicit drugs. Withdrawn drugs are those that have been withdrawn from the market or certain market segments due to safety concerns (such as Vioxx and Bextra). Illicit drugs include those that are legally banned or selectively banned in most developed nations (such as cocaine and heroin). Chemical, pharmaceutical and biological information about these classes of drugs is extremely important, not only in understanding their adverse reactions, but also in being able to predict whether a new drug entity may have unexpected chemical or functional similarities to a dangerous drug. The number of drugs in the ‘Withdrawn’ category is 57, while the number of drugs in the ‘Illicit’ category is 188. The same level of drug, drug target and drug action information has been collected for these drugs as for all other drug entries in DrugBank. If one counts all drug entries in DrugBank (FDA-approved, Experimental, Biotech, Nutraceutical, Withdrawn, Illicit), the total number of drugs or drug-like molecules comes to 4897, which represents an increase of 25% over the previous release.

A significant increase in the number (and coverage) of identified drug targets has been achieved for this release of DrugBank, with 1565 non-redundant protein/DNA targets being identified for FDA-approved drugs compared to 524 non-redundant targets identified in release 1.0. The identification of so many more targets was aided by PolySearch (http://wishart.biology.ualberta.ca/polysearch/), a text-mining tool developed in our laboratory to facilitate these kinds of searches. Additional details about PolySearch appear later in this article. All of these newly identified protein targets are fully referenced to an average of four PubMed citations each.

Of particular interest to many is DrugBank’s list of drug targets. Several other drug target lists have been compiled or presented including those in TTD (3), as well as others by Hopkins et al. (15), Drews and Ryser (16), Imming et al. (17) and Overington et al. (18). These report 578 molecular targets (out of 1512 total targets including disease and organism targets), 248 protein targets (out of 399 molecular targets), 483 molecular targets, 218 molecular targets and 324 molecular targets, respectively. DrugBank’s list of drug targets is 3–4 times larger than these. The primary reasons are: (i) DrugBank has a much larger collection of small molecule drugs (approximately two times larger than any other resource), (ii) DrugBank includes biotech drugs and nutraceuticals (which average 5–10 unique target proteins per drug), (iii) most other drug target lists only include a single ‘primary’ target rather than all targets that have been found to have physiological or pharmaceutical effects, (iv) DrugBank fully accounts for the fact that many drug targets are protein complexes composed of multiple subunits or combinations of subunits and (v) DrugBank annotators identify molecules as drug targets if they play a critical role in the transport, delivery or activation of the drug.

As a general rule, when more than one drug target is listed in DrugBank, the ordering of the drug targets corresponds ‘approximately’ to their order of physiological effect or their importance regarding the drug’s therapeutic indication(s).

Expanded database linkages

DrugBank contains extensive links to almost all major bioinformatics and biomedical databases (GenBank, SwissProt/UniProt, PDB, ChEBI, KEGG, PubChem and PubMed). It also contains links to numerous drug and pharmaceutical databases (RxList, PharmGKB and FDA labels). Over the past year, DrugBank has also been reciprocally linked by SwissProt/UniProt, Wikipedia, BioMOBY (19) and PubChem (October 2007). Because of DrugBank’s appeal as an educational or public information resource, we are actively seeking to expand these reciprocal linkages with other databases and online resources. For example, all drug entries in Wikipedia are now linked to DrugBank and most drug ‘fact boxes’ in Wikipedia are actually generated from DrugBank tables. For the latest release of DrugBank, several new database links have been added including hyperlinks to Wikipedia, PDRHealth, the Drug Product Database (DPD), the HUGO Gene Nomenclature Committee (HGNC), GeneCards (20) and GeneAtlas (21).

Data field additions

As seen in Table 1, DrugBank now contains 107 data fields, compared to 88 data fields in release 1.0. Some of these data fields have arisen to facilitate cataloging, but most have been added in response to user needs and user requests. Specifically, these new data fields include: (i) a primary accession number; (ii) a secondary accession number; (iii) drug synonyms; (iv) a compound description; (v) drug brand names; (vi) SwissProt name (if the drug is a peptide/protein drug); (vii) monoisotopic molecular weight; (viii) isomeric SMILES string; (ix) water solubility predicted via ALOGPS (22); (x) LogP predicted via ALOGPS; (xi) Caco-2 permeability; (xii) experimental water solubility (LogS); (xiii) drug–drug interactions; (xiv) food–drug interactions; (xv) Human Protein Reference Database ID; (xvi) HGNC ID; (xvii) GeneCards ID and (xviii) GeneAtlas ID. A total of 194 experimental LogS values and 82 experimental Caco-2 permeability values were obtained from the UCSD ADME databases (23). These values, along with the structural and physico-chemical data in DrugBank, are particularly useful for computational ADMET (Absorption, Distribution, Metabolism, Excretion and Toxicity) prediction. Additionally, 714 food–drug interactions and 13 242 drug–drug interactions were compiled (through a variety of web and textbook resources), checked by an accredited pharmacist and entered manually. As far as we are aware, these drug/drug and food/drug compilations represent the most complete, publicly accessible collections of their kind. This interaction information is particularly useful for physicians, pharmacists and patients. However, it is also of increasing interest to those involved in pharmacogenomics and nutrigenomics.

Enhanced querying and viewing capabilities

A key feature that distinguishes DrugBank from other online drug resources is its extensive support for higher level database searching and selecting functions. In addition to standard data viewing and sorting features, DrugBank also offers a generic text search, a local BLAST search (SeqSearch), a higher level Boolean text search (TextQuery), a chemical structure search utility (ChemQuery) and a relational data extraction tool (Data Extractor). Each of these search utilities has a number of useful bioinformatics or cheminformatic applications, many of which were described in the first DrugBank publication (8). For the latest release of DrugBank, we have added a number of improvements to both the generic text search and ChemQuery (Figure 1). In particular, the generic text search has been enhanced so that users now have the option of clicking on check boxes to limit their search to either a drug’s common name, its synonyms/brand names or all text fields. Because the vast majority of queries to DrugBank are related to drug names/synonyms, the default query always has these two boxes checked. Users wishing to search through the other 100+ data fields in DrugBank can select the ‘all text fields’ box. This change has also substantially improved the query response times for most DrugBank text searches.

Figure 1. A screenshot montage of some of DrugBank’s new or modified querying tools including ChemQuery, TextQuery and an example of the new generic text query output.

Because the spelling of many drug names, chemical compound names and protein names is often difficult or non-intuitive, DrugBank now supports an ‘intelligent’ text search, where alternative spellings to misspelled or incompletely entered names are automatically provided. In addition to this change, the results from text queries have also been enhanced so that the standard tabular output (primary accession number, generic drug name, chemical formula and molecular weight) is supplemented with the query word highlighted in the selected DrugCard field(s) from which it was retrieved.

To accommodate a variety of user requests and preferences, the ChemQuery tool has been modified for release 2.0 to allow two different types of chemical drawing applets to be used: the MarvinSketch (http://www.chemaxon.com) structure drawing tool (new) and the ACD structure drawing tool (old). The MarvinSketch applet is somewhat more intuitive and easier to use, while the ChemSketch (ACD) applet is somewhat more complex but offers more structural drawing options. The default ChemQuery tool for this release is the MarvinSketch applet. DrugBank’s structure querying capabilities have also been enhanced with the addition of a ‘Show Similar Structure(s)’ button located at the top of every DrugCard. This allows users to rapidly search for structurally similar small molecules, without having to redraw the molecule and search the database through the ChemQuery interface. Users can also limit their structure similarity search to selected DrugBank subdatabases (Approved drugs, Nutraceuticals, Illicit drugs, etc.) through a pull-down menu located by the ‘Show Similar Structure(s)’ button. Both ‘Show Similar Structure(s)’ and ChemQuery use a locally developed SMILES string comparison method to identify related structures and to perform structure similarity searches. All structures are converted to SMILES strings and a substring-matching program (similar to BLAST) is used to identify similar structures. The scoring scheme is based simply on the number of character matches for the longest matching substring.

Improved data handling (entry, export and annotation)

For most of the past 5 years, DrugBank has existed as a series of text files that were manually edited or flat files that were populated by writing Perl scripts to reformat existing text to the DrugBank file format. Most of the annotation in DrugBank (release 1.0) was assembled, entered and validated manually. With the rapid growth in the size and scope of DrugBank, along with the continuing need for updates, we have had to become far more efficient in our data management. Specifically, we have had to streamline our methods for data entry, data export and database annotation. However, we have continued to maintain our same rigorous standards for manual data validation.

To facilitate manual data entry and export for release 2.0, we have developed customized scientific data management software (SDMS) called DrugBank–SDMS. This web-enabled database system was built using the open source Ruby-on-Rails web application framework. This SDMS overlays a MySQL database that contains all of the DrugBank data. The publicly viewable version of DrugBank is directly linked to the DrugBank–SDMS such that every night the SDMS data is automatically exported to the DrugBank server. This ‘near synchrony’ between the SDMS and DrugBank allows our database annotators to remotely access the SDMS, to add data, to check entries or to make corrections in real time, without the need to write (or wait for) custom Perl scripts for data uploads. The use of an SDMS also allows for more extensive error checking. This is done both at the time of entry (via automated format and spelling checks) and later (once a week), through the use of a ‘sanity checker’ (Supplementary Table 1) that checks the consistency of chemical structure files, chemical formulae and chemical properties using a variety of custom-built prediction and file-formatting programs (8, 14, 24). The development of a custom SDMS has also facilitated the export of publicly downloadable DrugBank files. In particular, our SDMS allows rapid generation of all of DrugBank’s flat file (text) downloads and facile creation of XML-formatted DrugBank files, all of which are available through DrugBank’s download link.

To improve our manual annotation efficiency and coverage, the programming staff at DrugBank has developed several automated text and web-mining tools including BioSpider (14) and PolySearch. BioSpider is a web spider that automatically gathers biological, chemical and pharmacological data from approximately 30 trusted, content-rich web sites using only a compound name, SMILES string or Chemical Abstracts Service (CAS) number as input. It then combines this data with a variety of in-house molecular structure and property prediction tools to generate data tables that correspond to many of the data fields in DrugBank. BioSpider allows many of the tedious, error-prone or repetitive annotation activities in DrugBank to be handled by a computer, allowing our annotation team to concentrate on higher level annotation tasks (such as gathering data on pharmacology, mechanism of action, metabolism or drug interactions). BioSpider has been extensively evaluated (14) and has been found to perform much better and much faster than skilled human annotators in these low-level annotation tasks.

To complement BioSpider’s role in low-level annotation, we have also developed PolySearch to enhance higher level annotation and research. PolySearch is a text-mining tool designed to mine data from abstracts in PubMed. It is similar in concept and design to EBIMed (25) and MedGene (26), but has been modified to facilitate the extraction of informative sentences or informative abstracts related to drugs, drug targets, drug metabolites, diseases, proteins and drug–protein interactions. PolySearch is used as an adjunct to our manual annotation efforts and has greatly aided the identification of numerous little-known drug targets.

All textual data acquired from the BioSpider and PolySearch annotation programs are manually inspected by a minimum of two individuals, with at least one individual having an MD or a life science PhD. Additional spot checks are routinely performed on each entry by senior members of the curation group, including a physician, an accredited pharmacist and two PhD-level biochemists. While most information listed in the ‘Drug Description’, ‘Pharmacology’, ‘Mechanisms of Action’, ‘Half Life’, ‘Biotransformation Data’, ‘Protein Binding’, ‘Toxicity’, ‘Absorption’ and ‘Indications’ data fields is manually entered, those entries that are acquired from our automated annotation tools are all manually verified and edited (or rewritten) for readability and consistency. All PolySearch-derived drug target data, in particular, has been verified through multiple text sources (PubMed, drug references, online sequence databases, online drug databases and FDA labels) by at least two members of the DrugBank curatorial staff. Drugs with near-identical structures and modes of action are cross-checked to ensure that their drug target lists are nearly identical. In addition to these manual checks, nearly 40 automated data consistency checks are performed to ensure a uniformly high level of data integrity (Supplementary Table 1). Even with these added checks and references we still recommend that users carefully study the data sources prior to making decisions about using them.

FUTURE DIRECTIONS

The DrugBank model of ‘breadth + depth’ has served as a good template for the development of other small molecule databases in our laboratory, including the Human Metabolome Database or HMDB (24) and FooDB (http://hmdb.med.ualberta.ca/foodb). The lessons learned from building these and other related ‘metabolomic’ databases are also helping to generate ideas, software and protocols that could significantly enhance the breadth and depth of information contained in future releases of DrugBank. Over the coming 3 years, DrugBank will adhere to a semi-annual updating schedule with new updates being released on January 1 and July 1 of each year. This will allow information on newly approved and newly withdrawn drugs to be kept current. Previous versions of the database will be available from the DrugBank download page. A major focus over the coming 2 years will be to extend the database’s querying capabilities (improved structure searches), to acquire more experimental spectral (MS and NMR) data, to expand its coverage of nutraceuticals or herbal medicines, to enhance the annotation of research/experimental compounds, to add many more pathway or network diagrams and to add a number of Java plug-ins to facilitate virtual drug screening and pharmacological (ADMET) modeling.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

The authors wish to thank the Canadian Institutes of Health Research (CIHR), as well as Genome Alberta and Genome Canada for financial support. We are also indebted to the many users of DrugBank who have provided valuable feedback and suggestions. Funding to pay the Open Access publication charges was provided by Genome Alberta.

Conflict of interest statement: None declared.

REFERENCES

1. Hodge,A.E., Altman,R.B. and Klein,T.E. (2007) The PharmGKB: integration, aggregation, and annotation of pharmacogenomic data and knowledge. Clin. Pharmacol. Ther., 81, 21–24.
2. Hatfield,C.L., May,S.K. and Markoff,J.S. (1999) Quality of consumer drug information provided by four web sites. Am. J. Health Syst. Pharm., 56, 2308–2311.
3. Chen,X., Ji,Z.L. and Chen,Y.Z. (2002) TTD: therapeutic target database. Nucleic Acids Res., 30, 412–415.
4. Russ,A.P. and Lampel,S. (2005) The druggable genome: an update. Drug Discov. Today, 10, 1607–1610.
5. Kanehisa,M., Goto,S., Hattori,M., Aoki-Kinoshita,K.F., Itoh,M., Kawashima,S., Katayama,T., Araki,M. and Hirakawa,M. (2006) From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res., 34 (Database issue), D354–D357.
6. Wheeler,D.L., Barrett,T., Benson,D.A., Bryant,S.H., Canese,K., Chetvernin,V., Church,D.M., DiCuccio,M., Edgar,R. et al. (2007) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., 35 (Database issue), D5–D12.
7. Brooksbank,C., Cameron,G. and Thornton,J. (2005) The European Bioinformatics Institute’s data resources: towards systems biology. Nucleic Acids Res., 33 (Database issue), D46–D53.
8. Wishart,D.S., Knox,C., Guo,A.C., Shrivastava,S., Hassanali,M., Stothard,P., Chang,Z. and Woolsey,J. (2006) DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res., 34 (Database issue), D668–D672.
9. Chang,C., Bahadduri,P.M., Polli,J.E., Swaan,P.W. and Ekins,S. (2006) Rapid identification of P-glycoprotein substrates and inhibitors. Drug Metab. Dispos., 34, 1976–1984.
10. Chong,C.R. and Sullivan,D.J.,Jr (2007) New uses for old drugs. Nature, 448, 645–646.
11. Li,H., Gao,Z., Kang,L., Zhang,H., Yang,K., Yu,K., Luo,X., Zhu,W., Chen,K. et al. (2006) TarFisDock: a web server for identifying drug targets with docking approach. Nucleic Acids Res., 34 (Web Server issue), W219–W224.
12. Jolivette,L.J. and Ekins,S. (2007) Methods for predicting human drug metabolism. Adv. Clin. Chem., 43, 131–176.
13. Wishart,D.S. (2007) Discovering drug targets through the web. Comp. Biochem. Physiol. D, 2, 9–17.
14. Knox,C., Shrivastava,S., Stothard,P., Eisner,R. and Wishart,D.S. (2007) BioSpider: a web server for automating metabolome annotations. Pac. Symp. Biocomput., 145–156.
15. Hopkins,A.L. and Groom,C.R. (2002) The druggable genome. Nat. Rev. Drug Discov., 1, 727–730.
16. Drews,J. and Ryser,S. (1997) The role of innovation in drug development. Nat. Biotechnol., 15, 1318–1319.
17. Imming,P., Sinning,C. and Meyer,A. (2006) Drugs, their targets and the nature and number of drug targets. Nat. Rev. Drug Discov., 5, 821–834.
18. Overington,J.P., Al-Lazikani,B. and Hopkins,A.L. (2006) How many drug targets are there? Nat. Rev. Drug Discov., 5, 993–996.
19. Kawas,E., Senger,M. and Wilkinson,M.D. (2006) BioMoby extensions to the Taverna workflow management and enactment software. BMC Bioinformatics, 7, 523.
20. Rebhan,M., Chalifa-Caspi,V., Prilusky,J. and Lancet,D. (1998) GeneCards: a novel functional genomics compendium with automated data mining and query reformulation support. Bioinformatics, 14, 656–664.
21. Kitson,D.H., Badretdinov,A., Zhu,Z.Y., Velikanov,M., Edwards,D.J., Olszewski,K., Szalma,S. and Yan,L. (2002) Functional annotation of proteomic sequences based on consensus of sequence and structural analysis. Brief. Bioinform., 3, 32–44.
22. Tetko,I.V. and Tanchuk,V.Y. (2002) Application of associative neural networks for prediction of lipophilicity in ALOGPS 2.1 program. J. Chem. Inf. Comput. Sci., 42, 1136–1145.
23. Hou,T., Wang,J., Zhang,W., Wang,W. and Xu,X. (2006) Recent advances in computational prediction of drug absorption and permeability in drug discovery. Curr. Med. Chem., 13, 2653–2667.
24. Wishart,D.S., Tzur,D., Knox,C., Eisner,R., Guo,A.C., Young,N., Cheng,D., Jewell,K., Arndt,D. et al. (2007) HMDB: the Human Metabolome Database. Nucleic Acids Res., 35 (Database issue), D521–D526.
25. Rebholz-Schuhmann,D., Kirsch,H., Arregui,M., Gaudan,S., Rynbeek,M. and Stoehr,P. (2006) Protein annotation by EBIMed. Nat. Biotechnol., 24, 902–903.
26. Hu,Y., Hines,L.M., Weng,H., Zuo,D., Rivera,M., Richardson,A. and LaBaer,J. (2003) Analysis of genomic and proteomic data using advanced literature mining. J. Proteome Res., 2, 405–412.
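
Editorial illustration (not part of the published article). The article states that ‘Show Similar Structure(s)’ and ChemQuery convert all structures to SMILES strings and score similarity by the number of character matches in the longest matching substring. The DrugBank program itself is not described further, so the following Python sketch is only one plausible reading of that scoring scheme. The function names, the tiny compound library and the choice of a plain dynamic-programming longest-common-substring routine are assumptions made for illustration, and no SMILES canonicalization is attempted.

```python
from typing import Dict, List, Tuple


def longest_common_substring(a: str, b: str) -> str:
    """Return the longest contiguous run of characters shared by a and b."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)  # match lengths for the previous row of the DP table
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def similarity_score(query_smiles: str, target_smiles: str) -> int:
    """Score = number of characters in the longest shared substring (assumed scheme)."""
    return len(longest_common_substring(query_smiles, target_smiles))


def rank_by_similarity(query_smiles: str, library: Dict[str, str]) -> List[Tuple[str, int]]:
    """Rank library compounds by descending score against the query."""
    scored = [(name, similarity_score(query_smiles, smi)) for name, smi in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical mini-library of simple SMILES strings, for illustration only.
LIBRARY = {
    "ethanol": "CCO",
    "acetamide": "CC(=O)N",
    "benzene": "c1ccccc1",
}

if __name__ == "__main__":
    # Query with acetic acid and print each compound with its score.
    for name, score in rank_by_similarity("CC(=O)O", LIBRARY):
        print(name, score)
```

Because the comparison is purely textual, two different but equally valid SMILES strings for the same molecule can score poorly against each other; a practical system would presumably write all structures in one consistent SMILES form before matching, although the article does not say how this is handled.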


Publisher
Oxford University Press
Copyright
© Published by Oxford University Press.
ISSN
0305-1048
eISSN
1362-4962
DOI
10.1093/nar/gkm958
pmid
18048412

Abstract

Published online 29 November 2007. Nucleic Acids Research, 2008, Vol. 36, Database issue D901–D906. doi:10.1093/nar/gkm958

DrugBank: a knowledgebase for drugs, drug actions and drug targets

David S. Wishart*, Craig Knox, An Chi Guo, Dean Cheng, Savita Shrivastava, Dan Tzur, Bijaya Gautam and Murtaza Hassanali

Department of Computing Science and Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada T6G 2E8

Received September 15, 2007; Revised October 11, 2007; Accepted October 15, 2007

*To whom correspondence should be addressed. Tel: 780-492-0383; Fax: 780-492-1071; Email: [email protected]

© 2007 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT

DrugBank is a richly annotated resource that combines detailed drug data with comprehensive drug target and drug action information. Since its first release in 2006, DrugBank has been widely used to facilitate in silico drug target discovery, drug design, drug docking or screening, drug metabolism prediction, drug interaction prediction and general pharmaceutical education. The latest version of DrugBank (release 2.0) has been expanded significantly over the previous release. With 4900 drug entries, it now contains 60% more FDA-approved small molecule and biotech drugs including 10% more ‘experimental’ drugs. Significantly, more protein target data has also been added to the database, with the latest version of DrugBank containing three times as many non-redundant protein or drug target sequences as before (1565 versus 524). Each DrugCard entry now contains more than 100 data fields with half of the information being devoted to drug/chemical data and the other half devoted to pharmacological, pharmacogenomic and molecular biological data. A number of new data fields, including food–drug interactions, drug–drug interactions and experimental ADME data have been added in response to numerous user requests. DrugBank has also significantly improved the power and simplicity of its structure query and text query searches. DrugBank is available at http://www.drugbank.ca

INTRODUCTION

There are essentially two kinds of online drug resources: (i) clinically oriented drug ‘encyclopedias’ and (ii) chemically oriented drug databases. Examples of some of the better clinically oriented drug resources include PharmGKB (1) and RxList (2). These knowledgebases tend to offer very detailed clinical information about selected drugs (their pharmacology, metabolism and indications) with their data content being targeted more towards pharmacists, physicians or consumers. Examples of chemically oriented drug (or small molecule) databases include the TTD (3), the Druggable Genome database (4), KEGG (5), PubChem (6) and ChEBI (7). These excellent databases provide synoptic data (5–10 data fields per entry) about the nomenclature, structure and/or physical properties of large numbers of small molecule drugs and, in some cases, their drug targets. Chemically oriented drug databases are typically oriented towards medicinal chemists, biochemists and molecular biologists. As a general rule, chemically oriented drug databases aim for very broad coverage at the expense of depth, while clinically oriented drug resources aim for far more depth (albeit in English sentences) at the expense of coverage.

In an effort to bridge the ‘depth versus breadth’ gap between clinically oriented drug resources and chemically oriented drug databases, we developed DrugBank (8). First released in 2006, DrugBank was designed to serve as a comprehensive, fully searchable in silico drug resource that linked sequence, structure and mechanistic data about drug molecules (including biotech drugs) with sequence, structure and mechanistic data about their drug targets. As a clinically oriented drug encyclopedia, DrugBank is able to provide detailed, up-to-date, quantitative, analytic or molecular-scale information about drugs, drug targets and the biological or physiological consequences of drug actions. As a chemically oriented drug database, DrugBank is able to provide many built-in tools for viewing, sorting, searching and extracting text, image, sequence or structure data.

Since its initial release, DrugBank has been used in a wide range of applications including in silico drug discovery (9), drug ‘rejuvenation’ (10), drug docking or screening (11), drug metabolism prediction (12), drug target prediction (13) and general pharmaceutical education. Feedback from users has led to many excellent suggestions on how to expand and enhance DrugBank’s offerings. These requests also led to the development of several new software tools to improve the entry, export and annotation of DrugBank’s data. Here, we wish to report on these developments as well as many additions and improvements appearing in the latest version of DrugBank (release 2.0).

DATABASE ENHANCEMENTS

Details relating to DrugBank’s overall design, querying capabilities, curation protocols, quality assurance and drug selection criteria have been described previously (8). These have largely remained the same between release 1.0 and 2.0. Here, we shall focus primarily on describing the changes and enhancements made to the database and to the annotation processes for release 2.0. More specifically, we will describe the: (i) enhancements to DrugBank’s size and coverage; (ii) expanded database linkages; (iii) data field additions; (iv) improvements in data querying and data viewing and (v) improvements to DrugBank’s data handling processes.

Expanded database size and coverage

A detailed content comparison between DrugBank (release 1.0) versus DrugBank (release 2.0) is provided in Table 1. As seen here, the latest release of DrugBank now has detailed information on 1467 FDA-approved drugs corresponding to 28 447 brand names and synonyms. This represents an expansion of nearly 60% over what was previously contained in the database. The latest DrugBank release also includes 123 biotech (peptide or protein) drugs and 69 nutraceuticals (nutritional supplements), which corresponds to an increase of 10% over what was in the previous DrugBank release. While many of these additions represent newly approved drugs (about 50 new drugs are approved each year), a number of these new entries are little known, hard-to-find or infrequently prescribed drugs that are not contained in most drug databases. To the best of our knowledge, DrugBank now contains all (or almost all) drugs that have been approved in North America, Europe and Asia. In addition, DrugBank’s collection of experimental or unapproved drugs (or drug-like) compounds, which is primarily derived from the PDB’s Ligand database, has expanded to include 3116 compounds, compared to 2896 compounds in the first release. We are pleased to note that these experimental drugs have now been more completely annotated, via BioSpider (14), than in the previous DrugBank release.

Table 1. Comparison between the data content in DrugBank (release 1.0) versus DrugBank (release 2.0)

Category                                        Release 1.0   Release 2.0
No. of FDA-approved small molecule drugs        841           1344
No. of biotech drugs                            113           123
No. of nutraceutical drugs                      61            69
No. of withdrawn drugs                          0             57
No. of illicit drugs                            0             188
No. of experimental drugs                       2894          3116
No. of total small molecule drugs               3796          4774
No. of total drugs                              3909          4897
No. of names/brand names/synonyms               18 304        28 447
No. of data fields                              88            108
No. of food–drug interactions                   0             714
No. of drug–drug interactions                   0             13 242
No. of ADMET parameters (Caco-2, LogS)          0             276
No. of approved drug targets (non-redundant)    524           1565
No. of all drug targets (non-redundant)         2133          3037
No. of search types                             8             12

In response to many user requests, we have also added two new drug categories: (i) Withdrawn drugs and (ii) Illicit drugs. Withdrawn drugs are those that have been withdrawn from the market or certain market segments due to safety concerns (such as Vioxx and Bextra). Illicit drugs include those that are legally banned or selectively banned in most developed nations (such as cocaine and heroin). Chemical, pharmaceutical and biological information about these classes of drugs is extremely important, not only in understanding their adverse reactions, but also in being able to predict whether a new drug entity may have unexpected chemical or functional similarities to a dangerous drug. The number of drugs in the ‘Withdrawn’ category is 57, while the number of drugs in the ‘Illicit’ category is 188. The same level of drug, drug target and drug action information has been collected for these drugs as for all other drug entries in DrugBank. If one counts all drug entries in DrugBank (FDA-approved, Experimental, Biotech, Nutraceutical, Withdrawn, Illicit), the total number of drugs or drug-like molecules comes to 4897, which represents an increase of 25% over the previous release.

A significant increase in the number (and coverage) of identified drug targets has been achieved for this release of DrugBank, with 1565 non-redundant protein/DNA targets being identified for FDA-approved drugs compared to 524 non-redundant targets identified in release 1.0. The identification of so many more targets was aided by PolySearch (http://wishart.biology.ualberta.ca/polysearch/), a text-mining tool developed in our laboratory to facilitate these kinds of searches. Additional details about PolySearch appear later in this article. All of these newly identified protein targets are fully referenced to an average of four PubMed citations each.

Of particular interest to many is DrugBank’s list of drug targets. Several other drug target lists have been compiled or presented including those in TTD (3), as well as others by Hopkins et al. (15), Drews and Ryser (16), Imming et al. (17) and Overington et al. (18). These report 578 molecular targets (out of 1512 total targets including disease and organism targets), 248 protein targets (out of 399 molecular targets), 483 molecular targets, 218 molecular targets and 324 molecular targets, respectively. DrugBank’s list of drug targets is 3–4 times larger than these. The primary reasons are: (i) DrugBank has a much larger collection of small molecule drugs (approximately two times larger than any other resource), (ii) DrugBank includes biotech drugs and nutraceuticals (which average 5–10 unique target proteins per drug), (iii) most other drug target lists only include a single ‘primary’ target rather than all targets that have been found to have physiological or pharmaceutical effects, (iv) DrugBank fully accounts for the fact that many drug targets are protein complexes composed of multiple subunits or combinations of subunits and (v) DrugBank annotators identify molecules as drug targets if they play a critical role in the transport, delivery or activation of the drug.

As a general rule, when more than one drug target is listed in DrugBank, the ordering of the drug targets corresponds ‘approximately’ to their order of physiological effect or their importance regarding the drug’s therapeutic indication(s).

Expanded database linkages

DrugBank is a database that contains extensive links to almost all major bioinformatics and biomedical databases (GenBank, SwissProt/UniProt, PDB, ChEBI, KEGG, PubChem and PubMed). It also contains many links to numerous drug and pharmaceutical databases (RxList, PharmGKB and FDA labels). Over the past year, DrugBank has also been reciprocally linked by SwissProt/UniProt, Wikipedia, BioMOBY (19) and PubChem (October 2007). Because of DrugBank’s appeal as an educational or public information resource, we are actively seeking to expand these reciprocal linkages with other databases and online resources. For example, all drug entries in Wikipedia are now linked to DrugBank and most drug ‘fact boxes’ in Wikipedia are actually generated from DrugBank tables. For the latest release of DrugBank, several new database links have been added including hyperlinks to Wikipedia, PDRHealth, the Drug Product Database (DPD), the Human Genome Nomenclature Commission (HGNC), GeneCards (20) and GeneAtlas (21).

Data field additions

As seen in Table 1, DrugBank now contains 107 data fields, compared to 88 data fields in release 1.0. Some of these data fields have arisen to facilitate cataloging, but most have been added in response to user needs and user requests. Specifically, these new data fields include: (i) a primary accession number; (ii) a secondary accession number; (iii) drug synonyms; (iv) a compound description; (v) drug brand names; (vi) SwissProt name (if the drug is a peptide/protein drug); (vii) monoisotopic molecular weight; (viii) isomeric SMILES string; (ix) water solubility predicted via ALOGPS (22); (x) LogP predicted via ALOGPS; (xi) Caco-2 permeability; (xii) experimental water solubility (LogS); (xiii) drug–drug interactions; (xiv) food–drug interactions; (xv) Human Protein Reference Database ID; (xvi) HGNC ID; (xvii) GeneCards ID and (xviii) GeneAtlas ID. A total of 194 experimental LogS values and 82 experimental Caco-2 permeability values were obtained from the UCSD ADME databases (23). These values, along with the structural and physico-chemical data in DrugBank, are particularly useful for computational ADMET (Absorption, Distribution, Metabolism, Excretion and Toxicity) prediction. Additionally, 714 food–drug interactions and 13 242 drug–drug interactions were compiled (through a variety of web and textbook resources), checked by an accredited pharmacist and entered manually. As far as we are aware, these drug/drug and food/drug compilations represent the most complete, publicly accessible collections of their kind. This interaction information is particularly useful for physicians, pharmacists and patients. However, it is also of increasing interest to those involved in pharmacogenomics and nutrigenomics.

Enhanced querying and viewing capabilities

A key feature that distinguishes DrugBank from other online drug resources is its extensive support for higher level database searching and selecting functions. In addition to standard data viewing and sorting features, DrugBank also offers a generic text search, a local BLAST search (SeqSearch), a higher level Boolean text search (TextQuery), a chemical structure search utility (ChemQuery) and a relational data extraction tool (Data Extractor). Each of these search utilities has a number of useful bioinformatics or cheminformatic applications, many of which were described in the first DrugBank publication (8). For the latest release of DrugBank, we have added a number of improvements to both the generic text search and ChemQuery (Figure 1). In particular, the generic text search has been enhanced so that users now have the option of clicking on check boxes to limit their search to either a drug’s common name, its synonyms/brand names or all text fields. Because the vast majority of queries to DrugBank are related to drug names/synonyms, the default query always has these two boxes checked. Users wishing to search through the other 100+ data fields in DrugBank can select the ‘all text fields’ box. This change has also substantially improved the query response times for most DrugBank text searches.

Figure 1. A screenshot montage of some of DrugBank’s new or modified querying tools including ChemQuery, TextQuery and an example of the new generic text query output.

Because the spelling of many drug names, chemical compound names and protein names is often difficult or non-intuitive, DrugBank now supports an ‘intelligent’ text search, where alternative spellings to misspelled or incompletely entered names are automatically provided. In addition to this change, the results from text queries have also been enhanced so that the standard tabular output (primary accession number, generic drug name, chemical formula and molecular weight) is supplemented with the query word highlighted in the selected DrugCard field(s) from which it was retrieved.

To accommodate a variety of user requests and preferences, the ChemQuery tool has been modified for release 2.0 to allow two different types of chemical drawing applets to be used: the MarvinSketch (http://www.chemaxon.com) structure drawing tool (new) and the ACD structure drawing tool (old). The MarvinSketch applet is somewhat more intuitive and easier to use, while the ChemSketch (ACD) applet is somewhat more complex but offers more structural drawing options. The default ChemQuery tool for this release is the MarvinSketch applet. DrugBank’s structure querying capabilities have also been enhanced with the addition of a ‘Show Similar Structure(s)’ button located at the top of every DrugCard. This allows users to rapidly search for structurally similar small molecules, without having to redraw the molecule and search the database through the ChemQuery interface. Users can also limit their structure similarity search to selected DrugBank subdatabases (Approved drugs, Nutraceuticals, Illicit drugs, etc.) through a pull-down menu located by the ‘Show Similar Structure(s)’ button. Both ‘Show Similar Structure(s)’ and ChemQuery use a locally developed SMILES string comparison method to identify related structures and to perform structure similarity searches. All structures are converted to SMILES strings and a substring-matching program (similar to BLAST) is used to identify similar structures. The scoring scheme is based simply on the number of character matches for the longest matching substring.

Improved data handling (entry, export and annotation)

For most of the past 5 years, DrugBank has existed as a series of text files that were manually edited or flat files that were populated by writing Perl scripts to reformat existing text to the DrugBank file format. Most of the annotation in DrugBank (release 1.0) was assembled, entered and validated manually. With the rapid growth in the size and scope of DrugBank, along with the continuing need for updates, we have had to become far more efficient in our data management. Specifically, we have had to streamline our methods for data entry, data export and database annotation. However, we have continued to maintain the same rigorous standards for manual data validation.

To facilitate manual data entry and export for release 2.0, we have developed customized scientific data management software (SDMS) called DrugBank–SDMS. This web-enabled database system was built using the open source Ruby-on-Rails web application framework. This SDMS overlays a MySQL database that contains all of the DrugBank data. The publicly viewable version of DrugBank is directly linked to the DrugBank–SDMS such that every night the SDMS data is automatically exported to the DrugBank server. This ‘near synchrony’ between the SDMS and DrugBank allows our database annotators to remotely access the SDMS, to add data, to check entries or to make corrections in real time, without the need to write (or wait for) custom Perl scripts for data uploads. The use of an SDMS also allows for more extensive error checking. This is done both at the time of entry (via automated format and spelling checks) and later (once a week), through the use of a ‘sanity checker’ (Supplementary Table 1) that checks the consistency of chemical structure files, chemical formulae and chemical properties using a variety of custom-built prediction and file-formatting programs (8, 14, 24). The development of a custom SDMS has also facilitated the export of publicly downloadable DrugBank files. In particular, our SDMS allows rapid generation of all of DrugBank’s flat file (text) downloads and facile creation of XML-formatted DrugBank files, all of which are available through DrugBank’s download link.

To improve our manual annotation efficiency and coverage, the programming staff at DrugBank has developed several automated text and web-mining tools including BioSpider (14) and PolySearch. BioSpider is a web spider that automatically gathers biological, chemical and pharmacological data from approximately 30 trusted, content-rich web sites using only a compound name, SMILES string or Chemical Abstract Service (CAS) number as input. It then combines this data with a variety of in-house molecular structure and property prediction tools to generate data tables that correspond to many of the data fields in DrugBank. BioSpider allows many of the tedious, error-prone or repetitive annotation activities in DrugBank to be handled by a computer, allowing our annotation team to concentrate on higher level annotation tasks (such as gathering data on pharmacology, mechanism of action, metabolism or drug interactions). BioSpider has been extensively evaluated (14) and has been found to perform much better and much faster than skilled human annotators in these low-level annotation tasks.

To complement BioSpider’s role in low-level annotation, we have also developed PolySearch to enhance higher level annotation and research. PolySearch is a text-mining tool designed to mine data from abstracts in PubMed. It is similar in concept and design to EBIMed (25) and MedGene (26), but has been modified to facilitate the extraction of informative sentences or informative abstracts related to drugs, drug targets, drug metabolites, diseases, proteins and drug–protein interactions. PolySearch is used as an adjunct to our manual annotation efforts and has greatly aided the identification of numerous or little-known drug targets.

All textual data acquired from the BioSpider and PolySearch annotation programs are manually inspected by a minimum of two individuals, with at least one individual having an MD or a life science PhD. Additional spot checks are routinely performed on each entry by senior members of the curation group, including a physician, an accredited pharmacist and two PhD-level biochemists. While most information listed in the ‘Drug Description’, ‘Pharmacology’, ‘Mechanisms of Action’, ‘Half Life’, ‘Biotransformation Data’, ‘Protein Binding’, ‘Toxicity’, ‘Absorption’ and ‘Indications’ data fields is manually entered, those entries that are acquired from our automated annotation tools are all manually verified and edited (or rewritten) for readability and consistency. All PolySearch-derived drug target data, in particular, has been verified through multiple text sources (PubMed, drug references, online sequence databases, online drug databases and FDA labels) by at least two members of the DrugBank curatorial staff. Drugs with near-identical structures and modes of action are cross-checked to ensure that their drug target lists are nearly identical. In addition to these manual checks, nearly 40 automated data consistency checks are performed to ensure a uniformly high level of data integrity (Supplementary Table 1). Even with these added checks and references we still recommend that users carefully study the data sources prior to making decisions about using the data.

FUTURE DIRECTIONS

The DrugBank model of ‘breadth + depth’ has served as a good template for the development of other small molecule databases in our laboratory, including the Human Metabolome Database or HMDB (24) and FooDB (http://hmdb.med.ualberta.ca/foodb). The lessons learned from building these and other related ‘metabolomic’ databases are also helping to generate ideas, software and protocols that could significantly enhance the breadth and depth of information contained in future releases of DrugBank. Over the coming 3 years, DrugBank will adhere to a semi-annual updating schedule with new updates being released on January 1 and July 1 of each year. This will allow information on newly approved and newly withdrawn drugs to be kept current. Previous versions of the database will be available from the DrugBank download page. A major focus over the coming 2 years will be to extend the database’s querying capabilities (improved structure searches), to acquire more experimental spectral (MS and NMR) data, to expand its coverage of nutraceuticals or herbal medicines, to enhance the annotation of research/experimental compounds, to add many more pathway or network diagrams and to add a number of Java plug-ins to facilitate virtual drug screening and pharmacological (ADMET) modeling.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

The authors wish to thank the Canadian Institutes for Health Research (CIHR), as well as Genome Alberta and Genome Canada for financial support. We are also indebted to the many users of DrugBank who have provided valuable feedback and suggestions. Funding to pay the Open Access publication charges was provided by Genome Alberta.

Conflict of interest statement: None declared.

REFERENCES

1. Hodge,A.E., Altman,R.B. and Klein,T.E. (2007) The PharmGKB: integration, aggregation, and annotation of pharmacogenomic data and knowledge. Clin. Pharmacol. Ther., 81, 21–24.
2. Hatfield,C.L., May,S.K. and Markoff,J.S. (1999) Quality of consumer drug information provided by four web sites. Am. J. Health Syst. Pharm., 56, 2308–2311.
3. Chen,X., Ji,Z.L. and Chen,Y.Z. (2002) TTD: therapeutic target database. Nucleic Acids Res., 30, 412–415.
4. Russ,A.P. and Lampel,S. (2005) The druggable genome: an update. Drug Discov. Today, 10, 1607–1610.
5. Kanehisa,M., Goto,S., Hattori,M., Aoki-Kinoshita,K.F., Itoh,M., Kawashima,S., Katayama,T., Araki,M. and Hirakawa,M. (2006) From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res., 34 (Database issue), D354–D357.
6. Wheeler,D.L., Barrett,T., Benson,D.A., Bryant,S.H., Canese,K., Chetvernin,V., Church,D.M., DiCuccio,M., Edgar,R. et al. (2007) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., 35 (Database issue), D5–D12.
7. Brooksbank,C., Cameron,G. and Thornton,J. (2005) The European Bioinformatics Institute’s data resources: towards systems biology. Nucleic Acids Res., 33 (Database issue), D46–D53.
8. Wishart,D.S., Knox,C., Guo,A.C., Shrivastava,S., Hassanali,M., Stothard,P., Chang,Z. and Woolsey,J. (2006) DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res., 34 (Database issue), D668–D672.
9. Chang,C., Bahadduri,P.M., Polli,J.E., Swaan,P.W. and Ekins,S. (2006) Rapid identification of P-glycoprotein substrates and inhibitors. Drug Metab. Dispos., 34, 1976–1984.
10. Chong,C.R. and Sullivan,D.J.,Jr (2007) New uses for old drugs. Nature, 448, 645–646.
11. Li,H., Gao,Z., Kang,L., Zhang,H., Yang,K., Yu,K., Luo,X., Zhu,W., Chen,K. et al. (2006) TarFisDock: a web server for identifying drug targets with docking approach. Nucleic Acids Res., 34 (Web Server issue), W219–W224.
12. Jolivette,L.J. and Ekins,S. (2007) Methods for predicting human drug metabolism. Adv. Clin. Chem., 43, 131–176.
13. Wishart,D.S. (2007) Discovering drug targets through the web. Comp. Biochem. Physiol. D, 2, 9–17.
14. Knox,C., Shrivastava,S., Stothard,P., Eisner,R. and Wishart,D.S. (2007) BioSpider: a web server for automating metabolome annotations. Pac. Symp. Biocomput., 145–156.
15. Hopkins,A.L. and Groom,C.R. (2002) The druggable genome. Nat. Rev. Drug Discov., 1, 727–730.
16. Drews,J. and Ryser,S. (1997) The role of innovation in drug development. Nat. Biotechnol., 15, 1318–1319.
17. Imming,P., Sinning,C. and Meyer,A. (2006) Drugs, their targets and the nature and number of drug targets. Nat. Rev. Drug Discov., 5, 821–834.
18. Overington,J.P., Al-Lazikani,B. and Hopkins,A.L. (2006) How many drug targets are there? Nat. Rev. Drug Discov., 5, 993–996.
19. Kawas,E., Senger,M. and Wilkinson,M.D. (2006) BioMoby extensions to the Taverna workflow management and enactment software. BMC Bioinformatics, 7, 523.
20. Rebhan,M., Chalifa-Caspi,V., Prilusky,J. and Lancet,D. (1998) GeneCards: a novel functional genomics compendium with automated data mining and query reformulation support. Bioinformatics, 14, 656–664.
21. Kitson,D.H., Badretdinov,A., Zhu,Z.Y., Velikanov,M., Edwards,D.J., Olszewski,K., Szalma,S. and Yan,L. (2002) Functional annotation of proteomic sequences based on consensus of sequence and structural analysis. Brief. Bioinform., 3, 32–44.
22. Tetko,I.V. and Tanchuk,V.Y. (2002) Application of associative neural networks for prediction of lipophilicity in ALOGPS 2.1 program. J. Chem. Inf. Comput. Sci., 42, 1136–1145.
23. Hou,T., Wang,J., Zhang,W., Wang,W. and Xu,X. (2006) Recent advances in computational prediction of drug absorption and permeability in drug discovery. Curr. Med. Chem., 13, 2653–2667.
24. Wishart,D.S., Tzur,D., Knox,C., Eisner,R., Guo,A.C., Young,N., Cheng,D., Jewell,K., Arndt,D. et al. (2007) HMDB: the Human Metabolome Database. Nucleic Acids Res., 35 (Database issue), D521–D526.
25. Rebholz-Schuhmann,D., Kirsch,H., Arregui,M., Gaudan,S., Rynbeek,M. and Stoehr,P. (2006) Protein annotation by EBIMed. Nat. Biotechnol., 24, 902–903.
26. Hu,Y., Hines,L.M., Weng,H., Zuo,D., Rivera,M., Richardson,A. and LaBaer,J. (2003) Analysis of genomic and proteomic data using advanced literature mining. J. Proteome Res., 2, 405–412.
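
Illustrative note on the SMILES similarity scoring. The section on enhanced querying states that structures are converted to SMILES strings and scored by the number of characters in the longest matching substring. The article does not give the implementation, so the following Python sketch is only one plausible reading of that description: it uses a standard dynamic-programming longest-common-substring routine, and the function names, the toy library and the ranking step are all assumptions made for illustration rather than DrugBank code. Note also that raw string comparison depends on how each SMILES string happens to be written, so a real system would presumably need consistently generated strings.

```python
from typing import Dict, List, Tuple


def longest_common_substring_length(a: str, b: str) -> int:
    """Return the length of the longest contiguous run of characters shared by a and b."""
    best = 0
    # prev[j] holds the length of the common run ending at a[i-2] and b[j-1].
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def rank_by_similarity(query: str, library: Dict[str, str]) -> List[Tuple[str, int]]:
    """Score every library SMILES against the query and sort from highest to lowest score."""
    scored = [
        (name, longest_common_substring_length(query, smiles))
        for name, smiles in library.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy library (names and SMILES chosen only for illustration).
toy_library = {
    "propanol": "CCCO",
    "acetic acid": "CC(=O)O",
    "benzene": "c1ccccc1",
}

for name, score in rank_by_similarity("CCO", toy_library):
    print(name, score)
```

With the query "CCO" (ethanol), the sketch ranks propanol first because the whole query occurs inside "CCCO", while benzene scores zero because its aromatic SMILES shares no characters with the query. This also shows the main limitation of a purely textual score: it measures string overlap, not chemical similarity as such.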

Journal

Nucleic Acids Research, Oxford University Press

Published: Jan 29, 2008
