Published online 28 November 2013
Nucleic Acids Research, 2014, Vol. 42, Database issue D401–D407
doi:10.1093/nar/gkt1207

STITCH 4: integration of protein–chemical interactions with user data

Michael Kuhn^1,*, Damian Szklarczyk^2, Sune Pletscher-Frankild^3, Thomas H. Blicher^3, Christian von Mering^2, Lars J. Jensen^3,* and Peer Bork^4,5,*

^1 Biotechnology Center, TU Dresden, 01062 Dresden, Germany
^2 Institute of Molecular Life Sciences, University of Zurich and Swiss Institute of Bioinformatics, Winterthurerstrasse 190, 8057 Zurich, Switzerland
^3 Novo Nordisk Foundation Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
^4 European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
^5 Max-Delbrück-Centre for Molecular Medicine, Robert-Rössle-Strasse 10, 13092 Berlin, Germany

Received September 30, 2013; Revised November 1, 2013; Accepted November 4, 2013

*To whom correspondence should be addressed. Tel: +49 6221 387 8526; Fax: +49 6221 387 8517; Email: [email protected]
Correspondence may also be addressed to Michael Kuhn. Tel: +49 351 463 40063; Fax: +49 351 463 40061; Email: [email protected] dresden.de
Correspondence may also be addressed to Lars J. Jensen. Tel: +45 353 25025; Fax: +45 353 25001; Email: [email protected]

© The Author(s) 2013. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT

STITCH is a database of protein–chemical interactions that integrates many sources of experimental and manually curated evidence with text-mining information and interaction predictions. Available at http://stitch.embl.de, the resulting interaction network includes 390 000 chemicals and 3.6 million proteins from 1133 organisms. Compared with the previous version, the number of high-confidence protein–chemical interactions in human has increased by 45%, to 367 000. In this version, we added features for users to upload their own data to STITCH in the form of internal identifiers, chemical structures or quantitative data. For example, a user can now upload a spreadsheet with screening hits to easily check which interactions are already known. To increase the coverage of STITCH, we expanded the text mining to include full-text articles and added a prediction method based on chemical structures. We further changed our scheme for transferring interactions between species to rely on orthology rather than protein similarity. This improves the performance within protein families, where scores are now transferred only to orthologous proteins, but not to paralogous proteins. STITCH can be accessed with a web-interface, an API and downloadable files.

INTRODUCTION

Protein–chemical interactions are essential for any biological system; for example, they drive the metabolism of the cell or initiate many signaling cascades and most pharmaceutical interventions. A large collection of such interactions can, therefore, be used to study a variety of cellular functions and the impact of drug treatment on the cell. For such research, it is important to have, as complete as possible, data on protein–chemical interactions. By treating proteins and chemicals as nodes of a graph, which are linked by edges if they have been found to interact (1), we can adopt a network view that enables us to integrate many different sources.

The concept of STITCH (‘search tool for interacting chemicals’) was from the beginning to combine sources of protein–chemical interactions from experimental databases, pathway databases, drug–target databases, text mining and drug–target predictions into a unified network (2–4). This network abstracts the complexity of the underlying data sources, making large-scale studies possible. At the same time, links to the original sources are retained, making it possible to trace the provenance of the data. The underlying STITCH database can be accessed in multiple ways: via an intuitive web interface, via download files (for large-scale analysis) and via an API (enabling automated access on a small to medium scale).

Here, we present recent improvements to the database and user interface of STITCH. Already in the previous versions, it has been possible to query STITCH using protein or chemical names, InChIKeys and SMILES strings. New in this version is the possibility to upload spreadsheets with chemical descriptors and experimental data that can be directly added to the network, as described later in text. We also for the first time use the evidence transfer algorithm described for the STRING 9.1 database (5) to improve the performance for protein families.

Compared with STITCH 3, we use the same underlying set of proteins, containing 1133 species. We updated the set of chemicals (6), and find interactions with 390 000 distinct chemicals. In human, high-confidence interactions for 172 000 compounds are available in STITCH 4 (Figure 1), compared with 110 000 in STITCH 3 (4). In total, the human protein–chemical interaction network contains 2.2 million interactions (Figure 1). Applying different confidence thresholds, 570 000 interactions are of medium confidence (score cutoff 0.5) and 367 000 interactions are of high confidence (cutoff 0.7).

Figure 1. Cumulative distribution of scores. For each confidence score threshold, the plot shows the number of chemicals (top) and protein–chemical interactions (bottom) that have at least this confidence score in the human protein–chemical network. For example, there are 172 000 chemicals with a high-confidence interaction (score at least 0.7). As there are many interactions with low confidence scores, we use a minimum score threshold of 0.15. Steps in the data correspond to large numbers of compounds that have the same maximum score in manually curated databases or the ChEMBL database (with different confidence levels).

SOURCES OF INTERACTIONS

Protein–chemical interactions are presented in four different channels: experiments, databases, text mining and predicted interactions. We import the following sources of experimental information: ChEMBL [interactions with reported Ki or IC50 (7)], PDSP Ki Database (8), PDB (9) and, new to STITCH, data from two large-scale studies on kinase–ligand interactions (10,11). From the latter studies, we extracted 74 291 interactions between 229 compounds and 414 human kinases. We converted the reported residual kinase activities (10) and kinase affinities (11) to probabilistic scores, which gave rise to 14 187, 9431 and 5977 interactions of at least low, medium and high confidence, respectively. The second channel is made up of manually curated drug–target databases: DrugBank (12), GLIDA (13), Matador (14), TTD (15) and CTD (16); and pathway databases: KEGG (17), NCI/Nature Pathway Interaction Database (18), Reactome (19) and BioCyc (20).

PREDICTION OF INTERACTIONS

STITCH contains verified interactions (from the sources listed earlier in text) and predicted interactions, based on text mining and other prediction methods. In the text-mining channels, interactions were extracted from the literature using both co-occurrence text mining and Natural Language Processing (21,22). For the first time for STITCH, we not only use data from MEDLINE abstracts and OMIM (23) but also from full-text articles freely available from PubMed Central or publishers’ Web sites.

In previous versions, we have used medical subject headings (MeSH) terms in text mining and when importing external databases. These terms allowed us to expand concepts like ‘alpha adrenergic receptors’ to individual proteins. We used to map MeSH terms to proteins using a combination of automatic and manual approaches, which led to errors in some cases. Furthermore, the mapping was only valid for human proteins. We have, therefore, started to use terms from the Gene Ontology [GO terms, (24)] to define groups of proteins. We excluded GO annotations based on mutant phenotypes (IMP) and electronic annotations (IEA). We then checked the coverage of GO annotations for all species in STITCH. We only mapped GO terms to proteins for species where at least 10% of the proteins have been annotated, namely Drosophila melanogaster, Escherichia coli, Homo sapiens, Mus musculus, Saccharomyces cerevisiae and Schizosaccharomyces pombe.

As the coverage of synonyms is lower than for MeSH terms, we manually added additional synonyms to GO terms to increase the text-mining sensitivity. As one GO term corresponds to multiple proteins, the resulting confidence score for the individual protein–chemical interactions should be down-weighted compared with interactions that are directly associated with a single protein. We, therefore, determined a correction factor through benchmarking (as a function of the number of member proteins in the GO term). For each channel, we looked at the GO terms that are interacting with chemicals. We then checked if the member proteins that are part of the GO terms are in turn interacting with chemicals. For each of these chemicals, we determined the fraction of member proteins that are interacting. For example, if a drug was known to bind two of the three α2-adrenergic receptors, it was added as a data point (x = 3, y = 2/3) to the benchmark data. The data points were then fitted for each channel by the following function:

f(x) = (x) a

For larger groups, the function approaches x (i.e. interacting with one protein is not predictive for the other proteins).

In this version of STITCH, we introduced a fourth channel, namely predicted protein–chemical interactions based on chemical structure. Countless articles on the prediction of drug–target interactions have been published in the last years [e.g. (25–27), reviewed in (28)]. In many cases, however, the actual predictions are not available. We, therefore, implemented a relatively simple and transparent prediction scheme based on Random Forests (29,30): for each target for which >100 binding partners are known from the ChEMBL database, we attempted to make a prediction. To avoid biases, we first excluded highly similar chemicals, enforcing a maximum Tanimoto similarity of 0.9 (using Algorithm 2 described by Hobohm) (31) using 2D chemical fingerprints calculated with the chemistry development kit (32,33). We then added ten times as many random chemicals as non-binders to the training set and used the fingerprints as predictors for all compounds. Using 10-fold cross-validation, we assessed how predictive the model is (by calculating the Pearson correlation coefficient between the training data and the cross-validation results). We used the correlation as a correction factor to decrease the confidence score of the predicted interactions, which were predicted for all compounds occurring in the ChEMBL database. We repeated this procedure three times for each compound and used the median predicted score, to decrease the effect of the random negative set. As interactions were predicted from the experimental channel, the predictions and experimental channels are not independent of each other. To compute the combined score (which is shown on the network), we therefore took the highest of either score, instead of combining the scores in a Bayesian fashion as it is done for the other channels. In total, predictions were made for 767 proteins across 15 species. The median correlation between the training data and the cross-validation prediction was 0.90.

Links between compounds were also extracted from the aforementioned sources, if possible (e.g. chemical reactions from pathway databases or co-mentioned chemicals from text mining). We also predicted shared mechanisms of action from MeSH pharmacological actions, the Connectivity Map using the DIPS method (34), which tests for similar changes in gene expression on compound treatment, and from screening data from the Developmental Therapeutics Program NCI/NIH (35). The latter screening data replaces our previous analysis of the NCI60 panel. We considered only the 70 of 115 cell lines against which >10 000 compounds have been screened and centered the negative logarithm of GI50 values with respect to both compounds and cell lines. For the 47 692 compounds in the data set, we calculated all-against-all covariance across cell lines and converted these to probabilistic scores. This resulted in 114 072, 24 889 and 6890 pairs of compounds of at least low, medium and high confidence, respectively.

To account for the fact that many interactions are determined in model species, we transfer interactions between species. Previously, the sequence similarity between two proteins was used to determine the confidence in the transferred score. This had the disadvantage that when transferring evidence from a selective binder (e.g. inhibiting only one subtype of a receptor), all subtypes of the receptor in the target species would receive a similar score. In the new scheme, only the orthologous protein receives the evidence from the specific compound.

INTEGRATION WITH USER DATA

Users can now upload a spreadsheet (e.g. in Microsoft Excel format) with experimental data to STITCH using the ‘batch import’ functionality (Figure 2). For each compound, the spreadsheet may contain: the name of the compound, the chemical structure (as SMILES string, InChI or InChIKey), an internal identifier and a readout value. STITCH uses the name and chemical structure to find the compound in the STITCH database. The name provided by the user can then be shown in the interaction network, and the downloadable files contain both the name and the user’s internal identifier (if provided). The readout value may be a numerical value, e.g. the activity of a compound in a screen. The user can then select a palette from the ColorBrewer2 color schemes (36). The palette is used to convert the numerical value into a color, which is then used to highlight the compounds in the network with a colored halo (Figure 3). It is also possible to directly specify colors (in standard hexadecimal notation).

Figure 2. Data upload. The user can use the batch import form to upload a spreadsheet, e.g. from Microsoft Excel (a). STITCH will then show the first five rows of the spreadsheet and ask the user to identify columns that contain the name, chemical structure or a numerical readout (b). Selected columns are highlighted in green. STITCH uses heuristics to suggest which kind of information the columns contain, e.g. by identifying SMILES strings as structural descriptors.

Figure 3. User data and the STITCH network. For four compounds that are part of the example data set from Figure 2, interacting proteins are shown. The numerical readout has been converted to a color on a red–blue gradient. Instead of the normal chemical names used by STITCH, the full names provided in the data set are used, enabling the user to easily recognize the studied chemicals.

USE CASES

The majority of users access STITCH via the web interface, where networks can be retrieved using single or multiple names of proteins or chemicals. Furthermore, users can query STITCH with protein sequences and chemical structures (in the form of SMILES strings). The networks can then be explored interactively or saved in different formats, including publication-quality images. Proteins and chemicals can be clustered in the interactive network viewer and enriched GO terms among the proteins can be computed (5,37). The set of all interactions is also available for download under Creative Commons licenses (with separate commercial licensing for a subset). In this way, STITCH can be used to drive large-scale studies. Many research groups have already used STITCH 3 in this way; a few examples illustrating different utilities follow: STITCH has been used to determine which proteins cause side effects during drug treatment (38,39) by combining the STITCH network with data from a side effect database (40). The database has also been instrumental for the identification of druggable proteins to predict polypharmacological treatment of diseases on the basis of network topology features (41). For a method that predicts drug targets based on chemogenetic assays in yeast, STITCH has been chosen as a benchmark set (42). Lastly, STITCH has also been integrated into other tools, for example ResponseNet2.0 and QuantMap (43,44).

ACKNOWLEDGEMENTS

The authors wish to thank Yan P. Yuan (EMBL) for his outstanding support with the STITCH servers.

FUNDING

Deutsche Forschungsgemeinschaft [DFG KU 2796/2-1 to M.K.]; Novo Nordisk Foundation Center for Protein Research. Funding for open access charge: European Molecular Biology Laboratory.

Conflict of interest statement. None declared.

REFERENCES

1. Barabási,A.L. and Oltvai,Z.N. (2004) Network biology: understanding the cell’s functional organization. Nat. Rev. Genet., 5, 101–113.
2. Kuhn,M., von Mering,C., Campillos,M., Jensen,L.J. and Bork,P. (2008) STITCH: interaction networks of chemicals and proteins. Nucleic Acids Res., 36, D684–D688.
3. Kuhn,M., Szklarczyk,D., Franceschini,A., Campillos,M., von Mering,C., Jensen,L.J., Beyer,A. and Bork,P. (2010) STITCH 2: an interaction network database for small molecules and proteins. Nucleic Acids Res., 38, D552–D556.
4. Kuhn,M., Szklarczyk,D., Franceschini,A., von Mering,C., Jensen,L.J. and Bork,P. (2012) STITCH 3: zooming in on protein-chemical interactions. Nucleic Acids Res., 40, D876–D880.
5. Franceschini,A., Szklarczyk,D., Frankild,S., Kuhn,M., Simonovic,M., Roth,A., Lin,J., Minguez,P., Bork,P., von Mering,C. et al. (2013) STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res., 41, D808–D815.
6. Wang,Y., Xiao,J., Suzek,T.O., Zhang,J., Wang,J. and Bryant,S.H. (2009) PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic Acids Res., 37, W623–W33.
7. Gaulton,A., Bellis,L.J., Bento,A.P., Chambers,J., Davies,M., Hersey,A., Light,Y., McGlinchey,S., Michalovich,D., Al-Lazikani,B. et al. (2012) ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res., 40, D1100–D1107.
8. Roth,B.L., Lopez,E., Patel,S. and Kroeze,W. (2000) The multiplicity of serotonin receptors: uselessly diverse molecules or an embarrassment of riches? Neuroscientist, 6, 252–262.
9. Rose,P.W., Beran,B., Bi,C., Bluhm,W.F., Dimitropoulos,D., Goodsell,D.S., Prlic,A., Quesada,M., Quinn,G.B., Westbrook,J.D. et al. (2011) The RCSB protein data bank: redesigned web site and web services. Nucleic Acids Res., 39, D392–D401.
10. Anastassiadis,T., Deacon,S.W., Devarajan,K., Ma,H. and Peterson,J.R. (2011) Comprehensive assay of kinase catalytic activity reveals features of kinase inhibitor selectivity. Nat. Biotechnol., 29, 1039–1045.
11. Davis,M.I., Hunt,J.P., Herrgard,S., Ciceri,P., Wodicka,L.M., Pallares,G., Hocker,M., Treiber,D.K. and Zarrinkar,P.P. (2011) Comprehensive analysis of kinase inhibitor selectivity. Nat. Biotechnol., 29, 1046–1051.
12. Knox,C., Law,V., Jewison,T., Liu,P., Ly,S., Frolkis,A., Pon,A., Banco,K., Mak,C., Neveu,V. et al. (2011) DrugBank 3.0: a comprehensive resource for ‘omics’ research on drugs. Nucleic Acids Res., 39, D1035–D1041.
13. Okuno,Y., Yang,J., Taneishi,K., Yabuuchi,H. and Tsujimoto,G. (2006) GLIDA: GPCR-ligand database for chemical genomic drug discovery. Nucleic Acids Res., 34, D673–D677.
14. Günther,S., Kuhn,M., Dunkel,M., Campillos,M., Senger,C., Petsalaki,E., Ahmed,J., Urdiales,E.G., Gewiess,A., Jensen,L.J. et al. (2008) SuperTarget and Matador: resources for exploring drug-target relationships. Nucleic Acids Res., 36, D919–D922.
15. Zhu,F., Shi,Z., Qin,C., Tao,L., Liu,X., Xu,F., Zhang,L., Song,Y., Liu,X., Zhang,J. et al. (2012) Therapeutic target database update 2012: a resource for facilitating target-oriented drug discovery. Nucleic Acids Res., 40, D1128–D1136.
16. Davis,A.P., Murphy,C.G., Johnson,R., Lay,J.M., Lennon-Hopkins,K., Saraceni-Richards,C., Sciaky,D., King,B.L., Rosenstein,M.C., Wiegers,T.C. et al. (2013) The comparative toxicogenomics database: update 2013. Nucleic Acids Res., 41, D1104–D1114.
17. Kanehisa,M., Goto,S., Sato,Y., Furumichi,M. and Tanabe,M. (2012) KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res., 40, D109–D114.
18. Schaefer,C.F., Anthony,K., Krupa,S., Buchoff,J., Day,M., Hannay,T. and Buetow,K.H. (2009) PID: the pathway interaction database. Nucleic Acids Res., 37, D674–D679.
19. Croft,D., O’Kelly,G., Wu,G., Haw,R., Gillespie,M., Matthews,L., Caudy,M., Garapati,P., Gopinath,G., Jassal,B. et al. (2011) Reactome: a database of reactions, pathways and biological processes. Nucleic Acids Res., 39, D691–D697.
20. Caspi,R., Altman,T., Dale,J.M., Dreher,K., Fulcher,C.A., Gilham,F., Kaipa,P., Karthikeyan,A.S., Kothari,A., Krummenacker,M. et al. (2010) The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Res., 38, D473–D479.
21. Saric,J., Jensen,L.J., Ouzounova,R., Rojas,I. and Bork,P. (2006) Extraction of regulatory gene/protein networks from Medline. Bioinformatics, 22, 645–650.
22. Jensen,L.J., Saric,J. and Bork,P. (2006) Literature mining for the biologist: from information retrieval to biological discovery. Nat. Rev. Genet., 7, 119–129.
23. Amberger,J., Bocchini,C.A., Scott,A.F. and Hamosh,A. (2009) McKusick’s Online Mendelian Inheritance in Man (OMIM). Nucleic Acids Res., 37, D793–D796.
24. Ashburner,M., Ball,C.A., Blake,J.A., Botstein,D., Butler,H., Cherry,J.M., Davis,A.P., Dolinski,K., Dwight,S.S., Eppig,J.T. et al. (2000) Gene ontology: tool for the unification of biology. The gene ontology consortium. Nat. Genet., 25, 25–29.
25. Lounkine,E., Keiser,M.J., Whitebread,S., Mikhailov,D., Hamon,J., Jenkins,J.L., Lavan,P., Weber,E., Doak,A.K., Côté,S. et al. (2012) Large-scale prediction and testing of drug activity on side-effect targets. Nature, 486, 361–367.
26. Besnard,J., Ruda,G.F., Setola,V., Abecassis,K., Rodriguiz,R.M., Huang,X.P., Norval,S., Sassano,M.F., Shin,A.I., Webster,L.A. et al. (2012) Automated design of ligands to polypharmacological profiles. Nature, 492, 215–220.
27. Paolini,G.V., Shapland,R.H., van Hoorn,W.P., Mason,J.S. and Hopkins,A.L. (2006) Global mapping of pharmacological space. Nat. Biotechnol., 24, 805–815.
28. Rognan,D. (2013) Towards the next generation of computational chemogenomics tools. Mol. Inf., 32, 1029–1034.
29. Breiman,L. (2001) Random forests. Mach. Learn., 45, 5–32.
30. Chen,B., Sheridan,R.P., Hornak,V. and Voigt,J.H. (2012) Comparison of random forest and pipeline pilot naïve bayes in prospective QSAR predictions. J. Chem. Inf. Model., 52, 792–803.
31. Hobohm,U., Scharf,M., Schneider,R. and Sander,C. (1992) Selection of representative protein data sets. Protein Sci., 1, 409–417.
32. Steinbeck,C., Hoppe,C., Kuhn,S., Floris,M., Guha,R. and Willighagen,E.L. (2006) Recent developments of the chemistry development kit (CDK) - an open-source java library for chemo- and bioinformatics. Curr. Pharm. Des., 12, 2111–2120.
33. Steinbeck,C., Han,Y., Kuhn,S., Horlacher,O., Luttmann,E. and Willighagen,E. (2003) The Chemistry Development Kit (CDK): an open-source java library for chemo- and bioinformatics. J. Chem. Inf. Comput. Sci., 43, 493–500.
34. Iskar,M., Campillos,M., Kuhn,M., Jensen,L.J., van Noort,V. and Bork,P. (2010) Drug-induced regulation of target expression. PLoS Comput. Biol., 6, e1000925.
35. Grever,M.R., Schepartz,S.A. and Chabner,B.A. (1992) The National Cancer Institute: cancer drug discovery and development program. Semin. Oncol., 19, 622–638.
36. Harrower,M. and Brewer,C.A. (2003) ColorBrewer.org: an online tool for selecting colour schemes for maps. Cartogr. J., 40, 27–37.
37. Szklarczyk,D., Franceschini,A., Kuhn,M., Simonovic,M., Roth,A., Minguez,P., Doerks,T., Stark,M., Muller,J., Bork,P. et al. (2011) The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res., 39, D561–D568.
38. Duran-Frigola,M. and Aloy,P. (2013) Analysis of chemical and biological features yields mechanistic insights into drug side effects. Chem. Biol., 20, 594–603.
39. Kuhn,M., Al Banchaabouchi,M., Campillos,M., Jensen,L.J., Gross,C., Gavin,A.C. and Bork,P. (2013) Systematic identification of proteins that elicit drug side effects. Mol. Syst. Biol., 9, 663.
40. Kuhn,M., Campillos,M., Letunic,I., Jensen,L.J. and Bork,P. (2010) A side effect resource to capture phenotypic effects of drugs. Mol. Syst. Biol., 6, 343.
41. Vitali,F., Mulas,F., Marini,P. and Bellazzi,R. (2013) Network-based target ranking for polypharmacological therapies. J. Biomed. Inf., 46, 876–881.
42. Heiskanen,M.A. and Aittokallio,T. (2013) Predicting drug-target interactions through integrative analysis of chemogenetic assays in yeast. Mol. Biosyst., 9, 768–779.
43. Basha,O., Tirman,S., Eluk,A. and Yeger-Lotem,E. (2013) ResponseNet2.0: revealing signaling and regulatory pathways connecting your proteins and genes - now with human data. Nucleic Acids Res., 41, W198–W203.
44. Schaal,W., Hammerling,U., Gustafsson,M.G. and Spjuth,O. (2013) Automated QuantMap for rapid quantitative molecular network topology analysis. Bioinformatics, 29, 2369–2370.
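
The score combination described under PREDICTION OF INTERACTIONS can be illustrated with a minimal sketch. It is not the STITCH implementation. The text states only that the other channels are combined "in a Bayesian fashion", so the sketch assumes a plain noisy-OR, 1 − ∏(1 − s), and leaves out any prior correction. The function names and channel keys are hypothetical. What is taken from the text is the special case: the predicted channel is derived from the experimental one, so the higher of the two scores is used rather than a combination of both.

```python
from math import prod


def combine_independent(scores):
    """Noisy-OR of channel scores treated as independent.

    Assumed form: the article does not give the exact formula.
    """
    return 1.0 - prod(1.0 - s for s in scores)


def combined_score(channels):
    """Combine per-channel confidence scores for one protein-chemical pair.

    `channels` maps hypothetical channel names to scores in [0, 1];
    missing channels count as 0.
    """
    # Experimental and predicted scores are not independent of each other,
    # so only the higher of the two enters the combination.
    exp_or_pred = max(channels.get("experimental", 0.0),
                      channels.get("predicted", 0.0))
    others = [channels.get("database", 0.0),
              channels.get("textmining", 0.0)]
    return combine_independent([exp_or_pred, *others])


example = {"experimental": 0.6, "predicted": 0.8, "database": 0.5}
print(round(combined_score(example), 3))  # prints 0.9
```

In this example only 0.8 (the higher of the experimental and predicted scores) is combined with the database score of 0.5, so the structure-based prediction does not count the underlying experimental evidence a second time.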
Nucleic Acids Research – Oxford University Press
Published: Jan 28, 2014