SwissTargetPrediction: a web server for target prediction of bioactive small molecules

Nucleic Acids Research, 2014, Vol. 42, Web Server issue, W32–W38
Published online 03 May 2014
doi: 10.1093/nar/gku293

David Gfeller 1, Aurélien Grosdidier 1, Matthias Wirth 1, Antoine Daina 1, Olivier Michielin 1,2,3,* and Vincent Zoete 1,*

1 Swiss Institute of Bioinformatics (SIB), Quartier Sorge, Bâtiment Génopode, CH-1015 Lausanne, Switzerland, 2 Ludwig Institute for Cancer Research, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland and 3 Oncology Department, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland

Received January 29, 2014; Revised March 24, 2014; Accepted March 30, 2014

* To whom correspondence should be addressed. Tel: +41 21 692 4053; Fax: +41 21 692 4065; Email: [email protected]
Correspondence may also be addressed to Vincent Zoete. Email: [email protected]

© The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT

Bioactive small molecules, such as drugs or metabolites, bind to proteins or other macro-molecular targets to modulate their activity, which in turn results in the observed phenotypic effects. For this reason, mapping the targets of bioactive small molecules is a key step toward unraveling the molecular mechanisms underlying their bioactivity and predicting potential side effects or cross-reactivity. Recently, large datasets of protein–small molecule interactions have become available, providing a unique source of information for the development of knowledge-based approaches to computationally identify new targets for uncharacterized molecules or secondary targets for known molecules. Here, we introduce SwissTargetPrediction, a web server to accurately predict the targets of bioactive molecules based on a combination of 2D and 3D similarity measures with known ligands. Predictions can be carried out in five different organisms, and mapping predictions by homology within and between different species is enabled for close paralogs and orthologs. SwissTargetPrediction is accessible free of charge and without login requirement at http://www.swisstargetprediction.ch.

INTRODUCTION

Molecular insight into the mode of action of bioactive small molecules is key to understanding observed phenotypes, predicting potential side effects or cross-reactivity and optimizing existing compounds (1–3). In particular, mapping their targets is a crucial step toward providing a rational understanding of small molecule's bioactivity. For these reasons, high-throughput reverse screening of chemical compounds against arrays of protein targets has become an integral part of drug discovery pipelines (4). As a result, for many proteins such as specific kinases or phosphatases, hundreds of small molecule ligands have been identified. Such large screening initiatives have also provided unique insights into the specificity and pharmacology of protein families (1,5). Recently, these data have been collected in several public databases, like ChEMBL (6) or PubChem (7) storing information on bioactivities, or ZINC (8) containing information on commercially available compounds. These can be mined automatically to retrieve specific information for a large number of molecules.

However, molecular targets still remain unknown in several cases. For instance, phenotypic assays indicate whether a molecule is active or not, without necessarily providing direct information on its actual molecular targets (9–11). Moreover, for most molecules, experiments have been performed with a limited set of targets, such as kinases or G protein-coupled receptors, and possible off-target effects have been rarely tested for. Finally, new molecules being developed for specific purposes may have several targets that are typically not known in advance. For instance, a recent study on a set of 802 drugs and interaction data assembled from seven different databases has shown that known drugs have on average six molecular targets on which they exhibit activity (12). Identifying these secondary targets is crucial. First, it can indicate possible adverse side effects that might arise when using the molecule, thereby decreasing the attrition rate in clinical trials due to toxicity (13,14). Second, it provides ways of repositioning (or repurposing) molecules for new applications. This has become a central theme in pharmaceutical research in view of the difficulty to launch new chemical entities. In particular, it is increasingly being recognized that several compounds traditionally used for one given application may actually show potent activity in other therapeutic settings (2,15,16).

Computational predictions play an important role in narrowing down the set of potential targets and suggesting secondary targets for known molecules (13,15). In particular, the large amount of information collected on protein–small molecule interactions in the last few years has enabled researchers to develop ligand-based approaches for target prediction (1,17–20). With SwissTargetPrediction, our goal is to provide a user-friendly web interface for a knowledge-based algorithm, recently developed in our group (18), to predict the targets of bioactive small molecules. Compared to other existing approaches, SwissTargetPrediction has several distinctive features. First, it enables combining both 2D and 3D similarity measures with known ligands. Second, it provides results in five different species. Third, it allows users to map predictions between and within organisms based on target homology.

THE SWISSTARGETPREDICTION METHOD AND DATASET

SwissTargetPrediction is based on the observation that similar bioactive molecules are more likely to share similar targets (1,21). Therefore, the targets of a molecule can be predicted by identifying proteins with known ligands that are highly similar to the query molecule. In this ligand-based strategy, a major challenge is to accurately identify and quantify similarity between the query molecule and the known ligands. Early approaches have focused on determining chemical similarity by using molecular fingerprints (22) (sometimes called 2D similarity). While compounds exhibiting a high similarity under these measures clearly have an increased likelihood for interactions with similar targets, the biophysics of molecular recognition suggests that similarity in ligand shape or electrostatic potential distribution could also lead to a similar effect (23). Therefore, 3D structural similarity measures have been developed to assess similarity between molecules (24–29). Recently, we have shown that combining 2D and 3D similarity measures significantly increases the target prediction accuracy, especially if the query molecule is new and does not belong to an already well-studied chemical series (18).

In SwissTargetPrediction, both 2D similarity and 3D similarity values are computed against a set of known ligands. For 2D similarity, we use FP2 fingerprints to describe molecules, as implemented in OpenBabel version 2.2.0. The similarity between two molecules is quantified with the Tanimoto coefficient (which corresponds to the number of shared fingerprint patterns divided by the total number of fingerprint patterns describing the two molecules). For 3D similarity, we first generate 20 different conformations of each molecule (see Supplementary Materials). From these different conformations, 20 Electroshape vectors, which consist of 18-dimensional real vectors (27), are computed. The Manhattan distance $d = \sum_{s=1}^{18} |x_s - y_s|$ is used to compare vectors (x and y) describing two different molecules. The final 3D similarity value between molecules i and j is computed as $1/(1 + d_{ij})$, where $d_{ij}$ is the smallest Manhattan distance among the 20×20 distances calculated over all possible conformations of each molecule (see also Supplementary Materials).

The final score of a target corresponds to a combination of similarity measures based on a logistic regression of the similarity values, with the most similar ligands using both 2D and 3D similarity measures (see Supplementary Materials and (18)). Coefficients of the logistic regression for each molecule size are listed in Supplementary Table S1. Target scores range therefore between 0 and 1, with the largest possible value being reached if the query molecule is a known ligand of the target. These scores are used to rank predicted targets. A probability has been derived from this score to assess the likelihood of the predictions to be correct. These probability values correspond to the average precision (i.e. number of true-positives divided by the total number of predicted targets at different thresholds) obtained in a leave-one-out cross-validation study over our training set (see Supplementary Materials). As it is based on cross-validation, they may suffer from internal biases in our training data (e.g. presence of large congeneric series of similar molecules) and if a new query molecule without related molecules in our database is tested, they may slightly overestimate the prediction accuracy. For this reason, we stress that these probabilities are primarily used to rank targets predicted to bind to a given small molecule. In particular, they should not be used to compare predictions obtained with different molecules.

The set of protein–ligand interactions was retrieved from the ChEMBL database version 16 (6) using stringent criteria to remove ambiguous cases. First, only interactions involving single proteins or protein complexes as well as ligands with less than 80 heavy atoms were considered. Second, selected interactions had to be annotated as direct binding ('assay type' = 'B') with an activity (Ki, Kd, IC50 or EC50) lower than 10 μM in all assays. Interactions were retrieved in five organisms (human, mouse, rat, cow and horse). In total, our dataset consists of 280 381 small molecules interacting with 2686 targets, with the majority of targets (66%) found in human (see Table 1).

Table 1. Number of targets in each organism

Organism             Number of targets    Number of targets including homology-based predictions
Homo sapiens         1768                 2547
Mus musculus         342                  2345
Rattus norvegicus    469                  2657
Bos taurus           104                  2272
Equus caballus       3                    2367
Total                2686                 12 188

The first column shows the number of targets with experimental data. The second column shows the number of targets when including homology-based predictions.

THE SWISSTARGETPREDICTION WEB INTERFACE

SwissTargetPrediction provides an intuitive interface to predict small molecule protein targets (see also Supplementary Figure S1). Query molecules can be inputted either as SMILES, or drawn in 2D using the javascript-based molecular editor of ChemAxon (http://www.chemaxon.com). The SMILES input field and the 2D interface are automatically synchronized. The organism in which predictions should be made can be selected. The current version of SwissTargetPrediction allows users to choose between five organisms: human, mouse, rat, cow and horse, the default being human (see Supplementary Figure S1). Once a molecule has been provided, either by SMILES or by drawing, and an organism has been chosen, the 'Submit' button becomes clickable and calculations can start. The SMILES is first checked to ensure that it corresponds to a valid chemical structure. If true, the similarity (both 2D and 3D) between the query molecule and all ligands in our database is computed and the score of each target is derived from the combined 2D and 3D similarity values with the most similar ligands (see Supplementary Materials).

The result page lists the predicted targets with their common name together with links to GeneCards (30) (for human proteins), UniProt (31) and ChEMBL (6) databases when available (see Figure 1). Targets are ranked according to their score with respect to the query molecule. The target classes are displayed in the last column. These classes were retrieved from the ChEMBL target annotation and in general correspond to the l1 level in the target classification (6). Exceptions include enzymes and transcription factors for which more detailed classification based on l2 or l3 levels is sometimes shown if they occur frequently in the target list (e.g. Tyr kinase, see Figure 1). The pie chart on the top right of the page shows a summary of the different target classes present among the predicted targets. All results can be downloaded as text (.txt or .csv), images (.jpg), printable report (.pdf), copied to the clipboard or sent to an email address by clicking on the links following the 'Retrieve data:' field. The probability derived from the target scores (see Supplementary Materials) is displayed in the fifth column as a horizontal bar (see Figure 1).

Figure 1. Prediction result page. This page shows the list of predicted targets for the query molecule (here chlorotrianisene). Targets are ranked according to their scores. Links to GeneCards (under 'Common name' column), UniProt and ChEMBL (when available) are provided. Green bars indicate the estimated probability of a protein to be a true target given its score. The sixth column (# sim cmpds 3D/2D) shows the number of ligands of the predicted target or its homologs that display similarity with the query molecule based on either 2D or 3D similarity measures. These numbers are linked to pages containing information about these ligands. For instance, the number circled in red provides a link to the list of ligands of ESR1 or its homologous proteins that display similarity with the query molecule (see Figure 2A). The pie chart shows the distribution of target classes. Predictions based on homology are indicated with '(by homology)' (see the green box).

In the example of Figure 1, the predicted targets of chlorotrianisene (CHEMBL1200761) include Prostaglandin G/H synthase 1 (COX-1) and estrogen receptor (ESR1). Chlorotrianisene is a known inhibitor of COX-1 (13), although the interaction is not present in ChEMBL. Moreover, while no direct binding between chlorotrianisene and estrogen receptor is reported in ChEMBL, functional assay results in this database indicate that chlorotrianisene is active on estrogen receptor (32). These results show that, in this case, several of the predictions are true-positives.

To enable users to visually explore the ligands of the predicted targets, all ligands with a similarity (either 2D or 3D) larger than a minimal threshold value can be examined by following the links provided in the sixth column. Figure 2A shows an example of the results obtained by following the link in the red circle of Figure 1. Ligands are listed according to their similarity with the query molecule. A threshold for 3D similarity values has been set to 0.75 and the one for 2D similarity values to 0.45. Below these thresholds, ligands show very low similarity with the query molecule and are not listed. A link to the ChEMBL entries is provided for the ligands and the similarity with the query molecule is indicated. We note that manually exploring the ligands similar to the query molecule is strongly recommended to assess how reasonable the predictions are and to see what kind of ligands display the strongest similarity with the query molecule.

Finally, help pages with interactive screenshots of the website are available, an FAQ page is provided to guide users, and some of the raw data used in the predictions can be retrieved via the download page.

HOMOLOGY-BASED PREDICTIONS

Proteins originating from a common ancestor in general display a high degree of sequence and structure similarity. From a computational point of view, this similarity has been widely used in protein structure and function prediction, for instance (33,34). Recently, it has been shown that the binding of small molecules is also often conserved between homologs (35–37). In particular, orthologous proteins in close species such as human and rat often share most of their ligands (36). The same holds for paralogs, although the degree of similarity between ligands of paralogous proteins is slightly lower than between orthologous proteins (36).

In SwissTargetPrediction, we provide the possibility to map predictions based on protein homology, both within and between organisms. Orthologs and paralogs were retrieved from Ensembl Compara (38), Treefam (39) and orthoDB (40), using the union of all three datasets. Homology-based predictions were carried out as follows: the query molecule is compared to all molecules that bind to targets that have homology with a protein in the selected organism. Predictions are then carried out as if the ligands of these proteins were actual ligands of their homologs in the selected organism. If the ligand most similar to the query molecule is only observed to bind to a homologous protein, predictions are listed as 'by homology' on the SwissTargetPrediction result page (see Figure 1, green box). Moreover, in the list of ligands similar to the query molecule, those binding only to homologous targets are also designated with 'By Homology' and the actual target is indicated (Figure 2B, green box). For instance, in Figure 1 chlorotrianisene (CHEMBL1200761) is predicted to bind ESR2 mainly because it shows similarity with ligands of ESR1 (see Figure 2A). The predicted target ESR2 is therefore annotated with 'by homology' (green box, Figure 1). Figure 2B shows the list of most similar ligands obtained by following the link in the green circle of Figure 1. As the most similar molecule is a ligand of ESR1, it is labeled with 'By homology' and both the actual target and the organism are displayed. We note that for organisms with less data (e.g. horse, cow), many predictions might be based on homology with targets in other species.

Figure 2. (A) List of ligands of ESR1 or its homologous proteins displaying 3D similarity with a query molecule (here chlorotrianisene). This page is obtained by following the link in the red circle in Figure 1. Molecules are ordered based on their 3D similarity with the query molecule. (B) List of ligands of ESR2 or its homologous proteins displaying similarity with a query molecule (here chlorotrianisene) obtained by following the link in the green circle in Figure 1. If a molecule is a ligand of a homologous protein of the predicted target, the actual target as well as its organism is indicated (see the green box). When the most similar molecule is a ligand of a homologous protein, the prediction is labeled as 'by homology' in the result page (Figure 1). A link to the ChEMBL entry is provided for each compound.

Including homology-based predictions allowed us to expand the list of predicted targets from 2686 to over 12 188 in all five organisms studied here (see Table 1). As some of these proteins do not have reported bioactivity data directly associated with them, they may not be in the ChEMBL database. This is the reason why for instance KCNH6 and KCNH7 do not have ChEMBL IDs in Figure 1. Homology relationships between all targets can be downloaded at http://www.swisstargetprediction.ch/download.php.

VALIDATION DATASET

Extensive cross-validation of the SwissTargetPrediction algorithm has been published previously (18). To complement these data, we also tested our method against a new set of molecules that are not present in the training set. In particular, we used molecules from version 17 of ChEMBL (6) that were not present in version 16 (i.e. not present in the training set). We further required that each molecule be involved in at least one positive (<2 μM) and one negative (>50 μM) interaction. This resulted in a set of 213 molecules with 346 positive and 278 negative interactions. To obtain a more balanced dataset that better reflects the much larger number of non-interacting protein–ligand pairs, we included additional negative interactions by linking the molecules in our test set to randomly chosen targets present in ChEMBL (version 16) so as to have five times more negative than positive interactions for each molecule. The full benchmark dataset can be downloaded on our website (http://www.swisstargetprediction.ch/download.php).

We then ran the SwissTargetPrediction algorithm as implemented on the website to assess how accurate the predictions are. This resulted in an average AUC value of 0.87 on this external test set of both positive and negative interactions. We also assessed how often the known targets fall into the top predicted ones in the SwissTargetPrediction general output (see Figure 1). For 70% of the ligands, at least one of the known targets is found among the first 15 top predicted ones and for 31% of the ligands in our test set, the best predicted target is a true-positive. For instance, molecule CHEMBL2325087 (SMILES: NC(=S)N1N=C(CC1c1ccc2ccccc2c1)c1ccc(Cl)c(Cl)c1) binds to EGFR and ERBB2 with sub-micromolar activity (41) and these two targets are accurately predicted by SwissTargetPrediction (see Supplementary Figure S2). Although we cannot exclude that some molecules in our test set were actually developed based on their similarity with known ligands, our results strongly indicate that SwissTargetPrediction provides reliable predictions that can be used in follow-up experiments.

DISCUSSION

SwissTargetPrediction has been primarily developed for identifying targets of molecules known to be bioactive. Nevertheless, users can upload any small molecule, real or virtual, even without prior knowledge of its potential effects. In this case, the predicted targets may be relevant, especially if the similarity with known ligands is high. The predictions may also provide hints on how a compound or a scaffold might be chemically modified in order to increase its activity on a given target by comparing with known ligands that share some similarity (see also (42)). However, we point out that prediction accuracy is expected to be significantly lower for molecules with unknown bioactivity. This can be understood by noting that SwissTargetPrediction will always suggest some target, based on the assumption that if the molecule is active, it will likely bind to some protein. For molecules with unknown bioactivity, this assumption is not valid per se and the molecule may not bind to any protein, in which case all predicted targets are false-positives. In particular, inactive compounds can sometimes exhibit good similarity with active molecules if they have been obtained by modifying an active compound at some key position that was crucial for its interactions. This is a known limitation of ligand-based approaches when applied to any kind of compounds and therefore target predictions should be interpreted with care in the absence of indication of bioactivity.

Homology-based mapping of target predictions is increasingly being recognized as a powerful approach to translate results obtained in model organisms to human (35,36,43). In this work, we have considered homology relationships between and within five vertebrate species, for which most homologous proteins display a very high sequence identity and similar functions. Therefore, we did not filter out any homology relationship. For more distant organisms (e.g. worm or yeast), greater care should be taken, for instance by allowing only mapping between orthologous proteins that have conserved binding sites or high overall sequence identity. Another possible issue with homology-based mapping arises with molecules that are specifically designed to target some members of a protein family and not others. Our algorithm, as most other ligand-based methods, will likely fail to detect these subtle differences. For instance, in Supplementary Figure S2, molecule CHEMBL2325087 is also predicted to bind to ERBB3 with equal probability, although the experimental activity (51 μM) is much lower than for EGFR and ERBB2 (41). To address such issues, one possibility is to use other orthogonal computational approaches, such as structure-based analyses or molecular docking (44,45), to refine the predictions by considering small changes in protein binding sites that could confer specificity to some targets.

In SwissTargetPrediction, we use a probability derived from our cross-validation analysis to rank the targets and estimate the accuracy of the predictions. Other approaches have been proposed to assess the confidence of predictions. For instance, in Keiser et al. (1), an E-value is computed from the 2D similarity with the set of ligands of a target. This E-value is derived from the statistics of similarity values with all ligands (above a certain threshold), while in our case only the most similar ligand according to each similarity measure is considered. Our probabilities can be interpreted in terms of precision (i.e. number of true-positives divided by the number of predicted targets), while E-values indicate how likely it would be to find a molecule with a given average similarity to the set of ligands of a target. In practice, the most similar ligands are those contributing most to the E-value, so the two approaches are not necessarily fundamentally different. Also, predictions with very low probability in our approach correspond to low similarity values, and therefore would result in high E-values. Importantly, we point out that, by combining different kinds of chemical similarity measures, our approach can explore more diverse regions of the chemical space (18).

CONCLUSION AND OUTLOOK

SwissTargetPrediction is part of an important initiative of the Swiss Institute of Bioinformatics to provide online tools for computer-aided drug design, many of which are already available (42,44,46–48). In future developments, SwissTargetPrediction will be further integrated with these tools, for instance by predicting potential binding modes with SwissDock (44). Moreover, as large screening campaigns are increasingly being carried out in different organisms both in industry and academia (49,50), SwissTargetPrediction will be regularly updated and new organisms added to it. This will enable users to efficiently harness the wealth of publicly available data to accurately predict new targets for bioactive small molecules in diverse species.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online, including references [1–6].

ACKNOWLEDGMENT

We are thankful to Tomislav Ilicic for insightful comments about the web interface.

FUNDING

Swiss Institute of Bioinformatics. Source of open access funding: Swiss Institute of Bioinformatics (core funding).

Conflict of interest statement. None declared.

REFERENCES

1. Keiser,M.J., Roth,B.L., Armbruster,B.N., Ernsberger,P., Irwin,J.J. and Shoichet,B.K. (2007) Relating protein pharmacology by ligand chemistry. Nat. Biotechnol., 25, 197–206.
2. Oprea,T.I., Bauman,J.E., Bologa,C.G., Buranda,T., Chigaev,A., Edwards,B.S., Jarvik,J.W., Gresham,H.D., Haynes,M.K., Hjelle,B. et al. (2011) Drug Repurposing from an Academic Perspective. Drug Discov. Today. Therapeutic Strategies, 8, 61–69.
3. Jorgensen,W.L. (2009) Efficient drug lead discovery and optimization. Acc. Chem. Res., 42, 724–733.
4. Ziegler,S., Pries,V., Hedberg,C. and Waldmann,H. (2013) Target identification for small bioactive molecules: finding the needle in the haystack. Angew. Chem. Int. Ed. Engl., 52, 2744–2792.
5. Karaman,M.W., Herrgard,S., Treiber,D.K., Gallant,P., Atteridge,C.E., Campbell,B.T., Chan,K.W., Ciceri,P., Davis,M.I., Edeen,P.T. et al. (2008) A quantitative analysis of kinase inhibitor selectivity. Nat. Biotechnol., 26, 127–132.
6. Bento,A.P., Gaulton,A., Hersey,A., Bellis,L.J., Chambers,J., Davies,M., Kruger,F.A., Light,Y., Mak,L., McGlinchey,S. et al. (2014) The ChEMBL bioactivity database: an update. Nucleic Acids Res., 42, D1083–D1090.
7. Bolton,E., Wang,Y., Thiessen,P.A. and Bryant,S.H. (2008), Annual Reports in Computational Chemistry, Vol. 4. American Chemical Society, Washington DC.
8. Irwin,J.J., Sterling,T., Mysinger,M.M., Bolstad,E.S. and Coleman,R.G. (2012) ZINC: a free tool to discover chemistry for biology. J. Chem. Inf. Model, 52, 1757–1768.
9. Clemons,P.A. (2004) Complex phenotypic assays in high-throughput screening. Curr. Opin. Chem. Biol., 8, 334–338.
10. Inglese,J., Johnson,R.L., Simeonov,A., Xia,M., Zheng,W., Austin,C.P. and Auld,D.S. (2007) High-throughput screening assays for the identification of chemical probes. Nat. Chem. Biol., 3, 466–479.
11. Smith,A.M., Ammar,R., Nislow,C. and Giaever,G. (2010) A survey of yeast genomic assays for drug and target discovery. Pharmacol. Ther., 127, 156–164.
12. Mestres,J., Gregori-Puigjane,E., Valverde,S. and Sole,R.V. (2009) The topology of drug-target interaction networks: implicit dependence on drug properties and target families. Mol. Biosyst., 5, 1051–1057.
13. Lounkine,E., Keiser,M.J., Whitebread,S., Mikhailov,D., Hamon,J., Jenkins,J.L., Lavan,P., Weber,E., Doak,A.K., Cote,S. et al. (2012) Large-scale prediction and testing of drug activity on side-effect targets. Nature, 486, 361–367.
14. Kola,I. and Landis,J. (2004) Can the pharmaceutical industry reduce attrition rates? Nat. Rev. Drug Discov., 3, 711–715.
15. Keiser,M.J., Setola,V., Irwin,J.J., Laggner,C., Abbas,A.I., Hufeisen,S.J., Jensen,N.H., Kuijer,M.B., Matos,R.C., Tran,T.B. et al. (2009) Predicting new molecular targets for known drugs. Nature, 462, 175–181.
16. Issa,N.T., Kruger,J., Byers,S.W. and Dakshanamurthy,S. (2013) Drug repurposing a reality: from computers to the clinic. Expert Rev. Clin. Pharmacol., 6, 95–97.
17. Dunkel,M., Gunther,S., Ahmed,J., Wittig,B. and Preissner,R. (2008) SuperPred: drug classification and target prediction. Nucleic Acids Res., 36, W55–W59.
18. Gfeller,D., Michielin,O. and Zoete,V. (2013) Shaping the interaction landscape of bioactive molecules. Bioinformatics, 29, 3073–3079.
19. Gong,J., Cai,C., Liu,X., Ku,X., Jiang,H., Gao,D. and Li,H. (2013) ChemMapper: a versatile web server for exploring pharmacology and chemical structure association based on molecular 3D similarity method. Bioinformatics, 29, 1827–1829.
20. Wang,L., Ma,C., Wipf,P., Liu,H., Su,W. and Xie,X.Q. (2013) TargetHunter: an in silico target identification tool for predicting therapeutic potential of small organic molecules based on chemogenomic database. AAPS J., 15, 395–406.
21. Campillos,M., Kuhn,M., Gavin,A.C., Jensen,L.J. and Bork,P. (2008) Drug target identification using side-effect similarity. Science, 321, 263–266.
22. Willett,P. (2011) Similarity searching using 2D structural fingerprints. Methods Mol. Biol., 672, 133–158.
23. Wirth,M. and Sauer,W.H.B. (2011) Bioactive molecules: perfectly shaped for their target. Mol. Inform., 30, 677–688.
24. Ballester,P.J. and Richards,W.G. (2007) Ultrafast shape recognition to search compound databases for similar molecular shapes. J. Comput. Chem., 28, 1711–1723.
25. Sastry,G.M., Dixon,S.L. and Sherman,W. (2011) Rapid shape-based ligand alignment and virtual screening method based on atom/feature-pair similarities and volume overlap scoring. J. Chem. Inf. Model, 51, 2455–2466.
26. Liu,X., Jiang,H. and Li,H. (2011) SHAFTS: a hybrid approach for 3D molecular similarity calculation. 1. Method and assessment of virtual screening. J. Chem. Inf. Model, 51, 2372–2385.
27. Armstrong,M.S., Finn,P.W., Morris,G.M. and Richards,W.G. (2011) Improving the accuracy of ultrafast ligand-based screening: incorporating lipophilicity into ElectroShape as an extra dimension. J. Comput. Aided Mol. Des., 25, 785–790.
28. Perez-Nueno,V.I., Venkatraman,V., Mavridis,L. and Ritchie,D.W. (2012) Detecting drug promiscuity using Gaussian ensemble screening. J. Chem. Inf. Model, 52, 1948–1961.
29. Armstrong,M.S., Morris,G.M., Finn,P.W., Sharma,R., Moretti,L., Cooper,R.I. and Richards,W.G. (2010) ElectroShape: fast molecular similarity calculations incorporating shape, chirality and electrostatics. J. Comput. Aided Mol. Des., 24, 789–801.
30. Safran,M., Dalah,I., Alexander,J., Rosen,N., Iny Stein,T., Shmoish,M., Nativ,N., Bahir,I., Doniger,T., Krug,H. et al. (2010) GeneCards Version 3: the human gene integrator. Database (Oxford), doi: 10.1093/database/baq020.
31. UniProt,C. (2013) Update on activities at the Universal Protein Resource (UniProt) in 2013. Nucleic Acids Res., 41, D43–D47.
32. Kupfer,D. and Bulger,W.H. (1990) Inactivation of the uterine estrogen receptor binding of estradiol during P-450 catalyzed metabolism of chlorotrianisene (TACE). Speculation that TACE antiestrogenic activity involves covalent binding to the estrogen receptor. FEBS Lett., 261, 59–62.
33. Kiefer,F., Arnold,K., Kunzli,M., Bordoli,L. and Schwede,T. (2009) The SWISS-MODEL repository and associated resources. Nucleic Acids Res., 37, D387–D392.
34. Loewenstein,Y., Raimondo,D., Redfern,O.C., Watson,J., Frishman,D., Linial,M., Orengo,C., Thornton,J. and Tramontano,A. (2009) Protein function annotation by homology-based inference. Genome Biol., 10, 207.
35. Klabunde,T. (2007) Chemogenomic approaches to drug discovery: similar receptors bind similar ligands. Br. J. Pharmacol., 152, 5–7.
36. Kruger,F.A. and Overington,J.P. (2012) Global analysis of small molecule binding to related protein targets. PLoS Comput. Biol., 8, e1002333.
37. Paricharak,S., Klenka,T., Augustin,M., Patel,U.A. and Bender,A. (2013) Are phylogenetic trees suitable for chemogenomics analyses of bioactivity data sets: the importance of shared active compounds and choosing a suitable data embedding method, as exemplified on Kinases. J. Cheminform., 5, 49.
38. Vilella,A.J., Severin,J., Ureta-Vidal,A., Heng,L., Durbin,R. and Birney,E. (2009) EnsemblCompara GeneTrees: complete, duplication-aware phylogenetic trees in vertebrates. Genome Res., 19, 327–335.
39. Schreiber,F., Patricio,M., Muffato,M., Pignatelli,M. and Bateman,A. (2014) TreeFam v9: a new website, more species and orthology-on-the-fly. Nucleic Acids Res., 42, D922–D925.
40. Waterhouse,R.M., Tegenfeldt,F., Li,J., Zdobnov,E.M. and Kriventseva,E.V. (2013) OrthoDB: a hierarchical catalog of animal, fungal and bacterial orthologs. Nucleic Acids Res., 41, D358–D365.
41. Yang,W., Hu,Y., Yang,Y.S., Zhang,F., Zhang,Y.B., Wang,X.L., Tang,J.F., Zhong,W.Q. and Zhu,H.L. (2013) Design, modification and 3D QSAR studies of novel naphthalin-containing pyrazoline derivatives with/without thiourea skeleton as anticancer agents. Bioorg. Med. Chem., 21, 1050–1063.
42. Wirth,M., Zoete,V., Michielin,O. and Sauer,W.H. (2013) SwissBioisostere: a database of molecular replacements for ligand design. Nucleic Acids Res., 41, D1137–D1143.
43. Jacob,L. and Vert,J.P. (2008) Protein-ligand interaction prediction: an improved chemogenomics approach. Bioinformatics, 24, 2149–2156.
44. Grosdidier,A., Zoete,V. and Michielin,O. (2011) SwissDock, a protein-small molecule docking web service based on EADock DSS. Nucleic Acids Res., 39, W270–W277.
45. Morris,G.M., Huey,R., Lindstrom,W., Sanner,M.F., Belew,R.K., Goodsell,D.S. and Olson,A.J. (2009) AutoDock4 and AutoDockTools4: Automated docking with selective receptor flexibility. J. Comput. Chem., 30, 2785–2791.
46. Zoete,V., Cuendet,M.A., Grosdidier,A. and Michielin,O. (2011) SwissParam: a fast force field generation tool for small organic molecules. J. Comput. Chem., 32, 2359–2368.
47. Gfeller,D., Michielin,O. and Zoete,V. (2013) SwissSidechain: a molecular and structural database of non-natural sidechains. Nucleic Acids Res., 41, D327–D332.
48. Gfeller,D., Michielin,O. and Zoete,V. (2012) Expanding molecular modeling and design tools to non-natural sidechains. J. Comput. Chem., 55, 1525–1535.
49. Wallace,I.M., Urbanus,M.L., Luciani,G.M., Burns,A.R., Han,M.K., Wang,H., Arora,K., Heisler,L.E., Proctor,M., St Onge,R.P. et al.
50. Frearson,J.A. and Collie,I.T. (2009) HTS and hit finding in
academia––from chemical genomics to drug discovery. Drug Discov. (2011) Compound prioritization methods increase rates of chemical Today, 14, 1150–1158. probe discovery in model organisms. Chem. Biol., 18, 1273–1283. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Nucleic Acids Research Oxford University Press
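
As a supplementary illustration of the two similarity measures described in the method section of this article, the following minimal Python sketch computes a Tanimoto coefficient on sets of fingerprint patterns and a 3D similarity from the smallest Manhattan distance over all conformer pairs. It is not the server's implementation. The function names are hypothetical, the fingerprints are toy string sets rather than OpenBabel FP2 fingerprints, the vectors are 3-dimensional stand-ins for the 18-dimensional ElectroShape vectors, and the `scale` argument is an assumption added here so that a per-dimension normalization can be tried.

```python
from itertools import product
from typing import Sequence, Set


def tanimoto(patterns_a: Set[str], patterns_b: Set[str]) -> float:
    """2D similarity: shared patterns divided by all patterns describing either molecule."""
    union = patterns_a | patterns_b
    if not union:
        return 0.0  # assumption: two empty fingerprints are treated as dissimilar
    return len(patterns_a & patterns_b) / len(union)


def manhattan(x: Sequence[float], y: Sequence[float]) -> float:
    """Manhattan distance d = sum over s of |x_s - y_s| between two shape vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(x, y))


def similarity_3d(
    confs_i: Sequence[Sequence[float]],
    confs_j: Sequence[Sequence[float]],
    scale: float = 1.0,  # hypothetical knob, not taken from the article text
) -> float:
    """3D similarity 1 / (1 + scale * d_ij), d_ij being the smallest
    Manhattan distance over all pairs of conformer vectors."""
    d_ij = min(manhattan(x, y) for x, y in product(confs_i, confs_j))
    return 1.0 / (1.0 + scale * d_ij)


if __name__ == "__main__":
    # Toy fingerprints: 3 shared patterns out of 5 distinct ones.
    query_fp = {"C-C", "C-O", "C=O", "c:c"}
    ligand_fp = {"C-C", "C-O", "c:c", "C-N"}
    print(tanimoto(query_fp, ligand_fp))

    # Two conformers per molecule, so 2 x 2 distances are compared.
    query_confs = [[0.0, 1.0, 2.0], [0.5, 1.0, 2.0]]
    ligand_confs = [[0.5, 1.0, 2.5], [3.0, 3.0, 3.0]]
    print(similarity_3d(query_confs, ligand_confs))
```

With 20 conformers per molecule, as in the article, the `min` runs over the 20×20 conformer pairs; taking the minimum means a single well-matching pair of conformers is enough for two molecules to score as similar.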

SwissTargetPrediction: a web server for target prediction of bioactive small molecules


References (52)

Publisher
Oxford University Press
Copyright
The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
ISSN
0305-1048
eISSN
1362-4962
DOI
10.1093/nar/gku293
pmid
24792161

Abstract

Nucleic Acids Research, 2014, Vol. 42, Web Server issue, W32–W38
Published online 03 May 2014
doi: 10.1093/nar/gku293

SwissTargetPrediction: a web server for target prediction of bioactive small molecules

David Gfeller (1), Aurélien Grosdidier (1), Matthias Wirth (1), Antoine Daina (1), Olivier Michielin (1,2,3,*) and Vincent Zoete (1,*)

(1) Swiss Institute of Bioinformatics (SIB), Quartier Sorge, Bâtiment Génopode, CH-1015 Lausanne, Switzerland, (2) Ludwig Institute for Cancer Research, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland and (3) Oncology Department, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland

Received January 29, 2014; Revised March 24, 2014; Accepted March 30, 2014

(*) To whom correspondence should be addressed. Tel: +41 21 692 4053; Fax: +41 21 692 4065; Email: [email protected]
Correspondence may also be addressed to Vincent Zoete. Email: [email protected]

© The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT

Bioactive small molecules, such as drugs or metabolites, bind to proteins or other macro-molecular targets to modulate their activity, which in turn results in the observed phenotypic effects. For this reason, mapping the targets of bioactive small molecules is a key step toward unraveling the molecular mechanisms underlying their bioactivity and predicting potential side effects or cross-reactivity. Recently, large datasets of protein–small molecule interactions have become available, providing a unique source of information for the development of knowledge-based approaches to computationally identify new targets for uncharacterized molecules or secondary targets for known molecules. Here, we introduce SwissTargetPrediction, a web server to accurately predict the targets of bioactive molecules based on a combination of 2D and 3D similarity measures with known ligands. Predictions can be carried out in five different organisms, and mapping predictions by homology within and between different species is enabled for close paralogs and orthologs. SwissTargetPrediction is accessible free of charge and without login requirement at http://www.swisstargetprediction.ch.

INTRODUCTION

Molecular insight into the mode of action of bioactive small molecules is key to understanding observed phenotypes, predicting potential side effects or cross-reactivity and optimizing existing compounds (1–3). In particular, mapping their targets is a crucial step toward providing a rational understanding of small molecule’s bioactivity. For these reasons, high-throughput reverse screening of chemical compounds against arrays of protein targets has become an integral part of drug discovery pipelines (4). As a result, for many proteins such as specific kinases or phosphatases, hundreds of small molecule ligands have been identified. Such large screening initiatives have also provided unique insights into the specificity and pharmacology of protein families (1,5). Recently, these data have been collected in several public databases, like ChEMBL (6) or PubChem (7) storing information on bioactivities, or ZINC (8) containing information on commercially available compounds. These can be mined automatically to retrieve specific information for a large number of molecules.

However, molecular targets still remain unknown in several cases. For instance, phenotypic assays indicate whether a molecule is active or not, without necessarily providing direct information on its actual molecular targets (9–11). Moreover, for most molecules, experiments have been performed with a limited set of targets, such as kinases or G protein-coupled receptors, and possible off-target effects have been rarely tested for. Finally, new molecules being developed for specific purposes may have several targets that are typically not known in advance. For instance, a recent study on a set of 802 drugs and interaction data assembled from seven different databases has shown that known drugs have on average six molecular targets on which they exhibit activity (12). Identifying these secondary targets is crucial. First, it can indicate possible adverse side effects that might arise when using the molecule, thereby decreasing the attrition rate in clinical trials due to toxicity (13,14). Second, it provides ways of repositioning (or repurposing) molecules for new applications. This has become a central theme in pharmaceutical research in view of the difficulty to launch new chemical entities. In particular, it is increasingly being recognized that several compounds traditionally used for one given application may actually show potent activity in other therapeutic settings (2,15,16).

Computational predictions play an important role in narrowing down the set of potential targets and suggesting secondary targets for known molecules (13,15). In particular, the large amount of information collected on protein–small molecule interactions in the last few years has enabled researchers to develop ligand-based approaches for target prediction (1,17–20). With SwissTargetPrediction, our goal is to provide a user-friendly web interface for a knowledge-based algorithm, recently developed in our group (18), to predict the targets of bioactive small molecules. Compared to other existing approaches, SwissTargetPrediction has several distinctive features. First, it enables combining both 2D and 3D similarity measures with known ligands. Second, it provides results in five different species. Third, it allows users to map predictions between and within organisms based on target homology.

THE SWISSTARGETPREDICTION METHOD AND DATASET

SwissTargetPrediction is based on the observation that similar bioactive molecules are more likely to share similar targets (1,21). Therefore, the targets of a molecule can be predicted by identifying proteins with known ligands that are highly similar to the query molecule. In this ligand-based strategy, a major challenge is to accurately identify and quantify similarity between the query molecule and the known ligands. Early approaches have focused on determining chemical similarity by using molecular fingerprints (22) (sometimes called 2D similarity). While compounds exhibiting a high similarity under these measures clearly have an increased likelihood for interactions with similar targets, the biophysics of molecular recognition suggests that similarity in ligand shape or electrostatic potential distribution could also lead to a similar effect (23). Therefore, 3D structural similarity measures have been developed to assess similarity between molecules (24–29). Recently, we have shown that combining 2D and 3D similarity measures significantly increases the target prediction accuracy, especially if the query molecule is new and does not belong to an already well-studied chemical series (18). In SwissTargetPrediction, both 2D similarity and 3D similarity values are computed against a set of known ligands.

For 2D similarity, we use FP2 fingerprints to describe molecules, as implemented in OpenBabel version 2.2.0. The similarity between two molecules is quantified with the Tanimoto coefficient (which corresponds to the number of shared fingerprint patterns divided by the total number of fingerprint patterns describing the two molecules). For 3D similarity, we first generate 20 different conformations of each molecule (see Supplementary Materials). From these different conformations, 20 Electroshape vectors, which consist of 18-dimensional real vectors (27), are computed. The Manhattan distance $d = \sum_{s=1}^{18} |x_s - y_s|$ is used to compare vectors ($x$ and $y$) describing two different molecules. The final 3D similarity value between molecules $i$ and $j$ is computed as $1/(1 + d_{ij})$, where $d_{ij}$ is the smallest Manhattan distance among the 20×20 distances calculated over all possible conformations of each molecule (see also Supplementary Materials).

The final score of a target corresponds to a combination of similarity measures based on a logistic regression of the similarity values, with the most similar ligands using both 2D and 3D similarity measures (see Supplementary Materials and (18)). Coefficients of the logistic regression for each molecule size are listed in Supplementary Table S1. Target scores range therefore between 0 and 1, with the largest possible value being reached if the query molecule is a known ligand of the target. These scores are used to rank predicted targets. A probability has been derived from this score to assess the likelihood of the predictions to be correct. These probability values correspond to the average precision (i.e. number of true-positives divided by the total number of predicted targets at different thresholds) obtained in a leave-one-out cross-validation study over our training set (see Supplementary Materials). As it is based on cross-validation, they may suffer from internal biases in our training data (e.g. presence of large congeneric series of similar molecules) and if a new query molecule without related molecules in our database is tested, they may slightly overestimate the prediction accuracy. For this reason, we stress that these probabilities are primarily used to rank targets predicted to bind to a given small molecule. In particular, they should not be used to compare predictions obtained with different molecules.

The set of protein–ligand interactions was retrieved from the ChEMBL database version 16 (6) using stringent criteria to remove ambiguous cases. First, only interactions involving single proteins or protein complexes as well as ligands with less than 80 heavy atoms were considered. Second, selected interactions had to be annotated as direct binding (‘assay type’ = ‘B’) with an activity (Ki, Kd, IC50 or EC50) lower than 10 µM in all assays. Interactions were retrieved in five organisms (human, mouse, rat, cow and horse). In total, our dataset consists of 280 381 small molecules interacting with 2686 targets, with the majority of targets (66%) found in human (see Table 1).

Table 1. Number of targets in each organism

Organisms            Number of targets    Number of targets including homology-based predictions
Homo sapiens         1768                 2547
Mus musculus         342                  2345
Rattus norvegicus    469                  2657
Bos taurus           104                  2272
Equus caballus       3                    2367
Total                2686                 12 188

The first column shows the number of targets with experimental data. The second column shows the number of targets when including homology-based predictions.

THE SWISSTARGETPREDICTION WEB INTERFACE

SwissTargetPrediction provides an intuitive interface to predict small molecule protein targets (see also Supplementary Figure S1). Query molecules can be inputted either as SMILES, or drawn in 2D using the javascript-based molecular editor of ChemAxon (http://www.chemaxon.com). The SMILES input field and the 2D interface are automatically synchronized. The organism in which predictions should be made can be selected. The current version of SwissTargetPrediction allows users to choose between five organisms: human, mouse, rat, cow and horse, the default being human (see Supplementary Figure S1). Once a molecule has been provided, either by SMILES or by drawing, and an organism has been chosen, the ‘Submit’ button becomes clickable and calculations can start. The SMILES is first checked to ensure that it corresponds to a valid chemical structure. If true, the similarity (both 2D and 3D) between the query molecule and all ligands in our database is computed and the score of each target is derived from the combined 2D and 3D similarity values with the most similar ligands (see Supplementary Materials).

The result page lists the predicted targets with their common name together with links to GeneCards (30) (for human proteins), UniProt (31) and ChEMBL (6) databases when available (see Figure 1). Targets are ranked according to their score with respect to the query molecule. The target classes are displayed in the last column. These classes were retrieved from the ChEMBL target annotation and in general correspond to the l1 level in the target classification (6). Exceptions include enzymes and transcription factors for which more detailed classification based on l2 or l3 levels is sometimes shown if they occur frequently in the target list (e.g. Tyr kinase, see Figure 1). The pie chart on the top right of the page shows a summary of the different target classes present among the predicted targets. All results can be downloaded as text (.txt or .csv), images (.jpg), printable report (.pdf), copied to the clipboard or sent to an email address by clicking on the links following the ‘Retrieve data:’ field. The probability derived from the target scores (see Supplementary Materials) is displayed in the fifth column as a horizontal bar (see Figure 1).

Figure 1. Prediction result page. This page shows the list of predicted targets for the query molecule (here chlorotrianisene). Targets are ranked according to their scores. Links to GeneCards (under ‘Common name’ column), UniProt and ChEMBL (when available) are provided. Green bars indicate the estimated probability of a protein to be a true target given its score. The sixth column (# sim cmpds 3D/2D) shows the number of ligands of the predicted target or its homologs that display similarity with the query molecule based on either 2D or 3D similarity measures. These numbers are linked to pages containing information about these ligands. For instance, the number circled in red provides a link to the list of ligands of ESR1 or its homologous proteins that display similarity with the query molecule (see Figure 2A). The pie chart shows the distribution of target classes. Predictions based on homology are indicated with ‘(by homology)’ (see the green box).

In the example of Figure 1, the predicted targets of chlorotrianisene (CHEMBL1200761) include Prostaglandin G/H synthase 1 (COX-1) and estrogen receptor (ESR1). Chlorotrianisene is a known inhibitor of COX-1 (13), although the interaction is not present in ChEMBL. Moreover, while no direct binding between chlorotrianisene and estrogen receptor is reported in ChEMBL, functional assay results in this database indicate that chlorotrianisene is active on estrogen receptor (32). These results show that, in this case, several of the predictions are true-positives.

To enable users to visually explore the ligands of the predicted targets, all ligands with a similarity (either 2D or 3D) larger than a minimal threshold value can be examined by following the links provided in the sixth column. Figure 2A shows an example of the results obtained by following the link in the red circle of Figure 1. Ligands are listed according to their similarity with the query molecule. A threshold for 3D similarity values has been set to 0.75 and the one for 2D similarity values to 0.45. Below these thresholds, ligands show very low similarity with the query molecule and are not listed. A link to the ChEMBL entries is provided for the ligands and the similarity with the query molecule is indicated. We note that manually exploring the ligands similar to the query molecule is strongly recommended to assess how reasonable the predictions are and to see what kind of ligands display the strongest similarity with the query molecule.

Finally, help pages with interactive screenshots of the website are available, an FAQ page is provided to guide users, and some of the raw data used in the predictions can be retrieved via the download page.

HOMOLOGY-BASED PREDICTIONS

Proteins originating from a common ancestor in general display a high degree of sequence and structure similarity. From a computational point of view, this similarity has been widely used in protein structure and function prediction, for instance (33,34). Recently, it has been shown that the binding of small molecules is also often conserved between homologs (35–37). In particular, orthologous proteins in close species such as human and rat often share most of their ligands (36). The same holds for paralogs, although the degree of similarity between ligands of paralogous proteins is slightly lower than between orthologous proteins (36).

In SwissTargetPrediction, we provide the possibility to map predictions based on protein homology, both within and between organisms. Orthologs and paralogs were retrieved from Ensembl Compara (38), Treefam (39) and orthoDB (40), using the union of all three datasets. Homology-based predictions were carried out as follows: the query molecule is compared to all molecules that bind to targets that have homology with a protein in the selected organism. Predictions are then carried out as if the ligands of these proteins were actual ligands of their homologs in the selected organism. If the ligand most similar to the query molecule is only observed to bind to a homologous protein, predictions are listed as ‘by homology’ on the SwissTargetPrediction result page (see Figure 1, green box). Moreover, in the list of ligands similar to the query molecule, those binding only to homologous targets are also designated with ‘By Homology’ and the actual target is indicated (Figure 2B, green box). For instance, in Figure 1 chlorotrianisene (CHEMBL1200761) is predicted to bind ESR2 mainly because it shows similarity with ligands of ESR1 (see Figure 2A). The predicted target ESR2 is therefore annotated with ‘by homology’ (green box, Figure 1). Figure 2B shows the list of most similar ligands obtained by following the link in the green circle of Figure 1. As the most similar molecule is a ligand of ESR1, it is labeled with ‘By homology’ and both the actual target and the organism are displayed. We note that for organisms with less data (e.g. horse, cow), many predictions might be based on homology with targets in other species.

Figure 2. (A) List of ligands of ESR1 or its homologous proteins displaying 3D similarity with a query molecule (here chlorotrianisene). This page is obtained by following the link in the red circle in Figure 1. Molecules are ordered based on their 3D similarity with the query molecule. (B) List of ligands of ESR2 or its homologous proteins displaying similarity with a query molecule (here chlorotrianisene) obtained by following the link in the green circle in Figure 1. If a molecule is a ligand of a homologous protein of the predicted target, the actual target as well as its organism is indicated (see the green box). When the most similar molecule is a ligand of a homologous protein, the prediction is labeled as ‘by homology’ in the result page (Figure 1). A link to the ChEMBL entry is provided for each compound.

Including homology-based predictions allowed us to expand the list of predicted targets from 2686 to over 12 188 in all five organisms studied here (see Table 1). As some of these proteins do not have reported bioactivity data directly associated with them, they may not be in the ChEMBL database. This is the reason why for instance KCNH6 and KCNH7 do not have ChEMBL IDs in Figure 1. Homology relationships between all targets can be downloaded at http://www.swisstargetprediction.ch/download.php.

VALIDATION DATASET

Extensive cross-validation of the SwissTargetPrediction algorithm has been published previously (18). To complement these data, we also tested our method against a new set of molecules that are not present in the training set. In particular, we used molecules from version 17 of ChEMBL (6) that were not present in version 16 (i.e. not present in the training set). We further required that each molecule be involved in at least one positive (<2 µM) and one negative (>50 µM) interaction. This resulted in a set of 213 molecules with 346 positive and 278 negative interactions. To obtain a more balanced dataset that better reflects the much larger number of non-interacting protein–ligand pairs, we included additional negative interactions by linking the molecules in our test set to randomly chosen targets present in ChEMBL (version 16) so as to have five times more negative than positive interactions for each molecule. The full benchmark dataset can be downloaded on our website (http://www.swisstargetprediction.ch/download.php).

We then ran the SwissTargetPrediction algorithm as implemented on the website to assess how accurate the predictions are. This resulted in an average AUC value of 0.87 on this external test set of both positive and negative interactions. We also assessed how often the known targets fall into the top predicted ones in the SwissTargetPrediction general output (see Figure 1). For 70% of the ligands, at least one of the known targets is found among the first 15 top predicted ones and for 31% of the ligands in our test set, the best predicted target is a true-positive. For instance, molecule CHEMBL2325087 (SMILES: NC(=S)N1N=C(CC1c1ccc2ccccc2c1)c1ccc(Cl)c(Cl)c1) binds to EGFR and ERBB2 with sub-micromolar activity (41) and these two targets are accurately predicted by SwissTargetPrediction (see Supplementary Figure S2). Although we cannot exclude that some molecules in our test set were actually developed based on their similarity with known ligands, our results strongly indicate that SwissTargetPrediction provides reliable predictions that can be used in follow-up experiments.

DISCUSSION

SwissTargetPrediction has been primarily developed for identifying targets of molecules known to be bioactive. Nevertheless, users can upload any small molecule, real or virtual, even without prior knowledge of its potential effects. In this case, the predicted targets may be relevant, especially if the similarity with known ligands is high. The predictions may also provide hints on how a compound or a scaffold might be chemically modified in order to increase its activity on a given target by comparing with known ligands that share some similarity (see also (42)). However, we point out that prediction accuracy is expected to be significantly lower for molecules with unknown bioactivity. This can be understood by noting that SwissTargetPrediction will always suggest some target, based on the assumption that if the molecule is active, it will likely bind to some protein. For molecules with unknown bioactivity, this assumption is not valid per se and the molecule may not bind to any protein, in which case all predicted targets are false-positives. In particular, inactive compounds can sometimes exhibit good similarity with active molecules if they have been obtained by modifying an active compound at some key position that was crucial for its interactions. This is a known limitation of ligand-based approaches when applied to any kind of compounds and therefore target predictions should be interpreted with care in the absence of indication of bioactivity.

Homology-based mapping of target predictions is increasingly being recognized as a powerful approach to translate results obtained in model organisms to human (35,36,43). In this work, we have considered homology relationships between and within five vertebrate species, for which most homologous proteins display a very high sequence identity and similar functions. Therefore, we did not filter out any homology relationship. For more distant organisms (e.g. worm or yeast), greater care should be taken, for instance by allowing only mapping between orthologous proteins that have conserved binding sites or high overall sequence identity. Another possible issue with homology-based mapping arises with molecules that are specifically designed to target some members of a protein family and not others. Our algorithm, as most other ligand-based methods, will likely fail to detect these subtle differences. For instance, in Supplementary Figure S2, molecule CHEMBL2325087 is also predicted to bind to ERBB3 with equal probability, although the experimental activity (51 µM) is much lower than for EGFR and ERBB2 (41). To address such issues, one possibility is to use other orthogonal computational approaches, such as structure-based analyses or molecular docking (44,45), to refine the predictions by considering small changes in protein binding sites that could confer specificity to some targets.

In SwissTargetPrediction, we use a probability derived from our cross-validation analysis to rank the targets and estimate the accuracy of the predictions. Other approaches have been proposed to assess the confidence of predictions. For instance, in Keiser et al. (1), an E-value is computed from the 2D similarity with the set of ligands of a target. This E-value is derived from the statistics of similarity values with all ligands (above a certain threshold), while in our case only the most similar ligand according to each similarity measure is considered. Our probabilities can be interpreted in terms of precision (i.e. number of true-positives divided by the number of predicted targets), while E-values indicate how likely it would be to find a molecule with a given average similarity to the set of ligands of a target. In practice, the most similar ligands are those contributing most to the E-value, so the two approaches are not necessarily fundamentally different. Also, predictions with very low probability in our approach correspond to low similarity values, and therefore would result in high E-values. Importantly, we point out that, by combining different kinds of chemical similarity measures, our approach can explore more diverse regions of the chemical space (18).

CONCLUSION AND OUTLOOK

SwissTargetPrediction is part of an important initiative of the Swiss Institute of Bioinformatics to provide online tools for computer-aided drug design, many of which are already available (42,44,46–48). In future developments, SwissTargetPrediction will be further integrated with these tools, for instance by predicting potential binding modes with SwissDock (44). Moreover, as large screening campaigns are increasingly being carried out in different organisms both in industry and academia (49,50), SwissTargetPrediction will be regularly updated and new organisms added to it. This will enable users to efficiently harness the wealth of publicly available data to accurately predict new targets for bioactive small molecules in diverse species.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online, including references [1–6].

ACKNOWLEDGMENT

We are thankful to Tomislav Ilicic for insightful comments about the web interface.

FUNDING

Swiss Institute of Bioinformatics. Source of open access funding: Swiss Institute of Bioinformatics (core funding).
Conflict of interest statement. None declared.

REFERENCES

1. Keiser,M.J., Roth,B.L., Armbruster,B.N., Ernsberger,P., Irwin,J.J. and Shoichet,B.K. (2007) Relating protein pharmacology by ligand chemistry. Nat. Biotechnol., 25, 197–206.
2. Oprea,T.I., Bauman,J.E., Bologa,C.G., Buranda,T., Chigaev,A., Edwards,B.S., Jarvik,J.W., Gresham,H.D., Haynes,M.K., Hjelle,B. et al. (2011) Drug Repurposing from an Academic Perspective. Drug Discov. Today. Therapeutic Strategies, 8, 61–69.
3. Jorgensen,W.L. (2009) Efficient drug lead discovery and optimization. Acc. Chem. Res., 42, 724–733.
4. Ziegler,S., Pries,V., Hedberg,C. and Waldmann,H. (2013) Target identification for small bioactive molecules: finding the needle in the haystack. Angew. Chem. Int. Ed. Engl., 52, 2744–2792.
5. Karaman,M.W., Herrgard,S., Treiber,D.K., Gallant,P., Atteridge,C.E., Campbell,B.T., Chan,K.W., Ciceri,P., Davis,M.I., Edeen,P.T. et al. (2008) A quantitative analysis of kinase inhibitor selectivity. Nat. Biotechnol., 26, 127–132.
27. Armstrong,M.S., Finn,P.W., Morris,G.M. and Richards,W.G. (2011) Improving the accuracy of ultrafast ligand-based screening: incorporating lipophilicity into ElectroShape as an extra dimension. J. Comput. Aided Mol. Des., 25, 785–790.
28. Perez-Nueno,V.I., Venkatraman,V., Mavridis,L. and Ritchie,D.W. (2012) Detecting drug promiscuity using Gaussian ensemble screening. J. Chem. Inf. Model, 52, 1948–1961.
29. Armstrong,M.S., Morris,G.M., Finn,P.W., Sharma,R., Moretti,L., Cooper,R.I.
and Richards,W.G. (2010) ElectroShape: fast molecular 6. Bento,A.P., Gaulton,A., Hersey,A., Bellis,L.J., Chambers,J., similarity calculations incorporating shape, chirality and Davies,M., Kruger,F.A., Light,Y., Mak,L., McGlinchey,S. et al. electrostatics. J. Comput. Aided Mol. Des., 24, 789–801. (2014) The ChEMBL bioactivity database: an update. Nucleic Acids 30. Safran,M., Dalah,I., Alexander,J., Rosen,N., Iny Stein,T., Res., 42, D1083–D1090. Shmoish,M., Nativ,N., Bahir,I., Doniger,T., Krug,H. et al. 7. Bolton,E., Wang,Y., Thiessen,P.A. and Bryant,S.H. (2008), Annual (2010) GeneCards Version 3: the human gene integrator. Database Reports in Computational Chemistry, Vol. 4. American Chemical (Oxford), doi: 10.1093/database/baq020. Society, Washington DC. 31. UniProt,C. (2013) Update on activities at the Universal Protein 8. Irwin,J.J., Sterling,T., Mysinger,M.M., Bolstad,E.S. and Resource (UniProt) in 2013. Nucleic Acids Res., 41, D43–D47. Coleman,R.G. (2012) ZINC: a free tool to discover chemistry for 32. Kupfer,D. and Bulger,W.H. (1990) Inactivation of the uterine biology. J. Chem. Inf. Model, 52, 1757–1768. estrogen receptor binding of estradiol during P-450 catalyzed 9. Clemons,P.A. (2004) Complex phenotypic assays in high-throughput metabolism of chlorotrianisene (TACE). Speculation that TACE screening. Curr. Opin. Chem. Biol., 8, 334–338. antiestrogenic activity involves covalent binding to the estrogen 10. Inglese,J., Johnson,R.L., Simeonov,A., Xia,M., Zheng,W., receptor. FEBS Lett., 261, 59–62. Austin,C.P. and Auld,D.S. (2007) High-throughput screening assays 33. Kiefer,F., Arnold,K., Kunzli,M., Bordoli,L. and Schwede,T. (2009) for the identification of chemical probes. Nat. Chem. Biol., 3, The SWISS-MODEL repository and associated resources. Nucleic 466–479. Acids Res., 37, D387–D392. 11. Smith,A.M., Ammar,R., Nislow,C. and Giaever,G. (2010) A survey 34. 
Loewenstein,Y., Raimondo,D., Redfern,O.C., Watson,J., of yeast genomic assays for drug and target discovery. Pharmacol. Frishman,D., Linial,M., Orengo,C., Thornton,J. and Tramontano,A. Ther., 127, 156–164. (2009) Protein function annotation by homology-based inference. 12. Mestres,J., Gregori-Puigjane,E., Valverde,S. and Sole,R.V. (2009) The Genome Biol., 10, 207. topology of drug-target interaction networks: implicit dependence on 35. Klabunde,T. (2007) Chemogenomic approaches to drug discovery: drug properties and target families. Mol. Biosyst., 5, 1051–1057. similar receptors bind similar ligands. Br.J.Pharmacol., 152,5–7. 13. Lounkine,E., Keiser,M.J., Whitebread,S., Mikhailov,D., Hamon,J., 36. Kruger,F.A. and Overington,J.P. (2012) Global analysis of small Jenkins,J.L., Lavan,P., Weber,E., Doak,A.K., Cote,S. et al. (2012) molecule binding to related protein targets. PLoS Comput. Biol., 8, Large-scale prediction and testing of drug activity on side-effect e1002333. targets. Nature, 486, 361–367. 37. Paricharak,S., Klenka,T., Augustin,M., Patel,U.A. and Bender,A. 14. Kola,I. and Landis,J. (2004) Can the pharmaceutical industry reduce (2013) Are phylogenetic trees suitable for chemogenomics analyses of attrition rates? Nat. Rev. Drug Discov., 3, 711–715. bioactivity data sets: the importance of shared active compounds and 15. Keiser,M.J., Setola,V., Irwin,J.J., Laggner,C., Abbas,A.I., choosing a suitable data embedding method, as exemplified on Hufeisen,S.J., Jensen,N.H., Kuijer,M.B., Matos,R.C., Tran,T.B. et al. Kinases. J. Cheminform., 5, 49. (2009) Predicting new molecular targets for known drugs. Nature, 38. Vilella,A.J., Severin,J., Ureta-Vidal,A., Heng,L., Durbin,R. and 462, 175–181. Birney,E. (2009) EnsemblCompara GeneTrees: complete, 16. Issa,N.T., Kruger,J., Byers,S.W. and Dakshanamurthy,S. (2013) Drug duplication-aware phylogenetic trees in vertebrates. Genome Res., 19, repurposing a reality: from computers to the clinic. Expert Rev. Clin. 327–335. 
Pharmacol., 6, 95–97. 39. Schreiber,F., Patricio,M., Muffato,M., Pignatelli,M. and Bateman,A. 17. Dunkel,M., Gunther,S., Ahmed,J., Wittig,B. and Preissner,R. (2008) (2014) TreeFam v9: a new website, more species and SuperPred: drug classification and target prediction. Nucleic Acids orthology-on-the-fly. Nucleic Acids Res., 42, D922–D925. Res., 36, W55–W59. 40. Waterhouse,R.M., Tegenfeldt,F., Li,J., Zdobnov,E.M. and 18. Gfeller,D., Michielin,O. and Zoete,V. (2013) Shaping the interaction Kriventseva,E.V. (2013) OrthoDB: a hierarchical catalog of animal, landscape of bioactive molecules. Bioinformatics, 29, 3073–3079. fungal and bacterial orthologs. Nucleic Acids Res., 41, D358–D365. 19. Gong,J., Cai,C., Liu,X., Ku,X., Jiang,H., Gao,D. and Li,H. (2013) 41. Yang,W., Hu,Y., Yang,Y.S., Zhang,F., Zhang,Y.B., Wang,X.L., ChemMapper: a versatile web server for exploring pharmacology and Tang,J.F., Zhong,W.Q. and Zhu,H.L. (2013) Design, modification chemical structure association based on molecular 3D similarity and 3D QSAR studies of novel naphthalin-containing pyrazoline method. Bioinformatics, 29, 1827–1829. derivatives with/without thiourea skeleton as anticancer agents. 20. Wang,L., Ma,C., Wipf,P., Liu,H., Su,W. and Xie,X.Q. (2013) Bioorg. Med. Chem., 21, 1050–1063. TargetHunter: an in silico target identification tool for predicting 42. Wirth,M., Zoete,V., Michielin,O. and Sauer,W.H. (2013) therapeutic potential of small organic molecules based on SwissBioisostere: a database of molecular replacements for ligand chemogenomic database. AAPS J., 15, 395–406. design. Nucleic Acids Res., 41, D1137–D1143. 21. Campillos,M., Kuhn,M., Gavin,A.C., Jensen,L.J. and Bork,P. (2008) 43. Jacob,L. and Vert,J.P. (2008) Protein-ligand interaction prediction: an Drug target identification using side-effect similarity. Science, 321, improved chemogenomics approach. Bioinformatics, 24, 2149–2156. 263–266. 44. Grosdidier,A., Zoete,V. and Michielin,O. (2011) SwissDock, a 22. Willett,P. 
(2011) Similarity searching using 2D structural fingerprints. protein-small molecule docking web service based on EADock DSS. Methods Mol. Biol., 672, 133–158. Nucleic Acids Res., 39, W270–W277. 23. Wirth,M. and Sauer,W.H.B. (2011) Bioactive molecules: perfectly 45. Morris,G.M., Huey,R., Lindstrom,W., Sanner,M.F., Belew,R.K., shaped for their target. Mol. Inform., 30, 677–688. Goodsell,D.S. and Olson,A.J. (2009) AutoDock4 and 24. Ballester,P.J. and Richards,W.G. (2007) Ultrafast shape recognition to AutoDockTools4: Automated docking with selective receptor search compound databases for similar molecular shapes. J. Comput. flexibility. J. Comput. Chem., 30, 2785–2791. Chem., 28, 1711–1723. 46. Zoete,V., Cuendet,M.A., Grosdidier,A. and Michielin,O. (2011) 25. Sastry,G.M., Dixon,S.L. and Sherman,W. (2011) Rapid shape-based SwissParam: a fast force field generation tool for small organic ligand alignment and virtual screening method based on molecules. J. Comput. Chem., 32, 2359–2368. atom/feature-pair similarities and volume overlap scoring. J. Chem. 47. Gfeller,D., Michielin,O. and Zoete,V. (2013) SwissSidechain: a Inf. Model, 51, 2455–2466. molecular and structural database of non-natural sidechains. Nucleic 26. Liu,X., Jiang,H. and Li,H. (2011) SHAFTS: a hybrid approach for Acids Res., 41, D327–D332. 3D molecular similarity calculation. 1. Method and assessment of 48. Gfeller,D., Michielin,O. and Zoete,V. (2012) Expanding molecular virtual screening. J. Chem. Inf. Model, 51, 2372–2385. modeling and design tools to non-natural sidechains. J. Comput. 27. Armstrong,M.S., Finn,P.W., Morris,G.M. and Richards,W.G. (2011) Chem., 55, 1525–1535. Improving the accuracy of ultrafast ligand-based screening: W38 Nucleic Acids Research, 2014, Vol. 42, Web Server issue 49. Wallace,I.M., Urbanus,M.L., Luciani,G.M., Burns,A.R., Han,M.K., 50. Frearson,J.A. and Collie,I.T. (2009) HTS and hit finding in Wang,H., Arora,K., Heisler,L.E., Proctor,M., St Onge,R.P. et al. 
academia––from chemical genomics to drug discovery. Drug Discov. (2011) Compound prioritization methods increase rates of chemical Today, 14, 1150–1158. probe discovery in model organisms. Chem. Biol., 18, 1273–1283.
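
Illustrative note on homology filtering. The Discussion suggests that, for organisms more distant than the vertebrates considered in this work, predictions might be mapped only through orthologous proteins with high overall sequence identity. The Python sketch below shows one way such a filter could look before predictions are transferred and ranked by probability. It is not the SwissTargetPrediction implementation: the `HomologyLink` record, the function names, the protein labels and the 0.6 identity cutoff are all hypothetical choices made for illustration, and no threshold is given in the article.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class HomologyLink:
    """Hypothetical record linking a target with known ligands to a homolog."""
    source_target: str        # protein for which a prediction exists
    mapped_target: str        # homolog in the organism of interest
    relation: str             # "ortholog" or "paralog"
    sequence_identity: float  # overall identity as a fraction in [0, 1]


def filter_links(links: Iterable[HomologyLink],
                 min_identity: float = 0.6,   # assumed cutoff, not from the paper
                 orthologs_only: bool = True) -> List[HomologyLink]:
    """Keep only links that pass the relation and identity criteria."""
    kept = []
    for link in links:
        if orthologs_only and link.relation != "ortholog":
            continue
        if link.sequence_identity < min_identity:
            continue
        kept.append(link)
    return kept


def map_predictions(predictions: Dict[str, float],
                    links: Iterable[HomologyLink]) -> List[Tuple[str, float]]:
    """Transfer probabilities through the links and rank mapped targets.

    When several links reach the same mapped target, the highest
    probability is kept. The result is sorted by decreasing probability.
    """
    mapped: Dict[str, float] = {}
    for link in links:
        prob = predictions.get(link.source_target)
        if prob is None:
            continue
        mapped[link.mapped_target] = max(prob, mapped.get(link.mapped_target, 0.0))
    return sorted(mapped.items(), key=lambda item: item[1], reverse=True)


# Made-up example data: labels and numbers are placeholders, not real proteins.
example_links = [
    HomologyLink("HUMAN_A", "YEAST_A", "ortholog", 0.72),
    HomologyLink("HUMAN_A", "YEAST_B", "paralog", 0.80),
    HomologyLink("HUMAN_B", "YEAST_C", "ortholog", 0.35),
]
example_predictions = {"HUMAN_A": 0.9, "HUMAN_B": 0.4}

strict_ranking = map_predictions(example_predictions, filter_links(example_links))
unfiltered_ranking = map_predictions(example_predictions, example_links)
```

With the placeholder data, the strict filter keeps only the high-identity ortholog, whereas mapping without any filter (the choice made here for the five vertebrate species) transfers all three predictions. A filter on conserved binding-site residues, the other option mentioned in the Discussion, would need structural or alignment data and is not sketched.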

Journal

Nucleic Acids Research, Oxford University Press

Published: Jul 1, 2014
