Gaussian interaction profile kernels for predicting drug–target interaction

Vol. 27 no. 21 2011, pages 3036–3043
doi:10.1093/bioinformatics/btr500
BIOINFORMATICS ORIGINAL PAPER (Data and text mining)
Advance Access publication September 4, 2011

Twan van Laarhoven (1,*), Sander B. Nabuurs (2) and Elena Marchiori (1,*)
(1) Department of Computer Science, Radboud University Nijmegen and (2) Computational Drug Discovery, Center for Molecular and Biomolecular Informatics, Radboud University Nijmegen Medical Center, Nijmegen, The Netherlands
(*) To whom correspondence should be addressed.
Associate Editor: Jonathan Wren
© The Author 2011. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected]

ABSTRACT

Motivation: The in silico prediction of potential interactions between drugs and target proteins is of core importance for the identification of new drugs or novel targets for existing drugs. However, only a tiny portion of all drug–target pairs in current datasets are experimentally validated interactions. This motivates the need for developing computational methods that predict true interaction pairs with high accuracy.

Results: We show that a simple machine learning method that uses the drug–target network as the only source of information is capable of predicting true interaction pairs with high accuracy. Specifically, we introduce interaction profiles of drugs (and of targets) in a network, which are binary vectors specifying the presence or absence of interaction with every target (drug) in that network. We define a kernel on these profiles, called the Gaussian Interaction Profile (GIP) kernel, and use a simple classifier, (kernel) Regularized Least Squares (RLS), for predicting drug–target interactions. We test comparatively the effectiveness of RLS with the GIP kernel on four drug–target interaction networks used in previous studies. The proposed algorithm achieves area under the precision–recall curve (AUPR) up to 92.7, significantly improving over results of state-of-the-art methods. Moreover, we show that also using kernels based on chemical and genomic information further increases accuracy, with a neat improvement on small datasets. These results substantiate the relevance of the network topology (in the form of interaction profiles) as a source of information for predicting drug–target interactions.

Availability: Software and Supplementary Material are available at http://cs.ru.nl/~tvanlaarhoven/drugtarget2011/.
Contact: [email protected]; [email protected]
Supplementary Information: Supplementary data are available at Bioinformatics online.
Received on June 9, 2011; revised on August 12, 2011; accepted on August 29, 2011.

1 INTRODUCTION

The in silico prediction of interaction between drugs and target proteins is a core step in the drug discovery process for identifying new drugs or novel targets for existing drugs, in order to guide and speed up the laborious and costly experimental determination of drug–target interaction (Haggarty et al., 2003).

Drug–target interaction data are available for many classes of pharmaceutically useful target proteins including Enzymes, Ion Channels, G-protein-coupled receptors (GPCRs) and Nuclear Receptors (Hopkins and Groom, 2002). Several publicly available databases have been built and maintained, such as KEGG BRITE (Kanehisa et al., 2006), DrugBank (Wishart et al., 2008), GLIDA (Okuno et al., 2007), SuperTarget and Matador (Günther et al., 2008), BRENDA (Schomburg et al., 2004) and ChEMBL (Overington, 2009), containing drug–target interactions and other related sources of information, like chemical and genomic data. A property of the current drug–target interaction databases is that they contain a rather small number of drug–target pairs which are experimentally validated interactions. This motivates the need for developing methods that predict true interacting pairs with high accuracy.

Recently, machine learning methods have been introduced to tackle this problem. They can be viewed as instances of the more general link prediction problem; see Lü and Zhou (2011) for a recent survey of this topic. These methods are motivated by the observation that similar drugs tend to target similar proteins (Klabunde, 2007; Schuffenhauer et al., 2003). This property was shown, for instance, for chemical (Martin et al., 2002) and side effect similarity (Campillos et al., 2008), and motivated the development of an integrated approach for drug–target interaction prediction (Jaroch and Weinmann, 2006). A desirable property of this approach is that it does not require the 3D structure information of the target proteins, which is needed in traditional methods based on docking simulations (Cheng et al., 2007).

The current state-of-the-art for the in silico prediction of drug–target interaction is formed by methods that employ similarity measures for drugs and for targets in the form of kernel functions, like Bleakley and Yamanishi (2009); Jacob and Vert (2008); Wassermann et al. (2009); Yamanishi et al. (2008, 2010). By using kernels, multiple sources of information can be easily incorporated for performing prediction (Schölkopf et al., 2004).

In Yamanishi et al. (2008), different settings of the interaction prediction problem are explored. The authors make the distinction between 'known' drugs or targets, for which at least one interaction is in the training set, and 'new' drugs or targets, for which there is not. There are then four possible settings, depending on whether the drugs and/or targets are known or new. In this article, we focus on the setting where both the drugs and targets are known. That is, we use known interactions for predicting novel ones.

We want to analyze the relevance of the topology of drug–target interaction networks as a source of information for predicting interactions. We do this by introducing a kernel that captures the topological information. Using a simple machine learning method, we then compare this kernel to kernels based on other sources of information.
Specifically, we start from the assumption that two drugs that interact in a similar way with the targets in a known drug–target interaction network will also interact in a similar way with new targets. We formalize this property by describing each drug with an interaction profile, a binary vector describing the presence or absence of interaction with every target in that network. The interaction profile of a target is defined in a similar way. From these profiles, we construct the Gaussian Interaction Profile kernel.

We show that interaction profiling can be effectively used for accurate prediction of drug–target interaction. Specifically, we propose a simple regularized least squares algorithm incorporating a product of kernels constructed from drug and target interaction profiles. We test the predictive performance of this method on four drug–target interaction networks in humans involving Enzymes, Ion Channels, GPCRs and Nuclear Receptors. These experiments show that using only information on the topology of the drug–target interaction network, in the form of interaction profiles, excellent results are achieved as measured by the area under the precision–recall curve (AUPR) (Davis and Goadrich, 2006). In particular, on three of the four considered datasets the performance is superior to the best results of current state-of-the-art methods that use multiple sources of information.

We further show that the proposed method can be easily extended to also use other sources of information in the form of suitable kernels. Results of experiments where also chemical and genomic information on drugs and targets is included show excellent performance, with AUPR scores of 91.5, 94.3, 79.0 and 68.4 on the four datasets, achieving an improvement of 7.4, 13.0, 12.3 and 7.2 over the best results reported in Bleakley and Yamanishi (2009). A thorough analysis of the results enables us to detect several new putative drug–target interactions, see http://cs.ru.nl/~tvanlaarhoven/drugtarget2011/new-interactions/.

2 MATERIALS

We used four drug–target interaction networks in humans involving Enzymes, Ion Channels, GPCRs and Nuclear Receptors, first analyzed by Yamanishi et al. (2008). We worked with the datasets provided by these authors, in order to facilitate benchmark comparisons with the current state-of-the-art algorithms that do the same. These datasets are publicly available at http://web.kuicr.kyoto-u.ac.jp/supp/yoshi/drugtarget/. Table 1 lists some properties of the datasets.

Table 1. The number of drugs and target proteins, their ratio and the number of interactions in the drug–target datasets from Yamanishi et al. (2008)

  Dataset            Drugs   Targets   n_d/n_t   Interactions
  Enzyme               445       664      0.67           2926
  Ion Channel          210       204      1.03           1476
  GPCR                 223        95      2.35            635
  Nuclear Receptor      54        26      2.08             90

Drug–target interaction information was retrieved from the KEGG BRITE (Kanehisa et al., 2006), BRENDA (Schomburg et al., 2004), SuperTarget (Günther et al., 2008) and DrugBank (Wishart et al., 2008) databases. Chemical structures of the compounds were derived from the DRUG and COMPOUND sections in the KEGG LIGAND database (Kanehisa et al., 2006). The chemical structure similarity between compounds was computed using SIMCOMP (Hattori et al., 2003). This resulted in a similarity matrix denoted by S_c, which represents the chemical space. Amino acid sequences of the target (human) proteins were obtained from the KEGG GENES database (Kanehisa et al., 2006). Sequence similarity between proteins was computed using a normalized version of the Smith–Waterman score (Smith and Waterman, 1981), resulting in a similarity matrix denoted S_g, which represents the genomic space.

3 METHODS

3.1 Problem formalization

We consider the problem of predicting new interactions in a drug–target interaction network. Formally, we are given a set X_d = {d_1, d_2, ..., d_{n_d}} of drugs and a set X_t = {t_1, t_2, ..., t_{n_t}} of target proteins. There is also a set of known interactions between drugs and targets. If we consider these interactions as edges, then they form a bipartite network. We can characterize this network by the n_d × n_t adjacency matrix Y. That is, y_ij = 1 if drug d_i interacts with target t_j and y_ij = 0 otherwise. Our task is now to rank all drug–target pairs (d_i, t_j) such that the highest ranked pairs are the most likely to interact.

3.2 Gaussian interaction profile kernel

Our method is based on the assumption that drugs exhibiting a similar pattern of interaction and non-interaction with the targets of a drug–target interaction network are likely to show similar interaction behavior with respect to new targets. We use a similar assumption on targets. We therefore introduce the (target) interaction profile y_{d_i} of a drug d_i to be the binary vector encoding the presence or absence of interaction with every target in the considered drug–target network. This is nothing more than row i of the adjacency matrix Y. Similarly, the (drug) interaction profile y_{t_j} of a target protein t_j is a vector specifying the presence or absence of interaction with every drug in the considered drug–target network. The interaction profiles generated from a drug–target interaction network can be used as feature vectors for a classifier. Figure 1 illustrates the construction of interaction profiles.

Fig. 1. An illustration of the construction of interaction profiles from a drug–target interaction network. Circles are drugs, and squares are targets. In this example, the interaction profile of target t_1 indicates that it interacts with drugs d_1 and d_2, but not with d_3, d_4 or d_5.

Following the current state-of-the-art for the drug–target interaction prediction problem, we will use kernel methods, and hence construct a kernel from the interaction profiles. This kernel does not include any information beyond the topology of the drug–target network.

One of the most popular choices for constructing a kernel from a feature vector is the Gaussian kernel, also known as the radial basis function (RBF) kernel. This kernel is, for drugs d_i and d_j,

    K_{GIP,d}(d_i, d_j) = exp(-\gamma_d \| y_{d_i} - y_{d_j} \|^2).

A kernel for the similarities between target proteins, K_{GIP,t}, can be defined analogously. We call these kernels Gaussian Interaction Profile (GIP) kernels.

The parameter γ_d controls the kernel bandwidth. We set

    \gamma_d = \tilde{\gamma}_d \Big/ \left( \frac{1}{n_d} \sum_{i=1}^{n_d} \| y_{d_i} \|^2 \right).

That is, we normalize the parameter by dividing it by the average number of interactions per drug. With this choice, the kernel values become independent of the size of the dataset. In principle, the new bandwidth parameter γ̃_d could be set with cross-validation, but in this article we simply use γ̃_d = 1.

There are other ways to construct a kernel from interaction profiles. For example, Basilico and Hofmann (2004) propose using the correlation of interaction profiles. We have performed brief experiments with these other kernels, which show that GIP kernels consistently outperform kernels based on correlation or inner products. The detailed results of these experiments are included in Supplementary Table S1.
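As a concrete illustration of the definitions above, the following minimal NumPy sketch builds the drug and target GIP kernels directly from a binary adjacency matrix. This is not the authors' released code; the function name gip_kernel and the toy matrix are ours, and the bandwidth normalization follows the formula stated in this subsection.

```python
import numpy as np

def gip_kernel(profiles, gamma_tilde=1.0):
    """Gaussian Interaction Profile kernel from binary interaction profiles.

    `profiles` has one interaction profile per row: the adjacency matrix Y
    itself for drugs, and Y transposed for targets.
    """
    # Normalize the bandwidth by the average squared profile norm,
    # i.e. the average number of interactions per drug (or target).
    gamma = gamma_tilde / np.mean(np.sum(profiles ** 2, axis=1))
    # Squared Euclidean distances between all pairs of profiles.
    sq_norms = np.sum(profiles ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * profiles @ profiles.T
    return np.exp(-gamma * sq_dists)

# Toy example: 4 drugs x 3 targets.
Y = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 0, 1],
              [0, 1, 1]], dtype=float)
K_gip_d = gip_kernel(Y)      # drug kernel, shape (4, 4)
K_gip_t = gip_kernel(Y.T)    # target kernel, shape (3, 3)
```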
3.3 Integrating chemical and genomic information

We construct kernels containing information about the chemical and genomic space from the similarity matrices S_c and S_g. Since these similarity matrices are neither symmetric nor positive definite, we apply a simple transformation to make them symmetric, S_sym = (S + S^T)/2, and add a small multiple of the identity matrix to enforce the positive definite property. We denote the resulting kernels for drugs and targets by K_{chemical,d} and K_{genomic,t}, respectively.

To combine the interaction profile kernels with these chemical and genomic kernels, we use a simple weighted average,

    K_d = \alpha_d K_{GIP,d} + (1 - \alpha_d) K_{chemical,d}
    K_t = \alpha_t K_{GIP,t} + (1 - \alpha_t) K_{genomic,t}.

For the reported results of our evaluation, we simply use the unweighted average for both drugs and targets, i.e. α_d = α_t = 0.5. In Section 4.2, we further analyze the effect of these parameters on the predictive performance of the method.
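The kernel preparation described in this subsection could be sketched as follows. The size of the diagonal shift used to enforce positive definiteness is not specified in the text, so the eps value below is an assumption, and the helper names are ours; the convention that alpha weights the GIP kernel mirrors the averaged formula above.

```python
import numpy as np

def similarity_to_kernel(S, eps=1e-6):
    """Turn a similarity matrix into a valid kernel: symmetrize, then shift.

    The amount of diagonal shift is an implementation choice; the paper only
    states that a small multiple of the identity is added.
    """
    S_sym = (S + S.T) / 2.0
    # Shift the spectrum just enough to make the smallest eigenvalue >= eps.
    min_eig = np.linalg.eigvalsh(S_sym).min()
    if min_eig < eps:
        S_sym = S_sym + (eps - min_eig) * np.eye(S.shape[0])
    return S_sym

def combine_kernels(K_gip, K_side_info, alpha=0.5):
    """Weighted average: alpha * K_GIP + (1 - alpha) * K_chemical/genomic."""
    return alpha * K_gip + (1.0 - alpha) * K_side_info
```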
3.4 RLS-avg classifier

In principle, we could use the GIP kernels with any kernel-based classification or ranking algorithm. We choose to use a very basic classifier, the (kernel) Regularized Least Squares (RLS) classifier. While Least Squares is primarily used for regression, when a good kernel is used it has classification accuracy similar to that of Support Vector Machines (Rifkin and Klautau, 2004). Our own experiments confirm this finding. In the RLS classifier, the predicted values ŷ with a given kernel K have a simple closed form solution,

    \hat{y} = K (K + \sigma I)^{-1} y,

where σ is a regularization parameter. Higher values of σ give a smoother result, while for σ = 0 we get ŷ = y, and hence no generalization at all. The value ŷ is a real-valued score, which we can interpret as a confidence.

The RLS classifier is sensitive to the encoding used for y. Here, we use 1 for encoding interacting pairs and 0 for non-interacting ones. Brief experiments have shown that the classifier is not sensitive to this choice, as long as the value used for non-interactions is close to 0. Using a value very different from 0, like -1, would place too much weight on non-interactions. The classifier would then try to avoid predicting pairs that look like non-interactions, rather than predicting pairs that look like interactions.

In the previous sections, we defined kernels on drugs and kernels on target proteins. There are several ways in which we can use kernels in both these dimensions. Following other works, like Bleakley and Yamanishi (2009) and Xia et al. (2010), a simple and effective approach is to apply the classifier for each drug independently using only the target kernel, and also for each target independently using only the drug kernel. Then the final score for a drug–target pair is a combination of the two outputs. Here we use the average of the output values, and denote the resulting method by RLS-avg. Observe that in the formulation of the RLS classifier that we use, performing independent prediction amounts to replacing the vector y with the matrix Y, and hence the prediction of RLS-avg is

    \hat{Y} = \tfrac{1}{2} K_d (K_d + \sigma I)^{-1} Y + \tfrac{1}{2} \left( K_t (K_t + \sigma I)^{-1} Y^T \right)^T.

Note that this model is slightly different from using the Kronecker sum kernel (Kashima et al., 2009a), since regularization is performed for drugs and targets separately in the RLS-avg method, rather than jointly.
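A sketch of the RLS closed form and of the RLS-avg combination is given below, assuming NumPy and a 0/1 adjacency matrix Y as in the earlier sketch. It mirrors the two formulas above but is not the authors' implementation.

```python
import numpy as np

def rls_predict(K, Y, sigma=1.0):
    """Kernel RLS closed-form prediction: K (K + sigma*I)^{-1} Y."""
    n = K.shape[0]
    return K @ np.linalg.solve(K + sigma * np.eye(n), Y)

def rls_avg(K_d, K_t, Y, sigma=1.0):
    """RLS-avg: average of the drug-side and target-side RLS predictions."""
    Y_from_drugs = rls_predict(K_d, Y, sigma)        # uses the drug kernel
    Y_from_targets = rls_predict(K_t, Y.T, sigma).T  # uses the target kernel
    return 0.5 * (Y_from_drugs + Y_from_targets)
```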
3.5 RLS-Kron classifier

A better alternative is to combine the kernels into a larger kernel that directly relates drug–target pairs. This is done with the Kronecker product kernel (Basilico and Hofmann, 2004; Ben-Hur and Noble, 2005; Hue and Vert, 2010; Oyama and Manning, 2004). The Kronecker product K_d ⊗ K_t of the drug and target kernels is

    K((d_i, t_j), (d_k, t_l)) = K_d(d_i, d_k) \, K_t(t_j, t_l).

With this kernel, we can make predictions for all pairs at once,

    \mathrm{vec}(\hat{Y}^T) = K (K + \sigma I)^{-1} \mathrm{vec}(Y^T),

where vec(Y^T) is a vector of all interaction pairs, created by stacking the columns of Y^T. We call this method RLS-Kron.

Using the Kronecker product kernel directly would involve calculating the inverse of an n_d n_t × n_d n_t matrix, which would take O((n_d n_t)^3) operations and would also require too much memory. We use a more efficient implementation based on eigendecompositions, previously presented in Raymond and Kashima (2010).

Let K_d = V_d Λ_d V_d^T and K_t = V_t Λ_t V_t^T be the eigendecompositions of the two kernel matrices. Since the eigenvalues (vectors) of a Kronecker product are the Kronecker product of the eigenvalues (vectors), for our Kronecker product kernel we simply have K = K_d ⊗ K_t = V Λ V^T, where Λ = Λ_d ⊗ Λ_t and V = V_d ⊗ V_t. The matrix that we want to invert, K + σI, has these same eigenvectors V, and eigenvalues Λ + σI. Hence

    K (K + \sigma I)^{-1} = V \Lambda (\Lambda + \sigma I)^{-1} V^T.

To efficiently multiply this matrix with vec(Y^T), we can use a further property of the Kronecker product, namely that (A ⊗ B) vec(X) = vec(B X A^T). Combining these facts, we get that the RLS prediction is

    \hat{Y} = V_d Z^T V_t^T,

where

    \mathrm{vec}(Z) = (\Lambda_d \otimes \Lambda_t) \left( (\Lambda_d \otimes \Lambda_t) + \sigma I \right)^{-1} \mathrm{vec}(V_t^T Y^T V_d).

So, to make an RLS prediction using the Kronecker product kernel we only need to perform the two eigendecompositions and some matrix multiplications, bringing the runtime down to O(n_d^3 + n_t^3). The efficiency of this computation could be further improved, yielding a quadratic computational complexity, by applying recent techniques from large-scale kernel methods for computing the two kernel decompositions (Kashima et al., 2009b; Wu et al., 2006).
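The eigendecomposition shortcut can be written compactly. The sketch below applies the filter Λ/(Λ + σ) in the joint eigenbasis, which is algebraically equivalent to the vec formulation above; function and variable names are ours, and only the two small kernel matrices are decomposed.

```python
import numpy as np

def rls_kron(K_d, K_t, Y, sigma=1.0):
    """RLS with the Kronecker product kernel K_d (x) K_t.

    Y is the n_d x n_t binary adjacency matrix.  Only K_d and K_t are
    decomposed, instead of inverting an (n_d*n_t) x (n_d*n_t) matrix.
    """
    # Eigendecompositions of the two symmetric kernel matrices.
    lam_d, V_d = np.linalg.eigh(K_d)
    lam_t, V_t = np.linalg.eigh(K_t)
    # Eigenvalues of the Kronecker kernel as an n_d x n_t grid:
    # Lam[i, j] = lam_d[i] * lam_t[j].
    Lam = np.outer(lam_d, lam_t)
    # Apply the filter Lam / (Lam + sigma) in the eigenbasis, then map back.
    Z = (Lam / (Lam + sigma)) * (V_d.T @ Y @ V_t)
    return V_d @ Z @ V_t.T
```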
3.6 Comparison methods

In order to assess the performance of our method globally, we compare it against current state-of-the-art algorithms. To the best of our knowledge, the best results on these datasets obtained so far are those reported by Bleakley and Yamanishi (2009), where the Bipartite Local Models (BLM) approach was introduced. These results were achieved by combining the output scores of the Kernel Regression Method (KRM) (Yamanishi et al., 2008) and BLM by taking their maximum value. We briefly recall these methods here.

In the KRM method, drugs and targets are embedded into a unified space called the 'pharmacological space'. A regression model is learned between the chemical structure (respectively, genomic sequence) similarity space and this pharmacological space. Then new potential drugs and targets are mapped into the pharmacological space using this regression model. Finally, new drug–target interactions are predicted by connecting drugs and target proteins that are closer than a threshold in the pharmacological space.

The BLM method is similar to our RLS-avg method. In the BLM method, the presence or absence of a drug–target interaction is predicted as follows. First, the target is excluded, and a training set is constructed consisting of two classes: all other known targets of the drug in question, and the targets not known to interact with that drug. Second, a Support Vector Machine that discriminates between the two classes is constructed, using the available genomic kernel for the targets. This model is then used to predict the label of the target, and hence the interaction or non-interaction of the considered drug–target pair. A similar procedure is applied with the roles of drugs and targets reversed, using the chemical structure kernel instead. These two results are combined by taking the maximum value.

4 EVALUATION

In order to compare the performance of the methods, we performed systematic experiments simulating the process of bipartite network inference from biological data on four drug–target interaction networks. These experiments are done by full leave-one-out cross-validation (LOOCV) as follows. In each run of the method, one drug–target pair (interacting or non-interacting) is left out by setting its entry in the Y matrix to 0. Then we try to recover its true label using the remaining data. Note that when leaving out a drug–target pair the Y matrix changes, and therefore the GIP kernel has to be recomputed.

We also performed a variation of these experiments using five trials of 10-fold cross-validation. We recomputed the GIP kernels for each fold, also for 10-fold cross-validation, so no information about the removed interactions was leaked in this way. The results can be found in Supplementary Table S2; we observed no large differences compared with the results obtained using LOOCV.

In all experiments, we have chosen the values for the parameters in an uninformative way. In particular, we set the regularization parameter σ = 1 for both RLS methods; and as stated before, we set the kernel bandwidths γ̃_d = γ̃_t = 1 for both the drug and target interaction profile kernels.

We assessed the performance of the methods with the following two quality measures generally used in this type of study: AUC and AUPR. Specifically, we computed the ROC curve of true positives as a function of false positives, and considered the AUC as quality measure (see for instance Fawcett, 2006). Furthermore, we considered the precision–recall curve (Raghavan et al., 1989), that is, the plot of the ratio of true positives among all positive predictions for each given recall rate. The area under this curve (AUPR) provides a quantitative assessment of how well, on average, predicted scores of true interactions are separated from predicted scores of true non-interactions. For this task, because there are few true drug–target interactions, the AUPR is a more significant quality measure than the AUC, as it punishes much more the existence of false positive examples found among the best ranked prediction scores (Davis and Goadrich, 2006).
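For the cross-validation variant described above, the bookkeeping might look roughly like the sketch below. It reuses the gip_kernel and rls_kron sketches from Section 3, assumes scikit-learn is available for the metrics, and uses average_precision_score as a stand-in for the AUPR integration; the paper's exact evaluation code may differ.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score
from sklearn.model_selection import KFold

def cross_validate_pairs(Y, sigma=1.0, n_splits=10, seed=0):
    """10-fold CV over drug-target pairs with the GIP-only RLS-Kron model.

    In each fold the held-out entries of Y are set to 0, the GIP kernels are
    recomputed from the masked matrix (so no information leaks), and the
    held-out entries are scored against their true labels.
    """
    pairs = np.array([(i, j) for i in range(Y.shape[0]) for j in range(Y.shape[1])])
    scores, labels = [], []
    for _, test_idx in KFold(n_splits, shuffle=True, random_state=seed).split(pairs):
        test_pairs = pairs[test_idx]
        Y_train = Y.copy()
        Y_train[test_pairs[:, 0], test_pairs[:, 1]] = 0  # mask held-out pairs
        K_d, K_t = gip_kernel(Y_train), gip_kernel(Y_train.T)
        Y_hat = rls_kron(K_d, K_t, Y_train, sigma)
        scores.extend(Y_hat[test_pairs[:, 0], test_pairs[:, 1]])
        labels.extend(Y[test_pairs[:, 0], test_pairs[:, 1]])
    return roc_auc_score(labels, scores), average_precision_score(labels, scores)
```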
Table 2 contains the results for the two RLS-based classifiers, RLS-avg and RLS-Kron, each with three different kernel combinations:

- GIP: using only the GIP kernels, i.e. K_d = K_{GIP,d} and K_t = K_{GIP,t}, corresponding to α_d = α_t = 1.
- chem/gen: using only the chemical structure and genomic sequence similarity, so K_d = K_{chemical,d} and K_t = K_{genomic,t}, corresponding to α_d = α_t = 0.
- avg: using the average of the two types of kernels, corresponding to α_d = α_t = 0.5.

For comparison, we have also included in the table, as BY09 (auc) and BY09 (aupr), the best results from the combined BLM and KRM methods from Bleakley and Yamanishi (2009). For the GPCR and Nuclear Receptor datasets, the method with the highest AUC is the same as the one with the highest AUPR; it is therefore included only once, as BY09.

Table 2. Results on the drug–target interaction datasets

  Dataset            Method        Kernel     AUC     AUPR
  Enzyme             BY09 (auc)    chem/gen   97.6    83.3
                     BY09 (aupr)   chem/gen   97.3    84.1
                     RLS-avg       GIP        98.2    88.1
                     RLS-avg       chem/gen   96.6    84.5
                     RLS-avg       avg.       97.9    90.5
                     RLS-Kron      GIP        98.3*   88.5
                     RLS-Kron      chem/gen   96.6    85.6
                     RLS-Kron      avg.       97.8    91.5*
  Ion Channel        BY09 (auc)    chem/gen   97.3    78.1
                     BY09 (aupr)   chem/gen   93.5    81.3
                     RLS-avg       GIP        98.5    91.8
                     RLS-avg       chem/gen   97.1    80.7
                     RLS-avg       avg.       98.1    93.2
                     RLS-Kron      GIP        98.6*   92.7
                     RLS-Kron      chem/gen   97.1    77.5
                     RLS-Kron      avg.       98.4    94.3*
  GPCR               BY09          chem/gen   95.5*   66.7
                     RLS-avg       GIP        94.5    70.0
                     RLS-avg       chem/gen   94.7    66.0
                     RLS-avg       avg.       95.0    77.1
                     RLS-Kron      GIP        94.7    71.3
                     RLS-Kron      chem/gen   94.8    63.8
                     RLS-Kron      avg.       95.4    79.0*
  Nuclear Receptor   BY09          chem/gen   88.1    61.2
                     RLS-avg       GIP        88.7    60.4
                     RLS-avg       chem/gen   86.4    54.7
                     RLS-avg       avg.       92.5*   67.0
                     RLS-Kron      GIP        90.6    61.0
                     RLS-Kron      chem/gen   85.9    51.1
                     RLS-Kron      avg.       92.2    68.4*

The AUC and AUPR scores are normalized to 100. For each dataset, * indicates the highest AUC/AUPR score.

4.1 Analysis

Using only the GIP kernel, our Kronecker product RLS method has AUPR scores of 88.5, 92.7, 71.3 and 61.0 on the Enzyme, Ion Channel, GPCR and Nuclear Receptor datasets, respectively. These results are superior to the results from using only the chemical and genomic kernels.

Overall, the RLS-Kron and RLS-avg methods have comparable AUC scores. However, RLS-Kron has a better AUPR when using the GIP kernel, and a worse AUPR when using the chemical and genomic kernels. We believe that this problem is due to the poor quality of the chemical similarity kernel, to which the RLS-Kron method is more sensitive.

Note also that the RLS-avg method is comparable to Bleakley and Yamanishi's bipartite local model (BLM) approach. The differences are that whereas we use an RLS classifier, they use Support Vector Machines; and whereas we use the average to combine results, they use the maximum value. It is therefore not surprising that when using the chemical and genomic kernels, the results of the RLS-avg method are very similar to their results.

In all cases, the best results are obtained when the GIP kernels are combined with the chemical and genomic kernels. With the RLS-Kron method, we then obtain AUPR scores of 91.5, 94.3, 79.0 and 68.4 on the four datasets, which is an improvement of 7.4, 13.0, 12.3 and 7.2 over the best results reported by Bleakley and Yamanishi (2009). Figure 2 shows the precision–recall curves for the RLS-Kron method. Compared with other methods, the RLS-Kron method with the average kernels achieves a good precision also at higher recall values, especially on the larger datasets (Enzyme and Ion Channel).

Fig. 2. Precision–recall curves for the RLS-Kron method. The red dotted line corresponds to using only the chemical and genomic kernels. The green dashed line corresponds to using only the GIP kernels. The blue solid line corresponds to the average of the two types of kernels. On all datasets, the average kernel shows a small improvement over either kernel type alone. (a) Enzyme; (b) Ion Channel; (c) GPCR; (d) Nuclear Receptor.

4.2 Kernels' relevance

In the previous section, we have shown that using a mix of the GIP kernels and the chemical and genomic kernels gives results superior to either type of kernel alone. In order to determine the relative importance of the network topology compared with chemical and sequence similarity, we have investigated the change in prediction performance when varying the parameters α_d and α_t between 0 (chemical/genomic kernels only) and 1 (interaction profile kernels only). For computational reasons, we have used 10-fold cross-validation instead of leave-one-out.

In Figure 3, we have plotted the AUPR and AUC scores on the GPCR dataset for the different parameter values. Lighter colors correspond to higher values. Because of space limitations, plots for the other datasets are included in Supplementary Figures S1 and S2.

Fig. 3. AUPR and AUC scores for the GPCR dataset with different weightings of the kernels. Lighter colors are better. For all datasets α_d = α_t = 0.5 gives near optimal results.

For all datasets, the optimal AUPR is obtained using a mix of the drug and target kernels. Using the parameters α_d = α_t = 0.5, as we did in the previous section, seems to be a good choice across the datasets. Also note that the choice of α_t is more important than the choice of α_d. This seems to indicate that the sequence similarity for targets is more informative than the chemical similarity for drugs. A similar observation was also made in Bleakley and Yamanishi (2009). The poor performance of the RLS-Kron method when using only chemical and genomic kernels that we observed in the previous section appears to be due entirely to this uninformative chemical similarity.

On the larger datasets (Enzyme and Ion Channel), the optimal AUC is obtained with α = 1, while that choice gives the worst results on the smaller datasets. This can be explained by noting that when there are few drugs, there is less information available for each entry of the GIP target kernel, and hence this kernel will be of a lower quality. We have confirmed this hypothesis by testing different sized subsets of the Ion Channel dataset, where we observe the same effect on small subsets. The full results of that experiment are available in Supplementary Figure S3.
4.3 New predicted interactions

In order to analyze the practical relevance of the method for predicting novel drug–target interactions, we conducted an experiment similar to that described by Bleakley and Yamanishi (2009). We ranked the non-interacting pairs according to the scores computed for the LOOCV experiments, and regard the most highly ranked drug–target pairs as the most likely putative interactions. A list of the top 20 new interactions predicted for each of the four datasets can be found in Supplementary Tables S3–S6.

Table 3 lists the top 10 new interactions predicted for the GPCR dataset. We have looked up these predicted interactions in ChEMBL version 9 (Overington, 2009), DrugBank (Wishart et al., 2008) and the latest online version of KEGG DRUG (Kanehisa et al., 2006). A significant fraction of the predictions (4 out of 10) is found in one or more of these databases. One should bear in mind that a large fraction of the interactions in these databases are already included in the training data, and hence are not counted as new interactions. Moreover, these databases are incomplete, so if a predicted interaction is not present in one of the used databases, this does not necessarily mean it does not exist. For this dataset, we started with only 635 known drug–target interactions and 20 550 drug–target pairs not known to interact. Of these 20 550 pairs, we selected 10 as putative drug–target interactions, and found that at least 4 of them are experimentally verified. These findings support the practical relevance of the proposed method.

Table 3. The top 10 new interactions predicted in the GPCR dataset; the 4 confirmed interactions are those marked with a database tag

  Rank   Pair                                                 NN
  1      D00283   Clozapine [C,D]                             0.769
         hsa1814  DRD3: dopamine receptor D3                  0.455
  2      D02358   Metoprolol [C,D]                            0.750
         hsa154   ADRB2: beta-2 adrenergic receptor           0.434
  3      D00604   Clonidine hydrochloride                     0.933
         hsa147   ADRA1B: alpha-1B adrenergic receptor        0.435
  4      D03966   Eglumegad                                   0.036
         hsa2914  GRM4: glutamate receptor, metabotropic 4    0.768
  5      D00255   Carvedilol                                  0.380
         hsa152   ADRA2C: alpha-2C adrenergic receptor        0.489
  6      D04625   Isoetharine [K]                             0.737
         hsa154   ADRB2: beta-2 adrenergic receptor           0.434
  7      D03966   Eglumegad                                   0.036
         hsa2917  GRM7: glutamate receptor, metabotropic 7    0.758
  8      D02340   Loxapine [D]                                0.769
         hsa1812  DRD1: dopamine receptor D1                  0.205
  9      D00503   Perphenazine                                0.857
         hsa1816  DRD5: dopamine receptor D5                  0.529
  10     D00682   Carboprost tromethamine                     0.914
         hsa5739  PTGIR: prostaglandin I2 receptor (IP)       0.150

Interactions that appear in the ChEMBL database are marked with '[C]', interactions in DrugBank are marked with '[D]', and interactions in KEGG are marked with '[K]'. The NN column gives the similarity to the nearest drug interacting with the same target, and to the nearest target interacting with the same drug.
We compared the newly predicted interactions generated by RLS-Kron-avg with those generated by Bleakley and Yamanishi (2009), here referred to as BY09. Specifically, given a dataset, for each method we extracted from its top x new predictions those that have been experimentally validated (that is, that could be found in ChEMBL, DrugBank or KEGG DRUG). Table 4 contains a summary of the results for x = 20, 50, 80. Looking at the top 20 predictions, it seems that the two methods perform best on different datasets. For the top 50 and top 80 predictions, the results indicate the capability of RLS-Kron-avg to successfully predict more new interactions than BY09.

Table 4. The number of highly ranked new interactions that are found in at least one of the three considered databases (ChEMBL, DrugBank or KEGG DRUG)

  Dataset            Method          Top 20 (%)   Top 50 (%)   Top 80 (%)
  Enzyme             BY09             6 (30)      15 (30)      17 (21)
                     RLS-Kron-avg    11 (55)      15 (30)      22 (28)
  Ion Channel        BY09            11 (55)      14 (28)      18 (22)
                     RLS-Kron-avg     8 (40)      12 (24)      22 (28)
  GPCR               BY09            13 (65)      22 (44)      30 (38)
                     RLS-Kron-avg     9 (45)      28 (56)      40 (50)
  Nuclear Receptor   BY09             5 (25)      15 (30)      22 (28)
                     RLS-Kron-avg     9 (45)      20 (40)      22 (28)

We then compared the resulting two sets of confirmed new predictions among the top 50, by looking at common predictions and at interactions uniquely predicted by only one of the two methods. The results for the four datasets can be found in Supplementary Tables S7–S10.

On the Enzyme dataset, BY09 and RLS-Kron-avg each successfully predicted 15 new interactions, with 10 common predictions. On the Ion Channel dataset, BY09 and RLS-Kron-avg successfully predicted 14 and 12 new interactions, respectively, of which only 1 interaction was predicted by both methods. Although BY09 found slightly more confirmed interactions, they were less diverse, since 11 of them involve interactions between (different types of) the voltage-gated sodium channel alpha subunit target and only 2 drugs: prilocaine and tocainide. On the other hand, RLS-Kron-avg found interactions of 4 different classes of targets and 10 different drugs. On the GPCR dataset, BY09 and RLS-Kron-avg successfully predicted 22 and 28 new interactions, respectively, with 14 common predictions. Finally, on the Nuclear Receptor dataset, BY09 and RLS-Kron-avg successfully predicted 15 and 20 new interactions, respectively. Among them, 13 were in common.
In general, the two methods seem to differ in the type of new predictions made. While there is always an overlap of new interactions between the two methods, there is also always a subset of new interactions which RLS-Kron-avg can successfully predict but BY09 fails to predict, and vice versa. Moreover, there seems to be a slight tendency of BY09 to generate new successful predictions that are less diverse than those generated by RLS-Kron-avg. However, we were not able to identify any differential biological bias of the methods toward the detection of specific types of interactions.

4.4 Surprising interactions

A closer inspection shows that many of the predicted interactions are not very surprising. For example, the GPCR dataset contains the interaction between clozapine and dopamine receptor D1. The drug loxapine is very similar to clozapine, and it is therefore to be expected that our method also predicts loxapine to interact with dopamine receptor D1. An analogous thing happens with very similar target proteins. In order to provide a quantitative measure of how surprising these predictions are, we computed the similarity of the drug and target in an interaction pair to their Nearest Neighbor (NN), that is, the most similar drug (with respect to chemical structure similarity) and target (with respect to sequence similarity) in the training set, respectively. These similarities, which we call surprise scores, are listed in the NN column of Table 3. An inspection of the surprise scores shows that the majority of the drug–target pairs predicted by our method consist of a drug and a target very similar to a drug and a target already known to interact, and therefore they are not very surprising. This phenomenon is common to any computational approach that uses similarity between objects for inferring interaction.

To assess the ability of our method to also predict more surprising interactions, we have looked specifically at the predicted interactions where there is no similar drug interacting with the same target, or similar target interacting with the same drug, in the dataset. We pick a threshold value and consider drugs (targets) to be dissimilar if their chemical (genomic) similarity is less than this threshold. We have used the threshold 0.5 for the chemical similarity and 0.25 for the genomic similarity.

When only these 'surprising' pairs are considered, we find, as expected, that fewer of them are present in the ChEMBL, DrugBank and KEGG databases. But we still find more interactions among the highly ranked 'surprising' pairs compared with those that are ranked lower. For example, on the GPCR dataset, 89 of the 500 highest ranked pairs were surprising, and 10 of them (11%) were found in one of the databases (see Supplementary Material for details).
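One way to compute such nearest-neighbor surprise scores is sketched below, assuming the chemical and genomic similarity matrices S_c and S_g and the adjacency matrix Y are available as NumPy arrays; the exact bookkeeping used for the NN column of Table 3 may differ.

```python
import numpy as np

def surprise_scores(Y, S_chem, S_gen):
    """Nearest-neighbor 'surprise' scores for candidate drug-target pairs.

    For every pair (i, j), report the similarity of drug i to its most similar
    drug already known to interact with target j, and the similarity of
    target j to its most similar target already known to interact with drug i.
    Lower values mean a more surprising prediction.
    """
    n_d, n_t = Y.shape
    nn_drug = np.zeros((n_d, n_t))
    nn_target = np.zeros((n_d, n_t))
    for i in range(n_d):
        for j in range(n_t):
            other_drugs = [k for k in range(n_d) if k != i and Y[k, j] == 1]
            other_targets = [l for l in range(n_t) if l != j and Y[i, l] == 1]
            nn_drug[i, j] = max((S_chem[i, k] for k in other_drugs), default=0.0)
            nn_target[i, j] = max((S_gen[j, l] for l in other_targets), default=0.0)
    return nn_drug, nn_target
```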
5 DISCUSSION

We have presented a new kernel that leads to good predictive performance, as measured by AUPR, on the task of predicting interactions between drugs and target proteins. An interesting aspect of our GIP kernel is that it uses no properties beyond the interactions themselves. This means that knowing the sequence of proteins and the chemical structure of drugs is perhaps not as important for this task as previously thought. For example, on the Ion Channel dataset our method with only the GIP kernel has an AUC score of 98.6 and an AUPR score of 92.7, which improves upon the state-of-the-art, while using less prior information.

Besides the GIP kernel, we have also introduced the RLS-Kron algorithm that combines a kernel on drugs and a kernel on targets using the Kronecker product. Compared with previous methods that do prediction with the two kernels independently and then combine the results, this new method represents a small but consistent improvement.

By combining the GIP kernel with chemical and genomic information, we get a method with excellent performance. This method has AUPR scores of 91.5, 94.3, 79.0 and 68.4 on four datasets of drug–target interaction networks in humans, representing an average improvement of 10 points over previous results. The AUPR is a particularly relevant metric for this problem, because it is very sensitive to the correctness of the highest ranked predictions. The large improvement in AUPR suggests that the top ranked putative drug–target interactions found by our method are more likely to be correct than those found by previous methods.

A limitation of all machine learning methods for finding new drug–target interactions is that they are sensitive to inherent biases contained in the training data. It would be interesting to try and analyze the bias of existing datasets of drug–target interactions, but this is out of the scope of this article. Note also that the datasets by Yamanishi et al. (2008) used in this article do not include any singletons: each drug interacts with at least one target, and each target interacts with at least one drug. This property could affect the cross-validation results, by allowing a limited form of cheating. However, the experiments in Section 4.3 show that our method also works when tested in other ways.

A further limitation of the approach used in this article is that it can only be applied to detect new interactions for a target or a drug for which at least one interaction has already been established. Therefore, biologists can use the method as guidance for extending their knowledge about the interactions of a drug or of a target, not for discovering interactions of a new drug or target (that is, one for which no interaction is known). In particular, our method is useful for experimentalists to aid in experimental design and interpretation, especially in solving problems related to drug–target selectivity and polypharmacology (Merino et al., 2010; Metz and Hajduk, 2010).

There are several ways in which the results might be further improved. So far we have used uninformative choices of the parameters: γ̃ = 1, σ = 1 and α = 0.5. Of these choices, we have only investigated the last one. Perhaps with tuning of the other parameters better predictions are possible, although one has to be careful not to over-fit them to the data.

Another avenue for improvement is in using more information about drugs and targets. Since combining the GIP kernel with chemical and genomic kernels leads to a better predictive performance, perhaps adding different information in the form of additional kernels would yield further improvements. These kernels could be interaction profile kernels based on other types of data, such as protein–protein interaction networks. Similarly, for each pair of interacting drug and target more information is known beyond the fact that they interact. For example, the type of interaction, the binding strength, the mechanism of discovery and its uncertainty might all be known. In this article, we have made no use of this additional information, nor did we attempt to predict the type or strength of interactions.

Funding: Netherlands Organization for Scientific Research (NWO) within NWO project (612.066.927, in part).

Conflict of Interest: none declared.
REFERENCES

Basilico,J. and Hofmann,T. (2004) Unifying collaborative and content-based filtering. In ICML '04: Proceedings of the 21st International Conference on Machine Learning. ACM, New York, NY, pp. 65–72.
Ben-Hur,A. and Noble,W.S. (2005) Kernel methods for predicting protein–protein interactions. Bioinformatics, 21 (Suppl. 1), i38–i46.
Bleakley,K. and Yamanishi,Y. (2009) Supervised prediction of drug-target interactions using bipartite local models. Bioinformatics, 25, 2397–2403.
Campillos,M. et al. (2008) Drug target identification using side-effect similarity. Science, 321, 263–266.
Cheng,A.C. et al. (2007) Structure-based maximal affinity model predicts small-molecule druggability. Nat. Biotechnol., 25, 71–75.
Davis,J. and Goadrich,M. (2006) The relationship between precision-recall and ROC curves. In ICML '06: Proceedings of the 23rd International Conference on Machine Learning. ACM, New York, NY, pp. 233–240.
Fawcett,T. (2006) An introduction to ROC analysis. Patt. Recognit. Lett., 27, 861–874.
Günther,S. et al. (2008) SuperTarget and Matador: resources for exploring drug-target relationships. Nucleic Acids Res., 36, D919–D922.
Haggarty,S.J. et al. (2003) Multidimensional chemical genetic analysis of diversity-oriented synthesis-derived deacetylase inhibitors using cell-based assays. Chem. Biol., 10, 383–396.
Hattori,M. et al. (2003) Development of a chemical structure comparison method for integrated analysis of chemical and genomic information in the metabolic pathways. J. Am. Chem. Soc., 125, 11853–11865.
Hopkins,A.L. and Groom,C.R. (2002) The druggable genome. Nat. Rev. Drug Discov., 1, 727–730.
Hue,M. and Vert,J.-P. (2010) On learning with kernels for unordered pairs. In Fürnkranz,J. and Joachims,T. (eds) ICML '10: Proceedings of the 27th International Conference on Machine Learning. Omnipress, Haifa, Israel, pp. 463–470.
Jacob,L. and Vert,J.-P. (2008) Protein-ligand interaction prediction: an improved chemogenomics approach. Bioinformatics, 24, 2149–2156.
Jaroch,S.E. and Weinmann,H. (eds) (2006) Chemical Genomics: Small Molecule Probes to Study Cellular Function. Ernst Schering Research Foundation Workshop. Springer, Berlin.
Kanehisa,M. et al. (2006) From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res., 34, D354–D357.
Kashima,H. et al. (2009a) On pairwise kernels: an efficient alternative and generalization analysis. In PAKDD '09: Proceedings of the 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, pp. 1030–1037.
Kashima,H. et al. (2009b) Recent advances and trends in large-scale kernel methods. IEICE Trans., 92-D, 1338–1353.
Klabunde,T. (2007) Chemogenomic approaches to drug discovery: similar receptors bind similar ligands. Br. J. Pharmacol., 152, 5–7.
Lü,L. and Zhou,T. (2011) Link prediction in complex networks: a survey. Phys. A Stat. Mech. Appl., 390, 1150–1170.
Martin,Y.C. et al. (2002) Do structurally similar molecules have similar biological activity? J. Med. Chem., 45, 4350–4358.
Merino,A. et al. (2010) Drug profiling: knowing where it hits. Drug Discov. Today, 15, 749–756.
Metz,J.T. and Hajduk,P.J. (2010) Rational approaches to targeted polypharmacology: creating and navigating protein-ligand interaction networks. Curr. Opin. Chem. Biol., 14, 498–504.
Okuno,Y. et al. (2007) GLIDA: GPCR ligand database for chemical genomics drug discovery database and tools update. Nucleic Acids Res., 36, D907–D912.
Overington,J. (2009) ChEMBL. An interview with John Overington, team leader, chemogenomics at the European Bioinformatics Institute Outstation of the European Molecular Biology Laboratory (EMBL-EBI). J. Comput. Aided Mol. Des., 23, 195–198.
Oyama,S. and Manning,C.D. (2004) Using feature conjunctions across examples for learning pairwise classifiers. In ECML '04: Proceedings of the 15th European Conference on Machine Learning, Vol. 3201. Springer, pp. 322–333.
Raghavan,V.V. et al. (1989) A critical investigation of recall and precision as measures of retrieval system performance. ACM Trans. Informat. Syst., 7, 205–229.
Raymond,R. and Kashima,H. (2010) Fast and scalable algorithms for semi-supervised link prediction on static and dynamic graphs. In Proceedings of the 2010 European Conference on Machine Learning and Knowledge Discovery in Databases: Part III, ECML PKDD'10. Springer, Berlin, Heidelberg, pp. 131–147.
Rifkin,R. and Klautau,A. (2004) In defense of one-vs-all classification. J. Mach. Learn. Res., 5, 101–141.
Schölkopf,B. et al. (eds) (2004) Kernel Methods in Computational Biology. MIT Press, Cambridge, MA.
Schomburg,I. et al. (2004) BRENDA, the enzyme database: updates and major new developments. Nucleic Acids Res., 32 (Suppl. 1), D431–D433.
Schuffenhauer,A. et al. (2003) Similarity metrics for ligands reflecting the similarity of the target proteins. J. Chem. Inf. Comput. Sci., 43, 391–405.
Smith,T.F. and Waterman,M.S. (1981) Identification of common molecular subsequences. J. Mol. Biol., 147, 195–197.
Wassermann,A.M. et al. (2009) Ligand prediction for orphan targets using support vector machines and various target-ligand kernels is dominated by nearest neighbor effects. J. Chem. Inf. Model., 49, 2155–2167.
Wishart,D.S. et al. (2008) DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res., 36, D901–D906.
Wu,G. et al. (2006) Incremental approximate matrix factorization for speeding up support vector machines. In KDD '06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, NY, pp. 760–766.
Xia,Z. et al. (2010) Semi-supervised drug-protein interaction prediction from heterogeneous biological spaces. BMC Syst. Biol., 4 (Suppl. 2), S6.
Yamanishi,Y. et al. (2008) Prediction of drug-target interaction networks from the integration of chemical and genomic spaces. Bioinformatics, 24, i232–i240.
Yamanishi,Y. et al. (2010) Drug-target interaction prediction from chemical, genomic and pharmacological data in an integrated framework. Bioinformatics, 26, i246–i254.

Gaussian interaction profile kernels for predicting drug–target interaction

Loading next page...
 
/lp/oxford-university-press/gaussian-interaction-profile-kernels-for-predicting-drug-target-A2IBno87G0

References (40)

Publisher
Oxford University Press
Copyright
© The Author 2011. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected]
ISSN
1367-4803
eISSN
1460-2059
DOI
10.1093/bioinformatics/btr500
pmid
21893517
Publisher site
See Article on Publisher Site

Abstract

Vol. 27 no. 21 2011, pages 3036–3043 BIOINFORMATICS ORIGINAL PAPER doi:10.1093/bioinformatics/btr500 Data and text mining Advance Access publication September 4, 2011 Gaussian interaction profile kernels for predicting drug–target interaction 1,∗ 2 1,∗ Twan van Laarhoven , Sander B. Nabuurs and Elena Marchiori 1 2 Department of Computer Science, Radboud University Nijmegen and Computational Drug Discovery, Center for Molecular and Biomolecular Informatics, Radboud University Nijmegen Medical Center, Nijmegen, The Netherlands Associate Editor: Jonathan Wren ABSTRACT Drug–target interaction data are available for many classes of pharmaceutically useful target proteins including Enzymes, Motivation: The in silico prediction of potential interactions between Ion Channels, G-protein-coupled receptors (GPCRs) and Nuclear drugs and target proteins is of core importance for the identification Receptors (Hopkins and Groom, 2002). Several publicly available of new drugs or novel targets for existing drugs. However, only databases have been built and maintained, such as KEGG a tiny portion of all drug–target pairs in current datasets are BRITE (Kanehisa et al., 2006), DrugBank (Wishart et al., 2008), experimentally validated interactions. This motivates the need for GLIDA (Okuno et al., 2007), SuperTarget and Matador (Günther developing computational methods that predict true interaction pairs et al., 2008), BRENDA (Schomburg et al., 2004) and ChEMBL with high accuracy. (Overington, 2009) containing drug–target interaction and other Results: We show that a simple machine learning method that related sources of information, like chemical and genomic data. uses the drug–target network as the only source of information A property of the current drug–target interaction databases is that is capable of predicting true interaction pairs with high accuracy. they contain a rather small number of drug–target pairs which are Specifically, we introduce interaction profiles of drugs (and of targets) experimentally validated interactions. This motivates the need for in a network, which are binary vectors specifying the presence or developing methods that predict true interacting pairs with high absence of interaction with every target (drug) in that network. We accuracy. define a kernel on these profiles, called the Gaussian Interaction Recently, machine learning methods have been introduced to Profile (GIP) kernel, and use a simple classifier, (kernel) Regularized tackle this problem. They can be viewed as instances of the more Least Squares (RLS), for prediction drug–target interactions. We general link prediction problem, see Lü and Zhou (2011) for test comparatively the effectiveness of RLS with the GIP kernel on a recent survey of this topic. These methods are motivated by four drug–target interaction networks used in previous studies. The the observation that similar drugs tend to target similar proteins proposed algorithm achieves area under the precision–recall curve (Klabunde, 2007; Schuffenhauer et al., 2003). This property was (AUPR) up to 92.7, significantly improving over results of state-of-the- shown, for instance, for chemical (Martin et al., 2002) and side effect art methods. Moreover, we show that using also kernels based on similarity (Campillos et al., 2008), and motivated the development chemical and genomic information further increases accuracy, with a of an integrated approach for drug–target interaction prediction neat improvement on small datasets. 
These results substantiate the (Jaroch and Weinmann, 2006). A desirable property of this approach relevance of the network topology (in the form of interaction profiles) is that it does not require the 3D structure information of the target as source of information for predicting drug–target interactions. proteins, which is needed in traditional methods based on docking Availability: Software and Supplementary Material are available at simulations (Cheng et al., 2007). http://cs.ru.nl/~tvanlaarhoven/drugtarget2011/. The current state-of-the-art for the in silico prediction of drug– Contact: [email protected]; [email protected] target interaction is formed by methods that employ similarity Supplementary Information: Supplementary data are available at measures for drugs and for targets in the form by kernel functions, Bioinformatics online. like Bleakley and Yamanishi (2009); Jacob and Vert (2008); Received on June 9, 2011; revised on August 12, 2011; accepted on Wassermann et al. (2009); Yamanishi et al. (2008, 2010). By using August 29, 2011 kernels, multiple sources of information can be easily incorporated for performing prediction (Schölkopf et al., 2004). In Yamanishi et al. (2008), different settings of the interaction 1 INTRODUCTION prediction problem are explored. The in silico prediction of interaction between drugs and target The authors make the distinction between ‘known’ drugs or proteins is a core step in the drug discovery process for identifying targets, for which at least one interaction is in the training set, and new drugs or novel targets for existing drugs, in order to guide and ‘new’ drugs or targets, for which there is not. There are then four speed up the laborious and costly experimental determination of possible settings, depending on whether the drugs and/or targets are drug–target interaction (Haggarty et al., 2003). known or new. In this article, we focus on the setting where both the drugs and targets are known. That is, we use known interactions To whom correspondence should be addressed. for predicting novel ones. 3036 © The Author 2011. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected] [17:46 7/10/2011 Bioinformatics-btr500.tex] Page: 3036 3036–3043 GIP kernel Table 1. The number of drugs and target proteins, their ratio and the number We want to analyze the relevance of the topology of drug– of interactions in the drug–target datasets from Yamanishi et al. (2008) target interaction networks as source of information for predicting interactions. We do this by introducing a kernel that captures the Dataset Drugs Targets n /n Interactions d t topological information. Using a simple machine learning method, we then compare this kernel to kernels based on other sources of Enzyme 445 664 0.67 2926 information. Ion Channel 210 204 1.03 1476 Specifically, we start from the assumption that two drugs that GPCR 223 95 2.35 635 interact in a similar way with the targets in a known drug–target Nuclear Receptor 54 26 2.08 90 interaction network, will also interact in a similar way with new targets. We formalize this property by describing each drug with an interaction profile, a binary vector describing the presence or absence of interaction with every target in that network. The interaction profile of a target is defined in a similar way. From these profiles, we construct the Gaussian Interaction Profile kernel. 
We show that interaction profiling can be effectively used for accurate prediction of drug–target interaction. Specifically, we propose a simple regularized least square algorithm incorporating a product of kernels constructed from drug and target interaction profiles. We test the predictive performance of this method on four drug–target interaction networks in humans involving Enzymes, Ion Channels, GPCRs and Nuclear Receptors. These experiments show that using only information on the topology of the drug–target interaction, in the form of interaction profiles, excellent results are achieved as measured by the area under the precision–recall curve Fig. 1. An illustration of the construction of interaction profiles from a drug– (AUPR) (Davis and Goadrich, 2006). In particular, on three of the target interaction network. Circles are drugs, and squares are targets. In this four considered datasets the performance is superior to the best example, the interaction profile of target t indicates that it interacts with drugs d and d , but not with d , d or d . results of current state-of-the-art methods that use multiple sources 1 2 3 4 5 of information. We further show that the proposed method can be easily GENES database (Kanehisa et al., 2006). Sequence similarity extended to also use other sources of information in the form between proteins was computed using a normalized version of of suitable kernels. Results of experiments where also chemical Smith–Waterman score (Smith and Waterman, 1981), resulting in and genomic information on drugs and targets is included show a similarity matrix denoted S , which represents the genomic space. excellent performance, with AUPR score of 91.5, 94.3, 79.0 and 68.4 on the four datasets, achieving an improvement of 7.4, 13.0, 12.3 and 7.2 over the best results reported in Bleakley 3 METHODS and Yamanishi (2009). A thorough analysis of the results enable us to detect several new putative drug–target interactions, see 3.1 Problem formalization http://cs.ru.nl/~tvanlaarhoven/drugtarget2011/new-interactions/. We consider the problem of predicting new interactions in a drug–target interaction network. Formally, we are given a set X ={d ,d ,...,d } of d 1 2 n drugs and a set X ={t ,t ,...,t } of target proteins. There is also a set t 1 2 n 2 MATERIALS of known interactions between drugs and targets. If we consider these interactions as edges, then they form a bipartite network. We can characterize We used four drug–target interaction networks in humans involving this network by the n ×n adjacency matrix Y . That is, y = 1 if drug d d t ij i Enzymes, Ion Channels, GPCRs and Nuclear Receptors, first interacts with target t and y = 0 otherwise. Our task is now to rank all j ij analyzed by Yamanishi et al. (2008). We worked with the datasets drug–target pairs (d ,t ) such that highest ranked pairs are the most likely to i j provided by these authors, in order to facilitate benchmark interact. comparisons with the current state-of-the-art algorithms that do the same. These datasets are publicly available at http://web.kuicr.kyoto- 3.2 Gaussian interaction profile kernel u.ac.jp/supp/yoshi/drugtarget/. Table 1 lists some properties of the datasets. 
3.2 Gaussian interaction profile kernel

Our method is based on the assumption that drugs exhibiting a similar pattern of interaction and non-interaction with the targets of a drug–target interaction network are likely to show similar interaction behavior with respect to new targets. We use a similar assumption on targets. We therefore introduce the (target) interaction profile y_{d_i} of a drug d_i as the binary vector encoding the presence or absence of interaction with every target in the considered drug–target network. This is nothing more than row i of the adjacency matrix Y. Similarly, the (drug) interaction profile y_{t_j} of a target protein t_j is a vector specifying the presence or absence of interaction with every drug in the considered drug–target network. The interaction profiles generated from a drug–target interaction network can be used as feature vectors for a classifier. Figure 1 illustrates the construction of interaction profiles.

Fig. 1. An illustration of the construction of interaction profiles from a drug–target interaction network. Circles are drugs, and squares are targets. In this example, the interaction profile of target t indicates that it interacts with drugs d_1 and d_2, but not with d_3, d_4 or d_5.

Following the current state-of-the-art for the drug–target interaction prediction problem, we will use kernel methods, and hence construct a kernel from the interaction profiles. This kernel does not include any information beyond the topology of the drug–target network.

One of the most popular choices for constructing a kernel from a feature vector is the Gaussian kernel, also known as the radial basis function (RBF) kernel. This kernel is, for drugs d_i and d_j,

    K_GIP,d(d_i, d_j) = exp(−γ_d ‖y_{d_i} − y_{d_j}‖²).

A kernel K_GIP,t for the similarities between target proteins can be defined analogously. We call these kernels Gaussian Interaction Profile (GIP) kernels.

The parameter γ_d controls the kernel bandwidth. We set

    γ_d = γ̃_d / ( (1/n_d) Σ_{i=1}^{n_d} |y_{d_i}|² ).

That is, we normalize the parameter by dividing it by the average number of interactions per drug. With this choice, the kernel values become independent of the size of the dataset. In principle, the new bandwidth parameter γ̃_d could be set with cross-validation, but in this article we simply use γ̃ = 1.
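As an illustration, here is a minimal numpy sketch of the GIP kernel defined above, with the bandwidth normalized by the average number of interactions and γ̃ = 1 as in the text. This is a sketch under those assumptions, not the authors' released implementation.

```python
# Gaussian Interaction Profile (GIP) kernel from an adjacency matrix.
import numpy as np

def gip_kernel(profiles, gamma_tilde=1.0):
    """GIP kernel between the rows of `profiles` (binary interaction profiles)."""
    # gamma = gamma_tilde divided by the mean squared norm of the profiles,
    # i.e. by the average number of interactions per drug (or per target)
    gamma = gamma_tilde / np.mean(np.sum(profiles ** 2, axis=1))
    # pairwise squared Euclidean distances between all profiles
    sq_norms = np.sum(profiles ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * profiles @ profiles.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

# Y is the n_d x n_t adjacency matrix of the drug-target network:
# K_GIP_d = gip_kernel(Y)      # kernel on drugs (rows of Y)
# K_GIP_t = gip_kernel(Y.T)    # kernel on targets (columns of Y)
```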
d d t t That is, we normalize the parameter by dividing it by the average 2 2 number of interactions per drug. With this choice, the kernel values become Note this model is slightly different from using the Kronecker sum kernel independent of the size of the dataset. In principle, the new bandwidth (Kashima et al., 2009a). Since regularization is performed for drugs and parameter γ ˜ could be set with cross-validation, but in this article, we simply targets separately in the RLS-avg method, rather than jointly. use γ ˜ = 1. There are other ways to construct a kernel from interaction profiles. For example, Basilico and Hofmann (2004) propose using the correlation of 3.5 RLS-Kron classifier interaction profiles. We have performed brief experiments with these other A better alternative is to combine the kernels into a larger kernel that directly kernels, which show that GIP kernels consistently outperform kernels based relates drug–target pairs. This is done with the Kronecker product kernel on correlation or inner products. The detailed results of these experiments (Basilico and Hofmann, 2004; Ben-Hur and Noble, 2005; Hue and Vert, are included in Supplementary Table S1. 2010; Oyama and Manning, 2004). The Kronecker product K ⊗K of the d t drug and target kernels is 3.3 Integrating chemical and genomic information K ((d ,t ),(d ,t )) =K (d ,d )K (t ,t ). i j k l d i k t j l We construct kernels containing information about the chemical and genomic space from the similarity matrices S and S . Since these similarity matrices d g With this kernel, we can make predictions for all pairs at once, are neither symmetric nor positive definite, we apply a simple transformation T T −1 T to make them symmetric with S = (S +S )/2 and add a small multiple ˆ sym vec(Y ) =K (K + σI ) vec(Y ), of the identity matrix to enforce the positive definite property. We denote the resulting kernels for drugs and targets by K and K , where vec(Y ) is the a vector of all interaction pairs, created by stacking the chemical,d genomic,t respectively. columns of Y . We call this method RLS-Kron. Using the Kronecker product kernel directly would involve calculating the To combine the interaction profile kernel with these chemical and genomic inverse of an n n ×n n matrix, which would take O((n n ) ) operations, kernels, we use a simple weighted average, d t d t d t and would also require too much memory. We use a more efficient K = α K + (1 − α )K d d chemical,d d GIP,d implementation based on eigen decompositions, previously presented in Raymond and Kashima (2010). K = α K + (1 − α )K . t t genomic,t t GIP,t T T Let K =V  V and K =V  V be the eigen decompositions of the t t t t d d d d For the reported results of our evaluation, we use simply the unweighted two kernel matrices. Since the eigenvalues (vectors) of a Kronecker product average, for both drugs and targets, i.e. α = α = 0.5. In Section 4.2, we d t are the Kronecker product of eigenvalues (vectors), for our Kronecker further analyze the effect of these parameters on the predictive performance T product kernel we have simply K =K ⊗K =V V , where  =  ⊗ d t d t of the method. and V =V ⊗V . The matrix that we want to invert, K + σI has these same d t eigenvectors V , and eigenvalues  + σI . Hence 3.4 RLS-avg classifier −1 −1 T K (K + σI ) =V ( + σI ) V . In principle, we could use the GIP kernels with any kernel-based classification or ranking algorithm. 
3.4 RLS-avg classifier

In principle, we could use the GIP kernels with any kernel-based classification or ranking algorithm. We choose to use a very basic classifier, the (kernel) Regularized Least Squares (RLS) classifier. While Least Squares is primarily used for regression, when a good kernel is used it has classification accuracy similar to that of Support Vector Machines (Rifkin and Klautau, 2004). Our own experiments confirm this finding. In the RLS classifier, the predicted values ŷ with a given kernel K have a simple closed-form solution,

    ŷ = K (K + σI)^{−1} y,

where σ is a regularization parameter. Higher values of σ give a smoother result, while for σ = 0 we get ŷ = y, and hence no generalization at all. The value ŷ is a real-valued score, which we can interpret as a confidence.

The RLS classifier is sensitive to the encoding used for y. Here, we use 1 for encoding interacting pairs and 0 for non-interacting ones. Brief experiments have shown that the classifier is not sensitive to this choice, as long as the value used for non-interactions is close to 0. Using a value very different from 0, like −1, would place too much weight on non-interactions. The classifier would then try to avoid predicting pairs that look like non-interactions, rather than predicting pairs that look like interactions.

In the previous sections, we defined kernels on drugs and kernels on target proteins. There are several ways in which we can use kernels in both these dimensions. Following other works, like Bleakley and Yamanishi (2009) and Xia et al. (2010), a simple and effective approach is to apply the classifier for each drug independently using only the target kernel, and also for each target independently using only the drug kernel. Then the final score for a drug–target pair is a combination of the two outputs. Here we use the average of the output values, and denote the resulting method by RLS-avg. Observe that, in the formulation of the RLS classifier that we use, performing independent prediction amounts to replacing the vector y with the matrix Y, and hence the prediction of RLS-avg is

    Ŷ = (1/2) K_d (K_d + σI)^{−1} Y + (1/2) (K_t (K_t + σI)^{−1} Y^T)^T.

Note that this model is slightly different from using the Kronecker sum kernel (Kashima et al., 2009a), since regularization is performed for drugs and targets separately in the RLS-avg method, rather than jointly.

3.5 RLS-Kron classifier

A better alternative is to combine the kernels into a larger kernel that directly relates drug–target pairs. This is done with the Kronecker product kernel (Basilico and Hofmann, 2004; Ben-Hur and Noble, 2005; Hue and Vert, 2010; Oyama and Manning, 2004). The Kronecker product K_d ⊗ K_t of the drug and target kernels is

    K((d_i, t_j), (d_k, t_l)) = K_d(d_i, d_k) K_t(t_j, t_l).

With this kernel, we can make predictions for all pairs at once,

    vec(Ŷ^T) = K (K + σI)^{−1} vec(Y^T),

where vec(Y) is a vector of all interaction pairs, created by stacking the columns of Y. We call this method RLS-Kron.

Using the Kronecker product kernel directly would involve calculating the inverse of an n_d n_t × n_d n_t matrix, which would take O((n_d n_t)³) operations, and would also require too much memory. We use a more efficient implementation based on eigendecompositions, previously presented in Raymond and Kashima (2010). Let K_d = V_d Λ_d V_d^T and K_t = V_t Λ_t V_t^T be the eigendecompositions of the two kernel matrices. Since the eigenvalues (eigenvectors) of a Kronecker product are the Kronecker products of the eigenvalues (eigenvectors), for our Kronecker product kernel we simply have K = K_d ⊗ K_t = V Λ V^T, where Λ = Λ_d ⊗ Λ_t and V = V_d ⊗ V_t. The matrix that we want to invert, K + σI, has these same eigenvectors V, and eigenvalues Λ + σI. Hence

    K (K + σI)^{−1} = V Λ (Λ + σI)^{−1} V^T.

To efficiently multiply this matrix with vec(Y^T), we can use a further property of the Kronecker product, namely that (A ⊗ B) vec(X) = vec(B X A^T). Combining these facts, we get that the RLS prediction is

    Ŷ = V_d Z V_t^T,   where   vec(Z^T) = (Λ_d ⊗ Λ_t)(Λ_d ⊗ Λ_t + σI)^{−1} vec(V_t^T Y^T V_d).

So, to make an RLS prediction using the Kronecker product kernel we only need to perform the two eigendecompositions and some matrix multiplications, bringing the runtime down to O(n_d³ + n_t³). The efficiency of this computation could be further improved, yielding a quadratic computational complexity, by applying recent techniques for large-scale kernel methods to compute the two kernel decompositions (Kashima et al., 2009b; Wu et al., 2006).
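The two classifiers can be written compactly in numpy. The sketch below follows the formulas above (it is not the authors' released code); rls_kron uses the eigendecomposition trick rather than the full n_d n_t × n_d n_t inverse.

```python
# RLS-avg and RLS-Kron predictions for a drug kernel K_d, target kernel K_t
# and adjacency matrix Y, following the closed-form solutions in the text.
import numpy as np

def rls_avg(K_d, K_t, Y, sigma=1.0):
    """Average of two independent RLS predictions (drug side and target side)."""
    n_d, n_t = Y.shape
    pred_d = K_d @ np.linalg.solve(K_d + sigma * np.eye(n_d), Y)
    pred_t = (K_t @ np.linalg.solve(K_t + sigma * np.eye(n_t), Y.T)).T
    return 0.5 * pred_d + 0.5 * pred_t

def rls_kron(K_d, K_t, Y, sigma=1.0):
    """RLS with the Kronecker product kernel, via two eigendecompositions."""
    lam_d, V_d = np.linalg.eigh(K_d)
    lam_t, V_t = np.linalg.eigh(K_t)
    # eigenvalues of K_d (x) K_t, arranged as an n_d x n_t grid
    lam = np.outer(lam_d, lam_t)
    # elementwise filter factor lambda / (lambda + sigma)
    Z = (lam / (lam + sigma)) * (V_d.T @ Y @ V_t)
    return V_d @ Z @ V_t.T
```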
3.6 Comparison methods

In order to assess globally the performance of our method, we compare it against current state-of-the-art algorithms. To the best of our knowledge, the best results on these datasets obtained so far are those reported by Bleakley and Yamanishi (2009), where the Bipartite Local Models (BLM) approach was introduced. These results were achieved by combining the output scores of the Kernel Regression Method (KRM) (Yamanishi et al., 2008) and BLM by taking their maximum value. We briefly recall these methods here.

In the KRM method, drugs and targets are embedded into a unified space called the 'pharmacological space'. A regression model is learned between the chemical structure (respectively, genomic sequence) similarity space and this pharmacological space. Then new potential drugs and targets are mapped into the pharmacological space using this regression model. Finally, new drug–target interactions are predicted by connecting drugs and target proteins that are closer than a threshold in the pharmacological space.

The BLM method is similar to our RLS-avg method. In the BLM method, the presence or absence of a drug–target interaction is predicted as follows. First, the target is excluded, and a training set is constructed consisting of two classes: all other known targets of the drug in question, and the targets not known to interact with that drug. Second, a Support Vector Machine that discriminates between the two classes is constructed, using the available genomic kernel for the targets. This model is then used to predict the label of the target, and hence the interaction or non-interaction of the considered drug–target pair. A similar procedure is applied with the roles of drugs and targets reversed, using the chemical structure kernel instead. These two results are combined by taking the maximum value.

4 EVALUATION

In order to compare the performance of the methods, we performed systematic experiments simulating the process of bipartite network inference from biological data on four drug–target interaction networks. These experiments are done by full leave-one-out cross-validation (LOOCV) as follows. In each run of the method, one drug–target pair (interacting or non-interacting) is left out by setting its entry in the Y matrix to 0. Then we try to recover its true label using the remaining data. Note that when leaving out a drug–target pair the Y matrix changes, and therefore the GIP kernel has to be recomputed.

We also performed a variation of these experiments using five trials of 10-fold cross-validation. We recomputed the GIP kernels for each fold, also for 10-fold cross-validation, so no information about the removed interactions was leaked in this way. The results can be found in Supplementary Table S2; we observed no large differences compared with the results obtained using LOOCV.

In all experiments, we have chosen the values for the parameters in an uninformative way. In particular, we set the regularization parameter σ = 1 for both RLS methods; and, as stated before, we set the kernel bandwidths γ̃_d = γ̃_t = 1 for both the drug and target interaction profile kernels.

We assessed the performance of the methods with the following two quality measures generally used in this type of study: AUC and AUPR. Specifically, we computed the ROC curve of true positives as a function of false positives, and considered the AUC as quality measure (see for instance Fawcett, 2006). Furthermore, we considered the precision–recall curve (Raghavan et al., 1989), that is, the plot of the ratio of true positives among all positive predictions for each given recall rate. The area under this curve (AUPR) provides a quantitative assessment of how well, on average, predicted scores of true interactions are separated from predicted scores of true non-interactions. For this task, because there are few true drug–target interactions, the AUPR is a more significant quality measure than the AUC, as it punishes much more the existence of false positive examples found among the best ranked prediction scores (Davis and Goadrich, 2006).
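A simplified sketch of this evaluation protocol, reusing the gip_kernel and rls_kron helpers sketched earlier and using scikit-learn's average precision as a stand-in for the AUPR (the exact curve-integration details of the paper may differ).

```python
# Leave one drug-target pair out at a time, recompute the GIP kernels,
# and score the left-out entry with RLS-Kron (GIP kernels only).
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

def loocv_scores(Y, sigma=1.0):
    scores = np.zeros_like(Y, dtype=float)
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y_train = Y.copy()
            Y_train[i, j] = 0                  # hide the left-out pair
            K_d = gip_kernel(Y_train)          # GIP kernels are recomputed
            K_t = gip_kernel(Y_train.T)        # for every left-out pair
            scores[i, j] = rls_kron(K_d, K_t, Y_train, sigma)[i, j]
    return scores

# scores = loocv_scores(Y)
# auc = roc_auc_score(Y.ravel(), scores.ravel())
# aupr = average_precision_score(Y.ravel(), scores.ravel())
```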
Table 2 contains the results for the two RLS-based classifiers, RLS-avg and RLS-Kron, each with three different kernel combinations:

• GIP: using only the GIP kernels, i.e. K_d = K_GIP,d and K_t = K_GIP,t, corresponding to α_d = α_t = 1.
• chem/gen: using only the chemical structure and genomic sequence similarity, so K_d = K_chemical,d and K_t = K_genomic,t, corresponding to α_d = α_t = 0.
• avg: using the average of the two types of kernels, corresponding to α_d = α_t = 0.5.

For comparison, we have also included in the table, as BY09 (auc) and BY09 (aupr), the best results from the combined BLM and KRM methods from Bleakley and Yamanishi (2009). For the GPCR and Nuclear Receptor datasets, the method with the highest AUC is the same as the one with the highest AUPR, therefore it is included only once, as BY09.

Table 2. Results on the drug–target interaction datasets

Dataset            Method       Kernel     AUC    AUPR
Enzyme             BY09 (auc)   chem/gen   97.6   83.3
                   BY09 (aupr)  chem/gen   97.3   84.1
                   RLS-avg      GIP        98.2   88.1
                   RLS-avg      chem/gen   96.6   84.5
                   RLS-avg      avg.       97.9   90.5
                   RLS-Kron     GIP        98.3*  88.5
                   RLS-Kron     chem/gen   96.6   85.6
                   RLS-Kron     avg.       97.8   91.5*
Ion Channel        BY09 (auc)   chem/gen   97.3   78.1
                   BY09 (aupr)  chem/gen   93.5   81.3
                   RLS-avg      GIP        98.5   91.8
                   RLS-avg      chem/gen   97.1   80.7
                   RLS-avg      avg.       98.1   93.2
                   RLS-Kron     GIP        98.6*  92.7
                   RLS-Kron     chem/gen   97.1   77.5
                   RLS-Kron     avg.       98.4   94.3*
GPCR               BY09         chem/gen   95.5*  66.7
                   RLS-avg      GIP        94.5   70.0
                   RLS-avg      chem/gen   94.7   66.0
                   RLS-avg      avg.       95.0   77.1
                   RLS-Kron     GIP        94.7   71.3
                   RLS-Kron     chem/gen   94.8   63.8
                   RLS-Kron     avg.       95.4   79.0*
Nuclear Receptor   BY09         chem/gen   88.1   61.2
                   RLS-avg      GIP        88.7   60.4
                   RLS-avg      chem/gen   86.4   54.7
                   RLS-avg      avg.       92.5*  67.0
                   RLS-Kron     GIP        90.6   61.0
                   RLS-Kron     chem/gen   85.9   51.1
                   RLS-Kron     avg.       92.2   68.4*

The AUC and AUPR scores are normalized to 100. For each dataset, * indicates the highest AUC/AUPR score.

4.1 Analysis

Using only the GIP kernel, our Kronecker product RLS method has AUPR scores of 88.5, 92.7, 71.3 and 61.0 on the Enzyme, Ion Channel, GPCR and Nuclear Receptor datasets, respectively. These results are superior to the results from using only the chemical and genomic kernels.

Overall, the RLS-Kron and RLS-avg methods have comparable AUC scores. However, RLS-Kron has a better AUPR when using the GIP kernel, and a worse AUPR when using the chemical and genomic kernels. We believe that this problem is due to the poor quality of the chemical similarity kernel, to which the RLS-Kron method is more sensitive.

Note also that the RLS-avg method is comparable to Bleakley and Yamanishi's bipartite local model (BLM) approach. The differences are that whereas we use an RLS classifier, they use Support Vector Machines; and whereas we use the average to combine results, they use the maximum value. It is therefore not surprising that, when using the chemical and genomic kernels, the results of the RLS-avg method are very similar to their results.

In all cases, the best results are obtained when the GIP kernels are combined with the chemical and genomic kernels. With the RLS-Kron method, we then obtain AUPR scores of 91.5, 94.3, 79.0 and 68.4 on the four datasets, which is an improvement of 7.4, 13.0, 12.3 and 7.2 over the best results reported by Bleakley and Yamanishi (2009). Figure 2 shows the precision–recall curves for the RLS-Kron method. Compared with other methods, the RLS-Kron method with the average kernels achieves good precision also at higher recall values, especially on the larger datasets (Enzyme and Ion Channel).

Fig. 2. Precision–recall curves for the RLS-Kron method. The red dotted line corresponds to using only the chemical and genomic kernels, the green dashed line to using only the GIP kernels, and the blue solid line to the average of the two types of kernels. On all datasets, the average kernel shows a small improvement over either kernel type alone. (a) Enzyme; (b) Ion Channel; (c) GPCR; (d) Nuclear Receptor.

4.2 Kernels' relevance

In the previous section, we have shown that using a mix of the GIP kernels and the chemical and genomic kernels gives results superior to either type of kernel alone. In order to determine the relative importance of the network topology compared with chemical and sequence similarity, we have investigated the change in prediction performance when varying the parameters α_d and α_t between 0 (chemical/genomic kernels only) and 1 (interaction profile kernels only). For computational reasons, we have used 10-fold cross-validation instead of leave-one-out.

In Figure 3, we have plotted the AUPR and AUC scores on the GPCR dataset for the different parameter values. Lighter colors correspond to higher values. Because of space limitations, plots for the other datasets are included in Supplementary Figures S1 and S2.

Fig. 3. AUPR and AUC scores for the GPCR dataset with different weightings of the kernels. Lighter colors are better. For all datasets, α_d = α_t = 0.5 gives near optimal results.
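The weighting experiment described above can be sketched as a simple grid search. The evaluate argument below stands in for the 10-fold cross-validation loop and is a hypothetical placeholder, not a function from the paper's software; the kernel matrices are those defined in Section 3.

```python
# Sweep alpha_d and alpha_t over a grid and record a performance score
# (e.g. cross-validated AUPR) for each weighting of the kernels.
import numpy as np

def alpha_sweep(K_gip_d, K_chem_d, K_gip_t, K_gen_t, Y, evaluate, steps=11):
    """`evaluate(K_d, K_t, Y)` is any AUPR estimator, e.g. a 10-fold CV loop."""
    alphas = np.linspace(0.0, 1.0, steps)   # 0 = chem/gen only, 1 = GIP only
    grid = np.zeros((steps, steps))
    for a, alpha_d in enumerate(alphas):
        for b, alpha_t in enumerate(alphas):
            K_d = alpha_d * K_gip_d + (1 - alpha_d) * K_chem_d
            K_t = alpha_t * K_gip_t + (1 - alpha_t) * K_gen_t
            grid[a, b] = evaluate(K_d, K_t, Y)
    return alphas, grid
```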
For all datasets, the optimal AUPR is obtained using a mix of the drug and target kernels. Using the parameters α_d = α_t = 0.5, as we did in the previous section, seems to be a good choice across the datasets. Also note that the choice of α_t is more important than the choice of α_d. This seems to indicate that the sequence similarity for targets is more informative than the chemical similarity for drugs. A similar observation was also made in Bleakley and Yamanishi (2009). The poor performance of the RLS-Kron method when using only chemical and genomic kernels that we observed in the previous section appears to be due entirely to this uninformative chemical similarity.

On the larger datasets (Enzyme and Ion Channel), the optimal AUC is obtained with α_t = 1, while that choice gives the worst results on the smaller datasets. This can be explained by noting that when there are few drugs, there is less information available for each entry of the GIP target kernel, and hence this kernel will be of a lower quality. We have confirmed this hypothesis by testing different sized subsets of the Ion Channel dataset, where we observe the same effect on small subsets. The full results of that experiment are available in Supplementary Figure S3.

4.3 New predicted interactions

In order to analyze the practical relevance of the method for predicting novel drug–target interactions, we conducted an experiment similar to that described by Bleakley and Yamanishi (2009). We ranked the non-interacting pairs according to the scores computed for the LOOCV experiments, and consider the most highly ranked drug–target pairs as the most likely putative interactions. A list of the top 20 new interactions predicted for each of the four datasets can be found in Supplementary Tables S3–S6.
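The ranking step just described amounts to sorting the pairs with y_ij = 0 by their predicted score; a minimal sketch, assuming a score matrix such as the one produced by the LOOCV sketch above:

```python
# Extract the k highest-scoring drug-target pairs that are not known to interact.
import numpy as np

def top_new_predictions(scores, Y, k=10):
    """Return k (drug_index, target_index) pairs with y_ij = 0, best score first."""
    candidates = np.argwhere(Y == 0)          # pairs not known to interact
    candidate_scores = scores[Y == 0]         # same (row-major) ordering
    order = np.argsort(-candidate_scores)     # descending by predicted score
    return [tuple(candidates[idx]) for idx in order[:k]]

# pairs = top_new_predictions(scores, Y, k=10)
```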
Table 3 lists the top 10 new interactions predicted for the GPCR dataset. We have looked up these predicted interactions in ChEMBL version 9 (Overington, 2009), DrugBank (Wishart et al., 2008) and the latest online version of KEGG DRUG (Kanehisa et al., 2006). A significant fraction of the predictions (4 out of 10) is found in one or more of these databases. One should bear in mind that a large fraction of the interactions in these databases are already included in the training data, and hence are not counted as new interactions. Moreover, these databases are incomplete, so if a predicted interaction is not present in one of the used databases, this does not necessarily mean it does not exist. For this dataset, we started with only 635 known drug–target interactions and 20 550 drug–target pairs not known to interact. Of these 20 550 pairs, we selected 10 as putative drug–target interactions, and found that at least 4 of them are experimentally verified. These findings support the practical relevance of the proposed method.

Table 3. The top 10 new interactions predicted in the GPCR dataset; 4 have been confirmed (marked [C], [D] or [K])

Rank  Pair                                                 NN
1     D00283  Clozapine                                    0.769  [C,D]
      hsa1814 DRD3: dopamine receptor D3                   0.455
2     D02358  Metoprolol                                   0.750  [C,D]
      hsa154  ADRB2: beta-2 adrenergic receptor            0.434
3     D00604  Clonidine hydrochloride                      0.933
      hsa147  ADRA1B: alpha-1B adrenergic receptor         0.435
4     D03966  Eglumegad                                    0.036
      hsa2914 GRM4: glutamate receptor, metabotropic 4     0.768
5     D00255  Carvedilol                                   0.380
      hsa152  ADRA2C: alpha-2C adrenergic receptor         0.489
6     D04625  Isoetharine                                  0.737  [K]
      hsa154  ADRB2: beta-2 adrenergic receptor            0.434
7     D03966  Eglumegad                                    0.036
      hsa2917 GRM7: glutamate receptor, metabotropic 7     0.758
8     D02340  Loxapine                                     0.769  [D]
      hsa1812 DRD1: dopamine receptor D1                   0.205
9     D00503  Perphenazine                                 0.857
      hsa1816 DRD5: dopamine receptor D5                   0.529
10    D00682  Carboprost tromethamine                      0.914
      hsa5739 PTGIR: prostaglandin I2 receptor (IP)        0.150

Interactions that appear in the ChEMBL database are marked with '[C]', interactions in DrugBank are marked with '[D]', and interactions in KEGG are marked with '[K]'. The NN column gives the similarity to the nearest drug interacting with the same target, and to the nearest target interacting with the same drug.

We compared the newly predicted interactions generated by RLS-Kron-avg with those generated by Bleakley and Yamanishi (2009), here referred to as BY09. Specifically, given a dataset, for each method we extracted from its top x new predictions those that have been experimentally validated (that is, that could be found in ChEMBL, DrugBank or KEGG DRUG). Table 4 contains a summary of the results for x = 20, 50, 80. Looking at the top 20 predictions, it seems that the two methods perform best on different datasets. For the top 50 and top 80 predictions, the results indicate the capability of RLS-Kron-avg to successfully predict more new interactions than BY09.

Table 4. The number of highly ranked new interactions that are found in at least one of the three considered databases (ChEMBL, DrugBank or KEGG DRUG)

Dataset            Method        Top 20 (%)   Top 50 (%)   Top 80 (%)
Enzyme             BY09          6 (30)       15 (30)      17 (21)
                   RLS-Kron-avg  11 (55)      15 (30)      22 (28)
Ion Channel        BY09          11 (55)      14 (28)      18 (22)
                   RLS-Kron-avg  8 (40)       12 (24)      22 (28)
GPCR               BY09          13 (65)      22 (44)      30 (38)
                   RLS-Kron-avg  9 (45)       28 (56)      40 (50)
Nuclear Receptor   BY09          5 (25)       15 (30)      22 (28)
                   RLS-Kron-avg  9 (45)       20 (40)      22 (28)

We then compared the resulting two sets of confirmed new predictions among the top 50, by looking at common predictions and at interactions uniquely predicted by only one of the two methods. The results for the four datasets can be found in Supplementary Tables S7–S10.

On the Enzyme dataset, BY09 and RLS-Kron-avg each successfully predicted 15 new interactions, with 10 common predictions. On the Ion Channel dataset, BY09 and RLS-Kron-avg successfully predicted 14 and 12 new interactions, respectively, of which only 1 interaction was predicted by both methods. Although BY09 found slightly more confirmed interactions, they were less diverse, since 11 of them involve interactions between (different types of) the voltage-gated sodium channel alpha subunit target and only 2 drugs: prilocaine and tocainide. On the other hand, RLS-Kron-avg found interactions of 4 different classes of targets and 10 different drugs. On the GPCR dataset, BY09 and RLS-Kron-avg successfully predicted 22 and 28 new interactions, respectively, with 14 common predictions. Finally, on the Nuclear Receptor dataset, BY09 and RLS-Kron-avg successfully predicted 15 and 20 new interactions, respectively. Among them, 13 were in common.
In general, the two methods seem to differ in the type of new predictions made. While there is always an overlap of new interactions between the two methods, there is also always a subset of new interactions which RLS-Kron-avg can successfully predict but BY09 fails to predict, and vice versa. Moreover, there seems to be a slight tendency of BY09 to generate new successful predictions that are less diverse than those generated by RLS-Kron-avg. However, we were not able to identify any differential biological bias of the methods toward the detection of specific types of interactions.

4.4 Surprising interactions

A closer inspection shows that many of the predicted interactions are not very surprising. For example, the GPCR dataset contains the interaction between clozapine and dopamine receptor D1. The drug loxapine is very similar to clozapine, and it is therefore to be expected that our method also predicts loxapine to interact with dopamine receptor D1. An analogous thing happens with very similar target proteins. In order to provide a quantitative measure of how surprising these predictions are, we computed the similarity of the drug and the target in an interaction pair to their Nearest Neighbor (NN), that is, the most similar drug (with respect to chemical structure similarity) and target (with respect to sequence similarity) in the training set, respectively. These similarities, which we call surprise scores, are listed in the NN column of Table 3. An inspection of the surprise scores shows that the majority of the drug–target pairs predicted by our method consist of a drug and a target very similar to a drug and a target already known to interact, and therefore they are not very surprising. This phenomenon is common to any computational approach that uses similarity between objects for inferring interactions.
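A sketch of how such surprise scores could be computed, following the description in the footnote of Table 3; the handling of pairs without any known neighbor is an arbitrary choice here, not something specified by the paper.

```python
# Surprise (NN) scores for a candidate pair (i, j): similarity of drug i to the
# most similar drug known to interact with target j, and of target j to the
# most similar target known to interact with drug i.
import numpy as np

def surprise_scores(i, j, Y, S_c, S_g):
    other_drugs = np.where(Y[:, j] == 1)[0]      # drugs interacting with target j
    other_targets = np.where(Y[i, :] == 1)[0]    # targets interacting with drug i
    other_drugs = other_drugs[other_drugs != i]
    other_targets = other_targets[other_targets != j]
    drug_nn = S_c[i, other_drugs].max() if other_drugs.size else 0.0
    target_nn = S_g[j, other_targets].max() if other_targets.size else 0.0
    return drug_nn, target_nn
```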
To assess the ability of our method to also predict more surprising interactions, we have looked specifically at the predicted interactions where there is no similar drug interacting with the same target, or similar target interacting with the same drug, in the dataset. We pick a threshold value and consider drugs (targets) to be dissimilar if their chemical (genomic) similarity is less than this threshold. We have used the threshold 0.5 for the chemical similarity and 0.25 for the genomic similarity. When only these 'surprising' pairs are considered, we find, as expected, that fewer of them are present in the ChEMBL, DrugBank and KEGG databases. But we still find more interactions among the highly ranked 'surprising' pairs compared with those that are ranked lower. For example, on the GPCR dataset, 89 of the 500 highest ranked pairs were surprising, and 10 of them (11%) were found in one of the databases (see Supplementary Material for details).

5 DISCUSSION

We have presented a new kernel that leads to good predictive performance, as measured by AUPR, on the task of predicting interactions between drugs and target proteins. An interesting aspect of our GIP kernel is that it uses no properties beyond the interactions themselves. This means that knowing the sequence of proteins and the chemical structure of drugs is perhaps not as important for this task as previously thought. For example, on the Ion Channel dataset our method with only the GIP kernel has an AUC score of 98.6 and an AUPR score of 92.7, which improves upon the state-of-the-art, while using less prior information.

Besides the GIP kernel, we have also introduced the RLS-Kron algorithm, which combines a kernel on drugs and a kernel on targets using the Kronecker product. Compared with previous methods that do prediction with the two kernels independently and then combine the results, this new method represents a small but consistent improvement.

By combining the GIP kernel with chemical and genomic information, we get a method with excellent performance. This method has AUPR scores of 91.5, 94.3, 79.0 and 68.4 on four datasets of drug–target interaction networks in humans, representing an average improvement of 10 points over previous results. The AUPR is a particularly relevant metric for this problem, because it is very sensitive to the correctness of the highest ranked predictions. The large improvement in AUPR suggests that the top ranked putative drug–target interactions found by our method are more likely to be correct than those found by previous methods.

A limitation of all machine learning methods for finding new drug–target interactions is that they are sensitive to inherent biases contained in the training data. It would be interesting to analyze the bias of existing datasets of drug–target interactions, but this is out of the scope of this article. Note also that the datasets by Yamanishi et al. (2008) used in this article do not include any singletons: each drug interacts with at least one target, and each target interacts with at least one drug. This property could affect the cross-validation results, by allowing a limited form of cheating. However, the experiments in Section 4.3 show that our method also works when tested in other ways.

A further limitation of the approach used in this article is that it can only be applied to detect new interactions for a target or a drug for which at least one interaction has already been established. Therefore, biologists can use the method as guidance for extending their knowledge about the interactions of a drug or of a target, not for discovering interactions of a new drug or target (that is, one for which no interaction is known). In particular, our method is useful for experimentalists to aid in experimental design and interpretation, especially in solving problems related to drug–target selectivity and polypharmacology (Merino et al., 2010; Metz and Hajduk, 2010).

There are several ways in which the results might be further improved. So far we have used uninformative choices of the parameters: γ̃ = 1, σ = 1 and α = 0.5. Of these choices, we have only investigated the last one. Perhaps with tuning of the other parameters better predictions are possible, although one has to be careful not to over-fit them to the data.

Another avenue for improvement is in using more information about drugs and targets. Since combining the GIP kernel with chemical and genomic kernels leads to better predictive performance, perhaps adding different information in the form of additional kernels would yield further improvements. These kernels could be interaction profile kernels based on other types of data, such as protein–protein interaction networks. Similarly, for each pair of interacting drug and target more information is known beyond the fact that they interact. For example, the type of interaction, the binding strength, the mechanism of discovery and its uncertainty might all be known. In this article, we have made no use of this additional information, nor did we attempt to predict the type or strength of interactions.
Funding: Netherlands Organization for Scientific Research (NWO) within NWO project (612.066.927, in part).

Conflict of Interest: none declared.

REFERENCES

Basilico,J. and Hofmann,T. (2004) Unifying collaborative and content-based filtering. In ICML '04: Proceedings of the 21st International Conference on Machine Learning. ACM, New York, NY, pp. 65–72.
Ben-Hur,A. and Noble,W.S. (2005) Kernel methods for predicting protein–protein interactions. Bioinformatics, 21 (Suppl. 1), i38–i46.
Bleakley,K. and Yamanishi,Y. (2009) Supervised prediction of drug–target interactions using bipartite local models. Bioinformatics, 25, 2397–2403.
Campillos,M. et al. (2008) Drug target identification using side-effect similarity. Science, 321, 263–266.
Cheng,A.C. et al. (2007) Structure-based maximal affinity model predicts small-molecule druggability. Nat. Biotechnol., 25, 71–75.
Davis,J. and Goadrich,M. (2006) The relationship between precision-recall and ROC curves. In ICML '06: Proceedings of the 23rd International Conference on Machine Learning. ACM, New York, NY, pp. 233–240.
Fawcett,T. (2006) An introduction to ROC analysis. Patt. Recognit. Lett., 27, 861–874.
Günther,S. et al. (2008) SuperTarget and Matador: resources for exploring drug–target relationships. Nucleic Acids Res., 36, D919–D922.
Haggarty,S.J. et al. (2003) Multidimensional chemical genetic analysis of diversity-oriented synthesis-derived deacetylase inhibitors using cell-based assays. Chem. Biol., 10, 383–396.
Hattori,M. et al. (2003) Development of a chemical structure comparison method for integrated analysis of chemical and genomic information in the metabolic pathways. J. Am. Chem. Soc., 125, 11853–11865.
Hopkins,A.L. and Groom,C.R. (2002) The druggable genome. Nat. Rev. Drug Discov., 1, 727–730.
Hue,M. and Vert,J.-P. (2010) On learning with kernels for unordered pairs. In Fürnkranz,J. and Joachims,T. (eds) ICML '10: Proceedings of the 27th International Conference on Machine Learning. Omnipress, Haifa, Israel, pp. 463–470.
Jacob,L. and Vert,J.-P. (2008) Protein–ligand interaction prediction: an improved chemogenomics approach. Bioinformatics, 24, 2149–2156.
Jaroch,S.E. and Weinmann,H. (eds) (2006) Chemical Genomics: Small Molecule Probes to Study Cellular Function. Ernst Schering Research Foundation Workshop. Springer, Berlin.
Kanehisa,M. et al. (2006) From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res., 34, D354–D357.
Kashima,H. et al. (2009a) On pairwise kernels: an efficient alternative and generalization analysis. In PAKDD '09: Proceedings of the 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, pp. 1030–1037.
Kashima,H. et al. (2009b) Recent advances and trends in large-scale kernel methods. IEICE Trans., 92-D, 1338–1353.
Klabunde,T. (2007) Chemogenomic approaches to drug discovery: similar receptors bind similar ligands. Br. J. Pharmacol., 152, 5–7.
Lü,L. and Zhou,T. (2011) Link prediction in complex networks: a survey. Phys. A Stat. Mech. Appl., 390, 1150–1170.
Martin,Y.C. et al. (2002) Do structurally similar molecules have similar biological activity? J. Med. Chem., 45, 4350–4358.
Merino,A. et al. (2010) Drug profiling: knowing where it hits. Drug Discov. Today, 15, 749–756.
Metz,J.T. and Hajduk,P.J. (2010) Rational approaches to targeted polypharmacology: creating and navigating protein–ligand interaction networks. Curr. Opin. Chem. Biol., 14, 498–504.
Okuno,Y. et al. (2007) GLIDA: GPCR ligand database for chemical genomics drug discovery database and tools update. Nucleic Acids Res., 36, D907–D912.
Overington,J. (2009) ChEMBL. An interview with John Overington, team leader, chemogenomics at the European Bioinformatics Institute Outstation of the European Molecular Biology Laboratory (EMBL-EBI). J. Comput. Aided Mol. Des., 23, 195–198.
Oyama,S. and Manning,C.D. (2004) Using feature conjunctions across examples for learning pairwise classifiers. In ECML '04: Proceedings of the 15th European Conference on Machine Learning, Vol. 3201. Springer, pp. 322–333.
Raghavan,V.V. et al. (1989) A critical investigation of recall and precision as measures of retrieval system performance. ACM Trans. Informat. Syst., 7, 205–229.
Raymond,R. and Kashima,H. (2010) Fast and scalable algorithms for semi-supervised link prediction on static and dynamic graphs. In Proceedings of the 2010 European Conference on Machine Learning and Knowledge Discovery in Databases: Part III, ECML PKDD'10. Springer, Berlin, Heidelberg, pp. 131–147.
Rifkin,R. and Klautau,A. (2004) In defense of one-vs-all classification. J. Mach. Learn. Res., 5, 101–141.
Schölkopf,B. et al. (eds) (2004) Kernel Methods in Computational Biology. MIT Press, Cambridge, MA.
Schomburg,I. et al. (2004) BRENDA, the enzyme database: updates and major new developments. Nucleic Acids Res., 32 (Suppl. 1), D431–D433.
Schuffenhauer,A. et al. (2003) Similarity metrics for ligands reflecting the similarity of the target proteins. J. Chem. Inf. Comput. Sci., 43, 391–405.
Smith,T.F. and Waterman,M.S. (1981) Identification of common molecular subsequences. J. Mol. Biol., 147, 195–197.
Wassermann,A.M. et al. (2009) Ligand prediction for orphan targets using support vector machines and various target–ligand kernels is dominated by nearest neighbor effects. J. Chem. Inf. Model., 49, 2155–2167.
Wishart,D.S. et al. (2008) DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res., 36, D901–D906.
Wu,G. et al. (2006) Incremental approximate matrix factorization for speeding up support vector machines. In KDD '06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, NY, pp. 760–766.
Xia,Z. et al. (2010) Semi-supervised drug–protein interaction prediction from heterogeneous biological spaces. BMC Syst. Biol., 4 (Suppl. 2), S6.
Yamanishi,Y. et al. (2008) Prediction of drug–target interaction networks from the integration of chemical and genomic spaces. Bioinformatics, 24, i232–i240.
Yamanishi,Y. et al. (2010) Drug–target interaction prediction from chemical, genomic and pharmacological data in an integrated framework. Bioinformatics, 26, i246–i254.
