Protein-ligand interaction prediction: an improved chemogenomics approach

Protein-ligand interaction prediction: an improved chemogenomics approach Vol. 24 no. 19 2008, pages 2149–2156 BIOINFORMATICS ORIGINAL PAPER doi:10.1093/bioinformatics/btn409 Genome analysis Protein–ligand interaction prediction: an improved chemogenomics approach 1,2,3,∗ 1,2,3 Laurent Jacob and Jean-Philippe Vert 1 2 Mines ParisTech, Centre for Computational Biology, 35 rue Saint Honoré, F-77305 Fontainebleau, Institut Curie and INSERM, U900, F-75248, Paris, France Received on April 4, 2008; revised on June 17, 2008; accepted on July 30, 2008 Advance Access publication August 1, 2008 Associate Editor: Alfonso Valencia ABSTRACT target, considering each target independently from other proteins. Usual methods are classified into ligand-based and structure- Motivation: Predicting interactions between small molecules and based or docking approaches. Ligand-based approaches compare proteins is a crucial step to decipher many biological processes, and a candidate ligand to the known ligands of the target to make plays a critical role in drug discovery. When no detailed 3D structure their prediction, typically using machine learning algorithms (Butina of the protein target is available, ligand-based virtual screening allows et al., 2002; Byvatov et al., 2003) whereas structure-based the construction of predictive models by learning to discriminate approaches use the 3D-structure of the target to determine how well known ligands from non-ligands. However, the accuracy of ligand- each candidate binds the target (Halperin et al., 2002). based models quickly degrades when the number of known ligands Ligand-based approaches require the knowledge of sufficient decreases, and in particular the approach is not applicable for orphan ligands of a given target with respect to the complexity of the receptors with no known ligand. ligand/non-ligand separation to produce accurate predictors. If few Results: We propose a systematic method to predict ligand–protein or no ligands are known for a target, one is compelled to use docking interactions, even for targets with no known 3D structure and few approaches, which in turn require the 3D structure of the target and or no known ligands. Following the recent chemogenomics trend, are very time consuming. If for a given target with unavailable 3D we adopt a cross-target view and attempt to screen the chemical structure no ligand is known, none of the classical approaches can space against whole families of proteins simultaneously. The lack be applied. This is the case for many GPCR as very few structures of known ligand for a given target can then be compensated by have been crystallized so far (Ballesteros and Palczewski, 2001) the availability of known ligands for similar targets. We test this and many of these receptors, referred to as orphan GPCR, have no strategy on three important classes of drug targets, namely enzymes, known ligand. G-protein-coupled receptors (GPCR) and ion channels, and report An interesting idea to overcome this issue is to stop considering dramatic improvements in prediction accuracy over classical ligand- each protein target independently from other proteins, and rather based virtual screening, in particular for targets with few or no known take the point of view of chemogenomics (Jaroch and Weinmann, ligands. 2006; Kubinyi et al., 2004). Roughly speaking, chemogenomics Availability: All data and algorithms are available as Supplementary aims at mining the entire chemical space, which corresponds to the Material. 
set of all small molecules, for interactions with the biological space, Contact: [email protected] i.e. the set of all proteins or at least protein families, in particular Supplementary information: Supplementary data are available at drug targets. A salient motivation of the chemogenomics approach Bioinformatics online. is the realization that some classes of molecules can bind ‘similar’ proteins, suggesting that the knowledge of some ligands for a target 1 INTRODUCTION can be helpful to determine ligands for similar targets. Besides, this type of method allows for a more rational approach to design Predicting interactions between small molecules and proteins is a drugs since controlling a whole ligand’s selectivity profile is crucial key element in the drug discovery process. In particular, several to make sure that no side effect occurs and that the compound is classes of proteins such as G-protein-coupled receptors (GPCR), compatible with therapeutical usage. enzymes and ion channels represent a large fraction of current drug Recent reviews (Jaroch and Weinmann, 2006; Klabunde, 2007; targets and important targets for new drug development (Hopkins Kubinyi et al., 2004; Rognan, 2007) describe several chemogenomic and Groom, 2002). Understanding and predicting the interactions approaches to predict interactions between compounds and targets. between small molecules and such proteins could therefore help in A first class of approaches, called ligand-based chemogenomics the discovery of new lead compounds. by Rognan (2007), pool together targets at the level of families Various approaches have already been developed and have proved (such as GPCR) or subfamilies (such as purinergic GPCR) and very useful to address this in silico prediction issue (Manly et al., learn a model for ligands at the level of the family (Balakin et al., 2001). The classical paradigm is to predict the modulators of a given 2002; Klabunde, 2006). Other approaches, termed target-based chemogenomic approaches by Rognan (2007), cluster receptors To whom correspondence should be addressed. © 2008 The Author(s) This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/ by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. L.Jacob and J.-P.Vert based on ligand binding site similarity and again pool together estimated based on its ability to correctly predict the classes of molecules in the training set. known ligands for each cluster to infer shared ligands (Frimurer The in silico chemogenomics problem is more general because data et al., 2005). Finally, a third strategy termed target-ligand approach involving interactions with different targets are available to train a model by Rognan (2007) attempts to predict ligands for a given target which must be able to predict interactions between any molecule and any by leveraging binding information for other targets in a single protein. In order to extend the previous machine learning approaches to this step, that is, without first attempting to define a particular set setting, we need to represent a pair (t,c) of target t and chemicals c by a of similar receptors. For example, Bock and Gough (2005) vector (t,c), then estimate a linear function f (t,c) =w (t,c) whose sign merge descriptors of ligands and targets to describe putative is used to predict whether or not c can bind to t. 
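A minimal Python sketch of the classical single-target paradigm described above, against which the joint formulation f(t,c) = w·Φ(t,c) is set: one independent linear classifier per target, trained only on that target's own known ligands and non-ligands. The fingerprints, target names and scikit-learn classifier are illustrative assumptions, not the paper's actual data or implementation.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy data: for each target, binary fingerprints of candidate molecules (rows)
# and labels (+1 known ligand, -1 non-ligand).
data_per_target = {
    "target_A": (rng.integers(0, 2, size=(40, 128)), np.repeat([1, -1], 20)),
    "target_B": (rng.integers(0, 2, size=(6, 128)), np.repeat([1, -1], 3)),
}

# One model per target: no information is shared, so a target with few known
# ligands (target_B here) gets a poorly estimated model -- the situation the
# cross-target formulation f(t, c) = <w, Phi(t, c)> is meant to address.
models = {}
for target, (X, y) in data_per_target.items():
    models[target] = SVC(kernel="linear").fit(X, y)

candidate = rng.integers(0, 2, size=(1, 128))
print({t: int(m.predict(candidate)[0]) for t, m in models.items()})
```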
As before the vector w can ligand–receptor complexes, and use machine learning methods to be estimated from the training set of interacting and non-interacting pairs, discriminate real complexes from ligand–receptor pairs that do not using any linear machine learning algorithm. form complexes. Erhan et al. (2006) show how the same idea can To summarize, we propose to cast the in silico chemogenomics problem as a learning problem in the ligand–target space thus making it suitable to any be casted in the framework of neural networks and support vector classical linear machine learning approach as soon as a vector representation machines (SVM). In particular, they show that a given set of receptor (t,c) is chosen for protein/ligand pairs. We propose in the next sections a descriptors can be combined with a given set of ligand descriptors in systematic way to design such a representation. a computationally efficient framework, offering in principle a large flexibility in the choice of the receptor and ligand descriptors. 2.2 Vector representation of target/ligand pairs In this article, we go one step further in this direction and investigate various kinds of receptor and ligand descriptors that can A large literature in chemoinformatics has been devoted to the problem be combined for in silico chemogenomics screening with SVM, of representing a molecule c by a vector  (c) ∈R , e.g. using various lig molecular descriptors (Todeschini and Consonni, 2002). These descriptors building on recent development in the field of kernel methods for encode several features related to the physicochemical and structural bio- and chemoinformatics. In particular, we propose a new kernel properties of the molecules, and are widely used to model interactions for receptors, based on a priori defined hierarchies of receptors. between the small molecules and a single target using linear models described We test the different methods for the prediction of ligands for in the previous section (Gasteiger and Engel, 2003). Similarly, much work three major classes of therapeutic targets, namely enzymes, GPCR in computational biology has been devoted to the construction of descriptors and ion channels. We show that the choice of representation for genes and proteins, in order to represent a given protein t by a vector has a strong influence on the accuracy of the model estimated, d (t) ∈R . The descriptors typically capture properties of the sequence or tar and in particular that the new hierarchy kernel systematically structure of the protein, and can be used to infer models to predict, e.g. the outperforms other descriptors used in multitask learning or involving structural or functional class of a protein. receptor sequences. We show that the chemogenomics approach For our in silico chemogenomics problem, we need to represent each pair (c,t) of small molecule and protein by a single vector (c,t). In order to is, particularly, relevant for targets with few known ligands. In capture interactions between features of the molecule and of the protein that particular we estimate that, for orphan receptors with no known may be useful predictors for the interaction between c and t, we propose to ligands, our method reaches a normalized accuracy of 86.2%, 77.6% consider features for the pair (c,t) obtained by multiplying a descriptor of c and 80.5% on the enzymes, GPCR and ion channels, respectively, with a descriptor of t. 
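The sketch below illustrates the two descriptor vectors such a pair representation is built from. The RDKit Morgan fingerprint is only a convenient stand-in for the paper's path-based 2D fingerprints (computed with ChemCPP), and the amino-acid 2-mer counts follow the 20 × 20 = 400-dimensional example mentioned in the text; the SMILES string and protein sequence are toy inputs.

```python
from itertools import product

import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIMERS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 features


def phi_lig(smiles: str, n_bits: int = 1024) -> np.ndarray:
    """Binary structural fingerprint of a small molecule (radius-2 Morgan bits)."""
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=n_bits)
    return np.array(list(fp), dtype=float)


def phi_tar(sequence: str) -> np.ndarray:
    """Counts of all amino-acid 2-mers in the protein sequence (d = 400)."""
    counts = dict.fromkeys(DIMERS, 0.0)
    for i in range(len(sequence) - 1):
        dimer = sequence[i:i + 2]
        if dimer in counts:
            counts[dimer] += 1.0
    return np.array([counts[d] for d in DIMERS])


print(phi_lig("CC(=O)Oc1ccccc1C(=O)O").shape)   # (1024,), aspirin as a toy molecule
print(phi_tar("MTEYKLVVVGAGGVGKSALTIQ").shape)  # (400,), toy sequence fragment
```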
Intuitively, if for example, the descriptors are binary well above the 50% accuracy of a random predictor that would be indicators of specific structural features in each small molecule and proteins, trained in a classical ligand-based virtual screening framework with then the product of two such features indicates that both the small molecule no training example. and the target carry specific features, which may be strongly correlated with the fact that they interact. More generally, if a molecule c is represented by a vector of descriptors  (c) ∈R and a target protein by a vector of lig descriptors  (t) ∈R , this suggests to represent the pair (c,t) by the set tar 2 METHOD of all possible products of features of c and t, i.e. by the tensor product: We formulate the typical in silico chemogenomics problem as the following learning problem: given a collection of n target/molecule pairs (c,t) =  (c) ⊗  (t). (1) lig tar (t ,c ), ...,(t ,c ) known to form complexes or not, estimate a function 1 1 n n Remember that the tensor product in (1) is a d ×d vector whose (i,j)-th c t f (t,c) that would predict whether any chemical c binds to any target t.In entry is exactly the product of the i-th entry of  (c)bythe j-th entry of lig this section, we propose a rigorous and general framework to solve this (t). This representation can be used to combine in an algorithmic way tar problems building on recent developments of kernel methods in bio- and any vector representation of small molecules with any vector representation chemoinformatics. This approach is similar to the approaches proposed in of proteins, for the purpose of in silico chemogenomics or any other task the context of MHC-I-peptide binding prediction (Jacob and Vert, 2008) and involving pairs of molecules/protein. A potential issue with this approach, in (Erhan et al., 2006). however, is that the size of the vector representation for a pair may be prohibitively large for practical computation and storage. For example, using a vector of molecular descriptors of size 1024 for molecules and 2.1 From single-target screening to chemogenomics representing a protein by the vector of counts of all 2mers of amino acids in Much effort in chemoinformatics has been devoted to the more restricted its sequence (d = 20 × 20 = 400) results in more than 400 k dimensions for problem of mining the chemical space for interaction with a single target the representation of a pair. In order to circumvent this issue we now show t, using a training set of molecules c ,...,c known to interact or not with 1 n how kernel methods such as SVM can efficiently work in such large spaces. the target. Machine learning approaches, such as artificial neural networks (ANN) or SVM, often provide competitive models for such problems. The 2.3 Kernels for target/ligand pairs simplest linear models start by representing each molecule c by a vector representation (c), before estimating a linear function f (c) =w (c) SVM is an algorithm to estimate linear binary classifiers from a training set whose sign (positive or negative) is used to predict whether or not the small of patterns with known class (Boser et al., 1992; Vapnik, 1998). A salient molecule c is a ligand of the target t. 
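A small numerical check of this construction, under toy random descriptors of the dimensions quoted above: the pair vector of Equation (1) is built explicitly once, and its inner product with another pair vector is shown to factorize into the product of a ligand inner product and a target inner product, the property exploited in the next section so that the ~400k-dimensional pair features never need to be materialized.

```python
import numpy as np

rng = np.random.default_rng(0)

d_lig, d_tar = 1024, 400                              # dimensions quoted in the text
phi_c, phi_c2 = rng.random(d_lig), rng.random(d_lig)  # descriptors of two molecules
phi_t, phi_t2 = rng.random(d_tar), rng.random(d_tar)  # descriptors of two targets

# Equation (1): Phi(c, t) = phi_lig(c) (x) phi_tar(t), a d_lig * d_tar = 409,600 vector
pair_a = np.kron(phi_c, phi_t)
pair_b = np.kron(phi_c2, phi_t2)

explicit = pair_a @ pair_b                          # inner product in the large pair space
factorized = (phi_c @ phi_c2) * (phi_t @ phi_t2)    # product of the two small inner products
print(pair_a.shape, np.allclose(explicit, factorized))  # (409600,) True
```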
The weight vector w is typically feature of SVM, often referred to as the kernel trick, is its ability to process 2150 Protein–ligand interaction prediction large- or even infinite-dimensional patterns as soon as the inner product kernel, a classical choice that usually gives state-of-the-art performances in between any two patterns can be efficiently computed. This property is molecule classification tasks. It is defined as: shared by a large number of popular linear algorithms, collectively referred K (c,c ) ligand to as kernel methods, including for example, algorithms for regression, clustering or outlier detection (Schölkopf and Smola, 2002; Shawe-Taylor (c)  (c ) lig lig = , (4) and Cristianini, 2004). (c)  (c) +  (c )  (c ) −  (c)  (c ) lig lig lig lig lig lig In order to apply kernel methods such as SVM for in silico where  (c) is a binary vector whose bits indicate the presence or absence of lig chemogenomics, we therefore need to show how to efficiently compute the all linear path of length l or less as subgraph of the 2D structure of c. We chose inner product between the vector representations of two molecule/protein l = 8 in our experiment, i.e. characterize the molecules by the occurrences pairs. Interestingly, a classical property of tensor products allows us to of linear subgraphs of length 8 or less, a value previously observed to give factorize the inner product between two tensor product vectors as follows: good results in several virtual screening task (Mahé et al., 2005). We used the freely and publicly available ChemCPP software to compute this kernel (c) ⊗  (t)  (c ) ⊗  (t ) lig tar lig tar in the experiments. =  (c)  (c ) ×  (t)  (t ). (2) lig lig tar tar 2.5 Kernels for targets This factorization dramatically reduces the burden of working with tensor SVM and kernel methods are also widely used in bioinformatics (Schölkopf products in large dimensions. For example, in our previous example where et al., 2004), and a variety of approaches have been proposed to design the dimensions of the small molecule and proteins are vectors of respective kernels between proteins, ranging from kernels based on the amino-acid dimensions 1024 and 400, the inner product in >400 k dimensions between sequence of a protein (Cuturi and Vert, 2005; Jaakkola et al., 2000; Kuang tensor products is simply obtained from (2) by computing two inner products, et al., 2005; Leslie et al., 2002, 2004; Tsuda et al., 2002; Vert et al., 2004) respectively in dimensions 1024 and 400, before taking their product. to kernels based on the 3D structures of proteins (Borgwardt et al., 2005; Even more interestingly, this reasoning extends to the case where inner Dobson and Doig, 2005; Qiu et al., 2007) or the pattern of occurrences of products between vector representations of small molecules and proteins proteins in multiple sequenced genomes (Vert, 2002). These kernels have can themselves be efficiently computed with the help of positive definite been used in conjunction with SVM or other kernel methods for various kernels (Vapnik, 1998), as explained in the next sections. Positive definite tasks related to structural or functional classification of proteins. 
While any of kernels are linked to inner products by a fundamental result (Aronszajn, these kernels can theoretically be used as a target kernel in (3), we investigate 1950): the kernel between two points is equivalent to an inner product in this article a restricted list of specific kernels described below, aimed at between the points mapped to a Hilbert space uniquely defined by the kernel. illustrating the flexibility of our framework and test various hypothesis. Now by denoting  The Dirac kernel between two targets t,t is: K (c,c ) =  (c)  (c ), ligand lig lig 1if t =t , K (t,t ) = (5) K (t,t ) =  (t)  (t ), Dirac target tar tar 0 otherwise. we obtain the inner product between tensor products by: This basic kernel simply represents different targets as orthonormal vectors. From (3) we see that orthogonality between two proteins t K (c,t),(c ,t ) =K (t,t ) ×K (c,c ). (3) target ligand and t implies orthogonality between all pairs (c,t) and (c ,t ) for any two small molecules c and c . This means that a linear classifier In summary, as soon as two kernels K and K corresponding to ligand target for pairs (c,t) with this kernel decomposes as a set of independent two implicit embeddings of the chemical and biological spaces in two Hilbert linear classifiers for interactions between molecules and each target spaces are chosen, we can solve the in silico chemogenomics problem with protein, which are trained without sharing any information of known an SVM (or any other relevant kernel method) using the product kernel ligands between different targets. In other words, using Dirac kernel (3) between pairs. The particular kernels K and K should ideally ligand target for proteins amounts to performing classical learning independently encode properties related to the ability of similar molecules to bind similar for each target, which is our baseline approach. targets or ligands, respectively. We review in the next two sections possible choices for such kernels.  The multitask kernel between two targets t,t is defined as: K (t,t ) = 1 +K (t,t ). multitask Dirac 2.4 Kernels for ligands This kernel, originally proposed in the context of multitask learning Recent years have witnessed impressive advances in the use of SVM in (Evgeniou et al., 2005), removes the orthogonality of different chemoinformatics (Ivanciuc, 2007). In particular, much work has focused proteins to allow sharing of information. As explained in Evgeniou on the development of kernels for small molecules for the purpose of single- et al. (2005), plugging K in (3) amounts to decomposing the multitask target virtual screening and prediction of pharmacokinetics and toxicity. linear function used to predict interactions as a sum of a linear function For example, simple inner products between vectors of classical molecular common to all targets and of a linear function specific to each target: descriptors have been widely investigated, including physicochemical f (c,t) =w (c,t) =w  (c) +w  (c). (6) lig lig general t properties of molecules or 2D and 3D fingerprints (Azencott et al., 2007; Todeschini and Consonni, 2002). Other kernels have been designed directly A consequence is that only data related to the target t are used to from the comparison of 2D and 3D structures of molecules, including kernels estimate the specific vector w , while all data are used to estimate the based on the detection of common substructures in the 2D structures of common vector w . 
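The sketch below assembles these ingredients for toy data: the Tanimoto kernel of Equation (4) on binary fingerprints, the Dirac and multitask target kernels, the product pair kernel of Equation (3), and an SVM trained on the resulting precomputed Gram matrix. Fingerprints, target identifiers and interaction labels are placeholders, and scikit-learn's SVC stands in for whichever SVM implementation the authors used.

```python
import numpy as np
from sklearn.svm import SVC


def k_tanimoto(x: np.ndarray, y: np.ndarray) -> float:
    """Tanimoto kernel between binary fingerprint vectors (Equation 4)."""
    xy = float(x @ y)
    return xy / (float(x @ x) + float(y @ y) - xy)


def k_dirac(t: str, u: str) -> float:
    """Equation (5): targets are orthogonal, no information sharing."""
    return 1.0 if t == u else 0.0


def k_multitask(t: str, u: str) -> float:
    """1 + Dirac: a component shared by all targets plus a target-specific one."""
    return 1.0 + k_dirac(t, u)


def pair_gram(pairs, k_ligand, k_target) -> np.ndarray:
    """Equation (3): K[(c,t),(c',t')] = K_target(t,t') * K_ligand(c,c')."""
    n = len(pairs)
    K = np.zeros((n, n))
    for i, (ci, ti) in enumerate(pairs):
        for j, (cj, tj) in enumerate(pairs):
            K[i, j] = k_target(ti, tj) * k_ligand(ci, cj)
    return K


rng = np.random.default_rng(0)
fps = [rng.integers(0, 2, 64).astype(float) for _ in range(8)]   # toy fingerprints
pairs = [(fps[i], f"target_{i % 2}") for i in range(8)]          # two toy targets
labels = np.array([1, -1] * 4)                                   # toy interaction labels

K_train = pair_gram(pairs, k_tanimoto, k_multitask)
clf = SVC(kernel="precomputed").fit(K_train, labels)

# Scoring a new (molecule, target) pair only requires its kernel values
# against the training pairs.
new_fp, new_target = rng.integers(0, 2, 64).astype(float), "target_0"
k_new = np.array([[k_multitask(new_target, t) * k_tanimoto(new_fp, c)
                   for c, t in pairs]])
print(clf.decision_function(k_new))
```

With this factorized form, switching the target kernel (Dirac, multitask, hierarchy or a sequence kernel) amounts to swapping a single function, which is the flexibility the experiments below rely on.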
In our framework this classifier is therefore general molecules seen as graphs (Borgwardt and Kriegel, 2005; Gärtner et al., 2003; the combination of a target-specific part accounting for target-specific Horváth et al., 2004; Kashima et al., 2003, 2004; Mahé and Vert, 2006; Mahé properties of the ligands and a global part accounting for general et al., 2005; Ralaivola et al., 2005) or on the encoding of various properties properties of the ligands across the targets. The latter term allows of the 3D structure of molecules (Azencott et al., 2007; Mahé et al., 2006). to share information during the learning process, while the former While any of these kernels could be used to model the similarities of small ensures that specificities of the ligands for each target are not lost. molecules and be plugged into (3), we restrict ourselves in our experiment to a particular kernel proposed by Ralaivola et al. (2005) called the Tanimoto Available at http://chemcpp.sourceforge.net. 2151 L.Jacob and J.-P.Vert  While the multitask kernel provides a basic framework to share in the corresponding hierarchy plus one, that is, information across proteins, it does not allow to weigh differently K (t,t ) = (t), (t ), hierarchy h h how known interactions with a protein t should contribute to predict where  (t) contains as many features as there are nodes in the interactions with a target t . Empirical observations underlying hierarchy, each being set to 1 if the corresponding node is part of chemogenomics, on the other hand, suggest that molecules binding t’s hierarchy and 0 otherwise, plus one feature constantly set to a ligand t are only likely to bind ligand t similar to t in terms of one that accounts for the ‘plus one’ term of the kernel. One might structure or evolutionary history. In terms of kernels this suggest not expect the EC classification to be a good similarity measure in to plug into (3) a kernel for proteins that quantifies this notion of terms of binding, since it does not closely reflect evolutionary or similarity between proteins, which can, for example, be detected by mechanistic similarities except for the case of identical subclasses comparing the sequences of proteins. In order to test this approach, with different serial numbers. However, using the full hierarchy gave we therefore tested two commonly used kernels between protein a better accuracy in our experiments. Even if the hierarchy itself is not sequences: the mismatch kernel (Leslie et al., 2004), which compares fully relevant in this case, the improvement can be explained, on the proteins in terms of common short sequences of amino acids up one hand, by the multitask effect, i.e. by the fact that we use the data to some mismatches, and the local alignment kernel (Vert et al., from the target and the data from other targets with a smaller weight, 2004) which measures the similarity between proteins as an alignment and on the other hand, by the fact that we give more weight to the score between their primary sequences. In our experiments involving enzymes with the same serial number than to the other enzymes. the mismatch kernel, we use the classical choice of 3-mers with a maximum of one mismatch, and for the datasets where some sequences were not available in the database, we added K (t,t ) Dirac 3 DATA to the kernel (and normalized to one on the diagonal) in order to keep We extracted compound interaction data from the KEGG BRITE it valid. 
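A naive sketch of the mismatch feature map with the choices quoted above (3-mers, at most one mismatch): each 3-mer occurrence contributes to every 3-mer within Hamming distance one, and the kernel is the inner product of the resulting count vectors. This is only a quadratic-time stand-in for the efficient implementation of Leslie et al. (2004), and it omits the diagonal normalization and the Dirac correction mentioned in the text; the sequences are toy fragments.

```python
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def neighbors(kmer: str) -> set:
    """All k-mers within Hamming distance one of `kmer`."""
    out = {kmer}
    for i, original in enumerate(kmer):
        for a in ALPHABET:
            if a != original:
                out.add(kmer[:i] + a + kmer[i + 1:])
    return out


def mismatch_features(seq: str, k: int = 3) -> Counter:
    """Count, for every 3-mer, how many 3-mers of `seq` match it up to one mismatch."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        counts.update(neighbors(seq[i:i + k]))
    return counts


def k_mismatch(seq1: str, seq2: str, k: int = 3) -> float:
    f1, f2 = mismatch_features(seq1, k), mismatch_features(seq2, k)
    return float(sum(v * f2[key] for key, v in f1.items()))


print(k_mismatch("MTEYKLVVVG", "MTEYRLVVVG"))   # similar toy sequences
print(k_mismatch("MTEYKLVVVG", "GGGGGGGGGG"))   # dissimilar toy sequences
```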
Database (Kanehisa et al., 2002, 2004) concerning enzyme, GPCR and ion channel, three target classes particularly relevant for novel  Alternatively, we propose a new kernel aimed at encoding the drug development. similarity of proteins with respect to the ligands they bind. For each family, the database provides a list of known compounds Indeed, for most major classes of drug targets such as the ones investigated in this study (GPCR, enzymes and ion channels), proteins for each target. Depending on the target families, various categories have been organized into hierarchies that typically describe the of compounds are defined to indicate the type of interaction between precise functions of the proteins within each family. Enzymes are each target and each compound. These are, for example, inhibitor, labeled with Enzyme Commission numbers (EC numbers) defined cofactor and effector for enzyme ligands, antagonist or (full/partial) in International Union of Biochemistry and Molecular Biology agonist for GPCR and pore blocker, (positive/negative) allosteric (1992), that classify the chemical reaction they catalyze, forming a modulator, agonist or antagonist for ion channels. The list is not four-level hierarchy encoded into four numbers. For example, EC 1 exhaustive for the latter since numerous categories exist. Although includes oxidoreductases, EC 1.2 includes oxidoreductases that act on different types of interactions on a given target might correspond the aldehyde or oxo group of donors, EC 1.2.2 is a subclass of EC 1.2 to different binding sites, it is theoretically possible for a non- with NAD+ or NADP+ as acceptor and EC 1.2.2.1 is a subgroup of enzymes catalyzing the oxidation of formate to bicarbonate. These linear classifier like SVM with non-linear kernels to learn classes number define a natural and very informative hierarchy on enzymes: consisting of several disconnected sets. Therefore, for the sake of one can expect that enzymes that are closer in the hierarchy will clarity of our analysis, we do not differentiate between the categories tend to have more similar ligands. Similarly, GPCRs are grouped into of compounds. four classes based on sequence homology and functional similarity: For each target class, we retained only one protein by element of the rhodopsin family (class A), the secretin family (class B), the the hierarchy. In particular, we did not take into account the different metabotropic family (class C) and a last class regrouping more orthologs of the targets, and the different enzymes corresponding diverse receptors (class D). The KEGG database (Kanehisa et al., to the same EC number. We then eliminated all compounds for 2002) subdivides the large rhodopsin family in three subgroups which no molecular descriptor was available (principally peptide (amine receptors, peptide receptors and other receptors) and adds compounds), and all the targets for which no compound was known. a second level of classification based on the type of ligands or known subdivisions. For example, the rhodopsin family with For each target, we generated as many negative ligand–target pairs amine receptors is subdivided into cholinergic receptors, adrenergic as we had known ligands forming positive pairs by combining receptors, etc. This also defines a natural hierarchy that we could the target with a ligand randomly chosen among the other targets’ use to compare GPCRs. 
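A minimal sketch of the hierarchy kernel built from such classifications, illustrated here with EC numbers: each target is mapped to the binary indicator vector of its ancestors in the hierarchy (plus one constant feature), so the kernel value is the number of common ancestors plus one. The EC numbers below are examples of the encoding, not specific entries from the dataset.

```python
def hierarchy_nodes(ec_number: str) -> set:
    """EC '1.2.2.1' -> its ancestor nodes {'1', '1.2', '1.2.2', '1.2.2.1'}."""
    parts = ec_number.split(".")
    return {".".join(parts[:i]) for i in range(1, len(parts) + 1)}


def k_hierarchy(t: str, u: str) -> float:
    """Number of common ancestors in the hierarchy, plus one constant feature."""
    return float(len(hierarchy_nodes(t) & hierarchy_nodes(u))) + 1.0


print(k_hierarchy("1.2.2.1", "1.2.2.2"))  # 4.0: same class, subclass and sub-subclass
print(k_hierarchy("1.2.2.1", "2.7.1.1"))  # 1.0: only the shared constant feature
print(k_hierarchy("1.2.2.1", "1.2.2.1"))  # 5.0: identical EC numbers
```

The same encoding applies to the KEGG GPCR and ion-channel hierarchies once each target is mapped to its path of classes and subclasses.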
Finally, KEGG also provides a classification ligands (excluding those that were known to interact with the given of ion channels. Classification of ion channels is a less simple task target). This protocol generates false negative data since some since some of them can be classified according to different criteria ligands could actually interact with the target although they have like voltage dependence or ligand gating. The classification proposed not been experimentally tested, and our method could benefit from by KEGG includes Cys-loop superfamily, glutamate-gated cation experimentally confirmed negative pairs. channels, epithelial and related Na channels, voltage-gated cation This resulted in 2436 data points for enzymes (1218 channels, related to voltage-gated cation channels, related to inward known enzyme–ligand pairs and 1218 generated negative points) rectifier K channels, chloride channels and related to ATPase-linked transporters and each of these classes is further subdivided according, representing interactions between 675 enzymes and 524 compounds, for example to the type of ligands (e.g. glutamate receptor) or to the 798 training data points for GPCRs representing interactions type of ion passing through the channel (e.g. Na channel). Here between 100 receptors and 219 compounds and 2330 ion channel again, this hierarchy can be used to define a meaningful similarity in data points representing interactions between 114 channels and 462 terms of interaction behavior. compounds. Besides, Figure 1 shows the distribution of the number of known ligands per target for each dataset and illustrates the fact For each of the three target families, we define the hierarchy kernel between two targets of the family as the number of common ancestors that for most of them, few compounds are known. 2152 Protein–ligand interaction prediction Fig. 1. Distribution of the number of training points for a target for the enzymes, GPCR and ion channel datasets. Each bar indicates the proportion of targets in the family for which a given (x-axis) number of data points is available. Table 1. AUC for the first protocol on each dataset with various target kernels For each target t in each family, we carried out two experiments. First, all data points corresponding to other targets in the family K \ Target Enzymes GPCR Channels were used for training only and the n points corresponding to t tar were k-folded with k = min(n ,10). That is, for each fold, an SVM Dirac 0.646 ± 0.009 0.750 ± 0.023 0.770 ± 0.020 classifier was trained on all points involving other targets of the Multitask 0.931 ± 0.006 0.749 ± 0.022 0.873 ± 0.015 family plus a fraction of the points involving t, then the performances Hierarchy 0.955 ± 0.005 0.926 ± 0.015 0.925 ± 0.012 of the classifier were tested on the remaining fraction of data points Mismatch 0.725 ± 0.009 0.805 ± 0.023 0.875 ± 0.015 for t. This protocol is intended to assess the incidence of using Local alignment 0.676 ± 0.009 0.824 ± 0.021 0.901 ± 0.013 ligands from other targets on the accuracy of the learned classifier for a given target. Second, for each target t we trained an SVM classifier using only interactions that did not involve t and tested on the points that involved t. This is intended to simulate the behavior of our framework when making predictions for orphan targets, i.e. for targets for which no ligand is known. For both experiments, we used the area under the ROC curve (AUC) as a performance measure. 
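A sketch of the negative-pair generation described above, under a toy interaction dictionary standing in for the KEGG BRITE data: each target receives as many negative pairs as it has known ligands, sampled from ligands of other targets that are not known to bind it (so, as the text acknowledges, some sampled negatives may in reality be untested binders).

```python
import random

known_ligands = {                 # target -> set of known ligand identifiers (toy data)
    "enzyme_1": {"c1", "c2", "c3"},
    "enzyme_2": {"c3", "c4"},
    "enzyme_3": {"c5"},
}


def generate_negatives(known: dict, seed: int = 0) -> list:
    """One random negative (ligand, target) pair per known positive pair."""
    rng = random.Random(seed)
    all_ligands = set().union(*known.values())
    negatives = []
    for target, ligands in known.items():
        candidates = sorted(all_ligands - ligands)   # other targets' ligands, not known binders
        for _ in range(len(ligands)):                # as many negatives as positives
            negatives.append((rng.choice(candidates), target))
    return negatives


positives = [(c, t) for t, ligs in known_ligands.items() for c in sorted(ligs)]
negatives = generate_negatives(known_ligands)
print(len(positives), len(negatives))   # balanced: 6 positives, 6 negatives
```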
The ROC curve was computed for each target using the test points pooled from all the folds. For the first protocol, since training an SVM with only one training point does not really make sense and can lead to ‘anti-learning’ less than 0.5 performances, we set all results r involving the Dirac target Fig. 2. Target kernel Gram matrices (K ) for ion channels with multitask, tar hierarchy and local alignment kernels. kernel on targets with only one known ligand to max(r,0.5). This is to avoid any artifactual penalization of the Dirac approach and make sure we measure the actual improvement brought by sharing sharing information among known ligands of different targets, on information across targets. the one hand, and the relevance of incorporating prior information into the kernels, on the other hand. On the GPCR dataset though, the multitask kernel performs 4 RESULTS slightly worse than the Dirac kernel, probably because some targets We first discuss the results obtained on the three datasets for the in different subclasses show very different binding behavior, which first experiment, assessing how using training points from other results in adding more noise than information when sharing naively targets of the family improves prediction accuracy with respect to with this kernel. However, a more careful handling of the similarities individual (Dirac-based) learning. Table 1 shows the mean AUC between GPCRs through the hierarchy kernel results in significant across the family targets for an SVM with a product kernel using improvement over the Dirac kernel (from 75% to 92.6%), again the Tanimoto kernel for ligands and various kernels for proteins. demonstrating the relevance of the approach. For the enzymes and ion channels datasets, we observe significant Sequence-based target kernels do not achieve the same improvements when the multitask kernel is used in place of the Dirac performance as the hierarchy kernel, although they perform kernel, on the one hand, and when the hierarchy kernel replaces relatively well for the ion channel dataset, and give better results than the multitask kernel, on the other hand. For example, the Dirac the multitask kernel for both GPCR and ion channel datasets. In the kernel only performs at an average AUC of 77% for the ion channel case of enzymes, it can be explained by the diversity of the proteins dataset, while the multitask kernel increases the AUC to 87.3% in the family and for the GPCR, by the well-known fact that the and the hierarchy kernel brings it to 92.5%. For the enzymes, a receptors do not share overall sequence homology (Gether, 2000). global improvement of 30.9% is observed between the Dirac and Figure 2 shows three of the tested target kernels for the ion channel the hierarchy approaches. This clearly demonstrates the benefits of dataset. The hierarchy kernel adds some structure information with 2153 L.Jacob and J.-P.Vert Fig. 3. Relative improvement of the hierarchy kernel against the Dirac kernel as a function of the number of known ligands for enzymes, GPCR and ion channel datasets. Each point indicates the mean performance ratio between individual and hierarchy approaches across the targets of the family for which a given (x-axis) number of training points was available. Table 2. AUC for the second protocol on each dataset with various target respect to the multitask kernel, which explains the increase in AUC. kernels The local alignment sequence-based kernels fail to precisely rebuild this structure but retain some substructures. 
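A short sketch of the per-target evaluation just described, with toy labels and scores: test predictions pooled over all folds of a target are summarized by a ROC AUC, and for the Dirac baseline the score of targets with a single known ligand is clipped at 0.5, as the authors do to avoid the 'anti-learning' artifact. The scikit-learn AUC routine is an assumption about tooling, not the paper's code.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def target_auc(pooled_labels, pooled_scores, n_known_ligands, dirac_baseline=False):
    """AUC over the pooled test folds of one target, with the max(r, 0.5) rule."""
    auc = roc_auc_score(pooled_labels, pooled_scores)
    if dirac_baseline and n_known_ligands == 1:
        auc = max(auc, 0.5)
    return auc


labels = np.array([1, 1, 0, 0, 1, 0])              # pooled over all folds of one target
scores = np.array([0.9, 0.2, 0.4, 0.1, 0.7, 0.3])  # toy decision values
print(round(target_auc(labels, scores, n_known_ligands=3), 3))
```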
In the cases of GPCR K \ Target Enzymes GPCR Channels tar and enzymes, almost no structure is found by the sequence kernels, which, as alluded to above, was expected and suggests that more Dirac 0.500 ± 0.000 0.500 ± 0.000 0.500 ± 0.000 subtle comparison of the sequences would be required to exploit the Multitask 0.902 ± 0.008 0.576 ± 0.026 0.704 ± 0.026 information they contain. Hierarchy 0.938 ± 0.006 0.875 ± 0.020 0.853 ± 0.019 Figure 3 illustrates the influence of the number of training points Mismatch 0.602 ± 0.008 0.703 ± 0.027 0.729 ± 0.024 for a target on the improvement brought by using information from Local alignment 0.535 ± 0.005 0.751 ± 0.025 0.772 ± 0.023 similar targets. As one could expect, the improvement is very strong when few ligands are known and decreases when enough training points become available. After a certain point (around 30 training ligands of close targets in the hierarchy. In particular, it will predict points), using similar targets can even impair the performances. This that the ligands of the target’s direct neighbors are ligands of suggests that the method could be globally improved by learning for the target (which is an intuitive and natural way to choose new each target independently how much information should be shared, candidate binders). A major difference, however, is that a candidate for example, through kernel learning approaches (Lanckriet et al., molecule which is very similar to ligands of a close target, but 2004). not a ligand itself, will not be be predicted to be a ligand by the The second experiment aims at pushing this remark to its limit by annotation transfer approach. In particular, if the candidate molecule assessing how each strategy is able to predict ligands for proteins is not present anywhere else in the ligand database, it will never with no known ligand. Table 2 shows the results in that case. As be predicted to be a ligand. Exemples can be found in each of expected, the classifiers using Dirac kernels show random behavior the considered target classes. The 4-aminopyridine is a blocker of in this case since using a Dirac kernel with no data for the target the ion channel KCJN5, a potassium inwardly rectifying channel. amounts to learning with no training data at all. In particular, in the Although this molecule is a known blocker of other channels (in SVM implementation that we used, the classifier learned with no particular, many potassium channels), it is not a known ligand of data from the task gave constant scores to all the test points, hence the any other channel of KCJN5’s superfamily. However, the most 0.500 ± 0.000 AUC on the test data. On the other hand, we note that similar molecule in the database, in the sense of the Tanimoto it is still possible to obtain reasonable results using adequate target kernel, is the Pinacidil, which happens to be a known ligand of kernels. In particular, the hierarchy kernel loses only 7.2% of AUC two direct neighbors of KCJN5. This allows our method to predict for the ion channel dataset, 5.1% for the GPCR dataset and 1.7% for 4-aminopyridine as a ligand for this target. Similarly, N -acetyl- the enzymes compared to the first experiment where known ligands d-glucosamine 1,6-bisphosphate is the only known effector of were used, suggesting that if a target with no known compound is phosphoacetylglucosamine mutase, an enzyme of the isomerase placed in the hierarchy, e.g. in the case of GPCR homology detection family. 
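A sketch of the orphan-target protocol of this second experiment, on toy data: every pair involving a chosen target is held out, the pair-kernel SVM is trained on the remaining pairs, and the held-out pairs are scored using only information propagated through the target kernel. The random Gram matrix stands in for a product pair kernel such as the one sketched earlier.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
pairs = [(f"compound_{i}", f"target_{i % 3}") for i in range(12)]   # toy pair list
labels = np.array([1, -1] * 6)                                      # toy interaction labels
X = rng.random((12, 5))
K_full = X @ X.T        # positive semidefinite stand-in for the pair kernel of Equation (3)


def orphan_scores(K: np.ndarray, y: np.ndarray, pair_list, held_out_target: str):
    """Train on every pair not involving the target, score the held-out pairs."""
    tr = [i for i, (_, t) in enumerate(pair_list) if t != held_out_target]
    te = [i for i, (_, t) in enumerate(pair_list) if t == held_out_target]
    clf = SVC(kernel="precomputed").fit(K[np.ix_(tr, tr)], y[tr])
    return te, clf.decision_function(K[np.ix_(te, tr)])


held_out_idx, scores = orphan_scores(K_full, labels, pairs, "target_0")
print(held_out_idx, scores.round(2))
```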
This molecule is not a known ligand of any other enzyme in with known members of the family using specific GPCR alignment the database, so a direct annotation transfer approach would never algorithms (Kratochwil et al., 2005) or fingerprint analysis (Attwood predict it as a ligand. Our method, on the other hand, predicts it et al., 2003), it is possible to predict some of its ligands almost as correctly, taking advantage of the fact that very similar molecules accurately as if some of them were already available. like D-ribose 1,5-bisphosphate or α-d-glucose 1,6-bisphosphate are In this second setting, our approach when using the hierarchy known ligands of direct neighbors. The same observation can be kernel on the targets is closely related to annotation transfer. Indeed, made for several GPCRs, including the prostaglandin F receptor the learned predictor in this case will predict a molecule to be a whose three known ligands are not ligands of any other GPCR but ligand of a given target if the molecule is similar to the known whose direct neighbors have similar ligands. 2154 Protein–ligand interaction prediction 5 DISCUSSION REFERENCES We propose a general method to combine the chemical and the Aronszajn,N. (1950) Theory of reproducing kernels. Trans. Am. Math. Soc., 68, 337–404. biological space in an algorithmic way and predict interaction Attwood,T.K. et al. (2003) Prints and its automatic supplement, preprints. Nucleic Acids between any small molecule and any target, which makes it a very Res., 31, 400–402. valuable tool for drug discovery. The method allows one to represent Azencott,C.-A. et al. (2007) One- to four-dimensional kernels for virtual screening and systematically a ligand–target pair, including information on the the prediction of physical, chemical, and biological properties. J. Chem. Inf. Model, interaction between the ligand and the target. Prediction is then 47, 965–974. Balakin,K.V. et al. (2002) Property-based design of GPCR-targeted library. J. Chem. performed by any machine learning algorithm (an SVM in our case) Inf. Comput. Sci., 42, 1332–1342. in the joint space, which makes targets with few known ligands Ballesteros,J. and Palczewski,K. (2001) G protein-coupled receptor drug discovery: benefit from the data points of similar targets, and which allows implications from the crystal structure of rhodopsin. Curr. Opin. Drug Discov. one to make predictions for targets with no known ligand. Our Devel., 4, 561–574. information-sharing process is therefore simply based on a choice Bock,J.R. and Gough,D.A. (2005) Virtual screen for ligands of orphan g protein-coupled receptors. J. Chem. Inf. Model, 45, 1402–1414. of description for the ligands, another one for the targets and on Borgwardt,K. et al. (2005) Protein function prediction via graph kernels. Bioinformatics, classical machine learning methods. Everything is done by casting 21(Suppl. 1), i47–i56. the problem in a joint space and no explicit procedure to select which Borgwardt,K.M. and Kriegel,H.-P. (2005) Shortest-path kernels on graphs. In part of the information is shared is needed. Since it subdivides the Proceedings of the Fifth International Conference on Data Mining. IEEE Computer representation problem into two subproblems, our approach makes Society, Washington, DC, USA, pp. 74–81. Boser,B.E. et al. (1992) A training algorithm for optimal margin classifiers. 
In use of previous work on kernels for molecular graphs and kernels for Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory. biological targets. For the same reason, it will automatically benefit ACM Press, New York, USA, pp. 144–152. from future improvements in both fields. This leaves plenty of room Butina,D. et al. (2002) Predicting ADME properties in silico: methods and models. to increase the performance. Drug Discov. Today, 7(Suppl. 11), S83–S88. Results on experimental ligand datasets show that using target Byvatov,E. et al. (2003) Comparison of support vector machine and artificial neural network systems for drug/nondrug classification. J. Chem. Inf. Comput. Sci., 43, kernels allowing to share information across the targets considerably 1882–1889. improve the prediction, especially in the case of targets with Cuturi,M. and Vert,J.-P. (2005) The context-tree kernel for strings. Neural Netw., 18, few known ligands. The improvement is particularly strong 1111–1123. when the target kernel uses prior information on the structure Dobson,P. and Doig,A. (2005) Predicting enzyme class from protein structure without between the targets, e.g. a hierarchy defined on a target class. alignments. J. Mol. Biol., 345, 187–199. Erhan,D. et al. (2006) Collaborative filtering on a family of biological targets. J. Chem. Although the usage of a kernel based on the hierarchy is restricted Inf. Model, 46, 626–635. to protein families where hierarchical classification schemes exist, Evgeniou,T. et al. (2005) Learning multiple tasks with kernel methods. J. Mach. Learn. it applies to the three main classes of proteins targeted by drugs, and Res., 6, 615–637. others like cytochromes and abc transporters. Sequence kernels, on Frimurer,T.M. et al. (2005) A physicogenetic method to assign ligand-binding the other hand, did not give very good results in our experiments. relationships between 7tm receptors. Bioorg. Med. Chem. Lett., 15, 3707–3712. Gärtner,T. et al. (2003) On graph kernels: hardness results and efficient alternatives. However, we believe using the target sequence information could be In Schölkopf,B. and Warmuth,M. (eds) Proceedings of the Sixteenth Annual an interesting alternative or complement to the hierarchy kernel. For Conference on Computational Learning Theory and the Seventh Annual Workshop example, Jacob et al. (2008) used a kernel based on the sequence on Kernel Machines. Vol. 2777 of Lecture Notes in Computer Science. Springer, of the GPCR that performed as well as the kernel based on the Heidelberg, pp. 129–143. GPCR hierarchy. Further improvement could come from the use of Gasteiger,J. and Engel,T. (eds) (2003) Chemoinformatics : a Textbook. Wiley. Gether,U. (2000) Uncovering molecular mechanisms involved in activation of g protein- kernel for structures in the cases where 3D structure information is coupled receptors. Endocr. Rev., 21, 90–113. available (e.g. for the enzymes, but not for the GPCR). Our method Halperin,I. et al. (2002) Principles of docking: an overview of search algorithms and a also shows good performances even when no ligand is known at all guide to scoring functions. Proteins, 47, 409–443. for a given target, which is excellent news since classical ligand- Hopkins,A.L. and Groom,C.R. (2002) The druggable genome. Nat. Rev. Drug Discov., based approaches fail to predict ligand for these targets on the one 1, 727–730. Horváth,T. et al. (2004) Cyclic pattern kernels for predictive graph mining. 
In hand, and docking approaches are computationally expensive and Proceedings of the tenth ACM SIGKDD international conference on Knowledge not feasible when the target 3D structure is unknown, which is the discovery and data mining. ACM Press, New York, NY, pp. 158–167. case of GPCR on the other hand. International Union of Biochemistry and Molecular Biology (1992) Enzyme In future work, it could be interesting to apply this framework to Nomenclature 1992. Academic Press, California, USA. quantitative prediction of binding affinity using regression methods Ivanciuc,O. (2007) Applications of support vector machines in chemistry. In Lipkowitz,K.B. and Cundari,T.R. (eds) Reviews in Computational Chemistry. in the joint space. It would also be important to confirm predicted Vol. 23. Wiley-VCH, Weiheim, pp. 291–400. ligands experimentally or at least by docking approaches when the Jaakkola,T. et al. (2000) A discriminative framework for detecting remote protein target 3D structure is available. homologies. J. Comput. Biol., 7, 95–114. Jacob,L. and Vert,J.-P. (2008) Efficient peptide-MHC-I binding prediction for alleles with few known binders. Bioinformatics, 24, 358–366. Jacob,L. et al. (2008) Virtual screening of GPCRs: an in silico chemogenomics approach. BMC Bioinformatics (in press). ACKNOWLEDGEMENTS Jaroch,S.E. and Weinmann,H. (eds) (2006) Chemical Genomics: Small Molecule We thank Pierre Mahé for his help with ChemCPP and kernels for Probes to Study Cellular Function. Ernst Schering Research Foundation Workshop. molecules, and Véronique Stoven for insightful discussions on the Springer, Berlin. Kanehisa,M. et al. (2002) The KEGG databases at GenomeNet. Nucleic Acids Res., 30, biological aspects of the problem. 42–46. Conflict of Interest: none declared. 2155 L.Jacob and J.-P.Vert Kanehisa,M. et al. (2004) The KEGG resource for deciphering the genome. Nucleic Mahé,P. et al. (2005) Graph kernels for molecular structure-activity relationship analysis Acids Res., 32(Database issue), D277–D280. with support vector machines. J. Chem. Inf. Model, 45, 939–951. Kashima,H. et al. (2003) Marginalized kernels between labeled graphs. In Faucett,T. Mahé,P. et al. (2006) The pharmacophore kernel for virtual screening with support and Mishra,N. (eds), Proceedings of the Twentieth International Conference on vector machines. J. Chem. Inf. Model, 46, 2003–2014. Machine Learning, AAAI Press, pp. 321–328. Manly,C. et al. (2001) The impact of informatics and computational chemistry on Kashima,H. et al. (2004) Kernels for graphs. In Schölkopf,B. et al. (eds) Kernel Methods synthesis and screening. Drug Discov. Today, 6, 1101–1110. in Computational Biology. MIT Press, pp. 155–170. Qiu,J. et al. (2007) A structural alignment kernel for protein structures. Bioinformatics, Klabunde,T. (2006) Chemogenomics approaches to ligand design. In Ligand Design 23, 1090–1098. for G Protein-coupled Receptors. Ch. 7, Wiley-VCH, Great Britain, pp. 115–135. Ralaivola,L. et al. (2005) Graph kernels for chemical informatics. Neural Netw., 18, Klabunde,T. (2007) Chemogenomic approaches to drug discovery: similar receptors 1093–1110. bind similar ligands. Br. J. Pharmacol., 152, 5–7. Rognan,D. (2007) Chemogenomic approaches to rational drug design. Br. J. Kratochwil,N.A. et al. (2005) An automated system for the analysis of g protein- Pharmacol., 152, 38–52. coupled receptor transmembrane binding pockets: alignment, receptor-based Schölkopf,B. and Smola,A.J. 
(2002) Learning with Kernels: Support Vector Machines, pharmacophores, and their application. J. Chem. Inf. Model, 45, 1324–1336. Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA. Kuang,R. et al. (2005) Profile-based string kernels for remote homology detection and Schölkopf,B. et al. (2004) Kernel Methods in Computational Biology. MIT Press, motif extraction. J. Bioinform. Comput. Biol., 3, 527–550. Cambridge, Massachussetts. Kubinyi,H. et al. (eds) (2004) Chemo-Genomics in Drug Discovery: A Medicinal Shawe-Taylor,J. and Cristianini,N. (2004) Kernel Methods for Pattern Analysis. Chemistry Perspective. Methods and Principles in Medicinal Chemistry. Cambridge University Press, New York, USA. Wiley-VCH, New York. Todeschini,R. and Consonni,V. (2002) Handbook of Molecular Descriptors. Lanckriet,G.R.G. et al. (2004) A statistical framework for genomic data fusion. Wiley-VCH, New York, USA. Bioinformatics, 20, 2626–2635. Tsuda,K. et al. (2002) Marginalized kernels for biological sequences. Bioinformatics, Leslie,C. et al. (2002) The spectrum kernel: a string kernel for SVM protein 18, S268–S275. classification. In Altman,R.B. et al. (eds) Proceedings of the Pacific Symposium Vapnik,V.N. (1998) Statistical Learning Theory. Wiley, New York. on Biocomputing 2002. World Scientific, Singapore, pp. 564–575. Vert,J.-P. (2002) A tree kernel to analyze phylogenetic profiles. Bioinformatics, 18, Leslie,C.S. et al. (2004) Mismatch string kernels for discriminative protein S276–S284. classification. Bioinformatics, 20, 467–476. Vert,J.-P. et al. (2004) Local alignment kernels for biological sequences. In Schölkopf,B. Mahé,P. and Vert,J.-P. (2006) Graph kernels based on tree patterns for molecules. et al. (eds) Kernel Methods in Computational Biology. MIT Press, Cambridge, Technical Report ccsd-00095488, HAL. Massachussetts, pp. 131–154. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Bioinformatics Pubmed Central

Protein-ligand interaction prediction: an improved chemogenomics approach

Bioinformatics , Volume 24 (19) – Aug 1, 2008

Loading next page...
 
/lp/pubmed-central/protein-ligand-interaction-prediction-an-improved-chemogenomics-cGF8sYy6vk

References (73)

Publisher
Pubmed Central
Copyright
© 2008 The Author(s)
ISSN
1367-4803
eISSN
1460-2059
DOI
10.1093/bioinformatics/btn409
Publisher site
See Article on Publisher Site

Abstract

Vol. 24 no. 19 2008, pages 2149–2156 BIOINFORMATICS ORIGINAL PAPER doi:10.1093/bioinformatics/btn409 Genome analysis Protein–ligand interaction prediction: an improved chemogenomics approach 1,2,3,∗ 1,2,3 Laurent Jacob and Jean-Philippe Vert 1 2 Mines ParisTech, Centre for Computational Biology, 35 rue Saint Honoré, F-77305 Fontainebleau, Institut Curie and INSERM, U900, F-75248, Paris, France Received on April 4, 2008; revised on June 17, 2008; accepted on July 30, 2008 Advance Access publication August 1, 2008 Associate Editor: Alfonso Valencia ABSTRACT target, considering each target independently from other proteins. Usual methods are classified into ligand-based and structure- Motivation: Predicting interactions between small molecules and based or docking approaches. Ligand-based approaches compare proteins is a crucial step to decipher many biological processes, and a candidate ligand to the known ligands of the target to make plays a critical role in drug discovery. When no detailed 3D structure their prediction, typically using machine learning algorithms (Butina of the protein target is available, ligand-based virtual screening allows et al., 2002; Byvatov et al., 2003) whereas structure-based the construction of predictive models by learning to discriminate approaches use the 3D-structure of the target to determine how well known ligands from non-ligands. However, the accuracy of ligand- each candidate binds the target (Halperin et al., 2002). based models quickly degrades when the number of known ligands Ligand-based approaches require the knowledge of sufficient decreases, and in particular the approach is not applicable for orphan ligands of a given target with respect to the complexity of the receptors with no known ligand. ligand/non-ligand separation to produce accurate predictors. If few Results: We propose a systematic method to predict ligand–protein or no ligands are known for a target, one is compelled to use docking interactions, even for targets with no known 3D structure and few approaches, which in turn require the 3D structure of the target and or no known ligands. Following the recent chemogenomics trend, are very time consuming. If for a given target with unavailable 3D we adopt a cross-target view and attempt to screen the chemical structure no ligand is known, none of the classical approaches can space against whole families of proteins simultaneously. The lack be applied. This is the case for many GPCR as very few structures of known ligand for a given target can then be compensated by have been crystallized so far (Ballesteros and Palczewski, 2001) the availability of known ligands for similar targets. We test this and many of these receptors, referred to as orphan GPCR, have no strategy on three important classes of drug targets, namely enzymes, known ligand. G-protein-coupled receptors (GPCR) and ion channels, and report An interesting idea to overcome this issue is to stop considering dramatic improvements in prediction accuracy over classical ligand- each protein target independently from other proteins, and rather based virtual screening, in particular for targets with few or no known take the point of view of chemogenomics (Jaroch and Weinmann, ligands. 2006; Kubinyi et al., 2004). Roughly speaking, chemogenomics Availability: All data and algorithms are available as Supplementary aims at mining the entire chemical space, which corresponds to the Material. 
set of all small molecules, for interactions with the biological space, Contact: [email protected] i.e. the set of all proteins or at least protein families, in particular Supplementary information: Supplementary data are available at drug targets. A salient motivation of the chemogenomics approach Bioinformatics online. is the realization that some classes of molecules can bind ‘similar’ proteins, suggesting that the knowledge of some ligands for a target 1 INTRODUCTION can be helpful to determine ligands for similar targets. Besides, this type of method allows for a more rational approach to design Predicting interactions between small molecules and proteins is a drugs since controlling a whole ligand’s selectivity profile is crucial key element in the drug discovery process. In particular, several to make sure that no side effect occurs and that the compound is classes of proteins such as G-protein-coupled receptors (GPCR), compatible with therapeutical usage. enzymes and ion channels represent a large fraction of current drug Recent reviews (Jaroch and Weinmann, 2006; Klabunde, 2007; targets and important targets for new drug development (Hopkins Kubinyi et al., 2004; Rognan, 2007) describe several chemogenomic and Groom, 2002). Understanding and predicting the interactions approaches to predict interactions between compounds and targets. between small molecules and such proteins could therefore help in A first class of approaches, called ligand-based chemogenomics the discovery of new lead compounds. by Rognan (2007), pool together targets at the level of families Various approaches have already been developed and have proved (such as GPCR) or subfamilies (such as purinergic GPCR) and very useful to address this in silico prediction issue (Manly et al., learn a model for ligands at the level of the family (Balakin et al., 2001). The classical paradigm is to predict the modulators of a given 2002; Klabunde, 2006). Other approaches, termed target-based chemogenomic approaches by Rognan (2007), cluster receptors To whom correspondence should be addressed. © 2008 The Author(s) This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/ by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. L.Jacob and J.-P.Vert based on ligand binding site similarity and again pool together estimated based on its ability to correctly predict the classes of molecules in the training set. known ligands for each cluster to infer shared ligands (Frimurer The in silico chemogenomics problem is more general because data et al., 2005). Finally, a third strategy termed target-ligand approach involving interactions with different targets are available to train a model by Rognan (2007) attempts to predict ligands for a given target which must be able to predict interactions between any molecule and any by leveraging binding information for other targets in a single protein. In order to extend the previous machine learning approaches to this step, that is, without first attempting to define a particular set setting, we need to represent a pair (t,c) of target t and chemicals c by a of similar receptors. For example, Bock and Gough (2005) vector (t,c), then estimate a linear function f (t,c) =w (t,c) whose sign merge descriptors of ligands and targets to describe putative is used to predict whether or not c can bind to t. 
As before the vector w can ligand–receptor complexes, and use machine learning methods to be estimated from the training set of interacting and non-interacting pairs, discriminate real complexes from ligand–receptor pairs that do not using any linear machine learning algorithm. form complexes. Erhan et al. (2006) show how the same idea can To summarize, we propose to cast the in silico chemogenomics problem as a learning problem in the ligand–target space thus making it suitable to any be casted in the framework of neural networks and support vector classical linear machine learning approach as soon as a vector representation machines (SVM). In particular, they show that a given set of receptor (t,c) is chosen for protein/ligand pairs. We propose in the next sections a descriptors can be combined with a given set of ligand descriptors in systematic way to design such a representation. a computationally efficient framework, offering in principle a large flexibility in the choice of the receptor and ligand descriptors. 2.2 Vector representation of target/ligand pairs In this article, we go one step further in this direction and investigate various kinds of receptor and ligand descriptors that can A large literature in chemoinformatics has been devoted to the problem be combined for in silico chemogenomics screening with SVM, of representing a molecule c by a vector  (c) ∈R , e.g. using various lig molecular descriptors (Todeschini and Consonni, 2002). These descriptors building on recent development in the field of kernel methods for encode several features related to the physicochemical and structural bio- and chemoinformatics. In particular, we propose a new kernel properties of the molecules, and are widely used to model interactions for receptors, based on a priori defined hierarchies of receptors. between the small molecules and a single target using linear models described We test the different methods for the prediction of ligands for in the previous section (Gasteiger and Engel, 2003). Similarly, much work three major classes of therapeutic targets, namely enzymes, GPCR in computational biology has been devoted to the construction of descriptors and ion channels. We show that the choice of representation for genes and proteins, in order to represent a given protein t by a vector has a strong influence on the accuracy of the model estimated, d (t) ∈R . The descriptors typically capture properties of the sequence or tar and in particular that the new hierarchy kernel systematically structure of the protein, and can be used to infer models to predict, e.g. the outperforms other descriptors used in multitask learning or involving structural or functional class of a protein. receptor sequences. We show that the chemogenomics approach For our in silico chemogenomics problem, we need to represent each pair (c,t) of small molecule and protein by a single vector (c,t). In order to is, particularly, relevant for targets with few known ligands. In capture interactions between features of the molecule and of the protein that particular we estimate that, for orphan receptors with no known may be useful predictors for the interaction between c and t, we propose to ligands, our method reaches a normalized accuracy of 86.2%, 77.6% consider features for the pair (c,t) obtained by multiplying a descriptor of c and 80.5% on the enzymes, GPCR and ion channels, respectively, with a descriptor of t. 
Intuitively, if for example, the descriptors are binary well above the 50% accuracy of a random predictor that would be indicators of specific structural features in each small molecule and proteins, trained in a classical ligand-based virtual screening framework with then the product of two such features indicates that both the small molecule no training example. and the target carry specific features, which may be strongly correlated with the fact that they interact. More generally, if a molecule c is represented by a vector of descriptors  (c) ∈R and a target protein by a vector of lig descriptors  (t) ∈R , this suggests to represent the pair (c,t) by the set tar 2 METHOD of all possible products of features of c and t, i.e. by the tensor product: We formulate the typical in silico chemogenomics problem as the following learning problem: given a collection of n target/molecule pairs (c,t) =  (c) ⊗  (t). (1) lig tar (t ,c ), ...,(t ,c ) known to form complexes or not, estimate a function 1 1 n n Remember that the tensor product in (1) is a d ×d vector whose (i,j)-th c t f (t,c) that would predict whether any chemical c binds to any target t.In entry is exactly the product of the i-th entry of  (c)bythe j-th entry of lig this section, we propose a rigorous and general framework to solve this (t). This representation can be used to combine in an algorithmic way tar problems building on recent developments of kernel methods in bio- and any vector representation of small molecules with any vector representation chemoinformatics. This approach is similar to the approaches proposed in of proteins, for the purpose of in silico chemogenomics or any other task the context of MHC-I-peptide binding prediction (Jacob and Vert, 2008) and involving pairs of molecules/protein. A potential issue with this approach, in (Erhan et al., 2006). however, is that the size of the vector representation for a pair may be prohibitively large for practical computation and storage. For example, using a vector of molecular descriptors of size 1024 for molecules and 2.1 From single-target screening to chemogenomics representing a protein by the vector of counts of all 2mers of amino acids in Much effort in chemoinformatics has been devoted to the more restricted its sequence (d = 20 × 20 = 400) results in more than 400 k dimensions for problem of mining the chemical space for interaction with a single target the representation of a pair. In order to circumvent this issue we now show t, using a training set of molecules c ,...,c known to interact or not with 1 n how kernel methods such as SVM can efficiently work in such large spaces. the target. Machine learning approaches, such as artificial neural networks (ANN) or SVM, often provide competitive models for such problems. The 2.3 Kernels for target/ligand pairs simplest linear models start by representing each molecule c by a vector representation (c), before estimating a linear function f (c) =w (c) SVM is an algorithm to estimate linear binary classifiers from a training set whose sign (positive or negative) is used to predict whether or not the small of patterns with known class (Boser et al., 1992; Vapnik, 1998). A salient molecule c is a ligand of the target t. 
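To make Equation (1) concrete, the following minimal sketch builds the tensor-product representation Φ(c,t) = Φ_lig(c) ⊗ Φ_tar(t) explicitly for a single pair, using the dimensions quoted in the text (a 1024-dimensional molecular descriptor and a 400-dimensional vector of amino-acid 2-mer counts). The descriptors are random placeholders for illustration; this is not the authors' implementation.

import numpy as np

rng = np.random.default_rng(0)

# Placeholder descriptors, filled at random for illustration only:
# a 1024-dimensional molecular descriptor vector for the ligand and a
# 400-dimensional vector of amino-acid 2-mer counts for the target
# (the dimensions quoted in the text).
phi_lig = rng.integers(0, 2, size=1024).astype(float)   # Phi_lig(c)
phi_tar = rng.random(400)                                # Phi_tar(t)

# Equation (1): the pair (c, t) is represented by the tensor product of the
# two descriptor vectors; its (i, j)-th entry is the product of the i-th
# ligand feature and the j-th target feature.
phi_pair = np.outer(phi_lig, phi_tar).ravel()
print(phi_pair.shape)   # (409600,) -- more than 400k features per pair

Building this vector explicitly for every candidate pair is precisely what the kernel factorization discussed next avoids.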
The weight vector w is typically feature of SVM, often referred to as the kernel trick, is its ability to process 2150 Protein–ligand interaction prediction large- or even infinite-dimensional patterns as soon as the inner product kernel, a classical choice that usually gives state-of-the-art performances in between any two patterns can be efficiently computed. This property is molecule classification tasks. It is defined as: shared by a large number of popular linear algorithms, collectively referred K (c,c ) ligand to as kernel methods, including for example, algorithms for regression, clustering or outlier detection (Schölkopf and Smola, 2002; Shawe-Taylor (c)  (c ) lig lig = , (4) and Cristianini, 2004). (c)  (c) +  (c )  (c ) −  (c)  (c ) lig lig lig lig lig lig In order to apply kernel methods such as SVM for in silico where  (c) is a binary vector whose bits indicate the presence or absence of lig chemogenomics, we therefore need to show how to efficiently compute the all linear path of length l or less as subgraph of the 2D structure of c. We chose inner product between the vector representations of two molecule/protein l = 8 in our experiment, i.e. characterize the molecules by the occurrences pairs. Interestingly, a classical property of tensor products allows us to of linear subgraphs of length 8 or less, a value previously observed to give factorize the inner product between two tensor product vectors as follows: good results in several virtual screening task (Mahé et al., 2005). We used the freely and publicly available ChemCPP software to compute this kernel (c) ⊗  (t)  (c ) ⊗  (t ) lig tar lig tar in the experiments. =  (c)  (c ) ×  (t)  (t ). (2) lig lig tar tar 2.5 Kernels for targets This factorization dramatically reduces the burden of working with tensor SVM and kernel methods are also widely used in bioinformatics (Schölkopf products in large dimensions. For example, in our previous example where et al., 2004), and a variety of approaches have been proposed to design the dimensions of the small molecule and proteins are vectors of respective kernels between proteins, ranging from kernels based on the amino-acid dimensions 1024 and 400, the inner product in >400 k dimensions between sequence of a protein (Cuturi and Vert, 2005; Jaakkola et al., 2000; Kuang tensor products is simply obtained from (2) by computing two inner products, et al., 2005; Leslie et al., 2002, 2004; Tsuda et al., 2002; Vert et al., 2004) respectively in dimensions 1024 and 400, before taking their product. to kernels based on the 3D structures of proteins (Borgwardt et al., 2005; Even more interestingly, this reasoning extends to the case where inner Dobson and Doig, 2005; Qiu et al., 2007) or the pattern of occurrences of products between vector representations of small molecules and proteins proteins in multiple sequenced genomes (Vert, 2002). These kernels have can themselves be efficiently computed with the help of positive definite been used in conjunction with SVM or other kernel methods for various kernels (Vapnik, 1998), as explained in the next sections. Positive definite tasks related to structural or functional classification of proteins. 
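A minimal sketch of Equations (2)–(4): the Tanimoto kernel between binary fingerprints, and a numerical check that the inner product between tensor-product representations factorizes into a ligand term times a target term, so the >400k-dimensional pair vectors never have to be formed. The fingerprints and target descriptors are random placeholders; the paper computes the Tanimoto kernel on length-8 linear subgraphs with ChemCPP, which is not reproduced here.

import numpy as np

rng = np.random.default_rng(1)

def tanimoto(fp_a, fp_b):
    # Equation (4): Tanimoto kernel between two binary fingerprints.
    dot = float(fp_a @ fp_b)
    return dot / (float(fp_a @ fp_a) + float(fp_b @ fp_b) - dot)

# Placeholder descriptors for two ligands (binary) and two targets (real-valued).
c1, c2 = rng.integers(0, 2, (2, 1024)).astype(float)
t1, t2 = rng.random((2, 400))

# Equation (2): the inner product between two tensor products equals the
# product of two small inner products, so the explicit >400k-dimensional
# computation is never needed.
explicit = float(np.kron(c1, t1) @ np.kron(c2, t2))
factorized = float(c1 @ c2) * float(t1 @ t2)
assert np.isclose(explicit, factorized)

# Equation (3) with K_ligand = Tanimoto and a plain linear target kernel:
# the pair kernel is simply the product of the two base kernels.
k_pair = tanimoto(c1, c2) * float(t1 @ t2)
print(k_pair)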
While any of kernels are linked to inner products by a fundamental result (Aronszajn, these kernels can theoretically be used as a target kernel in (3), we investigate 1950): the kernel between two points is equivalent to an inner product in this article a restricted list of specific kernels described below, aimed at between the points mapped to a Hilbert space uniquely defined by the kernel. illustrating the flexibility of our framework and test various hypothesis. Now by denoting  The Dirac kernel between two targets t,t is: K (c,c ) =  (c)  (c ), ligand lig lig 1if t =t , K (t,t ) = (5) K (t,t ) =  (t)  (t ), Dirac target tar tar 0 otherwise. we obtain the inner product between tensor products by: This basic kernel simply represents different targets as orthonormal vectors. From (3) we see that orthogonality between two proteins t K (c,t),(c ,t ) =K (t,t ) ×K (c,c ). (3) target ligand and t implies orthogonality between all pairs (c,t) and (c ,t ) for any two small molecules c and c . This means that a linear classifier In summary, as soon as two kernels K and K corresponding to ligand target for pairs (c,t) with this kernel decomposes as a set of independent two implicit embeddings of the chemical and biological spaces in two Hilbert linear classifiers for interactions between molecules and each target spaces are chosen, we can solve the in silico chemogenomics problem with protein, which are trained without sharing any information of known an SVM (or any other relevant kernel method) using the product kernel ligands between different targets. In other words, using Dirac kernel (3) between pairs. The particular kernels K and K should ideally ligand target for proteins amounts to performing classical learning independently encode properties related to the ability of similar molecules to bind similar for each target, which is our baseline approach. targets or ligands, respectively. We review in the next two sections possible choices for such kernels.  The multitask kernel between two targets t,t is defined as: K (t,t ) = 1 +K (t,t ). multitask Dirac 2.4 Kernels for ligands This kernel, originally proposed in the context of multitask learning Recent years have witnessed impressive advances in the use of SVM in (Evgeniou et al., 2005), removes the orthogonality of different chemoinformatics (Ivanciuc, 2007). In particular, much work has focused proteins to allow sharing of information. As explained in Evgeniou on the development of kernels for small molecules for the purpose of single- et al. (2005), plugging K in (3) amounts to decomposing the multitask target virtual screening and prediction of pharmacokinetics and toxicity. linear function used to predict interactions as a sum of a linear function For example, simple inner products between vectors of classical molecular common to all targets and of a linear function specific to each target: descriptors have been widely investigated, including physicochemical f (c,t) =w (c,t) =w  (c) +w  (c). (6) lig lig general t properties of molecules or 2D and 3D fingerprints (Azencott et al., 2007; Todeschini and Consonni, 2002). Other kernels have been designed directly A consequence is that only data related to the target t are used to from the comparison of 2D and 3D structures of molecules, including kernels estimate the specific vector w , while all data are used to estimate the based on the detection of common substructures in the 2D structures of common vector w . 
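The two baseline target kernels admit a direct implementation. In this sketch, targets are plain identifiers (hypothetical names, not taken from the paper): the Dirac kernel of Equation (5) treats every target as orthogonal, which amounts to independent single-target learning, while the multitask kernel adds a constant term that lets all targets share a common part of the model, as in Equation (6).

import numpy as np

def dirac_kernel(t, t_prime):
    # Equation (5): 1 if the two targets are identical, 0 otherwise.
    return 1.0 if t == t_prime else 0.0

def multitask_kernel(t, t_prime):
    # Multitask kernel (Evgeniou et al., 2005): 1 + K_Dirac(t, t').
    return 1.0 + dirac_kernel(t, t_prime)

targets = ["targetA", "targetB", "targetC"]   # illustrative identifiers
K_dirac = np.array([[dirac_kernel(a, b) for b in targets] for a in targets])
K_multi = np.array([[multitask_kernel(a, b) for b in targets] for a in targets])
print(K_dirac)   # identity matrix: no information shared across targets
print(K_multi)   # all-ones plus identity: a common part is shared by all targets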
In our framework this classifier is therefore general molecules seen as graphs (Borgwardt and Kriegel, 2005; Gärtner et al., 2003; the combination of a target-specific part accounting for target-specific Horváth et al., 2004; Kashima et al., 2003, 2004; Mahé and Vert, 2006; Mahé properties of the ligands and a global part accounting for general et al., 2005; Ralaivola et al., 2005) or on the encoding of various properties properties of the ligands across the targets. The latter term allows of the 3D structure of molecules (Azencott et al., 2007; Mahé et al., 2006). to share information during the learning process, while the former While any of these kernels could be used to model the similarities of small ensures that specificities of the ligands for each target are not lost. molecules and be plugged into (3), we restrict ourselves in our experiment to a particular kernel proposed by Ralaivola et al. (2005) called the Tanimoto Available at http://chemcpp.sourceforge.net. 2151 L.Jacob and J.-P.Vert  While the multitask kernel provides a basic framework to share in the corresponding hierarchy plus one, that is, information across proteins, it does not allow to weigh differently K (t,t ) = (t), (t ), hierarchy h h how known interactions with a protein t should contribute to predict where  (t) contains as many features as there are nodes in the interactions with a target t . Empirical observations underlying hierarchy, each being set to 1 if the corresponding node is part of chemogenomics, on the other hand, suggest that molecules binding t’s hierarchy and 0 otherwise, plus one feature constantly set to a ligand t are only likely to bind ligand t similar to t in terms of one that accounts for the ‘plus one’ term of the kernel. One might structure or evolutionary history. In terms of kernels this suggest not expect the EC classification to be a good similarity measure in to plug into (3) a kernel for proteins that quantifies this notion of terms of binding, since it does not closely reflect evolutionary or similarity between proteins, which can, for example, be detected by mechanistic similarities except for the case of identical subclasses comparing the sequences of proteins. In order to test this approach, with different serial numbers. However, using the full hierarchy gave we therefore tested two commonly used kernels between protein a better accuracy in our experiments. Even if the hierarchy itself is not sequences: the mismatch kernel (Leslie et al., 2004), which compares fully relevant in this case, the improvement can be explained, on the proteins in terms of common short sequences of amino acids up one hand, by the multitask effect, i.e. by the fact that we use the data to some mismatches, and the local alignment kernel (Vert et al., from the target and the data from other targets with a smaller weight, 2004) which measures the similarity between proteins as an alignment and on the other hand, by the fact that we give more weight to the score between their primary sequences. In our experiments involving enzymes with the same serial number than to the other enzymes. the mismatch kernel, we use the classical choice of 3-mers with a maximum of one mismatch, and for the datasets where some sequences were not available in the database, we added K (t,t ) Dirac 3 DATA to the kernel (and normalized to one on the diagonal) in order to keep We extracted compound interaction data from the KEGG BRITE it valid. 
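Under the stated definition of the hierarchy kernel — one binary feature per node of the classification tree lying on the target's path, plus a constant feature — the kernel reduces to counting common ancestors and adding one. A small sketch under that assumption, writing Enzyme Commission (EC)–style codes as tuples of hierarchy levels:

def hierarchy_kernel(code_a, code_b):
    # Hierarchy kernel: number of common ancestors in the classification
    # tree, plus one (the constant feature plays the role of the
    # multitask "plus one" term). Codes are tuples of levels, e.g. the
    # EC number 1.2.2.1 becomes (1, 2, 2, 1).
    shared = 0
    for a, b in zip(code_a, code_b):
        if a != b:
            break
        shared += 1
    return shared + 1

ec_a = (1, 2, 2, 1)   # oxidoreductase acting on the aldehyde or oxo group of donors
ec_b = (1, 2, 2, 2)   # same EC 1.2.2 subclass, different serial number
ec_c = (3, 1, 1, 1)   # an enzyme from a different top-level class

print(hierarchy_kernel(ec_a, ec_b))   # 4: three shared ancestors + 1
print(hierarchy_kernel(ec_a, ec_c))   # 1: no shared ancestor + 1

The same function applies unchanged to the GPCR and ion channel hierarchies once each target is encoded as the tuple of classes it belongs to, from the root of the family tree down to the leaf.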
Database (Kanehisa et al., 2002, 2004) concerning enzyme, GPCR and ion channel, three target classes particularly relevant for novel  Alternatively, we propose a new kernel aimed at encoding the drug development. similarity of proteins with respect to the ligands they bind. For each family, the database provides a list of known compounds Indeed, for most major classes of drug targets such as the ones investigated in this study (GPCR, enzymes and ion channels), proteins for each target. Depending on the target families, various categories have been organized into hierarchies that typically describe the of compounds are defined to indicate the type of interaction between precise functions of the proteins within each family. Enzymes are each target and each compound. These are, for example, inhibitor, labeled with Enzyme Commission numbers (EC numbers) defined cofactor and effector for enzyme ligands, antagonist or (full/partial) in International Union of Biochemistry and Molecular Biology agonist for GPCR and pore blocker, (positive/negative) allosteric (1992), that classify the chemical reaction they catalyze, forming a modulator, agonist or antagonist for ion channels. The list is not four-level hierarchy encoded into four numbers. For example, EC 1 exhaustive for the latter since numerous categories exist. Although includes oxidoreductases, EC 1.2 includes oxidoreductases that act on different types of interactions on a given target might correspond the aldehyde or oxo group of donors, EC 1.2.2 is a subclass of EC 1.2 to different binding sites, it is theoretically possible for a non- with NAD+ or NADP+ as acceptor and EC 1.2.2.1 is a subgroup of enzymes catalyzing the oxidation of formate to bicarbonate. These linear classifier like SVM with non-linear kernels to learn classes number define a natural and very informative hierarchy on enzymes: consisting of several disconnected sets. Therefore, for the sake of one can expect that enzymes that are closer in the hierarchy will clarity of our analysis, we do not differentiate between the categories tend to have more similar ligands. Similarly, GPCRs are grouped into of compounds. four classes based on sequence homology and functional similarity: For each target class, we retained only one protein by element of the rhodopsin family (class A), the secretin family (class B), the the hierarchy. In particular, we did not take into account the different metabotropic family (class C) and a last class regrouping more orthologs of the targets, and the different enzymes corresponding diverse receptors (class D). The KEGG database (Kanehisa et al., to the same EC number. We then eliminated all compounds for 2002) subdivides the large rhodopsin family in three subgroups which no molecular descriptor was available (principally peptide (amine receptors, peptide receptors and other receptors) and adds compounds), and all the targets for which no compound was known. a second level of classification based on the type of ligands or known subdivisions. For example, the rhodopsin family with For each target, we generated as many negative ligand–target pairs amine receptors is subdivided into cholinergic receptors, adrenergic as we had known ligands forming positive pairs by combining receptors, etc. This also defines a natural hierarchy that we could the target with a ligand randomly chosen among the other targets’ use to compare GPCRs. 
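The negative-pair generation protocol described here (one randomly drawn negative per known positive, taking ligands of other targets and excluding compounds already known to bind the target) can be sketched as follows. The identifiers are made up, and a mapping from each target to its set of known ligands is assumed.

import random

def sample_negatives(known_ligands, seed=0):
    # For each target, draw as many negative ligand-target pairs as there
    # are known positives, choosing among ligands of other targets that are
    # not known to interact with this target.
    rng = random.Random(seed)
    all_ligands = set().union(*known_ligands.values())
    negatives = []
    for target, ligands in known_ligands.items():
        candidates = list(all_ligands - ligands)
        for _ in range(len(ligands)):
            negatives.append((target, rng.choice(candidates)))
    return negatives

# Toy example with hypothetical target and compound identifiers.
known = {
    "targetA": {"mol1", "mol2"},
    "targetB": {"mol2", "mol3"},
    "targetC": {"mol4"},
}
print(sample_negatives(known))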
Finally, KEGG also provides a classification ligands (excluding those that were known to interact with the given of ion channels. Classification of ion channels is a less simple task target). This protocol generates false negative data since some since some of them can be classified according to different criteria ligands could actually interact with the target although they have like voltage dependence or ligand gating. The classification proposed not been experimentally tested, and our method could benefit from by KEGG includes Cys-loop superfamily, glutamate-gated cation experimentally confirmed negative pairs. channels, epithelial and related Na channels, voltage-gated cation This resulted in 2436 data points for enzymes (1218 channels, related to voltage-gated cation channels, related to inward known enzyme–ligand pairs and 1218 generated negative points) rectifier K channels, chloride channels and related to ATPase-linked transporters and each of these classes is further subdivided according, representing interactions between 675 enzymes and 524 compounds, for example to the type of ligands (e.g. glutamate receptor) or to the 798 training data points for GPCRs representing interactions type of ion passing through the channel (e.g. Na channel). Here between 100 receptors and 219 compounds and 2330 ion channel again, this hierarchy can be used to define a meaningful similarity in data points representing interactions between 114 channels and 462 terms of interaction behavior. compounds. Besides, Figure 1 shows the distribution of the number of known ligands per target for each dataset and illustrates the fact For each of the three target families, we define the hierarchy kernel between two targets of the family as the number of common ancestors that for most of them, few compounds are known. 2152 Protein–ligand interaction prediction Fig. 1. Distribution of the number of training points for a target for the enzymes, GPCR and ion channel datasets. Each bar indicates the proportion of targets in the family for which a given (x-axis) number of data points is available. Table 1. AUC for the first protocol on each dataset with various target kernels For each target t in each family, we carried out two experiments. First, all data points corresponding to other targets in the family K \ Target Enzymes GPCR Channels were used for training only and the n points corresponding to t tar were k-folded with k = min(n ,10). That is, for each fold, an SVM Dirac 0.646 ± 0.009 0.750 ± 0.023 0.770 ± 0.020 classifier was trained on all points involving other targets of the Multitask 0.931 ± 0.006 0.749 ± 0.022 0.873 ± 0.015 family plus a fraction of the points involving t, then the performances Hierarchy 0.955 ± 0.005 0.926 ± 0.015 0.925 ± 0.012 of the classifier were tested on the remaining fraction of data points Mismatch 0.725 ± 0.009 0.805 ± 0.023 0.875 ± 0.015 for t. This protocol is intended to assess the incidence of using Local alignment 0.676 ± 0.009 0.824 ± 0.021 0.901 ± 0.013 ligands from other targets on the accuracy of the learned classifier for a given target. Second, for each target t we trained an SVM classifier using only interactions that did not involve t and tested on the points that involved t. This is intended to simulate the behavior of our framework when making predictions for orphan targets, i.e. for targets for which no ligand is known. For both experiments, we used the area under the ROC curve (AUC) as a performance measure. 
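The second ("orphan target") evaluation protocol can be sketched as follows: for each target, an SVM with the pairwise product kernel of Equation (3) is trained on all pairs not involving that target and scored by the AUC on the held-out target's pairs. The data below are synthetic placeholders and the target kernel is the simple multitask kernel rather than the hierarchy kernel, so this only illustrates the mechanics, not the reported numbers; scikit-learn is assumed for the SVM and the AUC.

import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)

# Synthetic stand-in data: 60 ligand-target pairs over 6 targets, with random
# fingerprints and labels. Real experiments would use the KEGG-derived pairs.
n, n_targets = 60, 6
fps = rng.integers(0, 2, (n, 128)).astype(float)   # ligand fingerprints
target_idx = rng.integers(0, n_targets, n)         # target involved in each pair
y = rng.choice([-1, 1], n)                          # interaction labels

def tanimoto_matrix(A, B):
    # Tanimoto kernel matrix between two sets of binary fingerprints.
    dots = A @ B.T
    na = (A * A).sum(1)[:, None]
    nb = (B * B).sum(1)[None, :]
    return dots / (na + nb - dots)

# Multitask target kernel: 1 + 1[same target]; Equation (3) gives the pair kernel.
K_target = 1.0 + (target_idx[:, None] == target_idx[None, :]).astype(float)
K_pairs = tanimoto_matrix(fps, fps) * K_target

# Orphan simulation: train on pairs not involving target t, test on pairs involving t.
for t in range(n_targets):
    test = target_idx == t
    train = ~test
    if len(np.unique(y[train])) < 2 or len(np.unique(y[test])) < 2:
        continue   # both classes are needed to train and to compute an AUC
    clf = SVC(kernel="precomputed", C=1.0).fit(K_pairs[np.ix_(train, train)], y[train])
    scores = clf.decision_function(K_pairs[np.ix_(test, train)])
    print(t, roc_auc_score(y[test], scores))

The first protocol only differs in that a fraction of the held-out target's pairs (k-folded with k = min(n_t, 10)) is added back to the training set at each fold.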
The ROC curve was computed for each target using the test points pooled from all the folds. For the first protocol, since training an SVM with only one training point does not really make sense and can lead to ‘anti-learning’ less than 0.5 performances, we set all results r involving the Dirac target Fig. 2. Target kernel Gram matrices (K ) for ion channels with multitask, tar hierarchy and local alignment kernels. kernel on targets with only one known ligand to max(r,0.5). This is to avoid any artifactual penalization of the Dirac approach and make sure we measure the actual improvement brought by sharing sharing information among known ligands of different targets, on information across targets. the one hand, and the relevance of incorporating prior information into the kernels, on the other hand. On the GPCR dataset though, the multitask kernel performs 4 RESULTS slightly worse than the Dirac kernel, probably because some targets We first discuss the results obtained on the three datasets for the in different subclasses show very different binding behavior, which first experiment, assessing how using training points from other results in adding more noise than information when sharing naively targets of the family improves prediction accuracy with respect to with this kernel. However, a more careful handling of the similarities individual (Dirac-based) learning. Table 1 shows the mean AUC between GPCRs through the hierarchy kernel results in significant across the family targets for an SVM with a product kernel using improvement over the Dirac kernel (from 75% to 92.6%), again the Tanimoto kernel for ligands and various kernels for proteins. demonstrating the relevance of the approach. For the enzymes and ion channels datasets, we observe significant Sequence-based target kernels do not achieve the same improvements when the multitask kernel is used in place of the Dirac performance as the hierarchy kernel, although they perform kernel, on the one hand, and when the hierarchy kernel replaces relatively well for the ion channel dataset, and give better results than the multitask kernel, on the other hand. For example, the Dirac the multitask kernel for both GPCR and ion channel datasets. In the kernel only performs at an average AUC of 77% for the ion channel case of enzymes, it can be explained by the diversity of the proteins dataset, while the multitask kernel increases the AUC to 87.3% in the family and for the GPCR, by the well-known fact that the and the hierarchy kernel brings it to 92.5%. For the enzymes, a receptors do not share overall sequence homology (Gether, 2000). global improvement of 30.9% is observed between the Dirac and Figure 2 shows three of the tested target kernels for the ion channel the hierarchy approaches. This clearly demonstrates the benefits of dataset. The hierarchy kernel adds some structure information with 2153 L.Jacob and J.-P.Vert Fig. 3. Relative improvement of the hierarchy kernel against the Dirac kernel as a function of the number of known ligands for enzymes, GPCR and ion channel datasets. Each point indicates the mean performance ratio between individual and hierarchy approaches across the targets of the family for which a given (x-axis) number of training points was available. Table 2. AUC for the second protocol on each dataset with various target respect to the multitask kernel, which explains the increase in AUC. kernels The local alignment sequence-based kernels fail to precisely rebuild this structure but retain some substructures. 
In the cases of GPCR K \ Target Enzymes GPCR Channels tar and enzymes, almost no structure is found by the sequence kernels, which, as alluded to above, was expected and suggests that more Dirac 0.500 ± 0.000 0.500 ± 0.000 0.500 ± 0.000 subtle comparison of the sequences would be required to exploit the Multitask 0.902 ± 0.008 0.576 ± 0.026 0.704 ± 0.026 information they contain. Hierarchy 0.938 ± 0.006 0.875 ± 0.020 0.853 ± 0.019 Figure 3 illustrates the influence of the number of training points Mismatch 0.602 ± 0.008 0.703 ± 0.027 0.729 ± 0.024 for a target on the improvement brought by using information from Local alignment 0.535 ± 0.005 0.751 ± 0.025 0.772 ± 0.023 similar targets. As one could expect, the improvement is very strong when few ligands are known and decreases when enough training points become available. After a certain point (around 30 training ligands of close targets in the hierarchy. In particular, it will predict points), using similar targets can even impair the performances. This that the ligands of the target’s direct neighbors are ligands of suggests that the method could be globally improved by learning for the target (which is an intuitive and natural way to choose new each target independently how much information should be shared, candidate binders). A major difference, however, is that a candidate for example, through kernel learning approaches (Lanckriet et al., molecule which is very similar to ligands of a close target, but 2004). not a ligand itself, will not be be predicted to be a ligand by the The second experiment aims at pushing this remark to its limit by annotation transfer approach. In particular, if the candidate molecule assessing how each strategy is able to predict ligands for proteins is not present anywhere else in the ligand database, it will never with no known ligand. Table 2 shows the results in that case. As be predicted to be a ligand. Exemples can be found in each of expected, the classifiers using Dirac kernels show random behavior the considered target classes. The 4-aminopyridine is a blocker of in this case since using a Dirac kernel with no data for the target the ion channel KCJN5, a potassium inwardly rectifying channel. amounts to learning with no training data at all. In particular, in the Although this molecule is a known blocker of other channels (in SVM implementation that we used, the classifier learned with no particular, many potassium channels), it is not a known ligand of data from the task gave constant scores to all the test points, hence the any other channel of KCJN5’s superfamily. However, the most 0.500 ± 0.000 AUC on the test data. On the other hand, we note that similar molecule in the database, in the sense of the Tanimoto it is still possible to obtain reasonable results using adequate target kernel, is the Pinacidil, which happens to be a known ligand of kernels. In particular, the hierarchy kernel loses only 7.2% of AUC two direct neighbors of KCJN5. This allows our method to predict for the ion channel dataset, 5.1% for the GPCR dataset and 1.7% for 4-aminopyridine as a ligand for this target. Similarly, N -acetyl- the enzymes compared to the first experiment where known ligands d-glucosamine 1,6-bisphosphate is the only known effector of were used, suggesting that if a target with no known compound is phosphoacetylglucosamine mutase, an enzyme of the isomerase placed in the hierarchy, e.g. in the case of GPCR homology detection family. 
This molecule is not a known ligand of any other enzyme in with known members of the family using specific GPCR alignment the database, so a direct annotation transfer approach would never algorithms (Kratochwil et al., 2005) or fingerprint analysis (Attwood predict it as a ligand. Our method, on the other hand, predicts it et al., 2003), it is possible to predict some of its ligands almost as correctly, taking advantage of the fact that very similar molecules accurately as if some of them were already available. like D-ribose 1,5-bisphosphate or α-d-glucose 1,6-bisphosphate are In this second setting, our approach when using the hierarchy known ligands of direct neighbors. The same observation can be kernel on the targets is closely related to annotation transfer. Indeed, made for several GPCRs, including the prostaglandin F receptor the learned predictor in this case will predict a molecule to be a whose three known ligands are not ligands of any other GPCR but ligand of a given target if the molecule is similar to the known whose direct neighbors have similar ligands. 2154 Protein–ligand interaction prediction 5 DISCUSSION REFERENCES We propose a general method to combine the chemical and the Aronszajn,N. (1950) Theory of reproducing kernels. Trans. Am. Math. Soc., 68, 337–404. biological space in an algorithmic way and predict interaction Attwood,T.K. et al. (2003) Prints and its automatic supplement, preprints. Nucleic Acids between any small molecule and any target, which makes it a very Res., 31, 400–402. valuable tool for drug discovery. The method allows one to represent Azencott,C.-A. et al. (2007) One- to four-dimensional kernels for virtual screening and systematically a ligand–target pair, including information on the the prediction of physical, chemical, and biological properties. J. Chem. Inf. Model, interaction between the ligand and the target. Prediction is then 47, 965–974. Balakin,K.V. et al. (2002) Property-based design of GPCR-targeted library. J. Chem. performed by any machine learning algorithm (an SVM in our case) Inf. Comput. Sci., 42, 1332–1342. in the joint space, which makes targets with few known ligands Ballesteros,J. and Palczewski,K. (2001) G protein-coupled receptor drug discovery: benefit from the data points of similar targets, and which allows implications from the crystal structure of rhodopsin. Curr. Opin. Drug Discov. one to make predictions for targets with no known ligand. Our Devel., 4, 561–574. information-sharing process is therefore simply based on a choice Bock,J.R. and Gough,D.A. (2005) Virtual screen for ligands of orphan g protein-coupled receptors. J. Chem. Inf. Model, 45, 1402–1414. of description for the ligands, another one for the targets and on Borgwardt,K. et al. (2005) Protein function prediction via graph kernels. Bioinformatics, classical machine learning methods. Everything is done by casting 21(Suppl. 1), i47–i56. the problem in a joint space and no explicit procedure to select which Borgwardt,K.M. and Kriegel,H.-P. (2005) Shortest-path kernels on graphs. In part of the information is shared is needed. Since it subdivides the Proceedings of the Fifth International Conference on Data Mining. IEEE Computer representation problem into two subproblems, our approach makes Society, Washington, DC, USA, pp. 74–81. Boser,B.E. et al. (1992) A training algorithm for optimal margin classifiers. 
In use of previous work on kernels for molecular graphs and kernels for Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory. biological targets. For the same reason, it will automatically benefit ACM Press, New York, USA, pp. 144–152. from future improvements in both fields. This leaves plenty of room Butina,D. et al. (2002) Predicting ADME properties in silico: methods and models. to increase the performance. Drug Discov. Today, 7(Suppl. 11), S83–S88. Results on experimental ligand datasets show that using target Byvatov,E. et al. (2003) Comparison of support vector machine and artificial neural network systems for drug/nondrug classification. J. Chem. Inf. Comput. Sci., 43, kernels allowing to share information across the targets considerably 1882–1889. improve the prediction, especially in the case of targets with Cuturi,M. and Vert,J.-P. (2005) The context-tree kernel for strings. Neural Netw., 18, few known ligands. The improvement is particularly strong 1111–1123. when the target kernel uses prior information on the structure Dobson,P. and Doig,A. (2005) Predicting enzyme class from protein structure without between the targets, e.g. a hierarchy defined on a target class. alignments. J. Mol. Biol., 345, 187–199. Erhan,D. et al. (2006) Collaborative filtering on a family of biological targets. J. Chem. Although the usage of a kernel based on the hierarchy is restricted Inf. Model, 46, 626–635. to protein families where hierarchical classification schemes exist, Evgeniou,T. et al. (2005) Learning multiple tasks with kernel methods. J. Mach. Learn. it applies to the three main classes of proteins targeted by drugs, and Res., 6, 615–637. others like cytochromes and abc transporters. Sequence kernels, on Frimurer,T.M. et al. (2005) A physicogenetic method to assign ligand-binding the other hand, did not give very good results in our experiments. relationships between 7tm receptors. Bioorg. Med. Chem. Lett., 15, 3707–3712. Gärtner,T. et al. (2003) On graph kernels: hardness results and efficient alternatives. However, we believe using the target sequence information could be In Schölkopf,B. and Warmuth,M. (eds) Proceedings of the Sixteenth Annual an interesting alternative or complement to the hierarchy kernel. For Conference on Computational Learning Theory and the Seventh Annual Workshop example, Jacob et al. (2008) used a kernel based on the sequence on Kernel Machines. Vol. 2777 of Lecture Notes in Computer Science. Springer, of the GPCR that performed as well as the kernel based on the Heidelberg, pp. 129–143. GPCR hierarchy. Further improvement could come from the use of Gasteiger,J. and Engel,T. (eds) (2003) Chemoinformatics : a Textbook. Wiley. Gether,U. (2000) Uncovering molecular mechanisms involved in activation of g protein- kernel for structures in the cases where 3D structure information is coupled receptors. Endocr. Rev., 21, 90–113. available (e.g. for the enzymes, but not for the GPCR). Our method Halperin,I. et al. (2002) Principles of docking: an overview of search algorithms and a also shows good performances even when no ligand is known at all guide to scoring functions. Proteins, 47, 409–443. for a given target, which is excellent news since classical ligand- Hopkins,A.L. and Groom,C.R. (2002) The druggable genome. Nat. Rev. Drug Discov., based approaches fail to predict ligand for these targets on the one 1, 727–730. Horváth,T. et al. (2004) Cyclic pattern kernels for predictive graph mining. 
In hand, and docking approaches are computationally expensive and Proceedings of the tenth ACM SIGKDD international conference on Knowledge not feasible when the target 3D structure is unknown, which is the discovery and data mining. ACM Press, New York, NY, pp. 158–167. case of GPCR on the other hand. International Union of Biochemistry and Molecular Biology (1992) Enzyme In future work, it could be interesting to apply this framework to Nomenclature 1992. Academic Press, California, USA. quantitative prediction of binding affinity using regression methods Ivanciuc,O. (2007) Applications of support vector machines in chemistry. In Lipkowitz,K.B. and Cundari,T.R. (eds) Reviews in Computational Chemistry. in the joint space. It would also be important to confirm predicted Vol. 23. Wiley-VCH, Weiheim, pp. 291–400. ligands experimentally or at least by docking approaches when the Jaakkola,T. et al. (2000) A discriminative framework for detecting remote protein target 3D structure is available. homologies. J. Comput. Biol., 7, 95–114. Jacob,L. and Vert,J.-P. (2008) Efficient peptide-MHC-I binding prediction for alleles with few known binders. Bioinformatics, 24, 358–366. Jacob,L. et al. (2008) Virtual screening of GPCRs: an in silico chemogenomics approach. BMC Bioinformatics (in press). ACKNOWLEDGEMENTS Jaroch,S.E. and Weinmann,H. (eds) (2006) Chemical Genomics: Small Molecule We thank Pierre Mahé for his help with ChemCPP and kernels for Probes to Study Cellular Function. Ernst Schering Research Foundation Workshop. molecules, and Véronique Stoven for insightful discussions on the Springer, Berlin. Kanehisa,M. et al. (2002) The KEGG databases at GenomeNet. Nucleic Acids Res., 30, biological aspects of the problem. 42–46. Conflict of Interest: none declared. 2155 L.Jacob and J.-P.Vert Kanehisa,M. et al. (2004) The KEGG resource for deciphering the genome. Nucleic Mahé,P. et al. (2005) Graph kernels for molecular structure-activity relationship analysis Acids Res., 32(Database issue), D277–D280. with support vector machines. J. Chem. Inf. Model, 45, 939–951. Kashima,H. et al. (2003) Marginalized kernels between labeled graphs. In Faucett,T. Mahé,P. et al. (2006) The pharmacophore kernel for virtual screening with support and Mishra,N. (eds), Proceedings of the Twentieth International Conference on vector machines. J. Chem. Inf. Model, 46, 2003–2014. Machine Learning, AAAI Press, pp. 321–328. Manly,C. et al. (2001) The impact of informatics and computational chemistry on Kashima,H. et al. (2004) Kernels for graphs. In Schölkopf,B. et al. (eds) Kernel Methods synthesis and screening. Drug Discov. Today, 6, 1101–1110. in Computational Biology. MIT Press, pp. 155–170. Qiu,J. et al. (2007) A structural alignment kernel for protein structures. Bioinformatics, Klabunde,T. (2006) Chemogenomics approaches to ligand design. In Ligand Design 23, 1090–1098. for G Protein-coupled Receptors. Ch. 7, Wiley-VCH, Great Britain, pp. 115–135. Ralaivola,L. et al. (2005) Graph kernels for chemical informatics. Neural Netw., 18, Klabunde,T. (2007) Chemogenomic approaches to drug discovery: similar receptors 1093–1110. bind similar ligands. Br. J. Pharmacol., 152, 5–7. Rognan,D. (2007) Chemogenomic approaches to rational drug design. Br. J. Kratochwil,N.A. et al. (2005) An automated system for the analysis of g protein- Pharmacol., 152, 38–52. coupled receptor transmembrane binding pockets: alignment, receptor-based Schölkopf,B. and Smola,A.J. 
(2002) Learning with Kernels: Support Vector Machines, pharmacophores, and their application. J. Chem. Inf. Model, 45, 1324–1336. Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA. Kuang,R. et al. (2005) Profile-based string kernels for remote homology detection and Schölkopf,B. et al. (2004) Kernel Methods in Computational Biology. MIT Press, motif extraction. J. Bioinform. Comput. Biol., 3, 527–550. Cambridge, Massachussetts. Kubinyi,H. et al. (eds) (2004) Chemo-Genomics in Drug Discovery: A Medicinal Shawe-Taylor,J. and Cristianini,N. (2004) Kernel Methods for Pattern Analysis. Chemistry Perspective. Methods and Principles in Medicinal Chemistry. Cambridge University Press, New York, USA. Wiley-VCH, New York. Todeschini,R. and Consonni,V. (2002) Handbook of Molecular Descriptors. Lanckriet,G.R.G. et al. (2004) A statistical framework for genomic data fusion. Wiley-VCH, New York, USA. Bioinformatics, 20, 2626–2635. Tsuda,K. et al. (2002) Marginalized kernels for biological sequences. Bioinformatics, Leslie,C. et al. (2002) The spectrum kernel: a string kernel for SVM protein 18, S268–S275. classification. In Altman,R.B. et al. (eds) Proceedings of the Pacific Symposium Vapnik,V.N. (1998) Statistical Learning Theory. Wiley, New York. on Biocomputing 2002. World Scientific, Singapore, pp. 564–575. Vert,J.-P. (2002) A tree kernel to analyze phylogenetic profiles. Bioinformatics, 18, Leslie,C.S. et al. (2004) Mismatch string kernels for discriminative protein S276–S284. classification. Bioinformatics, 20, 467–476. Vert,J.-P. et al. (2004) Local alignment kernels for biological sequences. In Schölkopf,B. Mahé,P. and Vert,J.-P. (2006) Graph kernels based on tree patterns for molecules. et al. (eds) Kernel Methods in Computational Biology. MIT Press, Cambridge, Technical Report ccsd-00095488, HAL. Massachussetts, pp. 131–154.
