Artificial intelligence in drug combination therapy

Abstract

The development of medicines for complex diseases currently requires combination drug therapies, because in many cases one drug cannot target all necessary points of intervention. For example, in cancer therapy, a physician often meets a patient whose genomic profile includes more than five molecular aberrations. Drug combination therapy has long been an area of interest: the classical work of Loewe on the synergism of drugs was published in 1928 and is still used in calculations of optimal drug combinations. Over the past several years, there has been an explosion in the available information on the properties of drugs and the biomedical parameters of patients. For drugs, hundreds of 2D and 3D molecular descriptors are now available; for patients, large data sets of genetic, proteomic and metabolomic profiles are now available, as well as the more traditional data on histology, history of treatments, pretreatment state of the organism, etc. Moreover, the genetic profile can change during disease progression. Thus, the ability to optimize drug combinations for each patient is rapidly moving beyond the comprehension and capabilities of an individual physician. For this reason, biomedical informatics methods have been developed, and one of the more promising directions in this field is the application of artificial intelligence (AI). In this review, we discuss several AI methods that have been successfully implemented in several instances of combination drug therapy, from HIV, hypertension and infectious diseases to cancer. The data show that the combination of rule-based expert systems with machine learning algorithms may be a promising direction in this field.
Keywords: artificial intelligence, drug combination, combination therapy, machine learning, genomic profile

Introduction

It is becoming increasingly clear that targeted combination therapy is the treatment of choice in many complex human diseases, particularly those resulting from biological dysfunction driven by alterations/mutations in several genes and/or gene networks, as is usually the case in cancer [1]. However, selection of the most efficacious combinations for each patient can be a daunting task. For example, based on gene alterations, it would not be unreasonable for a cancer treatment regimen to require six or more drugs (each targeting a particular genetic alteration), and when one considers the multiple possible dosing regimens, the number of potential combinations rapidly multiplies, reaching numbers as high as 10^11 [2]. Moreover, as the number and type of genetic alterations can vary widely from patient to patient, these choices must be considered for each individual, which is clearly beyond the capabilities of primary care physicians. Thus, computational approaches are required, and a number of artificial intelligence (AI) methods are currently being used for optimization of combination therapies.

An early AI application proposed for drug treatment strategy selection was the computer-based consultation system MYCIN [3]. The goal of this rule-based expert system, with ∼600 rules, was to provide physicians with therapy recommendations for patients with bacterial infections [3]. The reasoning evaluation mechanisms in MYCIN included a fuzzy logic function for combining uncertain assertions within each rule, and while MYCIN was never used in practice, it did achieve a 69% success rate in choosing an acceptable pharmacotherapy, which was better than that of infectious disease experts using the same criteria [3].
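How a rule-based system might combine uncertain assertions can be illustrated with the certainty-factor arithmetic commonly associated with MYCIN-style expert systems. The sketch below is an illustration only: the function name `combine_certainty`, the example values and the formula used for conflicting evidence (the form usually attributed to the later EMYCIN shell) are assumptions of this sketch and are not taken from [3].

```python
def combine_certainty(cf_a: float, cf_b: float) -> float:
    """Combine two certainty factors in [-1, 1] that bear on the same hypothesis."""
    if cf_a >= 0 and cf_b >= 0:
        # Two supporting rules: the second closes part of the remaining doubt.
        return cf_a + cf_b * (1 - cf_a)
    if cf_a <= 0 and cf_b <= 0:
        # Two opposing rules: mirror image of the supporting case.
        return cf_a + cf_b * (1 + cf_a)
    # Conflicting evidence (EMYCIN-style form, an assumption of this sketch).
    denominator = 1 - min(abs(cf_a), abs(cf_b))
    if denominator == 0:
        raise ValueError("certainty factors of +1 and -1 cannot be combined")
    return (cf_a + cf_b) / denominator


# Hypothetical rule outputs for one candidate therapy.
supporting = combine_certainty(0.6, 0.4)
conflicting = combine_certainty(0.6, -0.4)
print(round(supporting, 2), round(conflicting, 2))
```

Two moderately confident supporting rules yield a combined certainty higher than either alone, while a conflicting rule pulls the result back toward zero; this is the kind of behaviour a recommendation system needs when several rules fire for the same therapy.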
A subsequent AI application for the selection of drug therapy also used an expert system, but in this case a knowledge base of only around 100 rules (extracted from medical professionals) was used for selection of the best antimicrobial therapy [4]. Nevertheless, even with this limited knowledge base, the system was useful for medical professionals [4]. Attempts have been made to improve these early AI applications, and so enhance the expertise of the physician, by combining expert systems, in which the embedded knowledge is internally represented by means of frames and rules, with artificial neural networks (ANNs) [5]. For example, multilayer (up to six layers) ANNs were used in combination with expert systems to create a machine learning (ML) system for the diagnosis and treatment of hypertension [5]. In this system, the selection of hypertension drug combinations was accomplished by training the system on blood pressure time series measurements from ∼300 healthy subjects and 85 hypertensive subjects, which were divided into a learning set used for training the system and a testing set [5]. The authors point out that the multilayer ANN extracts the distinguishing features (central tendencies) of the learning set and can recognize these patterns even in noisy input data [5]. This ANN, using more than three layers, was one of the first deep learning (DL) methods used for the selection of drug combinations.
In a different example, a patent [6] has been issued for a system in which patient information is provided to a computational knowledge-based system comprising not only a multiplicity of different pharmacotherapeutic treatment regimens for the disease but also expert rules for (1) selecting the appropriate treatment option and (2) providing patient advisory information on the different constituents of the regimens [6]. AI solutions that have been found to be successful for combination drug therapy are described below.

ML systems

Artificial neural networks

A perceptron is a type of neuron in a neural network defined by a linear combination followed by a thresholding activation function. It acts as a binary classifier that maps input data onto an output of +1 (if the weighted sum of its inputs exceeds a threshold) or −1 or 0 (if it does not). As this is a simplified model of the function of a neuron, the term perceptron is sometimes substituted with ‘artificial neuron’, and thus, this is the term from which ‘neural network’ is derived [7] (Figure 1).

Figure 1. MLP with one hidden layer.

In some applications, in which more complex decisions are needed and postprocessing is required, the thresholding function of the perceptron is replaced by a so-called ‘sigmoidal nonlinearity’ [8]. Wang and coauthors [9] described the use of three-layer ANNs for the selection of drugs for the treatment of HIV in patients in which previously unidentified genetic mutations of viral DNA (possibly acquired during the treatment) appear to confer drug resistance on the virus. A combination of three drugs is frequently used for the treatment of HIV infection.
As these drugs do not cure the disease but rather suppress the replication of HIV, this treatment regimen has turned the disease into a chronic, manageable disorder requiring life-long drug dosing. However, >200 viral mutations that can affect drug susceptibility/resistance have been identified, and there is unfortunately no simple prediction strategy based on direct drug resistance–mutation relationships. Wang and colleagues [9] examined 351 HIV treatment patients using a three-layer ANN, where the output of the models was the follow-up viral load after treatment. Theoretically, three-layer ANNs can approximate any continuous function [10], so the authors used just one hidden layer (the middle layer of nodes is called the hidden layer because its values are not observed in the training set). The abovementioned authors have, for a number of years, successfully applied ANNs to the prediction of drug resistance on the basis of mutation patterns seen in patients. There are two approaches to estimating a drug’s efficacy: phenotyping and genotyping. The prediction of Lopinavir resistance from genetic aberrations is a good example of the use of ANNs in medicine. For this ANN, 1322 samples were used, of which 267 were drug-resistant and 1055 were drug-susceptible. In total, 117 samples were randomly selected and set aside as an ‘independent test set’. The remaining samples were randomly separated into training and validation subgroups. The genotyping data were coded as either 1 (if a mutation existed) or 0 (if there was no mutation). The phenotyping data used were the fold change in virus resistance. Two models were used: (1) one based on 11 mutations corresponding to susceptibility to Lopinavir and (2) one based on 28 mutations significant to Lopinavir resistance. This approach paved the way for using genetic information on HIV protease for the prediction of drug resistance.
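The data preparation and network shape described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the published model: the mutation names in `MUTATION_PANEL`, the 80/20 training/validation ratio, the number of hidden units and the random (untrained) weights are all hypothetical; only the sample counts (1322 samples, 117 held out) and the 1/0 coding come from the text.

```python
import math
import random

# Hypothetical mutation panel; the models in the text used 11 or 28 mutations.
MUTATION_PANEL = ["mut_1", "mut_2", "mut_3", "mut_4"]


def encode_genotype(observed_mutations):
    """Code each panel mutation as 1 if present in the sample, else 0."""
    observed = set(observed_mutations)
    return [1 if name in observed else 0 for name in MUTATION_PANEL]


def split_samples(samples, n_test, train_fraction=0.8, seed=0):
    """Hold out a random independent test set, then split the rest at random."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    test, rest = shuffled[:n_test], shuffled[n_test:]
    n_train = int(len(rest) * train_fraction)  # ratio is an assumption
    return rest[:n_train], rest[n_train:], test


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Input layer -> one hidden layer of sigmoid units -> one sigmoid output."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Same counts as in the text: 1322 samples, 117 held out for independent testing.
train, validation, test = split_samples(range(1322), n_test=117)

# Untrained, randomly initialised weights for 4 inputs and 3 hidden units.
rng = random.Random(1)
hidden_weights = [[rng.uniform(-1, 1) for _ in MUTATION_PANEL] for _ in range(3)]
output_weights = [rng.uniform(-1, 1) for _ in range(3)]
score = forward(encode_genotype(["mut_2", "mut_4"]), hidden_weights, [0.0] * 3,
                output_weights, 0.0)
print(len(train), len(validation), len(test), round(score, 3))
```

The sketch only runs a forward pass; in a real application the weights would be fitted on the training subgroup, tuned against the validation subgroup, and evaluated once on the independent test set.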
Recently, three-layer ANNs have been used for machine learning-based prediction of the sensitivity of cancer cells to drugs based on genomic profiles of the cell lines and chemical properties of the drug compounds [11]. Data from the Genomics of Drug Sensitivity in Cancer project [12] (http://www.cancerrxgene.org/) were used for the genomic profiling of the cell lines, and a neural network system based on Encog 3.0.1 (http://www.heatonresearch.com/encog) [13] was used. The system included a feed-forward multilayer perceptron (MLP), i.e. one whose connectivity graph does not have any directed loops or cycles; in this MLP, three layers (input, hidden and output) were used. Every perceptron of a lower layer was connected to each perceptron of the next higher layer. The number of input neuron units was defined by the number of features selected. The networks were trained using ‘resilient error backpropagation’ from the Encog program [14, 15]. The performance of this MLP was comparable with that of ‘random forest’ regression models generated from the same training data, and the results indicated the feasibility of using ML approaches in optimizing drug therapies even in the face of noisy data [11].

More recently, ANNs have been used to associate drugs with diseases based on the biological processes that are altered in the disease setting (rather than the individual gene targets involved in the biological process), so-called ‘process pharmacology’ [16]. Drugs were classified in terms of their ability to target the specified gene ontology (GO) biological process and then presented as self-organizing maps (SOMs). The technical details of preparing the SOMs using ANNs were described in a previous set of publications [17–20].
The Drugbank database [21] and the database for annotation, visualization and integrated discovery (DAVID) [22] were used to associate the drugs with biological processes using ‘overrepresentation’ or ‘enrichment’ analyses of GO terms [23]. The following strategy was applied: if a drug was related to a particular gene that was annotated to a specific biological process, then a drug–bioprocess connection was established. Each such interaction was scored, and the sum of these interactions was then used to provide the ‘strength’ of the connection. A special parameter was established to score a number of these connections. The scalar element-wise products of the two matrices, drug–gene and drug–bioprocess, were then calculated. In this way, the authors were able to identify antihypertensive drug classes and subclasses. The drugs were initially classified using empirical pharmacological knowledge into eight classes. The drugs were then classified again by applying the ANN to the GO biological processes associated with each drug. Additional classes of drugs were identified with this approach.

Pivetta and colleagues [24] used an ANN with standard back-propagation to predict the synergism of anticancer drugs. They experimentally determined the cytotoxicity of the drugs alone and in combination on cell lines and then trained the ANN on the results of these experiments. They used 60 combinations, of which 15 formed the validation set. The system helped to evaluate the cytotoxicity of all possible combinations in the space of chosen concentrations.

Support vector machine

Currently, support vector machines (SVMs) are one of the most popular linear classifiers [25]. The function of an SVM can be described as follows: given the labeled feature vectors (x1, y1), …, (xm, ym), a hyperplane is found that separates the positively labeled samples from the negatively labeled samples while ensuring that the closest point in each class is as far away as possible from the hyperplane.
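The ‘as far away as possible’ criterion can be made concrete by computing the geometric margin of candidate hyperplanes. The sketch below does not train an SVM; it only compares two hand-picked separating hyperplanes on made-up 2D points, so the data, the candidates and the function name `geometric_margin` are all illustrative assumptions.

```python
import math


def geometric_margin(weights, bias, points, labels):
    """Smallest signed distance from any labeled point to the hyperplane w.x + b = 0.

    A positive value means every point lies on the correct side.
    """
    norm = math.sqrt(sum(w * w for w in weights))
    return min(
        label * (sum(w * x for w, x in zip(weights, point)) + bias) / norm
        for point, label in zip(points, labels)
    )


# Toy, linearly separable data (hypothetical).
points = [(1.0, 1.0), (2.0, 2.0), (4.0, 4.0), (5.0, 5.0)]
labels = [-1, -1, +1, +1]

# Two hyperplanes that both separate the classes.
candidates = {
    "x1 + x2 = 6": ((1.0, 1.0), -6.0),
    "x1 + x2 = 5": ((1.0, 1.0), -5.0),
}

margins = {
    name: geometric_margin(weights, bias, points, labels)
    for name, (weights, bias) in candidates.items()
}
best = max(margins, key=margins.get)
print(best, round(margins[best], 3))
```

Both candidates classify every point correctly, but the first keeps the closest points of each class farther away, which is the property an SVM optimizes over all possible hyperplanes. SVMs extend this idea beyond linearly separable data.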
SVMs are also known for their ability to perform nonlinear classification [26] (Figure 2).

Figure 2. Schematic structure of an SVM. Reproduced from the open access source [26, 27].

In a recent study, SVMs were used to investigate resistance to paclitaxel and gemcitabine in breast cancer [28]. In this study, the SVM was trained using the Statistics Toolbox in MATLAB and then tested with leave-one-out validation [28]. The SVM was first trained on breast cancer cell lines, and a multifactorial principal component analysis (MFA) was performed. The MFA indicated that expression of the genes targeted by paclitaxel was an indicator of paclitaxel sensitivity, while copy number and expression of the genes targeted by gemcitabine were indicative of gemcitabine sensitivity. Sequential backward feature selection, using the method of Dash and Liu [29], was used to minimize the percentage of misclassified cells. Genes that did not reduce or change the classification error were removed from the SVM (one at a time), with iterations conducted until the removal of a gene resulted in a higher classification error. The SVM excluded 2 of the 49 explored cell lines. Two SVM models were trained using (1) normalized expression values and (2) expression values binned into 10 categories. The SVM was trained on 15 gene variables for paclitaxel (49 cell lines) and 10 variables for gemcitabine (44 cell lines). The trained SVM misclassified 18% of cell lines for paclitaxel and 16% for gemcitabine. The authors found that the mutations were not useful for the SVM function, and they were not used in the final SVM. This is unfortunate because, in general, mutations can be a useful instrument for the stratification of cell lines. A possible reason for this failure is the strategy used for inclusion of the mutation information.
A parameter ‘mutation status’ was set when a gene contained one or more mutations. The pathogenic status of a mutation (whether or not it is deleterious) was determined using SIFT [30], a program based on the conservation of amino acids (AAs) across different species. While this program can, in general, shed some light on the ‘importance’ of a selected AA, it should not be a stringent criterion for defining whether a mutation in a particular AA is deleterious. In addition, information about the number of mutations is not useful either. However, information about activation/deactivation of genes can be useful. While some mutations have no effect on the functional activity of a protein, others can lead to inappropriate activation or inactivation of a gene product; thus, using the ‘activation’ (or ‘inactivation’) status of a gene or gene product is likely to give genomic information much more weight in the function of SVMs.

Random forest

A random forest is a classifier consisting of a collection of tree-structured classifiers {h(x, Ak), k = 1, …}, where the {Ak} are independent identically distributed random vectors, and each tree casts a unit vote for the most popular class at input x [31]. Random forests are an effective tool in prediction [32] (Figure 3).

Figure 3. Three decision trees and a classification obtained from each of them. The final prediction is based on majority voting and will be ‘Class B’ in the above case. Reproduced from the open access source [32].

As pointed out by Breiman [31], because of the law of large numbers, random forests do not overfit, and injecting the right kind of randomness makes them accurate classifiers and regressors.
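The voting step, and one common way of injecting randomness, can be sketched as follows. The three ‘trees’ are hand-written threshold rules standing in for trained decision trees, and the feature names, thresholds and sample values are hypothetical; the sketch only mirrors the majority vote illustrated in Figure 3 and is not a full random forest implementation.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement; one source of per-tree randomness."""
    return [rng.choice(rows) for _ in rows]


# Three hand-written stand-ins for trained decision trees (hypothetical rules).
def tree_1(sample):
    return "Class A" if sample["feature_1"] < 0.5 else "Class B"


def tree_2(sample):
    return "Class B" if sample["feature_2"] > 0.6 else "Class A"


def tree_3(sample):
    return "Class B" if sample["feature_3"] > 0.5 else "Class A"


def forest_predict(trees, sample):
    """Each tree casts one vote; the most popular class wins."""
    votes = Counter(tree(sample) for tree in trees)
    return votes.most_common(1)[0][0], dict(votes)


sample = {"feature_1": 0.3, "feature_2": 0.9, "feature_3": 0.7}
prediction, votes = forest_predict([tree_1, tree_2, tree_3], sample)
print(prediction, votes)
```

In a trained forest, each tree would be fitted on its own bootstrap sample (and usually a random subset of features), so that the individual trees disagree in different places and their errors tend to average out in the vote.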
Currently, random forest is considered one of the most effective ML techniques. For example, Chen and colleagues [33] used a random forest classifier to predict effective drug combinations. Three types of properties were used as learning parameters: (1) chemical interactions between the drugs in a combination (determined using STITCH [34]), (2) protein interactions between the targets of the drugs (determined using STRING8 [35]) and (3) target enrichment based on KEGG pathways. The random forest analysis identified 55 different features that were recognized as important for predicting the best drug combinations [33]. Hansen and colleagues [36] used random forest to predict drug–drug interactions for 220 drug groups in >60 000 prescriptions. Six drug groups with known interactions were rediscovered by this method.

Logistic regression

Simple logistic regression is analogous to linear regression, except that the dependent variable is nominal, not a measurement. One goal is to see whether the probability of getting a particular value of the nominal variable is associated with the measurement variable; the other goal is to predict the probability of getting a particular value of the nominal variable, given the measurement variable [37]. Huang and colleagues [38] used a logistic regression ML model to predict potentially efficacious drug combinations through analysis of the side effects (SEs) of the individual drugs. For model building, clinical phenotypic information (i.e. observed SEs reported in the clinic) was used. Information on the various SEs was extracted from the drug labels included in SIDER [39] and from OFFSIDES [40], which uses data mined from the FDA postmarketing surveillance system FAERS (FDA Adverse Event Reporting System [41]). In total, 239 pairwise drug–drug co-prescriptions for marketed drug combinations were used as the positive set, and 2291 unsafe pairs were used as the negative set.
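As a minimal sketch of simple logistic regression as defined at the start of this section, the code below fits one measurement variable to a binary (nominal) outcome and returns predicted probabilities. It is a hand-rolled illustration, not the Scikit-Learn model used in [38]: the toy data, the learning rate, the number of epochs and the function names are assumptions, and no regularization is applied.

```python
import math


def predict_probability(x, intercept, slope):
    """Probability of the positive class for measurement x."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


def fit_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit intercept and slope by plain gradient ascent on the log-likelihood."""
    intercept, slope = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_intercept = grad_slope = 0.0
        for x, y in zip(xs, ys):
            error = y - predict_probability(x, intercept, slope)
            grad_intercept += error
            grad_slope += error * x
        intercept += learning_rate * grad_intercept / n
        slope += learning_rate * grad_slope / n
    return intercept, slope


# Toy data (hypothetical): the outcome becomes more likely as x grows,
# with some overlap so that the fitted coefficients stay finite.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

intercept, slope = fit_logistic(xs, ys)
probabilities = [predict_probability(x, intercept, slope) for x in xs]
print([round(p, 2) for p in probabilities])
```

The fitted curve answers both goals named above: a positive slope indicates that the probability of the outcome is associated with the measurement, and `predict_probability` gives the predicted probability for any new measurement.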
In the model, each drug SE was considered a feature, and each drug pair was represented as a complex feature vector with the following values for each SE feature: ‘0’ if neither drug had the SE, ‘1’ if one drug of the pair had the SE and ‘2’ if both drugs had the SE. Two other ML algorithms, ‘decision tree’ and ‘naïve Bayes’, were also compared with the logistic regression model and were found to give similar results. It is interesting to note that a ‘rule of three’ was found for the prediction algorithm. According to this rule, pneumonia, hemorrhage rectum and retina bleeding were the top features defining the model performance. If any of these features was present, the SE was strong. Adding more features did not improve the prediction capability of the model, and only these three features were used in general drug combination selection. An important approach was implemented for SE stratification: SEs were classified into two categories, efficacy-related and undesired. The SEs contributing to the therapeutic effects of the drugs were called ‘efficacy-related SEs’; an example is hypoglycemia related to the use of antidiabetic drugs. Thus, the best pair of drugs would share efficacy-related SEs while having a minimum of undesired shared SEs. The logistic regression model used in the study was from the Python Scikit-Learn package [42], and both the penalty and the regularization strength parameters r were taken into consideration in the logistic regression model. The Weka decision tree learner was used for feature selection. Weka (cs.waikato.ac.nz/ml/weka) is an open-source collection of ML algorithms developed by the University of Waikato and is bundled together with tools for preprocessing data to make it more easily understood by the ML algorithms.

Stochastic gradient boosting

Stochastic gradient boosting (SGB), originally presented by Friedman [43], has become a frequently used tool for regression and classification problems.
As discussed by Xu and coauthors [44], the SGB algorithm constructs a prediction model using an ensemble of weak classifiers, typically decision trees. It builds the model in a stage-wise fashion and constructs additive regression models by sequentially fitting a simple parameterized function (base learner) to the current pseudo-residuals by least squares at each iteration. The following features were used for the SGB algorithm: molecular 2D structures, drug structural similarity, anatomical therapeutic similarity, protein–protein interaction, chemical–chemical interaction and disease pathways. Three popular ML algorithms were compared [44], and SGB performed best in comparison with naïve Bayes and SVM. The authors applied this approach to 65 FDA-approved antihypertensive drugs to select possible drug pairs and found that 6 of 17 predicted optimal drug combinations were already used in medical practice.

Bayesian models

Naïve Bayes is a statistical classification method based on the Bayes rule of conditional probability, which states that, given two events A and B, the probability of event A occurring, given that B has already occurred, P(A|B), is given by the equation P(A|B) = P(B|A) P(A) / P(B), where P(A) and P(B) are the probabilities of events A and B, respectively. The Bayesian classifier is called naïve because it naïvely assumes that the features are independent of one another given the class [45]. In the elucidation of drug similarity and possible interactions, the selection of attributes describing the drugs plays a significant role. These can include 2D and 3D structural parameters, the types of atoms and bonds in the drug compounds, the targets of the drugs, etc. Bayesian methods were used for calculation of these attributes. Schuffenhauer and coauthors [46] proposed similarity metrics for assessing ligand similarity for specified target proteins. They introduced the so-called Similog keys, which are counts of atom triplets. Each triplet is characterized by the graph distances and the types of its atoms.
The atom-typing scheme classifies each atom by its function as H-bond donor or acceptor and by its electronegativity and bulkiness. These are suitable types of molecular descriptors (fingerprints) of small molecules, although a number of other molecular descriptors have been suggested since then. The main point of such descriptors is to convert the 2D or 3D structure of a compound into numerical values that may be compared with the values found for other molecules. Glick and colleagues [47] used Bayesian models with probabilities calculated using a Laplacian-corrected estimator, as described earlier [48, 49], to predict targets of drug compounds. As they point out, ‘Machine learning algorithms are largely dependent on the training data sets. The quality of curation of the underlying chemogenomic database is vital to the success of the computational model’. In the case of antineoplastic drugs, the authors successfully predicted the targets of the drugs used, including tubulin, growth factor receptors [epidermal growth factor receptor (EGFR), FGFR, VEGF-R and PDGF-R], cell cycle and cell signaling kinases [PKC, PKA, CDK2, CDK4, Tie-2, adenosine kinase (AK), c-Src, Flt-1, Lck, TMPKmt and CSBP/p38] and some other proteins not included in these classes. Ren et al. [50] called their approach ‘Positive-Unlabeled learning’. It included the sequential use of naïve Bayes and iterative SVM methods. They also used SOMs for clustering to elucidate drug–drug interactions. The authors used the chemical, structural and other attributes of drug compounds to calculate their similarity and, following the concept of Vilar and coauthors [51], predicted drug–drug interactions based on this similarity.

Network-based modeling

During the progression of cancer, genes related to cell proliferation, survival and apoptosis are likely to display genomic alterations.
Zaman and colleagues [52] proposed that genes related to the regulation of proliferation become cell-survival-related driving regulators if they carry non-synonymous mutations or are amplified. The combination of properly selected parameters, such as hub genes (i.e. genes which have connections to a significant number of other genes), cancer-essential genes and the abovementioned driving regulators, made it possible to separate the basal-specific and luminal-specific gene subnetworks of breast cancer. Using the GO-guided Markov Cluster (MCL) algorithm [53] together with their network approach, Wang and coauthors [54] demonstrated that these two types of breast cancer have markedly different functional modules of cancer development. In luminal-specific breast cancer, the main functional module was centered around CDK1/MYC and was related to the regulation of the cell cycle. In basal-specific breast cancer, the first module was centered around P53 for apoptosis regulation (or rather deregulation), while a second module was described, which was centered around EGFR and MAPK/MET (AKT/PIK3CA growth factors), both of which are related to regulation of cell proliferation. These results indicate clear differences in breast cancer subtypes and are important in the design of personalized drug therapy, paving the way to more precise selection of drug targets in breast cancer.

In a separate study, which also used the network approach, Li and colleagues [55] analyzed phosphotyrosine signaling, which is important in cancer. Using an evolutionary trajectory analysis, the authors found that tyrosine kinases can be separated into three specific groups based on their evolutionary origins (i.e. primitive, bilaterian and vertebrate).
These groups of tyrosine kinases differ in their cellular signaling function: tyrosine kinases derived from primitive organisms are generally part of intracellular signaling, those with a bilaterian origin are largely involved in intercellular and extracellular signaling, and those that evolved mainly in vertebrates are more likely to be involved in tissue-specific signaling. The findings of this study were aided by the fact that the authors considered the tyrosine kinase as a functional unit or ‘circuit’ comprising an inter-related triad of core functions: the ‘writer’ (the tyrosine kinase, which phosphorylates the substrate), the ‘reader’ (for example, the SH2 domain, which reads the modification) and the ‘eraser’ (the phosphatase, which removes the phosphorylation modification from the substrate). Such an ‘elementary unit’ approach makes it possible to select much more powerful descriptors for genes/proteins, which would improve a possible ML approach for drug-related predictions.

While the molecular descriptors of drugs and drug-like compounds are comprehensively developed, for example the descriptors in the PaDEL [56, 57] and MOE (CCG, Montreal, Canada) programs, similar descriptors representing ‘hallmarks’ of cancer are neither as comprehensive nor as well defined. The development of these descriptors is an on-going process; nevertheless, we can note the hallmark descriptors that are proposed in the Cancer Hallmark Network Framework [54]. These cancer hallmarks are represented by molecular/signaling subnetworks [52]. A network operational signature descriptor is introduced that can describe the state transitions from genomic alterations to clinical phenotypic profiles. The important concept of self-promoting positive feedback loops during tumorigenesis is also introduced. The authors also show that some Hallmark Networks can trigger genome duplications and eventually tumor development changes.
McGee and coauthors [58] proposed that extremely small regulatory subnetworks, containing as few as three components, can act as positive regulators leading to prolonged activity of the network. A ‘brick’ of such positive regulation of the entire network would then be the feed-forward loop (FFL), consisting of a triad of genes: a ‘target’ gene and two input genes that regulate each other and jointly regulate the target gene. Extremely important, and not explicitly stated by the authors of the abovementioned hallmark papers, is the concept that only the self-activating positive feedback subcircuits of cancer-related signaling, and perhaps of the metabolic networks, must be taken into consideration for the prediction of possible cancer development. These ‘bricks’ and their standard combinations would be useful as prediction descriptors. Such descriptors can significantly reduce the overall number of parameters that need to be taken into consideration in ML prediction schemes. To select cancer hallmark-based gene signatures, Li and colleagues [59] used the cancer-related GO terms as additional descriptors and a special MCC machine learning algorithm. This approach helped to segregate ‘driver’ mutations in genes from ‘passenger’ genes, and the signature sets appear to have a high predictive value for patients’ clinical outcomes.

DL multilayers ANN

Deep convolution neural networks

As pointed out by Albelwi and Mahmood [60], convolution neural networks (CNNs) were developed using the concept of the mammalian visual cortex as presented in Hubel and Wiesel’s model [61] (Figure 4).

Figure 4. The structure of a CNN, consisting of convolutional, pooling and fully connected layers. Reproduced from the open access source [60].
Recently, Preuer [62] described the use of DL for the optimization of drug combinations, including the sets of parameters that were applied to the input of the multilevel DL neural network (DLNN). The selection of the proper parameters is one of the main problems in ML systems. In the described multilayered neural network (MNN), chemical and biomedical data are applied to the input. In addition to the usual molecular descriptors of drug compounds, covering both the 2D and 3D structures of chemical compounds, the dose response of the drug [EC50, the drug concentration at which half of the maximum effect (cell death in this case) is reached] was also included as a molecular descriptor, because actual experiments with cell lines and drug combinations are described [62]. The biomedical data (also termed ‘molecular data’ by the author) encompassed several hallmarks of cancer cells, including activating invasion and metastasis, inducing angiogenesis, enabling replicative immortality, resisting cell death, sustaining proliferation signaling, evading growth suppressors, deregulating cellular energetics and avoiding immune destruction [63, 64]. Biomedical parameters, such as point mutations; small-scale insertions, deletions and duplications; copy number variations (CNVs); and DNA methylation, were also included. The MNN was trained to predict a synergy score describing the difference between the observed effect of a drug combination and the simple addition of the effects of the participating drugs. In real life, combining two drugs can often lead to effects beyond the simple addition of their impacts. Drugs in combination can have completely independent, additive, synergistic or antagonistic impacts. One of the widely used models for calculating drug synergy is the Loewe additivity model [65]. There are also more recent models, such as the Bliss independent action model [66] and its modification, the Berenbaum model [67].
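The two reference models named above can be written down in their textbook forms. The sketch below is not the scoring procedure of [62] or of any specific program: the function names, the tolerance used to call a combination additive and all numerical values are hypothetical, and effects are assumed to be expressed as fractions between 0 and 1.

```python
def bliss_expected(effect_a, effect_b):
    """Expected fractional effect (0 to 1) if the two drugs act independently."""
    return effect_a + effect_b - effect_a * effect_b


def bliss_excess(observed, effect_a, effect_b):
    """Observed minus expected effect; positive values suggest synergy."""
    return observed - bliss_expected(effect_a, effect_b)


def loewe_index(dose_a, dose_b, alone_a, alone_b):
    """Loewe combination index.

    alone_a and alone_b are the doses of each drug that, given alone, produce
    the same effect as the combination (dose_a, dose_b).
    """
    return dose_a / alone_a + dose_b / alone_b


def classify(index, tolerance=0.05):
    """Label a Loewe index: 1 is additive, below 1 synergistic, above 1 antagonistic."""
    if index < 1 - tolerance:
        return "synergistic"
    if index > 1 + tolerance:
        return "antagonistic"
    return "additive"


# Hypothetical measurements: fraction of cells killed by each drug and by the pair.
excess = bliss_excess(observed=0.75, effect_a=0.30, effect_b=0.40)
index = loewe_index(dose_a=2.0, dose_b=5.0, alone_a=10.0, alone_b=20.0)
print(round(excess, 2), round(index, 2), classify(index))
```

In this made-up example both criteria point the same way: the combination kills more cells than independent action predicts, and it needs less than the additive dose of each drug to reach the same effect. The two models can disagree in practice, which is one reason several reference models remain in use.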
The synergy score was also calculated using the Combenefit program [68]. Vougas and colleagues [69] recently used DLNNs [70] enhanced by Bagging Ensemble Learning [71] for the prediction of drug response in cancer. The studied sets included 689 cancer cell lines and 139 therapeutic compounds. The Genomics of Drug Sensitivity in Cancer (GDSC) [12] set was used as the source of drug response data. Five main parameters (i.e. tissue of origin, gene expression, mutation status, CNV and drug response) were used to generate a comprehensive rule set containing all tissue-to-gene, tissue-to-drug, gene-to-gene, gene-to-drug and drug-to-drug associations. Owing to computer power limitations, only tissue-to-drug, gene-to-drug and drug-to-drug associations were used. H2O.ai (http://www.h2o.ai/), a cluster-ready DLNN framework suited to high-performance computers, was used for modeling. The Standardiser program [72] was also used to provide uniform notations for all the compounds that came from different sources. Finally, PaDEL-Descriptor, an open-source software package [56, 57], was used to calculate the molecular descriptors of the drugs.
Recurrent neural networks
Proposed in 1989 by Williams and Zipser [73], recurrent neural networks (RNNs) are particularly suitable for analyzing data streams and are useful when the output depends on previous computations [74]. The long short-term memory unit (LSTM) is a variation of the RNN proposed by Hochreiter and Schmidhuber [75]. LSTM is convenient for applications with long time lags of unknown size between important events [73–75]. It has been used for the analysis of patient data histories. Proper classification of a diagnosis based on patient history is difficult. Episodes differ in length, ranging from a couple of hours to several months, and observations and laboratory tests are irregular. In addition, for cancer patients, the treatments are changed on an irregular basis.
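How a recurrent unit can consume episodes of different length can be sketched as follows. This is a minimal, untrained illustration of a single LSTM cell with one hidden unit, not the model of any work cited here. The weights in PARAMS are hypothetical, and feeding the time gap since the previous observation as a second input feature is an assumption made only to show one way of representing irregular sampling.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step for a single hidden unit.

    x is a sequence of input features; params maps each gate name to
    (input weights, recurrent weight, bias).
    """
    def gate(name, squash):
        w_x, w_h, b = params[name]
        z = sum(w * xi for w, xi in zip(w_x, x)) + w_h * h_prev + b
        return squash(z)

    i = gate("input", sigmoid)         # how much new information to write
    f = gate("forget", sigmoid)        # how much of the old cell state to keep
    o = gate("output", sigmoid)        # how much of the cell state to expose
    g = gate("candidate", math.tanh)   # candidate cell update
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c


def encode_episode(observations, params):
    """Run the cell over one episode of any length and return the final hidden state."""
    h, c = 0.0, 0.0
    for x in observations:
        h, c = lstm_step(x, h, c, params)
    return h


# Hypothetical, untrained weights shared by all gates for brevity.
PARAMS = {name: ([0.5, -0.1], 0.3, 0.0)
          for name in ("input", "forget", "output", "candidate")}

# Each observation: (lab value, days since the previous observation); values are made up.
short_episode = [(0.8, 0.1), (1.1, 1.5)]
long_episode = [(0.8, 0.1), (0.9, 3.0), (1.4, 0.5), (1.2, 7.0), (0.7, 2.0)]

if __name__ == "__main__":
    print(encode_episode(short_episode, PARAMS))
    print(encode_episode(long_episode, PARAMS))
```

Both episodes are reduced to a hidden state of the same size, which a downstream classifier could then use regardless of episode length.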
Lipton and colleagues [76] successfully used LSTM to recognize patients’ diagnoses, training the program on time series with highly irregular time points and lab measurements; the results potentially provide a means for more precise administration of combination therapy. Lusci and colleagues [77] studied the opportunities to build aqueous solubility predictors that would outperform the current methods. They created an original method that uses DAG-RNNs (directed acyclic graph recursive neural networks) to describe undirected graph-based systems. The descriptors used were logP, the first-order valence connectivity index, delta chi and information content. They validated this approach, UG-RNN (undirected graph recursive neural networks), on sets of >1000 molecules and showed that it is a strong method that, in some variants, gives better predictions than existing methods.
Deep belief networks
The deep belief network (DBN) was proposed by Hinton and colleagues [78] and, as pointed out by Ravi and colleagues [74], can be described as a composition of restricted Boltzmann machines (RBMs) with undirected connections at the top two layers and directed connections in the lower layers [79] (Figure 5).
Figure 5. Deep belief network with three hidden layers organizing three RBMs; h, hidden layer; v, visible layer. Reproduced from the open-access source [85].
Ibrahim and colleagues [80] used DBNs for multilevel feature selection from gene and miRNA data. The results showed that DBN outperformed classical feature selection methods on these data.
Ghaisani and colleagues [81], using clinical and microarray analysis data, demonstrated that a combined structure of DBN and Bayesian network (BN), called DBN-BN, outperformed traditional ML techniques such as SVM and k-nearest neighbor in predictions of patient overall survival (OS) and disease-free survival. The authors state [81] that the combined DBN-BN approach in such an analysis outperformed the approach of Khademi and Nedialkov [82], in which the clinical model is constructed using BN, while the microarray model is constructed using DBN. One development of DBN is the convolutional deep belief network (CDBN) [83], which is similar to a CNN but is trained in a manner more similar to a DBN, in this way exploiting the advantages of both methods [84]. Cao and coauthors [85] developed a DBN-based method for assessing the quality of protein models, which performs better than SVM. Protein structure prediction is important for the assessment of possible drug binding for combination therapy.
Deep Boltzmann machine
The deep Boltzmann machine (DBM) was proposed by Salakhutdinov and Hinton [87] and consists of n layers of neurons. Usually, the states of the neurons are taken to be binary, x_i ∈ {0,1}, indicating whether a unit is ‘on’ or ‘off’, but continuous-valued, rectified linear units can also be used [86, 87]. The schemes of general and restricted Boltzmann machines [88] are presented in Figure 6.
Figure 6. Left: a general Boltzmann machine. The top layer shows stochastic binary hidden units, and the bottom layer shows stochastic binary visible units. Right: a restricted Boltzmann machine, in which there are no connections between hidden units or between visible units. Reproduced from the open-access source [88].
The states of each layer are written as vectors, denoted by X^(0), …, X^(n) (together denoted by X). Units x_i and x_j in adjacent layers are connected by symmetric connections with connection weight w_ij (modeling synaptic strength). For each adjacent pair of layers k and k+1, the weights can be combined into a weight matrix W^(k). Each unit also has a bias parameter b_i that determines its activation probability by functioning as a baseline input. In a traditional DBM, there are no lateral connections between units within a layer [89]. One of the main disadvantages of the DBM is the significantly longer computation time it requires, which may be a problem with large data sets [74, 90]. DBM was used for the extraction of latent hierarchical representations from 3D patches of brain images [74, 91], and DBM learning was successfully used for the early diagnosis of Alzheimer’s disease (AD) [92]. The authors used a large data set from the Alzheimer’s Disease Neuroimaging Initiative (ADNI), and cross-validation showed that the proposed method not only differentiates between normal control (NC) and AD images but also performs well in the more challenging case of classifying subjects with mild cognitive impairment [92]. Such results may be used for the creation of a disease stage-oriented combination therapy strategy.
Deep autoencoder learning
In general, deep autoencoder (DA) refers to symmetric DBNs, which contain ‘encoder’ and ‘decoder’ parts [93]. The layers are restricted Boltzmann machines (Figure 6, right and Figure 7).
Figure 7. The three input values are encoded to two feature variables. Pretraining defines the weight matrices W1 and W2. Reproduced from the open-access source [93].
The DA technique was used by Li and colleagues [94] for template-based protein tertiary structure prediction. They used a ‘deep learning stacked denoising autoencoder’ version called PRSDA. In this study, the 3D coordinates of four backbone atoms of each residue were used as parameters for the model, and homology models were used for training the weights of the PRSDA model. Automatic chemical design of drug molecules was proposed using a pair of neural networks trained together as an autoencoder [95]. The ‘deep learning stacked autoencoder’ method was successfully used for the prediction of drug–target interactions based on protein sequence parameters and substructure fingerprint information of the compounds [96]. Five-fold cross-validation demonstrated strong performance on a set of real examples, with accuracy up to 94%. A combination of a DL stacked autoencoder and the ‘learning algorithm biased support vector machine’ (BSVM) was successfully used for the prediction of drug protein targets. As descriptors, the authors used properties of the amino acids (AAs) of possible target proteins, including tiny, small, aromatic, aliphatic, polar, nonpolar, charged and basic; they also included signal peptide cleavages, transmembrane helices, low complexity regions, N-glycosylation and O-glycosylation as descriptors. In total, using 39 properties as descriptors, Wang and colleagues [96] demonstrated high efficiency in predicting drug–target interactions using a stacked autoencoder DNN. To describe the compounds, the authors used 881 2D feature descriptors that can be downloaded from the PubChem website.
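The encoder and decoder structure described in this section, with three input values encoded to two feature variables through weight matrices W1 and W2, can be sketched as a forward pass. This is a minimal illustration under stated assumptions: the weights are hypothetical and untrained, sigmoid activations are assumed, and the decoder weights are simply the transpose of the encoder weights, which is one common choice for symmetric autoencoders and not necessarily the one used in the cited works.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer(weights, biases, inputs):
    """One fully connected sigmoid layer: sigmoid(W * inputs + b)."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


# Hypothetical, untrained weights; in practice they come from pretraining and fine-tuning.
W1 = [[0.4, -0.2, 0.1],
      [-0.3, 0.5, 0.2]]   # encoder: 3 inputs -> 2 features
B1 = [0.0, 0.0]
W2 = [[0.4, -0.3],
      [-0.2, 0.5],
      [0.1, 0.2]]         # decoder: 2 features -> 3 outputs (transpose of W1)
B2 = [0.0, 0.0, 0.0]


def encode(x):
    """Compress the input vector to the feature (code) layer."""
    return layer(W1, B1, x)


def decode(code):
    """Reconstruct the input from the feature layer."""
    return layer(W2, B2, code)


def reconstruction_error(x):
    """Mean squared error between the input and its reconstruction."""
    x_hat = decode(encode(x))
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


if __name__ == "__main__":
    sample = [1.0, 0.0, 1.0]
    print(encode(sample))
    print(reconstruction_error(sample))
```

Training would adjust W1, W2 and the biases to minimize the reconstruction error, after which the two-value code can serve as a compact descriptor for a downstream predictor.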
To describe the drug–target interactions, the authors used a set of 5127 drug–target pairs from the following databases: SuperTarget [97], DrugBank [21], KEGG BRITE [98] and BRENDA [99], collected by Yamanishi and colleagues [100, 101]. On the gold standard data sets (enzymes, ion channels, G protein-coupled receptors and nuclear receptors), the method resulted in AUC values of 94.25% (83.2%), 91.10% (79.9%), 87.43% (85.7%) and 81.76% (82.4%), respectively.
Selection of parameters for combination therapy ML
In general, the parameters needed for the design of an ML system for combination therapy can be divided into four groups. The first group, ‘physical and chemical parameters of compounds-based’, can include a significant number of 2D and 3D parameters. For example, the PaDEL software [56, 57] provides >1000 descriptors related to such compound parameters. The second group, ‘biochemical result based’, includes activity changes of the target biomolecule (protein, DNA, RNA, etc.) when the drug(s) are administered (e.g. changes in signaling and/or metabolic pathway activities). The third group, ‘cell-related results based’, includes changes in cell motility, proliferation, movement, etc., after drug administration. The fourth group, ‘medical results based’, includes the initial characteristics of patients (gender, age, preliminary history of medications, etc.), as well as changes in the patients’ state (PFS, OS, etc.) after treatment. These parameters also include changes in genomic, proteomic and metabolomic profiles after drug administration. Other data on the patients’ initial and post-treatment state can also be included, for example, gene aberrations and DNA methylation/acetylation, proteomic and metabolomic profiles, state of the patient’s health and diagnosis. A more detailed description is presented in Table 1.
Table 1.
Parameters that can be used as descriptors in ML models for drug combination therapy efficacy prediction
Descriptors for the combination therapy efficacy prediction with ML systems
Compound-related
    Physical and chemical parameters
        2D parameters
        3D parameters
    Biochemical results (single compounds and combinations of compounds)
        Target biomolecules (protein, DNA, RNA, etc.) reacting with compounds
        Signaling and/or metabolic pathways involved in interactions with compounds
Cell-related
    Cell growth, proliferation, apoptosis, etc., as reaction to compounds
Patient-related
    Initial patients’ characteristics
        Diagnosis
        Genomic profile, including point mutations; small-scale insertions, deletions and duplications; CNVs; and DNA methylation, etc.
        Initial proteomic profile
        Initial metabolomic profile
        Initial histology
    Patients’ reaction to compounds
        Changes in patients’ health including PFS, OS, SEs, etc.
        Proteomic profile after drug administration
        Metabolomic profile after drug administration
        Histology after drug administration
Comparison of ML methods
We compared (Table 2) SVM, MLP neural nets, Bayesian, decision tree, random forest and DNN methods in different biomedical problems. There is no clear leader among the traditional ML algorithms. The SVM method is the best in two cases: the first is related to microarray analysis [102], and the second is related to the prediction of bioactivity of drug-like compounds, although in the latter case it performed somewhat worse than the CNN method [103]. The Bayesian method is the best for feature recognition in ultrasound images [104]. The random forest method was the best in two cases: first, in the recognition of features in MRI images [105], and second, in the prediction of drug-induced nephrotoxicity based on biochemical data [106]. DNN accuracy was, in all cases, better than that of traditional ML methods. In some cases, the improvement was not as great as expected, including in the prediction of bioactivity of protein inhibitors based on biochemical data [103], the recognition of lymph node metastasis from PET scan images [107], the detection of retinal detachment [108], the identification of autism spectrum disorder from brain images [109] and the sequence-based prediction of protein–protein interaction [110]. In several other cases, DNN performed significantly better than traditional ML, including in image-based pulmonary nodule recognition in lung cancer [111], discrimination of breast cancer with microcalcifications on mammography [112] and detection of intracranial hypertension based on ECG and intracranial pressure data [113]. From the abovementioned results, one can conclude that in traditional ML, there is no ‘champion’, and the results depend mostly on proper parametrization and descriptor selection.
DNN performed better than traditional ML, but in each case, it is worth estimating the ratio of resource spending to accuracy improvement.
Drug synergism and antagonism prediction
DREAM: community computational challenge in prediction of drug synergism and antagonism
The competition organized by the DREAM Challenges initiative and the NCI involved 31 scientific teams from many countries. The task was to predict whether two drugs in combination would be synergistic or antagonistic based on their separate impact on the OCI-LY3 human diffuse large B-cell lymphoma (DLBCL) cell line [114]. The organizers of this contest specifically stated that no preliminary training on pairs of compounds known to be synergistic or antagonistic was permitted, clearly to prevent the use of any ML approach. In the context of this review, I think the results of this competition are important for establishing the limits of a possible non-ML approach in this field. Participants were provided with (i) dose–response curves for the viability of OCI-LY3 cells following perturbation with 14 distinct compounds, (ii) gene expression profiles of the same cells, both untreated and treated with each of the 14 compounds, and (iii) the previously reported baseline genetic profile of the OCI-LY3 cell line. The best-performing method, DIGRE (drug-induced genomic residual effect), was based on the hypothesis that if cells are treated sequentially with two compounds, the transcription profiles induced by the first compound affect the outcome of the second compound. These assumptions were based on the previous work of Shah and Schwartz [115] and Recht and colleagues [116]. The second best-performing method (IU_UI-CCBB) was based on the assumption that the activity of a compound can be estimated directly from the genes differentially expressed after the treatment.
Compound synergism or antagonism was defined from the concordance of the expression profiles in both cases. Several of the best methods were statistically significant in the prediction of synergy (37.5 versus 17.5% by random selection) [114]. When the competition arbiters created an integral predictor using all prediction methods used by the participants, the best predictive value was close to 46% sensitivity for synergy and 51% for antagonism. The method by SnuGen, which uses the Master Regulator Inference algorithm (MARINA) [117–119], did not participate in the contest but was still taken into consideration. This MARINA approach was found to have 56% synergy prediction. Their approach, based on the elucidation of ‘Master Regulator’ genes, can be used in ML methods for the selection of valuable descriptors.
Network-based Laplacian regularized least square synergistic drug combination prediction
Prediction of synergism and antagonism of drugs is a valid problem because, given the number of drugs, it is simply not possible to validate all possible combinations in a reasonable time. Many scientists have tried to create computational methods aimed at such a prediction. In the previous section, we described an approach that specifically did not use ML; despite the efforts of 31 teams from around the world, its best result was a sensitivity of about 46% in predicting synergy between two drugs. Applying ML approaches to the problem gives much better results. The method that Chen and colleagues [120] called Network-based Laplacian regularized least square synergistic drug combination prediction (NLLSS) was based on Laplacian regularized least squares (LARLS). In the NLLSS strategy, several types of information are integrated, including known synergistic drug combinations for specific pathogens, drug combinations that do not show synergism, drug–target interactions and drug chemical structure.
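For orientation, the generic form of Laplacian regularized least squares can be written as below. This is a textbook-style sketch and not the exact NLLSS objective of [120]. All symbols are assumptions introduced here: Y holds the known labels (for example, 1 for a known synergistic pair and 0 otherwise), F holds the predicted scores, S is a symmetric similarity matrix between drugs, L is its graph Laplacian and η ≥ 0 is a trade-off parameter. The first term keeps predictions close to the known labels; the second makes similar drugs receive similar scores.

```latex
\min_{F}\; \lVert Y - F \rVert_F^{2} \;+\; \eta\,\operatorname{tr}\!\left(F^{\top} L F\right),
\qquad L = D - S, \quad D_{ii} = \sum_{j} S_{ij}
```

Setting the gradient $2(F - Y) + 2\eta L F$ to zero gives the closed-form solution

```latex
F^{*} = \left(I + \eta L\right)^{-1} Y
```

Because L is positive semidefinite, the matrix I + ηL is always invertible for η ≥ 0.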
Sixty-nine compounds involved in antifungal drug combination experiments were studied. All published experimental studies of drug combinations were collated from public sources. The authors classified compounds as either principal drugs or adjuvant drugs: if one compound in the synergistic combination shows experimental activity but the other does not, then the first compound is considered the principal drug and the second the adjuvant drug. If both compounds in the synergistic pair show experimental activity, or neither one shows activity, then these two compounds are named both principal and adjuvant drugs. If a compound does not have an experimental effect with any of the other compounds, then this compound is named according to its experimental activity. Using the NLLSS approach, the authors achieved a prediction performance of 89% in 10-fold cross-validation.
Rules-based optimization
Drug treatment selection
Rule-based expert systems are ‘a central foundational pillar of artificial intelligence’, as pointed out by Lathrop and Pazzani [121]. These authors describe the simple rules that are used in CTHIV (a rule-based expert computer program, ‘Customized Treatment Strategies for HIV’), a system in which drug treatment recommendations are made using drug-resistant mutations, ranking and weighting based on the antiviral activities of the drugs, overlapping toxicities, relative levels of drug resistance and the proportion of drug-resistant clones in the patient’s HIV quasi-species. The expert system is rule based, and the rules were written based on public information and case studies. For example, one rule in this system is [121, 122]:
IF the value of RT codon number 151 is ATG (= it encodes methionine), THEN infer resistance to AZT, ddI, d4T, and ddC WITH weight = 1.0
The weight in this rule is not a ‘confidence’ as in standard expert systems but rather corresponds to the estimated level of viral resistance to a specific drug.
Weights are in the range of 0.1 (low) to 1 (high) and are defined by publications and/or expert opinions. In total, 55 rules are in the knowledge base and, in the case of HIV, the mutations of viral proteins are taken into consideration. A similar concept, which takes into consideration the mutations of human genes, can be used for the estimation of possible drug resistance. For example, a famous mutation, T790M in EGFR, defines the resistance of cancer cells to reversible tyrosine kinase inhibitors. The authors of CTHIV claim the ability to predict results for one to four drugs for optimal combination therapy. A potential drawback of this approach is that it considers not only real drug-resistant mutants but also ‘nearby’ mutants, that is, HIV genes in the neighborhood of genes having drug-resistant mutations. This can be misleading, as there is high specificity of drug binding to these proteins. The Drug Interaction Knowledge-Base (DIKB) is a knowledge representation system designed to predict DDIs using drug action mechanisms [123]. Its knowledge base includes statements about drugs, drug metabolites and enzymes, whose interactions are modeled using a rule-based theory.
Drug resistance elucidation
There exist a number of rule-based systems that use expert knowledge of HIV resistance mutations in viral proteins. Twenty of the early rule-based systems were devoted to the elucidation of genotypic drug resistance for antiviral therapy in AIDS [121]. Several rules are embedded in such systems. For example, one rule states that the Y181C and E138C mutations in the viral reverse transcriptase cause resistance to etravirine, while the addition of the E138K mutation to Y181C decreases the level of resistance to this drug compared to Y181C alone [124]. Another rule states that the missense mutation N88S induces hypersensitivity to amprenavir [125]. These systems include databases covering all possible combinations of drug-resistance-associated mutations.
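The weighted IF/THEN rules described above can be sketched as a small rule engine. This is a minimal illustration and not the CTHIV implementation. Only the codon 151 rule is taken from the text [121, 122]; the class and function names, the use of the maximum weight when several rules match, and the candidate name "drug_X" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResistanceRule:
    """IF gene codon == triplet THEN infer resistance to drugs WITH weight."""
    gene: str
    codon: int
    triplet: str
    drugs: tuple
    weight: float  # estimated level of resistance, 0.1 (low) to 1.0 (high)


# The single rule quoted in the text; a real knowledge base would hold many more.
RULES = [
    ResistanceRule("RT", 151, "ATG", ("AZT", "ddI", "d4T", "ddC"), 1.0),
]


def infer_resistance(genotype, rules=RULES):
    """Return {drug: weight} for every rule that fires on the genotype.

    genotype maps (gene, codon number) to the observed nucleotide triplet.
    Assumption: when several rules hit the same drug, the highest weight wins.
    """
    resistance = {}
    for rule in rules:
        if genotype.get((rule.gene, rule.codon)) == rule.triplet:
            for drug in rule.drugs:
                resistance[drug] = max(resistance.get(drug, 0.0), rule.weight)
    return resistance


def rank_drugs(candidates, resistance):
    """Order candidate drugs from least to most inferred resistance."""
    return sorted(candidates, key=lambda drug: resistance.get(drug, 0.0))


if __name__ == "__main__":
    patient = {("RT", 151): "ATG"}
    inferred = infer_resistance(patient)
    print(inferred)
    # "drug_X" is a hypothetical candidate with no matching rule.
    print(rank_drugs(["AZT", "drug_X", "ddI"], inferred))
```

Because the rule output is a plain dictionary of weights, it can also be passed to an ML model as additional input features, which is the kind of combination of expert rules and learning discussed in this review.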
Identification of a disease
A total of 20–25% of patients with primary hyperparathyroidism have multigland disease. Proper identification is important for decisions on medical treatment or surgery. Imbus and colleagues [126] studied 2010 patients with primary hyperparathyroidism from a clinical trial. Medical imaging data were used for analysis. A random tree ML classifier had 96.1% predictive accuracy. When a rule-based classifier was added, the accuracy grew to 100%.
Prediction of response to antiretroviral treatment
Prosperi and colleagues [127] studied 3143 treatment change episodes for HIV patients from the EuResist database, which included patient demographics, treatment history and viral genotypes. The initial logistic regression ML model performed better than the rule-based genotypic interpretation system (accuracy 75.6 versus 70.0%) and similarly to the random forest model (76.2%). Nevertheless, when the authors combined the rule-based genotypic interpretation system with additional patient attributes and used this combination as input data for the regression model, the performance of the system increased significantly [127].
Discussion
A number of AI methods are used in combination drug therapy. In many cases, the level of confidence, measured as the fraction of correct predictions, varies between 0.7 and 0.9, which is comparable with most automatic prediction systems. The differences between the various types of ML used for these prediction scores are not too great. ANNs, random forest and SVM all have advantages and disadvantages, and the main problem in using AI for combination therapy is the proper selection of input parameters. It is crucial for the prediction methods that the parameters affecting the quality of the prediction model be used. From this point of view, the work of Dash and Liu [29] on the use of SVM is particularly interesting.
In that study, the authors filter the input parameters and then withdraw one parameter at a time to determine how the effectiveness of the model deteriorates. Drug resistance is most probably the best example in which an AI system should include a combination of expert rules and machine learning. Indeed, the knowledge that some specific mutation leads to resistance to a specific drug is the result of expert knowledge based on biomedical data. When the number of cases containing the same aberrations, or aberrations with a similar functional impact on genes, increases sufficiently, the prediction system will be able to use ML methods.
Deep learning versus traditional machine learning
A common view of DL is that it requires more of everything: more source data, more computational brawn and more memory and storage resources. Shaikh and colleagues [128] note that DL requires a lot of hardware. ‘I have seen people training a simple DL model for days on their laptops (typically without GPUs) which leads to an impression that DL requires big systems to run execute’. Kumar and colleagues [129] confirm this common point of view: ‘Deep learning requires large amount of computational power to train models with these large datasets. Nevertheless, with the cloud and availability of Graphical Processing Units (GPUs), it is becoming possible to build sophisticated deep neural architectures and train them on a large data set on powerful computing infrastructure on the cloud.’ As noted by Cui and colleagues [130], in DL, ‘large multi-layer neural networks are trained without preconceived models to learn complex features from raw input data. With sufficient training data and computing power, DL approaches far outperform other approaches for such tasks. The computation required, however, is substantial; prior studies have reported that satisfactory accuracy requires training large (billion-plus connection) neural networks on 100s or 1000s of servers for days [7, 14]’.
Neural network training is known to be well supported by GPUs but, as noted by Chilimbi and colleagues [131], this approach is only efficient for smaller-scale neural networks that can fit on GPUs attached to a single machine. The challenges of limited GPU memory and inter-machine communication have been identified as major obstacles to the use of GPUs in DL [130]. Yepes and coauthors [132] compared SVM and deep belief networks (DBNs) as classifiers for text categorization in the biomedical domain. They show that DBNs are superior when a large set of training examples is available, with an F-score increase of up to 5%. SVM performance is superior to that of DBNs with smaller data sets. The differences in the best accuracies, even for the larger data set of 7688 input examples, are modest (e.g. the accuracy of SVM is 0.89, while the best for DBN is 0.90). So, when one decides whether to use DL instead of traditional ML methods, he/she must clearly understand all the pros and cons of introducing DL. As shown in Table 2, in only three cases out of eight did DL significantly outperform ML in solving biomedical problems. If one has a lot of training examples and access to GPU-equipped computers or cloud computing, it is most likely that one will have to move to DL; in the case of smaller data sets, one will have to decide on a case-by-case basis.
Table 2.
Comparison of ML methods accuracy for different biomedical problems
Data set | DNN type | Accuracy (%): DNN, SVM, MLP neural nets, Bayesian, decision tree, random forest
Recognition of cancers from microarray analysis (averaged of eight data sets by the author) [102] | | 96.17 84.65 86.81 82.5 84.32
fMRI decoding [105] | | 84 89 87 92
Prediction of rapid progression of atherosclerosis based on analysis of ultrasound images (AUC) [104] | | 71.1 79.7 73.6
Prediction of pain intensity based on MRI [133] | | 91.33 88.83 92.00 95.81
Prediction of drug-induced nephrotoxicity [106] | | 81.6 70.2 87.8
Prediction of bioactivity of inhibitors of seven proteins (averaged by I.F.T.) [103] | CNN | 91.2 90.3 76.3 89.1
Recognition of lymph node metastasis from PET scan images [107], best values | CNN | CNN 87.40 83.15 85.08
Detecting retinal detachment (AUC) [108] | CNN | 98.8 97.6
Identification of autism spectrum disorder from the brain images [109] | DA** | 70.0 65.0 63.0
Sequence-based prediction of protein–protein interaction [110] | SAE* | 97.2 92.0-97.4 90.0
Pulmonary nodule recognition in lung cancer (image-based) [111] | CNN | 78.0 40.0
Discrimination of breast cancer with microcalcifications on mammography [107] | SAE* | 89.7 61.3
Detection of intracranial hypertension based on ECG and intracranial pressure data [113] | CNN; SAE*+CNN | 87.19 73.6; 92.05 73.6
* Stacked autoencoder, ** Deep autoencoder.
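Table 2 mixes two metrics: most rows report classification accuracy, while the rows marked (AUC) report the area under the ROC curve. As a minimal sketch of how the two quantities differ, the Python snippet below computes both from the same predictions. The labels, scores and the 0.5 decision threshold are hypothetical toy values chosen for illustration; they are not taken from any of the cited studies, and the function names are my own.

```python
from itertools import product


def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as one half (the Mann-Whitney formulation of AUC).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = condition present, 0 = condition absent.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.35, 0.2, 0.6, 0.7, 0.1]

# Accuracy depends on a decision threshold (assumed here to be 0.5);
# AUC depends only on how the scores rank positives against negatives.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"accuracy = {acc:.3f}, AUC = {auc:.3f}")
```

Because accuracy changes with the chosen threshold and with class balance while AUC does not, values in the accuracy rows and in the AUC rows of such a table are not directly comparable with each other.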
Existing programs for ML
Here I present a table of the most popular ML programs that can be downloaded and/or used by both beginners and experienced users/programmers (Table 3).
Table 3. ML and DL programs that can be used
Shogun toolbox contains hundreds of various programs related to ML, including, but not limited to, SVM, MLP, random forest, DA and DBN. This is a comprehensive set of programs that requires some knowledge of programming. http://www.shogun-toolbox.org/mission
Mahout: Suite of ML libraries including logistic regression, naïve Bayes, hidden Markov models, k-means clustering and others. Requires programming knowledge. Mahout algorithms are implemented on top of the Apache Hadoop package. http://mahout.apache.org/users/basics/algorithms.html
MLlib, the Apache Spark library, includes a number of ML tools: SVM, logistic regression, naïve Bayes, decision trees, random forest, gradient boosted trees, k-means and other clustering tools. https://spark.apache.org/docs/latest/mllib-guide.html
H2O prediction engine: Open-source ML library known for speed and scalability. Especially good for large volumes of data. Its algorithms include DL (only MLP) and ensemble trees such as XGBoost and random forest. http://docs.h2o.ai/h2o/latest-stable/h2o-docs/welcome.html
Deep Water H2O supports DL CNN and RNN with the use of GPU. It integrates the open-source TensorFlow, MXNet and Caffe packages. https://www.h2o.ai/deep-water/
GoLearn package includes k-NN, ANN, linear and logistic regression models. https://godoc.org/github.com/sjwhitworth/golearn
WEKA is an open-source program that can be downloaded and used without any additional programming. It contains tools for creating the following ML models: naïve Bayes, linear regression, k-NN, decision trees (including random forest), MLP and SVM. The existence of literally tens of examples and tutorials on the Web makes this program useful for beginners, but it can also be used for serious applications. I recently found >10 recent articles noting that the authors use WEKA for various tasks in biomedical science. https://www.cs.waikato.ac.nz/ml/weka/
ConvNetJS is a JavaScript library for training DL models entirely in your browser. It contains traditional neural networks, SVM, regression, CNN and Deep Q Learning. Code is available on GitHub (https://github.com/karpathy/convnetjs) under the MIT license. http://cs.stanford.edu/people/karpathy/convnetjs/ [134]
Caffe [135] is a DL tool. Models can be trained and used without programming, though Python and MATLAB interfaces are available. As noted by Angermueller and colleagues [134], Caffe offers one of the most efficient implementations for CNNs and provides multiple pretrained models for image recognition. RNNs are also implemented. As a downside, custom models need to be written in C++, and Caffe is not optimized for recurrent architectures. http://caffe.berkeleyvision.org/
Theano [136, 137] is well suited for building custom models and offers efficient implementations for RNNs. As noted by Angermueller and colleagues [134], software wrappers such as Keras (https://github.com/fchollet/keras) or Lasagne (https://github.com/Lasagne/Lasagne) allow building networks from existing components and reusing pretrained networks. The major drawback of Theano is frequently long compile times when building larger models.
TensorFlow was created by Google to replace Theano, and the two libraries are similar. RNN and CNN DL models can be created. Because of the algorithms used, it is significantly slower than other DL packages, but the level of user support is significantly greater. https://www.tensorflow.org/
Torch7 supports running ML algorithms on GPU, which makes it convenient in terms of execution speed. It can be used for creating RNN models and autoencoders, along with k-means and PCA. http://torch.ch/
Key Points
ML strategies are successful in drug design.
The main problem in using AI for combination therapy is the proper selection of input parameters.
The combination of rule-based and ML methods is promising in combination therapy.
Funding
This article is partially supported by CureMatch Inc.
Igor F. Tsigelny is an expert in structural biology, molecular modeling, bioinformatics, structure-based drug design and personalized medicine. He has published >200 articles, 4 scientific books and around 15 patents. The book ‘Protein Structure Prediction: Bioinformatic Approach’ that he edited has been called ‘The Bible of all current prediction techniques’ by BioPlanet Bioinformatics Forums.
His computational study of molecular mechanisms of Parkinson’s disease was included in the US Department of Energy publication ‘Decade of Discovery’, which describes the best computational studies of the decade 1999–2009. He is a Research Professor at UC San Diego and CTO of CureMatch Inc. (San Diego).
References
1 Calzolari D, Bruschi S, Coquin L. Search algorithms as a framework for the optimization of drug combinations. PLoS Comput Biol 2008; 4(12): e1000249.
2 Calzolari D, Paternostro G, Harrington PL. Selective control of the apoptosis signaling network in heterogeneous cell populations. PLoS One 2007; 2(6): e547.
3 Shortliffe EH, Buchanan B. A model of inexact reasoning in medicine. Math Biosci 1975; 23(3–4): 351–79.
4 Shortliffe EH, Axline SG, Buchanan BG, et al. An artificial intelligence program to advise physicians regarding antimicrobial therapy. Comput Biomed Res 1973; 6(6): 544–60. http://dx.doi.org/10.1016/0010-4809(73)90029-3
5 Poli R, Cagnoni S, Livi R, et al. A neural network expert system for diagnosing and treating hypertension. Computer 1991; 24(3): 64–71. http://dx.doi.org/10.1109/2.73514
6 Barry DW, Underwood CS, McCreedy BJ, et al. US Patent 6188988.
7 Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: machine learning in Python. J Mach Learn Res 2011; 12: 2825–30.
8 Vidyasagar M. Identifying predictive features in drug response using machine learning: opportunities and challenges. Annu Rev Pharmacol Toxicol 2015; 55(1): 15–34.
http://dx.doi.org/10.1146/annurev-pharmtox-010814-124502
9 Wang D, Larder BA, Revell A, et al. A neural network model using clinical cohort data accurately predicts virological response and identifies regimens with increased probability of success in treatment failures. Antiviral Therapy 2003; 8: S112.
10 Bishop CM. Neural Networks for Pattern Recognition. Oxford: Clarendon Press, 1995.
11 Menden MP, Iorio F, Garnett M, et al. Machine learning prediction of cancer cell sensitivity to drugs based on genomic and chemical properties. PLoS One 2013; 8(4): e61318.
12 Garnett MJ, Edelman EJ, Heidorn SJ, et al. Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature 2012; 483(7391): 570–5. http://dx.doi.org/10.1038/nature11005
13 Heaton J. Programming Neural Networks with Encog3 in Java. St. Louis: Heaton Research, Inc., 2011.
14 Riedmiller M, Braun H. A direct adaptive method for faster backpropagation learning: the RPROP algorithm. In: Proceedings of the IEEE International Conference on Neural Networks. 1993, 586–91.
15 Dosovitskiy A, Fischer P, Springenberg JT, et al. Discriminative unsupervised feature learning with exemplar convolutional neural networks. IEEE Trans Pattern Anal Mach Intell 2016; 38(9): 1734–47. http://dx.doi.org/10.1109/TPAMI.2015.2496141
16 Lötsch J, Ultsch A. Process pharmacology: a pharmacological data science approach to drug development and therapy. CPT Pharmacometrics Syst Pharmacol 2016; 5(4): 192–200.
17 Ultsch A. Maps for visualization of high-dimensional data spaces. In: Proceedings of Workshop on Self-Organizing Maps. Kyushu, Japan: WSOM, 2003, 225–30.
18 Ultsch A, Siemon HP, eds. Kohonen’s self-organizing feature maps for exploratory data analysis.
In: Proceedings of International Neural Networks Conference (INNC 1990). Dordrecht, Netherlands: Kluwer, 1990.
19 Lötsch J, Ultsch A. Exploiting the structures of the U-Matrix. In: Villmann T, Schleif FM, Kaden M, Lange M (eds). Advances in Intelligent Systems and Computing. Heidelberg, Germany: Springer, 2014, 248–57.
20 Ultsch A, Moerchen F. Databionic ESOM tools 2005. http://databionic-esom.sourceforge.net/devel.html
21 Wishart DS, Knox C, Guo AC, et al. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res 2006; 34(Database issue): D668–72.
22 Huang DW, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 2009; 4(1): 44–57. http://dx.doi.org/10.1038/nprot.2008.211
23 Ashburner M, Ball CA, Blake JA, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 2000; 25(1): 25–9.
24 Pivetta T, Isaia F, Trudu F, et al. Development and validation of a general approach to predict and quantify the synergism of anti-cancer drugs using experimental design and artificial neural networks. Talanta 2013; 115: 84–93. http://dx.doi.org/10.1016/j.talanta.2013.04.031
25 Cortes C, Vapnik VN. Support vector networks. Mach Learn 1995; 20(3): 273–97. http://dx.doi.org/10.1007/BF00994018
26 Zhao M, Li Z, He W. Classifying four carbon fiber fabrics via machine learning: a comparative study using ANNs and SVM. Appl Sci 2016; 6(8): 209. http://dx.doi.org/10.3390/app6080209
27 Li H, Tang X, Wang R, et al.
Comparative study on theoretical and machine learning methods for acquiring compressed liquid densities of 1,1,1,2,3,3,3-heptafluoropropane (R227ea) via Song and Mason equation, support vector machine, and artificial neural networks. Appl Sci 2016; 6(1): 25. http://dx.doi.org/10.3390/app6010025
28 Dorman SN, Baranova K, Knoll JH, et al. Genomic signatures for paclitaxel and gemcitabine resistance in breast cancer derived by machine learning. Mol Oncol 2016; 10(1): 85–100. http://dx.doi.org/10.1016/j.molonc.2015.07.006
29 Dash M, Liu H. Feature selection for classification. Intell Data Anal 1997; 1(1–4): 131–156.
30 Ng PC, Henikoff S. SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Res 2003; 31(13): 3812–14.
31 Breiman L. Random forests. Mach Learn 2001; 45(1): 5–32. http://dx.doi.org/10.1023/A:1010933404324
32 Verikas A, Vaiciukynas E, Gelzinis A, et al. Electromyographic patterns during golf swing: activation sequence profiling and prediction of shot effectiveness. Sensors 2016; 16(4): 592.
33 Chen L, Li BQ, Zheng MY, et al. Prediction of effective drug combinations by chemical interaction, protein interaction and target enrichment of KEGG pathways. Biomed Res Int 2013; 2013: 723780.
34 Kuhn M, von Mering C, Campillos M, et al. STITCH: interaction networks of chemicals and proteins. Nucleic Acids Res 2008; 36(Database issue): D684–8.
35 Jensen LJ, Kuhn M, Stark M, et al. STRING 8: a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res 2009; 37(Database issue): D412–16.
36 Hansen PW, Clemmensen L, Sehested TS, et al.
Identifying drug–drug interactions by data mining. Circ Cardiovasc Qual Outcomes 2016; 9(6): 621–8. http://dx.doi.org/10.1161/CIRCOUTCOMES.116.003055
37 McDonald JH. Handbook of Biological Statistics. http://www.biostathandbook.com/simplelogistic.html
38 Huang H, Zhang P, Qu XA, et al. Systematic prediction of drug combinations based on clinical side-effects. Sci Rep 2014; 4: 7160.
39 Kuhn M, Campillos M, Letunic I, et al. A side effect resource to capture phenotypic effects of drugs. Mol Syst Biol 2010; 6: 343.
40 Tatonetti NP, Ye PP, Daneshjou R, et al. Data-driven prediction of drug effects and interactions. Sci Transl Med 2012; 4(125): 125ra131.
41 http://www.fda.gov/Drugs/GuidanceComplianceRegulatoryInformation/Surveillance/AdverseDrugEffects/
42 Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: machine learning in Python. J Mach Learn Res 2011; 12: 2825–30.
43 Friedman JH. Stochastic gradient boosting. Comp Stat Data Anal 2002; 38(4): 367–78. http://dx.doi.org/10.1016/S0167-9473(01)00065-2
44 Xu Q, Xiong Y, Dai H, et al. PDC-SGB: prediction of effective drug combinations using a stochastic gradient boosting algorithm. J Theor Bio 2017; 417: 1–7. http://dx.doi.org/10.1016/j.jtbi.2017.01.019
45 Glick M, Jenkins JL, Nettles JH, et al. Enrichment of high-throughput screening data with increasing levels of noise using support vector machines, recursive partitioning, and Laplacian-modified naive Bayesian classifiers. J Chem Inf Model 2006; 46(1): 193–200. http://dx.doi.org/10.1021/ci050374h
46 Schuffenhauer A, Floersheim P, Acklin P, et al.
Similarity metrics for ligands reflecting the similarity of the target proteins. J Chem Inf Comput Sci 2003; 43(2): 391–405. http://dx.doi.org/10.1021/ci025569t
47 Nidhi, Glick M, Davies JW, et al. Prediction of biological targets for compounds using multiple-category Bayesian models trained on chemogenomics databases. J Chem Inf Model 2006; 46: 1124–33. http://dx.doi.org/10.1021/ci060003g
48 Xia X, Maliski EG, Gallant P, et al. Classification of kinase inhibitors using a Bayesian model. J Med Chem 2004; 47(18): 4463–70. http://dx.doi.org/10.1021/jm0303195
49 Glick M, Klon AE, Acklin P, et al. Enrichment of extremely noisy high-throughput screening data using a naïve Bayes classifier. J Biomol Screening 2004; 9(1): 32–6.
50 Hameed PN, Verspoor K, Kusljic S, et al. Positive-unlabeled learning for inferring drug interactions based on heterogeneous attributes. BMC Bioinformatics 2017; 18(1): 140. http://dx.doi.org/10.1186/s12859-017-1546-7
51 Vilar S, Uriarte E, Santana L, et al. Similarity-based modeling in large-scale prediction of drug-drug interactions. Nat Protoc 2014; 9(9): 2147–63. http://dx.doi.org/10.1038/nprot.2014.151
52 Zaman N, Li L, Jaramillo ML, et al. Signaling network assessment of mutations and copy number variations predict breast cancer subtype-specific drug targets. Cell Rep 2013; 5(1): 216–23. http://dx.doi.org/10.1016/j.celrep.2013.08.028
53 Reimand J, Tooming L, Peterson H, et al. GraphWeb: mining heterogeneous biological networks for gene modules with functional significance. Nucleic Acids Res 2008; 36(Web Server issue): W452–9.
54 Wang E, Zaman N, Mcgee SR, et al. Predictive genomics: a cancer hallmark network framework for predicting tumor clinical phenotypes using genome sequencing data. Semin Cancer Biol 2015; 30: 4–12. http://dx.doi.org/10.1016/j.semcancer.2014.04.002
55 Li L, Tibiche C, Fu C, et al. The human phosphotyrosine signaling network: evolution and hotspots of hijacking in cancer. Genome Res 2012; 22(7): 1222–30. http://dx.doi.org/10.1101/gr.128819.111
56 Yap CW. PaDEL-descriptor: an open source software to calculate molecular descriptors and fingerprints. J Comp Chem 2010; 7: 1466–74.
57 PaDel-descriptor. http://www.yapcwsoft.com/dd/padeldescriptor/
58 McGee SR, Tibiche C, Trifiro M, et al. Network analysis reveals a signaling regulatory loop in pik3ca-mutated breast cancer predicting survival outcome. Genomics Proteomics Bioinformatics 2017; 15(2): 121–9. http://dx.doi.org/10.1016/j.gpb.2017.02.002
59 Li J, Lenferink AEG, Deng Y, et al. Identification of high-quality cancer prognostic markers and metastasis network modules. Nat Commun 2010; 1(34): 1–8.
60 Albelwi S, Mahmood A. A framework for designing the architectures of deep convolutional neural networks. Entropy 2017; 19(6): 242. http://dx.doi.org/10.3390/e19060242
61 Hubel DH, Wiesel TN. Receptive fields and functional architecture of monkey striate cortex. J Physiol 1968; 195(1): 215–43. http://dx.doi.org/10.1113/jphysiol.1968.sp008455
62 Preuer K. Deep learning for drug combinations synergy prediction. Thesis, Johannes Kepler Universität, Linz, 2016.
63 Hanahan D, Weinberg RA. The hallmarks of cancer. Cell 2000; 100(1): 57–70.
http://dx.doi.org/10.1016/S0092-8674(00)81683-9
64 Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell 2011; 144(5): 646–74. http://dx.doi.org/10.1016/j.cell.2011.02.013
65 Loewe S. Die quantitativen probleme der pharmakologie. Ergeb Physiol 1928; 27: 47–187. http://dx.doi.org/10.1007/BF02322290
66 Bliss C. The toxicity of poisons applied jointly. Ann Appl Biol 1939; 26(3): 585–615. http://dx.doi.org/10.1111/j.1744-7348.1939.tb06990.x
67 Berenbaum MC. What is synergy? Pharmacol Rev 1989; 41(2): 93–141.
68 Jodrell D. Combenefit. 2015. http://sourceforge.net/projects/combenefit/ (2 August 2016, date last accessed).
69 Vougas K, Jackson T, Polyzos A, et al. Deep learning and association rule mining for predicting drug response in cancer. A personalised medicine approach. bioRxiv 2017. http://dx.doi.org/10.1101/070490 (19 August 2016, date last accessed).
70 LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015; 521(7553): 436–44. http://dx.doi.org/10.1038/nature14539
71 Breiman L. Bagging predictors. Mach Learn 1996; 24(2): 123–40. http://dx.doi.org/10.1007/BF00058655
72 Atkinson F. Standardiser v0.1.7. 2014. https://github.com/flatkinson/standardiser (11 December 2015, date last accessed).
73 Williams RJ, Zipser D. A learning algorithm for continually running fully recurrent neural networks. Neural Comput 1989; 1(2): 270–80. http://dx.doi.org/10.1162/neco.1989.1.2.270
74 Ravi D, Wong C, Deligianni F, et al. Deep learning for health informatics. IEEE J Biomed Health Inform 2017; 21(1): 4–21. http://dx.doi.org/10.1109/JBHI.2016.2636665
75 Hochreiter S, Schmidhuber J.
Long short-term memory . Neural Comput 1997 ; 9 ( 8 ): 1735 – 80 . http://dx.doi.org/10.1162/neco.1997.9.8.1735 Google Scholar CrossRef Search ADS PubMed 76 Lipton ZC , Kale DC , Elkan C , et al. Learning to diagnose with LSTM recurrent neural networks . arXiv . arXiv: 1511.03677. 77 Lusci A , Pollastri G , Baldi P. Deep architectures and deep learning in chemoinformatics: the prediction of aqueous solubility for drug-like molecules . J Chem Inf Model 2013 ; 53 ( 7 ): 1563 – 75 . http://dx.doi.org/10.1021/ci400187y Google Scholar CrossRef Search ADS PubMed 78 Hinton GE , Osindero S , Teh YW. A fast learning algorithm for deep belief nets . Neural Comput 2006 ; 18 ( 7 ): 1527 – 54 . http://dx.doi.org/10.1162/neco.2006.18.7.1527 Google Scholar CrossRef Search ADS PubMed 79 Hou Y , Wang C , Ji Y. The research of event detection and characterization technology of ticket gate in the urban rapid rail transit . J Softw Eng Appl 2015 ; 8 : 6 – 15 . http://dx.doi.org/10.4236/jsea.2015.81002 Google Scholar CrossRef Search ADS 80 Ibrahim R , Yousri NA , Ismail MA , et al. Multi-level gene/miRNA feature selection using deep belief nets and active learning . Proc Eng Med Biol Soc 2014 ; 2014 : 3957 – 60 . 81 Ghaisani F , Wasito I , Faturrahman M , et al. Prognosis cancer prediction model using deep belief network approach . J Theor Appl Inf Technol 2017 ; 95 ( 20 ): 5369 – 78 . 82 Khademi M , Nedialkov NS. Probabilistic graphical models and deep belief networks for prognosis of breast cancer. In: Proceedings of the IEEE 14th International Conference on Machine Learning and Applications (ICMLA 2015). 2015 , 727–32. 83 Lee H , Grosse R , Ranganath R , et al. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In: Proceedings of the 26th Annual International Conference on Machine Learning. 2009 , 609–16. 84 Li H , Grosse R , Rengana R , et al. 
Unsupervised learning of hierarchical representation with convolutional deep belief networks . Comm of ACM 2011 ; 54 : 95 – 103 . Google Scholar CrossRef Search ADS 85 Cao R , Bhattacharya D , Hou J , et al. DeepQA: improving the estimation of single protein model quality with deep belief networks . BMC Bioinformatics 2016 ; 17 ( 1 ): 2 – 9 . Google Scholar PubMed 86 Nair V , Hinton GE. Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10). 2010 , 807–14. Haifa, Israel: Omnipress. http://www.icml2010.org/papers/432.pdf. 87 Salakhutdinov R , Hinton GE. Deep Boltzmann machines. In: Proceedings of the 12th International Conference on Artificial Intelligence and Statistics. 2009 , 448–55. ACM 2009. 88 Keyvanrad MA , Homayoonpoor M. A brief survey on deep belief networks and introducing a new object oriented MATLAB toolbox (DeeBNet V2.0) . arXiv . arXiv: 1408.3264 [cs.CV] https://www.researchgate.net/publication/264790642_ (11 December 2017, date last accessed). 89 Reichert DP , Seriès P , Storkey AJ. Charles bonnet syndrome: evidence for a generative model in the cortex? PLoS Comput Biol 2013 ; 9 ( 7 ): e100313 . Google Scholar CrossRef Search ADS 90 Guo Y , Liu Y , Oerlemans A , et al. Deep learning for visual understanding: a review . Neurocomput 2016 ; 187 : 27 – 48 . http://dx.doi.org/10.1016/j.neucom.2015.09.116 Google Scholar CrossRef Search ADS 91 Suk HI , Lee SW , Shen D. Hierarchical feature representation and multimodal fusion with deep learning for ad/mci diagnosis . Neuroimage 2014 ; 101 : 569 – 82 . http://dx.doi.org/10.1016/j.neuroimage.2014.06.077 Google Scholar CrossRef Search ADS PubMed 92 Ortiz A , Munilla J , Górriz JM , Ramírez J. Ensembles of deep learning architectures for the early diagnosis of the Alzheimer's disease . Int J Neur Syst 2016 ; 26 ( 7 ): 1650025 . Google Scholar CrossRef Search ADS 93 Graff P , Feroz F , Hobson MP , Lasenby A. 
SKYNET: an efficient and robust neural network training tool for machine learning in astronomy . Mon Not Roy Astron Soc 2014 ; 441 ( 2 ): 1741 – 59 . arXiv: 1309.0790 [astro-ph.IM] Google Scholar CrossRef Search ADS 94 Li H , Lyu Q , Cheng J , et al. A tempate-based protein structure reconstruction method using deep autoencoder learning . J Proteomics Bioinform 2016 ; 9 ( 12 ): 306 – 13 . Google Scholar CrossRef Search ADS PubMed 95 Gomez-Bombarelli R , Duvenaud D , Miguel J. Automatic chemical design using a data-driven continuous representation of molecules . arXiv . arXiv: 1610.02415v2 [cs.LG] 6 Jan 2017 96 Wang L , You ZH , Chen X. A computational-based method for predicting drug-target interactions by using stacked autoencoder deep neural network . J Comput Biol 2017 , in press. 10.1089/cmb.2017.0135. 97 Gunther S , Kuhn M , Dunkel M , et al. SuperTarget and matador: resources for exploring drug-target relationships . Nucl Acids Res 2008 ; 36(Database issue) : D919 – 22 . 98 Kanehisa M , Goto S , Hattori M , et al. From genomics to chemical genomics: new developments in KEGG . Nucl Acids Res 2006 ; 34 ( 90001 ): D354 – 7 . Google Scholar CrossRef Search ADS PubMed 99 Schomburg I , Chang A , Ebeling C , et al. BRENDA, the enzyme database: updates and major new developments . Nucl Acids Res 2004 ; 32(Database issue) : D431 – 3 . Google Scholar CrossRef Search ADS 100 Yamanishi Y , Araki M , Gutteridge A , et al. Prediction of drug-target interaction networks from the integrationof chemical and genomic spaces . Bioinformatics 2008 ; 24 ( 13 ): I232 – 40 . Google Scholar CrossRef Search ADS PubMed 101 Yamanishi Y , Kotera M , Kanehisa M , et al. Drug-target interaction prediction from chemical, genomic and pharmacological data in an integrated framework . Bioinformatics 2010 ; 26 ( 12 ): i246 – 54 . Google Scholar CrossRef Search ADS PubMed 102 Pirooznia M , Yang JY , Yang MQ , Deng Y. 
A comparative study of different machine learning methods on microarray gene expression data . BMC Genomics 2008 ; 9(Suppl 1) : S13 . Google Scholar CrossRef Search ADS PubMed 103 Koutsoukas A , Monaghan KJ , Li X , Huan J. Deep-learning: investigating deep neural networks hyper-parameters and comparison of performance to shallow methods for modeling bioactivity data . J Cheminform 2017 ; 9 ( 1 ): 42 . http://dx.doi.org/10.1186/s13321-017-0226-y Google Scholar CrossRef Search ADS PubMed 104 Hu X , Reaven PD , Saremi A. Machine learning to predict rapid progression of carotid atherosclerosis in patients with impaired glucose tolerance. EURASIP . J Bioinform Syst Biol 2016 ; 1 : 14 . Google Scholar CrossRef Search ADS 105 Douglas PK , Harris S , Yuille A , Cohen MS. Performance comparison of machine learning algorithms and number of independent components used in fMRI decoding of belief vs. disbelief . Neuroimage 2011 ; 56 ( 2 ): 544 – 53 . Google Scholar CrossRef Search ADS PubMed 106 Su R , Li Y , Zink D , Loo LH. Supervised prediction of drug-induced nephrotoxicity based on interleukin-6 and -8 expression levels . BMC Bioinformatics 2014 ; 15(Suppl 16) : S16 . Google Scholar CrossRef Search ADS PubMed 107 Wang H , Zhou Z , Li Y. Comparison of machine learning methods for classifying mediastinal lymph node metastasis of non-small cell lung cancer from 18F-FDG PET/CT . EJNMMI Res 2017 ; 7 : 11 . http://dx.doi.org/10.1186/s13550-017-0260-9 Google Scholar CrossRef Search ADS PubMed 108 Ohsugi H , Tabuchi H , Enno H. Accuracy of deep learning, a machine-learning technology, using ultra–wide-field fundus ophthalmoscopy for detecting rhegmatogenous retinal detachment . Sci Rep 2017 ; 7 : 9425 . http://dx.doi.org/10.1038/s41598-017-09891-x Google Scholar CrossRef Search ADS PubMed 109 Heinsfeld AS , Franco AR , Craddock RC , et al. Identification of autism spectrum disorder using deep learning and the ABIDE dataset . Neuroimage 2018 ; 17 : 16 – 23 . 
http://dx.doi.org/10.1016/j.nicl.2017.08.017 Google Scholar CrossRef Search ADS PubMed 110 Sun T , Zhou B , Lai L , Pei J. Sequence-based prediction of protein-protein interaction using a deep-learning algorithm . BMC Bioinformatics 2017 ; 18 ( 1 ): 277 . http://dx.doi.org/10.1186/s12859-017-1700-2 Google Scholar CrossRef Search ADS PubMed 111 Ciompi F , Chung K , van Riel SJ , et al. Towards automatic pulmonary nodule management in lung cancer screening with deep learning . Sci Rep 2017 ; 7 : 46479 . http://dx.doi.org/10.1038/srep46479 Google Scholar CrossRef Search ADS PubMed 112 Wang J , Yang X , Cai H , et al. Discrimination of breast cancer with microcalcifications on mammography by deep learning . Sci Rep 2016 ; 6 : 27327 . http://dx.doi.org/10.1038/srep27327 Google Scholar CrossRef Search ADS PubMed 113 Quachtran B , Hamilton R , Scalzo F. Detection of intracranial hypertension using deep learning . Proc IAPR Int Conf Pattern Recogn 2016 ; 2016 : 2491 – 6 . Google Scholar PubMed 114 Bansal M , Yang J , Karan C , et al. A community computational challenge to predict the activity of pairs of compounds . Nat Biotechnol 2014 ; 32 ( 12 ): 1213 – 22 . http://dx.doi.org/10.1038/nbt.3052 Google Scholar CrossRef Search ADS PubMed 115 Shah MA , Schwartz GK. Cell cycle-mediated drug resistance an emerging concept in cancer therapy . Clin Cancer Res 2001 ; 7 : 2168 – 81 . Google Scholar PubMed 116 Recht A , Come SE , Henderson IC , et al. The sequencing of chemotherapy and radiation therapy after conservative surgery for early-stage breast cancer . N Engl J Med 1996 ; 334 ( 21 ): 1356 – 61 . http://dx.doi.org/10.1056/NEJM199605233342102 Google Scholar CrossRef Search ADS PubMed 117 Aytes A , Mitrofanova A , Lefebvre C , et al. Cross-species regulatory network analysis identifies a synergistic interaction between FOXM1 and CENPF that drives prostate cancer malignancy . Cancer Cell 2014 ; 25 ( 5 ): 638 – 51 . 
http://dx.doi.org/10.1016/j.ccr.2014.03.017 Google Scholar CrossRef Search ADS PubMed 118 Chen JC , Alvarez MJ , Talos F , et al. Identification of causal genetic drivers of human disease through systems-level analysis of regulatory networks . Cell 2014 ; 159 ( 2 ): 402 – 14 . http://dx.doi.org/10.1016/j.cell.2014.09.021 Google Scholar CrossRef Search ADS PubMed 119 Chudnovsky Y , Kim D , Zheng S , et al. ZFHX4 interacts with the NuRD core member CHD4 and regulates the glioblastoma tumor-initiating cell state . Cell Rep 2014 ; 6 ( 2 ): 313 – 24 . http://dx.doi.org/10.1016/j.celrep.2013.12.032 Google Scholar CrossRef Search ADS PubMed 120 Chen X , Ren B , Chen M , et al. NLLSS: predicting synergistic drug combinations based on semi-supervised learning . PLoS Comput Biol 2016 ; 12 ( 7 ): e1004975 . Google Scholar CrossRef Search ADS PubMed 121 Lathrop RH , Pazzani MJ. Combinatorial optimization in rapidly mutating drug-resistant viruses . J Comb Optim 1999 ; 3 ( 2/3 ): 301 – 20 . Google Scholar CrossRef Search ADS 122 Iversen AK , Shafer RW , Wehrly K , et al. Multidrug-resistant human immunodeficiency type I strains resulting from combination antiretroviral therapy . J Virol 1996 ; 70 ( 2 ): 1086 – 90 . Google Scholar PubMed 123 Boyce R , Collins C , Horn J , et al. Computing with evidence. Part II: an evidential approach to predicting metabolic drug–drug interactions . J Biom Inform 2009 ; 42 : 990 – 1003 . Google Scholar CrossRef Search ADS 124 Xu HT , Oliveira M , Asahchop EL , et al. Molecular mechanism of antagonism between the Y181C and E138K mutations in HIV-1 reverse transcriptase . J Virol 2012 ; 86 ( 23 ): 12983 – 90 . http://dx.doi.org/10.1128/JVI.02005-12 Google Scholar CrossRef Search ADS PubMed 125 Ziermann R , Limoli K , Das K , et al. A mutation in human immunodeficiency virus type 1 protease, n88s, that causes in vitro hypersensitivity to amprenavir . J Virol 2000 ; 74 ( 9 ): 4414 – 18 . 
http://dx.doi.org/10.1128/JVI.74.9.4414-4419.2000 Google Scholar CrossRef Search ADS PubMed 126 Imbus JR , Randle RW , Pitt SC , et al. Machine learning to identify multigland disease in primary hyperparathyroidism . J Surg Res 2017 ; 219 : 173 – 9 . http://dx.doi.org/10.1016/j.jss.2017.05.117 Google Scholar CrossRef Search ADS PubMed 127 Prosperi MC , Altmann A , Rosen-Zvi M , et al. Investigation of expert rule bases, logistic regression, and non-linear machine learning techniques for predicting response to antiretroviral treatment . Antivir Ther 2009 ; 14 ( 3 ): 433 – 42 . Google Scholar PubMed 128 Shaikh F. Analytics Vidhya. https://www.analyticsvidhya.com/blog/2017/05/gpus-necessary-for-deep-learning/. 129 Kumar G , Gronlund CJ , Severtson RS. Introduction to the deep learning virtual machine. Microsoft Azure. https://docs.microsoft.com/en-us/azure//////machine-learning/data-science-virtual-machine/deep-learning-dsvm-overview. 130 Cui H , Zhang Z , Ganger GB , et al. GeePS: scalable deep learning on distributed GPUs with a GPU-specialized parameter server. In: Proceeding EuroSys '16 Proc Eleventh European Conference on Computer Systems Article No. 4. ACM, London, UK, 2016 . 131 Chilimbi T , Suzue Y , Apacible J , et al. Project Adam: building an efficient and scalable deep learning training system. In: Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation, OSDI. 2014 . USENIX Assn, Bromfeld, CO, USA. 132 Yepes AJ , MacKinlay A , Bedo J , et al. Deep belief networks and biomedical text categorisation. In: G Ferraro, S Wan (eds), Proceedings of Australasian Language Technology Association Workshop. 2014 , 123 − 7. RMTT, Melbourne, Australia. 133 Robinson ME , O'Shea AM , Craggs JG , et al. Comparison of machine classification algorithms for fibromyalgia: neuroimages versus self-report . J Pain 2015 ; 16 ( 5 ): 472 – 7 . 
http://dx.doi.org/10.1016/j.jpain.2015.02.002 Google Scholar CrossRef Search ADS PubMed 134 Angermueller C , Pärnamaa T , Parts L , Stegle O. Deep learning for computational biology . Mol Syst Biol 2016 ; 12 ( 7 ): 878 . Google Scholar CrossRef Search ADS PubMed 135 Jia Y , Shelhamer E , Donahue J. Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the ACM International Conference on Multimedia. New York, NY: ACM, 2014 , 675–8. 136 Bastien F , Lamblin P , Pascanu R , et al. ( 2012 ) Theano: new features and speed improvements . arXiv . arXiv: 1211.5590 137 Team TTD , Al-Rfou R , Alain G , et al. ( 2016 ) Theano: a python framework for fast computation of mathematical expressions . arXiv . arXiv: 1605.02688 © The Author(s) 2018. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Briefings in Bioinformatics Oxford University Press

Artificial intelligence in drug combination therapy

Publisher
Oxford University Press
ISSN
1467-5463
eISSN
1477-4054
D.O.I.
10.1093/bib/bby004

Abstract

The development of medicines for complex diseases increasingly requires combination drug therapies, because in many cases one drug cannot target all necessary points of intervention. For example, in cancer therapy, a physician often meets a patient whose genomic profile includes more than five molecular aberrations. Drug combination therapy has long been an area of interest: the classical work of Loewe on the synergism of drugs was published in 1928 and is still used in calculations of optimal drug combinations. Over the past several years, there has been an explosion in the available information on the properties of drugs and the biomedical parameters of patients. For drugs, hundreds of 2D and 3D molecular descriptors are now available, while for patients, large data sets of genetic, proteomic and metabolomic profiles are now available, in addition to the more traditional data on histology, history of treatments, pretreatment state of the organism, etc. Moreover, the genetic profile can change during disease progression. Thus, the ability to optimize drug combinations for each patient is rapidly moving beyond the comprehension and capabilities of an individual physician. For this reason, biomedical informatics methods have been developed, and one of the more promising directions in this field is the application of artificial intelligence (AI). In this review, we discuss several AI methods that have been successfully implemented in combination drug therapy for conditions ranging from HIV, hypertension and infectious diseases to cancer. The data indicate that combining rule-based expert systems with machine learning algorithms may be a promising direction in this field.
Keywords: artificial intelligence, drug combination, combination therapy, machine learning, genomic profile

Introduction

It is becoming increasingly clear that targeted combination therapy is the treatment of choice in many complex human diseases, particularly those resulting from biological dysfunction driven by alterations or mutations in several genes and/or gene networks, as is usually the case in cancer [1]. However, selecting the most efficacious combination for each patient can be a daunting task. For example, based on gene alterations, it would not be unreasonable for a cancer treatment regimen to require six or more drugs (each targeting a particular genetic alteration), and when one considers the multiple possible dosing regimens, the number of potential combinations rapidly multiplies, reaching numbers as high as 10^11 [2]. Moreover, as the number and type of genetic alterations can vary widely from patient to patient, these choices must be considered for each individual, which is clearly beyond the capabilities of primary care physicians. Thus, computational approaches are required, and a number of artificial intelligence (AI) methods are currently being used to optimize combination therapies. An early AI application proposed for the selection of a drug treatment strategy was the computer-based consultation system MYCIN [3]. The goal of this rule-based expert system, with ∼600 rules, was to provide physicians with therapy recommendations for patients with bacterial infections [3]. The reasoning mechanisms in MYCIN included a fuzzy logic function for combining uncertain assertions within each rule, and while MYCIN was never used in practice, it did achieve a 69% success rate in choosing an acceptable pharmacotherapy, which was better than that of infectious disease experts judged by the same criteria [3].
A subsequent expert system for the selection of drug therapy was also devised, in this case with a knowledge base of only around 100 rules (extracted from medical professionals) for selecting the best antimicrobial therapy [4]. Nevertheless, even with this limited knowledge base, the system was useful for medical professionals [4]. Attempts have been made to improve these early AI applications, and so to enhance the expertise of the physician, by combining expert systems, in which the embedded knowledge is represented internally by frames and rules, with artificial neural networks (ANNs) [5]. For example, multilayer (up to six layers) ANNs were used in combination with expert systems to create a machine learning (ML) system for the diagnosis and treatment of hypertension [5]. In this system, hypertension drug combinations were selected by training the system on blood pressure time series measurements from ∼300 healthy subjects and 85 hypertensive subjects, which were divided into a learning set used for training and a testing set [5]. As the authors point out, the multilayer ANN appeared to extract the distinguishing features (central tendencies) of the learning set and to recognize these patterns even in noisy input data [5]. This ANN, which used more than three layers, was one of the first deep learning (DL) methods used for the selection of drug combinations.
In a different example, a patent [6] has been issued for a system in which patient information is provided to a computational knowledge-based system comprising not only a multiplicity of different pharmacotherapeutic treatment regimens for the disease but also expert rules for (1) selecting the appropriate treatment option and (2) providing patient advisory information on the different constituents of the regimens [6]. AI solutions that have been found to be successful for combination drug therapy are described below.

ML systems

Artificial neural networks

A perceptron is a type of neuron in a neural network, defined by a linear combination of its inputs followed by a thresholding activation function. It acts as a binary classifier that maps input data onto outputs, such that the output is +1 if the weighted sum of its inputs exceeds a threshold, and −1 (or 0, depending on the convention) if it does not. As this is a simplified model of the function of a neuron, the term perceptron is sometimes used interchangeably with ‘artificial neuron’, and this is the term from which ‘neural network’ is derived [7] (Figure 1).

Figure 1. MLP with one hidden layer.

In some applications, in which more complex decisions are needed and postprocessing is required, the thresholding perceptron is replaced by a so-called ‘sigmoidal nonlinearity’ [8]. Wang and coauthors [9] described the use of three-layer ANNs for the selection of drugs for the treatment of HIV in patients in whom previously unidentified genetic mutations of viral DNA (possibly acquired during treatment) appear to confer drug resistance on the virus. A combination of three drugs is frequently used for the treatment of HIV infection.
However, as these drugs do not cure the disease but rather suppress the replication of HIV, this treatment regimen has turned the disease into a chronic manageable disorder requiring life-long drug dosing. Moreover, >200 viral mutations that can affect drug susceptibility or resistance have been identified. Unfortunately, there is no simple prediction strategy based on direct drug resistance–mutation relationships. Wang and colleagues [9] examined 351 HIV treatment patients using a three-layer ANN, in which the output of the models was the follow-up viral load after treatment. Theoretically, three-layer ANNs can approximate any continuous function [10], so the authors used just one hidden layer (the middle layer of nodes is called the hidden layer because its values are not observed in the training set). These authors have, for a number of years, successfully applied ANNs to predict drug resistance on the basis of the mutation patterns seen in patients. There are two approaches to estimating a drug’s efficacy: phenotyping and genotyping. The prediction of lopinavir resistance from genetic aberrations is a good example of the use of ANNs in medicine. For this ANN, 1322 samples were used, of which 267 were drug-resistant and 1055 were drug-susceptible. In total, 117 samples were randomly selected as an independent test set. The remaining samples were randomly separated into training and validation subgroups. The genotyping data were coded as either 1 (if a mutation existed) or 0 (if there was no mutation). The phenotyping data used were the fold change in virus resistance. Two models were used: (1) one with 11 mutations corresponding to susceptibility to lopinavir and (2) one with 28 mutations significant for lopinavir resistance. This approach paved the way for using genetic information on the HIV protease to predict drug resistance.
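To make the perceptron and the one-hidden-layer network described above more concrete, the following minimal Python sketch shows a thresholding perceptron and a forward pass through a three-layer network with sigmoidal units, applied to a genotype coded as 1 (mutation present) or 0 (mutation absent). The function names, the four-position genotype and all weights are hypothetical placeholders chosen for illustration; they are not the trained model or the parameters of Wang and colleagues [9].

```python
import math


def weighted_sum(weights, inputs, bias=0.0):
    """Linear combination of the inputs plus a bias term."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def perceptron(weights, inputs, bias=0.0, threshold=0.0):
    """Thresholding unit: +1 if the weighted sum exceeds the threshold, else -1."""
    return 1 if weighted_sum(weights, inputs, bias) > threshold else -1


def sigmoid(z):
    """Smooth replacement for the hard threshold."""
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(x, hidden_weights, hidden_biases, out_weights, out_bias):
    """Input layer -> one hidden layer -> single output unit, all sigmoidal."""
    hidden = [
        sigmoid(weighted_sum(w, x, b))
        for w, b in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(weighted_sum(out_weights, hidden, out_bias))


# Hypothetical genotype: 1 = mutation present, 0 = absent, at four positions.
genotype = [1, 0, 1, 0]

# Hypothetical, untrained weights (two hidden units, one output unit).
hidden_weights = [[0.8, -0.2, 0.5, 0.1], [-0.4, 0.3, 0.9, -0.6]]
hidden_biases = [0.0, 0.1]
out_weights = [1.2, -0.7]
out_bias = -0.3

score = mlp_forward(genotype, hidden_weights, hidden_biases, out_weights, out_bias)
print(f"network output for this genotype: {score:.3f}")
```

In a real application the weights would be learned from the training subgroup and checked on the validation and independent test sets; the sketch only shows how a binary mutation vector flows through the layers.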
Recently, three-layer ANNs have been used for ML-based prediction of the sensitivity of cancer cells to drugs based on the genomic profiles of the cell lines and the chemical properties of the drug compounds [11]. Data from the Genomics of Drug Sensitivity in Cancer project [12] (http://www.cancerrxgene.org/) were used for the genomic profiling of the cell lines, and a neural network system based on Encog 3.0.1 (http://www.heatonresearch.com/encog) [13] was used. The system included a feed-forward multilayer perceptron (MLP), that is, a network whose connectivity graph has no directed loops or cycles, with three layers: input, hidden and output. Every perceptron in a lower layer was connected to every perceptron in the next higher layer. The number of input units was defined by the number of features selected. The networks were trained using ‘resilient error backpropagation’ from the Encog program [14, 15]. The performance of this MLP was comparable with that of ‘random forest’ regression models generated from the same training data, and the results indicated the feasibility of using ML approaches to optimize drug therapies even in the face of noisy data [11]. More recently, ANNs have been used to associate drugs with diseases based on the biological processes that are altered in the disease setting (rather than the individual gene targets involved in those processes), so-called ‘process pharmacology’ [16]. Drugs were classified in terms of their ability to target specified gene ontology (GO) biological processes and then presented as self-organizing maps (SOMs). The technical details of preparing the SOMs using ANNs were described in a previous set of publications [17–20].
The DrugBank database [21] and the Database for Annotation, Visualization and Integrated Discovery (DAVID) [22] were used to associate the drugs with biological processes using ‘overrepresentation’ or ‘enrichment’ analyses of GO terms [23]. The following strategy was applied: if a drug was related to a particular gene that was annotated to a specific biological process, then a drug–bioprocess connection was established. Each such interaction was scored, and the sum of these scores was then used as the ‘strength’ of the connection. A special parameter was established to score a number of these connections. The scalar element-wise products of the two matrices, drug–gene and drug–bioprocess, were then calculated. In this way, the authors were able to identify antihypertensive drug classes and subclasses. The drugs were initially classified into eight classes using empirical pharmacological knowledge. They were then classified again by applying ANN-based ML to the GO biological processes associated with each drug, and additional classes of drugs were identified with this approach. Pivetta and colleagues [24] used an ANN with standard back-propagation to predict the synergism of anticancer drugs. They experimentally determined the cytotoxicity of the drugs alone and in combination on cell lines and then trained the ANN on the results of these experiments. They used 60 combinations, of which 15 formed the validation set. The system helped to evaluate the cytotoxicity of all possible combinations in the space of chosen concentrations.

Support vector machine

Currently, support vector machines (SVMs) are among the most popular linear classifiers [25]. The function of an SVM can be described as follows: given labeled feature vectors (x1, y1), …, (xm, ym), a hyperplane is found that separates the positively labeled samples from the negatively labeled samples while ensuring that the closest point in each class is as far away as possible from the hyperplane.
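The maximum-margin criterion in the description above can be illustrated with a short Python sketch that compares two candidate hyperplanes by their geometric margin, that is, the distance from the hyperplane to the closest correctly separated sample. This is only an illustration of the criterion under stated assumptions: the toy samples and both candidate hyperplanes are hypothetical, and a real SVM finds the best hyperplane by solving an optimization problem rather than by comparing a fixed list of candidates.

```python
import math


def decision_value(w, b, x):
    """Signed score w.x + b of a sample x for the hyperplane (w, b)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def separates(w, b, samples):
    """True if every sample lies on the side of the hyperplane given by its label."""
    return all(y * decision_value(w, b, x) > 0 for x, y in samples)


def geometric_margin(w, b, samples):
    """Distance from the hyperplane to the closest sample (negative if misclassified)."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(y * decision_value(w, b, x) for x, y in samples) / norm


# Hypothetical two-feature samples with labels +1 and -1.
samples = [
    ((2.0, 2.0), 1),
    ((3.0, 3.0), 1),
    ((0.0, 0.0), -1),
    ((1.0, 0.0), -1),
]

# Two hypothetical separating hyperplanes, each given as (w, b).
candidate_a = ((1.0, 1.0), -2.5)
candidate_b = ((1.0, 0.0), -1.5)

margin_a = geometric_margin(*candidate_a, samples)
margin_b = geometric_margin(*candidate_b, samples)
best = "candidate_a" if margin_a > margin_b else "candidate_b"
print(f"margins: a={margin_a:.3f}, b={margin_b:.3f}; preferred: {best}")
```

Both candidates separate the two classes, but the SVM criterion prefers the one whose closest sample is farther away.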
SVMs are also known for their ability to perform nonlinear classification [26] (Figure 2).

Figure 2. Schematic structure of an SVM. Reproduced from the open access source [26, 27].

In a recent study, SVMs were used to investigate resistance to paclitaxel and gemcitabine in breast cancer [28]. In this study, the SVM was trained using the Statistics Toolbox in MATLAB and then tested with leave-one-out validation [28]. The SVM was first trained on breast cancer cell lines, and a multifactorial, principal component analysis (MFA) was performed. The MFA indicated that expression of the genes targeted by paclitaxel was an indicator of paclitaxel sensitivity, while copy number and expression of the genes targeted by gemcitabine were indicative of gemcitabine sensitivity. Sequential backward feature selection, following the method of Dash and Liu [29], was used to minimize the percentage of misclassified cell lines. Genes whose removal left the classification error unchanged or reduced it were removed from the SVM one at a time, with iterations continuing until the removal of a gene resulted in a higher classification error. The SVM excluded 2 of the 49 explored cell lines. Two SVM models were trained using (1) normalized expression values and (2) expression values binned into 10 categories. The SVM was trained on 15 gene variables for paclitaxel (49 cell lines) and 10 variables for gemcitabine (44 cell lines). The trained SVM misclassified 18% of cell lines for paclitaxel and 16% for gemcitabine. The authors found that mutations were not useful for the SVM, and they were not used in the final SVM. This is unfortunate because, in general, mutations can be a useful instrument for stratifying cell lines. A possible reason for this failure is the strategy used for including the mutation information.
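The sequential backward feature selection with leave-one-out validation described in this study can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of [28] or [29]: a simple nearest-centroid classifier is swapped in for the SVM so that the example needs only the Python standard library, and the six toy samples (one informative feature, one noisy feature) are hypothetical.

```python
def nearest_centroid_predict(train, features, x):
    """Stand-in classifier (not an SVM): assign x to the class with the closest centroid."""
    centroids = {}
    for label in {y for _, y in train}:
        rows = [v for v, y in train if y == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]

    def squared_distance(centroid):
        return sum((x[f] - c) ** 2 for f, c in zip(features, centroid))

    return min(sorted(centroids), key=lambda lab: squared_distance(centroids[lab]))


def loo_error(data, features):
    """Leave-one-out error: each sample is predicted from all the other samples."""
    errors = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if nearest_centroid_predict(train, features, x) != y:
            errors += 1
    return errors / len(data)


def backward_selection(data, features):
    """Drop one feature at a time while the leave-one-out error does not increase."""
    selected = list(features)
    best_error = loo_error(data, selected)
    while len(selected) > 1:
        trials = [
            (loo_error(data, [f for f in selected if f != g]), g) for g in selected
        ]
        error, to_drop = min(trials)
        if error > best_error:  # every possible removal hurts, so stop
            break
        selected.remove(to_drop)
        best_error = error
    return selected, best_error


# Hypothetical data: feature 0 separates the classes, feature 1 is noise.
data = [
    ([0.0, 5.0], "sensitive"),
    ([0.1, 1.0], "sensitive"),
    ([0.2, 9.0], "sensitive"),
    ([1.0, 4.0], "resistant"),
    ([1.1, 8.0], "resistant"),
    ([0.9, 2.0], "resistant"),
]

selected, error = backward_selection(data, [0, 1])
print("kept features:", selected, "leave-one-out error:", error)
```

The stopping rule mirrors the text: features are removed one at a time as long as the error does not rise, and the loop stops at the first removal that would increase it.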
A ‘mutation status’ parameter was set whenever a gene contained one or more mutations. The pathogenic status of a mutation (whether or not it is deleterious) was determined using SIFT [30], a program based on the conservation of amino acids (AAs) across different species. While this program can, in general, shed some light on the ‘importance’ of a selected AA, it should not be a stringent criterion for defining whether a mutation in a particular AA is deleterious. In addition, information about the number of mutations is not useful on its own. However, information about the activation or deactivation of genes can be useful. While some mutations have no effect on the functional activity of a protein, others can lead to inappropriate activation or inactivation of a gene product; thus, using the ‘activation’ (or ‘inactivation’) status of a gene or gene product is likely to give genomic information much more weight in the function of SVMs.

Random forest

A random forest is a classifier consisting of a collection of tree-structured classifiers {h(x, Ak), k = 1, …}, where the {Ak} are independent identically distributed random vectors and each tree casts a unit vote for the most popular class at input x [31]. Random forests are an effective tool in prediction [32] (Figure 3).

Figure 3. Three decision trees and a classification obtained from each of them. The final prediction is based on majority voting and will be ‘Class B’ in the above case. Reproduced from the open access source [32].

Because of the law of large numbers, as pointed out by Breiman [31], random forests do not overfit as more trees are added, and injecting the right kind of randomness makes them accurate classifiers and regressors.
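The unit-vote mechanism in the definition above can be sketched in a few lines of Python. The three ‘trees’ below are hypothetical hand-written decision rules on made-up feature names, standing in for trees that would normally be grown on random subsets of the data and features; only the voting step is the point of the example.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most common label; ties are broken alphabetically for determinism."""
    counts = Counter(votes)
    top = max(counts.values())
    return sorted(label for label, count in counts.items() if count == top)[0]


# Three hypothetical decision rules standing in for trained trees.
def tree_1(x):
    return "Class B" if x["gene_a"] > 0.5 else "Class A"


def tree_2(x):
    return "Class B" if x["gene_b"] > 0.3 else "Class A"


def tree_3(x):
    return "Class B" if x["gene_c"] > 0.7 else "Class A"


def forest_predict(trees, x):
    """Each tree casts one vote; the forest returns the majority class."""
    return majority_vote([tree(x) for tree in trees])


sample = {"gene_a": 0.2, "gene_b": 0.6, "gene_c": 0.9}  # hypothetical input
prediction = forest_predict([tree_1, tree_2, tree_3], sample)
print("forest prediction:", prediction)
```

For this input the first rule votes for one class and the other two for the other class, so the majority decides the forest's output.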
Currently, random forest is considered one of the most effective ML techniques. For example, Chen and colleagues [33] used a random forest classifier to predict effective drug combinations. Three types of properties were used as learning parameters: (1) chemical interactions between the drugs in a combination (determined using STITCH [34]), (2) protein interactions between the targets of the drugs (determined using STRING8 [35]) and (3) target enrichment based on KEGG pathways. The random forest analysis identified 55 different features that were recognized as important for predicting the best drug combinations [33]. Hansen and colleagues [36] used random forests to predict drug–drug interactions for 220 drug groups in >60000 prescriptions. Six drug groups with known interactions were rediscovered by this method.

Logistic regression

Simple logistic regression is analogous to linear regression, except that the dependent variable is nominal, not a measurement. One goal is to determine whether the probability of getting a particular value of the nominal variable is associated with the measurement variable; the other goal is to predict the probability of getting a particular value of the nominal variable, given the measurement variable [37]. Huang and colleagues [38] used a logistic regression ML model to predict potentially efficacious drug combinations through analysis of the side effects (SEs) of the individual drugs. For model building, clinical phenotypic information (i.e. observed SEs reported in the clinic) was used. Information on the various SEs was extracted from drug labels included in SIDER [39] and from OFFSIDES [40], which uses data mined from the FDA postmarketing surveillance system FAERS (FDA Adverse Event Reporting System [41]). In total, 239 pairwise drug–drug co-prescriptions for marketed drug combinations were used as the positive set, and 2291 unsafe pairs were used as the negative set.
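As a rough illustration of how simple logistic regression turns a measurement variable into the probability of a nominal outcome, the sketch below fits a one-variable model by plain gradient descent on the log-loss. It is a minimal sketch under stated assumptions: the six data points, the learning rate and the number of epochs are hypothetical, and the study discussed here used a library implementation rather than hand-written code like this.

```python
import math


def predict_proba(weights, bias, x):
    """Probability of the positive class: sigmoid of the linear score."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def fit(samples, labels, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss (no regularization)."""
    n_features = len(samples[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            error = predict_proba(weights, bias, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(samples)
    return weights, bias


# Hypothetical data: one measurement variable, nominal outcome coded 0 or 1.
samples = [[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]]
labels = [0, 0, 0, 1, 1, 1]

weights, bias = fit(samples, labels)
for value in (0.5, 4.0):
    print(f"x = {value}: P(outcome = 1) = {predict_proba(weights, bias, [value]):.3f}")
```

The fitted weight is positive here, so the predicted probability of the positive outcome rises with the measurement variable.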
In the model, each drug SE was considered a feature, and each drug pair was represented as a composite feature vector with one value per SE: ‘0’ if neither drug had the SE, ‘1’ if one drug of the pair had it and ‘2’ if both drugs had it. Two other ML algorithms, ‘decision tree’ and ‘naïve Bayes’, were also compared with the logistic regression model and gave similar results. Interestingly, a ‘rule of three’ was found for the prediction algorithm: pneumonia, hemorrhage rectum and retina bleeding were the top features defining the model performance. If any of these features was present, the SE was strong. Adding more features did not improve the prediction capability of the model, and only these three features were used in a general drug combination selection. An important approach was implemented for SE stratification: SEs were classified into two categories, efficacy-related and undesired. The SEs contributing to the therapeutic effects of the drugs were called ‘efficacy-related SEs’; an example is hypoglycemia related to the use of antidiabetic drugs. Thus, the best pair of drugs would share efficacy-related SEs while having a minimum of undesired shared SEs. The logistic regression model used in the study was from the Python Scikit-Learn package [42], and both the penalty and the regularization strength parameters r were taken into consideration in the logistic regression model. The Weka decision tree learner was used for feature selection. Weka (cs.waikato.ac.nz/ml/weka) is an open-source collection of ML algorithms developed by the University of Waikato, bundled with tools for preprocessing data to make it more easily understood by the ML algorithms.

Stochastic gradient boosting

Stochastic gradient boosting (SGB), originally presented by Friedman [43], has become a frequently used tool for regression and classification problems.
As discussed by Xu and coauthors [44], the SGB algorithm constructs a prediction model using an ensemble of weak classifiers, typically decision trees. It builds the model in a stage-wise fashion and constructs additive regression models by sequentially fitting a simple parameterized function (base learner) to the current pseudo-residuals by least squares at each iteration. The following features were used for the SGB algorithm: molecular 2D structures, drug structural similarity, anatomical therapeutic similarity, protein–protein interaction, chemical–chemical interaction and disease pathways. Three popular ML algorithms were compared [44], and SGB performed best, ahead of naïve Bayes and SVM. The authors applied this approach to 65 FDA-approved antihypertensive drugs to select possible drug pairs and found that 6 of 17 predicted optimal drug combinations were already used in medical practice.

Bayesian models

Naïve Bayes is a statistical classification method based on the Bayes rule of conditional probability, which states that, given two events A and B, the probability of event A occurring given that B has already occurred, P(A|B), is given by the equation P(A|B) = P(B|A) P(A)/P(B), where P(A) and P(B) are the probabilities of events A and B, respectively. The Bayesian classifier is called naïve because it naïvely assumes that the features are independent [45]. In the elucidation of drug similarity and possible interactions, the selection of the attributes describing the drugs plays a significant role. These can include 2D and 3D structural parameters, the types of atoms and bonds included in the drug compounds, the targets of the drugs, etc. Bayesian methods were used for calculation of these attributes. Schuffenhauer and coauthors [46] proposed similarity metrics for the selection of ligand similarity for specified targeted proteins. They introduced the so-called Similog keys, which are counts of atom triplets. Each triplet is characterized by the graph distances and the types of its atoms.
The atom-typing scheme classifies each atom by its function as H-bond donor or acceptor and by its electronegativity and bulkiness. These are suitable types of molecular descriptors (fingerprints) of small molecules, although a number of other molecular descriptors have been suggested since then. The main point of such descriptors is to convert the 2D or 3D structure of a compound into numerical values that can be compared with the values found for other molecules. Glick and colleagues [47] used Bayesian models with probabilities calculated using a Laplacian-corrected estimator as described earlier [48, 49] to predict targets of drug compounds. As they point out, ‘Machine learning algorithms are largely dependent on the training data sets. The quality of curation of the underlying chemogenomic database is vital to the success of the computational model’. In the case of antineoplastic drugs, the authors successfully predicted the targets of the drugs used, including tubulin, growth factor receptors [epidermal growth factor receptor (EGFR), FGFR, VEGF-R and PDGF-R], cell cycle and cell signaling kinases [PKC, PKA, CDK2, CDK4, Tie-2, adenosine kinase (AK), c-Src, Flt-1, Lck, TMPKmt and CSBP/p38] and some other proteins not included in these classes. Ren et al. [50] called their approach ‘Positive-Unlabeled learning’. It consisted of the consecutive use of naïve Bayes and iterative SVM methods. They also used SOMs for clustering to elucidate drug–drug interactions. The authors used the chemical, structural and other attributes of drug compounds to calculate their similarity and, following the concept of Vilar and coauthors [51], predicted drug–drug interactions based on this similarity.

Network-based modeling

During the progression of cancer, genes related to cell proliferation, survival and apoptosis are likely to display genomic alterations.
Zaman and colleagues [52] defined that ‘if the genes related to regulation of proliferation have non-synonymous mutations or are amplified—they became the cell-survival-related driving regulators’. The combination of properly selected parameters, such as hub-genes (i.e. genes which have connections to a significant number of other genes), cancer-essential genes and abovementioned driving regulators, made it possible to separate the basal-specific and luminal-specific gene subnetworks of breast cancer. Using Go-guided Markov Cluster (MCL) algorithm [53] together with their network approach, Wang and coauthors [54] demonstrated that these two types of breast cancer have markedly different functional modules of cancer development. In luminal-specific breast cancer, the main functional module was centered around CDK1/MYC and was related to the regulation of the cell cycle. In basal-specific breast cancer, the first module was centered around P53 for apoptosis regulation (or rather deregulation), while a second module was described, which was centered around EGFR and MAPK/MET (AKT/PIK3CA growth factors) both of which are related to regulation of cell proliferation. These results indicate clear differences in breast cancer subtypes and are important in the design of personalized drug therapy paving the road to more precise selection of drug targets in breast cancer. In a separate study, which also used the network approach, Li and colleagues [55] analyzed phosphotyrosine signaling, which is important in cancer. Using an evolutionary trajectory analysis, the authors found that tyrosine kinases can be separated into three specific groups based on their evolutionary origins (i.e. primitive, bilateral and vertebrate). 
These groups of tyrosine kinases differ in their cellular signaling function: those derived from primitive organisms are generally part of intracellular signaling, those with a bilaterian origin are largely involved in intercellular and extracellular signaling, and those that evolved mainly in vertebrates are more likely to be involved in tissue-specific signaling. The findings of this study were aided by the fact that the authors considered the tyrosine kinase as a functional unit or ‘circuit’ comprising an interrelated triad of core functions: the ‘writer’ (the tyrosine kinase, which phosphorylates the substrate), the ‘reader’ (for example, the SH2 domain, which reads the modification) and the ‘eraser’ (the phosphatase, which removes the phosphorylation modification from the substrate). Such an ‘elementary unit’ approach makes it possible to select much more powerful descriptors for genes/proteins, which would improve a possible ML approach for drug-related predictions. While the molecular descriptors of drugs and drug-like compounds are comprehensively developed, for example the descriptors in the PaDEL [56, 57] and MOE (CCG, Montreal, Canada) programs, the corresponding descriptors representing ‘hallmarks’ of cancer are neither as comprehensive nor as well defined. The development of these descriptors is an ongoing process; nevertheless, I can note the hallmark descriptors proposed in the Cancer Hallmark Network Framework [54]. These cancer hallmarks are represented by molecular/signaling subnetworks [52]. A network operational signature descriptor is introduced that can describe the state transitions from genomic alterations to clinical phenotypic profiles. The important concept of self-promoting positive feedback loops during tumorigenesis is also introduced. The authors also show that some Hallmark Networks can trigger genome duplications and eventually tumor development changes.
McGee and coauthors [58] proposed that extremely small regulatory subnetworks, containing as few as three components, can act as positive regulators leading to prolonged activity of the network. A ‘brick’ of such positive regulation of the entire network would then be the feed-forward loop (FFL), consisting of a triad of genes: a ‘target’ gene and two input genes regulating each other and jointly regulating the target gene. Extremely important, and not explicitly stated by the authors of the abovementioned hallmark papers, is the concept that only the self-activating positive feedback subcircuits of cancer-related signaling, and perhaps of the metabolic networks, must be taken into consideration for the prediction of possible cancer development. These ‘bricks’ and their standard combinations would be useful as prediction descriptors. Such descriptors can significantly reduce the overall number of parameters that need to be taken into consideration in ML prediction schemes. To select cancer hallmark-based gene signatures, Li and colleagues [59] used cancer-related GO terms as additional descriptors, together with a special MCC machine learning algorithm. This approach helped to segregate ‘driver’ mutations in genes from ‘passenger’ genes, and the signature sets appear to have a high predictive activity for patients’ clinical outcomes.

DL multilayer ANN

Deep convolutional neural networks

As pointed out by Albelwi and Mahmood [60], convolutional neural networks (CNNs) were developed using the concept of the mammalian visual cortex as presented in Hubel and Wiesel’s model [61] (Figure 4).

Figure 4. The structure of a CNN, consisting of convolutional, pooling and fully connected layers. Reproduced from the open access source [60].
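The three layer types named in the Figure 4 caption (convolutional, pooling and fully connected) can be illustrated with a minimal one-dimensional sketch in plain Python. All values here, including the input signal, the filter and the weights, are hypothetical and chosen only to make the arithmetic easy to follow; real CNNs learn their filters and operate on multi-channel 2D or 3D inputs.

```python
def conv1d(signal, kernel):
    """Valid 1D cross-correlation, the operation applied by a convolutional layer."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(values):
    """Rectified linear activation applied element-wise."""
    return [max(0.0, v) for v in values]

def max_pool(values, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]

def fully_connected(values, weights, bias):
    """Single output unit of a fully connected layer."""
    return sum(v * w for v, w in zip(values, weights)) + bias

signal = [0.0, 1.0, 2.0, 1.0, 0.0, -1.0]   # hypothetical input
kernel = [1.0, 0.0, -1.0]                  # hypothetical edge-like filter

features = max_pool(relu(conv1d(signal, kernel)), size=2)
score = fully_connected(features, weights=[0.5, 0.5], bias=0.0)
print(features, score)
```

The convolution produces a feature map, pooling reduces its resolution, and the fully connected unit combines the pooled features into a single score.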
Recently, Preuer [62] described the use of DL for the optimization of drug combinations. The sets of parameters applied to the input of the multilevel DL neural network (DLNN) are described; the selection of the proper parameters is one of the main problems in ML systems. In the described multilayered neural network (MNN), chemical and biomedical data are applied to the input. In addition to the usual molecular descriptors covering both the 2D and 3D structures of the chemical compounds, and because actual experiments with cell lines and drug combinations are described, the dose response of the drug [EC50, the drug concentration at which half of the maximum effect (cell death in this case) is reached] was also included as a molecular descriptor [62]. The biomedical data (also termed ‘molecular data’ by the author) encompassed several hallmarks of cancer cells, including: activating invasion and metastasis, inducing angiogenesis, enabling replicative immortality, resisting cell death, sustaining proliferation signaling, evading growth suppressors, deregulating cellular energetics and avoiding immune destruction [63, 64]. Biomedical parameters, such as point mutations; small-scale insertions, deletions and duplications; copy number variations (CNVs); and DNA methylation were also included. The MNN was trained to predict a synergy score describing the difference between the observed effect of a drug combination and the simple addition of the effects of the participating drugs. In real life, combining two drugs can often lead to effects beyond the simple addition of their impacts. Drugs in combination can have completely independent, additive, synergistic or antagonistic impacts. One of the widely used models for calculating drug synergy is the Loewe additivity model [65]. There are also more recent models, such as the Bliss independent action model [66] and its modification, the Berenbaum model [67].
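As an illustration of how a synergy score can be computed as the difference between an observed and an expected combination effect, the following minimal sketch uses the Bliss independence reference model, under which the expected fractional effect of two independently acting drugs is E_A + E_B − E_A·E_B. The function names, the 0.05 tolerance and the example effect values are hypothetical; this is not the scoring procedure of [62] or of any cited program.

```python
def bliss_expected(effect_a, effect_b):
    """Expected fractional effect (0..1) of two independently acting drugs."""
    return effect_a + effect_b - effect_a * effect_b

def bliss_excess(observed, effect_a, effect_b):
    """Observed combination effect minus the Bliss expectation."""
    return observed - bliss_expected(effect_a, effect_b)

def classify(excess, tolerance=0.05):
    """Label a combination; the tolerance is an arbitrary illustrative cutoff."""
    if excess > tolerance:
        return "synergistic"
    if excess < -tolerance:
        return "antagonistic"
    return "independent"

# Hypothetical single-agent effects (fraction of cells killed)
effect_a, effect_b = 0.30, 0.40
for observed in (0.75, 0.58, 0.40):
    excess = bliss_excess(observed, effect_a, effect_b)
    print(observed, round(excess, 2), classify(excess))
```

With these inputs the Bliss expectation is 0.58, so an observed effect of 0.75 is labeled synergistic and 0.40 antagonistic. The Loewe model instead compares doses along isoboles and requires full dose–response curves, so it is not shown here.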
The synergy score was also calculated using the Combenefit program [68]. Vougas and colleagues [69] recently used DLNNs [70] enhanced by Bagging Ensemble Learning [71] for the prediction of drug response in cancer. The studied sets included 689 cancer cell lines and 139 therapeutic compounds. The Genomics of Drug Sensitivity in Cancer (GDSC) [12] set was used as the source of drug response data. Five main parameters (i.e. tissue of origin, gene expression, mutation status, CNV and drug response) were used to generate a comprehensive rule set containing all tissue-to-gene, tissue-to-drug, gene-to-gene, gene-to-drug and drug-to-drug associations. Owing to computer power limitations, only tissue-to-drug, gene-to-drug and drug-to-drug associations were used. H2O.ai (http://www.h2o.ai/), a cluster-ready DLNN framework suitable for high-performance computers, was used for modeling. The Standardiser program [72] was also used to provide uniform notations for all the compounds that came from different sources. Finally, the open-source PaDEL descriptor software [56, 57] was used to calculate the molecular descriptors of the drugs.

Recurrent neural networks

Proposed in 1989 by Williams and Zipser [73], recurrent neural networks (RNNs) are particularly suitable for analyzing data streams and are useful when the output depends on previous computations [74]. LSTM (long short-term memory) is a variation of the RNN proposed by Hochreiter and Schmidhuber [75]. LSTM is convenient for applications with long time lags of unknown size between important events [73–75]. It has been used for the analysis of patient data histories. Proper classification of a diagnosis based on patient history is difficult: episodes differ in length, ranging from a couple of hours to several months, and observations and laboratory tests are irregular. In addition, for cancer patients, treatments are changed on an irregular basis.
Lipton and colleagues [76] successfully used LSTM to recognize patients’ diagnoses by training the program on time series with highly irregular time points and lab measurements, and the results potentially provide a means for more precise administration of combination therapy. Lusci and colleagues [77] studied the opportunities to build aqueous solubility predictors that would outperform the current methods. They created an original method that uses DAG-RNNs (directed acyclic graph recursive neural networks) to describe undirected graph-based systems. The descriptors logP, first-order valence connectivity index, delta chi and information content were used. They validated this approach with UG-RNN (undirected graph recursive neural networks) on sets of >1000 molecules and showed that it is a strong method that in some variants gives better predictions than existing methods.

Deep belief networks

The deep belief network (DBN) was proposed by Hinton and colleagues [78] and, as pointed out by Ravi and colleagues [74], a DBN can be described as a composition of RBMs (restricted Boltzmann machines) with undirected connections at the top two layers and directed connections in the lower layers [79] (Figure 5).

Figure 5. Deep belief network with three hidden layers organizing three RBMs. h: hidden layer; v: visible layer. Reproduced from the open-access source [85].

Ibrahim and colleagues [80] used DBNs for multilevel feature selection from gene and miRNA data. The results obtained showed that DBN outperformed the classical feature selection of specific data.
Ghaisani and colleagues [81], using clinical and microarray analysis data, demonstrated that a combined structure of DBN and Bayesian network (BN), called DBN-BN, outperformed traditional ML techniques such as SVM and k-nearest neighbor in predicting patient overall survival (OS) and disease-free survival. The authors state [81] that the combined DBN-BN approach in such an analysis outperformed the approach of Khademi and Nedialkov [82], in which the clinical model is constructed using BN, while the microarray model is constructed using DBN. One development of the DBN is the convolutional deep belief network (CDBN) [83], which is similar to a CNN but is trained in a manner more similar to a DBN, thereby exploiting the advantages of both methods [84]. Cao and coauthors [85] developed a DBN-based method for assessing the quality of protein models, which performs better than SVM. Protein structure prediction is important for the assessment of possible drug binding for combination therapy.

Deep Boltzmann machine

The deep Boltzmann machine (DBM) was proposed by Salakhutdinov and Hinton [87] and consists of n layers of neurons. Usually, the states of the neurons are taken to be binary, xi∈{0,1}, indicating whether a unit is ‘on’ or ‘off’, but continuous-valued, rectified linear units can also be used [86, 87]. The schemes of general and restricted Boltzmann machines [88] are presented in Figure 6.

Figure 6. Left: a general Boltzmann machine. The top layer shows stochastic binary hidden units, and the bottom layer shows stochastic binary visible units. Right: a restricted Boltzmann machine, in which the connections between hidden units and between visible units are removed. Reproduced from the open-access source [88].

The states of each layer are written as vectors, denoted by X(0),…,X(n) (together denoted by X). Units xi and xj in adjacent layers are connected by symmetric connections with connection weight wij (modeling synaptic strength). For each adjacent pair of layers k and k+1, the weights can be combined into a weight matrix W(k). Each unit also has a bias parameter bi that determines its activation probability by functioning as a baseline input. In a traditional DBM, there are no lateral connections between units within a layer [89]. One of the main disadvantages of the DBM is the significantly greater computation time it requires, which may be a problem with large data sets [74, 90]. It was used for the extraction of a latent hierarchical representation from 3D patches of brain images [74, 91], and DBM learning was successfully used for the early diagnosis of Alzheimer’s disease [92]. The authors used a large data set from the Alzheimer’s Disease Neuroimaging Initiative (ADNI), and cross-validation showed that the proposed method is not only valid for differentiating between control (NC) and AD images but also performs well in the more challenging case of classifying subjects with mild cognitive impairment [92]. Such results may be used to create disease stage-oriented combination therapy strategies.

Deep autoencoder learning

In general, deep autoencoder (DA) refers to symmetric DBNs, which contain ‘encoder’ and ‘decoder’ parts [93]. The layers are restricted Boltzmann machines (Figure 6, right and Figure 7).

Figure 7. The three input values are encoded to two feature variables. Pretraining defines the weight matrices W1 and W2. Reproduced from the open-access source [93].

The DA technique was used by Li and colleagues [94] for template-based protein tertiary structure prediction. They used a version of the ‘deep learning stacked denoising autoencoder’ called PRSDA. In this study, the 3D coordinates of four backbone atoms for each residue were used as parameters for the model, and homology models were used for training the weights of the PRSDA model. Automatic chemical design of drug molecules was proposed using a pair of neural networks trained together as an autoencoder [95]. The ‘deep learning stacked autoencoder’ method was successfully used for the prediction of drug–target interactions based on protein sequence parameters and substructure fingerprint information of the compounds [96]. Five-fold cross-validation demonstrated strong performance on a set of real examples, with accuracy up to 94%. A combination of the DL stacked autoencoder and the ‘learning algorithm biased support vector machine’ (BSVM) was successfully used for the prediction of drug protein targets. As descriptors, the authors used the properties of the AAs of possible target proteins, including tiny, small, aromatic, aliphatic, polar, nonpolar, charged and basic; they also included single-peptide cleavages, transmembrane helices, low complexity regions, N-glycosylation and O-glycosylation as descriptors. In total, using 39 properties as descriptors, Wang and colleagues [96] demonstrated high efficiency in predicting drug–target interactions using the Stacked Autoencoder DNN. To describe the compounds, the authors used 881 2D feature descriptors that can be downloaded from the PubChem website.
To describe the drug–target interactions, the authors used a set of 5127 drug–target pairs from the following databases: SuperTarget [97], DrugBank [21], KEGG BRITE [98] and BRENDA [99], collected by Yamanishi and colleagues [100, 101]. On the gold standard data sets (enzymes, ion channels, G protein-coupled receptors and nuclear receptors), the methods resulted in AUC values of 94.25% (83.2%), 91.10% (79.9%), 87.43% (85.7%) and 81.76% (82.4%), respectively.

Selection of parameters for combination therapy ML

In general, the parameters needed for the design of an ML system in combination therapy can be divided into four groups. The first group, ‘physical and chemical parameters of compounds-based’, can include a significant number of 2D and 3D parameters. For example, PaDEL [56, 57] provides >1000 descriptors of such compound parameters. The second group, ‘biochemical result based’, includes activity changes of the target biomolecule (protein, DNA, RNA, etc.) when the drug(s) are administered (e.g. changes in signaling and/or metabolic pathway activities). The third group, ‘cell-related results based’, includes changes in cell motility, proliferation, movement, etc., after drug administration. The fourth group, ‘medical results based’, includes the initial characteristics of patients (gender, age, preliminary history of medications, etc.), as well as changes in the patients’ state (PFS, OS, etc.) after treatment. These parameters also include changes in genomic, proteomic and metabolomic profiles after drug administration. Other data on the patients’ initial state and post-treatment parameters can also be included, for example, gene aberrations and DNA methylation/acetylation, proteomic and metabolomic profiles, the state of the patient’s health and the diagnosis. A more detailed description is presented in Table 1.

Table 1. Parameters that can be used as descriptors in ML models for drug combination therapy efficacy prediction

Descriptors for the combination therapy efficacy prediction with ML systems
  Compound-related
    Physical and chemical parameters
      2D parameters
      3D parameters
    Biochemical results (single compounds and combinations of compounds)
      Target biomolecules (protein, DNA, RNA, etc.) reacting with compounds
      Signaling and/or metabolic pathways involved in interactions with compounds
  Cell-related
    Cell growth, proliferation, apoptosis, etc., as reaction to compounds
  Patient-related
    Initial patients’ characteristics
      Diagnosis
      Genomic profile, including point mutations; small-scale insertions, deletions and duplications; CNVs; and DNA methylation, etc.
      Initial proteomic profile
      Initial metabolomic profile
      Initial histology
    Patients’ reaction to compounds
      Changes in patients’ health including PFS, OS, SEs, etc.
      Proteomic profile after drug administration
      Metabolomic profile after drug administration
      Histology after drug administration

Comparison of ML methods

We compared (Table 1) SVM, MLP neural nets, Bayesian, decision tree, random forest and DNN methods in different biomedical problems. I can state that there is no clear leader among the traditional ML algorithms. The SVM method was the best in two cases: the first is related to microarray analysis [102], and the second is related to the prediction of the bioactivity of drug-like compounds, although in the latter case it performed somewhat worse than the DNN CNN method [103]. The Bayesian method was the best for feature recognition in ultrasound images [104]. The random forest method was the best in two cases: first, in the recognition of features in MRI images [105], and second, in the prediction of drug-induced nephrotoxicity based on biochemical data [106]. DNN accuracy was, in all cases, better than that of traditional ML methods. In some cases, the improvement was not as great as expected, including in the prediction of the bioactivity of protein inhibitors based on biochemical data [103], the recognition of lymph node metastasis from PET scan images [107], the detection of retinal detachment [108], the identification of autism spectrum disorder from brain images [109] and the sequence-based prediction of protein–protein interaction [110]. In several other cases, DNN performed significantly better than traditional ML, including in image-based pulmonary nodule recognition in lung cancer [111], discrimination of breast cancer with microcalcifications on mammography [112] and detection of intracranial hypertension based on ECG and intracranial pressure data [113]. From the abovementioned results, one can conclude that in traditional ML, there is no ‘champion’, and the results depend mostly on proper parametrization and descriptor selection.
DNN performed better than traditional ML, but in each case, it is worth estimating the ratio of resources spent to accuracy gained.

Drug synergism and antagonism prediction

DREAM: community computational challenge in prediction of drug synergism and antagonism

The competition organized by the DREAM Challenges initiative and NCI involved 31 science teams from many countries. The task was to predict whether two drugs in combination would be synergistic or antagonistic based on their separate impacts on the OCI-LY3 human diffuse large B-cell lymphoma (DLBCL) cell line [114]. The organizers of this contest specifically stated that no preliminary training on pairs of compounds known to be synergistic or antagonistic was permitted, clearly to prevent the use of any ML approach. In the context of this review, I think the results of this competition are important for establishing the boundaries of what a non-ML approach can achieve in this field. Participants were provided with (i) dose–response curves for the viability of OCI-LY3 cells following perturbation with 14 distinct compounds, (ii) gene expression profiles of the same cells, both untreated and treated with each of the 14 compounds, and (iii) the previously reported baseline genetic profile of the OCI-LY3 cell line. The best-performing method, DIGRE (drug-induced genomic residual effect), was based on the hypothesis that if cells are treated sequentially with two compounds, the transcription profiles induced by the first compound affect the outcome of the second compound. These assumptions were based on the previous work of Shah and Schwartz [115] and Recht and colleagues [116]. The second best-performing method (IU_UI-CCBB) was based on the assumption that the activity of a compound can be estimated directly from the genes differentially expressed after the treatment.
Compound synergism or antagonism was defined from the concordance of the expression profiles in both cases. The best methods predicted synergy significantly better than chance (37.5% versus 17.5% for random selection) [114]. When the competition arbiters created an integral predictor using all prediction methods used by the participants, the best predictive value was close to 46% sensitivity for synergy and 51% for antagonism. Not participating in the contest, but still taken into consideration, was the method by SnuGen, which uses the Master Regulator Inference algorithm (MARINA) [117–119]. This MARINA approach was found to achieve 56% synergy prediction. Their approach, based on elucidation of ‘Master Regulator’ genes, can be used in ML methods for selection of valuable descriptors.

Network-based Laplacian regularized least square synergistic drug combination prediction

Prediction of synergism and antagonism of drugs is a valid problem because, given the number of drugs, it is simply not possible to validate all possible combinations experimentally in a reasonable time. Many scientists have tried to create computational methods aimed at such a prediction. In the previous section, we described an approach that specifically did not use ML; its best result was a prediction of synergy between two drugs with about 46% sensitivity, despite the efforts of 31 teams from around the world. Applying ML approaches to the problem gives much better results. The method that Chen and colleagues [120] called Network-based Laplacian regularized least square synergistic drug combination prediction (NLLSS) was based on Laplacian regularized least squares (LARLS). In the NLLSS strategy, several types of information are integrated, including known synergistic drug combinations for specific pathogens, drug combinations that do not show synergism, drug–target interactions and drug chemical structure.
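To make the underlying idea of Laplacian regularized least squares concrete, the following is a minimal sketch of its simplest graph-based form. It is not the NLLSS implementation of Chen and colleagues, which integrates several information sources; here a single, hypothetical similarity matrix between candidate drug combinations is assumed, known synergistic combinations are labeled 1 and all others 0, and the function names and the parameter `eta` are illustrative. The scores minimize ||y - f||^2 + eta * f^T L f, where L is the normalized graph Laplacian, so that similar combinations receive similar scores.

```python
import numpy as np


def normalized_laplacian(similarity):
    """Return L = I - D^{-1/2} S D^{-1/2} for a symmetric, non-negative matrix S."""
    s = np.asarray(similarity, dtype=float)
    degree = s.sum(axis=1)
    inv_sqrt = np.zeros_like(degree)
    nonzero = degree > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(degree[nonzero])
    return np.eye(len(s)) - inv_sqrt[:, None] * s * inv_sqrt[None, :]


def laprls_scores(similarity, labels, eta=1.0):
    """Minimise ||y - f||^2 + eta * f^T L f; the closed form is f = (I + eta L)^{-1} y."""
    lap = normalized_laplacian(similarity)
    y = np.asarray(labels, dtype=float)
    return np.linalg.solve(np.eye(len(y)) + eta * lap, y)


if __name__ == "__main__":
    # Hypothetical similarities between four candidate combinations:
    # items 0 and 1 resemble each other, as do items 2 and 3.
    S = np.array([
        [0.0, 0.9, 0.1, 0.1],
        [0.9, 0.0, 0.1, 0.1],
        [0.1, 0.1, 0.0, 0.9],
        [0.1, 0.1, 0.9, 0.0],
    ])
    known_synergy = [1.0, 0.0, 0.0, 0.0]  # only item 0 is a known synergistic pair
    print(laprls_scores(S, known_synergy, eta=1.0))
```

In this toy setting the unlabeled combination most similar to the known synergistic one receives the highest score among the unlabeled items, which is the behavior one would rely on when ranking untested combinations for experimental validation.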
Sixty-nine compounds involved in antifungal drug combination experiments were studied. All published experimental studies of drug combinations were collated from public sources. The authors classified compounds as either principal drugs or adjuvant drugs: if one compound in a synergistic combination shows experimental activity but the other does not, then the first compound is considered the principal drug and the second the adjuvant drug. If both compounds in the synergistic pair show experimental activity, or neither one shows activity, then both compounds are labeled as principal and adjuvant drugs. If a compound does not have an experimental effect with any other compound, then it is labeled according to its experimental activity. Using the NLLSS approach, the authors achieved 89% prediction in 10-fold cross-validation.

Rules-based optimization

Drug treatment selection

Rule-based expert systems are ‘a central foundational pillar of artificial intelligence’, as pointed out by Lathrop and Pazzani [121]. These authors describe the simple rules that are used in CTHIV (‘Customized Treatment Strategies for HIV’, a rule-based expert computer program), a system in which drug treatment recommendations are made using drug-resistant mutations, ranking and weighting based on the antiviral activities of the drugs, overlapping toxicities, relative levels of drug resistance and the proportion of drug-resistant clones in the patient’s HIV quasi-species. The rules were written based on public information and case studies. For example, one rule in this system is [121, 122]:

IF the value of RT codon number 151 is ATG (= it encodes methionine), THEN infer resistance to AZT, ddI, d4T, and ddC WITH weight = 1.0

The weight in this rule is not ‘confidence’ as in standard expert systems but rather corresponds to the estimated level of viral resistance to a specific drug.
Weights are in the range of 0.1 (low) to 1.0 (high) and are defined by publications and/or expert opinion. In total, 55 rules are in the knowledge base and, in the case of HIV, mutations of viral proteins are taken into consideration. A similar concept that takes into consideration mutations of human genes can be used for estimation of possible drug resistance. For example, the well-known T790M mutation in EGFR confers resistance of cancer cells to reversible tyrosine kinase inhibitors. The authors of CTHIV claim the ability to predict results for one to four drugs for optimal combination therapy. A potential drawback of this approach is that not only real drug-resistant mutants are considered but also ‘nearby’ mutants, that is, HIV genes in the neighborhood of genes having drug-resistant mutations. This can be misleading, as there is high specificity of drug binding to these proteins. The Drug Interaction Knowledge-Base (DIKB) is a knowledge representation system designed to predict drug–drug interactions (DDIs) using drug action mechanisms [123]. Its knowledge base includes statements about drugs, drug metabolites and enzymes whose interactions are modeled using rule-based theory.

Drug resistance elucidation

There exist a number of rule-based systems that use expert knowledge of HIV resistance mutations in viral proteins. Twenty of the early rule-based systems were devoted to elucidation of genotypic drug resistance for antiviral therapy in AIDS [121]. Several rules are embedded in such systems. For example, one rule states that Y181C and E138C mutations in the virus reverse transcriptase cause resistance to etravirine, while addition of the E138K mutation to Y181C decreases the level of resistance to this drug compared to Y181C alone [124]. Another rule states that the missense mutation N88S induces hypersensitivity to amprenavir [125]. These systems include databases covering all possible combinations of drug-resistance-associated mutations.
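The weighted IF-THEN rule quoted for CTHIV earlier in this section can be sketched in a few lines of code. This is an illustration of the rule format only, not the CTHIV program: the class and function names are hypothetical, the knowledge base holds just the single rule given in the text, and taking the maximum weight when several rules fire for the same drug is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResistanceRule:
    """IF codon `position` of `gene` reads `codon` THEN infer resistance to `drugs` WITH `weight`."""
    gene: str
    position: int
    codon: str
    drugs: tuple
    weight: float  # estimated level of resistance, from 0.1 (low) to 1.0 (high)


# The only rule taken from the text; a real knowledge base would hold many more.
RULES = [
    ResistanceRule("RT", 151, "ATG", ("AZT", "ddI", "d4T", "ddC"), 1.0),
]


def infer_resistance(genotype, rules=RULES):
    """Return {drug: highest resistance weight} over all rules that fire.

    `genotype` maps (gene, codon position) to the observed codon string.
    Combining fired rules by maximum is an illustrative assumption.
    """
    resistance = {}
    for rule in rules:
        if genotype.get((rule.gene, rule.position)) == rule.codon:
            for drug in rule.drugs:
                resistance[drug] = max(resistance.get(drug, 0.0), rule.weight)
    return resistance


def rank_drugs(candidates, genotype, rules=RULES):
    """Order candidate drugs from least to most inferred resistance."""
    resistance = infer_resistance(genotype, rules)
    return sorted(candidates, key=lambda drug: resistance.get(drug, 0.0))


if __name__ == "__main__":
    patient = {("RT", 151): "ATG"}
    print(infer_resistance(patient))
    # "drugX" is a placeholder for a drug that no rule mentions.
    print(rank_drugs(["AZT", "drugX", "ddI"], patient))
```

Because the weight expresses the level of resistance rather than confidence, it can be used directly as a ranking key, which is what `rank_drugs` does.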
Identification of a disease

In total, 20–25% of patients with primary hyperparathyroidism have multigland disease. Proper identification is important for decisions on medical treatment or surgery. Imbus and colleagues [126] studied 2010 patients with primary hyperparathyroidism from a clinical trial. Medical imaging data were used for analysis. A random tree ML classifier had 96.1% predictive accuracy in selection. When a rule-based classifier was added, the accuracy grew to 100%.

Prediction of response to antiretroviral treatment

Prosperi and colleagues [127] studied 3143 treatment change episodes for HIV patients from the EuResist database, which included patient demographics, treatment history and viral genotypes. The initial logistic regression ML model performed better than the rule-based genotypic interpretation system (accuracy 75.6% versus 70.0%) and was close to the random forest model (76.2%). Nevertheless, when the authors combined the rule-based genotypic interpretation system with additional patient attributes, and this combination was used as input data for the regression model, the performance of the system increased significantly [127].

Discussion

A number of AI methods are used in combination drug therapy. In many cases, the level of confidence (the fraction of correct predictions) varies between 0.7 and 0.9, which is comparable with most automatic prediction systems. The differences between the various types of ML used for these predictions are not great. ANNs, random forest and SVM all have advantages and disadvantages, and the main problem in using AI for combination therapy is the proper selection of input parameters. It is crucial that the parameters that actually affect the quality of the prediction model be used. From this point of view, the work of Dash and Liu [29] on the use of SVM is particularly interesting.
In that study, the authors filter the input parameters and then withdraw one parameter at a time to determine how the effectiveness of the model deteriorates. Drug resistance is most probably the best example in which an AI system should include a combination of expert rules and machine learning. Indeed, the knowledge that some specific mutation leads to resistance to a specific drug is the result of expert knowledge based on the biomedical data. When the number of cases containing the same aberrations, or aberrations with a similar functional impact on genes, increases sufficiently, the prediction system will be able to use ML methods.

Deep learning versus traditional machine learning

A general conception regarding DL is that it requires more of everything: more source data, more computational brawn and more memory and storage resources. Shaikh and colleagues [128] note that DL requires a lot of hardware. ‘I have seen people training a simple DL model for days on their laptops (typically without GPUs) which leads to an impression that DL requires big systems to run execute’. Kumar and colleagues [129] confirm this common point of view: ‘Deep learning requires large amount of computational power to train models with these large datasets. Nevertheless, with the cloud and availability of Graphical Processing Units (GPUs), it is becoming possible to build sophisticated deep neural architectures and train them on a large data set on powerful computing infrastructure on the cloud.’ As noted by Cui and colleagues [130], in DL ‘large multi-layer neural networks are trained without preconceived models to learn complex features from raw input data. With sufficient training data and computing power, DL approaches far outperform other approaches for such tasks. The computation required, however, is substantial; prior studies have reported that satisfactory accuracy requires training large (billion-plus connection) neural networks on 100s or 1000s of servers for days [7, 14]’.
Neural network training is known to be well supported by GPUs but, as noted by Chilimbi and colleagues [131], this approach is only efficient for smaller-scale neural networks that can fit on GPUs attached to a single machine. The challenges of limited GPU memory and inter-machine communication have been identified as major problems in applying GPUs to DL [130]. Yepes and coauthors [132] compared SVM and Deep Belief Networks (DBN) as classifiers in text categorization in the biomedical domain. They show that DBN are superior when a large set of training examples is available, with an F-score increase of up to 5%. SVM performance is superior to DBN with smaller data sets. The differences in the best accuracies, even for the larger data set of 7688 input examples, are modest (e.g. the accuracy of SVM is 0.89, while the best for DBN is 0.90). So, when deciding whether to use DL instead of traditional ML methods, one must clearly understand the pros and cons of introducing DL. As shown in Table 2, DL significantly outperformed traditional ML in only three of eight biomedical problems. If one has a lot of training examples and access to GPU-equipped computers or cloud computing, it is most likely that one will move to DL; in the case of smaller data sets, one will have to decide on a case-by-case basis.

Table 2. Comparison of ML methods accuracy for different biomedical problems

Data set; DNN type; Accuracy (%): DNN, SVM, MLP Neural Nets, Bayesian, Decision tree, Random forest

Recognition of cancers from microarray analysis (averaged over eight data sets by the author) [102] 96.17 84.65 86.81 82.5 84.32
fMRI decoding [105] 84 89 87 92
Prediction of rapid progression of atherosclerosis based on analysis of ultrasound images (AUC) [104] 71.1 79.7 73.6
Prediction of pain intensity based on MRI [133] 91.33 88.83 92.00 95.81
Prediction of drug-induced nephrotoxicity [106] 81.6 70.2 87.8
Prediction of bioactivity of inhibitors of seven proteins (averaged by I.F.T.) [103] CNN 91.2 90.3 76.3 89.1
Recognition of lymph node metastasis from PET scan images [107] best values CNN CNN 87.40 83.15 85.08
Detecting retinal detachment (AUC) [108] CNN 98.8 97.6
Identification of autism spectrum disorder from the brain images [109] DA** 70.0 65.0 63.0
Sequence-based prediction of protein–protein interaction [110] SAE* 97.2 92.0-97.4 90.0
Pulmonary nodule recognition in lung cancer (image-based) [111] CNN 78.0 40.0
Discrimination of breast cancer with microcalcifications on mammography [107] SAE* 89.7 61.3
Detection of intracranial hypertension based on ECG and intracranial pressure data [113] CNN 87.19 73.6
SAE*+CNN 92.05 73.6

* Stacked autoencoder, ** Deep autoencoder.
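Because no traditional method is a consistent ‘champion’ in Table 2, a practical first step on a new data set is to compare several classifiers under identical cross-validation before deciding whether the extra cost of DL is justified. The sketch below shows one way to do this with scikit-learn [7]. The synthetic data set, the hyperparameters and the function name `compare_classifiers` are assumptions chosen for illustration; they do not reproduce any of the studies in the table.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(X, y, folds=5, seed=0):
    """Return {method name: mean cross-validated accuracy} for five traditional ML methods."""
    models = {
        # SVM and MLP are sensitive to feature scale, so they are wrapped with a scaler.
        "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        "MLP": make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=seed),
        ),
        "Naive Bayes": GaussianNB(),
        "Decision tree": DecisionTreeClassifier(random_state=seed),
        "Random forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    return {
        name: cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean()
        for name, model in models.items()
    }


if __name__ == "__main__":
    # Synthetic stand-in for a table of biomedical descriptors (hypothetical data).
    X, y = make_classification(
        n_samples=300, n_features=20, n_informative=5, random_state=0
    )
    results = compare_classifiers(X, y)
    for name, accuracy in sorted(results.items(), key=lambda item: -item[1]):
        print(f"{name:14s} {accuracy:.3f}")
```

The ranking obtained this way depends on the data set and on parametrization, which is consistent with the observation above that results depend mostly on proper parametrization and descriptor selection.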
Existing programs for ML

Below I present a table of the most popular ML programs that can be downloaded and/or used by both beginners and experienced users/programmers (Table 3).

Table 3. ML and DL programs that can be used

Shogun toolbox contains hundreds of various programs related to ML, including, but not limited to, SVM, MLP, random forest, DA and DBN. This is a comprehensive set of programs that requires some knowledge of programming. http://www.shogun-toolbox.org/mission

Mahout: Suite of ML libraries including logistic regression, naïve Bayes, hidden Markov models, k-means clustering and others. Requires programming knowledge. Mahout algorithms are implemented on top of the Apache Hadoop package. http://mahout.apache.org/users/basics/algorithms.html

MLlib, the Apache Spark library, includes a number of ML tools: SVM, logistic regression, naïve Bayes, decision trees, random forest, gradient-boosted trees, k-means and other clustering tools. https://spark.apache.org/docs/latest/mllib-guide.html

H2O prediction engine: Open-source ML library known for speed and scalability. Especially good for large volumes of data. Its algorithms include DL (only MLP) and ensemble trees such as XGBoost and random forest. http://docs.h2o.ai/h2o/latest-stable/h2o-docs/welcome.html

Deep Water H2O supports DL CNN and RNN with the use of GPU. It integrates the open-source TensorFlow, MXNet and Caffe packages. https://www.h2o.ai/deep-water/

GoLearn package includes k-NN, ANN, linear and logistic regression models. https://godoc.org/github.com/sjwhitworth/golearn

WEKA is an open-source program that can be downloaded and used without any additional programming. It contains tools for creating the following ML models: naïve Bayes, linear regression, k-NN, decision trees, including random forest, MLP and SVM. The existence of literally dozens of examples and tutorials on the Web makes this program useful for beginners, but it can also be used for real solid applications. I recently found >10 recent articles noting that the authors use WEKA for various tasks in biomedical science. https://www.cs.waikato.ac.nz/ml/weka/

ConvNetJS is a Javascript library for training DL models entirely in your browser. It contains traditional neural networks, SVM, regression, CNN and Deep Q Learning. Code is available on GitHub (https://github.com/karpathy/convnetjs) under MIT license. http://cs.stanford.edu/people/karpathy/convnetjs/ [134]

Caffe [135] is a DL tool. Models can be trained and used without programming, though Python and MATLAB interfaces are available. As noted by Angermueller and colleagues [134], Caffe offers one of the most efficient implementations for CNNs and provides multiple pretrained models for image recognition. RNNs are also implemented. As a downside, custom models need to be written in C++, and Caffe is not optimized for recurrent architectures. http://caffe.berkeleyvision.org/

Theano [136, 137] is well suited for building custom models and offers efficient implementations for RNNs. As noted by Angermueller and colleagues [134], software wrappers such as Keras (https://github.com/fchollet/keras) or Lasagne (https://github.com/Lasagne/Lasagne) allow building networks from existing components and reusing pretrained networks. The major drawback of Theano is frequently long compile times when building larger models.

TensorFlow was created by Google to replace Theano, and these two libraries are similar. RNN and CNN DL models can be created. Because of the algorithms used, it is significantly slower than other DL methods, but the level of user support is significantly greater. https://www.tensorflow.org/

Torch7 has support for ML algorithms using GPU, which makes it convenient in terms of speed of execution. It can be used for creating RNN models and autoencoders, along with k-means and PCA. http://torch.ch/

Key Points

ML strategies are successful in drug design.
The main problem in using AI for combination therapy is the proper selection of input parameters.
Combination of rule-based and ML methods is promising in combination therapy.

Funding

This article is partially supported by CureMatch Inc.

Igor F. Tsigelny is an expert in structural biology, molecular modeling, bioinformatics, structure-based drug design and personalized medicine. He has published >200 articles, 4 scientific books and around 15 patents. The book ‘Protein Structure Prediction: Bioinformatic Approach’ that he edited has been called ‘The Bible of all current prediction techniques’ by BioPlanet Bioinformatics Forums.
His computational study of molecular mechanisms of Parkinson’s disease was included in the US Department of Energy publication ‘Decade of Discovery’, which described the best computational studies of the decade 1999–2009. He is a Research Professor at UC San Diego and CTO of CureMatch Inc. (San Diego).

References
1 Calzolari D, Bruschi S, Coquin L. Search algorithms as a framework for the optimization of drug combinations. PLoS Comput Biol 2008;4(12):e1000249.
2 Calzolari D, Paternostro G, Harrington PL. Selective control of the apoptosis signaling network in heterogeneous cell populations. PLoS One 2007;2(6):e547.
3 Shortliffe EH, Buchanan B. A model of inexact reasoning in medicine. Math Biosci 1975;23(3–4):351–79.
4 Shortliffe EH, Axline SG, Buchanan BG, et al. An artificial intelligence program to advise physicians regarding antimicrobial therapy. Comput Biomed Res 1973;6(6):544–60. http://dx.doi.org/10.1016/0010-4809(73)90029-3
5 Poli R, Cagnoni S, Livi R, et al. A neural network expert system for diagnosing and treating hypertension. Computer 1991;24(3):64–71. http://dx.doi.org/10.1109/2.73514
6 Barry DW, Underwood CS, McCreedy BJ, et al. US Patent 6188988.
7 Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: machine learning in Python. J Mach Learn Res 2011;12:2825–30.
8 Vidyasagar M. Identifying predictive features in drug response using machine learning: opportunities and challenges. Annu Rev Pharmacol Toxicol 2015;55(1):15–34. http://dx.doi.org/10.1146/annurev-pharmtox-010814-124502
9 Wang D, Larder BA, Revell A, et al. A neural network model using clinical cohort data accurately predicts virological response and identifies regimens with increased probability of success in treatment failures. Antivir Ther 2003;8:S112.
10 Bishop CM. Neural Networks for Pattern Recognition. Oxford: Clarendon Press, 1995.
11 Menden MP, Iorio F, Garnett M, et al. Machine learning prediction of cancer cell sensitivity to drugs based on genomic and chemical properties. PLoS One 2013;8(4):e61318.
12 Garnett MJ, Edelman EJ, Heidorn SJ, et al. Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature 2012;483(7391):570–5. http://dx.doi.org/10.1038/nature11005
13 Heaton J. Programming Neural Networks with Encog3 in Java. St. Louis: Heaton Research, Inc., 2011.
14 Riedmiller M, Braun H. A direct adaptive method for faster backpropagation learning: the RPROP algorithm. In: Proceedings of the IEEE International Conference on Neural Networks. 1993, 586–91.
15 Dosovitskiy A, Fischer P, Springenberg JT, et al. Discriminative unsupervised feature learning with exemplar convolutional neural networks. IEEE Trans Pattern Anal Mach Intell 2016;38(9):1734–47. http://dx.doi.org/10.1109/TPAMI.2015.2496141
16 Lötsch J, Ultsch A. Process pharmacology: a pharmacological data science approach to drug development and therapy. CPT Pharmacometrics Syst Pharmacol 2016;5(4):192–200.
17 Ultsch A. Maps for visualization of high-dimensional data spaces. In: Proceedings of Workshop on Self-Organizing Maps. Kyushu, Japan: WSOM, 2003, 225–30.
18 Ultsch A, Siemon HP, eds. Kohonen’s self-organizing feature maps for exploratory data analysis. In: Proceedings of International Neural Networks Conference (INNC 1990). Dordrecht, Netherlands: Kluwer, 1990.
19 Lötsch J, Ultsch A. Exploiting the structures of the U-Matrix. In: Villmann T, Schleif FM, Kaden M, Lange M (eds). Advances in Intelligent Systems and Computing. Heidelberg, Germany: Springer, 2014, 248–57.
20 Ultsch A, Moerchen F. Databionic ESOM tools 2005. http://databionic-esom.sourceforge.net/devel.html
21 Wishart DS, Knox C, Guo AC, et al. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res 2006;34(Database issue):D668–72.
22 Huang DW, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 2009;4(1):44–57. http://dx.doi.org/10.1038/nprot.2008.211
23 Ashburner M, Ball CA, Blake JA, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 2000;25(1):25–9.
24 Pivetta T, Isaia F, Trudu F, et al. Development and validation of a general approach to predict and quantify the synergism of anti-cancer drugs using experimental design and artificial neural networks. Talanta 2013;115:84–93. http://dx.doi.org/10.1016/j.talanta.2013.04.031
25 Cortes C, Vapnik VN. Support vector networks. Mach Learn 1995;20(3):273–97. http://dx.doi.org/10.1007/BF00994018
26 Zhao M, Li Z, He W. Classifying four carbon fiber fabrics via machine learning: a comparative study using ANNs and SVM. Appl Sci 2016;6(8):209. http://dx.doi.org/10.3390/app6080209
27 Li H, Tang X, Wang R, et al. Comparative study on theoretical and machine learning methods for acquiring compressed liquid densities of 1,1,1,2,3,3,3-heptafluoropropane (R227ea) via Song and Mason equation, support vector machine, and artificial neural networks. Appl Sci 2016;6(1):25. http://dx.doi.org/10.3390/app6010025
28 Dorman SN, Baranova K, Knoll JH, et al. Genomic signatures for paclitaxel and gemcitabine resistance in breast cancer derived by machine learning. Mol Oncol 2016;10(1):85–100. http://dx.doi.org/10.1016/j.molonc.2015.07.006
29 Dash M, Liu H. Feature selection for classification. Intell Data Anal 1997;1(1–4):131–56.
30 Ng PC, Henikoff S. SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Res 2003;31(13):3812–14.
31 Breiman L. Random forests. Mach Learn 2001;45(1):5–32. http://dx.doi.org/10.1023/A:1010933404324
32 Verikas A, Vaiciukynas E, Gelzinis A, et al. Electromyographic patterns during golf swing: activation sequence profiling and prediction of shot effectiveness. Sensors 2016;16(4):592.
33 Chen L, Li BQ, Zheng MY, et al. Prediction of effective drug combinations by chemical interaction, protein interaction and target enrichment of KEGG pathways. Biomed Res Int 2013;2013:723780.
34 Kuhn M, von Mering C, Campillos M, et al. STITCH: interaction networks of chemicals and proteins. Nucleic Acids Res 2008;36(Database issue):D684–8.
35 Jensen LJ, Kuhn M, Stark M, et al. STRING 8 - a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res 2009;37(Database issue):D412–16.
36 Hansen PW, Clemmensen L, Sehested TS, et al. Identifying drug–drug interactions by data mining. Circ Cardiovasc Qual Outcomes 2016;9(6):621–8. http://dx.doi.org/10.1161/CIRCOUTCOMES.116.003055
37 McDonald JH. Handbook of Biological Statistics. http://www.biostathandbook.com/simplelogistic.html
38 Huang H, Zhang P, Qu XA, et al. Systematic prediction of drug combinations based on clinical side-effects. Sci Rep 2014;4:7160.
39 Kuhn M, Campillos M, Letunic I, et al. A side effect resource to capture phenotypic effects of drugs. Mol Syst Biol 2010;6:343.
40 Tatonetti NP, Ye PP, Daneshjou R, et al. Data-driven prediction of drug effects and interactions. Sci Transl Med 2012;4(125):125ra131.
41 http://www.fda.gov/Drugs/GuidanceComplianceRegulatoryInformation/Surveillance/AdverseDrugEffects/
42 Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: machine learning in Python. J Mach Learn Res 2011;12:2825–30.
43 Friedman JH. Stochastic gradient boosting. Comp Stat Data Anal 2002;38(4):367–78. http://dx.doi.org/10.1016/S0167-9473(01)00065-2
44 Xu Q, Xiong Y, Dai H, et al. PDC-SGB: prediction of effective drug combinations using a stochastic gradient boosting algorithm. J Theor Biol 2017;417:1–7. http://dx.doi.org/10.1016/j.jtbi.2017.01.019
45 Glick M, Jenkins JL, Nettles JH, et al. Enrichment of high-throughput screening data with increasing levels of noise using support vector machines, recursive partitioning, and Laplacian-modified naive Bayesian classifiers. J Chem Inf Model 2006;46(1):193–200. http://dx.doi.org/10.1021/ci050374h
46 Schuffenhauer A, Floersheim P, Acklin P, et al. Similarity metrics for ligands reflecting the similarity of the target proteins. J Chem Inf Comput Sci 2003;43(2):391–405. http://dx.doi.org/10.1021/ci025569t
47 Nidhi, Glick M, Davies JW, et al. Prediction of biological targets for compounds using multiple-category Bayesian models trained on chemogenomics databases. J Chem Inf Model 2006;46:1124–33. http://dx.doi.org/10.1021/ci060003g
48 Xia X, Maliski EG, Gallant P, et al. Classification of kinase inhibitors using a Bayesian model. J Med Chem 2004;47(18):4463–70. http://dx.doi.org/10.1021/jm0303195
49 Glick M, Klon AE, Acklin P, et al. Enrichment of extremely noisy high-throughput screening data using a naïve Bayes classifier. J Biomol Screening 2004;9(1):32–6.
50 Hameed PN, Verspoor K, Kusljic S, et al. Positive-unlabeled learning for inferring drug interactions based on heterogeneous attributes. BMC Bioinformatics 2017;18(1):140. http://dx.doi.org/10.1186/s12859-017-1546-7
51 Vilar S, Uriarte E, Santana L, et al. Similarity-based modeling in large-scale prediction of drug-drug interactions. Nat Protoc 2014;9(9):2147–63. http://dx.doi.org/10.1038/nprot.2014.151
52 Zaman N, Li L, Jaramillo ML, et al. Signaling network assessment of mutations and copy number variations predict breast cancer subtype-specific drug targets. Cell Rep 2013;5(1):216–23. http://dx.doi.org/10.1016/j.celrep.2013.08.028
53 Reimand J, Tooming L, Peterson H, et al. GraphWeb: mining heterogeneous biological networks for gene modules with functional significance. Nucleic Acids Res 2008;36(Web Server issue):W452–9.
54 Wang E, Zaman N, Mcgee SR, et al. Predictive genomics: a cancer hallmark network framework for predicting tumor clinical phenotypes using genome sequencing data. Semin Cancer Biol 2015;30:4–12. http://dx.doi.org/10.1016/j.semcancer.2014.04.002
55 Li L, Tibiche C, Fu C, et al. The human phosphotyrosine signaling network: evolution and hotspots of hijacking in cancer. Genome Res 2012;22(7):1222–30. http://dx.doi.org/10.1101/gr.128819.111
56 Yap CW. PaDEL-descriptor: an open source software to calculate molecular descriptors and fingerprints. J Comp Chem 2010;7:1466–74.
57 PaDel-descriptor. http://www.yapcwsoft.com/dd/padeldescriptor/
58 McGee SR, Tibiche C, Trifiro M, et al. Network analysis reveals a signaling regulatory loop in PIK3CA-mutated breast cancer predicting survival outcome. Genomics Proteomics Bioinformatics 2017;15(2):121–9. http://dx.doi.org/10.1016/j.gpb.2017.02.002
59 Li J, Lenferink AEG, Deng Y, et al. Identification of high-quality cancer prognostic markers and metastasis network modules. Nat Commun 2010;1(34):1–8.
60 Albelwi S, Mahmood A. A framework for designing the architectures of deep convolutional neural networks. Entropy 2017;19(6):242. http://dx.doi.org/10.3390/e19060242
61 Hubel DH, Wiesel TN. Receptive fields and functional architecture of monkey striate cortex. J Physiol 1968;195(1):215–43. http://dx.doi.org/10.1113/jphysiol.1968.sp008455
62 Preuer K. Deep learning for drug combinations synergy prediction. Thesis, Johannes Kepler Universität, Linz, 2016.
63 Hanahan D, Weinberg RA. The hallmarks of cancer. Cell 2000;100(1):57–70. http://dx.doi.org/10.1016/S0092-8674(00)81683-9
64 Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell 2011;144(5):646–74. http://dx.doi.org/10.1016/j.cell.2011.02.013
65 Loewe S. Die quantitativen probleme der pharmakologie. Ergeb Physiol 1928;27:47–187. http://dx.doi.org/10.1007/BF02322290
66 Bliss C. The toxicity of poisons applied jointly. Ann Appl Biol 1939;26(3):585–615. http://dx.doi.org/10.1111/j.1744-7348.1939.tb06990.x
67 Berenbaum MC. What is synergy? Pharmacol Rev 1989;41(2):93–141.
68 Jodrell D. Combenefit. 2015. http://sourceforge.net/projects/combenefit/ (2 August 2016, date last accessed).
69 Vougas K, Jackson T, Polyzos A, et al. Deep learning and association rule mining for predicting drug response in cancer. A personalised medicine approach. bioRxiv 2017. http://dx.doi.org/10.1101/070490 (19 August 2016, date last accessed).
70 LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521(7553):436–44. http://dx.doi.org/10.1038/nature14539
71 Breiman L. Bagging predictors. Mach Learn 1996;24(2):123–40. http://dx.doi.org/10.1007/BF00058655
72 Atkinson F. Standardiser v0.1.7. 2014. https://github.com/flatkinson/standardiser (11 December 2015, date last accessed).
73 Williams RJ, Zipser D. A learning algorithm for continually running fully recurrent neural networks. Neural Comput 1989;1(2):270–80. http://dx.doi.org/10.1162/neco.1989.1.2.270
74 Ravi D, Wong C, Deligianni F, et al. Deep learning for health informatics. IEEE J Biomed Health Inform 2017;21(1):4–21. http://dx.doi.org/10.1109/JBHI.2016.2636665
75 Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput 1997;9(8):1735–80. http://dx.doi.org/10.1162/neco.1997.9.8.1735
76 Lipton ZC, Kale DC, Elkan C, et al. Learning to diagnose with LSTM recurrent neural networks. arXiv. arXiv:1511.03677.
77 Lusci A, Pollastri G, Baldi P. Deep architectures and deep learning in chemoinformatics: the prediction of aqueous solubility for drug-like molecules. J Chem Inf Model 2013;53(7):1563–75. http://dx.doi.org/10.1021/ci400187y
78 Hinton GE, Osindero S, Teh YW. A fast learning algorithm for deep belief nets. Neural Comput 2006;18(7):1527–54. http://dx.doi.org/10.1162/neco.2006.18.7.1527
79 Hou Y, Wang C, Ji Y. The research of event detection and characterization technology of ticket gate in the urban rapid rail transit. J Softw Eng Appl 2015;8:6–15. http://dx.doi.org/10.4236/jsea.2015.81002
80 Ibrahim R, Yousri NA, Ismail MA, et al. Multi-level gene/miRNA feature selection using deep belief nets and active learning. Proc Eng Med Biol Soc 2014;2014:3957–60.
81 Ghaisani F, Wasito I, Faturrahman M, et al. Prognosis cancer prediction model using deep belief network approach. J Theor Appl Inf Technol 2017;95(20):5369–78.
82 Khademi M, Nedialkov NS. Probabilistic graphical models and deep belief networks for prognosis of breast cancer. In: Proceedings of the IEEE 14th International Conference on Machine Learning and Applications (ICMLA 2015). 2015, 727–32.
83 Lee H, Grosse R, Ranganath R, et al. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In: Proceedings of the 26th Annual International Conference on Machine Learning. 2009, 609–16.
84 Lee H, Grosse R, Ranganath R, et al. Unsupervised learning of hierarchical representation with convolutional deep belief networks. Comm of ACM 2011;54:95–103.
85 Cao R, Bhattacharya D, Hou J, et al. DeepQA: improving the estimation of single protein model quality with deep belief networks. BMC Bioinformatics 2016;17(1):2–9.
86 Nair V, Hinton GE. Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10). 2010, 807–14. Haifa, Israel: Omnipress. http://www.icml2010.org/papers/432.pdf
87 Salakhutdinov R, Hinton GE. Deep Boltzmann machines. In: Proceedings of the 12th International Conference on Artificial Intelligence and Statistics. 2009, 448–55. ACM 2009.
88 Keyvanrad MA, Homayoonpoor M. A brief survey on deep belief networks and introducing a new object oriented MATLAB toolbox (DeeBNet V2.0). arXiv. arXiv:1408.3264 [cs.CV] https://www.researchgate.net/publication/264790642_ (11 December 2017, date last accessed).
89 Reichert DP, Seriès P, Storkey AJ. Charles Bonnet syndrome: evidence for a generative model in the cortex? PLoS Comput Biol 2013;9(7):e100313.
90 Guo Y, Liu Y, Oerlemans A, et al. Deep learning for visual understanding: a review. Neurocomput 2016;187:27–48. http://dx.doi.org/10.1016/j.neucom.2015.09.116
91 Suk HI, Lee SW, Shen D. Hierarchical feature representation and multimodal fusion with deep learning for AD/MCI diagnosis. Neuroimage 2014;101:569–82. http://dx.doi.org/10.1016/j.neuroimage.2014.06.077
92 Ortiz A, Munilla J, Górriz JM, Ramírez J. Ensembles of deep learning architectures for the early diagnosis of the Alzheimer's disease. Int J Neur Syst 2016;26(7):1650025.
93 Graff P, Feroz F, Hobson MP, Lasenby A. SKYNET: an efficient and robust neural network training tool for machine learning in astronomy. Mon Not Roy Astron Soc 2014;441(2):1741–59. arXiv:1309.0790 [astro-ph.IM]
94 Li H, Lyu Q, Cheng J, et al. A template-based protein structure reconstruction method using deep autoencoder learning. J Proteomics Bioinform 2016;9(12):306–13.
95 Gomez-Bombarelli R, Duvenaud D, Hernández-Lobato JM, et al. Automatic chemical design using a data-driven continuous representation of molecules. arXiv. arXiv:1610.02415v2 [cs.LG] 6 Jan 2017.
96 Wang L, You ZH, Chen X. A computational-based method for predicting drug-target interactions by using stacked autoencoder deep neural network. J Comput Biol 2017, in press. 10.1089/cmb.2017.0135.
97 Gunther S, Kuhn M, Dunkel M, et al. SuperTarget and Matador: resources for exploring drug-target relationships. Nucl Acids Res 2008;36(Database issue):D919–22.
98 Kanehisa M, Goto S, Hattori M, et al. From genomics to chemical genomics: new developments in KEGG. Nucl Acids Res 2006;34(90001):D354–7.
99 Schomburg I, Chang A, Ebeling C, et al. BRENDA, the enzyme database: updates and major new developments. Nucl Acids Res 2004;32(Database issue):D431–3.
100 Yamanishi Y, Araki M, Gutteridge A, et al. Prediction of drug-target interaction networks from the integration of chemical and genomic spaces. Bioinformatics 2008;24(13):i232–40.
101 Yamanishi Y, Kotera M, Kanehisa M, et al. Drug-target interaction prediction from chemical, genomic and pharmacological data in an integrated framework. Bioinformatics 2010;26(12):i246–54.
102 Pirooznia M, Yang JY, Yang MQ, Deng Y. A comparative study of different machine learning methods on microarray gene expression data. BMC Genomics 2008;9(Suppl 1):S13.
103 Koutsoukas A, Monaghan KJ, Li X, Huan J. Deep-learning: investigating deep neural networks hyper-parameters and comparison of performance to shallow methods for modeling bioactivity data. J Cheminform 2017;9(1):42. http://dx.doi.org/10.1186/s13321-017-0226-y
104 Hu X, Reaven PD, Saremi A. Machine learning to predict rapid progression of carotid atherosclerosis in patients with impaired glucose tolerance. EURASIP J Bioinform Syst Biol 2016;1:14.
105 Douglas PK, Harris S, Yuille A, Cohen MS. Performance comparison of machine learning algorithms and number of independent components used in fMRI decoding of belief vs. disbelief. Neuroimage 2011;56(2):544–53.
106 Su R, Li Y, Zink D, Loo LH. Supervised prediction of drug-induced nephrotoxicity based on interleukin-6 and -8 expression levels. BMC Bioinformatics 2014;15(Suppl 16):S16.
107 Wang H, Zhou Z, Li Y. Comparison of machine learning methods for classifying mediastinal lymph node metastasis of non-small cell lung cancer from 18F-FDG PET/CT. EJNMMI Res 2017;7:11. http://dx.doi.org/10.1186/s13550-017-0260-9
108 Ohsugi H, Tabuchi H, Enno H. Accuracy of deep learning, a machine-learning technology, using ultra–wide-field fundus ophthalmoscopy for detecting rhegmatogenous retinal detachment. Sci Rep 2017;7:9425. http://dx.doi.org/10.1038/s41598-017-09891-x
109 Heinsfeld AS, Franco AR, Craddock RC, et al. Identification of autism spectrum disorder using deep learning and the ABIDE dataset. Neuroimage Clin 2018;17:16–23. http://dx.doi.org/10.1016/j.nicl.2017.08.017
110 Sun T, Zhou B, Lai L, Pei J. Sequence-based prediction of protein-protein interaction using a deep-learning algorithm. BMC Bioinformatics 2017;18(1):277. http://dx.doi.org/10.1186/s12859-017-1700-2
111 Ciompi F, Chung K, van Riel SJ, et al. Towards automatic pulmonary nodule management in lung cancer screening with deep learning. Sci Rep 2017;7:46479. http://dx.doi.org/10.1038/srep46479
112 Wang J, Yang X, Cai H, et al. Discrimination of breast cancer with microcalcifications on mammography by deep learning. Sci Rep 2016;6:27327. http://dx.doi.org/10.1038/srep27327
113 Quachtran B, Hamilton R, Scalzo F. Detection of intracranial hypertension using deep learning. Proc IAPR Int Conf Pattern Recogn 2016;2016:2491–6.
114 Bansal M, Yang J, Karan C, et al. A community computational challenge to predict the activity of pairs of compounds. Nat Biotechnol 2014;32(12):1213–22. http://dx.doi.org/10.1038/nbt.3052
115 Shah MA, Schwartz GK. Cell cycle-mediated drug resistance: an emerging concept in cancer therapy. Clin Cancer Res 2001;7:2168–81.
116 Recht A, Come SE, Henderson IC, et al. The sequencing of chemotherapy and radiation therapy after conservative surgery for early-stage breast cancer. N Engl J Med 1996;334(21):1356–61. http://dx.doi.org/10.1056/NEJM199605233342102
117 Aytes A, Mitrofanova A, Lefebvre C, et al. Cross-species regulatory network analysis identifies a synergistic interaction between FOXM1 and CENPF that drives prostate cancer malignancy. Cancer Cell 2014;25(5):638–51. http://dx.doi.org/10.1016/j.ccr.2014.03.017
118 Chen JC, Alvarez MJ, Talos F, et al. Identification of causal genetic drivers of human disease through systems-level analysis of regulatory networks. Cell 2014;159(2):402–14. http://dx.doi.org/10.1016/j.cell.2014.09.021
119 Chudnovsky Y, Kim D, Zheng S, et al. ZFHX4 interacts with the NuRD core member CHD4 and regulates the glioblastoma tumor-initiating cell state. Cell Rep 2014;6(2):313–24. http://dx.doi.org/10.1016/j.celrep.2013.12.032
120 Chen X, Ren B, Chen M, et al. NLLSS: predicting synergistic drug combinations based on semi-supervised learning. PLoS Comput Biol 2016;12(7):e1004975.
121 Lathrop RH, Pazzani MJ. Combinatorial optimization in rapidly mutating drug-resistant viruses. J Comb Optim 1999;3(2/3):301–20.
122 Iversen AK, Shafer RW, Wehrly K, et al. Multidrug-resistant human immunodeficiency type I strains resulting from combination antiretroviral therapy. J Virol 1996;70(2):1086–90.
123 Boyce R, Collins C, Horn J, et al. Computing with evidence. Part II: an evidential approach to predicting metabolic drug–drug interactions. J Biomed Inform 2009;42:990–1003.
124 Xu HT, Oliveira M, Asahchop EL, et al. Molecular mechanism of antagonism between the Y181C and E138K mutations in HIV-1 reverse transcriptase. J Virol 2012;86(23):12983–90. http://dx.doi.org/10.1128/JVI.02005-12
125 Ziermann R, Limoli K, Das K, et al. A mutation in human immunodeficiency virus type 1 protease, N88S, that causes in vitro hypersensitivity to amprenavir. J Virol 2000;74(9):4414–18. http://dx.doi.org/10.1128/JVI.74.9.4414-4419.2000
126 Imbus JR, Randle RW, Pitt SC, et al. Machine learning to identify multigland disease in primary hyperparathyroidism. J Surg Res 2017;219:173–9. http://dx.doi.org/10.1016/j.jss.2017.05.117
127 Prosperi MC, Altmann A, Rosen-Zvi M, et al. Investigation of expert rule bases, logistic regression, and non-linear machine learning techniques for predicting response to antiretroviral treatment. Antivir Ther 2009;14(3):433–42.
128 Shaikh F. Analytics Vidhya. https://www.analyticsvidhya.com/blog/2017/05/gpus-necessary-for-deep-learning/
129 Kumar G, Gronlund CJ, Severtson RS. Introduction to the deep learning virtual machine. Microsoft Azure. https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/deep-learning-dsvm-overview
130 Cui H, Zhang Z, Ganger GB, et al. GeePS: scalable deep learning on distributed GPUs with a GPU-specialized parameter server. In: Proceedings of the Eleventh European Conference on Computer Systems (EuroSys '16), Article No. 4. ACM, London, UK, 2016.
131 Chilimbi T, Suzue Y, Apacible J, et al. Project Adam: building an efficient and scalable deep learning training system. In: Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation, OSDI. 2014. USENIX Assn, Broomfield, CO, USA.
132 Yepes AJ, MacKinlay A, Bedo J, et al. Deep belief networks and biomedical text categorisation. In: G Ferraro, S Wan (eds), Proceedings of Australasian Language Technology Association Workshop. 2014, 123–7. RMTT, Melbourne, Australia.
133 Robinson ME, O'Shea AM, Craggs JG, et al. Comparison of machine classification algorithms for fibromyalgia: neuroimages versus self-report. J Pain 2015;16(5):472–7. http://dx.doi.org/10.1016/j.jpain.2015.02.002
134 Angermueller C, Pärnamaa T, Parts L, Stegle O. Deep learning for computational biology. Mol Syst Biol 2016;12(7):878.
135 Jia Y, Shelhamer E, Donahue J. Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the ACM International Conference on Multimedia. New York, NY: ACM, 2014, 675–8.
136 Bastien F, Lamblin P, Pascanu R, et al. Theano: new features and speed improvements. arXiv 2012. arXiv:1211.5590
137 The Theano Development Team, Al-Rfou R, Alain G, et al. Theano: a Python framework for fast computation of mathematical expressions. arXiv 2016. arXiv:1605.02688

© The Author(s) 2018. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

Journal

Briefings in Bioinformatics, Oxford University Press

Published: Feb 9, 2018
