Proteomic analysis and prediction of amino acid variations that influence protein posttranslational modifications

Abstract
Accumulating studies have indicated that amino acid variations, by changing the residue type at target sites or at key flanking residues, can directly or indirectly influence protein posttranslational modifications (PTMs) and have a detrimental effect on protein function. Computational mutation analysis can greatly reduce the experimental effort required. To increase the utilization of current computational resources, we first provide an overview of the computational prediction of amino acid variations that influence protein PTMs and of their functional analysis. We also discuss the challenges faced in developing novel in silico approaches in the future. Better methods for mutation analysis related to protein PTMs will help to facilitate the development of personalized precision medicine.

Keywords: posttranslational modifications, amino acid variations, computational mutation analysis, protein PTM predictor, network biology

Introduction
Protein PTMs are biochemical alterations of amino acids that change the physicochemical properties of target proteins, leading to structural changes and thereby regulating protein–protein interactions and cellular signal transduction in developmental and cancer pathways [1]. Almost all amino acids undergo PTMs, except leucine (L), isoleucine (I), valine (V), alanine (A) and phenylalanine (F) [2]. PTMs are specific to particular types of amino acid residues. For example, phosphorylation mainly occurs on three types of amino acids: serine (S), threonine (T) and tyrosine (Y). Methylation is predominantly found on lysine (K) and arginine (R) residues. N-linked glycosylation affects asparagine (N), and O-linked glycosylation occurs on the hydroxyl group of either serine (S) or threonine (T) [3].
The development and improvement of high-throughput mass spectrometry (MS)-based proteomics have greatly advanced the discovery and identification of PTMs [4, 5]. More than 600 different types of PTMs (http://www.uniprot.org/docs/ptmlist, Release: 20-Dec-2017) have been reported to date, and more are still being identified. Most PTMs are catalyzed by highly specific protein-modifying enzymes that recognize specific motifs. For instance, the Type I protein arginine methyltransferases are known to methylate a number of proteins that contain an arginine-glycine-glycine (RGG) motif [6]. Consequently, amino acid variations that change the residue type at target sites or key flanking residues could directly or indirectly influence the PTMs of a protein and have a detrimental effect on its function. Li et al. [7] analyzed amino acid variations for 15 different PTMs and indicated that about 4.5% of amino acid variations may affect protein function through disruption of PTMs, and that mutations of 238 PTM sites in human proteins were causative of disease. In a number of cases, mutations of modified sites have been found to be involved in disease. The K630A mutation in the androgen receptor has been shown to cause loss of an acetylation site and has been implicated in Kennedy’s disease, an inherited neurodegenerative disorder [8]. The amino acid variation S326C of human OGG1 disrupts the Ser-326 phosphorylation site and affects susceptibility to a variety of cancers [9]. Lu et al. [10] reported that the K36M mutation of H3 impairs the differentiation of mesenchymal progenitor cells and promotes undifferentiated sarcoma through an altered histone methylation landscape. More recently, Narayan et al. [11] indicated that acetylation and ubiquitination site mutations are enriched in cancer.
In this regard, comprehensive studies of the impact of amino acid variation on protein PTMs will help to clarify how genetic polymorphisms are involved in regulating biological and pathological processes and will provide instructive information for drug development for related diseases. Advanced laboratory techniques, such as MS-based experiments, have been used to analyze PTM sites. However, making thousands of variant proteins and selecting the amino acid variations that influence PTM sites often requires extensive laboratory work and considerable expense. Moreover, with the advent of high-throughput variant detection, the number of identified variations is growing rapidly. For instance, the SwissVariant database (http://swissvar.expasy.org/) contained 76 613 variants in 20 244 human proteins on 10 January 2018. Experimental methods cannot test whether each of these variations changes PTMs. Therefore, fast and economical bioinformatics-based methods for analyzing the effect of mutations on protein PTMs are gaining popularity. Computational mutation analysis can greatly reduce the experimental effort required. Since 2005, seventeen computational or in silico approaches have been reported in the literature for analyzing the effect of mutations on protein PTMs. Table 1 provides a simple overview of all the approaches, including the type of PTMs, type of variations, PTM site predictor, year of publication and Website. We have broadly categorized these studies into two fields: one is the computational prediction of amino acid variations that influence protein PTMs, and the other is the corresponding functional analysis. Here, we review most of these works and discuss current computational challenges.
Table 1.
Summary of recent hierarchical ensemble techniques in prediction and analysis of amino acid variations that influence protein PTMs

Type of PTM | Type of variations | PTM site predictor | Year/Reference | Website (http://)
Phosphorylation | Type I, II | NetPhos | 2005/[12] | –
Phosphorylation | Type I, II | DisPhos 1.3 | 2008/[13] | –
Phosphorylation | Type I, II, III | PredPhospho (version 2) | 2009/[14] | phosphovariant.ngri.go.kr
Phosphorylation | Type I, II, III, IV, VI | GPS 2.0 | 2010/[15] | phossnp.biocuckoo.org
Phosphorylation | Type I | PhosPhAt | 2010/[16] | –
N-glycosylation | Type I, II | NetNGlyc | 2012/[17] | –
Lysine acetylation | Type I, II, III | KAcePred | 2013/[18] | bioinfo.ncu.edu.cn/AcetylAAVs_Home.aspx
Phosphorylation | Type I, II | ActiveDriver | 2013/[19] | individual.utoronto.ca/reimand/ActiveDriver/
Phosphorylation | Type I, II | ActiveDriver PWMs | 2013/[20] | –
Tyrosine sulfation, nitration and phosphorylation | Type I | GPS-TSP, GPS-YNO2 | 2014/[21] | –
Phosphorylation | Type I, II, III, IV | GPS 2.1 | 2015/[22] | phossnp.biocuckoo.org/dataset.php
Phosphorylation | Type II | MIMP | 2015/[23] | mimp.baderlab.org
Sumoylation | Type I, II, III, V | SumoPred | 2015/[24] | bioinfo.ncu.edu.cn/SUMOAMVR_Home.aspx
S-palmitoylation | Type I, II | SeqPalm | 2015/[25] | lishuyan.lzu.edu.cn/seqpalm
Phosphorylation | NAMs | ReKINect | 2015/[26] | rekinect.science/home
Acetylation and ubiquitination | Type I, II | ActiveDriver | 2016/[11] | –
Phosphorylation | Type I, II, III, IV | NetPhosK1.0, Rice_phospho1.0 | 2016/[27] | bioinformatics.fafu.edu.cn

Type of PTMvariants
Non-synonymous single-nucleotide polymorphisms (nsSNPs) result in the substitution of the encoded amino acids. In this work, PTMvariants are amino acid variations that might influence protein PTM sites or their modifying enzymes. For example, phosphovariants are variations that affect phosphorylation sites or their interacting kinases and so on. In theory, all PTMvariants may be divided into six types from the existing literature [12–25].
Type I PTMvariants occur at a PTM site position and directly add [Type I (+)] or remove [Type I (−)] the PTM site. Type II PTMvariants do not occur at the PTM site position but at a position adjacent to the PTM site, and add [Type II (+)] or remove [Type II (−)] the PTM site. Type III PTMvariants occur at a position adjacent to the PTM site and may change the type of modifying enzyme involved, without changing the PTM site itself. Some protein PTMs can occur on several types of amino acid residues. For instance, arginine and lysine are the most frequently methylated residues. Thus, an amino acid substitution between lysine (K) and arginine (R) at a methylation site might also change the type of methyltransferase acting on that site. That is to say, the target site might still be methylated, but by a different type of methyltransferase; this is defined as a Type IV PTMvariant. Also, some amino acid residues can be modified by different kinds of PTMs. For example, the lysine side chain is a target of methylation, acetylation, ubiquitination, sumoylation, succinylation, hydroxylation and so on. So, amino acid variations at positions adjacent to a lysine methylation site may change the environment surrounding the central lysine (K) and thereby change the type of PTM that occurs there; this is defined as a Type V PTMvariant. Moreover, Type VI PTMvariants occur at a position adjacent to a PTM site and result in a stop codon, which might remove the following PTM site toward the protein C terminus. Among the six types of PTMvariants, Type I and Type IV occur at the PTM site itself, and the other four types occur at positions adjacent to the PTM site. Here, we use a lysine methylation peptide sequence as an example to illustrate the six types of PTMvariants in Figure 1.
Figure 1.
Schematic illustration of the six types of PTMvariants, with a lysine methylation peptide sequence as an example: a change of an amino acid to or from a lysine (K) residue that creates a potential new lysine methylation site [Type I (+)] or removes an original one [Type I (−)]; variation adjacent to a lysine methylation site that creates [Type II (+)] or removes [Type II (−)] the site; variation adjacent to a lysine methylation site that may change the type of lysine methyltransferase (KMT) recognizing the site, without changing the methylation site itself (Type III); variation between lysine (K) and arginine (R) at a methylation site, which might change the type of methyltransferase acting on the site (Type IV); variation adjacent to a lysine methylation site that may change the environment surrounding the central lysine (K) and thereby change the type of PTM, for example to lysine acetylation (Type V); and variation adjacent to a lysine methylation site that results in a stop codon, which might remove the following lysine methylation site toward the protein C terminus (Type VI). Pink amino acid residues are mutated residues. A lysine (K) linked with a methyl group can be methylated by a KMT, and a K linked with an acetyl group can be acetylated by a lysine acetyltransferase (KAT). An arginine (R) linked with a methyl group can be methylated by an arginine methyltransferase (RMT). KMT1 and KMT2 represent two different types of lysine methyltransferase.

Prediction of PTMvariants
The computational procedure for PTMvariant detection is summarized in Figure 2. First, genetic variation data are collected from NCBI dbSNP [28] or the SwissVariant database [29]. Researchers further consult a number of databases (such as UniProtKB/Swiss-Prot, HPRD, etc.)
about the underlying effects of these variations, the supporting references and the PTM information of the corresponding proteins. Second, the wild-type and mutant protein sequences are submitted to the corresponding PTMpredictor, which identifies a particular kind of PTM protein or site. Finally, PTMvariants are identified when the PTM sites or the interacting modifying enzymes differ between the wild-type and mutant sequences.
Figure 2. Computational procedure of PTMvariants detection.
The PTMpredictor is the critical component of PTMvariant detection. It is a prediction tool that determines whether a sequence containing specific amino acid residues can be modified. Computer-aided prediction of protein PTMs is an important task that is critical for the biological interpretation of proteome data. Moreover, with the explosive increase in the number of protein sequences in databases over the years, experimental methods do not allow researchers to find all possible PTMs. Therefore, the prediction of PTMs from amino acid sequences is one of the rapidly developing fields of bioinformatics. So far, a number of PTMpredictors have been developed for identifying various types of PTM proteins or sites, including phosphorylation, glycosylation, acetylation, sumoylation, ubiquitination, palmitoylation, nitration and so on. The computational techniques for PTM prediction can be relatively complex. The earliest strategies for PTM prediction generated consensus sequences (i.e. motifs), which transform the information contained in sequence alignments into character-based patterns that can then be scanned against a protein of interest [30].
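The wild-type versus mutant comparison described above can be illustrated with a motif-based predictor. The following is a minimal sketch, not any of the published tools: it assumes the commonly cited N-glycosylation sequon N-X-[S/T] (X being any residue except proline) as a stand-in for a real PTMpredictor, handles single substitutions only, and labels a gained or lost site as Type I when the substitution is at the site itself and as Type II otherwise. The function names and the toy peptide are hypothetical.

```python
import re

# Assumed stand-in predictor: the N-glycosylation sequon N-X-[S/T], X != P.
# The lookahead keeps the match on the asparagine so overlapping sequons are found.
SEQUON = re.compile(r"N(?=[^P][ST])")


def predict_sites(seq):
    """Return the 0-based positions of asparagines that sit in a sequon."""
    return {m.start() for m in SEQUON.finditer(seq)}


def apply_substitution(seq, pos, new_residue):
    """Return the mutant sequence carrying one substitution at 0-based pos."""
    return seq[:pos] + new_residue + seq[pos + 1:]


def classify_variant(wild_type, pos, new_residue):
    """Compare predicted sites in wild type and mutant.

    Returns a list of (label, site_position) pairs. A changed site is
    Type I if the substitution is at the site, Type II if it is elsewhere.
    """
    mutant = apply_substitution(wild_type, pos, new_residue)
    wt_sites = predict_sites(wild_type)
    mt_sites = predict_sites(mutant)
    calls = []
    for site in sorted(wt_sites - mt_sites):  # sites lost in the mutant
        calls.append(("Type I (-)" if site == pos else "Type II (-)", site))
    for site in sorted(mt_sites - wt_sites):  # sites gained in the mutant
        calls.append(("Type I (+)" if site == pos else "Type II (+)", site))
    return calls


# Toy peptide (invented for illustration): the sequon N-G-S starts at position 2.
print(classify_variant("AANGSAA", 2, "K"))  # [('Type I (-)', 2)]
print(classify_variant("AANGSAA", 4, "A"))  # [('Type II (-)', 2)]
print(classify_variant("AAKGSAA", 2, "N"))  # [('Type I (+)', 2)]
```

Types III to VI depend on enzyme-specific predictions or on stop codons and are not covered by this sketch; a real analysis would replace predict_sites with calls to an actual PTMpredictor.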
With the rapid deposition of experimentally validated PTM data, machine learning algorithms have been increasingly used for the prediction of protein PTMs, including artificial neural networks [31, 32], random forest [33, 34], pseudo amino acid composition [35–44], group-based prediction system (GPS) [45, 46], decision trees [47, 48], support vector machines [49–51], etc. In general, an in silico approach for predicting PTM sites is formulated as a two-class classification problem. First, experimentally validated PTM (positive) and non-PTM (negative) short peptides are collected from related protein databases to construct a training data set. Then, various sequence, physicochemical, structural and evolutionary properties of the positive and negative training data are extracted and encoded into fixed-length feature vectors. Subsequently, the feature vectors are fed into supervised machine learning algorithms. Afterward, common metrics, such as sensitivity, specificity, accuracy and Matthews correlation coefficient, are determined through a cross-validation procedure to assess the performance of the predictive tools. Finally, the predictive algorithms are turned into user-friendly online Web tools to which experimentalists can upload their protein sequences and receive PTM prediction results within minutes. More detailed information on the prediction methodology and model construction of PTMpredictors can be obtained from several comprehensive reviews [31, 52–58]. In 2005, Savas and Ozcelik [12] first used the Web-based NetPhos tool to predict candidate PTMvariants and identified 15 nsSNPs that might create or abolish putative phosphorylation sites in 14 DNA repair and cell cycle proteins; three of these single-nucleotide polymorphisms were associated with altered cancer risk. In 2009, Ryu et al.
[14] developed a kinase-specific phosphorylation site predictor named PredPhospho (version 2), used it to predict potential Type I, II and III phosphovariants, and ultimately developed PhosphoVariant, a database of definite and possible human phosphovariants. In 2010, Ren et al. [15] collected 91 797 nsSNPs from NCBI dbSNP and used an in-house kinase-specific phosphorylation site predictor, GPS 2.0, to predict potential Type I, II, III, IV and VI phosphovariants, and then integrated all phosphovariants into the PhosSNP 1.0 database, which is freely available for academic researchers. They computationally detected 69.76% of nsSNPs as potential phosphovariants in 17 614 proteins and observed that 74.58% of phosphovariants were Type III phosphovariants, suggesting that nsSNPs tend to change the type of protein kinase acting on adjacent phosphorylation sites rather than creating or abolishing a phosphorylation site directly. The same year, Riaño-Pachón et al. [59] analyzed the effect of nsSNPs on Arabidopsis thaliana phosphorylation sites and their patterns based on experimentally identified sites and high-confidence predicted phosphorylation sites taken from PhosPhAt (version 3.0), and found that the results for experimental phosphorylation sites were confirmed by similar analyses of predicted phosphorylation sites in A. thaliana [16]. In 2012, Mazumder et al. [17] carried out a comprehensive analysis of nsSNPs that lead to either loss or gain of the N-glycosylation motif, and found that 48% of the variations that change glycosylation sites occur in the loop and bend regions of the proteins. In 2013, Suo et al. [18] constructed the lysine acetylation site predictor KAcePred to identify Type I, II and III acetylvariants (acetylation-related amino acid variations), and found that 50.87% of amino acid variations are potential acetylvariants and that 12.32% of disease mutations could lead to acetylvariants. In 2015, Xu et al.
[24] developed a lysine sumoylation site prediction platform, SumoPred, to efficiently identify potential sumovariants (sumoylation-related amino acid variations), and observed that amino acid variations that directly create new potential sumoylation sites are more likely to cause diseases. Subsequently, Li et al. [25] presented SeqPalm, an identification method for protein S-palmitoylation sites, used it to study all known disease-associated variations, and discovered that 243 potential disruptions or creations of palmitoylation sites are highly associated with human inherited disease. In 2016, Lin and colleagues [27] updated the rice SNP resource based on the new rice genome Ver. 7.0, detected different types of rice phosphovariants (Type I, II, III and IV) with NetPhosK1.0 and Rice_phospho1.0, and constructed a database, SNP-rice 1.0, which was accessible at http://bioinformatics.fafu.edu.cn. As summarized in Table 1, the types of PTMs currently covered by PTMvariant prediction are phosphorylation, N-glycosylation, lysine acetylation, ubiquitination, sumoylation, S-palmitoylation, tyrosine sulfation and nitration. Most present research has focused on Type I and II PTMvariants. These PTMvariant prediction platforms and the corresponding data can provide instructive help for further experimental investigation.

Functional analysis of PTMvariants
A number of studies have pointed out that amino acid variations not only affect protein stability and dynamics but also play important roles in rewiring signaling pathways by changing protein PTM patterns [7, 16, 60, 61]. Investigation of the functional effects of PTMvariants has concentrated on protein phosphorylation, which is the most extensively studied of all known protein PTMs. In 2008, Radivojac et al.
[13] investigated the role of phosphorylation in somatic cancer mutations and inherited diseases, using the phosphorylation site predictor DisPhos [62] to predict the gain or loss of a phosphorylation site in a target protein through spontaneous mutation. They found that kinases in cancer are twice as likely to have mutations disrupting phosphorylation sites as a kinase control set, and that gain of phosphorylation sites in cancer-associated mutations is about 2-fold higher than in Swiss-Prot and human variation data, which suggests that both gain and loss of a phosphorylation site in a target protein may be important features for predicting cancer-causing mutations. In addition, the Bader lab has carried out a series of studies on the links between phosphorylation signaling and genetic variations that affect signaling systems in cancer. In 2013, they first developed a regression-based statistical method, ActiveDriver, to identify frequently mutated or variable protein sites [19]. They then used ActiveDriver to analyze 10 900 missense single-nucleotide variants (SNVs) from 793 samples of eight cancer types, and found genes with significant phosphorylation-associated SNVs (pSNVs). ActiveDriver highlighted 11 additional cancer genes and many novel candidates with highly specific phosphosite mutations. Next, they performed a pathway analysis to find systems of functionally related genes with frequent pSNVs and analyzed pSNVs in the kinase–substrate network. Finally, they revealed increased survival of patients with TP53 pSNVs, hierarchically organized cancer kinase modules, a novel pSNV in EGF receptor (EGFR), and an immune-related network of pSNVs that correlates with prolonged survival in ovarian cancer. Their findings included multiple actionable cancer gene candidates (FLNB, GRM1, POU2F1), protein complexes (HCF1, ASF1) and kinases (PRKCZ).
The same year, the Bader lab further used ActiveDriver to analyze a pan-cancer data set of 3185 tumor genomes and 12 cancer types from The Cancer Genome Atlas (TCGA), and showed that pSNVs occurred in ∼90% of tumors and were enriched in cancer genes and pathways [20]. Gene-centric analysis found 150 known and candidate cancer genes with significant pSNV recurrence. Using a novel high-confidence set of sequence patterns recognized by 96 kinases, modeled as position weight matrices (PWMs), they predicted that 29% of these mutations directly disrupt phosphorylation or modify kinase target sites to rewire signaling pathways. In 2015, the Bader lab developed a machine learning method based on Bayesian statistics, called mutation impact on phosphorylation (MIMP), to predict whether SNVs disrupt existing phosphorylation sites or create new sites [23]. They tested MIMP on 236 367 missense SNVs from the TCGA pan-cancer data set of 3185 cancer samples of 12 tumor types, and the results indicated that MIMP can detect functional mutations in kinase-binding sites and propose corresponding mechanisms. Furthermore, Xue's research group [22] investigated the reconfiguration of phosphorylation signaling by genetic polymorphisms affecting cancer susceptibility. They first used their previously developed kinase-specific predictor GPS 2.1 [46] to predict site-specific kinase–substrate relations (ssKSRs) for the original and nsSNP-containing proteins, respectively. Then, they used known phosphorylation sites and protein–protein interactions between kinases and substrates to remove false-positive predictions, and detected 9606 potential phosphorylation-related single-nucleotide polymorphisms (phosSNPs) in 7946 proteins.
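Both the PWM-based analysis and the ssKSR comparison above rest on scoring the sequence around a site before and after a substitution. The sketch below illustrates that general idea with a log-odds position weight matrix; it is not the ActiveDriver, MIMP or GPS implementation. It assumes a uniform amino acid background, a pseudocount of 1, and a handful of invented 7-residue substrate peptides centered on serine; all names and data are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Invented toy substrates of one hypothetical kinase, phosphosite at index 3.
TOY_SUBSTRATES = ["RKGSLAE", "RRASVDG", "RSGSIEA", "RRLSFAD"]


def build_pwm(peptides, pseudocount=1.0):
    """Build a log2-odds PWM (one dict per column) against a uniform background."""
    background = 1.0 / len(AMINO_ACIDS)
    pwm = []
    for i in range(len(peptides[0])):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for peptide in peptides:
            counts[peptide[i]] += 1
        total = sum(counts.values())
        pwm.append({aa: math.log2((c / total) / background)
                    for aa, c in counts.items()})
    return pwm


def score(pwm, window):
    """Sum the per-position log-odds of a window the same length as the PWM."""
    return sum(column[aa] for column, aa in zip(pwm, window))


def score_change(pwm, wild_type_window, mutant_window):
    """Mutant score minus wild-type score; negative values suggest a weakened motif."""
    return score(pwm, mutant_window) - score(pwm, wild_type_window)


pwm = build_pwm(TOY_SUBSTRATES)
# Replacing the arginine at position -3 with glycine lowers the motif score.
print(score_change(pwm, "RRGSLAA", "GRGSLAA"))
```

In practice a threshold on the score change, or a probabilistic model fitted to known positive and negative sites, would be needed to call a site disrupted or created; the sketch only shows the scoring step.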
Subsequently, Xue's group reconstructed the human phosSNP-associated kinase–substrate phosphorylation network by comparing the predicted ssKSRs for the original and mutated proteins, and found that cancer genes and drug targets are highly enriched in the network and that the proteins in the network are significantly associated with various signaling and cancer pathways. In addition, the Linding lab examined whether cancer mutations could perturb phosphorylation signaling networks. In 2012, they first proposed network-attacking mutations (NAMs), which lead to a new cellular phenotype by perturbing signaling networks at the level of either network structure or network dynamics [63]. In 2015, they further divided NAMs into six types based on perturbations of signaling network dynamics, network structure and dysregulation of phosphorylation sites [26]. Furthermore, they developed a computational approach, termed ReKINect, to predict NAMs and systematically interpreted the exomes and quantitative phospho-proteomes of five ovarian cancer cell lines and the global cancer genome repository. They discovered mutant molecular logic gates, a drift toward phospho-threonine signaling, weakening of phosphorylation motifs and kinase-inactivating hotspots in cancer. Beyond phosphorylation, the Bader lab recently performed computational analyses of acetylation and ubiquitination sites in a pan-cancer data set of 3200 tumor samples from TCGA [11]. Cancer mutations were mapped to PTM sites, and gene-focused analysis with the ActiveDriver mutational significance model highlighted significant co-occurrences of acetylation and ubiquitination and mutation hotspots in known oncoproteins, and showed candidate cancer driver genes with PTM-related mechanisms. Pathway analysis with functional annotations from the Gene Ontology revealed that PTM mutations in acetylation and ubiquitination sites accumulated in cancer-related processes such as cell cycle, apoptosis, chromatin regulation and metabolism.
PTM-specific interaction network analysis revealed survival-associated protein modules and suggested that many PTM-related mutations were related to decreased patient survival. Taken together, these system-level analyses of PTM-related mutations should prove helpful for further elucidation of the functional impacts of disease-associated SNPs.

Application of computational tools
Computational approaches can single out the most likely pathological variants from the huge pool of SNP data sets and represent a useful starting point to guide the design of functional assays. For example, Deng et al. [64] explored the functions of phosSNPs for bone mineral density (BMD) in humans to elucidate the pathophysiological mechanism of osteoporosis. Of the total 64 035 phosSNPs in the PhosSNP 1.0 database [15], those covered by Affymetrix SNP Arrays were studied. The potential impacts of the three significant phosSNPs (rs16861032, rs2657879 and rs6265) on protein phosphorylation were predicted by GPS 2.0. They experimentally validated the prediction that BDNF-T62 was the target site of phosphorylation regulated by rs6265, and concluded that phosSNP rs6265 influenced hip BMD in humans by regulating BDNF protein phosphorylation and osteoblast differentiation. In 2017, Cheng and colleagues [65] investigated potential functional mechanisms of type 2 diabetes (T2D)-associated SNPs and genes. They used the PhosSNP 1.0 database [15] (accessed 8 March 2016) to analyze the effects of T2D-associated SNPs on protein phosphorylation, and detected four T2D-associated and eight proxy phosSNPs, including Type II, III and IV phosSNPs. More recently, Krassowski et al. [66] developed a comprehensive human proteo-genomics database, ActiveDriverDB, which annotates disease mutations and population variants through the lens of PTMs.
In this database, two interaction networks are available: the high-confidence experimental network includes experimentally determined kinase–substrate interactions, and the computationally predicted network includes gained and lost kinase–substrate interactions derived from sequence motif analysis with the MIMP method [23]. The ReKINect platform [26] can be used to predict whether mutations create or destroy phosphorylation sites, and to identify kinase downstream rewiring mutations. Li [67] applied ReKINect to predict the likely functionality of each mutation and explored the role of mutations in epigenetic regulators, particularly MLL2, in cervical carcinogenesis.

There is no uniform standard across the different methods, and it is extremely difficult to decide which method is best. To offset the limitations of each approach and improve performance, it is advisable to combine several tools and resources when determining the possible functional mechanism. For instance, Rajendran and Deng [68] incorporated a dozen driver gene prediction tools and resources to identify a list of candidate driver mutations involved in human breast cancer genes. To increase the reliability of prediction, six different approaches, including ActiveDriver [19], were used to ensure the confidence level of potential breast cancer drivers.

Discussion

With the rapid progress of sequencing technologies and the appearance of new methods, a large number of protein PTM sites and genetic variations are emerging. The PTMVar data set (Release: 27-Sep-2017), derived from PhosphoSitePlus [69], includes 30 080 modification sites associated with single amino acid variants, of which 41.58% are classified as disease variants, 45.34% as polymorphisms and 13.08% as unclassified (Figure 3). As shown in Figure 3A, these amino acid variations mainly involve phosphorylation (73.96%), ubiquitination (13.82%), acetylation (10.09%), succinylation (0.93%), sumoylation (0.86%), methylation (0.22%) and neddylation (0.12%).
Because PTM sites are important active sites that act as regulatory switches in proteins and pathways, specific mutations in PTM sites may alter networks and lead to changes in cellular phenotype involved in disease development [11]. However, the complexity of PTM mechanisms cannot be fully resolved by experimental approaches alone, and mutational analysis is labor-intensive and time-consuming. Thus, new methods are needed to automatically predict the impact of mutations on PTMs, especially for large-scale predictions. As mentioned above, many different methods have been developed to predict PTMvariants and perform related functional analysis, and they have made some progress. Nevertheless, computational mutation analysis of protein PTMs remains a complicated undertaking, and several problems and challenges persist in this field.

Figure 3. The data statistics for the PTMVar data set (Release: 27 September 2017) derived from PhosphoSitePlus. (A) The proportion of different types of PTMs impacted by variants. (B) The statistics of disease, polymorphism and unclassified variants. (C) The proportion of Type I and Type II variants. Type I mutated residues are at PTM sites, and Type II mutated residues are within five residues of PTM sites.

PTMvariants can be identified by protein PTM predictors, so the prediction quality of PTMpredictors directly affects the identification of PTMvariants. Hundreds of PTMpredictors have been designed so far to predict various kinds of PTM proteins or sites.
Most of the existing PTMpredictors are based on machine learning methods in which the success of the prediction depends heavily on effective features, and in which feature extraction and classification are treated as two separate problems. However, the features are extracted from protein sequences by manual design, which may result in incomplete or biased features. Thus, new approaches, such as deep learning, should be introduced to deal with this problem and improve the prediction quality of PTMpredictors. For example, Wang et al. [70] recently applied a deep learning method to update their previous tool Musite for general and kinase-specific phosphorylation site prediction, and showed that deep learning improved prediction performance and is a promising approach. Compared with conventional machine learning methods, deep learning allows a computational model to be fed with raw sequence data and to automatically discover the complex representations needed for classification. Furthermore, using a combination of methods based on different theoretical principles may help mitigate the false-positive and false-negative rates suffered by any one method alone, which results in a cleaner list of candidates for experimental validation [71].

In addition, no online service is available for some existing PTMvariant predictors (Table 1). To make prediction methods more useful in practice, a user-friendly and publicly accessible Web server or applet is necessary so that experimental scientists can use the predictors as tools without understanding the detailed mathematics.

As shown in Figure 3C, the PTMVar data set derived from the PhosphoSitePlus database provides Type I and Type II mutated residues, which are at PTM sites or within five residues of PTM sites, respectively. Owing to the limited data available, most computational mutation analyses of protein PTMs have focused on Type I and Type II PTMvariants.
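The positional definition of Type I and Type II variants given for Figure 3C can be illustrated with a minimal Python sketch. It assumes 1-based residue positions and uses only the caption's rule (at the site, or within five residues of it); the function name `classify_variant` and the example positions are hypothetical and are not taken from PTMVar or from any of the tools reviewed here.

```python
from typing import Iterable, Optional

# Window size taken from the Figure 3C caption ("within five residues").
FLANK = 5


def classify_variant(variant_pos: int,
                     ptm_sites: Iterable[int],
                     flank: int = FLANK) -> Optional[str]:
    """Label a variant by its position relative to known PTM sites.

    Returns "Type I" if the variant falls on a PTM site, "Type II" if it
    lies within `flank` residues of one, and None otherwise.
    Positions are 1-based residue indices (an assumption of this sketch).
    """
    sites = set(ptm_sites)
    if variant_pos in sites:
        return "Type I"
    if any(0 < abs(variant_pos - site) <= flank for site in sites):
        return "Type II"
    return None


# Hypothetical protein with PTM sites at residues 62 and 120.
example_sites = [62, 120]
labels = {pos: classify_variant(pos, example_sites) for pos in (62, 66, 68, 119)}
```

In this toy case, residue 62 is labeled Type I, residues 66 and 119 are labeled Type II, and residue 68 receives no label because it lies six residues from the nearest site. The other variant types discussed in this review depend on predicted enzyme–substrate relationships and cannot be assigned from position alone.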
However, accumulating studies have indicated that various PTMs can synergistically orchestrate specific biological processes through cross talk [72–75]. Multiple PTMs can interplay 'in situ' by competitively modifying the same residues [73–75]. For instance, lysine methylation and acetylation compete at Lys382 to modulate p53 transcriptional activity [76]. Lys382 acetylation of p53 can also preclude its ubiquitination [77]. Likewise, nitration and phosphorylation can co-occur at Tyr125, Tyr133 and Tyr136 of alpha-synuclein (UniProt ID: P37840) [21, 78]. Human gastrin (UniProt ID: P01350) was found to be phosphorylated by v-Src at Tyr87, a site that is also modulated by sulfation [79, 80]. Xue's research group [21] systematically analyzed in situ cross talk at the same positions among three tyrosine PTMs: sulfation, nitration and phosphorylation. Their results suggested that tyrosines targeted by multiple PTMs are not more conserved than unmodified ones, and that there is no additional functional constraint on multiply modified tyrosines. They also found that site-specific cross talk among multiple PTMs co-occurs significantly. However, the intricate competition mechanism and the influence of in situ cross talk have not been studied in depth. Amino acid variations at positions adjacent to in situ cross talk sites may change the local environment around those sites and thereby alter the type of PTM that would otherwise occur. In this regard, the Type V PTMvariant might be a valuable way to explore the underlying connections of cross talk among different PTMs. Meanwhile, with the increasing availability of data on various modifying enzymes, developing enzyme-specific (such as methyltransferase-specific or acetyltransferase-specific) PTM substrate and site prediction tools is both necessary and feasible, and will be helpful for identifying and analyzing Type III, IV and V PTMvariants.
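As a rough illustration of how candidate in situ cross talk sites might be collected from PTM annotations, the following Python sketch groups annotations by residue and keeps residues carrying two or more PTM types. The records are transcribed from the examples cited in this review (p53 Lys382, alpha-synuclein Tyr125, gastrin Tyr87 and BDNF Thr62); the record layout and the function name `in_situ_crosstalk_sites` are assumptions made for illustration, not the method used in [21].

```python
from collections import defaultdict

# (protein, residue, PTM type) records transcribed from examples in the text.
annotations = [
    ("p53", "K382", "methylation"),
    ("p53", "K382", "acetylation"),
    ("p53", "K382", "ubiquitination"),
    ("alpha-synuclein", "Y125", "nitration"),
    ("alpha-synuclein", "Y125", "phosphorylation"),
    ("gastrin", "Y87", "phosphorylation"),
    ("gastrin", "Y87", "sulfation"),
    ("BDNF", "T62", "phosphorylation"),
]


def in_situ_crosstalk_sites(records):
    """Return residues annotated with at least two distinct PTM types."""
    ptms_by_site = defaultdict(set)
    for protein, residue, ptm in records:
        ptms_by_site[(protein, residue)].add(ptm)
    return {
        site: sorted(ptms)
        for site, ptms in ptms_by_site.items()
        if len(ptms) >= 2
    }


crosstalk = in_situ_crosstalk_sites(annotations)
```

Here the three multiply modified residues are returned, whereas BDNF Thr62, which carries a single annotation, is excluded. Variants falling near the returned residues would be natural starting candidates when examining how a mutation might shift the balance between competing modifications.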
There is a need for methods that not only identify PTMvariants but also predict how PTMvariants might affect cellular networks. Functional analysis of PTMvariants is still in its infancy. Mutations do not occur in isolation but coexist with other somatic alterations that work together to alter cellular processes. The integration of multiple sources of biological information, together with pathway and network analysis techniques based on graph theory, information theory or Bayesian theory, can help address the challenge of interpreting proteomics results [71]. Network biology may fundamentally advance not only basic biology but also patient treatment [63]. Because personalized precision medicine is a frontier for scientists, industry and the general population, it is becoming increasingly important to exploit computational approaches that can lead to a better understanding of the etiology of disease. Apart from PTMs in proteins, cancer and many other major diseases are often caused by post-replication modification (PTRM) in DNA and posttranscription modification (PTCM) in RNA. Many different methods have been developed to predict PTRM sites in DNA sequences [81, 82] and PTCM sites in RNA sequences [83–86]. All these developments have significant impacts on medicinal chemistry [87] and may even drive it into an unprecedented revolution [88]. Integration of genetic and molecular information is a sensible step in this direction because it provides a structural and functional perspective on human variation data. The development of better approaches for functional analysis of PTMvariants will help to facilitate this process and further support the future development of personalized precision medicine.

Key Points

Computational mutation analysis can greatly narrow down the efforts on experimental work.
We provide an overview of computational prediction of amino acid variations that influence protein PTMs and their functional analysis.
Using a combination of methods based on different theoretical principles may help mitigate false-positive and false-negative rates suffered by any one method alone.
Type V PTMvariant might be a valuable way to explore the underlying connection of the cross talk among different PTMs.
The integration of multiple sources of biological information, pathway and network analysis techniques based on graph theory, information theory or Bayesian theory can help address the challenge in interpreting proteomics results.

Funding

This work was supported by the National Natural Science Foundation of China (grant numbers 21665016 and 21305062), and the Natural Science Foundation of Jiangxi Province (grant number 20151BAB203022).

Shaoping Shi is an associate professor at School of Sciences, Nanchang University. Her research focuses on the development of novel data analysis algorithms and bioinformatics tools for prediction of protein structure and function.
Lina Wang is a lecturer at the Department of Science, Nanchang Institute of Technology. Her research focuses on computational prediction and function analysis of acylation modification.
Cao Man is a graduate student at School of Sciences, Nanchang University. Her research focuses on integrative tyrosine PTMs data for analysis and validation.
Guodong Chen is a graduate student at School of Sciences, Nanchang University. His current research focuses on developing novel data analysis algorithms and software of prokaryotes lysine acetylation.
Jialin Yu is a graduate student at School of Sciences, Nanchang University. His research focuses on deep learning and prediction of protein structure and function.

References

1 Reimand J, Wagih O, Bader GD. Evolutionary constraint and disease associations of post-translational modification sites in human genomes. PLoS Genet 2015; 11: e1004919.
2 Kamath KS, Vasavada MS, Srivastava S.
Proteomic databases and tools to decipher post-translational modifications. J Proteomics 2011; 75(1): 127–44.
3 Lichti CF, Wildburger NC, Emmett MR, et al. Post-translational modifications in the human proteome. In: Marko-Varga G (ed). Genomics and Proteomics for Clinical Discovery and Development. Translational Bioinformatics. Dordrecht: Springer, 2014.
4 Olsen JV, Mann M. Status of large-scale analysis of post-translational modifications by mass spectrometry. Mol Cell Proteomics 2013; 12: 3444–52.
5 Liu ZX, Cai YD, Guo XJ, et al. Post-translational modification (PTM) bioinformatics in China: progresses and perspectives. Hereditas 2015; 37: 621–34.
6 Pang C, Gasteiger E, Wilkins MR. Identification of arginine- and lysine-methylation in the proteome of Saccharomyces cerevisiae and its functional implications. BMC Genomics 2010; 11: 92.
7 Li S, Iakoucheva LM, Mooney SD, et al. Loss of post-translational modification sites in disease. Pac Symp Biocomput 2010; 15: 337–47.
8 Thomas M, Dadgar N, Aphale A, et al. Androgen receptor acetylation site mutations cause trafficking defects, misfolding, and aggregation similar to expanded glutamine tracts. J Biol Chem 2004; 279: 8389–95.
9 Luna L, Rolseth V, Hildrestrand GA, et al. Dynamic relocalization of hOGG1 during the cell cycle is disrupted in cells harbouring the hOGG1-Cys326 polymorphic variant. Nucleic Acids Res 2005; 33(6): 1813–24.
10 Lu C, Jain SU, Hoelper D, et al. Histone H3K36 mutations promote sarcomagenesis through altered histone methylation landscape. Science 2016; 352(6287): 844–9.
11 Narayan S, Bader GD, Reimand J. Frequent mutations in acetylation and ubiquitination sites suggest novel driver mechanisms of cancer. Genome Med 2016; 8: 55.
12 Savas S, Ozcelik H. Phosphorylation states of cell cycle and DNA repair proteins can be altered by the nsSNPs. BMC Cancer 2005; 5: 107.
13 Radivojac P, Baenziger PH, Kann MG, et al. Gain and loss of phosphorylation sites in human cancer. Bioinformatics 2008; 24: i241–7.
14 Ryu GM, Song P, Kim KW, et al. Genome-wide analysis to predict protein sequence variations that change phosphorylation sites or their corresponding kinases. Nucleic Acids Res 2009; 37(4): 1297–307.
15 Ren J, Jiang CH, Gao XJ, et al. PhosSNP for systematic analysis of genetic polymorphisms that influence protein phosphorylation. Mol Cell Proteomics 2010; 9: 623–34.
16 Riaño-Pachón DM, Kleessen S, Neigenfind J, et al. Proteome-wide survey of phosphorylation patterns affected by nuclear DNA polymorphisms in Arabidopsis thaliana. BMC Genomics 2010; 11: 411.
17 Mazumder R, Morampudi KS, Motwani M, et al. Proteome-wide analysis of single-nucleotide variations in the n-glycosylation sequon of human genes. PLoS One 2012; 7: e36212.
18 Suo SB, Qiu JD, Shi SP, et al. Proteome-wide analysis of amino acid variations that influences protein lysine acetylation. J Proteome Res 2013; 12: 949–58.
19 Reimand J, Bader GD. Systematic analysis of somatic mutations in phosphorylation signaling predicts novel cancer drivers. Mol Syst Biol 2013; 9: 637.
20 Reimand J, Wagih O, Bader GD. The mutational landscape of phosphorylation signaling in cancer. Sci Rep 2013; 3: 2651.
21 Pan Z, Liu Z, Cheng H, et al. Systematic analysis of the in situ crosstalk of tyrosine modifications reveals no additional natural selection on multiply modified residues. Sci Rep 2014; 4: 7331.
22 Wang Y, Cheng H, Pan Z, et al. Reconfiguring phosphorylation signaling by genetic polymorphisms affects cancer susceptibility. J Mol Cell Biol 2015; 7: 187–202.
23 Wagih O, Reimand J, Bader GD. MIMP: predicting the impact of mutations on kinase-substrate phosphorylation. Nat Methods 2015; 12: 531–3.
24 Xu HD, Shi SP, Chen X, et al. Systematic analysis of the genetic variability that impacts sumo conjugation and their involvement in human diseases. Sci Rep 2015; 5: 10900.
25 Li S, Li J, Ning L, et al. In silico identification of protein S-palmitoylation sites and their involvement in human inherited disease. J Chem Inf Model 2015; 55: 2015–25.
26 Creixell P, Schoof EM, Simpson CD, et al. Kinome-wide decoding of network-attacking mutations rewiring cancer signaling. Cell 2015; 163: 202–17.
27 Lin S, Chen L, Tao H, et al. Impact of SNPs on protein phosphorylation status in rice (Oryza sativa L.). Int J Mol Sci 2016; 17(11): 1738.
28 Sherry ST, Ward MH, Kholodov M, et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 2001; 29(1): 308–11.
29 Yip YL, Scheib H, Diemand AV, et al. The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure information on human protein variants. Hum Mutat 2004; 23: 464–70.
30 Pearson RB, Kemp BE. Protein kinase phosphorylation site sequences and consensus specificity motifs: tabulations. Methods Enzymol 1991; 200: 62–81.
31 Blom N, Sicheritz-Ponten T, Gupta R, et al. Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence. Proteomics 2004; 4(6): 1633–49.
32 Plewczynski D, Basu S, Saha I. AMS 4.0: consensus prediction of post-translational modifications in protein sequences. Amino Acids 2012; 43(2): 573–82.
33 Jia J, Liu Z, Xiao X, et al. pSuc-Lys: predict lysine succinylation sites in proteins with PseAAC and ensemble random forest approach. J Theor Biol 2016; 394: 223–30.
34 Qiu WR, Sun BQ, Xiao X, et al. iPTM-mLys: identifying multiple lysine PTM sites and their different types. Bioinformatics 2016; 32(20): 3116–23.
35 Xu Y, Ding J, Wu LY, et al. iSNO-PseAAC: predict cysteine S-nitrosylation sites in proteins by incorporating position specific amino acid propensity into pseudo amino acid composition. PLoS One 2013; 8: e55844.
36 Xu Y, Shao XJ, Wu LY, et al. iSNO-AAPair: incorporating amino acid pairwise coupling into PseAAC for predicting cysteine S-nitrosylation sites in proteins. Peer J 2013; 1: e171.
37 Jia C, Lin X, Wang Z. Prediction of protein S-nitrosylation sites based on adapted normal distribution Bi-Profile Bayes and Chou's Pseudo amino acid composition. Int J Mol Sci 2014; 15: 10410–23.
38 Qiu WR, Xiao X, Lin WZ, et al. iMethyl-PseAAC: identification of protein methylation sites via a pseudo amino acid composition approach. Biomed Res Int 2014; 2014: 947416.
39 Xu Y, Wen X, Wen LS, et al. iNitro-Tyr: prediction of nitrotyrosine sites in proteins with general pseudo amino acid composition. PLoS One 2014; 9: e10501.
40 Jia J, Zhang L, Liu Z, et al. pSumo-CD: predicting sumoylation sites in proteins with covariance discriminant algorithm by incorporating sequence-coupled effects into general PseAAC. Bioinformatics 2016; 32(20): 3133–41.
41 Liu LM, Xu Y. iPGK-PseAAC: identify lysine phosphoglycerylation sites in proteins by incorporating four different tiers of amino acid pairwise coupling information into the general PseAAC. Med Chem 2017; 13: 552–9.
42 Jia J, Liu Z, Xiao X, et al. iSuc-PseOpt: identifying lysine succinylation sites in proteins by incorporating sequence-coupling effects into pseudo components and optimizing imbalanced training dataset. Anal Biochem 2016; 497: 48–56.
43 Xu Y, Wen X, Shao XJ, et al. iHyd-PseAAC: predicting hydroxyproline and hydroxylysine in proteins by incorporating dipeptide position-specific propensity into pseudo amino acid composition. Int J Mol Sci 2014; 15: 7594–610.
44 Qiu WR, Sun BQ, Xiao X, et al. iPhos-PseEvo: identifying human phosphorylated proteins by incorporating evolutionary information into general PseAAC via grey system theory. Mol Inform 2017; 36(5–6): UNSP 1600010.
45 Xue Y, Ren J, Gao X, et al. GPS 2.0, a tool to predict kinase-specific phosphorylation sites in hierarchy. Mol Cell Proteomics 2008; 7: 1598–608.
46 Xue Y, Liu Z, Cao J, et al. GPS 2.1: enhanced prediction of kinase-specific phosphorylation sites with an algorithm of motif length selection. Protein Eng Des Sel 2011; 24: 255–60.
47 Charpilloz C, Veuthey AL, Chopard B, et al. Motifs tree: a new method for predicting post-translational modifications. Bioinformatics 2014; 30(14): 1974–82.
48 Lopez Y, Dehzangi A, Lal SP, et al. SucStruct: prediction of succinylated lysine residues by using structural properties of amino acids. Anal Biochem 2017; 527: 24–32.
49 Shi SP, Sun XY, Qiu JD, et al. The prediction of palmitoylation site locations using a multiple feature extraction methods. J Mol Graph Model 2013; 40: 125–30.
50 Wang LN, Shi SP, Xu HD, et al. Computational prediction of species-specific malonylation sites via enhanced characteristic strategy. Bioinformatics 2017; 33(10): 1457–63.
51 Shi SP, Chen X, Xu HD, et al. PredHydroxy: computational prediction of protein hydroxylation site locations based on the primary structure. Mol BioSyst 2015; 11: 819–25.
52 Xue Y, Gao XJ, Cao J, et al. A summary of computational resources for protein phosphorylation. Curr Protein Pept Sci 2010; 11(6): 485–96.
53 Eisenhaber B, Eisenhaber F. Prediction of posttranslational modification of proteins from their amino acid sequence. Methods Mol Biol 2010; 609: 365–84.
54 Trost B, Kusalik A. Computational prediction of eukaryotic phosphorylation sites. Bioinformatics 2011; 27(21): 2927–35.
55 Sobolev BN, Veselovsky AV, Poroikov VV. Prediction of protein post-translational modifications: main trends and methods. Russ Chem Rev 2014; 83: 143–54.
56 Shi SP, Xu HD, Wen PP, et al. Progress and challenges in predicting protein methylation sites. Mol BioSyst 2015; 11: 2610–19.
57 Chen Z, Zhou Y, Zhang ZD, et al. Towards more accurate prediction of ubiquitination sites: a comprehensive review of current methods, tools and features. Brief Bioinform 2015; 16(4): 640–57.
58 Xu Y, Chou KC. Recent progress in predicting posttranslational modification sites in proteins. Curr Top Med Chem 2016; 16(6): 591–603.
59 Durek P, Schmidt R, Heazlewood J, et al. PhosPhAt: the Arabidopsis thaliana phosphorylation site database. An update. Nucleic Acids Res 2010; 38: D828–34.
60 Yue P, Moult J. Identification and analysis of deleterious human SNPs. J Mol Biol 2006; 356(5): 1263–74.
61 Creixell P, Schoof EM, Tan CSH, et al. Mutational properties of amino acid residues: implications for evolvability of phosphorylatable residues. Phil Trans R Soc B 2012; 367(1602): 2584–93.
62 Iakoucheva LM, Radivojac P, Brown CJ, et al. The importance of intrinsic disorder for protein phosphorylation. Nucleic Acids Res 2004; 32(3): 1037–49.
63 Creixell P, Schoof EM, Erler JT, et al. Navigating cancer network attractors for tumor-specific therapy. Nat Biotechnol 2012; 30(9): 842–8.
64 Deng FY, Tan LJ, Shen H, et al. SNP rs6265 regulates protein phosphorylation and osteoblast differentiation and influences BMD in humans. J Bone Miner Res 2013; 28(12): 2498–507.
65 Cheng M, Liu X, Yang M, et al. Computational analyses of type 2 diabetes-associated loci identified by genome-wide association studies. J Diabetes 2017; 9(4): 362–77.
66 Krassowski M, Paczkowska M, Cullion K, et al. ActiveDriverDB: human disease mutations and genome variation in post-translational modification sites of proteins. Nucleic Acids Res 2018; 46: D901–10.
67 Li X. Emerging role of mutations in epigenetic regulators including MLL2 derived from The Cancer Genome Atlas for cervical cancer. BMC Cancer 2017; 17(1): 252.
68 Rajendran BK, Deng CX. Characterization of potential driver mutations involved in human breast cancer by computational approaches. Oncotarget 2017; 8(30): 50252–72.
69 Hornbeck PV, Zhang B, Murray B, et al. PhosphoSitePlus, 2014: mutations, PTMs and recalibrations. Nucleic Acids Res 2015; 43(D1): D512–20.
70 Wang D, Zeng S, Xu C, et al. MusiteDeep: a deep-learning framework for general and kinase-specific phosphorylation site prediction. Bioinformatics 2017; 33(24): 3909–16.
71 Gonzalez-Perez A, Mustonen V, Reva B, et al. Computational approaches to identify functional genetic variants in cancer genomes. Nat Methods 2013; 8: 723–9.
72 Lopez-Otin C, Hunter T. The regulatory crosstalk between kinases and proteases in cancer. Nat Rev Cancer 2010; 10(4): 278–92.
73 Yang XJ, Seto E. Lysine acetylation: codified crosstalk with other posttranslational modifications. Mol Cell 2008; 31(4): 449–61.
74 Hart GW, Slawson C, Ramirez-Correa G, et al. Cross talk between O-GlcNAcylation and phosphorylation: roles in signaling, transcription, and chronic disease. Annu Rev Biochem 2011; 80: 825–58.
75 Kaasik K, Kivimäe S, Allen JJ, et al. Glucose sensor O-GlcNAcylation coordinates with phosphorylation to regulate circadian clock. Cell Metab 2013; 17(2): 291–302.
76 Carter S, Vousden KH. Modifications of p53: competing for the lysines. Curr Opin in Genet Dev 2009; 19(1): 18–24.
77 Le Cam L, Linares LK, Paul C, et al. E4F1 is an atypical ubiquitin ligase that modulates p53 effector functions independently of degradation. Cell 2006; 127(4): 775–88.
78 Takahashi T, Yamashita H, Nakamura T, et al. Tyrosine 125 of alpha-synuclein plays a critical role for dimerization following nitrative stress. Brain Res 2002; 938(1–2): 73–80.
79 Songyang Z, Cantley LC. Recognition and specificity in protein tyrosine kinase-mediated signalling. Trends Biochem Sci 1995; 20(11): 470–5.
80 Rehfeld JF, Hansen CP, Johnsen AH. Post-poly(Glu) cleavage and degradation modified by O-sulfated tyrosine: a novel post-translational processing mechanism. EMBO J 1995; 14: 389–96.
81 Liu Z, Xiao X, Qiu WR, et al. iDNA-Methyl: identifying DNA methylation sites via pseudo trinucleotide composition. Anal Biochem 2015; 474: 69–77.
82 Feng P, Yang H, Ding H, et al. iDNA6mA-PseKNC: identifying DNA N6-methyladenosine sites by incorporating nucleotide physicochemical properties into PseKNC. Genomics 2018, in press. doi: 10.1016/j.ygeno.2018.01.005.
83 Chen W, Tang H, Ye J, et al. iRNA-PseU: identifying RNA pseudouridine sites. Mol Ther Nucl Acids 2016; 5: e332.
84 Liu Z, Xiao X, Yu DJ, et al. pRNAm-PC: predicting N-methyladenosine sites in RNA sequences via physical-chemical properties. Anal Biochem 2016; 497: 60–7.
85 Feng P, Ding H, Yang H, et al. iRNA-PseColl: identifying the occurrence sites of different RNA modifications by incorporating collective effects of nucleotides into PseKNC. Mol Ther Nucl Acids 2017; 7: 155–63.
86 Qiu WR, Jiang SY, Sun BQ, et al. iRNA-2methyl: identify RNA 2′-O-methylation sites by incorporating sequence-coupled effects into general PseKNC and ensemble classifier. Med Chem 2017; 13: 734–43.
87 Chou KC. Impacts of bioinformatics to medicinal chemistry. Med Chem 2015; 11(3): 218–34.
88 Chou KC. An unprecedented revolution in medicinal chemistry driven by the progress of biological science. Curr Top Med Chem 2017; 17: 2337–58.

© The Author(s) 2018. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com
This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)

Briefings in Bioinformatics, Oxford University Press

Publisher: Oxford University Press
ISSN: 1467-5463
eISSN: 1477-4054
DOI: 10.1093/bib/bby036

Abstract

Abstract Accumulative studies have indicated that amino acid variations through changing the type of residues of the target sites or key flanking residues could directly or indirectly influence protein posttranslational modifications (PTMs) and bring about a detrimental effect on protein function. Computational mutation analysis can greatly narrow down the efforts on experimental work. To increase the utilization of current computational resources, we first provide an overview of computational prediction of amino acid variations that influence protein PTMs and their functional analysis. We also discuss the challenges that are faced while developing novel in silico approaches in the future. The development of better methods for mutation analysis-related protein PTMs will help to facilitate the development of personalized precision medicine. posttranslational modifications, amino acid variations, computational mutation analysis, protein PTM predictor, network biology Introduction Protein PTMs are biochemical alterations of amino acids that change the physicochemical properties of target proteins, leading to structural changes and therefore regulating protein–protein interactions and cellular signal transduction in developmental and cancer pathways [1]. Almost all the amino acids undergo the process of PTMs, except leucine (L), isoleucine (I), valine (V), alanine (A) and phenylalanine (F) [2]. PTMs are specific to types of amino acid residues. For example, phosphorylation mainly occurs on a subset of three types of amino acids, including serine (S), threonine (T) and tyrosine (Y). Methylation is predominantly found on lysine (K) and arginine (R) residues. N-linked glycosylation affects asparagine (N), and O-linked glycosylation occurs on the hydroxyl group of either serine (S) or threonine (T) [3]. The development and improvement of high-throughput mass spectrometry (MS) technique-based proteomics have greatly advanced the discovery and identification of PTMs [4, 5]. 
More than 600 different types of PTMs (http://www.uniprot.org/docs/ptmlist, Release: 20-Dec-2017) have been reported to date, and many more are still being identified. Most PTMs are catalyzed by highly specific protein-modifying enzymes, which recognize specific sequence motifs. For instance, the Type I protein arginine methyltransferases are known to methylate a number of proteins that contain an arginine-glycine-glycine (RGG) motif [6]. Consequently, amino acid variations that change the residue type at target sites or at key flanking residues can directly or indirectly influence the PTM of a protein and have a detrimental effect on protein function. Li et al. [7] analyzed amino acid variations for 15 different PTMs and indicated that about 4.5% of amino acid variations may affect protein function through disruption of PTMs, and that mutations at 238 PTM sites in human proteins were causative of disease. There are a number of cases in which mutations of the modified sites were found to be involved in disease. The K630A mutation in the androgen receptor has been shown to cause loss of an acetylation site and has been implicated in Kennedy’s disease, an inherited neurodegenerative disorder [8]. The amino acid variation S326C of human OGG1 disrupts the Ser-326 phosphorylation site and affects susceptibility to a variety of cancers [9]. Lu et al. [10] reported that the K36M mutation of H3 impairs the differentiation of mesenchymal progenitor cells and promotes undifferentiated sarcoma through an altered histone methylation landscape. More recently, Narayan et al. [11] indicated that acetylation and ubiquitination site mutations are enriched in cancer.
In this regard, comprehensive studies of the impact of amino acid variation on protein PTMs will help to clarify how genetic polymorphisms are involved in regulating biological and pathological processes, and will provide instructive information for drug development for various related diseases. Advanced laboratory techniques, such as MS-based experiments, have been used to analyze PTM sites. However, making thousands of variant proteins and selecting the amino acid variations that influence PTM sites often requires extensive laboratory work and considerable expense. Moreover, with the advent of high-throughput variant detection, the number of identified variations is growing rapidly. For instance, the SwissVariant database (http://swissvar.expasy.org/) contained 76 613 variants in 20 244 human proteins on 10 January 2018. Experimental methods cannot test all of these variations for an effect on PTMs. Therefore, fast and economical bioinformatics-based methods for analyzing the effect of mutations on protein PTMs are gaining popularity. Computational mutation analysis can greatly reduce the experimental effort required. Seventeen computational or in silico approaches for analyzing the effect of mutations on protein PTMs have been reported in the literature since 2005. Table 1 provides a simple overview of all the approaches, including the type of PTM, type of variations, PTM site predictor, year of publication and Website. We have broadly categorized these studies into two fields: one is the computational prediction of amino acid variations that influence protein PTMs, and the other is the corresponding functional analysis. Here, we review most of these works and discuss current computational challenges.
Table 1.
Summary of recent hierarchical ensemble techniques in prediction and analysis of amino acid variations that influence protein PTMs

| Type of PTM | Type of variations | PTM site predictor | Year/Reference | Website (http://) |
|---|---|---|---|---|
| Phosphorylation | Type I, II | NetPhos | 2005/[12] | – |
| Phosphorylation | Type I, II | DisPhos 1.3 | 2008/[13] | – |
| Phosphorylation | Type I, II, III | PredPhospho (version 2) | 2009/[14] | phosphovariant.ngri.go.kr |
| Phosphorylation | Type I, II, III, IV, VI | GPS 2.0 | 2010/[15] | phossnp.biocuckoo.org |
| Phosphorylation | Type I | PhosPhAt | 2010/[16] | – |
| N-glycosylation | Type I, II | NetNGlyc | 2012/[17] | – |
| Lysine acetylation | Type I, II, III | KAcePred | 2013/[18] | bioinfo.ncu.edu.cn/AcetylAAVs_Home.aspx |
| Phosphorylation | Type I, II | ActiveDriver | 2013/[19] | individual.utoronto.ca/reimand/ActiveDriver/ |
| Phosphorylation | Type I, II | ActiveDriver PWMs | 2013/[20] | – |
| Tyrosine sulfation, nitration and phosphorylation | Type I | GPS-TSP, GPS-YNO2 | 2014/[21] | – |
| Phosphorylation | Type I, II, III, IV | GPS 2.1 | 2015/[22] | phossnp.biocuckoo.org/dataset.php |
| Phosphorylation | Type II | MIMP | 2015/[23] | mimp.baderlab.org |
| Sumoylation | Type I, II, III, V | SumoPred | 2015/[24] | bioinfo.ncu.edu.cn/SUMOAMVR_Home.aspx |
| S-palmitoylation | Type I, II | SeqPalm | 2015/[25] | lishuyan.lzu.edu.cn/seqpalm |
| Phosphorylation | NAMs | ReKINect | 2015/[26] | rekinect.science/home |
| Acetylation and ubiquitination | Type I, II | ActiveDriver | 2016/[11] | – |
| Phosphorylation | Type I, II, III, IV | NetPhosK1.0, Rice_phospho1.0 | 2016/[27] | bioinformatics.fafu.edu.cn |

Type of PTMvariants
Non-synonymous single-nucleotide polymorphisms (nsSNPs) result in the substitution of the encoded amino acids. In this work, PTMvariants are amino acid variations that might influence protein PTM sites or their modifying enzymes. For example, phosphovariants are variations that affect phosphorylation sites or their interacting kinases and so on. In theory, all PTMvariants may be divided into six types from the existing literature [12–25].
A Type I PTMvariant occurs at the PTM site position itself and directly adds [Type I (+)] or removes [Type I (−)] the PTM site. A Type II PTMvariant does not occur at the PTM site position but at a position adjacent to the PTM site, and adds [Type II (+)] or removes [Type II (−)] the PTM site. A Type III PTMvariant occurs at a position adjacent to the PTM site and may change the type of modifying enzyme involved without changing the PTM site itself. Some protein PTMs can occur on several types of amino acid residues. For instance, arginine and lysine are the most frequently methylated residues. Thus, an amino acid substitution between lysine (K) and arginine (R) at a methylation site might change the type of methyltransferase acting on that site. That is to say, the target site might still be methylated, but by a different type of methyltransferase; this is defined as a Type IV PTMvariant. Also, some amino acid residues can be modified by different kinds of PTMs. For example, the lysine side chain is a target of methylation, acetylation, ubiquitination, sumoylation, succinylation, hydroxylation and so on. So, amino acid variations at positions adjacent to a lysine methylation site may change the sequence environment surrounding the central lysine (K) and thereby change the type of PTM that occurs there; this is defined as a Type V PTMvariant. Moreover, a Type VI PTMvariant occurs at a position adjacent to the PTM site and results in a stop codon, which might remove the downstream PTM site toward the protein C terminus. Among the six types, Type I and Type IV PTMvariants occur at the PTM site itself, and the other four types occur at positions adjacent to the PTM site. Here, we use a lysine methylation peptide sequence as an example to illustrate the six types of PTMvariants in Figure 1.
Figure 1.
Schematic illustration of the six types of PTMvariants, with a lysine methylation peptide sequence as an example: replacement of an amino acid by a lysine (K) residue to create a potential new lysine methylation site [Type I (+)], or replacement of the lysine to remove an original lysine methylation site [Type I (−)]; variation adjacent to a lysine methylation site that creates [Type II (+)] or removes [Type II (−)] the site; variation adjacent to a lysine methylation site that may change the type of lysine methyltransferase (KMT) recognizing the site, without changing the methylation site itself (Type III); variation between lysine (K) and arginine (R) at a methylation site, which might change the type of methyltransferase acting on the site (Type IV); variation adjacent to a lysine methylation site that may change the sequence environment surrounding the central lysine (K) and thereby change the type of PTM, for example to lysine acetylation (Type V); and variation adjacent to a lysine methylation site that results in a stop codon, which might remove the downstream lysine methylation site toward the protein C terminus (Type VI). Pink amino acid residues are mutated residues. A lysine (K) linked with a methyl group indicates that this K can be methylated by a KMT, and a K linked with an acetyl group indicates that this K can be acetylated by a lysine acetyltransferase (KAT). An arginine (R) linked with a methyl group indicates that this R can be methylated by an arginine methyltransferase (RMT). KMT1 represents one type of lysine methyltransferase and KMT2 another.

Prediction of PTMvariants
The computational procedure of PTMvariants detection is summarized in Figure 2. First, the genetic variation data are collected from NCBI dbSNP [28] or the SwissVariant database [29]. Researchers further consult a number of databases (such as UniProtKB/Swiss-Prot, HPRD, etc.) about the underlying effects and the references of these variations, and about the PTM information of the corresponding proteins. Second, the wild-type and mutant protein sequences are submitted to the corresponding PTMpredictor, which identifies proteins or sites carrying a given kind of PTM. Finally, PTMvariants are identified when the PTM sites or the interacting modifying enzymes differ between the wild-type and mutant sequences.
Figure 2. Computational procedure of PTMvariants detection.
The PTMpredictor is the critical point in the process of PTMvariants detection. A PTMpredictor is a prediction tool that determines whether a sequence containing specific amino acid residues can be modified. The computer-aided prediction of protein PTMs is an important task that is critical for the biological interpretation of proteome data. Moreover, with the explosive increase in the number of protein sequences in the databanks over the years, experimental methods do not allow researchers to find all possible PTMs. Therefore, the prediction of PTMs from amino acid sequences is one of the rapidly developing fields of bioinformatics. So far, a number of PTMpredictors have been developed for identifying various types of PTM proteins or sites, including phosphorylation, glycosylation, acetylation, sumoylation, ubiquitination, palmitoylation, nitration and so on. The computational techniques for PTM prediction can be relatively complex. The earliest strategies for PTM prediction relied on consensus sequences (i.e. motifs), which effectively transform the information contained in sequence alignments into character-based patterns that can then be scanned against a protein of interest [30].
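The wild-type versus mutant comparison and the motif-scanning idea described above can be illustrated with a small sketch. It assumes the commonly cited N-glycosylation sequon (N, then any residue except P, then S or T) as the consensus pattern and uses a plain regular expression in place of a real PTMpredictor; the function names and the toy sequence are hypothetical and are not taken from any tool in Table 1.

```python
import re

# Assumed consensus pattern: N, any residue except P, then S or T.
# The lookahead keeps the match on the asparagine so overlapping
# sequons are still reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def sequon_sites(seq):
    """Return the 1-based positions of asparagines that start a sequon."""
    return {m.start() + 1 for m in SEQUON.finditer(seq)}


def apply_substitution(seq, pos, new_residue):
    """Return seq with the residue at 1-based position pos replaced."""
    return seq[:pos - 1] + new_residue + seq[pos:]


def compare_variant(wild_type, pos, new_residue):
    """Label motif sites gained or lost by a single substitution.

    Type I: the substituted residue is the modified asparagine itself.
    Type II: the substitution is at a flanking position of the site.
    """
    mutant = apply_substitution(wild_type, pos, new_residue)
    before, after = sequon_sites(wild_type), sequon_sites(mutant)
    calls = []
    for site in sorted(after - before):
        calls.append((site, "Type I (+)" if site == pos else "Type II (+)"))
    for site in sorted(before - after):
        calls.append((site, "Type I (-)" if site == pos else "Type II (-)"))
    return calls


wild_type = "MKANGSLPNATQ"  # toy sequence, not a real protein
print(compare_variant(wild_type, 6, "A"))  # S6A -> [(4, 'Type II (-)')]
print(compare_variant(wild_type, 4, "Q"))  # N4Q -> [(4, 'Type I (-)')]
```

A motif match only marks a candidate site; the machine learning predictors discussed next would replace `sequon_sites` in such a comparison, while the wild-type versus mutant bookkeeping stays the same.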
With the rapid deposition of experimentally validated PTM data, machine learning approaches have become increasingly used for the prediction of protein PTMs, including artificial neural networks [31, 32], random forest [33, 34], pseudo amino acid composition [35–44], group-based prediction system (GPS) [45, 46], decision trees [47, 48], support vector machines [49–51], etc. In general, an in silico approach for predicting PTM sites is formulated as a two-class classification problem. First, experimentally validated short peptides with PTMs (positive examples) and without PTMs (negative examples) are collected from related protein databases to construct the training data set. Then, various sequence, physicochemical, structural and evolutionary properties of the positive and negative training data are extracted and encoded into fixed-length feature vectors. Subsequently, the feature vectors are fed into supervised machine learning algorithms. Afterward, common metrics, such as sensitivity, specificity, accuracy and the Matthews correlation coefficient, are determined through a cross-validation procedure to assess the performance of the predictive tools. Finally, the predictive algorithms are turned into user-friendly online Web tools to which experimentalists can upload their protein sequences and receive PTM prediction results within minutes. More detailed information on the prediction methodology and model construction of PTMpredictors can be obtained from several comprehensive reviews [31, 52–58]. In 2005, Savas and Ozcelik [12] first used the Web-based NetPhos tool to predict candidate PTMvariants, identified 15 nsSNPs that might create or abolish putative phosphorylation sites in 14 DNA repair and cell cycle proteins, and found that three of these single-nucleotide polymorphisms were associated with altered cancer risk. In 2009, Ryu et al.
[14] developed a kinase-specific phosphorylation site predictor named PredPhospho (version 2), used it to predict potential Type I, II and III phosphovariants, and ultimately developed PhosphoVariant, a database of definite and possible human phosphovariants. In 2010, Ren et al. [15] collected 91 797 nsSNPs from NCBI dbSNP and used an in-house kinase-specific phosphorylation site predictor, GPS 2.0, to predict potential Type I, II, III, IV and VI phosphovariants, and then integrated all phosphovariants into the PhosSNP 1.0 database, which is freely available to academic researchers. They computationally detected 69.76% of nsSNPs as potential phosphovariants in 17 614 proteins and observed that 74.58% of phosphovariants were Type III phosphovariants, suggesting that nsSNPs more often change the type of protein kinase acting on adjacent phosphorylation sites than create or abolish a phosphorylation site directly. The same year, Riaño-Pachón et al. [59] analyzed the effect of nsSNPs on Arabidopsis thaliana phosphorylation sites and their patterns, based on experimentally identified sites and high-confidence predicted phosphorylation sites taken from PhosPhAt (version 3.0), and found that the results for experimental phosphorylation sites were confirmed by similar analyses of predicted phosphorylation sites in A. thaliana [16]. In 2012, Mazumder et al. [17] carried out a comprehensive analysis of nsSNPs that lead to either loss or gain of the N-glycosylation motif, and found that 48% of the variations that change glycosylation sites occur in the loop and bend regions of the proteins. In 2013, Suo et al. [18] constructed the lysine acetylation site predictor KAcePred to identify Type I, II and III acetylvariants (acetylation-related amino acid variations), and found that 50.87% of amino acid variations are potential acetylvariants and that 12.32% of disease mutations could lead to acetylvariants. In 2015, Xu et al.
[24] developed a lysine sumoylation site prediction platform, SumoPred, to efficiently identify potential sumovariants (sumoylation-related amino acid variations), and observed that amino acid variations that directly create new potential sumoylation sites are more likely to cause diseases. Subsequently, Li et al. [25] presented SeqPalm, an identification method for protein S-palmitoylation sites, to study all known disease-associated variations, and discovered that 243 potential disruptions or creations of palmitoylation sites are highly associated with human inherited disease. In 2016, Lin and colleagues [27] updated the rice SNP resource based on the new rice genome Ver. 7.0, detected different types of rice phosphovariants (Type I, II, III and IV) with NetPhosK1.0 and Rice_phospho1.0, and constructed a database, SNP-rice 1.0, which was accessible at http://bioinformatics.fafu.edu.cn. As summarized in Table 1, the types of PTMs currently covered by PTMvariant prediction are phosphorylation, N-glycosylation, lysine acetylation, ubiquitination, sumoylation, S-palmitoylation, tyrosine sulfation and nitration. Most of the present research has focused on Type I and II PTMvariants. These PTMvariant prediction platforms and the corresponding data can provide instructive help for further experimental investigation.

Functional analysis of PTMvariants
A number of studies have pointed out that amino acid variations can not only affect protein stability and dynamics but also play important roles in rewiring signaling pathways by changing protein PTM patterns [7, 16, 60, 61]. Investigation of the functional effects of PTMvariants has concentrated on protein phosphorylation, which is the most extensively studied of all known protein PTMs. In 2008, Radivojac et al.
[13] investigated the role of phosphorylation in somatic cancer mutations and inherited diseases, using the phosphorylation site predictor DisPhos [62] to predict the gain or loss of a phosphorylation site in a target protein through spontaneous mutation. They further found that kinases in cancer are twice as likely to have mutations disrupting phosphorylation sites as a kinase control set, and that gain of phosphorylation sites in cancer-associated mutations is about 2-fold higher than in Swiss-Prot and human variation data, which suggests that both gain and loss of a phosphorylation site in a target protein may be important features for predicting cancer-causing mutations. In addition, the Bader lab has carried out a series of studies on the links between phosphorylation signaling and genetic variations that affect signaling systems in cancer. In 2013, they first developed a regression-based statistical method, named ActiveDriver, to identify frequently mutated or variable protein sites [19]. Then, they used ActiveDriver to analyze 10 900 missense single-nucleotide variants (SNVs) from 793 samples of eight cancer types, and found genes with significant phosphorylation-associated SNVs (pSNVs). ActiveDriver highlighted 11 additional cancer genes and many novel candidates with highly specific phosphosite mutations. Next, they performed a pathway analysis to find systems of functionally related genes with frequent pSNVs and analyzed pSNVs in the kinase–substrate network. Finally, they revealed increased survival of patients with TP53 pSNVs, hierarchically organized cancer kinase modules, a novel pSNV in EGF receptor (EGFR), and an immune-related network of pSNVs that correlates with prolonged survival in ovarian cancer. Their findings included multiple actionable cancer gene candidates (FLNB, GRM1, POU2F1), protein complexes (HCF1, ASF1) and kinases (PRKCZ).
The same year, the Bader lab further used ActiveDriver to analyze a pan-cancer data set of 3185 tumor genomes and 12 cancer types from The Cancer Genome Atlas (TCGA), and showed that pSNVs occurred in ∼90% of tumors and were enriched in cancer genes and pathways [20]. Gene-centric analysis found 150 known and candidate cancer genes with significant pSNV recurrence. Using a novel high-confidence set of sequence patterns recognized by 96 kinases, modeled as position weight matrices (PWMs), they predicted that 29% of these mutations directly disrupt phosphorylation or modify kinase target sites to rewire signaling pathways. In 2015, the Bader lab developed a machine learning method based on Bayesian statistics, called mutation impact on phosphorylation (MIMP), to predict whether SNVs disrupt existing phosphorylation sites or create new sites [23]. They tested MIMP on 236 367 missense SNVs from the TCGA pan-cancer data set of 3185 cancer samples of 12 tumor types, and the results indicated that MIMP can detect functional mutations in kinase-binding sites and propose corresponding mechanisms. Furthermore, Xue's research group [22] also investigated the reconfiguration of phosphorylation signaling by genetic polymorphisms affecting cancer susceptibility. They first used their previously developed kinase-specific predictor GPS 2.1 [46] to predict site-specific kinase–substrate relations (ssKSRs) for the original and nsSNP-containing proteins, respectively. Then, they used known phosphorylation sites and protein–protein interactions between kinases and substrates to remove false-positive predictions, and detected 9606 potential phosphorylation-related single-nucleotide polymorphisms (phosSNPs) in 7946 proteins.
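To make the PWM idea mentioned above more concrete, the following is a simplified sketch of scoring a wild-type and a mutant peptide window against a position weight matrix and comparing the two scores. It is not the ActiveDriver, MIMP or GPS implementation: the matrix values, the pseudo-frequency, the uniform background and the function names are all invented for illustration.

```python
import math

BACKGROUND = 0.05  # assumed uniform background over the 20 amino acids
PSEUDO = 0.01      # assumed frequency for residues not listed in a column

# Hypothetical residue frequencies for a 5-residue window centred on a
# phosphorylated serine (index 2). The numbers are made up.
TOY_PWM = [
    {"R": 0.60},             # position -2
    {"R": 0.40, "K": 0.20},  # position -1
    {"S": 0.90, "T": 0.08},  # central phospho-acceptor
    {"P": 0.50},             # position +1
    {"L": 0.30},             # position +2
]


def pwm_score(peptide, pwm=TOY_PWM):
    """Sum of log2 odds (column frequency / background) over the window."""
    if len(peptide) != len(pwm):
        raise ValueError("peptide length must equal the PWM width")
    return sum(
        math.log2(column.get(residue, PSEUDO) / BACKGROUND)
        for residue, column in zip(peptide, pwm)
    )


def score_change(wild_type, mutant, pwm=TOY_PWM):
    """Mutant score minus wild-type score; negative means a weaker match."""
    return pwm_score(mutant, pwm) - pwm_score(wild_type, pwm)


# R-to-G substitution at position -1 of the toy window.
print(round(score_change("RRSPL", "RGSPL"), 2))
```

In practice one matrix per kinase would be scored, so a flanking substitution can lower the match to one kinase while raising the match to another, which is the kind of rewiring the studies above describe.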
Subsequently, they reconstructed the human phosSNP-associated kinase–substrate phosphorylation network by comparing the predicted ssKSRs for the original and mutated proteins, and found that cancer genes and drug targets are highly enriched in the network and that the proteins in the network are significantly associated with various signaling and cancer pathways. In addition, the Linding lab examined whether cancer mutations could perturb phosphorylation signaling networks. In 2012, they first proposed network-attacking mutations (NAMs), which lead to a new cellular phenotype by perturbing signaling networks at the level of either network structure or network dynamics [63]. In 2015, they further divided NAMs into six types based on perturbations of signaling network dynamics, network structure and dysregulation of phosphorylation sites [26]. Furthermore, they developed a computational approach, termed ReKINect, to predict NAMs, and systematically interpreted the exomes and quantitative phospho-proteomes of five ovarian cancer cell lines and the global cancer genome repository. They discovered mutant molecular logic gates, a drift toward phospho-threonine signaling, weakening of phosphorylation motifs and kinase-inactivating hotspots in cancer. Beyond phosphorylation, the Bader lab recently performed computational analyses of acetylation and ubiquitination sites in a pan-cancer data set of 3200 tumor samples from TCGA [11]. Cancer mutations were mapped to PTM sites, and gene-focused analysis with the ActiveDriver mutational significance model highlighted significant co-occurrences of acetylation and ubiquitination and mutation hotspots in known oncoproteins, and revealed candidate cancer driver genes with PTM-related mechanisms. Pathway analysis with functional annotations from the Gene Ontology revealed that PTM mutations in acetylation and ubiquitination sites accumulated in cancer-related processes such as cell cycle, apoptosis, chromatin regulation and metabolism.
PTM-specific interaction network analysis revealed survival-associated protein modules and suggested that many PTM-related mutations were related to decreased patient survival. Taken together, these system-level analyses of PTM-related mutations should prove helpful for further elucidation of the functional impacts of disease-associated SNPs.

Application of computational tools
Computational approaches can filter out the most likely pathological variants from the huge pool of SNP data sets and represent a useful starting point to guide the design of functional assays. For example, Deng et al. [64] explored the functions of phosSNPs for bone mineral density (BMD) in humans to elucidate the pathophysiological mechanism of osteoporosis. In that investigation, out of the total 64 035 phosSNPs in the PhosSNP 1.0 database [15], those covered by Affymetrix SNP Arrays were studied. The potential impacts of the three significant phosSNPs (rs16861032, rs2657879 and rs6265) on protein phosphorylation were predicted by GPS 2.0. They experimentally validated the prediction that BDNF-T62 was the target site of phosphorylation regulated by rs6265, and concluded that phosSNP rs6265, through regulating BDNF protein phosphorylation and osteoblast differentiation, influenced hip BMD in humans. In 2017, Cheng and colleagues [65] investigated potential functional mechanisms of type 2 diabetes (T2D)-associated SNPs and genes. They used the PhosSNP 1.0 database [15] (accessed 8 March 2016) to analyze the effects of T2D-associated SNPs on protein phosphorylation, and detected four T2D-associated and eight proxy phosSNPs, including Type II, III and IV phosSNPs. More recently, Krassowski et al. [66] developed a comprehensive human proteo-genomics database, ActiveDriverDB, which annotates disease mutations and population variants through the lens of PTMs.
In ActiveDriverDB, two interaction networks are available: the high-confidence experimental network includes experimentally determined kinase–substrate interactions, and the computationally predicted network includes gained and lost kinase–substrate interactions derived from sequence motif analysis with the MIMP method [23]. The ReKINect platform [26] can be used to predict whether mutations create or destroy phosphorylation sites and to identify mutations that rewire kinases downstream. Li [67] applied ReKINect to predict the likely functionality of each mutation and explored the role of mutations in epigenetic regulators, particularly MLL2, in cervical carcinogenesis. There is no uniform standard across the different methods, and it is extremely difficult to decide which method is best. To avoid the limitations of each approach and improve performance, it is best to combine several tools and resources when determining the possible functional mechanism. For instance, Rajendran and Deng [68] incorporated a dozen driver gene prediction tools and resources to identify a list of candidate driver mutations in human breast cancer genes. To increase the reliability of prediction, six different approaches, including ActiveDriver [19], were used to ensure the confidence level of potential breast cancer drivers.

Discussion
With the rapid progress of sequencing technologies and the appearance of new methods, a large number of protein PTM sites and genetic variations are emerging. The PTMVar data set (Release: 27-Sep-2017), derived from PhosphoSitePlus [69], includes 30 080 modification sites associated with single amino acid variants, of which 41.58% are classified as disease variants, 45.34% as polymorphisms and 13.08% as unclassified (Figure 3). As shown in Figure 3A, these amino acid variations mainly involve phosphorylation (73.96%), ubiquitination (13.82%), acetylation (10.09%), succinylation (0.93%), sumoylation (0.86%), methylation (0.22%) and neddylation (0.12%).
Because PTM sites are important active sites that act as regulatory switches in proteins and pathways, specific mutations in PTM sites may alter networks and lead to changes in cellular phenotype involved in disease development [11]. However, the complexity of PTM mechanisms cannot be fully resolved by experimental approaches, and experimental mutational analysis is labor-intensive and time-consuming. Thus, new methods are needed to automatically predict the impact of mutations on PTMs, especially for large-scale predictions. As mentioned above, many different methods have been developed to predict PTMvariants and perform related functional analysis, and they have made some progress. But computational mutation analysis of protein PTMs is still a complicated undertaking, and some problems and challenges remain in this field.
Figure 3. The data statistics for the PTMVar data set (Release: 27 September 2017) derived from PhosphoSitePlus. (A) The proportion of different types of PTMs impacted by variants. (B) The statistics of disease, polymorphism and unclassified variants. (C) The proportion of Type I and Type II variants. Type I and Type II mutated residues are at or within five residues of PTM sites, respectively.
PTMvariants are identified by protein PTM predictors, so the prediction quality of PTMpredictors directly affects the identification of PTMvariants. Hundreds of PTMpredictors have already been designed to predict various kinds of PTM proteins or sites.
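Since prediction quality is central here, it may help to recall how the metrics named earlier (sensitivity, specificity, accuracy and the Matthews correlation coefficient) follow from the counts of a two-class confusion matrix. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and do not come from any predictor discussed in this review.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Standard two-class metrics from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Return 0.0 rather than failing when a class is empty.
        return numerator / denominator if denominator else 0.0

    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "sensitivity": ratio(tp, tp + fn),  # true sites recovered
        "specificity": ratio(tn, tn + fp),  # non-sites rejected
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Invented counts for one cross-validation fold of a site predictor.
metrics = classification_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because non-modified residues usually far outnumber modified ones, accuracy alone can look high for a weak predictor, which is one reason the Matthews correlation coefficient is typically reported alongside it.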
Most of the existing PTMpredictors are based on machine learning methods in which the success of prediction depends heavily on effective features, and feature extraction and classification are treated as two separate problems. However, the features are extracted from protein sequences by manual design, which may result in incomplete or biased features. Thus, new approaches, such as deep learning, should be introduced to address this problem and improve the prediction quality of PTMpredictors. For example, Wang et al. [70] recently applied a deep learning method to update their previous tool Musite for general and kinase-specific phosphorylation site prediction and showed that deep learning improved prediction performance and is a promising approach. Compared with conventional machine learning methods, deep learning allows a computational model to be fed with raw sequence data and to automatically discover the complex representations needed for classification. Furthermore, using a combination of methods based on different theoretical principles may help mitigate the false-positive and false-negative rates suffered by any one method alone, which results in a cleaner list of candidates for experimental validation [71]. In addition, no online service is available for some existing PTMvariant predictors (Table 1). To make prediction methods more useful in practice, a user-friendly and publicly accessible Web server or applet is necessary so that experimental scientists can use the predictors as tools without needing to understand the detailed mathematics.

As shown in Figure 3C, the PTMVar data set derived from the PhosphoSitePlus database provides Type I and Type II mutated residues, which are at or within five residues of PTM sites, respectively. Owing to the limited data available, most computational mutation analyses of protein PTMs have focused mainly on Type I and Type II PTMvariants.
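The Type I and Type II definitions above (a mutated residue at a PTM site, or within five residues of one) can be expressed as a simple distance rule. The sketch below assumes 1-based residue positions within a single protein; the function name and the example positions are hypothetical.

```python
def classify_variant(variant_pos, ptm_sites, window=5):
    """Classify a variant relative to known PTM sites in the same protein.

    Returns "Type I" if the variant is at a PTM site, "Type II" if it lies
    within `window` residues of one, and None otherwise. Positions are
    1-based; `window=5` follows the definition quoted in the text.
    """
    if not ptm_sites:
        return None
    nearest = min(abs(variant_pos - site) for site in ptm_sites)
    if nearest == 0:
        return "Type I"
    if nearest <= window:
        return "Type II"
    return None


if __name__ == "__main__":
    sites = [15, 382]  # hypothetical PTM site positions
    for pos in (382, 379, 20, 100):
        print(pos, classify_variant(pos, sites))
```

A variant that sits on one PTM site while also flanking another is reported as Type I here; that precedence is a design choice of this sketch, not something the data set specifies.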
However, accumulating studies have indicated that various PTMs can synergistically orchestrate specific biological processes through cross talk [72–75]. Multiple PTMs can interplay with each other ‘in situ’ by competitively modifying the same residues [73–75]. For instance, lysine methylation and acetylation compete at Lys382 to modulate the transcriptional activity of p53 [76]. Lys382 acetylation of p53 can also preclude its ubiquitination [77]. Likewise, nitration and phosphorylation can co-occur at Tyr125, Tyr133 and Tyr136 of alpha-synuclein (UniProt ID: P37840) [21, 78]. Human gastrin (UniProt ID: P01350) was found to be phosphorylated by v-Src at Tyr87, a residue that is also modulated by sulfation [79, 80]. Xue's research group [21] systematically analyzed in situ cross talk at the same positions among three tyrosine PTMs: sulfation, nitration and phosphorylation. Their results suggested that tyrosines targeted by multiple PTMs are not more conserved than unmodified ones, and that there is no functional constraint on multiply modified tyrosines. Site-specific cross talk among multiple PTMs co-occurs significantly. However, the intricate competition mechanism and the influence of in situ cross talk have not been studied in depth. Amino acid variations at positions adjacent to in situ cross talk sites may change the local environment around those sites and thereby alter the type of PTM that would otherwise occur. In this regard, Type V PTMvariants might be a valuable way to explore the underlying connections of cross talk among different PTMs. Meanwhile, with the increasing availability of data on various modifying enzymes, developing modifying enzyme-specific (such as methyltransferase-specific and acetyltransferase-specific) PTM substrate and site prediction tools is necessary and feasible, and will be helpful for identifying and analyzing Type III, IV and V PTMvariants.
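In situ cross talk, as described above, means that one residue carries annotations for more than one PTM type. A minimal way to find such residues in a list of site annotations is to group by protein and position, as sketched below. The function name and the tuple layout are assumptions; the example records restate the cases named in the text, plus one clearly hypothetical single-PTM record to show the filtering.

```python
from collections import defaultdict


def find_in_situ_crosstalk(annotations):
    """Return residues annotated with more than one PTM type.

    annotations: iterable of (protein_id, position, ptm_type) tuples.
    Result: {(protein_id, position): sorted list of PTM types}.
    """
    by_site = defaultdict(set)
    for protein_id, position, ptm_type in annotations:
        by_site[(protein_id, position)].add(ptm_type)
    return {site: sorted(types) for site, types in by_site.items() if len(types) > 1}


# Examples taken from the text; "p53" is used as a plain label.
annotations = [
    ("p53", 382, "methylation"),
    ("p53", 382, "acetylation"),
    ("P37840", 125, "nitration"),
    ("P37840", 125, "phosphorylation"),
    ("P37840", 133, "nitration"),
    ("P37840", 133, "phosphorylation"),
    ("P37840", 136, "nitration"),
    ("P37840", 136, "phosphorylation"),
    ("P01350", 87, "phosphorylation"),
    ("P01350", 87, "sulfation"),
    ("P01350", 10, "phosphorylation"),  # hypothetical single-PTM record
]

crosstalk_sites = find_in_situ_crosstalk(annotations)

if __name__ == "__main__":
    for site, types in sorted(crosstalk_sites.items()):
        print(site, types)
```

Variants at or next to the positions returned by such a grouping would be natural candidates for the cross talk analysis discussed in the text.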
There is a need for methods that not only identify PTMvariants but also predict how PTMvariants might affect cellular networks. Functional analysis of PTMvariants is still in its infancy. Mutations do not occur in isolation but coexist with other somatic alterations that work together to alter cellular processes. Integrating multiple sources of biological information with pathway and network analysis techniques based on graph theory, information theory or Bayesian theory can help address the challenge of interpreting proteomics results [71]. Network biology may fundamentally advance not only basic biology but also patient treatment [63]. Because personalized precision medicine is a frontier for scientists, industry and the general population, it is becoming increasingly important to exploit computational approaches that can lead to a better understanding of the etiology of disease. Apart from PTMs in proteins, cancer and many other major diseases are often caused by post-replication modification (PTRM) in DNA and posttranscription modification (PTCM) in RNA. Many different methods have been developed to predict PTRM sites in DNA sequences [81, 82] and PTCM sites in RNA sequences [83–86]. All these developments have had significant impacts on medicinal chemistry [87] and are even driving it into an unprecedented revolution [88]. Integration of genetic and molecular information is a sensible step in this direction because it provides a structural and functional perspective on human variation data. The development of better approaches for functional analysis of PTMvariants will help to facilitate this process and further support the future development of personalized precision medicine.

Key Points

Computational mutation analysis can greatly reduce the experimental effort required.
We provide an overview of computational prediction of amino acid variations that influence protein PTMs and their functional analysis.
Using a combination of methods based on different theoretical principles may help mitigate the false-positive and false-negative rates suffered by any one method alone.
Type V PTMvariants might be a valuable way to explore the underlying connections of cross talk among different PTMs.
Integrating multiple sources of biological information with pathway and network analysis techniques based on graph theory, information theory or Bayesian theory can help address the challenge of interpreting proteomics results.

Funding

This work was supported by the National Natural Science Foundation of China (grant numbers 21665016 and 21305062) and the Natural Science Foundation of Jiangxi Province (grant number 20151BAB203022).

Shaoping Shi is an associate professor at the School of Sciences, Nanchang University. Her research focuses on the development of novel data analysis algorithms and bioinformatics tools for prediction of protein structure and function. Lina Wang is a lecturer at the Department of Science, Nanchang Institute of Technology. Her research focuses on computational prediction and functional analysis of acylation modification. Cao Man is a graduate student at the School of Sciences, Nanchang University. Her research focuses on integrating tyrosine PTM data for analysis and validation. Guodong Chen is a graduate student at the School of Sciences, Nanchang University. His current research focuses on developing novel data analysis algorithms and software for prokaryotic lysine acetylation. Jialin Yu is a graduate student at the School of Sciences, Nanchang University. His research focuses on deep learning and prediction of protein structure and function.

References

1. Reimand J, Wagih O, Bader GD. Evolutionary constraint and disease associations of post-translational modification sites in human genomes. PLoS Genet 2015;11:e1004919.
2. Kamath KS, Vasavada MS, Srivastava S.
Proteomic databases and tools to decipher post-translational modifications. J Proteomics 2011;75(1):127–44.
3. Lichti CF, Wildburger NC, Emmett MR, et al. Post-translational modifications in the human proteome. In: Marko-Varga G (ed). Genomics and Proteomics for Clinical Discovery and Development. Translational Bioinformatics. Dordrecht: Springer, 2014.
4. Olsen JV, Mann M. Status of large-scale analysis of post-translational modifications by mass spectrometry. Mol Cell Proteomics 2013;12:3444–52.
5. Liu ZX, Cai YD, Guo XJ, et al. Post-translational modification (PTM) bioinformatics in China: progresses and perspectives. Hereditas 2015;37:621–34.
6. Pang C, Gasteiger E, Wilkins MR. Identification of arginine- and lysine-methylation in the proteome of Saccharomyces cerevisiae and its functional implications. BMC Genomics 2010;11:92.
7. Li S, Iakoucheva LM, Mooney SD, et al. Loss of post-translational modification sites in disease. Pac Symp Biocomput 2010;15:337–47.
8. Thomas M, Dadgar N, Aphale A, et al. Androgen receptor acetylation site mutations cause trafficking defects, misfolding, and aggregation similar to expanded glutamine tracts. J Biol Chem 2004;279:8389–95.
9. Luna L, Rolseth V, Hildrestrand GA, et al. Dynamic relocalization of hOGG1 during the cell cycle is disrupted in cells harbouring the hOGG1-Cys326 polymorphic variant. Nucleic Acids Res 2005;33(6):1813–24.
10. Lu C, Jain SU, Hoelper D, et al. Histone H3K36 mutations promote sarcomagenesis through altered histone methylation landscape. Science 2016;352(6287):844–9.
11. Narayan S, Bader GD, Reimand J.
Frequent mutations in acetylation and ubiquitination sites suggest novel driver mechanisms of cancer. Genome Med 2016;8:55.
12. Savas S, Ozcelik H. Phosphorylation states of cell cycle and DNA repair proteins can be altered by the nsSNPs. BMC Cancer 2005;5:107.
13. Radivojac P, Baenziger PH, Kann MG, et al. Gain and loss of phosphorylation sites in human cancer. Bioinformatics 2008;24:i241–7.
14. Ryu GM, Song P, Kim KW, et al. Genome-wide analysis to predict protein sequence variations that change phosphorylation sites or their corresponding kinases. Nucleic Acids Res 2009;37(4):1297–307.
15. Ren J, Jiang CH, Gao XJ, et al. PhosSNP for systematic analysis of genetic polymorphisms that influence protein phosphorylation. Mol Cell Proteomics 2010;9:623–34.
16. Riaño-Pachón DM, Kleessen S, Neigenfind J, et al. Proteome-wide survey of phosphorylation patterns affected by nuclear DNA polymorphisms in Arabidopsis thaliana. BMC Genomics 2010;11:411.
17. Mazumder R, Morampudi KS, Motwani M, et al. Proteome-wide analysis of single-nucleotide variations in the n-glycosylation sequon of human genes. PLoS One 2012;7:e36212.
18. Suo SB, Qiu JD, Shi SP, et al. Proteome-wide analysis of amino acid variations that influences protein lysine acetylation. J Proteome Res 2013;12:949–58.
19. Reimand J, Bader GD. Systematic analysis of somatic mutations in phosphorylation signaling predicts novel cancer drivers. Mol Syst Biol 2013;9:637.
20. Reimand J, Wagih O, Bader GD. The mutational landscape of phosphorylation signaling in cancer.
Sci Rep 2013;3:2651.
21. Pan Z, Liu Z, Cheng H, et al. Systematic analysis of the in situ crosstalk of tyrosine modifications reveals no additional natural selection on multiply modified residues. Sci Rep 2014;4:7331.
22. Wang Y, Cheng H, Pan Z, et al. Reconfiguring phosphorylation signaling by genetic polymorphisms affects cancer susceptibility. J Mol Cell Biol 2015;7:187–202.
23. Wagih O, Reimand J, Bader GD. MIMP: predicting the impact of mutations on kinase-substrate phosphorylation. Nat Methods 2015;12:531–3.
24. Xu HD, Shi SP, Chen X, et al. Systematic analysis of the genetic variability that impacts sumo conjugation and their involvement in human diseases. Sci Rep 2015;5:10900.
25. Li S, Li J, Ning L, et al. In silico identification of protein S-palmitoylation sites and their involvement in human inherited disease. J Chem Inf Model 2015;55:2015–25.
26. Creixell P, Schoof EM, Simpson CD, et al. Kinome-wide decoding of network-attacking mutations rewiring cancer signaling. Cell 2015;163:202–17.
27. Lin S, Chen L, Tao H, et al. Impact of SNPs on protein phosphorylation status in rice (Oryza sativa L.). Int J Mol Sci 2016;17(11):1738.
28. Sherry ST, Ward MH, Kholodov M, et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 2001;29(1):308–11.
29. Yip YL, Scheib H, Diemand AV, et al. The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure information on human protein variants. Hum Mutat 2004;23:464–70.
30. Pearson RB, Kemp BE. Protein kinase phosphorylation site sequences and consensus specificity motifs: tabulations. Methods Enzymol 1991;200:62–81.
31. Blom N, Sicheritz-Ponten T, Gupta R, et al. Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence. Proteomics 2004;4(6):1633–49.
32. Plewczynski D, Basu S, Saha I. AMS 4.0: consensus prediction of post-translational modifications in protein sequences. Amino Acids 2012;43(2):573–82.
33. Jia J, Liu Z, Xiao X, et al. pSuc-Lys: predict lysine succinylation sites in proteins with PseAAC and ensemble random forest approach. J Theor Biol 2016;394:223–30.
34. Qiu WR, Sun BQ, Xiao X, et al. iPTM-mLys: identifying multiple lysine PTM sites and their different types. Bioinformatics 2016;32(20):3116–23.
35. Xu Y, Ding J, Wu LY, et al. iSNO-PseAAC: predict cysteine S-nitrosylation sites in proteins by incorporating position specific amino acid propensity into pseudo amino acid composition. PLoS One 2013;8:e55844.
36. Xu Y, Shao XJ, Wu LY, et al. iSNO-AAPair: incorporating amino acid pairwise coupling into PseAAC for predicting cysteine S-nitrosylation sites in proteins. Peer J 2013;1:e171.
37. Jia C, Lin X, Wang Z. Prediction of protein S-nitrosylation sites based on adapted normal distribution Bi-Profile Bayes and Chou's Pseudo amino acid composition. Int J Mol Sci 2014;15:10410–23.
38. Qiu WR, Xiao X, Lin WZ, et al. iMethyl-PseAAC: identification of protein methylation sites via a pseudo amino acid composition approach.
Biomed Res Int 2014;2014:947416.
39. Xu Y, Wen X, Wen LS, et al. iNitro-Tyr: prediction of nitrotyrosine sites in proteins with general pseudo amino acid composition. PLoS One 2014;9:e10501.
40. Jia J, Zhang L, Liu Z, et al. pSumo-CD: predicting sumoylation sites in proteins with covariance discriminant algorithm by incorporating sequence-coupled effects into general PseAAC. Bioinformatics 2016;32(20):3133–41.
41. Liu LM, Xu Y. iPGK-PseAAC: identify lysine phosphoglycerylation sites in proteins by incorporating four different tiers of amino acid pairwise coupling information into the general PseAAC. Med Chem 2017;13:552–9.
42. Jia J, Liu Z, Xiao X, et al. iSuc-PseOpt: identifying lysine succinylation sites in proteins by incorporating sequence-coupling effects into pseudo components and optimizing imbalanced training dataset. Anal Biochem 2016;497:48–56.
43. Xu Y, Wen X, Shao XJ, et al. iHyd-PseAAC: predicting hydroxyproline and hydroxylysine in proteins by incorporating dipeptide position-specific propensity into pseudo amino acid composition. Int J Mol Sci 2014;15:7594–610.
44. Qiu WR, Sun BQ, Xiao X, et al. iPhos-PseEvo: identifying human phosphorylated proteins by incorporating evolutionary information into general PseAAC via grey system theory. Mol Inform 2017;36(5–6):UNSP 1600010.
45. Xue Y, Ren J, Gao X, et al. GPS 2.0, a tool to predict kinase-specific phosphorylation sites in hierarchy. Mol Cell Proteomics 2008;7:1598–608.
46. Xue Y, Liu Z, Cao J, et al. GPS 2.1: enhanced prediction of kinase-specific phosphorylation sites with an algorithm of motif length selection. Protein Eng Des Sel 2011;24:255–60.
47. Charpilloz C, Veuthey AL, Chopard B, et al. Motifs tree: a new method for predicting post-translational modifications. Bioinformatics 2014;30(14):1974–82.
48. Lopez Y, Dehzangi A, Lal SP, et al. SucStruct: prediction of succinylated lysine residues by using structural properties of amino acids. Anal Biochem 2017;527:24–32.
49. Shi SP, Sun XY, Qiu JD, et al. The prediction of palmitoylation site locations using a multiple feature extraction methods. J Mol Graph Model 2013;40:125–30.
50. Wang LN, Shi SP, Xu HD, et al. Computational prediction of species-specific malonylation sites via enhanced characteristic strategy. Bioinformatics 2017;33(10):1457–63.
51. Shi SP, Chen X, Xu HD, et al. PredHydroxy: computational prediction of protein hydroxylation site locations based on the primary structure. Mol BioSyst 2015;11:819–25.
52. Xue Y, Gao XJ, Cao J, et al. A summary of computational resources for protein phosphorylation. Curr Protein Pept Sci 2010;11(6):485–96.
53. Eisenhaber B, Eisenhaber F. Prediction of posttranslational modification of proteins from their amino acid sequence. Methods Mol Biol 2010;609:365–84.
54. Trost B, Kusalik A. Computational prediction of eukaryotic phosphorylation sites. Bioinformatics 2011;27(21):2927–35.
55. Sobolev BN, Veselovsky AV, Poroikov VV. Prediction of protein post-translational modifications: main trends and methods. Russ Chem Rev 2014;83:143–54.
56. Shi SP, Xu HD, Wen PP, et al. Progress and challenges in predicting protein methylation sites.
Mol BioSyst 2015;11:2610–19.
57. Chen Z, Zhou Y, Zhang ZD, et al. Towards more accurate prediction of ubiquitination sites: a comprehensive review of current methods, tools and features. Brief Bioinform 2015;16(4):640–57.
58. Xu Y, Chou KC. Recent progress in predicting posttranslational modification sites in proteins. Curr Top Med Chem 2016;16(6):591–603.
59. Durek P, Schmidt R, Heazlewood J, et al. PhosPhAt: the Arabidopsis thaliana phosphorylation site database. An update. Nucleic Acids Res 2010;38:D828–34.
60. Yue P, Moult J. Identification and analysis of deleterious human SNPs. J Mol Biol 2006;356(5):1263–74.
61. Creixell P, Schoof EM, Tan CSH, et al. Mutational properties of amino acid residues: implications for evolvability of phosphorylatable residues. Phil Trans R Soc B 2012;367(1602):2584–93.
62. Iakoucheva LM, Radivojac P, Brown CJ, et al. The importance of intrinsic disorder for protein phosphorylation. Nucleic Acids Res 2004;32(3):1037–49.
63. Creixell P, Schoof EM, Erler JT, et al. Navigating cancer network attractors for tumor-specific therapy. Nat Biotechnol 2012;30(9):842–8.
64. Deng FY, Tan LJ, Shen H, et al. SNP rs6265 regulates protein phosphorylation and osteoblast differentiation and influences BMD in humans. J Bone Miner Res 2013;28(12):2498–507.
65. Cheng M, Liu X, Yang M, et al. Computational analyses of type 2 diabetes-associated loci identified by genome-wide association studies. J Diabetes 2017;9(4):362–77.
66. Krassowski M, Paczkowska M, Cullion K, et al. ActiveDriverDB: human disease mutations and genome variation in post-translational modification sites of proteins. Nucleic Acids Res 2018;46:D901–10.
67. Li X. Emerging role of mutations in epigenetic regulators including MLL2 derived from The Cancer Genome Atlas for cervical cancer. BMC Cancer 2017;17(1):252.
68. Rajendran BK, Deng CX. Characterization of potential driver mutations involved in human breast cancer by computational approaches. Oncotarget 2017;8(30):50252–72.
69. Hornbeck PV, Zhang B, Murray B, et al. PhosphoSitePlus, 2014: mutations, PTMs and recalibrations. Nucleic Acids Res 2015;43(D1):D512–20.
70. Wang D, Zeng S, Xu C, et al. MusiteDeep: a deep-learning framework for general and kinase-specific phosphorylation site prediction. Bioinformatics 2017;33(24):3909–16.
71. Gonzalez-Perez A, Mustonen V, Reva B, et al. Computational approaches to identify functional genetic variants in cancer genomes. Nat Methods 2013;8:723–9.
72. Lopez-Otin C, Hunter T. The regulatory crosstalk between kinases and proteases in cancer. Nat Rev Cancer 2010;10(4):278–92.
73. Yang XJ, Seto E. Lysine acetylation: codified crosstalk with other posttranslational modifications. Mol Cell 2008;31(4):449–61.
74. Hart GW, Slawson C, Ramirez-Correa G, et al. Cross talk between O-GlcNAcylation and phosphorylation: roles in signaling, transcription, and chronic disease. Annu Rev Biochem 2011;80:825–58.
75. Kaasik K, Kivimäe S, Allen JJ, et al.
Glucose sensor O-GlcNAcylation coordinates with phosphorylation to regulate circadian clock. Cell Metab 2013;17(2):291–302.
76. Carter S, Vousden KH. Modifications of p53: competing for the lysines. Curr Opin Genet Dev 2009;19(1):18–24.
77. Le Cam L, Linares LK, Paul C, et al. E4F1 is an atypical ubiquitin ligase that modulates p53 effector functions independently of degradation. Cell 2006;127(4):775–88.
78. Takahashi T, Yamashita H, Nakamura T, et al. Tyrosine 125 of alpha-synuclein plays a critical role for dimerization following nitrative stress. Brain Res 2002;938(1–2):73–80.
79. Songyang Z, Cantley LC. Recognition and specificity in protein tyrosine kinase-mediated signalling. Trends Biochem Sci 1995;20(11):470–5.
80. Rehfeld JF, Hansen CP, Johnsen AH. Post-poly(Glu) cleavage and degradation modified by O-sulfated tyrosine: a novel post-translational processing mechanism. EMBO J 1995;14:389–96.
81. Liu Z, Xiao X, Qiu WR, et al. iDNA-Methyl: identifying DNA methylation sites via pseudo trinucleotide composition. Anal Biochem 2015;474:69–77.
82. Feng P, Yang H, Ding H, et al. iDNA6mA-PseKNC: identifying DNA N6-methyladenosine sites by incorporating nucleotide physicochemical properties into PseKNC. Genomics 2018, in press. doi: 10.1016/j.ygeno.2018.01.005.
83. Chen W, Tang H, Ye J, et al. iRNA-PseU: identifying RNA pseudouridine sites. Mol Ther Nucl Acids 2016;5:e332.
84. Liu Z, Xiao X, Yu DJ, et al. pRNAm-PC: predicting N-methyladenosine sites in RNA sequences via physical-chemical properties. Anal Biochem 2016;497:60–7.
85. Feng P, Ding H, Yang H, et al.
iRNA-PseColl: identifying the occurrence sites of different RNA modifications by incorporating collective effects of nucleotides into PseKNC. Mol Ther Nucl Acids 2017;7:155–63.
86. Qiu WR, Jiang SY, Sun BQ, et al. iRNA-2methyl: identify RNA 2′-O-methylation sites by incorporating sequence-coupled effects into general PseKNC and ensemble classifier. Med Chem 2017;13:734–43.
87. Chou KC. Impacts of bioinformatics to medicinal chemistry. Med Chem 2015;11(3):218–34.
88. Chou KC. An unprecedented revolution in medicinal chemistry driven by the progress of biological science. Curr Top Med Chem 2017;17:2337–58.

© The Author(s) 2018. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)

Journal

Briefings in Bioinformatics, Oxford University Press

Published: May 17, 2018
