An in silico proteomics screen to predict and prioritize protein–protein interactions dependent on post-translationally modified motifs

Abstract

Motivation
The development of proteomic methods for the characterization of domain/motif interactions has greatly expanded our understanding of signal transduction. However, proteomics-based binding screens have limitations, including that the queried tissue or cell type may not harbor all potential interacting partners or the post-translational modifications (PTMs) required for the interaction. Therefore, we sought a generalizable, complementary in silico approach to identify high-priority, potentially novel motif- and PTM-dependent binding partners.

Results
We used as an initial example the interaction between the Src homology 2 (SH2) domains of the adaptor proteins CT10 regulator of kinase (CRK) and CRK-like (CRKL) and phosphorylated-YXXP motifs. Employing well-curated, publicly-available resources, we scored and prioritized potential CRK/CRKL–SH2 interactors possessing signature characteristics of known interacting partners. Our approach gave high priority scores to 102 of the >9000 YXXP motif-containing proteins. These 102 included 21 of the 25 curated CRK/CRKL–SH2-binding partners, a more than 80-fold enrichment. Several predicted interactors were validated biochemically. To demonstrate generalized applicability, we used our workflow to predict protein–protein interactions dependent upon motif-specific arginine methylation. Our data demonstrate the applicability of our approach to, conceivably, any modular binding domain that recognizes a specific post-translationally modified motif.

Supplementary information
Supplementary data are available at Bioinformatics online.

1 Introduction

The first description of the Src homology 2 (SH2) domain (Sadowski et al., 1986) and the subsequent discovery that SH2 domains bind phosphorylated tyrosine (pTyr) residues (Matsuda et al., 1991; Mayer et al., 1991; Moran et al., 1990) transformed our understanding of signal transduction, taking pTyr past simple allosteric regulation to a world of multiprotein complexes. Our emerging understanding involved enzyme-induced protein/lipid modifications that could modulate the formation/dissociation of signaling hubs, affecting affinities of molecular interactions and subcellular localization (Mayer, 2015). The characterization of the SH2 domain spurred the identification of additional post-translational modification (PTM) recognition domains (Seet et al., 2006), most of which possess a unique affinity for amino acids (AAs) surrounding the modified site.

The SH2 domain comprises a highly conserved sequence of ∼100 AAs found in many adaptors, scaffolding proteins, transcription factors, and non-receptor tyrosine kinases (Sadowski et al., 1986). SH2 domains bind pTyr residues within motifs specific to each SH2 domain, linking tyrosine kinases and their substrates with downstream effectors.

CT10 regulator of kinase (CRK) and CRK-like (CRKL) are broadly expressed adaptors that execute central roles in complex formation during fundamental cellular processes including differentiation, proliferation, and migration (Brábek et al., 2005; Klemke et al., 1998; Park and Curran, 2014). Each family member possesses a single SH2 domain that binds phosphorylated YXXP (pYXXP) motifs and two SH3 domains, although the C-terminal CRK-SH3 can be deleted through alternative splicing. The N-terminal CRK/CRKL–SH3 domain is responsible for most intermolecular interactions and binds PXXPXK sequences (Wu et al., 1995).
Although these adaptors facilitate complex assembly required for many well-studied metazoan signaling mechanisms, CRK family members are hypothesized to serve additional undiscovered roles. The CRK/CRKL–SH2 domain binds with high specificity to pYXXP, generated by the activity of kinases including Src family kinases (SFKs), Abl/Arg, focal adhesion kinase (FAK), tyrosine-protein kinase (SYK), platelet-derived growth factor receptor (PDGFR) and epidermal growth factor receptor (EGFR) (Supplementary Fig. S1, Supplementary Table S1). SFKs/Abl are responsible for phosphorylation events critical to CRK/CRKL binding in several important systems, including Reelin signaling (Ballif et al., 2004), and focal adhesion dynamics (Chodniewicz and Klemke, 2004; Park and Curran, 2008). Previously, we conducted a proteomics screen that aimed to identify SFK substrates whose pYXXP motifs would bind CRKL–SH2 in HEK293 cells (Aten et al., 2013). We identified the novel CRKL–SH2 interactor discoidin, CUB, and LCCL domain-containing protein 2 (DCBLD2), also endothelial and smooth muscle cell-derived neuropilin-like (ESDN) and CUB, LCCL-homology, and coagulation factor V/VIII-homology domains protein 1 (CLCP1), a scaffolding receptor with seven intracellular YXXP motifs. We recently characterized DCBLD2 alongside its family member DCBLD1, and found them to be SFK-/Abl-mediated pYXXP-dependent CRKL–SH2 interactors (Schmoker et al., 2017). Although proteomic methods that facilitate characterization of domain/motif interactions have accelerated explorations into mechanistic signaling, proteomics-based binding screens are accompanied by limitations. The organism/tissue/cell type employed may not express all potential interactors. Further, PTM-dependent screens may fail to identify important interactions if certain modifying enzymes (kinases/ligases/transferases) are not sufficiently activated, particularly if only a subset of cells within a tissue have the relevant signaling pathway engaged. 
Therefore, we sought a complementary approach to identify interactors of modular domains that might circumvent these limitations. Here we describe an in silico screen that uses signature characteristics of known domain/motif interactions and empirical data from mass spectrometric screens to predict PTM-dependent motif-specific interactions. By approaching this question from a bioinformatics perspective, the query is not limited to the proteome of a particular cell type, but is expanded to encompass that of an entire organism. As an initial example, we explored the CRK/CRKL–SH2-pYXXP interaction. Using well-curated databases and predictive tools, we compiled lists of proteins possessing defined characteristics of CRK/CRKL–SH2 domain interactors and then weighted and prioritized candidates by list membership. Our application of this bioinformatics pipeline to the CRK/CRKL–SH2-pYXXP interaction was successful in identifying both known and novel CRK/CRKL–SH2 candidate interactors, and several novel candidates were validated biochemically. We then tested our generalized workflow on the prediction of protein–protein interactions (PPIs) requiring motif-specific arginine methylation. Together our data show the applicability of this approach to, conceivably, any modular domain that recognizes a specific modified motif.

2 Materials and methods

See Supplementary Material for a full description of experimental procedures.

3 Results

3.1 In silico motif-based proteomics screen

To formulate a generalizable workflow for prioritizing PTM-dependent domain/motif interactions, we considered important characteristics of known domain interactors. We extracted motif-containing proteins from the proteome-of-interest and then further focused the screen into a central bullseye defined by primary characteristics (Fig. 1).

Fig. 1. A tripartite bullseye defines high priority targets.
From the entire proteome of the organism of interest, all motif-containing proteins are extracted. Candidate interactors are focused by the following characteristics: (A) proteins enriched in the motif-of-interest (Scansite), (B) proteins with confirmed experimental identifications of the PTM-motif (PhosphoSitePlus) and (C) motif-containing proteins that participate in enriched pathways of (A) and (B) (Reactome).

First, we hypothesized that a sequence enriched in a particular domain-docking motif would, if properly modified, have a higher probability of interacting with that domain. Such enrichment could facilitate multiple simultaneous interactions, allowing rapid signal propagation and increasing overall avidity. We conducted a motif enrichment analysis of the human proteome for the CRK/CRKL–SH2 binding motif. A Scansite query against the SwissProt database for all human proteins containing at least one YXXP yielded 9297 sequences (9153 unique proteins) (Supplementary Material S1). Figure 2A shows the binary logarithmic (log2) distribution of YXXP count per AA number for all human sequences (Supplementary Material S1). A total of 225 unique proteins fell more than two standard deviations above the mean and were extracted as the ‘Enriched’ list.

Fig. 2. Formation of the tripartite bullseye of YXXP-containing proteins. (A) Binary logarithmic distribution of the number of YXXP sites per AA.
Proteins that fell two standard deviations above the mean were taken as the ‘Enriched’ group. (B) Number of pYXXP experimental confirmations (sum low- and high-throughput) per protein (PhosphoSitePlus). All proteins above the mean were taken as the ‘pYXXP’ group. (C) ‘Enriched’ and ‘pYXXP’ proteins (493 Uniprot Accessions mapped to 475 NCBI Gene IDs) were subjected to a Reactome pathway analysis to extract enriched pathways. All YXXP-containing proteins in significantly enriched pathways (FDR < 0.05) were extracted as the ‘Enriched Pathways’ list. Venn diagrams show overlap of proteins in the Reactome input and enriched parent pathways with all human YXXP-containing proteins. Populations of overlapping sections are given in the subplots.
Populations of the tripartite bullseye overlap (center) reflect identifiers post-Metascape conversion.

We next reasoned that proteins empirically shown to be highly phosphorylated in pYXXP motifs would be strong candidate CRK/CRKL–SH2 interactors, and that such proteins would not necessarily be YXXP-enriched and, therefore, would only partially overlap with our ‘Enriched’ list. We extracted all proteins with experimentally identified pYXXP using PhosphoSitePlus, and constructed a distribution of the total number of pYXXP identifications per protein (Fig. 2B). The top ranking 700 of the 2086 total pYXXP proteins are shown. Of these, 289 proteins fell above the mean and were extracted as the ‘pYXXP’ list.

The most evident requirement for CRK/CRKL–SH2 interactors is the possession of pYXXP motifs; indeed, known interactors often harbor multiple phosphorylated motif occurrences. CAS1 (also BCAR1; Fig. 2A and B), a prominent scaffolding protein in focal adhesions, harbors 16 YXXP motifs within 870 AAs. PhosphoSitePlus has curated phospho-identifications of 15 CAS1 YXXP sites through high-throughput methods, as well as CAS1 site-specific evaluations of 13 of these. Signaling mechanisms, including adhesion-regulated SFK-induced YXXP phosphorylation of CAS1 (Hamasaki et al., 1996), induce binding of the CRK/CRKL–SH2 domain, bringing CRK/CRKL–SH3-bound cargo (e.g. C3G or DOCK180) to the CAS1-associated complex to alter cell adhesion/migration. However, only one pYXXP motif is required for a CRK/CRKL–SH2 interaction, and an enrichment assumption would fail to identify proteins such as DAB1 (two YXXP per 588 AAs). Expressed primarily in the embryonic brain, DAB1 is an SFK-mediated CRK/CRKL–SH2 interactor downstream of Reelin (Arnaud et al., 2003; Ballif et al., 2004).
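The contrast between CAS1 and DAB1 can be made concrete with a small sketch of the motif-density calculation. This is an illustration only, not the Scansite query used in the study: the function names (`count_yxxp`, `log2_density`, `enriched_proteins`) are hypothetical, and counting overlapping matches with a regular-expression lookahead is an assumption about how motifs might be tallied.

```python
import math
import re
import statistics

# Lookahead so that overlapping motifs (e.g. "YYAPP") are all counted.
YXXP = re.compile(r"(?=Y..P)")

def count_yxxp(sequence):
    """Number of YXXP motifs (X = any residue) in an amino-acid sequence."""
    return len(YXXP.findall(sequence.upper()))

def log2_density(n_motifs, length):
    """log2 of motifs per amino acid; the caller must ensure n_motifs > 0."""
    return math.log2(n_motifs / length)

def enriched_proteins(densities, n_sd=2.0):
    """Return IDs whose log2 density exceeds mean + n_sd * SD of the input."""
    values = list(densities.values())
    cutoff = statistics.mean(values) + n_sd * statistics.stdev(values)
    return {pid for pid, d in densities.items() if d > cutoff}

# Motif counts and lengths quoted in the text.
cas1 = log2_density(16, 870)   # 16 YXXP within 870 AAs
dab1 = log2_density(2, 588)    # 2 YXXP within 588 AAs
print(f"CAS1 log2 density: {cas1:.2f}, DAB1 log2 density: {dab1:.2f}")
```

On this scale CAS1 sits well above DAB1, which is why a density cutoff alone would retain the former and discard the latter. The cutoff in `enriched_proteins` is computed from whatever set of proteins is passed in, so it is only meaningful when applied to a proteome-wide distribution.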
Although DAB1 tyrosine phosphorylation is essential for proper neuronal positioning (Howell et al., 2000), DAB1 is poorly represented in high-throughput proteomic studies, as the majority of studies are not conducted with embryonic brain tissue. PhosphoSitePlus cites 3929 total pYXXP identifications of the ubiquitously expressed BCAR1, while DAB1 has only 18 identifications. Thus, poor discovery-based identifications of YXXP-containing DAB1-like molecules might lead us to disregard them as high priority potential CRK/CRKL–SH2 interactors. In consideration of ways that DAB1-like proteins might emerge as high priority candidates, we hypothesized that proteins in PPI networks with known pYXXP substrates might have a higher probability of getting phosphorylated by an active YXXP-directed kinase. This would therefore increase their likelihood of becoming CRK/CRKL–SH2 binding partners. Corwin et al. (2017) demonstrated clustering of tyrosine kinase substrates within PPI networks when expressing human non-receptor tyrosine kinases in yeast, and similar results have been demonstrated in human PPI networks (Beltrao et al., 2012; Duan and Walther, 2015; Li et al., 2017; Woodsmith et al., 2013). These studies suggest that members of protein complexes within close proximity to a kinase have a high probability of getting phosphorylated simultaneously, and that proteins generally co-participating in signaling networks are more likely to be substrates of the same kinase. Therefore, we also considered proteins within PPI networks of the enriched YXXP-containing protein list as well as the PPI networks of the list of proteins with greater than average pYXXP identifications as a primary feature. To incorporate the PPI network aspect of the tripartite bullseye, ‘Enriched’ and ‘pYXXP’ proteins were combined to a single list that was input to Reactome.org. Of these 475 proteins, 288 mapped to the Reactome database and were used for a pathway enrichment analysis. 
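As a rough illustration of what a pathway over-representation analysis with an FDR cutoff involves, the sketch below pairs a generic hypergeometric test with the Benjamini-Hochberg correction. It is not Reactome's implementation, and the function names and all counts in the toy example are hypothetical.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n proteins are drawn from N, of which K are in the pathway."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Toy example (all counts hypothetical): background of 10000 proteins,
# an input list of 288, and three pathways given as (pathway size, hits).
pathways = {"pathway_a": (500, 40), "pathway_b": (300, 10), "pathway_c": (50, 8)}
pvals = [hypergeom_sf(hits, 10000, size, 288) for size, hits in pathways.values()]
fdr = benjamini_hochberg(pvals)
significant = [name for name, q in zip(pathways, fdr) if q < 0.05]
```

Exact integer binomial coefficients keep the tail probability free of overflow for lists of this size, at the cost of speed; a statistics library would be the usual choice for proteome-scale work.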
Pathway enrichment results recovered 25 significantly-overrepresented pathways (false discovery rate (FDR) < 0.05) (Supplementary Material S2). Eleven over-represented pathways were parent or higher-level pathways. Parental pathways were assessed for overlap with the list of all YXXP-containing proteins in the human proteome (Fig. 2C). The higher-level ‘Developmental Biology’ pathway (1080 proteins) was deemed too broad for practical analysis. Extracted proteins from the remaining 10 pathways were combined to form the ‘Enriched Pathways’ list (492 proteins). Notably, the ‘Axon Guidance’ pathway extracted DAB1 as a member of this group, as well as 292 other YXXP-containing proteins that were neither found to be enriched in YXXP motifs nor identified as highly phosphorylated (Fig. 2C). The intersection of the resulting tripartite bullseye included 4 central proteins (CRKL, DOK1, DOK2 and SHB) and 97 possessing at least 2 primary characteristics. To further prioritize, we considered a series of secondary characteristics. Secondary characteristics will be based strongly on the unique goals of investigative teams; however, our approach exemplifies the process. Given our priorities in developmental cell motility-related signaling and the importance of SFKs/Abl in these processes, all known pYXXP substrates of Abl/Src/Fyn were obtained from PhosphoSitePlus using the ‘Substrates of:’ search tool (Supplementary Methods). Scansite-predicted pYXXP substrates of Src/Abl were extracted and narrowed to top-scoring proteins, and known CRK/CRKL interactors were extracted from the IntAct database (Supplementary Methods). The total number of proteins in each group was calculated for the 8887 YXXP-containing proteins previously defined in the Metascape-annotated matrix. Non-unique identifiers were removed and proteins were scored by the sum of weights (Table 1) applied via each primary or secondary feature (Supplementary Material S3). Table 1. 
Scoring system for primary and secondary characteristics

Scoring System | Webtool / Database employed | Stringency | Weight (pt)
----------------------------------------------------------------------
Primary characteristics
Enriched in YXXP | Scansite sequence pattern / SwissProt | 2 SD above mean, #YXXP/AA | 3
pYXXP | PhosphoSitePlus modified sequence search | above mean | 3
Enriched Pathway Clusters of ‘Enriched’ and ‘Phospho-YXXP’ | Reactome pathway analysis | FDR < 0.05 | 3
Secondary characteristics
Known CRK/CRKL interactor | IntAct interactors | N/A | 1
Known substrate of Src, Fyn or Abl | PhosphoSitePlus ‘substrates of:’ search | pYXXP substrates only | 1*
Predicted substrate of Src or Abl | Scansite Abl or Src kinase motif / SwissProt | 2 SD below mean score | 0.5*

Note: Webtools and databases used to annotate YXXP-containing proteins for each characteristic are listed, along with stringencies and assigned weights. All primary characteristics were weighted equally. Secondary characteristics were given 1-pt if experimentally determined, while predictive features were awarded 0.5-pt. The star (*) indicates features for which weights were potentially awarded multiple times for a given protein, as these were summed for each kinase separately. Weights were summed across all primary/secondary characteristics for a given protein to obtain its Priority Score (Fig. 3C).

To determine the relative predictive power of primary and secondary characteristics, we examined the distribution of priority scores of the 25 known CRK/CRKL–SH2 interacting proteins under various combinations of these predictive features (Fig. 3A, Supplementary Material S3); the positive controls clustered effectively in high-scoring regions. In addition, we assessed the enrichment of these known interactors within each category of interest in comparison to the total number of proteins in a given category (Fig. 3B). We then compared these enrichment indexes to those achieved in high-scoring regions when considering the prioritization scheme (‘All, score ≥6/8’ in Fig. 3B). Proteins scoring ≥8 were primarily positive controls (∼80% of the total) and, therefore, novel interactors within that scoring region would be highest-priority candidates for biochemical validation. However, we expanded our region of interest to encompass scores ≥6, capturing the top ∼1% of all YXXP-containing proteins (Fig. 3C), so as to include less-studied proteins. Notably, both scoring regions enriched positive controls more effectively than each characteristic alone and than primary characteristics in combination (Fig. 3B). Surprisingly, consideration of known Src/Fyn/Abl substrates alone enriched positive controls more effectively than any other characteristic alone. However, we maintained this feature as a secondary characteristic, as it could present considerable bias in favor of identifying well-characterized proteins if weighted as a primary feature.

Fig. 3. Validation of primary and secondary characteristics and prioritization of potential CRK/CRKL–SH2 interactors.
(A) Chosen primary (‘Enriched YXXP’, ‘Phospho-YXXP’, ‘Enriched Pathways’) and secondary (‘CRK/CRKL interactors’, ‘Predicted Kinase Substrates’, ‘Known Kinase Substrates’) characteristics were compared for their ability to prioritize known CRK/CRKL–SH2 interactors using the scoring system summarized in Table 1. The percentages of positive controls scoring ≥6 and ≥8 are shown to the right of each histogram. Four known interactors (INPP5D [SHIP-1], DCBLD1, FLT4 [VEGFR-3] and ZAP70) remained with priority scores below 6. (B) Each primary and secondary characteristic is considered for its ability to enrich for CRK/CRKL–SH2 interactors. The ‘Enrichment Index’, defined as the percentage of positive controls relative to all YXXP-containing proteins, is plotted for each characteristic separately, as well as for combined primary characteristics. These enrichment indexes are compared with scoring regions of interest (in bold) when all characteristics are considered as outlined in Table 1. (C) All YXXP-containing proteins were scored by weighted primary and secondary features, as described in the Supplementary Methods. The y-axis for priority scores 3.5–11.5 is magnified in the inset. Proteins validated biochemically (Fig. 4) are indicated by gene symbol and are either black (induced to bind) or red (no interaction). (A and C) Percentages of positive controls scoring ≥6 and ≥8 are compared with all YXXP-containing proteins, highlighting the concentration of known CRK/CRKL–SH2 interactors in high-scoring bins (binwidth = 0.5). (D) All proteins that emerged with a priority score >6.5 are tabulated. Known CRK/CRKL–SH2 interactors are shown in bold.

The distribution of priority scores for all YXXP-containing proteins is shown in Figure 3C. Proteins chosen for biochemical validation in a pulldown assay are indicated in either black (confirmed CRKL–SH2 interactors) or red (no interaction).
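The weighting scheme of Table 1 can be sketched as a simple sum over list memberships. This is a minimal illustration under stated assumptions, not the authors' scoring code: the `Annotation` class, its field names and the kinase labels are hypothetical, and the example annotation does not describe any real protein.

```python
from dataclasses import dataclass, field

# Weights from Table 1 (points).
PRIMARY_WEIGHT = 3.0
KNOWN_INTERACTOR_WEIGHT = 1.0
KNOWN_SUBSTRATE_WEIGHT = 1.0       # awarded once per kinase (Src, Fyn, Abl)
PREDICTED_SUBSTRATE_WEIGHT = 0.5   # awarded once per kinase (Src, Abl)

@dataclass
class Annotation:
    enriched: bool = False             # member of the 'Enriched' list
    pyxxp: bool = False                # member of the 'pYXXP' list
    enriched_pathway: bool = False     # member of the 'Enriched Pathways' list
    known_interactor: bool = False     # known CRK/CRKL interactor
    known_substrate_of: set = field(default_factory=set)
    predicted_substrate_of: set = field(default_factory=set)

def priority_score(a):
    """Sum the Table 1 weights for every characteristic the protein carries."""
    n_primary = sum([a.enriched, a.pyxxp, a.enriched_pathway])
    score = PRIMARY_WEIGHT * n_primary
    score += KNOWN_INTERACTOR_WEIGHT * a.known_interactor
    score += KNOWN_SUBSTRATE_WEIGHT * len(a.known_substrate_of & {"SRC", "FYN", "ABL"})
    score += PREDICTED_SUBSTRATE_WEIGHT * len(a.predicted_substrate_of & {"SRC", "ABL"})
    return score

# Hypothetical candidate: two primary characteristics, known substrate of
# two kinases and predicted substrate of one.
candidate = Annotation(pyxxp=True, enriched_pathway=True,
                       known_substrate_of={"SRC", "ABL"},
                       predicted_substrate_of={"ABL"})
print(priority_score(candidate))  # 8.5
```

A protein carrying none of the characteristics scores zero, and the starred weights accumulate per kinase, matching the note to Table 1.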
Top-scoring proteins (>6.5) are displayed in Figure 3D, with known CRK/CRKL–SH2 interactors highlighted in bold font.

3.2 Biochemical validation: identification of novel CRKL–SH2 interactors

For biochemical validation, we chose 10 proteins (Fig. 3C) to test in a CRKL–SH2 pulldown assay following co-expression with c-Abl. In addition to choosing several high-scoring proteins, we chose one protein with a low priority score, friend leukemia integration 1 transcription factor (FLI1), and one highly enriched in YXXP sites but with few pYXXP identifications, deleted in azoospermia protein 2 (DAZ2). In candidate selection, we also considered proteins with strong YXXP conservation across vertebrates and those expressed in the developing nervous system according to Mouse Genome Informatics (MGI) (Blake et al., 2017). cDNA constructs of selected candidates were expressed in HEK293 cells with or without c-Abl. Although not a suitable kinase for all YXXP tyrosines, c-Abl was selected for its robustness and high selectivity for YXXP substrates (Ballif et al., 2008; Colicelli, 2010). Cell lysates were incubated with GST-CRKL–SH2 resin and, following washing, bound proteins were subjected to SDS-PAGE and immunoblotting. Figure 4 shows immunoblots of pulldown assays alongside schematics of each protein tested, which display domain structure, YXXP location, experimental pYXXP identifications and percent motif conservation. For reference, similar schematics for select positive controls are summarized in Supplementary Figure S2.

Fig. 4. Domain structure, YXXP conservation and GST-CRKL–SH2 pulldowns of tested candidates. Mammalian expression vectors were introduced into HEK293 cells with/without c-Abl-Flag. GST-CRKL–SH2 pulldown assays were performed on cell extracts. Ponceau staining indicates relative GST-CRKL–SH2 levels. Immunoblotting was conducted with the indicated antibodies. Candidate domain structure and YXXP sites are shown with their corresponding blots.
Percentages denote conservation of YXXP across five representative vertebrates, with two exceptions. DAZ2 is specific to higher primates; therefore, conservation could not be assessed across multiple vertebrate taxa. EMD was not found in Gallus gallus; therefore, percentage values reflect conservation across only four vertebrates. The number of experimental pYXXP identifications (if > 5) is given at the indicated tyrosine residue (PhosphoSitePlus).

To demonstrate the potential limitations of solely considering YXXP-enriched proteins, we compared the CRKL–SH2-binding ability of DAZ2 (15 YXXP, 558 AAs), a spermatogenesis-related protein (Reijo et al., 1996), to that of FLI1 (3 YXXP, 452 AAs), which is implicated in cell growth and malignancy (Brown et al., 2000; Truong and Ben-David, 2000). Although highly YXXP-enriched, DAZ2 was not an obvious CRK/CRKL–SH2 interactor for several reasons. The Y-chromosomal DAZ genes (DAZ1–4), common only to higher primates, possess 1–15 repeats of a 24-AA polymorphic sequence.
Conservation of these repeats implies important functionality; however, only six possess a YXXP motif (Fu et al., 2015), suggesting that this motif is not essential to the conserved repeat function. Furthermore, DAZ2 had no other primary or secondary features to suggest it might interact with the CRK/CRKL–SH2 domain. Our biochemical analysis found that c-Abl co-expression did not induce DAZ2 to bind the CRKL–SH2 domain (Fig. 4A). Although it is formally possible that another tyrosine kinase with a YXXP target preference (Supplementary Fig. S1, Supplementary Table S1) can phosphorylate DAZ2, it is important to note that a protein with 15 target motifs cannot be induced spuriously to bind the CRKL–SH2 domain simply by co-expressing a kinase with a strong YXXP-target preference.

Although FLI1 is not YXXP-enriched, it does possess one YXXP site with 26 pTyr identifications. Still, this fell below the average pYXXP number per protein and, with no score from any primary or secondary feature, FLI1 received a priority score of zero (Fig. 3C). Mechanistically, FLI1 has been shown to block erythropoietin-induced differentiation and promote erythroblast proliferation; however, the signaling mechanisms involved remain unknown (Pereira et al., 1999; Tamir et al., 1999). FLI1’s involvement in CRK/CRKL-related processes and expression in the developing nervous system (MGI) led us to investigate its CRKL–SH2-binding ability. Surprisingly, FLI1 was induced to bind the CRKL–SH2 domain when co-expressed with c-Abl (Fig. 4A), suggesting this interaction may be important in FLI1 signaling to modulate its roles in cell proliferation, differentiation or transformation.

We next chose to test two high-scoring proteins that are expressed in the developing nervous system, WW domain-binding protein 2 (WBP2) and CRKL itself. WBP2 was the most YXXP-enriched protein identified, possessing 11 YXXP motifs within 261 AAs, but with motifs showing variable conservation across vertebrates (20–100%).
As a coactivator of the estrogen receptor (ER) (Dhananjayan et al., 2006), WBP2 mediates proliferation/differentiation associated with breast cancer by regulating ER target gene expression (Buffa et al., 2013). WBP2 is phosphorylated on YXXP Tyr192 and Tyr231 by Src and Yes downstream of EGF, and overexpression in mice induces ER-dependent and independent loss of cell adhesion and increased tumor proliferation/invasion (Lim et al., 2011). Figure 4B displays the GST-CRKL–SH2 pulldown, demonstrating Abl-induced binding of WBP2. Currently, we are working to determine signaling pathways in which a WBP2/CRKL interplay makes a contribution. Intriguingly, CRKL is highly enriched with five strongly-conserved YXXP motifs (within 303 AAs). Three sites have been found phosphorylated >100 times, with pTyr207 possessing 1212 identifications (Fig. 4B); however, the functional significance has not been fully analyzed. The analogous CRK Tyr221 has been characterized to be part of a negative-regulatory intramolecular pTyr221/CRK-SH2 interaction, which prevents intermolecular CRK-SH2/pTyr and CRK-SH3/PXXPXK interactions (Chodniewicz and Klemke, 2004). Although a similar mechanism is presumed for CRKL Tyr207, this remains to be demonstrated. Additionally, CRKL pTyr207 or other sites might facilitate intermolecular dimerization, as has been demonstrated for CRK (Feller et al., 1994). To test this, we conducted our pulldown on extracts from cells co-expressing c-Abl and CRKL, and found that Abl induced a CRKL/CRKL–SH2 interaction (Fig. 4B). Additional proteins tested are shown in Figure 4C–E. We observed Abl-induced CRKL–SH2 binding of transmembrane protein 192 (TMEM192), emerin (EMD), PDZ and LIM domain protein 5 (PDLIM5), KIAA1143 and caveolin-1 (CAV1). However, neuronally expressed EPHA2 (MGI), with a high pYXXP count on conserved motifs, did not interact with CRKL–SH2 when co-expressed with c-Abl (Fig. 4D). 
EPHA2 regulates cell migration, adhesion and differentiation (Lin et al., 2010; Miao et al., 2009) and is required for proper lens organization in mice (Cheng et al., 2013). Despite these signature CRKL–SH2 interactor characteristics, EPHA2 was not induced to bind, although it is possible that pTyr588/pTyr594 is not Abl-induced. Together these data validate an Abl-mediated interaction between the CRKL–SH2 domain and several proteins identified from our screen. 3.3 Application to the methylated RG motif: a proof-of-principle Supplementary Figure S3 summarizes the in silico screen in a stepwise workflow that includes a generalized approach alongside examples from our CRKL–SH2 screen. As a proof-of-principle, we applied this screen to a different modification-dependent interaction, namely, those facilitated by arginine methylation in RG motifs (MeRG). RGG/GRG motifs are the preferred targets of many PRMTs, inducing protein–protein/nucleic acid interactions (Blanc and Richard, 2017; Thandapani et al., 2013). To date, the Tudor domain is the only known MeArg-binding domain; however, many PPIs are mediated through MeRG motifs, suggesting that additional MeArg-binding domains remain undiscovered. Figure 5 shows the formation of the tripartite bullseye from all human RG-containing proteins, using the same databases and approach outlined in Supplementary Figure S3. The results had a generally similar profile to our YXXP analysis (Fig. 5, Supplementary Material S4). These primary characteristics successfully focused known Tudor interactors and PRMT substrates into the central bullseye (Supplementary Material S4C). Among these were SMD3, which, along with its homolog SMD1 (‘Enriched’/‘MeRG’ overlap), binds the Tudor domain of survivor motor neuron protein (SMN) (Friesen et al., 2001), and Sam68, a MeRG-dependent SND1-Tudor interactor (Cappellari et al., 2014). 
Another important SMN–Tudor interactor, GAR1 (Whitehead et al., 2002), emerged in the overlap of the enriched RG and enriched pathways groups. This provides a strong foundation for the application of appropriate secondary features, prioritization and biochemical validation.

Fig. 5. Formation of the tripartite bullseye of high-priority RG-containing proteins. (A) Binary logarithmic distribution of the number of RG sites per AA. Proteins two standard deviations above the mean were extracted as the ‘Enriched’ group. (B) Number of experimental MeRG confirmations per protein (PhosphoSitePlus). All proteins above the mean were taken as the ‘Modified’ list and contained 229 unique entries as parsed by Metascape. (C) Metascape pathway enrichment analysis shows the overlap of the enriched pathways within the ‘Enriched’ and ‘Modified’ lists with all RG-containing proteins. Eight enriched parent pathways (FDR < 0.05) of (A) and (B) were obtained via Reactome. All RG-containing proteins in enriched pathways were extracted as the ‘Enriched Pathways’. Populations of overlap are given in the subplots.

4 Discussion and conclusions

Here, we present a generalized in silico proteomics screen that utilizes publicly available databases/tools to predict and prioritize domain-motif interactions. Using the CRK/CRKL–SH2-pYXXP interaction as an example, we successfully identified potential interactors with a priority-scoring system using signature CRKL–SH2 interactor characteristics, employing curated PTM data, PPI networks, and the molecular/cellular roles of CRK/CRKL in the developing nervous system. Alongside other known CRK/CRKL–SH2 interactors, our positive controls emerged with high priority scores (Fig. 3A–C, Supplementary Material S3). In spite of the demonstrated success of our approach, it presents certain limitations. Although we successfully identified the critical CRK/CRKL interactor DAB1 using the pathway enrichment data, and possibly other DAB1-like molecules not enriched in YXXP motifs or with few pYXXP identifications, not all such proteins would be readily detected. Proteins that are not characterized in PPI networks at the gene ontology level could continue to receive low priority scores. However, as new information populates the databases/tools employed here, relevant but obscured proteins will increase their priority scores. If such proteins are identified as direct CRK/CRKL–SH2 interactors in proteomics screens, the in silico screen would be less important for them. However, if proteins emerge as general CRK/CRKL interactors, are placed in CRK/CRKL-related networks, or are identified experimentally with high pYXXP counts in a new tissue-type, then their priority scores will increase in a future repetition of the screen. This could be the fate of proteins such as FLI1, which received no points and bound the CRKL–SH2 domain in Abl-active conditions. These points reflect a bias toward well-characterized proteins and those upregulated in cancer cells, from which the bulk of available proteomics data is derived. 
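The list-membership scoring described above can be sketched in a few lines. This is only an illustration under stated assumptions: the feature names, the weights and the candidate labels below are hypothetical placeholders, not the values used in the screen (those are given in the Supplementary Material). It shows how a protein with no list memberships, as was the case for FLI1, ends up with a score of zero, and how the score would rise once a database update adds a membership.

```python
# Hypothetical feature weights: placeholders for illustration only,
# NOT the weights used in the published screen.
WEIGHTS = {
    "enriched": 1.0,            # primary: motif-enriched sequence
    "modified": 1.0,            # primary: above-average PTM identifications
    "enriched_pathways": 1.0,   # primary: member of an enriched pathway
    "secondary_example": 0.5,   # stand-in for any secondary characteristic
}


def priority_score(features, weights=WEIGHTS):
    """Sum the weights of every list a candidate belongs to."""
    return sum(weights[f] for f in features)


def rank(candidates):
    """Candidate names sorted by descending score, ties broken by name."""
    return sorted(candidates, key=lambda n: (-priority_score(candidates[n]), n))


# Hypothetical candidates and their list memberships.
candidates = {
    "CAND_A": {"enriched", "modified", "enriched_pathways"},
    "CAND_B": {"modified", "secondary_example"},
    "CAND_C": set(),  # no memberships, so the score is zero
}

print(rank(candidates))

# A later database release places CAND_C in an enriched pathway.
candidates["CAND_C"].add("enriched_pathways")
print(priority_score(candidates["CAND_C"]))
```

Because the score is a plain sum over memberships, repeating the screen against updated databases only requires regenerating the membership sets.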
Nevertheless, we argue that our in silico approach provides a strong, user-friendly companion to directed proteomics screens. Although we chose to include all YXXP-containing proteins within enriched Reactome pathways as a primary characteristic, researchers could further narrow their screen by choosing enriched pathways that are of relevance to their area of study. Such an approach would allow focus on a biological pathway of interest, rather than all enriched pathways. Motif conservation and tissue-specific expression were taken into account post-scoring; however, these could be included as weighted secondary characteristics in a future iteration. We attempted to integrate AL2CO conservation analysis (Pei and Grishin, 2001), but found it was not easily amenable to bulk analysis of motif conservation across our chosen taxa. Motif count per protein would be an acceptable proxy for evolutionary conservation, and we performed this comparison for YXXP motifs within mouse and zebrafish using a Scansite query (Supplementary Material S1). Although certain challenges were encountered in mapping non-human protein identifiers to human (Supplementary Material S1), these will be less of an issue as proteomes of other organisms become more comprehensive. Binding motif surface accessibility is another attractive secondary feature to consider, as it would theoretically weight likely PPI surfaces. We sought to include this as a weighted characteristic; however, upon evaluating currently available surface accessibility prediction software for its ability to predict exposure of well-characterized phosphorylated YXXP motifs within known CRKL–SH2 interactors, we found that many of these tyrosine residues were predicted to be buried. Further, we did not find programs that would accommodate batch searches of ∼9000 sequences. For these reasons, we concluded that our phospho-YXXP dataset was sufficient to represent this aspect in our screen. 
However, this has limitations including when motifs remain hidden due to partial protein coverage when using bottom-up proteomics with common proteolytic enzymes such as trypsin. Additionally, protein expression/motif modification in specific tissues/environments that are under-sampled will leave some modifications hidden. We employed expression databases to determine whether high-scoring proteins were expressed in our tissues of interest. We used the MGI database to query expression levels in mice, which is easily focused to a tissue/developmental stage of choice. Attempting to include tissue-of-interest expression as a secondary feature, we mapped mRNA expression in the embryonic nervous system to our protein list using the MGI mouse-to-human identifier conversion tool. However, we encountered issues in the conversion of unconserved genes, such as dazl, which mapped to DAZ2 in our dataset. DAZ2, specific to higher primates, is only expressed in the male germline; therefore, we restricted prioritization based on expression data to a case-by-case search applied to high-scoring candidates from the initial screen. The databases used here are generally applicable to domain/motif interactions with post-translationally modified proteins, including PTMs beyond phosphorylation. With our approach, and potentially the inclusion of additional generalized and specific PTM databases/predictive tools for primary/secondary characteristics (Chen et al., 2017), we anticipate that non-specialists can easily employ this strategy, using in silico proteomics to unveil the identities of novel proteins relevant to specific biological mechanisms of signal transduction. We anticipate this screen will be an important, rapid first step to assist investigators in identifying top candidates warranting biochemical and genetic examination in the signaling systems they are studying. Funding This work was supported by U.S. 
National Science Foundation IOS awards [1021795 and 1656510]; the Vermont Genetics Network through U.S. National Institutes of Health award [8P20GM103449] from the INBRE program of the NIGMS; and U.S. National Institutes of Health award [5P20RR016435] from the COBRE program of the NIGMS.

Conflict of Interest: none declared.

References

Arnaud L. et al. (2003) Fyn tyrosine kinase is a critical regulator of disabled-1 during brain development. Curr. Biol., 13, 9–17.
Aten T.M. et al. (2013) Tyrosine phosphorylation of the orphan receptor ESDN/DCBLD2 serves as a scaffold for the signaling adaptor CrkL. FEBS Lett., 587, 2313–2318.
Ballif B.A. et al. (2004) Activation of a Dab1/CrkL/C3G/Rap1 pathway in Reelin-stimulated neurons. Curr. Biol., 14, 606–610.
Ballif B.A. et al. (2008) Large-scale identification and evolution indexing of tyrosine phosphorylation sites from murine brain. J. Proteome Res., 7, 311–318.
Beltrao P. et al. (2012) Systematic functional prioritization of protein posttranslational modifications. Cell, 150, 413–425.
Blake J.A. et al. (2017) Mouse Genome Database (MGD)-2017: community knowledge resource for the laboratory mouse. Nucleic Acids Res., 45, D723–D729.
Blanc R.S., Richard S. (2017) Arginine methylation: the coming of age. Mol. Cell, 65, 8–24.
Brábek J. et al. (2005) Crk-associated substrate tyrosine phosphorylation sites are critical for invasion and metastasis of SRC-transformed cells. Mol. Cancer Res., 3, 307–315.
Brown L.A. et al. (2000) Insights into early vasculogenesis revealed by expression of the ETS-domain transcription factor Fli-1 in wild-type and mutant zebrafish embryos. Mech. Develop., 90, 237–252.
Buffa L. et al. (2013) Molecular mechanism of WW-domain binding protein-2 coactivation function in estrogen receptor signaling. IUBMB Life, 65, 76–84.
Cappellari M. et al. (2014) The transcriptional co-activator SND1 is a novel regulator of alternative splicing in prostate cancer cells. Oncogene, 33, 3794.
Chen C. et al. (2017) Protein bioinformatics databases and resources. In: Wu C., Arighi C., Ross K. (eds) Protein Bioinformatics. Methods in Molecular Biology, pp. 3–39, Humana Press, New York, NY.
Cheng C. et al. (2013) EphA2 and Src regulate equatorial cell morphogenesis during lens development. Development, 140, 4237–4245.
Chodniewicz D., Klemke R.L. (2004) Regulation of integrin-mediated cellular responses through assembly of a CAS/Crk scaffold. Biochim. Biophys. Acta Mol. Cell Res., 1692, 63–76.
Colicelli J. (2010) ABL tyrosine kinases: evolution of function, regulation, and specificity. Sci. Signal., 3, re6.
Corwin T. et al. (2017) Defining human tyrosine kinase phosphorylation networks using yeast as an in vivo model substrate. Cell Syst., 5, 128–139.e124.
Dhananjayan S.C. et al. (2006) WW domain binding protein-2, an E6-associated protein interacting protein, acts as a coactivator of estrogen and progesterone receptors. Mol. Endocrinol., 20, 2343–2354.
Duan G., Walther D. (2015) The roles of post-translational modifications in the context of protein interaction networks. PLoS Comput. Biol., 11, e1004049.
Feller S.M. et al. (1994) c-Abl kinase regulates the protein binding activity of c-Crk. EMBO J., 13, 2341–2351.
Friesen W.J. et al. (2001) SMN, the product of the spinal muscular atrophy gene, binds preferentially to dimethylarginine-containing protein targets. Mol. Cell, 7, 1111–1117.
Fu X.-F. et al. (2015) DAZ family proteins, key players for germ cell development. Int. J. Biol. Sci., 11, 1226.
Hamasaki K. et al. (1996) Src kinase plays an essential role in integrin-mediated tyrosine phosphorylation of Crk-associated substrate p130Cas. Biochem. Biophys. Res. Commun., 222, 338–343.
Howell B.W. et al. (2000) Dab1 tyrosine phosphorylation sites relay positional signals during mouse brain development. Curr. Biol., 10, 877–885.
Klemke R.L. et al. (1998) CAS/Crk coupling serves as a “molecular switch” for induction of cell migration. J. Cell Biol., 140, 961–972.
Li Y. et al. (2017) Co-occurring protein phosphorylation are functionally associated. PLoS Comput. Biol., 13, e1005502.
Lim S.K. et al. (2011) Tyrosine phosphorylation of transcriptional coactivator WW-domain binding protein 2 regulates estrogen receptor α function in breast cancer via the Wnt pathway. FASEB J., 25, 3004–3018.
Lin S. et al. (2010) Ligand targeting of EphA2 enhances keratinocyte adhesion and differentiation via desmoglein 1. Mol. Biol. Cell, 21, 3902–3914.
Matsuda M. et al. (1991) Identification of domains of the v-crk oncogene product sufficient for association with phosphotyrosine-containing proteins. Mol. Cell. Biol., 11, 1607–1613.
Mayer B.J. (2015) The discovery of modular binding domains: building blocks of cell signalling. Nat. Rev. Mol. Cell Biol., 16, 691–698.
Mayer B.J. et al. (1991) The noncatalytic src homology region 2 segment of abl tyrosine kinase binds to tyrosine-phosphorylated cellular proteins with high affinity. Proc. Natl. Acad. Sci. USA, 88, 627–631.
Miao H. et al. (2009) EphA2 mediates ligand-dependent inhibition and ligand-independent promotion of cell migration and invasion via a reciprocal regulatory loop with Akt. Cancer Cell, 16, 9–20.
Moran M.F. et al. (1990) Src homology region 2 domains direct protein-protein interactions in signal transduction. Proc. Natl. Acad. Sci. USA, 87, 8622–8626.
Park T., Curran T. (2014) Essential roles of Crk and CrkL in fibroblast structure and motility. Oncogene, 33, 5121–5132.
Park T.-J., Curran T. (2008) Crk and Crk-like play essential overlapping roles downstream of disabled-1 in the Reelin pathway. J. Neurosci., 28, 13551–13562.
Pei J., Grishin N.V. (2001) AL2CO: calculation of positional conservation in a protein sequence alignment. Bioinformatics, 17, 700–712.
Pereira R. et al. (1999) FLI-1 inhibits differentiation and induces proliferation of primary erythroblasts. Oncogene, 18, 1597.
Reijo R. et al. (1996) Diverse spermatogenic defects in humans caused by Y chromosome deletions encompassing a novel RNA-binding protein gene. Hum. Reprod., 11, 27–54.
Sadowski I. et al. (1986) A noncatalytic domain conserved among cytoplasmic protein-tyrosine kinases modifies the kinase function and transforming activity of Fujinami sarcoma virus P130gag-fps. Mol. Cell. Biol., 6, 4396–4408.
Schmoker A.M. et al. (2017) Dynamic multi-site phosphorylation by Fyn and Abl drives the interaction between CRKL and the novel scaffolding receptors DCBLD1 and DCBLD2. Biochem. J., 474, 3963–3984.
Seet B.T. et al. (2006) Reading protein modifications with interaction domains. Nat. Rev. Mol. Cell Biol., 7, 473–483.
Tamir A. et al. (1999) Fli-1, an Ets-related transcription factor, regulates erythropoietin-induced erythroid proliferation and differentiation: evidence for direct transcriptional repression of the Rb gene during differentiation. Mol. Cell. Biol., 19, 4452–4464.
Thandapani P. et al. (2013) Defining the RGG/RG motif. Mol. Cell, 50, 613–623.
Truong A.H., Ben-David Y. (2000) The role of Fli-1 in normal cell function and malignant transformation. Oncogene, 19, 6482.
Whitehead S.E. et al. (2002) Determinants of the interaction of the spinal muscular atrophy disease protein SMN with the dimethylarginine-modified box H/ACA small nucleolar ribonucleoprotein GAR1. J. Biol. Chem., 277, 48087–48093.
Woodsmith J. et al. (2013) Dual coordination of post translational modifications in human protein networks. PLoS Comput. Biol., 9, e1002933.
Wu X. et al. (1995) Structural basis for the specific interaction of lysine-containing proline-rich peptides with the N-terminal SH3 domain of c-Crk. Structure, 3, 215–226.

© The Author(s) 2018. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)

Bioinformatics Oxford University Press

An in silico proteomics screen to predict and prioritize protein–protein interactions dependent on post-translationally modified motifs

Publisher
Oxford University Press
ISSN
1367-4803
eISSN
1460-2059
D.O.I.
10.1093/bioinformatics/bty434

Abstract

Abstract Motivation The development of proteomic methods for the characterization of domain/motif interactions has greatly expanded our understanding of signal transduction. However, proteomics-based binding screens have limitations including that the queried tissue or cell type may not harbor all potential interacting partners or post-translational modifications (PTMs) required for the interaction. Therefore, we sought a generalizable, complementary in silico approach to identify potentially novel motif and PTM-dependent binding partners of high priority. Results We used as an initial example the interaction between the Src homology 2 (SH2) domains of the adaptor proteins CT10 regulator of kinase (CRK) and CRK-like (CRKL) and phosphorylated-YXXP motifs. Employing well-curated, publicly-available resources, we scored and prioritized potential CRK/CRKL–SH2 interactors possessing signature characteristics of known interacting partners. Our approach gave high priority scores to 102 of the >9000 YXXP motif-containing proteins. Within this 102 were 21 of the 25 curated CRK/CRKL–SH2-binding partners showing a more than 80-fold enrichment. Several predicted interactors were validated biochemically. To demonstrate generalized applicability, we used our workflow to predict protein–protein interactions dependent upon motif-specific arginine methylation. Our data demonstrate the applicability of our approach to, conceivably, any modular binding domain that recognizes a specific post-translationally modified motif. Supplementary information Supplementary data are available at Bioinformatics online. 
1 Introduction The first description of the Src homology 2 (SH2) domain (Sadowski et al., 1986) and the subsequent discovery that SH2 domains bind phosphorylated tyrosine (pTyr) residues (Matsuda et al., 1991; Mayer et al., 1991; Moran et al., 1990) enlightened our previous understanding of signal transduction, taking pTyr past simple allosteric regulation to a world of multiprotein complexes. Our emerging understanding involved enzyme-induced protein/lipid modifications that could modulate the formation/dissociation of signaling hubs, affecting affinities of molecular interactions and subcellular localization (Mayer, 2015). The characterization of the SH2 domain spurred the identification of additional post-translational modification (PTM) recognition domains (Seet et al., 2006), most of which possess a unique affinity for amino acids (AAs) surrounding the modified site. The SH2 domain is comprised of a highly conserved sequence of ∼100 AAs found in many adaptors, scaffolding proteins, transcription factors, and non-receptor tyrosine kinases (Sadowski et al., 1986). SH2 domains bind pTyr residues within motifs specific to each SH2 domain, linking tyrosine kinases and their substrates with downstream effectors. CT10 regulator of kinase (CRK) and CRK-like (CRKL) are broadly expressed adaptors that execute central roles in complex formation during fundamental cellular processes including differentiation, proliferation, and migration (Brábek et al., 2005; Klemke et al., 1998; Park and Curran, 2014). Each family member possesses a single SH2 domain that binds phosphorylated YXXP (pYXXP) motifs and two SH3 domains, although the C-terminal CRK-SH3 can be deleted through alternative splicing. The N-terminal CRK/CRKL–SH3 domain is responsible for most intermolecular interactions and binds PXXPXK sequences (Wu et al., 1995). 
Although these adaptors facilitate complex assembly required for many well-studied metazoan signaling mechanisms, CRK family members are hypothesized to serve additional undiscovered roles. The CRK/CRKL–SH2 domain binds with high specificity to pYXXP, generated by the activity of kinases including Src family kinases (SFKs), Abl/Arg, focal adhesion kinase (FAK), tyrosine-protein kinase (SYK), platelet-derived growth factor receptor (PDGFR) and epidermal growth factor receptor (EGFR) (Supplementary Fig. S1, Supplementary Table S1). SFKs/Abl are responsible for phosphorylation events critical to CRK/CRKL binding in several important systems, including Reelin signaling (Ballif et al., 2004), and focal adhesion dynamics (Chodniewicz and Klemke, 2004; Park and Curran, 2008). Previously, we conducted a proteomics screen that aimed to identify SFK substrates whose pYXXP motifs would bind CRKL–SH2 in HEK293 cells (Aten et al., 2013). We identified the novel CRKL–SH2 interactor discoidin, CUB, and LCCL domain-containing protein 2 (DCBLD2), also endothelial and smooth muscle cell-derived neuropilin-like (ESDN) and CUB, LCCL-homology, and coagulation factor V/VIII-homology domains protein 1 (CLCP1), a scaffolding receptor with seven intracellular YXXP motifs. We recently characterized DCBLD2 alongside its family member DCBLD1, and found them to be SFK-/Abl-mediated pYXXP-dependent CRKL–SH2 interactors (Schmoker et al., 2017). Although proteomic methods that facilitate characterization of domain/motif interactions have accelerated explorations into mechanistic signaling, proteomics-based binding screens are accompanied by limitations. The organism/tissue/cell type employed may not express all potential interactors. Further, PTM-dependent screens may fail to identify important interactions if certain modifying enzymes (kinases/ligases/transferases) are not sufficiently activated, particularly if only a subset of cells within a tissue have the relevant signaling pathway engaged. 
Therefore, we sought a complementary approach to identify interactors of modular domains that might circumvent these limitations. Here we describe an in silico screen that uses signature characteristics of known domain/motif interactions and empirical data from mass spectrometric screens to predict PTM-dependent motif-specific interactions. By approaching this question from a bioinformatics perspective, the query is not limited to the proteome of a particular cell type, but is expanded to encompass that of an entire organism. As an initial example, we explored the CRK/CRKL–SH2-pYXXP interaction. Using well-curated databases and predictive tools, we compiled lists of proteins possessing defined characteristics of CRK/CRKL–SH2 domain interactors and then weighted and prioritized candidates by list membership. Our application of this bioinformatics pipeline to the CRK/CRKL–SH2-pYXXP interaction was successful in identifying both known and novel CRK/CRKL–SH2 candidate interactors, and several novel candidates were validated biochemically. We then tested our generalized workflow on the prediction of protein–protein interactions (PPIs) requiring motif-specific arginine methylation. Together our data show the applicability of this approach to, conceivably, any modular domain that recognizes a specific modified motif.

2 Materials and methods

See Supplementary Material for a full description of experimental procedures.

3 Results

3.1 In silico motif-based proteomics screen

To formulate a generalizable workflow for prioritizing PTM-dependent domain/motif interactions, we considered important characteristics of known domain interactors. We extracted motif-containing proteins from the proteome-of-interest and then further focused the screen into a central bullseye defined by primary characteristics (Fig. 1).

Fig. 1. A tripartite bullseye defines high priority targets. From the entire proteome of the organism of interest, all motif-containing proteins are extracted. Candidate interactors are focused by the following characteristics: (A) proteins enriched in the motif-of-interest (Scansite), (B) proteins with confirmed experimental identifications of the PTM-motif (PhosphoSitePlus) and (C) motif-containing proteins that participate in enriched pathways of (A) and (B) (Reactome).

First, we hypothesized that a sequence enriched in a particular domain-docking motif would, if properly modified, have a higher probability of interacting with that domain. This could facilitate multiple interactions simultaneously, allowing rapid signal propagation and increasing overall avidity. We conducted a motif enrichment analysis of the human proteome for the CRK/CRKL–SH2 binding motif. A Scansite query against the SwissProt database for all human proteins containing at least one YXXP yielded 9297 sequences (9153 unique proteins) (Supplementary Material S1). Figure 2A shows the binary logarithmic distribution of YXXP count per AA number for all human sequences (Supplementary Material S1). A total of 225 unique proteins fell two standard deviations above the mean and were extracted as the ‘Enriched’ list.

Fig. 2. Formation of the tripartite bullseye of YXXP-containing proteins. (A) Binary logarithmic distribution of the number of YXXP sites per AA. 
Proteins that fell two standard deviations above the mean were taken as the ‘Enriched’ group. (B) Number of pYXXP experimental confirmations (sum of low- and high-throughput) per protein (PhosphoSitePlus). All proteins above the mean were taken as the ‘pYXXP’ group. (C) ‘Enriched’ and ‘pYXXP’ proteins (493 Uniprot Accessions mapped to 475 NCBI Gene IDs) were subjected to a Reactome pathway analysis to extract enriched pathways. All YXXP-containing proteins in significantly enriched pathways (FDR < 0.05) were extracted as the ‘Enriched Pathways’ list. Venn diagrams show overlap of proteins in the Reactome input and enriched parent pathways with all human YXXP-containing proteins. Populations of overlapping sections are given in the subplots. Populations of the tripartite bullseye overlap (center) reflect identifiers post-Metascape conversion.

We next reasoned that proteins empirically shown to be highly phosphorylated in pYXXP motifs would be strong candidate CRK/CRKL–SH2 interactors, and that such proteins would not necessarily be YXXP-enriched and, therefore, would only partially overlap with our ‘Enriched’ list. We extracted all proteins with experimentally identified pYXXP using PhosphoSitePlus, and constructed a distribution of the total number of pYXXP identifications per protein (Fig. 2B). The top-ranking 700 of the 2086 total pYXXP proteins are shown. Of these, 289 proteins fell above the mean and were extracted as the ‘pYXXP’ list.

The most evident requirement for CRK/CRKL–SH2 interactors is the possession of pYXXP motifs; indeed, known interactors often harbor multiple phosphorylated motif occurrences. CAS1 (also BCAR1; Fig. 2A and B), a prominent scaffolding protein in focal adhesions, harbors 16 YXXP motifs within 870 AAs. PhosphoSitePlus has curated phospho-identifications of 15 CAS1 YXXP sites through high-throughput methods, as well as CAS1 site-specific evaluations of 13 of these. Signaling mechanisms, including adhesion-regulated SFK-induced YXXP phosphorylation of CAS1 (Hamasaki et al., 1996), induce binding of the CRK/CRKL–SH2 domain, bringing CRK/CRKL–SH3-bound cargo (e.g. C3G or DOCK180) to the CAS1-associated complex to alter cell adhesion/migration. However, only one pYXXP motif is required for a CRK/CRKL–SH2 interaction, and an enrichment assumption would fail to identify proteins such as DAB1 (two YXXP per 588 AAs). Expressed primarily in the embryonic brain, DAB1 is an SFK-mediated CRK/CRKL–SH2 interactor downstream of Reelin (Arnaud et al., 2003; Ballif et al., 2004).
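The motif-density criterion discussed above can be illustrated with a minimal Python sketch. It is not the Scansite workflow used in the study: the regular expression, the function names and the toy sequences are hypothetical, and it assumes that the mean and standard deviation are taken on the log2-transformed density, using the sample standard deviation.

```python
import math
import re
import statistics

# "X" in YXXP is any residue, so "." is used; the lookahead lets
# overlapping occurrences (e.g. in YYAPP) each be counted.
YXXP = re.compile(r"(?=Y..P)")


def motif_count(sequence, pattern=YXXP):
    """Number of motif occurrences in a one-letter amino acid sequence."""
    return sum(1 for _ in pattern.finditer(sequence))


def log2_density(sequence, pattern=YXXP):
    """log2(motifs per residue), or None when the motif is absent."""
    count = motif_count(sequence, pattern)
    return math.log2(count / len(sequence)) if count else None


def enriched_proteins(proteome, n_sd=2.0, pattern=YXXP):
    """Names whose log2 density exceeds mean + n_sd * SD (assumed cutoff)."""
    densities = {}
    for name, sequence in proteome.items():
        value = log2_density(sequence, pattern)
        if value is not None:
            densities[name] = value
    values = list(densities.values())
    cutoff = statistics.mean(values) + n_sd * statistics.stdev(values)
    return {name for name, value in densities.items() if value > cutoff}


# Toy proteome (hypothetical sequences, not real proteins).
toy = {f"background_{i}": "YAAP" + "G" * 96 for i in range(20)}
toy["rich"] = "YAAP" * 25
print(enriched_proteins(toy))

# Densities quoted in the text: CAS1 (16 per 870 AAs) vs DAB1 (2 per 588 AAs).
print(math.log2(16 / 870), math.log2(2 / 588))
```

Passing a different compiled pattern (for example one matching RG) reuses the same functions for another motif, which mirrors the generalization the screen is intended to support.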
Although DAB1 tyrosine phosphorylation is essential for proper neuronal positioning (Howell et al., 2000), DAB1 is poorly represented in high-throughput proteomic studies, as the majority of studies are not conducted with embryonic brain tissue. PhosphoSitePlus cites 3929 total pYXXP identifications of the ubiquitously expressed BCAR1, while DAB1 has only 18 identifications. Thus, poor discovery-based identifications of YXXP-containing DAB1-like molecules might lead us to disregard them as high priority potential CRK/CRKL–SH2 interactors. In consideration of ways that DAB1-like proteins might emerge as high priority candidates, we hypothesized that proteins in PPI networks with known pYXXP substrates might have a higher probability of getting phosphorylated by an active YXXP-directed kinase. This would therefore increase their likelihood of becoming CRK/CRKL–SH2 binding partners. Corwin et al. (2017) demonstrated clustering of tyrosine kinase substrates within PPI networks when expressing human non-receptor tyrosine kinases in yeast, and similar results have been demonstrated in human PPI networks (Beltrao et al., 2012; Duan and Walther, 2015; Li et al., 2017; Woodsmith et al., 2013). These studies suggest that members of protein complexes within close proximity to a kinase have a high probability of getting phosphorylated simultaneously, and that proteins generally co-participating in signaling networks are more likely to be substrates of the same kinase. Therefore, we also considered proteins within PPI networks of the enriched YXXP-containing protein list as well as the PPI networks of the list of proteins with greater than average pYXXP identifications as a primary feature. To incorporate the PPI network aspect of the tripartite bullseye, ‘Enriched’ and ‘pYXXP’ proteins were combined to a single list that was input to Reactome.org. Of these 475 proteins, 288 mapped to the Reactome database and were used for a pathway enrichment analysis. 
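For readers unfamiliar with pathway over-representation analysis, the following is a generic sketch of the idea in Python. The analysis in this study was run through Reactome; the code below does not reproduce that service. The hypergeometric tail test and the Benjamini-Hochberg correction are assumptions chosen for illustration, and all function names, gene identifiers and pathway memberships are hypothetical.

```python
from math import comb


def hypergeom_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    total = comb(population, draws)
    upper = min(successes, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def enriched_pathways(query, pathways, universe_size, fdr=0.05):
    """Pathway names whose overlap with `query` passes the FDR cutoff."""
    names = list(pathways)
    pvalues = [
        hypergeom_tail(len(query & pathways[name]), universe_size,
                       len(pathways[name]), len(query))
        for name in names
    ]
    adjusted = benjamini_hochberg(pvalues)
    return [name for name, q in zip(names, adjusted) if q < fdr]


# Hypothetical universe of 1000 genes labelled 0..999.
query = set(range(20))
pathways = {
    "A": set(range(10)) | set(range(100, 120)),  # 10 of 30 members in query
    "B": {0} | set(range(200, 229)),             # 1 of 30 members in query
}
print(enriched_pathways(query, pathways, universe_size=1000))
```

In this toy example only pathway "A" is retained, because a single shared member (pathway "B") is well within what chance overlap would produce.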
Pathway enrichment results recovered 25 significantly over-represented pathways (false discovery rate (FDR) < 0.05) (Supplementary Material S2). Eleven over-represented pathways were parent or higher-level pathways. Parental pathways were assessed for overlap with the list of all YXXP-containing proteins in the human proteome (Fig. 2C). The higher-level ‘Developmental Biology’ pathway (1080 proteins) was deemed too broad for practical analysis. Extracted proteins from the remaining 10 pathways were combined to form the ‘Enriched Pathways’ list (492 proteins). Notably, the ‘Axon Guidance’ pathway extracted DAB1 as a member of this group, as well as 292 other YXXP-containing proteins that were neither found to be enriched in YXXP motifs nor identified as highly phosphorylated (Fig. 2C). The intersection of the resulting tripartite bullseye included 4 central proteins (CRKL, DOK1, DOK2 and SHB) and 97 possessing at least 2 primary characteristics.

To further prioritize, we considered a series of secondary characteristics. Secondary characteristics will depend strongly on the unique goals of investigative teams; however, our approach exemplifies the process. Given our priorities in developmental cell motility-related signaling and the importance of SFKs/Abl in these processes, all known pYXXP substrates of Abl/Src/Fyn were obtained from PhosphoSitePlus using the ‘Substrates of:’ search tool (Supplementary Methods). Scansite-predicted pYXXP substrates of Src/Abl were extracted and narrowed to top-scoring proteins, and known CRK/CRKL interactors were extracted from the IntAct database (Supplementary Methods). The total number of proteins in each group was calculated for the 8887 YXXP-containing proteins previously defined in the Metascape-annotated matrix. Non-unique identifiers were removed and proteins were scored by the sum of weights (Table 1) applied via each primary or secondary feature (Supplementary Material S3).

Table 1. Scoring system for primary and secondary characteristics

| Scoring system | Webtool / Database employed | Stringency | Weight (pt) |
| --- | --- | --- | --- |
| Primary characteristics | | | |
| Enriched in YXXP | Scansite sequence pattern / Swissprot | 2 SD above mean, #YXXP/AA | 3 |
| pYXXP | PhosphoSitePlus modified sequence search | above mean | 3 |
| Enriched Pathway Clusters of ‘Enriched’ and ‘Phospho-YXXP’ | Reactome pathway analysis | FDR < 0.05 | 3 |
| Secondary characteristics | | | |
| Known CRK/CRKL interactor | IntAct interactors | N/A | 1 |
| Known substrate of Src, Fyn or Abl | PhosphoSitePlus ‘substrates of:’ search | pYXXP substrates only | 1* |
| Predicted substrate of Src or Abl | Scansite Abl or Src kinase motif / Swissprot | 2 SD below mean score | 0.5* |

Note: Webtools and databases used to annotate YXXP-containing proteins for each characteristic are listed, along with stringencies and assigned weights. All primary characteristics were weighted equally. Secondary characteristics were given 1-pt if experimentally determined, while predictive features were awarded 0.5-pt. The star (*) indicates features for which weights were potentially awarded multiple times for a given protein, as these were summed for each kinase separately. Weights were summed across all primary/secondary characteristics for a given protein to obtain its Priority Score (Fig. 3C).

To determine the relative predictive power of the primary and secondary characteristics, we examined the distribution of priority scores of 25 known CRK/CRKL–SH2 interacting proteins under various combinations of these predictive features (Fig. 3A, Supplementary Material S3). This demonstrated the effective clustering of positive controls in high-scoring regions. In addition, we assessed the enrichment of these known interactors within each category of interest in comparison to the total number of proteins in a given category (Fig. 3B). We then compared these enrichment indexes to those achieved in high-scoring regions when considering the prioritization scheme (‘All, score ≥6/8’ in Fig. 3B). Proteins scoring ≥8 were primarily positive controls (∼80% of the total) and, therefore, novel interactors within that scoring region would be highest-priority candidates for biochemical validation. However, we expanded our region of interest to encompass scores ≥6, the top ∼1% of all YXXP-containing proteins (Fig. 3C), in order to capture less-studied proteins. Notably, both scoring regions enriched positive controls more effectively than each characteristic alone and than the primary characteristics in combination (Fig. 3B). Surprisingly, consideration of known Src/Fyn/Abl substrates alone enriched positive controls more effectively than any other characteristic alone. However, we maintained this feature as a secondary characteristic, as it could present considerable bias in favor of identifying well-characterized proteins if weighted as a primary feature.

Fig. 3. Validation of primary and secondary characteristics and prioritization of potential CRK/CRKL–SH2 interactors.
(A) Chosen primary (‘Enriched YXXP’, ‘Phospho-YXXP’, ‘Enriched Pathways’) and secondary (‘CRK/CRKL interactors’, ‘Predicted Kinase Substrates’, ‘Known Kinase Substrates’) characteristics were compared for their ability to prioritize known CRK/CRKL–SH2 interactors using the scoring system summarized in Table 1. The percentages of positive controls scoring ≥6 and ≥8 are shown to the right of each histogram. Four known interactors (INPP5D [SHIP-1], DCBLD1, FLT4 [VEGFR-3] and ZAP70) remained with priority scores below 6. (B) Each primary and secondary characteristic is considered for its ability to enrich for CRK/CRKL–SH2 interactors. The ‘Enrichment Index’, defined as the percentage of positive controls relative to all YXXP-containing proteins, is plotted for each characteristic separately, as well as for combined primary characteristics. These enrichment indexes are compared with scoring regions of interest (in bold) when all characteristics are considered as outlined in Table 1. (C) All YXXP-containing proteins were scored by weighted primary and secondary features, as described in the Supplementary Methods. The y-axis for priority scores 3.5–11.5 is magnified in the inset. Proteins validated biochemically (Fig. 4) are indicated by gene symbol and are either black (induced to bind) or red (no interaction). (A and C) Percentages of positive controls scoring ≥6 and ≥8 are compared with all YXXP-containing proteins, highlighting the concentration of known CRK/CRKL–SH2 interactors in high-scoring bins (binwidth = 0.5). (D) All proteins that emerged with a priority score >6.5 are tabulated. Known CRK/CRKL–SH2 interactors are shown in bold.

The distribution of priority scores for all YXXP-containing proteins is shown in Figure 3C. Proteins chosen for biochemical validation in a pulldown assay are indicated in either black (confirmed CRKL–SH2 interactors) or red (no interaction).
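The scoring scheme of Table 1 and the ‘Enrichment Index’ of Figure 3B can be summarized in a short Python sketch. The weights are taken from Table 1; the function names, argument names and example annotations are hypothetical, and the per-kinase summation follows the starred note in the table.

```python
PRIMARY_WEIGHT = 3.0              # each primary characteristic (Table 1)
INTERACTOR_WEIGHT = 1.0           # known CRK/CRKL interactor (IntAct)
KNOWN_SUBSTRATE_WEIGHT = 1.0      # awarded per kinase: Src, Fyn or Abl
PREDICTED_SUBSTRATE_WEIGHT = 0.5  # awarded per kinase: Src or Abl


def priority_score(enriched=False, phospho=False, pathway=False,
                   known_interactor=False,
                   known_substrate_of=(), predicted_substrate_of=()):
    """Sum of Table 1 weights for one protein."""
    score = PRIMARY_WEIGHT * (enriched + phospho + pathway)
    score += INTERACTOR_WEIGHT * known_interactor
    score += KNOWN_SUBSTRATE_WEIGHT * len(set(known_substrate_of))
    score += PREDICTED_SUBSTRATE_WEIGHT * len(set(predicted_substrate_of))
    return score


def enrichment_index(category, positive_controls):
    """Percentage of proteins in `category` that are positive controls."""
    if not category:
        return 0.0
    return 100.0 * len(category & positive_controls) / len(category)


# Hypothetical annotations for illustration only.
well_studied = priority_score(enriched=True, phospho=True, pathway=True,
                              known_interactor=True,
                              known_substrate_of=("Src", "Abl"),
                              predicted_substrate_of=("Src",))
unannotated = priority_score()  # no primary or secondary feature
print(well_studied, unannotated)
print(enrichment_index({"a", "b", "c", "d"}, {"a", "x"}))
```

Because the experimentally supported and predicted substrate weights are summed per kinase, a protein can exceed the 10 points available from the primary characteristics plus a known interaction; this is why those two features carry a star in Table 1.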
Top-scoring proteins (>6.5) are displayed in Figure 3D, with known CRK/CRKL–SH2 interactors highlighted in bold font.

3.2 Biochemical validation: identification of novel CRKL–SH2 interactors

For biochemical validation, we chose 10 proteins (Fig. 3C) to test in a CRKL–SH2 pulldown assay following co-expression with c-Abl. In addition to choosing several high-scoring proteins, we chose one protein with a low priority score, Friend leukemia integration 1 transcription factor (FLI1), and one highly enriched in YXXP sites but with few pYXXP identifications, deleted in azoospermia protein 2 (DAZ2). In candidate selection, we also considered proteins with strong YXXP conservation across vertebrates and those expressed in the developing nervous system according to Mouse Genome Informatics (MGI) (Blake et al., 2017). cDNA constructs of selected candidates were expressed in HEK293 cells with or without c-Abl. Although not a suitable kinase for all YXXP tyrosines, c-Abl was selected for its robustness and high selectivity for YXXP substrates (Ballif et al., 2008; Colicelli, 2010). Cell lysates were incubated with GST-CRKL–SH2 resin and, following washing, bound proteins were subjected to SDS-PAGE and immunoblotting. Figure 4 shows immunoblots of pulldown assays alongside schematics of each protein tested, which display domain structure, YXXP location, experimental pYXXP identifications and percent motif conservation. For reference, similar schematics for select positive controls are summarized in Supplementary Figure S2.

Fig. 4. Domain structure, YXXP conservation and GST-CRKL–SH2 pulldowns of tested candidates. Mammalian expression vectors were introduced to HEK293 cells with/without c-Abl-Flag. GST-CRKL–SH2 pulldown assays were performed on cell extracts. Ponceau staining indicates relative GST-CRKL–SH2 levels. Immunoblotting was conducted with the indicated antibodies. Candidate domain structure and YXXP sites are shown with their corresponding blots. Percentages denote conservation of YXXP across five representative vertebrates, with two exceptions. DAZ2 is specific to higher primates; therefore, conservation could not be assessed across multiple vertebrate taxa. EMD was not found in Gallus gallus; therefore, percentage values reflect conservation across only four vertebrates. The number of experimental pYXXP identifications (if > 5) is given at the indicated tyrosine residue (PhosphoSitePlus).

To demonstrate the potential limitations of solely considering YXXP-enriched proteins, we compared the CRKL–SH2-binding ability of DAZ2 (15 YXXP, 558 AAs), a spermatogenesis-related protein (Reijo et al., 1996), to that of FLI1 (3 YXXP, 452 AAs), which is implicated in cell growth and malignancy (Brown et al., 2000; Truong and Ben-David, 2000). Although highly YXXP-enriched, DAZ2 was not an obvious CRK/CRKL–SH2 interactor for several reasons. The Y-chromosomal DAZ genes (DAZ1–4), common only to higher primates, possess 1–15 repeats of a 24-AA polymorphic sequence.
Conservation of these repeats implies important functionality; however, only six possess a YXXP motif (Fu et al., 2015), suggesting that this motif is not essential to the conserved repeat function. Furthermore, DAZ2 had no other primary or secondary features to suggest it might interact with the CRK/CRKL–SH2 domain. Our biochemical analysis found that c-Abl co-expression did not induce DAZ2 to bind the CRKL–SH2 domain (Fig. 4A). Although it is formally possible that another tyrosine kinase with a YXXP target preference (Supplementary Fig. S1, Supplementary Table S1) can phosphorylate DAZ2, this result shows that even a protein with 15 target motifs is not induced spuriously to bind the CRKL–SH2 domain simply by co-expressing a kinase with a strong YXXP-target preference. Although FLI1 is not YXXP-enriched, it does possess one YXXP site with 26 pTyr identifications. Still, this fell below the average pYXXP number per protein and, with no score from any primary or secondary feature, FLI1 received a priority score of zero (Fig. 3C). Mechanistically, FLI1 has been shown to block erythropoietin-induced differentiation and promote erythroblast proliferation; however, the signaling mechanisms involved remain unknown (Pereira et al., 1999; Tamir et al., 1999). FLI1’s involvement in CRK/CRKL-related processes and expression in the developing nervous system (MGI) led us to investigate its CRKL–SH2-binding ability. Surprisingly, FLI1 was induced to bind the CRKL–SH2 domain when co-expressed with c-Abl (Fig. 4A), suggesting this interaction may be important in FLI1 signaling to modulate its roles in cell proliferation, differentiation or transformation. We next chose to test two high-scoring proteins that are expressed in the developing nervous system, WW domain-binding protein 2 (WBP2) and CRKL itself. WBP2 was the most YXXP-enriched protein identified, possessing 11 YXXP motifs within 261 AAs, but with motifs showing variable conservation across vertebrates (20–100%).
As a coactivator of the estrogen receptor (ER) (Dhananjayan et al., 2006), WBP2 mediates proliferation/differentiation associated with breast cancer by regulating ER target gene expression (Buffa et al., 2013). WBP2 is phosphorylated on YXXP Tyr192 and Tyr231 by Src and Yes downstream of EGF, and overexpression in mice induces ER-dependent and independent loss of cell adhesion and increased tumor proliferation/invasion (Lim et al., 2011). Figure 4B displays the GST-CRKL–SH2 pulldown, demonstrating Abl-induced binding of WBP2. Currently, we are working to determine signaling pathways in which a WBP2/CRKL interplay makes a contribution. Intriguingly, CRKL is highly enriched with five strongly-conserved YXXP motifs (within 303 AAs). Three sites have been found phosphorylated >100 times, with pTyr207 possessing 1212 identifications (Fig. 4B); however, the functional significance has not been fully analyzed. The analogous CRK Tyr221 has been characterized to be part of a negative-regulatory intramolecular pTyr221/CRK-SH2 interaction, which prevents intermolecular CRK-SH2/pTyr and CRK-SH3/PXXPXK interactions (Chodniewicz and Klemke, 2004). Although a similar mechanism is presumed for CRKL Tyr207, this remains to be demonstrated. Additionally, CRKL pTyr207 or other sites might facilitate intermolecular dimerization, as has been demonstrated for CRK (Feller et al., 1994). To test this, we conducted our pulldown on extracts from cells co-expressing c-Abl and CRKL, and found that Abl induced a CRKL/CRKL–SH2 interaction (Fig. 4B). Additional proteins tested are shown in Figure 4C–E. We observed Abl-induced CRKL–SH2 binding of transmembrane protein 192 (TMEM192), emerin (EMD), PDZ and LIM domain protein 5 (PDLIM5), KIAA1143 and caveolin-1 (CAV1). However, neuronally expressed EPHA2 (MGI), with a high pYXXP count on conserved motifs, did not interact with CRKL–SH2 when co-expressed with c-Abl (Fig. 4D). 
EPHA2 regulates cell migration, adhesion and differentiation (Lin et al., 2010; Miao et al., 2009) and is required for proper lens organization in mice (Cheng et al., 2013). Despite these signature CRKL–SH2 interactor characteristics, EPHA2 was not induced to bind, although it is possible that pTyr588/pTyr594 is not Abl-induced. Together these data validate an Abl-mediated interaction between the CRKL–SH2 domain and several proteins identified from our screen.

3.3 Application to the methylated RG motif: a proof-of-principle

Supplementary Figure S3 summarizes the in silico screen in a stepwise workflow that includes a generalized approach alongside examples from our CRKL–SH2 screen. As a proof-of-principle, we applied this screen to a different modification-dependent interaction, namely, those facilitated by arginine methylation in RG motifs (MeRG). RGG/GRG motifs are the preferred targets of many PRMTs, inducing protein–protein/nucleic acid interactions (Blanc and Richard, 2017; Thandapani et al., 2013). To date, the Tudor domain is the only known MeArg-binding domain; however, many PPIs are mediated through MeRG motifs, suggesting that additional MeArg-binding domains remain undiscovered. Figure 5 shows the formation of the tripartite bullseye from all human RG-containing proteins, using the same databases and approach outlined in Supplementary Figure S3. The results had a generally similar profile to our YXXP analysis (Fig. 5, Supplementary Material S4). These primary characteristics successfully focused known Tudor interactors and PRMT substrates into the central bullseye (Supplementary Material S4C). Among these were SMD3, which, along with its homolog SMD1 (‘Enriched’/‘MeRG’ overlap), binds the Tudor domain of survivor motor neuron protein (SMN) (Friesen et al., 2001), and Sam68, a MeRG-dependent SND1-Tudor interactor (Cappellari et al., 2014).
Another important SMN–Tudor interactor, GAR1 (Whitehead et al., 2002), emerged in the overlap of the enriched RG and enriched pathways groups. This provides a strong foundation for the application of appropriate secondary features, prioritization and biochemical validation.

Fig. 5. Formation of the tripartite bullseye of high-priority RG-containing proteins. (A) Binary logarithmic distribution of the number of RG sites per AA. Proteins two standard deviations above the mean were extracted as the ‘Enriched’ group. (B) Number of experimental MeRG confirmations per protein (PhosphoSitePlus). All proteins above the mean were taken as the ‘Modified’ list, which contained 229 unique entries as parsed by Metascape. (C) Metascape pathway enrichment analysis shows the overlap of the enriched pathways within the ‘Enriched’ and ‘Modified’ lists with all RG-containing proteins. Eight enriched parent pathways (FDR < 0.05) of (A) and (B) were obtained via Reactome. All RG-containing proteins in enriched pathways were extracted as the ‘Enriched Pathways’ list. Populations of overlap are given in the subplots.

4 Discussion and conclusions

Here, we present a generalized in silico proteomics screen that utilizes publicly available databases/tools to predict and prioritize domain-motif interactions. Using the CRK/CRKL–SH2-pYXXP interaction as an example, we successfully identified potential interactors with a priority-scoring system using signature CRKL–SH2 interactor characteristics, employing curated PTM data, PPI networks, and the molecular/cellular roles of CRK/CRKL in the developing nervous system. Alongside other known CRK/CRKL–SH2 interactors, our positive controls emerged with high priority scores (Fig. 3A–C, Supplementary Material S3).

Despite its demonstrated success, our approach has certain limitations. Although we successfully identified the critical CRK/CRKL interactor DAB1 using the pathway enrichment data, and possibly other DAB1-like molecules not enriched in YXXP motifs or with few pYXXP identifications, not all such proteins would be readily detected. Proteins that are not characterized in PPI networks at the gene ontology level could continue to receive low priority scores. However, as new information populates the databases/tools employed here, the priority scores of relevant but obscured proteins will increase. If such proteins are identified as direct CRK/CRKL–SH2 interactors in proteomics screens, the in silico screen becomes less important for them. However, if proteins emerge as general CRK/CRKL interactors, are placed in CRK/CRKL-related networks, or are identified experimentally with high pYXXP counts in a new tissue-type, then their priority scores will increase in a future repetition of the screen. This could be the fate of proteins such as FLI1, which received no points yet bound the CRKL–SH2 domain in Abl-active conditions. These points represent a bias toward well-characterized proteins and those upregulated in cancer cells, from which the bulk of available proteomics data is derived.
However, it is argued that our in silico approach provides a strong user-friendly companion to directed proteomics screens. Although we chose to include all YXXP-containing proteins within enriched Reactome pathways as a primary characteristic, researchers could further narrow their screen by choosing enriched pathways that are of relevance to their area of study. Such an approach would allow focus on a biological pathway of interest, rather than all enriched pathways. Motif conservation and tissue-specific expression were taken into account post-scoring; however, these could be included as weighted secondary characteristics in a future iteration. We attempted to integrate AL2CO conservation analysis (Pei and Grishin, 2001), but found it was not easily amenable to bulk analysis of motif conservation across our chosen taxa. Motif count per protein would be an acceptable proxy for evolutionary conservation, and we conducted this for YXXP motifs within mouse and zebrafish using a Scansite query (Supplementary Material S1). Although certain challenges were encountered in mapping non-human protein identifiers to human (Supplementary Material S1), these will be less of an issue as proteomes of other organisms become more comprehensive. Binding motif surface accessibility is another attractive secondary feature to consider, as it would theoretically weight likely PPI surfaces. We sought to include this as a weighted characteristic; however, upon reviewing the currently available surface accessibility prediction software in their ability to predict exposure of well characterized phosphorylated-YXXP motifs within known CRKL–SH2 interactors, we found that many of these tyrosine residues were predicted to be buried. Further, we did not find programs that would accommodate batch searches of ∼9000 sequences. For these reasons, we concluded that our phospho-YXXP dataset was sufficient to represent this aspect in our screen. 
However, this has limitations including when motifs remain hidden due to partial protein coverage when using bottom-up proteomics with common proteolytic enzymes such as trypsin. Additionally, protein expression/motif modification in specific tissues/environments that are under-sampled will leave some modifications hidden. We employed expression databases to determine whether high-scoring proteins were expressed in our tissues of interest. We used the MGI database to query expression levels in mice, which is easily focused to a tissue/developmental stage of choice. Attempting to include tissue-of-interest expression as a secondary feature, we mapped mRNA expression in the embryonic nervous system to our protein list using the MGI mouse-to-human identifier conversion tool. However, we encountered issues in the conversion of unconserved genes, such as dazl, which mapped to DAZ2 in our dataset. DAZ2, specific to higher primates, is only expressed in the male germline; therefore, we restricted prioritization based on expression data to a case-by-case search applied to high-scoring candidates from the initial screen. The databases used here are generally applicable to domain/motif interactions with post-translationally modified proteins, including PTMs beyond phosphorylation. With our approach, and potentially the inclusion of additional generalized and specific PTM databases/predictive tools for primary/secondary characteristics (Chen et al., 2017), we anticipate that non-specialists can easily employ this strategy, using in silico proteomics to unveil the identities of novel proteins relevant to specific biological mechanisms of signal transduction. We anticipate this screen will be an important, rapid first step to assist investigators in identifying top candidates warranting biochemical and genetic examination in the signaling systems they are studying. Funding This work was supported by U.S. 
National Science Foundation IOS awards [1021795 and 1656510]; the Vermont Genetics Network through U.S. National Institutes of Health award [8P20GM103449] from the INBRE program of the NIGMS; and U.S. National Institutes of Health award [5P20RR016435] from the COBRE program of the NIGMS.
Conflict of Interest: none declared.
References
Arnaud L. et al. (2003) Fyn tyrosine kinase is a critical regulator of disabled-1 during brain development. Curr. Biol., 13, 9–17.
Aten T.M. et al. (2013) Tyrosine phosphorylation of the orphan receptor ESDN/DCBLD2 serves as a scaffold for the signaling adaptor CrkL. FEBS Lett., 587, 2313–2318.
Ballif B.A. et al. (2004) Activation of a Dab1/CrkL/C3G/Rap1 pathway in Reelin-stimulated neurons. Curr. Biol., 14, 606–610.
Ballif B.A. et al. (2008) Large-scale identification and evolution indexing of tyrosine phosphorylation sites from murine brain. J. Proteome Res., 7, 311–318.
Beltrao P. et al. (2012) Systematic functional prioritization of protein posttranslational modifications. Cell, 150, 413–425.
Blake J.A. et al. (2017) Mouse Genome Database (MGD)-2017: community knowledge resource for the laboratory mouse. Nucleic Acids Res., 45, D723–D729.
Blanc R.S., Richard S. (2017) Arginine methylation: the coming of age. Mol. Cell, 65, 8–24.
Brábek J. et al. (2005) Crk-associated substrate tyrosine phosphorylation sites are critical for invasion and metastasis of SRC-transformed cells. Mol. Cancer Res., 3, 307–315.
Brown L.A. et al. (2000) Insights into early vasculogenesis revealed by expression of the ETS-domain transcription factor Fli-1 in wild-type and mutant zebrafish embryos. Mech. Develop., 90, 237–252.
Buffa L. et al. (2013) Molecular mechanism of WW-domain binding protein-2 coactivation function in estrogen receptor signaling. IUBMB Life, 65, 76–84.
Cappellari M. et al. (2014) The transcriptional co-activator SND1 is a novel regulator of alternative splicing in prostate cancer cells. Oncogene, 33, 3794.
Chen C. et al. (2017) Protein bioinformatics databases and resources. In: Wu C., Arighi C., Ross K. (eds) Protein Bioinformatics. Methods in Molecular Biology, pp. 3–39, Humana Press, New York, NY.
Cheng C. et al. (2013) EphA2 and Src regulate equatorial cell morphogenesis during lens development. Development, 140, 4237–4245.
Chodniewicz D., Klemke R.L. (2004) Regulation of integrin-mediated cellular responses through assembly of a CAS/Crk scaffold. Biochim. Biophys. Acta Mol. Cell Res., 1692, 63–76.
Colicelli J. (2010) ABL tyrosine kinases: evolution of function, regulation, and specificity. Sci. Signal., 3, re6.
Corwin T. et al. (2017) Defining human tyrosine kinase phosphorylation networks using yeast as an in vivo model substrate. Cell Syst., 5, 128–139.e124.
Dhananjayan S.C. et al. (2006) WW domain binding protein-2, an E6-associated protein interacting protein, acts as a coactivator of estrogen and progesterone receptors. Mol. Endocrinol., 20, 2343–2354.
Duan G., Walther D. (2015) The roles of post-translational modifications in the context of protein interaction networks. PLoS Comput. Biol., 11, e1004049.
Feller S.M. et al. (1994) c-Abl kinase regulates the protein binding activity of c-Crk. EMBO J., 13, 2341–2351.
Friesen W.J. et al. (2001) SMN, the product of the spinal muscular atrophy gene, binds preferentially to dimethylarginine-containing protein targets. Mol. Cell, 7, 1111–1117.
Fu X.-F. et al. (2015) DAZ family proteins, key players for germ cell development. Int. J. Biol. Sci., 11, 1226.
Hamasaki K. et al. (1996) Src kinase plays an essential role in integrin-mediated tyrosine phosphorylation of Crk-associated substrate p130Cas. Biochem. Biophys. Res. Commun., 222, 338–343.
Howell B.W. et al. (2000) Dab1 tyrosine phosphorylation sites relay positional signals during mouse brain development. Curr. Biol., 10, 877–885.
Klemke R.L. et al. (1998) CAS/Crk coupling serves as a “molecular switch” for induction of cell migration. J. Cell Biol., 140, 961–972.
Li Y. et al. (2017) Co-occurring protein phosphorylation are functionally associated. PLoS Comput. Biol., 13, e1005502.
Lim S.K. et al. (2011) Tyrosine phosphorylation of transcriptional coactivator WW-domain binding protein 2 regulates estrogen receptor α function in breast cancer via the Wnt pathway. FASEB J., 25, 3004–3018.
Lin S. et al. (2010) Ligand targeting of EphA2 enhances keratinocyte adhesion and differentiation via desmoglein 1. Mol. Biol. Cell, 21, 3902–3914.
Matsuda M. et al. (1991) Identification of domains of the v-crk oncogene product sufficient for association with phosphotyrosine-containing proteins. Mol. Cell. Biol., 11, 1607–1613.
Mayer B.J. (2015) The discovery of modular binding domains: building blocks of cell signalling. Nat. Rev. Mol. Cell Biol., 16, 691–698.
Mayer B.J. et al. (1991) The noncatalytic src homology region 2 segment of abl tyrosine kinase binds to tyrosine-phosphorylated cellular proteins with high affinity. Proc. Natl. Acad. Sci. USA, 88, 627–631.
Miao H. et al. (2009) EphA2 mediates ligand-dependent inhibition and ligand-independent promotion of cell migration and invasion via a reciprocal regulatory loop with Akt. Cancer Cell, 16, 9–20.
Moran M.F. et al. (1990) Src homology region 2 domains direct protein-protein interactions in signal transduction. Proc. Natl. Acad. Sci. USA, 87, 8622–8626.
Park T., Curran T. (2014) Essential roles of Crk and CrkL in fibroblast structure and motility. Oncogene, 33, 5121–5132.
Park T.-J., Curran T. (2008) Crk and Crk-like play essential overlapping roles downstream of disabled-1 in the Reelin pathway. J. Neurosci., 28, 13551–13562.
Pei J., Grishin N.V. (2001) AL2CO: calculation of positional conservation in a protein sequence alignment. Bioinformatics, 17, 700–712.
Pereira R. et al. (1999) FLI-1 inhibits differentiation and induces proliferation of primary erythroblasts. Oncogene, 18, 1597.
Reijo R. et al. (1996) Diverse spermatogenic defects in humans caused by Y chromosome deletions encompassing a novel RNA-binding protein gene. Hum. Reprod., 11, 27–54.
Sadowski I. et al. (1986) A noncatalytic domain conserved among cytoplasmic protein-tyrosine kinases modifies the kinase function and transforming activity of Fujinami sarcoma virus P130gag-fps. Mol. Cell. Biol., 6, 4396–4408.
Schmoker A.M. et al. (2017) Dynamic multi-site phosphorylation by Fyn and Abl drives the interaction between CRKL and the novel scaffolding receptors DCBLD1 and DCBLD2. Biochem. J., 474, 3963–3984.
Seet B.T. et al. (2006) Reading protein modifications with interaction domains. Nat. Rev. Mol. Cell Biol., 7, 473–483.
Tamir A. et al. (1999) Fli-1, an Ets-related transcription factor, regulates erythropoietin-induced erythroid proliferation and differentiation: evidence for direct transcriptional repression of the Rb gene during differentiation. Mol. Cell. Biol., 19, 4452–4464.
Thandapani P. et al. (2013) Defining the RGG/RG motif. Mol. Cell, 50, 613–623.
Truong A.H., Ben-David Y. (2000) The role of Fli-1 in normal cell function and malignant transformation. Oncogene, 19, 6482.
Whitehead S.E. et al. (2002) Determinants of the interaction of the spinal muscular atrophy disease protein SMN with the dimethylarginine-modified box H/ACA small nucleolar ribonucleoprotein GAR1. J. Biol. Chem., 277, 48087–48093.
Woodsmith J. et al. (2013) Dual coordination of post translational modifications in human protein networks. PLoS Comput. Biol., 9, e1002933.
Wu X. et al. (1995) Structural basis for the specific interaction of lysine-containing proline-rich peptides with the N-terminal SH3 domain of c-Crk. Structure, 3, 215–226.
© The Author(s) 2018. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)

Journal

Bioinformatics, Oxford University Press

Published: Nov 15, 2018
