TY - JOUR AU - Vadher, Uma AB - Abstract This study explores the hypothesis that protein hormones are nested information systems in which initial products of gene transcription, and their subsequent protein fragments, before and after secretion and initial target cell action, play additional physiological regulatory roles. The study produced four tools and key results: (1) a problem approach that proceeds, with examples and suggestions for in vivo organismal functional tests for peptide–protein interactions, from proteolytic breakdown prediction to models of hormone fragment modulation of protein–protein binding motifs in unrelated proteins; (2) a catalog of 461 known soluble human protein hormones and their predicted fragmentation patterns; (3) an analysis of the predicted proteolytic patterns of the canonical protein hormone transcripts demonstrating near-universal persistence of 9 ± 7 peptides of 8 ± 8 amino acids even after cleavage with 24 proteases from four protease classes; and (4) a coincidence analysis of the predicted proteolysis locations and the 1939 exon junctions within the transcripts that shows an excess (P < 0.001) of predicted proteolysis within 10 residues, especially at the exonal junction (P < 0.01). It appears all protein hormone transcripts generate multiple fragments the size of peptide hormones or protein–protein binding domains that may alter intracellular or extracellular functions by acting as modulators of metabolic enzymes, transduction factors, protein binding proteins, or hormone receptors. High proteolytic frequency at exonal junctions suggests proteolysis has evolved, as a complement to gene exon fusion, to extract structures or functions within single exons or protein segments to simplify the genome by discarding archaic one-exon genes. Introduction Protein hormones have been viewed as one class of intercellular messages acting similarly to other classes, large or small. 
Classically, they are produced by a cell, travel via extracellular fluid to a receptor on, or in, a target cell, they initiate an intracellular cascade of amplifying transduction signals that produce a response by the target cell, and then they are cleared from circulation and the target cell allowing the cell to again respond to the same hormone. Past work has focused on hormone isolation and identification and characterization of receptors, transducer molecules, and response components. Less effort has addressed the simple question of why organisms use proteins, often large and complex molecules, to accomplish tasks that frequently parallel functions of small molecules like neurotransmitters or steroids. Why should organisms invest in synthesis and regulation of macromolecules if they are only going to act once and then be degraded? Is it possible that protein hormones are actually nested information systems in which the initial products of gene transcription, and their subsequent protein fragments, before and after secretion and initial target cell action, play additional roles in regulating physiological systems? Might those additional roles involve binding to non-classical molecular targets? Modern proteomic tools and exploration of intriguing examples from publications have uncovered information suggesting protein hormones are, indeed, more than simple one-use biochemical signals. A Nobel Prize was awarded to Roberts and Sharp for the “Split Gene”/“Genes in Pieces” hypothesis [1, 2] further explained by Gilbert [3]; they suggested proteins were composed of a series of domains that were assembled and mixed via gene duplication, DNA recombination, and evolutionary pressure to produce optimal solutions to organismal problems. 
Patterns of predicted proteolytic cleavages along the primary transcript of several families of protein hormones and the exon maps of the genes that encode them demonstrate that not only are gene domains preserved across many proteins, but the preferred sequences for polypeptide cleavage by multiple proteolytic enzymes are also preserved. This implies three things. First, if fragments of the original transcript are functional, even briefly, preservation of cleavage sites allows the organism to release those active fragments using its existing proteolytic machinery. Second, if individual protein domains play important functional roles by themselves, the organism preserves access to the more archaic functions of earlier protein domain assemblages by preserving the proteolytic motifs that allow release of those functional domains during the proteolytic breakdown of the more modern proteins. Third, preservation of access to archaic domain functions, by preserving proteolytic cleavages at the borders of those domains, allows the organism to gradually simplify its genome by loss of archaic protein genes whose functions remain accessible as modern protein proteolytic fragments. The concept of nested information systems is a logical extension of organismal use of macromolecular proteins as signals. To explore the question of why organisms use proteins as signals and whether those proteins might contain additional signals, the predicted cleavages of mature protein hormones by their primary target cells were examined. Physiologically, much of this proteolysis occurs as protein hormones are swept from membrane receptor binding sites into endocytotic vesicles and subsequently moved to late endocytotic vesicles and/or to lysosomes in the intracellular vesicular system (Figure 1). During this process, vesicular, membrane-embedded, proton pumps acidify intra-vesicular pH, from ~ 7.4 at the cell surface to as low as 3.5–4 near the cell nucleus [4, 5]. 
Protein structures uncoil and become more proteolytically susceptible during acidification. In addition to decreased pH, contents of endocytic/lysosomal vesicles are exposed to an array of proteolytic enzymes, often cathepsins, with differing pH optima (Supplemental Figure S1) [6–12]. Different spectra of proteolytic enzymes act at or near the cell surface versus deep in the cell interior. Thus, protein hormones are gradually unfolded by decreasing pH while also being exposed to a shifting mix of proteolytic enzymes. Fragments are being generated during this journey, but predictions from sophisticated proteolytic cleavage prediction algorithms [13–16] suggest some of these fragments are sizable, not just single amino acids or amino acid dimers, even after exposures to multiple proteolytic enzymes. This means these fragments may be available to act elsewhere in the target cell or that they may be recycled back to the cell surface and released to act elsewhere. Figure 1 Possible fates of protein hormones after binding and endocytosis by primary target cells. The gradual decline in pH during passage through the endosomal and lysosomal compartments is shown from cell membrane to near the nucleus. Protein hormone proteolysis is indicated by the progression H (hormone)→h (primary fragment)→n (secondary fragment)→s (tertiary fragment)→aa (amino acids) through the endosomal/lysosomal series and extending to the cytoplasmic E2 proteolytic complex on the left. Direct recycling of H or h via endosomal exocytosis is shown on the upper right as is reprocessing such as carbohydrate redecoration. Antigen-like fragment presentation is depicted at the lower left. Fragment escape via exosomes is shown at the lower right. 
Many examples now exist of the contents of the endosomal/lysosomal compartment escaping into cell cytoplasm. Macromolecular complexes as large as viruses [17–20] are known to move out of this organelle compartment. Dissolution of endosomal/lysosomal vesicles is also known [21]. If the fragments from protein hormones are not ubiquitinated [9], and thereby quickly routed to the E2 cytoplasmic proteolysis complex, those peptides would be available to interact with intracellular proteins as stimulators or inhibitors of modulatory protein–protein interactions. We have found that many predicted protein hormone fragments bear significant resemblance, via primary and three-dimensional (3D) structure, to surface domains of intracellular proteins, suggesting such peptide-mediated modulation of their normal protein–protein interactions is distinctly possible. This would parallel the modulatory actions known for small RNAs at the levels of gene transcription and gene translation [22]. But even routing to the E2 proteasome complex may not fully degrade protein hormone fragments. Recently Ramachandran and Margolis [23] found a subset of E2 complexes that localize at neural cell membranes where, from intracellular protein hormones, they generate extracellular peptides that can stimulate neuronal signaling on their own. 
In addition to intracellular target cell escape, processing of protein hormones within the endosomal/lysosomal compartments can, via direct exocytosis or reprocessing via the Golgi apparatus, produce several other types of secondary signaling. Secondary hormones may be released, as in the case of prolactin, an angiogenic promoter, being degraded by bovine breast cells to a 16 kDa fragment, vasoinhibin, an angiogenic inhibitor, which is released to act on vascular endothelial cells [24, 25]. In analogy with immune cell processing and presentation of antigens [26, 27], protein hormone fragments may be presented on the surface of primary target cells in association with cell surface antigens, which would explain, e.g., the finding of major histocompatibility proteins on bovine luteal cells [28]. Protein hormone fragments released from primary target cells could also modulate hormone binding of the parent hormone or related hormones to their cell receptors. Reichert et al. [29, 30] reported this action for FSH fragments. Fragment binding to protein/peptide hormone binding proteins (BPs) provides another means to modulate hormonal signaling function without changing intact protein hormone levels. This is akin to the molecular conversations between the maternal endometrial epithelium and the implanting blastocyst during optimization of implantation [31, 32]: trophoblastic growth hormones are muted by epithelial BPs that are inactivated by trophoblastic growth hormone BP proteases which are balanced by epithelial protease inhibitors, etc. Any protein hormonal system generating fragments that bind to circulating hormonal BPs may increase free hormone levels by occupying sites that would otherwise be occupied by protein hormone. 
In addition to the above possibilities for extending protein hormone signaling, recent work on circulating exosomes, which may arise from exocytosis of multivesicular bodies by live cells or from persistence of multivesicular bodies following cell death [33], provides another means for proteolytic fragments of protein hormones to make their way from primary to secondary hormonal targets. Although mature protein hormone molecules may serve as the source for secondary, proteolytically-generated modulating factors, the idea of nested information systems also extends to the original protein hormone translation product. Most of these products contain a 15–30 residue N-terminal leader peptide which anchors the transcript to the rough endoplasmic reticulum of the synthesizing cell. This is cleaved during post-translational processing. The transcripts also include pro-peptides that are often tens to hundreds of residues (and are not yet functionally characterized) along with one or more known and characterized protein hormone sequences that are cleaved within the synthesizing cell prior to release into circulation. The classic examples of multi-peptide hormone cascades from long precursors include pro-opiomelanocortin (POMC) [34–36], proenkephalins A and B [37], and gastrin [38]. Interestingly, there are also examples of multiple hormones being generated from the same transcript or translation product that control opposing physiological actions. The transcript for Ghrelin, a feeding stimulant, also encodes a long pro-peptide of unknown function along with the hormone obestatin, a feeding suppressant [39]. Selective processing in the secretory cell or after secretion generates a peptide mixture that modulates feeding behavior. 
The transcript for glucagon, a stimulant for gluconeogenesis and hepatic glucose release, essential in glucose metabolism and homeostasis, encodes various other functional peptides, including glucagon-like peptide 1 (GLP1), a direct inhibitor of glucagon secretion [40]. Selective processing of the glucagon transcript releases peptides that reinforce or counter glucagon actions. How release of the elements of such balancing systems is controlled, possibly via selective proteolysis of the gene translation product, is another problem that arises from the idea that protein hormones, and their original protein translation products, are nested or telescoped systems; systems in which elements helping to regulate a complete or complex physiological system are linked down to the DNA level and are released in a controlled manner at the synthesizing cell level, and/or during circulation to the primary target cell, and/or at the primary target cell level. Understanding these levels of processing and the peptide fragments involved will provide: a wealth of biomarkers to track the various peptides and functions involved, a set of tools to explore new aspects of intercellular signaling, and potential pharmaceuticals to modulate the cells and functions identified. Materials and methods General approach The project approach is summarized in Figure 2, color Supplemental Figure S2. To search for biologically active proteolytic fragments of protein hormones, FASTA sequences for protein hormone transcripts were obtained from the UniProt database [41]. These sequences and their known fragments were located as proteolytic substrates in the MEROPS database [42] which lists published, experimentally determined, cleavage sites of substrate proteins and their corresponding proteases [14]. In addition, FASTA sequences were submitted to the proteolytic cleavage prediction websites PROSPER [13], Site Prediction [16], or Prosperous [15] to locate probable cleavages by multiple proteases. 
The results of the known and predicted cleavages have been collected and added to a catalog of known human soluble protein hormones (see below). Figure 2 Transcript amino acid sequences and protease cleavage software were used to predict fragmentation patterns. Fragments were checked for 1D (BLASTp) and 3D (Label Hash) similarity to other proteins. Network neighborhoods of motif matched proteins were examined for partners. PDB structures then allowed examination of motif locations and possible involvement in protein hormone peptide modulation of protein–protein binding. Proteolysis predictions A download of PROSPER, Protease Specificity Prediction Server (Monash University, dated 2011) [13], was employed to predict cleavage site locations. The PROSPER prediction algorithm uses specificity information from the MEROPS database [13, 14] that has been validated for 24 different proteases (human, 588 known [43]) from the four major protease families (aspartic, cysteine, metallo-, and serine) to identify substrate protein cleavage sites for each of the enzymes used. The amino acid residue N-terminal to the cleavage site is color-coded by protease family (the color code assigned with the PROSPER program download was: red, aspartic; yellow, cysteine; blue, metallo-; and green, serine; multiples assigned to the highest scoring family at a given site). 
PROSPER achieves greater accuracy and coverage than its contemporaries (EXPASY, [44]; PoPS [45]; SitePrediction, [16]) through its more comprehensive server and machine-learning techniques. PROSPER output includes: the protease MEROPS ID and name; the substrate P4-P4′ sequence (scheme shown in Supplemental Figure S3) and location of the cleavage site prediction; and a calculated cleavage score (e.g., a result for the substrate kininogen-1 and the protease calpain-1: C02.001, calpain-1, 199, RITY|SIVQ, 1.23). The P4-P4′ scheme displays the amino acid residues (N-terminal to C-terminal) in the substrate sequence numbered outward from the cleavage site (P4-P3-P2-P1-P1′-P2′-P3′-P4′) where the cleavage site occurs between P1 and P1′. The cleavage score represents the confidence of each cleavage site prediction where the cleavage score must surpass a given threshold imposed by the machine-learning software to be considered a predicted cleavage [13]. For the PROSPER web version the thresholds vary by protease type and are specific to each of the proteases (in web reported cleavages these are usually 1.0–1.2). For the downloaded version of PROSPER we set a constant lowest cleavage score threshold (0.9, 0.95, or 1.0) for all proteases in a given program run. The reported prediction characteristics for all 24 proteases validated by the Monash group, based on cleavages in a broad spectrum of all possible substrate proteins, gave a sensitivity for a known cleavage (true positive, TP) of 80.10 ± 16.56%, a specificity for a known non-cleavage (true negative, TN) of 92.33 ± 3.60%, and an accuracy (= (TP + TN)/(TP + TN + FP + FN), where FP is a false positive and FN is a false negative) of 89.26 ± 6.63%. 
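The output record and the performance measures described above can be illustrated with a minimal Python sketch. It assumes a comma-separated record laid out like the kininogen-1/calpain-1 example in the text; the class name, field names, and the confusion-matrix counts in the usage note are hypothetical and are not part of PROSPER. Treating a score equal to the threshold as a predicted cleavage is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Cleavage:
    """One predicted cleavage (hypothetical container, not a PROSPER type)."""
    merops_id: str
    protease: str
    position: int
    n_side: str   # P4-P3-P2-P1, N-terminal to the scissile bond
    c_side: str   # P1'-P2'-P3'-P4', C-terminal to the scissile bond
    score: float


def parse_record(line: str) -> Cleavage:
    """Parse a record such as 'C02.001, calpain-1, 199, RITY|SIVQ, 1.23'."""
    merops_id, protease, position, site, score = [f.strip() for f in line.split(",")]
    n_side, c_side = site.split("|")  # '|' marks the bond between P1 and P1'
    return Cleavage(merops_id, protease, int(position), n_side, c_side, float(score))


def is_predicted(cleavage: Cleavage, threshold: float) -> bool:
    """Assumption: a score at or above the chosen threshold counts as a cut."""
    return cleavage.score >= threshold


def confusion_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Sensitivity, specificity, and accuracy as defined in the text."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }
```

For the example record, `parse_record(...)` gives P1 = `Y` (last residue of `RITY`) and P1′ = `S` (first residue of `SIVQ`). With illustrative counts, `confusion_metrics(8, 90, 10, 2)` returns a sensitivity of 0.8 and a specificity of 0.9.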
A check of PROSPER predictions for 51 protein hormone transcripts was done using χ² 2×2 contingency tables on TP, TN, FP, and FN results found by comparing exact matches for all eight P4-P4′ cleavage site residues predicted from PROSPER to the known sites listed in MEROPS (“gold standard”) for the same substrate proteins. Although this computation (data not shown) used a limited number of protein substrates, it agrees with the Monash findings on specificity (81.9 ± 4.4%) and accuracy (79.3 ± 4.9%). It disagrees on sensitivity (25.1 ± 33.1%), probably due to the small data set drawn from a subset of all possible proteomes, and to the unbalanced result of known cuts and non-cuts examined. PROSPER uses a support vector machine, SVM, algorithm that maximizes differentiation between positive (cut) and negative (uncut) result classes; it favors specificity over sensitivity. Predicted proteolytic cleavage patterns were explored for 461 canonical human protein hormone transcripts using the downloaded version of PROSPER. The lowest cleavage score threshold for all proteases was varied from 0.9, to 0.95, to 1.0, gradually increasing the specificity of predictions while decreasing their sensitivity (and the predicted number of total cuts). For each transcript and cleavage score threshold, the number of cuts, fragments, and fragment lengths were tallied and used to compute frequencies as well as overall descriptive statistics for the canonical transcript dataset. Linear regressions were computed for fragment numbers versus cleavage threshold to ascertain upper and lower limits to the prediction threshold settings. Log-linear regressions of fragment length versus frequency, based on the 15 most abundant (>86%), and shortest, lengths, were used to extrapolate expected frequencies for the remaining, longer fragment lengths at each cleavage threshold. 
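The fragment tally and log-linear extrapolation described above might be sketched as follows. This is an illustration only, not the authors' program: function names are hypothetical, a cut position is assumed to be the 1-based number of the residue immediately N-terminal to the scissile bond, and the fit is an ordinary least-squares regression of ln(frequency) on fragment length.

```python
import math


def fragment_lengths(seq_len: int, cut_positions) -> list:
    """Lengths of the fragments left after cutting C-terminal to each position.

    Assumption: position p means the bond between residues p and p + 1.
    """
    cuts = sorted({p for p in cut_positions if 1 <= p < seq_len})
    bounds = [0] + cuts + [seq_len]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def fit_log_linear(length_counts: dict, n_points: int = 15):
    """Least-squares fit of ln(count) = a + b * length on the shortest lengths."""
    points = sorted(length_counts.items())[:n_points]
    xs = [length for length, _ in points]
    ys = [math.log(count) for _, count in points]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    a = mean_y - b * mean_x
    return a, b


def expected_count(a: float, b: float, length: int) -> float:
    """Frequency extrapolated from the fitted line for a longer fragment length."""
    return math.exp(a + b * length)
```

The excess at a given length is then the observed count minus `expected_count(a, b, length)`; these paired values are what a one-tailed paired test would compare.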
The number of fragments at each fragment length exceeding the log-linear regression expectation was computed and the population tested (one-tailed, paired, T-test) to evaluate excess numbers of protease resistant fragments. Residual sequence BLASTp Identified residual proteolytic cleavage products > 8–10 amino acids were used as queries in BLASTp software (optimized for short peptides) [46, 41]. Queries used the human, non-redundant protein sequence database, no compositional adjustment, and program default settings for short peptides (e.g., expectation value, threshold, word size, window size, substitution matrix, gap existence cost, and gap extension cost, respectively, for an 8 or 17 aa peptide: 200 000, 11, 2, 40, PAM30, 9, 1; or, a 55–121 aa protein: 10, 21, 6, 40, BLOSUM 62, 11, 1). The nonparent protein matches found indicated sequential motif matches of interest for the proteolytic peptides used as queries. Cutoffs used for BLASTp matches included coverage > 65%, with identity or similarity > 65%, and low E values (normally < 1; note, short sequences only have very low E values if the database searched is unconstrained (multispecies) and the match sequence fully covers, and is highly similar to, the query). Some resistant proteolytic peptides were also matched to motifs in other proteins using Label Hash [47] which uses a combination of linear sequence with residue bond angles and relative locations to find 3D structural motif matches within a proteome library; as these results only complement the BLASTp findings they have been omitted. Network neighborhoods The proteins found as sequential motif matches are termed proteolytic-peptide-matched-proteins, PPMPs. 
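The BLASTp match cutoffs stated above can be expressed as a small filter. The sketch below assumes the hit has already been parsed into the fields shown; the `Hit` class and its field names are hypothetical and do not correspond to any BLAST output format or library API. Coverage is computed over the query, and identity and similarity over the aligned length, which is one reasonable reading of the text.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical, already-parsed BLASTp hit for one peptide query."""
    subject: str      # matched protein identifier
    query_len: int    # length of the peptide query
    align_len: int    # number of query residues covered by the alignment
    identities: int   # identical positions in the alignment
    positives: int    # identical or similar positions in the alignment
    evalue: float


def keep_hit(hit: Hit, parent: str, min_cov: float = 65.0,
             min_match: float = 65.0, max_evalue: float = 1.0) -> bool:
    """Apply the stated cutoffs: non-parent match, coverage > 65%,
    identity or similarity > 65%, and E value normally < 1."""
    if hit.subject == parent:
        return False  # matches to the parent hormone are not of interest
    coverage = 100.0 * hit.align_len / hit.query_len
    identity = 100.0 * hit.identities / hit.align_len
    similarity = 100.0 * hit.positives / hit.align_len
    return (coverage > min_cov
            and (identity > min_match or similarity > min_match)
            and hit.evalue < max_evalue)
```

Keeping the thresholds as keyword arguments makes it easy to relax the E-value cutoff for very short queries, where the text notes that low E values are rarely attainable.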
The PPMPs were located in the UniProt database to find any Protein Data Bank (PDB) 3D structures available and to identify their network neighborhood, NN, partners, PPMPPs, which were physically interacting proteins in the STRING, Cytoscape, BioGrid, or MINT searches linked to the databases [41] for each of the PPMPs. Matched sequence locations on PDBs The location of the peptide-matched residues within the PPMPs was identified by visualizing the PPMP PDB structures using Jmol [33] or the online PDB viewer iCn3D [48]. If the matched peptides were on the PPMP surface, they could modulate PPMP–PPMPP interactions. The protein hormone peptides identified by PROSPER, or in the MEROPS database, may inhibit or mimic PPMP–PPMPP interaction causing either inhibition or stimulation of the metabolic pathway or biological function involved. If the matched peptide motifs lie internal to the PPMP, it is unlikely that free protein hormone peptides could modify protein–protein interactions for the PPMP; internal motifs were not explored. Protein–protein modulation in PPMP–PPMPP co-crystals If both a PPMP and its network neighbors had PDB structures, co-crystals or co-models were sought. PDBs with PPMP/PPMPP structures allow visualization of how identified protein hormone proteolytic peptides might modulate PPMP–PPMPP interactions. Identification of these instances allows design of biochemical or genetic bench experiments for direct tests of hormone peptide modulation of specific protein–protein interactions. Such tests initially would be modifications of the protocols that originally identified nearest neighbor interactions between PPMPs and their PPMPPs. Protein hormone catalog compilation Although it is possible to track information for small numbers of protein hormone transcripts through the general protocol above, the task grows rapidly with larger numbers. 
To organize the collected data, and to find the magnitude of the task for encompassing all protein hormones in an organism, a specialized database of all known human soluble protein hormones, the Soluble Human Protein Hormone Proteome (v1.0), was constructed. 1 It was compiled by capturing and updating entries in the Peptide DB bioactive peptide database (http://www.peptides.be/) and augmenting them with entries from: the EndoNet genome encoded hormone list (http://endonet.bioinf.med.uni-goettingen.de/browse/hormoneClasses); soluble biomarker listings from Campbell and Rockett [49]; author notes from the 2011 to 2017 meetings of the Society for the Study of Reproduction and the Endocrine Society; and with clearly hormonal or secreted possible signal proteins found in a targeted UniProt search run in June 2014. The current catalog includes 2098 FASTA sequences including 932 transcripts, 461 of which are canonical (Supplemental Table 1), 915 known hormones, 436 known hormonal fragments, and 187 pro-peptides or known hormone pro-forms. All canonical transcript FASTA sequences in the catalog have been included in multiple alignment analyses using MUSCLE [50] or MAFFT [51] software. 
The resulting dendrogram molecular groupings and predicted evolutionary relationships among the various transcripts agree with those reported for individual hormonal families (angiopoietins, ANGs [52]; angiopoietin-like proteins, ANGPTLs [53]; cerebellins [54]; chemokines [55]; defensins [56]; endothelins, ENs [57]; family with sequence similarity 3, FAMs [58]; fibroblast growth factors, FGFs [59]; FGF-related proteins, FGFRPs [60]; growth hormones [61]; glycoproteins [62]; interferons, INFs [63]; insulin-like growth factors, IGFs [64]; interleukins, ILs [65]; neurotrophins, NTs [66]; platelet-derived growth factors, PDGFs/vascular endothelial growth factors, VEGFs [67]; tachykinins [68]; transforming growth factors, TGFs [69]; tumor necrosis factors, TNFs [70]; and, Wingless-related integration site factors, Wnts [71]). Transcript processing comparisons To visualize co-alignments of proteolytic cleavage locations with sequence or known structural motifs of protein transcripts, families of transcripts were aligned using MUSCLE or MAFFT within JalView [72] and overlain with color-coded cleavage locations from PROSPER. Secondary structure maps were constructed for each canonical transcript with the psipred structure prediction subroutine of Jones [73] within PROSPER; major features were single-letter coded: helix (H), beta sheet (E), and random coil (C). Combined map results were exported as .eps files then cropped and folded using Adobe Photoshop. Proteolytic cleavage alignment with exon junctions Alignment of predicted proteolytic cleavage sites with structural features of the hormone transcripts was extended to transcript processing by identifying the DNA exon boundaries within the transcribed protein via the e!Ensembl [74] (version 98) database. 
As e!Ensembl entries include all possible known transcripts of source genes, the first with the same transcript length as the canonical sequence in UniProt was chosen for further analyses (this was usually also cross-referenced within UniProt). Since visual inspection of proteolytic maps suggested cuts were more numerous near exon junctions, manual counts were made of cuts predicted within ±6 residues of known exon junctions for the third exon in 29 protein transcripts from 11 different hormonal families (Supplemental Table 2). Frequencies of predicted proteolytic cleavages near exon junctions were compared with all other predicted proteolytic cleavages within the encoded exon peptides; exon junctions were designated by the number (relative to the transcript N-terminus) of the last amino acid residue fully coded in the mRNA. Later coincidence tests in 461 canonical transcripts and their exon maps compared cleavages occurring within ±10 residues of any internal exon junction (position 0 is the C-terminus of the exon) or +10 residues of the N-terminus of the transcript, or −10 residues of the transcript C-terminus, to the number of cleavages that occur outside of the scanned windows.2 The number of residues at a given position in the scanning window is summed across all exons within a transcript or across all transcripts analyzed. An in-house program that mimics the manual steps was used for tabulations. Exon numbers and lengths were tallied for each of the transcripts and frequencies of exon number and exon length were computed for the full dataset. Persistence of the coincidence of cuts and junctions was also examined when the PROSPER minimal cleavage cutoff score was set at 0.9, 0.95, and 1.0. 
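The scanning-window tabulation described above can be sketched in Python. This is not the in-house program mentioned in the text; it is a minimal illustration under stated assumptions: junctions and cuts are 1-based residue numbers, a junction is numbered by the last residue fully coded by the exon (position 0), and overlapping windows are counted once. The 2×2 statistic uses the standard Pearson formula without continuity correction, and the P value uses the exact relation between the one-degree-of-freedom χ² distribution and the complementary error function.

```python
import math


def window_tally(seq_len: int, junctions, cuts, half_width: int = 10):
    """Return (cuts_in, residues_in, cuts_out, residues_out) for one transcript."""
    in_window = set()
    for j in junctions:  # ±half_width around each internal exon junction
        in_window.update(range(max(1, j - half_width),
                               min(seq_len, j + half_width) + 1))
    # +half_width from the N-terminus and -half_width from the C-terminus
    in_window.update(range(1, min(seq_len, half_width) + 1))
    in_window.update(range(max(1, seq_len - half_width + 1), seq_len + 1))
    cut_set = set(cuts)
    cuts_in = len(cut_set & in_window)
    cuts_out = len(cut_set) - cuts_in
    return cuts_in, len(in_window), cuts_out, seq_len - len(in_window)


def chi2_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square (1 df) and P value for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # degenerate table: no evidence of association
    chi2 = n * (a * d - b * c) ** 2 / denom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # survival function for 1 df
    return chi2, p_value
```

A usage pattern would be to sum the four tallies over all transcripts and then call `chi2_2x2(cuts_in, residues_in - cuts_in, cuts_out, residues_out - cuts_out)`, so that the rows are inside versus outside the window and the columns are cut versus uncut residues.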
Probability of random coincidence for proteolytic cuts and exon junctions was computed using χ² analyses of 2×2 matrices containing the expected and actual numbers of cleavages or non-cleavages within or outside of the total scan window or at or outside of individual positions at specific distances from known exonal boundaries. Persistence of coincidence patterns was examined using two-way analysis of variance (ANOVA) on cuts/residue for all the cleavage predictions generated using three cleavage cutoff scores (see above) arranged by position relative to an exonal junction; an F test against pooled overall error tested effects of PROSPER setting and position relative to a junction. Results Multiply proteolyzed transcripts have residual sequences > 10 amino acids long PROSPER analyses of all 461 canonical human protein hormone translation products (Supplemental Table 1) were run (total length 111 561 residues, mean/transcript 242 ± 170, range 64–1663). With the minimal cleavage score threshold at 1.0, where all predicted cleavages are likely correct, all transcripts demonstrate residual peptides of > 10 amino acids after exposure to 24 proteases. A total of 13 656 cleavages are predicted, 30 ± 24 per transcript; the mean number of peptides > 10 residues per transcript was 9 ± 7 (mode 5, range 1–68); the fragment number distribution is bimodal with peaks at 7 and 14 resistant fragments per transcript (Supplemental Figure S4). Mean predicted fragment length was 8 ± 8 (median 5, range 1–87). The peptide length distribution (Figure 3) is log-normal with a significant excess (P = 3 × 10⁻⁶) of proteolytically resistant fragments longer than 15 residues; total excessively long fragments, 1944, are 13.77% of total fragments. Predicted cuts were made by all four protease classes: aspartate, 3.5 ± 3.7%; metalloproteases, 18.26 ± 14.25%; cysteine, 26.88 ± 29.18%; and serine, 51.38 ± 41.88%. 
Figure 3 Predicted fragment frequency by fragment length for 461 canonical transcripts of human protein hormones at three minimal proteolysis scores for PROSPER [13]. At minimal scores of 1.0, 0.95, and 0.90, long fragments, that exceed the expected length based on extrapolation of a regression from the first 15 data points, are significant (P < 0.001, P < 0.01, P = 0.021, respectively) and constitute 13.77%, 14.42%, and 1.6% of the total residues present. A minimal cleavage score of 0.95 makes cleavage prediction more sensitive but slightly more prone to false positives. Total predicted cuts increase to 21 265 (46 ± 35 per transcript), with fragments averaging 5 ± 5 residues in length (range 1–53) and a mean of 7 ± 5 fragments > 10 residues per transcript; the long fragment distribution (Supplemental Figure S4) is left-skewed normal with a peak at 4 fragments/transcript, a shoulder at 5–7 fragments/transcript and secondary peaks at 9, 12, and 14 fragments/transcript. The peptide length distribution (Figure 3) is still log-normally distributed with an excess (P = 0.00022) of 961 (14.42% total) fragments longer than 15 residues. The protease classes for predicted cuts closely resemble those at minimal cleavage score 1.0. Reducing the minimal cleavage score to 0.9 again increases sensitivity and likelihood of false predictions. 
Total predicted cuts increase to 27 348 (59 ± 45 per transcript), with a mean fragment length of 4 ± 4 residues (range 1–40) and a mean of 5 ± 4 fragments per transcript > 10 residues long; the long fragment distribution is left-skewed normal with a single peak at 3 long fragments per transcript. Peptide length distribution is log-linear with fewer fragments longer than expected (P = 0.021; 445, 1.6% of total). Protease classes generating the cuts shift slightly (aspartate, 2.78%; metalloprotease, 15.80%; cysteine, 19.27%; and serine, 62.15%). The regression line for total predicted fragment number versus minimal cut score (Supplemental Figure S5) was y = −152 770x + 165 755, R² = 0.985. Extrapolated to a cut score of 0.3547, total fragments equal total residues, i.e., complete transcript proteolysis. The regression line for long fragment numbers on minimal cut score was y = 19 550x − 15 426, R² = 0.9995. Extrapolation to an intersection with total fragment numbers (cut score = 1.0514) implies an upper bound of long fragments of 5130; a minimum cut score of 1.0 achieves 80% (4110) of that maximum. At a minimal cut score of 0.789, long fragments disappear and total predicted fragments = 45 220, ~ 40% of the total residue number. Among residual peptides at the minimal cut score of 0.9, 1.35% demonstrate amino acid repeats of > 4 residues; most are leucine (0.69%), glutamic acid (0.22%), glycine (0.12%), proline (0.1%), serine (0.07%), or arginine (0.07%), consistent with leucine zippers in leader sequences or probable charged alpha helices elsewhere. The low proportion of total repeat residues suggests these sequences do not dictate the overall cleavage patterns observed. Visually, the scatter of predicted cleavages across transcripts is nonrandom. Patterns are shared by hormone family members and the clustering of cleavage sites is tight in some cases, loose in others.
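The extrapolations quoted from the two regression lines can be checked with a few lines of arithmetic. The sketch below uses only the slopes, intercepts, and total residue count reported in the text; the variable and function names are illustrative assumptions, and small differences from the published figures reflect rounding of the reported coefficients.

```python
# Regression coefficients as reported in the text (x = minimal cut score).
TOTAL_SLOPE, TOTAL_INTERCEPT = -152_770.0, 165_755.0  # total predicted fragments
LONG_SLOPE, LONG_INTERCEPT = 19_550.0, -15_426.0      # long fragments
TOTAL_RESIDUES = 111_561                               # summed length of 461 transcripts


def total_fragments(score: float) -> float:
    return TOTAL_SLOPE * score + TOTAL_INTERCEPT


def long_fragments(score: float) -> float:
    return LONG_SLOPE * score + LONG_INTERCEPT


# Cut score at which predicted fragments equal total residues (complete proteolysis).
score_complete = (TOTAL_RESIDUES - TOTAL_INTERCEPT) / TOTAL_SLOPE

# Cut score at which the long-fragment line reaches zero.
score_no_long = -LONG_INTERCEPT / LONG_SLOPE

# Cut score at which the two regression lines intersect, and the implied
# upper bound on long fragments.
score_cross = (TOTAL_INTERCEPT - LONG_INTERCEPT) / (LONG_SLOPE - TOTAL_SLOPE)
long_upper_bound = long_fragments(score_cross)

# Total fragments remaining when long fragments vanish, as a share of residues.
total_at_no_long = total_fragments(score_no_long)
share_of_residues = total_at_no_long / TOTAL_RESIDUES

print(round(score_complete, 4), round(score_no_long, 3), round(score_cross, 4))
print(round(long_upper_bound), round(total_at_no_long), round(share_of_residues, 2))
```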
It appears protein hormones have evolved so portions of the transcripts are proteolytically resistant to multiple classes of protease, thereby preserving some proteolytic fragments longer in vivo. Simultaneously, other transcript portions are targeted by multiple protease classes, thereby ensuring rapid destruction.

Protein motif boundaries

The nested information paradigm was explored initially with human interleukin and CXC chemokine subfamilies, which work together and must respond synchronously to similar stimuli during innate immunity processes, e.g., leukocyte migration. This was extended to the FGF family and then to the entire catalog of canonical sequences in the human protein hormone proteome. Several conserved domains in a segment of the FGF family multiple sequence alignment are apparent from the Quality and Consensus plots at the bottom of Figure 4, Supplemental Figure S6. Although no pattern is apparent for many predicted cleavages, a notable number coincide with domain borders, e.g., residues 165, 167, 170, 187, 197, 211, 213, 243, 252, 253, 256, 258, and involve multiple proteases or protease families. This was also seen in ILs and CXC chemokines: some borders of conserved family and inter-family motifs are also common with predicted protease cuts (not shown). The domains and cleavages often flank structural features in 3D models of these proteins, many in areas open to solvent. Similar, less common, co-alignments are also found in FGFs. Coincidence of these patterns with different proteases acting on different members of a hormone family implies evolutionary pressure to conserve the protein breakdown pattern and at least temporary availability of freed domains. Shared patterns for ILs and CXC chemokines also suggest a possible mechanism, via secondary actions of common proteolytic products, by which these hormones might coordinate during innate immunity.
Figure 4 MUSCLE sequence alignments for FGFs 1–12, 14, and 16–23, residues 150–264, demonstrating conserved domains, are shown overlain with proteolytic cleavage site predictions from FGF transcripts. Conserved sites of high quality coinciding with protein domain borders are indicated by * above the aligned sequences. Reported PROSPER cleavage predictions for eight proteases from three protease superfamilies are shown with the corresponding color over the P1 residue of the predicted cleavages. The proteases (each shown in its own color in the figure), their protease superfamilies, their MEROPS IDs, and the % accuracy, % sensitivity, and % specificity of the PROSPER predictions were: cathepsin K, cysteine, C01.036, 79.6, 47.1, and 90.6; matrix metallopeptidase-2, metallo-, M10.003, 87, 77.4, and 90.2; matrix metallopeptidase-9, metallo-, M10.004, 81.2, 28.9, and 98.6; matrix metallopeptidase-3, metallo-, M10.005, 79.9, 33.6, and 95.4; matrix metallopeptidase-7, metallo-, M10.008, 81.6, 31.6, and 98.2; elastase-2, serine, S01.131, 82.9, 37.8, and 98; cathepsin G, serine, S01.133, 81, 71.6, and 84.1; thrombin, serine, S01.217, 90.2, 64.9, and 98.6.

Maps of hormone families, e.g., Figure 4, Supplemental Figure S6, also suggest loci where probable cleavage sites were missed by the prediction software, possibly due to low predictive sensitivity. Some cleavages common across a protein family are probably missed, e.g., in positions 165, 167, 213, 243, 252, and 291, as the same amino acid residue or a conservative replacement is indicated as not cleaved when the most similar residues in family members are predicted as cleaved. Note that these inconsistent predictions may provide a way to estimate the false-negative (FN) rate for the transcripts examined.

Network neighborhoods (NN)

Three examples (follistatin [P19883], gremlin 2 [Q9H772], and TNFα [P01375]) using STRING, restricted to searches of experimental, database, co-expression, and homology data with a maximum of 10 neighbors, are shown in Figure 5 (Supplemental Figure S7, color) for PPMPs found using BLASTp in UniProt.

Figure 5 STRING Neighbors for proteins containing peptides that match hormonal proteolytic peptides. In the networks the red ball is the central node of the neighborhood. Combined scores indicate strong, probable protein–protein interactions with neighbor proteins (PPMP partners, PPMPPs).
(A) DOCK4 residues 1189–1197 are a match for a protease resistant follistatin peptide, residues 42–72; DOCK4 interacts with at least 10 other proteins. (B) CHRNB3 residues 401–409 are a match for protease resistant gremlin 2 residues 105–113 (positions 26–34 of the resistant peptide, residues 80–115); CHRNB3 interacts with at least 10 other acetylcholine receptor subunits. (C) LRRC32 residues 239–253 are a match for protease resistant TNF-α residues 96–110; experiments show LRRC32 interacts with three adhesion G protein-coupled receptors.

For follistatin, residues 42–72 (CQVLYKTELSKEECCSTGRLTSWTEEDVND) form one of 15 proteolytically resistant peptides of >10 residues found for this protein. Human DOCK4, Dedicator of cytokinesis protein 4 (Q8N1I0), residues 1189–1197 (YKTELNKEE), are a match (E: 0.76; Score: 62; Identity: 88.9%; and Positives: 88.9%) for follistatin residues 46–54 (YKTELSKEE). STRING results for DOCK4 show 10 potential PPMPPs, all signal transduction or microtubule related (Figure 5A).
Similarly, a randomly chosen, proteolytically resistant, 36-residue gremlin 2 (Q9H772) peptide (80–115, QTVSEEGCRSRTILNRFCYGQCNSFYIPRHVKKEEE) matches (E: 1.1; Score 62; Identity: 88.9%; and Positives: 88.9%) a peptide (401–409, YISRHVKKE) in a nonparent protein, neuronal acetylcholine receptor subunit β-3 (Q05901). The STRING neighborhood analysis is shown in Figure 5B, the interactors being other subunits of the acetylcholine receptor. The proteolytically resistant TNFα (P01375) peptide, residues 90–111 (AHVVANPQAEGQLQWLNRRANA), matches (E: 0.82; Score: 59; Identity: 66.7%; and Positives: 73.3%) TGFβ activator LRRC32 (Q14392), residues 239–253 (PQAEFQLTWLDLREN); the STRING neighborhood is in Figure 5C. The structures in all 26 of the NN protein node symbols in Figure 5 indicate there are 3D structures (PDB) or models (Swiss Model Repository) available to explore how the protein–protein interactions shown physically occur. Data indicate >78% of the PPMPs and PPMPPs have known PDB structures or models available.

Matched sequence locations on PDBs

Three 3D models (solved structures from crystals, NMR results, or computational models based on solved structures) are shown in Figure 6 (Supplemental Figure S8, color) for the PPMPs explored in Figure 5. Locations of the BLASTp matched peptides are circled: in DOCK4, residues 1189–1197, a match for follistatin, residues 46–54; in neural acetylcholine receptor β-3, residues 401–409, a match for gremlin 2, residues 105–113; and in LRRC32, residues 239–253, a match for TNF-α, residues 96–110. The matched residues are: at the exterior turn connecting two helices near one surface of the DOCK4 protein, Figure 6A; at a near-random exterior loop of acetylcholine receptor β-3, Figure 6B; and at an exposed surface loop of LRRC32, Figure 6C. Thus far, most of the matched peptides map to the surfaces of the PPMPs.
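The identity percentages and residue numbering quoted for the three matched peptides can be reproduced directly from the sequences given in the text. The sketch below is a simplification: it assumes ungapped, equal-length alignments and does not reproduce BLASTp scoring, E values, or "Positives". The function names are illustrative, not part of any BLAST tool.

```python
from typing import Tuple


def percent_identity(a: str, b: str) -> float:
    """Percent of identical residues in an ungapped, equal-length alignment."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length for an ungapped comparison")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def locate(peptide_start: int, peptide: str, motif: str) -> Tuple[int, int]:
    """Map a motif found inside a peptide back to parent-protein numbering.

    peptide_start is the 1-based parent residue number of the peptide's
    first residue; the returned range is inclusive.
    """
    offset = peptide.find(motif)
    if offset < 0:
        raise ValueError("motif not found in peptide")
    start = peptide_start + offset
    return start, start + len(motif) - 1


# Sequences copied from the text: (hormone segment, matched PPMP segment).
pairs = {
    "follistatin/DOCK4": ("YKTELSKEE", "YKTELNKEE"),
    "gremlin 2/CHRNB3": ("YIPRHVKKE", "YISRHVKKE"),
    "TNF-alpha/LRRC32": ("PQAEGQLQWLNRRAN", "PQAEFQLTWLDLREN"),
}
identities = {name: round(percent_identity(h, m), 1) for name, (h, m) in pairs.items()}

gremlin_range = locate(80, "QTVSEEGCRSRTILNRFCYGQCNSFYIPRHVKKEEE", "YIPRHVKKE")
tnf_range = locate(90, "AHVVANPQAEGQLQWLNRRANA", "PQAEGQLQWLNRRAN")
print(identities, gremlin_range, tnf_range)
```

The mapped ranges show why the gremlin 2 match can be described either as residues 105–113 of the protein or as positions 26–34 of the resistant peptide.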
Figure 6 Locations of the residues that match proteolytically resistant residues from the protein hormones examined in Figure 5 are circled. All are at the exteriors of their respective proteins, where they are apt to play roles in interacting with their identified network neighborhood proteins. (A) is the Swiss Model Repository structure 3b13.1.A of human DOCK4; residues 1189–1197, circled in red, are a match for follistatin residues 46–54. (B) is ModBase Model e34e9351aab0f15ce364f4bd7825ef21 of human neural acetylcholine receptor β-3; residues 401–409, circled in red, are a match for gremlin 2 residues 105–113. (C) is a single entity 3 extract from PDBe 6gff (LRRC32 in complex with latent TGF-β1 and MHG-8 Fab); residues 239–253, circled in red, are a match for TNF-α, residues 96–110.

Protein–protein modulation in PPMP–PPMPP co-crystals

Figure 7 (Supplemental Figure S9, color) shows a co-crystal, PDB 1m4u, of two molecules of noggin (Q13253), an inhibitor of bone morphogenetic proteins (BMP4, BMP6, and BMP7), with two molecules of BMP7 (P18075).
BMP7, residues 335–344, LYVSFR-DLGW, (equivalent to residues 43–52 in each half of the mirror-image crystal) is a PPMP for a sequence found in TGFβ1 (P01137), residues 298–309, LYIDFRKDLGW (BLASTp values, E: 1.2; Score: 56; Identities: 73%; and Positives: 81%). The interface (space-filled) between noggin and BMP7 is a sequence conserved between TGFβ1 and BMP7. The original proteolytic peptide from TGFβ1 may interact with the known BMP7 soluble modulator noggin. Although BMP7 is a member of the larger TGFβ1 family, the co-crystal demonstrates the analytical approach can find contact regions of PPMPs and PPMPPs that are susceptible to presence of the original matched peptide.

Figure 7 PDB 1m4u, co-crystal of two molecules of BMP7, pink, and two molecules of noggin, blue. The space-filled residues in BMP7 match a proteolytically resistant peptide from TGFβ1. The space-filled residues in noggin interact with the same BMP7 residues.

Proteolytic cleavage alignment with exon junctions

Co-maps of exon boundaries and predicted proteolytic cleavage sites for exon 3 in 29 protein hormones (Supplemental Table 2) covering 11 different hormonal families showed strong associations averaging two amino acids C- and N-terminal to the exact exon boundary in the encoded proteins. Proteolysis predictions (cuts/protein ± SD, 41.03 ± 10.96, n = 192) within a window of ±6 residue bonds at exon boundaries exceeded those found across the encoded exon proteins as a whole (39.78 ± 8.28, n = 426), Student’s t = 2.677, P < 0.005.
A histogram of frequency of cleavages per peptide bond in the third exon versus the number of residues from a known exon junction showed marked elevation of cut rates near the N-terminal beginning of the exon (position +1) and again near the C-terminal end (position 0). A χ² value (14.2, P < 0.001) was computed for observed cleavage locations inside versus outside a ± 10 residue window near the exon boundaries. Analysis of all 461 canonical transcripts (1939 exons) from the human protein hormone catalog was run on the predicted cleavages at each of the minimal cut scores 0.9, 0.95, and 1.0. Each run yielded a similar pattern. The two-way ANOVA on cuts/total available bonds at a given position, arrayed by minimal cut scores and by position relative to an exon boundary, was highly significant, F = 300, P << 0.001, for the effect of cut scores (versus residual error). The effect of position was also significant, F = 2.39, P < 0.01. At a cut score of 0.9, the comparison of all predicted cleavages beyond ± 10 residues of an exon junction versus those within that analytical window for all transcripts and exons was more modest, χ² = 5.38, P < 0.05, suggesting there were more cuts near exon junctions. At cut scores of 0.95 and 1.0, respectively, χ² was 2.09, P < 0.15, and 3.21, P < 0.07, suggesting more predictive specificity eliminated the observed cut and exon junction coincidence. However, extending the χ² analyses to individual positions in the analytical windows (Figure 8) revealed position-specific differences; e.g., at cut score 0.95, the cleavage frequency range was 0.1667–0.2249 relative to a background rate of 0.1932 ± 0.0002 outside the evaluation windows. At positions 0, +3, and +8, i.e., at the exon junction, 3 residues C-terminal to the junction, and a full proteolytic binding site distance C-terminal to the exon junction, frequencies were 0.2249, 0.2218, and 0.2174, respectively, each P < 0.01 (χ²) relative to background.
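The window-versus-background comparisons above rest on χ² tests of 2×2 tables. The sketch below shows one common form of that test, a Pearson χ² of independence without continuity correction on cleaved/uncleaved bonds inside versus outside a window. The table layout is an assumption and may differ from the expected-versus-actual matrices the authors describe, and the counts are hypothetical, not data from the study. With one degree of freedom the P value can be obtained from the complementary error function in the standard library.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square for the table [[a, b], [c, d]] with 1 degree of freedom.

    Assumed layout: rows = inside / outside the junction window,
    columns = cleaved / not cleaved. Returns (chi2, p_value).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must contain at least one observation")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 df, chi2 is the square of a standard normal deviate, so the
    # upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts of peptide bonds (not taken from the study).
inside_cleaved, inside_uncleaved = 450, 1550
outside_cleaved, outside_uncleaved = 1900, 8100
chi2, p_value = chi_square_2x2(
    inside_cleaved, inside_uncleaved, outside_cleaved, outside_uncleaved
)
print(f"chi2 = {chi2:.2f}, P = {p_value:.4g}")
```

The same function applies to a single position by replacing the "inside window" row with the bonds at that position.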
A frequency nadir relative to background of 0.1667 (P < 0.01) was also noted at −6, not quite a proteolytic binding site width, N-terminal to the exon location; position −1 was also low, 0.1696 (P < 0.02). This pattern was repeated when the cut score was increased to 1.0 (background 0.1249 ± 0.0002) with the same residues showing excess or deficit cleavage rates (−6, 0.0994; −1, 0.0958; 0, 0.1439; +3, 0.1472; +8, 0.1422; all P < 0.02) relative to residues distant from the exon junctions. Alternative comparisons of positional cleavage rates using odds ratios (not shown) identified the same variant residues. Collectively, the results indicate perturbed cleavage frequencies close to exon junctions.

Figure 8 Coincidence of predicted proteolytic cuts and exonal junctions in 461 canonical transcripts of protein hormones. PROSPER was run with a minimal cut score of 0.95. Mean rates of cutting at positions within 10 residues of a known exonal junction are shown relative to a baseline obtained using all cuts in the protein, 0.1932 ± 0.0002; 0 is the residue immediately N-terminal to the exon junction; the band corresponding to ±95% CI around the baseline mean is shown by the dotted lines. Residues with a coincidence probability, versus the baseline, of <0.01 (χ²) are shown by the asterisks.

Discussion

The studies of Li et al.
[34–36] on the breakdown of pro-opiomelanocortin, POMC, and subsequently those of Goldstein et al. [75] on dynorphins, Bateman et al. [76] on granulins, and Rehfeld et al. [77] on gastric hormones, showed cascades of peptide messengers were telescoped into single transcripts that gradually released the nested hormones as they were proteolytically processed in the synthetic cell, during circulation, and/or in the primary target cells. Was this a more general pattern for all protein hormonal transcripts? Studies of in vivo breakdown of 125I α/131I β hCG in the rat [78, 79] suggested the two hormonal subunits were processed differently and that acid-precipitable (large) fragments of the beta-subunit persisted longer in some ovarian target cells (granulosa) than in others (luteal). Did that mean more protein hormones fit the POMC pattern? Since Campbell et al. [78, 79], other examples of nested hormonal cascades akin to POMC [76, 77, 40] have appeared, as well as a slow stream of reports on single hormones indicating some hormone target cells do generate protein hormone breakdown products that are biologically active [23–25]. Still, not until completion of the Human Genome Project [80] and development of a series of reliable proteolysis prediction tools (EXPASY [44], SitePrediction [16], PROSPER [13]) was it possible to begin a broad attack on the nested hormone problem. If most protein hormone transcripts do include multiple peptide messages that may target primary target cells, secondary target cells, circulating carrier proteins, receptors for the primary hormone or other physiologically related hormones, or target cell intracellular metabolic pathways (Figure 1), then full description of protein hormone peptide breakdown patterns and testing of their proteolytically resistant fragments would open many new avenues to modulation and exploration of endocrine systems. 
This study has begun that process by producing: (1) an outline, with examples, of an investigative pathway leading from predictions to in vivo tests of how proteolytically resistant protein hormonal peptides may modulate protein–protein interactions (Figures 3–7); (2) an initial catalog listing: currently known soluble human protein hormones, their canonical transcripts (Supplemental Table 1), their predicted proteolytic fragmentation patterns (e.g., Figures 3 and 4), their predicted secondary structures, and their matched exon junction maps (Supplemental Table 1); (3) a numerical analysis of predicted proteolytic patterns observed for all canonical protein hormone transcripts; and (4) a coincidence analysis of predicted proteolysis locations for all canonical protein hormone transcripts and their exon junctions (Figure 8). A few randomly chosen examples provide insufficient data for a firm conclusion regarding generalized behavior of protein hormones as nested information systems, so a collated list of secreted human protein hormones has been a key tool in this study. Individual lists compiled from endocrine and physiological literature (http://www.peptides.be/; http://endonet.bioinf.med.uni-goettingen.de/browse/hormoneClasses, [49]) by 2014 did not approach estimates suggested in molecular genomic studies. The Human Proteome Project [81] lists ~ 1500 proteins as secreted with ~ 900 being extracellular, including several hundred enzymes and carrier proteins. Recently, Jiang et al. [82] working from mRNA body-wide human tissue expression patterns found “… that many (501/1902) predicted secretory proteins have good protein/RNA concordance … It is likely that these proteins undergo regulated secretion in which proteins are stored in secretory vesicles and released upon stimulation …” The catalog compiled for the current study approaches 500 canonical members and has > 900 total transcripts, numbers approximating these newest, purely molecular, secretome results. 
No doubt our catalog has gaps; several listings were added during the study from new publications or new UniProt entries while others were removed when UniProt or e!Ensembl curation showed them to be pseudogenes or alternative transcripts from actual in vivo mRNA. This catalog is a dynamic tool capable of responding to such changes and to other users’ additions. Results here capture behavior of proteins in the hormonal portion of the secretome and provide guidance for exploring newer additions. The study depends on well-validated proteolytic prediction software. Although several programs, e.g., EXPASY [44], SitePrediction [16], an in-house program, Protein Cyberase, based on EXPASY [83], were tried initially, most used only primary amino acid sequence information and formal sequence pattern rules to predict cleavages arising from a few common proteases. However, Song et al. have developed several newer programs (PROSPER [13]; PROSPERous [15], iProt-Sub [84]) which use SVM techniques to optimize program operation and inclusion of more structural and environmental information into their predictions. The developers use half of proteome-wide datasets to build the predictive algorithms and the remainder of the data to verify the predictions against published information in databases like MEROPS [14]. The most thoroughly tested of these programs is PROSPER which generates results for 24 proteases from the four common protease families (aspartate, metallo-, cysteine, and serine). PROSPER generates results for each of the proteases individually as it scans the primary protein substrate sequence from N- to C-terminus. It combines results for all 24 individual proteases and reports the data including scores that correlate with cleavage probability at each substrate peptide bond. PROSPER does not consider how the action of one protease or specific cleavage will affect the action of the same or other proteases at other substrate bonds. 
It also does not consider how common protein modifications, e.g., glycosylation, acetylation, methylation, phosphorylation, or lipidation, would affect cleavage predictions. Still, overall prediction performance of the program for all 24 proteases tested on proteome-sized databases was very good: sensitivity (TP rate) 80 ± 17%, specificity (TN rate) 92 ± 4%, and accuracy ((TP + TN)/Total) 89 ± 7%. In using PROSPER, this study favored specificity and accuracy in predictions over sensitivity. MUSCLE alignments of PROSPER predicted proteolysis patterns for several families of protein hormones (Figure 4, Supplemental Figure S6) show overall consistency across families, but gaps, demonstrated by non-cleavage in similar sequences cleaved in most family members, are consistent with low sensitivity among the samples used. Adjusting minimal cut scores within the program helps clarify patterns as program sensitivity and specificity vary. Patterns seen at cut scores of 0.95 and 1.0 are most similar and agree most closely with the Web version of PROSPER, which adjusts cut scores for individual proteases; they appear to mimic physiological results. The newest predictive programs, iProt-Sub [84] and PROSPERous [15], include 38 and 90 validated proteases, respectively, from the same four protease families. Inclusion of all possible forms of protease in each of the major protease families is redundant in the present study as many will cleave at the exact same loci. Indeed, even results for the 24 proteases in PROSPER demonstrate frequent, cleavage-specific redundancy among protease family members. Ultimately, protease cleavage prediction needs to incorporate: effects of shifting pH within physiological compartments (Figure 1, Supplemental Figure S1); effects of temporal order of a series of proteases (Figure 1); actions of a protease on structural reorganization of the substrate protein; and effects of known protein modifications on proteolytic actions. Most proteins include globular domains.
External surfaces of these domains, along with exposed random coils, are the actual interfaces interacting with proteases. Once a protease acts, the resulting peptides alter their secondary, tertiary, and quaternary structures. That reorganization may permit, or prevent, further action of the initial protease, or subsequent proteases, on the product peptides. Likewise, presence of protein modifications will alter protease access or actions. Algorithms that track changing substrate structures and follow them through a physiologically relevant series of proteases (an idea we term onion peeling) will improve predictive program performance and provide an important tool to further develop our findings. These collective proteolytic predictions suggest few protein hormone transcripts follow the dogmatic pattern of synthesis, primary action, and rapid, complete proteolysis. Persistent resistance to 24 proteases from four protease subclasses of fragments >10 amino acids in length for all canonical transcripts examined suggests incomplete transcript digestion is the rule (at minimal cut scores of 1.0, 0.95, and 0.9 persistent fragments per transcript numbered 9 ± 7, 7 ± 5, and 5 ± 4, respectively). Persistent fragment size matches known hormonal peptides (e.g., oxytocin, LHRH, MSH, angiotensin II, somatostatin), and BLASTp matches to other human proteins in the proteome (Figures 5–7) agree with published sizes of protein–protein interaction motifs [85]. The commonly identified surface location of those peptides matched by BLASTp to portions of proteolytically resistant protein hormonal peptides is exactly as expected if the matched peptides are involved in protein–protein interactions. And, it is exactly where expected if the protein hormonal peptides play a modulatory role, even transiently, in those same protein–protein interactions. 
Proteins displaying BLASTp matches include: other hormones, hormone receptors, hormonal binding proteins, transduction factors, transcriptional factors, cell cycle modulators, and cellular enzymes. It remains unclear, pending a complete scan of all BLASTp results for all proteolytically resistant protein hormone peptides, if there is a pattern of pairing between resistant fragments, or fragment sizes, and classes of proteomic partners. But it is clear the frequency of matches suggests nuances of endocrine and physiological controls extending beyond current classical models of hormone actions and controls. The idea that macromolecules as complex as proteins serve as singular messages working only through known signal transduction pathways is a dogma that may need revision, especially given possible escape of resistant peptides from endosomal/lysosomal vesicles and the repeated finding of sequence matches for proteolytically resistant protein hormone peptides in other intra- and extracellular proteins while, simultaneously, sequences as small as the TRH tripeptide prove unique in the protein hormone proteome to that hormone. Are proteolytically resistant peptides non-canonical signals? Intracellular modulation of metabolic pathways by proteolytically resistant peptides would parallel the role of small RNAs in transcription and translation [22]. Although this study cannot determine the biological lifetime of the predicted peptides, it opens the way for biochemical and genetic tests of their function. Most are small enough to be synthesized in bulk yet large enough to generate sequence-specific antibodies, the basis of high-throughput biomarker screens. Many may already exist in biological or pharmaceutical peptide libraries. Their small size allows them to be identified in biological samples by sensitive physical methods, e.g., HPLC, MS, or MALDI-TOF MS.
Relationships to specific genes allow molecular modification of peptide or protease production or action via genomic methods. Investigation of peptide modulatory function follows NN literature on protein–protein interactions between PPMPs and their PPMPP matched partners. Cross-species comparisons may provide information on preservation of proteolytic breakdown patterns that allow access to specific motifs or functions encoded by the exons of modern proteins and/or on whether there are archaic open reading frames (ORFs) in some species that correspond to some fragments described. Computational programs that predict secondary and tertiary structure, or proteolytic breakdown, depend heavily on primary transcript sequence. It is impossible to completely disentangle these structural levels to look independently at them or at proteolysis. But primary protein sequences and exon codes within DNA ORFs lie biochemical steps apart, so exon translations may be treated as independent in correlational or coincidence analysis. Sequences with a high proportion of predicted proteolytic cleavages in the canonical transcripts are preserved across family members (Figure 4, Supplemental Figure S6, positions 195–208). Those elevated proportions suggest these transcript segments are near protein surfaces, exposed to protease actions, favoring their cleavage and removal during transcript metabolism and hormonal clearance. An evolved structure favoring proteolytic peptide removal mirrors an evolved structure favoring peptide preservation; it prevents peptides that could interfere with homeostatic cellular or organismal function from persisting. Protein hormone families are defined by similarities in primary and secondary structures and in functional motifs. Predicted proteolytic cleavages also align with motif boundaries (Figure 4 and Supplemental Figure S6).
Lack of complete agreement in these patterns across family members reflects changes in higher order structure dictated by unique primary sequences of deviant family members, changes that affect algorithm parameters beyond primary sequence, or lack of sensitivity in prediction programs. Alignment of cleavage motifs with edges of structural/functional motifs also suggests proteolysis plays a role in liberating fragments common among family members that play roles beyond those of their source transcripts. The patterns suggest conservation of proteolytic paths as ways to access structures or functions of organismal importance. They are consistent with protein hormones including layers of signals nested in their primary sequences. Coincidence of predicted proteolysis sites with transcribed exon junctions was unanticipated. Elevated rates of cutting occur at the exon junction (position 0) and at 3 and 8 residues C-terminal to it, and decreased rates occur at 6 residues and 1 residue N-terminal to it (Figure 8). Most elevated rates lie at the C-terminus of one exon or the immediately following N-terminus of the next exon. As in gene transcription or mRNA translation, processing efficiency favors the beginning of exons. Full explanation of the pattern in Figure 8 requires examining coincidences with exon junctions for each class of protease, and possibly for each individual protease. The peak of cleavage at +8 is expected if proteases cleave preferentially near an exon junction and, again, upstream by one protease binding site. Is there a process underlying the patterns? If evolution has preserved proteolytic target sites to allow excision of important cellular or physiological structures or information, the patterns agree with findings of proteolysis at the edges of structural/functional motifs.
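As a hedged illustration of the coincidence tally discussed above (not the authors' actual pipeline, which relied on PROSPER output and Excel spreadsheets), the following Python sketch counts predicted cleavage sites by signed offset from the nearest internal exon junction and applies the bond bookkeeping defined in footnote 2. The function names, the convention that a junction is indexed by the last residue of the upstream exon, the tie-breaking rule, and the example coordinates are all assumptions made for illustration; the 10-residue window follows the scan width stated in the abstract.

```python
from collections import Counter


def junction_offset_counts(cleavage_sites, junctions, window=10):
    """Tally cleavage sites by signed offset from the nearest exon junction.

    Assumed convention (hypothetical): residues are numbered from 1, a cleavage
    site is the residue whose C-terminal amide bond is cut, and a junction is
    the last residue of the upstream exon. Offset 0 is a cut at the junction;
    positive offsets are C-terminal to it, negative offsets N-terminal.
    """
    counts = Counter()
    if not junctions:
        return counts
    for site in cleavage_sites:
        # Nearest junction; ties go to the more N-terminal junction.
        nearest = min(junctions, key=lambda j: (abs(site - j), j))
        offset = site - nearest
        if abs(offset) <= window:
            counts[offset] += 1
    return counts


def bond_bookkeeping(length, n_exons, cleavage_sites):
    """Per-transcript totals following the definitions in footnote 2."""
    total_junctions = n_exons + 1
    external_junctions = 2
    total_bonds = length - 1
    total_cleavages = len(set(cleavage_sites))
    return {
        "internal_junctions": total_junctions - external_junctions,
        "total_bonds": total_bonds,
        "total_cleavages": total_cleavages,
        "bonds_not_cleaved": total_bonds - total_cleavages,
    }


# Hypothetical 30-residue transcript with three exons (junctions after
# residues 10 and 20) and six predicted cleavage sites.
example_sites = [3, 10, 13, 18, 19, 28]
example_junctions = [10, 20]
offsets = junction_offset_counts(example_sites, example_junctions)
totals = bond_bookkeeping(length=30, n_exons=3, cleavage_sites=example_sites)
print(sorted(offsets.items()))
print(totals)
```

Summing such offset tallies over all transcripts, and comparing the cleavage rate inside the junctional windows with the rate outside them, would yield a per-position profile of the general kind summarized in Figure 8. The statistical test applied to that comparison is not reproduced in this sketch.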
Excision of secondary messages or metabolically functional fragments fits the hypothesis that all protein hormones (and possibly non-hormonal proteins) are nested information systems in which the gene transcripts carry multiple physiological chemical signals that are gradually released during the biological lifetime of a protein molecule, including its degradation. Preservation of proteolytic pathways provides access to structures or information from archaic ORFs that have been fused in modern protein exons. Telescoping of structural or functional information into more complex modern transcripts allows organisms to simplify their genomes by culling simpler, archaic genes. This nested information idea is a provocative extension of the "genes in pieces" model, allowing old fused information to be accessed without need for yet newer fused genes. Do coincidences of proteolytic cleavages and exonal junctions apply to all proteins? The nested information systems concept serves to augment, or even replace, current models of hormone signaling. The idea helps build a deeper understanding of the various ways protein hormones are regulated, transcribed, and eliminated, and of the mechanism(s) by which they access and maintain multiple functions.
Conflict of interest
Neither the authors nor their trainees have any conflicts of interest to declare.
Authors’ contribution
KC initiated and coordinated the project, designed the study outline, assembled the first version of the Soluble Human Protein Hormone Proteome, co-mentored the MS trainee and 20 of the undergraduate trainees assisting on the project, collated and analyzed all the Excel spreadsheets for the fragmentation and coincidence studies, and drafted, co-edited, and submitted the manuscript and all the figures and tables.
NH helped design the computation-intensive portions of the project, ran and debugged the in-house version of PROSPER, generated the fragmentation prediction and coincidence data, mentored the PhD student, co-mentored the MS student, trained at least four undergraduate trainees, helped design and implement the SQL version of the catalog, evaluated all Excel and PDB analyses, and co-edited the manuscript and all figures and tables. CG manually co-mapped predicted proteolytic sites in the third exon of 29 transcripts and discovered the elevation of proteolytic cleavage near exon junctions. NK set up the Soluble Human Protein Hormone Proteome database for the SQL version. IN co-mapped predicted proteolytic fragmentation and structural motifs of the ILs and CXC chemokines. NS independently verified PROSPER by hand using data from 51 human protein transcripts and assisted with the initial manual counts of proteolytic fragment length predictions for all 461 canonical human protein hormone transcripts. UV wrote the Protein Cyberase proteolysis prediction software, analyzed several gonadotropins for probable breakdown, and identified PPMPs using BLASTp and LabelHash.
Acknowledgments
The senior authors (KC, NH) wish to thank the 26 trainees who have worked with us over the past decade to develop and explore pieces of this project as well as related tangents. They have kept the ideas and data accumulating, prompted us to think and write more clearly, and encouraged our efforts as well as those of their peers: PhD trainee: Nuzulul Kurniatash (2016–2017); MS Thesis: Uma Vadher (Bioactive Proteolytic Fragments of Gonadotropins, Department of Biology, UMB, 2011); Honors Theses: (2012) Briana Mason, Jeremy Steinbruck, Sherry Solchenberger; (2013) Barbara Dominas; (2014) Ron Bigos, G.
Florent Taguzem (IMSD Fellow); (2015) Mike Vilme (McNair Fellow); (2016) Indira Nouduri (McCone Awardee), Seraphina Yang (McNair Fellow); (2017) Cassandra Gath (Sanofi Fellow, Oracle Fellow, Biology Research Awardee), Ahmad Hasaba, Naomi Stuffers; (2018) Mohamed Anwar (McNair Fellow, Sanofi Fellow), Dennis Cheng, Brian Hall; Other Trainees: Noelle Palmstrom (2011–2014), Nicholas Bacon (2012–2013), Charles Citrone (2012–2013), Wagner Calixte (2013, Bridges Fellow), Vanessa Ford (2014, Bridges Fellow), Tara Gabriel-Richards (2014, Bridges Fellow), Sulaiman Abdul Hadi (2017–2018, Bridges Fellow), and James Hoang (2017–2018). KC also acknowledges Alexandra and Ghotas Evindar for their 1999 work on MALDI-TOF MS of FSH digests, a physical attempt to look at hormone metabolism.
Footnotes
1 The proteome spreadsheet, including sequences, known breakdown products, predicted proteolytic fragmentation, secondary structure elements, and exon maps for each canonical transcript isoform, is being transferred to an SQL database to improve utility and as a prelude to public access to the database.
2 Computationally: total number of junctions for a given transcript = (number of exons + 1); number of external junctions for a transcript is 2; number of internal junctions = [(total number of junctions) − (number of external junctions)]; number of junctions for multiple transcripts = (sum of junctions for individual transcripts); total number of cleavages counted for the entire transcript (which always occur on the C-terminal side of any given residue at the amide bond) = [(cleavages within the junctional scan windows) + (cleavages outside the windows)]; total bonds susceptible to cleavage = (total length of the transcript − 1); total bonds not cleaved = [(number of total bonds) − (number of total cleavages)]; the values for multiple transcripts are the sums for the individual transcripts.
References
1. Roberts RJ. An amazing distortion in DNA induced by a methyltransferase. Biosci Rep 1994; 14:103–117.
2. Sharp PA. Split genes and RNA splicing. Cell 1994; 77:805–815.
3. Gilbert W. Why genes in pieces? Nature 1978; 271:501.
4. Asokan A, Cho MJ. Exploitation of intracellular pH gradients in the cellular delivery of macromolecules. J Pharm Sci 2002; 91:903–913.
5. Guha S, Padh H. Cathepsins: fundamental effectors of endolysosomal proteolysis. Indian J Biochem Biophys 2008; 45:75–90.
6. Bond JS, Butler PE. Intracellular proteases. Annu Rev Biochem 1987; 56:333–364.
7. Agarwal SK. Proteases Cathepsins – A view. Biochem Educ 1990; 18:67–72.
8. Brix K, Dunkhorst A, Mayer K, Jordans S. Cysteine cathepsins: Cellular roadmap to different functions. Biochimie 2008; 90:194–207.
9. Ciechanover A. Intracellular protein degradation from a vague idea through the lysosome and the ubiquitin-proteasome system and on to human diseases and drug targeting; Nobel Lecture, December 8, 2004. Ann N Y Acad Sci 2007; 1116:1–28.
10. Dickinson DP. Cysteine peptidases of mammals: Their biological roles and potential effects in the oral cavity and other tissues in health and disease. Crit Rev Oral Biol Med 2002; 13:238–275.
11. Lecaille F, Bromme D, Lalmanach G. Biochemical properties and regulation of cathepsin K activity. Biochimie 2008; 90:208–226.
12. Turk B, Turk D, Turk V. Protease signalling: The cutting edge. EMBO J 2012; 31:1630–1643.
13. Song J, Tan H, Perry AJ, Akutsu T, Webb GI, Whisstock JC, Pike RN. PROSPER: An integrated feature-based tool for predicting protease substrate cleavage sites. PLoS One 2012; 7:e50300.
14. Rawlings N, Barrett AJ, Finn RD. Twenty years of the MEROPS database of proteolytic enzymes, their substrates and inhibitors. Nucleic Acids Res 2016; 44:D343–D350.
15. Song J, Li F, Leier A, Marquez-Lago TT, Akutsu T, Haffari G, Chou K-C, Webb GI, Pike RN. PROSPERous: High-throughput prediction of substrate cleavage sites for 90 proteases with improved accuracy. Bioinformatics 2018; 34:684–687.
16. Verspurten J, Gevaert K, Declercq W, Vandenabeele P. SitePredicting the cleavage of proteinase substrates. Trends Biochem Sci 2009; 34:319–323.
17. Doherty GJ, McMahon HT. Mechanisms of endocytosis. Annu Rev Biochem 2009; 78, 31:31–31.46.
18. Magzoub M, Pramanik A, Graslund A. Modeling the endosomal escape of cell-penetrating peptides: Transmembrane pH gradient driven translocation across phospholipid bilayers. Biochemistry 2005; 44:14890–14897.
19. Ahmed F, Discher DE. Self-porating polymerosomes of PEG-PLA and PEG-PCL: Hydrolysis-triggered controlled release vesicles. J Control Release 2004; 96:37–53.
20. Imelli N, Meier O, Boucke K, Hemmi S, Greber UF. Cholesterol is required for endocytosis and endosomal escape of adenovirus type 2. J Virol 2004; 78:3089–3098.
21. Pei D, Buyanova M. Overcoming endosomal entrapment in drug delivery. Bioconjug Chem 2019; 30:273–283.
22. Mello CC. Return to the RNAi world: Rethinking gene expression and evolution (Nobel lecture). Angew Chem Int Ed 2007; 46:6985–6994.
23. Ramachandran KV, Margolis SS. A mammalian nervous-system-specific plasma membrane proteasome complex that modulates neuronal function. Nat Struct Mol Biol 2017; 24:419–430.
24. Triebel J, Bertsch T, Bollheimer C, Rios-Barrera D, Pearce C, Hüfner M, Martínez de la Escalera G, Clapp C. Principles of the prolactin/vasoinhibin axis. Am J Physiol Regul Integr Comp Physiol 2015; 309:R1193–R1203.
25. Nakajima R, Nakamura E, Harigaya T. Vasoinhibin, an N-terminal prolactin fragment, directly inhibits cardiac angiogenesis in three-dimensional heart culture. Front Endocrinol 2017; 8:1–6.
26. Fineschi B, Miller J. Endosomal proteases and antigen processing. Trends Biochem Sci 1997; 22:377–382.
27. Moss CX, Tree TI, Watts C. Reconstruction of a pathway of antigen processing and class II MHC peptide capture. EMBO J 2007; 26:2137–2147.
28. Cannon MJ, Pate JL. The role of major histocompatibility complex molecules in luteal function. Reprod Biol Endocrinol 2003; 1:93.
29. Santa Coloma TA, Dattatreyamurty B, Reichert LEJ. A synthetic peptide corresponding to human FSH beta-subunit 33-53 binds to FSH receptor, stimulates basal estradiol biosynthesis, and is a partial antagonist of FSH. Biochemistry 1990; 29:1194–1200.
30. Dattatreyamurty B, Reichert LE. Identification of regions of the follitropin (FSH) β-subunit that interact with the N-terminus region (residues 9–30) of the FSH receptor. Mol Cell Endocrinol 1993; 93:39–46.
31. Giudice LC. Growth factors and growth modulators in human uterine endometrium: Their potential relevance to reproductive medicine. Fertil Steril 1994; 61:1–17.
32. Licht P, Losch A, Dittrich R, Neuwinger J, Siebzehnrubl E, Wildt L. Novel insights into human endometrial paracrinology and embryo-maternal communication by intrauterine microdialysis. Hum Reprod Update 1998; 4:532–538.
33. Jmol. Jmol: an open-source Java viewer for chemical structures in 3D. 2014–2020.
34. Dixon J, Li CH. Isolation and properties of corticotropin from bovine pituitary glands. Science 1956; 124:934.
35. Lowry P. 60 years of POMC: Purification and biological characterisation of melanotrophins and corticotrophins. J Mol Endocrinol 2016; 56:T1–T12.
36. Smyth DG. 60 years of POMC: Lipotropin and beta-endorphin: A perspective. J Mol Endocrinol 2016; 56:T13–T25.
37. Kasckow J, Geracioti TDJ. Neuroregulatory Peptides of Central Nervous System Origin: From Bench to Bedside. In: Hormones, Brain and Behavior, vol. V. San Diego, CA, USA: Elsevier Science/Academic Press; 2002:153–208.
38. Dockray GJ, Varro A, Dimaline R, Wang T. The gastrins: Their production and biological activities. Annu Rev Physiol 2001; 63:119–139.
39. Zhang J, Ren P, Avsian-Kretchmer O, Luo C, Rauch R, Klein C, Hsueh A. Obestatin, a peptide encoded by the ghrelin gene, opposes ghrelin’s effects on food intake. Science 2005; 310:996–999.
40. Pocai A. Unraveling oxyntomodulin, GLP1’s enigmatic brother. J Endocrinol 2012; 215:335–346.
41. UniProt Consortium EBIE-E, the SIB Swiss Institute of Bioinformatics, and the Protein Information Resource (PIR). UniProt: The universal protein knowledge database. Nucleic Acids Res 2016; 45:158–169.
42. Rawlings ND, Barrett AJ, Thomas PD, Huang X, Bateman A, Finn RD. The MEROPS database of proteolytic enzymes, their substrates and inhibitors in 2017 and a comparison with peptidases in the PANTHER database. Nucleic Acids Res 2018; 46:D624–D632.
43. Perez-Silva JG, Espanol Y, Velasco G, Quesada V. The Degradome database: Expanding roles of mammalian proteases in life and disease. Nucleic Acids Res 2016; 44:D351–D355.
44. Gasteiger E, Hoogland C, Gattiker A, Duvaud S, Wilkins MR, Appel RD, Bairoch A. Protein identification and analysis tools on the ExPASy server. In: The Proteomics Protocols Handbook. Totowa, NJ, USA: Humana Press; 2005:571–607.
45. Boyd SE, Garcia de la Banda M, Pike RN, Whisstock JC, Rudy GB. PoPS: A computational tool for modeling and predicting protease specificity. J Bioinform Comput Biol 2005; 3:551–585.
46. National Center for Biotechnology Information USNLoM. BLASTp. In: BLAST: National Center for Biotechnology Information, U.S. National Library of Medicine. NCBI; 2014–2020.
47. Moll M, Bryant DH, Kavraki LE. The LabelHash algorithm for substructure matching. BMC Bioinf 2010; 11: Article number 555.
48. Wang J, Youkharibache P, Zhang D, Lanczycki CJ, Geer RC, Madej T, Phan L, Ward M, Lu S, Marchler GH, Wang Y, Bryant SH et al. iCn3D, a web-based 3D viewer for sharing 1D/2D/3D representations of biomolecular structures. Bioinformatics 2020; 36:131–135.
49. Campbell KL, Rockett JC. Biomarkers of ovulation, endometrial receptivity, fertilisation, implantation and early pregnancy progression. Paediatr Perinat Epidemiol 2006; 20:13–25.
50. Madeira F, Park YM, Lee J, Buso N, Gur T, Madhusoodanan N, Basutkar P, Tivey ARN, Potter SC, Finn RD, Lopez R. The EMBL-EBI search and sequence analysis tools APIs in 2019. Nucleic Acids Res 2019; 47:W636–W641.
51. Katoh K, Rozewicki J, Yamada KD. MAFFT online service: Multiple sequence alignment, interactive sequence choice and visualization. Brief Bioinform 2019; 20:1160–1166.
52. Fagiani E, Christofori G. Angiopoietins in angiogenesis. Cancer Lett 2013; 328:18–26.
53. Santulli G. Angiopoietin-like proteins: A comprehensive look. Front Endocrinol 2014; 5:1–6.
54. Seigneur E, Sudhof TC. Cerebellins are differentially expressed in selective subsets of neurons throughout the brain. J Comp Neurol 2017; 525:3286–3311.
55. Fernandez EJ, Lolis E. Structure, function, and inhibition of chemokines. Annu Rev Pharmacol Toxicol 2002; 42:469–499.
56. Hazlett L, Wu M. Defensins in innate immunity. Cell Tissue Res 2011; 343:175–188.
57. Unic A, Derek L, Hodak N, Marijancevic D, Ceprnja M, Serdar T, Krhac M, Romic Z. Endothelins – Clinical perspectives. Biochem Med 2011; 21:231–242.
58. Zhu Y, Xu G, Patel A, McLaughlin MM, Silverman C, Knecht KA, Sweitzer S, Li X, McDonnell P, Mirabile R, Zimmerman D, Boyce R et al. Cloning, expression, and initial characterization of a novel cytokine-like gene family. Genomics 2002; 80:144–150.
59. Ornitz DM, Itoh N. The fibroblast growth factor signaling pathway. WIREs Dev Biol 2015; 4:215–266.
60. Bovolenta P, Esteve P, Ruiz JM, Cisneros E, Lopez-Rios J. Beyond Wnt inhibition: New functions of secreted frizzled-related proteins in development and disease. J Cell Sci 2008; 121:737–746.
61. Soares MJ. The prolactin and growth hormone families: Pregnancy-specific hormones/cytokines at the maternal-fetal interface. Reprod Biol Endocrinol 2004; 2:1–15.
62. Cahoreau C, Klett D, Combarnous Y. Structure–function relationships of glycoprotein hormones and their subunits’ ancestors. Front Endocrinol 2015; 6:1–14.
63. Capobianchi MR, Uleri E, Caglioti C, Dolei A. Type I IFN family members: Similarity, differences and interaction. Cytokine Growth Factor Rev 2015; 26:103–111.
64. Pollak M. Insulin and insulin-like growth factor signalling in neoplasia. Nat Rev Cancer 2008; 8:915–928.
65. Gibeon D, Menzies-Gow AN. Targeting interleukins to treat severe asthma. Expert Rev Respir Med 2012; 6:423–439.
66. Lee R, Kermani P, Teng KK, Hempstead BL. Regulation of cell survival by secreted proneurotrophins. Science 2001; 294:1945–1948.
67. Holmes DI, Zachary I. The vascular endothelial growth factor (VEGF) family: Angiogenic factors in health and disease. Genome Biol 2005; 6:209.201–209.210.
68. Steinhoff MS, Mentzer BV, Geppetti P, Pothoulakis C, Bunnett NW. Tachykinins and their receptors: Contributions to physiological control and the mechanisms of disease. Physiol Rev 2014; 94:265–301.
69. Trombly DJ, Woodruff TK, Mayo KE. Roles for transforming growth factor beta superfamily proteins in early folliculogenesis. Semin Reprod Med 2009; 27:14–23.
70. Aggarwal BB, Gupta SC, Kim JH. Historical perspectives on tumor necrosis factor and its superfamily: 25 years later, a golden journey. Blood 2012; 119:651–665.
71. Nusse R, Varmus H. Three decades of Wnts: A personal perspective on how a scientific field developed. EMBO J 2012; 2670–2684.
72. Waterhouse AM, Procter JB, Martin DMA, Clamp M, Barton GJ. JalView version 2 – A multiple sequence alignment editor and analysis workbench. Bioinformatics 2009; 25:1189–1191.
73. Jones D. Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol 1999; 292:195–202.
74. Zerbino DR, Achuthan P, Akanni W, Amode MR, Barrell D, Bhai J, Billis K, Cummins C, Gall A, Girón CG, Gil L, Gordon L et al. Ensembl 2018. Nucleic Acids Res 2018; 46:D754–D761.
75. Goldstein A, Tachibana S, Lowney LI, Hunkapiller M, Hood L. Dynorphin-(1-13): An extraordinarily potent opioid peptide. Proc Natl Acad Sci U S A 1979; 76:6666–6670.
76. Bateman A, Belcourt D, Bennett H, Lazure C, Solomon S. Granulins, a novel class of peptide from leukocytes. Biochem Biophys Res Commun 1990; 173:1161–1168.
77. Rehfeld JF. The new biology of gastrointestinal hormones. Physiol Rev 1998; 78:1087–1108.
78. Campbell KL, Landefeld TD, Midgley AR Jr. Differential processing of subunits of human chorionic gonadotropin by granulosa cells in vivo. Proc Natl Acad Sci U S A 1980; 77:4793–4797.
79. Campbell KL, Bagavandoss P, Byrne MD, Jonassen JA, Landefeld TD, Quasney MW, Sanders MM, Midgley AR Jr. Differential processing of the two subunits of human choriogonadotropin (hCG) by granulosa cells. II. In vivo studies. Endocrinology 1981; 109:1858–1871.
80. Consortium IHGS. Finishing the euchromatic sequence of the human genome. Nature 2004; 431:931–945.
81. Uhlén M, Fagerberg L, Hallström BM, Lindskog C, Oksvold P, Mardinoglu A, Sivertsson Å, Kampf C, Sjöstedt E, Asplund A, Olsson I, Edlund K et al. Tissue-based map of the human proteome. Science 2015; 347:1260419.
82. Jiang L, Wang M, Lin S, Jian R, Li X, Chan J, Fang H, Dong G, Consortium G, Tang H, Snyder MP. A quantitative proteome map of the human body. bioRxiv 2019; 797373.
83. Vadher US. Proteolytic fragments of glycoprotein hormones show homologies to signal and metabolic proteins: Are the peptide fragments biologically active? Masters of Science Thesis. Biotechnology and Biomedical Science Program, University of Massachusetts Boston; 2011:1–102.
84. Song J, Wang Y, Li F, Akutsu T, Rawlings ND, Webb GI, Chou K-C. iProt-Sub: A comprehensive package for accurately mapping and predicting protease-specific substrates and cleavage sites. Brief Bioinform 2019; 20:638–658.
85. Tompa P, Davey NE, Gibson TJ, Babu MM. A million peptide motifs for the molecular biologist. Mol Cell 2014; 55:161–169.
© The Author(s) 2020. Published by Oxford University Press on behalf of Society for the Study of Reproduction. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
TI - Protein hormone fragmentation in intercellular signaling: hormones as nested information systems
JF - Biology of Reproduction
DO - 10.1093/biolre/ioaa234
DA - 2021-01-05
UR - https://www.deepdyve.com/lp/oxford-university-press/protein-hormone-fragmentation-in-intercellular-signaling-hormones-as-RzCbNy5Scl
SP - 1
EP - 1
VL - Advance Article
IS -
DP - DeepDyve
ER -