ExteNDing Proteome Coverage with Legumain as a Highly Specific Digestion Protease

ExteNDing Proteome Coverage with Legumain as a Highly Specific Digestion Protease This is an open access article published under an ACS AuthorChoice License, which permits copying and redistribution of the article or any adaptations for non-commercial purposes. pubs.acs.org/ac Article ExteNDing Proteome Coverage with Legumain as a Highly Specific Digestion Protease # # Wai Tuck Soh, Fatih Demir, Elfriede Dall, Andreas Perrar, Sven O. Dahms, Maithreyan Kuppusamy, Hans Brandstetter, and Pitter F. Huesgen* Cite This: Anal. Chem. 2020, 92, 2961−2971 Read Online Metrics & More Article Recommendations sı Supporting Information ACCESS * ABSTRACT: Bottom-up mass spectrometry-based proteomics utilizes proteolytic enzymes with well characterized specificities to generate peptides amenable for identification by high-throughput tandem mass spectrometry. Trypsin, which cuts specifically after the basic residues lysine and arginine, is the predominant enzyme used for proteome digestion, although proteases with alternative specificities are required to detect sequences that are not accessible after tryptic digest. Here, we show that the human cysteine protease legumain exhibits a strict substrate specificity for cleavage after asparagine and aspartic acid residues during in-solution digestions of proteomes extracted from Escherichia coli, mouse embryonic fibroblast cell cultures, and Arabidopsis thaliana leaves. Generating peptides highly complementary in sequence, yet similar in their biophysical properties, legumain (as compared to trypsin or GluC) enabled complementary proteome and protein sequence coverage. Importantly, legumain further enabled the identification and enrichment of protein N- termini not accessible in GluC- or trypsin-digested samples. Legumain cannot cleave after glycosylated Asn residues, which enabled the robust identification and orthogonal validation of N-glycosylation sites based on alternating sequential sample treatments with legumain and PNGaseF and vice versa. Taken together, we demonstrate that legumain is a practical, efficient protease for extending the proteome and sequence coverage achieved with trypsin, with unique possibilities for the characterization of post-translational modification sites. urrent “bottom-up” mass spectrometry-based proteomics, modification sites, and even whole proteins remain invisible in C also termed shotgun proteomics, can achieve near- proteome analyses relying on trypsin alone. This is especially complete proteome coverage and allows for extensive mapping true for proteolytic processing, a site-specific post-translational of post-translational modification sites. The basis of this protein modification that can irreversibly alter protein 5,6 approach is the selective protease-mediated digestion of function, interaction, and localization and thereby exert isolated proteomes into peptides, which are then typically important signaling functions. Processed proteoforms are separated by reverse-phase liquid chromatography under acidic unambiguously identified by their new protease-generated neo- conditions and analyzed by tandem mass spectrometry (MS/ 8,9 N-, or C-termini. The identification of neo-N-, and C- MS). Peptides are subsequently identified by computational terminal peptides, which constitute a minor fraction among all matching of the acquired spectra to proteome databases or peptides in a proteome digest, is facilitated by a variety of spectral libraries, and the proteins present in the sample are methods that have been developed to allow for their selective inferred on the basis of the identified peptides. The serine enrichment. However, many neo-N-, or C-terminal peptides protease trypsin has become the dominant workhorse for the are too short for mass spectrometry-based identification when proteome digestions due to its high cleavage efficiency, high only a single protease is used. specificity for cleavage after Arg or Lys, and affordable price, Alternative proteases with a high sequence specificity are even for high-quality preparations. Proteomes digested with therefore of great interest and increasingly applied in bottom- trypsin therefore consist of predictable peptides with a C- 3,10 up proteomics, including termini profiling approaches. terminal basic residue favorable for the ionization and generation of a dominant y-ion series, which facilitates database searches and peptide identification. However, about Received: August 7, 2019 half of the peptides generated by trypsin are less than six Accepted: January 17, 2020 residues long and therefore too small for identification and/or Published: January 17, 2020 unambiguous assignment to specific protein sequences. Thus, many protein segments, including critical post-translational https://dx.doi.org/10.1021/acs.analchem.9b03604 © 2020 American Chemical Society Anal. Chem. 2020, 92, 2961−2971 2961 Analytical Chemistry pubs.acs.org/ac Article Established proteases include AspN for cleavage before Asp desalting using PD-10 columns (GE Healthcare). Purified and Glu; chymotrypsin for cleavage after Phe, Tyr, Leu, Trp, legumain was activated at 20 °Cina buffer containing 100 mM and Met; GluC (also known as Staphyloccoccus aureus protease citric acid (pH 4.0), 100 mM NaCl, and 2 mM DTT. The V8) for cleavage after Asp and Glu; LysC for cleavage after progress of autoactivation was monitored by SDS-PAGE. 3,11 Lys; LysN for cleavage before Lys; LysargiNase for cleavage Activated legumain was further purified using a PD-10 column before Arg and Lys; and the prolyl endopeptidase neprosin (GE Healthcare) followed by size exclution chromatography to that selectively cleaves after Pro and Ala. Also, proteases with have the active protein in a final buffer composed of 20 mM broader sequence specificity such as elastase and thermoly- citric acid (pH 4.0), 50 mM NaCl, and 2 mM DTT. Legumain 14 15 16 sin, proteinase K, subtilisin, and thermolysin WaLP and activity was evaluated using the legumain specific fluorescent MaLP are occasionally applied but less favored due to the substrate Z-Ala-Ala-Asn-AMC (AAN-AMC; Bachem) at a increased sample complexity with overlapping peptides and the concentration of 50 μM in assay buffer composed of 50 mM less efficient spectrum-to-sequence matching due to the lack of citric acid (pH 5.5), 100 mM NaCl, and 2 mM DTT at 37 °C. adefined cleavage specificity as a restraint. Notably, digest Fluorescence was detected using an Infinite M200 Plate with a single additional protease increases the number of Reader (Tecan) at 460 nm after excitation at 380 nm. protein identifications by an average of 7−8% and enables A. thaliana Proteome Preparation. A. thaliana Colum- the discovery of critical PTMs including phosphorylation bia (Col-8) leaves were harvested from 10 week old plants 16,19 10,20 sites and N-terminal processing sites that are missed in grown on soil under short day conditions (9 h/15 h tryptic digests. Hence, there is a persistent strong demand for −2 −1 photoperiod, 22 °C/18 °C, 120 μmol of photons m s ) new, highly specific proteolytic enzymes with improved, and snap frozen in liquid nitrogen. Leaves were ground in complementary, or unexplored sequence specificity. liquid nitrogen and resuspended in 10 mL/g fresh weight of Human legumain, also known as asparaginyl endopeptidase extraction buffer (6 M Gua-HCl, 0.1 M HEPES (pH 7.4), 5 (AEP), is a well characterized caspase-like human cysteine mM EDTA, 1 mM DTT, and HALT protease inhibitor protease known to cleave model substrates selectively after Asn cocktail; ThermoFisher, Dreieich, Germany). The suspension and Asp residues. Recently, legumain cleavage specificity was was homogenized using a Polytron PT-2500 (Kinematica, further characterized by in-gel digestion of denatured complex Luzern, Switzerland) and filtered through Miracloth (Merck, proteomes that revealed pH-dependent differences in sequence Darmstadt, Germany), and debris and nuclei were removed by specificity, with an optimal pH for cleavage after Asn and Asp 22 centrifugation at 500g,4 °C for 10 min. Proteins in the at pH 6 and 4.5, respectively. On the basis of this data, it was supernatant were purified by chloroform−methanol precip- further suggested that legumain may be a suitable choice as a 22 itation, resuspended in extraction buffer, and reduced with 5 precision digestion enzyme in proteomics applications. mM DTT at 56 °C, 30 min followed by alkylation with 15 mM Encouraged by these reports, we reasoned that legumain iodoacetamide for 30 min at 25 °C. The reaction was might also be an attractive enzyme for standard in-solution quenched by the addition of 15 mM DTT for 15 min. The digestion proteomics workflows. We show that the parallel proteome extract was purified again with chloroform− digestion of proteomes isolated from Arabidopsis thaliana (A. methanol precipitation, resuspended in 0.2 mL of 0.1 M thaliana) leaves, mouse embryonic fibroblasts (MEF), or NaOH, and diluted with water and 1 M Hepes (pH 7.4) to a Escherichia coli (E. coli) cell cultures with legumain, trypsin, and final concentration of 4 mg/mL in 0.1 M HEPES (pH 7.4). GluC results in the identification of distinct peptides that The protein concentration was quantified using the BCA assay together increase protein sequence and proteome coverage. (ThermoFisher, Dreieich, Germany). For digestion, aliquots of Legumain retained its remarkable specificity even under the concentrated A. thaliana proteome extracts were diluted at unfavorable conditions. N-terminome profiling demonstrated least four times to reach the required digestion buffer a strong complementarity to trypsin and superior performance conditions, and the pH was confirmed with pH strips compared to that of GluC. Asn is also the site of N-linked (Merck, Darmstadt, Germany). glycosylation, a common protein post-translational modifica- Mouse Embryonic Fibroblast Proteome Preparation. tion important in protein stability, folding, and protein− Mouse embryonic fibroblast (MEF) cells were cultured in protein interaction. By sequential processing with PNGase F DMEM GlutaMax high glucose (Gibco 61965-026) supple- and legumain, and vice versa, we demonstrate that N- mented with 10% FBS and 1× penicillin/streptomycin (Gibco glycosylation prevents legumain cleavage and propose that 15140-122) at 37 °C, 5% CO . Once the cells reached a this tandem treatment strategy can provide orthogonal validation of N-glycosylation sites. Taken together, our data confluency of up to 90%, the media were removed, washed demonstrate that legumain is an attractive and reliable protease with warm PBS, and trypsinized (Gibco 25300-054). The for the specific digestion of proteomes after Asn and Asp, with trypsinized cells were pelleted, washed twice with warm PBS to particular advantages for PTM site identification including remove excess media, and lysed with 1% SDS 100 mM HEPES processed N-termini and N-glycosylation sites. (pH 7.5) containing 1:50 (v/v) protease inhibitor cocktail (Sigma P8340). The sample was heated to 95 °C for 5 min, EXPERIMENTAL SECTION cooled, sonicated for 2 min, and heated again to 95 °C for 5 min to shear DNA. The protein concentration was measured, Expression, Purification, and Activation of Human and 100 μg of protein was used for each proteome digestion. Legumain. Human legumain was produced using the Proteins were reduced with 10 mM DTT for 30 min at 37 °C Leishmania tarentolae expression system (LEXSY) following a and alkylated by the addition of 50 mM chloroacetamide previously published protocol. Briefly, legumain was (CAA) and incubation for 30 min at RT in the dark. The recombinantly expressed as a secreted protein by a LEXSY reaction was quenched by incubation with 50 mM DTT for 20 suspension culture at 26 °C. The supernatant containing prolegumain protein was harvested by centrifugation and min at room temperature (RT) before purification with SP3 2+ subjected to Ni -NTA affinity purification, followed by beads and elution in the required digestion buffer. 2962 https://dx.doi.org/10.1021/acs.analchem.9b03604 Anal. Chem. 2020, 92, 2961−2971 Analytical Chemistry pubs.acs.org/ac Article E. coli Proteome Preparation. E. coli Dh5α cells were Mass Spectrometry Data Analysis. Database searches grown in 200 mL of LB media until an optical density of OD were performed with MaxQuant v.1.6.0.16 using standard nm of 0.7. Cells were harvested by centrifugation at 400g for Bruker Q-TOF settings that included peptide mass tolerances 15 min at 4 °C, washed by adding ice-cold PBS, and of 0.07 Da in the first search and 0.006 Da in the main search. resuspended in 1 mL of lysis buffer (4% (v/v) SDS, 50 mM A. thaliana, M. musculus, and E. coli protein databases were HEPES (pH 7.4), 5 mM EDTA, 1× HALT protease inhibitor downloaded from UniProt (A. thaliana release 2018_01, cocktail (ThermoScientific)) per 0.1 g of fresh weight. The 41350 sequences) with appended common contaminants as cells were lysed by heating to 95 °C two times for 5 min, with embedded in MaxQuant. The “revert” option was enabled for 10 min of cooling with ice. Proteins were purified by decoy database generation. For shotgun proteome samples, chloroform−methanol precipitation and resuspended in 6 M specificity was set to “unspecific” for the characterization of the Gua-HCl, 100 mM HEPES (pH 7.4), 5 mM EDTA, and the cleavage specificity, otherwise according to the enzyme used concentration was estimated using the BCA assay (Thermo- (cleavage at K/R|X for trypsin, D/E|X for GluC, or D/N|X for Fisher, Dreieich, Germany). One hundred micrograms of legumain). Oxidation (M) and acetylation (protein N-term) proteome was reduced by the addition of 10 mM DTT for 30 were set as variable modifications, and the “match between runs” option was disabled. The analysis of the label-free min at 37 °C and alkylated by the addition of 50 mM CAA for shotgun data was performed with Perseus v.1.6.1.1; the 30 min at RT in the dark, and the reaction was quenched by validation of the protein identification required at least two incubation with 50 mM DTT for 20 min at RT. The proteome unique peptides for each protein and label-free quantification was purified by chloroform−methanol precipitation and (LFQ) in at least two replicates. Searches for the N-termini resolubilized in the appropriate digestion buffer. were performed as described above, except that the enzyme Proteome Digestions. Proteome aliquots of 100 μg were specificity was set as Arg-C/GluC (DE)/legumain semispecific individually digested by legumain, GluC, or trypsin. The digestion with legumain was carried out in a reaction with a free N-terminus and duplex dimethyl labeling with light 12 13 containing 0.1 M MES (pH 6.0), 0.1 M NaCl, and 2 mM CH O formaldehyde or heavy CD Oformaldehyde 2 2 DTT at a protease to proteome ratio of 1:50 (m:m), unless (peptide N-term and K). Oxidation (M), acetyl (N-term), otherwise stated. For GluC (SERVA Electrophoresis, Heidel- Gln → pyro-Glu, and Glu → pyro-Glu were set as dynamic berg, Germany) digestion, the same amount of proteome was modifications, and the requantify option was turned off; the digested in PBS (pH 7.4) with a protease to proteome ratio of unspecific search window was set to 8−40 amino acids. Data 1:50, whereas a 1:100 ratio was used for trypsin (SERVA evaluation and positional annotation for N-termini analyses Electrophoresis, Heidelberg, Germany) digestion in 0.1 M were performed using an in-house Perl script (MANTI.pl; HEPES (pH 7.4) supplemented with 5% acetonitrile and 5 available at http://MANTI.sourceforge.io) that combines mM CaCl . The pH was confirmed using pH strips (Merck, information provided by MaxQuant and UniProt to annotate Darmstadt, Germany), and the digestions were carried out at and classify identified N-terminal peptides. In short, MaxQuant 37 °C overnight. For pH shift assays with legumain, an aliquot peptide identifications are consolidated by removing nonvalid of the MEF proteome was digested at pH 6.0 for 5 h at 37 °C, identifications (peptides identified with N-terminal pyro-Glu and then the pH was lowered by the stepwise addition of 1 M peptides that do not contain Glu or Gln as N-terminal residue, HCl until pH 4.0 was reached. An additional 2 μg of legumain peptides with dimethylation at N-terminal Pro), contaminant, and 1 mM DTT were added and incubated for another 5 h at reverse database peptides, and nonquantifiable acetylated 37 °C. peptides in multichannel experiments (no K in peptide Mass Spectrometry. All samples were desalted using self- sequence to determine labeled channel). For N-terminal packed C18 Stop and Go Extraction tips as previously peptides mapping to multiple entries in the UniProt protein described. Analysis was performed on a two-column nano- database, a “preferred” entry was determined in a binary HPLC setup (Ultimate 3000 nano-RSLC system with Acclaim decision tree. Protein entries where the identified peptide PepMap 100 C18, i.d. 75 μm, particle size 3 μm; trap column matched positions 1 or 2 were preferred over alternative of 2 cm and analytical column of 50 cm length; ThermoFisher) positions, and then manually reviewed UniProt protein entries with a binary gradient from 5 to 32.5% B for 80 min (A, H O+ were favored over alternative models. If multiple entries 0.1% FA; B, ACN + 0.1% FA) and a total runtime of 2 h per persisted, the alphabetically first entry was used to retrieve sample coupled to a high-resolution Q-TOF mass spectrom- positional annotation information. For the visualization of eter (Impact II, Bruker) as previously described. Data was protein sequence coverage, protein structures were modeled acquired with the Bruker HyStar Software (v3.2, Bruker with the Phyre2 server. Daltonics) in line-mode in a mass range from 200 to 1500 m/z Enrichment of N-Terminal Peptides. Protein N-terminal at an acquisition rate of 4 Hz. The top 17 most intense ions peptides were enriched using the high-efficiency undecanal- were selected for fragmentation with a dynamic exclusion of based N-termini enrichment (HUNTER) method essentially previously selected precursors for the next 30 s, unless an as previously described. Briefly, equal amounts of A. thaliana intensity increase of factor 3 compared to the previous proteome were dimethyl labeled with 20 mM heavy ( CD O) precursor spectrum was observed. Intensity-dependent frag- or light (CH O) formaldehyde and 20 mM sodium mentation spectra were acquired between 5 Hz, for low- cyanoborohydride at 37 °C for 16 h to block all primary intensity precursor ions (>500 cts), and 20 Hz, for high- amines. To ensure a complete reaction, the same concentration intensity (>25k cts) spectra. Fragment spectra were acquired of reagents was added again and incubated for another 2 h. with stepped parameters, each with 50% of the acquisition time Proteins were purified by chloroform−methanol precipitation dedicated for each precursor: 61 μs transfer time, 7 eV collision to remove excess reagents and dissolved in 0.1 M HEPES (pH energy, and a collision radio frequency (RF) of 1500 Vpp 7.4), and the protein concentration was estimated using the followed by a 100 μs transfer time, 9 eV collision energy, and a BCA assay according to manufacturer instructions (Thermo- collision RF of 1800 Vpp. Fisher, Dreieich, Germany). The samples (400 μg/sample) 2963 https://dx.doi.org/10.1021/acs.analchem.9b03604 Anal. Chem. 2020, 92, 2961−2971 Analytical Chemistry pubs.acs.org/ac Article were digested with legumain, GluC, and trypsin at 37 °C for 16 selectivity for Asn-containing substrates near pH 6. To test h in the respective digestion buffers and protease−proteome whether this exquisite specificity holds true under in-solution ratios as described above. The protease-generated peptides proteome digest conditions, we digested three aliquots of a were hydrophobically tagged with undecanal using an denatured A. thaliana proteome with legumain at pH 6.0 for 18 undecanal−proteome ratio of 50:1 and supplemented with h. In parallel, we digested three aliquots of the same proteome 20 mM sodium cyanoborohydride in 40% ethanol at 50 °C for with trypsin and GluC at pH 7.4. To determine protease 45 min. The reaction was extended by the addition of 20 mM cleavage site specificity, peptides were analyzed by nano-LC− sodium cyanoborohydride for another 45 min. The reaction MS/MS and the acquired spectra were matched to the UniProt was then acidified with a final 1% TFA and centrifuged at A. thaliana proteome database using nonspecificsearch 21000g for 5 min to precipitate free undecanal. Supernatant settings, i.e., without defining an enzyme cleavage specificity. was injected to a preactivated HR-X (M) cartridge (Macherey- This unbiased search identified 4452, 4078, and 7985 peptide Nagel, Düren, Germany). The flow-through containing N- sequences in legumain, GluC, and tryptic digests, respectively, terminal peptides was collected. Remaining N-terminal from which we compiled 6300, 5673, and 12107 unique peptides on the HR-X (M) cartridge were eluted with 40% nonredundant cleavage sites based on the sequence surround- ethanol containing 0.1% TFA, pooled with the first eluate, and ing both ends of the identified peptides. For legumain, 93.3% subsequently evaporated in the SpeedVac to a small volume of the observed cleavage sites were Asn and Asp (51.0% after suitable for C18 StageTips purification. Asn, 42.3% after Asp). A small percentage of unspecific Identification of Glycosylation Sites. Apoplastic fluid cleavage is expected because of endogenous background proteome enrichment was carried out as described with some proteolysis. The percentage of specific cleavage in a whole modifications. The whole A. thaliana rosettes were infiltrated proteome is comparable to 96.7% of cleavages after Lys and with cold sterile water in a SpeedVac for 3 min at a pressure Arg, as observed for trypsin (58.0% after Lys, 38.7% after Arg), between 600 and 2500 Pa. The infiltrated rosettes were then and more stringent than the 85.4% cleavages after Glu and Asp centrifuged at 4 °C, 3000g for 10 min into a collection tube (72.7% after Glu, 12.7% after Asp), as observed for GluC. The containing a Halt protease inhibitor cocktail (ThermoFisher, visualization of the relative amino acid abundance surrounding Dreieich, Germany). Extracted apoplastic fluid proteins were the cleavage sites with IceLogos reflected the strict specificity purified by chloroform−methanol precipitation and resus- at the P1 position, preceding the hydrolyzed peptide bond in pended in 50 mM HEPES (pH 7.4). The protein all three enzymes (Figure 1a−c). While GluC (Figure 1b) and concentration was quantified by using the BCA assay. The trypsin (Figure 1c) do not allow cleavage before proline (P1′ sample was then reduced with 5 mM DTT at 56 °C for 30 min position), this is not the case for legumain (Figure 1a). We and alkylated with 15 mM iodoacetamide at 25 °C for 30 min further analyzed a single replicate of a mouse embryonic in the dark, and the reaction was quenched with 15 mM DTT at 25 °C for 15 min. The protein extract was then separated into two aliquots. One aliquot of 100 μg of apoplast proteome was treated with PNGase F (SERVA Electrophoresis, Heidelberg, Germany) for 2 h at 37 °C before legumain digestion with protease at a ratio of 1:50 at 37 °C, pH 6 (pH adjusted with final concentration of 0.1 M MES pH 6.0). In parallel, another 100 μg of protein extract was predigested with legumain and then treated with PNGase F using the same conditions. The samples were subsequently dimethyl labeled with 20 mM heavy ( CD O) and light (CH O) formaldehyde 2 2 and 20 mM sodium cyanoborohydride at 37 °C for 2 h. The reactions were quenched with 0.1 M Tris pH 7.4 at 37 °C for 1 h and pooled in a 1:1 ratio, and peptides were purified by C18 StageTips. Data Deposition. MS data have been deposited to the ProteomeXchange Consortium (http://www. proteomexchange.org) via the PRIDE (https://www.ebi.ac. uk/pride/archive/) partner repository: PXD014696 for data relating to comparative proteome digestion with legumain, GluC, and trypsin, PXD014699 for A. thaliana proteome digested by legumain in the presence of various denaturants, PXD014698 for various pHs, PXD014697 for HUNTER N- termini profiling of A. thaliana leaves, and PXD014680 for N- glycosylation site mapping. RESULTS Figure 1. Substrate cleavage specificity of legumain, GluC, and Legumain Cleaves Denatured Proteomes Exclusively trypsin. IceLogos visualize the amino acid frequencies surrounding the after Asn and Asp. Previous data obtained by in-gel protein cleavage sites inferred from peptides identified by nonspecific digestion-based specificity profiling and by biochemical database searches after digestion of (a−c) an A. thaliana leaf characterization with test peptides suggested that legumain proteome or (d−f) mouse embryonic fibroblast cell lysate proteome cleaves substrates C-terminally to Asn and Asp residues in a with (a,d) legumain, (b,e) GluC, or (c,f) trypsin. The numbers of pH-dependent manner, with optimal activity and high nonredundant cleavage sites for each logo are indicated. 2964 https://dx.doi.org/10.1021/acs.analchem.9b03604 Anal. Chem. 2020, 92, 2961−2971 Analytical Chemistry pubs.acs.org/ac Article Figure 2. Analysis of an A. thaliana leaf proteome digested with legumain, GluC, and trypsin, each performed in three technical repeats. (a) Overlap of unique peptide sequences identified using enzyme-specific database queries. Analysis of the (b) mass, (c) hydrophobicity, and (d) isoelectric point of the identified peptides. (e) Overlap in unique amino acids identified by digestion with the three proteases. (f) Protein sequence coverage observed for superoxide dismutase (At1g08830) in legumain (red, 93%), GluC (green, 43%), and trypsin (blue, 49%) proteome digests. (g) Upset plot showing the overlap in protein groups identified in individual technical digestion replicates. (h) Venn diagram showing the total overlap of protein groups identified by the three enzymes. (i) Reproducibility of proteome quantification (MaxQuant LFQ). Only proteins quantified with two or more peptides were considered. Value indicates the Pearson correlation between the LFQ values obtained for technical replicates. fibroblast proteome and identified 1893, 1722, and 4377 analyses of an E. coli proteome (Supporting Information (SI) peptides using nonspecific database searches after digestion Figure S1), where 2681 peptides identified after legumain with legumain, GluC, and trypsin. Similar specificity profiles digestion yielded 4187 cleavage sites with 86.2% cleavage after were obtained on the basis of the 3244, 2999, and 7965 Asn and Asp (53.1% after Asn, 33.1% after Asp), while 85.3% nonredundant cleavage sites derived from the peptides in of the 8597 unique cleavages observed in 5374 peptides legumain (Figure 1d), GluC (Figure 1e), and trypsin (Figure identified after tryptic digest matched the expected specificity 1f) digests, again showing that legumain tolerates Pro at P1′ (44.1% after Arg, 41.2% after Lys). (Figure 1d). Of the cleavages observed in legumain digest, Complementary Protein Sequence Coverage by 94.5% matched the expected specificity (63.6% after Asn, Digestion with Legumain Compared to GluC and 30.9% after Asp), 97.6% in the tryptic digest (51.9% after Arg, Trypsin. With the strict cleavage specificity of legumain 45.7% after Lys), and 85% in the GluC digest (76.6% after Glu, under proteome digest conditions confirmed by the unbiased 8.4% after Asp). These observations were further confirmed by database search, we repeated spectra-to-sequence matching 2965 https://dx.doi.org/10.1021/acs.analchem.9b03604 Anal. Chem. 2020, 92, 2961−2971 Analytical Chemistry pubs.acs.org/ac Article using standard enzyme-specific settings with up to three missed high correlation between the LFQ values obtained from digests cleavages, using cleavage after Asn and Asp as a specificity rule of the three different proteases (Figure 2i). for legumain. As expected, the smaller search space In the MEF proteome, 1469, 1140, and 2242 protein groups significantly increased the number of peptide identifications were identified in legumain, GluC, and tryptic digests, in the A. thaliana data set by 64%, 8%, and 66% to 7284, 4394, combining to 2587 protein groups in total, with 7.7% and 12806 unique peptide sequences for legumain, GluC, and exclusively identified in the legumain digests (SI Figure S4c). trypsin, respectively (Figure 2a). Specific searches of the MEF A larger overlap was observed between the E. coli proteome proteome data set increased peptide identifications by 129%, digests, where 842 and 1180 protein groups were identified 73%, and 61% to 4296, 2983, and 8489 unique peptides for after legumain and tryptic digestion, respectively, but only 37 legumain, GluC, and trypsin, respectively, compared to results (3%) of these were exclusive for legumain (SI Figure S4d). for nonspecific searches. In E. coli, peptide identifications Legumain Cleaves after Asn More Efficiently than improved by 33% and 7% to 3568 and 5767 unique peptides after Asp. The digestion efficiency of a protease can be for legumain and trypsin. reflected by the number of missed potential cleavage sites While trypsin showed the expected superior performance, within the identified peptides. In the A. thaliana data set, legumain digests resulted in the identification of more peptides legumain generated on average 53% of the peptides without missed cleavage sites, 34% with one missed potential cleavage than GluC, for example, 66% more in the A. thaliana data set. site, and 13% with more than one missed cleavage site (Figure Interestingly, the legumain and GluC data sets showed only a 3a). GluC performed worse, with only 30% of the peptides minimal overlap of 66 identical peptides delimited by cleavages after Asp on both sides, which may occur with both enzymes but are not favored by GluC under the applied reaction conditions (Figure 2a). The analysis of the mass (Figure 2b), hydrophobicity (Figure 2c), and isoelectric point (Figure 2d) of the identified A. thaliana peptides revealed very similar properties for all three enzymes. In contrast, the biophysical properties of all theoretical peptides in in silico-digested A. thaliana and M. musculus proteomes predicted a higher number of peptides with pI > 9 in GluC- and legumain-digested proteomes compared to those with trypsin (SI Figure S2a,b). However, a comparison to our data (Figure 2b−d) suggests that such peptides are rarely identified with the standard experimental setup with reverse-phase chromatography under acidic conditions and ionization and mass spectrometric analysis in positive ion mode. Despite these physical similarities, peptides identified after digestion with the three proteases covered distinct amino acids in the identified A. thaliana proteins (Figure 2e). In total, the parallel application of legumain, GluC, and trypsin in technical triplicates identified 1524, 1090, and 2380 protein groups in the A. thaliana proteome, respectively, combining to a total of 2785 protein groups, with legumain contributing 8.8% exclusive identifications (Figure 2g,h, SI Figure 3. Potential cleavage sites missed by legumain, GluC, and Table S1). As expected from the number of peptide trypsin in A. thaliana leaf proteome digests. (a) Percentage of peptides containing up to three missed cleavage sites. (b) Missed cleavage sites identifications, a large majority of 2057 proteins (74.3%) had sorted by missed amino acid residues. the highest sequence coverage in the tryptic digest, followed by 507 (18.3%) in legumain digests and 206 (7.4%) in GluC digests (SI Table S1). For example, the sequence coverage of with no missed cleavage, but almost 12% of the identified superoxide dismutase (At1g08830) (Figure 2f, SI Figure S3a) peptides containing three missed cleavage sites. Trypsin was was a remarkable 93% in legumain digests compared to 43% the best performing enzyme, with only 18% of the peptides and 49% in the GluC and trypsin data sets, and sequence containing one or more missed cleavage sites (Figure 3a). coverage of the germin-like protein 1 (At1g72610) was at 63% When we further considered the identity of the amino acid with legumain compared to only 23% and 8% with GluC and residue, we noted that legumain reliably cleaved after Asn trypsin (SI Figure S3b). Notably, for each of the three residues, with only 5% of the peptides containing an internal proteases >80% of the proteins were identified in all three Asn, but it missed one or more Asp in 40% of the peptides replicates, indicating a high degree of reproducibility in the (Figure 3b). Most missed cleavage sites in GluC-digested digests (SI Figure S4a). On the single replicate level, the proteomes were at Asp, and even trypsin showed a higher combination of any tryptic digest with any legumain or GluC fidelity at Arg than at Lys (Figure 3b). Remarkably, legumain digest resulted in a slightly higher number of protein cleaved after Asn residues as efficiently as trypsin at the favored identifications than any two tryptic replicates combined (SI Arg-containing cleavage sites. Similar trends were observed in Figure S4b). We further compared reproducibility by label free digests of MEF and E. coli proteomes, where legumain digests proteome quantification (LFQ) with MaxQuant after filtering consistently showed a high cleavage efficiency at Asn sites with for protein groups quantified by two or more peptides (SI more missed cleavages at Asp (SI Figure S5). Table S2). This demonstrated excellent correlation of the LFQ Assessing Legumain Efficiency in Different Reaction values between the technical digestion replicates and also a Conditions. Previous publications have shown that legumain 2966 https://dx.doi.org/10.1021/acs.analchem.9b03604 Anal. Chem. 2020, 92, 2961−2971 Analytical Chemistry pubs.acs.org/ac Article Figure 4. Complementary N-terminome coverage by parallel digestion with legumain, GluC, and trypsin. (a) Experimental workflow for the enrichment of N-terminal peptides using HUNTER. For detailed description, see the main text. Light blue and orange circles indicate differential stable isotope labeling by reductive dimethylation, and magenta triangles indicate undecanal modification. (b) Overlap in N-termini identification based on the first seven amino acids of each N-terminal peptide identified in the experiments with the three proteases. Peptide MS/MS fragmentation spectra of (c) the acetylated mature N-terminus of glucosinolate transporter-1 and (d) a proteolysis-derived dimethylated N- terminus in the CLPR3 subunit of the ATP-dependent Clp protease. Both termini were identified in legumain digests, with sequence context surrounding the identified peptide indicated in gray. UniProt accession code and gene accession numbers are indicated. is more active at a lower pH and that cleavage after Asn is step incubation maintained efficient cleavage at Asn residues favored at a higher pH. To test if this is also the case with the while decreasing the number of peptides containing missed digest conditions applied here, we digested whole-leaf A. Asp cleavage sites (SI Figure S5). Denaturants are commonly used for proteome preparations thaliana proteome at varying pHs between 5.0 and 6.5 for shorter (2 h) and longer (24 h) incubation times (SI Figure but are problematic during digestion. We tested the tolerance S6a). We observed the highest number of peptide of legumain to urea and guanidinium hydrochloride but identifications at pH 5.5 and pH 6, which may have been observed dramatically decreased digestion efficiency (SI Figure caused by the higher propensity for proteome precipitation at a S7a), reflected in decreased peptide identifications (SI Figure lower pH that we observed in concentrated samples. As S7b) with an increased frequency of missed cleavage (SI Figure expected, legumain showed an increasing preference for Asn S7c). In contrast, legumain tolerated the organic solvent with an increasing pH, and this kinetic preference was also acetonitrile quite well with little decrease in efficiency up to reflected at different digestion times. Short proteome 10% acetonitrile concentration (SI Figure S7). digestions (2 h) and/or lower pH (pH 5.0) resulted in a We also assessed the amount of legumain necessary to higher proportion of Asn cleavages (SI Figure S6b), whereas achieve optimal digest by varying the protease to proteome longer incubation (24 h) and/or higher pH yielded more ratio. Digestion appeared equally efficient in several dilutions complete cleavage after Asp (SI Figure S6b). On the basis of down to a legumain to proteome ratio of 1:100, as judged by this observation, we tested whether acidification of the MEF the number of identified peptides from an equal starting proteome digest after an initial incubation at pH 6 would result material (SI Figure S8a). Another important enzyme property in more complete cleavage at Asp residues. Indeed, this two- for routine use is the shelf-life time, where our recombinant 2967 https://dx.doi.org/10.1021/acs.analchem.9b03604 Anal. Chem. 2020, 92, 2961−2971 Analytical Chemistry pubs.acs.org/ac Article Figure 5. Identification of N-glycosylation sites by sequential processing with legumain and PNGase F. (a) Scheme of the experimental workflow. For details, see the main text. Light blue and orange circles indicate differential stable isotope labeling by dimethylation. Asterisks indicate deamidated asparagine residue arising from PNGase F treatment. (b) Overlap of N-glycosylation identified with internal deamidated Asn in workflow 1 and with C-terminal deamidated Asn in workflow 2. (c) MS/MS fragmentation spectra of an N-glycosylation site in MYROSINASE 1 identified in both workflows. UniProt and A. thaliana gene accession codes are indicated. legumain preparations withstood 10 freeze/thaw cycles of identified protein N-termini, we extracted the first seven without loss of peptidase activity (SI Figure S8b). residues of each N-terminal peptide (Figure 4b). Only a Legumain is Highly Complementary for Protein N- minority of 100 protein N-termini were identified by all three Termini Profiling. The complementarity of different proteases, and an additional 632 were identified by two digestion enzymes is particularly helpful for the identification proteases, with a majority of 2101 N-termini identified only in of specific post-translational modification sites such as digests of a single enzyme (Figure 4b). For example, the 14,16 10,20 phosphorylations and protein termini, as these may acetylated, native N-terminus of the glucosinolate transporter-1 reside in sequences that are not accessible by trypsin. To NPF2.10 was only identified in legumain digests, whereas demonstrate the value of legumain for this purpose, we profiled multiple Glu in the N-terminal peptide excluded identification N-termini in the A. thaliana leaf proteome with our recently in GluC digests, while the tryptic digest would deliver a very established HUNTER protocol (Figure 4a). In three long peptide with unfavorably high content in acidic amino replicates per enzyme, two aliquots of A. thaliana leaf acids (Figure 4c). Similarly, legumain digests uniquely proteome were differentially dimethyl labeled to block all identified an endoproteolytic processing site in CLPR3 (Figure unmodified primary amines. Thus, all protein N-termini are 4d). modified, either by endogenous modifications such as Legumain as a Tool for N-Glycosylation Site acetylation or by in vitro dimethylation. Differentially labeled Mapping. N-Glycosylation is an important and frequent 23,37 duplicates are unified and digested in parallel with legumain, modification of secreted proteins. The removal of the GluC, or trypsin. This digestion generates new N-terminal glycan by PNGase F results in deamidation of the Asn to Asp primary amines in all internal and C-terminal peptides, which and facilitates mass spectrometry-based identification of are then undecanal labeled while the blocked N-terminal occupied N-glycosylation sites. We speculated that N- peptides remain inert. Undecanal tagging increases the glycosylation would prevent legumain from hydrolyzing hydrophobicity of the digest-generated peptides, which enables adjacent peptide bonds, on the basis of the crystal structure their selective retention on a C18 cartridge, while the dimethyl of human legumain that revealed that the zwitterionic labeled (or otherwise modified) protein N-terminal peptides character of its S1 subsite provides an ideal binding site for are highly enriched in the flow-through for selective analysis Asn, but no space to accommodate a glycosylated Asn (Figure 4a). With this negative selection, we identified a total residue. In contrast, Asp residues resulting from deglycosy- of 4773 N-terminal peptides (SI Table S3), with 1167, 1209, lation by a PNGase F treatment would be cleaved. Thus, a and 2342 N-terminal peptides identified in legumain, GluC, sequential treatment with legumain and PNGase F should and tryptic digests, respectively. The differential labeling result in longer peptides containing a missed deamidated Asn demonstrated equivalent accuracy in quantification for all (Figure 5a, workflow 1), whereas a PNGase F treatment before three enzymes (SI Figure S9). For comparison of the overlap legumain digest should result in shorter peptides ending with a 2968 https://dx.doi.org/10.1021/acs.analchem.9b03604 Anal. Chem. 2020, 92, 2961−2971 Analytical Chemistry pubs.acs.org/ac Article deamidated Asn (Figure 5a, workflow 2). In proof of concept, unassignable, complex spectra. In contrast, fragmentation by we isolated A. thaliana apoplastic fluid proteome enriched in electron transfer dissociation (ETD) is not affected by the secreted N-glycosylated proteins and sequentially treated two position of the basic residues and has been reported to aliquots with legumain and PNGase F and vice versa in two improve peptide identifications after digestion with proteases parallel reactions (Figure 5a). Treated peptides were differ- that generate long peptides or peptides with internal basic entially dimethyl labeled with heavy and light formaldehyde residues. and combined before nano-LC−MS/MS analysis. Indeed, we Parallel digests with all three enzymes increased proteome found several peptides that fulfilled the expectations (Figure and protein sequence coverage and were particularly beneficial 5b, SI Table S4). Peptides from 45 proteins contained a for protein N-termini identification, where a single digest often deamidated Asn as missed cleavage in workflow 1, whereas generated N-terminal peptides that are too short, too long, or peptides from 49 proteins ended with deamidated Asn. For 6 otherwise unfavorable for identification. By extension, similar proteins, including myrosinase 1 (TGG1), an important benefits may be expected for other post-translational glycoprotein involved in plant defense, we observed peptides modifications. Furthermore, using a sequential incubation matching to the same N-glycosylation sites in both workflows, with legumain and PNGase F, we have demonstrated that providing intrinsic orthogonal validation (Figure 5c). Notably, legumain cannot cleave after glycosylated Asn residues, in this glycosylation site has also been reported previously. contrast to deamidated deglycosylated Asn after PNGase F treatment. On a larger scale, evidence for N-glycosylation can DISCUSSION be obtained by PNGase F treatment in O-water, which results in deamidation of Asn to partially O-labeled Asp. It is well established that the use of complementary proteases However, this partial labeling makes relative quantification with different specificity in bottom-up proteomic workflows across samples challenging, while omission of the O-labeling can improve proteome coverage and provide access to 3,11 decreases confidence of the site identification as deamidation sequences that are missed in tryptic digests. This not only can also occur spontaneously. On the basis of our proof-of- allows identification of “missing proteins” that have not been concept experiment with A. thaliana apoplast proteome, we identified by mass spectrometry before, one of the central propose tandem sequential PNGase F/legumain treatment as goals of the Human Proteome Project, but also is important an alternative strategy for experimental validation of N- for comprehensive mapping of post-translational modification 14,16 glycosylation sites. sites including phosphorylations and global identification 10,20 There are many further potential applications for legumain of protein termini. Here we characterize human legumain in peptide-centric proteome workflows. We have previously as a new digestion protease in the proteomic toolbox. used legumain to generate high-quality E. coli proteome- Legumain exhibited strict sequence specificity for cleavage derived peptide libraries, which enabled detailed cleavage after Asn and Asp and a high cleavage efficiency that makes it a specificity profiling of the vitamin K-dependent coagulation highly suitable alternative proteolytic enzyme for proteomics. protease sirtilin that would not be possible in trypsin-generated We have established conditions for reliable in-solution libraries. Legumain maintains activity at a low pH, down to proteome digestion with legumain and show that the pH 4.0, and is active in nonreducing conditions; therefore it alternative cleavage site at Asn yields an entirely different set is also suitable for protein disulfide bond determination at the of peptides than trypsin does, with only minimal overlap in the low-pH environment required to prevent disulfide reshuf- number of identified peptides delimited by Asp on both sides, fling. Currently pepsin is used for these experiments due to in comparison to those with GluC digests. In agreement with its high activity under acidic conditions. However, pepsin the kinetic cleavage preferences determined with peptide 22,24 generates a large number of overlapping peptides due to its substrates, Vidmar et al. reported only minimal cleavage at broad specificity with a nonexclusive preference for cleavage Asp residues at pH 6 during in-gel digestion of a denatured after Tyr, Phe, Trp, and Leu that complicate the spectra proteome. In contrast, we have observed a much higher assignment, whereas legumain’s high cleavage specificity would cleavage efficiency at Asp residues at pH 6 in our data set, alleviate this problem. Taken together, we propose that which likely arises from the different digest conditions (in-gel recombinant human legumain is an attractive protease to digestion for 2 h with citrate buffer compared to in-solution complement trypsin in bottom-up mass spectrometry-based digestion for 16 h in a MES buffer). We noted that the data set proteomics. of Vidmar et al. contains a higher proportion of missed cleavages at Asn and Asp residues than our data set (SI Figure ASSOCIATED CONTENT S10), suggesting that the prolonged reaction under more favorable in-solution conditions enables legumain to have a sı * Supporting Information more complete cleavage at Asp residues even at pH 6.0. The Supporting Information is available free of charge at Notably, a similar effect was observed for Ulilysin/LysargiNase, https://pubs.acs.org/doi/10.1021/acs.analchem.9b03604. which has a strong preference for Arg when tested with peptide Figures of substrate cleavage specificity, biophysical substrates but results in a near-complete digestion at Lys properties of peptides, comparisions of protein sequence residues under proteome digest conditions. coverage, overlap of protein group identifications, Digestion with legumain consistently identified more potential cleavage sites, optimization of legumain peptides than digestion with GluC, but trypsin was far digestion conditions, cleavage specificity analysis, superior. This has been reported for various other digestion legumain digestion efficiency, graphs of numbers of proteases, particularly those that do not select for cleavage at 4,11 peptides and missed cleavage sites, proteolytic activity of basic residues. One explanation is that digestion with legumain, quantification of protein N-terminal peptides, enzymes such as legumain and GluC generates peptides with and potential legumain cleavage sites present in peptides internal basic residues. This can give rise to internal fragment (PDF) ions during collision-induced dissociation (CID) and result in 2969 https://dx.doi.org/10.1021/acs.analchem.9b03604 Anal. Chem. 2020, 92, 2961−2971 Analytical Chemistry pubs.acs.org/ac Article REFERENCES Tables of lists of proteins identified after digestion, N- termini identified, and identification of N-glycosylation (1) Aebersold, R.; Mann, M. Nature 2016, 537 (7620), 347−55. sites (XLSX) (2) Zhang, Y.; Fonslow, B. R.; Shan, B.; Baek, M. C.; Yates, J. R., 3rd Chem. Rev. 2013, 113 (4), 2343−94. (3) Tsiatsiani, L.; Heck, A. J. FEBS J. 2015, 282 (14), 2612−26. AUTHOR INFORMATION ■ (4) Swaney, D. L.; Wenger, C. D.; Coon, J. J. J. Proteome Res. 2010, 9 (3), 1323−9. Corresponding Author (5) Perrar, A.; Dissmeyer, N.; Huesgen, P. F. J. Exp. Bot. 2019, 70 Pitter F. Huesgen − Central Institute for Engineering, (7), 2021−38. Electronics and Analytics, ZEA-3, Forschungszentrum Jülich, (6) Lange, P. F.; Overall, C. M. Curr. Opin. Chem. Biol. 2013, 17 (1), 52428 Jülich, Germany; Cologne Excellence Cluster on Cellular 73−82. Stress Responses in Aging Associated Diseases, Medical Faculty (7) Turk, B.; Turk, D.; Turk, V. EMBO J. 2012, 31 (7), 1630−43. and University Hospital and Institute for Biochemistry, Faculty (8) Klein, T.; Eckhard, U.; Dufour, A.; Solis, N.; Overall, C. M. of Mathematics and Natural Sciences, University of Cologne, Chem. Rev. 2018, 118 (3), 1137−1168. 50931 Cologne, Germany; orcid.org/0000-0002-0335- (9) Niedermaier, S.; Huesgen, P. F. Biochim. Biophys. Acta, Proteins 2242; Email: p.huesgen@fz-juelich.de Proteomics 2019, 1867 (12), 140138. (10) Vogtle, F. N.; Wortelkamp, S.; Zahedi, R. P.; Becker, D.; Authors Leidhold, C.; Gevaert, K.; Kellermann, J.; Voos, W.; Sickmann, A.; Wai Tuck Soh − Department of Biosciences, University of Pfanner, N.; Meisinger, C. Cell 2009, 139 (2), 428−39. Salzburg, 5020 Salzburg, Austria; orcid.org/0000-0003- (11) Giansanti, P.; Tsiatsiani, L.; Low, T. Y.; Heck, A. J. Nat. Protoc. 0082-7983 2016, 11 (5), 993−1006. (12) Huesgen, P. F.; Lange, P. F.; Rogers, L. D.; Solis, N.; Eckhard, Fatih Demir − Central Institute for Engineering, Electronics and U.; Kleifeld, O.; Goulas, T.; Gomis-Ruth, F. X.; Overall, C. M. Nat. Analytics, ZEA-3, Forschungszentrum Jülich, 52428 Jülich, Methods 2015, 12 (1), 55−58. Germany; orcid.org/0000-0002-5744-0205 (13) Schrader, ̈ C. U.; Lee, L.; Rey, M.; Sarpe, V.; Man, P.; Sharma, Elfriede Dall − Department of Biosciences, University of S.; Zabrouskov, V.; Larsen, B.; Schriemer, D. C. Mol. Cell. Proteomics Salzburg, 5020 Salzburg, Austria 2017, 16, 1162−1171. Andreas Perrar − Central Institute for Engineering, Electronics (14) Schlosser, A.; Vanselow, J. T.; Kramer, A. Anal. Chem. 2005, 77, and Analytics, ZEA-3, Forschungszentrum Jülich, 52428 Jülich, 5243−5250. Germany (15) Wu, C. C.; MacCoss, M. J.; Howell, K. E.; Yates, J. R., 3rd Nat. Sven O. Dahms − Department of Biosciences, University of Biotechnol. 2003, 21 (5), 532−8. Salzburg, 5020 Salzburg, Austria (16) Gonczarowska-Jorge, H.; Loroch, S.; Dell’Aica, M.; Sickmann, Maithreyan Kuppusamy − Central Institute for Engineering, A.; Roos, A.; Zahedi, R. P. Anal. Chem. 2017, 89 (24), 13137−13145. Electronics and Analytics, ZEA-3, Forschungszentrum Jülich, (17) Meyer, J. G.; Kim, S.; Maltby, D. A.; Ghassemian, M.; Bandeira, 52428 Jülich, Germany; orcid.org/0000-0001-6866-0417 N.; Komives, E. A. Mol. Cell. Proteomics 2014, 13 (3), 823−35. (18) Wu, C. C.; Yates, J. R. Nat. Biotechnol. 2003, 21, 262−267. Hans Brandstetter − Department of Biosciences, University of (19) Giansanti, P.; Aye, T. T.; van den Toorn, H.; Peng, M.; van Salzburg, 5020 Salzburg, Austria; orcid.org/0000-0002- Breukelen, B.; Heck, A. J. Cell Rep. 2015, 11 (11), 1834−43. 6089-3045 (20) Lange, P. F.; Huesgen, P. F.; Nguyen, K.; Overall, C. M. J. Complete contact information is available at: Proteome Res. 2014, 13 (4), 2028−44. https://pubs.acs.org/10.1021/acs.analchem.9b03604 (21) Dall, E.; Brandstetter, H. Proc. Natl. Acad. Sci. U. S. A. 2013, 110 (27), 10940−5. Author Contributions (22) Vidmar, R.; Vizovisek, M.; Turk, D.; Turk, B.; Fonovic, M. EMBO J. 2017, 36 (16), 2455−2465. W.T.S. and F.D. contributed equally. (23) Hebert, D. N.; Lamriben, L.; Powers, E. T.; Kelly, J. W. Nat. Author Contributions Chem. Biol. 2014, 10 (11), 902−10. F.D. and P.F.H. conceived the project. W.T.S., F.D., E.D., (24) Dall, E.; Brandstetter, H. Acta Crystallogr., Sect. F: Struct. Biol. S.O.D., H.B., and P.F.H. designed the experiments and Cryst. Commun. 2012, 68 (1), 24−31. analyzed the data. E.D. and S.O.D. provided the recombinant (25) Wessel, D.; Flugge, U. I. Anal. Biochem. 1984, 138 (1), 141−3. human legumain, and W.T.S., F.D., M.K., and A.P. performed (26) Hughes, C. S.; Moggridge, S.; Muller, T.; Sorensen, P. H.; the experiments. The manuscript was written by W.T.S., F.D., Morin, G. B.; Krijgsveld, J. Nat. Protoc. 2019, 14 (1), 68−85. (27) Rappsilber, J.; Mann, M.; Ishihama, Y. Nat. Protoc. 2007, 2 (8), and P.F.H. and edited by all authors. All authors have approved 1896−906. the final version of the manuscript. (28) Rinschen, M. M.; Hoppe, A. K.; Grahammer, F.; Kann, M.; Notes Volker, L. A.; Schurek, E. M.; Binz, J.; Hohne, M.; Demir, F.; Malisic, The authors declare no competing financial interest. M.; Huber, T. B.; Kurschat, C.; Kizhakkedathu, J. N.; Schermer, B.; Huesgen, P. F.; Benzing, T. J. Am. Soc. Nephrol. 2017, 28 (10), 2867− ACKNOWLEDGMENTS (29) Tyanova, S.; Temu, T.; Cox, J. Nat. Protoc. 2016, 11 (12), We thank Dr. Ulrich Eckhard for critically reading this 2301−2319. manuscript. W.T.S. is a Ph.D. student in the Immunity in (30) Tyanova, S.; Temu, T.; Sinitcyn, P.; Carlson, A.; Hein, M. Y.; Cancer and Allergy Ph.D. program funded by the Austrian Geiger, T.; Mann, M.; Cox, J. Nat. Methods 2016, 13 (9), 731−40. Science Fund FWF (project W_01213). This project was in (31) Kelley, L. A.; Mezulis, S.; Yates, C. M.; Wass, M. N.; Sternberg, part supported by a starting grant of the European Research M. J. Nat. Protoc. 2015, 10 (6), 845−58. Council, with funding from the European Union’s Horizon (32) Weng, S. S. H.; Demir, F.; Ergin, E. K.; Dirnberger, S.; Uzozie, 2020 program (grant 639905, to P.F.H.) and the German A.; Tuscher, D.; Nierves, L.; Tsui, J.; Huesgen, P. F.; Lange, P. F. Mol. Research Foundation DFG (FOR2743, grant HU1756/3-1 to Cell. Proteomics 2019, 18, 2335. P.F.H.). (33) Joosten, M. H. Methods Mol. Biol. 2012, 835, 603−10. 2970 https://dx.doi.org/10.1021/acs.analchem.9b03604 Anal. Chem. 2020, 92, 2961−2971 Analytical Chemistry pubs.acs.org/ac Article (34) Deutsch, E. W.; Orchard, S.; Binz, P. A.; Bittremieux, W.; Eisenacher, M.; Hermjakob, H.; Kawano, S.; Lam, H.; Mayer, G.; Menschaert, G.; Perez-Riverol, Y.; Salek, R. M.; Tabb, D. L.; Tenzer, S.; Vizcaino, J. A.; Walzer, M.; Jones, A. R. J. Proteome Res. 2017, 16 (12), 4288−4298. (35) Vizcaino, J. A.; Csordas, A.; del-Toro, N.; Dianes, J. A.; Griss, J.; Lavidas, I.; Mayer, G.; Perez-Riverol, Y.; Reisinger, F.; Ternent, T.; Xu, Q. W.; Wang, R.; Hermjakob, H. Nucleic Acids Res. 2016, 44 (D1), D447−56. (36) Weng, S. S. H.; Demir, F.; Ergin, E. K.; Dirnberger, S.; Uzozie, A.; Tuscher, D.; Nierves, L.; Tsui, J.; Huesgen, P. F.; Lange, P. F. Mol. Cell. Proteomics 2019, 18, 2335. (37) Clerc, F.; Reiding, K. R.; Jansen, B. C.; Kammeijer, G. S. M.; Bondt, A.; Wuhrer, M. Glycoconjugate J. 2016, 33, 309−343. (38) Kuster, B.; Mann, M. Anal. Chem. 1999, 71, 1431−1440. (39) Dall, E.; Brandstetter, H. Mechanistic and structural studies on legumain explain its zymogenicity, distinct activation pathways, and regulation. Proceedings of the National Academy of Sciences of the United States of America, 2013. (40) Barth, C.; Jander, G. Plant J. 2006, 46 (4), 549−62. (41) Liebminger, E.; Grass, J.; Jez, J.; Neumann, L.; Altmann, F.; Strasser, R. Phytochemistry 2012, 84,24−30. (42) Wang, Y.; Chen, Y.; Zhang, Y.; Wei, W.; Li, Y.; Zhang, T.; He, F.; Gao, Y.; Xu, P. J. Proteome Res. 2017, 16 (12), 4352−4363. (43) Paik, Y. K.; Overall, C. M.; Deutsch, E. W.; Van Eyk, J. E.; Omenn, G. S. J. Proteome Res. 2017, 16 (12), 4253−4258. (44) Tallant, C.; Garcia-Castellanos, R.; Marrero, A.; Canals, F.; Yang, Y.; Reymond, J. L.; Sola, M.; Baumann, U.; Gomis-Ruth, F. X. Biol. Chem. 2007, 388 (11), 1243−53. (45) Dahms, S. O.; Demir, F.; Huesgen, P. F.; Thorn, K.; Brandstetter, H. J. Thromb. Haemostasis 2019, 17 (3), 470−481. (46) Gorman, J. J.; Wallis, T. P.; Pitt, J. J. Mass Spectrom. Rev. 2002, 21 (3), 183−216. 2971 https://dx.doi.org/10.1021/acs.analchem.9b03604 Anal. Chem. 2020, 92, 2961−2971 http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Analytical Chemistry Pubmed Central

ExteNDing Proteome Coverage with Legumain as a Highly Specific Digestion Protease

Loading next page...
 
/lp/pubmed-central/extending-proteome-coverage-with-legumain-as-a-highly-specific-FJ4wdL7rXh
Publisher
Pubmed Central
Copyright
Copyright © 2020 American Chemical Society
ISSN
0003-2700
eISSN
1520-6882
DOI
10.1021/acs.analchem.9b03604
Publisher site
See Article on Publisher Site

Abstract

This is an open access article published under an ACS AuthorChoice License, which permits copying and redistribution of the article or any adaptations for non-commercial purposes. pubs.acs.org/ac Article ExteNDing Proteome Coverage with Legumain as a Highly Specific Digestion Protease # # Wai Tuck Soh, Fatih Demir, Elfriede Dall, Andreas Perrar, Sven O. Dahms, Maithreyan Kuppusamy, Hans Brandstetter, and Pitter F. Huesgen* Cite This: Anal. Chem. 2020, 92, 2961−2971 Read Online Metrics & More Article Recommendations sı Supporting Information ACCESS * ABSTRACT: Bottom-up mass spectrometry-based proteomics utilizes proteolytic enzymes with well characterized specificities to generate peptides amenable for identification by high-throughput tandem mass spectrometry. Trypsin, which cuts specifically after the basic residues lysine and arginine, is the predominant enzyme used for proteome digestion, although proteases with alternative specificities are required to detect sequences that are not accessible after tryptic digest. Here, we show that the human cysteine protease legumain exhibits a strict substrate specificity for cleavage after asparagine and aspartic acid residues during in-solution digestions of proteomes extracted from Escherichia coli, mouse embryonic fibroblast cell cultures, and Arabidopsis thaliana leaves. Generating peptides highly complementary in sequence, yet similar in their biophysical properties, legumain (as compared to trypsin or GluC) enabled complementary proteome and protein sequence coverage. Importantly, legumain further enabled the identification and enrichment of protein N- termini not accessible in GluC- or trypsin-digested samples. Legumain cannot cleave after glycosylated Asn residues, which enabled the robust identification and orthogonal validation of N-glycosylation sites based on alternating sequential sample treatments with legumain and PNGaseF and vice versa. Taken together, we demonstrate that legumain is a practical, efficient protease for extending the proteome and sequence coverage achieved with trypsin, with unique possibilities for the characterization of post-translational modification sites. urrent “bottom-up” mass spectrometry-based proteomics, modification sites, and even whole proteins remain invisible in C also termed shotgun proteomics, can achieve near- proteome analyses relying on trypsin alone. This is especially complete proteome coverage and allows for extensive mapping true for proteolytic processing, a site-specific post-translational of post-translational modification sites. The basis of this protein modification that can irreversibly alter protein 5,6 approach is the selective protease-mediated digestion of function, interaction, and localization and thereby exert isolated proteomes into peptides, which are then typically important signaling functions. Processed proteoforms are separated by reverse-phase liquid chromatography under acidic unambiguously identified by their new protease-generated neo- conditions and analyzed by tandem mass spectrometry (MS/ 8,9 N-, or C-termini. The identification of neo-N-, and C- MS). Peptides are subsequently identified by computational terminal peptides, which constitute a minor fraction among all matching of the acquired spectra to proteome databases or peptides in a proteome digest, is facilitated by a variety of spectral libraries, and the proteins present in the sample are methods that have been developed to allow for their selective inferred on the basis of the identified peptides. The serine enrichment. However, many neo-N-, or C-terminal peptides protease trypsin has become the dominant workhorse for the are too short for mass spectrometry-based identification when proteome digestions due to its high cleavage efficiency, high only a single protease is used. specificity for cleavage after Arg or Lys, and affordable price, Alternative proteases with a high sequence specificity are even for high-quality preparations. Proteomes digested with therefore of great interest and increasingly applied in bottom- trypsin therefore consist of predictable peptides with a C- 3,10 up proteomics, including termini profiling approaches. terminal basic residue favorable for the ionization and generation of a dominant y-ion series, which facilitates database searches and peptide identification. However, about Received: August 7, 2019 half of the peptides generated by trypsin are less than six Accepted: January 17, 2020 residues long and therefore too small for identification and/or Published: January 17, 2020 unambiguous assignment to specific protein sequences. Thus, many protein segments, including critical post-translational https://dx.doi.org/10.1021/acs.analchem.9b03604 © 2020 American Chemical Society Anal. Chem. 2020, 92, 2961−2971 2961 Analytical Chemistry pubs.acs.org/ac Article Established proteases include AspN for cleavage before Asp desalting using PD-10 columns (GE Healthcare). Purified and Glu; chymotrypsin for cleavage after Phe, Tyr, Leu, Trp, legumain was activated at 20 °Cina buffer containing 100 mM and Met; GluC (also known as Staphyloccoccus aureus protease citric acid (pH 4.0), 100 mM NaCl, and 2 mM DTT. The V8) for cleavage after Asp and Glu; LysC for cleavage after progress of autoactivation was monitored by SDS-PAGE. 3,11 Lys; LysN for cleavage before Lys; LysargiNase for cleavage Activated legumain was further purified using a PD-10 column before Arg and Lys; and the prolyl endopeptidase neprosin (GE Healthcare) followed by size exclution chromatography to that selectively cleaves after Pro and Ala. Also, proteases with have the active protein in a final buffer composed of 20 mM broader sequence specificity such as elastase and thermoly- citric acid (pH 4.0), 50 mM NaCl, and 2 mM DTT. Legumain 14 15 16 sin, proteinase K, subtilisin, and thermolysin WaLP and activity was evaluated using the legumain specific fluorescent MaLP are occasionally applied but less favored due to the substrate Z-Ala-Ala-Asn-AMC (AAN-AMC; Bachem) at a increased sample complexity with overlapping peptides and the concentration of 50 μM in assay buffer composed of 50 mM less efficient spectrum-to-sequence matching due to the lack of citric acid (pH 5.5), 100 mM NaCl, and 2 mM DTT at 37 °C. adefined cleavage specificity as a restraint. Notably, digest Fluorescence was detected using an Infinite M200 Plate with a single additional protease increases the number of Reader (Tecan) at 460 nm after excitation at 380 nm. protein identifications by an average of 7−8% and enables A. thaliana Proteome Preparation. A. thaliana Colum- the discovery of critical PTMs including phosphorylation bia (Col-8) leaves were harvested from 10 week old plants 16,19 10,20 sites and N-terminal processing sites that are missed in grown on soil under short day conditions (9 h/15 h tryptic digests. Hence, there is a persistent strong demand for −2 −1 photoperiod, 22 °C/18 °C, 120 μmol of photons m s ) new, highly specific proteolytic enzymes with improved, and snap frozen in liquid nitrogen. Leaves were ground in complementary, or unexplored sequence specificity. liquid nitrogen and resuspended in 10 mL/g fresh weight of Human legumain, also known as asparaginyl endopeptidase extraction buffer (6 M Gua-HCl, 0.1 M HEPES (pH 7.4), 5 (AEP), is a well characterized caspase-like human cysteine mM EDTA, 1 mM DTT, and HALT protease inhibitor protease known to cleave model substrates selectively after Asn cocktail; ThermoFisher, Dreieich, Germany). The suspension and Asp residues. Recently, legumain cleavage specificity was was homogenized using a Polytron PT-2500 (Kinematica, further characterized by in-gel digestion of denatured complex Luzern, Switzerland) and filtered through Miracloth (Merck, proteomes that revealed pH-dependent differences in sequence Darmstadt, Germany), and debris and nuclei were removed by specificity, with an optimal pH for cleavage after Asn and Asp 22 centrifugation at 500g,4 °C for 10 min. Proteins in the at pH 6 and 4.5, respectively. On the basis of this data, it was supernatant were purified by chloroform−methanol precip- further suggested that legumain may be a suitable choice as a 22 itation, resuspended in extraction buffer, and reduced with 5 precision digestion enzyme in proteomics applications. mM DTT at 56 °C, 30 min followed by alkylation with 15 mM Encouraged by these reports, we reasoned that legumain iodoacetamide for 30 min at 25 °C. The reaction was might also be an attractive enzyme for standard in-solution quenched by the addition of 15 mM DTT for 15 min. The digestion proteomics workflows. We show that the parallel proteome extract was purified again with chloroform− digestion of proteomes isolated from Arabidopsis thaliana (A. methanol precipitation, resuspended in 0.2 mL of 0.1 M thaliana) leaves, mouse embryonic fibroblasts (MEF), or NaOH, and diluted with water and 1 M Hepes (pH 7.4) to a Escherichia coli (E. coli) cell cultures with legumain, trypsin, and final concentration of 4 mg/mL in 0.1 M HEPES (pH 7.4). GluC results in the identification of distinct peptides that The protein concentration was quantified using the BCA assay together increase protein sequence and proteome coverage. (ThermoFisher, Dreieich, Germany). For digestion, aliquots of Legumain retained its remarkable specificity even under the concentrated A. thaliana proteome extracts were diluted at unfavorable conditions. N-terminome profiling demonstrated least four times to reach the required digestion buffer a strong complementarity to trypsin and superior performance conditions, and the pH was confirmed with pH strips compared to that of GluC. Asn is also the site of N-linked (Merck, Darmstadt, Germany). glycosylation, a common protein post-translational modifica- Mouse Embryonic Fibroblast Proteome Preparation. tion important in protein stability, folding, and protein− Mouse embryonic fibroblast (MEF) cells were cultured in protein interaction. By sequential processing with PNGase F DMEM GlutaMax high glucose (Gibco 61965-026) supple- and legumain, and vice versa, we demonstrate that N- mented with 10% FBS and 1× penicillin/streptomycin (Gibco glycosylation prevents legumain cleavage and propose that 15140-122) at 37 °C, 5% CO . Once the cells reached a this tandem treatment strategy can provide orthogonal validation of N-glycosylation sites. Taken together, our data confluency of up to 90%, the media were removed, washed demonstrate that legumain is an attractive and reliable protease with warm PBS, and trypsinized (Gibco 25300-054). The for the specific digestion of proteomes after Asn and Asp, with trypsinized cells were pelleted, washed twice with warm PBS to particular advantages for PTM site identification including remove excess media, and lysed with 1% SDS 100 mM HEPES processed N-termini and N-glycosylation sites. (pH 7.5) containing 1:50 (v/v) protease inhibitor cocktail (Sigma P8340). The sample was heated to 95 °C for 5 min, EXPERIMENTAL SECTION cooled, sonicated for 2 min, and heated again to 95 °C for 5 min to shear DNA. The protein concentration was measured, Expression, Purification, and Activation of Human and 100 μg of protein was used for each proteome digestion. Legumain. Human legumain was produced using the Proteins were reduced with 10 mM DTT for 30 min at 37 °C Leishmania tarentolae expression system (LEXSY) following a and alkylated by the addition of 50 mM chloroacetamide previously published protocol. Briefly, legumain was (CAA) and incubation for 30 min at RT in the dark. The recombinantly expressed as a secreted protein by a LEXSY reaction was quenched by incubation with 50 mM DTT for 20 suspension culture at 26 °C. The supernatant containing prolegumain protein was harvested by centrifugation and min at room temperature (RT) before purification with SP3 2+ subjected to Ni -NTA affinity purification, followed by beads and elution in the required digestion buffer. 2962 https://dx.doi.org/10.1021/acs.analchem.9b03604 Anal. Chem. 2020, 92, 2961−2971 Analytical Chemistry pubs.acs.org/ac Article E. coli Proteome Preparation. E. coli Dh5α cells were Mass Spectrometry Data Analysis. Database searches grown in 200 mL of LB media until an optical density of OD were performed with MaxQuant v.1.6.0.16 using standard nm of 0.7. Cells were harvested by centrifugation at 400g for Bruker Q-TOF settings that included peptide mass tolerances 15 min at 4 °C, washed by adding ice-cold PBS, and of 0.07 Da in the first search and 0.006 Da in the main search. resuspended in 1 mL of lysis buffer (4% (v/v) SDS, 50 mM A. thaliana, M. musculus, and E. coli protein databases were HEPES (pH 7.4), 5 mM EDTA, 1× HALT protease inhibitor downloaded from UniProt (A. thaliana release 2018_01, cocktail (ThermoScientific)) per 0.1 g of fresh weight. The 41350 sequences) with appended common contaminants as cells were lysed by heating to 95 °C two times for 5 min, with embedded in MaxQuant. The “revert” option was enabled for 10 min of cooling with ice. Proteins were purified by decoy database generation. For shotgun proteome samples, chloroform−methanol precipitation and resuspended in 6 M specificity was set to “unspecific” for the characterization of the Gua-HCl, 100 mM HEPES (pH 7.4), 5 mM EDTA, and the cleavage specificity, otherwise according to the enzyme used concentration was estimated using the BCA assay (Thermo- (cleavage at K/R|X for trypsin, D/E|X for GluC, or D/N|X for Fisher, Dreieich, Germany). One hundred micrograms of legumain). Oxidation (M) and acetylation (protein N-term) proteome was reduced by the addition of 10 mM DTT for 30 were set as variable modifications, and the “match between runs” option was disabled. The analysis of the label-free min at 37 °C and alkylated by the addition of 50 mM CAA for shotgun data was performed with Perseus v.1.6.1.1; the 30 min at RT in the dark, and the reaction was quenched by validation of the protein identification required at least two incubation with 50 mM DTT for 20 min at RT. The proteome unique peptides for each protein and label-free quantification was purified by chloroform−methanol precipitation and (LFQ) in at least two replicates. Searches for the N-termini resolubilized in the appropriate digestion buffer. were performed as described above, except that the enzyme Proteome Digestions. Proteome aliquots of 100 μg were specificity was set as Arg-C/GluC (DE)/legumain semispecific individually digested by legumain, GluC, or trypsin. The digestion with legumain was carried out in a reaction with a free N-terminus and duplex dimethyl labeling with light 12 13 containing 0.1 M MES (pH 6.0), 0.1 M NaCl, and 2 mM CH O formaldehyde or heavy CD Oformaldehyde 2 2 DTT at a protease to proteome ratio of 1:50 (m:m), unless (peptide N-term and K). Oxidation (M), acetyl (N-term), otherwise stated. For GluC (SERVA Electrophoresis, Heidel- Gln → pyro-Glu, and Glu → pyro-Glu were set as dynamic berg, Germany) digestion, the same amount of proteome was modifications, and the requantify option was turned off; the digested in PBS (pH 7.4) with a protease to proteome ratio of unspecific search window was set to 8−40 amino acids. Data 1:50, whereas a 1:100 ratio was used for trypsin (SERVA evaluation and positional annotation for N-termini analyses Electrophoresis, Heidelberg, Germany) digestion in 0.1 M were performed using an in-house Perl script (MANTI.pl; HEPES (pH 7.4) supplemented with 5% acetonitrile and 5 available at http://MANTI.sourceforge.io) that combines mM CaCl . The pH was confirmed using pH strips (Merck, information provided by MaxQuant and UniProt to annotate Darmstadt, Germany), and the digestions were carried out at and classify identified N-terminal peptides. In short, MaxQuant 37 °C overnight. For pH shift assays with legumain, an aliquot peptide identifications are consolidated by removing nonvalid of the MEF proteome was digested at pH 6.0 for 5 h at 37 °C, identifications (peptides identified with N-terminal pyro-Glu and then the pH was lowered by the stepwise addition of 1 M peptides that do not contain Glu or Gln as N-terminal residue, HCl until pH 4.0 was reached. An additional 2 μg of legumain peptides with dimethylation at N-terminal Pro), contaminant, and 1 mM DTT were added and incubated for another 5 h at reverse database peptides, and nonquantifiable acetylated 37 °C. peptides in multichannel experiments (no K in peptide Mass Spectrometry. All samples were desalted using self- sequence to determine labeled channel). For N-terminal packed C18 Stop and Go Extraction tips as previously peptides mapping to multiple entries in the UniProt protein described. Analysis was performed on a two-column nano- database, a “preferred” entry was determined in a binary HPLC setup (Ultimate 3000 nano-RSLC system with Acclaim decision tree. Protein entries where the identified peptide PepMap 100 C18, i.d. 75 μm, particle size 3 μm; trap column matched positions 1 or 2 were preferred over alternative of 2 cm and analytical column of 50 cm length; ThermoFisher) positions, and then manually reviewed UniProt protein entries with a binary gradient from 5 to 32.5% B for 80 min (A, H O+ were favored over alternative models. If multiple entries 0.1% FA; B, ACN + 0.1% FA) and a total runtime of 2 h per persisted, the alphabetically first entry was used to retrieve sample coupled to a high-resolution Q-TOF mass spectrom- positional annotation information. For the visualization of eter (Impact II, Bruker) as previously described. Data was protein sequence coverage, protein structures were modeled acquired with the Bruker HyStar Software (v3.2, Bruker with the Phyre2 server. Daltonics) in line-mode in a mass range from 200 to 1500 m/z Enrichment of N-Terminal Peptides. Protein N-terminal at an acquisition rate of 4 Hz. The top 17 most intense ions peptides were enriched using the high-efficiency undecanal- were selected for fragmentation with a dynamic exclusion of based N-termini enrichment (HUNTER) method essentially previously selected precursors for the next 30 s, unless an as previously described. Briefly, equal amounts of A. thaliana intensity increase of factor 3 compared to the previous proteome were dimethyl labeled with 20 mM heavy ( CD O) precursor spectrum was observed. Intensity-dependent frag- or light (CH O) formaldehyde and 20 mM sodium mentation spectra were acquired between 5 Hz, for low- cyanoborohydride at 37 °C for 16 h to block all primary intensity precursor ions (>500 cts), and 20 Hz, for high- amines. To ensure a complete reaction, the same concentration intensity (>25k cts) spectra. Fragment spectra were acquired of reagents was added again and incubated for another 2 h. with stepped parameters, each with 50% of the acquisition time Proteins were purified by chloroform−methanol precipitation dedicated for each precursor: 61 μs transfer time, 7 eV collision to remove excess reagents and dissolved in 0.1 M HEPES (pH energy, and a collision radio frequency (RF) of 1500 Vpp 7.4), and the protein concentration was estimated using the followed by a 100 μs transfer time, 9 eV collision energy, and a BCA assay according to manufacturer instructions (Thermo- collision RF of 1800 Vpp. Fisher, Dreieich, Germany). The samples (400 μg/sample) 2963 https://dx.doi.org/10.1021/acs.analchem.9b03604 Anal. Chem. 2020, 92, 2961−2971 Analytical Chemistry pubs.acs.org/ac Article were digested with legumain, GluC, and trypsin at 37 °C for 16 selectivity for Asn-containing substrates near pH 6. To test h in the respective digestion buffers and protease−proteome whether this exquisite specificity holds true under in-solution ratios as described above. The protease-generated peptides proteome digest conditions, we digested three aliquots of a were hydrophobically tagged with undecanal using an denatured A. thaliana proteome with legumain at pH 6.0 for 18 undecanal−proteome ratio of 50:1 and supplemented with h. In parallel, we digested three aliquots of the same proteome 20 mM sodium cyanoborohydride in 40% ethanol at 50 °C for with trypsin and GluC at pH 7.4. To determine protease 45 min. The reaction was extended by the addition of 20 mM cleavage site specificity, peptides were analyzed by nano-LC− sodium cyanoborohydride for another 45 min. The reaction MS/MS and the acquired spectra were matched to the UniProt was then acidified with a final 1% TFA and centrifuged at A. thaliana proteome database using nonspecificsearch 21000g for 5 min to precipitate free undecanal. Supernatant settings, i.e., without defining an enzyme cleavage specificity. was injected to a preactivated HR-X (M) cartridge (Macherey- This unbiased search identified 4452, 4078, and 7985 peptide Nagel, Düren, Germany). The flow-through containing N- sequences in legumain, GluC, and tryptic digests, respectively, terminal peptides was collected. Remaining N-terminal from which we compiled 6300, 5673, and 12107 unique peptides on the HR-X (M) cartridge were eluted with 40% nonredundant cleavage sites based on the sequence surround- ethanol containing 0.1% TFA, pooled with the first eluate, and ing both ends of the identified peptides. For legumain, 93.3% subsequently evaporated in the SpeedVac to a small volume of the observed cleavage sites were Asn and Asp (51.0% after suitable for C18 StageTips purification. Asn, 42.3% after Asp). A small percentage of unspecific Identification of Glycosylation Sites. Apoplastic fluid cleavage is expected because of endogenous background proteome enrichment was carried out as described with some proteolysis. The percentage of specific cleavage in a whole modifications. The whole A. thaliana rosettes were infiltrated proteome is comparable to 96.7% of cleavages after Lys and with cold sterile water in a SpeedVac for 3 min at a pressure Arg, as observed for trypsin (58.0% after Lys, 38.7% after Arg), between 600 and 2500 Pa. The infiltrated rosettes were then and more stringent than the 85.4% cleavages after Glu and Asp centrifuged at 4 °C, 3000g for 10 min into a collection tube (72.7% after Glu, 12.7% after Asp), as observed for GluC. The containing a Halt protease inhibitor cocktail (ThermoFisher, visualization of the relative amino acid abundance surrounding Dreieich, Germany). Extracted apoplastic fluid proteins were the cleavage sites with IceLogos reflected the strict specificity purified by chloroform−methanol precipitation and resus- at the P1 position, preceding the hydrolyzed peptide bond in pended in 50 mM HEPES (pH 7.4). The protein all three enzymes (Figure 1a−c). While GluC (Figure 1b) and concentration was quantified by using the BCA assay. The trypsin (Figure 1c) do not allow cleavage before proline (P1′ sample was then reduced with 5 mM DTT at 56 °C for 30 min position), this is not the case for legumain (Figure 1a). We and alkylated with 15 mM iodoacetamide at 25 °C for 30 min further analyzed a single replicate of a mouse embryonic in the dark, and the reaction was quenched with 15 mM DTT at 25 °C for 15 min. The protein extract was then separated into two aliquots. One aliquot of 100 μg of apoplast proteome was treated with PNGase F (SERVA Electrophoresis, Heidelberg, Germany) for 2 h at 37 °C before legumain digestion with protease at a ratio of 1:50 at 37 °C, pH 6 (pH adjusted with final concentration of 0.1 M MES pH 6.0). In parallel, another 100 μg of protein extract was predigested with legumain and then treated with PNGase F using the same conditions. The samples were subsequently dimethyl labeled with 20 mM heavy ( CD O) and light (CH O) formaldehyde 2 2 and 20 mM sodium cyanoborohydride at 37 °C for 2 h. The reactions were quenched with 0.1 M Tris pH 7.4 at 37 °C for 1 h and pooled in a 1:1 ratio, and peptides were purified by C18 StageTips. Data Deposition. MS data have been deposited to the ProteomeXchange Consortium (http://www. proteomexchange.org) via the PRIDE (https://www.ebi.ac. uk/pride/archive/) partner repository: PXD014696 for data relating to comparative proteome digestion with legumain, GluC, and trypsin, PXD014699 for A. thaliana proteome digested by legumain in the presence of various denaturants, PXD014698 for various pHs, PXD014697 for HUNTER N- termini profiling of A. thaliana leaves, and PXD014680 for N- glycosylation site mapping. RESULTS Figure 1. Substrate cleavage specificity of legumain, GluC, and Legumain Cleaves Denatured Proteomes Exclusively trypsin. IceLogos visualize the amino acid frequencies surrounding the after Asn and Asp. Previous data obtained by in-gel protein cleavage sites inferred from peptides identified by nonspecific digestion-based specificity profiling and by biochemical database searches after digestion of (a−c) an A. thaliana leaf characterization with test peptides suggested that legumain proteome or (d−f) mouse embryonic fibroblast cell lysate proteome cleaves substrates C-terminally to Asn and Asp residues in a with (a,d) legumain, (b,e) GluC, or (c,f) trypsin. The numbers of pH-dependent manner, with optimal activity and high nonredundant cleavage sites for each logo are indicated. 2964 https://dx.doi.org/10.1021/acs.analchem.9b03604 Anal. Chem. 2020, 92, 2961−2971 Analytical Chemistry pubs.acs.org/ac Article Figure 2. Analysis of an A. thaliana leaf proteome digested with legumain, GluC, and trypsin, each performed in three technical repeats. (a) Overlap of unique peptide sequences identified using enzyme-specific database queries. Analysis of the (b) mass, (c) hydrophobicity, and (d) isoelectric point of the identified peptides. (e) Overlap in unique amino acids identified by digestion with the three proteases. (f) Protein sequence coverage observed for superoxide dismutase (At1g08830) in legumain (red, 93%), GluC (green, 43%), and trypsin (blue, 49%) proteome digests. (g) Upset plot showing the overlap in protein groups identified in individual technical digestion replicates. (h) Venn diagram showing the total overlap of protein groups identified by the three enzymes. (i) Reproducibility of proteome quantification (MaxQuant LFQ). Only proteins quantified with two or more peptides were considered. Value indicates the Pearson correlation between the LFQ values obtained for technical replicates. fibroblast proteome and identified 1893, 1722, and 4377 analyses of an E. coli proteome (Supporting Information (SI) peptides using nonspecific database searches after digestion Figure S1), where 2681 peptides identified after legumain with legumain, GluC, and trypsin. Similar specificity profiles digestion yielded 4187 cleavage sites with 86.2% cleavage after were obtained on the basis of the 3244, 2999, and 7965 Asn and Asp (53.1% after Asn, 33.1% after Asp), while 85.3% nonredundant cleavage sites derived from the peptides in of the 8597 unique cleavages observed in 5374 peptides legumain (Figure 1d), GluC (Figure 1e), and trypsin (Figure identified after tryptic digest matched the expected specificity 1f) digests, again showing that legumain tolerates Pro at P1′ (44.1% after Arg, 41.2% after Lys). (Figure 1d). Of the cleavages observed in legumain digest, Complementary Protein Sequence Coverage by 94.5% matched the expected specificity (63.6% after Asn, Digestion with Legumain Compared to GluC and 30.9% after Asp), 97.6% in the tryptic digest (51.9% after Arg, Trypsin. With the strict cleavage specificity of legumain 45.7% after Lys), and 85% in the GluC digest (76.6% after Glu, under proteome digest conditions confirmed by the unbiased 8.4% after Asp). These observations were further confirmed by database search, we repeated spectra-to-sequence matching 2965 https://dx.doi.org/10.1021/acs.analchem.9b03604 Anal. Chem. 2020, 92, 2961−2971 Analytical Chemistry pubs.acs.org/ac Article using standard enzyme-specific settings with up to three missed high correlation between the LFQ values obtained from digests cleavages, using cleavage after Asn and Asp as a specificity rule of the three different proteases (Figure 2i). for legumain. As expected, the smaller search space In the MEF proteome, 1469, 1140, and 2242 protein groups significantly increased the number of peptide identifications were identified in legumain, GluC, and tryptic digests, in the A. thaliana data set by 64%, 8%, and 66% to 7284, 4394, combining to 2587 protein groups in total, with 7.7% and 12806 unique peptide sequences for legumain, GluC, and exclusively identified in the legumain digests (SI Figure S4c). trypsin, respectively (Figure 2a). Specific searches of the MEF A larger overlap was observed between the E. coli proteome proteome data set increased peptide identifications by 129%, digests, where 842 and 1180 protein groups were identified 73%, and 61% to 4296, 2983, and 8489 unique peptides for after legumain and tryptic digestion, respectively, but only 37 legumain, GluC, and trypsin, respectively, compared to results (3%) of these were exclusive for legumain (SI Figure S4d). for nonspecific searches. In E. coli, peptide identifications Legumain Cleaves after Asn More Efficiently than improved by 33% and 7% to 3568 and 5767 unique peptides after Asp. The digestion efficiency of a protease can be for legumain and trypsin. reflected by the number of missed potential cleavage sites While trypsin showed the expected superior performance, within the identified peptides. In the A. thaliana data set, legumain digests resulted in the identification of more peptides legumain generated on average 53% of the peptides without missed cleavage sites, 34% with one missed potential cleavage than GluC, for example, 66% more in the A. thaliana data set. site, and 13% with more than one missed cleavage site (Figure Interestingly, the legumain and GluC data sets showed only a 3a). GluC performed worse, with only 30% of the peptides minimal overlap of 66 identical peptides delimited by cleavages after Asp on both sides, which may occur with both enzymes but are not favored by GluC under the applied reaction conditions (Figure 2a). The analysis of the mass (Figure 2b), hydrophobicity (Figure 2c), and isoelectric point (Figure 2d) of the identified A. thaliana peptides revealed very similar properties for all three enzymes. In contrast, the biophysical properties of all theoretical peptides in in silico-digested A. thaliana and M. musculus proteomes predicted a higher number of peptides with pI > 9 in GluC- and legumain-digested proteomes compared to those with trypsin (SI Figure S2a,b). However, a comparison to our data (Figure 2b−d) suggests that such peptides are rarely identified with the standard experimental setup with reverse-phase chromatography under acidic conditions and ionization and mass spectrometric analysis in positive ion mode. Despite these physical similarities, peptides identified after digestion with the three proteases covered distinct amino acids in the identified A. thaliana proteins (Figure 2e). In total, the parallel application of legumain, GluC, and trypsin in technical triplicates identified 1524, 1090, and 2380 protein groups in the A. thaliana proteome, respectively, combining to a total of 2785 protein groups, with legumain contributing 8.8% exclusive identifications (Figure 2g,h, SI Figure 3. Potential cleavage sites missed by legumain, GluC, and Table S1). As expected from the number of peptide trypsin in A. thaliana leaf proteome digests. (a) Percentage of peptides containing up to three missed cleavage sites. (b) Missed cleavage sites identifications, a large majority of 2057 proteins (74.3%) had sorted by missed amino acid residues. the highest sequence coverage in the tryptic digest, followed by 507 (18.3%) in legumain digests and 206 (7.4%) in GluC digests (SI Table S1). For example, the sequence coverage of with no missed cleavage, but almost 12% of the identified superoxide dismutase (At1g08830) (Figure 2f, SI Figure S3a) peptides containing three missed cleavage sites. Trypsin was was a remarkable 93% in legumain digests compared to 43% the best performing enzyme, with only 18% of the peptides and 49% in the GluC and trypsin data sets, and sequence containing one or more missed cleavage sites (Figure 3a). coverage of the germin-like protein 1 (At1g72610) was at 63% When we further considered the identity of the amino acid with legumain compared to only 23% and 8% with GluC and residue, we noted that legumain reliably cleaved after Asn trypsin (SI Figure S3b). Notably, for each of the three residues, with only 5% of the peptides containing an internal proteases >80% of the proteins were identified in all three Asn, but it missed one or more Asp in 40% of the peptides replicates, indicating a high degree of reproducibility in the (Figure 3b). Most missed cleavage sites in GluC-digested digests (SI Figure S4a). On the single replicate level, the proteomes were at Asp, and even trypsin showed a higher combination of any tryptic digest with any legumain or GluC fidelity at Arg than at Lys (Figure 3b). Remarkably, legumain digest resulted in a slightly higher number of protein cleaved after Asn residues as efficiently as trypsin at the favored identifications than any two tryptic replicates combined (SI Arg-containing cleavage sites. Similar trends were observed in Figure S4b). We further compared reproducibility by label free digests of MEF and E. coli proteomes, where legumain digests proteome quantification (LFQ) with MaxQuant after filtering consistently showed a high cleavage efficiency at Asn sites with for protein groups quantified by two or more peptides (SI more missed cleavages at Asp (SI Figure S5). Table S2). This demonstrated excellent correlation of the LFQ Assessing Legumain Efficiency in Different Reaction values between the technical digestion replicates and also a Conditions. Previous publications have shown that legumain 2966 https://dx.doi.org/10.1021/acs.analchem.9b03604 Anal. Chem. 2020, 92, 2961−2971 Analytical Chemistry pubs.acs.org/ac Article Figure 4. Complementary N-terminome coverage by parallel digestion with legumain, GluC, and trypsin. (a) Experimental workflow for the enrichment of N-terminal peptides using HUNTER. For detailed description, see the main text. Light blue and orange circles indicate differential stable isotope labeling by reductive dimethylation, and magenta triangles indicate undecanal modification. (b) Overlap in N-termini identification based on the first seven amino acids of each N-terminal peptide identified in the experiments with the three proteases. Peptide MS/MS fragmentation spectra of (c) the acetylated mature N-terminus of glucosinolate transporter-1 and (d) a proteolysis-derived dimethylated N- terminus in the CLPR3 subunit of the ATP-dependent Clp protease. Both termini were identified in legumain digests, with sequence context surrounding the identified peptide indicated in gray. UniProt accession code and gene accession numbers are indicated. is more active at a lower pH and that cleavage after Asn is step incubation maintained efficient cleavage at Asn residues favored at a higher pH. To test if this is also the case with the while decreasing the number of peptides containing missed digest conditions applied here, we digested whole-leaf A. Asp cleavage sites (SI Figure S5). Denaturants are commonly used for proteome preparations thaliana proteome at varying pHs between 5.0 and 6.5 for shorter (2 h) and longer (24 h) incubation times (SI Figure but are problematic during digestion. We tested the tolerance S6a). We observed the highest number of peptide of legumain to urea and guanidinium hydrochloride but identifications at pH 5.5 and pH 6, which may have been observed dramatically decreased digestion efficiency (SI Figure caused by the higher propensity for proteome precipitation at a S7a), reflected in decreased peptide identifications (SI Figure lower pH that we observed in concentrated samples. As S7b) with an increased frequency of missed cleavage (SI Figure expected, legumain showed an increasing preference for Asn S7c). In contrast, legumain tolerated the organic solvent with an increasing pH, and this kinetic preference was also acetonitrile quite well with little decrease in efficiency up to reflected at different digestion times. Short proteome 10% acetonitrile concentration (SI Figure S7). digestions (2 h) and/or lower pH (pH 5.0) resulted in a We also assessed the amount of legumain necessary to higher proportion of Asn cleavages (SI Figure S6b), whereas achieve optimal digest by varying the protease to proteome longer incubation (24 h) and/or higher pH yielded more ratio. Digestion appeared equally efficient in several dilutions complete cleavage after Asp (SI Figure S6b). On the basis of down to a legumain to proteome ratio of 1:100, as judged by this observation, we tested whether acidification of the MEF the number of identified peptides from an equal starting proteome digest after an initial incubation at pH 6 would result material (SI Figure S8a). Another important enzyme property in more complete cleavage at Asp residues. Indeed, this two- for routine use is the shelf-life time, where our recombinant 2967 https://dx.doi.org/10.1021/acs.analchem.9b03604 Anal. Chem. 2020, 92, 2961−2971 Analytical Chemistry pubs.acs.org/ac Article Figure 5. Identification of N-glycosylation sites by sequential processing with legumain and PNGase F. (a) Scheme of the experimental workflow. For details, see the main text. Light blue and orange circles indicate differential stable isotope labeling by dimethylation. Asterisks indicate deamidated asparagine residue arising from PNGase F treatment. (b) Overlap of N-glycosylation identified with internal deamidated Asn in workflow 1 and with C-terminal deamidated Asn in workflow 2. (c) MS/MS fragmentation spectra of an N-glycosylation site in MYROSINASE 1 identified in both workflows. UniProt and A. thaliana gene accession codes are indicated. legumain preparations withstood 10 freeze/thaw cycles of identified protein N-termini, we extracted the first seven without loss of peptidase activity (SI Figure S8b). residues of each N-terminal peptide (Figure 4b). Only a Legumain is Highly Complementary for Protein N- minority of 100 protein N-termini were identified by all three Termini Profiling. The complementarity of different proteases, and an additional 632 were identified by two digestion enzymes is particularly helpful for the identification proteases, with a majority of 2101 N-termini identified only in of specific post-translational modification sites such as digests of a single enzyme (Figure 4b). For example, the 14,16 10,20 phosphorylations and protein termini, as these may acetylated, native N-terminus of the glucosinolate transporter-1 reside in sequences that are not accessible by trypsin. To NPF2.10 was only identified in legumain digests, whereas demonstrate the value of legumain for this purpose, we profiled multiple Glu in the N-terminal peptide excluded identification N-termini in the A. thaliana leaf proteome with our recently in GluC digests, while the tryptic digest would deliver a very established HUNTER protocol (Figure 4a). In three long peptide with unfavorably high content in acidic amino replicates per enzyme, two aliquots of A. thaliana leaf acids (Figure 4c). Similarly, legumain digests uniquely proteome were differentially dimethyl labeled to block all identified an endoproteolytic processing site in CLPR3 (Figure unmodified primary amines. Thus, all protein N-termini are 4d). modified, either by endogenous modifications such as Legumain as a Tool for N-Glycosylation Site acetylation or by in vitro dimethylation. Differentially labeled Mapping. N-Glycosylation is an important and frequent 23,37 duplicates are unified and digested in parallel with legumain, modification of secreted proteins. The removal of the GluC, or trypsin. This digestion generates new N-terminal glycan by PNGase F results in deamidation of the Asn to Asp primary amines in all internal and C-terminal peptides, which and facilitates mass spectrometry-based identification of are then undecanal labeled while the blocked N-terminal occupied N-glycosylation sites. We speculated that N- peptides remain inert. Undecanal tagging increases the glycosylation would prevent legumain from hydrolyzing hydrophobicity of the digest-generated peptides, which enables adjacent peptide bonds, on the basis of the crystal structure their selective retention on a C18 cartridge, while the dimethyl of human legumain that revealed that the zwitterionic labeled (or otherwise modified) protein N-terminal peptides character of its S1 subsite provides an ideal binding site for are highly enriched in the flow-through for selective analysis Asn, but no space to accommodate a glycosylated Asn (Figure 4a). With this negative selection, we identified a total residue. In contrast, Asp residues resulting from deglycosy- of 4773 N-terminal peptides (SI Table S3), with 1167, 1209, lation by a PNGase F treatment would be cleaved. Thus, a and 2342 N-terminal peptides identified in legumain, GluC, sequential treatment with legumain and PNGase F should and tryptic digests, respectively. The differential labeling result in longer peptides containing a missed deamidated Asn demonstrated equivalent accuracy in quantification for all (Figure 5a, workflow 1), whereas a PNGase F treatment before three enzymes (SI Figure S9). For comparison of the overlap legumain digest should result in shorter peptides ending with a 2968 https://dx.doi.org/10.1021/acs.analchem.9b03604 Anal. Chem. 2020, 92, 2961−2971 Analytical Chemistry pubs.acs.org/ac Article deamidated Asn (Figure 5a, workflow 2). In proof of concept, unassignable, complex spectra. In contrast, fragmentation by we isolated A. thaliana apoplastic fluid proteome enriched in electron transfer dissociation (ETD) is not affected by the secreted N-glycosylated proteins and sequentially treated two position of the basic residues and has been reported to aliquots with legumain and PNGase F and vice versa in two improve peptide identifications after digestion with proteases parallel reactions (Figure 5a). Treated peptides were differ- that generate long peptides or peptides with internal basic entially dimethyl labeled with heavy and light formaldehyde residues. and combined before nano-LC−MS/MS analysis. Indeed, we Parallel digests with all three enzymes increased proteome found several peptides that fulfilled the expectations (Figure and protein sequence coverage and were particularly beneficial 5b, SI Table S4). Peptides from 45 proteins contained a for protein N-termini identification, where a single digest often deamidated Asn as missed cleavage in workflow 1, whereas generated N-terminal peptides that are too short, too long, or peptides from 49 proteins ended with deamidated Asn. For 6 otherwise unfavorable for identification. By extension, similar proteins, including myrosinase 1 (TGG1), an important benefits may be expected for other post-translational glycoprotein involved in plant defense, we observed peptides modifications. Furthermore, using a sequential incubation matching to the same N-glycosylation sites in both workflows, with legumain and PNGase F, we have demonstrated that providing intrinsic orthogonal validation (Figure 5c). Notably, legumain cannot cleave after glycosylated Asn residues, in this glycosylation site has also been reported previously. contrast to deamidated deglycosylated Asn after PNGase F treatment. On a larger scale, evidence for N-glycosylation can DISCUSSION be obtained by PNGase F treatment in O-water, which results in deamidation of Asn to partially O-labeled Asp. It is well established that the use of complementary proteases However, this partial labeling makes relative quantification with different specificity in bottom-up proteomic workflows across samples challenging, while omission of the O-labeling can improve proteome coverage and provide access to 3,11 decreases confidence of the site identification as deamidation sequences that are missed in tryptic digests. This not only can also occur spontaneously. On the basis of our proof-of- allows identification of “missing proteins” that have not been concept experiment with A. thaliana apoplast proteome, we identified by mass spectrometry before, one of the central propose tandem sequential PNGase F/legumain treatment as goals of the Human Proteome Project, but also is important an alternative strategy for experimental validation of N- for comprehensive mapping of post-translational modification 14,16 glycosylation sites. sites including phosphorylations and global identification 10,20 There are many further potential applications for legumain of protein termini. Here we characterize human legumain in peptide-centric proteome workflows. We have previously as a new digestion protease in the proteomic toolbox. used legumain to generate high-quality E. coli proteome- Legumain exhibited strict sequence specificity for cleavage derived peptide libraries, which enabled detailed cleavage after Asn and Asp and a high cleavage efficiency that makes it a specificity profiling of the vitamin K-dependent coagulation highly suitable alternative proteolytic enzyme for proteomics. protease sirtilin that would not be possible in trypsin-generated We have established conditions for reliable in-solution libraries. Legumain maintains activity at a low pH, down to proteome digestion with legumain and show that the pH 4.0, and is active in nonreducing conditions; therefore it alternative cleavage site at Asn yields an entirely different set is also suitable for protein disulfide bond determination at the of peptides than trypsin does, with only minimal overlap in the low-pH environment required to prevent disulfide reshuf- number of identified peptides delimited by Asp on both sides, fling. Currently pepsin is used for these experiments due to in comparison to those with GluC digests. In agreement with its high activity under acidic conditions. However, pepsin the kinetic cleavage preferences determined with peptide 22,24 generates a large number of overlapping peptides due to its substrates, Vidmar et al. reported only minimal cleavage at broad specificity with a nonexclusive preference for cleavage Asp residues at pH 6 during in-gel digestion of a denatured after Tyr, Phe, Trp, and Leu that complicate the spectra proteome. In contrast, we have observed a much higher assignment, whereas legumain’s high cleavage specificity would cleavage efficiency at Asp residues at pH 6 in our data set, alleviate this problem. Taken together, we propose that which likely arises from the different digest conditions (in-gel recombinant human legumain is an attractive protease to digestion for 2 h with citrate buffer compared to in-solution complement trypsin in bottom-up mass spectrometry-based digestion for 16 h in a MES buffer). We noted that the data set proteomics. of Vidmar et al. contains a higher proportion of missed cleavages at Asn and Asp residues than our data set (SI Figure ASSOCIATED CONTENT S10), suggesting that the prolonged reaction under more favorable in-solution conditions enables legumain to have a sı * Supporting Information more complete cleavage at Asp residues even at pH 6.0. The Supporting Information is available free of charge at Notably, a similar effect was observed for Ulilysin/LysargiNase, https://pubs.acs.org/doi/10.1021/acs.analchem.9b03604. which has a strong preference for Arg when tested with peptide Figures of substrate cleavage specificity, biophysical substrates but results in a near-complete digestion at Lys properties of peptides, comparisions of protein sequence residues under proteome digest conditions. coverage, overlap of protein group identifications, Digestion with legumain consistently identified more potential cleavage sites, optimization of legumain peptides than digestion with GluC, but trypsin was far digestion conditions, cleavage specificity analysis, superior. This has been reported for various other digestion legumain digestion efficiency, graphs of numbers of proteases, particularly those that do not select for cleavage at 4,11 peptides and missed cleavage sites, proteolytic activity of basic residues. One explanation is that digestion with legumain, quantification of protein N-terminal peptides, enzymes such as legumain and GluC generates peptides with and potential legumain cleavage sites present in peptides internal basic residues. This can give rise to internal fragment (PDF) ions during collision-induced dissociation (CID) and result in 2969 https://dx.doi.org/10.1021/acs.analchem.9b03604 Anal. Chem. 2020, 92, 2961−2971 Analytical Chemistry pubs.acs.org/ac Article REFERENCES Tables of lists of proteins identified after digestion, N- termini identified, and identification of N-glycosylation (1) Aebersold, R.; Mann, M. Nature 2016, 537 (7620), 347−55. sites (XLSX) (2) Zhang, Y.; Fonslow, B. R.; Shan, B.; Baek, M. C.; Yates, J. R., 3rd Chem. Rev. 2013, 113 (4), 2343−94. (3) Tsiatsiani, L.; Heck, A. J. FEBS J. 2015, 282 (14), 2612−26. AUTHOR INFORMATION ■ (4) Swaney, D. L.; Wenger, C. D.; Coon, J. J. J. Proteome Res. 2010, 9 (3), 1323−9. Corresponding Author (5) Perrar, A.; Dissmeyer, N.; Huesgen, P. F. J. Exp. Bot. 2019, 70 Pitter F. Huesgen − Central Institute for Engineering, (7), 2021−38. Electronics and Analytics, ZEA-3, Forschungszentrum Jülich, (6) Lange, P. F.; Overall, C. M. Curr. Opin. Chem. Biol. 2013, 17 (1), 52428 Jülich, Germany; Cologne Excellence Cluster on Cellular 73−82. Stress Responses in Aging Associated Diseases, Medical Faculty (7) Turk, B.; Turk, D.; Turk, V. EMBO J. 2012, 31 (7), 1630−43. and University Hospital and Institute for Biochemistry, Faculty (8) Klein, T.; Eckhard, U.; Dufour, A.; Solis, N.; Overall, C. M. of Mathematics and Natural Sciences, University of Cologne, Chem. Rev. 2018, 118 (3), 1137−1168. 50931 Cologne, Germany; orcid.org/0000-0002-0335- (9) Niedermaier, S.; Huesgen, P. F. Biochim. Biophys. Acta, Proteins 2242; Email: p.huesgen@fz-juelich.de Proteomics 2019, 1867 (12), 140138. (10) Vogtle, F. N.; Wortelkamp, S.; Zahedi, R. P.; Becker, D.; Authors Leidhold, C.; Gevaert, K.; Kellermann, J.; Voos, W.; Sickmann, A.; Wai Tuck Soh − Department of Biosciences, University of Pfanner, N.; Meisinger, C. Cell 2009, 139 (2), 428−39. Salzburg, 5020 Salzburg, Austria; orcid.org/0000-0003- (11) Giansanti, P.; Tsiatsiani, L.; Low, T. Y.; Heck, A. J. Nat. Protoc. 0082-7983 2016, 11 (5), 993−1006. (12) Huesgen, P. F.; Lange, P. F.; Rogers, L. D.; Solis, N.; Eckhard, Fatih Demir − Central Institute for Engineering, Electronics and U.; Kleifeld, O.; Goulas, T.; Gomis-Ruth, F. X.; Overall, C. M. Nat. Analytics, ZEA-3, Forschungszentrum Jülich, 52428 Jülich, Methods 2015, 12 (1), 55−58. Germany; orcid.org/0000-0002-5744-0205 (13) Schrader, ̈ C. U.; Lee, L.; Rey, M.; Sarpe, V.; Man, P.; Sharma, Elfriede Dall − Department of Biosciences, University of S.; Zabrouskov, V.; Larsen, B.; Schriemer, D. C. Mol. Cell. Proteomics Salzburg, 5020 Salzburg, Austria 2017, 16, 1162−1171. Andreas Perrar − Central Institute for Engineering, Electronics (14) Schlosser, A.; Vanselow, J. T.; Kramer, A. Anal. Chem. 2005, 77, and Analytics, ZEA-3, Forschungszentrum Jülich, 52428 Jülich, 5243−5250. Germany (15) Wu, C. C.; MacCoss, M. J.; Howell, K. E.; Yates, J. R., 3rd Nat. Sven O. Dahms − Department of Biosciences, University of Biotechnol. 2003, 21 (5), 532−8. Salzburg, 5020 Salzburg, Austria (16) Gonczarowska-Jorge, H.; Loroch, S.; Dell’Aica, M.; Sickmann, Maithreyan Kuppusamy − Central Institute for Engineering, A.; Roos, A.; Zahedi, R. P. Anal. Chem. 2017, 89 (24), 13137−13145. Electronics and Analytics, ZEA-3, Forschungszentrum Jülich, (17) Meyer, J. G.; Kim, S.; Maltby, D. A.; Ghassemian, M.; Bandeira, 52428 Jülich, Germany; orcid.org/0000-0001-6866-0417 N.; Komives, E. A. Mol. Cell. Proteomics 2014, 13 (3), 823−35. (18) Wu, C. C.; Yates, J. R. Nat. Biotechnol. 2003, 21, 262−267. Hans Brandstetter − Department of Biosciences, University of (19) Giansanti, P.; Aye, T. T.; van den Toorn, H.; Peng, M.; van Salzburg, 5020 Salzburg, Austria; orcid.org/0000-0002- Breukelen, B.; Heck, A. J. Cell Rep. 2015, 11 (11), 1834−43. 6089-3045 (20) Lange, P. F.; Huesgen, P. F.; Nguyen, K.; Overall, C. M. J. Complete contact information is available at: Proteome Res. 2014, 13 (4), 2028−44. https://pubs.acs.org/10.1021/acs.analchem.9b03604 (21) Dall, E.; Brandstetter, H. Proc. Natl. Acad. Sci. U. S. A. 2013, 110 (27), 10940−5. Author Contributions (22) Vidmar, R.; Vizovisek, M.; Turk, D.; Turk, B.; Fonovic, M. EMBO J. 2017, 36 (16), 2455−2465. W.T.S. and F.D. contributed equally. (23) Hebert, D. N.; Lamriben, L.; Powers, E. T.; Kelly, J. W. Nat. Author Contributions Chem. Biol. 2014, 10 (11), 902−10. F.D. and P.F.H. conceived the project. W.T.S., F.D., E.D., (24) Dall, E.; Brandstetter, H. Acta Crystallogr., Sect. F: Struct. Biol. S.O.D., H.B., and P.F.H. designed the experiments and Cryst. Commun. 2012, 68 (1), 24−31. analyzed the data. E.D. and S.O.D. provided the recombinant (25) Wessel, D.; Flugge, U. I. Anal. Biochem. 1984, 138 (1), 141−3. human legumain, and W.T.S., F.D., M.K., and A.P. performed (26) Hughes, C. S.; Moggridge, S.; Muller, T.; Sorensen, P. H.; the experiments. The manuscript was written by W.T.S., F.D., Morin, G. B.; Krijgsveld, J. Nat. Protoc. 2019, 14 (1), 68−85. (27) Rappsilber, J.; Mann, M.; Ishihama, Y. Nat. Protoc. 2007, 2 (8), and P.F.H. and edited by all authors. All authors have approved 1896−906. the final version of the manuscript. (28) Rinschen, M. M.; Hoppe, A. K.; Grahammer, F.; Kann, M.; Notes Volker, L. A.; Schurek, E. M.; Binz, J.; Hohne, M.; Demir, F.; Malisic, The authors declare no competing financial interest. M.; Huber, T. B.; Kurschat, C.; Kizhakkedathu, J. N.; Schermer, B.; Huesgen, P. F.; Benzing, T. J. Am. Soc. Nephrol. 2017, 28 (10), 2867− ACKNOWLEDGMENTS (29) Tyanova, S.; Temu, T.; Cox, J. Nat. Protoc. 2016, 11 (12), We thank Dr. Ulrich Eckhard for critically reading this 2301−2319. manuscript. W.T.S. is a Ph.D. student in the Immunity in (30) Tyanova, S.; Temu, T.; Sinitcyn, P.; Carlson, A.; Hein, M. Y.; Cancer and Allergy Ph.D. program funded by the Austrian Geiger, T.; Mann, M.; Cox, J. Nat. Methods 2016, 13 (9), 731−40. Science Fund FWF (project W_01213). This project was in (31) Kelley, L. A.; Mezulis, S.; Yates, C. M.; Wass, M. N.; Sternberg, part supported by a starting grant of the European Research M. J. Nat. Protoc. 2015, 10 (6), 845−58. Council, with funding from the European Union’s Horizon (32) Weng, S. S. H.; Demir, F.; Ergin, E. K.; Dirnberger, S.; Uzozie, 2020 program (grant 639905, to P.F.H.) and the German A.; Tuscher, D.; Nierves, L.; Tsui, J.; Huesgen, P. F.; Lange, P. F. Mol. Research Foundation DFG (FOR2743, grant HU1756/3-1 to Cell. Proteomics 2019, 18, 2335. P.F.H.). (33) Joosten, M. H. Methods Mol. Biol. 2012, 835, 603−10. 2970 https://dx.doi.org/10.1021/acs.analchem.9b03604 Anal. Chem. 2020, 92, 2961−2971 Analytical Chemistry pubs.acs.org/ac Article (34) Deutsch, E. W.; Orchard, S.; Binz, P. A.; Bittremieux, W.; Eisenacher, M.; Hermjakob, H.; Kawano, S.; Lam, H.; Mayer, G.; Menschaert, G.; Perez-Riverol, Y.; Salek, R. M.; Tabb, D. L.; Tenzer, S.; Vizcaino, J. A.; Walzer, M.; Jones, A. R. J. Proteome Res. 2017, 16 (12), 4288−4298. (35) Vizcaino, J. A.; Csordas, A.; del-Toro, N.; Dianes, J. A.; Griss, J.; Lavidas, I.; Mayer, G.; Perez-Riverol, Y.; Reisinger, F.; Ternent, T.; Xu, Q. W.; Wang, R.; Hermjakob, H. Nucleic Acids Res. 2016, 44 (D1), D447−56. (36) Weng, S. S. H.; Demir, F.; Ergin, E. K.; Dirnberger, S.; Uzozie, A.; Tuscher, D.; Nierves, L.; Tsui, J.; Huesgen, P. F.; Lange, P. F. Mol. Cell. Proteomics 2019, 18, 2335. (37) Clerc, F.; Reiding, K. R.; Jansen, B. C.; Kammeijer, G. S. M.; Bondt, A.; Wuhrer, M. Glycoconjugate J. 2016, 33, 309−343. (38) Kuster, B.; Mann, M. Anal. Chem. 1999, 71, 1431−1440. (39) Dall, E.; Brandstetter, H. Mechanistic and structural studies on legumain explain its zymogenicity, distinct activation pathways, and regulation. Proceedings of the National Academy of Sciences of the United States of America, 2013. (40) Barth, C.; Jander, G. Plant J. 2006, 46 (4), 549−62. (41) Liebminger, E.; Grass, J.; Jez, J.; Neumann, L.; Altmann, F.; Strasser, R. Phytochemistry 2012, 84,24−30. (42) Wang, Y.; Chen, Y.; Zhang, Y.; Wei, W.; Li, Y.; Zhang, T.; He, F.; Gao, Y.; Xu, P. J. Proteome Res. 2017, 16 (12), 4352−4363. (43) Paik, Y. K.; Overall, C. M.; Deutsch, E. W.; Van Eyk, J. E.; Omenn, G. S. J. Proteome Res. 2017, 16 (12), 4253−4258. (44) Tallant, C.; Garcia-Castellanos, R.; Marrero, A.; Canals, F.; Yang, Y.; Reymond, J. L.; Sola, M.; Baumann, U.; Gomis-Ruth, F. X. Biol. Chem. 2007, 388 (11), 1243−53. (45) Dahms, S. O.; Demir, F.; Huesgen, P. F.; Thorn, K.; Brandstetter, H. J. Thromb. Haemostasis 2019, 17 (3), 470−481. (46) Gorman, J. J.; Wallis, T. P.; Pitt, J. J. Mass Spectrom. Rev. 2002, 21 (3), 183−216. 2971 https://dx.doi.org/10.1021/acs.analchem.9b03604 Anal. Chem. 2020, 92, 2961−2971

Journal

Analytical ChemistryPubmed Central

Published: Jan 17, 2020

References

You’re reading a free preview. Subscribe to read the entire article.


DeepDyve is your
personal research library

It’s your single place to instantly
discover and read the research
that matters to you.

Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.

All for just $49/month

Explore the DeepDyve Library

Search

Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly

Organize

Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.

Access

Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals.

Your journals are on DeepDyve

Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more.

All the latest content is available, no embargo periods.

See the journals in your area

DeepDyve

Freelancer

DeepDyve

Pro

Price

FREE

$49/month
$360/year

Save searches from
Google Scholar,
PubMed

Create folders to
organize your research

Export folders, citations

Read DeepDyve articles

Abstract access only

Unlimited access to over
18 million full-text articles

Print

20 pages / month

PDF Discount

20% off