Comparison of multiple transcriptomes exposes unified and divergent features of quiescent and activated skeletal muscle stem cells

Comparison of multiple transcriptomes exposes unified and divergent features of quiescent and... Background: Skeletal muscle satellite (stem) cells are quiescent in adult mice and can undergo multiple rounds of proliferation and self-renewal following muscle injury. Several labs have profiled transcripts of myogenic cells during the developmental and adult myogenesis with the aim of identifying quiescent markers. Here, we focused on the quiescent cell state and generated new transcriptome profiles that include subfractionations of adult satellite cell populations, and an artificially induced prenatal quiescent state, to identify core signatures for quiescent and proliferating. Methods: Comparison of available data offered challenges related to the inherent diversity of datasets and biological conditions. We developed a standardized workflow to homogenize the normalization, filtering, and quality control steps for the analysis of gene expression profiles allowing the identification up- and down-regulated genes and the subsequent gene set enrichment analysis. To share the analytical pipeline of this work, we developed Sherpa, an interactive Shiny server that allows multi-scale comparisons for extraction of desired gene sets from the analyzed datasets. This tool is adaptable to cell populations in other contexts and tissues. Results: A multi-scale analysis comprising eight datasets of quiescent satellite cells had 207 and 542 genes commonly up- and down-regulated, respectively. Shared up-regulated gene sets include an over-representation of the TNFα pathway via NFKβ signaling, Il6-Jak-Stat3 signaling, and the apical surface processes, while shared down-regulated gene sets exhibited an over-representation of Myc and E2F targets and genes associated to the G2M checkpoint and oxidative phosphorylation. However, virtually all datasets contained genes that are associated with activation or cell cycle entry, such as the immediate early stress response genes Fos and Jun. An empirical examination of fixed and isolated satellite cells showed that these and other genes were absent in vivo, but activated during procedural isolation of cells. Conclusions: Through the systematic comparison and individual analysis of diverse transcriptomic profiles, we identified genes that were consistently differentially expressed among the different datasets and shared underlying biological processes key to the quiescent cell state. Our findings provide impetus to define and distinguish transcripts associated with true in vivo quiescence from those that are first responding genes due to disruption of the stem cell niche. Keywords: Skeletal muscle satellite cells, Quiescence, Multi-level transcriptomic comparisons, Sherpa * Correspondence: shaht@pasteur.fr; shahragim.tajbakhsh@pasteur.fr Equal contributors Stem Cells and Development, Department of Developmental and Stem Cell Biology, Institut Pasteur, 75015 Paris, France CNRS UMR 3738, Institut Pasteur, 75015 Paris, France Full list of author information is available at the end of the article © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Pietrosemoli et al. Skeletal Muscle (2017) 7:28 Page 2 of 15 Background signature of the quiescent state. This large compendium Most adult stem cell populations identified to date are of expression data offers the first comparison and integra- in a quiescent state [1]. Following tissue damage or tion of nine independent studies of the quiescent state of disruption of the stem cell niche, skeletal muscle satellite mouse satellite cells, and we developed Sherpa, a shiny (stem) cells transit through different cell states from re- interactive web server to provide a user-friendly explor- versible cell cycle exit to a postmitotic multi-nucleate ation of the analysis. In addition, using a protocol for the state in myofibres. In mouse skeletal muscle, the tran- fixation and capture of mRNA directly from the tissue scription factor Pax7 marks satellite cells during quies- without the alteration in gene expression that could arise cence and proliferation, and it has been used to identify during the isolation procedure, which typically takes and isolate myogenic populations from skeletal muscle several hours with solid tissues, we have empirically tested [2, 3]. Myogenic cells have also been isolated by the expression of transcripts. Strikingly, several genes, fluorescence-activated cell sorting (FACS) using a variety including members of the Jun and Fos family, were found of surface markers, including α7-integrin, VCAM, and to be present in isolated satellite cells using conventional CD34 [4]. Although these cells have been extensively isolation procedures, but they were absent in vivo. These studied by transcriptome, and to a more limited extent findings, and the unique atlas that we report, will un- by proteome profiling, different methods have been used doubtedly improve our current understanding of the to isolate and profile myogenic cells thereby making molecular mechanisms governing the quiescent state and comparisons laborious and challenging. To address this contribute to the identification of critical regulatory genes issue, it is necessary to generate comprehensive catalogs involved in different cell states. of gene expression data of myogenic cells across distinct states and in different conditions. Methods Soon after their introduction two decades ago, high- Individual dataset transcriptomic analysis throughput microarray studies started to be compiled The analysis comprised a total of nine datasets, three into common repositories that provide the community novel microarray datasets and six publicly available data- access to the data. Several gene expression repositories sets [9–14], choosing only samples with overall similar for specific diseases, such as the Cancer Genome Atlas conditions. All datasets were analyzed independently (TCGA) [5], the Parkinson’s disease expression database following the same generalized pipeline based on ad hoc ParkDB [6], or for specific tissues, such the Allen Hu- R-implemented scripts (Fig. 2). man and Mouse Brain Atlases [7, 8] among many, have been crucial in allowing scientists the comparison of Gene expression profiles datasets, the application of novel methods to existing The microarray data compared activated satellite cells datasets, and thus a more global view of these biological (ASCs) and quiescent satellite cells (QSCs) from differ- systems. ent experiments. Table 1 describes the public datasets In this work, we generated transcriptome datasets that were taken into account for the analysis with the of satellite cells in different conditions and performed GEO [15] (Gene Expression Omnibus) identifications, comparisons with published datasets. Due to the diver- references, and sample distribution. The new mouse micro- sity of platforms and formats of published datasets, this array datasets include the following comparisons: young was not readily achievable. For this reason, we developed adult Quiescent(adult)/Activated (postnatal day 8) and an interactive tool called Sherpa (SHiny ExploRation Quiescent [high/low]/D3Activated [high/low], and Fetal_- tool for transcriPtomic Analysis) to provide comprehen- NICD [E17.5/E14.5]. Table 1 presents all sample details. sive access to the individual datasets analyzed in a homogeneous manner. This web server allows users to Animals, injuries, and cell sorting (i) identify differentially expressed genes of the individ- Animals were handled according to the national and ual datasets, (ii) identify the enriched gene sets of the European Community guidelines and the ethics commit- individual datasets, and (iii) effectively compare the tee of the Institut Pasteur (CTEA) in France. For isolation chosen datasets. Sherpa is adaptable and serves as a of quiescent satellite cells, Tg: Pax7-nGFP mice (6– repository for the integration and analysis of future tran- 12 weeks) [2] were anesthetized prior to the injury. Tibi- scriptomic data. It has a generic design that makes it alis anterior (TA) muscles were injured with notexin applicable to the analysis of other transcriptome datasets (10 μl–10 μM; Latoxan). Cells were then isolated by FACS generated in a variety of conditions and tissues. using FACS ARIA III (BD Biosciences), MoFlo Astrios Hi Lo We analyzed gene expression profiles (GEPs) of acti- and Legacy (Beckman Coulter) sorters. Pax7 and Pax7 vated and quiescent states of mouse satellite cells derived cells correspond to the 10% of cells with the highest and from three new experimental setups and six publicly avail- the lowest expression of nGFP, respectively, as defined able microarray datasets to define a consensus molecular previously [3]. Pietrosemoli et al. Skeletal Muscle (2017) 7:28 Page 3 of 15 Table 1 Summary of analyzed transcriptomic datasets of activated and quiescent states of mouse muscle stem (satellite) cells. Three new high-throughput experimental setups and six publically available microarray datasets comparing activated satellite cells (ASCs) and quiescent satellite cells (QSCs) are shown in the rows. The biological, experimental, and technical details of each experiment are shown in the different columns of the Table NICD Ref/code Quiescent [high/low] Quiescent Fetal R26 GSE47177 Liu et GSE3483 GSE15155 GSE38870 GSE70376 GSE81096 D3 Activated activated [E17.5/E14.5] al. [9] Fukada et al. [10] Pallafacchina et al. [11] Farina et al. [12] García-Prat et al. [13] Lukjanenko et al. [14] [high/low] Num. of 3 QSC_Pax7 low, 3 3 QSC, 3 3 "QSC," 3 3 QSC, 3 QSC, 3 ASC 3 QSC, 3 ASC 3 QSC, 3 ASC 4 QSC, 4 ASC 6 QSC, 5 ASC samples ASC_Pax7 low, 3 P8_ASC "ASC 3 ASC 60 h, QSC_Pax7 high, 3 3 ASC 84 h ASC_Pax7 high Date 2013 2007 2015 2013 2007 2010 2012 2015 2016 Anatomy Tibialis anterior Limb, body Forelimbs Hindlimb Hindlimb Diaphragm, pectoralis, Tibialis anterior Tibialis anterior Tibialis anterior, wall, diaphragm abdominal muscles gastro-necmius, quadriceps Sex M M, F M F M, F F M M Age 6–8w P8,4–5 w E14.5, E17.5 8 w 8–12 w 6 w 3–6 m Young: 3 m, old: Young: 9–15 w, 20–24 m old: 20–24 m Strain C57BL/6 B6.129 C57BL/6 (Jackson) C57BL/7 C57BL/6 x C57BL/6 C57BL/6 (Janvier) (nihon clea) DBA2 (??) nGFP/+ Cre+ CreER/+ GFP/+ Reporter Tg:Pax7-nGFP Pax7 Myf5 : Pax7 Pax3 (high GFP) stop-NICDgfp/+ eYFP/+ (10% high, R26 R26 10% low) GFP/+ GFP/ Activation Notexin P8 = “activated” E14.5 = “activated” BaCl QSCs in culture Pax3 , Pax3 Injury: BaCl Cardiotoxin Cardiotoxin 2 2 for 4 d :mdx:mdx (50 μL 1.2%) Adult; adult mdx; 1 w old; 3 d in culture CreER/+ QSCs Reporter Reporter Reporter FACS: Pax7 ; FACS: CD45− Reporter FACS: FACS: integrin- CD34+/integrin- eYFP/+ Purif. R26 /SM/C-2.6+ syndecan-3 alpha7+/Lin− alpha7+/Lin- /CD31−/CD45− /CD11b−/Sca1- Timing Quiescent and E17.5(“Q”), E14.5(“A”) 36 h (1.5 d), 60 h 4 d in culture 3 d in culture 12 h or 48 h 72 h (3 d) 72 h (3 d) 3 d postinjury (2.5 d), 84 h (3.5 d) postinjury postinjury Platform Affymetrix Mouse Affymetrix Affymetrix Mouse Affymetrix Mouse Affymetrix 430A Affymetrix 430_2.0 Affymetrix Agilent 028005 Illumina MouseRef- Gene 1.0ST 430_2.0 Gene 1.0ST, Affymetrix Gene 1.0ST 430_2.0 SurePrint G3Mouse 8 v2.0 Mouse Gene 2.0ST 8x60k Microarray h hours, d days, w weeks, m months Pietrosemoli et al. Skeletal Muscle (2017) 7:28 Page 4 of 15 For isolation of activated satellite cells, TA muscles (day 3 Western blot analysis postinjury (D3) and non-injured) were collected and sub- Total protein extracts from satellite cells isolated by FACS jected to 4–5 rounds of digestion in a solution of 0.08% col- were run on a 4–12% Bis-Tris Gel NuPAGE (Invitrogen) lagenase D (Roche) and 0.1% trypsin (Gibco #31966) diluted and transferred on Amersham Hybond-P transfer mem- in DMEM-1% P/S (Invitrogen) supplemented with DNAse I brane (Ge Healthcare). The membrane was then blocked Hi Lo at 10 μg/ml (Roche, 11284932001) [2, 3]. Pax7 and Pax7 with 5% non-fat dry milk in TBS; probed with anti-JunD cells correspond to the 10% of cells with the highest and the (329) (1:1000, sc-74 Santa Cruz Biotechnology Inc.), anti- lowest expression of nGFP, respectively, as defined previously JunB (N-17) (1:1000, sc-46 Santa Cruz Biotechnology [3]. Inc.), or anti-c-Jun (H-79) (1:1000, sc-1694 Santa Cruz Skeletal muscle progenitors were obtained also from Biotechnology Inc.) overnight; washed and incubated with CreCAP/ the forelimbs of E14.5 and E17.5 fetuses of Myf5 HRP-conjugated donkey anti-rabbit IgG secondary anti- + stop-NICD-nGFP/+ :R26R [16] compound mice. Tissues were body (1:3000); and detected by chemiluminescence (Pierce dissociated in DMEM, 0.1% collagenase D (Roche, ECL2 western blotting substrate, Thermo Scientific) using 1088866), 0.25% trypsin (GIBCO, 15090-046), DNaseI the Typhoon imaging system. After extensive washing, the 10 μg/ml for three consecutive cycles of 15 min at 37 °C membrane was incubated with anti-Histone H3 antibody in a water bath under gentle agitation. For each round, a (ab1691, 1:10,000; abcam) as a loading control. All West- supernatant containing dissociated cells was filtered ern blots were run in triplicate, and bands were quanti- through 70-μm cell strainer, and trypsin was inhibited tated in one representative gel. Quantification was done with foetal calf serum (FCS). Pooled supernatants from using ImageJ software. each round of digestion were centrifuged at 1600 rpm for 15 min at 4 °C, and pellet was re-suspended in cold Isolation of fixed mouse muscle stem cells and real-time DMEM/1% PS/2%FCS and filtered through 40-μmcell PCR strainer. For empirical analysis of genes by RT-qPCR (e.g., Jun In other experiments, skeletal muscles from the limbs, and Fos), skeletal muscles were fixed immediately in body wall, and diaphragm were collected from pups at 0.5% for 1 h in paraformaldehyde (PFA) using a protocol postnatal day 8 (P8, mitotically active satellite cells) and based on the notion that transcripts are stabilized by 4–5 weeks old mice (quiescent satellite cells) of PFA fixation [18, 19]. Briefly, PFA fixed and unfixed nGFP/+ Pax7 knock-in line [17]. Cells were isolated by skeletal muscles were minced as described [4]; fixed FACS based on NICD-GFP or Pax7-nGFP intensity, samples were incubated with collagenase at double the using BD FACS ARIA III and MoFlo Astrios sorters. normal concentration, and mRNA was isolated following FACS based on size, granulosity, and GFP levels using a FACS Aria II (BD Bioscience). Total RNA was extracted Microarray sample preparation from fixed cells with RecoverAll™ (Total Nucleic Acid Total mRNAs were isolated using Qiagen RNAeasy® Mi- Isolation Kit Ambion, Thermo Fisher), according to cro Kit according to the manufacturer’s recommenda- manufacturer instructions. cDNA was prepared by tions; 5 ng of total RNA was reverse transcribed and random-primed reverse transcription (Super-Script II, amplified following the manufacturer’s protocols Invitrogen, 18,064–014), and real-time PCR was done (Ovation Pico WTA System v2 (Nugen Technologies, using SYBR Green Universal Mix (Roche, 13608700) Inc. 3302-12); Applause WTA Amp-Plus System (Nugen StepOne-Plus, Perkin-Elmer (Applied Biosystems). Spe- Technologies, Inc. 5510-24)), fragmented and biotin la- cific primers for each gene were designed, using the Pri- beled using the Encore Biotin Module (Nugen Tech- mer3Plus online software, to work under the same nologies, Inc. 4200-12). Gene expression was determined cycling conditions. For each reaction, standard curves by hybridization of the labeled template to GeneChip for reference genes were constructed based on six four- microarrays Mouse Gene 1.0 ST (Affymetrix). fold serial dilutions of cDNA. All samples were run in Hybridization cocktail and posthybridization processing triplicate. The relative amounts of gene expression were were performed according to the “Target Preparation for calculated with RLP13 expression as an internal standard Affymetrix GeneChip Eukaryotic Array Analysis” proto- (calibrator). RT-qPCR primers used appear in Add- col found in the appendix of the Nugen protocol of the itional file 6: Table S2. fragmentation kit. Arrays were hybridized for 18 h and washed using fluidics protocol FS450 0007 on a Gene- Normalization, quality control, and filtering of GEPs Chip Fluidic Station 450 (Affymetrix) and scanned with Gene expression profiles (GEPs) were processed using an Affymetrix GeneChip Scanner 3000, generating CEL standard quality control tools to obtain normalized, files for each array. Three biological replicates were run probeset-level expression data. For all raw datasets de- for each condition. rived from affymetrix chips, Robust Multi-Array Average Pietrosemoli et al. Skeletal Muscle (2017) 7:28 Page 5 of 15 expression measure (rma) was used as normalization Gene set enrichment analysis on individual sets method using the affy and the oligo R packages [20, 21]. Each dataset was tested for gene set enrichment All analyses were preferentially conducted at the probe- independently, using the CAMERA competitive test im- set level. Probesets were annotated to gene symbol and plemented in the Limma R package [27] and three gene gene ENTREZ using chip-specific annotations. For gene set collections from the mouse version of the Molecular level results, the probeset with the highest expression Signatures Database MSigDB v6 [28, 29]: (1) Hallmark variability was selected to represent the corresponding gene sets (H), which summarize and represent specific gene. Quality controls were performed on raw data using well-defined biological states or processes displaying a relative log expression (RLE) and Normalized Unscaled coordinate gene expression; (2) Kyoto Encyclopedia of Standard Errors (NUSE) plots from the affyPLM R pack- Genes and Genomes (KEGG) canonical pathways (C2 age [22]. Sample distribution was examined using hier- CP:KEGG), derived from the Kyoto Encyclopedia of Genes archical clustering of the Euclidean distance and and Genomes [30]; and (3) Reactome canonical pathways principal component analysis from the stats [23] and (C2 CP:Reactome) from the curated and peer reviewed FactoMineR R packages [24] (see Additional file 1: Figure pathway database [31] (gene set analysis in Figs. 1 and 2). S1 for the resulting plots for dataset Quiescent [high/ low]/D3Activated [high/low]). The resulting plots of the remaining datasets are not shown, but they presented Multiple set analysis: determination of the quiescent similar trends, which can be explored through the inter- signature active web server Sherpa. The combinatorial landscape of datasets was explored using the SuperExactTest [32] and the UpSetR [33] R Differential expression analysis packages to test and visualize the intersection of the Each dataset was individually analyzed to identify genes datasets. Additionally, the Jaccard index [34] of similarity showing significant differential expression (DEGs) was calculated to assess the extent of similarity between between the ASC and the QSC (gene level analysis in statistically differentially expressed genes (DEGs) of each Fig. 1; differential analysis in Fig. 2). This analysis was pair of datasets. A significance ranking, based on several performed using the linear model method implemented criteria, was calculated for each individual dataset to in the Limma R package [25]. The basic statistic was the determine its presence or absence in the final dataset moderated t-statistic with a Benjamini and Hochberg’s ensemble, which was used for determining the gene multiple testing correction to control the false discovery signature. Once the dataset ensemble was defined, the rate (FDR) [26]. overlapping differentially up- and down-regulated genes Fig. 1 General framework of the analysis: an individual dataset analysis followed by a multi-set analysis. The individual dataset analysis consisted of (i) the analysis of gene expression profiles (GEPs) of each dataset, including normalization, filtering and quality control check of each raw dataset, and the differential analysis to identify dataset-specific differentially expressed genes (DEGs), (ii) the Gene Set Enrichment Analysis (GSEA) performed in the gene set space. The GSEA consisted in identifying enriched pathways from three gene sets of the MSigDB collection [29] (Hallmark gene sets, CP: KEGG gene sets and CP: Reactome gene sets); (iii) the multi-set analysis to assemble a study-independent gene signature, i.e., a list of genes specific to the quiescence state Pietrosemoli et al. Skeletal Muscle (2017) 7:28 Page 6 of 15 purpose, commonly up-regulated or down-regulated genes were used in a one-sided Fisher’s exact test imple- mented in R script with a Benjamini and Hochberg’s multiple testing correction of the P value to determine the enriched gene sets and the direction of such enrichment. Web application: Sherpa We developed an interactive web application for the exploration, analysis, and visualization of the individual datasets and their combination (http://sherpa.pasteur.fr). This application allows the user to effectively and efficiently analyze the individual datasets one by one (in- dividual dataset analysis) or as an ensemble of datasets (multi-set analysis) and was developed with the Shiny R package [36]. Results This study involves an individual dataset analysis followed by a multi-set analysis (Fig. 1). First, each raw dataset was normalized, filtered, and subjected to the same quality controls and checks. Gene-level differential analysis and gene set enrichment analysis were then performed (Fig. 2). Finally, a multi-set analysis assembled a platform- independent list of genes specific to the quiescent state. Fig. 2 Workflow of the standardized individual dataset analysis. The When analyzing multiple microarray GEPs, however, sev- analysis of the nine datasets was performed in a consistent manner eral issues needed to be addressed regarding the experi- for each dataset using ad hoc R scripts. It included the first step of mental setup, the microarray platforms and the laboratory data preparation followed by a second step of data analysis. GEPs were processed using standard quality control tools to obtain conditions [37]. First, the individual studies, even if re- normalized, probeset-level expression data. For raw datasets derived from lated, had different aims, experimental designs, and cell affymetrix chips, Robust Multi-Array Average expression measure (rma) populations of interests (e.g., developmental stage and was used as normalization method. All analyses were conducted at gender of mice). Second, the different microarray plat- probeset level. Probesets were annotated to gene symbol and gene forms contained different probes and probesets with spe- ENTREZ using chip-specific annotations. Quality controls were performed on raw data using RLE and NUSE plots. The distribution of the QSC and cific locations and alternative splicing that might produce ASC samples according to their GEPs was explored using hierarchical different expression results [38]. Finally, sample prepar- clustering of the Euclidean distance and principal component analysis ation, protocols, and dates of extractions might have influ- (Additional file 1: Figure S1). Statistically, differentially expressed genes enced array hybridization and introduced bias [39]. This (DEGs) were identified between the ASC and the QSC groups using the experimental heterogeneity required critical data process- linear model implemented by the Limma R package [10]. Gene set enrichment analysis was based on three gene set collections from the ing to ensure statistically meaningful assumptions to drive mouse version of the Molecular Signatures Database MSigDB biological interpretation and compile gene signatures. For v6.0 [12, 13]: (1) Hallmark, which summarizes and represents specific this, we used a standardized workflow to reduce the tech- well-defined biological states or processes displaying a coordinate nical variations between datasets. Specifically, this work- gene expression; (2) KEGG canonical pathways, derived from the Kyoto flow applied (i) the same normalization method for the Encyclopedia of Genes and Genomes [14]; and (3) Reactome canonical pathways from the curated and peer-reviewed pathway database [15]. experiments having the same microarray chips, (ii) the To test for the enrichment of these gene sets, the competitive gene same quality control criteria to discard poor-quality sam- set test CAMERA [16] was used ples, (iii) the same aggregation method for summarizing probesets into single genes, and (iv) the same filtering in (DEGs, as defined by the adjusted P value ≤ 0.05) were all datasets. The filtering of the datasets was based on the used to build the quiescent signature. same significance criteria which included a minimum number of differentially expressed genes, the presence of Gene set enrichment analysis on the quiescent signature genes known to be differentially expressed between quies- An over-representation analysis (ORA) [35] was applied cent and activated states from previous studies, and a to the quiescent signature using the previously described similarity measure among the datasets. Table 1 summa- gene collection (Hallmark, Kegg, Reactome). For this rizes the main biological and experimental variations in Pietrosemoli et al. Skeletal Muscle (2017) 7:28 Page 7 of 15 this study, as well as the technical differences present in was considered to be an outlier and was not included in the datasets. the final analysis. Three new sets of microarrays of quiescent versus The number of significantly up- and down-regulated activated satellite cell are reported here (see Table 1). The genes (DEGs) resulting from the differential expression first one is part of a developmental and postnatal series analysis of the quiescent with respect to the activated that was reported previously [16] (E12.5 vs. E17.5), and states were calculated (Additional file 5: Table S1). DEGs here, P8 (postnatal day 8, in vivo proliferating) and 4– were identified as having |logFC| ≥ 1 and a false discov- 5 week old (quiescent) mice were compared. The second ery rate FDR ≤ 0.05. The statistical analysis was per- one is based on previously reported differences in quies- formed at the probeset level, and only those probesets cent and proliferating cell states in subpopulations of satel- matching to genes are reported. On average, the datasets lite cells (quiescent: dormant, top 10% GFP+ cells vs. exhibited 1548 up-regulated genes with a standard primed, bottom 10% GFP+ cells isolated from Tg:Pax7- deviation of 1173 genes. The number of down-regulated nGFP mice; proliferating: 3 days postinjury [3]). The third genes corresponded to 2122, with a standard deviation dataset is based on previous observations that the Notch of 1658 genes. The lowest number of DEGs belonged to intracellular domain (NICD) when expressed constitu- the Fetal_NICD[E17.5/E14.5] dataset (39 up, 136 down), Cre stop-NICD tively (Myf5 : R26 ) in prenatal muscle progeni- while the highest number of DEGs belonged to the tors leads to cell-autonomous expansion of the myogenic GSE70376 dataset (4367 up, 6346 down). Additionally, progenitor population (Pax7+/Myod−) and the absence of an analysis of the distribution of the logFC across the differentiation, followed by premature quiescence at late datasets revealed that there were significant differences fetal stages (E17.5) [16]. Here, E17.5 (quiescent) and E14.5 among the ranges and shapes of such distributions for (proliferating) prenatal progenitors were compared. Ex- each dataset (Additional file 2: Figure S2). cept for our datasets Quiescent(adult)/Activated(P8) and Fetal_NICD[E17.5/E14.5], all the studies were conducted Gene set level analysis reveals common underlying on adult mice (male and female) with ages ranging from biological processes across the datasets 8 weeks to 6 months. Despitethe great differenceamong the number of DEGs While all datasets shared similar cell states (quiescent for the different sets, clear trends among the significantly (QSC) and activated (ASC) satellite cells), the experimen- enriched pathways were found (Fig. 3a). This heatmap tal procedures varied between studies. Activation of cells, shows each dataset as a column and each enriched gene set for instance, was achieved in different ways: (i) in vitro, by as a row. The gene set collection that was tested for enrich- culturing freshly isolated satellite cells for several days and ment corresponds to the Hallmark gene set collection from (ii) in vivo, by extracting ASCs from an injured muscle. MSigDB [40]. Enriched gene sets corresponding to over- Furthermore, for in vivo activation, several techniques expressed genes are shown in red, while enriched gene sets were used to induce the injuries—BaCl , or the snake that were generally abundant in under-expressed genes are venoms cardiotoxin or notexin. Cell extraction protocols shown in blue. Out of the 11 datasets, GSE38870 stood as also varied among the different studies: (i) using trans- an outlier for both over- and under-represented gene sets genic mice expressing a reporter gene that marks satellite compared to the rest. For the other ten datasets, most of cells (several alleles) or (ii) using a combination of anti- them showed an enrichment in the quiescent state for the bodies targeting surface cell antigens specific to satellite TNFA_SIGNALING_VIA_NFKB pathway (nine datasets), cells (several combinations, see Table 1). Finally, the nine while eight datasets were enriched in UV_RESPONSE_DN, datasets that were examined in this study date from 2007 IL6_JAK_STAT3_SIGNALING, APICAL_SURFACE, and to 2016. During this period, microarray technologies KRAS-SIGNALING_DN pathways. Similarly, the ten data- evolved, and the different chips available may introduce sets shared similar trends for under-expressed genes in the yet another source of variation among the compared pathways MYC_TARGETS_V1, E2F_TARGETS, G2M_ datasets. To carry out a statistically meaningful ana- CHECKPOINT, and OXYDATIVE_PHOSPORYLATION, lysis of these extensively heterogeneous datasets, crit- all of which are expected to be absent in the quiescent state. ical data processing was required to interpret gene In total, two subnetworks corresponding to 8 under- and signatures as described in the workflow (Fig. 1). 15 over-expressed enriched gene sets could be distin- guished (Fig. 3b). A network representation of the top 3 most commonly found enriched gene sets (nodes, thick- The number of differentially expressed genes varies outlined circles) is shown in Fig. 3b for the over-expressed significantly among different datasets (TNFA_SIGNALING_VIA_NFKB, UV_RESPONSE_DN, I A total of 32 samples from ASCs and 34 samples from L6_JAK_STAT3_SIGNALING) and under-expressed (MY QSCs from the nine datasets were analyzed. After the C_TARGETS_V1, E2F_TARGETS, G2M_CHECKPOINT) quality control, one sample from the GSE38870 dataset categories. The size of each node corresponds to the total Pietrosemoli et al. Skeletal Muscle (2017) 7:28 Page 8 of 15 ab Fig. 3 Enriched gene sets across individual datasets. Enriched (over-represented) gene sets with over-expressed genes are shown in red; enriched gene sets with under-expressed genes are shown in blue. a Gene set enrichment profiles using the Hallmark gene set collection from MSigDB [40], each row corresponds to a gene set, and each column corresponds to a dataset. b Network representation of three most common over- and under-expressed gene sets (denoted by the thick border on the node) along with the gene sets sharing genes with them (connector lines). Nodes represent gene sets with a node circle size proportional to the number of times the gene sets appear as enriched in the different datasets (see panel a). Thickness of the connecting lines is proportional to the number of shared genes between nodes number of times that the gene set was enriched in all the differentially expressed within each dataset, in order to datasets, and the thickness of the interconnecting lines is identify genes commonly up- or down-regulated in the proportional to the number of genes shared between quiescent state. Although the aforementioned technical connected nodes. Gene sets sharing less than 10% of their and experimental heterogeneity could introduce noise in genes are not shown. We noted also that different gene sets this analysis, such variation was distinguishable from the had a varying number of genes in common (Fig. 3b); if the more stable, underlying common quiescent signature. gene overlap were large, those gene sets (and their Given that the distribution and ranges of the logFCs corresponding biological functions) will likely be also varied so drastically between datasets (Additional file 2: affected (i.e., activated or repressed). For the three most Figure S2), a single FC (fold change) threshold could not common enriched gene sets with under-expressed be chosen to be used for all datasets. Thus, for the genes, for example, we noted that gene set MYC_TAR- combinatorial analysis approach, we set out to maximize GETS_V1 shares most of its genes with gene sets the number of differentially expressed genes common to E2F_TARGETS and G2M_CHECKPOINT. This sug- all the datasets that were considered, where only the ad- gests that the three categories represented by these justed P value was used as a threshold to define DEGs. gene sets had an interplay of genes that displays them However, even in this low constrained scenario, combin- all as under-expressed. ing all the datasets together resulted in very few overlap- ping genes: 12 up (Arntl, Atf3, Atp1a2, Cdh13, Dnajb1, Determining a quiescent transcriptional signature among Enpp2, Ier2, Jun, Nfkbiz, Rgs4, Usp2, Zfp36) and 1 down all datasets (Igfbp2). Alternatively, when certain datasets were To determine a consensus quiescent signature from the excluded from the analysis, the number of DEGs in- datasets, we compared the genes found to be creased (Fig. 4a). Pietrosemoli et al. Skeletal Muscle (2017) 7:28 Page 9 of 15 ab Fig. 4 Different combinatorial landscapes result in different degrees of stringency for the list of genes defining the quiescent state of satellite cells. a Barplot indicating the number of overlapping differentially expressed genes (DEGs) for each best combination of intersections, from degree 2 to 11. The dots underneath the barplot indicate the datasets included in the intersections. The total number of up (UP) and down (DOWN) DEGs for each dataset are indicated in light gray and dark gray, respectively. Panels b and c are the colored matrices showing the Jaccard index between each pair of datasets, for UP DEGs and DOWN DEGs, respectively. Dendrograms show the hierarchical clustering using the Jaccard index as Euclidean distance Combinatorial assessment of datasets according to first two closest datasets belonged to studies originating significance and similarity criteria from the same laboratory underscores the impact of To find the best combination of datasets defining a technical biases. The hierarchical clustering of the consistent and sufficiently large quiescent signature, we Euclidean distance of the Jaccard indexes shows that for ranked them according to their significance. This signifi- up- and down-regulated genes, the datasets Fetal_- cance was determined according to an ensemble of cri- NICD[E17.5/E14.5], GSE38870, and GSE81096 had a teria. First, the dataset should have a minimum number tendency to not group with the rest of the datasets. In of DEGs. Our Fetal_NICD[E17.5/E14.5] dataset, for in- addition to these criteria, others can be used to assess stance, had only 250 DEGs (Additional file 5: Table S1), the significance of the datasets. Choosing the datasets and using it in the analysis resulted in a dramatically re- according to the activation or extraction method of the duced number of overlapping DEGs (Additional file 3: cells, for example, would result in a more stringent Figure S3). A second criterion was the presence of genes ensemble of datasets. known to be differentially expressed between quiescent Taking into account the dataset significance (based on the and activated states from previous studies. In this case, number of DEGs and presence of some reported quiescent datasets GSE38870 and GSE81096 did not meet this cri- markers) and the low extent of overlap between Fetal_- terion, since they lacked genes known to be associated NICD[E17.5/E14.5], GSE38870, and GSE81096 datasets with or regulating the quiescent state such as Calcr, with respect to the remaining datasets, these three datasets Notch1, Chrdl2, Lama3, Pax7, and Bmp6 genes (unpub- were excluded from the multi-dataset analyses. The final lished data, see Fig. 7; [41–43]). As a third criterion, we ensemble comprised the eight remaining datasets which used the dataset similarity, which was assessed using the had 207 and 542 genes commonly up- and down-regulated, Jaccard index (JI), and a matrix of the JIs for the up- and respectively (Fig. 4a). To further characterize these down-regulated genes was generated (Figs. 4b, c, re- commonly regulated genes, we performed an over- spectively). In both matrices, the closest pairs of datasets representation analysis (ORA) of the gene sets. An enrich- were GSE47177 at 60 h and GSE47177 at 84 h (JI = 0.46 ment was detected for the 207 commonly up-regulated and 0.44 for the up- and down-regulated genes, respect- genes in seven different Hallmark gene sets (Fig. 5a). Some ively), followed by the second pair of closest sets Quies- genes were shared among different pathways (e.g., Atf3 and cent [high]/D3Activated [high] and Quiescent [low]/ Il6 were found in six different gene sets), while others were D3Activated [low] (JI = 0.39 and 0.33, for up- and down- found in one gene set only (e.g., Tgfbr3, Spsb1). These re- regulated genes, respectively). The observation that the sults are consistent with the individual gene set enrichment Pietrosemoli et al. Skeletal Muscle (2017) 7:28 Page 10 of 15 a b Fig. 5 Gene expression of differentially expressed genes (DEGs) in satellite cells. a Binary heatmap of the over-representation analysis. Each col- umn represents one enriched (over-represented) gene set, and each row corresponds to a gene. Red cells indicate the presence of the corre- sponding gene in a given gene set. b Network representation of 39 GOSlim terms used to characterize the commonly regulated genes in satellite cells. Nodes represent gene sets with a node size proportional to the gene set size. Edges indicate that genes are shared among the gene sets. The thickness of the edge is proportional to the number of shared genes. Also shown are the heatmaps of logFC for genes belonging to extracellular matrix, nucleic acid binding and cell cycle and proliferation, nucleic acid binding, and signal transduction activity, respectively. Each row corresponds to a gene and each column corresponds to a dataset. Dendrograms show hierarchical clustering using the Euclidean distance analysis (see Fig. 3) emphasizing that these genes reflect the To assign a global function to the commonly regulated global traits associated with the quiescent state. Note that genes, we annotated them using GOSlim terms, which only a fraction of the 207 genes was found in known exist- summarize broad terms based on Gene Ontology (GO) ing gene sets (57/207), leaving about three-quarters of the terms [44]. To identify categories of genes, we generated commonly up-regulated genes not associated with any heatmaps of the logFC in the different datasets for a sub- existing gene set. This finding was expected given that a set of the 207 UP genes belonging to the extracellular quiescent signature is yet to be defined, and thus current matrix, nucleic acid binding activity (+/− cell cycle prolif- gene sets lack such annotations. To facilitate the analysis of eration), and signal transduction activity (Fig. 5b). transcriptomes as described here, we have developed an on- Unexpectedly, some genes associated with cell cycle pro- line interactive tool called Sherpa (Fig. 6). Sherpa allows liferation, such as c-Fos and c-Jun, were up-regulated in users to perform analyses on individual and on multiple the quiescent cell analyses in all datasets (Fig. 7a). To ver- datasets. Each individual dataset analysis involves the identi- ify the transcriptional relevance of these genes in fication of differentially expressed genes; comparison of the quiescent cells, we used a protocol to isolate satellite cells expression of selected genes in the quiescent and activated in which a short fixation (PFA) treatment was performed states through tables, heatmaps, and volcano plots; and ex- prior to harvesting the cells to arrest de novo transcription ploration of the distribution of the samples according to during the isolation protocol (see the “Methods” section). their variability through principal component analysis and Then, the expression level quantification was assessed at cluster analysis. The multiple dataset analysis allows the the transcript (RT-qPCR) and protein (Western blot) level comparison of selected datasets according to the commonly at different time points after isolation for a number of differentially expressed genes. All of these analyses are inter- genes (Fig. 7b, c). Notably, quantifications of c-Jun, Jun B, active, as they allow the user to select the thresholds of fold and Jun D levels showed that at time 0 (+PFA), these change (logFC) and false discovery rate (adj. P value). genes were not detected in quiescent cells, neither at the Pietrosemoli et al. Skeletal Muscle (2017) 7:28 Page 11 of 15 Fig. 6 Snapshot of the interactive web application for transcriptomic data exploration and comparison. Sherpa (http://sherpa.pasteur.fr) allows users to perform individual dataset and multiple dataset analysis. In the individual dataset analysis (shown), the user chooses the dataset for which the analysis is to be performed. Then, it is possible to identify differentially expressed genes (e.g., volcano plot), compare the expression of selected genes in the quiescent and activated state (e.g., heatmap, as shown in the figure), and the distribution of the samples according to their variability (principal component analysis). All these analyses are interactive, as they allow the user to set the thresholds of fold change (logFC) and false discovery rate (adj. P value) mRNA (right panel) nor at the protein (left panel) level comparisons across divergent datasets that are heteroge- (Fig. 7c). However, these genes were up-regulated using neous in the platform and biological condition. Notably, conventional satellite cell isolation protocols that take sev- this pipeline allowed the examination of 11 datasets, eral hours. As a control, PFA treatment after cell isolation including three novel transcriptomes from our work, as had no effect on this expression pattern (Additional file 4: well as the identification of a variety of functional gene Figure S4). This rapid up-regulation was then followed by sets that appear in common with the majority of the a decline in expression levels of these genes (Fig. 7b, c), datasets. To perform this analysis, it was necessary to suggesting that this is the result of a stress response that is standardize every step of the analysis to attenuate the associated with the isolation procedure. impact of heterogeneity inherent in all of the datasets due to experimental, biological, and technical variations. Discussion These varying conditions led us to perform a combina- The transcriptome analysis and pipeline, as well as the torial assessment of the individual datasets according to Sherpa interface that we describe here, allow multi-scale their significance and similarity criteria. Pietrosemoli et al. Skeletal Muscle (2017) 7:28 Page 12 of 15 Fig. 7 Direct comparison of fixed and unfixed satellite cells identify immediate response genes not present the in vivo state. a Boxplots of four examples of genes found commonly upregulated in QSCs in the different datasets showing the distribution of intensities values in QSCs and ASCs. Colored dots indicate each dataset. Shape of the dot indicates whether the gene is significantly differentially expressed or not. b Fold change of mRNA (log10) between 0 h + PFA and 5 h + PFA. Blue bars indicate a higher expression in 0 h + PFA condition; the red bars indicate a higher expression in 5 h + PFA condition. Color intensities are proportional to the fold change. c c-Jun, Jun B, and Jun D protein levels from satellite cells at 0, 5, 10, 15 h after isolation (with and without PFA treatment) were measured by Western blotting, and band intensities were quantified by densitometric analysis with the ImageLab software (right). Basal levels of c-Jun, Jun B, and Jun D mRNA from satellite cells at 0, 5, 10, 15 h after isolation (with and without PFA treatment) were measured by real-time PCR (left) Pietrosemoli et al. Skeletal Muscle (2017) 7:28 Page 13 of 15 Variations in datasets are not unique to the study of genes [56]. Using a novel isolation protocol based on the muscle stem cells. Indeed, the last decades have notion that tissues that are fixed prior to processing result witnessed many efforts to analyze microarray data to in stabilized mRNA [18, 19], we validated the expression provide relevant gene signatures. In cancer biology, for of several genes including Calcr and Teneurin4 (Tenm4) example, gene markers were sought either for prognosis, as true quiescent markers. In contrast, we show that Fos i.e., lists of genes able to predict clinical outcome [45] or and Jun transcripts and Jun family proteins are not present for molecular subtyping, i.e., list of genes able to classify at significant levels in vivo, but are robustly induced different subtypes of a disease [46, 47]. However, even if within 5 h, the average processing time taken for isolation markers performed well, gene signatures derived from by FACS of satellite cells. These results are concordant studies on the same treatments and diseases often with a recently published paper in which immediate early resulted in gene lists with little overlap [48]. In other and heat-shock genes were rapidly up-regulated during cases, the signatures proved to be unstable, having other the cell isolation procedure [57]. We propose that these gene lists on the same dataset with the same predictive and other stress response genes mitigate the quiescent to power [49]. These observations suggest that such sig- activation transition that accompanies the initial steps of natures may include causally related genes, i.e., down- exit from G0. stream of the phenotype-causing genes, and that Given these unexpected findings, the comparison of these gene lists may share the same biological path- transcriptomes of satellite cells from a fixed/in vivo state ways [50]. with those that were described here would be important Gene Set Enrichment Analysis (GSEA) has become an to delineate homeostatic vs. immediate early response efficient complementary approach for analyzing omic genes. For that purpose, Sherpa allows the integration of data in general and GEPs in particular [50–52]. It shifts datasets from fixed samples, or other methodologies, the expression analysis from a gene space to a gene set when they will be available. Beyond the present findings, space, where genes are organized into gene sets we propose that all transcriptome data obtained from according to a common feature, such as a functional cells isolated from solid tissues, which require extensive annotation (e.g., a Gene Ontology term) or a specific enzymatic digestion and processing before isolation of metabolic pathway (e.g., a KEGG pathway). In this way, RNA, need to be re-evaluated to distinguish those genes it incorporates previously existing biological knowledge that are induced by the isolation procedure. to drive and increase interpretation, while offering In addition to generating this open access compen- greater robustness and sensitivity than gene level strat- dium of GEPs, we provide a standardized pipeline that egies [50, 53, 54]. sets the basis for a multi-set analysis for an effective and In spite of the heterogeneity in datasets examining qui- systematic comparison of individual datasets. Analyzing escent muscle satellite cells, we were able to identify genes multiple datasets provides generalized information that were consistently up- and down-regulated among the across different studies [38, 39]. The cancer field was a different datasets (Additional file 5: Table S1). The final pioneer in combining several works [58, 59] and other multi-set analysis comprised eight datasets which had 207 fields, such as neurodegenerative diseases [60, 61] and and 542 genes that were commonly up- and down- regulatory genomics have successfully adopted this strat- regulated, respectively. Moreover, the gene set enrichment egy [62]. The multi-dimensional approach presented analysis of the individual datasets showed striking similar- here offers increased power, due to the higher sample ities on the over- and under-represented gene sets. These size and increased robustness, by highlighting variations gene sets, which summarize and represent well-defined in individual studies results [37, 63]. Such variations biological states and processes in the cells, were shared are the consequence of the high level of noise and ar- among the different datasets. They include an over- tefacts and are typically associated with microarray representation of genes in the TNFα pathway via NFKβ data [64]. signaling, Il6-Jak-Stat3 signaling, and the apical surface processes, and an under-representation of MYC and E2F Conclusions targets, and genes associated with the G2 M checkpoint Here, we compile the first comprehensive catalog of gene and oxidative phosphorylation. Some markers such as expression data of myogenic cells across distinct states Calcitonin receptor (Calcr), Teneurin4 (Tenm4), and stress and conditions, providing a global perspective on quies- pathways identified previously were also present in our cence. An extensive comparison of the transcriptomic analysis [11, 41, 55] (Additional file 6: Table S2). However, profiles of mouse skeletal muscle satellite cells in quies- we also report that virtually all datasets contained genes cent and activated states resulting from nine datasets re- that would be expected to be present during activation or vealed common features among the different studies from cell cycle entry, such as members of the Fos and Jun family other features which are more specific to the individual previously identified as immediate early stress response datasets. In spite of heterogeneities across platforms, we Pietrosemoli et al. Skeletal Muscle (2017) 7:28 Page 14 of 15 were able to identify genes that were consistently up- and Funding ST was funded by Institut Pasteur, Centre National pour la Recherche down-regulated among the different datasets. By doing so, Scientific, the Agence Nationale de la Recherche (Laboratoire d’Excellence we developed and made available an open-access inter- Revive, Investissement d’Avenir; ANR-10-LABX- 73), and the European active exploratory tool called Sherpa (SHiny ExploRation Research Council (advanced research grant 332893). tool for transcriPtomic Analysis) that allows statistically Availability of data and materials valid analyses and systematic comparisons that cannot be The generated transcriptome datasets are available from the corresponding performed directly on the datasets. Finally, by obtaining author on reasonable request. Public datasets are available at https:// www.ncbi.nlm.nih.gov/geo/ under their corresponding identification mRNA directly from fixed muscle tissue for empirical test- number. ing of genes present during quiescence in vivo, we identi- fied immediately early expressed stress response genes Authors’ contributions that were present in all datasets due to the isolation and NP, SM, and ST analyzed and interpreted the data regarding the gene expression profiles. SY, MBB, HS, and RS performed the experiments to processing protocols used previously for solid tissues. generate transcriptome data. FP contributed to the data analysis. DDG and FP performed the validation experiments of candidate genes. All authors read and approved the final manuscript. Additional files Ethics approval and consent to participate Not applicable. Additional file 1: Figure S1. Quality controls and data sample distribution for Quiescent [high/low]/D3Activated [high/low] dataset. a Consent for publication Relative log expression (RLE) and b normalized unscaled standard errors Not applicable. (NUSE) plots for the D3P7 dataset show that as expected for good quality data, RLE median values are centered around 0.0, while the median standard error should be 1 for most genes in the NUSE plots. A sample Competing interests distribution is distributed according to status (D3H: activated, high; D3L: The authors declare that they have no competing interests. activated, low; QH: quiescent, high; QL: quiescent, low) using principal component analysis (c) and hierarchical clustering of the Euclidean Publisher’sNote distance (d). (PDF 103 kb) Springer Nature remains neutral with regard to jurisdictional claims in Additional file 2: Figure S2. Violin plots of the logFC distribution for published maps and institutional affiliations. each individual dataset. Density plots of the logFC (|logFC| < 1 in red; |logFC| > 1 in blue. (PDF 156 kb) Author details Additional file 3: Figure S3. Effect of adding NICD[E17.5/E14.5] dataset Bioinformatics and Biostatistics Hub, C3BI, USR 3756 IP CNRS, Institut on the best combinations of datasets. Impact of including or excluding Pasteur, 75015 Paris, France. Stem Cells and Development, Department of NICD dataset on overall analysis. (PDF 395 kb) Developmental and Stem Cell Biology, Institut Pasteur, 75015 Paris, France. 3 4 CNRS UMR 3738, Institut Pasteur, 75015 Paris, France. Institute for Stem Cell Additional file 4: Figure S4. Effect of PFA treatment at different time Biology and Regenerative Medicine, GKVK PO, Bellary Road, Bengaluru points in the experimental procedure. Control experiments showing no 560065, India. Dipartimento di Medicina Clinica e Chirurgia, Università degli effect of PFA on gene expression measurements. (PDF 445 kb) Studi di Napoli Federico II, Via S. Pansini 5, 80131 Naples, Italy. Novo Nordisk Additional file 5: Table S1. Identified differentially expressed genes in Foundation Center for Stem Cell Biology, DanStem, University of the QSCs condition for the nine datasets. Differentially expressed genes Copenhagen, 3B Blegdamsvej, DK-2200 Copenhagen N, Denmark. in the QSCs condition for the nine datasets using logFC = 1 and FDR = 0.05. (XLSX 48 kb) Received: 3 August 2017 Accepted: 29 November 2017 Additional file 6: Table S2. Primers used for validation of gene expression by RT-qPCR. Primers used for RT-qPCR studies in Fig. 7. (PDF 14 kb) References 1. Li L, Clevers H. Coexistence of quiescent and active adult stem cells in mammals. Science. 2010;327:542–5. Abbreviations 2. Sambasivan R, Gayraud-Morel B, Dumas G, et al. Distinct regulatory cascades ASCs: Activated satellite cells; DEGs: Statistically differentially expressed genes; govern extraocular and pharyngeal arch muscle progenitor cell fates. Dev FACS: Fluorescence-activated cell sorting; FC: Fold change; FDR: False Cell. 2009;16:810–21. discovery rate; FSC: Functional scoring method; GEO: Gene Expression 3. Rocheteau P, Gayraud-Morel B, Siegl-Cachedenier I, et al. A subpopulation Omnibus; GEPs: Gene expression profiles; GFP: Green fluorescent protein; of adult skeletal muscle stem cells retains all template DNA strands after cell GO: Gene Ontology; GSEA: Gene set enrichment analysis; KEGG: Kyoto division. Cell. 2012;148:112–25. Encyclopedia of Genes and Genomes; logFC: Logarithm of the fold change; 4. Gayraud-Morel B, Pala F, Sakai H, et al. Isolation of muscle stem cells from NICD: Notch intracellular domain; NUSE: Normalized Unscaled Standard mouse skeletal muscle. Methods Mol Biol Clifton NJ. 2017;1556:23–39. Errors; ORA: Over-representation analysis; P8: Postnatal day 8; 5. Network TCGAR, Weinstein JN, Collisson EA, et al. The cancer genome atlas PFA: Paraformaldehyde; QSCs: Quiescent satellite cells; RLE: Relative Log pan-cancer analysis project. Nat Genet. 2013;45:1113–20. Expression; RMA: Robust Multi-Array Average expression measure; 6. Taccioli C, Maselli V, Tegnér J, et al. ParkDB: a Parkinson’s disease gene Sherpa: SHiny ExploRation tool for transcriPtomic Analysis; TA: Tibialis expression database. Database. 2011; https://doi.org/10.1093/database/ Anterior; TCGA: The Cancer Genome Atlas bar007. Epub ahead of print 1 Jan 2011 7. Hawrylycz MJ, Lein ES, Guillozet-Bongaarts AL, et al. An anatomically Acknowledgements comprehensive atlas of the adult human brain transcriptome. Nature. 2012; We would like to thank K. Soni and U. Borello for their assistance in the early 489:391–9. stages of this work and P. Mourikis and F. Relaix for communicating 8. Lein ES, Hawrylycz MJ, Ao N, et al. Genome-wide atlas of gene expression in unpublished results and sharing protocols. We also acknowledge the Flow the adult mouse brain. Nature. 2007;445:168–76. Cytometry Platform of the Technology Core-Center for Translational Science 9. Liu L, Cheung TH, Charville GW, et al. Chromatin modifications as (CRT) at Institut Pasteur for the support in conducting this study and the determinants of muscle stem cell quiescence and chronological aging. Cell microarray platform at Institut Cochin. Rep. 2013;4:189–204. Pietrosemoli et al. Skeletal Muscle (2017) 7:28 Page 15 of 15 10. Fukada S, Uezumi A, Ikemoto M, et al. Molecular signature of quiescent 39. Irizarry RA, Warren D, Spencer F, et al. Multiple-laboratory comparison of satellite cells in adult skeletal muscle. Stem Cells. 2007;25:2448–59. microarray platforms. Nat Methods. 2005;2:345–50. 11. Pallafacchina G, François S, Regnault B, et al. An adult tissue-specific stem 40. Liberzon A, Birger C, Thorvaldsdóttir H, et al. The molecular signatures cell in its niche: a gene profiling analysis of in vivo quiescent and activated database (MSigDB) hallmark gene set collection. Cell Syst. 2015;1:417–25. muscle satellite cells. Stem Cell Res. 2010;4:77–91. 41. Yamaguchi M, Watanabe Y, Ohtani T, et al. Calcitonin receptor signaling 12. Farina NH, Hausburg M, Betta ND, et al. A role for RNA post-transcriptional inhibits muscle stem cells from escaping the quiescent state and the niche. regulation in satellite cell activation. Skelet Muscle. 2012;2:21. Cell Rep. 2015;13:302–14. 42. Mourikis P, Sambasivan R, Castel D, et al. A critical requirement for notch 13. García-Prat L, Martínez-Vicente M, Perdiguero E, et al. Autophagy maintains signaling in maintenance of the quiescent skeletal muscle stem cell state. stemness by preventing senescence. Nature. 2016;529:37–42. Stem Cells. 2012;30(2):243–52. 14. Lukjanenko L, Jung MJ, Hegde N, et al. Loss of fibronectin from the aged 43. Günther S, Kim J, Kostin S, et al. Myf5-positive satellite cells contribute to stem cell niche affects the regenerative capacity of skeletal muscle in mice. Pax7-dependent long-term maintenance of adult muscle stem cells. Cell Nat Med. 2016;22:897–905. Stem Cell. 2013;13(5):590–601. 15. Barrett T, Wilhite SE, Ledoux P, et al. NCBI GEO: archive for functional 44. Ashburner M, Ball CA, Blake JA, et al. Gene ontology: tool for the unification genomics datasets—update. Nucleic Acids Res. 2013;41:D991–5. of biology. Nat Genet. 2000;25:25–9. 16. Mourikis P, Gopalakrishnan S, Sambasivan R, et al. Cell-autonomous Notch 45. Paquet ER, Cui J, Davidson D, et al. A 12-gene signature to distinguish colon activity maintains the temporal specification potential of skeletal muscle cancer patients with better clinical outcome following treatment with 5- stem cells. Dev Camb Engl. 2012;139:4536–48. fluorouracil or FOLFIRI. J Pathol Clin Res. 2015;1:160–72. 17. Sambasivan R, Comai G, Le Roux I, et al. Embryonic founders of adult 46. Wang J, Mi J-Q, Debernardi A, et al. A six gene expression signature defines muscle stem cells are primed by the determination gene Mrf4. Dev Biol. aggressive subtypes and predicts outcome in childhood and adult acute 2013;381:241–55. lymphoblastic leukemia. Oncotarget. 2015;6:16527–42. 18. Houzelstein D, Tajbakhsh S. Increased in situ hybridization sensitivity using 47. Three-gene model to robustly identify breast cancer molecular subtypes | non-radioactive probes after staining for β-galactosidase activity. Tech Tips JNCI: Journal of the National Cancer Institute | Oxford Academichttps:// Online. 1998;3:147–9. academic.oup.com/jnci/article-lookup/doi/10.1093/jnci/djr545 (accessed 5 19. Machado L, Esteves de Lima J, Fabre O, et al. In Situ Fixation Redefines June 2017). Quiescence and Early Activation of Skeletal Muscle Stem Cells. Cell Stem 48. Fan C, Oh DS, Wessels L, et al. Concordance among gene-expression-based Cell. 2017;21:1982–93. predictors for breast cancer. N Engl J Med. 2006;355:560–9. 20. Gautier L, Cope L, Bolstad BM, et al. Affy—analysis of Affymetrix GeneChip 49. Michiels S, Koscielny S, Hill C. Prediction of cancer outcome with data at the probe level. Bioinformatics. 2004;20:307–15. microarrays: a multiple random validation strategy. Lancet. 2005;365:488–92. 21. Carvalho BS, Irizarry RA. A framework for oligonucleotide microarray 50. Abraham G, Kowalczyk A, Loi S, et al. Prediction of breast cancer prognosis preprocessing. Bioinformatics. 2010;26:2363–7. using gene set statistics provides signature stability and biological context. 22. Bolstad BM. Low-level analysis of high-density oligonucleotide array data: BMC Bioinformatics. 2010;11:277. background, normalization and summarization. University of California; 2004. 51. Mootha VK, Lindgren CM, Eriksson K-F, et al. PGC-1alpha-responsive genes 23. R Core Team. R: a language and environment for statistical computing. involved in oxidative phosphorylation are coordinately downregulated in Vienna: R Foundation for Statistical Computinghttps://www.R-project.org/; human diabetes. Nat Genet. 2003;34:267–73. 52. Varn FS, Ung MH, Lou SK, et al. Integrative analysis of survival-associated 24. FactoMineR: An R Package for Multivariate Analysis | Lê | Journal of gene sets in breast cancer. BMC Med Genet. 8 https://doi.org/10.1186/ Statistical Softwarehttps://www.jstatsoft.org/article/view/v025i01 (accessed 1 s12920-015-0086-0. Epub ahead of print 12 Mar 2015 June 2017). 53. Goeman JJ, Bühlmann P. Analyzing gene expression data in terms of gene 25. Ritchie ME, Phipson B, Wu D, et al. Limma powers differential expression sets: methodological issues. Bioinformatics. 2007;23:980–7. analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 54. Luo W, Friedman MS, Shedden K, et al. GAGE: generally applicable gene set 2015;43:e47. enrichment for pathway analysis. BMC Bioinformatics. 2009;10:161. 26. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical 55. Ishii K, Suzuki N, Mabuchi Y, et al. Muscle satellite cell protein teneurin-4 and powerful approach to multiple testing. J R Stat Soc Ser B Methodol. regulates differentiation during muscle regeneration. Stem Cells. 2015;33: 1995;57:289–300. 3017–27. 27. Wu D, Smyth GK. Camera: a competitive gene set test accounting for inter- 56. Lamph WW, Wamsley P, Sassone-Corsi P, et al. Induction of proto-oncogene gene correlation. Nucleic Acids Res. 2012;40:e133. JUN/AP-1 by serum and TPA. Nature. 1988;334:629–31. 28. Subramanian A, Tamayo P, Mootha VK, et al. Gene set enrichment analysis: 57. van den Brink SC, Sage F, et al. Single-cell sequencing reveals dissociation- a knowledge-based approach for interpreting genome-wide expression induced gene expression in tissue subpopulations. Nat Methods. 2017;14: profiles. Proc Natl Acad Sci. 2005;102:15545–50. 29. WEHI Bioinformatics—mouse and human orthologs of the MSigDBhttp:// 58. Rhodes DR, Barrette TR, Rubin MA, et al. Meta-analysis of microarrays: bioinf.wehi.edu.au/software/MSigDB/ (accessed 20 Apr 2017). interstudy validation of gene expression profiles reveals pathway 30. KEGG: Kyoto Encyclopedia of Genes and Genomeshttp://www.genome.jp/ dysregulation in prostate cancer. Cancer Res. 2002;62:4427–33. kegg/ (accessed 20 Apr 2017). 59. Grützmann R, Boriss H, Ammerpohl O, et al. Meta-analysis of microarray data 31. Reactome Pathway Databasehttp://www.reactome.org/ (accessed 20 Apr on pancreatic cancer defines a set of commonly dysregulated genes. 2017). Oncogene. 2005;24:5079–88. 32. Wang M, Zhao Y, Zhang B. Efficient test and visualization of multi-set 60. Cruz-Monteagudo M, Borges F, Paz-y-Miño C, et al. Efficient and biologically intersections. Sci Rep. 2015;5:16923. relevant consensus strategy for Parkinson’s disease gene prioritization. BMC 33. Lex A, Gehlenborg N, Strobelt H, et al. UpSet: visualization of intersecting Med Genet. 9 https://doi.org/10.1186/s12920-016-0173-x. Epub ahead of sets. IEEE Trans Vis Comput Graph. 2014;20:1983–92. print 9 Mar 2016 34. Jaccard P. Étude comparative de la distribution florale dans une portion des 61. Oerton E, Bender A. Concordance analysis of microarray studies identifies Alpes et des Jura. Bull Société Vaudoise Sci Nat. 1901;37:547–79. representative gene expression changes in Parkinson’s disease: a 35. Khatri P, Sirota M, Butte AJ. Ten years of pathway analysis: current approaches comparison of 33 human and animal studies. BMC Neurol. 17 https://doi. and outstanding challenges. PLoS Comput Biol. 2012;8:e1002375. org/10.1186/s12883-017-0838-x. Epub ahead of print 23 Mar 2017 36. Winston Chang, Joe Cheng, JJ Allaire, Yihui Xie, Jonathan McPherson. shiny: 62. Consortium TEP. An integrated encyclopedia of DNA elements in the Web Application Framework for R https://CRAN.R-project.org/package=shiny human genome. Nature. 2012;489:57–74. (2017). 63. Russ J, Futschik ME. Comparison and consolidation of microarray datasets of 37. Campain A, Yang YH. Comparison study of microarray meta-analysis human tissue expression. BMC Genomics. 2010;11:305. methods. BMC Bioinformatics. 2010;11:408. 64. Kitchen RR, Sabine VS, Simen AA, et al. Relative impact of key sources of 38. Ramasamy A, Mondry A, Holmes CC, et al. Key issues in conducting a meta- systematic noise in Affymetrix and Illumina gene-expression microarray analysis of gene expression microarray datasets. PLoS Med. 5 https://doi. experiments. BMC Genomics. 2011;12:589. org/10.1371/journal.pmed.0050184. Epub ahead of print September 2008 http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Skeletal Muscle Springer Journals

Comparison of multiple transcriptomes exposes unified and divergent features of quiescent and activated skeletal muscle stem cells

Free
15 pages

Loading next page...
 
/lp/springer_journal/comparison-of-multiple-transcriptomes-exposes-unified-and-divergent-wiMDmmxRJg
Publisher
Springer Journals
Copyright
Copyright © 2017 by The Author(s).
Subject
Life Sciences; Cell Biology; Developmental Biology; Biochemistry, general; Systems Biology; Biotechnology
eISSN
2044-5040
D.O.I.
10.1186/s13395-017-0144-8
Publisher site
See Article on Publisher Site

Abstract

Background: Skeletal muscle satellite (stem) cells are quiescent in adult mice and can undergo multiple rounds of proliferation and self-renewal following muscle injury. Several labs have profiled transcripts of myogenic cells during the developmental and adult myogenesis with the aim of identifying quiescent markers. Here, we focused on the quiescent cell state and generated new transcriptome profiles that include subfractionations of adult satellite cell populations, and an artificially induced prenatal quiescent state, to identify core signatures for quiescent and proliferating. Methods: Comparison of available data offered challenges related to the inherent diversity of datasets and biological conditions. We developed a standardized workflow to homogenize the normalization, filtering, and quality control steps for the analysis of gene expression profiles allowing the identification up- and down-regulated genes and the subsequent gene set enrichment analysis. To share the analytical pipeline of this work, we developed Sherpa, an interactive Shiny server that allows multi-scale comparisons for extraction of desired gene sets from the analyzed datasets. This tool is adaptable to cell populations in other contexts and tissues. Results: A multi-scale analysis comprising eight datasets of quiescent satellite cells had 207 and 542 genes commonly up- and down-regulated, respectively. Shared up-regulated gene sets include an over-representation of the TNFα pathway via NFKβ signaling, Il6-Jak-Stat3 signaling, and the apical surface processes, while shared down-regulated gene sets exhibited an over-representation of Myc and E2F targets and genes associated to the G2M checkpoint and oxidative phosphorylation. However, virtually all datasets contained genes that are associated with activation or cell cycle entry, such as the immediate early stress response genes Fos and Jun. An empirical examination of fixed and isolated satellite cells showed that these and other genes were absent in vivo, but activated during procedural isolation of cells. Conclusions: Through the systematic comparison and individual analysis of diverse transcriptomic profiles, we identified genes that were consistently differentially expressed among the different datasets and shared underlying biological processes key to the quiescent cell state. Our findings provide impetus to define and distinguish transcripts associated with true in vivo quiescence from those that are first responding genes due to disruption of the stem cell niche. Keywords: Skeletal muscle satellite cells, Quiescence, Multi-level transcriptomic comparisons, Sherpa * Correspondence: shaht@pasteur.fr; shahragim.tajbakhsh@pasteur.fr Equal contributors Stem Cells and Development, Department of Developmental and Stem Cell Biology, Institut Pasteur, 75015 Paris, France CNRS UMR 3738, Institut Pasteur, 75015 Paris, France Full list of author information is available at the end of the article © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Pietrosemoli et al. Skeletal Muscle (2017) 7:28 Page 2 of 15 Background signature of the quiescent state. This large compendium Most adult stem cell populations identified to date are of expression data offers the first comparison and integra- in a quiescent state [1]. Following tissue damage or tion of nine independent studies of the quiescent state of disruption of the stem cell niche, skeletal muscle satellite mouse satellite cells, and we developed Sherpa, a shiny (stem) cells transit through different cell states from re- interactive web server to provide a user-friendly explor- versible cell cycle exit to a postmitotic multi-nucleate ation of the analysis. In addition, using a protocol for the state in myofibres. In mouse skeletal muscle, the tran- fixation and capture of mRNA directly from the tissue scription factor Pax7 marks satellite cells during quies- without the alteration in gene expression that could arise cence and proliferation, and it has been used to identify during the isolation procedure, which typically takes and isolate myogenic populations from skeletal muscle several hours with solid tissues, we have empirically tested [2, 3]. Myogenic cells have also been isolated by the expression of transcripts. Strikingly, several genes, fluorescence-activated cell sorting (FACS) using a variety including members of the Jun and Fos family, were found of surface markers, including α7-integrin, VCAM, and to be present in isolated satellite cells using conventional CD34 [4]. Although these cells have been extensively isolation procedures, but they were absent in vivo. These studied by transcriptome, and to a more limited extent findings, and the unique atlas that we report, will un- by proteome profiling, different methods have been used doubtedly improve our current understanding of the to isolate and profile myogenic cells thereby making molecular mechanisms governing the quiescent state and comparisons laborious and challenging. To address this contribute to the identification of critical regulatory genes issue, it is necessary to generate comprehensive catalogs involved in different cell states. of gene expression data of myogenic cells across distinct states and in different conditions. Methods Soon after their introduction two decades ago, high- Individual dataset transcriptomic analysis throughput microarray studies started to be compiled The analysis comprised a total of nine datasets, three into common repositories that provide the community novel microarray datasets and six publicly available data- access to the data. Several gene expression repositories sets [9–14], choosing only samples with overall similar for specific diseases, such as the Cancer Genome Atlas conditions. All datasets were analyzed independently (TCGA) [5], the Parkinson’s disease expression database following the same generalized pipeline based on ad hoc ParkDB [6], or for specific tissues, such the Allen Hu- R-implemented scripts (Fig. 2). man and Mouse Brain Atlases [7, 8] among many, have been crucial in allowing scientists the comparison of Gene expression profiles datasets, the application of novel methods to existing The microarray data compared activated satellite cells datasets, and thus a more global view of these biological (ASCs) and quiescent satellite cells (QSCs) from differ- systems. ent experiments. Table 1 describes the public datasets In this work, we generated transcriptome datasets that were taken into account for the analysis with the of satellite cells in different conditions and performed GEO [15] (Gene Expression Omnibus) identifications, comparisons with published datasets. Due to the diver- references, and sample distribution. The new mouse micro- sity of platforms and formats of published datasets, this array datasets include the following comparisons: young was not readily achievable. For this reason, we developed adult Quiescent(adult)/Activated (postnatal day 8) and an interactive tool called Sherpa (SHiny ExploRation Quiescent [high/low]/D3Activated [high/low], and Fetal_- tool for transcriPtomic Analysis) to provide comprehen- NICD [E17.5/E14.5]. Table 1 presents all sample details. sive access to the individual datasets analyzed in a homogeneous manner. This web server allows users to Animals, injuries, and cell sorting (i) identify differentially expressed genes of the individ- Animals were handled according to the national and ual datasets, (ii) identify the enriched gene sets of the European Community guidelines and the ethics commit- individual datasets, and (iii) effectively compare the tee of the Institut Pasteur (CTEA) in France. For isolation chosen datasets. Sherpa is adaptable and serves as a of quiescent satellite cells, Tg: Pax7-nGFP mice (6– repository for the integration and analysis of future tran- 12 weeks) [2] were anesthetized prior to the injury. Tibi- scriptomic data. It has a generic design that makes it alis anterior (TA) muscles were injured with notexin applicable to the analysis of other transcriptome datasets (10 μl–10 μM; Latoxan). Cells were then isolated by FACS generated in a variety of conditions and tissues. using FACS ARIA III (BD Biosciences), MoFlo Astrios Hi Lo We analyzed gene expression profiles (GEPs) of acti- and Legacy (Beckman Coulter) sorters. Pax7 and Pax7 vated and quiescent states of mouse satellite cells derived cells correspond to the 10% of cells with the highest and from three new experimental setups and six publicly avail- the lowest expression of nGFP, respectively, as defined able microarray datasets to define a consensus molecular previously [3]. Pietrosemoli et al. Skeletal Muscle (2017) 7:28 Page 3 of 15 Table 1 Summary of analyzed transcriptomic datasets of activated and quiescent states of mouse muscle stem (satellite) cells. Three new high-throughput experimental setups and six publically available microarray datasets comparing activated satellite cells (ASCs) and quiescent satellite cells (QSCs) are shown in the rows. The biological, experimental, and technical details of each experiment are shown in the different columns of the Table NICD Ref/code Quiescent [high/low] Quiescent Fetal R26 GSE47177 Liu et GSE3483 GSE15155 GSE38870 GSE70376 GSE81096 D3 Activated activated [E17.5/E14.5] al. [9] Fukada et al. [10] Pallafacchina et al. [11] Farina et al. [12] García-Prat et al. [13] Lukjanenko et al. [14] [high/low] Num. of 3 QSC_Pax7 low, 3 3 QSC, 3 3 "QSC," 3 3 QSC, 3 QSC, 3 ASC 3 QSC, 3 ASC 3 QSC, 3 ASC 4 QSC, 4 ASC 6 QSC, 5 ASC samples ASC_Pax7 low, 3 P8_ASC "ASC 3 ASC 60 h, QSC_Pax7 high, 3 3 ASC 84 h ASC_Pax7 high Date 2013 2007 2015 2013 2007 2010 2012 2015 2016 Anatomy Tibialis anterior Limb, body Forelimbs Hindlimb Hindlimb Diaphragm, pectoralis, Tibialis anterior Tibialis anterior Tibialis anterior, wall, diaphragm abdominal muscles gastro-necmius, quadriceps Sex M M, F M F M, F F M M Age 6–8w P8,4–5 w E14.5, E17.5 8 w 8–12 w 6 w 3–6 m Young: 3 m, old: Young: 9–15 w, 20–24 m old: 20–24 m Strain C57BL/6 B6.129 C57BL/6 (Jackson) C57BL/7 C57BL/6 x C57BL/6 C57BL/6 (Janvier) (nihon clea) DBA2 (??) nGFP/+ Cre+ CreER/+ GFP/+ Reporter Tg:Pax7-nGFP Pax7 Myf5 : Pax7 Pax3 (high GFP) stop-NICDgfp/+ eYFP/+ (10% high, R26 R26 10% low) GFP/+ GFP/ Activation Notexin P8 = “activated” E14.5 = “activated” BaCl QSCs in culture Pax3 , Pax3 Injury: BaCl Cardiotoxin Cardiotoxin 2 2 for 4 d :mdx:mdx (50 μL 1.2%) Adult; adult mdx; 1 w old; 3 d in culture CreER/+ QSCs Reporter Reporter Reporter FACS: Pax7 ; FACS: CD45− Reporter FACS: FACS: integrin- CD34+/integrin- eYFP/+ Purif. R26 /SM/C-2.6+ syndecan-3 alpha7+/Lin− alpha7+/Lin- /CD31−/CD45− /CD11b−/Sca1- Timing Quiescent and E17.5(“Q”), E14.5(“A”) 36 h (1.5 d), 60 h 4 d in culture 3 d in culture 12 h or 48 h 72 h (3 d) 72 h (3 d) 3 d postinjury (2.5 d), 84 h (3.5 d) postinjury postinjury Platform Affymetrix Mouse Affymetrix Affymetrix Mouse Affymetrix Mouse Affymetrix 430A Affymetrix 430_2.0 Affymetrix Agilent 028005 Illumina MouseRef- Gene 1.0ST 430_2.0 Gene 1.0ST, Affymetrix Gene 1.0ST 430_2.0 SurePrint G3Mouse 8 v2.0 Mouse Gene 2.0ST 8x60k Microarray h hours, d days, w weeks, m months Pietrosemoli et al. Skeletal Muscle (2017) 7:28 Page 4 of 15 For isolation of activated satellite cells, TA muscles (day 3 Western blot analysis postinjury (D3) and non-injured) were collected and sub- Total protein extracts from satellite cells isolated by FACS jected to 4–5 rounds of digestion in a solution of 0.08% col- were run on a 4–12% Bis-Tris Gel NuPAGE (Invitrogen) lagenase D (Roche) and 0.1% trypsin (Gibco #31966) diluted and transferred on Amersham Hybond-P transfer mem- in DMEM-1% P/S (Invitrogen) supplemented with DNAse I brane (Ge Healthcare). The membrane was then blocked Hi Lo at 10 μg/ml (Roche, 11284932001) [2, 3]. Pax7 and Pax7 with 5% non-fat dry milk in TBS; probed with anti-JunD cells correspond to the 10% of cells with the highest and the (329) (1:1000, sc-74 Santa Cruz Biotechnology Inc.), anti- lowest expression of nGFP, respectively, as defined previously JunB (N-17) (1:1000, sc-46 Santa Cruz Biotechnology [3]. Inc.), or anti-c-Jun (H-79) (1:1000, sc-1694 Santa Cruz Skeletal muscle progenitors were obtained also from Biotechnology Inc.) overnight; washed and incubated with CreCAP/ the forelimbs of E14.5 and E17.5 fetuses of Myf5 HRP-conjugated donkey anti-rabbit IgG secondary anti- + stop-NICD-nGFP/+ :R26R [16] compound mice. Tissues were body (1:3000); and detected by chemiluminescence (Pierce dissociated in DMEM, 0.1% collagenase D (Roche, ECL2 western blotting substrate, Thermo Scientific) using 1088866), 0.25% trypsin (GIBCO, 15090-046), DNaseI the Typhoon imaging system. After extensive washing, the 10 μg/ml for three consecutive cycles of 15 min at 37 °C membrane was incubated with anti-Histone H3 antibody in a water bath under gentle agitation. For each round, a (ab1691, 1:10,000; abcam) as a loading control. All West- supernatant containing dissociated cells was filtered ern blots were run in triplicate, and bands were quanti- through 70-μm cell strainer, and trypsin was inhibited tated in one representative gel. Quantification was done with foetal calf serum (FCS). Pooled supernatants from using ImageJ software. each round of digestion were centrifuged at 1600 rpm for 15 min at 4 °C, and pellet was re-suspended in cold Isolation of fixed mouse muscle stem cells and real-time DMEM/1% PS/2%FCS and filtered through 40-μmcell PCR strainer. For empirical analysis of genes by RT-qPCR (e.g., Jun In other experiments, skeletal muscles from the limbs, and Fos), skeletal muscles were fixed immediately in body wall, and diaphragm were collected from pups at 0.5% for 1 h in paraformaldehyde (PFA) using a protocol postnatal day 8 (P8, mitotically active satellite cells) and based on the notion that transcripts are stabilized by 4–5 weeks old mice (quiescent satellite cells) of PFA fixation [18, 19]. Briefly, PFA fixed and unfixed nGFP/+ Pax7 knock-in line [17]. Cells were isolated by skeletal muscles were minced as described [4]; fixed FACS based on NICD-GFP or Pax7-nGFP intensity, samples were incubated with collagenase at double the using BD FACS ARIA III and MoFlo Astrios sorters. normal concentration, and mRNA was isolated following FACS based on size, granulosity, and GFP levels using a FACS Aria II (BD Bioscience). Total RNA was extracted Microarray sample preparation from fixed cells with RecoverAll™ (Total Nucleic Acid Total mRNAs were isolated using Qiagen RNAeasy® Mi- Isolation Kit Ambion, Thermo Fisher), according to cro Kit according to the manufacturer’s recommenda- manufacturer instructions. cDNA was prepared by tions; 5 ng of total RNA was reverse transcribed and random-primed reverse transcription (Super-Script II, amplified following the manufacturer’s protocols Invitrogen, 18,064–014), and real-time PCR was done (Ovation Pico WTA System v2 (Nugen Technologies, using SYBR Green Universal Mix (Roche, 13608700) Inc. 3302-12); Applause WTA Amp-Plus System (Nugen StepOne-Plus, Perkin-Elmer (Applied Biosystems). Spe- Technologies, Inc. 5510-24)), fragmented and biotin la- cific primers for each gene were designed, using the Pri- beled using the Encore Biotin Module (Nugen Tech- mer3Plus online software, to work under the same nologies, Inc. 4200-12). Gene expression was determined cycling conditions. For each reaction, standard curves by hybridization of the labeled template to GeneChip for reference genes were constructed based on six four- microarrays Mouse Gene 1.0 ST (Affymetrix). fold serial dilutions of cDNA. All samples were run in Hybridization cocktail and posthybridization processing triplicate. The relative amounts of gene expression were were performed according to the “Target Preparation for calculated with RLP13 expression as an internal standard Affymetrix GeneChip Eukaryotic Array Analysis” proto- (calibrator). RT-qPCR primers used appear in Add- col found in the appendix of the Nugen protocol of the itional file 6: Table S2. fragmentation kit. Arrays were hybridized for 18 h and washed using fluidics protocol FS450 0007 on a Gene- Normalization, quality control, and filtering of GEPs Chip Fluidic Station 450 (Affymetrix) and scanned with Gene expression profiles (GEPs) were processed using an Affymetrix GeneChip Scanner 3000, generating CEL standard quality control tools to obtain normalized, files for each array. Three biological replicates were run probeset-level expression data. For all raw datasets de- for each condition. rived from affymetrix chips, Robust Multi-Array Average Pietrosemoli et al. Skeletal Muscle (2017) 7:28 Page 5 of 15 expression measure (rma) was used as normalization Gene set enrichment analysis on individual sets method using the affy and the oligo R packages [20, 21]. Each dataset was tested for gene set enrichment All analyses were preferentially conducted at the probe- independently, using the CAMERA competitive test im- set level. Probesets were annotated to gene symbol and plemented in the Limma R package [27] and three gene gene ENTREZ using chip-specific annotations. For gene set collections from the mouse version of the Molecular level results, the probeset with the highest expression Signatures Database MSigDB v6 [28, 29]: (1) Hallmark variability was selected to represent the corresponding gene sets (H), which summarize and represent specific gene. Quality controls were performed on raw data using well-defined biological states or processes displaying a relative log expression (RLE) and Normalized Unscaled coordinate gene expression; (2) Kyoto Encyclopedia of Standard Errors (NUSE) plots from the affyPLM R pack- Genes and Genomes (KEGG) canonical pathways (C2 age [22]. Sample distribution was examined using hier- CP:KEGG), derived from the Kyoto Encyclopedia of Genes archical clustering of the Euclidean distance and and Genomes [30]; and (3) Reactome canonical pathways principal component analysis from the stats [23] and (C2 CP:Reactome) from the curated and peer reviewed FactoMineR R packages [24] (see Additional file 1: Figure pathway database [31] (gene set analysis in Figs. 1 and 2). S1 for the resulting plots for dataset Quiescent [high/ low]/D3Activated [high/low]). The resulting plots of the remaining datasets are not shown, but they presented Multiple set analysis: determination of the quiescent similar trends, which can be explored through the inter- signature active web server Sherpa. The combinatorial landscape of datasets was explored using the SuperExactTest [32] and the UpSetR [33] R Differential expression analysis packages to test and visualize the intersection of the Each dataset was individually analyzed to identify genes datasets. Additionally, the Jaccard index [34] of similarity showing significant differential expression (DEGs) was calculated to assess the extent of similarity between between the ASC and the QSC (gene level analysis in statistically differentially expressed genes (DEGs) of each Fig. 1; differential analysis in Fig. 2). This analysis was pair of datasets. A significance ranking, based on several performed using the linear model method implemented criteria, was calculated for each individual dataset to in the Limma R package [25]. The basic statistic was the determine its presence or absence in the final dataset moderated t-statistic with a Benjamini and Hochberg’s ensemble, which was used for determining the gene multiple testing correction to control the false discovery signature. Once the dataset ensemble was defined, the rate (FDR) [26]. overlapping differentially up- and down-regulated genes Fig. 1 General framework of the analysis: an individual dataset analysis followed by a multi-set analysis. The individual dataset analysis consisted of (i) the analysis of gene expression profiles (GEPs) of each dataset, including normalization, filtering and quality control check of each raw dataset, and the differential analysis to identify dataset-specific differentially expressed genes (DEGs), (ii) the Gene Set Enrichment Analysis (GSEA) performed in the gene set space. The GSEA consisted in identifying enriched pathways from three gene sets of the MSigDB collection [29] (Hallmark gene sets, CP: KEGG gene sets and CP: Reactome gene sets); (iii) the multi-set analysis to assemble a study-independent gene signature, i.e., a list of genes specific to the quiescence state Pietrosemoli et al. Skeletal Muscle (2017) 7:28 Page 6 of 15 purpose, commonly up-regulated or down-regulated genes were used in a one-sided Fisher’s exact test imple- mented in R script with a Benjamini and Hochberg’s multiple testing correction of the P value to determine the enriched gene sets and the direction of such enrichment. Web application: Sherpa We developed an interactive web application for the exploration, analysis, and visualization of the individual datasets and their combination (http://sherpa.pasteur.fr). This application allows the user to effectively and efficiently analyze the individual datasets one by one (in- dividual dataset analysis) or as an ensemble of datasets (multi-set analysis) and was developed with the Shiny R package [36]. Results This study involves an individual dataset analysis followed by a multi-set analysis (Fig. 1). First, each raw dataset was normalized, filtered, and subjected to the same quality controls and checks. Gene-level differential analysis and gene set enrichment analysis were then performed (Fig. 2). Finally, a multi-set analysis assembled a platform- independent list of genes specific to the quiescent state. Fig. 2 Workflow of the standardized individual dataset analysis. The When analyzing multiple microarray GEPs, however, sev- analysis of the nine datasets was performed in a consistent manner eral issues needed to be addressed regarding the experi- for each dataset using ad hoc R scripts. It included the first step of mental setup, the microarray platforms and the laboratory data preparation followed by a second step of data analysis. GEPs were processed using standard quality control tools to obtain conditions [37]. First, the individual studies, even if re- normalized, probeset-level expression data. For raw datasets derived from lated, had different aims, experimental designs, and cell affymetrix chips, Robust Multi-Array Average expression measure (rma) populations of interests (e.g., developmental stage and was used as normalization method. All analyses were conducted at gender of mice). Second, the different microarray plat- probeset level. Probesets were annotated to gene symbol and gene forms contained different probes and probesets with spe- ENTREZ using chip-specific annotations. Quality controls were performed on raw data using RLE and NUSE plots. The distribution of the QSC and cific locations and alternative splicing that might produce ASC samples according to their GEPs was explored using hierarchical different expression results [38]. Finally, sample prepar- clustering of the Euclidean distance and principal component analysis ation, protocols, and dates of extractions might have influ- (Additional file 1: Figure S1). Statistically, differentially expressed genes enced array hybridization and introduced bias [39]. This (DEGs) were identified between the ASC and the QSC groups using the experimental heterogeneity required critical data process- linear model implemented by the Limma R package [10]. Gene set enrichment analysis was based on three gene set collections from the ing to ensure statistically meaningful assumptions to drive mouse version of the Molecular Signatures Database MSigDB biological interpretation and compile gene signatures. For v6.0 [12, 13]: (1) Hallmark, which summarizes and represents specific this, we used a standardized workflow to reduce the tech- well-defined biological states or processes displaying a coordinate nical variations between datasets. Specifically, this work- gene expression; (2) KEGG canonical pathways, derived from the Kyoto flow applied (i) the same normalization method for the Encyclopedia of Genes and Genomes [14]; and (3) Reactome canonical pathways from the curated and peer-reviewed pathway database [15]. experiments having the same microarray chips, (ii) the To test for the enrichment of these gene sets, the competitive gene same quality control criteria to discard poor-quality sam- set test CAMERA [16] was used ples, (iii) the same aggregation method for summarizing probesets into single genes, and (iv) the same filtering in (DEGs, as defined by the adjusted P value ≤ 0.05) were all datasets. The filtering of the datasets was based on the used to build the quiescent signature. same significance criteria which included a minimum number of differentially expressed genes, the presence of Gene set enrichment analysis on the quiescent signature genes known to be differentially expressed between quies- An over-representation analysis (ORA) [35] was applied cent and activated states from previous studies, and a to the quiescent signature using the previously described similarity measure among the datasets. Table 1 summa- gene collection (Hallmark, Kegg, Reactome). For this rizes the main biological and experimental variations in Pietrosemoli et al. Skeletal Muscle (2017) 7:28 Page 7 of 15 this study, as well as the technical differences present in was considered to be an outlier and was not included in the datasets. the final analysis. Three new sets of microarrays of quiescent versus The number of significantly up- and down-regulated activated satellite cell are reported here (see Table 1). The genes (DEGs) resulting from the differential expression first one is part of a developmental and postnatal series analysis of the quiescent with respect to the activated that was reported previously [16] (E12.5 vs. E17.5), and states were calculated (Additional file 5: Table S1). DEGs here, P8 (postnatal day 8, in vivo proliferating) and 4– were identified as having |logFC| ≥ 1 and a false discov- 5 week old (quiescent) mice were compared. The second ery rate FDR ≤ 0.05. The statistical analysis was per- one is based on previously reported differences in quies- formed at the probeset level, and only those probesets cent and proliferating cell states in subpopulations of satel- matching to genes are reported. On average, the datasets lite cells (quiescent: dormant, top 10% GFP+ cells vs. exhibited 1548 up-regulated genes with a standard primed, bottom 10% GFP+ cells isolated from Tg:Pax7- deviation of 1173 genes. The number of down-regulated nGFP mice; proliferating: 3 days postinjury [3]). The third genes corresponded to 2122, with a standard deviation dataset is based on previous observations that the Notch of 1658 genes. The lowest number of DEGs belonged to intracellular domain (NICD) when expressed constitu- the Fetal_NICD[E17.5/E14.5] dataset (39 up, 136 down), Cre stop-NICD tively (Myf5 : R26 ) in prenatal muscle progeni- while the highest number of DEGs belonged to the tors leads to cell-autonomous expansion of the myogenic GSE70376 dataset (4367 up, 6346 down). Additionally, progenitor population (Pax7+/Myod−) and the absence of an analysis of the distribution of the logFC across the differentiation, followed by premature quiescence at late datasets revealed that there were significant differences fetal stages (E17.5) [16]. Here, E17.5 (quiescent) and E14.5 among the ranges and shapes of such distributions for (proliferating) prenatal progenitors were compared. Ex- each dataset (Additional file 2: Figure S2). cept for our datasets Quiescent(adult)/Activated(P8) and Fetal_NICD[E17.5/E14.5], all the studies were conducted Gene set level analysis reveals common underlying on adult mice (male and female) with ages ranging from biological processes across the datasets 8 weeks to 6 months. Despitethe great differenceamong the number of DEGs While all datasets shared similar cell states (quiescent for the different sets, clear trends among the significantly (QSC) and activated (ASC) satellite cells), the experimen- enriched pathways were found (Fig. 3a). This heatmap tal procedures varied between studies. Activation of cells, shows each dataset as a column and each enriched gene set for instance, was achieved in different ways: (i) in vitro, by as a row. The gene set collection that was tested for enrich- culturing freshly isolated satellite cells for several days and ment corresponds to the Hallmark gene set collection from (ii) in vivo, by extracting ASCs from an injured muscle. MSigDB [40]. Enriched gene sets corresponding to over- Furthermore, for in vivo activation, several techniques expressed genes are shown in red, while enriched gene sets were used to induce the injuries—BaCl , or the snake that were generally abundant in under-expressed genes are venoms cardiotoxin or notexin. Cell extraction protocols shown in blue. Out of the 11 datasets, GSE38870 stood as also varied among the different studies: (i) using trans- an outlier for both over- and under-represented gene sets genic mice expressing a reporter gene that marks satellite compared to the rest. For the other ten datasets, most of cells (several alleles) or (ii) using a combination of anti- them showed an enrichment in the quiescent state for the bodies targeting surface cell antigens specific to satellite TNFA_SIGNALING_VIA_NFKB pathway (nine datasets), cells (several combinations, see Table 1). Finally, the nine while eight datasets were enriched in UV_RESPONSE_DN, datasets that were examined in this study date from 2007 IL6_JAK_STAT3_SIGNALING, APICAL_SURFACE, and to 2016. During this period, microarray technologies KRAS-SIGNALING_DN pathways. Similarly, the ten data- evolved, and the different chips available may introduce sets shared similar trends for under-expressed genes in the yet another source of variation among the compared pathways MYC_TARGETS_V1, E2F_TARGETS, G2M_ datasets. To carry out a statistically meaningful ana- CHECKPOINT, and OXYDATIVE_PHOSPORYLATION, lysis of these extensively heterogeneous datasets, crit- all of which are expected to be absent in the quiescent state. ical data processing was required to interpret gene In total, two subnetworks corresponding to 8 under- and signatures as described in the workflow (Fig. 1). 15 over-expressed enriched gene sets could be distin- guished (Fig. 3b). A network representation of the top 3 most commonly found enriched gene sets (nodes, thick- The number of differentially expressed genes varies outlined circles) is shown in Fig. 3b for the over-expressed significantly among different datasets (TNFA_SIGNALING_VIA_NFKB, UV_RESPONSE_DN, I A total of 32 samples from ASCs and 34 samples from L6_JAK_STAT3_SIGNALING) and under-expressed (MY QSCs from the nine datasets were analyzed. After the C_TARGETS_V1, E2F_TARGETS, G2M_CHECKPOINT) quality control, one sample from the GSE38870 dataset categories. The size of each node corresponds to the total Pietrosemoli et al. Skeletal Muscle (2017) 7:28 Page 8 of 15 ab Fig. 3 Enriched gene sets across individual datasets. Enriched (over-represented) gene sets with over-expressed genes are shown in red; enriched gene sets with under-expressed genes are shown in blue. a Gene set enrichment profiles using the Hallmark gene set collection from MSigDB [40], each row corresponds to a gene set, and each column corresponds to a dataset. b Network representation of three most common over- and under-expressed gene sets (denoted by the thick border on the node) along with the gene sets sharing genes with them (connector lines). Nodes represent gene sets with a node circle size proportional to the number of times the gene sets appear as enriched in the different datasets (see panel a). Thickness of the connecting lines is proportional to the number of shared genes between nodes number of times that the gene set was enriched in all the differentially expressed within each dataset, in order to datasets, and the thickness of the interconnecting lines is identify genes commonly up- or down-regulated in the proportional to the number of genes shared between quiescent state. Although the aforementioned technical connected nodes. Gene sets sharing less than 10% of their and experimental heterogeneity could introduce noise in genes are not shown. We noted also that different gene sets this analysis, such variation was distinguishable from the had a varying number of genes in common (Fig. 3b); if the more stable, underlying common quiescent signature. gene overlap were large, those gene sets (and their Given that the distribution and ranges of the logFCs corresponding biological functions) will likely be also varied so drastically between datasets (Additional file 2: affected (i.e., activated or repressed). For the three most Figure S2), a single FC (fold change) threshold could not common enriched gene sets with under-expressed be chosen to be used for all datasets. Thus, for the genes, for example, we noted that gene set MYC_TAR- combinatorial analysis approach, we set out to maximize GETS_V1 shares most of its genes with gene sets the number of differentially expressed genes common to E2F_TARGETS and G2M_CHECKPOINT. This sug- all the datasets that were considered, where only the ad- gests that the three categories represented by these justed P value was used as a threshold to define DEGs. gene sets had an interplay of genes that displays them However, even in this low constrained scenario, combin- all as under-expressed. ing all the datasets together resulted in very few overlap- ping genes: 12 up (Arntl, Atf3, Atp1a2, Cdh13, Dnajb1, Determining a quiescent transcriptional signature among Enpp2, Ier2, Jun, Nfkbiz, Rgs4, Usp2, Zfp36) and 1 down all datasets (Igfbp2). Alternatively, when certain datasets were To determine a consensus quiescent signature from the excluded from the analysis, the number of DEGs in- datasets, we compared the genes found to be creased (Fig. 4a). Pietrosemoli et al. Skeletal Muscle (2017) 7:28 Page 9 of 15 ab Fig. 4 Different combinatorial landscapes result in different degrees of stringency for the list of genes defining the quiescent state of satellite cells. a Barplot indicating the number of overlapping differentially expressed genes (DEGs) for each best combination of intersections, from degree 2 to 11. The dots underneath the barplot indicate the datasets included in the intersections. The total number of up (UP) and down (DOWN) DEGs for each dataset are indicated in light gray and dark gray, respectively. Panels b and c are the colored matrices showing the Jaccard index between each pair of datasets, for UP DEGs and DOWN DEGs, respectively. Dendrograms show the hierarchical clustering using the Jaccard index as Euclidean distance Combinatorial assessment of datasets according to first two closest datasets belonged to studies originating significance and similarity criteria from the same laboratory underscores the impact of To find the best combination of datasets defining a technical biases. The hierarchical clustering of the consistent and sufficiently large quiescent signature, we Euclidean distance of the Jaccard indexes shows that for ranked them according to their significance. This signifi- up- and down-regulated genes, the datasets Fetal_- cance was determined according to an ensemble of cri- NICD[E17.5/E14.5], GSE38870, and GSE81096 had a teria. First, the dataset should have a minimum number tendency to not group with the rest of the datasets. In of DEGs. Our Fetal_NICD[E17.5/E14.5] dataset, for in- addition to these criteria, others can be used to assess stance, had only 250 DEGs (Additional file 5: Table S1), the significance of the datasets. Choosing the datasets and using it in the analysis resulted in a dramatically re- according to the activation or extraction method of the duced number of overlapping DEGs (Additional file 3: cells, for example, would result in a more stringent Figure S3). A second criterion was the presence of genes ensemble of datasets. known to be differentially expressed between quiescent Taking into account the dataset significance (based on the and activated states from previous studies. In this case, number of DEGs and presence of some reported quiescent datasets GSE38870 and GSE81096 did not meet this cri- markers) and the low extent of overlap between Fetal_- terion, since they lacked genes known to be associated NICD[E17.5/E14.5], GSE38870, and GSE81096 datasets with or regulating the quiescent state such as Calcr, with respect to the remaining datasets, these three datasets Notch1, Chrdl2, Lama3, Pax7, and Bmp6 genes (unpub- were excluded from the multi-dataset analyses. The final lished data, see Fig. 7; [41–43]). As a third criterion, we ensemble comprised the eight remaining datasets which used the dataset similarity, which was assessed using the had 207 and 542 genes commonly up- and down-regulated, Jaccard index (JI), and a matrix of the JIs for the up- and respectively (Fig. 4a). To further characterize these down-regulated genes was generated (Figs. 4b, c, re- commonly regulated genes, we performed an over- spectively). In both matrices, the closest pairs of datasets representation analysis (ORA) of the gene sets. An enrich- were GSE47177 at 60 h and GSE47177 at 84 h (JI = 0.46 ment was detected for the 207 commonly up-regulated and 0.44 for the up- and down-regulated genes, respect- genes in seven different Hallmark gene sets (Fig. 5a). Some ively), followed by the second pair of closest sets Quies- genes were shared among different pathways (e.g., Atf3 and cent [high]/D3Activated [high] and Quiescent [low]/ Il6 were found in six different gene sets), while others were D3Activated [low] (JI = 0.39 and 0.33, for up- and down- found in one gene set only (e.g., Tgfbr3, Spsb1). These re- regulated genes, respectively). The observation that the sults are consistent with the individual gene set enrichment Pietrosemoli et al. Skeletal Muscle (2017) 7:28 Page 10 of 15 a b Fig. 5 Gene expression of differentially expressed genes (DEGs) in satellite cells. a Binary heatmap of the over-representation analysis. Each col- umn represents one enriched (over-represented) gene set, and each row corresponds to a gene. Red cells indicate the presence of the corre- sponding gene in a given gene set. b Network representation of 39 GOSlim terms used to characterize the commonly regulated genes in satellite cells. Nodes represent gene sets with a node size proportional to the gene set size. Edges indicate that genes are shared among the gene sets. The thickness of the edge is proportional to the number of shared genes. Also shown are the heatmaps of logFC for genes belonging to extracellular matrix, nucleic acid binding and cell cycle and proliferation, nucleic acid binding, and signal transduction activity, respectively. Each row corresponds to a gene and each column corresponds to a dataset. Dendrograms show hierarchical clustering using the Euclidean distance analysis (see Fig. 3) emphasizing that these genes reflect the To assign a global function to the commonly regulated global traits associated with the quiescent state. Note that genes, we annotated them using GOSlim terms, which only a fraction of the 207 genes was found in known exist- summarize broad terms based on Gene Ontology (GO) ing gene sets (57/207), leaving about three-quarters of the terms [44]. To identify categories of genes, we generated commonly up-regulated genes not associated with any heatmaps of the logFC in the different datasets for a sub- existing gene set. This finding was expected given that a set of the 207 UP genes belonging to the extracellular quiescent signature is yet to be defined, and thus current matrix, nucleic acid binding activity (+/− cell cycle prolif- gene sets lack such annotations. To facilitate the analysis of eration), and signal transduction activity (Fig. 5b). transcriptomes as described here, we have developed an on- Unexpectedly, some genes associated with cell cycle pro- line interactive tool called Sherpa (Fig. 6). Sherpa allows liferation, such as c-Fos and c-Jun, were up-regulated in users to perform analyses on individual and on multiple the quiescent cell analyses in all datasets (Fig. 7a). To ver- datasets. Each individual dataset analysis involves the identi- ify the transcriptional relevance of these genes in fication of differentially expressed genes; comparison of the quiescent cells, we used a protocol to isolate satellite cells expression of selected genes in the quiescent and activated in which a short fixation (PFA) treatment was performed states through tables, heatmaps, and volcano plots; and ex- prior to harvesting the cells to arrest de novo transcription ploration of the distribution of the samples according to during the isolation protocol (see the “Methods” section). their variability through principal component analysis and Then, the expression level quantification was assessed at cluster analysis. The multiple dataset analysis allows the the transcript (RT-qPCR) and protein (Western blot) level comparison of selected datasets according to the commonly at different time points after isolation for a number of differentially expressed genes. All of these analyses are inter- genes (Fig. 7b, c). Notably, quantifications of c-Jun, Jun B, active, as they allow the user to select the thresholds of fold and Jun D levels showed that at time 0 (+PFA), these change (logFC) and false discovery rate (adj. P value). genes were not detected in quiescent cells, neither at the Pietrosemoli et al. Skeletal Muscle (2017) 7:28 Page 11 of 15 Fig. 6 Snapshot of the interactive web application for transcriptomic data exploration and comparison. Sherpa (http://sherpa.pasteur.fr) allows users to perform individual dataset and multiple dataset analysis. In the individual dataset analysis (shown), the user chooses the dataset for which the analysis is to be performed. Then, it is possible to identify differentially expressed genes (e.g., volcano plot), compare the expression of selected genes in the quiescent and activated state (e.g., heatmap, as shown in the figure), and the distribution of the samples according to their variability (principal component analysis). All these analyses are interactive, as they allow the user to set the thresholds of fold change (logFC) and false discovery rate (adj. P value) mRNA (right panel) nor at the protein (left panel) level comparisons across divergent datasets that are heteroge- (Fig. 7c). However, these genes were up-regulated using neous in the platform and biological condition. Notably, conventional satellite cell isolation protocols that take sev- this pipeline allowed the examination of 11 datasets, eral hours. As a control, PFA treatment after cell isolation including three novel transcriptomes from our work, as had no effect on this expression pattern (Additional file 4: well as the identification of a variety of functional gene Figure S4). This rapid up-regulation was then followed by sets that appear in common with the majority of the a decline in expression levels of these genes (Fig. 7b, c), datasets. To perform this analysis, it was necessary to suggesting that this is the result of a stress response that is standardize every step of the analysis to attenuate the associated with the isolation procedure. impact of heterogeneity inherent in all of the datasets due to experimental, biological, and technical variations. Discussion These varying conditions led us to perform a combina- The transcriptome analysis and pipeline, as well as the torial assessment of the individual datasets according to Sherpa interface that we describe here, allow multi-scale their significance and similarity criteria. Pietrosemoli et al. Skeletal Muscle (2017) 7:28 Page 12 of 15 Fig. 7 Direct comparison of fixed and unfixed satellite cells identify immediate response genes not present the in vivo state. a Boxplots of four examples of genes found commonly upregulated in QSCs in the different datasets showing the distribution of intensities values in QSCs and ASCs. Colored dots indicate each dataset. Shape of the dot indicates whether the gene is significantly differentially expressed or not. b Fold change of mRNA (log10) between 0 h + PFA and 5 h + PFA. Blue bars indicate a higher expression in 0 h + PFA condition; the red bars indicate a higher expression in 5 h + PFA condition. Color intensities are proportional to the fold change. c c-Jun, Jun B, and Jun D protein levels from satellite cells at 0, 5, 10, 15 h after isolation (with and without PFA treatment) were measured by Western blotting, and band intensities were quantified by densitometric analysis with the ImageLab software (right). Basal levels of c-Jun, Jun B, and Jun D mRNA from satellite cells at 0, 5, 10, 15 h after isolation (with and without PFA treatment) were measured by real-time PCR (left) Pietrosemoli et al. Skeletal Muscle (2017) 7:28 Page 13 of 15 Variations in datasets are not unique to the study of genes [56]. Using a novel isolation protocol based on the muscle stem cells. Indeed, the last decades have notion that tissues that are fixed prior to processing result witnessed many efforts to analyze microarray data to in stabilized mRNA [18, 19], we validated the expression provide relevant gene signatures. In cancer biology, for of several genes including Calcr and Teneurin4 (Tenm4) example, gene markers were sought either for prognosis, as true quiescent markers. In contrast, we show that Fos i.e., lists of genes able to predict clinical outcome [45] or and Jun transcripts and Jun family proteins are not present for molecular subtyping, i.e., list of genes able to classify at significant levels in vivo, but are robustly induced different subtypes of a disease [46, 47]. However, even if within 5 h, the average processing time taken for isolation markers performed well, gene signatures derived from by FACS of satellite cells. These results are concordant studies on the same treatments and diseases often with a recently published paper in which immediate early resulted in gene lists with little overlap [48]. In other and heat-shock genes were rapidly up-regulated during cases, the signatures proved to be unstable, having other the cell isolation procedure [57]. We propose that these gene lists on the same dataset with the same predictive and other stress response genes mitigate the quiescent to power [49]. These observations suggest that such sig- activation transition that accompanies the initial steps of natures may include causally related genes, i.e., down- exit from G0. stream of the phenotype-causing genes, and that Given these unexpected findings, the comparison of these gene lists may share the same biological path- transcriptomes of satellite cells from a fixed/in vivo state ways [50]. with those that were described here would be important Gene Set Enrichment Analysis (GSEA) has become an to delineate homeostatic vs. immediate early response efficient complementary approach for analyzing omic genes. For that purpose, Sherpa allows the integration of data in general and GEPs in particular [50–52]. It shifts datasets from fixed samples, or other methodologies, the expression analysis from a gene space to a gene set when they will be available. Beyond the present findings, space, where genes are organized into gene sets we propose that all transcriptome data obtained from according to a common feature, such as a functional cells isolated from solid tissues, which require extensive annotation (e.g., a Gene Ontology term) or a specific enzymatic digestion and processing before isolation of metabolic pathway (e.g., a KEGG pathway). In this way, RNA, need to be re-evaluated to distinguish those genes it incorporates previously existing biological knowledge that are induced by the isolation procedure. to drive and increase interpretation, while offering In addition to generating this open access compen- greater robustness and sensitivity than gene level strat- dium of GEPs, we provide a standardized pipeline that egies [50, 53, 54]. sets the basis for a multi-set analysis for an effective and In spite of the heterogeneity in datasets examining qui- systematic comparison of individual datasets. Analyzing escent muscle satellite cells, we were able to identify genes multiple datasets provides generalized information that were consistently up- and down-regulated among the across different studies [38, 39]. The cancer field was a different datasets (Additional file 5: Table S1). The final pioneer in combining several works [58, 59] and other multi-set analysis comprised eight datasets which had 207 fields, such as neurodegenerative diseases [60, 61] and and 542 genes that were commonly up- and down- regulatory genomics have successfully adopted this strat- regulated, respectively. Moreover, the gene set enrichment egy [62]. The multi-dimensional approach presented analysis of the individual datasets showed striking similar- here offers increased power, due to the higher sample ities on the over- and under-represented gene sets. These size and increased robustness, by highlighting variations gene sets, which summarize and represent well-defined in individual studies results [37, 63]. Such variations biological states and processes in the cells, were shared are the consequence of the high level of noise and ar- among the different datasets. They include an over- tefacts and are typically associated with microarray representation of genes in the TNFα pathway via NFKβ data [64]. signaling, Il6-Jak-Stat3 signaling, and the apical surface processes, and an under-representation of MYC and E2F Conclusions targets, and genes associated with the G2 M checkpoint Here, we compile the first comprehensive catalog of gene and oxidative phosphorylation. Some markers such as expression data of myogenic cells across distinct states Calcitonin receptor (Calcr), Teneurin4 (Tenm4), and stress and conditions, providing a global perspective on quies- pathways identified previously were also present in our cence. An extensive comparison of the transcriptomic analysis [11, 41, 55] (Additional file 6: Table S2). However, profiles of mouse skeletal muscle satellite cells in quies- we also report that virtually all datasets contained genes cent and activated states resulting from nine datasets re- that would be expected to be present during activation or vealed common features among the different studies from cell cycle entry, such as members of the Fos and Jun family other features which are more specific to the individual previously identified as immediate early stress response datasets. In spite of heterogeneities across platforms, we Pietrosemoli et al. Skeletal Muscle (2017) 7:28 Page 14 of 15 were able to identify genes that were consistently up- and Funding ST was funded by Institut Pasteur, Centre National pour la Recherche down-regulated among the different datasets. By doing so, Scientific, the Agence Nationale de la Recherche (Laboratoire d’Excellence we developed and made available an open-access inter- Revive, Investissement d’Avenir; ANR-10-LABX- 73), and the European active exploratory tool called Sherpa (SHiny ExploRation Research Council (advanced research grant 332893). tool for transcriPtomic Analysis) that allows statistically Availability of data and materials valid analyses and systematic comparisons that cannot be The generated transcriptome datasets are available from the corresponding performed directly on the datasets. Finally, by obtaining author on reasonable request. Public datasets are available at https:// www.ncbi.nlm.nih.gov/geo/ under their corresponding identification mRNA directly from fixed muscle tissue for empirical test- number. ing of genes present during quiescence in vivo, we identi- fied immediately early expressed stress response genes Authors’ contributions that were present in all datasets due to the isolation and NP, SM, and ST analyzed and interpreted the data regarding the gene expression profiles. SY, MBB, HS, and RS performed the experiments to processing protocols used previously for solid tissues. generate transcriptome data. FP contributed to the data analysis. DDG and FP performed the validation experiments of candidate genes. All authors read and approved the final manuscript. Additional files Ethics approval and consent to participate Not applicable. Additional file 1: Figure S1. Quality controls and data sample distribution for Quiescent [high/low]/D3Activated [high/low] dataset. a Consent for publication Relative log expression (RLE) and b normalized unscaled standard errors Not applicable. (NUSE) plots for the D3P7 dataset show that as expected for good quality data, RLE median values are centered around 0.0, while the median standard error should be 1 for most genes in the NUSE plots. A sample Competing interests distribution is distributed according to status (D3H: activated, high; D3L: The authors declare that they have no competing interests. activated, low; QH: quiescent, high; QL: quiescent, low) using principal component analysis (c) and hierarchical clustering of the Euclidean Publisher’sNote distance (d). (PDF 103 kb) Springer Nature remains neutral with regard to jurisdictional claims in Additional file 2: Figure S2. Violin plots of the logFC distribution for published maps and institutional affiliations. each individual dataset. Density plots of the logFC (|logFC| < 1 in red; |logFC| > 1 in blue. (PDF 156 kb) Author details Additional file 3: Figure S3. Effect of adding NICD[E17.5/E14.5] dataset Bioinformatics and Biostatistics Hub, C3BI, USR 3756 IP CNRS, Institut on the best combinations of datasets. Impact of including or excluding Pasteur, 75015 Paris, France. Stem Cells and Development, Department of NICD dataset on overall analysis. (PDF 395 kb) Developmental and Stem Cell Biology, Institut Pasteur, 75015 Paris, France. 3 4 CNRS UMR 3738, Institut Pasteur, 75015 Paris, France. Institute for Stem Cell Additional file 4: Figure S4. Effect of PFA treatment at different time Biology and Regenerative Medicine, GKVK PO, Bellary Road, Bengaluru points in the experimental procedure. Control experiments showing no 560065, India. Dipartimento di Medicina Clinica e Chirurgia, Università degli effect of PFA on gene expression measurements. (PDF 445 kb) Studi di Napoli Federico II, Via S. Pansini 5, 80131 Naples, Italy. Novo Nordisk Additional file 5: Table S1. Identified differentially expressed genes in Foundation Center for Stem Cell Biology, DanStem, University of the QSCs condition for the nine datasets. Differentially expressed genes Copenhagen, 3B Blegdamsvej, DK-2200 Copenhagen N, Denmark. in the QSCs condition for the nine datasets using logFC = 1 and FDR = 0.05. (XLSX 48 kb) Received: 3 August 2017 Accepted: 29 November 2017 Additional file 6: Table S2. Primers used for validation of gene expression by RT-qPCR. Primers used for RT-qPCR studies in Fig. 7. (PDF 14 kb) References 1. Li L, Clevers H. Coexistence of quiescent and active adult stem cells in mammals. Science. 2010;327:542–5. Abbreviations 2. Sambasivan R, Gayraud-Morel B, Dumas G, et al. Distinct regulatory cascades ASCs: Activated satellite cells; DEGs: Statistically differentially expressed genes; govern extraocular and pharyngeal arch muscle progenitor cell fates. Dev FACS: Fluorescence-activated cell sorting; FC: Fold change; FDR: False Cell. 2009;16:810–21. discovery rate; FSC: Functional scoring method; GEO: Gene Expression 3. Rocheteau P, Gayraud-Morel B, Siegl-Cachedenier I, et al. A subpopulation Omnibus; GEPs: Gene expression profiles; GFP: Green fluorescent protein; of adult skeletal muscle stem cells retains all template DNA strands after cell GO: Gene Ontology; GSEA: Gene set enrichment analysis; KEGG: Kyoto division. Cell. 2012;148:112–25. Encyclopedia of Genes and Genomes; logFC: Logarithm of the fold change; 4. Gayraud-Morel B, Pala F, Sakai H, et al. Isolation of muscle stem cells from NICD: Notch intracellular domain; NUSE: Normalized Unscaled Standard mouse skeletal muscle. Methods Mol Biol Clifton NJ. 2017;1556:23–39. Errors; ORA: Over-representation analysis; P8: Postnatal day 8; 5. Network TCGAR, Weinstein JN, Collisson EA, et al. The cancer genome atlas PFA: Paraformaldehyde; QSCs: Quiescent satellite cells; RLE: Relative Log pan-cancer analysis project. Nat Genet. 2013;45:1113–20. Expression; RMA: Robust Multi-Array Average expression measure; 6. Taccioli C, Maselli V, Tegnér J, et al. ParkDB: a Parkinson’s disease gene Sherpa: SHiny ExploRation tool for transcriPtomic Analysis; TA: Tibialis expression database. Database. 2011; https://doi.org/10.1093/database/ Anterior; TCGA: The Cancer Genome Atlas bar007. Epub ahead of print 1 Jan 2011 7. Hawrylycz MJ, Lein ES, Guillozet-Bongaarts AL, et al. An anatomically Acknowledgements comprehensive atlas of the adult human brain transcriptome. Nature. 2012; We would like to thank K. Soni and U. Borello for their assistance in the early 489:391–9. stages of this work and P. Mourikis and F. Relaix for communicating 8. Lein ES, Hawrylycz MJ, Ao N, et al. Genome-wide atlas of gene expression in unpublished results and sharing protocols. We also acknowledge the Flow the adult mouse brain. Nature. 2007;445:168–76. Cytometry Platform of the Technology Core-Center for Translational Science 9. Liu L, Cheung TH, Charville GW, et al. Chromatin modifications as (CRT) at Institut Pasteur for the support in conducting this study and the determinants of muscle stem cell quiescence and chronological aging. Cell microarray platform at Institut Cochin. Rep. 2013;4:189–204. Pietrosemoli et al. Skeletal Muscle (2017) 7:28 Page 15 of 15 10. Fukada S, Uezumi A, Ikemoto M, et al. Molecular signature of quiescent 39. Irizarry RA, Warren D, Spencer F, et al. Multiple-laboratory comparison of satellite cells in adult skeletal muscle. Stem Cells. 2007;25:2448–59. microarray platforms. Nat Methods. 2005;2:345–50. 11. Pallafacchina G, François S, Regnault B, et al. An adult tissue-specific stem 40. Liberzon A, Birger C, Thorvaldsdóttir H, et al. The molecular signatures cell in its niche: a gene profiling analysis of in vivo quiescent and activated database (MSigDB) hallmark gene set collection. Cell Syst. 2015;1:417–25. muscle satellite cells. Stem Cell Res. 2010;4:77–91. 41. Yamaguchi M, Watanabe Y, Ohtani T, et al. Calcitonin receptor signaling 12. Farina NH, Hausburg M, Betta ND, et al. A role for RNA post-transcriptional inhibits muscle stem cells from escaping the quiescent state and the niche. regulation in satellite cell activation. Skelet Muscle. 2012;2:21. Cell Rep. 2015;13:302–14. 42. Mourikis P, Sambasivan R, Castel D, et al. A critical requirement for notch 13. García-Prat L, Martínez-Vicente M, Perdiguero E, et al. Autophagy maintains signaling in maintenance of the quiescent skeletal muscle stem cell state. stemness by preventing senescence. Nature. 2016;529:37–42. Stem Cells. 2012;30(2):243–52. 14. Lukjanenko L, Jung MJ, Hegde N, et al. Loss of fibronectin from the aged 43. Günther S, Kim J, Kostin S, et al. Myf5-positive satellite cells contribute to stem cell niche affects the regenerative capacity of skeletal muscle in mice. Pax7-dependent long-term maintenance of adult muscle stem cells. Cell Nat Med. 2016;22:897–905. Stem Cell. 2013;13(5):590–601. 15. Barrett T, Wilhite SE, Ledoux P, et al. NCBI GEO: archive for functional 44. Ashburner M, Ball CA, Blake JA, et al. Gene ontology: tool for the unification genomics datasets—update. Nucleic Acids Res. 2013;41:D991–5. of biology. Nat Genet. 2000;25:25–9. 16. Mourikis P, Gopalakrishnan S, Sambasivan R, et al. Cell-autonomous Notch 45. Paquet ER, Cui J, Davidson D, et al. A 12-gene signature to distinguish colon activity maintains the temporal specification potential of skeletal muscle cancer patients with better clinical outcome following treatment with 5- stem cells. Dev Camb Engl. 2012;139:4536–48. fluorouracil or FOLFIRI. J Pathol Clin Res. 2015;1:160–72. 17. Sambasivan R, Comai G, Le Roux I, et al. Embryonic founders of adult 46. Wang J, Mi J-Q, Debernardi A, et al. A six gene expression signature defines muscle stem cells are primed by the determination gene Mrf4. Dev Biol. aggressive subtypes and predicts outcome in childhood and adult acute 2013;381:241–55. lymphoblastic leukemia. Oncotarget. 2015;6:16527–42. 18. Houzelstein D, Tajbakhsh S. Increased in situ hybridization sensitivity using 47. Three-gene model to robustly identify breast cancer molecular subtypes | non-radioactive probes after staining for β-galactosidase activity. Tech Tips JNCI: Journal of the National Cancer Institute | Oxford Academichttps:// Online. 1998;3:147–9. academic.oup.com/jnci/article-lookup/doi/10.1093/jnci/djr545 (accessed 5 19. Machado L, Esteves de Lima J, Fabre O, et al. In Situ Fixation Redefines June 2017). Quiescence and Early Activation of Skeletal Muscle Stem Cells. Cell Stem 48. Fan C, Oh DS, Wessels L, et al. Concordance among gene-expression-based Cell. 2017;21:1982–93. predictors for breast cancer. N Engl J Med. 2006;355:560–9. 20. Gautier L, Cope L, Bolstad BM, et al. Affy—analysis of Affymetrix GeneChip 49. Michiels S, Koscielny S, Hill C. Prediction of cancer outcome with data at the probe level. Bioinformatics. 2004;20:307–15. microarrays: a multiple random validation strategy. Lancet. 2005;365:488–92. 21. Carvalho BS, Irizarry RA. A framework for oligonucleotide microarray 50. Abraham G, Kowalczyk A, Loi S, et al. Prediction of breast cancer prognosis preprocessing. Bioinformatics. 2010;26:2363–7. using gene set statistics provides signature stability and biological context. 22. Bolstad BM. Low-level analysis of high-density oligonucleotide array data: BMC Bioinformatics. 2010;11:277. background, normalization and summarization. University of California; 2004. 51. Mootha VK, Lindgren CM, Eriksson K-F, et al. PGC-1alpha-responsive genes 23. R Core Team. R: a language and environment for statistical computing. involved in oxidative phosphorylation are coordinately downregulated in Vienna: R Foundation for Statistical Computinghttps://www.R-project.org/; human diabetes. Nat Genet. 2003;34:267–73. 52. Varn FS, Ung MH, Lou SK, et al. Integrative analysis of survival-associated 24. FactoMineR: An R Package for Multivariate Analysis | Lê | Journal of gene sets in breast cancer. BMC Med Genet. 8 https://doi.org/10.1186/ Statistical Softwarehttps://www.jstatsoft.org/article/view/v025i01 (accessed 1 s12920-015-0086-0. Epub ahead of print 12 Mar 2015 June 2017). 53. Goeman JJ, Bühlmann P. Analyzing gene expression data in terms of gene 25. Ritchie ME, Phipson B, Wu D, et al. Limma powers differential expression sets: methodological issues. Bioinformatics. 2007;23:980–7. analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 54. Luo W, Friedman MS, Shedden K, et al. GAGE: generally applicable gene set 2015;43:e47. enrichment for pathway analysis. BMC Bioinformatics. 2009;10:161. 26. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical 55. Ishii K, Suzuki N, Mabuchi Y, et al. Muscle satellite cell protein teneurin-4 and powerful approach to multiple testing. J R Stat Soc Ser B Methodol. regulates differentiation during muscle regeneration. Stem Cells. 2015;33: 1995;57:289–300. 3017–27. 27. Wu D, Smyth GK. Camera: a competitive gene set test accounting for inter- 56. Lamph WW, Wamsley P, Sassone-Corsi P, et al. Induction of proto-oncogene gene correlation. Nucleic Acids Res. 2012;40:e133. JUN/AP-1 by serum and TPA. Nature. 1988;334:629–31. 28. Subramanian A, Tamayo P, Mootha VK, et al. Gene set enrichment analysis: 57. van den Brink SC, Sage F, et al. Single-cell sequencing reveals dissociation- a knowledge-based approach for interpreting genome-wide expression induced gene expression in tissue subpopulations. Nat Methods. 2017;14: profiles. Proc Natl Acad Sci. 2005;102:15545–50. 29. WEHI Bioinformatics—mouse and human orthologs of the MSigDBhttp:// 58. Rhodes DR, Barrette TR, Rubin MA, et al. Meta-analysis of microarrays: bioinf.wehi.edu.au/software/MSigDB/ (accessed 20 Apr 2017). interstudy validation of gene expression profiles reveals pathway 30. KEGG: Kyoto Encyclopedia of Genes and Genomeshttp://www.genome.jp/ dysregulation in prostate cancer. Cancer Res. 2002;62:4427–33. kegg/ (accessed 20 Apr 2017). 59. Grützmann R, Boriss H, Ammerpohl O, et al. Meta-analysis of microarray data 31. Reactome Pathway Databasehttp://www.reactome.org/ (accessed 20 Apr on pancreatic cancer defines a set of commonly dysregulated genes. 2017). Oncogene. 2005;24:5079–88. 32. Wang M, Zhao Y, Zhang B. Efficient test and visualization of multi-set 60. Cruz-Monteagudo M, Borges F, Paz-y-Miño C, et al. Efficient and biologically intersections. Sci Rep. 2015;5:16923. relevant consensus strategy for Parkinson’s disease gene prioritization. BMC 33. Lex A, Gehlenborg N, Strobelt H, et al. UpSet: visualization of intersecting Med Genet. 9 https://doi.org/10.1186/s12920-016-0173-x. Epub ahead of sets. IEEE Trans Vis Comput Graph. 2014;20:1983–92. print 9 Mar 2016 34. Jaccard P. Étude comparative de la distribution florale dans une portion des 61. Oerton E, Bender A. Concordance analysis of microarray studies identifies Alpes et des Jura. Bull Société Vaudoise Sci Nat. 1901;37:547–79. representative gene expression changes in Parkinson’s disease: a 35. Khatri P, Sirota M, Butte AJ. Ten years of pathway analysis: current approaches comparison of 33 human and animal studies. BMC Neurol. 17 https://doi. and outstanding challenges. PLoS Comput Biol. 2012;8:e1002375. org/10.1186/s12883-017-0838-x. Epub ahead of print 23 Mar 2017 36. Winston Chang, Joe Cheng, JJ Allaire, Yihui Xie, Jonathan McPherson. shiny: 62. Consortium TEP. An integrated encyclopedia of DNA elements in the Web Application Framework for R https://CRAN.R-project.org/package=shiny human genome. Nature. 2012;489:57–74. (2017). 63. Russ J, Futschik ME. Comparison and consolidation of microarray datasets of 37. Campain A, Yang YH. Comparison study of microarray meta-analysis human tissue expression. BMC Genomics. 2010;11:305. methods. BMC Bioinformatics. 2010;11:408. 64. Kitchen RR, Sabine VS, Simen AA, et al. Relative impact of key sources of 38. Ramasamy A, Mondry A, Holmes CC, et al. Key issues in conducting a meta- systematic noise in Affymetrix and Illumina gene-expression microarray analysis of gene expression microarray datasets. PLoS Med. 5 https://doi. experiments. BMC Genomics. 2011;12:589. org/10.1371/journal.pmed.0050184. Epub ahead of print September 2008

Journal

Skeletal MuscleSpringer Journals

Published: Dec 22, 2017

References

You’re reading a free preview. Subscribe to read the entire article.


DeepDyve is your
personal research library

It’s your single place to instantly
discover and read the research
that matters to you.

Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.

All for just $49/month

Explore the DeepDyve Library

Search

Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly

Organize

Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.

Access

Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals.

Your journals are on DeepDyve

Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more.

All the latest content is available, no embargo periods.

See the journals in your area

DeepDyve

Freelancer

DeepDyve

Pro

Price

FREE

$49/month
$360/year

Save searches from
Google Scholar,
PubMed

Create lists to
organize your research

Export lists, citations

Read DeepDyve articles

Abstract access only

Unlimited access to over
18 million full-text articles

Print

20 pages / month

PDF Discount

20% off