Comparison of endogenous and overexpressed MyoD shows enhanced binding of physiologically bound sites

Background: Transcription factor overexpression is common in biological experiments and transcription factor amplification is associated with many cancers, yet few studies have directly compared the DNA-binding profiles of endogenous versus overexpressed transcription factors.
Methods: We analyzed MyoD ChIP-seq data from C2C12 mouse myotubes, primary mouse myotubes, and mouse fibroblasts differentiated into muscle cells by overexpression of MyoD and compared the genome-wide binding profiles and binding site characteristics of endogenous and overexpressed MyoD.
Results: Overexpressed MyoD bound to the same sites occupied by endogenous MyoD and possessed the same E-box sequence preference and co-factor site enrichments, and did not bind to new sites with distinct characteristics.
Conclusions: Our data demonstrate a robust fidelity of transcription factor binding sites over a range of expression levels and that increased amounts of transcription factor increase the binding at physiologically bound sites.
Keywords: Transcription factor, Overexpressed, MyoD, c-Myc, ChIP-seq

Background
The biological sciences have always relied on model systems to test specific hypotheses, and the general validity of each model system is constantly subject to vigorous debate. This is particularly true when transcription factors are overexpressed in cells, and the investigator(s) extrapolate their findings to the function of the endogenous factor. Although overexpression studies have yielded many significant advances in our understanding of cell biology, their validity is routinely challenged, particularly in manuscript and grant reviews. Intuitively this skepticism is justified. Biochemistry predicts that higher factor concentrations will drive non-physiological protein interactions or, in the case of transcription factors, DNA binding. Therefore, it is often asserted that overexpression of a transcription factor will not accurately reflect the function of that factor at physiologic levels of expression or, at a minimum, that there is no basis to assume that it does. As logical as this assertion may seem, there is very little experimental evidence to support or refute it, possibly because genome-wide factor binding and transcriptional activation have only recently been possible to assess. Yet, understanding the functional consequences of transcription factor overexpression is very important in cancer cell biology where gene amplifications, such as N-MYC amplification in neuroblastomas, promote tumor progression.

Previously we reported the genome-wide binding of endogenous MyoD in mouse muscle cells and compared that to exogenously expressed MyoD in mouse embryonic fibroblasts (MEFs) transduced with a MyoD expressing retrovirus [1]. The transduced MEFs had levels of MyoD protein very similar to the endogenous MyoD and showed a very similar binding profile. Here we compare the genome-wide binding of endogenous MyoD in mouse skeletal muscle cells with highly overexpressed MyoD in MEFs to determine whether overexpression qualitatively alters the binding profile. We find that the overexpressed MyoD binds to the same sites as endogenous MyoD and does not demonstrate binding to novel regions or motifs.

* Correspondence: stapscot@fhcrc.org
Equal contributors
Human Biology Division, Seattle, WA, USA
Clinical Research Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, Seattle, WA 98109, USA
Full list of author information is available at the end of the article
© 2013 Yao et al.; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Yao et al. Skeletal Muscle 2013, 3:8
http://www.skeletalmusclejournal.com/content/3/1/8

Our study shows that overexpression of MyoD accurately identifies sites bound by endogenous MyoD, suggesting an intrinsic biological robustness for varying levels of transcription factor in a cell.

Methods
ChIP-seq
ChIP was performed as previously described [1,2]. ChIP samples were prepared for sequencing per the Illumina Sample Preparation protocol with two modifications: (1) DNA fragments of 150–300 bp were selected at the gel-selection step; (2) 21 cycles of PCR were performed at the amplification step instead of 18. For the control samples, untransduced MEFs derived from Myod−/−/Myf5−/− mice were ChIPed with MYOD antibody, and mouse myotubes were ChIPed with preimmune serum.

MyoD lentivirus
cDNA for Myod was cloned into the GFP locus of the pRRL.SIN.cPPT.PGK-GFP.WPRE lentiviral backbone (Addgene), with expression thus driven by the Pgk promoter. Replication-incompetent lentiviral particles were packaged in 293T cells by the Fred Hutchinson Cancer Research Center Lentivirus Core Facility. MEFs were then transduced in DMEM containing polybrene at 8 μg/ml. After 24 h, media were replaced; cells were switched to differentiation media (1% heat inactivated horse serum, 10 μg/ml insulin, 10 μg/ml transferrin) 48 h after infection and harvested 36–40 h later. ChIP and Western blot were performed with a previously characterized MYOD antibody [3]. Western blot bands were quantified with ImageJ.

ChIP-seq peak calling and significance inference
Sequences were extracted using the GApipeline software. Reads mapping to the X and Y chromosomes were excluded from our analysis. Reads were aligned using MAQ to the mouse genome (mm9). Duplicate sequences were discarded to minimize effects of PCR amplification. Each read was extended in the sequencing orientation to a total of 200 bases to infer the coverage at each genomic position. Peak calling was performed by an in-house developed R package that models background reads by a negative binomial distribution conditioned on GC content as previously described [1,2,4]. The control ChIP-seq sample was used to eliminate statistically significant peaks likely due to artifact. Peaks found at similar locations in all three control ChIP samples (data not shown) remain of unknown etiology; these non-MyoD peaks were subtracted from the MyoD ChIP-seq data sets.

Motif analysis
We used a discriminative de novo motif discovery tool described previously [2] to find motifs that distinguish foreground and background sequence data sets. To find motifs enriched under ChIP-seq peaks, we selected background sequences using random genomic regions sampled with similar GC content and distance to TSS. We infer a position weight matrix (PWM) model from an output motif using an iterative expectation-maximization (EM) refinement process, which is similar to MEME [5].

ChIP-seq sample comparison
The scatter plot of the MyoD peak heights in the two samples (endogenous MyoD and lenti-MyoD) indicated a strong correlation (Pearson correlation 0.52 with asinh transformation). Nevertheless, due to the different origins of the samples, the overall variation between the two was still far greater than that between their respective technical replicates (data not shown). It was therefore challenging to apply an appropriate statistical null model to capture the stochastic variation between the two systems and to identify exactly the set of peaks that are identical or different between the samples. Instead, we chose a non-parametric approach, comparing the overlap of peaks at a spectrum of different rank cutoffs in order to outline the global landscape of peak similarity. Cross cell-type comparison was performed as previously described [4]. We ranked all peaks by their p-values and grouped the ranks into bins of 5,000 (i.e., the top 5K peaks, then the top 10K peaks, etc.). Then we computed the fraction of the top x peaks in one sample that overlap with the top y peaks in another sample, where x and y vary from 5K to 110K, and y is equal to or greater than x.

To compare the coverage at E-boxes in endogenous and lenti peaks, and to quantify the distribution of peak height ratios between the two samples, we adjusted for the different numbers of total reads by sub-sampling equal numbers of endogenous and lenti reads and recomputed the coverage and peak height at these sites.

Ethical approval
This study did not directly use vertebrate animals or human subjects and did not require ethical approval.

Results
Comparison of different MyoD expression levels in the conversion of fibroblasts to skeletal muscle
Mouse embryonic fibroblasts (MEFs) can be converted to skeletal muscle by the forced expression of MyoD. To determine whether overexpression of MyoD can reliably identify biologically relevant binding sites, we compared the binding profile of the endogenous MyoD in mouse muscle cells to the MyoD binding profile in MEFs with exogenously overexpressed MyoD. Transduction of MEFs with a lentivirus expressing MyoD from the Pgk promoter (lenti-MyoD) induced differentiation to skeletal muscle as determined by fusion and expression of myosin heavy chain (Figure 1A). Western analysis demonstrated that lenti-MyoD cells had approximately four-fold higher levels of MyoD protein than the endogenous MyoD of C2C12 myotubes (Figure 1B).
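The rank-cutoff overlap described under "ChIP-seq sample comparison" can be illustrated with a minimal sketch. This is not the authors' in-house R code: the function names (`overlap_fraction`, `overlap_matrix`), the `(chrom, start, end, p_value)` tuple layout, and the toy peaks are hypothetical, and the rule that two peaks overlap when their intervals share at least one base is an assumption, since the text does not state the overlap criterion.

```python
# Illustrative sketch only; names, data layout, and overlap rule are assumptions.

def top_peaks(peaks, n):
    """Return the n most significant peaks (smallest p-value first)."""
    return sorted(peaks, key=lambda p: p[3])[:n]


def build_index(peaks):
    """Group peak intervals by chromosome: {chrom: [(start, end), ...]}."""
    index = {}
    for chrom, start, end, _ in peaks:
        index.setdefault(chrom, []).append((start, end))
    return index


def overlaps_any(peak, index):
    """True if the peak shares at least one base with any indexed interval.

    Intervals are treated as half-open [start, end); a linear scan keeps the
    sketch short, an interval tree would be preferable for real peak sets.
    """
    chrom, start, end, _ = peak
    return any(s < end and start < e for s, e in index.get(chrom, []))


def overlap_fraction(sample1, sample2, x, y):
    """Fraction of the top-x peaks of sample1 that overlap the top-y peaks
    of sample2, with y >= x as in the Methods."""
    if y < x:
        raise ValueError("y must be equal to or greater than x")
    top1 = top_peaks(sample1, x)
    index2 = build_index(top_peaks(sample2, y))
    shared = sum(overlaps_any(p, index2) for p in top1)
    return shared / len(top1)


def overlap_matrix(sample1, sample2, cutoffs):
    """Overlap fraction for every pair of rank cutoffs (x, y) with y >= x."""
    return {
        (x, y): overlap_fraction(sample1, sample2, x, y)
        for x in cutoffs
        for y in cutoffs
        if y >= x
    }


# Toy peak sets: (chrom, start, end, p_value). Real cutoffs would run from
# 5,000 to 110,000 in steps of 5,000.
sample_a = [
    ("chr1", 100, 300, 1e-20),
    ("chr1", 1000, 1200, 1e-10),
    ("chr2", 50, 250, 1e-5),
    ("chr2", 900, 1100, 1e-3),
]
sample_b = [
    ("chr1", 150, 350, 1e-15),
    ("chr2", 100, 300, 1e-12),
    ("chr1", 1100, 1300, 1e-4),
    ("chr3", 10, 200, 1e-3),
]

if __name__ == "__main__":
    for key, value in sorted(overlap_matrix(sample_a, sample_b, [2, 4]).items()):
        print(key, value)
```

In the toy data the second peak of `sample_a` has no partner among the top 2 peaks of `sample_b` but gains one when the cutoff for `sample_b` is relaxed to 4. This is the pattern the rank-based comparison is designed to expose: a shared site that is simply ranked lower in the other sample.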
Figure 1 MEFs transduced with MyoD lentivirus. (A) MEFs transduced with MyoD lentivirus demonstrate nearly complete conversion into myotubes 72 h after infection (red: MYOD antibody; green: myosin heavy chain antibody; blue: DAPI). (B) Western blot demonstrates higher MyoD expression in MEFs transduced with MyoD lentivirus (2) compared to control MEFs (1), C2C12 myoblasts (3), and C2C12 myotubes (4). Tubulin blot demonstrates equivalent loading.

Overexpressed MyoD binds the same sites as endogenous MyoD
To determine whether overexpressed MyoD can accurately identify sites bound by endogenous MyoD, we compared a ChIP-seq data set obtained from MEFs transduced with lenti-MyoD [4] to our previous ChIP-seq data from endogenous MyoD in mouse myotubes [1]. The lenti-MyoD data set had 17.5 million mapped unique reads, and we combined endogenous MyoD ChIP-seq data from mouse C2C12 myotubes (6.5 million reads) and primary differentiated cultured mouse muscle cells (8.5 million reads) to achieve a comparable total of 15 million reads for endogenous MyoD, because our prior analysis [1] demonstrated a high concordance of peak locations between these two samples. These reads were processed and peaks were identified as described in Methods and Additional file 1: Figure S1A and B. The control ChIP-seq samples (~18 million pooled reads from pre-immune ChIP in muscle cells, MyoD antisera ChIP in MEFs that do not express MyoD, and beads alone) contained a small number of high peaks (Additional file 1: Figure S1B), which were found at similar locations in all control samples.

ChIP-seq for the endogenous MyoD identified ~37,000 peaks at a p-value (see Methods) of 10^-10 (read cutoff ~20), ~67,000 peaks at a p-value of 10^-5 (read cutoff ~11), and ~117,000 at a p-value of 10^-3 (read cutoff ~8). A similar range of peaks was identified by the overexpressed MyoD but at more stringent p-value thresholds: ~35,000 (p~10^-20, cutoff ~50), ~68,000 (p~10^-10, cutoff ~26), and ~122,000 (p~10^-5, cutoff ~14). At a given p-value, the overexpressed MyoD had approximately twice the number of peaks compared to the endogenous MyoD. This could either represent higher occupancy of the same sites bound by the endogenous MyoD or a large number of off-target sites bound by the overexpressed MyoD and not bound by the endogenous MyoD.

To accurately compare the similarity, or overlap, of MyoD binding sites in the different samples, we used a non-parametric approach that compared the overlap of peak locations based on the rank order of the peaks in each sample (see Methods). Comparing the top 35,000 peaks bound by endogenous MyoD and lenti-MyoD, there was a 67% overlap, and the overlap was similar comparing the top 70,000 or 110,000 peaks for each (Figure 2A). The lack of a complete overlap at each cutoff was largely due to the rank order of the peaks (based on p-value) rather than distinct binding regions, because 87% of the top 35,000 lenti-MyoD peaks were represented in the top 70,000 endogenous-MyoD peaks and 93% in the top 110,000 endogenous-MyoD peaks. Similarly, 85% and 91% of the top 35,000 endogenous-MyoD peaks were present in the top 70,000 and 110,000 lenti-MyoD peaks, respectively. A more detailed representation of these data is shown in Figure 2B. Therefore, although there were some differences in the rank order of the peaks, the locations were almost the same, with greater than 90% concordance.

Figure 2 The MyoD binding regions are largely shared between overexpressed MyoD in MEFs and endogenous MyoD. (A) Overlap of lenti-MyoD peaks with endogenous MyoD peaks. To assess the concordance between the two samples, we selected the top 5K to 110K peaks in each sample based on p-value. We calculated the number of overlapping peaks for the top peak sets at various rank cutoffs in both samples and divided this number by the size of the smaller peak set. Specifically, for a cell corresponding to the top x peaks in sample 1 compared to the top y peaks in sample 2, the fraction is computed as the number of overlapping peaks divided by the smaller value of x or y. (B) The overlapping fractions were calculated as in A and are plotted with color-coding as specified in the figure. For example, in the cell at the row labeled 5K and column labeled 10K, we plotted the fraction of the top 5K peaks in lenti-MyoD that overlap with the top 10K peaks in primary + C2C12 myotubes.

Overexpressed MyoD binds the same motifs as endogenous MyoD
Since MyoD binds as a heterodimer with an E-protein to an E-box containing a CANNTG core sequence, with preference for GC or CC as the internal nucleotides [1,4], we next determined whether overexpression of MyoD resulted in binding to a distinct set of low affinity sites or sites that might reflect homodimers or other protein complexes. We used two different approaches to determine the binding site preferences for endogenous and overexpressed MyoD.

First, we ranked all of the approximately 15 million E-boxes in the mouse genome based on their ChIP-seq coverage (see Methods for details of the statistical model) and then binned them by rank as the top 1,000, 1,001–10,000, 10,001–100,000, etc. For both endogenous and lenti-MyoD, the CAGCTG and CACCTG E-boxes were enriched for MyoD binding (Figure 3A) in similar proportions within each bin. Therefore, overexpressed MyoD binds a similar distribution of E-box sequences as the endogenous MyoD.

For the second approach to determine whether overexpression of MyoD resulted in binding to a different set of E-boxes, we used a motif discovery algorithm to identify preferred E-box sequences associated with the top 35,000 ranked peaks (ranked on p-value), the 35,001–70,000 peaks, and the 70,001–110,000 peaks for both endogenous MyoD and overexpressed MyoD (Figure 3B). The identified E-box motifs and flanking preferences were nearly identical for endogenous MyoD and overexpressed MyoD at different rankings, although the lower ranked peaks had a slightly more degenerate sequence compared to the higher ranked peaks in both groups. Plotting the average position weight matrix (PWM) score for the highest PWM E-box under each peak against the rank of the peaks demonstrates that the average PWM for the low ranked peaks (rank > 85,000) falls off more rapidly for the endogenous MyoD compared to the lenti-MyoD (Figure 3C). This likely reflects the difficulty of accurately discriminating weak MyoD binding sites from background reads for the endogenous MyoD, whereas the lenti-MyoD maintains a higher average E-box PWM score, suggesting that the overexpression of MyoD enhances the ability to discriminate weaker MyoD binding sites from background. Taken together, the data indicate that overexpressed MyoD in MEFs binds to a nearly identical set of sites and E-boxes as the endogenous MyoD in mouse muscle cells without a substantial number of off-target sites.

Figure 3 MyoD has a similar E-box preference for both endogenous and overexpressed MyoD. (A) Overexpressed MyoD (lenti) and endogenous MyoD (primary.tube) have similar E-box distributions. We collected all genomic E-boxes (excluding sex chromosomes and those present in the peaks of the control samples) and ranked them based on p-values for the read coverage at the E-boxes. We partitioned them into bins of top 1K, top 1001 to 10K, etc., until all E-boxes are included. Within each bin, we calculated the percentage of each type of E-box variant, and plotted the distribution. The background E-box distribution over the entire genome is also included as a reference. (B) MYOD binding sites for overexpressed MyoD (lenti) and endogenous MyoD (primary.tube) share the same sequence preference. We used motif discovery to identify the E-box motif under MyoD bound peaks for the top 35K, the top 35K+1 to 70K peaks, and the top 70K+1 to 110K peaks. The E-box sequence preferences are nearly identical, including within the flanking regions. Motifs in lower ranking peaks tend to have slightly more sequence degeneracy. (C) MyoD E-box average PWM score compared to peak rank. The MyoD PWM is derived from our previous study [4]. Weak peaks tend to have weaker motifs, but the degradation is more gradual for overexpressed MyoD peaks (lenti) beyond the top 67K, suggesting that a subset of noisy low peaks in the endogenous MyoD (primary.tube) is elevated to reasonably strong peaks distinguished from the background. X-axis: the peak rank bins. Y-axis: the average PWM scores for all peaks within the rank bin.

To determine whether overexpressed MyoD was bound to all of the E-boxes that match the consensus site, we graphed the number of reads over all the RRCAGSTG sites in the mappable genome (Figure 4A), sub-sampling the same number of reads for both the overexpressed and endogenous MyoD ChIP-seq data sets. Despite the overexpression in the lenti-MyoD samples, the majority of these high PWM E-boxes had one read or less, similar to the endogenous-MyoD samples, and for both samples fewer than 25% of these high PWM E-boxes had more than four reads. Therefore, for both the endogenous and overexpressed MyoD, a minor subset of high PWM E-boxes was occupied by MyoD.

Previously, we had measured E-box accessibility in fibroblasts prior to the expression of MyoD by exposing isolated nuclei to the restriction enzyme PvuII, which cleaves at CAGCTG E-boxes [4]. Using these data, we assigned each CAGCTG E-box to one of three groups: relatively inaccessible, moderately accessible, or highly accessible. About 40% of all highly accessible CAGCTG E-boxes had more than four reads in the endogenous MyoD samples, and slightly over 50% had four or more reads in the overexpressed MyoD samples; this compares to approximately 15% for both samples over the relatively inaccessible E-boxes (Figure 4B). However, only about one-third of the highly accessible group had read coverage above the average cutoff for the top ~70,000 peaks (11 reads for endogenous MyoD and 26 reads for overexpressed MyoD). Therefore, E-box accessibility in the chromatin was a major determinant of MyoD binding, but a large fraction of relatively highly accessible E-boxes with a strong PWM remained unbound by MyoD, indicating that only a subset of accessible E-boxes with a good PWM showed substantial MyoD binding even when MyoD was overexpressed. In this regard, our prior study [4] showed that several sequence motifs were enriched in the region of bound accessible sites (additional E-boxes, higher PWM E-boxes, and a motif similar to a MEIS binding site), indicating that several factors might operate to enhance or stabilize MyoD binding at subsets of accessible sites.

Figure 4 MyoD binds a subset of accessible E-boxes. (A) Coverage distribution over consensus MyoD binding sites RRCAGSTG. X-axis: log2 transformed coverage. Y-axis: the proportion of sites with coverage greater than the given value x. To make read coverage more comparable, we sub-sampled the same number of reads in each group. The distributions of coverage at RRCAGSTG sites in both samples are shown in solid lines. For comparison, E-boxes other than RRCAGSTG sites are shown in dashed lines. Only the E-boxes that are uniquely mapped within a ±200-bp window are included. (B) Coverage distribution over E-boxes similar to panel A but divided into E-boxes showing relatively low accessibility (0,1], moderate accessibility (1,2], or relatively high accessibility (2,Inf], as previously determined by PvuII accessibility [4].

Co-factor motifs and MyoD binding
Although there was a very high concordance of binding sites for endogenous MyoD in mouse muscle cells and overexpressed MyoD in MEFs, there was some difference in peak rank, as evidenced by a 67% overlap of the top 35,000 peaks in each set with most of the additional 33% of peaks present in the other set at lower rank. Since the E-box motifs were similar in both sets and did not apparently account for rank differences, we examined the top 30,000 peaks in each set for co-factor motifs using a de novo motif search strategy (see Methods). The peaks in the mouse muscle cells were enriched in E-box motifs (>7-fold) and had modest enrichment for several other motifs: MEIS (1.6-fold), RUNX (1.3-fold), and AP1 (2.2-fold). Similarly, the peaks in the MEFs with overexpressed MyoD were enriched for E-box motifs (>6-fold), MEIS (1.6-fold), and RUNX (1.4-fold) relative to background sequence (Additional file 1: Figure S2A). It remains possible that some of the rank differences reflect the relative abundance of co-factors in the different cell backgrounds, but there is not a strong association of a specific factor motif with the MyoD binding sites.

Although there was about 90% concordance between endogenous and overexpressed MyoD peaks, albeit with some difference in ranking position, approximately 5-7% of the highly ranked peaks in each set were not represented in the top 110,000 peaks in the other set. Motif analysis of the sequences under the endogenous-only peaks (i.e., ranked in the top 30,000 endogenous peaks but not in the top 110,000 lenti-MyoD peaks) compared to peaks only present in the lenti-MyoD (i.e., ranked in the top 30,000 lenti-MyoD peaks but not in the top 110,000 endogenous MyoD peaks) showed an enrichment of a variant of the RUNX motif (2-fold), a PITX-like motif (4-fold), and a CGNCAG motif (2.7-fold). A similar motif analysis comparing the endogenous-only peaks to shared peaks also revealed the RUNX and PITX-like motifs, albeit at a slightly lower fold enrichment. The comparable analysis to identify motifs in the lenti-only peaks revealed a slight enrichment for E-boxes with non-preferred core sequences (Additional file 1: Figure S2B), indicating that a small number of the lenti-only peaks might represent binding to lower affinity sites, possibly driven by the higher amount of MyoD. Therefore, while there might be some contribution of co-factors expressed in the mouse muscle cells that accounts for the small number of endogenous-only peaks, the motif analysis does not identify more than a modest enrichment of the motif for any specific factor, consistent with the finding that the MyoD binding sites in both cell types show over 90% concordance.

Discussion
We conclude that overexpression of MyoD can accurately identify endogenous MyoD binding sites. This is true despite the fact that the binding pattern of the endogenous MyoD was determined in skeletal muscle cells (primary myotubes and differentiated C2C12 cells), whereas the binding sites of the overexpressed MyoD were determined in MEFs. The concordance of binding sites in these two cell types might reflect the ability of MyoD to convert MEFs to skeletal muscle. In this process, MyoD activates the expression of many co-factors that cooperate in a feed-forward circuit with
MyoD to orchestrate gene expression [7-9]. Since MyoD can activate its own co-factors, the initial differences in co-factor expression between muscle cells and MEFs might not alter the ultimate binding pattern of MyoD.

In differentiating muscle cells, MyoD binds DNA as a heterodimer with an E-protein. However, in vitro binding studies demonstrate that MyoD can form homodimers and bind E-boxes. Therefore, we had anticipated that overexpressed MyoD might bind as a homodimer because of limiting amounts of E-proteins. Surprisingly, we do not see any evidence for homodimer binding. The E-box motif identified for the endogenous MyoD has asymmetric flanking sequences: RRCAGSTG. In a recent study we have shown that NeuroD2 binds an E-box with similar flanking preferences on one side: RRCAGMTGG [4]. Because both MyoD and NeuroD2 form heterodimers with the same E-proteins, we assume that the flanking RR is determined by the common E-protein partner, and initial binding studies support this conclusion (AP Fong, unpublished data). Since these flanking preferences are maintained at the E-boxes when MyoD is overexpressed, this suggests that the E-protein determined sequence preference is maintained and that MyoD is binding as a heterodimer even when overexpressed. It is possible that the requirement for heterodimer binding prevents off-target DNA binding by the overexpressed MyoD since the amount of the E-protein dimerization partner would be limiting.

The fact that overexpression of MyoD improved the foreground-to-background signal and permitted site determination at higher p-value stringency suggests that many of the MyoD binding sites might not be occupied 100% of the time at endogenous levels of MyoD, although this remains speculative since other unknown variables might have affected the ChIP efficiencies or foreground/background read ratios in the different experiments. With these caveats in mind, if the majority of sites are not saturated by physiological levels of MyoD (i.e., not bound by MyoD 100% of the time) then these would present a large “sink” for the overexpressed MyoD protein, which might further limit ectopic binding. In this regard, it is interesting to note that re-analysis of published c-Myc binding ChIP-seq data [6] under low and high serum conditions that result in an approximately five-fold change in c-Myc mRNA also shows enhanced binding of weakly bound sites with increased c-Myc levels (ZY and SJT, unpublished data). Furthermore, while this manuscript was under review, Lin et al. [10] demonstrated that increased amounts of c-Myc protein resulted in greater saturation of weakly bound c-Myc sites near promoters and this was associated with increased gene transcription. Additional studies will be required to determine whether increased MyoD binding at physiologically unsaturated sites has a similar function in enhancing gene transcription. Together these findings suggest that transcription factor overexpression, e.g., induced by gene amplification or other mechanisms in cancers, might have major biological consequences as a result of increased binding at physiologically bound sites.

Conclusions
Our comparison of genome-wide binding of endogenous MyoD with overexpressed MyoD demonstrated that overexpressed MyoD binds to the same sites as endogenous MyoD and does not demonstrate binding to novel regions or motifs. The samples with overexpressed MyoD showed better foreground-to-background signal and permitted site determination at higher statistical significance, suggesting that increased amounts of transcription factor increased the binding at physiologically bound sites. Overall, our study shows that overexpression of MyoD accurately identifies sites bound by endogenous MyoD and demonstrates an intrinsic biological robustness for varying levels of transcription factor in a cell.

Accession numbers
ChIP-seq data have been deposited in Gene Expression Omnibus (GEO) under accession number GSE34906 (lenti-overexpressed MyoD) and in DDBJ Sequence Read Archive (DRA) accession number SRP001761 (endogenous MyoD).

Additional file
Additional file 1: Figure S1. (A) The percentage of genome covered at a given p-value significance. X-axis corresponds to the negative logarithm of the p-value significance level, and Y-axis is the fraction of genome covered, both in log10 scale. The pink curve (ref) corresponds to the estimated percentage of genome covered at the given p-value cutoff based on the null hypothesis in each sample, and the blue curve (obs) is the observed percentage of genome covered at the given p-value cutoff. FDR is defined as the ratio of observed vs. background genome covered at a given p-value. The three panels correspond to overexpressed MyoD (lenti), endogenous MyoD (primary.tube), and the control samples (control). (B) Pairwise comparison of control samples. We have three types of control: pooled reads from pre-immune ChIP in muscle cells (Tube preimmune), MyoD antisera ChIP in MEFs that do not express MyoD (MEF control), and beads alone (MEF bead). Reads from all control lanes are pooled to infer peaks at very low significance (p-value 10^-3), and we calculate the maximum coverage for each sample at these peaks. The pairwise comparison of coverage of each sample in square root transformation is shown. Figure S2: Motif enrichment analysis for regions under overexpressed MyoD peaks and endogenous MyoD peaks. (A) Motifs enriched under all overexpressed MyoD peaks (lenti) or all endogenous MyoD peaks (primary.tube). (B) Motifs specific to endogenous or overexpressed MyoD peaks. Primary-Lenti: Motifs enriched in peaks present only in endogenous MyoD compared to peaks present only in overexpressed MyoD. Primary-Shared: Motifs enriched in endogenous-only peaks compared to peaks present in both groups, i.e., shared peaks. Lenti-Shared: Motifs enriched in peaks only in overexpressed MyoD compared to shared peaks. Consensus, consensus sequence for the motif; Anno, annotated factor for motif consensus; scores, the regression z-values representing the discriminative power of the motif for separating the foreground and background where positive values indicate enriched motifs and negative values indicate depleted motifs; ratio, the enrichment (or depletion) ratio of the motifs in the foreground relative to the background; fg.frac, the percentage of the foreground sequences containing the motif; bg.frac, the percentage of the background sequences containing the motif; logo, the PWM logo.

Abbreviations
MEFs: Mouse embryonic fibroblasts; lenti-MyoD: MEFs transduced with MyoD lentivirus; PWM: Position weight matrix; EM: Expectation-maximization.

Competing interests
The authors declare that they have no competing interests.

Authors’ contributions
ZY performed the computational analysis; APF and YC performed the experiments; WLR and RCG provided oversight for the computational analysis; SJT provided oversight for the biological experiments; all authors participated in the experimental design and interpretation. All authors read and approved the final manuscript.

Acknowledgments
This study was supported by NIH NIAMS R01AR045113; A.P.F was supported by a grant from the University of Washington Child Health Research Center, NIH U5K12HD043376-08; Z.Y. was supported by an Interdisciplinary Training Program grant, T32 CA080416. We thank Mark Biggin for suggesting analysis of saturation and Bruno Amati and Heiko Muller for sharing coverage data from their c-Myc study [6].

Author details
Human Biology Division, Seattle, WA, USA.
Clinical Research Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, Seattle, WA 98109, USA.
Bioinformatics and Computational Biology, Genentech, South San Francisco, CA, USA.
Department of Pediatrics, University of Washington, School of Medicine, Seattle, WA 98105, USA.
Departments of Computer Science and Engineering and Genome Sciences, Seattle, WA, USA.
Department of Neurology, University of Washington, School of Medicine, Seattle, WA 98105, USA.

Received: 2 October 2012 Accepted: 6 March 2013 Published: 8 April 2013

References
1. Cao Y, Yao Z, Sarkar D, Lawrence M, Sanchez GJ, Parker MH, MacQuarrie KL, Davison J, Morgan MT, Ruzzo WL, et al: Genome-wide MyoD binding in skeletal muscle cells: a potential for broad cellular reprogramming. Dev Cell 2010, 18:662–674.
2. Palii CG, Perez-Iratxeta C, Yao Z, Cao Y, Dai F, Davison J, Atkins H, Allan D, Dilworth FJ, Gentleman R, et al: Differential genomic targeting of the transcription factor TAL1 in alternate haematopoietic lineages.
8. Penn BH, Bergstrom DA, Dilworth FJ, Bengal E, Tapscott SJ: A MyoD-generated feed-forward circuit temporally patterns gene expression during skeletal muscle differentiation. Genes Dev 2004, 18:2348–2353.
9. Tapscott SJ: The circuitry of a master switch: Myod and the regulation of skeletal muscle gene transcription. Development 2005, 132:2685–2695.
10. Lin CY, Loven J, Rahl PB, Paranal RM, Burge CB, Bradner JE, Lee TI, Young RA: Transcriptional amplification in tumor cells with elevated c-Myc. Cell 2012, 151:56–67.

doi:10.1186/2044-5040-3-8
Cite this article as: Yao et al.: Comparison of endogenous and overexpressed MyoD shows enhanced binding of physiologically bound sites. Skeletal Muscle 2013 3:8.
EMBO J 2011, 30:494–509. 3. Tapscott SJ, Davis RL, Thayer MJ, Cheng PF, Weintraub H, Lassar AB: MyoD1: a nuclear phosphoprotein requiring a Myc homology region to convert fibroblasts to myoblasts. Science 1988, 242:405–411. 4. Fong AP, Yao Z, Zhong JW, Cao Y, Ruzzo WL, Gentleman RC, Tapscott SJ: Genetic and Epigenetic Determinants of Neurogenesis and Myogenesis. Dev Cell 2012, 22(4):721–735. 5. Bailey TL, Elkan C: The value of prior knowledge in discovering motifs with MEME. Proceedings of International Conference on Intelligent Systems for Submit your next manuscript to BioMed Central Molecular Biology; ISMB. Proc Int Conf Intell Syst Mol Biol 1995, 3:21–29. and take full advantage of: 6. Perna D, Faga G, Verrecchia A, Gorski MM, Barozzi I, Narang V, Khng J, Lim KC, Sung WK, Sanges R, et al: Genome-wide mapping of Myc binding and gene regulation in serum-stimulated fibroblasts. Oncogene 2012, 31:1695–1709. • Convenient online submission 7. Cao Y, Kumar RM, Penn BH, Berkes CA, Kooperberg C, Boyer LA, Young RA, • Thorough peer review Tapscott SJ: Global and gene-specific analyses show distinct roles for Myod • No space constraints or color figure charges and Myog at a common set of promoters. EMBO J 2006, 25:502–511. • Immediate publication on acceptance • Inclusion in PubMed, CAS, Scopus and Google Scholar • Research which is freely available for redistribution Submit your manuscript at www.biomedcentral.com/submit http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Skeletal Muscle Springer Journals

Comparison of endogenous and overexpressed MyoD shows enhanced binding of physiologically bound sites

Publisher
Springer Journals
Copyright
Copyright © 2013 by Yao et al.; licensee BioMed Central Ltd.
Subject
Life Sciences; Cell Biology; Developmental Biology; Biochemistry, general; Systems Biology; Biotechnology
eISSN
2044-5040
DOI
10.1186/2044-5040-3-8
pmid
23566431

Cite this article as: Yao et al.: Comparison of endogenous and overexpressed MyoD shows enhanced binding of physiologically bound sites. Skeletal Muscle 2013 3:8. Authors’ contributions ZY performed the computational analysis; APF and YC performed the experiments; WLR and RCG provided oversight for the computational analysis; SJT provided oversight for the biological experiments; all authors participated in the experimental design and interpretation. All authors read and approved the final manuscript. Acknowledgments This study was supported by NIH NIAMS R01AR045113; A.P.F was supported by a grant from the University of Washington Child Health Research Center, NIH U5K12HD043376-08; Z.Y. was supported by an Interdisciplinary Training Program grant, T32 CA080416. We thank Mark Biggin for suggesting analysis of saturation and Bruno Amati and Heiko Muller for sharing coverage data from their c-Myc study [6]. Author details 1 2 Human Biology Division, Seattle, WA, USA. Clinical Research Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, Seattle, WA 98109, USA. Bioinformatics and Computational Biology, Genentech, South San Francisco, CA, USA. Department of Pediatrics, University of Washington, School of Medicine, Seattle, WA 98105, USA. Departments of Computer Science and Engineering and Genome Sciences, Seattle, WA, USA. Department of Neurology, University of Washington, School of Medicine, Seattle, WA 98105, USA. Received: 2 October 2012 Accepted: 6 March 2013 Published: 8 April 2013 References 1. Cao Y, Yao Z, Sarkar D, Lawrence M, Sanchez GJ, Parker MH, MacQuarrie KL, Davison J, Morgan MT, Ruzzo WL, et al: Genome-wide MyoD binding in skeletal muscle cells: a potential for broad cellular reprogramming. Dev Cell 2010, 18:662–674. 2. Palii CG, Perez-Iratxeta C, Yao Z, Cao Y, Dai F, Davison J, Atkins H, Allan D, Dilworth FJ, Gentleman R, et al: Differential genomic targeting of the transcription factor TAL1 in alternate haematopoietic lineages. 
EMBO J 2011, 30:494–509. 3. Tapscott SJ, Davis RL, Thayer MJ, Cheng PF, Weintraub H, Lassar AB: MyoD1: a nuclear phosphoprotein requiring a Myc homology region to convert fibroblasts to myoblasts. Science 1988, 242:405–411. 4. Fong AP, Yao Z, Zhong JW, Cao Y, Ruzzo WL, Gentleman RC, Tapscott SJ: Genetic and Epigenetic Determinants of Neurogenesis and Myogenesis. Dev Cell 2012, 22(4):721–735. 5. Bailey TL, Elkan C: The value of prior knowledge in discovering motifs with MEME. Proceedings of International Conference on Intelligent Systems for Submit your next manuscript to BioMed Central Molecular Biology; ISMB. Proc Int Conf Intell Syst Mol Biol 1995, 3:21–29. and take full advantage of: 6. Perna D, Faga G, Verrecchia A, Gorski MM, Barozzi I, Narang V, Khng J, Lim KC, Sung WK, Sanges R, et al: Genome-wide mapping of Myc binding and gene regulation in serum-stimulated fibroblasts. Oncogene 2012, 31:1695–1709. • Convenient online submission 7. Cao Y, Kumar RM, Penn BH, Berkes CA, Kooperberg C, Boyer LA, Young RA, • Thorough peer review Tapscott SJ: Global and gene-specific analyses show distinct roles for Myod • No space constraints or color figure charges and Myog at a common set of promoters. EMBO J 2006, 25:502–511. • Immediate publication on acceptance • Inclusion in PubMed, CAS, Scopus and Google Scholar • Research which is freely available for redistribution Submit your manuscript at www.biomedcentral.com/submit
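
Note on the E-box consensus strings: the motifs quoted in the discussion, RRCAGSTG for MyoD and RRCAGMTGG for NeuroD2, are written in standard IUPAC nucleotide codes (R = A or G, S = C or G, M = A or C). The following is a minimal Python sketch of how such a consensus can be scanned against a sequence on both strands. It is an illustration only, not the motif analysis used in the study; the function names (motif_to_regex, reverse_complement, find_motif) and the example sequences are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def motif_to_regex(motif):
    """Compile an IUPAC consensus string into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in motif.upper()))


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(seq, motif):
    """Slide a window over seq and report every consensus match.

    Returns a list of (start, strand, site) tuples, where start is the
    0-based window position on the forward strand and site is the matched
    sequence read in the orientation of the motif.
    """
    seq = seq.upper()
    pattern = motif_to_regex(motif)
    width = len(motif)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if pattern.fullmatch(window):
            hits.append((start, "+", window))
        flipped = reverse_complement(window)
        if pattern.fullmatch(flipped):
            hits.append((start, "-", flipped))
    return hits


# Hypothetical example sequences (not taken from the study).
print(find_motif("TTGACAGCTGAA", "RRCAGSTG"))  # [(2, '+', 'GACAGCTG')]
print(find_motif("GACAGATGG", "RRCAGMTGG"))    # [(0, '+', 'GACAGATGG')]
```

Because the CAGCTG core is palindromic, only the flanking positions distinguish the two strands. In this sketch the sequence TTGACAGCTGTT gives a hit on both strands, since each side of the core carries a purine pair when read in motif orientation, whereas TTGACAGCTGAA matches on the forward strand only.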

Journal

Skeletal Muscle, Springer Journals

Published: Apr 8, 2013

References