MICMIC: identification of DNA methylation of distal regulatory regions with causal effects on tumorigenesis

MICMIC: identification of DNA methylation of distal regulatory regions with causal effects on... Aberrant promoter methylation is a common mechanism for tumor suppressor inactivation in cancer. We develop a set of tools to identify genome-wide DNA methylation in distal regions with causal effect on tumorigenesis called MICMIC. Many predictions are directly validated by dCas9-based epigenetic editing to support the accuracy and efficiency of our tool. Oncogenic and lineage-specific transcription factors are shown to aberrantly shape the methylation landscape by modifying tumor-subtype core regulatory circuitry. Notably, the gene regulatory networks orchestrated by enhancer methylation across different cancer types are seen to converge on a common architecture. MICMIC is available on https://github.com/ZhangJlab/MICMIC. Keywords: DNA methylation, Enhancer, Bioinformatics, Cancer, Information theoretic approaches, Epigenetic editing Background methylation may be similar to somatic mutations in can- Appropriate DNA methylation patterns are critical for cer, in which only a subset of events is causal or “drivers,” (epi)genomic stability and gene expression regulation [1]. while most are “passengers.” To identify the subset that In particular, it is well established that promoter hyperme- are causal, we need solutions that enable us to: (1) thylation is a common epigenetic mechanism for tumor genome-wide identify causal DNA methylation of en- suppressor inactivation in cancer [2]. However, many hancers and its gene targets in pan-cancers in an unbiased genes lowly expressed in normal samples were not differ- manner; and (2) directly validate a specific methylation entially expressed with differentially methylated promoter event on the putative enhancer by experimentation. [3, 4]. Some genes have been verified to be regulated by Pharmacological inhibition of DNA methylation with the aberrant promoter methylation with a causal effect on drug 5-azacitidine is commonly used for experimental val- tumorigenesis, including CDKN2B, CDKN2A, RB, APC, idation, but it induces genome-wide DNA demethylation BRCA1, and MLH1 [5–7]. Recently, DNA methylation of without specificity. enhancers in various cancers has been under intense study In this study, we designed a set of tools for identifying [4, 8–11]. However, its exact role and whether it is merely genome-wide DNA methylation of distal regulatory sites a marker of malignancy or a causal factor is largely un- that result in a causal effect on tumorigenesis. De novo known. Some of these studies focused on well-annotated enhancers/silencers and its direct gene targets were in- enhancer regions. However, the annotated enhancer sites ferred by information theoretic approaches [12, 13] and are mainly derived from the epigenome profiling of lim- validated with the emerging CRISPR/dCas9 epigenetic ited cell lines or tissues, lacking an in-depth coverage of editing [14–17] technique. Information theoretic ap- distal regulatory sites in patient cancer samples. DNA proaches have been proved effective to distinguish the direct from indirect connection in other applications with solid mathematical proof [18, 19]. Strikingly, we * Correspondence: jzhang1@hku.hk have found that the modulation of DNA methylation on Yin Tong and Jianlong Sun contributed equally to this work. School of Biological Sciences, The University of Hong Kong, Hong Kong, distal regulatory sites by dCas9-DNMT3A-3 L has pro- Hong Kong found effect on cancer cell behavior similar to promoter Full list of author information is available at the end of the article © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Tong et al. Genome Biology (2018) 19:73 Page 2 of 17 methylation, e.g. cell migration and proliferation altered TCGA level 3 datasets for various cancers, encompassing along with target gene expression change, even though HumanMethylation450 array and RNA-sequencing the distal regulatory site 200 kb away. By contrast, (RNA-seq) data. As an example, in the TCGA gastric can- dCas9-TET1 has the opposite effect on its target gene cer cohort (STAD) for the gene CDCA5, we identified ten expression. Our strategy recovered many known en- DREs associated with CDCA5 expression, with four of hancers and unannotated regulatory sites from different them > 240 kb away from the TSS of CDCA5 (Fig. 1c). cancer types, differential methylation of which regulated Subsequently, we successfully experimentally verified one known or novel tumor-suppressor/oncogene with causal of these DREs, cg02933228, which will be discussed fur- effect on cell malignancy and patient survival. Furthermore, ther below. The false discovery rate (FDR) for MICMIC our study also provides mechanistic insight on how DNA was 0.05 based on simulation testing (Fig. 1d). methylation of distal regulatory sites is critical for the main- tenance of tumor cell identity and malignancy with gene Genomic features enriched in distal regulatory network perspective. interactions identified by MICMIC From analysis of 11 different cancer types from the TCGA Results datasets, the number of DREs was in the range of 2192– Pipeline for MICMIC to infer methylation regulation 13,027 (total 73,255) and the number of DRE-target pairs networks was in the range of 2234–13,570 (total 80,334). Of To identify driver methylation events during tumorigen- DRE-target pairs, 57.4% were cancer specific and 42.6% esis, we developed a strategy based on information theor- shared by more than one cancer type. A total of 55,993 etic approaches to distinguish the direct from indirect DREs that were > 2 kb away from the TSS were termed dis- correlation between the methylation of CpG probes and tal DREs, similar to a previous study [9]. Of the promoter the expression of its potential gene targets. Our method, DREs (≤ 2 kb), 88.8% were negatively correlated with their “Methylation Regulation Network Inference by Condi- target genes (Fig. 2a, b), among which the majority were tional Mutual Information Based PC-algorithm” (MIC- downregulated (Additional file 1:FigureS1).The percentage MIC), is composed of three layers. The bottom layer uses of negative and positive correlations for distal DRE-target conditional mutual information (CMI) to determine the pairs were 37.9% and 62.1%, respectively (Fig. 2a, b). To dependence relationship between three nodes, genes, and/ identify enriched genomic features, we used the ENCODE or CpG probes (Fig. 1a). If variables X and Y are con- ChromHMM 18-state models to annotate the distal DREs nected only via A, then CMI(X,Y|A) will be close to zero, for 6/11 cancer types based on the availability of the corre- indicating that there is no direct connection between X sponding cell line data (see “Methods”)[20]. Of the six and Y. The middle layer uses a path consistency algorithm tested, all of the distal DREs negatively correlated with its (PC-algorithm) to infer the regulatory network that in- targets were enriched (p value < 0.01) in two or more en- cludes all nodes (Fig. 1a). To start with, all nodes are con- hancer regions (EnhG1, EnhG2, EnhA1, EnhA2, EnhWk), sidered connected and each edge is tested by CMI based suggesting that methylation of an enhancer could on the observed data. The final network emerges after all negatively regulate target gene expression (Fig. 2c and false positive connections are eliminated. Finally, in the Additional file 1: Figure S2). On the contrary, all of the top layer, MICMIC identifies each CpG probe and its dir- DREs positively correlated with its targets were enriched ect target(s) as a pair, denoted here as a DRE-target pair in one or two of the repressor regions (ReprPC, (DRE, direct regulatory elements) (Fig. 1b). Since many ReprRCWk), but not in the enhancer regions (Fig. 2c and methylation events are merely a consequential effect of Additional file 1: Figure S2). Bivalent Enhancers (EnhBiv), the cancerous state rather than being causal, MICMIC first reported in stem cells [21], were enriched in both was purposely designed not to call differentially methyl- negatively and positively correlated DREs. We then ated regions. To identify DRE-target pairs relevant to compared both negatively and positively correlated DREs tumorigenesis, we focused on genes that were determined for the enrichment of active chromatin marks (H3K27ac, to be essential for tumorigenesis by differential expression H3K4me1, p300, and DNase I hypersensitivity) and re- test and master regulator analysis (MRA), which was pressive marks (H3K9me3 and H3K27me3). We observed designed to quantify the enrichment of cancer signature strong enrichment of active marks around the negatively genes among the regulatory neighbors of the target gene correlated distal DREs and strong enrichment of repres- (see “Methods”). For each target gene tested, we included sive marks at the positively correlated ones (Fig. 2d and all nearby genes and CpG probes ±300 kb away from the Additional file 1: Figure S4). Enrichment of H3K4me3, transcriptional start site (TSS) of the gene and merged the marker of active promoters, was only observed at a expression and methylation matrix together. The minority (< 30%) of negatively-correlated DREs, which were CMI-based PC algorithm inferred the regulatory network 2–3 kb away from TSS (Additional file 1:FigureS4c). and the DRE-target pair (see “Methods”). We downloaded Similarly, the PhastCons conservation score reached its peak Tong et al. Genome Biology (2018) 19:73 Page 3 of 17 Fig. 1 Pipeline for inferring methylation regulation networks. a Top: Schematic of the MICMIC pipeline that uses information theoretic approaches to distinguish direct regulation from indirect correlation, where variables X and Y are connected only via variable A, then CMI(X,Y|A) will be close to zero, suggesting that there is no direct connection between X and Y. Bottom: A PC algorithm is used to infer the regulatory network from the observed data matrix, eliminating the indirect edges by CMI testing. b MICMIC is designed to identify the regulatory relationship between the methylation level of a CpG probe and the expression level of its potential gene target. For every target gene tested, we included all nearby genes and CpGs ± 300 kb from the transcription start site (TSS) of the test gene and merged the related expression and methylation matrix together. Then MICMIC applies the CMI-based PC algorithm to infer the regulatory network. CpG probes that passed the test were named direct regulatory elements (DREs). The DRE and its gene targets were denoted as DRE-target pairs. c A representative example of the MICMIC output for the CDCA5 gene, where ten DREs (nine shown here) were identified to be associated with CDCA5 expression in gastric cancer (TCGA STAD), four of which were at least 240 kb away from the TSS of the target gene. One of these DREs, cg02933228 (blue oval), was experimentally verified. In the lollipop diagram, green represents significant CpG probes, i.e. DREs. The Pearson correlation coefficient (PCC) for each DRE-target pair was represented by a vertical line (red for negative PCC and green for positive PCC). d Simulation test to justify the MICMIC p value cut-off. The number of actual DREs identified (blue) vs the number of DREs identified by chance (red) at various p value cut-offs at the center of the genomic regions flanking negatively cor- separatedbyupto25kband 50%evenwhenthe pairswere related distal DREs across all cancer types, and was signifi- separatedupto 100 kb (Fig. 2f). The TCGA samples ana- cantly higher than the control group (p value was in the lyzedinthisstudy andthe DREsidentified by MICMIC were range of 2.18e-68–0.027) (Fig. 2e). By contrast, there is much listed in the following tables (Additional file 2:Table S1 and weaker or no enrichment for positively correlated distal Additional file 3:Table S2). DREs (Fig. 2e). The precision of our distal DRE-target pre- diction was evaluated by different chromatin interaction Validation of causal DNA methylation events involved in datasets, such as IM-PET, Hi-C, RAD21-cohesin, and tumorigenesis by epigenome engineering techniques in ChIA-PET [22–24](see “Methods,” only negatively corre- gastric cancer lated pairs considered here). The precision rate of MICMIC We chose distal DRE-target pairs for validation if: (1) reached up to 90% when the DRE-target pairs were there was a strong correlation between expression and Tong et al. Genome Biology (2018) 19:73 Page 4 of 17 Fig. 2 Genomic features of DREs identified across cancers. a Bar chart showing the number of promoter DRE-gene pairs (top) and distal DRE- gene pairs (bottom) identified from the TCGA cancer cohorts. Blue bars indicate the fraction of the DRE-gene pairs shared by more than one cancer type, while red bars indicate the fraction of the cancer-type specific DRE-gene pairs. b The negative and positive correlation between DRE methylation and its target gene expression are shown by red and blue, respectively. Promoter pairs are mainly negatively correlated (88% in total). c Representative results showing the preferred chromatin state of distal DREs in liver (HepG2) and breast (HMEC) cancer cell lines. The distal DREs for each cell line were inferred from those identified from the corresponding TCGA cohort, LIHC and BRCA, respectively, here in this example. The number of distal DREs were counted at each chromatin state, with the heatmap color and number indicating the enrichment p value of distal DREs in each state. Results for other cancer types can be found in Additional file 1: Figure S2. d Representative results in liver cancer cell line HepG2, showing increased chromatin signals for H3K27ac, H3K4me1, p300, and DNase I hypersensitivity at genomic regions surrounding the negatively correlated distal DREs, in contrast to increased chromatin signals for repressive marks H3K9me3 and H3K27me3 surrounding the positively correlated distal DREs. P values were calculated by t-test comparing the signals of negatively correlated distal DREs vs that of an all probes control. The results for other cancer cell lines can be found in Additional file 1: Figure S4. e Average conservation score of distal DRE flanking regions. PhastCons conservation score was in the range of 0–1 (non- to perfectly conserved). P values are calculated between distal DREs and all probes (as control) by t-test. f The precision of DRE-target pairs predicted by MICMIC was determined by calculating the positive predictive value (PPV) in comparison with other chromatin interaction data, including IM-PET, Hi-C, RAD21-cohesin, and ChIA-PET (see “Methods,” only negatively correlated pairs considered here) methylation, represented by a significant Pearson correl- produced similar results to treatment with the global DNA ation coefficient (PCC < − 0.3 or > 0.3); and (2) the target methylation inhibitor, 5-AZA, and upregulated WNT5B gene was determined to be essential for tumorigenesis (Fig. 3b). The effect of targeting DNMT3A-3 L/TET1 to by differential expression test and MRA (see “Methods,” the distal DRE site of WNT5B was confirmed by bisulfite Figs. 3a and 4a, and Additional file 1: Figures S5 and sequencing without off-target on other genes (Fig. 3c and S7). For example, in gastric cancer, WNT5B expression Additional file 1: Figure S7a). We then tested if modulation and methylation of its distal DRE (cg02935351) were of the DNA methylation of distal DREs could affect cell strongly anti-correlated and WNT5B was predicted to migration. Strikingly, cancer cell migration increased as a be a tumor suppressor by MRA (Fig. 3a). Next, we per- result of DNMT3A-3 L targeting, but decreased by TET1 formed epigenetic editing by using CRISPR-dCas9 based targeting or overexpression of WNT5B complementary technologies, such as the casilio system [14] for targeted DNA (cDNA) (Fig. 3b and Additional file 1:FigureS9).To methylation with a DNMT3A-3 L fusion protein and the further confirm the regulatory function of this distal DRE re- dCas9-SunTag scaffold with scFv–TET1 catalytic domain gion, we cloned a 1-kb genomic region flanking cg02935351 fusions [15] for targeted demethylation to the intended and the WNT5B promoter into the pGL3 luciferase reporter genomic sites in the AGS human gastric cancer cell line vector and verified its putative enhancer status (Fig. 3f). (Additional file 4: Table S3). Remarkably, targeting Interestingly, co-transfection with dCas9-DNMT3A-3 L was DNMT3A-3 L to the region near cg02935351 downregu- also able to regulate the reporter constructs (Fig. 3f). In lated WNT5B, while targeting TET1 to this region addition, we verified several other genes, including MLEC, Tong et al. Genome Biology (2018) 19:73 Page 5 of 17 Fig. 3 Validation of causal DNA methylation events in gastric cancer. a Representative results showing the negative regulation of WNT5B by methylation of its distal DRE (cg02935351, 22,595 bp from WNT5B TSS). Box plot shows the high, middle, and low expression groups of WNT5B, plotted against the methylation of the distal DRE in each group. MRA analysis was implemented by the gene set enrichment analysis (GSEA) method. GSEA graphs show tumor-suppressive signatures of WNT5B by MRA. Correlation analysis and MRA for other genes are shown in Additional file 1: Figure S5. b Confirmation that methylation of the distal DRE is the causal event for WNT5B regulation and cellular malignancy. qPCR results showing increased WNT5B expression in AGS cells treated with 5-AZA or dCas9-TET1, and decreased expression in cells transfected with dCas9-DNMT3A-3 L relative to controls. Cell migration assays showed that dCas9-DNMT3A-3 L targeting increased cell migration, while overexpression (OE) of WNT5B suppressed tumor cell migration. Significance was determined by t-test and error bars represent ± SD. c Bisulfite sequencing validation of increased methylation of the CpGs surrounding the dCas9-DNMT3A-3 L targeted DRE of WNT5B. In the lollipop diagram, black circles stand for methylated Cs and white circles for unmethylated Cs. Each box below corresponds to one CpG position in the genomic sequence. The colored bars summarize the methylation states of all sequences at that position with yellow for methylated Cs and blue for unmethylated Cs. d qPCR results for eight gastric cancer genes after dCas9-DNMT3A-3 L/TET1 epigenetic editing with dCas9-only as control, labelled as ctr1 and ctr2. Three independent replicates were conducted for each experiment. All DRE-target pairs tested here showed strong anti-correlation between expression and methylation, and the qPCR results showed dCas9-TET1 targeting increased gene expression, while dCas9-DNMT3A-3 L targeting inhibited gene expression (p value < 0.01, student t-test). e Summary of cell migration assay results for eight gastric cancer genes, showing the causal effects of distal DRE methylation on cancer cell malignancy. See photos in Additional file 1: Figure S9. f The distal DRE region and promoter of each gene of interest were cloned into the pGL3 reporter vector and assayed for luciferase activity. The reporter constructs were also co-transfected with dCas9-DNMT3A-3 L (pro_enh + targeted_DNMT3A-3 L), resulting in decreased luciferase activity LLGL2, CDCA5, MEN1, CLDN7, SOX9, and FGFR1 by epi- off-target due to overexpression DNMT3A-3 L/TET1 (see genetic modulation of distal DREs followed by quantitative “Methods” and Additional file 1:Figure S6). Overall,our polymerase chain reaction (qPCR), migration assay, and experimental results were fully consistent with MICMIC luciferase reporter assay (Fig. 3d–f,Additional file 1: predictions. As mentioned above in Fig. 1,MICMIC Figure S9). We performed experiments using scrambled predicted a distal DRE for CDCA5, cg02933228, which is single guide RNA (sgRNA), “untargeted,” or catalytically > 240 kb away from the TSS of CDCA5 (Fig. 1c), but we inactive DNMT3A-3 L/TET1 to rule out the possibility of were able to achieve robust regulation of this distal DRE Tong et al. Genome Biology (2018) 19:73 Page 6 of 17 Fig. 4 Validation of causal DNA methylation events in liver cancer. a Representative results showing the negative regulation of HDAC11 by methylation of its distal DRE (cg03190578, 3817 bp from HDAC11 TSS). Box plot shows the high, middle, and low expression groups of HDAC11, plotted against the methylation of the distal DRE in each group. GSEA graphs show oncogenic signatures of HDAC11 by MRA. Correlation analysis and MRA for other genes are shown in Additional file 1:FigureS10. b Confirmation that methylation of the distal DRE is the causal event for HDAC11 regulation and cellular malignancy. qPCR results showing increased HDAC11 expression in PLC8024 cells treated with 5-AZA or dCas9-TET1, and decreased expression in cells transfected with dCas9-DNMT3A-3 L relative to controls. Cell migration assays showed that dCas9-DNMT3A-3 L targeting suppressed cell migration, while overexpression (OE) of HDAC11 increased tumor cell migration. Significance was determined by t-test and error bars represent ± SD. c qPCR results for 11 liver cancer genes after dCas9-DNMT3A-3 L/TET1 epigenetic editing with dCas9-only as control, labelled as ctr1 and ctr2. Three independent replicates were conducted for each experiment. Ten out of 11 DRE-target pairs were predicted to be negatively regulated by DRE methylation, CBFA2T3 was predicted to be positively regulated by methylation of the distal DRE (cg20283771).The qPCR results (p value < 0.01 by Student’s t-test) were consistent with the predictions (Additional file 1:Figure S10). d Summary of results for cell migration and proliferation assays for liver cancer genes, showing the causal effects of distal DRE methylation on cancer cell malignancy. See photos in Additional file 1: Figure S11. e The distal DRE region and promoter of each gene of interest were cloned into the pGL3 reporter vector and assayed for luciferase activity. The reporter constructs were also co-transfected with dCas9-DNMT3A-3 L (pro_enh + targeted_DNMT3A-3 L), resulting in decreased luciferase activity except CBFA2T3 positively correlated with its DRE methylation with dCas9 epigenetic editing (Fig. 3d–f). This same distal Validation of causal DNA methylation events involved in DRE, cg02933228, was also predicted to control the gene tumorigenesis by epigenome engineering techniques in MEN1, which we were also able to experimentally con- liver cancer firm. Additionally, our study is the first to show evidence We also validated MICMIC predictions in liver cancer. of the gene Malectin (MLEC) being a tumor suppressor First, we observed a strong anti-correlation between (Fig. 3d and e, Additional file 1: Figures S5 and S9). Taken HDAC11 expression and cg03190578 methylation (Fig. 4a). together, MICMIC along with MRA was able to identify As expected, targeted methylation with DNMT3A-3 L to causal events in tumorigenesis involving DNA methyla- the cg03190578 region decreased HDAC11 expression, tion of distal regulatory regions, which we were able to while targeted demethylation with TET1 dramatically in- verify via epigenetic editing by dCas9 fused with TET1 or creased HDAC11 expression (Fig. 4b). Consequently, we DNMT3A-3 L and identify novel oncogenes/tumor-sup- found that modulation of DNA methylation on the distal pressors in the process. DRE, cg03190578, by dcas9-DNMT3A-3 L significantly Tong et al. Genome Biology (2018) 19:73 Page 7 of 17 decreased cancer cell migration suggesting an oncogenic oncogenic TFs (Fig. 5b inset and Additional file 1:FigureS13), function for HDAC11 in liver cancer, which was confirmed suggesting that hypomethylation of distal DREs from by increased cell migration upon overexpression of negative-up group together with the increased expres- HDAC11 (Fig. 4b). In contrast, dcas9-TET1 targeting to sion of the cognate oncogenic TFs, consequentially the cg03190578 region increased cancer cell migration lead to upregulation of its distal gene targets. For distal DREs (Additional file 1: Figure S11). In addition to HDAC11, we positively correlated with its targets, we found significant validated other genes as well and identified the distal DREs enrichment of TFs with repressor activity (p value = 8e-7), of HDAC11, APOA1, NDRG1, TK1, and TKT to be en- suggesting that DNA methylation may affect the binding of hancers and the distal DREs of BIRC5, CDT1, CBFA2T3, TF repressors with implications in tumorigenesis (Additional SLC16A3, KLF9, and APOC3 to be silencers (Fig. 4c–e, file 1:FigureS14). Additional file 1: Figures S10 and S11). Among these We also investigated the association between DRE genes, some shared the same distal DRE, e.g. APOA1 methylation and patient survival. We identified 1081 DRE shared cg23193059 with APOC3 and CDT1 shared methylation correlated with patient survival (q-value < 0.1, cg20283771 with CBFA2T3. Intriguingly, methylation FDR by BH procedure) in bladder cancer (BLCA), breast of DRE cg20283771 was positively correlated with cancer (BRCA), head and neck carcinoma (HNSC), liver CBFA2T3 expression, but negatively correlated with cancer (LIHC), lung cancer (LUAD), and uterine corpus CDT1 (Fig. 4c, Additional file 1: Figure S10). Both endometrial cancer (UCEC). For BLCA, the DREs genes were verified to be causally regulated by methy- associated with survival were enriched in intergenic lation of cg20283771 with combined effect on cancer regions. For LUAD and UCEC, the DREs associated cell migration after dCas9-DNMT3A-3 L targeting with survival were enriched in distal regions (enrich- (Fig. 4c–e). For two genes, NDRG1 and TK1, there ment p value < 0.05) (Fig. 5d). We then calculated the was no significant difference in cell migration after number of master cancer genes (via MRA) that are dCas9-DNMT3A-3 L targeting of their distal DREs, regulated by DNA methylation of the promoter or distal but they did show a significant decrease in cell prolif- DREs and used the density distribution to quantify the ef- eration (Fig. 4d). fect that methylation of those DREs have on tumorigen- esis (Fig. 5e,Additional file 1: Figures S15 and S16). The Aberrant methylation landscape of distal DREs can be results indicated that the methylation of distal DREs com- shaped by oncogenic and lineage-specific transcription pared to proximal DREs had more of an impact on the factors (TFs) with profound effects on tumorigenesis and regulation of both oncogenes and tumor suppressors at patient survival the initiation and progression stage of tumorigenesis. We next investigated how TFs can regulate and shape the Furthermore, we analyzed the dynamic change in methy- methylation landscape of distal DREs in cancers (see lation patterns that can occur at distal DREs as tumors “Methods”). First, we categorized all distal DREs in each transition from the initiation to the progression stage. cancer into four subgroups, i.e. negative-up, negative-down, During this transition, methylation patterns of distal DREs positive-up, and positive-down, dependent on whether the can remain the same (“consistent”), become differentially pair was negatively or positively correlated and whether the methylated in the opposite direction (“reversed”), or show target gene was up- or downregulated in tumor versus nor- increased (“stronger”) or decreased (“weaker”)methylation mal samples. After identification of TFs associated with dis- change in the initiation versus the progression stage tal DREs (Additional file 1: Figure S12), we calculated the (Additional file 5: Table S4). Strikingly, distal DREs related PCC between the expression level of each enriched TF and to patient survival were more enriched in the “reversed” the average methylation level of its cognate binding sites on group (Fig. 5f). For example, in uterine cancer, the distal distal DREs for each subgroup (Fig. 5c) and ranked TFs by DRE of PAQR4 was de-methylated at the initiation stage its PCC in ascending order. Strikingly, the top ranked TFs but became re-methylated in higher stage tumors. identified from the negative-down group were mostly Moreover, the high methylation of the DRE and lower ex- tissue-specific TFs across various cancer types, whereas TFs pression of PAQR4 were correlated with poorer patient sur- identified from the negative-up group were mainly onco- vival (Fig. 5g). Many more distal DRE-target pairs fell into genic TFs (Fig. 5a and b, Additional file 1: Figure S13). this category, including the gene STX18 (Fig. 5g). GSEA further confirmed that these tissue-specific TFs are tumor suppressors (Fig. 5a inset), suggesting that hyper- Diverged tumor-subtype core regulatory circuitry and methylated distal DREs from the negative-down group in converged pan-cancer global topology of TF network conjunction with the decreased expression of the cognate associated with distal DRE tissue-specific TFs, lead to downregulation of its distal gene Multiple lines of evidence have indicated that super-enhancers targets in cancer. Similarly, GSEA confirmed that the top (SEs) with associated oncogenic TFs play a pivotal role in ranked TFs in the negative-up group were enriched for regulating and maintaining tumor cellular identity [25, 26]. It Tong et al. Genome Biology (2018) 19:73 Page 8 of 17 Fig. 5 Interplay between distal DRE methylation and the cognate transcription factor binding has profound effects on tumorigenesis. Bar charts show the enriched transcription factors (TFs) that bind to DREs showing a negative correlation between the average methylation of TF binding motifs and TF expression, with (a) downregulation (negative-down DREs) or (b) upregulation (negative-up DREs) of the target gene. Intensity of blue color indicates the degree of tissue specificity of the TF in breast cancer (BRCA) compared to other tissue types. Intensity of red/green color indicates the degree of oncogenic/ tumor suppressive behavior of the TF. Bar chart showing the enriched TFs binding on the group-specific distal DREs. The TFs were ranked by the negative correlation between the TF expression and average DNA methylation of the TF binding motifs on the group-specific distal DREs. The correlation value is shownonthe y-axis. Colors represent the tissue type significance or master regulator significance of the TF gene. The cancer signature association of the TFs is showninthe inset GSEA plot. c Representative examples of TFs that showed a negative correlation between the expression of the TF and average DNA methylation of the TF binding motifs on distal DREs. d Heatmap showing the enrichment significance of DREs associated with patient survival in various genomic regions. The heatmap color and number indicates the enrichment p value of DREs associated with patient survival in each category. e Higher impact of distal DREs on cancer genes compared with promoter DREs during initiation (top) and progression (bottom). Y-axis for the two top waterfall plots indicates the master regulator significance for each gene, ranked from tumor-suppressive to oncogenic. Y-axis for the two bottom panels shows the density of normalized gene counts controlled by promoter or distal DREs. See results of other cancers in Additional file 1: Figure S16. f Heatmap showing the enrichment significance of distal DREs associated with patient survival in the four methylation patterns of distal DREs, i.e. “consistent,” “reversed,”“stronger,” and “weaker” according to the direction of methylation change from the initiation to progression stage of tumorigenesis for each cancer type. The number in each square represents the p values. g Example of two distal-DRE target pairs identified in uterine cancer that show a “reversed” methylation pattern. Box plots on the x-axis show the DRE is demethylated during the initiation stage but becomes remethylated during cancer progression. High methylation of both DREs and low expression of their target genes were associated with poorer patient survival has been shown that SEs function as a platform to integrate a (Additional file 6: Table S5). We hypothesized that cancer sub- set of key TFs forming a core regulatory circuitry (CRC) to types could be distinguished by the joint consensus clustering regulatetumor-subtypespecificgeneexpression.The TFsin of the DNA methylation of each TF’s cognate binding site and each CRC are auto-regulated by itself through binding sites the expression level of the corresponding TF. Strikingly, the on its corresponding SE. The TFs can also cross-regulate each joint consensus clustering with the assembled CRCs for can- other by forming an interconnected loop with cognate binding cers, including breast, liver, stomach, and endometrial carcin- sites on other TFs’ related SEs. Based on this information, we oma, can identify the cancer subtypes in line with the took advantage of the genome-wide information of distal previously established molecular/pathological subtypes. For in- DRE-target derived from MICMIC to assemble the CRCs stance, breast cancer subtypes (lumA, lumB, and basal like) regulated by DNA methylation for each cancer type [27] can be identified by the joint clustering (Fig. 6b). We can Tong et al. Genome Biology (2018) 19:73 Page 9 of 17 Fig. 6 Tumor-subtype core regulatory circuitry and pan-cancer global topology of TF network regulated by DNA methylation of distal DREs. a The interconnected auto- and cross-regulation loops within the CRC TFs. The links between TFs were derived from distal DRE-target pairs in which the DRE harbors binding sites for the CRC TFs. The TFs are colored by the tumor subtypes in which they are highly expressed. Effects of the CRC on tumorigenesis are analyzed by the cancer pathway enrichment of the TFs’ targets (hypergeometric p value < 0.05), representing in the right side of the CRC. b Top: Heatmap of the expression Z-score of CRC TFs in the tumor subtypes. Bottom: Joint consensus clustering by the expression of CRC TFs and methylation of binding DREs shows a great similarity between the CRC subtypes and PAM50 subtypes in breast cancer. See results of other cancers in Additional file 1: Figures S17–S19. c Signaling pathways in breast cancer regulated by CRC TFs whose targets were identified by the distal DRE-target pairs in which the DRE harbored the TF binding sites. Each color of a gene node indicates a different cancer pathway. Edges represent regulatory relationships. d Convergence of network topology across cancer types (see “Methods”). For each cancer type, their TF networks were decomposed and categorized into 13 different types of basic three-node network motifs, indicated by the topology structures above the graph. The X-axis shows the numerical identification number associated with each motif. The relative enrichment (Z > 2) or depletion (Z < − 2) of each of the 13 basic network motifs for each cancer type was calculated as a Z-score (Y-axis) Tong et al. Genome Biology (2018) 19:73 Page 10 of 17 further identify the underlying signaling pathways reg- from the methylation modulation by dCas9 targeting ulated by CRC in different subgroups (Fig. 6c and since the output could be the combined effect of Additional file 1:Figures S17–S19). Similarly, the glo- multiple genes targeted by the same distal DRE. bal gene regulatory network (GRN) for each cancer Our study provides mechanistic insight on how can be generated with the information of our DNA methylation of distal DREs is critical for the genome-wide distal DRE-target interaction and TFs maintenance of tumor cell identity and malignancy. associated with each DRE. Topology of GRN can be We found that oncogenic and lineage-specific TFs compared based on the normalized frequency of the shape the methylation landscape of distal DREs, three-node network motif in each cancer [28, 29]. which is controlled in concert by the expression level Notably, GRNs across various cancer types converged of each enriched TF and the average methylation level on a common architecture (Additional file 7:Table of its cognate binding sites on distal DREs. Key TFs S6), highlighting the similarity of GRN controlled by were identified to be part of core regulatory circuit- DNA methylation of distal regulatory regions at the ries (CRCs) associated with distal DREs for regulation higher-order organization level (Fig. 6d). of tumor-subtype specific gene expression. Further- more, we showed that the network topology of GRN Discussion derived from DNA methylation of distal DREs may In this study, we aimed to identify DNA methylation of have the same architecture across different cancer distal regulatory regions with causal effects on tumorigen- types, enriched for network motifs like “feed forwards esis. MICMIC is different from other currently available loop,”“regulated mutual,” and “regulating mutual.” methylation analysis software in two respects. First, since This similarity in topology suggests that a common many methylation events are merely a consequence of epi- organization principle governs this type of biological genetic disruption and not the cause, rather than calling networks regulated by DNA methylation of distal differentially methylated regions first, we begin by: (1) regulation regions. using genes essential for tumorigenesis by differential ex- pression test and MRA to find its distal DRE(s); and (2) Conclusions take novel application of information theoretic approaches In this study, we have developed a set of tools to in DRE-target call. Interestingly, about 23.7% putative en- genome-wide identify DNA methylation in distal regions hancers flanking our distal DREs harbor known COSMIC with causal effect on tumorigenesis. Novel oncogenes/ non-coding mutations in liver cancer (Additional file 8: tumor-suppressors and their putative enhancers can be Table S7). This can help prioritize the somatic mutations identified together based on this strategy. We have exten- locating on distal regulatory sites as cancer risk loci sively validated many of the predictions by epigenetic non-coding variants are enriched in enhancers [25, 30]. editing. Our study reveals the prevalent regulation of Our bench validation with dCas9 targeting is dependent genome-wide putative enhancers by DNA-methylation on the experiment with co-transfection of multiple plas- with causal effect on cellular malignancy and patient sur- mids into cancer cell lines that have to be effectively trans- vival. Our study also provides mechanistic insight on how fected. This could be challenging for certain cancer types, DNA methylation of distal regulatory regions is critical for e.g. only one gastric cell line “AGS” (over 50% transfection the maintenance of tumor cell identity and malignancy. efficiency with lipofectamine3000) and a few liver cancer lines have acceptable transfection efficiency in our hand. Methods However, the DNA methylation level seems quite hetero- Data collection geneous for most DREs in the same cell line. For instance, We downloaded TCGA level 3 DNA methylation data, we can increase or decrease the DNA methylation level of clinical data, and RNA-seq data for 4747 matched samples the same DRE site in AGS cell line by dCas9 targeting, encompassing 11 cancer types: bladder urothelial carcin- and consequentially change the gene expression level in oma (BLCA); breast invasive carcinoma (BRCA); cervical both directions, upregulation or downregulation. squamous cell carcinoma and endocervical adenocarcin- It is common for a single enhancer to control more oma (CESC); colon adenocarcinoma (COAD); esophageal than one gene and vice versa. As shown above, both carcinoma (ESCA); head and neck squamous cell carcin- oncogene CDCA5 and tumor-suppressor MEN1 were oma (HNSC); liver hepatocellular carcinoma (LIHC); lung verified to be regulated by the same distal DRE adenocarcinoma (LUAD); lung squamous cell carcinoma cg02933228. However, the decreased cell migration (LUSC); stomach adenocarcinoma (STAD); and uterine phenotype after dCas9-DNMT3A-3 L targeting of corpus endometrial carcinoma (UCEC) (Additional file 2: cg02933228 was only consistent with CDCA5’sfunc- Table S1). The methylation data is based on the Infinium tion prediction. We need to take into account this HumanMethylation450 BeadArray platform, in which the complexity when interpreting the phenotypic output probes covered 485,000 CpG sites across the genome. Tong et al. Genome Biology (2018) 19:73 Page 11 of 17 Mutual information and conditional mutual information IXðÞ ; Y jZ ¼ HXðÞ ; Z Mutual information (MI) is a general measurement of þHYðÞ ; Z −HXðÞ ; Y ; Z −HZðÞ ð8Þ dependence between individual events. This method where p(X,Y,Z) is the joint probabilities and H(X,Y,Z) is is based on the joint probability of events to infer de- the joint entropy. A high value for CMI(X,Y|Z) would pendence without making any assumptions about the mean X and Y are directly connected and do not rely on nature of their underlying relationships. MI is based the given variable Z. on information theory and can be calculated by the We used the kernel density estimation (KDE) to estimate entropy of variables. For any variable A, the entropy the probability distribution of continuous variables, such as H(A) is the average amount of information gained gene expression. KDE was found to be superior to the from a measurement. And it can be defined by: histograms estimator and the estimation of probability distribution by KDE has been used in MI calculation as fol- HAðÞ¼ − paðÞ logpaðÞ ð1Þ i i lows [18, 19], i¼1 PXðÞ where p(a) is the probability of any possible value of A. 1 1 1 The joint entropy of two discrete systems A and B is de- T −1 ¼ exp − X −X C X −X j i j i n=2 n=2 fined by N 2 ðÞ 2π jj C j¼1 N N A B X X (9) HAðÞ ; B ¼ − pa ; b logpa ; b ð2Þ i j i j where C is the covariance matrix of X and |C| is the i¼1 j¼1 determinant of matrix C. From Eqs. 1, 6, and 9, we got the entropy of variable where the p(a,b) is the joint probability. When both A X, MI of (X,Y), and CMI of (X,Y|Z) as: and B are independent events, the joint entropy of A hi and B can be denoted by: 1=2 HXðÞ¼ logðÞ 2πe jj C ; ð10Þ HAðÞ ; B ¼ HAðÞþHBðÞ ð3Þ 1 j CXðÞj � j CYðÞj IXðÞ ; Y ¼ log ; ð11Þ For any dependent events A and B, the joint entropy 2 j CXðÞ ; Y j will follow: 1 j CXðÞ ; Z j � j CYðÞ ; Z j IXðÞ ; Y jZ¼ log : ð12Þ HAðÞ ; B < HAðÞþHBðÞ ð4Þ 2 j CZðÞj � j CXðÞ ; Y ; Zj The mutual information of I(A,B), which quantifies MI and CMI were normalized by: the dependence of A and B, is defined as the difference IXðÞ ; Y between H(A) + H(B) and H(A,B): IXðÞ ; Y ¼ ; ð13Þ maxðÞ IXðÞ ; Y IAðÞ ; B ¼ HAðÞþHBðÞ−HAðÞ ; B ð5Þ IXðÞ ; Y jZ N N IXðÞ ; Y jZ¼ ; ð14Þ a b X X pa ; b i j maxðÞ IXðÞ ; Y jZ IAðÞ ; B ¼ pa ; b ð6Þ i j paðÞpb i j i¼1 j¼1 where maximal(MI) and maximal(CMI) were the MI and CMI values when Y was totally dependent on X. A higher MI represents a greater connection between Then, the normalized MI and CMI value were between the events. 0and 1. To further study the dependence within three or more variables, conditional mutual information (CMI) is intro- Significance level determination duced to assess the exclusive dependence between any pairs To determine the significance level of our MI and CMI of variables given the value of a third one. CMI can distin- examination, we used random permutation and Fisher’s guish pairs directly from indirectly connected. The condi- Z statistics to calculate the z-score and p value [19]. We tional mutual information (CMI) can be calculated by: randomly shuffled the vectors X and Y many times and X X X got the correlation r between random X,Y (CMI). Then p ðÞ z p ðÞ x; y; z Z X;Y ;Z IXðÞ ; Y jZ¼ p ðÞ x; y; z log X;Y ;Z we transformed r to z by: p ðÞ x; z p ðÞ y; z X;Z Y ;Z z∈Z y∈Y x∈X ð7Þ z ¼ :5l½ nðÞ 1 þ r − lnðÞ 1−r ð15Þ or in terms of entropy: The confidence interval would be: Tong et al. Genome Biology (2018) 19:73 Page 12 of 17 Identification of the direct regulatory elements by MI/CMI z  zσ 0 ð16Þ based PC-algorithm For genes being tested, we identified the DREs by the following steps: Here, the σ is the standard deviation of z. We used the observed X,Y value to get the observed CMI value 1. Data preparation. We selected neighboring elements and transformed it into Z-value. The Z-score was calcu- (i.e. messenger RNA expression and CpG probe lated by Z score = (Z value-z’)/σ . And the p value was methylation) of a target gene within a genomic calculated by 2*pnorm(−|Zscore|). range (default ± 300 kb from TSS of the gene) and integrated the data value for these selected elements Differential expression analysis (e.g. expression and methylation value). The final We used the Voom method to normalize the RNA-seq result was a data matrix in which columns data and calculated the gene differentially expressed be- correspond to samples and rows to variables (i.e. tween tumor and normal by limma package [31]. We se- gene or CpG probes). We chose genomic range ± lected the differentially expressed genes (DEGs) by the 300 kb since it was reported that the enhancer- cut-off of |log2FC| > 0.58 (i.e. fold change cut-off either promoter interactions peak around 120 kb upstream upregulation > 1.5-fold or downregulation at least of the TSS [34]. 1.5-fold) and adjusted P < 0.01, and the DEG list was 2. Identification of DREs for the gene on test. We used used for downstream analysis, e.g. identification of the the network inference method called PC algorithm corresponding DREs and master regulator analysis. to infer the regulatory network based on the MI/ CMI connections [12]. The PC algorithm is Master regulators analysis (MRA) computationally feasible and very efficient for Master regulators (MRs) control a large number of down- sparse connections frequently encountered in stream targets that play important roles in cancer stage biological networks. The result returned an adjacent transition. Here, we exploited a classical strategy to identify matrix representing the direct connected edges. MRs for cancer initiation (paired tumor vs normal samples) First, we assumed all nodes connected by default to and progression (late-stage [IV] vs early stage [I] samples). generate a completely connected graph between all The basic framework contains two parts: (1) based on can- genes and all CpG probes within the genomic range cer specific gene expression profile, transcriptional targets (default ± 300 kb from TSS of the gene). Second, (termed as regulon) of TFs are inferred using ARACNe MI was calculated for any node pair, e.g. node i and [32] with default parameters. Data processing inequality j based on their values in samples. Third, the edge (DPI) was set to reduce the number of indirect connections; between i and j will be kept in the network only if (2) gene set enrichment analysis with R gage package [33] their MI passes the significance testing (cut-off is conducted to evaluate whether the regulon of TFs is p < 0.01). Fourth, all of the common partners (k) for enriched in the signature of cancer-related phenotype tran- the i and j pair surviving last test will be used to sition (ranked gene list using t value from differential ex- calculate the CMI(i,j|k), which can distinguish if i-j pression analysis). Specifically, the regulon genes of a TF connection is conditional on variable k. Fifth, we are divided into positive (+) and negative (−)groupsbased generated a directly connected network in an on the Spearman’s correlation coefficients between the ex- adjacent matrix after deletion of these indirectly pression level of the TF and each gene in its regulon. Then, connected edges. Herein, a mutual information two runs of gene set enrichment analysis are carried out to cutoff (MI > 0.1 bits) was used to remove weak determine the MR is activated (i.e. oncogenic) or repressed connections. Finally, we generated a list of the (i.e. tumor-suppressor): run 1 regulon (+) in from the up- DRE-target pairs that were directly connected. regulated side and regulon (−) from the downregulated 3. Classification of the DRE-target pairs. The DREs were side;run 2regulon (+)infromthe downregulatedside and classified based on the target gene expression (up- or regulon (−) from the upregulated side. In each run, the en- downregulation in tumors), direction of correlation richment q-values are calculated by Fisher’smethod. Regu- with its target gene expression (positive or negative), lon(+) of a gene is also called positive neighbors and and the distance from the TSS of its target gene. regulon(−) of a gene is called negative neighbors in this DREs locating within ± 2000 bp of the TSS of its paper. Whichever of the two runs gives the more significant target genes were classified as the promoter DREs q-value is used as the final q-value; the MR is predicted as and otherswereclassifiedasdistalDREs. oncogenic(theqvalueinrun 1<the q valueinrun 2) or tumor-suppressive (the q-value in run 1 > the q-value in The MICMIC pipeline can be adjusted to handle genomic run 2) correspondingly. range beyond ± 300 kb. We chose genomic range ± 300 kb Tong et al. Genome Biology (2018) 19:73 Page 13 of 17 here since it was reported that the enhancer-promoter in- scores for each genomic feature on genomic regions teractions peak around 120 kb upstream of the TSS [34]. 6000 bp flanking each DRE, then got the average score for Examples of the indirectly correlated CpG-gene all DREs from the same cancer cell line. pairsrejectedbyour method arepresented in Additional file 1:FigureS3c.The deregulationsofthe Precision of DRE-target pairs CIMP genes are controlled by the hypermethylation We computed the precision of DRE-target pair predictions of genome-wide CpG islands and the strongly corre- by comparing them to the enhancer-promoter pairs lated CpGs rejected by our method showed no correl- (EP-pairs) predicted by chromatin interactions derived from ation in the non-CIMP samples. IM-PET, ChIA-PET, Hi-C, and RAD21-cohesin. These tools mainly detect active enhancers with enrichment of active his- Mapping chromatin state of DREs by ChromHMM 18-state tone marks, such as H3K4me1, H3K4me3, and K3K27Ac, model which were confirmed to be enriched in our DREs negatively To annotate the DREs, we downloaded chromHMM correlated its targets (Fig. 2d). Other studies show that active 18-state data of HMEC breast epithelial cells (E119), enhancers with low DNA methylation tend to have gene tar- HeLaS3 cervix cancer cells (E117), colon tissue (E106), gets with high expression [35–38]. DREs positively correlated HepG2 cells (E118), A549 lung cancer cells (E114), and with its targets were enriched for genomic repressive regions gastric tissue (E094) from the ROADMAP Epigenomics and TF repressors (Fig. 2c and Additional file 1:Figures S2 Project. We counted the number of DREs overlapping and S14), but not enriched for active histone marks. This with each chromatin state. For each chromatin state, the suggested that DREs positively correlated its targets may use enrichment fold change and significance were computed different mechanism to indirectly regulate gene expression. by hypergeometric testing using the total CpG probes Herein, we only considered DREs negatively correlated with on HM450 array as control. its targets for further analysis, similar to other studies [9, 10]. We used the hypergeometric test to calculate the stat- A predicted DRE-target pair will be counted as confirmed if istical significance of the over-represented chromatin its DRE and target gene overlapped the two ends of an inter- state for the DREs. We assigned N as the total number action from the IM-PET, ChIA-PET, HiC, or of probes in the HM450 array and K as the number of RAD21-cohesin data [23]. The precision result was similar probes overlapping with the chromatin state under test, but superior to the result obtained through other methods n as the number of DREs from N probes that can regu- (e.g. ELMER [9]) (Additional file 1:FigureS3b). Of note,our late its target genes, and x as the number of DREs over- method output many more negatively correlated EP pairs lapping with the chromatin state under test. The compared with ELMER (Additional file 1:FigureS3).The enrichment fold change was calculated as ratio between datasets of IM-PET, ChIA-PET, and HiC were downloaded x/n and K/N. The over-enrichment of chromatin states from the 4DGenome database [39]. A supplement of HiC in DREs was calculated with hypergeometric distribution. data was downloaded from GEO (GSE63525) and ChIA-PET data were downloaded from ENCODE Histone modifications, sequence conservation, and DNase (ENCSR436IAJ). We used the similar procedure [23]tocon- I hypersensitivity duct the RAD21-cohesin interaction analysis (termed as In order to systematically benchmark the DREs we identi- CNC), which used ChIP-Seq data to find pairs of cohesin fied, we collected epigenomic data of various human cells binding-sites that do not contain CTCF sites. The ChIP-Seq and tissues from the ENCODE Project (Additional file 9: datasets of CTCF and RAD21 were downloaded from EN- Table S8). We downloaded chromatin marks including CODE (ENCFF095BZW, ENCFF001TTK, ENCFF001UNO, histone modifications of H3K4me1, H3K4me3, H3K9me3, ENCFF059UOO, ENCFF594DJD, ENCFF001XLM, ENCFF H3K27me3, H3K27ac, and p300 ChIP-seq signals to 001TTJ, ENCFF001TTK, ENCFF001VDS). evaluate the enhancer activity of distal DREs, from breast cancer cells (MCF-7), colon cancer cells (HCT116), cer- Comparing MICMIC with other methods vical cancer cells (HeLa-S3), liver cancer cells (HepG2), We used IM-PET 23,106 EP interaction pairs between and lung cancer cells (A549). The enrichment of histone 5311 CpG probes and 344 genes as positive control marks at the distal DREs derived from TCGA cancer co- and tested the precision of EP prediction from patient horts was calculated with the epigenome profiling data data by four methods: MICMIC; ELMER; BNstruct from the corresponding cell lines or tissues. To evaluate (Bayesian Network Structure Learning) [40]; and the status of evolutionary conservation, we obtained the NEO2 (Network Edge Orienting (NEO) Software) 100-way PhastCons conservation data to calculate the [41]. All the methods were applied on the expression conservation score for the distal DREs in each cancer. We and methylation data from the same patient cohort of have also tested DNase I hypersensitivity data from TCGA liver cancer. The MICMIC EP prediction was MCF7, HelaS3, A549, and HepG2. We calculated the ranked by the normalized mutual information and Tong et al. Genome Biology (2018) 19:73 Page 14 of 17 conditional mutual information. The ELMER EP pre- and C706D mutations) by point mutagenesis with primers: diction was ranked by the empirical p value (Pe). The Dmt3a-muP705-Forward, GGC AGT GTC GAC AAT BNstruct EP prediction was ranked by the confidence GAC CTC TCC ATT GTC AAC CCT G; threshold (alpha). The NEO2 EP prediction was Dmt3a-muP705-Reverse, TCA TTG TCG ACA CTG CCT ranked by edge orienting score (LEO.NB.OCA). The CCA ATC ACC AGG, with sequencing confirmation. precision rates were calculated and compared when Putative distal regulatory regions and promoters of the tar- selecting the same number of top ranked EP pairs get genes were amplified from human genomic DNA (see from different methods. Additional file 4: Table S3 for primer sequences used in cloning) and inserted into the pGL3-basic vector (Promega). Cell culture For dCas9-TET1 targeting, we used these plasmids: Gastric cancer cell line AGS was from ATCC and liver pCAG-dCas9-5xPlat2AflD and pCAG-scFvGCN4sfGFP cancer cell lines BEL-7402 and PLC8024 were obtained TET1CD (Addgene plasmid #82560 and #82561, gifts from the Institute of Virology of the Chinese Academy of from Izuho Hatada). We generated catalytically inactive Medical Sciences (Beijing, China). AGS cells were cul- TET1 with H1671Y and D1673A mutations with primers: tured in RPMI-1640 medium (Gibco) supplemented with Tet1-muH1671-Forward, TCC CTA CAG GGC CAT 10% fetal bovine serum (HyClone) and 1% Anti-Anti TCA CAA CAT GAA TAA TGG AAG CAC TG; and (Gibco). BEL-7402 and PLC8024 cells were cultured in Tet1-muH1671-Reverse, AAT GGC CCT GTA GGG DMEM medium (Gibco) supplemented with 10% fetal bo- ATG AGC ACA GAA GTC CAG, with sequencing vine serum and 1% Anti-Anti. The AGS cell line can be ef- confirmation. fectively transiently transfected with efficiency > 50% with Before we decided to use single sgRNA to target one dis- lipofectamine3000. We selected liver cancer cell line tal DRE, we tested two or three sgRNAs in combination to BEL-7402 to test the effect of downregulation of tumor target one distal DRE. However, there is no difference for suppressors, such as KLF9, APOA1, APOC3, and the dCas9 targeting effect on the target gene expression. CBFA2T3. We used liver cancer cell line PLC8024, a more aggressive one compared with BEL-7402, to test the effect Transfections and control design of downregulation of oncogenes, such as HDCA11, All transfections were done with lipofectamine 3000 CDT1, NDRG1, TKT, TK1, BIRC5, and SLC16A3. (Invitrogen) according to the manufacturer’s instruc- tions. The ratios of co-transfected plasmids were as fol- RNA purification and qPCR lows: 1 gRNAs: 2 px330-dCas9: 1 pcDNA3-Dnmt3A-3 L Total RNA was purified using the method described pre- (test) or pcDNA3 (control) for qPCR; 1 gRNAs: 1 viously [42], followed by treatment with RNase-free pCAG-dCas9-5xPlat2AflD: 1 pCAG-scFvGCN4sfGFP DNaseI (NEB). RevertAid RT Reverse Transcription Kit TET1CD (test) or pcDNA3 (control) for qPCR; 19 (Thermo) was used to perform the first strand cDNA pGL3-promoter or pGL3-promoter-enhancer: 1 pRL-TK synthesis according to the manufacturer’s instructions. for luciferase assay; and 5 gRNAs: 10 Px330-dCas9: 5 For qPCR analysis, cDNA was subjected to quantifica- pcDNA3-Dnmt3A-3 L (test) or pcDNA3 (control): 19 tion by iTaq Universal SYBR green supermix (Bio-Rad). pGL3-promoter or pGL3-promoter-enhancer: 1 pRL-TK for luciferase assay. Plasmids and cloning Above “pcDNA3 (control)” is a control for dCas9 target- Catalytic domains of Dnmt3a and Dnmt3l were ampli- ing, in which dCas9 co-transfected with empty pcDNA3 fied from mouse cDNA and were fused to form without DNMT3A-3 L/TET1. The same conclusion as Dnmt3A-3 L. PUFa from pAC1405-pCR8-4xNLS shown in Figs. 3d and 4c can be reached by using scram- _PUFa_2xNLS (Addgene #71903) were fused with bled sgRNA as the control for the qPCR test (Additional Dnmt3a-3 l into vector of pcDNA3-Flag-HA (Addgene file 1: Figure S8c). dCas9 targeting specificity was con- #10792, a gift from William Sellers). gRNAs were cloned firmed with off-target test by bisulfite sequencing of into pAC1371-pX-sgRNA-5xPBSa (Addgene #71888, non-targeted sites (WNT5B-sgRNA in Additional file 1: Additional file 4: Table. S3 for sgRNA sequences). Figure S7a vs Fig. 3c, and NDRG1-sgRNA in Additional pAC1405-pCR8-4xNLS_PUFa_2xNLS and pAC1371-pX file 1: Figure S7b). dCas9 targeting specificity was also -sgRNA-5xPBSa were gifts from Albert Cheng (Addgene confirmed with qPCR quantifying other non-targeted plasmid #71888, Addgene #71903). dCas9 expression genes with WNT5B-sgRNA (Additional file 1:Figure S7c). plasmid was generated by replacement of the cas9 with Furthermore, we performed experiments by using “untar- dCas9 cassette in px330 vector (px330, Addgene plasmid geted” or catalytically inactive DNMT3A-3 L/TET1 to #42230, a gift from Feng Zhang; 3xFLAG-dCas9/ rule out the possibility of off-target due to overexpression pMXs-neo Addgene plasmid #51260, a gift from Hodaka DNMT3A-3 L/TET1 (Additional file 1: Figure S8a,b). The Fujii). We generated catalytically inactive Dnmt3a (P705V “untargeted” constructs were generated by removal of the Tong et al. Genome Biology (2018) 19:73 Page 15 of 17 PUFa linker from DNMT3A-3 L-fusion, or removal of positive-up, or positive-down), we counted the number of scFv linker from TET1-fusion (Additional file 1: DREs containing the binding site of the TF being tested, de- Figure S6). For these “untargeted,” catalytically active noted as variable “a” below and variable “b” fornumberof DNMT3A-3 L/TET1 was overexpressed but targeted DREs not containing the TF being tested. For the entire to nowhere due to the deletion of “linker” domain. The DREs combining the four subgroups, we can also get simi- “untargeted” or catalytically inactive DNMT3A-3 L/TET1 larnumberas “c” and “d” for containing and not containing did not result in any significant change of the target gene the TF being tested, respectively. Calculation of the enrich- expression (Additional file 1: Figure S8a, b). ment odds ratio (OR) and a 95% confidence interval (CI) was conducted with the following formulas: cDNA cloning and overexpression in lentivirus OR ¼ðÞ a=c =ðÞ b=d We cloned HDAC11, WNT5B, and MLEC cDNA from human cDNA library. We then inserted each pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi CI ¼ exp logðÞ OR  1:96 1=a þ 1=b þ 1=c þ 1=d cDNA into lentiviral expression vector lenti-Blast, modified from lentiCas9-Blast (Addgene #52962, a gift from Feng Zhang). The lentivirus was packed with We then filtered TFs with an OR > 1.05 as the plasmids pMD2.G and psPAX2 after co-transfection enriched TFs in each DRE category. into 293 T cells. All cDNAs have been confirmed by DNA sequencing. Evaluate the tissue specificity of genes We downloaded the gene expression data of human tis- Dual luciferase assay sues from GTEx (GTEx V6 dataset) [44]. We used the The Dual-Luciferase Reporter Assay System (Promega) Voom method to normalize the data and limma [31]to was used in dual luciferase assay according to the manu- identify the differential expression genes comparing facturer’s instructions. samples of one tissue against all other tissues. Genes passing the threshold, log2 transformed Fold-Chang Migration assay >0.58 or< − 0.58 and adjusted p value < 0.01, were 5 6 4×10 of AGS cells or 1 × 10 BEL-7402 and PLC8024 identified as the tissue specific ones. cells were used to conduct migration assay using the 12-Well Chemotaxis Chamber (Neuro Probe) according Enrichment of transcription repressors to the manufacturer’s instructions. We searched the AmiGO database [45]with the key words “transcription repressor” and “negative regulation” to obtain Cell proliferation assay a list of genes related to the transcriptional repression CCk-8 (Dojindo) was used to perform cell proliferation process and collected the repressor information from assay following the manufacturer’s instructions. GO:0017053, GO:0090571, GO:0001206, GO:0001227, GO:0001191, GO:0000900, GO:0070491, GO:0070176, 5-aza-deoxycytidine treatment GO:0003714,GO:0032785, GO:2000143, GO:1903507, and AGS, BEL-7402, and PLC8024 cell lines were treated GO:0001078. These gene sets include transcriptional re- with 10 μM 5-aza-dC (Sigma-Aldrich) for 48 h, followed pressor activity, translation repressor activity, and transcrip- by RNA purification and qRT-PCR as described. DMSO tion repressor complex. Enrichment of transcription was used as a control to establish baseline expression. repressor of TFs associated with distal DREs was conducted by hypergeometric analysis. Identification of enriched transcription factor bindings For a distal DRE-target pair, a TF is considered a Discovery of core transcriptional regulatory circuitry regulator of the target if the cognate binding motif Core regulatory circuitry (CRCs) is formed by a set of of this TF can be found on the ± 250-bp genomic re- key TFs associated with super-enhancers (SEs) in gions flanking the DRE. To identify TFs associated regulating tumor-subtype specific gene expression and with the ± 250-bp genomic region flanking each maintaining tumor cellular identity. The TFs in each DRE, we used TFs from the Mocap database, con- CRC are auto-regulated by themselves via binding taining genomic mapping for 823 TFs [43]with sites on their corresponding SE. The TFs can also binding quality. Stringent cut-off (p value < 1e-5) cross-regulate each other by forming an intercon- was applied to select the TF binding sites. Mocap nected loop via cognate binding sites on other TFs’ method is an integrated classifier that assembles motif related SEs. Based on this information, we took ad- scores, chromatin accessibility, TF footprints, evolutionary vantageof thegenome-wideinformation on distal conservation, and other factors to predict TF bindings. For DRE-targets generated from our MICMIC method to each DRE category tested (negative-up, negative-down, assemble the CRCs regulated by DNA methylation for Tong et al. Genome Biology (2018) 19:73 Page 16 of 17 each cancer type. The information for SEs of human Additional file 8: Table S7. Cosmic non-coding variations neighboring genome hg19 was downloaded from dbSUPER [46]. LIHC DRE. (XLSX 187 kb) First, we selected the distal DREs overlapping with Additional file 9: Table S8. Data source of histone marks and DNase I. (XLSX 10 kb) SEs and identified the enriched TFs (OR > 1.05, CI = 95%) associated with these distal DREs. We then pre- Abbreviations dicted the auto-regulatory loops with the following DREs: Direct regulatory elements; MRA: Master regulator analysis; TCGA: The criterion: the TF on test is under regulation of distal Cancer Genome Atlas; TF: Transcription factor; TSS: Transcription start sites DREs with binding sites for TF itself. Cross-regulation between a pair of TFs can be inferred if the cognate Funding This work was supported by the Research Grants Council of Hong Kong binding site of one TF can be found on the other grants, HKU 17127014 General Research Fund (JZ) and HKU T12–710/16R TF’s related SE. After putting together all of the auto- Theme-based Research Scheme (SYL and JZ). and cross-regulations, we generated an interconnected Availability of data and materials CRCs eventually. TCGA methylation and RNA-seq data are available from BROAD GDAC Firehose ( http://gdac.broadinstitute.org/runs/stddata__2016_01_28/)[48]. The datasets of IM-PET, ChIA-PET, and HiC were downloaded from the TF targets and downstream cancer pathway analysis 4DGenome database (https://4dgenome.research.chop.edu/Tables/4DGeno- As mentioned before, for a distal DRE-target pair, a TF me_full.txt)[39]. The histone and DNase I signals can be accessed from ENCODE (https:// is considered a regulator of the target if the cognate www.encodeproject.org/matrix/?type=Experiment)[49] with accession ID binding motif of this TF can be found on the surround- listed in Additional file 9: Table S8. ing regions of the DRE (± 250 bp). After identification of The MICMIC software is available from https://github.com/ZhangJlab/MICMIC, under Creative Commons Attribution 4.0 license [50]. The DOI for source code the targets for CRCs, we conducted enrichment analysis used in this article is https://doi.org/10.5281/zenodo.1220929 [51]. for the downstream pathways. Enrichment analysis of KEGG cancer pathways was conducted to identify the Authors’ contributions pathway targeted by CRC TFs highly expressed in each YT performed the computational analyses with the aid of BR. JS performed experiments with the aid of CW, QK, and CNW. YT and JZ analyzed the data. SYL tumor subtype (cut-off p value < 0.05). and ASC provided support and critical comments to the manuscript. JZ conceived of, designed, and directed the study. YT and JZ wrote the manuscript with help from all authors. All authors read and approved the final manuscript. TF network decomposition and network motif identification Ethics approval and consent to participate The TF network mediated by distal DREs was derived Not applicable. from genome-wide DRE-target information predicted by Competing interests MICMIC after removal of non-TF genes. For network The authors declare that they have no competing interests. motif analysis, we used the mfinder software [47] to dis- assemble the TF network. On average across the 11 can- Publisher’sNote cer types, the TF networks were decomposed into 1.85 Springer Nature remains neutral with regard to jurisdictional claims in million three-node subgraphs with 13 types of published maps and institutional affiliations. three-node network motifs identified. Relative enrich- Author details ment or depletion of each of the 13 basic network motifs School of Biological Sciences, The University of Hong Kong, Hong Kong, within each cancer was calculated. Two hundred ran- Hong Kong. Department of Pathology, The University of Hong Kong, Queen Mary Hospital, Pokfulam, Hong Kong. domized same-size networks were used as the random control and the significance Z-score was calculated Received: 27 October 2017 Accepted: 3 May 2018 (Z > 2 considered as enriched and Z < −2asdepleted). References Additional files 1. Hansen KD, Timp W, Bravo HC, Sabunciyan S, Langmead B, McDonald OG, et al. Increased methylation variation in epigenetic domains across cancer types. Nat Genet. 2011;43:768–75. Additional file 1: Supplementary figures. (DOCX 10287 kb) 2. Baylin SB, Jones PA. A decade of exploring the cancer epigenome—biological Additional file 2: Table S1. Barcode of TCGA samples. (XLSX 81 kb) and translational implications. Nat Rev Cancer. 2011;11:726. 3. Sproul D, Nestor C, Culley J, Dickson JH, Dixon JM, Harrison DJ, et al. Additional file 3: Table S2. DREs identified from 11 cancers. Transcriptionally repressed genes become aberrantly methylated and (XLSX 5825 kb) distinguish tumors of different lineages in breast cancer. Proc Natl Acad Sci Additional file 4: Table S3. sgRNA-primer of DRE tested. (XLSX 11 kb) U S A. 2011;108:4364–9. Additional file 5: Table S4. Category of methylation change. 4. Hovestadt V, Jones DT, Picelli S, Wang W, Kool M, Northcott PA, et al. (XLSX 5313 kb) Decoding the regulatory landscape of medulloblastoma using DNA methylation sequencing. Nature. 2014;510:537–41. Additional file 6: Table S5. Interconnected loops in core regulatory 5. Bernstein DL, Le Lay JE, Ruano EG, Kaestner KH. TALE-mediated epigenetic circuits. (XLSX 40 kb) suppression of CDKN2A increases replication in human fibroblasts. J Clin Additional file 7: Table S6. TF_network_motifs. (XLSX 11 kb) Invest. 2015;125:1998–2006. Tong et al. Genome Biology (2018) 19:73 Page 17 of 17 6. Cui C, Gan Y, Gu L, Wilson J, Liu Z, Zhang B, et al. P16-specific 34. Sanyal A, Lajoie BR, Jain G, Dekker J. The long-range interaction landscape DNA methylation by engineered zinc finger methyltransferase of gene promoters. Nature. 2012;489:109–13. inactivates gene transcription and promotes cancer metastasis. 35. Rada-Iglesias A, Bajpai R, Swigut T, Brugmann SA, Flynn RA, Wysocka J. A Genome Biol. 2015;16:252. unique chromatin signature uncovers early developmental enhancers in 7. Baylin SB. DNA methylation and gene silencing in cancer. Nat Clin Pract humans. Nature. 2011;470:279–83. Oncol. 2005;2:S4–S11. 36. Creyghton MP, Cheng AW, Welstead GG, Kooistra T, Carey BW, Steine EJ, 8. Aran D, Sabato S, Hellman A. DNA methylation of distal regulatory sites et al. Histone H3K27ac separates active from poised enhancers and predicts characterizes dysregulation of cancer genes. Genome Biol. 2013;14:R21. developmental state. Proc Natl Acad Sci U S A. 2010;107:21931–6. 37. Stadler MB, Murr R, Burger L, Ivanek R, Lienert F, Scholer A, et al. DNA- 9. Yao L, Shen H, Laird PW, Farnham PJ, Berman BP. Inferring regulatory binding factors shape the mouse methylome at distal regulatory regions. element landscapes and transcription factor networks from cancer Nature. 2011;480:490–5. methylomes. Genome Biol. 2015;16:105. 38. Burger L, Gaidatzis D, Schubeler D, Stadler MB. Identification of active 10. Bell RE, Golan T, Sheinboim D, Malcov H, Amar D, Salamon A, et al. Enhancer regulatory regions from DNA methylation data. Nucleic Acids Res. 2013; methylation dynamics contribute to cancer plasticity and patient mortality. 41:e155. Genome Res. 2016;26:601–11. 39. Teng L, He B, Wang J, Tan K. 4DGenome: a comprehensive database of 11. Lin X, Su J, Chen K, Rodriguez B, Li W. Sparse conserved under-methylated chromatin interactions. Bioinformatics. 2015;31:2560–4. CpGs are associated with high-order chromatin structure. Genome Biol. 40. Franzin A, Sambo F, Di Camillo B. bnstruct: an R package for Bayesian 2017;18:163. Network structure learning in the presence of missing data. Bioinformatics. 12. Kalisch M, Buhlmann P. Estimating high-dimensional directed acyclic graphs 2017;33:1250–2. with the PC-algorithm. J Mach Learn Res. 2007;8:613–36. 41. Aten JE, Fuller TF, Lusis AJ, Horvath S. Using genetic markers to orient 13. Kalisch M, Buhlmann P. Robustification of the PC-Algorithm for Directed the edges in quantitative trait networks: the NEO software. BMC Syst Acyclic Graphs. J Comput Graph Stat. 2008;17:773–89. Biol. 2008;2:34. 14. Cheng AW, Jillette N, Lee P, Plaskon D, Fujiwara Y, Wang W, et al. Casilio: a 42. Shatzkes K, Teferedegne B, Murata H. A simple, inexpensive method for versatile CRISPR-Cas9-Pumilio hybrid for gene regulation and genomic preparing cell lysates suitable for downstream reverse transcription labeling. Cell Res. 2016;26:254. quantitative PCR. Sci Rep. 2014;4:4659. 15. Morita S, Noguchi H, Horii T, Nakabayashi K, Kimura M, Okamura K, et al. 43. Chen X, Yu B, Carriero N, Silva C, Bonneau R. Mocap: large-scale inference of Targeted DNA demethylation in vivo using dCas9-peptide repeat and scFv- transcription factor binding sites from chromatin accessibility. Nucleic Acids TET1 catalytic domain fusions. Nat Biotechnol. 2016;34:1060–5. Res. 2017;45:4315–29. https://www.nature.com/articles/ncomms15943. 16. Liu XS, Wu H, Ji X, Stelzer Y, Wu X, Czauderna S, et al. Editing DNA 44. Consortium GT. The Genotype-Tissue Expression (GTEx) project. Nat Genet. Methylation in the Mammalian Genome. Cell. 2016;167:233–47. e217 2013;45:580–5. 17. Huang YH, Su J, Lei Y, Brunetti L, Gundry MC, Zhang X, et al. DNA epigenome 45. Carbon S, Ireland A, Mungall CJ, Shu S, Marshall B, Lewis S, et al. AmiGO: online editing using CRISPR-Cas SunTag-directed DNMT3A. Genome Biol. 2017;18:176. access to ontology and annotation data. Bioinformatics. 2009;25:288–9. 18. Basso K, Margolin AA, Stolovitzky G, Klein U, Dalla-Favera R, Califano A. 46. Khan A, Zhang X. dbSUPER: a database of super-enhancers in mouse and Reverse engineering of regulatory networks in human B cells. Nat Genet. human genome. Nucleic Acids Res. 2016;44:D164–71. 2005;37:382. 47. Kashtan N, Itzkovitz S, Milo R, Alon U. Efficient sampling algorithm for 19. Zhang X, Zhao X-M, He K, Lu L, Cao Y, Liu J, et al. Inferring gene regulatory estimating subgraph concentrations and detecting network motifs. networks from gene expression data by path consistency algorithm based Bioinformatics. 2004;20:1746–58. on conditional mutual information. Bioinformatics. 2011;28:98–104. 48. Broad Institute TCGA Genome Data Analysis Center. Analysis-ready 20. Ernst J, Kellis M. ChromHMM: automating chromatin-state discovery and standardized TCGA data from Broad GDAC Firehose 2016_01_28 run. Broad characterization. Nat Methods. 2012;9:215–6. Institute of MIT and Harvard. Dataset. 2016. https://doi.org/10.7908/C11G0KM9. 21. Charlet J, Duymich CE, Lay FD, Mundbjerg K, Sørensen KD, Liang G, et al. 49. Consortium EP. Identification and analysis of functional elements in 1% of Bivalent regions of cytosine methylation and H3K27 acetylation suggest an the human genome by the ENCODE pilot project. Nature. 2007;447:799. active role for DNA methylation at enhancers. Mol Cell. 2016;62:422–31. 50. Tong Y, Sun J, Wong CF, Kang Q, Ru B, Wong CN, et al. MICMIC: identification 22. Consortium EP. An integrated encyclopedia of DNA elements in the human of DNA methylation of distal regulatory regions with causal effects on genome. Nature. 2012;489:57–74. tumorigenesis. Github. https://github.com/ZhangJlab/MICMIC. (2018). 23. He B, Chen C, Teng L, Tan K. Global view of enhancer–promoter Accessed 19 Apr 2018. interactome in human cells. Proc Natl Acad Sci. 2014;111:E2191–9. 51. Tong Y, Sun J, Wong CF, Kang Q, Ru B, Wong CN, et al. MICMIC: 24. Fullwood MJ, Liu MH, Pan YF, Liu J, Han X, Mohamed YB, et al. An oestrogen identification of DNA methylation of distal regulatory regions with causal receptor α-bound human chromatin interactome. Nature. 2009;462:58. effects on tumorigenesis. zenodo. https://zenodo.org/record/1220929#. 25. Hnisz D, Abraham BJ, Lee TI, Lau A, Saint-André V, Sigova AA, et al. Super- WuH5WC7wbIU (2018). Accessed 19 Apr 2018. enhancers in the control of cell identity and disease. Cell. 2013;155:934–47. 26. Whyte WA, Orlando DA, Hnisz D, Abraham BJ, Lin CY, Kagey MH, et al. Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell. 2013;153:307–19. 27. Network CGA. Comprehensive molecular portraits of human breast tumors. Nature. 2012;490:61. 28. Milo R, Itzkovitz S, Kashtan N, Levitt R, Shen-Orr S, Ayzenshtat I, et al. Superfamilies of evolved and designed networks. Science. 2004;303:1538–42. 29. Neph S, Stergachis AB, Reynolds A, Sandstrom R, Borenstein E, Stamatoyannopoulos JA. Circuitry and dynamics of human transcription factor regulatory networks. Cell. 2012;150:1274–86. 30. Thurman RE, Rynes E, Humbert R, Vierstra J, Maurano MT, Haugen E, et al. The accessible chromatin landscape of the human genome. Nature. 2012;489:75. 31. Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43:e47. 32. Lachmann A, Giorgi FM, Lopez G, Califano A. ARACNe-AP: gene network reverse engineering through adaptive partitioning inference of mutual information. Bioinformatics. 2016;32:2233–5. 33. Luo W, Friedman MS, Shedden K, Hankenson KD, Woolf PJ. GAGE: generally applicable gene set enrichment for pathway analysis. BMC Bioinformatics. 2009;10:161. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Genome Biology Springer Journals

MICMIC: identification of DNA methylation of distal regulatory regions with causal effects on tumorigenesis

Free
17 pages

Loading next page...
 
/lp/springer_journal/micmic-identification-of-dna-methylation-of-distal-regulatory-regions-YyCWfGlvJM
Publisher
Springer Journals
Copyright
Copyright © 2018 by The Author(s).
Subject
Life Sciences; Animal Genetics and Genomics; Human Genetics; Plant Genetics and Genomics; Microbial Genetics and Genomics; Bioinformatics; Evolutionary Biology
eISSN
1474-760X
D.O.I.
10.1186/s13059-018-1442-0
Publisher site
See Article on Publisher Site

Abstract

Aberrant promoter methylation is a common mechanism for tumor suppressor inactivation in cancer. We develop a set of tools to identify genome-wide DNA methylation in distal regions with causal effect on tumorigenesis called MICMIC. Many predictions are directly validated by dCas9-based epigenetic editing to support the accuracy and efficiency of our tool. Oncogenic and lineage-specific transcription factors are shown to aberrantly shape the methylation landscape by modifying tumor-subtype core regulatory circuitry. Notably, the gene regulatory networks orchestrated by enhancer methylation across different cancer types are seen to converge on a common architecture. MICMIC is available on https://github.com/ZhangJlab/MICMIC. Keywords: DNA methylation, Enhancer, Bioinformatics, Cancer, Information theoretic approaches, Epigenetic editing Background methylation may be similar to somatic mutations in can- Appropriate DNA methylation patterns are critical for cer, in which only a subset of events is causal or “drivers,” (epi)genomic stability and gene expression regulation [1]. while most are “passengers.” To identify the subset that In particular, it is well established that promoter hyperme- are causal, we need solutions that enable us to: (1) thylation is a common epigenetic mechanism for tumor genome-wide identify causal DNA methylation of en- suppressor inactivation in cancer [2]. However, many hancers and its gene targets in pan-cancers in an unbiased genes lowly expressed in normal samples were not differ- manner; and (2) directly validate a specific methylation entially expressed with differentially methylated promoter event on the putative enhancer by experimentation. [3, 4]. Some genes have been verified to be regulated by Pharmacological inhibition of DNA methylation with the aberrant promoter methylation with a causal effect on drug 5-azacitidine is commonly used for experimental val- tumorigenesis, including CDKN2B, CDKN2A, RB, APC, idation, but it induces genome-wide DNA demethylation BRCA1, and MLH1 [5–7]. Recently, DNA methylation of without specificity. enhancers in various cancers has been under intense study In this study, we designed a set of tools for identifying [4, 8–11]. However, its exact role and whether it is merely genome-wide DNA methylation of distal regulatory sites a marker of malignancy or a causal factor is largely un- that result in a causal effect on tumorigenesis. De novo known. Some of these studies focused on well-annotated enhancers/silencers and its direct gene targets were in- enhancer regions. However, the annotated enhancer sites ferred by information theoretic approaches [12, 13] and are mainly derived from the epigenome profiling of lim- validated with the emerging CRISPR/dCas9 epigenetic ited cell lines or tissues, lacking an in-depth coverage of editing [14–17] technique. Information theoretic ap- distal regulatory sites in patient cancer samples. DNA proaches have been proved effective to distinguish the direct from indirect connection in other applications with solid mathematical proof [18, 19]. Strikingly, we * Correspondence: jzhang1@hku.hk have found that the modulation of DNA methylation on Yin Tong and Jianlong Sun contributed equally to this work. School of Biological Sciences, The University of Hong Kong, Hong Kong, distal regulatory sites by dCas9-DNMT3A-3 L has pro- Hong Kong found effect on cancer cell behavior similar to promoter Full list of author information is available at the end of the article © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Tong et al. Genome Biology (2018) 19:73 Page 2 of 17 methylation, e.g. cell migration and proliferation altered TCGA level 3 datasets for various cancers, encompassing along with target gene expression change, even though HumanMethylation450 array and RNA-sequencing the distal regulatory site 200 kb away. By contrast, (RNA-seq) data. As an example, in the TCGA gastric can- dCas9-TET1 has the opposite effect on its target gene cer cohort (STAD) for the gene CDCA5, we identified ten expression. Our strategy recovered many known en- DREs associated with CDCA5 expression, with four of hancers and unannotated regulatory sites from different them > 240 kb away from the TSS of CDCA5 (Fig. 1c). cancer types, differential methylation of which regulated Subsequently, we successfully experimentally verified one known or novel tumor-suppressor/oncogene with causal of these DREs, cg02933228, which will be discussed fur- effect on cell malignancy and patient survival. Furthermore, ther below. The false discovery rate (FDR) for MICMIC our study also provides mechanistic insight on how DNA was 0.05 based on simulation testing (Fig. 1d). methylation of distal regulatory sites is critical for the main- tenance of tumor cell identity and malignancy with gene Genomic features enriched in distal regulatory network perspective. interactions identified by MICMIC From analysis of 11 different cancer types from the TCGA Results datasets, the number of DREs was in the range of 2192– Pipeline for MICMIC to infer methylation regulation 13,027 (total 73,255) and the number of DRE-target pairs networks was in the range of 2234–13,570 (total 80,334). Of To identify driver methylation events during tumorigen- DRE-target pairs, 57.4% were cancer specific and 42.6% esis, we developed a strategy based on information theor- shared by more than one cancer type. A total of 55,993 etic approaches to distinguish the direct from indirect DREs that were > 2 kb away from the TSS were termed dis- correlation between the methylation of CpG probes and tal DREs, similar to a previous study [9]. Of the promoter the expression of its potential gene targets. Our method, DREs (≤ 2 kb), 88.8% were negatively correlated with their “Methylation Regulation Network Inference by Condi- target genes (Fig. 2a, b), among which the majority were tional Mutual Information Based PC-algorithm” (MIC- downregulated (Additional file 1:FigureS1).The percentage MIC), is composed of three layers. The bottom layer uses of negative and positive correlations for distal DRE-target conditional mutual information (CMI) to determine the pairs were 37.9% and 62.1%, respectively (Fig. 2a, b). To dependence relationship between three nodes, genes, and/ identify enriched genomic features, we used the ENCODE or CpG probes (Fig. 1a). If variables X and Y are con- ChromHMM 18-state models to annotate the distal DREs nected only via A, then CMI(X,Y|A) will be close to zero, for 6/11 cancer types based on the availability of the corre- indicating that there is no direct connection between X sponding cell line data (see “Methods”)[20]. Of the six and Y. The middle layer uses a path consistency algorithm tested, all of the distal DREs negatively correlated with its (PC-algorithm) to infer the regulatory network that in- targets were enriched (p value < 0.01) in two or more en- cludes all nodes (Fig. 1a). To start with, all nodes are con- hancer regions (EnhG1, EnhG2, EnhA1, EnhA2, EnhWk), sidered connected and each edge is tested by CMI based suggesting that methylation of an enhancer could on the observed data. The final network emerges after all negatively regulate target gene expression (Fig. 2c and false positive connections are eliminated. Finally, in the Additional file 1: Figure S2). On the contrary, all of the top layer, MICMIC identifies each CpG probe and its dir- DREs positively correlated with its targets were enriched ect target(s) as a pair, denoted here as a DRE-target pair in one or two of the repressor regions (ReprPC, (DRE, direct regulatory elements) (Fig. 1b). Since many ReprRCWk), but not in the enhancer regions (Fig. 2c and methylation events are merely a consequential effect of Additional file 1: Figure S2). Bivalent Enhancers (EnhBiv), the cancerous state rather than being causal, MICMIC first reported in stem cells [21], were enriched in both was purposely designed not to call differentially methyl- negatively and positively correlated DREs. We then ated regions. To identify DRE-target pairs relevant to compared both negatively and positively correlated DREs tumorigenesis, we focused on genes that were determined for the enrichment of active chromatin marks (H3K27ac, to be essential for tumorigenesis by differential expression H3K4me1, p300, and DNase I hypersensitivity) and re- test and master regulator analysis (MRA), which was pressive marks (H3K9me3 and H3K27me3). We observed designed to quantify the enrichment of cancer signature strong enrichment of active marks around the negatively genes among the regulatory neighbors of the target gene correlated distal DREs and strong enrichment of repres- (see “Methods”). For each target gene tested, we included sive marks at the positively correlated ones (Fig. 2d and all nearby genes and CpG probes ±300 kb away from the Additional file 1: Figure S4). Enrichment of H3K4me3, transcriptional start site (TSS) of the gene and merged the marker of active promoters, was only observed at a expression and methylation matrix together. The minority (< 30%) of negatively-correlated DREs, which were CMI-based PC algorithm inferred the regulatory network 2–3 kb away from TSS (Additional file 1:FigureS4c). and the DRE-target pair (see “Methods”). We downloaded Similarly, the PhastCons conservation score reached its peak Tong et al. Genome Biology (2018) 19:73 Page 3 of 17 Fig. 1 Pipeline for inferring methylation regulation networks. a Top: Schematic of the MICMIC pipeline that uses information theoretic approaches to distinguish direct regulation from indirect correlation, where variables X and Y are connected only via variable A, then CMI(X,Y|A) will be close to zero, suggesting that there is no direct connection between X and Y. Bottom: A PC algorithm is used to infer the regulatory network from the observed data matrix, eliminating the indirect edges by CMI testing. b MICMIC is designed to identify the regulatory relationship between the methylation level of a CpG probe and the expression level of its potential gene target. For every target gene tested, we included all nearby genes and CpGs ± 300 kb from the transcription start site (TSS) of the test gene and merged the related expression and methylation matrix together. Then MICMIC applies the CMI-based PC algorithm to infer the regulatory network. CpG probes that passed the test were named direct regulatory elements (DREs). The DRE and its gene targets were denoted as DRE-target pairs. c A representative example of the MICMIC output for the CDCA5 gene, where ten DREs (nine shown here) were identified to be associated with CDCA5 expression in gastric cancer (TCGA STAD), four of which were at least 240 kb away from the TSS of the target gene. One of these DREs, cg02933228 (blue oval), was experimentally verified. In the lollipop diagram, green represents significant CpG probes, i.e. DREs. The Pearson correlation coefficient (PCC) for each DRE-target pair was represented by a vertical line (red for negative PCC and green for positive PCC). d Simulation test to justify the MICMIC p value cut-off. The number of actual DREs identified (blue) vs the number of DREs identified by chance (red) at various p value cut-offs at the center of the genomic regions flanking negatively cor- separatedbyupto25kband 50%evenwhenthe pairswere related distal DREs across all cancer types, and was signifi- separatedupto 100 kb (Fig. 2f). The TCGA samples ana- cantly higher than the control group (p value was in the lyzedinthisstudy andthe DREsidentified by MICMIC were range of 2.18e-68–0.027) (Fig. 2e). By contrast, there is much listed in the following tables (Additional file 2:Table S1 and weaker or no enrichment for positively correlated distal Additional file 3:Table S2). DREs (Fig. 2e). The precision of our distal DRE-target pre- diction was evaluated by different chromatin interaction Validation of causal DNA methylation events involved in datasets, such as IM-PET, Hi-C, RAD21-cohesin, and tumorigenesis by epigenome engineering techniques in ChIA-PET [22–24](see “Methods,” only negatively corre- gastric cancer lated pairs considered here). The precision rate of MICMIC We chose distal DRE-target pairs for validation if: (1) reached up to 90% when the DRE-target pairs were there was a strong correlation between expression and Tong et al. Genome Biology (2018) 19:73 Page 4 of 17 Fig. 2 Genomic features of DREs identified across cancers. a Bar chart showing the number of promoter DRE-gene pairs (top) and distal DRE- gene pairs (bottom) identified from the TCGA cancer cohorts. Blue bars indicate the fraction of the DRE-gene pairs shared by more than one cancer type, while red bars indicate the fraction of the cancer-type specific DRE-gene pairs. b The negative and positive correlation between DRE methylation and its target gene expression are shown by red and blue, respectively. Promoter pairs are mainly negatively correlated (88% in total). c Representative results showing the preferred chromatin state of distal DREs in liver (HepG2) and breast (HMEC) cancer cell lines. The distal DREs for each cell line were inferred from those identified from the corresponding TCGA cohort, LIHC and BRCA, respectively, here in this example. The number of distal DREs were counted at each chromatin state, with the heatmap color and number indicating the enrichment p value of distal DREs in each state. Results for other cancer types can be found in Additional file 1: Figure S2. d Representative results in liver cancer cell line HepG2, showing increased chromatin signals for H3K27ac, H3K4me1, p300, and DNase I hypersensitivity at genomic regions surrounding the negatively correlated distal DREs, in contrast to increased chromatin signals for repressive marks H3K9me3 and H3K27me3 surrounding the positively correlated distal DREs. P values were calculated by t-test comparing the signals of negatively correlated distal DREs vs that of an all probes control. The results for other cancer cell lines can be found in Additional file 1: Figure S4. e Average conservation score of distal DRE flanking regions. PhastCons conservation score was in the range of 0–1 (non- to perfectly conserved). P values are calculated between distal DREs and all probes (as control) by t-test. f The precision of DRE-target pairs predicted by MICMIC was determined by calculating the positive predictive value (PPV) in comparison with other chromatin interaction data, including IM-PET, Hi-C, RAD21-cohesin, and ChIA-PET (see “Methods,” only negatively correlated pairs considered here) methylation, represented by a significant Pearson correl- produced similar results to treatment with the global DNA ation coefficient (PCC < − 0.3 or > 0.3); and (2) the target methylation inhibitor, 5-AZA, and upregulated WNT5B gene was determined to be essential for tumorigenesis (Fig. 3b). The effect of targeting DNMT3A-3 L/TET1 to by differential expression test and MRA (see “Methods,” the distal DRE site of WNT5B was confirmed by bisulfite Figs. 3a and 4a, and Additional file 1: Figures S5 and sequencing without off-target on other genes (Fig. 3c and S7). For example, in gastric cancer, WNT5B expression Additional file 1: Figure S7a). We then tested if modulation and methylation of its distal DRE (cg02935351) were of the DNA methylation of distal DREs could affect cell strongly anti-correlated and WNT5B was predicted to migration. Strikingly, cancer cell migration increased as a be a tumor suppressor by MRA (Fig. 3a). Next, we per- result of DNMT3A-3 L targeting, but decreased by TET1 formed epigenetic editing by using CRISPR-dCas9 based targeting or overexpression of WNT5B complementary technologies, such as the casilio system [14] for targeted DNA (cDNA) (Fig. 3b and Additional file 1:FigureS9).To methylation with a DNMT3A-3 L fusion protein and the further confirm the regulatory function of this distal DRE re- dCas9-SunTag scaffold with scFv–TET1 catalytic domain gion, we cloned a 1-kb genomic region flanking cg02935351 fusions [15] for targeted demethylation to the intended and the WNT5B promoter into the pGL3 luciferase reporter genomic sites in the AGS human gastric cancer cell line vector and verified its putative enhancer status (Fig. 3f). (Additional file 4: Table S3). Remarkably, targeting Interestingly, co-transfection with dCas9-DNMT3A-3 L was DNMT3A-3 L to the region near cg02935351 downregu- also able to regulate the reporter constructs (Fig. 3f). In lated WNT5B, while targeting TET1 to this region addition, we verified several other genes, including MLEC, Tong et al. Genome Biology (2018) 19:73 Page 5 of 17 Fig. 3 Validation of causal DNA methylation events in gastric cancer. a Representative results showing the negative regulation of WNT5B by methylation of its distal DRE (cg02935351, 22,595 bp from WNT5B TSS). Box plot shows the high, middle, and low expression groups of WNT5B, plotted against the methylation of the distal DRE in each group. MRA analysis was implemented by the gene set enrichment analysis (GSEA) method. GSEA graphs show tumor-suppressive signatures of WNT5B by MRA. Correlation analysis and MRA for other genes are shown in Additional file 1: Figure S5. b Confirmation that methylation of the distal DRE is the causal event for WNT5B regulation and cellular malignancy. qPCR results showing increased WNT5B expression in AGS cells treated with 5-AZA or dCas9-TET1, and decreased expression in cells transfected with dCas9-DNMT3A-3 L relative to controls. Cell migration assays showed that dCas9-DNMT3A-3 L targeting increased cell migration, while overexpression (OE) of WNT5B suppressed tumor cell migration. Significance was determined by t-test and error bars represent ± SD. c Bisulfite sequencing validation of increased methylation of the CpGs surrounding the dCas9-DNMT3A-3 L targeted DRE of WNT5B. In the lollipop diagram, black circles stand for methylated Cs and white circles for unmethylated Cs. Each box below corresponds to one CpG position in the genomic sequence. The colored bars summarize the methylation states of all sequences at that position with yellow for methylated Cs and blue for unmethylated Cs. d qPCR results for eight gastric cancer genes after dCas9-DNMT3A-3 L/TET1 epigenetic editing with dCas9-only as control, labelled as ctr1 and ctr2. Three independent replicates were conducted for each experiment. All DRE-target pairs tested here showed strong anti-correlation between expression and methylation, and the qPCR results showed dCas9-TET1 targeting increased gene expression, while dCas9-DNMT3A-3 L targeting inhibited gene expression (p value < 0.01, student t-test). e Summary of cell migration assay results for eight gastric cancer genes, showing the causal effects of distal DRE methylation on cancer cell malignancy. See photos in Additional file 1: Figure S9. f The distal DRE region and promoter of each gene of interest were cloned into the pGL3 reporter vector and assayed for luciferase activity. The reporter constructs were also co-transfected with dCas9-DNMT3A-3 L (pro_enh + targeted_DNMT3A-3 L), resulting in decreased luciferase activity LLGL2, CDCA5, MEN1, CLDN7, SOX9, and FGFR1 by epi- off-target due to overexpression DNMT3A-3 L/TET1 (see genetic modulation of distal DREs followed by quantitative “Methods” and Additional file 1:Figure S6). Overall,our polymerase chain reaction (qPCR), migration assay, and experimental results were fully consistent with MICMIC luciferase reporter assay (Fig. 3d–f,Additional file 1: predictions. As mentioned above in Fig. 1,MICMIC Figure S9). We performed experiments using scrambled predicted a distal DRE for CDCA5, cg02933228, which is single guide RNA (sgRNA), “untargeted,” or catalytically > 240 kb away from the TSS of CDCA5 (Fig. 1c), but we inactive DNMT3A-3 L/TET1 to rule out the possibility of were able to achieve robust regulation of this distal DRE Tong et al. Genome Biology (2018) 19:73 Page 6 of 17 Fig. 4 Validation of causal DNA methylation events in liver cancer. a Representative results showing the negative regulation of HDAC11 by methylation of its distal DRE (cg03190578, 3817 bp from HDAC11 TSS). Box plot shows the high, middle, and low expression groups of HDAC11, plotted against the methylation of the distal DRE in each group. GSEA graphs show oncogenic signatures of HDAC11 by MRA. Correlation analysis and MRA for other genes are shown in Additional file 1:FigureS10. b Confirmation that methylation of the distal DRE is the causal event for HDAC11 regulation and cellular malignancy. qPCR results showing increased HDAC11 expression in PLC8024 cells treated with 5-AZA or dCas9-TET1, and decreased expression in cells transfected with dCas9-DNMT3A-3 L relative to controls. Cell migration assays showed that dCas9-DNMT3A-3 L targeting suppressed cell migration, while overexpression (OE) of HDAC11 increased tumor cell migration. Significance was determined by t-test and error bars represent ± SD. c qPCR results for 11 liver cancer genes after dCas9-DNMT3A-3 L/TET1 epigenetic editing with dCas9-only as control, labelled as ctr1 and ctr2. Three independent replicates were conducted for each experiment. Ten out of 11 DRE-target pairs were predicted to be negatively regulated by DRE methylation, CBFA2T3 was predicted to be positively regulated by methylation of the distal DRE (cg20283771).The qPCR results (p value < 0.01 by Student’s t-test) were consistent with the predictions (Additional file 1:Figure S10). d Summary of results for cell migration and proliferation assays for liver cancer genes, showing the causal effects of distal DRE methylation on cancer cell malignancy. See photos in Additional file 1: Figure S11. e The distal DRE region and promoter of each gene of interest were cloned into the pGL3 reporter vector and assayed for luciferase activity. The reporter constructs were also co-transfected with dCas9-DNMT3A-3 L (pro_enh + targeted_DNMT3A-3 L), resulting in decreased luciferase activity except CBFA2T3 positively correlated with its DRE methylation with dCas9 epigenetic editing (Fig. 3d–f). This same distal Validation of causal DNA methylation events involved in DRE, cg02933228, was also predicted to control the gene tumorigenesis by epigenome engineering techniques in MEN1, which we were also able to experimentally con- liver cancer firm. Additionally, our study is the first to show evidence We also validated MICMIC predictions in liver cancer. of the gene Malectin (MLEC) being a tumor suppressor First, we observed a strong anti-correlation between (Fig. 3d and e, Additional file 1: Figures S5 and S9). Taken HDAC11 expression and cg03190578 methylation (Fig. 4a). together, MICMIC along with MRA was able to identify As expected, targeted methylation with DNMT3A-3 L to causal events in tumorigenesis involving DNA methyla- the cg03190578 region decreased HDAC11 expression, tion of distal regulatory regions, which we were able to while targeted demethylation with TET1 dramatically in- verify via epigenetic editing by dCas9 fused with TET1 or creased HDAC11 expression (Fig. 4b). Consequently, we DNMT3A-3 L and identify novel oncogenes/tumor-sup- found that modulation of DNA methylation on the distal pressors in the process. DRE, cg03190578, by dcas9-DNMT3A-3 L significantly Tong et al. Genome Biology (2018) 19:73 Page 7 of 17 decreased cancer cell migration suggesting an oncogenic oncogenic TFs (Fig. 5b inset and Additional file 1:FigureS13), function for HDAC11 in liver cancer, which was confirmed suggesting that hypomethylation of distal DREs from by increased cell migration upon overexpression of negative-up group together with the increased expres- HDAC11 (Fig. 4b). In contrast, dcas9-TET1 targeting to sion of the cognate oncogenic TFs, consequentially the cg03190578 region increased cancer cell migration lead to upregulation of its distal gene targets. For distal DREs (Additional file 1: Figure S11). In addition to HDAC11, we positively correlated with its targets, we found significant validated other genes as well and identified the distal DREs enrichment of TFs with repressor activity (p value = 8e-7), of HDAC11, APOA1, NDRG1, TK1, and TKT to be en- suggesting that DNA methylation may affect the binding of hancers and the distal DREs of BIRC5, CDT1, CBFA2T3, TF repressors with implications in tumorigenesis (Additional SLC16A3, KLF9, and APOC3 to be silencers (Fig. 4c–e, file 1:FigureS14). Additional file 1: Figures S10 and S11). Among these We also investigated the association between DRE genes, some shared the same distal DRE, e.g. APOA1 methylation and patient survival. We identified 1081 DRE shared cg23193059 with APOC3 and CDT1 shared methylation correlated with patient survival (q-value < 0.1, cg20283771 with CBFA2T3. Intriguingly, methylation FDR by BH procedure) in bladder cancer (BLCA), breast of DRE cg20283771 was positively correlated with cancer (BRCA), head and neck carcinoma (HNSC), liver CBFA2T3 expression, but negatively correlated with cancer (LIHC), lung cancer (LUAD), and uterine corpus CDT1 (Fig. 4c, Additional file 1: Figure S10). Both endometrial cancer (UCEC). For BLCA, the DREs genes were verified to be causally regulated by methy- associated with survival were enriched in intergenic lation of cg20283771 with combined effect on cancer regions. For LUAD and UCEC, the DREs associated cell migration after dCas9-DNMT3A-3 L targeting with survival were enriched in distal regions (enrich- (Fig. 4c–e). For two genes, NDRG1 and TK1, there ment p value < 0.05) (Fig. 5d). We then calculated the was no significant difference in cell migration after number of master cancer genes (via MRA) that are dCas9-DNMT3A-3 L targeting of their distal DREs, regulated by DNA methylation of the promoter or distal but they did show a significant decrease in cell prolif- DREs and used the density distribution to quantify the ef- eration (Fig. 4d). fect that methylation of those DREs have on tumorigen- esis (Fig. 5e,Additional file 1: Figures S15 and S16). The Aberrant methylation landscape of distal DREs can be results indicated that the methylation of distal DREs com- shaped by oncogenic and lineage-specific transcription pared to proximal DREs had more of an impact on the factors (TFs) with profound effects on tumorigenesis and regulation of both oncogenes and tumor suppressors at patient survival the initiation and progression stage of tumorigenesis. We next investigated how TFs can regulate and shape the Furthermore, we analyzed the dynamic change in methy- methylation landscape of distal DREs in cancers (see lation patterns that can occur at distal DREs as tumors “Methods”). First, we categorized all distal DREs in each transition from the initiation to the progression stage. cancer into four subgroups, i.e. negative-up, negative-down, During this transition, methylation patterns of distal DREs positive-up, and positive-down, dependent on whether the can remain the same (“consistent”), become differentially pair was negatively or positively correlated and whether the methylated in the opposite direction (“reversed”), or show target gene was up- or downregulated in tumor versus nor- increased (“stronger”) or decreased (“weaker”)methylation mal samples. After identification of TFs associated with dis- change in the initiation versus the progression stage tal DREs (Additional file 1: Figure S12), we calculated the (Additional file 5: Table S4). Strikingly, distal DREs related PCC between the expression level of each enriched TF and to patient survival were more enriched in the “reversed” the average methylation level of its cognate binding sites on group (Fig. 5f). For example, in uterine cancer, the distal distal DREs for each subgroup (Fig. 5c) and ranked TFs by DRE of PAQR4 was de-methylated at the initiation stage its PCC in ascending order. Strikingly, the top ranked TFs but became re-methylated in higher stage tumors. identified from the negative-down group were mostly Moreover, the high methylation of the DRE and lower ex- tissue-specific TFs across various cancer types, whereas TFs pression of PAQR4 were correlated with poorer patient sur- identified from the negative-up group were mainly onco- vival (Fig. 5g). Many more distal DRE-target pairs fell into genic TFs (Fig. 5a and b, Additional file 1: Figure S13). this category, including the gene STX18 (Fig. 5g). GSEA further confirmed that these tissue-specific TFs are tumor suppressors (Fig. 5a inset), suggesting that hyper- Diverged tumor-subtype core regulatory circuitry and methylated distal DREs from the negative-down group in converged pan-cancer global topology of TF network conjunction with the decreased expression of the cognate associated with distal DRE tissue-specific TFs, lead to downregulation of its distal gene Multiple lines of evidence have indicated that super-enhancers targets in cancer. Similarly, GSEA confirmed that the top (SEs) with associated oncogenic TFs play a pivotal role in ranked TFs in the negative-up group were enriched for regulating and maintaining tumor cellular identity [25, 26]. It Tong et al. Genome Biology (2018) 19:73 Page 8 of 17 Fig. 5 Interplay between distal DRE methylation and the cognate transcription factor binding has profound effects on tumorigenesis. Bar charts show the enriched transcription factors (TFs) that bind to DREs showing a negative correlation between the average methylation of TF binding motifs and TF expression, with (a) downregulation (negative-down DREs) or (b) upregulation (negative-up DREs) of the target gene. Intensity of blue color indicates the degree of tissue specificity of the TF in breast cancer (BRCA) compared to other tissue types. Intensity of red/green color indicates the degree of oncogenic/ tumor suppressive behavior of the TF. Bar chart showing the enriched TFs binding on the group-specific distal DREs. The TFs were ranked by the negative correlation between the TF expression and average DNA methylation of the TF binding motifs on the group-specific distal DREs. The correlation value is shownonthe y-axis. Colors represent the tissue type significance or master regulator significance of the TF gene. The cancer signature association of the TFs is showninthe inset GSEA plot. c Representative examples of TFs that showed a negative correlation between the expression of the TF and average DNA methylation of the TF binding motifs on distal DREs. d Heatmap showing the enrichment significance of DREs associated with patient survival in various genomic regions. The heatmap color and number indicates the enrichment p value of DREs associated with patient survival in each category. e Higher impact of distal DREs on cancer genes compared with promoter DREs during initiation (top) and progression (bottom). Y-axis for the two top waterfall plots indicates the master regulator significance for each gene, ranked from tumor-suppressive to oncogenic. Y-axis for the two bottom panels shows the density of normalized gene counts controlled by promoter or distal DREs. See results of other cancers in Additional file 1: Figure S16. f Heatmap showing the enrichment significance of distal DREs associated with patient survival in the four methylation patterns of distal DREs, i.e. “consistent,” “reversed,”“stronger,” and “weaker” according to the direction of methylation change from the initiation to progression stage of tumorigenesis for each cancer type. The number in each square represents the p values. g Example of two distal-DRE target pairs identified in uterine cancer that show a “reversed” methylation pattern. Box plots on the x-axis show the DRE is demethylated during the initiation stage but becomes remethylated during cancer progression. High methylation of both DREs and low expression of their target genes were associated with poorer patient survival has been shown that SEs function as a platform to integrate a (Additional file 6: Table S5). We hypothesized that cancer sub- set of key TFs forming a core regulatory circuitry (CRC) to types could be distinguished by the joint consensus clustering regulatetumor-subtypespecificgeneexpression.The TFsin of the DNA methylation of each TF’s cognate binding site and each CRC are auto-regulated by itself through binding sites the expression level of the corresponding TF. Strikingly, the on its corresponding SE. The TFs can also cross-regulate each joint consensus clustering with the assembled CRCs for can- other by forming an interconnected loop with cognate binding cers, including breast, liver, stomach, and endometrial carcin- sites on other TFs’ related SEs. Based on this information, we oma, can identify the cancer subtypes in line with the took advantage of the genome-wide information of distal previously established molecular/pathological subtypes. For in- DRE-target derived from MICMIC to assemble the CRCs stance, breast cancer subtypes (lumA, lumB, and basal like) regulated by DNA methylation for each cancer type [27] can be identified by the joint clustering (Fig. 6b). We can Tong et al. Genome Biology (2018) 19:73 Page 9 of 17 Fig. 6 Tumor-subtype core regulatory circuitry and pan-cancer global topology of TF network regulated by DNA methylation of distal DREs. a The interconnected auto- and cross-regulation loops within the CRC TFs. The links between TFs were derived from distal DRE-target pairs in which the DRE harbors binding sites for the CRC TFs. The TFs are colored by the tumor subtypes in which they are highly expressed. Effects of the CRC on tumorigenesis are analyzed by the cancer pathway enrichment of the TFs’ targets (hypergeometric p value < 0.05), representing in the right side of the CRC. b Top: Heatmap of the expression Z-score of CRC TFs in the tumor subtypes. Bottom: Joint consensus clustering by the expression of CRC TFs and methylation of binding DREs shows a great similarity between the CRC subtypes and PAM50 subtypes in breast cancer. See results of other cancers in Additional file 1: Figures S17–S19. c Signaling pathways in breast cancer regulated by CRC TFs whose targets were identified by the distal DRE-target pairs in which the DRE harbored the TF binding sites. Each color of a gene node indicates a different cancer pathway. Edges represent regulatory relationships. d Convergence of network topology across cancer types (see “Methods”). For each cancer type, their TF networks were decomposed and categorized into 13 different types of basic three-node network motifs, indicated by the topology structures above the graph. The X-axis shows the numerical identification number associated with each motif. The relative enrichment (Z > 2) or depletion (Z < − 2) of each of the 13 basic network motifs for each cancer type was calculated as a Z-score (Y-axis) Tong et al. Genome Biology (2018) 19:73 Page 10 of 17 further identify the underlying signaling pathways reg- from the methylation modulation by dCas9 targeting ulated by CRC in different subgroups (Fig. 6c and since the output could be the combined effect of Additional file 1:Figures S17–S19). Similarly, the glo- multiple genes targeted by the same distal DRE. bal gene regulatory network (GRN) for each cancer Our study provides mechanistic insight on how can be generated with the information of our DNA methylation of distal DREs is critical for the genome-wide distal DRE-target interaction and TFs maintenance of tumor cell identity and malignancy. associated with each DRE. Topology of GRN can be We found that oncogenic and lineage-specific TFs compared based on the normalized frequency of the shape the methylation landscape of distal DREs, three-node network motif in each cancer [28, 29]. which is controlled in concert by the expression level Notably, GRNs across various cancer types converged of each enriched TF and the average methylation level on a common architecture (Additional file 7:Table of its cognate binding sites on distal DREs. Key TFs S6), highlighting the similarity of GRN controlled by were identified to be part of core regulatory circuit- DNA methylation of distal regulatory regions at the ries (CRCs) associated with distal DREs for regulation higher-order organization level (Fig. 6d). of tumor-subtype specific gene expression. Further- more, we showed that the network topology of GRN Discussion derived from DNA methylation of distal DREs may In this study, we aimed to identify DNA methylation of have the same architecture across different cancer distal regulatory regions with causal effects on tumorigen- types, enriched for network motifs like “feed forwards esis. MICMIC is different from other currently available loop,”“regulated mutual,” and “regulating mutual.” methylation analysis software in two respects. First, since This similarity in topology suggests that a common many methylation events are merely a consequence of epi- organization principle governs this type of biological genetic disruption and not the cause, rather than calling networks regulated by DNA methylation of distal differentially methylated regions first, we begin by: (1) regulation regions. using genes essential for tumorigenesis by differential ex- pression test and MRA to find its distal DRE(s); and (2) Conclusions take novel application of information theoretic approaches In this study, we have developed a set of tools to in DRE-target call. Interestingly, about 23.7% putative en- genome-wide identify DNA methylation in distal regions hancers flanking our distal DREs harbor known COSMIC with causal effect on tumorigenesis. Novel oncogenes/ non-coding mutations in liver cancer (Additional file 8: tumor-suppressors and their putative enhancers can be Table S7). This can help prioritize the somatic mutations identified together based on this strategy. We have exten- locating on distal regulatory sites as cancer risk loci sively validated many of the predictions by epigenetic non-coding variants are enriched in enhancers [25, 30]. editing. Our study reveals the prevalent regulation of Our bench validation with dCas9 targeting is dependent genome-wide putative enhancers by DNA-methylation on the experiment with co-transfection of multiple plas- with causal effect on cellular malignancy and patient sur- mids into cancer cell lines that have to be effectively trans- vival. Our study also provides mechanistic insight on how fected. This could be challenging for certain cancer types, DNA methylation of distal regulatory regions is critical for e.g. only one gastric cell line “AGS” (over 50% transfection the maintenance of tumor cell identity and malignancy. efficiency with lipofectamine3000) and a few liver cancer lines have acceptable transfection efficiency in our hand. Methods However, the DNA methylation level seems quite hetero- Data collection geneous for most DREs in the same cell line. For instance, We downloaded TCGA level 3 DNA methylation data, we can increase or decrease the DNA methylation level of clinical data, and RNA-seq data for 4747 matched samples the same DRE site in AGS cell line by dCas9 targeting, encompassing 11 cancer types: bladder urothelial carcin- and consequentially change the gene expression level in oma (BLCA); breast invasive carcinoma (BRCA); cervical both directions, upregulation or downregulation. squamous cell carcinoma and endocervical adenocarcin- It is common for a single enhancer to control more oma (CESC); colon adenocarcinoma (COAD); esophageal than one gene and vice versa. As shown above, both carcinoma (ESCA); head and neck squamous cell carcin- oncogene CDCA5 and tumor-suppressor MEN1 were oma (HNSC); liver hepatocellular carcinoma (LIHC); lung verified to be regulated by the same distal DRE adenocarcinoma (LUAD); lung squamous cell carcinoma cg02933228. However, the decreased cell migration (LUSC); stomach adenocarcinoma (STAD); and uterine phenotype after dCas9-DNMT3A-3 L targeting of corpus endometrial carcinoma (UCEC) (Additional file 2: cg02933228 was only consistent with CDCA5’sfunc- Table S1). The methylation data is based on the Infinium tion prediction. We need to take into account this HumanMethylation450 BeadArray platform, in which the complexity when interpreting the phenotypic output probes covered 485,000 CpG sites across the genome. Tong et al. Genome Biology (2018) 19:73 Page 11 of 17 Mutual information and conditional mutual information IXðÞ ; Y jZ ¼ HXðÞ ; Z Mutual information (MI) is a general measurement of þHYðÞ ; Z −HXðÞ ; Y ; Z −HZðÞ ð8Þ dependence between individual events. This method where p(X,Y,Z) is the joint probabilities and H(X,Y,Z) is is based on the joint probability of events to infer de- the joint entropy. A high value for CMI(X,Y|Z) would pendence without making any assumptions about the mean X and Y are directly connected and do not rely on nature of their underlying relationships. MI is based the given variable Z. on information theory and can be calculated by the We used the kernel density estimation (KDE) to estimate entropy of variables. For any variable A, the entropy the probability distribution of continuous variables, such as H(A) is the average amount of information gained gene expression. KDE was found to be superior to the from a measurement. And it can be defined by: histograms estimator and the estimation of probability distribution by KDE has been used in MI calculation as fol- HAðÞ¼ − paðÞ logpaðÞ ð1Þ i i lows [18, 19], i¼1 PXðÞ where p(a) is the probability of any possible value of A. 1 1 1 The joint entropy of two discrete systems A and B is de- T −1 ¼ exp − X −X C X −X j i j i n=2 n=2 fined by N 2 ðÞ 2π jj C j¼1 N N A B X X (9) HAðÞ ; B ¼ − pa ; b logpa ; b ð2Þ i j i j where C is the covariance matrix of X and |C| is the i¼1 j¼1 determinant of matrix C. From Eqs. 1, 6, and 9, we got the entropy of variable where the p(a,b) is the joint probability. When both A X, MI of (X,Y), and CMI of (X,Y|Z) as: and B are independent events, the joint entropy of A hi and B can be denoted by: 1=2 HXðÞ¼ logðÞ 2πe jj C ; ð10Þ HAðÞ ; B ¼ HAðÞþHBðÞ ð3Þ 1 j CXðÞj � j CYðÞj IXðÞ ; Y ¼ log ; ð11Þ For any dependent events A and B, the joint entropy 2 j CXðÞ ; Y j will follow: 1 j CXðÞ ; Z j � j CYðÞ ; Z j IXðÞ ; Y jZ¼ log : ð12Þ HAðÞ ; B < HAðÞþHBðÞ ð4Þ 2 j CZðÞj � j CXðÞ ; Y ; Zj The mutual information of I(A,B), which quantifies MI and CMI were normalized by: the dependence of A and B, is defined as the difference IXðÞ ; Y between H(A) + H(B) and H(A,B): IXðÞ ; Y ¼ ; ð13Þ maxðÞ IXðÞ ; Y IAðÞ ; B ¼ HAðÞþHBðÞ−HAðÞ ; B ð5Þ IXðÞ ; Y jZ N N IXðÞ ; Y jZ¼ ; ð14Þ a b X X pa ; b i j maxðÞ IXðÞ ; Y jZ IAðÞ ; B ¼ pa ; b ð6Þ i j paðÞpb i j i¼1 j¼1 where maximal(MI) and maximal(CMI) were the MI and CMI values when Y was totally dependent on X. A higher MI represents a greater connection between Then, the normalized MI and CMI value were between the events. 0and 1. To further study the dependence within three or more variables, conditional mutual information (CMI) is intro- Significance level determination duced to assess the exclusive dependence between any pairs To determine the significance level of our MI and CMI of variables given the value of a third one. CMI can distin- examination, we used random permutation and Fisher’s guish pairs directly from indirectly connected. The condi- Z statistics to calculate the z-score and p value [19]. We tional mutual information (CMI) can be calculated by: randomly shuffled the vectors X and Y many times and X X X got the correlation r between random X,Y (CMI). Then p ðÞ z p ðÞ x; y; z Z X;Y ;Z IXðÞ ; Y jZ¼ p ðÞ x; y; z log X;Y ;Z we transformed r to z by: p ðÞ x; z p ðÞ y; z X;Z Y ;Z z∈Z y∈Y x∈X ð7Þ z ¼ :5l½ nðÞ 1 þ r − lnðÞ 1−r ð15Þ or in terms of entropy: The confidence interval would be: Tong et al. Genome Biology (2018) 19:73 Page 12 of 17 Identification of the direct regulatory elements by MI/CMI z  zσ 0 ð16Þ based PC-algorithm For genes being tested, we identified the DREs by the following steps: Here, the σ is the standard deviation of z. We used the observed X,Y value to get the observed CMI value 1. Data preparation. We selected neighboring elements and transformed it into Z-value. The Z-score was calcu- (i.e. messenger RNA expression and CpG probe lated by Z score = (Z value-z’)/σ . And the p value was methylation) of a target gene within a genomic calculated by 2*pnorm(−|Zscore|). range (default ± 300 kb from TSS of the gene) and integrated the data value for these selected elements Differential expression analysis (e.g. expression and methylation value). The final We used the Voom method to normalize the RNA-seq result was a data matrix in which columns data and calculated the gene differentially expressed be- correspond to samples and rows to variables (i.e. tween tumor and normal by limma package [31]. We se- gene or CpG probes). We chose genomic range ± lected the differentially expressed genes (DEGs) by the 300 kb since it was reported that the enhancer- cut-off of |log2FC| > 0.58 (i.e. fold change cut-off either promoter interactions peak around 120 kb upstream upregulation > 1.5-fold or downregulation at least of the TSS [34]. 1.5-fold) and adjusted P < 0.01, and the DEG list was 2. Identification of DREs for the gene on test. We used used for downstream analysis, e.g. identification of the the network inference method called PC algorithm corresponding DREs and master regulator analysis. to infer the regulatory network based on the MI/ CMI connections [12]. The PC algorithm is Master regulators analysis (MRA) computationally feasible and very efficient for Master regulators (MRs) control a large number of down- sparse connections frequently encountered in stream targets that play important roles in cancer stage biological networks. The result returned an adjacent transition. Here, we exploited a classical strategy to identify matrix representing the direct connected edges. MRs for cancer initiation (paired tumor vs normal samples) First, we assumed all nodes connected by default to and progression (late-stage [IV] vs early stage [I] samples). generate a completely connected graph between all The basic framework contains two parts: (1) based on can- genes and all CpG probes within the genomic range cer specific gene expression profile, transcriptional targets (default ± 300 kb from TSS of the gene). Second, (termed as regulon) of TFs are inferred using ARACNe MI was calculated for any node pair, e.g. node i and [32] with default parameters. Data processing inequality j based on their values in samples. Third, the edge (DPI) was set to reduce the number of indirect connections; between i and j will be kept in the network only if (2) gene set enrichment analysis with R gage package [33] their MI passes the significance testing (cut-off is conducted to evaluate whether the regulon of TFs is p < 0.01). Fourth, all of the common partners (k) for enriched in the signature of cancer-related phenotype tran- the i and j pair surviving last test will be used to sition (ranked gene list using t value from differential ex- calculate the CMI(i,j|k), which can distinguish if i-j pression analysis). Specifically, the regulon genes of a TF connection is conditional on variable k. Fifth, we are divided into positive (+) and negative (−)groupsbased generated a directly connected network in an on the Spearman’s correlation coefficients between the ex- adjacent matrix after deletion of these indirectly pression level of the TF and each gene in its regulon. Then, connected edges. Herein, a mutual information two runs of gene set enrichment analysis are carried out to cutoff (MI > 0.1 bits) was used to remove weak determine the MR is activated (i.e. oncogenic) or repressed connections. Finally, we generated a list of the (i.e. tumor-suppressor): run 1 regulon (+) in from the up- DRE-target pairs that were directly connected. regulated side and regulon (−) from the downregulated 3. Classification of the DRE-target pairs. The DREs were side;run 2regulon (+)infromthe downregulatedside and classified based on the target gene expression (up- or regulon (−) from the upregulated side. In each run, the en- downregulation in tumors), direction of correlation richment q-values are calculated by Fisher’smethod. Regu- with its target gene expression (positive or negative), lon(+) of a gene is also called positive neighbors and and the distance from the TSS of its target gene. regulon(−) of a gene is called negative neighbors in this DREs locating within ± 2000 bp of the TSS of its paper. Whichever of the two runs gives the more significant target genes were classified as the promoter DREs q-value is used as the final q-value; the MR is predicted as and otherswereclassifiedasdistalDREs. oncogenic(theqvalueinrun 1<the q valueinrun 2) or tumor-suppressive (the q-value in run 1 > the q-value in The MICMIC pipeline can be adjusted to handle genomic run 2) correspondingly. range beyond ± 300 kb. We chose genomic range ± 300 kb Tong et al. Genome Biology (2018) 19:73 Page 13 of 17 here since it was reported that the enhancer-promoter in- scores for each genomic feature on genomic regions teractions peak around 120 kb upstream of the TSS [34]. 6000 bp flanking each DRE, then got the average score for Examples of the indirectly correlated CpG-gene all DREs from the same cancer cell line. pairsrejectedbyour method arepresented in Additional file 1:FigureS3c.The deregulationsofthe Precision of DRE-target pairs CIMP genes are controlled by the hypermethylation We computed the precision of DRE-target pair predictions of genome-wide CpG islands and the strongly corre- by comparing them to the enhancer-promoter pairs lated CpGs rejected by our method showed no correl- (EP-pairs) predicted by chromatin interactions derived from ation in the non-CIMP samples. IM-PET, ChIA-PET, Hi-C, and RAD21-cohesin. These tools mainly detect active enhancers with enrichment of active his- Mapping chromatin state of DREs by ChromHMM 18-state tone marks, such as H3K4me1, H3K4me3, and K3K27Ac, model which were confirmed to be enriched in our DREs negatively To annotate the DREs, we downloaded chromHMM correlated its targets (Fig. 2d). Other studies show that active 18-state data of HMEC breast epithelial cells (E119), enhancers with low DNA methylation tend to have gene tar- HeLaS3 cervix cancer cells (E117), colon tissue (E106), gets with high expression [35–38]. DREs positively correlated HepG2 cells (E118), A549 lung cancer cells (E114), and with its targets were enriched for genomic repressive regions gastric tissue (E094) from the ROADMAP Epigenomics and TF repressors (Fig. 2c and Additional file 1:Figures S2 Project. We counted the number of DREs overlapping and S14), but not enriched for active histone marks. This with each chromatin state. For each chromatin state, the suggested that DREs positively correlated its targets may use enrichment fold change and significance were computed different mechanism to indirectly regulate gene expression. by hypergeometric testing using the total CpG probes Herein, we only considered DREs negatively correlated with on HM450 array as control. its targets for further analysis, similar to other studies [9, 10]. We used the hypergeometric test to calculate the stat- A predicted DRE-target pair will be counted as confirmed if istical significance of the over-represented chromatin its DRE and target gene overlapped the two ends of an inter- state for the DREs. We assigned N as the total number action from the IM-PET, ChIA-PET, HiC, or of probes in the HM450 array and K as the number of RAD21-cohesin data [23]. The precision result was similar probes overlapping with the chromatin state under test, but superior to the result obtained through other methods n as the number of DREs from N probes that can regu- (e.g. ELMER [9]) (Additional file 1:FigureS3b). Of note,our late its target genes, and x as the number of DREs over- method output many more negatively correlated EP pairs lapping with the chromatin state under test. The compared with ELMER (Additional file 1:FigureS3).The enrichment fold change was calculated as ratio between datasets of IM-PET, ChIA-PET, and HiC were downloaded x/n and K/N. The over-enrichment of chromatin states from the 4DGenome database [39]. A supplement of HiC in DREs was calculated with hypergeometric distribution. data was downloaded from GEO (GSE63525) and ChIA-PET data were downloaded from ENCODE Histone modifications, sequence conservation, and DNase (ENCSR436IAJ). We used the similar procedure [23]tocon- I hypersensitivity duct the RAD21-cohesin interaction analysis (termed as In order to systematically benchmark the DREs we identi- CNC), which used ChIP-Seq data to find pairs of cohesin fied, we collected epigenomic data of various human cells binding-sites that do not contain CTCF sites. The ChIP-Seq and tissues from the ENCODE Project (Additional file 9: datasets of CTCF and RAD21 were downloaded from EN- Table S8). We downloaded chromatin marks including CODE (ENCFF095BZW, ENCFF001TTK, ENCFF001UNO, histone modifications of H3K4me1, H3K4me3, H3K9me3, ENCFF059UOO, ENCFF594DJD, ENCFF001XLM, ENCFF H3K27me3, H3K27ac, and p300 ChIP-seq signals to 001TTJ, ENCFF001TTK, ENCFF001VDS). evaluate the enhancer activity of distal DREs, from breast cancer cells (MCF-7), colon cancer cells (HCT116), cer- Comparing MICMIC with other methods vical cancer cells (HeLa-S3), liver cancer cells (HepG2), We used IM-PET 23,106 EP interaction pairs between and lung cancer cells (A549). The enrichment of histone 5311 CpG probes and 344 genes as positive control marks at the distal DREs derived from TCGA cancer co- and tested the precision of EP prediction from patient horts was calculated with the epigenome profiling data data by four methods: MICMIC; ELMER; BNstruct from the corresponding cell lines or tissues. To evaluate (Bayesian Network Structure Learning) [40]; and the status of evolutionary conservation, we obtained the NEO2 (Network Edge Orienting (NEO) Software) 100-way PhastCons conservation data to calculate the [41]. All the methods were applied on the expression conservation score for the distal DREs in each cancer. We and methylation data from the same patient cohort of have also tested DNase I hypersensitivity data from TCGA liver cancer. The MICMIC EP prediction was MCF7, HelaS3, A549, and HepG2. We calculated the ranked by the normalized mutual information and Tong et al. Genome Biology (2018) 19:73 Page 14 of 17 conditional mutual information. The ELMER EP pre- and C706D mutations) by point mutagenesis with primers: diction was ranked by the empirical p value (Pe). The Dmt3a-muP705-Forward, GGC AGT GTC GAC AAT BNstruct EP prediction was ranked by the confidence GAC CTC TCC ATT GTC AAC CCT G; threshold (alpha). The NEO2 EP prediction was Dmt3a-muP705-Reverse, TCA TTG TCG ACA CTG CCT ranked by edge orienting score (LEO.NB.OCA). The CCA ATC ACC AGG, with sequencing confirmation. precision rates were calculated and compared when Putative distal regulatory regions and promoters of the tar- selecting the same number of top ranked EP pairs get genes were amplified from human genomic DNA (see from different methods. Additional file 4: Table S3 for primer sequences used in cloning) and inserted into the pGL3-basic vector (Promega). Cell culture For dCas9-TET1 targeting, we used these plasmids: Gastric cancer cell line AGS was from ATCC and liver pCAG-dCas9-5xPlat2AflD and pCAG-scFvGCN4sfGFP cancer cell lines BEL-7402 and PLC8024 were obtained TET1CD (Addgene plasmid #82560 and #82561, gifts from the Institute of Virology of the Chinese Academy of from Izuho Hatada). We generated catalytically inactive Medical Sciences (Beijing, China). AGS cells were cul- TET1 with H1671Y and D1673A mutations with primers: tured in RPMI-1640 medium (Gibco) supplemented with Tet1-muH1671-Forward, TCC CTA CAG GGC CAT 10% fetal bovine serum (HyClone) and 1% Anti-Anti TCA CAA CAT GAA TAA TGG AAG CAC TG; and (Gibco). BEL-7402 and PLC8024 cells were cultured in Tet1-muH1671-Reverse, AAT GGC CCT GTA GGG DMEM medium (Gibco) supplemented with 10% fetal bo- ATG AGC ACA GAA GTC CAG, with sequencing vine serum and 1% Anti-Anti. The AGS cell line can be ef- confirmation. fectively transiently transfected with efficiency > 50% with Before we decided to use single sgRNA to target one dis- lipofectamine3000. We selected liver cancer cell line tal DRE, we tested two or three sgRNAs in combination to BEL-7402 to test the effect of downregulation of tumor target one distal DRE. However, there is no difference for suppressors, such as KLF9, APOA1, APOC3, and the dCas9 targeting effect on the target gene expression. CBFA2T3. We used liver cancer cell line PLC8024, a more aggressive one compared with BEL-7402, to test the effect Transfections and control design of downregulation of oncogenes, such as HDCA11, All transfections were done with lipofectamine 3000 CDT1, NDRG1, TKT, TK1, BIRC5, and SLC16A3. (Invitrogen) according to the manufacturer’s instruc- tions. The ratios of co-transfected plasmids were as fol- RNA purification and qPCR lows: 1 gRNAs: 2 px330-dCas9: 1 pcDNA3-Dnmt3A-3 L Total RNA was purified using the method described pre- (test) or pcDNA3 (control) for qPCR; 1 gRNAs: 1 viously [42], followed by treatment with RNase-free pCAG-dCas9-5xPlat2AflD: 1 pCAG-scFvGCN4sfGFP DNaseI (NEB). RevertAid RT Reverse Transcription Kit TET1CD (test) or pcDNA3 (control) for qPCR; 19 (Thermo) was used to perform the first strand cDNA pGL3-promoter or pGL3-promoter-enhancer: 1 pRL-TK synthesis according to the manufacturer’s instructions. for luciferase assay; and 5 gRNAs: 10 Px330-dCas9: 5 For qPCR analysis, cDNA was subjected to quantifica- pcDNA3-Dnmt3A-3 L (test) or pcDNA3 (control): 19 tion by iTaq Universal SYBR green supermix (Bio-Rad). pGL3-promoter or pGL3-promoter-enhancer: 1 pRL-TK for luciferase assay. Plasmids and cloning Above “pcDNA3 (control)” is a control for dCas9 target- Catalytic domains of Dnmt3a and Dnmt3l were ampli- ing, in which dCas9 co-transfected with empty pcDNA3 fied from mouse cDNA and were fused to form without DNMT3A-3 L/TET1. The same conclusion as Dnmt3A-3 L. PUFa from pAC1405-pCR8-4xNLS shown in Figs. 3d and 4c can be reached by using scram- _PUFa_2xNLS (Addgene #71903) were fused with bled sgRNA as the control for the qPCR test (Additional Dnmt3a-3 l into vector of pcDNA3-Flag-HA (Addgene file 1: Figure S8c). dCas9 targeting specificity was con- #10792, a gift from William Sellers). gRNAs were cloned firmed with off-target test by bisulfite sequencing of into pAC1371-pX-sgRNA-5xPBSa (Addgene #71888, non-targeted sites (WNT5B-sgRNA in Additional file 1: Additional file 4: Table. S3 for sgRNA sequences). Figure S7a vs Fig. 3c, and NDRG1-sgRNA in Additional pAC1405-pCR8-4xNLS_PUFa_2xNLS and pAC1371-pX file 1: Figure S7b). dCas9 targeting specificity was also -sgRNA-5xPBSa were gifts from Albert Cheng (Addgene confirmed with qPCR quantifying other non-targeted plasmid #71888, Addgene #71903). dCas9 expression genes with WNT5B-sgRNA (Additional file 1:Figure S7c). plasmid was generated by replacement of the cas9 with Furthermore, we performed experiments by using “untar- dCas9 cassette in px330 vector (px330, Addgene plasmid geted” or catalytically inactive DNMT3A-3 L/TET1 to #42230, a gift from Feng Zhang; 3xFLAG-dCas9/ rule out the possibility of off-target due to overexpression pMXs-neo Addgene plasmid #51260, a gift from Hodaka DNMT3A-3 L/TET1 (Additional file 1: Figure S8a,b). The Fujii). We generated catalytically inactive Dnmt3a (P705V “untargeted” constructs were generated by removal of the Tong et al. Genome Biology (2018) 19:73 Page 15 of 17 PUFa linker from DNMT3A-3 L-fusion, or removal of positive-up, or positive-down), we counted the number of scFv linker from TET1-fusion (Additional file 1: DREs containing the binding site of the TF being tested, de- Figure S6). For these “untargeted,” catalytically active noted as variable “a” below and variable “b” fornumberof DNMT3A-3 L/TET1 was overexpressed but targeted DREs not containing the TF being tested. For the entire to nowhere due to the deletion of “linker” domain. The DREs combining the four subgroups, we can also get simi- “untargeted” or catalytically inactive DNMT3A-3 L/TET1 larnumberas “c” and “d” for containing and not containing did not result in any significant change of the target gene the TF being tested, respectively. Calculation of the enrich- expression (Additional file 1: Figure S8a, b). ment odds ratio (OR) and a 95% confidence interval (CI) was conducted with the following formulas: cDNA cloning and overexpression in lentivirus OR ¼ðÞ a=c =ðÞ b=d We cloned HDAC11, WNT5B, and MLEC cDNA from human cDNA library. We then inserted each pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi CI ¼ exp logðÞ OR  1:96 1=a þ 1=b þ 1=c þ 1=d cDNA into lentiviral expression vector lenti-Blast, modified from lentiCas9-Blast (Addgene #52962, a gift from Feng Zhang). The lentivirus was packed with We then filtered TFs with an OR > 1.05 as the plasmids pMD2.G and psPAX2 after co-transfection enriched TFs in each DRE category. into 293 T cells. All cDNAs have been confirmed by DNA sequencing. Evaluate the tissue specificity of genes We downloaded the gene expression data of human tis- Dual luciferase assay sues from GTEx (GTEx V6 dataset) [44]. We used the The Dual-Luciferase Reporter Assay System (Promega) Voom method to normalize the data and limma [31]to was used in dual luciferase assay according to the manu- identify the differential expression genes comparing facturer’s instructions. samples of one tissue against all other tissues. Genes passing the threshold, log2 transformed Fold-Chang Migration assay >0.58 or< − 0.58 and adjusted p value < 0.01, were 5 6 4×10 of AGS cells or 1 × 10 BEL-7402 and PLC8024 identified as the tissue specific ones. cells were used to conduct migration assay using the 12-Well Chemotaxis Chamber (Neuro Probe) according Enrichment of transcription repressors to the manufacturer’s instructions. We searched the AmiGO database [45]with the key words “transcription repressor” and “negative regulation” to obtain Cell proliferation assay a list of genes related to the transcriptional repression CCk-8 (Dojindo) was used to perform cell proliferation process and collected the repressor information from assay following the manufacturer’s instructions. GO:0017053, GO:0090571, GO:0001206, GO:0001227, GO:0001191, GO:0000900, GO:0070491, GO:0070176, 5-aza-deoxycytidine treatment GO:0003714,GO:0032785, GO:2000143, GO:1903507, and AGS, BEL-7402, and PLC8024 cell lines were treated GO:0001078. These gene sets include transcriptional re- with 10 μM 5-aza-dC (Sigma-Aldrich) for 48 h, followed pressor activity, translation repressor activity, and transcrip- by RNA purification and qRT-PCR as described. DMSO tion repressor complex. Enrichment of transcription was used as a control to establish baseline expression. repressor of TFs associated with distal DREs was conducted by hypergeometric analysis. Identification of enriched transcription factor bindings For a distal DRE-target pair, a TF is considered a Discovery of core transcriptional regulatory circuitry regulator of the target if the cognate binding motif Core regulatory circuitry (CRCs) is formed by a set of of this TF can be found on the ± 250-bp genomic re- key TFs associated with super-enhancers (SEs) in gions flanking the DRE. To identify TFs associated regulating tumor-subtype specific gene expression and with the ± 250-bp genomic region flanking each maintaining tumor cellular identity. The TFs in each DRE, we used TFs from the Mocap database, con- CRC are auto-regulated by themselves via binding taining genomic mapping for 823 TFs [43]with sites on their corresponding SE. The TFs can also binding quality. Stringent cut-off (p value < 1e-5) cross-regulate each other by forming an intercon- was applied to select the TF binding sites. Mocap nected loop via cognate binding sites on other TFs’ method is an integrated classifier that assembles motif related SEs. Based on this information, we took ad- scores, chromatin accessibility, TF footprints, evolutionary vantageof thegenome-wideinformation on distal conservation, and other factors to predict TF bindings. For DRE-targets generated from our MICMIC method to each DRE category tested (negative-up, negative-down, assemble the CRCs regulated by DNA methylation for Tong et al. Genome Biology (2018) 19:73 Page 16 of 17 each cancer type. The information for SEs of human Additional file 8: Table S7. Cosmic non-coding variations neighboring genome hg19 was downloaded from dbSUPER [46]. LIHC DRE. (XLSX 187 kb) First, we selected the distal DREs overlapping with Additional file 9: Table S8. Data source of histone marks and DNase I. (XLSX 10 kb) SEs and identified the enriched TFs (OR > 1.05, CI = 95%) associated with these distal DREs. We then pre- Abbreviations dicted the auto-regulatory loops with the following DREs: Direct regulatory elements; MRA: Master regulator analysis; TCGA: The criterion: the TF on test is under regulation of distal Cancer Genome Atlas; TF: Transcription factor; TSS: Transcription start sites DREs with binding sites for TF itself. Cross-regulation between a pair of TFs can be inferred if the cognate Funding This work was supported by the Research Grants Council of Hong Kong binding site of one TF can be found on the other grants, HKU 17127014 General Research Fund (JZ) and HKU T12–710/16R TF’s related SE. After putting together all of the auto- Theme-based Research Scheme (SYL and JZ). and cross-regulations, we generated an interconnected Availability of data and materials CRCs eventually. TCGA methylation and RNA-seq data are available from BROAD GDAC Firehose ( http://gdac.broadinstitute.org/runs/stddata__2016_01_28/)[48]. The datasets of IM-PET, ChIA-PET, and HiC were downloaded from the TF targets and downstream cancer pathway analysis 4DGenome database (https://4dgenome.research.chop.edu/Tables/4DGeno- As mentioned before, for a distal DRE-target pair, a TF me_full.txt)[39]. The histone and DNase I signals can be accessed from ENCODE (https:// is considered a regulator of the target if the cognate www.encodeproject.org/matrix/?type=Experiment)[49] with accession ID binding motif of this TF can be found on the surround- listed in Additional file 9: Table S8. ing regions of the DRE (± 250 bp). After identification of The MICMIC software is available from https://github.com/ZhangJlab/MICMIC, under Creative Commons Attribution 4.0 license [50]. The DOI for source code the targets for CRCs, we conducted enrichment analysis used in this article is https://doi.org/10.5281/zenodo.1220929 [51]. for the downstream pathways. Enrichment analysis of KEGG cancer pathways was conducted to identify the Authors’ contributions pathway targeted by CRC TFs highly expressed in each YT performed the computational analyses with the aid of BR. JS performed experiments with the aid of CW, QK, and CNW. YT and JZ analyzed the data. SYL tumor subtype (cut-off p value < 0.05). and ASC provided support and critical comments to the manuscript. JZ conceived of, designed, and directed the study. YT and JZ wrote the manuscript with help from all authors. All authors read and approved the final manuscript. TF network decomposition and network motif identification Ethics approval and consent to participate The TF network mediated by distal DREs was derived Not applicable. from genome-wide DRE-target information predicted by Competing interests MICMIC after removal of non-TF genes. For network The authors declare that they have no competing interests. motif analysis, we used the mfinder software [47] to dis- assemble the TF network. On average across the 11 can- Publisher’sNote cer types, the TF networks were decomposed into 1.85 Springer Nature remains neutral with regard to jurisdictional claims in million three-node subgraphs with 13 types of published maps and institutional affiliations. three-node network motifs identified. Relative enrich- Author details ment or depletion of each of the 13 basic network motifs School of Biological Sciences, The University of Hong Kong, Hong Kong, within each cancer was calculated. Two hundred ran- Hong Kong. Department of Pathology, The University of Hong Kong, Queen Mary Hospital, Pokfulam, Hong Kong. domized same-size networks were used as the random control and the significance Z-score was calculated Received: 27 October 2017 Accepted: 3 May 2018 (Z > 2 considered as enriched and Z < −2asdepleted). References Additional files 1. Hansen KD, Timp W, Bravo HC, Sabunciyan S, Langmead B, McDonald OG, et al. Increased methylation variation in epigenetic domains across cancer types. Nat Genet. 2011;43:768–75. Additional file 1: Supplementary figures. (DOCX 10287 kb) 2. Baylin SB, Jones PA. A decade of exploring the cancer epigenome—biological Additional file 2: Table S1. Barcode of TCGA samples. (XLSX 81 kb) and translational implications. Nat Rev Cancer. 2011;11:726. 3. Sproul D, Nestor C, Culley J, Dickson JH, Dixon JM, Harrison DJ, et al. Additional file 3: Table S2. DREs identified from 11 cancers. Transcriptionally repressed genes become aberrantly methylated and (XLSX 5825 kb) distinguish tumors of different lineages in breast cancer. Proc Natl Acad Sci Additional file 4: Table S3. sgRNA-primer of DRE tested. (XLSX 11 kb) U S A. 2011;108:4364–9. Additional file 5: Table S4. Category of methylation change. 4. Hovestadt V, Jones DT, Picelli S, Wang W, Kool M, Northcott PA, et al. (XLSX 5313 kb) Decoding the regulatory landscape of medulloblastoma using DNA methylation sequencing. Nature. 2014;510:537–41. Additional file 6: Table S5. Interconnected loops in core regulatory 5. Bernstein DL, Le Lay JE, Ruano EG, Kaestner KH. TALE-mediated epigenetic circuits. (XLSX 40 kb) suppression of CDKN2A increases replication in human fibroblasts. J Clin Additional file 7: Table S6. TF_network_motifs. (XLSX 11 kb) Invest. 2015;125:1998–2006. Tong et al. Genome Biology (2018) 19:73 Page 17 of 17 6. Cui C, Gan Y, Gu L, Wilson J, Liu Z, Zhang B, et al. P16-specific 34. Sanyal A, Lajoie BR, Jain G, Dekker J. The long-range interaction landscape DNA methylation by engineered zinc finger methyltransferase of gene promoters. Nature. 2012;489:109–13. inactivates gene transcription and promotes cancer metastasis. 35. Rada-Iglesias A, Bajpai R, Swigut T, Brugmann SA, Flynn RA, Wysocka J. A Genome Biol. 2015;16:252. unique chromatin signature uncovers early developmental enhancers in 7. Baylin SB. DNA methylation and gene silencing in cancer. Nat Clin Pract humans. Nature. 2011;470:279–83. Oncol. 2005;2:S4–S11. 36. Creyghton MP, Cheng AW, Welstead GG, Kooistra T, Carey BW, Steine EJ, 8. Aran D, Sabato S, Hellman A. DNA methylation of distal regulatory sites et al. Histone H3K27ac separates active from poised enhancers and predicts characterizes dysregulation of cancer genes. Genome Biol. 2013;14:R21. developmental state. Proc Natl Acad Sci U S A. 2010;107:21931–6. 37. Stadler MB, Murr R, Burger L, Ivanek R, Lienert F, Scholer A, et al. DNA- 9. Yao L, Shen H, Laird PW, Farnham PJ, Berman BP. Inferring regulatory binding factors shape the mouse methylome at distal regulatory regions. element landscapes and transcription factor networks from cancer Nature. 2011;480:490–5. methylomes. Genome Biol. 2015;16:105. 38. Burger L, Gaidatzis D, Schubeler D, Stadler MB. Identification of active 10. Bell RE, Golan T, Sheinboim D, Malcov H, Amar D, Salamon A, et al. Enhancer regulatory regions from DNA methylation data. Nucleic Acids Res. 2013; methylation dynamics contribute to cancer plasticity and patient mortality. 41:e155. Genome Res. 2016;26:601–11. 39. Teng L, He B, Wang J, Tan K. 4DGenome: a comprehensive database of 11. Lin X, Su J, Chen K, Rodriguez B, Li W. Sparse conserved under-methylated chromatin interactions. Bioinformatics. 2015;31:2560–4. CpGs are associated with high-order chromatin structure. Genome Biol. 40. Franzin A, Sambo F, Di Camillo B. bnstruct: an R package for Bayesian 2017;18:163. Network structure learning in the presence of missing data. Bioinformatics. 12. Kalisch M, Buhlmann P. Estimating high-dimensional directed acyclic graphs 2017;33:1250–2. with the PC-algorithm. J Mach Learn Res. 2007;8:613–36. 41. Aten JE, Fuller TF, Lusis AJ, Horvath S. Using genetic markers to orient 13. Kalisch M, Buhlmann P. Robustification of the PC-Algorithm for Directed the edges in quantitative trait networks: the NEO software. BMC Syst Acyclic Graphs. J Comput Graph Stat. 2008;17:773–89. Biol. 2008;2:34. 14. Cheng AW, Jillette N, Lee P, Plaskon D, Fujiwara Y, Wang W, et al. Casilio: a 42. Shatzkes K, Teferedegne B, Murata H. A simple, inexpensive method for versatile CRISPR-Cas9-Pumilio hybrid for gene regulation and genomic preparing cell lysates suitable for downstream reverse transcription labeling. Cell Res. 2016;26:254. quantitative PCR. Sci Rep. 2014;4:4659. 15. Morita S, Noguchi H, Horii T, Nakabayashi K, Kimura M, Okamura K, et al. 43. Chen X, Yu B, Carriero N, Silva C, Bonneau R. Mocap: large-scale inference of Targeted DNA demethylation in vivo using dCas9-peptide repeat and scFv- transcription factor binding sites from chromatin accessibility. Nucleic Acids TET1 catalytic domain fusions. Nat Biotechnol. 2016;34:1060–5. Res. 2017;45:4315–29. https://www.nature.com/articles/ncomms15943. 16. Liu XS, Wu H, Ji X, Stelzer Y, Wu X, Czauderna S, et al. Editing DNA 44. Consortium GT. The Genotype-Tissue Expression (GTEx) project. Nat Genet. Methylation in the Mammalian Genome. Cell. 2016;167:233–47. e217 2013;45:580–5. 17. Huang YH, Su J, Lei Y, Brunetti L, Gundry MC, Zhang X, et al. DNA epigenome 45. Carbon S, Ireland A, Mungall CJ, Shu S, Marshall B, Lewis S, et al. AmiGO: online editing using CRISPR-Cas SunTag-directed DNMT3A. Genome Biol. 2017;18:176. access to ontology and annotation data. Bioinformatics. 2009;25:288–9. 18. Basso K, Margolin AA, Stolovitzky G, Klein U, Dalla-Favera R, Califano A. 46. Khan A, Zhang X. dbSUPER: a database of super-enhancers in mouse and Reverse engineering of regulatory networks in human B cells. Nat Genet. human genome. Nucleic Acids Res. 2016;44:D164–71. 2005;37:382. 47. Kashtan N, Itzkovitz S, Milo R, Alon U. Efficient sampling algorithm for 19. Zhang X, Zhao X-M, He K, Lu L, Cao Y, Liu J, et al. Inferring gene regulatory estimating subgraph concentrations and detecting network motifs. networks from gene expression data by path consistency algorithm based Bioinformatics. 2004;20:1746–58. on conditional mutual information. Bioinformatics. 2011;28:98–104. 48. Broad Institute TCGA Genome Data Analysis Center. Analysis-ready 20. Ernst J, Kellis M. ChromHMM: automating chromatin-state discovery and standardized TCGA data from Broad GDAC Firehose 2016_01_28 run. Broad characterization. Nat Methods. 2012;9:215–6. Institute of MIT and Harvard. Dataset. 2016. https://doi.org/10.7908/C11G0KM9. 21. Charlet J, Duymich CE, Lay FD, Mundbjerg K, Sørensen KD, Liang G, et al. 49. Consortium EP. Identification and analysis of functional elements in 1% of Bivalent regions of cytosine methylation and H3K27 acetylation suggest an the human genome by the ENCODE pilot project. Nature. 2007;447:799. active role for DNA methylation at enhancers. Mol Cell. 2016;62:422–31. 50. Tong Y, Sun J, Wong CF, Kang Q, Ru B, Wong CN, et al. MICMIC: identification 22. Consortium EP. An integrated encyclopedia of DNA elements in the human of DNA methylation of distal regulatory regions with causal effects on genome. Nature. 2012;489:57–74. tumorigenesis. Github. https://github.com/ZhangJlab/MICMIC. (2018). 23. He B, Chen C, Teng L, Tan K. Global view of enhancer–promoter Accessed 19 Apr 2018. interactome in human cells. Proc Natl Acad Sci. 2014;111:E2191–9. 51. Tong Y, Sun J, Wong CF, Kang Q, Ru B, Wong CN, et al. MICMIC: 24. Fullwood MJ, Liu MH, Pan YF, Liu J, Han X, Mohamed YB, et al. An oestrogen identification of DNA methylation of distal regulatory regions with causal receptor α-bound human chromatin interactome. Nature. 2009;462:58. effects on tumorigenesis. zenodo. https://zenodo.org/record/1220929#. 25. Hnisz D, Abraham BJ, Lee TI, Lau A, Saint-André V, Sigova AA, et al. Super- WuH5WC7wbIU (2018). Accessed 19 Apr 2018. enhancers in the control of cell identity and disease. Cell. 2013;155:934–47. 26. Whyte WA, Orlando DA, Hnisz D, Abraham BJ, Lin CY, Kagey MH, et al. Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell. 2013;153:307–19. 27. Network CGA. Comprehensive molecular portraits of human breast tumors. Nature. 2012;490:61. 28. Milo R, Itzkovitz S, Kashtan N, Levitt R, Shen-Orr S, Ayzenshtat I, et al. Superfamilies of evolved and designed networks. Science. 2004;303:1538–42. 29. Neph S, Stergachis AB, Reynolds A, Sandstrom R, Borenstein E, Stamatoyannopoulos JA. Circuitry and dynamics of human transcription factor regulatory networks. Cell. 2012;150:1274–86. 30. Thurman RE, Rynes E, Humbert R, Vierstra J, Maurano MT, Haugen E, et al. The accessible chromatin landscape of the human genome. Nature. 2012;489:75. 31. Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43:e47. 32. Lachmann A, Giorgi FM, Lopez G, Califano A. ARACNe-AP: gene network reverse engineering through adaptive partitioning inference of mutual information. Bioinformatics. 2016;32:2233–5. 33. Luo W, Friedman MS, Shedden K, Hankenson KD, Woolf PJ. GAGE: generally applicable gene set enrichment for pathway analysis. BMC Bioinformatics. 2009;10:161.

Journal

Genome BiologySpringer Journals

Published: Jun 5, 2018

References

You’re reading a free preview. Subscribe to read the entire article.


DeepDyve is your
personal research library

It’s your single place to instantly
discover and read the research
that matters to you.

Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.

All for just $49/month

Explore the DeepDyve Library

Search

Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly

Organize

Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.

Access

Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals.

Your journals are on DeepDyve

Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more.

All the latest content is available, no embargo periods.

See the journals in your area

DeepDyve

Freelancer

DeepDyve

Pro

Price

FREE

$49/month
$360/year

Save searches from
Google Scholar,
PubMed

Create lists to
organize your research

Export lists, citations

Read DeepDyve articles

Abstract access only

Unlimited access to over
18 million full-text articles

Print

20 pages / month

PDF Discount

20% off