Genome-wide DNA methylation analysis identifies candidate epigenetic markers and drivers of hepatocellular carcinoma

Abstract

Alteration of the DNA methylation landscape is a key epigenetic event in cancer. With the accumulation of large-scale genome-wide DNA methylation data from clinical samples, we are able to characterize the patterns of DNA methylation alterations and to identify candidate epigenetic markers and drivers. In this survey, we take hepatocellular carcinoma (HCC) as an example to show the basic steps of analyzing DNA methylation patterns in cancer across multiple data sets. We collected three genome-wide DNA methylation data sets with ∼800 clinical samples and the corresponding gene expression data sets. First, quantitative analysis of two global methylation alterations shows that about 90% of tumors acquire either genome-wide DNA hypo-methylation or the CpG island methylator phenotype. Second, probe-level analysis identified 267, 228 and 197 hyper-methylated sites in promoter regions for the three data sets, respectively. These local hyper-methylation patterns are highly consistent: 84 sites (from 61 promoters) are hyper-methylated in all three data sets, including many previously reported genes, such as CDKL2, TBX15 and NKX6-2. Then, these hyper-methylated sites were used as candidate markers to classify tumor and non-tumor samples. Classifiers based on only 10 selected probes achieve high discriminative ability across different data sets. Finally, by integrative analysis of DNA methylation and gene expression data, we identified 222 candidate epigenetic drivers, which are enriched in inflammatory response and multiple metabolic pathways. A set of high-confidence candidates, including SFN, SPP1 and TKT, are significantly associated with patients’ overall survival. In summary, this study systematically characterized DNA methylation alterations and their impacts on gene expression in HCCs based on multiple data sets.
Keywords: DNA methylation, molecular marker, epigenetic driver, hepatocellular carcinoma

Introduction

DNA methylation, a key epigenetic regulatory factor, plays important roles in cancer initiation and progression [1, 2]. The ubiquity of DNA methylation alterations provides a novel choice of molecular markers for cancer diagnosis and prognosis [3–7]. In addition, integrative analysis of DNA methylation and gene expression can help to identify epigenetic drivers in cancer [8, 9]. Hepatocellular carcinoma (HCC) is one of the most common solid tumors worldwide [10–12]. Genome-wide differential DNA methylation [13, 14] and prognostic epigenetic signatures [15] in HCCs have been reported in a few studies. With the accumulation of genome-wide DNA methylation data sets from clinical samples, we are able to systematically characterize the DNA methylation patterns of HCCs and their impacts on gene expression. In this study, we collected three genome-wide DNA methylation data sets comprising 646 tumor and 134 non-tumor samples in total from HCC patients. First, we analyzed two global DNA methylation alterations: genome-wide DNA hypo-methylation (GDH) [16, 17] and the CpG island methylator phenotype (CIMP) [18, 19]. The results show that about 90% of tumors acquire at least one of the two alterations. Then, we analyzed the hyper-methylated sites in promoter regions, which can be used as candidate markers for discriminating tumor and non-tumor samples [13, 20, 21]. The results show that the hyper-methylated sites are highly consistent across data sets. Classifiers built on one data set have high discriminative ability in the other two data sets, and their performance was further validated in an independent cohort of 10 HCC patients. Integrative analysis of promoter DNA methylation and gene expression reveals a tendency toward negative correlation, and the correlations are highly consistent between data sets.
The anti-correlations identified 222 candidate epigenetic driver genes corresponding to 462 promoter sites. These candidates are highly enriched in inflammatory response and multiple metabolic processes. By further comparing promoter methylation levels and gene expression between tumor and non-tumor samples, a set of high-confidence candidates was identified. Some of them, including SFN, SPP1 and TKT, are significantly associated with patients’ overall survival.

Materials and methods

Data sets

Three public genome-wide DNA methylation data sets of HCCs (all profiled with the Illumina Infinium HumanMethylation450 BeadChip) were downloaded from the NCBI GEO database (DS1: GSE54503 [13], DS2: GSE56588 [15]) and the TCGA data portal (DS3: TCGA-LIHC). Probes with low-quality signals (detection P-value > 0.05), probes overlapping single-nucleotide polymorphisms and probes cross-hybridizing to multiple genomic loci were removed as recommended [22]. The beta-values, which represent the methylation levels of the probe target sites, were used in the following computational analyses. The corresponding gene expression data of two data sets (DS2 and DS3) were also collected. For the validation cohort, the DNA methylation of 10 paired tumor and non-tumor samples resected from HCC patients was profiled on the same platform. This study was approved by the ethics committee of Peking Union Medical College Hospital. Written informed consent was obtained from all participants. Details of all data sets used are given in Table 1. The curated data sets are available at http://bioinfo.au.tsinghua.edu.cn/member/jgu/hcc-dnameth/.
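The probe filtering described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `filter_probes`, the dictionary-based inputs and the toy probe IDs are hypothetical, and dropping a probe when its detection P-value exceeds 0.05 in any sample is only one possible reading of the criterion.

```python
def filter_probes(detection_p, snp_probes, cross_reactive_probes, p_cutoff=0.05):
    """Return the sorted probe IDs that pass the three quality filters.

    detection_p: dict mapping probe ID -> list of per-sample detection P-values.
    snp_probes, cross_reactive_probes: sets of probe IDs to exclude.
    Assumption: a probe fails if ANY sample exceeds the P-value cutoff.
    """
    kept = []
    for probe, p_values in detection_p.items():
        if any(p > p_cutoff for p in p_values):
            continue  # low-quality signal
        if probe in snp_probes or probe in cross_reactive_probes:
            continue  # SNP overlap or cross-hybridization
        kept.append(probe)
    return sorted(kept)


# Toy input (made-up probe IDs and P-values, for illustration only).
toy_detection_p = {
    "probe_A": [0.001, 0.002, 0.010],
    "probe_B": [0.001, 0.200, 0.003],  # fails detection in one sample
    "probe_C": [0.001, 0.001, 0.001],  # listed as overlapping a SNP
    "probe_D": [0.020, 0.001, 0.004],  # listed as cross-hybridizing
}
kept_probes = filter_probes(toy_detection_p, {"probe_C"}, {"probe_D"})
```

In practice the SNP and cross-hybridization lists would come from a published probe annotation; here they are passed in as plain sets so that the filter logic stays independent of any particular annotation source.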
Table 1. The data sets used in this study

| Data set | DNA methylation source | #T^a | #NT^b | Gene expression source | #T | #NT |
|---|---|---|---|---|---|---|
| DS1 | GSE54503 | 66 | 66 | | | |
| DS2 | GSE56588 | 224 | 19 | GSE63898 | 228 | 168 |
| DS3 | TCGA^c | 353 | 49 | TCGA | 356 | 49 |
| DS4 | PUMCH^d | 10 | 10 | | | |

a The number of tumor samples. b The number of non-tumor samples. c Only the patients from LIHC annotated with ‘Hepatocellular Carcinoma’ are used. d An additional validation cohort.

The probes in CpG islands were derived from the Bioconductor annotation database IlluminaHumanMethylation450k.db. The probes in promoter regions (−1500 to +500 around the transcription start sites) and gene body regions were re-annotated according to ENSEMBL Release GRCh37-r75.

Global measurements of DNA methylation alterations

Previous studies suggest that GDH and CIMP are two typical patterns of global DNA methylation alterations in cancer. To measure the level of GDH, the per-sample medians over all probes in non-tumor samples were used as the reference. A GDH z-score was then calculated against this reference for each sample i:

$$Z_i^{\mathrm{GDH}} = \frac{\mathrm{Median}_i - \mathrm{AVE}\left(\mathrm{Median}_{\text{Non-Tumor}}\right)}{\mathrm{SD}\left(\mathrm{Median}_{\text{Non-Tumor}}\right)}$$

To measure the level of CIMP, the per-sample third quartiles (the upper quartile, i.e. the one with the higher methylation level) of the probes located in CpG islands in non-tumor samples were used as the reference. A CIMP z-score was then calculated against this reference for each sample i:

$$Z_i^{\mathrm{CIMP}} = \frac{\mathrm{ThirdQuantile}_i - \mathrm{AVE}\left(\mathrm{ThirdQuantile}_{\text{Non-Tumor}}\right)}{\mathrm{SD}\left(\mathrm{ThirdQuantile}_{\text{Non-Tumor}}\right)}$$

A more negative GDH z-score means more severe global hypo-methylation, and a more positive CIMP z-score means more severe CpG island hyper-methylation.

Classifiers for discriminating tumor and non-tumor samples

To objectively assess the predictive performance of the classifiers, only DS1 (GSE54503) was used for training.
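Before turning to the probe-based features, the two global z-scores defined in the previous subsection can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' code: the helper names, the subtype labels and the toy numbers are hypothetical, the sample standard deviation is assumed for SD, and the cutoffs of −3 and 3 are the ones used in the Results.

```python
from statistics import mean, median, quantiles, stdev


def third_quartile(betas):
    """Upper quartile (75th percentile) of a list of beta-values."""
    return quantiles(betas, n=4, method="inclusive")[2]


def summarize_sample(all_betas, cpg_island_betas):
    """Per-sample summaries: median over all probes, third quartile over CpG-island probes."""
    return median(all_betas), third_quartile(cpg_island_betas)


def z_scores(sample_values, reference_values):
    """Standardize per-sample summaries against the non-tumor reference samples."""
    ref_mean = mean(reference_values)
    ref_sd = stdev(reference_values)  # assumption: sample standard deviation
    return [(value - ref_mean) / ref_sd for value in sample_values]


def label_sample(z_gdh, z_cimp, cutoff=3.0):
    """Assign a subtype from the two z-scores (label names are hypothetical)."""
    has_gdh = z_gdh < -cutoff
    has_cimp = z_cimp > cutoff
    if has_gdh and has_cimp:
        return "GDH+CIMP"
    if has_gdh:
        return "GDH"
    if has_cimp:
        return "CIMP"
    return "Normal-Like"


# Toy per-sample summaries (made-up numbers, not taken from the data sets).
non_tumor_medians = [0.80, 0.82, 0.78]  # median over all probes, one value per sample
tumor_medians = [0.60, 0.79]
non_tumor_q3 = [0.10, 0.12, 0.08]       # third quartile over CpG-island probes
tumor_q3 = [0.30, 0.11]

z_gdh = z_scores(tumor_medians, non_tumor_medians)
z_cimp = z_scores(tumor_q3, non_tumor_q3)
labels = [label_sample(g, c) for g, c in zip(z_gdh, z_cimp)]
```

Because both scores are standardized against non-tumor samples only, the same reference mean and standard deviation can be reused to score any new sample profiled on the same platform.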
First, the hyper-methylated sites in promoter regions were preselected by comparing the DNA methylation levels between tumor and non-tumor samples in the training data set with FastDMA (a package that identifies differentially methylated probes and regions based on a generalized linear model) [22], requiring q-value < 1e-10 and an increase in methylation level > 0.3. Three classifiers were then built on the training set from different features: (1) the ‘Global’ classifier, based on the two global features; (2) the ‘Top10’ classifier, based on the 10 most hyper-methylated probes in promoter regions; and (3) the ‘ENet’ classifier, based on 10 promoter probes selected by elastic net [23, 24] with bootstrapping. Elastic net is a regularized linear regression algorithm that performs feature selection during model fitting. The 10 probes most recurrently selected during bootstrapping were used in the final classifier, following the computational framework proposed in reference [25]. The area under the curve (AUC) and the sensitivity/specificity were used to assess performance.

Identification of candidate epigenetic drivers

To avoid noise from sites with small variation, only promoter probes with an inter-quartile range > 0.1 in tumor samples were used. Because the batch effects between the expression data sets could not be adjusted (one data set was generated by array and the other by sequencing), the Spearman’s rank correlations between promoter probes and the expression of their corresponding genes were calculated separately for each data set. Genes whose expression was anti-correlated with the methylation level of promoter probes (correlation z-score < −5 in both data sets) were identified as candidate epigenetic drivers.
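The correlation screen can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation: the function names and the toy vectors are hypothetical, and scaling the Fisher-transformed correlation by sqrt(n − 3) is an assumption, since the text only states that z-scores were derived from the correlations.

```python
import math
from statistics import mean, quantiles


def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def fisher_z(rho, n):
    """Fisher-transformed correlation scaled by sqrt(n - 3) (assumed scaling)."""
    rho = max(min(rho, 0.999999), -0.999999)  # keep atanh finite
    return math.atanh(rho) * math.sqrt(n - 3)


def iqr(values):
    """Inter-quartile range, used to drop probes with little variation."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def is_candidate(z_first, z_second, cutoff=-5.0):
    """A probe-gene pair passes only if it is anti-correlated in both data sets."""
    return z_first < cutoff and z_second < cutoff


# Toy tumor samples for one probe-gene pair (made-up values).
methylation = [0.1, 0.3, 0.5, 0.7, 0.9, 0.6]
expression = [9.0, 8.0, 6.5, 5.0, 3.0, 7.0]

probe_iqr = iqr(methylation)
rho = spearman(methylation, expression)
z = fisher_z(rho, len(methylation))
```

With only six toy samples even a strong anti-correlation stays well above the −5 cutoff; with hundreds of tumor samples per data set, much weaker correlations can reach it, which is why the requirement that both data sets agree matters.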
Then, a set of high-confidence candidates was selected by further comparing the differential DNA methylation (differential methylation level > 0.1) and gene expression (fold change > 1.5) between tumor and non-tumor tissues: oncogenic drivers were defined as hypo-methylated and over-expressed in tumor samples, and tumor-suppressive drivers as hyper-methylated and under-expressed in tumor samples.

Results

Analysis of global DNA methylation alterations

GDH (an indication of genome instability) and CIMP are two typical epigenetic alterations in solid tumors. We proposed quantile-based z-scores to measure the two alterations: (1) the median methylation levels of all probes in adjacent normal samples were used as the reference to calculate the z-scores for the global hypo-methylation phenotype (z-score < −3 for GDH); and (2) the third-quartile methylation levels of CpG island (CpGi) probes in adjacent normal samples were used as the reference to calculate the z-scores for CIMP (z-score > 3 for CIMP). The results show that 90.1% (585 of 646) of HCC samples have either GDH or CIMP, and 32.4% (209 of 646) have both phenotypes. The two phenotypes are correlated but have contradictory effects on each other. Only 2.2% (3 of 134) of adjacent normal samples are not ‘Normal-Like’ (both z-scores between −3 and 3), and 9.1% (59 of 646) of HCC samples are classified as ‘Normal-Like’ (Figure 1A). However, the four subtypes do not correlate with HCC invasion, clinical stage or patients’ overall survival (Supplementary Figure S1). Using only the two global features, classifiers achieve high performance in discriminating tumor and non-tumor samples (Figure 1B).

Figure 1. The two global features of DNA methylation alterations in HCCs. (A) The quantitative measurements of GDH and CIMP. (B) The predictive performances based on the two global features.

Identification of DNA methylation-based molecular markers

Hyper-methylation of CpG islands and selected promoters is another feature of DNA methylation alterations in solid tumors. Hyper-methylated sites in promoter regions are therefore candidate molecular markers of HCC. By setting a stringent criterion for differential methylation between tumor and non-tumor tissues (FastDMA q-value < 1e-10 and increase in methylation level > 0.3), we identified 267, 228 and 197 hyper-methylated sites from 168, 128 and 116 promoters in the three data sets, respectively. These hyper-methylation events do not occur at random: 84 sites and 61 promoters are consistently hyper-methylated (Supplementary Figure S2 and Table S1). A set of genes, including CDKL2, DUOXA1, NKX6-2, FSCN1, TBX15, ASCL2, ZSCAN1, DNM3 and TMEM240, have at least three hyper-methylated probes in all three data sets. To evaluate the discriminative ability of these hyper-methylated sites, DS1 (GSE54503) was used as the training data set to build classifiers. Based on the 267 hyper-methylated probes from DS1, two predictive classifiers were built: one based on the top 10 hyper-methylated probes and the other based on 10 computationally selected probes (named the Top10 and ENet classifiers, respectively; see the ‘Methods’ section for details). The classifiers were then applied to the other two data sets. The results show that the classifiers achieve high predictive performance (Figure 2A, B): for the Top10 classifier, the sensitivities are > 86% (91.1% and 86.2% in DS2 and DS3, respectively) and the specificities are almost 100% (100% and 98%) in the two testing data sets; for the ENet classifier, the sensitivities are lower (89.3% and 78.4%), but both specificities are 100%.
To further test the predictive performance, an additional validation data set was generated from paired tumor and non-tumor samples of 10 HCC patients. Similar performance was observed (Table 2). The probes selected for the classifiers are lowly methylated in adjacent normal tissues and consistently hyper-methylated in HCC tissues (Figure 2C–F). These results indicate that the selectively hyper-methylated promoter sites are good candidate molecular markers.

Table 2. The classification performances based on hyper-methylated probes

| Data set | Classifier | Sensitivity | Specificity |
|---|---|---|---|
| DS1 (GSE54503) | Global^a | 62/66 (93.9%) | 65/66 (98.5%) |
| DS1 (GSE54503) | ENet^b | 66/66 (100%) | 66/66 (100%) |
| DS1 (GSE54503) | Top10^c | 65/66 (98.5%) | 66/66 (100%) |
| DS2 (GSE56588) | Global | 206/224 (92.0%) | 19/19 (100%) |
| DS2 (GSE56588) | ENet | 200/224 (89.3%) | 19/19 (100%) |
| DS2 (GSE56588) | Top10 | 204/224 (91.1%) | 19/19 (100%) |
| DS3 (TCGA) | Global | 319/356 (89.6%) | 48/49 (98.0%) |
| DS3 (TCGA) | ENet | 279/356 (78.4%) | 49/49 (100%) |
| DS3 (TCGA) | Top10 | 307/356 (86.2%) | 48/49 (98.0%) |
| DS4 (PUMCH) | Global | 10/10 (100%) | 10/10 (100%) |
| DS4 (PUMCH) | ENet | 9/10 (90%) | 10/10 (100%) |
| DS4 (PUMCH) | Top10 | 10/10 (100%) | 8/10 (80%) |

a The performances based on the two global features. b The performances based on the 10 probes selected by elastic net with bootstrapping. c The performances based on the top 10 probes with the largest differential methylation levels.
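For readers who want to reproduce the entries of Table 2 from per-sample predictions, the two metrics can be computed as in the short sketch below. The function name and the label strings are hypothetical; the counts in the example are those reported for the Top10 classifier on DS2 (204 of 224 tumors and 19 of 19 non-tumor samples correctly called).

```python
def sensitivity_specificity(true_labels, predicted_labels, positive="tumor"):
    """Return (sensitivity, specificity), treating `positive` as the positive class."""
    tp = fn = tn = fp = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == positive:
            if predicted == positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted == positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Rebuild label vectors from the DS2 / Top10 counts reported in Table 2.
truth = ["tumor"] * 224 + ["non-tumor"] * 19
predicted = ["tumor"] * 204 + ["non-tumor"] * 20 + ["non-tumor"] * 19

sensitivity, specificity = sensitivity_specificity(truth, predicted)
```

Sensitivity is the fraction of tumor samples called tumor and specificity is the fraction of non-tumor samples called non-tumor, so the small non-tumor groups (19 in DS2, 10 in DS4) make the specificity estimates coarse: a single misclassified sample in DS4 changes it by 10 percentage points.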
Figure 2. The discriminative abilities and methylation patterns of selected hyper-methylated probes in promoter regions. (A) The predictive performance of the classifier based on the 10 most hyper-methylated probes. (B) The predictive performance of the classifier based on 10 probes selected by elastic net. (C–F) The heatmaps of the selected probes in all the studied data sets.

Identification of candidate epigenetic drivers by integrative analysis

DNA methylation of promoter regions (especially CpG island-associated promoters) can negatively regulate gene transcription [9, 26]. Correlation analysis shows overall negative correlations between promoter methylation and gene expression in tumor samples (Figure 3A, B), and the correlations are highly consistent between DS2 and DS3 (Figure 3C). A set of 222 candidate epigenetic drivers (with 462 probes), whose expression was anti-correlated with the methylation of at least one promoter probe in tumor samples in both data sets, was identified. These candidates are highly enriched in inflammatory response (26 genes, adjusted P-value 7.6e-6) and several metabolic processes, such as ‘xenobiotic metabolic process’ (19 genes, 8.6e-9), ‘carboxylic acid metabolic process’ (39 genes, 3.5e-8) and ‘oxidation-reduction process’ (36 genes, adjusted P-value 2.2e-6) (by DAVID v6.8beta [27]).

Figure 3. Correlation analysis of promoter methylation and gene expression. (A, B) The distribution of Spearman’s correlations between promoter methylation levels and gene expression in DS2 and DS3, respectively. (C) The second-order correlations between the correlations of DS2 and DS3.

These candidate epigenetic drivers were further investigated according to their differential methylation and expression between tumor and non-tumor samples (Supplementary Table S2). A set of ‘high-confidence’ oncogenic candidates, over-expressed and hypo-methylated in tumor samples, was identified, including SFN, SPP1, ACSL4, ALDH3A1, CLDN15, CYP7A1, TKT and GLUL (Table 3). Subsequent survival analysis shows that high expression of SFN, SPP1 and TKT in tumor tissues is significantly associated with poor survival (Figure 4). SPP1, also known as osteopontin, has been widely studied in HCC as a novel molecular marker [28–30] and can promote metastasis and stemness [31]. SFN, also known as stratifin, is activated by promoter hypo-methylation in lung adenocarcinoma [32] and can accelerate tumor development [33]. TKT, also known as transketolase, is over-expressed in metastatic relapsed HCC [34] and can promote cancer progression [35]. High-confidence tumor-suppressive candidates, under-expressed and hyper-methylated in tumor samples, were also identified, including SLC25A47, APOA5, ENDOD1, EXOCL3L4, EZR and PHYHD1 (Table 3). SLC25A47, also known as HCC down-regulated mitochondrial carrier protein [36], is strongly under-expressed in HCCs, but its role in HCC remains unknown.
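The high-confidence rule can be written down as a small decision function. The sketch below is illustrative only: the function and label names are hypothetical, and requiring the expression change to pass the cutoff in every expression data set is an assumption that the text does not state explicitly. The SFN and SLC25A47 inputs are the values listed for those genes in Table 3; the third input is a made-up counter-example.

```python
import math

LOG2_FC_CUTOFF = math.log2(1.5)  # fold change > 1.5, expressed on the log2 scale
DELTA_BETA_CUTOFF = 0.1          # differential methylation level > 0.1


def classify_candidate(log2fc_by_dataset, delta_beta):
    """Classify one probe-gene pair as oncogenic, tumor-suppressive or neither.

    log2fc_by_dataset: tumor vs non-tumor log2 fold changes, one per data set.
    delta_beta: tumor minus non-tumor methylation level of the promoter probe.
    Assumption: the expression change must pass the cutoff in all data sets.
    """
    over_expressed = all(fc > LOG2_FC_CUTOFF for fc in log2fc_by_dataset)
    under_expressed = all(fc < -LOG2_FC_CUTOFF for fc in log2fc_by_dataset)
    if over_expressed and delta_beta < -DELTA_BETA_CUTOFF:
        return "oncogenic"
    if under_expressed and delta_beta > DELTA_BETA_CUTOFF:
        return "tumor-suppressive"
    return "not high-confidence"


sfn_call = classify_candidate([0.94, 3.59], -0.117)        # SFN, probe cg07786675
slc25a47_call = classify_candidate([-2.26, -5.11], 0.182)  # SLC25A47, probe cg00946753
toy_call = classify_candidate([0.30, 0.40], -0.200)        # made-up gene with a weak expression change
```

Pairing the direction of the expression change with the opposite direction of the methylation change is what separates these candidates from genes that are merely differentially expressed.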
Taken together, the above results indicate that integrative analysis of promoter DNA methylation and gene expression in multiple data sets can facilitate the identification of epigenetic drivers in cancer.

Table 3. The high-confidence candidate epigenetic drivers

| Symbol | log2FC^a DS2 | log2FC^a DS3 | Probe | Correlation z-score^b DS2 | Correlation z-score^b DS3 | Differential methylation level^c |
|---|---|---|---|---|---|---|
| SFN | 0.94 | 3.59 | cg07786675 | −10.76 | −18.33 | −0.117 |
| | | | cg13466284 | −10.22 | −21.91 | −0.229 |
| | | | cg17330303 | −10.49 | −21.08 | −0.224 |
| | | | cg06720467 | −8.79 | −17.88 | −0.255 |
| | | | cg21950166 | −9.12 | −18.89 | −0.237 |
| | | | cg11348165 | −10.50 | −19.44 | −0.164 |
| SLC25A47 | −2.26 | −5.11 | cg00946753 | −12.86 | −20.15 | 0.182 |
| | | | cg13640769 | −11.61 | −18.82 | 0.143 |
| | | | cg00239071 | −11.13 | −20.72 | 0.215 |
| TKT | 1.36 | 1.14 | cg07918978 | −8.02 | −12.43 | −0.124 |
| | | | cg00707777 | −8.96 | −12.75 | −0.138 |
| | | | cg19378537 | −7.65 | −11.03 | −0.104 |
| ARL4C | −1.06 | −0.83 | cg05308656 | −8.98 | −9.28 | 0.132 |
| | | | cg09453076 | −7.85 | −8.91 | 0.169 |
| CLDN15 | 0.98 | 1.28 | cg07651914 | −9.59 | −14.55 | −0.116 |
| | | | cg08636573 | −8.84 | −14.45 | −0.177 |
| QSOX1 | −1.43 | −1.03 | cg03208016 | −5.88 | −8.70 | 0.100 |
| | | | cg09505809 | −5.72 | −5.70 | 0.111 |
| SPP1 | 1.43 | 1.83 | cg00088885 | −7.74 | −7.77 | −0.214 |
| | | | cg15460348 | −12.67 | −17.22 | −0.148 |
| TIMP2 | −0.99 | −0.61 | cg06641285 | −6.37 | −8.39 | 0.299 |
| | | | cg05306745 | −7.76 | −9.36 | 0.363 |
| ACSL4 | 0.95 | 2.08 | cg14457256 | −8.42 | −13.56 | −0.147 |
| APOA5 | −1.48 | −2.44 | cg02157083 | −7.28 | −13.47 | 0.113 |
| C17orf58 | 1.05 | 0.69 | cg12739664 | −6.37 | −6.27 | −0.122 |
| ENDOD1 | −0.82 | −1.02 | cg16317734 | −5.82 | −8.65 | 0.278 |
| EZR | −1.44 | −0.62 | cg22812275 | −6.05 | −7.67 | 0.123 |
| GSTP1 | −0.81 | −0.81 | cg04920951 | −8.04 | −10.60 | 0.258 |
| PHYHIPL | 1.02 | 0.91 | cg06972969 | −7.35 | −6.89 | −0.126 |
| PTGDS | −2.60 | −1.64 | cg13796381 | −5.89 | −6.59 | 0.196 |
| RAMP1 | 1.10 | 0.59 | cg03647559 | −6.73 | −10.38 | −0.109 |
| SHISA4 | 0.78 | 0.85 | cg11586189 | −7.73 | −9.45 | −0.104 |
| SLC22A1 | −2.73 | −4.70 | cg13434757 | −7.20 | −11.14 | 0.145 |

a The log2-transformed fold changes between tumor and non-tumor samples (>0 means over-expression in tumor samples) in DS2 and DS3. b The Spearman’s rank correlations (transformed to z-scores by Fisher’s transformation) between gene expression and probe methylation levels in tumor samples of DS2 and DS3. c The differential methylation levels between tumor and non-tumor samples (>0 means hyper-methylation in tumor samples) across all three studied data sets.

Figure 4. The high-confidence candidate epigenetic drivers significantly associated with overall survival, including SFN, SPP1 and TKT. In the left and middle columns, the samples are ordered by decreasing gene expression in each data set and the methylation levels of the corresponding anti-correlated probes are plotted. The right column shows the survival curves according to gene expression.

Discussion

DNA methylation alteration is a molecular landmark of cancer. The major alterations can be characterized from three aspects: the global alterations, the site-level local alterations (especially hyper-methylation in CpG islands and promoters) and their impacts on gene expression. Taking HCC as the example, the global analysis first indicates that DNA methylation patterns are significantly rewired during HCC development: about 90% of tumor samples acquire strong GDH or CIMP. Differential methylation analysis then shows that many promoter regions are selectively hyper-methylated in tumors. Classifiers based on 10 selected hyper-methylated sites discriminate tumor from non-tumor samples with high performance.
These sites can be further investigated as candidate molecular markers. By integrative analysis with gene expression, we identified 222 candidate epigenetic driver genes whose expression is strongly anti-correlated with promoter methylation. Several high-confidence candidates, including SFN, SPP1 and TKT, are significantly associated with the overall survival of HCC patients. In summary, this study suggests that DNA methylation alterations are good candidates for biomarker discovery, and that their impacts on gene expression can expand our catalog of cancer drivers.

Key Points

- A practical guideline to identify candidate epigenetic markers and drivers based on multiple genome-wide DNA methylation data sets.
- Global analysis shows that ∼90% of HCC samples acquire GDH or the CpGi methylator phenotype (CIMP).
- Hyper-methylated sites in promoter regions are candidate molecular markers to discriminate tumor and non-tumor samples.
- Correlation analysis of DNA methylation and gene expression data identified a set of candidate epigenetic drivers enriched in inflammatory response and metabolic processes.
- A few high-confidence candidate drivers, including SFN, SPP1 and TKT, are significantly associated with patients’ overall survival.

Supplementary data

Supplementary data are available online at http://bib.oxfordjournals.org/.
Yongchang Zheng, MD, is a doctor in Department of Liver Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College Qianqian Huang is a master’s student of bioinformatics at MOE Key Laboratory of Bioinformatics, TNLIST Bioinformatics Division & Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University Zijian Ding is a PhD student of bioinformatics at MOE Key Laboratory of Bioinformatics, TNLIST Bioinformatics Division & Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University Tingting Liu, MSc, is a bioinformatics scientist at My Health Gene Technology Co., Ltd. Chenghai Xue, PhD, is a bioinformatics scientist at My Health Gene Technology Co., Ltd. He is also a member at Joint Laboratory of Large-scale Medical Data Pattern Mining and Application, Institute of Automation, Chinese academy of sciences, Beijing, China. Xinting Sang, MD, is the Director of Department of Liver Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College Jin Gu, PhD, is an Assistant Professor of bioinformatics at MOE Key Laboratory of Bioinformatics, TNLIST Bioinformatics Division & Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University Acknowledgements We thank Dongfang Wang for clustering analysis and Jinxia Guan for helpful discussions. Funding National Basic Research Program of China (2012CB316503), National Natural Science Foundation of China (61370035 and 31361163004) and Tsinghua University Initiative Scientific Research Program. References 1 Esteller M. Cancer epigenomics: DNA methylomes and histone-modification maps. Nat Rev Genet  2007; 8: 286– 98. Google Scholar CrossRef Search ADS PubMed  2 Kulis M, Esteller M. DNA methylation and cancer. Adv Genet  2010; 70: 27– 56. Google Scholar PubMed  3 BLUEPRINT Consortium. 
Briefings in Bioinformatics, Oxford University Press

Genome-wide DNA methylation analysis identifies candidate epigenetic markers and drivers of hepatocellular carcinoma

Publisher
Oxford University Press
Copyright
© The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
ISSN
1467-5463
eISSN
1477-4054
D.O.I.
10.1093/bib/bbw094

Abstract

The alteration of the DNA methylation landscape is a key epigenetic event in cancer. With the accumulation of large-scale genome-wide DNA methylation data from clinical samples, we are able to characterize the patterns of DNA methylation alterations to identify candidate epigenetic markers and drivers. In this survey, we take hepatocellular carcinoma (HCC) as an example to show the basic steps of analyzing DNA methylation patterns in cancer across multiple data sets. We collected three genome-wide DNA methylation data sets with ∼800 clinical samples and the corresponding gene expression data sets. First, quantitative analysis of two global methylation alterations showed that about 90% of tumors acquire either genome-wide DNA hypo-methylation or the CpG island methylator phenotype. Second, probe-level analysis identified 267, 228 and 197 hyper-methylated sites in promoter regions in the three data sets, respectively. These local hyper-methylation patterns are highly consistent: 84 sites (from 61 promoters) are hyper-methylated in all three data sets, including sites in many previously reported genes, such as CDKL2, TBX15 and NKX6-2. These hyper-methylated sites were then used as candidate markers to classify tumor and non-tumor samples. Classifiers based on only 10 selected probes achieve high discriminative ability across data sets. Finally, by integratively analyzing DNA methylation and gene expression data, we identified 222 candidate epigenetic drivers, which are enriched in inflammatory response and multiple metabolic pathways. A set of high-confidence candidates, including SFN, SPP1 and TKT, are significantly associated with patients’ overall survival. In summary, this study systematically characterized the DNA methylation alterations and their impact on gene expression in HCC based on multiple data sets.
Keywords: DNA methylation, molecular marker, epigenetic driver, hepatocellular carcinoma

Introduction

DNA methylation, a key epigenetic regulatory factor, plays important roles in cancer initiation and progression [1, 2]. The ubiquity of DNA methylation alterations provides a novel source of molecular markers for cancer diagnosis and prognosis [3–7]. In addition, integrative analysis of DNA methylation and gene expression can help to identify epigenetic drivers in cancer [8, 9].

Hepatocellular carcinoma (HCC) is one of the most common solid tumors worldwide [10–12]. Genome-wide differential DNA methylation [13, 14] and prognostic epigenetic signatures [15] in HCC have been reported in a few studies. With the accumulation of genome-wide DNA methylation data sets from clinical samples, we are able to systematically characterize the DNA methylation patterns of HCC and their impact on gene expression.

In this study, we collected three genome-wide DNA methylation data sets comprising 646 tumor and 134 non-tumor samples in total from HCC patients. First, we analyzed two global DNA methylation alterations: genome-wide DNA hypo-methylation (GDH) [16, 17] and the CpG island methylator phenotype (CIMP) [18, 19]. The results show that about 90% of tumors acquire at least one of the two alterations. Then, we analyzed the hyper-methylated sites in promoter regions, which can be used as candidate markers for discriminating tumor from non-tumor samples [13, 20, 21]. The results show that the hyper-methylated sites are highly consistent across data sets. Classifiers built on one data set have high discriminative ability in the other two data sets, and their performance was further validated in an independent cohort of 10 HCC patients. Integrative analysis of promoter DNA methylation and gene expression revealed a tendency toward negative correlation, and the correlations are highly consistent between data sets. The anti-correlations identified 222 candidate epigenetic driver genes corresponding to 462 promoter sites. These candidates are highly enriched in inflammatory response and multiple metabolic processes. By further comparing promoter methylation levels and gene expression between tumor and non-tumor samples, a set of high-confidence candidates was identified. Some of them, including SFN, SPP1 and TKT, are significantly associated with patients’ overall survival.

Materials and methods

Data sets

Three public genome-wide DNA methylation data sets of HCC (all profiled with the Illumina Infinium HumanMethylation450 BeadChip) were downloaded from the NCBI GEO database (DS1: GSE54503 [13], DS2: GSE56588 [15]) and the TCGA data portal (DS3: TCGA-LIHC). Probes with low-quality signals (detection P-value > 0.05), probes overlapping single-nucleotide polymorphisms and probes cross-hybridizing with multiple genomic loci were removed as recommended [22]. The beta-values, which represent the methylation levels of the probe target sites, were used in the subsequent computational analyses. The corresponding gene expression data of two data sets (DS2 and DS3) were also collected. For the validation cohort, the DNA methylation of 10 paired tumor and non-tumor samples resected from HCC patients was profiled on the same platform. This study was approved by the ethics committee of Peking Union Medical College Hospital. Written informed consent was obtained from all participants. Details of all the data sets are given in Table 1. The curated data sets are available at http://bioinfo.au.tsinghua.edu.cn/member/jgu/hcc-dnameth/.
Table 1. The data sets used in this study

| Data set | DNA methylation source | #T (a) | #NT (b) | Gene expression source | #T | #NT |
|---|---|---|---|---|---|---|
| DS1 | GSE54503 | 66 | 66 | | | |
| DS2 | GSE56588 | 224 | 19 | GSE63898 | 228 | 168 |
| DS3 | TCGA (c) | 353 | 49 | TCGA | 356 | 49 |
| DS4 | PUMCH (d) | 10 | 10 | | | |

(a) The number of tumor samples. (b) The number of non-tumor samples. (c) Only the patients from LIHC annotated with ‘Hepatocellular Carcinoma’ are used. (d) An additional validation cohort.

The probes in CpG islands were derived from the Bioconductor annotation database IlluminaHumanMethylation450k.db. The probes in promoter regions (−1500 to +500 around the transcription start sites) and gene body regions were re-annotated according to ENSEMBL Release GRCh37-r75.

Global measurements of DNA methylation alterations

Previous studies suggest that GDH and CIMP are two typical patterns of global DNA methylation alteration in cancer. To measure the level of GDH, the medians of all probes in the non-tumor samples were used as the reference. A GDH z-score was then calculated against this reference for each sample i:

$$Z_i^{\mathrm{GDH}} = \frac{\mathrm{Median}_i - \mathrm{AVE}(\mathrm{Median}_{\text{Non-Tumor}})}{\mathrm{SD}(\mathrm{Median}_{\text{Non-Tumor}})}$$

To measure the level of CIMP, the third quantile (the upper quantile, i.e. the one with the higher methylation level) of the probes located in CpG islands in the non-tumor samples was used as the reference. A CIMP z-score was then calculated against this reference for each sample i:

$$Z_i^{\mathrm{CIMP}} = \frac{\mathrm{ThirdQuantile}_i - \mathrm{AVE}(\mathrm{ThirdQuantile}_{\text{Non-Tumor}})}{\mathrm{SD}(\mathrm{ThirdQuantile}_{\text{Non-Tumor}})}$$

A more negative GDH z-score means more severe global hypo-methylation, and a more positive CIMP z-score means more severe CpG island hyper-methylation.

Classifiers for discriminating tumor and non-tumor samples

To objectively assess the predictive performance of the classifiers, only DS1 (GSE54503) was used for training.
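The GDH and CIMP z-scores defined above can be illustrated with a minimal sketch. It is not the authors' code: the beta-values are invented toy numbers, the "third quantile" is assumed to be the upper quartile, `SD` is assumed to be the sample standard deviation, and the group names other than ‘Normal-Like’ are hypothetical labels. The cut-offs (z < −3 for GDH, z > 3 for CIMP) are the ones reported in the Results.

```python
from statistics import mean, median, quantiles, stdev


def third_quartile(betas):
    # Assumption: the text's "third quantile" is the upper quartile (Q3).
    return quantiles(betas, n=4, method="inclusive")[2]


def z_score(value, reference_values):
    # Standardize one per-sample summary against the non-tumor reference.
    # Assumption: SD is the sample standard deviation (n - 1 denominator).
    return (value - mean(reference_values)) / stdev(reference_values)


def classify(z_gdh, z_cimp, cutoff=3.0):
    # Only 'Normal-Like' is a label from the text; the others are hypothetical.
    has_gdh = z_gdh < -cutoff
    has_cimp = z_cimp > cutoff
    if has_gdh and has_cimp:
        return "GDH+CIMP"
    if has_gdh:
        return "GDH"
    if has_cimp:
        return "CIMP"
    return "Normal-Like"


# Toy beta-values, one list per sample (invented for illustration only).
non_tumor_all = [
    [0.10, 0.50, 0.80, 0.90],
    [0.12, 0.54, 0.80, 0.88],
    [0.08, 0.46, 0.80, 0.92],
]
non_tumor_cgi = [
    [0.02, 0.04, 0.06, 0.10, 0.20],
    [0.03, 0.05, 0.07, 0.12, 0.22],
    [0.01, 0.03, 0.05, 0.08, 0.18],
]
tumor_all = [0.05, 0.30, 0.50, 0.85]        # all probes of one tumor
tumor_cgi = [0.05, 0.30, 0.45, 0.60, 0.70]  # its CpG island probes

# Reference distributions come from the non-tumor samples only.
ref_medians = [median(sample) for sample in non_tumor_all]
ref_q3 = [third_quartile(sample) for sample in non_tumor_cgi]

z_gdh = z_score(median(tumor_all), ref_medians)
z_cimp = z_score(third_quartile(tumor_cgi), ref_q3)
print(classify(z_gdh, z_cimp))
```

With real data the reference would contain many non-tumor samples, so the mean and standard deviation of the reference summaries would be far more stable than in this three-sample toy example.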
First, the hyper-methylated sites in promoter regions were preselected by comparing the DNA methylation levels of tumor and non-tumor samples in the training data set with FastDMA (a package that identifies differentially methylated probes and regions based on a generalized linear model) [22], requiring a q-value < 1e-10 and an increase in methylation level > 0.3. Three classifiers were then built on the training set from different features: (1) the ‘Global’ classifier, based on the two global features; (2) the ‘Top10’ classifier, based on the 10 most hyper-methylated probes in promoter regions; and (3) the ‘ENet’ classifier, based on 10 promoter probes selected by elastic net [23, 24] with bootstrapping. The area under the curve and the sensitivity and specificity were used to assess performance. Elastic net is a regularized linear regression algorithm that performs feature selection during model fitting. The 10 probes most recurrently selected during bootstrapping were used in the final classifier, following the computational framework proposed in reference [25].

Identification of candidate epigenetic drivers

To avoid noise from sites with small variation, only promoter probes with an interquartile range > 0.1 in tumor samples were used in this study. Because of batch effects between the expression data sets that could not be adjusted (one data set was measured by array and the other by sequencing), the Spearman’s rank correlations between promoter probes and the expression of their corresponding genes were calculated separately for each data set. Genes whose expression was anti-correlated with the methylation level of promoter probes (z-score of the correlation < −5 in both data sets) were identified as candidate epigenetic drivers.
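The correlation criterion above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the methylation and expression values are invented, and the z-score is assumed to be the common Fisher transformation `atanh(rho) * sqrt(n - 3)`, since the exact variant used is not given in the text. The function names are hypothetical.

```python
from math import atanh, sqrt


def average_ranks(values):
    # Ranks start at 1; tied values share the mean of their positions.
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    # Spearman's rho is the Pearson correlation of the ranks.
    return pearson(average_ranks(x), average_ranks(y))


def fisher_z(rho, n):
    # Assumed transformation; note that atanh is undefined for |rho| == 1.
    return atanh(rho) * sqrt(n - 3)


def is_candidate(z_ds2, z_ds3, cutoff=-5.0):
    # A probe-gene pair qualifies only if it passes in both data sets.
    return z_ds2 < cutoff and z_ds3 < cutoff


# Toy values for one promoter probe and its gene across 8 tumor samples.
methylation = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80]
expression = [9.0, 8.5, 8.8, 7.0, 6.5, 6.8, 5.0, 4.0]

rho = spearman(methylation, expression)
z = fisher_z(rho, len(methylation))
print(round(rho, 3), round(z, 2))
```

Because the z-score grows with the square root of the sample size, even a strong anti-correlation in only eight toy samples does not reach the −5 cut-off, whereas the same correlation in a cohort of hundreds of tumors would.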
Then, a set of high-confidence candidates was selected by further comparing differential DNA methylation (differential methylation level > 0.1) and gene expression (fold change > 1.5) between tumor and non-tumor tissues: oncogenic drivers were defined as hypo-methylated and over-expressed in tumor samples, and tumor-suppressive drivers as hyper-methylated and under-expressed in tumor samples.

Results

Analysis of global DNA methylation alterations

GDH (an indication of genome instability) and CIMP are two typical epigenetic alterations in solid tumors. We proposed quantile-based z-scores to measure the two alterations: (1) the median methylation levels of all probes in adjacent normal samples were used as the reference to calculate the z-scores for the global hypo-methylation phenotype (z-score < −3 for GDH); and (2) the third quantile methylation levels of CpG island (CpGi) probes in adjacent normal samples were used as the reference to calculate the z-scores for CIMP (z-score > 3 for CIMP). The results show that 90.1% (585 of 646) of HCC samples have either GDH or CIMP, and 32.4% (209 of 646) have both phenotypes. The two phenotypes are correlated but have contradictory effects on each other. Only 2.2% (3 of 134) of adjacent normal samples are not ‘Normal-Like’ (both z-scores between −3 and 3), and 9.1% (59 of 646) of HCC samples are classified as ‘Normal-Like’ (Figure 1A). However, the four subtypes do not correlate with HCC invasion, clinical stage or patients’ overall survival (Supplementary Figure S1). Using only the two global features, classifiers can achieve high performance in discriminating tumor from non-tumor samples (Figure 1B).

Figure 1. The two global features of DNA methylation alterations in HCCs. (A) The quantitative measurements of GDH and CIMP. (B) The predictive performances based on the two global features.

Identification of DNA methylation-based molecular markers

Hyper-methylation of CpG islands and selected promoters is another feature of DNA methylation alteration in solid tumors. The hyper-methylated sites in promoter regions are candidate molecular markers of HCC. Using a stringent criterion for differential methylation between tumor and non-tumor tissues (FastDMA q-value < 1e-10 and increase in methylation level > 0.3), we identified 267, 228 and 197 hyper-methylated sites from 168, 128 and 116 promoters in the three data sets, respectively. These hyper-methylation events do not occur at random: 84 sites in 61 promoters are consistently hyper-methylated in all three data sets (Supplementary Figure S2 and Table S1). A set of genes, including CDKL2, DUOXA1, NKX6-2, FSCN1, TBX15, ASCL2, ZSCAN1, DNM3 and TMEM240, have at least three hyper-methylated probes in all three data sets.

To evaluate the discriminative ability of these hyper-methylated sites, DS1 (GSE54503) was used as the training data set to build classifiers. Based on the 267 hyper-methylated probes from DS1, two predictive classifiers were built: one based on the top 10 hyper-methylated probes and the other based on 10 computationally selected probes (named the Top10 and ENet classifiers, respectively; see ‘Materials and methods’ for details). The classifiers were then applied to the other two data sets. The results show that the classifiers achieve high predictive performance (Figure 2A, B): for the Top10 classifier, the sensitivities are > 86% (91.1% and 86.2% in DS2 and DS3, respectively) and the specificities are almost 100% (100% and 98%) in the two testing data sets; for the ENet classifier, the sensitivities are lower (89.3% and 78.4%), but both specificities are 100%.
To further test the predictive performance, an additional validation data set was generated from paired tumor and non-tumor samples of 10 HCC patients. Similar performance was observed (Table 2). The probes selected for the classifiers are lowly methylated in adjacent normal tissues and consistently hyper-methylated in HCC tissues (Figure 2C–F). These results indicate that the selectively hyper-methylated promoter sites are good candidates for molecular markers.

Table 2. The classification performances based on hyper-methylated probes

| Data set | Classifier | Sensitivity | Specificity |
|---|---|---|---|
| DS1 (GSE54503) | Global (a) | 62/66 (93.9%) | 65/66 (98.5%) |
| DS1 (GSE54503) | ENet (b) | 66/66 (100%) | 66/66 (100%) |
| DS1 (GSE54503) | Top10 (c) | 65/66 (98.5%) | 66/66 (100%) |
| DS2 (GSE56588) | Global | 206/224 (92.0%) | 19/19 (100%) |
| DS2 (GSE56588) | ENet | 200/224 (89.3%) | 19/19 (100%) |
| DS2 (GSE56588) | Top10 | 204/224 (91.1%) | 19/19 (100%) |
| DS3 (TCGA) | Global | 319/356 (89.6%) | 48/49 (98.0%) |
| DS3 (TCGA) | ENet | 279/356 (78.4%) | 49/49 (100%) |
| DS3 (TCGA) | Top10 | 307/356 (86.2%) | 48/49 (98.0%) |
| DS4 (PUMCH) | Global | 10/10 (100%) | 10/10 (100%) |
| DS4 (PUMCH) | ENet | 9/10 (90%) | 10/10 (100%) |
| DS4 (PUMCH) | Top10 | 10/10 (100%) | 8/10 (80%) |

(a) The performances based on the two global features. (b) The performances based on the 10 probes selected by elastic net with bootstrapping. (c) The performances based on the top 10 probes with the largest differential methylation levels.
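As a worked check of how the percentages in Table 2 follow from the counts, the short sketch below recomputes sensitivity (correctly called tumors over all tumors) and specificity (correctly called non-tumors over all non-tumors) for the Top10 classifier on the two testing data sets. The counts are copied from Table 2; the dictionary layout and function name are illustrative choices only.

```python
def rate(correct, total):
    # Percentage of samples of one class that were classified correctly.
    return 100.0 * correct / total


# Counts for the Top10 classifier, copied from Table 2.
top10_counts = {
    "DS2": {"tumor_hits": 204, "tumors": 224, "normal_hits": 19, "normals": 19},
    "DS3": {"tumor_hits": 307, "tumors": 356, "normal_hits": 48, "normals": 49},
}

summary = {}
for name, c in top10_counts.items():
    sensitivity = round(rate(c["tumor_hits"], c["tumors"]), 1)
    specificity = round(rate(c["normal_hits"], c["normals"]), 1)
    summary[name] = (sensitivity, specificity)
    print(f"{name}: sensitivity {sensitivity}%, specificity {specificity}%")
```

The same function applied to the ENet counts (200/224 and 279/356) gives the lower sensitivities of 89.3% and 78.4% quoted in the text.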
Figure 2. The discriminative abilities and methylation patterns of selected hyper-methylated probes in promoter regions. (A) The predictive performance of the classifier based on the 10 most hyper-methylated probes. (B) The predictive performance of the classifier based on 10 probes selected by elastic net. (C–F) The heatmaps of the selected probes in all the studied data sets.

Identification of candidate epigenetic drivers by integrative analysis

DNA methylation of promoter regions (especially CpG island-associated promoters) can negatively regulate gene transcription [9, 26]. Correlation analysis shows overall negative correlations between promoter methylation and gene expression in tumor samples (Figure 3A, B), and the correlations are highly consistent between DS2 and DS3 (Figure 3C). We identified a set of 222 candidate epigenetic drivers (with 462 probes) whose expression is anti-correlated with the methylation of at least one promoter probe in tumor samples in both data sets. These candidates are highly enriched in inflammatory response (26 genes, adjusted P-value 7.6e-6) and several metabolic processes, such as ‘xenobiotic metabolic process’ (19 genes, 8.6e-9), ‘carboxylic acid metabolic process’ (39 genes, 3.5e-8) and ‘oxidation-reduction process’ (36 genes, adjusted P-value 2.2e-6) (by DAVID v6.8beta [27]).

Figure 3. Correlation analysis of promoter methylations and gene expressions. (A, B) The distribution of Spearman’s correlations between promoter methylation levels and gene expressions in DS2 and DS3, respectively. (C) The second-order correlations between the correlations of DS2 and DS3.

These candidate epigenetic drivers were further investigated according to their differential methylation and expression between tumor and non-tumor samples (Supplementary Table S2). A set of ‘high-confidence’ oncogenic candidates were identified as over-expressed and hypo-methylated in tumor samples, including SFN, SPP1, ACSL4, ALDH3A1, CLDN15, CYP7A1, TKT and GLUL (Table 3). Subsequent survival analysis shows that high expression of SFN, SPP1 and TKT in tumor tissues is significantly associated with poor survival (Figure 4). SPP1, which encodes osteopontin, has been widely studied in HCC as a novel molecular marker [28–30] and can promote metastasis and stemness [31]. SFN, which encodes stratifin, is activated by promoter hypo-methylation in lung adenocarcinoma [32] and can accelerate tumor development [33]. TKT, which encodes transketolase, is over-expressed in metastatic relapsed HCC [34] and can promote cancer progression [35]. In addition, high-confidence tumor-suppressive candidates were identified as under-expressed and hyper-methylated in tumor samples, including SLC25A47, APOA5, ENDOD1, EXOCL3L4, EZR and PHYHD1 (Table 3). SLC25A47, also known as HCC down-regulated mitochondrial carrier protein [36], is strongly under-expressed in HCC, but its role in HCC remains unknown.
Taken together, these results indicate that integrative analysis of promoter DNA methylation and gene expression in multiple data sets can facilitate the identification of epigenetic drivers in cancer.

Table 3. The high-confidence candidate epigenetic drivers

| Symbol | Differential expression (a), log2FC, DS2 | Differential expression (a), log2FC, DS3 | Probe | Correlation (b), transformed z-score, DS2 | Correlation (b), transformed z-score, DS3 | Differential methylation level (c) |
|---|---|---|---|---|---|---|
| SFN | 0.94 | 3.59 | cg07786675 | −10.76 | −18.33 | −0.117 |
| | | | cg13466284 | −10.22 | −21.91 | −0.229 |
| | | | cg17330303 | −10.49 | −21.08 | −0.224 |
| | | | cg06720467 | −8.79 | −17.88 | −0.255 |
| | | | cg21950166 | −9.12 | −18.89 | −0.237 |
| | | | cg11348165 | −10.50 | −19.44 | −0.164 |
| SLC25A47 | −2.26 | −5.11 | cg00946753 | −12.86 | −20.15 | 0.182 |
| | | | cg13640769 | −11.61 | −18.82 | 0.143 |
| | | | cg00239071 | −11.13 | −20.72 | 0.215 |
| TKT | 1.36 | 1.14 | cg07918978 | −8.02 | −12.43 | −0.124 |
| | | | cg00707777 | −8.96 | −12.75 | −0.138 |
| | | | cg19378537 | −7.65 | −11.03 | −0.104 |
| ARL4C | −1.06 | −0.83 | cg05308656 | −8.98 | −9.28 | 0.132 |
| | | | cg09453076 | −7.85 | −8.91 | 0.169 |
| CLDN15 | 0.98 | 1.28 | cg07651914 | −9.59 | −14.55 | −0.116 |
| | | | cg08636573 | −8.84 | −14.45 | −0.177 |
| QSOX1 | −1.43 | −1.03 | cg03208016 | −5.88 | −8.70 | 0.100 |
| | | | cg09505809 | −5.72 | −5.70 | 0.111 |
| SPP1 | 1.43 | 1.83 | cg00088885 | −7.74 | −7.77 | −0.214 |
| | | | cg15460348 | −12.67 | −17.22 | −0.148 |
| TIMP2 | −0.99 | −0.61 | cg06641285 | −6.37 | −8.39 | 0.299 |
| | | | cg05306745 | −7.76 | −9.36 | 0.363 |
| ACSL4 | 0.95 | 2.08 | cg14457256 | −8.42 | −13.56 | −0.147 |
| APOA5 | −1.48 | −2.44 | cg02157083 | −7.28 | −13.47 | 0.113 |
| C17orf58 | 1.05 | 0.69 | cg12739664 | −6.37 | −6.27 | −0.122 |
| ENDOD1 | −0.82 | −1.02 | cg16317734 | −5.82 | −8.65 | 0.278 |
| EZR | −1.44 | −0.62 | cg22812275 | −6.05 | −7.67 | 0.123 |
| GSTP1 | −0.81 | −0.81 | cg04920951 | −8.04 | −10.60 | 0.258 |
| PHYHIPL | 1.02 | 0.91 | cg06972969 | −7.35 | −6.89 | −0.126 |
| PTGDS | −2.60 | −1.64 | cg13796381 | −5.89 | −6.59 | 0.196 |
| RAMP1 | 1.10 | 0.59 | cg03647559 | −6.73 | −10.38 | −0.109 |
| SHISA4 | 0.78 | 0.85 | cg11586189 | −7.73 | −9.45 | −0.104 |
| SLC22A1 | −2.73 | −4.70 | cg13434757 | −7.20 | −11.14 | 0.145 |

(a) The log2-transformed fold changes between tumor and non-tumor samples (>0 means over-expression in tumor samples) in DS2 and DS3. (b) The Spearman’s rank correlations (transformed to z-scores by Fisher’s transformation) between gene expression and probe methylation levels in tumor samples of DS2 and DS3. (c) The differential methylation levels between tumor and non-tumor samples (>0 means hyper-methylation in tumor samples) across all three studied data sets.

Figure 4. The high-confidence candidate epigenetic drivers significantly associated with overall survival, including SFN, SPP1 and TKT. In the left and middle columns, the samples are ordered by decreasing gene expression in each data set and the methylation levels of the corresponding anti-correlated probes are plotted. The right column shows the survival curves according to gene expression.

Discussion

DNA methylation alteration is a molecular landmark of cancer. The major alterations can be characterized from three aspects: the global alterations, the site-level local alterations (especially hyper-methylation in CpG islands and promoters) and their impact on gene expression. Taking HCC as the example, the global analysis first indicates that DNA methylation patterns are significantly rewired during HCC development: about 90% of tumor samples acquire strong GDH or CIMP. Differential methylation analysis then shows that many promoter regions are selectively hyper-methylated in tumors. Classifiers based on 10 selected hyper-methylated sites achieved high performance in discriminating tumor from non-tumor samples, so these sites can be further investigated as candidate molecular markers. By integrative analysis with gene expression, we identified 222 candidate epigenetic driver genes whose expression is strongly negatively regulated by promoter methylation. Several high-confidence candidates, including SFN, SPP1 and TKT, are significantly associated with the overall survival of HCC patients. In summary, this study suggests that DNA methylation alterations are good candidates for biomarker discovery, and that their impact on gene expression can expand our catalog of cancer drivers.

Key Points

- A practical guideline to identify candidate epigenetic markers and drivers based on multiple genome-wide DNA methylation data sets.
- Global analysis shows that ∼90% of HCC samples acquire GDH or the CpGi methylator phenotype (CIMP).
- Hyper-methylated sites in promoter regions are candidate molecular markers to discriminate tumor from non-tumor samples.
- Correlation analysis of DNA methylation and gene expression data identified a set of candidate epigenetic drivers enriched in inflammatory response and metabolic processes.
- A few high-confidence candidate drivers, including SFN, SPP1 and TKT, are significantly associated with patients’ overall survival.

Supplementary data

Supplementary data are available online at http://bib.oxfordjournals.org/.
Yongchang Zheng, MD, is a doctor in the Department of Liver Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College.

Qianqian Huang is a master’s student of bioinformatics at MOE Key Laboratory of Bioinformatics, TNLIST Bioinformatics Division & Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University.

Zijian Ding is a PhD student of bioinformatics at MOE Key Laboratory of Bioinformatics, TNLIST Bioinformatics Division & Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University.

Tingting Liu, MSc, is a bioinformatics scientist at My Health Gene Technology Co., Ltd.

Chenghai Xue, PhD, is a bioinformatics scientist at My Health Gene Technology Co., Ltd. He is also a member of the Joint Laboratory of Large-scale Medical Data Pattern Mining and Application, Institute of Automation, Chinese Academy of Sciences, Beijing, China.

Xinting Sang, MD, is the Director of the Department of Liver Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College.

Jin Gu, PhD, is an Assistant Professor of bioinformatics at MOE Key Laboratory of Bioinformatics, TNLIST Bioinformatics Division & Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University.

Acknowledgements

We thank Dongfang Wang for clustering analysis and Jinxia Guan for helpful discussions.

Funding

National Basic Research Program of China (2012CB316503), National Natural Science Foundation of China (61370035 and 31361163004) and Tsinghua University Initiative Scientific Research Program.

References

1 Esteller M. Cancer epigenomics: DNA methylomes and histone-modification maps. Nat Rev Genet 2007; 8: 286–98.
2 Kulis M, Esteller M. DNA methylation and cancer. Adv Genet 2010; 70: 27–56.
3 BLUEPRINT Consortium. Quantitative comparison of DNA methylation assays for biomarker development and clinical applications. Nat Biotechnol 2016; 34: 726–37.
4 Heyn H, Esteller M. DNA methylation profiling in the clinic: applications and challenges. Nat Rev Genet 2012; 13: 679–92.
5 Laird PW. The power and the promise of DNA methylation markers. Nat Rev Cancer 2003; 3: 253–66.
6 Yang X, Gao L, Zhang S. Comparative pan-cancer DNA methylation analysis reveals cancer common and specific patterns. Brief Bioinform 2016, doi: 10.1093/bib/bbw063.
7 Church TR, Wandell M, Lofton-Day C, et al. Prospective evaluation of methylated SEPT9 in plasma for detection of asymptomatic colorectal cancer. Gut 2014; 63: 317–25.
8 Baylin SB, Ohm JE. Epigenetic gene silencing in cancer - a mechanism for early oncogenic pathway addiction? Nat Rev Cancer 2006; 6: 107–16.
9 De Carvalho DD, Sharma S, You JS, et al. DNA methylation screening identifies driver epigenetic events of cancer cell survival. Cancer Cell 2012; 21: 655–67.
10 Bruix J, Gores GJ, Mazzaferro V. Hepatocellular carcinoma: clinical frontiers and perspectives. Gut 2014; 63: 844–55.
11 El-Serag HB. Hepatocellular carcinoma. N Engl J Med 2011; 365: 1118–27.
12 Wang H, Chen L. Tumor microenviroment and hepatocellular carcinoma metastasis. J Gastroenterol Hepatol 2013; 28(Suppl 1): 43–8.
13 Shen J, Wang S, Zhang YJ, et al. Genome-wide DNA methylation profiles in hepatocellular carcinoma. Hepatology 2012; 55: 1799–808.
14 Shen J, Wang S, Zhang YJ, et al. Exploring genome-wide DNA methylation profiles altered in hepatocellular carcinoma using Infinium HumanMethylation 450 BeadChips. Epigenetics 2013; 8: 34–43.
15 Villanueva A, Portela A, Sayols S, et al. DNA methylation-based prognosis and epidrivers in hepatocellular carcinoma. Hepatology 2015; 61: 1945–56.
16 Eden A, Gaudet F, Waghmare A, et al. Chromosomal instability and tumors promoted by DNA hypomethylation. Science 2003; 300: 455.
17 Ehrlich M. DNA hypomethylation in cancer cells. Epigenomics 2009; 1: 239–59.
18 Hughes LA, Melotte V, de Schrijver J, et al. The CpG island methylator phenotype: what's in a name? Cancer Res 2013; 73: 5858–68.
19 Issa JP. CpG island methylator phenotype in cancer. Nat Rev Cancer 2004; 4: 988–93.
20 Belinsky SA. Gene-promoter hypermethylation as a biomarker in lung cancer. Nat Rev Cancer 2004; 4: 707–17.
21 Zhang YA, Ma X, Sathe A, et al. Validation of SCT methylation as a hallmark biomarker for lung cancers. J Thorac Oncol 2016; 11: 346–60.
22 Wu D, Gu J, Zhang MQ. FastDMA: an infinium humanmethylation450 beadchip analyzer. PLoS One 2013; 8: e74275.
23 Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw 2010; 33: 1–22.
24 Zou H, Hastie T. Regularization and variable selection via the elastic net. J R Stat Soc Ser B Stat Methodol 2005; 67: 301–20.
25 Ding Z, Zu S, Gu J. Evaluating the molecule-based prediction of clinical drug responses in cancer. Bioinformatics 2016, doi: 10.1093/bioinformatics/btw344.
26 Jones PA. Functions of DNA methylation: islands, start sites, gene bodies and beyond. Nat Rev Genet 2012; 13: 484–92.
27 Dennis G Jr., Sherman BT, Hosack DA, et al. DAVID: database for annotation, visualization, and integrated discovery. Genome Biol 2003; 4: P3.
28 Shang S, Plymoth A, Ge S, et al. Identification of osteopontin as a novel marker for early hepatocellular carcinoma. Hepatology 2012; 55: 483–90.
29 Wan HG, Xu H, Gu YM, et al. Comparison osteopontin vs AFP for the diagnosis of HCC: a meta-analysis. Clin Res Hepatol Gastroenterol 2014; 38: 706–14.
30 Tsuchiya N, Sawada Y, Endo I, et al. Biomarkers for the early diagnosis of hepatocellular carcinoma. World J Gastroenterol 2015; 21: 10573–83.
31 Ye QH, Qin LX, Forgues M, et al. Predicting hepatitis B virus-positive metastatic hepatocellular carcinomas using gene expression profiling and supervised machine learning. Nat Med 2003; 9: 416–23.
32 Shiba-Ishii A, Noguchi M. Aberrant stratifin overexpression is regulated by tumor-associated CpG demethylation in lung adenocarcinoma. Am J Pathol 2012; 180: 1653–62.
33 Shiba-Ishii A, Kim Y, Shiozawa T, et al. Stratifin accelerates progression of lung adenocarcinoma at an early stage. Mol Cancer 2015; 14: 142.
34 Tan GS, Lim KH, Tan HT, et al. Novel proteomic biomarker panel for prediction of aggressive metastatic hepatocellular carcinoma relapse in surgically resectable patients. J Proteome Res 2014; 13: 4833–46.
35 Xu IM, Lai RK, Lin SH, et al.
Transketolase counteracts oxidative stress to drive cancer development. Proc Natl Acad Sci USA  2016; 113: E725– 34. Google Scholar CrossRef Search ADS PubMed  36 Tan MG, Ooi LL, Aw SE, et al.   Cloning and identification of hepatocellular carcinoma down-regulated mitochondrial carrier protein, a novel liver-specific uncoupling protein. J Biol Chem  2004; 279: 45235– 44. Google Scholar CrossRef Search ADS PubMed  © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

Journal

Briefings in Bioinformatics, Oxford University Press

Published: Jan 1, 2018
