Background: Hepatocellular carcinoma (HCC) is one of the most common and most lethal cancers in the world. DNA methylation alterations are frequently observed in HCC and may play important roles in carcinogenesis and diagnosis.

Methods: Using the TCGA HCC dataset, we classified HCC patients into different methylation subtypes, identified differentially methylated and expressed genes, and analyzed cis- and trans-regulation of DNA methylation and gene expression. To find potential diagnostic biomarkers for HCC, we screened HCC-specific CpGs by comparing the methylation profiles of 375 samples from HCC patients, 50 normal liver samples, 184 normal blood samples, and 3780 samples from patients with other cancers. A logistic regression model was constructed to distinguish HCC patients from normal controls. Model performance was evaluated using three independent datasets (including 327 HCC samples and 122 normal samples) and ten newly collected biopsies.

Results: We identified a group of patients with a CpG island methylator phenotype (CIMP) and found that the overall survival of CIMP patients was poorer than that of non-CIMP patients. Our analyses showed that cis-regulation of DNA methylation and gene expression was dominated by negative correlation, while trans-regulation was more complex. More importantly, we identified six HCC-specific hypermethylated sites as potential diagnostic biomarkers. The combination of the six sites achieved ~92% sensitivity in predicting HCC, ~98% specificity in excluding normal livers, and ~98% specificity in excluding other cancers. Compared with previously published methylation markers, our markers are the only ones that can distinguish HCC from other cancers.

Conclusions: Overall, our study systematically describes the DNA methylation characteristics of HCC and provides promising biomarkers for the diagnosis of HCC.
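The sensitivity and specificity quoted in the abstract follow the standard definitions applied to the output of a logistic regression classifier. The following is a minimal sketch of that calculation, not the model fitted in this study: the intercept, the weights, the 0.5 decision threshold, and the β values of the five toy samples are all assumptions made up for illustration. Only the two CpG identifiers are taken from the article.

```python
import math

# Hypothetical coefficients chosen only for illustration; the fitted values
# of the study's model are not reproduced here.
INTERCEPT = -4.0
WEIGHTS = {"cg23565942": 10.0, "cg21908638": 10.0}


def hcc_probability(betas):
    """Return the logistic-model probability of HCC from CpG beta values (0 to 1)."""
    z = INTERCEPT + sum(weight * betas[cpg] for cpg, weight in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def sensitivity_specificity(labels, predictions):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    fn = sum(1 for y, p in zip(labels, predictions) if y and not p)
    tn = sum(1 for y, p in zip(labels, predictions) if not y and not p)
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up beta values: three tumors (one weakly methylated) and two normals.
samples = [
    (True, {"cg23565942": 0.45, "cg21908638": 0.50}),
    (True, {"cg23565942": 0.30, "cg21908638": 0.35}),
    (True, {"cg23565942": 0.10, "cg21908638": 0.15}),
    (False, {"cg23565942": 0.03, "cg21908638": 0.05}),
    (False, {"cg23565942": 0.04, "cg21908638": 0.02}),
]

labels = [is_hcc for is_hcc, _ in samples]
# Assumed decision rule: call a sample HCC when the probability is at least 0.5.
predictions = [hcc_probability(betas) >= 0.5 for _, betas in samples]
sensitivity, specificity = sensitivity_specificity(labels, predictions)
print(f"sensitivity={sensitivity:.3f}, specificity={specificity:.3f}")
# prints: sensitivity=0.667, specificity=1.000
```

In this toy example the weakly methylated tumor is missed, so sensitivity is 2/3 while both normal samples are correctly excluded. Applying the same two ratios to a real test set gives values of the kind reported above.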
Keywords: Hepatocellular carcinoma, Methylation, CpG island methylator phenotype, Gene regulation, Specific diagnostic biomarker

Key Lab of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China. Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China. Full list of author information is available at the end of the article.

© The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Cheng et al. Genome Medicine (2018) 10:42

Background

Hepatocellular carcinoma (HCC) is the sixth most common cancer and the third leading cause of cancer deaths in the world. Most cases of HCC occur in developing countries, such as China, and the leading cause of HCC is chronic infection with hepatitis B virus (HBV); in contrast, the main cause in developed countries, such as the USA, is infection with hepatitis C virus (HCV). Other risk factors for developing HCC include exposure to aflatoxin, excessive alcohol consumption, tobacco smoking, and diabetes. After being affected by one or more of these risk factors, both genetic and epigenetic alterations will emerge, which may result in the activation of oncogenes and the inactivation of tumor suppressor genes, leading to the occurrence of hepatocellular carcinoma. The 5-year survival rate is > 70% if patients are diagnosed at an early stage, while the 5-year survival rate decreases to approximately 10% for advanced HCC patients. Therefore, early detection of HCC is important for increasing the chances for effective treatment and improving the survival rate.

Alpha-fetoprotein (AFP) combined with ultrasonography is a widely used method for the screening and diagnosis of HCC. Marrero et al. reported the diagnostic performance of serum AFP when using a cut-off of 20 ng/mL. Its sensitivity is 59% and specificity 90% for all HCC patients. Additionally, the sensitivity is 53% and the specificity 90% for early-stage HCC. Due to the lack of diagnostic accuracy, the American Association for the Study of Liver Diseases and the European Association for the Study of the Liver do not recommend AFP for HCC diagnosis [6, 7]. The development of omics technologies has allowed researchers to choose a single molecule or a panel of multiple molecules as potential diagnostic biomarkers. Des-γ-carboxy prothrombin (DCP) is a promising serum biomarker. It achieved 74% sensitivity and 70% specificity in all HCC patients, as well as 61% sensitivity and 70% specificity for early-stage HCC at the level of 150 mAU/mL. Another serum biomarker, Dickkopf-1 (DKK1), has similar sensitivity (~70%) and specificity (~90%) in all HCC patients and for early-stage HCC at a cut-off of 2.153 ng/mL. Although many candidate biomarkers have been reported, few of them are currently used in clinical practice. More effective biomarkers are urgently needed to increase the accuracy of HCC diagnosis.

DNA methylation alteration has been observed in various cancers and is considered to be a cause of carcinogenesis. Global hypomethylation is frequently seen in highly and moderately repeated DNA sequences and plays a key role in chromosomal instability [9, 10]. Hypermethylation in gene promoter regions, such as in tumor suppressor genes, is usually related to gene silencing [9, 11]. Some DNA methylation is involved in the early stage of carcinogenesis, such as RASSF1A in ovarian cancer. Additionally, DNA methylation is relatively stable over time and can be non-invasively detected in blood. Therefore, DNA methylation has a great potential to become an early diagnostic biomarker of cancers. An increasing number of methylation-based biomarkers have been developed to aid in the early diagnosis of cancers. The FDA-approved “Epi proColon test” is based on the SEPT9 promoter methylation status in the plasma. This diagnostic test had a sensitivity of 36.6 to 95.6% and a specificity of 81.5 to 99.0% for colorectal cancer. Zheng et al. reported that using the DNA methylation of ten CpGs could achieve good performance to discriminate tumors from normal tissues in HCC patients, with a sensitivity of more than 86% and a specificity of almost 100%. Xu et al. found that the circulating tumor DNA (ctDNA) methylation of another ten CpGs could also discriminate HCC patients from healthy individuals with a sensitivity of more than 83% and a specificity of more than 90%. Both CpG sets could be good biomarkers for the diagnosis of HCC, but neither of these research groups considered whether other cancer types could have similar methylation alterations; hence, these biomarkers may not be HCC-specific, and specific biomarkers are absent and needed.

In this study, we first classified HCC patients into different methylation subtypes and analyzed the cis- and trans-regulation of DNA methylation and gene expression. Then, we identified six HCC-specific methylation biomarkers by comparing HCC with normal livers and other cancer types. The combinations of two and six markers achieved 84.8–92.0 and 90.9–92.4% sensitivity and 97.0–100% and 97.0–100% specificity, respectively, in three independent datasets.

Methods

Data preparation

DNA methylation, gene expression, and clinical HCC data were collected from The Cancer Genome Atlas (TCGA) project (https://portal.gdc.cancer.gov/). The methylation level of CpGs was represented as β values (375 HCC and 50 normal), where β = M / (U + M + 100), M is the intensity of the methylated allele, U is the intensity of the unmethylated allele, and β ranges from 0 to 1. Gene expression was defined using the raw read count or log2 transformed normalized count (369 HCC and 41 normal). Moreover, the methylation levels for another ten tumor types were collected from TCGA: BLCA (409 tumor, 21 normal), BRCA (774 tumor, 82 normal), COAD (292 tumor, 38 normal), GBM (126 tumor, 2 normal), HNSC (523 tumor, 45 normal), KIRC (316 tumor, 160 normal), LUAD (455 tumor, 32 normal), LUSC (365 tumor, 41 normal), READ (95 tumor, 7 normal) and UCEC (425 tumor, 46 normal), which have both tumor and normal tissues.

Additionally, four methylation array datasets were collected from the Gene Expression Omnibus (GEO) database: GSE69270 (blood of 184 young Finns), GSE54503 (66 paired HCC and adjacent normal), GSE89852 (37 paired HCC and adjacent normal), and GSE56588 (224 HCC, nine cirrhotic, and ten normal). The array platform was the HumanMethylation450 BeadChip (GPL13534). The CpG annotations were downloaded from GEO.

CpG island methylator phenotype

To find CIMP in HCC, we selected CpGs in the promoter region that have a high standard deviation (SD > 0.2) of the methylation level in 375 tumor tissues and a low methylation level (mean β value < 0.05) in 50
normal tissues, similar to the results of previous studies [23, 24]. K-means-based consensus clustering was performed using the R package ConsensusClusterPlus. Overall survival of the CIMP group and other groups was estimated using the Kaplan-Meier method. Fisher’s exact test was performed to associate the clinical characteristics with each cluster.

Differential analysis of DNA methylation and gene expression

Fifty of the 375 patients from TCGA have both HCC and normal methylation profiles, and the paired HCC and normal methylation data were used for differential methylation analysis. CpGs with more than 10% missing values were removed. The remaining missing values were imputed with the Bioconductor package impute. Then, a paired t-test was used to identify differentially methylated CpGs between the tumor and adjacent normal tissue. P values were adjusted using the false discovery rate (FDR) method. CpGs in chromosomes X and Y were ignored. The CpGs with an FDR less than 0.05 and an absolute value of the β difference greater than 0.2 were considered to be differentially methylated. When a CpG mapped to more than one gene, the first gene was taken as the reference.

Of the 50 patients from TCGA, 41 have both HCC and normal expression profiles, and the paired HCC and normal expression data were used for differential expression analysis. The Bioconductor package edgeR was used to identify differentially expressed (DE) genes from raw read counts. Genes with an FDR less than 0.05 and an absolute value of log2(fold change) greater than 1 were considered to be differentially expressed.

Correlation between DNA methylation and gene expression

The 369 tumor samples with matched methylation and expression data were used for correlation analysis. First, we investigated the correlation between DNA methylation and gene expression (cis-regulation). As one gene contains multiple CpGs, Pearson correlation coefficients were calculated between the expression value and the methylation level of each CpG site. Correlation was significant if the correlation coefficient was greater than 0.3 and FDR was less than 0.05. Second, we investigated the correlation of one gene’s methylation and another gene’s expression (trans-regulation) using a similar method. Only differentially expressed genes were used to analyze trans-regulation, and the DNA methylation was focused on CpGs that were located simultaneously in differentially methylated and differentially expressed genes.

Identification of candidate diagnostic biomarkers

TCGA datasets were used to screen potential methylation sites as diagnostic biomarkers of HCC. First, 50 paired HCC and normal samples were compared to select hypermethylated CpGs of low-expression genes in HCC. Second, 375 HCC and 50 normal tissues were compared. CpGs without significantly different methylation were filtered out. Third, 375 HCC tissues were compared with blood samples from individuals without HCC (GSE69270); we removed CpGs that had higher average methylation levels in blood than in HCC tissues. Fourth, HCC-specific hypermethylated sites were selected by removing CpGs whose mean methylation levels were higher than 0.1 in tumor or normal samples of another ten tumor types. The remaining CpGs were candidate diagnostic biomarkers of HCC. Finally, information gain-based feature selection was used to decrease the number of candidate diagnostic biomarkers.

Evaluation of candidate diagnostic biomarkers

The TCGA HCC dataset was taken as the training set, while three other independent datasets (GSE54503, GSE89852, and GSE56588) were used as test sets. A logistic regression model was built based on the methylation levels of the candidate diagnostic biomarkers. This model was used to predict the tumor and normal samples. Sensitivity and specificity were calculated to evaluate the accuracy of the prediction model. Modeling and prediction were performed in the data mining tool WEKA.

Bisulfite sequencing PCR experiments

Surgical biopsies were collected from ten Chinese patients diagnosed with HCC. This study was approved by the ethics committee of Zhongshan Hospital. All patients signed written informed consent to donate their tissue samples for research. Fresh tumor and normal tissues were subjected to bisulfite sequencing PCR (BSP) and quantitative PCR (qPCR) experiments.

Genomic DNA was extracted from tissue samples using a QiaAmp DNA Mini Kit (Qiagen, Valencia, CA, USA) according to the manufacturer’s manual. The DNA sample quality and integrity were determined by the A260/280 ratio and agarose gel electrophoresis using a NanoDrop 2000 (Thermo Scientific, USA) and a horizontal electrophoresis system (Bio-Rad, USA), respectively. The BSP primers were designed using online tools with customization, and all PCR products were approximately 400 bp. The primers were designed so that the CpGs of interest lay near the middle of the PCR product. Additionally, 250 ng of genomic DNA was converted using an EZ DNA Methylation-Gold Kit™ (Zymo Research, USA) according to the manufacturer’s manual. Bisulfite PCR amplification was performed with KAPA Uracil+ PCR Ready Mix (KAPA Biosystems, USA) and BSP PCR primers, and the PCR conditions were optimized. The PCR product was directly sequenced on an ABI 3730× system (Thermo Scientific, USA) using the same primers as the BSP amplification. The results from direct sequencing were analyzed with Sequencing Scanner 2 (Thermo Scientific, USA) using C/(C + T) peak ratios to define a CpG site methylation rate for each CpG dinucleotide within the covered region.

Gene expression experiments

Total RNA was isolated with an RNeasy Plus Mini kit (Qiagen, Valencia, CA, USA) with DNase I digestion, and cDNA was synthesized by using a PrimeScript RT Reagent Kit (TaKaRa, Japan) according to the manufacturer’s manual. PCR primers were designed using Primer3 online tools. Quantitative PCR was performed using SYBR GREEN (Bio-Rad, USA) on an Eco qPCR system (Illumina, USA). Target mRNA expression was compared between the samples by normalization to beta-actin (ACTB) mRNA expression.

Results

Methylation landscape of HCC

DNA methylation profiles of 375 HCC tumor samples and 50 adjacent normal tissue samples were obtained from TCGA. We selected the 591 most variable CpGs and performed unsupervised consensus clustering. HCC samples were classified into seven clusters (Fig. 1a). The methylation level of cluster 2 was the lowest. Cluster 7 (4.3%) showed widespread hypermethylation of promoter-associated CpGs and was considered to have the CpG island methylator phenotype. To determine whether the methylation subtypes are related to prognosis, the overall survival of each cluster was estimated using the Kaplan-Meier method. The p value obtained from the log-rank test was approximately 0.12, suggesting possible, though not statistically significant, differences in prognosis among the subtypes (Fig. 1b). Furthermore, we compared the survival probability of CIMP patients (cluster 7) with that of other patients (clusters 1–6). The CIMP subgroup showed poorer prognosis (P = 0.0185; Fig. 1d).

We next examined whether the subtypes were significantly associated with clinical characteristics. Cluster 1 had more male (P = 0.0054) and virus-infected (HBV and HCV, P = 0.0012) patients. Cluster 3 consisted mainly of Asian patients (P = 0.0034). Cluster 2 had more patients without virus infection (P = 0.0055) and showed low methylation. Cluster 4 had more male patients (P = 1.21e-06) and cluster 6 had more female patients (P = 0.014). No significant characteristics were found for cluster 5. The CIMP group had more stage III (P = 0.0141) and HCV-infected (P = 0.0330) patients than the other clusters. To understand whether the poor prognosis of CIMP was due to more stage III patients, we compared the survival probability of stage III patients in the CIMP group and stage III patients in the non-CIMP group. We found that stage III patients in the CIMP group had a much poorer prognosis than stage III patients in the non-CIMP group (Fig. 1e). Hence, the poor prognosis of CIMP is possibly associated with global hypermethylation.

Fig. 1 The DNA methylation landscape of hepatocellular carcinoma. a Seven methylation clusters were obtained from k-means consensus clustering. Rows are 591 CpGs that had high variation (SD > 0.2) in tumor tissues and low (β value < 0.05) methylation level in normal tissues. Cluster 7 (purple) showed a hypermethylation pattern in nearly all CpGs and was regarded as the CpG island methylator phenotype. b Kaplan-Meier survival curves of each cluster. The CIMP group had a poorer survival than other clusters. c Characteristics of the clusters. Significance was obtained from Fisher’s exact test (p value < 0.05). d Overall survival of CIMP and non-CIMP patients. e Overall survival of CIMP stage III and non-CIMP stage III patients

Differential analysis of methylation and expression

Methylation data of 50 paired samples from TCGA were used for differential methylation analysis (|β value difference| > 0.2 and FDR < 0.05). There were 7372 hypermethylated and 39,995 hypomethylated CpGs in HCC, which correspond to 2222 hypermethylated and 5478 hypomethylated genes. Then we analyzed the distribution of differentially methylated (DM) CpGs and genes in different genomic regions (Fig. 2a, d). Hypomethylation occurred globally in the whole genome, involving 84% of CpGs and 71% of genes. However, 61% of the CpGs (73% of genes) were hypermethylated in CpG-rich regions (CpG islands), and 91% of the CpGs (93% of genes) were hypermethylated in the CpG islands of the promoter regions. When we considered the distance of the probes to CpG islands, the percentage of hypermethylation was highest in the CpG island. This percentage decreased when the probes were far away from the CpG islands (Fig. 2b). The gene body was dominated by hypomethylation, while hypermethylation occurred preferentially in the regions around the transcription start sites (Fig. 2c). Such hypomethylation of the whole genome and hypermethylation of the promoter CpG islands are general characteristics of solid tumors. Expression data of 41 paired samples from TCGA were used for differential expression analysis (|log2(fold change)| > 1 and FDR < 0.05). We found 662 highly expressed (“DE-high”) and 1553 lowly expressed (“DE-low”) genes in HCC.

Fig. 2 Distribution of differentially methylated CpGs and genes. a Distribution of differentially methylated CpGs in various genomic locations. Promoter, 1500 bp upstream of the transcription start site (TSS); CGI, CpG island; Pro & CGI, promoter and CpG island; WG, whole genome. b Distribution of differentially methylated CpGs according to CpG island. c Distribution of differentially methylated CpGs according to the distance to the TSS. d Distribution of differentially methylated genes in various genomic locations

Roles of methylation in regulating gene expression

First, we analyzed the intersection between differentially expressed genes and differentially methylated genes (Fig. 3a). Methylation alterations of the genes were assigned based on the status of the promoter methylation. Genes were called “DM-high” if at least one promoter CpG had a higher methylation level in HCC than in normal tissues. Similarly, “DM-low” genes had at least one hypomethylated promoter CpG. The promoter methylation defined 881 DM-high genes and 2550 DM-low genes. In total, 293 genes were differentially methylated and differentially expressed: 97 genes were hypermethylated with low expression in HCC, 32 genes were hypomethylated with high expression, 20 genes were hypermethylated with high expression, and 144 genes were hypomethylated with low expression (Fig. 3a). Since promoter hypermethylation plays important roles in the inactivation of cancer-related genes, we are particularly interested in the 97 highly methylated and lowly expressed genes, and in the subsequent analysis we used these genes to screen candidate diagnostic biomarkers.

To study the effect of DNA methylation on the expression of the same gene (cis-regulation), Pearson correlation coefficients were calculated between promoter methylation and gene expression. Among 16,206 genes with methylation and expression profiles, promoter methylation of 2798 (877) genes was significantly negatively (positively) correlated with gene expression (Fig. 3b). Cis-regulation was dominated by the negative correlation between promoter methylation and gene expression (Fig. 3b), which was consistent with previous reports [29, 30].

Furthermore, we investigated whether DNA methylation was related to the expression of other genes (trans-regulation). We focused on 512 CpGs in 287 differentially methylated and differentially expressed genes, analyzing their correlation with 2215 differentially expressed genes (Fig. 3c). The methylation of DM-high genes was predominantly negatively correlated with gene expression, while the methylation of DM-low genes was more likely to be positively correlated with gene expression.

Fig. 3 Relationship between DNA methylation and gene expression. a Comparison of differentially methylated genes and differentially expressed genes. Genes were considered differentially methylated if at least one promoter CpG site was significantly differentially methylated. b Correlation between gene expression and its promoter methylation. Correlations were calculated using all 16,206 genes, 2215 differentially expressed (DE) genes, 3364 differentially methylated (DM) genes, or 287 both DE and DM genes. The vertical axis shows the percentage of negatively correlated genes (green), positively correlated genes (red), and genes with both negative and positive correlation (black). c Correlation between promoter methylation and other gene expression. This analysis focused on the promoter methylation of 287 DM and DE genes (columns) and the gene expression of 2215 DE genes (rows). Positive and negative correlations are shown in red and green, respectively

Identification of HCC-specific methylation markers

To find sensitive and specific methylation biomarkers for HCC, we designed a workflow to strictly screen biomarkers by comparing HCC with normal livers and other cancers (Fig. 4a). We started from 185 hypermethylated CpGs that were located in 97 lowly expressed genes. First, 130 CpGs remained after requiring hypermethylation in 375 HCC tissues. Second, the methylation data from blood of healthy people were used for filtering, and 109 CpGs were selected which were lowly methylated in healthy people and highly methylated in HCC. Figure 4b illustrates the methylation levels of these 109 CpGs in TCGA and three independent datasets (Additional file 1). Tumor samples could be well discriminated from normal tissue and blood samples, indicating the robustness of our results. Third, 109 CpGs were further filtered, requiring hypermethylation only in HCC but not in ten other cancers in TCGA, and six HCC-specific CpGs were obtained (Fig. 4c).

The six HCC-specific CpGs are mapped to four genes: NEBL (cg23565942), FAM55C (cg21908638, cg11223367, and cg03509671), GALNT3 (cg05569109), and DSE (cg11481534). Since the methylation status of CpGs is usually similar in neighboring regions, we investigated other CpG sites in the promoter of these four genes (Additional file 2: Figure S1). Most of the CpGs were also hypermethylated in HCC compared to normal tissues, consistent with the six specific CpGs. Next, we compared the methylation status of patients in different stages. The results showed that six HCC-specific CpGs are significantly hypermethylated even in stage I patients (Additional file 2: Figure S2). Therefore, these six CpGs are good candidates for the early detection of HCC.

Evaluation of diagnostic accuracy in independent datasets

Methylation data of the 50 paired HCC and normal tissues from TCGA were used as a training set. Three independent methylation datasets of HCC (GSE54503, GSE89852, and GSE56588) were used as test sets, including 327 HCC samples and 122 normal samples. Information gain-based feature selection was performed on the six CpGs to rank them. A logistic regression model was used to predict HCCs from one CpG to the combination of six CpGs. The ROC area associated with using one CpG to the combination of six CpGs to predict HCC in three independent datasets is shown in Fig. 5a. The performance using a combination of six HCC-specific CpGs was very good, with ROC areas of 0.972, 0.945, and 0.957 in GSE54503, GSE89852, and GSE56588, respectively. When using a combination of two specific CpGs (cg23565942 and cg21908638), the ROC area was higher than 0.92 in all three test sets. Hence, using a combination of two specific CpGs as markers could be more cost-effective.

Then, we compared our results with previously published methylation markers. Logistic regression models were built based on different feature sets: six or two CpGs from our study, nine CpGs from Zheng et al., and seven CpGs from Xu et al. (Additional file 3). The sensitivity and specificity of distinguishing HCC from normal livers were high and similar among the different feature sets (Table 1), while the number of CpGs we used was the least. Next, we compared the ability of different methylation markers to distinguish HCC from other cancers. Tumor and normal tissues from other cancers were seldom (0–12%, median 0.15%) predicted as HCC when using two or six HCC-specific CpGs in our study. However, 32.5 to 100% (median 92.85%) of tumor and 0 to 100% (median 48.95%) of normal tissues were predicted as HCC when using the CpGs of Zheng et al. and Xu et al. as feature sets (Fig. 5b). Therefore, our study found more cost-effective and specific biomarkers for HCC diagnosis.

To verify whether the six HCC-specific CpGs could be
stably detected by cheaper technologies, BSP was used to determine the methylation status of ten fresh frozen HCC and normal tissues. The BSP primers of cg11481534 could not amplify enough PCR product; thus, the methylation status of the other five CpGs was analyzed (Additional file 4). Four specific CpGs (cg21908638, cg11223367, cg03509671, and cg05569109) were significantly hypermethylated (P < 0.05), as determined using the paired t-test. Another CpG (cg23565942) also showed some difference between the tumor and normal samples, although the p value (P = 0.37) was not significant (Fig. 5c). Then we combined the methylation of two specific CpGs and five specific CpGs according to the formula obtained from logistic regression and compared the difference in the combined score between the tumor and normal tissues. The combined score for the two and five CpGs was significantly higher in tumor tissues than in normal tissues, with p values of 0.009 and 0.008, respectively (Fig. 5d). Additionally, we validated the expression of the four genes mapped by six HCC-specific CpGs in the paired fresh frozen tissues by qPCR (Additional file 2: Figure S3). The expression of three genes (FAM55C, GALNT3, and DSE) was significantly (p < 0.05) lower in tumor tissues than in normal tissues. The expression of NEBL was also lower in the tumor, but the difference was not significant (p = 0.17). The phenomenon of hypermethylation and low expression of these genes in the fresh frozen tissues is concordant with that in TCGA HCC datasets. Thus, the specific CpGs identified in our study are promising diagnostic biomarkers specific for HCC.

Fig. 4 Identification of HCC-specific hypermethylated sites. a Protocol for finding candidate diagnostic biomarkers for HCC. b Unsupervised hierarchical clustering of HCC and normal controls using HCC hypermethylated sites. The heatmap shows the methylation levels of 109 CpGs in five datasets (TCGA, GSE54503, GSE89852, GSE56588, and GSE69270). Normal controls are clustered together, separated from HCC. c The average methylation level of six HCC-specific CpGs in HCC and ten other cancers

Fig. 5 Performance of HCC-specific hypermethylated sites as diagnostic biomarkers. a Prediction accuracy using different combinations of HCC-specific CpGs. Logistic regression models were built using 50 paired TCGA samples and were tested using three independent datasets. Accuracy was measured by the area under the ROC curve. b Comparison of our markers with previously published methylation markers. Rows show different sources of methylation markers. The horizontal axis shows the different methylation datasets. The first three are HCC datasets, and the remainder are ten other cancer types. Colors indicate the percentage of different samples being predicted as HCC. c Validation of the methylation markers using ten paired HCC–normal tissues. Methylation values were measured by bisulfite sequencing PCR (BSP). d Combination score of methylation markers in ten paired HCC–normal tissues. Scores were calculated by the logistic regression model

Table 1 Comparison of the performance of different methylation markers for classifying HCC and normal tissues

Measure      Dataset   Two HCC-specific CpGs  Six HCC-specific CpGs  Nine CpGs of Zheng et al. (a)  Seven CpGs of Xu et al. (b)
Sensitivity  GSE54503  0.848                  0.909                  0.970                          0.833
Sensitivity  GSE89852  0.892                  0.919                  0.919                          0.946
Sensitivity  GSE56588  0.920                  0.924                  0.942                          0.741
Specificity  GSE54503  0.970                  0.970                  0.970                          0.955
Specificity  GSE89852  0.973                  0.973                  0.892                          0.919
Specificity  GSE56588  1.000                  1.000                  1.000                          1.000

(a) Zheng et al. reported ten CpGs as HCC diagnostic markers. Nine of them had methylation values in the TCGA HCC dataset
(b) Xu et al. reported ten CpGs as HCC diagnostic markers. Seven of them had methylation values in the TCGA HCC dataset

Discussion

In this study, we systematically analyzed the DNA methylation and gene expression data of hepatocellular carcinoma.

[...] of hypomethylated genes was positively correlated with gene expression. Furthermore, we identified six CpGs as HCC-specific diagnostic biomarkers by comparing HCC, normal controls, and non-HCC cancers. These sites achieved ~91% sensitivity and ~97% specificity when predicting HCC. Our diagnostic biomarkers are more sensitive and more specific than most of the previously reported protein markers or methylation markers. These results provide new insights into the roles of DNA methylation in gene regulation and diagnosis.

CIMP is a phenomenon of simultaneous methylation of a group of genes in a subset of tumors and has been studied in multiple cancer types, such as colorectal cancer (22.4%), papillary renal-cell carcinoma (5.6%), and glioblastoma (8.8%). It has been associated with prognosis, but the impact of CIMP on prognosis is not consistent among different cancers. We found that 4.3% of HCC patients had CIMP. Compared to other cancer types, the fraction of CIMP is smaller in HCC. However, the CIMP group needs special attention due to their poor prognosis. Somatic mutations of IDH1 and IDH2 have been reported to be associated with glioma CIMP. Due to the low mutation frequencies of these genes in HCC, we did not observe a significant association between them and the CIMP group.

DNA methylation is an important epigenetic regulator of gene expression. We observed that cis-regulation is predominantly negatively correlated, which is concordant with the view that gene expression is silenced by promoter DNA methylation [28, 32]. Promoter methylation of a hypermethylated gene was mainly negatively correlated with the expression of other genes, but promoter methylation of a hypomethylated gene was
We identified a subgroup of patients with prone to being positively correlated with the expression CIMP and observed the poor prognosis of these patients. of other genes. The reason for the inconsistent relation- We found that methylation was negatively correlated ship of hypermethylated and hypomethylated genes in with gene expression in cis-regulation. The patterns of trans-regulation is unclear. trans-regulation are more complex; generally, the The most important finding of this study is the identi- methylation of hypermethylated genes was negatively fication of several methylated CpGs as candidate correlated with gene expression, while the methylation diagnostic biomarkers of HCC. An ideal diagnostic Cheng et al. Genome Medicine (2018) 10:42 Page 10 of 11 biomarker should have high sensitivity, enabling the Additional file 3: Prediction performance of specific CpGs in detection of HCC at an early stage; should be specific to distinguishing HCC from normal tissue and other tumors. (XLSX 12 kb) HCC and not detected in other tumor types or premalig- Additional file 4: Primers of CpGs or genes in BSP and gene expression experiments. BSP and qPCR data of fresh frozen tissues from ten HCC nant liver diseases; should be measurable by patients. (XLSX 14 kb) non-invasive and cost-effective technology; and should be validated across different populations. Here, we dis- Abbreviations covered six HCC-specific hypermethylated sites whose AFP: Alpha-fetoprotein; BSP: Bisulfite sequencing PCR; CIMP: CpG island sensitivity and specificity are better than the widely used methylator phenotype; ctDNA: Circulating tumor DNA; DCP: Des-γ-carboxy prothrombin; DE: Differentially expressed; DKK1: Dickkopf-1; FDR: False serum biomarker AFP and another candidate serum bio- discovery rate; GEO: Gene Expression Omnibus; HBV: Hepatitis B virus; marker, DKK1. 
Moreover, their methylation levels can be HCC: Hepatocellular carcinoma; HCV: Hepatitis C virus; DM: Differentially measured by relatively cheap PCR-based technology. methylated; TCGA: The Cancer Genome Atlas; TSS: Transcription start site However, we have not validated their diagnostic ability Acknowledgements using non-invasive biospecimens. To resolve this We would like to thank all the patients who contributed samples for this problem, we will first develop a sensitive technology to research. detect methylation in cell-free ctDNA. Then, we will compare the consistency of methylation between tissues Funding This work was supported by grants from National Key Research and and blood and validate the prediction ability of the Development Plan (2016YFC0902400) and National Natural Science candidate biomarkers by measuring DNA methylation in Foundation of China (NSFC, 31771472, 31501077). the blood. Another problem is whether the methylation-based biomarkers could distinguish HCC Availability of data and materials The bisulfite sequencing PCR data of ten HCC patients analyzed during the from other liver diseases. In the future, we plan to inves- current study are included in Additional file 4. tigate methylation profiles during the progression of liver cancers, including non-alcoholic fatty liver, hepato- Authors’ contributions cirrhosis, and early HCC. Additionally, we are also inter- JC collected and analyzed methylation data, screened candidate diagnostic markers, and built prediction models. LY, LW, and TH helped in data analysis. ested in the downstream biological functions of JC and HL wrote the manuscript. DW, GL, and GD designed and performed methylation biomarkers, which may help us to under- experimental validation. YJ and LC provided patients’ samples for validation. stand the roles of methylation in carcinogenesis. LX, HL, GD, and YL designed the study and revised the manuscript. All authors read and approved the final manuscript. 
Conclusions Ethics approval and consent to participate DNA methylation plays important roles in gene regula- Fresh frozen tissues of ten patients used in this study were obtained from the Zhongshan hospital. Written informed consent of all the patients was tion and carcinogenesis in HCC. We discovered several obtained to participate in the study. The research was approved by the methylation-based biomarkers by analyzing the ethical committee of the Zhongshan hospital and was conducted in genome-wide methylation data of 375 HCC samples, 50 accordance with the principles of the Declaration of Helsinki. normal liver samples, 3780 samples of cancers of other Competing interests sites, and 474 normal samples of other organs. The can- The authors declare that they have no competing interests. didate biomarkers were validated in three independent datasets with more than 300 HCC samples and 100 nor- Publisher’sNote mal liver samples. Then, BSP-based experimental valid- Springer Nature remains neutral with regard to jurisdictional claims in ation was performed in ten HCC patients. The candidate published maps and institutional affiliations. biomarkers achieved high diagnostic ability and have the Author details potential to be translated into clinical application. Future Department of Bioinformatics and Biostatistics, School of Life Sciences and translational research will accelerate the clinical valid- 2 Biotechnology, Shanghai Jiao Tong University, Shanghai, China. Key Lab of ation of candidate biomarkers and promote the early Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of detection of HCC. A similar analysis method could be Sciences, Shanghai, China. Basepair biotechnology Co. LTD, Suzhou, China. used for other tumor types to find more associations 4 Department of Pathology, Zhongshan Hospital, Fudan University, Shanghai, between methylation and cancer diagnosis. China. 
Shanghai Center for Bioinformation Technology, Shanghai Academy of Science and Technology, Shanghai, China. Additional files Received: 29 November 2017 Accepted: 8 May 2018 Additional file 1: Methylation information for 109 CpGs in different HCC References datasets. (XLSX 26 kb) 1. Forner A, Llovet JM, Bruix J. Hepatocellular carcinoma. Lancet. 2012; Additional file 2: Figure S1. Promoter methylation of four genes. 379(9822):1245–55. Figure S2 Stage-related methylation of six HCC-specific CpGs. Figure S3 2. Jemal A, Bray F, Center MM, Ferlay J, Ward E, Forman D. Global cancer Gene expression validation of the four genes by qPCR. (PDF 2167 kb) statistics. CA Cancer J Clin. 2011;61(2):69–90. Cheng et al. Genome Medicine (2018) 10:42 Page 11 of 11 3. Wang JH, Wang CC, Hung CH, Chen CL, Lu SN. Survival comparison 25. Wilkerson MD, Hayes DN. ConsensusClusterPlus: a class discovery tool with between surgical resection and radiofrequency ablation for patients in BCLC confidence assessments and item tracking. Bioinformatics. 2010;26(12):1572–3. very early/early stage hepatocellular carcinoma. J Hepatol. 2012;56(2):412–8. 26. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for 4. Altekruse SF, McGlynn KA, Reichman ME. Hepatocellular carcinoma differential expression analysis of digital gene expression data. incidence, mortality, and survival trends in the United States from 1975 to Bioinformatics. 2010;26(1):139–40. 2005. J Clin Oncol. 2009;27(9):1485–91. 27. Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH. The WEKA data mining software: an update. ACM SIGKDD Explorations 5. Marrero JA, Feng Z, Wang Y, Nguyen MH, Befeler AS, Roberts LR, Reddy KR, Newslett. 2009;11(1):10–8. Harnois D, Llovet JM, Normolle D, et al. Alpha-fetoprotein, des-gamma 28. Herman JG, Baylin SB. Gene silencing in cancer in association with promoter carboxyprothrombin, and lectin-bound alpha-fetoprotein in early hypermethylation. N Engl J Med. 2003;349(21):2042–54. 
hepatocellular carcinoma. Gastroenterology. 2009;137(1):110–8. 29. Wagner JR, Busche S, Ge B, Kwan T, Pastinen T, Blanchette M. The relationship 6. Bruix J, Sherman M. American Association for the Study of Liver D: between DNA methylation, genetic and expression inter-individual variation in Management of hepatocellular carcinoma: an update. Hepatology. 2011; untransformed human fibroblasts. Genome Biol. 2014;15(2):R37. 53(3):1020–2. 30. Yang IV, Pedersen BS, Rabinovich E, Hennessy CE, Davidson EJ, Murphy E, 7. European Association for the Study of the Liver, European Organisation for Guardela BJ, Tedrow JR, Zhang Y, Singh MK, et al. Relationship of DNA Research and Treatment of Cancer. EASL-EORTC clinical practice guidelines: methylation and gene expression in idiopathic pulmonary fibrosis. Am J management of hepatocellular carcinoma. J Hepatol. 2012;56(4):908–43. Respir Crit Care Med. 2014;190(11):1263–72. 8. Shen QJ, Fan J, Yang XR, Tan YX, Zhao WF, Xu Y, Wang N, Niu YD, Wu Z, 31. Hinoue T, Weisenberger DJ, Lange CP, Shen H, Byun HM, Van Den Berg D, Malik Zhou J, et al. Serum DKK1 as a protein biomarker for the diagnosis of S, Pan F, Noushmehr H, van Dijk CM, et al. Genome-scale analysis of aberrant hepatocellular carcinoma: a large-scale, multicentre study. Lancet Oncol. DNA methylation in colorectal cancer. Genome Res. 2012;22(2):271–82. 2012;13(8):817–26. 32. Jones PA. Functions of DNA methylation: islands, start sites, gene bodies 9. Ehrlich M. DNA methylation in cancer: too much, but also too little. and beyond. Nat Rev Genet. 2012;13(7):484–92. Oncogene. 2002;21(35):5400–13. 10. Eden A, Gaudet F, Waghmare A, Jaenisch R. Chromosomal instability and tumors promoted by DNA hypomethylation. Science. 2003;300(5618):455. 11. Yang B, Guo M, Herman JG, Clark DP. Aberrant promoter methylation profiles of tumor suppressor genes in hepatocellular carcinoma. Am J Pathol. 2003;163(3):1101–7. 12. Si JG, Su YY, Han YH, Chen RH. 
Role of RASSF1A promoter methylation in the pathogenesis of ovarian cancer: a meta-analysis. Genet Test Mol Biomarkers. 2014;18(6):394–402. 13. Laird PW. The power and the promise of DNA methylation markers. Nat Rev Cancer. 2003;3(4):253–66. 14. Leygo C, Williams M, Jin HC, Chan MWY, Chu WK, Grusch M, Cheng YY. DNA methylation as a noninvasive epigenetic biomarker for the detection of cancer. Dis Markers. 2017;2017:3726595. 15. Song LL, Li YM. Current noninvasive tests for colorectal cancer screening: An overview of colorectal cancer screening tests. World J Gastrointest Oncol. 2016;8(11):793–800. 16. Zheng Y, Huang Q, Ding Z, Liu T, Xue C, Sang X, Gu J. Genome-wide DNA methylation analysis identifies candidate epigenetic markers and drivers of hepatocellular carcinoma. Brief Bioinform. 2016;19(1):101–8. 17. Xu RH, Wei W, Krawczyk M, Wang W, Luo H, Flagg K, Yi S, Shi W, Quan Q, Li K, et al. Circulating tumour DNA methylation markers for diagnosis and prognosis of hepatocellular carcinoma. Nat Mater. 2017;16(11):1155–61. 18. Bibikova M, Barnes B, Tsan C, Ho V, Klotzle B, Le JM, Delano D, Zhang L, Schroth GP, Gunderson KL, et al. High density DNA methylation array with single CpG site resolution. Genomics. 2011;98(4):288–95. 19. Kananen L, Marttila S, Nevalainen T, Jylhava J, Mononen N, Kahonen M, Raitakari OT, Lehtimaki T, Hurme M. Aging-associated DNA methylation changes in middle-aged individuals: the Young Finns study. BMC Genomics. 2016;17:103. 20. Shen J, Wang S, Zhang YJ, Wu HC, Kibriya MG, Jasmine F, Ahsan H, Wu DPH, Siegel AB, Remotti H, et al. Exploring genome-wide DNA methylation profiles altered in hepatocellular carcinoma using Infinium HumanMethylation 450 BeadChips. Epigenetics-Us. 2013;8(1):34–43. 21. Kuramoto J, Arai E, Tian Y, Funahashi N, Hiramoto M, Nammo T, Nozaki Y, Takahashi Y, Ito N, Shibuya A, et al. 
Genome-wide DNA methylation analysis during non-alcoholic steatohepatitis-related multistage hepatocarcinogenesis: comparison with hepatitis virus-related carcinogenesis. Carcinogenesis. 2017;38(3):261–70. 22. Villanueva A, Portela A, Sayols S, Battiston C, Hoshida Y, Mendez-Gonzalez J, Imbeaud S, Letouze E, Hernandez-Gea V, Cornella H, et al. DNA methylation- based prognosis and epidrivers in hepatocellular carcinoma. Hepatology. 2015;61(6):1945–56. 23. Noushmehr H, Weisenberger DJ, Diefes K, Phillips HS, Pujara K, Berman BP, Pan F, Pelloski CE, Sulman EP, Bhat KP, et al. Identification of a CpG island methylator phenotype that defines a distinct subgroup of glioma. Cancer Cell. 2010;17(5):510–22. 24. Cancer Genome Atlas Research Network, Linehan WM, Spellman PT, Ricketts CJ, Creighton CJ, Fei SS, Davis C, Wheeler DA, Murray BA, Schmidt L, et al. Comprehensive molecular characterization of papillary renal-cell carcinoma. N Engl J Med. 2016;374(2):135–45.
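The BSP validation combines CpG methylation values "according to the formula obtained from logistic regression", compares tumor and normal scores in paired tissues, and Table 1 reports sensitivity and specificity. The following is a minimal Python sketch of those three calculations, not the authors' implementation. The intercept, the weights, the toy beta values, the decision threshold of 0 on the logit scale, and all function names are hypothetical, since the fitted coefficients are not given in this section.

```python
import math
from statistics import mean, stdev

# Hypothetical coefficients for illustration only; the paper's fitted
# logistic regression formula is not reproduced in this section.
INTERCEPT = -4.0
WEIGHTS = {"cg21908638": 12.0, "cg23565942": 10.0}


def combined_score(beta_values, weights=WEIGHTS, intercept=INTERCEPT):
    """Logit (linear predictor) of a logistic model for one sample."""
    return intercept + sum(w * beta_values[cpg] for cpg, w in weights.items())


def probability(score):
    """Logistic transform of the combined score."""
    return 1.0 / (1.0 + math.exp(-score))


def paired_t_statistic(tumor, normal):
    """Paired t statistic and degrees of freedom for matched samples."""
    diffs = [t - n for t, n in zip(tumor, normal)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1


def sensitivity_specificity(scores, labels, threshold=0.0):
    """labels: 1 = HCC, 0 = normal; a score above threshold is called HCC."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s > threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s <= threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s <= threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s > threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy beta values (fraction methylated, 0 to 1) for three tumor/normal pairs.
tumor = [
    {"cg21908638": 0.40, "cg23565942": 0.30},
    {"cg21908638": 0.25, "cg23565942": 0.20},
    {"cg21908638": 0.10, "cg23565942": 0.12},
]
normal = [
    {"cg21908638": 0.05, "cg23565942": 0.04},
    {"cg21908638": 0.08, "cg23565942": 0.05},
    {"cg21908638": 0.06, "cg23565942": 0.10},
]

tumor_scores = [combined_score(s) for s in tumor]
normal_scores = [combined_score(s) for s in normal]
t_stat, dof = paired_t_statistic(tumor_scores, normal_scores)
sens, spec = sensitivity_specificity(
    tumor_scores + normal_scores, [1] * len(tumor) + [0] * len(normal)
)
print(f"t = {t_stat:.2f} (df = {dof}), sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```

A threshold of 0 on the logit scale corresponds to a predicted probability of 0.5. Converting the t statistic to a p value needs the t distribution, for example through a statistics package, and is left out to keep the sketch dependency-free.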
Genome Medicine – Springer Journals
Published: May 30, 2018