Pancancer analysis identifies prognostic high-APOBEC1 expression level implicated in cancer in-frame insertions and deletions

Abstract

Genomic insertions and deletions (indels) have substantial functional impact even though they are much less common than single-nucleotide variants, which are at the center of studies assessing cancer mutational signatures. We studied 8891 tumor samples of 32 types from The Cancer Genome Atlas to explore genes potentially implicated in cancer indels. Survival analysis identified in-frame indels as the most important variants predicting adverse outcome. A transcriptome-wide association study identified 16 genes overexpressed in both tumor samples and tumor types with a high number of in-frame indels, of which four (APOBEC1, BCL2L15, FOXL1 and PDX1) encode products distributed within the nucleus. APOBEC1 emerged as the only consistently hypomethylated gene in tumor samples with a high number of in-frame indels. The correlation of APOBEC1 expression levels with cancer indels was independent of age and of defects in DNA homologous recombination (HR) and/or mismatch repair. Unlike frame-shift indel sites, in-frame indel sites frequently contained triplet repeat motifs. Splicing variant 3, which encodes the shorter isoform b, showed essentially the same indel correlations as APOBEC1. Expression levels of both APOBEC1 and variant 3 predicted adverse prognosis independently of DNA HR and mismatch repair. Equally important, a high level of variant 3 in paired normal tissues also predicted cancer outcome. Our findings propose APOBEC1 and isoform b as potential endogenous mutators implicated in cancer in-frame indels and pave the way for their use as novel prognostic tumor markers.

Introduction

Somatic mutations are hallmarks of cancer genomes.
Insertions and deletions (indels) of nucleotides, which can have important functional implications, constitute the second most common class of variants in cancer after single-nucleotide variants (SNVs). Recent work indicates that indel variants are associated with six mutational signatures (1). While signature 1 is age-dependent, signature 3 is associated with defects in double-strand break repair by homologous recombination (HR) due to BRCA1/BRCA2 inactivation. The remaining signatures (6, 15, 20 and 26) are thought to be related to defects in DNA mismatch repair (MMR) causing microsatellite instability. DNA microsatellites are defined as genomic repeats of 2–5 nucleotides (2), which are hot spots for indel variants when post-replicative DNA repair is defective (3). Microsatellite instability is commonly seen in endometrial (4), gastric (5), pancreatic (6) and colorectal cancers (7). Although the role of defective DNA repair pathways in cancer indels has been studied extensively (3–7), few studies have attempted to unravel the molecular mechanisms underlying the original DNA insult. This is in sharp contrast to particular SNVs, which have already been attributed to APOBEC family members with endogenous mutator activities, including APOBEC3B (8,9), APOBEC3A (9,10) and APOBEC3H (11). In this study, we set out to explore genes potentially implicated in cancer indels in a pancancer setting. We hypothesized that gene expression changes underlie cancer indels regardless of the DNA repair pathways implicated. Survival analysis identified in-frame indels as the most important genomic variants predicting cancer outcome. We then sought candidate genes implicated in cancer in-frame indels by identifying genes differentially expressed among both tumor samples and tumor types, and validated the results using gene methylation findings.
The final candidate gene, APOBEC1, and one particular transcript variant were found to independently predict both in-frame indels and adverse outcome.

Materials and methods

Patients and samples

The results shown here are based upon data generated by the TCGA Research Network: http://cancergenome.nih.gov/. The complete lists of somatic mutations identified by whole-exome sequencing of 8891 paired tumor-normal samples were obtained from the TCGA Data Portal as .maf files (Level 2, v.20151009), comprising 32 tumor types from 26 primary sites (Supplementary Table 1, available at Carcinogenesis Online). These data had been collected from a total of 105 sequencing runs performed by seven participating centers. A total of 2 458 352 duplicate mutations identified by multiple sequencing runs were removed, leaving 2 338 886 unique variants. The following open access (Level 3) datasets were obtained from the Broad Institute GDAC Firehose (http://gdac.broadinstitute.org/): normalized RNA-Seq by Expectation-Maximization (RSEM) data for genes (RSEM_genes_normalized_data, v.20160128) and isoforms (RSEM_isoforms_normalized_data, v.20160128), gene methylation data (meth.by_min_expr_corr.data, v.20160128) and clinical data including age, sex, vital status and days to death/last follow-up (Clinical_Pick_Tier1, v.20160128). The normalized RSEM data derived from RNA sequencing were used as an estimate of TCGA gene expression profiles and were further normalized to TBP (TATA-binding protein) expression levels. Indels longer than one nucleotide were classified based on their 5ʹ-end nucleotide. The lists of HR and MMR genes were obtained from the Kyoto Encyclopedia of Genes and Genomes (KEGG maps ko03440 and ko03430, respectively), and all exome variants except silent ones were included in the analysis of implicated DNA repair pathways.
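The indel labels used throughout (for example DelA/T or InsC/G, with an in-frame subset) can be illustrated with a short sketch. It assumes that in-frame means the indel length is a multiple of three (the standard definition) and that complementary bases are pooled so that A and T share one label, as the A/T and C/G labels suggest. The function name and return format are hypothetical and are not taken from the authors' pipeline.

```python
# Illustrative sketch only; names and return format are hypothetical.
# Assumption: complementary bases are pooled (A with T, C with G).
_BASE_CLASS = {"A": "A/T", "T": "A/T", "C": "C/G", "G": "C/G"}


def classify_indel(kind, sequence):
    """Return (frame_class, label) for one indel.

    kind     -- "Del" or "Ins"
    sequence -- deleted or inserted bases, written 5' to 3'
    """
    if kind not in ("Del", "Ins"):
        raise ValueError("kind must be 'Del' or 'Ins'")
    seq = sequence.upper()
    if not seq or seq[0] not in _BASE_CLASS:
        raise ValueError("sequence must start with A, C, G or T")
    # An indel keeps the reading frame only if its length is a multiple of 3.
    frame_class = "in-frame" if len(seq) % 3 == 0 else "frame-shift"
    # Indels longer than one nucleotide are labelled by their 5'-end base.
    label = kind + _BASE_CLASS[seq[0]]
    return frame_class, label
```

For example, a three-base deletion starting with G would be labelled as an in-frame DelC/G event under these assumptions, while a single inserted T would be a frame-shift InsA/T event.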
The lists of oncogenes and tumor suppressor genes were obtained from a census of human cancer genes (12) and the COSMIC supplemental analysis information (http://cancer.sanger.ac.uk/census#cl_analysis).

Statistical analysis

Pearson correlation analysis (IBM SPSS, v.22) was used to assess the correlation of the various indel classes with each other, and to study the correlation of the number of in-frame indel variants in each tumor exome with the TBP-normalized expression levels of 30 762 genes/gene variants, as well as APOBEC1 variants. Pearson correlation analysis was also used to examine the correlation of the mean number of in-frame indels in each tumor type with the mean normalized expression levels of 30 762 genes/gene variants, as well as APOBEC1 variants, and to assess the correlation of the number of in-frame indel variants in each tumor exome with the methylation levels of candidate genes. The Benjamini–Hochberg false discovery rate (FDR) was used to adjust for multiple testing in Pearson correlation analyses, with an FDR of up to 0.05 considered acceptable. Multivariate linear regression analysis (IBM SPSS, v.22) was performed to assess the impact of multiple covariates on the number of in-frame indels, including high-APOBEC1 status, HR-mutant status, MMR-mutant status and age. The significance and extent of enrichment of in-frame indels in oncogenes compared with tumor suppressor genes were tested using the two-sided Mantel–Haenszel test (IBM SPSS, v.22). The two-sided Mantel–Haenszel test was also used to test the significance and extent of enrichment of homologous triplet units among in-frame indels and their contexts. The unpaired two-tailed Student’s t-test (IBM SPSS, v.22) was used to compare the mean number of indels between low-APOBEC1 and high-APOBEC1, HR-intact and HR-mutant, and MMR-intact and MMR-mutant samples, between tumor types with and without indel signatures, and to compare in-frame indel counts among various groups.
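As an illustration of the multiple-testing adjustment named above, the following is a minimal sketch of the standard Benjamini–Hochberg step-up procedure. The analyses in this study were run in IBM SPSS, so this is not the authors' implementation, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, so each adjusted value is the
    # minimum of p * m / rank over this rank and all larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A correlation would then be retained when its adjusted value is at most 0.05, matching the FDR threshold stated above.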
The paired two-tailed Student’s t-test (IBM SPSS, v.22) was used to compare mean APOBEC1 expression levels between tumor and normal samples. Gene-E was used to display the correlation heat map of the various indel classes, and GraphPad Prism (v.5.0.1) to illustrate the correlations of gene expression levels with indel numbers. The R (v.3.1.1) package ‘seqinr’ (v.3.1.3) (13) was used to identify the genomic context of each indel site, covering genomic positions −30 to +30. The R package ‘ClusteredMutations’ (v.1.0.1) was used to plot exome inter-mutation distance against genomic position. Indel kataegis events were defined as clusters of at least two indels of the same type occurring within a distance of 1000 bases. WebLogo (v.3.4) (14) was used to display nucleotide probability logos within the indel context and to illustrate the triplet repeat motifs. Motif-x (v.1.2) (15) was used to discover significantly enriched context motifs and their enrichment between positions −12 and +12. The genomic contexts of the corresponding frame-shift indels across all studied exomes were used as the background, with a minimum acceptable significance of 10^−10.

Prognostic analysis

Overall survival (OS) was measured from the date of patient enrolment to the date of death or the date the patient was last known to be alive. Cox regression analysis (IBM SPSS, v.22) was used for univariate survival analysis with the numbers of six mutational classes as covariates in all cancer cases: SNVs, double-nucleotide variants, frame-shift deletions, frame-shift insertions, in-frame deletions and in-frame insertions. Cox regression analysis was also used for univariate survival analysis with high-APOBEC1 or high-Var3 status as the covariate in all cancer patients, with age, sex, and HR-mutant and MMR-mutant status considered in multivariate analysis.
Kaplan–Meier analysis was performed using the Mantel–Cox statistic (IBM SPSS, v.22) to test the equality of survival distributions between the low-APOBEC1 (APOBEC1/TBP = 0) and high-APOBEC1 (APOBEC1/TBP > 0) classes, as well as between the low-Var3 (Var3/TBP = 0) and high-Var3 (Var3/TBP > 0) classes, on a pancancer basis. Similar analyses were conducted in each tumor type.

Results

The prevalence of various mutations in human cancers

Assessment of 8891 tumor exomes of 32 types from The Cancer Genome Atlas (TCGA, Supplementary Table 1, available at Carcinogenesis Online) identified 2 338 886 unique variants, of which 192 075 (8.2%) were indels. Indels longer than one nucleotide were classified based on their 5ʹ-end nucleotide. Single-nucleotide indels constituted just over two thirds (67.8%) of all indels, followed by trinucleotide indels (12.8%) (Supplementary Figure 1A, available at Carcinogenesis Online). The proportion of indels among total variants varied from 1.3% in skin melanoma to 29.1% in pancreatic adenocarcinoma (Supplementary Figure 1B, available at Carcinogenesis Online). About 16.2% of indels occurred at known genomic polymorphic sites (dbVar). Deletions were more prevalent than insertions in all cancers except kidney clear cell carcinoma and acute myeloid leukemia (Supplementary Figure 1C, available at Carcinogenesis Online). About 21.1% of all indels were in-frame deletions or insertions, varying from 5.7% in hepatocellular carcinoma to 41.6% in pancreatic adenocarcinoma (Supplementary Figure 1D, available at Carcinogenesis Online). Indel events were found to be correlated with each other, most significantly DelA/T and DelC/G (β coefficient = 0.94) (Supplementary Figure 2A, available at Carcinogenesis Online). This was also the case for in-frame indels, with the highest correlation between IF-DelA/T and IF-DelC/G (β coefficient = 0.90) (Supplementary Figure 2B, available at Carcinogenesis Online).
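The indel kataegis definition given in Materials and methods (clusters of at least two indels of the same type within 1000 bases) can be sketched as follows. The study itself used the R package ‘ClusteredMutations’; this Python sketch is only an illustration. It assumes that "within 1000 bases" refers to the gap between consecutive indels and that clusters are called separately per sample and chromosome. All function and parameter names are hypothetical.

```python
from collections import defaultdict


def find_indel_clusters(indels, max_gap=1000, min_size=2):
    """Group same-type indels into clusters of nearby events.

    indels -- iterable of (sample, chromosome, position, indel_type)
    Returns a list of ((sample, chromosome, indel_type), [positions]).
    """
    # Assumption: clusters are called per sample, chromosome and indel type.
    groups = defaultdict(list)
    for sample, chrom, pos, indel_type in indels:
        groups[(sample, chrom, indel_type)].append(pos)

    clusters = []
    for key, positions in groups.items():
        positions.sort()
        current = [positions[0]]
        for pos in positions[1:]:
            if pos - current[-1] <= max_gap:
                current.append(pos)  # still within the allowed gap
            else:
                if len(current) >= min_size:
                    clusters.append((key, current))
                current = [pos]  # start a new candidate cluster
        if len(current) >= min_size:
            clusters.append((key, current))
    return clusters


def mean_inter_indel_distance(clusters):
    """Mean gap between consecutive indels inside clusters, or None."""
    gaps = [b - a
            for _, positions in clusters
            for a, b in zip(positions, positions[1:])]
    return sum(gaps) / len(gaps) if gaps else None
```

Under these assumptions, the mean distance between clustered events of one class would be obtained by filtering the clusters by indel type before calling mean_inter_indel_distance.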
Closely clustered mutations (kataegis) are assumed to be caused by the same molecular mechanism (16), and analysis of kataegis plots revealed much smaller insertion clusters than deletion clusters. Specifically, mean distances between clustered DelA/T and DelC/G events were 111 and 149 nucleotides, respectively (Supplementary Figure 2C, available at Carcinogenesis Online), while they were 7 and 25 nucleotides for clustered InsA/T and InsC/G events, respectively (Supplementary Figure 2D, available at Carcinogenesis Online). These findings suggested shared underlying mechanisms implicated in deletions and insertions. Survival analysis identified the in-frame deletion (IFD) and in-frame insertion (IFI) classes as the most deleterious variants, followed by the frame-shift deletion class (Figure 1). Overall, 32 838 in-frame indels (1.4% of all variants) predicted adverse overall survival (OS) much more significantly [hazard ratio = 1.011, 95% confidence interval (CI): 1.008–1.013, P = 2 × 10^−16] than frame-shift indels (hazard ratio = 1.001, 95% CI: 1.000–1.001, P = 0.026) and other variant classes, suggesting that they are the most important class of cancer driver mutations. Assessment of the functional roles of indel-affected genes showed the proportion of in-frame indels to be about 2.3 times higher in oncogenes than in tumor suppressor genes (Supplementary Figure 3, available at Carcinogenesis Online).

Figure 1. Prognostic impact of various substitution and indel variant classes. Cox regression analysis identified the impact (hazard ratio) of each variant class on OS. In-frame indels including IFDs (P = 2 × 10^−18) and IFIs (P = 0.044), as well as FSDs (P = 0.001), significantly predicted OS. Bars indicate the 95% confidence intervals. DNV, dinucleotide variant; FSD, frame-shift deletion; FSI, frame-shift insertion; IFD, in-frame deletion; IFI, in-frame insertion; SNV, single-nucleotide variant.

APOBEC1 expression level as the optimal predictor of indel variants

The expression profiles of 30 762 genes and gene variants were screened for correlation with the number of exome in-frame indels, showing 1237 differentially expressed genes (844 overexpressed and 393 underexpressed) in tumor samples with frequent in-frame indels (Supplementary Table 2, available at Carcinogenesis Online). Likewise, 39 genes were found to be differentially expressed (all overexpressed) in tumor types with a high mean number of in-frame indels (Supplementary Table 3, available at Carcinogenesis Online). Overall, 16 genes were overexpressed among both tumor samples and tumor types with frequent in-frame indels (Table 1). Gene ontology assessment identified four overexpressed genes whose products localize to the nucleus, namely APOBEC1, BCL2L15 (BCL2 like 15), FOXL1 (forkhead box L1) and PDX1 (pancreatic and duodenal homeobox 1), making them candidate genes potentially implicated in in-frame indels. The gene products of FOXL1 and PDX1 are known transcriptional regulators (17,18), and a pro-apoptotic function has been shown for BCL2L15 (19); none of them has been reported to be mutagenic. APOBEC1, however, is known to modify both mRNA and DNA.
Moreover, APOBEC1 was the only gene that was consistently demethylated in tumor samples with a high number of in-frame indels (Supplementary Figure 4, available at Carcinogenesis Online). The workflow of the transcriptome-wide association study is depicted in Supplementary Figure 5, available at Carcinogenesis Online.

Table 1. The list of differentially expressed genes among both tumor samples and tumor types with high number of in-frame indels

                         Tumor exomes                                                  Tumor types
                         IF DelA/T       IF DelC/G       IF InsA/T      IF InsC/G      IF DelA/T      IF DelC/G      IF InsA/T      IF InsC/G
Gene symbol   Entrez ID  β^a    FDR      β      FDR      β      FDR     β      FDR     β      FDR     β      FDR     β      FDR     β      FDR
APOBEC1^b     339        0.179  6e−58    0.188  5e−64    0.065  5e−07   0.106  4e−20   0.786  2e−05   0.749  2e−04   0.769  7e−05   0.688  2e−02
AXDND1        126859     0.115  2e−22    0.114  5e−22    0.055  8e−05   0.086  2e−12   0.663  1e−02   0.624  3e−02   0.688  5e−03   0.754  1e−02
BCL2L15^b     440603     0.258  9e−123   0.262  4e−126   0.154  3e−41   0.211  1e−79   0.713  9e−04   0.660  6e−03   0.712  1e−03   0.657  4e−02
ERN2          10595      0.277  8e−142   0.282  2e−147   0.130  3e−29   0.158  3e−44   0.794  1e−05   0.754  1e−04   0.780  4e−05   0.657  4e−02
FOXL1^b       2300       0.270  1e−134   0.327  2e−201   0.094  5e−15   0.096  7e−16   0.946  2e−13   0.950  5e−14   0.896  2e−09   0.631  5e−02
GCNT3         9245       0.277  1e−141   0.294  3e−160   0.110  9e−21   0.160  3e−45   0.846  3e−07   0.815  3e−06   0.801  1e−05   0.628  5e−02
INS           3630       0.252  7e−105   0.295  5e−146   0.042  1e−02   0.057  2e−05   0.979  5e−17   0.993  2e−23   0.918  1e−09   0.666  4e−02
KCNK16        83795      0.168  1e−48    0.199  2e−68    0.034  5e−02   0.035  4e−02   0.968  4e−16   0.987  6e−22   0.906  1e−09   0.652  3e−02
LRRC66        339977     0.259  3e−123   0.268  7e−133   0.116  3e−23   0.159  7e−45   0.834  9e−07   0.795  1e−05   0.808  7e−06   0.712  2e−02
MTMR11        10903      0.215  8e−84    0.235  4e−101   0.122  2e−25   0.180  5e−58   0.648  9e−03   0.612  3e−02   0.649  1e−02   0.674  3e−02
PDX1^b        3651       0.284  3e−149   0.285  1e−150   0.109  3e−20   0.176  9e−56   0.744  2e−04   0.706  1e−03   0.717  8e−04   0.656  3e−02
PTGES2-AS1    389791     0.138  2e−34    0.148  2e−39    0.045  2e−03   0.077  1e−10   0.718  7e−04   0.684  3e−03   0.676  4e−03   0.639  4e−02
REG1A         5967       0.182  8e−60    0.197  5e−70    0.036  3e−02   0.058  3e−06   0.952  2e−14   0.946  2e−13   0.897  2e−09   0.636  4e−02
REG4          83998      0.159  2e−45    0.154  1e−42    0.047  9e−04   0.085  4e−13   0.727  5e−04   0.681  3e−03   0.718  8e−04   0.657  4e−02
TCN1          6947       0.196  1e−69    0.216  1e−84    0.061  3e−06   0.085  8e−13   0.953  2e−14   0.952  3e−14   0.914  9e−10   0.649  3e−02
VILL          50853      0.304  2e−172   0.349  3e−230   0.115  9e−23   0.131  5e−30   0.924  2e−11   0.921  3e−11   0.873  3e−08   0.626  5e−02

Sixteen genes were found to be common among both differentially expressed genes in tumor exomes (1237) and differentially expressed genes in tumor types (22) with high number of in-frame (IF) indels (FDR or adjusted Pearson correlation analysis P ≤ 0.05). ^a Pearson correlation β coefficient. ^b Genes whose products are distributed within the nucleus.

We examined 712 available cases with paired normal-tumor samples across 21 cancer types and did not observe a significant difference in expression levels of APOBEC1 (Student’s t-test P = 0.09). However, after exclusion of the gastrointestinal samples (ESCA, STAD and COADREAD), which showed high normal expression levels of APOBEC1 (median normal APOBEC1 expression level > 0), the mean expression level of APOBEC1 in the remaining 619 samples of 18 tumor types was nearly 36 times higher than in the paired normal samples (Student’s t-test P = 5 × 10^−4) (Supplementary Figure 6, available at Carcinogenesis Online).
Further assessment showed that mean APOBEC1 expression levels were 38 times higher in tumors with indel mutational signatures (signatures 1, 3, 6, 15, 20 and 26) than in those with non-indel signatures (Supplementary Figure 7, available at Carcinogenesis Online). As expected, tumors with the highest mean number of in-frame indels (i.e. pancreas, stomach, colorectum, endometrium and adrenal cortex) tended to show the highest mean APOBEC1 levels (Supplementary Figure 8, available at Carcinogenesis Online). Notably, APOBEC1 was the only APOBEC/AICDA (activation induced cytidine deaminase) family member with differential expression in high in-frame indel states (Supplementary Tables 2 and 3, available at Carcinogenesis Online). Moreover, A1CF (APOBEC1 complementation factor), which is crucial for the mRNA editing activity of APOBEC1, was not differentially expressed in high in-frame indel states (Supplementary Tables 2 and 3, available at Carcinogenesis Online). Reassessment of the limited number of available experiments (21) revealed at least one clonal DelA/T variant within the BCR-ABL1 fusion gene upon overexpression of rat APOBEC1 in K562 cells. Because indel variants are frequent in six mutational signatures associated with age and HR/MMR defects, we tested whether high-APOBEC1 status confounded the impact of defective HR/MMR pathways. Mutations of the KEGG (Kyoto Encyclopedia of Genes and Genomes) HR/MMR pathway genes were used as an indicator of HR/MMR deficiency (Supplementary Tables 4 and 5, available at Carcinogenesis Online). Preliminary assessment showed that both the HR-mutant (β coefficient = 0.26, P = 9 × 10^−138) and MMR-mutant (β coefficient = 0.32, P = 4 × 10^−214) indices strongly predicted cancer indels, consistent with their reflecting the functional status of the HR/MMR pathways.
While in-frame indels were mostly associated with high-APOBEC1 (high-A1) status (Figure 2A), frame-shift indels were mostly associated with MMR mutational status (Figure 2B). A multivariate model identified high-APOBEC1 as the most important factor predicting in-frame indels (Figure 2C), while MMR and HR mutational status were more important for frame-shift indels (Figure 2D).

Figure 2. The impact of various factors potentially implicated in cancer indels. (A, B) The mean number of indels (±SEM) in each tumor exome as classified by APOBEC1 level and mutational status of HR/MMR genes. All factors are significantly associated with the number of in-frame and frame-shift indels. Multivariate regression analysis shows that in-frame indels are mostly dependent on high-APOBEC1 followed by MMR-mutant status (C), while frame-shift indels are mostly dependent on MMR-mutant followed by HR-mutant status (D). Age does not appear to have any independent impact on cancer indels. Bars indicate the 95% confidence intervals. Unpaired two-tailed Student’s t-test ***P < 10^−10. β, Pearson correlation coefficient.

Triplet repeat motifs correlated with indel variants

We next analyzed the genomic contexts of 192 075 indel variants and found that the nucleotide A was significantly enriched at essentially all positions in IndelA/T variants, while C was enriched at nearly all positions in IndelC/G variants (Figure 3A–D). Analysis of the 33 013 in-frame indel variants, however, showed a distinct pattern. The nucleotide A was highly enriched at triplet intervals in IndelA/T, particularly downstream of the indel site (Figure 3E and G), and C was enriched at triplet intervals in IndelC/G (Figure 3F and H). In fact, the genomic context of in-frame indels revealed certain triplet repeat motifs. The consensus motifs were (ASN)4A(GGA)4 in IF-DelA/T, (CNN)4C(TBC)4 in IF-DelC/G, (ANN)4A(NNA)4 in IF-InsA/T and (CSB)4C(SGC)4 in IF-InsC/G (Figure 3I–L; S = C or G; B = non-A). Analysis of common motifs within this shorter context revealed some motifs with enrichment of several hundred fold. For instance, the A(NGA)4 motif, constituting about 5 and 9% of IF-DelA/T and IF-InsA/T variants, respectively, was 232- and 510-fold enriched (Supplementary Figure 9, available at Carcinogenesis Online). Likewise, the C(NGC)4 motif, comprising about 5 and 10% of IF-DelC/G and IF-InsC/G variants, respectively, was enriched 230 and 853 times. Analysis of the in-frame indel sequences showed them to be homologous to their contexts: the first triplet of the in-frame indel was homologous to the first triplet of the context in 35.1% of events, an enrichment of about 27-fold (microhomologous; two-sided Mantel–Haenszel test P < 10^−300).

Figure 3. The genomic contexts of cancer indels. (A–D) The genomic contexts of 127 885 deletions and 64 190 insertions, showing the probability of each nucleotide between positions −30 and +30 in relation to the indel site. (A) DelA/T. (B) DelC/G. (C) InsA/T. (D) InsC/G. The affected nucleotides A or C appear to be enriched at nearly all positions, particularly downstream of indel sites. One important exception is position −1, which is significantly depleted of the affected nucleotides A or C in both deletions and insertions. (E–L) The genomic context of 27 434 in-frame deletions and 5579 in-frame insertions, showing the probability of each nucleotide between positions −30 and +30 in relation to the indel site, reveals triplet repeat motifs. (E) In-frame (IF-) DelA/T. (F) IF-DelC/G. (G) IF-InsA/T. (H) IF-InsC/G. The affected nucleotide A/C is highly enriched at triplet intervals, particularly downstream of the indel site. (I–L) The consensus motifs identified in in-frame indels between positions −12 and +12, showing four triplet repeats upstream and four triplet repeats downstream of indel sites. These motifs can be summarized as (ASN)4A(GGA)4 for IF-DelA/T (I), (CNN)4C(TBC)4 for IF-DelC/G (J), (ANN)4A(NNA)4 for IF-InsA/T (K) and (CSB)4C(SGC)4 for IF-InsC/G (L) (S = C or G; B = non-A).

APOBEC1 transcript variant 3 showing the same APOBEC1 correlations

We next explored the potential correlation of in-frame indels with particular APOBEC1 splicing variants, including NM_001644, NM_001304566 and NM_005889. The first two, transcript variants 1 and 2, respectively, give rise to the full-length isoform a (NP_001635 and NP_001291495), while the third, variant 3 (Var3), has an alternative start codon within exon 2 and yields the shorter isoform b (NP_005880). Var3 was overexpressed in non-gastrointestinal tumor cells (Supplementary Figure 6, available at Carcinogenesis Online) and was the dominantly expressed variant in the vast majority of tumors (Supplementary Figure 10, available at Carcinogenesis Online). The number of all in-frame indel variants per tumor exome was mainly correlated with the expression level of Var3 (Figure 4), and the mean number of in-frame indels correlated with the mean Var3 level in each tumor type (Supplementary Figure 11, available at Carcinogenesis Online).

Figure 4. Correlation of the expression levels of APOBEC1 transcript variants with the number of in-frame indel variants. Analysis of 8891 pancancer samples showed the expression level of variant 3 (Var3) to be the APOBEC1 variant most correlated with the number of in-frame indels, followed by variant 2 (Var2). β: Pearson correlation coefficient.

Prognostic implication of expression levels of APOBEC1 and Var3

Pancancer survival analysis revealed both high-APOBEC1 (hazard ratio = 1.30, 95% CI: 1.19–1.42) and high-Var3 (hazard ratio = 1.35, 95% CI: 1.24–1.48) to be significantly associated with shorter OS (Figure 5), irrespective of age, sex and other APOBEC1 splicing variants. Median survival was 864 (977) days shorter in high-APOBEC1 (high-Var3) than in low-APOBEC1 (low-Var3) patients. Similar analyses performed in individual tumor types demonstrated the prognostic value of high-APOBEC1 and/or high-Var3 in five tumors: adrenocortical carcinoma, endometrial carcinoma, pancreatic adenocarcinoma, mesothelioma and thyroid carcinoma (Supplementary Figures 12, 13A and B, available at Carcinogenesis Online). However, this did not mean that high-APOBEC1/Var3 lacked prognostic value in the remaining tumors, because both remained significantly prognostic after exclusion of these five tumors from the pancancer analysis (Supplementary Figure 13C and D, available at Carcinogenesis Online). Intriguingly, the mean IF indel count was significantly higher in the high-APOBEC1/Var3 groups, in parallel with the prognostic impact of APOBEC1/Var3. A survival analysis assessing factors potentially implicated in cancer indels (Supplementary Table 6A, available at Carcinogenesis Online) identified age, male sex, high-APOBEC1 and HR mutations as independent predictors of pancancer outcome, with no remaining impact of MMR mutations (Supplementary Table 6B, available at Carcinogenesis Online).
High-Var3 was also found to have a similar independent prognostic impact (Supplementary Table 6C, available at Carcinogenesis Online).

Figure 5. Prognostic assessment of the high expression levels of APOBEC1 and Var3. (A) Cancer cases with high-APOBEC1 (High-A1) showed shorter median survival than those with low-APOBEC1 (2036 versus 2900 days, hazard ratio = 1.30, 95% CI: 1.19–1.42). (B) Patients with high-Var3 showed shorter median survival than those with low-Var3 levels (1933 versus 2910 days, hazard ratio = 1.35, 95% CI: 1.24–1.48). The Mantel-Cox P value is reported for each analysis. The mean IF indel count is shown for each group. Unpaired two-tailed Student’s t-test, ***P < 10^−10.

Finally, the finding that high levels of APOBEC1/Var3 in tumor cells were an independent prognostic factor raised the question of whether their levels in paired normal tissues were also predictive of poor outcome. Examination of 712 cases with available normal samples did not show significant prognostic value for high-APOBEC1 levels in paired normal tissues, either in pancancer analysis (21 tumor types available) or among non-gastrointestinal cancer cases.
However, a similar analysis using high-Var3 levels demonstrated a strong prognostic impact in paired normal tissues of cancer patients in pancancer analysis (hazard ratio = 1.60, 95% CI = 1.08–2.35), and the impact was even greater when the analysis was limited to non-gastrointestinal cancers (hazard ratio = 2.49, 95% CI = 1.16–5.30). Median survival was 876 days shorter in cases with high-Var3 in normal tissue in pancancer analysis, and 921 days shorter in non-gastrointestinal cancers with high-Var3 (Supplementary Figure 14, available at Carcinogenesis Online).

Discussion

We performed a transcriptome-wide association study in 8891 tumor samples of 32 types in order to screen for genes whose expression levels were associated with in-frame indels, the single most important mutational variant predicting poor cancer outcome. This identified 16 genes whose expression levels were correlated with the number of in-frame indels both in tumor samples and tumor types. We further narrowed these down to four candidate genes, APOBEC1, BCL2L15, FOXL1 and PDX1, by excluding genes whose products lacked a nuclear distribution. Methylation analysis of these candidate genes across 8891 tumor samples identified APOBEC1 as the only gene with methylation status inversely correlated with in-frame indels. A pro-apoptotic role has been shown for BCL2L15 (19), and both FOXL1 and PDX1 are known transcriptional regulators (17,18); none of them has been reported to be mutagenic. APOBEC1 was next confirmed to be overexpressed in primary non-gastrointestinal tumors. In addition, the magnitude of the APOBEC1 impact on indels appeared to be comparable to the impact of APOBEC3B (A3B) on C>G/T variants (8). The absence of other APOBEC/AID family members among the differentially expressed genes excluded the possibility that APOBEC1 was merely co-expressed with another, truly implicated family member such as APOBEC3B or APOBEC3A.
Two mutational signatures have been attributed to the APOBEC/AID family (1), particularly APOBEC3B (8,9), APOBEC3A (9,10) and APOBEC3H (11). However, the involvement in cancer mutations of other family members with nuclear distribution, including APOBEC1 (23), APOBEC3C (24) and APOBEC3F (25), remains to be investigated. Activation-induced deaminase (AID), which is normally involved in antibody diversification through DNA editing, has long been known for its tumorigenic impact in B-cell leukemia and lymphoma through off-target genomic deamination and induction of chromosomal translocations (26). While APOBEC1 was first recognized as deaminating cytidine in certain mRNAs (27), it was later found to be a DNA mutator as well (28,29). Sustained APOBEC1 overexpression in mouse liver was shown to induce hepatocellular carcinoma (30), and examination of three mRNA editing sites across 28 tumor types essentially excluded mRNA editing at known target sites as the cause of APOBEC1 carcinogenesis (31,32). Besides some clonal indels we observed, APOBEC1 overexpression causes a mutational signature compatible with that seen in esophageal adenocarcinoma (21). In addition, APOBEC1 deficiency (APOBEC1−/−) dramatically decreases the formation of intestinal adenomas in APCmin/+ mice. These findings suggest an important role for APOBEC1 mutagenesis in gastrointestinal cancers (33).

The functions of AID and APOBEC1 are comparable in many respects. AID causes DNA cytidine deamination leading to somatic hypermutations, which are associated with indels in up to 6% of mutational events (34,35), and the resultant in-frame indels increase immunoglobulin diversity and potency (36,37). Both AID and APOBEC1 deaminate methyl/hydroxymethyl cytidine (5mC/5hmC) in single-stranded DNA (38), and APOBEC1 is the most potent family member deaminating 5hmC in brain cell DNA (39).
Furthermore, both AICDA and APOBEC1 are located in a cluster of pluripotency genes including NANOG and DPPA3 (developmental pluripotency associated 3), and are co-expressed in oocytes and embryonic germ/stem cells (38). A demethylating role has also been suggested for APOBEC1 in testicular carcinoma (22), and APOBEC1 deficiency reduces the risk of mouse testicular germ cell tumors (20).

We showed that the correlation of in-frame indels with high-APOBEC1 was independent of HR/MMR mutational status, the factors proposed to be implicated in the indel mutational signatures. This is compatible with a role in the causation rather than the repair of in-frame indels. The impact of high-APOBEC1 was also reflected in its pancancer prognostic capacity, which was independent of HR/MMR mutational status. We also identified triplet repeat motifs flanking in-frame indel sites, which might serve as DNA binding sites for the implicated protein(s), as occurs in zinc-finger proteins including the APOBEC family (40). Intriguingly, one consensus triplet motif [(GGA)4] has been reported to form a G-quadruplex (G4) in the MYB promoter (41), with partial deletion giving rise to MYB transcriptional activation and complete deletion leading to MYB repression. G4 motifs have been shown to form DNA binding sites for zinc-finger proteins (42).

High-APOBEC1 expression levels were found to be correlated with adverse prognosis in five tumor types: adrenocortical carcinoma, uterine endometrial carcinoma, pancreatic adenocarcinoma, mesothelioma and thyroid carcinoma. The fact that high-APOBEC1 showed a significant, albeit weaker, prognostic impact after exclusion of these tumors indicated that its impact was global and not limited to these five tumor types. We also explored APOBEC1 splicing variants in a search for more precise molecular explanations, identifying Var3 as dominantly expressed in nearly all tumors studied, with the same indel and clinical associations as APOBEC1.
Var3 constitutes about half of the APOBEC1 mRNA in adult small intestine as compared to nearly 90% in fetal small intestine (43), indicating a more important role in stem/progenitor cells. Intriguingly, it is reported that alternative initiation events might occur in response to DNA damage (44). The resultant isoform b lacks the 45 N-terminal amino acids, which are involved in nuclear localization of A1CF (27) and known to be essential for mRNA editing (45). This, in addition to the lack of APOBEC1-A1CF coexpression in high-IF indel states, might impair mRNA editing activity, probably contributing to adverse prognosis in these states. Increased Var3 expression seems to be an early high-impact event in mutagenesis/carcinogenesis, because it was found to be predictive of poor outcome even in paired normal tissues of cancer patients. The fact that APOBEC1 (46) and other family members including APOBEC3A (47), APOBEC3F (48,49) and APOBEC3G (48,50) exist as dimers/multimers might explain how the shorter isoform interferes with the mRNA editing functions of the full-length one (51).

The promutagenic impact of APOBEC1 in a pluripotency context (38) may explain its carcinogenic role. Moreover, alterations in the length of repetitive DNA are known to create diversity (52), and APOBEC1 can be predicted to enhance genomic variation by introducing in-frame indels in normal pluripotent cells, as AID does in the immunoglobulin locus (37). In-frame indels are more likely than frame-shift ones to cause gain of function, and it was therefore not surprising that they occurred more frequently in oncogenes than in tumor suppressor genes. Furthermore, tumors with frame-shift indels unaffected by nonsense-mediated decay are likely to make neoantigens (53), potentially recruiting tumor-infiltrating lymphocytes, which effectively kill tumor cells (54). This might explain why frame-shift indels show less adverse clinical impact than in-frame ones.
The programmed death-1 (PD-1) pathway, which suppresses many immune cells including tumor-infiltrating lymphocytes (55), is upregulated in many tumors and their microenvironments, and its blockade has been successfully attempted in the treatment of several types of tumors, including melanoma, bladder cancer and small-cell lung cancer (56). A recent study shows that MMR deficiency identifies those metastatic colorectal cancer patients who would benefit from PD-1 blockade (57). However, MMR deficiency was not seen to predict successful PD-1 blockade in non-colorectal tumors, and it would be intriguing to test whether high-APOBEC1 can be helpful in this regard.

In summary, pancancer prognostic analysis identified in-frame indels as the most important cancer variants affecting patients’ survival. Transcriptome-wide expression and methylation studies identified APOBEC1 as the only candidate gene whose expression and methylation levels both correlated with the number of in-frame indels, with splicing variant 3 being the main contributor to the APOBEC1 impact. This indel impact was found to be independent of the DNA repair mechanisms that have been proposed to be implicated in cancer indels. Unlike frame-shift indels, in-frame indels were found to have consensus triplet repeats at their sites. Expression levels of both APOBEC1 and variant 3 were found to predict pancancer outcome. In particular, the surprisingly high prognostic value of high-APOBEC1/Var3 in cancers such as pancreatic and thyroid carcinoma, as well as mesothelioma, promises the development of a novel predictive marker in these particular cancers. Moreover, a high expression level of variant 3 in paired normal tissues was found to predict cancer outcome, suggesting that it is an early mutagenic/carcinogenic event and paving the way for its use as a novel prognostic marker in normal tissues.
Our findings propose APOBEC1 and its isoform b as the endogenous mutators implicated in cancer in-frame indels, warranting extensive studies in order to validate both mutagenic and prognostic impacts of the high-APOBEC1 in various cancers.

Supplementary material

Supplementary material can be found at Carcinogenesis online.

Funding

This study was supported by a research grant from Digestive Disease Research Institute, Tehran University of Medical Sciences, Tehran, Iran.

Abbreviations

CI: confidence interval
FDR: false discovery rate
HR: homologous recombination
MMR: mismatch repair
OS: overall survival
SNV: single-nucleotide variant

Acknowledgements

We gratefully acknowledge contributions from TCGA Research Network. We would like to thank Dr. James McKay (Genetic Cancer Susceptibility Group, IARC, WHO, Lyon, France), Dr. Yasmin Reyal (Department of Haematology, University College London Hospitals NHS Trust, London, United Kingdom), Dr. Hossein Poustchi (Digestive Disease Research Institute, Tehran University of Medical Sciences, Tehran, Iran) and Dr. Sadaf Sepanlou Ghajar (Digestive Disease Research Institute, Tehran University of Medical Sciences, Tehran, Iran) for their comments.

Conflict of Interest Statement: None declared.

References

1. Alexandrov, L.B. et al.; Australian Pancreatic Cancer Genome Initiative; ICGC Breast Cancer Consortium; ICGC MMML-Seq Consortium; ICGC PedBrain. (2013) Signatures of mutational processes in human cancer. Nature, 500, 415–421.
2. Turnpenny, P.D. et al. (2012) Emery’s elements of medical genetics. Elsevier/Churchill Livingstone, Philadelphia, PA.
3. Sia, E.A. et al. (1997) Microsatellite instability in yeast: dependence on repeat unit size and DNA mismatch repair genes. Mol. Cell. Biol., 17, 2851–2858.
4. Kandoth, C. et al. (2013) Integrated genomic characterization of endometrial carcinoma. Nature, 497, 67–73.
5. TCGA (2014) Comprehensive molecular characterization of gastric adenocarcinoma. Nature, 513, 202–9.
6. Nakata, B. et al. (2002) Prognostic value of microsatellite instability in resectable pancreatic cancer. Clin. Cancer Res., 8, 2536–2540.
7. TCGA (2012) Comprehensive molecular characterization of human colon and rectal cancer. Nature, 487, 330–7.
8. Burns, M.B. et al. (2013) Evidence for APOBEC3B mutagenesis in multiple human cancers. Nat. Genet., 45, 977–983.
9. Nik-Zainal, S. et al. (2014) Association of a germline copy number polymorphism of APOBEC3A and APOBEC3B with burden of putative APOBEC-dependent mutations in breast cancer. Nat. Genet., 46, 487–491.
10. Chan, K. et al. (2015) An APOBEC3A hypermutation signature is distinguishable from the signature of background mutagenesis by APOBEC3B in human cancers. Nat. Genet., 47, 1067–1072.
11. Starrett, G.J. et al. (2016) The DNA cytosine deaminase APOBEC3H haplotype I likely contributes to breast and lung cancer mutagenesis. Nat. Commun., 7, 12918.
12. Futreal, P.A. et al. (2004) A census of human cancer genes. Nat. Rev. Cancer, 4, 177–183.
13. Bastolla, U. (2007) Structural approaches to sequence evolution: molecules, networks, populations. Springer, Berlin, New York.
14. Crooks, G.E. et al. (2004) WebLogo: a sequence logo generator. Genome Res., 14, 1188–1190.
15. Schwartz, D. et al. (2005) An iterative statistical approach to the identification of protein phosphorylation motifs from large-scale data sets. Nat. Biotechnol., 23, 1391–1398.
16. Nik-Zainal, S. et al.; Breast Cancer Working Group of the International Cancer Genome Consortium. (2012) Mutational processes molding the genomes of 21 breast cancers. Cell, 149, 979–993.
17. Perreault, N. et al. (2001) Foxl1 controls the Wnt/beta-catenin pathway by modulating the expression of proteoglycans in the gut. J. Biol. Chem., 276, 43328–43333.
18. Oliver-Krasinski, J.M. et al. (2009) The diabetes gene Pdx1 regulates the transcriptional network of pancreatic endocrine progenitor cells in mice. J. Clin. Invest., 119, 1888–1898.
19. Dempsey, C.E. et al. (2005) Expression of pro-apoptotic Bfk isoforms reduces during malignant transformation in the human gastrointestinal tract. FEBS Lett., 579, 3646–3650.
20. Nelson, V.R. et al. (2012) Transgenerational epigenetic effects of the Apobec1 cytidine deaminase deficiency on testicular germ cell tumor susceptibility and embryonic viability. Proc. Natl. Acad. Sci. USA, 109, E2766–E2773.
21. Saraconi, G. et al. (2014) The RNA editing enzyme APOBEC1 induces somatic mutations and a compatible mutational signature is present in esophageal adenocarcinomas. Genome Biol., 15, 417.
22. Kristensen, D.G. et al. (2014) Evidence that active demethylation mechanisms maintain the genome of carcinoma in situ cells hypomethylated in the adult testis. Br. J. Cancer, 110, 668–678.
23. Lau, P.P. et al. (1991) Apolipoprotein B mRNA editing is an intranuclear event that occurs posttranscriptionally coincident with splicing and polyadenylation. J. Biol. Chem., 266, 20550–20554.
24. Bogerd, H.P. et al. (2006) Cellular inhibitors of long interspersed element 1 and Alu retrotransposition. Proc. Natl. Acad. Sci. USA, 103, 8780–8785.
25. Burdick, R.C. et al. (2013) Nuclear import of APOBEC3F-labeled HIV-1 preintegration complexes. Proc. Natl. Acad. Sci. USA, 110, E4780–E4789.
26. Casellas, R. et al. (2016) Mutations, kataegis and translocations in B cells: understanding AID promiscuous activity. Nat. Rev. Immunol., 16, 164–176.
27. Teng, B. et al. (1993) Molecular cloning of an apolipoprotein B messenger RNA editing protein. Science, 260, 1816–1819.
28. Harris, R.S. et al. (2002) RNA editing enzyme APOBEC1 and some of its homologs can act as DNA mutators. Mol. Cell, 10, 1247–1253.
29. Conticello, S.G. (2012) Creative deaminases, self-inflicted damage, and genome evolution. Ann. N. Y. Acad. Sci., 1267, 79–85.
30. Yamanaka, S. et al. (1995) Apolipoprotein B mRNA-editing protein induces hepatocellular carcinoma and dysplasia in transgenic animals. Proc. Natl. Acad. Sci. USA, 92, 8483–8487.
31. Yamanaka, S. et al. (1997) A novel translational repressor mRNA is edited extensively in livers containing tumors caused by the transgene expression of the apoB mRNA-editing enzyme. Genes Dev., 11, 321–333.
32. Greeve, J. et al. (1999) Absence of APOBEC-1 mediated mRNA editing in human carcinomas. Oncogene, 18, 6357–6366.
33. Blanc, V. et al. (2007) Deletion of the AU-rich RNA binding protein Apobec-1 reduces intestinal tumor burden in Apc(min) mice. Cancer Res., 67, 8565–8573.
34. Wilson, P.C. et al. (1998) Somatic hypermutation introduces insertions and deletions into immunoglobulin V genes. J. Exp. Med., 187, 59–70.
35. Goossens, T. et al. (1998) Frequent occurrence of deletions and duplications during somatic hypermutation: implications for oncogene translocations and heavy chain disease. Proc. Natl. Acad. Sci. USA, 95, 2463–2468.
36. Chaudhuri, J. et al. (2003) Transcription-targeted DNA deamination by the AID antibody diversification enzyme. Nature, 422, 726–730.
37. Walker, L.M. et al.; Protocol G Principal Investigators. (2011) Broad neutralization coverage of HIV by multiple highly potent antibodies. Nature, 477, 466–470.
38. Morgan, H.D. et al. (2004) Activation-induced cytidine deaminase deaminates 5-methylcytosine in DNA and is expressed in pluripotent tissues: implications for epigenetic reprogramming. J. Biol. Chem., 279, 52353–52360.
39. Guo, J.U. et al. (2011) Hydroxylation of 5-methylcytosine by TET1 promotes active DNA demethylation in the adult brain. Cell, 145, 423–434.
40. Jarmuz, A. et al. (2002) An anthropoid-specific locus of orphan C to U RNA-editing enzymes on chromosome 22. Genomics, 79, 285–296.
41. Palumbo, S.L. et al. (2008) A novel G-quadruplex-forming GGA repeat region in the c-myb promoter is a critical regulator of promoter activity. Nucleic Acids Res., 36, 1755–1769.
42. Chariker, J.H. et al. (2016) Computational analysis of G-quadruplex forming sequences across chromosomes reveals high density patterns near the terminal ends. PLoS One, 11, e0165101.
43. Hirano, K. et al. (1997) Characterization of the human apobec-1 gene: expression in gastrointestinal tissues determined by alternative splicing with production of a novel truncated peptide. J. Lipid Res., 38, 847–859.
44. Sprung, C.N. et al. (2011) Alternative transcript initiation and splicing as a response to DNA damage. PLoS One, 6, e25758.
45. Teng, B.B. et al. (1999) Mutational analysis of apolipoprotein B mRNA editing enzyme (APOBEC1). structure-function relationships of RNA editing and dimerization. J. Lipid Res., 40, 623–635.
46. Lau, P.P. et al. (1994) Dimeric structure of a human apolipoprotein B mRNA editing protein and cloning and chromosomal localization of its gene. Proc. Natl. Acad. Sci. USA, 91, 8522–8526.
47. Bohn, M.F. et al. (2015) The ssDNA mutator APOBEC3A is regulated by cooperative dimerization. Structure, 23, 903–911.
48. Wiegand, H.L. et al. (2004) A second human antiretroviral factor, APOBEC3F, is suppressed by the HIV-1 and HIV-2 Vif proteins. EMBO J., 23, 2451–2458.
49. Ara, A. et al. (2014) Different mutagenic potential of HIV-1 restriction factors APOBEC3G and APOBEC3F is determined by distinct single-stranded DNA scanning mechanisms. PLoS Pathog., 10, e1004024.
50. Wedekind, J.E. et al. (2006) Nanostructures of APOBEC3G support a hierarchical assembly model of high molecular mass ribonucleoprotein particles from dimeric subunits. J. Biol. Chem., 281, 38122–38126.
51. Anant, S. et al. (2001) ARCD-1, an apobec-1-related cytidine deaminase, exerts a dominant negative effect on C to U RNA editing. Am. J. Physiol. Cell Physiol., 281, C1904–C1916.
52. Slattery, J.P. et al. (2000) Patterns of diversity among SINE elements isolated from three Y-chromosome genes in carnivores. Mol. Biol. Evol., 17, 825–829.
53. Saeterdal, I. et al. (2001) Frameshift-mutation-derived peptides as tumor-specific antigens in inherited and spontaneous colorectal cancer. Proc. Natl. Acad. Sci. USA, 98, 13255–13260.
54. Westdorp, H. et al. (2016) Opportunities for immunotherapy in microsatellite instable colorectal cancer. Cancer Immunol. Immunother., 65, 1249–1259.
55. Keir, M.E. et al. (2008) PD-1 and its ligands in tolerance and immunity. Annu. Rev. Immunol., 26, 677–704.
56. Brahmer, J.R. et al. (2012) Safety and activity of anti-PD-L1 antibody in patients with advanced cancer. N. Engl. J. Med., 366, 2455–2465.
57. Le, D.T. et al. (2015) PD-1 blockade in tumors with mismatch-repair deficiency. N. Engl. J. Med., 372, 2509–2520.

© The Author(s) 2018. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

Carcinogenesis, Oxford University Press

Publisher: Oxford University Press
ISSN: 0143-3334
eISSN: 1460-2180
DOI: 10.1093/carcin/bgy005

Materials and methods

Patients and samples

The results shown here are based upon data generated by the TCGA Research Network: http://cancergenome.nih.gov/. The complete lists of somatic mutations identified by whole-exome sequencing of 8891 paired tumor-normal samples were obtained from the TCGA Data Portal as .maf files (Level 2, v.20151009), comprising 32 tumor types from 26 primary sites (Supplementary Table 1, available at Carcinogenesis Online). These data had been collected from a total of 105 sequencing runs performed by seven participating centers. A total of 2 458 352 duplicate mutations identified by multiple sequencing runs were removed, leaving 2 338 886 unique variants. The following open-access (Level 3) datasets were obtained from the Broad Institute GDAC Firehose (http://gdac.broadinstitute.org/): normalized RNA-Seq by Expectation-Maximization (RSEM) data for genes (RSEM_genes_normalized_data, v.20160128) and isoforms (RSEM_isoforms_normalized_data, v.20160128), gene methylation data (meth.by_min_expr_corr.data, v.20160128) and clinical data including age, sex, vital status and days to death/last follow-up (Clinical_Pick_Tier1, v.20160128). The normalized RSEM data from RNA sequencing were used as an estimate of TCGA gene expression profiles, and they were further normalized to TBP (TATA-binding protein) expression levels. Indels longer than one nucleotide were classified based on their 5ʹ-end nucleotide. The lists of HR and MMR genes were obtained from the Kyoto Encyclopedia of Genes and Genomes (KEGG maps ko03440 and ko03430, respectively), and all exome variants except silent ones were included in the analysis of implicated DNA repair pathways. The lists of oncogenes and tumor suppressor genes were obtained from a census of human cancer genes (12) and the COSMIC supplemental analysis information (http://cancer.sanger.ac.uk/census#cl_analysis).
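As a hedged illustration of the indel classification described above, the following Python sketch labels an indel as in-frame (IF) when its length is a multiple of three and groups it by its 5ʹ-end nucleotide. It is not the study's pipeline: the function name `classify_indel`, the use of "-" for the empty allele (as commonly seen in MAF-style records) and the "FS-" prefix for frame-shift indels are assumptions made for this example.

```python
def classify_indel(ref, alt):
    """Label an indel such as 'IF-DelC/G' from its reference and variant alleles.

    Assumption: the empty allele is written as '-', so an insertion has
    ref == '-' and a deletion has alt == '-'.
    """
    if ref == "-":
        kind, seq = "Ins", alt
    elif alt == "-":
        kind, seq = "Del", ref
    else:
        raise ValueError("not a simple insertion or deletion")

    # A length that is a multiple of three keeps the reading frame.
    frame = "IF" if len(seq) % 3 == 0 else "FS"
    # Grouping by the 5'-end nucleotide of the inserted or deleted sequence.
    base_class = "A/T" if seq[0].upper() in ("A", "T") else "C/G"
    return f"{frame}-{kind}{base_class}"


print(classify_indel("GGA", "-"))     # IF-DelC/G
print(classify_indel("-", "A"))       # FS-InsA/T
print(classify_indel("-", "TTCTTC"))  # IF-InsA/T
```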
Statistical analysis

Pearson correlation analysis (IBM SPSS, v.22) was used to assess the correlation of the various indel classes with each other and to study the correlation of the number of in-frame indel variants in each tumor exome with the TBP-normalized expression levels of 30 762 genes/gene variants, as well as APOBEC1 variants. Pearson correlation analysis was also conducted to examine the correlation of the mean number of in-frame indels in each tumor type with the mean normalized expression levels of 30 762 genes/gene variants, as well as APOBEC1 variants, and to assess the correlation of the number of in-frame indel variants in each tumor exome with the methylation levels of candidate genes. The Benjamini–Hochberg false discovery rate (FDR) was used to adjust for multiple testing in Pearson correlation analyses, with an FDR of up to 0.05 considered acceptable. Multivariate linear regression analysis (IBM SPSS, v.22) was performed to assess the impact of multiple covariates on the number of in-frame indels, including high-APOBEC1 status, HR-mutant status, MMR-mutant status and age. The significance and extent of enrichment of in-frame indels in oncogenes as compared to tumor suppressor genes were tested using the two-sided Mantel–Haenszel test (IBM SPSS, v.22). The two-sided Mantel–Haenszel test was also used to test the significance and extent of enrichment of the homologous triplet units among in-frame indels and their contexts. The unpaired two-tailed Student’s t-test (IBM SPSS, v.22) was used to compare the mean number of indels between low-APOBEC1 and high-APOBEC1, HR-intact and HR-mutant, and MMR-intact and MMR-mutant samples, and between tumor types with and without indel signatures, as well as in-frame indel counts among various groups. The paired two-tailed Student’s t-test (IBM SPSS, v.22) was used to compare the mean APOBEC1 expression levels between tumor and normal samples.
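The two core steps of the screen described above, a Pearson correlation per gene followed by Benjamini–Hochberg adjustment across genes, can be sketched in a few lines of Python. The study itself used IBM SPSS; the code below is only an illustrative re-statement of the standard formulas, and the function names and all input numbers are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def benjamini_hochberg(pvalues):
    """Return BH-adjusted P values (q values) in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-sample values: in-frame indel counts and one gene's
# TBP-normalized expression.
indel_counts = [0, 1, 1, 3, 6]
expression = [0.0, 0.2, 0.1, 0.9, 1.4]
r = pearson_r(indel_counts, expression)

# Hypothetical raw P values for four genes; keep those with FDR <= 0.05.
raw_p = [0.01, 0.04, 0.03, 0.20]
q = benjamini_hochberg(raw_p)
significant = [value <= 0.05 for value in q]
print(round(r, 3), q, significant)
```

With these four made-up P values only the first gene passes the 0.05 FDR threshold, which shows how the adjustment can exclude correlations that look significant before correction.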
Gene-E was used to show the correlation heat map of the various indels, and GraphPad Prism (v.5.0.1) to illustrate the correlations of gene expression levels with indel numbers. The R (v.3.1.1) package ‘seqinr’ (v.3.1.3) (13) was used to identify the genomic context of each indel site, covering genomic positions −30 to +30. The R package ‘ClusteredMutations’ (v.1.0.1) was used to plot exome inter-mutation distance against genomic position. Indel kataegis events were defined as clusters of at least two indels of the same type occurring within a distance of 1000 bases. WebLogo (v.3.4) (14) was used to show the nucleotide probability logos within the indel context, as well as to illustrate the triplet repeat motifs. Motif-x (v.1.2) (15) was used to discover significantly enriched context motifs and their enrichment between positions −12 and +12. The genomic contexts of the corresponding frame-shift indels from all studied exomes were used as the background, with a minimum acceptable significance of 10^−10.

Prognostic analysis

Overall survival (OS) was measured from the date of patient enrolment to the date of death or the date last known alive. Cox regression analysis (IBM SPSS, v.22) was used for univariate survival analysis with the numbers of six mutational classes (SNVs, double-nucleotide variants, frame-shift deletions, frame-shift insertions, in-frame deletions and in-frame insertions) in all cancer cases as the covariates. Cox regression analysis was also used for univariate survival analysis with high-APOBEC1 or high-Var3 status in all cancer patients as the covariate, with age, sex, and HR- and MMR-mutant status considered in multivariate analysis. Kaplan–Meier analysis was performed using the Mantel–Cox statistic (IBM SPSS, v.22) to test the equality of survival distributions between the low-APOBEC1 (APOBEC1/TBP = 0) and high-APOBEC1 (APOBEC1/TBP > 0) classes, as well as between the low-Var3 (Var3/TBP = 0) and high-Var3 (Var3/TBP > 0) classes, on a pancancer basis.
Similar analyses were conducted in each tumor type.

Results

The prevalence of various mutations in human cancers

Assessment of 8891 tumor exomes of 32 types from The Cancer Genome Atlas (TCGA; Supplementary Table 1, available at Carcinogenesis Online) identified 2 338 886 unique variants, of which 192 075 (8.2%) were indels. Indels longer than one nucleotide were classified based on their 5ʹ-end nucleotide. Single-nucleotide indels constituted just over two thirds (67.8%) of all indels, followed by trinucleotide indels (12.8%) (Supplementary Figure 1A, available at Carcinogenesis Online). The proportion of indels among total variants varied from 1.3% in skin melanoma to 29.1% in pancreatic adenocarcinoma (Supplementary Figure 1B, available at Carcinogenesis Online). About 16.2% of indels occurred at known genomic polymorphic sites (dbVar). Deletions were more prevalent than insertions in all cancers except kidney clear cell carcinoma and acute myeloid leukemia (Supplementary Figure 1C, available at Carcinogenesis Online). About 21.1% of all indels were in-frame deletions or insertions, varying from 5.7% in hepatocellular carcinoma to 41.6% in pancreatic adenocarcinoma (Supplementary Figure 1D, available at Carcinogenesis Online). Indel events were found to be correlated with each other, most strongly DelA/T and DelC/G (β coefficient = 0.94) (Supplementary Figure 2A, available at Carcinogenesis Online). This was also the case for in-frame indels, with the highest correlation between IF-DelA/T and IF-DelC/G (β coefficient = 0.90) (Supplementary Figure 2B, available at Carcinogenesis Online). Closely clustered mutations (kataegis) are assumed to be caused by the same molecular mechanism (16), and analysis of kataegis plots revealed much smaller insertion clusters compared to deletion ones.
Specifically, the mean distances between clustered DelA/T and DelC/G events were found to be 111 and 149 nucleotides, respectively (Supplementary Figure 2C, available at Carcinogenesis Online), while they were 7 and 25 nucleotides for clustered InsA/T and InsC/G events, respectively (Supplementary Figure 2D, available at Carcinogenesis Online). These findings suggested shared underlying mechanisms implicated in deletions and insertions. Survival analysis identified the in-frame deletion (IFD) and in-frame insertion (IFI) classes as the most deleterious variants, followed by the frame-shift deletion class (Figure 1). Overall, 32 838 in-frame indels (1.4% of all variants) predicted adverse overall survival (OS) much more significantly [hazard ratio = 1.011, 95% confidence interval (CI): 1.008–1.013, P = 2 × 10^−16] than frame-shift ones (hazard ratio = 1.001, 95% CI: 1.000–1.001, P = 0.026) and other variant classes, suggesting them as the most important class among cancer driver mutations. Assessment of the indel-affected genes for functional roles showed the proportion of in-frame indels to be about 2.3 times higher in oncogenes than in tumor suppressor genes (Supplementary Figure 3, available at Carcinogenesis Online).

Figure 1. Prognostic impact of various substitution and indel variant classes. Cox regression analysis identified the impact (hazard ratio) of each variant class on OS. In-frame indels including IFDs (P = 2 × 10^−18) and IFIs (P = 0.044), as well as FSDs (P = 0.001), significantly predicted OS. Bars indicate the 95% confidence intervals. DNV, dinucleotide variant; FSD, frame-shift deletion; FSI, frame-shift insertion; IFD, in-frame deletion; IFI, in-frame insertion; SNV, single-nucleotide variant.

APOBEC1 expression level as the optimal predictor of indel variants

The expression profiles of 30 762 genes and gene variants were screened for correlation with the number of exome in-frame indels, showing 1237 differentially expressed genes (844 overexpressed and 393 underexpressed) in tumor samples with frequent in-frame indels (Supplementary Table 2, available at Carcinogenesis Online). Likewise, 39 genes were found to be differentially expressed (all overexpressed) in tumor types with a high mean number of in-frame indels (Supplementary Table 3, available at Carcinogenesis Online). Overall, 16 genes were overexpressed among both tumor samples and tumor types with frequent in-frame indels (Table 1). Gene ontology assessment identified four overexpressed genes, APOBEC1, BCL2L15 (BCL2 like 15), FOXL1 (forkhead box L1) and PDX1 (pancreatic and duodenal homeobox 1), whose products were localized to the nucleus, marking them as candidate genes potentially implicated in in-frame indels. The gene products of the latter two are known as transcriptional regulators (17,18), and a pro-apoptotic function has been shown for BCL2L15 (19); none of them has been reported to be mutagenic. APOBEC1, however, is known to modify both mRNA and DNA. Moreover, APOBEC1 was the only gene that was consistently demethylated in tumor samples with a high number of in-frame indels (Supplementary Figure 4, available at Carcinogenesis Online). The workflow of the transcriptome-wide association study is depicted in Supplementary Figure 5, available at Carcinogenesis Online.

Table 1.
The list of differentially expressed genes among both tumor samples and tumor types with a high number of in-frame indels

Tumor exomes
Gene symbol  Entrez ID IF DelA/T       IF DelC/G       IF InsA/T       IF InsC/G
                       β^a    FDR      β      FDR      β      FDR      β      FDR
APOBEC1^b    339       0.179  6e−58    0.188  5e−64    0.065  5e−07    0.106  4e−20
AXDND1       126 859   0.115  2e−22    0.114  5e−22    0.055  8e−05    0.086  2e−12
BCL2L15^b    440 603   0.258  9e−123   0.262  4e−126   0.154  3e−41    0.211  1e−79
ERN2         10 595    0.277  8e−142   0.282  2e−147   0.130  3e−29    0.158  3e−44
FOXL1^b      2300      0.270  1e−134   0.327  2e−201   0.094  5e−15    0.096  7e−16
GCNT3        9245      0.277  1e−141   0.294  3e−160   0.110  9e−21    0.160  3e−45
INS          3630      0.252  7e−105   0.295  5e−146   0.042  1e−02    0.057  2e−05
KCNK16       83 795    0.168  1e−48    0.199  2e−68    0.034  5e−02    0.035  4e−02
LRRC66       339 977   0.259  3e−123   0.268  7e−133   0.116  3e−23    0.159  7e−45
MTMR11       10 903    0.215  8e−84    0.235  4e−101   0.122  2e−25    0.180  5e−58
PDX1^b       3651      0.284  3e−149   0.285  1e−150   0.109  3e−20    0.176  9e−56
PTGES2-AS1   389 791   0.138  2e−34    0.148  2e−39    0.045  2e−03    0.077  1e−10
REG1A        5967      0.182  8e−60    0.197  5e−70    0.036  3e−02    0.058  3e−06
REG4         83 998    0.159  2e−45    0.154  1e−42    0.047  9e−04    0.085  4e−13
TCN1         6947      0.196  1e−69    0.216  1e−84    0.061  3e−06    0.085  8e−13
VILL         50 853    0.304  2e−172   0.349  3e−230   0.115  9e−23    0.131  5e−30

Tumor types
Gene symbol  Entrez ID IF DelA/T       IF DelC/G       IF InsA/T       IF InsC/G
                       β      FDR      β      FDR      β      FDR      β      FDR
APOBEC1^b    339       0.786  2e−05    0.749  2e−04    0.769  7e−05    0.688  2e−02
AXDND1       126 859   0.663  1e−02    0.624  3e−02    0.688  5e−03    0.754  1e−02
BCL2L15^b    440 603   0.713  9e−04    0.660  6e−03    0.712  1e−03    0.657  4e−02
ERN2         10 595    0.794  1e−05    0.754  1e−04    0.780  4e−05    0.657  4e−02
FOXL1^b      2300      0.946  2e−13    0.950  5e−14    0.896  2e−09    0.631  5e−02
GCNT3        9245      0.846  3e−07    0.815  3e−06    0.801  1e−05    0.628  5e−02
INS          3630      0.979  5e−17    0.993  2e−23    0.918  1e−09    0.666  4e−02
KCNK16       83 795    0.968  4e−16    0.987  6e−22    0.906  1e−09    0.652  3e−02
LRRC66       339 977   0.834  9e−07    0.795  1e−05    0.808  7e−06    0.712  2e−02
MTMR11       10 903    0.648  9e−03    0.612  3e−02    0.649  1e−02    0.674  3e−02
PDX1^b       3651      0.744  2e−04    0.706  1e−03    0.717  8e−04    0.656  3e−02
PTGES2-AS1   389 791   0.718  7e−04    0.684  3e−03    0.676  4e−03    0.639  4e−02
REG1A        5967      0.952  2e−14    0.946  2e−13    0.897  2e−09    0.636  4e−02
REG4         83 998    0.727  5e−04    0.681  3e−03    0.718  8e−04    0.657  4e−02
TCN1         6947      0.953  2e−14    0.952  3e−14    0.914  9e−10    0.649  3e−02
VILL         50 853    0.924  2e−11    0.921  3e−11    0.873  3e−08    0.626  5e−02

Sixteen genes were common to the differentially expressed genes in tumor exomes (1237) and the differentially expressed genes in tumor types (22) with a high number of in-frame (IF) indels (FDR or adjusted Pearson correlation analysis P ≤ 0.05). ^a Pearson correlation β coefficient. ^b Genes whose products are distributed within the nucleus.

We examined 712 available cases with paired normal-tumor samples across 21 cancer types and did not observe a significant difference in the expression levels of APOBEC1 (Student’s t-test P = 0.09). However, after exclusion of the gastrointestinal samples (ESCA, STAD and COADREAD), which showed high normal expression levels of APOBEC1 (median normal APOBEC1 expression level > 0), the mean expression level of APOBEC1 in the remaining 619 samples of 18 tumor types was found to be nearly 36 times higher than in the paired normal samples (Student’s t-test P = 5 × 10^−4) (Supplementary Figure 6, available at Carcinogenesis Online). Further assessments showed that mean APOBEC1 expression levels were 38 times higher in tumors with indel mutational signatures (signatures 1, 3, 6, 15, 20 and 26) than in those with non-indel signatures (Supplementary Figure 7, available at Carcinogenesis Online). As expected, tumors with the highest mean number of in-frame indels (i.e.
pancreas, stomach, colorectum, endometrium and adrenal cortex) tended to show the highest mean APOBEC1 levels (Supplementary Figure 8, available at Carcinogenesis Online). Notably, APOBEC1 was found to be the only APOBEC/AICDA (activation-induced cytidine deaminase) family member with differential expression in high in-frame indel states (Supplementary Tables 2 and 3, available at Carcinogenesis Online). Moreover, A1CF (APOBEC1 complementation factor), which is crucial for APOBEC1 mRNA editing activity, was not found to be differentially expressed in high in-frame indel states (Supplementary Tables 2 and 3, available at Carcinogenesis Online). Reassessing the limited number of available experiments (21) revealed at least one clonal DelA/T variant within the BCR-ABL1 fusion gene upon overexpression of rat APOBEC1 in K562 cells.

Because indel variants are frequent in six mutational signatures associated with age and HR/MMR defects, we tested the possibility that high-APOBEC1 confounded the impact of defective HR/MMR pathways. Mutations of the KEGG (Kyoto Encyclopedia of Genes and Genomes) HR/MMR pathway genes were used as an indicator of HR/MMR deficiency (Supplementary Tables 4 and 5, available at Carcinogenesis Online). Preliminary assessment showed that both the HR-mutant (β coefficient = 0.26, P = 9 × 10^−138) and MMR-mutant (β coefficient = 0.32, P = 4 × 10^−214) indices strongly predicted cancer indels, reflecting the functional status of the HR/MMR pathways well. While in-frame indels were mostly associated with high-APOBEC1 (high-A1) status (Figure 2A), frame-shift indels were mostly associated with MMR mutational status (Figure 2B). A multivariate model identified high-APOBEC1 as the most important factor predicting in-frame indels (Figure 2C), while MMR and HR mutational status were more important for frame-shift indels (Figure 2D).

Figure 2.
The impact of various factors potentially implicated in cancer indels. (A, B) The mean number of indels (±SEM) in each tumor exome as classified by APOBEC1 level and mutational status of HR/MMR genes. All factors are significantly associated with the number of in-frame and frame-shift indels. Multivariate regression analysis shows that in-frame indels are mostly dependent on high-APOBEC1 followed by MMR-mutant status (C), while frame-shift indels are mostly dependent on MMR-mutant followed by HR-mutant status (D). Age does not appear to have any independent impact on cancer indels. Bars indicate the 95% confidence intervals. Unpaired two-tailed Student’s t-test ***P < 10^−10. β, Pearson correlation coefficient.

Triplet repeat motifs correlated with indel variants

We next analyzed the genomic contexts of 192 075 indel variants and found that the nucleotide A was significantly enriched at essentially all positions in IndelA/T variants, while C was enriched at nearly all positions in IndelC/G variants (Figure 3A–D). Analysis of the 33 013 in-frame indel variants, however, showed a distinct pattern.
The nucleotide A was found to be highly enriched at triplet intervals in IndelA/T, particularly downstream of the indel site (Figure 3E and G), and C was enriched at triplet intervals in IndelC/G (Figure 3F and H). In fact, the genomic context of in-frame indels revealed certain triplet repeat motifs. The consensus motifs were (ASN)4A(GGA)4 in IF-DelA/T, (CNN)4C(TBC)4 in IF-DelC/G, (ANN)4A(NNA)4 in IF-InsA/T, and (CSB)4C(SGC)4 in IF-InsC/G (Figure 3I–L; S = C or G; B = non-A). Analysis of common motifs within this shorter context revealed some motifs with enrichment of several hundred fold. For instance, the A(NGA)4 motif, constituting about 5 and 9% of IF-DelA/T and IF-InsA/T variants, respectively, was found to be 232- and 510-fold enriched (Supplementary Figure 9, available at Carcinogenesis Online). Likewise, the C(NGC)4 motif, comprising about 5 and 10% of IF-DelC/G and IF-InsC/G variants, respectively, was enriched 230 and 853 times. Analysis of the in-frame indel sequences showed them to be homologous to their contexts, with the first triplet of the in-frame indel homologous to the first triplet of the context in 35.1% of events, an approximately 27-fold enrichment (microhomology; two-sided Mantel–Haenszel test P < 10^−300).

Figure 3. The genomic contexts of cancer indels. (A–D) The genomic contexts of 127 885 deletions and 64 190 insertions, showing the probability of each nucleotide between positions −30 and +30 in relation to the indel site. (A) DelA/T. (B) DelC/G. (C) InsA/T. (D) InsC/G. The affected nucleotides A or C appear to be enriched at nearly all positions, particularly downstream of indel sites. One important exception is position −1, which is significantly depleted of the affected nucleotides A or C in both deletions and insertions. (E–L) The genomic context of 27 434 in-frame deletions and 5579 in-frame insertions, showing the probability of each nucleotide between positions −30 and +30 in relation to the indel site, reveals triplet repeat motifs. (E) In-frame (IF-) DelA/T. (F) IF-DelC/G. (G) IF-InsA/T. (H) IF-InsC/G. The affected nucleotide A/C is highly enriched at triplet intervals, particularly downstream of the indel site. (I–L) The consensus motifs identified in in-frame indels between positions −12 and +12, showing four triplet repeats upstream and four triplet repeats downstream of indel sites. These motifs can be summarized as (ASN)4A(GGA)4 for IF-DelA/T (I), (CNN)4C(TBC)4 for IF-DelC/G (J), (ANN)4A(NNA)4 for IF-InsA/T (K), and (CSB)4C(SGC)4 for IF-InsC/G (L) (S = C or G; B = non-A).

APOBEC1 transcript variant 3 showing the same APOBEC1 correlations

We next explored the potential correlation of in-frame indels with particular APOBEC1 splicing variants, including NM_001644, NM_001304566 and NM_005889. The former two, transcript variants 1 and 2 respectively, give rise to the full-length isoform a (NP_001635 and NP_001291495), while the latter, variant 3 (Var3), uses an alternative start codon within exon 2 and yields the shorter isoform b (NP_005880). Var3 was overexpressed in non-gastrointestinal tumor cells (Supplementary Figure 6, available at Carcinogenesis Online) and was the dominantly expressed variant in the vast majority of tumors (Supplementary Figure 10, available at Carcinogenesis Online). The number of all in-frame indel variants per tumor exome was mainly correlated with the expression level of Var3 (Figure 4), and the mean number of in-frame indels correlated with the mean Var3 level in each tumor type (Supplementary Figure 11, available at Carcinogenesis Online).

Figure 4. Correlation of the expression levels of APOBEC1 transcript variants with the number of in-frame indel variants. Analysis of 8891 pancancer samples showed the expression level of variant 3 (Var3) to be the APOBEC1 variant most correlated with the number of in-frame indels, followed by variant 2 (Var2). β: Pearson correlation coefficient.
Prognostic implication of expression levels of APOBEC1 and Var3

Pancancer survival analysis revealed both high-APOBEC1 (hazard ratio = 1.30, 95% CI: 1.19–1.42) and high-Var3 (hazard ratio = 1.35, 95% CI: 1.24–1.48) to be significantly associated with shorter OS (Figure 5), irrespective of age, sex and other APOBEC1 splicing variants. Median survival was 864 days shorter in high-APOBEC1 than in low-APOBEC1 patients, and 977 days shorter in high-Var3 than in low-Var3 patients. Similar analyses performed in individual tumor types demonstrated the prognostic value of high-APOBEC1 and/or high-Var3 in five tumors, including adrenocortical carcinoma, endometrial carcinoma, pancreatic adenocarcinoma, mesothelioma and thyroid carcinoma (Supplementary Figures 12, 13A and B, available at Carcinogenesis Online). However, this did not mean that high-APOBEC1/Var3 were of no prognostic value in the remaining tumors, because they remained significantly prognostic after exclusion of the above five tumors from the pancancer analysis (Supplementary Figure 13C and D, available at Carcinogenesis Online). Intriguingly, the mean IF indel count was significantly higher in the high-APOBEC1/Var3 groups, in parallel with the APOBEC1/Var3 prognostic impact. A survival analysis assessing factors potentially implicated in cancer indels (Supplementary Table 6A, available at Carcinogenesis Online) identified age, male gender, high-APOBEC1 and HR mutations as independent predictors of pancancer outcome, with no remaining impact of MMR mutations (Supplementary Table 6B, available at Carcinogenesis Online). High-Var3 was also found to have a similar independent prognostic impact (Supplementary Table 6C, available at Carcinogenesis Online).

Figure 5. Prognostic assessment of the high expression levels of APOBEC1 and Var3. (A) Cancer cases with high-APOBEC1 (High-A1) showed shorter median survival than those with low-APOBEC1 (2036 versus 2900 days, hazard ratio = 1.30, 95% CI: 1.19–1.42). (B) Patients with high-Var3 showed shorter median survival than those with low-Var3 levels (1933 versus 2910 days, hazard ratio = 1.35, 95% CI: 1.24–1.48). The Mantel–Cox P value is reported for each analysis. The mean IF indel count is shown for each group. Unpaired two-tailed Student’s t-test ***P < 10^−10.

Finally, the fact that high levels of APOBEC1/Var3 in tumor cells were found to be an independent prognostic factor raised the question of whether they were also predictive of poor outcome in paired normal tissues. Examining 712 cases with available normal samples did not show significant prognostic value for high-APOBEC1 levels in paired normal tissues, either in pancancer analysis (21 tumor types available) or among non-gastrointestinal cancer cases. However, a similar analysis using high-Var3 levels demonstrated a high prognostic impact in paired normal tissues of cancer patients in pancancer analysis (hazard ratio = 1.60, 95% CI: 1.08–2.35), and the impact was even higher when the analysis was limited to non-gastrointestinal cancers (hazard ratio = 2.49, 95% CI: 1.16–5.30). Median survival time was 876 days shorter in cases with high-Var3 in normal tissue in pancancer analysis, and 921 days shorter in non-gastrointestinal cancers with high-Var3 (Supplementary Figure 14, available at Carcinogenesis Online).
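The grouping rule and the median survival comparison used in these prognostic analyses can be sketched in Python as follows. The study itself used IBM SPSS; the functions below are hypothetical illustrations of the low/high rule (ratio to TBP equal to or greater than zero) and of the Kaplan–Meier product-limit estimator, and they implement neither the Mantel–Cox test nor Cox regression. The follow-up times in the usage example are invented.

```python
from fractions import Fraction


def expression_class(ratio_to_tbp):
    """Return 'high' when target/TBP > 0 and 'low' when it equals 0."""
    if ratio_to_tbp < 0:
        raise ValueError("expression ratio cannot be negative")
    return "high" if ratio_to_tbp > 0 else "low"


def km_median(times, events):
    """Kaplan-Meier median survival time.

    `times` are follow-up times in days; `events` are 1 for death and 0 for
    censoring (last known alive). Returns the first time at which the
    survival estimate drops to 0.5 or below, or None if it never does.
    Exact fractions avoid floating-point ties at the 0.5 boundary.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = Fraction(1)
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Pool all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            survival *= Fraction(at_risk - deaths, at_risk)
            if survival <= Fraction(1, 2):
                return t
        at_risk -= leaving
    return None


if __name__ == "__main__":
    # Invented follow-up data for two expression classes.
    low = km_median([900, 1500, 2800, 3100], [0, 1, 1, 1])
    high = km_median([400, 700, 1900, 2100], [1, 0, 1, 1])
    print(expression_class(0.0), low, expression_class(0.02), high)
```

Censored patients leave the risk set without lowering the survival estimate, which is why the median can differ markedly from a simple median of follow-up times.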
Discussion

We performed a transcriptome-wide association study in 8891 tumor samples of 32 types to screen for genes whose expression levels were associated with in-frame indels, the single most important mutational variant class predicting poor cancer outcome. This identified 16 genes whose expression levels were correlated with the number of in-frame indels in both tumor samples and tumor types. We further narrowed these down to four candidate genes, APOBEC1, BCL2L15, FOXL1 and PDX1, by excluding genes whose products lacked a nuclear distribution. Methylation analysis of these candidate genes across 8891 tumor samples identified APOBEC1 as the only gene with methylation status inversely correlated with in-frame indels. A pro-apoptotic role has been shown for BCL2L15 (19), and both FOXL1 and PDX1 are known as transcriptional regulators (17,18); none of them has been reported to be mutagenic. APOBEC1 was next confirmed to be overexpressed in primary non-gastrointestinal tumors. Moreover, the magnitude of the APOBEC1 impact on indels appeared to be comparable to the impact of APOBEC3B (A3B) on C>G/T variants (8). The absence of other APOBEC/AID family members among the differentially expressed genes excluded the possibility that APOBEC1 was merely co-expressed with another, truly implicated family member such as APOBEC3B or APOBEC3A.

Two mutational signatures have been attributed to the APOBEC/AID family (1), particularly APOBEC3B (8,9), APOBEC3A (9,10) and APOBEC3H (11). However, the implication in cancer mutations of other family members with nuclear distribution, including APOBEC1 (23), APOBEC3C (24) and APOBEC3F (25), remains to be investigated. Activation-induced deaminase (AID), which is normally involved in antibody diversification through DNA editing, has long been known for its tumorigenic impact in B-cell leukemia and lymphoma through off-target genomic deamination and induction of chromosomal translocations (26).
While APOBEC1 had been recognized to deaminate cytidine in certain mRNAs (27), it was later found to be a DNA mutator as well (28,29). Sustained APOBEC1 overexpression in mouse liver was shown to induce hepatocellular carcinoma (30), and examination of three mRNA editing sites across 28 tumor types essentially excluded mRNA editing at known target sites as the cause of APOBEC1 carcinogenesis (31,32). Besides some clonal indels that we observed, APOBEC1 overexpression causes a mutational signature compatible with that seen in esophageal adenocarcinoma (21). Furthermore, APOBEC1−/− dramatically decreases the formation of intestinal adenomas in APCmin/+ mice. These findings suggest an important role for APOBEC1 mutagenesis in gastrointestinal cancers (33).

The functions of AID and APOBEC1 are comparable in many aspects. AID causes DNA cytidine deamination leading to somatic hypermutations, which are associated with indels in up to 6% of mutational events (34,35), and the resultant in-frame indels increase immunoglobulin diversity and potency (36,37). Both AID and APOBEC1 deaminate methyl/hydroxymethyl cytidine (5mC/5hmC) in single-stranded DNA (38), and APOBEC1 is the most potent family member deaminating 5hmC in brain cell DNA (39). Furthermore, both AICDA and APOBEC1 are located in a cluster of pluripotency genes including NANOG and DPPA3 (developmental pluripotency associated 3) and are co-expressed in oocytes and embryonic germ/stem cells (38). A demethylating role has also been suggested for APOBEC1 in testicular carcinoma (22), and APOBEC1 deficiency reduces the risk of mouse testicular germ cell tumors (20).

We showed that the correlation of in-frame indels with high-APOBEC1 was independent of HR/MMR mutational status, the factors proposed to be implicated in the indel mutational signatures. This is compatible with a role in the causation rather than the repair of in-frame indels.
The impact of high-APOBEC1 was also reflected in its pancancer prognostic capacity, independent of HR/MMR mutational status. We also identified triplet repeat motifs flanking in-frame indel sites, which might serve as DNA binding sites for the implicated protein(s), as occurs with zinc-finger proteins including the APOBEC family (40). Intriguingly, one consensus triplet motif, (GGA)4, has been reported to form a G-quadruplex (G4) in the MYB promoter (41), with partial deletion giving rise to MYB transcriptional activation and complete deletion leading to MYB repression. G4 motifs have been shown to form DNA binding sites for zinc-finger proteins (42).

High-APOBEC1 expression levels were found to be correlated with adverse prognosis in five tumors, including adrenocortical carcinoma, uterine endometrial carcinoma, pancreatic adenocarcinoma, mesothelioma and thyroid carcinoma. The fact that high-APOBEC1 retained a significant, albeit smaller, prognostic impact after exclusion of these tumors indicated that its impact was global and not limited to these five tumors. We also explored APOBEC1 splicing variants in search of more precise molecular explanations, identifying Var3 as dominantly expressed in nearly all tumors studied, with the same indel and clinical associations as APOBEC1. Var3 constitutes about half of the APOBEC1 mRNA in adult small intestine compared to nearly 90% in fetal small intestine (43), indicating a more important role in stem/progenitor cells. Intriguingly, alternative initiation events have been reported to occur in response to DNA damage (44). The resultant isoform b lacks the N-terminal 45 amino acids, which are involved in nuclear localization of A1CF (27) and known to be essential for mRNA editing (45). This, in addition to the lack of APOBEC1-A1CF coexpression in high-IF indel states, might impair mRNA editing activity, probably contributing to the adverse prognosis in these states.
Increased Var3 expression seems to be an early, high-impact event in mutagenesis/carcinogenesis, because it was predictive of poor outcome even in paired normal tissues of cancer patients. The fact that APOBEC1 (46) and other family members including APOBEC3A (47), APOBEC3F (48,49) and APOBEC3G (48,50) exist as dimers/multimers might explain how the shorter isoform interferes with the mRNA editing functions of the full-length one (51). The promutagenic impact of APOBEC1 in a pluripotency context (38) may explain its carcinogenic role. Moreover, alterations in the length of repetitive DNA are known to create diversity (52), and APOBEC1 can be predicted to enhance genomic variation by introducing in-frame indels in normal pluripotent cells, as AID does in the immunoglobulin locus (37). In-frame indels are more likely than frame-shift ones to cause gain of function, and it was therefore not surprising that they occurred more frequently in oncogenes than in tumor suppressor genes. Furthermore, tumors with frame-shift indels unaffected by nonsense-mediated decay are likely to make neoantigens (53), potentially recruiting tumor-infiltrating lymphocytes, which effectively kill tumor cells (54). This might explain why frame-shift indels show less adverse clinical impact than in-frame ones. The programmed death-1 (PD-1) pathway, which suppresses many immune cells including tumor-infiltrating lymphocytes (55), is upregulated in many tumors and their microenvironments, and its blockade has been successfully attempted in the treatment of several types of tumors, including melanoma, bladder cancer and small-cell lung cancer (56). A recent study shows that MMR deficiency identifies those patients with metastatic colorectal cancer who would benefit from PD-1 blockade (57). However, MMR deficiency was not seen to predict successful PD-1 blockade in non-colorectal tumors, and it would be intriguing to test whether high-APOBEC1 can be helpful in this regard.
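The distinction between in-frame and frame-shift indels drawn above rests on whether the net change in coding sequence length is a multiple of three. A minimal Python sketch of that rule follows, assuming each variant is given as a reference allele and an alternate allele within coding sequence. The function name `classify_indel` and the example alleles are hypothetical and do not represent the variant annotation pipeline used in this study.

```python
def classify_indel(ref, alt):
    """Classify a coding variant by its net length change (hypothetical helper).

    Assumes `ref` and `alt` are allele strings lying within coding
    sequence; splice-site and non-coding effects are ignored.
    """
    delta = len(alt) - len(ref)
    if delta == 0:
        return "not_indel"
    # A net change divisible by 3 preserves the downstream reading frame.
    return "in_frame" if delta % 3 == 0 else "frame_shift"


# Made-up alleles for illustration.
print(classify_indel("A", "AGGA"))  # 3-nt insertion
print(classify_indel("ACGT", "A"))  # 3-nt deletion
print(classify_indel("AGG", "A"))   # 2-nt deletion
```

Only the length rule is captured here; whether a frame-shift transcript escapes nonsense-mediated decay, as discussed above, depends on further annotation that this sketch does not attempt.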
In summary, pancancer prognostic analysis identified in-frame indels as the most important cancer variants affecting patients’ survival. Transcriptome-wide expression and methylation studies identified APOBEC1 as the only candidate gene whose expression and methylation levels both correlated with the number of in-frame indels, with splicing variant 3 as the main contributor to the APOBEC1 impact. This indel impact was found to be independent of the DNA repair mechanisms that have been proposed to be implicated in cancer indels. Unlike frame-shift indel sites, in-frame indel sites were found to carry consensus triplet repeats. Expression levels of both APOBEC1 and variant 3 were found to predict pancancer outcome. Notably, the surprisingly high prognostic value of high-APOBEC1/Var3 in cancers such as pancreatic carcinoma, thyroid carcinoma and mesothelioma promises the development of a novel predictive marker in these particular cancers. Moreover, a high expression level of variant 3 in paired normal tissues was found to predict cancer outcome, suggesting that it is an early mutagenic/carcinogenic event and paving the way for its use as a novel prognostic marker in normal tissues. Our findings propose APOBEC1 and its isoform b as endogenous mutators implicated in cancer in-frame indels, warranting extensive studies to validate both the mutagenic and the prognostic impact of high-APOBEC1 in various cancers.

Supplementary material
Supplementary material can be found at Carcinogenesis online.

Funding
This study was supported by a research grant from Digestive Disease Research Institute, Tehran University of Medical Sciences, Tehran, Iran.

Abbreviations
CI: confidence interval
FDR: false discovery rate
HR: homologous recombination
MMR: mismatch repair
OS: overall survival
SNV: single-nucleotide variant

Acknowledgements
We gratefully acknowledge contributions from TCGA Research Network. We would like to thank Dr.
James McKay (Genetic Cancer Susceptibility Group, IARC, WHO, Lyon, France), Dr. Yasmin Reyal (Department of Haematology, University College London Hospitals NHS Trust, London, United Kingdom), Dr. Hossein Poustchi (Digestive Disease Research Institute, Tehran University of Medical Sciences, Tehran, Iran) and Dr. Sadaf Sepanlou Ghajar (Digestive Disease Research Institute, Tehran University of Medical Sciences, Tehran, Iran) for their comments.

Conflict of Interest Statement: None declared.

References
1. Alexandrov, L.B. et al.; Australian Pancreatic Cancer Genome Initiative; ICGC Breast Cancer Consortium; ICGC MMML-Seq Consortium; ICGC PedBrain. (2013) Signatures of mutational processes in human cancer. Nature, 500, 415–421.
2. Turnpenny, P.D. et al. (2012) Emery’s elements of medical genetics. Elsevier/Churchill Livingstone, Philadelphia, PA.
3. Sia, E.A. et al. (1997) Microsatellite instability in yeast: dependence on repeat unit size and DNA mismatch repair genes. Mol. Cell. Biol., 17, 2851–2858.
4. Kandoth, C. et al. (2013) Integrated genomic characterization of endometrial carcinoma. Nature, 497, 67–73.
5. TCGA (2014) Comprehensive molecular characterization of gastric adenocarcinoma. Nature, 513, 202–9.
6. Nakata, B. et al. (2002) Prognostic value of microsatellite instability in resectable pancreatic cancer. Clin. Cancer Res., 8, 2536–2540.
7. TCGA (2012) Comprehensive molecular characterization of human colon and rectal cancer. Nature, 487, 330–7.
8. Burns, M.B. et al. (2013) Evidence for APOBEC3B mutagenesis in multiple human cancers. Nat. Genet., 45, 977–983.
9. Nik-Zainal, S. et al. (2014) Association of a germline copy number polymorphism of APOBEC3A and APOBEC3B with burden of putative APOBEC-dependent mutations in breast cancer. Nat. Genet., 46, 487–491.
10. Chan, K. et al. (2015) An APOBEC3A hypermutation signature is distinguishable from the signature of background mutagenesis by APOBEC3B in human cancers. Nat. Genet., 47, 1067–1072.
11. Starrett, G.J. et al. (2016) The DNA cytosine deaminase APOBEC3H haplotype I likely contributes to breast and lung cancer mutagenesis. Nat. Commun., 7, 12918.
12. Futreal, P.A. et al. (2004) A census of human cancer genes. Nat. Rev. Cancer, 4, 177–183.
13. Bastolla, U. (2007) Structural approaches to sequence evolution: molecules, networks, populations. Springer, Berlin, New York.
14. Crooks, G.E. et al. (2004) WebLogo: a sequence logo generator. Genome Res., 14, 1188–1190.
15. Schwartz, D. et al. (2005) An iterative statistical approach to the identification of protein phosphorylation motifs from large-scale data sets. Nat. Biotechnol., 23, 1391–1398.
16. Nik-Zainal, S. et al.; Breast Cancer Working Group of the International Cancer Genome Consortium. (2012) Mutational processes molding the genomes of 21 breast cancers. Cell, 149, 979–993.
17. Perreault, N. et al. (2001) Foxl1 controls the Wnt/beta-catenin pathway by modulating the expression of proteoglycans in the gut. J. Biol. Chem., 276, 43328–43333.
18. Oliver-Krasinski, J.M. et al. (2009) The diabetes gene Pdx1 regulates the transcriptional network of pancreatic endocrine progenitor cells in mice. J. Clin. Invest., 119, 1888–1898.
19. Dempsey, C.E. et al. (2005) Expression of pro-apoptotic Bfk isoforms reduces during malignant transformation in the human gastrointestinal tract. FEBS Lett., 579, 3646–3650.
20. Nelson, V.R. et al. (2012) Transgenerational epigenetic effects of the Apobec1 cytidine deaminase deficiency on testicular germ cell tumor susceptibility and embryonic viability. Proc. Natl. Acad. Sci. USA, 109, E2766–E2773.
21. Saraconi, G. et al. (2014) The RNA editing enzyme APOBEC1 induces somatic mutations and a compatible mutational signature is present in esophageal adenocarcinomas. Genome Biol., 15, 417.
22. Kristensen, D.G. et al. (2014) Evidence that active demethylation mechanisms maintain the genome of carcinoma in situ cells hypomethylated in the adult testis. Br. J. Cancer, 110, 668–678.
23. Lau, P.P. et al. (1991) Apolipoprotein B mRNA editing is an intranuclear event that occurs posttranscriptionally coincident with splicing and polyadenylation. J. Biol. Chem., 266, 20550–20554.
24. Bogerd, H.P. et al. (2006) Cellular inhibitors of long interspersed element 1 and Alu retrotransposition. Proc. Natl. Acad. Sci. USA, 103, 8780–8785.
25. Burdick, R.C. et al. (2013) Nuclear import of APOBEC3F-labeled HIV-1 preintegration complexes. Proc. Natl. Acad. Sci. USA, 110, E4780–E4789.
26. Casellas, R. et al. (2016) Mutations, kataegis and translocations in B cells: understanding AID promiscuous activity. Nat. Rev. Immunol., 16, 164–176.
27. Teng, B. et al. (1993) Molecular cloning of an apolipoprotein B messenger RNA editing protein. Science, 260, 1816–1819.
28. Harris, R.S. et al. (2002) RNA editing enzyme APOBEC1 and some of its homologs can act as DNA mutators. Mol. Cell, 10, 1247–1253.
29. Conticello, S.G. (2012) Creative deaminases, self-inflicted damage, and genome evolution. Ann. N. Y. Acad. Sci., 1267, 79–85.
30. Yamanaka, S. et al. (1995) Apolipoprotein B mRNA-editing protein induces hepatocellular carcinoma and dysplasia in transgenic animals. Proc. Natl. Acad. Sci. USA, 92, 8483–8487.
31. Yamanaka, S. et al. (1997) A novel translational repressor mRNA is edited extensively in livers containing tumors caused by the transgene expression of the apoB mRNA-editing enzyme. Genes Dev., 11, 321–333.
32. Greeve, J. et al. (1999) Absence of APOBEC-1 mediated mRNA editing in human carcinomas. Oncogene, 18, 6357–6366.
33. Blanc, V. et al. (2007) Deletion of the AU-rich RNA binding protein Apobec-1 reduces intestinal tumor burden in Apc(min) mice. Cancer Res., 67, 8565–8573.
34. Wilson, P.C. et al. (1998) Somatic hypermutation introduces insertions and deletions into immunoglobulin V genes. J. Exp. Med., 187, 59–70.
35. Goossens, T. et al. (1998) Frequent occurrence of deletions and duplications during somatic hypermutation: implications for oncogene translocations and heavy chain disease. Proc. Natl. Acad. Sci. USA, 95, 2463–2468.
36. Chaudhuri, J. et al. (2003) Transcription-targeted DNA deamination by the AID antibody diversification enzyme. Nature, 422, 726–730.
37. Walker, L.M. et al.; Protocol G Principal Investigators. (2011) Broad neutralization coverage of HIV by multiple highly potent antibodies. Nature, 477, 466–470.
38. Morgan, H.D. et al. (2004) Activation-induced cytidine deaminase deaminates 5-methylcytosine in DNA and is expressed in pluripotent tissues: implications for epigenetic reprogramming. J. Biol. Chem., 279, 52353–52360.
39. Guo, J.U. et al. (2011) Hydroxylation of 5-methylcytosine by TET1 promotes active DNA demethylation in the adult brain. Cell, 145, 423–434.
40. Jarmuz, A. et al. (2002) An anthropoid-specific locus of orphan C to U RNA-editing enzymes on chromosome 22. Genomics, 79, 285–296.
41. Palumbo, S.L. et al. (2008) A novel G-quadruplex-forming GGA repeat region in the c-myb promoter is a critical regulator of promoter activity. Nucleic Acids Res., 36, 1755–1769.
42. Chariker, J.H. et al. (2016) Computational analysis of G-quadruplex forming sequences across chromosomes reveals high density patterns near the terminal ends. PLoS One, 11, e0165101.
43. Hirano, K. et al. (1997) Characterization of the human apobec-1 gene: expression in gastrointestinal tissues determined by alternative splicing with production of a novel truncated peptide. J. Lipid Res., 38, 847–859.
44. Sprung, C.N. et al. (2011) Alternative transcript initiation and splicing as a response to DNA damage. PLoS One, 6, e25758.
45. Teng, B.B. et al. (1999) Mutational analysis of apolipoprotein B mRNA editing enzyme (APOBEC1). Structure-function relationships of RNA editing and dimerization. J. Lipid Res., 40, 623–635.
46. Lau, P.P. et al. (1994) Dimeric structure of a human apolipoprotein B mRNA editing protein and cloning and chromosomal localization of its gene. Proc. Natl. Acad. Sci. USA, 91, 8522–8526.
47. Bohn, M.F. et al. (2015) The ssDNA mutator APOBEC3A is regulated by cooperative dimerization. Structure, 23, 903–911.
48. Wiegand, H.L. et al. (2004) A second human antiretroviral factor, APOBEC3F, is suppressed by the HIV-1 and HIV-2 Vif proteins. EMBO J., 23, 2451–2458.
49. Ara, A. et al. (2014) Different mutagenic potential of HIV-1 restriction factors APOBEC3G and APOBEC3F is determined by distinct single-stranded DNA scanning mechanisms. PLoS Pathog., 10, e1004024.
50. Wedekind, J.E. et al. (2006) Nanostructures of APOBEC3G support a hierarchical assembly model of high molecular mass ribonucleoprotein particles from dimeric subunits. J. Biol. Chem., 281, 38122–38126.
51. Anant, S. et al. (2001) ARCD-1, an apobec-1-related cytidine deaminase, exerts a dominant negative effect on C to U RNA editing. Am. J. Physiol. Cell Physiol., 281, C1904–C1916.
52. Slattery, J.P. et al. (2000) Patterns of diversity among SINE elements isolated from three Y-chromosome genes in carnivores. Mol. Biol. Evol., 17, 825–829.
53. Saeterdal, I. et al. (2001) Frameshift-mutation-derived peptides as tumor-specific antigens in inherited and spontaneous colorectal cancer. Proc. Natl. Acad. Sci. USA, 98, 13255–13260.
54. Westdorp, H. et al. (2016) Opportunities for immunotherapy in microsatellite instable colorectal cancer. Cancer Immunol. Immunother., 65, 1249–1259.
55. Keir, M.E. et al. (2008) PD-1 and its ligands in tolerance and immunity. Annu. Rev. Immunol., 26, 677–704.
56. Brahmer, J.R. et al. (2012) Safety and activity of anti-PD-L1 antibody in patients with advanced cancer. N. Engl. J. Med., 366, 2455–2465.
57. Le, D.T. et al. (2015) PD-1 blockade in tumors with mismatch-repair deficiency. N. Engl. J. Med., 372, 2509–2520.

© The Author(s) 2018. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

Journal

Carcinogenesis, Oxford University Press

Published: Mar 1, 2018
