TSVdb: a web-tool for TCGA splicing variants analysis

TSVdb: a web-tool for TCGA splicing variants analysis Background: Collaborative projects such as The Cancer Genome Atlas (TCGA) have generated various -omics and clinical data on cancer. Many computational tools have been developed to facilitate the study of the molecular characterization of tumors using data from the TCGA. Alternative splicing of a gene produces splicing variants, and accumulating evidence has revealed its essential role in cancer-related processes, implying the urgent need to discover tumor-specific isoforms and uncover their potential functions in tumorigenesis. Result: We developed TSVdb, a web-based tool, to explore alternative splicing based on TCGA samples with 30 clinical variables from 33 tumors. TSVdb has an integrated and well-proportioned interface for visualization of the clinical data, gene expression, usage of exons/junctions and splicing patterns. Researchers can interpret the isoform expression variations between or across clinical subgroups and estimate the relationships between isoforms and patient prognosis. TSVdb is available at http://www.tsvdb.com,andthesourcecodeisavailable at https://github. com/wenjie1991/TSVdb. Conclusion: TSVdb will inspire oncologists and accelerate isoform-level advances in cancer research. Keywords: Splicing variant, Alternative splicing, TCGA, Cancer, Visualization tools Background variants could be potential biomarkers [7] and therapeutic During transcription in eukaryotes, alternative splicing targets in cancer studies. (AS) of message precursor RNA generates splicing vari- The Cancer Genome Atlas (TCGA) project (http:// ants for a single gene, and particular exons may be cancergenome.nih.gov) has incorporated a vast bulk of included or excluded. It was estimated that approximately genomic sequences, epigenetic profiles, transcriptomes 92-94% of human genes undergo AS [1]. As one of the and multidimensional clinical datasets. It is an excel- most common mechanisms associated with gene regula- lent source for exploring and validating genes of interest tion [2], AS has emerged as a vital mechanism in tumori- through the TCGA RNA-Seq. genesis that regulates the function of cancer-related genes Information on splicing variants can be identified from [3]. Aberrant splicing patterns are closely related to tumor RNA Sequencing data through software such as Cufflinks, progression [4]. For example, misregulation of splicing RSEM (RNA-Seq by Expectation Maximization), Kallisto caused by splicing factor Serine And Arginine Rich Splic- and MapSplicing [8–10].ToolssuchasTCGASpliceSeq ing Factor 1 (SRSF1) can lead to the malignant transfor- [9] and ISOexpresso [11] provide users with alternative mation of normal mammary cells [5]; we also reported splicing patterns and isoform expression data between that Serine And Arginine Rich Splicing Factor 6 (SRSF6) normal and tumor cells. However, detailed clinical infor- promotes tumor progression by regulating AS and might mation, such as the tumor grade, race and survival time, be a potential therapeutic target [6]. Thus, splicing are not provided by the existing tools that investigate splicing variants. To address these questions, we introduced TSVdb, an *Correspondence: lmp@zju.edu.cn; honghezhang@zju.edu.cn Wenjie Sun and Ting Duan contributed equally to this work. interactive web-portal, to perform comparative analy- Department of Pathology, Key Laboratory of Disease Proteomics of Zhejiang sisonsplicingvariantsacrosstumorsubgroupsusing Province, School of Medicine, Zhejiang University, 310058 Hangzhou, China Full list of author information is available at the end of the article © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Sun et al. BMC Genomics (2018) 19:405 Page 2 of 7 TCGA RNA-Seq datasets from 33 tumors and 30 clinical a defined non-zero day_to_death value; otherwise, variables. the event variable was considered “censored”. The TSVdb presents a well-organized visualization of survival time was set to the larger variable among the exon/junction usage and splicing patterns, which enable day_to_death and days_to_last_follow-up. The users to readily and quickly access, analyze, and interpret survival status was coded as 0 (live or censored) or 1 splicing variants for interesting genes. Users can inves- (death). tigate the isoform expression between tumor subgroups (c) Stage. The class number was reduced to 5 (I, II, III, and the association of splicing variant expression with IV, and X) for the pathology_stage, clinical_stage and overall survival. masaoka_stage. We believe that TSVdb provides a user-friendly plat- (d) Risk factors of LIHC. “Alpha-1 antitrypsin form for researchers to maximize TCGA utilization and deficiency”, “hemochromatosis” and “other” were unearth more potential cancer biomarkers. combined into “others”. “Alcohol consumption”, “hepatitis b” and “hepatitis c” were classified by Implementation themselves due to their large proportion. Data collection (e) Alcohol consumption per day was divided into “0” TCGA (version 20160128) level 3 data were downloaded and “> 0”. from the TCGA FTP site Firehose. The data included the (f) The number of pregnancies was grouped into six gene, isoform RSEM data, exon, and junction-normalized classes, which defined the number of pregnancies as read count data (UNC illuminaHiSeq_RNASeqV2) and either “1”, “2”, “3”, “4”, “5” or “> 5”. clinical data (Merged_clinical_level_1). Because the data “Indeterminate” data were shown as “UNDEFINED”. in Firehose no longer updates, data updates will use Additional file 1: Table S1 shows the phenotype statistics the TCGA data in the GDC data portal in the forthetumortypes. future. The current TCGA data version was noted All the data described above were deposited into the in the footer on the TSVdb plot page. The RNA- NoSQL database MongoDB (See Additional file 2 for the Seq data transformation was accomplished with the database scheme). R software version 3.2.3. Genes with Entrez IDs in both the TCGA data and annotation package Website and Data Visualization org.Hs.eg.db(3.2.3) were used. The annotation package The nodejs scripts were used to provide the private APIs TxDb.Hsapiens.UCSC.hg19.knownGene(3.2.2) was used to for the web-end. There were three APIs with the follow- annotate the isoform, exon and junction data in the TCGA ing functions: (1) Autocompleting the gene symbol or datasets. The annotation of the TCGA exon and junc- alias, (2) Validating the symbol or the input Entrez ID, tion data was performed by overlapping the locations of (3) Querying MongoDB and then returning the clinical the exons/junctions with the isoform range. The annota- variable list for the queried tumor, and (4) Querying Mon- tion package was also used to plot the transcript isoform goDB and then returning the data to draw the results. structure. The JavaScript libraries d3.js [12] (version 3.0.6, http:// d3js.org/) were used to construct the interactive SVG Data manipulation and transformationn graph. The d3.js-based interactive KM-plot was modified The related clinical information for each cancer type from Nick Strayer’s code (http://bl.ocks.org/nstrayer/). was selected and prepared for each tumor. Thirty clini- The box plot and other distribution charts were modified cal variables were chosen and processed (Additional file 1: from Andrew Sielen’s code (https://github.com/asielen/ TableS1).Acut-offvalue wasusedtomakethe numer- D3_Reusable_Charts). The SVG graph download function ical variables classified variables; if there were too many was adapted from A. Gordon’s solution (https://github. classes, a combination was applied to reduce the class com/agordon/d3export_demo). number. The transformation methods were as follows: ij min ,1 if Q > 0.05 95,i e ·Q j 95,i (a) The number of packs smoked per year smoked, y = (1) ij ij min ,1 if Q ≤ 0.05 95,i which was an integer variable. The cut-off values e ·0.05 were set to 10 and 100 to create the following three ij i=1 e = (2) categories: (1) “less than 10”, “less than 100”, and “greater than 100” packs smoked per year. The exon/junction usage value y displayed in the main ij (b) Overall survival. The day_to_death and results (Fig. 1) was derived from the exon/junction quan- days_to_last_follow-up variables were used to tification value x with a series of scalings and normal- ij generate the time and event variables for overall izations Eq. (1). The effects of the normalization are survival. The event was set to “death” if a patient had shown in Additional file 1:FigureS2foragenethat Sun et al. BMC Genomics (2018) 19:405 Page 3 of 7 Fig. 1 An illustration of the TSVdb database. The top of the web page displays the query parameters buttons; under those buttons were the figure legends. Under the legends were the main results. The sample type, gene expression and exon/junction usage (patients are presented in columns arranged according to their gene expression from low to high in each group and the exons/junctions are arrayed in rows) are displayed on the right side and from the top down. The left side shows the gene transcriptional pattern, in which the thin lines represent the introns and boxes connected by the lines represent the exons corresponding to the rows on the right side. Hovering or clicking on the rows will highlight the corresponding exon in the transcription pattern and double clicking on the rows will open the UCSC Genome Browser in a new tab/window and showing the gene structure. Additionally, clicking on the isoform structure lead to isoform-specific expression data or a survival curve has n exon/junction values (i = 1, ... , n)in m samples finishing the input, the main output window showed (j = 1, ... , m). First, following the idea of a “splicing the clinical information, gene expression, exon/junction index” [13], each sample’s exon/junction quantification usage and isoform structure diagram (Fig. 1). As was value was divided by the expression quantity e of the gene shown, the samples were divided into two or more sub- to which the exon belongs Eq. (2). The gene expression groups according to their clinical information, e.g., “Solid quantity was estimated by averaging the quantification Tissue Normal” and “Primary Solid Tumor”. The sam- of the exons/junctions. Therefore, the gene expression ples in each group were arrayed by their gene expression effect was removed and the AS event was highlighted. levels, which helped to distinguish the correlation Next, the d3.js linear scale was used to map the normal- between the isoform expression and gene expres- ized exon/junction values to the graph coordination. The sion. Meanwhile, the shadowed-line charts display the interval of the normalized exon/junction values (domain exon/junction usage values for each sample to facilitate argument in scale function in D3)was setto (0, Q ) the recognition of alternative splicing. Links to the UCSC 95,i (95% quantile) to diminish the outlier’s impact, which Genome Browser also offer the exon/junction’s loci as may minimize the differences in the AS events between well as further annotation information such as the con- groups. Furthermore, if Q < 0.05, which indicates the servation of the exon sequence, single nucleotide poly- 95,i exon/junction expression quantity, is relatively small when morphisms, and mutations, so that researchers can gain a corresponding to the gene expression quantity, the upper full-scale understanding of the exon or junction. bound of the interval would be set to 0.05. Furthermore, the transcriptional pattern was also dis- played to reveal the splicing isoforms for a single gene and Results their constitutions. Notably, by double-clicking the tran- TSVdb use scriptional pattern, the expression of the isoforms in the Four dialogs were initially used to input the tumor type, different subgroups was shown using a box plot (Fig. 3), exon/junction and clinical data for a specific gene. After and the correlation of the isoforms with overall survival Sun et al. BMC Genomics (2018) 19:405 Page 4 of 7 was demonstrated by a KM-plot (Fig. 4), which indicated type could be downloaded into one file by clicking the promising use for clinical cancer researchers. The KM sur- download link. An illustration of the downloaded table is vival results described four different parts as follows. (1) shown in Additional file 1:TableS2 In the bottom right, there was a way to adjust the cut-off for grouping, where the knob could be adjusted to change Example analysis the cut-off. The default value was set to the middle of the The oncogene Rac Family Small GTPase 1 (RAC1) is isoform expression range. (2) The top right part displayed closely associated with tumorigenesis, tumor progression information on the groups, including the cut-off value and and therapy resistance [14–16], and RAC1 alternative sample size of each group. (3) The top left showed the sur- splicing is important for its regulatory role in cancers.[17]. vival line for each entered individual after filtering them As illustrated by RAC1 in colon cancer, RAC1 generates by the survival start time. The start time could be changed three splicing isoforms, and the fourth exon is skipped by adjusting the knob. (4) The bottom left was the KM- in normal tissues and included in tumor tissues (Fig. 2a). plot. Moreover, users could also click on the right y-axis or Consistently, it was revealed that the use of the junc- bottom x-axis to invoke input boxes and set the position tions linking exons 3 to 4 and 4 to 5 were high in tumor by inputting a specific value. The formatted exon/junction tissues (Additional file 1: Figure S3A). Similarly, by choos- quantification, transcript isoform expression, clinical vari- ing microsatellite instability (MSI) status as the pheno- ables and gene expression data for a gene in one tumor type, the results revealed that the fourth exon usage was Fig. 2 Visualization of the TCGA data for RAC1 in colon adenocarcinoma using TSVdb. a The exon usage results in the different RAC1 sample types. b The exon usage results for different RAC1 MSI statuses using TSVdb. c RAC1 isoforms and annotation of the fourth exon from the UCSC Genome Browser Sun et al. BMC Genomics (2018) 19:405 Page 5 of 7 high in the microsatellite stable (MSS) and microsatel- samples, which can also be achieved by ISOexpresso [11] lite instability-low (MSI-L) groups (Fig. 2b), suggesting its and TCGASpliceSeq [9], respectively (Table 1). potential role in DNA damage response [18]. Moreover, Moreover, TSVdb presents a better and more conve- annotation from the UCSC Genome Browser (Fig. 2c) nient visualization for users to assess exon/junction usage, showed that the DNA sequence of the fourth exon was transcript isoform expression, isoform pattern graph, and evolutionarily conserved in vertebrates, which indicated clinical information in one graph. This is the first time the importance of its function in tumor biology. that we have integrated comprehensive clinical data with The isoform expression variation for RAC1 was also TCGA alternative splicing analysis tools; this web-tool shown by the box plot (Fig. 3). Primary solid tumor tis- will help to perform comparative analysis across different sues had higher uc003spw.3 transcript expression, while tumor subgroups. The subgroups are defined by demo- there was lower uc003spx.3 transcript expression relative graphic data, clinical diagnosis data, treatment data and to normal tissues (Fig. 3a and b) Interestingly, the fourth follow-up data. exon was only included in isoform uc003spw.3. Further- By taking advantage of clinical data from TCGA, in the more, the MSI-H samples showed lower uc003spw.3 analysis example, the result of the gene RAC1 was shown. transcript expression but higher uc003spx.3 transcript We found that the expression of the fourth exon was high expression (Fig. 3c and d). Additionally, the KM-plots in adenocarcinoma tumor tissue. Moreover, it was discov- showed the correlation between uc003spw.3 and ered that exon usage mainly increased in tumor tissues uc003spx.3 with the overall survival of colon cancer from MSS and MSI-L patients. patients (Fig. 4), and the high uc003spw.3 expression was In the data visualization tactic aspect, Sashimi Plots correlated with poor prognosis. aremostcommonlyusedtovisualize alternativesplicing [19]. Sashimi Plots use read densities to represent the Discussion amount of reads alignment in exons or junctions. There TSVdb is a user-friendly interface for unearthing alterna- are some variations of Sashimi Plots. For example, tive splicing variations in 33 cancers. Similar to existing TCGASpliceSeq uses the number to show the reads quan- tools, TSVdb provides a comparison of isoform expres- tity and GTEx project uses the color gradient to quantify. sion and alternative splicing between tumor and normal However, Sashimi Plots cannot visualize many samples Fig. 3 Expression of RAC1 splicing isoforms in colon adenocarcinoma in TSVdb. a and b Differential expression of isoforms uc003spw.3 and uc003spx.3, respectively, in tumor and normal tissues. c and d Differential expression of isoforms uc003spw.3 and uc003spx.3, respectively, in different MSI statuses Sun et al. BMC Genomics (2018) 19:405 Page 6 of 7 as well and do not facilitate comparisons between sam- ples. Visualizing Alternative Splicing (Vials) (http://vials. io/vials/) resolved the problem using a complex multivari- able graph [20]. In TSVdb, the data visualization strategy was inspired by MEXPRESS [21], in which samples are plotted on the x-axis, the genome is arranged on the y- axis and comparisons between subgroups are achieved by sorting the samples by phenotypes. This strategy makes it possible to display data for hundreds of samples in a single figure with genome annotations. Although, as the price, TSVdb cannot display the exon and junction reads quan- tity simultaneously, it can take advantage of the big sample size in TCGA datasets. Conclusion In summary, we provided a web-based tool for splic- ing variants analysis. We believe that TSVdb offers researchers a quick and straightforward visualization tool to explore alternative splicing and isoform expression of target genes in clinical subgroups within the TCGA data. Availability and requirements Project name: TSVdb Project home page: http://www.tsvdb.com/ Operating system(s): Platform independent Programming language: R, Nodejs and perl 6 (server side scripts) Fig. 4 Kaplan-Meier plots showing the associations of the RAC1 Other requirements: Internet browser required for net- isoform (a) uc003spw.3 and (b) uc003spx.3 with overall patient work visualization survival License: Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/) Table 1 Summary of the features of the three databases, Any restrictions to use by non-academics: no restriction including SpliceSeq, ISOexpresso, and TSVdb, for visualizing the TCGA splice variant data Features TCGA SpliceSeq ISOexpresso TSVdb Additional files Splicing event Yes No No measurements Additional file 1: Supplementary figures and tables. Table S1. The clinical variable distribution. Table S2. The data download format. Figure S1. The Algorithm integrate Yes No No query dialogs for choosing the tumor type. Figure S2. The procedures for exon, junction reads calculating the RAC1 exon usage in colon adenocarcinoma. Figure S3. The Provide Screening Yes No No RAC1 junction usage in colon adenocarcinoma using TSVdb. Figure S4. result The Kaplan-Meier plots showing the associations of the RAC1 isoform uc003spw.3, uc003spx.3 with overall patient survival. (PDF 919 kb) Splicing pattern graph Yes No No Additional file 2: MongoDB database scheme. (TXT 2 kb) Isoform pattern graph No Yes Yes Isoform expression No Yes Yes Abbreviations ACC: Adrenocortical carcinoma; AS: Alternative splicing; BLCA: Bladder Exon quantification Yes No Yes urothelial carcinoma; BRCA: Breast invasive carcinoma; CESC: Cervical Junction Yes No Yes squamous cell carcinoma and endocervical adenocarcinoma; CHOL: quantification Cholangiocarcinoma; COAD: Colon adenocarcinoma; DDR: DNA damage response; DLBC: Lymphoid neoplasm diffuse large B-cell lymphoma; ESCA: Clinical data No No Yes Esophageal carcinoma; GBM: Glioblastoma multiforme; HNSC: Head and neck ∗ † Multi-omics No Partial Partial squamous cell carcinoma; KICH: Kidney Chromophobe; KIRC: Kidney renal clear cell carcinoma; KIRP: Kidney renal papillary cell carcinoma; LAML: Acute Genome-wide data Yes No No myeloid leukemia; LGG: Brain lower grade glioma; LIHC: Liver hepatocellular download carcinoma; LUAD: Lung adenocarcinoma; LUSC: Lung squamous cell ISOexpresso provided eQTL analysis carcinoma; MESO: Mesothelioma; MSI-H: microsatellite instability-high; MSI-L: TSVdb provided gene expression correlated AS analysis microsatellite instability-low; MSS: microsatellite stable; OV: Ovarian serous Sun et al. BMC Genomics (2018) 19:405 Page 7 of 7 cystadenocarcinoma; PAAD: Pancreatic adenocarcinoma; PCPG: 5. Anczuków O, Akerman M, Cléry A, Wu J, Shen C, Shirole NH, Raimer A, Pheochromocytoma and Paraganglioma; PRAD: Prostate adenocarcinoma; Sun S, Jensen MA, Hua Y, Allain FH-T, Krainer AR. SRSF1-Regulated RAC1: Rac family small GTPase 1; READ: Rectum adenocarcinoma; RSEM: Alternative Splicing in Breast Cancer. Mol Cell. 2015;60(1):105–17. RNA-Seq by expectation maximization; SARC: Sarcoma; SKCM: Skin cutaneous https://doi.org/10.1016/j.molcel.2015.09.005. Accessed 27 Oct 2017. melanoma; SRSF1: Serine and arginine rich splicing factor 1; SRSF6: Serine and 6. Wan L, Yu W, Shen E, Sun W, Liu Y, Kong J, Wu Y, Han F, Zhang L, Yu T, arginine rich splicing factor 6; STAD: Stomach adenocarcinoma; TCGA: The Zhou Y, Xie S, Xu E, Zhang H, Lai M. SRSF6-regulated alternative splicing cancer genome atlas; TGCT: Testicular germ cell tumors; THCA: Thyroid that promotes tumour progression offers a therapy target for colorectal carcinoma; THYM: Thymoma; UCEC: Uterine corpus endometrial carcinoma; cancer. Gut. 2017;2017–314983. https://doi.org/10.1136/gutjnl-2017- UCS: Uterine carcinosarcoma; VM: Uveal melanoma 314983. Accessed 18 Nov 2017. 7. Omenn GS, Guan Y, Menon R. A new class of protein cancer biomarker Acknowledgements candidates: Differentially expressed splice variants of ERBB2 (HER2/neu) We would like to thank Ledong Wan and Riccardo Fodde who tested this and ERBB1 (EGFR) in breast cancer cell lines. J Proteome. 2014;107:103–12. website and gave us very valuable feedback. The data generated by TCGA https://doi.org/10.1016/j.jprot.2014.04.012. Accessed 05 Sep 2017. Research Network (http://cancergenome.nih.gov/) has been used for TSVdb 8. Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, development. McPherson A, Szczesniak ´ MW, Gaffney DJ, Elo LL, Zhang X, Mortazavi A. A survey of best practices for RNA-seq data analysis. Genome Biol. 2016;17:13. Funding https://doi.org/10.1186/s13059-016-0881-8. Accessed 28 Jun 2016. This work is supported by grants from the National Natural Science 9. Ryan M, Wong WC, Brown R, Akbani R, Su X, Broom B, Melott J, Foundation of China (81672730 to H.Z. and 81572716 to M.L.), the 111 Project Weinstein J. TCGASpliceSeq a compendium of alternative mRNA splicing (B13026 to M.L.) and the Fundamental Research Funds for the Central in cancer. Nucleic Acids Res. 2015;1288. https://doi.org/10.1093/nar/ Universities (172210271 to H.Z.). These funding sources had no involvement in gkv1288 00001. Accessed 19 Dec 2016. the study design; the collection, analysis, or interpretation of data; the writing 10. Li B, Dewey CN. RSEM: accurate transcript quantification from RNA-Seq of the report; or the decision to submit the manuscript for publication. data with or without a reference genome. BMC Bioinformatics. 2011;12:323. https://doi.org/10.1186/1471-2105-12-323 Accessed 27 Oct 2017. Availability of data and materials 11. Yang IS, Son H, Kim S, Kim S. ISOexpresso: a web-based platform for TSVdb is freely available for academic or commercial use at isoform-level expression analysis in human cancer. BMC Genomics. http://www.tsvdb.com/. 2016;17:631. https://doi.org/10.1186/s12864-016-2852-6 Accessed 26 Oct 2017. Authors’ contributions 12. Bostock M, Ogievetsky V, Heer J. D3 Data-Driven Documents. IEEE Trans WS, ML, and HZ conceived the project. WS, PY, and KC designed and Vis Comput Graph. 2011;17(12):2301–9. https://doi.org/10.1109/TVCG. developed the TSVdb. DT and GZ participated in the designing tested the 2011.185. Accessed 30 Oct 2017. utility of the tool. WS and DT wrote and revised the manuscript. WS, TD, and 13. Cuperlovic-Culf M, Belacel N, Culf AS, Ouellette RJ. Data analysis of GZ collected and cleared up the TCGA data. All authors have reviewd and alternative splicing microarrays. Drug Discov Today. 2006;11(21):983–90. approved the final version of this manuscript. https://doi.org/10.1016/j.drudis.2006.09.011. Accessed 08 Sep 2017. 14. Ma Q, Cavallin LE, Yan B, Zhu S, Duran EM, Wang H, Hale LP, Dong C, Ethics approval and consent to participate Cesarman E, Mesri EA, Goldschmidt-Clermont PJ. Antitumorigenesis of Not applicable. antioxidants in a transgenic Rac1 model of Kaposi’s sarcoma. Proc Natl Acad Sci. 2009;106(21):8683–8. https://doi.org/10.1073/pnas.0812688106. Competing interests Accessed 01 Nov 2017. The authors declare that they have no competing interests. 15. Stallings-Mann ML, Waldmann J, Zhang Y, Miller E, Gauthier ML, Visscher DW, Downey GP, Radisky ES, Fields AP, Radisky DC. Matrix Publisher’s Note metalloproteinase induction of Rac1b, a key effector of lung cancer Springer Nature remains neutral with regard to jurisdictional claims in progression. Sci Transl Med. 2012;4(142):142–95. https://doi.org/10.1126/ published maps and institutional affiliations. scitranslmed.3004062. 16. Dokmanovic M, Hirsch DS, Shen Y, Wu WJ. Rac1 contributes to Author details trastuzumab resistance of breast cancer cells: Rac1 as a potential Department of Pathology, Key Laboratory of Disease Proteomics of Zhejiang therapeutic target for the treatment of trastuzumab-resistant breast Province, School of Medicine, Zhejiang University, 310058 Hangzhou, China. cancer. Mol Cancer Ther. 2009;8(6):1557–69. https://doi.org/10.1158/ Department of Toxicology, School of Medicine, Zhejiang University, 310058 1535-7163.MCT-09-0140. Accessed 01 Nov 2017. Hangzhou, China. Hikvision Digital Technology, 310051 Hangzhou, China. 17. Fu X-D. Both sides of the same coin: Rac1 splicing regulation by EGF Department of Integrative Biology and Physiology, University of California, signaling. Cell Res. 2017;27(4):455–6. https://doi.org/10.1038/cr.2017.19. Los Angeles, Los Angeles, CA, USA. Accessed 01 Nov 2017. 18. Vilar E, Gruber SB. Microsatellite instability in colorectal cancer—the Received: 27 November 2017 Accepted: 10 May 2018 stable evidence. Nat Rev Clin Oncol. 2010;7(3):2009–237. https://doi.org/ 10.1038/nrclinonc.2009.237. Accessed 20 Nov 2017. 19. Katz Y, Wang ET, Silterra J, Schwartz S, Wong B, Thorvaldsdóttir H, Robinson JT, Mesirov JP, Airoldi EM, Burge CB. Quantitative visualization References of alternative exon expression from RNA-seq data. Bioinformatics. 1. Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, 2015;31(14):2400–2. https://doi.org/10.1093/bioinformatics/btv034. Kingsmore SF, Schroth GP, Burge CB. Alternative isoform regulation in Accessed 11 Apr 2018. human tissue transcriptomes. Nature. 2008;456(7221):470–6. https://doi. 20. Strobelt H, Alsallakh B, Botros J, Peterson B, Borowsky M, Pfister H, Lex A. org/10.1038/nature07509. Accessed 04 Sep 2017. Vials: Visualizing Alternative Splicing of Genes. IEEE Trans Vis Comput 2. Blencowe BJ. The Relationship between Alternative Splicing and Graph. 2016;22(1):399–408. https://doi.org/10.1109/TVCG.2015.2467911. Proteomic Complexity. Trends Biochem Sci. 2017;42(6):407–8. https://doi. 21. Koch A, De Meyer T, Jeschke J, Van Criekinge W. MEXPRESS: visualizing org/10.1016/j.tibs.2017.04.001. Accessed 20 Nov 2017. expression, DNA methylation and clinical TCGA data. BMC Genomics. 3. Skotheim RI, Nees M. Alternative splicing in cancer: Noise, functional, or 2015;16:636. https://doi.org/10.1186/s12864-015-1847-z. Accessed 16 systematic? Int J Biochem Cell Biol. 2007;39(7):1432–49. https://doi.org/ Nov 2016. 10.1016/j.biocel.2007.02.016. Accessed 04 Sep 2017. 4. Sveen A, Kilpinen S, Ruusulehto A, Lothe RA, Skotheim RI. Aberrant RNA splicing in cancer; expression changes and driver mutations of splicing factor genes. Oncogene. 2016;35(19):2413–27. https://doi.org/10.1038/ onc.2015.318. Accessed 04 Sep 2017. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png BMC Genomics Springer Journals

TSVdb: a web-tool for TCGA splicing variants analysis

Free
7 pages
Loading next page...
 
/lp/springer_journal/tsvdb-a-web-tool-for-tcga-splicing-variants-analysis-Nrdjo0HHDX
Publisher
BioMed Central
Copyright
Copyright © 2018 by The Author(s)
Subject
Life Sciences; Life Sciences, general; Microarrays; Proteomics; Animal Genetics and Genomics; Microbial Genetics and Genomics; Plant Genetics and Genomics
eISSN
1471-2164
D.O.I.
10.1186/s12864-018-4775-x
Publisher site
See Article on Publisher Site

Abstract

Background: Collaborative projects such as The Cancer Genome Atlas (TCGA) have generated various -omics and clinical data on cancer. Many computational tools have been developed to facilitate the study of the molecular characterization of tumors using data from the TCGA. Alternative splicing of a gene produces splicing variants, and accumulating evidence has revealed its essential role in cancer-related processes, implying the urgent need to discover tumor-specific isoforms and uncover their potential functions in tumorigenesis. Result: We developed TSVdb, a web-based tool, to explore alternative splicing based on TCGA samples with 30 clinical variables from 33 tumors. TSVdb has an integrated and well-proportioned interface for visualization of the clinical data, gene expression, usage of exons/junctions and splicing patterns. Researchers can interpret the isoform expression variations between or across clinical subgroups and estimate the relationships between isoforms and patient prognosis. TSVdb is available at http://www.tsvdb.com,andthesourcecodeisavailable at https://github. com/wenjie1991/TSVdb. Conclusion: TSVdb will inspire oncologists and accelerate isoform-level advances in cancer research. Keywords: Splicing variant, Alternative splicing, TCGA, Cancer, Visualization tools Background variants could be potential biomarkers [7] and therapeutic During transcription in eukaryotes, alternative splicing targets in cancer studies. (AS) of message precursor RNA generates splicing vari- The Cancer Genome Atlas (TCGA) project (http:// ants for a single gene, and particular exons may be cancergenome.nih.gov) has incorporated a vast bulk of included or excluded. It was estimated that approximately genomic sequences, epigenetic profiles, transcriptomes 92-94% of human genes undergo AS [1]. As one of the and multidimensional clinical datasets. It is an excel- most common mechanisms associated with gene regula- lent source for exploring and validating genes of interest tion [2], AS has emerged as a vital mechanism in tumori- through the TCGA RNA-Seq. genesis that regulates the function of cancer-related genes Information on splicing variants can be identified from [3]. Aberrant splicing patterns are closely related to tumor RNA Sequencing data through software such as Cufflinks, progression [4]. For example, misregulation of splicing RSEM (RNA-Seq by Expectation Maximization), Kallisto caused by splicing factor Serine And Arginine Rich Splic- and MapSplicing [8–10].ToolssuchasTCGASpliceSeq ing Factor 1 (SRSF1) can lead to the malignant transfor- [9] and ISOexpresso [11] provide users with alternative mation of normal mammary cells [5]; we also reported splicing patterns and isoform expression data between that Serine And Arginine Rich Splicing Factor 6 (SRSF6) normal and tumor cells. However, detailed clinical infor- promotes tumor progression by regulating AS and might mation, such as the tumor grade, race and survival time, be a potential therapeutic target [6]. Thus, splicing are not provided by the existing tools that investigate splicing variants. To address these questions, we introduced TSVdb, an *Correspondence: lmp@zju.edu.cn; honghezhang@zju.edu.cn Wenjie Sun and Ting Duan contributed equally to this work. interactive web-portal, to perform comparative analy- Department of Pathology, Key Laboratory of Disease Proteomics of Zhejiang sisonsplicingvariantsacrosstumorsubgroupsusing Province, School of Medicine, Zhejiang University, 310058 Hangzhou, China Full list of author information is available at the end of the article © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Sun et al. BMC Genomics (2018) 19:405 Page 2 of 7 TCGA RNA-Seq datasets from 33 tumors and 30 clinical a defined non-zero day_to_death value; otherwise, variables. the event variable was considered “censored”. The TSVdb presents a well-organized visualization of survival time was set to the larger variable among the exon/junction usage and splicing patterns, which enable day_to_death and days_to_last_follow-up. The users to readily and quickly access, analyze, and interpret survival status was coded as 0 (live or censored) or 1 splicing variants for interesting genes. Users can inves- (death). tigate the isoform expression between tumor subgroups (c) Stage. The class number was reduced to 5 (I, II, III, and the association of splicing variant expression with IV, and X) for the pathology_stage, clinical_stage and overall survival. masaoka_stage. We believe that TSVdb provides a user-friendly plat- (d) Risk factors of LIHC. “Alpha-1 antitrypsin form for researchers to maximize TCGA utilization and deficiency”, “hemochromatosis” and “other” were unearth more potential cancer biomarkers. combined into “others”. “Alcohol consumption”, “hepatitis b” and “hepatitis c” were classified by Implementation themselves due to their large proportion. Data collection (e) Alcohol consumption per day was divided into “0” TCGA (version 20160128) level 3 data were downloaded and “> 0”. from the TCGA FTP site Firehose. The data included the (f) The number of pregnancies was grouped into six gene, isoform RSEM data, exon, and junction-normalized classes, which defined the number of pregnancies as read count data (UNC illuminaHiSeq_RNASeqV2) and either “1”, “2”, “3”, “4”, “5” or “> 5”. clinical data (Merged_clinical_level_1). Because the data “Indeterminate” data were shown as “UNDEFINED”. in Firehose no longer updates, data updates will use Additional file 1: Table S1 shows the phenotype statistics the TCGA data in the GDC data portal in the forthetumortypes. future. The current TCGA data version was noted All the data described above were deposited into the in the footer on the TSVdb plot page. The RNA- NoSQL database MongoDB (See Additional file 2 for the Seq data transformation was accomplished with the database scheme). R software version 3.2.3. Genes with Entrez IDs in both the TCGA data and annotation package Website and Data Visualization org.Hs.eg.db(3.2.3) were used. The annotation package The nodejs scripts were used to provide the private APIs TxDb.Hsapiens.UCSC.hg19.knownGene(3.2.2) was used to for the web-end. There were three APIs with the follow- annotate the isoform, exon and junction data in the TCGA ing functions: (1) Autocompleting the gene symbol or datasets. The annotation of the TCGA exon and junc- alias, (2) Validating the symbol or the input Entrez ID, tion data was performed by overlapping the locations of (3) Querying MongoDB and then returning the clinical the exons/junctions with the isoform range. The annota- variable list for the queried tumor, and (4) Querying Mon- tion package was also used to plot the transcript isoform goDB and then returning the data to draw the results. structure. The JavaScript libraries d3.js [12] (version 3.0.6, http:// d3js.org/) were used to construct the interactive SVG Data manipulation and transformationn graph. The d3.js-based interactive KM-plot was modified The related clinical information for each cancer type from Nick Strayer’s code (http://bl.ocks.org/nstrayer/). was selected and prepared for each tumor. Thirty clini- The box plot and other distribution charts were modified cal variables were chosen and processed (Additional file 1: from Andrew Sielen’s code (https://github.com/asielen/ TableS1).Acut-offvalue wasusedtomakethe numer- D3_Reusable_Charts). The SVG graph download function ical variables classified variables; if there were too many was adapted from A. Gordon’s solution (https://github. classes, a combination was applied to reduce the class com/agordon/d3export_demo). number. The transformation methods were as follows: ij min ,1 if Q > 0.05 95,i e ·Q j 95,i (a) The number of packs smoked per year smoked, y = (1) ij ij min ,1 if Q ≤ 0.05 95,i which was an integer variable. The cut-off values e ·0.05 were set to 10 and 100 to create the following three ij i=1 e = (2) categories: (1) “less than 10”, “less than 100”, and “greater than 100” packs smoked per year. The exon/junction usage value y displayed in the main ij (b) Overall survival. The day_to_death and results (Fig. 1) was derived from the exon/junction quan- days_to_last_follow-up variables were used to tification value x with a series of scalings and normal- ij generate the time and event variables for overall izations Eq. (1). The effects of the normalization are survival. The event was set to “death” if a patient had shown in Additional file 1:FigureS2foragenethat Sun et al. BMC Genomics (2018) 19:405 Page 3 of 7 Fig. 1 An illustration of the TSVdb database. The top of the web page displays the query parameters buttons; under those buttons were the figure legends. Under the legends were the main results. The sample type, gene expression and exon/junction usage (patients are presented in columns arranged according to their gene expression from low to high in each group and the exons/junctions are arrayed in rows) are displayed on the right side and from the top down. The left side shows the gene transcriptional pattern, in which the thin lines represent the introns and boxes connected by the lines represent the exons corresponding to the rows on the right side. Hovering or clicking on the rows will highlight the corresponding exon in the transcription pattern and double clicking on the rows will open the UCSC Genome Browser in a new tab/window and showing the gene structure. Additionally, clicking on the isoform structure lead to isoform-specific expression data or a survival curve has n exon/junction values (i = 1, ... , n)in m samples finishing the input, the main output window showed (j = 1, ... , m). First, following the idea of a “splicing the clinical information, gene expression, exon/junction index” [13], each sample’s exon/junction quantification usage and isoform structure diagram (Fig. 1). As was value was divided by the expression quantity e of the gene shown, the samples were divided into two or more sub- to which the exon belongs Eq. (2). The gene expression groups according to their clinical information, e.g., “Solid quantity was estimated by averaging the quantification Tissue Normal” and “Primary Solid Tumor”. The sam- of the exons/junctions. Therefore, the gene expression ples in each group were arrayed by their gene expression effect was removed and the AS event was highlighted. levels, which helped to distinguish the correlation Next, the d3.js linear scale was used to map the normal- between the isoform expression and gene expres- ized exon/junction values to the graph coordination. The sion. Meanwhile, the shadowed-line charts display the interval of the normalized exon/junction values (domain exon/junction usage values for each sample to facilitate argument in scale function in D3)was setto (0, Q ) the recognition of alternative splicing. Links to the UCSC 95,i (95% quantile) to diminish the outlier’s impact, which Genome Browser also offer the exon/junction’s loci as may minimize the differences in the AS events between well as further annotation information such as the con- groups. Furthermore, if Q < 0.05, which indicates the servation of the exon sequence, single nucleotide poly- 95,i exon/junction expression quantity, is relatively small when morphisms, and mutations, so that researchers can gain a corresponding to the gene expression quantity, the upper full-scale understanding of the exon or junction. bound of the interval would be set to 0.05. Furthermore, the transcriptional pattern was also dis- played to reveal the splicing isoforms for a single gene and Results their constitutions. Notably, by double-clicking the tran- TSVdb use scriptional pattern, the expression of the isoforms in the Four dialogs were initially used to input the tumor type, different subgroups was shown using a box plot (Fig. 3), exon/junction and clinical data for a specific gene. After and the correlation of the isoforms with overall survival Sun et al. BMC Genomics (2018) 19:405 Page 4 of 7 was demonstrated by a KM-plot (Fig. 4), which indicated type could be downloaded into one file by clicking the promising use for clinical cancer researchers. The KM sur- download link. An illustration of the downloaded table is vival results described four different parts as follows. (1) shown in Additional file 1:TableS2 In the bottom right, there was a way to adjust the cut-off for grouping, where the knob could be adjusted to change Example analysis the cut-off. The default value was set to the middle of the The oncogene Rac Family Small GTPase 1 (RAC1) is isoform expression range. (2) The top right part displayed closely associated with tumorigenesis, tumor progression information on the groups, including the cut-off value and and therapy resistance [14–16], and RAC1 alternative sample size of each group. (3) The top left showed the sur- splicing is important for its regulatory role in cancers.[17]. vival line for each entered individual after filtering them As illustrated by RAC1 in colon cancer, RAC1 generates by the survival start time. The start time could be changed three splicing isoforms, and the fourth exon is skipped by adjusting the knob. (4) The bottom left was the KM- in normal tissues and included in tumor tissues (Fig. 2a). plot. Moreover, users could also click on the right y-axis or Consistently, it was revealed that the use of the junc- bottom x-axis to invoke input boxes and set the position tions linking exons 3 to 4 and 4 to 5 were high in tumor by inputting a specific value. The formatted exon/junction tissues (Additional file 1: Figure S3A). Similarly, by choos- quantification, transcript isoform expression, clinical vari- ing microsatellite instability (MSI) status as the pheno- ables and gene expression data for a gene in one tumor type, the results revealed that the fourth exon usage was Fig. 2 Visualization of the TCGA data for RAC1 in colon adenocarcinoma using TSVdb. a The exon usage results in the different RAC1 sample types. b The exon usage results for different RAC1 MSI statuses using TSVdb. c RAC1 isoforms and annotation of the fourth exon from the UCSC Genome Browser Sun et al. BMC Genomics (2018) 19:405 Page 5 of 7 high in the microsatellite stable (MSS) and microsatel- samples, which can also be achieved by ISOexpresso [11] lite instability-low (MSI-L) groups (Fig. 2b), suggesting its and TCGASpliceSeq [9], respectively (Table 1). potential role in DNA damage response [18]. Moreover, Moreover, TSVdb presents a better and more conve- annotation from the UCSC Genome Browser (Fig. 2c) nient visualization for users to assess exon/junction usage, showed that the DNA sequence of the fourth exon was transcript isoform expression, isoform pattern graph, and evolutionarily conserved in vertebrates, which indicated clinical information in one graph. This is the first time the importance of its function in tumor biology. that we have integrated comprehensive clinical data with The isoform expression variation for RAC1 was also TCGA alternative splicing analysis tools; this web-tool shown by the box plot (Fig. 3). Primary solid tumor tis- will help to perform comparative analysis across different sues had higher uc003spw.3 transcript expression, while tumor subgroups. The subgroups are defined by demo- there was lower uc003spx.3 transcript expression relative graphic data, clinical diagnosis data, treatment data and to normal tissues (Fig. 3a and b) Interestingly, the fourth follow-up data. exon was only included in isoform uc003spw.3. Further- By taking advantage of clinical data from TCGA, in the more, the MSI-H samples showed lower uc003spw.3 analysis example, the result of the gene RAC1 was shown. transcript expression but higher uc003spx.3 transcript We found that the expression of the fourth exon was high expression (Fig. 3c and d). Additionally, the KM-plots in adenocarcinoma tumor tissue. Moreover, it was discov- showed the correlation between uc003spw.3 and ered that exon usage mainly increased in tumor tissues uc003spx.3 with the overall survival of colon cancer from MSS and MSI-L patients. patients (Fig. 4), and the high uc003spw.3 expression was In the data visualization tactic aspect, Sashimi Plots correlated with poor prognosis. aremostcommonlyusedtovisualize alternativesplicing [19]. Sashimi Plots use read densities to represent the Discussion amount of reads alignment in exons or junctions. There TSVdb is a user-friendly interface for unearthing alterna- are some variations of Sashimi Plots. For example, tive splicing variations in 33 cancers. Similar to existing TCGASpliceSeq uses the number to show the reads quan- tools, TSVdb provides a comparison of isoform expres- tity and GTEx project uses the color gradient to quantify. sion and alternative splicing between tumor and normal However, Sashimi Plots cannot visualize many samples Fig. 3 Expression of RAC1 splicing isoforms in colon adenocarcinoma in TSVdb. a and b Differential expression of isoforms uc003spw.3 and uc003spx.3, respectively, in tumor and normal tissues. c and d Differential expression of isoforms uc003spw.3 and uc003spx.3, respectively, in different MSI statuses Sun et al. BMC Genomics (2018) 19:405 Page 6 of 7 as well and do not facilitate comparisons between sam- ples. Visualizing Alternative Splicing (Vials) (http://vials. io/vials/) resolved the problem using a complex multivari- able graph [20]. In TSVdb, the data visualization strategy was inspired by MEXPRESS [21], in which samples are plotted on the x-axis, the genome is arranged on the y- axis and comparisons between subgroups are achieved by sorting the samples by phenotypes. This strategy makes it possible to display data for hundreds of samples in a single figure with genome annotations. Although, as the price, TSVdb cannot display the exon and junction reads quan- tity simultaneously, it can take advantage of the big sample size in TCGA datasets. Conclusion In summary, we provided a web-based tool for splic- ing variants analysis. We believe that TSVdb offers researchers a quick and straightforward visualization tool to explore alternative splicing and isoform expression of target genes in clinical subgroups within the TCGA data. Availability and requirements Project name: TSVdb Project home page: http://www.tsvdb.com/ Operating system(s): Platform independent Programming language: R, Nodejs and perl 6 (server side scripts) Fig. 4 Kaplan-Meier plots showing the associations of the RAC1 Other requirements: Internet browser required for net- isoform (a) uc003spw.3 and (b) uc003spx.3 with overall patient work visualization survival License: Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/) Table 1 Summary of the features of the three databases, Any restrictions to use by non-academics: no restriction including SpliceSeq, ISOexpresso, and TSVdb, for visualizing the TCGA splice variant data Features TCGA SpliceSeq ISOexpresso TSVdb Additional files Splicing event Yes No No measurements Additional file 1: Supplementary figures and tables. Table S1. The clinical variable distribution. Table S2. The data download format. Figure S1. The Algorithm integrate Yes No No query dialogs for choosing the tumor type. Figure S2. The procedures for exon, junction reads calculating the RAC1 exon usage in colon adenocarcinoma. Figure S3. The Provide Screening Yes No No RAC1 junction usage in colon adenocarcinoma using TSVdb. Figure S4. result The Kaplan-Meier plots showing the associations of the RAC1 isoform uc003spw.3, uc003spx.3 with overall patient survival. (PDF 919 kb) Splicing pattern graph Yes No No Additional file 2: MongoDB database scheme. (TXT 2 kb) Isoform pattern graph No Yes Yes Isoform expression No Yes Yes Abbreviations ACC: Adrenocortical carcinoma; AS: Alternative splicing; BLCA: Bladder Exon quantification Yes No Yes urothelial carcinoma; BRCA: Breast invasive carcinoma; CESC: Cervical Junction Yes No Yes squamous cell carcinoma and endocervical adenocarcinoma; CHOL: quantification Cholangiocarcinoma; COAD: Colon adenocarcinoma; DDR: DNA damage response; DLBC: Lymphoid neoplasm diffuse large B-cell lymphoma; ESCA: Clinical data No No Yes Esophageal carcinoma; GBM: Glioblastoma multiforme; HNSC: Head and neck ∗ † Multi-omics No Partial Partial squamous cell carcinoma; KICH: Kidney Chromophobe; KIRC: Kidney renal clear cell carcinoma; KIRP: Kidney renal papillary cell carcinoma; LAML: Acute Genome-wide data Yes No No myeloid leukemia; LGG: Brain lower grade glioma; LIHC: Liver hepatocellular download carcinoma; LUAD: Lung adenocarcinoma; LUSC: Lung squamous cell ISOexpresso provided eQTL analysis carcinoma; MESO: Mesothelioma; MSI-H: microsatellite instability-high; MSI-L: TSVdb provided gene expression correlated AS analysis microsatellite instability-low; MSS: microsatellite stable; OV: Ovarian serous Sun et al. BMC Genomics (2018) 19:405 Page 7 of 7 cystadenocarcinoma; PAAD: Pancreatic adenocarcinoma; PCPG: 5. Anczuków O, Akerman M, Cléry A, Wu J, Shen C, Shirole NH, Raimer A, Pheochromocytoma and Paraganglioma; PRAD: Prostate adenocarcinoma; Sun S, Jensen MA, Hua Y, Allain FH-T, Krainer AR. SRSF1-Regulated RAC1: Rac family small GTPase 1; READ: Rectum adenocarcinoma; RSEM: Alternative Splicing in Breast Cancer. Mol Cell. 2015;60(1):105–17. RNA-Seq by expectation maximization; SARC: Sarcoma; SKCM: Skin cutaneous https://doi.org/10.1016/j.molcel.2015.09.005. Accessed 27 Oct 2017. melanoma; SRSF1: Serine and arginine rich splicing factor 1; SRSF6: Serine and 6. Wan L, Yu W, Shen E, Sun W, Liu Y, Kong J, Wu Y, Han F, Zhang L, Yu T, arginine rich splicing factor 6; STAD: Stomach adenocarcinoma; TCGA: The Zhou Y, Xie S, Xu E, Zhang H, Lai M. SRSF6-regulated alternative splicing cancer genome atlas; TGCT: Testicular germ cell tumors; THCA: Thyroid that promotes tumour progression offers a therapy target for colorectal carcinoma; THYM: Thymoma; UCEC: Uterine corpus endometrial carcinoma; cancer. Gut. 2017;2017–314983. https://doi.org/10.1136/gutjnl-2017- UCS: Uterine carcinosarcoma; VM: Uveal melanoma 314983. Accessed 18 Nov 2017. 7. Omenn GS, Guan Y, Menon R. A new class of protein cancer biomarker Acknowledgements candidates: Differentially expressed splice variants of ERBB2 (HER2/neu) We would like to thank Ledong Wan and Riccardo Fodde who tested this and ERBB1 (EGFR) in breast cancer cell lines. J Proteome. 2014;107:103–12. website and gave us very valuable feedback. The data generated by TCGA https://doi.org/10.1016/j.jprot.2014.04.012. Accessed 05 Sep 2017. Research Network (http://cancergenome.nih.gov/) has been used for TSVdb 8. Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, development. McPherson A, Szczesniak ´ MW, Gaffney DJ, Elo LL, Zhang X, Mortazavi A. A survey of best practices for RNA-seq data analysis. Genome Biol. 2016;17:13. Funding https://doi.org/10.1186/s13059-016-0881-8. Accessed 28 Jun 2016. This work is supported by grants from the National Natural Science 9. Ryan M, Wong WC, Brown R, Akbani R, Su X, Broom B, Melott J, Foundation of China (81672730 to H.Z. and 81572716 to M.L.), the 111 Project Weinstein J. TCGASpliceSeq a compendium of alternative mRNA splicing (B13026 to M.L.) and the Fundamental Research Funds for the Central in cancer. Nucleic Acids Res. 2015;1288. https://doi.org/10.1093/nar/ Universities (172210271 to H.Z.). These funding sources had no involvement in gkv1288 00001. Accessed 19 Dec 2016. the study design; the collection, analysis, or interpretation of data; the writing 10. Li B, Dewey CN. RSEM: accurate transcript quantification from RNA-Seq of the report; or the decision to submit the manuscript for publication. data with or without a reference genome. BMC Bioinformatics. 2011;12:323. https://doi.org/10.1186/1471-2105-12-323 Accessed 27 Oct 2017. Availability of data and materials 11. Yang IS, Son H, Kim S, Kim S. ISOexpresso: a web-based platform for TSVdb is freely available for academic or commercial use at isoform-level expression analysis in human cancer. BMC Genomics. http://www.tsvdb.com/. 2016;17:631. https://doi.org/10.1186/s12864-016-2852-6 Accessed 26 Oct 2017. Authors’ contributions 12. Bostock M, Ogievetsky V, Heer J. D3 Data-Driven Documents. IEEE Trans WS, ML, and HZ conceived the project. WS, PY, and KC designed and Vis Comput Graph. 2011;17(12):2301–9. https://doi.org/10.1109/TVCG. developed the TSVdb. DT and GZ participated in the designing tested the 2011.185. Accessed 30 Oct 2017. utility of the tool. WS and DT wrote and revised the manuscript. WS, TD, and 13. Cuperlovic-Culf M, Belacel N, Culf AS, Ouellette RJ. Data analysis of GZ collected and cleared up the TCGA data. All authors have reviewd and alternative splicing microarrays. Drug Discov Today. 2006;11(21):983–90. approved the final version of this manuscript. https://doi.org/10.1016/j.drudis.2006.09.011. Accessed 08 Sep 2017. 14. Ma Q, Cavallin LE, Yan B, Zhu S, Duran EM, Wang H, Hale LP, Dong C, Ethics approval and consent to participate Cesarman E, Mesri EA, Goldschmidt-Clermont PJ. Antitumorigenesis of Not applicable. antioxidants in a transgenic Rac1 model of Kaposi’s sarcoma. Proc Natl Acad Sci. 2009;106(21):8683–8. https://doi.org/10.1073/pnas.0812688106. Competing interests Accessed 01 Nov 2017. The authors declare that they have no competing interests. 15. Stallings-Mann ML, Waldmann J, Zhang Y, Miller E, Gauthier ML, Visscher DW, Downey GP, Radisky ES, Fields AP, Radisky DC. Matrix Publisher’s Note metalloproteinase induction of Rac1b, a key effector of lung cancer Springer Nature remains neutral with regard to jurisdictional claims in progression. Sci Transl Med. 2012;4(142):142–95. https://doi.org/10.1126/ published maps and institutional affiliations. scitranslmed.3004062. 16. Dokmanovic M, Hirsch DS, Shen Y, Wu WJ. Rac1 contributes to Author details trastuzumab resistance of breast cancer cells: Rac1 as a potential Department of Pathology, Key Laboratory of Disease Proteomics of Zhejiang therapeutic target for the treatment of trastuzumab-resistant breast Province, School of Medicine, Zhejiang University, 310058 Hangzhou, China. cancer. Mol Cancer Ther. 2009;8(6):1557–69. https://doi.org/10.1158/ Department of Toxicology, School of Medicine, Zhejiang University, 310058 1535-7163.MCT-09-0140. Accessed 01 Nov 2017. Hangzhou, China. Hikvision Digital Technology, 310051 Hangzhou, China. 17. Fu X-D. Both sides of the same coin: Rac1 splicing regulation by EGF Department of Integrative Biology and Physiology, University of California, signaling. Cell Res. 2017;27(4):455–6. https://doi.org/10.1038/cr.2017.19. Los Angeles, Los Angeles, CA, USA. Accessed 01 Nov 2017. 18. Vilar E, Gruber SB. Microsatellite instability in colorectal cancer—the Received: 27 November 2017 Accepted: 10 May 2018 stable evidence. Nat Rev Clin Oncol. 2010;7(3):2009–237. https://doi.org/ 10.1038/nrclinonc.2009.237. Accessed 20 Nov 2017. 19. Katz Y, Wang ET, Silterra J, Schwartz S, Wong B, Thorvaldsdóttir H, Robinson JT, Mesirov JP, Airoldi EM, Burge CB. Quantitative visualization References of alternative exon expression from RNA-seq data. Bioinformatics. 1. Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, 2015;31(14):2400–2. https://doi.org/10.1093/bioinformatics/btv034. Kingsmore SF, Schroth GP, Burge CB. Alternative isoform regulation in Accessed 11 Apr 2018. human tissue transcriptomes. Nature. 2008;456(7221):470–6. https://doi. 20. Strobelt H, Alsallakh B, Botros J, Peterson B, Borowsky M, Pfister H, Lex A. org/10.1038/nature07509. Accessed 04 Sep 2017. Vials: Visualizing Alternative Splicing of Genes. IEEE Trans Vis Comput 2. Blencowe BJ. The Relationship between Alternative Splicing and Graph. 2016;22(1):399–408. https://doi.org/10.1109/TVCG.2015.2467911. Proteomic Complexity. Trends Biochem Sci. 2017;42(6):407–8. https://doi. 21. Koch A, De Meyer T, Jeschke J, Van Criekinge W. MEXPRESS: visualizing org/10.1016/j.tibs.2017.04.001. Accessed 20 Nov 2017. expression, DNA methylation and clinical TCGA data. BMC Genomics. 3. Skotheim RI, Nees M. Alternative splicing in cancer: Noise, functional, or 2015;16:636. https://doi.org/10.1186/s12864-015-1847-z. Accessed 16 systematic? Int J Biochem Cell Biol. 2007;39(7):1432–49. https://doi.org/ Nov 2016. 10.1016/j.biocel.2007.02.016. Accessed 04 Sep 2017. 4. Sveen A, Kilpinen S, Ruusulehto A, Lothe RA, Skotheim RI. Aberrant RNA splicing in cancer; expression changes and driver mutations of splicing factor genes. Oncogene. 2016;35(19):2413–27. https://doi.org/10.1038/ onc.2015.318. Accessed 04 Sep 2017.

Journal

BMC GenomicsSpringer Journals

Published: May 29, 2018

There are no references for this article.

You’re reading a free preview. Subscribe to read the entire article.


DeepDyve is your
personal research library

It’s your single place to instantly
discover and read the research
that matters to you.

Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.

All for just $49/month

Explore the DeepDyve Library

Search

Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly

Organize

Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.

Access

Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals.

Your journals are on DeepDyve

Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more.

All the latest content is available, no embargo periods.

See the journals in your area

DeepDyve

Freelancer

DeepDyve

Pro

Price

FREE

$49/month
$360/year

Save searches from
Google Scholar,
PubMed

Create lists to
organize your research

Export lists, citations

Read DeepDyve articles

Abstract access only

Unlimited access to over
18 million full-text articles

Print

20 pages / month

PDF Discount

20% off