SEGreg: a database for human specifically expressed genes and their regulations in cancer and normal tissue

Abstract
Human specifically expressed genes (SEGs) often serve as potential biomarkers for disease diagnosis and treatment. However, the regulation underlying their specific expression remains largely unexplored. In this study, we constructed the SEG regulation database (SEGreg; available at http://bioinfo.life.hust.edu.cn/SEGreg) to present SEGs and their regulation by transcription factors (TFs) and microRNAs (miRNAs) under different physiological conditions, including normal tissue, cancer tissue and cell lines. In total, SEGreg contains 6387, 1451, 4506 and 5320 SEGs identified from the expression profiles of 34 cancer types and 55 tissues in The Cancer Genome Atlas, the Cancer Cell Line Encyclopedia, the Human Body Map and the Genotype-Tissue Expression databases/projects, respectively. The miRNAs and TFs expressed in each cancer or tissue were identified from miRNA and gene expression profiles, and their targets were collected from several public resources. The regulatory networks of all SEGs were then constructed and integrated into SEGreg. Through a user-friendly interface, users can browse and search SEGreg by gene name, data source, tissue, cancer type and regulator. In summary, SEGreg is a specialized resource for exploring SEGs and their regulation, and it provides clues to the mechanisms of carcinogenesis and other biological processes.

Keywords: SEGreg, miRNA, transcription factor, specific expression, regulation, network

Introduction
Specifically expressed genes (SEGs), which are expressed in one or a small number of tissues or physiological conditions, can serve as biomarkers and provide important clues to gene function and tissue-specific characteristics [1]. In recent years, high-throughput sequencing has provided abundant data for examining gene expression patterns.
Common approaches such as correlated expression, differential expression and specific expression analysis have been widely used to identify such genes [2]. In addition, more sophisticated algorithms such as PaGeFinder [3] and SpeCond [4] have been developed to screen for tissue-specific genes. Several resources have focused on SEGs and their functions, including TissGDB [5], PaGenBase [6], HOMER [1], C-It [7] and TiGER [8]. However, most of them give access to SEGs for only a limited range of tissues or diseases, and some derive their data from expressed sequence tags or text mining, which are less accurate than current high-throughput sequencing data. Moreover, the regulatory mechanisms underlying specific expression in physiology and pathogenesis have not been systematically investigated. TissGDB is a recently developed tissue-specific gene expression database containing SEGs from the public resources HPA, TiGER and Genotype-Tissue Expression (GTEx), but it identifies SEGs with a different method from the one used here, so its SEG lists are not equivalent to those in the SEG regulation database (SEGreg); it also covers fewer tissue and cancer types than SEGreg. In tests on GTEx data, our SEGtool algorithm (http://bioinfo.life.hust.edu.cn/SEGtool/) [9] outperformed the most prevalent SEG detection methods, including Z-score [10], ROKU [11], SpeCond [4] and PaGeFinder [3], and produced more precise results. The two databases also differ in how they investigate mechanisms: using The Cancer Genome Atlas (TCGA) data, TissGDB performed gene expression, somatic mutation and co-expressed protein interaction network analyses, whereas this study performed microRNA (miRNA) and transcription factor (TF) regulatory network analysis. miRNAs and TFs are two major classes of regulators that have drawn extensive attention in disease and physiology.
They play pivotal roles in tumor proliferation, differentiation, invasion and metastasis, and they affect development, metabolism, apoptosis and other biological processes in normal tissues [12]. Regulatory networks have been widely studied as a means of exploring the molecular mechanisms of complex diseases and have proven effective for elucidating gene function and interactions [13]. In our previous studies, miRNA and TF regulatory networks were used to reveal complex molecular mechanisms in disease and cell development. For example, Ye et al. [14] used a TF and miRNA co-regulatory network analysis to reveal that miR-19 inhibits CYLD in T-cell acute lymphoblastic leukemia, and Lin et al. [15] investigated the TF and miRNA co-regulatory network in B cell and T cell development. Analyzing TF and miRNA regulation can therefore illuminate complex gene regulatory relationships in normal tissues and diseases at a systems level. In this study, to provide a comprehensive view of SEG regulation in human cancers, cancer cell lines and normal tissues, we first identified SEGs from four large-scale data sets using SEGtool, then analyzed their miRNA and TF regulation in each cancer type or normal tissue, and finally designed the SEGreg database to store and present the results. SEGreg is a freely available resource of accurate SEGs, their expression and their regulatory networks, intended to facilitate research on gene function in disease and in specific biological processes.

Materials and methods

Obtaining SEGs and constructing networks
Cancer-related expression profiles were downloaded from TCGA (https://tcga-data.nci.nih.gov/tcga) [16] and the Cancer Cell Line Encyclopedia (CCLE) [17], while normal tissue expression data were downloaded from the Human Body Map (BodyMap) [18] and the GTEx project [19].
In the CCLE data set, cell lines derived from the same tissue were treated as parallel samples of that tissue. In SEGreg, SEGs were identified with SEGtool from 11 092 TCGA samples of 34 cancer types, 1036 CCLE samples of 24 tissues, 141 BodyMap samples of 35 tissues and 8555 GTEx samples of 31 tissues. Expression levels of biological replicates from the same tissue were combined into a single representative value with the ‘deal_replicated’ function in SEGtool. Matched miRNA expression profiles for the TCGA cancers were downloaded from FireBrowse (http://www.firebrowse.org), and miRNA expression profiles of normal tissues were taken from our HMED database [20]. Specifically expressed TFs of each cancer/tissue were identified from the corresponding gene expression matrix. Genes were retained if their expression, measured in reads or fragments per kilobase of transcript per million mapped reads (RPKM/FPKM), exceeded 3.0 in at least one tissue or cancer type, while the expression threshold for miRNAs and TFs was set to RPKM/FPKM > 10.0.

Networks were constructed from the SEGs together with miRNA and TF targets collected from several databases. For miRNAs, we collected experimentally verified and predicted targets as described in our previous paper [21]; these mainly comprise the overlapping results of miRecords (v4), miRTarBase (v6), TarBase (v7), TargetScanHuman (v7.1) and miRWalk (v2). Human TF targets were extracted from the TRANSFAC database. Regulatory network data were generated from these sources with in-house scripts. Within each network, the TFs, miRNAs and SEGs all come from the same cancer type or tissue.

Database and Web site implementation
The web application was written with Flask, a Python micro-framework, and its extensions. The processed data were imported into a MySQL relational database (version 5.7.18), and all stored data are accessed through a series of Python functions that dynamically generate content for the Apache web server (version 2.4.18).
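The expression thresholds and the same-tissue rule described in the methods above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' in-house scripts: the dictionary layouts and the function names `collapse_replicates`, `filter_genes`, `build_network` and `dual_regulated` are hypothetical, and averaging replicates with a plain mean is only an assumed stand-in for SEGtool's ‘deal_replicated’, whose internals are not reproduced here.

```python
from statistics import mean

GENE_MIN = 3.0        # genes: RPKM/FPKM > 3.0 in at least one tissue/cancer
REGULATOR_MIN = 10.0  # miRNAs and TFs: RPKM/FPKM > 10.0 in the network's tissue


def collapse_replicates(samples):
    """Average replicate values per gene within each tissue.

    Hypothetical stand-in for SEGtool's deal_replicated, which may summarize
    replicates differently. samples: {tissue: {gene: [replicate values]}}
    """
    return {t: {g: mean(v) for g, v in genes.items()} for t, genes in samples.items()}


def filter_genes(expr):
    """Genes whose expression exceeds GENE_MIN in at least one tissue."""
    return {g for per_tissue in expr.values()
            for g, value in per_tissue.items() if value > GENE_MIN}


def build_network(tissue, segs, gene_expr, mirna_expr, mirna_targets, tf_targets):
    """Return (regulator, target, kind) edges for one tissue.

    Only regulators expressed above REGULATOR_MIN in this same tissue are used,
    so the miRNAs, TFs and SEGs of a network all come from one tissue.
    """
    edges = []
    for mir, targets in sorted(mirna_targets.items()):
        if mirna_expr.get(tissue, {}).get(mir, 0.0) > REGULATOR_MIN:
            edges += [(mir, g, "miRNA") for g in sorted(targets & segs)]
    for tf, targets in sorted(tf_targets.items()):
        if gene_expr.get(tissue, {}).get(tf, 0.0) > REGULATOR_MIN:
            edges += [(tf, g, "TF") for g in sorted(targets & segs)]
    return edges


def dual_regulated(edges):
    """SEGs targeted by at least one miRNA and at least one TF."""
    mirna_hit = {g for _, g, kind in edges if kind == "miRNA"}
    tf_hit = {g for _, g, kind in edges if kind == "TF"}
    return mirna_hit & tf_hit


# Toy example: all names and values are placeholders, not SEGreg data.
samples = {
    "brain": {"SEG_A": [40.0, 44.0], "SEG_B": [6.0, 8.0], "TF_X": [12.0, 14.0], "LOW_G": [1.0, 2.0]},
    "liver": {"SEG_A": [0.2, 0.4], "SEG_B": [0.5, 0.7], "TF_X": [1.0, 1.2], "LOW_G": [2.0, 2.5]},
}
gene_expr = collapse_replicates(samples)
mirna_expr = {"brain": {"miR-p": 25.0, "miR-q": 4.0}}
mirna_targets = {"miR-p": {"SEG_A", "SEG_B"}, "miR-q": {"SEG_A"}}
tf_targets = {"TF_X": {"SEG_A", "OTHER"}}
segs = {"SEG_A", "SEG_B"}  # assume these were flagged as SEGs in brain

print(sorted(filter_genes(gene_expr)))  # LOW_G never exceeds 3.0, so it is dropped
brain_edges = build_network("brain", segs, gene_expr, mirna_expr, mirna_targets, tf_targets)
print(brain_edges)                      # miR-q is below 10.0 in brain, so it contributes no edges
print(dual_regulated(brain_edges))
```

Counting `dual_regulated` per tissue and taking the union over tissues is one plausible way to arrive at a per-data-set figure such as "SEGs regulated by both miRNA and TF", though how SEGreg aggregates that count is not specified here.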
The Bootstrap extension was used to provide a user-friendly web interface, and the Cytoscape Web plug-in [22] was integrated to display regulatory networks online.

Results

Data summary in SEGreg
In SEGreg, four SEG data sets were generated by analyzing gene expression profiles from the TCGA, CCLE, BodyMap and GTEx databases/projects. In total, SEGreg includes 6387, 1451, 4506 and 5320 SEGs, derived from 34 cancer types and from 24, 35 and 31 tissues, respectively (Table 1). Among the cancer data sets, TCGA and CCLE share 806 SEGs, and among the normal data sets, BodyMap and GTEx share 2734 SEGs; because the lists differ substantially, each data set contributes distinct SEGs to the database. Interestingly, 357, 12, 12 and 16 genes in the four data sets are both high and low SEGs (Table 1): within a data set, a gene that is specifically highly expressed in one cancer/tissue can be identified as specifically lowly expressed in another. We further identified 314, 368, 448 and 442 miRNAs and 764, 680, 463 and 455 TFs as regulators of the SEGs in the four data sets, respectively (Table 1). In the TCGA data set, 4289 SEGs are regulated by both miRNAs and TFs; in the other three data sets, the numbers are 283, 1530 and 1903. The numbers of miRNA–gene and TF–gene regulation pairs are listed in Table 1, and the number of regulation pairs in each cancer type/tissue is shown in Figure 1. The most complex regulatory networks occur in LAML (acute myeloid leukemia) among the TCGA cancers (Figure 1A) and in autonomic ganglia among the CCLE tissues (Figure 1B), while among normal tissues, brain in BodyMap (Figure 1C) and testis in GTEx (Figure 1D) have the most regulatory relationships. A gene can be regulated by several miRNAs and TFs and, if it is itself a TF, can also regulate several other genes. In addition, SEGreg allows users to construct synergistic regulatory networks of miRNAs and TFs in real time.
We also found that a gene may act as an SEG in multiple tissues/cancer types simultaneously, with a different regulatory network in each.

Table 1. Overview of SEGs and their regulators in SEGreg

Data type                   TCGA      CCLE      BodyMap   GTEx
Cancer/tissue number        34        24        35        31
High-SEGs                   5724      1278      4401      5056
Low-SEGs                    1020      185       117       280
High-low-common             357       12        12        16
Unique SEGs                 6387      1451      4506      5320
miRNAs                      314       368       448       442
TFs                         764       680       463       455
miRNA–gene pairs            83 607    5724      18 321    18 550
TF–gene pairs               466 288   126 170   58 889    59 760
miRNA and TF target SEGs    4289      283       1530      1903

Figure 1. The statistics of miRNA–gene and TF–gene pairs obtained from the four data sets: (A) TCGA; (B) CCLE; (C) BodyMap; (D) GTEx.
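Because a gene can be a high SEG in one cancer/tissue and a low SEG in another, the "Unique SEGs" row of Table 1 should follow inclusion–exclusion: unique = high + low − high-low-common. The short check below uses only the counts reported in Table 1; the dictionary layout and the helper name `unique_segs` are illustrative choices.

```python
# Counts copied from Table 1: (High-SEGs, Low-SEGs, High-low-common, Unique SEGs)
TABLE1 = {
    "TCGA": (5724, 1020, 357, 6387),
    "CCLE": (1278, 185, 12, 1451),
    "BodyMap": (4401, 117, 12, 4506),
    "GTEx": (5056, 280, 16, 5320),
}


def unique_segs(high, low, common):
    """|High ∪ Low| = |High| + |Low| - |High ∩ Low| (inclusion-exclusion)."""
    return high + low - common


for dataset, (high, low, common, unique) in TABLE1.items():
    computed = unique_segs(high, low, common)
    print(f"{dataset}: {high} + {low} - {common} = {computed} (table: {unique})")
# first line printed: TCGA: 5724 + 1020 - 357 = 6387 (table: 6387)
```

All four data sets satisfy the identity, which confirms that the "High-low-common" genes are counted once in the unique SEG totals.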
Functions of cancer and tissue SEGs
In SEGreg, cancer SEGs were identified from TCGA and CCLE, and normal tissue SEGs from the BodyMap and GTEx expression profiles. To characterize the SEGs and their biological functions, we focused on the cancer types/tissues with the most SEGs. Of the 6387 SEGs identified from 34 TCGA cancer types, LAML (acute myeloid leukemia) accounts for the largest proportion, with 1623 (39.3%) SEGs. To reveal the potential functions of these SEGs in LAML, we performed Gene Ontology enrichment analysis using DAVID (https://david.ncifcrf.gov/). The results (Figure 2) showed that genes specifically highly expressed in LAML mainly participate in the activation, differentiation and proliferation of leukocytes, lymphocytes, myeloid cells and mononuclear cells, as well as in cell activation involved in immune responses. Genes specifically lowly expressed in LAML are mainly involved in the development and morphogenesis of blood vessels, the heart and the cardiovascular system. The other cancer-related data set, CCLE, contains 24 cancer tissues. Autonomic ganglia, small intestine, and hematopoietic and lymphoid tissue rank highest, with 351 (24.2%), 320 (22.0%) and 288 (19.8%) SEGs, respectively. SEGs in autonomic ganglia mainly function in neurogenesis and in neuron differentiation and development; those in small intestine in tissue development and in cell differentiation, development and morphogenesis; and those in hematopoietic and lymphoid tissue in immune responses and in the activation, differentiation and proliferation of leukocytes, lymphocytes, B cells and T cells.
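The Gene Ontology enrichment step asks whether a term is over-represented among SEGs relative to a background gene set. DAVID reports an EASE score, which its documentation describes as a more conservative modification of the Fisher exact test; that modification is not reproduced here. As a generic sketch only, the one-sided hypergeometric test below computes the probability of seeing at least `k` annotated genes in a list of `n`, drawn from a background of `N` genes of which `K` carry the term. The function name is hypothetical, the counts in the example are invented, and no multiple-testing correction is applied.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    k: genes in the list (e.g. SEGs of one cancer type) annotated to the term
    n: genes in the list
    K: background genes annotated to the term
    N: background genes
    """
    upper = min(n, K)  # X cannot exceed the list size or the term size
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Invented counts: 3 of 4 list genes carry a term that covers 5 of 20 background genes.
p = enrichment_p(3, 4, 5, 20)
print(f"P(X >= 3) = {p:.4f}")
```

With realistic inputs (thousands of background genes and hundreds of GO terms), the resulting p-values would then be adjusted for multiple testing, as enrichment tools such as DAVID do.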
Statistical analysis of the normal tissues in the BodyMap and GTEx data sets showed that testis possesses the most SEGs in both, especially highly expressed SEGs; these genes play key roles in reproduction, meiotic nuclear division, gamete generation and germ cell development. Interestingly, testis has no lowly expressed SEGs. Our previous study pointed out that SEGs in testis are strongly associated with organ-specific functions [9].

Figure 2. Gene Ontology enrichment of SEGs in LAML.

Web interface
Through a user-friendly interface, users can browse or search SEGs, miRNAs and TFs in the various cancer types/tissues of each data set, and download and display synergistic regulatory networks. When browsing the SEG list of a data set, results are displayed in a table (Figure 3A), and detailed information on each SEG is shown by clicking the ‘detail’ button. When browsing by cancer type/tissue, a bar chart shows the numbers of highly and lowly expressed SEGs (Figure 3B); clicking the ‘high’ or ‘low’ bar opens the corresponding list of highly or lowly expressed SEGs (similar to Figure 3A) for that cancer type/tissue. Users can also construct and browse regulatory networks online by clicking the ‘network’ button in the table (Figure 3C). If a gene is specifically expressed in several cancer types/tissues, the resulting page contains multiple regulatory networks. Clicking the ‘high’ or ‘low’ tag in the table shows a bar chart of the gene's expression levels across tissues. All data can be sorted in ascending or descending order by cancer/tissue/gene/miRNA/TF name. On the ‘Search’ page, users can search by data source and/or tissue and/or gene, or search by regulator in the different data sets.
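The ‘Search’ page combines optional filters (data source and/or tissue and/or gene). One way to express such a query against a relational table is sketched below, with Python's built-in `sqlite3` module standing in for MySQL so the example is self-contained. The table name `seg`, its columns and the function `search_segs` are hypothetical and do not describe SEGreg's actual schema; the three rows are taken from the case studies in this article.

```python
import sqlite3


def search_segs(conn, dataset=None, tissue=None, gene=None):
    """Return (dataset, tissue, gene, direction) rows matching the supplied filters.

    Filters left as None are ignored. Values are bound as query parameters;
    only the fixed column names below are placed into the SQL text.
    """
    clauses, params = [], []
    for column, value in (("dataset", dataset), ("tissue", tissue), ("gene", gene)):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    sql = "SELECT dataset, tissue, gene, direction FROM seg"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY dataset, tissue, gene"
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seg (dataset TEXT, tissue TEXT, gene TEXT, direction TEXT)")
conn.executemany(
    "INSERT INTO seg VALUES (?, ?, ?, ?)",
    [
        ("TCGA", "COAD", "LGALS3", "high"),
        ("TCGA", "READ", "LGALS3", "high"),
        ("GTEx", "Brain", "NEUROD2", "high"),
    ],
)
print(search_segs(conn, gene="LGALS3"))
print(search_segs(conn, dataset="GTEx", tissue="Brain"))
```

Building the WHERE clause from only the filters that were supplied keeps a single function usable for every combination offered on the search form.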
SEGreg also provides a quick search function on every page: entering a tissue, cancer, gene or miRNA name returns a comprehensive result containing all matching information from the four data sets. All expression and regulation data can be downloaded from the ‘Download’ page. A function on the homepage also lets users construct synergistic regulatory networks online from any chosen combination of miRNAs and/or TFs (Figure 3D and E). In our testing, SEGreg works well on all major browsers.

Figure 3. Browse SEGs and their regulations by gene, cancer type, tissue, miRNA and TF, and construct regulatory networks in each cancer type/tissue. (A) Results of browsing by gene in each data set. (B) Browsing highly or lowly expressed SEGs in each cancer type/tissue. By clicking a bar in the plot, users can get the corresponding highly or lowly expressed SEGs and further investigate their expression levels or miRNA/TF regulatory networks. (C) An example of an SEG regulatory network in the SEGreg database. Colors: yellow, SEG; blue, miRNA; purple, TF; green, target genes of the SEG if the SEG is itself a TF. (D) Construction of miRNA and/or TF co-regulatory networks. (E) An example of a co-regulatory network. Colors: green, SEG; yellow, miRNA; blue, TF.
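Constructing a co-regulatory network from a chosen combination of miRNAs and/or TFs can be viewed as a set operation over each regulator's targets within one tissue. The article does not state whether SEGreg keeps only the targets shared by all selected regulators or the union of their targets, so the hypothetical `co_regulatory_network` function below offers both through a `mode` parameter. The regulator and gene names in the example are placeholders.

```python
def co_regulatory_network(targets, selected, mode="shared"):
    """Return sorted (regulator, target) edges for the selected regulators.

    targets:  {regulator: set of target SEGs in one tissue}
    selected: regulator names chosen by the user (miRNAs and/or TFs)
    mode:     "shared" keeps targets hit by every selected regulator,
              "union" keeps targets hit by any of them.
    """
    if mode not in ("shared", "union"):
        raise ValueError("mode must be 'shared' or 'union'")
    chosen = [r for r in selected if r in targets]  # ignore unknown regulators
    if not chosen:
        return []
    sets = [targets[r] for r in chosen]
    keep = set.intersection(*sets) if mode == "shared" else set.union(*sets)
    return sorted((r, g) for r in chosen for g in targets[r] if g in keep)


# Placeholder regulators and targets for a single tissue.
targets = {
    "TF_1": {"G1", "G2", "G3"},
    "TF_2": {"G2", "G3", "G4"},
    "miR_a": {"G3", "G5"},
}
print(co_regulatory_network(targets, ["TF_1", "TF_2"]))
print(co_regulatory_network(targets, ["TF_1", "TF_2", "miR_a"]))
print(co_regulatory_network(targets, ["TF_1", "miR_a"], mode="union"))
```

Running the same selection against each tissue's target sets in turn would yield one network per cancer type, in the spirit of the per-cancer ATF3 and C/EBPα networks discussed in the case studies.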
Gene regulation and case studies
Some SEGs and regulators have been shown in published studies to be crucial in the occurrence and development of disease. For example, LGALS3 (galectin-3), which in this study is a highly expressed SEG in three cancer types (COAD, colon adenocarcinoma; KICH, kidney chromophobe; READ, rectum adenocarcinoma; Figure 4A) and a lowly expressed SEG in normal liver tissue, has recently been reported to promote colorectal cancer metastasis and to act as a diagnostic marker in multiple diseases [23, 24]. It is common for the same oncogenic gene to play different roles in different cancer types [5]. LGALS3 is co-regulated by 34 TFs in COAD, KICH and READ (Figure 4B), some of which, such as ATF3 and C/EBPα, are tumor suppressors. To further understand the co-regulation by ATF3 and C/EBPα, we constructed synergistic regulatory networks in these three cancer types and found that they target 17, 43 and 17 genes in COAD, KICH and READ, respectively. Some SEGs play key roles in tissue development. For example, NEUROD2 and NEUROD6 are key regulators of brain function [25], and both are highly expressed SEGs in brain tissue in the GTEx data set. In brain, they are regulated by many TFs (Figure 4C and D), including SOX9, a specific marker of neural stem cells that is expressed in the adult cerebellum [26]. Other published regulatory relationships are also recovered by our network analysis. For example, E2F3 can be suppressed by miR-424 in hepatocellular carcinoma [27], and hsa-let-7i, which plays a crucial role in colorectal cancer metastasis, targets eight genes including SOX13, SLC25A4 and SEMA4F [28].
These findings should help users investigating the regulatory mechanisms of cancers.

Figure 4. Expression level of LGALS3 in different cancer types and regulatory networks of NEUROD2 and NEUROD6 in GTEx brain tissue. (A) Among 35 cancer types, LGALS3 is highly expressed in COAD, KICH and READ. (B) Venn diagram of TFs that target LGALS3 in COAD, KICH, READ and normal liver tissue. (C) Regulatory network of NEUROD2 in GTEx brain tissue. (D) Regulatory network of NEUROD6 in GTEx brain tissue. Colors: yellow, gene; blue, miRNA; purple, TF; green, target genes of the SEG if the SEG is itself a TF.

Summary and perspective
SEGreg is a specialized database focusing on SEGs and their miRNA and TF regulation in different cancer types/tissues. To build it, we collected RNA expression profiles, miRNA expression profiles and regulator data from various sources covering extensive samples. SEGs were detected with SEGtool, a recently developed tool based on fuzzy c-means, the Jaccard index and a greedy annealing method, which showed higher specificity and accuracy than existing major tools. miRNA and TF regulation in each cancer/tissue was then analyzed and regulatory networks were constructed. Finally, the SEGreg database was designed to archive and present all of this information.
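SEGtool is described above as combining fuzzy c-means, the Jaccard index and a greedy annealing method. As a rough illustration of the first ingredient only, the sketch below runs a generic one-dimensional fuzzy c-means (fuzzifier m = 2) on one gene's expression values across tissues, so that a tissue with outlying high expression receives a high membership in the cluster with the larger center. It is not SEGtool's implementation: the function name, the initialization, the stopping rule and the idea of reading "high" membership as a specificity signal are assumptions, and the expression values are invented.

```python
def fuzzy_cmeans_1d(values, c=2, m=2.0, max_iter=100, tol=1e-6):
    """Generic fuzzy c-means for 1-D data.

    Returns (centers, memberships); memberships[j][i] is the degree to which
    values[j] belongs to cluster i, and each row sums to 1.
    """
    if c < 2 or m <= 1.0:
        raise ValueError("need c >= 2 and m > 1")
    lo, hi = min(values), max(values)
    # Spread the initial centers evenly between the smallest and largest value.
    centers = [lo + (hi - lo) * i / (c - 1) for i in range(c)]
    power = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        memberships = []
        for x in values:
            dist = [abs(x - v) for v in centers]
            if 0.0 in dist:
                # The point sits exactly on a center: give it full membership there.
                row = [1.0 if d == 0.0 else 0.0 for d in dist]
                total = sum(row)
                row = [r / total for r in row]
            else:
                row = [1.0 / sum((dist[i] / dist[k]) ** power for k in range(c))
                       for i in range(c)]
            memberships.append(row)
        # Each new center is the membership^m-weighted mean of the data.
        new_centers = []
        for i in range(c):
            weights = [row[i] ** m for row in memberships]
            new_centers.append(sum(w * x for w, x in zip(weights, values)) / sum(weights))
        shift = max(abs(a - b) for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships


# Invented expression values (RPKM) of one gene across five tissues.
tissues = ["brain", "liver", "lung", "heart", "testis"]
values = [1.0, 1.2, 0.9, 1.1, 50.0]
centers, memberships = fuzzy_cmeans_1d(values)
high = max(range(len(centers)), key=lambda i: centers[i])  # index of the high cluster
for tissue, row in zip(tissues, memberships):
    print(tissue, round(row[high], 3))
```

In this toy case only testis has a membership near 1 in the high cluster; a real SEG detector must also decide how many clusters to use and how to score specificity across many genes, which is where SEGtool's other components come in.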
To explore further mechanisms of specific expression and enrich the content of SEGreg, we plan to investigate the methylation and mutation of SEGs. SEGreg will be updated regularly as public data grow. We anticipate that SEGreg will provide comprehensive information and useful clues for researchers focusing on tissue-specific function, gene expression and regulation, and biomarker discovery in cancers and tissues.

Key Points
Focuses on specifically high- and low-expressed genes in both cancer and normal tissues.
Identifies more precise and complete SEGs from four public resources using SEGtool.
Uses network analysis to clarify gene expression mechanisms.
Provides a specialized SEG regulation database (SEGreg).

Funding
This research was supported by the National Natural Science Foundation of China (No. 31471247 and 31771458) and an Open Project funded by the Key Laboratory of Carcinogenesis and Translational Research, Ministry of Education/Beijing (2017 Open Project-6).

Qin Tang is a Postdoctoral Fellow in the College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China. Her research interests include the development of databases and bioinformatics tools, and cancer genomics.
Qiong Zhang is a PhD candidate in bioinformatics in the College of Life Science and Technology, Huazhong University of Science and Technology, China.
Yao Lv is a master's student in bioinformatics in the College of Life Science and Technology, Huazhong University of Science and Technology, China.
Ya-Ru Miao is a master's student in bioinformatics in the College of Life Science and Technology, Huazhong University of Science and Technology, China.
An-Yuan Guo is a Professor at the Key Laboratory of Molecular Biophysics of the Ministry of Education, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, China.
His research interests include TF and ncRNA regulation in diseases, the development of databases and bioinformatics algorithms, and cancer genomics.

References
1. Zhang F, Chen JY. HOMER: a human organ-specific molecular electronic repository. BMC Bioinformatics 2011;12(Suppl 10):S4.
2. Xiao SJ, Zhang C, Zou Q, et al. TiSGeD: a database for tissue-specific genes. Bioinformatics 2010;26(9):1273–5. doi:10.1093/bioinformatics/btq109
3. Pan JB, Hu SC, Wang H, et al. PaGeFinder: quantitative identification of spatiotemporal pattern genes. Bioinformatics 2012;28(11):1544–5. doi:10.1093/bioinformatics/bts169
4. Cavalli FM, Bourgon R, Vaquerizas JM, et al. SpeCond: a method to detect condition-specific gene expression. Genome Biol 2011;12(10):R101.
5. Kim P, Park A, Han G, et al. TissGDB: tissue-specific gene database in cancer. Nucleic Acids Res 2017. doi:10.1093/nar/gkx850
6. Pan JB, Hu SC, Shi D, et al. PaGenBase: a pattern gene database for the global and dynamic understanding of gene function. PLoS One 2013;8(12):e80747.
7. Gellert P, Jenniches K, Braun T, et al. C-It: a knowledge database for tissue-enriched genes. Bioinformatics 2010;26(18):2328–33. doi:10.1093/bioinformatics/btq417
8. Liu X, Yu X, Zack DJ, et al. TiGER: a database for tissue-specific gene expression and regulation. BMC Bioinformatics 2008;9:271. doi:10.1186/1471-2105-9-271
9. Zhang Q, Liu W, Liu C, et al. SEGtool: a specifically expressed gene detection tool and applications in human tissue and single-cell sequencing data. Brief Bioinform 2017. doi:10.1093/bib/bbx074
10. Cheadle C, Vawter MP, Freed WJ, et al. Analysis of microarray data using Z score transformation. J Mol Diagn 2003;5(2):73–81. doi:10.1016/S1525-1578(10)60455-2
11. Kadota K, Ye J, Nakai Y, et al. ROKU: a novel method for identification of tissue-specific genes. BMC Bioinformatics 2006;7(1):294. doi:10.1186/1471-2105-7-294
12. Salehi Z, Akrami H. Target genes prediction and functional analysis of microRNAs differentially expressed in gastric cancer stem cells MKN-45. J Cancer Res Ther 2017;13:477–83.
13. Saha A, Kim Y, Gewirtz ADH, et al. Co-expression networks reveal the tissue-specific regulation of transcription and splicing. Genome Res 2017;27:1843–58. doi:10.1101/gr.216721.116
14. Ye H, Liu X, Lv M, et al. MicroRNA and transcription factor co-regulatory network analysis reveals miR-19 inhibits CYLD in T-cell acute lymphoblastic leukemia. Nucleic Acids Res 2012;40(12):5201–14. doi:10.1093/nar/gks175
15. Lin Y, Zhang Q, Zhang HM, et al. Transcription factor and miRNA co-regulatory network reveals shared and specific regulators in the development of B cell and T cell. Sci Rep 2015;5(1):15215. doi:10.1038/srep15215
16. Tomczak K, Czerwińska P, Wiznerowicz M. The Cancer Genome Atlas (TCGA): an immeasurable source of knowledge. Contemp Oncol 2015;19(1A):A68–77.
17. Barretina J, Caponigro G, Stransky N, et al. The cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 2012;483(7391):603–7. doi:10.1038/nature11003
18. Hishiki T, Kawamoto S, Morishita S, et al. BodyMap: a human and mouse gene expression database. Nucleic Acids Res 2000;28(1):136–8. doi:10.1093/nar/28.1.136
19. Carithers LJ, Moore HM. The Genotype-Tissue Expression (GTEx) project. Biopreserv Biobank 2015;13(5):307–8. doi:10.1089/bio.2015.29031.hmm
20. Gong J, Wu Y, Zhang X, et al. Comprehensive analysis of human small RNA sequencing data provides insights into expression profiles and miRNA editing. RNA Biol 2014;11(11):1375–85. doi:10.1080/15476286.2014.996465
21. Zhang HM, Kuang S, Xiong X, et al. Transcription factor and microRNA co-regulatory loops: important regulatory motifs in biological processes and diseases. Brief Bioinform 2015;16(1):45–58. doi:10.1093/bib/bbt085
22. Lopes CT, Franz M, Kazi F, et al. Cytoscape Web: an interactive web-based network browser. Bioinformatics 2010;26(18):2347–8. doi:10.1093/bioinformatics/btq430
23. Wu KL, Huang EY, Yeh WL, et al. Synergistic interaction between galectin-3 and carcinoembryonic antigen promotes colorectal cancer metastasis. Oncotarget 2017;8:61935–43.
24. Hogas S, Bilha SC, Branisteanu D, et al. Potential novel biomarkers of cardiovascular dysfunction and disease: cardiotrophin-1, adipokines and galectin-3. Arch Med Sci 2017;4:897–913. doi:10.5114/aoms.2016.58664
25. Bormuth I, Yan K, Yonemasu T, et al. Neuronal basic helix-loop-helix proteins neurod2/6 regulate cortical commissure formation before midline interactions. J Neurosci 2013;33(2):641–51. doi:10.1523/JNEUROSCI.0899-12.2013
26. Alcock J, Lowe J, England T, et al. Expression of Sox1, Sox2 and Sox9 is maintained in adult human cerebellar cortex. Neurosci Lett 2009;450(2):114–16. doi:10.1016/j.neulet.2008.11.047
27. Yang H, Zheng W, Shuai X, et al. MicroRNA-424 inhibits Akt3-E2F3 axis and tumor growth in hepatocellular carcinoma. Oncotarget 2015;6(29):27736–50. doi:10.18632/oncotarget.4811
28. Zhang P, Ma Y, Wang F, et al. Comprehensive gene and microRNA expression profiling reveals the crucial role of hsa-let-7i and its target genes in colorectal cancer metastasis. Mol Biol Rep 2012;39(2):1471–8. doi:10.1007/s11033-011-0884-1

© The Author(s) 2018. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Briefings in Bioinformatics, Oxford University Press

Publisher
Oxford University Press
ISSN
1467-5463
eISSN
1477-4054
D.O.I.
10.1093/bib/bbx173
Publisher site
See Article on Publisher Site

Abstract

Abstract Human specifically expressed genes (SEGs) usually serve as potential biomarkers for disease diagnosis and treatment. However, the regulation underlying their specific expression remains to be revealed. In this study, we constructed SEG regulation database (SEGreg; available at http://bioinfo.life.hust.edu.cn/SEGreg) for showing SEGs and their transcription factors (TFs) and microRNA (miRNA) regulations under different physiological conditions, which include normal tissue, cancer tissue and cell line. In total, SEGreg collected 6387, 1451, 4506 and 5320 SEGs from expression profiles of 34 cancer types and 55 tissues of The Cancer Genome Atlas, Cancer Cell Line Encyclopedia, Human Body Map and Genotype-Tissue Expression databases/projects, respectively. The cancer or tissue corresponding expressed miRNAs and TFs were identified from miRNA and gene expression profiles, and their targets were collected from several public resources. Then the regulatory networks of all SEGs were constructed and integrated into SEGreg. Through a user-friendly interface, users can browse and search SEGreg by gene name, data source, tissue, cancer type and regulators. In summary, SEGreg is a specialized resource to explore SEGs and their regulations, which provides clues to reveal the mechanisms of carcinogenesis and biological processes. SEGreg, miRNA, transcription factor, specific expression, regulation, network Introduction The specifically expressed genes (SEGs), which are expressed in a unique or a small number of tissues/physiological conditions, could serve as biomarkers and provide important clues for gene function and tissue-specific characteristics [1]. In recent years, high-throughput sequencing techniques provided abundant data to examine gene expression patterns. Some prevalent methods such as correlated expression, differential expression and specific expression were widely used for specific gene identification [2]. 
In addition, dedicated algorithms such as PaGeFinder [3] and SpeCond [4] have been developed to identify tissue-specific genes. Several resources have focused on SEGs and their functions, including TissGDB [5], PaGenBase [6], HOMER [1], C-It [7] and TiGER [8]. However, most of these resources cover SEGs in only a limited range of tissues or diseases, and some derive their data from expressed sequence tags (ESTs) or text mining, which are less accurate than current high-throughput sequencing data. Moreover, the regulatory mechanisms underlying physiology and pathogenesis have not been systematically investigated. TissGDB is a recently released tissue-specific gene database containing SEGs from the public resources HPA, TiGER and Genotype-Tissue Expression (GTEx), but it identifies SEGs with a different method from the one used in this study, so its SEG lists are not equivalent to those in the SEG regulation database (SEGreg). It also covers fewer tissue/cancer types than SEGreg. In tests on GTEx data, our SEGtool algorithm (http://bioinfo.life.hust.edu.cn/SEGtool/) [9] outperformed the most prevalent SEG detection methods, including Z-score [10], ROKU [11], SpeCond [4] and PaGeFinder [3], and gave more precise results. The two resources also differ in the mechanisms they investigate: using The Cancer Genome Atlas (TCGA) data, TissGDB performed gene expression, somatic mutation and co-expressed protein interaction network analyses, whereas in this study we performed microRNA (miRNA) and transcription factor (TF) regulatory network analysis. miRNAs and TFs are two major classes of regulators that have drawn extensive attention in disease and physiology. They play pivotal roles in tumor proliferation, differentiation, invasion and metastasis, and affect development, metabolism, apoptosis and other biological processes in normal tissues [12].
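The Z-score method [10], one of the baselines SEGtool was compared against above, can be illustrated in a few lines. The sketch below is a generic version written for this description, not SEGtool and not necessarily the exact procedure of [10]: it standardizes one gene's expression across tissues and flags tissues whose z-score passes a cutoff. The function name `zscore_specific` and the cutoff of 2.0 are hypothetical choices for illustration only.

```python
from statistics import mean, pstdev

def zscore_specific(expr, threshold=2.0):
    """Flag tissues where ONE gene is unusually high or low (generic Z-score sketch).

    expr: mapping tissue -> expression value (e.g. FPKM) for a single gene.
    threshold: |z| cutoff; 2.0 is an illustrative assumption, not a published default.
    Returns (high_tissues, low_tissues) as sets.
    """
    mu = mean(expr.values())
    sd = pstdev(expr.values())  # population standard deviation across tissues
    if sd == 0:
        return set(), set()  # constant expression: no tissue stands out
    high = {t for t, v in expr.items() if (v - mu) / sd > threshold}
    low = {t for t, v in expr.items() if (v - mu) / sd < -threshold}
    return high, low

# Toy profile: a gene expressed almost only in testis across eight tissues.
tissues = ["brain", "heart", "kidney", "liver", "lung", "muscle", "skin", "testis"]
print(zscore_specific({t: (40.0 if t == "testis" else 1.0) for t in tissues}))
```

One design caveat of this baseline: with n tissues and a population standard deviation, a single tissue's |z| can never exceed √(n−1), so a fixed cutoff such as 2.0 cannot flag any tissue when n ≤ 5.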
To further explore the molecular mechanisms of complex diseases, regulatory networks have been widely studied and shown to be effective for elucidating gene function and interactions [13]. In our previous studies, miRNA and TF regulatory networks were used to reveal complex molecular mechanisms in disease and cell development. For example, Ye et al. [14] used a TF and miRNA co-regulatory network analysis to reveal that miR-19 inhibits CYLD in T-cell acute lymphoblastic leukemia, and Lin et al. [15] investigated TF and miRNA co-regulatory networks in B cell and T cell development. TF and miRNA regulation analysis can thus illuminate complex gene regulatory relationships in normal tissues and diseases at a systems level. In this study, to provide a comprehensive understanding of SEG regulation in human cancers, cancer cell lines and normal tissues, we first identified SEGs from four large-scale data sets using SEGtool, then analyzed their miRNA and TF regulation in each cancer/normal tissue, and finally designed the SEGreg database to store and present the results. Overall, SEGreg is a freely available resource of accurate SEGs, their expression and their regulatory networks, which will facilitate research on gene function in critical diseases and specific biological processes.

Materials and methods

Obtaining SEGs and constructing networks

Cancer-related expression profiles were downloaded from TCGA (https://tcga-data.nci.nih.gov/tcga) [16] and the Cancer Cell Line Encyclopedia (CCLE) [17], while normal tissue expression data were downloaded from the Human Body Map (BodyMap) [18] and the GTEx project [19]. In the CCLE data set, cell lines derived from the same tissue were treated as parallel samples of that tissue.
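Treating cell lines from one tissue as parallel samples means each tissue ends up with a single representative profile, after which lowly expressed genes can be filtered out. The sketch below illustrates that preprocessing under two loudly stated assumptions: replicates are merged by arithmetic mean (SEGreg used SEGtool's ‘deal_replicated’ function, whose exact method may differ), and the dictionary input layout with sample IDs such as `cl1` is hypothetical. The only value taken from this paper is the gene filter, RPKM/FPKM > 3.0 in at least one tissue/cancer.

```python
from statistics import mean

GENE_MIN_EXPR = 3.0  # RPKM/FPKM > 3.0 in at least one tissue/cancer (from the text)

def merge_parallel_samples(samples, sample_tissue):
    """Collapse parallel samples (e.g. CCLE cell lines) into one profile per tissue.

    samples: {sample_id: {gene: expression}}
    sample_tissue: {sample_id: tissue}
    ASSUMPTION: replicates are averaged; SEGtool's own merging may differ.
    Returns {tissue: {gene: representative expression}}.
    """
    grouped = {}  # tissue -> gene -> list of values from parallel samples
    for sample_id, profile in samples.items():
        tissue = sample_tissue[sample_id]
        for gene, value in profile.items():
            grouped.setdefault(tissue, {}).setdefault(gene, []).append(value)
    return {t: {g: mean(v) for g, v in genes.items()} for t, genes in grouped.items()}

def expressed_genes(tissue_profiles, threshold=GENE_MIN_EXPR):
    """Genes whose representative expression exceeds the threshold in >= 1 tissue."""
    return {g for genes in tissue_profiles.values() for g, v in genes.items() if v > threshold}

# Hypothetical cell lines: two from lung, one from skin.
samples = {"cl1": {"GENE_A": 2.0, "GENE_B": 0.5},
           "cl2": {"GENE_A": 6.0, "GENE_B": 1.5},
           "cl3": {"GENE_A": 1.0, "GENE_B": 2.0}}
profiles = merge_parallel_samples(samples, {"cl1": "lung", "cl2": "lung", "cl3": "skin"})
print(profiles["lung"], expressed_genes(profiles))
```

Here GENE_A passes because its merged lung value (4.0) exceeds 3.0, while GENE_B never does in any tissue.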
In SEGreg, SEGs were identified with SEGtool from 11 092 TCGA samples of 34 cancer types, 1036 CCLE samples of 24 tissues, 141 BodyMap samples of 35 tissues and 8555 GTEx samples of 31 tissues. The expression levels of biological replicates from the same tissue were processed by the ‘deal_replicated’ function in SEGtool to generate a single representative value. Matched miRNA expression profiles of TCGA cancers were downloaded from FireBrowse (http://www.firebrowse.org), and miRNA expression profiles of normal tissues were obtained from our HMED database [20]. Specifically expressed TFs of each cancer/tissue were identified from the corresponding gene expression matrix. Genes were retained if their expression, measured in reads/fragments per kilobase of transcript per million mapped reads (RPKM/FPKM), exceeded 3.0 in at least one tissue/cancer; for miRNAs and TFs, the expression threshold was RPKM/FPKM > 10.0. Networks were constructed from the SEGs together with miRNA and TF targets collected from several databases. For miRNAs, we collected experimentally verified and predicted targets as described in our previous paper [21]; these data mainly consist of the overlapping results from miRecords v4, miRTarBase v6, TarBase v7, TargetScanHuman v7.1 and miRWalk v2. Human TF targets were extracted from the TRANSFAC database. The regulatory network data were generated from these sources with in-house scripts. In each network, the TFs, miRNAs and SEGs come from the same cancer/tissue.

Database and Web site implementation

The web application was written with Flask, a Python micro-framework, and its extensions. The processed data were imported into a MySQL relational database (version 5.7.18). All stored data are accessed through a series of Python functions that dynamically generate content for the Apache web server (version 2.4.18). The Bootstrap extension was used to build a user-friendly web interface.
A Cytoscape plug-in [22] was also integrated into the application to display regulatory networks online.

Results

Data summary in SEGreg

In SEGreg, four SEG data sets were generated by analyzing gene expression profiles from the TCGA, CCLE, BodyMap and GTEx databases/projects. In total, SEGreg includes 6387, 1451, 4506 and 5320 SEGs from 34 cancer types (TCGA) and 24 (CCLE), 35 (BodyMap) and 31 (GTEx) tissues, respectively (Table 1). Between the cancer data sets, TCGA and CCLE share 806 SEGs, while the normal-tissue data sets BodyMap and GTEx share 2734; the large differences between the lists mean that each data set contributes many distinct SEGs to the database. Interestingly, 357, 12, 12 and 16 genes in the four data sets are both high- and low-SEGs (Table 1): within a data set, a gene that is specifically highly expressed in one cancer/tissue can be specifically lowly expressed in another. Furthermore, we identified 314, 368, 448 and 442 miRNAs and 764, 680, 463 and 455 TFs as regulators of the SEGs in the four data sets, respectively (Table 1). In the TCGA data set, 4289 SEGs are regulated by both miRNAs and TFs; the corresponding numbers for CCLE, BodyMap and GTEx are 283, 1530 and 1903. The numbers of miRNA–gene and TF–gene regulatory pairs in each data set are given in Table 1, and the numbers for each cancer/tissue are shown in Figure 1. Among the cancer data, LAML (acute myeloid leukemia) in TCGA (Figure 1A) and autonomic ganglia in CCLE (Figure 1B) have the most complex regulatory networks, while among normal tissues, brain in BodyMap (Figure 1C) and testis in GTEx (Figure 1D) have the most regulatory relationships. A gene can be regulated by several miRNAs and TFs and, if it is itself a TF, may also regulate several other genes. SEGreg also allows users to construct synergistic regulatory networks of miRNAs and TFs in real time.
We also found that a gene may be an SEG in several tissues/cancer types at the same time, with a different regulatory network in each.

Table 1. Overview of SEGs and their regulators in SEGreg

Data type                    TCGA       CCLE       BodyMap    GTEx
Cancer/tissue number         34         24         35         31
High-SEGs                    5724       1278       4401       5056
Low-SEGs                     1020       185        117        280
High-low-common              357        12         12         16
Unique SEGs                  6387       1451       4506       5320
miRNAs                       314        368        448        442
TFs                          764        680        463        455
miRNA–gene pairs             83 607     5724       18 321     18 550
TF–gene pairs                466 288    126 170    58 889     59 760
miRNA and TF target SEGs     4289       283        1530       1903

Figure 1. The numbers of miRNA–gene pairs and TF–gene pairs obtained from the four data sets: (A) TCGA; (B) CCLE; (C) BodyMap; (D) GTEx.
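The SEG counts in Table 1 are internally consistent by inclusion–exclusion: a gene that is a high-SEG in one cancer/tissue and a low-SEG in another is counted once, so Unique SEGs = High-SEGs + Low-SEGs − High-low-common. The short check below verifies this for all four data sets and, as an extra summary not reported in the paper, turns the shared-SEG counts from the text into Jaccard indices. The Jaccard index is used here only as an overlap measure and is not meant to reproduce how SEGtool uses it internally; the function names are hypothetical.

```python
# Counts copied from Table 1: (High-SEGs, Low-SEGs, High-low-common, Unique SEGs).
TABLE1 = {
    "TCGA": (5724, 1020, 357, 6387),
    "CCLE": (1278, 185, 12, 1451),
    "BodyMap": (4401, 117, 12, 4506),
    "GTEx": (5056, 280, 16, 5320),
}

def unique_segs(high, low, common):
    """|High ∪ Low| = |High| + |Low| - |High ∩ Low| (inclusion-exclusion)."""
    return high + low - common

def jaccard_from_counts(size_a, size_b, shared):
    """Jaccard index |A ∩ B| / |A ∪ B|, computed from set sizes alone."""
    return shared / (size_a + size_b - shared)

for name, (high, low, common, unique) in TABLE1.items():
    assert unique_segs(high, low, common) == unique, name

# Shared SEGs from the text: TCGA/CCLE share 806, BodyMap/GTEx share 2734.
print(round(jaccard_from_counts(6387, 1451, 806), 3))   # 0.115
print(round(jaccard_from_counts(4506, 5320, 2734), 3))  # 0.386
```

The low TCGA/CCLE overlap (about 0.11) quantifies the statement that the two cancer lists differ substantially.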
Functions of cancer and tissue SEGs

In SEGreg, cancer SEGs were identified from TCGA and CCLE, while normal-tissue SEGs were identified from BodyMap and GTEx expression profiles. To further characterize the SEGs and their biological functions, we focused on the cancer types/tissues with the most SEGs. In TCGA, 6387 SEGs were identified across 34 cancer types, among which LAML (acute myeloid leukemia) has the largest share, with 1623 (39.3%) SEGs. To reveal the potential functions of these SEGs in LAML, we performed Gene Ontology enrichment analysis using DAVID (https://david.ncifcrf.gov/). The results (Figure 2) show that specifically highly expressed genes in LAML mainly participate in the activation, differentiation and proliferation of leukocytes, lymphocytes, myeloid cells and mononuclear cells, as well as in cell activation involved in immune responses, whereas specifically lowly expressed genes are mainly involved in the development and morphogenesis of blood vessels, the heart and the cardiovascular system. The other cancer-related data set, CCLE, contains 24 cancer tissues, among which autonomic ganglia, small intestine and hematopoietic and lymphoid tissue have the most SEGs, with 351 (24.2%), 320 (22.0%) and 288 (19.8%), respectively. The SEGs in autonomic ganglia mainly function in neurogenesis and neuron differentiation and development; those in small intestine in tissue development and cell differentiation, development and morphogenesis; and those in hematopoietic and lymphoid tissue in immune responses and the activation, differentiation and proliferation of leukocytes, lymphocytes, B cells and T cells.
In both the BodyMap and GTEx normal-tissue data sets, testis has the most SEGs, particularly highly expressed SEGs, which play key roles in reproduction, meiotic nuclear division, gamete generation and germ cell development; interestingly, testis has no lowly expressed SEGs. Our previous study showed that SEGs in testis are strongly associated with organ-specific functions [9].

Figure 2. Gene Ontology enrichment of SEGs in LAML.

Web interface

Through a user-friendly interface, users can browse or search SEGs, miRNAs and TFs in various cancer types/tissues from each data set, and can display and download synergistic regulatory networks. When browsing the SEG list of a data set, results are displayed in a table (Figure 3A), and detailed information on an SEG is shown by clicking its ‘detail’ button. When browsing by cancer type/tissue, a bar chart shows the numbers of highly and lowly expressed SEGs (Figure 3B). Clicking the ‘high’ or ‘low’ bar returns the corresponding list of highly or lowly expressed SEGs for that cancer type/tissue (similar to Figure 3A). Users can also construct and browse regulatory networks online by clicking the ‘network’ button in the table (Figure 3C); if a gene is specifically expressed in several cancer types/tissues, the resulting page contains multiple regulatory networks. Clicking the ‘high’ or ‘low’ tag in the table displays a bar chart of the gene's expression levels across tissues. All data can be sorted in ascending or descending order by cancer/tissue/gene/miRNA/TF name. On the ‘Search’ page, users can search by data source, tissue and/or gene, or search by regulator in the different data sets.
SEGreg also provides a quick search box on every page: entering a tissue, cancer, gene or miRNA name returns a comprehensive result containing all relevant information from the four data sets. All expression and regulation data can be downloaded from the ‘Download’ page. The homepage also provides a tool for constructing synergistic regulatory networks online from any combination of miRNAs and/or TFs (Figure 3D and E). SEGreg has been tested and works on all major browsers.

Figure 3. Browse SEGs and their regulations by gene, cancer type, tissue, miRNA and TF, and construct regulatory networks in each cancer type/tissue. (A) Results of browsing by gene in each data set. (B) Browsing highly or lowly expressed SEGs in each cancer type/tissue. By clicking each bar in the plot, users can obtain the corresponding highly or lowly expressed SEGs and further investigate their expression levels or miRNA/TF regulatory networks. (C) An example of the regulatory network of an SEG in the SEGreg database. Colors: yellow, SEG; blue, miRNA; purple, TF; green, target genes of the SEG if the SEG is itself a TF. (D) Construction of miRNA and/or TF co-regulatory networks. (E) An example of a co-regulatory network. Colors: green, SEG; yellow, miRNA; blue, TF.
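The per-tissue networks and the homepage's synergistic network tool can be pictured as two set operations: keep only regulators expressed in the cancer/tissue and intersect their targets with that tissue's SEGs, then restrict the network to the regulators a user selects. The sketch below is a hypothetical illustration of that logic, not SEGreg's in-house scripts or web code. The regulator threshold (RPKM/FPKM > 10.0) comes from Materials and methods; the function names, the dictionary layout, the target genes G1 to G9, the placeholder miRNA `miR-A` and all expression values are invented toy data (CEBPA is the gene symbol for C/EBPα).

```python
REGULATOR_MIN_EXPR = 10.0  # RPKM/FPKM > 10.0 for miRNAs and TFs (from Materials and methods)

def tissue_network(segs, regulator_expr, targets):
    """Regulator -> SEG edges for ONE cancer/tissue (hypothetical sketch).

    segs: set of SEGs in this cancer/tissue.
    regulator_expr: {regulator: expression in this cancer/tissue}.
    targets: {regulator: set of target genes} pooled from target databases.
    Only regulators expressed above the threshold here are kept, so every
    edge joins a regulator and an SEG from the same cancer/tissue.
    """
    network = {}
    for regulator, genes in targets.items():
        if regulator_expr.get(regulator, 0.0) > REGULATOR_MIN_EXPR:
            hit = genes & segs
            if hit:
                network[regulator] = hit
    return network

def co_regulatory_network(network, selected):
    """Sub-network for a user-chosen set of miRNAs and/or TFs.

    Returns (edges, shared): sorted (regulator, SEG) pairs, and the SEGs
    targeted by every selected regulator present in the network.
    """
    chosen = {r: network[r] for r in selected if r in network}
    edges = sorted((r, g) for r, genes in chosen.items() for g in genes)
    shared = set.intersection(*chosen.values()) if chosen else set()
    return edges, shared

segs = {"G1", "G2", "G3", "G4"}
expr = {"ATF3": 25.0, "CEBPA": 12.0, "miR-A": 4.0}  # miR-A falls below the threshold
targets = {"ATF3": {"G1", "G2", "G3", "G9"}, "CEBPA": {"G2", "G3", "G4"}, "miR-A": {"G3"}}
net = tissue_network(segs, expr, targets)
edges, shared = co_regulatory_network(net, {"ATF3", "CEBPA"})
print(edges)
print(sorted(shared))
```

G9 is dropped because it is not an SEG in this toy tissue, and miR-A is dropped because it is not expressed above the threshold; returning the shared targets separately mirrors how a co-regulation question (such as the ATF3 and C/EBPα analysis in the case studies) focuses on genes hit by every selected regulator.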
Gene regulation and case studies

Published studies have shown that some SEGs and regulators are crucial to the occurrence and development of disease. For example, LGALS3 (galectin-3), which in this study is a highly expressed SEG in three cancer types (COAD: colon adenocarcinoma; KICH: kidney chromophobe; READ: rectum adenocarcinoma) (Figure 4A) but is lowly expressed in normal liver tissue, has recently been reported to promote colorectal cancer metastasis and to act as a diagnostic marker in multiple diseases [23, 24]. It is common for the same oncogenic gene to play different roles in different cancer types [5]. LGALS3 can be co-regulated by 34 TFs in COAD, KICH and READ (Figure 4B). Some of these TFs, such as ATF3 and C/EBPα, are tumor suppressors. To further understand the co-regulation by ATF3 and C/EBPα, we constructed synergistic regulatory networks in these three cancer types and found that the two TFs target 17, 43 and 17 genes in COAD, KICH and READ, respectively. Some SEGs play key roles in tissue development. For example, NEUROD2 and NEUROD6 are key regulators of brain function [25] and are highly expressed SEGs in brain tissue in the GTEx data set of SEGreg. In brain, they can be regulated by many TFs (Figure 4C and D), including SOX9, a marker of neural stem cells that is expressed in the adult cerebellum [26]. Our network analysis also recovered other published regulatory relationships. For example, E2F3 can be suppressed by miR-424 in liver hepatocellular carcinoma [27], and hsa-let-7i plays a crucial role in colorectal cancer metastasis and targets eight genes including SOX13, SLC25A4 and SEMA4F [28].
These findings may benefit users investigating the regulatory mechanisms of cancers.

Figure 4. Expression level of LGALS3 in different cancer types and regulatory networks of NEUROD2 and NEUROD6 in GTEx brain tissue. (A) Among 35 cancer types, LGALS3 is highly expressed in COAD, KICH and READ. (B) Venn diagram of TFs that target LGALS3 in COAD, KICH, READ and normal liver tissue. (C) Regulatory network of NEUROD2 in GTEx brain tissue. (D) Regulatory network of NEUROD6 in GTEx brain tissue. Colors: yellow, gene; blue, miRNA; purple, TF; green, target genes of the SEG if the SEG is itself a TF.

Summary and perspective

SEGreg is a specialized database focused on SEGs and their miRNA and TF regulation in different cancer types/tissues. To build the database, we collected RNA expression profiles, miRNA expression profiles and regulator–target data from various large-scale data sources. SEGs were detected with SEGtool, a recently developed tool based on fuzzy c-means clustering, the Jaccard index and a greedy annealing method, which has higher specificity and accuracy than existing major tools. miRNA and TF regulation in each cancer/tissue was then analyzed and regulatory networks were constructed. Finally, the SEGreg database was designed to archive and present all of this information.
To explore further mechanisms of specific expression and enrich the content of SEGreg, we plan to investigate the methylation and mutation of SEGs in the near future. SEGreg will be updated regularly as public data grow. We anticipate that SEGreg will provide comprehensive information and useful clues to researchers focusing on tissue-specific function, gene expression and regulation, and biomarker discovery in cancers/tissues.

Key Points

Focuses on specifically high/low expressed genes in both cancer and normal tissues.
Identifies more precise and complete SEGs from four public resources using SEGtool.
Uses network analysis to clarify gene expression mechanisms.
Provides a specialized SEG regulation database (SEGreg).

Funding

This research was supported by the National Natural Science Foundation of China (Nos. 31471247 and 31771458) and an Open Project of the Key Laboratory of Carcinogenesis and Translational Research, Ministry of Education/Beijing (2017 Open Project-6).

Qin Tang is a Postdoctoral Fellow in the College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China. Her research interests include the development of databases and bioinformatics tools, and cancer genomics.

Qiong Zhang is a PhD candidate in bioinformatics in the College of Life Science and Technology, Huazhong University of Science and Technology, China.

Yao Lv is a master's student in bioinformatics in the College of Life Science and Technology, Huazhong University of Science and Technology, China.

Ya-Ru Miao is a master's student in bioinformatics in the College of Life Science and Technology, Huazhong University of Science and Technology, China.

An-Yuan Guo is a Professor at the Key Laboratory of Molecular Biophysics of the Ministry of Education, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, China.
His research interests include TF and ncRNA regulation in diseases, development of databases and bioinformatics algorithms, and cancer genomics.

References

1 Zhang F, Chen JY. HOMER: a human organ-specific molecular electronic repository. BMC Bioinformatics 2011;12(Suppl 10):S4.
2 Xiao SJ, Zhang C, Zou Q, et al. TiSGeD: a database for tissue-specific genes. Bioinformatics 2010;26(9):1273–5. http://dx.doi.org/10.1093/bioinformatics/btq109
3 Pan JB, Hu SC, Wang H, et al. PaGeFinder: quantitative identification of spatiotemporal pattern genes. Bioinformatics 2012;28(11):1544–5. http://dx.doi.org/10.1093/bioinformatics/bts169
4 Cavalli FM, Bourgon R, Vaquerizas JM, et al. SpeCond: a method to detect condition-specific gene expression. Genome Biol 2011;12(10):R101.
5 Kim P, Park A, Han G, et al. TissGDB: tissue-specific gene database in cancer. Nucleic Acids Res 2017, doi:10.1093/nar/gkx850.
6 Pan JB, Hu SC, Shi D, et al. PaGenBase: a pattern gene database for the global and dynamic understanding of gene function. PLoS One 2013;8(12):e80747.
7 Gellert P, Jenniches K, Braun T, et al. C-It: a knowledge database for tissue-enriched genes. Bioinformatics 2010;26(18):2328–33. http://dx.doi.org/10.1093/bioinformatics/btq417
8 Liu X, Yu X, Zack DJ, et al. TiGER: a database for tissue-specific gene expression and regulation. BMC Bioinformatics 2008;9:271. http://dx.doi.org/10.1186/1471-2105-9-271
9 Zhang Q, Liu W, Liu C, et al. SEGtool: a specifically expressed gene detection tool and applications in human tissue and single-cell sequencing data. Brief Bioinform 2017, doi:10.1093/bib/bbx074.
10 Cheadle C, Vawter MP, Freed WJ, et al. Analysis of microarray data using Z score transformation. J Mol Diagn 2003;5(2):73–81. http://dx.doi.org/10.1016/S1525-1578(10)60455-2
11 Kadota K, Ye J, Nakai Y, et al. ROKU: a novel method for identification of tissue-specific genes. BMC Bioinformatics 2006;7(1):294. http://dx.doi.org/10.1186/1471-2105-7-294
12 Salehi Z, Akrami H. Target genes prediction and functional analysis of microRNAs differentially expressed in gastric cancer stem cells MKN-45. J Cancer Res Ther 2017;13:477–83.
13 Saha A, Kim Y, Gewirtz ADH, et al. Co-expression networks reveal the tissue-specific regulation of transcription and splicing. Genome Res 2017;27:1843–58. http://dx.doi.org/10.1101/gr.216721.116
14 Ye H, Liu X, Lv M, et al. MicroRNA and transcription factor co-regulatory network analysis reveals miR-19 inhibits CYLD in T-cell acute lymphoblastic leukemia. Nucleic Acids Res 2012;40(12):5201–14. http://dx.doi.org/10.1093/nar/gks175
15 Lin Y, Zhang Q, Zhang HM, et al. Transcription factor and miRNA co-regulatory network reveals shared and specific regulators in the development of B cell and T cell. Sci Rep 2015;5(1):15215. http://dx.doi.org/10.1038/srep15215
16 Tomczak K, Czerwińska P, Wiznerowicz M. The Cancer Genome Atlas (TCGA) an immeasurable source of knowledge. Contemp Oncol 2015;19(1A):A68–77.
17 Barretina J, Caponigro G, Stransky N, et al. The cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 2012;483(7391):603–7. http://dx.doi.org/10.1038/nature11003
18 Hishiki T, Kawamoto S, Morishita S, et al. BodyMap: a human and mouse gene expression database. Nucleic Acids Res 2000;28(1):136–8. http://dx.doi.org/10.1093/nar/28.1.136
19 Carithers LJ, Moore HM. The Genotype-Tissue Expression (GTEx) project. Biopreserv Biobank 2015;13(5):307–8. http://dx.doi.org/10.1089/bio.2015.29031.hmm
20 Gong J, Wu Y, Zhang X, et al. Comprehensive analysis of human small RNA sequencing data provides insights into expression profiles and miRNA editing. RNA Biol 2014;11(11):1375–85. http://dx.doi.org/10.1080/15476286.2014.996465
21 Zhang HM, Kuang S, Xiong X, et al. Transcription factor and microRNA co-regulatory loops: important regulatory motifs in biological processes and diseases. Brief Bioinform 2015;16(1):45–58. http://dx.doi.org/10.1093/bib/bbt085
22 Lopes CT, Franz M, Kazi F, et al. Cytoscape Web: an interactive web-based network browser. Bioinformatics 2010;26(18):2347–8. http://dx.doi.org/10.1093/bioinformatics/btq430
23 Wu KL, Huang EY, Yeh WL, et al. Synergistic interaction between galectin-3 and carcinoembryonic antigen promotes colorectal cancer metastasis. Oncotarget 2017;8:61935–43.
24 Hogas S, Bilha SC, Branisteanu D, et al. Potential novel biomarkers of cardiovascular dysfunction and disease: cardiotrophin-1, adipokines and galectin-3. Arch Med Sci 2017;4:897–913. http://dx.doi.org/10.5114/aoms.2016.58664
25 Bormuth I, Yan K, Yonemasu T, et al. Neuronal basic helix-loop-helix proteins neurod2/6 regulate cortical commissure formation before midline interactions. J Neurosci 2013;33(2):641–51. http://dx.doi.org/10.1523/JNEUROSCI.0899-12.2013
26 Alcock J, Lowe J, England T, et al. Expression of Sox1, Sox2 and Sox9 is maintained in adult human cerebellar cortex. Neurosci Lett 2009;450(2):114–16. http://dx.doi.org/10.1016/j.neulet.2008.11.047
27 Yang H, Zheng W, Shuai X, et al. MicroRNA-424 inhibits Akt3-E2F3 axis and tumor growth in hepatocellular carcinoma. Oncotarget 2015;6(29):27736–50. http://dx.doi.org/10.18632/oncotarget.4811
28 Zhang P, Ma Y, Wang F, et al. Comprehensive gene and microRNA expression profiling reveals the crucial role of hsa-let-7i and its target genes in colorectal cancer metastasis. Mol Biol Rep 2012;39(2):1471–8. http://dx.doi.org/10.1007/s11033-011-0884-1

© The Author(s) 2018. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

Journal: Briefings in Bioinformatics, Oxford University Press

Published: Jan 3, 2018
