ADEPTUS: a discovery tool for disease prediction, enrichment and network analysis based on profiles from many diseases

Abstract

Motivation: Large-scale publicly available genomic data on many disease phenotypes could improve our understanding of the molecular basis of disease. Tools that undertake this challenge by jointly analyzing multiple phenotypes are needed.

Results: ADEPTUS is a web-tool that enables various functional genomics analyses based on a high-quality curated database spanning >38 000 gene expression profiles and >100 diseases. It offers four types of analysis. (i) For a gene list provided by the user it computes disease ontology (DO), pathway, and gene ontology (GO) enrichment and displays the genes as a network. (ii) For a given disease, it enables exploration of drug repurposing by creating a gene network summarizing the genomic events in it. (iii) For a gene of interest, it generates a report summarizing its behavior across several studies. (iv) It can predict the tissue of origin and the disease of a sample based on its gene expression or its somatic mutation profile. Such analyses open novel ways to understand new datasets and to predict primary site of cancer.

Availability and implementation: Data and tool: http://adeptus.cs.tau.ac.il/home. Analyses: Supplementary Material.

Contact: rshamir@tau.ac.il

Supplementary information: Supplementary data are available at Bioinformatics online.

1 Introduction

Case-control studies seek discriminatory signals between different phenotypes, in order to decipher the molecular basis of disease. Most analyses address a small number of studies, spanning very few phenotypes and tissues and a modest number of samples. These limitations reduce the reliability and specificity of disease data analysis. We introduce ADEPTUS, a new web-tool that aims to overcome these issues by conducting analyses based on multiple diseases and numerous studies simultaneously.
It employs a novel high-quality database of >37 000 gene expression profiles and >9500 cancer somatic mutation profiles, covering >200 different diseases and tissues. We developed a classifier that gave high-quality predictions for 68 of the disease and tissue labels. Good predictors produced biomarkers that were scored for replicability, intensity, and specificity. ADEPTUS uses these results for diverse functional analyses, including disease and function enrichment of gene sets, network analysis of a disease, and disease label prediction (Fig. 1A).

Fig. 1. Analysis of melanoma in ADEPTUS. (A) Three analysis types. (B) The resulting DO subnetwork, containing the term melanoma, and the largest connected component of its biomarker genes. (C) The network analysis options and the subnetwork produced by setting specificity ROC = 0.9 for melanoma.

2 Materials and methods

2.1 The gene expression database

We assembled and manually annotated 37 337 gene expression profiles. Each profile was annotated with the tissue of origin and either with a set of disease ontology (DO) terms or as a control. The label of a profile is a phenotype of a disease, a tissue, or both. For example, melanoma, skin, and melanoma of skin are all labels. The profiles cover 10 501 genes that were shared across all samples, 190 disease labels and 18 tissue labels.
Using this database, we performed a conservative leave-study-out cross-validation for each label and gene (Amar et al., 2015), obtaining three scores for label-gene association: signal strength and specificity (computed using ROC, default threshold: 0.7), and replicability (computed using meta-analysis over the datasets, default threshold value: 0.01; >50% of P-values < 0.05 was defined as a replicated signal). See the Supplementary Material for details. Sixty-eight labels passed all thresholds. These were defined as well-classified and used for subsequent analyses.

2.2 Analysis of a gene list

Users can upload a gene list and perform several enrichment analyses: (i) gene ontology (GO) enrichment, computed using TANGO (Ulitsky et al., 2010), (ii) pathway enrichment, computed using Fisher's exact test on KEGG (Kanehisa and Goto, 2000) and WikiPathways (Kelder et al., 2012) pathways and (iii) DO enrichment, using a gene set enrichment analysis (GSEA)-like analysis (Subramanian et al., 2005). To the best of our knowledge, this is the first tool to provide disease enrichment.

2.3 Analysis of a disease

For each well-classified label the tool displays its biomarker genes in a summary network. Nodes are genes and edges are protein or genetic interactions taken from GeneMANIA (Montojo et al., 2014). The color-coding of a gene's node conveys two properties: up- or down-regulation compared to the negatives or the background, and availability of a drug targeting the gene (Law et al., 2014). For cancer labels, a color indicates when the gene is associated with the label based on somatic mutation data (Amar et al., 2017). The set of genes displayed can be changed by adjusting score thresholds.

2.4 Analysis of new profiles

Our classifiers (for the well-classified labels) can be applied to new profiles.
Given an input matrix of patients × profiles (gene expression or mutated genes), the classifiers output a table of patients × labels, whose values are the predicted association probabilities. Given a list of mutated genes in a biopsy, we predict the cancer subtype (e.g. the primary cancer of a metastasis sample) using our multi-label classifiers (Amar et al., 2017).

3 Results

We analyzed melanoma, the most lethal and treatment-resistant human skin cancer, for which new treatment and prevention approaches are needed (Hodis et al., 2012). The ADEPTUS database had three relevant labels: melanoma, skin melanoma, and uveal melanoma. Only melanoma was well-classified, mainly because no reliable classifier distinguished between skin and uveal melanoma. Melanocytes, the cells from which melanoma originates, are specialized in releasing pigment vesicles, termed melanosomes (Raposo and Marks, 2007). Melanoma keeps this ability in order to directly affect the formation of its tumor niche by microRNA trafficking via melanosomes (Dror et al., 2016). Remarkably, ADEPTUS selected genes (Fig. 1B) that were enriched for vesicle transport and pigmentation (q < 1e-03). When we compared melanoma to other diseases by increasing the specificity score to 0.9 (Fig. 1C), melanoma vesicles were again found as the most significant GO enrichment. Dror et al. (2016) further found that inhibition of melanosome trafficking by melanoma can block melanoma formation. ADEPTUS offered new drug targets (Fig. 1B and C) with promising therapeutic potential to block melanosome trafficking in addition to the drug used in (3). Taken together, ADEPTUS is a strong tool for identifying disease-related genes and for proposing new potential drugs.

Funding

R.S. was supported by the Israel Science Foundation [grant 317/13 and the ISF-NSFC joint program 2015–18] and by the Bella Walter Memorial Fund of the Israel Cancer Association. D.A. was supported in part by a fellowship from the Edmond J. Safra Center for Bioinformatics at Tel Aviv University. C.L. thanks the support of the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme [grant 726225].

Conflict of Interest: none declared.

References

Amar D. et al. (2015) Integrated analysis of numerous heterogeneous gene expression profiles for detecting robust disease-specific biomarkers and proposing drug targets. Nucleic Acids Res., 43, 7779–7789.
Amar D. et al. (2017) Utilizing somatic mutation data from numerous studies for cancer research: proof of concept and applications. Oncogene, 36, 3375–3383.
Dror S. et al. (2016) Melanoma miRNA trafficking controls tumour primary niche formation. Nat. Cell Biol., 18, 1006–1017.
Hodis E. et al. (2012) A landscape of driver mutations in melanoma. Cell, 150, 251–263.
Kanehisa M., Goto S. (2000) KEGG: Kyoto encyclopaedia of genes and genomes. Nucleic Acids Res., 28, 27–30.
Kelder T. et al. (2012) WikiPathways: building research communities on biological pathways. Nucleic Acids Res., 40, D1301–D1307.
Law V. et al. (2014) DrugBank 4.0: shedding new light on drug metabolism. Nucleic Acids Res., 42, D1091–D1097.
Montojo J. et al. (2014) GeneMANIA: fast gene network construction and function prediction for Cytoscape. F1000Research, 3, 153. http://doi.org/10.12688/f1000research.4572.1.
Raposo G., Marks M.S. (2007) Melanosomes–dark organelles enlighten endosomal membrane transport. Nat. Rev. Mol. Cell Biol., 8, 786–797.
Subramanian A. et al. (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA, 102, 15545–15550.
Ulitsky I. et al. (2010) Expander: from expression microarrays to networks and functions. Nat. Protoc., 5, 303–322.

© The Author(s) 2018. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)
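
Illustrative sketch for Section 2.2: the pathway enrichment step is described as Fisher's exact test on KEGG and WikiPathways gene sets. The Python below shows one common way such a test can be computed, as a one-sided test that equals the upper tail of the hypergeometric distribution. It is not the ADEPTUS implementation. The function names, the toy gene identifiers and the choice of a one-sided test are assumptions made for illustration, and no multiple-testing correction is applied.

```python
from math import comb


def enrichment_pvalue(universe_size, pathway_size, list_size, overlap):
    """One-sided Fisher's exact test for over-representation.

    Returns P(X >= overlap), where X is the number of pathway genes
    obtained when drawing list_size genes without replacement from a
    universe that contains pathway_size pathway genes.
    """
    if min(universe_size, pathway_size, list_size, overlap) < 0:
        raise ValueError("all counts must be non-negative")
    if max(pathway_size, list_size) > universe_size:
        raise ValueError("pathway and list cannot exceed the universe")
    if overlap > min(pathway_size, list_size):
        raise ValueError("overlap cannot exceed pathway or list size")

    total = comb(universe_size, list_size)
    outside = universe_size - pathway_size
    # math.comb returns 0 when k > n, so impossible terms vanish.
    tail = sum(
        comb(pathway_size, k) * comb(outside, list_size - k)
        for k in range(overlap, min(pathway_size, list_size) + 1)
    )
    return tail / total


def pathway_enrichment(gene_list, pathways, universe):
    """Score each pathway (name -> iterable of genes) against gene_list.

    Genes outside the universe are ignored. Returns a dict mapping
    pathway name to (sorted overlapping genes, p-value).
    """
    universe = set(universe)
    genes = set(gene_list) & universe
    results = {}
    for name, members in pathways.items():
        members = set(members) & universe
        hits = genes & members
        p = enrichment_pvalue(len(universe), len(members), len(genes), len(hits))
        results[name] = (sorted(hits), p)
    return results


# Toy data: hypothetical gene identifiers, not taken from ADEPTUS.
toy_universe = [f"G{i}" for i in range(1, 11)]
toy_pathways = {
    "pathway_A": ["G1", "G2", "G3", "G4"],
    "pathway_B": ["G7", "G8", "G9", "G10"],
}
toy_result = pathway_enrichment(["G1", "G2", "G3"], toy_pathways, toy_universe)
```

In the toy call, all three listed genes fall in pathway_A, so its p-value is the probability of drawing three pathway genes in three draws from ten genes. In practice the universe would be the set of genes measured in the database (10 501 in the expression data described in Section 2.1), and the resulting p-values would normally be corrected for the number of pathways tested.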

Publisher
Oxford University Press
Copyright
© The Author(s) 2018. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
ISSN
1367-4803
eISSN
1460-2059
D.O.I.
10.1093/bioinformatics/bty027

Journal

Bioinformatics, Oxford University Press

Published: Jan 19, 2018
