Enrichment analysis with EpiAnnotator

Abstract

Motivation
Deciphering relevant biological insights from epigenomic data can be a challenging task. One commonly used approach is to perform enrichment analysis. However, finding, downloading and using the publicly available functional annotations requires time, programming skills and IT infrastructure. Here we describe the online tool EpiAnnotator for performing enrichment analyses on epigenomic data in a fast and user-friendly way.

Results
EpiAnnotator is an R package accompanied by a web interface. It contains regularly updated annotations from four public databases: Blueprint, Roadmap, GENCODE and the UCSC Genome Browser. Annotations are hosted locally or in a server environment and automatically updated by scripts of our own design. Thousands of tracks are available, reflecting data on a variety of tissues, cell types and cell lines from the human and mouse genomes. Users upload sets of selected and background regions. Results are displayed in customizable and easily interpretable figures.

Availability and implementation
The R package and Shiny app are open source and available under the GPL v3 license. EpiAnnotator’s web interface is accessible at http://computational-epigenomics.com/en/epiannotator.

Contact
epiannotator@computational-epigenomics.com

1 Introduction

Interpretation of large epigenomic datasets is usually context-dependent and tied to a genome assembly of interest. Unravelling relevant biological insights from such datasets can be a burdensome and time-consuming task. One common approach to overcome some of these difficulties is to perform enrichment analysis.
The recent increase in the use of new technologies for profiling epigenetic marks gives access to large amounts of methylation data from different repositories, including ENCODE (ENCODE Project Consortium, 2004), the UCSC Genome Browser (Karolchik et al., 2003), the International Human Epigenome Consortium (Bujold et al., 2016) and Roadmap Epigenomics (Bernstein et al., 2010) (Fig. 1A). However, the multiplication of sources for genomic and epigenomic datasets can complicate analysis and the interpretation of results.

Fig. 1. EpiAnnotator workflow example for an enrichment analysis performed with the user’s selected and background genomic regions and annotations from EpiAnnotator’s databanks

2 Implementation

To address these potential difficulties, we developed the EpiAnnotator web service as an all-encompassing enrichment analysis tool that centralizes annotations from large web resources and updates them regularly (Fig. 1B). Thousands of annotations are accessible to researchers, enabling them to conveniently conduct comparative enrichment analyses and rapidly generate their own results in the form of comprehensive, publication-quality figures. EpiAnnotator builds upon extensive bioinformatics tools dedicated to enrichment analysis on genetic and epigenetic data. The R package LOLA (Sheffield and Bock, 2016) provides enrichment analysis but lacks visualization. The widely used DAVID service (Huang et al., 2009) focuses on genes only and is not integrated with epigenomic repositories. DeepBlue (Albrecht et al., 2016) provides a programmatic interface for accessing such repositories but does not perform enrichment analysis.
The genomation toolkit (Akalin et al., 2015) and the EpiExplorer web service (Halachev et al., 2012) are suitable for summarization and visualization of genomic intervals but lack functionality for enrichment analysis. Moreover, most of the tools described above require programming expertise from their users, whereas EpiAnnotator provides a user-friendly web interface built as a Shiny app. Its functionality is implemented in a back-end R package, which uses the IRanges package from Bioconductor (Lawrence et al., 2013) to optimize the computation of region overlaps.

3 Workflow

An EpiAnnotator enrichment analysis typically needs three sources of data: a BED file containing a set of selected genomic regions of interest, another BED file containing a set of background genomic regions (Fig. 1C), and annotations, i.e. reference sets of genomic regions. Both the selected and background genomic regions are uploaded by the user (Fig. 1D). Annotations are selected from the EpiAnnotator interface after specifying the databank to be used (Fig. 1E). In addition to the default collection of annotations, we provide access to the LOLA core and extended databases. Conveniently, users who focus on studies using the HumanMethylation450 or MethylationEPIC assay can upload sets of probe identifiers instead of their targeted CpG sites.

Annotations are analyzed for overlap with the uploaded regions (Fig. 1F). Two options are available to the user: an enrichment analysis using Fisher’s exact test (Fig. 1G) or an overview of the data. The result of an enrichment analysis is a table listing the number of overlapping regions, as well as fold changes and P-values. EpiAnnotator provides multiple visualizations through easily interpretable plots. As an example, Figure 1H shows a summary plot displaying the results of the enrichment analysis performed with selected and background genomic regions from Taylor et al. (2017) and annotation tracks from Taberlay et al. (2014).
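The overlap and Fisher’s exact test steps of the workflow can be illustrated with a minimal Python sketch. This is not EpiAnnotator’s implementation, which is an R package built on IRanges. The function names, the one-sided (enrichment) form of the test, the fold-change definition and the assumption that every selected region is also part of the background are illustrative choices that the text does not specify.

```python
from math import comb


def overlaps_any(region, annotation):
    """True if a BED-style (chrom, start, end) region overlaps any annotation interval."""
    chrom, start, end = region
    # Half-open intervals [start, end) overlap when each starts before the other ends.
    return any(c == chrom and s < end and start < e for c, s, e in annotation)


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test, P(X >= a), for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    # Sum hypergeometric probabilities of all tables at least as enriched as observed.
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1))
    return tail / comb(n, row1)


def enrichment(selected, background, annotation):
    """Assumption: `selected` is a non-empty subset of `background`."""
    sel_hits = sum(overlaps_any(r, annotation) for r in selected)
    bg_hits = sum(overlaps_any(r, annotation) for r in background)
    a = sel_hits                          # selected, overlapping
    b = len(selected) - sel_hits          # selected, not overlapping
    c = bg_hits - sel_hits                # background only, overlapping
    d = (len(background) - bg_hits) - b   # background only, not overlapping
    # Hypothetical fold-change definition: overlap rate in selected vs. background.
    if bg_hits:
        fold = (a / len(selected)) / (bg_hits / len(background))
    else:
        fold = float("nan")
    return {"overlap": a, "fold_change": fold, "p_value": fisher_exact_greater(a, b, c, d)}


# Toy data: one annotation track, eight background regions, four selected regions.
annotation = [("chr1", 100, 200), ("chr1", 500, 600)]
background = [
    ("chr1", 0, 50), ("chr1", 120, 150), ("chr1", 180, 220), ("chr1", 300, 350),
    ("chr1", 520, 540), ("chr1", 700, 750), ("chr2", 100, 200), ("chr2", 500, 600),
]
selected = [background[1], background[2], background[4], background[0]]

result = enrichment(selected, background, annotation)
print(result)
```

The linear scan in `overlaps_any` is only meant to make the logic visible; for realistic track sizes an interval tree or a sorted sweep would be needed. Testing for depletion instead of enrichment would use the lower tail of the same hypergeometric distribution.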
The border and size of the circles denote the significance level of the overlap; the degree of enrichment or depletion is represented by the fill color. EpiAnnotator’s interface has been designed to be compatible with both computer screens and smartphone displays.

4 Conclusion

A key element that allows EpiAnnotator to reduce otherwise lengthy computation times to a few seconds is the use of pre-computed distances for the reference sets of genomic regions hosted in EpiAnnotator’s databanks. The databases are updated every two months to provide users with the latest annotations. Using EpiAnnotator does not require any coding skills, gives access to thousands of annotations through a web interface and provides enrichment analysis results along with high-quality figures.

Acknowledgement
We would like to thank Erik Bernstein for the outstanding IT support.

Funding
Y.A. was supported by DZHK (German Centre for Cardiovascular Research). This work was funded in part by the German Research Foundation (DFG), FOR 2674 and by the German Federal Ministry of Education and Research (BMBF), ICGC-Data Mining 01KU1505A.

Conflict of Interest: none declared.

References
Akalin A. et al. (2015) genomation: a toolkit to summarize, annotate and visualize genomic intervals. Bioinformatics, 31, 1127–1129.
Albrecht F. et al. (2016) DeepBlue epigenomic data server: programmatic data retrieval and analysis of epigenome region sets. Nucleic Acids Res., 44, W581–W586.
Bernstein B.E. et al. (2010) The NIH Roadmap Epigenomics Mapping Consortium. Nat. Biotechnol., 28, 1045–1048.
Bujold D. et al. (2016) The International Human Epigenome Consortium Data Portal. Cell Syst., 3, 496.e2–499.e2.
ENCODE Project Consortium (2004) The ENCODE (ENCyclopedia of DNA elements) project. Science, 306, 636–640.
Halachev K. et al. (2012) EpiExplorer: live exploration and global analysis of large epigenomic datasets. Genome Biol., 13, R96.
Huang D.W. et al. (2009) Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res., 37, 1–13.
Karolchik D. et al. (2003) The UCSC genome browser database. Nucleic Acids Res., 31, 51–54.
Lawrence M. et al. (2013) Software for computing and annotating genomic ranges. PLoS Comput. Biol., 9, e1003118.
Sheffield N.C., Bock C. (2016) LOLA: enrichment analysis for genomic region sets and regulatory elements in R and Bioconductor. Bioinformatics, 32, 587–589.
Taberlay P.C. et al. (2014) Reconfiguration of nucleosome-depleted regions at distal regulatory elements accompanies DNA methylation of enhancers and insulators in cancer. Genome Res., 24, 1421–1432.
Taylor R.A. et al. (2017) Germline BRCA2 mutations drive prostate cancers with distinct evolutionary trajectories. Nat. Commun., 8, 13671.

© The Author(s) 2018. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

Publisher
Oxford University Press
Copyright
© The Author(s) 2018. Published by Oxford University Press.
ISSN
1367-4803
eISSN
1460-2059
D.O.I.
10.1093/bioinformatics/bty007


Journal

Bioinformatics, Oxford University Press

Published: Jan 10, 2018
