ASaiM: a Galaxy-based framework to analyze microbiota data

Background: New generations of sequencing platforms coupled to numerous bioinformatics tools have led to rapid technological progress in metagenomics and metatranscriptomics to investigate complex microorganism communities. Nevertheless, a combination of different bioinformatics tools remains necessary to draw conclusions from microbiota studies. Modular and user-friendly tools would greatly improve such studies.

Findings: We therefore developed ASaiM, an open-source Galaxy-based framework dedicated to microbiota data analyses. ASaiM provides an extensive collection of tools to assemble, extract, explore, and visualize microbiota information from raw metataxonomic, metagenomic, or metatranscriptomic sequences. To guide the analyses, several customizable workflows are included. They are supported by tutorials and Galaxy interactive tours, which guide users through the analyses step by step. ASaiM is implemented as a Galaxy Docker flavour. It is scalable to thousands of datasets but can also be used on a standard PC. The associated source code is available under the Apache 2 license at https://github.com/ASaiM/framework, and documentation can be found online (http://asaim.readthedocs.io).

Conclusions: Based on the Galaxy framework, ASaiM offers a sophisticated environment with a variety of tools, workflows, documentation, and training to scientists working on complex microorganism communities. It makes the analysis and exploration of microbiota data easy, quick, transparent, reproducible, and shareable.

Keywords: metagenomics; metataxonomics; user-friendly; Galaxy; Docker; microbiota; training

Received: 8 September 2017; Revised: 6 January 2018; Accepted: 10 May 2018
© The Author(s) 2018. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Downloaded from https://academic.oup.com/gigascience/article-abstract/7/6/giy057/5001424 by Ed 'DeepDyve' Gillespie user on 21 June 2018

Findings

Background

The study of microbiota and microbial communities has been facilitated by the evolution of sequencing techniques and the development of metataxonomics, metagenomics, and metatranscriptomics. These techniques give insight into the taxonomic profiles and genomic components of microbial communities.

However, meta'omic data exploitation is not trivial. The data are large and complex, and reference databases are incomplete. The dedicated bioinformatics tools are also difficult to find, configure, use, and combine. Hence, to extract useful information, a sequenced microbiota sample has to be processed by sophisticated workflows with numerous successive bioinformatics steps [1]. Each step may require the execution of several tools or software packages. For example, extracting taxonomic information with the widely used QIIME [2] or Mothur [3] requires at least 10 different tools with at least four parameters each. Because they were designed for amplicon data, neither QIIME nor Mothur can be applied directly to shotgun metagenomics data. In addition, the tools can be complex to use: they are command-line tools and may require extensive computational resources (memory, disk space). In this context, selecting the best tools is a complex and error-prone process, as is configuring them with the correct parameters and appropriate computational resources and combining them into an analysis chain. These issues and the complexity involved prevent scientists from participating in the analysis of their own data. Furthermore, bioinformatics tools are often executed manually and/or patched together with custom scripts. These practices raise doubts about a gold standard of science: reproducibility [3, 4].

Web services and automated pipelines such as MG-RAST [5] and EBI metagenomics [6] offer solutions to the accessibility issue. However, these web services work as black boxes and lack transparency, flexibility, and even reproducibility, as the versions and parameters of the tools are not always available. Open-source workflow systems such as Galaxy [6-8] offer alternative approaches that improve accessibility, modularity, and reproducibility. Galaxy is a lightweight environment that provides a web-based, intuitive, and accessible user interface to command-line tools. It automatically manages computation and transparently manages data provenance and workflow scheduling [6-8]. More than 5,500 tools can be used inside any Galaxy environment. For example, the main Galaxy server [9] integrates many genomic tools. The few metagenomics tools it integrates, such as Kraken [10] or VSearch [11], have been showcased in the published windshield splatter analysis [12]. Tools can also be selected and combined to build Galaxy flavors focusing on a specific type of analysis, for example, the Galaxy RNA workbench [13] or the specialized Galaxy server of the Huttenhower lab [14]. However, none of these solutions is dedicated to microbiota data analysis in general with the community-standard tools.

In this context, we developed ASaiM (Auvergne Sequence analysis of intestinal Microbiota, RRID:SCR_015878), an open-source, opinionated Galaxy-based framework. It integrates more than 100 tools and several workflows dedicated to microbiota analyses, with extensive documentation [15] and training support.

Goals of ASaiM

ASaiM is developed as a modular, accessible, redistributable, sharable, and user-friendly framework for scientists working with microbiota data. This framework is unique in combining curated tools and workflows with easy access and support for scientists.

ASaiM is based on four pillars:
(1) easy and stable dissemination via Galaxy, Docker, and Conda;
(2) a comprehensive set of microbiota-related tools;
(3) a set of predefined and tested workflows;
(4) extensive documentation and training to help scientists in their analyses.

A framework built on the shoulders of giants

The ASaiM framework is built on existing tools and infrastructures and combines their strengths to create an easily accessible and reproducible analysis platform.

ASaiM is implemented as a portable virtualized container based on the Galaxy framework [8]. Galaxy gives researchers the means to reproduce their own workflow analyses, rerun entire pipelines, or publish and share them with others. Because it is based on Galaxy, ASaiM scales from single-CPU installations to large multi-node high-performance computing environments. It efficiently manages job submission as well as the memory consumption of the tools.

ASaiM can be deployed from a pre-built Docker image based on the Galaxy Docker project [16]. This ASaiM Docker flavour is customized with a variety of selected tools, workflows, interactive tours, and data. These have been added as additional layers on top of the generic Galaxy Docker instance. Containerization keeps the deployment task to a minimum. The selected Galaxy tools are automatically installed from the Galaxy ToolShed [17] using BioBlend [18], a library for the Galaxy API. The tools' dependencies are automatically resolved using packages available through Bioconda [19].

To populate ASaiM with the selected microbiota tools, we:
- migrated 12 tools/suites of tools and their dependencies to Bioconda (e.g., HUMAnN2);
- integrated 16 suites (>100 tools) into Galaxy (e.g., HUMAnN2, or QIIME with its approximately 40 tools);
- updated the tools already available (Table 1).

Tools for microbiota data analyses

The tools integrated in ASaiM are listed in Table 1. They were expertly selected for their relevance to microbiota studies. Examples include Mothur (mothur, RRID:SCR_011947) [3], QIIME (QIIME, RRID:SCR_008249) [2], MetaPhlAn2 (MetaPhlAn, RRID:SCR_004915) [45], HUMAnN2 [46], and tools used in existing pipelines such as the EBI Metagenomics pipeline. We also added general sequence-analysis tools for quality control, mapping, and similarity search.

A development effort was made, with the help and support of the Galaxy community, to integrate these tools into Conda and the Galaxy environment (>100 tools integrated). We also developed two new tools to search and retrieve data from the EBI Metagenomics and ENA databases (EBISearch [20] and ENASearch [21]). In addition, we wrote a tool to group HUMAnN2 outputs into Gene Ontology (GO) Slim terms [47]. Tools inside ASaiM are documented [15] and organized so that they are easy to find.

Diverse sources of data

User data can be uploaded into ASaiM through a web interface or, for more advanced use, via FTP or SFTP. In addition, we added specialized tools that query external databases such as NCBI, ENA, or EBI Metagenomics and download data into the ASaiM environment.

Visualization of the data

An analysis often ends with summarizing figures that conclude and represent the findings. ASaiM includes standard interactive plotting tools to draw bar charts and scatter plots for all kinds of tabular data. Phinch visualization [52] is also included. It can interactively visualize and explore any BIOM file and generate different types of ready-to-publish figures.
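Phinch and the BIOM-Format tools operate on BIOM files [25]. The sketch below illustrates what such a file contains by building a minimal table from plain per-sample abundances. It assumes the JSON-based BIOM 1.0 layout (fields such as `rows`, `columns`, `matrix_type`, `shape`, and `data`). The function name `to_biom_v1`, the taxon names, and the sample values are hypothetical. For real analyses, the BIOM-Format tools shipped in ASaiM are the safer route.

```python
import json
from datetime import datetime, timezone


def to_biom_v1(abundances, sample_ids, table_id="example-taxon-table"):
    """Build a minimal sparse BIOM 1.0 (JSON) table (illustrative sketch).

    abundances: dict mapping an observation id (here, a taxon name) to a list
    of per-sample values given in the same order as sample_ids.
    """
    observation_ids = sorted(abundances)
    data = []
    for row, obs_id in enumerate(observation_ids):
        values = abundances[obs_id]
        if len(values) != len(sample_ids):
            raise ValueError(f"{obs_id}: expected {len(sample_ids)} values")
        for col, value in enumerate(values):
            if value:  # sparse layout: only non-zero cells are stored
                data.append([row, col, float(value)])
    return {
        "id": table_id,
        "format": "Biological Observation Matrix 1.0.0",
        "format_url": "http://biom-format.org",
        "type": "Taxon table",
        "generated_by": "to_biom_v1 sketch",
        "date": datetime.now(timezone.utc).isoformat(),
        "rows": [{"id": obs_id, "metadata": None} for obs_id in observation_ids],
        "columns": [{"id": s, "metadata": None} for s in sample_ids],
        "matrix_type": "sparse",
        "matrix_element_type": "float",
        "shape": [len(observation_ids), len(sample_ids)],
        "data": data,
    }


if __name__ == "__main__":
    # Hypothetical relative abundances for two samples.
    table = to_biom_v1(
        {"s__Escherichia_coli": [12.5, 0.0], "s__Bacteroides_vulgatus": [3.0, 7.25]},
        ["sample_A", "sample_B"],
    )
    print(json.dumps(table, indent=2))
```

The sparse layout stores only the non-zero cells as `[row, column, value]` triples. This keeps tables compact when, as is typical for microbiota data, most taxa are absent from most samples.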
We also integrated two other tools to explore and represent the community structure: KRONA [51] and GraPhlAn [53]. Moreover, as in any Galaxy instance, other visualizations are included. Examples are Phyloviz [54] for phylogenetic trees and the genome browser Trackster [55] for visualizing SAM/BAM, BED, GFF/GTF, WIG, bigWig, bigBed, bedGraph, and VCF datasets.

Table 1: Available tools in ASaiM
Section | Subsection | Tools
File and meta tools | Data retrieval | EBISearch [20], ENASearch [21], SRA Tools
File and meta tools | Text manipulation | Tools from Galaxy ToolShed
File and meta tools | Sequence file manipulation | Tools from Galaxy ToolShed
File and meta tools | BAM/SAM file manipulation | SAM tools [22-24]
File and meta tools | BIOM file manipulation | BIOM-Format tools [25]
Genomics tools | Quality control | FastQC [26], PRINSEQ [27], Trim Galore! [28], Trimmomatic [29], MultiQC [30]
Genomics tools | Clustering | CD-Hit [31], Format CD-HIT outputs
Genomics tools | Sorting and prediction | SortMeRNA [32], FragGeneScan [33]
Genomics tools | Mapping | BWA [34], Bowtie [35]
Genomics tools | Similarity search | NCBI Blast+ [36, 37], Diamond [38]
Genomics tools | Alignment | HMMER3 [39]
Microbiota dedicated tools | Metagenomics data manipulation | VSEARCH [11], Nonpareil [40]
Microbiota dedicated tools | Assembly | MEGAHIT [41], metaSPAdes [42], metaQUAST [43], VALET [44]
Microbiota dedicated tools | Metataxonomic sequence analysis | Mothur [3], QIIME [2]
Microbiota dedicated tools | Taxonomy assignation on WGS sequences | MetaPhlAn2 [45], Format MetaPhlan2, Kraken [10]
Microbiota dedicated tools | Metabolism assignation | HUMAnN2 [46], Group HUMAnN2 to GO slim terms [47], Compare HUMAnN2 outputs, PICRUST [48], InterProScan
Microbiota dedicated tools | Combination of functional and taxonomic results | Combine MetaPhlAn2 and HUMAnN2 outputs
Microbiota dedicated tools | Visualization | Export2graphlan [49], GraPhlAn [50], KRONA [51]
This table presents the tools, organized in sections and subsections to help users. A more detailed table of the available tools and some documentation can be found in the online documentation (http://asaim.readthedocs.io/en/latest/tools/).

Workflows

Each tool can be used separately in an exploratory manner, with the Galaxy tool form helping users set meaningful parameters. Tools can also be orchestrated inside workflows using the powerful Galaxy workflow manager. To assist in microbiota analyses, ASaiM offers several workflows, including a few well-known pipelines. Each is documented with its tools and their default parameters. These workflows can be:
- used as is;
- customized, either on the fly to tune the parameters or globally to change the tools, their order, and their default parameters;
- used as subworkflows.
Moreover, users can design new workflows via the Galaxy workflow interface using the >100 available tools.

Analysis of raw metagenomic or metatranscriptomic shotgun data

From raw metagenomic or metatranscriptomic shotgun data, the workflow quickly produces accurate and precise taxonomic assignations, extensive functional results, and taxonomically related metabolism information (Fig. 1). This workflow consists of:
(i) processing with quality control/trimming (FastQC and Trim Galore!) and dereplication (VSearch [11]);
(ii) taxonomic analyses with assignation (MetaPhlAn2 [45]) and visualization (KRONA, GraPhlAn);
(iii) functional analyses with metabolic assignation and pathway reconstruction (HUMAnN2 [46]);
(iv) functional and taxonomic combination with purpose-built tools that combine HUMAnN2 and MetaPhlAn2 outputs.

This workflow has been tested on two mock metagenomic datasets with controlled communities (Supplementary material). To illustrate its potential, we compared the taxonomic and functional information it extracted with two references: the information extracted by the EBI Metagenomics pipeline and the expectations from the mock datasets. With ASaiM, we generate accurate and precise data for taxonomic analyses (Fig. 2), and we can access information at the species level. ASaiM also extracts more functional information (e.g., gene families, gene ontologies, pathways) than is available on EBI Metagenomics. With this workflow, we can go one step further and investigate which taxa are involved in a specific pathway or gene family. For example, we can identify the species involved in the different steps of fatty acid biosynthesis pathways and their relative involvement (Fig. 3).

For the tests, ASaiM was deployed on a computer running Debian GNU/Linux with 8 Intel(R) Xeon(R) cores at 2.40 GHz and 32 GB of RAM. The workflow processed the 1,225,169 and 1,386,198 454 GS FLX Titanium reads of the two datasets with stable memory usage, in 4 h 44 min and 5 h 22 min, respectively (Supplementary material). The execution time is logarithmically linked to the input data size. This workflow therefore makes it easy and quick to process raw microbiota data and extract diverse useful information.

Figure 1: Main ASaiM workflow to analyze raw sequences. This workflow takes as input a dataset of raw shotgun sequences (in FastQ format) from microbiota, preprocesses it (yellow boxes), extracts taxonomic (red boxes) and functional (purple boxes) assignations, and combines them (green boxes). Image available under CC-BY license (https://doi.org/10.6084/m9.figshare.5371396.v3).

Assembly of metagenomics data

Microbiota data usually come with quite short reads. To reconstruct genomes or obtain longer sequences for further analysis, microbiota sequences have to be assembled with dedicated metagenome assemblers.
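Assembly contiguity is commonly summarized with N50, one of the statistics reported by QUAST-family tools such as MetaQUAST. The sketch below computes it from a list of contig lengths. It is plain Python, not MetaQUAST's or ASaiM's code. The function name `n50` and its optional `min_length` cutoff are hypothetical; they mimic the common practice of ignoring very short contigs.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int], min_length: int = 0) -> int:
    """Return the N50 of an assembly (illustrative sketch).

    N50 is the largest length L such that contigs of length >= L together
    contain at least half of the assembled bases.
    """
    # Drop empty contigs and, optionally, contigs below a (hypothetical) cutoff.
    lengths = sorted(
        (n for n in contig_lengths if n >= max(min_length, 1)), reverse=True
    )
    if not lengths:
        raise ValueError("no contigs above the length cutoff")
    half = sum(lengths) / 2
    covered = 0
    # Walk from the longest contig down until half the bases are covered.
    for length in lengths:
        covered += length
        if covered >= half:
            return length
    return lengths[-1]  # not reached: the full sum always covers half


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Sorting in decreasing order and stopping at the first contig that brings the cumulative length to half of the total is equivalent to the "largest L" definition in the docstring. In the example, 10 + 9 + 8 = 27 reaches half of 54, so the N50 is 8.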
To help with metagenome assembly, ASaiM includes two workflows, each built on one of the well-performing assemblers [56-62]: MEGAHIT [41] and MetaSPAdes [42]. Both workflows consist of:
(1) processing with quality control/trimming (FastQC and Trim Galore!);
(2) assembly with either MEGAHIT or MetaSPAdes;
(3) estimation of assembly quality statistics with MetaQUAST [43];
(4) identification of potential assembly error signatures with VALET;
(5) determination of the percentage of unmapped reads with Bowtie2 (Bowtie, RRID:SCR_005476) [35], combined with MultiQC [30] to aggregate the results.

Figure 2: Comparisons of the community structure for SRR072233. This figure compares the community structure between the expectations (mapping of the sequences on the expected genomes), the data found in the EBI Metagenomics database (extracted with the EBI Metagenomics pipeline), and the results of the main ASaiM workflow (Fig. 1).

Analysis of metataxonomic data

To analyze amplicon or internal transcribed spacer data, the Mothur and QIIME tool suites are available in ASaiM. We integrated the workflows described in the Mothur and QIIME tutorials as examples of metataxonomic data analyses and as support for the training material.

Running as in EBI Metagenomics

The tools used in the EBI Metagenomics pipeline (version 3) are also available in ASaiM, so we integrated them into a workflow with the same steps as that pipeline. Analyses made on the EBI Metagenomics website can then be reproduced locally without waiting for EBI Metagenomics to be available and without uploading any data to it.

Figure 3: Example of an investigation of the relation between community structure and functions. The involved species and their relative involvement in fatty acid biosynthesis pathways have been extracted with the ASaiM workflow (Fig. 1) for SRR072233.
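The per-species involvement shown in Figure 3 relies on HUMAnN2's stratified output, which reports each pathway abundance again for every contributing taxon. The sketch below shows how each taxon's relative share of one pathway could be derived. It assumes HUMAnN2's tab-separated pathway-abundance layout, in which stratified rows are written `<pathway>|g__Genus.s__species`, plus an `|unclassified` stratum. The function `taxon_shares` and the example rows are hypothetical, and this is not the code of ASaiM's "Combine MetaPhlAn2 and HUMAnN2 outputs" tool.

```python
def taxon_shares(rows, pathway):
    """Share of one pathway's abundance contributed by each taxon (sketch).

    rows: lines of a HUMAnN2-style tab-separated table (feature, value).
    Stratified features are assumed to look like "<pathway>|<taxon>", e.g.
    "PWY-X: name|g__Genus.s__species" or "PWY-X: name|unclassified".
    """
    per_taxon = {}
    for line in rows:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header line
        feature, value = line.rstrip("\n").split("\t")[:2]
        name, sep, taxon = feature.partition("|")
        if name != pathway or not sep:
            continue  # another pathway, or the unstratified community total
        per_taxon[taxon] = per_taxon.get(taxon, 0.0) + float(value)
    total = sum(per_taxon.values())
    if total == 0:
        return {}
    # Normalize by the sum of the stratified values so that shares sum to 1.
    return {taxon: value / total for taxon, value in per_taxon.items()}


if __name__ == "__main__":
    example = [
        "# Pathway\tsample_Abundance",
        "PWY-X: example pathway\t10.0",
        "PWY-X: example pathway|g__Escherichia.s__Escherichia_coli\t6.0",
        "PWY-X: example pathway|unclassified\t4.0",
    ]
    print(taxon_shares(example, "PWY-X: example pathway"))
    # {'g__Escherichia.s__Escherichia_coli': 0.6, 'unclassified': 0.4}
```

The sketch normalizes by the sum of the stratified rows rather than by the unstratified community total. The reason is that, for pathways, per-taxon values may not add up exactly to the community-level abundance.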
However, when EBI Metagenomics analyses are reproduced in ASaiM, the parameters must be defined by the user, as we could not find them in the EBI Metagenomics documentation. In ASaiM, the entire provenance and every parameter are tracked to guarantee reproducibility.

Documentation and training

A tool or software package is easier to use if it is well documented. Extensive documentation helps users become familiar with a tool and also prevents misuse. For ASaiM, we developed extensive online documentation [15]. It mainly explains:
- how to use ASaiM;
- how to deploy it;
- which tools are integrated, with short documentation about each;
- which workflows are available and how to use them.

In addition to this online documentation, training materials have been developed. Galaxy interactive tours are included inside the Galaxy instance to guide users through entire microbiota analyses step by step. We also developed several step-by-step tutorials that use toy datasets to explain the concepts of microbiota analyses, the different tools and parameters, and the ASaiM workflows. These tutorials are hosted within the Galaxy Training Material [63] and are available online [64]. They are also directly accessible from ASaiM and its documentation for self-training. The tutorials and ASaiM have been used during several workshops on metagenomics data analysis. They have also been used in some undergraduate courses to explain and use the EBI Metagenomics workflow in a reproducible way. ASaiM is also used as support for a citizen science and education project (BeerDeCoded [65]).

Installation and running ASaiM

Running the containerized ASaiM simply requires the user to install Docker and to start the ASaiM image with:

$ docker run -d -p 8080:80 quay.io/bebatut/asaim-framework:latest

Here, -d runs the container in the background, and -p 8080:80 maps port 80 of the container, where Galaxy is served, to port 8080 of the host. The ASaiM Galaxy instance is then reachable at http://localhost:8080.

Like Galaxy, ASaiM is production ready. It can be configured to use external computer clusters or cloud environments. All or only a subset of the ASaiM tools can also easily be installed on existing Galaxy instances, as we did on the European Galaxy instance [66]. More details about installing and using ASaiM are available in the online documentation [15].

Conclusion

ASaiM provides a powerful framework to easily and quickly analyze microbiota data in a reproducible, accessible, and transparent way. Built on a Galaxy instance wrapped in a Docker image, ASaiM can be easily deployed with its extensive set of tools and their dependencies, saving users from the hassle of installing all the software. These tools are complemented with a set of predefined and tested workflows that address the main questions of microbiota research (assembly, community structure, and function). All these tools and workflows are extensively documented online [15] and supported by interactive tours and tutorials.

With this complete infrastructure, ASaiM offers any scientist a sophisticated environment for microbiota analyses while promoting transparency, sharing, and reproducibility.

Methods

For the tests, ASaiM was deployed on a computer running Debian GNU/Linux with 8 Intel(R) Xeon(R) cores at 2.40 GHz and 32 GB of RAM. The workflow was run on two mock community samples from the Human Microbiome Project containing a genomic mixture of 22 known microbial strains. The details of the comparison analyses are described in the Supplementary Material.

Availability of supporting data

Archival copies of the code and mock data are available in the GigaScience GigaDB repository [67].

Availability of supporting source code and requirements

Project name: ASaiM
Project home page: https://github.com/ASaiM/framework
Operating system(s): Platform independent
Other requirements: Docker
License: Apache 2
RRID:SCR_015878

All tools described herein are available in the Galaxy ToolShed (https://toolshed.g2.bx.psu.edu). The Dockerfile to automatically deploy ASaiM is provided in the GitHub repository (https://github.com/ASaiM/framework), and a pre-built Docker image is available at https://quay.io/repository/bebatut/asaim-framework.

Additional files

sup_mat_1.pdf

Abbreviations

API: application programming interface; ASaiM: Auvergne Sequence analysis of intestinal Microbiota; CPU: central processing unit; GTN: Galaxy Training Network.

Competing interests

The author(s) declare that they have no competing interests.

Funding

The Auvergne Regional Council and the European Regional Development Fund supported this work.

Authors' contributions

B.B., K.G., C.D., S.H., J.F.B., E.P., and P.P. contributed equally to the conceptualization, methodology, and writing process. J.F.B. and P.P. contributed equally to the funding acquisition. B.B., K.G., and S.H. contributed equally to the software development. B.B., K.G., C.D., and J.F.B. contributed equally to the validation.

Acknowledgements

The authors would like to thank EA 4678 CIDAM, UR 454 INRA, M2iSH, LIMOS, AuBi, Mésocentre, and de.NBI for their involvement in this project, as well as Rejane Beugnot, Thomas Eymard, David Parsons, and Björn Grüning for their help.

References

1. Ladoukakis E, Kolisis FN, Chatziioannou AA. Integrative workflows for metagenomic analysis. Front Cell Dev Biol 2014;2:70.
2. Caporaso JG, Kuczynski J, Stombaugh J, et al. QIIME allows analysis of high-throughput community sequencing data. Nat Methods 2010;7(5):335–6.
3. Schloss PD, Westcott SL, Ryabin T, et al. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol 2009;75:7537–41.
4. Nekrutenko A, Taylor J. Next-generation sequencing data interpretation: enhancing reproducibility and accessibility. Nat Rev Genet 2012;13:667–72.
5. Meyer F, Paarmann D, D'Souza M, et al. The metagenomics RAST server – a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics 2008;9:386.
6. Hunter S, Corbett M, Denise H, et al. EBI metagenomics–a new resource for the analysis and archiving of metagenomic data. Nucleic Acids Res 2014;42:D600–6.
7. Goecks J, Nekrutenko A, Taylor J, et al. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol 2010;11:R86.
8. Afgan E, Baker D, van den Beek M, et al. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update. Nucleic Acids Res 2016;44:W3–10.
9. Main Galaxy instance, http://usegalaxy.org
10. Wood DE, Salzberg SL. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol 2014;15:R46.
11. Rognes T, Flouri T, Nichols B, et al. VSEARCH: a versatile open source tool for metagenomics. PeerJ 2016;4:e2584.
12. Kosakovsky Pond S, Wadhawan S, Chiaromonte F, et al. Windshield splatter analysis with the Galaxy metagenomic pipeline. Genome Res 2009;19:2144–53.
13. Grüning BA, Fallmann J, Yusuf D, et al. The RNA workbench: best practices for RNA and high-throughput sequencing bioinformatics in Galaxy. Nucleic Acids Res 2017;45:W560–
14. Galaxy instance of the Huttenhower Lab, http://huttenhower.sph.harvard.edu/galaxy
15. ASaiM Documentation, http://asaim.readthedocs.io
16. Docker images tracking the stable Galaxy releases, http://bgruening.github.io/docker-galaxy-stable
17. Blankenberg D, Von Kuster G, Bouvier E, et al. Dissemination of scientific software with Galaxy ToolShed. Genome Biol 2014;15:403.
18. Sloggett C, Goonasekera N, Afgan E. BioBlend: automating pipeline analyses within Galaxy and CloudMan. Bioinformatics 2013;29:1685–6.
19. Grüning B, Dale R, Sjödin A, et al. Bioconda: A sustainable and comprehensive software distribution for the life sciences. bioRxiv 2017. http://dx.doi.org/10.1101/207092.
20. EBISearch, http://github.com/bebatut/ebisearch
21. Batut B, Grüning B. ENASearch: A Python library for interacting with ENA's API. The Journal of Open Source Software 2017;2:418.
22. Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 2011;27:2987–93.
23. Li H. Improving SNP discovery by base alignment quality. Bioinformatics 2011;27:1157–8.
24. Li H, Handsaker B, Wysoker A, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 2009;25:2078–9.
25. McDonald D, Clemente JC, Kuczynski J, et al. The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome. Gigascience 2012;1:7.
26. FastQC, https://www.bioinformatics.babraham.ac.uk/projects/fastqc.
27. Schmieder R, Edwards R. Quality control and preprocessing of metagenomic datasets. Bioinformatics 2011;27:863–4.
28. Trim Galore!, https://www.bioinformatics.babraham.ac.uk/projects/trim_galore.
29. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 2014;30:2114–20.
30. Ewels P, Magnusson M, Lundin S, et al. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 2016;32:3047–8.
31. Fu L, Niu B, Zhu Z, et al. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 2012;28:3150–2.
32. Kopylova E, Noe L, Touzet H. SortMeRNA: fast and accurate filtering of ribosomal RNAs in metatranscriptomic data. Bioinformatics 2012;28:3211–7.
33. Rho M, Tang H, Ye Y. FragGeneScan: predicting genes in short and error-prone reads. Nucleic Acids Res 2010;38:e191.
34. Li H, Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 2010;26:589–95.
35. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods 2012;9:357–9.
36. Camacho C, Coulouris G, Avagyan V, et al. BLAST+: architecture and applications. BMC Bioinformatics 2009;10:421.
37. Cock PJA, Chilton JM, Grüning B, et al. NCBI BLAST+ integrated into Galaxy. Gigascience 2015;4:39.
38. Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using DIAMOND. Nat Methods 2015;12:59–60.
39. Mistry J, Finn RD, Eddy SR, et al. Challenges in homology search: HMMER3 and convergent evolution of coiled-coil regions. Nucleic Acids Res 2013;41(12):e121.
40. Rodriguez-R LM, Konstantinidis KT. Nonpareil: a redundancy-based approach to assess the level of coverage in metagenomic datasets. Bioinformatics 2014;30:629–35.
41. Li D, Luo R, Liu C-M, et al. MEGAHIT v1.0: A fast and scalable metagenome assembler driven by advanced methodologies and community practices. Methods 2016;102:3–11.
42. Nurk S, Meleshko D, Korobeynikov A, et al. metaSPAdes: a new versatile metagenomic assembler. Genome Res 2017;27:824–34.
43. Mikheenko A, Saveliev V, Gurevich A. MetaQUAST: evaluation of metagenome assemblies. Bioinformatics 2016;32:1088–90.
44. VALET, http://github.com/jgluck/valet.
45. Truong DT, Franzosa EA, Tickle TL, et al. MetaPhlAn2 for enhanced metagenomic taxonomic profiling. Nat Methods 2015;12:902–3.
46. Abubucker S, Segata N, Goll J, et al. Metabolic reconstruction for metagenomic data and its application to the human microbiome. PLoS Comput Biol 2012;8:e1002358.
47. Group HUMAnN2 to GO slim terms, https://github.com/asaim/group_humann2_uniref_abundances_to_GO.
48. Langille MGI, Zaneveld J, Caporaso JG, et al. Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences. Nat Biotechnol 2013;31:814–21.
49. export2graphlan, http://bitbucket.org/CibioCM/export2graphlan.
50. Asnicar F, Weingart G, Tickle TL, et al. Compact graphical representation of phylogenetic data and metadata with GraPhlAn. PeerJ 2015;3:e1029.
51. Ondov BD, Bergman NH, Phillippy AM. Interactive metagenomic visualization in a Web browser. BMC Bioinformatics 2011;12:385.
52. Bik HM. Phinch: an interactive, exploratory data visualization framework for -Omics datasets. bioRxiv 2014. http://dx.doi.org/10.1101/009944.
53. GraPhlAn, http://huttenhower.sph.harvard.edu/graphlan.
54. Nascimento M, Sousa A, Ramirez M, et al. PHYLOViZ 2.0: providing scalable data integration and visualization for multiple phylogenetic inference methods. Bioinformatics 2017;33(1):128–9.
55. Goecks J, Coraor N, Galaxy Team. NGS analyses by visualization with Trackster. Nat Biotechnol 2012;30(11):1036–9.
56. Awad S, Irber L, Titus Brown C. Evaluating metagenome assembly on a simple defined community with many strain variants. bioRxiv 2017. http://dx.doi.org/10.1101/155358.
57. Greenwald WW, Klitgord N, Seguritan V, et al. Utilization of defined microbial communities enables effective evaluation of meta-genomic assemblies. BMC Genomics 2017;18:296.
58. Olson ND, Treangen TJ, Hill CM, et al. Metagenomic assembly through the lens of validation: recent advances in assessing and improving the quality of genomes assembled from metagenomes. Brief Bioinform 2017:bbx098. http://dx.doi.org/10.1093/bib/bbx098.
59. Quince C, Walker AW, Simpson JT, et al. Shotgun metagenomics, from sampling to analysis. Nat Biotechnol 2017;35:833–44.
60. Sczyrba A, Hofmann P, Belmann P, et al. Critical Assessment of Metagenome Interpretation-a benchmark of metagenomics software. Nat Methods 2017;14:1063–71.
61. van der Walt AJ, Van Goethem MW, Ramond J-B, et al. Assembling Metagenomes, One Community At A Time. BMC Genomics 2017;18:521.
62. Vollmers J, Wiegand S, Kaster A-K. Comparing and evaluating metagenome assembly tools from a microbiologist's perspective - not only size matters! PLoS One 2017;12:e0169662.
63. Batut B, Hiltemann S, Bagnacani A, et al. Community-driven data analysis training for biology. bioRxiv 2017. http://dx.doi.org/10.1101/225680.
64. Galaxy Training Material for metagenomics, http://training.galaxyproject.org/topics/metagenomics
65. Sobel J, Henry L, Rotman N, et al. BeerDeCoded: the open beer metagenome project. F1000Res 2017;6:1676.
66. Metagenomics flavor of the European Galaxy instance, https://metagenomics.usegalaxy.eu
67. Batut B, Gravouil K, Defois C, et al. Supporting data for "ASaiM: a Galaxy-based framework to analyze microbiota data". GigaScience Database 2018. http://dx.doi.org/10.5524/1

GigaScience, Oxford University Press

ASaiM: a Galaxy-based framework to analyze microbiota data

Publisher: BGI
Copyright: © The Author(s) 2018. Published by Oxford University Press.
eISSN: 2047-217X
DOI: 10.1093/gigascience/giy057

Abstract

Background: New generations of sequencing platforms coupled to numerous bioinformatics tools have led to rapid technological progress in metagenomics and metatranscriptomics to investigate complex microorganism communities. Nevertheless, a combination of different bioinformatics tools remains necessary to draw conclusions from microbiota studies. Modular and user-friendly tools would greatly improve such studies.

Findings: We therefore developed ASaiM, an Open-Source Galaxy-based framework dedicated to microbiota data analyses. ASaiM provides an extensive collection of tools to assemble, extract, explore, and visualize microbiota information from raw metataxonomic, metagenomic, or metatranscriptomic sequences. To guide the analyses, several customizable workflows are included and are supported by tutorials and Galaxy interactive tours, which guide users through the analyses step by step. ASaiM is implemented as a Galaxy Docker flavour. It scales to thousands of datasets but can also be used on a standard PC. The associated source code is available under the Apache 2 license at https://github.com/ASaiM/framework, and documentation can be found online (http://asaim.readthedocs.io).

Conclusions: Based on the Galaxy framework, ASaiM offers a sophisticated environment with a variety of tools, workflows, documentation, and training to scientists working on complex microorganism communities. It makes the analysis and exploration of microbiota data easy, quick, transparent, reproducible, and shareable.

Keywords: metagenomics; metataxonomics; user-friendly; Galaxy; Docker; microbiota; training

Received: 8 September 2017; Revised: 6 January 2018; Accepted: 10 May 2018

© The Author(s) 2018. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Findings

Background

The study of microbiota and microbial communities has been facilitated by the evolution of sequencing techniques and the development of metataxonomics, metagenomics, and metatranscriptomics. These techniques give insight into the taxonomic profiles and genomic components of microbial communities. However, exploiting meta'omic data is not trivial. The data are large and complex, reference databases are incomplete, and the dedicated bioinformatics tools are difficult to find, configure, use, and combine. Hence, to extract useful information, a sequenced microbiota sample has to be processed by sophisticated workflows with numerous successive bioinformatics steps [1]. Each step may require the execution of several tools or software. For example, to extract taxonomic information with the widely used QIIME [2] or Mothur [3], at least 10 different tools with at least four parameters each are needed. Because they were designed for amplicon data, neither QIIME nor Mothur can be directly applied to shotgun metagenomics data.

In addition, the tools can be complex to use: they are command-line tools and may require extensive computational resources (memory, disk space). In this context, selecting the best tools, configuring them with the correct parameters and appropriate computational resources, and combining them into an analysis chain is a complex and error-prone process. These issues and the complexity involved prevent many scientists from participating in the analysis of their own data. Furthermore, bioinformatics tools are often executed manually and/or patched together with custom scripts. These practices raise doubts about a gold standard of science: reproducibility [3, 4]. Web services and automated pipelines such as MG-RAST [5] and EBI Metagenomics [6] offer solutions to the accessibility issue. However, these web services work as black boxes and lack transparency, flexibility, and even reproducibility, as the versions and parameters of the tools are not always available.

Alternative approaches to improve accessibility, modularity, and reproducibility can be found in open-source workflow systems such as Galaxy [7, 8]. Galaxy is a lightweight environment providing a web-based, intuitive, and accessible user interface to command-line tools, while automatically managing computation and transparently managing data provenance and workflow scheduling [7, 8]. More than 5,500 tools can be used inside any Galaxy environment. For example, the main Galaxy server [9] integrates many genomic tools, and the few integrated metagenomics tools, such as Kraken [10] or VSearch [11], have been showcased in the published windshield splatter analysis [12]. Tools can also be selected and combined to build Galaxy flavors focusing on a specific type of analysis, for example, the Galaxy RNA workbench [13] or the specialized Galaxy server of the Huttenhower lab [14]. However, none of these solutions is dedicated to microbiota data analysis in general with the community-standard tools.

In this context, we developed ASaiM (Auvergne Sequence analysis of intestinal Microbiota, RRID:SCR_015878), an Open-Source, opinionated Galaxy-based framework. It integrates more than 100 tools and several workflows dedicated to microbiota analyses, with extensive documentation [15] and training support.

Goals of ASaiM

ASaiM is developed as a modular, accessible, redistributable, sharable, and user-friendly framework for scientists working with microbiota data. This framework is unique in combining curated tools and workflows and in providing easy access and support for scientists.

ASaiM is based on four pillars: (1) easy and stable dissemination via Galaxy, Docker, and Conda; (2) a comprehensive set of microbiota-related tools; (3) a set of predefined and tested workflows; and (4) extensive documentation and training to help scientists in their analyses.

A framework built on the shoulders of giants

The ASaiM framework is built on existing tools and infrastructures and combines their strengths to create an easily accessible and reproducible analysis platform.

ASaiM is implemented as a portable virtualized container based on the Galaxy framework [8]. Galaxy provides researchers with the means to reproduce their own workflow analyses, rerun entire pipelines, or publish and share them with others. Because it is based on Galaxy, ASaiM scales from single-CPU installations to large multi-node high-performance computing environments. It also efficiently manages job submission and the memory consumption of the tools. ASaiM can be deployed using a pre-built ASaiM Docker image, which is based on the Galaxy Docker project [16]. This ASaiM Docker flavour is customized with a variety of selected tools, workflows, interactive tours, and data that have been added as additional layers on top of the generic Galaxy Docker instance. Containerization keeps the deployment effort to a minimum. The selected Galaxy tools are automatically installed from the Galaxy ToolShed [17] using BioBlend [18], a library for the Galaxy API. The tools' dependencies are automatically resolved using packages available through Bioconda [19]. To populate ASaiM with the selected microbiota tools, we migrated 12 tools or tool suites and their dependencies to Bioconda (e.g., HUMAnN2). We also integrated 16 suites (>100 tools) into Galaxy (e.g., HUMAnN2, or QIIME with its approximately 40 tools) and updated those already available (Table 1).

Tools for microbiota data analyses

The tools integrated in ASaiM are listed in Table 1. They were expertly selected for their relevance to microbiota studies. Examples include Mothur (mothur, RRID:SCR_011947) [3], QIIME (QIIME, RRID:SCR_008249) [2], MetaPhlAn2 (MetaPhlAn, RRID:SCR_004915) [45], HUMAnN2 [46], and tools used in existing pipelines such as the EBI Metagenomics pipeline. We also added general sequence analysis tools for quality control, mapping, and similarity search.

Substantial development effort went into integrating these tools into Conda and the Galaxy environment (>100 tools integrated), with the help and support of the Galaxy community. We also developed two new tools to search and retrieve data from the EBI Metagenomics and ENA databases (EBISearch [20] and ENASearch [21]). A third new tool groups HUMAnN2 outputs into Gene Ontology slim terms [47]. Tools inside ASaiM are documented [15] and organized to make them easy to find.

Diverse sources of data

User data can be uploaded into ASaiM through the web interface or, for more advanced use, via FTP or SFTP. In addition, we added specialised tools that interact with external databases such as NCBI, ENA, or EBI Metagenomics to query them and download data into the ASaiM environment.

Visualization of the data

An analysis often ends with summary figures that represent the findings. ASaiM includes standard interactive plotting tools to draw bar charts and scatter plots for all kinds of tabular data. Phinch [52] is also included to interactively visualize and explore any BIOM file and to generate different types of ready-to-publish figures.
We also integrated two other tools to explore and represent the community structure: KRONA [51] and GraPhlAn [53]. Moreover, as in any Galaxy instance, other visualizations are included. Examples are Phyloviz [54] for phylogenetic trees and the genome browser Trackster [55]. Trackster can display SAM/BAM, BED, GFF/GTF, WIG, bigWig, bigBed, bedGraph, and VCF datasets.

Table 1: Available tools in ASaiM (section, then subsection: tools)

File and meta tools
  Data retrieval: EBISearch [20], ENASearch [21], SRA Tools
  Text manipulation: tools from the Galaxy ToolShed
  Sequence file manipulation: tools from the Galaxy ToolShed
  BAM/SAM file manipulation: SAMtools [22-24]
  BIOM file manipulation: BIOM-Format tools [25]
Genomics tools
  Quality control: FastQC [26], PRINSEQ [27], Trim Galore! [28], Trimmomatic [29], MultiQC [30]
  Clustering: CD-HIT [31], Format CD-HIT outputs
  Sorting and prediction: SortMeRNA [32], FragGeneScan [33]
  Mapping: BWA [34], Bowtie [35]
  Similarity search: NCBI BLAST+ [36, 37], Diamond [38]
  Alignment: HMMER3 [39]
Microbiota dedicated tools
  Metagenomics data manipulation: VSEARCH [11], Nonpareil [40]
  Assembly: MEGAHIT [41], metaSPAdes [42], MetaQUAST [43], VALET [44]
  Metataxonomic sequence analysis: Mothur [3], QIIME [2]
  Taxonomy assignment on WGS sequences: MetaPhlAn2 [45], Format MetaPhlAn2, Kraken [10]
  Metabolism assignment: HUMAnN2 [46], Group HUMAnN2 to GO slim terms [47], Compare HUMAnN2 outputs, PICRUSt [48], InterProScan
  Combination of functional and taxonomic results: Combine MetaPhlAn2 and HUMAnN2 outputs
  Visualization: Export2graphlan [49], GraPhlAn [50], KRONA [51]

This table presents the tools, organized in sections and subsections to help users. A more detailed table of the available tools and some documentation can be found in the online documentation (http://asaim.readthedocs.io/en/latest/tools/).

Workflows

Each tool can be used separately in an exploratory manner, with the Galaxy tool form helping users set meaningful parameters. Tools can also be orchestrated inside workflows using the powerful Galaxy workflow manager. To assist in microbiota analyses, ASaiM offers several workflows, including a few well-known pipelines. Each workflow is documented with its tools and their default parameters. These workflows can be used as is. They can also be customized, either on the fly to tune the parameters or globally to change the tools, their order, and their default parameters. They can even be used as subworkflows. Moreover, users can design novel workflows via the Galaxy workflow interface using the more than 100 available tools.

Analysis of raw metagenomic or metatranscriptomic shotgun data

The main ASaiM workflow (Fig. 1) takes raw metagenomic or metatranscriptomic shotgun data as input. From these data, it quickly produces accurate and precise taxonomic assignments, extensive functional results, and taxonomically related metabolism information. It consists of (i) processing with quality control/trimming (FastQC and Trim Galore!) and dereplication (VSearch [11]); (ii) taxonomic analyses with assignment (MetaPhlAn2 [45]) and visualization (KRONA, GraPhlAn); (iii) functional analyses with metabolic assignment and pathway reconstruction (HUMAnN2 [46]); and (iv) functional and taxonomic combination, using tools developed to combine HUMAnN2 and MetaPhlAn2 outputs.

This workflow has been tested on two mock metagenomic datasets with controlled communities (Supplementary material). To illustrate the potential of the ASaiM workflow, we compared the extracted taxonomic and functional information against two references: the information extracted by the EBI Metagenomics pipeline and the expected composition of the mock datasets. With ASaiM, we generate accurate and precise data for taxonomic analyses (Fig. 2), and we can access information at the species level. ASaiM also extracts more functional information (e.g., gene families, gene ontologies, pathways) than is available on EBI Metagenomics. With this workflow, we can go one step further and investigate which taxa are involved in a specific pathway or gene family. For example, we can identify the species involved in the different steps of fatty acid biosynthesis pathways and their relative involvement (Fig. 3).

For the tests, ASaiM was deployed on a computer running Debian GNU/Linux with 8 Intel(R) Xeon(R) cores at 2.40 GHz and 32 GB of RAM. The two datasets contained 1,225,169 and 1,386,198 454 GS FLX Titanium reads, respectively. The workflow processed them with stable memory usage in 4 h 44 min and 5 h 22 min, respectively (Supplementary material). The execution time is logarithmically related to the input data size. With this workflow, it is therefore easy and quick to process raw microbiota data and extract diverse useful information.

Figure 1: Main ASaiM workflow to analyze raw sequences. This workflow takes as input a dataset of raw shotgun sequences (in FastQ format) from microbiota, preprocesses it (yellow boxes), extracts taxonomic (red boxes) and functional (purple boxes) assignments, and combines them (green boxes). Image available under CC-BY license (https://doi.org/10.6084/m9.figshare.5371396.v3).
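The step structure of the main workflow in Fig. 1 can be viewed as a dependency graph. A workflow manager has to execute the graph in an order that respects every dependency. The sketch below derives one valid order, plus batches of steps that could run concurrently, using Python's standard `graphlib` module (Python 3.9+). The step names are hypothetical. The edges are simplifying assumptions (for example, that HUMAnN2 consumes the dereplicated reads directly). This illustrates the concept only; it does not describe the Galaxy workflow file shipped with ASaiM or how Galaxy actually schedules jobs.

```python
from graphlib import TopologicalSorter

# Hypothetical, simplified model of the main ASaiM workflow (Fig. 1):
# each step maps to the set of steps whose outputs it consumes.
WORKFLOW = {
    "fastqc": {"raw_reads"},
    "trim_galore": {"raw_reads"},
    "vsearch_dereplication": {"trim_galore"},
    "metaphlan2": {"vsearch_dereplication"},
    "krona": {"metaphlan2"},
    "graphlan": {"metaphlan2"},
    "humann2": {"vsearch_dereplication"},
    "combine_metaphlan2_humann2": {"metaphlan2", "humann2"},
}


def execution_order(workflow):
    """Return one valid execution order (raises graphlib.CycleError on cycles)."""
    return list(TopologicalSorter(workflow).static_order())


def parallel_batches(workflow):
    """Group steps into batches whose members have all inputs available together."""
    sorter = TopologicalSorter(workflow)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a reproducible listing
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    for batch in parallel_batches(WORKFLOW):
        print(batch)
```

With these assumed edges, MetaPhlAn2 and HUMAnN2 land in the same batch. KRONA, GraPhlAn, and the combination step can only start once both have finished. This is the kind of independence a workflow manager can exploit on a multi-core machine or cluster.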
Figure 2: Comparisons of the community structure for SRR072233. This figure compares the community structure across three sources. The first is the expected structure, obtained by mapping the sequences onto the expected genomes. The second is the data found in the EBI Metagenomics database (extracted with the EBI Metagenomics pipeline). The third is the result of the main ASaiM workflow (Fig. 1).

Assembly of metagenomics data

Microbiota data usually consist of rather short reads. To reconstruct genomes or to obtain longer sequences for further analysis, microbiota sequences have to be assembled with dedicated metagenome assemblers. To help in this task, two workflows have been developed in ASaiM, each using one of the well-performing assemblers [56-62]: MEGAHIT [41] and metaSPAdes [42]. Both workflows consist of (1) processing with quality control/trimming (FastQC and Trim Galore!); (2) assembly with either MEGAHIT or metaSPAdes; (3) estimation of assembly quality statistics with MetaQUAST [43]; (4) identification of potential assembly error signatures with VALET [44]; and (5) determination of the percentage of unmapped reads with Bowtie2 (Bowtie, RRID:SCR_005476) [35], combined with MultiQC [30] to aggregate the results.

Analysis of metataxonomic data

To analyze amplicon or internal transcribed spacer data, the Mothur and QIIME tool suites are available in ASaiM. We integrated the workflows described in the Mothur and QIIME tutorials as examples of metataxonomic data analyses and as support for the training material.

Running as in EBI Metagenomics

The tools used in the EBI Metagenomics pipeline (version 3) are also available in ASaiM. We therefore integrated them in a workflow with the same steps as the EBI Metagenomics pipeline. Analyses made on the EBI Metagenomics website can then be reproduced locally. Users do not have to wait for EBI Metagenomics to be available or upload any data to it. However, the parameters must be defined by the user, as we could not find them in the EBI Metagenomics documentation. In ASaiM, the entire provenance and every parameter are tracked to guarantee reproducibility.

Figure 3: Example of an investigation of the relation between community structure and functions. The involved species and their relative involvement in fatty acid biosynthesis pathways have been extracted with the main ASaiM workflow (Fig. 1) for SRR072233.
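The question behind Fig. 3 (which taxa contribute to a given pathway, and in what proportion) can be sketched as follows. HUMAnN2 can report stratified abundances, in which a feature and a contributing taxon are joined by `|` (for example, `PATHWAY|g__Genus.s__species`). The sketch assumes that tab-separated layout and computes each taxon's share of one feature's stratified abundance. It is a minimal illustration of the idea behind step (iv), not the code of the ASaiM combination tools. The function name and the example rows are hypothetical, and the abundance values are made up.

```python
from collections import defaultdict


def taxon_contributions(lines, feature):
    """Relative contribution of each taxon to one stratified HUMAnN2-style feature.

    lines: iterable of tab-separated "feature[|taxon]<TAB>abundance" rows;
           blank lines and comment lines starting with '#' are skipped.
    feature: the full feature name as written before the '|' separator.
    Returns a dict taxon -> share of the feature's stratified abundance
    (shares sum to 1), or an empty dict if the feature has no stratified rows.
    """
    per_taxon = defaultdict(float)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        name, abundance = line.rstrip("\n").split("\t")[:2]
        if "|" not in name:
            continue  # community-level total, not attributed to a taxon
        feat, taxon = name.split("|", 1)
        if feat == feature:
            per_taxon[taxon] += float(abundance)
    total = sum(per_taxon.values())
    if total == 0:
        return {}
    return {taxon: value / total for taxon, value in per_taxon.items()}
```

In real HUMAnN2 pathway tables, the part before `|` usually also carries the pathway description after the identifier. The full string should therefore be passed as `feature`, or the identifier split off first.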
Documentation and training

A tool or software is easier to use if it is well documented. Extensive documentation helps users become familiar with a tool and also prevents misuse. For ASaiM, we developed extensive online documentation [15]. It mainly explains how to use and deploy ASaiM, which tools are integrated (with short documentation about each of them), which workflows are available, and how to use them.

In addition to this online documentation, training materials have been developed. Galaxy interactive tours are included inside the Galaxy instance to guide users through entire microbiota analyses in an interactive (step-by-step) way. We also developed several step-by-step tutorials that use toy datasets. They explain the concepts of microbiota analyses, the different tools and parameters, and the ASaiM workflows. The tutorials are hosted within the Galaxy Training Material [63] and are available online [64]. They are also directly accessible from ASaiM and its documentation for self-training. These tutorials and ASaiM have been used during several workshops on metagenomics data analysis. They have also been used in some undergraduate courses to explain and use the EBI Metagenomics workflow in a reproducible way. ASaiM also supports a citizen science and education project (BeerDeCoded [65]).

Installation and running ASaiM

Running the containerized ASaiM simply requires the user to install Docker and to start the ASaiM image with:

$ docker run -d -p 8080:80 quay.io/bebatut/asaim-framework:latest

The -d option runs the container in the background. The -p 8080:80 option publishes the container's port 80 on port 8080 of the host, so the ASaiM Galaxy interface should then be reachable at http://localhost:8080.

Like Galaxy, ASaiM is production-ready and can be configured to use external computer clusters or cloud environments. It is also possible and easy to install all or only a subset of the ASaiM tools on existing Galaxy instances, as we did on the European Galaxy instance [66]. More details about the installation and use of ASaiM are available in the online documentation [15].

Conclusion

ASaiM provides a powerful framework to easily and quickly analyze microbiota data in a reproducible, accessible, and transparent way. ASaiM is built on a Galaxy instance wrapped in a Docker image. It can therefore be easily deployed with its extensive set of tools and their dependencies, saving users the hassle of installing all the software. These tools are complemented by a set of predefined and tested workflows addressing the main questions of microbiota research (assembly, community structure, and function). All these tools and workflows are extensively documented online [15] and supported by interactive tours and tutorials.

With this complete infrastructure, ASaiM offers a sophisticated environment for microbiota analyses to any scientist while promoting transparency, sharing, and reproducibility.

Methods

For the tests, ASaiM was deployed on a computer running Debian GNU/Linux with 8 Intel(R) Xeon(R) cores at 2.40 GHz and 32 GB of RAM. The workflow was run on two mock community samples from the Human Microbiome Project containing a genomic mixture of 22 known microbial strains. The details of the comparison analyses are described in the Supplementary Material.
Availability of supporting data

Archival copies of the code and mock data are available in the GigaScience GigaDB repository [67].

Availability of supporting source code and requirements

Project name: ASaiM
Project home page: https://github.com/ASaiM/framework
Operating system(s): Platform independent
Other requirements: Docker
License: Apache 2
RRID:SCR_015878

All tools described herein are available in the Galaxy ToolShed (https://toolshed.g2.bx.psu.edu). The Dockerfile to automatically deploy ASaiM is provided in the GitHub repository (https://github.com/ASaiM/framework), and a pre-built Docker image is available at https://quay.io/repository/bebatut/asaim-framework.

Additional files

sup mat 1.pdf

Abbreviations

API: application programming interface; ASaiM: Auvergne Sequence analysis of intestinal Microbiota; CPU: central processing unit; GTN: Galaxy Training Network.

Competing interests

The author(s) declare that they have no competing interests.

Funding

The Auvergne Regional Council and the European Regional Development Fund supported this work.

Authors' contributions

B.B., K.G., C.D., S.H., J.F.B., E.P., and P.P. contributed equally to the conceptualization, methodology, and writing process; J.F.B. and P.P. contributed equally to the funding acquisition; B.B., K.G., and S.H. contributed equally to the software development; and B.B., K.G., C.D., and J.F.B. contributed equally to the validation.

Acknowledgements

The authors would like to thank EA 4678 CIDAM, UR 454 INRA, M2iSH, LIMOS, AuBi, Mésocentre, and de.NBI for their involvement in this project, as well as Réjane Beugnot, Thomas Eymard, David Parsons, and Björn Grüning for their help.

References

1. Ladoukakis E, Kolisis FN, Chatziioannou AA. Integrative workflows for metagenomic analysis. Front Cell Dev Biol 2014;2:70.
2. Caporaso JG, Kuczynski J, Stombaugh J, et al. QIIME allows analysis of high-throughput community sequencing data. Nat Methods 2010;7(5):335–6.
3. Schloss PD, Westcott SL, Ryabin T, et al. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol 2009;75:7537–41.
4. Nekrutenko A, Taylor J. Next-generation sequencing data interpretation: enhancing reproducibility and accessibility. Nat Rev Genet 2012;13:667–72.
5. Meyer F, Paarmann D, D'Souza M, et al. The metagenomics RAST server – a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics 2008;9:386.
6. Hunter S, Corbett M, Denise H, et al. EBI metagenomics – a new resource for the analysis and archiving of metagenomic data. Nucleic Acids Res 2014;42:D600–6.
7. Goecks J, Nekrutenko A, Taylor J, et al. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol 2010;11:R86.
8. Afgan E, Baker D, van den Beek M, et al. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update. Nucleic Acids Res 2016;44:W3–10.
9. Main Galaxy instance, http://usegalaxy.org
10. Wood DE, Salzberg SL. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol 2014;15:R46.
11. Rognes T, Flouri T, Nichols B, et al. VSEARCH: a versatile open source tool for metagenomics. PeerJ 2016;4:e2584.
12. Kosakovsky Pond S, Wadhawan S, Chiaromonte F, et al. Windshield splatter analysis with the Galaxy metagenomic pipeline. Genome Res 2009;19:2144–53.
13. Grüning BA, Fallmann J, Yusuf D, et al. The RNA workbench: best practices for RNA and high-throughput sequencing bioinformatics in Galaxy. Nucleic Acids Res 2017;45:W560–
14. Galaxy instance of the Huttenhower Lab, http://huttenhower.sph.harvard.edu/galaxy
15. ASaiM Documentation, http://asaim.readthedocs.io
16. Docker images tracking the stable Galaxy releases, http://bgruening.github.io/docker-galaxy-stable
17. Blankenberg D, Von Kuster G, Bouvier E, et al. Dissemination of scientific software with Galaxy ToolShed. Genome Biol 2014;15:403.
18. Sloggett C, Goonasekera N, Afgan E. BioBlend: automating pipeline analyses within Galaxy and CloudMan. Bioinformatics 2013;29:1685–6.
19. Grüning B, Dale R, Sjödin A, et al. Bioconda: a sustainable and comprehensive software distribution for the life sciences. bioRxiv 2017. http://dx.doi.org/10.1101/207092
20. EBISearch, http://github.com/bebatut/ebisearch
21. Batut B, Grüning B. ENASearch: a Python library for interacting with ENA's API. The Journal of Open Source Software 2017;2:418.
22. Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 2011;27:2987–93.
23. Li H. Improving SNP discovery by base alignment quality. Bioinformatics 2011;27:1157–8.
24. Li H, Handsaker B, Wysoker A, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 2009;25:2078–9.
25. McDonald D, Clemente JC, Kuczynski J, et al. The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome. Gigascience 2012;1:7.
26. FastQC, https://www.bioinformatics.babraham.ac.uk/projects/fastqc
27. Schmieder R, Edwards R. Quality control and preprocessing of metagenomic datasets. Bioinformatics 2011;27:863–4.
28. Trim Galore!, https://www.bioinformatics.babraham.ac.uk/projects/trim_galore
29. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 2014;30:2114–20.
30. Ewels P, Magnusson M, Lundin S, et al. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 2016;32:3047–8.
31. Fu L, Niu B, Zhu Z, et al. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 2012;28:3150–2.
32. Kopylova E, Noé L, Touzet H. SortMeRNA: fast and accurate filtering of ribosomal RNAs in metatranscriptomic data. Bioinformatics 2012;28:3211–7.
33. Rho M, Tang H, Ye Y. FragGeneScan: predicting genes in short and error-prone reads. Nucleic Acids Res 2010;38:e191.
34. Li H, Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 2010;26:589–95.
35. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods 2012;9:357–9.
36. Camacho C, Coulouris G, Avagyan V, et al. BLAST+: architecture and applications. BMC Bioinformatics 2009;10:421.
37. Cock PJA, Chilton JM, Grüning B, et al. NCBI BLAST+ integrated into Galaxy. Gigascience 2015;4:39.
38. Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using DIAMOND. Nat Methods 2015;12:59–60.
39. Mistry J, Finn RD, Eddy SR, et al. Challenges in homology search: HMMER3 and convergent evolution of coiled-coil regions. Nucleic Acids Res 2013;41(12):e121.
40. Rodriguez-R LM, Konstantinidis KT. Nonpareil: a redundancy-based approach to assess the level of coverage in metagenomic datasets. Bioinformatics 2014;30:629–35.
41. Li D, Luo R, Liu C-M, et al. MEGAHIT v1.0: a fast and scalable metagenome assembler driven by advanced methodologies and community practices. Methods 2016;102:3–11.
42. Nurk S, Meleshko D, Korobeynikov A, et al. metaSPAdes: a new versatile metagenomic assembler. Genome Res 2017;27:824–34.
43. Mikheenko A, Saveliev V, Gurevich A. MetaQUAST: evaluation of metagenome assemblies. Bioinformatics 2016;32:1088–90.
44. VALET, http://github.com/jgluck/valet
45. Truong DT, Franzosa EA, Tickle TL, et al. MetaPhlAn2 for enhanced metagenomic taxonomic profiling. Nat Methods 2015;12:902–3.
46. Abubucker S, Segata N, Goll J, et al. Metabolic reconstruction for metagenomic data and its application to the human microbiome. PLoS Comput Biol 2012;8:e1002358.
47. Group HUMAnN2 to GO slim terms, https://github.com/asaim/group_humann2_uniref_abundances_to_GO
48. Langille MGI, Zaneveld J, Caporaso JG, et al. Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences. Nat Biotechnol 2013;31:814–21.
49. export2graphlan, http://bitbucket.org/CibioCM/export2graphlan
50. Asnicar F, Weingart G, Tickle TL, et al. Compact graphical representation of phylogenetic data and metadata with GraPhlAn. PeerJ 2015;3:e1029.
51. Ondov BD, Bergman NH, Phillippy AM. Interactive metagenomic visualization in a Web browser. BMC Bioinformatics 2011;12:385.
52. Bik HM. Phinch: an interactive, exploratory data visualization framework for -Omics datasets. bioRxiv 2014. http://dx.doi.org/10.1101/009944
53. GraPhlAn, http://huttenhower.sph.harvard.edu/graphlan
54. Nascimento M, Sousa A, Ramirez M, et al. PHYLOViZ 2.0: providing scalable data integration and visualization for multiple phylogenetic inference methods. Bioinformatics 2017;33(1):128–9.
55. Goecks J, Coraor N, Galaxy Team. NGS analyses by visualization with Trackster. Nat Biotechnol 2012;30(11):1036–9.
56. Awad S, Irber L, Titus Brown C. Evaluating metagenome assembly on a simple defined community with many strain variants. bioRxiv 2017. http://dx.doi.org/10.1101/155358
57. Greenwald WW, Klitgord N, Seguritan V, et al. Utilization of defined microbial communities enables effective evaluation of meta-genomic assemblies. BMC Genomics 2017;18:296.
58. Olson ND, Treangen TJ, Hill CM, et al. Metagenomic assembly through the lens of validation: recent advances in assessing and improving the quality of genomes assembled from metagenomes. Brief Bioinform 2017, bbx098. http://dx.doi.org/10.1093/bib/bbx098
59. Quince C, Walker AW, Simpson JT, et al. Shotgun metagenomics, from sampling to analysis. Nat Biotechnol 2017;35:833–44.
60. Sczyrba A, Hofmann P, Belmann P, et al. Critical Assessment of Metagenome Interpretation – a benchmark of metagenomics software. Nat Methods 2017;14:1063–71.
61. van der Walt AJ, Van Goethem MW, Ramond J-B, et al. Assembling metagenomes, one community at a time. BMC Genomics 2017;18:521.
62. Vollmers J, Wiegand S, Kaster A-K. Comparing and evaluating metagenome assembly tools from a microbiologist's perspective – not only size matters! PLoS One 2017;12:e0169662.
63. Batut B, Hiltemann S, Bagnacani A, et al. Community-driven data analysis training for biology. bioRxiv 2017. http://dx.doi.org/10.1101/225680
64. Galaxy Training Material for metagenomics, http://training.galaxyproject.org/topics/metagenomics
65. Sobel J, Henry L, Rotman N, et al. BeerDeCoded: the open beer metagenome project. F1000Res 2017;6:1676.
66. Metagenomics flavor of the European Galaxy instance, https://metagenomics.usegalaxy.eu
67. Batut B, Gravouil K, Defois C, et al. Supporting data for "ASaiM: a Galaxy-based framework to analyze microbiota data". GigaScience Database 2018. http://dx.doi.org/10.5524/1

Journal: GigaScience (Oxford University Press)
Published: May 15, 2018
