Resources and tools for the high-throughput, multi-omic study of intestinal microbiota

Abstract

The human gut microbiome impacts several aspects of human health and disease, including digestion, drug metabolism and the propensity to develop various inflammatory, autoimmune and metabolic diseases. Many of the molecular processes that play a role in the activity and dynamics of the microbiota go beyond species and gene composition, so understanding them requires advanced bioinformatics support. This article aims to provide an up-to-date view of the resources and software tools that are being developed and used in human gut microbiome research, in particular data integration and systems-level analysis efforts. These efforts demonstrate the power of standardized and reproducible computational workflows for integrating and analysing varied omics data and gaining deeper insights into microbe community structure and function as well as host–microbe interactions.

Keywords: human gut microbiome, data repositories, large-scale and integrative computational tools, modelling, immunomodulation, drug screening

Background

The human gastrointestinal tract is a complex ecosystem in which eukaryotic cells continuously interact with nutrients and with the complex microbial population of the gut microbiota [1]. Gut microorganisms are the source of many bioactive products that play key functions in human host pathways and microbe–microbe interactions [2]. Processes such as host–microbe crosstalk, immune activation and inflammation, microbe–microbe signalling, microbial metabolism and antimicrobial activity are bioactive in the human gut [3]. Therefore, the ability to modulate the gut microbiome and the associated host–microbe interactions holds great promise for developing new therapeutic strategies for many chronic diseases and antibiotic-resistant infections [4, 5].
Colonization of the gut starts just after birth, when pioneering species interact, through surface receptors, with gut cells to promote the expression of a specific set of host genes and favour the colonization of commensal microorganisms [6]. Epithelial function and the mucosa-associated immune system are influenced by direct host–microbiota interactions and through modulation of microbial metabolism [7]. The immune system is trained to ensure a fine balance between the response given to commensal gut microbiota (i.e. homeostatic and healthy situations) and pathogens (i.e. gastrointestinal disorders) [8]. Several non-infectious human diseases, such as autoimmune disorders, inflammatory bowel disease (IBD) and some gut-associated cancers, are related to immunological imbalance and compositional perturbations of the gut microbiota, also known as dysbiosis. Gut dysbiosis is a major contributor to diet-related obesity and type 2 diabetes mellitus [9, 10]. For example, alterations in the relative abundances of Gammaproteobacteria and Verrucomicrobia as well as in the ratio of Firmicutes to Bacteroidetes are associated with overweight, and alterations in butyrate-producing bacteria, such as Faecalibacterium prausnitzii, are often related to diabetes mellitus [11, 12]. Moreover, genetic and simple obesity share similar structural and functional features of dysbiosis, such as higher production of toxins with known potential to induce metabolic deteriorations (e.g. trimethylamine-N-oxide and indoxyl sulphate), higher abundance of genomes containing genes coding for enzymes involved in the production of these toxic co-metabolites and higher abundance of pathways for biosynthesis of bacterial antigens (such as endotoxin) [13–15].
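The compositional measures mentioned in this paragraph (relative abundances and the Firmicutes to Bacteroidetes ratio) can be illustrated with a minimal Python sketch. The function names and the read counts are hypothetical and chosen only for illustration; they do not come from any of the cited studies or tools.

```python
def relative_abundances(counts):
    """Convert raw per-taxon read counts into fractions that sum to 1."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must contain at least one read")
    return {taxon: n / total for taxon, n in counts.items()}


def firmicutes_to_bacteroidetes(counts):
    """Return the Firmicutes/Bacteroidetes ratio, or None if Bacteroidetes is absent."""
    bacteroidetes = counts.get("Bacteroidetes", 0)
    if bacteroidetes == 0:
        return None  # ratio is undefined without Bacteroidetes reads
    return counts.get("Firmicutes", 0) / bacteroidetes


# Hypothetical phylum-level read counts for one sample (illustrative values only).
sample_counts = {
    "Firmicutes": 600,
    "Bacteroidetes": 300,
    "Proteobacteria": 60,
    "Verrucomicrobia": 40,
}

sample_rel = relative_abundances(sample_counts)
sample_fb = firmicutes_to_bacteroidetes(sample_counts)
print(sample_rel)
print(sample_fb)
```

Because such ratios depend on sequencing depth, taxonomic assignment and sample handling, a sketch like this is only a starting point for comparing samples processed in the same way.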
Although the precise cause remains unknown, profiling studies of the gut microbiome associate the pathogenesis of IBD, a chronic and relapsing inflammatory disorder of the gut, with the under-representation of certain species in the faecal microbiota [16–19]. For example, F. prausnitzii has been postulated as a biomarker or a potential therapeutic agent of IBD [20, 21]. Gastric cancer and colorectal cancer are also connected to alterations in gut microbiota. For example, some dietary factors may alter gut microbiota interactions and affect cancer development and response to cancer treatment [22]. In fact, the term ‘oncobiome’ has been recently adopted to refer to the emerging field of research devoted to the study of the interplay between the human microbiome and cancer development [23]. Moreover, it is well recognized that the excessive use of broad-spectrum antibiotics can affect the relative proportions of gut microbial populations and foster bacterial resistance [24]. Considering recent technological advancements and community initiatives towards large-scale compilation of data on the human microbiome, integrative data analysis may be the key to better understanding the mechanisms of action of the gut microbiome and their involvement in the development and chronicity of the above-mentioned diseases [25]. For example, the study of colon cancer has relied on the combination of microbiome and metabolome data [26], proteome and metagenome data have supported the investigation of Crohn's disease (CD) [27], and metabolome, metagenome and metatranscriptome data provide a basis for investigating the relations between the gut microbiome and the xenobiotic metabolism of digoxin [28]. Furthermore, systems-level approaches, namely metabolic modelling approaches [5, 29] and microbiome-based predictive tools [30, 31], are showing great potential in delivering non-obvious and biologically meaningful knowledge.
Previous reviews presented attempts at computational systems biology and in silico modelling of the human microbiome [32] and introduced computational methods for understanding the human gut microbiota and developing therapeutic strategies [33]. The focus of the present review is human gut microbiome research, and our aim is to provide up-to-date information on bioinformatics resources and tools specialized in or useful for the multifaceted investigation of this microbiome (Figure 1). This review is accompanied by a small website (available at http://sing-group.org/humangut) that tracks the public availability of the projects, resources and tools mentioned here, while welcoming further inputs from the community.

Figure 1. Unravelling the mechanism of action of the human microbiome: resources and tools for the study of the intestinal microbiota.

Attention is focused on two main application areas: (1) the characterization of gut microbiota composition and the functional interplay related to dysbiosis, for instance in disease and under antibiotic therapy; and (2) the screening of the proteome of human gut species for products holding immunomodulatory, anti-inflammatory or other bioactivity of therapeutic interest. This work is thus considered of interest to those investigating the human gut microbiome and, in particular, those who are developing in silico software to pursue and consolidate emerging paths in such research.

Data repositories

According to the journal Science, the discovery of the microbiome was one of the 10 milestones of the first decade of the 21st century (http://www.sciencemag.org/site/special/insights2010/).
In the late 2000s, two large-scale initiatives were launched with the aim of documenting the role of human-associated microbial communities in human health and disease, i.e. the NIH’s Human Microbiome Project (HMP) [34] and the European Metagenomics of the Human Intestinal Tract (MetaHit) project [35]. Despite the enormous volume of data generated by these initiatives, general data usage is challenged by a number of design, technical and access decisions [36]. For example, many analyses still depend on a catalogue of reference genes. Existing catalogues for the human gut microbiome are based on samples from single cohorts or reference genomes (or protein sequences), which limits coverage of global microbiome diversity. Therefore, efforts have been invested in implementing integrated catalogues of reference genes [37] and developing approaches to conduct population-level analysis [38]. New catalogues such as the 1000 Genomes Project [39], the AmericanGut (http://americangut.org/) and the BritishGut (http://britishgut.org/) are supporting these analyses. Although it is hardly possible to enumerate all existing data resources that may be helpful for this area of research, it is important to have a comprehensive view of data availability, so that the usefulness of lesser-known repositories is uncovered and potential information gaps can be tackled. In this sense, Figure 2 presents relevant data repositories for human gut microbiome research, and Table 1 provides a list of the data sets available for human gut (e.g. from single cohort studies). Additional details on the explored databases can be found in Supplementary Material S1 and in the Web pages supporting this review.

Table 1. Data sets available for the study of the human gut microbiome and its interplay with the host in health and disease scenarios

| Data set | URL | Target |
| --- | --- | --- |
| Manipulation of the gut microbiota reveals role in colon tumorigenesis [56] | http://www.ncbi.nlm.nih.gov/sra/?term=SRP056144 | Colon tumorigenesis |
| Disease-specific alterations in the enteric virome in inflammatory bowel disease [57] | http://metagenomics.anl.gov/mgmain.html?mgpage=project&project=mgp11446 | CD and ulcerative colitis (UC) |
| Integrated metagenomics/metaproteomics reveals human host-microbiota signatures of Crohn's disease [27] | http://compbio.ornl.gov/crohns_disease_metagenomics_metaproteomics/ | CD |
| Gut microbiome in down syndrome [58] | http://metagenomics.anl.gov/mgmain.html?mgpage=project&project=mgp10557 | Down syndrome |
| Metabolome of human gut microbiome is predictive of host dysbiosis [59] | http://gigadb.org/dataset/100163 | Dysbiosis |
| Helicobacter pylori eradication causes perturbation of the human gut microbiome in young adults [60] | http://metagenomics.anl.gov/mgmain.html?mgpage=project&project=mgp8960 | Dysbiosis |
| Interactions between the intestinal microbiota and bile acids in gallstones patients [61] | http://metagenomics.anl.gov/mgmain.html?mgpage=project&project=mgp11209 | Gallstone patients |
| An integrated catalog of reference genes in the human gut microbiome [37] | http://gigadb.org/dataset/100064 | General |
| An iterative workflow for mining the human intestinal metaproteome [62] | ftp://ftp.ncbi.nih.gov/pub/TraceDB/human_gut_metagenome/ | General |
| Fecal microbial composition of ulcerative colitis and Crohn’s disease patients in remission and subsequent exacerbation [63] | http://metagenomics.anl.gov/mgmain.html?mgpage=project&project=mgp4728 | IBD, CD and UC |
| Inference of network dynamics and metabolic interactions in the gut microbiome [64] | https://bitbucket.org/gutmicrobiomepaper/microbiomenetworkmodelpaper/src | Model construction |
| Development of the preterm gut microbiome in twins at risk of necrotising enterocolitis and sepsis [65] | http://metagenomics.anl.gov/mgmain.html?mgpage=project&project=mgp3781 | Necrotising enterocolitis and sepsis |
| Patterned progression of bacterial populations in the premature infant gut [66] | https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000247.v4.p3 | Necrotising enterocolitis |
| Dietary modulation of gut microbiota contributes to alleviation of both genetic and simple obesity in children [12] | https://www.ncbi.nlm.nih.gov/sra/?term=SRP045211 | Obesity |
| A core gut microbiome in obese and lean twins [67] | http://metagenomics.anl.gov/linkin.cgi?project=mgp10 | Obesity |
| Moving pictures of the human microbiome [68] | http://metagenomics.anl.gov/linkin.cgi?project=mgp93 | Obesity, CD, IBD and malnutrition |
| Temporal dynamics of the gut microbiota in people sharing a confined environment, a 520-day ground-based space simulation, MARS500 [69] | http://metagenomics.anl.gov/mgmain.html?mgpage=project&project=mgp79314 | Population study |
| Gut microbiome of the Hadza hunter-gatherers [70] | http://metagenomics.anl.gov/mgmain.html?mgpage=project&project=mgp7058 | Population study |
| A phylo-functional core of gut microbiota in healthy young Chinese cohorts across lifestyles, geography and ethnicities [71] | http://metagenomics.anl.gov/mgmain.html?mgpage=project&project=mgp1538 | Population study |
| Gut Microbiota and Extreme Longevity [72] | http://metagenomics.anl.gov/mgmain.html?mgpage=project&project=mgp17761 | Population study |
| Variation in rural African gut microbiota is strongly correlated with colonization by Entamoeba and subsistence [73] | http://metagenomics.anl.gov/mgmain.html?mgpage=project&project=mgp15238 | Population study |
| Gut microbiome of coexisting BaAka Pygmies and Bantu Reflects Gradients of Traditional Subsistence Patterns [74] | http://metagenomics.anl.gov/mgmain.html?mgpage=project&project=mgp16608 | Population study |
| Gut microbiota of type 1 diabetes patients with good glycaemic control and high physical fitness is similar to people without diabetes: an observational study [75] | http://metagenomics.anl.gov/mgmain.html?mgpage=project&project=mgp11616 | T1D |
| A metagenome-wide association study of gut microbiota in type 2 diabetes [35] | https://www.ncbi.nlm.nih.gov/sra/?term=SRA045646 and https://www.ncbi.nlm.nih.gov/sra/?term=SRA050230 | T2D |
| Gut metagenome in European women with normal, impaired and diabetic glucose control [76] | https://www.ncbi.nlm.nih.gov/sra?term=ERP002469 | T2D |
| Modulation of gut microbiota dysbioses in type 2 diabetic patients by macrobiotic Ma-Pi 2 diet [77] | http://metagenomics.anl.gov/mgmain.html?mgpage=project&project=mgp17675 | T2D |

Figure 2. Mindmap of bioinformatics databases and data repositories commonly used in gut-related research.

Functional profiling using reference information can be based either on reference genome read mapping (at the nucleotide level) or on translated protein database searches [30]. That is, the assignment may be based on full protein-coding genes (CDSs) by means of orthology relations with sequences in well-characterized functional databases, such as NCBI nr [40], KEGG Orthology [41] and COGs [42], or by identifying specific PFAM [43] or SMART [44] peptide domains within CDSs. Broader biological functions are then built on these low-level functional annotations [45] using hierarchical ontologies that group functionally related proteins, such as in KEGG [41], MetaCyc [46] and SEED [47].
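The step from low-level annotations to broader functions described above can be sketched in Python as a simple roll-up of gene-family abundances into pathway abundances. This is a naive summation under stated assumptions, not the procedure of any cited pipeline. The identifiers (`fam_A`, `pathway_1` and so on), the mapping and the abundance values are hypothetical placeholders rather than entries from KEGG, MetaCyc or SEED.

```python
from collections import defaultdict


def rollup_to_pathways(family_abundance, family_to_pathways):
    """Sum gene-family abundances into every pathway each family is mapped to.

    Families without a mapping are skipped. A family mapped to several
    pathways contributes its full abundance to each of them.
    """
    pathway_abundance = defaultdict(float)
    for family, abundance in family_abundance.items():
        for pathway in family_to_pathways.get(family, ()):
            pathway_abundance[pathway] += abundance
    return dict(pathway_abundance)


# Hypothetical gene-family abundances for one sample (placeholder identifiers).
example_families = {"fam_A": 10.0, "fam_B": 5.0, "fam_C": 2.0}

# Hypothetical family-to-pathway hierarchy; fam_C is deliberately unmapped.
example_hierarchy = {
    "fam_A": ["pathway_1"],
    "fam_B": ["pathway_1", "pathway_2"],
}

example_pathways = rollup_to_pathways(example_families, example_hierarchy)
print(example_pathways)
```

Counting a shared family fully in each pathway is the simplest possible choice and can overstate pathways that share many families, which is one reason for treating this only as a sketch.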
Data processing and integration pipelines are also available, for instance, MG-RAST [48], IMG/M [49], MEGAN [50], HUMAnN [51], MALINA [52], MOCAT2 [53] and COGNIZER [54]. These pipelines typically include some combination of quality control and inference steps subsequent to homology search, such as selection of pathways by maximum parsimony, taxonomic limitation or statistical smoothing. However, as whole-community functional profiling is not yet well established, neither gene annotations within reference genomes nor those in protein databases are well tuned to whole-community metabolism. Indeed, both MetaCyc [46] and SEED [47] have ongoing efforts to develop microbiome-specific functional annotations, and gene family catalogues, such as eggNOG [55], are looking for a better way to represent uncultured communities.

Bioinformatics tools

A number of different bioinformatics tools are useful for the study of the human gut. In particular, considering the huge volume of data being generated by high-throughput technologies, tools are needed for the processing and analysis of individual omics data as well as to gain a multi-level, integrated understanding of the role of the gut microbiome in different aspects of health and disease conditions. The level of complexity and specialization of these tools varies significantly. Figure 3 summarizes commonly used bioinformatics tools, which are described in the following sections. A more detailed description of these tools is found within the Supplementary Material S1 and in the Web pages supporting this review, and the objectives, pros and cons of the different omics approaches are available in Supplementary Material S2.

Figure 3. Mindmap of bioinformatics tools commonly used in gut-related research. Tool names ending in an asterisk are public but require login, and those ending in a number sign are private. The remaining tools are publicly available.

Metagenomics: composition, abundance and variation

The metagenome is the collective genome of a given microbial community. Metagenome sequencing presents the first, perhaps the greatest, opportunity to identify novel and biologically interesting microbial products in the human gut microbiome [78, 79]. Thousands of human gut-associated metagenomes have already been sequenced, representing an extensive database for mining biologically active microbial products, and studying intestinal microbiome diversity and dysbiosis, as well as relations to health and disease [80]. General and detailed descriptions of metagenomics technologies and computational support can be found in recent reviews on the field [81–84]. Table 2 shows common and publicly available metagenomics tools.

Table 2. Publicly available metagenomics tools

| Tool | Purpose |
| --- | --- |
| AlFree [85] | Phylogeny reconstruction using alignment-free sequence comparison methods |
| AMOS [86] | Assembling DNA reads |
| AmphoraNet [87] | Phylogenetic analysis of metagenomic shotgun sequencing data and genomic data |
| BAGEL3 [88] | Mining for bacteriocins in single or multiple DNA sequences, e.g. (un)finished genomes, scaffold files and metagenomics data |
| BLAST [89] | Identification of regions of similarity between biological sequences |
| CAFE [90] | Integrating 28 measures and downstream visualised analysis |
| CAMERA [91] | Creating a rich, distinctive data repository and a bioinformatics tools resource |
| Cd-hit [92] | Clustering and comparison of protein or nucleotide sequences |
| Chimera Slayer [93] | Detection of sequences falsely interpreted as organisms (contributing to false perceptions of sample diversity and the false identification of novel taxa) |
| CloVR [94] | Automated sequence analysis able to use cloud computing resources |
| COGNIZER [54] | Functional annotation of metagenomic data sets |
| CONCOCT [95] | Unsupervised binning of metagenomic contigs |
| CVTree server [96] | Construction of whole-genome-based phylogenetic trees |
| DESeq2 [97] | Estimation of variance–mean dependence in count data from high-throughput sequencing assays and differential expression analysis |
| DESMAN [98] | Contig-based strain inference across multiple samples |
| DIAMOND [99] | High-throughput alignment of DNA reads and protein sequences |
| EMPANADA | Evidence-based assignment of genes to pathways in metagenomic data |
| FishTaco [100] | Decomposition of functional shifts into individual taxon-level contributions |
| FLASh [101] | Merging paired-end reads from next-generation sequencing experiments |
| FMAP [102] | Functional analysis of metagenomic/metatranscriptomic sequencing data, i.e. sequence alignment, gene family abundance calculations and differential feature statistical analysis |
| FragGeneScan [111] | Predicting protein-coding regions in short reads |
| Genboree Microbiome Toolset [103] | Multi-omic data analysis |
| Glimmer-MG [104] | Detecting genes in environmental shotgun DNA sequences |
| GroopM [105] | Using differential coverage to obtain high-fidelity population genomes from related metagenomes |
| HUMAnN [51] | Determination of relative abundances of the gut microbial functional pathways in a community from metagenomic data |
| IDBA-UD [106] | Iterative de Bruijn graph de novo assembler for short-read sequencing data |
| IMG/M [49] | Analysis and annotation of genome and metagenome datasets in a comprehensive comparative context |
| KOBAS [107] | Gene/protein functional annotation and functional enrichment of gene sets |
| MALINA [52] | Analysis of whole-genome gut-related metagenomic data |
| MaxBin [108] | Binning assembled metagenomic sequences based on an expectation–maximization algorithm |
| MEGAHIT [109] | Assembly of large and complex metagenomics sequencing reads |
| MEGAN [50] | Interactive exploration and analysis of large-scale microbiome sequencing data |
| MetaBAT [110] | Integrating empirical probabilistic distances of genome abundance and tetranucleotide frequency for metagenome binning |
| MetaGeneAnnotator [111] | Predicting prokaryotic genes from genomic sequences |
| MetaMIS [112] | Analysing time series data of microbial community profiles |
| MetAMOS [113] | An integrated assembly and analysis pipeline for metagenomic data |
| MetaPhlAn [114] | Estimation of species abundance |
| metaSPAdes [115] | Assembling reads from single cells and highly polymorphic diploid genomes |
| MetaVelvet [116] | De novo sequence assembler from short sequence reads |
| MG-RAST [117] | Metagenome analysis |
| MIRA [118] | Whole-genome shotgun (WGS) and EST sequence assembler |
| MOCAT2 [53] | Metagenomic sequence assembly and gene prediction |
| Mothur [119] | Analysing sequencing data |
| MUSiCC [120] | Normalizing and correcting gene abundance measurements derived from metagenomic shotgun sequencing |
| MyCC [121] | Combining genomic signatures, marker genes and optional contig coverages for automated metagenome binning |
| NBC Classifier [122] | Naïve Bayes classification tool Web server for taxonomic classification of metagenomic reads |
| Orphelia [123] | Predicting protein-coding genes in short DNA sequences from metagenomics sequencing projects |
| PAUDA [124] | High-performance algorithms to compute BLASTX-like alignments |
| PhyloSift [125] | Phylogenetic analysis of metagenomic samples and comparison of community structure among multiple related samples |
| PICRUSt [126] | Predicting metagenomes from 16S data and a reference genome database |
| PRIAM [127] | Automated enzyme detection in fully sequenced genomes |
| Prodigal [128] | Gene prediction for microbial genomes |
| QIIME [129] | Performing microbiome analysis from raw DNA sequencing data |
| RAPSearch [130] | Fast protein similarity search for short reads |
| RAST [47] | Fully automated service for annotating bacterial and archaeal genomes |
| RPS-BLAST | Searching in profile databases |
| WebCARMA [131] |
Taxonomic and functional classification of unassembled (ultra-)short reads from metagenomic communities Tool Purpose AlFree [85] Phylogeny reconstruction using alignment-free sequence comparison methods AMOS [86] Assembling DNA reads AmphoraNet [87] Phylogenetic analysis of metagenomic shotgun sequencing data and genomic data BAGEL3 [88] Mining for bacteriocins in single or multiple DNA sequences a, e.g. (un)finished genomes, scaffold files, and meta-genomics data BLAST [89] Identification of regions of similarity between biological sequences CAFE [90] Integrating 28 measures and downstream visualised analysis CAMERA [91] Creating a rich, distinctive data repository and a bioinformatics tools resource Cd-hit [92] Clustering and comparison of protein or nucleotide sequences. Chimera Slayer [93] Detection of sequences falsely interpreted as organisms (contributing to false perceptions of sample diversity and the false identification of novel taxa) CloVR [94] Automated sequence analysis able to use cloud computing resources COGNIZER [54] Functional annotation of metagenomic data sets CONCOCT [95] Unsupervised binning of metagenomic contigs CVTree server [96] Construction of whole-genome-based phylogenetic trees DESeq2 [97] Estimation of variance–mean dependence in count data from high-throughput sequencing assays and differential expression analysis DESMAN [98] Contig-based strain inference across multiple samples DIAMOND [99] High-throughput alignment of DNA reads and protein sequences EMPANADA Evidence-based assignment of genes to pathways in metagenomic data FishTaco [100] Descomposition of functional shifts into individual taxon-level contributions FLASh [101] Merging paired-end reads from next-generation sequencing experiments FMAP [102] Functional analysis of metagenomic/metatranscriptomic sequencing data, i.e. 
A key preliminary step in metagenomic analysis is to characterize the taxonomic diversity of the metagenome, i.e. to categorize the microbes present and quantify their diversity in terms of species abundance. Here, it is important to distinguish between methodologies in which specific regions of the 16S ribosomal DNA are amplified by polymerase chain reaction (PCR) and sequenced (metataxonomics), and those in which the whole genetic material is isolated, fragmented and sequenced, i.e. shotgun metagenomics (metagenomics) [132]. Metataxonomics data can test whether there is a population split in complex communities. However, they rarely reveal the mechanisms underlying the population split because of inter-individual variability and/or limited coverage. On the other hand, metagenomics offers an effective but imperfect method to profile the structure and the potential functions encoded in microbial communities.

Many gut metagenomics studies still perform 16S ribosomal RNA (rRNA) sequencing, typically with pipelines such as QIIME or MEGAN. However, whole-genome sequencing is becoming the technology of choice for sequence analysis and community comparison, so this section focuses on the latter option.

The assembly of overlapping reads into continuous or semi-continuous genome fragments allows an in-depth view of different aspects within a genomic context. Numerous metagenome assemblers have been developed, most of which assemble sequences in de novo fashion, i.e. they do not rely on a closely related reference sequence. MIRA [118] and AMOS [86] are examples of reference-based assemblers, while IDBA-UD [106], MetaVelvet [116], MetAMOS [113], MEGAHIT [109] and metaSPAdes [115] are examples of de novo assemblers.
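As an illustration of the de novo idea, the following minimal Python sketch builds a de Bruijn graph from a handful of error-free reads and walks its single unambiguous path to recover a contig. Table 2 describes IDBA-UD as a de Bruijn graph assembler, but this sketch is not the algorithm of any specific tool: the function names, the toy reads and the value of k are assumptions made for illustration, and real assemblers additionally handle sequencing errors, repeats, reverse complements and uneven coverage.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are observed k-mers.

    Assumption: reads are error-free strings over A, C, G, T on one strand.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer links its prefix (k-1)-mer to its suffix (k-1)-mer.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous_path(graph):
    """Extend a contig from a node without predecessors while the path is unique."""
    targets = {t for successors in graph.values() for t in successors}
    starts = sorted(node for node in graph if node not in targets)
    if not starts:
        return ""
    node = starts[0]
    contig = node
    visited = {node}
    # Stop at a branch (more than one successor), a dead end or a cycle.
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in visited:
            break
        contig += nxt[-1]
        visited.add(nxt)
        node = nxt
    return contig


if __name__ == "__main__":
    toy_reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]  # hypothetical overlapping reads
    print(walk_unambiguous_path(build_de_bruijn(toy_reads, k=4)))  # ATGGCGTGCAAT
```

The walk stops at the first branch point, which is where repeats and strain variation make metagenome assembly hard in practice.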
The need to assemble increasingly large sequencing data sets is also motivating serious investment in improving computational performance. Assembler developers are now looking for more time- and memory-efficient ways to handle massive data volumes (hundreds of giga base pairs) on a single server.

Binning approaches, i.e. the classification and/or clustering of reads into specific bins, can further help elucidate the broader genomic context of interesting features [133]. Some binning methods are taxonomy-dependent (supervised learning procedures), i.e. they estimate the profile/abundance of ‘known’ taxonomic groups by comparison with a reference database [134]. CAMERA [91], the MG-RAST server [48], the NBC classifier [122] and WebCARMA [131] are some well-known taxonomy-dependent Web applications. On the other hand, taxonomy-independent methods (unsupervised learning procedures) group reads based on their mutual similarity and do not involve a database comparison step [82]. CONCOCT [95], GroopM [105], MaxBin [108], MetaBAT [110] and MyCC [121] are some prominent examples.

The prediction and annotation of gene-coding sequences is the last, fundamental step of analysis. Among the software commonly used in 16S rRNA gene analyses, the Genboree Microbiome Toolset supports community profiling (i.e. determination of the abundance of each type of microbe) [103]; QIIME [129] and Mothur [119] (also part of Genboree) can be used to obtain quantitative insights into microbial relative abundances and ecosystems; BLAST [89] and Cd-hit [92] facilitate the comparison of large sets of proteins; and Chimera Slayer is used to detect sequences falsely interpreted as organisms (aiming to correct false perceptions of sample diversity and false identification of novel taxa) [93]. Furthermore, the Ribosomal Database Project classifiers may help in the assignment of rRNA gene sequences to bacterial taxonomy [135].
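To make the taxonomy-dependent (supervised) idea more concrete, the sketch below assigns a read to the reference taxon whose k-mer composition makes the read most probable, in the spirit of the naïve Bayes classification mentioned for the NBC classifier. It is a simplified illustration rather than the implementation of any listed tool: the taxon labels, toy reference sequences, the choice of k, the uniform prior and the add-one smoothing are all assumptions.

```python
import math
from collections import Counter


def kmers(seq, k):
    """Return all overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def train(references, k):
    """Count k-mers per taxon.

    references: dict mapping a taxon label to a list of reference sequences
    (hypothetical toy data in this sketch).
    """
    model = {}
    for taxon, sequences in references.items():
        counts = Counter()
        for seq in sequences:
            counts.update(kmers(seq, k))
        model[taxon] = counts
    return model


def classify(read, model, k):
    """Pick the taxon with the highest log-likelihood for the read's k-mers.

    Assumptions: uniform prior over taxa and add-one (Laplace) smoothing
    over the 4**k possible DNA k-mers.
    """
    words = kmers(read, k)
    if not words:
        raise ValueError("read is shorter than k")
    vocabulary_size = 4 ** k
    best_taxon, best_score = None, -math.inf
    for taxon, counts in model.items():
        total = sum(counts.values())
        score = sum(
            math.log((counts[w] + 1) / (total + vocabulary_size)) for w in words
        )
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon


if __name__ == "__main__":
    refs = {"taxonA": ["ATATATATATATAT"], "taxonB": ["GCGCGCGCGCGCGC"]}
    model = train(refs, k=3)
    print(classify("ATATAT", model, k=3))  # taxonA
```

A taxonomy-independent method would skip the reference counts entirely and instead cluster reads or contigs by the similarity of their own composition or coverage profiles.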
Tools such as Glimmer-MG [104], FragGeneScan [136], MetaGeneAnnotator [111], PICRUSt [126], Orphelia [123] and PROkaryotic DYnamic programming Gene-finding ALgorithm (Prodigal) [128] are good examples of how gene prediction approaches have adapted to the challenges posed by shotgun sequencing data. Nowadays, MG-RAST [48], IMG/M [49], MEGAN [50], HUMAnN [51], MALINA [52], MOCAT2 [53] and COGNIZER [54] are among the best-known comparative genomics-based automated computational pipelines, and all are under active development. MG-RAST provides an easy-to-use Web interface for metagenomics analysis, including alignment, but limits the size of the data files that can be uploaded. In turn, HUMAnN and MEGAN both lack an integrated alignment tool and are notably unable to perform comprehensive downstream processes, such as operon-level analysis [137]. Databases such as Pfam [43] and Clusters of Orthologous Groups (COGs) [42] enable methods for comparison with sequence-diverse protein families or recurring sequence motifs, and the KEGG Orthology (KO) and KEGG pathways databases [41] are often used to predict the composition ratio of microbial gene families and pathways from the HMP [138, 139]. Tools such as RAPSearch [130] and PAUDA [124] offer faster alternatives to BLAST for the alignment of environmental sequencing reads. More recently, FishTaco, an analytical and computational framework, has made it possible to produce integrated taxonomic and functional comparative analyses. In particular, FishTaco is equipped to accurately quantify taxon-level contributions to disease-associated functional shifts, i.e. to trace back shifts in the microbiome’s functional capacity to specific taxa [100].

Besides comparative genomics, gut studies encompass structure-based approaches, functional prediction methods based on evolutionary conservation and phylogeny, and network context-based approaches (e.g. co-expression and metabolic networks) [139–141]. Approximately 50% of the genes in the gut microbiome could not be characterized using standard annotation methods [142]. Therefore, conventional methods for putative gene characterization and functional prediction, based on alignment to homologous genes with existing annotations (e.g. BLAST), were rendered ineffective [43]. Alternative computational methods approached the problem by integrating standard homology information with additional information, namely sequence features, co-expression data, binding sites and subcellular localization data [143–146]. The study of discrepancies between taxonomic and functional variations led to a proposal to revise some of the main metagenomic processing procedures to uncover hidden functional variation across samples [147]. This revision relies on the Metagenomic Universal Single-Copy Correction (MUSiCC) method [120] and the Evidence-based Metagenomic Pathway Assignment using geNe Abundance DAta (EMPANADA) schema.

Phylogenetic analysis is often supported by tools such as: CAFE, a stand-alone software that integrates 28 measures and downstream visualized analysis [90]; AlFree, a Web server that integrates 38 measures and supports the visualization of phylogeny [85]; the CVTree server, which implements a whole-genome-based and alignment-free composition vector method [96] and is also included in the CAFE tool; AmphoraNet [87], the Web server implementation of the AMPHORA2 method, which incorporates probability-based sequence alignment masks to improve phylogenetic accuracy; MetaPhlAn, which estimates the abundance of species in each sample according to the number of reads mapped to its markers [114]; and PhyloSift [125], which statistically tests lineages of interest directly from an uncultured DNA sample and allows for comparison of community structure among many samples.
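The alignment-free comparison underlying several of these tools can be sketched as follows. Each sequence is summarized by its normalized k-mer frequency vector, and pairwise Euclidean distances between vectors are collected into a matrix that a tree-building method could consume. Euclidean distance is only one simple measure chosen here for illustration, and it is not claimed to be the specific measure implemented by CAFE, AlFree or CVTree; the function names, toy sequences and k are likewise assumptions.

```python
import math
from collections import Counter
from itertools import product


def kmer_profile(seq, k):
    """Return the normalized frequency vector over all 4**k DNA k-mers.

    Assumption: the sequence contains only the letters A, C, G and T.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence is shorter than k")
    return [counts["".join(p)] / total for p in product("ACGT", repeat=k)]


def euclidean(p, q):
    """Euclidean distance between two equally long frequency vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def distance_matrix(seqs, k):
    """Pairwise distances for a dict mapping sequence name to sequence."""
    names = sorted(seqs)
    profiles = {name: kmer_profile(seqs[name], k) for name in names}
    return {
        (a, b): euclidean(profiles[a], profiles[b]) for a in names for b in names
    }


if __name__ == "__main__":
    toy = {  # hypothetical sequences; "a" and "b" differ by a single base
        "a": "ACGTACGTACGT",
        "b": "ACGTACGTACGA",
        "c": "GGGGCCCCGGGG",
    }
    d = distance_matrix(toy, k=2)
    print(d[("a", "b")] < d[("a", "c")])  # True
```

Because no alignment is computed, the cost grows with sequence length and the number of k-mers rather than with the quadratic cost of pairwise alignment, which is what makes such measures attractive for whole-genome comparisons.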
An immediate application of phylogenetic approaches is the study of how species within the same genome interaction groups decrease or increase their abundance during dietary interventions [12]. The generation of community-level metabolic networks of the microbiome is also an interesting avenue. For example, these networks can be used to explore gene-level and network-level topological differences associated with obesity and IBD [148]. By placing variations in gene abundance in the context of these networks, researchers can examine the genes associated with these host states; namely, they may inspect the location of these genes in the network and generate hypotheses about how the microbiome interacts with host metabolism. Additionally, network analysis can bring to light associations between topological variations and community species composition.

Genome mining approaches are increasingly valuable for identifying antimicrobial-producing microorganisms as well as for screening for and harnessing putative gene clusters. For example, genome mining using Rapid Annotation using Subsystem Technology (RAST) was applied to the comparative pathogenomic analysis of Nesterenkonia jeotgali [149]. Likewise, the bacteriocin genome mining tool BAGEL3 [88] helped identify potential bacteriocin producers in the genomes of the gut microbiome subset of the HMP's reference genome database [150].

Arguably, metagenomics should be at the basis of most (if not all) microbiome studies. Despite the huge technological development in this field, methods are often limited in resolution and may fail to resolve relevant details concerning the composition of species and genes in the microbiome.
Accumulating evidence shows that important functions of the gut microbiota may be species- or even strain-specific; yet, many studies in metagenomics are still conducted at genus or higher taxonomic levels because of the limited ability to assemble individual bacterial genomes directly from metagenomic data [151].

Metatranscriptomics: gene expression profiling

Metatranscriptomics encompasses the functional characterization of microbiomes based on mRNA sequencing data, with the aim of better understanding their taxonomic composition and active biochemical functions. Metatranscriptomics captures gene expression patterns from microbial communities without previous assumptions as to the ongoing activities or dominant taxa, and provides a catalogue of the genes being transcribed under given experimental conditions. Here, bioinformatics analysis methods can be broadly classified into reference-dependent and reference-independent methods. Reference-dependent methods are based on sequence alignment to functionally well-characterized databases or data sets, whereas reference-independent methods resort to de novo assemblies. Table 3 presents available metatranscriptomics tools.

Table 3. Publicly available metatranscriptomics tools

| Tool | Purpose |
| --- | --- |
| BLAST [89] | Identification of regions of similarity between biological sequences |
| COMAN [152] | Functional analysis of metatranscriptomics data |
| DESeq2 [97] | Estimation of variance–mean dependence in count data from high-throughput sequencing assays and differential expression analysis |
| DIAMOND [99] | High-throughput alignment of DNA reads and protein sequences |
| FLASh [101] | Merging paired-end reads from next-generation sequencing experiments |
| FMAP [102] | Functional analysis of metagenomic/metatranscriptomic sequencing data, i.e. sequence alignment, gene family abundance calculations and differential feature statistical analysis |
| Genboree Microbiome Toolset [103] | Multi-omic data analysis |
| IMP [153] | Large-scale standardized integrated analysis of coupled metagenomic and metatranscriptomic data |
| KOBAS [107] | Gene/protein functional annotation and functional enrichment of gene sets |
| NCBI’s Best Match Tagger [154] | Filtering human reads from metagenomics data sets |
| PRIAM [127] | Automated enzyme detection in a fully sequenced genome |
| RPS-BLAST | Search in profile databases |
| SAMSA [155] | Comprehensive metatranscriptome analysis |
| USEARCH [156] | Sequence analysis, including search and clustering algorithms |
sequence alignment, gene family abundance calculations and differential feature statistical analysis Genboree Microbiome Toolset [103] Multi-omic data analysis IMP [153] Large-scale standardized integrated analysis of coupled metagenomic and metatranscriptomic data KOBAS [107] Gene/protein functional annotation and functional enrichment of gene sets NCBI’s Best Match Tagger [154] Filtering human reads from metagenomics data sets PRIAM [127] Automated enzyme detection in a fully sequenced genome RPS-BLAST Search in profile databases SAMSA [155] Comprehensive metatranscriptome analysis USEARCH [156] Sequence analysis, including search and clustering algorithms Most metatranscriptomics analyses involve reference-based or metagenomics-dependent analysis workflows. For example, the Functional Mapping and Analysis Pipeline (FMAP) aims to identify differentially abundant features in microbiome data sets [102]. FMAP supports data preprocessing and performs sequence alignment, gene family abundance calculations and differential statistical analysis. To this end, the pipeline integrates various tools: NCBI’s Best Match Tagger [154] for data preprocessing; USEARCH [156] and DIAMOND [99], for the alignment of reads to a reference database, namely, against a KEGG-filtered UniProt data collection; and the analysis of differentially abundant genes and the enrichment analysis of pathways and operons, based on statistical testing methods such as metagenomeSeq and Kruskal–Wallis rank-sum. The comprehensive metatranscriptomics analysis (COMAN) is an integrated Web server dedicated to the comprehensive functional analysis of metatranscriptomic data [152]. After an initial quality control step, reads are mapped to the RefSeq database using DIAMOND [99]. Functional annotation of genes and reads are prepared with COG [42] and KO [41]. COG-based annotation is conducted using RPS-BLAST against the CDD database [157], whereas DIAMOND [99] and KOBAS [107] support KO annotation. 
In addition, PRIAM [127] supports the annotation of genes to enzymes (Enzyme Commission numbers, ECs), and enables further profiling against MetaCyc pathways [46]. Noteworthy, the Simple Annotation of Metatranscriptomes by Sequence Analysis (SAMSA) pipeline has been specifically designed for the analysis of gut microbiome data [155]. The FLASh short-read aligner is used in the preprocessing step [101]. The annotation step resorts to MG-RAST tools to generate annotations for the best matches to organisms and individual transcripts (RefSeq database), and the SEED database is used to obtain additional ontology annotations. The DESeq2 package in R supports the comparison of metatranscriptome samples and the identification of significantly differentially expressed transcripts [97]. A major drawback of reference-based methods is the large number of sequencing reads from uncultured species and divergent strains that are discarded during data analysis, i.e. the loss of potentially useful information. For example, in a recent study of 252 human fecal samples, 43% of the reads could not be mapped to available isolate genomes [158]. To mitigate this lacuna, reference-independent methods address the retrieval of the actual genomes and potentially novel genes present in the samples, maximizing the amount of data exploited for analysis. To this end, metatranscriptomics reference-independent approaches use dedicated assembly methods, namely, metatranscriptome assemblers, metagenomics assemblers or single-species transcriptome assemblers [159]. Moreover, these approaches aim to leverage the advantages associated with integrating metatranscriptomics and metagenomics data for the large-scale analysis of microbial community structure and function. 
For example, the open-source Integrated Meta-omic Pipeline (IMP) is a self-contained and standardized de novo assembly-based pipeline, which allows automated and large-scale integrated analyses of combined metagenomics and metatranscriptomics data sets [153]. Notably, IMP incorporates iterative co-assembly of metagenomic and metatranscriptomic data, analyses of microbial community structure and function, and genomic signature-based visualization. Despite all these advances, metatranscriptomics approaches continue to struggle with the quality of the mRNA obtained from microbial samples.

Metaproteomics studies: spectral search and protein profile

Metaproteomics studies aim to perform the large-scale characterization of proteins extracted from the human gut microbiota [160]. Metaproteomics allows for the characterization of the dynamic proteome in complex communities, revealing its impact on microbial metabolism and providing information about which taxonomic groups are performing different metabolic roles. Compared with metagenomics and metatranscriptomics, the added value of metaproteomics lies in providing functional detail, i.e. metaproteomics conveys the identification of proteins, their assignment to specific taxa and the description of how these proteins interact with the human host. The publicly available metaproteomics tools are listed in Table 4.

Table 4. Publicly available metaproteomics tools
Tool | Purpose
AACompIdent [161] | Identifying a protein from its amino acid composition
AACompSim [161] | Comparing the amino acid composition between UniProtKB/Swiss-Prot entries
Blazmass+ComPIL [162] | A comprehensive and scalable database search system for metaproteomics
ClustalO [163] | Aligning two or more protein sequences
COILS [164] | Comparing a sequence to a database of known parallel two-stranded coiled coils and deriving a similarity score
Compute pI/Mw [161] | Computing the theoretical isoelectric point and molecular weight for a list of sequences
FindPept [165] | Identifying peptides that result from unspecific cleavage of proteins from their experimental masses
Galaxy-P [166] | Integrative analysis of MS-based proteomics and genomic and transcriptomic data
HAMAP [167] | Classifying and annotating protein sequences
ISMARA [168] | Modelling genome-wide expression or ChIP-seq data, in terms of computationally predicted regulatory sites for transcription factors and microRNAs
Mascot [169] | Protein identification using MS data
MyriMatch [170] | Comparison of mass spectra from shotgun proteomics against a reference database
MZJava [171] | Analysis of MS data from large-scale proteomics and glycomics experiments
OMSSA [172] | MS/MS search algorithm
PDBePisa [173] | Exploring macromolecular interfaces
pICarver [174] | Visualizing theoretical distributions of peptide pI on a given pH range
PredictProtein [175] | Predicting protein structural and functional features
QMEAN [176] | Estimating the quality of protein structure models
QuickMod [177] | Identifying modified peptides
Scaffold [178] | Protein identification and analysis
ScanProsite [179] | Scanning proteins for matches against the PROSITE collection of motifs or user-defined patterns
SEQUEST [180] | Correlating uninterpreted tandem mass spectra of peptides with amino acid sequences from protein and nucleotide databases
T-coffee [181] | Computing, evaluating and manipulating multiple alignments of DNA, RNA, protein sequences and structures
Unique Peptide Finder [182] | Characterization of taxon-specific peptidomes and peptidome-based clustering
X! Tandem [183] | Protein identification by matching tandem mass spectra against peptide sequences
ChIP-Seq: Chromatin Immunoprecipitation Sequencing.

The ExPASy Web portal has a worldwide reputation as one of the main bioinformatics resources for proteomics [184]. ExPASy databases include Swiss-Prot [185], STRING [186], SWISS-MODEL [187], PROSITE [188], ViralZone [189] and neXtProt [190]. Analysis tools are available for specific tasks, such as protein sequence and identification [191] (tools such as AACompIdent [161] or FindPept [165]), proteomics experiment [192] (tools such as MZJava [171] or pICarver [174]), function analysis [193] (tools such as AACompSim [161] or Compute pI/Mw [161]), sequence sites, features and motifs [194] (tools such as HAMAP [167] or ScanProsite [179]), protein modification [195] (tools such as ISMARA [168] or QuickMod [177]), protein structure [196] (tools such as COILS [164] or QMEAN [176]), protein interactions [197] (tools such as PDBePisa [173] or PredictProtein [175]) and similarity search/alignment [198] (tools such as ClustalO [163] or T-coffee [181]).
Analysis of mass spectra (MS), i.e. the decoding of peptide sequences, is typically facilitated by database searching algorithms, namely, SEQUEST [180], Mascot [169], MyriMatch [170], OMSSA [172] and X! Tandem [183]. The development of cross-species protein identification approaches is desirable but challenging, given the complexity of the gut microbial proteome and the dynamic distribution of species between individuals [199, 200]. New approaches aim to increase the sensitivity of peptide spectrum matching. The combination of the ComPIL metaproteomic analysis method and the Blazmass search engine allows larger-scale database searches, including peptide masses, protein information and peptide sequences [162]. Other possible approaches are the integration of synthetic metaproteome information with metagenomic information [62], and de novo sequencing [201, 202].

The Galaxy bioinformatics framework offers a sophisticated proteogenomic workflow, named Galaxy for Proteomics or Galaxy-P (usegalaxyp.org), in support of broad metaproteomics data analysis [203]. This is a complex workflow, which includes ∼140 steps, and can be shared using built-in Galaxy functions [166]. Alternatively, the MetaPro-IQ workflow, which has been specifically developed for gut metaproteome identification and quantification, uses almost complete human or mouse gut microbial gene catalogues as the reference database and an iterative database search strategy [204]. Unipept offers programmatic access to metaproteomics analysis features and has the advantage of being supported by a fast index built from UniProtKB and NCBI Taxonomy [205]. It facilitates interactive data visualization, and the Unique Peptide Finder enables the discovery of tryptic peptides that are taxon-specific, i.e. peptides that can be used as biomarkers to reliably detect the presence of the targeted taxa [182]. Scaffold is designed to identify and analyse proteins in biological samples [178]. By using output files from MS/MS search engines, Scaffold validates, organizes and interprets MS data, allowing the user to more easily manage large amounts of data, compare samples and search for protein modifications. Moreover, it attempts to increase the confidence in protein identification reports through the use of several statistical methods.

In terms of applications, the study of CD is a meaningful example of the added value of the integrated analysis of metagenomics and metaproteomics approaches [27]. Such analysis led to a better understanding of the CD phenotype (i.e. genes, proteins and pathways that primarily differentiated patients from healthy subjects) and enabled the association of the phenotype with alterations in bacterial carbohydrate metabolism, bacterial–host interactions, as well as human host-secreted enzymes. The investigation of colonic metaproteomics bacterial signatures in obesity represents another application [206]. The goal was to detect differences between obese and non-obese samples at a functional level. The combination of metaproteomics and phylogenetic data exposed significant metabolic activity of the phylum Bacteroidetes in obese subjects. Likewise, faecal metaproteomics analysis was applied in a probiotic intervention trial to identify individually different human intestinal proteomes (i.e. personalized host–microbiota interactions) and examine the activity of the main phyla as well as key species, namely, F. prausnitzii [207]. Finally, in the context of type 1 diabetes mellitus (T1DM), a large-scale analysis of intra- and inter-individual variation using metagenomics, metatranscriptomics and metaproteomics inputs showed that community structures are reflected across all ‘-omics’ levels. In particular, differences in the relative abundances of certain human pancreatic enzymes were correlated with the expression of microbial genes involved in T1DM-relevant metabolic transformations, such as thiamine synthesis and glycolysis [208].
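The notion of taxon-specific tryptic peptides mentioned above for the Unique Peptide Finder can be illustrated with a minimal Python sketch. It is only an illustration under simplifying assumptions: the protein sequences and taxon labels are invented toy data, the function names and the minimum peptide length of 5 are hypothetical choices, digestion follows the common trypsin rule (cleave after K or R unless the next residue is P) without missed cleavages, and nothing here calls Unipept or reproduces its index.

```python
from collections import defaultdict


def tryptic_digest(protein, min_length=5):
    """Cleave after K or R unless followed by P; keep peptides of at least min_length."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        # Keep the C-terminal peptide that does not end in K or R.
        peptides.append(protein[start:])
    return [p for p in peptides if len(p) >= min_length]


def taxon_specific_peptides(proteomes, min_length=5):
    """Return, per taxon, the tryptic peptides observed in that taxon only."""
    owners = defaultdict(set)
    for taxon, proteins in proteomes.items():
        for protein in proteins:
            for peptide in tryptic_digest(protein, min_length):
                owners[peptide].add(taxon)
    specific = {taxon: set() for taxon in proteomes}
    for peptide, taxa in owners.items():
        if len(taxa) == 1:
            specific[next(iter(taxa))].add(peptide)
    return specific


# Invented toy proteomes (hypothetical sequences, not real gut proteins).
toy_proteomes = {
    "taxon_A": ["MKTAYIAKQRLSEGVWPK", "AGLTTEELRNPFDDK"],
    "taxon_B": ["MSTAYIAKLSEGVWPKGG", "AGLTTEELRHHQW"],
}

for taxon, peptides in taxon_specific_peptides(toy_proteomes).items():
    print(taxon, sorted(peptides))
```

In this toy case the peptides LSEGVWPK and AGLTTEELR occur in both taxa and are therefore discarded as markers, whereas the remaining peptides are candidate taxon-specific biomarkers. A real analysis would work against a full reference index rather than two hand-written proteomes.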
Metabolomics studies: metabolite identification and concentration

Gut metabolome studies aim to identify and quantify the set of metabolites (or specific metabolites) in biological samples, and thereby examine differences in signature metabolites and their relation to changes in the activity of metabolic pathways [209–211]. Metabolomics allows for the characterization of the dynamic metabolome in complex communities, revealing its impact on microbial metabolism. Besides being the most immediate indicator of dysbiosis [59, 212], metabolome profiling is able to show dependences on environmental factors (e.g. diet [213, 214] and antibiotic exposure [215, 216]) as well as provide valuable information about the interactions of the microbial community with the host environment (e.g. quorum sensing [217]).

Metabolite profiling is typically carried out using a combination of chromatographic techniques (e.g. liquid chromatography or gas chromatography) and detection methods, such as MS and nuclear magnetic resonance [218, 219]. Computationally, data processing and analysis can be challenging because of the huge number of different metabolites potentially detected in this kind of sample. Moreover, a combination of statistical and machine learning methods is usually applied to identify discriminative features [220–222]. For example, classical statistical methods (e.g. Student’s t-test, Mann–Whitney test and multivariate linear regression) are combined with multivariate analysis techniques such as principal component analysis, hierarchical cluster analysis, discriminant analysis and classification models (e.g. k-nearest neighbour). Currently available metabolomics computational tools are listed in Table 5.

Table 5. Publicly available metabolomics tools
Tool | Purpose
BNICE | Discovery of novel biochemical pathways
MassTRIX [223] | Annotation of metabolites in high-precision MS data
MIDAS [224] | Database search algorithm for metabolite identification
MetFrag [225] | In silico fragmentation for computer-assisted identification of metabolite mass spectra
MimoSA [226] | A pipeline for joint metabolic model-based analysis of metabolomics measurements and taxonomic composition from microbial communities

Identifying and quantifying the metabolites in a sample requires comparison against spectral databases, namely: the Human Metabolome Database [227], BioMagResBank [228], Madison-Qingdao Metabolomics Consortium Database [229], MassBank [230], Golm Metabolome Database [231], METLIN [232] and ChemSpider [233]. Alternatively, the in silico fragmenter MetFrag [225] combines compound database searching (via ChemSpider and PubChem [234] Web services) and fragmentation prediction, and the Metabolite Identification via Database Searching (MIDAS) approach [224] matches measured tandem mass spectra against the predicted fragments of metabolites in the MetaCyc database.

Untargeted metabolomics approaches are being developed as a means to minimize the challenges in matching metabolites to their spectral features [222]. For example, the Metabolic In silico Network Expansions (MINEs) databases record molecules that have not been observed yet, but are likely to occur based on known metabolites and common biochemical reactions [235]. Computational predictions are based on the Biochemical Network Integrated Computational Explorer (BNICE) algorithm and expert-curated reaction rules based on the Enzyme Commission classification system. Details on a broader range of Web-accessible databases covering the properties, enzymatic reactions and metabolism of small molecules, along with their search options, have been reported recently [236, 237].

Within the scope of human gut research, IBD is one of the main focuses of metabolomics studies. The most discriminative metabolites for IBD, mainly derived from nuclear magnetic resonance spectroscopy studies, were alanine, isoleucine, leucine, lysine, valine, phenylalanine and butyrate [209, 238–240]. Also, MS studies have shown that long-chain fatty acids could play an important role in the disease. Researchers are now exploring certain metabolic patterns, discussing whether they are a cause of IBD or rather a consequence of inflammation or altered gut microbiota. For example, an increase of amino acids in faecal samples of IBD patients has been explained by the low capacity of the inflamed intestinal tissue to absorb nutrients [241].
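The database comparison step described above can be sketched, in a heavily simplified form, as matching an observed mass against theoretical monoisotopic masses within a ppm tolerance. The following Python example is an assumption-laden illustration rather than the method of any tool cited here: the small library is hand-written from the molecular formulas of the IBD-associated metabolites listed above (butyrate is entered as neutral butyric acid), masses are treated as neutral (no adducts or charge states), and the function names and the 10 ppm default are hypothetical choices.

```python
# Monoisotopic masses (Da) of the most abundant isotopes of C, H, N and O.
ELEMENT_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}

# Toy reference library: neutral molecular formulas given as element counts.
# A real search would query a resource such as HMDB or METLIN instead.
LIBRARY = {
    "alanine": {"C": 3, "H": 7, "N": 1, "O": 2},
    "valine": {"C": 5, "H": 11, "N": 1, "O": 2},
    "leucine": {"C": 6, "H": 13, "N": 1, "O": 2},
    "isoleucine": {"C": 6, "H": 13, "N": 1, "O": 2},
    "lysine": {"C": 6, "H": 14, "N": 2, "O": 2},
    "phenylalanine": {"C": 9, "H": 11, "N": 1, "O": 2},
    "butyric acid": {"C": 4, "H": 8, "O": 2},
}


def monoisotopic_mass(formula):
    """Sum the monoisotopic element masses weighted by their counts."""
    return sum(ELEMENT_MASS[element] * count for element, count in formula.items())


def match_by_mass(observed_mass, library=LIBRARY, tolerance_ppm=10.0):
    """Return the sorted names of library entries within tolerance_ppm of observed_mass."""
    hits = []
    for name, formula in library.items():
        theoretical = monoisotopic_mass(formula)
        error_ppm = (observed_mass - theoretical) / theoretical * 1e6
        if abs(error_ppm) <= tolerance_ppm:
            hits.append(name)
    return sorted(hits)


print(match_by_mass(89.0477))   # ['alanine']
print(match_by_mass(131.0946))  # ['isoleucine', 'leucine']
```

The second query returns both leucine and isoleucine because the two isomers share the same formula and therefore the same mass. This ambiguity is one reason why accurate mass alone is usually insufficient and why fragmentation-based approaches such as MetFrag and MIDAS, or complementary techniques, are needed.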
Obesity and T2D have also been investigated through studies of co-metabolites [14].

Fluxomics studies: high-throughput analysis of metabolic fluxes

Fluxomics refers to the group of techniques focused on the high-throughput analysis of metabolic fluxes, and is a clear complement to transcriptomics, proteomics and metabolomics. By integrating in vivo metabolic data with stoichiometric network models, absolute fluxes in the central metabolism of a biological system can be determined. Applications can be broadly divided into two approaches: constraint-based methods, which examine the relative contributions of different pathways to a given phenotype, and methods based on the incorporation and monitoring of 13C-labelled compounds [242]. Various algorithms, desktop or Web applications and resources have been published in recent years to facilitate the work of fluxomics researchers [243]. Table 6 presents the publicly available fluxomics tools.

Table 6. Publicly available fluxomics tools
Tool | Purpose
B-DMFA [244] | A fast heuristic algorithm developed for knot placement
COBRA Toolbox [245] | Quantitative prediction of cellular behaviour
COMETS [246] | Performing computer simulations of metabolism in spatially structured microbial communities
CycSim [247] | Simulating with constraint-based models of metabolism
Fastcore [248] | Reconstruction of context-specific metabolic network models from global genome-wide metabolic network models
Fast-SL [249] | Identification of synthetic lethal gene/reaction sets in genome-scale metabolic models
Fast-SNP [250] | Reduction of the metabolic network to a smaller matrix for the efficient formulation of loop-law constraints
FBA-SimVis [251] | Constraint-based analysis of metabolic models
FluxModeCalculator [252] | Flux mode analysis in stoichiometric models
GEMSiRV [253] | Performing metabolic network drafting and editing, network visualization and flux balance analysis
GlobalFit [254] | Finding globally optimal networks
Influx [255] | Optimized flux estimation
iReMet-flux [256] | Flux prediction
jQMM [257] | Flux calculation for genome-scale models
ll-ACHRB [258] | Sampling the feasible solution space of metabolic networks
MFF [259] | Flux distribution and impact prediction, selection of key network reactions and prioritization of measurements
Mflux [260] | Prediction of the bacterial central metabolism via machine learning
MicrobesFlux [261] | Generation and reconstruction of metabolic models for annotated microorganisms
ModelSEED [262] | Reconstruction, exploration, comparison and analysis of metabolic models
OptFlux [263] | Flux balance analysis, allowing user-manipulation of the nodes composing a metabolic network and the overlay of phenotype results
ROOM [264] | Constraint-based prediction of metabolic steady state
Sumoflux [265] | A toolbox for targeted 13C metabolic flux ratio analysis
SurreyFBA [266] | Providing constraint-based simulations and network map visualization
Sybil [267] | Constraint-based analyses of metabolic networks
Sysmetab [268] | Metabolic flux analysis
VisualCNA [269] | Constraint network analysis and molecular graphics representations

Constraint-based approaches comprise applications, of varying specificity, built around flux balance analysis (FBA). FBA has been traditionally used in the characterization of cellular metabolism and in metabolic engineering [270]. Many algorithms have been developed for the high-throughput characterization of metabolic fluxes. Regulatory On/Off Minimization (ROOM) works on metabolic steady states and is focused on changes induced by gene knockouts, mostly providing rerouting options in response to the absence of an enzymatic step (i.e. a gene knockout) [264]. Fastcore reconstructs context-specific metabolic sub-networks from broader models. Starting from a set of reactions empirically known to be active (the core), the algorithm returns a metabolic network containing all the core reactions and the minimum number of additional reactions needed to support them [248]. Fast-SL, in the context of a genome-scale metabolic model, identifies synthetic lethal reaction sets, which is useful for the combinatorial discovery of drug targets [249]. ll-ACHRB (Artificially Centered Hit-and-Run on a Box) is a scalable algorithm for sampling the feasible flux space of metabolic networks [258]. Fast-SNP improves computational efficiency by reducing the original network to a smaller matrix. Overall, this algorithm is efficient for the formulation of loop-law constraints, allowing loopless flux optimization [250]. B-spline fitting Dynamic Metabolic Flux Analysis (B-DMFA) is a heuristic algorithm focused on knot placement, a time-consuming task in dynamic metabolic flux analysis.
This is performed by exploiting the local support property of B-splines [244]. Finally, Influx_s is a deterministic algorithm with improved accuracy for flux estimation. Influx_s uses few computational resources; indeed, estimating the central carbon metabolism network of Escherichia coli takes from several seconds to a few minutes on a standard personal computer (PC) [255].

Of the hundreds of applications available in the literature, many have been programmed in the Matlab environment. Perhaps the most widely used is the Constraint-Based Reconstruction and Analysis (COBRA) Toolbox. This software package has been used for the quantitative prediction of cellular metabolism through the computation of optimal growth (steady-state or dynamic), and allows the effect of gene deletions to be modelled [245]. The COBRA Toolbox is also useful for methodologies such as regulatory on/off minimization and flux variability analysis [271]. It has also been implemented as a Python package (COBRApy) and in Julia, where associated packages such as distributedFBA.jl can be used to solve multiple flux balance analyses on specific pathways or on the whole central metabolism. The Julia implementation provides scalability and integration with the high-level interface MathProgBase.jl, making efficient use of computational resources [272]. Mackinac has been designed to combine the COBRA metabolic analysis capabilities with ModelSEED in order to infer the metabolic potential of a biological system and to optimize genome-scale metabolic models [273]. ModelSEED is a Web-based resource for the analysis of genome-scale metabolic models [262], and Cytoscape plugins such as CytoSEED allow visualization and manipulation of the created models [270]. ORCA is another COBRA-based package, which notably extends the scope of COBRA metabolic models [274].
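To make the idea of flux balance analysis and in silico gene deletion more concrete, the following is a minimal sketch in plain Python. It is not the COBRA Toolbox or COBRApy API: the four-reaction network, the bounds and the function names (`solve_unique`, `toy_fba`) are illustrative assumptions, and the linear programme is solved by brute-force vertex enumeration, which only works for very small networks, rather than by a real LP solver.

```python
from fractions import Fraction
from itertools import combinations, product


def solve_unique(rows, rhs, n):
    """Return the unique solution of rows * x = rhs, or None if there is none."""
    aug = [[Fraction(a) for a in row] + [Fraction(b)] for row, b in zip(rows, rhs)]
    pivot_row = 0
    for col in range(n):
        pivot = next((r for r in range(pivot_row, len(aug)) if aug[r][col] != 0), None)
        if pivot is None:
            return None  # free variable, so the solution is not unique
        aug[pivot_row], aug[pivot] = aug[pivot], aug[pivot_row]
        lead = aug[pivot_row][col]
        aug[pivot_row] = [a / lead for a in aug[pivot_row]]
        for r in range(len(aug)):
            if r != pivot_row and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[pivot_row])]
        pivot_row += 1
    if any(row[-1] != 0 for row in aug[pivot_row:]):
        return None  # inconsistent system
    return [aug[i][-1] for i in range(n)]


def toy_fba(S, lower, upper, objective):
    """Maximize flux `objective` subject to S v = 0 and lower <= v <= upper.

    Every vertex of the bounded flux polytope is found by fixing a subset of
    fluxes at one of their bounds and solving the resulting linear system.
    """
    n = len(lower)
    best = None
    for k in range(n + 1):
        for fixed in combinations(range(n), k):
            for side in product((0, 1), repeat=k):
                rows = [list(r) for r in S]
                rhs = [0] * len(S)
                for j, s in zip(fixed, side):
                    unit = [0] * n
                    unit[j] = 1
                    rows.append(unit)
                    rhs.append(upper[j] if s else lower[j])
                v = solve_unique(rows, rhs, n)
                if v is None or any(not lo <= x <= hi for x, lo, hi in zip(v, lower, upper)):
                    continue
                if best is None or v[objective] > best[0]:
                    best = (v[objective], v)
    return best


# Hypothetical network: uptake -> A, A -> B (conversion), A -> (by-product), B -> biomass
#            uptake  conv  byprod  biomass
S = [[1, -1, -1, 0],   # metabolite A
     [0, 1, 0, -1]]    # metabolite B
lower = [0, 0, 0, 0]
upper = [10, 1000, 1000, 1000]

wild_type = toy_fba(S, lower, upper, objective=3)

# In silico knockout of the conversion step: force its flux to zero.
ko_upper = list(upper)
ko_upper[1] = 0
knockout = toy_fba(S, lower, ko_upper, objective=3)

print("wild-type biomass flux:", wild_type[0])
print("knockout biomass flux:", knockout[0])
```

In this toy setting the biomass flux is limited by the uptake bound, and deleting the only route from A to B abolishes growth. Dedicated packages rely on proper LP solvers and curated genome-scale models, so the sketch should be read only as an illustration of the steady-state constraint and the bound manipulation behind a simulated deletion.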
Another Matlab-based desktop application is Coordinate Hit-and-Run with Rounding (CHRR), which allows genome-scale sampling in biochemical networks [275]. Many fluxomics applications have been developed in the R environment (sometimes simply as libraries), such as Sybil, an R-based library for the analysis of metabolic networks that is part of the COBRA Toolbox implementation in R [267]. GlobalFit is another R package, designed for metabolic network refinement. This is achieved by generating models in which properties of the reactions are changed (e.g. reversibility, presence/absence of an enzymatic step) and evaluating the fit to experimental data. GlobalFit then finds the metabolic model that best fits the experimental data with the minimum number of metabolic changes [254]. OptFlux is a platform for metabolic engineering, allowing user-manipulation of the nodes composing a metabolic network and the overlay of phenotype results or flux modes [263]. It has been improved with a visualisation plugin, enabling graphical editing of the network and visualization of the results [276]. Integration of Relative Metabolite Levels for Flux prediction (iReMet-flux) is an interesting tool, as it integrates data from other omics. More specifically, iReMet-flux combines metabolomics data with the assumption that metabolism minimizes flux changes between two different scenarios. This allows biological interpretation of the changes in metabolite levels among different experimental conditions [256]. Sysmetab integrates high-throughput data from MS and/or nuclear magnetic resonance measurements to solve metabolic fluxes in experiments involving carbon-labelled metabolites [268].
Other notable applications are FluxModeCalculator, which allows large-scale elementary flux mode computations using multiple cores [252], and VisualCNA, a PyMOL plugin implementing many graphical visualizations of constraint network analysis [269]. Several fluxomics tools incorporate data from 13C experiments, in which the rates of the reactions of carbon metabolism are monitored through a 13C-labelled metabolite, making it possible, among other applications, to probe the reversibility of some reactions [277]. This is the principle underlying the Central Carbon Metabolic Flux (CeCaF) Database, which has been manually curated and allows comparative analysis of the central carbon metabolism in many organisms. The resources from which the empirical data were retrieved are linked, and interactive visualization is supported through the Cytoscape Web API [277]. Sumoflux combines surrogate measurements from the experiments with machine learning algorithms, helping to optimize experimental designs by selecting which metabolite levels are most informative to measure, and it can also merge data from different experiments [265]. An application that identifies which metabolites are, a priori, most interesting to track is Maximum Metabolic Flexibility (MMF). It estimates the influence of the flux of different metabolic pathways on other reactions, and thus helps to prioritize which reactions to measure first and to optimize resources [259]. JBEI Quantitative Metabolic Modeling (jQMM) calculates flux models at the genome scale. Prediction of internal metabolic fluxes is available not only through 13C metabolic experiments but also through FBA. This application also accepts omics data, which makes it suitable for flux studies in microbial communities [257].
Finally, few Web-based fluxomics applications are available in comparison to desktop applications. For instance, MicrobesFlux uses annotated microorganism genomes (KEGG) to generate and reconstruct metabolic models [261]. CycSim is another Web application in which genome-scale metabolic models can be simulated and integrated with KEGG data [247]. MFlux is the third Web-based platform covered in this review. It incorporates machine learning algorithms (support vector machine and k-nearest neighbours, among others) to predict bacterial metabolism, with the peculiarity that it draws on experimental data from about a hundred papers in which heterotrophic bacterial metabolisms were characterized by 13C experiments. MFlux also incorporates methodologies to adjust flux models to given stoichiometric constraints through quadratic programming [260].

Knowledge representation

Model reconstruction and network analysis are mainstream approaches for system-level analysis, namely for the study of microbe systems as well as host–microbe and microbial community interplays. These efforts are equally relevant for gaining a better understanding of the gut ecosystem and for revealing the impact of the social dynamics of these communities on dysbiosis and disease. Figure 4 illustrates the different aspects of knowledge representation that are detailed in the next subsections.

Figure 4. Mindmap of gut-related modelling and system-level analysis efforts.

Metabolic modelling

The reconstruction of genome-scale metabolic models can be viewed as a framework for converting large amounts of varied data, e.g. genetic, metabolic and biochemical, into phenotype and interaction observations [25, 278]. Typically, such reconstruction requires extensive manual curation and validation, and is based on the genome sequence, biochemistry and physiology of the organism [279].
The resulting model describes individual chemical reactions governed by the fundamental laws of mass conservation and thermodynamics, and can be used to simulate microbial growth or to predict the production rate of a particular metabolite. The value of metabolic modelling for understanding the complex environment of the gut microbiome lies in resolving biochemical relationships within and between microbial species and potentially predicting the effect of ecosystem-wide perturbations, such as antibiotic application or pathogen invasion. Microbial communities can be seen not only as groups of individual microbes but also as collections of biochemical functions affecting and responding to an environment or host organism [280, 281].

Gut microbe biochemical models

There are a number of available reconstructions for human gut microbes (Table 7). Notably, a recent work has presented draft metabolic reconstructions for 301 gastrointestinal microbe models [282].

Table 7. Genome-scale metabolic models and networks reconstructed for gut microbiota species

| Model | Species | Total constituents | Application | Extended/revised |
| --- | --- | --- | --- | --- |
| iAH991 [145] | Bacteroides thetaiotaomicron | 308 genes, 82 enzymes, 22 transporters, 32 transcription factors and 37 proteins of undefined functions | Suggest and refine specific functional assignments for sugar catabolic enzymes and transporters | |
| iAH991 [284] | Bacteroides thetaiotaomicron | 1488 reactions, 1152 metabolites and 991 genes | Characterization of host–microbe metabolic symbiosis | |
| iAH991 [284] | Bacteroides thetaiotaomicron | 1488 reactions, 1152 metabolites and 991 genes | Growth under diets varying in fat, carbohydrate and protein content | |
| iBif452 [283] | Bifidobacterium adolescentis L2-32 | | Study of the anti-inflammatory role | |
| iMLTC806cdf [285] | Clostridium difficile pathogenic strain 630 | 806 genes, 703 metabolites and 769 metabolic, 117 exchange and 145 transport reactions | Prediction of essential targets and inhibitors | |
| iNV213 [286] | Cryptosporidium hominis | 3884/213 genes (genome/reconstruction) and 540 reactions | Cryptosporidiosis | |
| iJO1366 [287] | Escherichia coli strain K-12 MG1655 | 4405/1366 genes (genome/reconstruction), 1136 unique metabolites and 2251 reactions | Comprehensive genome-scale reconstruction | |
| iCA1273 [288] | Escherichia coli strain W (ATCC 9637) | 4764/1273 genes (genome/reconstruction), 1111 unique metabolites and 2477 reactions | Comprehensive genome-scale reconstruction | |
| iFpraus_v1.0 [289] | Faecalibacterium prausnitzii A2-165 | | Carbon source utilization capabilities | |
| iFap484 [283] | Faecalibacterium prausnitzii A2-165 | | Study of the anti-inflammatory role | |
| iIT341 [290] | Helicobacter pylori strain 26695 | 1632/341 genes (genome/reconstruction), 411 unique metabolites and 476 reactions | Gastritis, gastric ulcers, gastric cancer | |
| iYL1228 [291] | Klebsiella pneumoniae strain MGH 78578 | 5186/1228 genes (genome/reconstruction), 1055 unique metabolites and 1970 reactions | Infection in various tissues | |
| iLca12A_640 [292] | Lactobacillus casei ATCC 12A | 1076 reactions, 979 metabolites and 640 genes | Identification of functional differences | |
| iLca334_548 [292] | Lactobacillus casei ATCC 334 | 1040 reactions, 959 metabolites and 548 genes | Identification of functional differences | |
| iJL846 [293] | Lactobacillus casei LC2W | 846 genes, 969 metabolic reactions and 785 metabolites | Understanding and engineering the metabolism of the strain | |
| Metabolic network [294] | Lactobacillus plantarum WCFS1 | 3009/721 genes (genome/reconstruction), 554 unique metabolites and 761 reactions, 643 reactions and 531 metabolites | Analysis of the physiology of growth on a complex medium | |
| Pan-metabolic map [295] | Lactobacillus reuteri ATCC 55730 and ATCC PTA 6475 | The metabolic model of 6475 includes 563 genes, similar to the metabolic model of L. reuteri JCM 1112. The metabolic model of 55730 includes 623 genes | Define functional probiotic features | |
| Metabolic network [296] | Lactococcus lactis ssp. lactis IL1403 | 2310/358 genes (genome/reconstruction), 422 unique metabolites and 621 reactions; a total of 621 reactions and 509 metabolites | Understanding of lactococcal metabolic capabilities | |
| Metabolic network [297] | Lactococcus lactis subsp. cremoris MG1363 | 518 genes, 754 reactions and 650 metabolites | Analysis of flavour formation | |
| iMA945 [298] | Salmonella typhimurium strain LT2 | 4489/619 genes (genome/reconstruction), 1036 unique metabolites and 1964 reactions | Salmonellosis food poisoning | |
| STM_v1.0 [299] | Salmonella typhimurium strain LT2 | 4489/1270 genes (genome/reconstruction), 1119 unique metabolites and 2201 reactions | Salmonellosis food poisoning | |
| Genome-scale model [300] | Streptococcus thermophilus LMG18311 | 1889/429 genes (genome/reconstruction) and 522 reactions; 1889 genes (or gene fragments), the total absolute number of reactions is 522 | Metabolic comparison of lactic acid bacteria | |
| VvuMBEL943 [301] | Vibrio vulnificus strain CMCP6 | 2896/673 genes (genome/reconstruction), 765 unique metabolites and 943 reactions | Gastroenteritis | |

These models describe the metabolism of each species, and their integrated analysis allows the exploration of interactions between predominant bacteria in the gut ecosystems. For example, El-Semman and colleagues [283] reconstructed two metabolic models, for Bifidobacterium adolescentis L2-32 (the iBif452 model) and F. prausnitzii A2-165 (the iFap484 model), which enabled the study of the anti-inflammatory role that these microorganisms play in the gut ecosystem. A genome-scale metabolic model for Lactobacillus casei LC2W enabled the identification of essential amino acids and vitamins and the exploration of the biosynthetic potential of some metabolites [26]. Another reconstruction of B. adolescentis L2-32 and F. prausnitzii A2-165 models enabled in silico simulation of the metabolic crosstalk between the two species and showed the importance of acetate supply for butyrate production [27]. Likewise, the characterization of carbohydrate utilization in Bacteroides thetaiotaomicron, supported by genome-scale metabolic and regulatory reconstructions, prompted and refined specific functional assignments for sugar catabolic enzymes and transporters [144].
Many of the models described above were obtained using similar reconstruction pipelines and therefore share some data resources and simulation tools. Often, genome sequence data (from the NCBI Genome database [40]) are the starting point, and draft reconstructions are obtained with the ModelSEED comparative genome annotation and analysis software [262]. The KEGG database is a useful resource for functional annotation [41], and the BiGG database is further used to assign reaction directionality [302, 303]. Tools like GEMSiRV [253], Acorn [304], YANAsquare [305] and VANTED [306] are commonly used for this purpose. Finally, constraint-based computational techniques are used in varied model simulations. For example, the OptKnock algorithm [307] and the COBRA Toolbox [245] are frequently used in flux balance analysis, which enables the prediction of the phenotypic responses triggered by environmental factors (i.e. manipulation of cellular growth in silico) and additional metabolic profiling. A comprehensive description of the available genome-scale metabolic reconstruction procedures and pipelines can be found in recent reviews [29, 308]. Along this line of research, but using different approaches, Bayesian inference of metabolic networks has been used to reveal a metabolic system with greater prevalence among IBD patients [309], and the construction and functional analysis of proteome interaction networks enabled the analysis of nutrient-affected pathways in human pathologies [310].

Gut microbiome community models

As more metabolic reconstructions of gut microbes become available, bioinformatics efforts are being directed towards the development of modelling frameworks for the systematic investigation of metabolic crosstalk in gut microbiome communities [311, 312].
Although existing single-species quantitative and computational approaches can be applied to microbial communities, extended community-centred approaches are being proposed to consider the impact that social traits (e.g. bacteriocin production, quorum sensing and other cell-to-cell interactions) may have in specific scenarios [313]. Such modelling of microbe communities should capture community structure, i.e. the interactions among microbes over time (community states). Specifically, each community state is described by measurements of community-level fluxes, abundances of species and knowledge of the metabolism of these organisms. In complex ecosystems, such as the human gut, this may imply millions of reactions, many of which are carried out in different species. As the mathematical and computational modelling involved is too costly, alternatives based on coarse-grained models have been proposed, and recent reviews have described their rationale, pointing out their main strengths and drawbacks [314, 315]. The so-called ‘supra-organism’ approach combines all metabolic reactions into a single network to study the metabolic capacities of the community in terms of product and substrate variation [316]. Such an approach ignores the impact of species abundances and the interactions between community members while enabling the optimization of community-level objectives (i.e. prediction of important environmental conditions). The steady-state compartmentalized approach models each organism in the microbial community as a single constraint-based model (i.e. with its own objective function), nested within a global ecosystem model that represents the exchange of metabolites between the species. The aim is to maximize the objective function of the ecosystem and thus enable the study of host–microbe and microbe–microbe interactions [315].
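The pooling step behind the ‘supra-organism’ approach described above can be illustrated with a small topological sketch. The species names, metabolites and reactions below are hypothetical and heavily simplified, and the reachability check (often called network expansion) is a stand-in for the constraint-based optimization of community-level objectives, not the method used in the cited works.

```python
def producible(reactions, seeds):
    """Return every metabolite reachable from `seeds`.

    Each reaction is a (substrates, products) pair; a reaction can fire once
    all of its substrates are available.
    """
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            if set(substrates) <= available and not set(products) <= available:
                available |= set(products)
                changed = True
    return available


# Hypothetical, simplified reaction sets for two community members.
species_reactions = {
    "acetate_producer": [
        (("glucose",), ("acetate",)),
    ],
    "butyrate_producer": [
        (("glucose",), ("acetyl_coa",)),
        (("acetyl_coa", "acetate"), ("butyrate",)),
    ],
}
medium = {"glucose"}

# Supra-organism view: ignore species boundaries and pool all reactions.
pooled = [rxn for rxns in species_reactions.values() for rxn in rxns]
community = producible(pooled, medium)

# Single-species view, for comparison.
alone = {name: producible(rxns, medium) for name, rxns in species_reactions.items()}

print("community:", sorted(community))
for name, metabolites in alone.items():
    print(name, "alone:", sorted(metabolites))
```

In this toy example butyrate appears only in the pooled network, because the hypothetical butyrate producer depends on acetate supplied by its partner. As noted in the text, such pooling says nothing about species abundances or about which member carries out each step.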
Although initially neglected, biomass concentrations of individual species are now also taken into account, which allows for the determination of accurate quantitative transfer rates [317]. The dynamic compartmentalized approach goes one step further and uses the kinetics of substrate uptake and metabolite exchange between species to capture ecosystem structure and functionality [318, 319]. Specifically, this approach accounts for changes in the biomass concentrations of individual species over time, which allows the simulation of interactions that may alter the community state. Furthermore, by complementing in silico metabolic network models with metagenomics-based compositional data, it is possible to predict levels of competition and complementarity among microbiome species and compare predicted interaction measures with species co-occurrence, specifically to study microbiome assembly according to habitat filtering [320]. Several gut microbiome community models already exist. Constraint-based multi-species modelling has been used to predict the effects of environmental constraints, namely different dietary regimes as well as anoxic and oxic conditions, in the human gut ecosystem [321]. On the other hand, integer linear programming has been used to seek ways to shift target communities towards preferred states, i.e. minimal sets of microbial species that collectively provide the enzymatic capacity required to synthesize a set of desired target products from a predefined set of available substrates [322]. One example is the in silico design of faecal microbiota transplants, in which synthetic communities are engineered to mimic a healthy gut and thus ameliorate the condition of patients with dysbiotic guts. This kind of transplant has shown promising results for addressing recurrent Clostridium difficile infections and other gut disorders, including IBD [323].
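The search for a minimal set of species that jointly converts given substrates into desired target products can be sketched on a toy scale as follows. This is only an illustration: it replaces the integer linear programming used in the cited work with an exhaustive search over species subsets of increasing size, which is feasible only for a handful of species, and all species names, reactions and function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Optional, Set, Tuple

# A reaction is a (substrates, products) pair; every name here is hypothetical.
Reaction = Tuple[FrozenSet[str], FrozenSet[str]]


def rxn(substrates: Iterable[str], products: Iterable[str]) -> Reaction:
    return frozenset(substrates), frozenset(products)


def reachable(reactions: List[Reaction], substrates: Set[str]) -> Set[str]:
    """Metabolites that can be formed from the substrates by repeatedly
    applying any reaction whose inputs are already available."""
    available = set(substrates)
    changed = True
    while changed:
        changed = False
        for inputs, outputs in reactions:
            if inputs <= available and not outputs <= available:
                available |= outputs
                changed = True
    return available


def minimal_community(models: Dict[str, List[Reaction]],
                      substrates: Set[str],
                      targets: Set[str]) -> Optional[Tuple[str, ...]]:
    """Return a smallest species subset whose pooled reactions reach all
    targets, or None. Exhaustive search stands in for an ILP solver."""
    names = sorted(models)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            pooled = [r for name in subset for r in models[name]]
            if targets <= reachable(pooled, substrates):
                return subset
    return None


# Toy candidate species (illustrative only).
candidates = {
    "hydrolyser": [rxn({"polysaccharide"}, {"glucose"})],
    "acetogen": [rxn({"glucose"}, {"acetate"})],
    "butyrogen": [rxn({"acetate"}, {"butyrate"})],
    "generalist": [rxn({"polysaccharide"}, {"glucose"}),
                   rxn({"glucose"}, {"acetate"})],
}

design = minimal_community(candidates, {"polysaccharide"}, {"butyrate"})
print(design)
```

For this toy input, two species suffice because the hypothetical generalist covers the first two conversion steps. An ILP formulation would scale to far larger candidate pools and could add constraints that the brute-force version ignores.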
Computational tools such as PathPred [324] and Computation of Microbial Ecosystems in Time and Space (COMETS) [325] are being used in the study of community-level biotransformation. PathPred uses the KEGG RPAIR database, a collection of biochemical structure transformation patterns and chemical structure alignments of substrate–product pairs, to predict plausible pathways for multistep reactions. COMETS enables computer simulations of metabolism in spatially structured microbial communities using dynamic flux balance analysis. To facilitate the visual exploration of the metabolic interactions between microbes in a community, e.g. as predicted by COMETS, tools like VisANT 5.0 [326], MetDraw [327], Cellular Overview [328], FBA-SimVis [251] and SurreyFBA [266] have been developed.

Host–microbe models

Typically, the characterization of host–microbe interactions entails the integration of a human metabolic reconstruction (or a mouse reconstruction) with one or more microbe metabolic reconstructions. These models are useful for gaining a deeper understanding of host–microbe symbiosis in the scope of metabolic disorders and may thus offer valuable insights into diet modulation and the benefits of probiotics. For example, a ‘meta-metabolome’ network describing the interactions between the human host and three predominant phyla of gut bacteria, namely Firmicutes, Bacteroidetes and Actinobacteria, shed light on cross-feeding relationships between some gut microbe enzymes and host carbohydrate metabolism enzymes [329]. The genome-scale metabolic reconstruction of B. thetaiotaomicron iAH991 was integrated with the mouse metabolic reconstruction iMM1415 in an effort to characterize intestinal transport and absorption reactions. The resulting model (iexGFMM_BΘ) comprises 7239 reactions, 5164 metabolites and 2769 genes, and was used to simulate the effect of different dietary regimes on both the host and the microbe [284].
Similarly, a metabolic reconstruction of human small intestinal epithelial cells, named hs_sIEC611, supported the study of microbe–microbe interactions in the presence/absence of the human host [330]. The first constraint-based host–microbial community model was recently published [311]. This model encompasses the most comprehensive model of human metabolism (Recon2) and 11 manually curated and validated metabolic models of commensals, probiotics, pathogens and opportunistic pathogens, with over 2000 exchanges representing metabolic functions in humans. It was used to predict potential metabolic host–microbe interactions under four in silico dietary regimes, which varied in carbohydrate, fat and protein intake.

Network mining

The construction of microbial function networks is often sought as a means of identifying co-occurrence of microbial species in humans. For example, a protein–protein interaction network supported the study of potential dietary interventions targeting short-chain fatty acid metabolism. Specifically, the analysis of topological metrics enabled the identification of the most vulnerable protein targets of the butyrate and propionate metabolic pathways, i.e. protein targets that are more likely to change gene expression activity [331]. Another approach, based on the mapping of microbial genes to functional units, i.e. KEGG orthologous groups (KO) [41] or evolutionary genealogy of genes (eggNOG) [55], has been applied to the study of the human gut microbiome associated with T2D [332]. This analysis used Pearson’s correlation coefficient to characterize the strength of the associations between functional units and included the prediction of the abundance of functional units using machine learning (i.e. the random forest algorithm). Associations deemed weak were eliminated and the final network of functional units was described in terms of global and local properties (e.g.
number of nodes and edges, density, diameter and clustering coefficient) as well as functional modules, namely T2D-specific functional networks, and network motifs. A network-based approach also helped in the characterization of microbial co-occurrence in IBD patients [333]. Besides classical topology metrics (e.g. path length and clustering coefficient), this study looked into three- and four-species network motifs to gain a better understanding of local patterns of species co-occurrence. A correlation network analysis identified significant associations between abundances of microbial taxa and diet-induced shifts in several metabolic health parameters [334]. Specifically, it identified diet-induced changes in Bacteroides levels related to changes in carbohydrate oxidation rates, whereas changes in Firmicutes were correlated with changes in fat oxidation. Among network-based studies, Cytoscape is the most widely used platform [335]. Besides providing generic means of visualization and topological exploration, this platform offers a number of different applications that enable multi-scale data integration, data clustering, enrichment analysis (Gene Ontology (GO) functional annotation) and network comparison, among others. Alternatively, some works use ConsensusPathDB (which is also available as a plugin for Cytoscape) [336]. ConsensusPathDB-human integrates interaction networks in Homo sapiens, including binary and complex protein–protein, genetic, metabolic, signalling, gene regulatory and drug–target interactions, as well as biochemical pathways.

Other system-level analyses

Although not so common, there are emerging tools that complement the knowledge provided by metabolic models and networks, namely in terms of system dynamics and evolution over time. For example, the Metagenomic Microbial Interaction Simulator (MetaMIS) supports time series analysis of microbial community profiles [112].
The central purpose of MetaMIS is to provide insights into microbe interactions in general and about specific microbes in the community. To this end, MetaMIS infers underlying microbial interactions from the abundance tables of operational taxonomic units and then uses them to construct interaction networks using the Lotka–Volterra model. For each interaction network, it systematically examines interaction patterns (such as mutualism or competition) and refines the biotic role of each microbe. Dynamic Bayesian networks are another approach used to capture complex interactions and dynamic change within the microbiome over time. For example, a dynamic Bayesian network was used to model the progression of microbiota while colonizing the infant gut [337]. The aim was to develop a predictive model based on prior composition. Accordingly, the model accounted for relationships between multiple bacterial taxa, the compositional changes bacterial taxa exert on other community members over time and the influence of environmental stresses (e.g. the use of antibiotics) on gut microbiome progression. A hidden Markov model was used to gain a better understanding of the distribution of butyrate production pathways in commensals and pathogens inhabiting different environments, namely the human gut [338]. Boolean network modelling and dynamic analysis have enabled the inference of important relationships within gut microbiota composition. More specifically, this approach was used to explore the dynamics of clindamycin antibiotic treatment in C. difficile infection and to predict therapeutic probiotic interventions to suppress C. difficile infection [64]. Agent-based modelling has also supported the study of gut microbiome population dynamics. For example, one model represented two bacterial species, metabolites and the gut (host), considering behavioural rules for both microbe–microbe interactions and host–microbe interactions.
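To make the Lotka–Volterra idea behind MetaMIS more concrete, the following minimal sketch simulates a generalized Lotka–Volterra system forward in time and labels each species pair by the signs of its interaction coefficients. It does not reproduce the inference step that MetaMIS performs on abundance tables. The growth rates, interaction matrix, OTU names and function names are all made up for illustration, and the simple Euler integration is an assumption chosen for brevity.

```python
from typing import Dict, List, Tuple


def simulate_glv(x0: List[float], growth: List[float],
                 interactions: List[List[float]],
                 dt: float = 0.01, steps: int = 2000) -> List[List[float]]:
    """Euler integration of dx_i/dt = x_i * (r_i + sum_j a_ij * x_j)."""
    x = list(x0)
    trajectory = [list(x)]
    n = len(x)
    for _ in range(steps):
        rates = [x[i] * (growth[i] + sum(interactions[i][j] * x[j] for j in range(n)))
                 for i in range(n)]
        # Clamp at zero so abundances never become negative.
        x = [max(0.0, x[i] + dt * rates[i]) for i in range(n)]
        trajectory.append(list(x))
    return trajectory


def classify_pairs(interactions: List[List[float]],
                   names: List[str]) -> Dict[Tuple[str, str], str]:
    """Label each species pair by the signs of a_ij and a_ji."""
    labels: Dict[Tuple[str, str], str] = {}
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            a_ij, a_ji = interactions[i][j], interactions[j][i]
            if a_ij > 0 and a_ji > 0:
                label = "mutualism"
            elif a_ij < 0 and a_ji < 0:
                label = "competition"
            elif a_ij == 0 and a_ji == 0:
                label = "neutral"
            elif a_ij * a_ji < 0:
                label = "exploitation"
            else:
                label = "one-sided"  # commensalism or amensalism
            labels[(names[i], names[j])] = label
    return labels


# Hypothetical three-OTU system; the diagonal holds self-limitation terms.
names = ["otu_a", "otu_b", "otu_c"]
growth = [0.8, 0.6, 0.5]
interactions = [
    [-1.0, 0.2, -0.3],
    [0.1, -1.0, 0.0],
    [-0.4, 0.0, -1.0],
]

trajectory = simulate_glv([0.1, 0.1, 0.1], growth, interactions)
labels = classify_pairs(interactions, names)
print(labels)
print([round(v, 3) for v in trajectory[-1]])
```

In practice the interaction matrix would be estimated from time series data rather than written by hand, and a more robust integrator would be preferable for stiff or strongly coupled systems.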
Notably, this agent-based model encompasses the reactions governing the fermentation of polysaccharides to acetate and propionate and the fermentation of acetate to butyrate; antibiotic treatment was chosen as the disturbance factor and used to investigate the stability of the system [339]. Another model described clostridia, Desulfovibrio sp. and bifidobacteria population interactions in the gut and supported the study of risks for developing autism [340]. Simulation results suggested that the clostridia growth rate is a key determinant of the risk of autism development and that treatment of high-risk infants with supraphysiological levels of lysozymes may reduce this risk. Finally, an agent-based model of virulence regulation in Pseudomonas aeruginosa was developed to represent the host–microbe interface in the gut and to study its spatial–temporal dynamics [341]. Gut microbiome studies have also taken advantage of machine learning and data mining methods. An approach using random forests helped to explore the role of the gut microbiota in colon tumorigenesis, namely by modelling the number of tumours developed based on the initial composition of the microbiota and different combinations of antibiotics [56]. In another study, text mining and naive Bayes classification were combined to uncover diet–gut microbiome interactions at the molecular level [331, 342]. Notably, this resulted in the development of NutriChem, a public Web-based database on associations between chronic diseases and plant-based foods [343].

Future challenges and opportunities

Following the introduction of omics technologies into the study of the human gut microbiome, several achievements have been made, which together have provided deeper insights into, and a better understanding of, the complex microbial ecology and physiology operating at the intestinal mucosa.
Omics have been crucial for the elucidation of some of the microbial and metabolic signatures that characterize the link between dysbiosis and the disease/health status of the host. For instance, in IBD, a chronic inflammatory condition of the human gut, omics have led to a better understanding of the disease phenotype (i.e. genes, proteins and pathways that primarily differentiate patients from healthy subjects) and enabled the association of the phenotype with alterations in bacterial carbohydrate and protein metabolism, bacterial–host interactions, as well as human host-secreted enzymes [27]. Future challenges, which also represent new opportunities, rely on the integration of different techniques to recover a holistic view of microbiomes. Both the efficiency and the quality of microbiome research depend heavily on the integration of the outputs of next-generation sequencing methods with those of other omics technologies, namely data on transcript and protein variation, metabolite concentrations and spatial distribution. Within this context, microbiome bioinformatics has the mission of providing computational methods and techniques that complement experimental approaches and enrich our understanding of complex microbial communities, their internal interactions and their interactions with the host and the environment. Together with integration, curation of data repositories will be crucial for the proper identification of microorganisms, genes and proteins, notably those to which only a putative status or a hypothetical function, or not even that, has been assigned. For this reason, studies focused on the function of single molecules are as important to metatranscriptomics, metaproteomics or metabolomics as culturomics is to metagenomics. It will make no sense to discover new microbial biomarkers of disease if we cannot culture the microorganisms involved. Last but not least, there is a general lack of graphical user interfaces for many of the tools.
This lack limits the scientific community's access to useful resources and therefore represents an opportunity for current developments in the field of the human microbiota, as many of the informatics tools are currently operated via the command line.

Concluding remarks

The human microbiome plays a key role in human health and is associated with numerous diseases. Understanding the importance of the gut microbiome in the modulation of host health has thus become a subject of great interest for researchers across biomedical disciplines. Integrated and high-throughput analyses are providing new insights into microbial community structure and function in the human gut. The continuous emergence of new knowledge, the heterogeneity of the data involved and the need for integrative, advanced and often application-based analysis (customized workflows) make a comprehensive description of all possible resources, tools and pipelines hardly feasible. This review presents a collection of up-to-date tools for the application of omics data to the field of gut microbiota, but it can be extended to the study of any microbiota, regardless of the ecological niche. The main tools for metagenomics, metatranscriptomics, metaproteomics, metabolomics and fluxomics have been discussed and organized in a comprehensive way as an online resource for the research community. We have found that further efforts are needed in the integration of data from different omics, in the optimization of informatics resources, in the development of novel tools and, notably, in the curation of the information contained in databases and other public biomolecule repositories. We sincerely believe that cooperation between researchers working in different fields, from microbiology to bioinformatics, will help in achieving those suggested milestones.
In the end, this will extend our view of the complex interaction between microbiota and the human host, and identify meaningful microbial targets for interventional studies in the framework of different diseases.

Key Points

Gut microbiota composition is related to human health, and alterations in the relative proportions of its components are linked to several human diseases.
Omics techniques have made it possible to study the composition, functionality and metabolic activity of the human gut microbiota and its impact on host physiology.
This review provides an overview of many bioinformatics resources that can be applied to human gut microbiome research.

Competing interests

B.S. is on the scientific board and is co-founder of Microviable Therapeutics SL. The other authors have no competing interests.

Supplementary Data

Supplementary data are available online at http://bib.oxfordjournals.org/.

Funding

This work was supported by the Spanish ‘Programa Estatal de Investigación, Desarrollo e Inovación Orientada a los Retos de la Sociedad’ (grant number AGL2013-44039 R) and the Asociación Española Contra el Cancer (‘Obtención de péptidos bioactivos contra el Cáncer Colo-Rectal a partir de secuencias genéticas de microbiomas intestinales’, grant number PS-2016). This study was also supported by the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684), and by the INOU16-05 project from the University of Vigo. SING group thanks CITI (Centro de Investigación, Transferencia e Innovación) from the University of Vigo for hosting its IT infrastructure.

Aitor Blanco-Míguez is a PhD student of the Computer Science Doctoral programme of the University of Vigo. He is currently developing advanced computational methods for modelling the interaction of commensal bacteria with host epithelial/immune cells.
Florentino Fdez-Riverola is a faculty member of the Department of Computer Science and a researcher affiliated to the Biomedical Research Centre (CINBIO), at the University of Vigo. He leads the Next Generation Computer System Group (SING), which is dedicated to the research and development of cutting-edge computational methodologies and applications.

Borja Sánchez is a senior researcher at the IPLA-CSIC. His main research line is devoted to the understanding of the molecular mechanisms of host–bacteria interaction through extracellular and surface-associated proteins/peptides.

Anália Lourenço is a faculty member of the Department of Computer Science and a researcher affiliated to the Biomedical Research Centre (CINBIO), at the University of Vigo and the Centre of Biological Engineering, at the University of Minho. Her main research interests include computational intelligence, bioinformatics and systems biology.

References

1 Rakoff-Nahoum S, Foster KR, Comstock LE. The evolution of cooperation within the gut microbiota. Nature 2016; 533: 255–9.
2 Francino MP. Antibiotics and the human gut microbiome: dysbioses and accumulation of resistances. Front Microbiol 2016; 6: 1–11.
3 Walsh CJ, Guinane CM, O'Toole PW, et al. Beneficial modulation of the gut microbiota. FEBS Lett 2014; 588(22): 4120–30. http://dx.doi.org/10.1016/j.febslet.2014.03.035
4 Morgan XC, Huttenhower C, Lewitter F, Kann M. Chapter 12: human microbiome analysis. PLoS Comput Biol 2012; 8(12): e1002808.
5 van den Elsen LW, Poyntz HC, Weyrich LS, et al. Embracing the gut microbiota: the new frontier for inflammatory and infectious diseases. Clin Transl Immunol 2017; 6(1): e125.
6 Hooper LV, Gordon JI. Commensal host-bacterial relationships in the gut.
Science 2001; 292(5519): 1115–8. http://dx.doi.org/10.1126/science.1058709
7 Sánchez B, Urdaci MC, Margolles A. Extracellular proteins secreted by probiotic bacteria as mediators of effects that promote mucosa-bacteria interactions. Microbiology 2010; 156(Pt 11): 3232–42.
8 Sansonetti PJ. War and peace at mucosal surfaces. Nat Rev Immunol 2004; 4(12): 953–64. http://dx.doi.org/10.1038/nri1499
9 Patterson E, Ryan PM, Cryan JF, et al. Gut microbiota, obesity and diabetes. Postgrad Med J 2016; 92(1087): 286–300. http://dx.doi.org/10.1136/postgradmedj-2015-133285
10 He X, Ji G, Jia W, et al. Gut microbiota and nonalcoholic fatty liver disease: insights on mechanism and application of metabolomics. Int J Mol Sci 2016; 17(3): 300. http://dx.doi.org/10.3390/ijms17030300
11 Barlow GM, Yu A, Mathur R. Role of the gut microbiome in obesity and diabetes mellitus. Nutr Clin Pract 2015; 30(6): 787–97. http://dx.doi.org/10.1177/0884533615609896
12 Zhang C, Yin A, Li H, et al. Dietary modulation of gut microbiota contributes to alleviation of both genetic and simple obesity in children. EBioMedicine 2015; 2: 966–82.
13 Trøseid M, Hov JR, Nestvold TK, et al. Major increase in microbiota-dependent proatherogenic metabolite TMAO one year after bariatric surgery. Metab Syndr Relat Disord 2016; 14: 197–201.
14 Palau-Rodriguez M, Tulipani S, Isabel Queipo-Ortuño M, et al. Metabolomic insights into the intricate gut microbial-host interaction in the development of obesity and type 2 diabetes. Front Microbiol 2015; 6: 1151.
15 Arora T, Singh S, Sharma RK.
Probiotics: Interaction with gut microbiome and antiobesity potential. Nutrition 2013; 29(4): 591–6. http://dx.doi.org/10.1016/j.nut.2012.07.017
16 Buttó LF, Haller D. Dysbiosis in intestinal inflammation: cause or consequence. Int J Med Microbiol 2016; 306: 302–9.
17 Kataoka K. The intestinal microbiota and its role in human health and disease. J Med Invest 2016; 63(1-2): 27–37. http://dx.doi.org/10.2152/jmi.63.27
18 Matsuoka K, Kanai T. The gut microbiota and inflammatory bowel disease. Semin Immunopathol 2015; 37(1): 47–55. http://dx.doi.org/10.1007/s00281-014-0454-4
19 Kostic AD, Xavier RJ, Gevers D. The microbiome in inflammatory bowel disease: current status and the future ahead. Gastroenterology 2014; 146(6): 1489–99. http://dx.doi.org/10.1053/j.gastro.2014.02.009
20 Forbes JD, Van Domselaar G, Bernstein CN. Microbiome survey of the inflamed and noninflamed gut at different compartments within the gastrointestinal tract of inflammatory bowel disease patients. Inflamm Bowel Dis 2016; 22: 817–25. http://dx.doi.org/10.1097/MIB.0000000000000684
21 Cao Y, Shen J, Ran ZH. Association between Faecalibacterium prausnitzii reduction and inflammatory bowel disease: a meta-analysis and systematic review of the literature. Gastroenterol Res Pract 2014; 2014: 872725.
22 Paul B, Barnes S, Demark-Wahnefried W, et al. Influences of diet and the gut microbiome on epigenetic modulation in cancer and other diseases. Clin Epigenetics 2015; 7: 112. http://dx.doi.org/10.1186/s13148-015-0144-7
23 Thomas RM, Jobin C. The Microbiome and Cancer: Is the ‘Oncobiome’ Mirage Real?
Trends in Cancer 2015; 1(1): 24–35.
24 Belizario JE, Napolitano M. Human microbiomes and their roles in dysbiosis, common diseases, and novel therapeutic approaches. Front Microbiol 2015; 6: 1–16.
25 Sung J, Hale V, Merkel AC, et al. Metabolic modeling with Big Data and the gut microbiome. Appl Transl Genomics 2016; 10: 10–5. http://dx.doi.org/10.1016/j.atg.2016.02.001
26 Weir TL, Manter DK, Sheflin AM, et al. Stool microbiome and metabolome differences between colorectal cancer patients and healthy adults. PLoS One 2013; 8(8): e70803.
27 Erickson AR, Cantarel BL, Lamendella R, et al. Integrated metagenomics/metaproteomics reveals human host-microbiota signatures of Crohn’s disease. PLoS One 2012; 7(11): e49138.
28 Haiser HJ, Gootenberg DB, Chatman K, et al. Predicting and manipulating cardiac drug inactivation by the human gut bacterium Eggerthella lenta. Science 2013; 341: 295–8. http://dx.doi.org/10.1126/science.1235872
29 Cuevas DA, Edirisinghe J, Henry CS, et al. From DNA to FBA: how to build your own genome-scale metabolic model. Front Microbiol 2016; 7: 907.
30 Segata N, Boernigen D, Tickle TL, et al. Computational meta’omics for microbial community studies. Mol Syst Biol 2014; 9: 666.
31 Morgan XC, Huttenhower C. Meta’omic analytic techniques for studying the intestinal microbiome. Gastroenterology 2014; 146(6): 1437–48.e1.
32 Borenstein E. Computational systems biology and in silico modeling of the human microbiome. Brief Bioinform 2012; 13(6): 769–80.
http://dx.doi.org/10.1093/bib/bbs022
33 Collison M, Hirt RP, Wipat A, et al. Data mining the human gut microbiota for therapeutic targets. Brief Bioinform 2012; 13(6): 751–68. http://dx.doi.org/10.1093/bib/bbs002
34 Human Microbiome Project Consortium. A framework for human microbiome research. Nature 2012; 486: 215–21. http://dx.doi.org/10.1038/nature11209
35 Qin J, Li Y, Cai Z, et al. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature 2012; 490(7418): 55–60. http://dx.doi.org/10.1038/nature11450
36 McDonald D, Birmingham A, Knight R. Context and the human microbiome. Microbiome 2015; 3: 52. http://dx.doi.org/10.1186/s40168-015-0117-2
37 Li J, Jia H, Cai X, et al. An integrated catalog of reference genes in the human gut microbiome. Nat Biotechnol 2014; 32(8): 834–41. http://dx.doi.org/10.1038/nbt.2942
38 Falony G, Joossens M, Vieira-Silva S, et al. Population-level analysis of gut microbiome variation. Science 2016; 352(6285): 560–4. http://dx.doi.org/10.1126/science.aad3503
39 Auton A, Brooks LD, Durbin RM, et al. A global reference for human genetic variation. Nature 2015; 526(7571): 68–74. http://dx.doi.org/10.1038/nature15393
40 Pruitt KD, Tatusova T, Brown GR, et al. NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res 2012; 40: D130–5.
41 Kanehisa M, Furumichi M, Tanabe M, et al. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res 2017; 45(D1): D353–61.
42 Tatusov RL, Fedorova ND, Jackson JD, et al. The COG database: an updated version includes eukaryotes. BMC Bioinformatics 2003; 4: 41. http://dx.doi.org/10.1186/1471-2105-4-41
43 Finn RD, Coggill P, Eberhardt RY, et al. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res 2016; 44(D1): D279–85.
44 Letunic I, Doerks T, Bork P. SMART 7: recent updates to the protein domain annotation resource. Nucleic Acids Res 2012; 40: D302–5.
45 Mitra S, Rupek P, Richter DC, et al. Functional analysis of metagenomes and metatranscriptomes using SEED and KEGG. BMC Bioinformatics 2011; 12(Suppl 1): S21.
46 Caspi R, Altman T, Dreher K, et al. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Res 2012; 44: 471–80.
47 Overbeek R, Olson R, Pusch GD, et al. The SEED and the rapid annotation of microbial genomes using subsystems technology (RAST). Nucleic Acids Res 2014; 42: D206–14.
48 Wilke A, Bischof J, Gerlach W, et al. The MG-RAST metagenomics database and portal in 2015. Nucleic Acids Res 2016; 44: D590–4.
49 Markowitz VM, Chen I-MA, Palaniappan K, et al. IMG 4 version of the integrated microbial genomes comparative analysis system. Nucleic Acids Res 2014; 42: 560–7. http://dx.doi.org/10.1093/nar/gkt963
50 Huson DH, Beier S, Flade I, et al. MEGAN community edition - interactive exploration and analysis of large-scale microbiome sequencing data. PLoS Comput Biol 2016; 12(6): e1004957.
51 Abubucker S, Segata N, Goll J, et al. Metabolic reconstruction for metagenomic data and its application to the human microbiome. PLoS Comput Biol 2012; 8(6): e1002358.
52 Tyakht AV, Popenko AS, Belenikin MS, et al. MALINA: a web service for visual analytics of human gut microbiota whole-genome metagenomic reads. Source Code Biol Med 2012; 7: 13. http://dx.doi.org/10.1186/1751-0473-7-13
53 Kultima JR, Coelho LP, Forslund K, et al. MOCAT2: a metagenomic assembly, annotation and profiling framework. Bioinformatics 2016; 32: 2520–3. http://dx.doi.org/10.1093/bioinformatics/btw183
54 Bose T, Haque MM, Reddy C, et al. COGNIZER: a framework for functional annotation of metagenomic datasets. PLoS One 2015; 10(11): e0142102.
55 Huerta-Cepas J, Szklarczyk D, Forslund K, et al. eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic Acids Res 2016; 44: D286–93.
56 Zackular JP, Baxter NT, Chen GY, et al. Manipulation of the gut microbiota reveals role in colon tumorigenesis. mSphere 2015; 1: e00001-15.
57 Norman JM, Handley SA, Baldridge MT, et al. Disease-specific alterations in the enteric virome in inflammatory bowel disease. Cell 2015; 160: 447–60. http://dx.doi.org/10.1016/j.cell.2015.01.002
58 Biagi E, Candela M, Centanni M, et al. Gut microbiome in down syndrome. PLoS One 2014; 9(11): e112023.
59 Larsen PE, Dai Y. Metabolome of human gut microbiome is predictive of host dysbiosis. Gigascience 2015; 4: 42.
http://dx.doi.org/10.1186/s13742-015-0084-3
60 Yap TW-C, Gan H-M, Lee Y-P, et al. Helicobacter pylori eradication causes perturbation of the human gut microbiome in young adults. PLoS One 2016; 11(3): e0151893.
61 Keren N, Konikoff FM, Paitan Y, et al. Interactions between the intestinal microbiota and bile acids in gallstones patients. Environ Microbiol Rep 2015; 7: 874–80. http://dx.doi.org/10.1111/1758-2229.12319
62 Rooijers K, Kolmeder C, Juste C, et al. An iterative workflow for mining the human intestinal metaproteome. BMC Genomics 2011; 12: 6. http://dx.doi.org/10.1186/1471-2164-12-6
63 Wills ES, Jonkers DMAE, Savelkoul PH, et al. Fecal microbial composition of ulcerative colitis and Crohn’s disease patients in remission and subsequent exacerbation. PLoS One 2014; 9(3): e90981.
64 Steinway SN, Biggs MB, Loughran TP, et al. Inference of network dynamics and metabolic interactions in the gut microbiome. PLoS Comput Biol 2015; 11(5): e1004338.
65 Stewart CJ, Marrs ECL, Nelson A, et al. Development of the preterm gut microbiome in twins at risk of necrotising enterocolitis and sepsis. PLoS One 2013; 8(8): e73465.
66 La Rosa PS, Warner BB, Zhou Y, et al. Patterned progression of bacterial populations in the premature infant gut. Proc Natl Acad Sci USA 2014; 111: 12522–7. http://dx.doi.org/10.1073/pnas.1409497111
67 Turnbaugh PJ, Hamady M, Yatsunenko T, et al. A core gut microbiome in obese and lean twins. Nature 2009; 457(7228): 480–4. http://dx.doi.org/10.1038/nature07540
68 Caporaso JG, Lauber CL, Costello EK, et al.
Moving pictures of the human microbiome . Genome Biol 2011 ; 12 ( 5 ): R50 . Google Scholar CrossRef Search ADS PubMed 69 Turroni S , Rampelli S , Biagi E , et al. Temporal dynamics of the gut microbiota in people sharing a confined environment, a 520-day ground-based space simulation, MARS500 . Microbiome 2017 ; 5 ( 1 ): 39 . http://dx.doi.org/10.1186/s40168-017-0256-8 Google Scholar CrossRef Search ADS PubMed 70 Schnorr SL , Candela M , Rampelli S , et al. Gut microbiome of the Hadza hunter-gatherers . Nat Commun 2014 ; 5 : 3654 . Google Scholar CrossRef Search ADS PubMed 71 Zhang J , Guo Z , Xue Z , et al. A phylo-functional core of gut microbiota in healthy young Chinese cohorts across lifestyles, geography and ethnicities . ISME J 2015 ; 9 ( 9 ): 1979 – 90 . http://dx.doi.org/10.1038/ismej.2015.11 Google Scholar CrossRef Search ADS PubMed 72 Biagi E , Franceschi C , Rampelli S , et al. Gut microbiota and extreme longevity . Curr Biol 2016 ; 26 ( 11 ): 1480 – 5 . http://dx.doi.org/10.1016/j.cub.2016.04.016 Google Scholar CrossRef Search ADS PubMed 73 Morton ER , Lynch J , Froment A , et al. Variation in rural African gut microbiota is strongly correlated with colonization by entamoeba and subsistence . PLoS Genet 2015 ; 11 ( 11 ): e1005658 . Google Scholar CrossRef Search ADS PubMed 74 Gomez A , Petrzelkova KJ , Burns MB , et al. Gut microbiome of coexisting BaAka pygmies and bantu reflects gradients of traditional subsistence patterns . Cell Rep 2016 ; 14 ( 9 ): 2142 – 53 . http://dx.doi.org/10.1016/j.celrep.2016.02.013 Google Scholar CrossRef Search ADS PubMed 75 Stewart CJ , Nelson A , Campbell MD , et al. Gut microbiota of Type 1 diabetes patients with good glycaemic control and high physical fitness is similar to people without diabetes: an observational study . Diabet Med 2017 ; 34 : 127 – 34 . http://dx.doi.org/10.1111/dme.13140 Google Scholar CrossRef Search ADS PubMed 76 Karlsson FH , Tremaroli V , Nookaew I , et al. 
Gut metagenome in European women with normal, impaired and diabetic glucose control . Nature 2013 ; 498 ( 7452 ): 99 – 103 . http://dx.doi.org/10.1038/nature12198 Google Scholar CrossRef Search ADS PubMed 77 Candela M , Biagi E , Soverini M , et al. Modulation of gut microbiota dysbioses in type 2 diabetic patients by macrobiotic Ma-Pi 2 diet . Br J Nutr 2016 ; 116 ( 1 ): 80 – 93 . http://dx.doi.org/10.1017/S0007114516001045 Google Scholar CrossRef Search ADS PubMed 78 Wang W-L , Xu S-Y , Ren Z-G , et al. Application of metagenomics in the human gut microbiome . World J Gastroenterol 2015 ; 21 ( 3 ): 803 – 14 . http://dx.doi.org/10.3748/wjg.v21.i3.803 Google Scholar CrossRef Search ADS PubMed 79 Fabijanić M , Vlahoviček K. Big data, evolution, and metagenomes: predicting disease from gut microbiota codon usage profiles . Methods Mol Biol 2016 ; 1415 : 509 – 31 . Google Scholar CrossRef Search ADS PubMed 80 Mulcahy-O’Grady H , Workentine ML. The challenge and potential of metagenomics in the clinic . Front Immunol 2016 ; 7 : 1 – 8 . Google Scholar CrossRef Search ADS PubMed 81 Noecker C , McNally CP , Eng A , et al. High-resolution characterization of the human microbiome . Transl Res 2017 ; 179 : 7 – 23 . http://dx.doi.org/10.1016/j.trsl.2016.07.012 Google Scholar CrossRef Search ADS PubMed 82 Sedlar K , Kupkova K , Provaznik I. Bioinformatics strategies for taxonomy independent binning and visualization of sequences in shotgun metagenomics . Comput Struct Biotechnol J 2017 ; 15 : 48 – 55 . http://dx.doi.org/10.1016/j.csbj.2016.11.005 Google Scholar CrossRef Search ADS PubMed 83 Ghurye JS , Cepeda-Espinoza V , Pop M. Metagenomic assembly: overview, challenges and applications . Yale J Biol Med 2016 ; 89 : 353 – 62 . Google Scholar PubMed 84 Coit P , Sawalha AH. The human microbiome in rheumatic autoimmune diseases: a comprehensive review . Clin Immunol 2016 ; 170 : 70 – 9 . 
http://dx.doi.org/10.1016/j.clim.2016.07.026 Google Scholar CrossRef Search ADS PubMed 85 Zielezinski A , Vinga S , Almeida J , et al. Alignment-free sequence comparison: benefits, applications, and tools . Genome Biol 2017 ; 18 ( 1 ): 186 . http://dx.doi.org/10.1186/s13059-017-1319-7 Google Scholar CrossRef Search ADS PubMed 86 Treangen TJ , Sommer DD , Angly FE , et al. Next generation sequence assembly with AMOS . Curr Protoc Bioinform 2011 ; Chapter 11 : Unit 11.8 . 87 Kerepesi C , Bánky D , Grolmusz V. AmphoraNet: the webserver implementation of the AMPHORA2 metagenomic workflow suite . Gene 2014 ; 533 ( 2 ): 538 – 40 . Google Scholar CrossRef Search ADS PubMed 88 van Heel AJ , de Jong A , Montalbán-López M , et al. BAGEL3: Automated identification of genes encoding bacteriocins and (non-)bactericidal posttranslationally modified peptides . Nucleic Acids Res 2013 ; 41 : W448 – 53 . Google Scholar CrossRef Search ADS PubMed 89 Altschul SF , Gish W , Miller W , et al. Basic local alignment search tool . J Mol Biol 1990 ; 215 ( 3 ): 403 – 10 . http://dx.doi.org/10.1016/S0022-2836(05)80360-2 Google Scholar CrossRef Search ADS PubMed 90 Lu YY , Tang K , Ren J , et al. CAFE: a Ccelerated Alignment-FrEe sequence analysis . Nucleic Acids Res 2017 ; 45 ( W1 ): W554 – 9 . Google Scholar CrossRef Search ADS 91 Sun S , Chen J , Li W , et al. Community cyberinfrastructure for advanced microbial ecology research and analysis: the CAMERA resource . Nucleic Acids Res 2011 ; 39 : D546 – 51 . Google Scholar CrossRef Search ADS PubMed 92 Li W , Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences . Bioinformatics 2006 ; 22 : 1658 – 9 . http://dx.doi.org/10.1093/bioinformatics/btl158 Google Scholar CrossRef Search ADS PubMed 93 Haas BJ , Gevers D , Earl AM , et al. Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons . Genome Res 2011 ; 21 ( 3 ): 494 – 504 . 
http://dx.doi.org/10.1101/gr.112730.110 Google Scholar CrossRef Search ADS PubMed 94 Angiuoli SV , Matalka M , Gussman A , et al. CloVR: A virtual machine for automated and portable sequence analysis from the desktop using cloud computing . BMC Bioinformatics 2011 ; 12 ( 1 ): 356 . http://dx.doi.org/10.1186/1471-2105-12-356 Google Scholar CrossRef Search ADS PubMed 95 Alneberg J , Bjarnason BS , de Bruijn I , et al. Binning metagenomic contigs by coverage and composition . Nat Methods 2014 ; 11 ( 11 ): 1144 – 6 . http://dx.doi.org/10.1038/nmeth.3103 Google Scholar CrossRef Search ADS PubMed 96 Xu Z , Hao B. CVTree update: a newly designed phylogenetic study platform using composition vectors and whole genomes . Nucleic Acids Res 2009 ; 37 : W174 – 8 . Google Scholar CrossRef Search ADS PubMed 97 Love MI , Huber W , Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 . Genome Biol 2014 ; 15 ( 12 ): 550. http://dx.doi.org/10.1186/s13059-014-0550-8 Google Scholar CrossRef Search ADS PubMed 98 Quince C , Connellly S , Raguideau S , et al. De novo extraction of microbial strains from metagenomes reveals intra-species niche partitioning . bioRxiv 2016 , doi: https://doi.org/10.1101/073825. 99 Buchfink B , Xie C , Huson DH. Fast and sensitive protein alignment using DIAMOND . Nat Methods 2015 ; 12 ( 1 ): 59 – 60 . Google Scholar CrossRef Search ADS PubMed 100 Manor O , Borenstein E. Systematic characterization and analysis of the taxonomic drivers of functional shifts in the human microbiome . Cell Host Microbe 2017 ; 21 ( 2 ): 254 – 67 . http://dx.doi.org/10.1016/j.chom.2016.12.014 Google Scholar CrossRef Search ADS PubMed 101 Magoč T , Salzberg SL. FLASH: fast length adjustment of short reads to improve genome assemblies . Bioinformatics 2011 ; 27 ( 21 ): 2957 – 63 . Google Scholar CrossRef Search ADS PubMed 102 Kim J , Kim MS , Koh AY , et al. 
FMAP: functional mapping and analysis pipeline for metagenomics and metatranscriptomics studies . BMC Bioinformatics 2016 ; 17 ( 1 ): 420 . http://dx.doi.org/10.1186/s12859-016-1278-0 Google Scholar CrossRef Search ADS PubMed 103 Riehle K , Coarfa C , Jackson A , et al. The genboree microbiome toolset and the analysis of 16S rRNA microbial sequences . BMC Bioinformatics 2012 ; 13(Suppl 13) : S11 . Google Scholar CrossRef Search ADS PubMed 104 Kelley DR , Liu B , Delcher AL , et al. Gene prediction with Glimmer for metagenomic sequences augmented by classification and clustering . Nucleic Acids Res 2012 ; 40 : e9 . Google Scholar CrossRef Search ADS PubMed 105 Imelfort M , Parks D , Woodcroft BJ , et al. GroopM: an automated tool for the recovery of population genomes from related metagenomes . PeerJ 2014 ; 2 : e603 . Google Scholar CrossRef Search ADS PubMed 106 Peng Y , Leung HCM , Yiu SM , et al. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth . Bioinformatics 2012 ; 28 : 1420 – 8 . http://dx.doi.org/10.1093/bioinformatics/bts174 Google Scholar CrossRef Search ADS PubMed 107 Xie C , Mao X , Huang J , et al. KOBAS 2.0: a web server for annotation and identification of enriched pathways and diseases . Nucleic Acids Res 2011 ; 39 : W316 – 22 . Google Scholar CrossRef Search ADS PubMed 108 Wu Y-W , Tang Y-H , Tringe SG , et al. MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm . Microbiome 2014 ; 2 : 26 . http://dx.doi.org/10.1186/2049-2618-2-26 Google Scholar CrossRef Search ADS PubMed 109 Li D , Luo R , Liu C-M , et al. MEGAHIT v1.0: A fast and scalable metagenome assembler driven by advanced methodologies and community practices . Methods 2016 ; 102 : 3 – 11 . http://dx.doi.org/10.1016/j.ymeth.2016.02.020 Google Scholar CrossRef Search ADS PubMed 110 Kang DD , Froula J , Egan R , et al. 
MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities . PeerJ 2015 ; 3 : e1165 . Google Scholar CrossRef Search ADS PubMed 111 Noguchi H , Taniguchi T , Itoh T. MetaGeneAnnotator: detecting species-specific patterns of ribosomal binding site for precise gene prediction in anonymous prokaryotic and phage genomes . DNA Res 2008 ; 15 ( 6 ): 387 – 96 . http://dx.doi.org/10.1093/dnares/dsn027 Google Scholar CrossRef Search ADS PubMed 112 Shaw GT-W , Pao Y-Y , Wang D. MetaMIS: a metagenomic microbial interaction simulator based on microbial community profiles . BMC Bioinformatics 2016 ; 17 : 488 . http://dx.doi.org/10.1186/s12859-016-1359-0 Google Scholar CrossRef Search ADS PubMed 113 Treangen TJ , Koren S , Sommer DD , et al. MetAMOS: a modular and open source metagenomic assembly and analysis pipeline . Genome Biol 2013 ; 14 ( 1 ): R2 . Google Scholar CrossRef Search ADS PubMed 114 Segata N , Waldron L , Ballarini A , et al. Metagenomic microbial community profiling using unique clade-specific marker genes . Nat Methods 2012 ; 9 : 811 – 4 . http://dx.doi.org/10.1038/nmeth.2066 Google Scholar CrossRef Search ADS PubMed 115 Nurk S , Meleshko D , Korobeynikov A , et al. metaSPAdes: a new versatile metagenomic assembler . Genome Res 2017 ; 27 ( 5 ): 824 – 34 . http://dx.doi.org/10.1101/gr.213959.116 Google Scholar CrossRef Search ADS PubMed 116 Namiki T , Hachiya T , Tanaka H , et al. MetaVelvet: an extension of Velvet assembler to de novo metagenome assembly from short sequence reads . Nucleic Acids Res 2012 ; 40 : e155 . Google Scholar CrossRef Search ADS PubMed 117 Keegan KP , Glass EM , Meyer F. MG-RAST, a metagenomics service for analysis of microbial community structure and function . Methods Mol Biol 2016 ; 1399 : 207 – 33 . Google Scholar CrossRef Search ADS PubMed 118 Chevreux B , Pfisterer T , Drescher B , et al. 
Using the miraEST assembler for reliable and automated mRNA transcript assembly and SNP detection in sequenced ESTs . Genome Res 2004 ; 14 ( 6 ): 1147 – 59 . http://dx.doi.org/10.1101/gr.1917404 Google Scholar CrossRef Search ADS PubMed 119 Schloss PD , Westcott SL , Ryabin T , et al. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities . Appl Environ Microbiol 2009 ; 75 : 7537 – 41 . http://dx.doi.org/10.1128/AEM.01541-09 Google Scholar CrossRef Search ADS PubMed 120 Manor O , Borenstein E. MUSiCC: a marker genes based framework for metagenomic normalization and accurate profiling of gene abundances in the microbiome . Genome Biol 2015 ; 16 : 53 . http://dx.doi.org/10.1186/s13059-015-0610-8 Google Scholar CrossRef Search ADS PubMed 121 Lin H-H , Liao Y-C. Accurate binning of metagenomic contigs via automated clustering sequences using information of genomic signatures and marker genes . Sci Rep 2016 ; 6 : 24175 . http://dx.doi.org/10.1038/srep24175 Google Scholar CrossRef Search ADS PubMed 122 Rosen GL , Reichenberger ER , Rosenfeld AM. NBC: the Naive Bayes Classification tool webserver for taxonomic classification of metagenomic reads . Bioinformatics 2011 ; 27 ( 1 ): 127 – 9 . http://dx.doi.org/10.1093/bioinformatics/btq619 Google Scholar CrossRef Search ADS PubMed 123 Hoff KJ , Lingner T , Meinicke P , et al. Orphelia: predicting genes in metagenomic sequencing reads . Nucleic Acids Res 2009 ; 37 : W101 – 5 . Google Scholar CrossRef Search ADS PubMed 124 Huson DH , Xie CA. poor man’s BLASTX–high-throughput metagenomic protein database search using PAUDA . Bioinformatics 2014 ; 30 : 38 – 9 . Google Scholar CrossRef Search ADS PubMed 125 Darling AE , Jospin G , Lowe E , et al. PhyloSift: phylogenetic analysis of genomes and metagenomes . PeerJ 2014 ; 2 : e243 . Google Scholar CrossRef Search ADS PubMed 126 Langille MGI , Zaneveld J , Caporaso JG , et al. 
Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences . Nat Biotechnol 2013 ; 31 : 814 – 21 . http://dx.doi.org/10.1038/nbt.2676 Google Scholar CrossRef Search ADS PubMed 127 Claudel-Renard C , Chevalet C , Faraut T , et al. Enzyme-specific profiles for genome annotation: PRIAM . Nucleic Acids Res 2003 ; 31 : 6633 – 9 . http://dx.doi.org/10.1093/nar/gkg847 Google Scholar CrossRef Search ADS PubMed 128 Hyatt D , Chen G-L , LoCascio PF , et al. Prodigal: prokaryotic gene recognition and translation initiation site identification . BMC Bioinformatics 2010 ; 11 : 119 . http://dx.doi.org/10.1186/1471-2105-11-119 Google Scholar CrossRef Search ADS PubMed 129 Caporaso JG , Kuczynski J , Stombaugh J , et al. QIIME allows analysis of high-throughput community sequencing data . Nat Methods 2010 ; 7 ( 5 ): 335 – 6 . http://dx.doi.org/10.1038/nmeth.f.303 Google Scholar CrossRef Search ADS PubMed 130 Ye Y , Choi J-H , Tang H. RAPSearch: a fast protein similarity search tool for short reads . BMC Bioinformatics 2011 ; 12 : 159. http://dx.doi.org/10.1186/1471-2105-12-159 Google Scholar CrossRef Search ADS PubMed 131 Gerlach W , Jünemann S , Tille F , et al. WebCARMA: a web application for the functional and taxonomic classification of unassembled metagenomic reads . BMC Bioinformatics 2009 ; 10 ( 1 ): 430 . Google Scholar CrossRef Search ADS PubMed 132 Marchesi JR , Ravel J. The vocabulary of microbiome research: a proposal . Microbiome 2015 ; 3 : 31. http://dx.doi.org/10.1186/s40168-015-0094-5 Google Scholar CrossRef Search ADS PubMed 133 Mande SS , Mohammed MH , Ghosh TS. Classification of metagenomic sequences: methods and challenges . Brief Bioinform 2012 ; 13 : 669 – 81 . http://dx.doi.org/10.1093/bib/bbs054 Google Scholar CrossRef Search ADS PubMed 134 Dröge J , McHardy AC. Taxonomic binning of metagenome samples generated by next-generation sequencing technologies . Brief Bioinform 2012 ; 13 : 646 – 55 . 
Google Scholar CrossRef Search ADS PubMed 135 Wang Q , Garrity GM , Tiedje JM , et al. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy . Appl Environ Microbiol 2007 ; 73 : 5261 – 7 . http://dx.doi.org/10.1128/AEM.00062-07 Google Scholar CrossRef Search ADS PubMed 136 Rho M , Tang H , Ye Y. FragGeneScan: predicting genes in short and error-prone reads . Nucleic Acids Res 2010 ; 38 : e191 . Google Scholar CrossRef Search ADS PubMed 137 Yi G , Sze S-H , Thon MR. Identifying clusters of functionally related genes in genomes . Bioinformatics 2007 ; 23 ( 9 ): 1053 – 60 . http://dx.doi.org/10.1093/bioinformatics/btl673 Google Scholar CrossRef Search ADS PubMed 138 Manor O , Levy R , Borenstein E. Mapping the inner workings of the microbiome: genomic- and metagenomic-based study of metabolism and metabolic interactions in the human microbiome . Cell Metab 2014 ; 20 : 742 – 52 . http://dx.doi.org/10.1016/j.cmet.2014.07.021 Google Scholar CrossRef Search ADS PubMed 139 Joice R , Yasuda K , Shafquat A , et al. Determining microbial products and identifying molecular targets in the human microbiome . Cell Metab 2014 ; 20 : 731 – 41 . http://dx.doi.org/10.1016/j.cmet.2014.10.003 Google Scholar CrossRef Search ADS PubMed 140 Dudhagara P , Bhavsar S , Bhagat C , et al. Web resources for metagenomics studies . Genomics Proteomics Bioinformatics 2015 ; 13 ( 5 ): 296 – 303 . http://dx.doi.org/10.1016/j.gpb.2015.10.003 Google Scholar CrossRef Search ADS PubMed 141 Kim Y , Koh I , Rho M. Deciphering the human microbiome using next-generation sequencing data and bioinformatics approaches . Methods 2015 ; 79-80 : 52 – 9 . Google Scholar CrossRef Search ADS PubMed 142 Human Microbiome Project Consortium . Structure, function and diversity of the healthy human microbiome . Nature 2012 ; 486 : 207 – 14 . http://dx.doi.org/10.1038/nature11234 CrossRef Search ADS PubMed 143 Dehoux P , Marvaud JC , Abouelleil A , et al. 
Comparative genomics of Clostridium bolteae and Clostridium clostridioforme reveals species-specific genomic properties and numerous putative antibiotic resistance determinants . BMC Genomics 2016 ; 17 : 819 . http://dx.doi.org/10.1186/s12864-016-3152-x Google Scholar CrossRef Search ADS PubMed 144 Milani C , Turroni F , Duranti S , et al. Genomics of the genus bifidobacterium reveals species-specific adaptation to the glycan-rich gut environment . Appl Environ Microbiol 2015 ; 82 : 980 – 91 . Google Scholar CrossRef Search ADS PubMed 145 Ravcheev DA , Godzik A , Osterman AL , et al. Polysaccharides utilization in human gut bacterium Bacteroides thetaiotaomicron: comparative genomics reconstruction of metabolic and regulatory networks . BMC Genomics 2013 ; 14 ( 1 ): 873 . http://dx.doi.org/10.1186/1471-2164-14-873 Google Scholar CrossRef Search ADS PubMed 146 Neville BA , Sheridan PO , Harris HMB , et al. Pro-inflammatory flagellin proteins of prevalent motile commensal bacteria are variably abundant in the intestinal microbiome of elderly humans . PLoS One 2013 ; 8 ( 7 ): e68919 . Google Scholar CrossRef Search ADS PubMed 147 Manor O , Borenstein E. Revised computational metagenomic processing uncovers hidden and biologically meaningful functional variation in the human microbiome . Microbiome 2017 ; 5 : 19 . http://dx.doi.org/10.1186/s40168-017-0231-4 Google Scholar CrossRef Search ADS PubMed 148 Greenblum S , Turnbaugh PJ , Borenstein E. Metagenomic systems biology of the human gut microbiome reveals topological shifts associated with obesity and inflammatory bowel disease . Proc Natl Acad Sci USA 2012 ; 109 : 594 – 9 . http://dx.doi.org/10.1073/pnas.1116053109 Google Scholar CrossRef Search ADS PubMed 149 Chander AM , Nair RG , Kaur G , et al. Genome insight and comparative pathogenomic analysis of Nesterenkonia jeotgali Strain CD08_7 isolated from duodenal mucosa of celiac disease patient . Front Microbiol 2017 ; 8 : 129 . 
Google Scholar CrossRef Search ADS PubMed 150 Walsh CJ , Guinane CM , Hill C , et al. In silico identification of bacteriocin gene clusters in the gastrointestinal tract, based on the Human Microbiome Project’s reference genome database . BMC Microbiol 2015 ; 15 : 183 . Google Scholar CrossRef Search ADS PubMed 151 Zhao L. The gut microbiota and obesity: from correlation to causality . Nat Rev Microbiol 2013 ; 11 ( 9 ): 639 – 47 . http://dx.doi.org/10.1038/nrmicro3089 Google Scholar CrossRef Search ADS PubMed 152 Ni Y , Li J , Panagiotou GCOMAN. a web server for comprehensive metatranscriptomics analysis . BMC Genomics 2016 ; 17 : 622 . http://dx.doi.org/10.1186/s12864-016-2964-z Google Scholar CrossRef Search ADS PubMed 153 Narayanasamy S , Jarosz Y , Muller EEL , et al. IMP: a pipeline for reproducible reference-independent integrated metagenomic and metatranscriptomic analyses . Genome Biol 2016 ; 17 : 260 . http://dx.doi.org/10.1186/s13059-016-1116-8 Google Scholar CrossRef Search ADS PubMed 154 Rotmistrovsky K , Agarwala R. BMTagger: Best Match Tagger for Removing Human Reads from Metagenomics Datasets. 2011 . 155 Westreich ST , Korf I , Mills DA , et al. SAMSA: a comprehensive metatranscriptome analysis pipeline . BMC Bioinformatics 2016 ; 17 : 399 . http://dx.doi.org/10.1186/s12859-016-1270-8 Google Scholar CrossRef Search ADS PubMed 156 Edgar RC. Search and clustering orders of magnitude faster than BLAST . Bioinformatics 2010 ; 26 : 2460 – 1 . http://dx.doi.org/10.1093/bioinformatics/btq461 Google Scholar CrossRef Search ADS PubMed 157 Marchler-Bauer A , Lu S , Anderson JB , et al. CDD: a conserved domain database for the functional annotation of proteins . Nucleic Acids Res 2011 ; 39 : D2259 . Google Scholar CrossRef Search ADS 158 Sunagawa S , Mende DR , Zeller G , et al. Metagenomic species profiling using universal phylogenetic marker genes . Nat Methods 2013 ; 10 : 1196 – 9 . 
http://dx.doi.org/10.1038/nmeth.2693 Google Scholar CrossRef Search ADS PubMed 159 Celaj A , Markle J , Danska J , et al. Comparison of assembly algorithms for improving rate of metatranscriptomic functional annotation . Microbiome 2014 ; 2 : 39 . http://dx.doi.org/10.1186/2049-2618-2-39 Google Scholar CrossRef Search ADS PubMed 160 Petriz BA , Franco OL. Metaproteomics as a complementary approach to gut microbiota in health and disease . Front Chem 2017 ; 5 : 4. Google Scholar CrossRef Search ADS PubMed 161 Wilkins MR , Gasteiger E , Bairoch A , et al. Protein identification and analysis tools in the ExPASy server . Methods Mol Biol 1999 ; 112 : 531 – 52 . Google Scholar PubMed 162 Chatterjee S , Stupp GS , Park SKR , et al. A comprehensive and scalable database search system for metaproteomics . BMC Genomics 2016 ; 17 ( 1 ): 642 . http://dx.doi.org/10.1186/s12864-016-2855-3 Google Scholar CrossRef Search ADS PubMed 163 Sievers F , Wilm A , Dineen D , et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega . Mol Syst Biol 2014 ; 7 : 539 – 9 . http://dx.doi.org/10.1038/msb.2011.75 Google Scholar CrossRef Search ADS 164 Lupas A , Van Dyke M , Stock J. Predicting coiled coils from protein sequences . Science 1991 ; 252 ( 5009 ): 1162 – 4 . http://dx.doi.org/10.1126/science.252.5009.1162 Google Scholar CrossRef Search ADS PubMed 165 Gattiker A , Bienvenut WV , Bairoch A , et al. FindPept, a tool to identify unmatched masses in peptide mass fingerprinting protein identification . Proteomics 2002 ; 2 ( 10 ): 1435 – 44 . http://dx.doi.org/10.1002/1615-9861(200210)2:10<1435::AID-PROT1435>3.0.CO;2-9 Google Scholar CrossRef Search ADS PubMed 166 Jagtap PD , Blakely A , Murray K , et al. Metaproteomic analysis using the Galaxy framework . Proteomics 2015 ; 15 ( 20 ): 3553 – 65 . http://dx.doi.org/10.1002/pmic.201500074 Google Scholar CrossRef Search ADS PubMed 167 Pedruzzi I , Rivoire C , Auchincloss AH , et al. 
HAMAP in 2015: updates to the protein family classification and annotation system . Nucleic Acids Res 2015 ; 43 : D1064 – 70 . [WorldCat] Google Scholar CrossRef Search ADS PubMed 168 Balwierz PJ , Pachkov M , Arnold P , et al. ISMARA: automated modeling of genomic signals as a democracy of regulatory motifs . Genome Res 2014 ; 24 ( 5 ): 869 – 84 . http://dx.doi.org/10.1101/gr.169508.113 Google Scholar CrossRef Search ADS PubMed 169 Perkins DN , Pappin DJ , Creasy DM , et al. Probability-based protein identification by searching sequence databases using mass spectrometry data . Electrophoresis 1999 ; 20 ( 18 ): 3551 – 67 . http://dx.doi.org/10.1002/(SICI)1522-2683(19991201)20:18<3551::AID-ELPS3551>3.0.CO;2-2 Google Scholar CrossRef Search ADS PubMed 170 Tabb DL , Fernando CG , Chambers MC. MyriMatch: highly accurate tandem mass spectral peptide identification by multivariate hypergeometric analysis . J Proteome Res 2007 ; 6 : 654 – 61 . http://dx.doi.org/10.1021/pr0604054 Google Scholar CrossRef Search ADS PubMed 171 Horlacher O , Nikitin F , Alocci D , et al. MzJava: an open source library for mass spectrometry data processing . J Proteomics 2015 ; 129 : 63 – 70 . http://dx.doi.org/10.1016/j.jprot.2015.06.013 Google Scholar CrossRef Search ADS PubMed 172 Geer LY , Markey SP , Kowalak JA , et al. Open mass spectrometry search algorithm . J Proteome Res 2004 ; 3 ( 5 ): 958 – 64 . http://dx.doi.org/10.1021/pr0499491 Google Scholar CrossRef Search ADS PubMed 173 Krissinel E , Henrick K. Inference of macromolecular assemblies from crystalline state . J Mol Biol 2007 ; 372 ( 3 ): 774 – 97 . http://dx.doi.org/10.1016/j.jmb.2007.05.022 Google Scholar CrossRef Search ADS PubMed 174 Vaezzadeh AR , Hernandez C , Vadas O , et al. pICarver: a software tool and strategy for peptides isoelectric focusing . J Proteome Res 2008 ; 7 ( 10 ): 4336 – 45 . http://dx.doi.org/10.1021/pr8002672 Google Scholar CrossRef Search ADS PubMed 175 Yachdav G , Kloppmann E , Kajan L , et al. 
PredictProtein–an open resource for online prediction of protein structural and functional features . Nucleic Acids Res 2014 ; 42 : W337 – 43 . Google Scholar CrossRef Search ADS PubMed 176 Benkert P , Tosatto SCE , Schomburg D. QMEAN: A comprehensive scoring function for model quality assessment . Proteins Struct Funct Bioinforma 2008 ; 71 : 261 – 77 . http://dx.doi.org/10.1002/prot.21715 Google Scholar CrossRef Search ADS 177 Ahrné E , Nikitin F , Lisacek F , et al. QuickMod: a tool for open modification spectrum library searches . J Proteome Res 2011 ; 10 ( 7 ): 2913 – 21 . Google Scholar CrossRef Search ADS PubMed 178 Searle BC. Scaffold: a bioinformatic tool for validating MS/MS-based proteomic studies . Proteomics 2010 ; 10 ( 6 ): 1265 – 9 . http://dx.doi.org/10.1002/pmic.200900437 Google Scholar CrossRef Search ADS PubMed 179 de Castro E , Sigrist CJA , Gattiker A , et al. ScanProsite: detection of PROSITE signature matches and ProRule-associated functional and structural residues in proteins . Nucleic Acids Res 2006 ; 34 : W362 – 5 . Google Scholar CrossRef Search ADS PubMed 180 Eng JK , McCormack AL , Yates JR. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database . J Am Soc Mass Spectrom 1994 ; 5 : 976 – 89 . http://dx.doi.org/10.1016/1044-0305(94)80016-2 Google Scholar CrossRef Search ADS PubMed 181 Notredame C , Higgins DG , Heringa J. T-coffee: a novel method for fast and accurate multiple sequence alignment 1 1 . J Thornton J Mol Biol 2000 ; 302 : 205 – 17 . http://dx.doi.org/10.1006/jmbi.2000.4042 Google Scholar CrossRef Search ADS 182 Mesuere B , Van der Jeugt F , Devreese B , et al. The unique peptidome: Taxon-specific tryptic peptides as biomarkers for targeted metaproteomics . Proteomics 2016 ; 16 : 2313 – 8 . http://dx.doi.org/10.1002/pmic.201600023 Google Scholar CrossRef Search ADS PubMed 183 Craig R , Beavis RC. TANDEM: matching proteins with tandem mass spectra . 
Bioinformatics 2004 ; 20 ( 9 ): 1466 – 7 . http://dx.doi.org/10.1093/bioinformatics/bth092 Google Scholar CrossRef Search ADS PubMed 184 Artimo P , Jonnalagedda M , Arnold K , et al. ExPASy: SIB bioinformatics resource portal . Nucleic Acids Res 2012 ; 40 : W597 – 603 . Google Scholar CrossRef Search ADS PubMed 185 Boutet E , Lieberherr D , Tognolli M , et al. UniProtKB/Swiss-prot, the manually annotated section of the UniProt knowledgebase: how to use the entry view . Methods Mol Biol 2016 ; 1374 : 23 – 54 . Google Scholar CrossRef Search ADS PubMed 186 Szklarczyk D , Morris JH , Cook H , et al. The STRING database in 2017: quality-controlled protein–protein association networks, made broadly accessible . Nucleic Acids Res 2017 ; 45 ( D1 ): D362 – 8 . Google Scholar CrossRef Search ADS PubMed 187 Biasini M , Bienert S , Waterhouse A , et al. SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information . Nucleic Acids Res 2014 ; 42 ( W1 ): W252 – 8 . Google Scholar CrossRef Search ADS PubMed 188 Sigrist CJA , de Castro E , Cerutti L , et al. New and continuing developments at PROSITE . Nucleic Acids Res 2013 ; 41 : D344 – 7 . Google Scholar CrossRef Search ADS PubMed 189 Hulo C , de Castro E , Masson P , et al. ViralZone: a knowledge resource to understand virus diversity . Nucleic Acids Res 2011 ; 39(Suppl 1) : D576 – 82 . Google Scholar CrossRef Search ADS 190 Gaudet P , Michel P-A , Zahn-Zabal M , et al. The neXtProt knowledgebase on human proteins: 2017 update . Nucleic Acids Res 2017 ; 45 ( D1 ): D177 – 82 . Google Scholar CrossRef Search ADS PubMed 191 ExPASy proteomics toolset: Protein sequences and identification. 192 ExPASy proteomics toolset: Proteomics experiment. 193 ExPASy proteomics toolset: Function analysis. 194 ExPASy proteomics toolset: Sequences sites, features and motifs. 195 ExPASy proteomics toolset: Protein modification. 196 ExPASy proteomics toolset: Protein structure. 
Briefings in Bioinformatics, Oxford University Press
