Resources and tools for the high-throughput, multi-omic study of intestinal microbiota
Blanco-Míguez, Aitor; Fdez-Riverola, Florentino; Sánchez, Borja; Lourenço, Anália
2019-05-21 00:00:00
Abstract
The human gut microbiome impacts several aspects of human health and disease, including digestion, drug metabolism and the propensity to develop various inflammatory, autoimmune and metabolic diseases. Many of the molecular processes that play a role in the activity and dynamics of the microbiota go beyond species and genic composition; understanding them therefore requires advanced bioinformatics support. This article aims to provide an up-to-date view of the resources and software tools that are being developed and used in human gut microbiome research, in particular in data integration and systems-level analysis efforts. These efforts demonstrate the power of standardized and reproducible computational workflows for integrating and analysing varied omics data and gaining deeper insights into microbial community structure and function as well as host–microbe interactions.
Keywords: human gut microbiome, data repositories, large-scale and integrative computational tools, modelling, immunomodulation, drug screening
Background
The human gastrointestinal tract is a complex ecosystem in which eukaryotic cells continuously interact with nutrients and with the complex microbial population of the gut microbiota [1]. Gut microorganisms are the source of many bioactive products that play key functions in human host pathways and microbe–microbe interactions [2]. Processes such as host–microbe crosstalk, immune activation and inflammation, microbe–microbe signalling, microbial metabolism and antimicrobial activity all take place in the human gut [3]. Therefore, the ability to modulate the gut microbiome and the associated host–microbe interactions holds great promise for developing new therapeutic strategies for many chronic diseases and antibiotic-resistant infections [4, 5].
Colonization of the gut starts just after birth, when pioneering species interact with gut cells through surface receptors to promote the expression of a specific set of host genes and favour the colonization of commensal microorganisms [6]. Epithelial function and the mucosa-associated immune system are influenced both by direct host–microbiota interactions and through modulation of microbial metabolism [7]. The immune system is trained to ensure a fine balance between the response given to commensal gut microbiota (i.e. homeostatic and healthy situations) and to pathogens (i.e. gastrointestinal disorders) [8]. Several non-infectious human diseases, such as autoimmune disorders, inflammatory bowel disease (IBD) and some gut-associated cancers, are related to immunological imbalance and compositional perturbations of the gut microbiota, also known as dysbiosis. Gut dysbiosis is a major contributor to diet-related obesity and type 2 diabetes mellitus [9, 10]. For example, alterations in the relative abundances of Gammaproteobacteria and Verrucomicrobia, as well as in the ratio of Firmicutes to Bacteroidetes, are associated with overweight, and alterations in butyrate-producing bacteria, such as Faecalibacterium prausnitzii, are often related to diabetes mellitus [11, 12]. Moreover, genetic and simple obesity share similar structural and functional features of dysbiosis, such as higher production of toxins with known potential to induce metabolic deterioration (e.g. trimethylamine-N-oxide and indoxyl sulphate), higher abundance of genomes containing genes that encode enzymes involved in the production of these toxic co-metabolites and higher abundance of pathways for the biosynthesis of bacterial antigens (such as endotoxin) [13–15].
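As a small illustration of the compositional quantities mentioned above, the sketch below computes relative abundances and a Firmicutes-to-Bacteroidetes ratio from phylum-level read counts. It is only a minimal example: the function names and the counts are hypothetical, and real analyses would also need to account for sequencing depth, compositionality and the taxonomic assignment method.

```python
from typing import Dict


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw read counts per taxon into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads were assigned to any taxon")
    return {taxon: n / total for taxon, n in counts.items()}


def firmicutes_bacteroidetes_ratio(counts: Dict[str, int]) -> float:
    """Ratio of Firmicutes to Bacteroidetes read counts in one sample."""
    bacteroidetes = counts.get("Bacteroidetes", 0)
    if bacteroidetes == 0:
        raise ValueError("ratio is undefined without Bacteroidetes reads")
    return counts.get("Firmicutes", 0) / bacteroidetes


# Hypothetical phylum-level counts for a single sample (illustration only).
example_counts = {
    "Firmicutes": 600,
    "Bacteroidetes": 300,
    "Proteobacteria": 80,
    "Verrucomicrobia": 20,
}

if __name__ == "__main__":
    print(relative_abundance(example_counts))
    print(firmicutes_bacteroidetes_ratio(example_counts))
```

Because the ratio uses only two taxa, it is insensitive to changes elsewhere in the community; this is one reason it is usually reported together with full relative-abundance profiles.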
Although the precise cause remains unknown, profiling studies of the gut microbiome associate the pathogenesis of IBD, a chronic and relapsing inflammatory disorder of the gut, with the under-representation of certain species in the faecal microbiota [16–19]. For example, F. prausnitzii has been postulated as a biomarker or a potential therapeutic agent of IBD [20, 21]. Gastric cancer and colorectal cancer are also connected to alterations in gut microbiota. For example, some dietary factors may alter gut microbiota interactions and affect cancer development and response to cancer treatment [22]. In fact, the term ‘oncobiome’ has recently been adopted to refer to the emerging field of research devoted to the study of the interplay between the human microbiome and cancer development [23]. Moreover, it is well recognized that the excessive use of broad-spectrum antibiotics can affect the relative proportions of gut microbial populations and foster bacterial resistance [24].
Considering recent technological advancements and community initiatives towards large-scale compilation of data on the human microbiome, integrative data analysis may be the key to better understanding the mechanisms of action of the gut microbiome and their implication in the development and chronicity of the above-mentioned diseases [25]. For example, the study of colon cancer has relied on the combination of microbiome and metabolome data [26]; proteome and metagenome data supported the investigation of Crohn's disease (CD) [27]; and metabolome, metagenome and metatranscriptome data provided a basis for investigating the relationship between the gut microbiome and the xenobiotic metabolism of digoxin [28]. Furthermore, systems-level approaches, namely metabolic modelling approaches [5, 29] and microbiome-based predictive tools [30, 31], are showing great potential in delivering non-obvious and biologically meaningful knowledge.
Previous reviews presented attempts at computational systems biology and in silico modelling of the human microbiome [32] and introduced computational methods for understanding the human gut microbiota and developing therapeutic strategies [33]. The focus of the present review is human gut microbiome research, and our aim is to provide up-to-date information on bioinformatics resources and tools specialized in or useful for the multifaceted investigation of this microbiome (Figure 1). This review is accompanied by a small website (available at http://sing-group.org/humangut) that keeps track of the public availability of the projects, resources and tools mentioned here, while welcoming further input from the community.
Figure 1. Unravelling the mechanism of action of the human microbiome: resources and tools for the study of the intestinal microbiota.
Attention is set on two main application areas: (1) the characterization of gut microbiota composition and the functional interplay related to dysbiosis, for example in disease and under antibiotic therapy; and (2) the screening of the proteome of human gut species for products holding immunomodulatory, anti-inflammatory or other bioactivity of therapeutic interest. This work should thus be of interest to those investigating the human gut microbiome and, in particular, to those who are developing in silico software to pursue and consolidate emerging paths in such research.
Data repositories
According to the journal Science, the discovery of the microbiome was one of the 10 milestones of the first decade of the 21st century (http://www.sciencemag.org/site/special/insights2010/).
In the late 2000s, two large-scale initiatives were launched with the aim of documenting the role of human-associated microbial communities in human health and disease: the NIH’s Human Microbiome Project (HMP) [34] and the European Metagenomics of the Human Intestinal Tract (MetaHIT) project [35]. Despite the enormous volume of data generated by these initiatives, general data usage is challenged by a number of design, technical and access decisions [36]. For example, many analyses still depend on a catalogue of reference genes. Existing catalogues for the human gut microbiome are based on samples from single cohorts or on reference genomes (or protein sequences), which limits coverage of global microbiome diversity. Therefore, efforts have been invested in implementing integrated catalogues of reference genes [37] and developing approaches to conduct population-level analysis [38]. New catalogues such as the 1000 Genomes Project [39], the AmericanGut (http://americangut.org/) and the BritishGut (http://britishgut.org/) are supporting these analyses. Although it is hardly possible to enumerate all existing data resources that may be helpful for this area of research, it is important to have a comprehensive view of data availability, so that the usefulness of lesser-known repositories is uncovered and potential information gaps can be tackled. In this sense, Figure 2 presents relevant data repositories for human gut microbiome research, and Table 1 provides a list of the data sets available for the human gut (e.g. from single cohort studies). Additional details on the explored databases can be found in Supplementary Material S1 and in the Web pages supporting this review.
Table 1. Data sets available for the study of the human gut microbiome and its interplay with the host in health and disease scenarios
Data set | URL | Target
Manipulation of the gut microbiota reveals role in colon tumorigenesis [56] | http://www.ncbi.nlm.nih.gov/sra/?term=SRP056144 | Colon tumorigenesis
Disease-specific alterations in the enteric virome in inflammatory bowel disease [57] | http://metagenomics.anl.gov/mgmain.html?mgpage=project&project=mgp11446 | CD and ulcerative colitis (UC)
Integrated metagenomics/metaproteomics reveals human host-microbiota signatures of Crohn's disease [27] | http://compbio.ornl.gov/crohns_disease_metagenomics_metaproteomics/ | CD
Gut microbiome in Down syndrome [58] | http://metagenomics.anl.gov/mgmain.html?mgpage=project&project=mgp10557 | Down syndrome
Metabolome of human gut microbiome is predictive of host dysbiosis [59] | http://gigadb.org/dataset/100163 | Dysbiosis
Helicobacter pylori eradication causes perturbation of the human gut microbiome in young adults [60] | http://metagenomics.anl.gov/mgmain.html?mgpage=project&project=mgp8960 | Dysbiosis
Interactions between the intestinal microbiota and bile acids in gallstones patients [61] | http://metagenomics.anl.gov/mgmain.html?mgpage=project&project=mgp11209 | Gallstone patients
An integrated catalog of reference genes in the human gut microbiome [37] | http://gigadb.org/dataset/100064 | General
An iterative workflow for mining the human intestinal metaproteome [62] | ftp://ftp.ncbi.nih.gov/pub/TraceDB/human_gut_metagenome/ | General
Fecal microbial composition of ulcerative colitis and Crohn’s disease patients in remission and subsequent exacerbation [63] | http://metagenomics.anl.gov/mgmain.html?mgpage=project&project=mgp4728 | IBD, CD and UC
Inference of network dynamics and metabolic interactions in the gut microbiome [64] | https://bitbucket.org/gutmicrobiomepaper/microbiomenetworkmodelpaper/src | Model construction
Development of the preterm gut microbiome in twins at risk of necrotising enterocolitis and sepsis [65] | http://metagenomics.anl.gov/mgmain.html?mgpage=project&project=mgp3781 | Necrotising enterocolitis and sepsis
Patterned progression of bacterial populations in the premature infant gut [66] | https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000247.v4.p3 | Necrotising enterocolitis
Dietary modulation of gut microbiota contributes to alleviation of both genetic and simple obesity in children [12] | https://www.ncbi.nlm.nih.gov/sra/?term=SRP045211 | Obesity
A core gut microbiome in obese and lean twins [67] | http://metagenomics.anl.gov/linkin.cgi?project=mgp10 | Obesity
Moving pictures of the human microbiome [68] | http://metagenomics.anl.gov/linkin.cgi?project=mgp93 | Obesity, CD, IBD and malnutrition
Temporal dynamics of the gut microbiota in people sharing a confined environment, a 520-day ground-based space simulation, MARS500 [69] | http://metagenomics.anl.gov/mgmain.html?mgpage=project&project=mgp79314 | Population study
Gut microbiome of the Hadza hunter-gatherers [70] | http://metagenomics.anl.gov/mgmain.html?mgpage=project&project=mgp7058 | Population study
A phylo-functional core of gut microbiota in healthy young Chinese cohorts across lifestyles, geography and ethnicities [71] | http://metagenomics.anl.gov/mgmain.html?mgpage=project&project=mgp1538 | Population study
Gut Microbiota and Extreme Longevity [72] | http://metagenomics.anl.gov/mgmain.html?mgpage=project&project=mgp17761 | Population study
Variation in rural African gut microbiota is strongly correlated with colonization by Entamoeba and subsistence [73] | http://metagenomics.anl.gov/mgmain.html?mgpage=project&project=mgp15238 | Population study
Gut microbiome of coexisting BaAka Pygmies and Bantu Reflects Gradients of Traditional Subsistence Patterns [74] | http://metagenomics.anl.gov/mgmain.html?mgpage=project&project=mgp16608 | Population study
Gut microbiota of type 1 diabetes patients with good glycaemic control and high physical fitness is similar to people without diabetes: an observational study [75] | http://metagenomics.anl.gov/mgmain.html?mgpage=project&project=mgp11616 | T1D
A metagenome-wide association study of gut microbiota in type 2 diabetes [35] | https://www.ncbi.nlm.nih.gov/sra/?term=SRA045646; https://www.ncbi.nlm.nih.gov/sra/?term=SRA050230 | T2D
Gut metagenome in European women with normal, impaired and diabetic glucose control [76] | https://www.ncbi.nlm.nih.gov/sra?term=ERP002469 | T2D
Modulation of gut microbiota dysbioses in type 2 diabetic patients by macrobiotic Ma-Pi 2 diet [77] | http://metagenomics.anl.gov/mgmain.html?mgpage=project&project=mgp17675 | T2D
Figure 2. Mindmap of bioinformatics databases and data repositories commonly used in gut-related research.
Functional profiling using reference information can be based either on reference genome read mapping (at the nucleotide level) or on translated protein database searches [30]. That is, the assignment may be based on full protein-coding genes (CDSs) by means of orthology relations with sequences in well-characterized functional databases, such as NCBI nr [40], KEGG Orthology [41] and COGs [42], or on the identification of specific Pfam [43] or SMART [44] peptide domains within CDSs. Broader biological functions are then built on these low-level functional annotations [45] using hierarchical ontologies that group functionally related proteins, such as those in KEGG [41], MetaCyc [46] and SEED [47].
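The step from low-level annotations to broader functions described above can be sketched as a simple roll-up of gene family abundances into pathway abundances. This is a minimal sketch under stated assumptions, not the method of any of the cited pipelines: the identifiers (`KO_A`, `PATH_1`, etc.) are hypothetical placeholders rather than real database entries, and splitting a gene family's abundance equally among its pathways is an illustrative choice.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def rollup_to_pathways(
    family_abundance: Mapping[str, float],
    family_to_pathways: Mapping[str, Iterable[str]],
) -> Dict[str, float]:
    """Aggregate gene family abundances into pathway-level abundances.

    Assumption for this sketch: a family that belongs to several pathways
    contributes an equal share to each; families without any pathway
    annotation are collected under the key "unmapped".
    """
    pathway_abundance: Dict[str, float] = defaultdict(float)
    for family, abundance in family_abundance.items():
        pathways = list(family_to_pathways.get(family, ()))
        if not pathways:
            pathway_abundance["unmapped"] += abundance
            continue
        share = abundance / len(pathways)
        for pathway in pathways:
            pathway_abundance[pathway] += share
    return dict(pathway_abundance)


# Hypothetical abundances and hierarchy (placeholders, not real identifiers).
example_family_abundance = {"KO_A": 10.0, "KO_B": 4.0, "KO_C": 6.0, "KO_D": 1.0}
example_family_to_pathways = {
    "KO_A": ["PATH_1"],
    "KO_B": ["PATH_1", "PATH_2"],
    "KO_C": ["PATH_2"],
}

if __name__ == "__main__":
    print(rollup_to_pathways(example_family_abundance, example_family_to_pathways))
```

Keeping an explicit "unmapped" bucket makes visible how much of the measured abundance cannot be placed in the chosen ontology, which is relevant given the limited tuning of current annotations to whole-community metabolism.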
Data processing and integration pipelines are also available, for instance, MG-RAST [48], IMG/M [49], MEGAN [50], HUMAnN [51], MALINA [52], MOCAT2 [53] and COGNIZER [54]. These pipelines typically include some combination of quality control and inference steps subsequent to homology search, such as selection of pathways by maximum parsimony, taxonomic limitation or statistical smoothing. However, as whole-community functional profiling is not yet well established, neither gene annotations within reference genomes nor those in protein databases are well tuned to whole-community metabolism. Indeed, both MetaCyc [46] and SEED [47] have ongoing efforts to develop microbiome-specific functional annotations, and gene family catalogues, such as eggNOG [55], are looking for a better way to represent uncultured communities.
Bioinformatics tools
A number of different bioinformatics tools are useful for the study of the human gut. In particular, considering the huge volume of data being generated by high-throughput technologies, tools are needed both for the processing and analysis of individual omics data and for gaining a multi-level, integrated understanding of the role of the gut microbiome in different aspects of health and disease. The level of complexity and specialization of these tools varies significantly. Figure 3 summarizes commonly used bioinformatics tools, which are described in the following sections. A more detailed description of these tools can be found in Supplementary Material S1 and in the Web pages supporting this review, and the objectives, pros and cons of the different omics approaches are available in Supplementary Material S2.
Figure 3. Mindmap of bioinformatics tools commonly used in gut-related research. Tool names ending in an asterisk are public but require login, and those ending in a number sign are private. The remaining tools are publicly available.
Metagenomics: composition, abundance and variation

The metagenome is the collective genome of a given microbial community. Metagenome sequencing presents the first, and perhaps the greatest, opportunity to identify novel and biologically interesting microbial products in the human gut microbiome [78, 79]. Thousands of human gut-associated metagenomes have already been sequenced, representing an extensive database for mining biologically active microbial products, and for studying intestinal microbiome diversity and dysbiosis, as well as relations to health and disease [80]. General and detailed descriptions of metagenomics technologies and computational support can be found in recent reviews on the field [81–84]. Table 2 shows common and publicly available metagenomics tools.

Table 2. Publicly available metagenomics tools

| Tool | Purpose |
| --- | --- |
| AlFree [85] | Phylogeny reconstruction using alignment-free sequence comparison methods |
| AMOS [86] | Assembling DNA reads |
| AmphoraNet [87] | Phylogenetic analysis of metagenomic shotgun sequencing data and genomic data |
| BAGEL3 [88] | Mining for bacteriocins in single or multiple DNA sequences, e.g. (un)finished genomes, scaffold files and metagenomics data |
| BLAST [89] | Identification of regions of similarity between biological sequences |
| CAFE [90] | Integrating 28 measures and downstream visualised analysis |
| CAMERA [91] | Creating a rich, distinctive data repository and a bioinformatics tools resource |
| Cd-hit [92] | Clustering and comparison of protein or nucleotide sequences |
| Chimera Slayer [93] | Detection of sequences falsely interpreted as organisms (contributing to false perceptions of sample diversity and the false identification of novel taxa) |
| CloVR [94] | Automated sequence analysis able to use cloud computing resources |
| COGNIZER [54] | Functional annotation of metagenomic data sets |
| CONCOCT [95] | Unsupervised binning of metagenomic contigs |
| CVTree server [96] | Construction of whole-genome-based phylogenetic trees |
| DESeq2 [97] | Estimation of variance–mean dependence in count data from high-throughput sequencing assays and differential expression analysis |
| DESMAN [98] | Contig-based strain inference across multiple samples |
| DIAMOND [99] | High-throughput alignment of DNA reads and protein sequences |
| EMPANADA | Evidence-based assignment of genes to pathways in metagenomic data |
| FishTaco [100] | Decomposition of functional shifts into individual taxon-level contributions |
| FLASh [101] | Merging paired-end reads from next-generation sequencing experiments |
| FMAP [102] | Functional analysis of metagenomic/metatranscriptomic sequencing data, i.e. sequence alignment, gene family abundance calculations and differential feature statistical analysis |
| FragGeneScan [136] | Predicting protein-coding regions in short reads |
| Genboree Microbiome Toolset [103] | Multi-omic data analysis |
| Glimmer-MG [104] | Detecting genes in environmental shotgun DNA sequences |
| GroopM [105] | Using differential coverage to obtain high-fidelity population genomes from related metagenomes |
| HUMAnN [51] | Determination of relative abundances of the gut microbial functional pathways in a community from metagenomic data |
| IDBA-UD [106] | Iterative de Bruijn graph de novo assembler for short-read sequencing data |
| IMG/M [49] | Analysis and annotation of genome and metagenome data sets in a comprehensive comparative context |
| KOBAS [107] | Gene/protein functional annotation and functional enrichment of gene sets |
| MALINA [52] | Analysis of whole-genome gut-related metagenomic data |
| MaxBin [108] | Binning assembled metagenomic sequences based on an expectation–maximization algorithm |
| MEGAHIT [109] | Assembling large and complex metagenomics sequencing reads |
| MEGAN [50] | Interactive exploration and analysis of large-scale microbiome sequencing data |
| MetaBAT [110] | Integrating empirical probabilistic distances of genome abundance and tetranucleotide frequency for metagenome binning |
| MetaGeneAnnotator [111] | Predicting prokaryotic genes from genomic sequences |
| MetaMIS [112] | Analysing time series data of microbial community profiles |
| MetAMOS [113] | An integrated assembly and analysis pipeline for metagenomic data |
| MetaPhlAn [114] | Estimation of species abundance |
| metaSPAdes [115] | Assembling reads from single cells and highly polymorphic diploid genomes |
| MetaVelvet [116] | De novo sequence assembler for short sequence reads |
| MG-RAST [117] | Metagenome analysis |
| MIRA [118] | Whole-genome shotgun (WGS) and EST sequence assembler |
| MOCAT2 [53] | Metagenomic sequence assembly and gene prediction |
| Mothur [119] | Analysing sequencing data |
| MUSiCC [120] | Normalizing and correcting gene abundance measurements derived from metagenomic shotgun sequencing |
| MyCC [121] | Combining genomic signatures, marker genes and optional contig coverages for automated metagenome binning |
| NBC Classifier [122] | Naïve Bayes Classification tool Web server for taxonomic classification of metagenomic reads |
| Orphelia [123] | Predicting protein-coding genes in short DNA sequences from metagenomics sequencing projects |
| PAUDA [124] | High-performance algorithms to compute BLASTX-like alignments |
| PhyloSift [125] | Phylogenetic analysis of metagenomic samples and comparison of community structure among multiple related samples |
| PICRUSt [126] | Predicting metagenomes from 16S data and a reference genome database |
| PRIAM [127] | Automated enzyme detection in a fully sequenced genome |
| Prodigal [128] | Gene prediction for microbial genomes |
| QIIME [129] | Performing microbiome analysis from raw DNA sequencing data |
| RAPSearch [130] | Fast protein similarity search for short reads |
| RAST [47] | Fully automated service for annotating bacterial and archaeal genomes |
| RPS-BLAST | Searching in profile databases |
| WebCARMA [131] | Taxonomic and functional classification of unassembled (ultra-)short reads from metagenomic communities |

A key preliminary step in metagenomic analysis is to characterize the taxonomic diversity of the metagenome, i.e. to categorize the various microbes and quantify their diversity in terms of species abundance. Here, it is important to differentiate between methodologies in which specific regions of the 16S ribosomal DNA are polymerase chain reaction (PCR)-amplified and sequenced (metataxonomics), and those in which the whole genetic material is isolated, fragmented and sequenced, i.e. shotgun metagenomics (metagenomics) [132]. Metataxonomics data can test whether there is a population split in complex communities. However, they rarely reveal the mechanisms underlying the population split because of inter-individual variability and/or coverage. On the other hand, metagenomics offers an effective but imperfect method to profile the structure and the potential functions encoded in microbial communities. Many gut metagenomics studies still perform 16S ribosomal RNA (rRNA) sequencing, typically using the QIIME or MEGAN pipelines. However, whole-genome sequencing is becoming the technology of choice for sequence analysis and community comparison; we therefore consider it more appropriate to focus this section on the latter approach.

The assembly of overlapping reads into continuous or semi-continuous genome fragments allows an in-depth view of different aspects within a genomic context. Numerous metagenome assemblers have been developed, most of which assemble sequences in de novo fashion, i.e. do not rely on a closely related reference sequence. MIRA [118] and AMOS [86] are examples of reference-based assemblers, while IDBA-UD [106], MetaVelvet [116], MetAMOS [113], MEGAHIT [109] and metaSPAdes [115] are examples of de novo assemblers.
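To make the idea of de novo assembly more concrete, the following minimal Python sketch builds a de Bruijn graph from overlapping reads and walks one unambiguous path through it. It is an illustration only and not the algorithm of any assembler named above: the reads, the value of k and the function names are hypothetical, and real assemblers additionally handle sequencing errors, reverse complements, uneven coverage and branching paths.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer prefix to the list of (k-1)-mer suffixes that follow it."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return graph


def walk_unambiguous_path(graph, start):
    """Extend a contig from `start` while the current node has exactly one distinct successor."""
    contig = start
    node = start
    visited = {start}
    while True:
        successors = set(graph.get(node, []))
        if len(successors) != 1:
            break  # dead end or branch: stop extending
        node = successors.pop()
        if node in visited:
            break  # avoid looping forever on a cycle
        visited.add(node)
        contig += node[-1]
    return contig


# Hypothetical overlapping reads sampled from one short sequence.
reads = ["ATGGCGT", "GGCGTGC", "CGTGCAA"]
graph = build_de_bruijn(reads, k=4)
contig = walk_unambiguous_path(graph, "ATG")
print(contig)
```

In this toy case the three reads overlap without ambiguity, so the walk recovers a single contig that spans all of them; in real metagenomes, shared k-mers between strains and species create branches that assemblers must resolve.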
Furthermore, the need to assemble increasingly large sequencing data sets is motivating serious investment in improving computational performance. Assembler developers are now looking for more time- and memory-efficient ways to handle massive data volumes (hundreds of giga base pairs) on a single server.

Binning approaches, i.e. the classification and/or clustering of reads into specific bins, can further help elucidate the broader genomic context of interesting features [133]. Some binning methods are taxonomy-dependent (supervised learning procedures), i.e. they estimate the profile/abundance of ‘known’ taxonomic groups by comparison with a reference database [134]. CAMERA [91], the MG-RAST server [48], the NBC classifier [122] and WebCARMA [131] are some well-known taxonomy-dependent Web applications. On the other hand, there are taxonomy-independent methods (unsupervised learning procedures), which group reads based on their mutual similarity and do not involve a database comparison step [82]. CONCOCT [95], GroopM [105], MaxBin [108], MetaBAT [110] and MyCC [121] are some prominent examples.

The prediction and annotation of gene-coding sequences is the last, fundamental step of analysis. Several tools are commonly used in 16S rRNA gene analyses: the Genboree Microbiome Toolset supports community profiling (i.e. determination of the abundance of each type of microbe) [103]; QIIME [129] and Mothur [119] (also part of Genboree) can be used to obtain quantitative insights into microbial relative abundances and ecosystems; BLAST [89] and Cd-hit [92] facilitate the comparison of large sets of proteins; and Chimera Slayer is used to detect sequences falsely interpreted as organisms (aiming to correct false perceptions of sample diversity and false identification of novel taxa) [93]. Furthermore, the Ribosomal Database Project classifiers may help in the assignment of rRNA gene sequences into bacterial taxonomy [135].
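As an illustration of the taxonomy-independent binning described earlier in this section, the sketch below groups contigs by the similarity of their tetranucleotide frequency profiles, a composition signal that tools such as MetaBAT combine with abundance information. This is a deliberately simplified assumption-laden example: the greedy clustering rule, the distance threshold and the contig sequences are hypothetical, and the sketch ignores coverage, reverse complements and the probabilistic models used by the published tools.

```python
from itertools import product
from math import sqrt

# All 256 possible tetranucleotides, in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_profile(sequence):
    """Return the normalised frequency of each of the 256 tetranucleotides."""
    counts = dict.fromkeys(TETRAMERS, 0)
    for i in range(len(sequence) - 3):
        kmer = sequence[i:i + 4]
        if kmer in counts:  # skip windows containing N or other symbols
            counts[kmer] += 1
    total = sum(counts.values()) or 1
    return [counts[t] / total for t in TETRAMERS]


def euclidean(u, v):
    """Euclidean distance between two equal-length profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def greedy_bins(contigs, threshold):
    """Assign each contig to the first bin whose seed profile is within `threshold`."""
    bins = []  # list of (seed_profile, [contig names])
    for name, seq in contigs.items():
        profile = tetranucleotide_profile(seq)
        for seed, members in bins:
            if euclidean(seed, profile) <= threshold:
                members.append(name)
                break
        else:
            bins.append((profile, [name]))
    return [members for _, members in bins]


# Hypothetical contigs: two AT-rich fragments and one GC-rich fragment.
contigs = {
    "c1": "ATATATATATATATATATAT",
    "c2": "TATATATATATATATATATA",
    "c3": "GCGCGCGCGCGCGCGCGCGC",
}
bins = greedy_bins(contigs, threshold=0.5)
print(bins)
```

No reference database is consulted at any point, which is what distinguishes this family of methods from the taxonomy-dependent ones.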
Tools such as Glimmer-MG [104], FragGeneScan [136], MetaGeneAnnotator [111], PICRUSt [126], Orphelia [123] and the PROkaryotic DYnamic programming Gene-finding ALgorithm (Prodigal) [128] are good examples of how gene prediction approaches have adapted to the challenges posed by shotgun sequencing data.

Nowadays, MG-RAST [48], IMG/M [49], MEGAN [50], HUMAnN [51], MALINA [52], MOCAT2 [53] and COGNIZER [54] are among the best-known comparative genomics-based automated computational pipelines, and all are under active development. MG-RAST provides an easy-to-use Web interface for metagenomics analysis, including alignment, but limits the size of uploaded data files. In turn, HUMAnN and MEGAN both lack an integrated alignment tool and are unable to perform comprehensive downstream processes, such as operon-level analysis [137]. Databases such as Pfam [43] and Clusters of Orthologous Groups (COGs) [42] enable comparison with sequence-diverse protein families or recurring sequence motifs, and the KEGG Orthology (KO) and KEGG pathways databases [41] are often used to predict the composition ratio of microbial gene families and pathways from the HMP [138, 139]. Tools such as RAPSearch [130] and PAUDA [124] offer faster alternatives to BLAST for the alignment of environmental sequencing reads. More recently, FishTaco, an analytical and computational framework, has enabled integrated taxonomic and functional comparative analyses. In particular, FishTaco is equipped to accurately quantify taxon-level contributions to disease-associated functional shifts, i.e. to trace back shifts in the microbiome’s functional capacity to specific taxa [100].

Besides comparative genomics, gut studies encompass structure-based approaches, functional prediction methods based on evolutionary conservation and phylogeny, and network context-based approaches (e.g. co-expression and metabolic networks) [139–141]. Approximately 50% of the genes in the gut microbiome could not be characterized using standard annotation methods [142]. Therefore, conventional methods for putative gene characterization and functional prediction, based on alignment to homologous genes with existing annotations (e.g. BLAST), were rendered ineffective for these genes [43]. Alternative computational methods approached the problem by integrating standard homology information with additional information, namely, sequence features, co-expression data, binding sites and subcellular localisation data [143–146]. The study of discrepancies between taxonomic and functional variations led to a proposal to revise some of the main metagenomic processing procedures to uncover hidden functional variation across samples [147]. This revision relies on the Metagenomic Universal Single-Copy Correction (MUSiCC) method [120] and the Evidence-based Metagenomic Pathway Assignment using geNe Abundance DAta (EMPANADA) schema.

Phylogenetic analysis is often supported by tools such as: CAFE, a stand-alone software, which integrates 28 measures and downstream visualized analysis [90]; AlFree, a Web server, which integrates 38 measures and supports the visualization of phylogeny [85]; the CVTree server, which implements a whole-genome-based and alignment-free composition vector method [96] and is also included in the CAFE tool; AmphoraNet [87], the Web server implementation of the AMPHORA2 method, which incorporates probability-based sequence alignment masks to improve phylogenetic accuracy; MetaPhlAn, which estimates the abundance of species in each sample according to the number of reads mapped to its markers [114]; and PhyloSift [125], which statistically tests lineages of interest directly from an uncultured DNA sample and allows for comparison of community structure among many samples.
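The marker-based abundance estimation mentioned above for MetaPhlAn can be illustrated with a short Python sketch. It only conveys the general principle of turning per-species marker read counts into relative abundances; it is not MetaPhlAn's actual procedure, and the normalisation by total marker length, the species chosen and all numbers are assumptions made for the example.

```python
def relative_abundance(marker_hits, marker_lengths):
    """Convert per-species marker read counts into relative abundances.

    marker_hits: species -> number of reads mapped to that species' markers
    marker_lengths: species -> total length (bp) of that species' markers
    """
    # Normalise read counts by marker length so that species with more or
    # longer markers are not favoured.
    coverage = {
        species: marker_hits.get(species, 0) / marker_lengths[species]
        for species in marker_lengths
    }
    total = sum(coverage.values())
    if total == 0:
        return {species: 0.0 for species in coverage}
    return {species: value / total for species, value in coverage.items()}


# Hypothetical read counts and marker lengths for three gut species.
hits = {"Bacteroides vulgatus": 300, "Faecalibacterium prausnitzii": 100}
lengths = {
    "Bacteroides vulgatus": 30000,
    "Faecalibacterium prausnitzii": 10000,
    "Escherichia coli": 20000,
}
abundance = relative_abundance(hits, lengths)
print(abundance)
```

Although the first species attracts three times as many reads, its markers are also three times longer, so both detected species end up with the same estimated share of the community, while the species with no mapped reads is reported as absent.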
An immediate application of phylogenetic approaches is the study of how species within the same genome interaction groups decrease or increase in abundance during dietary interventions [12].

The generation of community-level metabolic networks of the microbiome is also an interesting avenue. For example, these networks can be used to explore gene-level and network-level topological differences associated with obesity and IBD [148]. By placing variations in gene abundance in the context of these networks, researchers can examine the genes associated with these host states; namely, they may inspect gene location and generate hypotheses about how the microbiome interacts with host metabolism. Additionally, network analysis can bring to light associations between topological variations and community species composition.

Genome mining approaches are increasingly valuable for identifying antimicrobial-producing microorganisms as well as for screening and harnessing putative gene clusters. For example, genome mining using Rapid Annotation using Subsystem Technology (RAST) was applied to the comparative pathogenomic analysis of Nesterenkonia jeotgali [149]. Likewise, the bacteriocin genome mining tool BAGEL3 [88] helped in the identification of potential bacteriocin producers in the genomes of the gut microbiome subset of the HMP's reference genome database [150].

Arguably, metagenomics should be the basis of most (if not all) microbiome studies. Despite the huge technological development in this field, methods are often limited in resolution and may fail to resolve relevant details concerning the composition of species and genes in the microbiome.
Accumulating evidence shows that important functions of the gut microbiota may be species- or even strain-specific; yet, many studies in metagenomics are still conducted at genus or higher taxonomic levels because of the limited ability to assemble individual bacterial genomes directly from metagenomic data [151].

Metatranscriptomics: gene expression profiling

Metatranscriptomics encompasses the functional characterization of microbiomes based on mRNA sequencing data to gain a better understanding of the taxonomic composition and active biochemical functions of microbiomes. Metatranscriptomics captures gene expression patterns from microbial communities without previous assumptions as to the ongoing activities or dominant taxa, and provides a catalogue of those genes being transcribed under given experimental conditions. Here, bioinformatics analysis methods can be broadly classified into reference-dependent and reference-independent methods. Reference-dependent methods are based on sequence alignment to functionally well-characterized databases or data sets, whereas reference-independent methods resort to de novo assemblies. Table 3 presents available metatranscriptomics tools.

Table 3. Publicly available metatranscriptomics tools

| Tool | Purpose |
| --- | --- |
| BLAST [89] | Identification of regions of similarity between biological sequences |
| COMAN [152] | Functional analysis of metatranscriptomics data |
| DESeq2 [97] | Estimation of variance–mean dependence in count data from high-throughput sequencing assays and differential expression analysis |
| DIAMOND [99] | High-throughput alignment of DNA reads and protein sequences |
| FLASh [101] | Merging paired-end reads from next-generation sequencing experiments |
| FMAP [102] | Functional analysis of metagenomic/metatranscriptomic sequencing data, i.e. sequence alignment, gene family abundance calculations and differential feature statistical analysis |
| Genboree Microbiome Toolset [103] | Multi-omic data analysis |
| IMP [153] | Large-scale standardized integrated analysis of coupled metagenomic and metatranscriptomic data |
| KOBAS [107] | Gene/protein functional annotation and functional enrichment of gene sets |
| NCBI’s Best Match Tagger [154] | Filtering human reads from metagenomics data sets |
| PRIAM [127] | Automated enzyme detection in a fully sequenced genome |
| RPS-BLAST | Search in profile databases |
| SAMSA [155] | Comprehensive metatranscriptome analysis |
| USEARCH [156] | Sequence analysis, including search and clustering algorithms |

Most metatranscriptomics analyses involve reference-based or metagenomics-dependent analysis workflows.
For example, the Functional Mapping and Analysis Pipeline (FMAP) aims to identify differentially abundant features in microbiome data sets [102]. FMAP supports data preprocessing and performs sequence alignment, gene family abundance calculations and differential statistical analysis. To this end, the pipeline integrates various tools: NCBI’s Best Match Tagger [154] for data preprocessing; USEARCH [156] and DIAMOND [99] for the alignment of reads to a reference database, namely a KEGG-filtered UniProt data collection; and statistical testing methods, such as metagenomeSeq and the Kruskal–Wallis rank-sum test, for the analysis of differentially abundant genes and the enrichment analysis of pathways and operons. The comprehensive metatranscriptomics analysis (COMAN) tool is an integrated Web server dedicated to the functional analysis of metatranscriptomic data [152]. After an initial quality control step, reads are mapped to the RefSeq database using DIAMOND [99]. Functional annotation of genes and reads is based on COG [42] and KO [41]. COG-based annotation is conducted using RPS-BLAST against the CDD database [157], whereas DIAMOND [99] and KOBAS [107] support KO annotation. In addition, PRIAM [127] supports the annotation of genes to enzymes (Enzyme Commission numbers, ECs) and enables further profiling against MetaCyc pathways [46]. Notably, the Simple Annotation of Metatranscriptomes by Sequence Analysis (SAMSA) pipeline has been specifically designed for the analysis of gut microbiome data [155]. FLASh is used in the preprocessing step to merge paired-end reads [101]. The annotation step relies on MG-RAST tools to generate annotations for the best matches to organisms and individual transcripts (RefSeq database), and the SEED database is used to obtain additional ontology annotations.
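Returning to the differential-abundance step described for FMAP above, the following minimal sketch applies a Kruskal–Wallis rank-sum test to each gene family across sample groups. It is only an illustration of the statistical idea and not FMAP's own code: the gene-family identifiers, abundance values and group labels are invented, and SciPy is assumed as the statistics library.

```python
from scipy.stats import kruskal

# Hypothetical normalised abundances: gene family -> group -> per-sample values.
abundance = {
    "K00001": {"control": [10.0, 12.0, 11.0, 9.0], "case": [30.0, 28.0, 35.0, 31.0]},
    "K00002": {"control": [5.0, 6.0, 5.5, 6.5], "case": [5.2, 6.1, 5.8, 5.9]},
}


def kruskal_per_feature(table):
    """Return {feature: p-value} from a Kruskal-Wallis test across the groups."""
    results = {}
    for feature, groups in table.items():
        _, p_value = kruskal(*groups.values())
        results[feature] = p_value
    return results


p_values = kruskal_per_feature(abundance)

# Report features from the smallest to the largest p-value.
for feature, p in sorted(p_values.items(), key=lambda item: item[1]):
    print(f"{feature}\tp={p:.4f}")
```

In a real analysis with thousands of gene families, the resulting p-values would normally be corrected for multiple testing before any feature is reported as differentially abundant.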
In SAMSA, the DESeq2 package in R supports the comparison of metatranscriptome samples and the identification of significantly differentially expressed transcripts [97].

A major drawback of reference-based methods is that the large number of sequencing reads from uncultured species and divergent strains is discarded during data analysis, i.e. potentially useful information is lost. For example, in a recent study of 252 human faecal samples, 43% of the reads could not be mapped to available isolate genomes [158]. To mitigate this limitation, reference-independent methods attempt to retrieve the actual genomes and potentially novel genes present in the samples, maximizing the amount of data exploited for analysis. To this end, reference-independent metatranscriptomics approaches use dedicated assembly methods, namely, metatranscriptome assemblers, metagenomics assemblers or single-species transcriptome assemblers [159]. Moreover, these approaches aim to leverage the advantages of integrating metatranscriptomics and metagenomics data for the large-scale analysis of microbial community structure and function. For example, the open-source Integrated Meta-omic Pipeline (IMP) is a self-contained and standardized de novo assembly-based pipeline, which allows automated and large-scale integrated analyses of combined metagenomics and metatranscriptomics data sets [153]. Notably, IMP incorporates iterative co-assembly of metagenomic and metatranscriptomic data, analyses of microbial community structure and function, and genomic signature-based visualization. Despite all these advances, metatranscriptomics approaches continue to struggle with the quality of the mRNA obtained from microbial samples.

Metaproteomics studies: spectral search and protein profile

Metaproteomics studies aim to perform the large-scale characterization of proteins extracted from the human gut microbiota [160].
Metaproteomics allows for the characterization of the dynamic proteome in complex communities, revealing its impact on microbial metabolism and providing information about which taxonomic groups perform different metabolic roles. Compared with metagenomics and metatranscriptomics, the added value of metaproteomics lies in providing functional details, i.e. metaproteomics supports the identification of proteins, their assignment to specific taxa and the description of how these proteins interact with the human host. The publicly available metaproteomics tools are listed in Table 4.

Table 4 Publicly available metaproteomics tools
Tool | Purpose
AACompIdent [161] | Identifying a protein from its amino acid composition
AACompSim [161] | Comparing the amino acid composition between UniProtKB/Swiss-Prot entries
Blazmass+ComPIL [162] | A comprehensive and scalable database search system for metaproteomics
ClustalO [163] | Aligning two or more protein sequences
COILS [164] | Comparing a sequence to a database of known parallel two-stranded coiled coils and deriving a similarity score
Compute pI/Mw [161] | Computing the theoretical isoelectric point and molecular weight for a list of sequences
FindPept [165] | Identifying peptides that result from unspecific cleavage of proteins from their experimental masses
Galaxy-P [166] | Integrative analysis of MS-based proteomics and genomic and transcriptomic data
HAMAP [167] | Classifying and annotating protein sequences
ISMARA [168] | Modelling genome-wide expression or ChIP-seq data, in terms of computationally predicted regulatory sites for transcription factors and microRNAs
Mascot [169] | Protein identification using MS data
MyriMatch [170] | Comparison of mass spectra from shotgun proteomics against a reference database
MZJava [171] | Analysis of MS data from large-scale proteomics and glycomics experiments
OMSSA [172] | MS/MS search algorithm
PDBePisa [173] | Exploring macromolecular interfaces
pICarver [174] | Visualizing theoretical distributions of peptide pI on a given pH range
PredictProtein [175] | Predicting protein structural and functional features
QMEAN [176] | Estimating the quality of protein structure models
QuickMod [177] | Identifying modified peptides
Scaffold [178] | Protein identification and analysis
ScanProsite [179] | Scanning proteins for matches against the PROSITE collection of motifs or user-defined patterns
SEQUEST [180] | Correlating uninterpreted tandem mass spectra of peptides with amino acid sequences from protein and nucleotide databases
T-coffee [181] | Computing, evaluating and manipulating multiple alignments of DNA, RNA, protein sequences and structures
Unique Peptide Finder [182] | Characterization of taxon-specific peptidomes and peptidome-based clustering
X! Tandem [183] | Protein identification via tandem MS matching against peptide sequences
ChIP-Seq: Chromatin Immunoprecipitation Sequencing.

The ExPASy Web portal has a worldwide reputation as one of the main bioinformatics resources for proteomics [184]. ExPASy databases include Swiss-Prot [185], STRING [186], SWISS-MODEL [187], PROSITE [188], ViralZone [189] and neXtProt [190]. Analysis tools are available for specific tasks, such as protein sequence and identification [191] (e.g. AACompIdent [161] or FindPept [165]), proteomics experiments [192] (e.g. MZJava [171] or pICarver [174]), function analysis [193] (e.g. AACompSim [161] or Compute pI/Mw [161]), sequence sites, features and motifs [194] (e.g. HAMAP [167] or ScanProsite [179]), protein modification [195] (e.g. ISMARA [168] or QuickMod [177]), protein structure [196] (e.g. COILS [164] or QMEAN [176]), protein interactions [197] (e.g. PDBePisa [173] or PredictProtein [175]) and similarity search/alignment [198] (e.g. ClustalO [163] or T-coffee [181]). Analysis of mass spectra (MS), i.e. the decoding of peptide sequences, is typically facilitated by database searching algorithms, namely, SEQUEST [180], Mascot [169], MyriMatch [170], OMSSA [172] and X! Tandem [183]. The development of cross-species protein identification approaches is desirable, but challenging, given the complexity of the gut microbial proteome and the dynamic distribution of species between individuals [199, 200]. New approaches aim to increase the sensitivity of peptide–spectrum matching.
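The database-search idea shared by the engines above can be sketched in a few lines: proteins are digested in silico, and candidate peptides are retained when their theoretical mass matches an observed precursor mass within a tolerance. The sketch below is a deliberately simplified illustration and not the scoring scheme of SEQUEST, Mascot or any other engine. The protein sequences, the identifiers `protA` and `protB`, the function names and the 10 ppm tolerance are all assumptions made for the example; real engines additionally score fragment-ion spectra, missed cleavages and modifications.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # mass of H2O added to the sum of residue masses

# Hypothetical two-protein reference database (sequences invented for illustration).
DATABASE = {
    "protA": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
    "protB": "MSDNELKGAVLTPEDRAAWK",
}


def tryptic_digest(sequence, min_length=6):
    """Cleave after K or R unless followed by P; no missed cleavages."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return [p for p in peptides if len(p) >= min_length]


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified, neutral peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER


def match_precursor(observed_mass, database, tolerance_ppm=10.0):
    """List (protein, peptide, error in ppm) for peptides within the tolerance."""
    hits = []
    for protein_id, sequence in database.items():
        for peptide in tryptic_digest(sequence):
            theoretical = peptide_mass(peptide)
            error_ppm = (observed_mass - theoretical) / theoretical * 1e6
            if abs(error_ppm) <= tolerance_ppm:
                hits.append((protein_id, peptide, round(error_ppm, 2)))
    return hits


# Simulate an observed precursor mass with a small measurement error.
observed = peptide_mass("GAVLTPEDR") + 0.002
print(match_precursor(observed, DATABASE))
```

The sketch also hints at why metaproteomic searches are demanding: every additional reference proteome enlarges the candidate list for each observed mass, which is the search-space problem that larger-scale and iterative search strategies try to address.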
Together, the ComPIL metaproteomic analysis method and the Blazmass search engine allow larger-scale database searches, including peptide masses, protein information and peptide sequences [162]. Other possible approaches are the integration of synthetic metaproteome information with metagenomic information [62], and de novo sequencing [201, 202]. The Galaxy bioinformatics framework offers a sophisticated proteogenomic workflow, named Galaxy for Proteomics or Galaxy-P (usegalaxyp.org), in support of broad metaproteomics data analysis [203]. This is a complex workflow, which includes ∼140 steps, and can be shared using built-in Galaxy functions [166]. Alternatively, the MetaPro-IQ workflow, which has been specifically developed for gut metaproteome identification and quantification, uses almost complete human or mouse gut microbial gene catalogues as the reference database and an iterative database search strategy [204]. Unipept offers programmatic access to metaproteomics analysis features and has the advantage of being supported by a fast index built from UniProtKB and NCBI Taxonomy [205]. It facilitates interactive data visualization, and its Unique Peptide Finder enables the discovery of tryptic peptides that are taxon-specific, i.e. peptides that can be used as biomarkers to reliably detect the presence of the targeted taxa [182]. Scaffold is designed to identify and analyse proteins in biological samples [178]. By using output files from MS/MS search engines, Scaffold validates, organizes and interprets MS data, allowing the user to more easily manage large amounts of data, compare samples and search for protein modifications. Moreover, it attempts to increase the confidence in protein identification reports through the use of several statistical methods. In terms of applications, the study of CD is a meaningful example of the added value of the integrated analysis of metagenomics and metaproteomics data [27].
Such analysis led to a better understanding of the CD phenotype (i.e. genes, proteins and pathways that primarily differentiated patients from healthy subjects) and enabled the association of the phenotype with alterations in bacterial carbohydrate metabolism, bacterial–host interactions and human host-secreted enzymes. The investigation of colonic bacterial metaproteomic signatures in obesity represents another application [206]. The goal was to detect differences between obese and non-obese samples at a functional level. The combination of metaproteomics and phylogenetic data exposed significant metabolic activity of the phylum Bacteroidetes in obese subjects. Likewise, faecal metaproteomics analysis was applied in a probiotic intervention trial to identify individually different human intestinal proteomes (i.e. personalized host–microbiota interactions) and examine the activity of main phyla as well as key species, namely, F. prausnitzii [207]. Finally, in the context of type 1 diabetes mellitus (T1DM), a large-scale analysis of intra- and inter-individual variation using metagenomics, metatranscriptomics and metaproteomics inputs showed that community structures are reflected across all ‘-omics’ levels. In particular, differences in the relative abundances of certain human pancreatic enzymes were correlated with the expression of microbial genes involved in T1DM-relevant metabolic transformations, such as thiamine synthesis and glycolysis [208].

Metabolomics studies: metabolite identification and concentration

Gut metabolome studies aim to identify and quantify the set of metabolites (or specific metabolites) in biological samples, and thereby examine differences in signature metabolites and their relation to changes in the activity of metabolic pathways [209–211]. Metabolomics allows for the characterization of the dynamic metabolome in complex communities, revealing its impact on microbial metabolism.
Besides being the most immediate indicator of dysbiosis [59, 212], metabolome profiling can reveal dependences on environmental factors (e.g. diet [213, 214] and antibiotic exposure [215, 216]) and provide valuable information about the interactions of the microbial community with the host environment (e.g. quorum sensing [217]). Metabolite profiling is typically carried out using a combination of chromatographic techniques (e.g. liquid chromatography or gas chromatography) and detection methods, such as mass spectrometry (MS) and nuclear magnetic resonance [218, 219]. Computationally, data processing and analysis can be challenging because of the huge number of different metabolites potentially detected in this kind of sample. Moreover, a combination of statistical and machine learning methods is usually applied to identify discriminative features [220–222]. For example, classical statistical tests and models (e.g. Student’s t-test, Mann–Whitney test and multivariate linear regression) are combined with multivariate analysis such as principal component analysis, hierarchical cluster analysis, discriminant analysis and classification models (e.g. k-nearest neighbour). Currently available metabolomics computational tools are listed in Table 5.
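The combination of per-metabolite testing and multivariate projection described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than a published workflow: the intensity matrix is simulated, the group sizes and the shift applied to the first metabolite are arbitrary, and NumPy and SciPy are assumed to be available.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(seed=0)

# Simulated log-intensities: 10 control and 10 patient samples, 5 metabolites each.
n_per_group, n_metabolites = 10, 5
control = rng.normal(loc=0.0, scale=1.0, size=(n_per_group, n_metabolites))
patient = rng.normal(loc=0.0, scale=1.0, size=(n_per_group, n_metabolites))
patient[:, 0] += 5.0  # metabolite 0 is made discriminative on purpose

# Univariate step: Mann-Whitney U test for each metabolite.
p_values = np.array([
    mannwhitneyu(control[:, j], patient[:, j], alternative="two-sided").pvalue
    for j in range(n_metabolites)
])

# Multivariate step: PCA via singular value decomposition of the centred matrix.
data = np.vstack([control, patient])
centred = data - data.mean(axis=0)
_, singular_values, components = np.linalg.svd(centred, full_matrices=False)
scores = centred @ components.T  # sample coordinates on the principal components
explained = singular_values**2 / np.sum(singular_values**2)

print("p-values per metabolite:", np.round(p_values, 4))
print("fraction of variance on PC1:", round(float(explained[0]), 3))
```

With real data, intensities would first be normalised and scaled, and the univariate p-values would be adjusted for the number of metabolites tested.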
Table 5 Publicly available metabolomics tools
Tool | Purpose
BNICE | Discovery of novel biochemical pathways
MassTRIX [223] | Annotation of metabolites in high precision MS data
MIDAS [224] | Database search algorithm for metabolite identification
MetFrag [225] | In silico fragmentation for computer-assisted identification of metabolite mass spectra
MimoSA [226] | A pipeline for joint metabolic model-based analysis of metabolomics measurements and taxonomic composition from microbial communities

Comparison against spectral databases is required to identify and quantify the metabolites in the sample, namely: the Human Metabolome Database [227],
BioMagResBank [228], Madison-Qingdao Metabolomics Consortium Database [229], MassBank [230], Golm Metabolome Database [231], METLIN [232] and ChemSpider [233]. Alternatively, the in silico fragmenter MetFrag [225] combines compound database searching (via ChemSpider and PubChem [234] Web services) and fragmentation prediction, and the Metabolite Identification via Database Searching (MIDAS) approach [224] matches measured tandem mass spectra against the predicted fragments of metabolites in the MetaCyc database. Untargeted metabolomics approaches are being developed as a means to minimize the challenges in matching metabolites to their spectral features [222]. For example, the Metabolic In silico Network Expansions (MINEs) databases record molecules that have not been observed yet, but are likely to occur based on known metabolites and common biochemical reactions [235]. Computational predictions are based on the Biochemical Network Integrated Computational Explorer (BNICE) algorithm and expert-curated reaction rules based on the Enzyme Commission classification system. Details on a broader range of Web-accessible databases of the properties, enzymatic reactions and metabolism of small molecules, and on their search options, have been recently reported [236, 237]. Within the scope of human gut research, IBD is one of the main focuses of metabolomics studies. The most discriminative metabolites for IBD, mainly derived from nuclear magnetic resonance spectroscopy studies, were alanine, isoleucine, leucine, lysine, valine, phenylalanine and butyrate [209, 238–240]. Also, MS studies have shown that long-chain fatty acids could play an important role in the disease. Researchers are now exploring certain metabolic patterns, discussing whether they are a cause of IBD or rather a consequence of inflammation or altered gut microbiota. For example, an increase of amino acids in faecal samples of IBD patients is explained by the low capacity of the inflamed intestinal tissue to absorb nutrients [241].
Obesity and T2D are also the subject of discussion through studies of co-metabolites [14].

Fluxomics studies: high-throughput analysis of metabolic fluxes

Fluxomics refers to the group of techniques focused on the high-throughput analysis of metabolic fluxes, and is a clear complement to transcriptomics, proteomics and metabolomics. By integrating in vivo metabolic data with stoichiometric network models, absolute fluxes in the central metabolism of a biological system can be determined. Applications can be broadly divided into two approaches: constraint-based methods for examining the relative contributions of different pathways to a given phenotype, and fluxomics based on the incorporation and monitoring of 13C-labelled compounds [242]. Different algorithms, desktop or Web applications and resources have been published in recent years to facilitate the work of fluxomics researchers [243]. Table 6 presents the publicly available fluxomics tools.

Table 6 Publicly available fluxomics tools
Tool | Purpose
B-DMFA [244] | A fast heuristic algorithm developed for knot placement
COBRA Toolbox [245] | Quantitative prediction of cellular behaviour
COMETS [246] | Performing computer simulations of metabolism in spatially structured microbial communities
CycSim [247] | Simulating with constraint-based models of metabolism
Fastcore [248] | Reconstruction of context-specific metabolic network models from global genome-wide metabolic network models
Fast-SL [249] | Identification of synthetic lethal gene/reaction sets in genome-scale metabolic models
Fast-SNP [250] | Function analysis and selection tool for identifying and prioritizing SNPs that are likely to have functional effects
FBA-SimVis [251] | Constraint-based analysis of metabolic models
FluxModeCalculator [252] | Flux mode analysis in stoichiometric models
GEMSiRV [253] | Performing metabolic network drafting and editing, network visualization and flux balance analysis
GlobalFit [254] | Finding globally optimal networks
Influx [255] | Optimized flux estimation
iReMet-flux [256] | Flux prediction
jQMM [257] | Flux calculation for genome-scale models
ll-ACHRB [258] | Sampling the feasible solution space of metabolic networks
MFF [259] | Flux distribution and impact prediction, selection of key network reactions and prioritization of measurements
Mflux [260] | Prediction of the bacterial central metabolism via machine learning
MicrobesFlux [261] | Generation and reconstruction of metabolic models for annotated microorganisms
ModelSEED [262] | Reconstruction, exploration, comparison and analysis of metabolic models
OptFlux [263] | Flux balance analysis, allowing user-manipulation of the nodes composing a metabolic network and the overlay of phenotype results
ROOM [264] | Constraint-based prediction of metabolic steady state
Sumoflux [265] | A toolbox for targeted 13C metabolic flux ratio analysis
SurreyFBA [266]
Providing constraint-based simulations and network map visualization Sybil [267] Constraint-based analyses of metabolic networks Sysmetab [268] Metabolic flux analysis VisualCNA [269] Constraint network analysis and molecular graphics representations View Large Constraint-based approaches include more or less specific applications dealing with flux balance analysis (FBA). FBA has been traditionally used in the characterization of cellular metabolism and metabolic engineering [270]. There are many algorithms that have been developed for the high-throughput characterization of metabolic fluxes. Regulatory On/Off Minimization (ROOM) works on metabolic steady states and is focused on changes induced by gene knockouts, mostly providing rerouting options in response to the absence of an enzymatic step (i.e. a gene knockout) [264]. Fastcore is another algorithm able to reconstruct metabolic sub-networks that have been extracted from wider models. Starting by a set of reactions empirically known to be active (denominated core), the algorithm returns a metabolic network containing all the reactions and the minimum number of additional reactions that satisfy the metabolic results [248]. Fast-SL is another algorithm that, in the context of a genome-scale metabolic model, identifies sets of lethal reactions, which is useful for combinatorial discovery of drug targets [249]. ll-ACHRB (Artificially Centered Hit-and-Run on a Box) is a scalable algorithm for sampling flux samples in the context of metabolic networks [258]. Fast-SNP is an algorithm focused on the improvement of computational efficiency by reducing the original network into a smaller matrix. Overall, this algorithm is efficient for the formulation of loop-law constraints, allowing loopless flux optimization [250]. B-spline fitting Dynamic Metabolic Flux Analysis (B-DMFA) is a heuristic algorithm focused on knot placement, a time-consuming task in the dynamic metabolic flux analysis. 
It does so by exploiting the local support property of B-splines [244]. Finally, Influx_s is a deterministic algorithm with improved accuracy for flux estimation. It requires few computational resources: estimating the central carbon metabolism network of Escherichia coli takes from several seconds to a few minutes on a standard personal computer (PC) [255].
Of the hundreds of applications available in the literature, many have been programmed in the Matlab environment. Perhaps the most widely used is the Constraint-Based Reconstruction and Analysis (COBRA) Toolbox. This software package has been used for the quantitative prediction of cellular metabolism through the computation of optimal growth (steady-state or dynamic), and it allows gene deletions to be modelled [245]. The COBRA Toolbox also supports methodologies such as regulatory on/off minimization and flux variability analysis [271]. It has also been implemented as a Python package (COBRApy) and in Julia, where associated packages such as distributedFBA.jl can be used to solve multiple flux balance analyses on concise pathways or on the whole central metabolism. The Julia implementation provides scalability and integrates with the high-level interface MathProgBase.jl, making efficient use of computational resources [272]. Mackinac was designed to combine the metabolic analysis capabilities of COBRA with ModelSEED in order to infer the metabolic potential of a biological system and to optimize genome-scale metabolic models [273]. ModelSEED is a Web-based resource for the analysis of genome-scale metabolic models [262], and Cytoscape plugins such as CytoSEED allow the visualization and manipulation of the created models [270]. ORCA is another COBRA-based package, which notably extends the scope of COBRA metabolic models [274].
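For reference, the optimization problems behind the FBA and flux variability analysis (FVA) capabilities mentioned above can be written in their usual textbook form. This is a sketch in generic notation, not a formulation taken from any of the cited packages: the symbols S, v, c, the bounds and the fraction γ are introduced here for illustration.

```latex
\begin{aligned}
\text{FBA:}\quad & Z^{*} = \max_{v}\; c^{\top} v
  \quad \text{s.t.}\quad S\,v = 0,\;\; v^{\mathrm{lb}} \le v \le v^{\mathrm{ub}} \\[4pt]
\text{FVA (reaction } j\text{):}\quad & \min_{v}\; v_{j} \;\text{ and }\; \max_{v}\; v_{j}
  \quad \text{s.t.}\quad S\,v = 0,\;\; v^{\mathrm{lb}} \le v \le v^{\mathrm{ub}},\;\;
  c^{\top} v \ge \gamma\, Z^{*}
\end{aligned}
```

Here S is the stoichiometric matrix (metabolites by reactions), v the vector of reaction fluxes, c the objective coefficients (typically selecting the biomass reaction) and γ, between 0 and 1, the fraction of the optimum that must be retained. Under the same assumptions, a gene deletion is usually simulated by setting both bounds of every reaction that can no longer be catalysed to zero and solving the FBA problem again.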
Coordinate Hit-and-Run with Rounding (CHRR) is another Matlab-based desktop application, which allows genome-scale sampling in biochemical networks [275].
Many fluxomics applications have been developed in the R environment (sometimes simply as libraries). One example is Sybil, an R library for the analysis of metabolic networks that is part of the COBRA Toolbox implementation in R [267]. GlobalFit is another R package, designed for refining metabolic networks. It builds models in which properties of the reactions are changed (e.g. reversibility, presence/absence of an enzymatic step) and evaluates how well each one fits the experimental data. GlobalFit then finds the model that best fits the experimental data with the minimum number of metabolic changes [254]. OptFlux is a platform for metabolic engineering that allows the user to manipulate the nodes of a metabolic network and to overlay phenotype results or flux modes [263]. It has been extended with a visualization plugin that enables graphical editing of the network and visualization of the results [276].
Integration of Relative Metabolite Levels for Flux prediction (iReMet-flux) is an interesting tool because it integrates data from other omics. More precisely, iReMet-flux combines metabolomics data with the assumption that metabolism minimizes flux changes between two different scenarios. This allows biological interpretation of the changes in metabolite levels between experimental conditions [256]. Sysmetab integrates high-throughput data from MS and/or nuclear magnetic resonance measurements to solve metabolic fluxes in experiments involving carbon-labelled metabolites [268].
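Samplers such as CHRR and ll-ACHRB explore the whole space of feasible flux distributions rather than a single optimum. The sketch below illustrates only the coordinate hit-and-run step on a toy polytope. It is not the CHRR implementation: the rounding preprocessing is omitted, and the toy network, its bounds and all function and variable names are assumptions made for illustration.

```python
import random


def coordinate_hit_and_run(A, b, start, n_samples, seed=0):
    """Draw points from the bounded polytope {x : A x <= b}.

    `start` must lie inside the polytope. At each step one coordinate is
    chosen at random and resampled uniformly over the interval that keeps
    every constraint satisfied.
    """
    rng = random.Random(seed)
    x = list(start)
    samples = []
    for _ in range(n_samples):
        i = rng.randrange(len(x))
        low, high = float("-inf"), float("inf")
        for row, bound in zip(A, b):
            if row[i] == 0:
                continue  # this constraint does not involve coordinate i
            # Room left for the term row[i] * x[i] given the other coordinates.
            rest = bound - sum(a * xj for j, (a, xj) in enumerate(zip(row, x)) if j != i)
            limit = rest / row[i]
            if row[i] > 0:
                high = min(high, limit)
            else:
                low = max(low, limit)
        x[i] = rng.uniform(low, high)
        samples.append(tuple(x))
    return samples


# Toy network (hypothetical): an uptake flux splits into two branches, so at
# steady state v_uptake = v_a + v_b. With v_uptake <= 10 and irreversible
# branches, the free fluxes (v_a, v_b) fill the triangle described below.
A = [[-1.0, 0.0],   # -v_a       <= 0
     [0.0, -1.0],   # -v_b       <= 0
     [1.0, 1.0]]    #  v_a + v_b <= 10
b = [0.0, 0.0, 10.0]

samples = coordinate_hit_and_run(A, b, start=(1.0, 1.0), n_samples=1000)
mean_v_a = sum(v_a for v_a, _ in samples) / len(samples)
```

Consecutive samples are correlated, so in practice only every k-th point would be kept. For genome-scale networks the steady-state constraint is normally handled by sampling in a basis of the null space of the stoichiometric matrix, which this toy example sidesteps by parametrizing the network by hand.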
Other notable applications are FluxModeCalculator, which allows large-scale elementary flux mode computations using multiple cores [252], and VisualCNA, a PyMOL plugin that implements many graphical visualizations of constraint network analysis [269].
A number of fluxomics tools incorporate data from 13C experiments, in which the rates of the reactions within carbon metabolism are monitored through a 13C-labelled metabolite, making it possible, among other applications, to query the reversibility of some reactions [277]. This is the principle underlying the Central Carbon Metabolic Flux (CeCaF) Database, which has been manually curated and allows comparative analysis of the central carbon metabolism in many organisms. The resources from which the empirical data were retrieved are linked, and interactive visualization is supported through the Cytoscape Web API [277]. Sumoflux combines the measurement of surrogates included in the experiments with machine learning algorithms. It helps to optimize experimental designs by selecting which metabolite levels are most worth measuring, and it can also merge data from different experiments [265]. An application that helps to identify which metabolites are, a priori, most worth tracking is Maximum Metabolic Flexibility (MMF). It estimates the influence of the flux through different metabolic pathways on other reactions, which helps to prioritize the reactions to be measured first and thus to optimize resources [259]. JBEI Quantitative Metabolic Modeling (jQMM) calculates flux models at the genome scale. Internal metabolic fluxes can be predicted not only from 13C metabolic experiments but also through FBA. The application also accepts omics data, which makes it suitable for flux studies in microbial communities [257].
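The idea behind 13C-based flux ratio analysis can be illustrated with a two-pathway mixing balance. This is a deliberately simplified sketch and not the method implemented by Sumoflux or any other tool cited above. The function name and the example labelling fractions are hypothetical.

```python
def flux_ratio(labelled_product, labelled_via_a, labelled_via_b):
    """Fraction of a metabolite pool synthesised through pathway A.

    Assumes the pool is fed only by pathways A and B and that each pathway
    yields a known labelled fraction, so that
        labelled_product = f * labelled_via_a + (1 - f) * labelled_via_b.
    """
    if labelled_via_a == labelled_via_b:
        raise ValueError("the two pathways cannot be distinguished by labelling")
    f = (labelled_product - labelled_via_b) / (labelled_via_a - labelled_via_b)
    if not 0.0 <= f <= 1.0:
        raise ValueError("measurement is inconsistent with a two-pathway mix")
    return f


# Hypothetical numbers: pathway A yields 50% labelled product, pathway B
# yields unlabelled product, and 35% of the measured pool is labelled.
ratio = flux_ratio(labelled_product=0.35, labelled_via_a=0.5, labelled_via_b=0.0)
```

With these made-up values, 70% of the pool would be attributed to pathway A. Real analyses work with full mass isotopomer distributions and many coupled balances, which is why dedicated tools are needed.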
Finally, fewer Web-based fluxomics applications are available than desktop applications. For instance, MicrobesFlux uses annotated microorganism genomes (KEGG) to generate and reconstruct metabolic models [261]. CycSim is another Web application in which genome-scale metabolic models can be simulated and integrated with KEGG data [247]. MFlux is the third Web-based platform considered in this review. It incorporates machine learning algorithms (support vector machine and k-nearest neighbours, among others) to predict bacterial metabolism, and it draws on experimental data from about a hundred papers in which heterotrophic bacterial metabolisms were characterized by 13C experiments. MFlux also adjusts flux models to given stoichiometric constraints through quadratic programming [260].
Knowledge representation
Model reconstruction and network analysis are mainstays of system-level analysis, namely for the study of microbe systems as well as host–microbe and microbial community interplays. This work is equally relevant to gaining a better understanding of the gut ecosystem and to disclosing the impact of the social dynamics of these communities on dysbiosis and disease. Figure 4 illustrates the different aspects of knowledge representation that are detailed in the next subsections.
Figure 4 Mindmap of gut-related modelling and system-level analysis efforts.
Metabolic modelling
The reconstruction of genome-scale metabolic models can be viewed as a framework for converting large amounts of varied data, e.g. genetic, metabolic and biochemical, into phenotype and interaction observations [25, 278]. Typically, such reconstruction requires extensive manual curation and validation, and is based on the genome sequence, biochemistry and physiology of the organism [279].
The resulting model describes individual chemical reactions governed by the fundamental laws of mass conservation and thermodynamics, and can be used to simulate microbial growth or to predict the production rate of a particular metabolite. The value of metabolic modelling for understanding the complex environment of the gut microbiome lies in resolving biochemical relationships within and between microbial species and in potentially predicting the effect of ecosystem-wide perturbations, such as antibiotic application or pathogen invasion. Microbial communities can be seen not only as groups of individual microbes but also as collections of biochemical functions affecting and responding to an environment or host organism [280, 281].
Gut microbe biochemical models
A number of reconstructions are available for human gut microbes (Table 7). Notably, a recent work presented draft metabolic reconstructions for 301 gastrointestinal microbes [282].
Table 7 Genome-scale metabolic models and networks reconstructed for gut microbiota species
Model | Species | Total constituents | Application | Extended/revised
iAH991 [145] | Bacteroides thetaiotaomicron | 308 genes, 82 enzymes, 22 transporters, 32 transcription factors and 37 proteins of undefined functions | Suggest and refine specific functional assignments for sugar catabolic enzymes and transporters
iAH991 [284] | Bacteroides thetaiotaomicron | 1488 reactions, 1152 metabolites and 991 genes | Characterization of host–microbe metabolic symbiosis
iAH991 [284] | Bacteroides thetaiotaomicron | 1488 reactions, 1152 metabolites and 991 genes | Growth under diets varying in fat, carbohydrate and protein content
iBif452 [283] | Bifidobacterium adolescentis L2-32 | | Study of the anti-inflammatory role
iMLTC806cdf [285] | Clostridium difficile pathogenic strain 630 | 806 genes, 703 metabolites and 769 metabolic, 117 exchange and 145 transport reactions | Prediction of essential targets and inhibitors
iNV213 [286] | Cryptosporidium hominis | 3884/213 genes (genome/reconstruction) and 540 reactions | Cryptosporidiosis
iJO1366 [287] | Escherichia coli strain K-12 MG1655 | 4405/1366 genes (genome/reconstruction), 1136 unique metabolites and 2251 reactions | Comprehensive genome-scale reconstruction
iCA1273 [288] | Escherichia coli strain W (ATCC 9637) | 4764/1273 genes (genome/reconstruction), 1111 unique metabolites and 2477 reactions | Comprehensive genome-scale reconstruction
iFpraus_v1.0 [289] | Faecalibacterium prausnitzii A2-165 | | Carbon source utilization capabilities
iFap484 [283] | Faecalibacterium prausnitzii A2-165 | | Study of the anti-inflammatory role
iIT341 [290] | Helicobacter pylori strain 26695 | 1632/341 genes (genome/reconstruction), 411 unique metabolites and 476 reactions | Gastritis, gastric ulcers, gastric cancer
iYL1228 [291] | Klebsiella pneumoniae strain MGH 78578 | 5186/1228 genes (genome/reconstruction), 1055 unique metabolites and 1970 reactions | Infection in various tissues
iLca12A_640 [292] | Lactobacillus casei ATCC 12A | 1076 reactions, 979 metabolites and 640 genes | Identification of functional differences
iLca334_548 [292] | Lactobacillus casei ATCC 334 | 1040 reactions, 959 metabolites and 548 genes | Identification of functional differences
iJL846 [293] | Lactobacillus casei LC2W | 846 genes, 969 metabolic reactions and 785 metabolites | Understanding and engineering the metabolism of the strain
Metabolic network [294] | Lactobacillus plantarum WCFS1 | 3009/721 genes (genome/reconstruction), 554 unique metabolites and 761 reactions, 643 reactions and 531 metabolites | Analysis of the physiology of growth on a complex medium
pan-metabolic map [295] | Lactobacillus reuteri ATCC 55730 and ATCC PTA 6475 | The metabolic model of 6475 includes 563 genes, similar to the metabolic model of L. reuteri JCM 1112. The metabolic model of 55730 includes 623 genes | Define functional probiotic features
Metabolic network [296] | Lactococcus lactis ssp. lactis IL1403 | 2310/358 genes (genome/reconstruction), 422 unique metabolites and 621 reactions; a total of 621 reactions and 509 metabolites | Understanding of lactococcal metabolic capabilities
Metabolic network [297] | Lactococcus lactis subsp. cremoris MG1363 | 518 genes, 754 reactions and 650 metabolites | Analysis of flavour formation
iMA945 [298] | Salmonella typhimurium strain LT2 | 4489/619 genes (genome/reconstruction), 1036 unique metabolites and 1964 reactions | Salmonellosis food poisoning
STM_v1.0 [299] | Salmonella typhimurium strain LT2 | 4489/1270 genes (genome/reconstruction), 1119 unique metabolites and 2201 reactions | Salmonellosis food poisoning
Genome-scale model [300] | Streptococcus thermophilus LMG18311 | 1889/429 genes (genome/reconstruction) and 522 reactions, 1889 genes (or gene fragments), the total absolute numbers of reactions is 522 | Metabolic comparison of lactic acid bacteria
VvuMBEL943 [301] | Vibrio vulnificus strain CMCP6 | 2896/673 genes (genome/reconstruction), 765 unique metabolites and 943 reactions | Gastroenteritis
These models describe the metabolism of each species, and their integrated analysis allows the exploration of interactions between predominant bacteria in the gut ecosystem. For example, El-Semman and colleagues [283] reconstructed metabolic models for Bifidobacterium adolescentis L2-32 (iBif452) and Faecalibacterium prausnitzii A2-165 (iFap484), which enabled the study of the anti-inflammatory role that these microorganisms play in the gut ecosystem. A genome-scale metabolic model for Lactobacillus casei LC2W enabled the identification of essential amino acids and vitamins and the exploration of the biosynthetic potential of some metabolites [26]. Another reconstruction of B. adolescentis L2-32 and F. prausnitzii A2-165 models enabled in silico simulation of the metabolic crosstalk between the two species and highlighted the importance of acetate supply for butyrate production [27].
Likewise, the characterization of carbohydrate utilization in Bacteroides thetaiotaomicron, supported by genome-scale metabolic and regulatory reconstructions, prompted and refined specific functional assignments for sugar catabolic enzymes and transporters [144].
Many of the models described above were obtained using similar reconstruction pipelines and therefore share some data resources and simulation tools. Genome sequence data (from the NCBI Genome database [40]) are often the starting point, and draft reconstructions are obtained with the ModelSEED comparative genome annotation and analysis software [262]. The KEGG database is a useful resource for functional annotation [41], and the BiGG database is further used to assign reaction directionality [302, 303]. Tools such as GEMSiRV [253], Acorn [304], YANAsquare [305] and VANTED [306] are commonly used for this purpose. Finally, constraint-based computational techniques are used in a variety of model simulations. For example, the OptKnock algorithm [307] and the COBRA Toolbox [245] are frequently used in flux balance analysis, which enables the prediction of the phenotypic responses triggered by environmental factors (i.e. manipulation of cellular growth in silico) and additional metabolic profiling. A comprehensive description of the available genome-scale metabolic reconstruction procedures and pipelines can be found in recent reviews [29, 308].
Along the same line of research, but using different approaches, Bayesian inference of metabolic networks has been used to reveal a metabolic system with greater prevalence among IBD patients [309], and the construction and functional analysis of proteome interaction networks has enabled the analysis of nutrient-affected pathways in human pathologies [310].
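Because the reactions in a reconstruction are expected to obey mass conservation, a check commonly applied when curating a draft model is to verify that each reaction is elementally balanced. The sketch below shows one minimal way to do this, assuming that metabolite formulas are available as element counts. The data structures and identifiers are hypothetical and do not correspond to any of the tools or databases cited above.

```python
from collections import Counter


def is_mass_balanced(stoichiometry, formulas):
    """Return True if every element is conserved in the reaction.

    stoichiometry: metabolite -> coefficient (negative = consumed, positive = produced)
    formulas:      metabolite -> {element: number of atoms}
    """
    totals = Counter()
    for metabolite, coefficient in stoichiometry.items():
        for element, count in formulas[metabolite].items():
            totals[element] += coefficient * count
    return all(value == 0 for value in totals.values())


# Hypothetical identifiers; the formulas are those of the real compounds.
formulas = {
    "glucose": {"C": 6, "H": 12, "O": 6},
    "o2": {"O": 2},
    "co2": {"C": 1, "O": 2},
    "h2o": {"H": 2, "O": 1},
}

# Complete oxidation of glucose: C6H12O6 + 6 O2 -> 6 CO2 + 6 H2O
respiration = {"glucose": -1, "o2": -6, "co2": 6, "h2o": 6}
# The same reaction with one water molecule missing on the product side.
draft_with_error = {"glucose": -1, "o2": -6, "co2": 6, "h2o": 5}

balanced = is_mass_balanced(respiration, formulas)
unbalanced = is_mass_balanced(draft_with_error, formulas)
```

A fuller check would also track charge, and reactions that are unbalanced by design (exchange and biomass reactions) would be excluded.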
Gut microbiome community models
As more metabolic reconstructions of gut microbes become available, bioinformatics efforts are being directed towards the development of modelling frameworks for the systematic investigation of metabolic crosstalk in gut microbiome communities [311, 312]. Although existing single-species quantitative and computational approaches can be applied to microbial communities, extended community-centred approaches are being proposed to consider the impact that social traits (e.g. bacteriocin production, quorum sensing and other cell-to-cell interactions) may have in specific scenarios [313]. Such modelling of microbe communities should capture community structure, i.e. the interactions among microbes over time (community states). Specifically, each community state is described by measurements of community-level fluxes, abundances of species and knowledge of the metabolism of these organisms. In complex ecosystems, such as the human gut, this may imply millions of reactions, many of which are carried out in different species. Because the mathematical and computational modelling involved is too costly, alternatives based on coarse-grained models have been proposed, and recent reviews have described their rationale, pointing out their main strengths and drawbacks [314, 315].
The so-called ‘supra-organism’ approach combines all metabolic reactions into a single network to study the metabolic capacities of the community in terms of product and substrate variation [316]. Such an approach ignores the impact of species abundances and the interactions between community members while enabling the optimization of community-level objectives (i.e. prediction of important environmental conditions). The steady-state compartmentalized approach models each organism in the microbial community as a single constraint-based model (i.e. with its own objective function), nested within a global ecosystem model that represents the exchange of metabolites between the species.
In this compartmentalized setting, the aim is to maximize the objective function of the ecosystem and thus enable the study of host–microbe and microbe–microbe interactions [315]. Although initially neglected, biomass concentrations of individual species are now also taken into account, which allows for the determination of accurate quantitative transfer rates [317]. The dynamic compartmentalized approach goes one step further and uses the kinetics of substrate uptake and metabolite exchange between species to grasp ecosystem structure and functionality [318, 319]. Specifically, this approach accounts for changes in the biomass concentrations of individual species over time, which allows the simulation of interactions that may alter the community state. Furthermore, by complementing in silico metabolic network models with metagenomics-based compositional data, it is possible to predict levels of competition and complementarity among microbiome species and compare predicted interaction measures to species co-occurrence, specifically to study microbiome assembly according to habitat filtering [320]. Several gut microbiome community models already exist. Constraint-based multi-species modelling has been used to predict the effects of environmental constraints, namely, different dietary regimes as well as anoxic and oxic conditions, in the human gut ecosystem [321]. On the other hand, integer linear programming has been used to seek ways to shift target communities towards preferred states, i.e. minimal sets of microbial species that collectively provide the enzymatic capacity required to synthesize a set of desired target products from a predefined set of available substrates [322]. One example is the in silico design of faecal microbiota transplants, in which synthetic communities are engineered to mimic a healthy gut and thus ameliorate the condition of patients with dysbiotic guts.
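The search for minimal species sets described above can be illustrated on a toy example. The cited work uses integer linear programming; the sketch below swaps in a plain exhaustive enumeration over species subsets of increasing size, which is only practical for a handful of species. All species names, reactions and metabolites are hypothetical.

```python
from itertools import combinations


def producible(reactions, substrates):
    """Metabolites reachable from `substrates` with (inputs, outputs) reactions."""
    available = set(substrates)
    changed = True
    while changed:
        changed = False
        for inputs, outputs in reactions:
            if set(inputs) <= available and not set(outputs) <= available:
                available |= set(outputs)
                changed = True
    return available


def minimal_communities(species_reactions, substrates, targets):
    """Return all smallest species sets whose pooled reactions yield `targets`.

    Brute-force stand-in for an integer linear programme: subsets are tried
    in order of increasing size and the search stops at the first size that
    succeeds.
    """
    names = sorted(species_reactions)
    for size in range(1, len(names) + 1):
        hits = []
        for combo in combinations(names, size):
            pooled = [rxn for name in combo for rxn in species_reactions[name]]
            if set(targets) <= producible(pooled, substrates):
                hits.append(combo)
        if hits:
            return hits
    return []


# Hypothetical candidate species and their enzymatic capacities.
species_reactions = {
    "sp_A": [(("fibre",), ("acetate",))],
    "sp_B": [(("acetate",), ("butyrate",))],
    "sp_C": [(("fibre",), ("lactate",))],
    "sp_D": [(("fibre",), ("acetate",)), (("acetate",), ("propionate",))],
}

designs = minimal_communities(
    species_reactions,
    substrates={"fibre"},
    targets={"butyrate", "propionate"},
)
print(designs)
```

No single species covers both targets in this example, and the only pair that does is sp_B with sp_D. An integer linear programme expresses the same question with one binary variable per species, which scales to far larger candidate pools than enumeration can handle.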
Faecal microbiota transplants have shown promising results for addressing recurrent Clostridium difficile infections and other gut disorders, including IBD [323]. Computational tools such as PathPred [324] and Computation of Microbial Ecosystems in Time and Space (COMETS) [325] are being used in the study of community-level biotransformation. PathPred uses the KEGG RPAIR database, a collection of biochemical structure transformation patterns and chemical structure alignments of substrate–product pairs, to predict plausible pathways for multistep reactions. COMETS enables computer simulations of metabolism in spatially structured microbial communities using dynamic flux balance analysis. To facilitate the visual exploration of the metabolic interactions between microbes in a community, e.g. as predicted by COMETS, tools like VisANT 5.0 [326], MetDraw [327], Cellular Overview [328], FBA-SimVis [251] and SurreyFBA [266] have been developed.

Host–microbe models

Typically, the characterization of host–microbe interactions entails the integration of a human metabolic reconstruction (or a mouse reconstruction) with one or various microbe metabolic reconstructions. These models are useful to gain a deeper understanding of host–microbe symbiosis in the scope of metabolic disorders, and thus may offer valuable insights into diet modulation and the benefits of probiotics. For example, a ‘meta-metabolome’ network describing the interactions between the human host and three predominant phyla of gut bacteria, namely, Firmicutes, Bacteroidetes and Actinobacteria, shed light on cross-feeding relationships between some gut microbe enzymes and host carbohydrate metabolism enzymes [329]. The genome-scale metabolic reconstruction of B. thetaiotaomicron iAH991 was integrated with the mouse metabolic reconstruction iMM1415, in an effort to characterize intestinal transport and absorption reactions.
The resulting model (iexGFMM_BΘ) comprises 7239 reactions, 5164 metabolites and 2769 genes, and was used to simulate the effect of different dietary regimes in both the host and the microbe [284]. Similarly, a metabolic reconstruction of human small intestinal epithelial cells, named hs_sIEC611, supported the study of microbe–microbe interactions in the presence/absence of the human host [330]. The first constraint-based host–microbial community model was recently published [311]. This model encompasses the most comprehensive model of human metabolism (Recon2) and 11 manually curated and validated metabolic models of commensals, probiotics, pathogens and opportunistic pathogens, with over 2000 exchanges representing metabolic functions in humans. It was used to predict potential metabolic host–microbe interactions under four in silico dietary regimes, which varied in carbohydrate, fat and protein intake.

Network mining

The construction of microbial function networks is often sought as a means of identifying co-occurrence of microbial species in humans. For example, a protein–protein interaction network supported the study of potential dietary interventions targeting short-chain fatty acid metabolism; specifically, the analysis of topological metrics enabled the identification of the most vulnerab