TY - JOUR AU - Angione,, Claudio AB - Abstract Metabolic modelling has entered a mature phase with dozens of methods and software implementations available to the practitioner and the theoretician. It is not easy for a modeller to be able to see the wood (or the forest) for the trees. Driven by this analogy, we here present a ‘forest’ of principal methods used for constraint-based modelling in systems biology. This provides a tree-based view of methods available to prospective modellers, also available in interactive version at http://modellingmetabolism.net, where it will be kept updated with new methods after the publication of the present manuscript. Our updated classification of existing methods and tools highlights the most promising in the different branches, with the aim to develop a vision of how existing methods could hybridize and become more complex. We then provide the first hands-on tutorial for multi-objective optimization of metabolic models in R. We finally discuss the implementation of multi-view machine learning approaches in poly-omic integration. Throughout this work, we demonstrate the optimization of trade-offs between multiple metabolic objectives, with a focus on omic data integration through machine learning. We anticipate that the combination of a survey, a perspective on multi-view machine learning and a step-by-step R tutorial should be of interest for both the beginner and the advanced user. poly-omic, genome-scale models, metabolic models, flux balance analysis, data integration, multi-objective optimization Introduction Metabolism is the indispensable set of biochemical reactions in a cell that maintain its living state. Constraint-based reconstruction and analysis (COBRA) are a group of techniques that are commonly used for the mathematical and computational modelling of metabolic networks at the whole-genome scale. Genome-scale metabolic models are available in online repositories such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) [1], the Biochemical Genetic and Genomic (BiGG) knowledge-base [2], the BioCyc collection of pathway/genome databases [3], MetaNetX [4] and the ModelSEED database [5]. Principally, the preparation of a genome-scale metabolic model involves the reconstruction of all metabolic reactions taking place in the organism supplemented with functional annotation of genes, metabolites and pathways. Depending on the quality of the reconstruction, processes of manual curation and gap-filling may also be required [6]. Predictions obtained from genome-scale metabolic models can be reconciled with invivo findings and used to identify current gaps in our knowledge of metabolism [7]. There are often inconsistencies between models and experimental data, such as when an outcome is falsely predicted by the model (false positive) or when an experimentally observed outcome is not predicted (false negative). Algorithms such as Grow Match [8], SMILEY [9] and optimal metabolic network identification (OMNI) [10] correct for such inconsistencies by suggesting adjustments for improving model accuracy. Reducing disparity between predicted and experimentally measured fluxes presents opportunities to devise new strategies for biological discovery [11]. GapFind and GapFill are jointly designed optimization procedures to identify ‘problem’ metabolites that cannot be produced or consumed in the network and then propose mechanisms to restore pathway connectivity for these metabolites [12]. fastGapFill is an extension of FASTCORE, which incorporates flux and stoichiometric consistency into the gap-filling process [13]. Metabolic reconstruction via functional genomics (MIRAGE) conducts gap-filling by integrating with functional genomics data to estimate the probability of including each reaction from a universal database of gap-filling reactions in the reconstructed network [14]. This enables selection of the set of reactions whose addition is most likely to result in a fully functional model when flux analysis is repeated. Many models also integrate signalling and regulatory pathways with metabolic networks to add information regarding underlying mechanisms, consequently improving flux predictions [15]. Here, we present the foundations of constraint-based metabolic modelling as well as recent advances, in the form of a ‘forest’ of analytical tools and methods comprising algorithms and their software implementations. As such techniques are likely to expand and diversify over time, this schematic is also available in interactive version at http://modellingmetabolism.net, where it will be updated as newer methods are developed. We believe that classifying existing methods by their purpose or mode of implementation and defining their strengths and limitations will greatly facilitate the selection of methods for prospective modellers. In this regard, authors of new tools and methods are invited to contact us to include these in the interactive version of our figure. In the following sections, we describe the main approaches currently used for constraint-based metabolic modelling, with the inclusion of many recent developments, which we consider to be significant. These methods are divided into unbiased and biased approaches, the latter of which includes (i) a comprehensive review of flux balance analysis (FBA) and its specific variants, which apply different types of constraints for the prediction of metabolic fluxes, (ii) regulatory methods, for which constraints are derived from external sources for designing context- or condition-specific metabolic models, and (iii) methods for the simulation of genetic perturbations and selection of the objective function. Following this, a detailed discussion of methods for performing multi-objective optimization forms the basis of our tutorial for genetic design by multi-objective optimization (GDMO) in R. Finally, we include a perspective that evaluates the potential use of multi-view machine learning techniques for the analysis of multi-omic metabolic models, which we regard as an important venture for the future of metabolic modelling. Methods for constraint-based metabolic modelling Figure 1 depicts the ‘forest’ of methods commonly used for constraint-based modelling of metabolic networks, following and updating the framework proposed by Lewis et al. [16]. Methodological approaches are broadly divided into biased and unbiased; the former necessitates the definition of an objective function by the network, whereas the latter relies on determining a subset of statistically analysable functional states whilst searching the entirety of the solution space [17]. Figure 1 View largeDownload slide A forest of methods based on COBRA. Network-based pathway analysis describes the simplest configurations of metabolic pathways at steady state. Monte Carlo methods allow for uniform sampling of the solution space to compute the flux as a probability distribution for each pathway in the network. Thermodynamically infeasible fluxes may be eliminated using loop removal or thermodynamic parameter-based analysis. Model refinement may be carried out through the selection of single or multiple objectives or gap-filling techniques. Gene perturbation helps to establish the essentiality of genes and the most efficient pathways for the production of specific metabolites. Following this, strain design for metabolic engineering may involve strategies for gene deletion, reaction perturbation, or reaction addition. Alternate optimal solutions may be yielded by variations of FBA. Other variants may specify the addition of specific constraints or their removal. Additional constraints may be introduced through the inclusion of multi-omic data or gene regulatory mechanisms. Note that some methods may belong to multiple categories, but for clarity, we classify each method according to its main contribution. An interactive version of this figure is maintained and updated at http://modellingmetabolism.net. Figure 1 View largeDownload slide A forest of methods based on COBRA. Network-based pathway analysis describes the simplest configurations of metabolic pathways at steady state. Monte Carlo methods allow for uniform sampling of the solution space to compute the flux as a probability distribution for each pathway in the network. Thermodynamically infeasible fluxes may be eliminated using loop removal or thermodynamic parameter-based analysis. Model refinement may be carried out through the selection of single or multiple objectives or gap-filling techniques. Gene perturbation helps to establish the essentiality of genes and the most efficient pathways for the production of specific metabolites. Following this, strain design for metabolic engineering may involve strategies for gene deletion, reaction perturbation, or reaction addition. Alternate optimal solutions may be yielded by variations of FBA. Other variants may specify the addition of specific constraints or their removal. Additional constraints may be introduced through the inclusion of multi-omic data or gene regulatory mechanisms. Note that some methods may belong to multiple categories, but for clarity, we classify each method according to its main contribution. An interactive version of this figure is maintained and updated at http://modellingmetabolism.net. Unbiased methods Unbiased methods search the entirety of the solution space and find a subset of statistically analysable functional states without requiring the definition of an objective function. Network-based pathway analysis comprises a large family of unbiased methods assessing the main properties of biochemical pathways [18]. Gene Association Network-based Pathway Analysis (GANPA) improves on this process by adding gene weighting to determine gene non-equivalence within pathways [19]. Similarly, a novel method was recently proposed for assessing the significance of pathways by constructing weighted gene–gene interaction networks for normal and cancerous tissue samples [20]. These interaction networks were subsequently used to expand pathways for each set of samples and compare their topologies. Approaches based on network-based pathway enrichment analysis aim to identify a greater number of gene interactions. For example, NetPEA uses a protein–protein interaction network combined with random walk to include information from high-throughput networks as well as known pathways [21]. A combination of network estimation with condition-specific omic data has been used to create the NetGSA framework, thereby improving the ability to detect differential activity in pathways [22]. Using network-based pathway analysis, different methods may be used to calculate a set of routes through the reaction network and the corresponding kernel matrices which represent their stoichiometry. Elementary flux modes (EFMs) describe the minimal, non-decomposable set of pathways operating within a steady-state system; these are found by solving the steady-state condition following the iterative removal of single reactions until a valid flux distribution can no longer be calculated [23]. As this process often yields a combinatorial explosion of common functional motifs, a variation of the Agglomeration of Common Motifs (ACoM) method can be used to cluster these motifs, allowing for an overlap between classes [24]. Alternatively, a single EFM may be determined by solving an optimization problem using EFMevolver [25], which can draw attention to significant EFMs. A method known as K-shortest EFMs enumerates EFMs of their number of reactions and has been applied to genome-scale networks [26]; the shortest pathways are of interest, as they typically carry the highest flux and are easily manipulable. A minimal generating set is the smallest set necessary to define the geometry of the flux space using a null-space algorithm, the elements of which are known as generating flux modes (GFMs) or minimal generators [27]. A variant of this method can be used to find specific subsets of GFMs in a process that does not result in a combinatorial explosion, thus making them easier to compute [28]. Methods using minimal descriptions of the flux cone, such as minimal generators [29] and minimal metabolic behaviours [30], aim to reduce the dimensionality of the flux cone. A recent method prioritizes the search for the shortest path between a pair of end nodes based on graph theory [31]. However, the validity of this approach has been questioned, as reaction stoichiometry is overlooked [32]. Thermodynamic elementary flux mode analysis (tEFMA) [33] removes thermodynamically infeasible EFMs using network-embedded thermodynamic (NET) analysis [34]. The incorporation of thermodynamic constraints helps to select for physiologically significant EFMs, which become more difficult to detect as the size and complexity of networks increases. Identifying the largest thermodynamically consistent sets in these EFMs can further characterize condition-specific metabolic capabilities in the thermodynamically feasible regions of the flux cone [35]. Extreme pathways can be described as being a systemically independent subset of EFMs [18]. They are characterized by a set of convex basis vectors used to represent the edges of the steady-state solution space and consist of the minimum number of reactions needed to exist as a functional unit [36]. As opposed to many of methods described previously, extreme currents aim to increase dimensionality by describing non-decomposable EFMs situated both within and on the boundaries of the flux regions; these arise as a result of partitioning each reversible reaction into two irreversible reactions [37]. Minimal cut sets (MCSs) are another variant of EFMs that result in inactivity of the system with respect to the objective reaction if removed [38]. Therefore, they can be used to identify target genes and repress undesirable metabolic functions, while assessing the effect on the structure of the entire metabolic network. Elementary flux patterns (EFPs) define all potential elementary routes for steady-state fluxes as sets of indices, and can be mapped to EFMs to include factors such as pathway interdependencies, thus taking the entire network into account [39]. Frameworks that combine various computational approaches for synthetic pathway prediction, such as GEM-Path [40] and BNICE [41], are increasing in number, as they provide the opportunity to calculate all possible paths and score them by efficiency [42]. Unbiased methods may also incorporate Monte Carlo sampling, message passing algorithms [43] or symbolic flux analysis [44]. The Markov chain Monte Carlo (MCMC) method can be used to uniformly sample metabolic networks from a genotype space, producing a sequence of viable genotypes (or reaction subsets) by performing a reaction swap between each genotype and its successor; if a swap results in a non-viable genotype, this sequence will remain at the previous genotype for that step and the process is repeated until a metabolic network with the correct number of reactions is reached [45]. A Monte Carlo-based technique has also been used for uniform sampling of feasible steady states in an ellipsoid representing the solution space for a genome-scale metabolic model of Escherichia coli [46]. A revision of this method was proposed with rounding procedures to improve performance by eliminating ill-conditioning when sampling convex polytopes of steady states [47]. Of all omic data types, metabolomic data are said to give the closest indication of observed phenotypes [48]. Therefore, extracellular metabolomic measurements can help to predict intracellular flux states by integrating these data into a constraint-based framework using a sampling-based network approach [49, 50]. The MetaboTools toolbox provides a workflow for integrating metabolomic data into multi-omic models and predicting metabolic phenotypes through analysing how metabolite uptake and secretion differ between conditions [48]. Biased methods Biased methods rely on the definition of an objective function to solve the metabolic network and find its flux rates. For instance, standard FBA belongs to this class of methods. FBA and its variants Among the biased methods, the most well-known technique is FBA, which uses the assignment of stoichiometric coefficients in a matrix to represent the metabolites involved in any given reaction in a metabolic network [51]. Through these coefficients, constraints can be imposed on the system to identify all potential flux distributions associated with a corresponding set of feasible phenotypic states. The aim of FBA is to locate a value (or set of values) in the solution space that best satisfies a given objective function. FBA uses linear programming to solve this objective function, indicating the extent to which each reaction in the network contributes to a phenotypic state. If two objectives (flux rates or linear combinations thereof) were to be maximized, a multi-level linear problem would be formulated as follows: max g⊺vsuch that max f⊺vsuch that Sv=0vmin ≤v≤vmax (1) where f and g are n-dimensional arrays of weights associated with the first and second objectives, respectively, and indicate the contribution of the reaction fluxes v to each objective. vmin and vmax are vectors representing the lower and upper limits for the flux rates in v. A constraint in FBA postulates that the total amount of any metabolite being produced must be equal to the total amount of that metabolite consumed [52]. The most common objective function computed by FBA is the synthesis of biomass, which is commonly used to indicate cellular growth rate and predict product yields [53]. Fluxes can either be calculated under the steady state assumption or in a dynamic state, where changes in specific concentrations and kinetics parameters have been recorded for each metabolite over time [e.g. for dynamic flux balance analysis (DFBA)] [54]. Experimental validation of model predictions for DFBA are often obtained from 13C metabolic flux analysis [55], which uses isotopic-labelling of metabolic substrates to quantify intracellular fluxes. Additionally, methods such as dynamic multi-species metabolic modelling have been used to examine inter-species competition for metabolites in a microbial community [56]. Numerous modifications of FBA propose the application of various constraints to shrink the solution space for determining the precise flux state of the cell by calculating the optimal set of solutions for a given objective via linear programming. In most instances, constraints are defined by cell and reaction stoichiometry, fluxes through transport and metabolic reactions, upper and lower bounds for each flux, biomass composition and ATP requirements [57]. Upper and lower bounds can be estimated using flux variability analysis [58], which returns the maximum and minimum fluxes through each reaction while maintaining minimal biomass production [59]. Linear thermodynamic constraints can be applied in thermodynamic metabolic flux analysis (or thermodynamics-based FBA) and thermodynamic variability analysis to eliminate thermodynamically infeasible reactions or loops from pathways and gather information on feasible metabolite activity and Gibbs free energy changes [60, 61]. The removal of thermodynamically infeasible loops is necessary to prevent violating the loop law, which states that there is no net flux through balanced biochemical loops in networks at steady state [62]. Loopless COBRA methods solve a modified mixed-integer problem with the added constraint of no network fluxes containing loops; application of this constraint has been described for FBA, flux variability analysis (FVA) and MCMC sampling [63]. FBA with thermodynamic constraints has also been described in energy-balance analysis [64]. Fast flux variability analysis with thermodynamic constraints thermodynamically-constrained flux variability analysis (tFVA) removes unbounded fluxes from biochemical loops arising from non-zero, steady-state fluxes involving internal reactions [65]. This is a faster implementation of FVA that does not require the specification of metabolite concentrations or additional experimental data, although these have been included in other variants of the method. Parsimonious FBA identifies a subset of genes contributing to maximizing the growth rate insilico, therefore enabling maximization of stoichiometric efficiency [66]. Another technique uses conditional dependencies present in the metabolic model as constraints for each flux, whereby each flux is constrained by the activity of the compound that facilitates it. This technique is known as conditional FBA and has proved to be effective for simulating phototrophic growth and diurnal dynamics in cyanobacteria [67, 68]. Resource balance analysis uses growth rate limitation caused by distribution of proteins between cellular processes to constrain flux predictions [69]. Constrained allocation flux balance analysis applies a genome-wide constraint on fluxes to observe proteome allocation between ribosomal, transport and biosynthetic proteins [70]. For this method, growth laws governing the synthesis of intracellular proteins are used to design parameters for predicting levels of protein expression and energy production. To improve the prediction of internal fluxes, cost-reduced sub-optimal FBA minimizes protein and thermodynamic costs to simulate a sub-optimal state [71]. Linear metabolite dilution FBA forces dilution in metabolites associated with growth in active reactions, by adding a small dilution flux to block metabolic pathways without input fluxes [72]. Although more time-consuming than a linear programming approach, Bayesian flux estimation results in a probability density function, which is more stable and informative than a simple point estimate [73]. The METABOLICA statistical framework uses a Bayesian approach to performing FBA. Metabolism is modelled in a multi-compartment macroscopic model with a stochastic extension of the stationary state and the Bayesian inference problem is solved by computing posterior probability densities using MCMC sampling [74]. Alternatively, standard constraints may be removed to construct new FBA methods to improve flux predictions for non-steady-state or non-wild-type cells. Relaxing the assumption of fixed reactant proportions for biomass production is the basis of flexible FBA, which can be coupled with relaxing the fixed ratio of byproduct to reactant (known as time-linked FBA) to observe transitions between steady states [75]. Combining these methods also enables the comparison of metabolite production between knockout mutants. To further increase the informativity of constraint-based models, the integration of information from regulatory pathways and external multi-omic data is described in the following section. Regulatory methods to generate context-specific metabolic models Regulatory methods can be used to set constraints for FBA, which incorporate regulatory networks, as well as introducing external omic data, which provides the opportunity to simulate metabolism under specific genetic or environmental conditions. Steady-state regulatory flux balance analysis (SR-FBA) is used to quantify the extent to which metabolic and transcriptional regulatory constraints affect the state of flux activity for various metabolic genes [76]. SR-FBA allows for improved characterization of steady-state metabolic behaviour compared with regulatory FBA, which chooses a single steady state per time interval from all possible solutions to find the flux distribution consistent with the regulatory state of each interval. Integrated FBA incorporates metabolic, regulatory and signalling pathways in the FBA model to enable thorough characterization of dynamic-state metabolic behaviour [77]. Integrated dynamic FBA additionally couples fast and slow reactions to give quantitative time-variant flux predictions [78]. However, many of these approaches are limited by the Boolean logic formalism, which restricts the definition of gene activity to an on/off state. This disadvantage can be overcome by using conditional probabilities to represent gene states and gene-transcription factor interactions when combining high-throughput data with regulatory networks, as demonstrated by PROM [79]. This approach allows for a greater number of interactions between metabolic models and their respective transcriptional regulatory networks to be recorded, as they are quantified automatically [80]. Other methods take a differential (rather than absolute) approach to gene expression analysis where gene expression levels are classified as belonging to one of the three states: overexpressed, unchanged or underexpressed [32]. FlexFlux jointly analyses multi-state regulatory networks and metabolic pathways, both of which contribute to calculation of flux [81]. On constructing an initial regulatory network, qualitative states evolving towards an ‘attractor’ set are converted into user-defined continuous intervals, thus allowing reactions to be constrained by multiple flux values, rather than one single value. The multi-formalism interaction network simulator provides a platform for combining multiple kinetic models with signalling and regulatory networks, omic data integration algorithms and steady-state FBA with linear inhibitor and activator constraints [82]. In this method, a quasi-steady-state Petri net is used to illustrate interactions between different networks [83]. Transcriptional-controlled FBA uses constraints between pairs of conditions based on gene expression data for the optimization of FBA, considering both fold change and absolute change in expression to minimize noise [84]. Recently, a new method called transcriptional regulated flux balance analysis (TRFBA) has been introduced for incorporating expression data as well as transcriptional regulatory networks to simulate growth under various environmental and genetic perturbations [85]. TRFBA applies two unique linear constraints. First, reaction rates are limited using a constant that sets expression levels equal to the upper bounds of reactions; secondly, the expression level of each gene is correlated with the expression of the regulating genes. One important advantage of TRFBA is the ability to improve predictions of growth without requiring detailed information about transcriptional regulators and their target genes. For the integration of transcriptomic profiles with metabolic networks, gene expression measurements can be obtained from microarray and/or RNA sequencing data stored in public repositories, such as the Gene Expression Omnibus (GEO) [86] or Array Express [87], to examine gene activity across various conditions. In addition to these primary archives, there are also databases with additional data processing and annotation [88] (such as information regarding gene regulation or differential expression under various conditions), e.g. Gene Expression Atlas [89], or the web servers Gene Chaser [90] and Profile Chaser [91], which query GEO. There are also many specialized databases providing functional genomic data relating to a particular disease, species or tissue type, such as Oncomine (for cancer-specific microarrays) [92], the MGI Mouse Gene Expression Database [93] or the Pancreatic Expression Database [94]. The generation of context-specific metabolic models may be divided into two main approaches: switch-based and valve-based methods. Switch-based methods for omic integration Switch-based algorithms remove inactive or lowly expressed genes by setting the corresponding reaction boundaries to zero before FBA is performed [95]. For instance, the algorithm for gene inactivity moderated by metabolism and expression (GIMME) finds a flux distribution, which optimizes a given objective and avoids the use of so-called ‘inactive’ reactions below a certain transcription threshold [96, 97]. The main advantage of GIMME is that it can re-enable flux associated with false-negative values in inactive reactions and record consistency between gene expression data and the predicted flux distribution for a given objective [98]. Gene inactivation moderated by metabolism, metabolomics and expression is an extended version of GIMME, which also incorporates metabolomic data in the form of turnover metabolites added as products to each reaction, along with a corresponding sink reaction [99]. This allows for the computation of turnover flux ranges for metabolites. Tissue-specific gene and protein expression values can be integrated into genome-scale metabolic models accounting for different metabolic objectives at the cellular level [100], to extract information regarding the uptake and secretion of metabolites by specific tissue and cell types. For this, tissue-specific variations in enzyme expression levels are used to inform the likelihood of enzymes supporting flux in their associated reactions by categorizing gene-to-reaction mapping for each reaction in the model corresponding to the level of gene expression (i.e. high, moderate or low expression). Subsequently, fluxes corresponding to high gene expression are maximized and those corresponding to low gene expression are minimized when solving a mixed-integer linear program [101]. This process has been developed into the Integrative Metabolic Analysis Tool (iMAT) [102], which displays the most likely predicted metabolic fluxes corresponding to reactions in metabolic models. This tool enables the definition of a biological objective to be dependent on the requirements of each cell rather than the entire organism. An extension of iMAT known as the exploration of alternative metabolic optima enables the design of condition-specific metabolic models for human tissues [103]. Similarly, the integrative network inference for tissues (INIT) algorithm uses tissue-specific information collected from the Human Protein Atlas to help incorporate transcriptomic and proteomic data into a genome-scale model and produce cell-type-specific metabolic networks [104]. These data form the input for a mixed-integer linear problem, which modifies the steady-state condition by setting a small positive net accumulation rate for internal metabolites [105]. Net productions of these metabolites are assigned positive weights, corresponding to arbitrary scores for the level of protein expression. An updated version of INIT known as the task-driven integrative network inference for tissues (tINIT) was devised to identify structural analogs to metabolites (so-called ‘antimetabolites’) and a core set of metabolic tasks to be included in the model [106]. tINIT prevents simultaneous flux in reversible reactions and allows the user to decide whether net production of all metabolites should be considered. The algorithm for metabolic adjustment by differential expression compares the fold changes of gene expression values between conditions to intuitively predict the most consistent and statistically significant metabolic adjustments [107]. The fold changes are expressed as a series of binary expression states, for which differences between successive states most closely mirror corresponding differences in the mean expression levels. Valve-based methods for omic integration Unlike switch-based methods, Valve-based algorithms reduce (or increase) the activity of lowly (or highly, respectively) expressed genes by adjusting the upper and lower bounds for their corresponding reactions. This is usually proportional to the normalized expression of the genes associated with those reactions before performing FBA [95]. Such methods include E-flux [108], E-flux2 [109], metabolic and transcriptomics adaptation estimator (METRADE) [110], FALCON [111] and PROM [79]. For valve-based methods, gene expression data are not discretized as in switch-based methods. Data from methods that treat gene expression as relative as opposed to absolute are more indicative of protein concentrations, as levels of transcription are more comparable across genes [112, 105]. In E-flux, flux boundaries are tightly constrained when gene expression is low but relaxed when gene expression is high [108]; transcript levels can be used to set an upper bound for the maximum production of enzymes and therefore constrain all reaction rates [97]. To improve this formulation, E-flux2 adds minimization of the Euclidean norm of the measured flux vector, thus generating a unique solution [109]. Personalized reconstruction of metabolic models (PRIME) creates cell-specific models incorporating both transcriptomic and phenotypic data, and only modifies the bounds of a small set of reactions within a predefined range [113]. Expression data-guided flux minimization (E-Fmin) is similar to GIMME in that it minimizes a sum of fluxes where weights are a function of gene expression level; however, biomass production is forced to carry non-zero flux (i.e. metabolic activity is not threshold-dependent) and all reactions are thermodynamically feasible owing to flux minimization [114]. FALCON is a novel algorithm that estimates enzyme abundances using gene-protein-reaction (GPR) rules in the model, thus improving the predictive capability of models integrated with expression data [111]. Within a multi-omic model, METRADE constructs a Pareto front to identify the best trade-off when multiple objectives are simultaneously optimized [110]. Transcriptomic data comprising gene expression profiles and codon usage arrays can be mapped to a phenotypic objective space where each profile is associated with a condition [115]. In this way, the identification of optimal metabolic phenotypes is facilitated through the concurrent maximization or minimization of multiple metabolic markers and comparison of predicted flux rates between objectives. Network pruning and other methods In addition to switch- and valve-based integration methods, there are also pruning methods such as MBA [116], FASTCORE [117], mCADRE [118] and OnePrune [72], which only retain a core set of reactions in the metabolic model. FASTCORMICS is a faster adaptation of FASTCORE, which facilitates data integration by preprocessing and produces multiple metabolic models [119]. Similarly, the cost optimization reaction dependency assessment (CORDA) algorithm performs a four-step dependency assessment before calculating flux while minimizing cost production, i.e. using as many high-confidence reactions as possible and minimizing the involvement of absent reactions [120]. The cost of each reaction in the network is represented by the addition of a pseudo-metabolite as a product. CORDA is quicker to implement than many other pruning methods owing to its use of FBA to calculate flux. On the other hand, there are many methods that do not fit into the aforementioned categories (switch/valve/network pruning), as they use more unconventional approaches for omic data integration. Similar to E-flux2, the regularized context-specific model extraction method (RegrEx) is based on principles of regularized least squares optimization by minimizing the squared Euclidean distance between fluxes and experimental data [121] to calculate fluxes, which are independent of user-defined parameters. Instead of assigning expression measurements to individual genes or reactions, the GC-Flux algorithm splits GPR strings for each reaction into functional gene complexes to overcome the assumption of proteins catalysing more than one reaction at a time [122]. Another recent development is the use of the Huber penalty convex optimization function combined with flux minimization, to achieve a more accurate prediction of fluxes, which are closer to experimentally measured values [123]. This method introduces continuous gene expression values in the form of both constraints and target equations, without the need for definition of a biomass objective function or expression thresholds. A novel method known as omFBA [124] uses a phenotype-match algorithm to formulate the optimal objective function, i.e. the function that yields the most accurate estimations of the observed phenotypes. This objective can be simultaneously correlated with multiple omic data types via regression analysis to generate an omics-guided objective function, consequently resulting in a clearer correlation between genotype and phenotype and improved phenotypic predictions. Genetic perturbation and objective function selection Genetic perturbation is an important tool for establishing gene essentiality and maximizing pathway efficiency. Deciding on the number of gene perturbations to be performed depends on multiple factors. Choosing to perform single or pairwise gene perturbations one-by-one may fail to capture the essentiality and function of that gene as a result of genetic redundancy (i.e. there may be multiple genes encoding the same function). However, concurrently knocking out multiple genes can cause issues related to scaling unless coupled with e.g. Shapley value analysis, which assigns a contribution value to each gene knockout in the system [125]. Synthetic lethality can be described as the simultaneous inactivation of a set of non-essential genes resulting in the death of a cell or organism [126]. Knocking out multiple synthetic lethal pairs for genome-scale metabolic models can help in analysing the structural robustness of metabolic networks and identifying interdependencies among genes and reactions [127]. There are numerous algorithms for the detection and analysis of synthetic lethal pairs. Fast-SL is an algorithm capable of identifying higher-order lethal reaction and gene sets by taking GPR associations into account and vastly reducing the search space before iterating through the remaining combinations of genes/reactions [128]. MCSs can also be regarded as synthetic lethals, which can be targeted for drug therapies, as they constitute essential gene/reaction sets [129]. The data mining synthetic lethality identification pipeline (DAISY) statistically infers interactions between synthetic lethals using a combination of approaches: genomic survival of the fittest, short hairpin RNA (shRNA) based functional examination and pairwise gene coexpression [130]. A method for identifying dosage lethality effects in genome-scale models of cancer metabolism exploits synthetic dosage lethality to simulate the pairwise knockout of non-essential enzymes via overexpression of the first gene and underexpression of the second [131]. Flux ratios can be applied as constraints for FBA using the flux balance analysis with flux ratios (FBrAtio) algorithm, which can be directly implemented into the stoichiometric matrix of genome-scale metabolic models [132, 133]. In FBrAtio, multiple enzymes compete for metabolic branch points in the network (known as critical nodes) [132], which specify how a substrate in the metabolite pool is distributed between competing reactions; this depends on factors relating to thermodynamics such as enzyme availability and downstream accumulation of reactive intermediates. The optimization of flux ratios for a particular phenotype can be achieved through partial knockdown, overexpression or total knockout of enzyme-coding genes [132]. As opposed to complete knockouts, performing gene overexpression or partial knockdown may prove to be useful for targeted reduction of expression levels. The minimization of metabolic adjustment (MOMA) method relaxes the assumption of optimal growth flux for gene deletions by solving a quadratic problem to optimize distance minimization in flux space [134]. This is because the minimal response to the perturbation is considered to be a more accurate estimate of the true flux state in the mutant [135]. Initially, the flux distribution for the mutant remains as close as possible to optimal flux for the wild-type and deviates to form a sub-optimal flux distribution between that of the wild-type and mutant. In this way, MOMA is able to predict phenotypic outcomes following knockouts more precisely than FBA. Using a mechanistic model of reaction rates, integrative omics metabolic analysis also solves a quadratic problem to deliver kinetically derived estimations of flux following genetic perturbation [136]. This is possible through integrating quantitative proteomic and metabolomic data into the model, which improves performance when compared with MOMA. Similarly, regulatory on/off minimization minimizes the number of significant flux changes following knockouts with respect to the wild-type [137]. This is achieved through redirecting flux through alternative pathways following knockout. Many gene perturbation experiments simulate knockouts under the assumption that there is no downstream effect on gene regulation [112]. In the RELATCH method, the principle of relative optimality is applied to predict how cells adapt to perturbations, by minimizing relative flux patterns and latent pathway activation with respect to a reference flux distribution [138]. As strains adapt to their perturbed state, they undergo regulatory and metabolic changes, represented by two parameters—one penalizing latent pathway activation and another limiting enzyme contribution increases in active pathways. In varying environmental conditions, Bayesian factor modelling can be used to elucidate pathway cross-correlations and identify degrees of pathway activation [139]. Unconventionally, the REMEP method considers the impact of perturbations on metabolite as well as flux patterns [140]. This leads to improved flux predictions for knockout mutants as the structure of cellular regulation is represented more accurately. Multi-objective optimization of metabolic models It can be difficult to define the single most important objective in a biological system, as there are usually multiple conflicting cellular objectives in addition to the maximization of biomass, which is often used as a proxy for growth. Methods such as Bayesian objective function discrimination can be used for selection of the most suitable objective function by using a probabilistic approach to compare multiple objectives [141]. Alternatively, there are methods using lexicographic ordering [142] or calculation of a weighted sum to scalarize multiple objectives [143]; however, it can be difficult to select weights that elicit a uniform distribution of Pareto solutions and find solutions in non-convex regions [144–146]. Thus, multi-objective optimization arguably presents the most realistic representation of metabolic flux in biological systems by considering the contribution of a wide range of competing objectives. The rest of the section describes the main methods used to implement multi-objective optimization in metabolic models. Multi-objective optimization can be used to resolve trade-offs between conflicting metabolic objectives through simulating a series of optimal, non-dominated vectors f(x) in the multi-dimensional objective space. For such vectors, an improved solution does not exist for any given objective without sacrificing the performance of another [110]. This is known as a Pareto front and enables the simultaneous consideration of multiple conditions and constraints affecting each cellular objective [147]. For example, optimization of the objective function r through maximization or minimization of the vector function f(x) may be carried out respectively as follows: fi(x)>fi(x∗), ∀ i=1,…,rfi(x)% mutate(activation = fbar::gene_eval(expressions = geneAssociation, genes = names(genome), presences = genome ), activation = coalesce(activation,1), uppbnd = pmin(uppbnd,1000*activation + 0.1), lowbnd = pmax(lowbnd,-1000*activation-0.1))%>% fbar::find_fluxes_df}(do_minimization = FALSE)%>% mutate(lowbnd = ifelse(abbreviation== ’Biomass_Ecoli_core_w/GAM’, flux*0.99, lowbnd), uppbnd = ifelse(abbreviation== ’Biomass_Ecoli_core_w/GAM’, flux*1.01, uppbnd), obj_coef = 1*(abbreviation== ’EX_ac(e)’))%>% fbar::find_fluxes_df}(do_minimization = FALSE) return (list(bm = filter(res,abbreviation== ’Biomass_Ecoli_core_w/GAM’)$flux, synth = filter(res,abbreviation== ’EX_ac(e)’)$flux)) } Before proceeding with the genetic optimization portion of the procedure, we need to describe two helper functions to our slightly modified form of NSGA-II, termed non_dom_sort and crowding_distance. Note that the following descriptions refer to the full code provided in the complete tutorial, available as Supplementary Material. Non-domination sorting is the first stage of the selection procedure in NSGA-II. It sorts the points by multiple objectives to Pareto, or non-dominated fronts. These fronts are designed such that for every point in a front, there is no point in the same front or another front with a higher number such that the second point is better than the first in every objective (see Equation 2). This is calculated as follows: We compare every point against every other point. For each point (x), we see if there exists any second point (y) that has a higher value in all objectives. Where such a second point exists, we term the original point ‘dominated’. We find the set of points that have no dominating point, and term this the first non-dominated front. When two points are identical, they are both assigned to the same front. We repeat this procedure, but ignore points in the first non-dominated front to find the second non-dominated front, and so on. The second part of the NSGA-II evaluation procedure is finding the crowding distance. This is used to break ties between points in the same non-dominated front. For each front and for each dimension, this function sorts the points into order along the dimension, and finds the normalized distance between the proceeding point and succeeding point. These values are summed up across each dimension to find the value for the point. The following code is the genetic loop of the algorithm. It is explained by code comments, but follows a normal pattern of evaluating, sorting, selecting from and mutating the population. The genetic algorithm used here is a modified version of NSGA-II [151], with a population of 200 individuals and carrying out 500 iterations. Inside the loop, the steps are as follows: Evaluate all genomes: first, we use the evaluation function on each genome to find the resulting biomass and synthetic fluxes. Round the results: this is a useful tool to help the NSGA-II procedure by regarding similar results as identical, encouraging more variety in the results set. Label the results: labelling is required so that we can identify them after non-dominance sorting. Shuffle: shuffling the results is important because inevitably, some points will be completely identical, and we want to choose one at random in this case, rather than always pick the same result. Find the non-dominated fronts: assign a front number to each point, such that points with a lower front number are strictly superior to those with a higher one. Find the crowding distance: select for points with more variety. Sort by front, breaking ties by crowding distance: there are normally multiple points in each front, so the continuous crowding distance value is required to choose between these. Keep the best half of the population: using the label assigned earlier. Sample parents from population: use random sampling with replacement to find which members of the population are used as a basis for new members. Mutate parents to create offspring: add new members to the population by flipping 2% of the genes in the parents. Combine the offspring and parent populations: build the new population and repeat. start_genome <-set_names (rep_along(genes_in_model,TRUE),genes_in_model) pop <-list(start_genome) popsize = 200 generations = 500 for(iin1:generations){ results <-map_df(pop,evaluation_function)%>%# Evaluate all the genomes mutate(bm = signif(bm),synth = signif(synth))%>%#Round results mutate(id = 1:n())%>%#label the results sample_frac()%>%#Shuffle non_dom_sort()%>%#Find the non-dominated fronts crowding_distance()%>%#Find the crowding distances arrange(front,desc(crowding))#Sort by front, breaking ties by crowding distance selected <-results%>% filter(row_number() < = popsize/2)%>%#Keep the best half of the population getElement(’id’) kept_pop <-pop[selected] altered_pop <-kept_pop%>% sample(popsize-length(selected),TRUE)%>%#Sample parents from population map(function(genome){ xor(genome,runif(length(genome))>0.98)#Mutate parents to create offspring }) pop <-unique(c(kept_pop,altered_pop))#Combine the offspring and parent populations } Once we have a results set, we can construct a plot like Figure 3 to view the non-dominated fronts. We can see how the first front describes the trade-off between biomass and the synthetic objective, with the lines showing the dominated area (to the bottom left). This shows that even with a small population and a number of iterations we can see three high-quality trade-off points between the synthetic objective and biomass production, with biomass values of around 0.75, 0.63 and 0.38 h−1, and corresponding synthetic values of 6, 11 and 14 mmol h−1 gDW−1. Figure 3 View largeDownload slide Plot of a sample Pareto front showing the trade-off between the biomass (x-axis, h−1) and the synthetic objective (y-axis, mmol h−1 gDW−1). The area underlying each of the ten fronts of non-dominated points represents the number of metabolic configurations in the solution space that are supported when the objectives are maximized (or minimized). Figure 3 View largeDownload slide Plot of a sample Pareto front showing the trade-off between the biomass (x-axis, h−1) and the synthetic objective (y-axis, mmol h−1 gDW−1). The area underlying each of the ten fronts of non-dominated points represents the number of metabolic configurations in the solution space that are supported when the objectives are maximized (or minimized). Perspective: integration with multi-view machine learning approaches Multi-view learning can be described as a sub-division of machine learning methods that aims to merge different aspects of a common problem in a single setting. It is based on principles of maximizing the consensus between different viewpoints while offsetting the limitations of each view through complementation with the other views [190]. It is evident that this approach is highly applicable to the context of poly-omic data integration in genome scale metabolic models, owing to the interdependencies and correlations between all types of poly-omic data. At the very least, genomics, transcriptomics and proteomics are inextricably linked by the central dogma of molecular biology [191]. However, as omic types significantly differ in their scale and structure, the data are classed as heterogeneous, and several normalization measures would be required before mapping each omic as a layer in the metabolic model. Integration following the simultaneous analysis of multiple data types (known as meta-dimensional analysis) may be preceded by directly concatenating single sample matrices into one large matrix, transforming samples into intermediate graph or kernel matrices, or generating multiple models using different data types as training sets [192]. Data integration can be performed at an early, intermediate or late stage, depending on the nature of the data and the learning algorithm used. Early integration allows for the creation of a large pool of data before processing. Intermediate integration involves condensing each view into a similarity matrix with pairwise comparisons before a learning algorithm is applied. Late integration provides the opportunity to select the most suitable learning algorithm for each omic before merging data, comparing analyses between omics and linking patterns found in each omic [193]. As early integration increases data dimensionality rather than reducing it, intermediate or late integration would be preferred in this context to avoid introducing noise and decreasing performance before data transformation. Methods such as clustering and multiple kernel learning have successfully been implemented in cancer sub-typing on the basis of shared molecular characteristics, and are ideal for classifying unsupervised data into groups to detect underlying associations where there is little information available [194, 195]. K-means is a traditional clustering algorithm that finds the number of clusters minimizing the sum of the squared Euclidean distances between each observation and its respective cluster mean [196]. The algorithm starts by selecting k random points in the data set (termed cluster centroids), which define the groups that the remaining data points are assigned to. The centroids are then moved to the averages computed in each group, and the process is repeated until distinctive clusters are formed. A number of flaws can be identified when this process is applied to multi-view learning. The primary concern is that of a lack of consideration of the importance of each individual view, as well as the differences between multiple views. To address these issues, a bi-level variant of k-means clustering known as Tw-k-means was established, which added simultaneous weighting of both views and individual variables. This resulted in easier identification of the importance of variables and views, as well as in a decrease in the effect of low-quality views and noise [197]. An alternative multi-view clustering approach known as iCluster [198] has been developed to model cancer subtypes from the Cancer Genome Atlas. In iCluster, cancer subtypes are considered as latent variables, which could be estimated by taking differences between views into account when partitioning multidimensional data into disparate groups. If a clustering algorithm is chosen for the partitioning of poly-omic data, it is important to note that the clusters of genes or metabolites may vary depending on the condition modelled. Integration by matrix factorization compiles clusters into matrices, which are factorized to assess the contribution of each separate cluster to each view, as well as the overall contribution of each view [199]. Multiple kernel learning transforms data structures into kernel matrices and optimizes weight vectors that linearly combine these matrices to generate a unified kernel matrix. This facilitates the intermediate integration of data from different views, irrespective of the number of features used [195]. Another kernel-based technique known as similarity network fusion (SNF) [200] separately combines samples within each type of data to form individual networks. Such networks are iteratively integrated into a large, comprehensive network, mapped to the feature space in a non-linear fashion and used to assess the amount of information of each data type in explaining any similarity observed between the samples. In a modified version of SNF [201], a bias layer was introduced between omic layers to account for the varying quality of metabolic reconstructions and therefore assign larger weights to omic layers that contributed more to the phenotype. A support vector machine (SVM) is a supervised learning technique that, given a set of training examples, determines the optimal hyperplane to separate classes in the feature space whilst maximizing the distance between samples of different classes [202]. Linear models such as SVMs or the least absolute shrinkage and selection operator can be treated as classification or regression problems, respectively, for performing feature concatenation with heterogeneous features, such as those found in poly-omic data [203]. Decision tree-based methods are highly intuitive and allow the analysis of both continuous and discrete features in the same model without the need for data normalization. However, the high dimensionality of poly-omic data sets would make decision trees prone to noise and overfitting (especially if there are an insufficient number of features) [203]. This can be overcome by using ensemble learning methods such as random forest, which selects features at random as it constructs a decision tree. If a classification approach is taken, the most popular class is voted for following the generation of multiple trees; if the regression approach is taken, outputs from the multiple trees are averaged [204]. Feature selection uses the minimum information necessary to classify key features and could prove useful in identifying the most significant trends in poly-omic data sets. Fortino et al. [205] have described a multivariate discovery process using ‘fuzzy patterns’ to discretize and label gene expression data for the selection of the most relevant features. A random forest algorithm was then used to rank features in order of their usefulness, and to improve the stability and accuracy of data. A variation of this method has been used to implement feature selection in the integration of metabolomic, lipidomic and clinical data for the study of obesity and metabolic syndrome [206]. A number of Bayesian methods could also be applied to poly-omic data integration, such as Gaussian mixture models and Latent Dirichlet Allocation (LDA). Angione et al. [139] incorporated matrix factorization into a Bayesian hierarchical model using Gaussian Markov Random Fields. This approach led to inferring cross-correlations between pathways in a metabolic network, and to prediction of pathway activation profiles as a result of bacterial responses to environmental conditions. Using a Bayesian method, predefined reaction-pathway memberships were used to model the prior distribution that was multiplied by the likelihood of observations to obtain the posterior distribution, which was subsequently used to infer model predictions. This approach serves to facilitate the observation of links between reactions, pathways and conditions, which can in turn help to interpret the consequences of metabolic flux variations and, consequently, the biological system as a whole. LDA has yet to be applied to poly-omic integration but has potential because of its efficacy in organizing unsupervised data from a mixture of clusters [207]. Conclusion In this work, we have surveyed the principal methods available for constraint-based modelling and omic integration. We have presented these in the form of a ‘forest’, also available in interactive version at http://modellingmetabolism.net where it will be updated periodically. As an up-to-date classification of available methods, we believe that this will prove to be a useful resource for prospective modellers. We have also provided the first tutorial in R for multi-objective optimization of metabolic models. We envisage that a late integration approach can be used to test the suitability of various multi-view learning algorithms for each omic data set used in the integration process, depending on the structure of the data. For example, a clustering algorithm may be chosen for mapping microarray or RNA sequencing data onto a metabolic model. Likewise, a SVM may be chosen for mapping protein data or post-translational modifications. Finally, feature selection may be used to identify the most distinctive features across both of these omic layers, as a whole. In the near future, the integration of multi-view machine learning in metabolic modelling is likely to grow in parallel with the rapid advancement of high-throughput omic technologies. With the increase in the number of omic layers, existing algorithms will continue to be extended. However, we believe a one-size-fits-all approach is not the most effective way of tackling a multi-omic genome-scale problem. Given the intrinsic differences between omic layers, we envisage specific algorithms designed only for the analysis of each layer. When required, a global prediction can then be achieved through methods for aggregation of layers, such as those successfully used in multilayer network theory. Key Points Being downstream of gene expression, metabolism is being increasingly used as an indicator of the phenotypic outcome for drugs and therapies. We present an online resource and up-to-date classification of existing methods and tools for poly-omic modelling of metabolism in systems biology. We provide a hands-on tutorial for multi-objective optimization of metabolic models in R. We finally discuss the implementation of multi-view machine learning approaches in poly-omic data integration. Funding This work was partially funded by a Teesside University doctoral scholarship, EPSRC, and the EU grant MIMOMICS. Supreeta Vijayakumar is a PhD student at the Department of Computer Science and Information Systems, Teesside University, UK. Her research focuses on the integration of multi-omic data and machine learning algorithms for constraint-based modelling of microbial communities. Max Conway is a PhD student at the Computer Laboratory, University of Cambridge, UK. His research interests include machine learning to extract structure from metabolic and multi-omic models. Pietro Lió is a Reader in Computational Biology at the Computer Laboratory, University of Cambridge. His affiliations also include the Cambridge Computational Biology Institute. His work spans machine learning and computational models for health big data, personalized medicine research, multi-scale/multi-omic/multi-physics modelling and data integration methods. Claudio Angione is a Senior Lecturer in Computer Science at the Department of Computer Science and Information Systems, Teesside University, UK. He holds a PhD in Computer Science from the University of Cambridge, UK. His research interests include cancer metabolism, machine learning, systems biology and optimization of genome-scale and poly-omic models. References 1 Kanehisa M , Goto S. KEGG: Kyoto Encyclopedia of Genes and Genomes . Nucleic Acids Res 2000 ; 28 ( 1 ): 27 – 30 . Google Scholar Crossref Search ADS PubMed 2 Schellenberger J , Park JO , Conrad TM , et al. BiGG: a Biochemical Genetic and Genomic knowledgebase of large scale metabolic reconstructions . BMC Bioinformatics 2010 ; 11 ( 1 ): 213 . Google Scholar Crossref Search ADS PubMed 3 Karp PD , Ouzounis CA , Moore-Kochlacs C , et al. Expansion of the BioCyc collection of pathway/genome databases to 160 genomes . Nucleic Acids Res 2005 ; 33 ( 19 ): 6083 – 9 . Google Scholar Crossref Search ADS PubMed 4 Moretti S , Martin O , Van Du Tran T , et al. MetaNetX/MNXref–reconciliation of metabolites and biochemical reactions to bring together genome-scale metabolic networks . Nucleic Acids Res 2016 ; 44 ( D1 ): D523 – 6 . Google Scholar Crossref Search ADS PubMed 5 Henry CS , DeJongh M , Best AA , et al. High-throughput generation, optimization and analysis of genome-scale metabolic models . Nat Biotechnol 2010 ; 28 ( 9 ): 977 – 82 . Google Scholar Crossref Search ADS PubMed 6 Prigent S , Frioux C , Dittami SM , et al. Meneco, a topology-based gap-filling tool applicable to degraded genome-wide metabolic networks . PLoS Comput Biol 2017 ; 13 ( 1 ): e1005276. Google Scholar Crossref Search ADS PubMed 7 Mienda BS. Genome-scale metabolic models as platforms for strain design and biological discovery . J Biomol Struct Dyn 2016 , in press, Just-accepted(just-accepted):1–23. 8 Kumar VS , Maranas CD. GrowMatch: an automated method for reconciling in silico/in vivo growth predictions . PLoS Comput Biol 2009 ; 5 ( 3 ): e1000308. Google Scholar Crossref Search ADS PubMed 9 Reed JL , Patel TR , Chen KH , et al. Systems approach to refining genome annotation . Proc Natl Acad Sci USA 2006 ; 103 ( 46 ): 17480 – 4 . Google Scholar Crossref Search ADS PubMed 10 Herrgård MJ , Fong SS , Palsson BØ. Identification of genome-scale metabolic network models using experimentally measured flux profiles . PLoS Comput Biol 2006 ; 2 ( 7 ): e72. Google Scholar Crossref Search ADS PubMed 11 OBrien EJ , Monk JM , Palsson BO. Using genome-scale models to predict biological capabilities . Cell 2015 ; 161 ( 5 ): 971 – 87 . Google Scholar Crossref Search ADS PubMed 12 Kumar VS , Dasika MS , Maranas CD. Optimization based automated curation of metabolic reconstructions . BMC Bioinformatics 2007 ; 8 :( 1 ): 212 . Google Scholar Crossref Search ADS PubMed 13 Thiele I , Vlassis N , Fleming RM. fastGapFill: efficient gap filling in metabolic networks . Bioinformatics 2014 ; 30 ( 17 ): 2529 – 31 . Google Scholar Crossref Search ADS PubMed 14 Vitkin E , Shlomi T. MIRAGE: a functional genomics-based approach for metabolic network model reconstruction and its application to cyanobacteria networks . Genome Biol 2012 ; 13 ( 11 ): R111. Google Scholar Crossref Search ADS PubMed 15 Palsson BØ. Systems Biology: Constraint-Based Reconstruction and Analysis . Cambridge: Cambridge University Press , 2015 . 16 Lewis NE , Nagarajan H , Palsson BØ. Constraining the metabolic genotype–phenotype relationship using a phylogeny of in silico methods . Nat Rev Microbiol 2012 ; 10 ( 4 ): 291 – 305 . Google Scholar Crossref Search ADS PubMed 17 Lee SY , et al. Systems Biology and Biotechnology of Escherichia Coli . New York: Springer , 2009 . 18 Papin JA , Stelling J , Price ND , et al. Comparison of network-based pathway analysis methods . Trends Biotechnol 2004 ; 22 ( 8 ): 400 – 5 . Google Scholar Crossref Search ADS PubMed 19 Fang Z , Tian W , Ji H. A network-based gene-weighting approach for pathway analysis . Cell Res 2012 ; 22 ( 3 ): 565 – 80 . Google Scholar Crossref Search ADS PubMed 20 Zhang Q , Li J , Xie H , et al. A network-based pathway-expanding approach for pathway analysis . BMC Bioinformatics 2016 ; 17 ( 17 ): 231. Google Scholar PubMed 21 Liu L , Ruan J. Network-based pathway enrichment analysis. In: 2013 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) . New York: IEEE, 2013 , 218 – 21 . 22 Ma J , Shojaie A , Michailidis G. Network-based pathway enrichment analysis with incomplete network information . Bioinformatics 2016 ; 32 ( 20 ): 3165 – 74 . Google Scholar Crossref Search ADS PubMed 23 Zanghellini J , Ruckerbauer DE , Hanscho M , et al. Elementary flux modes in a nutshell: properties, calculation and applications . Biotechnol J 2013 ; 8 ( 9 ): 1009 – 16 . Google Scholar Crossref Search ADS PubMed 24 Pérès S , Vallée F , Beurton-Aimar M , et al. ACoM: a classification method for elementary flux modes based on motif finding . Biosystems 2011 ; 103 ( 3 ): 410 – 19 . Google Scholar Crossref Search ADS PubMed 25 Kaleta C,D , Figueiredo LF , Behre J , et al. EFMEvolver: computing elementary flux modes in genome-scale metabolic networks . Lect Notes Inform 2009 ; 157 : 179 – 89 . 26 De Figueiredo LF , Podhorski A , Rubio A , et al. Computing the shortest elementary flux modes in genome-scale metabolic networks . Bioinformatics 2009 ; 25 ( 23 ): 3158 – 65 . Google Scholar Crossref Search ADS PubMed 27 Wagner C , Urbanczik R. The geometry of the flux cone of a metabolic network . Biophys J 2005 ; 89 ( 6 ): 3837 – 45 . Google Scholar Crossref Search ADS PubMed 28 Rezola A , de Figueiredo LF , Brock M , et al. Exploring metabolic pathways in genome-scale networks via generating flux modes . Bioinformatics 2011 ; 27 ( 4 ): 534 – 40 . Google Scholar Crossref Search ADS PubMed 29 Urbanczik R , Wagner C. An improved algorithm for stoichiometric network analysis: theory and applications . Bioinformatics 2005 ; 21 ( 7 ): 1203 – 10 . Google Scholar Crossref Search ADS PubMed 30 Larhlimi A , Bockmayr A. A new constraint-based description of the steady-state flux cone of metabolic networks . Discrete Appl Math 2009 ; 157 ( 10 ): 2257 – 66 . Google Scholar Crossref Search ADS 31 Hidalgo JF , Guil F , García JM. A new approach to obtaining EFMs using graph methods based on the shortest path between end nodes . Genomics Comput Biol 2016 ; 2 ( 1 ): 30. Google Scholar Crossref Search ADS 32 Rezola A , Pey J , Tobalina L , et al. Advances in network-based metabolic pathway analysis and gene expression data integration . Brief Bioinform 2015 ; 16 : 265 – 79 . Google Scholar Crossref Search ADS PubMed 33 Gerstl MP , Ruckerbauer DE , Mattanovich D , et al. Metabolomics integrated elementary flux mode analysis in large metabolic networks . Sci Rep 2015 ; 5 : 8930. Google Scholar Crossref Search ADS PubMed 34 Kümmel A , Panke S , Heinemann M. Putative regulatory sites unraveled by network-embedded thermodynamic analysis of metabolome data . Mol Syst Biol 2006 ; 2 ( 1 ): 2006.0034 . Google Scholar PubMed 35 Gerstl MP , Jungreuthmayer C , Müller S , et al. Which sets of elementary flux modes form thermodynamically feasible flux distributions? FEBS J 2016 ; 283 ( 9 ): 1782 – 94 . Google Scholar Crossref Search ADS PubMed 36 Schilling CH , Letscher D , Palsson BØ. Theory for the systemic definition of metabolic pathways and their use in interpreting metabolic function from a pathway-oriented perspective . J Theor Biol 2000 ; 203 ( 3 ): 229 – 48 . Google Scholar Crossref Search ADS PubMed 37 Clarke BL. Stoichiometric network analysis . Cell Biochem Biophys 1988 ; 12 ( 1 ): 237 – 53 . 38 Clark ST , Verwoerd WS. Minimal cut sets and the use of failure modes in metabolic networks . Metabolites 2012 ; 2 ( 3 ): 567 – 95 . Google Scholar Crossref Search ADS PubMed 39 Kaleta C , de Figueiredo LF , Schuster S. Can the whole be less than the sum of its parts? Pathway analysis in genome-scale metabolic networks using elementary flux patterns . Genome Res 2009 ; 19 ( 10 ): 1872 – 83 . Google Scholar Crossref Search ADS PubMed 40 Campodonico MA , Andrews BA , Asenjo JA , et al. Generation of an atlas for commodity chemical production in Escherichia coli and a novel pathway prediction algorithm, GEM-Path . Metab Eng 2014 ; 25 : 140 – 58 . Google Scholar Crossref Search ADS PubMed 41 Hatzimanikatis V , Li C , Ionita JA , et al. Exploring the diversity of complex metabolic networks . Bioinformatics 2005 ; 21 ( 8 ): 1603 – 9 . Google Scholar Crossref Search ADS PubMed 42 Medema MH , Van Raaphorst R , Takano E , et al. Computational tools for the synthetic design of biochemical pathways . Nat Rev Microbiol 2012 ; 10 ( 3 ): 191 – 202 . Google Scholar Crossref Search ADS PubMed 43 Braunstein A , Mulet R , Pagnani A. Estimating the size of the solution space of metabolic networks . BMC Bioinformatics 2008 ; 9 ( 1 ): 240. Google Scholar Crossref Search ADS PubMed 44 Schryer DW , Vendelin M , Peterson P. Symbolic flux analysis for genome-scale metabolic networks . BMC Syst Biol 2011 ; 5 ( 1 ): 81 . Google Scholar Crossref Search ADS PubMed 45 Samal A , Matias Rodrigues JF , Jost J , et al. Genotype networks in metabolic reaction spaces . BMC Syst Biol 2010 ; 4 ( 1 ): 30 . Google Scholar Crossref Search ADS PubMed 46 De Martino D , Parisi V. Montecarlo uniform sampling of highdimensional convex polytopes: reducing the condition number with applications in metabolic network analysis. arXiv preprint. arXiv preprint arXiv:1312.5228, 2013 . 47 De Martino D , Mori M , Parisi V. Uniform sampling of steady states in metabolic networks: heterogeneous scales and rounding . PloS One 2015 ; 10 ( 4 ): e0122670. Google Scholar Crossref Search ADS PubMed 48 Aurich MK , Fleming RM , Thiele I. MetaboTools: a comprehensive toolbox for analysis of genome-scale metabolic models . Frontiers in Physiology 2016 ; 7 : 327 . Google Scholar Crossref Search ADS PubMed 49 Mo ML , Palsson BØ , Herrgård MJ. Connecting extracellular metabolomic measurements to intracellular flux states in yeast . BMC Syst Biol 2009 ; 3 ( 1 ): 37 . Google Scholar Crossref Search ADS PubMed 50 Aurich MK , Paglia G , Rolfsson Ó , et al. Prediction of intracellular metabolic states from extracellular metabolomic data . Metabolomics 2015 ; 11 ( 3 ): 603 – 19 . Google Scholar Crossref Search ADS PubMed 51 Orth JD , Thiele I , Palsson BØ. What is flux balance analysis? Nat Biotechnol 2010 ; 28 ( 3 ): 245 – 8 . Google Scholar Crossref Search ADS PubMed 52 Zieliński ŁP , Smith AC , Smith AG , et al. Metabolic flexibility of mitochondrial respiratory chain disorders predicted by computer modelling . Mitochondrion 2016 ; 31 : 45 – 55 . Google Scholar Crossref Search ADS PubMed 53 Feist AM , Palsson BØ. The biomass objective function . Curr Opin Microbiol 2010 ; 13 ( 3 ): 344 – 9 . Google Scholar Crossref Search ADS PubMed 54 Mahadevan R , Edwards JS , Doyle FJ. Dynamic flux balance analysis of diauxic growth in Escherichia coli . Biophys J 2002 ; 83 ( 3 ): 1331 – 40 . Google Scholar Crossref Search ADS PubMed 55 Wiechert W. 13 C metabolic flux analysis . Metab Eng 2001 ; 3 ( 3 ): 195 – 206 . Google Scholar Crossref Search ADS PubMed 56 Zhuang K , Izallalen M , Mouser P , et al. Genome-scale dynamic modeling of the competition between Rhodoferax and Geobacter in anoxic subsurface environments . ISME J 2011 ; 5 ( 2 ): 305 – 16 . Google Scholar Crossref Search ADS PubMed 57 Reed JL. Shrinking the metabolic solution space using experimental datasets . PLoS Comput Biol 2012 ; 8 ( 8 ): e1002662. Google Scholar Crossref Search ADS PubMed 58 Burgard AP , Vaidyaraman S , Maranas CD. Minimal reaction sets for Escherichia coli metabolism under different growth requirements and uptake environments . Biotechnol Prog 2001 ; 17 ( 5 ): 791 – 7 . Google Scholar Crossref Search ADS PubMed 59 Mahadevan R , Schilling C. The effects of alternate optimal solutions in constraint-based genome-scale metabolic models . Metab Eng 2003 ; 5 ( 4 ): 264 – 76 . Google Scholar Crossref Search ADS PubMed 60 Henry CS , Broadbelt LJ , Hatzimanikatis V. Thermodynamics-based metabolic flux analysis . Biophys J 2007 ; 92 ( 5 ): 1792 – 805 . Google Scholar Crossref Search ADS PubMed 61 Ataman M , Hatzimanikatis V. Heading in the right direction: thermodynamics-based network analysis and pathway engineering . Curr Opin Biotechnol 2015 ; 36 : 176 – 82 . Google Scholar Crossref Search ADS PubMed 62 Price ND , Famili I , Beard DA , et al. Extreme pathways and Kirchhoff’s second law . Biophys J 2002 ; 83 ( 5 ): 2879 . Google Scholar Crossref Search ADS PubMed 63 Schellenberger J , Lewis NE , Palsson BØ. Elimination of thermodynamically infeasible loops in steady-state metabolic models . Biophys J 2011 ; 100 ( 3 ): 544 – 53 . Google Scholar Crossref Search ADS PubMed 64 Beard DA , Liang S , Qian H. Energy balance for analysis of complex metabolic networks . Biophys J 2002 ; 83 ( 1 ): 79 – 86 . Google Scholar Crossref Search ADS PubMed 65 Müller AC , Bockmayr A. Fast thermodynamically constrained flux variability analysis . Bioinformatics 2013 ; 29 : 903 – 9 . Google Scholar Crossref Search ADS PubMed 66 Lewis NE , Hixson KK , Conrad TM , et al. Omic data from evolved E. coli are consistent with computed optimal growth from genome-scale models . Mol Syst Biol 2010 ; 6 ( 1 ): 390 . Google Scholar PubMed 67 Rügen M , Bockmayr A , Steuer R. Elucidating temporal resource allocation and diurnal dynamics in phototrophic metabolism using conditional FBA . Sci Rep 2015 ; 5 : 15247 Google Scholar Crossref Search ADS PubMed 68 Reimers AM , Knoop H , Bockmayr A , et al. Evaluating the stoichiometric and energetic constraints of cyanobacterial diurnal growth. arXiv preprint arXiv:1610.06859, 2016 . 69 Goelzer A , Fromion V. Bacterial growth rate reflects a bottleneck in resource allocation . Biochim Biophys Acta 2011 ; 1810 ( 10 ): 978 – 88 . Google Scholar Crossref Search ADS PubMed 70 Mori M , Hwa T , Martin OC , et al. Constrained allocation flux balance analysis . PLoS Comput Biol 2016 ; 12 ( 6 ): e1004913. Google Scholar Crossref Search ADS PubMed 71 Schultz A , Qutub AA. Predicting internal cell fluxes at sub-optimal growth . BMC Syst Biol 2015 ; 9 ( 1 ): 18. Google Scholar Crossref Search ADS PubMed 72 Dreyfuss JM , Zucker JD , Hood HM , et al. Reconstruction and validation of a genome-scale metabolic model for the filamentous fungus Neurospora crassa using FARM . PLoS Comput Biol 2013 ; 9 ( 7 ): e1003126. Google Scholar Crossref Search ADS PubMed 73 Heino J , Tunyan K , Calvetti D , et al. Bayesian flux balance analysis applied to a skeletal muscle metabolic model . J Theor Biol 2007 ; 248 ( 1 ): 91 – 110 . Google Scholar Crossref Search ADS PubMed 74 Heino J , Calvetti D , Somersalo E. Metabolica: a statistical research tool for analyzing metabolic networks . Comput Methods Programs Biomed 2010 ; 97 ( 2 ): 151 – 67 . Google Scholar Crossref Search ADS PubMed 75 Birch EW , Udell M , Covert MW. Incorporation of flexible objectives and time-linked simulation with flux balance analysis . J Theor Biol 2014 ; 345 : 12 – 21 . Google Scholar Crossref Search ADS PubMed 76 Shlomi T , Eisenberg Y , Sharan R , et al. A genome-scale computational study of the interplay between transcriptional regulation and metabolism . Mol Syst Biol 2007 ; 3 ( 1 ): 101 . Google Scholar PubMed 77 Covert MW , Xiao N , Chen TJ , et al. Integrating metabolic, transcriptional regulatory and signal transduction models in Escherichia coli . Bioinformatics 2008 ; 24 ( 18 ): 2044 – 50 . Google Scholar Crossref Search ADS PubMed 78 Lee JM , Gianchandani EP , Eddy JA , et al. Dynamic analysis of integrated signaling, metabolic, and regulatory networks . PLoS Comput Biol 2008 ; 4 ( 5 ): e1000086. Google Scholar Crossref Search ADS PubMed 79 Chandrasekaran S , Price ND. Probabilistic integrative modeling of genome-scale metabolic and regulatory networks in Escherichia coli and Mycobacterium tuberculosis . Proc Natl Acad Sci USA 2010 ; 107 ( 41 ): 17845 – 50 . Google Scholar Crossref Search ADS PubMed 80 Chandrasekaran S , Price ND. Metabolic constraint-based refinement of transcriptional regulatory networks . PLoS Comput Biol 2013 ; 9 ( 12 ): e1003370. Google Scholar Crossref Search ADS PubMed 81 Marmiesse L , Peyraud R , Cottret L. FlexFlux: combining metabolic flux and regulatory network analyses . BMC Syst Biol 2015 ; 9 ( 1 ): 93. Google Scholar Crossref Search ADS PubMed 82 Wu H , Von Kamp A , Leoncikas V , et al. MUFINS: multi-formalism interaction network simulator . NPJ Syst Biol Appl 2016 ; 2 : 16032. Google Scholar Crossref Search ADS PubMed 83 Fisher CP , Plant NJ , Moore JB , et al. QSSPN: dynamic simulation of molecular interaction networks describing gene regulation, signalling and whole-cell metabolism in human cells . Bioinformatics 2013 ; 29 ( 24 ): 3181 – 90 . Google Scholar Crossref Search ADS PubMed 84 van Berlo RJ , de Ridder D , Daran JM , et al. Predicting metabolic fluxes using gene expression differences as constraints . IEEE/ACM Trans Comput Biol Bioinform 2011 ; 8 ( 1 ): 206 – 16 . Google Scholar Crossref Search ADS PubMed 85 Motamedian E , Mohammadi M , Shojaosadati SA , et al. TRFBA: an algorithm to integrate genome-scale metabolic and transcriptional regulatory networks with incorporation of expression data . Bioinformatics 2017 ; 33 : 1057 – 63 . Google Scholar PubMed 86 Barrett T , Wilhite SE , Ledoux P , et al. NCBI GEO: archive for functional genomics data sets–update . Nucleic Acids Rese 2013 ; 41 ( D1 ): D991 – 5 . Google Scholar Crossref Search ADS 87 Kolesnikov N , Hastings E , Keays M , et al. ArrayExpress update–simplifying data submissions . Nucleic Acids Res 2015 ; 43 : D1113 – 6 . Google Scholar Crossref Search ADS PubMed 88 Rung J , Brazma A. Reuse of public genome-wide gene expression data . Nat Rev Genet 2013 ; 14 ( 2 ): 89 – 99 . Google Scholar Crossref Search ADS PubMed 89 Petryszak R , Keays M , Tang YA , et al. Expression Atlas update–an integrated database of gene and protein expression in humans, animals and plants . Nucleic Acids Res 2016 ; 44 : D746 – 52 . Google Scholar Crossref Search ADS PubMed 90 Chen R , Mallelwar R , Thosar A , et al. GeneChaser: identifying all biological and clinical conditions in which genes of interest are differentially expressed . BMC Bioinformatics 2008 ; 9 ( 1 ): 548. Google Scholar Crossref Search ADS PubMed 91 Engreitz JM , Chen R , Morgan AA , et al. ProfileChaser: searching microarray repositories based on genome-wide patterns of differential expression . Bioinformatics 2011 ; 27 ( 23 ): 3317 – 18 . Google Scholar Crossref Search ADS PubMed 92 Rhodes DR , Yu J , Shanker K , et al. ONCOMINE: a cancer microarray database and integrated data-mining platform . Neoplasia 2004 ; 6 ( 1 ): 1 – 6 . Google Scholar Crossref Search ADS PubMed 93 Finger JH , Smith CM , Hayamizu TF , et al. The mouse Gene Expression Database (GXD): 2017 update . Nucleic Acids Res 2017 ; 45 ( D1 ): D730 – 6 . Google Scholar Crossref Search ADS PubMed 94 Ullah AZD , Cutts RJ , Ghetia M , et al. The pancreatic expression database: recent extensions and updates . Nucleic Acids Res 2014 ; 42 ( D1 ): D944 – 9 . Google Scholar Crossref Search ADS PubMed 95 Salehzadeh-Yazdi A , Asgari Y , Saboury AA , et al. Computational analysis of reciprocal association of metabolism and epigenetics in the budding yeast: a genome-scale metabolic model (GSMM) approach . PloS One 2014 ; 9 ( 11 ): e111686. Google Scholar Crossref Search ADS PubMed 96 Vivek-Ananth R , Samal A. Advances in the integration of transcriptional regulatory information into genome-scale metabolic models . Biosystems 2016 ; 147 : 1 – 10 . Google Scholar Crossref Search ADS PubMed 97 Kim MK , Lun DS. Methods for integration of transcriptomic data in genome-scale metabolic models . Comput Struct Biotechnol J 2014 ; 11 ( 18 ): 59 – 65 . Google Scholar Crossref Search ADS PubMed 98 Becker SA , Palsson BO. Context-specific metabolic networks are consistent with experiments . PLoS Comput Biol 2008 ; 4 ( 5 ): e1000082. Google Scholar Crossref Search ADS PubMed 99 Schmidt BJ , Ebrahim A , Metz TO , et al. GIM3E: condition-specific models of cellular metabolism developed from metabolomics and expression data . Bioinformatics 2013 ; 29 ( 22 ): 2900 – 8 . Google Scholar Crossref Search ADS PubMed 100 Shlomi T , Cabili MN , Herrgård MJ , et al. Network-based prediction of human tissue-specific metabolism . Nat Biotechnol 2008 ; 26 ( 9 ): 1003 – 10 . Google Scholar Crossref Search ADS PubMed 101 Blazier AS , Papin JA. Integration of expression data in genome-scale metabolic network reconstructions . Front Physiol 2012 ; 3 : 299 . Google Scholar Crossref Search ADS PubMed 102 Zur H , Ruppin E , Shlomi T. iMAT: an integrative metabolic analysis tool . Bioinformatics 2010 ; 26 ( 24 ): 3140 – 2 . Google Scholar Crossref Search ADS PubMed 103 Rossell S , Huynen MA , Notebaart RA. Inferring metabolic states in uncharacterized environments using gene-expression measurements . PLoS Comput Biol 2013 ; 9 ( 3 ): e1002988. Google Scholar Crossref Search ADS PubMed 104 Agren R , Bordel S , Mardinoglu A , et al. Reconstruction of genome-scale active metabolic networks for 69 human cell types and 16 cancer types using INIT . PLoS Comput Biol 2012 ; 8 ( 5 ): e1002518. Google Scholar Crossref Search ADS PubMed 105 Machado D , Herrgård M. Systematic evaluation of methods for integration of transcriptomic data into constraint-based models of metabolism . PLoS Comput Biol 2014 ; 10 ( 4 ): e1003580. Google Scholar Crossref Search ADS PubMed 106 Agren R , Mardinoglu A , Asplund A , et al. Identification of anticancer drugs for hepatocellular carcinoma through personalized genome-scale metabolic modeling . Mol Syst Biol 2014 ; 10 ( 3 ): 721. Google Scholar Crossref Search ADS PubMed 107 Jensen PA , Papin JA. Functional integration of a metabolic network model and expression data without arbitrary thresholding . Bioinformatics 2011 ; 27 ( 4 ): 541 – 7 . Google Scholar Crossref Search ADS PubMed 108 Colijn C , Brandes A , Zucker J , et al. Interpreting expression data with metabolic flux models: predicting Mycobacterium tuberculosis mycolic acid production . PLoS Comput Biol 2009 ; 5 ( 8 ): e1000489. Google Scholar Crossref Search ADS PubMed 109 Kim MK , Lane A , Kelley JJ , et al. E-Flux2 and SPOT: validated methods for inferring intracellular metabolic flux distributions from transcriptomic data . PLoS One 2016 ; 11 ( 6 ): e0157101. Google Scholar Crossref Search ADS PubMed 110 Angione C , Lió P. Predictive analytics of environmental adaptability in multi-omic network models . Sci Rep 2015 ; 5 : 15147. Google Scholar Crossref Search ADS PubMed 111 Barker BE , Sadagopan N , Wang Y , et al. A robust and efficient method for estimating enzyme complex abundance and metabolic flux from expression data . Comput Biol Chem 2015 ; 59 : 98 – 112 . Google Scholar Crossref Search ADS PubMed 112 Lee D , Smallbone K , Dunn WB , et al. Improving metabolic flux predictions using absolute gene expression data . BMC Syst Biol 2012 ; 6 ( 1 ): 73. Google Scholar Crossref Search ADS PubMed 113 Yizhak K , Gaude E , Le Dévédec S , et al. Phenotype-based cell-specific metabolic modeling reveals metabolic liabilities of cancer . Elife 2014 ; 3 : e03641. Google Scholar Crossref Search ADS 114 Song HS , Reifman J , Wallqvist A. Prediction of metabolic flux distribution from gene expression data based on the flux minimization principle . PloS One 2014 ; 9 ( 11 ): e112524. Google Scholar Crossref Search ADS PubMed 115 Kashaf SS , Angione C , Lió P. Making life difficult for Clostridium difficile: augmenting the pathogen’s metabolic model with transcriptomic and codon usage data for better therapeutic target characterization . BMC Syst Biol 2017 ; 11 ( 1 ): 25. Google Scholar Crossref Search ADS PubMed 116 Jerby L , Shlomi T , Ruppin E. Computational reconstruction of tissue-specific metabolic models: application to human liver metabolism . Mol Syst Biol 2010 ; 6 ( 1 ): 401 . Google Scholar PubMed 117 Vlassis N , Pacheco MP , Sauter T. Fast reconstruction of compact context-specific metabolic network models . PLoS Comput Biol 2014 ; 10 ( 1 ): e1003424. Google Scholar Crossref Search ADS PubMed 118 Wang Y , Eddy JA , Price ND. Reconstruction of genome-scale metabolic models for 126 human tissues using mCADRE . BMC Syst Biol 2012 ; 6 ( 1 ): 153. Google Scholar Crossref Search ADS PubMed 119 Pacheco MP , John E , Kaoma T , et al. Integrated metabolic modelling reveals cell-type specific epigenetic control points of the macrophage metabolic network . BMC Genomics 2015 ; 16 ( 1 ): 809. Google Scholar Crossref Search ADS PubMed 120 Schultz A , Qutub AA. Reconstruction of tissue-specific metabolic networks using CORDA . PLoS Comput Biol 2016 ; 12 ( 3 ): e1004808. Google Scholar Crossref Search ADS PubMed 121 Estévez SR , Nikoloski Z. Context-specific metabolic model extraction based on regularized least squares optimization . PloS One 2015 ; 10 ( 7 ): e0131875 . Google Scholar Crossref Search ADS PubMed 122 Fyson N , Kim MK , Lun D , et al. Gene-centric constraint of metabolic models . bioRxiv 2017 , 116558 . 123 Zhang SW , Gou WL , Li Y. Prediction of metabolic fluxes from gene expression data with Huber penalty convex optimization function . Mol Biosyst 2017 , in press. 124 Guo W , Feng X. OM-FBA: integrate transcriptomics data with flux balance analysis to decipher the cell metabolism . PloS One 2016 ; 11 ( 4 ): e0154188. Google Scholar Crossref Search ADS PubMed 125 Deutscher D , Meilijson I , Schuster S , et al. Can single knockouts accurately single out gene functions? BMC Syst Biol 2008 ; 2 ( 1 ): 50. Google Scholar Crossref Search ADS PubMed 126 Nijman S. Synthetic lethality: general principles, utility and detection using genetic screens in human cells . FEBS Lett 2011 ; 585 ( 1 ): 1 – 6 . Google Scholar Crossref Search ADS PubMed 127 Suthers PF , Zomorrodi A , Maranas CD. Genome-scale gene/reaction essentiality and synthetic lethality analysis . Mol Syst Biol 2009 ; 5 : 301 . Google Scholar Crossref Search ADS PubMed 128 Pratapa A , Balachandran S , Raman K. Fast-SL: an efficient algorithm to identify synthetic lethal sets in metabolic networks . Bioinformatics 2015 ; 31 : 3299 – 305 . Google Scholar Crossref Search ADS PubMed 129 Tobalina L , Pey J , Planes FJ. Direct calculation of minimal cut sets involving a specific reaction knock-out . Bioinformatics 2016 ; 32 ( 13 ): 2001 – 7 . Google Scholar Crossref Search ADS PubMed 130 Jerby-Arnon L , Pfetzer N , Waldman YY , et al. Predicting cancer-specific vulnerability via data-driven detection of synthetic lethality . Cell 2014 ; 158 ( 5 ): 1199 – 209 . Google Scholar Crossref Search ADS PubMed 131 Megchelenbrink W , Katzir R , Lu X , et al. Synthetic dosage lethality in the human metabolic network is highly predictive of tumor growth and cancer patient survival . Proc Natl Acad Sci USA 2015 ; 112 ( 39 ): 12217 – 22 . Google Scholar Crossref Search ADS PubMed 132 McAnulty MJ , Yen JY , Freedman BG , et al. Genome-scale modeling using flux ratio constraints to enable metabolic engineering of clostridial metabolism in silico . BMC Syst Biol 2012 ; 6 ( 1 ): 42. Google Scholar Crossref Search ADS PubMed 133 Yen JY , Nazem-Bokaee H , Freedman BG , et al. Deriving metabolic engineering strategies from genome-scale modeling with flux ratio constraints . Biotechnol J 2013 ; 8 ( 5 ): 581 – 94 . Google Scholar Crossref Search ADS PubMed 134 Segre D , Vitkup D , Church GM. Analysis of optimality in natural and perturbed metabolic networks . Proc Natl Acad Sci 2002 ; 99 ( 23 ): 15112 – 17 . Google Scholar Crossref Search ADS PubMed 135 Raval A , Ray A. Introduction to Biological Networks . Boca Raton: CRC Press , 2013 . 136 Yizhak K , Benyamini T , Liebermeister W , et al. Integrating quantitative proteomics and metabolomics with a genome-scale metabolic network model . Bioinformatics 2010 ; 26 ( 12 ): i255 – 60 . Google Scholar Crossref Search ADS PubMed 137 Shlomi T , Berkman O , Ruppin E. Regulatory on/off minimization of metabolic flux changes after genetic perturbations . Proc Natl Acad Sci USA 2005 ; 102 ( 21 ): 7695 – 700 . Google Scholar Crossref Search ADS PubMed 138 Kim J , Reed JL. RELATCH: relative optimality in metabolic networks explains robust metabolic and regulatory responses to perturbations . Genome Biol 2012 ; 13 ( 9 ): R78 . Google Scholar Crossref Search ADS PubMed 139 Angione C , Pratanwanich N , Lió P. A hybrid of metabolic flux analysis and Bayesian factor modeling for multiomic temporal pathway activation . ACS Synth Biol 2015 ; 4 ( 8 ): 880 – 9 . Google Scholar Crossref Search ADS PubMed 140 Oyetunde T , Czajka J , Wu G , et al. Metabolite patterns reveal regulatory responses to genetic perturbations. arXiv preprint arXiv:1701.01744, 2017 . 141 Knorr AL , Jain R , Srivastava R. Bayesian-based selection of metabolic objective functions . Bioinformatics 2007 ; 23 ( 3 ): 351 – 7 . Google Scholar Crossref Search ADS PubMed 142 Sendin J , Exler O , Banga JR. Multi-objective mixed integer strategy for the optimisation of biological networks . IET Syst Biol 2010 ; 4 ( 3 ): 236 – 48 . Google Scholar Crossref Search ADS PubMed 143 Xu G. An iterative strategy for Bi-objective optimization of metabolic pathways. In: Proceedings of the 2011 Fourth International Joint Conference on Computational Sciences and Optimization . New York: IEEE Computer Society, 2011 , 587 – 8 . 144 Angione C , Costanza J , Carapezza G , et al. Multi-target analysis and design of mitochondrial metabolism . PloS One 2015 ; 10 ( 9 ): e0133825. Google Scholar Crossref Search ADS PubMed 145 de Hijas-Liste GM , Klipp E , Balsa-Canto E , et al. Global dynamic optimization approach to predict activation in metabolic pathways . BMC Syst Biol 2014 ; 8 ( 1 ): 1. Google Scholar Crossref Search ADS PubMed 146 Angione C , Carapezza G , Costanza J , et al. Pareto optimality in organelle energy metabolism analysis . IEEE/ACM Trans Comput Biol Bioinform 2013 ; 10 ( 4 ): 1032 – 44 . Google Scholar Crossref Search ADS PubMed 147 Xu M , Bhat S , Smith R , et al. Multi-objective optimisation of metabolic productivity and thermodynamic performance . Comput Chem Eng 2009 ; 33 ( 9 ): 1438 – 50 . Google Scholar Crossref Search ADS 148 Sendín OH , Vera J , Torres NV , et al. Model based optimization of biochemical systems using multiple objectives: a comparison of several solution strategies . Math Comput Model Dyn Syst 2006 ; 12 ( 5 ): 469 – 87 . Google Scholar Crossref Search ADS 149 Angione C , Costanza J , Carapezza G , et al. Analysis and design of molecular machines . Theor Comput Sci 2015 ; 599 : 102 – 17 . Google Scholar Crossref Search ADS 150 Costanza J , Carapezza G , Angione C , et al. Robust design of microbial strains . Bioinformatics 2012 ; 28 ( 23 ): 3097 – 104 . Google Scholar Crossref Search ADS PubMed 151 Deb K , Pratap A , Agarwal S , et al. A fast and elitist multiobjective genetic algorithm: NSGA-II . Trans Evol Comp 2002 ; 6 ( 2 ): 182 – 97 . Available from: http://dx.doi.org/10.1109/4235.996017. Google Scholar Crossref Search ADS 152 Fortin FA , Parizeau M. Revisiting the NSGA-II crowding-distance computation. In: Proceedings of the 15th Annual Conference on Genetic and Evolutionary Computation . New York: ACM, 2013 , 623 – 30 . 153 Zheng J , Shen R , Zou J. Enhancing diversity for NSGA-II in evolutionary multi-objective optimization. In: Eighth International Conference on Natural Computation (ICNC), 2012 . New York: IEEE, 2012 , 654 – 7 . 154 Burgard AP , Pharkya P , Maranas CD. Optknock: a bilevel programming framework for identifying gene knockout strategies for microbial strain optimization . Biotechnol Bioeng 2003 ; 84 ( 6 ): 647 – 57 . Google Scholar Crossref Search ADS PubMed 155 Pharkya P , Burgard AP , Maranas CD. Exploring the overproduction of amino acids using the bilevel optimization framework OptKnock . Biotechnol Bioeng 2003 ; 84 ( 7 ): 887 – 99 . Google Scholar Crossref Search ADS PubMed 156 Tepper N , Shlomi T. Predicting metabolic engineering knockout strategies for chemical production: accounting for competing pathways . Bioinformatics 2010 ; 26 ( 4 ): 536 – 43 . Google Scholar Crossref Search ADS PubMed 157 Ranganathan S , Suthers PF , Maranas CD. OptForce: an optimization procedure for identifying all genetic manipulations leading to targeted overproductions . PLoS Comput Biol 2010 ; 6 ( 4 ): e1000744. Google Scholar Crossref Search ADS PubMed 158 Chowdhury A , Zomorrodi AR , Maranas CD. k-OptForce: integrating kinetics with flux balance analysis for strain design . PLoS Comput Biol 2014 ; 10 ( 2 ): e1003487. Google Scholar Crossref Search ADS PubMed 159 Xu Z , Zheng P , Sun J , et al. ReacKnock: identifying reaction deletion strategies for microbial strain optimization based on genome-scale metabolic network . PloS One 2013 ; 8 ( 12 ): e72150. Google Scholar Crossref Search ADS PubMed 160 Pharkya P , Burgard AP , Maranas CD. OptStrain: a computational framework for redesign of microbial production systems . Genome Res 2004 ; 14 ( 11 ): 2367 – 76 . Google Scholar Crossref Search ADS PubMed 161 Kim J , Reed JL , Maravelias CT. Large-scale bi-level strain design approaches and mixed-integer programming solution techniques . PLoS One 2011 ; 6 ( 9 ): e24162. Google Scholar Crossref Search ADS PubMed 162 Lun DS , Rockwell G , Guido NJ , et al. Large-scale identification of genetic design strategies using local search . Mol Syst Biol 2009 ; 5 : 296 . Google Scholar Crossref Search ADS PubMed 163 Fowler ZL , Gikandi WW , Koffas MA. Increased malonyl coenzyme A biosynthesis by tuning the Escherichia coli metabolic network and its application to flavanone production . Appl Environ Microbiol 2009 ; 75 ( 18 ): 5831 – 9 . Google Scholar Crossref Search ADS PubMed 164 Rocha M , Maia P , Mendes R , et al. Natural computation meta-heuristics for the in silico optimization of microbial strains . BMC Bioinformatics 2008 ; 9 ( 1 ): 499. Google Scholar Crossref Search ADS PubMed 165 Nagrath D , Avila-Elchiver M , Berthiaume F , et al. Soft constraints-based multiobjective framework for flux balance analysis . Metab Eng 2010 ; 12 ( 5 ): 429 – 45 . Google Scholar Crossref Search ADS PubMed 166 Kelk SM , Olivier BG , Stougie L , et al. Optimal flux spaces of genome-scale stoichiometric models are determined by a few subnetworks . Sci Rep 2012 ; 2 : 580. Google Scholar Crossref Search ADS PubMed 167 Maarleveld TR , Wortel MT , Olivier BG , et al. Interplay between constraints, objectives, and optimality for genome-scale stoichiometric models . PLoS Comput Biol 2015 ; 11 ( 4 ): e1004166. Google Scholar Crossref Search ADS PubMed 168 Zomorrodi AR , Maranas CD. OptCom: a multi-level optimization framework for the metabolic modeling and analysis of microbial communities . PLoS Comput Biol 2012 ; 8 ( 2 ): e1002363. Google Scholar Crossref Search ADS PubMed 169 Khandelwal RA , Olivier BG , Röling WF , et al. Community flux balance analysis for microbial consortia at balanced growth . PLoS One 2013 ; 8 ( 5 ): e64567. Google Scholar Crossref Search ADS PubMed 170 Gottstein W , Olivier BG , Bruggeman FJ , et al. Constraint-based stoichiometric modelling from single organisms to microbial communities . J R Soc Interface 2016 ; 13 ( 124 ): 20160627. Google Scholar Crossref Search ADS PubMed 171 Shoaie S , Ghaffari P , Kovatcheva-Datchary P , et al. Quantifying diet-induced metabolic changes of the human gut microbiome . Cell Metab 2015 ; 22 ( 2 ): 320 – 31 . Google Scholar Crossref Search ADS PubMed 172 Louca S , Doebeli M. Calibration and analysis of genome-based models for microbial ecology . Elife 2015 ; 4 : e08208. Google Scholar Crossref Search ADS PubMed 173 Harcombe WR , Riehl WJ , Dukovski I , et al. Metabolic resource allocation in individual microbes determines ecosystem interactions and spatial dynamics . Cell Rep 2014 ; 7 ( 4 ): 1104 – 15 . Google Scholar Crossref Search ADS PubMed 174 Willemsen AM , Hendrickx DM , Hoefsloot HC , et al. MetDFBA: incorporating time-resolved metabolomics measurements into dynamic flux balance analysis . Mol Biosyst 2015 ; 11 ( 1 ): 137 – 45 . Google Scholar Crossref Search ADS PubMed 175 Zomorrodi AR , Islam MM , Maranas CD. d-OptCom: dynamic multi-level and multi-objective metabolic modeling of microbial communities . ACS Synth Biol 2014 ; 3 ( 4 ): 247 – 57 . Google Scholar Crossref Search ADS PubMed 176 Balsa-Canto E , Henriques D , Gábor A , et al. AMIGO2, a toolbox for dynamic modeling, optimization and control in systems biology . Bioinformatics 2016 ; 32 : 3357 – 3359 . Google Scholar Crossref Search ADS PubMed 177 Tran LM , Rizk ML , Liao JC. Ensemble modeling of metabolic networks . Biophys J 2008 ; 95 ( 12 ): 5606 – 17 . Google Scholar Crossref Search ADS PubMed 178 Lee Y , Rivera JGL , Liao JC. Ensemble modeling for robustness analysis in engineering non-native metabolic pathways . Metab Eng 2014 ; 25 : 63 – 71 . Google Scholar Crossref Search ADS PubMed 179 Oh YG , Lee DY , Lee SY , et al. Multiobjective flux balancing using the NISE method for metabolic network analysis . Biotechnol Prog 2009 ; 25 ( 4 ): 999 – 1008 . Google Scholar Crossref Search ADS PubMed 180 El Samad H , Khammash M , Homescu C , et al. Optimal performance of the heat-shock gene regulatory network . IFAC Proc Vol 2005 ; 38 ( 1 ): 19 – 24 . Google Scholar Crossref Search ADS 181 Stracquadanio G , Umeton R , Papini A , et al. Analysis and optimization of c3 photosynthetic carbon metabolism. In: IEEE International Conference on BioInformatics and BioEngineering (BIBE), 2010 . New York: IEEE, 2010 , 44 – 51 . 182 Zhang HX , Goutsias J. A comparison of approximation techniques for variance-based sensitivity analysis of biochemical reaction systems . BMC Bioinformatics 2010 ; 11 ( 1 ): 246 . Google Scholar Crossref Search ADS PubMed 183 Petzold L , Li S , Cao Y , et al. Sensitivity analysis of differential-algebraic equations and partial differential equations . Comput Chem Eng 2006 ; 30 ( 10 ): 1553 – 9 . Google Scholar Crossref Search ADS 184 Kiparissides A , Hatzimanikatis V. Thermodynamics-based metabolite sensitivity analysis in metabolic networks . Metab Eng 2017 ; 39 : 117 – 27 . Google Scholar Crossref Search ADS PubMed 185 Miskovic L , Hatzimanikatis V. Production of biofuels and biochemicals: in need of an ORACLE . Trends Biotechnol 2010 ; 28 ( 8 ): 391 – 7 . Google Scholar Crossref Search ADS PubMed 186 Andreozzi S , Miskovic L , Hatzimanikatis V. iSCHRUNK–In Silico approach to characterization and reduction of uncertainty in the kinetic models of genome-scale metabolic networks . Metab Eng 2016 ; 33 : 158 – 68 . Google Scholar Crossref Search ADS PubMed 187 Teetor PR. Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics . O’Reilly Media , 2011 . 188 Matloff N. The Art of R Programming: A Tour of Statistical Software Design . San Francisco: No Starch Press , 2011 . 189 Deb K. Multi-Objective Optimization Using Evolutionary Algorithms, 2001 . Chicheter : John-Wiley , 2001 . 190 Sun S. A survey of multi-view machine learning . Neural Comput Appl 2013 ; 23 ( 7–8 ): 2031 – 8 . Google Scholar Crossref Search ADS 191 Frolova A , Obolenska M. Integrative approaches for data analysis in systems biology: current advances. In: 2016 II International Young Scientists Forum on Applied Physics and Engineering (YSF) . 2016 , 194–8. IEEE, New York. 192 Ritchie MD , Holzinger ER , Li R , et al. Methods of integrating data to uncover genotype-phenotype interactions . Nat Rev Genet 2015 ; 16 ( 2 ): 85 – 97 . Google Scholar Crossref Search ADS PubMed 193 Serra A , Fratello M , Fortino V , et al. MVDA: a multi-view genomic data integration methodology . BMC Bioinformatics 2015 ; 16 ( 1) : 261 . Google Scholar Crossref Search ADS PubMed 194 Taskesen E , Huisman SM , Mahfouz A , et al. Pan-cancer subtyping in a 2D-map shows substructures that are driven by specific combinations of molecular characteristics . Sci Rep 2016 ; 6 ( 24949 ): 1 – 13 . Google Scholar PubMed 195 Speicher NK , Pfeifer N. Integrating different data types by regularized unsupervised multiple kernel learning with application to cancer subtype discovery . Bioinformatics 2015 ; 31 ( 12 ): i268 – 75 . Google Scholar Crossref Search ADS PubMed 196 McLachlan GJ , Bean RW , Ng SK. Clustering . In: Bioinformatics: Structure, Function and Applications . 2008 , 423 – 39 . IEEE, New York. 197 Chen X , Xu X , Huang JZ , et al. TW-k-means: automated two-level variable weighting clustering algorithm for multiview data . IEEE Trans Knowl Data Eng 2013 ; 25 ( 4 ): 932 – 44 . Google Scholar Crossref Search ADS 198 Shen R , Olshen AB , Ladanyi M. Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis . Bioinformatics 2009 ; 25 ( 22 ): 2906 – 12 . Google Scholar Crossref Search ADS PubMed 199 Greene D , Cunningham P. A matrix factorization approach for integrating multiple data views. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases . Heidelberg: Springer Verlag GmbH, 2009 , 423 – 38 . 200 Wang B , Mezlini AM , Demir F , et al. Similarity network fusion for aggregating data types on a genomic scale . Nat Methods 2014 ; 11 ( 3 ): 333 – 7 . Google Scholar Crossref Search ADS PubMed 201 Angione C , Conway M , Lió P. Multiplex methods provide effective integration of multi-omic data in genome-scale models . BMC Bioinformatics 2016 ; 17 ( 4 ): 257. Google Scholar PubMed 202 Wasito I , Istiqlal AN , Budi I. Data integration model for cancer subtype identification using Kernel Dimensionality Reduction-Support Vector Machine (KDR-SVM). In: 2012 7th International Conference on Computing and Convergence Technology (ICCCT) . 2012 , 876 – 80 . IEEE, New York. 203 Li Y , Wu FX , Ngom A. A review on machine learning principles for multi-view biological data integration . Brief Bioinformatics 2016 , in press. 204 Breiman L. Random forests . Mach Learn 2001 ; 45 ( 1 ): 5 – 32 . Google Scholar Crossref Search ADS 205 Fortino V , Kinaret P , Fyhrquist N , et al. A robust and accurate method for feature selection and prioritization from multi-class OMICs data . PloS One 2014 ; 9 ( 9 ): e107801. Google Scholar Crossref Search ADS PubMed 206 Acharjee A , Ament Z , West JA , et al. Integration of metabolomics, lipidomics and clinical data using a machine learning method . BMC Bioinformatics 2016 ; 17 ( 15 ): 37. Google Scholar PubMed 207 Pratanwanich N , Lió P. Exploring the complexity of pathway–drug relationships using latent Dirichlet allocation . Comput Biol Chem 2014 ; 53 : 144 – 52 . Google Scholar Crossref Search ADS PubMed © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model) TI - Seeing the wood for the trees: a forest of methods for optimization and omic-network integration in metabolic modelling JO - Briefings in Bioinformatics DO - 10.1093/bib/bbx053 DA - 2018-11-27 UR - https://www.deepdyve.com/lp/oxford-university-press/seeing-the-wood-for-the-trees-a-forest-of-methods-for-optimization-and-7wwxWSBdMn SP - 1218 VL - 19 IS - 6 DP - DeepDyve ER -