Access the full text.
Sign up today, get DeepDyve free for 14 days.
Motivation: Transcriptomics and proteomics data have been integrated into constraint-based mod- els to inﬂuence ﬂux predictions. However, it has been reported recently for Escherichia coli and Saccharomyces cerevisiae, that model predictions from parsimonious ﬂux balance analysis (pFBA), which does not use expression data, are as good or better than predictions from various algorithms that integrate transcriptomics or proteomics data into constraint-based models. Results: In this paper, we describe a novel constraint-based method called Linear Bound Flux Balance Analysis (LBFBA), which uses expression data (either transcriptomic or proteomic) to pre- dict metabolic ﬂuxes. The method uses expression data to place soft constraints on individual ﬂuxes, which can be violated. Parameters in the soft constraints are ﬁrst estimated from a training expression and ﬂux dataset before being used to predict ﬂuxes from expression data in other con- ditions. We applied LBFBA to E.coli and S.cerevisiae datasets and found that LBFBA predictions were more accurate than pFBA predictions, with average normalized errors roughly half of those from pFBA. For the ﬁrst time, we demonstrate a computational method that integrates expression data into constraint-based models and improves quantitative ﬂux predictions over pFBA. Availability and implementation: Code is available in the Supplementary data available at Bioinformatics online. Contact: email@example.com Supplementary information: Supplementary data are available at Bioinformatics online. 1 Introduction enabled transcriptomics and proteomics datasets to become more Constraint-based modeling (CBM) can be used to predict cell physi- widely available. These omics datasets can be used to derive ology (e.g. growth rate and metabolic fluxes) under different condi- expression-based CBM constraints and/or objective functions, tions and improve our understanding of cell metabolism. CBM has which can potentially improve model predictions. been applied in metabolic engineering (Burgard et al., 2003; Cotten There are two fundamental ways that expression data has been and Reed, 2013; Kim et al., 2011; Tervo and Reed, 2012), metabolic integrated into constraint-based models. The first way is to directly comparisons (Bosi et al., 2016; Hamilton and Reed, 2012; Nuccio integrate the expression information into the flux bound. For ex- ˚ ˚ and Ba ¨ umler, 2014), drug discovery (Chavali et al., 2012; Kim ample, Akesson et al. (Akesson et al., 2004)set thefluxestozeroif et al., 2010; Lee et al., 2009) and other health applications (Becker expression of their associated genes was low. E-Flux (Colijn et al., and Palsson, 2008; Magnu ´ sdo ´ ttir et al., 2017; Shlomi et al., 2008). 2009) directly models the maximum allowable flux value as a Recent developments in sequencing and mass spectrometry have function of measured gene expression. The second way is to divide V The Author(s) 2018. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: firstname.lastname@example.org 3882 Using expression data to improve flux predictions 3883 Table 1. A comparison between different constraint-based methods integrating gene expression data Akesson E-flux GIMME iMAT tFBA MADE PROM LBFBA Directly integrated gene expression into ﬂux bound Yes Yes No No No No Yes Yes Maximized agreement or minimized violation between ﬂux and gene expression No No Yes Yes Yes Yes No No Needs ﬂux data to parameterize constraints No No No No No No No Yes a a Compared ﬂux predictions to measured intracellular ﬂuxes Yes (4 ) No No No No No No Yes (37 ) Number of experimental conditions used 1 1 1 1 9 4 907 28 The number of ﬂuxes that were compared. Sensitivity analysis showed that 4 or 5 conditions in the training dataset were sufﬁcient. ˚ ˚ Note: These methods are Akesson (Akesson et al., 2004), E-ﬂux (Colijn et al., 2009), GIMME (Becker and Palsson, 2008), Imat (Shlomi et al., 2008), tFBA (van Berlo et al., 2011), MADE (Jensen and Papin, 2011), PROM (Chandrasekaran and Price, 2010) and LBFBA. the reactions into different categories based on gene expression (e.g. involves different constraints and an objective function. Flux bal- highly expressed or lowly expressed) and then maximize the agree- ance analysis (FBA) is one of the CBM methods often used to pre- ment (reactions associated with highly expressed genes have high dict a flux distribution which maximizes biomass yield. pFBA uses flux) or minimize the disagreement (reactions associated with lowly the sum of the absolute value of the fluxes as an objective function expressed genes should not have high flux) between flux and expres- [Equation (1)] and can be formulated as: sion. For example, Gene Inactivity Moderated by Metabolism and X min jv j (1) Expression (GIMME) (Becker and Palsson, 2008) minimizes the flux j2Reaction through reactions whose associated genes’ expression falls below a given threshold, weighted by the difference between the genes’ ex- s.t. pression and the threshold. iMAT (Shlomi et al., 2008) divides the S v ¼ 0 8i 2 Metabolite (2) ij j reactions into those associated with highly expressed genes and reac- j2Reaction tions associated with lowly expressed genes, and maximizes the num- ber of reactions whose fluxes are consistent with their gene LB v UB 8j 2 Reaction (3) j j j expression state. Transcriptionally controlled flux balance analysis (tFBA) (van Berlo et al., 2011) assumes that if gene expression v 0 8j 2 Irreversible Reaction (4) changes significantly from one condition to the other, the flux through the reaction associated with that gene will change according- ls v ¼ v 8j 2 Extracellular Reaction (5) ly. tFBA minimizes the violation of this assumption. A detailed com- parison of different methods is shown in Table 1. v ¼ v (6) biomass measured biomass Machado and Herrga ˚rd (Machado and Herrga ˚ rd, 2014) com- pared flux predictions made by several of these expression-based Equation (2) is the steady-state mass balance constraint, meaning methods and parsimonious flux balance analysis (pFBA), which there is no accumulation for each metabolite in the cell. S denotes does not use expression data and instead maximizes biomass yield the stoichiometric matrix where S is the stoichiometric coefficient ij and minimizes total flux through the network (Lewis et al., 2010). of metabolite i for reaction j. v is the flux through reaction j. They found that pFBA predictions were as good as or better than Equation (3) is the enzyme capacity constraint, which imposes an those obtained using existing methods that integrated transcriptom- upper bound (UB ) and lower bound (LB ) for each reaction (which j j ics/proteomics data. In this paper, we developed a CBM method is typically 1000 and 1000 mmol/gDW/h, respectively). Equation called Linear Bound FBA (LBFBA) that uses expression data to more (4) ensures that fluxes through irreversible reactions are non- accurately predict intracellular fluxes than pFBA. The method adds negative. Equation (5) fixes the extracellular flux values to the best- expression-derived flux bounds for a subset of fluxes based on the ls estimates of the extracellular fluxes (v ) obtained from a least gene/protein expression data. These upper and lower flux bounds squares fit between the metabolic model and extracellular flux meas- are linear functions of the transcriptomics/proteomics data and are urements (see Supplementary Methods for details). Equation (6) reaction-specific. Instead of applying a single bound function to all fixes the biomass flux (i.e. growth rate) to the measured value. By reactions, LBFBA derives reaction-specific bounds for each reaction, solving pFBA, the flux distribution under a specific condition can be which are learned from the transcriptomics (or proteomics) and predicted. fluxomics data in a training dataset. LBFBA requires measured tran- scriptomics or proteomics data and fluxomics data to first param- eterize the flux bound functions and requires only transcriptomics 2.2 Mathematical formulation of LBFBA or proteomics data to predict the flux distribution in a new condi- In LBFBA, gene or protein expression data are used to further tight- tion. LBFBA was applied to Escherichia coli and Saccharomyces cer- en the upper and lower bounds for individual fluxes. LBFBA is for- evisiae datasets and the flux distributions predicted by LBFBA were mulated as the following optimization problem: found to be more accurate than those fluxes predicted by pFBA. X X min jv jþ b a (7) j j j2Reaction j2Rexp 2 Materials and methods s.t. 2.1 Overview of pFBA S v ¼ 0 8i 2 Metabolite (8) ij j CBM is a powerful tool to predict cellular phenotypes and flux dis- j2Reaction tributions. The basic formulation of a constraint-based model 3884 M.Tian and J.L.Reed LB v UB 8j 2 Reaction (9) j j j v 0 8j 2 Irreversible Reaction (10) ls v ¼ v 8j 2 Extracellular Reaction (11) v ¼ v (12) biomass measured biomass v ða g þ c Þ a v v ða g þ b Þþ a 8j 2 R (13) glucose j j j j j glucose j j j j exp a 0 8j 2 R (14) j exp LBFBA includes the same mass balance, directionality, and capacity constraints as pFBA [Equations (8)–(12)]. Assuming the flux bounds are linearly correlated with gene or protein expression level, we for- Fig. 1. Relative ﬂux through the phosphoglucose isomerase (PGI) reaction mulated expression-based constraints [Equation (13)] for the subset versus gene expression (b4025). The x-axis represents the mRNA expression value for b4025, which encodes Pgi. The y-axis represents the relative ﬂux, of reactions (R ) with measured flux and expression values in a exp which is the ﬂux through the PGI reaction divided by the glucose uptake rate. training dataset. In this work, R contained 37 reactions for E.coli exp Given 29 pairs of b4025 gene expression and PGI ﬂux measurements in the based on a mutant multi-omics dataset (Ishii et al., 2007) and 33 E.coli mutant multi-omics dataset, 28 data points are plotted when perform- reactions for a yeast aerobicity multi-omics dataset (Jouhten et al., ing a leave-one-out cross-validation to predict ﬂuxes in the test condition. 2008; Rintala et al., 2009; Wiebe et al., 2008;). In Equation (13), g The two parallel lines shown are used to bound the PGI ﬂux given a gene ex- is the expression level for reaction j. The gene-to-protein-to-reaction pression value for b4025. The slope (a) and intercepts (b and c) for the two lines can be obtained using linear optimization [Equations (15) and (16)] (GPR) association is used to calculate g from the gene or protein ex- pression levels for genes associated with reaction j. If one reaction can be catalyzed by multiple isoenzymes, then g is calculated as the j testing dataset. To obtain these parameters (a , b , and c ), a linear j j j sum of the expression of these isoenzymes. For GPRs involving sub- program was formulated separately for each reaction j in R : exp units, the expression is the minimum of expression across all subu- min ðb c Þ (15) j j nits. In Equation (13), a , b and c are parameters, which are j j j estimated from expression and flux measurements from independ- s.t. ent training datasets [using Equations (15) and (16)]. Given ex- pression levels for each reaction (g ) and fixed parameters a ,and j j j k k a g þ c a g þ b 8k 2 K (16) j j j j j j c , the upper and lower flux bounds can be calculated for reactions j v glucose in R [using Equation (13)]. a is a nonnegative slack variable exp j We used the mutant multi-omics dataset for E.coli and S.cerevisiae which is introduced to avoid infeasible flux bounds [Equation multi-omics dataset (see below) containing transcriptomics, proteo- (14)] and LBFBA tries to make these slack variables as small as mics and fluxomics to estimate the parameters used in Equation possible. Therefore, these slack variables are weighted by b and (13). As before g is the expression level for reaction j. The flux for penalized in the objective function. Unless noted otherwise, b was j reaction j in condition k is defined as v , and the relative flux is set to 10 (see Supplementary Fig. S1 for sensitivity analysis on the k k k k found by dividing it by v (glucose uptake rate). g ; v ; v value of b). LBFBA attempts to minimize the sum of absolute val- j j glucose glucose were obtained from the multi-omics dataset. The set K includes all ues of all the fluxes with the smallest total flux bound violations of the different conditions used from the multi-omics dataset to param- the expression-based constraints [Equation (7)]. When using differ- eterize the flux bounds. All the points must be lay between the two ent multi-omics datasets as training and test sets, z-scores for the g lines defined with slope a and intercept b , c . The vertical distance values were used instead along with minor modifications to the j j j between the two lines, represented as b c should be as small as LBFBA constraints (see Supplementary Methods for details). j j possible so that there are tight bounds for each reaction flux. This Both pFBA and LBFBA were implemented for a genome- parameter estimation algorithm requires at least three pairs of ex- scale metabolic reconstruction of E.coli iJO1366 (Orth et al., 2011) pression level and flux measurements (i.e. K 3). We included the and S.cerevisiae iMM904 (Mo et al., 2009). The following reactions points with 0 expression and 0 flux (which occurs for mutants with- were deleted in all iJO1366 simulations: CAT, FHL, SPODM, out isozymes) in the parameter estimation since the flux must be SPODMpp, DHPTDNR, DHPTDNRN, SUCASPtpp, SUCFUMtpp, zero if there is no enzyme available. SUCMALtpp and SUCTARTtpp. All optimization problems were solved using CPLEX in GAMS optimization environment (GAMS Development Corporation, Washington, DC, USA) and CVX 2.4 Normalized error calculation (Grant and Boyd, 2014; Grant and Boyd, 2008), a package specify- The normalized errors were used to compare how close the pre- ing and solving convex programs in MATLAB. dicted flux distribution was to the measured flux distribution and was calculated as: X X 2.3 LBFBA expression parameterization p;k k k jv v j= jv j (17) j j j The goal for the LBFBA parameter estimation is to find two parallel j2measured j2measured lines, which correspond to the upper and lower flux bounds, for p;k each individual flux in R (Fig. 1). Therefore, the slope (a ) and v is the predicted flux through reaction j in condition k and is exp j intercepts for the two lines (b and c ) need to be calculated for each the solution from pFBA [Equations (1)–(6)] or LBFBA [Equations j j reaction using a training dataset, which is independent from the (7)–(14)]. v is the measured flux in condition k through reaction j j Using expression data to improve flux predictions 3885 in the dataset. The normalized errors can vary due to alternative op- For example, in condition 28 (wild-type E.coli at a growth rate of timal solutions. Minimum and maximum possible normalized errors 0.7 h ) pFBA falsely predicted gluconate production, which for a particular flux prediction method and condition were calcu- diverts carbon from glycolysis and thus predicted lower fluxes in lated (see Supplementary Methods for details). glycolysis and the TCA cycle. LBFBA with proteomics generally outperformed pFBA (except for conditions 4 and 5) and LBFBA with transcriptomics in terms of accuracy. We also performed a 10-fold cross-validation and found similar results (Supplementary 3 Results Fig. S2). 3.1 Application of LBFBA to the E.coli mutant multi- To determine the optimal training set size, a sensitivity analysis with respect to training set size for LBFBA with transcriptomics was omics dataset performed (Fig. 3). When the training set size was 0, pFBA was run To evaluate the accuracy of LBFBA predictions, we applied LBFBA for each of the 29 conditions. The average normalized error for to the E.coli mutant multi-omics dataset (Ishii et al., 2007) and per- pFBA across the 29 conditions was around 40%. The training set formed leave-one-out (LOOCV) and 10-fold cross-validation. In the size (K) in LBFBA was then varied from 3 to 28. When the training E.coli mutant multi-omics dataset, transcriptomics, proteomics and set size was K, K conditions were randomly selected from the 29 fluxomics were measured under 29 glucose aerobic conditions. conditions as the training set to estimate the parameters (slope a Conditions 1–24 are knockout mutants with growth rates of j and intercept b ; c for the set of measured fluxes), and then LBFBA 0.2 h . Conditions 25–29 are wild-type strains with growth rates of j j was run on the testing set, containing the other (29-K) conditions. 0.1, 0.4, 0.5, 0.7, 0.2 h . The average normalized error for a given training set of size K was In LOOCV, one condition was selected as the test set each time calculated by averaging the normalized errors for the (29-K) inde- and the other 28 conditions were used as the training set to obtain pendent conditions in the testing set. For each training set size ex- the a , b and c values. The normalized errors were calculated and j j j cept zero, this process was repeated 100 times. Each box plot in compared to normalized errors from pFBA (Fig. 2). LBFBA was Figure 3 represents results from 100 randomly chosen testing data- used with transcriptomics or proteomics data. For each condition, sets. The median of these 100 average normalized errors is repre- the best possible normalized error (Best case) was also calculated sented as the horizontal line in each box plot. As shown in Figure 3, by making reaction fluxes as close as possible to their measured the best training set size was 4 or 5. When the training set size was value (see Supplementary Methods for details).Among the29 con- larger than seven, the median average normalized error started to in- ditions, except for conditions 4 (E.coli Dpgi knockout at a growth crease. If a suitable training set size (4 or 5) was chosen, the median rate of 0.2 h ) and 26 (wild-type E.coli at a growth rate of of the 100 average normalized errors was around 20%. Compared 0.4 h ), LBFBA with transcriptomics outperformed pFBA in terms to pFBA, where the average normalized error was around 40%, of accuracy (Wilcoxon signed-rank test, P-value ¼ 3.1e 5). LBFBA decreased the average normalized error by more than half. The parameter values a , b and c for all 37 reactions, which led to j j j the lowest normalized error for K ¼ 3 are listed in Supplementary Table S1. Fig. 2. Simulation result for LBFBA compared with pFBA for the E.coli dataset. The x-axis represents the 29 conditions in the multi-omics dataset. Each con- dition has measured transcriptomics, proteomics and ﬂuxomics data. The y-axis represents the normalized error, which is deﬁned as sum of the abso- lute error for each measured reaction divided by the sum of the absolute Fig. 3. Sensitivity analysis of the training set size using transcriptomics data. value of experimental measurements [Equation (17)]. For each condition, The x-axis represents the number of instances (conditions) in the training set. data from the 28 other conditions were used to parameterize ﬂux bounds The y-axis represents average normalized error for the test set. The average before LBFBA was performed on the test condition (i.e. leave-one-out-cross- normalized error for pFBA across the 29 conditions is around 40% (corre- validation). For most cases, pFBA (hollow diamond) has a higher normalized sponding to a training set size of zero). Each Tukey box plot was created using error than LBFBA integrating transcriptomics data (star) or proteomics data the boxplot function in MATLAB. The horizontal line represents the median (solid triangle). Integrating proteomics data were more accurate than inte- value. The top and bottom edges of the boxes indicate the 75th and 25th per- grating transcriptomics for most conditions. The best case (hollow circle) rep- centiles and the plus signs indicate the outliers (which are 1.5 times the inter- resents the lowest possible error achieved by ﬁtting constraint-based models quartile range away from the top/bottom edge of the boxes, where the to the measured ﬂuxes. All the normalized errors predicted by pFBA or interquartile range is the difference between 75th and 25th percentile values). LBFBA were unique (i.e. ﬂuxes through reactions in R had no variability The whiskers indicate the furthest values away from the boxes that are not exp across alternative pFBA or LBFBA optimal solutions) considered outliers 3886 M.Tian and J.L.Reed Fig. 4. Simulation result for LBFBA compared with pFBA for Gerosa E.coli dataset. The x-axis represents the eight conditions in the Gerosa multi-omics dataset. Each condition has measured transcriptomics and ﬂuxomics data. Fig. 5. Simulation result for LBFBA compared with pFBA for a yeast dataset. The y-axis represents the normalized error, which is deﬁned as sum of the ab- The x-axis represents the ﬁve conditions with different oxygen composition. solute error for each measured reaction divided by the sum of the absolute Each condition has measured transcriptomics, proteomics and ﬂuxomics value of experimental measurements [Equation (17)]. For each condition, data. The y-axis represents the normalized error, which is deﬁne as sum of data from the 29 conditions in the E.coli mutant multi-omics dataset by Ishii the absolute error for each measured reaction divided by the sum of the abso- et al. was used to parameterize ﬂux bounds before LBFBA was performed on lute value of experimental measurements [Equation (17)]. The error bar for the eight conditions in Gerosa E.coli dataset. The z-scores for expression val- each normalized error was generated by minimizing and maximizing the nor- ues were used instead of the expression values directly (see Supplementary malized error while ﬁxing the LBFBA objective. For all 5 conditions, pFBA Methods for details). The best case (hollow circle) represents the lowest pos- (hollow diamond) had a higher normalized error than LBFBA integrating tran- sible error achieved by ﬁtting constraint-based models to the measured scriptomics data (star) and proteomics data (solid triangle). The error bar for ﬂuxes. All the normalized errors predicted by pFBA or LBFBA were unique LBFBA was also smaller than pFBA. The best case (hollow circle) represents the lowest possible error achievable by the iMM904 constraint-based model 3.2 Application of LBFBA to E.coli carbon source multi- omics datasets the inlet gas increased from 0 (fully anaerobic) to 20.9% (fully aer- To evaluate the accuracy of LBFBA predictions on a fully independ- obic). Following the same protocol as in Section 3.1, the normal- ent test set, we used the E.coli mutant multi-omics dataset (Ishii ized errors for LBFBA compared with pFBA were found (Fig. 5). et al., 2007) to parameterize the flux bounds (values listed in Each error bar represents the maximum and minimum normalized Supplementary Table S2). Flux distributions for another E.coli car- error that the method can achieve for a given condition due to al- bon source multi-omics dataset (Gerosa et al., 2015) were predicted ternative optimal solutions (see Supplementary Methods for details given transcriptomics data. In Gerosa et al., transcriptomics and on how these error bars were calculated). LBFBA improved the fluxomics data were measured for wild-type E.coli under eight dif- yeast flux predictions and had a smaller error range for each condi- ferent carbon sources (acetate, fructose, galactose, glucose, glycerol, tion, indicating a smaller flux variability across alternative solu- gluconate, pyruvate and succinate). Normalized errors for LBFBA tions. Unlike the E.coli mutant multi-omics datasets, LBFBA had compared with pFBA were found (Fig. 4). LBFBA slightly outper- similar performance using proteomics or transcriptomics data. The forms pFBA in terms of accuracy for conditions 2, 3, 7 (fructose, normalized errors for the best possible fit to the measured yeast galactose, pyruvate as a carbon source, respectively). pFBA slightly fluxes was high for conditions 2–5, indicating there are significant outperforms LBFBA for conditions 1, 4, 5, 6 and 8 (acetate, glucose, inconsistencies between the yeast metabolic model (iMM904) and glycerol, succinate as a carbon source, respectively). Unlike the mu- measured fluxes. tant E.coli multi-omics dataset, the normalized errors between pFBA and LBFBA for this carbon source multi-omics dataset was not significantly different (Wilcoxon signed-rank test, 3.4 Impact of reaction-specific flux bounds P-value ¼ 0.25). This is mainly because of the large differences be- LBFBA uses reaction-specific bounds to limit the flux instead of tween flux distributions in the training set (where glucose was the using the same bound for every flux like pFBA or using a global as- carbon source) and the test set (using eight different carbon sources). sumption for all reactions (e.g. highly expressed reactions have high Condition 6 (where gluconate was the carbon source) does not fit flux). This is the key difference between LBFBA and other methods the E.coli metabolic model (iJO1366) well, as the normalized errors like MADE (Jensen and Papin, 2011), GIMME (Becker and Palsson, for the best possible fit (Best case) to the measured E.coli fluxes was 2008), IMAT (Shlomi et al., 2008) and tFBA (van Berlo et al., high. 2011). To evaluate whether LBFBA performs better due to these reaction-specific bounds, we instead applied common linear equa- tions for flux bounds for every reaction in R and predicted the exp 3.3 Application of LBFBA to a yeast aerobicity multi- flux distribution using LBFBA. These new generic linear bounds omics dataset were parameterized by analyzing 1036 pairs of transcriptomics and A yeast aerobicity multi-omics dataset (Jouhten et al., 2008; fluxomic measurements from the E.coli mutant multi-omics dataset, Rintala et al.,2009; Wiebe et al.,2008) was also used for the eval- which included 37 measured fluxes under 28 conditions (Fig. 6 uating LBFBA predictions using a LOOCV. In the yeast multi- shows data from all 29 conditions) to predict fluxes for an independ- omics dataset, transcriptomics, proteomics and fluxomics were ent test condition. The normalized errors predicted by LBFBA with measured under five conditions with different oxygen composition these generic linear bounds (i.e. not reaction-specific) using LOOCV in the inlet gas. From condition 1–5, the oxygen composition in were about the same as the normalized errors from pFBA (average Using expression data to improve flux predictions 3887 Figure S4 for explanation why including condition 4 in the training set worsens LBFBA predictions). (v) LBFBA works for both eukar- yotes and prokaryotes, and for both wild-type and knockout strains, as shown in the E.coli example. LBFBA can integrate proteomics or transcriptomics and both lead to better flux predictions than pFBA. However, using proteo- mics data does show more accurate flux predictions than using tran- scriptomics data for most test cases. In the E.coli example, using proteomics data improved flux predictions for 24 out of the 29 con- ditions. In the yeast example, no significant difference was found be- tween integrating transcriptomics or proteomics data. Therefore, it is generally recommended to integrate proteomics, if available. LBFBA uses linear functions to constrain a subset of metabolic fluxes to a range of values that are more physiologically realistic, ra- Fig. 6. Generic ﬂux bounds generated from the relative ﬂux and gene expres- ther than try and fit expression and flux data to a single linear func- sion for all 37 reactions in the E.coli multi-omics dataset. The x-axis repre- tion. Since there is often weak correlation between flux and sents the gene expression value and the y-axis represents the ﬂux expression (see Supplementary Table S3 for correlation coefficients normalized by the carbon uptake rate for the corresponding condition. for the E.coli mutant dataset), using flux bounds allows for weak Shown are all 1073 pairs of transcriptomics levels and relative ﬂuxes for the correlations between flux and expression to still be accounted for by E.coli multi-omics dataset, which included measurements for 37 ﬂuxes under 29 conditions. The two parallel linear lines are used to bound the 37 ﬂuxes in reducing the solution space. In this work linear bounds were used LBFBA that required only three parameters be estimated for each reaction. Other types of functions could potentially be used; however, this normalized error across 29 conditions was 39.81% and 39.84% for could increase the number of parameters and training datasets LBFBA and pFBA, respectively) because these generic linear bounds required. were too loose. The flux bounds in LBFBA are sensitive to the datasets used to parameterize them (Fig. 3). However, even with weak correlations between flux and expression the LBFBA flux predictions are still ro- bust (since average normalized errors for half of the testing datasets 4 Discussion differed by <0.05, Fig. 3). Some conditions when included in the In this paper, we describe a new method, LBFBA, which predicts training dataset decreased LBFBA accuracy (Supplementary Fig. S3). flux distributions when transcriptomics/proteomics data are avail- For example, including data for the Pgi mutant in the training data- able along with flux values for a training dataset. The method inte- set (i.e. condition 4 from the E.coli mutant multi-omics dataset) grates the transcriptomics/proteomics data by setting the flux tended to worsen LBFBA flux predictions for other conditions. It bounds for a subset of reactions using reaction-specific linear func- would be beneficial to identify these outliers and eliminate them tions based on their corresponding expression levels. LBFBA can from the training datasets a priori; however, we were unable to find predict the flux distribution more accurately than pFBA. With the a way to identify them. For example, including data from the Pgi E.coli mutant dataset, when using four sets of experimental meas- mutant into the training set changed the estimated a, b and c values, urements of transcriptomics and fluxomics, LBFBA can achieve a but the magnitude of these changes was not larger than adding other normalized error which is around half the normalized errors of single conditions to the same training set. Additionally, measured pFBA. LBFBA can also reduce the flux variability across alternate fluxes for the Pgi mutant were not the most different compared to solutions, as shown in the yeast example. measured fluxes in other conditions, as determined by the mean LBFBA has several advantages compared to other expression- squared distance between flux distributions (Long and Reed, 2017). based methods: (i) LBFBA does not assume that higher expression While removing outliers could improve LBFBA predictions, these will lead to higher corresponding metabolic fluxes since the slopes outliers can also capture more extreme cellular behaviors, so remov- of the linear flux bound functions can take negative values. ing them could make predictions of other extreme behaviors worse Literature has shown that it is not always true that higher expression since similar conditions were not included in the training datasets. will lead to higher corresponding metabolic fluxes (Kim et al., Higher expression does not necessarily result in higher flux 2013). (ii) LBFBA is a simple linear programing CBM method and through the corresponding reaction. As shown in Supplementary thus easy to solve. (iii) LBFBA only requires four or five sets of ex- Table S1, nearly half of the slopes are negative, which demonstrates perimental measurements of transcriptomics/proteomics and fluxo- either a weak or negative correlation between gene expression and mics for parameterizing the flux bounds. Larger training set sizes flux levels. This is because there are multiple layers of enzyme con- may worsen the predictions because the flux bounds obtained from trol between gene expression and the corresponding flux (Hoppe, the training set may become too large due to data outliers. 2012). Hackett et al. showed that changes in flux across nutrient Therefore, the solution space does not shrink a lot and thus the conditions are predominantly due to metabolite levels, and not en- LBFBA predictions become worse. Additionally, training with more zyme levels (Hackett et al., 2016). This observation could explain datasets increases the likelihood that conditions are included where why LBFBA performance was not significantly better compared to fluxes are controlled at the metabolic level instead of the expression pFBA for the E.coli carbon source multi-omics dataset. level. In these instances, expression is not a good indicator of flux Among the top six reactions (RPE, G6PDH2r, PGL, GND, PYK, and could negatively impact LBFBA predictions. (iv) LBFBA was PGI, RPI) with the most negative slope, five belong to the Pentose not sensitive to which specific conditions were used in the training Phosphate Pathway (RPE, G6PDH2r, PGL, GND, RPI). As the gene of a given size (Supplementary Fig. S3) except for condition 4 (E.coli expression increases, the flux bounds are shifting down. Entry of Dpgi knockout at a growth rate of 0.2 h , see Supplementary glucose 6-phosphate either into glycolysis via PGI or into the 3888 M.Tian and J.L.Reed Gerosa,L. et al. (2015) Pseudo-transition analysis identiﬁes the key regulators pentose phosphate pathway via G6PDH2r is largely determined by of dynamic metabolic adaptations from steady-state data. Cell Syst., 1, the relative concentrations of NADPþ and NADPH (Lehninger 270–282. et al., 2008). Other reactions had slopes that were close to zero, Grant, M. and Boyd,S. (2008) Graph implementations for nonsmooth convex indicating that expression does not have much of an impact on flux. programs. In: Blondel, V. et al (eds), Recent Advances in Learning and This includes the D-lactate dehydrogenase (LDH_D), acetate kinase Control, Lecture Notes in Control and Information Sciences. Springer (ACKr) and phosphate acetyltransferase (PTAr) reactions. These Limited, Berlin, pp. 95–110. results could indicate that for these reactions flux is controlled at the Grant,M. and Boyd,S. (2014) CVX: Matlab Software for Disciplined Convex metabolite level rather than the gene expression level. Programming, Version 2.1. http://cvxr.com/cvx. LBFBA needs training data to parameterize the flux bounds. If Hackett,S.R. et al. (2016) Systems-level analysis of mechanisms regulating transcriptomics/proteomics and fluxomics data for the same condi- yeast metabolic ﬂux. Science, 354, aaf2786. Hamilton,J.J. and Reed,J.L. (2012) Identiﬁcation of functional differences in tion are available, LBFBA can be used. However, these datasets are metabolic networks using comparative genomics and constraint-based mod- usually hard to obtain due to the high cost of fluxomics measure- els. PLoS One, 7, e34670. ments. Also, the fluxomics measurements used in this paper are lim- Hoppe,A. (2012) What mRNA abundances can tell us about metabolism. ited to central metabolism. So LBFBA only compares the flux Metabolites, 2, 614–631. predictions in the central metabolism with those from pFBA, even Ishii,N. et al. (2007) Multiple high-throughput analyses monitor the response though both methods can predict the genome-scale flux distribu- of E. coli to perturbations. Science, 316, 593–597. tions. As more genome-scale flux measurements become available, Jensen,P.A. and Papin,J.A. (2011) Functional integration of a metabolic flux bounds for more reactions can be included in LBFBA and com- network model and expression data without arbitrary thresholding. parisons can be made between predicted and measured fluxes be- Bioinformatics, 27, 541–547. yond central metabolism. Jouhten,P. et al. (2008) Oxygen dependence of metabolic ﬂuxes and energy generation of Saccharomyces cerevisiae CEN.PK113-1A. BMC Syst. Biol., 2, 60. Kim,H.U. et al. (2013) Flux-coupled genes and their use in metabolic ﬂux ana- Acknowledgements lysis. Biotechnol. J., 8, 1035–1042. Thanks to Caroline Mitchell for help editing the manuscript. Thanks to Kim,H.U. et al. (2010) Genome-scale metabolic network analysis and drug Joonhoon Kim, Prashant Kumar and Matthew Long for helpful discussions. targeting of multi-drug resistant pathogen Acinetobacter baumannii AYE. Mol. Biosyst., 6, 339–348. Kim,J. et al. (2011) Large-scale bi-level strain design approaches and Funding mixed-integer programming solution techniques. PLoS One, 6, e24162. This work was funded in part by the Office of Science (BER), U.S. Lee,D.-S. et al. (2009) Comparative genome-scale metabolic reconstruction Department of Energy (DE-SC0008103), the U.S. Department of Energy and ﬂux balance analysis of multiple Staphylococcus aureus genomes iden- Great Lakes Bioenergy Research Center (DOE BER Office of Science DE- tify novel antimicrobial drug targets. J. Bacteriol., 191, 4015–4024. FC02-07ER64494), and by grant U19AI106772, provided by the National Lehninger,A.L. et al. (2008) Lehninger Principles of Biochemistry. W.H. Institute of Allergy and Infectious Diseases, National Institutes of Health. Freeman, New York. Lewis,N.E. et al. (2010) Omic data from evolved E. coli are consistent Conflict of Interest: none declared. with computed optimal growth from genome-scale models. Mol. Syst. Biol., 6, 390. Long,M.R. and Reed,J.L. (2017) Improving ﬂux predictions by integrating References data from multiple strains. Bioinformatics, 33, 893–900. Machado,D. and Herrga ˚ rd,M. (2014) Systematic evaluation of methods for Akesson,M. et al. (2004) Integration of gene expression data into integration of transcriptomic data into constraint-based models of metabol- genome-scale metabolic models. Metab. Eng., 6, 285–293. ism. PLoS Comput. Biol., 10, e1003580. Becker,S.A. and Palsson,B.O. (2008) Context-speciﬁc metabolic networks are Magnu ´ sdo ´ ttir,S. et al. (2017) Generation of genome-scale metabolic recon- consistent with experiments. PLoS Comput. Biol., 4, e1000082. structions for 773 members of the human gut microbiota. Nat. Biotechnol., van Berlo,R.J.P. et al. (2011) Predicting metabolic ﬂuxes using gene expression 35, 81–89. differences as constraints. IEEE/ACM Trans. Comput. Biol. Bioinform., 8, Mo,M.L. et al. (2009) Connecting extracellular metabolomic measurements 206–216. to intracellular ﬂux states in yeast. BMC Syst. Biol, 3, 37. Bosi,E. et al. (2016) Comparative genome-scale modelling of Staphylococcus Nuccio,S.-P. and Ba ¨ umler,A.J. (2014) Comparative analysis of Salmonella aureus strains identiﬁes strain-speciﬁc metabolic capabilities linked to genomes identiﬁes a metabolic network for escalating growth in the pathogenicity. Proc. Natl. Acad. Sci. USA, 113, E3801–E3809. inﬂamed gut. MBio, 5, e00929–14. Burgard,A.P. et al. (2003) Optknock: a bilevel programming framework for Orth,J.D. et al. (2011) A comprehensive genome-scale reconstruction of identifying gene knockout strategies for microbial strain optimization. Escherichia coli metabolism–2011. Mol. Syst. Biol., 7, 535. Biotechnol. Bioeng., 84, 647–657. Rintala,E. et al. (2009) Low oxygen levels as a trigger for enhancement of re- Chandrasekaran,S. and Price,N.D. (2010) Probabilistic integrative modeling spiratory metabolism in Saccharomyces cerevisiae. BMC Genomics, 10, of genome-scale metabolic and regulatory networks in Escherichia coli and Mycobacterium tuberculosis. Proc. Natl. Acad. Sci. USA, 107, Shlomi,T. et al. (2008) Network-based prediction of human tissue-speciﬁc me- 17845–17850. tabolism. Nat. Biotechnol., 26, 1003–1010. Chavali,A.K. et al. (2012) A metabolic network approach for the identiﬁcation and Tervo,C.J. and Reed,J.L. (2012) FOCAL: an experimental design tool for sys- prioritization of antimicrobial drug targets. Trends Microbiol., 20, 113–123. tematizing metabolic discoveries and model development. Genome Biol., Colijn,C. et al. (2009) Interpreting expression data with metabolic ﬂux mod- 13, R116. els: predicting Mycobacterium tuberculosis mycolic acid production. PLoS Wiebe,M.G. et al. (2008) Central carbon metabolism of Saccharomyces Comput. Biol., 5, e1000489. cerevisiae in anaerobic, oxygen-limited and fully aerobic steady-state condi- Cotten,C. and Reed,J.L. (2013) Constraint-based strain design using continu- tions and following a shift to anaerobic conditions. FEMS Yeast Res., 8, ous modiﬁcations (CosMos) of ﬂux bounds ﬁnds new strategies for metabol- 140–154. ic engineering. Biotechnol. J., 8, 595–604.
Bioinformatics – Oxford University Press
Published: Jun 5, 2018
Access the full text.
Sign up today, get DeepDyve free for 14 days.