Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

Large-scale computation of elementary flux modes with bit pattern trees

Large-scale computation of elementary flux modes with bit pattern trees Vol. 24 no. 19 2008, pages 2229–2235 BIOINFORMATICS ORIGINAL PAPER doi:10.1093/bioinformatics/btn401 Systems biology Large-scale computation of elementary flux modes with bit pattern trees Marco Terzer and Jörg Stelling Institute of Computational Science and Swiss Institute of Bioinformatics, ETH Zurich, 8092 Zurich, Switzerland Received on March 3, 2008; revised on July 14, 2008; accepted on July 28, 2008 Advance Access publication August 1, 2008 Associate Editor: John Quackenbush ABSTRACT major challenge. Various modeling approaches have been used for studying biological networks to understand interactions at different Motivation: Elementary flux modes (EFMs)—non-decomposable levels. However, simple approaches like graph analytical methods minimal pathways—are commonly accepted tools for metabolic may lack behavioral predictability and significance (Arita, 2004). network analysis under steady state conditions. Valid states of the Detailed (e.g. deterministic or stochastic dynamic) modeling is network are linear superpositions of elementary modes shaping mainly limited by insufficient knowledge on mechanisms and a polyhedral cone (the flux cone), which is a well-studied convex parameters (Klamt and Stelling, 2006). At an intermediary level, set in computational geometry. Computing EFMs is thus basically constraint-based approaches exploit that reaction stoichiometries are equivalent to extreme ray enumeration of polyhedral cones. This is often well known even for genome-scale metabolic networks. They a combinatorial problem with poorly scaling algorithms, preventing gain popularity, for instance, because they allow to predict fluxes the large-scale analysis of metabolic networks so far. for various organisms using linear or quadratic programming (Price Results: Here, we introduce new algorithmic concepts that enable et al., 2004). large-scale computation of EFMs. Distinguishing extreme rays from Constraint-based modeling starts with the m ×q stoichiometric normal (composite) vectors is one critical aspect of the algorithm. We matrix N , with q reactions as columns of N , each converting some present a new recursive enumeration strategy with bit pattern trees of the m metabolites (negative entries) into others (positive entries). for adjacent rays—the ancestors of extreme rays—that is roughly Thermodynamic conditions constrain the flux rates of irreversible one order of magnitude faster than previous methods. Additionally, reactions to non-negative values. Since every reversible reaction we introduce a rank updating method that is particularly well suited can be decomposed into two irreversible reactions, we get for parallel computation and a residue arithmetic method for matrix rank computations, which circumvents potential numerical instability r ≥ 0 (1) problems. Multi-core architectures of modern CPUs can be exploited q×1 for further performance improvements. The methods are applied to a where r represents a flux distribution or flux mode. A common central metabolism network of Escherichia coli, resulting in ≈26 Mio. assumption is furthermore that the system is in steady state because EFMs. Within the top 2% modes considering biomass production, metabolism usually operates on much faster time scales than the most of the gain in flux variability is achieved. In addition, we compute corresponding regulatory events. At steady state, production and ≈5 Mio. EFMs for the production of non-essential amino acids for a consumption of metabolites are balanced and concentrations of genome-scale metabolic network of Helicobacter pylori. Only large- (internal) metabolites remain constant: scale EFM analysis reveals the >85% of modes that generate several N ·r = 0 (2) amino acids simultaneously. Availability: An implementation in Java, with integration into Equations (1) and (2) constrain the solution space for possible MATLAB and support of various input formats, including SBML, is flux modes to a polyhedral cone called flux cone (see Section available at http://www.csb.ethz.ch in the tools section; sources are 2.2 for formal definitions). Optimization techniques such as flux available from the authors upon request. balance analysis (FBA) define objectives, for instance maximizing Contact: joerg.stelling@inf.ethz.ch for growth or energy production, to predict a single flux distribution Supplementary information: Supplementary data are available at (Schuetz et al., 2007). Related methods such as minimization of Bioinformatics online. metabolic adjustment include (experimentally derived) reference flux values to predict the adjustment under different conditions or of knockout mutants (Segrè et al., 2002). 1 INTRODUCTION For comprehensive analysis of metabolic network behavior, Systems biology investigates a biological system as a whole because however, the entire flux cone has to be considered. Minimal characterizing and understanding single parts or subcomponents functional pathways—elementary flux modes (EFMs)—are desired, often is not sufficient to explain the system behavior. Developing into which all operational modes of the network can be decomposed. mathematical models for such large-scale systems, however, is a Moreover, EFMs constitute a unique set of generators for the flux cone (Gagneur and Klamt, 2004). They correspond to extreme To whom correspondence should be addressed. rays of the polyhedral cone. Computing EFMs is equivalent to the © The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org 2229 M.Terzer and J.Stelling extreme ray enumeration problem from computational geometry. Definition 2. A convex cone P is called polyhedral cone if it is defined as the solution set of finitely many linear equalities and inequalities, that is, as Most algorithms are variants of the double description method the intersection of a finite number of hyperplanes and halfspaces: (Motzkin et al., 1953) and for EFMs, two versions are most relevant: the canonical basis approach by Schuster and Hilgetag (1994) and P ={x :Ax  = 0,Bx  ≥ 0}. the nullspace approach by Wagner (2004). However, little is known Definition 3. A vector r of a polyhedral cone P is called ray if about the complexity of the algorithm, and it is not known whether an algorithm exists with running time polynomial in the input r = 0 and α r ∈P for every α> 0. and output size. According to empirical observations, the running Two rays r and r are equivalent, i.e. r r ,if r = α r for some α> 0. time is approximately quadratic in the output size. Unfortunately, the number of EFMs grows exponentially with network size, Theorem 1(Minkowski’s Theorem for Polyhedral Cones). For every which currently restricts the application to small- and medium-scale polyhedral cone networks of limited connectivity. P ={x :Ax  = 0,Bx  ≥ 0} Many aspects have to be considered for any successful there exists some R such that it generates P: implementation of an EFM algorithm. After summarizing the most P ={x :x =Rc  for some c  ≥ 0}. recent and in our opinion the most relevant achievements in Section 2.1, we provide terminology and fundamentals in Section Every ray of the cone is a non-negative combination of columns in R.Ifwe 2.2. The standard implementation of the EFM algorithm with require R to be minimal, the columns in R are called extreme rays. Minimality important crunch points is discussed in Section 2.3. In Section 3.1, or elementarity of rays is essential for the algorithm. Hence we need a formal new methods are introduced, starting with bit pattern trees (Terzer definition for extreme rays, which is derived in the remainder of this section. and Stelling, 2006) and its advancement, recursive enumeration of We exploit the special structure of the flux cone, leading to simpler definitions and with it often to simpler algorithms. adjacent rays. A new rank updating method using residue arithmetic is developed, and we show how to exploit multi-core architectures Definition 4. Given the stoichiometric matrix N, the flux cone is a of modern CPUs with bit pattern trees. Finally, we demonstrate the polyhedral cone defined as power of the new concepts for comprehensive network analysis of F ={r :N r = 0, r ≥ 0} Escherichia coli and Helicobacter pylori metabolism (Sections 3.2 and 3.3). Aray r of the flux cone is called flux distribution or flux mode, extreme rays are called elementary flux modes or extreme pathways. 2 METHODS The flux cone coincides with the solution space for the constraints given by Equations (1) and (2). Distinctions between EFMs and extreme rays 2.1 Overview arise from slightly different treatments of reversible reactions [see Wagner The concept of EFMs for biochemical reaction networks was introduced by and Urbanczik (2005) for exact definitions]. Here, we use the general term Schuster and Hilgetag (1994). The solution space for feasible flux modes extreme rays, assuming that all reactions are irreversible (reversible reactions shapes a polyhedral cone, thus, extreme ray enumeration algorithms from are already decomposed). The inequality matrix (B in definition 2) is an computational geometry can be used to compute a minimal generating set for identity matrix for the flux cone, leading to the subsequent specialized the solution space. For EFM computation, variants of the double description definitions. General definitions for arbitrary polyhedral cones are given in method (Motzkin et al., 1953) are most often used. The complexity of the Fukuda and Prodon (1995). algorithm is poorly understood, but it performs quite well especially for Definition 5. For any ray r ∈F, the set Z (r) with indices i corresponding degenerate cases (Fukuda and Prodon, 1995). Performance and memory to zero fluxes in r is called the zero set of r: requirements are both critical for difficult problems. Apart from constraint ordering (see Section 2.3.3), initial matrix and number of iterations have most Z (r) ={i :r = 0, 1 ≤i ≤q} influence on performance. Mainly geared for EFM computation, Wagner (2004) proposed to use a nullspace initial matrix, leading to algorithm Definition 6. Let r and r be rays of F. If one of the following holds, both simplifications and improved performance. Subsequently, Gagneur and hold and r is called an extreme ray: Klamt (2004) proposed to use binary vectors to store flux values of processed a) rank(N )+|Z (r)|=q − 1 .Z (r) reactions (see Section 2.3.2). This binary approach not only reduces memory b) there is no r ∈F with Z (r ) ⊇Z (r) other than r r demands, but also facilitates set operations during elementarity testing (see Section 2.3.1), another performance-critical aspect of the algorithm. Klamt where q is the number of reactions and N are the columns of N .Z (r) et al. (2005) use rank computations to test elementarity, and they outline a corresponding to non-zero fluxes, represented by Z (r) ={i :r = 0, 1 ≤i ≤q}. divide and conquer strategy for parallel computation. In Terzer and Stelling (2006), we introduced bit pattern trees as indexing technique for optimized searching of subsets during elementarity testing. Furthermore, we introduced 2.3 Double description method the concept of candidate narrowing, which is extended here by a new Most algorithms for EFM computation are variants of the double description recursive enumeration approach (see Section 3.1.2). method invented by Motzkin et al. (1953). Starting with an initial cone that contains the final cone, some constraints are already considered and the remaining constraints are iteratively added. Each constraint is represented 2.2 Definitions by a halfspace, which is intersected with the intermediary cone (Fig. 1). Definition 1. A set C of points in R is convex if the line segment between The intersection removes some of the extreme rays. New extreme rays any two points in C lies in C. A set C is called a cone if for every x  ∈C, its are created from adjacent kept/removed extreme ray pairs using Gaussian non-negative multiple lies in C. Combined, we have a convex cone iff elimination. The newly created extreme rays lie in the hyperplane separating kept from removed rays. Wagner (2004) proposed a special form of the λ x + λ x ∈C for every x ,x ∈C and λ ,λ ≥ 0. kernel (or nullspace) matrix of N as an initial cone, namely K =[I;K ] , 1 1 2 2 1 2 1 2 2230 Large-scale computation of elementary flux modes elements in Z (r) ∩Z (r )to q − 2 −rank(N ), and it is always a good idea to check this condition first. 2.3.2 Data structures and compression Zero sets (definition 5) can be implemented as bit sets using one bit per set element. This requires only little space in memory, and set operations such as intersection and union can be computed with bitwise and/or functions—a single CPU operation for set sizes 32(64). Subset (superset) testing is not as obvious, but can be easily derived from the following basic set property: Z ⊆Z ⇐⇒Z ∩Z ≡ Z (4) 1 2 1 2 1 Gagneur and Klamt (2004) showed that storing only binary values for Fig. 1. Iteration step of the double description algorithm. A new extreme ray processed reactions is sufficient. Intermediary extreme rays consist of a h is created from adjacent rays a and g. Descendants of non-adjacent rays, binary and a numeric part. After enforcing irreversibility constraints, flux such as i from c and g, are not extreme rays. values corresponding to the processed reaction are converted from numeric to binary. Consequently, only binary extreme rays are left after the last iteration m×q where I is the identity matrix. If N has full rank m, the kernel matrix step, from which numerical flux values can be reconstructed. K has dimensions q × (q −m) and K consequently m × (q −m). All steady- Another important technique to reduce the size of data structures is state constraints (Equation 2) are satisfied by the nature of the kernel matrix. to remove redundancies beforehand. This saves memory, but also affects Furthermore, q −m irreversibility constraints (Equation 1) are fulfilled due performance since operations on smaller structures are faster, and compacted to the identity matrix in K , leaving m inequalities to be solved in the iteration stoichiometric matrices typically lead to fewer iteration steps. Good phase. Inequality constraints of the flux cone involve only one variable, the overviews of compression techniques are given in Gagneur and Klamt (2004) reaction for which only positive flux values are desired. Let us assume that and in the Appendix B of Urbanczik and Wagner (2005). we are in the iteration phase and have already considered j irreversibility constraints. To add constraint j + 1, extreme rays of the flux cone F are 2.3.3 Constraint ordering The double description algorithm is known + 0 − partitioned into three groups, R , R and R , which correspond to rays with to be very sensitive to constraint ordering. In Fukuda and Prodon positive, zero and negative flux value at reaction j + 1, respectively. Extreme (1995), different row ordering heuristics are compared, favoring simple + 0 rays of the cone F are those in R and R plus new extreme rays generated j+1 lexicographical ordering of matrix rows. Using the nullspace approach, the + − + − from ray pairs (r ,r ) ∈ (R ,R ). New rays are created by canceling out kernel matrix in row-echelon form serves as initial extreme ray matrix. the flux value at position j + 1 using Gaussian elimination, i.e. The identity part of the matrix has to be preserved, but remaining rows new + − − + can (and have to) be sorted to optimize performance. Unfortunately, no r =r r −r r (3) j+1 j+1 mathematical insight is available so far. In practice, we observed good Note that the new ray is a non-negative combination of old rays, and thus performance with the following orderings: maximum number of zeros + − by definition 1 surely a ray of the cone. Combining all elements of (R ,R ) (mostzeros), lexicographical (lexmin), absolute lexicographical (abslexmin), would generate a quadratic number of new rays measured in terms of fewest negative/positive pairs (fewestnegpos, reducing the set of adjacent intermediary extreme rays. However, only pairs of adjacent rays generate pair candidates) and combinations thereof. extreme rays (a and g in Fig. 1), thus adjacency testing or better directly enumerating adjacent rays is essential for any feasible implementation. 3 RESULTS 2.3.1 Elementarity testing Principally, one could use the definition of 3.1 Algorithmic improvements extreme rays to test elementarity of newly generated rays. Hence, either the rank of a submatrix of N is computed according to definition 6a, or the 3.1.1 Bit pattern trees In the iteration phase of the double new ray is tested against all other extreme rays as imposed by 6b. Instead of description method, new extreme rays are generated from adjacent constructing new rays first to discard most of them via the elementary test, ray pairs. According to definition 7b, we can enumerate all ray pairs it is advantageous to know about elementarity of the new ray beforehand. + − (r ,r ) and test adjacency by ensuring that no superset Z (r) ⊇ Only adjacent extreme rays generate new extreme rays, and the definition + − Z (r ) ∩Z (r ) exists. This is guaranteed with an exhaustive search of adjacency descends from definition 6. + − over all r r ,r , but an indexed search strategy is preferable. Definition 7. Let r, r and r be extreme rays of F. If one of the following Zero sets are q-dimensional tuples, hence multi-dimensional binary holds, both hold and r and r are said to be adjacent: search trees (kd-trees) can be used for optimized searching (Bentley, a) rank(N )+|Z (r) ∩Z (r )|= q − 2 1975). Therefore, we adapted kd-trees to binary data and invented .Z (r)∩Z (r ) the concept of bit pattern trees (Terzer and Stelling, 2006). b) if Z (r ) ⊇Z (r) ∩Z (r ) then r ror r r Figure 2 illustrates superset search on a bit pattern tree. Pseudo Both (a) and (b) can be used to test adjacency of two extreme rays. Test (a) code for tree nodes and search method are given in Supplementary does not depend on the number of intermediary extreme rays and thus is not Material. The fundamental idea of bit pattern trees is very simple: decelerating during computation, which is advantageous. Furthermore, it can a binary search tree is constructed, each node separating zero sets be easily applied for distributed computing since it only depends on the rays containing a certain bit (right child tree) from those not containing to be tested and on the stoichiometric matrix—a system invariant. Klamt et al. + − the bit (left). Searching a superset of Z =Z (r ) ∩Z (r ), we (2005) actually suggest using the rank test for that purpose. However, for traverse the tree and test at each node whether Z contains the bit larger networks, rank computation time is not negligible, especially if exact used for separation of zero sets in the subtrees. If Z contains the bit, arithmetic is used. It is a cubic algorithm using Gaussian elimination, i.e. m×q only the right child node can contain supersets, if not, both children O mm(q −m) for a full-rank stoichiometric matrix N . In Section 3.1.3 are recursed. we suggest a new rank update method that circumvents these problems and Since the tree will never contain all 2 possible sets, not all makes (a) a competitive strategy especially for large networks. Apart from that, it is worth mentioning that rule (a) constrains the minimum number of bits will be used for separation in the nodes. We can exploit this 2231 M.Terzer and J.Stelling Fig. 2. Bit pattern tree with union patterns on the right of every tree node and leaf nodes at the bottom of the tree with zero sets in the boxes. Bold values are common for the whole subtree since they were used to separate left and right tree sets. The set in the dotted box at the top of the tree represents an exemplary intersection set Z . Searching a superset of Z , the tree is traversed ∩ ∩ along the blue arrows. Dotted arrows are not traversed and truncation of searching is indicated by crosses next to the union pattern causing abortion. to balance the tree by selecting particular bits at tree construction time. Moreover, we store the union pattern Z (T ) = Z for every U i tree node T , unifying all zero sets Z ∈T . We can abort searching a superset of Z at node T if Z ⊆Z (T ). ∩ ∩ 3.1.2 Recursive enumeration of adjacent rays The simplest approach to use bit pattern trees enumerates all ray pair candidates and tests adjacency as described in the previous section. However, enumerating all pairs is still an elaborate task in O(n ), where n is the number of intermediary modes. To improve this enumeration step, + − 0 Fig. 3. Pseudo code for recursive enumeration of adjacent rays using bit we construct three bit pattern trees, T , T and T for rays with pattern trees. Steps (i) and (iii) contain the recursions, in (ii), adjacent pairs positive, negative and zero flux value for the currently processed are found and added to the pairs list. reaction, respectively. To enumerate all adjacent ray pair candidates + − + − (r ,r ) ∈ (T ,T ), we perform four recursive invocations on the + − can then benefit from the precomputed matrix part and only need to subtrees of the nodes T and T [see Step (i) in Fig. 3]. + − perform triangularization of the remaining part. The cut pattern Z =Z (T ) ∩Z (T ) unifies all intersection C U U + − Let us consider two consecutive recursions of addAdjPairs with sets Z =Z (r ) ∩Z (r ), and hence can be used as test set covering + − + − parent nodes (P ,P ) and child nodes (C ,C ), e.g. assuming all candidates of the subtrees (Terzer and Stelling, 2006). Note that the first recursion of Step (i) in Figure 3. Descending the tree, meetsPrerequirement is called on each recursion level, and thus union patterns can only have fewer or equally many elements. Thus, efficient tests are preferable. Any necessary condition for adjacency the parent cut pattern is a superset of the child cut pattern, i.e. can be used. Here, we applied minimum set size q − 2 −rank(N ) C ⊇C . Adjacency test 7a consists of two parts: the rank of a deduced from adjacency test 7a. A recursion occurs if at least one P C column submatrix of N , and the size of the test set T =Z (r) ∩Z (r ). tree node is an intermediary node. If both nodes are leaves, all pairs Here, T coincides with the cut pattern, and since the child set C are tested, first again using meetsPrerequirement and then by testing contains at most all elements of C , the submatrix N contains at for adjacency. If the combinatorial test 7b) is used to implement .C + − is Adjacent, we search for supersets in all three trees T , T and least all columns of N . The elements C \C =C \C are thus C P P C .C T (see Supplementary Material for a simple case study). exactly those columns which are added to the triangularized part of the matrix at this recursion step. A single step of the rank update method is illustrated in Figure 4. Note that adjacent enumeration 3.1.3 Lazy rank updating Instead of the combinatorial adjacency aborts early for many node pairs due to the meetsPrerequirement test, as explained in the previous section, we can use the rank test 7a. test. Therefore, we execute triangularization of the matrix lazily, This has the advantage that testing does not depend on the number of that is, not before a real rank computation is requested. intermediary modes and easily allows for distributed computation. However, for large networks, rank computation is an expensive procedure and some care has to be taken. 3.1.4 Floating point versus exact arithmetic Gaussian Considering that we are recursively traversing two trees with little triangularization with floating point numbers typically uses change between two recursion steps, we can think of an update full pivoting to minimize numeric instability effects. Rank updating strategy for the examined matrix. Using Gaussian elimination to uses the same matrix for various rank computations, and instability compute an upper triangular matrix to derive the rank, we extend becomes a serious issue. To circumvent this problem, the non- the triangular part with every recursion step. Lower recursion levels triangularized matrix part is initially stored at each update level. 2232 Large-scale computation of elementary flux modes in the above equation has to fit into a register, that is, it has to be below 2 with e = 32(64). We can ensure this by choosing our prime e−1 m < 2 . Suppose that we have reached the point where no non-zero values are left when scanning for the next pivot row/column. This is where rank computation normally stops, but in the residue case, one of the zero elements could theoretically be a multiple of our prime, unequal to zero in the non-residue world. That is, the rank computed by residue arithmetic is at most equal to the real rank. The probability −1 Fig. 4. One step of the rank update method: partially triangularized matrix of any non-zero value being zero (mod m)is m , which could cause before the current step (left) and continuation after triangularization of a problem if many different rank computations were executed. The −1 columns two and four (right). × stands for non-zero pivot elements, probability can be improved to m by simultaneously computing ⊗ for any value. Triangularized columns are reflected by elements in the the rank for different primes m , which can indeed be performed in complementary cut patterns C and C , respectively. Exemplary patterns are P C parallel in modern processors by using SIMD instructions (single given in binary (square brackets) and standard set notation (curly brackets), instruction multiple data, e.g. SSE instructions of an Intel processor). R stands for any remainder of the sets, R for its binary representation. Note However, in practice, even a single small prime around 100 did a that column three and four are swapped during the update step to put the perfect job. pivots in place, and rows might have been swapped to find non-zero pivot elements. If all remaining elements of an added column are zero, it is put to the end of the matrix and ignored at subsequent steps. 3.1.6 Exploiting multi-core CPUs Most current processors have multi-core architectures, allowing multiple threads or processes to It is restored before each rank computation and before branching run concurrently. We made use of this extra power with the help of to the next update level. Neglecting that this procedure uses some semaphores at Step (i) in Figure 3. The semaphore maintains a set of more memory, it has still two main disadvantages: (i) restoring permits, one per CPU core, with efficient, thread safe operations to significantly affects performance, and (ii) instability might still be acquire and release permits. The algorithm tries to acquire a permit an issue for large or ill-conditioned matrices. to start a new thread. If the permit is received, the four recursive Rank updating was therefore implemented with exact arithmetic, invocations are split into two parts—two invocations for the current, using rational numbers with large integer numerators and two for the newly created thread. If no permit is acquired, the current denominators. Opposed to the floating point variant, small pivot thread executes all four recursions. If a thread completes by reaching elements are chosen to avoid uncontrolled growth of integers. The the leaves of the tree, it releases the permit, triggering a new thread to downside of fraction numbers is 2-fold: arithmetic operations such as start soon after. Note that it is important to have very efficient acquire addition and multiplication are much more expensive, and integers and release operations since they are called very often. The second are possibly still growing, even if fractions are continually reduced. critical point is the concurrent write access to the pairs set. Either the Using store and restore as described for floating point arithmetic set itself is made thread safe, or the new thread gets its own set and improves control of integer growth and leads to better overall the parent thread is responsible for merging after completion of the performance. child thread. This dynamic concurrent tree traversal strategy can be improved by primarily collecting recursive invocations to a certain 3.1.5 Rank computation using residue arithmetic A powerful recursion depth if permits are available. A new thread is started method to implement rank computation is to work with integer for each available permit and all threads concurrently process the residues u modulo a prime m. The residues u are constrained collected recursions. Noteworthy, this fine-grained parallelization to the interval 0 ≤u <m using unsigned, and to −m <u <m with technique is applicable to both adjacency test variants. It can be signed arithmetic (Knuth, 1997, Section 4.3.2). Assuming that applied simultaneously with other parallelization approaches, for the stoichiometric matrix is rational, we compute the residue instance, the coarse-grained approach given in Klamt et al. (2005) matrix N =[n ] by multiplying each numerator n with the where the computational task is divided into disjunct subtasks that ij ij can be run in parallel. multiplicative inverse of the denominator d (modulo m), that is, ij −1 n = ((n mod m)(d mod m) mod m). Note that multiplicative ij ij ij inverses are defined for numbers being coprime with m. They are 3.2 Benchmarks computed using the extended Euclidean algorithm. If any of the For benchmarking, we analyzed an E.coli central metabolism denominators is not coprime with m, that is, it is a multiple of network with 106 reactions and 89 metabolites that was also used the prime—which is actually very unlikely for large primes—we in (Klamt et al., 2005). All tests were performed on one or two multiply the whole column of the matrix with m (or a power of it if dual-core AMD Opteron processors of a Linux 2.6.9 machine, necessary) and reduce the fractions. using a Java 64-Bit runtime environment (version 1.6.0) with For the triangularization of the matrix, no inverses are needed. 30 GB maximum heap memory size. For all tests, only the iteration Suppose we have chosen pivot element n . The new values n of rc ij cycle time of the algorithm was taken, excluding preprocessing subsequent rows are then calculated as (compression, etc.) and post-processing (from binary to numeric EFMs). Computation times for the different algorithmic strategies n = ((n n −n n ) mod m) for all i >r,j ≥c ij ij rc rj ic are compared in Figure 5A. Combining recursive enumeration of resulting in zeros for values n below the pivot element. If we want adjacent rays on bit pattern trees and a combinatorial adjacency test ic to use CPU arithmetic operations, the difference of two products shows best performance for the selected examples. Standard rank 2233 M.Terzer and J.Stelling 3 6 10 4 x 10 A B 3.5 2 A B 2 6 ← n 10 30 2.5 10 ← n 1.5 -1 10 0.5 1 2.5 10 26 1 2 3 4 0 0 # EFM’s # Threads nb 0 1 2 3 4 5 6 0 2 4 6 8 10 Configuration Top yield tolerance [%] Fig. 5. Benchmark results on a system with two dual-core AMD Opteron Fig. 6. (A) Number of EFMs for different amino acid uptake configurations, processors for different configurations for an E.coli central metabolism namely modes without biomass production (nb) and modes with 0–6 network. (A) Computation times with four threads for combinatorial test concurrently enabled amino acid uptake reactions. Only EFMs on the (−·◦·−), standard rank test (··×··), rank updating with exact arithmetic left of the dashed line can be computed from single amino acid uptake (–  –) and with residue arithmetic (−−). Dashed gray lines indicate linear configurations. (B) CV for reaction fluxes of different uptake configurations, and quadratic computation time. (B) Speedup with two and four threads considering only top yield EFMs. The configurations are: glucose without compared to single thread computation. Line colors indicate the problem amino acid uptake (dashed line), with phenylalanine (dotted), with glutamate size: red, blue and black stand for 0.5, 2.45 and 26 Mio EFMs, respectively. (dash-dotted) and with all selected amino acids (solid line). Line styles correspond to different computation methods: combinatorial test (——), rank updating with exact arithmetic (···) and with residue arithmetic (−−). The thin gray diagonal represents ideal speedup. Figure 6A shows the number of elementary modes per amino acid uptake configuration in terms of the number of simultaneously computation combined with recursive enumeration of adjacent rays enabled uptake reactions. The total number of EFMs with at most one and rank updating with residue arithmetic have similar performance amino acid uptake is 1 714 691, only 6.5% of the EFM set enabling for these examples. The superiority of the rank test in Klamt simultaneous ingestion of up to six amino acids. Conversely, by et al. (2005) may be caused by their unoptimized linear search computing EFM sets for one amino acid uptake per simulation, and method for the combinatorial test. Here, we make use of pattern combining the resulting sets afterwards, adding up to 1.7M EFMs, trees for an optimized search (Terzer and Stelling, 2006). However, simultaneous uptake cases are missed. These correspond to >90% rank tests might be more efficient for larger networks, since of the metabolic pathways. rank testing decelerates probably much slower with increasing A major disadvantage of single-objective optimization techniques stoichiometric matrix size than combinatorial testing, which depends such as FBA is the restriction to one optimal flux mode. Robustness on the exponentially growing number of intermediary modes. For reflected by a certain degree of flexibility is disregarded. We larger networks, we observed better performance for the rank therefore analyzed flux variability for modes with suboptimal updating method (data not shown). Rank updating is definitely a biomass yield (Fig. 6B). We observed no flux variation for zero good choice especially if parallel computation is considered. We deviance from the top biomass yield since single EFMs reach recommend residue arithmetic for matrix rank computations in optimality. With increasing suboptimality, the number of reactions parallelized versions, but rank updating performs remarkably well with coefficient of variation (CV) over 50% grows rapidly. Around when exact arithmetic is used instead. Speedup factors for one, 2% below top yield, reaction variation reaches a saturation point. two and four threads using two dual core CPUs are shown in Interestingly, we find a similar saturation point for different CV Figure 5B. Exploiting multi-core CPUs is particularly effective for threshold values (see Supplementary Material for other thresholds). large problems, where the speedup factors approach the optimum. Hence, the cell can achieve high flexibility (robustness) with only For our examples, all methods—even exact arithmetics for the little decrease in metabolic efficiency. large problems—are faster than CellNetAnalyzer/Metatool (Klamt Next, we applied our methods to a possibly more realistic, et al., 2005). For 507 632/2, 450 787 EFMs, we timed 2 min genome-scale metabolic network of H.pylori. In their study, Price 57 s/92 min 09 s for Metatool (5.0.4) and 1 min 30 s/19 min 39 s for et al. (2002) computed the much smaller set of extreme pathways the combinatorial test with one and 40 s/8 min 08 s with four threads, (EPs) for the formation of all non-essential amino acids and respectively (see Supplementary Material for network configuration ribonucleotides. Here, we focus on the amino acids and compute details and all computation times). EFMs for all non-essential amino acids simultaneously. Allowable inputs are d- and l-alanine, arginine, adenine, sulfate, urea and 3.3 Large-scale EFM computation oxygen; outputs are ammonia, carbon dioxide and the carbon For our first large-scale computational experiment, we used different sinks succinate, acetate, formate and lactate [in correspondence configurations of the central metabolism network of E.coli (Stelling to case 4 in Price et al. (2002)], together with all non-essential et al., 2002) with growth on glucose. Besides uptake of glucose, amino acids. Focusing on the production of a specific amino six central amino acids from different biosynthesis families are acid, only a small fraction of EFMs is found with typical small- available, only one at a time in a first experiment. This results in scale simulations, preventing the simultaneous production of other six sets of elementary modes for the amino acids alanine, aspartate, amino acids (between 3% for proline and 16% for asparagine, glutamate, histidine, phenylalanine and serine. The sets contain respectively). Altogether, only 815 576 of 5 785 975 EFMs are found between 321 431 (alanine) and 858 648 (glutamate) EFMs. In a with single amino acid simulations—as for the E.coli case, over second evaluation, we computed the set of EFMs for combined 85% of the modes are missed (see Supplementary Material for uptake that consists of 26 381 168 modes. details). Furthermore, allowing for simultaneous production of all Time [min] Speedup # EFMs % Reactions with CV > 50.0% Large-scale computation of elementary flux modes we analyzed the CV for all reactions. Interestingly, flux variation increases rapidly when gradually decreasing optimality of biomass production to 2% below maximum yield. After this saturation point, we observed only little gain of variation. This gives a marginal value where the cell can easily gain robustness at a small price of biomass production. Importantly, such multi-objective flux phenotypes cannot be explained with single objective optimization techniques such as FBA. In a second application, we analyzed the simultaneous production of non-essential amino acids in a genome- 20 40 60 80 100 120 Path Length scale metabolic network of H.pylori. Most of the more than 5 million EFMs generate multiple amino acids concurrently, and they Fig. 7. Path length distributions of EFMs for glycine production in H.pylori have significantly larger path lengths than those producing only a when glycine is produced alone (dark bars) or possibly jointly with other single amino acid. These more complex cellular functions are missed amino acids (light bars). with simplified setups considering only one amino acid at a time. Altogether, our study shows both the potential and necessity of large- amino acids yields EFMs with larger path lengths as shown in scale computation of elementary modes, an important step toward Figure 7 for glycine production. Path length distributions are similar universal genome-scale applications. for the other amino acids (see Supplementary Material) and the more complex pathways are not reflected in simplified single amino ACKNOWLEDGEMENTS acid experiments. Hence, our computational methods allow for We thank G. Gonnet and U. Sauer for pointers to methods and for investigation of combined production of amino acids at a genome- comments on the article. scale level for the first time. Still, applicability to larger networks and more complex environmental conditions has to be shown. Conflict of Interest: none declared. 4 CONCLUSION REFERENCES Currently, the computation of EFMs is restricted to metabolic Arita,M. (2004) The metabolic world of Escherichia coli is not small. Proc. Natl Acad. networks of moderate size and connectivity because the number of Sci. USA, 101, 1543–1547. Bentley,J.L. (1975) Multidimensional binary search trees used for associative searching. modes and the computation time raise exponentially with increasing Commun. ACM , 18, 509–517. network complexity. Here, we introduced new methods which Fukuda,K. and Prodon,A. (1995) Double description method revisited. In are several times faster than traditional algorithm variants for Combinatorics and Computer Science. Vol. 1120, Springer, Heidelberg, pp. 91–111. medium problem sizes, and even higher speedups are expected Gagneur,J. and Klamt,S. (2004) Computation of elementary modes: a unifying framework and the new binary approach. BMC Bioinformatics, 5, 175. for larger problems. We proved the efficiency by successfully Klamt,S. and Stelling,J.(2006) Stoichiometric and constraint-based modeling. In competing with alternate implementations, and through large-scale Szallasi,Z. et al. (eds) System Modeling in Cellular Biology. MIT Press, Cambridge computations for example networks. However, it remains open / MA, pp. 73–96. whether EFMs can be computed for the most recent genome- Klamt,S. et al. (2005) Algorithmic approaches for computing elementary modes in large biochemical reaction networks. IEE Proc. Syst. Biol., 152, 249–255. scale networks. It can possibly be achieved in the near future with Knuth,D.E. (1997) Seminumerical Algorithms. Vol. 2 of The Art of Computer intelligent modularization strategies and parallelized algorithms. Programming. 3rd edn. Addison-Wesley, Reading MA. Memory consumption becomes a major challenge and one might Motzkin,T.S. et al (1953) The double description method. In Kuhn,H. and Tucker,A. have to consider out-of-core implementations that store intermediary (eds) Contributions to the Theory of Games II . Vol. 8 of Annals of Math. Studies. modes on disk. The current in-core implementation requires a lot of Princeton University Press, Princeton / RI, pp. 51–73. Price,N. et al. (2002) Determination of redundancy and systems properties of the memory for large computations. However, our examples with up to metabolic network of Helicobacter Pylori using genome-scale extreme pathway 2.5 M EFMs can be run on a normal desktop computer with 1–2 GB analysis. Genetics Res., 12, 760–769. memory. We anticipate that porting the code from Java to C will Price,N. et al. (2004) Genome-scale models of microbial cells: evaluating the result in rather marginal performance and memory improvements consequences of constraints. Nat. Rev. Microbiol., 2, 886–897. at the cost of reduced interoperability and maintainability. We Schuetz,R. et al. (2007) Systematic evaluation of objective functions for predicting intracellular fluxes in Escherichia coli. Mol. Syst. Biol., 3,Article 119. therefore prioritize out-of-core computation and parallelization. We Schuster,S. and Hilgetag,C. (1994) On elementary flux modes in biochemical reaction already applied fine-grained parallelization by exploiting multiple systems at steady state. J. Biol. Syst., 2, 165–182. cores of modern CPUs and our rank update method with residue Segrè,D. et al. (2002) Analysis of optimality in natural and perturbed metabolic arithmetic is readily applicable to parallel computation. Coarse- networks. Proc. Natl Acad. Sci. USA, 99, 15112–15117. Stelling,J. et al. (2002) Metabolic network structure determines key aspects of grained parallelization complements our multi-core technique and functionality and regulation. Nature, 420, 190–193. can be applied simultaneously. Terzer,M. and Stelling,J. (2006) Accelerating the computation of elementary modes To investigate the potential of large-scale EFM computation to using pattern trees. In Bucher,P. and Moret,B.M.E. (eds) WABI , Vol. 4175 of Lecture yield new biological insight, we first focused on the analysis of Notes in Computer Science. Springer, Heidelberg, pp. 333–343. E.coli central metabolism. The enhanced computation potential Urbanczik,R. and Wagner,C. (2005) Functional stoichiometric analysis of metabolic networks. Bioinformatics, 21, 4176–4180. enabled the calculation of over 26 million elementary modes for Wagner,C. (2004) Nullspace approach to determine the elementary modes of chemical growth on glucose and simultaneous uptake of selected amino acids. reaction systems. J. Phys. Chem. B, 108, 2425–2431. Only a small fraction of these EFMs and none of the maximum Wagner,C. and Urbanczik,R. (2005) The geometry of the flux cone of a metabolic yield modes are found with typical small-scale computations. Next, network. Biophys. J., 89, 3837–3845. # EFMs http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Bioinformatics Oxford University Press

Large-scale computation of elementary flux modes with bit pattern trees

Bioinformatics , Volume 24 (19): 7 – Aug 1, 2008

Loading next page...
 
/lp/oxford-university-press/large-scale-computation-of-elementary-flux-modes-with-bit-pattern-ssro3Qi0uh

References (24)

Publisher
Oxford University Press
Copyright
© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org
ISSN
1367-4803
eISSN
1460-2059
DOI
10.1093/bioinformatics/btn401
pmid
18676417
Publisher site
See Article on Publisher Site

Abstract

Vol. 24 no. 19 2008, pages 2229–2235 BIOINFORMATICS ORIGINAL PAPER doi:10.1093/bioinformatics/btn401 Systems biology Large-scale computation of elementary flux modes with bit pattern trees Marco Terzer and Jörg Stelling Institute of Computational Science and Swiss Institute of Bioinformatics, ETH Zurich, 8092 Zurich, Switzerland Received on March 3, 2008; revised on July 14, 2008; accepted on July 28, 2008 Advance Access publication August 1, 2008 Associate Editor: John Quackenbush ABSTRACT major challenge. Various modeling approaches have been used for studying biological networks to understand interactions at different Motivation: Elementary flux modes (EFMs)—non-decomposable levels. However, simple approaches like graph analytical methods minimal pathways—are commonly accepted tools for metabolic may lack behavioral predictability and significance (Arita, 2004). network analysis under steady state conditions. Valid states of the Detailed (e.g. deterministic or stochastic dynamic) modeling is network are linear superpositions of elementary modes shaping mainly limited by insufficient knowledge on mechanisms and a polyhedral cone (the flux cone), which is a well-studied convex parameters (Klamt and Stelling, 2006). At an intermediary level, set in computational geometry. Computing EFMs is thus basically constraint-based approaches exploit that reaction stoichiometries are equivalent to extreme ray enumeration of polyhedral cones. This is often well known even for genome-scale metabolic networks. They a combinatorial problem with poorly scaling algorithms, preventing gain popularity, for instance, because they allow to predict fluxes the large-scale analysis of metabolic networks so far. for various organisms using linear or quadratic programming (Price Results: Here, we introduce new algorithmic concepts that enable et al., 2004). large-scale computation of EFMs. Distinguishing extreme rays from Constraint-based modeling starts with the m ×q stoichiometric normal (composite) vectors is one critical aspect of the algorithm. We matrix N , with q reactions as columns of N , each converting some present a new recursive enumeration strategy with bit pattern trees of the m metabolites (negative entries) into others (positive entries). for adjacent rays—the ancestors of extreme rays—that is roughly Thermodynamic conditions constrain the flux rates of irreversible one order of magnitude faster than previous methods. Additionally, reactions to non-negative values. Since every reversible reaction we introduce a rank updating method that is particularly well suited can be decomposed into two irreversible reactions, we get for parallel computation and a residue arithmetic method for matrix rank computations, which circumvents potential numerical instability r ≥ 0 (1) problems. Multi-core architectures of modern CPUs can be exploited q×1 for further performance improvements. The methods are applied to a where r represents a flux distribution or flux mode. A common central metabolism network of Escherichia coli, resulting in ≈26 Mio. assumption is furthermore that the system is in steady state because EFMs. Within the top 2% modes considering biomass production, metabolism usually operates on much faster time scales than the most of the gain in flux variability is achieved. In addition, we compute corresponding regulatory events. At steady state, production and ≈5 Mio. EFMs for the production of non-essential amino acids for a consumption of metabolites are balanced and concentrations of genome-scale metabolic network of Helicobacter pylori. Only large- (internal) metabolites remain constant: scale EFM analysis reveals the >85% of modes that generate several N ·r = 0 (2) amino acids simultaneously. Availability: An implementation in Java, with integration into Equations (1) and (2) constrain the solution space for possible MATLAB and support of various input formats, including SBML, is flux modes to a polyhedral cone called flux cone (see Section available at http://www.csb.ethz.ch in the tools section; sources are 2.2 for formal definitions). Optimization techniques such as flux available from the authors upon request. balance analysis (FBA) define objectives, for instance maximizing Contact: joerg.stelling@inf.ethz.ch for growth or energy production, to predict a single flux distribution Supplementary information: Supplementary data are available at (Schuetz et al., 2007). Related methods such as minimization of Bioinformatics online. metabolic adjustment include (experimentally derived) reference flux values to predict the adjustment under different conditions or of knockout mutants (Segrè et al., 2002). 1 INTRODUCTION For comprehensive analysis of metabolic network behavior, Systems biology investigates a biological system as a whole because however, the entire flux cone has to be considered. Minimal characterizing and understanding single parts or subcomponents functional pathways—elementary flux modes (EFMs)—are desired, often is not sufficient to explain the system behavior. Developing into which all operational modes of the network can be decomposed. mathematical models for such large-scale systems, however, is a Moreover, EFMs constitute a unique set of generators for the flux cone (Gagneur and Klamt, 2004). They correspond to extreme To whom correspondence should be addressed. rays of the polyhedral cone. Computing EFMs is equivalent to the © The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org 2229 M.Terzer and J.Stelling extreme ray enumeration problem from computational geometry. Definition 2. A convex cone P is called polyhedral cone if it is defined as the solution set of finitely many linear equalities and inequalities, that is, as Most algorithms are variants of the double description method the intersection of a finite number of hyperplanes and halfspaces: (Motzkin et al., 1953) and for EFMs, two versions are most relevant: the canonical basis approach by Schuster and Hilgetag (1994) and P ={x :Ax  = 0,Bx  ≥ 0}. the nullspace approach by Wagner (2004). However, little is known Definition 3. A vector r of a polyhedral cone P is called ray if about the complexity of the algorithm, and it is not known whether an algorithm exists with running time polynomial in the input r = 0 and α r ∈P for every α> 0. and output size. According to empirical observations, the running Two rays r and r are equivalent, i.e. r r ,if r = α r for some α> 0. time is approximately quadratic in the output size. Unfortunately, the number of EFMs grows exponentially with network size, Theorem 1(Minkowski’s Theorem for Polyhedral Cones). For every which currently restricts the application to small- and medium-scale polyhedral cone networks of limited connectivity. P ={x :Ax  = 0,Bx  ≥ 0} Many aspects have to be considered for any successful there exists some R such that it generates P: implementation of an EFM algorithm. After summarizing the most P ={x :x =Rc  for some c  ≥ 0}. recent and in our opinion the most relevant achievements in Section 2.1, we provide terminology and fundamentals in Section Every ray of the cone is a non-negative combination of columns in R.Ifwe 2.2. The standard implementation of the EFM algorithm with require R to be minimal, the columns in R are called extreme rays. Minimality important crunch points is discussed in Section 2.3. In Section 3.1, or elementarity of rays is essential for the algorithm. Hence we need a formal new methods are introduced, starting with bit pattern trees (Terzer definition for extreme rays, which is derived in the remainder of this section. and Stelling, 2006) and its advancement, recursive enumeration of We exploit the special structure of the flux cone, leading to simpler definitions and with it often to simpler algorithms. adjacent rays. A new rank updating method using residue arithmetic is developed, and we show how to exploit multi-core architectures Definition 4. Given the stoichiometric matrix N, the flux cone is a of modern CPUs with bit pattern trees. Finally, we demonstrate the polyhedral cone defined as power of the new concepts for comprehensive network analysis of F ={r :N r = 0, r ≥ 0} Escherichia coli and Helicobacter pylori metabolism (Sections 3.2 and 3.3). Aray r of the flux cone is called flux distribution or flux mode, extreme rays are called elementary flux modes or extreme pathways. 2 METHODS The flux cone coincides with the solution space for the constraints given by Equations (1) and (2). Distinctions between EFMs and extreme rays 2.1 Overview arise from slightly different treatments of reversible reactions [see Wagner The concept of EFMs for biochemical reaction networks was introduced by and Urbanczik (2005) for exact definitions]. Here, we use the general term Schuster and Hilgetag (1994). The solution space for feasible flux modes extreme rays, assuming that all reactions are irreversible (reversible reactions shapes a polyhedral cone, thus, extreme ray enumeration algorithms from are already decomposed). The inequality matrix (B in definition 2) is an computational geometry can be used to compute a minimal generating set for identity matrix for the flux cone, leading to the subsequent specialized the solution space. For EFM computation, variants of the double description definitions. General definitions for arbitrary polyhedral cones are given in method (Motzkin et al., 1953) are most often used. The complexity of the Fukuda and Prodon (1995). algorithm is poorly understood, but it performs quite well especially for Definition 5. For any ray r ∈F, the set Z (r) with indices i corresponding degenerate cases (Fukuda and Prodon, 1995). Performance and memory to zero fluxes in r is called the zero set of r: requirements are both critical for difficult problems. Apart from constraint ordering (see Section 2.3.3), initial matrix and number of iterations have most Z (r) ={i :r = 0, 1 ≤i ≤q} influence on performance. Mainly geared for EFM computation, Wagner (2004) proposed to use a nullspace initial matrix, leading to algorithm Definition 6. Let r and r be rays of F. If one of the following holds, both simplifications and improved performance. Subsequently, Gagneur and hold and r is called an extreme ray: Klamt (2004) proposed to use binary vectors to store flux values of processed a) rank(N )+|Z (r)|=q − 1 .Z (r) reactions (see Section 2.3.2). This binary approach not only reduces memory b) there is no r ∈F with Z (r ) ⊇Z (r) other than r r demands, but also facilitates set operations during elementarity testing (see Section 2.3.1), another performance-critical aspect of the algorithm. Klamt where q is the number of reactions and N are the columns of N .Z (r) et al. (2005) use rank computations to test elementarity, and they outline a corresponding to non-zero fluxes, represented by Z (r) ={i :r = 0, 1 ≤i ≤q}. divide and conquer strategy for parallel computation. In Terzer and Stelling (2006), we introduced bit pattern trees as indexing technique for optimized searching of subsets during elementarity testing. Furthermore, we introduced 2.3 Double description method the concept of candidate narrowing, which is extended here by a new Most algorithms for EFM computation are variants of the double description recursive enumeration approach (see Section 3.1.2). method invented by Motzkin et al. (1953). Starting with an initial cone that contains the final cone, some constraints are already considered and the remaining constraints are iteratively added. Each constraint is represented 2.2 Definitions by a halfspace, which is intersected with the intermediary cone (Fig. 1). Definition 1. A set C of points in R is convex if the line segment between The intersection removes some of the extreme rays. New extreme rays any two points in C lies in C. A set C is called a cone if for every x  ∈C, its are created from adjacent kept/removed extreme ray pairs using Gaussian non-negative multiple lies in C. Combined, we have a convex cone iff elimination. The newly created extreme rays lie in the hyperplane separating kept from removed rays. Wagner (2004) proposed a special form of the λ x + λ x ∈C for every x ,x ∈C and λ ,λ ≥ 0. kernel (or nullspace) matrix of N as an initial cone, namely K =[I;K ] , 1 1 2 2 1 2 1 2 2230 Large-scale computation of elementary flux modes elements in Z (r) ∩Z (r )to q − 2 −rank(N ), and it is always a good idea to check this condition first. 2.3.2 Data structures and compression Zero sets (definition 5) can be implemented as bit sets using one bit per set element. This requires only little space in memory, and set operations such as intersection and union can be computed with bitwise and/or functions—a single CPU operation for set sizes 32(64). Subset (superset) testing is not as obvious, but can be easily derived from the following basic set property: Z ⊆Z ⇐⇒Z ∩Z ≡ Z (4) 1 2 1 2 1 Gagneur and Klamt (2004) showed that storing only binary values for Fig. 1. Iteration step of the double description algorithm. A new extreme ray processed reactions is sufficient. Intermediary extreme rays consist of a h is created from adjacent rays a and g. Descendants of non-adjacent rays, binary and a numeric part. After enforcing irreversibility constraints, flux such as i from c and g, are not extreme rays. values corresponding to the processed reaction are converted from numeric to binary. Consequently, only binary extreme rays are left after the last iteration m×q where I is the identity matrix. If N has full rank m, the kernel matrix step, from which numerical flux values can be reconstructed. K has dimensions q × (q −m) and K consequently m × (q −m). All steady- Another important technique to reduce the size of data structures is state constraints (Equation 2) are satisfied by the nature of the kernel matrix. to remove redundancies beforehand. This saves memory, but also affects Furthermore, q −m irreversibility constraints (Equation 1) are fulfilled due performance since operations on smaller structures are faster, and compacted to the identity matrix in K , leaving m inequalities to be solved in the iteration stoichiometric matrices typically lead to fewer iteration steps. Good phase. Inequality constraints of the flux cone involve only one variable, the overviews of compression techniques are given in Gagneur and Klamt (2004) reaction for which only positive flux values are desired. Let us assume that and in the Appendix B of Urbanczik and Wagner (2005). we are in the iteration phase and have already considered j irreversibility constraints. To add constraint j + 1, extreme rays of the flux cone F are 2.3.3 Constraint ordering The double description algorithm is known + 0 − partitioned into three groups, R , R and R , which correspond to rays with to be very sensitive to constraint ordering. In Fukuda and Prodon positive, zero and negative flux value at reaction j + 1, respectively. Extreme (1995), different row ordering heuristics are compared, favoring simple + 0 rays of the cone F are those in R and R plus new extreme rays generated j+1 lexicographical ordering of matrix rows. Using the nullspace approach, the + − + − from ray pairs (r ,r ) ∈ (R ,R ). New rays are created by canceling out kernel matrix in row-echelon form serves as initial extreme ray matrix. the flux value at position j + 1 using Gaussian elimination, i.e. The identity part of the matrix has to be preserved, but remaining rows new + − − + can (and have to) be sorted to optimize performance. Unfortunately, no r =r r −r r (3) j+1 j+1 mathematical insight is available so far. In practice, we observed good Note that the new ray is a non-negative combination of old rays, and thus performance with the following orderings: maximum number of zeros + − by definition 1 surely a ray of the cone. Combining all elements of (R ,R ) (mostzeros), lexicographical (lexmin), absolute lexicographical (abslexmin), would generate a quadratic number of new rays measured in terms of fewest negative/positive pairs (fewestnegpos, reducing the set of adjacent intermediary extreme rays. However, only pairs of adjacent rays generate pair candidates) and combinations thereof. extreme rays (a and g in Fig. 1), thus adjacency testing or better directly enumerating adjacent rays is essential for any feasible implementation. 3 RESULTS 2.3.1 Elementarity testing Principally, one could use the definition of 3.1 Algorithmic improvements extreme rays to test elementarity of newly generated rays. Hence, either the rank of a submatrix of N is computed according to definition 6a, or the 3.1.1 Bit pattern trees In the iteration phase of the double new ray is tested against all other extreme rays as imposed by 6b. Instead of description method, new extreme rays are generated from adjacent constructing new rays first to discard most of them via the elementary test, ray pairs. According to definition 7b, we can enumerate all ray pairs it is advantageous to know about elementarity of the new ray beforehand. + − (r ,r ) and test adjacency by ensuring that no superset Z (r) ⊇ Only adjacent extreme rays generate new extreme rays, and the definition + − Z (r ) ∩Z (r ) exists. This is guaranteed with an exhaustive search of adjacency descends from definition 6. + − over all r r ,r , but an indexed search strategy is preferable. Definition 7. Let r, r and r be extreme rays of F. If one of the following Zero sets are q-dimensional tuples, hence multi-dimensional binary holds, both hold and r and r are said to be adjacent: search trees (kd-trees) can be used for optimized searching (Bentley, a) rank(N )+|Z (r) ∩Z (r )|= q − 2 1975). Therefore, we adapted kd-trees to binary data and invented .Z (r)∩Z (r ) the concept of bit pattern trees (Terzer and Stelling, 2006). b) if Z (r ) ⊇Z (r) ∩Z (r ) then r ror r r Figure 2 illustrates superset search on a bit pattern tree. Pseudo Both (a) and (b) can be used to test adjacency of two extreme rays. Test (a) code for tree nodes and search method are given in Supplementary does not depend on the number of intermediary extreme rays and thus is not Material. The fundamental idea of bit pattern trees is very simple: decelerating during computation, which is advantageous. Furthermore, it can a binary search tree is constructed, each node separating zero sets be easily applied for distributed computing since it only depends on the rays containing a certain bit (right child tree) from those not containing to be tested and on the stoichiometric matrix—a system invariant. Klamt et al. + − the bit (left). Searching a superset of Z =Z (r ) ∩Z (r ), we (2005) actually suggest using the rank test for that purpose. However, for traverse the tree and test at each node whether Z contains the bit larger networks, rank computation time is not negligible, especially if exact used for separation of zero sets in the subtrees. If Z contains the bit, arithmetic is used. It is a cubic algorithm using Gaussian elimination, i.e. m×q only the right child node can contain supersets, if not, both children O mm(q −m) for a full-rank stoichiometric matrix N . In Section 3.1.3 are recursed. we suggest a new rank update method that circumvents these problems and Since the tree will never contain all 2 possible sets, not all makes (a) a competitive strategy especially for large networks. Apart from that, it is worth mentioning that rule (a) constrains the minimum number of bits will be used for separation in the nodes. We can exploit this 2231 M.Terzer and J.Stelling Fig. 2. Bit pattern tree with union patterns on the right of every tree node and leaf nodes at the bottom of the tree with zero sets in the boxes. Bold values are common for the whole subtree since they were used to separate left and right tree sets. The set in the dotted box at the top of the tree represents an exemplary intersection set Z . Searching a superset of Z , the tree is traversed ∩ ∩ along the blue arrows. Dotted arrows are not traversed and truncation of searching is indicated by crosses next to the union pattern causing abortion. to balance the tree by selecting particular bits at tree construction time. Moreover, we store the union pattern Z (T ) = Z for every U i tree node T , unifying all zero sets Z ∈T . We can abort searching a superset of Z at node T if Z ⊆Z (T ). ∩ ∩ 3.1.2 Recursive enumeration of adjacent rays The simplest approach to use bit pattern trees enumerates all ray pair candidates and tests adjacency as described in the previous section. However, enumerating all pairs is still an elaborate task in O(n ), where n is the number of intermediary modes. To improve this enumeration step, + − 0 Fig. 3. Pseudo code for recursive enumeration of adjacent rays using bit we construct three bit pattern trees, T , T and T for rays with pattern trees. Steps (i) and (iii) contain the recursions, in (ii), adjacent pairs positive, negative and zero flux value for the currently processed are found and added to the pairs list. reaction, respectively. To enumerate all adjacent ray pair candidates + − + − (r ,r ) ∈ (T ,T ), we perform four recursive invocations on the + − can then benefit from the precomputed matrix part and only need to subtrees of the nodes T and T [see Step (i) in Fig. 3]. + − perform triangularization of the remaining part. The cut pattern Z =Z (T ) ∩Z (T ) unifies all intersection C U U + − Let us consider two consecutive recursions of addAdjPairs with sets Z =Z (r ) ∩Z (r ), and hence can be used as test set covering + − + − parent nodes (P ,P ) and child nodes (C ,C ), e.g. assuming all candidates of the subtrees (Terzer and Stelling, 2006). Note that the first recursion of Step (i) in Figure 3. Descending the tree, meetsPrerequirement is called on each recursion level, and thus union patterns can only have fewer or equally many elements. Thus, efficient tests are preferable. Any necessary condition for adjacency the parent cut pattern is a superset of the child cut pattern, i.e. can be used. Here, we applied minimum set size q − 2 −rank(N ) C ⊇C . Adjacency test 7a consists of two parts: the rank of a deduced from adjacency test 7a. A recursion occurs if at least one P C column submatrix of N , and the size of the test set T =Z (r) ∩Z (r ). tree node is an intermediary node. If both nodes are leaves, all pairs Here, T coincides with the cut pattern, and since the child set C are tested, first again using meetsPrerequirement and then by testing contains at most all elements of C , the submatrix N contains at for adjacency. If the combinatorial test 7b) is used to implement .C + − is Adjacent, we search for supersets in all three trees T , T and least all columns of N . The elements C \C =C \C are thus C P P C .C T (see Supplementary Material for a simple case study). exactly those columns which are added to the triangularized part of the matrix at this recursion step. A single step of the rank update method is illustrated in Figure 4. Note that adjacent enumeration 3.1.3 Lazy rank updating Instead of the combinatorial adjacency aborts early for many node pairs due to the meetsPrerequirement test, as explained in the previous section, we can use the rank test 7a. test. Therefore, we execute triangularization of the matrix lazily, This has the advantage that testing does not depend on the number of that is, not before a real rank computation is requested. intermediary modes and easily allows for distributed computation. However, for large networks, rank computation is an expensive procedure and some care has to be taken. 3.1.4 Floating point versus exact arithmetic Gaussian Considering that we are recursively traversing two trees with little triangularization with floating point numbers typically uses change between two recursion steps, we can think of an update full pivoting to minimize numeric instability effects. Rank updating strategy for the examined matrix. Using Gaussian elimination to uses the same matrix for various rank computations, and instability compute an upper triangular matrix to derive the rank, we extend becomes a serious issue. To circumvent this problem, the non- the triangular part with every recursion step. Lower recursion levels triangularized matrix part is initially stored at each update level. 2232 Large-scale computation of elementary flux modes in the above equation has to fit into a register, that is, it has to be below 2 with e = 32(64). We can ensure this by choosing our prime e−1 m < 2 . Suppose that we have reached the point where no non-zero values are left when scanning for the next pivot row/column. This is where rank computation normally stops, but in the residue case, one of the zero elements could theoretically be a multiple of our prime, unequal to zero in the non-residue world. That is, the rank computed by residue arithmetic is at most equal to the real rank. The probability −1 Fig. 4. One step of the rank update method: partially triangularized matrix of any non-zero value being zero (mod m)is m , which could cause before the current step (left) and continuation after triangularization of a problem if many different rank computations were executed. The −1 columns two and four (right). × stands for non-zero pivot elements, probability can be improved to m by simultaneously computing ⊗ for any value. Triangularized columns are reflected by elements in the the rank for different primes m , which can indeed be performed in complementary cut patterns C and C , respectively. Exemplary patterns are P C parallel in modern processors by using SIMD instructions (single given in binary (square brackets) and standard set notation (curly brackets), instruction multiple data, e.g. SSE instructions of an Intel processor). R stands for any remainder of the sets, R for its binary representation. Note However, in practice, even a single small prime around 100 did a that column three and four are swapped during the update step to put the perfect job. pivots in place, and rows might have been swapped to find non-zero pivot elements. If all remaining elements of an added column are zero, it is put to the end of the matrix and ignored at subsequent steps. 3.1.6 Exploiting multi-core CPUs Most current processors have multi-core architectures, allowing multiple threads or processes to It is restored before each rank computation and before branching run concurrently. We made use of this extra power with the help of to the next update level. Neglecting that this procedure uses some semaphores at Step (i) in Figure 3. The semaphore maintains a set of more memory, it has still two main disadvantages: (i) restoring permits, one per CPU core, with efficient, thread safe operations to significantly affects performance, and (ii) instability might still be acquire and release permits. The algorithm tries to acquire a permit an issue for large or ill-conditioned matrices. to start a new thread. If the permit is received, the four recursive Rank updating was therefore implemented with exact arithmetic, invocations are split into two parts—two invocations for the current, using rational numbers with large integer numerators and two for the newly created thread. If no permit is acquired, the current denominators. Opposed to the floating point variant, small pivot thread executes all four recursions. If a thread completes by reaching elements are chosen to avoid uncontrolled growth of integers. The the leaves of the tree, it releases the permit, triggering a new thread to downside of fraction numbers is 2-fold: arithmetic operations such as start soon after. Note that it is important to have very efficient acquire addition and multiplication are much more expensive, and integers and release operations since they are called very often. The second are possibly still growing, even if fractions are continually reduced. critical point is the concurrent write access to the pairs set. Either the Using store and restore as described for floating point arithmetic set itself is made thread safe, or the new thread gets its own set and improves control of integer growth and leads to better overall the parent thread is responsible for merging after completion of the performance. child thread. This dynamic concurrent tree traversal strategy can be improved by primarily collecting recursive invocations to a certain 3.1.5 Rank computation using residue arithmetic A powerful recursion depth if permits are available. A new thread is started method to implement rank computation is to work with integer for each available permit and all threads concurrently process the residues u modulo a prime m. The residues u are constrained collected recursions. Noteworthy, this fine-grained parallelization to the interval 0 ≤u <m using unsigned, and to −m <u <m with technique is applicable to both adjacency test variants. It can be signed arithmetic (Knuth, 1997, Section 4.3.2). Assuming that applied simultaneously with other parallelization approaches, for the stoichiometric matrix is rational, we compute the residue instance, the coarse-grained approach given in Klamt et al. (2005) matrix N =[n ] by multiplying each numerator n with the where the computational task is divided into disjunct subtasks that ij ij can be run in parallel. multiplicative inverse of the denominator d (modulo m), that is, ij −1 n = ((n mod m)(d mod m) mod m). Note that multiplicative ij ij ij inverses are defined for numbers being coprime with m. They are 3.2 Benchmarks computed using the extended Euclidean algorithm. If any of the For benchmarking, we analyzed an E.coli central metabolism denominators is not coprime with m, that is, it is a multiple of network with 106 reactions and 89 metabolites that was also used the prime—which is actually very unlikely for large primes—we in (Klamt et al., 2005). All tests were performed on one or two multiply the whole column of the matrix with m (or a power of it if dual-core AMD Opteron processors of a Linux 2.6.9 machine, necessary) and reduce the fractions. using a Java 64-Bit runtime environment (version 1.6.0) with For the triangularization of the matrix, no inverses are needed. 30 GB maximum heap memory size. For all tests, only the iteration Suppose we have chosen pivot element n . The new values n of rc ij cycle time of the algorithm was taken, excluding preprocessing subsequent rows are then calculated as (compression, etc.) and post-processing (from binary to numeric EFMs). Computation times for the different algorithmic strategies n = ((n n −n n ) mod m) for all i >r,j ≥c ij ij rc rj ic are compared in Figure 5A. Combining recursive enumeration of resulting in zeros for values n below the pivot element. If we want adjacent rays on bit pattern trees and a combinatorial adjacency test ic to use CPU arithmetic operations, the difference of two products shows best performance for the selected examples. Standard rank 2233 M.Terzer and J.Stelling 3 6 10 4 x 10 A B 3.5 2 A B 2 6 ← n 10 30 2.5 10 ← n 1.5 -1 10 0.5 1 2.5 10 26 1 2 3 4 0 0 # EFM’s # Threads nb 0 1 2 3 4 5 6 0 2 4 6 8 10 Configuration Top yield tolerance [%] Fig. 5. Benchmark results on a system with two dual-core AMD Opteron Fig. 6. (A) Number of EFMs for different amino acid uptake configurations, processors for different configurations for an E.coli central metabolism namely modes without biomass production (nb) and modes with 0–6 network. (A) Computation times with four threads for combinatorial test concurrently enabled amino acid uptake reactions. Only EFMs on the (−·◦·−), standard rank test (··×··), rank updating with exact arithmetic left of the dashed line can be computed from single amino acid uptake (–  –) and with residue arithmetic (−−). Dashed gray lines indicate linear configurations. (B) CV for reaction fluxes of different uptake configurations, and quadratic computation time. (B) Speedup with two and four threads considering only top yield EFMs. The configurations are: glucose without compared to single thread computation. Line colors indicate the problem amino acid uptake (dashed line), with phenylalanine (dotted), with glutamate size: red, blue and black stand for 0.5, 2.45 and 26 Mio EFMs, respectively. (dash-dotted) and with all selected amino acids (solid line). Line styles correspond to different computation methods: combinatorial test (——), rank updating with exact arithmetic (···) and with residue arithmetic (−−). The thin gray diagonal represents ideal speedup. Figure 6A shows the number of elementary modes per amino acid uptake configuration in terms of the number of simultaneously computation combined with recursive enumeration of adjacent rays enabled uptake reactions. The total number of EFMs with at most one and rank updating with residue arithmetic have similar performance amino acid uptake is 1 714 691, only 6.5% of the EFM set enabling for these examples. The superiority of the rank test in Klamt simultaneous ingestion of up to six amino acids. Conversely, by et al. (2005) may be caused by their unoptimized linear search computing EFM sets for one amino acid uptake per simulation, and method for the combinatorial test. Here, we make use of pattern combining the resulting sets afterwards, adding up to 1.7M EFMs, trees for an optimized search (Terzer and Stelling, 2006). However, simultaneous uptake cases are missed. These correspond to >90% rank tests might be more efficient for larger networks, since of the metabolic pathways. rank testing decelerates probably much slower with increasing A major disadvantage of single-objective optimization techniques stoichiometric matrix size than combinatorial testing, which depends such as FBA is the restriction to one optimal flux mode. Robustness on the exponentially growing number of intermediary modes. For reflected by a certain degree of flexibility is disregarded. We larger networks, we observed better performance for the rank therefore analyzed flux variability for modes with suboptimal updating method (data not shown). Rank updating is definitely a biomass yield (Fig. 6B). We observed no flux variation for zero good choice especially if parallel computation is considered. We deviance from the top biomass yield since single EFMs reach recommend residue arithmetic for matrix rank computations in optimality. With increasing suboptimality, the number of reactions parallelized versions, but rank updating performs remarkably well with coefficient of variation (CV) over 50% grows rapidly. Around when exact arithmetic is used instead. Speedup factors for one, 2% below top yield, reaction variation reaches a saturation point. two and four threads using two dual core CPUs are shown in Interestingly, we find a similar saturation point for different CV Figure 5B. Exploiting multi-core CPUs is particularly effective for threshold values (see Supplementary Material for other thresholds). large problems, where the speedup factors approach the optimum. Hence, the cell can achieve high flexibility (robustness) with only For our examples, all methods—even exact arithmetics for the little decrease in metabolic efficiency. large problems—are faster than CellNetAnalyzer/Metatool (Klamt Next, we applied our methods to a possibly more realistic, et al., 2005). For 507 632/2, 450 787 EFMs, we timed 2 min genome-scale metabolic network of H.pylori. In their study, Price 57 s/92 min 09 s for Metatool (5.0.4) and 1 min 30 s/19 min 39 s for et al. (2002) computed the much smaller set of extreme pathways the combinatorial test with one and 40 s/8 min 08 s with four threads, (EPs) for the formation of all non-essential amino acids and respectively (see Supplementary Material for network configuration ribonucleotides. Here, we focus on the amino acids and compute details and all computation times). EFMs for all non-essential amino acids simultaneously. Allowable inputs are d- and l-alanine, arginine, adenine, sulfate, urea and 3.3 Large-scale EFM computation oxygen; outputs are ammonia, carbon dioxide and the carbon For our first large-scale computational experiment, we used different sinks succinate, acetate, formate and lactate [in correspondence configurations of the central metabolism network of E.coli (Stelling to case 4 in Price et al. (2002)], together with all non-essential et al., 2002) with growth on glucose. Besides uptake of glucose, amino acids. Focusing on the production of a specific amino six central amino acids from different biosynthesis families are acid, only a small fraction of EFMs is found with typical small- available, only one at a time in a first experiment. This results in scale simulations, preventing the simultaneous production of other six sets of elementary modes for the amino acids alanine, aspartate, amino acids (between 3% for proline and 16% for asparagine, glutamate, histidine, phenylalanine and serine. The sets contain respectively). Altogether, only 815 576 of 5 785 975 EFMs are found between 321 431 (alanine) and 858 648 (glutamate) EFMs. In a with single amino acid simulations—as for the E.coli case, over second evaluation, we computed the set of EFMs for combined 85% of the modes are missed (see Supplementary Material for uptake that consists of 26 381 168 modes. details). Furthermore, allowing for simultaneous production of all Time [min] Speedup # EFMs % Reactions with CV > 50.0% Large-scale computation of elementary flux modes we analyzed the CV for all reactions. Interestingly, flux variation increases rapidly when gradually decreasing optimality of biomass production to 2% below maximum yield. After this saturation point, we observed only little gain of variation. This gives a marginal value where the cell can easily gain robustness at a small price of biomass production. Importantly, such multi-objective flux phenotypes cannot be explained with single objective optimization techniques such as FBA. In a second application, we analyzed the simultaneous production of non-essential amino acids in a genome- 20 40 60 80 100 120 Path Length scale metabolic network of H.pylori. Most of the more than 5 million EFMs generate multiple amino acids concurrently, and they Fig. 7. Path length distributions of EFMs for glycine production in H.pylori have significantly larger path lengths than those producing only a when glycine is produced alone (dark bars) or possibly jointly with other single amino acid. These more complex cellular functions are missed amino acids (light bars). with simplified setups considering only one amino acid at a time. Altogether, our study shows both the potential and necessity of large- amino acids yields EFMs with larger path lengths as shown in scale computation of elementary modes, an important step toward Figure 7 for glycine production. Path length distributions are similar universal genome-scale applications. for the other amino acids (see Supplementary Material) and the more complex pathways are not reflected in simplified single amino ACKNOWLEDGEMENTS acid experiments. Hence, our computational methods allow for We thank G. Gonnet and U. Sauer for pointers to methods and for investigation of combined production of amino acids at a genome- comments on the article. scale level for the first time. Still, applicability to larger networks and more complex environmental conditions has to be shown. Conflict of Interest: none declared. 4 CONCLUSION REFERENCES Currently, the computation of EFMs is restricted to metabolic Arita,M. (2004) The metabolic world of Escherichia coli is not small. Proc. Natl Acad. networks of moderate size and connectivity because the number of Sci. USA, 101, 1543–1547. Bentley,J.L. (1975) Multidimensional binary search trees used for associative searching. modes and the computation time raise exponentially with increasing Commun. ACM , 18, 509–517. network complexity. Here, we introduced new methods which Fukuda,K. and Prodon,A. (1995) Double description method revisited. In are several times faster than traditional algorithm variants for Combinatorics and Computer Science. Vol. 1120, Springer, Heidelberg, pp. 91–111. medium problem sizes, and even higher speedups are expected Gagneur,J. and Klamt,S. (2004) Computation of elementary modes: a unifying framework and the new binary approach. BMC Bioinformatics, 5, 175. for larger problems. We proved the efficiency by successfully Klamt,S. and Stelling,J.(2006) Stoichiometric and constraint-based modeling. In competing with alternate implementations, and through large-scale Szallasi,Z. et al. (eds) System Modeling in Cellular Biology. MIT Press, Cambridge computations for example networks. However, it remains open / MA, pp. 73–96. whether EFMs can be computed for the most recent genome- Klamt,S. et al. (2005) Algorithmic approaches for computing elementary modes in large biochemical reaction networks. IEE Proc. Syst. Biol., 152, 249–255. scale networks. It can possibly be achieved in the near future with Knuth,D.E. (1997) Seminumerical Algorithms. Vol. 2 of The Art of Computer intelligent modularization strategies and parallelized algorithms. Programming. 3rd edn. Addison-Wesley, Reading MA. Memory consumption becomes a major challenge and one might Motzkin,T.S. et al (1953) The double description method. In Kuhn,H. and Tucker,A. have to consider out-of-core implementations that store intermediary (eds) Contributions to the Theory of Games II . Vol. 8 of Annals of Math. Studies. modes on disk. The current in-core implementation requires a lot of Princeton University Press, Princeton / RI, pp. 51–73. Price,N. et al. (2002) Determination of redundancy and systems properties of the memory for large computations. However, our examples with up to metabolic network of Helicobacter Pylori using genome-scale extreme pathway 2.5 M EFMs can be run on a normal desktop computer with 1–2 GB analysis. Genetics Res., 12, 760–769. memory. We anticipate that porting the code from Java to C will Price,N. et al. (2004) Genome-scale models of microbial cells: evaluating the result in rather marginal performance and memory improvements consequences of constraints. Nat. Rev. Microbiol., 2, 886–897. at the cost of reduced interoperability and maintainability. We Schuetz,R. et al. (2007) Systematic evaluation of objective functions for predicting intracellular fluxes in Escherichia coli. Mol. Syst. Biol., 3,Article 119. therefore prioritize out-of-core computation and parallelization. We Schuster,S. and Hilgetag,C. (1994) On elementary flux modes in biochemical reaction already applied fine-grained parallelization by exploiting multiple systems at steady state. J. Biol. Syst., 2, 165–182. cores of modern CPUs and our rank update method with residue Segrè,D. et al. (2002) Analysis of optimality in natural and perturbed metabolic arithmetic is readily applicable to parallel computation. Coarse- networks. Proc. Natl Acad. Sci. USA, 99, 15112–15117. Stelling,J. et al. (2002) Metabolic network structure determines key aspects of grained parallelization complements our multi-core technique and functionality and regulation. Nature, 420, 190–193. can be applied simultaneously. Terzer,M. and Stelling,J. (2006) Accelerating the computation of elementary modes To investigate the potential of large-scale EFM computation to using pattern trees. In Bucher,P. and Moret,B.M.E. (eds) WABI , Vol. 4175 of Lecture yield new biological insight, we first focused on the analysis of Notes in Computer Science. Springer, Heidelberg, pp. 333–343. E.coli central metabolism. The enhanced computation potential Urbanczik,R. and Wagner,C. (2005) Functional stoichiometric analysis of metabolic networks. Bioinformatics, 21, 4176–4180. enabled the calculation of over 26 million elementary modes for Wagner,C. (2004) Nullspace approach to determine the elementary modes of chemical growth on glucose and simultaneous uptake of selected amino acids. reaction systems. J. Phys. Chem. B, 108, 2425–2431. Only a small fraction of these EFMs and none of the maximum Wagner,C. and Urbanczik,R. (2005) The geometry of the flux cone of a metabolic yield modes are found with typical small-scale computations. Next, network. Biophys. J., 89, 3837–3845. # EFMs

Journal

BioinformaticsOxford University Press

Published: Aug 1, 2008

There are no references for this article.