Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 7-Day Trial for You or Your Team.

Learn More →

Holimap: an accurate and efficient method for solving stochastic gene network dynamics

Holimap: an accurate and efficient method for solving stochastic gene network dynamics Article https://doi.org/10.1038/s41467-024-50716-z Holimap: an accurate and efficient method for solving stochastic gene network dynamics 1 2 Received: 2 March 2024 Chen Jia &Ramon Grima Accepted: 13 July 2024 Gene-gene interactions are crucial to the control of sub-cellular processes but our understanding of their stochastic dynamics is hindered by the lack of Check for updates simulation methods that can accurately and efficiently predict how the dis- tributions of gene product numbers vary across parameter space. To over- come these difficulties, here we present Holimap (high-order linear-mapping approximation), an approach that approximates the protein or mRNA number distributions of a complex gene regulatory network by the distributions of a much simpler reaction system. We demonstrate Holimap’s computational advantages over conventional methods by applying it to predict the stochastic time-dependent dynamics of various gene networks, including transcriptional networks ranging from simple autoregulatory loops to complex randomly connected networks, post-transcriptional networks, and post-translational networks. Holimap is ideally suited to study how the intricate network of gene- gene interactions results in precise coordination and control of gene expression. Genetic regulation occurs through intricate interactions between a Boolean networks, the expression of each gene is tracked by a binary 1–4 number of genes .Agene “X” may express a protein which acts as a variable and hence large networks can be examined in a computa- transcription factor (TF), promoting or inhibiting the RNA polymerase tionally efficient way. A more refined description is provided by the use assembly on another target gene “Y” (or on itself) and thus regulating of ODEs, where the time-dependent concentrations of RNAs, proteins, the extent that the latter is expressed . These gene-gene interactions and other molecules are predicted as a function of the rate constants 16,17 can be simply visualized as a directed graph with the genes being the of the reactions in the network . An even more realistic description nodes (vertices) and the directed edges (links) representing the makes use of the CME approach where one predicts not only the mean 6,7 interactions . Networks inferred from gene expression data, com- expression levels of various genes but also the distributions of the monly called gene regulatory networks , have been reconstructed by discrete numbers of mRNAs and/or proteins measured across a 9–13 18 several methods . The complex connectivity of these networks population of cells . This stochasticity has various sources (biological makes intuitive understanding of their dynamics challenging. Conse- intrinsic and extrinsic noise, and technical noise introduced by quently, the construction, mathematical analysis, and simulation of experimental protocols), all of which lead to the large differences in 19–21 models of gene regulatory networks are indispensable tools in a gene expression observed from one cell to another . quantitative biologist’s arsenal. Unfortunately, with an increasing level of sophistication and Several formalisms have been employed to predict gene reg- predictive power, simulations also rapidly become computationally ulatory network dynamics, including Boolean networks, ordinary dif- expensive. Unraveling the stochastic dynamics of gene networks ferential equations (ODEs), and chemical master equations (CMEs)— requires solving a set of coupled CMEs for the probability of the system for reviews covering these approaches and more, please see refs. 14,15. being in each possible state. Since the number of states of a gene These approaches have various advantages and disadvantages. In network is typically infinite, direct solution of these equations is 1 2 Applied and Computational Mathematics Division, Beijing Computational Science Research Center, Beijing, China. School of Biological Sciences, University of Edinburgh, Edinburgh, UK. e-mail: [email protected] Nature Communications | (2024) 15:6557 1 1234567890():,; 1234567890():,; Article https://doi.org/10.1038/s41467-024-50716-z t = 5 CPU time nonlinear network t = 10 SSA # of protein 1 # of protein 2 Holimap t = 5 FSP CPU time linear network t = 10 # of protein 1 # of protein 2 Fig. 1 | Illustration of Holimap and its advantages over the SSA. Holimap row indicate that FSP truncates an infinite state space into a finite one and then decouples gene-gene interactions in a nonlinear regulatory network and trans- solves the finite-dimensional CME). Compared with the conventional Monte Carlo forms it into a linear network with multiple effective parameters, some of which approach (the SSA, whose two main stochastic steps are illustrated by dice), Holi- may be time-dependent. The time evolution of protein number distributions (for all map not only significantly reduces the CPU time, but it also yields an accurate, genes) of the nonlinear network can be approximately predicted by solving the noise-free prediction of the protein number distributions. dynamics of the effective linear network using, e.g., FSP (the lattices in the lower impossible. The finite-state projection algorithm (FSP) truncates the The paper is structured as follows. The Holimap method is infinite state space to a finite one; this renders numerical solutions introduced by means of a simple autoregulatory feedback loop possible becauseweonly need tosolve a finite-dimensional CME. example where we show step-by-step how the approximation is con- However, the immense number of states limits its applications to very structed when second or higher-order interactions are only between a small networks with one or two interacting genes. For larger networks protein and a gene. The method is then extended to show the appli- with multiple interacting genes, Monte Carlo simulations based on the cation to more complex networks with multiple protein-gene inter- stochastic simulation algorithm (SSA) become more practical. Spe- actions and also to networks with gene product interactions such as cifically, given the current state of the system, the SSA generates two those with RNA-RNA, RNA-protein, and protein-protein high-order random numbers to predict the time when the next reaction event reactions. By comparison with the SSA or FSP, we show that inde- occurs and which particular reaction event will occur. The output is a pendent of the type of interactions in a gene network, Holimap pro- number of statistically correct trajectories (molecule number versus vides highly accurate time-dependent distributions of protein or time data), one for each cell, from which the copy number distribu- mRNA numbers over large swathes of parameter space including those tions of all biochemical species can be calculated. However, the issue regions where the system displays oscillatory or multistable dynamics. remains that a large sampling size is typically required to obtain Finally, we show that the computation time of Holimap can be sig- smooth distributions and hence the computational time can still be nificantly reduced while maintaining its accuracy by devising a hybrid very considerable. For an introduction to simulation methods in sto- method which combines both Holimap and the SSA. chastic biology, we refer the reader to refs. 24–26. In this paper, we overcome the difficulties of conventional Results stochastic simulation methods for gene networks by devising an Fundamental principles of Holimap illustrated by an auto- efficient approach—the high-order linear-mapping approximation regulation example 27,28 (Holimap). The basic idea is to map the dynamics of a complex gene Consider a simple autoregulatory feedback loop ,whereby protein network with second or higher-order interactions (a system with expressed from a gene regulates its own transcription (Fig. 2a). Feed- nonlinear propensities and hence a nonlinear network) to the back is mediated by cooperative binding of h protein copies to the 29–32 33 dynamics of a much simpler system where all reactions are first- gene . In agreement with experiments , protein synthesis is order (a linear network). The reaction rates of this system are gen- assumed to occur in bursts of random size k sampled from a geometric erally time-dependent and complex functions of the reaction rates distribution with parameter p, i.e., Pðk = nÞ = p ð1  pÞ.Here σ is the of the original gene network and they are found by conditional binding rate of protein to the gene; σ is the unbinding rate; ρ and ρ u b u moment-matching. The linear network has a much smaller state are the burst frequencies of protein, i.e., the frequencies with which space than the nonlinear network which means that now simulation bursts are produced, when the gene is in the bound and unbound using FSP becomes feasible, leading to smooth distributions of states, respectively; d is the rate of protein degradation and dilution protein numbers in a fraction of the time taken by SSA simulations. (due to cell division). The reaction system describes a positive feed- For an illustration of Holimap see Fig. 1. back loop when ρ > ρ (since in the case, binding of a protein increases b u Nature Communications | (2024) 15:6557 2 probability probability probability probability Article https://doi.org/10.1038/s41467-024-50716-z autoregulatory circuit LMA a b σ σ b b G + h P G* G G* P ρ σ σ u u u G* G + h P G* G σ σ b u ρ ρ σ σ u u b u G G + k P G G + k P ρ ρ b b b G* b G* + k P G* G* + k P h copies G* d d G* 2-parameter Holimap 4-parameter Holimap c d σ σ b G b G G G G* G* ρ ρ u u σu P u P G* G G* G d d ρ ρ σ σ u σb σu u b u G G + k P G G + k P b b b G* ρ G* G* + k P G* + k P d d G* G* e f LMA 2-HM 4-HM HD HD HD HD 1.5 0.18 1 0.69 1 0.69 1 0.69 σ < d < d σ > d > d u u 0.46 0.46 0.46 0.12 -1 -1 -1 positive positive feedback feedback 0.06 0.23 0.23 0.23 negative negative bimodality bimodality bimodality bimodality bimodality bimodality feedback feedback LMA LMA 2-HM 2-HM 4-HM 4-HM 0.5 0 -3 0 -3 0 -3 0 -0 2 2 -0 2 2 -0 2 2 0.5 1 1.5 log ρ log ( σ / d ) log ( σ / d ) log ( σ / d ) u u u 10 u 10 10 10 slow switching intermediately fast switching fast switching h g 0.09 0.15 0.24 0.48 FSP LMA LMA 2-HM 0.06 0.1 0.16 0.32 4-HM 2-HM A B C 4-HM 0.03 0.05 0.08 0.16 0 0 0 13 2 4 02 10 0 30 40 02 10 0 30 40 02 10 0 30 40 cooperativity protein number protein number protein number Fig. 2 | Holimaps for autoregulatory gene networks in steady-state conditions. binding rate σ (normalized by the decay rate d)when ρ ≫ ρ . The red curves b b u a Stochastic model of an autoregulatory feedback loop, which includes bursty enclose the true bimodal region, i.e., the parameter region in which the protein protein synthesis, protein decay, cooperative binding of protein to the gene, and number has a bimodal distribution, as predicted by FSP; the orange curves enclose unbinding of protein. b The LMA maps the nonlinear network to a linear one with the bimodal region predicted by the approximation method. The vertical white effective parameter σ ^ . The high-order reactions G + hP ⇌ G in the former are dashed line demarcates the region of σ ≥ d, where the linear network given by the b u replaced by the first-order reactions G ⇌ G in the latter. c The 2-HM maps the LMA can never exhibit bimodality, from the region of σ < d where it can. nonlinear network to a linear one with effective parameters σ ~ and σ ~ . d The 4-HM g Comparison of the steady-state protein distributions computed using FSP, LMA, u b maps the nonlinear network to a linear one with effective parameters σ  ,σ  ,ρ  ,and 2-HM, and 4-HM in different regimes of gene state switching. h The maximum HD as u b u ρ . e Heat plot of the HD for the LMA as a function of the protein burst frequencies a function of the cooperativity h for the LMA and Holimaps. Here the maximum HD ρ and ρ . Here the HD for the LMA represents the Hellinger distance between the is computed when σ and σ vary over large ranges, while other parameters remain u b u b real steady-state protein distribution computed using FSP applied to the nonlinear fixed. See Supplementary Note 1 for the technical details of this figure. Source data system and the approximate protein distribution computed using the LMA. f Heat are provided as a Source Data file. plots of the HDs for the LMA and Holimaps as functions of the unbinding rate σ and its own expression) and describes a negative feedback loop when the moments: ρ < ρ (binding of a protein decreases its own expression). b u Let p denote the probability of having n proteincopiesinan i,n individual cell when the gene is in state i with i=0, 1 corresponding to g = σ g  σ μ , 0 u 1 b 1,0 the unbound and bound states, respectively. To proceed, let _ μ = ρ Bg  dμ + σ ðμ + g Þ σ ðμ + μ Þ, 1,0 u 0 1,0 u 1,1 1 b 2,0 1,0 g = p be the probability of observing the gene in state i and let i n=0 i,n _ μ = ρ Bg  dμ  σ μ + σ μ , 1,1 b 1 1,1 u 1,1 b 2,0 ð1Þ μ = nðn  1Þðn  m+1Þp be the mth factorial moment of m,i n=0 i,n μ =2ρ Bðμ + Bg Þ 2dμ 2,0 u 1,0 0 2,0 protein numbers when the gene is in this state. For simplicity, we first + σ ðμ +2μ Þ σ ðμ +2μ Þ, u 2,1 1,1 b 3,0 2,0 focus on the case of non-cooperative binding (h = 1). From the CME, it μ =2ρ Bðμ + Bg Þ 2dμ  σ μ + σ μ , 2,1 b 1,1 1 2,1 u 2,1 b 3,0 is straightforward to obtain the following time evolution equations for Nature Communications | (2024) 15:6557 3 log ρ maximum HD 10 b log ( σ d ) probability / 10 b Article https://doi.org/10.1038/s41467-024-50716-z where g =1 − g and B = ⟨k⟩ = p/(1 − p) is the mean protein burst size, solved for the values of all zeroth, first, and second-order moments, 1 0 i.e., the mean number of protein molecules produced in a single burst. i.e., g , μ ,and μ . Finally substituting these into Eq. (4) gives the i 1,i 2,i e e For clarity, we have suppressed the explicit time dependence of all values of the effective parameters σ and σ for the linear network. See b u moments. Note that this system of equations is not closed, i.e., the Supplementary Note 2 for a more detailed explanation of the Holimap equation for a moment of a certain order depends on moments of algorithm. e e higher orders, and hence an exact solution is generally impossible. This In steady-state, the values of σ and σ are constants independent b u difficulty stems from the nonlinear dependence on molecule numbers of time, and hence we can use the steady-state protein distribution of of the bimolecular propensity modeling protein-gene interactions . the linear network to approximate that of the nonlinear one—this can In contrast, a linear gene network (one composed of only first- be computed analytically or using FSP. When the system has not e e order reactions, i.e., the propensity of each reaction has a linear reached steady-state, the values of σ and σ depend on time t.Inthis b u dependence on molecule numbers) is much easier to solve both ana- case, we can use the time evolution of the linear network with time- lytically and numerically than a gene network with nonlinear propen- dependent rates to predict that of the nonlinear one—while analytical sities; for example, the moment equations are closed and thus can be solutions are not generally available in this case, the distributions can solved exactly in this case. A basic idea of the linear-mapping be efficiently computed using FSP. approximation (LMA) developed in ref. 35 is to transform a complex In some regions of parameter space, the 2-HM may still not be nonlinear network into a linear one by replacing all second or higher- accurate enough. To solve this problem, we devise a second type of order reactions between proteins and genes by effective first-order Holimap—the 4-parameter Holimap (4-HM), which transforms the reactions. Specifically, for the network in Fig. 2a, we replace the reac- nonlinear network into the linear one illustrated in Fig. 2d. Here the * * tions G + hP ⇌ G by G ⇌ G . The LMA maps the nonlinear network to the binding rate σ , unbinding rate σ , and the protein burst frequencies ρ b u b linear one shown in Fig. 2b, where the binding rate σ for the former is and ρ for the former are replaced by four effective parameters b u replaced bytheeffectivegeneswitching rate σ for the latter, while the σ ,σ ,ρ ,and ρ for the latter, which can be determined by matching b b u b u other parameters remain unchanged. In the LMA, σ is chosen to be σ the moment equations for the two networks (“Methods”). Note that multiplied by the conditional mean of protein numbers in the unbound while for the 2-HM, we matched only the zeroth and first-order gene state, i.e., moments, for the 4-HM, we matched these and also the second-order moments. The 2-HM and 4-HM will be collectively referred to as Hol- σ μ b 1,0 imaps in what follows. σ = σ hnji=0i = , ð2Þ b b 0 Thus far, we have only considered the case of h =1.For thecaseof cooperative binding (h ≥ 2), the Holimap approximation procedure where g and μ can be calculated by a natural moment-closure can be similarly performed, except that higher-order moment equa- 0 1,0 method (“Methods”) . There are two approximations involved in the tionsneedtobesolved(SupplementaryNote2)—the algorithm for LMA: (i) in reality, the effective parameter σ ^ should be stochastic finding the effective parameters requires the solution of (h + 1)-order rather than deterministic since it is proportional to the instantaneous moment equations. For example, when h = 2, third-order moment protein number in the unbound state; (ii) any moment-closure method equations need to be solved and the effective parameters depend on inevitably leads to some errors . the values of zeroth, first, second, and third-order moments. We Next we propose an efficient method—Holimap, which we will emphasize that the computational cost of Holimap is mainly deter- show to perform much better than the LMA. There are two types of mined by the number of moment equations, L,tobe solved. For Holimaps. The first type is the 2-parameter Holimap (2-HM) which autoregulatory loops, L=1+2h for the LMA and L =3+ 2h for Holimap. transforms the nonlinear gene network into the linear one illustrated in Note that the 2-HM and 4-HM have the same L. Fig. 2c, where both the binding and unbinding rates σ and σ for the The principles used to construct Holimaps for autoregulated b u e e former are replaced by the effective gene switching rates σ and σ for networks can be used to obtain Holimaps for an arbitrarily complex b u e e the latter. The remaining question is how to determine σ and σ so network consisting of a system of interacting genes that regulate each b u that the solution of the linear network accurately approximates that of other via positive or negative feedback. A flow chart of the Holimap the nonlinear one. For the linear network, the evolution of moments is algorithm for a general regulatory network can be found in Supple- governed by mentary Fig. 1. The computational time of Holimap depends on the complexity of the network—an increased number of nodes (genes) or _ e e g = σ g  σ g , edges (regulatory reactions) results in an increased number of 0 u 1 b 0 _ moment equations L to be solved. In Supplementary Note 3, we prove μ = ρ Bg  dμ + σe μ  σe μ , 1,0 u 0 1,0 u 1,1 b 1,0 that for a general network, L scales polynomially with the cooperativity _ e e μ = ρ Bg  dμ  σ μ + σ μ , ð3Þ 1,1 b 1 1,1 u 1,1 b 1,0 h and scales exponentially with respect to the network size M (number _ e e μ =2ρ Bðμ + Bg Þ 2dμ + σ μ  σ μ , 2,0 u 1,0 0 2,0 u 2,1 b 2,0 of genes). _ e e μ =2ρ Bðμ + Bg Þ 2dμ  σ μ + σ μ : 2,1 b 1,1 1 2,1 u 2,1 b 2,0 Applications to one-node (autoregulatory) networks e e The effective rates σ and σ are chosen so that the two systems have We now assess the performance of Holimap based on the Hellinger b u the same zeroth and first-order moment equations (for the latter, we distance (HD) between the steady-state protein distribution obtained mean the first-order moment when the gene is in the bound state). by applying FSP to the nonlinear network and the approximate dis- Matching the first and third identities in Eqs. (1)and (3), we find that σ tribution computed using the LMA and the two types of Holimaps. and σ should satisfy Note that while the direct application of FSP also leads to an approx- imate distribution, in effect, it can be considered exact since the error e e σ g  σ g = σ g  σ μ , is very small provided the state space is truncated to a large enough u 1 b 0 u 1 b 1,0 ð4Þ e e value . Here we choose the HD because it is bounded between 0 and 1; σ μ  σ μ = σ μ  σ μ : u 1,1 b 1,0 u 1,1 b 2,0 a visually accurate approximation is obtained when the HD ≪ 0.1. The remaining question is how to use these equations to obtain for- Figure 2e illustrates the HD for the LMA as a function of ρ and ρ . u b mulae for the effective rates. This can be done as follows: we first solve Clearly, the LMA performs well when ρ and ρ are not very different u b e e for σ and σ using Eq. (4) and then substitute these into Eq. (3)to from each other. However, it results in larger deviations from FSP when b u obtain a set of closed moment equations. These equations can be the protein burst frequency in one gene state is significantly larger Nature Communications | (2024) 15:6557 4 Article https://doi.org/10.1038/s41467-024-50716-z than that in the other. We also find that the LMA is much more accurate which usually occurs when σ and σ are large for negative feedback u b for negative feedback loops (ρ > ρ ) than for positive feedback loops loops (Supplementary Fig. 3). We did not observe numerical instability u b (ρ > ρ ). In the LMA, the effective stochastic parameter σ is for the 2-HM. As a result, the 2-HM might be the preferable choice b u approximated by σ multiplied by the conditional mean of protein when dynamics is of major interest. In steady-state, while the numbers in the unbound state. Hence it must give rise to inaccurate improvement in accuracy of the 4-HM may be marginal, nevertheless approximations when protein noise in the unbound gene state is large. since the two types of Holimaps require the solution of the same This is exactly what happens in the positive feedback case where the number of moment equations, the 4-HM is more advantageous when low synthesis rate in the unbound state results in a small conditional dynamics is not of interest. mean and thus large protein noise. We next examine whether Holimap outperforms the LMA when it Applications to two-node networks with deterministic mono- is applied to positive feedback loops. Figure 2fshows theHD against and bistability σ /d and σ /d for the LMA, 2-HM, and 4-HM when ρ ≫ ρ .Itisclear that We next evaluate the performance of Holimaps when applied to study u b b u the LMA (Fig. 2f, left) performs well when σ and σ are both small, but thesteady-statebehavioroftwo-nodegenenetworks,wheretwogenes u b it becomes highly inaccurate when σ and σ are larger. The protein regulate each other (Fig. 3a, left). Feedback is mediated by cooperative u b distribution can be unimodal or bimodal. The bimodal one is of par- binding of h copies of protein P to gene G and cooperative binding of 1 1 2 ticular interest because it indicates the separation of isogenic cells into h copies of protein P to gene G .Here σ and σ are the binding and 2 2 1 bi ui two different phenotypes. In particular, we find that the LMA results in unbinding rates for gene G,respectively; ρ and ρ are the synthesis i bi ui poor approximations when σ ≥ d and when the distribution is bimo- rates of protein P when the gene is in the bound and unbound states, u i dal. This can be explained as follows. Recall that the LMA transforms a respectively; d is the degradation rate of protein P . For simplicity, we i i nonlinear network into a linear one with unchanged σ ,which is do not take protein bursting into account, although it can be included commonly known as the telegraph model of stochastic gene easily. Depending on whether ρ < ρ or ρ > ρ for i=1,2,there are ui bi ui bi expression .Inref. 39, it has been proved that the telegraph model can four different types of effective system dynamics that constitute either produce a bimodal steady-state distribution only when both gene a positive feedback or a negative feedback loop (Fig. 3b). For example, switching rates are smaller than the protein decay rate (σ ,σ ^ < d). atoggleswitch(twonegativeregulations) corresponds to the case of u b When σ ≥ d, the linear network can never exhibit bimodality, while the ρ > ρ and ρ > ρ . For two-node networks, Holimaps can be per- u u1 b1 u2 b2 bimodality in the nonlinear network can be apparent. formed in a similar way as we have previously shown for autoregulatory We emphasize that σ ≥ d is biologically relevant since in naturally loops, i.e., by replacing all protein-gene binding reactions by effective occurring systems, protein is usually very stable and hence its decay first-order reactions with new parameters and also allowing some of rate is often smaller than the rates of gene state switching. For exam- the other reactions to have different rate constants than those in the ple, in mouse fibroblasts, it has been measured that the median original network (Fig. 3a, center and right). proteinhalf-life is 46 hand themeancellcycle duration is27.5h;hence We first focus on a negative feedback loop without cooperative the mean decay rate of protein is estimated to be binding (Fig. 3c). Since the LMA performs well when the unbinding 1 4 1 d = ðlog 2Þ=46 + ðlog 2Þ=27:5h =6:7×10 min .Inthe same cell rate σ is much smaller than the degradation rate d, hereweconsider ui i type, the mean activation and inactivation rates for thousands of genes the case of σ ≫ d . We use the HD between the actual and approximate ui i −1 −1 42 are estimated to be 0.002 min and 0.24 min . In another study, the steady-state distributions of protein P to test the accuracy of Holimap. −1 mean activation and inactivation rates are estimated to be 0.014 min Figure 3d illustrates the HDs for the LMA and 4-HM as functions of σ b1 −143 and 0.17 min .Hence σ ≥ d is indeed satisfied for most genes. and σ .We find that the network displays bimodality when σ is large u b2 b1 In contrast to the LMA, both the 2-HM and 4-HM markedly reduce and σ is small. This is surprising because in the literature there are b2 the HD values (Fig. 2f, center and right). The LMA has a maximum HD two well-accepted origins for bimodality: (i) a positive feedback loop of 0.7, while for the two types of Holimaps, the maximum HDs are only with ultra-sensitivity (type-I) and (ii) slow switching between gene 0.2 and 0.16. The 4-HM performs marginally better than the 2-HM in states (type-II), independent of the type of feedback loop .Herethe capturing steady-state protein distributions. We also compare the network is a negative feedback loop without cooperative binding, and region of parameter space where bimodality is predicted to exist thus there is neither a positive feedback loop nor ultra-sensitivity. (region enclosed by the orange curves) with the actual region where Moreover, since both σ and σ are large, gene G switches rapidly u1 b1 1 bimodality manifests according to FSP (region enclosed by the red between the two states. Hence the bimodality observed is neither type- curves). We note that while the LMA fails to capture the bimodal region I nor type-II, and in the following, we refer to it as type-III bimodality. of the protein distribution, especially when σ ≥ d, both the 2-HM and From Fig. 3d, it is clear that the LMA performs poorly in this 4-HM capture the vast majority of the bimodal region. In summary, the bimodal region. Again, the LMA cannot capture type-III bimodality deficiencies of the LMA for positive feedback loops are remedied by since it transforms the nonlinear network into a linear one with the use of Holimaps (Fig. 2g). unchanged σ , which is unable to produce a bimodal distribution ui Finally, we examine how the cooperativity in protein binding when σ ≥ d . On the other hand, the 4-HM significantly reduces the ui i affects the accuracy of various approximation methods. Figure 2h HD values and performs exceptionally well in capturing the bimodal shows the maximum HD as a function of h for the LMA, 2-HM, and 4- region (Fig. 3e). Here we do not show the 2-HM because it leads to HM, where the maximum HD is computed when σ and σ vary over similar results as the 4-HM except for a slightly larger HD value. u b large ranges and other parameters remain fixed. Clearly, for the LMA, We next consider a toggle switch with cooperative binding, where the maximum HD increases approximately linearly with respect to h two genes repress each other (Fig. 3f). Note that this is a positive when h ≤ 4; for Holimaps, the maximum HD is insensitive to h.SinceTF feedback loop with ultra-sensitivity and hence it can produce deter- cooperativity is the norm rather than the exception ,our resultssug- ministic bistability (type-I bimodality), which means that the corre- gest Holimap’s accuracy remains high over the physiologically mean- sponding system of deterministic rate equations (Supplementary ingful range of parameter values. Note 4) is capable of having two stable fixed points and one unstable The results that we have presented assume steady-state condi- point. Again, we only focus on the situation of σ ≫ d.Figure 3g ui i tions. However, the 2-HM can also accurately reproduce the time illustrates the HDs for the LMA and 4-HM against σ and σ .The b1 b2 evolution of the protein distribution for nonlinear gene networks yellow curve encloses the region of deterministic bistability, which is (Supplementary Fig. 2). The 4-HM is also accurate; however depending markedly smaller than the true bimodal region enclosed by the red on parameter values, it may lead to numerical instability at short times, curve. According to simulations, bimodality can be observed when Nature Communications | (2024) 15:6557 5 Article https://doi.org/10.1038/s41467-024-50716-z 2-node network 2-parameter Holimap 4-parameter Holimap G G G 1 1 1 ρ ρ ρ u1 u1 u1 d d d 1 1 1 b1 u1 σ σ σ σ b1 u1 b1 u1 P P P 1 1 1 ρ ρ b1 b1 ρ b1 h copies 2 * * * G G G 1 1 1 G G G 2 2 2 ρ ρ u2 u2 u2 P P P 2 2 2 d d d 2 2 2 σ σ b2 σ σ σ σ u2 b2 u2 b2 u2 ρ ρ b2 b2 ρ b2 h copies * * * 1 G G G 2 2 2 positive feedback loop negative feedback loop negative feedback loop positive feedback loop (toggle switch) G G 2 G G G G G G 1 1 2 1 2 1 2 ρ ρ ρ ρ ρ ρ ρ ρ b1 > u1 b1 < u1 b1 > u1 b1 < u1 ρ ρ ρ ρ ρ ρ ρ ρ b2 > u2 b2 > u2 b2 < u2 b2 < u2 LMA 4-HM HD HD 2 2 0.09 c d e 0.4 0.4 bimodality bimodality FSP prediction of 4-HM prediction of 4-HM negative feedback loop LMA 0.3 0.3 0.06 2-HM G G 4-HM 1 2 -1 -1 0.2 0.2 0.03 0.1 0.1 h = h = 1 1 2 0 0 -4 -4 0 02 10 0 30 40 -1 2 4 -1 2 4 log σ log σ protein number 10 b1 10 b1 LMA 4-HM HD HD f g 1 1 h 0.093 0.69 0.69 toggle switch 0.062 0.46 0.46 G 1 G 2 -2 -2 0.031 0.23 0.23 bimodality bimodality h = h = 2 1 2 prediction of 4-HM prediction of 4-HM deterministic bistability deterministic bistability -5 0 -5 0 0 02 10 0 30 40 -0 3 3 -0 3 3 log σ log σ protein number 10 b1 10 b1 Fig. 3 | Holimaps for two-node gene networks in steady-state conditions. predicted by Holimap. e Comparison of the steady-state distributions of protein P a Illustration of the 2-HM and 4-HM for a two-node gene network, where two genes computed using FSP, LMA, 2-HM, and 4-HM. f A toggle switch with cooperative G and G regulate each other. Feedback is mediated by cooperative binding of h binding (h = h =2). g Same as (d) but for the toggle switch. The yellow curve 1 2 1 1 2 copies of protein P to gene G and cooperative binding of h copies of protein P to encloses the parameter region of deterministic bistability, i.e., the region in which 1 2 2 2 gene G . b The two-node network can describe four different feedback loops, the deterministic rate equations have two stable fixed points and one unstable fixed according to whether ρ > ρ or ρ < ρ for i =1, 2. c A negative feedback loop with point. h Same as (e) but for the toggle switch. Here the parameters are chosen so ui bi ui bi non-cooperative binding (h = h =1). d Heat plots of the HDs for the LMA and 4-HM that the system displays deterministic bistability. While we only focus on the dis- 1 2 as functions of the binding rates σ and σ . Here the HD represents the Hellinger tribution of protein P in (d), (e), (g), and (h), the distribution of the second protein b1 b2 1 distance between the real and approximate steady-state distribution of the number P is also accurately predicted by Holimap (Supplementary Fig. 3). See Supple- of molecules of protein P . The red curve encloses the true bimodal parameter mentary Note 1 for the technical details of this figure. Source data are provided as a region computed using FSP, and the orange curve encloses the bimodal region Source Data file. both σ and σ are large. The LMA fails to reproduce the bimodal Applications to three-node networks with deterministic b1 b2 distribution since σ ≥ d , as expected. The 4-HM not only successfully oscillations ui i captures the bimodal region (enclosed by the orange curve), but also We now focus on three-node networks, where three genes regulate yields small HD values. The maximum HD for the LMA is as large as 0.7, each other in a cyclic manner (Fig. 4a, left). Feedback is mediated by while it is only 0.13 for the 4-HM. In particular, in the deterministically cooperative binding of h copies of protein P to gene G for i=1, 2, 3, i i i+1 bistable region, both the 2-HM and 4-HM accurately predict the pro- where G = G . Again, depending on whether ρ < ρ or ρ > ρ for 4 1 ui bi ui bi tein distribution while the LMA completely fails (Fig. 3h). i = 1, 2, 3, the network can be a repressilator (three negative Nature Communications | (2024) 15:6557 6 log σ log σ 10 b2 10 b2 probability probability Article https://doi.org/10.1038/s41467-024-50716-z 3-node network 2-parameter Holimap G G 1 1 ρ ρ u1 u1 d d 1 1 b1 u1 σ σ b1 u1 P P 1 1 ρ ρ b1 b1 h copies * * 3 G G 1 1 4-parameter Holimap G G G 2 3 1 ρ ρ ρ u1 u2 u3 P P 2 3 d2 d3 d1 σ σ σ σ b2 u2 b3 u3 σ σ b1 u1 ρ ρ b2 b3 ρ b1 h copies * h copies * * 1 G 2 G G 2 3 1 75 24 b c repressilator 50 16 ρ ρ b1 u1 ρ ρ SSA b2 u2 LMA ρ ρ b3 u3 2-HM h = h = h = 3 1 2 3 0 0 G G 2 3 02 10 0 30 40 02 10 0 30 40 time time 0.22 0.12 0.07 0.04 0.04 t = 0.4 t = 0.8 t = 1.2 t = 1.6 t = 2 SSA LMA 2-HM SSA+2-HM 0 0 0 0 0 060 120 060 120 060 120 060 120 060 120 0.04 0.04 0.04 0.05 0.04 t = 3 t = 5 t = 7 t = 9 t = 11 0 0 0 0 0 060 120 060 120 060 120 060 120 060 120 0.04 0.04 0.04 0.04 0.04 t = 14 t = 18 t = 22 t = 26 t = 30 0 0 0 0 0 060 120 060 120 060 120 060 120 060 120 protein number Fig. 4 | Holimaps for three-node gene networks. a Illustration of the 2-HM and oscillations. c Time evolution of the mean and Fano factor of fluctuations in the 4-HM for a three-node gene network, where three genes regulate each other along number of molecules of protein P computed using the SSA (with 10 trajectories), the counterclockwise direction. Feedback is mediated by cooperative binding of h LMA, and 2-HM. d Comparison of the time-dependent distributions of protein P at i 1 copies of protein P to gene G ,where G is understood to be G . b Arepressilator 15time points computed using the SSA (with 10 trajectories), LMA, 2-HM, and i i+1 4 1 with cooperative binding. Here the cooperativities are chosen as h =3 for hybrid SSA+2-HM (with 2000 trajectories). See Supplementary Note 1 for the i = 1, 2, 3 such that the deterministic system of rate equations produces sustained technical details of this figure. Source data are provided as a Source Data file. regulations) , a Goodwin model (one negative regulation and two (Supplementary Note 4) to produce sustained oscillations. According 46 47 positive regulations) , or a positive feedback loop . to simulations, deterministic oscillations are not observed when h ≤ 2. As for previous examples, Holimap transforms the nonlinear Figure 4c illustrates the oscillatory time evolution of the mean and network into a linear one (Fig. 4a, right). We now focus on the Fano factor (the variance divided by the mean) of fluctuations in the repressilator illustrated in Fig. 4b, where the cooperativities are chosen number of protein P computed using the SSA, LMA, and 2-HM. Note as h = h = h = 3. Here high cooperativities are chosen since we that here we do not consider the 4-HM because, as previously men- 1 2 3 require the corresponding deterministic system of rate equations tioned, it may cause numerical instability when computing time- Nature Communications | (2024) 15:6557 7 probabiity mean Fano factor Article https://doi.org/10.1038/s41467-024-50716-z dependent distributions. The LMA fails to reproduce damped oscilla- we also compare the CPU times and accuracy of these methods. The tions in the time evolution of the mean and Fano factor, while Holimap number of SSA trajectories N needed for SSA + 2-HM is chosen such excellently captures these oscillations. Note also that the LMA sig- that the distributions obtained from N trajectories and those obtained nificantly underestimates the variance of fluctuations and hence leads from 3N trajectories have an HD (averaged over all time points) less to a much smaller Fano factor in the limit of long times. than 0.02, i.e., increasing the sample size will not substantially improve Figure 4d compares the time-dependent protein distributions the approximation accuracy—a sample size of N = 2000 is sufficient to computed using the SSA, LMA, and 2-HM. Interestingly, both the LMA satisfy this criterion. Notably with almost the same CPU time, SSA + 2- and 2-HM accurately reproduce the protein distribution at small times HM yields distributions that are significantly more accurate than those (t ≤ 3). However, the LMA fails to reproduce bimodality at intermediate obtained from only the SSA with the same number of trajectories—the and large times since it underestimates noise. In contrast, Holimap HD for the former is only 0.04–0.06, while for the latter it is 0.11–0.13; performs remarkably well in predicting the complete time evolution of here the distributions obtained from the SSA with 10 trajectories are the protein distribution. used as a proxy of ground truth when computing the HDs. We also Thus far, we have considered regulatory networks where each note that SSA + 2-HM yields distributions that are practically as accu- gene is regulated by one TF; however, many genes are regulated by a rate as the 2-HM but are over 16 times faster (28 s vs 7 min 39 s). multitude of TFs which are often shared between multiple genes .In To further test the accuracy of SSA + Holimap, we apply it to a Supplementary Note 5, we investigate gene networks with two TF random M-node gene network (Fig. 5c), where any pair of nodes has a binding sites. We show that Holimap performs excellently in capturing probability of 2/M to be connected. This guarantees that each gene the protein distributions as well as the bimodal region, independent of on average regulates two genes. When connected, each direct edge has the type of network topology and the type of TF binding (independent, an equal probability to be positive or negative regulation; auto- positive cooperative, and negative cooperative binding ). regulation is also allowed. The details of the stochastic model are describedinMethods.Wethenapply the2-HMtotransform the A hybrid combination of SSA and Holimap provides highly effi- nonlinear random network into a linear one and then use 2000 SSA cient computation of complex gene network dynamics trajectories to estimate the effective parameters of the linear network. The FSP and SSA are two widely used methods for solving the Figure 5d illustrates the CPU times and HDs against the number of dynamics of stochastic chemical reaction systems. While FSP yields an nodes M forSSA+2-HMand theSSA with thesamenumberoftra- accurate distribution, from a practical point of view, it is only applic- jectories. Again an SSA with 10 trajectories is used to generate a proxy able to small networks where protein numbers are not very large; for of the ground truth distribution when computing the HDs. Clearly, the large networks, the size of the state space leads to an enormous two methods yield almost the same CPU times that approximately computational cost . The SSA can also be computationally expensive, linearly scale with M. This is because for SSA + 2-HM, almost all time is particularly when the network has multiple reaction time scales . spent on simulating the SSA trajectories, while solving the linear net- When fluctuations are large, it can yield a non-smooth distribution, work consumes very little time. However, compared with an SSA with from which it is sometimes even difficult to determine the number of 2000 trajectories, SSA + 2-HM gives rise to markedly lower HD values, modes. To overcome this, a huge number of stochastic trajectories which are insensitive to M. may be needed to obtain statistically accurate results. Holimap pro- vides an accurate and smooth approximation of the protein distribu- Generalization to networks with post-translational or post- tions; however, it becomes computationally slow when the network is transcriptional regulation complex or the cooperativity is large since in these cases we have to Thus far, we have showcased Holimap in transcriptional networks with solve a large number of moment equations. This raises an important protein-gene interactions. A crucial question is whether Holimap can question: is it possible to develop a highly efficient and accurate also be applied to solve the dynamics of post-translational and post- computation method of stochastic gene network dynamics that yields transcriptional networks with complex protein-protein, protein-RNA, a smooth distribution? and RNA-RNA interactions. To see this, we first focus on two post- To address this question, we propose a hybrid method that com- translational networks (Fig. 6a, b). bines the SSA and Holimap. This method consists of three steps (Fig. 5a). Figure 6a shows a two-node synthetic network with autoregula- First we use the SSA to generate a small number of trajectories (usually a tion and protein sequestration .Hereprotein P produced from gene few thousand trajectories are enough) from which we compute the G regulates its own expression; the two proteins P and P can bind to i 1 2 steady-state or time-dependent sample moments of protein numbers. each other and form an inactive complex C.Wethendeviseathree- We then use the latter to compute the approximate effective parameters parameter Holimap (3-HM) which transforms the nonlinear gene net- of the linear network. Finally, we use FSP to compute the protein dis- work into the linear one shown in Fig. 6c. In principle, Holimap tribution of the linear network with effective parameters to approximate replaces all high-order interactions between genes, proteins, and RNAs that of the nonlinear one. For example, for the autoregulatory circuit by effective first-order reactions. We first replace the protein-gene * * illustrated in Fig. 2a, we substitute the sample moments obtained from binding reactions G + h P " G by G " G with effective parameters i i i i i i the SSA into Eq. (4) to compute the approximate values of σe and σe ,and σe and σe , and then we replace the protein-protein binding reaction u b ui bi then use the marginal protein distribution of the linear network to P + P → C by P !+ with effective parameter d . Again, using 1 2 i i e e construct the 2-HM of the nonlinear network. In other words, for Holi- moment-matching, the three effective parameters σ ,σ ,and d can ui bi i map, the determination of the effective parameters can be done inde- be represented by low-order moments of the nonlinear network pendently of other computational methods while the hybrid method (Supplementary Note 7) and hence can be computed approximately requires the running of the SSA. using the SSA with a small number of trajectories. In this way, the This hybrid SSA + Holimap method is computationally much fas- hybrid SSA + Holimap can be applied to predict the dynamics of the ter than the SSA because the number of trajectories needed to obtain nonlinear network. good approximations to low-order moments is much less than that Note that since Holimap replaces the binding reaction P + P → C 1 2 needed to obtain smooth protein distributions. It is also computa- by P !+ with a new parameter d, intuitively, one may deduce that tionally less expensive than Holimap since there is no need to solve a this approximation is only valid when protein P is very abundant large number of moment equations. To test this hybrid method, we compared to protein P so that noise in protein P number can be 1 2 compare the time-dependent distributions for the repressilator cal- ignored. However, unexpectedly, we find that Holimap makes accurate culated using the SSA, LMA, 2-HM, and SSA + 2-HM (Fig. 4d). In Fig. 5b, predictions not only in this special case but also in scenarios where the Nature Communications | (2024) 15:6557 8 Article https://doi.org/10.1038/s41467-024-50716-z a b SSA + Holimap Method CPU time HD for P HD for P HD for P 1 2 3 FSP >10 years SSA (10 trajectories) 32 min 36 s 00 0 03 10 20 0 40 0.13 0.11 0.12 SSA (2000 trajectories) 26 s time LMA 3 min 05 s 0.29 0.20 0.22 2-HM 7 min 39 s 0.05 0.04 0.04 sample SSA+2-HM (2000 trajectories) 28 s 0.06 0.04 0.05 σ σ b1 u1 moments c 80 SSA random network SSA+2-HM u1 1 20 σ σ b1 u1 2468 10 b1 0.16 0.12 0.08 0.04 predict the protein distribution 2468 10 by solving the linear network number of genes M Fig. 5 | A hybrid method combining the SSA and Holimap. a SSA+Holimap serves infeasible (see Supplementary Note 6 for the estimation procedure for the CPU as a highly efficient strategy to approximately solve the dynamics of complex time required by FSP). c An illustration of a random M-node gene network, where stochastic gene networks. First, the SSA is used to generate a relatively small each pair of nodes has a probability of 2/M to be connected. There are on average number of trajectories of the nonlinear network from which the steady-state or 2M directed edges for the network, each having an equal probability to be positive time-dependent sample moments are estimated. The low-order sample moments or negative regulation. d Comparison of the CPU times and the accuracy (measured are then used to approximately compute the effective parameters of the linear by HDs averaged over ten-time points and overall proteins) against the number of network. Finally, the protein distributions follow directly by solving the dynamics nodes M for SSA+2HM and the SSA with the same number of trajectories. Here the of the linear network using FSP. b Comparison of the CPU times and the accuracy number of trajectories needed for both SSA+2HM and SSA is chosen as N =2000. (measured by HDs at t = 30) for different methods. All methods were used to Both methods were used to simulate the time-dependent distributions for a ran- simulate the time-dependent protein distributions shown in Fig. 4d. The HD dom network. Data are presented as mean values +/− standard deviations for five represents the Hellinger distance between the actual and approximate protein different random networks. See “Methods” for the technical details. In (b)and (d), distributions. Here a proxy for the ground truth distribution is computed using the all simulations were performed on an Intel Core i9-9900K processor (3.60 GHz). SSA with 10 trajectories rather than FSP since the latter is computationally two proteins interact at comparable concentrations and where P replaced by the effective degradation reaction M !+ with new is very scarce compared to P (Supplementary Fig. 5). This again con- parameter d (Supplementary Note 7). firms the high accuracy of Holimap over large regions of para- Figure 6e illustrates another network with microRNA-mRNA inter- meter space. actions, which has been shown to be capable of producing complex As another example of post-translational regulation, we consider a emergent behaviors such as bistability and sustained oscillations .Here gene network with autoregulation and protein phosphorylation the mRNA of interest, expressed from gene G ,has twomicroRNA (Fig. 6b), which has been used to account for circadian oscillations in binding sites. The microRNA, produced from gene G ,can bind to its Drosophila and Neurospora . Here the free protein P can be reversibly mRNA target and form two inactive complexes C (only one binding site phosphorylated into the forms P and P , successively. The latter form P is occupied) and C (both binding sites are occupied). The free mRNA 1 2 2 2 can bind to the gene and regulate its expression. Both phosphorylation and microRNA are degraded with rates d and d ,respectively. Once the 1 2 and dephosphorylation are enzyme-catalyzed and are described using complex C (C ) is formed, the mRNA and microRNA are degraded with 1 2 Michaelis-Menten kinetics. Holimap can also be applied to this network, rates a (b )and a (b ), respectively. The mRNA dynamics for this net- 1 1 2 2 where protein-gene interactions are replaced by the switching reactions work can also be predicted by Holimap, which replaces the complex e e G" G with effective parameters σ and σ ,and the complex post- post-transcriptional regulation by the effective reaction M !+ with u b translational regulation is replaced by the degradation reaction P !+ new parameter d (Fig. 6f and Supplementary Note 7). with effective parameter d (Fig. 6cand SupplementaryNote7). Note that for transcriptional networks, Holimap does not change Furthermore, we apply Holimap to two post-transcriptional net- the degradation rate; however, for post-transcriptional and post- works (Fig. 6d, e). Figure 6d illustrates a gene network with auto- translational networks, both the binding/unbinding rate and degra- regulation and mRNA degradation control .Here the enzyme can dation rate need to be modified. To test the accuracy of the three- convert between an inactive form E and an active form E.The degra- parameter Holimap, we compare the time-dependent distributions for dation of the mRNA of interest can occur spontaneously with rate d theabovefourgenenetworkscomputedusing theSSA with 10 tra- and can be catalyzed by the active form of the enzyme with rate α. jectories, SSA with 2000 trajectories, and hybrid SSA + Holimap with Holimap transforms the nonlinear network into the linear one shown 2000 trajectories (Fig. 6g). Clearly, SSA+Holimap is accurate for all in Fig. 6f by removing all high-order interactions between molecules. In networks. In particular, the distributions predicted by SSA + Holimap * * particular, the enzyme-catalyzed degradation reaction M + E → E is with a small number of trajectories have almost the same accuracy as Nature Communications | (2024) 15:6557 9 protein # HD CPU time Article https://doi.org/10.1038/s41467-024-50716-z protein sequestration enzyme-catalyzed mRNA degradation a d ρ G u1 d1 b1 u1 σ σ b1 M α h1 copies * G1 h copies * inactive complex u E * E u2 P2 P σ σ b2 u2 3-parameter Holimap 3-parameter Holimap c f b2 h copies * 2 G G 2 G u ρ σ σ b u σ σ b u e microRNA-mRNA interaction protein phosphorylation b ρ * 1 G * G G u1 d b1 u1 σ σ b M P ρ β b1 2 b h copies * G α G 1 a2 C C C 1 β 2 1 α b2 1 b2 a2 d1 a1 α b a 1 3 α P1 G2 b3 ρ β u2 C C β 4 3 b σ R 4 c3 b2 u2 P2 d a 2 4 d2 b2 G2 SSA (10 trajectories) g h protein sequestration (A) protein phosphorylation (B) SSA+Holimap (2000 trajectories) t = 1 t = 10 t = 1 t = 10 0.039 0.03 0.048 0.03 18 SSA (2000 trajectories) SSA (10 trajectories) 0.026 SSA+Holimap 0.02 0.032 0.02 0.013 0.01 0.016 0.01 5 s 15 s 14 s 10 s 0 0 0 0 0 40 80 120 0 40 80 120 0 40 80 120 0 40 80 120 0 protein number protein number protein number protein number AB C D enzyme-catalyzed mRNA degradation (C) microRNA-mRNA interaction (D) SSA (2000 trajectories) SSA+Holimap (2000 trajectories) t = 1 t = 10 t = 1 t = 10 0.03 0.033 0.033 0.021 0.12 0.02 0.022 0.022 0.014 0.08 0.01 0.011 0.011 0.007 0.04 0 0 0 0 0 40 80 120 0 40 80 120 0 40 80 120 0 40 80 120 mRNA number mRNA number mRNA number mRNA number AB C D Fig. 6 | Holimap for post-translational and post-transcriptional networks. degradation reactions. g Comparison of the protein distributions for the four a, b Post-translational networks. a Network with autoregulation and protein networks at two time points computed using the SSA with 10 trajectories, SSA with 50 51 sequestration . b Network with autoregulation and protein phosphorylation . 2000 trajectories, and SSA + Holimap (with 2000 trajectories). h Comparison of c Three-parameter Holimap for post-translational networks. d, e Post- the CPU times and the accuracy (measured by HDs averaged over ten-time points) transcriptional networks. d Network with autoregulation and mRNA degradation of the SSA and SSA + Holimap for the four networks. The distributions obtained 52 53 5 control . e Network with microRNA-mRNA interactions .Here α is the binding rate from the SSA with 10 trajectories are used as a proxy of ground truth when com- of microRNA to its mRNA target and β is the unbinding rate. f Three-parameter puting the HDs. See Supplementary Note 1 for the technical details of this figure. Holimap for post-transcriptional networks. All high-order interactions between Source data are provided as a Source Data file. genes, proteins, and RNAs are replaced by effective first-order switching and those predicted by the SSA with a huge number of trajectories (HD < repressilator, and complex randomly connected networks with 0.03) while the CPU time is reduced by over 60 fold (Fig. 6h). numerous interacting genes. Notably, we have demonstrated that a hybrid method that uses both Holimap and the SSA leads to much Discussion more accurate distributions than solely using the SSA, with practically In this paper, we have constructed a computational method, Holimap, no increase in the CPU time and high accuracy that is independent of for the accurate and efficient prediction of the protein/mRNA number the number of interacting genes in the network. distributions of a general gene regulatory network. We have show- We devised three types of Holimaps—the 2-HM, 3-HM, and 4-HM— cased the method by applying it to a variety of networks including all of them decoupling gene-gene interactions in a nonlinear reg- transcriptional networks with protein-gene interactions, post- ulatory network and transforming it into a linear one with multiple translational networks with protein-protein interactions, and post- effective parameters. The 2-HM and 4-HM apply to transcriptional transcriptional networks with protein-RNA or RNA-RNA interactions. networks, while the 3-HM is only applicable to post-translational and For transcriptional networks, we have tested Holimap in simple auto- post-transcriptional networks. The 4-HM is more accurate than the 2- regulatory loops where a gene influences its own expression, two-gene HM, although the improvement in accuracy is marginal. Depending on systems such as the toggle switch, three-gene systems such as the parameters, the 4-HM may lead to numerical instability at short times. Nature Communications | (2024) 15:6557 10 probability probability post-translational regulation post-transcriptional regulation HD CPU time (minutes) Article https://doi.org/10.1038/s41467-024-50716-z Hence the 4-HM is preferred if our aim is to compute the steady-state unbinding rates are large) and hence should be used more cautiously. distribution, and the 2-HM is a preferable choice if our aim is to Ongoing work aims to extend the method to predict both mRNA and compute the time-dependent distribution. The two types of Holimaps protein dynamics, including their joint distribution for pairs of genes. require the solution of the same number of moment equations and Concluding, we have devised a method that overcomes many of hence give rise to similar CPU times. Since the number of equations to the known difficulties encountered when simulating complex sto- be solved increases exponentially with the network size, the standard chastic gene network dynamics. We anticipate that Holimap will be Holimap is only recommended when the scale of the network is not too useful for investigating noisy dynamical phenomena in complex reg- large. For medium and large-scale networks, the hybrid SSA+Holimap ulatory networks where intuitive understanding is challenging to attain approach is more advantageous since it significantly reduces the CPU and simulations using the SSA become computationally prohibitive. time while maintaining high accuracy. Some of the advantages of our method over other common Methods approximations in the literature are as follows: (i) Holimap does not Determining the effective parameter for the LMA sacrifice the discrete nature of molecular reactions since the approx- For the linear network in Fig. 2b, the evolution of moments is governed by imate distributions are solutions of the CME of the effective linear _ ^ network. This is unlike many common methods that achieve a speed g = σ g  σ g , 0 u 1 b 0 increase by making use of a continuum approximation of the CME such _ μ = ρ Bg  dμ + σ μ  σ ^ μ , ð5Þ 1,0 u 0 1,0 u 1,1 b 1,0 54,55 as the Fokker-Planck / Langevin equations or partial integrodiffer- _ ^ μ = ρ Bg  dμ  σ μ + σ μ : 1,1 b 1 1,1 u 1,1 b 1,0 56,57 ential equations ; (ii) Holimap does not assume the protein number distribution to be of a simple type such as the Gaussian, Poisson, Inserting Eq. (2)into Eq. (5) gives a closed set of moment equations, Lognormal or Gamma distributions, as commonly assumed by con- from which the values of g , μ ,and μ can be computed approxi- 0 1,1 1,0 58,59 ventional moment-closure methods —the solution of the linear mately. Finally, using these values, the effective parameter σ ^ can be network that Holimap utilizes is very flexible and spans a very large obtained from Eq. (2). The remaining steps for the LMA are the same as number of possible distribution shapes including those with multiple for the 2-HM. modes and significant skewness. For example, if each gene in a com- plex regulatory network switches between a number of states for Determining the effective parameters for the 4-HM which only one is active, then Holimap approximates the protein dis- For the autoregulatory circuit, it follows from Eq. (1)that tribution for each gene by that of a multi-state gene expression model _ _ with no regulatory interactions (Supplementary Note 5) for which the μ + μ = ρ Bg + ρ Bg  dðμ + μ Þ 1,0 1,1 u 0 b 1 1,0 1,1 analytical steady-state solution is known to be a generalized hyper- + σ g  σ μ , u 1 b 1,0 60,61 ð6Þ geometric function , which includes a large number of special _ _ μ + μ =2ρ Bðμ + Bg Þ+2ρ Bðμ + Bg Þ 2,0 2,1 u 1,0 0 b 1,1 1 functions as special cases. 2dðμ + μ Þ+2σ μ  2σ μ : 2,0 2,1 u 1,1 b 2,0 Our hybrid SSA+Holimap method shares some similarities with neural network-based approaches , which can also be used to predict For the linear network in Fig. 2d,theevolutionofmomentsisgovernedby complex gene network dynamics. The former uses the SSA to generate the sample moments which are then used to compute the values of g = σ g  σ g , 0 u 1 b 0 effective parameters, while the latter uses the SSA to train the surrogate μ = ρ Bg  dμ + σ μ  σ μ , 1,0 u 0 1,0 u 1,1 b 1,0 neural network model. While both methods can accurately capture the μ = ρ Bg  dμ  σ μ + σ μ , ð7Þ 1,1 b 1 1,1 u 1,1 b 1,0 protein/mRNA distribution, our method outperforms the neural μ =2ρ Bðμ + Bg Þ 2dμ + σ μ  σ μ , 2,0 u 1,0 0 2,0 u 2,1 b 2,0 network-based ones in three aspects: (i) while neural network models μ =2ρ Bðμ + Bg Þ 2dμ  σ μ + σ μ : 2,1 b 1,1 1 2,1 u 2,1 b 2,0 perform well in the parameter ranges which are used to train the sur- rogate model, their extrapolation ability is usually weak. Our method is mechanism-based and provides accurate results over wide parameter From these equations, it is easy to show that ranges; (ii) neural network-based methods require a very long time to _ _ train the surrogate model. When the network is complex, the training μ + μ = ρ Bg + ρ Bg  dðμ + μ Þ, 1,0 1,1 u 0 b 1 1,0 1,1 time may take tens of hours to several days and may also require _ _ μ + μ =2ρ Bðμ + Bg Þ+2ρ Bðμ + Bg Þ ð8Þ 2,0 2,1 u 1,0 0 b 1,1 1 multiple rounds of hyperparameter tuning. In contrast, Holimap avoids 2dðμ + μ Þ: 2,0 2,1 the long training time; (iii) neural network models have good predictive ability but their learned approximation does not typically have a clear Matching Eqs. (6)and (8), we find that ρ and ρ should satisfy the b u biophysical interpretation. Holimap transforms the complex nonlinear following system of linear equations: network into a linear one which not only has a clear physical meaning but also allows an approximative analytical solution. In addition, ρ Bg + ρ Bg = ρ Bg + ρ Bg + σ g  σ μ , u 0 b 1 u 0 b 1 u 1 b 1,0 SSA + Holimap can be combined with neural network-based methods ρ Bðμ + Bg Þ + ρ Bðμ + Bg Þ ð9Þ u 1,0 0 b 1,1 1 to increase the speed and accuracy of the latter. Since SSA + Holimap = ρ Bðμ + Bg Þ + ρ Bðμ + Bg Þ + σ μ  σ μ : u 1,0 0 b 1,1 1 u 1,1 b 2,0 can be used to generate distributions comparable in accuracy to those from the SSA with a much larger number of trajectories, it follows that Matching the first and third identities in Eqs. (1)and (7), it is clear that SSA + Holimap can reduce the time to generate an accurate training σ and σ should satisfy the following system of linear equations: b u dataset as input to the neural network. σ g  σ g = σ g  σ μ , The main limitation of the present method is that there are no u 1 b 0 u 1 b 1,0 ð10Þ analytical guarantees that the effective parameters of the linear network σ μ  σ μ = σ μ  σ μ + ðρ  ρ ÞBg , u 1,1 b 1,0 u 1,1 b 2,0 b b 1 are positively definite for all times. Nevertheless, for all examples using the 2-HM and 3-HM in this paper, we have numerically found this to be where ρ has been determined by solving Eq. (9). Compared with thecaseandhenceweare confident that the linear network obtained by Eq. (4), Eq. (10) has an additional term ðρ   ρ ÞBg .Thisisbecause ρ b b 1 the 2-HM or 3-HM procedure is generally physically interpretable. In remains unchanged for the 2-HM but is changed for the 4-HM. contrast, we observed that the 4-HM procedure can occasionally give Finally, inserting Eqs. (9)and (10)into Eq. (7) gives a system of rise to negative parameter values (typically when the binding and closed moment equations and hence the values of all zeroth, first, and Nature Communications | (2024) 15:6557 11 Article https://doi.org/10.1038/s41467-024-50716-z second-order moments can be approximately calculated. Substituting 2. Davidson, E. H. & Erwin, D. H. Gene regulatory networks and the these moments into Eqs. (9)and (10), one can finally solve for the four evolution of animal body plans. Science 311,796–800 (2006). effective parameters ρ ,ρ ,σ ,and σ of the linear network. The 4-HM 3. Olson, E. N. Gene regulatory networks in the evolution and devel- u b u b predicts the protein distribution of the nonlinear network by solving opment of the heart. Science 313,1922–1927 (2006). the CME of the linear one in Fig. 2d with the values of the effective 4. Gerstein, M. B. et al. Architecture of the human regulatory network parameters found above. derived from ENCODE data. Nature 489,91–100 (2012). 5. Spitz, F. & Furlong, E. E. Transcription factors: from enhancer Stochastic model for complex gene networks binding to developmental control. Nat. Rev. Genet. 13, Here we consider the stochastic model of an arbitrary gene regulatory 613–626 (2012). network involving protein synthesis, protein degradation, gene state 6. Pavlopoulos, G. A. et al. Using graph theory to analyze biological switching, and complex gene regulation mechanisms .Specifically, we networks. BioData Min. 4,1–27 (2011). assume that the network involves M distinct genes, each of which can 7. Koutrouli, M., Karatzas, E., Paez-Espino, D. & Pavlopoulos, G. A. A be in two states: an inactive state G and an active state G .The protein guide to conquer the biological network era using graph theory. associated with gene G is denoted by P . The network can be described Front. Bioeng. Biotech. 8,34(2020). j j by the following reactions: 8. Emmert-Streib, F., Dehmer, M. & Haibe-Kains, B. Gene regulatory networks and their applications: understanding biological and 0 1 α α medical problems in terms of networks. Front. Cell Dev. Biol. 2, j j * * G ! G , G ! G , j j j j 38 (2014). 0 1 σ σ ji ji 9. Margolin,A.A.etal.ARACNE:analgorithm for the reconstruction of * * G + h P ! G , G + h P ! G , j ji i j j ji i j gene regulatory networks in a mammalian cellular context. BMC ð11Þ 0 1 ρ ρ j j Bioinform. 7,1–15 (2006). * * G ! G + P , G ! G + P , j j j j j j 10. Stolovitzky, G., Prill, R. J. & Califano, A. Lessons from the DREAM2 challenges: a community effort to assess biological network infer- P !+, i, j = 1,2,:::,M, ence. Ann. N. Y. Acad. Sci. 1158,159–195 (2009). where the reactions in the first row describe spontaneous gene state 11. Emmert-Streib, F.,Glazko, G. V.,Altay,G. & de MatosSimoes, R. switching, the reactions in the second row describe gene regulation, the Statistical inference and reverse engineering of gene regulatory reactions in the third row describe protein synthesis in the two-gene networks from observational expression data. Front. Genet. 3, states, and the last reaction describes protein degradation or dilution. 8(2012). 1 0 Since G is the inactive state and G is the active state, we have ρ > ρ . 12. Chan, T. E., Stumpf, M. P. & Babtie, A. C. Gene regulatory network j j Due to complex gene regulation, each gene G may be regulated by all inference from single-cell data using multivariate information 0 1 genes. If gene G activates gene G,then σ >0 and σ =0 since the measures. Cell Syst. 5,251–267 (2017). i j ji ji binding of protein P induces the switching from G to G ; on the contrary, 13. Badia-i Mompel, P. et al. Gene regulatory network inference in the i j 0 1 if gene G inhibits gene G,then σ =0 and σ > 0 since the binding of era of single-cell multi-omics. Nat. Rev. Genet. 24,1–16 (2023). i j ji ji protein P induces the switching from G to G.Whenperforming 14. De Jong, H. Modeling and simulation of genetic regulatory systems: i j simulations (SSA and SSA + Holimap), the parameters are chosen as aliterature review. J. Comput. Biol. 9,67–103 (2002). 1 0 0 1 0 1 d =1, h =1, ρ =81, ρ =5:4, α = α =0:5, σ =0:01, σ =0 when G acti- 15. Karlebach, G. & Shamir, R. Modelling and analysis of gene reg- i ji j j j j ji ji 0 1 vates G,and σ =0, σ =0:01 when G inhibits G . The presence or ulatory networks. Nat. Rev. Mol. Cell Biol. 9,770–780 (2008). j i j ji ji absence of a gene-gene interaction and its type are determined 16. Edelstein-Keshet, L. Mathematical models in biology (SIAM, randomly. Here we assume that protein-gene interactions are non- 2005). cooperative (h = 1). The spontaneous switching rates between G and G 17. Ingalls,B.P. Mathematical modeling in systems biology: an intro- ij j 0 1 are chosen to be σ = σ =0:5.Sinceeachgeneisonaverageregulatedby duction (MIT press, 2013). j j two genes (one positive regulation and one negative regulation), the 18. Schnoerr, D., Sanguinetti, G. & Grima, R. Approximation and infer- switching rates due to gene regulation are roughly equal to ence methods for stochastic biochemical kinetics—a tutorial 0 1 σ = σ =0:01 multiplied by the number of regulator P,which is ~50. review. J. Phys.A Math.Theor. 50,093001 (2017). ji ji Hence the total switching rates due to spontaneous contributions and 19. Elowitz,M.B., Levine,A. J., Siggia, E. D. &Swain, P. S.Stochastic gene regulation are roughly 0.5 + 0.01 × 50 = 1, i.e., they are comparable gene expression in a single cell. Science 297,1183(2002). with the degradation rate d =1. 20. Ozbudak,E.M., Thattai, M., Kurtser,I., Grossman,A. D.&van Oudenaarden, A. Regulation of noise in the expression of a single Reporting summary gene. Nat. Genet. 31,69–73 (2002). Further information on research design is available in the Nature 21. Munsky, B., Neuert, G. & Van Oudenaarden, A. Using gene expres- Portfolio Reporting Summary linked to this article. sion noise to understand gene regulation. Science 336, 183–187 (2012). Data availability 22. Munsky,B.&Khammash,M.The finite state projection algorithm for MATLAB R2019a was used to analyze the data. Source data are pro- the solution of the chemical master equation. J. Chem. Phys. 124, vided with this paper. 044104 (2006). 23. Gillespie, D. T. A general method for numerically simulating the Code availability stochastic time evolution of coupled chemical reactions. J. Com- The MATLAB codes for Holimap and SSA + Holimap can be found in put. Phys. 22,403–434 (1976). the Github repository . 24. SzékelyJr,T.&Burrage,K.Stochasticsimulationinsystemsbiology. Comput. Struct. Biotechnol. J. 12,14–25 (2014). References 25. Klipp, E.,Liebermeister,W., Wierling, C.&Kowald,A. Systems 1. Shen-Orr, S. S., Milo, R., Mangan, S. & Alon, U. Network motifs in the biology: a textbook (John Wiley & Sons, 2016). transcriptional regulation network of Escherichia coli. Nat. Genet. 26. Munsky, B., Hlavacek, W. S. & Tsimring, L. S. Quantitative biology: 31,64–68 (2002). theory, computational methods, and models (MIT Press, 2018). Nature Communications | (2024) 15:6557 12 Article https://doi.org/10.1038/s41467-024-50716-z 27. Bateman, E. Autoregulation of eukaryotic transcription factors. 53. Nordick, B., Yu, P. Y., Liao, G. & Hong, T. Nonmodular oscillator Prog. Nucleic Acid Res. Mol. Biol. 60,133–168 (1998). and switch based on RNA decay drive regeneration of multi- 28. Becskei, A. & Serrano, L. Engineering stability in gene networks by modal gene expression. Nucleic Acids Res. 50,3693–3708 autoregulation. Nature 405,590–593 (2000). (2022). 29. Crews, S. T. & Pearson, J. C. Transcriptional autoregulation in 54. Tian, T.,Burrage,K., Burrage, P.M. & Carletti,M.Stochasticdelay development. Curr. Biol. 19,R241–R246 (2009). differential equations for genetic regulatory networks. J. Comput. 30. Hermsen, R., Ursem, B. & Ten Wolde, P. R. Combinatorial gene reg- Appl. Math. 205,696–707 (2007). ulation using auto-regulation. PLoS Comput. Biol. 6, e1000813 (2010). 55. Tomioka, R., Kimura, H., Kobayashi, T. J. & Aihara, K. Multivariate 31. Nie, Y., Shu, C. & Sun, X. Cooperative binding of transcription fac- analysis of noise in genetic regulatory networks. J. Theor. Biol. 229, tors in the human genome. Genomics 112,3427–3434 (2020). 501–521 (2004). 32. Jia, C. & Grima, R. Dynamical phase diagram of an auto-regulating 56. Friedman, N., Cai, L. & Xie, X. S. Linking stochastic dynamics to gene in fast switching conditions. J. Chem. Phys. 152, 174110 (2020). population distribution: an analytical framework of gene expres- 33. Cai, L., Friedman, N. & Xie, X. S. Stochastic protein expression in sion. Phys.Rev.Lett. 97,168302 (2006). individual cells at the single molecule level. Nature 440, 57. Bokes, P. & Singh, A. Protein copy number distributions for a self- 358–362 (2006). regulating gene in the presence of decoy binding sites. PLoS one. 34. Singh, A. & Hespanha, J. P. Lognormal moment closures for bio- 10,e0120555(2015). chemical reactions. In Proc. of the 45th IEEE Conference on Decision 58. Schnoerr, D., Sanguinetti, G. & Grima, R. Comparison of different and Control,2063–2068 (IEEE, 2006). moment-closure approximations for stochastic chemical kinetics. J. 35. Cao, Z. & Grima, R. Linear mapping approximation of gene reg- Chem. Phys. 143, 185101 (2015). ulatory networks with stochastic dynamics. Nat. Commun. 9, 59. Lakatos, E., Ale, A., Kirk, P. D. & Stumpf, M. P. Multivariate moment 3305 (2018). closure techniques for stochastic kinetic models. J. Chem. Phys. 36. Grima, R. A study of the accuracy of moment-closure approximations 143, 094107 (2015). for stochastic chemical kinetics. J. Chem. Phys. 136, 154105 (2012). 60. Zhou, T. & Zhang, J. Analytical results for a multistate gene model. 37. Jia, C. & Grima, R. Small protein number effects in stochastic SIAM J. Appl. Math. 72,789–818 (2012). models of autoregulated bursty gene expression. J. Chem. Phys. 61. Jia, C.&Li,Y.Analyticaltime-dependent distributions forgene 152,084115 (2020). expression models with complex promoter switching mechanisms. 38. Ko, M. S. A stochastic model for gene induction. J. Theor. Biol. 153, SIAM J. Appl. Math. 83,1572–1602 (2023). 181–194 (1991). 62. Sukys, A., Öcal, K. & Grima, R. Approximating solutions of the 39. Jiao, F., Sun, Q., Tang, M., Yu, J. & Zheng, B. Distribution modes and chemical master equation using neural networks. Iscience. their corresponding parameter regions in stochastic gene tran- 25, (2022). scription. SIAM J. Appl. Math. 75,2396–2420 (2015). 63. Wang, X., Li, Y. & Jia, C. Poisson representation: a bridge between 40. Jia, C. & Grima, R. Frequency domain analysis of fluctuations of discrete and continuous models of stochastic gene regulatory mRNA and protein copy numbers within a cell lineage: theory and networks. J. R. Soc. Interface 20,20230467(2023). experimental validation. Phys.Rev.X 11, 021032 (2021). 64. Jia, C. & Grima, R. Holimap: an accurate and efficient method for 41. Schwanhäusser, B. et al. Global quantification of mammalian gene solving stochastic gene network dynamics. chenjiacsrc/Holimap expression control. Nature 473, 337 (2011). https://doi.org/10.5281/zenodo.12725485 (2024). 42. Larsson, A. J. et al. Genomic encoding of transcriptional burst kinetics. Nature 565,251–254 (2019). Acknowledgements 43. Suter, D. M. et al. Mammalian genes are transcribed with widely We thank Augustinas Sukys for comments on the manuscript. C.J. different bursting kinetics. Science 332,472–474 (2011). acknowledges support from the National Natural Science Foundation of 44. Gardner, T. S., Cantor, C. R. & Collins, J. J. Construction of a genetic China with grant Nos. U2230402 and 12271020. R.G. acknowledges toggle switch in Escherichia coli. Nature 403, 339 (2000). support from the Leverhulme Trust (RPG-2020-327). 45. Elowitz, M. B. & Leibler, S. A synthetic oscillatory network of tran- scriptional regulators. Nature 403, 335–338 (2000). Author contributions 46. Goodwin, B. C. Oscillatory behavior in enzymatic control processes. R.G. conceived the original idea. C.J. performed the theoretical deriva- Adv. Enzym. Regul. 3,425–437 (1965). tions and numerical simulations. C.J. and R.G. interpreted the theoretical 47. Bragdon, M. D. et al. Cooperative assembly confers regulatory results and jointly wrote the manuscript. specificity and long-term genetic circuit stability. Cell 186, 3810–3825 (2023). Competing interests 48. Lammers, N. C., Kim, Y. J., Zhao, J. & Garcia, H. G. A matter of time: The authors declare no competing interests. using dynamics and theory to uncover mechanisms of transcrip- tional bursting. Curr. Opin. Cell Biol. 67,147–157 (2020). Additional information 49. Goentoro, L., Shoval, O., Kirschner, M. W. & Alon, U. The incoherent Supplementary information The online version contains feedforward loop can provide fold-change detection in gene reg- supplementary material available at ulation. Mol. Cell 36,894–899 (2009). https://doi.org/10.1038/s41467-024-50716-z. 50. Zhu, R., delRio-Salgado, J.M., Garcia-Ojalvo, J. &Elowitz,M.B. Synthetic multistability in mammalian cells. Science 375, Correspondence and requests for materials should be addressed to eabg9765 (2022). Ramon Grima. 51. Gonze, D., Halloy, J. & Goldbeter, A. Robustness of circadian rhythms with respect to molecular noise. Proc.NatlAcad. Sci. USA Reprints and permissions information is available at 99,673–678 (2002). http://www.nature.com/reprints 52. Kuwahara, H. & Schwartz, R. Stochastic steady state gain in a gene expression process with mRNA degradation control. J. R. Soc. Publisher’s note Springer Nature remains neutral with regard to jur- Interface 9,1589–1598 (2012). isdictional claims in published maps and institutional affiliations. Nature Communications | (2024) 15:6557 13 Article https://doi.org/10.1038/s41467-024-50716-z Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/ licenses/by/4.0/. © The Author(s) 2024 Nature Communications | (2024) 15:6557 14 http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Nature Communications Springer Journals

Holimap: an accurate and efficient method for solving stochastic gene network dynamics

Nature Communications , Volume 15 (1) – Aug 2, 2024

Loading next page...
 
/lp/springer-journals/holimap-an-accurate-and-efficient-method-for-solving-stochastic-gene-0070cY3TTu

References (62)

Publisher
Springer Journals
Copyright
Copyright © The Author(s) 2024
eISSN
2041-1723
DOI
10.1038/s41467-024-50716-z
Publisher site
See Article on Publisher Site

Abstract

Article https://doi.org/10.1038/s41467-024-50716-z Holimap: an accurate and efficient method for solving stochastic gene network dynamics 1 2 Received: 2 March 2024 Chen Jia &Ramon Grima Accepted: 13 July 2024 Gene-gene interactions are crucial to the control of sub-cellular processes but our understanding of their stochastic dynamics is hindered by the lack of Check for updates simulation methods that can accurately and efficiently predict how the dis- tributions of gene product numbers vary across parameter space. To over- come these difficulties, here we present Holimap (high-order linear-mapping approximation), an approach that approximates the protein or mRNA number distributions of a complex gene regulatory network by the distributions of a much simpler reaction system. We demonstrate Holimap’s computational advantages over conventional methods by applying it to predict the stochastic time-dependent dynamics of various gene networks, including transcriptional networks ranging from simple autoregulatory loops to complex randomly connected networks, post-transcriptional networks, and post-translational networks. Holimap is ideally suited to study how the intricate network of gene- gene interactions results in precise coordination and control of gene expression. Genetic regulation occurs through intricate interactions between a Boolean networks, the expression of each gene is tracked by a binary 1–4 number of genes .Agene “X” may express a protein which acts as a variable and hence large networks can be examined in a computa- transcription factor (TF), promoting or inhibiting the RNA polymerase tionally efficient way. A more refined description is provided by the use assembly on another target gene “Y” (or on itself) and thus regulating of ODEs, where the time-dependent concentrations of RNAs, proteins, the extent that the latter is expressed . These gene-gene interactions and other molecules are predicted as a function of the rate constants 16,17 can be simply visualized as a directed graph with the genes being the of the reactions in the network . An even more realistic description nodes (vertices) and the directed edges (links) representing the makes use of the CME approach where one predicts not only the mean 6,7 interactions . Networks inferred from gene expression data, com- expression levels of various genes but also the distributions of the monly called gene regulatory networks , have been reconstructed by discrete numbers of mRNAs and/or proteins measured across a 9–13 18 several methods . The complex connectivity of these networks population of cells . This stochasticity has various sources (biological makes intuitive understanding of their dynamics challenging. Conse- intrinsic and extrinsic noise, and technical noise introduced by quently, the construction, mathematical analysis, and simulation of experimental protocols), all of which lead to the large differences in 19–21 models of gene regulatory networks are indispensable tools in a gene expression observed from one cell to another . quantitative biologist’s arsenal. Unfortunately, with an increasing level of sophistication and Several formalisms have been employed to predict gene reg- predictive power, simulations also rapidly become computationally ulatory network dynamics, including Boolean networks, ordinary dif- expensive. Unraveling the stochastic dynamics of gene networks ferential equations (ODEs), and chemical master equations (CMEs)— requires solving a set of coupled CMEs for the probability of the system for reviews covering these approaches and more, please see refs. 14,15. being in each possible state. Since the number of states of a gene These approaches have various advantages and disadvantages. In network is typically infinite, direct solution of these equations is 1 2 Applied and Computational Mathematics Division, Beijing Computational Science Research Center, Beijing, China. School of Biological Sciences, University of Edinburgh, Edinburgh, UK. e-mail: [email protected] Nature Communications | (2024) 15:6557 1 1234567890():,; 1234567890():,; Article https://doi.org/10.1038/s41467-024-50716-z t = 5 CPU time nonlinear network t = 10 SSA # of protein 1 # of protein 2 Holimap t = 5 FSP CPU time linear network t = 10 # of protein 1 # of protein 2 Fig. 1 | Illustration of Holimap and its advantages over the SSA. Holimap row indicate that FSP truncates an infinite state space into a finite one and then decouples gene-gene interactions in a nonlinear regulatory network and trans- solves the finite-dimensional CME). Compared with the conventional Monte Carlo forms it into a linear network with multiple effective parameters, some of which approach (the SSA, whose two main stochastic steps are illustrated by dice), Holi- may be time-dependent. The time evolution of protein number distributions (for all map not only significantly reduces the CPU time, but it also yields an accurate, genes) of the nonlinear network can be approximately predicted by solving the noise-free prediction of the protein number distributions. dynamics of the effective linear network using, e.g., FSP (the lattices in the lower impossible. The finite-state projection algorithm (FSP) truncates the The paper is structured as follows. The Holimap method is infinite state space to a finite one; this renders numerical solutions introduced by means of a simple autoregulatory feedback loop possible becauseweonly need tosolve a finite-dimensional CME. example where we show step-by-step how the approximation is con- However, the immense number of states limits its applications to very structed when second or higher-order interactions are only between a small networks with one or two interacting genes. For larger networks protein and a gene. The method is then extended to show the appli- with multiple interacting genes, Monte Carlo simulations based on the cation to more complex networks with multiple protein-gene inter- stochastic simulation algorithm (SSA) become more practical. Spe- actions and also to networks with gene product interactions such as cifically, given the current state of the system, the SSA generates two those with RNA-RNA, RNA-protein, and protein-protein high-order random numbers to predict the time when the next reaction event reactions. By comparison with the SSA or FSP, we show that inde- occurs and which particular reaction event will occur. The output is a pendent of the type of interactions in a gene network, Holimap pro- number of statistically correct trajectories (molecule number versus vides highly accurate time-dependent distributions of protein or time data), one for each cell, from which the copy number distribu- mRNA numbers over large swathes of parameter space including those tions of all biochemical species can be calculated. However, the issue regions where the system displays oscillatory or multistable dynamics. remains that a large sampling size is typically required to obtain Finally, we show that the computation time of Holimap can be sig- smooth distributions and hence the computational time can still be nificantly reduced while maintaining its accuracy by devising a hybrid very considerable. For an introduction to simulation methods in sto- method which combines both Holimap and the SSA. chastic biology, we refer the reader to refs. 24–26. In this paper, we overcome the difficulties of conventional Results stochastic simulation methods for gene networks by devising an Fundamental principles of Holimap illustrated by an auto- efficient approach—the high-order linear-mapping approximation regulation example 27,28 (Holimap). The basic idea is to map the dynamics of a complex gene Consider a simple autoregulatory feedback loop ,whereby protein network with second or higher-order interactions (a system with expressed from a gene regulates its own transcription (Fig. 2a). Feed- nonlinear propensities and hence a nonlinear network) to the back is mediated by cooperative binding of h protein copies to the 29–32 33 dynamics of a much simpler system where all reactions are first- gene . In agreement with experiments , protein synthesis is order (a linear network). The reaction rates of this system are gen- assumed to occur in bursts of random size k sampled from a geometric erally time-dependent and complex functions of the reaction rates distribution with parameter p, i.e., Pðk = nÞ = p ð1  pÞ.Here σ is the of the original gene network and they are found by conditional binding rate of protein to the gene; σ is the unbinding rate; ρ and ρ u b u moment-matching. The linear network has a much smaller state are the burst frequencies of protein, i.e., the frequencies with which space than the nonlinear network which means that now simulation bursts are produced, when the gene is in the bound and unbound using FSP becomes feasible, leading to smooth distributions of states, respectively; d is the rate of protein degradation and dilution protein numbers in a fraction of the time taken by SSA simulations. (due to cell division). The reaction system describes a positive feed- For an illustration of Holimap see Fig. 1. back loop when ρ > ρ (since in the case, binding of a protein increases b u Nature Communications | (2024) 15:6557 2 probability probability probability probability Article https://doi.org/10.1038/s41467-024-50716-z autoregulatory circuit LMA a b σ σ b b G + h P G* G G* P ρ σ σ u u u G* G + h P G* G σ σ b u ρ ρ σ σ u u b u G G + k P G G + k P ρ ρ b b b G* b G* + k P G* G* + k P h copies G* d d G* 2-parameter Holimap 4-parameter Holimap c d σ σ b G b G G G G* G* ρ ρ u u σu P u P G* G G* G d d ρ ρ σ σ u σb σu u b u G G + k P G G + k P b b b G* ρ G* G* + k P G* + k P d d G* G* e f LMA 2-HM 4-HM HD HD HD HD 1.5 0.18 1 0.69 1 0.69 1 0.69 σ < d < d σ > d > d u u 0.46 0.46 0.46 0.12 -1 -1 -1 positive positive feedback feedback 0.06 0.23 0.23 0.23 negative negative bimodality bimodality bimodality bimodality bimodality bimodality feedback feedback LMA LMA 2-HM 2-HM 4-HM 4-HM 0.5 0 -3 0 -3 0 -3 0 -0 2 2 -0 2 2 -0 2 2 0.5 1 1.5 log ρ log ( σ / d ) log ( σ / d ) log ( σ / d ) u u u 10 u 10 10 10 slow switching intermediately fast switching fast switching h g 0.09 0.15 0.24 0.48 FSP LMA LMA 2-HM 0.06 0.1 0.16 0.32 4-HM 2-HM A B C 4-HM 0.03 0.05 0.08 0.16 0 0 0 13 2 4 02 10 0 30 40 02 10 0 30 40 02 10 0 30 40 cooperativity protein number protein number protein number Fig. 2 | Holimaps for autoregulatory gene networks in steady-state conditions. binding rate σ (normalized by the decay rate d)when ρ ≫ ρ . The red curves b b u a Stochastic model of an autoregulatory feedback loop, which includes bursty enclose the true bimodal region, i.e., the parameter region in which the protein protein synthesis, protein decay, cooperative binding of protein to the gene, and number has a bimodal distribution, as predicted by FSP; the orange curves enclose unbinding of protein. b The LMA maps the nonlinear network to a linear one with the bimodal region predicted by the approximation method. The vertical white effective parameter σ ^ . The high-order reactions G + hP ⇌ G in the former are dashed line demarcates the region of σ ≥ d, where the linear network given by the b u replaced by the first-order reactions G ⇌ G in the latter. c The 2-HM maps the LMA can never exhibit bimodality, from the region of σ < d where it can. nonlinear network to a linear one with effective parameters σ ~ and σ ~ . d The 4-HM g Comparison of the steady-state protein distributions computed using FSP, LMA, u b maps the nonlinear network to a linear one with effective parameters σ  ,σ  ,ρ  ,and 2-HM, and 4-HM in different regimes of gene state switching. h The maximum HD as u b u ρ . e Heat plot of the HD for the LMA as a function of the protein burst frequencies a function of the cooperativity h for the LMA and Holimaps. Here the maximum HD ρ and ρ . Here the HD for the LMA represents the Hellinger distance between the is computed when σ and σ vary over large ranges, while other parameters remain u b u b real steady-state protein distribution computed using FSP applied to the nonlinear fixed. See Supplementary Note 1 for the technical details of this figure. Source data system and the approximate protein distribution computed using the LMA. f Heat are provided as a Source Data file. plots of the HDs for the LMA and Holimaps as functions of the unbinding rate σ and its own expression) and describes a negative feedback loop when the moments: ρ < ρ (binding of a protein decreases its own expression). b u Let p denote the probability of having n proteincopiesinan i,n individual cell when the gene is in state i with i=0, 1 corresponding to g = σ g  σ μ , 0 u 1 b 1,0 the unbound and bound states, respectively. To proceed, let _ μ = ρ Bg  dμ + σ ðμ + g Þ σ ðμ + μ Þ, 1,0 u 0 1,0 u 1,1 1 b 2,0 1,0 g = p be the probability of observing the gene in state i and let i n=0 i,n _ μ = ρ Bg  dμ  σ μ + σ μ , 1,1 b 1 1,1 u 1,1 b 2,0 ð1Þ μ = nðn  1Þðn  m+1Þp be the mth factorial moment of m,i n=0 i,n μ =2ρ Bðμ + Bg Þ 2dμ 2,0 u 1,0 0 2,0 protein numbers when the gene is in this state. For simplicity, we first + σ ðμ +2μ Þ σ ðμ +2μ Þ, u 2,1 1,1 b 3,0 2,0 focus on the case of non-cooperative binding (h = 1). From the CME, it μ =2ρ Bðμ + Bg Þ 2dμ  σ μ + σ μ , 2,1 b 1,1 1 2,1 u 2,1 b 3,0 is straightforward to obtain the following time evolution equations for Nature Communications | (2024) 15:6557 3 log ρ maximum HD 10 b log ( σ d ) probability / 10 b Article https://doi.org/10.1038/s41467-024-50716-z where g =1 − g and B = ⟨k⟩ = p/(1 − p) is the mean protein burst size, solved for the values of all zeroth, first, and second-order moments, 1 0 i.e., the mean number of protein molecules produced in a single burst. i.e., g , μ ,and μ . Finally substituting these into Eq. (4) gives the i 1,i 2,i e e For clarity, we have suppressed the explicit time dependence of all values of the effective parameters σ and σ for the linear network. See b u moments. Note that this system of equations is not closed, i.e., the Supplementary Note 2 for a more detailed explanation of the Holimap equation for a moment of a certain order depends on moments of algorithm. e e higher orders, and hence an exact solution is generally impossible. This In steady-state, the values of σ and σ are constants independent b u difficulty stems from the nonlinear dependence on molecule numbers of time, and hence we can use the steady-state protein distribution of of the bimolecular propensity modeling protein-gene interactions . the linear network to approximate that of the nonlinear one—this can In contrast, a linear gene network (one composed of only first- be computed analytically or using FSP. When the system has not e e order reactions, i.e., the propensity of each reaction has a linear reached steady-state, the values of σ and σ depend on time t.Inthis b u dependence on molecule numbers) is much easier to solve both ana- case, we can use the time evolution of the linear network with time- lytically and numerically than a gene network with nonlinear propen- dependent rates to predict that of the nonlinear one—while analytical sities; for example, the moment equations are closed and thus can be solutions are not generally available in this case, the distributions can solved exactly in this case. A basic idea of the linear-mapping be efficiently computed using FSP. approximation (LMA) developed in ref. 35 is to transform a complex In some regions of parameter space, the 2-HM may still not be nonlinear network into a linear one by replacing all second or higher- accurate enough. To solve this problem, we devise a second type of order reactions between proteins and genes by effective first-order Holimap—the 4-parameter Holimap (4-HM), which transforms the reactions. Specifically, for the network in Fig. 2a, we replace the reac- nonlinear network into the linear one illustrated in Fig. 2d. Here the * * tions G + hP ⇌ G by G ⇌ G . The LMA maps the nonlinear network to the binding rate σ , unbinding rate σ , and the protein burst frequencies ρ b u b linear one shown in Fig. 2b, where the binding rate σ for the former is and ρ for the former are replaced by four effective parameters b u replaced bytheeffectivegeneswitching rate σ for the latter, while the σ ,σ ,ρ ,and ρ for the latter, which can be determined by matching b b u b u other parameters remain unchanged. In the LMA, σ is chosen to be σ the moment equations for the two networks (“Methods”). Note that multiplied by the conditional mean of protein numbers in the unbound while for the 2-HM, we matched only the zeroth and first-order gene state, i.e., moments, for the 4-HM, we matched these and also the second-order moments. The 2-HM and 4-HM will be collectively referred to as Hol- σ μ b 1,0 imaps in what follows. σ = σ hnji=0i = , ð2Þ b b 0 Thus far, we have only considered the case of h =1.For thecaseof cooperative binding (h ≥ 2), the Holimap approximation procedure where g and μ can be calculated by a natural moment-closure can be similarly performed, except that higher-order moment equa- 0 1,0 method (“Methods”) . There are two approximations involved in the tionsneedtobesolved(SupplementaryNote2)—the algorithm for LMA: (i) in reality, the effective parameter σ ^ should be stochastic finding the effective parameters requires the solution of (h + 1)-order rather than deterministic since it is proportional to the instantaneous moment equations. For example, when h = 2, third-order moment protein number in the unbound state; (ii) any moment-closure method equations need to be solved and the effective parameters depend on inevitably leads to some errors . the values of zeroth, first, second, and third-order moments. We Next we propose an efficient method—Holimap, which we will emphasize that the computational cost of Holimap is mainly deter- show to perform much better than the LMA. There are two types of mined by the number of moment equations, L,tobe solved. For Holimaps. The first type is the 2-parameter Holimap (2-HM) which autoregulatory loops, L=1+2h for the LMA and L =3+ 2h for Holimap. transforms the nonlinear gene network into the linear one illustrated in Note that the 2-HM and 4-HM have the same L. Fig. 2c, where both the binding and unbinding rates σ and σ for the The principles used to construct Holimaps for autoregulated b u e e former are replaced by the effective gene switching rates σ and σ for networks can be used to obtain Holimaps for an arbitrarily complex b u e e the latter. The remaining question is how to determine σ and σ so network consisting of a system of interacting genes that regulate each b u that the solution of the linear network accurately approximates that of other via positive or negative feedback. A flow chart of the Holimap the nonlinear one. For the linear network, the evolution of moments is algorithm for a general regulatory network can be found in Supple- governed by mentary Fig. 1. The computational time of Holimap depends on the complexity of the network—an increased number of nodes (genes) or _ e e g = σ g  σ g , edges (regulatory reactions) results in an increased number of 0 u 1 b 0 _ moment equations L to be solved. In Supplementary Note 3, we prove μ = ρ Bg  dμ + σe μ  σe μ , 1,0 u 0 1,0 u 1,1 b 1,0 that for a general network, L scales polynomially with the cooperativity _ e e μ = ρ Bg  dμ  σ μ + σ μ , ð3Þ 1,1 b 1 1,1 u 1,1 b 1,0 h and scales exponentially with respect to the network size M (number _ e e μ =2ρ Bðμ + Bg Þ 2dμ + σ μ  σ μ , 2,0 u 1,0 0 2,0 u 2,1 b 2,0 of genes). _ e e μ =2ρ Bðμ + Bg Þ 2dμ  σ μ + σ μ : 2,1 b 1,1 1 2,1 u 2,1 b 2,0 Applications to one-node (autoregulatory) networks e e The effective rates σ and σ are chosen so that the two systems have We now assess the performance of Holimap based on the Hellinger b u the same zeroth and first-order moment equations (for the latter, we distance (HD) between the steady-state protein distribution obtained mean the first-order moment when the gene is in the bound state). by applying FSP to the nonlinear network and the approximate dis- Matching the first and third identities in Eqs. (1)and (3), we find that σ tribution computed using the LMA and the two types of Holimaps. and σ should satisfy Note that while the direct application of FSP also leads to an approx- imate distribution, in effect, it can be considered exact since the error e e σ g  σ g = σ g  σ μ , is very small provided the state space is truncated to a large enough u 1 b 0 u 1 b 1,0 ð4Þ e e value . Here we choose the HD because it is bounded between 0 and 1; σ μ  σ μ = σ μ  σ μ : u 1,1 b 1,0 u 1,1 b 2,0 a visually accurate approximation is obtained when the HD ≪ 0.1. The remaining question is how to use these equations to obtain for- Figure 2e illustrates the HD for the LMA as a function of ρ and ρ . u b mulae for the effective rates. This can be done as follows: we first solve Clearly, the LMA performs well when ρ and ρ are not very different u b e e for σ and σ using Eq. (4) and then substitute these into Eq. (3)to from each other. However, it results in larger deviations from FSP when b u obtain a set of closed moment equations. These equations can be the protein burst frequency in one gene state is significantly larger Nature Communications | (2024) 15:6557 4 Article https://doi.org/10.1038/s41467-024-50716-z than that in the other. We also find that the LMA is much more accurate which usually occurs when σ and σ are large for negative feedback u b for negative feedback loops (ρ > ρ ) than for positive feedback loops loops (Supplementary Fig. 3). We did not observe numerical instability u b (ρ > ρ ). In the LMA, the effective stochastic parameter σ is for the 2-HM. As a result, the 2-HM might be the preferable choice b u approximated by σ multiplied by the conditional mean of protein when dynamics is of major interest. In steady-state, while the numbers in the unbound state. Hence it must give rise to inaccurate improvement in accuracy of the 4-HM may be marginal, nevertheless approximations when protein noise in the unbound gene state is large. since the two types of Holimaps require the solution of the same This is exactly what happens in the positive feedback case where the number of moment equations, the 4-HM is more advantageous when low synthesis rate in the unbound state results in a small conditional dynamics is not of interest. mean and thus large protein noise. We next examine whether Holimap outperforms the LMA when it Applications to two-node networks with deterministic mono- is applied to positive feedback loops. Figure 2fshows theHD against and bistability σ /d and σ /d for the LMA, 2-HM, and 4-HM when ρ ≫ ρ .Itisclear that We next evaluate the performance of Holimaps when applied to study u b b u the LMA (Fig. 2f, left) performs well when σ and σ are both small, but thesteady-statebehavioroftwo-nodegenenetworks,wheretwogenes u b it becomes highly inaccurate when σ and σ are larger. The protein regulate each other (Fig. 3a, left). Feedback is mediated by cooperative u b distribution can be unimodal or bimodal. The bimodal one is of par- binding of h copies of protein P to gene G and cooperative binding of 1 1 2 ticular interest because it indicates the separation of isogenic cells into h copies of protein P to gene G .Here σ and σ are the binding and 2 2 1 bi ui two different phenotypes. In particular, we find that the LMA results in unbinding rates for gene G,respectively; ρ and ρ are the synthesis i bi ui poor approximations when σ ≥ d and when the distribution is bimo- rates of protein P when the gene is in the bound and unbound states, u i dal. This can be explained as follows. Recall that the LMA transforms a respectively; d is the degradation rate of protein P . For simplicity, we i i nonlinear network into a linear one with unchanged σ ,which is do not take protein bursting into account, although it can be included commonly known as the telegraph model of stochastic gene easily. Depending on whether ρ < ρ or ρ > ρ for i=1,2,there are ui bi ui bi expression .Inref. 39, it has been proved that the telegraph model can four different types of effective system dynamics that constitute either produce a bimodal steady-state distribution only when both gene a positive feedback or a negative feedback loop (Fig. 3b). For example, switching rates are smaller than the protein decay rate (σ ,σ ^ < d). atoggleswitch(twonegativeregulations) corresponds to the case of u b When σ ≥ d, the linear network can never exhibit bimodality, while the ρ > ρ and ρ > ρ . For two-node networks, Holimaps can be per- u u1 b1 u2 b2 bimodality in the nonlinear network can be apparent. formed in a similar way as we have previously shown for autoregulatory We emphasize that σ ≥ d is biologically relevant since in naturally loops, i.e., by replacing all protein-gene binding reactions by effective occurring systems, protein is usually very stable and hence its decay first-order reactions with new parameters and also allowing some of rate is often smaller than the rates of gene state switching. For exam- the other reactions to have different rate constants than those in the ple, in mouse fibroblasts, it has been measured that the median original network (Fig. 3a, center and right). proteinhalf-life is 46 hand themeancellcycle duration is27.5h;hence We first focus on a negative feedback loop without cooperative the mean decay rate of protein is estimated to be binding (Fig. 3c). Since the LMA performs well when the unbinding 1 4 1 d = ðlog 2Þ=46 + ðlog 2Þ=27:5h =6:7×10 min .Inthe same cell rate σ is much smaller than the degradation rate d, hereweconsider ui i type, the mean activation and inactivation rates for thousands of genes the case of σ ≫ d . We use the HD between the actual and approximate ui i −1 −1 42 are estimated to be 0.002 min and 0.24 min . In another study, the steady-state distributions of protein P to test the accuracy of Holimap. −1 mean activation and inactivation rates are estimated to be 0.014 min Figure 3d illustrates the HDs for the LMA and 4-HM as functions of σ b1 −143 and 0.17 min .Hence σ ≥ d is indeed satisfied for most genes. and σ .We find that the network displays bimodality when σ is large u b2 b1 In contrast to the LMA, both the 2-HM and 4-HM markedly reduce and σ is small. This is surprising because in the literature there are b2 the HD values (Fig. 2f, center and right). The LMA has a maximum HD two well-accepted origins for bimodality: (i) a positive feedback loop of 0.7, while for the two types of Holimaps, the maximum HDs are only with ultra-sensitivity (type-I) and (ii) slow switching between gene 0.2 and 0.16. The 4-HM performs marginally better than the 2-HM in states (type-II), independent of the type of feedback loop .Herethe capturing steady-state protein distributions. We also compare the network is a negative feedback loop without cooperative binding, and region of parameter space where bimodality is predicted to exist thus there is neither a positive feedback loop nor ultra-sensitivity. (region enclosed by the orange curves) with the actual region where Moreover, since both σ and σ are large, gene G switches rapidly u1 b1 1 bimodality manifests according to FSP (region enclosed by the red between the two states. Hence the bimodality observed is neither type- curves). We note that while the LMA fails to capture the bimodal region I nor type-II, and in the following, we refer to it as type-III bimodality. of the protein distribution, especially when σ ≥ d, both the 2-HM and From Fig. 3d, it is clear that the LMA performs poorly in this 4-HM capture the vast majority of the bimodal region. In summary, the bimodal region. Again, the LMA cannot capture type-III bimodality deficiencies of the LMA for positive feedback loops are remedied by since it transforms the nonlinear network into a linear one with the use of Holimaps (Fig. 2g). unchanged σ , which is unable to produce a bimodal distribution ui Finally, we examine how the cooperativity in protein binding when σ ≥ d . On the other hand, the 4-HM significantly reduces the ui i affects the accuracy of various approximation methods. Figure 2h HD values and performs exceptionally well in capturing the bimodal shows the maximum HD as a function of h for the LMA, 2-HM, and 4- region (Fig. 3e). Here we do not show the 2-HM because it leads to HM, where the maximum HD is computed when σ and σ vary over similar results as the 4-HM except for a slightly larger HD value. u b large ranges and other parameters remain fixed. Clearly, for the LMA, We next consider a toggle switch with cooperative binding, where the maximum HD increases approximately linearly with respect to h two genes repress each other (Fig. 3f). Note that this is a positive when h ≤ 4; for Holimaps, the maximum HD is insensitive to h.SinceTF feedback loop with ultra-sensitivity and hence it can produce deter- cooperativity is the norm rather than the exception ,our resultssug- ministic bistability (type-I bimodality), which means that the corre- gest Holimap’s accuracy remains high over the physiologically mean- sponding system of deterministic rate equations (Supplementary ingful range of parameter values. Note 4) is capable of having two stable fixed points and one unstable The results that we have presented assume steady-state condi- point. Again, we only focus on the situation of σ ≫ d.Figure 3g ui i tions. However, the 2-HM can also accurately reproduce the time illustrates the HDs for the LMA and 4-HM against σ and σ .The b1 b2 evolution of the protein distribution for nonlinear gene networks yellow curve encloses the region of deterministic bistability, which is (Supplementary Fig. 2). The 4-HM is also accurate; however depending markedly smaller than the true bimodal region enclosed by the red on parameter values, it may lead to numerical instability at short times, curve. According to simulations, bimodality can be observed when Nature Communications | (2024) 15:6557 5 Article https://doi.org/10.1038/s41467-024-50716-z 2-node network 2-parameter Holimap 4-parameter Holimap G G G 1 1 1 ρ ρ ρ u1 u1 u1 d d d 1 1 1 b1 u1 σ σ σ σ b1 u1 b1 u1 P P P 1 1 1 ρ ρ b1 b1 ρ b1 h copies 2 * * * G G G 1 1 1 G G G 2 2 2 ρ ρ u2 u2 u2 P P P 2 2 2 d d d 2 2 2 σ σ b2 σ σ σ σ u2 b2 u2 b2 u2 ρ ρ b2 b2 ρ b2 h copies * * * 1 G G G 2 2 2 positive feedback loop negative feedback loop negative feedback loop positive feedback loop (toggle switch) G G 2 G G G G G G 1 1 2 1 2 1 2 ρ ρ ρ ρ ρ ρ ρ ρ b1 > u1 b1 < u1 b1 > u1 b1 < u1 ρ ρ ρ ρ ρ ρ ρ ρ b2 > u2 b2 > u2 b2 < u2 b2 < u2 LMA 4-HM HD HD 2 2 0.09 c d e 0.4 0.4 bimodality bimodality FSP prediction of 4-HM prediction of 4-HM negative feedback loop LMA 0.3 0.3 0.06 2-HM G G 4-HM 1 2 -1 -1 0.2 0.2 0.03 0.1 0.1 h = h = 1 1 2 0 0 -4 -4 0 02 10 0 30 40 -1 2 4 -1 2 4 log σ log σ protein number 10 b1 10 b1 LMA 4-HM HD HD f g 1 1 h 0.093 0.69 0.69 toggle switch 0.062 0.46 0.46 G 1 G 2 -2 -2 0.031 0.23 0.23 bimodality bimodality h = h = 2 1 2 prediction of 4-HM prediction of 4-HM deterministic bistability deterministic bistability -5 0 -5 0 0 02 10 0 30 40 -0 3 3 -0 3 3 log σ log σ protein number 10 b1 10 b1 Fig. 3 | Holimaps for two-node gene networks in steady-state conditions. predicted by Holimap. e Comparison of the steady-state distributions of protein P a Illustration of the 2-HM and 4-HM for a two-node gene network, where two genes computed using FSP, LMA, 2-HM, and 4-HM. f A toggle switch with cooperative G and G regulate each other. Feedback is mediated by cooperative binding of h binding (h = h =2). g Same as (d) but for the toggle switch. The yellow curve 1 2 1 1 2 copies of protein P to gene G and cooperative binding of h copies of protein P to encloses the parameter region of deterministic bistability, i.e., the region in which 1 2 2 2 gene G . b The two-node network can describe four different feedback loops, the deterministic rate equations have two stable fixed points and one unstable fixed according to whether ρ > ρ or ρ < ρ for i =1, 2. c A negative feedback loop with point. h Same as (e) but for the toggle switch. Here the parameters are chosen so ui bi ui bi non-cooperative binding (h = h =1). d Heat plots of the HDs for the LMA and 4-HM that the system displays deterministic bistability. While we only focus on the dis- 1 2 as functions of the binding rates σ and σ . Here the HD represents the Hellinger tribution of protein P in (d), (e), (g), and (h), the distribution of the second protein b1 b2 1 distance between the real and approximate steady-state distribution of the number P is also accurately predicted by Holimap (Supplementary Fig. 3). See Supple- of molecules of protein P . The red curve encloses the true bimodal parameter mentary Note 1 for the technical details of this figure. Source data are provided as a region computed using FSP, and the orange curve encloses the bimodal region Source Data file. both σ and σ are large. The LMA fails to reproduce the bimodal Applications to three-node networks with deterministic b1 b2 distribution since σ ≥ d , as expected. The 4-HM not only successfully oscillations ui i captures the bimodal region (enclosed by the orange curve), but also We now focus on three-node networks, where three genes regulate yields small HD values. The maximum HD for the LMA is as large as 0.7, each other in a cyclic manner (Fig. 4a, left). Feedback is mediated by while it is only 0.13 for the 4-HM. In particular, in the deterministically cooperative binding of h copies of protein P to gene G for i=1, 2, 3, i i i+1 bistable region, both the 2-HM and 4-HM accurately predict the pro- where G = G . Again, depending on whether ρ < ρ or ρ > ρ for 4 1 ui bi ui bi tein distribution while the LMA completely fails (Fig. 3h). i = 1, 2, 3, the network can be a repressilator (three negative Nature Communications | (2024) 15:6557 6 log σ log σ 10 b2 10 b2 probability probability Article https://doi.org/10.1038/s41467-024-50716-z 3-node network 2-parameter Holimap G G 1 1 ρ ρ u1 u1 d d 1 1 b1 u1 σ σ b1 u1 P P 1 1 ρ ρ b1 b1 h copies * * 3 G G 1 1 4-parameter Holimap G G G 2 3 1 ρ ρ ρ u1 u2 u3 P P 2 3 d2 d3 d1 σ σ σ σ b2 u2 b3 u3 σ σ b1 u1 ρ ρ b2 b3 ρ b1 h copies * h copies * * 1 G 2 G G 2 3 1 75 24 b c repressilator 50 16 ρ ρ b1 u1 ρ ρ SSA b2 u2 LMA ρ ρ b3 u3 2-HM h = h = h = 3 1 2 3 0 0 G G 2 3 02 10 0 30 40 02 10 0 30 40 time time 0.22 0.12 0.07 0.04 0.04 t = 0.4 t = 0.8 t = 1.2 t = 1.6 t = 2 SSA LMA 2-HM SSA+2-HM 0 0 0 0 0 060 120 060 120 060 120 060 120 060 120 0.04 0.04 0.04 0.05 0.04 t = 3 t = 5 t = 7 t = 9 t = 11 0 0 0 0 0 060 120 060 120 060 120 060 120 060 120 0.04 0.04 0.04 0.04 0.04 t = 14 t = 18 t = 22 t = 26 t = 30 0 0 0 0 0 060 120 060 120 060 120 060 120 060 120 protein number Fig. 4 | Holimaps for three-node gene networks. a Illustration of the 2-HM and oscillations. c Time evolution of the mean and Fano factor of fluctuations in the 4-HM for a three-node gene network, where three genes regulate each other along number of molecules of protein P computed using the SSA (with 10 trajectories), the counterclockwise direction. Feedback is mediated by cooperative binding of h LMA, and 2-HM. d Comparison of the time-dependent distributions of protein P at i 1 copies of protein P to gene G ,where G is understood to be G . b Arepressilator 15time points computed using the SSA (with 10 trajectories), LMA, 2-HM, and i i+1 4 1 with cooperative binding. Here the cooperativities are chosen as h =3 for hybrid SSA+2-HM (with 2000 trajectories). See Supplementary Note 1 for the i = 1, 2, 3 such that the deterministic system of rate equations produces sustained technical details of this figure. Source data are provided as a Source Data file. regulations) , a Goodwin model (one negative regulation and two (Supplementary Note 4) to produce sustained oscillations. According 46 47 positive regulations) , or a positive feedback loop . to simulations, deterministic oscillations are not observed when h ≤ 2. As for previous examples, Holimap transforms the nonlinear Figure 4c illustrates the oscillatory time evolution of the mean and network into a linear one (Fig. 4a, right). We now focus on the Fano factor (the variance divided by the mean) of fluctuations in the repressilator illustrated in Fig. 4b, where the cooperativities are chosen number of protein P computed using the SSA, LMA, and 2-HM. Note as h = h = h = 3. Here high cooperativities are chosen since we that here we do not consider the 4-HM because, as previously men- 1 2 3 require the corresponding deterministic system of rate equations tioned, it may cause numerical instability when computing time- Nature Communications | (2024) 15:6557 7 probabiity mean Fano factor Article https://doi.org/10.1038/s41467-024-50716-z dependent distributions. The LMA fails to reproduce damped oscilla- we also compare the CPU times and accuracy of these methods. The tions in the time evolution of the mean and Fano factor, while Holimap number of SSA trajectories N needed for SSA + 2-HM is chosen such excellently captures these oscillations. Note also that the LMA sig- that the distributions obtained from N trajectories and those obtained nificantly underestimates the variance of fluctuations and hence leads from 3N trajectories have an HD (averaged over all time points) less to a much smaller Fano factor in the limit of long times. than 0.02, i.e., increasing the sample size will not substantially improve Figure 4d compares the time-dependent protein distributions the approximation accuracy—a sample size of N = 2000 is sufficient to computed using the SSA, LMA, and 2-HM. Interestingly, both the LMA satisfy this criterion. Notably with almost the same CPU time, SSA + 2- and 2-HM accurately reproduce the protein distribution at small times HM yields distributions that are significantly more accurate than those (t ≤ 3). However, the LMA fails to reproduce bimodality at intermediate obtained from only the SSA with the same number of trajectories—the and large times since it underestimates noise. In contrast, Holimap HD for the former is only 0.04–0.06, while for the latter it is 0.11–0.13; performs remarkably well in predicting the complete time evolution of here the distributions obtained from the SSA with 10 trajectories are the protein distribution. used as a proxy of ground truth when computing the HDs. We also Thus far, we have considered regulatory networks where each note that SSA + 2-HM yields distributions that are practically as accu- gene is regulated by one TF; however, many genes are regulated by a rate as the 2-HM but are over 16 times faster (28 s vs 7 min 39 s). multitude of TFs which are often shared between multiple genes .In To further test the accuracy of SSA + Holimap, we apply it to a Supplementary Note 5, we investigate gene networks with two TF random M-node gene network (Fig. 5c), where any pair of nodes has a binding sites. We show that Holimap performs excellently in capturing probability of 2/M to be connected. This guarantees that each gene the protein distributions as well as the bimodal region, independent of on average regulates two genes. When connected, each direct edge has the type of network topology and the type of TF binding (independent, an equal probability to be positive or negative regulation; auto- positive cooperative, and negative cooperative binding ). regulation is also allowed. The details of the stochastic model are describedinMethods.Wethenapply the2-HMtotransform the A hybrid combination of SSA and Holimap provides highly effi- nonlinear random network into a linear one and then use 2000 SSA cient computation of complex gene network dynamics trajectories to estimate the effective parameters of the linear network. The FSP and SSA are two widely used methods for solving the Figure 5d illustrates the CPU times and HDs against the number of dynamics of stochastic chemical reaction systems. While FSP yields an nodes M forSSA+2-HMand theSSA with thesamenumberoftra- accurate distribution, from a practical point of view, it is only applic- jectories. Again an SSA with 10 trajectories is used to generate a proxy able to small networks where protein numbers are not very large; for of the ground truth distribution when computing the HDs. Clearly, the large networks, the size of the state space leads to an enormous two methods yield almost the same CPU times that approximately computational cost . The SSA can also be computationally expensive, linearly scale with M. This is because for SSA + 2-HM, almost all time is particularly when the network has multiple reaction time scales . spent on simulating the SSA trajectories, while solving the linear net- When fluctuations are large, it can yield a non-smooth distribution, work consumes very little time. However, compared with an SSA with from which it is sometimes even difficult to determine the number of 2000 trajectories, SSA + 2-HM gives rise to markedly lower HD values, modes. To overcome this, a huge number of stochastic trajectories which are insensitive to M. may be needed to obtain statistically accurate results. Holimap pro- vides an accurate and smooth approximation of the protein distribu- Generalization to networks with post-translational or post- tions; however, it becomes computationally slow when the network is transcriptional regulation complex or the cooperativity is large since in these cases we have to Thus far, we have showcased Holimap in transcriptional networks with solve a large number of moment equations. This raises an important protein-gene interactions. A crucial question is whether Holimap can question: is it possible to develop a highly efficient and accurate also be applied to solve the dynamics of post-translational and post- computation method of stochastic gene network dynamics that yields transcriptional networks with complex protein-protein, protein-RNA, a smooth distribution? and RNA-RNA interactions. To see this, we first focus on two post- To address this question, we propose a hybrid method that com- translational networks (Fig. 6a, b). bines the SSA and Holimap. This method consists of three steps (Fig. 5a). Figure 6a shows a two-node synthetic network with autoregula- First we use the SSA to generate a small number of trajectories (usually a tion and protein sequestration .Hereprotein P produced from gene few thousand trajectories are enough) from which we compute the G regulates its own expression; the two proteins P and P can bind to i 1 2 steady-state or time-dependent sample moments of protein numbers. each other and form an inactive complex C.Wethendeviseathree- We then use the latter to compute the approximate effective parameters parameter Holimap (3-HM) which transforms the nonlinear gene net- of the linear network. Finally, we use FSP to compute the protein dis- work into the linear one shown in Fig. 6c. In principle, Holimap tribution of the linear network with effective parameters to approximate replaces all high-order interactions between genes, proteins, and RNAs that of the nonlinear one. For example, for the autoregulatory circuit by effective first-order reactions. We first replace the protein-gene * * illustrated in Fig. 2a, we substitute the sample moments obtained from binding reactions G + h P " G by G " G with effective parameters i i i i i i the SSA into Eq. (4) to compute the approximate values of σe and σe ,and σe and σe , and then we replace the protein-protein binding reaction u b ui bi then use the marginal protein distribution of the linear network to P + P → C by P !+ with effective parameter d . Again, using 1 2 i i e e construct the 2-HM of the nonlinear network. In other words, for Holi- moment-matching, the three effective parameters σ ,σ ,and d can ui bi i map, the determination of the effective parameters can be done inde- be represented by low-order moments of the nonlinear network pendently of other computational methods while the hybrid method (Supplementary Note 7) and hence can be computed approximately requires the running of the SSA. using the SSA with a small number of trajectories. In this way, the This hybrid SSA + Holimap method is computationally much fas- hybrid SSA + Holimap can be applied to predict the dynamics of the ter than the SSA because the number of trajectories needed to obtain nonlinear network. good approximations to low-order moments is much less than that Note that since Holimap replaces the binding reaction P + P → C 1 2 needed to obtain smooth protein distributions. It is also computa- by P !+ with a new parameter d, intuitively, one may deduce that tionally less expensive than Holimap since there is no need to solve a this approximation is only valid when protein P is very abundant large number of moment equations. To test this hybrid method, we compared to protein P so that noise in protein P number can be 1 2 compare the time-dependent distributions for the repressilator cal- ignored. However, unexpectedly, we find that Holimap makes accurate culated using the SSA, LMA, 2-HM, and SSA + 2-HM (Fig. 4d). In Fig. 5b, predictions not only in this special case but also in scenarios where the Nature Communications | (2024) 15:6557 8 Article https://doi.org/10.1038/s41467-024-50716-z a b SSA + Holimap Method CPU time HD for P HD for P HD for P 1 2 3 FSP >10 years SSA (10 trajectories) 32 min 36 s 00 0 03 10 20 0 40 0.13 0.11 0.12 SSA (2000 trajectories) 26 s time LMA 3 min 05 s 0.29 0.20 0.22 2-HM 7 min 39 s 0.05 0.04 0.04 sample SSA+2-HM (2000 trajectories) 28 s 0.06 0.04 0.05 σ σ b1 u1 moments c 80 SSA random network SSA+2-HM u1 1 20 σ σ b1 u1 2468 10 b1 0.16 0.12 0.08 0.04 predict the protein distribution 2468 10 by solving the linear network number of genes M Fig. 5 | A hybrid method combining the SSA and Holimap. a SSA+Holimap serves infeasible (see Supplementary Note 6 for the estimation procedure for the CPU as a highly efficient strategy to approximately solve the dynamics of complex time required by FSP). c An illustration of a random M-node gene network, where stochastic gene networks. First, the SSA is used to generate a relatively small each pair of nodes has a probability of 2/M to be connected. There are on average number of trajectories of the nonlinear network from which the steady-state or 2M directed edges for the network, each having an equal probability to be positive time-dependent sample moments are estimated. The low-order sample moments or negative regulation. d Comparison of the CPU times and the accuracy (measured are then used to approximately compute the effective parameters of the linear by HDs averaged over ten-time points and overall proteins) against the number of network. Finally, the protein distributions follow directly by solving the dynamics nodes M for SSA+2HM and the SSA with the same number of trajectories. Here the of the linear network using FSP. b Comparison of the CPU times and the accuracy number of trajectories needed for both SSA+2HM and SSA is chosen as N =2000. (measured by HDs at t = 30) for different methods. All methods were used to Both methods were used to simulate the time-dependent distributions for a ran- simulate the time-dependent protein distributions shown in Fig. 4d. The HD dom network. Data are presented as mean values +/− standard deviations for five represents the Hellinger distance between the actual and approximate protein different random networks. See “Methods” for the technical details. In (b)and (d), distributions. Here a proxy for the ground truth distribution is computed using the all simulations were performed on an Intel Core i9-9900K processor (3.60 GHz). SSA with 10 trajectories rather than FSP since the latter is computationally two proteins interact at comparable concentrations and where P replaced by the effective degradation reaction M !+ with new is very scarce compared to P (Supplementary Fig. 5). This again con- parameter d (Supplementary Note 7). firms the high accuracy of Holimap over large regions of para- Figure 6e illustrates another network with microRNA-mRNA inter- meter space. actions, which has been shown to be capable of producing complex As another example of post-translational regulation, we consider a emergent behaviors such as bistability and sustained oscillations .Here gene network with autoregulation and protein phosphorylation the mRNA of interest, expressed from gene G ,has twomicroRNA (Fig. 6b), which has been used to account for circadian oscillations in binding sites. The microRNA, produced from gene G ,can bind to its Drosophila and Neurospora . Here the free protein P can be reversibly mRNA target and form two inactive complexes C (only one binding site phosphorylated into the forms P and P , successively. The latter form P is occupied) and C (both binding sites are occupied). The free mRNA 1 2 2 2 can bind to the gene and regulate its expression. Both phosphorylation and microRNA are degraded with rates d and d ,respectively. Once the 1 2 and dephosphorylation are enzyme-catalyzed and are described using complex C (C ) is formed, the mRNA and microRNA are degraded with 1 2 Michaelis-Menten kinetics. Holimap can also be applied to this network, rates a (b )and a (b ), respectively. The mRNA dynamics for this net- 1 1 2 2 where protein-gene interactions are replaced by the switching reactions work can also be predicted by Holimap, which replaces the complex e e G" G with effective parameters σ and σ ,and the complex post- post-transcriptional regulation by the effective reaction M !+ with u b translational regulation is replaced by the degradation reaction P !+ new parameter d (Fig. 6f and Supplementary Note 7). with effective parameter d (Fig. 6cand SupplementaryNote7). Note that for transcriptional networks, Holimap does not change Furthermore, we apply Holimap to two post-transcriptional net- the degradation rate; however, for post-transcriptional and post- works (Fig. 6d, e). Figure 6d illustrates a gene network with auto- translational networks, both the binding/unbinding rate and degra- regulation and mRNA degradation control .Here the enzyme can dation rate need to be modified. To test the accuracy of the three- convert between an inactive form E and an active form E.The degra- parameter Holimap, we compare the time-dependent distributions for dation of the mRNA of interest can occur spontaneously with rate d theabovefourgenenetworkscomputedusing theSSA with 10 tra- and can be catalyzed by the active form of the enzyme with rate α. jectories, SSA with 2000 trajectories, and hybrid SSA + Holimap with Holimap transforms the nonlinear network into the linear one shown 2000 trajectories (Fig. 6g). Clearly, SSA+Holimap is accurate for all in Fig. 6f by removing all high-order interactions between molecules. In networks. In particular, the distributions predicted by SSA + Holimap * * particular, the enzyme-catalyzed degradation reaction M + E → E is with a small number of trajectories have almost the same accuracy as Nature Communications | (2024) 15:6557 9 protein # HD CPU time Article https://doi.org/10.1038/s41467-024-50716-z protein sequestration enzyme-catalyzed mRNA degradation a d ρ G u1 d1 b1 u1 σ σ b1 M α h1 copies * G1 h copies * inactive complex u E * E u2 P2 P σ σ b2 u2 3-parameter Holimap 3-parameter Holimap c f b2 h copies * 2 G G 2 G u ρ σ σ b u σ σ b u e microRNA-mRNA interaction protein phosphorylation b ρ * 1 G * G G u1 d b1 u1 σ σ b M P ρ β b1 2 b h copies * G α G 1 a2 C C C 1 β 2 1 α b2 1 b2 a2 d1 a1 α b a 1 3 α P1 G2 b3 ρ β u2 C C β 4 3 b σ R 4 c3 b2 u2 P2 d a 2 4 d2 b2 G2 SSA (10 trajectories) g h protein sequestration (A) protein phosphorylation (B) SSA+Holimap (2000 trajectories) t = 1 t = 10 t = 1 t = 10 0.039 0.03 0.048 0.03 18 SSA (2000 trajectories) SSA (10 trajectories) 0.026 SSA+Holimap 0.02 0.032 0.02 0.013 0.01 0.016 0.01 5 s 15 s 14 s 10 s 0 0 0 0 0 40 80 120 0 40 80 120 0 40 80 120 0 40 80 120 0 protein number protein number protein number protein number AB C D enzyme-catalyzed mRNA degradation (C) microRNA-mRNA interaction (D) SSA (2000 trajectories) SSA+Holimap (2000 trajectories) t = 1 t = 10 t = 1 t = 10 0.03 0.033 0.033 0.021 0.12 0.02 0.022 0.022 0.014 0.08 0.01 0.011 0.011 0.007 0.04 0 0 0 0 0 40 80 120 0 40 80 120 0 40 80 120 0 40 80 120 mRNA number mRNA number mRNA number mRNA number AB C D Fig. 6 | Holimap for post-translational and post-transcriptional networks. degradation reactions. g Comparison of the protein distributions for the four a, b Post-translational networks. a Network with autoregulation and protein networks at two time points computed using the SSA with 10 trajectories, SSA with 50 51 sequestration . b Network with autoregulation and protein phosphorylation . 2000 trajectories, and SSA + Holimap (with 2000 trajectories). h Comparison of c Three-parameter Holimap for post-translational networks. d, e Post- the CPU times and the accuracy (measured by HDs averaged over ten-time points) transcriptional networks. d Network with autoregulation and mRNA degradation of the SSA and SSA + Holimap for the four networks. The distributions obtained 52 53 5 control . e Network with microRNA-mRNA interactions .Here α is the binding rate from the SSA with 10 trajectories are used as a proxy of ground truth when com- of microRNA to its mRNA target and β is the unbinding rate. f Three-parameter puting the HDs. See Supplementary Note 1 for the technical details of this figure. Holimap for post-transcriptional networks. All high-order interactions between Source data are provided as a Source Data file. genes, proteins, and RNAs are replaced by effective first-order switching and those predicted by the SSA with a huge number of trajectories (HD < repressilator, and complex randomly connected networks with 0.03) while the CPU time is reduced by over 60 fold (Fig. 6h). numerous interacting genes. Notably, we have demonstrated that a hybrid method that uses both Holimap and the SSA leads to much Discussion more accurate distributions than solely using the SSA, with practically In this paper, we have constructed a computational method, Holimap, no increase in the CPU time and high accuracy that is independent of for the accurate and efficient prediction of the protein/mRNA number the number of interacting genes in the network. distributions of a general gene regulatory network. We have show- We devised three types of Holimaps—the 2-HM, 3-HM, and 4-HM— cased the method by applying it to a variety of networks including all of them decoupling gene-gene interactions in a nonlinear reg- transcriptional networks with protein-gene interactions, post- ulatory network and transforming it into a linear one with multiple translational networks with protein-protein interactions, and post- effective parameters. The 2-HM and 4-HM apply to transcriptional transcriptional networks with protein-RNA or RNA-RNA interactions. networks, while the 3-HM is only applicable to post-translational and For transcriptional networks, we have tested Holimap in simple auto- post-transcriptional networks. The 4-HM is more accurate than the 2- regulatory loops where a gene influences its own expression, two-gene HM, although the improvement in accuracy is marginal. Depending on systems such as the toggle switch, three-gene systems such as the parameters, the 4-HM may lead to numerical instability at short times. Nature Communications | (2024) 15:6557 10 probability probability post-translational regulation post-transcriptional regulation HD CPU time (minutes) Article https://doi.org/10.1038/s41467-024-50716-z Hence the 4-HM is preferred if our aim is to compute the steady-state unbinding rates are large) and hence should be used more cautiously. distribution, and the 2-HM is a preferable choice if our aim is to Ongoing work aims to extend the method to predict both mRNA and compute the time-dependent distribution. The two types of Holimaps protein dynamics, including their joint distribution for pairs of genes. require the solution of the same number of moment equations and Concluding, we have devised a method that overcomes many of hence give rise to similar CPU times. Since the number of equations to the known difficulties encountered when simulating complex sto- be solved increases exponentially with the network size, the standard chastic gene network dynamics. We anticipate that Holimap will be Holimap is only recommended when the scale of the network is not too useful for investigating noisy dynamical phenomena in complex reg- large. For medium and large-scale networks, the hybrid SSA+Holimap ulatory networks where intuitive understanding is challenging to attain approach is more advantageous since it significantly reduces the CPU and simulations using the SSA become computationally prohibitive. time while maintaining high accuracy. Some of the advantages of our method over other common Methods approximations in the literature are as follows: (i) Holimap does not Determining the effective parameter for the LMA sacrifice the discrete nature of molecular reactions since the approx- For the linear network in Fig. 2b, the evolution of moments is governed by imate distributions are solutions of the CME of the effective linear _ ^ network. This is unlike many common methods that achieve a speed g = σ g  σ g , 0 u 1 b 0 increase by making use of a continuum approximation of the CME such _ μ = ρ Bg  dμ + σ μ  σ ^ μ , ð5Þ 1,0 u 0 1,0 u 1,1 b 1,0 54,55 as the Fokker-Planck / Langevin equations or partial integrodiffer- _ ^ μ = ρ Bg  dμ  σ μ + σ μ : 1,1 b 1 1,1 u 1,1 b 1,0 56,57 ential equations ; (ii) Holimap does not assume the protein number distribution to be of a simple type such as the Gaussian, Poisson, Inserting Eq. (2)into Eq. (5) gives a closed set of moment equations, Lognormal or Gamma distributions, as commonly assumed by con- from which the values of g , μ ,and μ can be computed approxi- 0 1,1 1,0 58,59 ventional moment-closure methods —the solution of the linear mately. Finally, using these values, the effective parameter σ ^ can be network that Holimap utilizes is very flexible and spans a very large obtained from Eq. (2). The remaining steps for the LMA are the same as number of possible distribution shapes including those with multiple for the 2-HM. modes and significant skewness. For example, if each gene in a com- plex regulatory network switches between a number of states for Determining the effective parameters for the 4-HM which only one is active, then Holimap approximates the protein dis- For the autoregulatory circuit, it follows from Eq. (1)that tribution for each gene by that of a multi-state gene expression model _ _ with no regulatory interactions (Supplementary Note 5) for which the μ + μ = ρ Bg + ρ Bg  dðμ + μ Þ 1,0 1,1 u 0 b 1 1,0 1,1 analytical steady-state solution is known to be a generalized hyper- + σ g  σ μ , u 1 b 1,0 60,61 ð6Þ geometric function , which includes a large number of special _ _ μ + μ =2ρ Bðμ + Bg Þ+2ρ Bðμ + Bg Þ 2,0 2,1 u 1,0 0 b 1,1 1 functions as special cases. 2dðμ + μ Þ+2σ μ  2σ μ : 2,0 2,1 u 1,1 b 2,0 Our hybrid SSA+Holimap method shares some similarities with neural network-based approaches , which can also be used to predict For the linear network in Fig. 2d,theevolutionofmomentsisgovernedby complex gene network dynamics. The former uses the SSA to generate the sample moments which are then used to compute the values of g = σ g  σ g , 0 u 1 b 0 effective parameters, while the latter uses the SSA to train the surrogate μ = ρ Bg  dμ + σ μ  σ μ , 1,0 u 0 1,0 u 1,1 b 1,0 neural network model. While both methods can accurately capture the μ = ρ Bg  dμ  σ μ + σ μ , ð7Þ 1,1 b 1 1,1 u 1,1 b 1,0 protein/mRNA distribution, our method outperforms the neural μ =2ρ Bðμ + Bg Þ 2dμ + σ μ  σ μ , 2,0 u 1,0 0 2,0 u 2,1 b 2,0 network-based ones in three aspects: (i) while neural network models μ =2ρ Bðμ + Bg Þ 2dμ  σ μ + σ μ : 2,1 b 1,1 1 2,1 u 2,1 b 2,0 perform well in the parameter ranges which are used to train the sur- rogate model, their extrapolation ability is usually weak. Our method is mechanism-based and provides accurate results over wide parameter From these equations, it is easy to show that ranges; (ii) neural network-based methods require a very long time to _ _ train the surrogate model. When the network is complex, the training μ + μ = ρ Bg + ρ Bg  dðμ + μ Þ, 1,0 1,1 u 0 b 1 1,0 1,1 time may take tens of hours to several days and may also require _ _ μ + μ =2ρ Bðμ + Bg Þ+2ρ Bðμ + Bg Þ ð8Þ 2,0 2,1 u 1,0 0 b 1,1 1 multiple rounds of hyperparameter tuning. In contrast, Holimap avoids 2dðμ + μ Þ: 2,0 2,1 the long training time; (iii) neural network models have good predictive ability but their learned approximation does not typically have a clear Matching Eqs. (6)and (8), we find that ρ and ρ should satisfy the b u biophysical interpretation. Holimap transforms the complex nonlinear following system of linear equations: network into a linear one which not only has a clear physical meaning but also allows an approximative analytical solution. In addition, ρ Bg + ρ Bg = ρ Bg + ρ Bg + σ g  σ μ , u 0 b 1 u 0 b 1 u 1 b 1,0 SSA + Holimap can be combined with neural network-based methods ρ Bðμ + Bg Þ + ρ Bðμ + Bg Þ ð9Þ u 1,0 0 b 1,1 1 to increase the speed and accuracy of the latter. Since SSA + Holimap = ρ Bðμ + Bg Þ + ρ Bðμ + Bg Þ + σ μ  σ μ : u 1,0 0 b 1,1 1 u 1,1 b 2,0 can be used to generate distributions comparable in accuracy to those from the SSA with a much larger number of trajectories, it follows that Matching the first and third identities in Eqs. (1)and (7), it is clear that SSA + Holimap can reduce the time to generate an accurate training σ and σ should satisfy the following system of linear equations: b u dataset as input to the neural network. σ g  σ g = σ g  σ μ , The main limitation of the present method is that there are no u 1 b 0 u 1 b 1,0 ð10Þ analytical guarantees that the effective parameters of the linear network σ μ  σ μ = σ μ  σ μ + ðρ  ρ ÞBg , u 1,1 b 1,0 u 1,1 b 2,0 b b 1 are positively definite for all times. Nevertheless, for all examples using the 2-HM and 3-HM in this paper, we have numerically found this to be where ρ has been determined by solving Eq. (9). Compared with thecaseandhenceweare confident that the linear network obtained by Eq. (4), Eq. (10) has an additional term ðρ   ρ ÞBg .Thisisbecause ρ b b 1 the 2-HM or 3-HM procedure is generally physically interpretable. In remains unchanged for the 2-HM but is changed for the 4-HM. contrast, we observed that the 4-HM procedure can occasionally give Finally, inserting Eqs. (9)and (10)into Eq. (7) gives a system of rise to negative parameter values (typically when the binding and closed moment equations and hence the values of all zeroth, first, and Nature Communications | (2024) 15:6557 11 Article https://doi.org/10.1038/s41467-024-50716-z second-order moments can be approximately calculated. Substituting 2. Davidson, E. H. & Erwin, D. H. Gene regulatory networks and the these moments into Eqs. (9)and (10), one can finally solve for the four evolution of animal body plans. Science 311,796–800 (2006). effective parameters ρ ,ρ ,σ ,and σ of the linear network. The 4-HM 3. Olson, E. N. Gene regulatory networks in the evolution and devel- u b u b predicts the protein distribution of the nonlinear network by solving opment of the heart. Science 313,1922–1927 (2006). the CME of the linear one in Fig. 2d with the values of the effective 4. Gerstein, M. B. et al. Architecture of the human regulatory network parameters found above. derived from ENCODE data. Nature 489,91–100 (2012). 5. Spitz, F. & Furlong, E. E. Transcription factors: from enhancer Stochastic model for complex gene networks binding to developmental control. Nat. Rev. Genet. 13, Here we consider the stochastic model of an arbitrary gene regulatory 613–626 (2012). network involving protein synthesis, protein degradation, gene state 6. Pavlopoulos, G. A. et al. Using graph theory to analyze biological switching, and complex gene regulation mechanisms .Specifically, we networks. BioData Min. 4,1–27 (2011). assume that the network involves M distinct genes, each of which can 7. Koutrouli, M., Karatzas, E., Paez-Espino, D. & Pavlopoulos, G. A. A be in two states: an inactive state G and an active state G .The protein guide to conquer the biological network era using graph theory. associated with gene G is denoted by P . The network can be described Front. Bioeng. Biotech. 8,34(2020). j j by the following reactions: 8. Emmert-Streib, F., Dehmer, M. & Haibe-Kains, B. Gene regulatory networks and their applications: understanding biological and 0 1 α α medical problems in terms of networks. Front. Cell Dev. Biol. 2, j j * * G ! G , G ! G , j j j j 38 (2014). 0 1 σ σ ji ji 9. Margolin,A.A.etal.ARACNE:analgorithm for the reconstruction of * * G + h P ! G , G + h P ! G , j ji i j j ji i j gene regulatory networks in a mammalian cellular context. BMC ð11Þ 0 1 ρ ρ j j Bioinform. 7,1–15 (2006). * * G ! G + P , G ! G + P , j j j j j j 10. Stolovitzky, G., Prill, R. J. & Califano, A. Lessons from the DREAM2 challenges: a community effort to assess biological network infer- P !+, i, j = 1,2,:::,M, ence. Ann. N. Y. Acad. Sci. 1158,159–195 (2009). where the reactions in the first row describe spontaneous gene state 11. Emmert-Streib, F.,Glazko, G. V.,Altay,G. & de MatosSimoes, R. switching, the reactions in the second row describe gene regulation, the Statistical inference and reverse engineering of gene regulatory reactions in the third row describe protein synthesis in the two-gene networks from observational expression data. Front. Genet. 3, states, and the last reaction describes protein degradation or dilution. 8(2012). 1 0 Since G is the inactive state and G is the active state, we have ρ > ρ . 12. Chan, T. E., Stumpf, M. P. & Babtie, A. C. Gene regulatory network j j Due to complex gene regulation, each gene G may be regulated by all inference from single-cell data using multivariate information 0 1 genes. If gene G activates gene G,then σ >0 and σ =0 since the measures. Cell Syst. 5,251–267 (2017). i j ji ji binding of protein P induces the switching from G to G ; on the contrary, 13. Badia-i Mompel, P. et al. Gene regulatory network inference in the i j 0 1 if gene G inhibits gene G,then σ =0 and σ > 0 since the binding of era of single-cell multi-omics. Nat. Rev. Genet. 24,1–16 (2023). i j ji ji protein P induces the switching from G to G.Whenperforming 14. De Jong, H. Modeling and simulation of genetic regulatory systems: i j simulations (SSA and SSA + Holimap), the parameters are chosen as aliterature review. J. Comput. Biol. 9,67–103 (2002). 1 0 0 1 0 1 d =1, h =1, ρ =81, ρ =5:4, α = α =0:5, σ =0:01, σ =0 when G acti- 15. Karlebach, G. & Shamir, R. Modelling and analysis of gene reg- i ji j j j j ji ji 0 1 vates G,and σ =0, σ =0:01 when G inhibits G . The presence or ulatory networks. Nat. Rev. Mol. Cell Biol. 9,770–780 (2008). j i j ji ji absence of a gene-gene interaction and its type are determined 16. Edelstein-Keshet, L. Mathematical models in biology (SIAM, randomly. Here we assume that protein-gene interactions are non- 2005). cooperative (h = 1). The spontaneous switching rates between G and G 17. Ingalls,B.P. Mathematical modeling in systems biology: an intro- ij j 0 1 are chosen to be σ = σ =0:5.Sinceeachgeneisonaverageregulatedby duction (MIT press, 2013). j j two genes (one positive regulation and one negative regulation), the 18. Schnoerr, D., Sanguinetti, G. & Grima, R. Approximation and infer- switching rates due to gene regulation are roughly equal to ence methods for stochastic biochemical kinetics—a tutorial 0 1 σ = σ =0:01 multiplied by the number of regulator P,which is ~50. review. J. Phys.A Math.Theor. 50,093001 (2017). ji ji Hence the total switching rates due to spontaneous contributions and 19. Elowitz,M.B., Levine,A. J., Siggia, E. D. &Swain, P. S.Stochastic gene regulation are roughly 0.5 + 0.01 × 50 = 1, i.e., they are comparable gene expression in a single cell. Science 297,1183(2002). with the degradation rate d =1. 20. Ozbudak,E.M., Thattai, M., Kurtser,I., Grossman,A. D.&van Oudenaarden, A. Regulation of noise in the expression of a single Reporting summary gene. Nat. Genet. 31,69–73 (2002). Further information on research design is available in the Nature 21. Munsky, B., Neuert, G. & Van Oudenaarden, A. Using gene expres- Portfolio Reporting Summary linked to this article. sion noise to understand gene regulation. Science 336, 183–187 (2012). Data availability 22. Munsky,B.&Khammash,M.The finite state projection algorithm for MATLAB R2019a was used to analyze the data. Source data are pro- the solution of the chemical master equation. J. Chem. Phys. 124, vided with this paper. 044104 (2006). 23. Gillespie, D. T. A general method for numerically simulating the Code availability stochastic time evolution of coupled chemical reactions. J. Com- The MATLAB codes for Holimap and SSA + Holimap can be found in put. Phys. 22,403–434 (1976). the Github repository . 24. SzékelyJr,T.&Burrage,K.Stochasticsimulationinsystemsbiology. Comput. Struct. Biotechnol. J. 12,14–25 (2014). References 25. Klipp, E.,Liebermeister,W., Wierling, C.&Kowald,A. Systems 1. Shen-Orr, S. S., Milo, R., Mangan, S. & Alon, U. Network motifs in the biology: a textbook (John Wiley & Sons, 2016). transcriptional regulation network of Escherichia coli. Nat. Genet. 26. Munsky, B., Hlavacek, W. S. & Tsimring, L. S. Quantitative biology: 31,64–68 (2002). theory, computational methods, and models (MIT Press, 2018). Nature Communications | (2024) 15:6557 12 Article https://doi.org/10.1038/s41467-024-50716-z 27. Bateman, E. Autoregulation of eukaryotic transcription factors. 53. Nordick, B., Yu, P. Y., Liao, G. & Hong, T. Nonmodular oscillator Prog. Nucleic Acid Res. Mol. Biol. 60,133–168 (1998). and switch based on RNA decay drive regeneration of multi- 28. Becskei, A. & Serrano, L. Engineering stability in gene networks by modal gene expression. Nucleic Acids Res. 50,3693–3708 autoregulation. Nature 405,590–593 (2000). (2022). 29. Crews, S. T. & Pearson, J. C. Transcriptional autoregulation in 54. Tian, T.,Burrage,K., Burrage, P.M. & Carletti,M.Stochasticdelay development. Curr. Biol. 19,R241–R246 (2009). differential equations for genetic regulatory networks. J. Comput. 30. Hermsen, R., Ursem, B. & Ten Wolde, P. R. Combinatorial gene reg- Appl. Math. 205,696–707 (2007). ulation using auto-regulation. PLoS Comput. Biol. 6, e1000813 (2010). 55. Tomioka, R., Kimura, H., Kobayashi, T. J. & Aihara, K. Multivariate 31. Nie, Y., Shu, C. & Sun, X. Cooperative binding of transcription fac- analysis of noise in genetic regulatory networks. J. Theor. Biol. 229, tors in the human genome. Genomics 112,3427–3434 (2020). 501–521 (2004). 32. Jia, C. & Grima, R. Dynamical phase diagram of an auto-regulating 56. Friedman, N., Cai, L. & Xie, X. S. Linking stochastic dynamics to gene in fast switching conditions. J. Chem. Phys. 152, 174110 (2020). population distribution: an analytical framework of gene expres- 33. Cai, L., Friedman, N. & Xie, X. S. Stochastic protein expression in sion. Phys.Rev.Lett. 97,168302 (2006). individual cells at the single molecule level. Nature 440, 57. Bokes, P. & Singh, A. Protein copy number distributions for a self- 358–362 (2006). regulating gene in the presence of decoy binding sites. PLoS one. 34. Singh, A. & Hespanha, J. P. Lognormal moment closures for bio- 10,e0120555(2015). chemical reactions. In Proc. of the 45th IEEE Conference on Decision 58. Schnoerr, D., Sanguinetti, G. & Grima, R. Comparison of different and Control,2063–2068 (IEEE, 2006). moment-closure approximations for stochastic chemical kinetics. J. 35. Cao, Z. & Grima, R. Linear mapping approximation of gene reg- Chem. Phys. 143, 185101 (2015). ulatory networks with stochastic dynamics. Nat. Commun. 9, 59. Lakatos, E., Ale, A., Kirk, P. D. & Stumpf, M. P. Multivariate moment 3305 (2018). closure techniques for stochastic kinetic models. J. Chem. Phys. 36. Grima, R. A study of the accuracy of moment-closure approximations 143, 094107 (2015). for stochastic chemical kinetics. J. Chem. Phys. 136, 154105 (2012). 60. Zhou, T. & Zhang, J. Analytical results for a multistate gene model. 37. Jia, C. & Grima, R. Small protein number effects in stochastic SIAM J. Appl. Math. 72,789–818 (2012). models of autoregulated bursty gene expression. J. Chem. Phys. 61. Jia, C.&Li,Y.Analyticaltime-dependent distributions forgene 152,084115 (2020). expression models with complex promoter switching mechanisms. 38. Ko, M. S. A stochastic model for gene induction. J. Theor. Biol. 153, SIAM J. Appl. Math. 83,1572–1602 (2023). 181–194 (1991). 62. Sukys, A., Öcal, K. & Grima, R. Approximating solutions of the 39. Jiao, F., Sun, Q., Tang, M., Yu, J. & Zheng, B. Distribution modes and chemical master equation using neural networks. Iscience. their corresponding parameter regions in stochastic gene tran- 25, (2022). scription. SIAM J. Appl. Math. 75,2396–2420 (2015). 63. Wang, X., Li, Y. & Jia, C. Poisson representation: a bridge between 40. Jia, C. & Grima, R. Frequency domain analysis of fluctuations of discrete and continuous models of stochastic gene regulatory mRNA and protein copy numbers within a cell lineage: theory and networks. J. R. Soc. Interface 20,20230467(2023). experimental validation. Phys.Rev.X 11, 021032 (2021). 64. Jia, C. & Grima, R. Holimap: an accurate and efficient method for 41. Schwanhäusser, B. et al. Global quantification of mammalian gene solving stochastic gene network dynamics. chenjiacsrc/Holimap expression control. Nature 473, 337 (2011). https://doi.org/10.5281/zenodo.12725485 (2024). 42. Larsson, A. J. et al. Genomic encoding of transcriptional burst kinetics. Nature 565,251–254 (2019). Acknowledgements 43. Suter, D. M. et al. Mammalian genes are transcribed with widely We thank Augustinas Sukys for comments on the manuscript. C.J. different bursting kinetics. Science 332,472–474 (2011). acknowledges support from the National Natural Science Foundation of 44. Gardner, T. S., Cantor, C. R. & Collins, J. J. Construction of a genetic China with grant Nos. U2230402 and 12271020. R.G. acknowledges toggle switch in Escherichia coli. Nature 403, 339 (2000). support from the Leverhulme Trust (RPG-2020-327). 45. Elowitz, M. B. & Leibler, S. A synthetic oscillatory network of tran- scriptional regulators. Nature 403, 335–338 (2000). Author contributions 46. Goodwin, B. C. Oscillatory behavior in enzymatic control processes. R.G. conceived the original idea. C.J. performed the theoretical deriva- Adv. Enzym. Regul. 3,425–437 (1965). tions and numerical simulations. C.J. and R.G. interpreted the theoretical 47. Bragdon, M. D. et al. Cooperative assembly confers regulatory results and jointly wrote the manuscript. specificity and long-term genetic circuit stability. Cell 186, 3810–3825 (2023). Competing interests 48. Lammers, N. C., Kim, Y. J., Zhao, J. & Garcia, H. G. A matter of time: The authors declare no competing interests. using dynamics and theory to uncover mechanisms of transcrip- tional bursting. Curr. Opin. Cell Biol. 67,147–157 (2020). Additional information 49. Goentoro, L., Shoval, O., Kirschner, M. W. & Alon, U. The incoherent Supplementary information The online version contains feedforward loop can provide fold-change detection in gene reg- supplementary material available at ulation. Mol. Cell 36,894–899 (2009). https://doi.org/10.1038/s41467-024-50716-z. 50. Zhu, R., delRio-Salgado, J.M., Garcia-Ojalvo, J. &Elowitz,M.B. Synthetic multistability in mammalian cells. Science 375, Correspondence and requests for materials should be addressed to eabg9765 (2022). Ramon Grima. 51. Gonze, D., Halloy, J. & Goldbeter, A. Robustness of circadian rhythms with respect to molecular noise. Proc.NatlAcad. Sci. USA Reprints and permissions information is available at 99,673–678 (2002). http://www.nature.com/reprints 52. Kuwahara, H. & Schwartz, R. Stochastic steady state gain in a gene expression process with mRNA degradation control. J. R. Soc. Publisher’s note Springer Nature remains neutral with regard to jur- Interface 9,1589–1598 (2012). isdictional claims in published maps and institutional affiliations. Nature Communications | (2024) 15:6557 13 Article https://doi.org/10.1038/s41467-024-50716-z Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/ licenses/by/4.0/. © The Author(s) 2024 Nature Communications | (2024) 15:6557 14

Journal

Nature CommunicationsSpringer Journals

Published: Aug 2, 2024

There are no references for this article.