# Parametric identification of stochastic interaction networks

## Abstract

This article addresses issues from applied stochastic analysis for solving parameter identification problems in interaction networks. Purely discontinuous Markov processes are the most appropriate choice for modelling these systems, and it is of paramount importance to estimate the unknown characteristics of the model given the measured data. The model induces a Fokker–Planck–Kolmogorov equation along with moment equations, and achieving parametric identification based on direct solutions of these equations has remained elusive. We propose a novel approach which utilizes stochastic analysis of continuous-time Markov processes to lift the curse of dimensionality in parametric identification. We illustrate the approach through a case study from ecological engineering.

## 1. Introduction

Reaction networks are systems in which the populations of a finite number of species evolve according to predefined interactions. They are an important tool for modelling in several disciplines: biochemistry, epidemiology, pharmacology, ecology and social networks. Consider a system with $$n_s$$ species $$S_1,\ldots,S_{n_s}$$ that may interact according to $$n_r$$ different interactions written in the form $$\nu_{i1}S_1+\cdots+\nu_{in_s}S_{n_s}\longrightarrow \mu_{i1}S_1+\cdots+\mu_{in_s}S_{n_s} \quad\quad i\in\{1,\ldots,n_r\}.$$ $$\nu_{ij}$$ denotes the number of individuals from species $$S_j$$ needed for interaction $$i$$ to occur. The expression on the right-hand side represents the outcome of interaction $$i$$. Here, we assume instantaneous outputs of the interaction into the system. In other words, interaction $$i$$ means, for every $$j\in\{1,\ldots,n_s\}$$, an instantaneous jump of amplitude $$\mu_{ij}-\nu_{ij}$$ in the number of individuals from species $$S_j$$.
Therefore, we do not necessarily have $$\mu_{ij}\neq\nu_{ij}$$ for every $$j\in\{1,\ldots,n_s\}$$, but we must have $$(\mu_{i1},\ldots,\mu_{in_s})\neq(\nu_{i1},\ldots,\nu_{in_s}).$$ Interaction $$i$$ is an event occurring at random times; it occurs only if the required individuals from all the species are available. For $$dt>0$$, let $$\theta_i\,dt$$ be the probability that the required individuals for interaction $$i$$ successfully perform it as and when they meet during the time interval $$[t,t+dt]$$. $$\theta_i$$ is called the rate of success of interaction $$i$$; note that we have assumed the success rates to be independent of $$t$$ (i.e., constant in time). By construction, $$\theta_i>0$$; without loss of generality, we assume that $$\theta_i\leq1$$, which amounts to a choice of the time unit. The continuous-time process $$X(t)=(X_1(t),\ldots,X_{n_s}(t))^{{\top}} \quad\quad t\in\mathbb{R}_+,$$ where $$X_j(t)$$ denotes the number of individuals from species $$S_j$$ present in the system at time $$t$$, takes values in $$\mathbb{N}^{n_s}$$ and has right-continuous sample paths with finite left limits (rcll). In this article, we address the inverse problem of parameter identification from measured data. Specifically, we need to estimate the unknown parameters $$\theta_i$$, $$i\in\{1,\ldots,n_r\}$$. The data consist of the population size of only one species, say $$S_1$$, measured at discrete observation times.

### Overview of this work and bibliographical notes on related work

Even to the most casual observer, it should be clear that randomness is inherent in both the order and the timing of the interactions. Stochastic models provide a framework for characterizing this uncertainty, especially when dealing with reactants in low copy numbers. Namely, the evolution of the species of interest is modelled by continuous-time conservative Markov processes.
Within this framework, the evolution of the probability distribution is given by the Fokker–Planck–Kolmogorov (FPK) equation. When the state space is the nonnegative integer lattice, the FPK equation is frequently referred to as the chemical master (CM) equation. In most cases the FPK equation cannot be solved analytically; numerical tools have been tested and proven reliable for approximating its solution in a few particular cases (Baili & Fleury, 2002, 2004; Bect et al., 2006). Here the challenge is to lift the curse of dimensionality: the exponential growth of computational cost in the dimension (i.e., the number of species) and the mode size (i.e., the maximum copy number). Kazeev et al. (2014) trace the origins of stochastic interaction networks in biochemistry and review recent approaches to dimensionality reduction. A first approach is based on moment closure (Ruess et al., 2011). This approach performs poorly in situations where one of the species exhibits a low population size. A second approach uses a low-parametric representation of tensors called the canonical polyadic (CP) decomposition. This is a generalization of the singular value decomposition (SVD) to tensors of dimension greater than two. The main issue in applying the CP decomposition is to control the tensor rank of the approximate solution to the CM equation. In Hegland & Garcke (2010), it is suggested to recompute a lower-rank CP decomposition after every arithmetic operation. Another approach (Jahnke & Huisinga, 2008) consists in projecting the dynamics onto a manifold composed of all tensors with a CP decomposition of some predetermined maximal rank. This results in a set of coupled non-linear ordinary differential equations which are solved using available ODE solvers. The approach proposed in Kazeev et al.
(2014) is based on a numerical tensor algebra, called the quantized tensor train, that operates on a low-parametric numerical representation of tensors rather than on their CP decomposition. The present article uses very different techniques and yields results under no assumptions or approximations on the solution to the FPK equation. We divide the article into two parts; the technical part is composed of Sections 2 and 4. In this part, we discuss the Markovian model and then derive its implications. Of interest are a couple of operators associated with every Markov process: the generating operator and the Fokker–Planck operator. This part reviews, in the present context, classical results on Dynkin's formula (Kulasiri & Verwoerd, 2002) and moment equations. The novelty of this work resides in Section 4. Our first main contribution consists in computing the matrix of transition rates for the observable component of the state; here it is important to notice that the observable component of the state is a non-Markovian purely discontinuous process whose characteristics depend on the whole state. In our second main result, Proposition 2, we show that the probability distribution of the observable component of the state is entirely determined by its transition rate matrix and its initial distribution. As a consequence, parametric identification turns out to be a non-linear least-squares problem. Sections 3 and 5 form the implementation part of the article. The former presents two applications of interaction networks from ecological engineering and bioengineering, and the latter demonstrates parametric identification for the first application.

## 2. Modelling: purely discontinuous Markov processes

Let $$x_t$$ be a continuous-time stochastic process based on some filtered probability space $$\displaystyle({\it{\Omega}},\mathcal{F},\mathbb{F},\mathbb{P})$$.
$$x_t$$ is called a purely discontinuous process (PDP) (Jacod & Skorokhod, 1996) if its sample paths are rcll piecewise-constant functions of time. The state space $$E$$ of a PDP is a subset of $$\mathbb{R}^d$$ that may be discrete or continuous. The instants of jumps of $$x_t$$, $$\displaystyle\{T_k\}_{k\in\{1,2,\dots\}}$$, form an increasing sequence of positive random variables. Here, we assume that for each $$t\in\mathbb{R}_+$$ $$\mathbb{E}\left[\sum_{k\geq 1}{\rm 1}\kern-0.24em{\rm I}_{\{0<T_k\leq t\}}\right]<\infty.$$ Hence, almost surely, finitely many jumps take place in any finite time interval. This assumption holds naturally in applications. Besides, the jump times of $$x_t$$ are $$\mathbb{F}$$-stopping times of two kinds: totally inaccessible for spontaneous jumping, or predictable if the jumps are triggered by some events. More precisely, forced jumps of $$x_t$$, if any, are triggered only by hitting the boundary of the state space. Furthermore, for spontaneous jumping there is a hazard rate or intensity denoted $$z_t$$. This is a continuous-time stochastic process with nonnegative values. Totally inaccessible jump times are related to $$z_t$$ via the generalized exponential distribution (Brémaud, 1981): for $$t\in\mathbb{R}_+$$ \begin{gather*} \mathbb{P}\left\{T_1 > t\,|\,\mathcal{F}_t\right\}=\exp\left\{-\int_0^t z_s\,ds\right\}\!, \\ \mathbb{P}\left\{T_k > t\,|\,\mathcal{F}_{T_{k-1}}\right\}= \exp\left\{-\int_{T_{k-1}}^t z_s\,ds\right\}\,{\rm 1}\kern-0.24em{\rm I}_{\{T_{k-1}\leq t\}}\quad+\quad{\rm 1}\kern-0.24em{\rm I}_{\{T_{k-1} > t\}}\quad\quad k\in\{2,3,\dots\}, \end{gather*} (For homogeneity of notation we set $$T_0=0$$).
$$\mathcal{F}_{T_{k-1}}$$ is called the $$\sigma$$-algebra of events prior to the $$\mathbb{F}$$-stopping time $$T_{k-1}$$ (Øksendal, 2003; Protter, 2005), namely $$\mathcal{F}_{T_{k-1}}=\sigma\left\{A\in\mathcal{F}_{\infty}: \forall t\in\mathbb{R}_+,A\cap\left\{T_{k-1}\leq t\right\}\in\mathcal{F}_t\right\}\!,$$ where $$\mathcal{F}_{\infty}=\sigma\left(\bigcup_{\,t\in\mathbb{R}_+}\mathcal{F}_t\right)\!.$$ Without loss of generality, we further assume that $$z_t$$ is a state feedback: at the current time $$t$$, it may depend only on $$t$$ and $$x_{t^{\,-}}$$, where $$x_{t^{\,-}}=\displaystyle\lim_{s\uparrow t}x_s.$$ In fact, a PDP $$x_t$$ with spontaneous jumping is Markovian if and only if $$z_t=\lambda(t,x_{t^{\,-}})$$ or $$z_t=\lambda(x_{t^{\,-}})$$; it is time-homogeneous Markovian only in the latter case. Predictable jump times, regarded as a point process, do not have a hazard rate process. We define the active or essential boundary of the state space as the set of boundary points $$\bar{x}\in\partial E$$ that are almost surely reachable. The hitting times of $$\partial E$$ are $$\mathbb{F}$$-stopping times since $$x_t$$ is rcll and $$\partial E$$ is a closed set. The active boundary, if non-empty, is assumed to be non-absorbing, that is, there is no killing of the process $$x_t$$ on the boundary of the state space. If the active boundary is non-empty, we may have jumps out of the active boundary into $$E$$. In this respect, we denote by $$\partial E^{{\textrm{rep}}}$$ the repulsive active boundary, that is, whenever $$x_t$$ reaches $$\partial E^{{\textrm{rep}}}$$, it jumps instantaneously to interior or boundary points belonging to the state space $$E$$. Remark. In the case where $$\partial E \subset E$$, there may be boundary points of the state space where the state process sticks for a random duration and then jumps spontaneously; these boundary points belong to the active boundary but not to $$\partial E^{{\textrm{rep}}}$$.
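To make the state-feedback case concrete: between jumps the intensity is frozen at $$\lambda(x_{T_{k-1}})$$, so the generalized exponential survival function above reduces to an ordinary exponential one, and the holding time can be sampled by inversion. The following Python sketch is ours, purely illustrative (the function name `holding_time` is an assumption, not from the article):

```python
import math
import random

def holding_time(lam, rng=random):
    """Sample T_k - T_{k-1} for a PDP whose intensity z_s is frozen at
    lam = lambda(x_{T_{k-1}}) between jumps: the survival function
    exp(-int z_s ds) becomes exp(-lam * t), which is inverted via
    t = -ln(1 - U) / lam with U ~ Uniform[0, 1)."""
    if lam <= 0.0:
        return math.inf  # no spontaneous jump can occur from this state
    return -math.log(1.0 - rng.random()) / lam
```

For instance, with $$\lambda(x)=2$$ per time unit, the sampled holding times average $$1/2$$ time unit.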
The last ingredient of the PDP is a transition kernel; it reinitiates the PDP after every jump, of either type. Specifically, the transition kernel is the probability distribution on $$E$$ of the post-jump state $$x(T_k)$$ under the conditional probability measure $$\mathbb{P}\left\{\left.\bullet\right|\,\mathcal{F}_{T_k^{\,-}}\right\}$$, where $$\mathcal{F}_{T_k^{\,-}}$$ is the $$\sigma$$-algebra of events strictly prior to the $$\mathbb{F}$$-stopping time $$T_k$$, namely $$\mathcal{F}_{T_k^{\,-}}=\mathcal{F}_0\vee\sigma\left\{A\cap\left\{t<T_k\right\}:A\in\mathcal{F}_t,t\in\mathbb{R}_+\right\}\!.$$ The transition kernel is assumed to be Markovian, that is, dependent only on $$T_k$$ and $$x(T_k^{\,-})$$: for a Borel-measurable set $$A$$ from $$E$$ $$\mathbb{P}\left\{\left.x(T_k)\in A\right|\,\mathcal{F}_{T_k^{\,-}}\right\}\triangleq Q(T_k,x(T_k^{\,-}),A).$$ Since the active boundary may not belong to the state space $$E$$, $$x(T_k^{\,-})$$ has values in $$E\cup\partial E^{{\textrm{rep}}}$$. The ingredients of the PDP described above result in a strong Markov process. It is clear that the proposed PDP can be applied for modelling our interaction network; it remains to identify all its ingredients. Here, $$d=n_s$$ and $$E$$ is the nonnegative integer lattice $$\mathbb{N}^{n_s}$$. Hence, the boundary $$\partial E=\emptyset$$. $$X(t)$$ jumps spontaneously, that is, the jump times are all totally inaccessible $$\mathbb{F}$$-stopping times.
The last two ingredients to identify are the jumping intensity and the transition kernel; these are derived from the interaction scheme $$\nu_{i1}S_1+\cdots+\nu_{in_s}S_{n_s}\longrightarrow \mu_{i1}S_1+\cdots+\mu_{in_s}S_{n_s} \quad\quad i\in\{1,\ldots,n_r\}.$$ We claim that $$\mathbb{P}\left\{\left.T_{k}\in]t,t+dt]\,\right|\,\mathcal{F}_{T_{k-1}}\right\}= \mathbb{P}\left\{T_k > t\,|\,\mathcal{F}_{T_{k-1}}\right\}\,{\rm 1}\kern-0.24em{\rm I}_{\{T_{k-1}\leq t\}}\sum_{i=1}^{n_r}\theta_i\,dt\,\prod_{j=1}^{n_s}{\rm 1}\kern-0.24em{\rm I}_{\{ X_j(t^{\,-})\geq\nu_{ij}\}}\,C_{X_j(t^{\,-})}^{\nu_{ij}},$$ where $$C_n^k$$ denotes the binomial coefficient, defined as $$\frac{n!}{k!(n-k)!}$$. The distribution of $$T_{k}$$ under the conditional probability measure $$\mathbb{P}\left\{\left.\bullet\right|\,\mathcal{F}_{T_{k-1}}\right\}$$ is given by the generalized exponential distribution above; it follows that $$z_t=\sum_{i=1}^{n_r}\theta_i\,\prod_{j=1}^{n_s}{\rm 1}\kern-0.24em{\rm I}_{\{ X_j(t^{\,-})\geq\nu_{ij}\}}\,C_{X_j(t^{\,-})}^{\nu_{ij}}.$$ This is a particular case of the state feedback form $$z_t=\lambda(X(t^{\,-}))$$ with $$\lambda(x)=\sum_{i=1}^{n_r}\theta_i\,\prod_{j=1}^{n_s}{\rm 1}\kern-0.24em{\rm I}_{\left\{x_j\geq\nu_{ij}\right\}}\,C_{x_j}^{\nu_{ij}}.$$ $$z_t$$, like the $$\theta_i$$'s, has units of (time unit)$$^{-1}$$.
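As an illustration, the intensity $$\lambda(x)$$ above is straightforward to evaluate numerically from the stoichiometric data. The sketch below is our own code (the names `intensity`, `nu` and `theta` are ours); it computes $$\lambda(x)$$ for an arbitrary network using Python's `math.comb` for the binomial coefficients $$C_{x_j}^{\nu_{ij}}$$:

```python
from math import comb

def intensity(x, nu, theta):
    """lambda(x) = sum_i theta_i * prod_j 1{x_j >= nu_ij} * C(x_j, nu_ij).

    x     : current state (x_1, ..., x_ns), copy numbers of each species
    nu    : nr-by-ns nested list of reactant coefficients nu_ij
    theta : the nr success rates theta_i
    """
    total = 0.0
    for rate, nu_i in zip(theta, nu):
        prod = 1
        for xj, nij in zip(x, nu_i):
            if xj < nij:            # a required individual is missing: no jump of type i
                prod = 0
                break
            prod *= comb(xj, nij)   # number of ways of choosing the nu_ij reactants
        total += rate * prod
    return total
```

For the pest network of Section 3.1 (rows of $$\nu$$ equal to $$(0,0),(1,0),(1,1),(0,1)$$), the state $$x=(3,2)$$ gives $$\lambda(x)=\theta_1+3\,\theta_2+6\,\theta_3+2\,\theta_4$$.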
After every jump the process is reinitiated according to the following sum of random Dirac measures: $$Q(T_k,X(T_k^{\,-}),A)= \sum_{i=1}^{n_r}\frac{\theta_i\left(\prod_{j=1}^{n_s}{\rm 1}\kern-0.24em{\rm I}_{\{ X_j(T_k^{\,-})\geq\nu_{ij}\}}\,C_{X_j(T_k^{\,-})}^{\nu_{ij}}\right)}{\sum_{i=1}^{n_r}\theta_i\left(\prod_{j=1}^{n_s}{\rm 1}\kern-0.24em{\rm I}_{\{ X_j(T_k^{\,-})\geq\nu_{ij}\}}\,C_{X_j(T_k^{\,-})}^{\nu_{ij}}\right)} \,\delta_{X(T_k^{\,-})+{\it{\Delta}}_i}(A),$$ where $${\it{\Delta}}_i=\left(\begin{array}{c}\mu_{i1}-\nu_{i1}\\ \vdots\\\mu_{in_s}-\nu_{in_s}\end{array}\right)\!.$$ Being independent of $$T_k$$, $$Q(T_k,X(T_k^{\,-}),A)$$ is a time-homogeneous Markovian kernel, and thus the Markov process $$X(t)$$ is time-homogeneous.

### 2.1. The generating operator and the Fokker–Planck operator for the Markov model

The description above yields, in the first place, the generating operator $$\mathcal{G}$$ of the homogeneous Markov process $$X(t)$$: $$\mathcal{G}f(x)=\lambda(x)\left[-f(x)+\int_E f(x')Q(x,dx')\right]\!.$$ That is, $$\mathcal{G}f(x)=\sum_{i=1}^{n_r}\theta_i\,{\rm 1}\kern-0.24em{\rm I}_{\left\{x'\in E:\forall j,\, x'_j-\nu_{ij}\geq0\right\}}(x)\left(\prod_{j=1}^{n_s}C_{x_j}^{\nu_{ij}}\right)\left[-f(x)+f(x+{\it{\Delta}}_i)\right]\!,$$ where $$x=(x_1,\ldots,x_{n_s})$$ and $${\it{\Delta}}_i=(\mu_{i1}-\nu_{i1},\ldots,\mu_{in_s}-\nu_{in_s})$$. The domain $$D(\mathcal{G})$$ of $$\mathcal{G}$$ is the set of measurable real-valued functions on $$E$$. In the second place, the central concept of this section is the Fokker–Planck operator $$\mathcal{L}_{\textrm{FP}}$$ along with its domain $$D(\mathcal{L}_{\textrm{FP}})$$. Let $$p(t,x)$$ be the probability distribution of $$X(t)$$.
The Fokker–Planck operator is the adjoint of the generating operator in the sense that, for $$f\in D(\mathcal{G})$$ and $$p\in D(\mathcal{L}_{\rm FP})$$, $$(\mathcal{G}f,p)=(f,\mathcal{L}_{\textrm{FP}}\,p).$$ The pairing $$(\mathcal{G}f,p)$$, with both factors regarded as functions of $$x$$, denotes $$\sum_{x\in E} \mathcal{G}f(x)p(t,x),$$ and $$(f,\mathcal{L}_{\textrm{FP}}\,p)$$ denotes $$\sum_{x\in E} f(x)\mathcal{L}_{\textrm{FP}}\,p(t,x).$$

**Proposition 1** $$\mathcal{L}_{\textrm{FP}}\,p(t,x)$$ is given by $$\sum_{i=1}^{n_r}\theta_i \left\{-{\rm 1}\kern-0.24em{\rm I}_{\left\{x'\in E:\forall j,\, x'_j-\nu_{ij}\geq0\right\}}(x)\left(\prod_{j=1}^{n_s}C_{x_j}^{\nu_{ij}}\right)p(t,x)+{\rm 1}\kern-0.24em{\rm I}_{\left\{x'\in E:\forall j,\, x'_j-\mu_{ij}\geq0\right\}}(x)\left(\prod_{j=1}^{n_s}C_{x_j-\mu_{ij}+\nu_{ij}}^{\nu_{ij}}\right)p(t,x-{\it{\Delta}}_i)\right\}\!.$$

**Proof.** $$\sum_{x\in E} \mathcal{G}f(x)p(t,x)=\sum_{i=1}^{n_r}\theta_i \sum_{x\in E}\,{\rm 1}\kern-0.24em{\rm I}_{\left\{x'\in E:\forall j,\, x'_j-\nu_{ij}\geq0\right\}}(x)\left(\prod_{j=1}^{n_s}C_{x_j}^{\nu_{ij}}\right)\left[-f(x)+f(x+{\it{\Delta}}_i)\right]p(t,x)$$ First, we have $$\label{eq1} \sum_{x\in E}\,{\rm 1}\kern-0.24em{\rm I}_{\left\{x'\in E:\forall j,\, x'_j-\nu_{ij}\geq0\right\}}(x)\left(\prod_{j=1}^{n_s}C_{x_j}^{\nu_{ij}}\right)f(x+{\it{\Delta}}_i)p(t,x)= \sum_{\left\{x\in E:\forall j,\, x_j-\nu_{ij}\geq0\right\}}\left(\prod_{j=1}^{n_s}C_{x_j}^{\nu_{ij}}\right)f(x+{\it{\Delta}}_i)p(t,x).$$ (1) Now make the change of variables, substituting $$x'_j$$ for $$x_j-\nu_{ij}+\mu_{ij}$$ in the right-hand side of Eq. (1).
Then \begin{gather*} x_j=x'_j-\mu_{ij}+\nu_{ij},\\ \left\{x\in E:\forall j,\, x_j-\nu_{ij}\geq0\right\}=\left\{x'\in E:\forall j,\, x'_j-\mu_{ij}\geq0\right\}\!, \end{gather*} and we obtain $$\sum_{x\in E}\,{\rm 1}\kern-0.24em{\rm I}_{\left\{x'\in E:\forall j,\, x'_j-\nu_{ij}\geq0\right\}}(x)\left(\prod_{j=1}^{n_s}C_{x_j}^{\nu_{ij}}\right)f(x+{\it{\Delta}}_i)p(t,x)= \sum_{\left\{x'\in E:\forall j,\, x'_j-\mu_{ij}\geq0\right\}}\left(\prod_{j=1}^{n_s}C_{x'_j-\mu_{ij}+\nu_{ij}}^{\nu_{ij}}\right)f(x')p(t,x'-{\it{\Delta}}_i),$$ or equivalently, \begin{align*} &\sum_{x\in E}\,{\rm 1}\kern-0.24em{\rm I}_{\left\{x'\in E:\forall j,\, x'_j-\nu_{ij}\geq0\right\}}(x)\left(\prod_{j=1}^{n_s}C_{x_j}^{\nu_{ij}}\right)f(x+{\it{\Delta}}_i)p(t,x)\\ &\quad =\sum_{x\in E} {\rm 1}\kern-0.24em{\rm I}_{\left\{x'\in E:\forall j,\, x'_j-\mu_{ij}\geq0\right\}}(x)\left(\prod_{j=1}^{n_s}C_{x_j-\mu_{ij}+\nu_{ij}}^{\nu_{ij}}\right)p(t,x-{\it{\Delta}}_i)f(x). \end{align*} Thus $$\sum_{x\in E} \mathcal{G}f(x)p(t,x)$$ is equal to \begin{align*} &\sum_{x\in E}f(x)\sum_{i=1}^{n_r}\theta_i \left\{-{\rm 1}\kern-0.24em{\rm I}_{\left\{x'\in E:\forall j,\, x'_j-\nu_{ij}\geq0\right\}}(x)\left(\prod_{j=1}^{n_s}C_{x_j}^{\nu_{ij}}\right)p(t,x)\right.\\ &\quad \left.+{\rm 1}\kern-0.24em{\rm I}_{\left\{x'\in E:\forall j,\, x'_j-\mu_{ij}\geq0\right\}}(x)\left(\prod_{j=1}^{n_s}C_{x_j-\mu_{ij}+\nu_{ij}}^{\nu_{ij}}\right)p(t,x-{\it{\Delta}}_i)\right\}\!. \end{align*} □ As regards $$D(\mathcal{L}_{\textrm{FP}})$$, it is the set of functions $$p(t,x):\mathbb{R}_+\times E\rightarrow [0,1]$$ satisfying for each $$t\in\mathbb{R}_+$$ $$\sum_{x\in E}p(t,x)=1.$$

### 2.2. Dynkin's formula versus the Fokker–Planck equation

Dynkin's formula is the main tool in computing expectations for a wide class of functions (Klebaner, 2005).
Let $$f\in D(\mathcal{G})$$; if for each $$t\in\mathbb{R}_+$$ $$\label{sufficient-condition} \mathbb{E}\left[\sum_{k\geq 1}{\rm 1}\kern-0.24em{\rm I}_{\{0<T_k\leq t\}}\left|f(X(T_k))-f(X(T_k^{\,-}))\right|\right]<\infty,$$ (2) then $$\label{dynkin} \mathbb{E}\left[f(X(t))-f(X(0))\right]=\mathbb{E}\left[\int_0^t \mathcal{G}f(X(s))\,ds\right]\!.$$ (3) The condition (2) holds in particular when $$f$$ is bounded and for each $$t\in\mathbb{R}_+$$ $$\mathbb{E}\left[\sum_{k\geq 1}{\rm 1}\kern-0.24em{\rm I}_{\{0<T_k\leq t\}}\right]<\infty.$$ Developing Eq. (3) yields \begin{eqnarray*} &&\mathbb{E}\left[f(X(t))-f(X(0))\right]=\int_0^t \mathbb{E}\left[\mathcal{G}f(X(s))\right]ds,\\ &&=\int_0^t\sum_{i=1}^{n_r}\theta_i\,\mathbb{E}\left[{\rm 1}\kern-0.24em{\rm I}_{\left\{x'\in E:\forall j,\, x'_j-\nu_{ij}\geq0\right\}}(X(s))\left(\prod_{j=1}^{n_s}C_{X_j(s)}^{\nu_{ij}}\right)\left[-f(X(s))+f(X(s)+{\it{\Delta}}_i)\right]\right]ds,\\ &&=\sum_{i=1}^{n_r}\theta_i\int_0^t\mathbb{E}\left[\left[-f(X(s))+f(X(s)+{\it{\Delta}}_i)\right]\prod_{j=1}^{n_s}{\rm 1}\kern-0.24em{\rm I}_{\left\{X_j(s)\geq\nu_{ij}\right\}}C_{X_j(s)}^{\nu_{ij}}\right]ds. \end{eqnarray*} The Fokker–Planck equation is an important analytical tool for characterizing the law of a Markov process (Gardiner, 1985; Risken, 1989). The derivation of the Fokker–Planck equation is handled on a case-by-case basis. The probability distribution of the purely discontinuous Markov process (PDMP) $$X(t)$$ solves the following FPK equation: for $$t>0$$ $$\frac{\partial p}{\partial t}(t,x)=\mathcal{L}_{\textrm{FP}}\,p(t,x),$$ i.e., \begin{align*} \frac{\partial p}{\partial t}(t,x)&=\sum_{i=1}^{n_r}\theta_i \left\{-{\rm 1}\kern-0.24em{\rm I}_{\left\{x'\in E:\forall j,\, x'_j-\nu_{ij}\geq0\right\}}(x)\left(\prod_{j=1}^{n_s}C_{x_j}^{\nu_{ij}}\right)p(t,x)\right.\\ &\quad\left.
+{\rm 1}\kern-0.24em{\rm I}_{\left\{x'\in E:\forall j,\, x'_j-\mu_{ij}\geq0\right\}}(x)\left(\prod_{j=1}^{n_s}C_{x_j-\mu_{ij}+\nu_{ij}}^{\nu_{ij}}\right)p(t,x-{\it{\Delta}}_i)\right\}\!, \end{align*} with the initial condition $$p(0,x)=\mathbb{P}\left\{X(0)=x\right\}$$.

## 3. Case studies

### 3.1. Interaction networks in ecology

The first case study we consider is the evolution of an agricultural pest. Let $$X_1(t)$$ be the size of the pest population, and $$X_2(t)$$ be an indicator of how much the environment has been deteriorated by the infestation. New pest individuals arise in the system due to immigration with rate $$\theta_1$$, or birth with rate proportional to the population size, namely $$\theta_2X_1(t^{\,-}){\rm 1}\kern-0.24em{\rm I}_{\left\{X_1(t^{\,-})\geq1\right\}}.$$ The death rate of the pest is given by $$\theta_3X_1(t^{\,-})X_2(t^{\,-}){\rm 1}\kern-0.24em{\rm I}_{\{X_1(t^{\,-})\geq1,X_2(t^{\,-})\geq1\}},$$ that is, it depends on the population size and the damage to the environment. In addition, $$X_2(t)$$ is increased by one unit instantly when a new pest individual is added. Since the pest individuals deteriorate the environment for a time period that may exceed their own life span, we assume that the death of pest individuals leaves $$X_2(t)$$ unchanged.
Furthermore, the environment may recover (here, we assume that $$X_2(t)$$ decreases instantaneously by one unit) with rate $$\theta_4X_2(t^{\,-}){\rm 1}\kern-0.24em{\rm I}_{\left\{X_2(t^{\,-})\geq1\right\}}.$$ The description above corresponds schematically to the following interactions: \begin{eqnarray*} \emptyset&\longrightarrow&S_1+S_2\\ S_1&\longrightarrow&2S_1+S_2\\ S_1+S_2&\longrightarrow&S_2\\ S_2&\longrightarrow&\emptyset \end{eqnarray*} Obviously $$n_s=2$$, $$n_r=4$$, $$\nu_{11}=0$$, $$\nu_{12}=0$$, $$\mu_{11}=1$$, $$\mu_{12}=1$$, $$\nu_{21}=1$$, $$\nu_{22}=0$$, $$\mu_{21}=2$$, $$\mu_{22}=1$$, $$\nu_{31}=1$$, $$\nu_{32}=1$$, $$\mu_{31}=0$$, $$\mu_{32}=1$$, $$\nu_{41}=0$$, $$\nu_{42}=1$$, $$\mu_{41}=0$$, $$\mu_{42}=0$$, $${\it{\Delta}}_1=(1,1)$$, $${\it{\Delta}}_2=(1,1)$$, $${\it{\Delta}}_3=(-1,0)$$, $${\it{\Delta}}_4=(0,-1)$$. The Fokker–Planck equation is then given by \begin{eqnarray*} \frac{\partial p}{\partial t}(t,x_1,x_2)&=&-\left[\theta_1+\theta_2{\rm 1}\kern-0.24em{\rm I}_{\{x_1\geq1\}}x_1+\theta_3{\rm 1}\kern-0.24em{\rm I}_{\{x_1\geq1\}}{\rm 1}\kern-0.24em{\rm I}_{\{x_2\geq1\}}x_1x_2+\theta_4{\rm 1}\kern-0.24em{\rm I}_{\{x_2\geq1\}}x_2\right]p(t,x_1,x_2)\\ &&+\left[\theta_1{\rm 1}\kern-0.24em{\rm I}_{\{x_1\geq1\}}{\rm 1}\kern-0.24em{\rm I}_{\{x_2\geq1\}}+\theta_2{\rm 1}\kern-0.24em{\rm I}_{\{x_1\geq2\}}{\rm 1}\kern-0.24em{\rm I}_{\{x_2\geq1\}}(x_1-1)\right]p(t,x_1-1,x_2-1)\\ &&+\theta_3{\rm 1}\kern-0.24em{\rm I}_{\{x_2\geq1\}}(x_1+1)x_2p(t,x_1+1,x_2)\\ &&+\theta_4(x_2+1)p(t,x_1,x_2+1). \end{eqnarray*} Now, we wish to show how the mean of the observable species $$S_1$$ evolves in time by applying Dynkin's formula to $$f(x)=x_1$$.
First, for every $$i\in\{1,\ldots,n_r\}$$, $$-f(x)+f(x+{\it{\Delta}}_i)=-x_1+x_1+{\it{\Delta}}_{i1}={\it{\Delta}}_{i1}.$$ Then \begin{eqnarray*} \mathbb{E}\left[X_1(t)\right]-\mathbb{E}\left[X_1(0)\right]&=&\sum_{i=1}^{n_r}\theta_i{\it{\Delta}}_{i1}\int_0^t\mathbb{E}\left[\prod_{j=1}^{n_s} {\rm 1}\kern-0.24em{\rm I}_{\left\{X_j(s)\geq\nu_{ij}\right\}}C_{X_j(s)}^{\nu_{ij}}\right]ds,\\ &=&\theta_1\,t+\theta_2\int_0^t\mathbb{E}\left[{\rm 1}\kern-0.24em{\rm I}_{\{X_1(s)\geq1\}}X_1(s)\right]ds -\theta_3\int_0^t\mathbb{E}\left[{\rm 1}\kern-0.24em{\rm I}_{\{X_1(s)\geq1\}}{\rm 1}\kern-0.24em{\rm I}_{\{X_2(s)\geq1\}}X_1(s)X_2(s)\right]ds,\\ &=&\theta_1\,t+\theta_2\int_0^t\mathbb{E}\left[X_1(s)\right]ds-\theta_3\int_0^t\mathbb{E}\left[X_1(s)X_2(s)\right]ds. \end{eqnarray*} This is equivalent to saying that $$\dot{m}_1(t)=\theta_1+\theta_2\,m_1(t)-\theta_3\,m_2(t),$$ where $$m_1(t)=\mathbb{E}\left[X_1(t)\right]$$ and $$m_2(t)=\mathbb{E}\left[X_1(t)X_2(t)\right]$$. Clearly then, the ordinary differential equation for a first-order moment (the mean of the observable species) involves a second-order moment (the cross moment $$\mathbb{E}\left[X_1(t)X_2(t)\right]$$ of the two species).

### 3.2. Interaction networks in biochemistry

As a second case study, we choose gene expression in budding yeast transiently induced by the high-osmolarity glycerol (HOG) pathway (Hohmann, 2002; Pelet et al., 2011; Zechner et al., 2012). The yeast cells induce the MAPK Hog1 signaling cascade upon hyper-osmotic shocks. Signaling pathways are activated for only a short time, during which the role of Hog1, a mitogen-activated protein kinase (MAPK), is twofold. First, in the cytoplasm, Hog1 phosphorylates its substrate to increase the internal concentration of glycerol in the cell. Secondly and in parallel, a large fraction of the active Hog1 translocates to the nucleus where it triggers the activation of transcription for roughly 300 genes.
The promoter of the sugar transporter-like protein 1 (pSTL1) is a Hog1 expression target driving the expression of a fluorescent protein (quadrupleVenus-qV). This protein is used as a reporter to quantify the amount of transcription. Once the internal glycerol concentration allows the cell to balance the external osmotic pressure, the pathway is deactivated, leading to a rapid termination of the transcription. A bimodality in the protein expression profiles is observed; this stems from the transient activation of the MAPK Hog1 in conjunction with slow transcription activation by the promoter. The system described above is depicted in Fig. 1.

Fig. 1. MAPK Hog1 induced pSTL1-qV expression. Osmotic pressure is sensed at the membrane and results in the activation of the MAPK signaling cascade. Once active, double-phosphorylated MAPK Hog1 translocates to the nucleus, where it can bind via transcription factors to the pSTL1 promoter. Then remodelling of the chromatin structure allows for efficient transcription of mRNA, which is exported from the nucleus to yield expression of the fluorescent reporter pSTL1-qV.
A simple model of transiently induced gene expression is given by the following interaction network: \begin{align*} S_1&\longrightarrow\emptyset\\ S_1+S_2&\longrightarrow S_3\\ S_3&\longrightarrow S_1+S_2\\ S_3&\longrightarrow S_3+S_4 \end{align*} The model consists of four species: a transcription factor $$S_1$$, the gene $$S_2$$, the gene with bound transcription factor $$S_3$$ and the produced protein $$S_4$$. Here the binding of $$S_1$$ and $$S_2$$ (to form $$S_3$$) aggregates all necessary steps involved in gene activation such as the binding of transcription factors, polymerase binding, or chromatin remodelling. Also, protein synthesis is reduced to a first-order production, abstracting messenger RNA (mRNA) transcription and degradation, translation and protein folding. In this case, $$n_s=4$$, $$n_r=4$$, $$\nu_{11}=1$$, $$\nu_{12}=0$$, $$\nu_{13}=0$$, $$\nu_{14}=0$$, $$\mu_{11}=0$$, $$\mu_{12}=0$$, $$\mu_{13}=0$$, $$\mu_{14}=0$$, $$\nu_{21}=1$$, $$\nu_{22}=1$$, $$\nu_{23}=0$$, $$\nu_{24}=0$$, $$\mu_{21}=0$$, $$\mu_{22}=0$$, $$\mu_{23}=1$$, $$\mu_{24}=0$$, $$\nu_{31}=0$$, $$\nu_{32}=0$$, $$\nu_{33}=1$$, $$\nu_{34}=0$$, $$\mu_{31}=1$$, $$\mu_{32}=1$$, $$\mu_{33}=0$$, $$\mu_{34}=0$$, $$\nu_{41}=0$$, $$\nu_{42}=0$$, $$\nu_{43}=1$$, $$\nu_{44}=0$$, $$\mu_{41}=0$$, $$\mu_{42}=0$$, $$\mu_{43}=1$$, $$\mu_{44}=1$$, $${\it{\Delta}}_1=(-1,0,0,0)$$, $${\it{\Delta}}_2=(-1,-1,1,0)$$, $${\it{\Delta}}_3=(1,1,-1,0)$$, $${\it{\Delta}}_4=(0,0,0,1)$$. With respect to the inverse problem, the species which is subject to observation is the protein. Accordingly, we wish to determine the protein expression profiles, that is, the probability distribution of species $$S_4$$; for the following parameter values and initial state, we obtain the plots shown in Fig. 2: $$\theta_1=0.001,\quad \theta_2=0.1,\quad \theta_3=1,\quad \theta_4=0.1,\quad X(0)=(20,5,0,0)^{{\top}}.$$ Fig. 2.
From the top and from left to right: protein expression profiles at $$t=0$$, $$t=10$$, $$t=20$$, $$t=30$$, $$t=40$$, $$t=50$$ time units ($$x$$-axis: number of molecules, $$y$$-axis: probability). The result depicted in Fig. 2 is obtained by Monte Carlo using $$10^{5}$$ simulations of the PDMP $$X(t)$$.

## 4. Parameter identification

The jumping rate out of state $$x$$ is given by $$\lambda(x)=\sum_{i=1}^{n_r}\lambda(x,x+{\it{\Delta}}_i),$$ where $$\lambda(x,x+{\it{\Delta}}_i)=\theta_i\,\prod_{j=1}^{n_s}{\rm 1}\kern-0.24em{\rm I}_{\left\{x_j\geq\nu_{ij}\right\}}\,C_{x_j}^{\nu_{ij}}$$ is the rate of jumping from state $$x$$ to state $$x+{\it{\Delta}}_i$$. We may write $$\lambda(x,x+{\it{\Delta}}_i)$$ in the following expanded form: $$\lambda((x_1,\ldots,x_{n_s}),(x_1+{\it{\Delta}}_{i1},\ldots,x_{n_s}+{\it{\Delta}}_{in_s})),$$ where $${\it{\Delta}}_i=({\it{\Delta}}_{i1},\ldots,{\it{\Delta}}_{in_s})$$. We will make use of the notation $$E_i=\left\{x\in E\textrm{ such that there are jumps of type } i \textrm{ out of }x\right\}$$ for every $$i\in\{1,\ldots,n_r\}$$. We wish to compute the jumping rates between different states of the observable component of the state, $$X_1(t)$$.
First, we must order the $${\it{\Delta}}_{i1}$$, $$i\in\{1,\ldots,n_r\}$$; if they are all nonzero and pairwise distinct, then \begin{eqnarray} \lambda(x_1,x_1+{\it{\Delta}}_{i1})&=&\sum_{x_2,\ldots,\,x_{n_s}} \lambda((x_1,\ldots,x_{n_s}),(x_1+{\it{\Delta}}_{i1},\ldots,x_{n_s}+{\it{\Delta}}_{in_s}))\nonumber\\ &=&\sum_{x_2,\ldots,\,x_{n_s}}\theta_i\,\prod_{j=1}^{n_s}{\rm 1}\kern-0.24em{\rm I}_{\left\{x_j\geq\nu_{ij}\right\}}\,C_{x_j}^{\nu_{ij}}\nonumber\\ &=&{\rm 1}\kern-0.24em{\rm I}_{\left\{x_1\geq\nu_{i1}\right\}}\,C_{x_1}^{\nu_{i1}}\sum_{x_2,\ldots, \,x_{n_s}}\theta_i\,\prod_{j=2}^{n_s}{\rm 1}\kern-0.24em{\rm I}_{\left\{x_j\geq\nu_{ij}\right\}}\,C_{x_j}^{\nu_{ij}}.\label{rate_jump0} \end{eqnarray} (4) Here, the sum over the set $$\left\{x_2,\ldots,\,x_{n_s}\textrm{ such that } (x_1,x_2,\ldots,x_{n_s})\in E_i\right\}$$ is abbreviated with the notation $$\sum_{x_2,\ldots,\,x_{n_s}}$$ in Eq. (4). We introduce the notation $$\tilde{\theta}_i=\sum_{\left\{x_2,\ldots, \,x_{n_s}:\,(x_1,x_2,\ldots,x_{n_s})\in E_i\right\}}\theta_i\,\prod_{j=2}^{n_s}{\rm 1}\kern-0.24em{\rm I}_{\left\{x_j\geq\nu_{ij}\right\}}\,C_{x_j}^{\nu_{ij}}$$ so as to have $$\lambda(x_1,x_1+{\it{\Delta}}_{i1})=\tilde{\theta}_i\,{\rm 1}\kern-0.24em{\rm I}_{\left\{x_1\geq\nu_{i1}\right\}}\,C_{x_1}^{\nu_{i1}}.$$ If, for instance, the nonzero $${\it{\Delta}}_{i1}$$'s are all distinct except two, say $${\it{\Delta}}_{11}={\it{\Delta}}_{21}$$, then $$\lambda(x_1,x_1+{\it{\Delta}}_{11})=\lambda(x_1,x_1+{\it{\Delta}}_{21})=\tilde{\theta}_1\,{\rm 1}\kern-0.24em{\rm I}_{\left\{x_1\geq\nu_{11}\right\}}\,C_{x_1}^{\nu_{11}}+\tilde{\theta}_2\,{\rm 1}\kern-0.24em{\rm I}_{\left\{x_1\geq\nu_{21}\right\}}\,C_{x_1}^{\nu_{21}}.$$ In general, we have neither $${\it{\Delta}}_{i1}\neq0$$ for every $$i\in\{1,\ldots,n_r\}$$ nor pairwise distinct $${\it{\Delta}}_{11},\ldots,{\it{\Delta}}_{n_r1}$$, and the jump times of $$X(t)$$ out of state $$x\in\mathbb{N}^{n_s}$$ where its observable component $$X_1(t)$$
jumps as well have intensity $$\label{intensity} \lambda(x)=\sum_{\{i\in\{1,\ldots,n_r\}:\,{\it{\Delta}}_{i1}\neq0\}}\lambda(x,x+{\it{\Delta}}_i).$$ (5) In other words, if we denote by $$\tilde{T}_1$$ the first jump time of both $$X(t)$$ and $$X_1(t)$$, then we have $$\mathbb{P}\left\{\tilde{T}_1 > t\,|\,\mathcal{F}_t\right\}=\exp\left\{-\int_0^t \sum_{\{i\in\{1,\ldots,n_r\}:\,{\it{\Delta}}_{i1}\neq0\}}\lambda(X(s^{\,-}),X(s^{\,-})+{\it{\Delta}}_i)\,ds\right\}\!.$$ Taking into account Eq. (5), we deduce by a similar argument that the intensity of jumps for $$X_1(t)$$, out of state $$x\in\mathbb{N}$$, is given by \begin{eqnarray*} \lambda(x)&=&\sum_{\{i\in\{1,\ldots,n_r\}:\,{\it{\Delta}}_{i1}\neq0\}}\,\,\sum_{\left\{x_2,\ldots, \,x_{n_s}:\,(x,x_2,\ldots,x_{n_s})\in E_i\right\}}\lambda((x,x_2,\ldots,x_{n_s}),(x,x_2,\ldots,x_{n_s})+{\it{\Delta}}_i),\\ &&\\ &=&\sum_{\{i\in\{1,\ldots,n_r\}:\,{\it{\Delta}}_{i1}\neq0\}}\tilde{\theta}_i\,{\rm 1}\kern-0.24em{\rm I}_{\left\{x\geq\nu_{i1}\right\}}\,C_{x}^{\nu_{i1}}. \end{eqnarray*} It appears clearly from the results obtained so far that $$\theta_i$$ is identifiable if and only if (i) $${\it{\Delta}}_{i1}\neq0$$ and (ii) the enumerable set $$E_i$$ is non-empty and finite. The non-emptiness and finiteness of $$E_i$$, for every $$i\in\{1,\ldots,n_r\}$$, are demonstrated numerically in Section 5. 
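The reduction from $$\theta_i$$ to $$\tilde{\theta}_i$$ is a finite sum over the hidden coordinates. A minimal sketch (in Python rather than the MATLAB used later; a single hidden species with an invented finite state set and invented numerical values stands in for the set $$E_i$$):

```python
from math import comb

def theta_tilde(theta_i, nu_i2, hidden_states):
    """Marginal rate theta~_i = theta_i * sum over the hidden states x2 of
    1{x2 >= nu_i2} * C(x2, nu_i2), for one hidden species (illustrative)."""
    return theta_i * sum(comb(x2, nu_i2)
                         for x2 in hidden_states if x2 >= nu_i2)

# Invented example: theta_i = 0.1, nu_i2 = 1, hidden states {0,...,5};
# the binomial coefficients C(x2, 1) = x2 sum to 15, so theta~_i = 1.5.
print(theta_tilde(0.1, 1, range(6)))
```

The same pattern extends to several hidden species by nesting the sums over their state sets.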
Now that we have computed the jumping rates between different states for $$X_1(t)$$, let us introduce the square matrices $${\it{\Lambda}}=\left(\begin{array}{ccccc}-\sum_{x'\neq 0}\lambda(0,x')&\lambda(1,0)&\cdots&\lambda(x,0)&\cdots\\[3pt] \lambda(0,1)&-\sum_{x'\neq 1}\lambda(1,x')&\cdots&\lambda(x,1)&\cdots\\[3pt] \lambda(0,2)&\lambda(1,2)&\cdots&\lambda(x,2)&\cdots\\[3pt] \vdots&\vdots&\cdots&\vdots&\cdots\\[3pt] \lambda(0,x-1)&\lambda(1,x-1)&\cdots&\lambda(x,x-1)&\cdots\\[3pt] \lambda(0,x)&\lambda(1,x)&\cdots&-\sum_{x'\neq x}\lambda(x,x')&\cdots\\[3pt] \lambda(0,x+1)&\lambda(1,x+1)&\cdots&\lambda(x,x+1)&\cdots\\[3pt] \vdots&\vdots&\cdots&\vdots&\cdots \end{array}\right)\!,$$ $$Q^{{\top}}=\left(\begin{array}{ccccc}0&q(1,0)&\cdots&q(x,0)&\cdots\\[3pt] q(0,1)&0&\cdots&q(x,1)&\cdots\\[3pt] q(0,2)&q(1,2)&\cdots&q(x,2)&\cdots\\[3pt] \vdots&\vdots&\cdots&\vdots&\cdots\\[3pt] q(0,x-1)&q(1,x-1)&\cdots&q(x,x-1)&\cdots\\[3pt] q(0,x)&q(1,x)&\cdots&0&\cdots\\[3pt] q(0,x+1)&q(1,x+1)&\cdots&q(x,x+1)&\cdots\\[3pt] \vdots&\vdots&\cdots&\vdots&\cdots \end{array}\right)\!,$$ where $$q(x,x')=\frac{\lambda(x,x')}{\lambda(x)},$$ for $$x'\neq x$$, $$x,x'\in\mathbb{N}$$ and $$\lambda(x)=\sum_{\begin{array}{c}x'\neq x\\x'\in\mathbb{N}\end{array}} \lambda(x,x').$$ Note that every column of $${\it{\Lambda}}$$ sums to 0 and every row of $$Q$$ sums to 1; $${\it{\Lambda}}$$ and $$Q$$ are called, respectively, the transition rate matrix and the transition probability matrix of $$X_1(t)$$. Since in our application we have at most $$n_r$$ types of jumps out of each $$x\in\mathbb{N}$$, $${\it{\Lambda}}$$ and $$Q$$ are sparse matrices. Besides, knowing $$\lambda(x)$$ for every $$x\in\mathbb{N}$$, there is a one-to-one correspondence between the entries of the two matrices $${\it{\Lambda}}$$ and $$Q$$. 
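The correspondence between $${\it{\Lambda}}$$ and $$Q$$ can be checked numerically: with $$D=\mathrm{diag}(\lambda(0),\lambda(1),\ldots)$$ one has $${\it{\Lambda}}=(Q^{\top}-I)\,D$$. A sketch on a small truncated state space with a hypothetical birth–death rate function (all rate values invented; Python stands in for the MATLAB used in Section 5):

```python
import numpy as np

# Hypothetical rates lambda(x, x') on the truncated state space {0,...,4}.
n = 5
lam_pair = np.zeros((n, n))              # lam_pair[x, x'] = lambda(x, x')
for x in range(n):
    if x + 1 < n:
        lam_pair[x, x + 1] = 0.3 + 0.2 * x   # upward jump rate (made up)
    if x - 1 >= 0:
        lam_pair[x, x - 1] = 0.5 * x         # downward jump rate (made up)

lam = lam_pair.sum(axis=1)               # lambda(x): total rate out of x

# Lambda: column x carries the rates out of x, with diagonal -lambda(x);
# Q: embedded jump chain with q(x, x') = lambda(x, x') / lambda(x).
Lambda = lam_pair.T - np.diag(lam)
Q = lam_pair / lam[:, None]

assert np.allclose(Lambda.sum(axis=0), 0.0)   # columns of Lambda sum to 0
assert np.allclose(Q.sum(axis=1), 1.0)        # rows of Q sum to 1
# One-to-one correspondence given lambda(x): Lambda = (Q^T - I) diag(lambda)
assert np.allclose(Lambda, (Q.T - np.eye(n)) @ np.diag(lam))
```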
We now write the probability distribution of $$X_1(t)$$, $$t\in\mathbb{R}_+$$, as $$p(t,x)=\mathbb{P}\left\{X_1(t)=x\right\}\quad\quad x\in\mathbb{N},$$ and form the column vector $$\textbf{p}(t)=\left(\begin{array}{c}p(t,0)\\p(t,1)\\\vdots\end{array}\right)\!.$$ Proposition 2 For every $$t\in\mathbb{R}_+$$, the probability distribution $$\textbf{p}(t)$$ of $$X_1(t)$$ is uniquely determined by the transition rate matrix $${\it{\Lambda}}$$ and the initial distribution $$\textbf{p}(0)$$. Proof. The generator of $$X_1(t)$$ is given by $$\mathcal{G}f(x)=-\lambda(x)f(x)+\sum_{\begin{array}{c}x'\neq x\\x'\in\mathbb{N}\end{array}} \lambda(x,x') f(x').$$ The Fokker–Planck operator of $$X_1(t)$$ is given by $$\mathcal{L}_{\textrm{FP}}\,p(t,x)=-\lambda(x)p(t,x)+\sum_{\begin{array}{c}x'\neq x\\x'\in\mathbb{N}\end{array}} \lambda(x',x)p(t,x').$$ The transient distribution $$p(t,x)$$ solves the following Fokker–Planck equation: $$\dot{p}(t,x)=\mathcal{L}_{\textrm{FP}}\,p(t,x) \quad\quad t>0,$$ with the initial condition $$p(0,x)=\mathbb{P}\left\{X_1(0)=x\right\}\!.$$ In compact notation the Fokker–Planck equation is rewritten as $$\dot{\textbf{p}}(t)={\it{\Lambda}}\,\textbf{p}(t) \quad\quad t>0.$$ The initial condition is a known distribution $$\textbf{p}(0)$$ on $$\mathbb{N}$$. Since the matrix $${\it{\Lambda}}$$ is constant in time, the unique solution to the Fokker–Planck equation is $$\label{solution} \textbf{p}(t)=\exp\left\{t\,{\it{\Lambda}}\right\}\textbf{p}(0) \quad\quad t\in\mathbb{R}_+.$$ (6) □ Now we are ready to address the identification problem. We have observations of $$X_1(t)$$ at irregularly spaced instants $$t_1,...,t_K$$; these are deterministic and known. 
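Equation (6) is directly computable once $${\it{\Lambda}}$$ is assembled. A minimal sketch with a hypothetical $$3\times3$$ generator (all entries invented; SciPy's `expm` computes the matrix exponential):

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical rate matrix: off-diagonal entries nonnegative, columns sum to 0.
Lambda = np.array([[-0.3,  0.5,  0.0],
                   [ 0.3, -1.0,  1.0],
                   [ 0.0,  0.5, -1.0]])
p0 = np.array([1.0, 0.0, 0.0])        # known initial distribution

p_t = expm(10.0 * Lambda) @ p0        # p(t) = exp{t Lambda} p(0), Eq. (6)

assert abs(p_t.sum() - 1.0) < 1e-9    # still a probability distribution
assert p_t.min() > 0
```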
And for every observation time $$t_k$$, we have a large number $$N$$ of independent replicates for $$X_1(t_k)$$, written $$X_1(t_k,\omega_1),\ldots,X_1(t_k,\omega_N).$$ The unknown parameters $$\theta_i$$, $$i\in\{1,\ldots,n_r\}$$, are then estimated by minimizing the following criterion: $$\sum_x\sum_{k=1}^K\left[p(t_k,x)-\hat{p}(t_k,x)\right]^2,$$ where $$\hat{p}(t_k,x)=\frac{1}{N}\sum_{n=1}^N {\rm 1}\kern-0.24em{\rm I}_{\left\{X_1(t_k,\omega_n)=x\right\}} \quad\quad x\in\mathbb{N}$$ is the empirical probability distribution of $$X_1(t_k)$$, $$k\in\left\{1,\ldots,K\right\}$$. Equation (6) provides us with all that we need to compute the criterion for given values of the $$\theta_i$$’s in ]0,1]. The last ingredient in parameter identification is a truncation: we wish to truncate the state space $$\mathbb{N}$$ to a bounded region $$\{0,1,\ldots,x_{{\textrm{max}}}\}$$; this is the set containing the observed values of $$X_1(t)$$. As a matter of fact, limiting the observation to a finite duration is strictly equivalent to a truncation of the state space for the observable process. We need a theoretical framework to use the truncation without any approximation. Here we invoke the theory of censored Markov chains, which represents the conditional behaviour of a system within a subset of observed states. Bounding the state space for the observable component $$X_1(t)$$ is not only essential to perform the identification; it also reflects the fact that the population of species $$S_1$$ cannot grow arbitrarily large within the maximum allowed observation duration, which is quite natural in our application. Often, in biological processes, we have low populations (e.g., the key molecules are in low copy number) and concentrations are not well defined. 
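The empirical distribution $$\hat{p}(t_k,\cdot)$$ is simply a normalized histogram of the replicates. A short sketch (synthetic Poisson draws stand in for the measured copy numbers, which are not available here):

```python
import numpy as np

# Hypothetical replicates X_1(t_k, omega_n), n = 1..N (synthetic stand-in).
rng = np.random.default_rng(0)
replicates = rng.poisson(3.0, size=1000)

# Empirical pmf: hat p(t_k, x) = (1/N) sum_n 1{X_1(t_k, omega_n) = x}.
x_max = replicates.max()
p_hat = np.bincount(replicates, minlength=x_max + 1) / len(replicates)

assert abs(p_hat.sum() - 1.0) < 1e-12
```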
Consider the following partition for the state space $$\mathbb{N}$$ of $$X_1(t)$$: $$\mathfrak{D}=\{0,1,\ldots,x_{{\textrm{max}}}\}\quad\quad \mathbb{N}\setminus\mathfrak{D}$$ so as to obtain the following block description of the transition probability matrix of $$X_1(t)$$: $$Q=\left(\begin{array}{cc}Q_{11}&Q_{12}\\ Q_{21}&Q_{22}\end{array}\right)\!.$$ The blocks $$Q_{11}$$, $$Q_{12}$$, $$Q_{21}$$ and $$Q_{22}$$ correspond respectively to jumps within $$\mathfrak{D}$$, jumps from $$\mathfrak{D}$$ to $$\mathbb{N}\setminus\mathfrak{D}$$, jumps from $$\mathbb{N}\setminus\mathfrak{D}$$ to $$\mathfrak{D}$$ and jumps within $$\mathbb{N}\setminus\mathfrak{D}$$. The censored chain with censoring set $$\mathfrak{D}$$ observes only the states in $$\mathfrak{D}$$. More precisely, let $$\tau_k$$, $$k\in\{1,2,\ldots\}$$, be the jump times of $$X_1(t)$$ relative to jumps between states within $$\mathfrak{D}$$, that is, for every $$k\in\{1,2,\ldots\}$$, $$X_1(\tau_k^{\,-})$$ and $$X_1(\tau_k)$$ belong to $$\mathfrak{D}$$. The discrete-time process $$(X_1(\tau_k),k\in\{1,2,\ldots\})$$ is called the censored chain with censoring set $$\mathfrak{D}$$; its transition probability matrix is equal to $$\label{CMC} Q_{{\textrm{cens}}}=Q_{11}+Q_{12}(I-Q_{22})^{-1}Q_{21}.$$ (7) $$Q_{{\textrm{cens}}}$$ is the so-called stochastic complement of matrix $$Q_{11}$$ (Bušić et al., 2012). Then, knowing $$\lambda(x)$$ for every $$x\in\{0,1,\ldots,x_{{\textrm{max}}}\}$$, we derive the transition rate matrix $${\it{\Lambda}}_{{\textrm{cens}}}$$ for the censored chain with censoring set $$\mathfrak{D}$$. 
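Equation (7) is a block computation, and the resulting stochastic complement is again a stochastic matrix. A minimal sketch with a hypothetical four-state jump chain censored to its first two states (all transition probabilities invented):

```python
import numpy as np

def stochastic_complement(Q, m):
    """Censored-chain transition matrix of Eq. (7) for the censoring set
    {0,...,m-1}: Q_cens = Q11 + Q12 (I - Q22)^{-1} Q21."""
    Q11, Q12 = Q[:m, :m], Q[:m, m:]
    Q21, Q22 = Q[m:, :m], Q[m:, m:]
    n2 = Q22.shape[0]
    return Q11 + Q12 @ np.linalg.solve(np.eye(n2) - Q22, Q21)

# Hypothetical 4-state jump chain (rows sum to 1), censored to {0, 1}.
Q = np.array([[0.0, 0.6, 0.4, 0.0],
              [0.5, 0.0, 0.3, 0.2],
              [0.7, 0.2, 0.0, 0.1],
              [0.0, 0.8, 0.1, 0.1]])
Qc = stochastic_complement(Q, 2)
assert np.allclose(Qc.sum(axis=1), 1.0)   # the censored chain is stochastic
```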
Finally, parameter identification reduces to the following optimization problem: $$\label{optimization_problem} \min_{\theta_1,\ldots,\theta_{n_r}\in]0,1]}\,\sum_{x=0}^{x_{{\textrm{max}}}}\sum_{k=1}^K\left[p(t_k,x)-\hat{p}(t_k,x)\right]^2,$$ (8) where $$\hat{p}(t_k,x)=\frac{1}{N}\sum_{n=1}^N {\rm 1}\kern-0.24em{\rm I}_{\left\{X_1(t_k,\omega_n)=x\right\}} \quad\quad x\in\{0,1,\ldots,x_{{\textrm{max}}}\},$$ and $$\left(\begin{array}{c}p(t_k,0)\\\vdots\\p(t_k,x_{{\textrm{max}}})\end{array}\right)=\exp\left\{t_k\,{\it{\Lambda}}_{{\textrm{cens}}}\right\}\left(\begin{array}{c}p(0,0)\\\vdots\\p(0,x_{{\textrm{max}}})\end{array}\right) \quad\quad t_k\in\{t_1,...,t_K\}.$$ Remark. It is possible to consider loss functions other than the squared difference in Eq. (8), e.g., the Kullback–Leibler divergence. Lastly, it is worth noting that observing a single species, and accordingly considering a one-dimensional observation process, is a simplification rather than a limitation. If several components of the state are observable, the generalization of our method for parameter identification is straightforward. 5. Case study continued Let us go back to the first case study from ecological engineering. The purpose of this section is to complete and illustrate our approach to parameter identification by implementing it. 
In this case the state $$X(t)=(X_1(t),X_2(t))^{{\top}}$$ has three types of jumps with rates given by \begin{eqnarray*} \lambda((x_1,x_2),(x_1+1,x_2+1))&=&\theta_1{\rm 1}\kern-0.24em{\rm I}_{\left\{x_1\geq0\right\}}\,C_{x_1}^{0}{\rm 1}\kern-0.24em{\rm I}_{\left\{x_2\geq0\right\}}\,C_{x_2}^{0}+ \theta_2{\rm 1}\kern-0.24em{\rm I}_{\left\{x_1\geq1\right\}}\,C_{x_1}^{1}{\rm 1}\kern-0.24em{\rm I}_{\left\{x_2\geq0\right\}}\,C_{x_2}^{0},\\ \lambda((x_1,x_2),(x_1-1,x_2))&=&\theta_3{\rm 1}\kern-0.24em{\rm I}_{\left\{x_1\geq1\right\}}\,C_{x_1}^{1}{\rm 1}\kern-0.24em{\rm I}_{\left\{x_2\geq1\right\}}\,C_{x_2}^{1},\\ \lambda((x_1,x_2),(x_1,x_2-1))&=&\theta_4{\rm 1}\kern-0.24em{\rm I}_{\left\{x_1\geq0\right\}}\,C_{x_1}^{0}{\rm 1}\kern-0.24em{\rm I}_{\left\{x_2\geq1\right\}}\,C_{x_2}^{1}, \end{eqnarray*} i.e., \begin{eqnarray*} \lambda((x_1,x_2),(x_1+1,x_2+1))&=&\theta_1+\theta_2{\rm 1}\kern-0.24em{\rm I}_{\left\{x_1\geq1\right\}}\,x_1,\\ \lambda((x_1,x_2),(x_1-1,x_2))&=&\theta_3{\rm 1}\kern-0.24em{\rm I}_{\left\{x_1\geq1\right\}}{\rm 1}\kern-0.24em{\rm I}_{\left\{x_2\geq1\right\}}\,x_1x_2,\\ \lambda((x_1,x_2),(x_1,x_2-1))&=&\theta_4{\rm 1}\kern-0.24em{\rm I}_{\left\{x_2\geq1\right\}}\,x_2. \end{eqnarray*} Hence, the observable component of the state $$X_1(t)$$ has two types of jumps with the following rates: \begin{eqnarray*} \lambda(x,x+1)&=&\tilde{\theta}_1+\tilde{\theta}_2{\rm 1}\kern-0.24em{\rm I}_{\left\{x\geq1\right\}}\,x,\\ \lambda(x,x-1)&=&\tilde{\theta}_3{\rm 1}\kern-0.24em{\rm I}_{\left\{x\geq1\right\}}\,x. 
\end{eqnarray*} Without truncation the transition rate matrix and the transition probability matrix of $$X_1(t)$$ are given by $${\it{\Lambda}}=\left(\begin{array}{cccccc}-\lambda(0,1)&\lambda(1,0)&0&\cdots\cdots&0&\cdots\\ \lambda(0,1)&-\lambda(1,0)-\lambda(1,2)&\lambda(2,1)&&\vdots&\cdots\\ 0&\lambda(1,2)&-\lambda(2,1)-\lambda(2,3)&&\vdots&\cdots\\ \vdots&0&\lambda(2,3)&&\vdots&\cdots\\ \vdots&\vdots&0&&\lambda(x_{{\textrm{max}}},x_{{\textrm{max}}}-1)&\cdots\\ \vdots&\vdots&\vdots&&-\lambda(x_{{\textrm{max}}},x_{{\textrm{max}}}-1)-\lambda(x_{{\textrm{max}}},x_{{\textrm{max}}}+1)&\cdots\\ \vdots&\vdots&\vdots&&\lambda(x_{{\textrm{max}}},x_{{\textrm{max}}}+1)&\cdots\\ \vdots&\vdots&\vdots&&0&\cdots\\ \vdots&\vdots&\vdots&&\vdots&\cdots \end{array}\right)\!,$$ $$Q^{{\top}}=\left(\begin{array}{cccccc}0&\frac{\lambda(1,0)}{\lambda(1,0)+\lambda(1,2)}&0&\cdots\cdots&0&\cdots\\ 1&0&\frac{\lambda(2,1)}{\lambda(2,1)+\lambda(2,3)}&&\vdots&\cdots\\ 0&\frac{\lambda(1,2)}{\lambda(1,0)+\lambda(1,2)}&0&&\vdots&\cdots\\ \vdots&0&\frac{\lambda(2,3)}{\lambda(2,1)+\lambda(2,3)}&&\vdots&\cdots\\ \vdots&\vdots&0&&\frac{\lambda(x_{{\textrm{max}}},x_{{\textrm{max}}}-1)}{\lambda(x_{{\textrm{max}}},x_{{\textrm{max}}}-1)+\lambda(x_{{\textrm{max}}},x_{{\textrm{max}}}+1)}&\cdots\\ \vdots&\vdots&\vdots&&0&\cdots\\ \vdots&\vdots&\vdots&&\frac{\lambda(x_{{\textrm{max}}},x_{{\textrm{max}}}+1)}{\lambda(x_{{\textrm{max}}},x_{{\textrm{max}}}-1)+\lambda(x_{{\textrm{max}}},x_{{\textrm{max}}}+1)}&\cdots\\ \vdots&\vdots&\vdots&&0&\cdots\\ \vdots&\vdots&\vdots&&\vdots&\cdots \end{array}\right)\!.$$ In the present case we are able to build the four blocks $$Q_{11}^{{\top}}$$, $$Q_{12}^{{\top}}$$, $$Q_{21}^{{\top}}$$ and $$Q_{22}^{{\top}}$$; it follows from Eq. 
(7) that $$Q_{{\textrm{cens}}}^{{\top}}=\left(\begin{array}{cccccc}0&q(1,0)&0&\cdots\cdots&0\\ 1&0&q(2,1)&&\vdots\\ 0&q(1,2)&0&&\vdots\\ \vdots&0&q(2,3)&&\vdots\\ \vdots&\vdots&0&&q(x_{{\textrm{max}}},x_{{\textrm{max}}}-1)\\ \vdots&\vdots&\vdots&&\propto q(x_{{\textrm{max}}},x_{{\textrm{max}}}+1)q(x_{{\textrm{max}}}+1,x_{{\textrm{max}}}) \end{array}\right)\!.$$ Here it appears clearly that we must have $$q(x_{{\textrm{max}}},x_{{\textrm{max}}}+1) =\frac{\lambda(x_{{\textrm{max}}},x_{{\textrm{max}}}+1)}{\lambda(x_{{\textrm{max}}},x_{{\textrm{max}}}-1)+\lambda(x_{{\textrm{max}}},x_{{\textrm{max}}}+1)}=0,$$ and, consequently, $$q(x_{{\textrm{max}}},x_{{\textrm{max}}}-1) =\frac{\lambda(x_{{\textrm{max}}},x_{{\textrm{max}}}-1)}{\lambda(x_{{\textrm{max}}},x_{{\textrm{max}}}-1)+\lambda(x_{{\textrm{max}}},x_{{\textrm{max}}}+1)}=1.$$ In other words, the truncation for the state space $$\mathbb{N}$$ of the observation process $$X_1(t)$$ turns out to be equivalent to killing or stopping $$X_1(t)$$ at $$\tau$$, the hitting time of $$x_{{\textrm{max}}}+1$$, and considering $$X_1(t)$$ on $$[0,\tau[$$. Finally, knowing $$\lambda(x)$$ for every $$x\in\mathfrak{D}$$, we derive the transition rate matrix for the censored chain with censoring set $$\mathfrak{D}$$, namely $${\it{\Lambda}}_{{\textrm{cens}}}=\left(\begin{array}{cccccc}-\lambda(0,1)&\lambda(1,0)&0&\cdots\cdots&0\\ \lambda(0,1)&-\lambda(1,0)-\lambda(1,2)&\lambda(2,1)&&\vdots\\ 0&\lambda(1,2)&-\lambda(2,1)-\lambda(2,3)&&\vdots\\ \vdots&0&\lambda(2,3)&&\vdots\\ \vdots&\vdots&0&&\lambda(x_{{\textrm{max}}},x_{{\textrm{max}}}-1)\\ \vdots&\vdots&\vdots&&-\lambda(x_{{\textrm{max}}},x_{{\textrm{max}}}-1) \end{array}\right)\!.$$ To complete the general statements of the previous section about the solution to the inverse problem, we now provide a concrete implementation for numerical verification of the non-emptiness and finiteness of $$E_i$$, for every $$i\in\{1,\ldots,n_r\}$$. 
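Putting the pieces together, the identification step for this birth–death observable can be sketched end to end (Python with SciPy standing in for the MATLAB `lsqnonlin` used below; the truncation level, observation times and "true" $$\tilde{\theta}$$ values are all invented for the sketch, and exact distributions stand in for the histograms):

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import least_squares

X_MAX = 20                    # truncation level (invented, small for speed)
TIMES = [1.0, 2.0, 4.0]       # observation instants (invented)

def generator(theta):
    """Censored rate matrix Lambda_cens for the birth-death observable with
    lambda(x, x+1) = theta[0] + theta[1]*x and lambda(x, x-1) = theta[2]*x."""
    n = X_MAX + 1
    L = np.zeros((n, n))
    for x in range(n):
        up = theta[0] + theta[1] * x if x < X_MAX else 0.0
        down = theta[2] * x
        if x < X_MAX:
            L[x + 1, x] = up          # column x carries the rates out of x
        if x > 0:
            L[x - 1, x] = down
        L[x, x] = -(up + down)
    return L

def distributions(theta):
    p0 = np.zeros(X_MAX + 1)
    p0[0] = 1.0                       # known initial distribution (invented)
    return np.stack([expm(t * generator(theta)) @ p0 for t in TIMES])

theta_true = np.array([0.9, 0.6, 1.7])   # invented "true" theta~ values
data = distributions(theta_true)         # exact p(t_k, .) stands in for p-hat

fit = least_squares(lambda th: (distributions(th) - data).ravel(),
                    x0=[0.5, 0.5, 0.5], bounds=(1e-6, 5.0))
print(fit.x)                             # should be close to theta_true
```

With noisy empirical histograms in place of `data`, the same code realizes the criterion of Eq. (8).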
The implementation details are as follows: $$X_1(0)$$ and $$X_2(0)$$ are discrete uniform random numbers from $$\{0,1\}$$; $$\theta_1=0.1$$, $$\theta_2=0.1$$, $$\theta_3=0.1$$, $$\theta_4=0.1$$ per time unit; $$x_{{\textrm{max}}}=100$$ individuals from the observable species. Figure 3 shows a particular trajectory of the two-dimensional PDMP $$X(t)$$. Fig. 3. Arbitrary sample paths of the species’ population size. The empirical probability distributions of $$X_1(t)$$ observed at $$t_1=10$$, $$t_2=20$$, $$t_3=30$$, $$t_4=40$$, $$t_5=50$$ time units are displayed in chronological order as a column of histograms in Fig. 4. These histograms are compared with the corresponding probability distributions computed using the estimated parameters; the obtained distributions are shown as red stems in the subplots of Fig. 4. A solution to the optimization problem given by Eq. (8) is obtained using the optimization toolbox of MATLAB; MATLAB’s solvers address the optimization problem in the weak sense of finding a locally optimal point. lsqnonlin, for instance, requires an initial guess for the optimization variables $$(\tilde{\theta}_1,\tilde{\theta}_2,\tilde{\theta}_3)$$. This initial guess or starting point is critical, and can greatly affect the objective value of the local solution obtained. In this case, setting the initial guess to the zero lower bound yields the following estimates: $$\tilde{\theta}_1=0.9838,\quad\tilde{\theta}_2=0.6035,\quad\tilde{\theta}_3=1.7237.$$ Fig. 4. From the top: the empirical distributions (white bars) versus the estimated ones (red stems) for $$t_1=10$$, $$t_2=20$$, $$t_3=30$$, $$t_4=40$$, $$t_5=50$$ time units ($$x$$-axis: number of individuals from the observed species; $$y$$-axis: probability). 
Hence, \begin{eqnarray*} \frac{\tilde{\theta}_1}{\theta_1}=\sum_{\left\{x_2:\,(x_1,x_2)\in E_1\right\}}{\rm 1}\kern-0.24em{\rm I}_{\left\{x_2\geq\nu_{12}\right\}}\,C_{x_2}^{\nu_{12}}&\approx&10,\\ \frac{\tilde{\theta}_2}{\theta_2}=\sum_{\left\{x_2:\,(x_1,x_2)\in E_2\right\}}{\rm 1}\kern-0.24em{\rm I}_{\left\{x_2\geq\nu_{22}\right\}}\,C_{x_2}^{\nu_{22}}&\approx&6,\\ \frac{\tilde{\theta}_3}{\theta_3}=\sum_{\left\{x_2:\,(x_1,x_2)\in E_3\right\}}{\rm 1}\kern-0.24em{\rm I}_{\left\{x_2\geq\nu_{32}\right\}}\,C_{x_2}^{\nu_{32}}&\approx&17, \end{eqnarray*} where $$E_i=\left\{x\in E:\textrm{ there are jumps of type i out of }x\right\}\quad\quad i\in\{1,2,3\}.$$ Thus, we have checked that for every $$i$$ for which $${\it{\Delta}}_{i1}\neq0$$ the enumerable set $$E_i$$ is non-empty and finite. 5.1. Artificial neural networks for parameter calibration In the remainder, we address the question of how to proceed when we have at hand only the observed histograms and the initial probability distribution of the process $$X(t)$$. Here the outstanding issue is to find a triplet $$(\theta_1,\theta_2,\theta_3)$$ producing any desired triplet $$(\tilde{\theta}_1,\tilde{\theta}_2,\tilde{\theta}_3)$$. This can be addressed by learning the inverse of the function $$f({\it{\Theta}})=\tilde{{\it{\Theta}}},$$ where $${\it{\Theta}}=(\theta_1,\theta_2,\theta_3)$$ and $$\tilde{{\it{\Theta}}}=(\tilde{\theta}_1,\tilde{\theta}_2,\tilde{\theta}_3)$$. However, if two different $${\it{\Theta}}$$ vectors have the same $$\tilde{{\it{\Theta}}}$$ vector, then an inverse neural network would end up learning the average of the two rather than learning one or the other. 
We must therefore assume the function $$f$$ to be bijective, and we propose a method composed of two steps: First, we perform Monte-Carlo simulations of the system for $$N$$ independent and identically distributed samples $$(\theta_1,\theta_2,\theta_3)_n$$, $$n\in\{1,\ldots,N\}$$, drawn from the uniform distribution on $$]0,1]^3$$. For each $$n\in\{1,\ldots,N\}$$, we perform the optimization using the simulated data so as to obtain the triplet of estimates $$(\tilde{\theta}_1,\tilde{\theta}_2,\tilde{\theta}_3)_n$$. Now that we have a set of pairs $$\left\{(\theta_1,\theta_2,\theta_3)_n\,,\,(\tilde{\theta}_1,\tilde{\theta}_2,\tilde{\theta}_3)_n\right\} \quad\quad n\in\{1,\ldots,N\},$$ we then approximate the non-linear input–output relationship between the parameters $$(\tilde{\theta}_1,\tilde{\theta}_2,\tilde{\theta}_3)$$ and $$(\theta_1,\theta_2,\theta_3)$$ by supervised learning using artificial neural networks (ANN). Here the inputs are the $$(\tilde{\theta}_1,\tilde{\theta}_2,\tilde{\theta}_3)_n$$’s and the desired targets are the $$(\theta_1,\theta_2,\theta_3)_n$$’s. From these data, it is required to construct a map that generalizes well, that is, given a new value of $$(\tilde{\theta}_1,\tilde{\theta}_2,\tilde{\theta}_3)$$, the map will provide a reasonable prediction for the unobserved output $$(\theta_1,\theta_2,\theta_3)$$ associated with this new value of the predictor variable, at least when it is not too far from the training predictor set. Much interest in ANN has been based on the use of trainable feedforward networks as universal approximators for functions $$f:\mathcal{U}\longrightarrow\mathcal{Y}$$ from the input space $$\mathcal{U}$$ to the output space $$\mathcal{Y}$$. In fact, it has been proved that a fairly simple ANN can fit any practical function. The neural net fitting app of MATLAB makes it easy to create and train a single-hidden-layer feedforward neural net for function approximation and non-linear regression. 
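A single-hidden-layer network of the kind just described can be sketched from scratch (NumPy instead of MATLAB's toolbox; the data-generating map, network size, learning rate and patience are all invented for the sketch, with early stopping on a validation split):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training pairs: targets theta and inputs theta~, linked by an
# invented smooth bijection standing in for the simulated relationship.
theta = rng.uniform(0.01, 1.0, size=(3500, 3))                     # targets
theta_tilde = np.sqrt(theta) + 0.1 * theta.sum(1, keepdims=True)   # inputs

# 70/15/15 split into training, validation and test sets.
idx = rng.permutation(3500)
tr, va, te = idx[:2450], idx[2450:2975], idx[2975:]

# Single hidden layer: 10 sigmoid units, linear output layer.
W1 = rng.normal(0, 0.5, (3, 10)); b1 = np.zeros(10)
W2 = rng.normal(0, 0.5, (10, 3)); b2 = np.zeros(3)

def forward(X):
    H = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))   # sigmoid hidden activations
    return H, H @ W2 + b2                       # linear output

def mse(X, Y):
    return np.mean((forward(X)[1] - Y) ** 2)

best, patience, lr = (mse(theta_tilde[va], theta[va]), None), 0, 0.5
for epoch in range(2000):
    X, Y = theta_tilde[tr], theta[tr]
    H, out = forward(X)
    err = (out - Y) / len(X)                    # scaled MSE gradient
    gW2, gb2 = H.T @ err, err.sum(0)
    dH = err @ W2.T * H * (1 - H)               # backprop through the sigmoid
    gW1, gb1 = X.T @ dH, dH.sum(0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2
    v = mse(theta_tilde[va], theta[va])
    if v < best[0]:
        best, patience = (v, (W1.copy(), b1.copy(), W2.copy(), b2.copy())), 0
    else:
        patience += 1
        if patience >= 50:                      # early stopping
            break
if best[1] is not None:
    W1, b1, W2, b2 = best[1]                    # restore the best weights
print("test MSE:", mse(theta_tilde[te], theta[te]))
```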
Besides, for improving generalization, the default method provided by MATLAB’s neural network toolbox is called early stopping. In this technique the available data are divided into three subsets: training, validation and test sets. The error on the validation set decreases during the initial phase of training, as does the training set error. However, when the network begins to overfit the data, the error on the validation set typically begins to rise. When the validation error increases for a specified number of iterations, the training is stopped and the weights and biases at the minimum of the validation error are returned. After training, the test set is used to estimate the prediction error, that is, how well the network performs on future data. A data set of $$N=3500$$ examples is generated and then randomly divided into three sets: 70% is used for training, 15% for validation (to stop training before overfitting) and 15% for testing the generalization ability of the network. A reasonably satisfactory approximation is obtained from a basic network with 10 neurons and a sigmoid transfer function in the hidden layer and a linear transfer function in the output layer. The network performance computed on the test set, that is, the mean square error (MSE) between the actual output and the target, amounts to 0.0392. A plot of the training MSE, validation MSE and test MSE is shown in Fig. 5. The error histogram in Fig. 6 gives additional performance assessment. A linear regression between the network outputs and the corresponding targets for the training, validation and test sets is displayed in Fig. 7. Fig. 5. Training MSE, validation MSE and test MSE. Fig. 6. Error histogram. 
Fig. 7. Linear regression between the network outputs and the corresponding targets for the training, validation and test sets. 6. Conclusion There is a wide range of applications in the field of stochastic interaction networks, where model identification is theoretically the most challenging problem. We advance the assessment and learning of the functional connections within such a network, using a PDMP as the model. While its underlying assumptions are more realistic, the PDMP lends itself well to the identification problem. We illustrate through two applications of biological network modelling. The conformity between the implementation results and the general statements, along with the low computational cost, demonstrates the performance of our approach.
References
Baili H. & Fleury G. (2002) Statistical characterization of a measurement—approach using MCMC. Proceedings of the IEEE 19th Instrumentation and Measurement Technology Conference, Anchorage, USA, May 21–23.
Baili H. & Fleury G. (2004) Indirect measurement within dynamical context: probabilistic approach to deal with uncertainty. IEEE Trans. Instrum. Meas., 53, 1449–1454.
Bect J., Baili H. & Fleury G. (2006) Generalized Fokker–Planck equation for piecewise-diffusion processes with boundary hitting resets. Proceedings of the 17th International Symposium on Mathematical Theory of Networks and Systems, Kyoto, Japan, July 24–28.
Brémaud P. (1981) Point Processes and Queues: Martingale Dynamics. Springer.
Bušić A., Djafri H. & Fourneau J. M. (2012) Bounded state space truncation and censored Markov chains. IEEE 51st Annual Conference on Decision and Control, Maui, Hawaii, USA, December 10–13.
Gardiner C. W. 
(1985) Handbook of Stochastic Methods for Physics, Chemistry and the Natural Sciences, 2nd edn. Berlin: Springer.
Hegland M. & Garcke J. (2010) On the numerical solution of the chemical master equation with sums of rank one tensors. ANZIAM J., 52, C628–C643.
Hohmann S. (2002) Osmotic stress signaling and osmoadaptation in yeasts. Microbiol. Mol. Biol. Rev., 66, 300–372.
Jacod J. & Skorokhod A. V. (1996) Jumping Markov processes. Ann. I. H. P., 32, 11–67.
Jahnke T. & Huisinga W. (2008) A dynamical low-rank approach to the chemical master equation. Bull. Math. Biol., 70, 2283–2302.
Kazeev V., Khammash M., Nip M. & Schwab C. (2014) Direct solution of the chemical master equation using quantized tensor trains. PLOS Comput. Biol., 10.
Klebaner F. C. (2005) Introduction to Stochastic Calculus with Applications, 2nd edn. London: Imperial College Press.
Kulasiri D. & Verwoerd W. (2002) Stochastic Dynamics. Modelling Solute Transport in Porous Media, 1st edn. Amsterdam: Elsevier Science.
Øksendal B. (2003) Stochastic Differential Equations: An Introduction with Applications, 6th edn. Springer.
Pelet S. et al. (2011) Transient activation of the HOG MAPK pathway regulates bimodal gene expression. Science, 332, 732–735.
Protter P. E. (2005) Stochastic Integration and Differential Equations, 2nd edn. Springer.
Risken H. (1989) The Fokker–Planck Equation: Methods of Solution and Applications, 2nd edn. Springer.
Ruess J., Milias-Argeitis A., Summers S. & Lygeros J. (2011) Moment estimation for chemically reacting systems by extended Kalman filtering. J. Chem. Phys., 135, 165102. 
Zechner C. et al. (2012) Moment-based inference predicts bimodality in transient gene expression. Proc. Natl Acad. Sci. USA, 109, 8340–8345.
© The authors 2017. Published by Oxford University Press on behalf of the Institute of Mathematics and its Applications. All rights reserved.
IMA Journal of Mathematical Control and Information, Oxford University Press

# Parametric identification of stochastic interaction networks

IMA Journal of Mathematical Control and Information, Advance Article – September 11, 2017
24 pages
Publisher: Oxford University Press
ISSN: 0265-0754; eISSN: 1471-6887
DOI: 10.1093/imamci/dnx037

### Abstract

Abstract This article addresses issues from applied stochastic analysis for solving parameter identification problems in interaction networks. Purely discontinuous Markov processes are the most appropriate choice for modelling these systems and it is of paramount importance to estimate the unknown characteristics of the model given the measured data. The model induces a Fokker–Planck–Kolmogorov equation along with moment equations, and achieving parametric identification based on direct solutions of these equations has remained elusive. We propose a novel approach which utilizes stochastic analysis of continuous-time Markov processes to lift the curse of dimensionality in parametric identification. We illustrate through a case study from ecological engineering. 1. Introduction Reaction networks are systems in which the populations of a finite number of species evolve according to predefined interactions. They are an important tool for modelling in several disciplines: biochemistry, epidemiology, pharmacology, ecology and social networks. Consider a system with $$n_s$$ species $$S_1,\ldots,S_{n_s}$$ that may interact according to $$n_r$$ different interactions written in the form $$\nu_{i1}S_1+\cdots+\nu_{in_s}S_{n_s}\longrightarrow \mu_{i1}S_1+\cdots+\mu_{in_s}S_{n_s} \quad\quad i\in\{1,\ldots,n_r\}.$$ $$\nu_{ij}$$ denotes the number of individuals from species $$S_j$$ needed for interaction $$i$$ to occur. The expression on the right-hand side represents the outcome of interaction $$i$$. Here, we assume instantaneous outputs of the interaction into the system. In other words, interaction $$i$$ means for every $$j\in\{1,\ldots,n_s\}$$ an instantaneous jump of amplitude $$\mu_{ij}-\nu_{ij}$$ for the number of individuals from species $$S_j$$. 
Therefore, we do not have necessarily $$\mu_{ij}\neq\nu_{ij}$$ for every $$j\in\{1,\ldots,n_s\}$$, but we must have $$(\mu_{i1},\ldots,\mu_{in_s})\neq(\nu_{i1},\ldots,\nu_{in_s}).$$ Interaction $$i$$ is an event occurring at random times; it occurs only if the required individuals from all the species are available. For $$dt>0$$, let $$\theta_i\,dt$$ be the probability that the required individuals for interaction $$i$$ successfully perform it as and when they meet during the time interval $$[t,t+dt]$$. $$\theta_i$$ is called the rate of success of interaction $$i$$; note that we have assumed the success rates to be independent of $$t$$ (i.e., constant in time). By construction, $$\theta_i>0$$. And without loss of generality, we assume that $$\theta_i\leq1$$; this amounts to a choice of the time unit. The continuous-time process $$X(t)=(X_1(t),\ldots,X_{n_s}(t))^{{\top}} \quad\quad t\in\mathbb{R}_+,$$ where $$X_j(t)$$ denotes the number of individuals from species $$S_j$$ present in the system at time $$t$$, has values in $$\mathbb{N}^{n_s}$$ and right continuous with finite left limits (rcll) sample paths. In this article, we address the inverse problem of parameter identification from measured data. Specifically, we need to estimate the unknown parameters $$\theta_i$$, $$i\in\{1,\ldots,n_r\}$$. The data consists of the population size of only one species, say $$S_1$$, measured at discrete observation times. Overview of this work and bibliographical notes on related work Even to the most casual observer, it should be clear that randomness is inherent in both the order and the timing of the interactions. Stochastic models provide a framework for characterizing this uncertainty, especially when dealing with reactants in low copy numbers. Namely, the evolution of the species of interest is modelled by continuous-time conservative Markov processes. 
Within this framework, the evolution of the probability distribution is given by the Fokker–Planck–Kolmogorov (FPK) equation. When the state space is the nonnegative integer lattice, the FPK equation is frequently referred to as the chemical master (CM) equation. In most cases the FPK equation cannot be solved analytically; numerical tools have been tested and proven reliable for approximating its solution in a few particular cases (Baili & Fleury, 2002, 2004; Bect et al., 2006). Here the challenge is to lift the curse of dimensionality: the exponential growth of computational cost in the dimension (i.e., the number of species) and the mode size (i.e., the maximum copy number). Kazeev et al. (2014) trace the origins of stochastic interaction networks in biochemistry and review recent approaches to dimensionality reduction. A first approach is based on moment closure (Ruess et al., 2011). This approach performs poorly in situations where one of the species exhibits a low population size. A second approach uses a low-parametric representation of tensors called the canonical polyadic (CP) decomposition. This is a generalization of the singular value decomposition (SVD) to tensors of dimension greater than two. The main issue in applying the CP decomposition is to control the tensor rank of the approximate solution to the CM equation. In Hegland & Garcke (2010), it is suggested to recompute a lower-rank CP decomposition after every arithmetic operation. Another approach, Jahnke & Huisinga (2008), consists in projecting the dynamics onto a manifold composed of all tensors with a CP decomposition of some predetermined maximal rank. This results in a set of coupled non-linear ordinary differential equations which are solved using available ODE solvers. The approach proposed in Kazeev et al. 
(2014) is based on a numerical tensor algebra, called the quantized tensor train, that operates on a low-parametric numerical representation of tensors rather than on their CP decomposition. The present article uses very different techniques and yields results without any assumptions or approximations on the solution to the FPK equation. We divide the article into two parts; the technical part is composed of Sections 2 and 4. In this part, we discuss the Markovian model and then derive the consequences of this model. Of interest are two operators associated with every Markov process: the generating operator and the Fokker–Planck operator. This part reviews, in the present context, classical results on Dynkin’s formula (Kulasiri & Verwoerd, 2002) and moment equations. The novelty of this work resides in Section 4. Our first main contribution consists in computing the matrix of transition rates for the observable component of the state; here it is important to notice that the observable component of the state is a non-Markovian purely discontinuous process whose characteristics depend on the whole state. In our second main result, Proposition 2, we show that the probability distribution of the observable component of the state is entirely determined by its transition rate matrix and its initial distribution. As a consequence, parametric identification reduces to a non-linear least-squares problem. Section 3 together with Section 5 form the implementation part of the article. The former presents two applications of interaction networks from ecological engineering and bioengineering, and the latter demonstrates parametric identification for the first application. 2. Modelling: purely discontinuous Markov processes Let $$x_t$$ be a continuous-time stochastic process based on some filtered probability space $$\displaystyle({\it{\Omega}},\mathcal{F},\mathbb{F},\mathbb{P})$$. 
$$x_t$$ is called a purely discontinuous process (PDP) (Jacod & Skorokhod, 1996) if its sample paths are rcll piecewise-constant functions of time. The state space $$E$$ of a PDP is a subset of $$\mathbb{R}^d$$ that may be discrete or continuous. The instants of jumps of $$x_t$$, $$\displaystyle\{T_k\}_{k\in\{1,2,\dots\}}$$, form an increasing sequence of positive random variables. Here, we assume that for each $$t\in\mathbb{R}_+$$ $$\mathbb{E}\left[\sum_{k\geq 1}{\rm 1}\kern-0.24em{\rm I}_{\{0<T_k\leq t\}}\right]<\infty.$$ Hence, almost surely, finitely many jumps take place in any finite time interval. This assumption holds naturally in applications. Besides, the jump times of $$x_t$$ are $$\mathbb{F}$$-stopping times of two kinds: totally inaccessible for spontaneous jumping, or predictable if the jumps are triggered by some events. More precisely, forced jumps of $$x_t$$, if any, are triggered only by hitting the boundary of the state space. Furthermore, for spontaneous jumping there is a hazard rate or intensity denoted $$z_t$$. This is a continuous-time stochastic process with nonnegative values. Totally inaccessible jump times are related to $$z_t$$ via the generalized exponential distribution (Brémaud, 1981): for $$t\in\mathbb{R}_+$$ \begin{gather*} \mathbb{P}\left\{T_1 > t\,|\,\mathcal{F}_t\right\}=\exp\left\{-\int_0^t z_s\,ds\right\}\!, \\ \mathbb{P}\left\{T_k > t\,|\,\mathcal{F}_{T_{k-1}}\right\}= \exp\left\{-\int_{T_{k-1}}^t z_s\,ds\right\}\,{\rm 1}\kern-0.24em{\rm I}_{\{T_{k-1}\leq t\}}\quad+\quad{\rm 1}\kern-0.24em{\rm I}_{\{T_{k-1} > t\}}\quad\quad k\in\{2,3,\dots\}, \end{gather*} (For consistency of notation, we set $$T_0=0$$.) 
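The survival function above also gives a direct sampling recipe: draw an Exp(1) threshold and integrate the intensity until it is exceeded. A minimal sketch for a deterministic intensity $$z(t)$$ follows; the function name, the grid step and the left-endpoint quadrature are illustrative assumptions, not from the article:

```python
import math
import random

def sample_first_jump(z, t_max, dt=1e-3, rng=random.Random(1)):
    """Sample T_1 with P{T_1 > t} = exp(-int_0^t z(s) ds) by inverting the survival function.
    z: deterministic intensity function of time; returns math.inf if no jump occurs before t_max."""
    target = -math.log(1.0 - rng.random())   # Exp(1) threshold for the integrated intensity
    acc, t = 0.0, 0.0
    while t < t_max:
        acc += z(t) * dt                     # left-endpoint Riemann sum of the integral of z
        t += dt
        if acc >= target:
            return t
    return math.inf
```

For a constant intensity `z(t) = c` this reduces, up to the grid resolution, to an exponential waiting time with rate `c`.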
$$\mathcal{F}_{T_{k-1}}$$ is called the $$\sigma$$-algebra of events prior to the $$\mathbb{F}$$-stopping time $$T_{k-1}$$ (Øksendal, 2003; Protter, 2005), namely $$\mathcal{F}_{T_{k-1}}=\sigma\left\{A\in\mathcal{F}_{\infty}: \forall t\in\mathbb{R}_+,A\cap\left\{T_{k-1}\leq t\right\}\in\mathcal{F}_t\right\}\!,$$ where $$\mathcal{F}_{\infty}=\sigma\left(\bigcup_{\,t\in\mathbb{R}_+}\mathcal{F}_t\right)\!.$$ Without loss of generality, we further assume that $$z_t$$ is a state feedback: at the current time $$t$$, it may depend only on $$t$$ and $$x_{t^{\,-}}$$, where $$x_{t^{\,-}}=\displaystyle\lim_{s\uparrow t}x_s.$$ In fact, a PDP $$x_t$$ with spontaneous jumping is Markovian if and only if $$z_t=\lambda(t,x_{t^{\,-}})$$ or $$z_t=\lambda(x_{t^{\,-}})$$; it is time-homogeneous Markovian only in the latter case. Predictable jump times, regarded as a point process, do not have a hazard rate process. We define the active or essential boundary of the state space as the set of boundary points $$\bar{x}\in\partial E$$ that are reachable almost surely. The hitting times of $$\partial E$$ are $$\mathbb{F}$$-stopping times since $$x_t$$ is rcll and $$\partial E$$ is a closed set. The active boundary, if non-empty, is assumed to be non-absorbing, that is, there is no killing of the process $$x_t$$ on the boundary of the state space. In case of a non-empty active boundary, we may have jumps out of the active boundary into $$E$$. In this respect, we denote by $$\partial E^{{\textrm{rep}}}$$ the repulsive active boundary; that is, whenever $$x_t$$ reaches $$\partial E^{{\textrm{rep}}}$$, it jumps instantaneously to interior or boundary points belonging to the state space $$E$$. Remark. In the case where $$\partial E \subset E$$, there may be boundary points of the state space where the state process sticks for a random duration and then jumps spontaneously; these boundary points belong to the active boundary but not to $$\partial E^{{\textrm{rep}}}$$. 
The last ingredient of the PDP is a transition kernel; it reinitiates the PDP after every jump, of either type. Specifically, the transition kernel is the probability distribution on $$E$$ of the post-jump state $$x(T_k)$$ under the conditional probability measure $$\mathbb{P}\left\{\left.\bullet\right|\,\mathcal{F}_{T_k^{\,-}}\right\}$$, where $$\mathcal{F}_{T_k^{\,-}}$$ is the $$\sigma$$-algebra of events strictly prior to the $$\mathbb{F}$$-stopping time $$T_k$$, namely $$\mathcal{F}_{T_k^{\,-}}=\mathcal{F}_0\vee\sigma\left\{A\cap\left\{t<T_k\right\}:A\in\mathcal{F}_t,t\in\mathbb{R}_+\right\}\!.$$ The transition kernel is assumed to be Markovian, that is, dependent only on $$T_k$$ and $$x(T_k^{\,-})$$: for a Borel-measurable set $$A$$ from $$E$$ $$\mathbb{P}\left\{\left.x(T_k)\in A\right|\,\mathcal{F}_{T_k^{\,-}}\right\}\triangleq Q(T_k,x(T_k^{\,-}),A).$$ Since the active boundary may not belong to the state space $$E$$, $$x(T_k^{\,-})$$ has values in $$E\cup\partial E^{{\textrm{rep}}}$$. The ingredients of the PDP described above yield a strong Markov process. It is clear that the proposed PDP can be applied to model our interaction network; it remains to identify all its ingredients. Here, $$d=n_s$$ and $$E$$ is the nonnegative integer lattice $$\mathbb{N}^{n_s}$$. Hence, the boundary $$\partial E=\emptyset$$. $$X(t)$$ jumps spontaneously, that is, the jump times are all totally inaccessible $$\mathbb{F}$$-stopping times. 
The last two ingredients to identify are the jumping intensity and the transition kernel; these are derived from the interaction scheme $$\nu_{i1}S_1+\cdots+\nu_{in_s}S_{n_s}\longrightarrow \mu_{i1}S_1+\cdots+\mu_{in_s}S_{n_s} \quad\quad i\in\{1,\ldots,n_r\}.$$ We claim that $$\mathbb{P}\left\{\left.T_{k}\in]t,t+dt]\,\right|\,\mathcal{F}_{T_{k-1}}\right\}= \mathbb{P}\left\{T_k > t\,|\,\mathcal{F}_{T_{k-1}}\right\}\,{\rm 1}\kern-0.24em{\rm I}_{\{T_{k-1}\leq t\}}\sum_{i=1}^{n_r}\theta_i\,dt\,\prod_{j=1}^{n_s}{\rm 1}\kern-0.24em{\rm I}_{\{ X_j(t^{\,-})\geq\nu_{ij}\}}\,C_{X_j(t^{\,-})}^{\nu_{ij}},$$ where $$C_n^k$$ denotes the binomial coefficient, defined as $$\frac{n!}{k!(n-k)!}$$. The distribution of $$T_{k}$$ under the conditional probability measure $$\mathbb{P}\left\{\left.\bullet\right|\,\mathcal{F}_{T_{k-1}}\right\}$$ is given by the generalized exponential distribution above; it follows that $$z_t=\sum_{i=1}^{n_r}\theta_i\,\prod_{j=1}^{n_s}{\rm 1}\kern-0.24em{\rm I}_{\{ X_j(t^{\,-})\geq\nu_{ij}\}}\,C_{X_j(t^{\,-})}^{\nu_{ij}}.$$ This is a particular case of the state feedback form $$z_t=\lambda(X(t^{\,-}))$$ with $$\lambda(x)=\sum_{i=1}^{n_r}\theta_i\,\prod_{j=1}^{n_s}{\rm 1}\kern-0.24em{\rm I}_{\left\{x_j\geq\nu_{ij}\right\}}\,C_{x_j}^{\nu_{ij}}.$$ Like the $$\theta_i$$’s, $$z_t$$ has units of (time unit)$$^{-1}$$. 
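For concreteness, the intensity $$\lambda(x)$$ can be evaluated directly. A minimal sketch follows; the two-interaction network and the rate values below are illustrative only, not from the article:

```python
from math import comb

def total_intensity(theta, nu, x):
    """lambda(x) = sum_i theta_i * prod_j 1{x_j >= nu_ij} C(x_j, nu_ij): total rate of leaving x."""
    lam = 0.0
    for th, nu_i in zip(theta, nu):
        if all(xj >= nj for xj, nj in zip(x, nu_i)):   # all required reactants available
            term = th
            for xj, nj in zip(x, nu_i):
                term *= comb(xj, nj)
            lam += term
    return lam

# illustrative two-species network: S1 -> 0 (theta=0.2) and S1 + S2 -> 0 (theta=0.1)
theta = [0.2, 0.1]
nu = [(1, 0), (1, 1)]
lam = total_intensity(theta, nu, (3, 4))   # ≈ 0.2*3 + 0.1*3*4 = 1.8
```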
After every jump the process is reinitiated according to the following sum of random Dirac’s measures: $$Q(T_k,X(T_k^{\,-}),A)= \sum_{i=1}^{n_r}\frac{\theta_i\left(\prod_{j=1}^{n_s}{\rm 1}\kern-0.24em{\rm I}_{\{ X_j(T_k^{\,-})\geq\nu_{ij}\}}\,C_{X_j(T_k^{\,-})}^{\nu_{ij}}\right)}{\sum_{i=1}^{n_r}\theta_i\left(\prod_{j=1}^{n_s}{\rm 1}\kern-0.24em{\rm I}_{\{ X_j(T_k^{\,-})\geq\nu_{ij}\}}\,C_{X_j(T_k^{\,-})}^{\nu_{ij}}\right)} \,\delta_{X(T_k^{\,-})+{\it{\Delta}}_i}(A),$$ where $${\it{\Delta}}_i=\left(\begin{array}{c}\mu_{i1}-\nu_{i1}\\ \vdots\\\mu_{in_s}-\nu_{in_s}\end{array}\right)\!.$$ Being independent of $$T_k$$, $$Q(T_k,X(T_k^{\,-}),A)$$ is a time-homogeneous Markovian kernel. Thus the Markov process $$X(t)$$ is time-homogeneous. 2.1. The generating operator and the Fokker–Planck operator for the Markov model The description above yields, in the first place, the generating operator $$\mathcal{G}$$ of the homogeneous Markov process $$X(t)$$. $$\mathcal{G}f(x)=\lambda(x)\left[-f(x)+\int_E f(x')Q(x,dx')\right]\!.$$ That is, $$\mathcal{G}f(x)=\sum_{i=1}^{n_r}\theta_i\,{\rm 1}\kern-0.24em{\rm I}_{\left\{x'\in E:\forall j,\, x'_j-\nu_{ij}\geq0\right\}}(x)\left(\prod_{j=1}^{n_s}C_{x_j}^{\nu_{ij}}\right)\left[-f(x)+f(x+{\it{\Delta}}_i)\right]\!,$$ where $$x=(x_1,\ldots,x_{n_s})$$ and $${\it{\Delta}}_i=(\mu_{i1}-\nu_{i1},\ldots,\mu_{in_s}-\nu_{in_s})$$. The domain $$D(\mathcal{G})$$ of $$\mathcal{G}$$ is the set of measurable real-valued functions on $$E$$. In the second place, the central concept of this section is the Fokker–Planck operator $$\mathcal{L}_{\textrm{FP}}$$ along with its domain $$D(\mathcal{L}_{\textrm{FP}})$$. Let $$p(t,x)$$ be the probability distribution of $$X(t)$$. 
The Fokker–Planck operator is the adjoint of the generating operator in the sense that, for $$f\in D(\mathcal{G})$$ and $$p\in D(\mathcal{L}_{\rm FP})$$, $$(\mathcal{G}f,p)=(f,\mathcal{L}_{\textrm{FP}}\,p).$$ Here $$(\mathcal{G}f,p)$$ denotes $$\sum_{x\in E} \mathcal{G}f(x)p(t,x),$$ and $$(f,\mathcal{L}_{\textrm{FP}}\,p)$$ denotes $$\sum_{x\in E} f(x)\mathcal{L}_{\textrm{FP}}\,p(t,x).$$ Proposition 1 We claim that $$\mathcal{L}_{\textrm{FP}}\,p(t,x)$$ is given by $$\sum_{i=1}^{n_r}\theta_i \left\{-{\rm 1}\kern-0.24em{\rm I}_{\left\{x'\in E:\forall j,\, x'_j-\nu_{ij}\geq0\right\}}(x)\left(\prod_{j=1}^{n_s}C_{x_j}^{\nu_{ij}}\right)p(t,x)+{\rm 1}\kern-0.24em{\rm I}_{\left\{x'\in E:\forall j,\, x'_j-\mu_{ij}\geq0\right\}}(x)\left(\prod_{j=1}^{n_s}C_{x_j-\mu_{ij}+\nu_{ij}}^{\nu_{ij}}\right)p(t,x-{\it{\Delta}}_i)\right\}\!.$$ Proof. $$\sum_{x\in E} \mathcal{G}f(x)p(t,x)=\sum_{i=1}^{n_r}\theta_i \sum_{x\in E}\,{\rm 1}\kern-0.24em{\rm I}_{\left\{x'\in E:\forall j,\, x'_j-\nu_{ij}\geq0\right\}}(x)\left(\prod_{j=1}^{n_s}C_{x_j}^{\nu_{ij}}\right)\left[-f(x)+f(x+{\it{\Delta}}_i)\right]p(t,x).$$ First, we have $$\label{eq1} \sum_{x\in E}\,{\rm 1}\kern-0.24em{\rm I}_{\left\{x'\in E:\forall j,\, x'_j-\nu_{ij}\geq0\right\}}(x)\left(\prod_{j=1}^{n_s}C_{x_j}^{\nu_{ij}}\right)f(x+{\it{\Delta}}_i)p(t,x)= \sum_{\left\{x\in E:\forall j,\, x_j-\nu_{ij}\geq0\right\}}\left(\prod_{j=1}^{n_s}C_{x_j}^{\nu_{ij}}\right)f(x+{\it{\Delta}}_i)p(t,x).$$ (1) Now, in the right-hand side of Eq. (1), make the change of variables $$x'_j=x_j-\nu_{ij}+\mu_{ij}$$. 
Then \begin{gather*} x_j=x'_j-\mu_{ij}+\nu_{ij},\\ \left\{x\in E:\forall j,\, x_j-\nu_{ij}\geq0\right\}=\left\{x'\in E:\forall j,\, x'_j-\mu_{ij}\geq0\right\}\!, \end{gather*} and we obtain $$\sum_{x\in E}\,{\rm 1}\kern-0.24em{\rm I}_{\left\{x'\in E:\forall j,\, x'_j-\nu_{ij}\geq0\right\}}(x)\left(\prod_{j=1}^{n_s}C_{x_j}^{\nu_{ij}}\right)f(x+{\it{\Delta}}_i)p(t,x)= \sum_{\left\{x'\in E:\forall j,\, x'_j-\mu_{ij}\geq0\right\}}\left(\prod_{j=1}^{n_s}C_{x'_j-\mu_{ij}+\nu_{ij}}^{\nu_{ij}}\right)f(x')p(t,x'-{\it{\Delta}}_i),$$ or equivalently, \begin{align*} &\sum_{x\in E}\,{\rm 1}\kern-0.24em{\rm I}_{\left\{x'\in E:\forall j,\, x'_j-\nu_{ij}\geq0\right\}}(x)\left(\prod_{j=1}^{n_s}C_{x_j}^{\nu_{ij}}\right)f(x+{\it{\Delta}}_i)p(t,x)\\ &\quad =\sum_{x\in E} {\rm 1}\kern-0.24em{\rm I}_{\left\{x'\in E:\forall j,\, x'_j-\mu_{ij}\geq0\right\}}(x)\left(\prod_{j=1}^{n_s}C_{x_j-\mu_{ij}+\nu_{ij}}^{\nu_{ij}}\right)p(t,x-{\it{\Delta}}_i)f(x). \end{align*} Thus $$\sum_{x\in E} \mathcal{G}f(x)p(t,x)$$ is equal to \begin{align*} &\sum_{x\in E}f(x)\sum_{i=1}^{n_r}\theta_i \left\{-{\rm 1}\kern-0.24em{\rm I}_{\left\{x'\in E:\forall j,\, x'_j-\nu_{ij}\geq0\right\}}(x)\left(\prod_{j=1}^{n_s}C_{x_j}^{\nu_{ij}}\right)p(t,x)\right.\\ &\quad \left.+{\rm 1}\kern-0.24em{\rm I}_{\left\{x'\in E:\forall j,\, x'_j-\mu_{ij}\geq0\right\}}(x)\left(\prod_{j=1}^{n_s}C_{x_j-\mu_{ij}+\nu_{ij}}^{\nu_{ij}}\right)p(t,x-{\it{\Delta}}_i)\right\}\!. \end{align*} □ As regards $$D(\mathcal{L}_{\textrm{FP}})$$, it is the set of functions $$p(t,x):\mathbb{R}_+\times E\rightarrow [0,1]$$ satisfying for each $$t\in\mathbb{R}_+$$ $$\sum_{x\in E}p(t,x)=1.$$ 2.2. Dynkin’s formula versus the Fokker–Planck equation Dynkin’s formula is the main tool in computing expectations for a wide class of functions (Klebaner, 2005). 
Let $$f\in D(\mathcal{G})$$; if for each $$t\in\mathbb{R}_+$$ $$\label{sufficient-condition} \mathbb{E}\left[\sum_{k\geq 1}{\rm 1}\kern-0.24em{\rm I}_{\{0<T_k\leq t\}}\left|f(X(T_k))-f(X(T_k^{\,-}))\right|\right]<\infty,$$ (2) then $$\label{dynkin} \mathbb{E}\left[f(X(t))-f(X(0))\right]=\mathbb{E}\left[\int_0^t \mathcal{G}f(X(s))\,ds\right]\!.$$ (3) The condition (2) holds in particular when $$f$$ is bounded and for each $$t\in\mathbb{R}_+$$ $$\mathbb{E}\left[\sum_{k\geq 1}{\rm 1}\kern-0.24em{\rm I}_{\{0<T_k\leq t\}}\right]<\infty.$$ Developing Eq. (3) yields \begin{eqnarray*} &&\mathbb{E}\left[f(X(t))-f(X(0))\right]=\int_0^t \mathbb{E}\left[\mathcal{G}f(X(s))\right]ds,\\ &&=\int_0^t\sum_{i=1}^{n_r}\theta_i\,\mathbb{E}\left[{\rm 1}\kern-0.24em{\rm I}_{\left\{x'\in E:\forall j,\, x'_j-\nu_{ij}\geq0\right\}}(X(s))\left(\prod_{j=1}^{n_s}C_{X_j(s)}^{\nu_{ij}}\right)\left[-f(X(s))+f(X(s)+{\it{\Delta}}_i)\right]\right]ds,\\ &&=\sum_{i=1}^{n_r}\theta_i\int_0^t\mathbb{E}\left[\left[-f(X(s))+f(X(s)+{\it{\Delta}}_i)\right]\prod_{j=1}^{n_s}{\rm 1}\kern-0.24em{\rm I}_{\left\{X_j(s)\geq\nu_{ij}\right\}}C_{X_j(s)}^{\nu_{ij}}\right]ds. \end{eqnarray*} The Fokker–Planck equation is an important analytical tool for characterizing the law of a Markov process (Gardiner, 1985; Risken, 1989). The derivation of the Fokker–Planck equation is handled on a case-by-case basis. The probability distribution of the purely discontinuous Markov process (PDMP) $$X(t)$$ solves the following FPK equation: for $$t>0$$ $$\frac{\partial p}{\partial t}(t,x)=\mathcal{L}_{\textrm{FP}}\,p(t,x),$$ i.e., \begin{align*} \frac{\partial p}{\partial t}(t,x)&=\sum_{i=1}^{n_r}\theta_i \left\{-{\rm 1}\kern-0.24em{\rm I}_{\left\{x'\in E:\forall j,\, x'_j-\nu_{ij}\geq0\right\}}(x)\left(\prod_{j=1}^{n_s}C_{x_j}^{\nu_{ij}}\right)p(t,x)\right.\\ &\quad\left. 
+{\rm 1}\kern-0.24em{\rm I}_{\left\{x'\in E:\forall j,\, x'_j-\mu_{ij}\geq0\right\}}(x)\left(\prod_{j=1}^{n_s}C_{x_j-\mu_{ij}+\nu_{ij}}^{\nu_{ij}}\right)p(t,x-{\it{\Delta}}_i)\right\}\!, \end{align*} with the initial condition $$p(0,x)=\mathbb{P}\left\{X(0)=x\right\}$$. 3. Case studies 3.1. Interaction networks in ecology The first case study we consider is the evolution of an agricultural pest. Let $$X_1(t)$$ be the size of the pest population, and $$X_2(t)$$ be an indicator of how much the environment has been deteriorated by the infestation. New pest individuals arise in the system due to immigration with rate $$\theta_1$$, or birth with a rate proportional to the population size, namely $$\theta_2X_1(t^{\,-}){\rm 1}\kern-0.24em{\rm I}_{\left\{X_1(t^{\,-})\geq1\right\}}.$$ The death rate of the pest is given by $$\theta_3X_1(t^{\,-})X_2(t^{\,-}){\rm 1}\kern-0.24em{\rm I}_{\{X_1(t^{\,-})\geq1,X_2(t^{\,-})\geq1\}},$$ that is, it depends on the population size and the damage to the environment. In addition, $$X_2(t)$$ is increased by one unit instantly when a new pest individual is added. Since the pest individuals deteriorate the environment for a time period that may exceed their own life span, we assume that the death of pest individuals leaves $$X_2(t)$$ unchanged. 
Furthermore, the environment may recover (here, we assume that $$X_2(t)$$ decreases instantaneously by one unit) with rate $$\theta_4X_2(t^{\,-}){\rm 1}\kern-0.24em{\rm I}_{\left\{X_2(t^{\,-})\geq1\right\}}.$$ The description above corresponds schematically to the following interactions: \begin{eqnarray*} \emptyset&\longrightarrow&S_1+S_2\\ S_1&\longrightarrow&2S_1+S_2\\ S_1+S_2&\longrightarrow&S_2\\ S_2&\longrightarrow&\emptyset \end{eqnarray*} Obviously $$n_s=2$$, $$n_r=4$$, $$\nu_{11}=0$$, $$\nu_{12}=0$$, $$\mu_{11}=1$$, $$\mu_{12}=1$$, $$\nu_{21}=1$$, $$\nu_{22}=0$$, $$\mu_{21}=2$$, $$\mu_{22}=1$$, $$\nu_{31}=1$$, $$\nu_{32}=1$$, $$\mu_{31}=0$$, $$\mu_{32}=1$$, $$\nu_{41}=0$$, $$\nu_{42}=1$$, $$\mu_{41}=0$$, $$\mu_{42}=0$$, $${\it{\Delta}}_1=(1,1)$$, $${\it{\Delta}}_2=(1,1)$$, $${\it{\Delta}}_3=(-1,0)$$, $${\it{\Delta}}_4=(0,-1)$$. And then the Fokker–Planck equation is given by \begin{eqnarray*} \frac{\partial p}{\partial t}(t,x_1,x_2)&=&-\left[\theta_1+\theta_2{\rm 1}\kern-0.24em{\rm I}_{\{x_1\geq1\}}x_1+\theta_3{\rm 1}\kern-0.24em{\rm I}_{\{x_1\geq1\}}{\rm 1}\kern-0.24em{\rm I}_{\{x_2\geq1\}}x_1x_2+\theta_4{\rm 1}\kern-0.24em{\rm I}_{\{x_2\geq1\}}x_2\right]p(t,x_1,x_2)\\ &&+\left[\theta_1{\rm 1}\kern-0.24em{\rm I}_{\{x_1\geq1\}}{\rm 1}\kern-0.24em{\rm I}_{\{x_2\geq1\}}+\theta_2{\rm 1}\kern-0.24em{\rm I}_{\{x_1\geq2\}}{\rm 1}\kern-0.24em{\rm I}_{\{x_2\geq1\}}(x_1-1)\right]p(t,x_1-1,x_2-1)\\ &&+\theta_3{\rm 1}\kern-0.24em{\rm I}_{\{x_2\geq1\}}(x_1+1)x_2p(t,x_1+1,x_2)\\ &&+\theta_4(x_2+1)p(t,x_1,x_2+1). \end{eqnarray*} Now, we wish to show how the mean of the observable species $$S_1$$ evolves in time by applying Dynkin’s formula to $$f(x)=x_1$$. 
First, for every $$i\in\{1,\ldots,n_r\}$$, $$-f(x)+f(x+{\it{\Delta}}_i)=-x_1+x_1+{\it{\Delta}}_{i1}={\it{\Delta}}_{i1}.$$ Then \begin{eqnarray*} \mathbb{E}\left[X_1(t)\right]-\mathbb{E}\left[X_1(0)\right]&=&\sum_{i=1}^{n_r}\theta_i{\it{\Delta}}_{i1}\int_0^t\mathbb{E}\left[\prod_{j=1}^{n_s} {\rm 1}\kern-0.24em{\rm I}_{\left\{X_j(s)\geq\nu_{ij}\right\}}C_{X_j(s)}^{\nu_{ij}}\right]ds,\\ &=&\theta_1\,t+\theta_2\int_0^t\mathbb{E}\left[{\rm 1}\kern-0.24em{\rm I}_{\{X_1(s)\geq1\}}X_1(s)\right]ds -\theta_3\int_0^t\mathbb{E}\left[{\rm 1}\kern-0.24em{\rm I}_{\{X_1(s)\geq1\}}{\rm 1}\kern-0.24em{\rm I}_{\{X_2(s)\geq1\}}X_1(s)X_2(s)\right]ds,\\ &=&\theta_1\,t+\theta_2\int_0^t\mathbb{E}\left[X_1(s)\right]ds-\theta_3\int_0^t\mathbb{E}\left[X_1(s)X_2(s)\right]ds. \end{eqnarray*} This is equivalent to saying that $$\dot{m}_1(t)=\theta_1+\theta_2\,m_1(t)-\theta_3\,m_2(t),$$ where $$m_1(t)=\mathbb{E}\left[X_1(t)\right]$$ and $$m_2(t)=\mathbb{E}\left[X_1(t)X_2(t)\right]$$. Clearly then, the ordinary differential equation for a first-order moment (the mean of the observable species) involves a second-order moment (the cross-correlation between the two species). 3.2. Interaction networks in biochemistry As a second case study, we choose gene expression in budding yeast transiently induced by the high-osmolarity glycerol (HOG) pathway (Hohmann, 2002; Pelet et al., 2011; Zechner et al., 2012). The yeast cells induce the MAPK Hog1 signaling cascade upon hyper-osmotic shocks. The pathway is activated for only a short time, during which the role of the mitogen-activated protein kinase (MAPK) Hog1 is twofold. First, in the cytoplasm, Hog1 phosphorylates its substrate to increase the internal concentration of glycerol in the cell. Secondly and in parallel, a large fraction of the active Hog1 translocates to the nucleus where it triggers the activation of transcription for roughly 300 genes. 
The promoter of the sugar transporter-like protein 1 (pSTL1) is a Hog1 expression target driving the expression of a fluorescent protein (quadrupleVenus-qV). This protein is used as a reporter to quantify the amount of transcription. Once the internal glycerol concentration is sufficient to balance the external osmotic pressure, the pathway is deactivated, leading to a rapid termination of the transcription. A bimodality in the protein expression profiles is observed; this stems from the transient activation of the MAPK Hog1 in conjunction with slow transcription activation by the promoter. The system described above is depicted in Fig. 1. Fig. 1. MAPK Hog1 induced pSTL1-qV expression. Osmotic pressure is sensed at the membrane and results in the activation of the MAPK signaling cascade. Once active, double-phosphorylated MAPK Hog1 translocates to the nucleus, where it can bind via transcription factors to the pSTL1 promoter. Then remodelling of the chromatin structure allows for efficient transcription of mRNA, which is exported from the nucleus to yield expression of the fluorescent reporter pSTL1-qV. 
A simple model of transiently induced gene expression is given by the following interaction network: \begin{align*} S_1&\longrightarrow\emptyset\\ S_1+S_2&\longrightarrow S_3\\ S_3&\longrightarrow S_1+S_2\\ S_3&\longrightarrow S_3+S_4 \end{align*} The model consists of four species: a transcription factor $$S_1$$, the gene $$S_2$$, the gene with bound transcription factor $$S_3$$ and the produced protein $$S_4$$. Here the binding of $$S_1$$ and $$S_2$$ (to form $$S_3$$) aggregates all necessary steps involved in gene activation such as the binding of transcription factors, polymerase binding, or chromatin remodelling. Protein synthesis is likewise reduced to first-order production, abstracting messenger RNA (mRNA) transcription and degradation, translation and protein folding. In this case, $$n_s=4$$, $$n_r=4$$, $$\nu_{11}=1$$, $$\nu_{12}=0$$, $$\nu_{13}=0$$, $$\nu_{14}=0$$, $$\mu_{11}=0$$, $$\mu_{12}=0$$, $$\mu_{13}=0$$, $$\mu_{14}=0$$, $$\nu_{21}=1$$, $$\nu_{22}=1$$, $$\nu_{23}=0$$, $$\nu_{24}=0$$, $$\mu_{21}=0$$, $$\mu_{22}=0$$, $$\mu_{23}=1$$, $$\mu_{24}=0$$, $$\nu_{31}=0$$, $$\nu_{32}=0$$, $$\nu_{33}=1$$, $$\nu_{34}=0$$, $$\mu_{31}=1$$, $$\mu_{32}=1$$, $$\mu_{33}=0$$, $$\mu_{34}=0$$, $$\nu_{41}=0$$, $$\nu_{42}=0$$, $$\nu_{43}=1$$, $$\nu_{44}=0$$, $$\mu_{41}=0$$, $$\mu_{42}=0$$, $$\mu_{43}=1$$, $$\mu_{44}=1$$, $${\it{\Delta}}_1=(-1,0,0,0)$$, $${\it{\Delta}}_2=(-1,-1,1,0)$$, $${\it{\Delta}}_3=(1,1,-1,0)$$, $${\it{\Delta}}_4=(0,0,0,1)$$. With respect to the inverse problem, the species which is subject to observation is the protein. Accordingly, we wish to determine the protein expression profiles, that is, the probability distribution of species $$S_4$$; with the following parameter values and initial condition, we obtain the plots shown in Fig. 2: $$\theta_1=0.001,\quad \theta_2=0.1,\quad \theta_3=1,\quad \theta_4=0.1,\quad X(0)=(20,5,0,0)^{{\top}}.$$ Fig. 2. 
From the top and from left to right: protein expression profiles at $$t=0$$, $$t=10$$, $$t=20$$, $$t=30$$, $$t=40$$, $$t=50$$ time units ($$x$$-axis: number of molecules, $$y$$-axis: probability). The result depicted in Fig. 2 is obtained by Monte Carlo using $$10^{5}$$ simulations of the PDMP $$X(t)$$. 4. Parameter identification The jumping rate out of state $$x$$ is given by $$\lambda(x)=\sum_{i=1}^{n_r}\lambda(x,x+{\it{\Delta}}_i),$$ where $$\lambda(x,x+{\it{\Delta}}_i)=\theta_i\,\prod_{j=1}^{n_s}{\rm 1}\kern-0.24em{\rm I}_{\left\{x_j\geq\nu_{ij}\right\}}\,C_{x_j}^{\nu_{ij}}$$ is the rate of jumping from state $$x$$ to state $$x+{\it{\Delta}}_i$$. We may write $$\lambda(x,x+{\it{\Delta}}_i)$$ in the following expanded form: $$\lambda((x_1,\ldots,x_{n_s}),(x_1+{\it{\Delta}}_{i1},\ldots,x_{n_s}+{\it{\Delta}}_{in_s})),$$ where $${\it{\Delta}}_i=({\it{\Delta}}_{i1},\ldots,{\it{\Delta}}_{in_s})$$. We will make use of the notation $$E_i=\left\{x\in E\textrm{ such that: there are jumps of type i out of }x\right\}$$ for every $$i\in\{1,\ldots,n_r\}$$. We wish to compute the jumping rates between different states of the observable component of the state $$X_1(t)$$. 
First we must order the $${\it{\Delta}}_{i1}$$, $$i\in\{1,\ldots,n_r\}$$; if they are all nonzero and different, then \begin{eqnarray} \lambda(x_1,x_1+{\it{\Delta}}_{i1})&=&\sum_{x_2,\ldots,\,x_{n_s}} \lambda((x_1,\ldots,x_{n_s}),(x_1+{\it{\Delta}}_{i1},\ldots,x_{n_s}+{\it{\Delta}}_{in_s}))\nonumber\\ &=&\sum_{x_2,\ldots,\,x_{n_s}}\theta_i\,\prod_{j=1}^{n_s}{\rm 1}\kern-0.24em{\rm I}_{\left\{x_j\geq\nu_{ij}\right\}}\,C_{x_j}^{\nu_{ij}}\nonumber\\ &=&{\rm 1}\kern-0.24em{\rm I}_{\left\{x_1\geq\nu_{i1}\right\}}\,C_{x_1}^{\nu_{i1}}\sum_{x_2,\ldots, \,x_{n_s}}\theta_i\,\prod_{j=2}^{n_s}{\rm 1}\kern-0.24em{\rm I}_{\left\{x_j\geq\nu_{ij}\right\}}\,C_{x_j}^{\nu_{ij}}.\label{rate_jump0} \end{eqnarray} (4) Actually, the sum over the set $$\left\{x_2,\ldots,\,x_{n_s}\textrm{ such that: } (x_1,x_2,\ldots,x_{n_s})\in E_i\right\}$$ is abbreviated with the notation $$\sum_{x_2,\ldots,\,x_{n_s}}$$ in Eq. (4). Here we introduce the notation $$\tilde{\theta}_i=\sum_{\left\{x_2,\ldots, \,x_{n_s}:\,(x_1,x_2,\ldots,x_{n_s})\in E_i\right\}}\theta_i\,\prod_{j=2}^{n_s}{\rm 1}\kern-0.24em{\rm I}_{\left\{x_j\geq\nu_{ij}\right\}}\,C_{x_j}^{\nu_{ij}}$$ so as to have $$\lambda(x_1,x_1+{\it{\Delta}}_{i1})=\tilde{\theta}_i\,{\rm 1}\kern-0.24em{\rm I}_{\left\{x_1\geq\nu_{i1}\right\}}\,C_{x_1}^{\nu_{i1}}.$$ If for instance the nonzero $${\it{\Delta}}_{i1}$$’s are all different but two, say $${\it{\Delta}}_{11}={\it{\Delta}}_{21}$$, then $$\lambda(x_1,x_1+{\it{\Delta}}_{11})=\lambda(x_1,x_1+{\it{\Delta}}_{21})=\tilde{\theta}_1\,{\rm 1}\kern-0.24em{\rm I}_{\left\{x_1\geq\nu_{11}\right\}}\,C_{x_1}^{\nu_{11}}+\tilde{\theta}_2\,{\rm 1}\kern-0.24em{\rm I}_{\left\{x_1\geq\nu_{21}\right\}}\,C_{x_1}^{\nu_{21}}.$$ In greater generality, we have neither $${\it{\Delta}}_{i1}\neq0$$ for every $$i\in\{1,\ldots,n_r\}$$ nor $${\it{\Delta}}_{11}\neq{\it{\Delta}}_{21}\neq\ldots\neq{\it{\Delta}}_{n_r1}$$ and the jump times of $$X(t)$$ out of state $$x\in\mathbb{N}^{n_s}$$ where its observable component $$X_1(t)$$ 
jumps as well form a point process with intensity $$\label{intensity} \lambda(x)=\sum_{\{i\in\{1,\ldots,n_r\}:\,{\it{\Delta}}_{i1}\neq0\}}\lambda(x,x+{\it{\Delta}}_i).$$ (5) In other words, if we denote by $$\tilde{T}_1$$ the first jump time of both $$X(t)$$ and $$X_1(t)$$, then we have $$\mathbb{P}\left\{\tilde{T}_1 > t\,|\,\mathcal{F}_t\right\}=\exp\left\{-\int_0^t \sum_{\{i\in\{1,\ldots,n_r\}:\,{\it{\Delta}}_{i1}\neq0\}}\lambda(X(s^{\,-}),X(s^{\,-})+{\it{\Delta}}_i)\,ds\right\}\!.$$ Taking into account Eq. (5), we deduce by a similar argument that the intensity of jumps for $$X_1(t)$$, out of state $$x\in\mathbb{N}$$, is given by \begin{eqnarray*} \lambda(x)&=&\sum_{\{i\in\{1,\ldots,n_r\}:\,{\it{\Delta}}_{i1}\neq0\}}\,\,\sum_{\left\{x_2,\ldots, \,x_{n_s}:\,(x,x_2,\ldots,x_{n_s})\in E_i\right\}}\lambda((x,x_2,\ldots,x_{n_s}),(x,x_2,\ldots,x_{n_s})+{\it{\Delta}}_i),\\ &&\\ &=&\sum_{\{i\in\{1,\ldots,n_r\}:\,{\it{\Delta}}_{i1}\neq0\}}\tilde{\theta}_i\,{\rm 1}\kern-0.24em{\rm I}_{\left\{x\geq\nu_{i1}\right\}}\,C_{x}^{\nu_{i1}}. \end{eqnarray*} It appears clearly from the results obtained so far that $$\theta_i$$ is identifiable if and only if (i) $${\it{\Delta}}_{i1}\neq0$$ and (ii) the enumerable set $$E_i$$ is non-empty and finite. The non-emptiness and finiteness of $$E_i$$, for every $$i\in\{1,\ldots,n_r\}$$, are demonstrated numerically in Section 5. 
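The marginalization in Eq. (4) can be sketched numerically once the hidden coordinates are enumerated on a finite set; the enumeration, the interaction and the rate values below are illustrative assumptions, not from the article:

```python
from math import comb

def marginal_rate(theta_i, nu_i, x1, hidden_states):
    """lambda(x_1, x_1 + Delta_i1) via Eq. (4): sum the joint rate of interaction i
    over an enumeration of the hidden coordinates (x_2, ..., x_ns)."""
    total = 0.0
    for xs in hidden_states:
        x = (x1,) + tuple(xs)
        if all(xj >= nj for xj, nj in zip(x, nu_i)):   # all reactants available
            term = theta_i
            for xj, nj in zip(x, nu_i):
                term *= comb(xj, nj)
            total += term
    return total

# one hidden species; the interaction needs one individual of each species: nu_i = (1, 1)
rate = marginal_rate(0.1, (1, 1), 3, [(0,), (1,), (2,)])   # ≈ 0.1*3*(1 + 2) = 0.9
```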
Now that we have computed the jumping rates between different states for $$X_1(t)$$, let us introduce the square matrices $${\it{\Lambda}}=\left(\begin{array}{ccccc}-\sum_{x'\neq 0}\lambda(0,x')&\lambda(1,0)&\cdots&\lambda(x,0)&\cdots\\[3pt] \lambda(0,1)&-\sum_{x'\neq 1}\lambda(1,x')&\cdots&\lambda(x,1)&\cdots\\[3pt] \lambda(0,2)&\lambda(1,2)&\cdots&\lambda(x,2)&\cdots\\[3pt] \vdots&\vdots&\cdots&\vdots&\cdots\\[3pt] \lambda(0,x-1)&\lambda(1,x-1)&\cdots&\lambda(x,x-1)&\cdots\\[3pt] \lambda(0,x)&\lambda(1,x)&\cdots&-\sum_{x'\neq x}\lambda(x,x')&\cdots\\[3pt] \lambda(0,x+1)&\lambda(1,x+1)&\cdots&\lambda(x,x+1)&\cdots\\[3pt] \vdots&\vdots&\cdots&\vdots&\cdots \end{array}\right)\!,$$ $$Q^{{\top}}=\left(\begin{array}{ccccc}0&q(1,0)&\cdots&q(x,0)&\cdots\\[3pt] q(0,1)&0&\cdots&q(x,1)&\cdots\\[3pt] q(0,2)&q(1,2)&\cdots&q(x,2)&\cdots\\[3pt] \vdots&\vdots&\cdots&\vdots&\cdots\\[3pt] q(0,x-1)&q(1,x-1)&\cdots&q(x,x-1)&\cdots\\[3pt] q(0,x)&q(1,x)&\cdots&0&\cdots\\[3pt] q(0,x+1)&q(1,x+1)&\cdots&q(x,x+1)&\cdots\\[3pt] \vdots&\vdots&\cdots&\vdots&\cdots \end{array}\right)\!,$$ where $$q(x,x')=\frac{\lambda(x,x')}{\lambda(x)},$$ for $$x'\neq x$$, $$x,x'\in\mathbb{N}$$ and $$\lambda(x)=\sum_{\begin{array}{c}x'\neq x\\x'\in\mathbb{N}\end{array}} \lambda(x,x').$$ Note that the sum of every column of $${\it{\Lambda}}$$ is 0 and the sum of every row of $$Q$$ is 1; $${\it{\Lambda}}$$ and $$Q$$ are called, respectively, the transition rate matrix and the transition probability matrix of $$X_1(t)$$. Since in our application we have at most $$n_r$$ types of jumps out of each $$x\in\mathbb{N}$$, $${\it{\Lambda}}$$ and $$Q$$ are sparse matrices. Besides, knowing $$\lambda(x)$$ for every $$x\in\mathbb{N}$$, there is a one-to-one correspondence between the entries of the two matrices $${\it{\Lambda}}$$ and $$Q$$. 
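As an illustration of the two matrices, consider an immigration–death chain truncated to $$\{0,\ldots,x_{\max}\}$$; the dynamics and the rate values are hypothetical, chosen only so that the construction fits in a few lines:

```python
import numpy as np

def rate_matrices(theta_b, theta_d, x_max):
    """Build the transition rate matrix (columns indexed by the starting state, column sums 0)
    and the embedded-chain transition probability matrix (rows sum to 1) for an illustrative
    immigration-death chain: lambda(x, x+1) = theta_b, lambda(x, x-1) = theta_d * x."""
    n = x_max + 1
    L = np.zeros((n, n))
    for x in range(n):
        if x < x_max:
            L[x + 1, x] = theta_b           # immigration jump x -> x+1
        if x > 0:
            L[x - 1, x] = theta_d * x       # death jump x -> x-1
        L[x, x] = -L[:, x].sum()            # diagonal makes each column sum to zero
    lam = -np.diag(L)                       # lambda(x): total rate of leaving x
    Q = (L - np.diag(np.diag(L))).T / lam[:, None]   # q(x, x') = lambda(x, x') / lambda(x)
    return L, Q

L, Q = rate_matrices(theta_b=0.3, theta_d=0.1, x_max=20)
```

The one-to-one correspondence mentioned above is visible here: `Q` is recovered from the off-diagonal entries of `L` by dividing each column by `lam`.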
We now denote by $$p(t,x)$$ the probability distribution of $$X_1(t)$$, $$t\in\mathbb{R}_+$$, $$p(t,x)=\mathbb{P}\left\{X_1(t)=x\right\}\quad\quad x\in\mathbb{N},$$ and form the column vector $$\textbf{p}(t)=\left(\begin{array}{c}p(t,0)\\p(t,1)\\\vdots\end{array}\right)\!.$$ Proposition 2 For every $$t\in\mathbb{R}_+$$, the probability distribution $$\textbf{p}(t)$$ of $$X_1(t)$$ is uniquely determined by the transition rate matrix $${\it{\Lambda}}$$ and the initial distribution $$\textbf{p}(0)$$. Proof. The generator of $$X_1(t)$$ is given by $$\mathcal{G}f(x)=-\lambda(x)f(x)\quad+\sum_{\begin{array}{c}x'\neq x\\x'\in\mathbb{N}\end{array}} \lambda(x,x') f(x').$$ The Fokker–Planck operator of $$X_1(t)$$ is given by $$\mathcal{L}_{\textrm{FP}}\,p(t,x)=-\lambda(x)p(t,x)\quad+\sum_{\begin{array}{c}x'\neq x\\x'\in\mathbb{N}\end{array}} \lambda(x',x)p(t,x').$$ The distribution $$p(t,x)$$ solves the following Fokker–Planck equation: $$\dot{p}(t,x)=\mathcal{L}_{\textrm{FP}}\,p(t,x) \quad\quad t>0,$$ with the initial condition $$p(0,x)=\mathbb{P}\left\{X_1(0)=x\right\}\!.$$ In compact notation the Fokker–Planck equation is rewritten as $$\dot{\textbf{p}}(t)={\it{\Lambda}}\,\textbf{p}(t) \quad\quad t>0.$$ The initial condition is a known distribution $$\textbf{p}(0)$$ on $$\mathbb{N}$$. Since the matrix $${\it{\Lambda}}$$ is constant in time, the unique solution to the Fokker–Planck equation is $$\label{solution} \textbf{p}(t)=\exp\left\{t\,{\it{\Lambda}}\right\}\textbf{p}(0) \quad\quad t\in\mathbb{R}_+.$$ (6) □ Now we are ready to address the identification problem. We have observations of $$X_1(t)$$ at irregularly spaced instants $$t_1,\ldots,t_K$$; these are deterministic and known. 
For every observation time $$t_k$$, we have a large number $$N$$ of independent replicates for $$X_1(t_k)$$, written $$X_1(t_k,\omega_1),\ldots,X_1(t_k,\omega_N).$$ The best estimate of the unknown parameters given the observations minimizes the following criterion over $$\theta_i$$, $$i\in\{1,\ldots,n_r\}$$: $$\sum_x\sum_{k=1}^K\left[p(t_k,x)-\hat{p}(t_k,x)\right]^2,$$ where $$\hat{p}(t_k,x)=\frac{1}{N}\sum_{n=1}^N {\rm 1}\kern-0.24em{\rm I}_{\left\{X_1(t_k,\omega_n)=x\right\}} \quad\quad x\in\mathbb{N}$$ is the empirical probability distribution of $$X_1(t_k)$$, $$k\in\left\{1,\ldots,K\right\}$$. Equation (6) provides all we need to compute the criterion for given values of the $$\theta_i$$'s in $$]0,1]$$. The last ingredient in parameter identification is a truncation: we wish to truncate the state space $$\mathbb{N}$$ to a bounded region $$\{0,1,\ldots,x_{{\textrm{max}}}\}$$; this is in fact the set containing the observed values of $$X_1(t)$$. As a matter of fact, limiting the observation to a finite duration is strictly equivalent to a truncation of the state space for the observable process. We need a theoretical framework to use the truncation without any approximation. Here we invoke the theory of censored Markov chains, which represents the conditional behaviour of a system within a subset of observed states. Bounding the state space for the observable component of the state $$X_1(t)$$ is not only essential to perform the identification, but it also asserts that the population of species $$S_1$$ cannot be arbitrarily large within the maximum allowed observation duration; this is quite natural in our application. Often, in biological processes, we have low populations (e.g., the key molecules are in low copy number) and concentrations are not well defined. 
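The empirical distributions and the least-squares criterion can be sketched as follows (synthetic replicates stand in for the measured data, and a placeholder uniform model stands in for $$p(t_k,\cdot)$$; both are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical replicates X_1(t_k, omega_n): K observation times, N replicates,
# values in {0, ..., x_max}.
K, N, x_max = 5, 1000, 10
samples = rng.integers(0, x_max + 1, size=(K, N))

def empirical_dist(row, x_max):
    """hat{p}(t_k, x): fraction of replicates equal to x."""
    return np.bincount(row, minlength=x_max + 1) / len(row)

p_hat = np.array([empirical_dist(samples[k], x_max) for k in range(K)])
assert np.allclose(p_hat.sum(axis=1), 1)    # each row is a distribution

# Least-squares criterion against model distributions p(t_k, x); here a
# placeholder uniform model, purely for illustration.
p_model = np.full_like(p_hat, 1 / (x_max + 1))
criterion = ((p_model - p_hat) ** 2).sum()
assert criterion >= 0
```

In the actual method, `p_model` would come from Eq. (6) evaluated at the candidate $$\theta_i$$'s.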
Consider the following partition for the state space $$\mathbb{N}$$ of $$X_1(t)$$: $$\mathfrak{D}=\{0,1,\ldots,x_{{\textrm{max}}}\}\quad\quad \mathbb{N}\setminus\mathfrak{D}$$ so as to obtain the following block description of the transition probability matrix of $$X_1(t)$$: $$Q=\left(\begin{array}{cc}Q_{11}&Q_{12}\\ Q_{21}&Q_{22}\end{array}\right)\!.$$ The blocks $$Q_{11}$$, $$Q_{12}$$, $$Q_{21}$$ and $$Q_{22}$$ correspond respectively to jumps within $$\mathfrak{D}$$, jumps from $$\mathfrak{D}$$ to $$\mathbb{N}\setminus\mathfrak{D}$$, jumps from $$\mathbb{N}\setminus\mathfrak{D}$$ to $$\mathfrak{D}$$ and jumps within $$\mathbb{N}\setminus\mathfrak{D}$$. The censored chain with censoring set $$\mathfrak{D}$$ observes only the states in $$\mathfrak{D}$$. More precisely, let $$\tau_k$$, $$k\in\{1,2,\ldots\}$$, be the jump times of $$X_1(t)$$ relative to jumps between states within $$\mathfrak{D}$$, that is, for every $$k\in\{1,2,\ldots\}$$, $$X_1(\tau_k^{\,-})$$ and $$X_1(\tau_k)$$ belong to $$\mathfrak{D}$$. The discrete-time process $$(X_1(\tau_k),k\in\{1,2,\ldots\})$$ is called the censored chain with censoring set $$\mathfrak{D}$$; its transition probability matrix is equal to $$\label{CMC} Q_{{\textrm{cens}}}=Q_{11}+Q_{12}(I-Q_{22})^{-1}Q_{21}.$$ (7) $$Q_{{\textrm{cens}}}$$ is the so-called stochastic complement of matrix $$Q_{11}$$ (Bušić et al., 2012). Then, knowing $$\lambda(x)$$ for every $$x\in\{0,1,\ldots,x_{{\textrm{max}}}\}$$, we derive the transition rate matrix $${\it{\Lambda}}_{{\textrm{cens}}}$$ for the censored chain with censoring set $$\mathfrak{D}$$. 
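Equation (7) is straightforward to evaluate numerically. A sketch on a small hypothetical chain (a random stochastic matrix on eight states, censoring set of five states; the chain itself is made up, only the stochastic-complement formula comes from the text):

```python
import numpy as np

rng = np.random.default_rng(1)

# A hypothetical stochastic matrix Q on 8 states, partitioned into the censoring
# set D = {0,...,4} and its complement {5,6,7}.
n, d = 8, 5
Q = rng.random((n, n))
np.fill_diagonal(Q, 0)                      # no self-loops in the jump chain
Q /= Q.sum(axis=1, keepdims=True)           # normalize rows

Q11, Q12 = Q[:d, :d], Q[:d, d:]
Q21, Q22 = Q[d:, :d], Q[d:, d:]

# Eq. (7): stochastic complement of Q11; (I - Q22) is invertible because the
# chain always eventually returns to D.
Q_cens = Q11 + Q12 @ np.linalg.inv(np.eye(n - d) - Q22) @ Q21

# The censored chain is again a proper Markov chain on D: rows sum to 1.
assert np.allclose(Q_cens.sum(axis=1), 1)
```

The final assertion checks the key property exploited in the text: censoring yields an exact Markov chain on $$\mathfrak{D}$$, not an approximation.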
Finally, parameter identification reduces to the following optimization problem: $$\label{optimization_problem} \min_{\theta_1,\ldots,\theta_{n_r}\in]0,1]}\,\sum_{x=0}^{x_{{\textrm{max}}}}\sum_{k=1}^K\left[p(t_k,x)-\hat{p}(t_k,x)\right]^2,$$ (8) where $$\hat{p}(t_k,x)=\frac{1}{N}\sum_{n=1}^N {\rm 1}\kern-0.24em{\rm I}_{\left\{X_1(t_k,\omega_n)=x\right\}} \quad\quad x\in\{0,1,\ldots,x_{{\textrm{max}}}\},$$ and $$\left(\begin{array}{c}p(t_k,0)\\\vdots\\p(t_k,x_{{\textrm{max}}})\end{array}\right)=\exp\left\{t_k\,{\it{\Lambda}}_{{\textrm{cens}}}\right\}\left(\begin{array}{c}p(0,0)\\\vdots\\p(0,x_{{\textrm{max}}})\end{array}\right) \quad\quad t_k\in\{t_1,\ldots,t_K\}.$$ Remark. It is possible to consider loss functions other than the squared difference in Eq. (8), e.g., the Kullback–Leibler divergence. Finally, it is worth noting that observing a single species and accordingly considering a one-dimensional observation process is a simplification rather than a limitation. If several components of the state are observable, the generalization of our method for parameter identification is straightforward. 5. Case study continued Let us go back to the first case study from ecological engineering. The purpose of this section is to emphasize and complete our approach for parameter identification while implementing it. 
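As a sketch of the optimization problem (8), the following uses a generic bounded least-squares solver on a hypothetical birth–death observable chain; the rate form, the truncation at $$x_{\textrm{max}}$$, the observation times and the "true" parameter values are all illustrative stand-ins, and noiseless model distributions play the role of the empirical histograms:

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import least_squares

x_max = 20
t_obs = np.array([10.0, 20.0, 30.0])

def rate_matrix(theta):
    """Lambda_cens for a hypothetical birth-death observable chain:
    lam(x,x+1) = th1 + th2*x (removed at x_max), lam(x,x-1) = th3*x."""
    th1, th2, th3 = theta
    n = x_max + 1
    L = np.zeros((n, n))
    for x in range(n):
        up = th1 + th2 * x if x < x_max else 0.0
        down = th3 * x
        if x < x_max:
            L[x + 1, x] = up         # column x holds the rates out of state x
        if x > 0:
            L[x - 1, x] = down
        L[x, x] = -(up + down)
    return L

def model_dists(theta, p0):
    L = rate_matrix(theta)
    return np.array([expm(t * L) @ p0 for t in t_obs])

p0 = np.zeros(x_max + 1)
p0[0] = 1.0
theta_true = np.array([0.10, 0.06, 0.17])
p_hat = model_dists(theta_true, p0)   # noiseless stand-in for the histograms

res = least_squares(lambda th: (model_dists(th, p0) - p_hat).ravel(),
                    x0=np.array([0.5, 0.5, 0.5]), bounds=(1e-6, 2.0))
init_cost = 0.5 * np.sum((model_dists(np.array([0.5, 0.5, 0.5]), p0) - p_hat) ** 2)
assert res.cost < init_cost           # the solver improves on the initial guess
```

As in the text, the solver only guarantees a locally optimal point, which is why the initial guess matters.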
In this case the state $$X(t)=(X_1(t),X_2(t))^{{\top}}$$ has three types of jumps with rates given by \begin{eqnarray*} \lambda((x_1,x_2),(x_1+1,x_2+1))&=&\theta_1{\rm 1}\kern-0.24em{\rm I}_{\left\{x_1\geq0\right\}}\,C_{x_1}^{0}{\rm 1}\kern-0.24em{\rm I}_{\left\{x_2\geq0\right\}}\,C_{x_2}^{0}+ \theta_2{\rm 1}\kern-0.24em{\rm I}_{\left\{x_1\geq1\right\}}\,C_{x_1}^{1}{\rm 1}\kern-0.24em{\rm I}_{\left\{x_2\geq0\right\}}\,C_{x_2}^{0},\\ \lambda((x_1,x_2),(x_1-1,x_2))&=&\theta_3{\rm 1}\kern-0.24em{\rm I}_{\left\{x_1\geq1\right\}}\,C_{x_1}^{1}{\rm 1}\kern-0.24em{\rm I}_{\left\{x_2\geq1\right\}}\,C_{x_2}^{1},\\ \lambda((x_1,x_2),(x_1,x_2-1))&=&\theta_4{\rm 1}\kern-0.24em{\rm I}_{\left\{x_1\geq0\right\}}\,C_{x_1}^{0}{\rm 1}\kern-0.24em{\rm I}_{\left\{x_2\geq1\right\}}\,C_{x_2}^{1}, \end{eqnarray*} i.e., \begin{eqnarray*} \lambda((x_1,x_2),(x_1+1,x_2+1))&=&\theta_1+\theta_2{\rm 1}\kern-0.24em{\rm I}_{\left\{x_1\geq1\right\}}\,x_1,\\ \lambda((x_1,x_2),(x_1-1,x_2))&=&\theta_3{\rm 1}\kern-0.24em{\rm I}_{\left\{x_1\geq1\right\}}{\rm 1}\kern-0.24em{\rm I}_{\left\{x_2\geq1\right\}}\,x_1x_2,\\ \lambda((x_1,x_2),(x_1,x_2-1))&=&\theta_4{\rm 1}\kern-0.24em{\rm I}_{\left\{x_2\geq1\right\}}\,x_2. \end{eqnarray*} Hence, the observable component of the state $$X_1(t)$$ has two types of jumps with the following rates: \begin{eqnarray*} \lambda(x,x+1)&=&\tilde{\theta}_1+\tilde{\theta}_2{\rm 1}\kern-0.24em{\rm I}_{\left\{x\geq1\right\}}\,x,\\ \lambda(x,x-1)&=&\tilde{\theta}_3{\rm 1}\kern-0.24em{\rm I}_{\left\{x\geq1\right\}}\,x. 
\end{eqnarray*} Without truncation, the transition rate matrix and the transition probability matrix of $$X_1(t)$$ are given by $${\it{\Lambda}}=\left(\begin{array}{cccccc}-\lambda(0,1)&\lambda(1,0)&0&\cdots\cdots&0&\cdots\\ \lambda(0,1)&-\lambda(1,0)-\lambda(1,2)&\lambda(2,1)&&\vdots&\cdots\\ 0&\lambda(1,2)&-\lambda(2,1)-\lambda(2,3)&&\vdots&\cdots\\ \vdots&0&\lambda(2,3)&&\vdots&\cdots\\ \vdots&\vdots&0&&\lambda(x_{{\textrm{max}}},x_{{\textrm{max}}}-1)&\cdots\\ \vdots&\vdots&\vdots&&-\lambda(x_{{\textrm{max}}},x_{{\textrm{max}}}-1)-\lambda(x_{{\textrm{max}}},x_{{\textrm{max}}}+1)&\cdots\\ \vdots&\vdots&\vdots&&\lambda(x_{{\textrm{max}}},x_{{\textrm{max}}}+1)&\cdots\\ \vdots&\vdots&\vdots&&0&\cdots\\ \vdots&\vdots&\vdots&&\vdots&\cdots \end{array}\right)\!,$$ $$Q^{{\top}}=\left(\begin{array}{cccccc}0&\frac{\lambda(1,0)}{\lambda(1,0)+\lambda(1,2)}&0&\cdots\cdots&0&\cdots\\ 1&0&\frac{\lambda(2,1)}{\lambda(2,1)+\lambda(2,3)}&&\vdots&\cdots\\ 0&\frac{\lambda(1,2)}{\lambda(1,0)+\lambda(1,2)}&0&&\vdots&\cdots\\ \vdots&0&\frac{\lambda(2,3)}{\lambda(2,1)+\lambda(2,3)}&&\vdots&\cdots\\ \vdots&\vdots&0&&\frac{\lambda(x_{{\textrm{max}}},x_{{\textrm{max}}}-1)}{\lambda(x_{{\textrm{max}}},x_{{\textrm{max}}}-1)+\lambda(x_{{\textrm{max}}},x_{{\textrm{max}}}+1)}&\cdots\\ \vdots&\vdots&\vdots&&0&\cdots\\ \vdots&\vdots&\vdots&&\frac{\lambda(x_{{\textrm{max}}},x_{{\textrm{max}}}+1)}{\lambda(x_{{\textrm{max}}},x_{{\textrm{max}}}-1)+\lambda(x_{{\textrm{max}}},x_{{\textrm{max}}}+1)}&\cdots\\ \vdots&\vdots&\vdots&&0&\cdots\\ \vdots&\vdots&\vdots&&\vdots&\cdots \end{array}\right)\!.$$ In the present case we are able to build the four blocks $$Q_{11}^{{\top}}$$, $$Q_{12}^{{\top}}$$, $$Q_{21}^{{\top}}$$ and $$Q_{22}^{{\top}}$$; it follows from Eq. 
(7) that $$Q_{{\textrm{cens}}}^{{\top}}=\left(\begin{array}{cccccc}0&q(1,0)&0&\cdots\cdots&0\\ 1&0&q(2,1)&&\vdots\\ 0&q(1,2)&0&&\vdots\\ \vdots&0&q(2,3)&&\vdots\\ \vdots&\vdots&0&&q(x_{{\textrm{max}}},x_{{\textrm{max}}}-1)\\ \vdots&\vdots&\vdots&&\propto q(x_{{\textrm{max}}},x_{{\textrm{max}}}+1)q(x_{{\textrm{max}}}+1,x_{{\textrm{max}}}) \end{array}\right)\!.$$ Here it appears clearly that we must have $$q(x_{{\textrm{max}}},x_{{\textrm{max}}}+1) =\frac{\lambda(x_{{\textrm{max}}},x_{{\textrm{max}}}+1)}{\lambda(x_{{\textrm{max}}},x_{{\textrm{max}}}-1)+\lambda(x_{{\textrm{max}}},x_{{\textrm{max}}}+1)}=0,$$ and, consequently, $$q(x_{{\textrm{max}}},x_{{\textrm{max}}}-1) =\frac{\lambda(x_{{\textrm{max}}},x_{{\textrm{max}}}-1)}{\lambda(x_{{\textrm{max}}},x_{{\textrm{max}}}-1)+\lambda(x_{{\textrm{max}}},x_{{\textrm{max}}}+1)}=1.$$ In other words, the truncation for the state space $$\mathbb{N}$$ of the observation process $$X_1(t)$$ turns out to be equivalent to killing or stopping $$X_1(t)$$ at $$\tau$$, the hitting time of $$x_{{\textrm{max}}}+1$$, and considering $$X_1(t)$$ on $$[0,\tau[$$. Finally, knowing $$\lambda(x)$$ for every $$x\in\mathfrak{D}$$, we derive the transition rate matrix for the censored chain with censoring set $$\mathfrak{D}$$, namely $${\it{\Lambda}}_{{\textrm{cens}}}=\left(\begin{array}{cccccc}-\lambda(0,1)&\lambda(1,0)&0&\cdots\cdots&0\\ \lambda(0,1)&-\lambda(1,0)-\lambda(1,2)&\lambda(2,1)&&\vdots\\ 0&\lambda(1,2)&-\lambda(2,1)-\lambda(2,3)&&\vdots\\ \vdots&0&\lambda(2,3)&&\vdots\\ \vdots&\vdots&0&&\lambda(x_{{\textrm{max}}},x_{{\textrm{max}}}-1)\\ \vdots&\vdots&\vdots&&-\lambda(x_{{\textrm{max}}},x_{{\textrm{max}}}-1) \end{array}\right)\!.$$ To complete the general statements of the previous section about the solution to the inverse problem, we now provide a concrete implementation for the numerical verification of the finiteness and non-emptiness of $$E_i$$, for every $$i\in\{1,\ldots,n_r\}$$. 
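The construction of $${\it{\Lambda}}_{{\textrm{cens}}}$$ from the holding rates $$\lambda(x)$$ and the censored jump chain can be sketched numerically; the birth–death rate form matches the case study, but the parameter values and the small $$x_{\textrm{max}}$$ are made up for illustration:

```python
import numpy as np

# Hypothetical birth-death rates in the form of the case study (made-up values).
th1, th2, th3 = 0.10, 0.06, 0.17
x_max = 6
n = x_max + 1

up = np.array([th1 + th2 * x if x < x_max else 0.0 for x in range(n)])  # up-jump killed at x_max
down = np.array([th3 * x for x in range(n)])
lam = up + down                                    # holding rates lambda(x) on D

# Censored jump chain: tridiagonal Q_cens with zero diagonal.
Q_cens = np.zeros((n, n))
for x in range(n):
    if up[x] > 0:
        Q_cens[x, x + 1] = up[x] / lam[x]
    if down[x] > 0:
        Q_cens[x, x - 1] = down[x] / lam[x]

# Recover the rate matrix: Lambda_cens[x', x] = lambda(x) * q(x, x'), diagonal -lambda(x).
Lambda_cens = (Q_cens * lam[:, None]).T - np.diag(lam)

assert np.allclose(Lambda_cens.sum(axis=0), 0)     # a proper generator on D
assert Q_cens[x_max, x_max - 1] == 1.0             # forced down-jump at x_max
```

The last assertion reproduces the observation above: at $$x_{\textrm{max}}$$ the censored chain must jump down with probability 1.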
The implementation details are as follows: $$X_1(0)$$ and $$X_2(0)$$ are discrete uniform random numbers from $$\{0,1\}$$; $$\theta_1=0.1$$, $$\theta_2=0.1$$, $$\theta_3=0.1$$, $$\theta_4=0.1$$ per time unit; $$x_{{\textrm{max}}}=100$$ individuals from the observable species. Figure 3 shows a particular trajectory of the two-dimensional PDMP $$X(t)$$. Fig. 3. Arbitrary sample paths of the species' population size. The empirical probability distributions of $$X_1(t)$$ observed at $$t_1=10$$, $$t_2=20$$, $$t_3=30$$, $$t_4=40$$, $$t_5=50$$ time units are displayed in chronological order as a column of histograms in Fig. 4. These histograms are compared with the corresponding probability distributions computed using the estimated parameters; the latter are shown as red stems in the subplots of Fig. 4. A solution to the optimization problem given by Eq. (8) is obtained using the optimization toolbox of MATLAB; here MATLAB's functions solve the optimization problem in the weak sense of finding a locally optimal point. lsqnonlin, for instance, requires an initial guess for the optimization variables $$(\tilde{\theta}_1,\tilde{\theta}_2,\tilde{\theta}_3)$$. This starting point is critical and can greatly affect the objective value of the local solution obtained. In this case, setting the initial guess to the zero lower bound yields the following estimates: $$\tilde{\theta}_1=0.9838,\quad\tilde{\theta}_2=0.6035,\quad\tilde{\theta}_3=1.7237.$$ Fig. 4. From the top: the empirical distributions (white bars) versus the estimated ones (red stems) for $$t_1=10$$, $$t_2=20$$, $$t_3=30$$, $$t_4=40$$, $$t_5=50$$ time units. $$x$$-axis: number of individuals from the observed species; $$y$$-axis: probability. 
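Trajectories of the two-dimensional process, like the one in Fig. 3, can be generated with a standard Gillespie simulation of the case-study rates ($$\theta_i=0.1$$ and uniform initial conditions as above; the time horizon and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)

# Case-study rates: theta_1 = ... = theta_4 = 0.1 per time unit.
th1 = th2 = th3 = th4 = 0.1

def rates(x1, x2):
    """The three jump types and their rates at state (x1, x2)."""
    return [((+1, +1), th1 + th2 * x1),   # birth of both species
            ((-1,  0), th3 * x1 * x2),    # death of an S1 individual
            (( 0, -1), th4 * x2)]         # death of an S2 individual

def gillespie(x1, x2, t_end):
    """Standard SSA: exponential holding times, jump chosen proportionally to its rate."""
    t = 0.0
    while True:
        jumps = rates(x1, x2)
        total = sum(r for _, r in jumps)  # total > 0 since th1 > 0
        t += rng.exponential(1 / total)
        if t > t_end:
            return x1, x2
        u = rng.random() * total
        for (d1, d2), r in jumps:
            u -= r
            if u <= 0:
                x1, x2 = x1 + d1, x2 + d2
                break

x1, x2 = gillespie(rng.integers(0, 2), rng.integers(0, 2), t_end=50.0)
assert x1 >= 0 and x2 >= 0
```

Running many independent copies of `gillespie` at each $$t_k$$ produces the replicates behind the empirical histograms of Fig. 4.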
Hence, \begin{eqnarray*} \frac{\tilde{\theta}_1}{\theta_1}=\sum_{\left\{x_2:\,(x_1,x_2)\in E_1\right\}}{\rm 1}\kern-0.24em{\rm I}_{\left\{x_2\geq\nu_{12}\right\}}\,C_{x_2}^{\nu_{12}}&\approx&10,\\ \frac{\tilde{\theta}_2}{\theta_2}=\sum_{\left\{x_2:\,(x_1,x_2)\in E_2\right\}}{\rm 1}\kern-0.24em{\rm I}_{\left\{x_2\geq\nu_{22}\right\}}\,C_{x_2}^{\nu_{22}}&\approx&6,\\ \frac{\tilde{\theta}_3}{\theta_3}=\sum_{\left\{x_2:\,(x_1,x_2)\in E_3\right\}}{\rm 1}\kern-0.24em{\rm I}_{\left\{x_2\geq\nu_{32}\right\}}\,C_{x_2}^{\nu_{32}}&\approx&17, \end{eqnarray*} where $$E_i=\left\{x\in E:\textrm{ there are jumps of type }i\textrm{ out of }x\right\}\quad\quad i\in\{1,2,3\}.$$ Thus, we have checked that for every $$i$$ for which $${\it{\Delta}}_{i1}\neq0$$ the countable set $$E_i$$ is non-empty and finite. 5.1. Artificial neural networks for parameter calibration In the remainder, we address the question of how to proceed when we have at hand only the real observed histograms and the initial probability distribution of the process $$X(t)$$. Here the outstanding issue is to find a triplet $$(\theta_1,\theta_2,\theta_3)$$ producing any desired triplet $$(\tilde{\theta}_1,\tilde{\theta}_2,\tilde{\theta}_3)$$. This can be addressed by learning the inverse of the function $$f({\it{\Theta}})=\tilde{{\it{\Theta}}},$$ where $${\it{\Theta}}=(\theta_1,\theta_2,\theta_3)$$ and $$\tilde{{\it{\Theta}}}=(\tilde{\theta}_1,\tilde{\theta}_2,\tilde{\theta}_3)$$. However, if two different $${\it{\Theta}}$$ vectors have the same $$\tilde{{\it{\Theta}}}$$ vector, then an inverse neural network would end up learning the average of the two rather than learning one or the other. 
We must therefore assume the function $$f$$ to be bijective, and we propose a method composed of two steps: First, we perform Monte-Carlo simulations of the system for $$N$$ independent and identically distributed samples $$(\theta_1,\theta_2,\theta_3)_n$$, $$n\in\{1,\ldots,N\}$$, drawn from the uniform distribution on $$]0,1]^3$$. For each $$n\in\{1,\ldots,N\}$$, we perform the optimization using the simulated data so as to obtain the triplet of estimates $$(\tilde{\theta}_1,\tilde{\theta}_2,\tilde{\theta}_3)_n$$. Now that we have a set of pairs $$\left\{(\theta_1,\theta_2,\theta_3)_n\,,\,(\tilde{\theta}_1,\tilde{\theta}_2,\tilde{\theta}_3)_n\right\} \quad\quad n\in\{1,\ldots,N\},$$ we then approximate the non-linear input–output relationship between the parameters $$(\tilde{\theta}_1,\tilde{\theta}_2,\tilde{\theta}_3)$$ and $$(\theta_1,\theta_2,\theta_3)$$ by supervised learning using artificial neural networks (ANN). Here the inputs are the $$(\tilde{\theta}_1,\tilde{\theta}_2,\tilde{\theta}_3)_n$$'s and the desired targets are the $$(\theta_1,\theta_2,\theta_3)_n$$'s. From these data, we must construct a map that generalizes well, that is, given a new value of $$(\tilde{\theta}_1,\tilde{\theta}_2,\tilde{\theta}_3)$$, the map will provide a reasonable prediction for the unobserved output $$(\theta_1,\theta_2,\theta_3)$$ associated with this new value of the predictor variable, at least when it is not too far from the training predictor set. Much interest in ANN has been based on the use of trainable feedforward networks as universal approximators for functions $$f:\mathcal{U}\longrightarrow\mathcal{Y}$$ from the input space $$\mathcal{U}$$ to the output space $$\mathcal{Y}$$. In fact, universal approximation results show that a fairly simple ANN can fit any practical function. The neural net fitting app of MATLAB makes it easy to create and train a single-hidden-layer feedforward neural net for function approximation and non-linear regression. 
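The two-step calibration can be sketched with scikit-learn in place of MATLAB's toolbox. Here a synthetic smooth map stands in for the pairs $$({\it{\Theta}}_n,\tilde{{\it{\Theta}}}_n)$$ (in practice each pair comes from a Monte-Carlo simulation followed by the least-squares optimization), and the network mirrors the single-hidden-layer sigmoid architecture described below; all numerical choices are illustrative:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(7)

# Synthetic stand-in for the training pairs: theta uniform on ]0,1]^3, and
# tilde_theta a hypothetical smooth function of theta.
N = 3500
theta = rng.uniform(1e-3, 1.0, size=(N, 3))
tilde = theta * np.array([10.0, 6.0, 17.0])

X_train, X_test, y_train, y_test = train_test_split(
    tilde, theta, test_size=0.15, random_state=0)

# Standardize inputs; early stopping carves a validation set (~15% of all data)
# out of the training portion, mirroring a 70/15/15 split.
mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
net = MLPRegressor(hidden_layer_sizes=(10,), activation="logistic",
                   early_stopping=True, validation_fraction=0.15 / 0.85,
                   max_iter=5000, random_state=0)
net.fit((X_train - mu) / sd, y_train)

mse = np.mean((net.predict((X_test - mu) / sd) - y_test) ** 2)
baseline = np.mean((y_test - y_train.mean(axis=0)) ** 2)
assert mse < baseline    # the net beats the trivial mean predictor
```

The held-out `mse` plays the role of the test-set performance reported for the MATLAB network.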
To improve generalization, the default method provided by MATLAB's neural network toolbox is early stopping. In this technique the available data is divided into three subsets: training, validation and test sets. The error on the validation set decreases during the initial phase of training, as does the training set error. However, when the network begins to overfit the data, the error on the validation set typically begins to rise. When the validation error increases for a specified number of iterations, the training is stopped and the weights and biases at the minimum of the validation error are returned. After training, the test data set is used to estimate the prediction error, that is, how well the network performs on future data. A data set of $$N=3500$$ examples is generated and then randomly divided into three sets; here 70% is used for training, 15% is used to validate, that is, to stop training before overfitting, and 15% is used to test the generalization ability of the network. A reasonably satisfactory approximation is obtained from a basic network with 10 neurons and a sigmoid transfer function in the hidden layer and a linear transfer function in the output layer. The network performance computed on the test data set, that is, the mean square error (MSE) between the actual output and the target, amounts to 0.0392. A plot of the training MSE, validation MSE and test MSE is shown in Fig. 5. The error histogram in Fig. 6 gives additional performance assessment. A linear regression between the network outputs and the corresponding targets for the training, validation and test sets is displayed in Fig. 7. Fig. 5. Training MSE, validation MSE and test MSE. Fig. 6. Error histogram. Fig. 7. 
Linear regression between the network outputs and the corresponding targets for the training, validation, and test sets. 6. Conclusion There is a wide range of applications in the field of stochastic interaction networks; here model identification is theoretically the most challenging problem. We present an advance in assessing and learning the functional connections within such a network using a PDMP as the model. While its underlying assumptions are more realistic, a PDMP lends itself well to the identification problem. We illustrate through two applications of biological network modelling. The conformity between the implementation results and the general statements, along with the low computational cost, demonstrates the performance of our approach.

References

Baili H. & Fleury G. (2002) Statistical characterization of a measurement—approach using MCMC. Proceedings of the IEEE 19th Instrumentation and Measurement Technology Conference, Anchorage, USA, May 21–23.
Baili H. & Fleury G. (2004) Indirect measurement within dynamical context: probabilistic approach to deal with uncertainty. IEEE Trans. Instrum. Meas., 53, 1449–1454.
Bect J., Baili H. & Fleury G. (2006) Generalized Fokker–Planck equation for piecewise-diffusion processes with boundary hitting resets. Proceedings of the 17th International Symposium on Mathematical Theory of Networks and Systems, Kyoto, Japan, July 24–28.
Brémaud P. (1981) Point Processes and Queues: Martingale Dynamics. Springer.
Bušić A., Djafri H. & Fourneau J. M. (2012) Bounded state space truncation and censored Markov chains. IEEE 51st Annual Conference on Decision and Control, Maui, Hawaii, USA, December 10–13.
Gardiner C. W. 
(1985) Handbook of Stochastic Methods for Physics, Chemistry and the Natural Sciences, 2nd edn. Berlin: Springer.
Hegland M. & Garcke J. (2010) On the numerical solution of the chemical master equation with sums of rank one tensors. ANZIAM J., 52, C628–C643.
Hohmann S. (2002) Osmotic stress signaling and osmoadaptation in yeasts. Microbiol. Mol. Biol. Rev., 66, 300–372.
Jacod J. & Skorokhod A. V. (1996) Jumping Markov processes. Ann. I. H. P., 32, 11–67.
Jahnke T. & Huisinga W. (2008) A dynamical low-rank approach to the chemical master equation. Bull. Math. Biol., 70, 2283–2302.
Kazeev V., Khammash M., Nip M. & Schwab C. (2014) Direct solution of the chemical master equation using quantized tensor trains. PLOS Comput. Biol., 10.
Klebaner F. C. (2005) Introduction to Stochastic Calculus with Applications, 2nd edn. London: Imperial College Press.
Kulasiri D. & Verwoerd W. (2002) Stochastic Dynamics. Modelling Solute Transport in Porous Media, 1st edn. Amsterdam: Elsevier Science.
Øksendal B. (2003) Stochastic Differential Equations: An Introduction with Applications, 6th edn. Springer.
Pelet S. et al. (2011) Transient activation of the HOG MAPK pathway regulates bimodal gene expression. Science, 332, 732–735.
Protter P. E. (2005) Stochastic Integration and Differential Equations, 2nd edn. Springer.
Risken H. (1989) The Fokker–Planck Equation: Methods of Solution and Applications, 2nd edn. Springer.
Ruess J., Milias-Argeitis A., Summers S. & Lygeros J. (2011) Moment estimation for chemically reacting systems by extended Kalman filtering. J. Chem. Phys., 135, 165102. 
Zechner C. et al. (2012) Moment-based inference predicts bimodality in transient gene expression. Proc. Natl Acad. Sci. USA, 109, 8340–8345.

© The authors 2017. Published by Oxford University Press on behalf of the Institute of Mathematics and its Applications. All rights reserved.

### Journal

IMA Journal of Mathematical Control and Information, Oxford University Press

Published: Sep 11, 2017
