Approximating first-passage time distributions via sequential Bayesian computation

We show that computing first-passage time distributions for stochastic processes is equivalent to a sequential Bayesian inference problem for an auxiliary observation process. The solution can be approximated efficiently by solving a closed set of coupled ordinary differential equations. The method is accurate and orders of magnitude faster than existing approaches, enabling hitherto computationally prohibitive tasks such as sensitivity analysis. We apply it to an epidemic model, an entrained oscillator and a trimerisation process, and show good agreement with exact stochastic simulations.

Many systems in nature consist of stochastically interacting agents or particles. Such systems are typically modelled as continuous-time Markov processes, with examples of stochastic modelling found in systems biology [1,2], ecology [3], epidemiology [4], the social sciences [5] and neuroscience [6]. For all but the simplest cases no analytic solutions to the corresponding master or Fokker-Planck equations are known, and exact stochastic simulations of such processes are computationally expensive. Significant effort has therefore been spent on the development of approximation methods for the efficient computation of system dynamics [7].
Most methods compute the single-time marginal probability distributions of the process by developing an approximation to the underlying stochastic description. Path properties, in contrast, are not captured by these equations. An important path quantity is the first-passage time (FPT), that is, the time it takes the process to first cross a certain threshold [8,9]. FPT distributions play a crucial role both in the theory of stochastic processes and in their applications across various disciplines, as they permit quantitative investigation of the uncertainty in the emergence of system properties within a finite time horizon. For example, the time it takes cells to respond to external signals by expressing certain genes may be modelled as a FPT problem. Different characteristics of this first-passage time distribution, for example its variance, may represent evolutionarily different strategies that organisms adopt to filter fluctuations in the environment [10][11][12]. Other intriguing phenomena, such as the variation in period of a stochastic oscillator, can be cast as a FPT problem. Fluctuations of such oscillators have recently been found to lead to widely varying differentiation times of genetically identical cells [13]. Examples from other disciplines include the extinction time of diseases in epidemic models, or the time for a species to reach a critical threshold in population dynamics.
Barely any approximation methods exist for path properties such as FPTs, and the few existing approaches typically approximate mean FPTs only [14]. The reason for this methodological gap is conceptual. The FPT probability is the probability that a process reaches a certain state for the first time, that is, the probability of being in a certain state at a given time conditioned on the process not having been in this state before. In general, there are no tractable evolution equations for FPT distributions except for one-variable, one-step processes [15,16] or certain catalytic processes [17,18]. The only way to access FPTs is typically through computationally intensive stochastic simulations.
In this article, we approach the problem of computing FPTs from a novel perspective. We show that the FPT problem can be exactly formulated as a Bayesian inference problem. This is achieved by introducing an auxiliary observation model that determines if the process has ever crossed the threshold up to a given time. This novel formulation allows us to derive an efficient approximation scheme that relies on the solution of a small set of ordinary differential equations. We will use this approximation to analyse the FPT distributions in several non-trivial examples.
The standard approach to assess the FPT of a process x_t to leave a certain region C is to make the boundary of C absorbing, and to compute the survival probability Z_[0,t], that is, the probability that the process remains in C on the time interval [0, t] [16]. The FPT distribution is then given by the negative time-derivative of Z_[0,t]. The latter can be written as a path integral over the process with absorbing boundary [16]. Here, we note that an absorbing boundary is mathematically equivalent to an unconstrained process reweighted by an auxiliary observation process p(C_[0,t] | x_[0,t]), which equals one if x_τ ∈ C for all τ ∈ [0, t] and zero otherwise. We can then write the survival probability Z_[0,t] up to time t as a path integral over the density p(x_[0,t]) of the unconstrained process as

Z_[0,t] = ∫ Dx_[0,t] p(C_[0,t] | x_[0,t]) p(x_[0,t]).   (1)

Note that (1) is exact. However, it is not obvious how to compute the path integral in (1).
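The reweighting identity can be checked numerically in a simple setting. The sketch below is a hypothetical 1D Brownian-motion example (not one of the systems studied in this article): the survival probability below a barrier a is estimated by averaging the binary observation process over unconstrained simulated paths, and compared with the known analytic result erf(a/√(2t)).

```python
import numpy as np
from math import erf, sqrt

# Toy check of the reweighting identity (hypothetical 1D example): survival
# of standard Brownian motion below a barrier a, estimated by weighting
# *unconstrained* paths with the indicator that the whole path stayed in
# C = (-inf, a).
rng = np.random.default_rng(0)
a, T, n_steps, n_paths = 1.0, 1.0, 1000, 10000
dt = T / n_steps

increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
paths = np.cumsum(increments, axis=1)

# Indicator observation process: 1 iff x_t < a at every discretised time.
weights = np.all(paths < a, axis=1)
Z_mc = weights.mean()

# Analytic survival probability for Brownian motion below a barrier.
Z_exact = erf(a / sqrt(2 * T))
print(Z_mc, Z_exact)
```

The small residual discrepancy is the usual time-discretisation bias (crossings between grid points are missed), which vanishes as the step size shrinks.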
To make progress, we approximate the continuous process with paths x_[0,t] by a discrete-time version (x_t0, ..., x_tN) at points t_0 = 0, ..., t_N = t with spacing ∆t = t/N. The global observation process p(C_[0,t] | x_[0,t]) can then be written as a product of local observation processes as

p(C_[0,t] | x_[0,t]) = ∏_{i=0}^{N} p(C_ti | x_ti),   (2)

where p(C_ti | x_ti) = 1 if x_ti ∈ C and zero otherwise. This gives the problem a Markovian structure and allows us to cast it into a sequential Bayesian inference problem, as follows. First, we approximate the binary observation factors in (2) by a smooth approximation of the form

p(C_ti | x_ti) ≈ exp(−∆t U(x_ti, t_i)),   (3)

where U(x_ti, t_i) is a smooth function that is large for x_ti ∉ C and close to zero for x_ti ∈ C, with a sharp transition at the boundary. We will give specific examples later on. For now, the only properties U(x_ti, t_i) has to satisfy are that its expectation with respect to a normal distribution can be computed analytically and that this expectation has a closed-form expression in the moments of that distribution. The smooth approximation in (3) proves computationally expedient in the algorithm below and allows us to take the continuum limit ∆t → 0, since it is the discretised version of the global soft constraint

p(C_[0,t] | x_[0,t]) ≈ exp(−∫_0^t U(x_τ, τ) dτ).   (4)

Due to the Markov property, the survival probability Z_[0,t] in (1) for the discrete-time system factorises as

Z_[0,t] = ∏_{i=0}^{N} p(C_ti | C_≤ti−1),   (5)

where C_≤ti = {C_ti, C_ti−1, ..., C_t0}. The computation of the factors on the r.h.s. of (5) corresponds to a sequential Bayesian inference problem, which can be solved iteratively as follows: (i) Suppose we know p(x_ti | C_≤ti), and that, using this as the initial distribution, we are able to solve the system (the master or Fokker-Planck equation) forward in time until time point t_i+1 to obtain p(x_ti+1 | C_≤ti).
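The effect of replacing the binary observation factors by the soft weights in (3) can be probed numerically. The sketch below uses the same hypothetical 1D Brownian setting as above, now with a two-sided region C = (−a, a) and a steep polynomial potential U(x) = (x/a)^n of the kind used later: the hard and soft survival estimates are computed on identical discretised paths.

```python
import numpy as np

# Sketch (hypothetical 1D setup): compare the hard binary constraint with
# the smooth weights exp(-dt * U(x)), U(x) = (x/a)^n (n even, steep), on
# the same discretised Brownian paths in the region C = (-a, a).
rng = np.random.default_rng(1)
a, T, n_steps, n_paths, n = 1.0, 1.0, 1000, 5000, 40
dt = T / n_steps

paths = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps)), axis=1)

# Hard constraint: path stayed inside C at every discretised time point.
Z_hard = np.all(np.abs(paths) < a, axis=1).mean()

# Soft constraint: product over time of exp(-dt * U(x_ti)), i.e. the
# discretised version of exp(-integral of U dt) in (4).
Z_soft = np.exp(-dt * ((paths / a) ** n).sum(axis=1)).mean()
print(Z_hard, Z_soft)
```

Paths that graze the boundary keep a small residual weight under the soft constraint, so the soft estimate lies slightly above the hard one; the gap shrinks as the power n is increased.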
(ii) To obtain p(x_ti+1 | C_≤ti+1), take the constraint in (3) for time point t_i+1 into account by means of Bayes' theorem as

p(x_ti+1 | C_≤ti+1) = p(C_ti+1 | x_ti+1) p(x_ti+1 | C_≤ti) / Z_ti+1,   (6)

where we defined Z_ti+1 = p(C_ti+1 | C_≤ti). Note that the latter corresponds to the (i+1)th factor in (5).
Performing steps (i) and (ii) iteratively from t 0 to t N one can thus, in principle, compute the survival probability according to (5).
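For intuition, a single observation update (step (ii)) can be carried out by brute-force quadrature in one dimension. The sketch below uses hypothetical numbers: a Gaussian prior is multiplied by the soft likelihood exp(−∆t U(x)) from (3), and the posterior is projected back onto a Gaussian by matching its mean and variance, as in the assumed-density filtering scheme described below.

```python
import numpy as np

# One observation update (step (ii)) for a hypothetical 1D example, by
# brute-force quadrature: multiply a Gaussian prior by the soft likelihood
# exp(-dt * U(x)), then project the posterior back onto a Gaussian by
# matching mean and variance. All numbers are illustrative.
mu, var = 0.3, 0.5           # prior N(mu, var) after the forward solve
a, n, dt = 2.0, 8, 0.1       # soft potential U(x) = (x/a)^n

x = np.linspace(-8.0, 8.0, 20001)
dx = x[1] - x[0]
prior = np.exp(-((x - mu) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)
lik = np.exp(-dt * (x / a) ** n)

unnorm = prior * lik
Z_step = unnorm.sum() * dx              # local normalisation Z_{t_{i+1}}
post = unnorm / Z_step
mu_post = (x * post).sum() * dx         # projected (moment-matched) mean
var_post = ((x - mu_post) ** 2 * post).sum() * dx
print(Z_step, mu_post, var_post)
```

The update shrinks the mean toward the constraint region and reduces the variance, while Z_step accumulates one factor of the survival probability (5).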
However, steps (i) and (ii) are generally intractable, and we propose an approximation method in the following. For step (i), we need to solve the system forward in time. We do this approximately by means of the normal moment closure [19], which assumes the single-time probability distribution to be a multivariate normal distribution N(x_t; μ_t, Σ_t), where μ_t and Σ_t are the mean and covariance of the process. This assumption leads to closed evolution equations for μ_t and Σ_t which can be solved numerically [7]. Now suppose that we have solved the system forward from time t to t + ∆t to obtain μ̃_t+∆t and Σ̃_t+∆t (step (i)). In order to perform the observation update in (6), we approximate the posterior distribution p(x_t+∆t | C_≤t+∆t) by a normal distribution with mean μ_t+∆t and covariance Σ_t+∆t, an approach known as assumed-density filtering in the statistics literature [20]. The corresponding normalisation Z_t+∆t in (6) can be written as

Z_t+∆t = ∫ dx exp(−∆t U(x, t + ∆t)) N(x; μ̃_t+∆t, Σ̃_t+∆t).   (7)

Taking the logarithm of both sides, expanding in ∆t both the exponential within the integral and the logarithm, and taking derivatives w.r.t. μ and Σ, one obtains the corresponding update equations (see Appendix A for a derivation). The continuum limit ∆t → 0 gives the following closed set of differential equations:

dμ_t/dt = a(μ_t, Σ_t) − Σ_t ⟨∇_x U(x_t, t)⟩,   (8)
dΣ_t/dt = b(μ_t, Σ_t) − Σ_t ⟨∇_x ∇_x^T U(x_t, t)⟩ Σ_t,   (9)
d/dt log Z_[0,t] = −⟨U(x_t, t)⟩,   (10)

where ⟨·⟩ denotes the expectation with respect to N(x_t; μ_t, Σ_t). Here, the first terms a(μ_t, Σ_t) and b(μ_t, Σ_t) on the r.h.s. of (8) and (9) are respectively the prior equations for the mean and covariance as obtained from the normal moment closure for the unconstrained process, while the second terms incorporate the auxiliary observation. Equation (10) gives the desired survival probability for the process. We term this method for computing FPT distributions Bayesian First-Passage Times (BFPT). Equations (8)-(10) are the central result of this article. They constitute closed-form ordinary differential equations for the mean, covariance and log-survival probability of the process, for which efficient numerical integrators exist.
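The structure of equations (8)-(10) can be made concrete in one dimension. The sketch below is a hypothetical test case, not one of the article's systems: an Ornstein-Uhlenbeck process dx = −θx dt + σ dW (for which the normal moment closure is exact), with the polynomial potential U(x) = (x/a)^n. The Gaussian expectations of U and its derivatives follow from the standard raw-moment recursion, and the three equations are integrated with a simple Euler scheme.

```python
import numpy as np

# Minimal sketch of the closed ODE system (8)-(10) for a hypothetical 1D
# Ornstein-Uhlenbeck process dx = -theta*x dt + sigma dW, with the soft
# potential U(x) = (x/a)^n. The three equations propagate the mean, the
# variance and the log-survival probability.
theta, sigma, a, n = 1.0, 1.0, 2.0, 8
dt, T = 1e-3, 5.0

def gauss_moments(mu, var, kmax):
    """Raw moments E[x^k] of N(mu, var) via the standard recursion."""
    m = [1.0, mu]
    for k in range(2, kmax + 1):
        m.append(mu * m[k - 1] + (k - 1) * var * m[k - 2])
    return m

mu, var, logZ = 0.0, 0.1, 0.0
Z_trace = [1.0]
for _ in range(int(T / dt)):
    m = gauss_moments(mu, var, n)
    EU = m[n] / a**n                               # <U>
    EdU = n * m[n - 1] / a**n                      # <U'>
    Ed2U = n * (n - 1) * m[n - 2] / a**n           # <U''>
    mu += dt * (-theta * mu - var * EdU)                       # eq. (8)
    var += dt * (sigma**2 - 2 * theta * var - var**2 * Ed2U)   # eq. (9)
    logZ += dt * (-EU)                                         # eq. (10)
    Z_trace.append(np.exp(logZ))

Z = np.array(Z_trace)
print(Z[-1])
```

The first terms in the mean and variance updates are the (here exact) prior moment equations of the OU process; the potential-dependent terms implement the observation update, and Z decays monotonically as probability mass leaks out of the constraint region.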
Solving these equations forward in time on an interval [0, t] provides an approximation of the survival probability Z_[0,τ] for all τ ∈ [0, t] (on the time grid of the numerical integrator), from which the FPT distribution p(τ; C) can be derived by taking the negative derivative of Z_[0,τ]. We note that similar equations were obtained in a different context in [21] by means of a variational approximation without the need for time-discretisation.
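The final differentiation step is straightforward on a grid. The toy check below (a synthetic exponential survival curve, not output of the method) recovers the FPT density as the negative time-derivative of Z and verifies that it integrates to the total absorbed probability 1 − Z(T).

```python
import numpy as np

# The FPT density is the negative time-derivative of the survival
# probability. Toy check on a synthetic survival curve Z(t) = exp(-lam*t),
# whose FPT density is lam * exp(-lam*t).
lam, T = 2.0, 3.0
t = np.linspace(0.0, T, 3001)
Z = np.exp(-lam * t)

p = -np.gradient(Z, t)                               # FPT density on the grid
mass = np.sum(0.5 * (p[1:] + p[:-1]) * np.diff(t))   # trapezoidal integral
print(mass, 1.0 - Z[-1])
```

The integrated density matches 1 − Z(T) up to discretisation error, confirming the normalisation of the recovered distribution.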
Note that in the derivation of (8)-(10) we used three approximations: the unconstrained process was approximated by the normal moment closure, the binary observation model by a soft potential, and the observation updates by projections onto a normal distribution. Depending on the circumstances, the relative contribution of these three sources to the overall error may vary.
We now examine the performance of BFPT on three numerical examples. First, we consider an epidemic system consisting of a susceptible population S, an infected population I and a recovered population R, with interactions

S + I --(k1)--> 2I,    I --(k2)--> R.   (11)

This system is frequently modelled as a continuous-time Markov-jump process describing a disease spreading through a population; k1 and k2 in (11) are the corresponding rate constants. Let x_t = (x_t, y_t, z_t), where x_t, y_t and z_t denote the populations of S, I and R, respectively. We are interested in the probability distribution of the time for the disease to be permanently eradicated, that is, the time it takes the process to reach a state with y_t = 0. To approximate this extinction region by a soft constraint as in (3) we use a potential

U(x_t, t) = (y_t − b)^n / a^n,   a ∈ R+, b ∈ R, n ∈ Z and even.   (12)
We found a polynomial potential as in (12) to be numerically more stable than, for example, an exponential potential of the form exp(−n(x − b)), and we use this form for the following examples as well. For more details on the choice of U(x_t, t) see Appendix B. Figure 1 shows the mean, variance and mode of the FPT to extinction as obtained from our method and from the stochastic simulation algorithm [22]. We find that BFPT captures the mean value of the FPT over a wide range of initial values for S and I. The variance and mode are also captured well, with errors decreasing for larger initial values. Figure 1 (c) and (d) show the FPT distributions for four different parameter sets. The modality, mode and overall shape of the FPT distribution are well captured, even for highly skewed and bimodal distributions (cf. the blue curves in Fig. 1 (c) and (d), respectively). In some cases the method predicts less peaked distributions than the actual ones (not shown here).
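The reference extinction times above come from the direct stochastic simulation algorithm [22]. A minimal sketch of that estimator for the SIR system (11) is given below; the parameter values and population sizes are hypothetical, chosen only to illustrate how the comparison data are generated.

```python
import numpy as np

# Reference extinction times via the direct stochastic simulation algorithm
# (Gillespie) for the SIR system S + I -> 2I, I -> R. Parameter values here
# are hypothetical, chosen only to illustrate the estimator.
rng = np.random.default_rng(2)
k1, k2 = 0.02, 1.0          # infection and recovery rate constants
S0, I0 = 50, 5

def extinction_time():
    S, I, t = S0, I0, 0.0
    while I > 0:             # disease is eradicated when I hits zero
        a1 = k1 * S * I      # propensity of S + I -> 2I
        a2 = k2 * I          # propensity of I -> R
        a0 = a1 + a2
        t += rng.exponential(1.0 / a0)      # waiting time to next reaction
        if rng.random() < a1 / a0:
            S, I = S - 1, I + 1
        else:
            I -= 1
    return t

times = np.array([extinction_time() for _ in range(2000)])
print(times.mean(), times.var())
```

Each sampled path terminates in finite time because the susceptible pool is finite, so the empirical mean and variance of the extinction FPT can be read off directly.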
The value of the approach is borne out by its computational efficiency: for the results shown in Fig. 1, BFPT is several orders of magnitude faster than stochastic simulations. For example, simulating 10^4 paths to obtain the results shown in Fig. 1 (d) takes about 10^2 seconds in our implementation of the direct stochastic simulation algorithm [22], whereas BFPT takes less than a second. Crucially, while stochastic simulations scale with the number of transitions per unit time and thus with the average population size, BFPT scales with the number of species. For larger population sizes stochastic simulations therefore become increasingly expensive, while BFPT's cost can be expected to remain of the same order.
Next, we study a stochastic oscillator that becomes entrained by an external input. Such models are frequently used to describe circadian oscillators in plants [23]. We are interested in the fluctuations of the oscillatory period of the stochastic process. To this end, we use a prey-predator system of the Lotka-Volterra type, consisting of a prey species X and a predator species Y that interact via

X + Y --(k1)--> 2Y,    Y --(k3)--> ∅,    X --(k2(t))--> 2X.   (13)

Note that the rate k2(t) of the third reaction, which resembles reproduction of prey X, oscillates around a mean value k2 with amplitude a and period p, that is, k2(t) = k2 (1 + a sin(2πt/p)). This could for example model the changing availability of food resources due to seasons over the period of a year.

[Figure 1 caption fragment: The white areas in the first two plots indicate that either the value is larger than the plotted range or that the target state is reached with such small probability that an estimation of moments is not sensible. The power in the potential in (12) was chosen to be n = 40.]

Figure 2 shows the distribution of one oscillatory period. The oscillatory period is computed by estimating the FPT for the process, starting from its maximum minus half a standard deviation, to first reach this value again after one period (see Appendix C for details). Again BFPT is in excellent agreement with simulation results, but in this case the difference in computational cost is even more marked: due to the time-dependence of the third reaction in (13), this system cannot be simulated by standard simulation algorithms [22] and requires a computationally intensive modified simulator [24]. In this example, simulating 1000 periods to produce the results shown in Fig. 2 took about 10^3 hours in our implementation. The BFPT result in Figure 2, in contrast, took only 0.1 seconds, meaning that BFPT is about 7 orders of magnitude faster than the exact simulation.
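One standard ingredient of simulators that handle time-dependent propensities, such as the modified algorithm [24] used here, is thinning (rejection sampling against a constant upper bound on the rate). The sketch below illustrates the idea on a bare inhomogeneous Poisson process with the oscillating rate k2 (1 + a sin(2πt/p)); the parameter values are hypothetical and the sketch is not the full reaction-network simulator.

```python
import numpy as np

# Thinning sketch: events of an inhomogeneous Poisson process with rate
# k2 * (1 + a * sin(2*pi*t/p)) are generated by proposing from the constant
# upper bound k2 * (1 + a) and accepting with probability rate(t) / bound.
# Parameter values are hypothetical.
rng = np.random.default_rng(3)
k2, a, p = 2.0, 0.5, 1.0

def rate(t):
    return k2 * (1.0 + a * np.sin(2.0 * np.pi * t / p))

def count_events(T):
    bound = k2 * (1.0 + a)
    t, count = 0.0, 0
    while True:
        t += rng.exponential(1.0 / bound)   # propose from the bounding rate
        if t > T:
            return count
        if rng.random() < rate(t) / bound:  # accept with rate(t) / bound
            count += 1

counts = np.array([count_events(p) for _ in range(5000)])
print(counts.mean())  # close to k2 * p, since the sine averages out over a period
```

Over one full period the sinusoidal modulation integrates to zero, so the mean event count equals k2 · p, which the thinned sampler reproduces.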
Finally, we apply BFPT to a polymerisation system of monomers X, dimers XX and trimers XXX with interactions

X + X --(k1)--> XX,    XX + X --(k2)--> XXX.   (14)

Starting from a fixed number of 10^3 monomers, zero dimers and zero trimers, we are interested in the FPT to produce 200 trimers. We want to explore the dependence of this FPT distribution on the parameters of the system (the dimerisation and trimerisation rates k1 and k2, respectively); such parameter exploration is computationally too demanding to be performed by brute-force simulation without access to dedicated hardware, since the FPT distribution needs to be estimated for a large number of parameter sets. Figure 3 shows the results for this process. We observe excellent agreement between BFPT and simulations for a particular value of the parameters (Fig. 3 (a)). The heat plot of the mean as a function of k1 and k2 indicates that, for a given trimerisation rate k2, a minimal mean FPT is achieved at an intermediate value of the dimerisation rate k1. We find a linear relationship k2 ≈ 2.3 k1 between the two rates at the location of these minima. The variance of the FPT behaves quantitatively similarly (not shown in the figure). The coefficient of variation (Fig. 3 (c)), however, becomes minimal for small values of k1 for a given k2. This reveals an unexpected trade-off between an optimal mean FPT and an optimal noise-to-mean ratio (coefficient of variation). Figure 3 (d) shows the probability that the target state is reached, that is, the probability that at least 200 trimers are produced. We find that there are two parameter regions, one with probability close to one and one with probability close to zero, with a small transition range between them around the boundary k2 ≈ 0.55 k1.
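The summary statistics swept over in Fig. 3 can be read directly off the survival curve produced by BFPT: when absorption is certain, E[T] = ∫ Z(t) dt and E[T²] = 2 ∫ t Z(t) dt. The toy check below verifies these identities on a synthetic exponential survival curve, for which the mean is 1/λ and the coefficient of variation is 1.

```python
import numpy as np

# FPT moments from the survival curve: E[T] = integral of Z(t) dt and
# E[T^2] = 2 * integral of t * Z(t) dt (valid when absorption is certain).
# Toy check on Z(t) = exp(-lam*t): mean = 1/lam, coefficient of variation = 1.
lam = 0.5
t = np.linspace(0.0, 40.0, 40001)   # integrate far into the tail
Z = np.exp(-lam * t)
dt_grid = t[1] - t[0]

mean = np.sum(0.5 * (Z[1:] + Z[:-1])) * dt_grid                    # E[T]
second = 2.0 * np.sum(0.5 * ((t * Z)[1:] + (t * Z)[:-1])) * dt_grid  # E[T^2]
cv = np.sqrt(second - mean**2) / mean
print(mean, cv)
```

In a parameter sweep, each solve of equations (8)-(10) yields one survival curve, from which mean, variance and coefficient of variation follow by these quadratures at negligible extra cost.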
In conclusion, we have shown that the problem of computing survival probabilities and FPT distributions for stochastic processes can be formulated as a sequential Bayesian inference problem. This novel formulation opens the way for a new class of efficient approximation methods from machine learning and computational statistics to address this classically intractable problem. Here, we derived one approximation which relies on solving a small set of ordinary differential equations and which was found to combine distributional accuracy with high computational efficiency on several examples. This efficiency can be leveraged to perform hitherto computationally prohibitive systematic analyses such as parameter sweeps, and opens the way to a quantitative understanding of how physical parameters interact to determine emergent behaviours in stochastic systems. The approach can in principle also be used to compute more general path properties, such as general reachability constraints in computer science; it would be interesting to investigate extensions of this approach to intractable problems for high-dimensional and spatio-temporal processes.

[Fragment from Appendix B, on the choice of the potential:] In the trimerisation example, the constraint requires the number of trimers to be smaller than 200. Choosing a = 200 and b = 0, we confine the trimer numbers to [−200, 200]. However, since the probability mass for negative values is vanishingly small (zero for the true, discrete process), this is effectively the same as confining them to (−∞, 200]. Similar arguments apply to the other studied examples. This leaves us with the choice of the power n. A larger n corresponds to a steeper potential and hence to a better approximation of the binary observation process. However, a larger n also leads to stronger non-linearity of equations (8)-(10), which will typically limit n for numerical feasibility. Optimally, one would hope the results to converge in the power n for values at which the equations are still numerically feasible. For some of the examples studied we indeed found this to be the case; for others, we chose n as large as numerically feasible. A more systematic way of choosing n is left for future work.
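The convergence of the soft constraint to the binary one as n grows can be visualised directly. The sketch below is a hypothetical 1D setting (a standard Gaussian and region |x| < a): the expectation of the soft factor exp(−U(x)) with U(x) = (x/a)^n is compared against the hard probability P(|x| < a) for increasing n.

```python
import numpy as np

# Numerical look at the choice of the power n (hypothetical 1D setting):
# as n grows, the expectation of the soft factor exp(-U(x)) with
# U(x) = (x/a)^n under N(0, 1) approaches the probability of the hard
# constraint |x| < a, i.e. the soft observation converges to the binary one.
x = np.linspace(-6.0, 6.0, 120001)
dx = x[1] - x[0]
gauss = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # standard normal density
a = 2.0

P_hard = np.sum(gauss[np.abs(x) < a]) * dx       # P(|x| < a)
errs = []
for n in (4, 8, 16, 32, 64):
    soft = np.exp(-((x / a) ** n))
    errs.append(abs(np.sum(gauss * soft) * dx - P_hard))
print(errs)
```

The error shrinks steadily with n, which is the convergence one hopes to observe before the growing non-linearity of equations (8)-(10) makes larger powers numerically infeasible.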