Non-Markovian gene expression

We study two non-Markovian gene-expression models in which protein production is a stochastic process with a fat-tailed non-exponential waiting time distribution (WTD). For both models, we find two distinct scaling regimes separated by an exponentially long time, proportional to the mean first passage time (MFPT) to a ground state (with zero proteins) of the dynamics, from which the system can only exit via a non-exponential reaction. At times shorter than the MFPT the dynamics are stationary and ergodic, entailing similarity across different realizations of the same process, with an increased Fano factor of the protein distribution, even when the WTD has a finite cutoff. Notably, at times longer than the MFPT the dynamics are nonstationary and nonergodic, entailing significant variability across different realizations. The MFPT to the ground state is shown to directly affect the average population sizes and we postulate that the transition to nonergodicity is universal in such non-Markovian models.

Introduction.Gene expression is an inherently stochastic process, playing a key role in the function of prokaryotes and eukaryotes [1][2][3][4][5].Stochastic gene expression in genetically identical cells in identical environments is governed by mRNA and protein noise, and is suggested to be an important source of phenotype variability [6][7][8], and an important trait that can optimize the balance of fidelity and diversity in eukaryotic gene expression [9].
Multiple studies have treated stochastic gene expression using either chemical master equations or Langevin equations in various models [4,[10][11][12][13][14][15][16][17].However, most existing models are Markovian with exponentially distributed inter-reaction times [11].Although this is a valid assumption in many realistic cases [18], it is becoming apparent that many natural processes exhibit long delays or non-exponential intrinsic waiting times [19][20][21].Molecular memory can be created, e.g., due to incomplete mixing of small reaction steps involved in the synthesis of macromolecules, such as mRNA or proteins [22][23][24][25][26].Moreover, even in simple geometries, reaction dynamics are characterized by an enormous spread of relevant time scales [27][28][29], such that reaction and diffusion control are intricately coupled, in contrast to models based on (global) mean first passage and reaction times [30][31][32][33].Such defocused reaction times are relevant for intra-cellular regulation for low-concentration reactants.In particular, processes governed by powerlaw waiting times were shown to determine the motion of protein channels in membranes of living cells, displaying diffusion-controlled anomalous dynamics [34].
Here, we apply the CTRW formalism to gene expression models with intrinsic fat-tailed WTD depending on the system's current state, and identify a general class of models transitioning between ergodic and nonergodic phases at non-trivial long times, when the WTD's mean diverges.In addition, stationarity and ergodicity are shown to strongly depend on the system's internal noise.
We consider a two-state model in which a promoter randomly transitions between transcriptionally active and inactive states, see Fig. 1.In contrast to previous models, we assume that the time it takes the promoter to activate is power-law distributed, mimicking delays due to the DNA binding to specific and limited elements in the cell.This may be justified by realizing that activating the promoter often requires binding to sparse elements in the cell [31,48,49].Thus, the activation WTD is related to mean time it takes a random walker to hit a target, which follows a power-law distribution [11,50].
To gain insight into the two-state model depicted in Fig. 1, we first study a simpler model of a self-regulating gene (SRG) with linear rates.Here, the WTD between protein production events is assumed to be fat-tailed, as protein production requires the DNA state to be active.We show that even this simple model is sufficient to give rise to non-trivial dynamics, including the existence of a long-lived metastable state followed by protein decay.To this end, we compute the typical time to transition from metastability to power-law decay, the mean protein number, and its copy-number distribution.The latter displays super-Poissonian behavior, with a Fano factor greater than 1 (see below), even for truncated WTDs, which are experimentally relevant [51].Finally, we study the full two-state model (see Fig. 1) and show that the two models are qualitatively identical.Our theory is verified via numerical simulations based on a modified Monte-Carlo (MC) algorithm, recently developed for non-Markovian stochastic systems [52][53][54].
Self-regulating gene.We consider a protein-only model for the number of proteins n defined by the reactions n → n + 1 and n → n − 1, respectively representing protein production, e.g., due to translation, and protein degradation, e.g., due to cell division.Most studies assume that these reactions are exponential, memory-less point processes [11].Yet, in realistic scenarios, the times between consecutive reactions is not necessarily exponentially distributed [19] and the process may be described by a WTD of the next event.We thus define ψ − (τ ) as the WTD for one of n proteins to degrade between t and t + τ (n → n − 1), and ψ + (τ ) as the WTD for protein production (n → n + 1).These are given by Here, degradation is assumed to be a Poisson process linear in the population size, which gives rise to an exponential WTD of ψ − (τ ) [11] with normalized rate 1.In contrast, as discussed above, production may depend on a larger number of intra-cellular products, present in small numbers [3,12] or may take place in a sparse environment, leading to fat-tailed delays.Thus, production events are no longer exponentially distributed but sampled from a fat-tailed distribution.For concreteness, we consider the power-law WTD in (1).Here K is the carrying capacity and α is the power-law exponent.
To derive the birth-death master equation we define the probability ϕ + (n, t) [ϕ − (n, t)] for a single protein production (degradation) event to occur at time t provided that no degradation (production) event occurred until time t, when the system has n proteins, which reads for i, j ∈ {+, −} and i ̸ = j.Using the CTRW formalism developed in Ref. [45] we write the following chemical master equation for the probability P n (t) of having n proteins at time t, given Eqs.(2): where E j k f (k) = f (k + j) are step operators.The kernel M (n, t) is defined in terms of its Laplace transform [45] M (n, s) = s φ+ (n, s)/ 1 − φ+ (n, s) − φ− (n, s) , (4) where φi are the Laplace transforms of ϕ i and s is the Laplace variable.The term (E 1 n −1)nP n (t) in Eq. ( 3) is due to the exponential degradation of n products, while the integral term comes from the non-exponential production terms with memory kernel M (n, t).Note that, in the case of an exponential WTD [i.e., if ψ + (τ ) = Ke −Kτ ] one obtains φ− = n/(s+K +n) and φ+ = K/(s+K +n).Substituting these into Eq.( 4) yields M (n, s) = n and M (n, t) = nδ(t).Thus, Eq. ( 3) reduces to the well-known chemical master equation for exponential WTDs [11].
For a power-law WTD, the memory kernel is obtained by substituting Eqs. ( 1) and ( 2) into Eq.( 4): where Being interested in the long-time dynamics, t ≫ 1, we approximate the memory kernel at s ≪ 1, where states with n > 0 and n = 0 (ground state) display a markedly different behavior.For n > 0 and α > 0, Eq. ( 5) can be approximated as M (n > 0, s ≪ 1) = Km(x) + O(s), with x ≡ n/K being the protein density, and m(x) ≡ E α+1 (αx) /E α (αx).The leading-order inverse Laplace transform reads M (n > 0, t) ≃ δ(t)Km(x); i.e., short-term memory for n > 0, as degradation can always occur.In contrast, we find for n = 0 and s ≪ 1: Here, for α > 1 the leading-order term is independent on s.In contrast, for α < 1 there is a power-law dependence, since at n = 0 the only reaction that can occur is production, which gives rise to increasingly long times of inactivity.Note that, the subleading term in Eq. ( 6) differs between α > 2 and 1 < α < 2, which affects the rate of convergence to stationarity, but not stationarity itself.The inverse Laplace transform of (6) reads For α > 1, the dynamics is unaffected by the state n = 0, as its probability, P 0 , is exponentially small for K ≫ 1.
On the other hand, for α < 1 the dynamics display longrange correlation even at infinitely long times.This is a clear signature of non-ergodicity.To show this, we derive an equation for the mean protein number n by multiplying Eq. ( 3) by n and summing over all n: where ).This equation cannot be solved explicitly, and below we deal with it asymptotically.For α > 1 we show below that at t ≫ 1, M (n, t) → M (n) ≡ Km(x) and m(x) ≃ m(x), suggesting that the dynamics are ergodic.On the other hand, for α < 1, a single state retains memory leading to ergodicity breaking and aging [38,45].We conjecture that at sufficiently long times (see below), P 0 (t) grows due to long periods of inactivity at n = 0, whereas P n>0 (t) rapidly decay.In Fig. 2(a) we show that for t ≫ 1: 1 − P 0 (t) ∼ t −(1−α) and P n>0 (t) ∼ t −(1−α) .The averaged memory kernel is then dominated by the state n = 0 such that for 1 , where we have used (7) and that P 0 (t ′ ) ≃ 1. Substituting this into (8), the long-time asymptotic of the dynamics for any α < 1 becomes: At what times is this scaling reached?As the scaling is caused by the state n = 0, the dynamics are effectively stationary as long as this state has not been visited.Only upon visiting n = 0, we expect long-memory effects and nonstationarity.Thus, the typical time to asymptotic decay, τ typ , is roughly proportional to τ 0 -the mean first passage time (MFPT), to reach n = 0.At times t ≪ τ typ the dynamics is expected to be stationary, allowing us to analytically find the long-lived metastable state prior to the asymptotic decay as well as the MFPT to n = 0.
Stationary dynamics.For α > 1 and t ≫ 1, and for α < 1 and 1 ≪ t ≪ τ typ , the dynamics are stationary.To find n(t) and the MFPT to state n = 0, we write a stationary master equation (dP n (t)/dt = 0) by Laplace-transforming Eq. ( 3), multiplying by s, and using the final value theorem: lim s→0 s Pn (s) = P n [17].This yields: where P n is the steady state solution for the probability to have n proteins at time t, and M (n) = Km(x), see definition below Eq. ( 5).Equation ( 9) can be solved recursively to give P n = (1/n!)P0 n−1 k=0 M (k), where P 0 is the probability to be in the ground state, found by normalization.For K ≫ 1, this can be recast (up to a prefactor) into a semi-classical form as P n ∼ e −KS(n/K) ≡ e −KS(x) , with the action S(x) = x x ln (x ′ /m(x ′ )) dx ′ [55][56][57], where x ≡ n/K.The integral here can be solved numerically for any n, and the MFPT to the ground state n = 0 is given by τ 0 ∼ e KS(0) [55][56][57].
In addition to the steady-state dynamics, in Fig. 2(b) we show that the Fano factor, defined as the variance of the protein number divided by its average (σ 2 /µ), increases with decreasing α.As in all experiments the power-law WTD is expected to have an exponential cutoff at some finite τ cutoff , see e.g., [51], we plot in Fig. 2(b) the Fano factor for different values of τ cutoff , see Appendix A.Here the Fano factor is obtained from MC simulations and does not depend on the carrying capacity.Notably, the distribution tends to a Poissonian (σ 2 /µ = 1) either at α ≫ 1 or for very short cutoff times, τ cutoff ≲ 1/K.The existence of reactions with fattailed WTDs can thus serve as a possible explanation of experimental observations of super-Poissonian distributions (with σ 2 /µ > 1) in gene expression [58,59].
In Fig. 2(c) we test our analytical results using simulations for a wide range of α < 1 values.To collapse the curves, and to show the asymptotic behavior of n ∼ t −(1−α) , we plot (n/n ss ) 1/(1−α) , versus normalized time t/τ 0 , where nss is the numerical solution of (8), τ 0 = e KS(0) , and S(0) is found numerically.The collapse indicates that all curves start decaying at roughly the same normalized time.Yet, the curves clearly do not perfectly overlap at long times, due to an intermediate regime in Fig. 2(c), caused by an α-dependent prefactor, not accounted for by our theory; we have checked that the width decreases as K increases.Finally, Fig. 2(d) shows further evidence of the crossover between ergodic and nonergodic dynamics at α-dependent times, using the ergodicity breaking (EB) parameter [38].The latter is the variance of the time-averaged squared-displacement of the protein number δ 2 (∆) = 1/(t−∆) divided by its mean, and is used as a measure for the level of variability across different trajectories of a given ensemble (see Appendix B for additional details).The EB parameter is plotted in Fig. 2(d) versus the total simulation time, with EB≪ 1 and EB=O(1), being signatures of ergodic and nonergodic dynamics, respectively [38,60,61].
Two-state promoter model.Similar effects occur in a more complex two-state gene expression model, which explicitly accounts for mRNA noise, where transitions between a transcriptionally active and inactive promoter are independent of the protein number [4,14], see Fig. 1.As stated above, activation often requires binding to limited elements in the cell, which may give rise to a nonexponential, fat-tailed WTD [see Eq. ( 1)]: where κ is a scale parameter.In contrast to activation, deactivation is expected to occur at an exponential rate [11].For simplicity, we assume that the typical switching timescales from the active to inactive state and vice versa are equal, and thus, we set the rate for deactivation to be also κ.The rest of the reactions, see Fig. 1, transcription of mRNA and translation of proteins, and degradation of mRNA and proteins, are modeled as firstorder, exponential processes with rates a, bγ, γ and 1, respectively, where time is measured in units of inverse protein decay rate.The associated WTDs for these five reactions (except binding) are ψ j (τ ) = λ j exp(λ j τ ), with {λ j } 5 j=1 = {ℓκ, ℓa, n, γm, bγm}, where m and n are the mRNA and protein numbers, and ℓ = {0, 1} is the promoter's state (0 inactive, 1 active).
As transcription can occur only when the promoter is active and translation is mRNA-dependent, this model is qualitatively similar to the SRG model where we modeled the waiting times directly in the protein production.In Appendix C we derive the master equation for this set of reactions, and obtain quantitatively similar results to the SRG model.Here, for α > 1, all states including the ground state at n = m = ℓ = 0, do not exhibit memory at t ≫ 1, and the dynamics are stationary for all t ≫ 1.In addition, for α < 1 and times 1 ≪ t ≪ τ typ , i.e., longer than the relaxation time but shorter than the typical time τ typ to sample the ground state, the dynamics are still effectively stationary.In these cases the equations for the mean protein and mRNA numbers, n and m, read (11) where M (0) n,m is the leading order of the memory kernel for any state but the ground state, see Appendix C.
In contrast, for α < 1 and t > τ typ the system eventually reaches the ground state n = m = ℓ = 0 and the dynamics is no longer stationary.Here, n ∼ m ∼ t −1+α , see Appendix D, similarly to the SRG model.Yet, to determine the time to reach the ground state in the two-state model, one has to distinguish between two cases: moderate to fast switches κ ≥ 1 and slow switches κ ≪ 1.For κ ≥ 1 we find the same effect as in the SRG model, i.e., the typical time to decay is proportional to the MFPT to the ground state.Here, τ 0 ≫ 1 is governed by the longlived metastable dynamics and is typically exponential with the mean protein number.Notably, τ 0 can be computed using MC simulations, see Fig. 3(a); it asymptotically decreases as κ → ∞.In contrast, for κ ≪ 1 the typical time to reach the ground state is no longer governed by the metastable dynamics, since the promoter can be inactive for significantly longer periods than the typical relaxation time of O(1), see Fig. 3(a).This leads to the dynamics reaching the zero state after τ typ = O(κ −1 ), resulting in a relatively quick decay n ∼ t −1+α .
In Fig. 3(b) we compare numerical solutions of Eq. ( 11) to simulations, showing stationary dynamics at times 1 ≪ t ≪ τ typ for κ ≥ 1, and non-stationary dynamics when κ < 1 for any t.In Fig. 3(c) our results agree well with simulations for both 1 ≪ t ≪ τ typ and t ≫ τ typ .Here, as in Fig. 2 we normalize n by its steady state value found by solving Eq. ( 11), and normalize time by the MFPT to reach n = 0, independently obtained from simulations (compare to inset).The collapse occurs at t ≫ τ typ for all values of κ, and at t ≪ τ typ for κ ≥ 1.
In summary, we have studied two gene-expression models with delayed protein production due to fat-tailed WTDs.For distributions with diverging mean (α < 1) the mean protein number starts to decay after a typical time, which scales as the MFPT to a ground state.Here, the dynamics are ergodic at short times but become nonergodic and display ageing as the ground state is sampled, from which the system can only exit via a non-exponential reaction.We also showed that longrange memory may increase the Fano factor (variance over the mean of the protein distribution) as α decreases, which also holds for a truncated WTD.This effect may be experimentally measurable, provided that the powerlaw exponent can be adjusted (e.g., by modifying the molecule's binding affinity to the binding site).
Although we have focused on gene expression, fattailed WTDs may be key in various other fields.For instance, in predator-prey models in movement ecology, diffusion-limited predation or nonergodic foraging may be indicative of power-law waiting times [42].Furthermore, long waiting times also appear in epidemic dynamics, where a large variability in infection and/or recovery periods may lead to markedly different dynamics [62].Our findings may provide valuable insight into these and other dynamical models where long waiting times appear.
OV and MA were supported by the Israel Science Foundation Grant No. 531/20.

Appendix A: Truncated power law
In experimental systems measured power laws are often truncated.In the main text, Fig. 2(b), we plotted the Fano factor for truncated power laws.The WTD that replaces the one in Eq. ( 1) is defined as follows As shown in Fig. 2(b), as long as τ cutoff > 1/K (the typical degradation timescale), the Fano factor increases as α decreases.

Appendix B: Ergodicity Breaking
Ergodicity breaking is formally defined as a disparity between the mean squared displacement (MSD) and time-averaged mean squared displacement (TAMSD).The MSD is defined as the squared displacement of the protein number with respect to a reference number, averaged over an ensemble of independent simulations.The TAMSD is given by averaging over the squared displacement of the protein number performed in a time lag ∆ [37,38], where in this expression an overline denotes time averaging.Note that the same disparity occurs also between other ensemble-averaged and time-averaged observables, and not only the TAMSD [36].For a Brownian process and ∆ ≪ t one obtains δ 2 (∆) ∼ ∆ ∼ x 2 (∆) .In contrast, if the TAMSD and MSD scale differently, the underlying process is, by definition, nonergodic; that is, the ensemble averaging is different from the time averaging [44].In many cases, and especially for nonergodic dynamics, it is convenient to compute the so-called averaged TAMSD defined as where angular brackets denote ensemble averaging over N simulations (i.e., independent realizations of the protein number).Here, averaging is necessary due to the irreproducible nature of the process (i.e., large diversity across simulations).In Fig. 4 we show an example of the large diversity between simulations when the process becomes nonergodic.In the leftmost panel, we plot 50 random simulations compared to the mean protein number, while in the other four panels, each of the 50 lines denotes an average over 5, 10, 20, and 50 simulations.One can see that the variability around the nonergodic phase is larger and more immune to averages over a small number of simulations.
To derive the ergodicity breaking (EB) parameter we define the variability of the TAMSD (the spread of individual TAMSDs around their average) in terms of the dimensionless parameter ξ = δ 2 (∆)/ δ 2 (∆) .For many processes, at long measurement times the distribution of ξ satisfies a Mittag-Leffler distribution [60] Here, l α is the one-sided Lévy stable distribution with the Laplace transform L{l α (t)} = exp(−u α ), while Γ(•) is the Gamma function.For Brownian diffusion, α → 1, ϕ(ξ) ∼ δ(ξ − 1), i.e., a sharply peaked distribution around 1.However, for general α < 1 the distribution is wide and skewed.A common measure of ergodicity breaking, which we use in the main text, is the so-called EB parameter, defined for long simulation times t as In prototypical systems, e.g., CTRW with a power-law WTD, this parameter varies between EB = 0 for an ergodic system to EB > 0 for a nonergodic system, where for α → 0 we expect the EB parameter to be O(1) [38].
Appendix C: Derivation of the master equation for the two-state model To derive the CME for the set of reactions in the twostate promoter model, we repeat the steps detailed for the SRG model based on [45], noting that now we have 6 rates and not only 2. We start by writing the probability density for reaction i to occur at time t while no other reaction j ̸ = i occurs until t: A chemical master equation for the probability of having n proteins and m mRNAs at time t when the DNA is in state ℓ, P n,m (t), is then given by The memory kernel M n,m (t) is defined in terms of its Laplace transform Mn,m (s) = s φ0 (n, m, 0, s) and the operator A is defined in terms of the step operators . Note that the memory kernel associated with any of the exponential rates {λ j } 5  can also be calculated in a similar way to Eq. (C3) and reduces to M j (t) = λ j δ(t), giving the form of the operator A in Refs.[11,45].Explicit calculation of the memory kernel in the Laplace variable, (C3), yields, after some algebra where λ tot = n + γm + γbm, and noting that the additional reactions of deactivation and transcription do not contribute when the promoter is in the inactive state.
The exponential integral function, E α (x), is defined in the main text.
The memory kernel can be simplified in the limit s → 0 (t → ∞).An immediate result for all states λ tot > 0 is Mn,m (s) = M (0) n,m + O(s), (C5) which is independent of s in the leading order.Applying the inverse Laplace transform to Eq. (C5) yields M n,m,ℓ (t) ≃ M n,m δ(t) for λ tot > 0. As long as the dynamics are not in the state n = m = ℓ = 0 we have λ tot > 0 and this approximation holds.However, for n = m = ℓ = 0 the only possible reaction is the DNA switching to the active state and we have λ tot = 0.Here the form of the memory kernel at long times is strongly dependent on α: Thus, for α > 1 all states do not exhibit memory at t ≫ 1, suggesting stationary dynamics at long times, similarly to the exponential WTD case [4,14,15].Moreover, even for α < 1 and times 1 ≪ t ≪ τ typ , i.e., longer than the relaxation time but shorter than the typical time τ typ to sample the ground state λ tot = 0, the dynamics are expected to be stationary.In addition, when the mean protein number is large the value of the memory kernel for λ tot = 0 will be negligible.For all of these cases, the steady-state equation for P ℓ n,m is given by 0 = (−1) ℓ κP (1)  n,m − M n,m P (0) n,m + AP (ℓ) n,m , (C7) where the derivation is similar to the one shown above for the SRG.A similar set of equations for a stationary problem was analyzed in Ref. [14], and it was shown that the stationary mean-field dynamics of (C7) follow 0 = ṁ = aM  The initial condition is P (0) = 1, and the probabilities obey P (t)+Q(t) = 1 for any t.These equations are based on the non-Markovian kinetic rate equations developed in [48], see also [49] for further details.Equations (D1) and (D2) can be Laplace transformed (D4)

FIG. 1 .
FIG.1.A two-state gene expression model with a promoter, transcription, and translation.The activation process has a fat-tailed WTD denoted by ψ0(τ ).All other processes have exponential WTDs, and their average rates are specified.

FIG. 2 .
FIG. 2. SRG model.(a) Probabilities Pn(t) for various n (see legend) and 1 − P0(t) based on MC simulations for K = 25 and α = 0.36, all showing similar scaling at long times.(b) The Fano factor versus α for different τ cutoff (see legend), for K = 500.(c) The mean protein number versus time for different values of α (0.2 − 0.46); the curves can be approximately collapsed by properly rescaling n and t, see text.The dashed line is an eye guide with a slope of −1.Inset displays the averages without rescaling.Here n is averaged over 10 5 simulations and the carrying capacity is K = 25.(d) The EB parameter versus time for different α values, K = 500 and ∆ = 100, see text.

FIG. 4 .
FIG. 4. Protein number in the two-state model for α = 0.3, a = 100, b = 1.2, and γ = 10.The blue solid line is the mean protein number, averaged over 10 5 simulations.In red there are 50 lines, where each represents a trajectory averaged over a varying number of simulations, ranging from no averaging (leftmost panel) to averaging over 50 simulations (rightmost panel).

0 ψ 0
Appendix D: Effective two-state modelTo show the scaling of the mean number of proteins and mRNA with time, at sufficiently long times, we construct an effective model in which the switches are decoupled from the other cell components.In this effective model the probabilities of the DNA to be in the active and inactive states, P = n m P (1) n,m and Q = n m P (0) n,m , respectively, are given by ∂P ∂t = −κP (t) + t 0 ψ 0 (t − τ )κP (τ )dτ, (D1) ∂Q ∂t = κP (t) − t (t − τ )κP (τ )dτ.(D2)