Special Function Methods for Bursty Models of Transcription

We explore a Markov model used in the analysis of gene expression, involving the bursty production of pre-mRNA, its conversion to mature mRNA, and its consequent degradation. We demonstrate that the integration used to compute the solution of the stochastic system can be approximated by the evaluation of special functions. Furthermore, the form of the special function solution generalizes to a broader class of burst distributions. In light of the broader goal of biophysical parameter inference from transcriptomics data, we apply the method to simulated data, demonstrating effective control of precision and runtime. Finally, we suggest a non-Bayesian approach to reducing the computational complexity of parameter inference to linear order in state space size and number of candidate parameters.

the speed of approach varies.Matrix methods, such as finite state projection [13] or multi-finite buffers [14], rely on matrix exponentiation or eigenvalue calculation to directly solve a truncation of the infinite-dimensional CME system; however, barring convenient symmetries, these methods require a characteristic running time of roughly O(n 3 ), where n is the state space size.Finally, analytical methods directly solve the underlying system of ordinary differential equations (ODEs), e.g. using a generating function representation [8] or a convenient basis [15], and can be run in O(n) time.Due to lower computational complexity, these analytical methods are highly relevant to the determination of biophysical parameters from high-dimensional, multimodal data, such as that available by modern transcriptomics and proteomics methods.Recent findings suggest that the use of joint data can provide substantial improvements to model and parameter estimation [16], motivating the development of more efficient solvers for the CME.Current chemistries can quantify spliced and unspliced mRNA molecules [3,17], as well as surface proteins [18,19].The following multimodal models have analytical CME solutions, as well as drawbacks limiting their direct application to biological data.[15,20]: cannot be applied to proteomics, and does not explicitly model multistate genes.

Combination of Poissonian solutions
2. Constitutive mRNA and protein production [21]: exact solution, but applies poorly to eukaryotic systems due to prevalence of multistate genes.
3. Telegraph mRNA and protein production [22,23]: perturbative solution that relies on timescale separation between mRNA and protein lifetimes, and inapplicable to a large fraction of eukaryotic genes.
4. Multi-state gene solutions with a single product [24][25][26]: exact solution, but does not provide information regarding downstream gene products.Current sequencing methods cannot be easily integrated with DNA accessibility testing.
5. Bursty mRNA production and isomerization [27]: exact solution, but relies on numerical integration and uses a fairly simple burst model.
A recent method [17] uses joint distributions of spliced and unspliced mRNA to perform shorttime extrapolation on the cell landscape, motivating a more detailed treatment using stochastic biophysics.Motivated by that work, we propose a semi-analytical method for the evaluation of joint distributions resulting from the bursty transcription model [27], which describes a large fraction of mammalian genes [28][29][30][31][32]. Furthermore, we apply these models to parameter estimation, and discuss their applications to a set of burst size distributions that have not been previously solved.

Methods
We follow previous literature [27] in implementing a Markov model for production, isomerization, and degradation of mRNA (Figure 1a).A single gene locus undergoes transcriptional bursting at a rate of k i , producing B nascent mRNA transcripts (pre-mRNA) per burst, with P (B = ρ) = α ρ .
The nascent transcripts are isomerized to mature mRNA.B is a random variable; if the underlying gene expression follows a two-state telegraph model with short bursts of finite size, B is drawn from a geometric distribution [33].The reactions are modeled as a Poisson processes with constant rates, which enables their representation using a homogeneous continuous-time Markov chain (CTMC).P (n, m, t), the law of this CTMC model, yields the probability of finding n nascent and m mature molecules at time t.
The full set of CME ODEs is as follows: Using the probability-generating functions (PGF) x n y m P (n, m, t) and F (x) = ∞ ρ=0 α ρ x ρ , the CME recurrence relation may be cast into the form of a single partial differential equation (PDE): x n y m P (n, m, 0) and the normalization condition G(1, 1, t) = 1.Introducing the transformations x = 1 + u, y = 1 + v, and G = e φ results in the following PDE: such that M (u) = F (1 + u).The solution of the PDE at time t is expressed by the following integral: Per the method of characteristics,V (s) = ve −γs , U (s) = vf e −γs + (u − vf )e −βs whenever γ = β and e −γs (u + γvs) otherwise, where f ≡ β β−γ .Finally, the PGF G is recovered by exponentiating φ.We follow the approach of Bokes [21,27] in evaluating the PGF for x, y around the complex unit circle, interpreting these values as the two-dimensional discrete Fourier transform (DFT), or characteristic function (CF) values, of the original probability distribution, and converting them to the discrete domain by application of the inverse discrete Fourier transform (IDFT).This method has time complexity O(N log N ), where N is the state space size, such that N = max n × max m of interest (Figure 1b).For systems with relatively low copy numbers up to ≈ 100, where CME modeling is necessary, N ∼ 100 × 100, requiring on the order of 10,000 evaluations of the integral The model with a geometric burst size distribution of mean b requires the evaluation of t 0 bU 1−bU ds.This integral does not have a closed-form solution, and must be treated using repeated numerical quadrature.However, an approximation to the integral can be computed by decomposing the integrand into an integrable power series.Any expression in the form of X 1−X is amenable to an expansion in powers of X = bU .In the region |X| > 1, the Laurent expansion − ∞ i=0 X −i is available.The intuitive choice of the complementary Taylor expansion ∞ i=0 X i , which is valid for |X| < 1, is inappropriate for integration across the boundary |X| = 1: the approximation diverges and the integral of the expansion ceases to be identical to the original integral.Instead, we leverage the form of U , and note that Re(U ) < 0 for all nontrivial choices of u, v. Therefore, we utilize the Taylor expansion about −1, which is valid for |X + 1| < 2; the form of the series is As shown in the illustration of their shared domain of convergence (Figure 1c), it is possible to select the appropriate approximation based solely on a threshold for the real-valued |U |, which simplifies the computation.Thus, X is decomposed into multiple approximation domains {S j }, such that |X| evaluated at the boundary ∂S j is α, the threshold choice, and successive domains alternate in having |X| strictly greater or less than α (Figure 1d).As discussed in the Supplementary Note, the form of U guarantees that |{S j }| ≤ 4; at most two Laurent and two Taylor approximations are necessary.Examination of the expansions shows that both can be expressed as i Ω j,i U i .If |U (s)| ≥ α∀s ∈ S j , the Laurent approximation is appropriate, and Ω j,i = −b −i .For a Laurent order of approximation N L , i ∈ {0, −1, −2, ..., −N L }. Conversely, if |U (s)| ≤ α∀s ∈ S j , the Taylor approximation is appropriate.For a Taylor order of approximation N T , binomial expansion of (1 + X) i yields The resulting approximation i Ω j,i U i has i ∈ {1, 2, ..., N T }.Finally, the full integrand bU 1−bU is approximately j i Ω j,i U i .Therefore, the sought integral t 0 bU 1−bU ds can be computed using the truncated power series j i Ω j,i S j U i ds, where each expansion is only integrated over its appropriate domain of convergence S j .The details of computation are provided in the Supplementary Note, and the integrals U i ds are given in Table I.Numerical routines to evaluate the exponential integral and the Gaussian hypergeometric function are readily available; however, they are not necessarily optimized for speed.We discuss the approximation schema used to make them practical for large-scale computation in the Supplementary Note.Furthermore, the same approach can be used for other burst distributions.We consider a degenerate distribution (a gene locus that produces b transcripts per burst), a uniform distribution (a gene locus equally probably to produce any number of transcripts between a and b) [34,35], and a shifted geometric distribution (a gene locus guaranteed to produce at least one transcript per burst, e.g.due to the inhibitor being removed by an advancing RNA polymerase).We find that the approximate solutions to these systems can also be expressed in the form j i Ω j,i S j U i ds, as shown in Table II.Equivalently, as long as numerical routines are available to compute U i ds for i ∈ Z, a broad array of burst distributions can be computed simply by determining the appropriate integration limits (domains where the expansions converge) and computing the coefficients Ω j,i .

Results and Discussion
We have presented an approximation for the Chemical Master Equation solution of bursty pre-mRNA production and its conversion to mature mRNA.We explored several burst distributions discussed in previous studies and explored an extension to a polymerase-inhibitor interaction model.The CME solutions can be found via the computation of i Ω i U i ds for the finite-support distributions and j i Ω j,i S j U i ds for the infinite-support distributions.The analytical solutions of U i ds are given in Table I, whereas the combinatorial weights for specific burst distributions are given in Table II.The series form of the solution enables the modulation of approximation order for computational facility (Figure 2a).The control of method precision and runtime motivates the development of adaptive methods that determine a broad parameter domain using a low-fidelity approximation, then refine it using higher-order or quadrature-based methods.The purpose of the current investigation is the development of a unified framework for the computation of CME solutions for a variety of burst models, as well as the determination of analytical solutions for the approximations.To maintain generality, we do not emphasize a particular implementation of the underlying special functions, but presuppose the availability of efficient implementations of the incomplete gamma and Gaussian hypergeometric function.Nevertheless, as a proof of concept, we develop a case study to benchmark the performance of the degenerate case β = γ.Furthermore, we discuss several considerations for implementation and evaluation of the special functions (Supplementary note).In light of the motivating broader goal of parameter estimation, we use the algorithm to compute likelihood (Kullback-Leibler divergence) landscapes for simulated data [11] with a geometric burst size distribution and b = 19, k i = 2.5, β = γ = 1 (Figure 2b).The landscapes produced by the approximation method (shown for N L = N T = 7) closely follow those produced via numerical integration.Repeating this analysis for a range of approximation orders allows benchmarking the method.Over the entire domain shown in Figure 2b, the quality of approximation can be easily controlled by modulating the Taylor approximation order (Figure 2c).The runtime is largely a function of the Laurent approximation order (Figure 2d), due to its explicit reliance on the computation of special functions.We particularly note that the commercial adaptive quadrature method used for benchmarking [36] provides poor control of runtime.The procedure for regenerating the discrete distributions from generating functions presents certain problems for inference.As shown in Figure 2a, the result of the IDFT is not guaranteed to be a probability distribution; the IDFT enforces k π k = 1, but does not enforce π k ≥ 0∀k.However, the properties of the Markov chain ostensibly guarantee that π k ≥ 0, with the inequality becoming strict at equilibrium.For the computation of divergence, we treat this problem in an ad hoc manner, by setting π k ≤ 0 to a small float near machine epsilon.A natural, and potentially valuable, extension of this method is the development of transformations using non-negative, non-Fourier basis functions.An alternative approach is available, and yields faster performance at the expense of interpretability in the Bayesian framework.Instead of computing Kullback-Leibler divergence in the probability domain, it is possible to compute a measure of distance between the characteristic functions of proposed and observed distributions, or even their corresponding cumulants (logarithms).This approach provides two advantages.Firstly, the roundoff and computational expense of repeated exponentiation and logarithm operations is eliminated.Secondly, the overall computational complexity of an inference procedure that uses the Fourier transform method and samples M candidate parameters is O(MN log N ).Performing the entire analysis in the Fourier domain requires only a single Fourier transform to determine the empirical characteristic function, reducing the computational complexity to O(N log N + MN ), equivalent to O(MN ) in the practical limit of large M.This approach is has been explored for use in goodness-of-fit testing and model selection [37,38], but only rarely for parameter inference [39].However, we anticipate that the computational advantages may outweigh the incompatibility with Bayesian inference, similarly to the recent interest in using nonparametric Kolmogorov and Wasserstein distances for parameter inference [40][41][42].Further, we note that optimization of the characteristic function uses information about the entire distribution, potentially overcoming identifiability issues observed using other computationally inexpensive non-Bayesian approaches, such as the method of moments [16].Since characteristic function methods have primarily been used for analysis of (continuous) stable distributions [39,43,44] and their performance for inference from discrete-valued random variable observations has not, to our knowledge, been systematically explored, signifying a substantial lacuna.Thus, this approach is a natural next step for optimizing inference from large datasets.Our discussion of parameter estimation only touched upon inference from steady-state data, which is relevant for fixed-cell experiments that produce information about molecule distributions without a natural time coordinate, such as those available via scRNA-seq [45] and smFISH [46,47].However, experimental methods with temporal information are available [33,[48][49][50][51]. Given live-cell data, where cell identities are tracked across time, it is straightforward to extend this method to compute the probability of transitioning from an initial state to any other state, and thus compute the full likelihood of a time-series (Supplementary Note: Addenda).Repeating this process for all observed cells, assuming their trajectories are independent, and summing the log-likelihoods of their time-series yields a joint likelihood for the observations of the entire experiment [52][53][54].Furthermore, given fixed-cell data, where only the population-level statistics are tracked across time, it is likewise straightforward to compute the probability of transitioning from one copynumber distribution to another, and use it for likelihood computation [55] (Supplementary Note: Addenda).Finally, technical challenges in single-cell transcriptomics, such as sparsity of sampling in sequencing [56] and noise in fluorescence microscopy [57], have resulted in alternative competing explanations for qualitative features of observed biomolecule distributions, such as heavy-tailed laws [25,27] and apparent dropouts [58][59][60][61].We anticipate that intrinsic degeneracies, as well as aleatory effects, in mapping from a model parameter space to an observable space preclude the unambiguous identification of underlying biophysical schema: the presence of parameter equivalence classes, even in inference of simple models, is well-characterized [62][63][64][65].Nevertheless, we also anticipate that the development of analytical solutions, as well as numerical solvers, for a diversity of transcriptional mechanisms, sampling behaviors, and multimodal observables will aid in making inference sufficiently robust for design and extrapolation.For example, as a natural extension, it is straightforward to calculate the laws for observed pre-mRNA and mRNA copy numbers by computing the distributions under an arbitrary sampling schema.This approach enables the natural integration of experimental noise in the same framework as the underlying transcriptional and molecular stochasticity, enabling the simultaneous inference of experimental and physiological parameters.

Tables
Table I.Integrals of U i for various approximations and levels of degeneration.

Burst distribution b-step
Uniform Geometric Shifted geometric Table II.Integrands, expansion coefficients, summation indices, and expansion domain thresholds associated with approximating the CME solutions for four burst distributions.

Burst generating functions and their expansions
Let φ be the logarithm of the probability generating function (PDF) G.To determine φ, it is necessary to integrate dφ dt = k i (M (U (s)) − 1), where M (U (s)) is the factorial-moment generating function (FMGF) of the burst distribution.This function is represented as M (U ) = F (1 + U ) where F (x) = ∞ ρ=0 P (B = ρ)x ρ is the PGF of the burst distribution.In what follows we use µ to denote the mean burst size for a single-parameter burst distribution.

Geometric distribution, µ = b
As discussed by Singh and Bokes [1], the geometric distribution with mean burst size b has the probability mass function (PMF) P (B = ρ) = p(1 − p) ρ , where p = 1 1+b and ρ = 0, 1, 2, ....The resulting PGF is x .This is exact for |x| ≤ 1 and extends to x ∈ C by analytical continuation.Using the definition of p, F This expression has the well-known Laurent expansion − ∞ i=0 1 This expansion can be truncated at order N T , yielding

Shifted geometric distribution, µ = b
A shifted geometric distribution with mean burst size b has the PMF P (B = ρ) = p(1 − p) ρ−1 , where p = 1 b and ρ = 1, 2, 3, ....The resulting PGF over the relevant support is . This expansion can be truncated at order N T , yielding

Degenerate (b-step)
The degenerate burst distribution that yields b pre-mRNA products with every burst has PMF P (B = ρ) = δ ρb , where δ ij is the Kronecker delta.The resulting PGF is

Special function solutions to U i ds
As explored in the section "Burst generating functions and their expansions," the FCGFs of the burst models explored here can be represented in the common form M (u) − 1 ≈ i Ω i u i , where i ∈ Z.Given U (s), a characteristic solution representing the dynamics downstream of the gene locus, the integral φ k i = [M (U (s)) − 1]ds can be approximated as i Ω i U (s) i ds wherever the expansion holds.The determination of the appropriate domains is treated in the section "Domain Decomposition." Degenerate case: U (s; u, v) = e −γs (u + γvs)

Taylor expansion
From standard identities [2], the Taylor expansion of order i ∈ Z yields T i (s; u, v) = e −iγs (u + γvs) i ds = − e iu/v v i γi i+1 Γ(i + 1, i v (u + γvs)), where Γ(a, z) = ∞ z t a−1 e −t dt is the upper incomplete Gamma function [2].The evaluation of Γ(a, z) for arbitrary complex arguments is computationally nontrivial.However, for n ∈ N, Γ(i + 1, z) = i!e −z e n (z), where e n (z ], which is directly computable without the use of special function routines.This expression for T i is inappropriate for the degenerate case v = 0, corresponding to the marginal with respect to the pre-mRNA.In that case, the definite integral reduces to − u i γi [e −iγs 2 − e −iγs 1 ].

Laurent expansion
For k = −i, where the order of Laurent expansion k ∈ Z, the identity L k (s; u, v) = e iγs (u + γvs) The direct computation of Γ(1 − i, z) is computationally nontrivial, and finite power series expansions are unavailable for this purpose.However, the following identity holds: , where E i (z) is the generalized exponential integral of order i.Further, t dt is the first-order complex exponential integral.Therefore, As in the Taylor expansion, this expression is inappropriate for the degenerate case v = 0.In that case, the definite integral reduces to 1 iu i γ (e iγs 2 − e iγs 1 ).

Taylor expansion
For a Taylor expansion of order i ∈ Z, T i (s; u, v) = [vf e −γs + (u − vf )e −βs ] i ds = (Ae −γs + Be −βs ) i ds.Expanding the binomial term yields The resulting definite integral, which can be computed directly, is: Conversely, the Taylor integrals for the degenerate system scale with 1 i i , and the Laurent ones scale with i i .In case of the degenerate system, issues can be ameliorated somewhat by computing Ω i i i and i i U (s) i ds, which yields a ( b i ) i term and balances the growth behavior.It is not clear whether an analogous approach is available for the non-degenerate case, although solutions based on Stirling's approximation (i! ∼ i i √ 2πie −i ) may be viable for larger i.

Exponential integral
The solution for the Laurent expansion of the degenerate system, which only involves the evaluation of a parameter-free exponential integral E 1 (z), lends itself to optimization.We found that the following combination of approximations yields no more than 10 −8 relative error with respect to the MATLAB function expint throughout the entire complex plane [4].In this enumeration, x ≡ Re(z) and y ≡ Im(z).For u = 0 or v = 0, explicit solutions to |U | = α can be directly computed using the Lambert W function.For u, v = 0, a generalized Lambert W function is required for an analytical solution [8,9].Although approximation methods are available [10,11], they are not generally necessary here.Instead, a numerical scheme can be used to compute coarse estimates for roots constrained by the zeroes of d|U | 2 ds .As long as the roots are within ∆α of α, such that (α − ∆α, α + ∆α) is within the radius of convergence for both expansions, the approximation is valid.

General case: γ e = γ c
An analogous demonstration may be provided for the general case where the export rate is not equal to the degradation rate.As elsewhere, f ≡

Root computation
Apart from the degenerate cases described above, where the roots can be computed via logarithms or the Lambert W function, the determination of domain boundaries requires the use of numerical

Figures
Figures

Figure 1 :
Figure 1: (a) Schema of modeled physiology (k i : burst frequency; B: burst size drawn from discrete distribution on N; β: pre-mRNA splicing rate; γ: mRNA degradation rate).(b) Outline of the solution procedure.(c) Taylor/Laurent approximation criterion (orange: approximations common region of convergence; purple: threshold value of |U |).(d) Sample shapes of |U | and their partitions (black curve: |U |; purple: threshold value of |U |; grey: Laurent approximation regions).
Figure 1: (a) Schema of modeled physiology (k i : burst frequency; B: burst size drawn from discrete distribution on N; β: pre-mRNA splicing rate; γ: mRNA degradation rate).(b) Outline of the solution procedure.(c) Taylor/Laurent approximation criterion (orange: approximations common region of convergence; purple: threshold value of |U |).(d) Sample shapes of |U | and their partitions (black curve: |U |; purple: threshold value of |U |; grey: Laurent approximation regions).