Using Temporal Correlations and Full Distributions to Separate Intrinsic and Extrinsic Fluctuations in Biological Systems

Studies of stochastic biological dynamics typically compare observed fluctuations to theoretically predicted variances, sometimes after separating the intrinsic randomness of the system from the enslaving influence of changing environments. But variances have been shown to discriminate surprisingly poorly between alternative mechanisms, while for other system properties no approaches exist that rigorously disentangle environmental influences from intrinsic effects. Here, we apply the theory of generalized random walks in random environments to derive exact rules for decomposing time series and higher statistics, rather than just variances. We show for which properties and for which classes of systems intrinsic fluctuations can be analyzed without accounting for extrinsic stochasticity and vice versa. We derive two independent experimental methods to measure the separate noise contributions and show how to use the additional information in temporal correlations to detect multiplicative effects in dynamical systems.

Many complex systems in nature are both intrinsically probabilistic and randomized by varying extrinsic inputs. Because the properties of such variability are often poorly understood, unknown extrinsic noise greatly complicates analyses of intrinsic mechanisms and vice versa [1].
One approach to this problem is to separate the two types of effects by monitoring independent twin reporter systems embedded in the same environment. Correlated noise is then interpreted as extrinsic and uncorrelated noise as intrinsic. This approach has been used to separate nature from nurture in human development [2], diffusion from material inhomogeneities in microrheology [3], and the effects of individual chemical events from variation in the intracellular state of living cells [4–10].
However, such approaches only separate intrinsic and extrinsic variances, and though variances can be useful to illustrate principles, they are often insufficient to infer mechanisms from data. For example, the predictions of many simple models of stochastic gene expression were initially borne out experimentally [11–15], while subsequent studies showed that other models with other nonlinear effects, burst sizes, or waiting time distributions fit experiments equally well [16–19]. Full distributions are less used to explain basic principles, but if reliably measured they greatly facilitate experimental tests. Since different processes create the same overall occupancies [18,20,21], time correlations can also help pinpoint mechanisms from data. Restricting the separation of intrinsic and extrinsic effects to variances thus creates an unfortunate choice between either neglecting the information in temporal correlations and higher moments or simultaneously analyzing the combined complexity of system and environment.
Operational definitions based on experimental procedures can also become divorced from the phenomena they were meant to capture. Intrinsic and extrinsic noise must therefore first be physically defined, while experimental strategies must be designed to provably measure the same quantities. To go a step further and identify mechanisms from data, for example by comparing measured intrinsic noise to models of the intrinsic system, the general mapping between phenomenological and mechanistic properties must also be established.
Here, we identify intrinsic and extrinsic contributions to time correlations and higher moments, show how they can be determined from either of two types of experimental approaches, and establish how they relate to mechanistic properties. We also demonstrate how to experimentally test whether a system falls in a class for which mechanistic and phenomenological properties agree, and describe an alternative approach for when they do not. Because biological systems operate far from thermodynamic equilibrium, we cannot use classical statistical mechanics approaches, such as connecting potentials to distributions, and, because the purpose of the approach is to analyze interaction networks where we do not know many details, we cannot evaluate linearizations or continuity approximations. We therefore consider generalized multileveled random walks with state-dependent transitions and derive exact results for families of processes.
Generalizing previous phenomenological definitions of intrinsic and extrinsic noise [9], we consider an arbitrary stochastic system, allowing for feedback loops and other interactions between components, that is subject to an arbitrary vector of changing environmental inputs $Z(t)$, where by definition the environment is negligibly affected by the system. We then consider an observed variable of the system, $X_t$, and decompose that observable into a time-varying ensemble average $\bar{x}(t) = \langle X_t \mid Z[0,t] \rangle_e$ conditioned on the environmental history $Z[0,t]$ and the deviation $\Delta x_t = X_{t|Z[0,t]} - \bar{x}(t)$ from that conditional average (Fig. 1). In the limit of a constant environment, $\Delta x_t$ purely reflects the inherent randomness of the system, while in the extreme where the deviations $\Delta x_t$ are always zero, $\bar{x}(t)$ purely reflects the influence of the varying environment. For all intermediate cases the effects combine, with the state of the environmental variable affecting both the system's average response and its fluctuations. The multipoint temporal correlation

$$A_X(\tau_1, \ldots, \tau_\ell) \equiv \langle (X_t - \langle X \rangle)(X_{t+\tau_1} - \langle X \rangle) \cdots (X_{t+\tau_\ell} - \langle X \rangle) \rangle \quad (1)$$

of the physical observable $X_t$ then decomposes into

$$A_X(\tau_1, \ldots, \tau_\ell) = A_{\Delta x}(\tau_1, \ldots, \tau_\ell) + A_{\bar{x}}(\tau_1, \ldots, \tau_\ell) + \sum_{(\omega_1, \omega_2) \in \Omega} \Big\langle \prod_{i \in \omega_1} \Delta x_{t+\tau_i} \prod_{j \in \omega_2} \big( \bar{x}(t+\tau_j) - \langle X \rangle \big) \Big\rangle, \quad (2)$$

where angular brackets denote a time average, $\Omega$ denotes the set of all possible partitions of $\{0, 1, \ldots, \ell\}$ into two nonempty sets $\omega_1, \omega_2$, and $\tau_0 = 0$ for notational convenience. The first two terms correspond to intrinsic and extrinsic components, respectively, and the remaining cross terms can be nonzero even for the simplest systems and environments. This illustrates the danger of first identifying an extrinsic contribution and treating the remainder as intrinsic, even if all interpretations are purely phenomenological. However, in the following we show how all terms in Eq. (2) can be inferred from experimental data, under the only condition that the system eventually samples all states over the time scales considered.

For $\ell = 1$, the above decomposition simplifies because all cross terms vanish, as $\langle \Delta x_t \mid Z[0,t] \rangle_e = 0$ at any point for any environmental history [22]. The autocorrelation of $X_t$ thus exactly decomposes (Fig. 2) into the autocorrelation of $\bar{x}(t)$ and the autocorrelation of $\Delta x_t$, generalizing the previous variance result [9] to temporal correlations:

$$A_X(\tau) = A_{\Delta x}(\tau) + A_{\bar{x}}(\tau). \quad (3)$$

These contributions are experimentally accessible by following two identical but independent reporter systems $R_1$ and $R_2$ embedded in the same fluctuating environment [4,23]. By construction, the reporter systems are then conditionally independent, $\langle \Delta R_{1,t} \Delta R_{2,t} \mid Z[0,t] \rangle_e = \langle \Delta R_{1,t} \mid Z[0,t] \rangle_e \langle \Delta R_{2,t} \mid Z[0,t] \rangle_e$, and their temporal cross correlation identifies the extrinsic contribution to the autocorrelation [22]:

$$A_{\bar{x}}(\tau) = \langle R_{1,t} R_{2,t+\tau} \rangle - \langle R_{1,t} \rangle \langle R_{2,t} \rangle. \quad (4)$$

Subtracting $A_{\bar{x}}(\tau)$ from the single-reporter autocorrelation subsequently identifies the intrinsic component $A_{\Delta x}(\tau)$ via Eq. (3). The cross correlation of dual reporters has been experimentally reported and used to operationally define intrinsic and extrinsic contributions to autocorrelations [23]. Equations (3) and (4) establish that such experimental correlations directly measure the properties of $\bar{x}(t)$ and $\Delta x_t$, allowing for rigorous mechanistic interpretations of experimental results rather than just phenotypic classifications (see below).
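The dual-reporter decomposition of Eqs. (3) and (4) can be illustrated with a small simulation. The sketch below is not from the original work: all rates, step sizes, and the choice of a log-Ornstein-Uhlenbeck environment are illustrative assumptions. Two birth-death reporters share one fluctuating environment; their cross correlation estimates the extrinsic part, and subtraction from the single-reporter autocorrelation gives the intrinsic part.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameters (assumptions, not taken from the text).
T, dt = 100_000, 0.05    # number of time steps, step size
beta = 1.0               # reporter degradation rate constant
k0 = 50.0                # production-rate scale
tau_z, sig_z = 5.0, 0.3  # environment relaxation time and noise strength

# Shared environment z(t): log-Ornstein-Uhlenbeck process (stays positive).
z = np.empty(T)
lz = 0.0
for t in range(T):
    lz += -lz / tau_z * dt + sig_z * np.sqrt(dt) * rng.standard_normal()
    z[t] = np.exp(lz)

def reporter(z):
    """Birth-death process: production rate k0*z(t), degradation rate beta*x."""
    x = np.empty(T)
    x[0] = k0 / beta
    pd = 1.0 - np.exp(-beta * dt)   # per-molecule death probability per step
    for t in range(1, T):
        births = rng.poisson(k0 * z[t - 1] * dt)
        deaths = rng.binomial(int(x[t - 1]), pd)
        x[t] = x[t - 1] + births - deaths
    return x

r1, r2 = reporter(z), reporter(z)   # two independent reporters, same z(t)

def cov(a, b, lag):
    """Time-averaged covariance <a_t b_{t+lag}> - <a><b>."""
    if lag:
        a, b = a[:-lag], b[lag:]
    return np.mean(a * b) - np.mean(a) * np.mean(b)

lags = [0, 20, 40, 80]
A_X   = [cov(r1, r1, L) for L in lags]        # total autocorrelation
A_ext = [cov(r1, r2, L) for L in lags]        # cross correlation: extrinsic part, Eq. (4)
A_int = [a - e for a, e in zip(A_X, A_ext)]   # remainder: intrinsic part, Eq. (3)
```

By Eq. (3) the two components sum to the measured autocorrelation by construction; the nontrivial content is that `A_ext` tracks the environment's slow time scale while `A_int` decays on the reporter's own lifetime $1/\beta$.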
For higher statistics ($\ell \geq 2$), the fact that $\bar{x}(t)$ and $\Delta x_t$ are not independent becomes explicit in the decomposition. The $n$th central moment $\mu_n[X] \equiv \langle (X - \langle X \rangle)^n \rangle$ [corresponding to $\tau_1 = 0, \ldots, \tau_{n-1} = 0$ in Eq. (2)] is given by

$$\mu_n[X] = \mu_n[\Delta x] + \mu_n[\bar{x}] + \sum_{j=1}^{n-1} \binom{n}{j} \big\langle \Delta x^j \, (\bar{x} - \langle X \rangle)^{n-j} \big\rangle. \quad (5)$$

FIG. 1 (color online). Analyzing an arbitrary stochastic process subject to fluctuating environments. Given an environmental trace $z(t)$ (dashed gray line), a system exhibits a random trace $X_t$ (two realizations, black lines). We define the time-varying ensemble average $\bar{x}(t)$ (thick blue line), given the environmental history, and the instantaneous deviation $\Delta x_t$ of the system from that average (red bar).

FIG. 2 (color online). Decomposing autocorrelations and distributions. The observed autocorrelation of $X_t$ (dashed line) is the sum of the autocorrelation exhibited by the ensemble average and the autocorrelation of the instantaneous deviations [Eq. (3)]. The probability distribution of $X_t$ (dashed line) is not a simple convolution of $P(\Delta x)$ and $P(\bar{x})$ [because $\Delta x_t$ and $\bar{x}(t)$ are not independent], but its moments are related to those of $\Delta x_t$ and $\bar{x}(t)$, as specified by Eq. (5).
PRL 109, 248104 (2012)

Because distributions are uniquely determined by their moments, this identifies the full intrinsic and extrinsic distributions. All cross terms and moments can be identified experimentally by generalizing the twin-reporter approach to $n$ identical and independent copies $R_1, R_2, \ldots, R_n$ of the system of interest embedded in the same fluctuating environment. Such a setup could, for example, be realized by expressing multiple fluorescent reporter proteins of different color in the same cell [24]. We find that the multiple-reporter cross correlations satisfy

$$I_{n,j} \equiv \Big\langle (R_1 - \langle X \rangle)^j \prod_{i=2}^{n-j+1} (R_i - \langle X \rangle) \Big\rangle = \sum_{m=0}^{j} \binom{j}{m} \big\langle \Delta x^m \, (\bar{x} - \langle X \rangle)^{n-m} \big\rangle. \quad (6)$$

For $j = 1, 2, \ldots, n-1$, these relations define a linear equation system that can be solved for all cross terms in Eq. (5) as well as $\mu_n[\bar{x}] = I_{n,0}$, which means that $\mu_n[\Delta x]$ can be inferred from Eq. (5). The cross correlations between $n$ independent and identical reporters can thus exactly identify the intrinsic and extrinsic contributions to the observed $n$th central moment of the distribution of $X_t$.
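The logic behind Eq. (6) can be checked numerically in its simplest nontrivial case. Because distinct reporters are conditionally independent with zero conditional mean, the three-way cross moment $\langle R_1' R_2' R_3' \rangle$ of centered reporters should equal the third central moment of $\bar{x}(t)$ (the $I_{3,1}$ relation). The sketch below uses an assumed toy model: a skewed log-Ornstein-Uhlenbeck environment, three identical birth-death reporters, and $\bar{x}(t)$ computed directly from the rate equation driven by the same environmental trace.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative parameters (assumptions).
T, dt = 200_000, 0.05
beta, k0 = 1.0, 20.0
tau_z, sig_z = 5.0, 0.4   # fairly strong, skewed (log-normal) environment

z = np.empty(T)
lz = 0.0
for t in range(T):
    lz += -lz / tau_z * dt + sig_z * np.sqrt(dt) * rng.standard_normal()
    z[t] = np.exp(lz)

# Three identical, conditionally independent reporters (vectorized over copies).
n = 3
x = np.full(n, k0 / beta)
traj = np.empty((T, n))
traj[0] = x
pd = 1.0 - np.exp(-beta * dt)
for t in range(1, T):
    births = rng.poisson(k0 * z[t - 1] * dt, size=n)
    deaths = rng.binomial(x.astype(int), pd)
    x = x + births - deaths
    traj[t] = x

# Conditional average xbar(t): rate equation driven by the same z(t).
xbar = np.empty(T)
xbar[0] = k0 / beta
for t in range(1, T):
    xbar[t] = xbar[t - 1] + (k0 * z[t - 1] - beta * xbar[t - 1]) * dt

burn = 2_000                     # discard the initial transient
R = traj[burn:] - traj[burn:].mean()
Y = xbar[burn:] - xbar[burn:].mean()

mu3_cross = np.mean(R[:, 0] * R[:, 1] * R[:, 2])  # three-reporter cross moment
mu3_xbar = np.mean(Y ** 3)                        # third moment of xbar directly
```

For long time series the two estimates agree, because every term in the expansion of $R_1' R_2' R_3'$ containing a lone intrinsic deviation averages to zero.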
Relating intrinsic and extrinsic contributions to the separately defined $\Delta x$ and $\bar{x}$ allows us to design alternative experimental strategies to measure the same quantities (Fig. 3): assuming independence between reporters, the sum of many independent copies of the same type of reporter provides a direct estimate for the time series of the conditional average $\bar{x}$, while any intrinsic statistical property can be observed with just one additional distinct reporter, whose deviations from $\bar{x}$ give $\Delta x$.
This approach estimates exactly the same type of intrinsic and extrinsic contributions as the correlation-based method and captures the full intrinsic and extrinsic distributions. In fact, for variances and autocorrelations, only a single type of reporter is needed: first measuring fluctuations of the full process and then separately measuring fluctuations in the average of many reporters. This approach is limited by sampling error when estimating the average response from a finite number of reporters, but has the advantage that distinguishable reporters are not needed. Ideally, both approaches would be used for a side-by-side comparison, providing independent validation by measuring the same property in two different ways, but one method may be more convenient than the other in any given application.
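The averaging strategy can be sketched in the same toy setting (again with assumed, illustrative parameters): the mean of $N$ indistinguishable copies estimates $\bar{x}(t)$ directly, and the deviation of one additional reporter from that estimate gives $\Delta x$. The finite-$N$ sampling error inflates the raw intrinsic variance estimate by a factor of roughly $(1 + 1/N)$, which the sketch corrects for.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative parameters (assumptions).
T, dt = 50_000, 0.05
beta, k0 = 1.0, 50.0
tau_z, sig_z = 5.0, 0.3
N = 100                         # indistinguishable copies used to estimate xbar(t)

z = np.empty(T)
lz = 0.0
for t in range(T):
    lz += -lz / tau_z * dt + sig_z * np.sqrt(dt) * rng.standard_normal()
    z[t] = np.exp(lz)

# N copies for the average, plus one extra reporter, all sharing z(t).
x = np.full(N + 1, k0 / beta)
xbar_hat = np.empty(T)          # running estimate of xbar(t) from the N copies
extra = np.empty(T)             # the single additional reporter
pd = 1.0 - np.exp(-beta * dt)
for t in range(T):
    xbar_hat[t] = x[:N].mean()
    extra[t] = x[N]
    births = rng.poisson(k0 * z[t] * dt, size=N + 1)
    deaths = rng.binomial(x.astype(int), pd)
    x = x + births - deaths

burn = 2_000
dx = extra[burn:] - xbar_hat[burn:]        # estimate of the intrinsic deviation
var_int = dx.var() / (1.0 + 1.0 / N)       # correct the finite-N sampling bias
var_ext = xbar_hat[burn:].var()            # extrinsic variance (slightly inflated, ~1/N)
var_tot = extra[burn:].var()               # total variance of one reporter
```

Within sampling error, `var_int + var_ext` recovers `var_tot`, the variance version of the decomposition, without requiring distinguishable reporters.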
The main practical reason for identifying intrinsic and extrinsic noise contributions is to analyze each category separately using simpler models that account only for that type of randomness, i.e., comparing measured intrinsic noise to models that ignore environmental fluctuations and comparing measured extrinsic noise to models of a system responding deterministically to environmental changes [5–7]. However, whether this is a rigorous approach depends on the noise category, the statistical measure considered, how reaction rates depend on abundances, and how the environment is coupled to the components of the system [9,25]. For systems where the intrinsic variables interact nonlinearly with each other, it is known [9] that neither the intrinsic nor the extrinsic noise above can be rigorously analyzed from models that ignore extrinsic or intrinsic stochasticity, respectively: noise brings the system into states with disproportionate changes in dynamics. However, Eqs. (2)–(6) are exact even for such systems, and many experimental approaches were designed so that nonlinear interactions are in the shared extrinsic environment [4,5], instead using the approach to disentangle other nontrivial effects in the random walks, such as bursting, time averaging, and molecular memory. Here, we determine to what extent intrinsic or extrinsic noise in such systems can be rigorously and exactly analyzed by models that account only for intrinsic [5–7] or extrinsic noise.
Previously, we have shown that if all reaction rates $r_k(x, z(t))$ are linear functions of the vector of intrinsic variables $x = \{x_1, x_2, \ldots\}$, where the system undergoes reactions $x \xrightarrow{r_k} x + s_k$ for $k = 1, 2, \ldots$, which is true for many models of stochastic gene expression, the dynamics of the ensemble average $\bar{x}(t)$ follows exactly from a classic rate-equation approach, employing ordinary differential equations subject to time-varying rate constants set by the extrinsic processes [9]:

$$\frac{d\bar{x}}{dt} = R(\bar{x}, z(t)), \qquad R(x, z) = \sum_k r_k(x, z)\, s_k. \quad (7)$$

By showing how to determine any time statistics of $\bar{x}$, Eqs. (4) and (6) thus show how the simple rate-equation type of models that ignore intrinsic noise can be rigorously compared against much richer aspects of the extrinsic data than just variances [9].

FIG. 3 (color online). Experimentally inferring intrinsic and extrinsic contributions. The correlations between multiple independent but distinguishable reporters identify temporal autocorrelations and higher-order statistics of $\Delta x$ and $\bar{x}$ [see Eqs. (4) and (6)]. Alternatively, a set of indistinguishable reporters can be used to directly estimate $\bar{x}$ and thus $\Delta x$.
Whether intrinsic contributions can be analyzed without explicitly representing environmental fluctuations in turn depends on whether the environmental influences are additive (constant Jacobian matrix $J$) or multiplicative (fluctuating Jacobian $J$) [9]. We previously showed that for additive environments the intrinsic variance is exactly captured by models that account for intrinsic stochasticity but ignore environmental fluctuations, if the system is linear with respect to intrinsic variables [9]. In other words, for such systems the covariance matrix $C$ of intrinsic species satisfies

$$C = C_{\mathrm{int}} + C_{\mathrm{ext}}, \quad (8)$$

where $C_{\mathrm{int}}$ is the covariance matrix of the simplified system $x \xrightarrow{\tilde{r}_k(x)} x + s_k$ for $k = 1, 2, \ldots$, in which environmental fluctuations have been replaced by their averages, $\tilde{r}_k(x) \equiv r_k(x, \langle z \rangle)$, and $C_{\mathrm{ext}}$ is the covariance matrix of a system of deterministic ordinary differential rate equations with fluctuating rates [Eq. (7)]. Under the same conditions, we can show that the matrix of correlations between the intrinsic species at times $t$ and $t + \tau$, conditioned on the same environmental history, satisfies $\frac{\partial}{\partial \tau} A(t, t+\tau) = -J A$, with $A(t, t) = C(t)$, where $C(t)$ is the instantaneous conditional covariance matrix with elements $C_{ij}(t) = \langle \Delta x_i \Delta x_j \rangle_e$ between intrinsic species [26]. Because we can write the intrinsic contribution to the autocorrelation $(A_{\Delta x})_{i,j} = \langle \Delta x_{i,t} \Delta x_{j,t+\tau} \rangle$ as a time average of the instantaneous autocorrelations, i.e., $A_{\Delta x}(\tau) = \langle A(t, t+\tau) \rangle$, we thus obtain

$$A_{\Delta x}(\tau) = A_{\mathrm{int}}(\tau), \quad (9)$$

where we have made use of $C_{\mathrm{int}} = \langle C(t) \rangle$ from Ref. [9]. For intrinsically linear systems subject to additive extrinsic noise, the inferred autocorrelations of any intrinsic species $\Delta x_i$ can thus be rigorously analyzed using simplified models that ignore extrinsic influences. This makes the temporal correlations between dual reporters [4] a powerful experimental measure, as illustrated below.
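For a one-species linear system with an additive environment, the variance version of this result, Eq. (8), can be checked numerically. In the sketch below (all parameters are illustrative assumptions), the production rate $k(t)$ fluctuates but does not multiply the intrinsic variable, so the Jacobian $-\beta$ is constant; the intrinsic model then predicts a Poisson variance $\langle x \rangle$, while the extrinsic variance comes from the deterministic rate equation driven by the same $k(t)$.

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative parameters (assumptions): additive environment, i.e. the
# production rate k(t) fluctuates but does not multiply the variable x.
T, dt = 200_000, 0.05
beta = 1.0
k_mean, tau_k, sig_k = 40.0, 5.0, 4.0

# Ornstein-Uhlenbeck fluctuations of the production rate.
k = np.empty(T)
k[0] = k_mean
for t in range(1, T):
    k[t] = k[t - 1] + (k_mean - k[t - 1]) / tau_k * dt \
           + sig_k * np.sqrt(2.0 * dt / tau_k) * rng.standard_normal()
    k[t] = max(k[t], 0.0)       # guard against (very rare) negative rates

# Stochastic birth-death process: birth rate k(t), death rate beta*x.
x = np.empty(T)
x[0] = k_mean / beta
pd = 1.0 - np.exp(-beta * dt)
for t in range(1, T):
    births = rng.poisson(k[t - 1] * dt)
    deaths = rng.binomial(int(x[t - 1]), pd)
    x[t] = x[t - 1] + births - deaths

# Deterministic rate equation [Eq. (7)] driven by the same k(t): gives C_ext.
xd = np.empty(T)
xd[0] = k_mean / beta
for t in range(1, T):
    xd[t] = xd[t - 1] + (k[t - 1] - beta * xd[t - 1]) * dt

burn = 2_000
var_tot = x[burn:].var()        # C: total variance of the stochastic system
var_ext = xd[burn:].var()       # C_ext: variance of the deterministic model
var_int = x[burn:].mean()       # C_int: Poisson variance of the averaged model
```

For this linear, additive case the three estimates satisfy Eq. (8) to within sampling error; for a multiplicative coupling (e.g., a rate $k(t)\,x$) the same bookkeeping would fail, which is the point of the following paragraphs.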
The same is true for a system's intrinsic skewness: the time evolution of the skewness tensor of an arbitrary multivariate system that is linear and additive in the above sense can be described by a general tensor equation involving only the Jacobian and covariance matrices [22]. By taking time averages of the tensor equation with fluctuating rates, it can be shown that the skewness $\mu_{3,i}$ of species $x_i$ predicted by mechanistic models in which fluctuating rates are replaced by their averages $\tilde{r}_k(x)$ is in fact equal to the intrinsic contribution to the skewness. Surprisingly, the kurtosis and higher-order moments cannot be interpreted as the higher-order moments of systems where environmental effects are replaced by their averages, even for intrinsically linear systems subject to additive environmental influences (proof by counterexample in Ref. [22]). This illustrates the dangers of using interpretations that are merely intuitive and not proven.
When environmental effects are multiplicative, the experimentally identified intrinsic noise cannot be rigorously compared to models that ignore extrinsic fluctuations; see Fig. 4(a) and Ref. [9]. For many processes in living cells the environment will affect system variables multiplicatively, but not for all. The problem is that the twin-reporter assay has so far always been used in situations where we do not know whether the environment is additive or multiplicative [4–7], and thus cannot necessarily trust the conclusions from the associated mechanistic models.
Specifically, if only variances are experimentally decomposed, we must assume additive environments to analyze intrinsic mechanisms, with no way of testing whether the assumption holds [9]. However, intrinsic and extrinsic contributions to skewnesses or autocorrelations [Eqs. (3)–(6)] place constraints on models without increasing their degrees of freedom. This can be used to test whether a given type of model is consistent with additive environments.
For example, consider the commonly used toy model of gene expression in which individual mRNA molecules (levels denoted by $x_1$) are produced with rate $\lambda_1$ and degraded with rate $\beta_1 x_1$, and individual proteins (levels denoted by $x_2$) are produced with rate $\lambda_2 x_1$ and degraded with rate $\beta_2 x_2$. We typically do not know a priori which rate constants are significantly affected by environmental fluctuations. However, for systems with additive environments, i.e., where only the rate constant $\lambda_1$ fluctuates significantly, then regardless of how $\lambda_1$ fluctuates, the experimentally extracted intrinsic component of the total protein autocorrelation exactly follows

$$A_{\Delta x_2}(\tau) = \langle x_2 \rangle \Big[ e^{-\beta_2 \tau} + \frac{\lambda_2}{\beta_1 + \beta_2} \big( F e^{-\beta_1 \tau} + (1 - F) e^{-\beta_2 \tau} \big) \Big], \quad (10)$$

where $F = \beta_2 / (\beta_2 - \beta_1)$. Without knowing all the individual parameters, we can then fit a sum of two exponentials $a e^{-k_a \tau} + b e^{-k_b \tau}$ to the experimentally determined intrinsic autocorrelation. Surprisingly, we find that even in the presence of multiplicative noise such fits can be near perfect [see the inset of Fig. 4(b)]. However, if the environment is additive, the fit parameters will satisfy

$$\frac{a}{a + b - \langle x_2 \rangle} = \frac{k_b}{k_b - k_a}, \quad (11)$$

while for multiplicative environmental noise this relation is violated; see Fig. 4(b).
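The fitting step itself requires no knowledge of the underlying parameters. The sketch below (hypothetical data, assumed numbers throughout) generates a "measured" intrinsic autocorrelation from a known two-exponential form plus noise, then recovers the decay rates by a simple grid search with a linear least-squares solve for the amplitudes at each candidate rate pair; the recovered $(a, b, k_a, k_b)$ could then be checked against the additive-environment relation Eq. (11).

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical "measured" intrinsic autocorrelation: a sum of two exponentials
# with assumed parameters, plus small measurement noise.
tau = np.linspace(0.0, 10.0, 200)
a_true, b_true, ka_true, kb_true = 2.0, 1.0, 0.3, 1.5
data = (a_true * np.exp(-ka_true * tau)
        + b_true * np.exp(-kb_true * tau)
        + 0.01 * rng.standard_normal(tau.size))

def fit_two_exp(tau, y, k_grid):
    """Fit y ~ a*exp(-ka*tau) + b*exp(-kb*tau): grid-search the decay rates,
    solving linearly for the amplitudes at each candidate (ka, kb) pair."""
    best = None
    for i, ka in enumerate(k_grid):
        for kb in k_grid[i + 1:]:               # enforce ka < kb
            M = np.column_stack([np.exp(-ka * tau), np.exp(-kb * tau)])
            coef, *_ = np.linalg.lstsq(M, y, rcond=None)
            resid = np.sum((M @ coef - y) ** 2)
            if best is None or resid < best[0]:
                best = (resid, ka, kb, coef[0], coef[1])
    return best[1:]

k_grid = np.linspace(0.05, 3.0, 60)             # grid spacing 0.05
ka, kb, a, b = fit_two_exp(tau, data, k_grid)
```

As the text notes, such a fit can look near perfect even under multiplicative noise; the diagnostic is therefore not the fit quality but whether the fitted parameters satisfy the additive-environment constraint.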
For multiplicative environments, we can thus rigorously model the system's response to extrinsic noise without accounting for intrinsic contributions, but not vice versa. However, because Eqs. (2)–(7) remain valid, we can first thoroughly evaluate extrinsic models and then use them to embed models of intrinsic stochasticity. The approaches above could also be combined with recent theoretical developments to use reporters at multiple levels of a system [10], which can divide intrinsic and extrinsic noise into more subcategories and help pinpoint exact mechanisms. As cell biology moves away from bulk averages to study natural perturbations and responses, we believe these types of approaches, which analytically exploit broader properties of classes of systems rather than simulating whole networks, will be crucial for quantifying the mechanisms of living cells.

FIG. 4 (color online). Signature of multiplicative noise. Simulation results of the gene expression model in the main text. (a) For systems subject to multiplicative noise, the intrinsic contribution to the autocorrelation differs from the autocorrelation $\Theta(\tau)$ that the system would exhibit in the absence of environmental fluctuations. (b) Systems with multiplicative noise (black squares, solid line) deviate significantly from Eq. (11) (dashed gray line). Inset: despite strongly multiplicative noise, the best fit (red line) of simple exponentials (black dots) can be near perfect. (See the parameter values in Ref. [22].)