Brownian yet non-Gaussian diffusion: from superstatistics to subordination of diffusing diffusivities

A growing number of biological, soft, and active matter systems are observed to exhibit normal diffusive dynamics with a linear growth of the mean squared displacement, yet with a non-Gaussian distribution of increments. Based on the Chubinsky-Slater idea of a diffusing diffusivity we here establish and analyze a minimal model framework of diffusion processes with fluctuating diffusivity. In particular, we demonstrate the equivalence of the diffusing diffusivity process with a superstatistical approach with a distribution of diffusivities, at times shorter than the diffusivity correlation time. At longer times a crossover to a Gaussian distribution with an effective diffusivity emerges. Specifically, we establish a subordination picture of Brownian but non-Gaussian diffusion processes, that can be used for a wide class of diffusivity fluctuation statistics. Our results are shown to be in excellent agreement with simulations and numerical evaluations.


I. INTRODUCTION
Thermally driven diffusive motion belongs to the fundamental physical processes. To a big extent inspired by the groundbreaking experiments of Robert Brown in the 1820ies [1] the theoretical foundations of the theory of diffusion were then laid by Einstein, Sutherland, Smoluchowski, andLangevin between 1905 and [2][3][4][5]. On their basis novel experiments, such as the seminal works by Perrin and Nordlund [6,7], in turn delivered ever better quantitative information on molecular diffusion as well as the atomistic nature of matter. Typically, we now identify two fundamental properties with Brownian diffusive processes: (i) the linear growth in time of the mean squared displacement (MSD) r 2 (t) = ∞ −∞ r 2 P (r, t)dr = 2dDt (1) typically termed normal (Fickian) diffusion. Here d denotes the spatial dimension and D is called the diffusion coefficient. (ii) The second property is the Gaussian shape of the probability density function to find the diffusing particle at position r at some time t [8]. From a more mathematical viewpoint the Gaussian emerges as limit distribution of independent, identically distributed random variables (the steps of the random walk) with finite variance and in that sense assumes a universal character [9]. Deviations from the linear time dependence (1) are routinely observed. Thus, modern microscopic techniques reveal anomalous diffusion with the power-law dependence r 2 (t) ≃ t α of the MSD, where according to the value of the anomalous diffusion exponent we distinguish subdiffusion for 0 < α < 1 and superdiffusion with 1 < α < 2 [10][11][12][13][14]. Examples for subdiffusion of passive molecular and submicron tracers abound in the cytoplasm of living biological cells [15][16][17] and in artificially crowded fluids [18], as well as in quasi two-dimensional systems such as lipid bilayer membranes [19][20][21][22]. Superdiffusion is typically associated with active processes and also observed in living cells [23]. Anomalous diffusion processes arise due to the loss of independence of the random variables, divergence of the variance of the step length or the mean of the step time distribution, as well as due to the tortuosity of the embedding space. The associated probability density function of anomalous diffusion processes may have both Gaussian and non-Gaussian shapes [10][11][12].
A new class of diffusive dynamics has recently been reported in a number of soft matter, biological and other complex systems: in these processes the MSD is normal of the form (1), however, the probability density function P (r, t) is non-Gaussian, typically characterized by a distinct exponential shape with the decay length λ(t) = √ Dt [24]. This form of the probability density function is also sometimes called a Laplace distribution. The Brownian yet non-Gaussian feature appears quite robustly in a large range of systems, including beads diffusing on lipid tubes [25] or in networks [25,26], tracer motion in colloidal, polymeric, or active suspensions [27], in biological cells [28], as well as the motion of individuals in heterogeneous populations such as nematodes [29]. For additional examples see [30][31][32][33] and the references in [24,34,35].
How can this combination of normal, Brownian scaling of the mean squared displacement be reconciled with the existence of a non-Gaussian probability density function? One argument not brought forth in the discussion of anomalous diffusion above is the possibility that the random variables making up the observed dynamics are indeed not identically distributed. This fact can be introduced in different ways. First, Granick and coworkers [25] as well as Hapca et al. [29] employed distributions of the diffusivity of individual tracer particles to explain this remarkable behavior: indeed, averaging the Gaussian probability density function (2) for a single diffusivity D over the exponential distribution p D (D) = D −1 exp(−D/ D ) with the mean diffusivity D , the exponential form (3) of the probability density function emerges [29,34]. In fact, this idea of creating an ensemble behavior in terms of distributions of diffusivities of individual tracer particles is analogous to the concept of superstatistical Brownian motion: based on two statistical levels describing, respectively, the fast jiggly dynamics of the Brownian particle and the slow environmental fluctuations with spatially local patches of given diffusivity this concept demonstrates how non-Gaussian probability densities arise physically [37]. In what follows we refer to averaging over a diffusivity distribution p D (D) as superstatistical approach. An important additional observation from experiments that cannot be explained by the superstatistical approach is that "the distribution function will converge to a Gaussian at times greater than the correlation time of the fluctuations" [24]. This is impressively demonstrated, for instance, in Fig. 1C in [25]. This crossover cannot be explained by the superstatistical approach. At the same time the normal-diffusive behavior is not affected by the crossover between the shapes of the distribution.
Second, Chubinsky and Slater came up with the diffusing diffusivity model, in which the diffusion coefficient of the tracer particle evolves in time like the coordinate of a Brownian particle in a gravitational field [34]. For short times they indeed find an exponential form (3). At long times, they demonstrate from simulations that the probability density function crosses over to a Gaussian shape. Jan and Sebastian formalize the diffusing diffusivity model in an elegant path integral approach, which they explictly solve in two spatial dimensions [35]. Their results are consistent with those of Ref. [34].
Here we introduce a simple yet powerful minimal model for diffusing diffusivities, based on the concept of subordination. Based on a double Langevin equation approach our model is fully analytical, providing an explicit solution for the probability density function in Fourier space. The inversion is easily feasible numerically, and we demonstrate excellent agreement with simulations of the underlying stochastic equations. Moreover, we provide the analytical expressions for the asymptotic behavior at short and long times, including the crossover to Gaussian statistics, and derive explicit results for the kurtosis of the probability density function. The bivariate Fokker-Planck equation for this process and its connection to the subordination concept are established. Finally, we show that at times shorter than the diffusivity correlation time our analytical results are fully consistent with the superstatistical approach. Our approach has the distinct advantage that it is amenable to a large variety of different fluctuating diffusion scenarios.
In what follows we first formulate the coupled Langevin equations for the diffusing diffusivity model. Section 3 then introduces the subordination concept allowing us to derive the exact form of the subordinator as well as the Fourier image of the probability density function. The Brownian form of the MSD is demonstrated and the short and long time limits derived. Moreover, the connection to the superstatistical approach is made. The kurtosis quantifying the non-Gaussian shape of the probability density function is derived. In section 4 the bivariate Fokker-Planck equation for the joint probability density function P (x, D, t) is analyzed, before drawing our conclusions in section 5. Several Appendices provide additional details.

II. SUPERSTATISTICAL APPROACH TO BROWNIAN YET NON-GAUSSIAN DIFFUSION
As mentioned above, it was suggested by Granick and coworkers [24] as well as by Hapca et al. [29] that the Laplace distribution with effective diffusivity D emerging from a standard Gaussian distribution with diffusivity D, through the averaging procedure over D. This approach corresponds to the idea of superstatistics [37]: accordingly the overall distribution function P (x, t) of a system of tracer particles, individually moving in sufficiently large, disjunct patches with local diffusivity D, becomes the weighted average, where p D (D) is the stationary state probability density for the particle diffusivities D. While in this Section we restrict the discussion to the one-dimensional case, we will also provide results for higher dimensions below. Fourier transforming Eq. (6) we obtain where we used the fact that G(k, t) = exp(−Dk 2 t). On the right hand side we identified the integral of p D (D) over exp(−Dk 2 t) as the Laplace transformp D (s = k 2 t) to be taken at s = k 2 t. Concurrently, the Fourier transform of expression (4) is Combining these results and recalling the Laplace trans- To obtain the Laplace distribution (4) as superstatistical average of elementary Gaussians (5), the necessary distribution of the diffusivities is the exponential (9). This is exactly the result of Granick and coworkers [24] and Hapca et al. [29]. (We note that Hapca and coworkers also report results for the case of a gamma distribution p D (D).) Now, let us take the Fourier inversion of Eq. (7) and invoke the substitution κ = kt 1/2 , The right hand side defines a scaling function F of the form where ζ = x/t 1/2 . Thus the form F as function of the similarity variable ζ is an invariant. In particular, no transition of P (x, t) from a Laplace distribution to a different shape is possible in this superstatistical framework.
To account for the experimental observation, however, we are seeking a model to explain the crossover from an initial Laplace distribution to a Gaussian shape at long(er) times.
Anomalous diffusion with exponential, stretched Gaussian, and power law shapes of the probability density function We briefly digress to mention that for the case of anomalous diffusion with a mean squared displacement of the form x 2 (t) ≃ t α a similar phenomena was observed. Namely, for the motion of particles in a viscoelastic environment with a fixed generalized diffusivity D α of dimension cm 2 /sec α the motion is characterized by the Gaussian [12,36] with x/t α/2 scaling variable. Instead, in a recent experimental study observing the motion of labeled messenger RNA molecules in living E.coli and S.cerevisiae cells an exponential distribution of the diffusivity was found [55], on the single trajectory level, pointing at a higher inhomogeneity of the motion than previously assumed. The distribution (13) combined with the Gaussian (12) gives rise to the Laplace distribution [55] Similarly one can show that the stretched Gaussian observed for the lipid motion in protein-crowded lipid bilayer membranes [21] emerges from the Gaussian (12) in terms of a modified diffusivity distribution of the form In that case the resulting distribution assumes the form with an additional power law term in x, see Appendix A. Depending on the value of κ one can then obtain stretched Gaussian shapes for p α (x, t) or even broader than exponential forms (superstretched Gaussians). We finally note that for a power law distribution with 0 < α < 2 the resulting superstatistical distribution acquires long tails of the form as demonstrated in Appendix A. This brief discussion shows the need for a more general model for the diffusing diffusivity, the basis of which is established here.

III. LANGEVIN MODEL FOR DIFFUSING DIFFUSIVITIES
To describe Brownian but non-Gaussian diffusion we start with the combined set of stochastic equations The independent noise terms ξ(t) and η(t) are white and Gaussian, and both are specified by their first two moments for i, j = x, y, z and l, m = 1, . . . , n. As explained below, the dimension n of the process Y(t) may differ from the value of d of the process r(t) in real space.
In the above set of coupled stochastic equations (19), expression (19a) designates the well-known overdamped Langevin equation driven by the white Gaussian noise ξ(t) [38]. However, we consider the diffusion coefficient D(t) to be a random function of time, and we express it in terms of the square of the Ornstein-Uhlenbeck process Y(t) (see below for the reasoning). The physical dimension of the latter is [Y] = cm/sec 1/2 . In Eq. (19c) the correlation time of the Ornstein-Uhlenbeck process is τ , and σ of units [σ] = cm/sec characterizes the amplitude of the fluctuations of Y. We complete the set of stochastic equations with the initial conditions, chosen as Physically, the choice of the above set of dynamic equations corresponds to the following reasonings. In the diffusing diffusivity picture we model the particle motion, on the single trajectory level, by the random diffusivity D(t). Taking D(t) as the square of the auxiliary variable Y(t) guarantees the non-negativity of D(t). This way we avoid the need to impose reflecting boundary condition on D(t) at D = 0, which is more difficult to handle analytically [34]. The reason to choose the Ornstein-Uhlenbeck process (19c) for Y(t) is two-fold. First, it makes sure that the diffusivity dynamics is stationary, with a given correlation time. Second, the ensuing distribution p D (D) has exponential tails, thus guaranteeing the emergence of the Laplace-like distribution for P (r, t) at short times, as we will show. At long times, the above choice corresponds to a particle moving with an effective diffusivity D , and thus leads to the crossover to the long time Gaussian behavior of P (r, t). The above set of Langevin equations not only fulfill these requirements but also allows for an analytical solution, as shown below.
For simplicity, we introduce dimensionless units via the transformations t → t/τ and x → x/(στ ) (and similar for y and z). The process Y(t) is renormalized according to Y → στ 1/2 Y. As detailed in Appendix B we then obtain the set of stochastic equations for our minimal diffusing diffusivity model.
We note that the above minimal model for the diffusing diffusivity allows different choices for the number of components of Y(t). The number n is thus essentially a free parameter of the model. It defines the number of 'modes' necessary to describe the random process D(t). This is actually another advantage of the present approach, since it provides additional flexibility.
In the Discussion section we will show that the above compound process is analogous to the Heston model [41] and thus a special case of the Cox-Ingersoll-Ross (CIR) model [39], which are widely used for return dynamics in financial mathematics. Our approach therefore has a wider appeal beyond stochastic particle dynamics.

Properties of the Ornstein-Uhlenbeck process
The stochastic equation (20c) contains a linear restoring term, corresponding to the motion of the process Y in a centered harmonic potential. The formal solution of this Ornstein-Uhlenbeck process reads The associated autocorrelation function is Thus, for long times (t 1 + t 2 → ∞), we find the exponential decay of the autocorrelation, and thus the stationary variance We note that the Fokker-Planck equation for this Ornstein-Uhlenbeck process reads The distribution f (Y, t) converges to the normalized equilibrium Boltzmann form In what follows and in our simulations we assume that the initial condition Y 0 is taken randomly from the equilibrium distribution (26). Then, the process Y(t) becomes stationary starting from t = 0, and Eq. (23) is exact at all times.
The stationary diffusivity distribution p D (D) encoded in Eq. (26) in terms of the variable Y(t) can then be obtained as follows.
(i) In dimension n = 1, the variance of Y in the stationary state is Y 2 st = 1/2 and the mapping to p D (D) reads In dimensional form we have with From comparison with the direct superstatistical approach in Section II we see that the pure exponential form (9) in our diffusing diffusivity model is being modified by the additional prefactor 1/D 1/2 . From numerical comparison, however, the exponential dependence is dominating and thus the result (28) practically indistinguishable from (9) for sufficiently large D values.
(ii) For n = 2, the stationary state variance of Y is where Y = |Y| and 2πY f st (Y) = 2Y exp(−Y 2 ). Moreover, we made use of the property (31) of the δ-function. In dimensional units, we have in conjunction with relation (29).

IV. SUBORDINATION CONCEPT FOR DIFFUSING DIFFUSIVITIES
Subordination, introduced by Bochner [42], is an important concept in probability theory [43]. Simply put, a subordinator associates a random time increment with the number of steps of the subordinated stochastic process. For instance, continuous time random walks with power-law distributions of waiting times can be described as a Brownian motion in terms of the number of steps of the process, while the random waiting times are introduced in terms of a Lévy stable subordinator, as originally formulated by Fogedby [44] and developed as a stochastic representation of the fractional Fokker-Planck equation [11] and generalized master equations for continuous time random walk models [45][46][47][48].
Here we apply and extend the subordination concept to a new class of random diffusivity based stochastic processes. Our results for our minimal model of diffusive diffusivities demonstrates that the subordination approach leads to a superstatistical solution at times shorter than typical diffusivity correlation times.
To start with, we note that the stochastic probability density function P (r, t) = P (r, t|D(t)) fulfills the diffusion equation With this in mind we can rewrite the Langevin equation (20a) in the subordinated form After this change of variables the Green function of the diffusion equation has the form with r = |r|. The path variable τ , for any given instant of time t, according to Eq. (36b) is a random quantity. In order to calculate the probability density P (r, t) of the variable r at time t we need to eliminate the path variable τ . This is achieved by averaging the Green function (37) over the distribution of τ in the form Here T n (τ, t) is the probability density function of the process Equation (38) is but the well known subordination formula, implying the following: the probability for the walker to arrive at position r at time t equals the probability of being at τ on the path at time t, multiplied by the probability of being at position r for this path length τ , summed over all path lengths [44].
By help of relation (38) we write the Fourier transform in the subordinated form with k = |k|. Thus, the Fourier transform of P (r, t) is expressed in terms of the Laplace transformT n of the density function T n (τ, t) with respect to τ , with argument s = k 2 . The subordination approach established here introduces a superior flexibility into the diffusing diffusivity model. By specific choice of the subordinator density T n (τ, t) we may study a broad class of normal and anomalous diffusion processes caused by diffusivities, that are randomly varying in time and/or space. In turn, the advantage of our minimal model for diffusing diffusivities introduced here is, that the process τ (t) is the integrated square of the Ornstein-Uhlenbeck process, for which in the one-dimensional case n = 1 the Laplace transform of the probability density function is known [49], We thus directly obtain the exact analytical result for the Fourier transform of the probability density function P (x, t). The inverse Fourier transform can be performed numerically. Figure  1 demonstrates excellent agreement between this result and simulations of the stochastic starting equations (20a) to (20c). Below we provide analytical estimates of P (x, t) for short and long times and establish a connection of the subordination approach with the superstatistical framework.
Our approach can be easily generalized to the case of the n-dimensional Ornstein-Uhlenbeck process considered by Jain and Sebastian [35]. Namely, let us consider where Ornstein-Uhlenbeck process. Since the components of Y(t) are independent and the Laplace transformT n (s, t) and the characteristic functionP (k, t) are simply n-fold products of identical, one-dimensional functions (43) and (44), respectively: We thus directly obtain the exact analytical result for the Fourier transform This result is consistent with that of Eq. (25) in Ref. [35], up to numerical factors, which appear due to the difference in numerical coefficients entering the Ornstein-Uhlenbeck process.
A. Brownian mean squared displacement and leptokurtic behavior The mean squared displacement and the fourth moment encoded in our minimal model can be directly obtained from the Fourier transform (44) through differentiation, For the isotropic case considered here the Laplace operator is defined as Expanding relation (41) for small k, we obtain up to the fourth order From this we directly obtain the mean squared displacement and the fourth order moment With the results of Appendix D we find where D st is given by Eq. (24), and   The deviation of the shape of a distribution function from a Gaussian can be conveniently quantified in terms of the kurtosis We note that the kurtosis is closely related to the (first) non-Gaussian parameter, introduced in the classical text by Rahman [50]. Inserting results (54) and (55) we obtain in the short time limit that for the choice d = n. At long times, The first relation characterizes exponential distributions according to Equations (66), (70), and (72) derived below, whereas the second result coincides exactly with the kurtosis of the multidimensional Gaussian distribution (note that this kurtosis does not depend on n). Taking along the next higher term in the long time expansion, we observe that the leptokurtosis vanishes as ≃ 1/t in the long time limit, In Fig. 3 we compare the analytical result (57) for the kurtosis for d = 1 based on Equations (54) and (55) with simulations, showing excellent agreement from the short time behavior all the way to the saturation plateau at the Gaussian value K = 3. We note that the crossover time from strongly leptokurtic to Gaussian behavior occurs at t ≈ 1, which in dimensional units corresponds to the correlation time of the diffusing diffusivity process. Thus in experiments the behavior of the kurtosis as function of time provides a direct means to extract the correlation time of D(t), which also corresponds to the crossover time from the exponential to the Gaussian behavior of the probability density P (x, t), as will be shown below. We also note that when the process Y(t) is very highly dimensional and thus D st large, the exponential tails of P (r, t) do not exist, see Appendix C.
We now derive explicit analytical results for the probability density function P (r, t) in the short and long time limits, starting with the short time limit and its relation to the superstatistical formulation of the diffusing diffusivity. In our further exemplary calculations we take n = d. However, in Appendix C we discuss the situations when n = 1 and d = 2, as well as when n goes to infinity while d stays finite.

B. Short time limit
First we concentrate on the shape of the density P (r, t) in the short time limit t ≪ τ . In dimensionless units this means that we consider the asymptotic behavior of the characteristic function (48) under the condition t ≪ 1, for which sinh t 1 + 2k 2 ∼ t 1 + 2k 2 , cosh t 1 + 2k 2 ∼ 1, (61) Together with expression (48) we thus find This expression is indeed normalized,P (k = 0, t) = 1. We can thus perform the inverse Fourier transform to obtain P (r, t) in the short time limit.
(i) For one dimension d = n = 1 we find in terms of the Bessel function [51] K 0 (aβ) = ∞ 0 cos(ax) Apart from the normalization we observe that from Eq. (63) we also derive the Brownian behavior x 2 (t) = t = 2 D st t, in accordance with Eqs. (24) and (54). Keeping in mind that here we are pursuing the large value limit of the scaling variable z = xt −1/2 ≫ 1, we expand the Bessel function in the form [52] We thus find the asymptotic result This expression reproduces the exponential shape of the probability density function P (x, t) of the diffusing diffusivity model, with the power-law correction |x| −1/2 . Figure 4 demonstrates excellent agreement of our short time result (63) and simulations. For the longest simulated time the wings of the distribution start to show some deviations, indicating that in this case the short time limit is no longer fully justified.
(ii) For d = n = 2 we obtain  . 4: Probability density function P (x, t) for d = n = 1 at short times in dimensionless form (σ = τ = D⋆ = 1). We compare results from simulations of the set of Langevin equations (20a) to (20c), represented by the symbols, with the explicit short time solution (63). Excellent agreement is observed, only for the longest time t = 0.5 the wings start to show some deviations. and thus Here we used the relation π 0 cos(kr cos ϕ)dϕ = πJ 0 (kr) (69) in terms of the modified Bessel function J 0 . The distribution is normalized and encodes the Brownian behavior r 2 (t) = 4t = 4 D st t. Expanding the Bessel function K 0 as above we find that (iii) Finally, in d = n = 3 the asymptotic probability density becomes and we have r 2 (t) = 9t = 6 D st t. Expansion of the Bessel function produces the asymptotic exponential behavior P (r, t) ∼ 1 (2π) 3/2 r 1/2 t 5/4 e −r/ √ t . (72) As we will show in the following Subsection, in all dimensions the short time limit reproduces the superstatistical behavior. The subordination formulation, the explicit result (63) in terms of the Bessel function, and the asymptotic exponential (Laplace) form (66) (and their multidimensional analogs (68) and (70) to (72)) constitute our first main result. We note that the asymptotic forms (66), (70), and (72) of the density function P (x, t) by itself leads to the Brownian scaling (54) of the corresponding mean squared displacement. When n = d the exponential shape of the short time behavior is conserved while the subdominant prefactors change, as shown for the case d = 2 and n = 1 in Appendix C.

C. Relation to the superstatistical approximation
Above we formulated the concept of diffusing diffusivities in terms of coupled stochastic equations for the particle position r(t) and the random diffusivity D(t). An alternative approach suggested in Refs. [24,34] is that of the superstatistical distribution of the diffusivity, as laid out in Section II. In this superstatistical sense the overall distribution function is given as the weighted average of a single Gaussian over the stationary diffusivity distribution, (ii) In d = n = 2, we have with Eq. (32) (iii) Finally, in d = n = 3, we find with Eq. (34) For all d the mean squared displacement acquires the linear Brownian scaling in time r 2 = 2d D st t, as it should. This is but exactly the result of our subordination scheme in the short time limit, expressions (66), (68), and (71), written in dimensional form. Thus, in our approach to the diffusing diffusivity the short time regime of the subordination formalism leads directly to the superstatistical result. The reason is as follows: At times less than the diffusivity correlation time τ the diffusion coefficient does not change considerably, and the subordination scheme describes an ensemble of particles, each diffusing with its own diffusion coefficient. This mimics a spatially inhomogeneous situation, when the local diffusion coefficient is random, but stays constant within confined spatial domains. In this case the ensemble of particles moving in different domains exhibits a superstatistical behavior, as assumed in the original works [37]. However, in any system with finite patch sizes, we would not expect the particles to stay in their local patch of diffusivity D forever, thus violating the assumption of the superstatistical approach. Our annealed approach in some sense delivers a mean field approximation to the spatially disordered situation, and adequately describes the transition from short time superstatistical behavior to the Gaussian probability law at long times, which will be shown in the subsequent section. The full consistency in the short time limit between the subordination approach and superstatistics is our second main result.

D. Long time limit
We now turn to the long time limit encoded in the Fourier transform (44) of the probability density P (r, t), that is, the times larger than the diffusivity correlation time τ . In dimensionless units it corresponds to t ≫ 1, and the hyperbolic functions assume the limiting behaviors Combined with result (48) we find As in the short time limit above, this expression is normalized,P (k = 0, t) = 1. Now let us focus on the tails of the probability density P (r, t), corresponding to the limit k ≪ 1, for which Eq. (78) givesP (k, t) ∼ exp(−nk 2 t/2) = exp − D st k 2 t , and thus At long times the probability density function P (r, t) assumes a Gaussian form, with the effective diffusivity D st = n/2. This is a consequent result given the Ornstein-Uhlenbeck variation of the diffusivity encoded in the starting equations (19b) and (19c): at sufficiently long times the process samples the full diffusivity space and behaves like an effective Gaussian process with renormalized diffusivity. The explicit derivation of the crossover to the Gaussian behavior is our third main result. Fig. 5 shows the crossover from the initial exponential to the long time Gaussian behavior of the probability density function P (x, t) for d = n = 1 by comparison to the Gaussian distribution (79) for short time t = 0.1, the crossover time t = 1.0 and the longer time t = 10.0.

V. BIVARIATE FOKKER-PLANCK EQUATION AND RELATION TO THE SUBORDINATION APPROACH
In this section we derive the Fokker-Planck equation corresponding to the set of stochastic equations (20a) to (20c) of our diffusing diffusivity model. We will also establish the relation to the subordination approach of section IV. Note that we here restrict the discussion to the case d = n = 1, as higher dimensional cases are completely equivalent.
Following our notation we thus seek the Fokker-Planck equation for the bivariate probability density function f (x, y, t), which has the structure ∂ ∂t f (x, y, t) = L y f (x, y, t) + y 2 ∂ 2 ∂x 2 f (x, y, t).
The Fokker-Planck operator in y reads and the marginal probability density function for x is then To proceed further we introduce the joint probability density function q(τ, y, t) of the Ornstein-Uhlenbeck process y(t) and its integrated square τ (t), given by equation (39). The corresponding system of stochastic equations has the form and thus the bivariate Fokker-Planck equation governing the probability density q(τ, y, t) reads ∂ ∂t q(τ, y, t) = L y q(τ, y, t) − y 2 ∂ ∂τ q(τ, y, t).
Now we introduce an ansatz for the solution of equation (80) of the form where G(x, τ ) is the Gaussian probability density given by equation (37). Then, in accordance with equation (82) the marginal probability density P (x, t) can be written in the subordination form of equation (38), where is the marginal probability density function of the integrated square of the Ornstein-Uhlenbeck process whose Laplace transform is given by equation (42). We now prove that the solution of equation (80) can be presented in the form (85). To that end we differentiate equation (85) and use relation (84) to get We now apply relation (85) to the first term and integrate the second term by parts, obtaining Finally, with the relation ∂G(x, τ )/∂τ = ∂ 2 G(x, τ )/∂x 2 , we see that ∂ ∂t f (x, y, t) = L y f (x, y, t) + y 2 ∂ 2 ∂x 2 f (x, y, t) +y 2 q(τ = 0, y, t)δ(x). (89) The last term vanishes, as τ is the integrated square of y(t), and thus we arrive at equation (80). Therefore we showed that the solution of the bivariate Fokker-Planck equation (80) can be presented in the form (85), and consequently the marginal probability density function P (x, t) can be written in the subordination form (38). The connection of the bivariate Fokker-Planck equation (80) for the Langevin system (19a) to (19c) with the subordination approach represented by relations (36) and (38) is our fourth main result.

VI. DISCUSSION
An increasing number of systems are reported in which the mean squared displacement is linear in time, suggesting normal (Fickian) diffusion of the observed tracer particles. Concurrently the (displacement) probability density function is pronouncedly non-Gaussian. Normal diffusion with a Laplace distribution of particle displacements was previously explained in a superstatistical approach by Granick and coworkers [24]. The experimentally observed crossover to Gaussian statistics at longer times was interpreted as a consequence of the central limit theorem, kicking in at times longer then the correlation time of the diffusion fluctuations [24]. Chubinsky and Slater [34] introduced the diffusing diffusivity model and studied it numerically, concluding the crossover from the initial exponential shape to a Gaussian with effective diffusivity. Jain and Sebastian go further with the double Langevin approach [35]. Here we introduced a consistent minimal model for a diffusing diffusivity. We explicitly obtain the Fourier transform of the full probability density function, from which we derive the analytical short and long time limits. This allows us to determine the dynamical crossover to the long time Gaussian shape of the probability density at the correlation time of the fluctuating diffusivity. Moreover we demonstrate a full consistency of our minimal model with the superstatistical approach, as well as with the results of Jain and Sebastian. At the same time our model is more general and flexible: phrasing the diffusing diffusivity approach in terms of a subordination concept we endow our model with an extremely flexible basis, such that a wide range of different statistics for the diffusivity can be included.
We also obtained the bivariate Fokker-Planck equation for this diffusing diffusivity process and expressed its solution in terms of the subordination integral. Excellent agreement with simulations is provided for the probability density function, the Brownian scaling of the mean squared displacement, and the kurtosis of the probability density function.
We are confident that this subordination integral formulation of the diffusing diffusivity model will prove useful for experimentalists observing such dynamics. The model can be calibrated with respect to the two parameters τ and σ, which can be obtained from experiment by analyzing the time-dependence of the mean squared displacement and the kurtosis. The latter provides information on the typical diffusivity correlation time τ (see Fig. 3), whereas the former allows one to estimate the parameter σ, see Eq. (56). Moreover, measurement of the diffusivity distribution, as can be directly obtained experimentally [55], provides the value of the parameter D ⋆ = σ 2 τ according to Eq. 28. Additionally, the possibility to include a different dimensionality n for the subordinating process Y(t) allows for fine-tuning of the model to match the experimentally observed probability density function, which might be necessary when the model is used for quantitative predictions. As we show in the example in Appendix C1 the difference in the dimensionality n of the process Y(t) does not change the dominant exponential behavior at short times but affects the prefactors. Similarly the prefactors of the exponential in the diffusivity distribution p st D (D) is affected by the concrete value of n. This fact may be employed to account for the deviations from the pure exponential shape of the probability distribution also reported in [24].
An intriguing question emerging from our analysis concerns the physical origin of the dimensionality n of the subordinating process Y(t). Intuitively, one might argue that the diffusivity and thus Y(t) should have as many components as spatial directions in the particle trajectory r(t), i.e., d = n. This is the case considered in the derivations in Section IV. However, the mode concept for D(t) introduced here may also have a more fundamental physical meaning. As we showed here, the value n affects the details of the shape of P (r, t) as well as p st D (D), and for large n values the short time exponential shapes may even be fully suppressed. Advanced experiments allowing one to determine n from the exact shape of the probability densities P (r, t) and p st D (D) will provide important clues concerning this question.
Possible generalizations may include diffusing diffusivity models with additional deterministic time dependence of the diffusivity [54] or non-Gaussian anomalous viscoelastic diffusion in crowded membranes [21], which contrasts Gaussian anomalous viscoelastic diffusion in noncrowded membranes [20]. Of course, the diffusing diffusivity concept is a first step in capturing the full spatiotemporal disorder of complex systems. Ultimately, a full description of spatial and temporal stochasticity in terms of a random diffusivity D(x, t) will be desired.
Let us put the diffusing diffusivity approach into context with other popular models with distributed transport coefficients. Typically these are constructed to describe anomalous diffusion processes. Another model is scaled Brownian motion, in which the diffusivity is a deterministic, power-law function of time [56,57]. On a stochastic level scaled Brownian motion appears natu-rally in granular gases [58], in which non-ideal collisions effect a decrease of the system's temperature (kinetic energy). Scaled Brownian motion is non-ergodic and displays a massively delayed overdamping transition [59]. Heterogeneous diffusion processes employ a continuous, deterministic space dependence of the diffusivity and lead to non-ergodic and ageing dynamics [57,60,61]. In contrast to these models random diffusivity approaches also have a considerable history. Thus segregation in solids in the context of radiation was described by such an approach [62], and Brownian motion in media with fluctuating friction coefficient, temperature fluctuations, or randomly interrupted diffusion were used to describe, for instance, randomly stratified media [63]. A random diffusivity approach was elaborated to consider light scattering in a continuous medium with fluctuating dielectric constant [64]. Motivated by the comparison of diffusion processes assessed by different modern measurement techniques, the concept of microscopic single-particle diffusivity was developed [65]. In [66] the diffusivity varies randomly but is constant on patches of random sizes. Such random patch model show non-ergodic subdiffusion due to the diffusivity effectively changing at random times with a heavy-tailed distribution. Intermittency between two values of the diffusivity were also considered [53]. Finally, we mention that a deterministic time dependence of the diffusivity was combined with a random diffusivity in [54]. As seen in several experimental studies already, to describe stochastic particle motion in real complex systems such as living biological cells, combinations of different stochastic mechanisms are necessary to capture the observed dynamics [16]. Thus also the diffusing diffusivity picture may need to be complemented by other processes, as we saw in the example of non-Gaussian viscoelastic subdiffusion based on the observations in [21,55].
Despite the wealth of established stochastic processes the diffusing diffusivity model has quite unique properties. Thus the crossover from a short time exponential shape to Gaussian statistics at longer times, while the MSD remains linear and thus classifies normal (Fickian) diffusion, cannot be captured by existing models. Of course, crossovers between non-Gaussian to Gaussian probability density functions may be grasped by truncated continuous time random walks or distributed order fractional diffusion equations [67]. However, in these models also the MSD exhibits a crossover from anomalous to normal diffusion. In this sense we believe that the diffusing diffusivity model and its potential generalizations on the basis of our subordination approach will emerge as a new paradigm in the theory of stochastic processes.
We conclude with pointing out that the diffusing diffusivity model developed here is closely related to the Cox-Ingersoll-Ross (CIR) model for monetary returns which is widely used in financial mathematics [39]. To show this relation let us write the Langevin equation (19c) for the Ornstein-Uhlenbeck process as where i = 1, . . . , n and W i (t) is the Wiener process with variance 1/2. Our aim is to design a Langevin equation in the Itô form for the squared Ornstein-Uhlenbeck process in n dimensions, To this end we employ the Itô formula of differentiation to the function of a n-dimensional vector [40] to find This is but the stochastic differential equation of the CIR process describing the time evolution of interest rates [39]. The same process is used in the Heston model specifying the evolution of stochastic volatility of a given asset [41]. Our results for the subordination approach should therefore also be relevant to financial market modeling. Indeed, the technique of subordination, which is closely related to random time changes, is a very common concept in financial mathematics [68].
Appendix A: Superstatistics with modified exponential diffusivity distribution Consider the Gaussian probability density function typical for viscoelastic subdiffusion in the overdamped limit, which is equivalent to fractional Brownian motion [12]. The associated mean squared displacement is x 2 (t) = 2D α t α . For the superstatistical distribution of the generalized diffusion coefficient we choose the modified exponential The resulting probability density function After change of variables according to y =D κ we have With the identification with the Fox H-function [69], using the Laplace transform rules for the H-function [70] along with the standard rules for the Fox H-function [69] one arrives at the result The asymptotic behavior is then [69] . (A8) In Figure 6 we show the behavior of the resulting probability density (A3) from numerical inversion. For compressed exponential distributions p D (D α ) with κ > 1 the resulting function P s (x, t) is a stretched Gaussian, while for a stretched exponential p D (D α ) with 0 < κ < 1 the function P s (x, t) is a superstretched Gaussian, which is broader than the exponential (Laplace) distribution. Figure 6 also demonstrates that the asymptotic behavior (A8) indeed fits the numerical inversion.

Asymptotics by Laplace's method
The asymptotic behavior (A8) may also be obtained by the Laplace method. As this is an interesting alternative method to derive the asymptotic behavior of the integral (see Eq. (A4)) for λ ≫ 1, we include this approach here. This is a Laplace integral of the form The standard methods to evaluate the asymptotics of I cannot be applied, since f (y) in Eq. (A9) equals zero at y = 0 with all its derivatives. Thus, to evaluate the asymptotics we need to find the maximum of the function ϕ(y) = −λy − y −κ (A11) which is reached at y m = (κ/λ) 1/(1+κ) . We now introduce the new variable t = y/y m , such that Eq. (A9) becomes After substitution τ = tκ 1/(1+κ) we get where λ = λ κ/(1+κ) and S(τ ) = −τ − τ −κ . Now, the standard Laplace method can be applied to Eq. (A13). The function S(τ ) reaches its maximum at τ m = κ 1/(1+κ) . Following the standard procedure we find .

Power law diffusivity distribution
We now consider the power law distribution with α > 0. With the relation (7) we separately consider the following cases: after substituting and integrating by parts. In the tails we then obtain the following scaling behavior for the probability density function, lim k→0 P (k, t) ∼ 1 − D ⋆ k 2α t α 1 + D ⋆ k 2 t + . . .
It thus follows that such that the second moment does not exist.
The second moment still does not exist. with the MSD Power law diffusivity distributions lead to a long tailed, power law distribution P (x, t) in the superstatistical approach. The second moment diverges for 0 < α ≤ 1, while normal diffusion emerges for α > 1.
The asymptotic behavior is given by Comparing this result with Eq. (70) for the case d = n = 2 we recognize the modified prefactor, including a different scaling in r and t. Thus the difference in the dimensionality of the process Y(t) does not change the dominating exponential behavior. The connection to the superstatistical approach in analogy to the discussion in Section IV C following Eq. (73) with the two-dimensional Gaussian kernel G(r, t|D) = 1 4πDt e −r 2 /(4Dt) (C3) and the stationary diffusivity distribution for the case n = 1 produces the distribution (C5) which matches exactly Eq. (C1) written in dimensional form.

Infinite-dimensional process Y(t)
We here consider the limit of large dimension n for the process Y(t). For short times t ≪ 1 the tails of the probability density P (r, t) follow from (compare Eq. (48) Thus the tails of the probability density function P (x, t) are Gaussian already at short times. The long time behavior t ≫ 1 leads tô P (k, t) ∼ 2 n/2 exp nt 2 1 − √ 1 + 2k 2 Considering the tails, we take k ≪ 1, revealing that is also Gaussian, with the same variance. Thus, in the high-dimensional case the regime of exponential wings in the probability density function does not exist at all and the Gaussian shape is established early on. This is equivalent to the observation that the kurtosis (57) becomes Gaussian already for short times when n is large.