Stochastic time-evolution, information geometry and the Cramér-Rao bound
arXiv:1810.06832v1 [cond-mat.stat-mech] 16 Oct 2018

We investigate the connection between the time-evolution of averages of stochastic quantities and the Fisher information and its induced statistical length. As a consequence of the Cramér-Rao bound, we find that the rate of change of the average of any observable is bounded from above, in squared magnitude, by its variance times the temporal Fisher information. From this bound, we obtain a speed limit on the evolution of stochastic observables: Changing the average of an observable requires a minimum amount of time given by the change in the average squared, divided by the fluctuations of the observable times the thermodynamic cost of the transformation. In particular, for relaxational dynamics, which do not depend on time explicitly, we show that the Fisher information is a monotonically decreasing function of time and that the minimal required time is determined by the initial preparation of the system. We further show that the monotonicity of the Fisher information can be used to detect hidden variables in the system and demonstrate our findings for simple examples of continuous and discrete random processes.


I. INTRODUCTION
Information geometry [1] is a branch of information theory which describes information in terms of differential geometry. This can be motivated by a question central to any physical experiment: Given a system described by a set of parameters, how much information about the system can we gain from a slight variation of the parameters? Under certain regularity conditions, how the system changes under such a small parameter variation defines a metric, the so-called Fisher information metric [2][3][4][5]. This metric encodes the maximum amount of information that can be gained by measuring the change of any observable due to the parameter change.
The relation between the measurement of (macroscopic) observables and information gained about the physical system is also central to thermodynamics. Deciding which observables to measure and which parameters to vary in doing so is essential for reconstructing the thermodynamic potentials and thus obtaining complete information about the macroscopic state of the system. Thus, it is not surprising that there exists a strong connection between thermodynamics and information theory, which, despite dating back all the way to Gibbs and Boltzmann [6], has recently received much attention [7][8][9][10][11][12][13][14][15][16]. This is in part motivated by improved experimental techniques, which make it possible to probe the relation between information and thermodynamic quantities on a more detailed and microscopic level [17,18], but also by new theoretical proposals based on understanding information as a quantity that is just as physical as matter or energy.
Recently, a connection between information geometry and stochastic thermodynamics was established in Ref. [19]. Stochastic thermodynamics describes the behavior of thermodynamic quantities like heat, work and entropy in small systems, where these quantities fluctuate due to the presence of noise [20,21]. In particular, Ref. [19] found an intimate connection between Fisher information and stochastic entropy. In this case, the parameter, whose change is described by the Fisher information, is time. Thus, the temporal Fisher information quantifies how much information can be gained from the time evolution of the system.
In this work, our aim is to expand on the idea of describing the time evolution of a Markovian stochastic system in terms of information, and to elucidate the consequences for the behavior of measurable observables. Our first result is a speed limit on the time evolution of any observable, which is related to the Cramér-Rao bound [22,23]. Specifically, the rate of change of an observable is bounded from above by its fluctuations times the temporal Fisher information. This provides a measurable consequence of the Fisher information as the maximum obtainable information. Interpreting the Fisher information as a thermodynamic cost, this result complements a class of recently derived steady-state thermodynamic uncertainty relations [24][25][26][27][28]. As our second main result, we show that if the stochastic system describes a relaxation process without time-dependent driving, then the temporal Fisher information is a monotonically decreasing function of time. Thus, the amount of information that can be gained by observing a relaxation process gradually decreases. Together with our first result, this provides an explicit quantification of the physical intuition that the time evolution of a system should gradually slow down during a relaxation process. The monotonicity of the Fisher information for relaxation processes has two profound consequences: First, it results in a lower bound on the time required to evolve a stochastic system from an initial to a final configuration, extending previously obtained speed limits for stochastic dynamics [19,29]. Second, it can serve as an indicator for the presence of hidden variables in the system: If we observe an increase of the Fisher information during a relaxation process, this necessarily implies that we are missing some information about the system. We show that this discrepancy between observed and total information can be used to detect hidden degrees of freedom.
The present paper is organized as follows: In Section II, we introduce the Fisher information and some of its basic properties. In Section III we show how, in the case of a time-dependent stochastic system, the Cramér-Rao bound leads to a speed limit for arbitrary observables. We briefly discuss the relation between this speed limit and previously obtained bounds. Section IV contains the explicit proof of the monotonicity of the Fisher information in Markovian dynamics without explicit time-dependence, followed by a more general argument based on the relation between Fisher information and the Kullback-Leibler divergence. In Section V, we use the monotonicity to derive a generalization of a previously obtained speed limit for stochastic dynamics. As a second consequence, we show in Section VI how a nonmonotonic behavior of the Fisher information can be used to detect hidden degrees of freedom in the system. Section VII is dedicated to examples that provide an explicit demonstration of the general results of the previous sections; starting from the explicit expression for the Fisher information for an arbitrary normal distribution, we discuss the paradigmatic examples of diffusion and a particle in a parabolic potential. We go on to construct a simple jump process and show that the Fisher information behaves qualitatively differently depending on whether hidden states are present in the system or not. We finish with some concluding remarks and outlook in Section VIII.

II. LENGTH AND FISHER INFORMATION
We study a general stochastic system that is described by a probability density P(X = x|θ) ≡ P(x, θ), where X is a vector of M continuous random variables X = (X_1, . . . , X_M) ∈ R^M and θ ∈ R is a parameter. If θ is equal to the observation time t ∈ [0, T], then P(x, t) describes the time evolution of the probability density. However, θ may also be some other, more general parameter, e. g. P(x, θ) could be the steady state probability density of the system and θ some externally tunable field. In the following, we assume that P(x, θ) depends smoothly on θ, such that, in particular, the derivative ∂_θ P(x, θ) exists and is a continuous function and the second derivative ∂²_θ P(x, θ) exists. The Fisher information I(θ) is defined by [30]

I(θ) = ⟨(∂_θ ln P(x, θ))²⟩_θ = −⟨∂²_θ ln P(x, θ)⟩_θ,   (1)

where ⟨. . .⟩_θ denotes an average with respect to P(x, θ).
Here, and in the following, we use the shorthand ∂ θ f = ∂f /∂θ for partial and d θ f = df /dθ for total derivatives.
The last equality follows from the normalization of the probability density, ∂_θ ∫dx P(x, θ) = ∂_θ 1 = 0. We note that, by definition, the Fisher information is positive and vanishes only if the probability density is independent of θ. The Fisher information is related to the Kullback-Leibler divergence or relative entropy between two distributions P(x) and Q(x),

D_KL(P‖Q) = ∫dx P(x) ln(P(x)/Q(x)).   (2)

Choosing Q(x) = P(x, θ + dθ), i. e. the probability distribution at an infinitesimally different value of θ, the corresponding Kullback-Leibler divergence is to leading order in dθ given by

D_KL(P(θ)‖P(θ + dθ)) ≃ (dθ²/2) I(θ),   (3)

and the Fisher information thus is the curvature of the Kullback-Leibler divergence. Similar to the Kullback-Leibler divergence, the Fisher information is additive in the following sense: Suppose that we subdivide the random variables into two sets X = (Y, Ψ). Introducing the conditional probability density P_{Ψ|Y}(ψ, θ|y) = P(Ψ = ψ|Y = y, θ), we can then write

P(x, θ) = P_{Ψ|Y}(ψ, θ|y) P_Y(y, θ),   (4)

where P_Y(y, θ) = P(Y = y|θ) is the marginal density of the random variables Y. Then a straightforward calculation shows that

I(θ) = I_{Ψ|Y}(θ) + I_Y(θ), with I_{Ψ|Y}(θ) = ⟨(∂_θ ln P_{Ψ|Y})²⟩_θ and I_Y(θ) = ⟨(∂_θ ln P_Y)²⟩_θ.   (5)

The Fisher information can thus be decomposed into two positive terms, depending on the conditioned statistics of the random variables Ψ and the statistics of the random variables Y, respectively. In particular, we have I(θ) ≥ I_Y(θ), i. e. eliminating variables decreases the Fisher information. If the random variables Ψ and Y are further independent, then we have I(θ) = I_Ψ(θ) + I_Y(θ). The geometric interpretation of the Fisher information follows from defining a statistical line element ds by

ds² = I(θ) dθ².   (6)

The quantity ds may be thought of as a dimensionless distance between the probability densities at two infinitesimally different values of θ, i. e. between P(x, θ) and P(x, θ + dθ). The infinitesimal statistical line element in a natural way defines a statistical length,

L(θ₂, θ₁) = ∫_{θ₁}^{θ₂} ds = ∫_{θ₁}^{θ₂} dθ √I(θ).   (7)

FIG. 1. Illustration of different parameterizations of the path between two probability densities P(θ₁) and P(θ₂) (red dots). While any parameterization P̃ is constrained to lie on the unit sphere due to normalization, the length of the path can be arbitrarily long (blue). By contrast, the shortest possible path is given by the geodesic P* (green, dashed). Note that this three-dimensional illustration corresponds to the case of three discrete states, whereas in the case of continuous random variables the underlying space is infinite-dimensional.
This length measures the length of the path traced by the probability density under a change of the parameter from θ = θ₁ to θ = θ₂. We remark that the statistical length has all the properties expected of a path length, in that it satisfies the triangle inequality and is invariant under monotonic reparameterizations of the path. We further remark that the above notions can be extended to a higher-dimensional parameter space; however, in what follows, we will take θ to be the evolution time of a stochastic system and thus will only require the one-dimensional case. In principle, there are infinitely many possible parameterizations of the path from θ₁ to θ₂, e. g. P̃(x, θ) with P̃(x, θ₁) = P(x, θ₁) and P̃(x, θ₂) = P(x, θ₂). However, since any parameterization has to give a normalized probability density, ∫dx P̃(x, θ) = 1, there exists a unique parameterization that minimizes the path length L(θ₂, θ₁). Geometrically, the normalization condition means that √P̃(x, θ) has to be a vector of length 1, i. e. tracing a path on the infinite-dimensional unit sphere, see the illustration in Fig. 1. Thus the minimal length is the arc length between the points √P(x, θ₁) and √P(x, θ₂),

Λ ≡ 2 arccos[ ∫dx √(P(x, θ₂) P(x, θ₁)) ],   (11)

where the factor of 2 arises because the line element ds equals twice the Euclidean line element of √P.
The parameterization that realizes this minimal length is the geodesic curve, which simultaneously minimizes the action integral

C(θ₂, θ₁) ≡ (1/2) ∫_{θ₁}^{θ₂} dθ I(θ).

For the geodesic curve, we thus have

2C(θ₂, θ₁) (θ₂ − θ₁) = L(θ₂, θ₁)² = Λ²,

while for any other parameterization P̃(x, θ), we have the inequalities

2C(θ₂, θ₁) (θ₂ − θ₁) ≥ L(θ₂, θ₁)² ≥ Λ²,

where the first inequality follows from applying the Cauchy-Schwarz inequality to L² and the second one is a consequence of L ≥ Λ.
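The definitions above can be made concrete numerically. The following sketch (the three-state distributions and the straight-line interpolation path are arbitrary illustrative choices, not taken from the text) computes the statistical length of a non-geodesic path and checks that it exceeds the arc length Λ:

```python
import numpy as np

# Two endpoint distributions of a 3-state system (illustrative values).
p1 = np.array([0.7, 0.2, 0.1])
p2 = np.array([0.1, 0.3, 0.6])

# Straight-line path p(theta) = (1 - theta) p1 + theta p2; this stays
# normalized for every theta, but it is not the geodesic.
thetas = np.linspace(0.0, 1.0, 2001)
dtheta = thetas[1] - thetas[0]
dp = p2 - p1  # d_theta p is constant along this path

# Fisher information I(theta) = sum_i (d_theta p_i)^2 / p_i along the path.
I = np.array([np.sum(dp**2 / ((1 - th) * p1 + th * p2)) for th in thetas])

# Statistical length L = \int dtheta sqrt(I(theta)) (trapezoidal rule).
L = np.sum(0.5 * (np.sqrt(I[:-1]) + np.sqrt(I[1:]))) * dtheta

# Geodesic arc length Lambda = 2 arccos( sum_i sqrt(p1_i p2_i) ).
Lambda = 2 * np.arccos(np.sum(np.sqrt(p1 * p2)))

# Any non-geodesic parameterization is longer than the arc length.
assert L >= Lambda > 0
```

The same comparison works for any pair of endpoint distributions, since only the geodesic saturates L = Λ.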

III. CRAMÉR-RAO BOUND AND DIFFERENTIAL SPEED LIMIT
We define an observable R(X) as a function of the random variables; its average corresponding to a parameter value θ is given by

⟨R⟩_θ = ∫dx R(x) P(x, θ).

Since P(x, θ) is a normalized probability density, we have ∫dx ∂_θ P(x, θ) = 0, or ⟨∂_θ ln P⟩_θ = 0. Then, using the covariance inequality, it is straightforward to obtain the following inequality

(∂_θ ⟨R⟩_θ)² ≤ ⟨ΔR²⟩_θ I(θ),

where we defined ΔR(x) = R(x) − ⟨R⟩_θ. Taking the square root of the above, and using the definition Eq. (6) of the statistical line-element,

|d⟨R⟩_θ| / √⟨ΔR²⟩_θ ≤ ds.   (15)

This inequality states that the change in some observable R due to a change in θ relative to its fluctuations is bounded by the line element ds. Interpreting the left-hand side as a distance in the average of R, the intuitive interpretation of this inequality is that the distance elapsed in the space of probability densities is always larger than the distance in the projection of this space onto any observable. Note that if the observable R = θ̂ is an unbiased estimator of θ, i. e. ⟨θ̂⟩_θ = θ, then this is equivalent to the Cramér-Rao bound [22,23],

⟨Δθ̂²⟩_θ ≥ 1/I(θ).   (16)

Written in terms of the Fisher information, the inequality (15) is thus equivalent to the generalized Cramér-Rao bound [30],

⟨ΔR̂²⟩_θ ≥ (∂_θ ⟨R⟩_θ)² / I(θ),   (17)

where R̂ is any unbiased estimator of ⟨R⟩_θ. The variance of an estimator of R is thus always larger than the sensitivity of the expectation of R with respect to changes in the parameter θ divided by the Fisher information. The Cramér-Rao bound is widely used in estimation theory and statistics. However, the inequality (15) and thus the Cramér-Rao bound have another intriguing physical interpretation if θ = t is equal to the physical time. In this case, we have the bound

|d_t ⟨R⟩_t| ≤ √⟨ΔR²⟩_t √I(t),   (18)

where we defined the temporal Fisher information

I(t) = ⟨(∂_t ln P(x, t))²⟩_t.   (19)

This bound provides a differential speed limit for the time evolution of any observable without explicit time dependence: the rate of change of any such observable is bounded by its fluctuations times the speed ds/dt of the evolution of the probability density.
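The differential speed limit can be verified in closed form for a Gaussian family. The sketch below (mean sin t and variance 1 + 0.5 cos t are hypothetical choices, and the closed-form Gaussian Fisher information is used) checks the bound for the observable R(x) = x², whose mean and variance are known exactly:

```python
import numpy as np

# Gaussian with (illustrative) mean mu(t) = sin t, variance s2(t) = 1 + 0.5 cos t.
t = np.linspace(0.0, 6.0, 601)
mu, dmu = np.sin(t), np.cos(t)
s2, ds2 = 1 + 0.5 * np.cos(t), -0.5 * np.sin(t)

# Temporal Fisher information of a one-dimensional Gaussian:
# I(t) = (d_t mu)^2 / s2 + (d_t s2)^2 / (2 s2^2).
I = dmu**2 / s2 + ds2**2 / (2 * s2**2)

# Observable R(x) = x^2: <R> = mu^2 + s2, Var(R) = 4 mu^2 s2 + 2 s2^2.
rate = 2 * mu * dmu + ds2
var_R = 4 * mu**2 * s2 + 2 * s2**2

# Differential speed limit |d_t <R>| <= sqrt(Var R) sqrt(I(t)).
assert np.all(np.abs(rate) <= np.sqrt(var_R * I) + 1e-12)
```

The inequality holds at every instant, with equality only when R is perfectly correlated with the instantaneous change of the distribution.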
Alternatively, we can interpret the temporal Fisher information as the maximum information that one can obtain by observing the time evolution of the system. The speed limit Eq. (18) then implies that measuring the time evolution of any observable can only yield less information. The appearance of the fluctuations of the observable in the bound (18) reflects that, if the fluctuations of an observable are large, then we also have to observe a large change in the average value in order to make a meaningful statement about the system. Occasionally, it may be useful to consider vector-valued observables R(X) = {R_1(X), . . . , R_N(X)}. In this case, the speed limit Eq. (18) generalizes to

d_t⟨R⟩_tᵀ Ξ_R(t)⁻¹ d_t⟨R⟩_t ≤ I(t),   (20)

where Ξ_R is the covariance matrix of R,

(Ξ_R)_ij(t) = ⟨ΔR_i ΔR_j⟩_t.   (21)

In Ref. [19], a thermodynamic interpretation of the action defined by (ds/dt)²,

C(T, 0) ≡ (1/2) ∫₀ᵀ dt (ds/dt)² = (1/2) ∫₀ᵀ dt I(t),

was proposed as the thermodynamic cost associated with the time evolution of the system during the time interval [0, T]. This thermodynamic cost measures the rate of local entropy production; see Appendix A for a more detailed discussion for the case of Fokker-Planck dynamics. At this point, we briefly recall the thermodynamic uncertainty relation for currents in steady state systems [24][25][26][27][28],

⟨Ẋ⟩_st² / D_X ≤ σ_st^bath,   (23)

where ⟨Ẋ⟩_st is the steady state current of some observable X, D_X = lim_{T→∞} ⟨ΔX²⟩_T/(2T) the corresponding diffusion coefficient (quantifying fluctuations) and σ_st^bath the average rate of entropy production in the heat bath (quantifying the thermodynamic cost of maintaining the current). In analogy to Eq. (23), also Eq. (18) can be understood as an uncertainty relation, since it relates the rate of a change in the system to the fluctuations and the thermodynamic cost of the time evolution. In this sense Eqs. (18) and (23) are dual to each other: The uncertainty relation provides a bound on the rate at which a quantity is transported in a steady state situation, while the speed limit bounds the rate of change of a quantity due to a transient dynamics. We remark that a speed limit similar to Eq. (18) can also be obtained for higher order moments of R, e. g. for the variance,

|d_t⟨ΔR²⟩_t − γ_R(t) √⟨ΔR²⟩_t d_t⟨R⟩_t| ≤ ⟨ΔR²⟩_t √(κ_R(t) − γ_R(t)² − 1) √I(t),   (24)

where κ_R(t) = ⟨ΔR⁴⟩_t / ⟨ΔR²⟩²_t and γ_R(t) = ⟨ΔR³⟩_t / ⟨ΔR²⟩_t^{3/2} denote the kurtosis and skewness of the distribution with respect to R. We provide a derivation of this bound in Appendix B.
In Ref. [19], also an integral speed limit for the total evolution time T was derived,

T ≥ L²/(2C),   (25)

where L = L(T, 0) is the length of the path traced by the probability density during the time evolution from 0 to T and C is the corresponding thermodynamic cost. From the definition of the statistical length L, Eq. (7), and Eq. (18) we have

L ≥ |⟨R⟩_T − ⟨R⟩_0| / √⟨ΔR²⟩_max,   (26)

where ⟨ΔR²⟩_max denotes the maximum variance of R in the interval [0, T]. Then we can get an integral speed limit in terms of the observable R from Eq. (25),

T ≥ (⟨R⟩_T − ⟨R⟩_0)² / (2C ⟨ΔR²⟩_max).   (27)

Thus the time needed for the system to evolve from one state to another is bounded from below by the change of any observable between the two states relative to its fluctuations, divided by the cost C. Note that the speed limit Eq. (25) constitutes a tighter bound on the evolution time than the observable-dependent bound Eq. (27). However, if we are not interested in the precise state of the system but only in the value of the observable R, the latter bound may be the more relevant one. Indeed, we can also read it as a bound on the required thermodynamic cost to change the value of the observable from ⟨R⟩_0 to ⟨R⟩_T within time T,

C ≥ (⟨R⟩_T − ⟨R⟩_0)² / (2T ⟨ΔR²⟩_max).   (28)

Read in this way, the bound states that a fast (small T) and precise (small fluctuations) change of an observable necessarily incurs a large thermodynamic cost. Again, this is similar to the uncertainty relation Eq. (23), which states that fast transport with small fluctuations likewise requires a large investment in terms of entropy production [24,31]. As a particularly interesting case of Eq. (18), we note that the time derivative of the Shannon entropy Σ_sys(t) = −∫dx ln(P) P is given by

d_t Σ_sys(t) = −∫dx ∂_t P(x, t) ln P(x, t).   (29)

We thus have the bound

|d_t Σ_sys(t)| ≤ √⟨ΔΦ²⟩_t √I(t),   (30)

where Φ(x, t) = −ln(P(x, t)) and ΔΦ = Φ − ⟨Φ⟩_t. This provides an inequality between two central quantities of information theory, the Fisher information and the rate of change of Shannon entropy. If we consider the Shannon entropy as the average of the stochastic Shannon entropy Φ(x, t), then the first factor on the right-hand side can be interpreted as the fluctuations of this stochastic Shannon entropy. The inequality (30) then states that the average rate of Shannon entropy change is always less than the fluctuations of the Shannon entropy times the square root of the Fisher information.

IV. MONOTONICITY OF FISHER INFORMATION
Up to this point, the origin of the probability density P(x, t), i. e. the precise stochastic system that is described by the latter, has not been specified. We now assume that P(x, t) describes the time-evolution of a diffusive dynamics, i. e. is the solution of the Fokker-Planck equation [32]

∂_t P(x, t) = G(t) P(x, t) ≡ −∂_i [a_i(x, t) P(x, t)] + (1/2) ∂_i ∂_j [B_ij(x, t) P(x, t)],   (31)

where a sum over repeated indices is implied. Here a(x, t) is a drift vector and B(x, t) is a symmetric and positive semidefinite diffusion matrix, i. e.

vᵀ B(x, t) v ≥ 0

for an arbitrary vector v and for all x and t. The Fokker-Planck operator G is the generator of the dynamics. We further introduce the adjoint of the generator,

G†(t) f(x, t) = a_i(x, t) ∂_i f(x, t) + (1/2) B_ij(x, t) ∂_i ∂_j f(x, t),

which satisfies

∫dx f (G g) = ∫dx (G† f) g

for suitable (smooth and integrable) functions f(x, t) and g(x, t). For such a dynamics, we consider the time-derivative of the Fisher information,

d_t I(t) = ∫dx ( 2 [∂_t ln P] [∂²_t ln P] P + [∂_t ln P]² [∂_t P] ),

with the convention that derivatives inside square brackets do not act on terms outside the brackets. Here and in the following, we omit the arguments of the respective functions for brevity. We write the second time-derivative of the probability density as

∂²_t P = ∂_t (G P) = [∂_t G] P + G [∂_t P].

Defining the generalized potential Φ(x, t) = −ln(P(x, t)), which can be identified as a stochastic Shannon entropy in the sense that the Shannon entropy is the average of Φ, Σ_sys = −∫dx ln(P) P = ⟨Φ⟩_t, we can write for the time-derivative of the Fisher information

d_t I(t) = −2 ∫dx [∂_t Φ] [∂_t G] P + 2⟨[∂_t Φ] G† [∂_t Φ]⟩_t + 2⟨[∂_t Φ]³⟩_t + ∫dx [∂_t Φ]² [∂_t P].

From the definition of Φ we have P ∂_t Φ = −∂_t P = −G P, such that the last term equals −⟨[∂_t Φ]³⟩_t; using further the identity 2⟨f G† f⟩_t = ⟨G† f²⟩_t − ⟨[∂_i f] B_ij [∂_j f]⟩_t with f = ∂_t Φ and ⟨G† f²⟩_t = −⟨f² [∂_t Φ]⟩_t, the cubic terms cancel. We thus arrive at

d_t I(t) = −2⟨[∂_t a_i] [∂_i ∂_t Φ]⟩_t − ⟨[∂_t B_ij] [∂_i ∂_j ∂_t Φ]⟩_t − ⟨[∂_i ∂_t Φ] B_ij [∂_j ∂_t Φ]⟩_t.

The rate of change of the Fisher information thus decomposes into two terms. We can think of the two terms as the dynamical and statistical contribution to the change in Fisher information: The former contribution is proportional to the explicit time-dependence of the dynamics via the drift vector a and diffusion matrix B and can be either positive or negative. The latter contribution, on the other hand, is always less or equal zero, since the diffusion matrix is positive semidefinite. It characterizes the relaxation of the system towards the instantaneous steady state and the loss of information due to this relaxation process. In particular, if the dynamics do not have any explicit time-dependence (i. e. a and B do not depend on time), the Fisher information is a non-increasing function of time,

d_t I(t) ≤ 0.   (40)

For systems possessing a steady state P_st(x) = lim_{t→∞} P(x, t), this guarantees that the approach towards the steady state is always monotonic with respect to the Fisher information, independent of the initial state.
This is obviously not the case for arbitrary observables, which need not approach their steady state value in a monotonic fashion, e. g. for an underdamped particle in a confining potential, whose position may exhibit oscillations. Note that the monotonic behavior of the Fisher information is a consequence of the time-translation invariance of the generator. We remark that the same result holds e. g. for Markov jump processes (see Appendix D). It is in fact a consequence of the monotonicity of the Kullback-Leibler divergence: For both Fokker-Planck and Markov jump dynamics, given two solutions for the probability (density) P(t) and Q(t), their Kullback-Leibler divergence is a non-increasing function [32],

d_t D_KL(P(t)‖Q(t)) ≤ 0.

If the generator of the dynamics is independent of time, then both P(t) and Q(t) = P(t + τ) are valid solutions. To leading order in τ, their Kullback-Leibler divergence is given by

D_KL(P(t)‖P(t + τ)) ≃ (τ²/2) I(t),

and we thus have

d_t I(t) ≤ 0.

If the system possesses a steady state, then obviously also the Kullback-Leibler divergence between the instantaneous probability and the steady state is a non-increasing quantity. We thus have two related measures for the approach of a system to the steady state: Both the Fisher information I(t) = ⟨(∂_t ln P)²⟩_t and the Kullback-Leibler divergence D_KL(P(t)‖P_st) = ⟨ln(P/P_st)⟩_t are positive, non-increasing functions of time and zero only in the steady state. However, the Fisher information has the advantage that it is local in time, i. e. it depends only on the instantaneous state of the system and does not require knowledge about (or even the existence of) the steady state.

V. INTEGRAL SPEED LIMIT
Using the properties of the Fisher information discussed above, specifically the minimal statistical length Eq. (11) and the monotonic behavior of the Fisher information for dynamics without explicit time-dependence, Eq. (40), we can conclude from the speed limit Eq. (25)

T ≥ L²/(2C) ≥ Λ²/(2C) ≥ Λ²/(T I(0)).

Here, we used that L ≥ Λ and 2C = ∫₀ᵀ dt I(t) ≤ T I(0). We thus have the following speed limit for dynamics without explicit time-dependence,

T ≥ Λ/√I(0) = 2 arccos[ ∫dx √(P(x, T) P(x, 0)) ] / √I(0).   (46)

Importantly, this speed limit depends only on the initial and final state and on the generator G of the dynamics. We remark that such speed limits have been extensively discussed for quantum-mechanical systems (see e. g. Ref. [33]); however, it has recently been found that similar bounds also apply to classical and stochastic dynamics [29]. We note that in contrast to the Margolus-Levitin-type bound derived in Ref. [29] (Eq. (23) therein), this result does not require any particular spectral properties of the generator or the existence of a steady state; the only requirement is that the generator does not depend explicitly on time. It is further tighter than the Mandelstam-Tamm-type bound derived in Ref. [29] (Eq. (26) therein) for a particle relaxing in a binding potential, since we have 2 arccos(x) ≥ π(1 − x) for x > 0.
Using the monotonicity of the Fisher information, we also have from Eq. (18)

|d_t⟨R⟩_t| / √⟨ΔR²⟩_t ≤ √I(t) ≤ √I(0).

Thus, in the absence of explicit time dependence, the initial state limits how fast any observable may evolve at any later time. As mentioned before, ⟨R⟩_t is not necessarily a monotonic function of time; however, the magnitude of d_t⟨R⟩_t relative to the fluctuations of R is bounded from above by a decreasing function. Thus, if ⟨R⟩_t exhibits oscillations, this result implies that the amplitude of the oscillations necessarily decreases over time, provided ⟨ΔR²⟩_t is bounded.
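The integral speed limit can likewise be checked for a small relaxing system. In the sketch below (rate matrix, initial state and evolution time are illustrative choices), the evolution time T is compared against Λ/√I(0):

```python
import numpy as np

# Time-independent 3-state rate matrix (columns sum to zero; illustrative).
W = np.array([[-1.0,  0.5,  0.2],
              [ 0.6, -1.0,  0.8],
              [ 0.4,  0.5, -1.0]])
p0 = np.array([0.8, 0.1, 0.1])

# Fisher information of the initial state, I(0) = sum_i (W p0)_i^2 / p0_i.
dp0 = W @ p0
I0 = np.sum(dp0**2 / p0)

# Evolve the master equation up to T = 3 with an Euler scheme.
dt, steps = 1e-3, 3000
p = p0.copy()
for _ in range(steps):
    p = p + dt * (W @ p)
T_total = dt * steps

# Arc length between initial and final distribution, Eq. (11).
Lambda = 2 * np.arccos(np.sum(np.sqrt(p * p0)))

# Speed limit for dynamics without explicit time dependence, Eq. (46).
assert T_total >= Lambda / np.sqrt(I0)
```

The bound is far from tight here; it becomes most informative for short evolution times, where the distribution has not yet settled toward its steady state.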

VI. FISHER INFORMATION AS AN INDICATOR FOR HIDDEN DEGREES OF FREEDOM
For Fokker-Planck and Markov jump dynamics without explicit time-dependence, we have shown in Section IV, respectively Appendix D, that the Fisher information has to decrease monotonically with time. Combining this with the additivity of the Fisher information (see Eq. (5)), we can make a statement about the behavior of the Fisher information in the presence of hidden degrees of freedom. Suppose that, as in Eq. (5), the system of interest is composed of two sets of degrees of freedom Y and Ψ. Physically, we assume that Y contains the observable degrees of freedom, which are accessible to direct observation, and Ψ is composed of hidden degrees of freedom, which are not directly observable. If the system is time-independent, we then have for the Fisher information of the total system, from Eqs. (5) and (40),

d_t I(t) = d_t I_{Ψ|Y}(t) + d_t I_Y(t) ≤ 0.

While each individual term may be positive or negative, the sum of the terms has to be negative. This means that if we measure I_Y(t) from the probability distribution of the observable degrees of freedom and find d_t I_Y(t) > 0 at any time, then this is a clear indicator that hidden degrees of freedom are present in the system. We can make this precise in the form of the following statement: If, for some stochastic process Y(t), we observe d_t I_Y(t) > 0 at any time t, the process cannot be described in terms of a diffusion process with time-independent drift and diffusion coefficient. Thus, either the drift and/or diffusion coefficient depend explicitly on time, or there are hidden degrees of freedom in the system which effectively render the process Y(t) non-Markovian. Similarly, for a Markov jump dynamics with states labeled by the set X = {1, . . . , M} and occupation probabilities p_i, i ∈ X, we can divide the states into observable states Y ⊂ X and hidden states Ψ = X \ Y.
For a state i ∈ Y, we define the occupation probability restricted to the set of observable states as p_{i|Y} = p_i/p_Y, where p_Y = Σ_{i∈Y} p_i is the probability to find the system in an observable state. Then the Fisher information can be decomposed as

I(t) = p_Y I_{|Y}(t) + p_Ψ I_{|Ψ}(t) + (d_t p_Y)² / (p_Y p_Ψ),

with

I_{|Y}(t) = Σ_{i∈Y} (d_t p_{i|Y})² / p_{i|Y},   I_{|Ψ}(t) = Σ_{i∈Ψ} (d_t p_{i|Ψ})² / p_{i|Ψ}.

Here we used that the probability of the system being in a hidden state satisfies p_Ψ = 1 − p_Y. The total Fisher information thus consists of three terms: The first two contain the Fisher information I_{|Y}, I_{|Ψ} due to changes in the occupation probabilities within the observable and hidden states, respectively. These correspond to the dynamics within the subsets Y and Ψ. The third term describes the exchange of probability between the subsets Y and Ψ. If only the set Y of states is accessible to observation, then only the restricted occupation probabilities p_{i|Y} and thus I_{|Y} can be measured. Assuming that the transition rates are time-independent, we have for the time-derivative of the total Fisher information d_t I(t) ≤ 0 (see Eq. (D9)). If we now measure I_{|Y}(t) as a function of time and find d_t I_{|Y}(t) > 0 at any time, then this is a clear indication of the presence of hidden states. Thus, the existence of hidden states can potentially be determined from the behavior of the Fisher information of the observable states.
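The decomposition of the Fisher information into observable, hidden and exchange contributions can be verified numerically. In the sketch below (a 4-state process with illustrative rates, observable states {0, 1} and hidden states {2, 3}), the three terms are evaluated along an Euler-integrated trajectory and compared with the total Fisher information:

```python
import numpy as np

# 4-state rate matrix (columns sum to zero; illustrative values).
W = np.array([[-1.0,  0.2,  0.1,  0.3],
              [ 0.4, -1.2,  0.6,  0.2],
              [ 0.3,  0.5, -1.0,  0.5],
              [ 0.3,  0.5,  0.3, -1.0]])
p = np.array([0.4, 0.3, 0.2, 0.1])
Y, Psi = [0, 1], [2, 3]

def restricted_fisher(p, dp, idx):
    """Fisher information of the occupation probabilities restricted to idx."""
    pS, dpS = p[idx].sum(), dp[idx].sum()
    dcond = (dp[idx] * pS - p[idx] * dpS) / pS**2   # d_t p_{i|S}, quotient rule
    return np.sum(dcond**2 / (p[idx] / pS))

dt = 1e-3
for _ in range(5000):
    dp = W @ p
    I_tot = np.sum(dp**2 / p)
    pY, dpY = p[Y].sum(), dp[Y].sum()
    I_Y = restricted_fisher(p, dp, Y)
    I_Psi = restricted_fisher(p, dp, Psi)
    # I = p_Y I_|Y + p_Psi I_|Psi + (d_t p_Y)^2 / (p_Y p_Psi)
    assert np.isclose(I_tot, pY * I_Y + (1 - pY) * I_Psi
                      + dpY**2 / (pY * (1 - pY)))
    p = p + dt * dp
```

While the total I(t) decreases monotonically for these time-independent rates, the restricted quantity I_{|Y}(t) is not constrained to do so, which is exactly the signature of hidden states discussed above.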

VII. EXAMPLES

A. General normal distributions
A particularly succinct and widely applicable example for the relation between statistical length, Fisher information and observables is a normal distribution in M variables,

P(x, t) = ((2π)^M det Ξ(t))^{−1/2} exp( −(1/2) (x − μ(t))ᵀ Ξ(t)⁻¹ (x − μ(t)) ),

with the average ⟨x⟩_t = μ(t) and the (symmetric and positive definite) covariance matrix Ξ(t) defined by

Ξ_ij(t) = ⟨Δx_i Δx_j⟩_t.

Here the superscript T denotes transposition and det the determinant. In this case, we can compute the rate of Shannon entropy change σ_sys(t) = d_t Σ_sys(t) and the Fisher information explicitly [34],

σ_sys(t) = (1/2) tr[ Ξ(t)⁻¹ d_t Ξ(t) ],   (52a)

I(t) = d_t μ(t)ᵀ Ξ(t)⁻¹ d_t μ(t) + (1/2) tr[ (Ξ(t)⁻¹ d_t Ξ(t))² ].   (52b)

An important case in which an initially normal distribution remains normal at all times is an Ornstein-Uhlenbeck process, i. e. a Fokker-Planck equation with linear drift a(x, t) = K(t) x + c(t) and a diffusion matrix B(t) independent of x, with a symmetric, positive semidefinite matrix B, provided that the initial distribution is normal. The mean and covariance matrix then are determined by the differential equations

d_t μ_i(t) = K_ij(t) μ_j(t) + c_i(t),   d_t Ξ_ij(t) = K_ik(t) Ξ_kj(t) + Ξ_ik(t) K_jk(t) + B_ij(t),

or in matrix notation (using that B is symmetric)

d_t μ(t) = K(t) μ(t) + c(t),   d_t Ξ(t) = K(t) Ξ(t) + Ξ(t) K(t)ᵀ + B(t),
with initial condition μ(0) = μ_0 and Ξ(0) = Ξ_0. These equations allow us to write the Fisher information without relying on time-derivatives,

I(t) = (K μ + c)ᵀ Ξ⁻¹ (K μ + c) + (1/2) tr[ (Ξ⁻¹ (K Ξ + Ξ Kᵀ + B))² ].

Obviously, any normal distribution is uniquely determined by its mean and covariance matrix and thus the latter two quantities also specify the average of any observable R(x) and its time evolution. However, how precisely the time evolution of the mean and covariance matrix impacts the time evolution of ⟨R⟩_t, i. e. the explicit expression of ⟨R⟩_t in terms of μ and Ξ, is not obvious except in simple cases. Nevertheless, from Eq. (20), we always have the bound

d_t⟨R⟩_tᵀ Ξ_R(t)⁻¹ d_t⟨R⟩_t ≤ d_t μᵀ Ξ⁻¹ d_t μ + (1/2) tr[ (Ξ⁻¹ d_t Ξ)² ].

This bound is particularly instructive for a time-independent covariance matrix, d_t Ξ = 0, where it states that the change in the average of any observable, relative to its covariance matrix, is always less than the respective quantity for the mean of the distribution. In this sense, no observable can change faster than the mean of the distribution. We further note a result valid for any probability distribution which depends on time only via its mean μ(t), and can thus be written as P(x, t) = P̃(x − μ(t)). For such a probability distribution, the Fisher information is always larger than for a normal distribution with the same mean and variance,

I(t) ≥ d_t μ(t)ᵀ Ξ⁻¹ d_t μ(t).   (59)

Thus, a normal distribution minimizes the Fisher information for pure translations. We give the proof of this result in Appendix E. Note that the inequality (59) breaks down if the variance or some higher cumulants depend on time.
For a normal distribution, the relation Eq. (30) between Fisher information and Shannon entropy change takes a particularly simple form, since, as we show in Appendix E, we have

⟨ΔΦ²⟩_t = M/2,   (60)

independent of the shape of the covariance matrix. For a normal distribution, we thus have the relation between Shannon entropy and Fisher information

|σ_sys(t)| ≤ √(M/2) √I(t).   (61)

Using Σ_sys(T) − Σ_sys(0) = ∫₀ᵀ dt σ_sys(t) and applying the Cauchy-Schwarz inequality, this yields

T ≥ (Σ_sys(T) − Σ_sys(0))² / (M C).

Since we generally expect both C and Σ_sys to scale linearly with the number M of degrees of freedom, we can write this in terms of the following speed limit for normal distributions,

T ≥ (Σ̃_sys(T) − Σ̃_sys(0))² / C̃,

where Σ̃_sys = Σ_sys/M and C̃ = C/M are the Shannon entropy and thermodynamic cost per degree of freedom. This result has two interesting consequences: First, it provides a speed limit in terms of the Shannon entropy difference between initial and final state. Second, it explicitly demonstrates that, at least in the case of a normal distribution, this speed limit remains useful in the limit of a macroscopic number of degrees of freedom M ≫ 1. We stress that the latter statement is not self-evident: For the case of the speed limit Eq. (46), the numerator is obviously bounded from above by π, the largest possible arc length on the unit sphere. On the other hand, the denominator scales as √M for independent degrees of freedom, since the Fisher information is additive in this case. Thus the right-hand side of Eq. (46) is typically of order 1/√M and the bound becomes meaningless in the macroscopic limit.
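The Gaussian expression for the Fisher information can be cross-checked against its interpretation as the curvature of the Kullback-Leibler divergence. The following sketch (a 2D Ornstein-Uhlenbeck process with illustrative matrices K, B and state μ, Ξ) compares the closed-form I(t) with 2 D_KL(P(t)‖P(t + τ))/τ² for a small τ, using the exact Gaussian KL divergence:

```python
import numpy as np

# Illustrative OU parameters and instantaneous Gaussian state.
K = np.array([[-1.0, 0.3], [0.2, -0.8]])
B = np.array([[0.5, 0.1], [0.1, 0.4]])    # symmetric, positive definite
mu = np.array([1.0, -0.5])
Xi = np.array([[0.6, 0.1], [0.1, 0.9]])
M = 2

# Moment equations: d_t mu = K mu, d_t Xi = K Xi + Xi K^T + B (no offset c here).
dmu = K @ mu
dXi = K @ Xi + Xi @ K.T + B

# Closed-form Fisher information of the Gaussian.
XiInv = np.linalg.inv(Xi)
I_analytic = dmu @ XiInv @ dmu + 0.5 * np.trace((XiInv @ dXi) @ (XiInv @ dXi))

# Propagate one small step and evaluate the exact Gaussian KL divergence.
tau = 1e-3
mu2, Xi2 = mu + tau * dmu, Xi + tau * dXi
Xi2Inv = np.linalg.inv(Xi2)
kl = 0.5 * (np.trace(Xi2Inv @ Xi) + (mu2 - mu) @ Xi2Inv @ (mu2 - mu) - M
            + np.log(np.linalg.det(Xi2) / np.linalg.det(Xi)))

# D_KL(P(t) || P(t + tau)) ~ (tau^2 / 2) I(t) to leading order.
assert np.isclose(2 * kl / tau**2, I_analytic, rtol=1e-2)
```

The agreement improves as τ decreases, reflecting that the Fisher information is exactly the second-order coefficient of the divergence.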

B. Brownian motion
The most basic example of a continuous-valued random process is Brownian motion. Let us first consider the classical case of an overdamped particle in an environment at temperature T, described by the diffusion equation

∂_t P(x, t) = −(F_0/γ) ∂_x P(x, t) + D_x ∂²_x P(x, t),

or, equivalently, the overdamped Langevin equation

γ d_t x(t) = F_0 + √(2γT) ξ(t),

where γ is the friction coefficient, F_0 is a constant bias force, T is the temperature and ξ(t) is Gaussian white noise. The solution of the diffusion equation is straightforward,

P(x, t) = (2π⟨Δx²⟩_t)^{−1/2} exp( −(x − ⟨x⟩_t)²/(2⟨Δx²⟩_t) ), with ⟨x⟩_t = ⟨x⟩_0 + (F_0/γ) t and ⟨Δx²⟩_t = ⟨Δx²⟩_0 + 2 D_x t,

where ⟨x⟩_0 and ⟨Δx²⟩_0 are the initial average and variance of the particle's position at time t = 0. Here, we introduced the diffusion coefficient D_x given by the Einstein relation D_x = T/γ. As we only have one degree of freedom, the expression for the Fisher information Eq. (52b) simplifies to

I(t) = (F_0/γ)² / ⟨Δx²⟩_t + 2 D_x² / ⟨Δx²⟩_t².

Both with and without bias, the Fisher information for Brownian motion is a monotonically decaying function, and thus (biased) Brownian motion is a generalized relaxation process. Note that even though the Fisher information decreases, the time-derivative of the average position, d_t⟨x⟩_t = F_0/γ, does not decay to zero but remains constant. This is not in contradiction with the speed limit Eq. (18), which only demands that the time derivative of ⟨x⟩_t relative to the fluctuations of x (which in this case increase with time) should decrease along with the Fisher information.
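A short numerical check makes both points explicit (the parameter values γ = 1, T = 1, F_0 = 2 and initial variance 0.5 are illustrative):

```python
import numpy as np

# Biased Brownian motion: illustrative parameters in units with k_B = 1.
gamma, T_bath, F0, var0 = 1.0, 1.0, 2.0, 0.5
D = T_bath / gamma                 # Einstein relation D_x = T / gamma

t = np.linspace(0.0, 10.0, 1001)
var = var0 + 2 * D * t             # <Δx²>_t grows linearly in time
I = (F0 / gamma)**2 / var + 2 * D**2 / var**2

# I(t) decays monotonically even though d_t <x>_t = F0/gamma stays constant...
assert np.all(np.diff(I) < 0)
# ...consistent with |d_t <x>_t| <= sqrt(<Δx²>_t) sqrt(I(t)), Eq. (18).
assert np.all(F0 / gamma <= np.sqrt(var * I) + 1e-12)
```

The speed limit is respected because the growing position fluctuations compensate for the constant drift velocity.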

C. Particle in a parabolic trap
As a second paradigmatic example, we consider a single overdamped particle with position x(t) in a parabolic trap, described by the corresponding diffusion equation or, equivalently, the Langevin equation, where γ is the friction coefficient, κ the spring constant and T the temperature. We allow the spring constant, temperature and equilibrium position r(t) of the trap to change as a function of time. Provided that the initial state is given by a normal distribution with average x 0 and variance ∆x 2 0 , the solution to this problem is the normal distribution where the average and variance of the position obey the following differential equations. Again, for a single degree of freedom, the expression for the Fisher information is immediate from Eq. (52b). Here, the Fisher information (and thus the thermodynamic cost C) consists of two positive terms: The first one is non-zero if the variance changes as a function of time, the second one if the average position changes. The average rates of change of the Shannon entropy, σ sys (t) = d t Σ sys (t), and the total entropy, σ tot (t) = d t Σ tot (t) (see Appendix A), are given accordingly. In this case, the bound Eq. (61) on the rate of change of the Shannon entropy is obvious, since we have (M = 1) equality up to the positive term involving the change of the average. In the case of a single Gaussian degree of freedom, we thus have equality in Eq. (61) if the average position does not change in time. On the other hand, the local change in Shannon and total entropy, defined in Appendix A, can be evaluated explicitly. The local change in Shannon entropy vanishes only if the particle is located at the instantaneous average position, since this corresponds to the maximum of the probability distribution, and thus a slight change of the particle's position will not change its Shannon entropy. On the other hand, the local change in total entropy vanishes independent of the particle's position if the system is in an equilibrium state with d t x t = d t ∆x 2 t = 0. This reflects the fact that in an equilibrium system, the total entropy production is zero not only on average but also for every single trajectory.
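For static trap parameters this relaxation can be reproduced with a short numerical sketch: an Euler integration of the moment equations, evaluating the Fisher information and the Shannon entropy rate at each step. All parameter values and the initial condition are illustrative choices.

```python
import numpy as np

# Overdamped particle in a static parabolic trap: moment equations
#   d_t <x> = -kappa (<x> - r) / gamma,
#   d_t Dx2 = 2 (T - kappa Dx2) / gamma,
# and the single-degree-of-freedom Fisher information
#   I(t) = (d_t Dx2)^2 / (2 Dx2^2) + (d_t <x>)^2 / Dx2.
gamma, kappa, T, r = 1.0, 1.0, 1.0, 0.0   # illustrative parameters
mean, var = 2.0, 0.2                      # non-equilibrium initial state
dt, steps = 1e-4, 50000

I, sigma_sys = [], []
for _ in range(steps):
    d_mean = -kappa * (mean - r) / gamma
    d_var = 2.0 * (T - kappa * var) / gamma
    I.append(d_var ** 2 / (2.0 * var ** 2) + d_mean ** 2 / var)
    sigma_sys.append(d_var / (2.0 * var))  # rate of change of Shannon entropy
    mean += dt * d_mean
    var += dt * d_var

I, sigma_sys = np.array(I), np.array(sigma_sys)
print(np.all(np.diff(I) < 0.0))            # monotonic decay, cf. Eq. (40)
print(np.all(sigma_sys ** 2 <= I / 2.0))   # entropy-rate bound, cf. Eq. (61)
```

The second inequality is strict as long as the average position still moves, and saturates once only the variance relaxes, matching the equality condition stated above.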
Using the equations of motion (71), we can also rewrite the Fisher information, and its time-derivative can then be calculated explicitly. The first three terms depend explicitly on the time-derivative of T , r and κ, respectively, while the last term is negative. In particular, if the parameters T , r and κ are independent of time, the Fisher information decreases monotonically, as predicted by Eq. (40). The same calculation can be done for an underdamped particle with position x(t) and velocity v(t) and the associated equations of motion for the moments. Note that the overdamped case is obtained by taking the limit of vanishing particle mass m → 0. In this case, the solution of the equations is already quite involved and we refrain from writing down the cumbersome expression for the Fisher information, which can be obtained from Eq. (52b). However, since we now have two degrees of freedom, already the case where T , r and κ do not depend on time offers some interesting insights. In this case, we observe a relaxation from the initial state to the equilibrium state with x eq = r, v eq = 0, ∆x 2 eq = T /κ, ∆x∆v eq = 0 and ∆v 2 eq = T /m. For a non-equilibrium initial condition corresponding to a potential with stiffness κ̃ > κ and r̃ = r, the Fisher information of the relaxation process is shown in Fig. 2. While the Fisher information of the joint distribution decays monotonically, as predicted by Eq. (40), the Fisher information of the marginal position distribution P (x, t) exhibits a non-monotonic behavior. As the system approaches the overdamped limit of vanishing mass (bottom panel), the maximum in the marginal Fisher information moves to shorter times and we recover the monotonic behavior of the overdamped Fisher information for times longer than the typical relaxation time of the velocity, m/γ. As an example of the speed limit Eq. (18), we show the time-derivative of the average relative to the variance (green line in Fig. 2).
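The contrast between the joint and the marginal Fisher information is easy to reproduce numerically. The sketch below integrates the standard moment equations of the underdamped dynamics with an Euler scheme and evaluates the Gaussian Fisher information I = μ̇ᵀΣ⁻¹μ̇ + ½ tr[(Σ⁻¹Σ̇)²] for the joint distribution and the one-dimensional expression for the position marginal; mass, friction and the stiffness ratio κ̃/κ are illustrative choices.

```python
import numpy as np

# Underdamped particle in a static harmonic trap, starting from the
# equilibrium state of a stiffer trap (kappa_tilde > kappa, same r).
m, gamma, kappa, T, r = 1.0, 1.0, 1.0, 1.0, 0.0   # illustrative parameters
kappa_tilde = 4.0
mx, mv = r, 0.0                                   # means start at equilibrium
cxx, cxv, cvv = T / kappa_tilde, 0.0, T / m       # stiffer-trap covariances

dt, steps = 1e-3, 12000
I_joint, I_x = [], []
for _ in range(steps):
    dmx, dmv = mv, (-kappa * (mx - r) - gamma * mv) / m
    dcxx = 2.0 * cxv
    dcxv = cvv - (kappa * cxx + gamma * cxv) / m
    dcvv = -2.0 * (kappa * cxv + gamma * cvv) / m + 2.0 * gamma * T / m ** 2

    Sigma = np.array([[cxx, cxv], [cxv, cvv]])
    dSigma = np.array([[dcxx, dcxv], [dcxv, dcvv]])
    dmu = np.array([dmx, dmv])
    A = np.linalg.solve(Sigma, dSigma)
    # Fisher information of the joint Gaussian distribution.
    I_joint.append(dmu @ np.linalg.solve(Sigma, dmu) + 0.5 * np.trace(A @ A))
    # Fisher information of the marginal position distribution.
    I_x.append(dmx ** 2 / cxx + dcxx ** 2 / (2.0 * cxx ** 2))

    mx += dt * dmx
    mv += dt * dmv
    cxx += dt * dcxx
    cxv += dt * dcxv
    cvv += dt * dcvv

I_joint, I_x = np.array(I_joint), np.array(I_x)
print(np.all(np.diff(I_joint) < 1e-8))        # joint I decays monotonically
print(I_x[0] == 0.0 and I_x.max() > I_x[-1])  # marginal I_x is non-monotonic
```

The marginal Fisher information starts at zero (position mean and variance are momentarily stationary), rises as position-velocity correlations build up, and then decays, while the joint Fisher information never increases.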
We observe that this quantity is bounded both by I(t) and I x (t).

D. Hidden states
To demonstrate how the Fisher information can be used as a tool to reveal hidden states, we construct a simple Markov jump model consisting of four states X = {1, 2, 3, 4} with associated energies E i . The time-evolution of the occupation probability p i (t) of state i is governed by the Master equation with given initial occupation p i (0). We take the transition rates W ij > 0 from state j to state i to be time-independent and assume that they satisfy the detailed balance condition, where β = 1/(k B T ) is the inverse temperature and k B the Boltzmann constant. Under this condition, the equilibrium occupation probabilities are given by the Boltzmann weights. We take the states Y = {1, 2, 3} to be low-energy states with similar energies, whereas state 4 is a short-lived, high-energy state. Since the system only spends a short time in the high-energy state, we consider this state as hidden and want to study the behavior of the system using only the occupation probabilities of the observable states Y , conditioned on the system being in an observable state. As we discussed in Section VI, while the total Fisher information is a monotonically decreasing function of time, d t I(t) ≤ 0, this is not necessarily true for the Fisher information restricted to the observable states. We can thus use this to distinguish between the system with the hidden state 4 present and a system without this hidden state by examining the time-dependence of the Fisher information. In the following we choose β = 1 and a transition rate matrix obtained by randomly assigning values between 0.8 and 1.2 to the lower-left half of the matrix and then enforcing the detailed balance condition on the upper-right half. Note that the entries in the last column are larger, reflecting the large transition rates out of the short-lived state 4. We initialize the system with equal probability in each of the observable states, p 1 (0) = p 2 (0) = p 3 (0) = 1/3, and then evolve it with the above transition rate matrix.
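This construction is straightforward to reproduce numerically. The following sketch uses illustrative energies and a fixed random seed (the specific rate matrix of the text is not reproduced here), evolves the Master equation with an Euler scheme and evaluates both the total Fisher information and the one restricted to the observable states.

```python
import numpy as np

# Four-state Markov jump model with a short-lived high-energy state 4.
# Rates W[i, j] (from j to i) are drawn uniformly from [0.8, 1.2] in the
# lower-left half and completed by detailed balance,
#   W[i, j] exp(-beta*E[j]) = W[j, i] exp(-beta*E[i]).
# Energies and seed are illustrative choices.
rng = np.random.default_rng(1)
beta = 1.0
E = np.array([0.0, 0.1, 0.2, 3.0])     # state 4: short-lived, high energy

W = np.zeros((4, 4))
for i in range(4):
    for j in range(i):
        W[i, j] = rng.uniform(0.8, 1.2)
        W[j, i] = W[i, j] * np.exp(beta * (E[i] - E[j]))   # detailed balance
W -= np.diag(W.sum(axis=0))            # columns sum to zero: d_t p = W p

p = np.array([1.0, 1.0, 1.0, 0.0]) / 3.0
dt, steps = 1e-3, 4000
I_tot, I_obs = [], []
for _ in range(steps):
    p = p + dt * (W @ p)               # Euler step of the Master equation
    dp = W @ p
    I_tot.append(np.sum(dp ** 2 / p))  # total Fisher information
    PY, dPY = p[:3].sum(), dp[:3].sum()
    q = p[:3] / PY                     # conditioned on observable states
    dq = (dp[:3] - q * dPY) / PY
    I_obs.append(np.sum(dq ** 2 / q))  # restricted Fisher information

I_tot = np.array(I_tot)
print(np.all(np.diff(I_tot) < 1e-12))  # total I(t) decays monotonically
# I_obs need not be monotonic; a transient increase signals the hidden state.
```

Only the monotonic decay of the total Fisher information is guaranteed; whether the restricted Fisher information shows a transient increase depends on the realization of the rates, which is exactly why its non-monotonicity is a useful diagnostic.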
Figure 3 shows the resulting time-evolution of the occupation probabilities and the Fisher information. Clearly, the Fisher information of the observable degrees of freedom, Eq. (86), shows a non-monotonic behavior in the presence of the hidden state. Thus, observing only the occupation probabilities of the observable states, we can conclude that a three-state model with time-independent transition rates cannot possibly describe the dynamics correctly.

VIII. DISCUSSION
The speed limit Eq. (18) on the time-evolution of the average of a fluctuating observable shows that the behavior of measurable quantities (averages and fluctuations) is governed by the information-theoretic concept of Fisher information. A similar connection between the Fisher information and the family of thermodynamic uncertainty relations was recently obtained in Refs. [35,36]. Such a connection can potentially be exploited in several ways. If the underlying probability distribution and the corresponding Fisher information are not known, then we can obtain a lower bound on the latter in terms of measurable quantities. Since the lower bound is guaranteed to hold for all observables, we may also compare the bounds obtained by measuring different observables in order to find the observable that contains the most information about the time-evolution of the probability density.
On the other hand, if we have a theoretical model for a particular physical system, then the speed limit can serve as a test for the validity of the model: If we find that the observed time evolution of any observable exceeds the Fisher information bound predicted by the theoretical model, then this is a sure indication that crucial information about the system is missing in the model. For systems without explicit time-dependence the monotonic decay of the Fisher information provides even stricter restrictions on the type of models that can describe a given system. Finally, if the Fisher information itself is known, then the speed limit imposes a regularity condition on the system in the sense that it limits the rate of change of any conceivable observable.

ACKNOWLEDGMENTS
with Φ sys t = S sys (t). We rewrite the Fokker-Planck equation (31) as a continuity equation in terms of the probability current j(x, t), where we define the operator via (∇B) i = ∂ xj B ij . By Itō's lemma, we have for the differential of Φ sys , where by B∇∇ we mean the operator B ij ∂ xi ∂ xj . We can equivalently write this using the Stratonovich product •. The first term describes the change in Shannon entropy in a fixed state x due to the change in the ensemble probability P (x, t) to be in state x. We interpret this term as a global (in the sense of ensemble) contribution; note that due to conservation of probability, this term always vanishes on average. On the other hand, the second, local, contribution describes the change in Shannon entropy due to a change in state from x to x ′ = x + dx; this is the analog of the change in Shannon entropy ∆σ sys x→x ′ in a transition of a Markov jump process, as defined in Ref. [19]. In analogy to Ref. [19], we thus interpret this term as the local change in Shannon entropy, which is related to the change in average Shannon entropy by taking the average. Using this definition and integrating by parts, it is then easy to show the analog of Eq. (37) of Ref. [19]. For a diagonal diffusion matrix B ij = B i δ ij with B i > 0, we can further write the change in total entropy as follows [37,38]. Defining the local change in medium entropy and total entropy, we thus have for the average change in total entropy [38] d t Σ tot = ∆Σ med loc + ∆Σ sys loc . This further allows us to write the corresponding identification, again in analogy to Ref. [19].

Appendix C: Minimal cost probability density
Let us consider two particular values θ 1 , θ 2 of a parameter and the corresponding probability densities P a (x) = P (x, θ 1 ) and P b (x) = P (x, θ 2 ). Note that there is an infinite number of possible parameterized probability densities satisfying these conditions; e. g. we may have two probability densities P (x, θ) and P̃ (x, θ) that coincide at θ 1 and θ 2 but are different otherwise. Each of these possible choices has an associated statistical length and cost, defined by Eqs. (7) and (22), where we assumed θ 2 > θ 1 without loss of generality. Note that for different P and P̃ , also the length and cost are generally different. However, there exists a unique choice P * (x, θ) which simultaneously minimizes the length and cost. To see this, we first minimize the cost C with respect to P (x, θ). In order to simplify the notation, we first reparameterize θ(q) = θ 2 q + θ 1 (1 − q) with q ∈ [0, 1]. Using this, we can write the length and cost with P (x, q) ≡ P (x, θ(q)). We now want to minimize C with respect to P (x, q), under the condition that P (x, q) is a well-defined probability density, i. e. P (x, q) > 0 and ∫ dx P (x, q) = 1. Introducing the Lagrange multiplier α, we thus have to minimize an auxiliary functional, where the factor 4 in front of α is included for later notational convenience. The corresponding Euler-Lagrange equation follows; since P (x, q) > 0, we can rewrite it in a form that admits a general solution. The functions f (x) and g(x), as well as the value of α, are fixed by the boundary conditions P (x, 0) = P a (x) and P (x, 1) = P b (x) and by normalization, yielding the final result for the cost-minimizing P * (x, q). For this choice, we have I * (q) = ∫ dx (∂ q P * (x, q)) 2 /P * (x, q) = Λ 2 and thus the minimal cost and statistical length follow. In hindsight, it is obvious that C is minimized by a probability density that yields constant Fisher information, since the former is defined as C = ∫_{θ 1}^{θ 2} dθ I(θ).
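For discrete states the geodesic P * can be written down explicitly in the amplitudes φ = √P, where the Fisher-Rao metric becomes the round metric on the unit sphere, and its properties are easy to verify numerically. The sketch below uses the cost convention C = ∫_0^1 dq I(q) and two arbitrary illustrative distributions.

```python
import numpy as np

# Minimal-cost interpolation between two discrete distributions Pa, Pb:
# the Fisher-Rao geodesic, a great circle in the amplitudes phi = sqrt(P),
# has constant Fisher information I*(q) = Lambda^2 with
# Lambda = 2 * arccos(sum sqrt(Pa * Pb)). Pa, Pb are illustrative choices.
Pa = np.array([0.5, 0.3, 0.2])
Pb = np.array([0.1, 0.2, 0.7])

phi_a, phi_b = np.sqrt(Pa), np.sqrt(Pb)
theta = np.arccos(phi_a @ phi_b)   # half of the statistical length Lambda
Lam = 2.0 * theta

def P_star(q):
    """Geodesic (minimal-cost) interpolation between Pa and Pb."""
    phi = (np.sin((1.0 - q) * theta) * phi_a
           + np.sin(q * theta) * phi_b) / np.sin(theta)
    return phi ** 2

# The geodesic has constant Fisher information I*(q) = Lambda^2 ...
dq = 1e-5
for q in (0.25, 0.5, 0.75):
    dP = (P_star(q + dq) - P_star(q - dq)) / (2.0 * dq)
    assert abs(np.sum(dP ** 2 / P_star(q)) - Lam ** 2) < 1e-6

# ... so with C = \int_0^1 dq I(q) the minimal cost is Lambda^2. For the
# linear interpolation the cost works out to the symmetrized
# Kullback-Leibler divergence, illustrating the bound (C11):
C_lin = np.sum((Pb - Pa) * np.log(Pb / Pa))
print(C_lin >= Lam ** 2)   # True
```

For the linear interpolation the q-integral can be done analytically, ∫_0^1 dq (P_b − P_a)²/[(1 − q)P_a + qP_b] = (P_b − P_a) ln(P_b/P_a) per state, which is how the symmetrized Kullback-Leibler divergence appears.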
The same is true for the length L, which is thus also minimized by P * . We note that, in analogy to the discussion in Ref. [19], the choice P * (x, q) is the geodesic curve connecting P a (x) and P b (x); however, the geometric analogy is now less intuitive, since the underlying space is infinite-dimensional. Since P * (x, q) yields the minimal length between P a and P b for a normalized probability density, we can interpret L * = Λ as the arc length between P a and P b on the infinite-dimensional unit sphere. Since C * is the minimal cost, any other normalized probability density P̃ (x, q) results in a larger cost C̃ ≥ C * . In particular, for the simple linear interpolation P̃ (x, q) = (1 − q)P a (x) + q P b (x), which is positive and normalized, we obtain a cost expressed through the symmetrized Kullback-Leibler divergence, or relative entropy. We thus obtain the by no means obvious lower bound (C11) on the latter. Applying the above discussion to the time evolution of a stochastic dynamics, θ = t, we fix the initial and final state of the system, P (x, 0) = P i (x) and P (x, T ) = P f (x). The optimal time evolution between these two states is given by Eq. (C7) with q = t/T . Since this results in L * = Λ and C * = Λ 2 /(2T ), we obtain a lower bound on the thermodynamic cost of the evolution from the initial to the final state [19]. Thus, the minimal thermodynamic cost is given by the square of the shortest distance between the initial and final state, divided by the evolution time. This shows that, in particular, a faster evolution is generally associated with a larger thermodynamic cost; further, zero cost is only realizable in the quasistatic limit where the time evolution is infinitely slow.
or in terms of the generator, where we introduced the time-derivative of the generator, Ġ(t). We define a i ≡ d t ln p i (t), in terms of which we can rewrite the above expression. We now plug the explicit definition (D2) of the generator into the second term, where we renamed the summation indices in the last term from (i, k) to (j, i) in the second-to-last step. Since both the transition rates and occupation probabilities are non-negative, W ij ≥ 0 and p i ≥ 0, this term is evidently negative. We thus arrive at an expression where, in analogy to the continuous case, we introduced the vector of state-dependent Shannon entropies Φ defined by Φ i = − ln p i . As in Eq. (38), the time-derivative of the Fisher information decomposes into a term involving the explicit time-dependence of the generator and a negative semidefinite term. If the transition rates do not depend explicitly on time, d t W ij = 0, then, just as in the case of Fokker-Planck dynamics, the Fisher information decreases monotonically in time, in complete analogy to Eq. (40). We remark that the same result holds for a mixed process, i. e. a Fokker-Planck dynamics with additional discrete states labeled by k and a state-dependent drift vector and diffusion matrix, since the generator is the sum of a diffusion part and a jump part, to which the arguments leading to Eqs. (40) and (D9) can be applied separately.
Next, for any distribution that depends on time only via its mean, P (x, t) = P̃ (x − µ(t)), with a function P̃ (z) that does not explicitly depend on time, the Fisher information takes a simplified form. We now use an operator inequality, in the sense that the expression on the left-hand side is a positive semidefinite matrix; here Ξ denotes the covariance matrix of P . This inequality holds for arbitrary differentiable probability distributions and leads to the desired bound: Since the rightmost expression is just the Fisher information for a normal distribution with time-independent covariance matrix, Eq. (52b), this proves the bound (59). What is left to do is to prove the operator inequality Eq. (E16). To do so, we consider the covariance cov(f, g) ≡ ⟨f g⟩ − ⟨f ⟩⟨g⟩ with respect to some differentiable probability distribution, where a, b ∈ R M are arbitrary vectors and we sum over repeated indices; here, we integrated by parts in the second-to-last step. On the other hand, we have the covariance inequality cov(a T x, b T ∇ ln(P )) 2 ≤ var(a T x) var(b T ∇ ln(P )), where var denotes the variance with respect to P (x), var(f ) ≡ ⟨f 2 ⟩ − ⟨f ⟩ 2 . First, we note that ⟨b T ∇ ln(P )⟩ = 0 and, consequently, the variance of b T ∇ ln(P ) reduces to its second moment. Next, we evaluate the variance var(a T x) = ∫ dx a i a j x i x j P (x) − ∫ dx ∫ dy a i a j x i y j P (x)P (y) = a T Ξa. (E22) Then, the covariance inequality (E20) can be rewritten accordingly. Since this holds for arbitrary a and Ξ is positive definite and thus invertible, we may choose a = Ξ −1 b. For this choice, we obtain the desired relation, where we used the symmetry of Ξ and that ΞΞ −1 = 1. Since b is arbitrary, this is equivalent to the inequality (E16).
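The content of the bound is easy to check numerically in one dimension: for P (x, t) = P̃ (x − µ(t)) the Fisher information is (d t µ) 2 F with F = ∫ dx (P̃ ′ ) 2 /P̃ , and the bound states F ≥ 1/var, with equality for a Gaussian. The sketch below uses an illustrative bimodal mixture for P̃ .

```python
import numpy as np

# 1D grid check of the Appendix E bound: for a distribution that depends
# on time only through its mean, the translational Fisher information
# F = \int dx (P')^2 / P is bounded from below by 1/var, the value attained
# by a Gaussian of equal variance. P is an illustrative bimodal mixture.
x = np.linspace(-12.0, 12.0, 4001)
dx = x[1] - x[0]

def gauss(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

P = 0.5 * gauss(x, -2.0, 0.5) + 0.5 * gauss(x, 2.0, 0.5)   # normalized
dP = np.gradient(P, dx)

F = np.sum(dP ** 2 / P) * dx
mean = np.sum(x * P) * dx
var = np.sum((x - mean) ** 2 * P) * dx

print(F >= 1.0 / var)   # True; equality would require a Gaussian
```

Here the inequality is far from tight: the bimodal shape makes the variance large while each narrow peak keeps the translational Fisher information high, which is precisely the non-Gaussian excess the operator inequality quantifies.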