Inference in non-equilibrium systems from incomplete information: the case of linear systems and its pitfalls

Data from experiments and theoretical arguments are the two pillars sustaining the job of modelling physical systems through inference. In order to solve the inference problem, the data must satisfy certain conditions that also depend upon the particular questions addressed by the research. Here we focus on the characterization of systems in terms of a distance from equilibrium, typically the entropy production (time-reversal asymmetry) or the violation of the Kubo fluctuation-dissipation relation. We show how general and counter-intuitive the impossibility of estimating the distance from equilibrium is, when only a series of scalar data with Gaussian statistics is available, with negative consequences for inference. This impossibility persists when the data are correlated in time, which is the most interesting case because such data usually stem from a multi-dimensional linear Markovian system with many time-scales associated with different variables and, possibly, different thermal baths. Observing a single variable (or a linear combination of variables) results in a one-dimensional process which is always indistinguishable from an equilibrium one (unless a perturbation-response experiment is available). In a setting where only data analysis (and not new experiments) is allowed, we propose - as a way out - the combined use of different series of data acquired with different parameters. This strategy works when there is sufficient knowledge of the connection between experimental parameters and model parameters. We also briefly discuss how similar results emerge in the context of Markov chains within certain coarse-graining schemes. Our conclusion is that the distance from equilibrium requires quite a fine knowledge of the full phase space, and is therefore typically hard to estimate in real experiments.


I. INTRODUCTION
Inference from the knowledge of only partially accessible information is a central challenge in many physical fields. We can say that such a topic is an unavoidable link between experimental science and the theoretical approach in terms of mathematical description, as well as for the building of effective models from data [1]. A paradigmatic example of the importance of having an efficient inference protocol is the problem of phase space reconstruction in chaotic systems. Typically, experimental measurements provide just a time series of one observable y sampled at discrete times t_1, t_2, ..., t_m, so we have a time series y_1, y_2, ..., y_m, depending on the (unknown) D-dimensional state vector x = {x_i}_{i=1,...,D} of the underlying system. The problem of phase space reconstruction consists in computing, from this series, quantities such as Lyapunov exponents, or assessing the deterministic or stochastic nature of the system, as well as building up from the time series a mathematical model enabling predictions. Takens was able to show, using the embedding method, how, for a deterministic system under quite general assumptions, a time series {y_k}_{k=1,...,m} allows one to faithfully reconstruct the properties of the underlying dynamics [2][3][4]. Such a result has a tremendous conceptual, as well as practical, relevance: it is a bridge between experiment and theory. On the other hand, even important results have their practical difficulties: it is now well known that there are rather severe limitations to the application of the Takens method in high dimensional systems [5,6].
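As a concrete illustration of the embedding idea, the delay-coordinate construction used in the Takens approach can be sketched in a few lines of Python (the embedding dimension and delay below are arbitrary illustrative choices, not a prescription):

```python
import numpy as np

def delay_embed(y, dim, tau):
    """Build delay-coordinate vectors (y_k, y_{k+tau}, ..., y_{k+(dim-1)tau})
    from a scalar time series y, as in Takens' reconstruction."""
    n = len(y) - (dim - 1) * tau
    return np.column_stack([y[i * tau : i * tau + n] for i in range(dim)])

# Scalar observable from a deterministic system (here a simple illustrative
# choice: the logistic map in its chaotic regime).
y = np.empty(1000)
y[0] = 0.4
for k in range(999):
    y[k + 1] = 3.99 * y[k] * (1.0 - y[k])

X = delay_embed(y, dim=3, tau=2)  # reconstructed 3-d state vectors
print(X.shape)                    # (996, 3)
```

Each row of X is a point in the reconstructed phase space, on which quantities such as Lyapunov exponents can then be estimated.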
Inference is rather a wide topic, and its relevance in statistical mechanics is well recognised [1]. Let us briefly introduce the problem in general terms: we have a system whose state is determined by a vector x ∈ R^D, or an integer i ∈ I_N ≡ {1, ..., N}. In the first case the evolution rule is given by a differential equation (or a map), possibly including noise; in the second case one typically has a Markov chain or a master equation. In many interesting systems we have N, D ≫ 1; in addition, some variables can be much faster than others; usually in experiments it is impossible to observe the whole state of the system, but only a part of it. So instead of x ∈ R^D or i ∈ I_N, sometimes we can prefer, or are forced, to deal only with a vector y = y(x), mapping x ∈ R^D → y ∈ R^d, or an integer a = a_i, mapping i ∈ I_N → a ∈ I_M, with d < D and M < N (and sometimes d ≪ D and M ≪ N). The y's or a's can be the result of some coarse-graining, projection or decimation procedure, due to the particular measurement protocol. At a theoretical level the natural question is how to build an effective description for y or a, an important topic treated in many works [7][8][9][10][11]. However, the present paper is not devoted to such an aspect. On the contrary, we focus on a question which is relevant from an experimental point of view. Namely, we wonder whether/how, from a time series of y or a, it is possible to understand the original problem, or at least some of its salient features.
Before going on, let us briefly summarize some results which, at first glance, can appear negative, but whose relevance lies in showing how, in the building of models, it is vital to adopt a pragmatic approach. We already mentioned the crucial contribution of Takens to the analysis of experimental data from chaotic systems, and its severe limits in many practical cases. One of these limits is that such a result does not hold for noisy systems [12]; the basic reason is that noise can be seen as a function of time with an infinite number of Fourier terms, and therefore it is not possible to apply the embedding technique to it using a finite dimensional vector. For instance, when x = (x_1, x_2, x_3) ∈ R^3 is a Gaussian process, the knowledge of an even very long time series of y = (x_1, x_2) ∈ R^2 is, in general, not sufficient to understand the features of x.
Another practical limit is in the resolution of the data. An arbitrarily fine resolution of the state of the system could allow one to determine whether a given experimental signal (i.e. a time series of an observable) originates from a chaotic deterministic or a stochastic dynamics, with the help of methods from information theory combined with the Takens approach [13]. However, such a distinction is strongly limited by the practical impossibility of reaching an arbitrarily fine resolution. Given that, it becomes very useful to perform an entropic analysis of a given data record in terms of ε-entropies (and associated finite-size Lyapunov exponents), which characterise the entropy of the data at different resolution scales [14]. Such a result has its practical relevance: it allows a resolution-dependent classification of the stochastic or chaotic character of a signal. In practice, without any reference to a particular model, one can define the notion of chaotic or stochastic behavior of a system over a certain range of scales [13].
In the present paper we face a different inference question, relevant in the context of non-equilibrium statistical mechanics [15], for which we give a brief introductory example here. Let us assume that we know only the time series of a unique variable y of a system, and we know that the underlying dynamics is a Gaussian process, e.g., y is a component, or a linear combination of the components, of a vector x ∈ R^D whose evolution law is ruled by the linear Langevin equation ẋ + Ax = Bξ. Of course, a well designed perturbation-response experiment can tell us if the underlying system is at thermodynamic equilibrium or not [16]. However, such a test requires observing the system in at least two states: the unperturbed and the perturbed ones. A quite natural question then emerges in an experiment: is the knowledge of one component, e.g. y, enough to understand that the system has a non-equilibrium character [17,18]? Note that there is, for Markov processes, a neat connection between the violation of the Kubo relation (or "equilibrium" Fluctuation-Dissipation Relation, EFDR) and the presence of an asymmetry under time reversal, typically measured as a non-zero entropy production [19]. However, previous results have shown that for linear systems the above question may have a negative answer [20][21][22]. For instance, in the case with x = (x_1, x_2) ∈ R^2 ruled by a set of linear Langevin equations with two different temperatures, in such a way that the entropy production is positive, the dynamics of y always has zero entropy production even if it does not satisfy the EFDR [23]. Such a result is rather disappointing: the knowledge of an even very long time-series of y coming from the system does not allow one to understand a qualitative and fundamental aspect of the system.
In this paper we put this problem in a wider perspective, considering the class of linear systems, Markovian and non-Markovian, in full generality, and we try to suggest strategies to solve this problem in practice, from the point of view of experimental measurements. The question we address has received a lot of attention in recent years, for several categories of systems, particularly systems with a discrete space of states. For this reason we briefly review some of the strategies discussed in the literature.
The rest of the paper is organised in the following way: in Section II we consider a few explicit examples which highlight the impossibility of inferring the thermodynamic state of a system from partial observations. We apply different analyses to the data, including conditional averages and excursion analysis: all procedures report perfect symmetry under time-reversal when they are applied to a single component, regardless of whether the underlying two-dimensional system is at equilibrium or not. In Section III we demonstrate that this is a general limit intrinsic to linear systems, i.e. it is impossible to distinguish equilibrium from non-equilibrium by looking at a time-series of a scalar observable which is a linear function of the state space variables, without performing response experiments. In Section IV we finally show a procedure which is successful in the equilibrium/non-equilibrium distinction, but requires - in fact - the availability of data taken with different parameters, a condition which could be met in experiments even without a direct control of the parameters. Section V is a brief discussion of the extension of this problem to Markov chains with simple topologies (rings), motivated by the observation that very similar results apply. Conclusions and perspectives are discussed in Section VI. Appendix A contains a brief discussion of the generality of the reduction of entropy production under coarse-graining. Appendix B gives a detailed treatment of Gaussian stochastic systems, in order to make the paper self-contained. Finally, Appendix C shows the algorithm we use to perform numerical simulations of Ornstein-Uhlenbeck processes.

A. Brief review of recent approaches
From a statistical mechanics point of view, the lack of thermodynamic equilibrium is revealed by time-reversal asymmetry, typically measured by the entropy production (EP), whose most straightforward definition has been given for Markov processes in [24]. From an informational point of view, it coincides with the Kullback-Leibler (KL) divergence between the probability of time-forward and time-backward sequences of positions in the full phase space [25,26]. In principle such a definition can be applied also to time-series of observables which bear partial information about the system, but it can only result in a lower bound to the EP, see Appendix A.
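As an illustration of the KL-divergence definition, here is a minimal plug-in estimator of the EP rate for a discrete time series, under the assumption that the data come from a stationary Markov chain (so that the KL rate reduces to a sum over transition pairs); the driven three-state ring used below is purely illustrative:

```python
import numpy as np
from collections import Counter

def ep_rate_markov(traj, dt=1.0):
    """Plug-in estimate of the entropy production rate for a trajectory
    assumed to come from a stationary Markov chain:
        sigma ~ (1/T) * sum_{i != j} n_ij * ln(n_ij / n_ji),
    where n_ij counts observed i -> j transitions.  This is the KL rate
    between forward and backward sequences at the pair level, hence only
    a lower bound to the full EP for partially observed data."""
    counts = Counter(zip(traj[:-1], traj[1:]))
    sigma = 0.0
    for (i, j), nij in counts.items():
        nji = counts.get((j, i), 0)
        if i == j or nji == 0:
            continue  # crude regularisation: skip diagonal and one-way pairs
        sigma += nij * np.log(nij / nji)
    return sigma / (len(traj) * dt)

# Driven 3-state ring: clockwise jumps more likely than counter-clockwise.
rng = np.random.default_rng(1)
state, traj = 0, []
for _ in range(20000):
    traj.append(state)
    state = (state + (1 if rng.random() < 0.8 else -1)) % 3
print(ep_rate_markov(traj))  # clearly positive for the biased ring
```

For an unbiased (reversible) ring the same estimator returns a value close to zero, since forward and backward transition counts balance.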
The need for a better understanding of stochastic thermodynamics from an inferential point of view has emerged in recent years [18]. Part of such an interest has been triggered by the discovery of Thermodynamic Uncertainty Relations (TURs) [27][28][29][30] which give, under quite general conditions, a bound to the total EP of a system based upon the knowledge of the fluctuations of any partial current. This observation has led to recipes for the estimate of EP from incomplete information, such as in [31,32]. It has also been shown that TUR-based approaches are usually more powerful, i.e. they give closer estimates of EP, than measuring the Kullback-Leibler information of the available partial data [33]. Given the fact that a TUR-derived bound can be improved by looking for optimal currents, machine learning approaches have also been proposed [34,35]. The use of TURs for the estimate of EP is certainly promising; however, it requires the measurement of currents, i.e. of observables which already indicate some breaking of time-reversal symmetry, and it is usually hard to evaluate the tightness of the obtained bound, i.e. one may obtain arbitrarily low estimates [32,36].
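To illustrate how a TUR-based recipe works in practice, here is a minimal sketch: given samples of any time-integrated current J measured over windows of duration τ, the TUR gives the lower bound ⟨σ⟩ ≥ 2⟨J⟩²/(τ Var(J)) on the EP rate (in units of k_B per unit time). The drift-diffusion data below are purely illustrative; for that process the bound happens to be tight:

```python
import numpy as np

def tur_ep_lower_bound(J_samples, tau):
    """Lower bound on the entropy production rate from the Thermodynamic
    Uncertainty Relation: sigma >= 2*<J>^2 / (tau * Var(J)),
    valid for any time-integrated current J measured over windows of
    duration tau (units: k_B per unit time)."""
    J = np.asarray(J_samples, dtype=float)
    return 2.0 * J.mean() ** 2 / (tau * J.var(ddof=1))

# Illustrative data: displacement of a drift-diffusion process,
# J ~ Normal(v*tau, 2*D*tau), whose exact EP rate is v^2 / D.
rng = np.random.default_rng(3)
v, D, tau = 1.0, 0.5, 10.0
J = rng.normal(v * tau, np.sqrt(2 * D * tau), size=100000)
print(tur_ep_lower_bound(J, tau))  # close to v^2 / D = 2 (TUR is tight here)
```

For a generic current in a generic system the same formula returns only a (possibly very loose) lower bound, which is exactly the limitation discussed in the text.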
We remark that currents are frequently measured by collecting more than one observable, in order to see immediately the presence of cyclical trajectories. Recently, several studies have highlighted the possibility of observing cyclical trajectories in small biological systems, by measuring two or more coarse-grained observables [37], e.g. the main Fourier modes (or principal components) of some complex organism, for instance C. elegans worms [38], Chlamydomonas [39], filaments in actin-myosin networks [40] and mammalian sperm [41,42]. In these studies the non-equilibrium character of the system is verified, but a quantitative estimate of EP is rarely considered [33].
Strictly related to dynamical asymmetries is the concept of avalanche shape, which has a mathematical counterpart in the concepts of bridges and excursions. In terms of stochastic processes, an avalanche or excursion corresponds to a portion of the stochastic trajectory between two successive passages through a chosen threshold. Similarly, a "bridge" is the portion of a stochastic trajectory joining a chosen starting point to a given final one without further constraints. Both these quantities have been studied for a broad class of processes, with several applications in physics [43][44][45][46]. Stochastic thermodynamics represents an ideal framework where these studies could reveal their utility, for instance in comparing an excursion from an initial to a final configuration with its time-reversed counterpart. Not surprisingly, tools and results from stochastic control theory and optimal transport have very recently been transferred and adopted in stochastic thermodynamics [47]. Extending these studies to non-Markovian dynamics or to incomplete information looks intriguing. In the present paper we show how bridges and excursions appear symmetric when measured on a single time-series from a Gaussian stochastic process, even if it is non-Markovian and strongly asymmetric in the full phase-space, as happens out of equilibrium. More recent papers have addressed the problem of estimating EP, or at least discriminating whether it is zero or positive (that is, distinguishing between equilibrium and non-equilibrium), even when the available data do not bear any signature of currents.
The distribution of the residence times in certain states, or some of its moments, has been shown to provide useful constraints even when observable currents are zero, leading to lower bounds on the entropy production; however, certain assumptions are required, for instance that the observed states obey semi-Markov statistics [48], or, alternatively, a complex optimisation problem must be solved in order to account for all possible hidden Markov state networks compatible with the constraints [49,50]. The distribution of the time elapsed between certain transitions also provides a promising approach [51][52][53]. All these approaches can be applied only under specific conditions and/or result in lower bounds. A lower bound is better than nothing (provided that it is not zero), but it can frequently be very far from even the correct order of magnitude of the EP, particularly when the investigated system is macroscopic: for instance the authors of [50], when analysing an experimental time-series of "residence" times for a cow to stand or lie, can only conclude that "the cows consume at least 2.4 × 10^-21 Cal/h, in deciding whether to lie or stand". The study of times between transitions is also important in the presence of strong non-linearities, e.g. potentials with several local equilibria (multi-wells) [33,54,55]. Another quantity, not directly related to time-reversal asymmetry, which emerged in the study of non-equilibrium fluctuation-dissipation relations is the dynamical activity (sometimes called "frenesy"), which is also known to provide bounds on entropy production [56,57]. A different interesting strategy is to compare data coming from regimes realised with different choices of a certain parameter [58]. This approach is, in some respects, similar to a response experiment, but one could imagine that such parameter-dependent sets of data are already available and can be exploited in order to realize a kind of "a posteriori" (in principle non-linear) response experiment.
For instance, this situation could be realized even if the observer cannot directly influence the parameters of the system (e.g. in weather or climate observations). We will apply similar ideas to our problem in Section IV.

II. PITFALLS OF LINEAR SYSTEMS
Let us first introduce a very simple example of a stochastic linear system which, in its most general form, is called the Brownian Gyrator, a model recently adopted to describe experimental systems and nano-machines [59][60][61][62]. It consists of a linear two-variable Markovian system whose evolution is ruled by the following stochastic differential equation [63]:

ẋ_1 = -α x_1 - λ x_2 + ξ_1,
ẋ_2 = -µ x_1 - γ x_2 + ξ_2,    (1)

where the ξ's are two independent noise sources, whose amplitudes may differ: ⟨ξ_i(t)ξ_j(t')⟩ = 2T_i δ_ij δ(t - t'). It is important to remark that many of our examples are good descriptions of overdamped Brownian particles in contact with baths, and therefore it is reasonable (assuming damping coefficients equal to 1) to identify the noise amplitudes T_1 and T_2 as temperatures. (Here we are interested in the case α > 0, γ > 0, αγ > λµ, where the system asymptotically converges to a stationary distribution.)
In order to illustrate the behaviour of the model, let us consider the average trajectory between two points in the configuration space. Fig. 1 shows the average trajectory between the points (x_1, x_2) = (0, 0) and (x_1, x_2) = (1, 2), for trajectories of fixed duration τ. In each panel we show the average trajectory for the direct (0, 0) → (1, 2) and the reverse (1, 2) → (0, 0) path. Not surprisingly, for equal temperatures T_1 = T_2 (left panel) the direct and the reverse path coincide. This is a consequence of detailed balance, which holds for an equilibrium system. On the other hand, when T_1 ≠ T_2 (right panel) detailed balance is broken, and this has an immediate consequence on the average direct and reverse trajectories, which now take two completely different paths. The figure gives a very intuitive visualization of the presence of the internal probability current, characterizing the non-equilibrium stationary state in the general T_1 ≠ T_2 case.
The problem we are interested in is whether it is possible to appreciate the non-equilibrium nature of the process when having access to the information given by one component only (say x_1). Given the simplicity of the model, it is possible to carry out several exact computations. In fact, the quantities shown in Fig. 1 are related to the so-called "bridge" of the stochastic process.
The bridge is the process obtained by constraining the trajectories of the original stochastic process to given initial and final (after a chosen time τ) configurations. In other words, the bridge is the ensemble of trajectories which connect x_i to x_f in a time τ. The probability to observe a value x of the bridge at an intermediate time 0 < t < τ is given by

P_B(x, t | x_i, x_f, τ) = P(x_f, τ | x, t) P(x, t | x_i, 0) / P(x_f, τ | x_i, 0),    (2)

where P(x, t_1 | y, t_2) is the propagator of the Markov process (which is also taken homogeneous in time, P(x, t_1 | y, t_2) = P(x, t_1 - t_2 | y, 0)). We consider this quantity as a tentative proxy of the equilibrium status of the process, since it can be shown (for variables which are even under time-reversal) that detailed balance implies the following symmetry of the bridge distribution:

P_B(x, t | x_i, x_f, τ) = P_B(x, τ - t | x_f, x_i, τ).    (3)

In the case of a Brownian Gyrator, because of the linearity of the process, the bridge is a Gaussian (non-homogeneous) process, and the distribution Eq. (2) is a multivariate Gaussian distribution, fully characterised by its mean vector L(t) and its covariance matrix Q(t). Their expressions are [64]:

L(t) = R(t) x_i + P(t) R(τ - t)^T P(τ)^{-1} [x_f - R(τ) x_i],
Q(t) = P(t) - P(t) R(τ - t)^T P(τ)^{-1} R(τ - t) P(t),

where R(t) = exp(-At) is the solution of the deterministic equation dR(t)/dt = -A R(t) with initial condition R(0) = I (i.e. the response, see Appendix B), and P(t) is the covariance of the propagator, which satisfies the Lyapunov equation dP/dt = -AP - PA^T + Σ with initial condition P(0) = 0, where Σ_ij = 2T_i δ_ij. (The covariance of the propagator can also be expressed in terms of the correlation C and response R of the process as P(t) = C(0) - R(t) C(0) R(t)^T.) As mentioned before, Fig. 1 shows the mean L(t) (the plane coordinates correspond to the components L_1(t) and L_2(t)) for a bridge of a Brownian Gyrator from x_i = (0, 0) to x_f = (1, 2), compared with the mean of the bridge with endpoints reversed, x_i = (1, 2) and x_f = (0, 0). As can be seen, because of Eq. (3), when detailed balance is satisfied (T_1 = T_2) the two paths (direct and reverse) coincide, while they differ for T_1 ≠ T_2.
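Numerically, the bridge mean L(t) is easy to evaluate with standard linear-algebra tools, via Gaussian conditioning of the propagator; the sketch below (with illustrative gyrator parameters) also checks that the direct and reverse bridge means coincide only at equilibrium:

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

# Illustrative Brownian-Gyrator drift matrix and noise amplitudes.
A = np.array([[1.0, 0.5], [0.5, 1.0]])
def Sigma(T1, T2): return np.diag([2 * T1, 2 * T2])

def propagator_cov(t, A, Sig):
    """P(t): covariance of the propagator, P(t) = C0 - R(t) C0 R(t)^T,
    with C0 the stationary covariance (A C0 + C0 A^T = Sig)."""
    C0 = solve_continuous_lyapunov(A, Sig)
    R = expm(-A * t)
    return C0 - R @ C0 @ R.T

def bridge_mean(t, tau, xi, xf, A, Sig):
    """Mean L(t) of the Gaussian bridge from xi (t=0) to xf (t=tau),
    obtained by standard conditioning of the Gaussian propagator."""
    R_t, R_rest = expm(-A * t), expm(-A * (tau - t))
    P_t, P_tau = propagator_cov(t, A, Sig), propagator_cov(tau, A, Sig)
    K = P_t @ R_rest.T @ np.linalg.inv(P_tau)
    return R_t @ xi + K @ (xf - expm(-A * tau) @ xi)

xi, xf, tau, t = np.zeros(2), np.array([1.0, 2.0]), 2.0, 0.7
for T1, T2 in [(2.0, 2.0), (10.0, 1.0)]:
    direct = bridge_mean(t, tau, xi, xf, A, Sigma(T1, T2))
    reverse = bridge_mean(tau - t, tau, xf, xi, A, Sigma(T1, T2))
    print(T1 == T2, np.allclose(direct, reverse))  # True only at equilibrium
```

The first line of output (equal temperatures) reports coinciding direct and reverse means, the second (unequal temperatures) does not, reproducing the behaviour of Fig. 1.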
In the present work, we are not interested in the information contained in the full bridge distribution Eq. (2). Rather, we consider the case where only a single component of the Brownian Gyrator is accessible to measurement. A single component is still a Gaussian, stationary process, but it is no longer Markovian (since the other variable is now hidden).
Nevertheless, one can consider the moments of the first component of the bridge:

M_n(t; x_i, x_f, τ) = ∫ dx x_1^n P_B(x, t | x_i, x_f, τ),

which, for n = 1, recovers the first component of the mean vector, L_1(t), while for n = 2 it gives Q_11(t) + L_1(t)^2.
Note that the moments depend on τ and on the extreme points x_i and x_f, which include both components of the original Markovian process. Again, if detailed balance is satisfied, the symmetry Eq. (3) implies:

M_n(t; x_i, x_f, τ) = M_n(τ - t; x_f, x_i, τ),    (4)

FIG. 1: Average trajectory for the Brownian Gyrator between (0, 0) and (1, 2) and vice versa. The two panels represent different choices of the temperatures T_1, T_2: at left the system satisfies detailed balance (T_1 = T_2 = 2), at right detailed balance is broken (T_1 = 10, T_2 = 1). These are parametric plots, where the axes are the two average components of the bridge.

That means that the moments have a symmetric shape with respect to t = τ/2. In Fig. 2 we show L_1(t), as well as L_1(t) ± √Q_11(t), for the same bridges considered in Fig. 1. Again, in the case of detailed balance (T_1 = T_2), the moments satisfy the time-reversal symmetry Eq. (4). However, in the non-equilibrium case (T_1 ≠ T_2), the symmetry is broken.
In particular, one can consider the special bridges with x_i = x_f = (0, 0): in this case, since one obviously has L(t) = 0, one can measure the second moment, which corresponds to Q_11(t). In Fig. 3 we show the behaviour of Q_11(t) for such a bridge, with and without detailed balance.
Interestingly, figures like Fig. 3 are very similar to the average avalanche shape, a measure introduced in the context of the study of crackling signals [65,66] and of Barkhausen noise in ferromagnetic materials [67,68]. In that case, the signal analyzed is a (positive) bursty measure. Given a small (ideally zero) threshold, the signal is regarded as a sequence of avalanches (the signal between two successive zeroes). In practice, an avalanche represents a single burst of activity of the signal between two quiescent phases (think, for instance, of the intensity of a single earthquake). Then, avalanches of the same duration are averaged in order to get the average avalanche shape.
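The averaging procedure just described is easy to implement; a minimal sketch (the smoothed-noise signal is purely illustrative, and the zero threshold and number of bins are arbitrary choices):

```python
import numpy as np

def average_avalanche_shape(signal, duration, n_bins=20):
    """Average shape of all avalanches (excursions above zero) of a given
    duration: segment the signal at its zero crossings, keep excursions of
    the requested length, rescale time to a common grid and average."""
    pos = signal > 0
    starts = np.where(~pos[:-1] & pos[1:])[0] + 1   # first positive sample
    ends = np.where(pos[:-1] & ~pos[1:])[0] + 1     # first non-positive sample
    shapes = []
    for s in starts:
        e = ends[ends > s]
        if len(e) and (e[0] - s) == duration:
            t = np.linspace(0, duration - 1, n_bins)
            shapes.append(np.interp(t, np.arange(duration),
                                    signal[s:s + duration]))
    return np.mean(shapes, axis=0) if shapes else None

# Illustrative bursty signal: smoothed Gaussian noise, so that excursions
# above zero of a given duration occur many times.
rng = np.random.default_rng(5)
sig = np.convolve(rng.standard_normal(200000), np.ones(5) / 5, mode="same")
shape = average_avalanche_shape(sig, duration=10)
print(shape is not None and len(shape) == 20)
```

The resulting array is the average avalanche shape at the chosen duration, the quantity whose (a)symmetry is discussed in the text.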
In terms of the theory of stochastic processes, the avalanche of a process is called an excursion [45,69]. In a recent paper [46], for a class of multiplicative stochastic processes (ABBM/CIR/Bessel processes), it has been shown that the average bridge shape is simply proportional to the average avalanche shape, suggesting that the two quantities (bridge and excursion) carry similar information about the time evolution of the process. Symmetric, as well as asymmetric, average avalanche shapes have been observed in several physical [70][71][72][73], geophysical [74] and biological [75] phenomena, but at the moment there is no general understanding of the meaning of such a property. The only work [76] devoted to the asymmetry of some Barkhausen average avalanche shapes attributes the phenomenon to subtle inertial effects in the effective motion of magnetic domains inside the ferromagnetic material.
Coming back to the bridge shapes of the Brownian Gyrator, figures such as Fig. 3 could give the illusion of representing a measure on a single component able to discriminate the equilibrium nature of the whole process: the equilibrium case corresponding to a symmetric shape, the off-equilibrium one to an asymmetric shape.
Unfortunately, the quantity considered includes more information than the single component alone, since it represents the average shape of a bridge between two points where both coordinates of the full process are zero. In other terms, in order to perform such a measure, one needs information on both coordinates.
In order to consider the more general case, where one has access strictly to a single component, one has to consider the stationary bridge of that component, independently of the value of the second component. This leads to a different definition of the bridge with respect to Eq. (2). In fact, once the values of the first components of x_i and x_f are fixed, one has to perform a stationary average over the second component of x_i and then integrate over all possible values of the second components of x_f and x. More precisely, the bridge distribution for the stationary first component, going from x_1(0) = x_1i to x_1(τ) = x_1f, is:

P_B(x_1, t | x_1i, x_1f, τ) = ∫ dx_2 dx_2i dx_2f P(x_f, τ | x, t) P(x, t | x_i, 0) P_s(x_i) / ∫ dx_2i dx_2f P(x_f, τ | x_i, 0) P_s(x_i),    (5)

where P_s(x) is the stationary distribution of the (free) process. Due to the Gaussianity of the process, the computation of (5) for the Brownian Gyrator can actually be performed, but it is too cumbersome to be shown here, even in the case of the symmetric gyrator α = γ and λ = µ. However, an example of a single-component bridge is shown in Fig. 4.
There we compare the average shape (its variance) of the bridge for the single component between two zero values (x_1i = x_1f = 0), for several values of the duration τ: in the left panel the component comes from a Brownian Gyrator at equilibrium (T_1 = T_2), while in the right panel the system is out of equilibrium (T_1 ≠ T_2). In both cases the shape is symmetric with respect to t = τ/2, and there is no way to appreciate the off-equilibrium origin of the second case. This shows that, for linear systems, the bridge of a single component cannot be a proxy for the determination of the equilibrium nature of the full system. In fact, this is due to a very general mathematical result [20]: any scalar Gaussian process, not necessarily Markovian, which is statistically invariant under time translation is also statistically invariant under time reversal. In order to grasp a more physical intuition of such a quite surprising result, we consider more standard quantities.
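The symmetry of the single-component bridge can be verified exactly by Gaussian conditioning: the variance of x_1(t), conditioned on x_1(0) = x_1(τ) = 0, depends only on the scalar stationary correlation ⟨x_1(t)x_1(0)⟩, which is an even function of time, so the shape is symmetric even far from equilibrium. A sketch with illustrative parameters:

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

A = np.array([[1.0, 0.5], [0.5, 1.0]])   # illustrative gyrator drift
Sig = np.diag([2 * 10.0, 2 * 1.0])       # T1=10, T2=1: out of equilibrium
C0 = solve_continuous_lyapunov(A, Sig)   # stationary covariance

def C11(t):
    """Stationary scalar correlation <x1(t) x1(0)> = [exp(-A|t|) C0]_11."""
    return (expm(-A * abs(t)) @ C0)[0, 0]

def bridge_var(t, tau):
    """Var(x1(t) | x1(0) = x1(tau) = 0), from the 3x3 Gaussian covariance
    of (x1(0), x1(t), x1(tau)) -- this uses the single component only."""
    times = [0.0, t, tau]
    G = np.array([[C11(a - b) for b in times] for a in times])
    Gbb = G[np.ix_([0, 2], [0, 2])]   # endpoint block
    gab = G[1, [0, 2]]                # middle-to-endpoints covariances
    return G[1, 1] - gab @ np.linalg.solve(Gbb, gab)

tau = 2.0
for t in [0.3, 0.7]:
    print(np.isclose(bridge_var(t, tau), bridge_var(tau - t, tau)))  # True
```

Since the 3x3 covariance is built from the even function C11 alone, swapping t with τ - t leaves the conditional variance unchanged, in agreement with the general result of [20].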
Suppose we can carry out an experiment in which the only measurable quantity is the observable y = x_1. Since the system is linear, the variable y is Gaussian and therefore its correlation function C_y(t) completely characterizes the observed stochastic process. Nevertheless, there is an infinity of underlying linear, Markovian, two-dimensional systems which are consistent with the observations. For instance, consider two systems S_1 and S_2 [77]. While S_1 is at equilibrium (T_1 = T_2), system S_2 is not: its average entropy production rate is 0.12. However, looking at y = x_1 only, it can be shown that both systems share exactly the same correlation function C_y(t), Eq. (6), but different response functions, as shown in Fig. 5. It is therefore evident that without knowing the response functions the two systems cannot be distinguished. One might be tempted to find out whether detailed balance is satisfied by looking at suitable statistical features. Actually, if the system is invariant under the transformation t → -t, then every statistical quantity is an even function of time. The panels of Fig. 6 show the conditional averages ⟨y(τ)|y_0⟩ and ⟨y(-τ)|y_0⟩ for the two systems. As can be seen, these objects are invariant under t → -t. One might naively think that, despite ⟨y(τ)|y_0⟩ = ⟨y(-τ)|y_0⟩, possible dissipative effects occur in statistical objects that depend on two or more times, due to the non-Markovianity of the process. For instance, given ∆t > 0, one can consider the two quantities

⟨y(τ)|y_0, ±⟩ = ⟨y(τ) | y(0) = y_0, y(-∆t) ≶ y(0)⟩.

⟨y(τ)|y_0, +⟩ and ⟨y(τ)|y_0, -⟩ are the mean values of y(τ) knowing that y(0) = y_0 has been reached from below or from above, respectively. In some sense, the second conditioning is equivalent to providing information also on the "velocity" of the process.
For a process which is invariant under the transformation t → -t we have ⟨y(τ)|y_0, +⟩ = ⟨y(τ)|y_0, -⟩, since there is no difference between looking at the process forward or backward in time. Not surprisingly, ⟨y(τ)|y_0, +⟩ and ⟨y(τ)|y_0, -⟩ are indistinguishable (not shown). Hence, it is not possible to infer the thermodynamic state of the system.
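Conditional averages like those of Fig. 6 are straightforward to estimate from a scalar series; the sketch below (an illustrative out-of-equilibrium gyrator, observed through y = x_1 only) shows that the forward and backward conditional averages are statistically indistinguishable:

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulate an out-of-equilibrium gyrator (T1 != T2), observe y = x1 only.
alpha, gamma, lam, mu = 1.0, 1.0, 0.5, 0.5
T1, T2, dt, n = 10.0, 1.0, 1e-2, 200000
x = np.zeros((n, 2))
noise = rng.standard_normal((n, 2)) * np.sqrt(2 * np.array([T1, T2]) * dt)
for k in range(1, n):
    x1, x2 = x[k - 1]
    x[k] = x[k - 1] + np.array([-alpha * x1 - lam * x2,
                                -mu * x1 - gamma * x2]) * dt + noise[k]
y = x[:, 0]

def cond_avg(y, y0, lag, width):
    """<y(t + lag) | y(t) ~ y0>, averaged over all times t where y(t)
    falls in a small window around y0; lag may be negative."""
    lo, hi = max(0, -lag), min(len(y), len(y) - lag)
    sel = np.abs(y[lo:hi] - y0) < width
    return y[lo + lag : hi + lag][sel].mean()

y0, lag = 0.7 * y.std(), 50   # lag of 50 steps = 0.5 time units
fwd = cond_avg(y, y0, +lag, 0.1 * y.std())
bwd = cond_avg(y, y0, -lag, 0.1 * y.std())
print(fwd, bwd)  # the two estimates are statistically indistinguishable
```

For a stationary Gaussian process both averages equal y_0 C_y(τ)/C_y(0) exactly, so the agreement is not an accident of this parameter choice.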
In the following sections, we will provide a general proof of such indistinguishability, based on an analysis of the correlation and response functions of the system. Furthermore, we propose a possible way out, which could be used when a limited control on the system is available.

FIG. 6: Conditional average ⟨y(τ)|y_0⟩ as a function of the time delay τ for the two systems S_1 (left) and S_2 (right). In both cases y_0 = 0.7σ_y, where σ_y is the standard deviation. The results have been obtained by means of numerical simulations.

III. A "NO-GO" THEOREM FOR LINEAR SYSTEMS
First, we assume to know - from previous knowledge - that the data are generated by a stochastic linear system (be it Markovian or not). Then, we know that the only statistical information that can be retrieved from a time series of a set of D observables measured during its evolution is the stationary correlation matrix C(t - t'), whose elements are ⟨x_i(t) x_j(t')⟩.
If we are only interested in discriminating equilibrium from non-equilibrium, and we are lucky enough to have access - with our available observables - to a large enough subset of the full phase space, we can simply check whether detailed balance is satisfied or not, by looking at the condition SC(t)S = C(t)^T, where S_ij = s_i δ_ij, with s_i ∈ {-1, +1}, takes into account the effect of time reversal on the different components (see Appendix B). But what happens when the real phase space of the system under observation is very large and, instead, our observables are few or even only one?
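When enough components are accessible, the detailed-balance check described above is a one-liner in matrix form; a sketch for an Ornstein-Uhlenbeck system with all variables even under time reversal (so that S is the identity; parameters illustrative):

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

def detailed_balance_holds(A, Sig, t=0.7, tol=1e-8):
    """Check S C(t) S = C(t)^T for an OU process xdot = -A x + noise,
    with C(t) = exp(-A t) C0.  All variables are taken even under time
    reversal here, so S is the identity matrix."""
    C0 = solve_continuous_lyapunov(A, Sig)   # solves A C0 + C0 A^T = Sig
    Ct = expm(-A * t) @ C0
    return np.allclose(Ct, Ct.T, atol=tol)

A = np.array([[1.0, 0.5], [0.5, 1.0]])
print(detailed_balance_holds(A, np.diag([4.0, 4.0])))   # True  (T1 = T2)
print(detailed_balance_holds(A, np.diag([20.0, 2.0])))  # False (T1 != T2)
```

The check only works because both components enter the correlation matrix; the rest of this section shows why it necessarily fails for a single scalar observable.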
In one dimension, in particular, the detailed balance condition is inevitably always satisfied, because C(t) is a scalar function. If the system is non-Markovian, however, it is possible that non-equilibrium information is contained in the comparison between the time-correlation of the noise and the memory kernel representing the deterministic force of the system, i.e. it could be revealed by evaluating the so-called Fluctuation-Dissipation Relation of the second kind [16]. In order to do so, however, we need to separate the contribution of the noise from that of the deterministic forces.
In Appendix B we show in detail that such a separation is impossible; here we summarise the situation. The dynamics of our small set of observables x = {x_1, ..., x_D} is described by a linear integro-differential stochastic equation of the type Lx = Bξ, where L and B are operators acting on a vector space of functions in L^1(R) and ξ is a colored noise with (in general non-diagonal) correlation matrix. In this case the Fourier transform of the connected correlation, Ĉ(ω), is related to the response R̂(ω) and to the noise fluctuations ⟨ξ_α(t)ξ_β(t')⟩_c = ν_αβ(t - t') according to (see Appendix B)

Ĉ(ω) = 2π R̂(ω) B ν̂(ω) B^T R̂^†(ω),    (8)

where √(2π) R̂(ω) = L̂(ω)^{-1} and ν̂(ω) is the Fourier transform of ν(t). Equation (8) states that in general, in the absence of any other information (in particular, without response experiments), it is impossible to separate the contribution of the noise from that of the response by simply looking at the correlation function: the noise and response poles are mixed together, and we cannot assign them to one source or the other without ambiguity. In order to show such an impossibility we can consider the following one-dimensional stochastic process with a colored noise (⟨ξ(t)⟩ = 0):

ẋ + ax = ξ,   ⟨ξ(t)ξ(t')⟩ = 2T e^{-b|t-t'|},

with a > 0 and b > 0, for which a direct computation leads to the following expressions for the Fourier transform of the time correlation function and for the response of the system:

Ĉ(ω) ∝ 1/[(ω² + a²)(ω² + b²)],   R(t) = θ(t) e^{-at}.

Without a response experiment we are not able to distinguish the stochastic process above from the one having inverted rates,

ẋ + bx = ξ,   ⟨ξ(t)ξ(t')⟩ = 2T e^{-a|t-t'|},   R(t) = θ(t) e^{-bt},

or, not necessarily worse, we might think to have δ-correlated white noise and a second-order SDE: in this case we could have something like

ẍ + (a + b)ẋ + ab x = ξ,   ⟨ξ(t)ξ(t')⟩ = 2T' δ(t - t'),

which produces the same form of Ĉ(ω). Then, if we are unable to perturb the system and measure its response, we are unable to understand what process we are really looking at. We can hope to deal with this problem by restricting the set of systems under investigation.
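The mixing of noise and response poles can be checked directly on the power spectra: all three processes above have a spectrum proportional to 1/[(ω² + a²)(ω² + b²)], so no analysis of the correlation function alone can tell them apart (overall constants can be absorbed into the noise amplitude):

```python
import numpy as np

a, b = 1.0, 3.0
w = np.linspace(-10, 10, 2001)

# Process 1: xdot + a x = xi, with <xi xi'> = 2T exp(-b|t-t'|):
#   spectrum proportional to 1 / ((w^2 + a^2)(w^2 + b^2)).
S1 = 1.0 / ((w**2 + a**2) * (w**2 + b**2))

# Process 2: same with inverted rates (a <-> b): identical shape.
S2 = 1.0 / ((w**2 + b**2) * (w**2 + a**2))

# Process 3: second-order SDE with white noise,
#   xddot + (a+b) xdot + ab x = xi: inverse squared transfer function.
S3 = 1.0 / np.abs((1j * w) ** 2 + (a + b) * (1j * w) + a * b) ** 2

print(np.allclose(S1, S2), np.allclose(S1, S3))  # True True
```

The identity |(iω)² + (a+b)(iω) + ab|² = (ω² + a²)(ω² + b²) makes the degeneracy exact, not approximate.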
For instance, we can assume a multidimensional model for variables $\{x_i\}$ ($i = 1 \dots D$) and that an equation like $Ly = B\xi$ arises once we observe a single dynamical variable $y = x_i$, or a linear combination $y = \sum_i a_i x_i$ of these or of their derivatives. A very general and natural model is a genuine multidimensional Ornstein-Uhlenbeck process, $\dot x + Ax = B\xi$, or its subset $\ddot x + \Gamma \dot x + Kx = B\xi$, where the dynamical variables $x$ and $\dot x$ have opposite time-reversal parity and $\langle \xi_i(t)\xi_j(t')\rangle = \nu_{ij}\delta(t-t')$. In this way, with a good fit of the data in terms of sums of exponentials, possibly modulated by oscillating factors like $\sin \Omega_\beta t$, and a robust statistical analysis able to determine the number of exponentials to consider, we can hope to guess the dimension of the hypothetical underlying multidimensional system and to recover the poles of the response function (since we assume the noise does not have poles). In many cases this approach seems to work reasonably well when, for example, we want to distinguish a second-order dynamics from a first-order one. Let us imagine that we have obtained the time correlation function $C_y(t) = c_+ e^{-l_+|t|} + c_- e^{-l_-|t|}$ from the evolution of a one-dimensional observable $y = x_1$ which, for experimental reasons, we can interpret as a velocity. Hence, we look for a linear stochastic dynamics in two dimensions and we would like to distinguish between the following two cases: (I), which is always in equilibrium, and (II), which in general is not.
where $l_+$, $l_-$ are the eigenvalues of $A$, and $T$, $D$ its trace and determinant, respectively. A simple computation proves that there is a substantial difference in the Fourier transform $C_y(\omega)$ of the correlation function. This difference allows us to distinguish the two cases simply by looking at some conditions on the coefficients $c_+$ and $c_-$.
Note that, since the matrix $\nu$ is positive definite, the two coefficients $c_0$ and $c_1$ can never be negative; therefore, if the matrices $A$ and $B$ are strictly positive, we are sure we are not in case I. But what happens once we have excluded case I? Is it possible to discriminate equilibrium from non-equilibrium in case II? Unfortunately, there is an infinite number of processes of type (II) which have exactly the same correlation function $C_y(\omega)$ as an out-of-equilibrium process while satisfying the equilibrium condition $(\alpha - \gamma)\nu_{12} = \lambda \nu_{22} - \mu \nu_{11}$. For example, given $D$, $T$, $c_0$ and $c_1$, which completely characterize the correlation function, we can obtain an equilibrium process by fixing $\nu_{12} = 0$ and choosing the parameters accordingly, and we are still free to choose any value for $\nu_{22}$. In other words, we are not able to detect the temperatures of the thermal baths, since they are mixed with the deterministic forces in the corresponding residues. In conclusion, it seems impossible to understand, from the simple knowledge of the correlation function $C_y(t)$ of the single dynamical variable $y$, whether the original multidimensional system was in or out of equilibrium.
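A quick numerical sketch makes the point concrete (the drift and bath temperatures below are arbitrary non-equilibrium choices): in the full phase space the cross-correlation detects the arrow of time, while any scalar autocorrelation is exactly even in time.

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

# Hypothetical 2d Ornstein-Uhlenbeck process dx/dt = -A x + noise with
# two baths at different temperatures, so detailed balance is broken.
A = np.array([[2.0, -1.0],
              [-0.5, 1.5]])              # drift (illustrative values)
Sigma = np.diag([2 * 1.0, 2 * 3.0])      # noise covariance, 2*T_i

# Stationary covariance C(0): solves A C + C A^T = Sigma
C0 = solve_continuous_lyapunov(A, Sigma)

def C(t):
    """Stationary two-time correlation <x(t) x(0)^T>."""
    return expm(-A * t) @ C0 if t >= 0 else (expm(A * t) @ C0).T

t = 0.7
asym = C(t)[0, 1] - C(-t)[0, 1]   # cross-correlation: breaks time symmetry
sym = C(t)[0, 0] - C(-t)[0, 0]    # scalar autocorrelation: exactly even
```

For these parameters `asym` is clearly non-zero, while `sym` vanishes identically: the single observed variable carries no trace of the broken time-reversal symmetry.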

IV. A WAY OUT: "A POSTERIORI" RESPONSE
Let us consider an experiment from which we are able to get the time series of a single scalar observable whose single-time fluctuations $y$ around the average value (which we assume we are able to subtract step by step) are, at least to a first approximation, normally distributed. We estimate the time correlation function $C_y(t)$ from the time series of such fluctuations and then, with in mind the idea of an underlying multidimensional Ornstein-Uhlenbeck process, we fit $C_y(t)$ with a linear combination of exponentials. Let us imagine now that we have a knob which, even if in an uncontrolled way, slightly modifies the parameters with which the system evolves. What will the resulting correlation function look like? If the knob has changed the drift parameters, then we will find something almost completely different, since the poles will certainly have moved. But if the poles have not moved, we can assume that we have changed only the "temperatures" $T_i \to T_i'$ of the effective thermal baths with which we can characterize the contribution of the noise to the process. If so, only the coefficients have changed, $c_\alpha \to c_\alpha'$, according to the formula $c_\alpha = \sum_i T_i c_{i\alpha}$, where the $c_{i\alpha}$ depend on the drift only (see Appendix B). Also in Appendix B we show that, when the system is at equilibrium, all the temperatures $T_i$ are proportional, $T_i = T g_i$, to a single temperature $T$, with constants of proportionality $g_i$ which depend on the drift only: in other words, at equilibrium we have a single effective thermal bath. In light of this observation, after the knob has changed only the noise properties, two things can happen: either the $c_\alpha$ all scale by the same factor or not.
If they scale by the same factor there are two possible explanations:
• there is a single thermal bath and the system, both before and after turning the knob, was in equilibrium but at two different temperatures; the ratio between these two temperatures coincides with the ratio between the coefficients $c_\alpha$ of the correlation functions;
• the system is out of equilibrium but the knob, for some reason, increases or decreases all the effective temperatures of the thermal baths by the same factor, $T_i \to r T_i$.
The idea of having a knob that varies some parameters of the model has already been used to infer the thermodynamic properties of a system. For example, in [58] the authors suppose that they can vary the probability of a link in a Markov chain in order to find the stalling condition (no current through the link). This condition is then used to provide a lower bound on the entropy production of the system. Moreover, in the experiments described in [59,60] the experimenters had a knob that allowed them to change the temperature of one of the two thermal baths. The procedure described above allows us to understand, through two measurements of the system, whether both measurements were made in equilibrium conditions or not. In the second case, however, it does not allow us to make precise statements on the two measurements individually, that is, it does not allow us to distinguish the three different situations:
• the system is in equilibrium and the knob takes it out of equilibrium;
• the system is out of equilibrium and the knob takes it to equilibrium;
• the system is out of equilibrium both before and after turning the knob.
Since the $c_\alpha = \sum_i T_i c_{i\alpha}$ are written as a linear combination of the temperatures $T_i$, and the corresponding coefficients $c_{i\alpha}$ are known functions of the parameters of the model, it could be possible to distinguish these three cases by taking different measurements, varying several times the effective temperatures of some thermal baths. Let $D$ be the number of poles of $C(\omega)$ and $n$ the number of effective thermal baths whose temperatures change by moving our knob. In this way the relationship between the $c_\alpha$ and the temperatures of the thermal baths is $c_\alpha = q_\alpha + \sum_{i=1}^{n} T_i c_{i\alpha}$, where $q_\alpha$ is a $D$-dimensional constant vector which does not depend on the temperatures of the baths we change with the knob, and the $c_{i\alpha}$ are $nD$ coefficients which depend on the drift only. Each measurement provides $D$ conditions but also introduces $n$ additional unknown temperatures, so, with $m$ measurements, the number of unknown parameters is $(n+1)D + mn$ while the number of conditions that must be satisfied at the same time is $Dm$. This means that we need at least $m \ge (n+1)D/(D-n)$ measurements in order to be able to fit the $q_\alpha$ and the $c_{i\alpha}$. Note that the knowledge of the poles entails $D$ additional conditions that the parameters of the model must satisfy. Hence, in some particular experimental setups, by putting all this information together, we could be able to infer the in- or out-of-equilibrium nature of the system from few measurements, as we will show in the next section.
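The counting argument above is easy to automate; the helper below (purely illustrative) returns the minimal number of measurements for given $D$ and $n$:

```python
from math import ceil

def min_measurements(D, n):
    """Smallest m such that m*D >= (n+1)*D + m*n: the D*m fitting
    conditions must outnumber the unknowns q_alpha (D of them),
    c_ialpha (n*D of them) and the n extra temperatures introduced
    by each measurement."""
    assert D > n, "need more poles than knob-controlled baths"
    return ceil((n + 1) * D / (D - n))
```

For instance, with $D = 2$ poles and a knob acting on $n = 1$ bath this gives $m = 4$; in practice fewer measurements may suffice because the knowledge of the poles provides additional conditions.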

A. Our Protocol at Work for the Brownian Gyrator
Consider again the example of Sec. II. In this case the correlation function takes the form $C_y(t) = c_+ e^{-l_+|t|} + c_- e^{-l_-|t|}$, where the decay rates are fixed by the trace and determinant of the drift matrix, $l_+ + l_- = T = \alpha + \gamma$ and $l_+ l_- = D = \alpha\gamma - \lambda\mu$. From the definition of $c_+$, $c_-$, $l_+$ and $l_-$ we obtain the coefficients as functions of the temperatures and of the drift parameters. Let the index $j$ refer to the $j$-th experiment and imagine that the knob changes only the temperature $T_1$: since the poles $l_\pm^{(j)}$ do not move, the comparison of the coefficients $c_\pm^{(j)}$ between two experiments yields $\gamma$. From the knowledge of $\gamma$ we can estimate $\alpha = l_+ + l_- - \gamma$ and $\lambda\mu = \alpha\gamma - l_+ l_-$. The entropy production $S$ is proportional to $T_1\mu - T_2\lambda$, which can thus be expressed in terms of the measured quantities (Eq. 11).
Since two measurements are sufficient to compute the right-hand side of Eq. 11, we are able to infer whether the system is at equilibrium or not and, furthermore, to estimate the average Lebowitz-Spohn entropy production rate $S$. While the above procedure is theoretically correct, to be useful it must also work in practical cases. We therefore decided to apply this procedure to the two systems introduced in Sec. II. For both systems, different trajectories were simulated by employing the algorithm described in Appendix C. These data were used to estimate the correlation functions and the 4 parameters $l_+$, $l_-$, $c_+$ and $c_-$. Then, the temperature $T_1$ of the thermal bath coupled to $y = x_1$ was changed (system 1 was thus put out of equilibrium, while system 2 was brought to equilibrium) and new data were generated. By re-estimating the correlation functions and combining the new measurements with the previous ones, we were therefore able to compute the entropy production $S$ of each of the two systems before and after the manipulation. To check the robustness of the procedure, and also to get an idea of the error associated with the estimate of $S$, 40 experiments were repeated on each system. The results of these experiments are summarized in Fig. 7, which shows the histograms of the entropy production for the two systems in the two cases. As can be seen from the histograms, there is a clear difference between the equilibrium and non-equilibrium cases. In fact, in the first case the histograms have a peak around zero, while in the second case the distributions are wider and have a maximum at non-zero values. Furthermore, it should be noted that in the non-equilibrium cases the maximum of the distributions is close to the theoretical entropy production. Tab. I shows the theoretical values of the entropy production rate as well as the results obtained experimentally, which are compatible within the error bars [78].

V. COARSE GRAINING OF MARKOV CHAINS
So far we have only dealt with linear stochastic processes whose states are represented by vectors in $\mathbb{R}^D$. However, it may happen that an appropriate description of the problem requires the use of a discrete phase space. In these cases the state of the system is represented by an integer index $i = 1, 2, \dots, N$. In analogy with continuous processes, the experimenter usually does not have access to the whole phase space and is therefore limited to observing a coarse-grained process that lives in a reduced phase space. In this Section we show that, in the context of Markov chains or Markov jump processes, some coarse-graining procedures lead to results similar to those of Gaussian continuous systems, i.e. the statistical features of the coarse-grained process and of its time reversal are indistinguishable despite the underlying process being an out-of-equilibrium Markov process. In the following we discuss the case of Markov chains, but the results hold also for processes in continuous time.
Let $\Omega = \{1, 2, \dots, N\}$ be the phase space of a system described by a Markov chain whose transition matrix is denoted by $G$. Now imagine that an experimenter is not able to observe all the states of the system but rather observes a coarse-grained process where the states have been grouped into two disjoint groups. Let $a = 0, 1$ represent the state of the coarse-grained process. Given the nature of the problem, it is natural to introduce a block representation of both $G$ and the invariant distribution $\Pi$, i.e.
$$G = \begin{pmatrix} G_{00} & G_{01} \\ G_{10} & G_{11} \end{pmatrix}, \qquad \Pi = (\Pi_0, \Pi_1).$$
The probability of a sequence $a = \{a_t\}_{1\le t\le T}$ of length $T$ (as well as the other statistical quantities that characterize the process) can be computed from the knowledge of $G$ and the invariant distribution $\Pi$, i.e.
$$P(a) = \Pi_{a_1}^T \left( \prod_{t=1}^{T-1} G_{a_t a_{t+1}} \right) \mathbf{1},$$
where $\mathbf{1}$ is the vector with all entries equal to one.
Also note that the sequence $a$ can be encoded by a sequence of $K$ pairs $(a_k, n_k)$, where $n_k$ represents the time spent in the macrostate $a_k$, and therefore
$$P(a) = P(a_1, n_1; a_2, n_2; \cdots; a_K, n_K), \tag{13}$$
with $\sum_k n_k = T$ and $a_{k+1} = \bar a_k \equiv 1 - a_k$. In general, the process describing the evolution of $a$ is non-Markovian, and the computation of the KL divergence between the probabilities of time-forward and time-backward sequences provides a lower bound on the entropy production rate of the whole system. Now consider the special case in which one of the two macrostates, say macrostate 1, contains only one microstate. Since the state 1 is pure and the whole process is Markovian, the dynamics of the coarse-grained process is correlated only within the time interval between two successive visits of this state. To put it another way, the coarse-grained process is a semi-Markov process. Let $p_{a\bar a}(\tau)$ be the probability distribution of the exit times from state $a$. Since the process is semi-Markov, the probability of the sequence $a$ is
$$P(a) = p^{in}_{a_1 \bar a_1}(n_1) \left[ \prod_{k=2}^{K-1} p_{a_k \bar a_k}(n_k) \right] p^{f}_{a_K}(n_K), \tag{14}$$
where $p^{in}$ ($p^{f}$) is the initial (final) probability of observing a sequence starting (ending) with $n_1$ ($n_K$) characters $a_1$ ($a_K$). The reverse sequence will instead be $\overleftarrow{a} = (a_K, n_K; a_{K-1}, n_{K-1}; \cdots; a_1, n_1)$ and its probability is
$$P(\overleftarrow{a}) = p^{in}_{a_K \bar a_K}(n_K) \left[ \prod_{k=K-1}^{2} p_{a_k \bar a_k}(n_k) \right] p^{f}_{a_1}(n_1). \tag{15}$$
Since $a_{k-1} = a_{k+1}$, $P(a)$ and $P(\overleftarrow{a})$ differ only in the boundary terms. Therefore, when the length $T$ of the two sequences $a$ and $\overleftarrow{a}$ goes to infinity, the two probabilities become equal and the entropy production rate vanishes, i.e.
$$S = \lim_{T\to\infty} \frac{1}{T} \sum_a P(a) \ln \frac{P(a)}{P(\overleftarrow{a})} = 0.$$
This result seems to leave no hope of understanding the thermodynamic state of the system through the observation of the coarse-grained process. However, in [50] the authors show that even in these cases a more powerful analysis, based upon residence-time statistics, is available, showing in some cases that the only Markovian processes compatible with the observations are not at equilibrium. Note that the same is true for linear processes whose correlations have sinusoidal modulations (see the discussion in Appendix B.3).
Nevertheless, we expect that, in general, for such a coarse-graining procedure it will be possible to find two Markov chains, one at equilibrium and one out of equilibrium, that produce the same statistics for the exit times.
To support this conjecture we have considered Markov chains with a simple topology, that is, translation-invariant Markov chains on a ring of size $N$ with periodic boundary conditions, for which the invariant distribution is uniform and the transition matrix is such that
$$G_{i,i+1} = \frac{1+\epsilon}{2}, \qquad G_{i,i-1} = \frac{1-\epsilon}{2},$$
and $G_{ij} = 0$ otherwise. In Fig. 8 we show the comparison between the distributions of the rescaled exit times, $\tau_r = (\tau - \langle\tau\rangle)/\sigma_\tau$, in the equilibrium ($\epsilon = 0$) and out-of-equilibrium ($\epsilon \neq 0$) cases, for two different chains with $N = 4$ (left) and $N = 5$ (right) states. As might be expected, for small values of $\epsilon$ (top panels) it is very difficult to distinguish a distribution originating from an equilibrium process from one determined by an out-of-equilibrium process. More surprisingly, also in the case $G_{i,i-1} = 0$ (completely irreversible process) the distribution of the rescaled exit times is not too dissimilar from its equilibrium counterpart (see bottom panels).
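The exit-time distributions just described can be computed exactly by matrix powers; the sketch below assumes the ring parametrization given above, with the pure macrostate placed on microstate 0.

```python
import numpy as np

def exit_time_dist(N, eps, nmax=2000):
    """P(tau = n) for the time spent in the macrostate {1, ..., N-1}
    of a ring chain with G[i,i+1] = (1+eps)/2, G[i,i-1] = (1-eps)/2
    (assumed parametrization), entered from the pure microstate 0."""
    p, q = (1 + eps) / 2, (1 - eps) / 2
    Q = np.zeros((N - 1, N - 1))        # dynamics restricted to 1..N-1
    for i in range(N - 1):
        if i + 1 < N - 1:
            Q[i, i + 1] = p
        if i - 1 >= 0:
            Q[i, i - 1] = q
    r = np.zeros(N - 1); r[0], r[-1] = q, p   # jumps back to state 0
    v = np.zeros(N - 1); v[0], v[-1] = p, q   # entry distribution
    dist = np.empty(nmax)
    w = v.copy()
    for n in range(nmax):
        dist[n] = w @ r                 # leave exactly at step n+1
        w = w @ Q
    return dist

d_eq = exit_time_dist(5, 0.0)    # equilibrium ring
d_neq = exit_time_dist(5, 0.3)   # out-of-equilibrium ring
```

Rescaling each distribution by its mean and standard deviation reproduces the kind of comparison shown in Fig. 8: the two curves are distinct but strikingly close.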

VI. CONCLUSIONS
In this paper we have presented results whose mathematical aspects are in part already present in the (physical or mathematical) literature [20][21][22][23], quite scattered in time and not widely known, i.e. that one-dimensional Gaussian data, even when non-Markovian and coming from a system which is out of thermodynamic equilibrium, are always indistinguishable from equilibrium data. In all the papers where we have found traces of this result, the authors neither draw conclusions about the inference problem nor propose strategies to circumvent the observed obstacles: in the present paper we do both.
After discussing the problem in its full generality, we have given concrete examples where 1d data coming from equilibrium and non-equilibrium systems are indistinguishable, even if an observation in full phase space shows very strong differences. This result appears more surprising when looking at quantities such as bridges, which are strongly asymmetric in the full phase space of an out-of-equilibrium system, and remain asymmetric if the bridge observed in full phase space is projected onto 1d, but completely lose their asymmetry when they are constructed directly in the reduced (1d) space. We have discussed a general demonstration of the problem, which amounts to the fact that correlations, which are the only information contained in Gaussian-distributed data, contain an entangled product of quantities related to both memory and noise. Disentangling these two ingredients would allow us to check the fluctuation-dissipation relation of the second kind [16], but in 1d this is indeed impossible. An additional conclusion that can be drawn from this observation is that linear response cannot be deduced, in general, from correlations in 1d non-Markovian systems. We underline that these negative results bear a certain degree of surprise if one expects the analogy with deterministic systems to hold. In deterministic systems, in fact, the reconstruction of all the properties of a system in dimension larger than 1 can be done starting from a (long enough) time series of a 1d observable (embedding technique [3,4]). Our discussion, therefore, is a further convincing proof that for stochastic systems the embedding idea is not going to work, in general.

Appendix A: Entropy production
Here we briefly recall the fact that a partial observation cannot overestimate the EP of a system, be it linear, non-linear, Markovian or non-Markovian. We first wish to show an example of how taking longer, but incomplete, observations can improve our knowledge of a deterministic system (approaching, for large times, the knowledge of the full phase space, as in the Takens embedding theorem), while this is not true for stochastic systems. Consider a deterministic process in discrete time, so that at time $i$ it is in state $\gamma_i$. When the time goes from 1 to $t$, the process generates the path $\Gamma_t = \gamma_1 \dots \gamma_t$, which is fully determined by the initial state $\gamma_1$. One observes the system with limited precision, meaning that instead of the path $\Gamma_t$ the observer sees a path $\Omega_t = \omega_1 \dots \omega_t$: the lack of precision lies in the fact that many different $\Gamma_t$ correspond to the same observed $\Omega_t$, and we can define the set $C(\Omega_t)$ which contains all the paths $\Gamma_t$ compatible with the observation $\Omega_t$. When a new state $\omega_{t+1}$ is observed, the number of compatible $\Gamma_{t+1}$ can remain the same or decrease; it cannot grow, because the new observation is an additional constraint on a fixed piece of information (the initial state $\gamma_1$). So, observing longer and longer paths $\Omega_t \to \Omega_{t+1} \to \Omega_{t+2}$ implies smaller and smaller sets of compatible states, $C(\Omega_t) \supseteq C(\Omega_{t+1}) \supseteq C(\Omega_{t+2}) \supseteq \dots$. This suggests that, by increasing the length $t$ of the observed path $\Omega_t$, a long enough time series of a partial observation should become equivalent to the observation of the system in full phase space. Now imagine repeating the reasoning above for a stochastic process.
Each new observation $\omega_{t+1}$ does not have the same power as in the deterministic case: in fact the underlying new state $\gamma_{t+1}$ is not exactly determined by the previous history, therefore $\omega_{t+1}$ is an additional constraint on a string $\Gamma_{t+1}$ which also contains (in general) more information than $\Gamma_t$: there is no reason to expect $C(\Omega_t) \supseteq C(\Omega_{t+1})$, and therefore it is not true, in general, that a longer partial observation helps in inferring features of the full phase space. This fact has been observed for continuous systems in continuous time in [12].
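The contrast between the two situations can be illustrated with a toy counting experiment (the non-invertible map, the ring walk and the parity coarse-graining below are hypothetical choices made for the sketch):

```python
N = 8
obs = lambda s: s % 2          # coarse-graining: parity only

# --- deterministic case: a hypothetical non-invertible map s -> s//2
det_map = lambda s: s // 2

def det_compatible(omega):
    """Initial states whose orbit reproduces the observed sequence."""
    good = []
    for s0 in range(N):
        s, ok = s0, True
        for w in omega:
            if obs(s) != w:
                ok = False
                break
            s = det_map(s)
        if ok:
            good.append(s0)
    return good

s, omega_det = 5, []
for _ in range(6):
    omega_det.append(obs(s))
    s = det_map(s)
sizes = [len(det_compatible(omega_det[:t])) for t in range(1, 7)]

# --- stochastic case: nearest-neighbour ring walk, same coarse-graining
def stoch_compatible_count(omega):
    """Number of microscopic paths compatible with the observations."""
    counts = {s: 1 for s in range(N) if obs(s) == omega[0]}
    for w in omega[1:]:
        new = {}
        for s, c in counts.items():
            for s2 in ((s + 1) % N, (s - 1) % N):
                if obs(s2) == w:
                    new[s2] = new.get(s2, 0) + c
        counts = new
    return sum(counts.values())

omega_st = [1, 0, 1, 0, 1, 0]   # parity alternates on the ring
paths = [stoch_compatible_count(omega_st[:t]) for t in range(1, 7)]
```

Here `sizes` is non-increasing (the deterministic set of compatible initial conditions can only shrink), while `paths` grows with the observation length: longer partial observations do not pin down the stochastic microscopic trajectory.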
The lack of information about the full phase space affects entropy production in the following way. Defining $\Gamma^*_t$ as the time-reversed path, the average entropy production measured in a time length $t$ is
$$S^\Gamma_t = \sum_{\Gamma_t} P(\Gamma_t) \ln \frac{P(\Gamma_t)}{P(\Gamma^*_t)}; \tag{A1}$$
when we observe the system with lower precision we can only measure
$$S^\Omega_t = \sum_{\Omega_t} P(\Omega_t) \ln \frac{P(\Omega_t)}{P(\Omega^*_t)}, \tag{A2}$$
where the notation $\Omega_t[\Gamma_t]$ denotes the coarse-grained path $\Omega_t$ corresponding to the real path $\Gamma_t$. So in general $S^\Omega_t \neq S^\Gamma_t$. Most importantly (and perhaps not noticed before), in view of the previous considerations, there is apparently no reason to expect, for stochastic systems, that increasing $t$ may let $S^\Omega_t \to S^\Gamma_t$. In [23] (Sec. IIIC) a simple demonstration is given, for a particular kind of coarse-graining (from 2d to 1d) and for continuous stochastic processes, that $S^\Omega_t \le S^\Gamma_t$. The passages can be generalised:
$$S^\Gamma_t - S^\Omega_t = \sum_{\Gamma_t} P(\Gamma_t) \ln \frac{P(\Gamma_t)}{Q(\Gamma_t)} \ge 0.$$
The validity of the interpretation as a Kullback-Leibler divergence (which is non-negative) is guaranteed by the fact that $Q(\Gamma_t) = P(\Gamma^*_t)\, P(\Omega_t[\Gamma_t]) / P(\Omega_t[\Gamma_t]^*)$ is positive and normalised:
$$\sum_{\Gamma_t} Q(\Gamma_t) = \sum_{\Gamma_t} P(\Gamma_t)\, \frac{P(\Omega_t[\Gamma_t]^*)}{P(\Omega_t[\Gamma_t])} = \sum_{\Omega_t} P(\Omega^*_t) = 1,$$
where we used the change of variable $\Gamma_t \to \Gamma^*_t$. The last passage requires that $\Omega_t[\Gamma^*_t] = (\Omega_t[\Gamma_t])^*$ [79].

Appendix B: Linear systems
In this Appendix we give a complete treatment of linear systems of integro-differential stochastic equations with noise correlated in time, which should cover the largest possible set of stochastic (Markovian and non-Markovian) processes with Gaussian statistics. Of course the topic is widely treated in the literature, both in probability and in physics, starting from the seminal paper of Uhlenbeck and Ornstein in 1930 [80], up to modern books on stochastic processes such as [81] and [82], which treat in detail response theory, time-dependent problems and the consequences of detailed balance, focusing on the Markovian case. The non-Markovian case is much less of a textbook case and for this reason we decided to review the topic in a compact and general way.

Correlation and response
We consider the vector process $x(t)$ that obeys the equation
$$L x = B \xi + h,$$
where $h(t) = \{h_i(t)\}_{i=1,D}$ is a set of external fields and $\xi(t) = \{\xi_\alpha(t)\}_{\alpha=1,d} \in \mathbb{R}^d$ is a vector of colored, normally distributed random noise with zero mean $\langle \xi_\alpha(t)\rangle = 0$ and covariance matrix $\langle \xi_\alpha(t)\xi_\beta(t')\rangle = \nu_{\alpha\beta}(t-t')$, depending on the times $t$ and $t'$ only through the difference $t - t'$. We have also introduced $L = \{L_{ij}\}_{i,j=1,D}$ and $B = \{B_{i\alpha}\}_{i=1,D;\,\alpha=1,d}$, two matrices whose elements are linear combinations of operators that act on single variables by multiplying, differentiating or integrating them in a convolution, i.e. $(Af)(t) = \int dt'\, A(t-t') f(t')$, where the kernels $A(t)$ must decay fast enough to make the integral over $t' \in (-\infty, +\infty)$ finite [83].
The above equation can also be written, component by component, as
$$\sum_j L_{ij}\, x_j(t) = \sum_\alpha B_{i\alpha}\, \xi_\alpha(t) + h_i(t). \tag{B2}$$
In this framework we consider only real functions, and it is useful to work in Fourier space (the overline denotes the complex conjugate) and to define the adjoint $A^\dagger$ and the inverse $A^{-1}$ (if it exists) of a generic operator $A$, paying attention to the difference between $A^\dagger(\omega)$ (the Fourier transform of the operator $A^\dagger(t)$) and $A(\omega)^\dagger$ (the transposed conjugate of the matrix $A(\omega)$), and between $A^{-1}(\omega)$ (the Fourier transform of $A^{-1}(t)$) and $A(\omega)^{-1}$ (the inverse of the matrix $A(\omega)$). Now we are ready to look at equation (B2) in a compact way. If we assume that $x(t)$ is known from $t = -\infty$ up to $t = +\infty$, in Fourier space it obeys
$$L(\omega)\, x(\omega) = B(\omega)\, \xi(\omega) + (2\pi)^{-1/2}\, h(\omega).$$
In this way the solution is
$$x(\omega) = \sqrt{2\pi}\, G(\omega)\, \xi(\omega) + R(\omega)\, h(\omega),$$
where $G(\omega) = R(\omega) B(\omega)$ and $\sqrt{2\pi}\, R(\omega) = L(\omega)^{-1}$, so, by computing the mean $m_i(t)$ and the time-correlation functions, we find that the linear response function $\partial \langle x_i(t)\rangle / \partial h_j(t')|_{h=0}$ is just $R_{ij}(t-t')$ and that the time correlation function $C(\omega)$ in Fourier space reads
$$C(\omega) = 2\pi\, G(\omega)\, \nu(\omega)\, G(\omega)^\dagger.$$
Since only linear operators are involved, and assuming Gaussian-distributed (or delta-peaked) initial conditions, every (joint or conditional) probability distribution of a sequence of $m$ observations $\{x(t_1) = x_1, x(t_2) = x_2, \dots, x(t_m) = x_m\}$ is a multivariate Gaussian. It is important to stress that, even if $x(t)$ is non-Markovian, when $h = 0$ the knowledge of $C(t)$ is sufficient to reconstruct all such probabilities and therefore the full path probabilities too.
In particular, the joint probability distribution of $m$ observations, in the case $h = 0$, reads
$$P\big(x(t_1) = x_1, \dots, x(t_m) = x_m\big) = \mathcal{N}_{\mathcal{C}}(x_1, \dots, x_m),$$
where the multivariate Gaussian for a generic vector $z$ in $n$ dimensions with covariance matrix $A$ is
$$\mathcal{N}_A(z) = \frac{\exp\left(-\tfrac12\, z^T A^{-1} z\right)}{\sqrt{(2\pi)^n \det A}},$$
and $\mathcal{C}$ is an $mD \times mD$ matrix composed of blocks given by the two-time correlation matrix $C$ evaluated at the time differences between all the observations, $\mathcal{C}_{kl} = C(t_k - t_l)$.

Detailed balance condition
Now we discuss the detailed balance condition (when $h = 0$), considering the difference between the joint probability $P(x(t_0) = x_0, x(t_1) = x_1, \dots, x(t_{m-1}) = x_{m-1})$ and the joint probability of the "reverse path", $P(x(t_0) = S x_{m-1}, x(t_1) = S x_{m-2}, \dots, x(t_{m-1}) = S x_0)$, where $S$ is the diagonal matrix of the time-reversal parities of the variables. The two probabilities are equal if and only if $\mathcal{C} = \tilde{\mathcal{C}}$, where $\tilde{\mathcal{C}}$ is the covariance matrix of the reversed sequence. For the validity of such a condition for whatever choice of the observation times, one needs $S C(t) S = C(-t) = C(t)^T$ or, in Fourier space, $S C(\omega) S = C(\omega)^T$. This is analogous to the renowned Onsager reciprocity relation [84]. Actually, the closest equivalent to the original Onsager reciprocity is obtained by taking the time derivative of such a relation and computing it at $t = 0$, i.e. $S \mathcal{L} S = \mathcal{L}^T$, where $\mathcal{L} = \dot C(0)$ is the Onsager matrix (see Chapter 5.3 of Gardiner's book [81]). We can also compute the Kullback-Leibler divergence $D_m$ between the above forward and reverse probabilities of $m$-paths, in order to mimic entropy-production features (note: the true entropy production rate is typically computed for the continuous path probabilities, i.e. taking infinite observations at infinitesimal time delays):
$$D_m = \frac12 \left[ \mathrm{Tr}\big(\tilde{\mathcal{C}}^{-1} \mathcal{C}\big) - mD + \ln \frac{\det \tilde{\mathcal{C}}}{\det \mathcal{C}} \right].$$
From the above considerations one main thing is evident: for 1-dimensional systems it is immediate to verify that, since $C(t) = C(-t)$, we have $D_m = 0$, i.e. all groups of $m$ observations have identical forward and backward probabilities. This result is true whatever the operators in the original equation, i.e. $\forall\, L, B$ and $\nu$.
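A direct numerical check of these statements can be sketched as follows, for an arbitrarily chosen non-equilibrium 2d Ornstein-Uhlenbeck process with even variables ($S = \mathbb{1}$); all parameter values are assumptions made for the example.

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

A = np.array([[2.0, -1.0], [-0.5, 1.5]])   # illustrative drift
Sigma = np.diag([2.0, 6.0])                # 2*T_i, two temperatures
C0 = solve_continuous_lyapunov(A, Sigma)   # A C0 + C0 A^T = Sigma

def C(t):
    """Two-time correlation matrix C(t); C(-t) = C(t)^T."""
    return expm(-A * t) @ C0 if t >= 0 else (expm(A * t) @ C0).T

def block_cov(times, corr, d):
    m = len(times)
    M = np.zeros((m * d, m * d))
    for k, tk in enumerate(times):
        for l, tl in enumerate(times):
            M[k*d:(k+1)*d, l*d:(l+1)*d] = np.atleast_2d(corr(tk - tl))
    return M

def D_m(times, corr, d):
    """KL divergence between the forward and time-reversed
    zero-mean Gaussian m-point distributions."""
    Cf = block_cov(times, lambda t: corr(t), d)
    Cb = block_cov(times, lambda t: corr(-t), d)   # reversed covariance
    _, ldf = np.linalg.slogdet(Cf)
    _, ldb = np.linalg.slogdet(Cb)
    n = Cf.shape[0]
    return 0.5 * (np.trace(np.linalg.solve(Cb, Cf)) - n + ldb - ldf)

times = [0.0, 0.4, 1.1]
D_full = D_m(times, C, 2)                          # full phase space
D_scal = D_m(times, lambda t: C(abs(t))[0, 0], 1)  # single variable
```

For these parameters `D_full` is strictly positive, while `D_scal` vanishes identically, whatever the observation times: the scalar projection of the non-equilibrium process is statistically reversible.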

A Markovian case
Equations like $Lx = B\xi$ can arise from a genuine multidimensional Ornstein-Uhlenbeck process once we observe a single dynamical variable or a linear combination of them. Let us consider the following stochastic process in $D$ dimensions,
$$\dot x + A x = B \xi, \qquad \langle \xi_\alpha(t)\xi_\beta(t')\rangle = \nu_{\alpha\beta}\, \delta(t-t'),$$
where $A$ is an invertible and positive-definite $D \times D$ real matrix and $\nu$ is the covariance matrix of the noise. It is convenient to rewrite the equation in the reference frame that has the eigenstates of the symmetric matrix $B\nu B^T$ as a basis. In this way we can interpret the contribution of the noise in terms of something analogous to the temperatures $T_i$ of $D$ thermal baths, i.e.
$$B \nu B^T = U \Sigma U^T, \qquad \Sigma_{ij} = 2 T_i\, \delta_{ij},$$
so, by performing the substitutions $U^T x \to x$, $U^T A U \to A$ and $U^T B\xi \to \xi$, we obtain the effective process
$$\dot x + A x = \xi, \qquad \langle \xi_i(t)\xi_j(t')\rangle = 2 T_i\, \delta_{ij}\, \delta(t-t'),$$
from which we verify that the statistical independence between the effective thermal baths, $\langle \xi_i \xi_j \rangle \sim \delta_{ij}$, is not an approximation. By direct integration we obtain the following expressions for the response $R(t)$ and the time-correlation matrix $C(t)$:
$$R(t) = \theta(t)\, e^{-A t}, \qquad C(t) = e^{-A t} C(0) \;\; (t \ge 0), \qquad C(-t) = C(t)^T,$$
where $C(0)$ solves $A C(0) + C(0) A^T = \Sigma$. Since the process is Markovian (and given the positivity of $A$), it has an invariant measure, and it is easy to verify that the single-time normal distribution $\mathcal{N}_{C(0)}(x)$ is the stationary solution of the following Fokker-Planck equation:
$$\partial_t P(x,t) = \sum_i \partial_i \big[(A x)_i P\big] + \sum_i T_i\, \partial_i^2 P.$$
In Fourier space it is convenient to look at the expression obtained by rationalizing the denominators of $C(\omega)$; by comparing it with its transpose one finds that detailed balance requires $A C = C A^T$, and hence
$$A C = C A^T \iff A \Sigma = \Sigma A^T, \quad \text{i.e.} \quad A_{ij} T_j = A_{ji} T_i.$$
Interestingly, if the structure of $A$ is not appropriate, equilibrium with $D$ effective thermal baths is impossible. In fact, assuming all the relevant elements $A_{ij}$ to be non-zero, the condition $A_{ij} T_j = A_{ji} T_i$ can be satisfied by positive temperatures only if
$$A_{ij} A_{jk} A_{ki} = A_{ik} A_{kj} A_{ji}.$$
On the other hand, the equilibrium condition implies purely imaginary poles in $C(\omega)$. In fact, if $T_i > 0\ \forall i$, since $A\Sigma = \Sigma A^T$, we can diagonalize the symmetric matrix $\tilde A = \Sigma^{-\frac12} A\, \Sigma^{\frac12} = \tilde A^T = U \Lambda U^T$, derive the eigenstates of $A = \Sigma^{\frac12} U \Lambda U^T \Sigma^{-\frac12} = V \Lambda V^{-1}$ and verify, thanks to the spectral theorem for symmetric matrices, that all its eigenvalues are real, so that $C(t)$ is a sum of purely real exponentials. In the case where some $T_i = 0$, we simply separate the two types of variables and look at the corresponding blocks of the matrices: the condition $A\Sigma = \Sigma A^T$ then forces the block $\lambda$ coupling the noiseless variables to the noisy ones to vanish ($\lambda = 0$) and the block $\gamma$ acting on the noisy variables to have real eigenvalues, since $\gamma \Sigma' = \Sigma' \gamma^T$, so that $C(\omega)$ has purely imaginary poles only. Note that the equilibrium condition $A\Sigma = \Sigma A^T$ also implies the familiar fluctuation-response theorem for equilibrium systems, i.e.
$$R_{ij}(t) = -\theta(t)\, \frac{\dot C_{ij}(t)}{T_j} \qquad \big(\theta(0) = \tfrac12\big).$$
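Both the reality of the spectrum and the fluctuation-response relation can be verified numerically for a drift built to satisfy $A_{ij} T_j = A_{ji} T_i$ (the numerical values below are illustrative assumptions):

```python
import numpy as np
from scipy.linalg import expm

T = np.array([1.0, 3.0])               # bath temperatures (assumed)
A = np.array([[2.0, 0.3],
              [0.0, 1.0]])
A[1, 0] = A[0, 1] * T[1] / T[0]        # enforce A_ij T_j = A_ji T_i

# At equilibrium C(0) = A^{-1} diag(T) solves A C0 + C0 A^T = diag(2T)
C0 = np.linalg.inv(A) @ np.diag(T)

# 1) the spectrum of A is purely real
assert np.allclose(np.linalg.eigvals(A).imag, 0.0)

# 2) fluctuation-response: R_ij(t) = -dC_ij/dt / T_j for t > 0
for t in (0.2, 0.7, 1.5):
    R = expm(-A * t)                   # response matrix
    Cdot = -A @ expm(-A * t) @ C0      # time derivative of C(t)
    assert np.allclose(R, -Cdot / T[None, :])
```

The check exploits the closed form $C(0) = A^{-1}\,\mathrm{diag}(T)$, which is symmetric precisely when the equilibrium condition holds.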
The computation is considerably simplified in the case of a "symplectic" stochastic dynamics,
$$M \ddot x + \Gamma \dot x + K x = B \xi, \qquad x \in \mathbb{R}^D, \qquad M, \Gamma, K, B, \nu \in \mathbb{R}^{D\times D},$$
for which we can forget that $x$ and $\dot x$ have opposite time-reversal parity simply by considering the second-order stochastic equation and the time-correlation function $C_{ij}(t-t') = \langle x_i(t) x_j(t')\rangle$, which involves just the $x$ components. In fact, if $M$ is invertible, we are authorized to simplify by means of the substitutions $M^{-1}\Gamma \to \Gamma$, $M^{-1}K \to K$ and $M^{-1}B\xi \to \xi$, where, as in the general case, we put ourselves in the reference frame that diagonalizes $B\nu B^T = U\Sigma U^T$. In this way we get
$$\ddot x + \Gamma \dot x + K x = \xi, \qquad \langle \xi_i(t)\xi_j(t')\rangle = 2 T_i\, \delta_{ij}\, \delta(t-t').$$
As in the general case, by rationalizing and imposing $C(\omega) = C(\omega)^T$, we are able to prove that equilibrium holds when both the conditions $\Gamma\Sigma = \Sigma\Gamma^T$ ($\Gamma_{ij} T_j = \Gamma_{ji} T_i$) and $K\Sigma\Gamma^T = \Gamma\Sigma K^T$ are simultaneously satisfied. Even in this case we have the further condition $\Gamma_{ij}\Gamma_{jk}\Gamma_{ki} = \Gamma_{ik}\Gamma_{kj}\Gamma_{ji}$, without which equilibrium with $D$ thermal baths is impossible, and, by following the same procedure as in the general case, $\Gamma$ has real eigenvalues only. It is possible to prove that $K$ also has real eigenvalues: it is sufficient to make explicit the eigenstates of $\Gamma = \Sigma^{\frac12} U \Lambda U^T \Sigma^{-\frac12}$ and to note that the condition $K\Sigma\Gamma^T = \Gamma\Sigma K^T$ implies that the matrix $\tilde K = \Lambda^{-\frac12} U^T \Sigma^{-\frac12} K \Sigma^{\frac12} U \Lambda^{\frac12}$ must be symmetric, from which we immediately deduce that the eigenvalues of $K$ are real too.

One variable from a Markovian system
We now look at a single scalar component $y$ of a generic multidimensional Ornstein-Uhlenbeck process. We showed that, barring pathological situations, the linearity of the equation allows us to suitably scale the dynamical variables and to look at the stochastic process in the reference frame for which the covariance matrix of the noise is diagonal and the coefficient in front of the highest-order derivative is unitary. By focusing on this "effective" process, we minimized the number of parameters involved and simplified the calculations related to the equilibrium conditions, but we lost the identity, as a component of the state vector, of the original dynamical variable $y$ which we are observing. However, $y$ will surely be a linear combination of the components of the effective process, so we can still derive some general considerations about the properties of $y$ just by looking at the time-correlation function of a generic linear combination $y = \sum_i a_i x_i$ of the roto-scaled dynamical variables $x_i$, i.e.
$$C_y(t) = \sum_{ij} a_i a_j\, C_{ij}(t).$$
In this way it is evident, as expected, that a stochastic equation satisfied by $y$ can be formally written as a scalar integro-differential equation of the form (B2). Note that the equation satisfied by the $n$-th derivative of $y$, $z = \partial^n y/\partial t^n$, since $z(\omega) = (i\omega)^n y(\omega)$ in Fourier space, simply reads the same up to a factor $(i\omega)^n$, from which we immediately derive that $C_z(\omega) = \omega^{2n} C_y(\omega)$. In Fourier space, moreover, $C_y(\omega) = \sum_i T_i f_i(\omega)$, where the rational functions $f_i(\omega)$ depend on the drift only. By using the residue theorem and the complex-conjugate-root theorem, which states that the non-real roots of a polynomial with real coefficients always appear in pairs of complex conjugates (this implies $f_i(-z) = f_i(z)$ and $\overline{f_i(z)} = f_i(\bar z)\ \forall z \in \mathbb{C}$), we are able to compute $C(t)$. First of all, the functions $f_i(\omega)$ have the same denominator and hence the same $2D$ singularities in the complex plane: if $\mathrm{Im}\,\lambda_\alpha = 0$ we have two purely imaginary poles at $i\lambda_\alpha$ and $-i\lambda_\alpha$, while if $\mathrm{Im}\,\lambda_\alpha \neq 0$ we have four complex poles at $\pm i\lambda_\alpha$ and $\pm i\bar\lambda_\alpha$.
We indicate with $\mu_\alpha$ the set of $D$ poles that have a positive imaginary part; then, for $t > 0$, the residue theorem gives
$$C_y(t) = \sum_{i\alpha} T_i\, c_{i\alpha}\, e^{i\mu_\alpha t}.$$
Finally, the properties of the functions $f_i(\omega)$ and the simultaneous presence of the poles $\mu_\alpha$ and $-\bar\mu_\alpha$ imply that $C_y(t) = \sum_\alpha c_\alpha e^{i\mu_\alpha |t|}$ is real and even, with $c_\alpha = \sum_i T_i c_{i\alpha}$. If the equilibrium condition $A_{ij} T_j = A_{ji} T_i$ holds (or $\Gamma_{ij} T_j = \Gamma_{ji} T_i$ in the symplectic case), it means that we can write all the temperatures $T_i$ as functions of only one of them, for example $T_1 = T$ and $T_i = T A_{i1}/A_{1i}$ ($= T \Gamma_{i1}/\Gamma_{1i}$) $= T g_i$, and then
$$C_y(t) = T \sum_\alpha d_\alpha\, e^{i\mu_\alpha |t|},$$
where the $d_\alpha$ depend on the drift only. In other words: equilibrium implies a single effective thermal bath.

Appendix C: Numerical integration
To perform a numerical integration of the equations of motion, one could use one of the standard algorithms for stochastic differential equations, such as the stochastic Euler or Runge-Kutta methods, to give two examples. However, these methods require integration time steps much smaller than the characteristic times of the system. In the case of the Ornstein-Uhlenbeck process, instead, it is possible to use an exact algorithm. We consider the Cauchy problem
$$\dot x + A x = \xi, \qquad x(0) = x_0, \qquad \langle \xi_i(t)\xi_j(t')\rangle = 2 T_i\, \delta_{ij}\, \delta(t-t'),$$
for which the formal solution after a time step $\Delta t$ is
$$x(t + \Delta t) = R(\Delta t)\, x(t) + w, \qquad R(\Delta t) = e^{-A \Delta t},$$
where $w$ is a Gaussian random vector with zero mean and covariance
$$H(\Delta t) = \langle w w^T \rangle = \int_0^{\Delta t} ds\; e^{-A s}\, \Sigma\, e^{-A^T s}, \qquad \Sigma_{ij} = 2 T_i\, \delta_{ij}.$$
In general $H(\Delta t)$ is obtained through a (possibly complex) diagonalization, but the spectral theorem for symmetric matrices assures us that it can be decomposed as $H(\Delta t) = U M(\Delta t) U^T$, where $U$ is a unitary matrix and $M(\Delta t)_{ij} = \sigma_i^2 \delta_{ij}$ is a real diagonal matrix with non-negative entries. This suggests that, once $\Delta t$ is fixed, the matrices $A$ and $H(\Delta t)$ are diagonalized and $R(\Delta t) = V e^{-\Lambda \Delta t} V^{-1}$ is computed, it is possible to implement an exact integration algorithm by executing the following instructions step by step.
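A minimal sketch of the resulting algorithm follows; instead of the explicit diagonalization it uses the equivalent identity $H(\Delta t) = C(0) - R(\Delta t)\, C(0)\, R(\Delta t)^T$, with $C(0)$ the stationary covariance, and all parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

rng = np.random.default_rng(0)

A = np.array([[2.0, -1.0], [-0.5, 1.5]])   # drift (assumed values)
T = np.array([1.0, 3.0])                   # bath temperatures
Sigma = np.diag(2 * T)
dt = 0.1

F = expm(-A * dt)                          # exact propagator R(dt)
C0 = solve_continuous_lyapunov(A, Sigma)   # A C0 + C0 A^T = Sigma
H = C0 - F @ C0 @ F.T                      # exact one-step noise covariance
Lc = np.linalg.cholesky(H)                 # to sample the Gaussian kick

def step(x):
    """One exact update over a time dt, whatever the size of dt."""
    return F @ x + Lc @ rng.standard_normal(len(x))
```

As a consistency check, the one-step covariance recursion $P \to F P F^T + H$ converges to $C(0)$ from any initial condition, as it must for an exact discretization.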