Inferring nonequilibrium thermodynamics from tilted equilibrium using information-geometric Legendre transform

Nonstationary thermodynamic quantities depend on the full details of nonstationary probability distributions, making them difficult to measure directly in experiments and numerics. We propose a method to infer thermodynamic quantities in relaxation processes by measuring only a few observables, using additional information obtained from measurements in tilted equilibrium, i.e., equilibrium with external fields applied. Our method is applicable to arbitrary classical stochastic systems, possibly underdamped, that relax to equilibrium. The method allows us to compute the exact value of the minimum entropy production (EP) compatible with the nonstationary observations, giving a tight lower bound on the true EP. Under a certain additional condition, it also allows the inference of the EP rate, thermodynamic forces, and a constraint on relaxation paths. Our method uses a Legendre transform of EP at the level of probability distributions, which we develop based on a similar Legendre transform in information geometry.


I. Introduction
Relaxation processes are ubiquitous in nature, and they pass through various nonstationary probability distributions before relaxing to the final stationary distribution. For example, biological systems such as biochemical signaling pathways [1,2] and neurons [3,4] respond to external signals in a transient manner to convey information. Relaxation processes also include various nontrivial physical phenomena, such as nonmonotonic relaxations [5–7], slow glassy relaxations [8], and sudden cutoff relaxations [9,10].
These processes are out of equilibrium and thus have inevitable thermodynamic costs. According to stochastic thermodynamics [11–13], thermodynamic costs such as entropy production (EP) and thermodynamic forces depend on the full details of nonstationary probability distributions. Therefore, to obtain thermodynamic quantities directly in experiments and numerical simulations, we need to measure the full probability distribution at each time point by accumulating many realizations of the same relaxation process and computing histograms. This nonstationary measurement is practically impossible for systems with more than a few states.
In this paper, we propose a method of thermodynamic inference for relaxation processes that uses measurements in tilted equilibrium, i.e., the equilibrium under external fields applied to the system. Our approach combines the nonstationary measurement of a few observables with the tilted equilibrium measurement of the same set of observables. From these data, our method allows us to compute the exact value of the minimum EP compatible with the nonstationary data, which constitutes a tight lower bound on the true EP over the relaxation from any intermediate distribution to the final equilibrium. Moreover, if the system satisfies a condition that we call the realizability condition, which states that the nonstationary distribution is exactly realized as a tilted equilibrium, our method provides additional information about the process: the exact value of the true EP, the instantaneous EP rate, the nonstationary thermodynamic forces, and a constraint on relaxation trajectories. Our method applies to arbitrary classical stochastic systems relaxing to equilibrium, including overdamped and underdamped systems with continuous or discrete state spaces.
This paper is organized as follows. In Sec. II, we state the setup and define the problem. Section III is the main section of this paper, where we introduce the tilted equilibrium measurements and establish the inference method for the minimum EP. In Sec. IV, we numerically demonstrate the proposed method with a one-dimensional Brownian particle. Section V develops additional inference methods under the realizability condition. In Sec. VI, we sketch the derivation of the results; in doing so, we develop a Legendre transform at the level of probability distributions based on a similar Legendre transform in information geometry [47,48]. Section VII concludes the paper.

A. Setup
We consider a stochastic system in contact with a single heat bath at a constant temperature $T$. The system moves stochastically in a continuous state space $X \subset \mathbb{R}^d$ or a discrete state space $X = \{1, \ldots, N\}$. Let $p(x)$ denote a probability density function in the continuous case and a probability mass function in the discrete case, which satisfies $\int_X dx\, p(x) = 1$. Here and hereafter, $\int_X dx$ should be replaced by $\sum_{x\in X}$ in discrete cases. The state $x$ has energy $\epsilon(x)$, which is assumed to be time independent. The state energy determines the equilibrium distribution $p^{\rm eq} \equiv \{p^{\rm eq}(x)\}_{x\in X}$ with $p^{\rm eq}(x) \propto \exp[-\epsilon(x)/k_B T]$, where $k_B$ is the Boltzmann constant.
The system undergoes a relaxation process $\tilde p(t) \equiv \{\tilde p(x;t)\}_{x\in X}$ for $t \ge 0$, where $t$ is time. For notational convenience, we use a tilde ($\tilde{\ }$) for quantities associated with the relaxation process of interest. We assume that the process converges to the equilibrium distribution, $\tilde p(t) \to p^{\rm eq}$ as $t \to \infty$. We do not assume any specific time-evolution law unless otherwise noted.
The fundamental assumption of this paper is that the details of $\tilde p(x;t)$ are not measurable, but we can measure the expectation values of a few observables $B_1, \ldots, B_K$. The number $K \ge 1$ is arbitrary and can be small, such as 2 or 3. An observable $B_\alpha \equiv \{B_\alpha(x)\}_{x\in X}$ is any real-valued function on $X$, and its expectation value over a probability distribution $p$ is $\langle B_\alpha\rangle_p := \int_X dx\, B_\alpha(x)\, p(x)$. We use the notation $B \equiv (B_1, \ldots, B_K)^{\rm T}$ and $\langle B\rangle_p \equiv (\langle B_1\rangle_p, \ldots, \langle B_K\rangle_p)^{\rm T}$. Without loss of generality, we assume that the $K$ observables are linearly independent of each other. We also assume that they are linearly independent of the constant observable (a trivial observable whose value does not depend on $x$), because such an observable gives no information about the process. We write the set of expectation values measured at time $t$ as
$$\tilde\eta(t) := \langle B\rangle_{\tilde p(t)}, \tag{1}$$
which is the only nonstationary data needed for our proposed method [Fig. 1(a)].
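The quantities in Eq. (1) and the independence assumptions are easy to check numerically for a discrete system. The following Python sketch assumes NumPy; the five-state system and the helper names `expectations` and `independent_of_constant` are hypothetical illustrations, not part of the method. It computes $\langle B\rangle_p$ for $B_1(x) = x$ and $B_2(x) = x^2$ and tests linear independence from the constant observable through a matrix rank.

```python
import numpy as np


def expectations(B, p):
    """Return eta = <B>_p for observables B (shape K x N) and a distribution p (shape N)."""
    return B @ p


def independent_of_constant(B):
    """True if B_1, ..., B_K and the constant observable are linearly independent."""
    K, N = B.shape
    M = np.vstack([np.ones(N), B])  # stack the constant observable on top of B
    return np.linalg.matrix_rank(M) == K + 1


x = np.arange(5.0)                  # hypothetical discrete states x = 0, ..., 4
B = np.vstack([x, x**2])            # observables B_1(x) = x, B_2(x) = x^2
p = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
eta = expectations(B, p)            # the data of Eq. (1) for this p
```

Stacking the constant observable on top of $B$ turns both independence assumptions into a single rank condition.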

B. EP and the minimum EP
FIG. 1. Schematics of the proposed inference method. (a) A system undergoing a relaxation process with a time-independent energy $\epsilon(x)$ and a nonstationary distribution $\tilde p(t)$. Our method requires only the expectation values of a few observables, $\tilde\eta(t) = \langle B\rangle_{\tilde p(t)}$, along the process. (b) To systematically collect data from tilted equilibrium measurements, we apply the external field $v(x;\theta)$ and measure the expectation values of the observables, $\eta(\theta) = \langle B\rangle_{p^{\rm te}(\theta)}$, for many values of $\theta$. We also compute the tilted equilibrium free energy $\Delta F(\theta)$ from the measured values of $\eta(\theta)$.

The fundamental quantity characterizing the thermodynamic cost of a relaxation process is EP [12]. For systems in contact with a single heat bath, the EP of the relaxation process from $\tilde p(t)$ to $p^{\rm eq}$ is given by $T^{-1}\Delta H[\tilde p(t)]$, where [12,49]
$$\Delta H[\tilde p(t)] := k_B T \int_X dx\,\big[\tilde p(x;t)\ln\tilde p(x;t) - p^{\rm eq}(x)\ln p^{\rm eq}(x)\big] + \int_X dx\,\epsilon(x)\big[\tilde p(x;t) - p^{\rm eq}(x)\big]. \tag{2}$$
The first term in Eq. (2) accounts for the change in the Shannon entropy of the system, while the second term gives the change in the bath entropy due to the heat flux. Since $T$ is constant throughout the paper, we abuse terminology and call $\Delta H[\tilde p(t)]$ the EP. The EP depends on the details of the probability distribution: to compute $\Delta H[\tilde p(t)]$ directly from Eq. (2), we need $\tilde p(x;t)$ for all $x$ and all $t$.
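For a discrete state space, Eq. (2) can be evaluated directly whenever the full distribution is known, which is exactly the situation the method avoids. The Python sketch below assumes NumPy and uses hypothetical helper names and made-up three-state energies; it evaluates Eq. (2) and checks that it equals $k_B T$ times the Kullback-Leibler divergence from $p$ to $p^{\rm eq}$, which makes its nonnegativity explicit.

```python
import numpy as np


def equilibrium(eps, kT=1.0):
    """Canonical distribution p_eq(x) proportional to exp[-eps(x)/kT] (discrete states)."""
    w = np.exp(-(eps - eps.min()) / kT)  # shift by min(eps) for numerical stability
    return w / w.sum()


def delta_H(p, eps, kT=1.0):
    """Eq. (2) for discrete states: T times the EP of relaxing from p to p_eq."""
    peq = equilibrium(eps, kT)
    m = p > 0                            # convention 0 ln 0 = 0
    entropy_term = kT * (np.sum(p[m] * np.log(p[m])) - np.sum(peq * np.log(peq)))
    energy_term = np.dot(eps, p - peq)   # heat released to the bath
    return entropy_term + energy_term


def kl(p, q):
    """Kullback-Leibler divergence D_KL[p || q] for discrete distributions."""
    m = p > 0
    return np.sum(p[m] * np.log(p[m] / q[m]))


eps = np.array([0.0, 1.0, 2.0])          # hypothetical energies
p = np.array([0.2, 0.3, 0.5])            # hypothetical nonstationary distribution
kT = 1.5
```

The identity $\Delta H[p] = k_B T\,D_{\rm KL}[p\|p^{\rm eq}]$ follows from inserting $\ln p^{\rm eq}(x) = -[\epsilon(x) - F^{\rm eq}]/k_B T$, with $F^{\rm eq}$ the equilibrium free energy, into Eq. (2).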
Since the measured data $\tilde\eta(t)$ are not sufficient to determine a single value of $\Delta H[\tilde p(t)]$, we follow the general concept in Ref. [23] and focus on the range of EP compatible with the data $\tilde\eta(t)$. This range has no upper limit because hidden (unobserved) degrees of freedom can cause an arbitrarily large EP [21]. On the other hand, it has a lower limit, which we write as
$$\Delta H_{\rm m}(\eta) := \min_{q:\,\langle B\rangle_q = \eta} \Delta H[q] \tag{3}$$
for any set of expectation values $\eta$. Here, the minimum is taken over all probability distributions $q$ that satisfy $\langle B\rangle_q = \eta$, i.e., that are compatible with the set of expectation values $\eta$. Since $\Delta H[q]$ is the dissipation of the relaxation process from $q$ to $p^{\rm eq}$, any relaxation process that exhibits the set of expectation values $\eta$ at some point in time must dissipate at least $\Delta H_{\rm m}(\eta)$ before relaxing to the final equilibrium. For this reason, $\Delta H_{\rm m}(\eta)$ is interpreted as the fundamental cost associated with the set of expectation values $\eta$.

A. Tilted equilibrium
Our main result is a method for calculating the minimum EP $\Delta H_{\rm m}(\eta)$ from tilted equilibrium measurements. The method uses a family of external fields parameterized by $\theta = (\theta_1, \ldots, \theta_K)^{\rm T}$:
$$v(x;\theta) := \sum_{\alpha=1}^K \theta_\alpha B_\alpha(x). \tag{4}$$
The external field $v(x;\theta)$ is the superposition of fields proportional to the observables $B_\alpha$ with intensities $\theta_\alpha$. Applying it induces the tilted equilibrium $p^{\rm te}(\theta) \equiv \{p^{\rm te}(x;\theta)\}_{x\in X}$ with
$$p^{\rm te}(x;\theta) \propto \exp\{-[\epsilon(x) - v(x;\theta)]/k_B T\}. \tag{5}$$
Here, we use the sign convention that the external field modifies the state energy from $\epsilon(x)$ to $\epsilon(x) - v(x;\theta)$. We propose the following procedure for collecting tilted equilibrium data to perform the thermodynamic inference systematically [Fig. 1(b)]:
1. Realize the tilted equilibrium distributions for many sets of parameter values $\theta$ and measure the expectation values of the observables $B$,
$$\eta(\theta) := \langle B\rangle_{p^{\rm te}(\theta)}, \tag{6}$$
for each $\theta$.
2. Interpolate the measured expectation values to infer the functional dependence $\eta(\theta)$ over a range of $\theta$.
3. Calculate the tilted equilibrium free energy $\Delta F(\theta)$ from the data $\eta(\theta)$ as described below.
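The tilted equilibrium free energy is defined as
$$F(\theta) := -k_B T \ln \int_X dx\, \exp\{-[\epsilon(x) - v(x;\theta)]/k_B T\}, \tag{7}$$
and $\Delta F(\theta) := F(\theta) - F(0)$. The difference $\Delta F(\theta)$ is computed from the data $\eta(\theta)$ as (see Sec. VI A for the derivation)
$$\Delta F(\theta) = -\int_0^\theta d\theta'\cdot\eta(\theta'), \tag{8}$$
where the line integral is taken along any curve connecting $\theta' = 0$ and $\theta' = \theta$; the resulting value does not depend on the path.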
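For $K = 1$, the line integral in Eq. (8) is an ordinary integral. The Python sketch below assumes NumPy; the four-state energies, the observable, and the $\theta$ grid are hypothetical stand-ins for measured data, and `kT` stands for $k_B T$. It tabulates $\eta(\theta)$ as in steps 1 and 2, integrates it with the trapezoidal rule as in step 3, and compares the result with $\Delta F(\theta)$ computed directly from Eq. (7), which is available here only because the model is known.

```python
import numpy as np

kT = 1.0
eps = np.array([0.0, 0.5, 2.0, 1.0])   # hypothetical state energies
B = np.array([[0.0, 1.0, 2.0, 3.0]])   # a single observable (K = 1), shape (K, N)


def tilted_eq(theta):
    """Eq. (5): p_te(x; theta) proportional to exp{-[eps(x) - theta . B(x)]/kT}."""
    a = -(eps - theta @ B) / kT
    w = np.exp(a - a.max())
    return w / w.sum()


def free_energy(theta):
    """Eq. (7): F(theta) = -kT ln sum_x exp{-[eps(x) - theta . B(x)]/kT}."""
    a = -(eps - theta @ B) / kT
    return -kT * (a.max() + np.log(np.sum(np.exp(a - a.max()))))


# Steps 1-2: "measure" eta(theta) on a dense grid from theta = 0 to theta = 2
thetas = np.linspace(0.0, 2.0, 401)
etas = np.array([(B @ tilted_eq(np.array([th])))[0] for th in thetas])

# Step 3, Eq. (8): Delta F(theta) = -int_0^theta eta(theta') dtheta' (trapezoidal rule)
increments = 0.5 * (etas[1:] + etas[:-1]) * np.diff(thetas)
dF_line = -np.concatenate([[0.0], np.cumsum(increments)])

# Direct evaluation of Delta F(theta) = F(theta) - F(0), for comparison only
dF_exact = (np.array([free_energy(np.array([th])) for th in thetas])
            - free_energy(np.array([0.0])))
```

For $K > 1$, the same integral would be taken along any convenient path in $\theta$ space, for example a straight ray from the origin.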
As we show in Sec. VI A, the correspondence from $\theta$ to $\eta$, denoted by $\eta(\theta)$ in Eq. (6), is invertible. We write the inverse function as $\theta(\eta)$, which solves
$$\eta(\theta) = \eta \tag{9}$$
for each $\eta$. In other words, $\theta(\eta)$ is the unique set of parameter values that induces the set of expectation values $\eta$.
Using the interpolated data of η(θ), one can find θ(η) for a given η if the data range covers the given η.For simplicity, we assume that the data range is large enough to cover any η that appears below.In summary, we have the data of η(θ) and ∆F (θ) from the tilted equilibrium measurements, and when a set of expectation values η is given, we can find θ(η) from the tilted equilibrium data.
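A minimal way to obtain $\theta(\eta)$ from tabulated data is sketched below in Python for $K = 1$. NumPy is assumed, and the four-state system and the grid are hypothetical stand-ins for measured data. Because $d\eta/d\theta = {\rm Var}(B)/k_B T > 0$ in this case, the table $\eta(\theta)$ is strictly increasing and can be inverted by linear interpolation; for $K > 1$, a multidimensional interpolation or root finder would be needed instead.

```python
import numpy as np

kT = 1.0
eps = np.array([0.0, 0.5, 2.0, 1.0])   # hypothetical state energies
B = np.array([[0.0, 1.0, 2.0, 3.0]])   # single observable (K = 1)


def tilted_eq(theta):
    a = -(eps - theta @ B) / kT
    w = np.exp(a - a.max())
    return w / w.sum()


# Tabulated tilted equilibrium data eta(theta) on a grid (steps 1-2)
thetas = np.linspace(-3.0, 3.0, 601)
etas = np.array([(B @ tilted_eq(np.array([th])))[0] for th in thetas])


def theta_of_eta(eta):
    """Invert the tabulated eta(theta) by linear interpolation (K = 1 only).

    For K = 1, d eta / d theta = Var(B)/kT > 0, so the table is strictly increasing.
    """
    if not etas[0] <= eta <= etas[-1]:
        raise ValueError("eta lies outside the measured data range")
    return np.interp(eta, etas, thetas)
```

Raising an error outside the table mirrors the assumption in the text that the data range covers every $\eta$ of interest.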

B. Inference of minimum EP
In terms of these tilted equilibrium data, the minimum EP in Eq. (3) is expressed as
$$\Delta H_{\rm m}(\eta) = \theta(\eta)\cdot\eta + \Delta F(\theta(\eta)), \tag{10}$$
which is the central relation of our inference method. We can calculate the right-hand side of Eq. (10) for a given $\eta$ from the tilted equilibrium data. To do so, we first find $\theta(\eta)$, i.e., the set of parameter values corresponding to the given $\eta$, from the interpolated data $\eta(\theta)$. Then we look up the value of the tilted equilibrium free energy $\Delta F(\theta(\eta))$. Equation (10) is derived in Sec. VI A and Appendix B. From Eq. (1) and the definition of $\Delta H_{\rm m}(\eta)$ in Eq. (3), the true EP of the process at each $t$ is bounded from below as
$$\Delta H[\tilde p(t)] \ge \Delta H_{\rm m}(\tilde\eta(t)) \tag{11a}$$
$$\qquad\qquad = \theta(\tilde\eta(t))\cdot\tilde\eta(t) + \Delta F(\theta(\tilde\eta(t))). \tag{11b}$$
Equation (11b) can be calculated by combining the nonstationary data and the tilted equilibrium data. To do so, we first use the tilted equilibrium data to find $\theta(\tilde\eta(t))$, i.e., the set of parameter values that induces the same set of expectation values as the nonstationary data $\tilde\eta(t)$. Then we find the value of the tilted equilibrium free energy $\Delta F(\theta(\tilde\eta(t)))$.
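The two-step recipe for Eq. (11b) can be sketched in Python as follows, assuming NumPy and a hypothetical five-state system with $B = (x, x^2)$. Here the root of Eq. (9) is found by a damped Newton method using exact tilted equilibrium averages rather than interpolated measurements; this substitution is ours, made only to keep the example short. The check confirms that the resulting value does not exceed the true EP of the distribution that produced the data and equals the EP of the matching tilted equilibrium.

```python
import numpy as np

kT = 1.0
x = np.arange(5.0)
eps = np.array([0.0, 0.4, 1.0, 0.2, 1.5])     # hypothetical energies
B = np.vstack([x, x**2])                      # B_1 = x, B_2 = x^2 (K = 2)


def tilted_eq(theta):
    a = -(eps - theta @ B) / kT
    w = np.exp(a - a.max())
    return w / w.sum()


def free_energy(theta):
    a = -(eps - theta @ B) / kT
    return -kT * (a.max() + np.log(np.sum(np.exp(a - a.max()))))


def theta_of_eta(eta, tol=1e-10, max_iter=100):
    """Solve eta(theta) = eta [Eq. (9)] by damped Newton on the convex function
    phi(theta) = -F(theta) - theta . eta, whose gradient is eta(theta) - eta."""
    phi = lambda th: -free_energy(th) - th @ eta
    theta = np.zeros(B.shape[0])
    for _ in range(max_iter):
        p = tilted_eq(theta)
        mean = B @ p
        grad = mean - eta
        if np.max(np.abs(grad)) < tol:
            break
        hess = ((B * p) @ B.T - np.outer(mean, mean)) / kT   # Cov(B)/kT
        step = -np.linalg.solve(hess, grad)
        t = 1.0
        while phi(theta + t * step) > phi(theta) + 1e-4 * t * (grad @ step) and t > 1e-12:
            t *= 0.5                                         # Armijo backtracking
        theta = theta + t * step
    return theta


def min_EP(eta):
    """Eq. (10): Delta H_m(eta) = theta(eta) . eta + Delta F(theta(eta))."""
    th = theta_of_eta(eta)
    return th @ eta + free_energy(th) - free_energy(np.zeros_like(th))


def delta_H(p):
    """Eq. (2), written as kT * D_KL[p || p_eq] (p assumed strictly positive)."""
    peq = tilted_eq(np.zeros(B.shape[0]))
    return kT * np.sum(p * np.log(p / peq))


q = np.array([0.3, 0.05, 0.1, 0.25, 0.3])    # hypothetical "nonstationary" distribution
eta_q = B @ q                                 # the only data used from q, Eq. (1)
bound = min_EP(eta_q)                         # Eq. (11b)
```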
We emphasize that the calculated value of $\Delta H_{\rm m}(\tilde\eta(t))$ is not merely a lower bound on the true EP but a meaningful thermodynamic cost. Indeed, it is the fundamental cost of producing the observed set of expectation values $\tilde\eta(t)$ in relaxation processes, as discussed below Eq. (3).

C. Equality condition and tightness
If the inequality in Eq. (11a) holds with equality, we can use Eq. (11b) to calculate the exact value of the true EP from the nonstationary and tilted equilibrium data. As shown in Sec. VI B, this happens if and only if
$$\tilde p(t) = p^{\rm te}(\theta)\ \text{for some}\ \theta, \tag{12}$$
namely, $\tilde p(t)$ is exactly realizable as a tilted equilibrium. We call the condition in Eq. (12) the realizability condition and discuss when it holds in Sec. V A. Apart from the equality condition, we make two remarks about the tightness of the inequality in Eq. (11a). First, the inequality can only become tighter as we increase $K$ by adding more observables to $B$. This is because adding observables narrows the domain of minimization in Eq. (3) and therefore cannot lower the minimum EP $\Delta H_{\rm m}(\eta)$.
Second, we can sometimes obtain a tighter bound by considering the whole process at once, assuming that the time evolution is Markovian:
$$\Delta H[\tilde p(t)] = \sup_{t'\ge t}\Delta H[\tilde p(t')] \ge \sup_{t'\ge t}\Delta H_{\rm m}(\tilde\eta(t')). \tag{13}$$
The first equality follows from the second law of thermodynamics for the true EP, $-d_t\Delta H[\tilde p(t)] \ge 0$, where $d_t \equiv d/dt$ is the time derivative; this law holds under the Markovian assumption [12,50]. The next inequality follows from Eq. (11a). The last expression in Eq. (13) can be computed from the available data using Eq. (11b), and it gives a tighter bound than Eq. (11a) if $\Delta H_{\rm m}(\tilde\eta(t))$ is not monotonically decreasing in time.
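On a discrete time grid, the right-hand side of Eq. (13) is a backward running maximum of the values obtained from Eq. (11b). A short Python sketch follows; NumPy is assumed, and the series is hypothetical. Note that on a finite grid the supremum runs only over the sampled times.

```python
import numpy as np


def running_bound(dHm):
    """Eq. (13) on a time grid t_0 < t_1 < ...: bound_i = max_{j >= i} Delta H_m(eta(t_j))."""
    dHm = np.asarray(dHm, dtype=float)
    # A cumulative maximum taken from the end of the series backward
    return np.maximum.accumulate(dHm[::-1])[::-1]


series = [3.0, 1.0, 2.0, 0.5, 0.0]   # hypothetical values of Eq. (11b) along a process
bound = running_bound(series)        # improves the bound at t_1 from 1.0 to 2.0
```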

A. Model and required measurements
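We demonstrate our results with a one-dimensional Brownian particle in a harmonic trapping potential plus a ragged potential,
$$\epsilon(x) = \frac{a}{2}x^2 + \epsilon_{\rm r}(x), \tag{14}$$
where $a > 0$ determines the width of the harmonic potential, and $\epsilon_{\rm r}(x)$ is an arbitrary ragged potential. It is a model of an optically trapped particle linked to a biomolecule [12,51], for which $\epsilon_{\rm r}(x)$ denotes the energy of the biomolecule pulled to a displacement $x$. The time evolution is governed by the Fokker-Planck equation with a uniform mobility $\mu$,
$$\partial_t\tilde p(x;t) = \mu\,\partial_x\big[(\partial_x\epsilon(x))\,\tilde p(x;t) + k_B T\,\partial_x\tilde p(x;t)\big],$$
where $\partial_t \equiv \partial/\partial t$ and $\partial_x \equiv \partial/\partial x$ denote the time and spatial derivatives.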
As an example, we consider the case where we can track the mean and variance of the particle position $x$ over relaxation processes, but we cannot reliably obtain higher moments owing to the limited number of relaxation trajectories available from experiments. This assumption corresponds to the choice of two observables, $B_1(x) = x$ and $B_2(x) = x^2$, in our framework, since the mean and variance are determined by $\langle x\rangle$ and $\langle x^2\rangle$. To perform the tilted equilibrium measurements, we need to modify the energy to
$$\epsilon(x) - v(x;\theta) = \frac{a}{2}x^2 - \theta_1 x - \theta_2 x^2 + \epsilon_{\rm r}(x), \tag{15}$$
which is still a harmonic potential plus the ragged potential (confining as long as $\theta_2 < a/2$). Therefore, we can realize the tilted equilibrium by modulating the center and the width of the harmonic trapping potential. We need to measure the mean and variance of $x$ at many tilted equilibrium distributions, interpolate the data to find $\eta(\theta)$, and compute $\Delta F(\theta)$ as described in Sec. III A.
If $\epsilon_{\rm r}(x) = 0$ and the initial distribution is Gaussian, the realizability condition [Eq. (12)] holds. To see this, we first note that $\tilde p(t)$ is Gaussian for all $t \ge 0$ for a Gaussian initial distribution [52]. Since we can make any harmonic potential by changing $\theta_1$ and $\theta_2$ in Eq. (15) for $\epsilon_{\rm r}(x) = 0$, we can realize any Gaussian distribution as a tilted equilibrium. Therefore, the realizability condition is satisfied, and the equality holds in Eq. (11a), allowing us to extract the exact value of $\Delta H[\tilde p(t)]$ from the tilted equilibrium data. With a nonzero ragged potential, the realizability condition no longer holds. Nevertheless, if $\epsilon_{\rm r}(x)$ is small enough, we expect $\tilde p(t)$ to be approximately realizable as a tilted equilibrium distribution, and therefore Eq. (11b) gives a fairly accurate estimate of $\Delta H[\tilde p(t)]$.
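For $\epsilon_{\rm r}(x) = 0$, every quantity in Eq. (10) has a closed form, which allows a quick consistency check. The Python sketch below works in the units of Sec. IV B ($k_B T = a = 1$) and assumes NumPy; the helper names are hypothetical. It maps a Gaussian with mean $m$ and variance $s^2$ to the tilted equilibrium of Eq. (15) with $\theta_1 = m/s^2$ and $\theta_2 = (1 - 1/s^2)/2$, evaluates Eq. (10), and compares the result with the true EP, which for Eq. (2) equals $k_B T\,D_{\rm KL}[\mathcal N(m,s^2)\|\mathcal N(0,1)] = (s^2 + m^2 - 1 - \ln s^2)/2$.

```python
import numpy as np

# Units of Sec. IV B: k_B T = a = 1, so eps(x) = x^2/2 and p_eq = N(0, 1).


def true_EP(m, s2):
    """Delta H = k_B T D_KL[N(m, s2) || N(0, 1)] for a Gaussian p(t)."""
    return 0.5 * (s2 + m**2 - 1.0 - np.log(s2))


def theta_of_gaussian(m, s2):
    """Parameters of Eq. (15) whose tilted equilibrium is N(m, s2).

    The tilted energy (1 - 2 theta2) x^2 / 2 - theta1 x has mean theta1 / (1 - 2 theta2)
    and variance 1 / (1 - 2 theta2)."""
    return m / s2, 0.5 * (1.0 - 1.0 / s2)


def delta_F(theta1, theta2):
    """Delta F = -ln[Z(theta)/Z(0)], Z(theta) = sqrt(2 pi / k) exp(theta1^2 / (2 k)), k = 1 - 2 theta2."""
    k = 1.0 - 2.0 * theta2
    return 0.5 * np.log(k) - theta1**2 / (2.0 * k)


def min_EP(m, s2):
    """Eq. (10) with eta = (<x>, <x^2>) = (m, m^2 + s2)."""
    theta1, theta2 = theta_of_gaussian(m, s2)
    return theta1 * m + theta2 * (m**2 + s2) + delta_F(theta1, theta2)
```

The pair $(m, s^2) = (-3, 0.15)$ corresponds to the initial distribution used in Sec. IV B.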

B. Numerical results
We numerically demonstrate our inference scheme for $\epsilon_{\rm r}(x) = b\cos(cx)$. In all calculations, we measure energy in units of $k_B T$, length in units of $l_0 := \sqrt{k_B T/a}$, and time in units of $t_0 := 1/(a\mu)$. We set $b = 0$ (harmonic; the upper panels of Fig. 2) or $b = 1\,k_B T$ (ragged; the lower panels of Fig. 2), and $c = 6\,l_0^{-1}$. The initial distribution is Gaussian with mean $-3\,l_0$ and variance $0.15\,l_0^2$ in all calculations. After this scaling, no other parameters remain free. See also Appendix E for more details on the numerics.
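Step 1 of Sec. III A can be mimicked numerically for the model of Eq. (15). The Python sketch below assumes NumPy and computes $\eta(\theta) = (\langle x\rangle, \langle x^2\rangle)$ in the tilted equilibrium by quadrature on a grid, in the scaled units above with $b = 1\,k_B T$ and $c = 6\,l_0^{-1}$ as defaults. The grid extent, the resolution, the helper name `tilted_moments`, and the coarse $\theta$ table are our own choices, not those of Appendix E.

```python
import numpy as np

# Units of Sec. IV B: k_B T = a = 1, lengths in l_0; ragged potential b cos(c x).
x = np.linspace(-15.0, 15.0, 6001)   # position grid (assumed wide and fine enough)
dx = x[1] - x[0]


def tilted_moments(theta1, theta2, b=1.0, c=6.0):
    """Return eta(theta) = (<x>, <x^2>) in the tilted equilibrium of Eq. (15)."""
    if theta2 >= 0.5:
        raise ValueError("theta2 must be below a/2 = 1/2 for a confining potential")
    energy = 0.5 * x**2 + b * np.cos(c * x) - theta1 * x - theta2 * x**2
    w = np.exp(-(energy - energy.min()))        # Boltzmann weights, k_B T = 1
    p = w / (w.sum() * dx)                       # normalized density on the grid
    return np.sum(x * p) * dx, np.sum(x**2 * p) * dx


# Step 1 on a coarse hypothetical grid of (theta1, theta2)
grid = [(t1, t2) for t1 in np.linspace(-3.0, 3.0, 7) for t2 in np.linspace(-1.0, 0.25, 6)]
eta_table = {theta: tilted_moments(*theta) for theta in grid}
```

In an experiment, each entry of `eta_table` would instead come from sampling the particle position in the correspondingly shifted and rescaled trap.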
We plot the potential $\epsilon(x)$ and the time evolution $\tilde p(t)$, both assumed to be unmeasurable, in Fig. 2(a). The ragged system has potential barriers that separate the state space into metastable wells. Figure 2(b) shows the tilted equilibrium data $\eta(\theta)$. In the ragged system, $\eta(\theta)$ has a steplike feature, reflecting the multiwell structure of the potential. We sampled $\eta(\theta)$ from sufficiently dense data points over the $\theta$ space, leaving the consideration of sparse data points to future work.
Figure 2(c) shows the minimum EP $\Delta H_{\rm m}(\tilde\eta(t))$ compatible with the observed mean and variance, calculated from the tilted equilibrium data via Eq. (11b). The true EP $\Delta H[\tilde p(t)]$, which is not measurable, is also plotted. The calculated value $\Delta H_{\rm m}(\tilde\eta(t))$ never exceeds the true EP $\Delta H[\tilde p(t)]$, confirming the inequality in Eq. (11a). Moreover, the two quantities agree exactly for $b = 0$ and are in fairly good agreement for $b = 1\,k_B T$ [Fig. 2(d)]. The former is expected from the realizability condition, as discussed in Sec. IV A. The latter, in contrast, is rather surprising because the peak-to-peak height of the ragged potential is $2b = 2\,k_B T$, which is not small compared with $k_B T$. This example shows that, even if the realizability condition is violated, the minimum EP can be a good approximation of the true EP. We also present extensive results over a wide range of $b$.

V. Realizability condition and additional inference methods
As discussed in Sec.III C, the realizability condition [Eq.(12)] ensures that the inequality in Eq. (11a) holds with equality, allowing the inference of the exact value of EP.We discuss three situations where the realizability condition is satisfied in Sec.V A. Moreover, assuming the realizability condition, we can extract additional information about the relaxation process from the tilted equilibrium measurements, including the EP rate, the thermodynamic force, and a constraint on relaxation paths.We present these additional inference methods in Secs.V B-V D.

A. Sufficient conditions for the realizability condition
There are several physically natural situations in which we can ensure the realizability condition (see Appendix A for details). The first situation is a time-scale separation: we can ensure the realizability condition, except during the initial fast relaxation, if the states are lumped into groups, the Markovian transition rates within a group are much larger than the rates between different groups, and we can track the total probability of each group as the observables. The second situation is a symmetry: the realizability condition holds if the system energy and the time-evolution law are symmetric under some permutations of the discrete states, the initial condition has the same symmetry, and we can track all observables obeying the symmetry. The third situation is a Gaussian process: we can ensure the realizability condition if the state energy is a (possibly multivariate) harmonic potential, the time evolution is governed by the Fokker-Planck equation with a uniform mobility, the initial distribution is Gaussian, and we can track the mean and covariance matrix as the observables. This third situation has been demonstrated in the example of Sec. IV. Note that each of these conditions jointly concerns the system energy, the time-evolution law, the initial condition, and the choice of observables.
If we have enough information about the system to ensure that it satisfies one of these sufficient conditions, we can obtain the exact value of the true EP from Eq. (11b), as well as additional information by the methods described below in Secs. V B–V D. Alternatively, if we expect the system to approximately satisfy one of these sufficient conditions, we expect $\tilde p(t)$ to be approximately realized as a tilted equilibrium distribution. In this case, $\Delta H_{\rm m}(\tilde\eta(t))$ will be a good approximation of $\Delta H[\tilde p(t)]$, and the additional inference methods below will provide reasonable estimates of the additional quantities.

B. EP rate
The first quantity obtained under the realizability condition is the EP rate,
$$\sigma(t) := -T^{-1} d_t\Delta H[\tilde p(t)].$$
Under the realizability condition, we have the equality $\Delta H[\tilde p(t)] = \Delta H_{\rm m}(\tilde\eta(t))$, and thus we can obtain the EP rate simply by differentiating $\Delta H_{\rm m}(\tilde\eta(t))$, which is calculated from Eq. (11b). Alternatively, we have the following equality (see Sec. VI A for the derivation):
$$\sigma(t) = -T^{-1}\sum_{\alpha=1}^K \theta_\alpha(\tilde\eta(t))\, d_t\tilde\eta_\alpha(t). \tag{16}$$
The right-hand side of this equation is calculated by differentiating the nonstationary data $\tilde\eta(t)$ and finding $\theta(\tilde\eta(t))$ from the tilted equilibrium data $\eta(\theta)$. Equation (16) also decomposes the EP rate into contributions from the changes in the expectation values of the individual observables. From this equation, we can regard $-T^{-1}\theta_\alpha(\tilde\eta(t))$ as the thermodynamic force conjugate to the probability flux that drives the change $d_t\tilde\eta_\alpha(t)$.
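For the harmonic case of Sec. IV, Eq. (16) can be checked against the exact EP rate, because a Gaussian initial distribution stays Gaussian with mean $m(t) = m_0 e^{-t}$ and variance $s^2(t) = 1 + (s_0^2 - 1)e^{-2t}$ in units $k_B T = a = \mu = 1$. The Python sketch below assumes NumPy, uses only $\tilde\eta(t) = (\langle x\rangle, \langle x^2\rangle)$ and finite differences in Eq. (16), and compares the result with $-d_t\Delta H[\tilde p(t)]$. EP is measured in units of $k_B$, so the factor $T^{-1}$ is absorbed into the energy unit.

```python
import numpy as np

# Harmonic example of Sec. IV (eps_r = 0) in units k_B T = a = mu = 1.
t = np.linspace(0.0, 3.0, 30001)
m0, s20 = -3.0, 0.15                      # initial mean and variance (Sec. IV B)
m = m0 * np.exp(-t)
s2 = 1.0 + (s20 - 1.0) * np.exp(-2.0 * t)

# Nonstationary data, Eq. (1): eta = (<x>, <x^2>)
eta1, eta2 = m, m**2 + s2

# theta(eta) for this system (Gaussian tilted equilibria): theta1 = m/s^2, theta2 = (1 - 1/s^2)/2
var = eta2 - eta1**2
theta1, theta2 = eta1 / var, 0.5 * (1.0 - 1.0 / var)

# Eq. (16): sigma = -(1/T) sum_alpha theta_alpha d_t eta_alpha, with finite differences
sigma_eq16 = -(theta1 * np.gradient(eta1, t, edge_order=2)
               + theta2 * np.gradient(eta2, t, edge_order=2))

# Reference: sigma = -d_t Delta H with Delta H = D_KL[N(m, s^2) || N(0, 1)] (unmeasurable)
dH_true = 0.5 * (s2 + m**2 - 1.0 - np.log(s2))
sigma_true = -np.gradient(dH_true, t, edge_order=2)
```

Both estimates reduce analytically to $m^2 + (s - 1/s)^2 \ge 0$ for this process.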
In Fig. 4(a), we demonstrate the inference of the EP rate from Eq. (16) for the same example systems as in Sec.IV.Without a ragged potential (the upper panel), the realizability condition is exactly satisfied, and therefore the right-hand side of Eq. ( 16) gives the exact value of σ(t).With a nonzero ragged potential (the lower panel), the realizability condition is approximately satisfied, and indeed the right-hand side of Eq. ( 16) gives a good estimate of the EP rate.

C. Thermodynamic force
The second quantity obtained under the realizability condition is the thermodynamic force (affinity) over the state space. For discrete-state systems, the thermodynamic force from state $y$ to $x$ at time $t$ is given by [49]
$$f(x,y;t) := -T^{-1}[\epsilon(x) - \epsilon(y)] - k_B\ln\frac{\tilde p(x;t)}{\tilde p(y;t)}. \tag{17}$$
The thermodynamic force is related to the EP rate by $\sigma(t) = \frac{1}{2}\sum_{x,y} j(x,y;t)\, f(x,y;t)$, where $j(x,y;t)$ is the net probability flux from state $y$ to $x$ at time $t$ [49]. Thus, the thermodynamic force quantifies the EP rate due to the probability flux from $y$ to $x$. Under the realizability condition, we can calculate the thermodynamic force $f(x,y;t)$ from the available data as (see Appendix B 4 for the derivation)
$$f(x,y;t) = -T^{-1}\sum_{\alpha=1}^K \theta_\alpha(\tilde\eta(t))\,[B_\alpha(x) - B_\alpha(y)]. \tag{18}$$
The right-hand side can be calculated from the nonstationary data $\tilde\eta(t)$ and the tilted equilibrium data $\eta(\theta)$, provided that we know the values of the observables $B(x)$ in each state.
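Eqs. (17) and (18) can be compared directly when the distribution is a tilted equilibrium. The Python sketch below assumes NumPy; the energies, observables, and $\theta$ are hypothetical. It builds $p^{\rm te}(\theta)$ as a stand-in for a realizable $\tilde p(t)$ and evaluates the force matrix both from the definition and from Eq. (18), in units of $k_B$.

```python
import numpy as np

kT = 1.0                                   # k_B T; forces are reported in units of k_B
eps = np.array([0.0, 1.0, 0.3, 2.0])       # hypothetical 4-state energies
B = np.array([[1.0, 0.0, 0.0, 0.0],        # hypothetical observables (K = 2)
              [0.0, 0.0, 1.0, 1.0]])
theta = np.array([0.7, -1.2])              # stands in for theta(eta(t))

# Under the realizability condition, p(t) is the tilted equilibrium p_te(theta)
a = -(eps - theta @ B) / kT
p = np.exp(a - a.max())
p /= p.sum()

# Eq. (17): f(x, y) = -[eps(x) - eps(y)]/T - k_B ln[p(x)/p(y)]   (divided by k_B)
lnp = np.log(p)
f_def = -(eps[:, None] - eps[None, :]) / kT - (lnp[:, None] - lnp[None, :])

# Eq. (18): f(x, y) = -(1/T) sum_alpha theta_alpha [B_alpha(x) - B_alpha(y)]   (divided by k_B)
v = theta @ B                              # external field v(x; theta), Eq. (4)
f_data = -(v[:, None] - v[None, :]) / kT
```

Only $\theta$ and the values $B(x)$ enter `f_data`; the energies cancel between the two terms of Eq. (17).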
For continuous-state systems, the thermodynamic force at $x \in \mathbb{R}^d$ is defined as a continuous version of Eq. (17) [53], $f(x;t) := -T^{-1}\nabla\epsilon(x) - k_B\nabla\ln\tilde p(x;t)$, which is a $d$-dimensional vector. It is related to the EP rate by $\sigma(t) = \int_X dx\, j(x;t)\cdot f(x;t)$, where $j(x;t)$ is the $d$-dimensional probability current at time $t$. Under the realizability condition, we can similarly calculate the thermodynamic force from the data as $f(x;t) = -T^{-1}\sum_{\alpha=1}^K\theta_\alpha(\tilde\eta(t))\nabla B_\alpha(x)$.

D. Constraint on the time evolution
The final piece of information obtained from tilted equilibrium measurements is a constraint on the time evolution. Let us introduce a function of the external field parameters $\theta$:
$$L(\theta) := \Delta F(\theta) + \sum_{\alpha=1}^K \theta_\alpha\langle B_\alpha\rangle_{p^{\rm eq}}. \tag{19}$$
Then the time evolution of the observables $\tilde\eta(t)$ must satisfy
$$d_t L(\theta(\tilde\eta(t))) \ge 0, \tag{20}$$
assuming a Markovian time evolution in addition to the realizability condition (see Appendix B 5 for the derivation). This restriction on the possible trajectories is different from and complementary to the second law of thermodynamics, $-d_t\Delta H[\tilde p(t)] \ge 0$. Note that $\langle B_\alpha\rangle_{p^{\rm eq}}$ is independent of $\theta$, so the second term of $L(\theta)$ is linear in $\theta$. If we shift each observable by a constant so that $\langle B_\alpha\rangle_{p^{\rm eq}} = 0$, $L(\theta)$ coincides with $\Delta F(\theta)$.
In Fig. 4(b), we demonstrate the monotonicity in Eq. (20) with the same example system as above.For the harmonic system (the upper panel), the realizability condition is satisfied, and the function L(θ) is indeed monotonically increasing.For the ragged system (the lower panel), the realizability condition holds only approximately, but L(θ) is still monotonically increasing.

A. Derivation of the central relation [Eq. (10)]
The derivation of Eq. ( 10) involves two nontrivial tasks: expressing the minimum EP in Eq. (3) in a closed form and relating the minimum EP to the tilted equilibrium quantities.We sketch how these two tasks are accomplished, leaving the detailed calculation to Appendix B.
To express the minimum EP in closed form, we first find the following relation for any set of expectation values $\eta$ and any distribution $q$ that satisfies $\langle B\rangle_q = \eta$:
$$\Delta H[q] = \Delta H[p^{\rm te}(\theta(\eta))] + k_B T\, D_{\rm KL}[q\,\|\,p^{\rm te}(\theta(\eta))], \tag{21}$$
where $D_{\rm KL}[p_1\|p_2] := \int_X dx\, p_1(x)\ln[p_1(x)/p_2(x)]$ is the Kullback–Leibler (KL) divergence [54]. Taking the minimum of both sides of Eq. (21) over $q$ satisfying $\langle B\rangle_q = \eta$ for fixed $\eta$, the left-hand side reduces to the definition of $\Delta H_{\rm m}(\eta)$ in Eq. (3). On the right-hand side, only the KL divergence depends on $q$; it is nonnegative and vanishes if and only if $q = p^{\rm te}(\theta(\eta))$, which itself satisfies the constraint. Hence the minimum is achieved at $q = p^{\rm te}(\theta(\eta))$, and we have
$$\Delta H_{\rm m}(\eta) = \Delta H[p^{\rm te}(\theta(\eta))]. \tag{22}$$
This expresses the minimum EP in closed form.
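Eq. (21) can be checked numerically without solving Eq. (9): starting from a tilted equilibrium $p^{\rm te}(\theta)$, any perturbation lying in the null space of the constant observable and $B$ keeps $\langle B\rangle$ unchanged, so $\theta(\eta) = \theta$ for the perturbed distribution. The Python sketch below assumes NumPy; the five-state system, $\theta$, and the perturbation size are hypothetical. It constructs such a $q$ and compares both sides of Eq. (21), using $\Delta H[p] = k_B T\,D_{\rm KL}[p\|p^{\rm eq}]$, which follows from Eq. (2).

```python
import numpy as np

kT = 1.0
x = np.arange(5.0)
eps = np.array([0.0, 0.4, 1.0, 0.2, 1.5])      # hypothetical energies
B = np.vstack([x, x**2])                       # B_1 = x, B_2 = x^2


def tilted_eq(theta):
    a = -(eps - theta @ B) / kT
    w = np.exp(a - a.max())
    return w / w.sum()


def kl(p, q):
    """D_KL[p || q] for strictly positive discrete distributions."""
    return np.sum(p * np.log(p / q))


def delta_H(p):
    """Eq. (2), which equals kT * D_KL[p || p_eq]."""
    return kT * kl(p, tilted_eq(np.zeros(2)))


theta = np.array([0.3, -0.1])
pte = tilted_eq(theta)

# Build q with <B>_q = <B>_pte by moving along the null space of [1; B]
M = np.vstack([np.ones_like(x), B])
delta = np.linalg.svd(M)[2][-1]                # sum(delta) = 0 and B @ delta = 0
neg = delta < 0
q = pte + 0.5 * np.min(pte[neg] / -delta[neg]) * delta   # still strictly positive

lhs = delta_H(q)
rhs = delta_H(pte) + kT * kl(q, pte)           # Eq. (21) with theta(eta) = theta
```

This is the Pythagorean relation of information geometry specialized to the family of tilted equilibria.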
The other element of the proof is a Legendre duality over probability distributions. Using the expression of $\Delta H_{\rm m}(\eta)$ in Eq. (22), we can prove that the two functions $-\Delta F(\theta)$ and $\Delta H_{\rm m}(\eta)$ are strictly convex and connected by a Legendre transform:
$$\eta(\theta) = -\nabla_\theta\Delta F(\theta), \qquad \theta(\eta) = \nabla_\eta\Delta H_{\rm m}(\eta), \tag{23a}$$
$$\Delta H_{\rm m}(\eta) = \theta(\eta)\cdot\eta + \Delta F(\theta(\eta)), \tag{23b}$$
where the correspondences $\theta(\eta)$ and $\eta(\theta)$ are the same as those defined in Sec. III A. Equation (23b) is identical to our central relation in Eq. (10). Equation (23a) ensures that the correspondence between $\theta$ and $\eta$ is one-to-one, and thus the solution to Eq. (9) is unique. Equation (23a) also yields the line-integral expression of $\Delta F(\theta)$ in Eq. (8), as well as the expression of the EP rate in Eq. (16).
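For $K = 1$, both parts of Eq. (23) can be checked on a grid. The Python sketch below assumes NumPy; the four-state system, the grid, and the target $\eta_0$ are hypothetical. It verifies Eq. (23a) by finite differences of $\Delta F(\theta)$ and evaluates Eq. (23b) in its variational form $\Delta H_{\rm m}(\eta_0) = \max_\theta[\theta\eta_0 + \Delta F(\theta)]$, which is the Legendre transform of the convex function $-\Delta F$; the result is compared with the EP of the maximizing tilted equilibrium as in Eq. (22).

```python
import numpy as np

kT = 1.0
eps = np.array([0.0, 0.5, 2.0, 1.0])   # hypothetical energies
B = np.array([[0.0, 1.0, 2.0, 3.0]])   # K = 1


def tilted_eq(theta):
    a = -(eps - theta * B[0]) / kT
    w = np.exp(a - a.max())
    return w / w.sum()


def free_energy(theta):
    a = -(eps - theta * B[0]) / kT
    return -kT * (a.max() + np.log(np.sum(np.exp(a - a.max()))))


thetas = np.linspace(-4.0, 4.0, 8001)
dF = np.array([free_energy(th) for th in thetas]) - free_energy(0.0)
eta_direct = np.array([B[0] @ tilted_eq(th) for th in thetas])

# Eq. (23a): eta(theta) = -d Delta F / d theta (finite differences)
eta_fd = -np.gradient(dF, thetas, edge_order=2)

# Eq. (23b) as a Legendre transform: Delta H_m(eta0) = max_theta [theta * eta0 + Delta F(theta)]
eta0 = 1.2
objective = thetas * eta0 + dF
i = np.argmax(objective)
dHm_legendre = objective[i]

# Eq. (22): Delta H[p_te(theta(eta0))] = kT * D_KL[p_te || p_eq], at the maximizer
pte, peq = tilted_eq(thetas[i]), tilted_eq(0.0)
dHm_direct = kT * np.sum(pte * np.log(pte / peq))
```

The residual difference between the two values is set by the grid spacing in $\theta$.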

B. Equality condition and scaling of the error
We prove that the realizability condition [Eq. (12)] is the equality condition of the inequality in Eq. (11a). Inserting $q = \tilde p(t)$ and $\eta = \tilde\eta(t)$ into Eqs. (21) and (22), we find that the difference between the two sides of the inequality in Eq. (11a) is given by
$$\Delta H[\tilde p(t)] - \Delta H_{\rm m}(\tilde\eta(t)) = k_B T\, D_{\rm KL}[\tilde p(t)\,\|\,p^{\rm te}(\theta(\tilde\eta(t)))]. \tag{24}$$
Since the KL divergence is zero if and only if its two arguments are equal, this difference vanishes if and only if $\tilde p(t) = p^{\rm te}(\theta(\tilde\eta(t)))$. If this happens, the realizability condition is obviously satisfied. Conversely, if the realizability condition holds, the set of parameter values $\theta$ in Eq. (12) must satisfy $\langle B\rangle_{p^{\rm te}(\theta)} = \langle B\rangle_{\tilde p(t)} = \tilde\eta(t)$, and therefore it is given by $\theta = \theta(\tilde\eta(t))$. Thus, we have $\tilde p(t) = p^{\rm te}(\theta(\tilde\eta(t)))$, and Eq. (24) vanishes.
We can also show that the difference in Eq. (24) scales quadratically with the magnitude of the violation of the realizability condition, which explains the behavior observed in Fig. 3(b). Consider a reference system that satisfies the realizability condition and a perturbed system with a slightly different energy, time-evolution equation, initial condition, or set of observables. We use $\lambda$ for the magnitude of any of these perturbations. In Appendix C, we show that the perturbed system generically obeys the scaling $\tilde p(t) - p^{\rm te}(\theta(\tilde\eta(t))) = O(\lambda)$, assuming a Markovian time evolution and some mild conditions. Combining this with the expansion of the KL divergence between two nearby distributions, $D_{\rm KL}[p_1\|p_2] \simeq \frac{1}{2}\int_X dx\,[p_1(x) - p_2(x)]^2/p_1(x)$ to leading order in $p_1 - p_2$ [48], we see that the error term scales as $D_{\rm KL}[\tilde p(t)\|p^{\rm te}(\theta(\tilde\eta(t)))] = O(\lambda^2)$. Combined with Eq. (24), this gives
$$\Delta H[\tilde p(t)] = \Delta H_{\rm m}(\tilde\eta(t)) + O(\lambda^2). \tag{25}$$
Therefore, our method gives the true EP correctly up to first order in the magnitude of the violation of the realizability condition.
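The quadratic scaling of the KL divergence used above can be illustrated with a small Python sketch; NumPy is assumed, and the reference distribution and the deviation direction are hypothetical. Shrinking a deviation $\lambda\delta$ by a factor of 10 shrinks $D_{\rm KL}$ by about a factor of 100, and $D_{\rm KL}/\lambda^2$ approaches $\frac12\sum_x \delta(x)^2/p(x)$; to leading order it does not matter which argument appears in the denominator.

```python
import numpy as np


def kl(p, q):
    """D_KL[p || q] for strictly positive discrete distributions."""
    return np.sum(p * np.log(p / q))


p_ref = np.array([0.1, 0.2, 0.3, 0.4])          # stands in for p_te(theta(eta(t)))
delta = np.array([0.05, -0.02, -0.04, 0.01])    # hypothetical deviation direction, sums to zero

# Leading-order prediction: D_KL ~ (lambda^2 / 2) sum delta^2 / p
predicted = 0.5 * np.sum(delta**2 / p_ref)

ratios = []
for lam in [1e-1, 1e-2, 1e-3]:
    p_pert = p_ref + lam * delta                # deviation of order lambda
    ratios.append(kl(p_pert, p_ref) / lam**2)   # approaches `predicted` as lambda -> 0
```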

VII. Discussion
In this paper, we have developed a method of thermodynamic inference that uses tilted equilibrium measurements. The method enables us to obtain the exact value of the minimum EP $\Delta H_{\rm m}(\eta)$ compatible with the observed set of expectation values $\eta$. This method applies to any classical stochastic system that relaxes to equilibrium, with any choice of observables. Furthermore, if we have enough information about the system to ensure that the realizability condition holds, or at least that it is approximately satisfied, we can extract the true EP, the EP rate with its decomposition, the thermodynamic force, and a constraint on relaxation paths.
Compared with existing methods of thermodynamic inference for nonstationary processes [24,39–41], our approach significantly reduces the demand for nonstationary measurements. Our method requires only the expectation values of a few arbitrary observables, which are insufficient for any of these existing methods. This reduced demand comes at the expense of tilted equilibrium measurements. Therefore, our method will be useful when one cannot practically collect enough trajectories of a relaxation process to infer EP from nonstationary data alone using previously proposed methods, but one can freely apply a few types of external fields to the system. This applies to both experiments and numerical simulations.
Our method generally provides only a lower bound on EP, but it is the optimal lower bound in the sense that, for any $\tilde\eta(t)$, there exists a distribution that saturates the inequality in Eq. (11a), since $\Delta H_{\rm m}(\eta)$ is defined by the minimization in Eq. (3). In other words, given only $\tilde\eta(t)$ at each $t$ separately as nonstationary data, our lower bound is the best possible. Moreover, our lower bound $\Delta H_{\rm m}(\eta)$ is a meaningful thermodynamic cost, since it is interpreted as the minimum cost required to realize the set of expectation values $\eta$, as discussed below Eq. (3). This is in contrast with other lower bounds on EP that involve only a few observable values, such as those from thermodynamic uncertainty relations for nonstationary processes [55–57]. These lower bounds are meaningful as statistical quantities, such as the precision of a current observable, but in general they do not admit interpretations as thermodynamic costs.
Our results open an avenue of thermodynamic inference: inference for nonstationary processes based on static measurements. We leave several directions open for future work. First, our method assumes that the set of measurable observables and the set of available external fields are both given by $B$. However, these two sets often differ in natural situations, and accounting for this difference is important to make our method more useful. Second, our method is exact in the sense that it provides the exact value of the minimum EP $\Delta H_{\rm m}(\eta)$; approximating the minimum EP with a smaller amount of tilted equilibrium data is an interesting direction. Finally, it is essential to extend the applicability of our approach. As discussed in Appendix D 1, our results can be easily extended to systems with internal entropy of states [21] and to systems exchanging particles with a single particle reservoir. On the other hand, extending our results to systems with multiple baths is nontrivial. This extension is possible on a formal (mathematical) level by considering tilted nonequilibrium stationary distributions (see Appendix D 2), but making it practical is left to future work. Extending our approach to driven systems is also a nontrivial and important direction.
From a mathematical point of view, the Legendre transform we have introduced in Eq. (23) is at the level of probability distributions, and it does not rely on asymptotics. Such a microscopic Legendre transform has previously appeared in other fields, such as information geometry [47,48] and the foundations of statistical mechanics [58]. We have formulated the Legendre transform in Eq. (23) by borrowing ideas from information geometry and replacing information-theoretic quantities with thermodynamic ones, such as the Shannon entropy with the thermodynamic EP. This formulation extends the existing connections between information geometry and thermodynamics [59–62]. Detailed explorations of this geometric picture of stochastic thermodynamics are left for future work.

JPMJPR18M2, JST ERATO Grant No. JPMJER2302, and UTEC-UTokyo FSI Research Grant Program.

Appendix A: Sufficient conditions for the realizability condition
We discuss three situations in which we can ensure the realizability condition in Eq. (12).A similar discussion based on information geometry that relates time evolutions of stochastic systems to a constrained set of probability distributions is found in Ref. [63].

Time-scale separation
The first situation is the existence of a time-scale separation. Consider that the states are grouped (coarse-grained) into $K+1$ disjoint sets of states $G_0, G_1, \ldots, G_K$. We assume the time-scale separation, i.e., that the Markovian transitions between two states in the same group are much faster than the transitions between states in different groups [20–22]. We further assume that we can track the expectation values of the observables $\chi_0, \chi_1, \ldots, \chi_K$, where $\chi_\alpha(x) = 1$ for $x \in G_\alpha$ and $\chi_\alpha(x) = 0$ for $x \notin G_\alpha$. The expectation value $\langle\chi_\alpha\rangle_{\tilde p(t)}$ gives the total probability of the $\alpha$th group at time $t$. The observables satisfy $\sum_{\alpha=0}^K \chi_\alpha(x) = 1$ for all $x$. To connect this setup to our formulation, we choose $B_\alpha = \chi_\alpha$ for $\alpha = 1, \ldots, K$ as the observables. Here, we exclude $\alpha = 0$ so that the observables $B$ are linearly independent of the constant observable.
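Under these assumptions, we prove the realizability condition except for the initial fast relaxation. For this purpose, we introduce the coarse-grained probability $Q(\alpha; t) \equiv \langle \chi_\alpha \rangle_{p(t)}$ and its equilibrium value $Q_{\rm eq}(\alpha) \equiv \langle \chi_\alpha \rangle_{p_{\rm eq}}$. The time-scale separation implies that the conditional probability within each group, $p(x; t)/Q(\alpha; t)$ for $x \in G_\alpha$, rapidly relaxes to the conditional canonical distribution $p_{\rm eq}(x)/Q_{\rm eq}(\alpha)$ [20][21][22]. Thus, except for the initial fast relaxation, the nonstationary probability distribution has the form
$$ p(x; t) = \frac{Q(\alpha; t)}{Q_{\rm eq}(\alpha)}\, p_{\rm eq}(x) \quad \text{for } x \in G_\alpha. \tag{A1} $$
Using Eq. (A1) and $p_{\rm eq}(x) = \exp\{-[\epsilon(x) - F_{\rm eq}]/k_{\rm B}T\}$, where $F_{\rm eq}$ is the equilibrium free energy, we have
$$ k_{\rm B}T \ln p(x; t) + \epsilon(x) = F_{\rm eq} + k_{\rm B}T \sum_{\alpha=0}^{K} \chi_\alpha(x) \ln \frac{Q(\alpha; t)}{Q_{\rm eq}(\alpha)}. \tag{A2} $$
By inserting $\chi_\alpha(x) = B_\alpha(x)$ for α ≥ 1 and $\chi_0(x) = 1 - \sum_{\alpha=1}^{K} B_\alpha(x)$ into Eq. (A2), we can rearrange Eq. (A2) as
$$ k_{\rm B}T \ln p(x; t) + \epsilon(x) = \sum_{\alpha=1}^{K} \theta_\alpha B_\alpha(x) + \text{const.} \tag{A3} $$
with a set of real numbers $(\theta_\alpha)_{\alpha=1}^{K}$, namely $\theta_\alpha = k_{\rm B}T \ln\{[Q(\alpha; t)/Q_{\rm eq}(\alpha)]/[Q(0; t)/Q_{\rm eq}(0)]\}$, where the constant term is independent of x. Rearranging gives
$$ p(x; t) = \exp\Big\{-\Big[\epsilon(x) - \sum_{\alpha=1}^{K} \theta_\alpha B_\alpha(x) - F(\theta)\Big]\Big/k_{\rm B}T\Big\}, \tag{A4} $$
where normalization fixes the constant to the tilted equilibrium free energy F(θ). Thus, p(x; t) is realized as a tilted equilibrium with an external field of the form of Eq. (4), which proves the realizability condition. Similarly, we can expect the realizability condition to hold approximately if the transitions within a group are faster than the transitions between different groups but their time scales are not sufficiently separated.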
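As a numerical sanity check of Eqs. (A1)-(A4), the following minimal sketch builds the local-equilibrium distribution for a small discrete system with three groups, computes $\theta_\alpha = k_{\rm B}T \ln\{[Q(\alpha; t)/Q_{\rm eq}(\alpha)]/[Q(0; t)/Q_{\rm eq}(0)]\}$, and checks that the tilted equilibrium with $B_\alpha = \chi_\alpha$ reproduces it. The energies, group labels, coarse-grained probabilities, and function names are hypothetical illustrative choices, not taken from the text.

```python
import numpy as np

def tilted_equilibrium(eps, B, theta, kBT=1.0):
    """p_te(x; theta) proportional to exp{-[eps(x) - sum_a theta_a B_a(x)] / kBT}."""
    w = np.exp(-(eps - theta @ B) / kBT)
    return w / w.sum()

def local_equilibrium_fields(eps, groups, Q, kBT=1.0):
    """Return theta_1..theta_K and the local-equilibrium distribution of Eq. (A1).

    eps    : state energies, shape (N,)
    groups : group index alpha(x) in {0, ..., K} of each state, shape (N,)
    Q      : coarse-grained probabilities Q(alpha; t), shape (K + 1,)
    """
    w = np.exp(-eps / kBT)
    p_eq = w / w.sum()
    Q_eq = np.array([p_eq[groups == a].sum() for a in range(len(Q))])
    r = np.asarray(Q) / Q_eq                  # Q(alpha; t) / Q_eq(alpha)
    p_local = p_eq * r[groups]                # Eq. (A1)
    theta = kBT * np.log(r[1:] / r[0])        # fields for B_alpha = chi_alpha, alpha >= 1
    return theta, p_local

# Hypothetical example: six states grouped into G_0, G_1, G_2.
eps = np.array([0.0, 0.5, 1.0, 0.2, 0.8, 1.5])
groups = np.array([0, 0, 1, 1, 2, 2])
Q = np.array([0.2, 0.5, 0.3])
theta, p_local = local_equilibrium_fields(eps, groups, Q)
B = np.array([(groups == a).astype(float) for a in (1, 2)])   # chi_1, chi_2
p_te = tilted_equilibrium(eps, B, theta)
print(np.allclose(p_te, p_local))   # True
```

The group with α = 0 only enters through the reference ratio Q(0; t)/Q_eq(0), mirroring the exclusion of $\chi_0$ from the observables.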

Symmetry
Another situation is when the system and the initial distribution obey a symmetry [63]. Focusing on a discrete space X = {1, ..., N}, we consider a symmetry expressed by a permutation group G on X, whose element g ∈ G is a bijection from X to itself. We assume that the state energy is invariant under the permutations, $\epsilon(x) = \epsilon(g(x))$ for all g ∈ G and all x ∈ X. We also assume that the time evolution law obeys the symmetry. For example, if the time evolution is Markovian and given by $\partial_t p(x; t) = \sum_{y} [W(x, y) p(y; t) - W(y, x) p(x; t)]$, where W(x, y) is the transition rate from y to x, then we assume W(x, y) = W(g(x), g(y)) for all g ∈ G and all x, y ∈ X. We also impose the symmetry on the initial distribution, p(x; 0) = p(g(x); 0) for all g ∈ G and all x ∈ X. We assume that we can keep track of the expectation values of all observables that are invariant under the permutations. In other words, the observables $B_1(x), \ldots, B_K(x)$ and the constant observable form a basis of the linear subspace $\{w \in \mathbb{R}^N \mid w(x) = w(g(x)),\ \forall g \in G,\ \forall x \in X\}$ of $\mathbb{R}^N$.
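Under these conditions, we prove the realizability condition in Eq. (12) for all t ≥ 0. First, the time evolution preserves the symmetry: because the evolution law is invariant under g, the permuted distribution p(g(x); t) solves the same equation with the same initial condition as p(x; t), and therefore
$$ p(x; t) = p(g(x); t) \quad \text{for all } t \ge 0. \tag{A5} $$
Therefore, the vector $\{k_{\rm B}T \ln p(x; t) + \epsilon(x)\}_{x \in X}$ has the same symmetry, and thus it can be expanded in terms of $B_1, \ldots, B_K$ and the constant observable as
$$ k_{\rm B}T \ln p(x; t) + \epsilon(x) = \theta_0 + \sum_{\alpha=1}^{K} \theta_\alpha B_\alpha(x) \tag{A6} $$
with a set of real numbers $(\theta_\alpha)_{\alpha=0}^{K}$. Rearranging Eq. (A6) gives
$$ p(x; t) = \exp\Big\{-\Big[\epsilon(x) - \sum_{\alpha=1}^{K} \theta_\alpha B_\alpha(x) - \theta_0\Big]\Big/k_{\rm B}T\Big\}. \tag{A7} $$
This is the form of the equilibrium distribution under the external field $\sum_\alpha \theta_\alpha B_\alpha(x)$. Therefore, p(t) is realizable as a tilted equilibrium, and the realizability condition holds.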
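The symmetry argument can be checked on a small Markov jump process. The sketch below assumes a hypothetical four-state system (0-based indices) whose permutation g swaps states 1 and 2, and illustrative detailed-balance rates $W(x, y) = k(x, y)\, e^{-[\epsilon(x) - \epsilon(y)]/2k_{\rm B}T}$ with a symmetric, g-invariant k; neither the rates nor the numbers come from the text. It verifies Eq. (A5) at a finite time and that $k_{\rm B}T \ln p + \epsilon$ lies in the span of the constant and the invariant observables, as in Eq. (A6).

```python
import numpy as np

def generator(eps, k, kBT=1.0):
    """Generator L with L[x, y] = W(x, y) (rate y -> x) for x != y.
    Illustrative rates W(x, y) = k[x, y] exp{-[eps(x) - eps(y)] / (2 kBT)}
    satisfy detailed balance when k is symmetric."""
    W = k * np.exp(-(eps[:, None] - eps[None, :]) / (2 * kBT))
    np.fill_diagonal(W, 0.0)
    return W - np.diag(W.sum(axis=0))        # columns sum to zero

def propagate(L, eps, p0, t, kBT=1.0):
    """p(t) = exp(L t) p0 via the symmetrized generator D^{-1/2} L D^{1/2},
    D = diag(p_eq), which is symmetric under detailed balance."""
    s = np.exp(-eps / (2 * kBT))             # proportional to sqrt(p_eq)
    lam, V = np.linalg.eigh(L * s[None, :] / s[:, None])
    return s * (V @ (np.exp(lam * t) * (V.T @ (p0 / s))))

kBT = 1.0
eps = np.array([0.0, 1.0, 1.0, 0.5])         # invariant under swapping states 1 and 2
k = np.array([[0, 1, 1, .3], [1, 0, .7, 2], [1, .7, 0, 2], [.3, 2, 2, 0]])
perm = np.array([0, 2, 1, 3])                # the permutation g
L = generator(eps, k, kBT)
assert np.allclose(L, L[np.ix_(perm, perm)]) # W(x, y) = W(g(x), g(y))

p0 = np.array([0.4, 0.1, 0.1, 0.4])          # symmetric initial distribution
p = propagate(L, eps, p0, t=0.7, kBT=kBT)
print(np.allclose(p, p[perm]))               # Eq. (A5): True

# Constant, B_1 = indicator of {1, 2}, B_2 = indicator of {3}: a basis of invariant vectors.
A = np.column_stack([np.ones(4), [0, 1, 1, 0], [0, 0, 0, 1]])
v = kBT * np.log(p) + eps
coef, *_ = np.linalg.lstsq(A, v, rcond=None) # (theta_0, theta_1, theta_2) of Eq. (A6)
print(np.allclose(A @ coef, v))              # True
```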
Similarly, the realizability condition is expected to hold approximately if the symmetry does not hold perfectly but holds approximately.

Harmonic potential
The third situation is when the system has a (possibly multivariate) harmonic potential. More precisely, focusing on a continuous state space $x \equiv (x_1, \ldots, x_d)^{\rm T} \in X = \mathbb{R}^d$, we assume that the potential is harmonic, $\epsilon(x) = x^{\rm T} a x + b^{\rm T} x$, where a is a d × d positive-definite symmetric matrix and b is a d-dimensional vector. We also assume that the time evolution is given by the Fokker-Planck equation
$$ \partial_t p(x; t) = \nabla \cdot \{ \mu [\nabla \epsilon(x)\, p(x; t) + k_{\rm B}T \nabla p(x; t)] \}, \tag{A8} $$
where µ is a d × d positive-definite symmetric mobility tensor, with a Gaussian initial distribution. As for the observables, we assume that we can keep track of the mean and the covariance matrix. This amounts to choosing the linear observables $x_1, \ldots, x_d$ and the quadratic observables $x_i x_j$ ($1 \le i \le j \le d$), i.e., $(x_1)^2, x_1 x_2, \ldots, (x_d)^2$, as the observables B.
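Under this setup, we prove the realizability condition in Eq. (12) for all t ≥ 0. First, the solution of Eq. (A8) is Gaussian for all t ≥ 0 [Sec. VIII 6, 52], and therefore, it can be written in the form
$$ p(x; t) \propto \exp\{-[x^{\rm T} \tilde a(t) x + \tilde b(t)^{\rm T} x]/k_{\rm B}T\} \tag{A9} $$
using a d × d symmetric matrix $\tilde a(t)$ and a d-dimensional vector $\tilde b(t)$. We can rewrite Eq. (A9) as
$$ k_{\rm B}T \ln p(x; t) + \epsilon(x) = \sum_{i,j=1}^{d} [a - \tilde a(t)]_{ij}\, x_i x_j + [b - \tilde b(t)]^{\rm T} x + \text{const.}, \tag{A10} $$
which is a linear combination of the observables plus a constant. This shows that p(x; t) is realized as a tilted equilibrium distribution with an external field of the form $\sum_\alpha \theta_\alpha B_\alpha(x)$, and thus, the realizability condition is satisfied.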
Even if the initial distribution is not Gaussian, the realizability condition holds asymptotically after the initial fast transient. Restricting to the one-dimensional case for simplicity, in which a, b, and µ are scalars, we write the solution of the Fokker-Planck equation in terms of the changes in the cumulants. The deviation of the nth cumulant (n = 1, 2, ...) from its equilibrium value decays exponentially in time as exp(−2nµat) [52], and the equilibrium values are zero for n ≥ 3. Because the cumulants with n ≥ 3 decay faster than the deviations of the mean and the variance, only the first two cumulants remain significant after the initial fast transient. This means that the distribution is close to Gaussian, and the realizability condition holds asymptotically.
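The cumulant statement can be made concrete in closed form. For $\epsilon(x) = ax^2 + bx$, the overdamped dynamics gives $x(t) = x(0)e^{-kt}$ plus a constant plus a Gaussian term independent of x(0), with $k = 2\mu a$; since cumulants scale as $\kappa_n(cX) = c^n \kappa_n(X)$ and add over independent terms, each deviation from equilibrium decays as $e^{-2n\mu a t}$. The minimal sketch below encodes this; the function name and the initial cumulants are hypothetical.

```python
import numpy as np

def ou_cumulants(kappa0, t, a, b, mu, kBT=1.0):
    """Cumulants kappa_n(t), n = 1..len(kappa0), for the 1D Fokker-Planck equation
    with eps(x) = a x^2 + b x and scalar mobility mu.
    The deviation of kappa_n from its equilibrium value decays as exp(-2 n mu a t)."""
    kappa0 = np.asarray(kappa0, dtype=float)
    n = np.arange(1, len(kappa0) + 1)
    kappa_eq = np.zeros_like(kappa0)
    kappa_eq[0] = -b / (2 * a)             # equilibrium mean
    if len(kappa0) > 1:
        kappa_eq[1] = kBT / (2 * a)        # equilibrium variance
    return kappa_eq + (kappa0 - kappa_eq) * np.exp(-2 * n * mu * a * t)

# Hypothetical non-Gaussian initial state: mean, variance, 3rd and 4th cumulants.
kappa0 = [1.0, 0.2, 0.3, 0.1]
for t in (0.0, 0.5, 1.0, 2.0):
    k1, k2, k3, k4 = ou_cumulants(kappa0, t, a=1.0, b=0.0, mu=1.0)
    # the standardized 3rd cumulant (skewness) shrinks toward 0 as the state becomes Gaussian
    print(t, k3 / k2**1.5)
```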
We can expect that the realizability condition holds approximately when the potential is close to, but not exactly, harmonic. We explore this situation in the example in Sec. IV.

Appendix B: Detailed derivation of the results
In Appendixes B 1-B 3, we follow the steps outlined in Sec. VI A to derive the inference of the minimum EP [Eq. (10)]. In Appendixes B 4 and B 5, we derive the additional inference schemes described in Sec. V.

KL divergence
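We start the derivation by relating the thermodynamic quantities to the KL divergence. For any probability distribution p,
$$ k_{\rm B}T D_{\rm KL}[p \| p_{\rm eq}] = \int_X dx\, p(x) [k_{\rm B}T \ln p(x) + \epsilon(x)] - F_{\rm eq} = H[p] - F_{\rm eq}, \tag{B1} $$
where $F_{\rm eq} \equiv F(0)$ is the equilibrium free energy at $p_{\rm eq}$, and we have used $p_{\rm eq}(x) = \exp\{-[\epsilon(x) - F_{\rm eq}]/k_{\rm B}T\}$. For $p = p_{\rm eq}$, Eq. (B1) gives $H[p_{\rm eq}] = F_{\rm eq}$. Thus, we obtain
$$ \Delta H[p] \equiv H[p] - H[p_{\rm eq}] = k_{\rm B}T D_{\rm KL}[p \| p_{\rm eq}], \tag{B2} $$
which is a well-known expression of the EP [64][65][66].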
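Next, we rewrite the tilted equilibrium free energy in terms of the KL divergence. For two arbitrary sets of external field parameters, θ and θ′, we obtain
$$ k_{\rm B}T D_{\rm KL}[p_{\rm te}(\theta) \| p_{\rm te}(\theta')] = F(\theta) - F(\theta') + \sum_\alpha (\theta_\alpha - \theta'_\alpha)\, \eta_\alpha(\theta), \tag{B3} $$
where $\eta_\alpha(\theta) = \langle B_\alpha \rangle_{p_{\rm te}(\theta)}$. Substitutions θ → 0 and θ′ → θ give an expression of ΔF(θ),
$$ \Delta F(\theta) = -k_{\rm B}T D_{\rm KL}[p_{\rm eq} \| p_{\rm te}(\theta)] - \sum_\alpha \theta_\alpha \langle B_\alpha \rangle_{p_{\rm eq}}, \tag{B4} $$
where we used $p_{\rm te}(0) = p_{\rm eq}$. Note that a similar expression has been found in Ref. [67].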
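The two KL identities for the tilted equilibrium can be checked numerically on a discrete state space. The minimal sketch below assumes randomly drawn (hypothetical) energies and observables and the tilted energy $\epsilon(x) - \sum_\alpha \theta_\alpha B_\alpha(x)$; it compares both sides of the θ, θ′ relation and of its θ → 0, θ′ → θ specialization for ΔF(θ).

```python
import numpy as np

kBT = 1.0
rng = np.random.default_rng(0)
N, K = 6, 2
eps = rng.normal(size=N)         # hypothetical state energies
B = rng.normal(size=(K, N))      # hypothetical observables B_alpha(x)

def F(theta):
    """Tilted equilibrium free energy F(theta) = -kBT ln sum_x exp{-[eps(x) - theta.B(x)]/kBT}."""
    return -kBT * np.log(np.sum(np.exp(-(eps - theta @ B) / kBT)))

def p_te(theta):
    """Tilted equilibrium distribution exp{-[eps - theta.B - F(theta)]/kBT}."""
    return np.exp(-(eps - theta @ B - F(theta)) / kBT)

def kl(p, q):
    return np.sum(p * np.log(p / q))

theta, theta_p = rng.normal(size=K), rng.normal(size=K)
eta = B @ p_te(theta)            # eta(theta) = <B>_{p_te(theta)}

# KL between two tilted equilibria versus free-energy difference plus field term
lhs_B3 = kBT * kl(p_te(theta), p_te(theta_p))
rhs_B3 = F(theta) - F(theta_p) + (theta - theta_p) @ eta
# Delta F(theta) via D_KL[p_eq || p_te(theta)], using p_te(0) = p_eq
zero = np.zeros(K)
lhs_B4 = F(theta) - F(zero)
rhs_B4 = -kBT * kl(p_te(zero), p_te(theta)) - theta @ (B @ p_te(zero))
print(np.isclose(lhs_B3, rhs_B3), np.isclose(lhs_B4, rhs_B4))   # True True
```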

Explicit expression of the minimum EP
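We prove Eq. (21), which is one of the two elements of the proof of Eq. (10). Let η be any set of expectation values and q be any distribution that satisfies $\langle B \rangle_q = \eta$. The given set of expectation values η determines the set of parameter values θ(η) and the tilted equilibrium $p_{\rm te}(\theta(\eta))$, which we write as θ and $p_{\rm te}$ for conciseness. Equation (9) reads $\langle B \rangle_{p_{\rm te}} = \eta$ in this shorthand notation. Then the three distributions, q, $p_{\rm te}$, and $p_{\rm eq}$, satisfy the generalized Pythagorean theorem [48]:
$$ D_{\rm KL}[q \| p_{\rm eq}] = D_{\rm KL}[q \| p_{\rm te}] + D_{\rm KL}[p_{\rm te} \| p_{\rm eq}]. \tag{B5} $$
To prove this relation, we calculate the difference as follows:
$$ \begin{aligned} & D_{\rm KL}[q \| p_{\rm eq}] - D_{\rm KL}[q \| p_{\rm te}] - D_{\rm KL}[p_{\rm te} \| p_{\rm eq}] \\ &= \int_X dx\, [q(x) - p_{\rm te}(x)] \ln \frac{p_{\rm te}(x)}{p_{\rm eq}(x)} \\ &= \frac{1}{k_{\rm B}T} \int_X dx\, [q(x) - p_{\rm te}(x)] \Big[\sum_\alpha \theta_\alpha B_\alpha(x) + \Delta F(\theta)\Big] \\ &= \frac{1}{k_{\rm B}T} \sum_\alpha \theta_\alpha \big(\langle B_\alpha \rangle_q - \langle B_\alpha \rangle_{p_{\rm te}}\big) \\ &= 0. \end{aligned} \tag{B6} $$
The third equality follows from $\int_X dx\, p_{\rm te}(x) = \int_X dx\, q(x) = 1$, and the last equality is due to the relation of expectation values $\langle B \rangle_{p_{\rm te}} = \eta = \langle B \rangle_q$. Using Eq. (B2), we can rewrite Eq. (B5) as
$$ \Delta H[q] = \Delta H[p_{\rm te}] + k_{\rm B}T D_{\rm KL}[q \| p_{\rm te}]. \tag{B7} $$
Recalling the abbreviation $p_{\rm te} \equiv p_{\rm te}(\theta(\eta))$, Eq. (B7) is identical to the desired relation in Eq. (21). We note that inserting Eqs. (22) and (23b) into Eq. (B7) gives an expression like the Donsker-Varadhan representation [68], but the use of coordinates θ and the restriction of the set of observables are unique to information geometry.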
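The generalized Pythagorean theorem, Eq. (B5), and the resulting bound $\Delta H[q] \ge \Delta H[p_{\rm te}(\theta(\eta))]$ can be illustrated numerically. In the minimal sketch below (hypothetical random energies and observables), a distribution q sharing the expectation values η with $p_{\rm te}$ is built by adding a perturbation orthogonal to the constant observable and to every $B_\alpha$, obtained from the null space of the stacked observables.

```python
import numpy as np

kBT = 1.0
rng = np.random.default_rng(1)
N, K = 6, 2
eps = rng.normal(size=N)          # hypothetical state energies
B = rng.normal(size=(K, N))       # hypothetical observables

def p_te(theta):
    w = np.exp(-(eps - theta @ B) / kBT)
    return w / w.sum()

def kl(p, q):
    return np.sum(p * np.log(p / q))

p_eq = p_te(np.zeros(K))
pt = p_te(np.array([0.4, -0.3]))  # plays the role of p_te(theta(eta))
eta = B @ pt

# q with the same normalization and <B>_q = eta: add a direction orthogonal
# to the constant observable and to every B_alpha.
M = np.vstack([np.ones(N), B])
null = np.linalg.svd(M)[2][K + 1:]                        # rows span the null space of M
delta = null.T @ rng.normal(size=N - K - 1)
q = pt + 0.5 * pt.min() * delta / np.abs(delta).max()     # stays positive
assert np.allclose(B @ q, eta) and np.isclose(q.sum(), 1.0)

print(np.isclose(kl(q, p_eq), kl(q, pt) + kl(pt, p_eq)))  # Eq. (B5): True
print(kBT * kl(q, p_eq) >= kBT * kl(pt, p_eq))             # Delta H[q] >= Delta H[p_te]: True
```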

Legendre duality
We prove the Legendre transform between ∆H m (η) and ∆F (θ) in Eq. (23), which is the second element of the proof of Eq. (10).
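First, we prove the Legendre transform from ΔF(θ) to $\Delta H_{\rm m}(\eta)$. We fix an arbitrary set of parameters θ and use $\eta \equiv \eta(\theta) = \langle B \rangle_{p_{\rm te}(\theta)}$ for the corresponding set of expectation values. The derivative relation $\partial \Delta F/\partial \theta_\alpha = -\eta_\alpha$ is proved as
$$ \frac{\partial \Delta F}{\partial \theta_\alpha} = -k_{\rm B}T \frac{\partial}{\partial \theta_\alpha} \ln \int_X dx\, e^{-[\epsilon(x) - \sum_\beta \theta_\beta B_\beta(x)]/k_{\rm B}T} = -\int_X dx\, B_\alpha(x)\, p_{\rm te}(x; \theta) = -\eta_\alpha. \tag{B8} $$
This relation states that the derivative of the equilibrium free energy with respect to an external field parameter gives minus the corresponding expectation value, which is a classical result in statistical mechanics (e.g., Ref. [69]). To prove the relation between the two functions, $\Delta H_{\rm m}(\eta) = \Delta F(\theta) + \sum_\alpha \theta_\alpha \eta_\alpha$, we use the expression of $\Delta H_{\rm m}(\eta)$ in Eq. (22) and find
$$ \Delta H_{\rm m}(\eta) = \Delta H[p_{\rm te}(\theta)] = k_{\rm B}T D_{\rm KL}[p_{\rm te}(\theta) \| p_{\rm eq}] = \Delta F(\theta) + \sum_\alpha \theta_\alpha \eta_\alpha, \tag{B9} $$
where we have used Eqs. (B2) and (B3). Next, we prove the inverse transformation from $\Delta H_{\rm m}(\eta)$ to ΔF(θ). For this purpose, it suffices to show that ΔF(θ) is a strictly concave function. Then the general theory of Legendre duality for strictly convex or strictly concave functions (e.g., Ref. [70]) leads to the inverse transform $\partial \Delta H_{\rm m}/\partial \eta_\alpha = \theta_\alpha$ and $\Delta F(\theta) = \Delta H_{\rm m}(\eta) - \sum_\alpha \theta_\alpha \eta_\alpha$. The general theory also ensures that $\Delta H_{\rm m}$ is strictly convex.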
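To prove the strict concavity of ΔF, we consider two sets of parameter values θ and θ′. We consider the graph of the function ΔF(·) and its tangent plane at θ. The tangent plane, which we write as T(·; θ), is given by
$$ T(\theta'; \theta) = \Delta F(\theta) + \sum_\alpha \frac{\partial \Delta F}{\partial \theta_\alpha}(\theta)\, (\theta'_\alpha - \theta_\alpha). \tag{B10} $$
This tangent plane always lies above the graph of ΔF(·):
$$ T(\theta'; \theta) - \Delta F(\theta') = \Delta F(\theta) - \Delta F(\theta') + \sum_\alpha \eta_\alpha(\theta)\, (\theta_\alpha - \theta'_\alpha) = k_{\rm B}T D_{\rm KL}[p_{\rm te}(\theta) \| p_{\rm te}(\theta')] \ge 0, \tag{B11} $$
where we have used the derivative relation in Eq. (B8) in the first equality, and the second equality follows from Eq. (B3). The last inequality holds with equality if and only if $p_{\rm te}(\theta') = p_{\rm te}(\theta)$. This happens if and only if θ′ = θ, since we have assumed that the observables B and the constant observable are linearly independent. Thus, the function ΔF is strictly concave.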
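The Legendre duality can be exercised numerically as a small inference routine: given η, solve $\langle B \rangle_{p_{\rm te}(\theta)} = \eta$ by maximizing the concave function $\Delta F(\theta) + \theta \cdot \eta$ (its stationarity condition is exactly $\eta(\theta) = \eta$ by the derivative relation), then evaluate $\Delta H_{\rm m}(\eta) = \Delta F(\theta) + \theta \cdot \eta$. The sketch below uses damped Newton steps with the Hessian $-\langle\langle B_\alpha, B_\beta \rangle\rangle/k_{\rm B}T$; this solver choice, the helper names, and the random system are our own assumptions, not the procedure of the main text.

```python
import numpy as np

kBT = 1.0
rng = np.random.default_rng(2)
N, K = 6, 2
eps = rng.normal(size=N)         # hypothetical state energies
B = rng.normal(size=(K, N))      # hypothetical observables

def log_Z(theta):
    return np.log(np.sum(np.exp(-(eps - theta @ B) / kBT)))

def dF(theta):
    """Delta F(theta) = F(theta) - F(0)."""
    return -kBT * (log_Z(theta) - log_Z(np.zeros(K)))

def moments(theta):
    """eta(theta) = <B>_{p_te(theta)} and the covariance <<B_a, B_b>>_{p_te(theta)}."""
    w = np.exp(-(eps - theta @ B) / kBT)
    p = w / w.sum()
    eta = B @ p
    return eta, (B * p) @ B.T - np.outer(eta, eta)

def theta_of_eta(eta, iters=50):
    """Maximize the concave G(theta) = Delta F(theta) + theta.eta by damped Newton steps."""
    G = lambda th: dF(th) + th @ eta
    theta = np.zeros(K)
    for _ in range(iters):
        eta_th, cov = moments(theta)
        step = kBT * np.linalg.solve(cov, eta - eta_th)   # Hessian of G is -cov/kBT
        s = 1.0
        while G(theta + s * step) < G(theta) and s > 1e-10:
            s *= 0.5                                      # backtrack to keep G increasing
        theta = theta + s * step
    return theta

def dH_min(eta):
    """Minimum EP compatible with eta: Delta F(theta(eta)) + theta(eta).eta."""
    theta = theta_of_eta(eta)
    return dF(theta) + theta @ eta

theta_true = np.array([0.4, -0.3])
eta = moments(theta_true)[0]
print(np.allclose(theta_of_eta(eta), theta_true))        # inverse map theta(eta)
h = 1e-5
grad = [(dH_min(eta + h * e) - dH_min(eta - h * e)) / (2 * h) for e in np.eye(K)]
print(np.allclose(grad, theta_true, atol=1e-5))          # d(Delta H_m)/d(eta) = theta
print(np.all(np.linalg.eigvalsh(-moments(theta_true)[1] / kBT) < 0))   # Delta F strictly concave
```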

Thermodynamic force
We derive the relation between the thermodynamic force and the external fields in Eq. (18) under the realizability condition. Since the realizability condition implies $p(t) = p_{\rm te}(\theta(\eta(t)))$ (Sec. VI B), we have
$$ \ln \frac{p(x; t)}{p(y; t)} = \ln \frac{p_{\rm te}(x; \theta(\eta(t)))}{p_{\rm te}(y; \theta(\eta(t)))} = -\frac{1}{k_{\rm B}T} \Big\{ \epsilon(x) - \epsilon(y) - \sum_\alpha \theta_\alpha(\eta(t)) [B_\alpha(x) - B_\alpha(y)] \Big\}. \tag{B12} $$
Comparing this relation with the definition of the thermodynamic force in Eq. (17) directly yields Eq. (18). The continuous-state version is proved similarly by replacing the differences in quantities between two states by their spatial derivatives, e.g., ϵ(x) − ϵ(y) by ∇ϵ(x).
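The log-ratio relation used in this derivation needs only the state energies, the observables, and the inferred field parameters. The minimal sketch below (hypothetical three-state system, one observable, and a hypothetical function name) evaluates $\ln[p(x)/p(y)]$ from θ alone and compares it with the log-ratio of the tilted equilibrium computed directly. How this log-ratio combines into the thermodynamic force follows the definition in Eq. (17), which is not reproduced here.

```python
import numpy as np

def log_ratio_from_fields(eps, B, theta, kBT=1.0):
    """R[x, y] = ln[p(x)/p(y)] for a tilted equilibrium p = p_te(theta):
    R[x, y] = -{eps(x) - eps(y) - sum_a theta_a [B_a(x) - B_a(y)]} / kBT."""
    u = eps - theta @ B                      # tilted energy eps(x) - sum_a theta_a B_a(x)
    return -(u[:, None] - u[None, :]) / kBT

# Hypothetical three-state example with a single observable.
eps = np.array([0.0, 0.7, 1.2])
B = np.array([[0.0, 1.0, 2.0]])
theta = np.array([0.5])
w = np.exp(-(eps - theta @ B))
p = w / w.sum()
R = log_ratio_from_fields(eps, B, theta)
print(np.allclose(R, np.log(p[:, None] / p[None, :])))   # True
```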

Monotonicity
We prove the monotonicity of the function L(θ) in Eq. (20), assuming the realizability condition and a Markovian time evolution. Comparing Eqs. (B4) and (19), we obtain $L(\theta) = -k_{\rm B}T D_{\rm KL}[p_{\rm eq} \| p_{\rm te}(\theta)]$ for any θ. Combined with the realizability condition, this gives
$$ L(\theta(\eta(t))) = -k_{\rm B}T D_{\rm KL}[p_{\rm eq} \| p(t)]. \tag{B13} $$
For Markovian time evolutions, the KL divergence has the contraction property [52],
$$ \frac{d}{dt} D_{\rm KL}[q(t) \| p(t)] \le 0, \tag{B14} $$
for two arbitrary trajectories, p(t) and q(t), obeying the same Markovian time evolution. By substituting $p_{\rm eq}$ for q(t) and noting that $p_{\rm eq}$ is the fixed point of the time evolution, we obtain $d_t D_{\rm KL}[p_{\rm eq} \| p(t)] \le 0$. Combining this inequality with Eq. (B13) proves the desired monotonicity in Eq. (20). Note that substituting $p_{\rm eq}$ for p(t) in Eq. (B14), relabeling q(t) as p(t), and using the expression in Eq. (B2) gives the monotonicity $d_t \Delta H[p(t)] \le 0$, which is the second law of thermodynamics [49]. Therefore, the monotonicities of L and ΔH have similar mathematical origins.
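Both monotonicities follow from the contraction of the KL divergence and can be observed on any relaxing Markov jump process. The sketch below assumes a hypothetical five-state system with illustrative detailed-balance rates and a random initial distribution, and tracks $-k_{\rm B}T D_{\rm KL}[p_{\rm eq} \| p(t)]$ (which equals L(θ(η(t))) only under the realizability condition) and $\Delta H[p(t)] = k_{\rm B}T D_{\rm KL}[p(t) \| p_{\rm eq}]$ on a time grid.

```python
import numpy as np

rng = np.random.default_rng(3)
N, kBT = 5, 1.0
eps = rng.normal(size=N)                       # hypothetical energies
k = rng.uniform(0.5, 2.0, size=(N, N))
k = (k + k.T) / 2                              # symmetric base rates
W = k * np.exp(-(eps[:, None] - eps[None, :]) / (2 * kBT))   # detailed balance
np.fill_diagonal(W, 0.0)
gen = W - np.diag(W.sum(axis=0))               # dp/dt = gen @ p

# exp(gen t) via the symmetrized generator D^{-1/2} gen D^{1/2}, D = diag(p_eq)
s = np.exp(-eps / (2 * kBT))
lam, V = np.linalg.eigh(gen * s[None, :] / s[:, None])

def propagate(p0, t):
    return s * (V @ (np.exp(lam * t) * (V.T @ (p0 / s))))

def kl(p, q):
    return np.sum(p * np.log(p / q))

p_eq = s**2 / np.sum(s**2)
p0 = rng.dirichlet(np.ones(N))                 # hypothetical initial distribution
ts = np.linspace(0.0, 3.0, 61)
ps = [propagate(p0, t) for t in ts]
L_vals = [-kBT * kl(p_eq, p) for p in ps]      # -kBT D_KL[p_eq || p(t)], nondecreasing
dH_vals = [kBT * kl(p, p_eq) for p in ps]      # Delta H[p(t)], nonincreasing
print(np.all(np.diff(L_vals) >= -1e-12), np.all(np.diff(dH_vals) <= 1e-12))   # True True
```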

Appendix C: Perturbative calculation of the error
We compare a reference system that satisfies the realizability condition with a slightly different (perturbed) system that does not fulfill the realizability condition.Our goal is to show that the discrepancy between p(t) and p te (θ( η(t))) scales linearly with the magnitude of the perturbation.
We specify the reference system by the Markovian time evolution generator L, the state energy $\epsilon \equiv \{\epsilon(x)\}_{x \in X}$, the initial distribution $p(0) \equiv \{p(x; 0)\}_{x \in X}$, and the set of observables $B \equiv \{B(x)\}_{x \in X}$. The Markovian time evolution generator, used only in this appendix, generates the time evolution as $d_t p(t) = L p(t)$. We do not assume any particular form of it, only imposing the relaxation to the equilibrium $p_{\rm eq}(x) \propto e^{-\epsilon(x)}$. Here and until the end of this appendix, we set $k_{\rm B}T = 1$ for simplicity.
At a fixed time t, the reference system has the distribution p(t), the set of expectation values η(t) = ⟨B⟩ p(t) , the corresponding set of external field parameters θ( η(t)), and the tilted equilibrium p te (θ( η(t))).For conciseness, we use the symbols p, η, θ, and p te to denote these quantities for the reference system at the fixed time t.We assume that the reference system satisfies the realizability condition.As discussed in Sec.VI B, the realizability condition is equivalent to p(t) = p te (θ( η(t))), which reads p = p te in the shorthand notation adopted here.
First, the perturbed probability distribution at time t is where ≃ denotes equality to the leading order in λ. Since $p = e^{Lt} p(0)$ for the reference system, the perturbation term has the exponent β = 1. Note that we cannot exclude the possibility that the O(λ) term in Eq. (C1) vanishes identically, but in that case, we can still set β = 1 and write Eq. (C1) as $p + \lambda p' + o(\lambda)$ with $p' = 0$. This caveat also applies to $\lambda^\gamma \eta'$, $\lambda^\mu \theta'$, and $\lambda^\nu p'_{\rm te}$ below. The perturbed expectation values of the observables are The reference system has the set of expectation values $\eta = \int_X dx\, p(x) B(x)$, and thus, the exponent of the perturbation term is γ = 1.
Next, we evaluate the perturbations to the set of parameter values θ. It is determined by solving with respect to $\theta + \lambda^\mu \theta'$, where $\theta + \lambda^\mu \theta'$ is related to $p_{\rm te} + \lambda^\nu p'_{\rm te}$ as where we have defined the covariance $\langle\langle X, Y \rangle\rangle \equiv \langle XY \rangle - \langle X \rangle \langle Y \rangle$. Inserting this into Eq. (C3), the resulting zeroth-order equation, $\langle B \rangle = \eta$, is the equation for determining θ of the reference system, and therefore, it is already satisfied. The remaining terms give the equation to determine $\lambda^\mu \theta'$: (C7) This gives the exponent µ = 1.
Finally, we calculate the perturbation to p te .Substituting µ = 1 back into Eq.(C4), we find that the exponent of the perturbation to p te is ν = 1.
Combining the above results, we can evaluate the discrepancy between the nonstationary distribution and the tilted equilibrium distribution for the perturbed system as where we have used the realizability condition for the reference system.From Eq. (C8), it is fair to say that the discrepancy generically scales linearly in λ, which is the fact we used in Sec.VI B to evaluate the scaling of the error in the inference of the true EP.The exception is when p′ − p ′ te happens to vanish, in which case the scaling of the discrepancy is of higher order, and the error is even smaller.
respectively, to eliminate T from the theory because the temperature T is not defined when the system is in contact with multiple heat baths.With these replacements, we can formally redefine the EP and the tilted equilibrium free energy as The difference ∆ Ĥ[p] = Ĥ[p] − Ĥ[p st ] is known as the Hatano-Sasa excess EP [71] and the nonadiabatic EP [72] over the process from p to p st .We can formally reproduce all our results with these replacements.If we can physically implement an external field that modifies the stationary distribution p st to a tilted stationary distribution proportional to exp{−[ϕ(x) − v(x; θ)]}, we can use the procedure in the main text to obtain the minimum of ∆ Ĥ[p] compatible with the observed expectation values, and we can also calculate other thermodynamic quantities under the realizability condition.However, this generalization remains formal because we cannot physically realize the tilted stationary distribution generically.

Appendix E: Supplemental information on the example

Details of the numerical calculations
In the numerical calculations (Figs. 2-4), we numerically solved the Fokker-Planck equation by discretizing the space and solving the resulting discrete-space, continuous-time Markov jump system. We calculated the expectation values of the observables, the EP, and the tilted equilibrium free energy by replacing the integrals with sums over the discretized space. We confirmed that the results do not depend on the spatial mesh size. We also checked that the results are consistent with the analytical calculation for $\epsilon_r(x) = 0$, which is detailed in the next section.
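The text does not specify the discretization scheme. The sketch below assumes a common choice: nearest-neighbour jumps on a uniform grid with rates $(\mu k_{\rm B}T/h^2)\, e^{-[\epsilon(\text{target}) - \epsilon(\text{source})]/2k_{\rm B}T}$, which satisfy detailed balance with $p_{\rm eq} \propto e^{-\epsilon/k_{\rm B}T}$ and reproduce the Fokker-Planck current to leading order in the grid spacing h. For the harmonic case $\epsilon = ax^2$ and a Gaussian initial state with the equilibrium variance, the exact mean decays as $e^{-2\mu a t}$ and the exact $\Delta H$ equals $a \langle x \rangle^2$, which serves as a check. The grid, parameters, and helper names are hypothetical.

```python
import numpy as np

def fp_generator(x, eps, mu=1.0, kBT=1.0):
    """Assumed jump-process discretization of dp/dt = d/dx[mu (eps' p + kBT dp/dx)]
    on a uniform grid, with reflecting boundaries at the grid ends."""
    h = x[1] - x[0]
    D = mu * kBT / h**2
    n = len(x)
    W = np.zeros((n, n))
    i = np.arange(n - 1)
    W[i + 1, i] = D * np.exp(-(eps[i + 1] - eps[i]) / (2 * kBT))   # jump i -> i+1
    W[i, i + 1] = D * np.exp(-(eps[i] - eps[i + 1]) / (2 * kBT))   # jump i+1 -> i
    return W - np.diag(W.sum(axis=0))

def propagate(gen, eps, p0, t, kBT=1.0):
    """p(t) = exp(gen t) p0 via the symmetrized (detailed-balance) generator."""
    s = np.exp(-(eps - eps.min()) / (2 * kBT))   # proportional to sqrt(p_eq)
    lam, V = np.linalg.eigh(gen * s[None, :] / s[:, None])
    return s * (V @ (np.exp(lam * t) * (V.T @ (p0 / s))))

a, mu, kBT = 1.0, 1.0, 1.0
x = np.linspace(-5, 5, 201)                      # grid spacing h = 0.05
eps = a * x**2                                   # harmonic case (no ragged term)
gen = fp_generator(x, eps, mu, kBT)
p0 = np.exp(-(x - 1.0)**2 / (2 * 0.5))          # Gaussian, mean 1, variance kBT/(2a)
p0 /= p0.sum()
p_eq = np.exp(-eps / kBT)
p_eq /= p_eq.sum()
for t in (0.25, 0.5, 1.0):
    p = propagate(gen, eps, p0, t, kBT)
    m = p > 0                                    # drop round-off negatives before the log
    dH = kBT * np.sum(p[m] * np.log(p[m] / p_eq[m]))    # Delta H[p(t)], Eq. (B2)
    print(t, x @ p, np.exp(-2 * mu * a * t), dH, a * (x @ p)**2)
```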

Analytical calculation for the harmonic system
We analytically calculate the relevant quantities for the example system in the main text (Sec.IV) with ϵ(x) = ax 2 , i.e., in the absence of the ragged potential.Assuming a Gaussian initial distribution, the system satisfies the realizability condition, as discussed in the main text.
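In the tilted equilibrium measurements, we apply the external fields in Eq. (4) and measure the expectation values of the observables $B_1 = x$ and $B_2 = x^2$. The modified state energy due to the external fields is
$$ \epsilon(x) - \theta_1 x - \theta_2 x^2 = (a - \theta_2) x^2 - \theta_1 x. $$
The modified state energy is harmonic, and therefore, the tilted equilibrium is a Gaussian distribution with mean $\theta_1/[2(a - \theta_2)]$ and variance $k_{\rm B}T/[2(a - \theta_2)]$, which requires $\theta_2 < a$. The set of expectation values at the tilted equilibrium is
$$ \eta_1(\theta) = \frac{\theta_1}{2(a - \theta_2)}, \qquad \eta_2(\theta) = \frac{k_{\rm B}T}{2(a - \theta_2)} + \frac{\theta_1^2}{4(a - \theta_2)^2}. $$
The tilted equilibrium free energy, measured from its value at θ = 0, is
$$ \Delta F(\theta) = -\frac{k_{\rm B}T}{2} \ln \frac{a}{a - \theta_2} - \frac{\theta_1^2}{4(a - \theta_2)}. $$
The inverse correspondence from η to θ is
$$ \theta_1(\eta) = \frac{k_{\rm B}T\, \eta_1}{\eta_2 - (\eta_1)^2}, \qquad \theta_2(\eta) = a - \frac{k_{\rm B}T}{2[\eta_2 - (\eta_1)^2]}, \tag{E5} $$
where the term $\eta_2 - (\eta_1)^2$ is the variance of the tilted equilibrium distribution.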
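The closed-form maps above lend themselves to a direct implementation of the minimum-EP inference for the harmonic system. The minimal sketch below assumes $B_1 = x$ and $B_2 = x^2$ and uses hypothetical function names and observed values. Because every Gaussian is a tilted equilibrium here, $\Delta H_{\rm m}(\eta)$ should coincide with $k_{\rm B}T D_{\rm KL}$ between a Gaussian with the observed mean and variance and the equilibrium Gaussian of variance $k_{\rm B}T/2a$, which the code checks.

```python
import numpy as np

def eta_of_theta(theta, a, kBT=1.0):
    """(<x>, <x^2>) at the tilted equilibrium with energy (a - theta_2) x^2 - theta_1 x."""
    t1, t2 = theta
    c = a - t2                                   # must be positive
    mean, var = t1 / (2 * c), kBT / (2 * c)
    return np.array([mean, var + mean**2])

def theta_of_eta(eta, a, kBT=1.0):
    """Inverse map, Eq. (E5); eta[1] - eta[0]**2 is the variance."""
    var = eta[1] - eta[0]**2
    return np.array([kBT * eta[0] / var, a - kBT / (2 * var)])

def delta_F(theta, a, kBT=1.0):
    """Delta F(theta) = F(theta) - F(0) for the Gaussian tilted equilibrium."""
    t1, t2 = theta
    c = a - t2
    return -0.5 * kBT * np.log(a / c) - t1**2 / (4 * c)

def dH_min(eta, a, kBT=1.0):
    """Minimum EP compatible with eta: Delta F(theta(eta)) + theta(eta).eta."""
    theta = theta_of_eta(eta, a, kBT)
    return delta_F(theta, a, kBT) + theta @ eta

a, kBT = 1.0, 1.0
m, v = 0.8, 0.3                                  # hypothetical observed mean and variance
eta = np.array([m, v + m**2])
s_eq = kBT / (2 * a)
kl_gauss = 0.5 * kBT * (v / s_eq + m**2 / s_eq - 1 - np.log(v / s_eq))
print(np.isclose(dH_min(eta, a, kBT), kl_gauss))                 # True
print(np.allclose(eta_of_theta(theta_of_eta(eta, a), a), eta))   # True
```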

FIG. 2. Demonstration of our inference method with a one-dimensional Brownian particle with $\epsilon(x) = ax^2 + b\cos cx$. The upper panels are with b = 0 (harmonic), and the lower panels are with $b = k_{\rm B}T$ (ragged). We scale the energy by $k_{\rm B}T$, the length by $l_0 \equiv \sqrt{k_{\rm B}T/a}$, and the time by $t_0 \equiv 1/(a\mu)$, where µ is the mobility. See the main text for parameter values. (a) The potential energy ϵ(x) (gray) and the probability distributions over a relaxation process (blue; vertically shifted). These quantities are assumed to be unmeasurable. (b) The tilted equilibrium measurement yields the expectation values of the observables $(\eta_1(\theta), \eta_2(\theta))$ while varying the external field parameters $(\theta_1, \theta_2)$. (c) Inference of the minimum entropy production (EP). The minimum EP $\Delta H_{\rm m}(\eta(t))$ is obtained from the measurements (red), while the true EP $\Delta H[p(t)]$ is not measurable (blue). (d) The difference between the true EP and the obtained minimum EP, $\Delta H[p(t)] - \Delta H_{\rm m}(\eta(t))$.

FIG. 3. Relative error {∆H[ p(t)] − ∆H m ( η(t))}/∆H[ p(t)] between the true entropy production (EP) and the calculated minimum EP.(a) Relative error over time, plotted for several values of the height b of the ragged potential.(b) Relative error over b, plotted for four time points.Each symbol represents a time point indicated in (a).The relative error grows quadratically in b (dashed line).
Figure 3(a) shows the relative error, $\{\Delta H[p(t)] - \Delta H_{\rm m}(\eta(t))\}/\Delta H[p(t)]$, for various values of b. The relative error is zero for b = 0 due to the realizability condition, and it increases with b. It does not significantly depend on t. As shown in Fig. 3(b), the relative error for a fixed t scales quadratically in b. Therefore, the calculated value $\Delta H_{\rm m}(\eta(t))$ equals the true EP $\Delta H[p(t)]$ up to the first order in b. In fact, this quadratic dependence is a generic property, as discussed in Sec. VI B.

FIG. 4. Additional inference under the realizability condition demonstrated with the same systems as in Fig. 2. The harmonic system (upper panels) satisfies the realizability condition, while the ragged system (lower panels) satisfies it only approximately.(a) Inference of the entropy production (EP) rate.The true EP rate T σ(t) = −d t ∆H[ p(t)] (blue) and the obtained estimate −θ( η(t)) • d t η(t) (red) are in exact agreement under the realizability condition.We also plot the decomposed values −θ α ( η(t))d t ηα (t) (green).(b) Constraint on Markovian time evolution.The function L(θ( η(t))) increases monotonically with time under the realizability condition.