Housekeeping and excess entropy production for general nonlinear dynamics

We propose a housekeeping/excess decomposition of entropy production for general nonlinear dynamics in a discrete space, including chemical reaction networks and discrete stochastic systems. We exploit the geometric structure of thermodynamic forces to define the decomposition; this does not rely on the notion of a steady state, and even applies to systems that exhibit multistability, limit cycles, and chaos. In the decomposition, distinct aspects of the dynamics contribute separately to entropy production: the housekeeping part stems from a cyclic mode that arises from external driving, generalizing Schnakenberg's cyclic decomposition to non-steady states, while the excess part stems from an instantaneous relaxation mode that arises from conservative forces. Our decomposition refines previously known thermodynamic uncertainty relations and speed limits. In particular, it not only improves an optimal-transport-theoretic speed limit, but also extends the optimal transport theory of discrete systems to nonlinear and nonconservative settings.


I. INTRODUCTION
The notion of "nonequilibrium" has two aspects: breaking of detailed balance and nonstationarity. The housekeeping/excess decomposition of entropy production rate (EPR) is a way to deal with these two aspects of nonequilibrium systems [1][2][3][4]. Such a decomposition has previously been formulated based on steady states: the housekeeping EPR characterizes how detailed balance is broken in a steady state, while the excess part quantifies the additional dissipation that is needed to transition between steady states. The best known decomposition of this kind was proposed by Hatano and Sasa (HS) [2] (also known as the adiabatic/nonadiabatic decomposition [3]), while an important alternative decomposition was proposed by Maes and Netočný (MN) [4]. The practical utility of these decompositions is that they provide meaningful thermodynamic bounds for nonequilibrium systems where the integrated entropy production (EP) diverges, by discounting the housekeeping EP, which also diverges, to give a finite excess EP [1].
However, previously known decompositions have a crucial limitation: they are not defined for general nonlinear dynamics, such as general chemical reaction networks. Such systems are not always globally stable and can exhibit nontrivial phenomena such as limit cycles, bifurcations, and multistability, in contrast to the assumption of global stability underlying existing EPR decompositions. More precisely, the HS decomposition has been generalized to chemical reaction networks only when there is a special steady state, called a complex balanced steady state [5,6], which is always globally stable [7]. However, even if a chemical reaction network has a stable steady state, it may not be complex balanced. In that case, the HS decomposition can yield a negative excess EPR, which makes the decomposition difficult to interpret physically. Because a complex balanced steady state is "almost detailed balanced" in the sense that it always shows global stability and cannot exhibit genuine nonlinear phenomena such as multistability or chaos [8], this particular generalization does not help us understand the various physically interesting situations that play a key role in biology and other complex chemical systems [9].

Figure 1. We propose a decomposition of (a) the total entropy production rate (EPR) in general nonlinear dynamics in a discrete space (e.g., a chemical reaction network or a Markov jump process) into (b) the excess part, which reflects a relaxation mode, and (c) the housekeeping part, which reflects a cyclic mode. The total EPR σ is expressed as the squared norm of the thermodynamic force F ∈ R^E with the metric L. The projection of the force onto the space of conservative forces {S^T ψ | ψ ∈ R^S} ⊂ R^E, defined in Eq. (31), determines the conservative force F* that provides the Onsager excess EPR σ_ex. The difference F − F* gives the Onsager housekeeping EPR σ_hk, which can be expressed as a sum of cyclic contributions as in Eq. (43). Because F* and F − F* are orthogonal with respect to the inner product given by L, σ_ex and σ_hk decompose the total EPR, σ = σ_ex + σ_hk.
In this paper, we propose an EPR decomposition that applies to general nonlinear dynamics in a discrete space, which include Markov jump processes and chemical reaction networks as special cases (Fig. 1). Because the decomposition is defined based on the geometric structure of thermodynamic forces, it does not require any information about the steady state or even its existence. Since it is not related to steady states, the decomposition is not only applicable to a wider range of dynamics, but it also provides crucial insights into the physical meaning of the individual terms: it allows us to interpret each term of the decomposition based on physical modes of dynamics, beyond the notion of stationarity. The housekeeping part reflects a cyclic mode on the graph of states that does not affect the apparent dynamics, and it quantifies how external driving breaks detailed balance at a given instant of time. As a result, we generalize Schnakenberg's cyclic decomposition of steady-state dissipation into contributions stemming from cycles in the graph of states [10]. On the other hand, the excess term expresses a "pseudo-relaxation" mode even if the system does not relax to a steady state, by extracting a conservative component of the thermodynamic force. Because a conservative force is given by a potential difference of the internal states, it represents an internal free-energy-like contribution to dissipation. Using a corresponding instantaneous "pseudo-equilibrium", we obtain a gradient-flow form of an equation of motion, which was previously used to describe the relaxation to an equilibrium state [11,12]. In summary, our result not only allows us to decompose the EPR in general nonlinear dynamics in a discrete space, but also to separate the effects on dissipation stemming from the external cyclic mode and the internal relaxation mode.
More generally, we discuss connections between our decomposition and optimal transport theory. Optimal transport theory interprets the minimum cost of transporting one probability distribution to another as a distance between them [32], providing a natural geometry for probability distributions. Recent studies have revealed that one can obtain thermodynamic bounds in continuous systems through this geometric approach, associating thermodynamic costs with the optimal-transport-theoretic distance [28,29,33-40]. In particular, the L^2-Wasserstein distance provides a very intuitive type of speed limit, τ ≥ W^2/Σ, with τ the time the system takes to change, W the L^2-Wasserstein distance between the initial and final states, and Σ the integrated EP. However, such a simple bound has not been known for general discrete systems, except for a result restricted to detailed balanced stochastic systems [41], because the discrete L^2-Wasserstein distance had been defined only for such systems [42-44]. In this paper, we show that the local geometric structure that defines our EPR decomposition reveals the physical meaning of the discrete L^2-Wasserstein distance as a minimum excess EP, and allows us to generalize it to general nonlinear dynamics in a discrete space. As a result, we obtain a simple speed limit for general nonlinear dynamics in a discrete space, in which the total integrated EP is replaced by the integrated excess EP, tightening the bound.
The paper is organized as follows. In Sec. II, we introduce basic concepts of nonlinear dynamics in a discrete space, which we characterize with a graph, and propose a general equation of motion that contains master equations and chemical rate equations as special cases. We also define the edgewise Onsager coefficients and the induced inner product, which determine the aforementioned geometric structure of thermodynamic forces. In addition, we review some relevant previous studies. In Sec. III, we derive the decomposition of the EPR and examine the properties of the excess and housekeeping EPRs, in particular the cyclic decomposition of the housekeeping EPR and a connection between gradient flow and the excess EPR. In Sec. IV, we apply this decomposition to derive refined versions of several thermodynamic uncertainty relations (TURs). In Sec. V, we introduce the generalized L^2-Wasserstein distance and examine its mathematical and physical features. We relate it to the excess EPR in Sec. V B, and obtain the speed limit in Sec. V C. Section VI is devoted to two examples that demonstrate how the decomposition of the EPR is obtained and that characterize the distinct dynamical modes separately, both analytically and numerically. The first example is a two-level system attached to two heat baths at different temperatures, where the relaxation mode is described by a "mean" temperature of the two. The second example is the Brusselator model of chemical oscillation, which exhibits a limit cycle. The appendixes contain mathematical details of the derivations and results.

A. Dynamics
Let us consider general nonlinear Markovian dynamics on a graph whose nodes may comprise several "states" or "species". We will show that these dynamics reduce to master equations and rate equations as special cases. The graph-theoretic notions introduced below are summarized in Figs. 2 and 3 through examples of a Markov jump process and a chemical reaction network, respectively.
We consider a directed graph with nodes {1, 2, . . . , N} and edges {1, 2, . . . , E}. The stoichiometry of state or species α ∈ {1, 2, . . . , S} at node i is indicated by ν_αi ∈ Z_≥0. The distribution (concentration vector or statistical state) of the system at time t is indicated by a vector x(t) = (x_α(t)) ∈ R^S_≥0. An edge represents a pair of reversible transitions or reactions and connects two nodes. We denote the starting node of edge e ∈ {1, . . . , E} by ι(e) and its ending node by ι′(e). The reversed direction of edge e is indicated by −e. The structure of the graph is expressed by the incidence matrix B = (B_ie), defined by B_ie = −1 if i = ι(e), B_ie = +1 if i = ι′(e), and B_ie = 0 otherwise.

We write the possibly nonlinear and time-dependent flux of ι(e) → ι′(e) as K_e(x) and that of ι′(e) → ι(e) as K_−e(x). We assume the fluxes are always positive, K_±e(x) > 0. The current J_e(x) on edge e is given by J_e(x) = K_e(x) − K_−e(x). Therefore, the net inflow to node i amounts to Σ_e B_ie J_e(x). Defining S = νB, where ν = (ν_αi), we obtain the continuity equation

dx/dt = S J(x). (2)

Throughout this paper, we consider general Markovian dynamics of a distribution x governed by this equation. It can represent Markov jump processes and deterministic chemical reaction networks, as we show below. We sometimes assume that the flux is given by the law of mass action, which however is not essential to our general results; below, we explicitly note whenever we make the assumption of mass action kinetics. In addition, when mass action is used, the coefficients k_±e(t) > 0 may depend on time.

Figure 2. (b) In a Markov jump process, where each node is identified with a physical state, we define a cycle as a chain of transitions that leaves the system unchanged. Here there are two independent cycles, C1 and C2, whose corresponding vectors S(C1) and S(C2) span the linear space ker B. The other cycles, such as C′, are obtained by superposing these two. The notion of a cycle differs slightly in a chemical reaction network, as explained in the next figure.

Figure 3. An example of the graph-theoretic description of a chemical reaction network. Note that although each edge has a specific direction, it represents a pair of reversible reactions. The network is the Brusselator, a paradigmatic model of chemical oscillation, which we study as an example in Sec. VI B. Unlike in a Markov jump process, a node of a chemical reaction network represents a so-called complex, which contains chemical species as specified by ν. The complex ∅ corresponds to external species whose concentrations are kept constant. Here one cannot find a "cycle" just by looking at the diagram, or equivalently at the incidence matrix B. However, the stoichiometric matrix S, which governs the dynamics, has a nontrivial kernel, which proves that there exists a cycle of reactions that leaves the system unchanged.
We sometimes abbreviate the dependence of some quantity f (x(t), t) on the state x(t) at time t and the explicit dependence on time t as f (t) if there is no ambiguity; we make the dependence implicit in the majority of the paper.

Example: Markov jump process
Consider the case where each node i is identified with a state α, so that ν = I, the identity. Then, writing x as p and assuming the mass action law, Eq. (2) becomes the master equation

dp_i/dt = Σ_e B_ie J_e(p), J_e(p) = k_e p_ι(e) − k_−e p_ι′(e), (4)

where p_i can be interpreted as the occupation probability of state i if normalized as Σ_i p_i = 1, and k_±e is the transition rate along edge ±e. It can be written in the familiar pairwise form because, if e = i → j, B_ie = −1, if e = j → i, B_ie = 1, and only one of the two directions of each edge is contained in the summation in Eq. (4). We note that the linear form of the flux is an example of mass action kinetics, since ν_αi = δ_αi in this case, where δ is Kronecker's delta.
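As a concrete illustration, the master equation above is easy to integrate numerically. The following minimal sketch uses a hypothetical three-state cycle with illustrative rate constants (not taken from the paper): it builds the incidence matrix B, the mass-action edge currents J_e(p), and Euler-integrates dp/dt = B J(p). Probability is conserved exactly because every column of B sums to zero.

```python
import numpy as np

# Hypothetical 3-state cycle with reversible edges 1->2, 2->3, 3->1.
B = np.array([[-1.0,  0.0,  1.0],   # incidence matrix: B[i, e] = -1 at the
              [ 1.0, -1.0,  0.0],   # source iota(e), +1 at the target iota'(e)
              [ 0.0,  1.0, -1.0]])
k_fwd = np.array([2.0, 1.0, 3.0])   # illustrative rates k_e
k_bwd = np.array([1.0, 2.0, 1.0])   # illustrative rates k_{-e}
src = np.array([0, 1, 2])           # iota(e)
tgt = np.array([1, 2, 0])           # iota'(e)

def current(p):
    """Edge currents J_e(p) = k_e p_{iota(e)} - k_{-e} p_{iota'(e)} (mass action)."""
    return k_fwd * p[src] - k_bwd * p[tgt]

# Euler integration of the master equation dp/dt = B J(p), Eq. (4).
p = np.array([1.0, 0.0, 0.0])
dt = 1e-3
for _ in range(20_000):
    p = p + dt * B @ current(p)

# Probability is conserved because every column of B sums to zero.
assert np.isclose(p.sum(), 1.0)
assert np.all(p > 0)
```

After a long integration time the distribution approaches the (here unique) steady state of this small network; the assertions only check conservation and positivity, which hold along the whole trajectory.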

Example: Chemical reaction network
The general equation (2) can represent an open chemical reaction network by interpreting x as a concentration distribution c [5]. A node is interpreted as an aggregate of chemical species, technically called a complex, which contains ν_αi molecules of chemical species α and appears on the left- or right-hand side of a chemical equation. Because the (α, e) element of S = νB reads S_αe = Σ_i ν_αi B_ie = ν_αι′(e) − ν_αι(e), S is the conventional stoichiometric matrix. When we consider chemical reaction networks, we do not assume mass action kinetics in general, so that we can treat a broad class of non-ideal chemical reaction networks [45]. We make this assumption only when (1) we review previous studies that require it, e.g., gradient flow equations [11] and the HS decomposition [5,6], or (2) we illustrate our results numerically, where we need to provide a concrete form of K.
When we discuss a result particular to Markov jump processes, we use the variable p instead of x to indicate that the variable is a probability distribution; on the other hand, when we focus on chemical reaction networks, we use c to indicate a concentration distribution.

B. Thermodynamics
Let us define the thermodynamic force on edge e by

F_e(x) := ln [K_e(x) / K_−e(x)]. (8)

We assume the local detailed balance condition, namely that F_e gives the total change of the reduced entropy in the process that edge e represents [5,10,45], where "reduced" means divided by a physical constant: the Boltzmann constant k_B for Markov jump processes, or the gas constant R for chemical reaction networks. We do not distinguish the entropy from the reduced one, setting the constant to one. Then, we obtain the following form of the entropy production rate (EPR):

σ(x) = Σ_e J_e(x) F_e(x) ≥ 0.

For simplicity, hereafter we do not write the argument x explicitly.
In the seminal paper [10], Schnakenberg obtained an expression for the EPR by decomposing the network into cycles. He introduced a fundamental set of cycles {C_μ}_{μ=1,...,M} of the graph defined by the states and transitions of a Markov jump process. Each cycle corresponds to a vector S(C_μ) = (S_e(C_μ))_{e=1,...,E} ∈ ker B, and the set {S(C_μ)}_{μ=1,...,M} forms a basis of ker B, where ker indicates the kernel of a matrix. Such a set of vectors can also be introduced for a chemical reaction network, where it spans ker S [46]. In general, we refer to a basis of ker S rooted in the cycles of a graph as a cycle basis. A cycle basis is not unique in general; hereafter, we fix some choice of basis. See also Figs. 2 and 3, where we provide concrete examples of cycles and clarify the slight difference between Markov jump processes and chemical reaction networks.
When the system is in a steady state, the current J^st is in the kernel of S, so it can be expanded in the cycle basis as

J^st = Σ_μ J_μ^st S(C_μ). (10)

Note that the superscript st indicates the value of a quantity in the steady state. Each expansion coefficient J_μ^st quantifies the current flowing in the cycle C_μ. The conjugate cyclic forces are defined by

F^st(C_μ) := Σ_e S_e(C_μ) F_e^st. (11)

In terms of these currents and forces, we obtain the expression of the EPR as in Refs. [10,46],

σ^st = Σ_μ J_μ^st F^st(C_μ), (12)

which shows that the steady-state EPR can be divided into contributions from cycles, whose dissipation can be analyzed separately [47-49]. We will generalize this expression beyond steady states in Sec. III A.
C. Edgewise Onsager coefficient, gradient flow, and Wasserstein distance

We define the edgewise Onsager coefficient by

ℓ_e(x) := J_e(x)/F_e(x) = [K_e(x) − K_−e(x)] / ln[K_e(x)/K_−e(x)], (13)

and the diagonal matrix L(x) := diag(ℓ_e(x)). When K_e(x) = K_−e(x), by continuity ℓ_e(x) is given by K_e(x). It is also easy to see that ℓ_e(x) > 0, so L(x) is positive definite. For simplicity, we make the dependence of L and ℓ on x implicit, but we stress that we always choose the current distribution x as the argument. We write L^{1/2} = diag(√ℓ_e). For notational convenience, we define the inner product ⟨u, v⟩_L := u^T L v and the corresponding norm ‖u‖_L := √⟨u, u⟩_L. We can use this inner product to express the EPR as

σ = ⟨F, F⟩_L = ‖F‖_L^2, (14)

which will be the starting point of our results [50]. Note that if we chose any other state, such as a steady state, as the argument of L(·), we could no longer express the EPR as a squared norm with L as the metric. We remark that, as a general property of the logarithmic mean, ℓ_e(x) is sandwiched between the geometric mean and the arithmetic mean of K_e(x) and K_−e(x),

√(K_e(x) K_−e(x)) ≤ ℓ_e(x) ≤ [K_e(x) + K_−e(x)]/2, (15)

where the equalities hold if and only if K_e(x) = K_−e(x).

In Ref. [10], Schnakenberg introduced a symmetric matrix L̄ and the reciprocal relation for the steady-state cycle current and force defined in Eqs. (10) and (11). By the definitions of J and F, we find that L̄ connects J^st and F^st as J^st = L̄ F^st. However, L̄ is not diagonal in general and does not coincide with L^st. Nevertheless, the edgewise Onsager coefficient is useful for rewriting the equation of motion (2) as a gradient flow [11,12], which we review here. This is possible when there exists an equilibrium steady state x^eq that satisfies the detailed balance condition and the flux K is given by the mass action law (3). Then the thermodynamic force is given by

F = −S^T ∇D(x ‖ x^eq), (19)

where ∇ = (∂/∂x_α) is the column vector of partial derivatives and D(·‖·) is the generalized Kullback-Leibler (KL) divergence, which reduces to the usual KL divergence in the Markov jump case. Let us prove Eq. (19).
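The logarithmic-mean structure of the edgewise Onsager coefficient is easy to check numerically. The following minimal sketch (illustrative one-way fluxes, hypothetical values) verifies the identity σ = ⟨F, F⟩_L = Σ_e J_e F_e and the geometric/arithmetic-mean sandwich of Eq. (15), using NumPy.

```python
import numpy as np

def onsager(Kf, Kb):
    """Edgewise Onsager coefficient: the logarithmic mean of K_e and K_{-e}."""
    with np.errstate(divide="ignore", invalid="ignore"):
        lm = (Kf - Kb) / np.log(Kf / Kb)
    return np.where(np.isclose(Kf, Kb), Kf, lm)   # continuity at K_e = K_{-e}

# Illustrative one-way fluxes on three edges (hypothetical values).
Kf = np.array([2.0, 0.5, 3.0])
Kb = np.array([1.0, 0.5, 1.0])

F = np.log(Kf / Kb)     # thermodynamic force, F_e = ln(K_e / K_{-e})
J = Kf - Kb             # current, J_e = K_e - K_{-e}
ell = onsager(Kf, Kb)   # so that J_e = ell_e F_e

# EPR as a squared norm: sigma = <F, F>_L = sum_e ell_e F_e^2 = sum_e J_e F_e
sigma = np.sum(ell * F**2)
assert np.isclose(sigma, np.sum(J * F))

# The logarithmic mean is sandwiched between geometric and arithmetic means.
assert np.all(np.sqrt(Kf * Kb) <= ell) and np.all(ell <= (Kf + Kb) / 2)
```

The second edge has K_e = K_{-e}, so it carries no current, contributes nothing to σ, and saturates both mean inequalities, as the text states.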
The detailed balance condition leads to K_e(x^eq) = K_−e(x^eq). Substituting this into the definition of the thermodynamic force (8), we have

F = −S^T Ψ, with Ψ = ln(x/x^eq),

where we define the log of a vector as the vector of the logs of its elements. The proof ends by calculating the derivative of the generalized KL divergence to show

∇D(x ‖ x^eq) = ln(x/x^eq) = Ψ.

Because LF = J, by substituting Eq. (19) into Eq. (2), we obtain the gradient flow expression [11,42]

dx/dt = −S L S^T ∇D(x ‖ x^eq). (24)

Note that the matrix SLS^T is positive semidefinite. When we consider a Markov jump process, SLS^T reads BLB^T, which is called the Laplacian matrix of a weighted graph [51]. A consequence of the above is that the KL divergence is a Lyapunov function when the system is autonomous, so that x^eq is time invariant. The divergence monotonically decreases as

d_t D(x ‖ x^eq) = −⟨F, F⟩_L = −σ ≤ 0, (25)

where, using Eqs. (14) and (19), we confirm that the time derivative of the KL divergence is equal to the negative of the EPR.

For autonomous Markovian dynamics with a detailed balanced steady state, the gradient flow structure has also been utilized to measure the distance between two states. Maas proposed in Ref. [42] an L^2-Wasserstein distance measure

W(p^(0), p^(1))^2 := inf ∫_0^1 ⟨B^T ψ, B^T ψ⟩_{L(p)} dt, (26)

where the infimum is taken under the conditions

dp/dt = B L(p) B^T ψ, p(t=0) = p^(0), p(t=1) = p^(1). (27)

We can change the time interval from the unit one to an arbitrary one [0, T] as

W(p^(0), p^(1))^2 = inf T ∫_0^T ⟨B^T ψ, B^T ψ⟩_{L(p)} dt, (28)

with the last condition of Eq. (27) replaced by p(T) = p^(1) [52]. Note that the distance is a dynamics-wise measure, because L depends not only on the current state but also on the given kinetics as in Eq. (13), or more explicitly as

ℓ_e(p) = [k_e p_ι(e) − k_−e p_ι′(e)] / ln[k_e p_ι(e) / (k_−e p_ι′(e))]. (29)

Our definition looks unlike the original one [42], but the equivalence is easily proved (see Appendix A for the proof). Vu and Hasegawa [31] found a speed limit by using the Wasserstein distance as

τ ≥ W(p(0), p(τ))^2 / Σ, (30)

where Σ = ∫_0^τ σ dt is the integrated EP of the solution {p(t)}_{t∈[0,τ]} of a master equation. The speed limit is obtained because the solution of a master equation satisfies the first condition in Eq. (27) as

dp/dt = B L(p) B^T Ψ,

Figure 4. Schematic diagrams of the decomposition of force, which is central to the decomposition of the entropy production rate (34). (a) Geometrical illustration of the decomposition of the force F. The projector P onto the space of conservative forces im S^T is given in Eq. (31). The projected force F* is orthogonal to the remainder F − F* under the inner product ⟨·, ·⟩_L introduced in Sec. II C. The remaining current J^cyc = L(F − F*) belongs to ker S, therefore it is cyclic and can be expanded in a cycle basis as in Eq. (41). (b) The decomposition of force is exemplified on edge e = 2 in the system considered in Fig. 2. The conservative force F* is given by the difference of φ*, which is assigned a value on each node. On the other hand, the remaining force F − F* is associated with the coefficients of the linear combination of cycles.
with Ψ_i = −(∂/∂p_i) D(p ‖ p^eq). Since Eq. (28) involves minimization with respect to p and ψ, the Wasserstein distance is smaller than the corresponding expression evaluated along the actual dynamics. By choosing the duration τ of the solution as the time interval in Eq. (28), we obtain the lower bound.

III. ONSAGER-PROJECTIVE DECOMPOSITION OF EPR
We state our main result, a decomposition of the EPR in Markov jump processes and chemical reaction networks, which can be seen as a generalization of the Maes-Netočný decomposition to such systems [4]. Because we define the decomposition geometrically, using a projection operator and the edgewise Onsager coefficient (13), we term it the Onsager-projective decomposition. The construction, whose technical details occupy the following two paragraphs, is summarized in Fig. 4.
Let us consider a matrix P that is the projection onto the image of S^T, orthogonal with respect to the matrix L. Formally, this means that for any f, g ∈ R^E, P satisfies ⟨Pf, (I − P)g⟩_L = 0, where I is the identity. This projection matrix is given by [53]

P = S^T (S L S^T)^− S L, (31)

with A^− denoting a generalized inverse of the matrix A.
Since S does not have full rank in general, we need to consider a generalized inverse. Nevertheless, we may set it to a theoretically tractable one, like the Moore-Penrose inverse, because P does not depend on the choice of the generalized inverse [53]. To be more concrete, let the eigendecomposition of SLS^T be UΛU^T, with orthogonal matrix U and diagonal matrix Λ = diag(λ_1, . . . , λ_r, 0, . . . , 0), where λ_a ≠ 0 and r = rank S. Then, the Moore-Penrose inverse of SLS^T can be computed as (SLS^T)^+ = U diag(λ_1^{−1}, . . . , λ_r^{−1}, 0, . . . , 0) U^T. For any f ∈ R^E, Pf is also characterized variationally as

Pf = S^T φ(f), φ(f) ∈ argmin_{φ ∈ R^S} ‖f − S^T φ‖_L. (32)

The minimizer φ(f) is not unique in general: if c ∈ ker S^T, then φ(f) + c attains the same minimum. In a Markov jump process on a connected graph, c is given by a vector of ones times any scalar. When we consider chemical reaction networks, such a null vector is called a conservation law [5].
Nonetheless, Pf itself is unique, so we can consider φ(f) as fixed. Now we can define the conservative force that plays a central role in the Onsager-projective decomposition:

F* := PF = −S^T φ*.

The force F* can be seen as a conservative force that reproduces the original (possibly nonconservative) dynamics because SLF* = SLF = SJ = d_t x. This can be shown as follows. First, we have the decomposition SL = SLP + SL(I − P). The second term SL(I − P) is zero because P is the orthogonal projection onto im S^T with respect to the metric L, i.e., for any ϕ ∈ R^S and f ∈ R^E, P satisfies ⟨S^T ϕ, (I − P)f⟩_L = 0.

Using the conservative force F*, we find a decomposition of the EPR that can be regarded as a generalization of the Maes-Netočný decomposition in Langevin systems to Markov jump processes and chemical reaction networks [4]. We define the Onsager excess and housekeeping EPRs by

σ_ex := ‖F*‖_L^2, σ_hk := ‖F − F*‖_L^2, (34)

respectively. These terms give a nonnegative decomposition of the EPR for both Markov jump processes and chemical reaction networks; they are always nonnegative because L is positive definite. From the orthogonality relation ⟨F*, F − F*⟩_L = 0, the decomposition is proved as follows:

σ = ‖F‖_L^2 = ‖F*‖_L^2 + ‖F − F*‖_L^2 = σ_ex + σ_hk. (35)

The two orthogonal components F* and F − F* of the thermodynamic force allow us to interpret the decomposition in a geometrical way; see Figs. 1 and 4. This is similar to the interpretation in the Langevin case [28,29].
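The whole construction reduces to a few lines of linear algebra, which makes it easy to check numerically. The sketch below uses a hypothetical three-state cycle with illustrative fluxes (for a Markov jump process, S = B): it builds the projector of Eq. (31) with a Moore-Penrose pseudoinverse, computes F*, and verifies the orthogonality behind Eq. (35).

```python
import numpy as np

# Hypothetical 3-state cycle; for a Markov jump process S = B.
S = np.array([[-1.0,  0.0,  1.0],
              [ 1.0, -1.0,  0.0],
              [ 0.0,  1.0, -1.0]])
Kf = np.array([2.0, 1.0, 3.0])   # illustrative one-way fluxes K_e
Kb = np.array([1.0, 2.0, 1.0])   # and K_{-e}

F = np.log(Kf / Kb)                          # thermodynamic force
L = np.diag((Kf - Kb) / np.log(Kf / Kb))     # edgewise Onsager matrix
J = Kf - Kb                                  # current, J = L F

# L-orthogonal projector onto im S^T, Eq. (31): P = S^T (S L S^T)^- S L
P = S.T @ np.linalg.pinv(S @ L @ S.T) @ S @ L
F_star = P @ F                               # conservative force F*

sigma    = F @ L @ F
sigma_ex = F_star @ L @ F_star
sigma_hk = (F - F_star) @ L @ (F - F_star)

# Orthogonality <F*, F - F*>_L = 0 yields sigma = sigma_ex + sigma_hk, Eq. (35).
assert np.isclose(F_star @ L @ (F - F_star), 0.0)
assert np.isclose(sigma, sigma_ex + sigma_hk)
# The conservative force reproduces the dynamics: S L F* = S L F = S J.
assert np.allclose(S @ L @ F_star, S @ J)
```

Using `np.linalg.pinv` here reflects the remark above that P does not depend on the choice of generalized inverse.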
In a steady state, where SJ^st = 0 holds, the excess EPR vanishes and σ = σ_hk, because

F*|_st = S^T (S L S^T)^− S L F^st = S^T (S L S^T)^− S J^st = 0.

On the other hand, the housekeeping EPR is identically zero if and only if the steady state is detailed balanced [54]. Therefore, the decomposition (35) separates the EPR into a transient contribution σ_ex and a contribution σ_hk required to sustain a nonequilibrium steady state, if a steady state exists. Given the eigendecomposition SLS^T = UΛU^T discussed earlier, we can write the excess and housekeeping EPRs as

σ_ex = Σ_{a=1}^{r} (u_a^T S J)^2 / λ_a, σ_hk = σ − σ_ex,

where u_a is the a-th column of U. In what follows in this section, we discuss several properties and consequences of the decomposition, which we summarize in Table I with corresponding equation numbers. Except for the operational implications discussed in Sec. III D, which use the variational formulas from Sec. III C, these results are basically independent of each other.

A. Generalization of Schnakenberg's cyclic decomposition
Our decomposition leads to a generalization of Schnakenberg's cyclic decomposition of the steady-state EPR (12) to non-steady states. Consider the current corresponding to F − F*, that is, J^cyc := L(F − F*). This current is in the kernel of S because SLF = SLF*; therefore it is a cyclic current and can be expanded in the cycle basis as

J^cyc = Σ_μ J_μ S(C_μ). (41)

This expansion is available whether or not the system is in a steady state. The conjugate force is also given by

F(C_μ) = Σ_e S_e(C_μ) (F_e − F*_e) = Σ_e S_e(C_μ) F_e,

where we used Σ_e S_e(C_μ) F*_e = −(φ*)^T S S(C_μ) = 0. Thus, we have generalized the decomposition in Eq. (11), which applies only in a steady state, to arbitrary states. Importantly, these cycle quantities allow us to write the Onsager housekeeping EPR as

σ_hk = Σ_μ J_μ F(C_μ). (43)

Therefore, our decomposition generalizes Schnakenberg's cyclic decomposition of the steady-state EPR [10] to a cyclic decomposition of the housekeeping EPR in non-steady-state systems.
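On a hypothetical triangle network whose kernel is one-dimensional, the cyclic decomposition can be verified directly: J^cyc lies in ker S, is proportional to the single cycle vector, and the housekeeping EPR reduces to one cyclic term. The fluxes below are illustrative.

```python
import numpy as np

# Hypothetical triangle: ker S is spanned by the single cycle S(C1) = (1, 1, 1).
S = np.array([[-1.0,  0.0,  1.0],
              [ 1.0, -1.0,  0.0],
              [ 0.0,  1.0, -1.0]])
Kf = np.array([2.0, 1.0, 3.0])   # illustrative one-way fluxes
Kb = np.array([1.0, 2.0, 1.0])
F = np.log(Kf / Kb)
L = np.diag((Kf - Kb) / np.log(Kf / Kb))
P = S.T @ np.linalg.pinv(S @ L @ S.T) @ S @ L

# The cyclic current J^cyc = L(F - F*) lies in ker S.
J_cyc = L @ (F - P @ F)
assert np.allclose(S @ J_cyc, 0.0)

C1 = np.ones(3)              # cycle vector S(C1)
J1 = J_cyc[0]                # expansion coefficient of J^cyc in the cycle basis
F_C1 = C1 @ F                # conjugate cyclic force, sum_e S_e(C1) F_e
assert np.allclose(J_cyc, J1 * C1)

# Housekeeping EPR as a sum over cycles: here a single term, J1 * F(C1).
sigma_hk = (F - P @ F) @ L @ (F - P @ F)
assert np.isclose(sigma_hk, J1 * F_C1)
```

Note that F(C1) can be computed from F alone, since the conservative part F* contributes nothing along a cycle.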

B. Generalization of Maes-Netočný decomposition
Next, we show that the decomposition can also be understood as a generalization of the MN decomposition [4]. In their paper, Maes and Netočný regarded the EPR as a functional of the potential landscape and minimized it by varying the landscape. The result is the following minimum EP principle: the potential landscape minimizing the EPR is the one for which the instantaneous distribution is a steady state. In our case, using the minimizing property of the projection operator (32), we have the expression

σ_hk = min_{ψ ∈ R^S} ‖F − S^T ψ‖_L^2.

In a Markov jump process, F is written as F = −B^T(u + ln p) + F^nc, where u is some potential landscape and F^nc is a nonconservative driving force. We define J(v) := −L B^T(v + ln p) + L F^nc as a function of the potential landscape v. Using these expressions, we find a formulation resembling Ref. [4],

σ_hk = min_v ‖ −B^T(v + ln p) + F^nc ‖_L^2.

Minimization leads to the condition B J(v*) = 0 for the optimal landscape v*, i.e., p is a steady state of the dynamics generated by v*. Thus, among all potential landscapes with Onsager matrix L, the EPR is minimized by the one for which the instantaneous distribution p is a steady state. This is the analogue of Maes and Netočný's minimum EP principle [4]. We remark that fixing L corresponds to fixing the diffusion matrix in the Langevin case (see Appendix B).

C. Variational formulas
We can also extend other results derived in the Langevin case [28,29] to discrete dynamics. First, the following variational expressions hold:

σ_ex = sup_{ψ ∈ R^S} ⟨S^T ψ, F⟩_L^2 / ‖S^T ψ‖_L^2, (48)
σ_ex = inf_{F′ : SLF′ = SLF} ‖F′‖_L^2, (49)
σ_hk = sup_{F′ ∈ ker SL} ⟨F′, F⟩_L^2 / ‖F′‖_L^2. (50)

Note that SL is the adjoint of S^T with respect to the inner product ⟨·, ·⟩_L. As an important example, F − F* is in ker SL because SLF = SLF*. This space can be interpreted as the space of forces that do not contribute to the dynamics, if we regard Lf as a probability current for any force f. The first expression (48) is obtained by combining the Cauchy-Schwarz inequality and the equality ⟨S^T ψ, F⟩_L = ⟨S^T ψ, F*⟩_L, which gives

⟨S^T ψ, F⟩_L^2 = ⟨S^T ψ, F*⟩_L^2 ≤ ‖S^T ψ‖_L^2 ‖F*‖_L^2. (51)

Equality is attained in (51) when S^T ψ is parallel to F*. The infimum representation (49) for the excess EPR is proven by the following calculation. When SLF′ = SLF, the projections coincide, PF′ = PF = F*, and therefore

‖F′‖_L^2 = ‖PF′‖_L^2 + ‖(I − P)F′‖_L^2 = σ_ex + ‖(I − P)F′‖_L^2. (53)

The second term in Eq. (53) is nonnegative and vanishes when F′ = F*; thus the minimum value gives the excess EPR. The supremum formula (50) for the housekeeping EPR can be shown in a way parallel to the excess case. For F′ ∈ ker SL,

⟨F′, F⟩_L^2 = ⟨F′, F − F*⟩_L^2 ≤ ‖F′‖_L^2 ‖F − F*‖_L^2.

Thus, the inequality is obtained for any F′ ∈ ker SL. The equality is attained when F′ is proportional to F − F*, which also belongs to ker SL.

D. Minimum entropy production
The variational formula (49) allows us to interpret the excess EPR as a minimum EPR. In this section, we focus on Markov jump processes. In general, when we control rate constants to minimize the EPR with the dynamics d_t p unchanged, the EPR can be made arbitrarily small unless further assumptions are imposed [55]. To consider minimum EPR under additional constraints, recent studies have proposed fixing the geometric mean √(k_e k_−e) of the transition rates on each edge [56], or fixing the backward rates k_−e [57]. These constraints were found to lead to nonconservative optimal forces. On the other hand, the excess EPR can be interpreted as the minimum EPR when we change transition rates while keeping the value of L(p) fixed, since Eq. (49) can be rewritten as

σ_ex = min_{k′} { σ_{k′} : L_{k′}(p) = L(p), S L_{k′} F_{k′} = S L F },

where F_{k′}, L_{k′}, and σ_{k′} are the force, the Onsager matrix, and the EPR given by transition rates k′ instead of the actual rates k, which provide L. The definition of the excess EPR (34) also shows that we obtain a conservative optimal force in this minimization. Quite recently, a similar study [58] discussed minimization of the integrated EP while fixing the time average of the sum of the edgewise coefficients, rather than the instantaneous coefficients. We summarize these minimizations in Table II. Although the minimization is mathematically well defined, fixing the Onsager coefficients is not always interpretable from a physical point of view. Nonetheless, we can find a concrete meaning of this constraint in the continuum limit. In this limit, the edgewise Onsager coefficient becomes ℓ_e(p) ≈ D_e(ι(e)) p_ι(e)/(Δx)^2, with Δx the distance between nearest sites going to zero, and D_e(i) a kind of "diffusion coefficient" [59]. Therefore, fixing L(p) is equivalent to assuming that the diffusion coefficients are constant, which is a natural assumption when we consider Langevin dynamics. For a more detailed discussion of the asymptotic relation, see Appendix B.
Table II. Comparison of dynamical constraints and the optimal thermodynamic forces that minimize the EPR (or the integrated EP). Here, NC and C stand for nonconservative and conservative, respectively. Vu [58] did not state whether the optimal force is conservative.

                 Ours         Remlein [56]          Ilker [57]     Vu [58]
Constraint       L(p) fixed   √(k_e k_−e) fixed     k_−e fixed     time-averaged Σ_e ℓ_e fixed
Optimal force    C            NC                    NC             —

E. Gradient flow
We now generalize Eqs. (24) and (25) using the Onsager-projective decomposition. Let φ*(t) be the potential that gives F*(t) = −S^T φ*(t) at time t, and define a pseudo-canonical distribution x^pcan(t) by

x^pcan_α(t) := x_α(t) e^{−φ*_α(t)} / Z_α(t). (58)

In a Markov jump process, Z(t) is a normalization constant that does not depend on α. In a chemical reaction network, Z(t) is any vector whose log is a conservation law, i.e., ln Z(t) = (ln Z_α(t)) ∈ ker S^T; if there is no conservation law, Z(t) is the vector of ones. Then, we have the relation

F* = −S^T ∇D(x ‖ x^pcan).

This result follows because

S^T ln(x / x^pcan) = S^T (φ* + ln Z) = S^T φ*,

where the second equality follows from Σ_i B_ie = 0 for Markov jump processes and from S^T ln Z = 0 for chemical reaction networks. In particular, in a Markov jump process, by the decomposition φ* = ε* + ln p, the pseudo-canonical distribution reads

p^pcan_i = e^{−ε*_i} / Z. (61)

If F* = F, the pseudo-energy level ε* becomes the actual energy level of each state. The expression (61) is the reason we refer to x^pcan as the pseudo-canonical distribution. Although this pseudo-canonical distribution depends on the instantaneous state, it indicates the momentary "goal" of the dynamics, even if the system is attracted to a limit cycle. Along with the pseudo-canonical distribution x^pcan, we can formally obtain a gradient-flow representation of the continuity equation (2), even without detailed balance, because SLF* = SLF = SJ = d_t x, so that

dx/dt = −S L S^T ∇D(x ‖ x^pcan).

This is a generalization of the gradient flow (24). We remark that, even if the rates entering the dynamics via (3) are time-independent, x^pcan generally depends on time; see Eq. (58).
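The gradient-flow structure can be checked numerically by extracting the potential φ* from the projected force. The sketch below uses a hypothetical three-state cycle at a non-stationary distribution p (illustrative rates), and it assumes the Markov-jump form p^pcan ∝ e^{−ε*} with pseudo-energies ε* = φ* − ln p; the use of `np.linalg.lstsq` on the singular graph Laplacian is a convenience, not prescribed by the text.

```python
import numpy as np

# Hypothetical 3-state cycle with mass-action fluxes at a non-stationary p.
B = np.array([[-1.0,  0.0,  1.0],
              [ 1.0, -1.0,  0.0],
              [ 0.0,  1.0, -1.0]])
k_fwd = np.array([2.0, 1.0, 3.0]); k_bwd = np.array([1.0, 2.0, 1.0])
src, tgt = np.array([0, 1, 2]), np.array([1, 2, 0])

p = np.array([0.5, 0.3, 0.2])
Kf, Kb = k_fwd * p[src], k_bwd * p[tgt]
F = np.log(Kf / Kb)
L = np.diag((Kf - Kb) / np.log(Kf / Kb))
J = Kf - Kb

# Potential phi* with F* = -B^T phi*: solve B L B^T phi* = -B J.
# (lstsq returns a least-squares solution of the singular graph Laplacian.)
phi_star = np.linalg.lstsq(B @ L @ B.T, -(B @ J), rcond=None)[0]

# Gradient-flow form: dp/dt = B J = -B L B^T phi*, even without detailed balance.
assert np.allclose(B @ J, -(B @ L @ B.T) @ phi_star)

# Pseudo-canonical distribution, assuming p^pcan ~ e^{-eps*}, eps* = phi* - ln p.
eps_star = phi_star - np.log(p)
p_pcan = np.exp(-eps_star); p_pcan /= p_pcan.sum()
assert np.isclose(p_pcan.sum(), 1.0)
```

Here p^pcan plays the role of the momentary "goal" of the dynamics: it changes with p, but at each instant the motion is a relaxation toward it.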

F. Brief comparison with Hatano-Sasa decomposition
There are other ways to decompose the EPR into excess and housekeeping contributions. One well-known approach is the Hatano-Sasa (HS) decomposition [2], also called the adiabatic/nonadiabatic decomposition [3], which is given by

σ^hk_HS := ⟨J, F^st⟩,  σ^ex_HS := σ − σ^hk_HS = ⟨J, F − F^st⟩,

where F^st denotes the force evaluated at the steady state x^st. If the flux is written in mass action form, the difference between the forces becomes

F − F^st = −S^T Ψ,  with Ψ_α := ln(x_α / x^st_α).

Then, we obtain another expression for the HS excess EPR,

σ^ex_HS = −J^T S^T Ψ,

which is similar to the expression σ^ex = −J^T S^T φ* that follows from the definition (34) and −S L S^T φ* = SJ. The nonnegativity of the HS excess and housekeeping EPRs can be proven only when the steady state satisfies B J^st = 0. For a Markov jump process, this condition is precisely the steady-state condition. For a chemical reaction network, however, it is called the complex balance condition and is stricter than the steady-state condition S J^st = ν B J^st = 0 [5,6]. On the other hand, the Onsager excess EPR is defined even in the absence of a stable steady state, as we will show in Sec. VI A through the Brusselator model, which exhibits a limit cycle. For a more detailed discussion of the relation between our approach and the HS decomposition, see Appendix C.

IV. THERMODYNAMIC UNCERTAINTY RELATIONS

A. Short-time TUR: Markov jump process

The variational expression (48) allows us to obtain a set of refined short-time TURs. In this section, we focus on Markov jump processes. We define the edgewise dynamical activity (traffic) χ_e by [60]

χ_e := k_e p_{ι(e)} + k_{−e} p_{ι'(e)}.

From the inequality in Eq. (15), we have

ℓ_e ≤ χ_e / 2.

The equality holds if and only if p satisfies k_e p_{ι(e)} = k_{−e} p_{ι'(e)}, i.e., in the equilibrium state. The numerator on the right-hand side of Eq. (48), ⟨S^T ψ, F⟩_L, which now reads ⟨B^T ψ, F⟩_L, is the time derivative of the expectation value ⟨ψ⟩ = Σ_i p_i ψ_i, because

⟨B^T ψ, F⟩_L = ψ^T B L F = ψ^T B J = ψ^T d_t p = d_t ⟨ψ⟩.

Note that here we assume that ψ does not have any time dependence. Thus, from the variational formula (48), we have a TUR for any observable ψ,

σ^ex ≥ (d_t ⟨ψ⟩)² / ‖B^T ψ‖²_L.

Therefore, as long as we know the edgewise Onsager matrix L, we can estimate the excess EPR by measuring the rate of change of the average ⟨ψ⟩.
Because of the inequality in Eq. (71), the denominator is bounded from above by the short-time limit of the variance:

‖B^T ψ‖²_L = Σ_e ℓ_e (B^T ψ)_e² ≤ (1/2) Σ_e χ_e (B^T ψ)_e² = D_ψ / 2,

where Δ_τ ψ := ψ_{i(t+τ)} − ψ_{i(t)} and the variance is asymptotically given by [23]

Var(Δ_τ ψ) ≃ τ D_ψ,  D_ψ := Σ_e χ_e (B^T ψ)_e².

This inequality shows the weaker bound

σ^ex ≥ 2 (d_t ⟨ψ⟩)² / D_ψ.

Although equality can only be achieved in equilibrium, where we trivially have σ^ex = 0, the inequality can be tight near equilibrium, where the difference between the arithmetic and logarithmic means in Eq. (71) is small. Moreover, Eq. (76) does not require information about the Onsager matrix L. It is therefore much more tractable than Eq. (73) and gives an estimate of the excess EPR that depends only on the rate of change of the average and the short-time fluctuations of an observable. This inequality partly generalizes the short-time TUR derived in Ref. [23] to the case where currents are expressed as changes of observables, as above.
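The chain of inequalities behind the short-time TURs can be checked directly. The following sketch (the same kind of hypothetical 3-state cycle, with a random observable ψ; all values are illustrative) verifies σ ≥ σ^ex ≥ (d_t⟨ψ⟩)²/‖B^Tψ‖²_L ≥ 2(d_t⟨ψ⟩)²/D_ψ.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 3-state cycle (illustrative).
B = np.array([[-1.0, 0.0, 1.0],
              [1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])
tail, head = [0, 1, 2], [1, 2, 0]
kf = rng.uniform(0.5, 2.0, 3)
kb = rng.uniform(0.5, 2.0, 3)
p = rng.uniform(0.1, 1.0, 3)
p /= p.sum()

jf, jb = kf * p[tail], kb * p[head]
J, F = jf - jb, np.log(jf / jb)
ell = J / F                       # edgewise Onsager coefficients (log mean)
chi = jf + jb                     # edgewise dynamical activity
L = np.diag(ell)

G = B @ L @ B.T
F_star = B.T @ np.linalg.pinv(G) @ B @ L @ F
sigma = F @ L @ F
sigma_ex = F_star @ L @ F_star

psi = rng.normal(size=3)          # arbitrary time-independent observable
grad_psi = B.T @ psi
d_avg = grad_psi @ L @ F          # = d_t <psi> = psi^T B J
bound_L = d_avg**2 / (grad_psi @ L @ grad_psi)   # bound using the Onsager matrix
D_psi = chi @ grad_psi**2         # short-time variance rate of Delta_tau psi
bound_var = 2 * d_avg**2 / D_psi  # weaker bound, no knowledge of L required
```

The second bound uses only the mean displacement and the activity, which is what makes it experimentally tractable.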
While the interpretation of the inequalities derived from Eq. (48) as TURs was straightforward, we need to develop a trajectory-level description to understand the meaning of the inequality suggested by Eq. (50). This is done below, when we discuss finite-time TURs.
B. Short-time TUR: Chemical Reaction Network

Next, we consider chemical reaction networks. Because of the inequality between the logarithmic mean and the arithmetic mean (15), we have

ℓ_e ≤ (K_e(c) + K_{−e}(c)) / 2.

The scaled diffusion coefficient of a chemical reaction network is defined by [6,30]

D_αβ := (1/2) Σ_e S_αe S_βe (K_e(c) + K_{−e}(c)).
Using this quantity, the right-hand side of Eq. (77) leads to the bound

‖S^T ψ‖²_L = Σ_e ℓ_e (S^T ψ)_e² ≤ ψ^T D ψ.

Moreover, because

⟨S^T ψ, F⟩_L = ψ^T S L F = ψ^T S J = ψ^T d_t c = d_t ⟨ψ⟩,

we obtain a TUR for a chemical reaction network,

σ^ex ≥ (d_t ⟨ψ⟩)² / (ψ^T D ψ).

This inequality generalizes the previous result in Ref. [30]: because σ^ex ≤ σ, the bound

σ ≥ (d_t c_α)² / D_αα

follows by setting ψ_α' = δ_αα'.
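As a concrete cross-check, the sketch below builds the scaled diffusion matrix D for a small hypothetical mass-action network (species X and Y; three illustrative reversible reactions, not the ones from the text) and verifies the resulting bound σ ≥ (d_t⟨ψ⟩)²/(ψ^T D ψ), which follows from the Cauchy-Schwarz inequality in the L metric.

```python
import numpy as np

# Species (X, Y); illustrative reactions: 0 <-> X, X <-> Y, 2X+Y <-> 3X.
S = np.array([[1.0, -1.0, 1.0],
              [0.0, 1.0, -1.0]])     # stoichiometric matrix (species x reactions)
c = np.array([1.2, 0.8])             # current concentrations (arbitrary)
kf = np.array([1.0, 2.0, 0.5])       # illustrative forward rate constants
kb = np.array([0.7, 1.0, 0.3])       # illustrative backward rate constants

cX, cY = c
Kf = kf * np.array([1.0, cX, cX**2 * cY])   # forward mass-action fluxes K_e
Kb = kb * np.array([cX, cY, cX**3])         # backward fluxes K_{-e}

J = Kf - Kb
F = np.log(Kf / Kb)
sigma = J @ F                               # total EPR, nonnegative term by term

# Scaled diffusion matrix: D = (1/2) sum_e S_e S_e^T (K_e + K_{-e}).
D = 0.5 * S @ np.diag(Kf + Kb) @ S.T

psi = np.array([1.0, -0.5])                 # arbitrary observable on species
d_avg = psi @ S @ J                         # d_t <psi> = psi^T d_t c
bound = d_avg**2 / (psi @ D @ psi)
```

D is symmetric positive semidefinite by construction, so the bound is well defined whenever S^Tψ is nonzero.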

C. Finite-time TURs
The above results relate the excess EPR to the short-time fluctuations of observables. In the following, we obtain finite-time TURs for Markov jump processes. Proofs of these relations are found in Appendix D. Note that these proofs rely on the expression for the path probability of the system, so their generalization to chemical reaction networks is challenging, and we leave it for future work.
In general, we consider a stochastic current observable J_w with weight w_e(t), for a stochastic trajectory with jumps e_k at times t_k (k = 1, 2, ...). We assume the weight satisfies w_e = −w_{−e}. The average during a time interval [0, τ] is given by

⟨J_w⟩_τ = ⟨Σ_k w_{e_k}(t_k)⟩ = ∫_0^τ dt w(t)^T J(t).

We introduce a decomposition of the current observable into the excess and the housekeeping contributions [29],

⟨J_w⟩_τ = ⟨J^ex_w⟩_τ + ⟨J^hk_w⟩_τ,  ⟨J^ex_w⟩_τ := ∫_0^τ dt w(t)^T J*(t),  ⟨J^hk_w⟩_τ := ∫_0^τ dt w(t)^T [J(t) − J*(t)],

where we defined J*_e(t) := ℓ_e(t) F*_e(t). As shown above, the current J* induces the same time evolution as J, since BJ* = BLF* = BJ.
By using the housekeeping component of a current observable, we can show the TUR for the Onsager housekeeping EP,

σ^hk_{[0,τ]} ≥ 2 ⟨J^hk_w⟩²_τ / Var(J_w),

with variance Var(J_w) := ⟨(J_w − ⟨J_w⟩_τ)²⟩. Unlike the short-time TURs discussed above, this relation is valid for any finite time duration. It states that the conventional form of the steady-state TUR remains valid if we consider the respective housekeeping contributions for both the current and the entropy production. From this result, we can also derive a short-time TUR for the housekeeping EPR: since the time interval [0, τ] can be replaced with [t, t + Δt], taking the limit Δt → 0 yields

σ^hk ≥ 2 (w^T J^hk)² / Σ_e χ_e w_e²,

with J^hk := J − J*. When we assume the weight satisfies w = (w_e)_{e=1,...,E} ∈ ker SL, we see that w^T J* = w^T L F* = 0 because F* ∈ im S^T. Thus, we find another TUR,

σ^hk_{[0,τ]} ≥ 2 ⟨J_w⟩²_τ / Var(J_w)  for w ∈ ker SL.

Because d_t ⟨J_w⟩_t = w^T J = ⟨w, F⟩_L, this is derived from Eq. (50) in the same way as in Sec. IV A.
What is important here is that, by introducing a current observable, we can interpret the resulting inequality as a TUR. If we vary the time interval as in Refs. [29,61], we can also derive a finite-time TUR for the Onsager excess EP. We assume that the transition rates k_e(t) depend on t and τ only through t/τ. For example, with k_0 constant, k_e(t) = k_0 exp(t/τ) is acceptable, but k_e(t) = k_0 exp(t) and k_e(t) = τ^{−1} exp(t/τ) are not. Physically, this assumption means that any external operation on the system is accelerated when τ becomes smaller, and vice versa. We put the same assumption on the time dependence of the weight w. Under these assumptions, we can prove the finite-time TUR

σ^ex_{[0,τ]} ≥ 2 (τ ∂_τ ⟨J_w⟩_τ − ⟨J^hk_w⟩_τ)² / Var(J_w),

where ∂_τ is taken while holding fixed the normalized time t/τ that defines k_e (and enters the definition of ⟨J_w⟩_τ, Eq. (84), via J_e). When the weight is given by a scalar time-dependent observable ψ(t/τ) as w = B^T ψ, J_w becomes the change Δ_τ ψ = ψ(1) − ψ(0), and Eq. (92) turns into a bound on the excess EP in terms of the change of a scalar observable. This can be understood as a refinement of the TUR for time-dependent driving derived in Ref. [61]: when one considers changes of scalar observables, rather than currents, the resulting TUR can only estimate the excess part of the entropy production. By taking the limit τ → 0, we recover the short-time TUR (76).
Since the proofs of the finite-time TURs are lengthy and technically similar to existing results, we give them in Appendix D and only sketch the underlying idea here. The structures of the proofs are essentially the same. The first step is to parameterize the rate constants and compare the path probabilities for two close parameter values. The difference between the two path probabilities is measured by a KL divergence, which, via the fluctuation-response inequality [26], bounds from above the ratio of two quantities: the squared difference between the averages of the current observable under the two parameters, and the variance of the current observable. The difference between the averages can be shown to equal ⟨J^hk_w⟩_τ or τ ∂_τ ⟨J_w⟩_τ − ⟨J^hk_w⟩_τ, depending on how the rates are parameterized. At the same time, the KL divergence is bounded from above by the Onsager excess or housekeeping EP. Combining the two inequalities, we obtain the uncertainty relations.

A. Generalization of the Wasserstein distance
In our final set of main theoretical results, we provide a generalization of the Wasserstein distance measure (26) and of the speed limit (29). In this section, we assume that the kinetics K do not depend on time explicitly.

Table IV. Summary of the main results obtained in Sec. V. Here, "distance" means the Wasserstein distance. The first three are mathematically novel, while the last two reveal physical relations between the Wasserstein geometry and the excess EP(R).

As we reviewed in Sec. II C, the Wasserstein distance has been defined for Markov jump processes with detailed balanced steady states. Now, we find a general definition applicable to both Markov jump processes and chemical reaction networks:

W(x^(0), x^(1))² := inf_{x, f} { ∫_0^1 dt ⟨f(t), f(t)⟩_{L(x(t))} | d_t x = S L(x) f, x(0) = x^(0), x(1) = x^(1) }.

As in Eq. (28), the unit time interval can be replaced with an arbitrary one [0, T]:

W(x^(0), x^(1))² = inf_{x, f} T ∫_0^T dt ⟨f(t), f(t)⟩_{L(x(t))}

for any T > 0, with the last condition of Eq. (97) replaced by x(T) = x^(1). Because L(x) = diag(ℓ_e(x)) is defined through the one-way fluxes, W is defined in reference to some kinetics K. Here, x(t) and f(t) are treated as independent variables in the minimization, related only by the continuity equation d_t x = S L(x) f; in particular, f(t) is not necessarily equal to the original thermodynamic force F(x(t)). If there is no pair of x and f satisfying the continuity equation in Eq. (97) for a given (x^(0), x^(1)), we may set W = ∞. However, we do not consider such a situation, because if (x^(0), x^(1)) are realized as the initial and final states of the dynamics (2), then x(t) and F(x(t)) provide a solution. We can show that the optimal force that provides the value of W is conservative. That is, we have the formula

W(x^(0), x^(1))² = inf_{x, ψ} { T ∫_0^T dt ⟨S^T ψ, S^T ψ⟩_{L(x(t))} | d_t x = S L(x) S^T ψ, x(0) = x^(0), x(T) = x^(1) }.

We give the proof of this formula in Appendix E. The mathematical features of the generalized Wasserstein distance are expected to be similar to those of Maas's distance [42]. In this paper, we demonstrate two important properties of the Wasserstein distance, namely the constant-speed property of geodesics [43] and the Kantorovich duality [44].
We first explain that a geodesic with respect to the Wasserstein distance has constant speed. In Ref. [43], the authors proved that there exists a geodesic p* accompanied by a potential ψ* with respect to their original definition of the Wasserstein distance (26). This geodesic has constant speed, meaning that ‖B^T ψ*(t)‖_{L(p*(t))} = const. As we show, the proof remains valid if we assume there is a geodesic x* with potential ψ* for the generalized Wasserstein distance (100). Because of the Cauchy-Schwarz inequality, for an arbitrary pair {x, ψ}, we have

T ∫_0^T dt ‖S^T ψ(t)‖²_{L(x(t))} ≥ ( ∫_0^T dt ‖S^T ψ(t)‖_{L(x(t))} )².

Moreover, using a reparametrization technique discussed in Ref. [62], we can also prove the opposite inequality for the geodesic pair {x*, ψ*}; we provide the details of this technique in Appendix F. Because the equality in the Cauchy-Schwarz inequality (102) is attained only if ‖S^T ψ(t)‖_{L(x(t))} is constant in time, the geodesic has the constant speed

‖S^T ψ*(t)‖_{L(x*(t))} = W(x^(0), x^(1)) / T.

Next, for a Markov jump process, we prove the Kantorovich duality representation [32] of the Wasserstein distance, as in Ref. [44]:

W(p^(0), p^(1))² / (2T) = sup_ψ [ ⟨ψ(T)⟩_1 − ⟨ψ(0)⟩_0 ],

where ψ(t) ∈ R^S (t ∈ [0, T]) has to satisfy

⟨d_t ψ⟩_q + (1/2) ‖B^T ψ(t)‖²_{L(q)} ≤ 0

for any probability distribution q, and ⟨·⟩_i (i = 0, 1) is the expectation value under p^(i). The maximizer ψ** of Eq. (105) and the minimizer {p*, ψ*} of Eq. (100) provide the same conservative force, B^T ψ* = B^T ψ**. Furthermore, it can be shown that ψ** and p* attain the equality in Eq. (106),

⟨d_t ψ**⟩_{p*} + (1/2) ‖B^T ψ**(t)‖²_{L(p*(t))} = 0,

which can be seen as a generalization of the Hamilton-Jacobi equation. While the duality representation characterizes the Wasserstein distance as a maximal potential difference without considering the time evolution of the distribution, we can use the Hamilton-Jacobi equation together with the differential equation in Eq. (101) to obtain the optimal solution. We provide a proof of the duality formula (105) in Appendix G; there, we assume the existence of an optimal time evolution of p and ψ, that is, a geodesic.
A proof that is more rigorous but restricted to systems with detailed balanced steady states can be found in Ref. [44].

B. Wasserstein distance and Onsager excess EPR
Here we show that, analogously to the continuous case [28,29,34,36,39], the Wasserstein distance is connected with the excess entropy production. To this end, we employ the definition given in Eqs. (98) and (97).
Let {x̄(t)}_{t∈[0,τ]} be a solution of the continuity equation (2). Note that τ indicates the physical time interval we focus on, while T is an arbitrary time parameter. If we set x^(0) = x̄(t), x^(1) = x̄(t + dt), and T = dt in Eq. (98), condition (97) constrains the minimization variables to satisfy the continuity equation with the same boundary states as the actual dynamics. Hence, for an infinitesimal time interval dt, the Wasserstein distance can be expanded as

W(x̄(t), x̄(t + dt))² = dt² min_f { ⟨f, f⟩_{L(x̄(t))} | S L f = S L F(x̄(t)) } + o(dt²).

The condition of the minimization can be translated into {f | F(x̄(t)) − f ∈ ker SL}; hence, the leading term is identified with the excess EPR σ^ex of Eq. (49). We finally obtain the relation

W(x̄(t), x̄(t + dt)) = √(σ^ex(t)) dt + o(dt).

Intuitively, in the definition of the Wasserstein distance, Eq. (98), we optimize over the thermodynamic forces and the corresponding evolution of the state connecting the initial and final states. By contrast, in computing the excess EPR, we restrict the evolution of the state to that of the original dynamics and only optimize over the forces. In the short-time limit of a single time step, there is no freedom in the evolution of the state, so both problems become equivalent, and we can identify the Wasserstein distance and the excess EPR with each other.

C. Speed limit
We define l̇ and l by

l̇(t) := lim_{dt→0} W(x̄(t), x̄(t + dt)) / dt = √(σ^ex(t)),  l := ∫_0^τ dt l̇(t).

While W(x̄(0), x̄(τ)) gives the shortest length between the initial state x̄(0) and the final state x̄(τ), l quantifies the length of the actual path {x̄(t)}_{t∈[0,τ]} in terms of the Wasserstein metric.
Using l, we have the speed limit, as in Ref. [39],

∫_0^τ dt σ^ex(t) ≥ l² / τ.

This follows from the Cauchy-Schwarz inequality l² = (∫_0^τ dt l̇)² ≤ (∫_0^τ dt)(∫_0^τ dt l̇²) and the equality (110). By the constant-speed property, the following triangle inequality is proven for any t_1 ≤ t_2 ≤ t_3 ∈ [0, τ]:

W(x̄(t_1), x̄(t_3)) ≤ W(x̄(t_1), x̄(t_2)) + W(x̄(t_2), x̄(t_3)).

This relation leads to an inequality between the two distance measures,

l ≥ W(x̄(0), x̄(τ)).

The equality holds only when the distribution changes at the constant speed l̇ = W(x̄(0), x̄(τ))/τ. By combining Eqs. (112) and (114), we obtain the speed limit

∫_0^τ dt σ^ex(t) ≥ W(x̄(0), x̄(τ))² / τ.

This generalizes the Vu-Hasegawa speed limit (29), which was introduced in Ref. [31] for systems with detailed balanced steady states, to general Markovian dynamics on a graph, which may have no stable steady state.

D. Relation to L 1 -Wasserstein distance
Another distance that has been considered in the theory of optimal transport is the L¹-Wasserstein distance [32,55]. This distance is defined by

W₁(p^(0), p^(1)) := inf_{Π ∈ C(p^(0), p^(1))} Σ_{ij} d_ij Π_ij,

where the infimum is taken over the couplings between the probability distributions p^(0) and p^(1), that is,

C(p^(0), p^(1)) := { Π | Π_ij ≥ 0, Σ_j Π_ij = p^(0)_i, Σ_i Π_ij = p^(1)_j },

and d_ij is a weight that satisfies the axioms of a distance: d_ij = d_ji ≥ 0, d_ij = 0 if and only if i = j, and d_ij ≤ d_ik + d_kj. The (i, j) element of a coupling represents how much of the mass on i is moved to j, and d evaluates the cost of the transportation. Here we set d_ij = 1 if i and j are directly connected, and assume that the graph of states is connected.
In Ref. [55], one of us derived the Kantorovich duality for this L¹-Wasserstein distance,

W₁(p^(0), p^(1)) = max_{ξ : |ξ_i − ξ_j| ≤ d_ij} Σ_i ξ_i (p^(1)_i − p^(0)_i),

where the optimal potential ξ satisfies ξ_j − ξ_i = d_ij = 1 whenever the optimal coupling Π* is positive on (i, j), Π*_ij > 0. Using this formula, we can prove the following relation between the L²- and the L¹-Wasserstein distance,

W₂(p_0, p_τ) ≥ √(2 / Ā[p*]) W₁(p_0, p_τ),

where we write the L²-Wasserstein distance as W₂ and Ā[p*] is the time-averaged dynamical activity of the minimizer p* of the L²-Wasserstein distance,

Ā[p*] := (1/τ) ∫_0^τ dt Σ_e χ_e(p*(t)).

This is similar to the inequality W₂(p_0, p_τ) ≥ W₁(p_0, p_τ) for the L²- and L¹-Wasserstein distances of continuous probability distributions [32], apart from the prefactor, which represents how frequently transitions occur on average along the optimal path. We provide the proof in Appendix H. We remark that in the discrete case, the L²-distance defined above is a dynamics-dependent measure and depends on the concrete values of the rates. By contrast, the L¹-distance depends only on the connectivity of the state network. This differs from the continuous case, where both distances are defined in terms of the Euclidean distance.
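The L¹-Wasserstein distance is a finite linear program over couplings and can be computed with an off-the-shelf LP solver. A minimal sketch for a 3-node path graph (hop-count distances and point-mass distributions chosen purely for illustration):

```python
import numpy as np
from scipy.optimize import linprog

# Path graph 0 - 1 - 2; d_ij = shortest-path (hop) distance.
d = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 0.0]])
p0 = np.array([1.0, 0.0, 0.0])
p1 = np.array([0.0, 0.0, 1.0])

n = len(p0)
# Variables: coupling Pi_ij flattened row-major; minimize sum_ij d_ij Pi_ij.
A_eq = np.zeros((2 * n, n * n))
for i in range(n):
    A_eq[i, i * n:(i + 1) * n] = 1.0   # row sums = p0
for j in range(n):
    A_eq[n + j, j::n] = 1.0            # column sums = p1
b_eq = np.concatenate([p0, p1])

res = linprog(d.flatten(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
W1 = res.fun  # all mass moves two hops, so W1 = 2
```

Here the marginals force the entire unit mass to travel from node 0 to node 2, so the optimal cost is exactly the hop distance 2; for larger graphs the same LP computes W₁ directly from the definition.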

VI. EXAMPLES
In this section, we discuss two examples. The first clearly shows how the Onsager-projective decomposition works and separates the two aspects of the dynamics into excess and housekeeping terms; it also illustrates that the short-time TUR (76) improves the bound given in Ref. [23]. The second demonstrates that the decomposition can also be obtained for a nonlinear chemical reaction network that exhibits a limit cycle; we numerically show that in such a system, the conventional HS decomposition fails. We do not discuss the other results, such as the generalization of the finite-time TURs and the Wasserstein speed limit, because they have been studied in detail in Refs. [25,61] and Ref. [58], respectively.

A. Two-level system attached to two reservoirs
We illustrate our results in a simple two-level stochastic system. Consider a two-level system attached to two heat reservoirs at inverse temperatures β_h and β_c, respectively, for which we can calculate the analytical forms of the excess and housekeeping EPRs. Let the energy of state i be ε_i for i = 1, 2, with ε_2 > ε_1. There are two kinds of transitions, associated with the distinct reservoirs. To avoid confusion, we label each transition by the reservoir at inverse temperature β_e with e = h or c, instead of 1 or 2. The incidence matrix is then given by

B = ( −1 −1 ; 1 1 ).

This setup is illustrated in Fig. 5. By using the eigendecomposition of B L B^T and formula (31), we can find the projection matrix

P = 1/(ℓ_h + ℓ_c) ( ℓ_h ℓ_c ; ℓ_h ℓ_c ),

where ℓ_e denotes the Onsager coefficient associated with a transition driven by heat bath e.

Figure 5. Conceptual diagram of the example of Sec. VI A. A two-level system is attached to two heat reservoirs at different temperatures. The total dissipation (EPR) σ can be divided into the excess EPR σ^ex and the housekeeping EPR σ^hk. Both have clear interpretations: σ^ex corresponds to a relaxation caused by a single reservoir at the "mean" inverse temperature β̄, while σ^hk is attributed to heat transfer that does not change the state.

Figure 6. Verification of the short-time TUR of Eq. (76). The ratios between the lower bound 2(d_t⟨ψ⟩)²/D_ψ and the excess EPR σ^ex and the EPR σ are shown; η corresponds to σ^ex and η' to σ. We choose the observable ψ = (1, 0)^T, but in this system the lower bound does not depend on the choice of ψ ≠ 0. η' decreases and tends to zero after its peak because σ does not vanish as t → ∞. On the other hand, η approaches the upper bound 1, although σ^ex diminishes at the same time. In the inset, we compare our decomposition with the HS decomposition; in this example, where a stable steady state exists, they behave in almost the same way.
It leads to the conservative force F* = PF, whose components are equal,

F*_h = F*_c = −Δφ*,

where, up to an additive constant, φ* is the corresponding potential. The choice of φ* is not unique, but Δφ* is uniquely determined, so we write φ*_2 − φ*_1 as Δφ*. If we do not consider any explicit external force, the forces read

F_e = −(β_e Δε + Δ ln p)  for e = h, c,

where Δε = ε_2 − ε_1 and Δ ln p = ln p_2 − ln p_1. Then, we obtain the expression

Δφ* = β̄ Δε + Δ ln p,  with β̄ := (ℓ_h β_h + ℓ_c β_c)/(ℓ_h + ℓ_c).

Therefore, φ* can be thought of as the potential of a system attached to a single bath at the mean inverse temperature β̄. The excess EPR is then interpreted as the dissipation due to the relaxation toward the pseudo-canonical distribution p^pcan_i ∝ e^{−β̄ ε_i}, in view of Eq. (59).
On the other hand, F − F* is written as

F_h − F* = (ℓ_c/(ℓ_h + ℓ_c)) Δβ Δε,  F_c − F* = −(ℓ_h/(ℓ_h + ℓ_c)) Δβ Δε,

where Δβ := β_c − β_h. The corresponding current J^cyc := L(F − F*) is cyclic (B J^cyc = 0). Moreover, the housekeeping EPR reads

σ^hk = (ℓ_h ℓ_c/(ℓ_h + ℓ_c)) (Δβ Δε)².

From this expression, we can see that the (inverse) temperature difference leads to a nonequilibrium steady state with a positive rate of dissipation. We illustrate the TUR of Eq. (76) and compare our decomposition with the HS decomposition in Fig. 6, where we plot the time evolution of the ratios

η := 2(d_t⟨ψ⟩)²/(D_ψ σ^ex),  η' := 2(d_t⟨ψ⟩)²/(D_ψ σ),

with the observable ψ = (1, 0)^T. Note, however, that in this case η and η' do not depend on the choice of ψ, as long as it is nonzero. We find that η' attains its maximum around t ≈ 0.3 and cannot reach the upper bound 1, while η approaches 1 as t → ∞ (even though σ^ex vanishes at the same time). This calculation was done with parameters β_h = 1, β_c = 2, ε_2 − ε_1 = 1, k_{−h} = k_{−c} = 1, and the initial condition p_2(0)/p_1(0) = 10³.
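The closed-form expressions of this example can be cross-checked against the general decomposition. The sketch below (illustrative parameter values, not necessarily those used in the figures) computes σ, σ^ex, and σ^hk for the two-level, two-reservoir system and verifies σ^hk = ℓ_hℓ_c(ΔβΔε)²/(ℓ_h + ℓ_c) together with σ = σ^ex + σ^hk.

```python
import numpy as np

bh, bc = 1.0, 2.0             # inverse temperatures of the two baths
de = 1.0                      # energy gap eps_2 - eps_1
p = np.array([0.999, 0.001])  # illustrative occupations of states 1 and 2

# Local detailed balance for each bath e: k_e / k_{-e} = exp(-beta_e * de),
# with k_{-h} = k_{-c} = 1.
ell, FF = {}, {}
for name, beta in (("h", bh), ("c", bc)):
    jf = np.exp(-beta * de) * p[0]    # one-way flux 1 -> 2 via bath e
    jb = 1.0 * p[1]                   # one-way flux 2 -> 1
    FF[name] = np.log(jf / jb)        # force: -(beta*de + ln p2 - ln p1)
    ell[name] = (jf - jb) / FF[name]  # edgewise Onsager coefficient

dlnp = np.log(p[1] / p[0])
lh, lc = ell["h"], ell["c"]
bbar = (lh * bh + lc * bc) / (lh + lc)   # ell-weighted mean inverse temperature
F_star = -(bbar * de + dlnp)             # common conservative force

sigma = lh * FF["h"]**2 + lc * FF["c"]**2       # total EPR
sigma_ex = (lh + lc) * F_star**2                # excess part
sigma_hk = lh * lc / (lh + lc) * ((bc - bh) * de)**2  # housekeeping part
```

The decomposition closes exactly because the cross term between F* and F − F* vanishes by the definition of β̄.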
The inset shows the time evolution of the excess and housekeeping EPRs for the two decompositions; both give almost the same values in this system. We note that the HS housekeeping EPR can also be expressed in terms of another mean inverse temperature β̃, defined through the steady-state distribution. It satisfies β_h ≤ β̃ ≤ β_c if β_h ≤ β_c, and allows us to rewrite the HS housekeeping EPR in a form analogous to Eq. (131). However, the definition of β̃ is less straightforward, and the resulting expression is harder to interpret than Eq. (131).

B. Brusselator model of chemical oscillation
Let us consider the Brusselator model of chemical oscillations. This model consists of three reactions. We assume the mass action law and set the rate constants k_1 = k_3 = k_{−1} = k_{−3} = 1, k_2 = 10, and k_{−2} = 0.1. The concentrations of X and Y then obey the rate equation d_t c = S J(c) with the mass-action fluxes J(c); here we denote the concentrations by c rather than x. This equation admits an unstable steady state (c_X, c_Y) = (1, 10) and has a limit cycle. The time evolution is shown in the uppermost panel of Fig. 7: the smaller inset shows the oscillation of the concentrations c_X and c_Y, the bigger graph shows the cyclic behavior, and the blue star indicates the unstable fixed point. The middle panel shows the behavior of the EPRs along with ‖ċ‖² = ċ_X² + ċ_Y², normalized so that it fits in the graph with the EPRs. This panel shows the correlation between σ^ex and ‖ċ‖²: in this model, the faster the concentrations change, the bigger the Onsager excess EPR becomes. The color of the dots in the uppermost panel indicates the size of σ^ex relative to σ^hk; the yellow (bright) color corresponds to the parts of the trajectory where σ^ex is large, which corresponds to around t ≈ 25.8 in the middle panel.
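To see the limit-cycle behavior concretely, one can integrate a Brusselator rate equation numerically. The sketch below uses the textbook irreversible Brusselator (ẋ = a − (b + 1)x + x²y, ẏ = bx − x²y with a = 1, b = 3) as a stand-in for the reversible three-reaction model of the text; its parameter values and fixed point differ from those above and are purely illustrative.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Textbook irreversible Brusselator (stand-in model): the fixed point
# (a, b/a) is unstable for b > 1 + a^2, so a = 1, b = 3 gives a limit cycle.
a, b = 1.0, 3.0

def brusselator(t, z):
    x, y = z
    return [a - (b + 1.0) * x + x**2 * y, b * x - x**2 * y]

sol = solve_ivp(brusselator, (0.0, 100.0), [1.1, 3.0],
                t_eval=np.linspace(50.0, 100.0, 2000), rtol=1e-8, atol=1e-10)
x_late = sol.y[0]                        # state after the transient
amplitude = x_late.max() - x_late.min()  # O(1) oscillation on the cycle
```

After the transient, the trajectory winds around the limit cycle with an order-one oscillation amplitude, in contrast to the decay to a point that a stable fixed point would produce.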
In the lowermost panel, we show that the HS excess EPR can be negative if one defines it using the unstable steady state (c_X^st, c_Y^st) = (1, 10). This implies that, unless σ^ex_HS vanishes identically, it has to take negative values at some times during the cycle. We stress that this is in contrast to the Onsager-projective decomposition, whose excess part is nonnegative by construction.

Figure 7. Dynamics and EPR decomposition of the Brusselator model. A point in the trajectory is yellower (brighter) when σ^ex is large relative to σ^hk. The middle panel shows the EPRs and the squared norm ‖ċ‖² = ċ_X² + ċ_Y² as functions of time; the norm is normalized so that it fits in the graph with the EPRs, so its value is in arbitrary units. Note the correlation between the Onsager excess EPR σ^ex and the speed of the concentration vector in the middle panel. As shown in the lowermost panel, in this system with an unstable steady state, the HS excess EPR can be negative.

VII. DISCUSSION
In this paper, we developed a geometrical perspective on dissipation in general Markovian dynamics on networks. In particular, we found a geometrical excess/housekeeping decomposition of the entropy production rate that helps us understand dynamics in terms of dissipation. Our decomposition leads to refinements of several thermodynamic uncertainty relations. We have also generalized the L 2 -Wasserstein geometry of a Markovian dynamics on a network, which allows us to derive the thermodynamic speed limit. Because our results are valid for very general dynamics, we expect that they will offer novel insights in various concrete systems.
The geometrical decomposition given by the projection in (31) shows how dynamics can be divided into two modes, one being cyclic and one being a relaxation driven by a conservative force, which can be written as a gradient flow. Even if a stable steady state does not exist, we can identify an instantaneous target of the relaxation mode as in Eq. (58). The Onsager excess entropy production quantifies the dissipation due to this relaxation. On the other hand, the cyclic mode keeps the system out of equilibrium. The size of the cyclic mode defines the Onsager housekeeping entropy production, which can be further decomposed into contributions from individual cycles, generalizing Schnakenberg's decomposition as in Eq. (43). Because these interpretations apply to both Markov jump processes and chemical reaction networks, our decomposition provides a clear and general thermodynamic separation of motion.
We note that one may define a decomposition of the EPR using an orthogonal projection onto the image of B^T instead of S^T, as we have done. For a Markov jump process, it provides the same results, while for a chemical reaction network, it gives a different decomposition. The excess part of this decomposition vanishes in a steady state only if the steady state is complex balanced, but both parts are always nonnegative, in contrast to the HS decomposition; thus, this decomposition could provide another perspective on chemical reaction networks. However, it may be argued that the projection onto the image of S^T, as investigated here, is more physically meaningful. This is because it corresponds to conservative forces defined in terms of the stoichiometric coefficients of the reactions, as is standard in chemical thermodynamics, rather than forces that depend explicitly on the reactant and product chemical complexes.
From the experimental point of view, the edgewise Onsager coefficients, and therefore the Onsager excess and housekeeping EPRs, require information about the kinetics. For this reason, it may be difficult to access them experimentally. However, recent studies have shown that TURs can be utilized to obtain estimates of the entropy production [23,24]. In a similar manner, we can use the refined TURs to estimate the excess and housekeeping EPRs from measurable quantities. Although we cannot generally attain equality in the TUR for a discrete system out of equilibrium, since part of the estimation error stems from the logarithmic-mean inequality between the edgewise Onsager coefficient and the edgewise activity [63], this error becomes smaller as the system approaches a steady state. In the two-level system, we can approach equality, as shown in Fig. 6 of Sec. VI A, because the steady state is "detailed balanced" in the sense that there is no net current between the states, even though there is a steady flux between the reservoirs.
We have also extended Maas's definition of L 2 -Wasserstein distance to a general Markovian dynamics on a network. A mathematical advantage of Maas's distance is that it can be related to master equation dynamics via the Wasserstein gradient flow [42], compared with other optimal transport theoretical distances between two discrete distributions, e.g., the L 1 -Wasserstein distance [32,55] and another discretization of the L 2 -Wasserstein distance [64]. We expect that a deterministic rate equation with a detailed-balanced steady state is also characterized as a Wasserstein gradient flow with the aid of our generalized Wasserstein distance, beyond the usual gradient flow expression as in Eq. (24) [11].
The relation between the geometrical decomposition and the generalized Wasserstein distance presented in Eq. (110) is parallel to recent results for Langevin systems [28,29]. Therefore, our generalization of Maas's L 2 -Wasserstein distance is not only important from a mathematical point of view, but also has a physical interpretation as the minimum excess entropy of a process connecting the initial and final state. As discussed above, obtaining a meaningful "minimum entropy production" requires a constraint, that is, keeping some quantity fixed during the minimization [55-58,65]. In the present study, we fix the Onsager coefficients, whereas in previous studies [56,57,65] the symmetric parts or one of the two directions of the transition rates were fixed. While the latter may be the intuitive choice from an operational point of view, our approach establishes the connection between minimum entropy production, conservative forces, and optimal transport. Although we have shown how the excess EPR can be understood as a minimum EPR at each time point, further study is needed to investigate how the finite-time optimization of the Wasserstein distance works practically and how the different approaches are related to each other.
One drawback of the discrete L²-Wasserstein distance is that it requires a reference kinetics. Since the reference kinetics has to be time-independent, we cannot obtain the Wasserstein speed limit in externally driven systems. Let us point out what prevents generalizing the Wasserstein distance to the case of a time-dependent reference kinetics. While in the original definition the time interval is arbitrary, with a time-dependent reference one must specify how the time integral is to be defined. However we choose to do so, we cannot use the conventional reparametrization technique, described in Appendix F, because it requires the reference kinetics to be time-independent. As a result, it becomes difficult to prove several properties, such as the triangle inequality and the constant-speed property of geodesics.
Let us mention a complementary study that we have recently carried out. In this paper, we considered an EPR decomposition by studying the space of thermodynamic forces with Euclidean geometry. However, it is also possible to define such a decomposition using non-Euclidean information geometry. We develop this perspective in Ref. [66], where we derive several thermodynamic bounds based on an information-geometric excess/housekeeping decomposition.
Finally, we remark on a missing piece: nonlinear systems in continuous space. Two of the present authors, with another coauthor, presented a geometric decomposition of the EPR for linear stochastic systems in continuous space [28,29]. In this work, we developed it for Markov jump processes (linear stochastic systems) and chemical rate equations (nonlinear deterministic systems) in discrete space. The story remains to be written for nonlinear systems in continuous space, such as reaction-diffusion chemical systems and hydrodynamic systems. For such systems, geometric techniques for decomposing the EPR may shed light on new kinds of thermodynamic constraints and tradeoffs.
with conditions (A2) and (A3), where L is the logarithmic mean

L(a, b) := (a − b)/(ln a − ln b).

First, by symmetry, the integrand is rewritten in terms of k_e p^eq_{ι(e)} L(ρ_{ι(e)}, ρ_{ι'(e)}), where ρ_i := p_i/p^eq_i and we used the detailed balance condition k_e p^eq_{ι(e)} = k_{−e} p^eq_{ι'(e)}. Moreover, the numerator of the logarithmic mean times k_e p^eq_{ι(e)} gives J_e, because

k_e p^eq_{ι(e)} ρ_{ι(e)} = k_e p_{ι(e)},  k_e p^eq_{ι(e)} ρ_{ι'(e)} = k_{−e} p_{ι'(e)}.

Furthermore, the denominator reads

ln(ρ_{ι(e)}/ρ_{ι'(e)}) = ln( k_e p_{ι(e)} / (k_{−e} p_{ι'(e)}) ) = F_e,

due to the detailed balance condition. Therefore, we have

k_e p^eq_{ι(e)} L(ρ_{ι(e)}, ρ_{ι'(e)}) = J_e / F_e = ℓ_e,

and the integrand is equal to its counterpart in our definition. We next consider the conditions for the minimization. From Eq. (A10), we find that condition (A3) is equivalent to the continuity equation; therefore, the conditions are equivalent to ours in Eq. (27). This completes the proof of the equivalence, for the left-hand side and for the right-hand side, respectively.
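The logarithmic mean used throughout can be implemented and checked directly; the sketch below (names illustrative) verifies the mean inequalities and the integral representation L(a, b) = ∫₀¹ aˢ b^(1−s) ds.

```python
import numpy as np

def logmean(a, b):
    """Logarithmic mean L(a, b) = (a - b)/(ln a - ln b), with L(a, a) = a."""
    a, b = float(a), float(b)
    if np.isclose(a, b):
        return a
    return (a - b) / (np.log(a) - np.log(b))

a, b = 2.0, 5.0
lm = logmean(a, b)

# Geometric mean <= logarithmic mean <= arithmetic mean.
gm, am = np.sqrt(a * b), 0.5 * (a + b)

# Integral representation: L(a, b) = \int_0^1 a^s b^(1-s) ds
# (checked here with a simple trapezoidal sum).
s = np.linspace(0.0, 1.0, 100001)
vals = a**s * b**(1.0 - s)
integral = ((vals[:-1] + vals[1:]) / 2 * np.diff(s)).sum()
```

The sandwich between the geometric and arithmetic means is exactly the inequality that controls the tightness of the short-time TURs in the main text.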

Appendix B: Onsager coefficient and diffusion coefficient
We discuss how the Onsager coefficients are connected to the diffusion coefficient of a Langevin system when we consider a Markov jump process on a square lattice that approximates continuous space. The result was obtained in Ref. [58] for the one-dimensional case; here we generalize it to n-dimensional systems for arbitrary dimension n.
Let i denote a point on an n-dimensional square lattice with lattice constant Δx ≪ 1. The discrete probability distribution p_i can be approximated by a smooth density function P(x) as p_i ≃ P(i)(Δx)^n. A transition can be designated by its starting point and the direction d = ±1, ..., ±n. With w^(d)_i being the transition rate of the jump from i in direction d, we have the master equation

d_t p_i = Σ_d [ w^(−d)_{i+d} p_{i+d} − w^(d)_i p_i ],

where we keep the time dependence implicit and i + d denotes the point next to i in the direction of d. Note that each edge is now determined by the combination of a starting point and a direction of movement. As in previous work [58,67], we assume that the transition rates can be expanded as a series in Δx whose leading term is D_d(i)/(Δx)², where D_d(x) is assumed to be symmetric, D_d(x) = D_{−d}(x). By taking the limit Δx → 0, the master equation turns into a Fokker-Planck equation with diffusion coefficients D_d. One can then show that, to leading order, the edgewise Onsager coefficient behaves as ℓ_{(i,d)} ≃ D_d(i) p_i/(Δx)², which shows that fixing the functional form of the Onsager coefficients is similar to fixing the diffusion coefficient in a Langevin system.

Appendix C: Detailed comparison with HS decomposition
In addition to the discussion given in Sec. III F, we can further establish a connection between the HS decomposition and the Onsager-projective decomposition. Let $P_\Psi$ be the orthogonal projection operator onto the span of the vector $S^{\mathsf T}\Psi$ with respect to the metric $L$, with $\Psi$ given by Eq. (69),
$$P_\Psi F := \frac{\langle S^{\mathsf T}\Psi, F \rangle_L}{\| S^{\mathsf T}\Psi \|_L^2}\, S^{\mathsf T}\Psi .$$
We define the pseudo-HS decomposition by $\sigma^{\mathrm{ex}}_{\mathrm{pHS}} := \| P_\Psi F \|_L^2$ and $\sigma^{\mathrm{hk}}_{\mathrm{pHS}} := \| (1 - P_\Psi) F \|_L^2$. Because $P_\Psi$ is an orthogonal projection operator, the sum of the two terms equals the total EPR, $\sigma = \sigma^{\mathrm{ex}}_{\mathrm{pHS}} + \sigma^{\mathrm{hk}}_{\mathrm{pHS}}$. Since $P P_\Psi = P_\Psi$, we have the inequality $\sigma^{\mathrm{ex}} \ge \sigma^{\mathrm{ex}}_{\mathrm{pHS}}$, which allows us to define a coupling term as in Refs. [28,29], $\sigma^{\mathrm{cpl}} := \sigma^{\mathrm{ex}} - \sigma^{\mathrm{ex}}_{\mathrm{pHS}} \ge 0$, and to decompose the EPR into three positive contributions, $\sigma = \sigma^{\mathrm{hk}} + \sigma^{\mathrm{cpl}} + \sigma^{\mathrm{ex}}_{\mathrm{pHS}}$. However, the pseudo-HS decomposition does not generally coincide with the genuine HS decomposition; a straightforward calculation reveals their connection. If $\sigma^{\mathrm{ex}}_{\mathrm{HS}} = \| S^{\mathsf T}\Psi \|_L^2$ holds, the two give the same value, but this is not the case in general.
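The projection inequality used above is a general fact of linear algebra: projecting onto a one-dimensional subspace contained in the range of $P$ can only decrease the $L$-norm. A toy numerical check (the metric, subspace, and force vector here are random placeholders, not quantities from the model):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# A positive-definite "Onsager metric" L (toy example)
A = rng.normal(size=(n, n))
L = A @ A.T + n * np.eye(n)

def proj_span(V, L):
    """L-orthogonal projection onto the column span of V:
    P = V (V^T L V)^{-1} V^T L."""
    return V @ np.linalg.solve(V.T @ L @ V, V.T @ L)

V = rng.normal(size=(n, 3))      # subspace playing the role of range(P)
v = V @ rng.normal(size=(3, 1))  # a vector inside it (role of S^T Psi)

P = proj_span(V, L)
P_psi = proj_span(v, L)
F = rng.normal(size=(n, 1))      # arbitrary "force" vector

def norm2(x):
    """Squared L-norm ||x||_L^2 = x^T L x."""
    return float(x.T @ L @ x)

# P P_psi = P_psi, since span(v) is contained in span(V)
assert np.allclose(P @ P_psi, P_psi)
# Hence ||P F||_L^2 >= ||P_psi F||_L^2  (sigma_ex >= sigma_ex_pHS)
assert norm2(P @ F) >= norm2(P_psi @ F) - 1e-9
# Orthogonal split: ||F||^2 = ||P_psi F||^2 + ||(1 - P_psi) F||^2
assert np.isclose(norm2(F), norm2(P_psi @ F) + norm2(F - P_psi @ F))
```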
Moreover, the difference between $\sigma^{\mathrm{ex}}_{\mathrm{HS}}$ and $\| S^{\mathsf T}\Psi \|_L^2$ is of the same order as each individual term even when the system is near a steady state; in other words, it is not small. Let $\sup_\alpha |x_\alpha - x^{\mathrm{st}}_\alpha| = \epsilon$ be very small, with $x^{\mathrm{st}}$ a steady state, and assume that $d_t x = SJ$ is of the same order. Then $\sigma^{\mathrm{ex}}_{\mathrm{HS}}$ and $\| S^{\mathsf T}\Psi \|_L^2$ are of order $\epsilon^2$, because $\Psi$ and $SJ$ are both of order $\epsilon$. On the other hand, because $S L^{\mathrm{st}} F^{\mathrm{st}} = S J^{\mathrm{st}} = 0$, the difference between $\sigma^{\mathrm{ex}}_{\mathrm{HS}}$ and $\| S^{\mathsf T}\Psi \|_L^2$ can be evaluated with the third line following from Eq. (67). Here, $\Psi$ and $L - L^{\mathrm{st}}$ are of order $\epsilon$, while $F^{\mathrm{st}}$ is generally of order 1, so the difference is of order $\epsilon^2$, just like $\sigma^{\mathrm{ex}}_{\mathrm{HS}}$ and $\| S^{\mathsf T}\Psi \|_L^2$ themselves. Equality holds only under the detailed balance condition, i.e., for conservative forces; in that case, we find $\sigma = \sigma^{\mathrm{ex}}_{\mathrm{HS}} = \sigma^{\mathrm{ex}}_{\mathrm{pHS}}$.
Appendix D: Derivations of finite-time TURs

Housekeeping TUR
To prove the housekeeping TUR (87), let us consider a parametrization of the transition rates obtained by introducing the interpolated force $F^\theta_e = F^*_e + \theta(F_e - F^*_e)$ through $k^\theta_e\, p_{\iota(e)} = \ell_e\, g(F^\theta_e)$, where $\theta \in \mathbb{R}$ is an interpolation parameter. We also define $J^\theta_e := k^\theta_e p_{\iota(e)} - k^\theta_{-e} p_{\iota(-e)}$. One can easily show that $J^\theta_e = \ell_e F^\theta_e$ and $J^\theta_e = J^*_e + \theta(J_e - J^*_e)$ hold. Because $B J^\theta = B J$ for any $\theta$ and at any time, the solution $\{p(t)\}_{t\in[0,\tau]}$ of the original ($\theta = 1$) master equation also solves $d_t p(t) = B J^\theta(t)$.
In general, we denote the path probability for the master equation by $P$ and that for the master equation with modified rate constants $k^\theta$, such as those defined in Eq. (D1), by $P^\theta$. The KL divergence between $P^\theta$ and $P^{\theta'}$ can be written as a path integral, where $\Gamma$ denotes a path, the sum over $\pm e$ runs over both directions of the edges, and $p^\theta$ is the solution of the master equation generated by $k^\theta$. We always assume that the initial distribution does not depend on the parameter $\theta$. The KL divergence can also be expressed through $D(k^\theta p^\theta \| k^{\theta'} p^{\theta'})$, the generalized KL divergence between the two "distributions" $\{k^\theta_e p^\theta_{\iota(e)}\}_{e=\pm1,\pm2,\dots}$ and $\{k^{\theta'}_e p^{\theta'}_{\iota(e)}\}_{e=\pm1,\pm2,\dots}$, that is,
$$D(a \| b) = \sum_{e=\pm1,\pm2,\dots} \left[ a_e \ln\frac{a_e}{b_e} - a_e + b_e \right].$$
In the parametrization (D1), this yields the expression in Eq. (D9) for the KL divergence. One can easily show that $h(x) + h(-x) \le 1/2$. Therefore, we have the inequality (D10) regardless of the value of $\theta$. The fluctuation-response inequality (12) of Ref. [26] tells us that, for any current observable $J_w$ defined in Sec. IV C, the bound (D11) holds, where $\langle \cdot \rangle^\theta_\tau$ and $\mathrm{Var}^\theta(\cdot)$ denote the average and variance with respect to $P^\theta$. Setting $\theta = 1$ in Eq. (D11) and combining it with Eq. (D10), we obtain the desired inequality in Eq. (87).
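The nonnegativity of the generalized KL divergence between unnormalized "distributions" such as $\{k_e p_{\iota(e)}\}$, which the derivation relies on, can be checked directly. A minimal sketch, assuming the standard form $D(a\|b) = \sum_e [a_e \ln(a_e/b_e) - a_e + b_e]$:

```python
import numpy as np

def gen_kl(a, b):
    """Generalized KL divergence for unnormalized nonnegative vectors:
    D(a||b) = sum_e [ a_e ln(a_e/b_e) - a_e + b_e ] >= 0."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(np.sum(a * np.log(a / b) - a + b))

rng = np.random.default_rng(1)
for _ in range(100):
    a = rng.uniform(0.1, 5.0, size=8)  # e.g. {k_e p_iota(e)}
    b = rng.uniform(0.1, 5.0, size=8)  # modified rates {k'_e p'_iota(e)}
    assert gen_kl(a, b) >= 0.0         # pointwise x ln(x/y) >= x - y
assert np.isclose(gen_kl(a, a), 0.0)   # zero iff the arguments coincide
```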

Excess TUR
Let us prove the relation (92). We separate the time dependence as $t = \tau s$, so that $s \in [0,1]$. It is natural to assume that the initial distribution $\tilde p(0;\tau)$ does not depend on $\tau$. We first consider a tilting of the transition rates with $\tau$ fixed,
$$\tilde k^\theta_e(s;\tau) = \tilde k_e(s;\tau)\, \exp\!\left[ \theta\, \tilde J^*_e(s;\tau) / \chi_e(s;\tau) \right], \tag{D15}$$
where $\theta$ is the parameter of the tilted transition rates. From the general formula (D6), the KL divergence between $P^0$ and $P^{d\theta}$ given by the tilting (D15) can be calculated because $\partial_\theta \ln \tilde k^\theta_e(s;\tau) = \tilde J^*_e(s;\tau)/\chi_e(s;\tau)$. From the log mean inequality, we also obtain an upper bound on this divergence. Thus, the fluctuation-response inequality provides the desired bound. Next, we identify the quantity $\partial_\theta \langle J_w \rangle^\theta_\tau \big|_{\theta=0}$. Let $\{\tilde p^\theta(s;\tau)\}_{s\in[0,1]}$ be the solution of the master equation generated by $\tilde k^\theta_e(s;\tau)$, where $\theta = 0$ reproduces the original dynamics, Eqs. (D13) and (D14). With the notation of Eq. (D26), we obtain the same equation as Eq. (D23). Because $\tau \partial_\tau \tilde p(s;\tau)$ and $r(s;\tau)$ satisfy the same first-order differential equation and share the initial condition $\tau \partial_\tau \tilde p(0;\tau) = r(0;\tau) = 0$, we conclude that $r(s;\tau) = \tau \partial_\tau \tilde p(s;\tau)$. Therefore, the optimal force is conservative, and it is sufficient to consider conservative forces when we calculate the Wasserstein distance.
Combining it with $f^*(t) = B^{\mathsf T} \psi^*(t)$, we find the governing equation for $\psi^*$. We next show that $\psi^*$ gives the distance via the duality formula. Because of the equality for the log mean shown in Ref. [43], where $u_1$ and $u_2$ denote the first and second arguments of $L$, we obtain a relation for the derivatives of $L$; combining it with Eq. (G4), we arrive at the Hamilton-Jacobi equation. The Wasserstein distance then follows, where we used the continuity equation $d_t p^* = B L(p^*) B^{\mathsf T} \psi^*$ in the second equality of the first line and integrated by parts in the second line. Recall that $\langle \cdot \rangle_i$ indicates the expectation value under $p^{(i)}$. Therefore, we have the formula
$$\frac{1}{2T}\, \mathcal{W}(p^{(0)}, p^{(1)})^2 = \langle \psi^*(T) \rangle_1 - \langle \psi^*(0) \rangle_0 .$$
Finally, we prove that $\psi^*$ attains the maximum under the condition in Eq. (106) (also shown in Eq. (G13) below). To show that $\psi^*$ satisfies the condition, let $g(p) := p^{\mathsf T} d_t \psi^*(t) + \frac{1}{2} \| B^{\mathsf T} \psi^*(t) \|^2_{L(p)}$. Then $g(p^*)$ vanishes because $\psi^*$ satisfies Eq. (G4). As we will show later, $\ell_e(p)$ is a concave function, so $g(p)$ is also concave, and $p^*$ gives the maximum value of $g(p)$ [68]. Therefore, we conclude $g(p) \le 0$, and $\psi^*$ satisfies the condition. Now let $\psi$ also satisfy the condition. Then, by a chain of inequalities in which the third line follows from condition (106) and the fourth line from a general property of inner products, we obtain the variational expression subject to $q^{\mathsf T} d_t \psi(t) + \frac{1}{2} \| B^{\mathsf T} \psi(t) \|^2_{L(q)} \le 0$ for any probability distribution $q$.
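The log mean is homogeneous of degree one, so Euler's relation $u_1\, \partial L/\partial u_1 + u_2\, \partial L/\partial u_2 = L$ holds; we assume this is the kind of identity from Ref. [43] invoked above, and it is easy to verify numerically:

```python
import math

def log_mean(u1, u2):
    """Logarithmic mean L(u1, u2) = (u1 - u2) / (ln u1 - ln u2)."""
    return u1 if math.isclose(u1, u2) else (u1 - u2) / (math.log(u1) - math.log(u2))

def partials(u1, u2, h=1e-6):
    """Central-difference estimates of dL/du1 and dL/du2."""
    dL1 = (log_mean(u1 + h, u2) - log_mean(u1 - h, u2)) / (2 * h)
    dL2 = (log_mean(u1, u2 + h) - log_mean(u1, u2 - h)) / (2 * h)
    return dL1, dL2

for u1, u2 in [(0.3, 1.7), (2.0, 5.0), (1.0, 4.0)]:
    dL1, dL2 = partials(u1, u2)
    # Euler's relation for a degree-1 homogeneous function
    assert abs(u1 * dL1 + u2 * dL2 - log_mean(u1, u2)) < 1e-6
```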
The concavity of $\ell_e(p)$ follows from the concavity of the log mean, because the map $p \mapsto (W_e p_{\iota(e)}, W_{-e} p_{\iota(-e)})$ is linear. Let us prove that the log mean is concave. The log mean has the integral form
$$L(x, y) = \int_0^1 dt\; x^t y^{1-t}. \tag{G14}$$
The Hessian $H$ then follows by differentiating under the integral, where we used the Cauchy-Schwarz inequality in the last line. The diagonal entries $H_{xx}$ and $H_{yy}$ are negative because $t(t-1) \le 0$ and $x, y$ are positive by definition. Therefore, the Hessian $H$ is negative semidefinite and the log mean is concave.
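Both the integral form (G14) and the concavity of the log mean can be checked numerically. A minimal sketch (midpoint-rule quadrature and a midpoint-concavity test; all numbers are arbitrary illustrative values):

```python
import numpy as np
from math import isclose, log

def log_mean(x, y):
    """Closed form L(x, y) = (x - y) / (ln x - ln y)."""
    return x if isclose(x, y) else (x - y) / (log(x) - log(y))

def log_mean_integral(x, y, n=20000):
    """Integral form (G14): L(x, y) = \\int_0^1 x^t y^(1-t) dt (midpoint rule)."""
    t = (np.arange(n) + 0.5) / n
    return float(np.mean(x**t * y**(1 - t)))

# The two forms agree
assert abs(log_mean_integral(0.4, 2.5) - log_mean(0.4, 2.5)) < 1e-6

# Concavity: L at a midpoint dominates the average of L at the endpoints
rng = np.random.default_rng(2)
for _ in range(200):
    x1, y1, x2, y2 = rng.uniform(0.1, 5.0, size=4)
    mid = log_mean((x1 + x2) / 2, (y1 + y2) / 2)
    avg = (log_mean(x1, y1) + log_mean(x2, y2)) / 2
    assert mid >= avg - 1e-12
```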
Appendix H: Derivation of Eq. (120)

In this appendix, we prove the inequality in Eq. (120). First, we characterize the $L^2$-Wasserstein distance by a functional slightly different from the one used in the previous sections. Then, using the derived expression, we establish relations between the $L^2$- and $L^1$-Wasserstein distances.