Geometrical aspects of entropy production in stochastic thermodynamics based on Wasserstein distance

We study a relationship between optimal transport theory and stochastic thermodynamics for the Fokker-Planck equation. We show that the lower bound on the entropy production is the action measured by the path length of the $L^2$-Wasserstein distance. Because the $L^2$-Wasserstein distance is a geometric measure of optimal transport theory, our result implies a geometric interpretation of the entropy production. Based on this interpretation, we obtain a thermodynamic trade-off relation between transition time and the entropy production. This thermodynamic trade-off relation is regarded as a thermodynamic speed limit which gives a tighter bound on the entropy production. We also discuss stochastic thermodynamics for the subsystem and derive a lower bound on the partial entropy production as a generalization of the second law of information thermodynamics. Our formalism also provides a geometric picture of the optimal protocol to minimize the entropy production. We illustrate these results using the optimal stochastic heat engine and show a geometrical bound on its efficiency.

In optimal transport theory [36,37], another geometry explains the optimal control and is related to thermodynamics. In optimal transport theory, a geometric measure called the $L^2$-Wasserstein distance quantifies a difference between two probability distributions. A relationship between the $L^2$-Wasserstein distance and thermodynamic relaxation has been discussed, especially for the Fokker-Planck equation. For example, R. Jordan, D. Kinderlehrer, and F. Otto showed that the time evolution of the Fokker-Planck equation minimizes the sum of the free energy and the $L^2$-Wasserstein distance [38]. A trend to thermodynamic equilibrium for the Fokker-Planck equation has also been discussed using the $L^2$-Wasserstein distance [39]. Remarkably, the terminology of the entropy production is also used in optimal transport theory [36], and a connection between the entropy production and the $L^2$-Wasserstein distance has been discussed [40-42]. Moreover, a relationship between optimal transport theory and information geometry has been mathematically discussed [43,44].
In parallel with information geometry, optimal transport theory can explain the optimal control of thermodynamic systems and thermodynamic trade-off relations. Optimal transport theory has been used in stochastic thermodynamics to find a heat minimization protocol [45-48]. In the context of a heat minimization protocol, E. Aurell et al. derived a lower bound on the entropy production [46] using the Benamou-Brenier formula [49] in optimal transport theory. Recently, A. Dechant and Y. Sakurai pointed out that this lower bound on the entropy production can be regarded as a thermodynamic speed limit [50]. Several thermodynamic trade-off relations for the efficiency of the stochastic heat engine have been derived based on optimal transport theory [51,52]. Despite the usefulness of these results, optimal transport theory has not received much attention in the field of stochastic thermodynamics [53,54]. This paper shows a novel relationship between optimal transport theory and stochastic thermodynamics for the Fokker-Planck equation. Based on a connection between the entropy production rate and the $L^2$-Wasserstein distance [41], we clarify geometrical aspects of the entropy production and derive several thermodynamic trade-off relations. By considering an infinitesimal time evolution step, we show that the entropy production is bounded by the time integral of the square of the velocity, namely the action in differential geometry, measured in the space of the $L^2$-Wasserstein distance. Furthermore, the entropy production becomes proportional to the action under the assumption that the force is given by a potential. This result provides a geometric interpretation of the entropy production for the Fokker-Planck equation. Using this geometrical expression of the entropy production, we obtain a lower bound on the entropy production as a generalization of the thermodynamic speed limit, which is tighter than the previous result [46,50].
Remarkably, the derivation of the new thermodynamic speed limit is the same as the original derivation of the thermodynamic speed limit based on information geometry [10]. Moreover, we discuss stochastic thermodynamics for the subsystem [55-60] and the stochastic heat engine [61] by using the $L^2$-Wasserstein distance. We obtain a tighter bound on the partial entropy production as a generalization of the second law of information thermodynamics [12,55-60]. We illustrate our results by using the example of the harmonic potential, where the entropy production is proportional to the action. Based on the geometrical interpretation of the entropy production, we obtain a geometrical constraint on the heat engine's efficiency and an analytical derivation of the optimal protocol [62] to minimize the entropy production. We also numerically illustrate the tightness of a generalized thermodynamic speed limit and the optimal heat engine based on the Wasserstein distance.
This paper is organized as follows. In Sec. II, we review previous results on stochastic thermodynamics and optimal transport theory. We formulate the setup of the Fokker-Planck equation in Sec. II A. We introduce the concept of the Wasserstein distance in Sec. II B. In Sec. III, we discuss our main results, which are a geometrical interpretation of the entropy production and new geometric lower bounds on the entropy production. We present a geometrical interpretation of the entropy production in Sec. III A and discuss new geometric lower bounds on the entropy production in Sec. III B. In Sec. IV, we discuss a generalization of the result in Sec. III for a subsystem. We introduce the setup of a subsystem and generalize the main result in Sec. IV A. We discuss an information-thermodynamic interpretation and derive a new lower bound on the partial entropy production in Sec. IV B. In Sec. V, we illustrate the main results with several examples. In Sec. V A, we discuss the stochastic heat engine and show a geometric lower bound on the efficiency. In Sec. V B, we analytically derive the optimal protocol to minimize the entropy production based on our geometric interpretation. In Sec. V C, we numerically illustrate geometric lower bounds on the entropy production and the estimation of the entropy production based on the lower bound. In Sec. V D, we numerically discuss the optimal protocol to minimize the entropy production for the stochastic heat engine. In Sec. VI, we conclude this paper with some remarks.

II. REVIEW ON STOCHASTIC THERMODYNAMICS AND OPTIMAL TRANSPORT THEORY
A. Stochastic thermodynamics for Fokker-Planck equation

In this paper, we consider the probability distribution p_t(x) of a particle at the d-dimensional Euclidean position x ∈ X(= R^d) at time t. The time evolution of p_t(x) is described by the following Fokker-Planck equation for a particle driven by a potential V_t(x) with mobility μ attached to a heat bath at temperature T,

    ∂_t p_t(x) = -∇·[ν_t(x) p_t(x)],  (1)
    ν_t(x) = μ[-∇V_t(x) - T∇ ln p_t(x)],  (2)

where ∇ is the del operator, and ν_t(x) is a quantity called the mean local velocity. We here set the Boltzmann constant to unity, k_B = 1. Because Eq. (1) is a continuity equation, the mean local velocity ν_t(x) is regarded as the velocity field. In stochastic thermodynamics [53], the internal energy U, the extracted work dW, the heat received from the heat bath dQ, and the entropy of the system S_sys at time t are defined as follows,

    U := ∫dx V_t(x) p_t(x),  (3)
    dW/dt := ∫dx [∂_t V_t(x)] p_t(x),  (4)
    dQ/dt := ∫dx V_t(x) ∂_t p_t(x),  (5)
    S_sys := -∫dx p_t(x) ln p_t(x).  (6)

By definition, the heat dQ satisfies the first law of thermodynamics dU/dt = dW/dt + dQ/dt. From these definitions (3)-(6), the entropy production rate at time t is calculated as

    σ_t := dS_sys/dt - (1/T) dQ/dt = (1/(μT)) ∫dx p_t(x) ||ν_t(x)||²,  (7)

where we used Eq. (1) and the normalization of the probability (d/dt)[∫dx p_t(x)] = 0, and we assumed that p_t(x) vanishes at infinity. The symbol ||ν_t||² := ν_t · ν_t indicates the square of the L² norm. Thus, the entropy production rate σ_t is given by the expected value of the square of the mean local velocity divided by the factor μT. The entropy production from time t = 0 to time t = τ is defined as the time integral of the entropy production rate,

    Σ := ∫_0^τ dt σ_t.  (8)

We can easily check the non-negativity of the entropy production, Σ ≥ 0, which implies the second law of thermodynamics.
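As a quick numerical illustration of Eq. (7), the entropy production rate can be evaluated by quadrature once p_t(x) and ν_t(x) are known. The following Python sketch uses an illustrative Gaussian distribution and an illustrative linear velocity field (all parameter values are assumptions for the example, not quantities taken from this paper):

```python
import numpy as np

# sigma_t = (1/(mu T)) * \int dx p_t(x) ||nu_t(x)||^2, evaluated on a grid.
mu, T = 1.0, 1.0
x = np.linspace(-10.0, 10.0, 20001)
dx = x[1] - x[0]
m, s = 0.0, 1.0
p = np.exp(-(x - m) ** 2 / (2 * s ** 2)) / np.sqrt(2 * np.pi * s ** 2)
nu = 0.3 + 0.2 * (x - m)     # assumed mean local velocity, for illustration only
sigma = np.sum(p * nu ** 2) * dx / (mu * T)
# analytically, E[nu^2] = 0.3^2 + 0.2^2 * Var = 0.13 for this choice
print(sigma)
```

For this choice the quadrature reproduces the analytic value E[ν²]/(μT) = 0.3² + 0.2²·Var[x] = 0.13, confirming that σ_t is simply a weighted mean-square velocity.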
B. Optimal transport theory and $L^2$-Wasserstein distance

Next, we discuss the geometric measure of optimal transport called the $L^2$-Wasserstein distance [37]. We consider the cost function c(x, y) of transporting a single particle at the point x ∈ X to the point y ∈ X. We first introduce the Monge-Kantorovich minimization problem [63], which quantifies a difference between two probability distributions p(x) and q(y). The optimal transport cost for c(x, y) between p(x) and q(y) is defined as

    C(p, q) := inf_{Π ∈ P(p,q)} ∫dx ∫dy c(x, y) Π(x, y),  (11)

where the lower bound is taken over the entire set P(p, q) of joint probability distributions Π(x, y) on X × X,

    P(p, q) := {Π | p(x) = ∫dy Π(x, y), q(y) = ∫dx Π(x, y), Π(x, y) ≥ 0},  (12)

where marginal distributions of Π(x, y) in the set P(p, q) are given by p(x) and q(y). Therefore, the optimal transport cost gives a minimum value of the expected value of the cost c(x, y) for the joint distribution Π(x, y). We call the value of Π that minimizes the expected value of the cost the optimal transport plan Π*, which is defined as

    Π* := argmin_{Π ∈ P(p,q)} ∫dx ∫dy c(x, y) Π(x, y).  (13)

In general, the Monge-Kantorovich minimization problem is hard to solve analytically. However, if we consider the L²-norm as the optimal transport cost on the Euclidean space, the Monge-Kantorovich minimization problem can be solved with few restrictions [37]. This optimal transport cost leads to the $L^2$-Wasserstein distance, which plays an important role in this paper. The $L^2$-Wasserstein distance W(p, q) is introduced as the square root of the optimal transport cost for the cost function given by the square of the L²-norm. Explicitly, the $L^2$-Wasserstein distance W(p, q) between p and q is defined as

    W(p, q) := sqrt[C(p, q)],  (14)

where C(p, q) is the optimal transport cost for the cost function c(x, y) = ||x − y||². The $L^2$-Wasserstein distance is well defined [37] if two probability distributions p and q satisfy

    ∫dx ||x||² p(x) < ∞,  ∫dy ||y||² q(y) < ∞,  (15)

which is the only assumption on the two distributions p and q needed to define the $L^2$-Wasserstein distance. We assume this condition Eq. (15) throughout the paper.
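For discrete distributions, the Monge-Kantorovich problem is a finite linear program over couplings Π, which makes the definition concrete. The following Python sketch is an illustrative assumption-laden example (the helper `w2_discrete` and its inputs are hypothetical, not part of the formalism above); it solves the linear program with SciPy and returns the discrete $L^2$-Wasserstein distance:

```python
import numpy as np
from scipy.optimize import linprog

def w2_discrete(x, p, y, q):
    """L2-Wasserstein distance between p on support x and q on support y,
    solved as the Monge-Kantorovich linear program over couplings Pi."""
    n, m = len(x), len(y)
    # cost c(x_i, y_j) = ||x_i - y_j||^2, flattened row-major
    C = (x[:, None] - y[None, :]) ** 2
    c = C.ravel()
    # marginal constraints: sum_j Pi_ij = p_i and sum_i Pi_ij = q_j
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0
    for j in range(m):
        A_eq[n + j, j::m] = 1.0
    b_eq = np.concatenate([p, q])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return np.sqrt(max(res.fun, 0.0))

# transporting a unit point mass from 0 to 1 gives W = 1.0
print(w2_discrete(np.array([0.0]), np.array([1.0]),
                  np.array([1.0]), np.array([1.0])))
```

This brute-force linear program scales poorly but is useful for checking the closed-form results quoted later for Gaussian distributions.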
Furthermore, it is known that there exists a map T_{p→q}(x) such that the optimal transport plan is given by

    Π*(x, y) = p(x) δ(y − T_{p→q}(x)),  (16)

where δ(·) is the delta function [37]. This map T_{p→q} is called the optimal transport map from p to q. Using the fact that the marginal distributions of Π*(x, y) are p(x) and q(y), we can obtain ∫dy f(y) q(y) = ∫dx ∫dy f(y) Π*(x, y) = ∫dx f(T_{p→q}(x)) p(x) for any differentiable and measurable function f. If we consider the change of variables y = T_{p→q}(x) and dy = dx |det(∇T_{p→q}(x))|, we obtain the Jacobian equation [37]

    p(x) = q(T_{p→q}(x)) |det(∇T_{p→q}(x))|,  (17)

where |det(∇T_{p→q}(x))| denotes the determinant of the Jacobian matrix ∇T_{p→q} at x. By using the optimal transport map, the $L^2$-Wasserstein distance is calculated as

    [W(p, q)]² = ∫dx p(x) ||x − T_{p→q}(x)||².  (18)

Thus, the $L^2$-Wasserstein distance can be regarded as the expected value of ||x − T_{p→q}(x)||² (see Fig. 1).

FIG. 1. We consider optimal transport from the probability distribution p(x) to the probability distribution q(y). The length of the green arrow shows the optimal transportation distance ||x − T_{p→q}(x)||, and the square of the $L^2$-Wasserstein distance is given by the expected value of the square of this optimal transportation distance.
We briefly introduce the Benamou-Brenier formula [49], which is related to the relation between the entropy production and the $L^2$-Wasserstein distance in this paper. If the dynamics of the probability q_t(x) at time t are driven by the continuity equation with the velocity field v_t(x),

    ∂_t q_t(x) = -∇·[v_t(x) q_t(x)],  (19)

the $L^2$-Wasserstein distance gives a lower bound on the time-integrated expected value of the square of the velocity field,

    [W(q_0, q_τ)]² ≤ τ ∫_0^τ dt ∫dx q_t(x) ||v_t(x)||²,  (20)

where we consider the time integral from time t = 0 to time t = τ. Because the velocity field of the Fokker-Planck equation is the mean local velocity, we obtain a relation between the entropy production rate and the $L^2$-Wasserstein distance, as discussed in the next section.

III. ENTROPY PRODUCTION AND $L^2$-WASSERSTEIN DISTANCE
A. Relation between Wasserstein distance and entropy production rate

In this section, we discuss a relation between the $L^2$-Wasserstein distance and the entropy production rate, which is a main result of this paper. We assume that the dynamics of the probability distribution p_t(x) are described by the Fokker-Planck equation (1). We define the path length on the probability simplex measured by the $L^2$-Wasserstein distance from time t = 0 to time t = τ as

    L_τ := lim_{∆t→0} Σ_{i=0}^{n−1} W(p_{i∆t}, p_{(i+1)∆t}),  (21)

where n is a positive integer satisfying n∆t ≤ τ ≤ (n+1)∆t.
The entropy production rate is bounded as

    σ_t ≥ (1/(μT)) (dL_t/dt)²,  (22)

which is a main result of this paper. This main result is consistent with the optimal transport theory for an infinitesimal time transition in Ref. [41]. This equation gives a relation between the $L^2$-Wasserstein distance and the entropy production rate for the Fokker-Planck equation. In terms of the $L^2$-Wasserstein distance, the quantity (dL_t/dt)² is given by

    (dL_t/dt)² = lim_{∆t→0} [W(p_t, p_{t+∆t})]²/(∆t)².  (23)

Thus, this inequality can be regarded as the Benamou-Brenier formula [49] for the short time τ = ∆t,

    [W(p_t, p_{t+∆t})]² ≤ ∆t ∫_t^{t+∆t} dt' ∫dx p_{t'}(x) ||ν_{t'}(x)||².  (24)

We next discuss the situation where the equality in Eq. (22) holds. We introduce a non-negative term σ^rot_t defined as

    σ^rot_t := σ_t − (1/(μT)) (dL_t/dt)² ≥ 0,  (25)

and discuss the situation σ^rot_t = 0. We consider the Taylor expansion of the optimal transport map T_{p_t→p_{t+∆t}}(x) up to the order ∆t,

    T_{p_t→p_{t+∆t}}(x) = x + a_1(x) ∆t + O(∆t²),  (26)

where a_1(x) is the first-order Taylor coefficient. From Eq. (18), we obtain an expression of (dL_t/dt)²,

    (dL_t/dt)² = ∫dx p_t(x) ||a_1(x)||².  (27)

Thus, if the mean local velocity gives an optimal transport map, that is, ν_t(x) = a_1(x), the equality holds and σ^rot_t = 0.
Next, we consider the difference between a_1(x) and ν_t(x). By substituting (p_t, p_{t+∆t}) into (p, q), the Jacobian equation in Eq. (17) is given by

    p_t(x) = p_{t+∆t}(T_{p_t→p_{t+∆t}}(x)) |det(∇T_{p_t→p_{t+∆t}}(x))|.  (28)

We calculate the Taylor expansion of the determinant up to the order ∆t as follows,

    |det(∇T_{p_t→p_{t+∆t}}(x))| = 1 + ∆t ∇·a_1(x) + O(∆t²).  (29)

From the Fokker-Planck equation (1), we also obtain

    p_{t+∆t}(x) = p_t(x) − ∆t ∇·[ν_t(x) p_t(x)] + O(∆t²),  (30)

which is the discretized version of the Fokker-Planck equation for the short time ∆t. By inserting Eqs. (26), (29) and (30) into Eq. (28) and considering the first-order terms of ∆t, we obtain

    ∇·[(ν_t(x) − a_1(x)) p_t(x)] = 0.  (31)

In the case of d = 3, this equation implies the existence of a vector potential A_t(x) because of Helmholtz's decomposition, such that

    (ν_t(x) − a_1(x)) p_t(x) = ∇ × A_t(x).  (32)

Thus, this vector potential A_t quantifies the difference between the optimal transport plan and the time evolution of the Fokker-Planck equation from time t to time t + ∆t. In the general case of d ≠ 3 or the case of a non-Euclidean space, we may consider the Helmholtz-Hodge decomposition to obtain an expression of A_t. To find the expression of σ^rot_t, we use the formula for the time derivative of the $L^2$-Wasserstein distance [37]. The following formula

    (d/ds) [W(p, p_{t+s})]² = 2 ∫dx p(x) [T_{t+s}(x) − x] · ν_{t+s}(T_{t+s}(x))  (33)

holds for any probability distribution p(x), where we used the notation T_t = T_{p→p_t}. The proof of this formula (33) is shown in Appendix A. By applying the Taylor expansion Eq. (26) to the formula Eq. (33) for (p, p_{t+s}) = (p_t, p_{t+∆t+s}), we obtain the following equation,

    (d/ds) [W(p_t, p_{t+∆t+s})]² |_{s=0} = 2∆t ∫dx p_t(x) a_1(x)·ν_t(x) + O(∆t²),  (34)

where we used Eq. (16). From the definition of the path length Eq. (21), we obtain

    W(p_t, p_{t+∆t+s}) = (dL_t/dt)(∆t + s) + O((∆t + s)²)  (35)

for small s. Therefore, we also obtain

    (d/ds) [W(p_t, p_{t+∆t+s})]² |_{s=0} = 2(dL_t/dt)² ∆t + O(∆t²).  (36)

By comparing Eq. (36) with Eq. (34), we obtain another expression of (dL_t/dt)²,

    (dL_t/dt)² = ∫dx p_t(x) a_1(x)·ν_t(x).  (37)

We also obtain an expression of σ^rot_t,

    σ^rot_t = (1/(μT)) ∫dx p_t(x) ||a_1(x) − ν_t(x)||²,  (38)

where we compared Eq. (27) with Eq. (37). Thus, σ^rot_t is non-negative, and zero if ||a_1(x) − ν_t(x)|| = 0. The value σ^rot_t quantifies the amount of the difference between the velocity field of optimal transport and the mean local velocity.
In the case of d = 3, σ^rot_t is calculated as

    σ^rot_t = (1/(μT)) ∫dx ||∇ × A_t(x)||²/p_t(x),  (39)

which quantifies the amount of the rotation because σ^rot_t is proportional to the mean value of the square of the rotation ∇ × A_t. We discuss when σ^rot_t vanishes. If the mean local velocity is given by the gradient of a potential, ν_t(x) = −∇Φ_t(x) with Φ_t(x) = μ[V_t(x) + T ln p_t(x)], as we assumed in Eq. (2), the quantity σ^rot_t is given by

    σ^rot_t = −(1/(μT)) ∮ dS · [(ν_t(x) − a_1(x)) p_t(x) Φ_t(x)],  (40)

where we used Eq. (31) and dS denotes the surface integral. If the quantity (ν_t(x) − a_1(x)) p_t(x) Φ_t(x) vanishes at infinity, the quantity σ^rot_t becomes zero. The assumption that the probability p_t vanishes at infinity is physically natural. Therefore, σ^rot_t vanishes in a physically natural situation, and we obtain

    σ_t = (1/(μT)) (dL_t/dt)².  (41)

The above condition ν_t(x) = −∇Φ_t is based on the assumption that the force −∇V_t(x) is given by the gradient of the potential V_t(x). In Sec. V B, we analytically show σ^rot_t = 0 for the 1D Brownian particle trapped in the harmonic potential. If the force is a non-potential force and the mean local velocity ν_t(x) is not given by a potential, the term σ^rot_t might not be zero. The non-potential force is needed to achieve a nonequilibrium steady state, and the steady flow and the steady force should be cyclic because of the Schnakenberg network theory [64]. The quantity σ^rot_t might play an important role in steady-state thermodynamics with the non-potential force [65].

B. Geometric lower bounds on entropy production and thermodynamic speed limits
We here discuss a lower bound on the entropy production Σ := ∫_0^τ dt σ_t based on Eq. (22). By using Eq. (22), the entropy production from time t = 0 to time t = τ is bounded as

    Σ ≥ (1/(μT)) ∫_0^τ dt (dL_t/dt)².  (43)

In differential geometry, the quantity

    C := (1/2) ∫_0^τ dt (dL_t/dt)²  (44)

is called the action, and Eq. (22) implies that the entropy production for the Fokker-Planck equation is bounded by the action measured by the path length of the Wasserstein $L^2$ distance,

    Σ ≥ 2C/(μT).  (45)

If σ^rot_t = 0, the entropy production is proportional to the action measured by the path length of the Wasserstein $L^2$ distance, Σ = 2C/(μT). Here, we consider the following Cauchy-Schwarz inequality, which gives a lower bound on the action,

    2τC = τ ∫_0^τ dt (dL_t/dt)² ≥ [∫_0^τ dt (dL_t/dt)]² = (L_τ)².  (46)

In information geometry, this inequality has been considered [9] as a trade-off relation between the time τ and the action C. By considering (dL_t/dt)² as the Fisher information of time, several variants of thermodynamic speed limits can be derived from this inequality for the Markov jump process [10], the Fokker-Planck equation [26] and the rate equation [23] in information-geometric stochastic thermodynamics. In the same way, we obtain a lower bound on the entropy production by considering the action measured by the $L^2$-Wasserstein distance (see also Fig. 2),

    Σ ≥ (L_τ)²/(μTτ),  (47)

which is also a main result of this paper. Because this inequality implies a trade-off relation between time and the entropy production, this result can also be regarded as a generalization of thermodynamic speed limits. Since we use the Cauchy-Schwarz inequality, the equality can be achieved when the probability distribution moves with a constant velocity in the $L^2$-Wasserstein distance space, that is, when it satisfies

    dL_t/dt = L_τ/τ  (48)

for any 0 ≤ t ≤ τ. Using the fact that the $L^2$-Wasserstein distance satisfies the triangle inequality for probabilities p, q and r [37],

    W(p, r) ≤ W(p, q) + W(q, r),  (49)

we obtain the following inequality,

    L_τ ≥ W(p_0, p_τ),  (50)

from the definition of L_τ. Using Eq. (47) and the above inequality, we can obtain the previously known inequality in Refs. [46,50],

    Σ ≥ [W(p_0, p_τ)]²/(μTτ),  (51)

which is equivalent to the Benamou-Brenier formula [49] because the entropy production rate is given by the expected value of the square of the velocity field ν_t(x). Considering the above derivation, the condition for the equality to hold is that the probability distribution changes at a constant speed along a straight line as measured by the $L^2$-Wasserstein distance,

    W(p_s, p_t) = [(t − s)/τ] W(p_0, p_τ)  (53)

for any 0 ≤ s ≤ t ≤ τ. In this case, the entropy production is minimized under the constraints of p_0 and p_τ. Moreover, when the initial distribution p_0, the final distribution p_τ, and the time interval τ are specified, the protocol that achieves this equality can be numerically obtained by the fluid-mechanical algorithm of Ref. [49]. In other words, by using this algorithm, we can construct an efficient heat engine for small systems with the minimum entropy production. Similarly, we obtain another lower bound by applying the Cauchy-Schwarz inequality Eq. (46) and the triangle inequality Eq. (49). Let us consider the time interval t_i = τ(i/N). Because the entropy production is given by

    Σ = Σ_{i=0}^{N−1} ∫_{t_i}^{t_{i+1}} dt σ_t,  (54)

another lower bound on the entropy production can be obtained in a similar way as follows,

    Σ ≥ Σ_{i=0}^{N−1} Σ̂(t_i; t_{i+1}),  (55)

where Σ̂(t; s) is the lower bound on the entropy production given by the Benamou-Brenier formula from time t to time s,

    Σ̂(t; s) := [W(p_t, p_s)]²/[μT(s − t)].  (56)

Moreover, in the case of σ^rot_t = 0, we obtain

    Σ = lim_{N→∞} Σ_{i=0}^{N−1} Σ̂(t_i; t_{i+1}),  (57)

because the change from p_{t_i} to p_{t_{i+1}} occurs at a constant rate along a straight line as measured by the $L^2$-Wasserstein distance in the limit t_{i+1} − t_i = τ/N → 0. Remarkably, a calculation of Σ̂(t_i; t_{i+1}) does not require information of the joint probability distribution at times t_i and t_{i+1}, while the experimental estimation of the entropy production based on the fluctuation theorem needs information of the joint probability distribution [66]. It is relatively difficult to estimate the joint probability in an experiment with a small number of samples, compared to two marginal probabilities. This fact might be useful for estimating the entropy production in an experiment by using Eq. (57).
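The speed limit Σ ≥ [W(p_0, p_τ)]²/(μTτ) can be checked numerically for a case where everything is analytic. The following Python sketch (all parameter values and the initial Gaussian are illustrative assumptions) integrates the exact moment equations of a Brownian particle relaxing in a static harmonic trap V(x) = k(x − a)²/2, for which σ_t and the Gaussian Wasserstein distance are both known in closed form:

```python
import numpy as np

# For a Gaussian solution of the Fokker-Planck equation in a harmonic trap:
#   dE/dt = -mu k (E - a),   dVar/dt = -2 mu k Var + 2 mu T,
#   sigma_t = [ (dE/dt)^2 + (dVar/dt)^2 / (4 Var) ] / (mu T),
# and the speed limit reads Sigma >= W(p_0, p_tau)^2 / (mu T tau) with
#   W^2 = (E_0 - E_tau)^2 + (sqrt(Var_0) - sqrt(Var_tau))^2  (1D Gaussians).
mu, T, k, a = 1.0, 1.0, 1.0, 1.0   # assumed parameters for illustration
tau, steps = 2.0, 20000
dt = tau / steps
E, Var = 0.0, 0.5                  # assumed initial Gaussian p_0
E0, s0 = E, np.sqrt(Var)
Sigma = 0.0
for _ in range(steps):
    dE = -mu * k * (E - a)
    dV = -2 * mu * k * Var + 2 * mu * T
    Sigma += (dE ** 2 + dV ** 2 / (4 * Var)) / (mu * T) * dt
    E += dE * dt
    Var += dV * dt
W2 = (E0 - E) ** 2 + (s0 - np.sqrt(Var)) ** 2
bound = W2 / (mu * T * tau)
print(Sigma, bound)    # Sigma exceeds the Wasserstein speed-limit bound
```

For this relaxation protocol the inequality is strict, since the free relaxation does not move the distribution along a constant-speed Wasserstein geodesic.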
This estimation of the entropy production by using Eq. (57) is similar to the estimation of the entropy production based on thermodynamic trade-off relations such as thermodynamic uncertainty relations [67-70].
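The windowed estimator built from the per-interval Benamou-Brenier bounds can be sketched for the same kind of harmonic relaxation, where the marginal Gaussians at the grid times are known exactly (all parameter values below are illustrative assumptions):

```python
import numpy as np

# Sum the per-window bounds W(p_{t_i}, p_{t_{i+1}})^2 / (mu T (t_{i+1} - t_i));
# only the marginal distributions at the grid times are needed.
mu, T, k, a = 1.0, 1.0, 1.0, 1.0         # assumed parameters
tau, N = 2.0, 50
t = np.linspace(0.0, tau, N + 1)
E = a + (0.0 - a) * np.exp(-mu * k * t)              # exact mean trajectory
Var = T / k + (0.5 - T / k) * np.exp(-2 * mu * k * t)  # exact variance trajectory
s = np.sqrt(Var)
# 1D Gaussian W2 between consecutive snapshots
w2 = np.diff(E) ** 2 + np.diff(s) ** 2
estimate = np.sum(w2 / (mu * T * np.diff(t)))
single = ((E[0] - E[-1]) ** 2 + (s[0] - s[-1]) ** 2) / (mu * T * tau)
print(estimate, single)   # refined bound is tighter than the one-shot bound
```

By the Cauchy-Schwarz and triangle inequalities the windowed sum always lies between the one-shot bound and the true entropy production, which is the content of the refinement above.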

A. Stochastic thermodynamics for subsystem
Stochastic thermodynamics for a subsystem has been discussed in terms of information thermodynamics, which explains the paradox of Maxwell's demon [55]. In information thermodynamics, we consider a relation between the partial entropy production and information flow for the 2D Fokker-Planck equation (58) or the 2D Langevin equations [59,60,71]. In this section, we discuss a relationship between the partial entropy production and the $L^2$-Wasserstein distance for a subsystem.
We start with two-dimensional systems X and Y. Stochastic dynamics of two positions x ∈ X(= R) and y ∈ Y(= R) are driven by the following Fokker-Planck equation,

    ∂_t p_t(x, y) = −∂_x[ν^X_t(x, y) p_t(x, y)] − ∂_y[ν^Y_t(x, y) p_t(x, y)].  (58)

We first consider the situation where the position y is a hidden degree of freedom and we can only observe the position x. Thus, we can only measure the marginal distribution of X defined as

    p^X_t(x) := ∫dy p_t(x, y).  (59)

The time evolution of the marginal distribution is given by

    ∂_t p^X_t(x) = −∂_x[ν̄^X_t(x) p^X_t(x)],  (60)
    ν̄^X_t(x) := ∫dy p_t(y|x) ν^X_t(x, y),  (61)

where ν̄^X_t(x) is the marginal mean local velocity of X and p_t(y|x) := p_t(x, y)/p^X_t(x) is the conditional probability of Y under the condition of X. If we want to measure the entropy production rate for this system, we only obtain the apparent entropy production rate of X,

    σ̄^X_t := (1/(μT)) ∫dx p^X_t(x) [ν̄^X_t(x)]²,  (62)

which is different from the partial entropy production rate of X,

    σ^X_t := (1/(μT)) ∫dx ∫dy p_t(x, y) [ν^X_t(x, y)]².  (63)

From the Cauchy-Schwarz inequality, we obtain the inequality

    σ^X_t ≥ σ̄^X_t.  (64)

Thus, the apparent entropy production rate σ̄^X_t is never larger than the partial entropy production rate σ^X_t. The apparent entropy production rate is equivalent to the partial entropy production rate when ν^X_t(x, y) = ν̄^X_t(x). This condition is implied when the potential force −∂V_t(x, y)/∂x does not depend on y and the systems X and Y are statistically independent, p_t(x, y) = p^X_t(x) p^Y_t(y) with p^Y_t(y) := ∫dx p_t(x, y). If we define the path length of X from time t = 0 to time t = τ as

    L^X_τ := lim_{∆t→0} Σ_{i=0}^{n−1} W(p^X_{i∆t}, p^X_{(i+1)∆t}),  (65)

where n is a positive integer satisfying n∆t ≤ τ ≤ (n+1)∆t, our result for the path length of X gives a lower bound on the apparent entropy production rate of X,

    σ̄^X_t ≥ (1/(μT)) (dL^X_t/dt)².  (66)

We also obtain lower bounds on the apparent entropy production of X as follows,

    Σ̄^X := ∫_0^τ dt σ̄^X_t,  (67)
    Σ̄^X ≥ (L^X_τ)²/(μTτ),  (68)
    Σ̄^X ≥ [W(p^X_0, p^X_τ)]²/(μTτ).  (69)

Compound inequalities of Eqs. (64), (68) and (69) can be regarded as an information-thermodynamic speed limit for the subsystem. We remark that ν̄^X_t(x) is the conditional average of ν^X_t(x, y) over the hidden variable y. When ν̄^X_t(x) gives the optimal transport map for the marginal dynamics, we may obtain σ̄^X_t = (dL^X_t/dt)²/(μT) for the same reason as in the case σ^rot_t = 0 for the total system.
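The Cauchy-Schwarz (Jensen) step behind σ̄^X_t ≤ σ^X_t is easy to verify numerically for a discrete joint distribution. The following Python sketch uses a randomly generated joint distribution and velocity field as illustrative assumptions:

```python
import numpy as np

# Apparent vs partial entropy production (up to the common 1/(mu T) factor):
# the conditional mean nubar_X(x) = E[nu_X | x] satisfies
#   E[ nubar_X(x)^2 ] <= E[ nu_X(x, y)^2 ]   (Jensen / Cauchy-Schwarz).
rng = np.random.default_rng(0)
p = rng.random((4, 5)); p /= p.sum()     # assumed joint distribution p_t(x, y)
nu = rng.standard_normal((4, 5))         # assumed velocity field nu_X(x, y)
pX = p.sum(axis=1)
nubar = (p * nu).sum(axis=1) / pX        # conditional mean over hidden y
apparent = (pX * nubar ** 2).sum()
partial = (p * nu ** 2).sum()
print(apparent <= partial)               # always True
```

Equality holds only when nu_X does not vary with the hidden coordinate y, mirroring the condition ν^X_t(x, y) = ν̄^X_t(x) in the text.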
We also remark that ν̄^X_t(x) is given by ν̄^X_t(x) = −∂_x Φ̄^X_t with the potential Φ̄^X_t := ∫dy p^Y_t(y) Φ_t(x, y) if the systems X and Y are statistically independent, p_t(x, y) = p^X_t(x) p^Y_t(y). Now, we discuss a relationship between the subsystem X and the subsystem Y. We introduce the marginal mean local velocity of Y, the apparent entropy production rate of Y and the partial entropy production rate of Y as follows,

    ν̄^Y_t(y) := ∫dx p_t(x|y) ν^Y_t(x, y),  (70)
    σ̄^Y_t := (1/(μT)) ∫dy p^Y_t(y) [ν̄^Y_t(y)]²,  (71)
    σ^Y_t := (1/(μT)) ∫dx ∫dy p_t(x, y) [ν^Y_t(x, y)]².  (72)

With the path length of Y,

    L^Y_τ := lim_{∆t→0} Σ_{i=0}^{n−1} W(p^Y_{i∆t}, p^Y_{(i+1)∆t}),  (73)

where n is a positive integer satisfying n∆t ≤ τ ≤ (n+1)∆t, we also obtain lower bounds on the apparent entropy production of Y as follows,

    Σ̄^Y := ∫_0^τ dt σ̄^Y_t,  (74)
    Σ̄^Y ≥ (L^Y_τ)²/(μTτ),  (75)
    Σ̄^Y ≥ [W(p^Y_0, p^Y_τ)]²/(μTτ).  (76)

By definition, the entropy production rate is given by the sum of the partial entropy production rates,

    σ_t = σ^X_t + σ^Y_t.  (77)

Because

    σ^X_t ≥ σ̄^X_t,  σ^Y_t ≥ σ̄^Y_t  (78)

are satisfied, we obtain

    Σ ≥ Σ̄^X + Σ̄^Y.  (79)

Thus, we obtain lower bounds on the entropy production from Eqs. (68), (69), (75), (76) and (79),

    Σ ≥ [(L^X_τ)² + (L^Y_τ)²]/(μTτ),  (80)
    Σ ≥ {[W(p^X_0, p^X_τ)]² + [W(p^Y_0, p^Y_τ)]²}/(μTτ).  (81)

From the non-negativity of (L^Y_τ)² and W(p^Y_0, p^Y_τ), we also obtain

    Σ ≥ (L^X_τ)²/(μTτ),  (82)
    Σ ≥ [W(p^X_0, p^X_τ)]²/(μTτ),  (83)

as looser bounds. This result implies that the entropy production of the total system is generally bounded by the geometry of the $L^2$-Wasserstein distance for the two subsystems X and Y.

B. Information thermodynamics
We next discuss an interpretation of the above results based on information thermodynamics. In information thermodynamics, we consider the following decomposition of the partial entropy production rate σ^X_t into an informational term −İ^X and thermodynamic terms σ^X_bath;t + σ^X_sys;t. The partial entropy production rates of X and Y for Eq. (58) are calculated as

    σ^X_t = σ^X_sys;t + σ^X_bath;t − İ^X,  (84)
    σ^Y_t = σ^Y_sys;t + σ^Y_bath;t − İ^Y,  (85)

where σ^X_sys;t (σ^Y_sys;t) is the entropy change of the system X (Y), σ^X_bath;t (σ^Y_bath;t) is the entropy change of the heat bath attached to the system X (Y), and İ^X (İ^Y) is the information flow from X to Y (from Y to X). We remark that this information flow is related to another measure of information flow called the transfer entropy [56,72].
We explain the decomposition of the partial entropy production rates in Eqs. (84) and (85). The entropy changes of the systems X and Y are given by the differential entropy changes, σ^X_sys;t = −(d/dt) ∫dx p^X_t(x) ln p^X_t(x) and σ^Y_sys;t = −(d/dt) ∫dy p^Y_t(y) ln p^Y_t(y), where we used integration by parts and the normalization of the probability, (d/dt) ∫dx p^X_t(x) = 0. The sum of the entropy changes of the heat baths, σ^X_bath;t + σ^Y_bath;t, gives the total entropy change of the heat baths, where we again used integration by parts. The sum of the information flows gives the change of the mutual information I between X and Y,

    İ^X + İ^Y = (d/dt) I_t,  I_t := ∫dx ∫dy p_t(x, y) ln [p_t(x, y)/(p^X_t(x) p^Y_t(y))],

where we used integration by parts, the marginalizations ∫dy p_t(x, y) = p^X_t(x) and ∫dx p_t(x, y) = p^Y_t(y), and the normalization of the probability. Thus, the sum of the partial entropy production rates gives the total entropy production rate.
The non-negativity of the partial entropy production rates gives the second laws of information thermodynamics for the subsystems [56-60,71],

    σ^X_sys;t + σ^X_bath;t ≥ İ^X,  (100)
    σ^Y_sys;t + σ^Y_bath;t ≥ İ^Y,  (101)

which imply that the entropy changes of the system and the heat bath are bounded by the information flow in the presence of the other subsystem. The sum of the two inequalities gives the second law of thermodynamics for the total system, σ_t ≥ 0. These inequalities, Eqs. (100) and (101), explain a conversion between information and thermodynamic quantities in the context of Maxwell's demon. In the study of the autonomous demon, the system Y can be considered as Maxwell's demon, and the system X is regarded as the system, where the entropy changes σ^X_bath;t + σ^X_sys;t can be negative. The second laws of information thermodynamics explain the reason why the entropy changes can be negative: because of the information flow İ^X from the demon to the system, the entropy changes can be negative.
Based on the results in Eqs. (64) and (66), we obtain tighter inequalities compared to the second laws of information thermodynamics as follows,

    σ^X_sys;t + σ^X_bath;t ≥ İ^X + (1/(μT)) (dL^X_t/dt)²,  (104)
    σ^Y_sys;t + σ^Y_bath;t ≥ İ^Y + (1/(μT)) (dL^Y_t/dt)².  (105)

Thus, the entropy changes of the system and heat bath are tightly bounded by both the information flow and the $L^2$-Wasserstein distance. We now consider the situation σ^rot_t = 0. Because the sum of the partial entropy production rates gives the total entropy production rate, the sum of the two tighter inequalities gives the non-negativity of a measure I_W,

    I_W := (1/(μT)) [(dL_t/dt)² − (dL^X_t/dt)² − (dL^Y_t/dt)²] ≥ 0.

The equality holds when

    ν^X_t(x, y) = ν̄^X_t(x),  ν^Y_t(x, y) = ν̄^Y_t(y),  (108)

because (dL_t/dt)² = μT(σ^X_t + σ^Y_t) holds in the case of σ^rot_t = 0, and we can obtain σ^X_t = σ̄^X_t and σ^Y_t = σ̄^Y_t under the condition Eq. (108). The measure I_W quantifies both the statistical independence and the independence of the potential, while the mutual information I only quantifies the statistical independence. Thus, I_W could be an interesting measure of the independence between two systems when the stochastic dynamics of the two systems are driven by the Fokker-Planck equation. Its non-negativity is decomposed into the tighter inequalities of information thermodynamics, Eqs. (104) and (105).

A. Stochastic heat engine and geometrical bounds on efficiency
Let us consider a stochastic heat engine [61] driven by the potential V t . The cycle of a stochastic engine consists of the following four steps (see also Fig. 3).
1. Let us consider an isothermal process of varying the potential V_t(x) during time 0 ≤ t < t_h at temperature T_h. During this step, the probability distribution changes from p_a to p_b, and the entropy change of the system is given by ∆S := ∫dx p_a(x) ln p_a(x) − ∫dx p_b(x) ln p_b(x). In this step, the work −W_h := ∫_0^{t_h} dt (dW/dt) > 0 is extracted for the external system.

2. The temperature is changed from T_h to T_c (< T_h) instantaneously at time t = t_h. During this time, the distribution p_b does not change. Therefore, the entropy of the system also does not change, and this step can be interpreted as an adiabatic process.

FIG. 3. An example of a stochastic heat engine. Because the initial state at time t = 0 is the same as the final state at time t = t_h + t_c, the four steps give the cycle of a stochastic heat engine. The work −W_h is extracted during time 0 ≤ t < t_h, and the work W_c is done during time t_h ≤ t < t_h + t_c. The total amount of the work through one cycle, −W = −W_h + W_c > 0, is extracted.

3. Let us consider an isothermal process that returns the probability distribution to the initial one by varying the potential V_t(x) during time t_h ≤ t < t_h + t_c at temperature T_c. During this step, the probability distribution changes from p_b to p_a, and the entropy change of the system is −∆S. In this step, the system is assumed to be given the work W_c := ∫_{t_h}^{t_h+t_c} dt (dW/dt) > 0 by the external system.
4. The temperature is changed from T c to T h instantaneously at time t = t h + t c . During this time, the distribution does not change. Therefore, the entropy of the system also did not change, and this step can be interpreted as an adiabatic process.
If we consider the harmonic potential and the initial distribution p_a is Gaussian, thermodynamic quantities such as the entropy change and the work can be calculated, and an optimal protocol to minimize the entropy production can be obtained analytically [61]. As shown in Sec. V B, σ^rot_t = 0 in this case, and the entropy production is proportional to the action.
Here we consider a general case where the potential is not necessarily harmonic and the probability distribution at time t is not necessarily Gaussian. When the times t_h and t_c are long enough and the potential V_t(x) is a harmonic potential, the efficiency of the heat engine asymptotically approaches the Carnot efficiency, and the heat engine can be considered as a stochastic extension of the Carnot cycle. The extracted work of the heat engine through one cycle is

    −W = (T_h − T_c)∆S − T_h Σ_h − T_c Σ_c,

where Σ_h := ∫_0^{t_h} dt σ_t is the entropy production in the isothermal step 1 at temperature T_h and Σ_c := ∫_{t_h}^{t_h+t_c} dt σ_t is the entropy production in the isothermal step 3 at temperature T_c. If we assume that the extracted work is positive, −W > 0, the condition ∆S ≥ 0 is needed because of the second law of thermodynamics, Σ_h ≥ 0 and Σ_c ≥ 0.
By using Eq. (51), we can obtain the following inequality for the extracted work −W,

    −W ≤ (T_h − T_c)∆S − [W(p_a, p_b)]²/(μ t_r),  (112)

where t_r := t_h t_c/(t_h + t_c) is called the reduced time. When we impose positive extracted work in the whole cycle, i.e., −W > 0, we obtain the following inequality for the reduced time t_r from Eq. (112),

    t_r > [W(p_a, p_b)]²/[μ(T_h − T_c)∆S].

This inequality implies that the reduced time of the engine is generally bounded by the ratio of the square of the $L^2$-Wasserstein distance [W(p_a, p_b)]² to the entropy change ∆S, which are given by the initial distribution p_a and the final distribution p_b. The efficiency of the heat engine η is defined as

    η := −W/Q_h,

where Q_h := T_h ∆S − T_h Σ_h is the heat absorbed from the hot bath. Because the second law of thermodynamics Σ_h + Σ_c ≥ 0 holds, we obtain the fact that the efficiency is generally bounded by the Carnot efficiency η_C [73],

    η ≤ η_C := 1 − T_c/T_h.

When we consider the situation where the entropy production is minimized as follows,

    Σ_h = [W(p_a, p_b)]²/(μ T_h t_h),  Σ_c = [W(p_a, p_b)]²/(μ T_c t_c),

the efficiency η is given by

    η = {(T_h − T_c)∆S − [W(p_a, p_b)]²/(μ t_r)} / {T_h ∆S − [W(p_a, p_b)]²/(μ t_h)},

and reaches the Carnot efficiency η_C in the limits t_h → ∞ and t_c → ∞. This fact is also discussed in Ref. [51]. In the limits t_h → ∞ and t_c → ∞, the square of the $L^2$-Wasserstein distance plays the same role as the irreversible "action" A_irr in Ref. [61]. When σ^rot_t = 0, we obtain a geometric interpretation of the efficiency from Eq. (25),

    η = 1 − (T_c ∆S + 2C_c/μ)/(T_h ∆S − 2C_h/μ),

where C_h := (1/2)∫_0^{t_h} dt (dL_t/dt)² and C_c := (1/2)∫_{t_h}^{t_h+t_c} dt (dL_t/dt)² are the actions in the two isothermal steps. In this case, we obtain a lower bound on the efficiency,

    η ≥ 1 − (T_c ∆S + 2C/μ)/(T_h ∆S − 2C/μ),  (122)

where C = (1/2)∫_0^{t_h+t_c} dt (dL_t/dt)² is the action measured by the $L^2$-Wasserstein distance. The efficiency η can reach the Carnot efficiency η_C when the ratio between the action and the Shannon entropy change, C/∆S, converges to zero. In general σ^rot_t ≠ 0, and this lower bound Eq. (122) can be violated, especially when a non-potential force exists. Thus, the quantity σ^rot_t might play an important role in a stochastic heat engine with a non-potential force.
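The approach of the efficiency to the Carnot value in the long-time limit can be sketched numerically. The following Python snippet is a minimal illustration under assumed forms: it takes the minimal-dissipation entropy productions Σ_h = W²/(μT_h t_h) and Σ_c = W²/(μT_c t_c), work −W = (T_h − T_c)∆S − T_hΣ_h − T_cΣ_c, and hot-bath heat Q_h = T_h∆S − T_hΣ_h, with all parameter values chosen arbitrarily for the example:

```python
# Efficiency of the Wasserstein-optimal stochastic heat engine (assumed forms);
# eta approaches the Carnot efficiency eta_C as t_h, t_c -> infinity.
mu, Th, Tc, dS, W2 = 1.0, 2.0, 1.0, 1.0, 0.1   # illustrative parameter values
eta_C = 1.0 - Tc / Th

def eta(th, tc):
    Sh = W2 / (mu * Th * th)      # minimal entropy production, hot isotherm
    Sc = W2 / (mu * Tc * tc)      # minimal entropy production, cold isotherm
    work = (Th - Tc) * dS - Th * Sh - Tc * Sc   # extracted work -W
    Qh = Th * dS - Th * Sh                      # heat from the hot bath
    return work / Qh

print(eta(1.0, 1.0), eta(100.0, 100.0), eta_C)  # eta grows toward eta_C
```

With these assumed forms the finite-time efficiency is always below η_C, and the gap closes as the cycle is slowed down, consistent with the trade-off between speed and dissipation discussed above.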

B. Analytical calculation of geometric optimal protocol
We here discuss the dynamics of a Brownian particle in a harmonic potential as an example of stochastic thermodynamics based on the $L^2$-Wasserstein distance. In this case, we can show $\sigma^{\rm rot}_t = 0$ and obtain the protocol minimizing the entropy production analytically. In terms of the Langevin equation, the time evolution of the position $x(t)$ at time $t$ is given by
$$\dot{x}(t) = -\mu \partial_x V_t(x(t)) + \sqrt{2 \mu T}\, \xi(t),$$
with the harmonic potential
$$V_t(x) = \frac{k_t}{2} (x - a_t)^2,$$
where $\xi(t)$ is Gaussian white noise with mean $\langle \xi(t) \rangle = 0$ and variance $\langle \xi(t) \xi(t') \rangle = \delta(t - t')$. This Langevin equation corresponds to the following Fokker-Planck equation [74]:
$$\partial_t p_t(x) = \mu \partial_x \left[ \partial_x V_t(x)\, p_t(x) \right] + \mu T \partial_x^2 p_t(x).$$
We now assume that the probability distribution at the initial time is Gaussian. For the harmonic potential, the probability distribution then remains Gaussian at all later times. The probability distribution $p_t(x)$ is written as the Gaussian distribution with mean $\mathbb{E}[x]_t$ and variance $\mathrm{Var}[x]_t$ at time $t$,
$$p_t(x) = \frac{1}{\sqrt{2 \pi \mathrm{Var}[x]_t}} \exp \left( - \frac{(x - \mathbb{E}[x]_t)^2}{2 \mathrm{Var}[x]_t} \right).$$
Therefore, the mean local velocity $\nu_t(x)$ is analytically calculated as
$$\nu_t(x) = \frac{d \mathbb{E}[x]_t}{dt} + \frac{1}{2 \mathrm{Var}[x]_t} \frac{d \mathrm{Var}[x]_t}{dt} \left( x - \mathbb{E}[x]_t \right),$$
and the entropy production rate is also calculated as
$$\sigma_t = \frac{1}{\mu T} \left[ \left( \frac{d \mathbb{E}[x]_t}{dt} \right)^2 + \frac{1}{4 \mathrm{Var}[x]_t} \left( \frac{d \mathrm{Var}[x]_t}{dt} \right)^2 \right].$$
The Wasserstein distance can be calculated in closed form for Gaussian distributions [75,76]. For two Gaussian distributions $p$ and $q$ with means $m_p$, $m_q$ and variances $V_p$, $V_q$, the $L^2$-Wasserstein distance can be written as
$$\mathcal{W}(p, q) = \sqrt{(m_p - m_q)^2 + \left( \sqrt{V_p} - \sqrt{V_q} \right)^2}.$$
This $L^2$-Wasserstein distance is also known as the Fréchet distance [77]. Using this analytical expression of the $L^2$-Wasserstein distance for two Gaussian distributions, we can confirm $\sigma^{\rm rot}_t = 0$ in this case, where we used Eqs. (130) and (131).
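The closed-form expressions above are easy to evaluate numerically. The following is a minimal sketch (the function names are ours, not from the paper) of the Fréchet formula for the $L^2$-Wasserstein distance between two one-dimensional Gaussians, together with the Gaussian entropy production rate:

```python
import numpy as np

def wasserstein_gauss1d(mean1, var1, mean2, var2):
    """L2-Wasserstein (Frechet) distance between two 1D Gaussians:
    W^2 = (m1 - m2)^2 + (sqrt(v1) - sqrt(v2))^2."""
    return np.sqrt((mean1 - mean2) ** 2
                   + (np.sqrt(var1) - np.sqrt(var2)) ** 2)

def entropy_production_rate(dmean_dt, var, dvar_dt, mu, T):
    """Entropy production rate of a Gaussian solution:
    sigma_t = [(dE/dt)^2 + (dVar/dt)^2 / (4 Var)] / (mu T)."""
    return (dmean_dt ** 2 + dvar_dt ** 2 / (4.0 * var)) / (mu * T)

# Example: N(0, 1) vs N(2, 4) -> W = sqrt(2^2 + (1 - 2)^2) = sqrt(5)
print(wasserstein_gauss1d(0.0, 1.0, 2.0, 4.0))
```

Note that the distance depends on the standard deviations, not the variances, which is why shrinking and widening a distribution are treated symmetrically by the metric.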
We can also confirm that the entropy production $\Sigma$ is minimized if Eq. (53) holds. The minimum value of the entropy production $\Sigma$ for fixed $p_0$ and $p_\tau$ is calculated as
$$\Sigma_{\min} = \frac{\mathcal{W}(p_0, p_\tau)^2}{\mu T \tau},$$
where we used the Cauchy-Schwarz inequality. The minimum value can be achieved if $d\theta_t/dt$ is constant. Under this condition, the probability distribution changes at the constant rate $\mathcal{W}(p_0, p_\tau)/\tau$ along a straight line measured by the $L^2$-Wasserstein distance, Eq. (53). By comparing Eqs. (144) and (145) with Eqs. (130) and (131), the optimal protocol that minimizes the entropy production is given by
$$\mathbb{E}[x]_t = \mathbb{E}[x]_0 + \frac{t}{\tau} \left( \mathbb{E}[x]_\tau - \mathbb{E}[x]_0 \right), \qquad \sqrt{\mathrm{Var}[x]_t} = \sqrt{\mathrm{Var}[x]_0} + \frac{t}{\tau} \left( \sqrt{\mathrm{Var}[x]_\tau} - \sqrt{\mathrm{Var}[x]_0} \right).$$
In terms of the parameters of the harmonic potential $V_t(x)$, we obtain the parameters $k_t$ and $a_t$ which realize this optimal protocol in practice. If we assume that $k_t$ is always non-negative, a corresponding inequality must hold for this optimal protocol. This result implies that when the variance decreases, i.e., $\mathrm{Var}[x]_\tau < \mathrm{Var}[x]_0$, we can use this optimal protocol for all $\tau > 0$, but when the variance increases, i.e., $\mathrm{Var}[x]_\tau \geq \mathrm{Var}[x]_0$, there is a limit on the time $\tau$ for which this optimal protocol is achievable.
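The constant-speed property of the optimal Gaussian protocol can be checked directly: interpolating the mean and the standard deviation linearly in time gives a path whose discretized Wasserstein length equals the end-to-end Wasserstein distance. A minimal sketch (endpoint values and variable names are ours, for illustration only):

```python
import numpy as np

def w2_gauss(mA, vA, mB, vB):
    """L2-Wasserstein distance between two 1D Gaussians."""
    return np.sqrt((mA - mB) ** 2 + (np.sqrt(vA) - np.sqrt(vB)) ** 2)

# Endpoints p_0 and p_tau; the optimal protocol interpolates the mean and
# the standard deviation linearly in time (a constant-speed geodesic).
m0, v0, mT, vT = 0.0, 1.0, 3.0, 4.0
tau, n = 2.0, 1000
t = np.linspace(0.0, tau, n + 1)
mean = m0 + (mT - m0) * t / tau
std = np.sqrt(v0) + (np.sqrt(vT) - np.sqrt(v0)) * t / tau

# The discretized path length equals the end-to-end Wasserstein distance,
# so the path is a geodesic traversed at constant speed W(p_0, p_tau)/tau.
L = sum(w2_gauss(mean[i], std[i] ** 2, mean[i + 1], std[i + 1] ** 2)
        for i in range(n))

# Corresponding minimum entropy production W(p_0, p_tau)^2 / (mu T tau).
mu, T = 0.01, 1.0
sigma_min = w2_gauss(m0, v0, mT, vT) ** 2 / (mu * T * tau)
print(L, sigma_min)
```

For any curved path between the same endpoints, the discretized length would exceed the end-to-end distance, and the entropy production would exceed the minimum above.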

C. Numerical illustration of thermodynamic speed limits
We numerically test lower bounds on the entropy production. We consider Brownian motion in the harmonic potential, Eqs. (123) and (124). The parameters of the Brownian motion are given by $\mu = 0.01$ and $T = 1$. The parameters of the harmonic potential are changed periodically as follows: $a_t = 10 \sin(t)$.
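The time evolution can be reproduced by integrating the exact moment equations of the harmonic Fokker-Planck dynamics, and the discretized Wasserstein lower bound of Eq. (55) can be evaluated along the trajectory. The sketch below is illustrative only: the stiffness protocol $k_t$ is not reproduced in the text above, so we take a constant $k_t = 1$ as an assumed stand-in, and the function names are ours.

```python
import numpy as np

# Moment equations for the Fokker-Planck dynamics in the harmonic trap
# V_t(x) = k_t (x - a_t)^2 / 2 (overdamped, mobility mu, temperature T):
#   dE[x]/dt   = -mu * k_t * (E[x] - a_t)
#   dVar[x]/dt = -2 * mu * k_t * Var[x] + 2 * mu * T
mu, T, tau = 0.01, 1.0, 10.0
k = lambda t: 1.0              # assumed stiffness (not given in the text)
a = lambda t: 10.0 * np.sin(t)  # trap center, as in the text

n = 100_000
dt = tau / n
ts = np.linspace(0.0, tau, n + 1)
E = np.empty(n + 1); V = np.empty(n + 1)
E[0], V[0] = 1.0, 1.0          # Gaussian initial condition E[x]_0 = Var[x]_0 = 1
for i in range(n):              # forward Euler integration of the moments
    E[i + 1] = E[i] - mu * k(ts[i]) * (E[i] - a(ts[i])) * dt
    V[i + 1] = V[i] + (-2.0 * mu * k(ts[i]) * V[i] + 2.0 * mu * T) * dt

def w2_gauss(mA, vA, mB, vB):
    """L2-Wasserstein distance between two 1D Gaussians (Frechet formula)."""
    return np.sqrt((mA - mB) ** 2 + (np.sqrt(vA) - np.sqrt(vB)) ** 2)

def discretized_bound(N):
    """Eq. (55)-style lower bound: sum over t_i = i*tau/N of
    W(p_{t_i}, p_{t_{i+1}})^2 / (mu * T * (tau / N))."""
    idx = [i * n // N for i in range(N + 1)]
    return sum(w2_gauss(E[idx[j]], V[idx[j]], E[idx[j + 1]], V[idx[j + 1]]) ** 2
               for j in range(N)) / (mu * T * tau / N)

bounds = [discretized_bound(N) for N in (2, 4, 8, 16, 32)]
print(bounds)  # nondecreasing under dyadic refinement
```

The bound is nondecreasing under dyadic refinement because $\mathcal{W}(p_a, p_c)^2 \leq 2\,\mathcal{W}(p_a, p_b)^2 + 2\,\mathcal{W}(p_b, p_c)^2$ by the triangle and Cauchy-Schwarz inequalities.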
The initial distribution is Gaussian with $\mathbb{E}[x]_0 = \mathrm{Var}[x]_0 = 1$, and we calculate the time evolution from $\tau = 0$ to $\tau = 10$. In Fig. 4, we illustrate the tightness of the lower bound $\mathcal{L}_\tau^2/(\mu T \tau)$ in Eq. (47) compared with the lower bound $\mathcal{W}(p_0, p_\tau)^2/(\mu T \tau)$ in Eq. (51). The value of the Wasserstein distance $\mathcal{W}(p_0, p_\tau)$ oscillates under the periodic change of the potential, whereas the Wasserstein path length $\mathcal{L}_\tau$ monotonically increases in time. This is why our new bound $\mathcal{L}_\tau^2/(\mu T \tau)$ becomes much tighter than the previous bound $\mathcal{W}(p_0, p_\tau)^2/(\mu T \tau)$ for a periodic change of the potential. We also numerically check the lower bound in Eq. (55) and the estimation of the entropy production based on Eq. (57). In Fig. 5, we numerically calculate the lower bound $\sum_{i=0}^{N-1} \hat{\Sigma}(t_i; t_{i+1})$ as a function of the integer $N$, where $t_i = i \tau / N$, at time $\tau = 10$. The inequality Eq. (55) holds for any $N$, and the lower bound converges to the entropy production in the limit $N \to \infty$. This result implies that $\sigma^{\rm rot}_t = 0$ and that the entropy production can be numerically estimated from the Wasserstein path length. Our lower bound might be useful for estimating the entropy production from measurements of the probability distribution at short time intervals.

D. Numerical calculation of optimal stochastic heat engine in finite time

We numerically discuss the optimal protocol for a stochastic heat engine in finite time. We again consider Brownian motion in the harmonic potential, Eqs. (123) and (124). We set the parameters as $\mu = 0.01$, $t_h = t_c = 100$, $T_h = 10$ and $T_c = 1$; thus, the Carnot efficiency is $\eta_C = 0.9$. Assuming that the probability distribution is Gaussian, Eq. (152), the optimal parameters of the harmonic potential are given for $0 \leq t < t_h$ and for $t_h \leq t < t_h + t_c$. By a discussion similar to the derivation of Eq. (55), we can obtain a lower bound on the entropy production. In Fig. 6, we show that the lower bound Eq.
(162) is equal to the entropy production in this optimal protocol. We can also check that the time derivative of the Wasserstein path length, $d\mathcal{L}_t/dt$, is constant. We next discuss the bound on the efficiency in this case. The change of the Shannon entropy is calculated as $\Delta S = \ln 2$ and the Wasserstein distance is calculated as $\mathcal{W}(p_a, p_b) = 1$. Thus, the efficiency is numerically obtained as $\eta \approx 0.7145$, which is lower than the Carnot efficiency $\eta_C = 0.9$. On the other hand, the action is calculated as $\mathcal{C} = 0.01$, and the lower bound on the efficiency Eq. (122) gives a reasonable value which is smaller than the efficiency $\eta \approx 0.7145$.
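The efficiency quoted above can be reproduced by direct arithmetic. The sketch below assumes the standard heat decomposition $Q_h = T_h(\Delta S - \Sigma_h)$ and the minimized entropy production $\Sigma = \mathcal{W}^2/(\mu T t)$ in each isothermal step; with the parameters of this section it recovers $\eta \approx 0.7145$:

```python
import numpy as np

# Parameters from the text: mu = 0.01, t_h = t_c = 100, T_h = 10, T_c = 1,
# Delta S = ln 2, W(p_a, p_b) = 1.
mu, th, tc, Th, Tc = 0.01, 100.0, 100.0, 10.0, 1.0
dS, W = np.log(2.0), 1.0

# Minimized entropy production in each isothermal step, Sigma = W^2/(mu T t).
Sh = W ** 2 / (mu * Th * th)   # hot step
Sc = W ** 2 / (mu * Tc * tc)   # cold step

# Extracted work and absorbed heat, assuming the decompositions
# -W_ext = (Th - Tc) dS - Th*Sh - Tc*Sc and Q_h = Th (dS - Sh).
work = (Th - Tc) * dS - Th * Sh - Tc * Sc
Qh = Th * (dS - Sh)
eta = work / Qh
eta_C = 1.0 - Tc / Th
print(eta, eta_C)  # eta ~ 0.7145, below eta_C = 0.9
```

The gap between $\eta$ and $\eta_C$ is entirely due to the finite cycle time: both $\Sigma_h$ and $\Sigma_c$ vanish as $t_h, t_c \to \infty$.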

VI. DISCUSSION
FIG. 6. The entropy production $\Sigma$ for the optimal heat engine in finite time. The lower bound $\hat{\Sigma}$ on the entropy production is equal to the entropy production $\Sigma$ in this case. We also show the time evolution of the variance $\mathrm{Var}[x]_\tau$ and the Wasserstein path length $\mathcal{L}_\tau$. The red area indicates the interval where the temperature of the heat bath is $T_h$, and the blue area indicates the interval where the temperature of the heat bath is $T_c$.

We have discussed a geometrical feature of stochastic thermodynamics for the Fokker-Planck equation based on the $L^2$-Wasserstein distance. As shown in this paper, the $L^2$-Wasserstein distance is strongly related to the entropy production for the Fokker-Planck equation. Thus, based on the $L^2$-Wasserstein distance, we can introduce a differential geometry of stochastic thermodynamics for the Fokker-Planck equation that is closely related to the entropy production.
It might be interesting to consider a relation between the $L^2$-Wasserstein distance and the Fisher information matrix, because the Fisher information matrix gives another metric in information geometry, which is also a possible choice for a differential geometry of stochastic thermodynamics. For example, the entropy production is given by a projection in information geometry [12]. Thus, there might be a deep connection between information geometry and optimal transport via the $L^2$-Wasserstein distance. For example, the HWI inequality, the logarithmic Sobolev inequalities, and the Talagrand inequalities can be regarded as trade-off relations among the $L^2$-Wasserstein distance, the relative Fisher information, and the Shannon entropy [37,41]. As shown in Ref. [26], there is an analogy between the entropy production rate and the Fisher information of time for the Fokker-Planck equation. This analogy is also pointed out in Ref. [78]. Thus, the two research directions of information geometry and the $L^2$-Wasserstein distance might be unified for the Fokker-Planck equation through the entropy production. Such a unification of information geometry and the geometry of the $L^2$-Wasserstein distance has recently been discussed [43,44], and our results might provide a new direction in this topic.
If we consider thermodynamics based on information geometry, we can treat not only stochastic thermodynamics for the Fokker-Planck equation [26] but also stochastic thermodynamics for the Markov jump process [10] and chemical thermodynamics for the rate equation [23]. Thus, it might be interesting to seek a counterpart of the $L^2$-Wasserstein distance for the Markov jump process and the rate equation. Indeed, T. Van Vu and Y. Hasegawa derived a generalization of thermodynamic speed limits for the Markov jump process [52], and a thermodynamic counterpart of the $L^2$-Wasserstein distance for the Markov jump process might therefore be the distance discussed in Ref. [52]. Moreover, our result is based on the setting of the overdamped Langevin equation, where the entropy production rate is given by the mean local velocity. Thus, it would be interesting to generalize our result to the underdamped Langevin equation or to the generalized Langevin equation for non-Markovian processes.
In a nonequilibrium steady state, the quantity $\sigma^{\rm rot}_t$ might play an important role. Under the existence of the non-potential force, the entropy production rate is generally decomposed into two non-negative parts, the Wasserstein part $(d\mathcal{L}_t/dt)^2/(\mu T)$ and the non-potential part $\sigma^{\rm rot}_t$. This decomposition is very similar to that of steady state thermodynamics [65], where the entropy production is decomposed into the excess entropy production and the housekeeping heat.

Because the composite map $M_{t \to t+s} \circ T_t$ gives a (generally non-optimal) transport plan from $p$ to $p_{t+s}$, we obtain the inequality
$$\mathcal{W}(p, p_{t+s})^2 = \int dx\, \| x - T_{t+s}(x) \|^2\, p(x) \leq \int dx\, \| x - M_{t \to t+s}(T_t(x)) \|^2\, p(x). \tag{A3}$$
By using Eqs. (A1) and (A3), we can evaluate
$$\lim_{s \to 0} \int dx\, \frac{\| x - T_{t+s}(x) \|^2 - \| x - M_{t+s \to t}(T_{t+s}(x)) \|^2}{2s}\, p(x),$$
which bounds the derivative of $\mathcal{W}(p, p_{t+s})^2$ with respect to $s$, because the composite map $M_{t+s \to t} \circ T_{t+s}$ is a non-optimal transport plan from $p$ to $p_t$. From Eqs. (A4) and (A5), we finally obtain the formula Eq. (33).