Strong Coupling Thermodynamics and Stochastic Thermodynamics from the Unifying Perspective of Time-Scale Separation

Assuming time-scale separation (TSS), a simple and unified theory of thermodynamics and stochastic thermodynamics is constructed for small classical systems strongly interacting with their environments in a controllable fashion. The total Hamiltonian is decomposed into a bath part and a system part, the latter being the Hamiltonian of mean force. Both the conditional equilibrium of the bath and the reduced equilibrium of the system are described by canonical ensemble theories with respect to their own Hamiltonians. The bath free energy is independent of the system variables and the control parameter. Furthermore, the weak coupling theory of stochastic thermodynamics becomes applicable almost verbatim, even if the interaction and correlation between the system and its environment are strong and varied externally. Finally, this TSS-based approach also leads to some new insights about the origin of the second law of thermodynamics.


I. INTRODUCTION
One of the most significant discoveries of statistical physics in the past few decades is that thermodynamic variables can be defined at the level of dynamic trajectories [1][2][3]. Studies of these fluctuating quantities in nonequilibrium processes have led to significant results such as the Fluctuation Theorems [2] and the Jarzynski equality [3], as well as a much deeper understanding of the second law of thermodynamics.
* Electronic address: xxing@sjtu.edu.cn
Consider, for example, a small classical system with Hamiltonian $H(x, \lambda)$ weakly interacting with its bath, such that the interaction energy and statistical correlation between the system and the bath are negligibly small. Here $x = (q, p)$ are the canonical variables, and $\lambda$ is an external control parameter. The differential work and heat at trajectory level are defined respectively as:
$$d\mathcal{W} \equiv d_\lambda H(x, \lambda), \qquad d\mathcal{Q} \equiv d_x H(x, \lambda).$$
Throughout this work, we use the notations $d_\lambda H(x, \lambda)$ and $d_x H(x, \lambda)$ respectively for the differentials of $H(x, \lambda)$ due to variations of $\lambda$ and of $x$ [42]. These notations will greatly simplify the presentation of our theory. With $H(x, \lambda)$ identified as the fluctuating internal energy, the first law at trajectory level then follows directly: $dH = d_\lambda H + d_x H = d\mathcal{W} + d\mathcal{Q}$. Further using the time-reversal symmetry of Hamiltonian dynamics or Langevin dynamics, one can derive the Crooks fluctuation theorem, the Jarzynski equality, as well as the Clausius inequality. Mathematical expressions for various thermodynamic variables of weak-coupling stochastic thermodynamics are shown in the center column of Table I of Sec. V. For pedagogical reviews, see e.g. Refs. [2,3].
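As an illustration of the weak-coupling bookkeeping described above, the following minimal sketch splits each discrete time step into a change of the control parameter at fixed state (work) followed by a change of the state at fixed parameter (heat). The harmonic Hamiltonian, the path, and the protocol are hypothetical choices made only for this example; the path is not generated by any dynamics.

```python
import math

def hamiltonian(q, p, lam):
    """Toy Hamiltonian H(x, lam) = p^2/2 + lam*q^2/2 (hypothetical example)."""
    return 0.5 * p * p + 0.5 * lam * q * q

def work_and_heat(path, protocol):
    """Accumulate discrete analogues of d_lambda H (work) and d_x H (heat)."""
    total_work, total_heat = 0.0, 0.0
    for k in range(len(path) - 1):
        (q0, p0), (q1, p1) = path[k], path[k + 1]
        lam0, lam1 = protocol[k], protocol[k + 1]
        # Work: change of H due to lambda, with the state x held fixed.
        total_work += hamiltonian(q0, p0, lam1) - hamiltonian(q0, p0, lam0)
        # Heat: change of H due to x, with the (updated) lambda held fixed.
        total_heat += hamiltonian(q1, p1, lam1) - hamiltonian(q0, p0, lam1)
    return total_work, total_heat

# Arbitrary smooth path and protocol, used only to test the bookkeeping.
path = [(math.cos(0.3 * k), math.sin(0.2 * k)) for k in range(50)]
protocol = [1.0 + 0.02 * k for k in range(50)]

W, Q = work_and_heat(path, protocol)
dH = hamiltonian(*path[-1], protocol[-1]) - hamiltonian(*path[0], protocol[0])
print(W, Q, dH)
```

With this ordering of the two sub-steps the sum of work and heat telescopes to the total change of $H$ for any path, which is the discrete counterpart of the first law at trajectory level.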
In recent years, there has been significant interest in generalizing thermodynamics and stochastic thermodynamics to small systems that are strongly coupled to their environments, both classical [5-16] and quantum [6,8,17-28]. Strong interactions between a system and its environment cause ambiguities in the definitions of system thermodynamic quantities [6,8]. If the system size is large, and the interactions are short-ranged, the correlations between system and bath are confined to the interfacial regions, and hence do not influence the bulk properties of the system. This is indeed the reason why classical thermodynamics and statistical mechanics are so successful in describing the equilibrium properties of macroscopic systems, even if these systems may be strongly interacting with the environment near the interfaces. Small systems, however, have no "bulk", and their thermodynamic properties may be overwhelmingly dominated by their interactions and correlations with the environment. Should one assign the interaction energy to the system or to the bath? Should one treat the mutual information between system and bath variables as part of the system entropy or of the bath entropy? There seems to be no general principle in favor of any particular answer. For critical and insightful discussions of these fundamental issues, see the recent articles by Jarzynski [7], and by Talkner and Hänggi [8].
Numerous versions [5-7, 12, 29] of strong coupling thermodynamic theories have been proposed in recent years. Probably the most influential theory was developed by Seifert [5], and critically evaluated by Talkner and Hänggi [6,8]. In this theory, one uses the Hamiltonian of mean force (HMF) $H_X$ [4,16,37] to construct the equilibrium free energy $F = -T \log \int dx\, e^{-\beta H_X}$, and then defines the equilibrium system energy and entropy via $E = \partial(\beta F)/\partial\beta$ and $S = \beta^2\, \partial F/\partial\beta$. Whilst these relations hold exactly in equilibrium thermodynamics, they must be deemed as definitions of energy and entropy in Seifert's theory of strong coupling thermodynamics. Interestingly, these definitions correspond to the particular decomposition of total thermodynamic variables $A_{\rm tot} = A_{\rm sys} + A_{\rm bath}$, where $A_{\rm bath}$ is the thermodynamic variable of the bare bath, with the interaction between the system and bath switched off. Hence it can be said that Seifert allocates the entire interaction energy to the system. These definitions of energy and entropy are further bootstrapped to non-equilibrium situations [5], and fluctuation theorems and the Clausius inequality are subsequently established. The resulting formulas (the right column of Table I) in strongly coupled regimes are markedly more complicated than those in the weak coupling theory (the central column). These differences however disappear as the interaction Hamiltonian vanishes, and the HMF reduces to the bare system Hamiltonian.
Strasberg and Esposito [14] recently studied the strong coupling problem from the viewpoint of time-scale separation (TSS). They consider a system involving both slow and fast variables. By assuming that the fast variables are in conditional equilibrium, they show that Seifert's theory can be derived by averaging out the fast variables. Furthermore, they propose a definition of total entropy production in terms of relative entropy, which is a variation of the entropy production defined in Ref. [35], and show that it is equivalent to the entropy production in Seifert's theory. The conditional equilibrium of the bath also allows one to prove the positivity of the instantaneous rate of total entropy production, rather than only the positivity of the total entropy production of an entire process. The importance of TSS has long been known. It was invoked heuristically to justify the adiabatic approximation [30,31], Markov modeling [32], and dimensional reduction of dynamic theories [33,34]. Jarzynski [7] developed a more comprehensive (and hence more complex) theory for strong coupling thermodynamics, and systematically discussed the definitions of internal energy, entropy, volume, pressure, enthalpy, and Gibbs free energy. The formalism was established around the concept of volume, whose definition is somewhat arbitrary. All other thermodynamic variables are uniquely fixed by thermodynamic consistency once the system volume is (arbitrarily) defined. Jarzynski further showed that Seifert's theory is a special case of his framework, i.e., the "partial molar representation". He discussed in great detail the "bare representation", where the system enthalpy coincides with the HMF. The total entropy production is however the same in both representations. Jarzynski drew an analogy between the arbitrariness in the definition of thermodynamic variables in the strong coupling regime and the gauge degree of freedom in electromagnetism, which was criticized by Talkner and Hänggi [8].
The main purpose of the present work is to show that, with TSS and the ensuing conditional equilibrium of bath variables, a much simpler thermodynamic theory can be developed for strongly coupled small classical systems. More specifically, we will show that by identifying the Hamiltonian of Mean Force (HMF) as the system Hamiltonian, and relegating the remaining part of the total Hamiltonian to the bath, both the equilibrium ensemble theory and the weak coupling theory of stochastic thermodynamics remain applicable, almost verbatim, in the strong coupling regime. Work and heat, entropy, and energy all retain the same definitions and the same physical meanings as in the weak coupling theory, as long as the bath entropy is understood as conditioned on the system state. Fluctuation Theorems, the Jarzynski equality, and the Clausius inequality can all be proved using nonlinear Langevin dynamics [39], whose validity relies on TSS but not on the strength of coupling. Using the conditional equilibrium nature of the bath, it can be rigorously demonstrated that $dS - \beta\, dQ$ equals the entropy change of the universe, which establishes the meaning of the Clausius inequality as increasing total entropy. Finally, we will also show that our theory, though significantly simpler, is consistent with all previous theories, in the sense that the total entropy productions in all theories are mathematically equivalent. In summary, we achieve a natural unification of thermodynamics and stochastic thermodynamics in both the weak and strong coupling regimes.
A logical consequence of TSS is that the dynamic evolution of slow variables can be modeled as a Markov process, such as Langevin dynamics with white noises. In the strongly coupled regime, the noises are however generically multiplicative. In a complementary paper [39], two of us develop a theory of stochastic thermodynamics using nonlinear Ito-Langevin dynamics, establish its covariance property, and derive the Crooks Fluctuation Theorem, the Jarzynski equality, as well as the Clausius inequality. The definitions of thermodynamic quantities are identical in these two works, if we take $g_{ij} = \delta_{ij}$ in Ref. [39]. (The theory in Ref. [39] was developed for Langevin dynamics on an arbitrary Riemannian manifold with invariant volume measure $\sqrt{g}\, d^d x$, whereas in the present work, we consider Hamiltonian systems with Liouville measure $\prod_i dp_i\, dq_i$.) The combination of these two works provides a covariant theory of thermodynamics and stochastic thermodynamics for systems strongly interacting with a single heat bath, with TSS as the only assumption.
The remainder of this work is organized as follows. In Sec. II, we introduce our decomposition of the total Hamiltonian, and discuss the equilibrium thermodynamic properties of strongly coupled systems. In Sec. III, we discuss the non-equilibrium thermodynamic properties of the system. Work and heat retain the same definitions and the same physical meanings as in the weak coupling theory, i.e., the energy change of the universe and the negative of the energy change of the environment, respectively. In Sec. IV, we discuss the connection between heat and the entropy change of the bath, conditioned on the slow variables. In Sec. V, we compare our theory with previous theories by Seifert, by Hänggi and Talkner, by Jarzynski, and by Strasberg and Esposito, and show that they are all equivalent. We will also discuss a simple scenario where the present theory fits better with the common intuition about system entropy and heat. In Sec. VI we draw concluding remarks.

II. EQUILIBRIUM THEORY
In this section, we shall demonstrate that by identifying the HMF as the system Hamiltonian, and the remainder of the total Hamiltonian as the bath Hamiltonian, canonical ensemble theory can be straightforwardly adapted to describe the equilibrium properties of systems that are strongly coupled to their baths. There is also a related decomposition of total thermodynamic quantities into system parts and bath parts. The bath free energy turns out to be the same as that of a bare bath, and is independent of the state of the slow variables and of the external control parameter.

A. Decomposition of total Hamiltonian
We shall use $X, Y$ to denote slow and fast variables, and $x, y$ their values. We shall also call $X$ the system and $Y$ the bath. Let the total Hamiltonian be
$$H_{XY}(x, y; \lambda) = H^0_X(x; \lambda) + H_Y(y) + H^0_I(x, y; \lambda), \qquad (2.1)$$
where $H^0_X(x; \lambda)$ and $H_Y(y)$ are the bare system Hamiltonian and bare bath Hamiltonian, whereas $H^0_I(x, y; \lambda)$ is the bare interaction. Note that every term in the RHS is independent of temperature, and the bare bath Hamiltonian $H_Y(y)$ is independent of $\lambda$. Our starting point Eq. (2.1) is more general than those in Refs. [5-7], where the bare interaction $H^0_I(x, y; \lambda)$ is assumed to be independent of $\lambda$.
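Throughout this work, we shall assume that $XY$ is weakly interacting with a much larger super-bath $Z$ whose dynamics is even faster than that of $Y$. We will call $YZ$ the environment, and $XYZ$ the universe. We shall use $\int_y \equiv \int d^N y$ to denote integration over $y$, and similar notations for integration over $x$ and $z$. These notations are especially useful when we are dealing with integrals of differential forms. Let $T = 1/\beta$ be the temperature, which is assumed to be fixed throughout the work. We shall set the Boltzmann constant $k_B = 1$, and hence all entropies are dimensionless. We shall define the system Hamiltonian $H_X(x; \lambda, \beta)$ and the interaction Hamiltonian $H_I(x, y; \lambda, \beta)$ as
$$H_X(x; \lambda, \beta) \equiv H^0_X(x; \lambda) - T \log \frac{\int_y e^{-\beta [H_Y(y) + H^0_I(x, y; \lambda)]}}{\int_y e^{-\beta H_Y(y)}}, \qquad (2.2)$$
$$H_I(x, y; \lambda, \beta) \equiv H^0_I(x, y; \lambda) + T \log \frac{\int_y e^{-\beta [H_Y(y) + H^0_I(x, y; \lambda)]}}{\int_y e^{-\beta H_Y(y)}}. \qquad (2.3)$$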
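We now obtain a new decomposition of $H_{XY}$:
$$H_{XY}(x, y; \lambda) = H_X(x; \lambda, \beta) + H_Y(y) + H_I(x, y; \lambda, \beta). \qquad (2.4a)$$
Note that even though both $H_X$ and $H_I$ depend on $\beta$, the total Hamiltonian in the LHS of Eq. (2.4a) is independent of $\beta$. We further define the bath Hamiltonian as
$$H_{\rm Bath}(x, y; \lambda, \beta) \equiv H_Y(y) + H_I(x, y; \lambda, \beta), \qquad (2.4b)$$
and rewrite Eq. (2.4a) as
$$H_{XY} = H_X + H_{\rm Bath}. \qquad (2.4c)$$
We also define the bath partition function as
$$Z_Y(x, \lambda, \beta) \equiv \int_y e^{-\beta H_{\rm Bath}(x, y; \lambda, \beta)}, \qquad (2.5)$$
which is conditioned on $X = x$, and a priori could depend on both $x$ and $\lambda$. Using Eqs. (2.4b) and (2.3), we easily see that
$$\int_y e^{-\beta H_{\rm Bath}} = \int_y e^{-\beta H_Y(y)} \equiv Z^0_Y(\beta), \qquad (2.6)$$
where $Z^0_Y(\beta)$ is the partition function of the bare bath, with the interaction Hamiltonian between $X$ and $Y$ completely switched off.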
Hence the bath partition function $Z_Y(x, \lambda, \beta)$ as defined by Eq. (2.5) is independent of $x$ and $\lambda$:
$$Z_Y(x, \lambda, \beta) = Z^0_Y(\beta), \qquad (2.7)$$
and we shall from now on simply write it as $Z_Y(\beta)$. Equation (2.7) will play a very significant role in our theory.
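The independence of the bath partition function from $x$ and $\lambda$ can be checked numerically on a toy model. The sketch below assumes a single bath coordinate with $H_Y = y^2/2$ and a hypothetical bare interaction $H^0_I = \lambda x y^2/2$ (valid for $1 + \lambda x > 0$); neither choice comes from the text. For this model the HMF correction has the closed form $(T/2)\log(1 + \lambda x)$, which the code also recovers by quadrature.

```python
import math

BETA = 2.0            # inverse temperature (arbitrary illustrative value)
T = 1.0 / BETA

def integrate(f, lo=-12.0, hi=12.0, n=4000):
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

def h_y(y):
    return 0.5 * y * y                      # bare bath Hamiltonian (toy choice)

def h0_int(x, y, lam):
    return 0.5 * lam * x * y * y            # bare interaction (toy choice)

def log_ratio(x, lam):
    """log of int_y exp(-beta(H_Y + H0_I)) / int_y exp(-beta H_Y)."""
    num = integrate(lambda y: math.exp(-BETA * (h_y(y) + h0_int(x, y, lam))))
    den = integrate(lambda y: math.exp(-BETA * h_y(y)))
    return math.log(num / den)

def hmf_correction(x, lam):
    """H_X - H0_X, i.e. minus T times the log ratio."""
    return -T * log_ratio(x, lam)

def bath_partition_function(x, lam):
    """Z_Y(x, lam, beta) = int_y exp(-beta (H_Y + H_I)), with H_I = H0_I + T*log_ratio."""
    shift = T * log_ratio(x, lam)
    return integrate(lambda y: math.exp(-BETA * (h_y(y) + h0_int(x, y, lam) + shift)))

Z0 = math.sqrt(2.0 * math.pi / BETA)        # bare bath partition function
values = [bath_partition_function(x, lam)
          for x, lam in [(0.0, 1.0), (0.5, 1.0), (-0.3, 1.5)]]
print(values, Z0)
```

The equality holds by construction of $H_I$, so the check mainly confirms that the shift absorbed into the interaction Hamiltonian is exactly what cancels the $x$ and $\lambda$ dependence.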

B. Conditional equilibrium of bath
On intermediate time scales, the fast variables equilibrate whereas the slow variables barely change. Hence $Y$ achieves equilibrium conditioned on $X = x$, described by the conditional Gibbs-Boltzmann distribution:
$$p^{\rm EQ}_{Y|X}(y|x) = \frac{1}{Z_Y(\beta)}\, e^{-\beta H_{\rm Bath}(x, y; \lambda, \beta)}, \qquad (2.8a)$$
with $Z_Y(\beta)$ defined in Eq. (2.5). We further define the conditional free energy of the bath:
$$F_Y(\beta) \equiv -T \log Z_Y(\beta) = F^0_Y(\beta), \qquad (2.8b)$$
where $F^0_Y(\beta) \equiv -T \log Z^0_Y(\beta)$ is the free energy of the bare bath. Equations (2.8) define a conditional canonical ensemble, which describes the equilibrium physics of the fast variables on the intermediate time scales, during which the slow variables change very little. In this ensemble theory, $x$ serves as a parameter, just like $\lambda$ and $\beta$.
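The internal energy and entropy of the bath in the conditional equilibrium state are defined in the standard way:
$$E_Y(x) \equiv \int_y p^{\rm EQ}_{Y|X}(y|x)\, H_{\rm Bath}(x, y; \lambda, \beta), \qquad (2.9a)$$
$$S_{Y|X=x} \equiv -\int_y p^{\rm EQ}_{Y|X}(y|x) \log p^{\rm EQ}_{Y|X}(y|x) = \beta \left( E_Y(x) - F_Y \right). \qquad (2.9b)$$
$S_{Y|X=x}$ is known in information theory [36] as the conditional Shannon entropy of $Y$ given $X = x$. Note that even though $F_Y(\beta)$ does not depend on $x$ and $\lambda$, both $E_Y(x)$ and $S_{Y|X=x}$ do.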
Even though the free energy of the bath conditioned on the system state equals that of the bare bath, there are important differences between other thermodynamic quantities of the bath and of the bare bath. For example, the internal energy and entropy of the bare bath are given respectively by
$$E^0_Y \equiv \frac{1}{Z^0_Y(\beta)} \int_y e^{-\beta H_Y(y)}\, H_Y(y), \qquad (2.10a)$$
$$S^0_Y \equiv \beta \left( E^0_Y - F^0_Y \right), \qquad (2.10b)$$
which are manifestly different from Eqs. (2.9a), (2.9b).
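To make the contrast concrete, here is a small numerical sketch of the conditional bath energy and entropy, again under the assumed toy model $H_Y = y^2/2$, $H^0_I = \lambda x y^2/2$ (not taken from the text). For this model $H_{\rm Bath} = (1 + \lambda x) y^2/2 - (T/2)\log(1 + \lambda x)$, so the energy and entropy vary with $x$ while the combination $E_Y(x) - T S_{Y|X=x}$ stays at the bare-bath free energy.

```python
import math

BETA = 2.0
T = 1.0 / BETA
Z0 = math.sqrt(2.0 * math.pi / BETA)   # bare bath partition function for H_Y = y^2/2

def integrate(f, lo=-12.0, hi=12.0, n=4000):
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

def h_bath(x, y, lam):
    """H_Bath = H_Y + H_I; this closed form is specific to the assumed toy model."""
    return 0.5 * (1.0 + lam * x) * y * y - 0.5 * T * math.log(1.0 + lam * x)

def conditional_energy_entropy(x, lam):
    """Return (E_Y(x), S_{Y|X=x}) for the conditional Gibbs-Boltzmann distribution."""
    def log_p(y):
        return -BETA * h_bath(x, y, lam) - math.log(Z0)
    energy = integrate(lambda y: math.exp(log_p(y)) * h_bath(x, y, lam))
    entropy = integrate(lambda y: -math.exp(log_p(y)) * log_p(y))
    return energy, entropy

E0, S0 = conditional_energy_entropy(0.0, 1.0)   # interaction vanishes at x = 0
E1, S1 = conditional_energy_entropy(1.0, 1.0)
F_bare = -T * math.log(Z0)
print(E0, S0, E1, S1, F_bare)
```

In this example both $E_Y$ and $S_{Y|X=x}$ decrease as the coupling stiffens the bath coordinate, and the two shifts cancel in the free energy.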

C. Joint equilibrium of system and bath
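On long time scales, $XY$ achieves a joint equilibrium, which is described by the joint Gibbs-Boltzmann distribution:
$$p^{\rm EQ}_{XY}(x, y) = \frac{1}{Z_{XY}(\beta, \lambda)}\, e^{-\beta H_{XY}(x, y; \lambda)}, \qquad (2.11)$$
where $Z_{XY}(\beta, \lambda)$ is the canonical joint partition function:
$$Z_{XY}(\beta, \lambda) \equiv \int_{x, y} e^{-\beta H_{XY}(x, y; \lambda)}.$$
From this we can obtain various thermodynamic quantities for this joint canonical ensemble in the standard way:
$$F_{XY} = -T \log Z_{XY}, \qquad (2.12a)$$
$$E_{XY} = \int_{x, y} p^{\rm EQ}_{XY}\, H_{XY}, \qquad (2.12b)$$
$$S_{XY} = -\int_{x, y} p^{\rm EQ}_{XY} \log p^{\rm EQ}_{XY} = \beta \left( E_{XY} - F_{XY} \right). \qquad (2.12c)$$
The joint canonical ensemble describes the equilibrium statistical properties of both slow and fast variables.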

D. Reduced equilibrium of System
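But we may also study the equilibrium distribution of the slow variables alone. This reduced canonical distribution can be obtained from Eq. (2.11) by integrating out the fast variables:
$$p^{\rm EQ}_X(x) = \int_y p^{\rm EQ}_{XY}(x, y) = \frac{Z_Y(\beta)}{Z_{XY}(\beta, \lambda)}\, e^{-\beta H_X(x; \lambda, \beta)}, \qquad (2.13)$$
where we have used Eq. (2.5) and the fact that $Z_Y(\beta)$ is independent of $x$. Hence the equilibrium distribution of $X$ is canonical with respect to the system Hamiltonian $H_X(x; \lambda, \beta)$. This is, of course, well known, since we have defined $H_X$ as the HMF. It is then convenient to define the partition function of the slow variables:
$$Z_X(\beta, \lambda) \equiv \frac{Z_{XY}(\beta, \lambda)}{Z_Y(\beta)}, \qquad (2.14)$$
so that Eq. (2.13) assumes the standard canonical form:
$$p^{\rm EQ}_X(x) = \frac{1}{Z_X(\beta, \lambda)}\, e^{-\beta H_X(x; \lambda, \beta)}. \qquad (2.15)$$
Integration of Eq. (2.13) then yields
$$Z_X(\beta, \lambda) = \int_x e^{-\beta H_X(x; \lambda, \beta)}. \qquad (2.16)$$
The above results prompt us to construct a reduced canonical ensemble theory for the system, with free energy, internal energy, and entropy given by
$$F_X \equiv -T \log Z_X, \qquad (2.17a)$$
$$E_X \equiv \int_x p^{\rm EQ}_X\, H_X, \qquad (2.17b)$$
$$S_X \equiv -\int_x p^{\rm EQ}_X \log p^{\rm EQ}_X, \qquad (2.17c)$$
$$F_X = E_X - T S_X. \qquad (2.17d)$$
These definitions of system energy and entropy are manifestly different from those of the strong coupling theory in Refs. [5,6,8], even though the free energy is the same in the two theories.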
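The statement that the marginal of the joint Gibbs-Boltzmann distribution is canonical with respect to the HMF can be illustrated numerically. The sketch below assumes the toy model $H^0_X = x^2/2$, $H_Y = y^2/2$, $H^0_I = \lambda x y^2/2$ with $1 + \lambda x > 0$ (a hypothetical choice, for which the HMF is $x^2/2 + (T/2)\log(1 + \lambda x)$), and compares ratios of the marginal at two points so that no normalization is needed.

```python
import math

BETA = 2.0
T = 1.0 / BETA

def integrate(f, lo=-12.0, hi=12.0, n=4000):
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

def h_total(x, y, lam):
    """Total Hamiltonian H0_X + H_Y + H0_I of the assumed toy model."""
    return 0.5 * x * x + 0.5 * y * y + 0.5 * lam * x * y * y

def hmf(x, lam):
    """Closed-form Hamiltonian of mean force for this toy model."""
    return 0.5 * x * x + 0.5 * T * math.log(1.0 + lam * x)

def unnormalized_marginal(x, lam):
    """int_y exp(-beta H_XY): the reduced distribution of X up to a constant."""
    return integrate(lambda y: math.exp(-BETA * h_total(x, y, lam)))

lam = 1.0
x1, x2 = 0.2, 1.1
ratio_from_marginal = unnormalized_marginal(x1, lam) / unnormalized_marginal(x2, lam)
ratio_from_hmf = math.exp(-BETA * (hmf(x1, lam) - hmf(x2, lam)))
print(ratio_from_marginal, ratio_from_hmf)
```

The two ratios agree, while the ratio built from the bare system Hamiltonian $x^2/2$ alone would not.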

E. Decomposition of Thermodynamic Variables
Comparing Eqs. (2.17) with Eqs. (2.9) and (2.12), we find the following decomposition of total thermodynamic quantities into system parts and bath parts:
$$F_{XY} = F_X + F_Y, \qquad (2.18a)$$
$$E_{XY} = E_X + \langle E_Y(x) \rangle_X, \qquad (2.18b)$$
$$S_{XY} = S_X + S_{Y|X}, \qquad (2.18c)$$
where $\langle E_Y(x) \rangle_X$ and $S_{Y|X}$ are respectively the averages of $E_Y(x)$ and $S_{Y|X=x}$ over fluctuations of $X$:
$$\langle E_Y(x) \rangle_X \equiv \int_x p^{\rm EQ}_X(x)\, E_Y(x), \qquad S_{Y|X} \equiv \int_x p^{\rm EQ}_X(x)\, S_{Y|X=x}. \qquad (2.19)$$
$S_{Y|X}$ is called the conditional Shannon entropy of $Y$ given $X$ in information theory [36]. Note the subtle difference between the names for $S_{Y|X}$ and for $S_{Y|X=x}$. There are numerous pleasant features of the equilibrium thermodynamic theory developed here. Firstly, all equilibrium distributions are Gibbs-Boltzmann with respect to the corresponding Hamiltonians. Secondly, all entropies are Gibbs-Shannon entropies with respect to the corresponding pdfs. As a consequence, the formulas in Eqs. (2.8), (2.9), (2.12), and (2.17) are all the same as those in canonical ensemble theory. These features are remarkable, since they indicate that standard canonical ensemble theory is applicable both to the system and to the bath, regardless of the strong interaction and correlation between them. Thirdly, Eq. (2.8b) says that the bath free energy $F_Y(\beta)$ equals the bare bath free energy $F^0_Y(\beta)$, and is independent of $\lambda$ and $x$. This feature leads to substantial conceptual simplification, since we are only interested in the physics of the slow variables. Consider, for example, immersing a DNA chain into an aqueous solvent, stretching it in the solvent, or tuning the interaction between a nano-engine and its environment. There is no need to worry about the change of bath free energy, because it stays constant by construction.
All these convenient features follow from the particular decomposition of the total Hamiltonian, Eqs. (2.4). There are however some subtleties resulting from the temperature dependence of $H_X$, which will be discussed in Sec. V. We shall also give a detailed comparison between our theory and the previous theories by Seifert, by Hänggi and Talkner, and by Jarzynski in Sec. V.
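The entropy decomposition into a system part and a conditional bath part is the chain rule of Shannon entropy, and it holds for any joint distribution, not only for equilibrium ones. The following sketch checks it on a small discrete joint distribution; the table of probabilities is a hypothetical example and the discrete setting is an assumption made for brevity.

```python
import math

def shannon(probs):
    """Shannon entropy -sum p log p of a list of probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical joint distribution p(x, y): rows are values of X, columns values of Y.
joint = [[0.10, 0.25, 0.05],
         [0.30, 0.10, 0.20]]

p_x = [sum(row) for row in joint]                       # marginal of X
s_xy = shannon([p for row in joint for p in row])       # joint entropy S_XY
s_x = shannon(p_x)                                      # system entropy S_X

# Entropy of Y given each particular value X = x, then its average over X.
s_y_given_each_x = [shannon([p / px for p in row]) for row, px in zip(joint, p_x)]
s_y_given_x = sum(px * s for px, s in zip(p_x, s_y_given_each_x))

print(s_xy, s_x + s_y_given_x)
```

The list `s_y_given_each_x` plays the role of $S_{Y|X=x}$ and its average `s_y_given_x` the role of $S_{Y|X}$, which is the distinction in naming noted above.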

III. NON-EQUILIBRIUM THEORY
In this section, we shall show that with the HMF $H_X$ identified as the fluctuating internal energy, the weak coupling theory of stochastic thermodynamics becomes applicable in the strong coupling regime.

A. Definitions of energy and entropy
The mission of stochastic thermodynamics starts with definitions of system thermodynamic variables in general non-equilibrium situations. We define the fluctuating internal energy of the system as $H_X(x; \lambda, \beta)$, the HMF. The non-equilibrium internal energy is then defined as the ensemble average of $H_X$:
$$E_X[p_X] \equiv \int_x p_X(x)\, H_X(x; \lambda, \beta). \qquad (3.1)$$
Throughout this work we use $A[p_X]$ to denote a non-equilibrium thermodynamic variable, to distinguish it from the equilibrium version $A$. We also define the system entropy as the Gibbs-Shannon entropy:
$$S_X[p_X] \equiv -\int_x p_X(x) \log p_X(x). \qquad (3.2)$$
We shall not need to define stochastic entropy in this work. The non-equilibrium free energy of the system is also defined in the standard way:
$$F_X[p_X] \equiv E_X[p_X] - T S_X[p_X], \qquad (3.3)$$
which turns out to be the same as the free energy defined in several previous theories [5,6,8,14]. Note that these definitions of non-equilibrium energy, entropy, and free energy are identical to those in the weak coupling theory, with $H_X$ understood as the system Hamiltonian. For the equilibrium state $p_X = p^{\rm EQ}_X$, these thermodynamic variables reduce to their equilibrium counterparts, Eqs. (2.17b), (2.17c), and (2.17a) respectively.

B. Work and heat at trajectory level
Let us now discuss differential work and heat at trajectory level of system variables.
The Hamiltonian of the universe, including system, bath and super-bath, is given by
$$H_{XYZ} = H_{XY}(x, y; \lambda) + H_Z(z), \qquad (3.4)$$
with $H_{XY}$ given by Eqs. (2.4). We assume that the interaction between $XY$ and $Z$ is negligibly small but nonetheless strong enough to drive thermal equilibration between $XY$ and $Z$.
We consider a microscopic process with infinitesimal duration $dt$, during which $x, y, z$ and $\lambda$ change by $dx, dy, dz$ and $d\lambda$. Whereas $d\lambda$ is externally controlled, $dx, dy, dz$ are determined by the Hamiltonian dynamics. As is generally adopted in stochastic thermodynamics, work is defined as the change of the total energy of the universe:
$$d\mathcal{W}_{\rm micro} \equiv dH_{XYZ}. \qquad (3.5)$$
For now we shall assume that $x, y, z, \lambda$ are all smooth functions of $t$ [44]. We can then expand Eq. (3.5) in terms of $dx, \ldots, d\lambda$ up to first order. The coefficients are just the partial derivatives of $H_{XYZ}$ with respect to $x, \ldots, \lambda$. Now note that the universe $XYZ$ is thermally closed. Hence if $\lambda$ is fixed, $H_{XYZ}$ must be conserved. In other words, $H_{XYZ}$ can change only due to $\lambda$:
$$d\mathcal{W}_{\rm micro} = d_\lambda H_{XYZ} = d_\lambda H_X + d_\lambda H_I, \qquad (3.7)$$
where in the last equality we have used the fact that both $H_Y$ and $H_Z$ are independent of $\lambda$. Hence the microscopic work is independent of the state of the super-bath.
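Note that the microscopic work given by Eq. (3.7) depends on $x, y, \lambda, d\lambda$. In stochastic thermodynamics, we keep track of the dynamic evolution of $x$ but not of $y$. Hence to obtain the differential work at the trajectory level of system variables, we need to average Eq. (3.7) over the conditional equilibrium given by Eq. (2.8a):
$$d\mathcal{W} \equiv \int_y p^{\rm EQ}_{Y|X}(y|x) \left( d_\lambda H_X + d_\lambda H_I \right) = d_\lambda H_X + \int_y p^{\rm EQ}_{Y|X}(y|x)\, d_\lambda H_I. \qquad (3.8)$$
This equation and many analogous equations below are understood as volume integrals of differential forms. Be careful not to confuse the differential forms $d\mathcal{W}$, $d_\lambda H_X$, etc. with the volume measure $d^N y$, which is hidden in $\int_y$. Now, taking the $\lambda$ differential of Eq. (2.5), and further using Eq. (2.7), we find:
$$\int_y p^{\rm EQ}_{Y|X}(y|x)\, d_\lambda H_I = 0, \qquad (3.9)$$
so that
$$d\mathcal{W} = d_\lambda H_X(x; \lambda, \beta). \qquad (3.10)$$
Hence, even though the interaction Hamiltonian $H_I$ may be tuned externally, the work $d\mathcal{W}$ at trajectory level is nonetheless independent of $H_I$.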
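The result that the bath-averaged microscopic work reduces to the $\lambda$ differential of the HMF can be checked numerically. The sketch below assumes the toy model $H_Y = y^2/2$, $H^0_I = \lambda x y^2/2$ with a $\lambda$-independent bare system Hamiltonian (hypothetical choices), so that the $\lambda$ derivative of the total Hamiltonian is $x y^2/2$. The HMF derivative is taken by a central finite difference.

```python
import math

BETA = 2.0
T = 1.0 / BETA

def integrate(f, lo=-12.0, hi=12.0, n=4000):
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

def hmf_correction(x, lam):
    """H_X - H0_X from the defining log ratio of bath integrals."""
    num = integrate(lambda y: math.exp(-BETA * 0.5 * (1.0 + lam * x) * y * y))
    den = integrate(lambda y: math.exp(-BETA * 0.5 * y * y))
    return -T * math.log(num / den)

def averaged_microscopic_work_rate(x, lam):
    """Conditional-equilibrium average of dH_XY/dlambda = x*y^2/2."""
    # The x-dependent constant in H_Bath cancels after normalization.
    weight = lambda y: math.exp(-BETA * 0.5 * (1.0 + lam * x) * y * y)
    norm = integrate(weight)
    return integrate(lambda y: weight(y) * 0.5 * x * y * y) / norm

def hmf_work_rate(x, lam, eps=1e-5):
    """dH_X/dlambda by central finite difference."""
    return (hmf_correction(x, lam + eps) - hmf_correction(x, lam - eps)) / (2.0 * eps)

x, lam = 0.8, 0.5
lhs = averaged_microscopic_work_rate(x, lam)
rhs = hmf_work_rate(x, lam)
print(lhs, rhs)
```

Both numbers match the closed form $T x / [2 (1 + \lambda x)]$ of this toy model, even though the entire $\lambda$ dependence sits in the bare interaction.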
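Taking the differential of Eq. (3.4) and using Eq. (3.5), we obtain
$$dH_X = d\mathcal{W}_{\rm micro} - d(H_{\rm Bath} + H_Z), \qquad (3.11)$$
where $d\mathcal{W}_{\rm micro} \equiv dH_{XYZ}$ is the microscopic work of Eq. (3.5). As above, we take the average of Eq. (3.11) over fluctuations of $YZ$, which results in
$$dH_X = d\mathcal{W} + d\mathcal{Q}, \qquad (3.12)$$
$$d\mathcal{Q} \equiv -\langle d(H_{\rm Bath} + H_Z) \rangle_{YZ}, \qquad (3.13)$$
where $\langle \cdot \rangle_{YZ}$ means average over $YZ$, and $d\mathcal{Q}$ is the differential heat at the trajectory level of the system variables.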
Since $H_X$ is defined as the fluctuating internal energy, and $d\mathcal{W}$ is the work at the trajectory level, Eq. (3.12) can be interpreted as the first law at the trajectory level if $d\mathcal{Q} = -\langle d(H_{\rm Bath} + H_Z) \rangle_{YZ}$ is interpreted as the heat at the trajectory level. Equation (3.13) then says that the heat $d\mathcal{Q}$ is the negative of the average energy variation of the environment $YZ$. Such an interpretation of heat is exactly the same as that in weak coupling stochastic thermodynamics. But the differential of the fluctuating internal energy $dH_X$ can be written as the sum of $d_\lambda H_X$ and $d_x H_X$:
$$dH_X = d_\lambda H_X + d_x H_X. \qquad (3.14)$$
Comparing this with Eq. (3.12) we obtain an alternative expression for $d\mathcal{Q}$:
$$d\mathcal{Q} = d_x H_X, \qquad (3.15)$$
which must be equivalent to Eq. (3.13). It is tempting to rewrite $d_x H_X$ in terms of partial derivatives:
$$d_x H_X = \sum_i \frac{\partial H_X}{\partial x_i}\, dx_i. \qquad (3.16)$$
This is however valid only if $x(t)$ is differentiable so that $dx$ is linear in $dt$. In the limit of time-scale separation, we expect that a typical path of slow variables $x(t)$ becomes that of Brownian motion, which is everywhere continuous but non-differentiable. As a consequence, $dx(t)$ scales as $\sqrt{dt}$ (Ito's formula) and we need to expand $d_x H_X$ up to second order in $dx$, if the product in the RHS of Eq. (3.16) is defined in Ito's sense. We can also interpret the product in the RHS of Eq. (3.16) in Stratonovich's sense. Then Eq. (3.16) remains valid even if $x(t)$ is a typical path of Brownian motion. In this work, we shall not write $d_x H_X$ in terms of partial derivatives, so that we do not need to worry about the issue of stochastic calculus.
Note that the definitions of work and heat at trajectory level, Eqs. (3.10) and (3.15), are the same as those in the weak coupling theory.
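The remark on stochastic calculus made above for the heat at trajectory level can be illustrated on a discretized Brownian path. The sketch below assumes a one-dimensional quadratic $H(x) = x^2/2$ at fixed $\lambda$ (a hypothetical choice) and compares the first-order sum evaluated at the start of each step (Ito) with the one evaluated at the midpoint (Stratonovich).

```python
import random

def h(x):
    return 0.5 * x * x          # toy energy function at fixed lambda

def dh_dx(x):
    return x                    # its derivative

random.seed(0)
dt = 1e-3
path = [0.0]
for _ in range(5000):
    # Discretized Brownian motion: increments of size ~ sqrt(dt).
    path.append(path[-1] + random.gauss(0.0, dt ** 0.5))

steps = list(zip(path, path[1:]))
ito_sum = sum(dh_dx(a) * (b - a) for a, b in steps)
stratonovich_sum = sum(dh_dx(0.5 * (a + b)) * (b - a) for a, b in steps)
quadratic_variation = sum((b - a) ** 2 for a, b in steps)
delta_h = h(path[-1]) - h(path[0])

print(delta_h, stratonovich_sum, ito_sum + 0.5 * quadratic_variation)
```

For a quadratic function the midpoint sum reproduces the change of $H$ exactly, whereas the Ito sum misses the second-order term, half the accumulated squared increments, which does not vanish as $dt \to 0$ for a Brownian path.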

C. Work and heat at ensemble level
To obtain work and heat at ensemble level, we need to average the corresponding objects at trajectory level over the (generally out-of-equilibrium) statistical distribution of dynamic trajectories of $X$. This is a rather nontrivial task. Luckily, $d\mathcal{W}$ as given by Eq. (3.10) is independent of $dx$. Hence we do not need to know the pdf of $dx$, but only need to average Eq. (3.10) over the statistical distribution $p_X(x, t)$ at time $t$, and obtain the differential work $dW$ at ensemble level:
$$dW \equiv \int_x p_X(x, t)\, d_\lambda H_X. \qquad (3.17)$$
Now we want to take the ensemble average of the heat, Eq. (3.15), which does depend on $dx \equiv x(t + dt) - x(t)$, whose distribution is not encoded in the instantaneous distribution $p_X(x, t)$. A dynamic theory for $dx$, such as non-linear Langevin dynamics, would supply the necessary information. This route was pursued in the complementary work [39]. Here we take a detour by studying the average of $dH_X$. Let $p_X(x, t)$ and $p_X(x, t + dt)$ be the pdfs of $x$ at $t$ and at $t + dt$ respectively, and $dp_X(x, t)$ the differential of $p_X(x, t)$ as given by
$$dp_X(x, t) \equiv p_X(x, t + dt) - p_X(x, t). \qquad (3.18)$$
Let us calculate the differential of the internal energy as given by Eq. (3.1):
$$dE_X[p_X] = d \int_x p_X\, H_X. \qquad (3.19)$$
Since $x$ is integrated over in the RHS, the differential $d$ is due to changes of $\lambda$ and of $p_X(x, t)$:
$$dE_X[p_X] = \int_x p_X\, d_\lambda H_X + \int_x H_X\, dp_X. \qquad (3.20)$$
But the first term in the RHS is just the work at ensemble level, as defined in Eq. (3.17). Hence the second term must be the heat at ensemble level:
$$dQ \equiv \int_x H_X\, dp_X = \langle d\mathcal{Q} \rangle, \qquad (3.21)$$
and Eq. (3.20) becomes the first law at the ensemble level:
$$dE_X[p_X] = dW + dQ. \qquad (3.22)$$
The definitions of work and heat at ensemble level, Eqs. (3.17) and (3.21), are again the same as those in the weak coupling theory of stochastic thermodynamics.
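The splitting of the change of mean energy into an ensemble-level work and an ensemble-level heat can be sketched for a system with a few discrete states. The three energy levels and the two distributions below are hypothetical, and sums replace the integrals over $x$; the heat term is evaluated with the updated energy levels so that the finite-step identity is exact rather than first order.

```python
def energy_levels(lam):
    """Hypothetical three-state H_X(x; lam)."""
    return [0.0, lam, 2.0 * lam * lam]

def mean_energy(p, lam):
    return sum(pi * hi for pi, hi in zip(p, energy_levels(lam)))

def ensemble_work_and_heat(p0, p1, lam0, lam1):
    """Finite-step analogues of dW = sum_x p_X d_lambda H_X and dQ = sum_x H_X dp_X."""
    h0, h1 = energy_levels(lam0), energy_levels(lam1)
    work = sum(pi * (b - a) for pi, a, b in zip(p0, h0, h1))
    heat = sum(hb * (qi - pi) for hb, pi, qi in zip(h1, p0, p1))
    return work, heat

p_before = [0.5, 0.3, 0.2]
p_after = [0.45, 0.33, 0.22]
lam_before, lam_after = 1.0, 1.1

work, heat = ensemble_work_and_heat(p_before, p_after, lam_before, lam_after)
delta_e = mean_energy(p_after, lam_after) - mean_energy(p_before, lam_before)
print(work, heat, delta_e)
```

Only the instantaneous distribution enters the work term, while the heat term needs the change of the distribution, in line with the discussion above.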

IV. PHYSICAL MEANINGS OF HEAT
In this section, we shall establish the connections between heat (both at trajectory level and at ensemble level) and entropy change of the environment, conditioned on the system state and possibly other thermodynamic variables. We shall also discuss the physical meanings of Clausius inequality and total entropy production. The results are again the same as those in the weak coupling theory, with the conditioning of slow variables properly taken into account.

A. Heat at trajectory level
The universe $XYZ$ is thermally closed, and evolves according to Hamiltonian dynamics with the Hamiltonian given by Eq. (3.4). Due to TSS, with $x$ fixed, the environment $YZ$ is described by a micro-canonical ensemble with fixed energy. We define the Boltzmann entropy of the environment as a function of its energy $E_{YZ}$:
$$S_{YZ}(E_{YZ}) \equiv \log \Omega_{YZ}(E_{YZ}), \qquad (4.1)$$
where the bath Hamiltonian $H_{\rm Bath}$ is defined in Eq. (2.4b), and $\Omega_{YZ}(E_{YZ})$ is the area of the $YZ$ hyper-surface with constant environment energy $H_{\rm Bath} + H_Z = E_{YZ}$. Note that $S_{YZ}(E_{YZ})$ generally also depends on $x, \lambda, \beta$ parametrically through $H_{\rm Bath}$. We shall however not explicitly display the parameters $x, \lambda, \beta$, in order not to make the notations too cluttered. Strictly speaking, $S_{YZ}(E_{YZ})$ is the Boltzmann entropy of the environment conditioned on $X = x$. Suppose in the initial state the system is at $x$ with external parameter $\lambda$, and the universe $XYZ$ has total energy $E_{XYZ}$. The energy of the environment is then $E_{YZ} = E_{XYZ} - H_X$. In the final state the system is at $x + dx$ with external parameter $\lambda + d\lambda$, and the universe has total energy $E_{XYZ} + d\mathcal{W}$, with $d\mathcal{W}$ given by Eq. (3.10). (Recall that the work is defined as the change of total energy.) The energy of the environment in the final state is then $E'_{YZ} = E_{XYZ} + d\mathcal{W} - H_X - dH_X$, where $dH_X$ is given by Eq. (3.14). The Boltzmann entropies of the environment in the initial and final states are hence respectively:
$$S_{YZ}(E_{YZ}) = S_{YZ}(E_{XYZ} - H_X), \qquad (4.2a)$$
$$S_{YZ}(E'_{YZ}) = S_{YZ}(E_{XYZ} + d\mathcal{W} - H_X - dH_X). \qquad (4.2b)$$
Note that $E_{XYZ}$ is much larger than $dH_X$ and $d\mathcal{W}$, because the size of the super-bath is much larger than that of $XY$. Expanding Eq. (4.2b) in terms of $d\mathcal{W}$ and $dH_X$ and subtracting from it Eq. (4.2a), we obtain:
$$dS_{YZ}(E_{YZ}) \equiv S_{YZ}(E'_{YZ}) - S_{YZ}(E_{YZ}) = \beta \left( d\mathcal{W} - dH_X \right), \qquad (4.3)$$
where $\beta = \partial S_{YZ}/\partial E_{YZ}$ is the inverse temperature. Further using Eq. (3.15), we find
$$-\beta\, d\mathcal{Q} = dS_{YZ}(E_{YZ}), \qquad (4.4)$$
which establishes the connection between the differential heat $d\mathcal{Q}$ at the level of the system trajectory and the differential of the environment Boltzmann entropy $dS_{YZ}(E_{YZ})$ conditioned on $X = x$.

B. Heat at ensemble level, and total entropy production
Recall that $XY$ is in contact with a much larger super-bath $Z$, and that $Y$ is always in conditional equilibrium. If the system is in a non-equilibrium state $p_X(x)$, the joint pdf of $XY$ is given by
$$p_{XY}(x, y) = p_X(x)\, p^{\rm EQ}_{Y|X}(y|x), \qquad (4.5)$$
where $p^{\rm EQ}_{Y|X}(y|x)$ is given in Eq. (2.8a). The non-equilibrium free energy of the system is already defined in Eq. (3.3). Let us similarly define the non-equilibrium free energy of the combined system $XY$:
$$F_{XY}[p_{XY}] \equiv \int_{x, y} p_{XY} \left( H_{XY} + T \log p_{XY} \right). \qquad (4.6)$$
For $XY$, there is no difference between the Hamiltonian and the Hamiltonian of mean force, since $XY$ is in weak interaction with $Z$. Substituting Eq. (4.5) into Eq. (4.6), and using Eqs. (2.4b) and (2.8), we obtain
$$F_{XY}[p_{XY}] = F_X[p_X] + F_Y(\beta), \qquad (4.7)$$
which says that $F_{XY}[p_{XY}]$ and $F_X[p_X]$ differ only by an additive constant $F_Y(\beta)$, which is, according to Eq. (2.7), independent of $\lambda$ and $x$, and hence need not be worried about when we study non-equilibrium processes. Equation (4.7) is a non-equilibrium generalization of Eq. (2.18a).
Let us now consider variations of $\lambda$ and $p_X$, and study the resulting variation of the free energies. Taking the differential of Eq. (3.3), we obtain:
$$dF_X[p_X] = dW + dQ - T\, dS_X[p_X], \qquad (4.8)$$
where $dW, dQ$ are the work and heat at ensemble level, given respectively in Eqs. (3.17) and (3.21). We can rewrite this result as
$$dS_X[p_X] - \beta\, dQ = \beta \left( dW - dF_X[p_X] \right). \qquad (4.9)$$
We can also do the same for $dF_{XY}[p_{XY}]$, and obtain an analogous result:
$$dS_{XY}[p_{XY}] - \beta\, dQ_{XY} = \beta \left( dW_{XY} - dF_{XY}[p_{XY}] \right), \qquad (4.10)$$
where $dW_{XY}, dQ_{XY}$ are the work and heat at ensemble level of $XY$:
$$dW_{XY} = \int_{x, y} p_{XY}\, d_\lambda H_{XY}, \qquad (4.11)$$
$$dQ_{XY} = \int_{x, y} dp_{XY}\, H_{XY}. \qquad (4.12)$$
Using Eqs. (4.5) and (3.9) in Eq. (4.11), we see that
$$dW_{XY} = dW. \qquad (4.13)$$
Taking the differential of Eq. (4.7) we find
$$dF_{XY}[p_{XY}] = dF_X[p_X]. \qquad (4.14)$$
Combining the preceding two equations with Eqs. (4.9) and (4.10), we find
$$dS_{XY}[p_{XY}] - \beta\, dQ_{XY} = dS_X[p_X] - \beta\, dQ. \qquad (4.15)$$
Now recall that $XY$ is weakly coupled to the super-bath $Z$, and hence the weak coupling theory of stochastic thermodynamics is applicable. It tells us that $dS_{XY}[p_{XY}] - \beta\, dQ_{XY}$ is non-negative and can be interpreted as the change of total entropy of the universe $XYZ$. Equation (4.15) then says that the total entropy production is the same, whether we calculate it using the dynamic theory of $XY$ or using the reduced theory of $X$ alone. If we understand the dynamic theory of $X$ as a consequence of coarse-graining of the $XY$ dynamics, then Eq. (4.15) says that entropy production is invariant under coarse-graining, as long as the fast variables remain in conditional equilibrium. A similar result was obtained by Esposito [41] in the setting of master equation dynamics.
Furthermore, assuming that $XY$ evolves according to Langevin dynamics (which follows if the dynamics of $Z$ is much faster than that of $XY$), the Clausius inequality $dS_{XY}[p_{XY}] - \beta\, dQ_{XY} \geq 0$ can be proved using the Langevin dynamics. Hence we have
$$dS_{XY}[p_{XY}] - \beta\, dQ_{XY} = dS_{XYZ} \geq 0. \qquad (4.16)$$
Combining Eq. (4.16) with Eq. (4.15), we finally obtain
$$dS_X[p_X] - \beta\, dQ = dS_{XYZ} \geq 0, \qquad (4.17)$$
which not only establishes the Clausius inequality, but also says that the physical meaning of $dS_X[p_X] - \beta\, dQ$ is indeed the variation of the total entropy of the universe. It is interesting to rewrite Eq. (4.17) as
$$-\beta\, dQ = dS_{XYZ} - dS_X[p_X] = dS_{YZ|X}. \qquad (4.18)$$
Hence $-\beta\, dQ$ is the differential of $S_{YZ|X}$, the conditional Gibbs-Shannon entropy of $YZ$ given the system state $X$.
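The Clausius inequality at ensemble level can be illustrated with a much simpler dynamics than the Langevin equation used in the text. The sketch below assumes, purely for illustration, a three-state discrete-time Markov chain with Metropolis transition probabilities at fixed $\lambda$, whose stationary distribution is canonical with respect to hypothetical energy levels standing in for $H_X$. For each step it evaluates the change of Gibbs-Shannon entropy minus $\beta$ times the ensemble heat.

```python
import math

BETA = 1.0
ENERGIES = [0.0, 0.7, 1.5]       # hypothetical H_X values of three coarse states

def metropolis_matrix(energies, beta):
    """Row-stochastic matrix obeying detailed balance w.r.t. exp(-beta*H)."""
    n = len(energies)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                accept = min(1.0, math.exp(-beta * (energies[j] - energies[i])))
                P[i][j] = accept / (n - 1)      # uniform proposal among other states
        P[i][i] = 1.0 - sum(P[i])
    return P

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

def step(p, P):
    n = len(p)
    return [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]

P = metropolis_matrix(ENERGIES, BETA)
p = [0.05, 0.15, 0.80]           # far-from-equilibrium initial distribution
productions = []
for _ in range(20):
    q = step(p, P)
    heat = sum(h * (qj - pj) for h, qj, pj in zip(ENERGIES, q, p))   # sum_x H_X dp_X
    productions.append(entropy(q) - entropy(p) - BETA * heat)        # dS_X - beta*dQ
    p = q

print(productions[:3], p)
```

At fixed $\lambda$ each entry of `productions` equals the decrease of the relative entropy between the current distribution and the canonical one, which is why it cannot be negative for this kind of chain; the entries shrink towards zero as the distribution relaxes.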

V. COMPARISON WITH OTHER THEORIES
In this section, we provide a detailed comparison between the present work and several previous influential works on strong coupling thermodynamics. First of all, we list all major formulas of our theory in the central column of Table I. These formulas are identical to those of the weak coupling stochastic thermodynamic theory, with $H_X$ understood as the Hamiltonian of mean force. In the weak coupling limit, $H_X$ simply becomes the bare Hamiltonian of the system.
In the theory developed by Seifert [5] and critically evaluated by Talkner and Hänggi [6], the equilibrium free energy of a strongly coupled system is defined in terms of the HMF $H_X$ as
$$F_X = -T \log \int_x e^{-\beta H_X}, \qquad (5.1)$$
which is the same as Eq. (2.17a). The equilibrium internal energy and entropy are defined as:
$$\tilde{E}_X \equiv \frac{\partial (\beta F_X)}{\partial \beta}, \qquad \tilde{S}_X \equiv \beta^2 \frac{\partial F_X}{\partial \beta}, \qquad (5.2)$$
such that $F_X = \tilde{E}_X - T \tilde{S}_X$ remains valid. (We use $\tilde{A}$ to denote a thermodynamic quantity in Seifert's theory if it is different from the corresponding quantity $A$ in our theory.) Note that in our theory, energy and entropy are defined by Eqs. (2.17).
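In view of the results obtained in Sec. II C, the following thermodynamic relations hold for $XY$:
$$F_{XY} = -T \log Z_{XY}, \qquad E_{XY} = \frac{\partial (\beta F_{XY})}{\partial \beta}, \qquad S_{XY} = \beta^2 \frac{\partial F_{XY}}{\partial \beta}.$$
The free energy, energy, and entropy of the bath are then defined as
$$\tilde{F}_Y \equiv F_{XY} - F_X = F^0_Y, \qquad \tilde{E}_Y \equiv E_{XY} - \tilde{E}_X = E^0_Y, \qquad \tilde{S}_Y \equiv S_{XY} - \tilde{S}_X = S^0_Y,$$
where $F^0_Y$ is the free energy of the bare bath, with the interaction switched off, and similarly for $E^0_Y$ and $S^0_Y$. These results show that in Seifert's theory, the interaction energy and correlation are completely relegated to the system. By contrast, in our theory, the interaction energy and correlation are completely relegated to the bath, if we interpret $H_X$ as the system Hamiltonian.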
Seifert further bootstraps Eqs. (5.2) to the nonequilibrium case, thereby defining the fluctuating internal energy $\tilde H$, the nonequilibrium internal energy $\tilde E[p_X]$, and the nonequilibrium entropy $\tilde S[p_X]$. The differential of the entropy is then given by Eq. (5.7). The nonequilibrium free energy is defined in the same way as in our theory, Eq. (3.3).
The work at the trajectory level and at the ensemble level is defined in terms of the change of the total energy, identically to our definitions. The heat at the trajectory level is then defined so as to satisfy the first law; cf. Eq. (5.12). The LHS of the Clausius inequality can then be calculated, see Eq. (5.13), and is again the same as in our theory. As a consequence, the first and second laws of thermodynamics in Seifert's theory are equivalent to those in our theory. This means that the two theories are equivalent to each other, even though they use different definitions of internal energy, entropy, and heat. Major formulas of Seifert's theory are displayed in the right column of Table I.
Table I, Jarzynski equality row: $\langle e^{-\beta W} \rangle = e^{-\beta \Delta F_X}$, in both the present theory and Seifert's theory.
Hänggi and Talkner [6,8] accept the definitions of equilibrium thermodynamic quantities, Eqs. (5.2). Yet they argue that the nonequilibrium thermodynamic quantities cannot be uniquely determined from their equilibrium versions, which is of course valid. They also argue that the Hamiltonian of mean force cannot be uniquely determined from the equilibrium distribution of system variables alone [45]. They further discuss more serious ambiguities associated with the definition of nonequilibrium work for quantum systems.
Jarzynski [7] develops a more comprehensive (and hence more complex) theory of strong coupling thermodynamics, and systematically discusses the definitions of internal energy, entropy, volume, pressure, enthalpy, and Gibbs free energy. Using a pebble immersed in a liquid as a metaphor, he builds his formalism around the concept of volume, whose definition is somewhat arbitrary. All other thermodynamic variables are uniquely fixed by thermodynamic consistency once the system volume is (arbitrarily) defined. Jarzynski further shows that Seifert's theory is a special case of his framework, i.e., the "partial molar representation". He also discusses in great detail the "bare representation", where the system enthalpy coincides with the HMF. The total entropy production is, however, the same in both representations. The heat and work in the bare representation are formally identical to those in our theory. We note that for many small systems, volume or pressure is seldom controlled. It is then unnecessary to distinguish energy from enthalpy, or Helmholtz free energy from Gibbs free energy.
In all works discussed above, the interaction Hamiltonian $H_I$ is assumed to be independent of the external parameter $\lambda$, whereas time-scale separation is not assumed. As a consequence, it is possible to prove the integrated Clausius inequality $\Delta S - \beta Q \ge 0$ for a finite process, but not the differential Clausius inequality $dS - \beta\, dQ \ge 0$ for every infinitesimal step of the process. Barring the issues of TSS and of the $\lambda$ dependence of the interaction Hamiltonian $H_I$, our theory can be understood as a simplification of Jarzynski's bare representation, with the HMF and free energy playing the roles of enthalpy and Gibbs free energy.
Strasberg and Esposito [14] study the consequences of TSS in the settings of both master equation theory and Hamiltonian dynamics. For master equation theory, using the conditional equilibrium nature of the fast variables, they show that a reduced theory of slow variables can be derived once the fast variables are averaged out. Note, however, that the heat and internal energy in their reduced theory pertain to the original system consisting of both slow and fast variables; see Eqs. (33)-(35) of Ref. [14]. As a consequence, these quantities do not have a finite limit as the dimension of the fast variables goes to infinity. For Hamiltonian dynamics, Strasberg and Esposito propose a definition of the total entropy production as a relative entropy, and show that, with TSS, it is equivalent to that in Seifert's theory, which is in turn equivalent to the entropy production in our theory, as we have demonstrated in Eq. (5.13). In this way, they confirm the consistency of Seifert's strong coupling theory.
By contrast, in the present work, we use TSS to carry out a different decomposition of the Hamiltonian, as discussed in Sec. II A. This leads to a remarkable situation where all formulas of the weak coupling theory of stochastic thermodynamics remain applicable even in the strong coupling regime. These formulas are significantly simpler than those in Seifert's strong coupling theory. For a comparison, see Table I.
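As one concrete instance of a weak-coupling formula carrying over, the Jarzynski equality can be checked exactly on a strongly coupled toy model by enumeration. The sketch below is an illustration under stated assumptions, not the paper's construction: the discrete model, the $\lambda$-dependent coupling `(1 + lam) * g`, and the function names are hypothetical; the HMF is normalized by the bare bath; the protocol is a single sudden switch of $\lambda$ with the system variable held fixed, so that the work is $W = H_X(x, \lambda_1) - H_X(x, \lambda_0)$; and $k_B = 1$.

```python
import numpy as np

beta = 1.0
H_B = np.array([0.0, 0.4, 1.1])                 # hypothetical bare bath energies
g = np.array([[0.5, -0.8, 0.3],
              [-0.2, 0.9, -1.0],
              [0.7, 0.1, -0.5]])                # hypothetical coupling g[x, y]

def total_H(lam):
    """Total Hamiltonian H(x, y; lam) with a lam-dependent interaction."""
    H_S = lam * np.array([0.0, 1.0, 2.0])
    H_I = (1.0 + lam) * g
    return H_S[:, None] + H_B[None, :] + H_I

def hmf(lam):
    """Hamiltonian of mean force H_X(x; lam), normalized by the bare bath."""
    Z_B0 = np.sum(np.exp(-beta * H_B))
    return -np.log(np.sum(np.exp(-beta * total_H(lam)), axis=1) / Z_B0) / beta

def free_energy(lam):
    """Equilibrium free energy F_X(lam) built from the HMF."""
    return -np.log(np.sum(np.exp(-beta * hmf(lam)))) / beta

lam0, lam1 = 0.5, 1.5
# Initial reduced equilibrium distribution of the system variable.
p_X = np.exp(-beta * (hmf(lam0) - free_energy(lam0)))
# Work of the sudden switch, written exactly as in weak-coupling theory.
W = hmf(lam1) - hmf(lam0)

dF = free_energy(lam1) - free_energy(lam0)
lhs = np.sum(p_X * np.exp(-beta * W))           # <exp(-beta W)>
rhs = np.exp(-beta * dF)                        # exp(-beta Delta F_X)
mean_W = np.sum(p_X * W)

print("<exp(-beta W)>       :", lhs)
print("exp(-beta Delta F_X) :", rhs)
print("<W> - Delta F_X      :", mean_W - dF)
```

The two sides agree to machine precision, and the mean work is bounded below by the free energy difference, as required by the second law.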
The differences between the present theory and Seifert's theory are, however, not merely notational. Consider a "fast" slow process of duration $dt$ during which $\lambda$ changes by $d\lambda$. It is slow enough that the bath remains in conditional equilibrium, so that our stochastic thermodynamic theory remains applicable, yet fast enough that the distribution $p_X$ barely changes. Such a process can always be realized if TSS is satisfied. Hence we have $d_\lambda H_X \neq 0$ but $dp_X = 0$. According to the present theory, both $dS_X$ and $dQ$ then vanish, and hence the variation of the total entropy $dS_X - \beta\, dQ$ also vanishes. In Seifert's theory, by contrast, $d\tilde S_X$ and $d\tilde Q$ are given respectively by Eqs. (5.7) and (5.12). Neither of the two vanishes even if $dp_X = 0$, yet the variation of the total entropy $d\tilde S_X - \beta\, d\tilde Q$ does vanish. This means that in Seifert's theory there is an exchange of entropy between the system and the bath even though $p_X$ remains unchanged. While this does not violate the second law of thermodynamics, it does contradict the common intuition of entropy as a measure of the multitude of system states: it would be very strange if the pdf of the system variables stayed unchanged while the system entropy changed. From this perspective, the present theory is more natural and intuitive.
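The cancellation in this argument can be written out in a few lines. The sketch below assumes that Seifert's nonequilibrium energy and entropy take their familiar forms in terms of $\partial_\beta H_X$; these forms are our reading and are not copied from the equations of this section. Tildes mark Seifert's quantities and $k_B = 1$.

```latex
% Assumed Seifert-type definitions and the ensemble-level work:
\begin{align}
\tilde E[p_X] &= \int_x p_X \left( H_X + \beta\, \partial_\beta H_X \right), \\
\tilde S[p_X] &= \int_x p_X \left( -\ln p_X + \beta^2\, \partial_\beta H_X \right), \\
dW &= \int_x p_X\, d_\lambda H_X .
\end{align}
% "Fast" slow step: dp_X = 0 while d_\lambda H_X \neq 0.
\begin{align}
d\tilde Q &= d\tilde E - dW = \beta \int_x p_X\, \partial_\beta \left( d_\lambda H_X \right), \\
d\tilde S_X &= \beta^2 \int_x p_X\, \partial_\beta \left( d_\lambda H_X \right), \\
d\tilde S_X - \beta\, d\tilde Q &= 0 .
\end{align}
% Present theory, same step:
\begin{align}
dS_X = -\int_x dp_X \ln p_X = 0, \qquad dQ = \int_x H_X\, dp_X = 0 .
\end{align}
```

Under these assumptions, $d\tilde S_X$ and $d\tilde Q$ are nonzero whenever $d_\lambda H_X$ depends on temperature, while their combination cancels identically, in line with the statement in the text.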

VI. CONCLUSION
In this work, we have demonstrated that the usual theory of thermodynamics and stochastic thermodynamics, which is based on the assumption of weak coupling between the system and its environment, can be made applicable in the strong coupling regime, if we define the Hamiltonian of mean force as the system Hamiltonian. Our results are consistent with previous theories by various authors, in the sense that the first and second laws in the different theories are mathematically equivalent. Overall, the present work can be understood as a reinterpretation, synthesis, and simplification of various previous theories of strong coupling stochastic thermodynamics.
In a future work, we will conduct a systematic study of the coarse-graining process, i.e., integrating out the fast variables to obtain an effective dynamic theory for the slow variables, with the ratio of time scales between the slow and fast variables treated as a small parameter. If this ratio is small but nonzero, the fast variables should deviate slightly from conditional equilibrium. We shall analyze how this deviation modifies the dissipation in the dynamics of the slow variables. We shall also extend our theory to the quantum case, and develop a thermodynamic theory for small open quantum systems strongly coupled to their environment.
X.X. acknowledges support from NSFC grant #11674217 as well as the Shanghai Municipal Science and Technology Major Project (Grant No. 2019SHZDZX01). Z.C.T. acknowledges support from NSFC grant #11675017.
X.X. is also thankful for additional support from a Shanghai Talent Program.