Entropy production and thermodynamics of information under protocol constraints

We investigate bounds on the entropy production (EP) and extractable work involved in transforming a system from some initial distribution $p$ to some final distribution $p'$, given the driving protocol constraint that the dynamical generators belong to some fixed set. We first show that, for any operator $\phi$ over distributions that (1) obeys the Pythagorean theorem from information geometry and (2) commutes with the set of available dynamical generators, the contraction of KL divergence $D(p \| \phi(p)) - D(p' \| \phi(p'))$ provides a non-negative lower bound on EP. We also derive a bound on extractable work, as well as a decomposition of the non-equilibrium free energy into an "accessible free energy" (which can be extracted as work) and "inaccessible free energy" (which must be dissipated as EP). We use our general results to derive bounds on EP and work that reflect symmetry, modularity, and coarse-graining constraints. We also use our results to decompose the information acquired in a measurement of a system into "accessible information" (which can be used to extract work from the system) and "inaccessible information" (which cannot be used to extract work from the system). Our approach is demonstrated on several examples, including different kinds of Szilard boxes and discrete-state master equations.


A. Background
One of the foundational issues in thermodynamics is quantifying how much work is required to transform a system between two thermodynamic states. Recent results in statistical physics have derived general bounds on work which hold even for transformations between non-equilibrium states [1,2]. In particular, suppose one wishes to transform a system with initial distribution $p$ and energy function $E$ to some final distribution $p'$ and energy function $E'$. For an isothermal process, during which the system remains in contact with a single heat bath at inverse temperature $\beta$, the work extracted during this transformation obeys
$$W(p \to p') \le F(p, E) - F(p', E'), \tag{1}$$
where $F(p, E) := \langle E \rangle_p - \beta^{-1} S(p)$ is the non-equilibrium free energy [1-3]. The inequality Eq. (1) comes from the second law of thermodynamics, which states that entropy production (EP), the total increase of the entropy of the system and all coupled reservoirs, is non-negative. For an isothermal process, the EP generated in carrying out the transformation $p \to p'$ is proportional to the remainder of the inequality in Eq. (1),
$$\Sigma(p \to p') = \beta \left[ F(p, E) - F(p', E') - W(p \to p') \right] \ge 0. \tag{2}$$
To extract work from a system, one must apply a driving protocol to the system, which we formalize as a time-dependent trajectory of dynamical generators (e.g., rate matrices for discrete-state systems, or Fokker-Planck operators for continuous-state systems). There are many different protocols that can be used to transform some initial distribution $p$ to some final distribution $p'$, which will generally incur different amounts of EP and work. Saturating the fundamental bounds set by the second law, such as Eq. (1), typically requires idealized driving protocols, which make use of arbitrary energy functions, infinite timescales, etc. In many real-world scenarios, however, there are strong practical constraints that make such idealized driving protocols unavailable.
Figure 1. A two-dimensional Szilard box with a single particle, where a vertical partition (blue) can be positioned at different horizontal locations in the box. For this setup, we demonstrate that only information about the particle's horizontal position, not its vertical position, can be used to extract work from the system.
The goal of this paper is to derive stronger bounds on EP and work, which arise in the presence of constraints on the driving protocols. We formalize constraints by assuming that, during the entire protocol taking $p \to p'$, the dynamical generators belong to some limited set of available generators. Deriving stronger bounds on EP and work may provide new insights into various real-world thermodynamic processes and work-harvesting devices, ranging from biological organisms to artificial engines. It may also cast new light on some well-studied scenarios in statistical physics.
For example, consider a two-dimensional Szilard box connected to a heat bath, which contains a single Brownian particle and a vertical partition [4], as shown in Fig. 1. We assume that the horizontal position of the vertical partition can be manipulated by the driving protocols. Imagine that the particle is initially located in the left half of the box. How much work can be extracted by transforming this initial distribution to a uniform final distribution, assuming the system begins and ends with a uniform energy function? A simple application of Eq. (1) shows that the extractable work is bounded by $\beta^{-1} \ln 2$. This upper bound can be achieved by quickly moving the vertical partition to the middle of the box, and then slowly expanding it rightward. Now imagine an alternative scenario, in which the particle is initially located in the top half of the box. By Eq. (1), the work that can be extracted by bringing this initial distribution to a uniform final distribution is again bounded by $\beta^{-1} \ln 2$. Intuitively, however, it seems that this bound should not be achievable, given the constrained set of available protocols (i.e., given that one can only manipulate the system by moving the vertical partition). Our results will make this intuition rigorous for the two-dimensional Szilard box, as well as various other systems that can only be manipulated by a constrained set of driving protocols.
arXiv:2008.10764v1 [cond-mat.stat-mech] 25 Aug 2020
This phenomenon also occurs when the starting and ending distributions can depend on the outcome of a measurement of the system (a setup which is usually called "feedback-control" in statistical physics [2,5]). Imagine that the state of some system $X$ is first measured using some (generally noisy) observation apparatus, producing measurement outcome $m$, after which the system undergoes a driving protocol which can depend on $m$. Let $p_{X|m}$ indicate the initial distribution over the states of the system conditioned on measurement outcome $m$, and let $p'_{X'|m}$ indicate the corresponding distribution at the end of the driving protocol. Assuming that the system's energy function begins as $E$ and ends as $E'$ for all measurement outcomes, the average work that can be extracted from the system is bounded by
$$\langle W \rangle \le \left\langle F(p_{X|m}, E) - F(p'_{X'|m}, E') \right\rangle, \tag{3}$$
where we used Eq. (1) and $\langle \cdot \rangle$ indicates the expectation over measurement outcomes $m$, which occur with probability $p_m$. Writing $p = \sum_m p_m p_{X|m}$ and $p' = \sum_m p_m p'_{X'|m}$ for the marginal initial and final distributions, this bound can be rewritten as
$$\langle W \rangle \le F(p, E) - F(p', E') + \beta^{-1} \left[ I(X; M) - I(X'; M) \right], \tag{4}$$
where $I(X; M)$ and $I(X'; M)$ indicate the mutual information between the measurement and the state of the system at the beginning and at the end of the protocol. By Eq. (4), the bound on average extractable work increases with the drop of mutual information. This relationship between work and information is a prototypical example of the so-called "thermodynamics of information" [2]. Just like Eq. (1), the bound of Eq. (4) is typically achieved by idealized protocols, which have access to arbitrary energy functions, infinite timescales, etc. As mentioned above, in the real world there are typically constraints on the available protocols, in which case the bound of Eq. (4) may not be achievable.
As an example, consider again the Szilard box shown in Fig. 1. Imagine measuring a bit of information about the location of the particle, and then using this information to extract work while driving the system back to a uniform equilibrium distribution, so that $I(X; M) = \ln 2$ and $I(X'; M) = 0$. If the system starts and ends with the uniform energy function, then Eq. (4) states that $\langle W \rangle \le \beta^{-1} \ln 2$. Intuitively, however, it seems that measuring the particle's horizontal position should be useful for extracting work from the system, while measuring the particle's vertical position should not be useful. The general bound of Eq. (4) does not distinguish between these two kinds of measurements. In fact, this bound depends only on the overall amount of information acquired by the measurement, and is therefore completely insensitive to the content of that information.
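As a minimal numerical sketch of the point above, the following Python snippet evaluates the right-hand side of Eq. (1) for the two Szilard-box scenarios. It assumes, purely for illustration, that the box is coarse-grained into four equal cells and that $\beta = 1$; the function names and the cell ordering are our own choices, not part of the original analysis.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)

def free_energy(p, energy, beta=1.0):
    """Non-equilibrium free energy F(p, E) = <E>_p - S(p) / beta."""
    mean_energy = sum(q * e for q, e in zip(p, energy))
    return mean_energy - entropy(p) / beta

def work_bound(p_init, p_final, e_init, e_final, beta=1.0):
    """Right-hand side of Eq. (1): F(p, E) - F(p', E')."""
    return free_energy(p_init, e_init, beta) - free_energy(p_final, e_final, beta)

# Assumed coarse-graining; cells are ordered as
# (left, top), (left, bottom), (right, top), (right, bottom).
flat_energy = [0.0, 0.0, 0.0, 0.0]
uniform = [0.25, 0.25, 0.25, 0.25]
left_half = [0.5, 0.5, 0.0, 0.0]  # particle known to be in the left half
top_half = [0.5, 0.0, 0.5, 0.0]   # particle known to be in the top half

bound_left = work_bound(left_half, uniform, flat_energy, flat_energy)
bound_top = work_bound(top_half, uniform, flat_energy, flat_energy)
print(bound_left, bound_top, math.log(2))
```

Both scenarios give the same value, $\beta^{-1} \ln 2$, which illustrates that the second-law bound by itself cannot tell the horizontal case from the vertical one.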

B. Summary of results
In this paper we derive bounds on extractable work and EP which arise when the transformation $p \to p'$ is carried out by a constrained driving protocol. As mentioned, we formalize such constraints by assuming that at all times $t$, the generator of the dynamics is in some fixed set $\Lambda$. We use the term (constrained) driving protocol to refer to a trajectory of time-dependent dynamical generators $L(t)$, such that $L(t) \in \Lambda$ at all $t$.
Below, we begin by deriving several general results. After that, we use those results to analyze EP and work bounds for specific types of constraints, such as constraints on symmetry, modularity, and coarse-graining of the dynamical generators. For simplicity of presentation, in the main text we focus entirely on isothermal protocols. We discuss how our results apply to more general types of protocols in Section VIII (and in Appendix A).

Overview of general results
Our general results begin by assuming that for a given set of constraints $\Lambda$, one can define an operator $\phi$ over distributions that (1) obeys the so-called Pythagorean theorem from information geometry [6] and (2) commutes with all available dynamical generators in $\Lambda$. Given an operator $\phi$ that satisfies these conditions, we derive several useful decompositions and bounds on the EP and work involved in transforming $p \to p'$ under constraints.
First, we show that any constrained driving protocol $L(t)$ that maps $p \to p'$ must also map the initial distribution $\phi(p)$ to the final distribution $\phi(p')$, and that the corresponding EP obeys
$$\Sigma(p \to p') = \left[ D(p \| \phi(p)) - D(p' \| \phi(p')) \right] + \Sigma(\phi(p) \to \phi(p')), \tag{5}$$
where $D(\cdot \| \cdot)$ is the Kullback-Leibler (KL) divergence. In words, the EP incurred when carrying out the transformation $p \to p'$ is equal to the contraction of KL divergence between $p$ and $\phi(p)$ from the beginning to the end of the protocol, plus the EP incurred by the same protocol when carrying out the transformation $\phi(p) \to \phi(p')$. Given the non-negativity of EP, Eq. (5) implies the following lower bound:
$$\Sigma(p \to p') \ge D(p \| \phi(p)) - D(p' \| \phi(p')), \tag{6}$$
which is our first main result. This bound is shown schematically in Fig. 2.
Figure 2. We define an operator $\phi$ such that the EP incurred by any constrained protocol that transforms $p \to p'$ (solid gray line) is equal to the EP incurred by that protocol when transforming $\phi(p) \to \phi(p')$ (dashed gray line), plus the contraction of the KL divergence $D(p \| \phi(p)) - D(p' \| \phi(p'))$ (contraction of green lines). We show that this contraction of KL divergence provides a non-negative lower bound on EP.
We then derive the following decomposition of the non-equilibrium free energy,
$$F(p, E) = F(\phi(p), E) + \beta^{-1} D(p \| \phi(p)). \tag{7}$$
Applying this decomposition both at the beginning and end of the protocol, and combining with Eqs. (2) and (6), leads to the following upper bound on the work that can be extracted by transforming $p \to p'$:
$$W(p \to p') \le F(\phi(p), E) - F(\phi(p'), E'). \tag{8}$$
Negating both sides of this inequality gives a lower bound on $-W(p \to p')$, the work that must be invested in order to carry out the transformation $p \to p'$ in the presence of constraints. Eq. (7) can be understood as a decomposition of the non-equilibrium free energy into a sum of an accessible free energy $F(\phi(p), E)$, which can be turned into work given the protocol constraints (given Eq. (8)), and an inaccessible free energy $\beta^{-1} D(p \| \phi(p))$, which cannot be turned into work and must be dissipated as EP (given Eq. (6)). The accessible free energy is never greater than the overall free energy, $F(\phi(p), E) \le F(p, E)$, which follows from Eq. (7) and the non-negativity of KL divergence. It can also be shown, via Eqs. (2) and (5), that the work extracted by any allowed protocol that transforms $p \to p'$ is equal to the work extracted by that protocol when transforming $\phi(p) \to \phi(p')$,
$$W(p \to p') = W(\phi(p) \to \phi(p')). \tag{9}$$
Thus, $\phi$ maps each distribution $p$ to another distribution $\phi(p)$, which captures that part of $p$ which is useful for work extraction.
Our last general result, and our second main result, shows that the drop of KL divergence between $p$ and $\phi(p)$ obeys
$$D(p \| \phi(p)) - D(p' \| \phi(p')) \ge 0. \tag{10}$$
Eq. (10) implies an irreversibility condition on the dynamics: for any two distributions $p$ and $p'$, any constrained driving protocol can either carry out the transformation $p \to p'$ or the transformation $p' \to p$, but not both, unless $D(p \| \phi(p)) = D(p' \| \phi(p'))$. This inequality also implies that our bounds on EP and work, as in Eqs. (6) and (8) respectively, are stronger than those provided by the second law, as in $\Sigma \ge 0$ and Eq. (1).
Fig. 2 provides a schematic way of understanding our results. Consider a constrained protocol that carries out the map $p \to p'$, and imagine that there is an operator $\phi$ that satisfies the Pythagorean theorem and commutes with the constrained set of dynamical generators $\Lambda$. By Eq. (5), the EP incurred during the system's actual trajectory (solid gray line) is given by the drop in the distance from the system's distribution to the set $\mathrm{img}\,\phi$ over the course of the protocol, plus the EP that would be incurred by a "projected trajectory" that transforms $\phi(p) \to \phi(p')$ while staying within $\mathrm{img}\,\phi$ (dashed gray line). Since the EP of the projected trajectory must be non-negative, the drop in the distance from the system's distribution to $\mathrm{img}\,\phi$ must be dissipated as EP, Eq. (6). In addition, by Eq. (10) this decrease in the distance must be non-negative, meaning that the system cannot end up farther from $\mathrm{img}\,\phi$ than it started. Finally, it can be helpful to imagine the trajectory $p \to p'$ as composed of three parts: a segment from $p$ down to $\phi(p)$, the projected trajectory from $\phi(p)$ to $\phi(p')$ that stays within $\mathrm{img}\,\phi$, and a final segment from $\phi(p')$ up to $p'$. (Note that this decomposition is helpful for thermodynamic accounting, and will not generally reflect the actual trajectory the system takes from $p$ to $p'$.) Then, the first and third segments contribute (positively and negatively, respectively) only to EP, not extractable work. On the other hand, the projected trajectory segment ($\phi(p) \to \phi(p')$) contributes both to EP and to extractable work. Thus, the work that can be extracted during $p \to p'$ is determined by the projected trajectory $\phi(p) \to \phi(p')$, leading to Eq. (9) and Eq. (8).
Finally, note that for a given Λ, in general there may be several different operators φ with the desired properties. These different φ will give rise to different decompositions and bounds on EP and work, some of which may be tighter than others. This is discussed in more depth in Section IV.

Overview of the three applications of our general results
Eqs. (5) to (10) are our theoretical results, which we use to analyze bounds on EP and work for several specific types of protocol constraints. Specifically, we apply them to three different types of protocol constraints:
1. Symmetry constraints, which arise when driving protocols obey some symmetry group. An example is provided by the Szilard box in Fig. 1, which possesses vertical reflection symmetry.
2. Modularity constraints, which arise when different (possibly overlapping) subsystems of a multivariate system evolve independently of each other. An example is provided by the Szilard box in Fig. 1, where the particle's horizontal and vertical position evolve independently of each other.
3. Coarse-graining constraints, which arise when the driving protocols exhibit closed coarse-grained dynamics over a set of macrostates, and these coarse-grained dynamics obey some constraints. An example of coarse-graining constraints is provided by the Szilard box in Fig. 1: the particle's vertical position (the macrostate) evolves in a way that does not depend on the horizontal position (i.e., it has closed dynamics), and its distribution cannot be controlled by moving the partition.
Each of these types of constraints corresponds to a different set of allowed dynamical generators Λ, as well as a different operator φ. For example, when considering symmetry constraints, φ will map each p to its "symmetrized" version, which is invariant under the action of the symmetry group. Similarly, for modularity constraints, φ will map each p to a distribution where the statistical correlations between subsystems are destroyed.
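To make the role of the symmetrizing operator concrete, the sketch below checks the free energy decomposition of Eq. (7) and compares the bounds of Eq. (1) and Eq. (8) numerically. It rests on several illustrative assumptions of ours: the Szilard box is coarse-grained into four cells, $\phi$ is taken to be the average over the top/bottom reflection, the energy function depends only on the horizontal position, and $\beta = 1$.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)

def kl(p, q):
    """KL divergence D(p || q); assumes q > 0 wherever p > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def free_energy(p, energy, beta=1.0):
    """Non-equilibrium free energy F(p, E) = <E>_p - S(p) / beta."""
    return sum(a * e for a, e in zip(p, energy)) - entropy(p) / beta

def symmetrize(p):
    """Assumed phi: average p with its mirror image under top/bottom reflection.
    Cells are ordered (left, top), (left, bottom), (right, top), (right, bottom)."""
    lt, lb, rt, rb = p
    return [(lt + lb) / 2, (lt + lb) / 2, (rt + rb) / 2, (rt + rb) / 2]

beta = 1.0

# Check Eq. (7) for a generic distribution and a reflection-symmetric energy.
energy = [0.0, 0.0, 1.0, 1.0]
p = [0.4, 0.1, 0.3, 0.2]
total = free_energy(p, energy, beta)
accessible = free_energy(symmetrize(p), energy, beta)
inaccessible = kl(p, symmetrize(p)) / beta

# Compare Eq. (1) and Eq. (8) for "particle in the top half" -> uniform.
flat = [0.0, 0.0, 0.0, 0.0]
uniform = [0.25, 0.25, 0.25, 0.25]
top_half = [0.5, 0.0, 0.5, 0.0]
second_law_bound = free_energy(top_half, flat) - free_energy(uniform, flat)
constrained_bound = (free_energy(symmetrize(top_half), flat)
                     - free_energy(symmetrize(uniform), flat))
print(total, accessible + inaccessible, second_law_bound, constrained_bound)
```

Under these assumptions the second-law bound for the top-half scenario is $\beta^{-1} \ln 2$, while the constrained bound vanishes, since all of the initial free energy is inaccessible.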

C. Implications for thermodynamics of information
Our results have important implications for the thermodynamics of information. They establish rigorously that in the presence of constraints, the thermodynamic value of information depends not only on the amount of measured information, but also on the content of the measurement [7]. In particular, our results allow one to decompose the information acquired by a measurement into accessible information (which can be exploited by available driving protocols to extract work) and inaccessible information (which cannot be exploited in this way). Loosely speaking, the amount of accessible information in a measurement reflects the "alignment" between the choice of measurement observable and the way the system can be manipulated, given protocol constraints [8,9].
We summarize these results more formally using the feedback-control setup discussed above, where an observation apparatus makes a measurement $m$, after which the system undergoes a driving protocol that depends on $m$. Imagine that all available driving protocols obey some set of constraints, such that we can define an operator $\phi$ that satisfies Eq. (8). For notational convenience, let $\hat{p}_{X|m} = \phi(p_{X|m})$ and $\hat{p}'_{X'|m} = \phi(p'_{X'|m})$ indicate the initial and final conditional distributions mapped through $\phi$, and let $\hat{p} = \sum_m p_m \hat{p}_{X|m}$ and $\hat{p}' = \sum_m p_m \hat{p}'_{X'|m}$ indicate the corresponding marginal distributions. Then, by averaging Eq. (8) across measurement outcomes, we can bound the average extractable work as
$$\langle W \rangle \le \sum_m p_m \left[ F(\hat{p}_{X|m}, E) - F(\hat{p}'_{X'|m}, E') \right]. \tag{11}$$
This provides a refinement of Eq. (3) that includes the effects of protocol constraints. By rearranging Eq. (11), we can also derive the following refinement of Eq. (4):
$$\langle W \rangle \le F(\hat{p}, E) - F(\hat{p}', E') + \beta^{-1} \left[ I(\hat{X}; M) - I(\hat{X}'; M) \right], \tag{12}$$
where $I(\hat{X}; M)$ and $I(\hat{X}'; M)$ indicate the mutual information between measurement and system under the mapped conditional distributions $\hat{p}_{X|m}$ and $\hat{p}'_{X'|m}$, respectively. Thus, for a fixed $\hat{p}$ and $\hat{p}'$, it is the mutual information in these mapped distributions, $I(\hat{X}; M)$, that quantifies the accessible information that may be relevant for work extraction, rather than the actual mutual information $I(X; M)$.
As a special case of the above analysis (which holds in the examples we analyze below), assume that the marginal distributions obey $\hat{p} = p$ and $\hat{p}' = p'$. Then, by using the chain rule for KL divergence, the total mutual information can be written as a sum of accessible and inaccessible information,
$$I(X; M) = \underbrace{I(\hat{X}; M)}_{\text{accessible}} + \underbrace{\sum_m p_m D(p_{X|m} \| \hat{p}_{X|m})}_{\text{inaccessible}}. \tag{13}$$
Finally, we note that by taking expectations of Eqs. (5) and (6) across different measurement outcomes, we can derive decompositions and bounds on the average EP incurred by feedback-control processes [10]. For instance, by averaging Eq. (6), the expected EP incurred across different measurements can be bounded as
$$\langle \Sigma \rangle \ge \sum_m p_m \left[ D(p_{X|m} \| \hat{p}_{X|m}) - D(p'_{X'|m} \| \hat{p}'_{X'|m}) \right]. \tag{14}$$
This shows that the expected EP is lower bounded by the drop of the inaccessible information from Eq. (13).
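The split of mutual information into accessible and inaccessible parts can be illustrated with a small numerical sketch. As before, the four-cell coarse-graining of the Szilard box, the choice of $\phi$ as the top/bottom symmetrization, and all function names are assumptions made here for illustration only. Three error-free measurements are compared: horizontal position, vertical position, and the exact cell.

```python
import math

def kl(p, q):
    """KL divergence D(p || q); assumes q > 0 wherever p > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def symmetrize(p):
    """Assumed phi: average over the top/bottom reflection of the four cells
    ordered (left, top), (left, bottom), (right, top), (right, bottom)."""
    lt, lb, rt, rb = p
    return [(lt + lb) / 2, (lt + lb) / 2, (rt + rb) / 2, (rt + rb) / 2]

def mixture(weights, dists):
    """Marginal distribution sum_m p_m p_{X|m}."""
    return [sum(w * d[i] for w, d in zip(weights, dists))
            for i in range(len(dists[0]))]

def mutual_information(weights, conditionals):
    """I(X; M) = sum_m p_m D(p_{X|m} || p), with p the marginal over X."""
    marginal = mixture(weights, conditionals)
    return sum(w * kl(c, marginal) for w, c in zip(weights, conditionals))

def decompose(weights, conditionals):
    """Return (total, accessible, inaccessible) information in nats."""
    mapped = [symmetrize(c) for c in conditionals]
    total = mutual_information(weights, conditionals)
    accessible = mutual_information(weights, mapped)
    inaccessible = sum(w * kl(c, m)
                       for w, c, m in zip(weights, conditionals, mapped))
    return total, accessible, inaccessible

horizontal = decompose([0.5, 0.5],
                       [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]])
vertical = decompose([0.5, 0.5],
                     [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]])
exact_cell = decompose([0.25] * 4,
                       [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
                        [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]])
for name, result in [("horizontal", horizontal), ("vertical", vertical),
                     ("exact cell", exact_cell)]:
    print(name, result)
```

In this toy setting the horizontal measurement is fully accessible, the vertical measurement is fully inaccessible, and the exact-cell measurement carries one accessible and one inaccessible bit.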

D. Roadmap
In the next section, we discuss relevant prior work. In Section III, we introduce our physical setup, and review background from non-equilibrium statistical physics. We introduce our theoretical framework and main results in Section IV. In order to use those results, one must define an operator $\phi$ that satisfies the necessary properties (Pythagorean theorem and commutativity). In the rest of the paper we show how to do this for three very common types of protocol constraints. First, we use this framework to analyze symmetry constraints in Section V. Then we analyze modularity constraints in Section VI, and we end by analyzing coarse-graining constraints in Section VII. We finish with a brief discussion, which touches upon how our results generalize beyond the isothermal assumption, in Section VIII. All derivations and proofs are in the appendix.

II. PRIOR WORK
Wilming et al. [11] analyzed how extractable work depends on constraints on the Hamiltonian, in the context of a quantum system coupled to a quantum finite-sized heat bath. That paper derived an upper bound on the work that could be extracted by carrying out a physical process which consists of sequences of (1) unitary transformations of the system and bath, and (2) total relaxations of the system to some equilibrium Gibbs state. In contrast, we consider a classical system coupled to one or more classical idealized reservoirs, and derive bounds on EP and work under a much broader set of protocols.
Our approach can be related to so-called "resource theories", which have become popular in various areas of quantum physics [12]. A resource theory quantifies a physical resource in an operational way, in terms of what transformations are possible when the resource is available. These kinds of approaches have provided operational descriptions of resources such as coherence [13,14], asymmetry [15-17], entanglement [18], as well as free energy and other thermodynamic quantities [11,19-23]. Most resource theories are based on a common set of formal elements, such as a resource quantifier (a real-valued function that measures the amount of a resource), a set of free states (statistical states that lack the resource), and free operations (transformations between statistical states that do not increase the amount of resource).
As we discuss in more detail below, our results on symmetry constraints are related to some previous work on the resource theory of asymmetry [17,24-26]. In addition, at a high level, we are inspired by similar operational motivations as resource theories; for example, we define "accessible free energy" in an operational way as a quantity that governs extractable work under protocol constraints. Furthermore, many elements of our general approach can be mapped onto the resource theory framework: the set of allowed dynamical generators (which we call $\Lambda$) plays the role of the free operations, the image of the operator $\phi$ plays the role of the set of free states, and the KL divergence $D(p \| \phi(p))$ serves as the resource quantifier. The commutativity condition used in Theorems 1 and 2 (see Section IV) has also appeared in work on so-called resource destroying maps [27]. However, unlike most resource theories, our focus is on the thermodynamics of classical systems described as driven continuous-time open systems. Further exploration of the connection between our approach and resource theories is left for future work.
A recent paper by Still [28] also considered thermodynamics of information under constraints, and proposed a decomposition of mutual information into accessible and inaccessible components under the constraint that the protocol cannot change the conditional distribution over some subsystem. This constraint simplifies the thermodynamic analysis and leads to some interesting results, but is too strict to apply to many realistic setups (such as the model of the Szilard box, which we analyze below). Our results derive a decomposition of acquired information into accessible and inaccessible components (based on our bounds on EP and work) for a much broader set of constraints and processes.
Our results also complement previous research on the relationship between EP (and extractable work) and different aspects of the driving protocol, such as temporal duration [29-34], stochasticity of control parameters [35], presence of non-idealized work reservoirs [36], as well as research on the design of "optimal protocols" [37-39].
Finally, there are other results in the literature that are related more narrowly to either our analysis of symmetry constraints, modularity constraints, or coarse-graining constraints. We mention those results in the associated sections below.

III. PRELIMINARIES
We consider a physical system with state space $X$. The state space $X$ can be either discrete or continuous ($X \subseteq \mathbb{R}^n$). In the discrete case, the term "probability distribution" will refer to a probability mass function over $X$, and $p(x)$ will refer to the probability mass of state $x$. In the continuous case, the term "probability distribution" will refer to a probability density function over $X$, and $p(x)$ will refer to the probability density of state $x$. We will use the notation $\mathcal{P}$ to refer to the set of all probability distributions over $X$. For any probability distribution $p \in \mathcal{P}$, we use $\mathbb{E}_p[\cdot]$ to indicate expectation, $S(p) = \mathbb{E}_p[-\ln p]$ to indicate (discrete or differential) Shannon entropy, and $D(p \| q) = \mathbb{E}_p[-\ln q] - S(p)$ to indicate KL divergence.
We assume that the system is coupled to a heat bath at inverse temperature $\beta$ (see Appendix A for a generalization of this assumption), and therefore evolves in a stochastic manner. At time $t$, the system's state distribution evolves according to
$$\partial_t p(t) = L(t)\, p(t), \tag{15}$$
where $L(t)$ is a continuous linear operator that represents an infinitesimal dynamical generator of a Markovian stochastic process. For discrete-state systems, $L(t)$ will represent a discrete-state master equation,
$$\partial_t p(x, t) = \sum_{x' \ne x} \left[ L_{x x'}(t)\, p(x', t) - L_{x' x}(t)\, p(x, t) \right], \tag{16}$$
where $L_{x' x}(t)$ indicates the transition rate from state $x$ to state $x'$ at time $t$. For continuous-state systems, $L(t)$ will represent a continuous-state master equation [40,41], an important special case of which is a Fokker-Planck equation,
$$\partial_t p(x, t) = -\sum_i \partial_{x_i} \left[ A_i(x, t)\, p(x, t) \right] + \sum_{i,j} \partial_{x_i} \partial_{x_j} \left[ D_{ij}(x, t)\, p(x, t) \right], \tag{17}$$
where $A$ and $D$ are drift and diffusion terms [40,42]. Given our assumptions, the rate of entropy production (EP rate) for distribution $p$ and dynamical generator $L$ is
$$\dot{\Sigma}(p, L) = -\frac{d}{dt} D(p(t) \| \pi^L), \tag{18}$$
where $\partial_t p(t) = Lp$, and $\pi^L$ indicates the equilibrium stationary distribution of $L$ [43-45]. Throughout this paper, we assume that all dynamical generators have at least one stationary distribution. For any $L$ that has more than one stationary distribution, $\pi^L$ can be chosen to indicate any stationary distribution of $L$ which has maximal support.
We use the term (driving) protocol to refer to a time-dependent trajectory of dynamical generators $L(t)$ over $t \in [0, 1]$ (the units of time are arbitrary, so the choice of the time interval is made without loss of generality). For a given driving protocol and initial distribution $p$, we use $p(t)$ to indicate the solution to Eq. (15) for the initial condition $p(0) = p$. Given an isothermal protocol that transforms initial distribution $p = p(0)$ to final distribution $p' = p(1)$, the overall integrated EP is given by the integral of the EP rate, $\Sigma(p \to p') = \int_0^1 \dot{\Sigma}(p(t), L(t))\, dt$.
As mentioned, we formalize the notion of driving protocol constraints by assuming that a limited set of dynamical generators $\Lambda$ is available, i.e., that any allowed driving protocol obeys $L(t) \in \Lambda$ at all $t \in [0, 1]$. We consider several different types of constraints (such as those that reflect symmetry, modularity, and coarse-graining), which correspond to different sets of available dynamical generators $\Lambda$. We assume that the desired final distribution $p'$ is reachable from initial distribution $p$ by at least one constrained driving protocol.
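The definitions above can be exercised on a small discrete-state example. The sketch below is a toy illustration under assumptions of ours, not a construction from the paper: a three-state master equation with detailed-balance rates built from a hypothetical energy function, $\beta = 1$, and a simple forward-Euler integrator. It integrates the EP rate, written as $-\sum_x (Lp)(x) \ln[p(x)/\pi(x)]$, and compares the result with the contraction of KL divergence to equilibrium.

```python
import math

def kl(p, q):
    """KL divergence D(p || q); assumes q > 0 wherever p > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def boltzmann(energy, beta=1.0):
    """Equilibrium distribution proportional to exp(-beta * E)."""
    weights = [math.exp(-beta * e) for e in energy]
    z = sum(weights)
    return [w / z for w in weights]

def rate_matrix(energy, beta=1.0):
    """Assumed detailed-balance rates; L[i][j] is the jump rate j -> i."""
    n = len(energy)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(n):
            if i != j:
                L[i][j] = math.exp(-beta * (energy[i] - energy[j]) / 2)
        L[j][j] = -sum(L[i][j] for i in range(n) if i != j)
    return L

def apply(L, p):
    """Right-hand side of the master equation, (L p)(x)."""
    n = len(p)
    return [sum(L[i][j] * p[j] for j in range(n)) for i in range(n)]

def ep_rate(p, L, pi):
    """EP rate -d/dt D(p(t) || pi) = -sum_x (L p)(x) ln(p(x) / pi(x))."""
    return -sum(d * math.log(a / b) for d, a, b in zip(apply(L, p), p, pi))

def relax(p0, L, pi, t_final=1.0, steps=10000):
    """Forward-Euler integration of dp/dt = L p, accumulating the EP rate."""
    dt = t_final / steps
    p, ep = list(p0), 0.0
    for _ in range(steps):
        ep += ep_rate(p, L, pi) * dt
        p = [a + dt * d for a, d in zip(p, apply(L, p))]
    return p, ep

energy = [0.0, 0.5, 1.0]  # hypothetical energy levels
L = rate_matrix(energy)
pi = boltzmann(energy)
p0 = [0.7, 0.2, 0.1]
p1, integrated_ep = relax(p0, L, pi)
kl_contraction = kl(p0, pi) - kl(p1, pi)
print(integrated_ep, kl_contraction)
```

For a time-independent generator the two numbers should agree up to the discretization error of the Euler scheme, which shrinks as `steps` grows.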

IV. THEORETICAL FRAMEWORK
We begin by deriving several general results, which we later use to derive bounds for concrete kinds of constraints.
Given some set of available dynamical generators Λ, we first show that if there is an operator φ : P → P that obeys the following two conditions, then Eqs. (5) to (10) hold for any constrained driving protocol L(t).
The first condition is that the operator $\phi$ obeys
$$D(p \| q) = D(p \| \phi(p)) + D(\phi(p) \| q) \tag{19}$$
for all $p \in \mathcal{P}$ and $q \in \mathrm{img}\,\phi := \{\phi(p) : p \in \mathcal{P}\}$. This relation is called the Pythagorean theorem of KL divergence in information geometry [6]. It can be shown that any $\phi$ that obeys Eq. (19) can be expressed as the projection $\phi(p) = \arg\min_{s \in \mathrm{img}\,\phi} D(p \| s)$. We also make the weak technical assumption that for all $p$, the support of $\phi(p)$ contains the support of $p$ (this is necessary for $D(p \| \phi(p)) < \infty$, see [46]).
The second condition is that for all $L \in \Lambda$, the operator $\phi$ obeys the following commutativity relation:
$$e^{\tau L} \phi(p) = \phi(e^{\tau L} p) \quad \text{for all } p \in \mathcal{P},\ \tau \ge 0. \tag{20}$$
This condition states that given any initial distribution $p$, the same final distribution is reached regardless of whether $p$ first evolves under $L \in \Lambda$ for time $\tau$ and then undergoes mapping under $\phi$, or instead first undergoes mapping under $\phi$ and then evolves under $L \in \Lambda$ for time $\tau$.
The next result, which is proved in Appendix B, shows that if $\phi$ obeys the two conditions stated above for a given $L$, then the EP rate incurred by distribution $p(t)$ under $L$ can be written as the sum of two non-negative terms: the instantaneous contraction of the KL divergence between $p(t)$ and $\phi(p(t))$, and the EP rate incurred by $\phi(p(t))$ under $L$.
Theorem 1. If $\phi$ obeys Eq. (19) and Eq. (20) for some $L$, then for all $p \in \mathcal{P}$ that evolve as $\partial_t p(t) = Lp(t)$,
$$\dot{\Sigma}(p, L) = -\frac{d}{dt} D(p(t) \| \phi(p(t))) + \dot{\Sigma}(\phi(p), L),$$
where $-\frac{d}{dt} D(p(t) \| \phi(p(t))) \ge 0$.
We sketch the proof of this theorem in terms of a discrete-time relaxation over interval $\tau$, as visually diagrammed in Fig. 3 (the continuous-time statement, as in Theorem 1, follows by taking the $\tau \to 0$ limit, and using the definition of EP rate in Eq. (18)). Consider some distribution $p$ that relaxes for time $\tau$ under dynamical generator $L$, thereby reaching the distribution $e^{\tau L} p$ (solid gray line). The EP incurred by this relaxation is given by the contraction of KL divergence to the equilibrium distribution $\pi$, $\Sigma(p \to e^{\tau L} p) = D(p \| \pi) - D(e^{\tau L} p \| \pi)$ (contraction of purple lines) [43,44]. Without loss of generality, we assume that $\pi \in \mathrm{img}\,\phi$ (Lemma 1 in Appendix B), which allows us to apply the Pythagorean theorem, Eq. (19), to both $D(p \| \pi)$ and $D(e^{\tau L} p \| \pi)$. Along with the commutativity condition, Eq. (20), this means that the EP $\Sigma(p \to e^{\tau L} p)$ can be written as the contraction of the KL divergence from $p$ to $\phi(p)$ (green lines), plus the contraction of the KL divergence from $\phi(p)$ to $\pi$ (red lines). The former contraction is non-negative by the data-processing inequality. The latter contraction is equal to $\Sigma(\phi(p) \to e^{\tau L} \phi(p))$, the EP incurred by letting $\phi(p)$ relax freely under $L$.
Figure 3. Distribution $p$ freely relaxes under $L$ for time $\tau$ (solid gray line). The EP incurred during this relaxation (contraction of purple lines) can be decomposed into the contraction of the KL divergence between $p$ and $\phi(p)$ (contraction of green lines), which is non-negative, plus the EP incurred during the free relaxation of $\phi(p)$ (contraction of red lines).
Now assume that the commutativity relation Eq. (20) holds for all $L \in \Lambda$. In Lemma 3 in Appendix B, we prove that any constrained driving protocol that carries out the transformation $p \to p'$ must also transform initial distribution $\phi(p)$ to final distribution $\phi(p')$. Note also that, for a constrained driving protocol, the assumptions of Theorem 1 hold at all times. Using these facts, in Appendix B we prove the following result about integrated EP.
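Theorem 2. If $\phi$ obeys Eq. (19) and Eq. (20) for all $L \in \Lambda$, then for any allowed protocol that transforms $p \to p'$,
$$\Sigma(p \to p') = \left[ D(p \| \phi(p)) - D(p' \| \phi(p')) \right] + \Sigma(\phi(p) \to \phi(p')),$$
$$D(p \| \phi(p)) - D(p' \| \phi(p')) \ge 0.$$
The results in Theorem 2 appeared as Eq. (5) and Eq. (10) in the introduction. It is clear that Eq. (5) implies the EP bound in Eq. (6), since EP is non-negative.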
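The EP decomposition stated in Theorem 2 can be checked numerically on a toy model. Everything in the sketch below is an assumption chosen for illustration: four coarse-grained cells of the Szilard box with fully connected detailed-balance rates, a time-dependent energy that acts only on the right-hand cells (so every generator is invariant under the top/bottom reflection), $\phi$ given by the top/bottom symmetrization, $\beta = 1$, and forward-Euler integration.

```python
import math

def kl(p, q):
    """KL divergence D(p || q); assumes q > 0 wherever p > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def symmetrize(p):
    """Assumed phi: average over top/bottom reflection of the four cells
    ordered (left, top), (left, bottom), (right, top), (right, bottom)."""
    lt, lb, rt, rb = p
    return [(lt + lb) / 2, (lt + lb) / 2, (rt + rb) / 2, (rt + rb) / 2]

def energy_at(t):
    """Assumed protocol: only the energy of the right-hand cells is driven."""
    return [0.0, 0.0, t, t]

def boltzmann(energy):
    weights = [math.exp(-e) for e in energy]
    z = sum(weights)
    return [w / z for w in weights]

def rate_matrix(energy):
    """Detailed-balance rates; L[i][j] is the jump rate j -> i."""
    n = len(energy)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(n):
            if i != j:
                L[i][j] = math.exp(-(energy[i] - energy[j]) / 2)
        L[j][j] = -sum(L[i][j] for i in range(n) if i != j)
    return L

def run(p0, steps=10000):
    """Euler-integrate dp/dt = L(t) p over t in [0, 1]; return (p(1), EP)."""
    dt = 1.0 / steps
    p, ep = list(p0), 0.0
    for k in range(steps):
        energy = energy_at(k * dt)
        L, pi = rate_matrix(energy), boltzmann(energy)
        dp = [sum(L[i][j] * p[j] for j in range(4)) for i in range(4)]
        ep += -sum(d * math.log(a / b) for d, a, b in zip(dp, p, pi)) * dt
        p = [a + dt * d for a, d in zip(p, dp)]
    return p, ep

p0 = [0.4, 0.1, 0.3, 0.2]
p1, ep_actual = run(p0)                   # protocol applied to p
q1, ep_projected = run(symmetrize(p0))    # same protocol applied to phi(p)
kl_drop = kl(p0, symmetrize(p0)) - kl(p1, symmetrize(p1))
print(ep_actual, kl_drop + ep_projected, kl_drop)
```

Up to discretization error, the EP of the actual trajectory should match the drop in KL divergence plus the EP of the projected trajectory, and the final projected distribution `q1` should coincide with `symmetrize(p1)`.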
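We now derive the decomposition of non-equilibrium free energy in Eq. (7). First, write the non-equilibrium free energy at the beginning of the protocol, $F(p, E)$, as
$$F(p, E) = \beta^{-1} D(p \| \pi) + F(\pi, E), \tag{21}$$
where $\pi \propto e^{-\beta E}$ is the Boltzmann distribution for energy function $E$ [3]. To guarantee that this non-equilibrium free energy is well defined, we assume that $\pi$ is the unique equilibrium distribution of the dynamical generator $L(0)$ at the beginning of the protocol (if $L(0)$ has more than one equilibrium distribution, then thermodynamic quantities such as rate of heat flow can be equivalently defined in terms of more than one energy function, so the meaning of $E$ becomes ambiguous). Then, $\pi \in \mathrm{img}\,\phi$ by Lemma 1 in Appendix B, and we use the Pythagorean theorem, Eq. (19), to rewrite Eq. (21) as
$$F(p, E) = \beta^{-1} D(p \| \phi(p)) + \beta^{-1} D(\phi(p) \| \pi) + F(\pi, E) = F(\phi(p), E) + \beta^{-1} D(p \| \phi(p)), \tag{22}$$
which appeared as Eq. (7) in the introduction. We can similarly decompose the final non-equilibrium free energy as
$$F(p', E') = F(\phi(p'), E') + \beta^{-1} D(p' \| \phi(p')). \tag{23}$$
By combining these decompositions with Eqs. (2) and (6), we arrive at the bound on extractable work in Eq. (8).
Finally, we show how to use these results to derive Eq. (9), which states that the same amount of work is extracted when transforming $p \to p'$ as when transforming $\phi(p) \to \phi(p')$. Plugging in Eqs. (5), (22) and (23) into Eq. (2) allows us to write
$$W(p \to p') = F(\phi(p), E) - F(\phi(p'), E') - \beta^{-1} \Sigma(\phi(p) \to \phi(p')).$$
By Lemma 3 in Appendix B, any constrained driving protocol that carries out $p \to p'$ must also carry out $\phi(p) \to \phi(p')$. Using Eq. (2) for the transformation $\phi(p) \to \phi(p')$, the right-hand side of this expression is equal to $W(\phi(p) \to \phi(p'))$, which gives Eq. (9).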
Note that the Pythagorean theorem in Eq. (19) concerns only the operator $\phi$, while the commutativity relation in Eq. (20) concerns both $\phi$ and $\Lambda$. Note also that for any set of dynamical generators $\Lambda$, there can be many different $\phi$ that satisfy Eq. (20) (as well as Eq. (19)). Different choices of $\phi$ will give different decompositions of EP in Eq. (5), as well as different bounds on EP and work in Eqs. (6) and (8), some of which may be tighter than others. Generally speaking, tighter bounds arise from operators that have smaller images. To illustrate this issue, consider the extreme case where $\phi$ is the identity mapping, $\phi(p) = p$ for all $p$ (so $\mathrm{img}\,\phi = \mathcal{P}$). It is easy to verify that Eqs. (19) and (20) will always hold for this $\phi$. In this case, however, Eq. (5) reduces to a trivial identity, and the lower bound on EP in Eq. (6) is just 0. At the other extreme, imagine that there is only a single dynamical generator available, $\Lambda = \{L\}$, which has a unique equilibrium distribution $\pi$. For this $\Lambda$, the operator $\phi(p) = \pi$ for all $p$ satisfies Eqs. (19) and (20), and when plugged into Eq. (6) leads to the following bound on EP:
$$\Sigma(p \to p') \ge D(p \| \pi) - D(p' \| \pi).$$
In fact, the right-hand side is an exact expression (not just a lower bound) for the EP incurred by a free relaxation towards the equilibrium distribution $\pi$ [43,44]. Thus, our bounds are tightest in the case when $\Lambda$ contains a single dynamical generator and $\phi$ maps every distribution to the equilibrium distribution of that generator (so $\mathrm{img}\,\phi$ is a singleton). If $\Lambda$ contains multiple dynamical generators with different equilibrium distributions, then the operator $\phi(p) = \pi$ will in general violate the commutativity condition in Eq. (20). Loosely speaking, as the set of available dynamical generators $\Lambda$ grows in size (i.e., the protocol constraints are weakened), the image of any $\phi$ that satisfies the commutativity condition will grow larger, and the resulting bounds on EP and work will become weaker.
Finally, it is important to note that we do not demonstrate that the EP bound of Eq. (6) can always be achieved. However, it is possible to achieve it in some special cases. For example, imagine that the final distribution obeys p′ = φ(p′) (e.g., this holds if the process ends on an equilibrium distribution, see Lemma 1 in Appendix B). In that case, the second KL term in Eq. (6) vanishes, and Eq. (6) reduces to Σ(p → p′) ≥ D(p‖φ(p)). (27) It can be shown that there exist sets of dynamical generators Λ such that the commutativity condition in Eq. (20) holds and such that this inequality is tight. Specifically, consider a set Λ such that every p ∈ img φ is the equilibrium distribution of some L ∈ Λ (such Λ are simple to construct for the kinds of symmetry, modularity, and coarse-graining constraints we consider below). The bound of Eq. (27) can be achieved via the following two-step process: first, let the initial distribution p relax freely to the equilibrium distribution φ(p); second, carry out a quasistatic protocol that transforms φ(p) to the final distribution φ(p′), while remaining in equilibrium throughout.
(This procedure can be understood visually by using Fig. 2: the system first relaxes along the green arrow connecting p to φ(p), then follows the dashed line to φ(p′) in a quasistatic manner.) The free relaxation step incurs D(p‖φ(p)) of EP, while the second step incurs zero EP. Thus, this two-step procedure will achieve the bound of Eq. (27).
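The free-relaxation step can be illustrated numerically. The following is a minimal sketch, not the construction used in the paper: it assumes a hypothetical three-state master equation whose rates satisfy detailed balance with respect to an illustrative equilibrium distribution pi_eq, and it uses the standard expression for the EP rate of a master equation coupled to a single heat bath. Under these assumptions, the integrated EP should agree, up to discretization error, with the drop in KL divergence D(p‖π) − D(p′‖π).

```python
import math

def kl(p, q):
    """KL divergence D(p||q) in nats."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def relax(p, rates, dt, steps):
    """Euler-integrate dp/dt = L p and accumulate the entropy production.

    rates[i][j] is the transition rate from state j to state i (i != j).
    """
    n = len(p)
    ep = 0.0
    for _ in range(steps):
        # EP rate: sum over state pairs of (net flux) * log(forward / backward flux)
        for i in range(n):
            for j in range(i + 1, n):
                fwd = rates[i][j] * p[j]
                bwd = rates[j][i] * p[i]
                ep += dt * (fwd - bwd) * math.log(fwd / bwd)
        dp = [sum(rates[i][j] * p[j] - rates[j][i] * p[i]
                  for j in range(n) if j != i) for i in range(n)]
        p = [a + dt * d for a, d in zip(p, dp)]
    return p, ep

pi_eq = [0.5, 0.3, 0.2]   # hypothetical equilibrium distribution
# Rate j -> i equal to pi_eq[i]: an illustrative choice obeying detailed balance.
rates = [[pi_eq[i] if i != j else 0.0 for j in range(3)] for i in range(3)]
p0 = [0.1, 0.1, 0.8]
pT, ep = relax(p0, rates, dt=1e-3, steps=5000)
drop = kl(p0, pi_eq) - kl(pT, pi_eq)
print(f"integrated EP = {ep:.4f}, drop in KL divergence = {drop:.4f}")
```

The initial distribution is chosen strictly positive so that every flux ratio in the EP rate is well defined.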

V. SYMMETRY CONSTRAINTS
We now use the theoretical framework outlined in the last section to derive bounds on EP when the driving protocol obeys symmetry constraints.
Consider a compact group G which acts on the state space X, such that each g ∈ G corresponds to a bijection υ_g : X → X. For continuous state spaces, we assume that each υ_g is a rigid transformation (i.e., a combination of reflections, rotations, and translations). Now define the following operator φ_G, which maps each function f : X → R to its uniform average under the action of G: φ_G(f) := ∫_G (f ∘ υ_g) dµ(g), (28) where µ is the Haar measure over G. Following the terminology in the literature, we refer to φ_G as the twirling operator [17,47]. In Appendix C, we show that this operator obeys the Pythagorean theorem, Eq. (19).
We say that a set of dynamical generators obeys symmetry constraints with respect to group G if the commutativity relation Eq. (20) holds for the operator φ_G. When a given L is a discrete-state master equation, this commutativity condition holds when the transition rates obey L_{υ_g(x′)υ_g(x)} = L_{x′x} for all x, x′ ∈ X and g ∈ G. (29) Simple sufficient conditions for Eq. (20) can also be derived for potential-driven Fokker-Planck equations of the type Lp = ∇·(p∇E) + β^{−1}∆p. (30) In this case, the commutativity relation holds if all available energy functions are invariant under the action of G, E(x) = E(υ_g(x)) for all g ∈ G. (31) (See Appendix C for derivation of Eq. (20) from Eq. (29) and Eq. (31).)
Given Theorem 2, any protocol that carries out the transformation p → p′ while obeying symmetry constraints with respect to group G permits the decomposition of EP found in Eq. (5), with φ = φ_G. Note that D(p‖φ_G(p)) is a non-negative measure of the asymmetry in distribution p with respect to the symmetry group G, which vanishes when p is invariant under φ_G. Thus, Eq. (5) implies that the EP incurred by a protocol that obeys symmetry constraints is given by the "drop in the asymmetry" of the system's distribution over the course of the protocol, plus the EP that would be incurred by the twirled (and therefore symmetric) initial distribution. Theorem 2 also implies the following bound on EP, Σ(p → p′) ≥ D(p‖φ_G(p)) − D(p′‖φ_G(p′)) ≥ 0. (32) The first inequality comes from the non-negativity of the EP. The second inequality states that the asymmetry in the system's distribution can only decrease under any driving protocol that obeys symmetry constraints. The accessible free energy in Eq. (8) is given by F(φ_G(p), E). This means that the drop of asymmetry, the middle term in Eq. (32), cannot be extracted as work by any driving protocol that carries out the transformation p → p′ while obeying symmetry constraints with respect to G. Conversely, the greater the drop in asymmetry, the more work needs to be invested by any driving protocol that obeys symmetry constraints and carries out the transformation p → p′.
Note that some related results have been previously derived in the context of the quantum resource theory of asymmetry [17]. This research considered a finite-state quantum system, coupled to a heat bath and a work reservoir. It then analyzed how much work can be extracted by bringing some initial quantum state ρ to a maximally mixed state, with a uniform initial and final Hamiltonian, using discrete-time operations that commute with the action of some symmetry group G. It was shown that the work extractable from ρ under such transformations is equal to the work extractable from the (quantum) twirling φ_G(ρ). This research also derived an operational measure of asymmetry that is equivalent to D(p‖φ_G(p)) [48], and showed that asymmetry can only decrease under symmetry-obeying operations. Our results are restricted to classical systems, but are otherwise more general: they hold for transformations between arbitrary initial and final distributions and energy functions, they apply to arbitrary (finite or infinite) state spaces and to systems coupled to more than one reservoir (Appendix A), and they provide bounds not only on work but also on EP.

A. Example: Szilard box
We demonstrate the implications of our results using the Szilard box shown in Fig. 1. We assume that the box is coupled to a single heat bath at inverse temperature β, and that the particle inside the box undergoes overdamped Fokker-Planck dynamics, so that each L ∈ Λ has the form of Eq. (30). The system's state is represented by a horizontal and a vertical coordinate, (x_1, x_2). By choosing different L ∈ Λ, one can manipulate the potential energy function of the box, thereby moving the vertical partition. This means that all available energy functions have the form where λ_1 ∈ R is a controllable parameter that determines the location of the vertical partition, V_p is the partition's repulsion potential, and V_w is the repulsion potential of the box walls. The box extends over We also assume that V_p(x, λ_1) = 0 whenever |λ_1| ≥ 1, meaning that the partition is completely removed when λ_1 is outside the box. This means that when |λ_1| ≥ 1, the energy function is constant within the box, corresponding to a uniform equilibrium distribution. We write this uniform energy function as E∅ and its corresponding equilibrium distribution as π∅ ∝ e^{−βE∅}.
Let G be the two-element symmetric group S_2, which acts on X via the vertical reflection (x_1, x_2) → (x_1, −x_2). For notational convenience, for any density p, define p̃(x_1, x_2) = p(x_1, −x_2). Then, the twirling φ_G(p) is the uniform mixture of p and its reflection, φ_G(p) = (p + p̃)/2. Since the energy function obeys E(x_1, x_2) = E(x_1, −x_2) for all L, Eq. (31) holds and the conditions for Eq. (32) are satisfied. This means that for any constrained driving protocol that transforms p → p′, Σ(p → p′) ≥ D(p‖(p + p̃)/2) − D(p′‖(p′ + p̃′)/2) ≥ 0, (35) where p̃′ is the final distribution corresponding to initial distribution p̃. This also gives the accessible free energy function F((p + p̃)/2, E).
We now derive bounds on the work that can be extracted from the Szilard box. Consider some driving protocol which starts and ends with the partition removed. Assume that under the initial distribution p, the particle is uniformly distributed across the top half of the box, while the final distribution is in equilibrium, p′ = π∅. How much work can be extracted? The general bound provided by the second law, Eq. (1), is W(p → p′) ≤ F(p, E∅) − F(π∅, E∅) = β^{−1} ln 2. (36) However, this bound is too optimistic given the driving constraints. In fact, the twirling of the initial distribution p is a uniform distribution over the box, (p + p̃)/2 = π∅, so the accessible free energy at the beginning of the protocol is equal to F(π∅, E∅). Since π∅ is invariant under twirling, π∅ = φ_G(π∅), the accessible free energy at the end of the protocol is also given by F(π∅, E∅). Using Eq. (8), we arrive at a tighter bound, W(p → p′) ≤ F(π∅, E∅) − F(π∅, E∅) = 0, (37) meaning that no work can be extracted from this initial distribution p given the available driving protocols. Now consider a different scenario, in which the particle's initial distribution p is uniform across the left half of the box. This distribution p is invariant under vertical reflection, p = p̃, so F((p + p̃)/2, E) = F(p, E). Using Eq. (8), we recover W(p → p′) ≤ F(p, E∅) − F(π∅, E∅) = β^{−1} ln 2, (38) which is the same as the bound set by the second law, Eq. (36). This work bound can be achieved by quickly moving the partition to the middle of the box, and then slowly moving it to the right of the box. We can also consider the thermodynamic value of different measurements for this Szilard box.
Figure 4. Work bounds under symmetry constraints. Consider a Szilard box with a Brownian particle and a partition (blue) which can be moved horizontally. We show that no work can be extracted from an initial distribution which is uniform over the top half of the box, as long as the partition has a vertically symmetric shape.
Imagine that one can choose between two different 1-bit measurements: (1) measuring whether the particle is in the top or the bottom half of the box, or (2) measuring whether the particle is in the left or right half of the box. Using Eq. (11) and Eq. (37) gives W ≤ 0 for the first measurement, which means that the acquired information is not useful for work extraction. For the second measurement, Eq. (38) gives the bound W ≤ β^{−1} ln 2, which means that the acquired information may be used to extract work.
These results hold not just for energy functions of the form Eq. (33), but whenever the energy functions obey the vertical reflection symmetry E(x_1, x_2) = E(x_1, −x_2). In particular, these results hold not only when the Szilard box has a simple vertical partition, but when the partition has any vertically symmetric shape, as illustrated in Fig. 4.
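The two work bounds of this example can be reproduced with a coarse discretization. The sketch below is an illustration under stated assumptions: the box is divided into four equal-area cells (left/right by top/bottom), distributions are taken to be uniform within each cell, and the energy function is constant, so that the drop in accessible free energy reduces to an entropy difference of the twirled distributions. The names CELLS, twirl, and work_bound are hypothetical helpers, not notation from the paper.

```python
import math

CELLS = [(c, r) for c in "LR" for r in "TB"]   # assumed 2x2 discretization of the box

def reflect(p):
    """Vertical reflection (x1, x2) -> (x1, -x2): swap top and bottom rows."""
    flip = {"T": "B", "B": "T"}
    return {(c, r): p[(c, flip[r])] for (c, r) in CELLS}

def twirl(p):
    """phi_G(p) = (p + reflected p) / 2 for the two-element reflection group."""
    q = reflect(p)
    return {x: 0.5 * (p[x] + q[x]) for x in CELLS}

def entropy(p):
    """Shannon entropy in nats."""
    return -sum(v * math.log(v) for v in p.values() if v > 0)

def work_bound(p, p_final, beta=1.0):
    """Drop in accessible free energy for a constant energy function.

    With E constant, F(p, E) equals -S(p)/beta up to an additive constant,
    so the bound is (S(phi_G(p_final)) - S(phi_G(p))) / beta.
    """
    return (entropy(twirl(p_final)) - entropy(twirl(p))) / beta

uniform = {x: 0.25 for x in CELLS}
top_half = {x: (0.5 if x[1] == "T" else 0.0) for x in CELLS}
left_half = {x: (0.5 if x[0] == "L" else 0.0) for x in CELLS}

print("top-half start :", work_bound(top_half, uniform))
print("left-half start:", work_bound(left_half, uniform))
```

Under these assumptions the first bound evaluates to zero and the second to ln 2 (in units of β^{−1}), in line with Eqs. (37) and (38).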

B. Example: discrete-state master equation
We also demonstrate the implications of our results using a discrete-state system. The system consists of a set of N states, indexed as X = {0, . . . , N − 1}. We consider a group generated by circular shifts, representing m-fold circular symmetry: Assume that all available rate matrices L ∈ Λ obey this symmetry group: An example of such a master equation would be a unicyclic network, where the N states are arranged in a ring, and transitions between nearest-neighbor states obey the symmetry of Eq. (40). Such unicyclic networks are often used to model biochemical oscillators and similar biological systems [49]. This kind of system is illustrated in Fig. 5, with N = 12 and m = 4 (4-fold symmetry). Imagine that this system starts from the initial distribution p(x) ∝ x, so the probability grows linearly from 0 (for x = 0) to its maximum (for x = N − 1). For the 12-state system with 4-fold symmetry, this initial distribution is shown on the left-hand side of Fig. 5. How much work can be extracted by bringing this initial distribution to some other distribution p′, while using rate matrices of the form Eq. (40), and assuming the energy function changes from E to E′? This is specified by the drop of the accessible free energy, via Eq. (8): W(p → p′) ≤ F(φ_G(p), E) − F(φ_G(p′), E′), (41) where φ_G(p) and φ_G(p′) are the twirlings of the initial and final distributions respectively. Using the example system with 12 states and 4-fold symmetry, the twirling of p is shown on the right panel of Fig. 5.
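The twirling in this example is straightforward to compute. The sketch below assumes that m-fold symmetry on a ring of N states corresponds to the cyclic group generated by a shift of N/m states (the group definition itself is not reproduced in the text above), and that the Pythagorean relation of Eq. (19) takes the form D(p‖φ(q)) = D(p‖φ(p)) + D(φ(p)‖φ(q)). The second distribution q is an arbitrary illustrative choice.

```python
import math

N, M = 12, 4       # 12 states with 4-fold symmetry, as in Fig. 5
SHIFT = N // M     # assumption: the group is generated by a shift of N/M states

def twirl(p):
    """phi_G(p): uniform average of p over the M circular shifts."""
    return [sum(p[(x + k * SHIFT) % N] for k in range(M)) / M for x in range(N)]

def kl(p, q):
    """KL divergence D(p||q) in nats."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

p = [x / sum(range(N)) for x in range(N)]      # p(x) proportional to x
weights = [(x + 1) ** 2 for x in range(N)]      # arbitrary second distribution
q = [w / sum(weights) for w in weights]

asymmetry = kl(p, twirl(p))                     # D(p || phi_G(p))
lhs = kl(p, twirl(q))
rhs = kl(p, twirl(p)) + kl(twirl(p), twirl(q))
print(f"asymmetry = {asymmetry:.4f}, Pythagorean gap = {abs(lhs - rhs):.2e}")
```

The twirled distribution depends only on x mod 3, which is why it repeats with period N/m around the ring.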

VI. MODULARITY CONSTRAINTS
In many cases, one is interested in analyzing the thermodynamics of systems with multiple degrees of freedom, such as systems of interacting particles or spins. Such systems often exhibit modular organization, meaning that their degrees of freedom can be grouped into independently evolving subsystems. Prototypical examples of modular systems include computational devices such as digital circuits [50][51][52], regulatory networks in biology [53], and brain networks [54].
We use the theoretical framework developed above to derive bounds on work and EP for modular protocols. We begin by introducing some terminology and notation. Consider a system whose degrees of freedom are indexed by the set V, such that the overall state space can be written as X = ⨉_{v∈V} X_v, where X_v is the state space of degree of freedom v. We use the term subsystem to refer to any subset A ⊆ V, and modular decomposition to refer to a set of subsystems M, such that each v ∈ V belongs to at least one subsystem A ∈ M. Note that the subsystems in M can overlap, in which case some degrees of freedom v ∈ V belong to more than one subsystem in M. We use O(M) := ⋃_{A,B∈M: A≠B} (A ∩ B) to indicate those degrees of freedom that belong to more than one subsystem, which we refer to as the overlap. For a given subsystem A, we use X_A to indicate the random variable representing the state of subsystem A and x_A to indicate an outcome of X_A (i.e., an actual state of subsystem A). Given some distribution p over the entire system, we use p_A to indicate a marginal distribution over subsystem A, and [Lp]_A to indicate the derivative of the marginal distribution of subsystem A under the dynamical generator L.
We say that the available driving protocols obey modularity constraints with respect to the modular decomposition M if each generator L ∈ Λ can be written as a sum of linear operators, L = ∑_{A∈M} L^{(A)}, such that each L^{(A)} obeys two properties. First, the dynamics over the marginal distribution p_A should be closed under L^{(A)} (i.e., depend only on the marginal distribution over A): Second, the marginal distribution over all subsystems other than A should be held fixed when evolving under L^{(A)}: It can be verified that this second condition implies that the degrees of freedom in the overlap cannot change state when evolving under L. However, the degrees of freedom in the overlap may be used to control the dynamical evolution of degrees of freedom that can change state.
For example, given a modular decomposition into two possibly overlapping subsystems M = {A, B}, the degrees of freedom in A \ B and B \ A can evolve in a way that depends on the state of the degrees of freedom in O(M) = A ∩ B. This allows our formalism to encompass common types of feedback-control processes, where some degrees of freedom are held fixed, but are used to guide the evolution of other degrees of freedom [5,55]. For discrete-state master equations, Eqs. (43) and (44) will hold when all the rate matrices L ∈ Λ can be written in the following form: where R^{(A)} is some rate matrix over subsystem A that obeys It is also possible to specify simple conditions for Eqs. (43) and (44) to hold for Fokker-Planck operators. For simplicity, consider dynamics with the following overdamped form: where γ_v is the mobility along dimension v, E(x) is some potential energy function, and β^{−1} is the diffusion scale. Such equations can represent potential-driven Brownian particles coupled to a heat bath at inverse temperature β, where the different mobility coefficients represent different particle masses or sizes [56]. For such dynamics, Eq. (43) and Eq. (44) are satisfied when for all L ∈ Λ, the energy functions have the following additive form, and the mobility coefficients in Eq. (47) obey We now define the following operator φ_M: In Appendix D, we show that φ_M obeys the Pythagorean theorem, Eq. (19). In that appendix, we also show that if some dynamical generator L obeys Eqs. (43) and (44), then e^{τL} commutes with φ_M for all τ ≥ 0, so Eq. (20) holds. To do so, we show that Eqs. (43) and (44) We also show that Eqs. (43) and (44) imply that each separate e^{τL^{(A)}} commutes with φ_M. Combining these results implies that e^{τL} commutes with φ_M. Given Theorem 2, any protocol that carries out the transformation p → p′ while obeying modularity constraints permits the decomposition of EP found in Eq. (5), with φ = φ_M.
Note that D(p‖φ_M(p)) is a non-negative measure of the amount of statistical correlations between the subsystems of M under distribution p, which vanishes when the subsystems are conditionally independent of each other given the state of the overlap O(M). Thus, Eq. (5) implies that the EP is given by the "drop in the inter-subsystem correlations" over the course of the protocol, plus the EP that would be incurred by the initial distribution φ_M(p). Theorem 2 also implies the following bound on EP, Σ(p → p′) ≥ D(p‖φ_M(p)) − D(p′‖φ_M(p′)) ≥ 0. (51) The first inequality comes from the non-negativity of EP. The second inequality states that the statistical correlations between the subsystems of M can only decrease during any driving protocol that obeys modularity constraints.
The accessible free energy in Eq. (8) is given by F(φ_M(p), E). This means that the drop in correlations between subsystems cannot be turned into work by driving protocols that obey modularity constraints. Conversely, the greater the drop in statistical correlations between subsystems, the more work needs to be invested by any constrained driving protocol that carries out the transformation p → p′.
A particularly simple case of our approach applies when M contains two non-overlapping subsystems, M = {A, B} with A ∩ B = ∅. In that case, the decomposition of EP in Eq. (5) can be written as Σ(p → p′) = [I(X_A; X_B) − I(X′_A; X′_B)] + Σ(p_A p_B → p′_A p′_B), where I(X_A; X_B) and I(X′_A; X′_B) indicate the initial and final mutual information between the two subsystems, while p_A p_B = φ_M(p) and p′_A p′_B = φ_M(p′) are the initial and final product distributions over A × B. This immediately leads to a bound on EP in terms of the decrease of mutual information between A and B over the course of the process, Σ(p → p′) ≥ I(X_A; X_B) − I(X′_A; X′_B). A straightforward generalization of this result, which holds when M contains an arbitrary number of non-overlapping subsystems, gives Σ(p → p′) = [I(p) − I(p′)] + Σ(∏_{A∈M} p_A → ∏_{A∈M} p′_A), where I(p) = ∑_{A∈M} S(p_A) − S(p) is the multi-information in the initial distribution p with respect to partition M, I(p′) is the multi-information in the final distribution, and ∏_{A∈M} p_A = φ_M(p) is a product distribution over the partition M. (The multi-information is a well-known generalization of mutual information, which is also sometimes called "total correlation" [57].) This leads to a bound on EP in terms of the drop in multi-information, Σ(p → p′) ≥ I(p) − I(p′).
Figure 6. A two-dimensional Szilard box with a Brownian particle and two movable partitions, one vertical and one horizontal. No work can be extracted from initial correlations between the particle's horizontal and vertical position, such as when the particle is in the top left corner 50% of the time, and in the bottom right corner 50% of the time.
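The multi-information bound is easy to evaluate for small discrete systems. The sketch below is illustrative only: the helper names (entropy, marginal, multi_information) and the three-bit example are assumptions, and the partition is taken to consist of non-overlapping subsystems, each given as a tuple of coordinate indices.

```python
import math
from collections import defaultdict

def entropy(p):
    """Shannon entropy in nats of a distribution given as {state: probability}."""
    return -sum(v * math.log(v) for v in p.values() if v > 0)

def marginal(p, idx):
    """Marginal over the coordinates listed in idx; states are tuples."""
    m = defaultdict(float)
    for x, v in p.items():
        m[tuple(x[i] for i in idx)] += v
    return dict(m)

def multi_information(p, partition):
    """I(p) = sum_A S(p_A) - S(p) for non-overlapping subsystems A."""
    return sum(entropy(marginal(p, A)) for A in partition) - entropy(p)

# Three perfectly correlated bits, driven to a fully independent distribution.
p_init = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
p_final = {(a, b, c): 0.125 for a in (0, 1) for b in (0, 1) for c in (0, 1)}
partition = [(0,), (1,), (2,)]

ep_lower_bound = (multi_information(p_init, partition)
                  - multi_information(p_final, partition))
print(f"EP lower bound = {ep_lower_bound:.4f} nats")
```

For this example the initial multi-information is 2 ln 2 and the final one vanishes, so any protocol obeying the corresponding modularity constraints must dissipate at least 2 ln 2 of EP.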
We briefly review some prior related work. Boyd et al. [51] argued that a variant of Eq. (52) must hold in the special case where there are only two subsystems, they don't overlap, and one of them is held fixed. The argument in Boyd et al. [51] was restricted to the case where the process is isothermal. A more detailed analysis of the same scenario, which also holds for multiple reservoirs, was given in [52,58]. The more general bound on EP in terms of the drop of multi-information, for the case where multiple subsystems are simultaneously evolving, was previously derived for discrete-state systems in [58][59][60]. In this paper, we generalize these previous results to both continuous-and discrete-state systems, and to situations where the modular decomposition M may have overlapping (but fixed) subsystems.
Finally, rate matrices of the form Eq. (45) are a special case of the more general discrete-state dynamics analyzed in [61,62], in which the variables in the overlap between evolving subsystems are also allowed to evolve. (Note that those papers used different terminology from the terminology here; see [63]). Applying the result of Appendix E of [62] to the case analyzed in this section can be used to derive Eq. (53), for the particular case of discrete-state systems. (See also Ex. 1 in [64].)

A. Example: Szilard box
We demonstrate the results in this section using the example of a Szilard box. We use a variant of the model described in Section V A. In this variant, there is not only a vertical partition whose horizontal position can be manipulated, but also a horizontal partition whose vertical position can be manipulated. The available energy functions have the form where λ_1 and λ_2 are controllable parameters that determine the location of the vertical and horizontal partitions, respectively. As before, the system evolves according to Fokker-Planck dynamics, where each L ∈ Λ has the form of Eq. (30). As in Section V A, we use E∅ to indicate the uniform energy function that occurs when both partitions are removed, and π∅ ∝ e^{−βE∅} to indicate the corresponding equilibrium distribution.
The energy function Eq. (54) no longer obeys the simple reflection symmetry (x_1, x_2) → (x_1, −x_2), thus the techniques in Section V A no longer apply. Note also that by manipulating the horizontal partition, one can extract work from an initial distribution which is concentrated in the top (or bottom) half of the box, such as the one shown in Fig. 4, which was impossible to do with only a vertical partition. One may wonder whether non-trivial bounds on the EP can still be derived for this setup. In this section, we derive such bounds by using the fact that the driving protocols obey a modular decomposition. Let This means that any drop of the mutual information between the horizontal and vertical position of the particle cannot be exploited by any available driving protocol.
To make things concrete, imagine some driving protocol which starts and ends with the partitions removed. In addition, assume that under the initial distribution p, which is shown schematically in Fig. 6, the particle has a 50% probability of being in the top left quarter of the box, and a 50% probability of being in the bottom right quarter of the box. This initial distribution contains 1 bit of mutual information between the particle's horizontal position and vertical position. Assume that the final distribution p′ is in equilibrium (p′ = π∅). How much work can be extracted by transforming p → p′? The general bound of Eq. (1) states that W(p → p′) ≤ F(p, E∅) − F(π∅, E∅) = β^{−1} ln 2. However, this bound is too optimistic given the driving constraints. In fact, we have Σ(p → p′) ≥ I(X_1; X_2) − I(X′_1; X′_2) = ln 2. Using Eq. (56) and the relationship between work and EP, Eq. (2), gives W(p → p′) ≤ 0. This means that no work can be extracted from the correlated initial distribution p given the available driving protocols.
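The numbers in this example can be checked with a four-cell discretization. The sketch below assumes that the box is divided into four equal-area quarters, that distributions are uniform within each quarter, and that the energy function is constant when the partitions are removed, so that free energy differences reduce to entropy differences. It then combines the second-law bound with the mutual-information bound on EP, following the relation between work and EP invoked in the text. All names are illustrative.

```python
import math

def entropy(p):
    """Shannon entropy in nats."""
    return -sum(v * math.log(v) for v in p.values() if v > 0)

def marginals(p):
    """Marginals over the horizontal and vertical coordinates."""
    px, py = {}, {}
    for (x, y), v in p.items():
        px[x] = px.get(x, 0.0) + v
        py[y] = py.get(y, 0.0) + v
    return px, py

def mutual_information(p):
    """I(X1; X2) = S(p_1) + S(p_2) - S(p)."""
    px, py = marginals(p)
    return entropy(px) + entropy(py) - entropy(p)

beta = 1.0
quarters = [(x, y) for x in ("left", "right") for y in ("top", "bottom")]
p_init = {q: 0.0 for q in quarters}
p_init[("left", "top")] = 0.5
p_init[("right", "bottom")] = 0.5
p_final = {q: 0.25 for q in quarters}

# Second-law bound: free energy drop for a constant energy function.
second_law_bound = (entropy(p_final) - entropy(p_init)) / beta
# EP that must be dissipated under the modularity constraints.
ep_bound = mutual_information(p_init) - mutual_information(p_final)
constrained_bound = second_law_bound - ep_bound / beta
print(second_law_bound, ep_bound, constrained_bound)
```

Both the second-law bound and the unavoidable EP equal ln 2 here, so the constrained work bound is zero.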
Using these results, we also briefly analyze the thermodynamics of information given some measurement M . For the modular decomposition analyzed here, the operator φ M maps every joint distribution over X 1 × X 2 to a product of the marginal distributions. Using the notation of the EP bound in Eq. (14), we then have and similarly for D(p X |M p X |M ). Combined with Eq. (14), this gives the following bound on average EP in the presence of measurements, In other words, the conditional mutual information between X 1 and X 2 , given the measurement M , is useless information that can only be dissipated away. The useful information, on the other hand, is given by the difference between the total information acquired by the measurement and the useless conditional mutual information, This shows that information about the marginal distributions of X 1 and X 2 , minus the correlation between them, can potentially be turned into work.

B. Discrete-state spin system
We now demonstrate our results using a discrete-state system. We imagine a system with some number of spins, which are indexed by V . We then consider the modular decomposition into two subsystems M = {A, B}, which may have some non-zero overlap. Imagine that all available rate matrices L ∈ Λ can be written in the form Note that such rate matrices guarantee that the degrees of freedom in the overlap O(M) = A ∩ B are held fixed, which means that the driving protocols obey modularity constraints with respect to M. This kind of system, which is shown schematically in Fig. 7, might represent a feedback-controller, where the degrees of freedom in O(M) = A ∩ B correspond to the fixed controller which is used to control the evolution of A\B and B\A.
For this modular decomposition, using the definition of φ_M in Eq. (50), one can verify that D(p‖φ_M(p)) = I(X_{A\B}; X_{B\A} | X_{A∩B}). Plugged into Eq. (51), this gives the following bound on EP for any allowed transformation p → p′, Σ(p → p′) ≥ I(X_{A\B}; X_{B\A} | X_{A∩B}) − I(X′_{A\B}; X′_{B\A} | X′_{A∩B}). This result shows that any decrease in the conditional mutual information between A and B, given the state of the overlap A ∩ B, can only be dissipated away as EP, not turned into work.
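The identification of D(p‖φ_M(p)) with a conditional mutual information can be illustrated on three binary spins. The sketch below assumes that, for two overlapping subsystems, φ_M makes A \ B and B \ A conditionally independent given the overlap, i.e., φ_M(p)(a, o, b) = p(o) p(a|o) p(b|o); this form is inferred from the description of φ_M in the text rather than copied from Eq. (50). States are written as tuples (a, o, b), and the example distribution is hypothetical.

```python
import math
from collections import defaultdict

def kl(p, q):
    """KL divergence D(p||q) in nats for dict-valued distributions."""
    return sum(v * math.log(v / q[x]) for x, v in p.items() if v > 0)

def phi_M(p):
    """Assumed form of phi_M: p(o) * p(a|o) * p(b|o) for states (a, o, b)."""
    po, pao, pbo = defaultdict(float), defaultdict(float), defaultdict(float)
    for (a, o, b), v in p.items():
        po[o] += v
        pao[(a, o)] += v
        pbo[(b, o)] += v
    return {(a, o, b): (pao[(a, o)] * pbo[(b, o)] / po[o] if po[o] > 0 else 0.0)
            for (a, o, b) in p}

# Overlap spin o is a fair coin. When o = 0 the outer spins a and b are
# perfectly correlated; when o = 1 they are independent and uniform.
p = {(a, o, b): 0.0 for a in (0, 1) for o in (0, 1) for b in (0, 1)}
p[(0, 0, 0)] = 0.25
p[(1, 0, 1)] = 0.25
for a in (0, 1):
    for b in (0, 1):
        p[(a, 1, b)] = 0.125

cond_mi = kl(p, phi_M(p))   # I(X_{A\B}; X_{B\A} | X_{A∩B}) under the assumed phi_M
print(f"conditional mutual information = {cond_mi:.4f} nats")
```

Only the o = 0 branch contributes, so the result is half of ln 2.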

VII. COARSE-GRAINING CONSTRAINTS
In this final section, we consider bounds on EP and work that arise from coarse-graining constraints. We begin by introducing some notation and preliminaries. Let ξ : X → Z be some coarse-graining of the microscopic state space X, where Z is a set of macrostates. For any distribution p over X, let p_Z indicate the corresponding distribution over the macrostates Z, and let p_{X|Z} indicate the conditional probability distribution of microstates within macrostates. Similarly, for some dynamical generator L and distribution p, let [Lp]_Z indicate the instantaneous dynamics over the coarse-grained distribution p_Z. We will use P̂ := {p_Z : p ∈ P} to indicate the set of all coarse-grained distributions over Z.
To derive our bounds, we assume that the dynamics over the coarse-grained distributions are closed, i.e., for all L ∈ Λ, Given this assumption, the evolution of the coarse-grained distribution p_Z can be represented by a coarse-grained dynamical generator, ∂_t p_Z = L̂p_Z (discussed in more detail below). We provide simple conditions that guarantee that Eq. (60) holds for a given dynamical generator L. For a discrete-state master equation L, Eq. (60) is satisfied when where L̂_{z,z′} indicates the coarse-grained transition rate from macrostate z′ to macrostate z (see Appendix E). In words, this states that for each microstate x′, the total rate of transitions from x′ to microstates located in any other macrostate z ≠ ξ(x′) should depend only on the macrostate of x′, not on x′ directly. This condition has sometimes been called "lumpability" in the literature [65]. A similar condition, but with sums replaced by integrals, can be used for continuous-state master equations.
Moreover, for some kinds of coarse-graining functions and Fokker-Planck operators, we can guarantee that Eq. (60) holds by simply considering the available energy functions (see Appendix E). Imagine that L is a Fokker-Planck operator of the form and that ξ is a linear function R^n → R^m. Then, if the energy function E satisfies where J_ξ is the m × n Jacobian matrix of ξ and f is some arbitrary function of the macrostate, the coarse-grained dynamics will be closed. Moreover, in this case, the coarse-grained dynamical generator L̂ will itself have the following Fokker-Planck form: (For notational simplicity, and without loss of generality, here we assumed that ξ is scaled so that det(J_ξ J_ξ^T) = 1.) We now derive bounds on work and EP that arise from constraints on the coarse-grained macrostate dynamics.
To begin, as we show in Appendix E, our assumption of closed coarse-grained dynamics implies the following lower bound on the EP rate: where π^L_Z is the coarse-graining of π^L, the stationary distribution of L. The right-hand side of Eq. (65) can be understood as a kind of "coarse-grained EP rate" for isothermal protocols, which arises from the macrostate distribution p_Z being out of equilibrium. We will write the total "coarse-grained EP" over the course of the protocol as Σ̂. Given Eq. (65), we can apply Theorem 2 to derive bounds that arise in the presence of constraints on the coarse-grained dynamical generators Λ̂ := {L̂ : L ∈ Λ}. Imagine that there is some operator φ̂ : P̂ → P̂ over the coarse-grained distributions that obeys (1) the Pythagorean theorem, Eq. (19), for all p ∈ P̂, and (2) the commutativity relation, Eq. (20), for all L̂ ∈ Λ̂. For example, this coarse-grained operator might reflect the presence of symmetry or modularity constraints on the macrostate dynamics. Then, Theorem 2 implies the following decomposition of coarse-grained EP, Since Σ ≥ Σ̂ ≥ 0, as follows from Eq. (65), Eq. (67) implies the EP bound We can also use coarse-graining constraints to derive a decomposition of non-equilibrium free energy and a bound on extractable work. First, use φ̂ to define an operator over microstate distributions as φ(p) := φ̂(p_Z) p_{X|Z}. Given that φ̂ obeys the Pythagorean theorem of Eq. (19) at the level of distributions over Z, it can be verified that φ will obey the Pythagorean theorem at the level of distributions over X. Then, by exploiting the Pythagorean theorem and the EP bound in Eq. (68), we can use the approach described in Section IV to decompose the non-equilibrium free energy into accessible and inaccessible free energy (as in Eq. (7)), We can also use the approach in Section IV to derive a bound on extractable work (as in Eq. (8)), It is important to note that the operator φ(p) = φ̂(p_Z) p_{X|Z} will not necessarily satisfy the commutativity relation in Eq. (20).
Instead, given our assumption of closed coarse-grained dynamics, it will satisfy a "coarse-grained commutativity condition", which is weaker than Eq. (20). This weaker commutativity relation is sufficient to derive the bounds on EP and work we present above. However, because the full commutativity relation is not satisfied, the exact identities Eqs. (5) and (9) are not guaranteed to hold for the operator φ. One simple application of the above results occurs when all L ∈ Λ have the same coarse-grained equilibrium distribution, i.e., there is some π_Z such that π^L_Z = π_Z for all L ∈ Λ. In this case, φ̂(p) = π_Z will satisfy Eqs. (19) and (20) at the coarse-grained level (compare to the derivation of Eq. (54) above). Applying Eq. (68) gives the EP bound In words, if the coarse-grained equilibrium distribution cannot be changed, then any deviation of the actual coarse-grained distribution from the coarse-grained equilibrium distribution can only be dissipated as EP, not turned into work. Conversely, if φ̂ represents coarse-grained symmetry or modularity constraints, then Eq. (68) implies that any asymmetry or inter-subsystem correlation in the macrostate distribution can only be dissipated away, not turned into work.
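For discrete-state master equations, the lumpability condition described in this section can be checked directly from a rate matrix. The sketch below is an illustrative helper, not code from the paper: it assumes the convention that rates[x][x_src] is the transition rate from microstate x_src to microstate x, represents the coarse-graining ξ as a list xi of macrostate labels, and returns the coarse-grained rates only if the total outgoing rate into each other macrostate is the same for all microstates within a macrostate. The example rate matrix is hypothetical.

```python
def coarse_grain_rates(rates, xi, tol=1e-12):
    """Return coarse-grained rates {(z, z_src): rate} if lumpable, else None.

    rates[x][x_src] is the rate from microstate x_src to microstate x;
    xi[x] is the macrostate label of microstate x.
    """
    n = len(rates)
    macro = sorted(set(xi))
    coarse = {}
    for x_src in range(n):
        for z in macro:
            if z == xi[x_src]:
                continue  # only transitions into other macrostates matter
            total = sum(rates[x][x_src] for x in range(n) if xi[x] == z)
            key = (z, xi[x_src])
            if key in coarse and abs(coarse[key] - total) > tol:
                return None  # rate depends on x_src itself, not just its macrostate
            coarse.setdefault(key, total)
    return coarse

xi = [0, 0, 1, 1]   # microstates 0, 1 form macrostate 0; microstates 2, 3 form macrostate 1
lumpable = [
    [0.0, 1.0, 0.5, 1.0],
    [4.0, 0.0, 0.5, 0.0],
    [1.0, 3.0, 0.0, 7.0],
    [2.0, 0.0, 2.0, 0.0],
]
not_lumpable = [row[:] for row in lumpable]
not_lumpable[3][1] = 1.0   # microstate 1 now leaves macrostate 0 faster than microstate 0

print(coarse_grain_rates(lumpable, xi))
print(coarse_grain_rates(not_lumpable, xi))
```

Rates between microstates within the same macrostate are unconstrained by this check, which is why the within-block entries of the example can be chosen freely.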

A. Example: Szilard box
We demonstrate our results using the Szilard box. We use the setup described in Section V A, with a single vertical partition and overdamped Fokker-Planck dynamics as in Eq. (30). However, we now assume that there is a vertical gravitational force, as shown in Fig. 8, so all available energy functions have the form E(x_1, x_2) = V_p(x_1 − λ_1) + V_w(|x_1|) + V_w(|x_2|) + κx_2, (73) where κ is a fixed constant that determines the strength of gravity (compare to Eq. (33)).
Figure 8. A two-dimensional Szilard box with a Brownian particle, in the presence of gravity.
The modified energy function Eq. (73) no longer obeys the reflection symmetry (x_1, x_2) → (x_1, −x_2), thus the techniques in Section V A can no longer be applied. The dynamics do, however, obey the modular decomposition analyzed in Section VI A; for expository reasons, here we derive a different kind of bound on EP from the one derived in that section.
The microstate of the particle is represented by the horizontal and vertical position, X = (X_1, X_2). We consider a coarse-graining of this microstate in which the macrostate is the vertical coordinate of the particle, Z = X_2. This corresponds to the coarse-graining function ξ(x_1, x_2) = x_2, (74) which satisfies Eq. (63), and therefore guarantees that the coarse-grained dynamics are closed. Given Eq. (74), for all L ∈ Λ, the coarse-grained Fokker-Planck operator L̂ (Eq. (64)) will have the same coarse-grained equilibrium distribution, where we used the form of V_w(·) from Eq. (34) and 1 is the indicator function. Thus, the operator φ̂(p) = π_Z p_{X|Z} satisfies the conditions Eqs. (19) and (20) for the set of coarse-grained dynamical operators Λ̂, allowing us to use bounds on EP and work such as Eqs. (70) and (72). Consider a driving protocol that starts and ends with the partition removed. When the partition is removed, the energy function takes the form E∅(x_1, x_2) = V_w(|x_1|) + V_w(|x_2|) + κx_2, with the corresponding equilibrium distribution where Z∅ = 2 sinh(βκ)/βκ is the normalization constant. Now imagine that under the initial distribution p, the particle is restricted to the top half of the box, x_2 ∈ [0, 1], so where Z = (1 − e^{−βκ})/βκ is the normalization constant. Imagine also that the final distribution p′ is the equilibrium one, p′ = π∅. How much work can be extracted by this protocol, given the constraints on the energy functions? The general bound of Eq. (1) can be evaluated to give W(p → p′) ≤ β^{−1}[ln 2 + ln(sinh(βκ)/(1 − e^{−βκ}))].
As before, however, this bound is too optimistic. Note that the initial accessible free energy in Eq. (70) is given by $F(\pi_Z\, p_{X|Z}, E_\emptyset) = F(\pi_\emptyset, E_\emptyset)$, where we used that $p_{X|Z}(x_1|x_2) = \frac{1}{2}\mathbb{1}_{[-1,1]}(x_1)$. This is the same as the final free energy in Eq. (70), which implies $W(p \to p') \le 0$. Thus, no work can be extracted from this initial distribution p, given the available driving protocols.
Now imagine that the particle's initial distribution p is constrained to the left half of the box, while its vertical position is in equilibrium:
$$p(x_1, x_2) = \frac{1}{Z_\emptyset}\, \mathbb{1}_{[-1,0]}(x_1)\, \mathbb{1}_{[-1,1]}(x_2)\, e^{-\beta\kappa x_2}.$$
Eq. (1) states that $W(p \to p') \le \beta^{-1} \ln 2$. In this case, the initial accessible free energy in Eq. (70) is given by $F(\pi_Z\, p_{X|Z}, E_\emptyset) = F(p, E_\emptyset)$, so Eq. (70) coincides with Eq. (1) and shows that $\beta^{-1} \ln 2$ of work may be extractable from this initial distribution.
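The two cases above can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes a hard-wall box $[-1,1]^2$, discretizes it on a midpoint grid, and uses the KL-contraction form of the EP bound stated in the abstract, which for $\hat{\phi}(p) = \pi_Z\, p_{X|Z}$ and final distribution $\pi_\emptyset$ reduces to $D(p_Z \| \pi_Z)$. The function names, grid size, and the values β = κ = 1 are illustrative assumptions.

```python
import numpy as np


def kl(p, q):
    """KL divergence D(p||q) in nats, with the convention 0 ln 0 = 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def szilard_gravity_bounds(beta=1.0, kappa=1.0, n=400):
    """Return (second-law bound, constrained bound) on beta*W, in nats."""
    # Midpoint grid on the box [-1, 1] x [-1, 1]; axis 0 is x1, axis 1 is x2.
    edges = np.linspace(-1.0, 1.0, n + 1)
    mid = 0.5 * (edges[:-1] + edges[1:])
    x1, x2 = np.meshgrid(mid, mid, indexing="ij")

    # Equilibrium with the partition removed: uniform in x1, Boltzmann in x2.
    pi = np.exp(-beta * kappa * x2)
    pi /= pi.sum()
    pi_z = pi.sum(axis=0)  # coarse-grained equilibrium over Z = X2

    results = {}
    for name, region in (("top_half", x2 > 0), ("left_half", x1 < 0)):
        p = np.where(region, pi, 0.0)
        p /= p.sum()
        p_z = p.sum(axis=0)          # marginal over the macrostate Z = X2
        second_law = kl(p, pi)       # beta * [F(p, E) - F(pi, E)], cf. Eq. (1)
        ep_bound = kl(p_z, pi_z)     # D(p || pi_Z p_{X|Z}) = D(p_Z || pi_Z)
        results[name] = (second_law, second_law - ep_bound)
    return results


bounds = szilard_gravity_bounds(beta=1.0, kappa=1.0)
analytic = np.log(2) + np.log(np.sinh(1.0) / (1.0 - np.exp(-1.0)))
for name, (loose, tight) in bounds.items():
    print(f"{name}: second law {loose:.4f}, with constraints {tight:.4f}")
print(f"closed-form second-law bound for the top half: {analytic:.4f}")
```

For the top-half distribution the constrained bound vanishes even though the second-law bound is positive, while for the left-half distribution both bounds equal ln 2, in line with the discussion above.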
Finally, we analyze the thermodynamic value of different measurements using this model of a Szilard box with gravity. Imagine that, starting from an initial equilibrium distribution $\pi_\emptyset$, one measures the particle's location using some measurement M, and then drives the system back to $\pi_\emptyset$ while extracting work from the particle. The second law provides a fundamental limit on average extractable work, Eq. (3), which gives
$$W \le \beta^{-1} I(X; M).$$
However, we also have the bound on average EP under constraints, Eq. (14), which gives
$$\Sigma \ge D(\pi^{\emptyset}_{X_2|M} \,\|\, \pi^{\emptyset}_{X_2}) = I(X_2; M).$$
Using the relationship between work and EP, Eq. (2), gives the tighter work bound
$$W \le \beta^{-1}\left[I(X; M) - I(X_2; M)\right] = \beta^{-1} I(X_1; M \,|\, X_2).$$
This shows that only the information about the particle's horizontal location, conditioned on its vertical location, can potentially be turned into work.
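The split of measured information into a vertical part $I(X_2; M)$ and a remainder $I(X_1; M | X_2)$ follows from the chain rule for mutual information. The sketch below illustrates it on a toy model that is an assumption of this example, not part of the paper: the box is coarsened to 2 × 2 cells, the horizontal coordinate is uniform, and the vertical weights (0.7, 0.3) stand in for the effect of gravity. All names are hypothetical.

```python
import numpy as np


def mutual_information(joint):
    """I(A;B) in nats for a 2-D joint probability table joint[a, b]."""
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (pa * pb)[mask])))


def information_split(p_x, meas):
    """Return I(X;M), I(X2;M) and their difference I(X1;M|X2).

    p_x[i, j]     : probability of (X1, X2) = (i, j)
    meas[i, j, m] : probability of outcome m given (X1, X2) = (i, j)
    """
    joint = p_x[:, :, None] * meas                          # p(x1, x2, m)
    n1, n2, nm = joint.shape
    total = mutual_information(joint.reshape(n1 * n2, nm))  # I(X;M)
    vertical = mutual_information(joint.sum(axis=0))        # I(X2;M)
    return total, vertical, total - vertical                # chain rule


# Toy equilibrium: uniform over left/right, biased toward the bottom cell.
p_x = np.outer([0.5, 0.5], [0.7, 0.3])

# Noiseless measurement of the horizontal cell (M = X1).
meas_x1 = np.zeros((2, 2, 2))
for i in range(2):
    meas_x1[i, :, i] = 1.0

# Noiseless measurement of the vertical cell (M = X2).
meas_x2 = np.zeros((2, 2, 2))
for j in range(2):
    meas_x2[:, j, j] = 1.0

print("M = X1:", information_split(p_x, meas_x1))
print("M = X2:", information_split(p_x, meas_x2))
```

In this toy model, a horizontal measurement leaves ln 2 nats of potentially usable information, whereas a vertical measurement leaves none, since everything it acquires is counted in $I(X_2; M)$.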

VIII. DISCUSSION
In this paper, we analyzed the EP and work incurred by a driving protocol that carries out some transformation p → p′, while subject to constraints on the set of available dynamical generators. We first used a general theoretical framework to derive several decompositions and bounds on EP and extractable work. We then applied our general framework to analyze three broad classes of driving protocol constraints, reflecting symmetry, modularity, and coarse-graining.
Our bounds on EP and extractable work, such as Eqs. (6) and (8), are stated in terms of state functions, in that they depend only on the initial and final distributions p and p′ and not on the path that the protocol takes in going from p to p′. In general, it may also be possible to derive other, possibly tighter, bounds on work and EP that are not written in this form. Nonetheless, bounds written in terms of state functions have some important advantages. In particular, they allow one to quantify the inherent "thermodynamic value" (in terms of EP and work) of a given distribution p, irrespective of what particular future protocols that system may undergo, as long as those protocols obey the relevant constraints.
For simplicity, our results were derived for isothermal protocols, where the system is coupled to a single heat bath at a constant inverse temperature β. Nonetheless, as described in Appendix A, many of our results continue to hold for more general protocols, in which the system is coupled to any number of thermodynamic reservoirs. In such cases, our decomposition of EP, Eq. (5), applies to so-called nonadiabatic EP, which reflects the contribution to EP that is due to the system being out of the stationary state. In most cases of interest (such as discrete-state master equation dynamics, overdamped dynamics, etc.), nonadiabatic EP provides a lower bound on the total EP, so our lower bounds on EP, Eqs. (6) and (14), hold for both nonadiabatic EP and total EP. Note, however, that the simple relationship between work and EP, Eq. (2), as well as our results regarding extracted work which make use of this relationship (such as Eqs. (8) and (9)), hold only for isothermal protocols.

which is similar to setups commonly employed in modern nonequilibrium statistical physics [2, 66-70]. This model can be justified by imagining a box that contains a large colloidal particle, as well as a medium of small solvent particles to which the vertical partition is permeable. Note that this model differs from Szilard's original proposal [71], in which the box contains a single particle in a vacuum, which has been analyzed in [72-74].

In the main text, for simplicity we assumed that all protocols are isothermal (coupled to a single heat bath at a constant inverse temperature β). In fact, our results apply more generally, to protocols that can be coupled to any number and kind of thermodynamic reservoirs.
For a general protocol, the right hand side of Eq. (18) quantifies the rate of so-called nonadiabatic EP [43-45]. Nonadiabatic EP is non-negative, and reflects the contribution to EP that is due to the system being out of stationarity. In the general case, our decompositions in Theorems 1 and 2, as well as the EP lower bounds Eqs. (6) and (14), apply to nonadiabatic EP, rather than overall EP. Importantly, for a given dynamical generator L, the nonadiabatic EP rate is a lower bound on the EP rate whenever the stationary distribution $\pi_L$ is symmetric under conjugation of odd-parity variables (i.e., when the stationary probability of every state x is equal to the stationary probability of its conjugated state, in which the sign of all odd-parity variables such as velocity is flipped) [45]. This symmetry condition is satisfied by many dynamics of interest, including discrete-state master equations (which typically do not use odd variables), overdamped dynamics (which have no odd variables), and many kinds of underdamped dynamics. In such cases, Eqs. (6) and (14) give lower bounds not only on the nonadiabatic EP, but also (by transitivity) on regular EP.
As a final note, we observe that our EP bound for closed coarse-grained dynamics, Eq. (65), bounds the overall EP rate, not the nonadiabatic EP rate, even for non-isothermal protocols. See Appendix E 3 for details.

Proof of Theorem 1
The following results assume that L is an infinitesimal dynamical generator that has a stationary distribution.

Lemma 1.
Assume that $e^{L}\phi = \phi e^{L}$ and that, for all p, the support of φ(p) contains the support of p. Then L has a stationary distribution π ∈ img φ whose support includes the support of every other stationary distribution of L.
Proof. Let q be a stationary distribution of L. Then,
$$e^{L}\phi(q) = \phi(e^{L} q) = \phi(q).$$
This shows that φ(q) ∈ img φ must also be a stationary distribution of L. By assumption, the support of q must fall within the support of φ(q). Thus, there must be a stationary distribution of L with maximal support that is an element of img φ.

Lemma 2.
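If $e^{\tau L}\phi(p) = \phi(e^{\tau L} p)$ for all p ∈ P and τ > 0, then for any a, b ∈ P with $\partial_t a(t) = L a(t)$ and $\partial_t b(t) = L b(t)$,
$$\frac{d}{dt} D(a(t) \,\|\, \phi(b(t))) \le 0.$$
Proof. Expand the derivative as
$$\begin{aligned}
\frac{d}{dt} D(a(t) \,\|\, \phi(b(t))) &= \lim_{\tau \to 0^+} \frac{1}{\tau}\left[ D(a(t+\tau) \,\|\, \phi(b(t+\tau))) - D(a(t) \,\|\, \phi(b(t))) \right] \\
&= \lim_{\tau \to 0^+} \frac{1}{\tau}\left[ D(e^{\tau L} a(t) \,\|\, \phi(e^{\tau L} b(t))) - D(a(t) \,\|\, \phi(b(t))) \right] \\
&= \lim_{\tau \to 0^+} \frac{1}{\tau}\left[ D(e^{\tau L} a(t) \,\|\, e^{\tau L} \phi(b(t))) - D(a(t) \,\|\, \phi(b(t))) \right] \le 0,
\end{aligned}$$
where in the third line we used the commutativity relation, and then the data processing inequality for KL divergence [75].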
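The monotonicity in Lemma 2 can be illustrated numerically for a discrete-state master equation. The sketch below rests on several assumptions that are not taken from the paper: a 4-state system, a symmetry group consisting of the identity and one involutive permutation, a twirling operator that averages over this group, and an Euler step $T = I + \mathrm{d}t\, L$ in place of $e^{\tau L}$. Since T is a stochastic matrix that commutes with the twirling operator, the discrete-time analogue of the lemma holds exactly for it. All function names are hypothetical.

```python
import numpy as np


def kl(p, q):
    """KL divergence D(p||q) in nats, with the convention 0 ln 0 = 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def symmetric_rate_matrix(n, perm, rng):
    """Random rate matrix L (columns sum to zero) commuting with perm.

    perm must be an involution, so that {identity, perm} is a group.
    """
    P = np.eye(n)[:, perm]             # permutation matrix of the symmetry
    W = rng.random((n, n))
    W = 0.5 * (W + P @ W @ P.T)        # symmetrize the rates under perm
    np.fill_diagonal(W, 0.0)
    L = W - np.diag(W.sum(axis=0))     # diagonal fixed by probability conservation
    return L, P


def kl_trajectory(L, P, a, b, dt=0.01, steps=500):
    """Track D(a(t) || phi(b(t))) along an Euler-discretized evolution."""
    T = np.eye(len(a)) + dt * L        # stochastic for small enough dt
    values = []
    for _ in range(steps):
        phi_b = 0.5 * (b + P @ b)      # twirling over {identity, perm}
        values.append(kl(a, phi_b))
        a, b = T @ a, T @ b
    return np.array(values)


rng = np.random.default_rng(0)
L, P = symmetric_rate_matrix(4, [1, 0, 3, 2], rng)
a = rng.dirichlet(np.ones(4))
b = rng.dirichlet(np.ones(4))
values = kl_trajectory(L, P, a, b)
print("L commutes with the permutation:", np.allclose(L @ P, P @ L))
print("D(a || phi(b)) never increases:", bool(np.all(np.diff(values) <= 1e-12)))
```

The small tolerance in the final check only absorbs floating-point noise once the divergence has decayed to nearly zero.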
Proof of Theorem 1. Given Lemma 1, let π ∈ img φ be a stationary distribution of L with maximal support. Then, expand the derivative as
Rewrite the term in the brackets as
The non-negativity of $-\frac{d}{dt} D(p(t) \,\|\, \phi(p(t)))$ is given by taking a = b = p in Lemma 2.

Proof of Theorem 2
Lemma 3. Consider a protocol {L(t) : t ∈ [0, 1]} such that L(t) ∈ Λ for all t, and assume there is an operator φ that obeys Eqs. (19) and (20). Then,
$$\phi(p(t)) = [\phi(p)](t),$$
where p(t) is the distribution at time t given initial distribution p, and [φ(p)](t) is the distribution at time t given initial distribution φ(p).
Proof. Using Lemma 2 with a = [φ(p)](t) and b = p(t),
Note that
Proof of Theorem 2. Write Σ as the integral of the EP rate:
Using Theorem 1, we can rewrite the last line as
Then, rewrite the first term using the fundamental theorem of calculus,
Note that the right hand side is non-negative, since $-\frac{d}{dt} D(p(t) \,\|\, \phi(p(t))) \ge 0$ by Lemma 2. Using Lemma 3, we can rewrite the second term as

Appendix C: Symmetry constraints

φG obeys the Pythagorean theorem
Here we show that $\phi_G$ obeys the Pythagorean theorem, in the sense that for all p, q ∈ P,
$$D(p \,\|\, \phi_G(q)) = D(p \,\|\, \phi_G(p)) + D(\phi_G(p) \,\|\, \phi_G(q)). \tag{C1}$$
To show this, first rewrite the left hand side of Eq. (C1) as
Note that for any a ∈ P, $\phi_G(a)$ is invariant under any $\upsilon_g$:
where the second line involves a change of variables in the Lebesgue integral, and the third line uses that $\mu(g') = \mu(g^{-1} g')$, by properties of the Haar measure. We use this result to derive the following equality:
Eq. (C4) uses the variable substitution $x \to \upsilon_g(x)$ in the integral, along with the fact that the absolute value of the determinant of the Jacobian of $\upsilon_g$ is 1 (since it is a rigid transformation). Eq. (C5) uses Eq. (C3), while Eq. (C7) uses the definition of the twirling operator, Eq. (28). Eq. (C1) follows by combining the right hand side of Eq. (C2) with Eq. (C8) twice, first taking a = p and then taking a = q.
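The Pythagorean relation for a twirling operator can be spot-checked on a finite state space, where the Haar integral becomes a uniform average over group elements. The sketch below is an illustration under assumed choices that do not come from the paper: six states arranged on a ring, and the group of cyclic shifts by 0, 2 and 4 positions. The function names are hypothetical.

```python
import numpy as np


def kl(p, q):
    """KL divergence D(p||q) in nats, with the convention 0 ln 0 = 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def twirl(p, perms):
    """Average p over a finite group of permutations of the state space."""
    return np.mean([p[perm] for perm in perms], axis=0)


# Group of cyclic shifts by 0, 2 and 4 positions on a ring of 6 states.
perms = [np.roll(np.arange(6), shift) for shift in (0, 2, 4)]

rng = np.random.default_rng(1)
p = rng.dirichlet(np.ones(6))
q = rng.dirichlet(np.ones(6))

lhs = kl(p, twirl(q, perms))
rhs = kl(p, twirl(p, perms)) + kl(twirl(p, perms), twirl(q, perms))
print(f"D(p||phi(q)) = {lhs:.12f}")
print(f"D(p||phi(p)) + D(phi(p)||phi(q)) = {rhs:.12f}")
```

The two printed values agree up to floating-point error because $\ln[\phi_G(p)/\phi_G(q)]$ is invariant under the group, so its average under p equals its average under $\phi_G(p)$.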
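Proof. First, expand the definition of the operator exponential,
$$e^{\tau L} = \sum_{k=0}^{\infty} \frac{\tau^k}{k!} L^k.$$
Since L and φ commute, $L^k$ and φ commute, so
$$e^{\tau L} \phi = \sum_{k=0}^{\infty} \frac{\tau^k}{k!} L^k \phi = \sum_{k=0}^{\infty} \frac{\tau^k}{k!} \phi L^k = \phi\, e^{\tau L}.$$

Note that we performed the variable substitution $y = \upsilon_g(x)$. The same derivation applies to a continuous-state master equation, as long as one replaces the sum over x by an integral and, in the variable substitution, uses that the absolute value of the determinant of the Jacobian of any $\upsilon_g$ is 1 (since it is a rigid transformation).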
We have shown that L and $\phi_G$ commute. Given Lemma 4, this implies that Eq. (20) holds for this L.

Here, we use the notation $a \circ b$ to indicate composition, for instance $[p \circ \upsilon_{g^{-1}}](x) = p(\upsilon_{g^{-1}}(x))$. Eq. (28) can be rewritten using this notation as
We now show that Eq. (31) is sufficient for Eq. (20) to hold, when all the L ∈ Λ refer to Fokker-Planck equations of the form Eq. (30). Let E be an energy function that is invariant under G, i.e., $E(x) = E(\upsilon_g(x))$ for all g ∈ G. Then, rewrite Eq. (30) as
where we leave the dependence of p(x, t) on t implicit. Now, choose any g ∈ G and write the diffusion term in Eq. (C10) as
Here, we first used that $p = [p \circ \upsilon_{g^{-1}}] \circ \upsilon_g$, and then that the Laplace operator commutes with rigid transformations. Then consider the drift term in Eq. (C10). Using the product rule,
$$\nabla \cdot (\nabla E(x)\, p(x)) = (\nabla p(x))^T (\nabla E(x)) + p(x)\, \Delta E(x). \tag{C12}$$
We can rewrite the second term above as
where in the first line we used that $p(x) = [p \circ \upsilon_{g^{-1}}](\upsilon_g(x))$, in the second line we used the invariance of E under G, and in the third line we used that the Laplace operator commutes with rigid transformations. Now consider the first term on the right hand side of Eq. (C12):
$$\begin{aligned}
(\nabla p(x))^T (\nabla E(x)) &= \left[\nabla([p \circ \upsilon_{g^{-1}}] \circ \upsilon_g)(x)\right]^T \left[\nabla(E \circ \upsilon_g)(x)\right] \\
&= \left[J_g^T\, \nabla[p \circ \upsilon_{g^{-1}}](\upsilon_g(x))\right]^T J_g^T\, (\nabla E)(\upsilon_g(x)) \\
&= \left[\nabla[p \circ \upsilon_{g^{-1}}](\upsilon_g(x))\right]^T J_g J_g^T\, (\nabla E)(\upsilon_g(x)) \\
&= \left[\nabla[p \circ \upsilon_{g^{-1}}](\upsilon_g(x))\right]^T (\nabla E)(\upsilon_g(x)),
\end{aligned} \tag{C14}$$
where $J_g$ indicates the Jacobian of $\upsilon_g$. Plugging Eqs. (C13) and (C14) back into Eq. (C12) gives
where we have used the product rule (in reverse). Using Eqs. (C11) and (C15), we can rewrite Eq. (C10) as
Note that this holds for all g ∈ G.
Finally, use Eq. (C9) to derive the following commutativity relationship:

where $a_O$ and $a_{A \setminus O | A \cap O}$ indicate marginal and conditional distributions, respectively. In the third line above, we used that p and φ(p) have the same marginals over all subsystems A ∈ M as well as the overlap O (this can be verified from the definition of φ, Eq. (50)). Then,
$= D(p \,\|\, \phi(p)) + D(\phi(p) \,\|\, \phi(q))$, where in the last line we applied Eq. (D2) twice, first with a = p and then with a = q.

φM commutes with e τ L
Here we show that if, for some dynamical generator L, Eqs. (43) and (44) hold for all A ∈ M, then φ and $e^{\tau L}$ commute for all τ ≥ 0.
We first introduce some helpful notation. Let $\delta_x$ indicate the delta function over X centered at x (this will be the Dirac delta for continuous X, and the Kronecker delta for discrete X). For any subsystem S ⊆ V, let $\delta_{x_S}$ indicate the delta function over $X_S$ centered at $x_S$. For any A ∈ M, let $\tilde{A} = A \setminus \bigcup_{B \in M \setminus \{A\}} B$ indicate the degrees of freedom that belong exclusively to A (and not to other subsystems in M), and let $\tilde{A}^c = V \setminus \tilde{A} = \bigcup_{B \in M \setminus \{A\}} B$ indicate the complement of $\tilde{A}$.
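As a concrete reading of this notation, the following sketch computes the exclusive part of a subsystem and its complement for a small example. The variable set V = {1, 2, 3, 4} and the two overlapping subsystems are hypothetical choices made only for illustration, as are the function names.

```python
def exclusive_part(A, subsystems):
    """Degrees of freedom in A that belong to no other subsystem."""
    others = set().union(*(B for B in subsystems if B != A))
    return set(A) - others


def complement_of_exclusive(A, subsystems, V):
    """Complement in V of the exclusive part of A."""
    return set(V) - exclusive_part(A, subsystems)


V = {1, 2, 3, 4}
A = frozenset({1, 2})
B = frozenset({2, 3, 4})
M = [A, B]  # two subsystems that overlap on variable 2

print(exclusive_part(A, M))              # variables owned only by A
print(complement_of_exclusive(A, M, V))  # everything else in V
```

Here variable 2 is shared, so the exclusive part of A is {1} and its complement {2, 3, 4} coincides with the union of the other subsystems, as in the definition above.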
We will use the fact that if some p obeys $p_S = \delta_{x_S}$, then
This follows from the fact that constant random variables (such as $X_S$ under p) are always statistically independent. We will also use the following intermediate results.
indicate the conditional probability of state x′, given some initial state x that evolves for time τ under $L^{(A)}$. Given Eqs. (43) and (44), this conditional distribution has the form $P(x'|x) = P(x'_{\tilde{A}}|x_A)\, \delta_{x_{\tilde{A}^c}}(x'_{\tilde{A}^c})$.
Proof. First, write the conditional distribution over $\tilde{A}$ as
where we used the definition of the operator exponential. Note that $[L^{(A)} f]_A$ is a function of $f_A$ by Eq. (43), so (by induction) $[(L^{(A)})^k f]_A$ is a function of $f_A$. Thus, the right hand side of the above equation is a function of $x_A$, which allows us to generically write
Since $\tilde{A} \subseteq A$, by marginalization this also implies
Similarly, the conditional probability distribution over $\tilde{A}^c$ is
where in the third line we integrated over the delta function.
Now take q = φ(p), and note that
By the definition of φ, we also have that $q(x_{\tilde{A}}|x_{\tilde{A}^c}) = q(x_{\tilde{A}}|x_{A \cap \tilde{A}^c})$. This implies that the integral in Eq. (D13) is a function of $x_A$, which we write as
Combining with Eqs. (D13) and (D14) allows us to write
This has the form of the right hand side of Eq. (50)