Thermodynamically-Efficient Local Computation and the Inefficiency of Quantum Memory Compression

Modularity dissipation identifies how locally-implemented computation entails costs beyond those required by Landauer's bound on thermodynamic computing. We establish a general theorem for efficient local computation, giving the necessary and sufficient conditions for a local operation to have zero modularity cost. Applied to the thermodynamic generation of stochastic processes, it confirms a conjecture that classical generators are efficient if and only if they satisfy retrodiction, which places minimal memory requirements on the generator. This extends immediately to quantum computation: any quantum simulator that employs quantum memory compression cannot be thermodynamically efficient.


I. INTRODUCTION
Recently, Google AI announced a breakthrough in quantum supremacy, using a 54-qubit processor ("Sycamore") to complete a target computation in 200 seconds, claiming the world's fastest supercomputer would take more than 10,000 years to perform a similar computation [1]. Shortly afterward, IBM announced that they had proven the Sycamore circuit could be successfully simulated on the Summit supercomputer, leveraging its 250 PB storage and 200 petaFLOPS speed to complete the target computation in a matter of days [2]. This episode highlights two important aspects of quantum computing: first, the importance of memory and, second, the subtle relationship between computation and simulation.
Feynman [3] broached the notion that quantum computers would be singularly useful for the simulation of quantum processes, without supposing that this would also make them advantageous at simulating classical processes. Here, we explore issues raised by the recent developments in quantum computing, focusing on the problem of simulating classical stochastic processes via stochastic and quantum computers. We show that using quantum computers to simulate classical processes typically requires nonzero thermodynamic cost, while stochastic computers can in principle simulate classical processes at zero cost. This supports the viewpoint originally put forth by Feynman, that certain types of computers would each be advantageous at simulating certain physical processes, and it challenges current claims of quantum supremacy. Furthermore, we show that in both classical and quantum simulations, thermodynamic efficiency places a lower bound on the required memory of the simulator. To demonstrate both, we must prove a new theorem on the thermodynamic efficiency of local operations.

Correlation is a resource: it has been investigated as such in the formalism of resource theories [4], such as that of local operations with classical communication [5], with public communication [6], and many others, as well as the theory of local operations alone, under the umbrella term of common information [7][8][9]. Correlations have long been recognized as a thermal resource [10][11][12][13], enabling efficient computation to be performed when taken properly into account. Local operations that act only on part of a larger system can never increase the correlation between the part and the whole; most often, they are destructive to correlations and therefore resource-expensive.
Thermodynamic dissipation induced by a local operation, say on system A of a bipartite system AB to make a new joint system CB, is classically proportional to the difference in mutual informations [14]: ∆S_loc := I(A:B) − I(C:B). This can be asymptotically achieved for quantum systems [15]. By the data processing inequality [16,17], it is always nonnegative: ∆S_loc ≥ 0. Optimal thermodynamic efficiency is achieved when ∆S_loc = 0. To identify the conditions, in both classical and quantum computation, when this is so, we draw from prior results on saturated information-theoretic inequalities [18][19][20][21][22][23][24]. Specifically, using a generalized notion of quantum sufficient statistic [24][25][26][27], we show that a local operation on part of a system is efficient if and only if it unitarily preserves the minimal sufficient statistic of the part for the whole. Our geometric interpretation of this also draws on recent progress on fixed points of quantum channels [28][29][30][31].
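As a classical illustration of this inequality, ∆S_loc = I(A:B) − I(C:B) and its nonnegativity can be checked numerically. The following is a minimal sketch; the helper names (`mutual_information`, `apply_local_channel`) and the bit-flip channel are our own illustrative choices, not constructions from the paper:

```python
import math
from collections import defaultdict

def mutual_information(joint):
    """I(A:B) in bits for a joint distribution given as a dict {(a, b): p}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

def apply_local_channel(joint, channel):
    """Act with a stochastic channel {(c, a): Pr(c|a)} on A alone: AB -> CB."""
    out = defaultdict(float)
    for (a, b), p in joint.items():
        for (c, a2), q in channel.items():
            if a2 == a:
                out[(c, b)] += p * q
    return dict(out)

# Two perfectly correlated bits: I(A:B) = 1 bit.
rho_AB = {(0, 0): 0.5, (1, 1): 0.5}

# A noisy local channel on A (bit flip with probability 0.1).
noise = {(0, 0): 0.9, (1, 0): 0.1, (1, 1): 0.9, (0, 1): 0.1}
rho_CB = apply_local_channel(rho_AB, noise)

delta_S_loc = mutual_information(rho_AB) - mutual_information(rho_CB)
print(delta_S_loc)  # positive: local noise destroys correlation with B
```

The data processing inequality guarantees the printed value is nonnegative for any local channel; it vanishes only for channels of the kind characterized by Thm. 1 below.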
Paralleling previous results on ∆S loc [14], our particular interest in locality arises from applying it to thermal transformations that generate and manipulate stochastic processes. This is the study of information engines [12,13,[32][33][34][35]. Rooted in computational mechanics [36][37][38][39], which investigates the inherent computational properties of natural processes and the resources they consume, information engines embed stochastic processes and Markovian generators in the physical world, where Landauer's bound for the cost of erasure holds sway [10].
A key result for information engines is the information-processing Second Law (IPSL): the cost of transforming one stochastic process into another by any computation is at least the difference in their Kolmogorov-Sinai entropy rates [33]. However, actual physical generators and transducers of processes, with their own internal memory dynamics, often exceed the cost required by the IPSL [14]. This arises from the temporal locality of a physical generator, which operates only from timestep to timestep rather than acting on the entire process at once. The additional dissipation ∆S_loc induced by this temporal locality gives the true thermodynamic cost of operating an information engine.
Previous work explored optimal conditions for a classical information engine to generate a process. Working from the hidden Markov model (HMM) [40] that determines an engine's memory dynamics, it was conjectured that the HMM must be retrodictive to be optimal. For this to hold, the current memory state must be a sufficient statistic of the future data for predicting the past data [14].
Employing a general result on conditions for reversible local computation, the following confirms this conjecture in the form of an equivalent condition on the HMM's structure. We then extend this, showing that it holds for quantum generators of stochastic processes [15,[41][42][43][44][45][46][47][48][49]. Notably, quantum generators are known to provide potentially unbounded advantage in memory storage when compared to classical generators of the same process [42,43,[45][46][47][48]. Surprisingly, the advantage is contingent: optimally-efficient generators, those with ∆S_loc = 0, must not benefit from any memory compression. We show this to be true not only for previously published quantum generators, but also for a new family of quantum generators derived from time reversal [47,[50][51][52]].
While important on its own, this also provides a complementary view of our previous result on quantum generators, which showed that a quantum-compressed generator is never less thermodynamically efficient than the classical generator it compresses [15]. Combined with our current result, one concludes that a quantum-compressed generator is efficient with respect to the generator it compresses but, to the extent that it is compressed, it cannot be optimally efficient. In short, only classical retrodictive generators achieve the lower bound dictated by the IPSL. Practically, this highlights a pressing need to experimentally explore the thermodynamics of quantum computing.

II. THERMODYNAMICS OF QUANTUM INFORMATION RESERVOIRS
The physical setting of our work is the realm of information reservoirs: systems all of whose states have the same energy level. Landauer's Principle for quantum systems says that to change an information reservoir A from state ρ_A to state ρ'_A requires a work cost satisfying the lower bound W ≥ W_min := k_B T ln 2 (H[ρ_A] − H[ρ'_A]), where H[ρ] is the von Neumann entropy [17]. Note that the lower bound W_min is simply the change in free energy for an information reservoir. Further, due to an information reservoir's trivial Hamiltonian, all of the work W becomes heat Q. Then the total entropy production, of system and environment, is ⟨Σ⟩ = (W − W_min)/T ≥ 0. Thus, not only does Landauer's Principle provide the lower bound, it also reveals that any work exceeding W_min represents dissipation.
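The bound W_min = k_B T ln 2 (H[ρ_A] − H[ρ'_A]) is easy to evaluate for diagonal density matrices, where the von Neumann entropy reduces to the Shannon entropy. A minimal sketch for single-qubit erasure at room temperature (the helper name `entropy_bits` and the 300 K choice are ours):

```python
import math

def entropy_bits(probs):
    """Shannon/von Neumann entropy (bits) of a diagonal density matrix."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

k_B = 1.380649e-23  # Boltzmann constant, J/K
T = 300.0           # temperature, K

# Erasure: a maximally mixed qubit (diagonal [1/2, 1/2]) reset to |0><0|.
rho = [0.5, 0.5]
rho_prime = [1.0, 0.0]

# Landauer's bound: W >= W_min = k_B T ln 2 (H[rho] - H[rho']).
W_min = k_B * T * math.log(2) * (entropy_bits(rho) - entropy_bits(rho_prime))
print(W_min)  # ~2.87e-21 J: the minimal work to erase one bit at 300 K
```

Any work spent beyond this value is pure dissipation, per the entropy-production identity above.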
Reference [14] showed that Landauer's bound may indeed be attained in the quasistatic limit for any channel acting on a classical information reservoir. This result generally does not extend to single-shot quantum channels [53]. However, when we consider asymptotically-many parallel applications of a quantum channel, we recover the tightness of Landauer's bound [15].
These statements are exceedingly general. To derive useful results, we must place further constraints on the system dynamics and see how Landauer's bound is affected. Reference [14] introduced the following perspective. Consider a bipartite information reservoir AB, on which we wish to apply the local channel E_A ⊗ I_B, taking AB to CB. The local form of Landauer's bound then reads W ≥ W_min + k_B T ln 2 ∆S_loc, where ∆S_loc = I(A:B) − I(C:B) is the extra cost that arises because we did not use the correlations to facilitate our erasure. See Fig. 1 for a simple example of this phenomenon. This local form of Landauer's Principle is still highly general, but the following shows how to examine it for specific classical and quantum computational architectures. The key question we ask is: For which architectures can ∆S_loc be made to exactly vanish? We first consider this problem generally and then provide a solution.

III. REVERSIBLE LOCAL COMPUTATION
Suppose we are given a bipartite system AB with state ρ_AB. We wish to determine the conditions for a local channel E_A ⊗ I_B that maps A to C to preserve the mutual information: I(A:B) = I(C:B). Proofs of the following results are provided in the Supplementary Material (SM). Stating our result requires first defining the quantum notion of a sufficient statistic. Previously, quantum sufficient statistics of A for B were defined when AB is a classical-quantum state [27]; that is, when ρ_AB commutes with a local measurement on A. They were also introduced in the setting of sufficient statistics for a family of states [24,25]. This corresponds to the case where AB is quantum-classical: ρ_AB commutes with a local measurement on B. Our definition generalizes these cases to fully-quantal correlations between A and B. We start, as an example, by giving the following definition of a minimum sufficient statistic of a classical joint random variable XY ∼ Pr(x, y) in terms of an equivalence relation. We define the predictive equivalence relation ∼ for which x ∼ x′ if and only if Pr(y|x) = Pr(y|x′) for all y. The minimum sufficient statistic (MSS) [X]_Y is given by the equivalence classes [x] := {x′ : x′ ∼ x}.

FIG. 1. Thermodynamics of locality: Suppose we have two bits XY in a correlated state where probability 1/2 is in XY = 00 and probability 1/2 is in XY = 11. (a) A thermodynamically irreversible operation can be performed to erase only X (that is, set X = 0 without changing Y) if we are not allowed to use knowledge about the state of Y. (b) A reversible operation can be performed to erase X if we are allowed to use knowledge about Y. Both operations have the same outcome given our initial condition, but the local operation (a) is more thermodynamically costly because it is irreversible. According to Thm. 1, operation (a) is costly since it erases information in X that is correlated with Y.
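The classical MSS defined above is straightforward to compute: group the values of X by their conditional distributions Pr(·|x). A minimal sketch (the function name and the three-symbol example are our own illustrative choices):

```python
from collections import defaultdict

def minimal_sufficient_statistic(joint):
    """Partition the values of X by predictive equivalence:
    x ~ x' iff Pr(y|x) = Pr(y|x') for all y."""
    px = defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
    cond = defaultdict(dict)
    for (x, y), p in joint.items():
        cond[x][y] = round(p / px[x], 12)  # conditional Pr(y|x)
    classes = defaultdict(list)
    for x, dist in cond.items():
        classes[tuple(sorted(dist.items()))].append(x)
    return sorted(classes.values())

# x = 0 and x = 1 predict y identically, so they merge into one class.
joint = {(0, 0): 0.2, (0, 1): 0.2,
         (1, 0): 0.1, (1, 1): 0.1,
         (2, 0): 0.4}
print(minimal_sufficient_statistic(joint))  # [[0, 1], [2]]
```

The resulting classes are exactly the values of [X]_Y; the quantum construction that follows replaces the conditional distributions with correlation atoms and equality with unitary equivalence.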
This cannot be directly generalized to the quantum setting since correlations between A and B cannot always be described in the form of states conditioned on the outcome of a local measurement on A. If the latter were the case, the state would be classical-quantum, but general quantum correlations can be much more complicated than these. However, we can take the most informative local measurement that does not disturb ρ AB and then consider the "atomic" quantum correlations it leaves behind.
Let ρ_AB be a bipartite quantum state. A maximal local commuting measurement (MLCM) of A for B is any local measurement X with projectors {Π^(x)} on system A that commutes with ρ_AB, so that measuring X does not disturb the state: ρ_AB = Σ_x Pr(x) ρ^(x)_AB, where Pr(x) = Tr[Π^(x) ρ_AB] and ρ^(x)_AB = Π^(x) ρ_AB Π^(x) / Pr(x), and such that any further local measurement Y on ρ^(x)_AB disturbs the state. We call the states ρ^(x)_AB the quantum correlation atoms of ρ_AB. Now, as in the classical setting, we define an equivalence class over the values of the MLCM via the equivalence between their quantum correlation atoms. Classically, these atoms are simply the conditional probability distributions Pr(·|x); in the classical-quantum setting, they are the conditional quantum states ρ^(x)_B. Note that each is defined as a distribution on the variable Y or system B. In contrast, the general quantum correlation atoms ρ^(x)_AB depend on both systems A and B. The resulting challenge is resolved in the following way. Let ρ_AB be a bipartite quantum state and let X be the MLCM of A for B. We define the correlation equivalence relation ∼, under which x ∼ x′ whenever the atoms ρ^(x)_AB and ρ^(x′)_AB are related by a unitary transformation local to A. Finally, we define the Minimal Local Sufficient Statistic (MLSS) [X]_∼ as the set of equivalence classes [x]_∼ := {x′ : x′ ∼ x} generated by the relation ∼ between correlation atoms. Thus, our notion of sufficiency of A for B is to find the most informative local measurement and then coarse-grain its outcomes by unitary equivalence over their correlation atoms. The correlation atoms and the MLSS [X]_∼ together describe the correlation structure of the system AB.
The machinery is now in place to state our result. The proof depends on previous results regarding the fixed points of stochastic channels [28][29][30][31] and saturated information-theoretic inequalities [18][19][20][21][22][23][24]. This background and the proof are described in the SM.

Theorem 1. Let ρ_AB be a bipartite quantum state with MLCM X and correlation atoms ρ^(x)_AB, and let the local channel E_A ⊗ I_B map AB to CB. Then I(A:B) = I(C:B) if and only if E_A can be expressed by Kraus operators of the form K^(y,α) = Σ_x e^{iφ_xyα} √(Pr(y,α|x)) U^(yx) Π^(x), (1) where φ_xyα is an arbitrary phase, U^(yx) is a local unitary carrying the correlation atom of x onto that of y, and Pr(y,α|x) is a stochastic channel that is nonzero only when ρ^(x)_AB and ρ^(y)_CB are unitarily equivalent.

In light of the previous section, there is a simple thermodynamic interpretation of Thm. 1 and Cor. 1: local channels that circumvent dissipation due to their locality (i.e., those which have ∆S_loc = 0) are precisely those channels that preserve the sufficiency structure of the joint state. They may create and destroy any information that is not stored in the sufficient statistic and the correlation atoms. However, the sufficient statistic itself must be conserved and the correlation atoms must be only unitarily transformed. We now turn to apply this perspective to classical and quantum generators: systems that use thermodynamic mechanisms to produce stochastic processes. We compute the necessary and sufficient conditions for these generators to have zero locality dissipation, ∆S_loc = 0, and so determine precise criteria for when they are thermodynamically efficient.

IV. THERMODYNAMICS OF CLASSICAL GENERATORS

FIG. 2. A generator operating between a thermal reservoir and a work reservoir while writing symbols to a tape.

A generator is a hidden Markov model G = (S, X, {T^(x)}_{x∈X}), where (here) S is countable, X is finite, and for each x ∈ X, T^(x) is a matrix with values given by a stochastic channel from S to S × X: T^(x)_{s's} := Pr_G(s', x|s). We define generators to use recurrent HMMs, which means the total transition matrix T_{s's} := Σ_x T^(x)_{s's} is irreducible. In this case, there is a unique stationary distribution π_G(s) over states S satisfying π_G(s) > 0, Σ_s π_G(s) = 1, and Σ_s T_{s's} π_G(s) = π_G(s'). During its operation, a generator's function is to produce a stochastic process: for each ℓ, a probability distribution Pr_G(x_1 ... x_ℓ) over words x_1 ... x_ℓ ∈ X^ℓ. The probabilities for words of length ℓ generated by G are defined by: Pr_G(x_1 ... x_ℓ) = Σ_{s,s'} (T^(x_ℓ) ··· T^(x_1))_{s's} π_G(s). Typically, we view a generator as operating over discrete time, writing out a sequence of symbols x ∈ X on a tape while internally transforming its memory state; see Fig. 2. Starting with an initial state S_0 ∼ π_G(s) and empty tape at time t = 0, the entire system at time t is described by the joint random variable X_1 ... X_t S_t, with distribution Pr(x_1 ... x_t, s_t) = Σ_s (T^(x_t) ··· T^(x_1))_{s_t s} π_G(s). Continuing this technique, one can compute the joint random variable X_1 ... X_t S_t X_{t+1} S_{t+1}.
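The word-probability formula can be checked numerically by propagating the stationary distribution through the labeled transition matrices. Below is a minimal sketch using, as an assumed example, the standard two-state Golden Mean generator (no two consecutive 1s); the machine and the function name are our illustrative choices:

```python
# States 0 and 1; T[x][s_next][s] = Pr(s_next, x | s).
T = {
    0: [[0.5, 1.0],   # emit 0: state 0 -> 0 w.p. 1/2, state 1 -> 0 w.p. 1
        [0.0, 0.0]],
    1: [[0.0, 0.0],
        [0.5, 0.0]],  # emit 1: state 0 -> 1 w.p. 1/2
}
pi = [2/3, 1/3]  # stationary distribution of the total transition matrix

def word_probability(word, T, pi):
    """Pr_G(x_1 ... x_l) = sum_{s,s'} (T^(x_l) ... T^(x_1))_{s's} pi(s)."""
    v = pi[:]
    for x in word:
        v = [sum(T[x][sp][s] * v[s] for s in range(len(v)))
             for sp in range(len(v))]
    return sum(v)

print(word_probability([1, 1], T, pi))  # 0.0: the word '11' never occurs
print(word_probability([1, 0], T, pi))  # Pr(10) = 1/3
```

The inner loop is exactly the matrix-vector product T^(x) v, so the final sum implements the double sum over s and s' in the formula above.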
This picture of a generator as operating on a tape while continually erasing and rewriting its internal memory allows us to define the possible thermodynamics, also shown in Fig. 2. Erasure generally requires work, drawn from the work reservoir, while the creation of noise often allows the extraction of work, represented in our sign convention by drawing negative work from the reservoir. Producing a process X_1 ... X_t ∼ Pr(x_1 ... x_t) of length t has an associated work cost W ≥ −k_B T ln 2 H(X_1 ... X_t). The negative sign, as discussed, indicates that work k_B T ln 2 H(X_1 ... X_t) may be transferred from the thermal reservoir to the work reservoir. For large t, this can be asymptotically expressed by the work rate W/t ≥ −k_B T ln 2 h_µ, where h_µ := lim_{ℓ→∞} H(X_1 ... X_ℓ)/ℓ is the process' Kolmogorov-Sinai entropy rate [33]. This is a reasonable description of the average entropy rate of a process that is stationary; that is, one with Pr(X_t ... X_{t+ℓ}) = Pr(X_1 ... X_{1+ℓ}) for all t and ℓ. Recurrent generators produce exactly these sorts of processes.
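The entropy rate h_µ can be estimated from block entropies, since the conditional entropies H(X_1...X_{l+1}) − H(X_1...X_l) converge to h_µ. A self-contained sketch, again assuming the Golden Mean generator as an example (for this Markov-order-1 process the difference equals 2/3 bit already at small l):

```python
import math
from itertools import product

T = {0: [[0.5, 1.0], [0.0, 0.0]],
     1: [[0.0, 0.0], [0.5, 0.0]]}
pi = [2/3, 1/3]

def word_probability(word):
    v = pi[:]
    for x in word:
        v = [sum(T[x][sp][s] * v[s] for s in range(2)) for sp in range(2)]
    return sum(v)

def block_entropy(l):
    """H(X_1 ... X_l) in bits, by exhaustive enumeration of words."""
    return -sum(p * math.log2(p)
                for word in product([0, 1], repeat=l)
                if (p := word_probability(word)) > 0)

# h_mu estimated as the conditional entropy H(l+1) - H(l).
for l in (2, 4, 8):
    print(l, block_entropy(l + 1) - block_entropy(l))
```

Exhaustive enumeration is exponential in l, so this is only practical for short blocks; it suffices to exhibit the convergence of H(l+1) − H(l).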
Now, a given generator cannot necessarily be implemented as efficiently as the minimal work rate W_min := −k_B T ln 2 h_µ indicates. This is because a generator acts temporally locally, only able to use its current memory state to generate the next memory state and symbol. The true cost at time t must be bounded below by W_loc := W_min + ∆S_loc, where in this case the locality dissipation is [14]: ∆S_loc = I(X_1 ... X_t : S_t) − I(X_1 ... X_t : X_{t+1} S_{t+1}). In this case, the dissipation does not represent work lost to heat but rather the increase in tape entropy that did not facilitate converting heat into work. To understand this in some detail, this section identifies the necessary and sufficient conditions for efficient generators: those with ∆S_loc = 0.
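The locality dissipation of a generator can be computed directly for small t by enumerating the joint distribution of the past word with the current memory state, then with the next symbol and state. A sketch, assuming the Golden Mean generator (which, being non-retrodictive, should show ∆S_loc > 0); all helper names are ours:

```python
import math
from collections import defaultdict
from itertools import product

T = {0: [[0.5, 1.0], [0.0, 0.0]],
     1: [[0.0, 0.0], [0.5, 0.0]]}
pi = [2/3, 1/3]

def mutual_information(joint):
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

t = 4
past_state = defaultdict(float)   # joint of (x_1 ... x_t, s_t)
for word in product([0, 1], repeat=t):
    v = pi[:]
    for x in word:
        v = [sum(T[x][sp][s] * v[s] for s in range(2)) for sp in range(2)]
    for s, p in enumerate(v):
        if p > 0:
            past_state[(word, s)] = p

past_next = defaultdict(float)    # joint of (x_1 ... x_t, (x_{t+1}, s_{t+1}))
for (word, s), p in past_state.items():
    for x in (0, 1):
        for sp in range(2):
            q = p * T[x][sp][s]
            if q > 0:
                past_next[(word, (x, sp))] += q

delta_S_loc = mutual_information(past_state) - mutual_information(past_next)
print(delta_S_loc)  # > 0: this generator is not retrodictive
```

For a unifilar machine this difference reduces to H(S_t | X_{t+1} S_{t+1}), which is positive exactly when the next symbol and state fail to determine the previous state.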
To state our result for classical generators, we must introduce two further notions regarding generators. As before, proofs of results are given in the SM. Consider a partition of S: P = {P_θ}, with P_θ ∩ P_θ′ = ∅ for θ ≠ θ′ and ∪_θ P_θ = S, labeled by index θ. Let: T^(x)_P,θ′θ := Σ_{s'∈P_θ′} Σ_{s∈P_θ} T^(x)_{s's} π_G(s|θ), with π_G(s|θ) = π_G(s)/π_G^P(θ) and π_G^P(θ) = Σ_{s∈P_θ} π_G(s). We say a partition {P_θ} is mergeable with respect to the generator G = (S, X, {T^(x)}) if the merged generator G_P = (P, X, {T^(x)_P}), with T^(x)_P,θ′θ := Pr(θ′, x|θ), generates the same process as the original.
Pertinent to our goals here is the notion of retrodictive state-merging: merging states by the partition that identifies states with the same retrodictive role. We now state our theorem for efficient classical generators:

Theorem 8 (Efficient classical generator). A generator G satisfies ∆S_loc = 0 for all t if and only if the retrodictive partition is mergeable with respect to G and the retrodictively state-merged generator is co-unifilar.
We say that a generator G = (S, X, {T^(x)}) is co-unifilar if T^(x)_{s's} > 0 implies s = f(s', x) for some function f; the dual property, that T^(x)_{s's} > 0 implies s' = f(s, x) for some f, is called unifilar [54]. For every process, there is a unique generator, called the reverse ε-machine, constructed by retrodictively state-merging any co-unifilar generator [48]. Similarly, using a different partition, called predictive equivalence on states, any unifilar generator for a process can be state-merged into a unique generator called the forward ε-machine of that process [48].
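Unifilarity and co-unifilarity are finite checks on the labeled transition matrices: each (state, symbol) pair must have at most one successor, or each (next-state, symbol) pair at most one predecessor. A minimal sketch (function names and the Golden Mean example are our own):

```python
def is_unifilar(T, n_states):
    """s' is a function of (s, x): at most one next state per (s, x)."""
    return all(sum(1 for sp in range(n_states) if T[x][sp][s] > 0) <= 1
               for x in T for s in range(n_states))

def is_co_unifilar(T, n_states):
    """s is a function of (s', x): at most one predecessor per (s', x)."""
    return all(sum(1 for s in range(n_states) if T[x][sp][s] > 0) <= 1
               for x in T for sp in range(n_states))

# Golden Mean epsilon-machine: unifilar but not co-unifilar,
# since both states emit symbol 0 into state 0.
T = {0: [[0.5, 1.0], [0.0, 0.0]],
     1: [[0.0, 0.0], [0.5, 0.0]]}
print(is_unifilar(T, 2), is_co_unifilar(T, 2))  # True False
```

By Thm. 8, the failure of co-unifilarity here is exactly what forces the positive ∆S_loc computed earlier.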
The reverse ε-machine has the following property. Let →X_t := X_{t+1} X_{t+2} ... represent all future generated symbols. The reverse ε-machine state Σ_t at time t is the minimum sufficient statistic of →X_t for predicting X_1 ... X_t. Any generator whose state S_t is a sufficient statistic of →X_t for X_1 ... X_t is called a retrodictor. The reverse ε-machine can then be considered the minimal retrodictor.
Reference [14] conjectured that the necessary and sufficient condition for ∆S loc = 0 is that the generator in question is a retrodictor. In the SM we confirm this by establishing that the conditions of Thm. 8 imply the generator is a retrodictor.
A similar result, for classical generators, was presented in [34] where a lower bound on ∆S loc was derived for predictive generators (Eq. (A23) in [34]). A consequence of this bound is that ∆S loc = 0 only when the predictor is also a retrodictor. However, this bound does not extend to nonpredictive generators. In contrast, Thm. 8 applies to all generators.
Our result is complemented by another recent result [35], which demonstrated how from a predictive generator one can construct a sequence of generators that asymptotically approach a retrodictor and whose dissipation ∆S loc asymptotically approaches zero. Helpfully, this result points to possible perturbative extensions of Thm. 8.
These results bear on the trade-off between dissipation and memory for classical generators. The reverse (forward) ε-machine, being a state-merging of any co-unifilar (unifilar) generator, is minimal with respect to the co-unifilar (unifilar) generators under all quantifications of the memory, such as the number of memory states |S| and the entropy H[S] [48].
As a consequence, we now see that any thermodynamically efficient generator can be state-merged into a co-unifilar generator. This means it can be further state-merged into the reverse ε-machine of the process it generates. In short, thermodynamic efficiency comes with a memory constraint: when the memory falls below this constraint, dissipation must be present.

V. THERMODYNAMICS OF QUANTUM MACHINES
A process' forward ε-machine, a key player in the previous section, may be concretely defined as the unique minimal generator G = (S, X, {T^(x)_{s's}}) for a given process satisfying the unifilarity condition [38]: T^(x)_{s's} = Pr_G(x|s) δ_{s', f(s,x)} for some function f. ε-Machines are a process' minimal unifilar generators, in the sense that they are smallest with respect to the number of memory states |S|, the entropy H[S], and all other ways of measuring memory, such as the Rényi entropies H_α[S] := (1/(1−α)) log_2(Σ_s π_G(s)^α). In this, they are unique.
However, one can implement ε-machines with even lower memory costs by encoding them in a quantum system and generating symbols by means of a noisy measurement. This encoding is called a q-machine. In terms of qubits, as a unit of size, these implementations can generate the same process at a much lower memory cost than the ε-machine's bit-based memory cost. It has also been shown that these quantum implementations have a lower locality cost W_loc than their corresponding ε-machine, and so they are more thermodynamically efficient [15].
This section identifies the constraints for quantum generators to have zero dissipation; that is, ∆S_loc = 0. We show that this results in a peculiar pair of constraints. First, the forward ε-machine memory must not be smaller than the memory of the reverse ε-machine. (This mirrors the results of Thm. 8 in the SM.) Second, the quantum generator achieves no compression. That is, the memory of the quantum generator in qubits is precisely the memory of the forward ε-machine in bits. Thus, compression of memory and perfect thermodynamic efficiency are mutually exclusive outcomes.
To state this precisely, we review q-machines and introduce several new definitions to capture their properties. (See the SM for the proofs.) Given a forward ε-machine G = (S, X, {T^(x)}), for any set of phases {φ_xs : x ∈ X, s ∈ S} there is an encoding {|ψ_s⟩ : s ∈ S} into a Hilbert space H_S and a set of Kraus operators {K^(x) : x ∈ X} on said Hilbert space such that: K^(x)|ψ_s⟩ = e^{iφ_xs} √(Pr_G(x|s)) |ψ_{f(s,x)}⟩. This expression implicitly defines the Kraus operators given the encoding {|ψ_s⟩}. The encoding, in turn, is determined up to a unitary transformation by the following constraint on their overlaps: ⟨ψ_r|ψ_s⟩ = Σ_x e^{i(φ_xs − φ_xr)} √(Pr_G(x|r) Pr_G(x|s)) ⟨ψ_{f(r,x)}|ψ_{f(s,x)}⟩. This equation has a unique solution for every choice of phases {φ_xs} [49].
We note that if π_G(s) is the ε-machine's stationary distribution, then the stationary state of this quantum generator is given by the ensemble ρ_π = Σ_s π_G(s) |ψ_s⟩⟨ψ_s| and satisfies Σ_x K^(x) ρ_π K^(x)† = ρ_π. When we say that a quantum generator uses less memory than its classical counterpart, we mean that dim H_S ≤ |S| and, more generally, that H_α[ρ_π] ≤ H_α[S], where H_α[ρ] := (1/(1−α)) log_2 Tr(ρ^α) are the Rényi-von Neumann entropies [42,47,48].
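The memory advantage can be exhibited numerically: solve the overlap self-consistency equation (with all phases zero), then compare the entropy of ρ_π to the classical state entropy H[S]. A stdlib-only sketch for the assumed Golden Mean example, using the fact that the spectrum of ρ_π equals that of the matrix N_{rs} = √(π_r π_s)⟨ψ_r|ψ_s⟩ (all helper names are ours):

```python
import math

# Golden Mean forward epsilon-machine: state 0 emits 0 (p 1/2, -> 0)
# or 1 (p 1/2, -> 1); state 1 emits 0 (p 1, -> 0).
emit = {0: {0: (0.5, 0), 1: (0.5, 1)},   # state -> {symbol: (prob, next)}
        1: {0: (1.0, 0)}}
pi = [2/3, 1/3]

# Overlap <psi_0|psi_1> from the self-consistency condition
# <psi_r|psi_s> = sum_x sqrt(Pr(x|r)Pr(x|s)) <psi_{f(r,x)}|psi_{f(s,x)}>.
c = 0.0
for _ in range(50):  # fixed-point iteration
    g = {(0, 0): 1.0, (1, 1): 1.0, (0, 1): c, (1, 0): c}
    c = sum(math.sqrt(p * q) * g[(rp, sp)]
            for x, (p, rp) in emit[0].items()
            for x2, (q, sp) in emit[1].items() if x2 == x)

# Spectrum of rho_pi equals that of N_{rs} = sqrt(pi_r pi_s) <psi_r|psi_s>;
# for a 2x2 symmetric matrix, solve the eigenvalues analytically.
off = math.sqrt(pi[0] * pi[1]) * c
tr, det = 1.0, pi[0] * pi[1] - off**2
lam1 = (tr + math.sqrt(tr**2 - 4 * det)) / 2
lam2 = 1.0 - lam1

H = lambda ps: -sum(p * math.log2(p) for p in ps if p > 0)
print(H(pi))            # classical memory H[S]      ~ 0.918 bits
print(H([lam1, lam2]))  # quantum memory H[rho_pi]   ~ 0.55, compressed
```

The nonzero overlap (here 1/√2) is precisely what lowers H[ρ_π] below H[S], and, by Thm. 3 below, what precludes ∆S_loc = 0.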
To see this quantum generator as a physical system, as in Fig. 2, requires interpreting the tape being written on as a series of copies of a single Hilbert space H_A that represents one cell of the tape. On H_A we define the computational basis {|x⟩ : x ∈ X} in which outputs are written. The system at time t can be described using the joint Hilbert space H_{A_1} ⊗ ··· ⊗ H_{A_t} ⊗ H_S, where each H_{A_k} is unitarily equivalent to H_A, and the state is obtained by applying the Kraus operators to ρ_π while writing the corresponding symbols to the tape cells. From this we get the process generated by the ε-machine and quantum generator in terms of the Kraus operators as: Pr(x_1 ... x_t) = Tr[K^(x_t) ··· K^(x_1) ρ_π K^(x_1)† ··· K^(x_t)†]. (4) Let us now briefly discuss the thermodynamic properties of quantum generators, homing in on our main result about conditions for their efficiency. The previous section discussed how a process, to be generated, requires the minimal work rate W_min = −k_B T ln 2 h_µ. However, this is not typically achievable for classical generators. The same principle holds for quantum generators: since they act temporally locally, the true cost at time t is bounded below by W_loc = W_min + ∆S_loc, and the locality dissipation has the same form: ∆S_loc = I(X_1 ... X_t : S_t) − I(X_1 ... X_t : X_{t+1} S_{t+1}). (5) There are two crucial differences, though. First, the mutual information I above is the quantum mutual information derived from the von Neumann entropy. Second, even the work rate W_loc is not necessarily achievable in the single-shot case [53]. However, it may be attained for asymptotically parallel generation [15]. We will not concern ourselves with this second problem here. Our intent is to focus, as in the previous section, on the necessary and sufficient conditions for ∆S_loc = 0.
The preceding material was, in fact, review. We now introduce a simple partition that may be constructed on the memory states of the ε-machine for a given quantum implementation. Specifically, we define the maximal commuting partition (MCP) on S to be the most refined partition {B_θ} such that the overlap matrix ⟨ψ_r|ψ_s⟩ is block-diagonal with respect to {B_θ}; that is, ⟨ψ_r|ψ_s⟩ = 0 whenever r and s lie in different blocks.

FIG. 3. Performance trade-offs for q-machines, whose variety and dependence on phases {φ_xs} is depicted by a torus: Under all ways of quantifying memory, the q-machines constructed from a predictor achieve nonnegative memory compression [48], and they also have a smaller dissipation ∆S_loc, rendering them more thermodynamically efficient [15]. However, to achieve positive compression, they must also have a nonzero ∆S_loc, rendering them less efficient than a classical retrodictor.
Our result on thermodynamically-efficient quantum generators is as follows.
Theorem 3 (Maximally-efficient quantum generator). A q-machine achieves ∆S_loc = 0 for all t if and only if its maximal commuting partition B is the trivially maximal partition into singletons (so that the encoding {|ψ_s⟩} is orthogonal) and the forward ε-machine it implements is a retrodictor.

We previously found that, in the limit of asymptotically parallel generation, a quantum generator is always more thermodynamically efficient than its corresponding ε-machine, in that it has a lower dissipation [15]. Yet this does not imply dissipation can be made to vanish for quantum generators of a process. In fact, only for processes whose forward ε-machine is also a retrodictor can dissipation be made to vanish. In these cases, the memory states will be orthogonally encoded, and so no memory compression is achieved, as seen in the trivial maximality of B. The situation is heuristically represented in Fig. 3.

VI. THERMODYNAMICS OF REVERSE q-MACHINES
We showed that forward ε-machines compressed via the q-machine cannot achieve the efficiency of a classical retrodictor. However, one may wonder what happens to a retrodictor's optimal efficiency if it is directly compressed. We now demonstrate a method for such compression, derived from the time-reversal of the q-machine, and prove that even here any nonzero compression of memory precludes optimal efficiency.
A process' reverse ε-machine may be defined similarly to the forward ε-machine as the unique minimal generator G = (S, X, {T^(x)_{s's}}) for a given process satisfying co-unifilarity: T^(x)_{s's} > 0 only when s = f(s', x) for some function f. Reverse ε-machines are a process' minimal co-unifilar generators, in the sense that they are smallest with respect to the number of memory states |S|, the entropy H[S], and all other ways of measuring memory, such as the Rényi entropies H_α[S] := (1/(1−α)) log_2(Σ_s π_G(s)^α). There is an intricate relationship between forward and reverse ε-machines that can only be appreciated in the language of time reversal. The time-reverse of a generator G = (S, X, {T^(x)}) is the generator G~ = (S, X, {T~^(x)}), with T~^(x)_{s's} := T^(x)_{ss'} π_{s'}/π_s [52]. The generator G~ is associated with the reverse process, Pr~(x_1 ... x_ℓ) := Pr(x_ℓ ... x_1). Note that time reversal preserves both the state space S and the stationary distribution π_s.
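The time-reversal map and its defining property, that G~ generates the reverse process, can be checked numerically. A sketch assuming the Golden Mean generator again (helper names are ours):

```python
T = {0: [[0.5, 1.0], [0.0, 0.0]],
     1: [[0.0, 0.0], [0.5, 0.0]]}
pi = [2/3, 1/3]

# Time reversal: T~^(x)_{s's} = T^(x)_{s s'} * pi_{s'} / pi_s.
T_rev = {x: [[T[x][s][sp] * pi[sp] / pi[s] for s in range(2)]
             for sp in range(2)] for x in T}

def word_probability(word, T, pi):
    v = pi[:]
    for x in word:
        v = [sum(T[x][sp][s] * v[s] for s in range(2)) for sp in range(2)]
    return sum(v)

# The time-reversed generator produces the reverse process:
# Pr~(x_1 ... x_l) = Pr(x_l ... x_1).
w = [0, 0, 1]
print(word_probability(w, T_rev, pi))        # reverse machine, word w
print(word_probability(w[::-1], T, pi))      # original machine, reversed word
```

Both prints agree, and one can verify that π is also stationary for T_rev, matching the claim that time reversal preserves the stationary distribution.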
Given a process' forward ε-machine F, its time reverse F~ is the reverse ε-machine of the reverse process. Conversely, given a process' reverse ε-machine G, its time reverse G~ is the forward ε-machine of the reverse process. Since the stationary distribution and state space are preserved under time reversal, F and F~ have the same memory costs, as do G and G~. However, somewhat surprisingly, this does not mean that F and G have the same memory costs [51].
Previous work compared the results of compressing the forward ε-machine F of a process and the forward ε-machine G~ of the reverse process using the q-machine formalism. The result, for compressing G~, is a q-machine that generates the reverse process, remarkably with cost identical to the q-machine constructed from F [47].
The q-machine constructed from G~ is a quantum process and as such can itself undergo quantum time-reversal [50], resulting in a new machine that we call the reverse q-machine. Just as the q-machine compresses G~, the reverse q-machine is a compression of G. Though the reverse q-machine is derived from the q-machine via time-reversal, there is genuinely new physics present, as the dissipation ∆S_loc (Eq. (5)) is not invariant under time-reversal. Thus, reverse q-machines must be approached as a separate case from the traditional q-machine when examining their thermodynamic efficiency.
The details of the time-reversal are handled in the SM. Here, we present the resulting technique for compressing the reverse ε-machine. Given a reverse ε-machine G = (S, X, {T^(x)_{s's}}), for any set of phases {φ_xs : x ∈ X, s ∈ S} there is an encoding {|ψ_s⟩ : s ∈ S} of orthogonal states into a Hilbert space H_S and a set of Kraus operators {K^(x) : x ∈ X} on said Hilbert space such that: K^(x)|ψ_s⟩ = Σ_{s'} e^{iφ_xs} √(T^(x)_{s's}) |ψ_{s'}⟩. The orthogonality of {|ψ_s⟩} allows us to turn this into an explicit definition of the Kraus operators: K^(x) = Σ_{s,s'} e^{iφ_xs} √(T^(x)_{s's}) |ψ_{s'}⟩⟨ψ_s|. The stationary state ρ_π of this machine is, unlike the q-machine's, generically not expressible as an ensemble of the encoding states {|ψ_s⟩}. If it were, the orthogonality of {|ψ_s⟩} would make them a diagonalizing basis for ρ_π, and we would achieve no memory compression. Rather, compression is achieved for the reverse q-machine precisely because the stationary state ρ_π is generically not diagonal in the encoding states, in contrast to the q-machine, which derives compression from the nonorthogonality of its encoding states.
The reverse q-machine stochastic dynamics Eq. (4) and thermodynamics Eq. (5) are defined precisely as those for q-machines in the previous section. As before, to prove our result we must define a special partition of the generator states. Here, it is important to note a relationship between a process' forward ε-machine F = (P, X, {R^(x)_{p'p}}) and its reverse ε-machine G = (S, X, {T^(x)_{s's}}). Specifically, the state S_t of G after seeing the word x_1 ... x_t and the state P_t of F after the same are related by: Pr(S_t = s | x_1 ... x_t) = Pr_C(s|P_t) for some channel Pr_C(s|p). Let λ_p be the stationary distribution of F's states and let Pr_E(s'|s) = Σ_p Pr_C(s|p) Pr_C(s'|p) λ_p / π_s. Let B = {B_θ} be the ergodic partition of Pr_E(s'|s), such that Pr_E(s'|s) > 0 only when θ(s) = θ(s'). The SM shows that ρ_π is diagonal in the blocks defined by B.
Our result for reverse q-machines, proven in the SM, can now be stated:

Theorem 10 (Maximally-efficient reverse q-machine). A reverse q-machine achieves ∆S_loc = 0 for all t if and only if its ergodic partition B is the trivially maximal partition into singletons and the reverse ε-machine it implements is also a predictor.

Notice that this is a similar statement to that made in the last section and is essentially its time reverse. It implies that the only reverse ε-machines which can be quantally compressed are those which are also predictive generators. Also, again, the trivial maximality of the ergodic partition B implies an inability to achieve nonzero compression. A heuristic diagram of the situation is shown in Fig. 4.
In conjunction with the previous section, this is a profound result on the efficiency of quantum memory compression. Distinct from the classical case, where Thm. 8 established that every process has certain generators that do achieve zero dissipation, Thms. 10 and 3 imply that only certain processes have zero-dissipation quantum generators and, moreover, those particular processes achieve no memory compression. The memory states, being orthogonally encoded, take no advantage of the quantum setting to reduce their memory cost.

VII. CONCLUDING REMARKS
We identified the conditions under which local operations circumvent the thermodynamic dissipation ∆S loc that arises from destroying correlation. We started by showing how a useful theorem can be derived using recent results on the fixed points of quantum channels. We applied it to the setting of local operations to determine the necessary and sufficient conditions for vanishing ∆S loc in classical and quantum settings, with the aid of a generalized notion of quantum sufficient statistic. We employed this fundamental result to review and extend previous results on the thermodynamic efficiency of generators of stochastic processes. We confirmed a recent conjecture regarding the conditions for vanishing ∆S loc in a classical generator. And, then, we showed the exact same conditions hold for quantum generators, even to the point of requiring orthogonal encoding of memory states. This implies the profound result that quantum memory compression and perfect efficiency (∆S loc = 0) are incompatible.
It is appropriate here to recall the lecture by Feynman in the early days of thinking about quantum computing, in which he observed that quantum systems can be simulated on classical (even probabilistic) computers only with great difficulty, but on a fundamentally quantum computer they could be more realistically simulated [3]. Here, we considered the task of simulating a classical stochastic process by two means: one using fundamentally classical but probabilistic machines and the other using a fundamentally quantum machine. Previous results generally indicated that quantum machines are advantageous in memory for this task, in comparison to their classical counterparts. Historically, this led to a much stronger notion of "quantum supremacy" than Feynman proposed: that quantum computers may be advantageous in all tasks [55].
However, the quantum implementation we examined, though advantageous in memory, requires nonzero dissipation in order to cash in on that advantage. Furthermore, not every process necessarily has a quantum generator that achieves zero dissipation. This is in sharp contrast to the classical outcome. And so, this returns us to the spirit of Feynman's vision for simulating physics, in which it may sometimes be the case that the best machine to simulate a classical stochastic process is a classical stochastic computer, at least thermodynamically speaking.
To further exercise these results, extensions must be made to quantum generators beyond the q-machine and its time reverse. We must determine whether the exclusive relationship between compression and zero dissipation continues to hold in such extensions. We pursue this question in forthcoming work.

Thermodynamically-Efficient Local Computation and the Inefficiency of Quantum Memory Compression
Samuel P. Loomis and James P. Crutchfield

The Supplementary Materials give a quick notational overview; review (i) fixed points of quantum channels, reversible computation, and sufficient statistics and (ii) quantum implementations of classical generators and q-machines and their thermodynamic costs; and (iii) provide details on example calculations. The intention is that the main development be accessible while, together with the Supplementary Materials, the full treatment becomes self-contained.

A measurement X is described by a set of Kraus operators {K^(x) : x ∈ X} over H_A that satisfy the completeness relation Σ_x K^(x)† K^(x) = I_A, the identity on H_A. Given state ρ_A and measurement X, the probability of outcome x ∈ X is Pr(x; ρ_A) = Tr[K^(x) ρ_A K^(x)†], and the state is transformed into K^(x) ρ_A K^(x)† / Pr(x; ρ_A). If the Kraus operators {K^(x) : x ∈ X} are orthogonal to one another and projective, then X is called a projective measurement. Otherwise, X is called a positive operator-valued measure (POVM). Given a set of projectors {Π^(x)} on a Hilbert space H_A, such that Σ_x Π^(x) = I_A is the identity, we may decompose H_A into the orthogonal supports of the Π^(x).

Let ρ_A and σ_A be states in B_+(H_A). If, for every measurement X and outcome x ∈ X, Pr(x; σ_A) > 0 implies that Pr(x; ρ_A) > 0, then we write σ_A ≪ ρ_A, indicating that σ_A is absolutely continuous with respect to ρ_A.

To discuss classical systems, we eschew states and instead focus directly on the measurements. In this setting, measurements are called random variables. Here, a random variable X is defined over a countable set X, taking values x ∈ X with a specified probability distribution Pr(X = x). When the variable can be inferred from context, we simply write the probabilities as Pr(x).
A random variable X can generate a quantum state ρ_A through an ensemble {Pr(X = x), ρ^(x)_A} of potentially nonorthogonal states such that ρ_A = Σ_x Pr(X = x) ρ^(x)_A. A quantum channel is a linear map E : B(H_A) → B(H_C) that is trace-preserving and completely positive. These conditions are equivalent to requiring that there is a random variable X and a set of Kraus operators {K^(x) : x ∈ X} such that E(ρ_A) = Σ_x K^(x) ρ_A K^(x)† for all ρ_A. To every channel E there is an adjoint E†, defined by Tr[E(ρ_A) M] = Tr[ρ_A E†(M)] for every state ρ_A ∈ B_+(H_A) and operator M ∈ B(H_A). The adjoint has the form E†(M) = Σ_x K^(x)† M K^(x).
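To make these definitions concrete, here is a minimal numerical sketch, assuming NumPy; the amplitude-damping Kraus pair is a hypothetical example channel, not one drawn from the text. It applies a channel in Kraus form and its adjoint and checks the completeness relation.

```python
import numpy as np

def apply_channel(kraus, rho):
    """E(rho) = sum_x K^(x) rho K^(x)^dagger."""
    return sum(K @ rho @ K.conj().T for K in kraus)

def apply_adjoint(kraus, M):
    """E^dagger(M) = sum_x K^(x)^dagger M K^(x)."""
    return sum(K.conj().T @ M @ K for K in kraus)

# Hypothetical example: amplitude-damping Kraus operators with damping g.
g = 0.3
kraus = [np.array([[1.0, 0.0], [0.0, np.sqrt(1 - g)]]),
         np.array([[0.0, np.sqrt(g)], [0.0, 0.0]])]

# Completeness relation sum_x K^(x)^dagger K^(x) = I_A (trace preservation).
completeness = sum(K.conj().T @ K for K in kraus)
```

Trace preservation of E corresponds to unitality of the adjoint: E†(I) = I, and Tr[E(ρ)M] = Tr[ρE†(M)] for any observable M.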

Given a subspace H_B ⊆ H_A, we write E|_{H_B} for the channel restricted to that subspace. E is ergodic if and only if it has a unique stationary state π_A with E(π_A) = π_A [29, and references therein].
The classical equivalent of a quantum channel is, of course, the classical channel that maps a set X to Y according to the conditional probabilities Pr(Y = y|X = x). This may alternately be represented by the stochastic matrix T := (T_yx) such that T_yx = Pr(Y = y|X = x). Stochastic matrices are defined by the conditions that T_yx ≥ 0 for all x ∈ X and y ∈ Y and Σ_y T_yx = 1 for all x ∈ X. A channel from X to itself is ergodic if there is no proper subset Y ⊂ X such that, for x ∈ Y, Pr(X = x′|X = x) > 0 only when x′ ∈ Y as well. This is equivalent to requiring that T_{x′x} := Pr(X = x′|X = x) be an irreducible matrix.
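Both matrix conditions can be checked mechanically. The sketch below, assuming NumPy and the column-stochastic convention T_yx = Pr(y|x), tests stochasticity and probes irreducibility by summing matrix powers, so an entry of the sum is positive exactly when some path connects the two states.

```python
import numpy as np

def is_stochastic(T):
    """T_yx >= 0 and each column (fixed x) sums to 1."""
    return bool(np.all(T >= 0) and np.allclose(T.sum(axis=0), 1.0))

def is_irreducible(T):
    """T is irreducible iff sum_{k<n} T^k has all entries positive,
    i.e., every state can reach every other state."""
    n = T.shape[0]
    reach = sum(np.linalg.matrix_power(T, k) for k in range(n))
    return bool(np.all(reach > 0))
```

A reducible example is the identity matrix: it is stochastic, but no state communicates with any other.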
A classical channel Pr(Y = y|X = x) can be induced from a joint distribution Pr(x, y) by Bayes' rule, Pr(y|x) := Pr(x, y)/Σ_y Pr(x, y), so that the joint distribution may be written as Pr(x, y) = Pr(x) Pr(y|x). Given three random variables X, Y, and Z, we write X − Y − Z to denote that they form a Markov chain: Pr(x, y, z) = Pr(x) Pr(y|x) Pr(z|y). The definition is symmetric in that it also implies Pr(x, y, z) = Pr(z) Pr(y|z) Pr(x|y).

Entropy, Information, and Divergence
The uncertainty of a random variable or measurement is considered a proxy for its information content. Often, uncertainty is measured by the Shannon entropy: H[X] := −Σ_{x∈X} Pr(x) log₂ Pr(x), when it does not diverge. Given a system A with state ρ_A, the quantum uncertainty is the von Neumann entropy: H(A) := −Tr[ρ_A log₂ ρ_A]. When the same system may have many states in the given context, this is written directly as H(ρ_A). It is the smallest entropy that can be achieved by taking a projective measurement of system A; the minimum is attained by the measurement that diagonalizes ρ_A.
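Both entropies reduce to the same computation on a probability vector, since the von Neumann entropy is the Shannon entropy of ρ's eigenvalues. A short sketch, assuming NumPy:

```python
import numpy as np

def shannon_entropy(p):
    """H[X] = -sum_x Pr(x) log2 Pr(x), skipping zero-probability outcomes."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def von_neumann_entropy(rho):
    """H(rho) = -Tr[rho log2 rho]: the Shannon entropy of rho's eigenvalues."""
    evals = np.linalg.eigvalsh(rho)
    return shannon_entropy(evals[evals > 1e-12])
```

A maximally mixed qubit gives one bit of entropy, while any pure state gives zero.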
There are many ways of quantifying the information shared between two random variables X and Y. The most familiar is the mutual information: I(X : Y) := Σ_{x,y} Pr(x, y) log₂ [Pr(x, y) / (Pr(x) Pr(y))].
Corresponding to the mutual information are the conditional entropies H[X|Y] := H[XY] − H[Y] and H[Y|X] := H[XY] − H[X]. We will also have occasion to use the conditional mutual information for three variables: I(X : Y|Z) := Σ_{x,y,z} Pr(x, y, z) log₂ [Pr(x, y|z) / (Pr(x|z) Pr(y|z))].
Note that an equivalent condition for X − Y − Z is the vanishing of the conditional mutual information: I(X : Z|Y ) = 0.
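As a numerical illustration of this equivalence, the sketch below (assuming NumPy; the distributions are hypothetical) computes I(X : Z|Y) from a joint array, using the identity Pr(x, z|y)/[Pr(x|y) Pr(z|y)] = Pr(x, y, z) Pr(y)/[Pr(x, y) Pr(y, z)]. It vanishes exactly for joints built as Markov chains X − Y − Z.

```python
import numpy as np

def cond_mutual_information(p):
    """I(X:Z|Y) in bits for a joint array p[x, y, z]; zero iff X - Y - Z."""
    p = np.asarray(p, dtype=float)
    pxy = p.sum(axis=2)       # Pr(x, y)
    pyz = p.sum(axis=0)       # Pr(y, z)
    py = p.sum(axis=(0, 2))   # Pr(y)
    total = 0.0
    for x in range(p.shape[0]):
        for y in range(p.shape[1]):
            for z in range(p.shape[2]):
                if p[x, y, z] > 0:
                    total += p[x, y, z] * np.log2(
                        p[x, y, z] * py[y] / (pxy[x, y] * pyz[y, z]))
    return total
```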
For a bipartite quantum state ρ_AB, the mutual information is usually taken to be the analogous quantity I(A : B) = H(A) + H(B) − H(AB), and the conditional entropies H(A|B) = H(AB) − H(B), and so on.
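A minimal sketch of these quantities, assuming NumPy, using eigenvalues for the entropies and a reshape-based partial trace:

```python
import numpy as np

def vn_entropy(rho):
    """von Neumann entropy in bits, from the eigenvalues of rho."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-(ev * np.log2(ev)).sum())

def partial_trace(rho_AB, dA, dB, keep):
    """Trace out one factor of a (dA*dB)x(dA*dB) density matrix."""
    r = rho_AB.reshape(dA, dB, dA, dB)
    if keep == 'A':
        return np.trace(r, axis1=1, axis2=3)
    return np.trace(r, axis1=0, axis2=2)

def mutual_information(rho_AB, dA, dB):
    """I(A:B) = H(A) + H(B) - H(AB)."""
    return (vn_entropy(partial_trace(rho_AB, dA, dB, 'A'))
            + vn_entropy(partial_trace(rho_AB, dA, dB, 'B'))
            - vn_entropy(rho_AB))
```

For a maximally entangled two-qubit state this gives I(A : B) = 2 bits, while product states give zero.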
We need a way to compare quantum systems and classical random variables. Consider an ensemble {Pr(X = x), ρ^(x)_A} generating average state ρ_A and define its Holevo quantity: χ := H(ρ_A) − Σ_x Pr(x) H(ρ^(x)_A). Consider also the ensemble-induced classical-quantum state ρ_XA := Σ_x Pr(x) |x⟩⟨x| ⊗ ρ^(x)_A, in which the Holevo quantity appears as the mutual information I(X : A).

FIG. S1. Quantum channel decompositions: The conserved measurement X divides the Hilbert space "vertically" via an orthogonal decomposition H_T^⊥ = ⊕_x H_A^(x), represented above by labels A^(x=0) and A^(x=1). For each value of x, there is a "horizontal" decomposition into the tensor product of an ergodic subspace and a decoherence-free subspace, H_A^(x) = H_{A_E}^(x) ⊗ H_{A_Df}^(x), represented respectively by the labels A_E and A_Df. According to Theorem 6, information-theoretic reversibility requires storing data in the conserved measurement and the decoherence-free subspace. Any information stored coherently with respect to the conserved measurement, or stored in the ergodic subspace, will be irreversibly modified under the channel's action.

Consider random variables X₁ and X₂ over the same set X with two possible distributions Pr(X₁ = x) and Pr(X₂ = x), respectively, and suppose that whenever Pr(X₂ = x) > 0 we have Pr(X₁ = x) > 0 as well. Then, we can quantify the difference between the two distributions with the relative entropy: D[X₂‖X₁] := Σ_x Pr(X₂ = x) log₂ [Pr(X₂ = x)/Pr(X₁ = x)]. Similarly, two states ρ_A and σ_A with σ_A ≪ ρ_A may be compared via the quantum relative entropy: D[σ_A‖ρ_A] := Tr[σ_A (log₂ σ_A − log₂ ρ_A)]. The relative entropy leads to allied information-theoretic quantities; for instance, I(X : Y) = D[Pr(X, Y)‖Pr(X) Pr(Y)] and I(A : B) = D[ρ_AB‖ρ_A ⊗ ρ_B]. One of the most fundamental information-theoretic inequalities is the monotonicity of the relative entropy under
transformations. This is the data processing inequality [16]. For a quantum channel E it says: D[σ_A‖ρ_A] ≥ D[E(σ_A)‖E(ρ_A)]. The condition for equality requires constructing the Petz recovery channel: R_σ(M) := σ_A^{1/2} E†(E(σ_A)^{−1/2} M E(σ_A)^{−1/2}) σ_A^{1/2}. It is easy to check that R_σ(E(σ_A)) = σ_A [18,19].
Two other forms of the data processing inequality are useful to note here. The first uses another quantity for measuring the distance between states, the fidelity: F(ρ, σ) := Tr √(√ρ σ √ρ). It takes value F = 0 when the states ρ and σ are completely orthogonal and value F = 1 if and only if ρ = σ. The data processing inequality for fidelity states that, for any quantum channel E: F(E(ρ), E(σ)) ≥ F(ρ, σ). This is yet another way of saying that states map closer together under a quantum channel E.
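The fidelity form of the inequality is easy to probe numerically. This sketch, assuming NumPy, computes F via a Hermitian square root; the depolarizing channel is a hypothetical example channel used only to compare fidelities before and after a channel acts.

```python
import numpy as np

def psd_sqrt(M):
    """Square root of a positive-semidefinite Hermitian matrix via eigh."""
    w, V = np.linalg.eigh(M)
    w = np.clip(w, 0.0, None)
    return (V * np.sqrt(w)) @ V.conj().T

def fidelity(rho, sigma):
    """F(rho, sigma) = Tr sqrt(sqrt(rho) sigma sqrt(rho))."""
    s = psd_sqrt(rho)
    return float(np.real(np.trace(psd_sqrt(s @ sigma @ s))))

def depolarize(rho, p):
    """Example channel: mix rho with the maximally mixed state."""
    d = rho.shape[0]
    return (1 - p) * rho + p * np.eye(d) / d
```

Orthogonal pure states start at F = 0; after both pass through the same noisy channel their fidelity can only grow, never shrink.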
The second form arises from applying Eq. (S1) to the mutual information. To understand our result on local channels, an illustrative starting point is a key result on the fixed points of quantum channels [28,29] that leads to a natural decomposition, as illustrated in Fig. S1.

Theorem 5 (Channel and Stationary State Decomposition). Suppose E : B(H_A) → B(H_A) is a quantum channel, Hilbert space H_A has a transient subspace H_T, and there is a projective measurement X = {Π^(x)} on H_T^⊥ with countable outcomes X, such that H_A^(x) is the support of Π^(x). Then each H_A^(x) further decomposes into an ergodic subspace H_{A_E}^(x) and a decoherence-free subspace H_{A_Df}^(x), H_A^(x) = H_{A_E}^(x) ⊗ H_{A_Df}^(x), such that the Kraus operators of E|_{H_T^⊥} decompose as [30]: K^(α)|_{H_A^(x)} = K_{A_E}^{(x,α)} ⊗ U_{A_Df}^(x), with each U_{A_Df}^(x) unitary. Any subspace of H_A satisfying the above two properties is, in fact, H_A^(x) for some x ∈ X.

Furthermore, if ρ_A is any invariant state, that is, ρ_A = E(ρ_A), then it decomposes as: ρ_A = Σ_x Pr(x) π_{A_E}^(x) ⊗ ρ_{A_Df}^(x), for any distribution Pr(x) and states ρ_{A_Df}^(x), where π_{A_E}^(x) is the unique stationary state on H_{A_E}^(x).

Figure S1 gives the geometric structure implied by the theorem. The ergodic subspace of quantum channel E has two complementary decompositions. First, there is an orthogonal decomposition H_T^⊥ = ⊕_x H_A^(x) induced by a projective measurement X whose values are conserved by E's action. This conservation is decoherent: only states compatible with X are stationary under the action of E. X is called the conserved measurement of E [31]. Then, each H_A^(x) has a tensor decomposition H_A^(x) = H_{A_E}^(x) ⊗ H_{A_Df}^(x) into an ergodic (E) and a decoherence-free (Df) part. The decoherence-free subspace H_{A_Df}^(x) undergoes only a unitary transformation [30]. The ergodic part H_{A_E}^(x) is irreducibly mixed, such that there is a single stationary state.

This result's contribution here is to aid in identifying when the data-processing inequality saturates. That is, using Thm. 5 and the Petz recovery channel, we derive necessary and sufficient constraints on the structures of ρ_A, σ_A, and E. To achieve this, we recall a previously known result [22,23], showing that it can be derived using only the Petz recovery map and Thm. 5. The immediate consequence is a novel proof.
The subspaces H_{A_Df}^(x) are unitarily equivalent and the Kraus operators decompose as in Eq. (S3). Furthermore, states ρ_A and σ_A decompose over the conserved measurement and the ergodic subspaces H_{A_E}^(x), and their images under E decompose similarly, where π_{A_E}^(x) denotes the unique stationary state on each ergodic subspace.

We know that N_σ := R_σ ∘ E must have both ρ_A and σ_A as stationary states. Let X be the conserved measurement of N_σ. It induces the decompositions H_{A_E}^(x) and H_{A_Df}^(x), as well as the state decompositions of Eq. (S4). Now, we leverage the fact that N_σ|_{H_T^⊥} has a Kraus decomposition of the form Eq. (S1). The net effect is summed up by two constraints: 1. For each α, K^(α) maps each H_A^(x) to some orthogonal subspace H_C^(x). Proving Eq. (S3) requires these two constraints. Each is a form of distinguishability criterion for the total channel: since N_σ can tell certain orthogonal outcomes apart, so too must E. Or else, R_σ would "pull apart" nonorthogonal states, which is impossible for a quantum channel. By formally applying this notion to constraints 1 and 2 above, we recover Eq. (S3).

Let N be another complete measurement on H_{A_Df}^(x) such that |n⟩ = Σ_m w_{m,n} |m⟩, with w_{m,n} a unitary matrix, and let n, n′ ∈ N be distinct. For any |ψ⟩, it must be that E^(x)(|ψ⟩⟨ψ| ⊗ |n⟩⟨n|) and E^(x)(|ψ⟩⟨ψ| ⊗ |n′⟩⟨n′|) are orthogonal. All of which leads one to conclude that the H_C^{(x,m)} for each m must be unitarily equivalent. And so, the decomposition instead becomes a tensor-product decomposition. Finally, the constraints Eq. (S5) follow from the form of E and Eq. (S4).

Theorem 6's main implication is that, when a channel E acts, information stored in the conserved measurement and in the decoherence-free subspaces is recoverable. Two states that differ in terms of the conserved measurement and the decoherence-free subspaces remain different and do not grow more similar under E's action. Conversely, information stored in measurements not compatible with the conserved measurement, or stored in the ergodic subspaces, is irreversibly garbled by E.
The next section uses this decomposition to study how locally acting channels impact correlations between subsystems. This directly drives the thermodynamic efficiency of local operations. Namely, for thermodynamic efficiency correlations must be stored specifically in the conserved measurement and decoherence-free subspaces of the local channel.

Appendix C: Quantum Sufficient Statistics and Reversible Local Operations
Let's review the definition of sufficient statistic from the main body. Let ρ_AB be a bipartite quantum state. A maximal local commuting measurement (MLCM) of A for B is any local measurement X on system A that commutes with ρ_AB, where Pr(X = x) = Tr(Π^(x) ρ_AB), and such that any further local measurement Y on ρ^(x)_AB disturbs the state. We call the states ρ^(x)_AB the conditional states. Suppose there were two distinct MLCMs, X and Y. Rewriting the resulting conditions shows that, for each x, X can be further refined without disturbing ρ^(x)_AB. So, X is not an MLCM, giving a contradiction.

It will be helpful in our study of quantum generators to have the following fact as well.

Proposition 4 (MLCM for a classical-quantum state). Given a classical-quantum state, the MLCM is the most refined measurement Θ that leaves each conditional state undisturbed for all x.

Given that Θ is a commuting local measurement, the question is whether it is maximal. If it is not maximal, there is a refinement Y that is also a commuting local measurement. By Θ's definition, there is an x on which Y acts nontrivially. This implies ρ_AB ≠ Σ_y (I ⊗ Π^(y)) ρ_AB (I ⊗ Π^(y)), contradicting the assumption that Y is commuting local.

Now, let ρ_AB be a bipartite quantum state and let X be the MLCM of A for B. We define the correlation equivalence and, using Eq. (S1), find that E_A ⊗ I_B conserves measurement X and acts decoherently on (AB)_E and coherently on (AB)_Df. However, the local nature of E_A ⊗ I_B makes it clear we can simplify this decomposition: E_A conserves the local measurement X on A, acts decoherently on A_E, and acts as a local unitary on A_Df. The joint measurement X Y_X Z_X, where X is measured first and the other two measurements are then determined with knowledge of its outcome, is the MLCM of ρ_AB. Note that for any x and z, the outcomes (x, y, z) and (x, y′, z) are correlation equivalent: measurement Y_X is completely decoupled from system B. Then, the MLSS Σ := [X Y_X Z_X]_B is simply a function of X and Z_X.
Since X Z_X is conserved by the action of E_A ⊗ I, where X is the conserved measurement and Z_X is preserved through the unitary evolution, the MLSS Σ must be preserved as well.

A generator G = (S, {T^(x)_{s′s}}) is specified by a stochastic channel from S to S × X, T^(x)_{s′s} := Pr_G(s′, x|s). We define generators to be recurrent HMMs, which means the total transition matrix T_{s′s} := Σ_x T^(x)_{s′s} is irreducible. In this case, there is a unique stationary distribution π_G(s) over states S satisfying π_G(s) > 0, Σ_s π_G(s) = 1, and Σ_s T_{s′s} π_G(s) = π_G(s′).
The following probability distributions are particularly relevant: Pr_G(x₁ … x_t, s_t) and its marginal Pr_G(x₁ … x_t). Also, recall the definition of a mergeable partition and retrodictive equivalence. Each partition P = {P_θ} of S has a corresponding merged generator G_P = (P, {T^(x)_{θ′θ}}) with transition dynamics given by: T^(x)_{θ′θ} := Σ_{s′∈P_{θ′}} Σ_{s∈P_θ} T^(x)_{s′s} π_G(s|θ), where π_G(s|θ) = π_G(s)/π_{G_P}(θ) and π_{G_P}(θ) = Σ_{s∈P_θ} π_G(s). A partition {P_θ} is mergeable if the merged generator generates the same process as the original. To see this, we follow a proof by induction. We first suppose that the distribution Pr_{G_P}(x₁ … x_t, σ_t) is equal to the distribution Σ_{s_t∈P_{σ_t}} Pr_G(x₁ … x_t, s_t) for some t. Then, unfolding one transition step with the merged dynamics above, the same equality holds at t + 1. Now, for t = 1, we have: Pr_{G_P}(x₁, σ₁) = Σ_{s₁∈P_{σ₁}} Pr_G(x₁, s₁).
Then, by induction, Pr G P (x 1 . . . x t , σ t ) = Pr G (x 1 . . . x t , σ t ) for all t. By summing over σ t , we have Pr G P (x 1 . . . x t ) = Pr G (x 1 . . . x t ) for all t as well.
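The merging construction and this equality can be checked numerically. The sketch below, assuming NumPy, implements the merged dynamics T^(x)_{θ′θ} = Σ_{s′∈P_θ′, s∈P_θ} T^(x)_{s′s} π_G(s|θ) for a hypothetical three-state generator whose two-block partition is mergeable, and compares word probabilities.

```python
import numpy as np
from itertools import product

def stationary(T_total):
    """Stationary distribution of a column-stochastic matrix (eigenvalue-1 vector)."""
    w, V = np.linalg.eig(T_total)
    v = np.real(V[:, np.argmin(np.abs(w - 1))])
    return v / v.sum()

def word_prob(T, pi, word):
    """Pr(x_1 ... x_t) for a labeled HMM {T[x]}, T[x][s2, s1] = Pr(s2, x | s1)."""
    v = pi.copy()
    for x in word:
        v = T[x] @ v
    return float(v.sum())

def merge(T, partition):
    """Merged generator weighted by pi(s | block), per the formula above."""
    pi = stationary(sum(T.values()))
    blocks = sorted(set(partition))
    idx = {b: i for i, b in enumerate(blocks)}
    n = len(partition)
    pi_block = np.array([sum(pi[s] for s in range(n) if partition[s] == b)
                         for b in blocks])
    Tm = {}
    for x, Tx in T.items():
        M = np.zeros((len(blocks), len(blocks)))
        for s2, s1 in product(range(n), repeat=2):
            M[idx[partition[s2]], idx[partition[s1]]] += \
                Tx[s2, s1] * pi[s1] / pi_block[idx[partition[s1]]]
        Tm[x] = M
    return Tm

# Hypothetical generator: states 1 and 2 behave identically, so {0}, {1, 2} merges.
T = {'a': np.array([[0.0, 0.5, 0.5], [0.25, 0.0, 0.0], [0.25, 0.0, 0.0]]),
     'b': np.array([[0.5, 0.0, 0.0], [0.0, 0.25, 0.25], [0.0, 0.25, 0.25]])}
Tm = merge(T, [0, 1, 1])
```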
Using this result and the Reversible Local Operations Theorem, we can establish the following.
Proposition 5. I(S_t : X₁ … X_t) = I(S_{t+1} X_{t+1} : X₁ … X_t) for all t if and only if the retrodictively state-merged generator satisfies T^(x)_{σσ′} > 0 only when σ = f(σ′, x), for some function f : S × X → S.

To see this, recall from Cor. 1 that I(S_t : X₁ … X_t) = I(S_{t+1} X_{t+1} : X₁ … X_t) only if Pr_G(s_{t+1}, x_{t+1}|s_t) > 0 implies a corresponding support condition. Rearranging and using the retrodictive-equivalence partitions σ_t := [s_t]_∼ and σ_{t+1} := [s_{t+1}]_∼, we obtain the analogous condition on merged states. Define a function f : S × X → S as follows: for a given σ′ and x, let f(σ′, x) be the equivalence class satisfying Eq. (S1). Such an equivalence class f(σ′, x) must exist by Eq. (S1). It is unique since, by definition, equivalence classes σ have unique distributions Pr(·|σ). Then σ_t = f(σ_{t+1}, x_{t+1}) is a requirement for Pr_{G_P}(σ_{t+1}, x_{t+1}|σ_t) > 0. If we then take the merged generator G_P, we must have T^(x)_{σσ′} > 0 only when σ = f(σ′, x).

Conversely, suppose that the retrodictively state-merged generator satisfies T^(x)_{σσ′} > 0 only when σ = f(σ′, x) for some function f : S × X → S. Then, for a given s_t, its equivalence class [s_t]_∼ is always a function of the next state and symbol; combined with the basic Markov property of the generator, this yields the desired equality of mutual informations.

Last, we connect this result to previous literature on the efficiency of retrodictors by showing that our conditions for efficiency imply that the generator is retrodictive. Recall that any generator whose state S_t is a sufficient statistic of the future X⃗_t for X₁ … X_t is called a retrodictor.
Any generator G = (S, {T^(x)_{s′s}}) whose retrodictively state-merged generator G_{P∼} = (P_∼, {T^(x)_{σσ′}}) satisfies T^(x)_{σσ′} > 0 only when σ = f(σ′, x), for some function f : S × X → S, is a retrodictor.
This follows since G_P, being co-unifilar and already retrodictively state-merged, must be the reverse ε-machine with states Σ_t. It is clear from the proof of Prop. 5 that, for all t, we have the Markov chain (X₁ … X_t) − Σ_t − S_t. Since Σ_t is also the minimum sufficient statistic of X⃗_t for X₁ … X_t, we must have the Markov chain (X₁ … X_t) − S_t − X⃗_t, so that S_t is itself a sufficient statistic.
In terms of the q-machine, we can write the time-reversed Kraus operators K̃^(x). This is, essentially, the Petz reversal of the POVM {K^(x)}, and it constitutes a formal time reversal of the quantum process [50]. Computing K̃^(x) is straightforward using Eq. (S3), as this gives the Kraus operators in the diagonal basis of ρ_π, where it is easiest to compute ρ_π^{1/2} and its inverse. Now, take the basis |ψ_s⟩ = Σ_α U*_{sα} |α⟩. In this basis, we see that the basis {|ψ_s⟩} and Kraus operators {K̃^(x)} form a reverse q-machine as described in the main body, one that compresses the reverse ε-machine G.
Note that the stationary state of a time-reversed q-machine is just the stationary state of the original q-machine; it is not altered under time reversal. However, we find a new expression for the stationary state in terms of the basis {|ψ_s⟩}, from which we see that ρ_π is generally not diagonal in this basis. The extent to which ρ_π commutes with {|ψ_s⟩} is the extent to which Ω_rs is block-diagonal.

Efficiency of q-Machines
To establish our first theorem relating memory to efficiency we turn to forward q-machines. First, we must prove a result regarding the synchronization of q-machine states.
This shows that the quantum state of the memory system converges to a single encoded state with high probability.
This follows straightforwardly from a similar statement about ε-machines [38,56,57]. Let Q(x₁ … x_t) := 1 − Pr(ŝ|x₁ … x_t) be the probability of not being in the most likely state ŝ after word x₁ … x_t. Then there exist 0 < α < 1 and K > 0 such that, for all t, Q is bounded above by K α^t on average. Note that, from the Kraus operator definition, the quantum state of the memory system after word x₁ … x_t is determined by the conditional distribution Pr(s|x₁ … x_t), and its fidelity with the most likely encoded state can be computed in terms of Q. And so, from the synchronization theorem for ε-machines, the fidelity approaches unity for all sufficiently large t.
The second notion we must introduce is a simple partition constructed on the memory states of the ε-machine for a given quantum implementation. Specifically, we define the maximal commuting partition on S to be the most refined partition {B_θ} such that the overlap matrix ⟨ψ_r|ψ_s⟩ is block-diagonal. That is, {B_θ} is such that ⟨ψ_r|ψ_s⟩ = 0 if r ∈ B_θ and s ∈ B_θ′ for θ ≠ θ′. From this partition we construct the maximal local commuting measurement required to define sufficient statistics. We can now prove the following.
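Computing the maximal commuting partition from the overlap matrix is a connected-components problem: join any two states with nonzero overlap and read off the blocks. A sketch, assuming NumPy and a hypothetical overlap matrix:

```python
import numpy as np

def maximal_commuting_partition(overlap, tol=1e-10):
    """Most refined partition {B_theta} with <psi_r|psi_s> = 0 across blocks:
    connected components of the graph with an edge wherever overlap is nonzero."""
    n = overlap.shape[0]
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for r in range(n):
        for s in range(r + 1, n):
            if abs(overlap[r, s]) > tol:
                parent[find(r)] = find(s)
    blocks = {}
    for i in range(n):
        blocks.setdefault(find(i), []).append(i)
    return sorted(blocks.values())
```

An identity overlap matrix (orthogonal memory states) yields the trivially maximal partition of singletons.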
For the equivalence relation, we say that θ ∼ θ′ if the corresponding conditional states coincide, where Q(x₁ … x_t) = 1 − Pr_F(p_t|x₁ … x_t). Then the result follows from the forward ε-machine synchronization theorem. Now, we prove a synchronization theorem for the reverse q-machine.
First, recall that x̃₁ … x̃_t corresponds to the reverse of the sequence x₁ … x_t, generated by the forward ε-machine G̃ of the reverse process. We have the corresponding relation for Pr_G̃(s̃_t|x̃₁ … x̃_t). To handle the term Pr(s_t|x₁ … x_t, ŝ), we use the co-unifilarity properties of the reverse ε-machine. Note that: Pr(s_t|x₁ … x_t, ŝ) = Pr(ŝ|x₁ … x_t, s_t) Pr(s_t|x₁ … x_t) / (1 − Q).
By co-unifilarity, Pr(ŝ|x₁ … x_t, s_t) is either 0 or 1, depending on the value of s_t. The synchronization theorem then demands that Pr(s_t|x₁ … x_t) < Q for those s_t that assign Pr(ŝ|x₁ … x_t, s_t) a zero value. The desired bound follows from this, along with the above equation. With this, we have sufficient information to determine the MLCM of the system S_t A_t … A₁ for sufficiently long t. Let π_s be the stationary distribution for G's memory state, and let λ_q be the same for F. Consider the channel Pr(s′|s) = Σ_q Pr(s′|q) Pr(s|q) λ_q / π_s. From it we can construct the ergodic partition {B_θ}, defined as the most refined partition such that Pr(s′|s) > 0 only if θ(s) = θ(s′). This partition represents information about the state of G that is recoverably encoded in the state of F. If, on the one hand, the partition is trivially coarse-grained (|{B_θ}| = 1), then no information about G is recoverably encoded. If, on the other, the partition is trivially maximal (|B_θ| = 1 for all θ), then the state of G is actually a function of the state of F. Consequently, in this extreme case, G is unifilar in addition to being co-unifilar.
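The ergodic partition can likewise be computed as the connected components of the support of Pr(s′|s). A sketch assuming NumPy, with hypothetical conditionals Pr(s|q) and stationary distribution λ_q:

```python
import numpy as np

def ergodic_partition(P_sq, lam, tol=1e-12):
    """Ergodic partition from P_sq[s, q] = Pr(s|q) and stationary lam[q].
    Builds Pr(s'|s) = sum_q Pr(s'|q) Pr(s|q) lam_q / pi_s, then returns the most
    refined partition with Pr(s'|s) > 0 only inside blocks."""
    pi = P_sq @ lam                      # pi_s = sum_q Pr(s|q) lam_q
    C = (P_sq * lam) @ P_sq.T / pi       # C[s', s] = Pr(s'|s)
    n = C.shape[0]
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a in range(n):
        for b in range(n):
            if C[a, b] > tol:
                parent[find(a)] = find(b)
    blocks = {}
    for i in range(n):
        blocks.setdefault(find(i), []).append(i)
    return sorted(blocks.values())
```

When each q-state supports a disjoint set of s-states, the state of G is recoverable from F and the partition is fine; when the supports all overlap, the partition collapses to a single block and nothing is recoverable.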
Proposition 11. Let ρ G (t) be the state of the system A 1 . . . A t S t at time t. Let Θ be the projective measurement on H S corresponding to the ergodic partition described above. Then, for sufficiently large t, Θ is the MLCM of S t . Similarly, X t Θ is the MLCM of A t S t .
By Prop. 4, the MLCM must leave each ρ_{x₁…x_t} unchanged, for all t. This is true for Θ. The question is whether any nontrivial refinement, say Y, of Θ can do so. Realize that for any ε > 0 there is a sufficiently large t such that, for each state s, there must be at least one word x₁ … x_t satisfying F(ρ_{x₁…x_t}, |Ψ_{x₁…x_t}⟩⟨Ψ_{x₁…x_t}|) > 1 − ε. Then, for sufficiently large t, there must exist a word x₁ … x_t such that Y modifies ρ_{x₁…x_t}, because Y (by virtue of being a refinement of Θ) cannot commute with all the |ψ_s⟩⟨ψ_s|. Therefore, Θ is the maximal local commuting measurement. That X_t Θ is the MLCM of A_t S_t follows from similar considerations.

This implies that the quantum generator's Kraus operators have a constrained form: the values Pr(θ′, x|θ) must be positive only when θ′x ∼ θ. This would imply that the resulting merged machine is retrodictive. However, since the states S are those of the reverse ε-machine, they cannot be further merged into a retrodictive machine. It must then be the case that the partition Θ is trivially maximal. Consequently, it must be the case that G is predictive.
The consequences of this theorem are augmented by the following statement about partition {B θ } and the stationary distribution ρ π .
To see this, we can express the stationary state in the basis {|ψ_s⟩}. The primary implication of this theorem, then, is that trivial maximality of Θ implies ρ_π is diagonal in the basis {|ψ_s⟩}, which itself implies that the reverse q-machine does not achieve any nonzero memory compression.