Thermodynamic costs of Turing Machines

Turing Machines (TMs) are the canonical model of computation in computer science and physics. We combine techniques from algorithmic information theory and stochastic thermodynamics to analyze the thermodynamic costs of TMs. We consider two different ways of realizing a given TM with a physical process. The first realization is designed to be thermodynamically reversible when fed with random input bits. The second realization is designed to generate less heat, up to an additive constant, than any realization that is computable (i.e., consistent with the physical Church-Turing thesis). We consider three different thermodynamic costs: the heat generated when the TM is run on each input (which we refer to as the "heat function"), the minimum heat generated when a TM is run with an input that results in some desired output (which we refer to as the "thermodynamic complexity" of the output, in analogy to the Kolmogorov complexity), and the expected heat on the input distribution that minimizes entropy production. For universal TMs, we show for both realizations that the thermodynamic complexity of any desired output is bounded by a constant (unlike the conventional Kolmogorov complexity), while the expected amount of generated heat is infinite. We also show that any computable realization faces a fundamental tradeoff between heat generation, the Kolmogorov complexity of its heat function, and the Kolmogorov complexity of its input-output map. We demonstrate this tradeoff by analyzing the thermodynamics of erasing a long string.

In this paper we extend this line of research by deriving new results on the thermodynamic costs of performing general computations, as formalized by the notion of Turing machines (TMs). A TM is an abstraction of a conventional modern computer, which runs programs written in a conventional programming language (C, Python, etc.) [40][41][42][43][44][45]. A TM reads an input string of arbitrary length (a "program") and runs until it produces an output string. In the same way that any modern computer can simulate other computers (e.g., via an emulator), there exists an important class of TMs called universal Turing machines (UTMs), each of which is able to simulate the operation of any other TM.
TMs are a keystone of the theory of computation [46], and touch upon several foundational issues that lie at the intersection of mathematics and philosophy, such as whether P = NP and Gödel's incompleteness theorems [47]. Their importance is partly due to the celebrated Church-Turing thesis, which postulates that any function that can be computed by a sequence of formal operations can also be computed by some TM [48][49][50]. For this reason, in computer science, a function is called computable if and only if it can be carried out by a TM [42]. TMs also play important roles in many facets of modern physics. For instance, TMs are used to formalize the difference between easy and hard computational problems in quantum computing [51][52][53][54][55]. There has also been some speculative, broader-ranging work on whether the foundations of physics may be restricted by some of the properties of TMs [56,57]. Finally, there has been extensive investigation of the physical Church-Turing thesis, which states that any function that can be implemented by a physical process can also be computed with a TM [51,53,[58][59][60][61][62][63][64][65][66][67][68][69][70].

* Complexity Science Hub, Vienna; Arizona State University
One of the most important concepts in the theory of TMs is Kolmogorov complexity. The Kolmogorov complexity of a string y, written as K(y), is the length of the shortest input program which causes a UTM to produce y as the output (formal definitions are provided in Section II B). The Kolmogorov complexity of a string y captures the amount of randomness in y, because a string with a non-random pattern can be produced by a short input program. For example, the string containing the first billion digits of π can be generated by running a very short program, and so has small Kolmogorov complexity. In contrast, for a random string y without any patterns, the shortest program that produces y is a program of the type "print 'y'", which has about the same length as y. An important variant of Kolmogorov complexity is the conditional Kolmogorov complexity of y given x, written K(y|x), which is the length of the shortest program which causes a UTM to produce y as output, when the UTM is provided with x as an additional input. Kolmogorov and conditional Kolmogorov complexity have many formal connections with entropy and conditional entropy from Shannon's information theory [43], and are studied in a field called Algorithmic Information Theory (AIT) [42,71].
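This definition can be made concrete with a toy model. The sketch below uses a hypothetical two-instruction machine of our own invention (not a real UTM, and the names `run` and `k_toy` are ours): it brute-forces the shortest program for a string, shortest programs first, exactly as in the definition of Kolmogorov complexity. Patterned strings get programs much shorter than themselves, while patternless strings only admit the literal "print" program.

```python
from itertools import product

# Toy machine with two instructions (a hypothetical stand-in for a UTM):
#   'R' + s        -> output the string s verbatim ("print 's'")
#   'Z' + bits(n)  -> output n zeros, with n written in binary
def run(prog):
    if prog[:1] == 'R':
        return prog[1:]
    if prog[:1] == 'Z':
        try:
            return '0' * int(prog[1:], 2)
        except ValueError:
            return None  # malformed program
    return None

def k_toy(y, max_len=8):
    # Brute-force search over all programs, shortest first -- this
    # mirrors the definition of K, and is feasible only because the
    # toy machine always halts (a real UTM does not).
    for n in range(1, max_len + 1):
        for p in product('RZ01', repeat=n):
            if run(''.join(p)) == y:
                return n
    return None

print(k_toy('0' * 16))  # patterned: 'Z10000' works, so K_toy = 6
print(k_toy('0110'))    # no usable pattern: best is 'R0110', K_toy = 5
```

The 16-zero string is compressed to a 6-symbol program, while the patternless string costs its own length plus one, matching the "print 'y'" intuition above.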
In this paper, we combine techniques from AIT and stochastic thermodynamics to analyze the thermodynamics of TMs. We imagine a discrete-state physical system that is coupled to a heat bath at temperature T, and which evolves under the influence of a driving protocol. We identify the initial and final states of the physical system with the logical inputs and outputs of some TM, so that the dynamics over the states of the physical system corresponds to a computation performed by the TM. We refer to a physical process that is consistent with the laws of thermodynamics and whose dynamics correspond to the input-output map of a TM as a realization of that TM.

We derive numerous results that concern the thermodynamic properties of realizations of TMs. The core underlying idea behind these results is that the logical properties of a given TM (such as the structure of the TM's input-output map, or the Kolmogorov complexity of its inputs and outputs) provide constraints on the thermodynamic costs incurred by realizations of that TM (such as the amount of heat those realizations generate). Some of our results relate logical properties and thermodynamic costs at the ensemble level (i.e., relative to a probability distribution over computational trajectories of a TM), thereby building on the thermodynamic analysis initiated by Landauer and others. In addition, many of our results relate logical properties and thermodynamic costs at the level of individual computational trajectories (i.e., individual runs of the TM), which goes beyond most existing research on thermodynamics of computation.

A. Summary of results
We investigate three different kinds of thermodynamic costs for a given realization of a TM:

(1) The amount of heat that is generated by running the realization of a given (universal or non-universal) TM on each individual input x. We refer to the map from inputs to their associated heat values as the heat function of the TM's realization, and write it as Q(x).

(2) The minimal amount of heat generated by running the realization of a given TM on some individual input that results in a desired output y. Here we assume that the TM is universal, so that it can in principle produce any output. This second cost is a function of the desired output y, rather than of the input x, and can be viewed as a thermodynamic analog of conventional Kolmogorov complexity. For this reason, we refer to this cost as the thermodynamic complexity of y.

(3) The ensemble-level expected heat Q generated by the realization of a TM, evaluated for the input distribution that minimizes entropy production (EP). For this cost, we again focus on the case of universal TMs.
In general, there are many physical processes that are realizations of the same TM, which can have different thermodynamic costs from one another. In this paper we consider the above three thermodynamic costs for two important types of realizations. The first realization we consider, which is called the coin-flipping realization, is constructed to be thermodynamically reversible when input programs are sampled from the "coin-flipping" distribution p(x) ∝ 2^(−ℓ(x)), where ℓ(x) indicates the length of string x. This input distribution arises by feeding random bits into a TM (hence its name) and plays a fundamental role in AIT.
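The coin-flipping distribution can be simulated directly: feed fair coin flips to the machine one bit at a time until a complete program has been read. For a prefix-free set of programs (as in AIT), each halting program x is then selected with probability exactly 2^(−ℓ(x)). A toy sketch, where the four-program set and the function names are invented for illustration:

```python
import random

# Toy prefix-free program set: no program is a prefix of another, so
# feeding fair coin flips into the machine selects program x with
# probability exactly 2^(-len(x)).
programs = {'0', '10', '110', '111'}

def sample_program(rng):
    # Read coin flips one at a time until a complete program appears.
    x = ''
    while x not in programs:
        x += rng.choice('01')
    return x

rng = random.Random(0)
counts = {p: 0 for p in programs}
for _ in range(100_000):
    counts[sample_program(rng)] += 1

for p in sorted(programs):
    print(p, counts[p] / 100_000, 2.0 ** -len(p))
```

The empirical frequencies converge to 1/2, 1/4, 1/8, 1/8, i.e., to 2^(−ℓ(x)) for each program.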
We show that the heat function of the coin-flipping realization of a given TM is proportional to ℓ(x) minus a "correction term" which reflects the logical irreversibility of the input-output map computed by the TM. Importantly, when the realized TM is a universal TM U, this correction term can be related to the Kolmogorov complexity of the output of U on input x. In this case, the heat function is given by

Q_coin(x) = kT ln 2 · [ℓ(x) − K(φ_U(x))] + O(1),   (1)

where φ_U(x) indicates the output of U on input x, and O(1) indicates equality up to an additive constant independent of x (see Section I C for a formal definition). Thus, up to an additive constant, the heat generated by running input x on the coin-flipping realization of some UTM U is proportional to the excess length of the input program x, over and above the length of the shortest program for U that produces the same output as x.
It follows from Eq. (1) that if x is the shortest program for U that produces output φ_U(x), then Q_coin(x) = O(1). This means that by running the shortest program x that produces some desired y as output, one can produce that y for an amount of heat that is bounded by a constant. Thus, the thermodynamic complexity for the coin-flipping realization is a bounded function, unlike the Kolmogorov complexity, which grows arbitrarily large [42]. On the other hand, we also show that when inputs are sampled from the coin-flipping distribution, the expected heat Q generated by the coin-flipping realization of a UTM is infinite. This holds even though the heat necessary to run the UTM on any given input x is finite.
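To get a feel for the magnitudes involved, the sketch below plugs illustrative numbers into the relation Q_coin(x) ≈ kT ln 2 · (ℓ(x) − K(φ_U(x))). The kT ln 2 prefactor is our reading of the proportionality stated above, and the specific program lengths are invented for illustration:

```python
import math

# Heat generated by the coin-flipping realization of a UTM, in the
# form Q_coin(x) ~ kT ln2 * (l(x) - K(phi_U(x))) up to an additive
# constant. Values below are illustrative, not measured.
k_B = 1.380649e-23   # Boltzmann constant, J/K
T   = 300.0          # room temperature, K

def excess_bits(program_len, k_of_output):
    """Excess program length, which sets the heat in units of kT ln 2."""
    return program_len - k_of_output

# A 1000-bit program whose output has Kolmogorov complexity ~100 bits:
excess = excess_bits(program_len=1000, k_of_output=100)
heat_joules = k_B * T * math.log(2) * excess
print(excess)       # 900 bits of excess length
print(heat_joules)  # on the order of 1e-18 J at room temperature
```

Running the shortest program instead (excess = 0) would leave only the O(1) constant, consistent with the bounded thermodynamic complexity discussed above.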
The second realization we analyze is inspired by the physical Church-Turing thesis. To begin, we refer to a realization of a TM with heat function Q as a computable realization if the function x → Q(x)/kT is computable (i.e., there exists some TM that takes as input any desired x and outputs the corresponding heat value Q(x) in units of kT). Under common interpretations of the physical Church-Turing thesis [50,53,[59][60][61]64], any realization that is actually constructible in the real world must be computable; in other words, a non-computable realization is a hypothetical physical process which does not violate any laws of thermodynamics, but which nonetheless cannot be constructed because of computational constraints. Motivated by these observations, we define the so-called dominating realization of a TM M to be "optimal" in the following sense: the heat it generates on any input x is smaller than the heat generated by any computable realization of M on x, up to an additive constant which does not depend on x.¹

¹ Note that generating minimal heat is different from generating minimal EP. For example, the coin-flipping realization of a TM is thermodynamically reversible for the coin-flipping distribution over inputs x, and thus generates zero EP when run on inputs sampled from that distribution. However, that does not mean that it generates less heat on any particular input x, relative to the heat generated by another realization of the same TM on x.

The heat
function of the dominating realization is proportional to the conditional Kolmogorov complexity of the input given the output,

Q_dom(x) = kT ln 2 · K(x|φ_M(x)) + O(1),   (2)

where φ_M(x) indicates the output of TM M on input x. We show that this heat function is smaller than the heat function Q of any computable realization of M, up to an additive constant:

Q_dom(x) ≤ Q(x) + O(1).   (3)

Note that this result holds whether or not M is a UTM. For the special case where M is a UTM, we show that for any desired output y, the thermodynamic complexity of y under the dominating realization is bounded by a constant that is independent of y, just like for the coin-flipping realization. Moreover, for the dominating realization there is a simple scheme for choosing the input x that will produce any desired output y with a bounded amount of heat. This differs from the coin-flipping realization, where one must know the shortest program that generates y in order to produce y with a bounded amount of heat (in general, finding the shortest program to produce a given output y is not computable).
Finally, we consider the expected heat that is generated by the dominating realization, given some probability distribution over input programs. A natural input distribution to consider is the one that minimizes the entropy production of the dominating realization. As for the coin-flipping realization, we show that the expected heat across inputs sampled from this distribution is infinite.

There are two important caveats concerning the dominating realization. First, while the dominating realization is better than any computable realization, in the sense of Eq. (3), it is itself not computable. This is because its heat function is defined in terms of the conditional Kolmogorov complexity, which is not a computable function. Nonetheless, as we discuss below, one can always define a sequence of computable realizations whose heat functions approach Q_dom from above. Thus, the dominating realization presents a fundamental bound on the heat generation of computable realizations, and this bound is achievable in the limit.

Second, for a given TM M, Eq. (3) states that the heat generated by the dominating realization on input x, Q_dom(x), is smaller than the heat generated by any computable realization, Q(x), up to an additive constant that does not depend on x. This additive constant, however, can depend on the particular alternative realization of M that is being compared, i.e., on the choice of comparison heat function Q. In fact, depending on the alternative realization, that additive constant can be arbitrarily large and negative. This means that for a given TM M and some particular choice of input program x, there may exist alternative realizations of M that generate arbitrarily less heat than the dominating realization. It turns out, however, that the difference between Q_dom(x) and Q(x) is upper bounded by the sum of the Kolmogorov complexity of the input-output function φ_M and the Kolmogorov complexity of the comparison heat function Q. Using this result, we show that any computable realization that produces output y from input x faces a fundamental cost of K(x|y), which can be paid either by producing a large amount of heat, by computing an input-output map with high complexity, or by having a heat function with high complexity.

The paper is laid out as follows. In the following subsections, we review relevant prior work and introduce notation. In Section II, we define TMs and review some relevant results from AIT. In Section III, we review the basics of statistical physics, and discuss how a TM can be implemented as a physical system. We present our main results on the coin-flipping and dominating realizations in Section IV and Section V. In Section VI, we demonstrate the tradeoff between heat and complexity by analyzing the thermodynamics of erasing a long string. In the last section we discuss potential directions for future research.

B. Prior work on thermodynamics of TMs
Some of the earliest work on the thermodynamics of TMs focused on TMs with deterministic and logically reversible dynamics [72,73]. Logically reversible TMs can perform computations without generating any heat or entropy production, at the cost of having to store additional information in their output, which logically irreversible TMs do not need to store. Due to the thermodynamic costs that would arise in re-initializing that extra stored information, there are some subtleties in calculating the thermodynamic cost of running a "complete cycle" of any logically reversible TM [34]. (See also [74,75] for a discussion of the relationship between thermodynamic and logical reversibility.) Logically reversible TMs form a special subclass of TMs, and require special definitions of universality [76]. In this work, we focus on the thermodynamics of general-purpose TMs, whose computations will generally be logically irreversible. However, we will sometimes also discuss how our results apply in the logically reversible case.

More recently, [36] analyzed the thermodynamics of logically reversible TMs with stochastic forward-backward dynamics along a computational trajectory, which causes the state of the TM to become more uncertain with time.² This model incurs non-zero entropy production, even though each computational trajectory encodes a logically reversible computation. Note that this entropy production could in principle be made arbitrarily small by driving the TM forward with momentum (e.g., by coupling it to a large flywheel). In this work, we will ignore possible stochasticity in the progression of a TM along its computational trajectory.

Finally, there has been recent work which interprets the coin-flipping distribution over strings x, as defined in Section IV, as a "Boltzmann distribution" induced by the "energy function" ℓ(x) [77]. Doing this allows one to formulate a set of equations concerning TMs that are formal analogs of Maxwell's relations for equilibrium thermodynamic systems.

In our own earlier work, we began to analyze the thermodynamic complexity of computing desired outputs, focusing on the coin-flipping realization and a three-tape UTM [78]. We first showed explicitly how to construct a system that is thermodynamically reversible for the coin-flipping distribution, and then derived the associated heat function. We showed that for this realization, the minimal amount of heat needed to compute any given output y equals the Kolmogorov complexity of y, plus what we characterized as a "correction term". In other, more recent work, we rederived these results using stochastic thermodynamics and single-tape machines [79].

In this paper, we extend this earlier work on the coin-flipping realization. For simplicity, we consider the thermodynamics of systems that implement the entire computation of a given UTM in some fixed time interval. (In contrast, our earlier work considered systems that implement a given UTM's update function iteratively, taking varying amounts of time to halt, depending on the input to the UTM.) We then go further, and use Levin's Coding theorem to show that the thermodynamic complexity of the coin-flipping realization is bounded, even though the conventional Kolmogorov complexity function is not. We also extend this earlier work by showing that the coin-flipping realization generates infinite expected heat when inputs are sampled from the coin-flipping distribution.

The other main contributions of this paper concern the thermodynamic costs of the dominating realization. These results are related to a series of ground-breaking papers begun by Zurek [5,6,[80][81][82][83][84][85][86]. Those papers were generally written before the widespread adoption of trajectory-based analyses of thermodynamics [18], and contained a semiformal argument that computing an output string y from an input x has a minimal "thermodynamic cost" of at least K(x|y). Even though that semiformal argument is quite different from our analysis, the same "thermodynamic cost" function also appears in our analysis of the dominating realization. We discuss connections between our results and this earlier work in more detail in Section VI.

C. Notation
We use uppercase letters, such as X and Y, to indicate random variables. We use lowercase letters, like x and y, to indicate their outcomes. We use p_X to indicate a probability distribution over random variable X, and p_X|Y to indicate a conditional probability distribution of random variable X given random variable Y. We also use p_X|Y=y to indicate the probability distribution of X conditioned on one particular outcome Y = y. Finally, we use supp p_X to indicate the support of distribution p_X, and notation like ⟨f(X)⟩_p_X = Σ_x p_X(x) f(x) to indicate expectations. A partial function f : A → B is a map from some subset of A, which is called the domain of definition of f, into B. We write dom f ⊆ A to indicate the domain of definition of f, and img f := {f(a) : a ∈ dom f} to indicate the image of f. The value of f(a) is undefined for any a ∉ dom f. For any set A, we use A* to indicate the set of finite strings of elements from A. We use A^∞ to indicate the set of infinite strings of elements from A. In particular, {0, 1}* indicates the set of all finite binary strings. Note that for any finite A, A* is a countably infinite set.
The Kronecker delta is indicated by δ(·, ·). We sometimes write δ_x to indicate a delta-function probability distribution over outcome x of random variable X, δ_x(x′) = δ(x, x′).

II. BACKGROUND ON TURING MACHINES AND AIT

A. Turing Machines
In its canonical definition, a TM comprises three variables, and a rule for their joint dynamics. First, there is a tape variable whose state is a semi-infinite string s ∈ A^∞, where A is a finite set of tape symbols which includes a special blank symbol. Second, there is a pointer variable v ∈ {1, 2, 3, . . .}, which is interpreted as specifying a "position" on the tape (i.e., an index into the infinite-dimensional vector s). Finally, there is a head variable h whose state belongs to a finite set, which includes a specially designated start state and a specially designated halt state.
The TM starts with its head in the start state, the pointer set to position 1, and its tape containing some finite string of non-blank symbols, followed by blank symbols. The joint state of the tape, pointer, and head evolves over time according to a discrete-time update function. If during that evolution the head ever enters its halt state, that is interpreted as the computation being completed. If and when the computation completes, we say that the TM has then computed its output, which is specified by the state of its tape at that time. Importantly, for some inputs, a TM might never complete its computation, i.e., it may go into an infinite loop and never enter the halt state. The operation of a TM is illustrated in a schematic way in Fig. 1. A more formal definition of a TM and the update function is provided in Appendix A.
There are many other variants of TMs that have been considered in the literature, including ones with multiple tapes and multiple heads. However, all of these variants are computationally equivalent: any computation that can be carried out with a particular TM variant can also be carried out with some TM that possesses a single tape and a single head [34,40,87].
For simplicity of analysis, we make two assumptions about the TMs analyzed in this paper, neither of which affects the computational capabilities of the TMs. First, we assume that the tape alphabet A contains the binary symbols 0 and 1, and that these are the only non-blank symbols present on the tape at the beginning of the computation. Second, we assume that any TM we consider is designed so that, if and when it reaches a halt state, its tape will contain a string from {0, 1}* followed by all blank symbols, and the pointer will be set to 1 (i.e., returned to the start of the tape). This assumption of a "standardized" halt state properly accounts for the thermodynamic costs of running a complete cycle of the TM. For instance, after this standardized halt state is reached, the output of the TM can be moved from the tape onto an off-board storage device and a new input can be moved from another off-board storage device onto the tape, thus preparing the TM to run another program. Importantly, both of these operations can in principle be performed without incurring thermodynamic costs [34].

FIG. 1. A TM performing a computation. The update function is applied over a sequence of steps, causing the finite-state head (rounded box; states are colored circles) to move along an infinite tape of symbols (b indicates a special "blank" symbol). During each step, the head can read/write the tape symbol in the current position, move left or right along the tape, and change its current state (green triangle). The computation completes if and when the head reaches its halt state (red circle).
Given the above assumptions, one can represent the computation performed by any TM M as a partial function over the set of finite-length bit strings {0, 1}* (see Appendix A), which we write as φ_M : {0, 1}* → {0, 1}*. In this notation, φ_M(x) = y indicates that when TM M is started with input program x, it eventually halts and produces the output string y. Note that φ_M is a partial function because it is undefined for any input x for which M does not eventually halt [40,42,43]. Thus, dom φ_M (the domain of definition of φ_M) is the set of all input strings on which M eventually halts, which is sometimes called the "halting set of M" in the literature.
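As an illustration of the definitions above, here is a minimal single-tape simulator together with a machine that computes the bit-flip map and halts in the standardized configuration (pointer at position 1, output followed by blanks). The transition-table encoding and the left-end handling are our own simplifications for this sketch, not the formal construction of Appendix A:

```python
# Minimal single-tape TM simulator (illustrative sketch).
# delta maps (head_state, symbol) -> (new_symbol, move, new_state).
BLANK = 'b'

def run_tm(delta, tape_str, max_steps=10_000):
    tape = dict(enumerate(tape_str, start=1))  # sparse semi-infinite tape
    v, h = 1, 'start'                          # pointer and head state
    for _ in range(max_steps):
        # Rewinding past position 1 triggers the standardized halt.
        # (This position check stands in for the usual left-end-marker
        # construction, a simplification for this sketch.)
        if h == 'rewind' and v == 1:
            h = 'halt'
        if h == 'halt':
            out, i = [], 1
            while tape.get(i, BLANK) != BLANK:
                out.append(tape[i]); i += 1
            return ''.join(out)
        sym = tape.get(v, BLANK)
        new_sym, move, h = delta[(h, sym)]
        tape[v] = new_sym
        v = max(1, v + (1 if move == 'R' else -1))
    return None  # did not halt within max_steps (phi_M undefined here)

# Bit-flip machine: scan right flipping each bit, then rewind to the
# left end of the tape and halt in the standardized configuration.
delta = {
    ('start', '0'): ('1', 'R', 'start'),
    ('start', '1'): ('0', 'R', 'start'),
    ('start', BLANK): (BLANK, 'L', 'rewind'),
    ('rewind', '0'): ('0', 'L', 'rewind'),
    ('rewind', '1'): ('1', 'L', 'rewind'),
}

print(run_tm(delta, '10101100'))  # -> '01010011'
```

Here φ_M is total (the machine halts on every input); a machine that loops on some inputs would simply have those inputs missing from dom φ_M, with `run_tm` returning None once `max_steps` is exhausted.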
As mentioned in the introduction, a universal TM (UTM) is a TM that can simulate any other TM. More precisely, given some UTM U and any other TM M, there exists an "interpreter program" σ_U,M such that for any input x of M, φ_U(σ_U,M, x) = φ_M(x). Intuitively, this means that there exist programming languages which are "universal", meaning they can run programs written in any programming language, after appropriate translation from that other language. Note that since M can itself be a UTM, any UTM can simulate any other UTM.
The Kolmogorov complexity is unbounded: for any UTM U and any finite κ, there exists a string x such that K_U(x) > κ (this follows from the fact that {0, 1}* is an infinite set, while only a finite number of different outputs can be produced by programs of length κ or less). Moreover, K_U is an uncomputable function. This implies that if the physical Church-Turing thesis is true, then no real-world physical system can take any desired string x as input and produce the value of K_U(x) as output. On the other hand, Kolmogorov complexity can be bounded from above,³ and it is possible to derive many formal results about its properties [42].
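Although K_U is uncomputable, such upper bounds are easy to obtain in practice: any lossless compressor yields a computable upper bound on K(y), up to an additive constant, because a fixed decompressor program plus the compressed data is itself a program that prints y. A small illustration using Python's zlib (the particular strings are arbitrary choices of ours):

```python
import random
import zlib

# K(y) is uncomputable, but K(y) <= len(compress(y)) + O(1) is a
# computable upper bound: a fixed-length decompressor plus the
# compressed data constitutes a program that outputs y.
patterned = ('01' * 50_000).encode()  # 100 kB, but highly regular

rng = random.Random(0)
rand = bytes(rng.getrandbits(8) for _ in range(100_000))  # patternless

print(len(zlib.compress(patterned, 9)))  # a few hundred bytes
print(len(zlib.compress(rand, 9)))       # ~100 kB: no pattern to exploit
```

The patterned string certifiably has small Kolmogorov complexity; for the random string the compressor finds nothing, consistent with K(y) ≈ ℓ(y) for patternless y (though no compressor can prove a lower bound).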
One can define the Kolmogorov complexity not just for strings but also for computable partial functions. Recall from the previous section that given any UTM U and TM M, there is a corresponding "interpreter program" σ_U,M, which can be used by U to simulate M on any input x. The Kolmogorov complexity of a computable function f is defined as the length of the shortest interpreter program for U that simulates a TM that computes f:

K_U(f) := min_M { ℓ(σ_U,M) : φ_M = f }.

Similarly, the Kolmogorov complexity of some computable function f : {0, 1}* → R is given by the length of the shortest interpreter program that approximates f to arbitrary precision.
Above we defined Kolmogorov complexity relative to some particular choice of UTM U. In fact, the choice of U is only relevant up to an additive constant. To be precise, for any two UTMs U and U′, the "invariance theorem" [42] states that

|K_U(x) − K_U′(x)| ≤ c_U,U′  for all x,

where c_U,U′ is a constant that does not depend on x. Given this result along with the unboundedness of K_U, for any two UTMs U and U′ and any desired ε > 0,

K_U(x) ≤ (1 + ε) K_U′(x)

for all but a finite number of strings x (out of the infinite set of all possible such strings). For many purposes, this allows us to dispense with specifying the precise UTM U when referring to the Kolmogorov complexity of a string x, and simply write K(x). Finally, the conditional Kolmogorov complexity of x ∈ {0, 1}* given y ∈ {0, 1}*, written K_U(x|y), is the length of the shortest program which causes U to produce x as output when provided with y as an additional input. Like K_U, the conditional Kolmogorov complexity is uncomputable, although one can derive increasingly tight upper bounds on it. In addition, like regular Kolmogorov complexity, the conditional Kolmogorov complexity defined relative to two UTMs U and U′ differs only up to an additive constant which does not depend on x or y [42],

|K_U(x|y) − K_U′(x|y)| ≤ c_U,U′.

Accordingly, for many purposes we can simply write K(x|y), without specifying the precise UTM U.

A. Physical setup
We consider a physical system with a countable state space X. In practice, X will often be a "mesoscopic" coarse-graining of some underlying phase space, in which case X would represent the states of the system's "information bearing degrees of freedom" [88]. For simplicity, in this paper we ignore issues raised by coarse-graining, and treat X as the microstates of our system.
We assume that the system is connected to a work reservoir and a heat bath at temperature T. The system evolves dynamically under the influence of a driving protocol, and we are interested in its dynamics over some fixed interval t ∈ [0, t_f].
As mentioned in the introduction, research in nonequilibrium statistical physics has defined thermodynamic quantities such as heat, work, and entropy production at the level of individual trajectories of a stochastically-evolving process, so that ensemble averages of those measures over all trajectories obey the usual properties required by conventional statistical physics [18,19]. Adopting this approach, we define the heat function Q(x) as the expected amount of heat transferred from our system to the heat bath during the interval t ∈ [0, t_f], assuming that the system begins in initial state x. Following a standard setup in the literature [89][90][91][92], we assume that the joint Hamiltonian of the system and bath can be written as

H(x, b, t) = H_X^t(x) + H_B(b) + H_int(x, b),

where H_X^t is the time-dependent Hamiltonian of the system, H_B is the bare Hamiltonian of the bath, and H_int is the interaction Hamiltonian (which is typically very small, reflecting weak coupling). Regardless of the initial state of the system x, the bath is initially taken to be in a Boltzmann distribution p_B(b) ∝ e^(−H_B(b)/kT). Let p′_B|x indicate the final distribution of the bath at t = t_f, given that the system began in initial state x. The heat function is then given by the increase of the expected energy of the bath [89,90],

Q(x) = ⟨H_B⟩_p′_B|x − ⟨H_B⟩_p_B.

The expectation of Q(x) under any initial distribution p_X then gives the overall expected amount of generated heat averaged across all trajectories, assuming that initial system-bath states are sampled from p_X(x) p_B(b). This setup can be used to model infinite-sized idealized heat baths (infinite heat capacity, fast equilibration, etc.) by taking appropriate limits [89][90][91][92].
A central quantity of interest in statistical physics is the (irreversible) entropy production (EP), which reflects the overall increase of entropy in the system and the coupled environment. For a given physical process, let p_X be an initial state distribution at time t = 0 and let p_Y be the corresponding final state distribution at t = t_f. Then, the expected EP is

Σ(p_X) = ⟨Q(X)⟩_p_X / kT + S(p_Y) − S(p_X),   (12)

where S(·) indicates the Shannon entropy.⁴ By the second law of thermodynamics, Σ(p_X) is non-negative for any physically allowed heat function Q and every initial distribution p_X [90].
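As a concrete check of this expected-EP formula, the sketch below evaluates it (with natural-log entropies, in units of k) for one-bit erasure with the Landauer heat Q(x) = kT ln 2 on each input. The example and its function names are ours, for illustration:

```python
import math

# Expected EP, Sigma(p_X) = <Q>/kT + S(p_Y) - S(p_X), with natural-log
# entropies, for a deterministic map f applied to a distribution p_x.
def shannon(p):
    return -sum(q * math.log(q) for q in p if q > 0)

def ep(p_x, q_over_kT, f):
    # Push p_X forward through the (deterministic) map f to get p_Y.
    p_y = {}
    for x, p in enumerate(p_x):
        p_y[f(x)] = p_y.get(f(x), 0.0) + p
    avg_q = sum(p * q_over_kT[x] for x, p in enumerate(p_x))
    return avg_q + shannon(p_y.values()) - shannon(p_x)

erase = lambda x: 0                 # bit erasure: f(0) = f(1) = 0
q = [math.log(2), math.log(2)]      # Landauer cost kT ln 2 on each input

print(ep([0.5, 0.5], q, erase))     # ~0: reversible at the uniform prior
print(ep([0.9, 0.1], q, erase))     # > 0: EP when the prior is non-uniform
```

At the uniform prior the entropy drop S(p_X) − S(p_Y) = ln 2 exactly pays for the heat, giving zero EP; for any other prior the same heat function produces strictly positive EP, consistent with the second-law statement above.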
A physical process is said to be thermodynamically reversible if it achieves zero EP. We say that a physical process is a realization of some partial function f : X → X if the conditional probability of the system's ending state given the starting state obeys

p(y|x) = δ(f(x), y)  for all x ∈ dom f.   (13)

The behavior of a realization of f on initial states x ∉ dom f can be arbitrary, as it is not constrained by Eq. (13).
The following technical result links the logical properties of a partial function f with the heat function of any realization of that f. This result will be central to our analysis, as it allows us to establish thermodynamic constraints on processes that realize TMs.

Proposition 1. Given a countable set X, let f : X → X and G : X → R be two partial functions with the same domain of definition. The following are equivalent:

1. For all p_X with supp p_X ⊆ dom f,

   ⟨G(X)⟩_p_X ≥ S(p_X) − S(p_f(X)).   (14)

2. For all y ∈ img f,

   Σ_{x ∈ f^(−1)(y)} e^(−G(x)) ≤ 1.   (15)

3. There exists a realization of f coupled to a heat bath at temperature T, whose heat function Q obeys

   Q(x)/kT = G(x)  for all x ∈ dom f.   (16)

This proposition is proved in Appendix C. The proof exploits a useful decomposition of EP into a sum of a conditional Kullback-Leibler divergence term and a non-negative expectation term, which is derived in Appendix B.
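Condition 2 of Proposition 1 is a Kraft-inequality-type constraint over each output's preimage. The sketch below numerically checks that, on a toy example, it implies the entropy bound of condition 1; the specific forms of the conditions are as we have stated them here, and the map f and heat-like function G are invented for illustration:

```python
import math
import random

# Numerical sketch: if sum_{x: f(x)=y} exp(-G(x)) <= 1 for every
# y in img f (condition 2), then <G>_{p_X} >= S(p_X) - S(p_Y)
# for every p_X supported on dom f (condition 1).
def shannon(p):
    return -sum(q * math.log(q) for q in p if q > 0)

f = {0: 0, 1: 0, 2: 1}                     # logically irreversible map
G = {0: math.log(2), 1: math.log(2), 2: 0.0}

# Condition 2: Kraft-type inequality over each preimage f^{-1}(y).
for y in set(f.values()):
    assert sum(math.exp(-G[x]) for x in f if f[x] == y) <= 1 + 1e-12

# Condition 1: spot-check <G> >= S(p_X) - S(p_Y) on random priors.
rng = random.Random(0)
for _ in range(1000):
    w = [rng.random() for _ in f]
    p_x = {x: wi / sum(w) for x, wi in zip(f, w)}
    p_y = {}
    for x, p in p_x.items():
        p_y[f[x]] = p_y.get(f[x], 0.0) + p
    avg_G = sum(p * G[x] for x, p in p_x.items())
    assert avg_G >= shannon(p_x.values()) - shannon(p_y.values()) - 1e-12

print("condition 1 holds on all sampled distributions")
```

Note that G(0) = G(1) = ln 2 makes the Kraft inequality tight on the two-element preimage, which is the regime where the entropy bound of condition 1 can also be saturated.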
⁴ For countably infinite state spaces (e.g., the state spaces of UTMs), the Shannon entropy of both the initial and final distribution can be infinite, making the expression in Eq. (12) ill-defined. In such cases, a finite EP can often be defined by writing Eq. (12) as a limit Σ(p_X) = lim_{i→∞} Σ(p_i), where each p_i has finite support and lim_{i→∞} p_i = p_X.
FIG. 2. A realization of a TM is a physical process over a countable state space X ⊆ {0, 1}*, which maps initial states to final states according to the input-output function of the TM. As a hypothetical example, consider a process that evolves to the final state 0110011 at t = t_f when started on initial state 10101100 at t = 0 (left), as might correspond to a computation performed by the TM (right; see also Fig. 1).
We note two things about Proposition 1. First, the slack in the inequality in Eq. (15) determines the EP incurred by a realization of f. In particular, as we show in Appendix C, if that inequality is tight for all y ∈ img f, then the inequality in Eq. (14) is also tight for some initial distributions p_X. In this case, the realization of f referenced in Eq. (16) is thermodynamically reversible for those initial p_X.
Second, it is straightforward to generalize the setup described in this section to a system connected to multiple thermodynamic reservoirs, instead of a single heat bath [17]. In this general case, Proposition 1 still holds if the left-hand side of Eq. (16) is interpreted as the amount of entropy increase in all coupled thermodynamic reservoirs, given that the process begins in initial state x. Eq. (16) is a special case of this general formulation, since releasing Q(x) of heat to a bath at temperature T increases the bath's entropy by Q(x)/kT.

B. Realizations of a TM
We briefly describe how a physical process can realize a TM M. Without loss of generality, we assume that the countable state space X of the physical system can be represented by a set of binary strings, so X ⊆ {0, 1}*.
As described in Section II A and Appendix A, the computation performed by a TM can be formalized as a partial function φ_M : {0, 1}* → {0, 1}*. We say that a physical process is a realization of a TM M if it realizes the partial function φ_M, in the sense of Eq. (13) and Proposition 1. Note that this is only possible when dom φ_M ∪ img φ_M ⊆ X. Note also that there may be physical states x ∈ X that do not belong to dom φ_M. When the system is initialized with such a state at t = 0, it will undergo some well-defined dynamical evolution. However, its behavior for such initial states is not constrained by the fact that the system is a realization of the TM, and can be arbitrary (in general, the dynamic and thermodynamic properties for such initial x are not our focus). The mapping between a TM and a physical system is illustrated in Fig. 2.
Many TMs, including all UTMs, can have arbitrarily long programs (i.e., unbounded input length), and can take an arbitrary number of steps before halting on any particular input (i.e., unbounded runtime). For such TMs, our formulation appears to assume a physical system that can store a tape of unbounded size, and which can complete an unbounded number of computational steps in a finite time interval [0, t_f], which is not realistic from a physical point of view. In such cases, one can imagine a sequence of realizations, each of which involves manipulating a finite (but growing) tape over a finite (but growing) number of computational steps. Our analysis and results then apply to the limit of this sequence, in which the tape size and runtime can be arbitrarily large.
In the following sections, we apply Proposition 1 with f = φ_M to establish constraints on the heat function Q of any realization of M. We emphasize that in general these constraints do not fully determine the heat function of a realization of M: there can be many different realizations of any given TM M, each with a different heat function and therefore different thermodynamic properties (see also [34]). In the next sections, we analyze the thermodynamics of two particular realizations of a given TM, which we call the coin-flipping realization and the dominating realization. We work "backwards" for each one: first specifying its heat function, then using Proposition 1 to establish that there is in fact a realization with that heat function, and then analyzing the properties of that heat function.
Before proceeding, we discuss an important issue concerning the computability properties of realizations of TMs. We say that a realization of a TM M with heat function Q is a computable realization if the function Q(x)/kT is computable (i.e., if there exists a TM that can take as input any x ∈ dom φ_M and output the value of Q(x)/kT to arbitrary precision). Some of our results below will rely on particular properties of computable realizations. At the same time, some of the realizations we construct and analyze below will not be computable. Whether such non-computable realizations can actually be constructed in the real world depends on the status of the physical Church-Turing thesis. To see why, imagine that one could construct a non-computable realization of a TM; for example, it might have Q(x)/kT = K(x), where K(x) is the (non-computable) Kolmogorov complexity function. In that case, one could run the realization on various inputs x, use a calorimeter to measure the generated heat in units of kT (i.e., measure Q(x)/kT), and thereby arrive at the value of K(x). This procedure would use a physical process to evaluate a non-computable function, violating the physical Church-Turing thesis.
In this paper, we do not take a position on the validity of the physical Church-Turing thesis. Rather, we will explicitly discuss relevant (non-)computability properties of our realizations, as well as how our non-computable realization can be interpreted in light of the physical Church-Turing thesis. It is important to emphasize, however, that even our non-computable realizations are consistent with the laws of thermodynamics, and are well-defined in terms of a sequence of time-varying Hamiltonians and stochastic dynamics (see the construction in the proof of Proposition 1, Appendix C). Their non-computability arises from the fact that our construction uses various idealizations, such as the ability to apply arbitrary Hamiltonians to the system, which are standard in theoretical statistical physics but which disregard possible computational constraints on the set of achievable processes. For example, our construction disregards the fact that, if the physical Church-Turing thesis holds, then it should be impossible to apply non-computable Hamiltonians to the system, such as H(x) = K(x).

IV. COIN-FLIPPING REALIZATION
We first consider a realization of a TM M that achieves zero EP (i.e., is thermodynamically reversible) when run on input programs randomly sampled from a particular input distribution.
To begin, consider the following coin-flipping distribution over programs, which plays an important role in AIT:

m_X(x) := 2^{−ℓ(x)} for x ∈ dom φ_M.  (17)

Note that m_X sums to a value less than 1 [42]; therefore m_X is a non-normalized probability distribution. Nonetheless, we refer to it as a "distribution", following the convention in the AIT literature.
To understand m_X more concretely, imagine that the initial state of the TM's tape is set to a sample of an infinitely long sequence of independent and uniformly distributed bits. Then, m_X(x) is proportional to the probability that M eventually halts after reading the bit string x from the tape.⁵ Under this hypothetical initialization procedure, the TM will halt with output y with probability

m_Y(y) := Σ_{x : φ_M(x) = y} 2^{−ℓ(x)}.  (18)

This output distribution is biased toward strings that can be generated by short input programs. Note that, like m_X, this output distribution is not normalized. We now consider the thermodynamic cost of running a TM on the coin-flipping distribution. We first define a normalized version of the coin-flipping distribution,

p^coin_X(x) := 2^{−ℓ(x)} / Ω_M,  (19)

where Ω_M := Σ_{x ∈ dom φ_M} 2^{−ℓ(x)} ≤ 1 is a normalization constant (which in AIT is called the "halting probability").
5 For clarity, we omit various technicalities regarding the random process that motivates the coin-flipping distribution. To be precise, this process should be defined in terms of a multi-tape machine, in which one of the tapes is a one-way read-only "input tape" (see Appendix A). Then, m_X(x) is the probability that the multi-tape machine halts after reading the string x from the input tape, assuming the input tape is initialized with an infinitely long random bit string.
p^coin_X(x) is the probability that the TM halts after running input program x, conditioned on the TM halting on some input program, given the random initial tape described above. We also define a normalized version of the output distribution,

p^coin_Y(y) := m_Y(y) / Ω_M.  (20)

Now consider the associated function

G(x) := ln [ p^coin_Y(φ_M(x)) / p^coin_X(x) ].  (21)

It can be verified that this function satisfies condition 2 of Proposition 1. Thus, there is at least one realization of M, which we call the coin-flipping realization, whose heat function obeys

Q^coin(x) = kT ln [ p^coin_Y(φ_M(x)) / p^coin_X(x) ].  (22)

By plugging Q^coin into Eq. (12), we can verify that this realization achieves Σ(p^coin_X) = 0, meaning that it is thermodynamically reversible when run on input distribution p^coin_X. Eq. (22) can be further simplified by using the definitions of p^coin_X and p^coin_Y:

Q^coin(x) = kT ln 2 [ ℓ(x) + log₂ m_Y(φ_M(x)) ].  (23)

This establishes the claim in the introduction, that the heat generated under the coin-flipping realization on input x is proportional to the length of x, minus a "correction term" −log₂ m_Y(φ_M(x)). This correction term is always positive, since m_Y(y) ≤ 1 for all y. Moreover, it reflects the logical irreversibility of the partial function φ_M on input x: it achieves its minimal value of −log₂ Ω_M when φ_M maps all inputs to a single output, and its maximal value of ℓ(x) when φ_M is logically reversible on input x (i.e., when x is the only input that produces output φ_M(x)). In the latter, logically reversible case, Q^coin(x) = 0.
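The coin-flipping construction can be illustrated end to end on a toy prefix machine. In the sketch below, the three-program map `phi` is a hypothetical stand-in for φ_M with a prefix-free domain; the code builds m_X, m_Y, Ω_M, the normalized distributions, checks the Eq. (22)- and Eq. (23)-style forms of Q^coin against each other, and confirms zero EP on the coin-flipping input distribution:

```python
import math

# Hypothetical toy prefix machine: prefix-free programs and their outputs.
phi = {'0': 'a', '10': 'a', '110': 'b'}

ell = {x: len(x) for x in phi}
Omega = sum(2.0 ** -ell[x] for x in phi)        # "halting probability"
m_Y = {}
for x, y in phi.items():
    m_Y[y] = m_Y.get(y, 0.0) + 2.0 ** -ell[x]   # non-normalized output dist.

p_X = {x: 2.0 ** -ell[x] / Omega for x in phi}  # normalized coin-flipping dist.
p_Y = {y: m_Y[y] / Omega for y in m_Y}          # normalized output dist.

kT = 1.0
# Eq.(22)-style heat function: Q(x) = kT ln[p_Y(phi(x)) / p_X(x)].
Q = {x: kT * math.log(p_Y[phi[x]] / p_X[x]) for x in phi}

# Agreement with the Eq.(23)-style form: kT ln2 [ell(x) + log2 m_Y(phi(x))].
for x in phi:
    assert abs(Q[x] - kT * math.log(2) * (ell[x] + math.log2(m_Y[phi[x]]))) < 1e-12

# Thermodynamic reversibility on p_X: EP = S(p_Y) - S(p_X) + <Q>/kT = 0.
S = lambda p: -sum(q * math.log(q) for q in p.values())
EP = S(p_Y) - S(p_X) + sum(p_X[x] * Q[x] for x in phi) / kT
assert abs(EP) < 1e-12
```

Note that program '110' is the only one producing output 'b', so it is logically reversible and indeed incurs zero heat, while the two programs mapping to 'a' incur positive heat.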
Eq. (23) implies that if one wishes to produce some desired output y ∈ img φ_M while minimizing heat generation, one should choose the shortest input x such that φ_M(x) = y. Loosely speaking, the "less efficient" one is in choosing which program to use to compute y, the greater the heat that is expended in that computation. Note that this relationship between shorter programs and lower heat generation is not a universal feature of all realizations of TMs. It holds for the coin-flipping realization because this realization is explicitly designed to be thermodynamically reversible for the coin-flipping input distribution, which has a built-in bias toward shorter input strings.
An important special case is when the TM of interest is a universal TM. For any UTM U, the output distribution in Eq. (18) is called the universal distribution in AIT. The universal distribution possesses many important mathematical properties, is one of the cornerstones of AIT [42, 71, 97, 104, 105], and has attracted attention in artificial intelligence [93-98], foundations of physics [99, 100], and statistical physics [77, 101-103]. In particular, Levin's Coding Theorem [42] relates the universal distribution to Kolmogorov complexity:

K(y) = −log₂ m_Y(y) + O(1).  (24)

This implies that for a UTM, the "correction term" mentioned above is equal to the Kolmogorov complexity of the output, up to an additive constant. Plugging Eq. (24) into Eq. (23) lets us write the heat function of the coin-flipping realization of a UTM as

Q^coin(x)/(kT ln 2) = ℓ(x) − K(φ_U(x)) + O(1).  (25)

So for a coin-flipping realization of a UTM, the heat generated on input x reflects how much the length of x exceeds the shortest program which produces the same output as x.
These results allow us to calculate the thermodynamic complexity of any output string y under the coin-flipping realization of a UTM U, i.e., the minimal heat necessary to generate a desired output y:

min_{x : φ_U(x) = y} Q^coin(x)/(kT ln 2) = K(y) − K(y) + O(1) = O(1),  (26)

where we've used Eq. (25) and the fact that min_{x : φ_U(x) = y} ℓ(x) = K(y) by definition. Thus, for the coin-flipping realization, the minimal heat required by the UTM to compute y is bounded by a constant. As emphasized above, this is a fundamental difference between the thermodynamic complexity of the coin-flipping realization and the Kolmogorov complexity, which is unbounded as one varies over y. However, in order to actually produce a desired output y on a UTM U while generating the minimal possible amount of heat, one needs to know the shortest program for that y. Unfortunately, the shortest program for a given output is not computable in general. In fact, we prove in Appendix D that there cannot exist a computable function that maps any desired output y to some corresponding input x such that both φ_U(x) = y and the heat is bounded by a constant, Q^coin(x) = O(1).
We finish by considering the expected heat that would be generated by a realization of a UTM U if inputs were drawn from the distribution p^coin_X. To begin, rewrite Eq. (12) as

⟨Q(X)⟩_{p^coin_X} / kT = Σ(p^coin_X) + [ S(p^coin_X) − S(p^coin_Y) ].  (27)

In Appendix F, we show that the difference of entropies on the RHS of Eq. (27) is infinite. Since Σ(p^coin_X) is always non-negative, any realization of U must, on average, expend an infinite amount of heat to run input programs sampled from p^coin_X. This applies to the coin-flipping realization, for which Σ(p^coin_X) = 0, as well as to any other realization. Note that ℓ(x) ≥ Q^coin(x)/(kT ln 2) (by Eq. (23) and the fact that m_Y(y) ≤ 1 for all y), and that ℓ(x) is a lower bound on the number of steps that a prefix UTM needs to run program x (since it must take at least one step per read-in bit). Thus, the fact that programs sampled from the coin-flipping distribution have infinite expected heat generation also implies that they have an infinite expected length, and take an infinite expected number of steps before halting.
We finish by emphasizing that EP and expected heat vary in different ways as one changes the initial distribution. For example, if we run the coin-flipping realization on input distribution p^coin_X, then EP is zero while expected heat is infinite. On the other hand, since expected heat is a linear function of the input distribution, minimal expected heat is achieved by a delta-function input distribution centered on the x that minimizes Q^coin(x). However, some simple algebra shows that any such delta-function distribution incurs a strictly positive EP for any UTM.⁶ Thus, the distribution that minimizes expected heat cannot be the one that minimizes EP.

A. Minimal possible heat function
We now consider a realization of a TM whose heat function is smaller, up to an additive constant, than the heat function of any computable realization.
To begin, given any (universal or non-universal) TM M, consider the associated function

G(x) := (ln 2) K(x | φ_M(x)).

Note that this conditional Kolmogorov complexity can be defined in terms of any desired UTM, with no a priori relation to M. In Appendix E, we show that this function G satisfies condition 2 in Proposition 1. Therefore, there must be at least one realization of M, which we call the dominating realization, whose heat function obeys

Q^dom(x)/kT = (ln 2) K(x | φ_M(x)).  (28)

Intuitively speaking, the inputs x that generate a large amount of heat under the dominating realization of a TM M are long and incompressible, even given knowledge of their associated outputs φ_M(x). An example of such an input is a program x that instructs M to read through a long and incompressible bit string and then output nothing, so that φ_M(x) is an empty string (this example is analyzed in more depth below, in Section VI). In contrast, the inputs x that generate little heat under the dominating realization are those whose output provides a large amount of information about the associated input program. For instance, if M is universal, then a program x that consists of the instruction "print 'y'" (represented in some appropriate binary encoding) generates little heat, since K("print 'y'" | y) = O(1) for any y. More generally, if φ_M is logically reversible over its domain, then K(x | φ_M(x)) = O(1) for all x in that domain, because one can always reconstruct the input x from the output φ_M(x) by applying φ_M^{−1}. Thus, in this logically reversible case, the heat generated by the dominating realization on any input x is bounded by a constant that doesn't depend on x.

Now consider any alternative computable realization of M that is coupled to a heat bath at temperature T, whose heat function we indicate by Q. The assumption of computability means that the function Q(x)/kT is computable (i.e., there is some TM that, for any desired x, can approximate the value of Q(x) in units of kT to arbitrary precision). As we prove in Appendix E, the heat function of this alternative realization must obey the following inequality:

Q(x)/(kT ln 2) ≥ K(x | φ_M(x)) − K(Q/kT) − K(φ_M) − O(1),  (29)

where K(Q/kT) is the Kolmogorov complexity of the heat function Q in units of kT, K(φ_M) is the Kolmogorov complexity of the partial function computed by M, and O(1) represents equality up to an additive constant (that does not depend on x, Q, or M). Since neither K(Q/kT) nor K(φ_M) depends on the input x, Eq. (29) implies Q(x) ≥ Q^dom(x) + κ for some constant κ that is independent of x. Note, though, that κ can depend on φ_M (the partial function being computed) and on the alternative realization Q, and note also that in principle this constant may be arbitrarily large and negative. This means that for any fixed input x, there may be computable realizations that generate far less heat when run on x than does the dominating realization. However, this can only occur if φ_M has high complexity (a large value of K(φ_M)), or if the heat function has high complexity, as reflected by a large value of K(Q/kT). This shows that any computable realization must face a fundamental tradeoff between three different factors: the "lost" algorithmic information about the input given the output, the complexity of the input-output map being realized, and the complexity of the heat function. We explore this tradeoff using an example of erasing a long string in Section VI.

6 Given a UTM and any string y, there are many inputs x that result in φ_U(x) = y. This means that p^coin_Y(φ_U(x)) > p^coin_X(x) for any x, so Q^coin(x) > 0 by Eq. (22). Thus, for any delta-function distribution δ_x, Σ(δ_x) = S(δ_{φ_U(x)}) − S(δ_x) + Q^coin(x)/kT = Q^coin(x)/kT > 0, where we've used S(δ_{φ_U(x)}) = S(δ_x) = 0.
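The conditional complexity K(x|y) driving the dominating realization is uncomputable, but the qualitative contrast it draws can be loosely illustrated with a computable stand-in. The sketch below uses zlib-compressed length as a crude proxy for Kolmogorov complexity, and C(y + x) − C(y) as a heuristic conditional proxy; both are illustrative assumptions only, not the quantities used in the paper's proofs:

```python
import os, zlib

def C(s: bytes) -> int:
    """Compressed length: a computable, heuristic stand-in for K(s)."""
    return len(zlib.compress(s, 9))

def C_cond(x: bytes, y: bytes) -> int:
    """Crude conditional-complexity proxy: K(x|y) ~ C(y + x) - C(y)."""
    return max(C(y + x) - C(y), 0)

# A long incompressible input whose output is empty, like a program that
# reads through random bits and then prints nothing: high K(x|y) proxy.
x_random = os.urandom(4096)
y_empty = b''

# A "print 'y'" style program: its output nearly determines it, so the
# conditional proxy is small even though the program itself is long.
y_print = b'a' * 4096
x_print = b"print '" + y_print + b"'"

assert C_cond(x_random, y_empty) > 3000   # expensive under Q_dom (proxy)
assert C_cond(x_print, y_print) < 200     # cheap under Q_dom (proxy)
```

The gap between the two proxies mirrors the text's contrast: erasing incompressible data is costly for the dominating realization, while print-style programs are nearly free.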
When the TM in question is universal, it is guaranteed that there exists some program that can generate any desired output y. This permits us to analyze the thermodynamic complexity of the dominating realization. It turns out that, as for the coin-flipping realization, this amount is bounded by a constant:

min_{x : φ_U(x) = y} Q^dom(x) = O(1).  (30)

This minimum is achieved by programs of the form x = "print 'y'", since these programs achieve K(x | φ_U(x)) = O(1). Eq. (30) also holds if the TM is not a UTM, as long as for each output y there is some x that obeys φ_M(x) = y and K(x | φ_M(x)) = O(1) (e.g., if φ_M is logically reversible).

Finally, we consider the expected heat that would be generated by running the dominating realization of a UTM U, assuming that inputs are sampled randomly from some input distribution. To parallel the analysis of the coin-flipping realization, we consider the input distribution which results in minimal EP for the dominating realization, which we call p*_X. In Appendix F, we prove that the expected heat generated by the dominating realization on the input distribution p*_X is infinite. It is interesting to note that ℓ(x) ≥ Q^dom(x)/(kT ln 2) + O(1) and, as we mentioned above, ℓ(x) is a lower bound on the number of steps that a UTM needs to run program x.⁷ Thus, the fact that programs sampled from p*_X have infinite expected heat generation also implies that they have an infinite expected length, and an infinite expected runtime. Note that the dominating realization of a UTM will in general incur a strictly positive amount of EP, even when run on the optimal input distribution p*_X (see Appendix G for details).

7 We have the inequalities K(x|y) ≤ K(x) + O(1) ≤ ℓ(x) + O(1). The first comes from subadditivity of Kolmogorov complexity [42], while the second comes from Lemma 5 in Appendix H.

B. Practical implications of the dominating realization
Our analysis of the dominating realization uses several abstract computer science concepts, such as the computability of the heat function and its Kolmogorov complexity. It is worth making some comments about the real-world significance of such concepts for the thermodynamics of physical systems.
First, the computability properties of the heat function are entirely separate from the computability properties of the logical map φ_M realized by a physical process. In particular, the heat function can be uncomputable even though φ_M is computable by definition (since φ_M is the partial function implemented by a TM). On the other hand, common interpretations of the physical Church-Turing thesis imply that the heat function of any physically constructable real-world process must be computable. This implies that, if the physical Church-Turing thesis holds, the dominating realization generates less heat, up to an additive constant, than any realization that can actually be constructed in the real world.
At the same time, while the dominating realization is better than any computable realization, it is important to note that it is itself not computable. This is because the conditional Kolmogorov complexity is not a computable function, i.e., there is no TM that can take as input two strings x and y and output the value of K(x|y). However, this does not necessarily imply that the dominating realization is irrelevant from a practical point of view. This is because K(x|y) is an upper-semicomputable function, meaning that it is possible to compute an improving sequence of upper bounds that converges to K(x|y). Formally, there is a computable function f such that f(x, y, n) ≥ f(x, y, n + 1) and lim_{n→∞} f(x, y, n) = K(x|y).⁸ The upper-semicomputability of Q^dom allows one to approach the performance of Q^dom by constructing a sequence i = 1, 2, ... of realizations of φ_M, each with a computable heat function Q_i, such that the Q_i converge from above to Q^dom. Each subsequent realization in this sequence is guaranteed to be better (generate less heat) on every input than the previous one. Moreover, because the heat functions converge to Q^dom, by advancing far enough in this sequence one can run any input x while generating only Q^dom(x) + ε heat, for any ε > 0. An important subtlety, however, is that one cannot compute how far into the sequence to advance so as to be within ε of Q^dom (if one could compute this, then Q^dom would be computable, and not just upper-semicomputable).
8 This function can be computed by a TM that runs multiple programs in parallel, while keeping track of the shortest program which has halted on input y with output x.
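This dovetailing procedure can be sketched on a hypothetical toy machine. Everything below (the machine, its runtime rule, and the `upper_bound` function) is an illustrative assumption, not the paper's construction; the point is only that the computable bounds are non-increasing in the step budget n and converge to the true shortest-program length:

```python
from itertools import product

def toy_machine(p: str):
    """Hypothetical toy TM: program p = '0'*k + core halts after
    2**max(0, 8-k) steps and outputs core (fewer leading zeros means
    a shorter program but a longer runtime)."""
    k = len(p) - len(p.lstrip('0'))
    return 2 ** max(0, 8 - k), p.lstrip('0')   # (runtime, output)

def upper_bound(x: str, n: int) -> int:
    """f(x, n): length of the shortest program found so far that outputs x
    within n steps. Computable, non-increasing in n, converges to the
    true minimum program length."""
    best = len(x) + 8                 # trivial program: '0'*8 + x, 1 step
    for L in range(1, best):
        for p in map(''.join, product('01', repeat=L)):
            steps, out = toy_machine(p)
            if steps <= n and out == x:
                best = min(best, L)
    return best

bounds = [upper_bound('101', n) for n in (1, 16, 256)]
assert bounds == [11, 7, 3]                    # improving upper bounds
assert all(a >= b for a, b in zip(bounds, bounds[1:]))
```

As the cutoff n grows, slower-but-shorter programs get a chance to halt, so the bound tightens; but nothing in the code tells us at which n the bound has reached its limit, mirroring the subtlety noted above.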
Finally, while we showed that Q^dom is better than any computable realization in terms of heat generation, we also mentioned that it itself is only upper-semicomputable, not computable. One might ask if there is some other upper-semicomputable realization (i.e., one whose heat function can be approached from above) which is even better than Q^dom. It is known that this is not the case: the optimality result of Eq. (29) holds not only for any computable Q, but more generally for any upper-semicomputable Q.

C. Comparison of coin-flipping and dominating realizations
We finish our discussion of the dominating realization by briefly comparing it to the coin-flipping realization.
First, for both dominating and coin-flipping realizations, the minimal heat necessary to generate a given output y on a UTM U, which we call the thermodynamic complexity of the realization, is bounded by a constant that does not depend on y. There is no a priori relationship between those two constants, and in principle it is possible that, for all y, the thermodynamic complexity is larger under the dominating realization than the coin-flipping realization, or vice versa. In general, the constants will depend on the realized UTM U, as well as on the UTM used to define the conditional Kolmogorov complexity in Eq. (28) (which does not have to be the same as U).
Second, to achieve bounded heat production for output y under the coin-flipping realization, one must know the shortest program for producing y, which is uncomputable. In contrast, to achieve bounded heat production for output y under the dominating realization, it suffices to choose an input of the form "print 'y'".
Third, for both realizations, there is an infinite amount of expected heat generated, assuming that inputs are sampled from the EP-minimizing distribution.
Fourth, the coin-flipping realization is (by design) thermodynamically reversible for input distribution p^coin_X. The dominating realization, on the other hand, is not thermodynamically reversible for any input distribution (see Appendix G).
Finally, neither the coin-flipping nor the dominating realization of a UTM has a computable heat function. In fact, the heat function of the coin-flipping realization is not even upper-semicomputable.⁹ This means that our results concerning the superiority of the dominating realization do not apply when comparing to the coin-flipping realization, and in particular it is not necessarily the case that Q^coin(x) ≥ Q^dom(x) + O(1). Nonetheless, it turns out that for any UTM U, the additional heat incurred by the dominating realization on input x, beyond that incurred by the coin-flipping realization, is bounded by a term logarithmic in the complexity of the output:

Q^dom(x) ≤ Q^coin(x) + O(kT log K(φ_U(x))).  (31)

9 This follows from the fact that m_Y is lower-semicomputable [42, Thm. 4.3.3], which implies that Q^coin is "lower-semicomputable", meaning it can be approximated by an improving sequence of computable lower bounds.
FIG. 3. Any computable process that realizes a deterministic input-output map f faces a fundamental cost of K(x|y) for mapping input x to output y = f(x). This cost can be paid through some combination of three different strategies: generating a large amount of heat, having a high-complexity heat function, or having a high-complexity input-output map f. This tradeoff is illustrated on three axes, with blue indicating the feasible region.
(See Appendix H for the proof.) Such logarithmic correction terms are considered inconsequential in some previous analyses of the thermodynamics of TMs [5, 82].

VI. HEAT VS. COMPLEXITY TRADEOFF
Our analysis of the dominating realization uncovered a tradeoff between heat and complexity faced by any computable physical process.In this section, we illustrate this tradeoff by analyzing the thermodynamics of erasing a long bit string.
As before, consider a physical system with a countable state space, which undergoes driving while coupled to a heat bath at temperature T. For notational simplicity, in this section we choose units so that kT = 1. Assume that the process realizes some deterministic and computable map from initial to final states, which we indicate generically as f : {0, 1}* → {0, 1}*. Now imagine that one observes a single run of this physical process, in which initial state x is mapped to final state y = f(x).
Since this is a computable realization of f, it must obey the dominating realization bound of Eq. (29). Plugging Eq. (28) into that inequality and rearranging gives

Q(x)/ln 2 + K(Q) + K(f) ≥ K(x|y) − O(1),  (32)

where we've used the assumption that kT = 1. This shows that there is a fundamental cost of K(x|y) that is incurred by any computable realization that maps input x to output y. This fundamental cost can be paid either by generating a lot of heat (large Q(x)/ln 2), by having a high-complexity heat function (large K(Q)), or by realizing a high-complexity input-output function (large K(f)). This tradeoff is illustrated in Fig. 3.

We demonstrate this tradeoff using an example of a process that erases a long binary string. In this example, x is a long string consisting of n binary digits, while the final state y is a string of n 0s, which we write '00...00'. Assuming x is incompressible (which is true for the vast majority of all strings), the fundamental cost of mapping x → y is given by K(x|y) = K(x) ≈ ℓ(x), up to logarithmic factors [42]. Different processes can pay this fundamental cost in different ways, thereby satisfying Eq. (32):

(1) A process can generate a lot of heat. For example, in order to erase string x, the process can run the erasure map

f(x′) = '00...00' for all x′ ∈ {0, 1}^n

while using the dominating realization. In this case, Q(x)/ln 2 = K(x|y) by Eq. (28).
(2) A process can have a high-complexity heat function, so that K(Q) ≥ ℓ(x). For example, one can tweak the dominating realization of the erasure map so that the heat values for input x and the all-0s input are swapped:

Q(x′) = Q^dom('00...00') if x′ = x;  Q(x′) = Q^dom(x) if x′ = '00...00';  Q(x′) = Q^dom(x′) otherwise.

One can verify that since Q^dom satisfies condition 2 in Proposition 1, so does this Q. Moreover, this realization generates a small amount of heat when erasing x: Q(x) = Q^dom('00...00') = (ln 2) K('00...00' | '00...00') ≈ 0.
Note, however, that the long input string x is now "hard-coded" into the definition of the heat function Q, leading to a large value of K(Q).
(3) A process can realize a high-complexity input-output map f, so that K(f) ≥ K(x|y). This strategy could be used, for example, by a process which implements the following logically reversible map:

f(x′) = x′ if x′ ∉ {x, '00...00'};  f(x′) = '00...00' if x′ = x;  f(x′) = x if x′ = '00...00'.

Since a logically reversible function can be carried out without generating heat, it is possible to implement this f while achieving Q(x′) = 0 for all x′. In this case, not only does erasing x generate no heat, Q(x) = 0, but the heat function has low complexity, K(Q) ≈ 0. Now, however, the long input string x is "hard-coded" into the definition of the input-output map f, leading to a large value of K(f).
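The three strategies can be summarized in a small bookkeeping sketch. As in the earlier proxy example, compressed length stands in for the uncomputable Kolmogorov complexities; the per-strategy ledgers below are illustrative assumptions that mirror the text, not measured quantities:

```python
import os, zlib

def C(s: bytes) -> int:
    """Compressed length: heuristic stand-in for Kolmogorov complexity."""
    return len(zlib.compress(s, 9))

n = 4096
x = os.urandom(n)      # long incompressible string to erase
zeros = bytes(n)       # the output '00...00'
cost = C(x)            # proxy for the fundamental cost K(x|zeros) ~ K(x)

# Ledger per strategy: (heat in units of ln2, K(Q) proxy, K(f) proxy).
strategies = {
    # (1) dominating realization of plain erasure: pay entirely in heat
    'pay_in_heat': (cost, 0, 0),
    # (2) swap heat values of x and the all-zeros input: heat ~ 0, but x is
    #     hard-coded into Q, so K(Q) is large
    'pay_in_K_Q': (0, cost, 0),
    # (3) reversible map exchanging x and zeros: no heat, simple Q, but x is
    #     hard-coded into f, so K(f) is large
    'pay_in_K_f': (0, 0, cost),
}

# Every strategy pays the same total bill, in the spirit of Eq. (32).
for heat, K_Q, K_f in strategies.values():
    assert heat + K_Q + K_f >= cost
```

The point of the sketch is only that the left-hand side of the Eq. (32)-style bound is conserved across the three strategies: lowering one term forces another to rise by the same order.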
We finish by noting that in a series of papers by Zurek and others [5, 6, 80-86], it was argued that the conditional Kolmogorov complexity K(x|y) is "the minimal thermodynamic cost" of computing some output y from input x. However, most of these early papers were written before the development of modern nonequilibrium statistical physics. As a result, the arguments in those papers are rather informal, which in turn makes it difficult to translate them in a fully rigorous manner into modern nonequilibrium statistical physics. (See Sec. 14.4 in [34] for one possible translation.) To give one example of these difficulties, those earlier analyses quantified the "thermodynamic cost" in terms of the number of physical bits (binary degrees of freedom) that are erased during the computation, independent of the initial probability distribution over those degrees of freedom. However, we now know that minimal heat generation is given by changes in Shannon entropy, i.e., in terms of statistical bits rather than physical bits. Relatedly, these papers led to some proposals that the foundations of statistical physics be changed, so that thermodynamic entropy is identified not only with Shannon entropy but also with a Kolmogorov complexity term [6, 42].
In contrast, our analysis is grounded in modern nonequilibrium physics, and does not involve any foundational modifications to the definition of thermodynamic entropy. Moreover, it covers some issues not considered in earlier analyses. In particular, we show that the lower bound of K(x|y) is a cost that in general applies only to computable realizations (i.e., ones with a computable heat function), not to all possible realizations, as implied in the earlier papers. The significance of this restriction depends on the validity of the physical Church-Turing thesis. Finally, we also demonstrate the different ways in which one can pay the fundamental cost K(x|y): by generating heat, by having a large Kolmogorov complexity of the heat function, K(Q), or by having a large Kolmogorov complexity of the input-output map, K(f).

VII. DISCUSSION
In this paper we combine Algorithmic Information Theory (AIT) and nonequilibrium statistical physics to analyze the thermodynamics of TMs. We consider a physical process that realizes a deterministic input-output function, representing the computation performed by some TM. We derive numerous results concerning two different realizations of a TM: a coin-flipping realization, which is designed to be thermodynamically reversible when fed with random input bits, and a dominating realization, which is designed to generate less heat, up to an additive constant, than any computable realization.
Using our analysis of the dominating realization, we uncover a fundamental tradeoff, faced by any computable realization of a deterministic input-output map, between heat generation, the Kolmogorov complexity of the heat function, and the Kolmogorov complexity of the input-output map. An interesting topic for future research is how the Kolmogorov complexities of the heat function and of the input-output map relate to the "physical complexity" of the driving process, as commonly understood in physics (e.g., whether the Hamiltonians must have many-body interactions).
For simplicity, in this paper we represented a TM M as a physical system whose dynamics carries out the partial function $\phi_M: \{0,1\}^* \to \{0,1\}^*$ during some finite time interval $[0, t_f]$. This representation allowed us to abstract away many implementation details of the realization, such as the fact that a TM consists of separate tape, head, and pointer variables, and that a TM operates in a sequence of discrete steps. Essentially, this representation does not distinguish whether the physical process operates via the same sequence of steps as a TM, or simply implements a "lookup table" that maps inputs to outputs.
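As a toy illustration of this point (ours, not part of the paper's formal development), the following Python sketch contrasts a "mechanistic" step-by-step eraser with a lookup table. Both realize the same partial function, which is all that the abstract representation retains; the names `step_machine` and `lookup` are our own.

```python
# A tiny "machine" that erases a bit string step by step, versus a
# lookup table realizing the same partial function
# phi: {0,1}* -> {0,1}*. Both induce the identical input-output map,
# which is all the abstract representation of a TM keeps track of.

def step_machine(s: str) -> str:
    # Mechanistic realization: delete the last symbol, one discrete
    # "step" at a time, until the string is empty.
    tape = list(s)
    while tape:
        tape.pop()          # one update step of the machine
    return "".join(tape)    # always the empty string

# Abstract realization: simply tabulate input -> output.
lookup = {s: "" for s in ["", "0", "1", "00", "01", "10", "11"]}

assert all(step_machine(s) == lookup[s] for s in lookup)
```

The two realizations are indistinguishable at the level of the input-output map, even though their step-by-step (and hence thermodynamic) behavior could differ greatly.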
While this representation simplifies our analysis, it provides no guidance on how to actually construct a physical process that realizes a TM in the laboratory, and it leaves implicit some important issues. Alternatively, one could represent a realization of a TM in a more conventional and "mechanistic" way, as a dynamical system over the state of the TM's tape, pointer, and head, which evolves iteratively according to the update function of the TM until the head reaches the halt state. In contrast to the representation we adopted, this kind of mechanistic representation could easily be physically constructed, and would correspond more closely to the step-by-step operation of real-world physical computers. Moreover, this kind of mechanistic representation could be used to analyze the thermodynamic costs of TMs in a more realistic manner. For example, it could be used to analyze how the heat and EP incurred by the TM depend on the number of steps taken. As another example, it could be used to impose constraints on how the degrees of freedom of the head, tape, and pointer can be coupled together (e.g., via interaction terms of applied Hamiltonians). One might postulate, for instance, that the head of the TM can only interact with tape locations that are located near the pointer. These kinds of constraints will generally increase the heat and EP incurred by each step of the TM [34,107]. These complications concerning the thermodynamics of more mechanistic representations of TMs are absent from the analysis in this paper, and are topics for future research.

Note that this lower bound is non-negative, and vanishes whenever $p_{X|f(X)} = w_{X|f(X)}$. This means that $w_{X|f(X)}$, as defined in Eq. (B4), encodes the conditional probability of inputs $x$ given outputs $f(x)$ that achieves minimal EP for a realization of $f$ with heat function $Q$.
In our previous work, we have sometimes referred to the conditional KL divergence in Eq. (B5) as mismatch cost. Using the chain rule for KL divergence, we write mismatch cost as
$$D(p_{X|f(X)} \,\|\, w_{X|f(X)}) = D(p_X \,\|\, w_X) - D(p_{f(X)} \,\|\, w_{f(X)}),$$
where $w_{f(X)}(y) = \sum_{x: f(x)=y} w_X(x)$, while $w_X(x)$ is any distribution that obeys
$$w_X(x) = w_{X|f(X)}(x|f(x)) \, w_{f(X)}(f(x)).$$
In our previous work, we referred to the distribution $w_X(x)$ as a prior. (This term was originally motivated by a Bayesian interpretation of EP [108].) As long as $|\mathrm{img}\, f| > 1$, there are an infinite number of priors for any given $w_{X|f(X)}$, since the relative probabilities of any pair $x, x'$ with $f(x) \ne f(x')$ are unconstrained.
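As a sanity check on this decomposition, the following Python sketch (our illustration; the particular map, distribution, and prior are arbitrary) verifies the KL chain rule numerically for a four-element domain.

```python
import math

# Small worked example: f maps inputs {0,1,2,3} to outputs {"a","b"}.
# We verify the KL chain rule
#   D(p_X || w_X) = D(p_f(X) || w_f(X)) + D(p_X|f(X) || w_X|f(X)),
# so mismatch cost is the difference of the two unconditional terms.
f = {0: "a", 1: "a", 2: "b", 3: "b"}
p = {0: 0.1, 1: 0.4, 2: 0.2, 3: 0.3}   # actual input distribution
w = {0: 0.2, 1: 0.2, 2: 0.3, 3: 0.3}   # one possible prior

def marginal(q):
    # Push a distribution over inputs forward through f.
    out = {}
    for x, y in f.items():
        out[y] = out.get(y, 0.0) + q[x]
    return out

def kl(q, r):
    return sum(q[k] * math.log(q[k] / r[k]) for k in q if q[k] > 0)

p_out, w_out = marginal(p), marginal(w)
full = kl(p, w)                  # D(p_X || w_X)
outputs = kl(p_out, w_out)       # D(p_f(X) || w_f(X))
# Conditional KL (the mismatch cost), computed directly:
mismatch = sum(
    p[x] * math.log((p[x] / p_out[f[x]]) / (w[x] / w_out[f[x]]))
    for x in p
)
assert abs(full - (outputs + mismatch)) < 1e-12
```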
In our previous work [34,107], we referred to the term $\langle -\ln Z(f(X)) \rangle_{p_X}$ in Eq. (B5) as the residual EP. Observe that for any $y \in \mathrm{img}\, f$,
$$\Sigma(w_{X|f(X)=y}) = D(w_{X|f(X)=y} \,\|\, w_{X|f(X)}) - \ln Z(y) = -\ln Z(y). \quad \text{(B6)}$$
Since $\Sigma(w_{X|f(X)=y}) \ge 0$ by the second law, $-\ln Z(y)$ is non-negative for all $y \in \mathrm{img}\, f$, and therefore the residual EP is always non-negative. Note also that the residual EP is an expectation under $p_X$, thus it is linear in $p_X$. In fact, it depends only on the probabilities assigned to each output, $p_{f(X)}(y)$, not on the conditional distribution of inputs corresponding to each output. In our other work [107], we have sometimes called the indexed set $\{-\ln Z(y)\}_y$ the residual EP parameter. Finally, define an island of $f$ as a pre-image $f^{-1}(y)$ for some $y$, with $L(f)$ the set of all islands of $f$. We can rewrite Eq. (B5) as
$$\Sigma(p_X) = \sum_{c \in L(f)} p(c) \left[ D(p_{X|X \in c} \,\|\, w_{X|X \in c}) - \ln Z(f(c)) \right],$$
where $p(c) = \sum_{x \in c} p_X(x)$ and $f(c)$ denotes the common output of all inputs in island $c$. Intuitively, this expression shows that any realization of the function $f$ can be thought of as a set of (island-indexed) "parallel" processes, operating independently of one another on non-overlapping subsets of $\mathcal{X}$, each generating EP given by the associated mismatch cost and residual EP.
This form of the mismatch cost, residual EP, and island decomposition was introduced in [34,107,108]. It holds even in the general case of non-deterministic dynamics, with an appropriate (more general) definition of the prior $w_X$ and of the island decomposition. However, that previous work on mismatch cost and residual EP assumed finite state spaces. The derivation presented above does not have that restriction.
We now prove that condition 1 is implied by condition 2. Define $w_{X|f(X)}(x|f(x))$ as in Eq. (B4), while taking $Q/kT = G$. Then, use the results in Appendix B to rewrite $F$ as
$$F(p_X) = D(p_{X|f(X)} \,\|\, w_{X|f(X)}) - \langle \ln Z(f(X)) \rangle_{p_X} \ge D(p_{X|f(X)} \,\|\, w_{X|f(X)}) \ge 0.$$
The first inequality follows from the assumption that $Z(y) = \sum_{x: f(x)=y} e^{-G(x)} \le 1$ for all $y \in \mathrm{img}\, f$, and the second inequality follows from the non-negativity of conditional KL divergence [109].
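Taking $F$ to be the EP expression rewritten in the display above, i.e. $F(p_X) = \langle G(X) \rangle_{p_X} + S(p_{f(X)}) - S(p_X)$, the implication can be spot-checked numerically. The sketch below is our illustration: it draws random heat functions, shifts them so that $Z(y) \le 1$ on every island, and confirms $F \ge 0$ for random input distributions.

```python
import math, random

# Numeric check (our illustration): for any G with
# Z(y) = sum_{x: f(x)=y} exp(-G(x)) <= 1 for all y, the functional
# F(p_X) = <G>_p + S(p_f(X)) - S(p_X) is non-negative.
random.seed(0)
f = {0: "a", 1: "a", 2: "b", 3: "b"}

def entropy(q):
    return -sum(v * math.log(v) for v in q.values() if v > 0)

for _ in range(1000):
    G = {x: random.uniform(0.0, 3.0) for x in f}
    # Shift G upward so that Z(y) <= 1 on every island.
    Z = {}
    for x, y in f.items():
        Z[y] = Z.get(y, 0.0) + math.exp(-G[x])
    shift = max(0.0, max(math.log(z) for z in Z.values()))
    G = {x: g + shift for x, g in G.items()}
    # Random input distribution over dom f.
    raw = [random.random() for _ in f]
    tot = sum(raw)
    p = {x: r / tot for x, r in zip(f, raw)}
    p_out = {}
    for x, y in f.items():
        p_out[y] = p_out.get(y, 0.0) + p[x]
    F = sum(p[x] * G[x] for x in f) + entropy(p_out) - entropy(p)
    assert F > -1e-12
```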
The rest of this proof shows by construction that condition 3 follows from condition 2. For simplicity, assume that the physical process has access to a set of "auxiliary" states, one for each $y \in \mathrm{img}\, f$. We use $x_y$ to indicate the auxiliary state corresponding to each $y$, and assume that $x_y \notin \mathrm{dom}\, f$. For notational convenience, let $\mathcal{W} := \mathrm{dom}\, f \cup \{x_y : y \in \mathrm{img}\, f\}$. Then, define the following function $\tilde{f}: \mathcal{W} \to \mathcal{X}$: for any $x \in \mathrm{dom}\, f$, $\tilde{f}(x) := f(x)$; for any $y \in \mathrm{img}\, f$, $\tilde{f}(x_y) := y$.
In words, any $x$ in the domain of $f$ is mapped by $\tilde{f}$ to $f(x)$, while any auxiliary state $x_y$ is mapped by $\tilde{f}$ to $y$. (To derive the second line, we plugged Eqs. (C3) and (C4) into Eq. (C5) and simplified.) We now consider the following physical process over $t \in [0, t_f]$, applied to a system coupled to a work reservoir and a heat bath at temperature $T$:

1. At $t = 0$, the Hamiltonian $H$ is applied to the system.
2. Over $t \in (0, \tau]$, the system is allowed to freely relax toward equilibrium. However, the only allowed transitions are those between pairs of states $w, w'$ that have $\tilde{f}(w) = \tilde{f}(w')$. We assume that by $t = \tau$, the system has reached a stationary distribution.
3. Over $t \in (\tau, t_f]$, the system undergoes a quasistatic physical process that implements the map $\tilde{f}$ from initial to final states, and does so in a thermodynamically reversible way for initial distribution $\pi$. There are numerous known ways of constructing such a process [15,16,110].

Note that the above procedure assumes a separation of timescales (i.e., the relaxation time of the system is much shorter than both $\tau$ and $t_f - \tau$).
The above procedure will map any $x \in \mathrm{dom}\, f$ to final state $f(x)$. Let $Q$ indicate the heat function of this process. We will show that $Q(x)/kT = G(x)$ for any $x \in \mathrm{dom}\, f$. First, let $\delta_x$ indicate an initial distribution which is a delta function over some state $x$. Note that
$$Q(x)/kT = \Sigma(\delta_x) + S(\delta_x) - S(\delta_{f(x)}) = \Sigma(\delta_x),$$
where we've used the fact that $S(\delta_x) = S(\delta_{f(x)}) = 0$. We then analyze $\Sigma(\delta_x)$.
Step (1) and step (3) in the above construction incur no EP. For step (2), the EP incurred during free relaxation from $t = 0$ to $t = \tau$ is given by
$$\Sigma(\delta_x) = D(\delta_x \,\|\, \pi) - D(p^\tau_x \,\|\, \pi),$$
where $p^\tau_x$ is the state distribution at time $\tau$, given that the system started in distribution $\delta_x$ at $t = 0$. By construction, $p^\tau_x$ will be equal to the equilibrium distribution restricted to a subset of states,
$$p^\tau_x(w) = \frac{\delta(\tilde{f}(w), \tilde{f}(x)) \, \pi(w)}{\sum_{w'} \delta(\tilde{f}(w'), \tilde{f}(x)) \, \pi(w')}.$$
It can be verified, using the definitions of $\delta_x$ and $\pi$, that $D(\delta_x \,\|\, \pi) = H(x)/kT + \ln Z = G(x) + \ln Z$.
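The conclusion $Q(x)/kT = G(x)$ can be verified end to end on a small example. The sketch below is our numeric illustration; Eq. (C4) is not reproduced in this excerpt, so, consistently with the observation that $H(x_y) = \infty$ when Eq. (15) is an equality, we take $H(x)/kT = G(x)$ on $\mathrm{dom}\, f$ and $H(x_y)/kT = -\ln(1 - Z(y))$ on the auxiliary states.

```python
import math

# End-to-end check (our sketch, with an assumed form of Eq. (C4)):
# relaxing within an island of f~, then mapping quasistatically,
# yields Q(x)/kT = Sigma(delta_x) = G(x) for every input x.
f = {0: "a", 1: "a", 2: "b", 3: "b"}
G = {0: 0.5, 1: 1.5, 2: 0.2, 3: 2.0}

islands = {}
for x, y in f.items():
    islands.setdefault(y, []).append(x)
Z = {y: sum(math.exp(-G[x]) for x in c) for y, c in islands.items()}
assert all(z < 1 for z in Z.values())  # strict case: pi(x_y) > 0

# Equilibrium distribution pi over dom f plus auxiliary states x_y,
# with Boltzmann weights exp(-H(w)/kT).
boltz = {x: math.exp(-G[x]) for x in f}
boltz.update({("aux", y): 1.0 - Z[y] for y in Z})
norm = sum(boltz.values())             # equals |img f| here
pi = {w: b / norm for w, b in boltz.items()}

for x in f:
    y = f[x]
    # Free relaxation within the island of x (its pre-image plus x_y)
    # reaches pi restricted to that island; the EP of the relaxation
    # is D(delta_x || pi) - D(p_tau || pi) = -ln pi(x) + ln pi(island).
    island_mass = sum(pi[xp] for xp in islands[y]) + pi[("aux", y)]
    ep = -math.log(pi[x]) + math.log(island_mass)
    assert abs(ep - G[x]) < 1e-12      # Q(x)/kT = G(x)
```

With this choice of auxiliary-state energies, each island carries total equilibrium mass $1/|\mathrm{img}\, f|$, which is what makes the $\ln Z$ terms cancel.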
It can be verified that the physical process constructed in the proof of Proposition 1 is thermodynamically reversible if it is started with the initial equilibrium distribution $\pi_X$, so that the free relaxation in step 2 incurs no EP. Generally, this equilibrium distribution will have support on the auxiliary states, which are outside of $\mathrm{dom}\, f$. However, consider the case where Eq. (15) is an equality for all $y \in \mathrm{img}\, f$. Then, the definition in Eq. (C4) gives $H(x_y) = \infty$ and $\pi_X(x_y) = 0$ for all $y \in \mathrm{img}\, f$. In this case, the input distribution $p_X = \pi_X$ obeys $\mathrm{supp}\, p_X \subseteq \mathrm{dom}\, f$ and achieves zero EP. Moreover, using the decomposition in Appendix B, it can be verified that if Eq. (15) is an equality for all $y \in \mathrm{img}\, f$, then any input distribution that obeys $p_{X|f(X)} = \pi_{X|f(X)}$, as defined in Eq. (B1), also achieves zero EP.