Functional thermodynamics of Maxwellian ratchets: Constructing and deconstructing patterns, randomizing and derandomizing behaviors

Maxwellian ratchets are autonomous, finite-state thermodynamic engines that implement input-output informational transformations. Previous studies of these "demons" focused on how they exploit environmental resources to generate work: They randomize ordered inputs, leveraging increased Shannon entropy to transfer energy from a thermal reservoir to a work reservoir while respecting both Liouvillian state-space dynamics and the second law. However, to date, correctly determining such functional thermodynamic operating regimes was possible only for the very few engines whose correlations among information-bearing degrees of freedom could be calculated exactly and in closed form. Additionally, a key second dimension of ratchet behavior was largely ignored: ratchets do not merely change the randomness of environmental inputs; their operation constructs and deconstructs patterns. To address both dimensions, we adapt recent results from dynamical-systems and ergodic theories that efficiently and accurately calculate the entropy rates and the rate of statistical complexity divergence of general hidden Markov processes. In concert with the information processing second law, these methods accurately determine thermodynamic operating regimes for finite-state Maxwellian demons with arbitrary numbers of states and transitions. In addition, they facilitate analyzing the structure versus randomness tradeoffs that a given engine makes. The result is a greatly enhanced perspective on the information processing capabilities of information engines. As an application, we give a thoroughgoing analysis of the Mandal-Jarzynski ratchet, demonstrating that it has an uncountably infinite effective state space. DOI: 10.1103/PhysRevResearch


I. INTRODUCTION
In 1867, Maxwell introduced a thought experiment designed to challenge the second law of thermodynamics [1,2], what Lord Kelvin later came to call "Maxwell's demon." Exploiting the fact that the second law holds only on average, i.e., the thermodynamic entropy S cannot decrease over repeated transformations, the experiment conjured an imaginary, intelligent being capable of detecting and then harvesting negative entropy fluctuations to do work. The paradox Maxwell put forward is that, by using its "intelligence," this being apparently violates the second law of thermodynamics. Maxwell's challenge was the first indication that the second law must take into account information processing.
The puzzle's solution came from recognizing that the "very observant" and "neat-fingered" demon must manipulate memory to perform its detection and control task and, critically, that such information processing comes at a cost [3,4]. To operate, the demon's intelligence has thermodynamic consequences. This is summarized by Landauer's principle: "any logically irreversible manipulation of information... must be accompanied by a corresponding entropy increase in noninformation-bearing degrees of freedom of the information-processing apparatus or its environment" [5]. This recasts the demon as a type of engine, an information engine that uses correlations in an information reservoir to leverage thermodynamic fluctuations in a heat reservoir to do useful work. This class of information engines, Maxwellian demons and their generalized ratchets, has been subject to extensive study [6-9]. However, previous determinations of their thermodynamic functionality were stymied by the difficulty of accurately calculating the entropic change in what Landauer identified as the system's "information-bearing degrees of freedom." Consider a Maxwellian ratchet designed to read an infinite input tape, perform a computation and thermodynamic transformation, and write to an infinite output tape, as depicted in Fig. 1. The relevant entropic change is quantified by the difference between the Kolmogorov-Sinai entropies of the inputs to the ratchet (h_μ) and of the outputs to the information reservoir (h′_μ) [7]. However, this calculation ranges from very difficult to intractable when the processes generating the input and output information have temporal correlations. And, more troubling, this problem is generic: when driving a ratchet with uncorrelated input, even simple finite-state memoryful ratchets produce output processes with temporal correlations. Fundamental progress was halted since determining thermodynamic functionality in the most general case, temporally correlated input driving a memoryful ratchet, was intractable.

FIG. 1. Information engine as a finite-state ratchet (controller) connected to a thermal reservoir, a work reservoir, and an information reservoir (depicted as a tape whose storage cells may be read or written).
Attempts to circumvent these problems either heavily restricted thermodynamic-controller architecture [7], invoked approximations that misclassified thermodynamic functioning, or flatly violated the second law [6]. It appears that, and this is one practical consequence of the results reported in the following, a number of recent analyses of information-engine efficiency and functioning must be revisited and corrected. Our contribution is that the latter is now possible.
Reexamining a well-known information ratchet, we apply newly discovered techniques to accurately measure the Kolmogorov-Sinai-Shannon entropy of temporally correlated processes in general [10,11]. We show that, via the information processing second law [7], this allows accurate determination of the functional thermodynamics of arbitrary finite-state ratchets. The net result is a shift in perspective. To guarantee that the output information could be studied analytically, previous successful efforts designed ratchet structure, the states and transitions, in accord with a given input's correlational structure [8]. One consequence is that follow-on efforts adopted a fixed input-output-centric view of information engines. Here, following the example of the earliest discussions of information ratchets [6], the new methods shift the focus back to the engine itself, setting its design and then exploring all possible input-dependent thermodynamic functionalities.
The approach has appeal beyond mere narrative and historical symmetry. The shift in focus reveals a second dimension to ratchet functionality. The change in entropy rate Δh_μ = h′_μ − h_μ monitors the degree to which the ratchet transforms a process' informational content, but it does not address how this comes about. To do this requires investigating the change in structure from the input process to the output process. These structural changes were previously proved to be deeply relevant to engine thermodynamic efficiency and to an engine's ability to meet the work-production bounds set by Landauer's principle [12]. Their impact is nontrivial. For example, we show that forcing a Maxwellian ratchet to perfectly generate or erase structure requires divergent memory resources.
The next section briefly reviews information engines and ratchets, including their energetics, structure, and informatics. To be concrete, we recall one of the first ratchets and review how its thermodynamic functionality (engine, eraser, or dud) is determined. Using new methods from ergodic theory and dynamical systems that determine randomness generation and memory use, we reanalyze the original ratchet, showing that previous analyses misidentified its thermodynamic functioning. This is illustrated for its operation in several distinctly correlated environments. We then explore the structural dimension of ratchet functionality, demonstrating that the engine/eraser/dud classification does not uniquely describe ratchet information processing for a given input. To remedy this, in conjunction with the previous functional classification, which can now be exactly carried out, we introduce structure-randomness tradeoffs in engine operation, highlighting the multidimensional nature of ratchet information processing.

II. INFORMATION ENGINES
The information engines of interest consist of a finite-state stochastic controller or ratchet that interacts with a thermal reservoir, a work reservoir, and an information reservoir. These are connected as shown in Fig. 1 and are embedded in a thermal environment at constant temperature T. The information reservoir takes the form of an input tape, which stores a binary-symbol string. Its state is described by the random variable X_{0:∞} = X_0 X_1 …. We restrict to binary input and output alphabets, so that each X_N realizes an element x_N ∈ X = {0, 1}. The ratchet operates in continuous time; the controller state at time t = Nτ is represented by the random variable R_N, which realizes an element r ∈ R, the ratchet's discrete, finite state space.
At each step, X_N couples to the ratchet controller for an interaction of duration τ. During this time, thermal fluctuations continuously drive transitions in the coupled state space R × X of the ratchet and the current tape symbol. After the interaction interval, the ratchet is in a potentially different state R_{N+1}, and the symbol X_N has been transduced into an output symbol X′_N = x′_N ∈ X, which is written to the tape. The strings of possible output symbols are expressed by the random variable X′_{0:∞} = X′_0 X′_1 …. The tape moves forward, and the next input symbol X_{N+1} begins its interaction with the ratchet, which now starts in state R_{N+1}. The joint transitions between states of the ratchet and symbol have energetic consequences, capturing energy flows between the thermal and work reservoirs.

A. Energetics
These information engines are autonomous, and transitions in the coupled ratchet-symbol system are driven by fluctuations in the thermal reservoir. Recently, Ref. [8] introduced a general formalism for determining the energetics of such information engines. Under detailed balance, transitions over the joint ratchet-symbol state space R × X are described by a Markov chain M, where every transition with positive probability, denoted M_{(r,x)→(r′,x′)}, must have a reverse transition with positive probability. Energy changes associated with a joint-state transition are then determined by the forward-reverse transition probability ratio:

ΔE_{(r,x)→(r′,x′)} = k_B T ln [ M_{(r′,x′)→(r,x)} / M_{(r,x)→(r′,x′)} ].

Assuming that all energy exchanges with the heat reservoir occur during the ratchet-symbol interaction interval τ and that all energy exchanges with the work reservoir occur between interaction intervals, the average asymptotic work is

⟨W⟩ = Σ_{(r,x),(r′,x′)} π_{r⊗x} M_{(r,x)→(r′,x′)} ΔE_{(r,x)→(r′,x′)},   (1)

where π_{r⊗x} is the asymptotic distribution over the joint state of the ratchet-symbol system at the beginning of an interaction interval.

B. Structure
To discuss the computational structure of information engines, we first cast the input and output strings in terms of the hidden Markov models that read and generate them.
Definition 1. A finite-state edge-labeled hidden Markov model (HMM) consists of (1) a finite set of states S = {σ_1, …, σ_N}; (2) a finite alphabet A; and (3) a set of N × N symbol-labeled transition matrices T^{(x)}, x ∈ A, whose elements are T^{(x)}_{ij} = Pr(σ_j, x | σ_i). The corresponding overall state-to-state transitions are described by the row-stochastic matrix T = Σ_{x∈A} T^{(x)}.
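As a concrete instance of Definition 1, the sketch below builds an HMM for the golden mean process (binary sequences with no consecutive 0s, one of the input processes mentioned in Fig. 2) and recovers its stationary state distribution. The self-loop probability p = 1/2 is an assumed parametrization for illustration.

```python
import numpy as np

# Golden mean process HMM; state order [A, B].
T = {
    "0": np.array([[0.0, 0.5],   # A --0--> B with prob 1/2
                   [0.0, 0.0]]),
    "1": np.array([[0.5, 0.0],   # A --1--> A with prob 1/2
                   [1.0, 0.0]]), # B --1--> A with prob 1 (no "00" allowed)
}

# Overall state-to-state dynamic T = sum_x T^(x) must be row stochastic.
T_total = sum(T.values())
assert np.allclose(T_total.sum(axis=1), 1.0)

# Stationary distribution pi solves pi T = pi (left eigenvector at eigenvalue 1).
evals, evecs = np.linalg.eig(T_total.T)
pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
pi = pi / pi.sum()
print(pi)  # approximately [2/3, 1/3]
```

The same two matrices reappear in later sketches as the running example.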
This representation allows us to consider the internal states S of the input machine as well as the internal states S′ of the output machine. The latter are the joint states of the input process and the ratchet: S′ = S × R.
When the string of inputs or outputs can be generated by an HMM with only a single internal state, it is memoryless, since it can store no information from the past. The random variables generated by the associated HMM are independent and identically distributed (IID). When there is more than a single state, in contrast, the associated process is memoryful and the random variables generated may be correlated in time.
Similarly, we cast the ratchet controller as a transducer that maps input sequences to distributions over output sequences.
Definition 2. A finite-state edge-labeled transducer consists of (1) a finite set of states R = {r_1, …, r_N}; (2) finite input and output alphabets A and A′; and (3) a set of N × N input-output symbol-labeled transition matrices T^{(x,x′)}, (x, x′) ∈ A × A′, whose elements are T^{(x,x′)}_{ij} = Pr(r_j, x′ | r_i, x).

The transducer formulation allows us to calculate the output HMM in terms of the ratchet and the input machine. The exact method is given in Appendix B. As with the input machine, a ratchet is memoryless when it possesses only one internal state, and memoryful otherwise. A key feature of a ratchet we focus on here is its ability to alter temporal correlations by altering the structure of an input process. If memoryless (IID) input is fed to a memoryful ratchet, the output will generally be memoryful, since the ratchet's internal states set the state-space dimension of the output machine. Figure 2 graphically illustrates the composition of various input-process HMMs with the ratchet transducer we analyze in detail shortly.
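To make the composition concrete, here is a minimal sketch of the output-machine construction over joint states S′ = S × R. The `compose` routine is our own rendering of the idea (Appendix B's exact bookkeeping may differ), and the single-state "noisy channel" transducer is hypothetical, chosen for illustration; it is not the Mandal-Jarzynski ratchet.

```python
import numpy as np

def compose(T_in, T_ratchet, symbols=("0", "1")):
    """Output HMM over joint states S x R:
    T_out^(x') = sum_x kron(T_in^(x), T_ratchet^(x, x'))."""
    return {
        xp: sum(np.kron(T_in[x], T_ratchet[(x, xp)]) for x in symbols)
        for xp in symbols
    }

# Hypothetical memoryless transducer: flip each symbol with probability q.
q = 0.1
T_ratchet = {(x, xp): np.array([[q if x != xp else 1 - q]])
             for x in "01" for xp in "01"}

# Golden mean process as the input machine.
T_in = {"0": np.array([[0.0, 0.5], [0.0, 0.0]]),
        "1": np.array([[0.5, 0.0], [1.0, 0.0]])}

T_out = compose(T_in, T_ratchet)
# The composed output machine must still be row stochastic.
total = sum(T_out.values())
assert np.allclose(total.sum(axis=1), 1.0)
```

Because the transducer here has one state, the output machine inherits the input's two states; a memoryful ratchet would multiply the state count instead.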

C. Informatics
Following Landauer, extensions of the second law of thermodynamics were proposed to bound the thermodynamic costs of information processing by an information engine. Reference [6] employed a bound that compares the Shannon entropies of single-input and single-output symbols. Recall that the Shannon entropy H_1 for a single random variable X realizing values x ∈ X is

H_1[X] = −Σ_{x∈X} Pr(x) log_2 Pr(x).

This Shannon entropy quantifies the randomness of the single random variable X averaged over time. That is, H_1[X] answers the question of how uncertain it is that any particular x_N will be 0, 1, …, or k − 1.
Comparing the single-symbol Shannon entropy in the input string to that in the output string quantifies how the ratchet transforms randomness in individual symbols. This difference captures one aspect of the ratchet's information processing. And, it was proposed as an upper bound on the asymptotic work done ⟨W⟩ [6]:

⟨W⟩ ≤ k_B T ln 2 ΔH_1,   (3)

where H_1 is the single-symbol entropy averaged over the input tape, H′_1 is that averaged over the output, and ΔH_1 = H′_1 − H_1 is the change in the single-symbol statistics produced by the ratchet's operation. Note, however, that while the H_1's track the average information in any single instance of X_t or X′_t, they do not account for temporal correlations within input sequences or within output sequences. This is key, since information ratchets change more than the statistical bias in an individual symbol: they also alter temporal correlations in symbol strings. These altered correlations are related to the fact that the ratchet induces structural change in its input. Recognizing this is central to bounding the thermodynamic costs of the ratchet's interaction with the input process.
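A minimal sketch of the single-symbol bookkeeping, with k_B T set to 1 (work in units of k_B T) and illustrative tape biases that are our own placeholders:

```python
import math

def H1(p):
    """Shannon entropy (bits) of a single-symbol distribution."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

p_in = [0.9, 0.1]    # Pr(X = 0), Pr(X = 1) on the input tape (assumed)
p_out = [0.5, 0.5]   # single-symbol statistics of the output tape (assumed)

dH1 = H1(p_out) - H1(p_in)
work_bound = math.log(2) * dH1   # <W> <= k_B T ln2 * dH1, with k_B T = 1
print(H1([0.5, 0.5]))  # -> 1.0
```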
To properly address how correlations affect costs, we calculate a process' intrinsic randomness when all temporal correlations are taken into account, as measured by the entropy rate [13]:

h_μ = lim_{ℓ→∞} H(ℓ)/ℓ,   (4)

where H(ℓ) = H[Pr(X_{0:ℓ−1})] is the Shannon entropy of length-ℓ symbol blocks.
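The limit above can be approximated from data by the block-entropy difference H(ℓ) − H(ℓ−1). A sketch, using the golden mean process (exact h_μ = 2/3 bit per symbol) as a check; the sampler, block lengths, and sample size are our own choices:

```python
import math
import random
from collections import Counter

def golden_mean_sample(n, seed=0):
    """Sample the golden mean process: after a 0, a 1 is forced."""
    rng, out, prev = random.Random(seed), [], 1
    for _ in range(n):
        prev = 1 if prev == 0 else rng.choice((0, 1))
        out.append(prev)
    return out

def block_entropy(seq, l):
    """Empirical Shannon entropy (bits) of length-l blocks."""
    counts = Counter(tuple(seq[i:i + l]) for i in range(len(seq) - l + 1))
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())

seq = golden_mean_sample(200_000)
h_est = block_entropy(seq, 6) - block_entropy(seq, 5)
print(round(h_est, 2))  # close to 2/3
```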
Replacing the input and output Shannon entropies in Eq. (3) with their respective entropy rates gives the information processing second law (IPSL) [7]:

⟨W⟩ ≤ k_B T ln 2 Δh_μ,  Δh_μ = h′_μ − h_μ.   (5)

FIG. 2. Composing the Mandal-Jarzynski transducer (center, yellow) with a hidden Markov model (left, green) that describes the process on the input tape gives an output hidden Markov model (right, purple) that describes the process written to the output tape. Hidden Markov model (HMM) states R are depicted as circles. Directed edges between states represent possible transitions on an observed symbol x. HMM edges are labeled x : Pr(x, R_{N+1} | R_N). Transducers are similarly depicted by circular states with directed edges representing possible transitions on pairs of input symbols x and output symbols x′. Transducer transitions are specified by x′|x : Pr(x′, R_{N+1} | x, R_N). Left: input HMMs discussed here, from top to bottom, a (memoryless) biased coin, a period-2 process, and the golden mean process. Center: the Mandal-Jarzynski ratchet, represented by a three-state transducer. Probabilities are not shown on edge labels for conciseness, but are nonzero for all transitions and all combinations of input-output symbol pairs (x, x′). Each edge probability is a function, denoted f(·), of the previous state R_N, the next state R_{N+1}, the input symbol x, and the output symbol x′. See Sec. IV and Appendix B for further details. Right: output HMMs resulting from composing the Mandal-Jarzynski transducer with the corresponding input HMM on the left. Edge labels are omitted for conciseness, but each transition label represents a positive probability of observing a 0 or a 1.
The IPSL correctly expresses the upper bound on work, taking into account the presence of temporal correlations in input and output processes.
The importance of Eq. (5) cannot be overstated: any memoryful ratchet induces temporal correlations in its output, even for IID input. Using Eq. (3) in the IID case typically overestimates the upper limit on available work. Additionally, temporal correlations in the input are known to be a thermodynamic resource [14]. In fact, suitably designed ratchets can leverage such correlations to do useful work. Thus, inappropriately applying Eq. (3) in these cases often results in claims that violate the second law. In short, Eq. (5) generalizes Landauer's principle to the case of correlated environments and finite-state memoryful ratchets that generate correlated outputs.
Unfortunately, due to the difficulty of accurately calculating the entropy rate for most processes, previous treatments of information ratchets were restricted to using either Eq. (3) or finite-length approximations to Eq. (5). Here, we make use of a solution that removes this restriction and gives accurate calculations of entropy rates for processes generated by general HMMs.

III. ENTROPY RATE OF HMMs
Properly determining the entropy rate of processes generated by HMMs is a longstanding challenge, one known since the 1950s [15]. Its recent resolution required introducing new concepts from ergodic theory and dynamical systems [10,11]. We now turn to briefly discuss these and the new analysis tools that follow from them. (The Appendices give a more detailed exegesis.)

A. ε-machines
First, though, we need to more carefully consider the HMMs that we use to represent stochastic processes. We briefly recall two important HMM classes.

Definition 3. A unifilar HMM (uHMM) is an HMM such that for each state σ_k ∈ S and each symbol x ∈ A there is at most one outgoing edge from state σ_k labeled with symbol x.
This seemingly minor structural property means that the states are predictive: the current state and symbol exactly determine the next state. This has important consequences for calculating the statistical and informational properties of the process that an HMM generates. If an HMM is unifilar, we may directly calculate the entropy rate of its generated process via the closed-form expression

h_μ = −Σ_{σ∈S} π_σ Σ_{x∈A} Pr(x|σ) log_2 Pr(x|σ).   (6)

In contrast, if an HMM is nonunifilar, its states are not predictive and there is no closed form for the generated process' entropy rate.

Definition 4. An ε-machine is a uHMM with probabilistically distinct states: for each pair of distinct states σ_k, σ_j ∈ S there exists some finite word w = x_{0:ℓ−1} such that Pr(X_{0:ℓ−1} = w | S_0 = σ_k) ≠ Pr(X_{0:ℓ−1} = w | S_0 = σ_j).

As a consequence, a process' ε-machine is its optimally predictive model [16]. Moreover, a process' ε-machine is minimal and unique. This means that we can quantify the amount of structural memory a process effectively uses by counting the number of states in its ε-machine or by calculating its stored information. The latter is the statistical complexity C_μ, which is the Shannon entropy of the asymptotic probability distribution over states:

C_μ = −Σ_{σ∈S} π_σ log_2 π_σ.   (7)

So, knowing a process' ε-machine is powerful, as it provides closed-form expressions for both a process' intrinsic randomness and its structural memory [17], two important thermodynamic resources. That said, even if a ratchet's input is generated by a finite ε-machine, the output process will not be. In general, the output process generator (Fig. 2, right column) will be a nonunifilar HMM. This precludes a direct calculation of the entropy rate of a ratchet's output process. And so, when determining thermodynamic function, it appears that a key constituent (h′_μ) is inaccessible. Note, too, that nonunifilarity precludes determining the output process' memory C_μ and, failing that, one cannot accurately analyze the changes in structure effected by a ratchet.
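For a unifilar presentation, Eqs. (6) and (7) reduce to short computations once the stationary distribution π is known. A sketch for the golden mean process's ε-machine (assumed self-loop probability 1/2), whose exact values are h_μ = 2/3 bit and C_μ = log_2 3 − 2/3 ≈ 0.918 bit:

```python
import math
import numpy as np

# Golden mean ε-machine; state order [A, B].
T = {"0": np.array([[0.0, 0.5], [0.0, 0.0]]),
     "1": np.array([[0.5, 0.0], [1.0, 0.0]])}

# Stationary state distribution: left eigenvector of T at eigenvalue 1.
T_total = sum(T.values())
evals, evecs = np.linalg.eig(T_total.T)
pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
pi = pi / pi.sum()

# Eq. (6): h_mu = -sum_sigma pi_sigma sum_x Pr(x|sigma) log2 Pr(x|sigma).
h_mu = 0.0
for i, p_state in enumerate(pi):
    for Tx in T.values():
        p = Tx[i].sum()          # Pr(x | sigma_i)
        if p > 0:
            h_mu -= p_state * p * math.log2(p)

# Eq. (7): C_mu = H[pi].
C_mu = -sum(p * math.log2(p) for p in pi if p > 0)

print(round(h_mu, 4), round(C_mu, 4))  # -> 0.6667 0.9183
```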

B. Mixed-state presentation
Despite there being no closed-form expression for the entropy rate of the process generated by a finite nonunifilar HMM, there is a way to unifilarize HMMs, introduced by Blackwell [15], using mixed states. A mixed state is the answer to the following question: "Given that one knows the HMM structure (states and transitions) and has observed a particular sequence, what is the best guess of the internal state probabilities?" More formally, an N-state HMM's mixed states are conditional probability distributions η(x_{−ℓ:0}) = Pr(R_0 | X_{−ℓ:0} = x_{−ℓ:0}) over the HMM's internal states R, given all sequences x_{−ℓ:0} ∈ A^ℓ, ℓ = 0, 1, 2, …. (See Appendix A 2 for further detail on the calculation of mixed states.) The mixed states together with the mixed-state transition dynamic give an HMM's mixed-state presentation (MSP), a presentation of the process that is unifilar by construction. However, in the typical case, this improvement comes at a heavy cost: the set of mixed states is uncountably infinite. This renders the complexity-measure expressions Eqs. (6) and (7) unusable. Blackwell provided a formal replacement for Eq. (6)'s entropy-rate expression, an integral expression [Eq. (A6) in Appendix A 3] for the entropy rate over the often-fractal invariant Blackwell measure μ(η) on the set of mixed states R [15].
Recently, Refs. [10,11] introduced a constructive approach to evaluating this integral equation by showing that the mixed-state generation process is ergodic. Given this, rather than integrate over the measure μ(η) as required by Blackwell's equation (A6), we can take the time average over the series η_0, η_1, …, η_t of iterated mixed states to obtain the entropy rate. (See Appendix A 3 for detail.) In the present setting, this means we can now calculate the entropy rate of output processes for arbitrary ratchets and arbitrary inputs.
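A sketch of this time-average estimator: iterate the Bayes update η → η T^{(x)} / |η T^{(x)}| along a sampled symbol sequence and average the per-symbol surprisal −log_2 Pr(x_t | η_t). Step counts and burn-in are our own choices. The routine applies to any HMM, unifilar or not; as a sanity check it is run on the golden mean HMM, whose exact entropy rate (2/3 bit) is known.

```python
import math
import numpy as np

def entropy_rate(T, n_steps=100_000, burn_in=1_000, seed=0):
    """Time-average entropy-rate estimate by mixed-state iteration."""
    rng = np.random.default_rng(seed)
    symbols = list(T)
    n = next(iter(T.values())).shape[0]
    eta = np.full(n, 1.0 / n)          # start from the uniform mixed state
    total, count = 0.0, 0
    for t in range(n_steps):
        probs = np.array([eta @ T[x] @ np.ones(n) for x in symbols])
        x = rng.choice(len(symbols), p=probs)       # sample the next symbol
        if t >= burn_in:
            total -= math.log2(probs[x])            # surprisal -log2 Pr(x|eta)
            count += 1
        eta = eta @ T[symbols[x]]
        eta /= eta.sum()               # Bayes update: the next mixed state
    return total / count

T = {"0": np.array([[0.0, 0.5], [0.0, 0.0]]),
     "1": np.array([[0.5, 0.0], [1.0, 0.0]])}       # golden mean process
h = entropy_rate(T)
print(round(h, 2))  # close to 2/3
```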
Characterizing a nonunifilar HMM's structure is slightly more delicate. Due to the generic uncountability of predictive states for nonunifilar HMMs, the C_μ of the set of mixed states diverges. To characterize the divergent memory resource cost of predicting processes with uncountably infinite mixed-state sets, we track the rate of divergence of C_μ via the statistical complexity dimension d_μ of the Blackwell measure μ on R [18]:

d_μ = lim_{ε→0} H_ε[R] / log_2(1/ε),   (8)

where H_ε[Q] is the Shannon entropy of a continuous-valued random variable Q, coarse-grained at size ε, and R is the random variable associated with the mixed states η ∈ R.
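In the same spirit as the entropy-rate estimator, d_μ can be estimated by coarse-graining sampled mixed states at two resolutions and taking the entropy-difference slope. The two-state nonunifilar HMM below is a hypothetical example of ours, chosen only because it produces an uncountable mixed-state set; it is not one of the paper's machines, and the resolutions are arbitrary choices.

```python
import numpy as np

def sample_mixed_states(T, n_steps=50_000, seed=0):
    """Iterate mixed states along a sampled sequence; return eta[0] history."""
    rng = np.random.default_rng(seed)
    symbols = list(T)
    n = next(iter(T.values())).shape[0]
    eta, states = np.full(n, 1.0 / n), []
    for _ in range(n_steps):
        probs = np.array([eta @ T[x] @ np.ones(n) for x in symbols])
        x = symbols[rng.choice(len(symbols), p=probs)]
        eta = eta @ T[x]
        eta /= eta.sum()
        states.append(eta[0])   # with 2 states, eta[0] determines eta
    return np.array(states)

def coarse_entropy(points, eps):
    """Shannon entropy (bits) of points binned at resolution eps."""
    bins = np.floor(points / eps).astype(int)
    _, counts = np.unique(bins, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# Hypothetical nonunifilar machine: symbol "0" leads state 1 to both states.
T = {"0": np.array([[0.25, 0.25], [0.0, 0.5]]),
     "1": np.array([[0.5, 0.0], [0.25, 0.25]])}
pts = sample_mixed_states(T)
# Slope of H_eps versus log2(1/eps) approximates d_mu.
d_mu = (coarse_entropy(pts, 2**-10) - coarse_entropy(pts, 2**-6)) / 4.0
print(0.0 <= d_mu <= 1.0)  # -> True: a dimension on the 1-simplex
```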
Appendix A 4 develops an upper bound on d_μ that can be accurately determined from the measured process' entropy rate h^B_μ [Eq. (A7)] and the mixed-state process' Lyapunov characteristic exponent spectrum.
In this way, the functional thermodynamics of finite-state Maxwellian ratchets can be accurately determined and systematically explored.

IV. MANDAL-JARZYNSKI INFORMATION RATCHET
To demonstrate the descriptive power of these dynamical-thermodynamic results on ratchet entropy, dimension, mixed states, and function, we apply them to a well-known example of an information engine, the Mandal-Jarzynski ratchet [6], hereafter, the ratchet. Although initially introduced without reference to HMMs and transducers, following Ref. [7] we translate the original ratchet model into the HMM-transducer formalism outlined in Sec. II. In these terms, the ratchet is a three-state, fully connected transducer, designed such that only transitions that flip an incoming symbol are energetically consequential. As shown in Fig. 3, the ratchet's transition probabilities are parametrized by τ ∈ [0, ∞), the duration of the ratchet-symbol interaction, and ε ∈ (−1, 1), the weight parameter. For a given τ and ε, the Mandal-Jarzynski model may be written as the three-state transducer shown in the center column of Fig. 2. See Appendix B for how to calculate the transducer, which is based on a rate-transition matrix, and Appendix B 1 for the input-transducer composition method.
Any interaction interval in which the input symbol is unchanged is energetically neutral. Therefore, we measure the average work done by the ratchet by the difference between the probability of writing a 1 to the output tape cell and that of reading a 1 from the input tape cell:

⟨W⟩ = k_B T w(ε) [ Pr(X′_N = 1) − Pr(X_N = 1) ],

where w(ε) = log[(1 + ε)/(1 − ε)]. When ε = 0, flips 0 → 1 and 1 → 0 are both energetically neutral; when ε → ±1, symbol flips in one direction are energetically favored over the other. Note that this computation finds the same asymptotic work production as Eq. (1), recalled here as an aid to intuition.

FIG. 3. Ratchet schematic adapted from the original Mandal-Jarzynski construction, showing how the dial-and-symbol system is transformed into a three-state transducer upon selection of a specific ε, determining the energetics of flipping a symbol, and τ, determining the interaction interval. For almost every value of ε and τ, every state-to-state transition has positive probability for every input-output symbol combination.
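A sketch of this bookkeeping, assuming (as labeled in the code) the write-minus-read sign convention and k_B T = 1; the probabilities fed in are placeholders, not outputs of the actual ratchet dynamics.

```python
import math

def w(eps):
    """Energetic weight of a symbol flip: w(eps) = log[(1+eps)/(1-eps)]."""
    return math.log((1 + eps) / (1 - eps))

def avg_work(eps, p_read_1, p_write_1):
    """Average work in units of k_B T, assumed write-minus-read convention."""
    return w(eps) * (p_write_1 - p_read_1)

print(w(0.0))   # -> 0.0: at eps = 0 symbol flips are energetically neutral
print(avg_work(0.5, 0.3, 0.6) > 0)  # -> True under the assumed convention
```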
The analysis in Ref. [6] considered only uncorrelated inputs. That is, their input machine was a single-state HMM, a biased coin, with bias δ = Pr(0) − Pr(1). To identify their ratchet's thermodynamic functionality, the work bound was approximated via Eq. (3), that is, assuming tape symbols were statistically independent. However, the ratchet is memoryful (due to its three internal states) and, therefore, in general induces correlations in its output, even for uncorrelated inputs. Since the single-symbol entropy only upper bounds the true Shannon entropy rate, h_μ ≤ H_1, Eq. (3) is suspect when used to identify actual thermodynamic functioning. The following shows that, while approximately correct for uncorrelated input, the single-symbol entropy bound is violated for correlated input. Its incorrect use mischaracterizes thermodynamic functioning and can lead to violations of the second law.
In addition, our methods give insight into how the ratchet processes structural information. Due to its inherent nonunifilarity, the ratchet produces nonunifilar output machines that generate processes with an uncountably infinite set of mixed states, even when driven by a finite-state machine, as Fig. 4 shows. Moreover, the figure also demonstrates that, as ratchet parameters vary, the mixed-state sets have strikingly different structure.
Previous interpretations of ratchet thermodynamic functioning were limited to considering only transformations of randomness. That is, for given ratchet parameters and input, what is the sign and magnitude of k_B T ln 2 Δh_μ and how does this affect ⟨W⟩? Such questions ignore the key second dimension of information processing illustrated so vividly by Fig. 4. That is, given the same ratchet, parameters, and input, what is the sign and magnitude of ΔC_μ and Δd_μ? Does the ratchet construct new patterns in its output (ΔC_μ > 0 or Δd_μ > 0) or deconstruct patterns passed to it from the input (ΔC_μ < 0 or Δd_μ < 0)? How do these then affect ⟨W⟩? Answering structural questions requires a more thorough taxonomy of thermodynamic functionality than the original engine/dud/eraser categories.

V. RANDOMIZING AND DERANDOMIZING BEHAVIORS
The ratchet's previously identified thermodynamic functions, engine, eraser, and dud, were identified by comparing the sign and magnitude of k_B T ln 2 Δh_μ to the asymptotic work production. As such, there are three physically possible orderings: (i) engine: 0 < ⟨W⟩ ≤ k_B T ln 2 Δh_μ; (ii) dud: ⟨W⟩ ≤ 0 ≤ k_B T ln 2 Δh_μ; and (iii) eraser: ⟨W⟩ ≤ k_B T ln 2 Δh_μ ≤ 0. A ratchet randomizing inputs (Δh_μ > 0) can operate as an engine, if it leverages the change in entropy rate to do useful work. It may also act as a dud, if the randomization produces no useful work or, worse, if the ratchet consumes work. A ratchet derandomizing inputs (Δh_μ < 0) is termed an "eraser" and can only derandomize up to ⟨W⟩/k_B T ln 2 bits using ⟨W⟩ joules of work. The ordering k_B T ln 2 Δh_μ < ⟨W⟩ < 0 would imply that the ratchet derandomizes beyond the physical limitations of Landauer's principle.
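The three allowed orderings can be captured in a small classifier; k_B T = 1, the entropy change is in bits (hence the ln 2 conversion), and the test values are illustrative.

```python
import math

def classify(W, dh_mu):
    """Engine/dud/eraser taxonomy from <W> and delta-h_mu (bits), k_B T = 1."""
    bound = math.log(2) * dh_mu     # k_B T ln2 * delta-h_mu
    if W > bound:
        return "forbidden"          # ordering would violate the IPSL
    if W > 0:
        return "engine"             # 0 < W <= kT ln2 dh_mu
    if bound < 0:
        return "eraser"             # W <= kT ln2 dh_mu <= 0
    return "dud"                    # W <= 0 <= kT ln2 dh_mu

print(classify(0.2, 0.5))    # -> engine
print(classify(-0.1, 0.4))   # -> dud
print(classify(-0.5, -0.3))  # -> eraser
print(classify(0.1, 0.0))    # -> forbidden
```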
As noted already, Ref. [6] originally identified these functionalities using the entropy-change approximation ΔH_1 rather than the exact change Δh_μ introduced by Ref. [7]. As previously shown, driving a memoryful ratchet with a memoryful input violates Eq. (3) [14]. In all other cases, Eq. (3) is valid but may mischaracterize the functional thermodynamic regimes. A natural question, therefore, is how much difference does using the correct entropy rate make in identifying function? To see this, we now compare Δh_μ and ΔH_1.
There are three possibilities. First, ΔH_1 = Δh_μ. In this case, a ratchet does not change the presence of temporal correlations. This occurs when a memoryless ratchet is driven by memoryless input.
Second, ΔH_1 < Δh_μ. Here, a ratchet reduces the presence of temporal correlations, which occurs when a memoryless ratchet is driven by memoryful input. In this regime, the difference in single-symbol entropies is the tighter of the two work bounds. Critical to this case, though, recall that our goal is not a tight bound, but rather an accurate measurement of the gap between information processing and asymptotic work. The upshot is that using Eq. (3) in this case may still mischaracterize thermodynamic functionality.
Finally, ΔH_1 > Δh_μ, which occurs when a memoryful ratchet is driven by memoryless input. In this case, the ratchet increases temporal correlations in the output, so that the difference in entropy rates is the tighter bound on asymptotic work production. This is the scenario in the first treatment of the Mandal-Jarzynski ratchet [6]. Note that when a memoryful ratchet is driven with memoryful input, the most generic case, all orderings of Δh_μ and ΔH_1 are possible.
Let us now turn to consider in detail how the ratchet operates in three distinct environments: memoryless, periodic, and memoryful inputs. This gives more direct insight into the ratchet's transformational capabilities.

A. Memoryless input
When the ratchet is driven with a memoryless input, as in the original analysis, Eq. (3) is valid, but the IPSL always offers a tighter or equal bound on work production than the single-symbol entropy approximation. This holds since the input is memoryless, while the three-state output machine is memoryful and nonunifilar for almost every parameter setting. As such, one cannot calculate the entropy rate h′_μ in closed form. However, the techniques above determine the mixed-state presentations of the output HMMs, and this gives accurate numerical calculation of both the single-symbol and the IPSL work bounds. This all being said, for most parameter values of the Mandal-Jarzynski ratchet, in practice we find that Δh_μ ≈ ΔH_1. In other words, when driven with a memoryless input, the ratchet's functional thermodynamic regions are not significantly changed when identified via the single-symbol entropy, a minor quantitative difference without a functional distinction. (See Fig. 8 for a comparison of the functional thermodynamic regions found by each bound.) Exploring output-machine MSPs shows this arises from the ratchet's transition topology. As shown in the middle column of Fig. 2, the ratchet's transducer is fully connected, and transitions to any other state on any combination of symbols are possible. Therefore, it is impossible to be certain which state the ratchet is in or, indeed, even to be sure which states the ratchet is not in. Graphically, this is reflected in the fact that the output-machine mixed states η always lie deep in the simplex R's interior, as illustrated in Fig. 4 (right).
Mixed states lying at R's center correspond to an equal belief in each of the output HMM's three states: A ⊗ D, B ⊗ D, and C ⊗ D. Those on R's border indicate certainty of not being in at least one state. Since the probability distribution over the next symbol is a continuous function of the mixed state (see Appendix A 2), the diameter of the mixed-state set is a rough measure of the presence of temporal correlations in the ratchet's behavior. To illustrate this explicitly, the mixed states for two example output processes generated by the ratchet are shown in Fig. 4. On the left, the mixed states are spread out, indicating that at the selected parameters the ratchet induces stronger temporal correlations than in the next example (right). There, all mixed states lie very close together and very near the simplex center: the mixed-state set has very small diameter. For most parameter values, one finds that the mixed states of the memoryless-driven ratchet's output process cluster closely in the middle of the simplex. (See Fig. 9 for a broader survey of ratchet MSPs for memoryless input.) So, by giving insight into the mixed states of the output process, our techniques directly explain why Δh_μ ≈ ΔH_1 for this particular ratchet.
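The diameter heuristic is easy to compute from sampled mixed states. A sketch (sampling choices are our own), checked on a memoryless single-state biased coin, whose mixed-state set is a single point and whose diameter is therefore exactly zero:

```python
import numpy as np

def mixed_state_diameter(T, n_steps=5_000, seed=0):
    """Max pairwise distance among (subsampled) iterated mixed states."""
    rng = np.random.default_rng(seed)
    symbols = list(T)
    n = next(iter(T.values())).shape[0]
    eta, pts = np.full(n, 1.0 / n), []
    for _ in range(n_steps):
        probs = np.array([eta @ T[x] @ np.ones(n) for x in symbols])
        x = symbols[rng.choice(len(symbols), p=probs)]
        eta = eta @ T[x]
        eta /= eta.sum()
        pts.append(eta.copy())
    pts = np.array(pts)
    # Subsample to keep the pairwise scan cheap.
    return max(np.linalg.norm(p - q) for p in pts[::100] for q in pts[::100])

biased_coin = {"0": np.array([[0.7]]), "1": np.array([[0.3]])}
print(mixed_state_diameter(biased_coin))  # -> 0.0
```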

B. Periodic input
Now, consider driving the ratchet with a periodic input. The period-2 process, shown in the middle row of Fig. 2, is memoryful, with two internal states. So it is now possible that Eq. (3) is violated. Since H_1 = 1 and h_μ = 0, the presence of temporal correlations in the input is maximized. Noting this, and the near-memoryless behavior of the ratchet discussed in Sec. V A, we see that for almost all parameters the ratchet decreases the presence of temporal correlations in transforming the input process to the output. The periodically driven ratchet's output HMMs have six states and are nonunifilar for nearly all parameter values; see Fig. 2 (middle, last column). And so, we must calculate these machines' mixed-state presentations to estimate h_μ. Comparing Eqs. (3) and (5) in Fig. 5 to the asymptotic work production shows that Eq. (3) is not violated. As predicted above, it is a tighter bound on ⟨W⟩ than Eq. (5).
Although it may seem desirable to use the tighter bound, the single-symbol and entropy-rate bounds identify the ratchet's thermodynamic functioning differently: Since ⟨W⟩ ≤ k_B T ln 2 ΔH_1 ≤ 0 for all values of ε, the single-symbol entropy bound classifies the ratchet as an eraser, dissipating work to reduce the randomness in the input. However, when considering temporal correlations, we see that the ratchet is in plain fact a dud: Δh_μ > 0. That is, the ratchet dissipates work while increasing the tape's intrinsic randomness. This mischaracterization of thermodynamic function by the single-symbol entropy highlights an important lesson: Bounding the asymptotic work production as tightly as possible is not the same as correctly identifying the functional thermodynamics. As Ref. [12] recently showed, rather than merely a bound, Eq. (5) is meaningful only when comparing k_B T ln 2 Δh_μ to ⟨W⟩. The difference between the two quantifies the amount of work the ratchet could do were it an optimal, globally integrated information processor. This shows that, even when it may appear to outperform Eq. (5), Eq. (3) cannot in general serve as a reliable bound on asymptotic work production. We return to this in our final example, where applying Eq. (3) implies a violation of the second law.
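The bookkeeping that distinguishes these regimes can be sketched in a few lines. The function names and sample values below are ours, not the paper's; work and entropy-rate changes are taken in units of k_B T ln 2 and bits per symbol, respectively.

```python
# Functional classification of a ratchet from its asymptotic work
# production W and the change dh in the tape's entropy rate, assuming the
# IPSL (W <= dh) holds. Engine: extracts work. Eraser: pays work to
# derandomize the tape. Dud: pays work while randomizing.
def classify(W, dh):
    if W > 0:
        return "engine"
    if dh < 0:
        return "eraser"
    return "dud"

def apparent_violation(W, dH1, dh):
    """The single-symbol bound W <= dH1 can misreport a second-law
    violation even when the true IPSL bound W <= dh holds."""
    return (W > dH1) and (W <= dh)

# The memoryful-input case from the text: dH1 < 0 suggests erasure, but
# dh > 0 reveals a dud. (Numbers are illustrative.)
print(classify(-0.2, 0.1))                   # dud
print(apparent_violation(-0.05, -0.3, 0.1))  # True: Eq. (3) appears violated
```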

C. Memoryful input
Finally, let us drive the ratchet with a mixed-complexity memoryful process (partly regular, partly stochastic): the golden mean process. As depicted in Fig. 2 (bottom row), this two-state HMM generates a family of processes parametrized by s ∈ [0, 1]. When s = 1, the process is period-2. Decreasing s lets the process emit multiple 1's in a row, with increasing probability, until at s = 0 the process emits only 1's. The driven ratchet's output HMMs have six states and are nonunifilar for nearly all parameter values; see Fig. 2 (bottom, last column). So, again, we must calculate mixed-state presentations to get h_μ and identify functionality.
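A minimal sketch of the golden mean family as described here, using an assumed (but standard) two-state parametrization; the state names and sampling details are illustrative:

```python
import random

# Golden mean family: state A emits 0 with probability s (forcing a visit
# to B, which must emit 1) and emits 1 with probability 1 - s. At s = 1 the
# output is the period-2 sequence 0101...; at s = 0 it is all 1's.
def golden_mean_sequence(s, length, seed=0):
    rnd = random.Random(seed)
    state, out = "A", []
    for _ in range(length):
        if state == "A":
            if rnd.random() < s:
                out.append(0)
                state = "B"   # after a 0, a 1 is forced
            else:
                out.append(1)
        else:                  # state B always emits 1 and returns to A
            out.append(1)
            state = "A"
    return out

seq = golden_mean_sequence(0.5, 10000)
# No two 0's in a row: the process cannot produce strings biased toward 0,
# which is the source of the asymmetry discussed next.
assert "00" not in "".join(map(str, seq))
print(seq[:12])
```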
In Fig. 6, we apply both Eqs. (3) and (5) for the same set of ratchet parameters. For both, we find asymmetry in the functional thermodynamic regions with respect to ε, in contrast to the highly symmetric regions found for memoryless input, shown in Fig. 8. This is due to the asymmetry in the input: it is not possible for the golden mean process to produce strings biased toward 0. Thermodynamically, for ε > 0, the ratchet is not able to extract work. When applying the single-symbol bound, as shown on the left in Fig. 6, the bound reports large regions of eraser behavior. Most importantly, between the engine and lower eraser regions lies a region where Eq. (3) implies a violation of the second law! Of course, when we apply Eq. (5) in Fig. 6 (right), the violation region disappears; its parameters are correctly identified as duds. Additionally, the large region of eraser functionality in Fig. 6 (left) shrinks significantly in Fig. 6 (right). The Fig. 6 (left) regions have been mischaracterized much as in the case discussed in Sec. V B. It is more subtle here, though, since h_μ > 0. However, the fundamental problem is the same: by considering only the single-symbol entropy, it appears that the ratchet performs work to make the input less random, since ΔH_1 < 0. In fact, the output is more intrinsically random than the input, and the ratchet dissipates work uselessly. In the violation region on the left, the ratchet is identified as not dissipating sufficient work to reduce the randomness as much as ΔH_1 implies it must, and this leads to the apparent second-law violation. The contradiction is resolved when we take into account that the input's intrinsic randomness is actually much lower than its single-symbol entropy. And so, the apparent decrease in randomness is in fact an increase.
It is already known that Eq. (3) may be violated when a memoryful ratchet is driven by memoryful input. However, the Mandal-Jarzynski ratchet was not designed to exhibit such a violation, as was done previously [8]. Rather, we find that driving a simple transition-rate-based ratchet with a mixed-complexity process creates regions of violation when applying Eq. (3). Since such ratchets are common in application, and any such ratchet will be highly stochastic by nature, for reasons further discussed in Appendix B, we conclude that Eq. (3) is not suitable for broad application. On the positive side, we see that the dynamical-systems techniques introduced here apply broadly, giving consistent and accurate characterizations of stochastic-control information engines.

VI. CONSTRUCTING AND DECONSTRUCTING PATTERNS
Up to this point, we monitored how the ratchet changes the amount of intrinsic randomness present in a symbol sequence and leverages this to do useful work. When information ratchets are memoryful, they can alter not only the statistical bias of a symbol sequence, but also the presence of temporal correlations. This has thermodynamic consequences, as discussed above. Now, we turn to the mechanisms by which an information ratchet changes the presence of temporal correlations, which manifest as changes in sequence structure and organization.
By structure and organization, we refer to the internal states of the HMM that generates the input symbol sequence and to the ratchet's states and transitions. As depicted in Fig. 2, the input and the ratchet each have their own set of internal states. Since the output machine is the composition of the ratchet transducer and the input HMM, its states are the Cartesian product of the set of input states and the set of ratchet states.
In the simplest case, when a memoryless ratchet is driven by memoryless input, there is only ever one state, and no temporal correlations are present at any stage. The only possible action of the ratchet is then to change the statistical bias of individual input symbols and to transform this change in Shannon entropy into a change in thermodynamic entropy.
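This simplest case admits a one-line bound. The sketch below computes the single-symbol work bound k_B T ln 2 [H(b_out) − H(b_in)] for a hypothetical change in coin bias; the function names and example biases are ours, not the paper's.

```python
import math

def H(b):
    """Binary Shannon entropy in bits."""
    if b in (0.0, 1.0):
        return 0.0
    return -b * math.log2(b) - (1 - b) * math.log2(1 - b)

def work_bound_per_symbol(b_in, b_out, kT=1.0):
    """Upper bound on extractable work per symbol, in units of kT, when a
    memoryless ratchet maps bias b_in to bias b_out."""
    return kT * math.log(2) * (H(b_out) - H(b_in))

# Randomizing a biased tape (bias 0.9 -> 0.5) permits positive work
# extraction; biasing a fair tape costs at least the same amount.
print(round(work_bound_per_symbol(0.9, 0.5), 4))
```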
When one or both of the input and the ratchet are memoryful, the internal structure of the output will, in general, be memoryful. That is, the ratchet induces a structural change in processing the input to generate the output. Consider two basic structural-change operating modes [12]: pattern construction, where the output is more structured than the input, and pattern deconstruction, where the output is less structured. As before, these modalities are input dependent: the same ratchet may exhibit either. Note that structural change to the symbol sequence does not uniquely determine the thermodynamic functionality associated with changes in randomness. It is possible for an information engine to act as an engine, eraser, or dud while constructing patterns; the same is true of deconstruction. Rather, transformations of randomness and structure are orthogonal, and a ratchet's information processing capabilities may lie anywhere in the Δh_μ-Δd_μ plane sketched in Fig. 7.

A. Pattern construction
Ideal pattern construction occurs when a ratchet takes structureless input, an IID process, to structured output. Therefore, when the ratchet is driven with biased coin input, it operates as an ideal pattern constructor. As discussed in Sec. V A, driving the ratchet with memoryless input results in an uncountably infinite set of states in the output HMM for most parameter values. The exception occurs along the line δ = ε in parameter space, where the ratchet returns the input unchanged, implying C_μ = 0. At every other point in ratchet parameter space C_μ = +∞, and the ratchet acts as a pattern constructor. As can be seen from Appendix C's Fig. 8, this type of structural change can be associated with any thermodynamic behavior.
The resulting divergence of C_μ is a direct consequence of the nonunifilarity induced by the ratchet. The structure generated by any ratchet driven by an IID process is the set of mixed states of the ratchet, given knowledge of the outputs. Due to the ratchet's topology, there is an uncountable infinity of such mixed states. In this circumstance one uses the statistical complexity dimension [d_μ of Eq. (8)] of the set of output mixed states to monitor the rate of the memory-resource divergence. d_μ distinguishes between output machines with an uncountable infinity of states and so enables comparing the structural information processing of the ratchet across parameter space.
Figure 7 places two examples of ideal pattern construction on the right side of the Δh_μ-Δd_μ plane with the associated input and output machines, the latter plotted on the 2-simplex. Although it may appear that the more entropic ratchet in the upper half of the plane constructs a more "complex" pattern, this is not so. Referring back to Fig. 4 and comparing the dimensions of the two sets of mixed states shows that the opposite is true: the ratchet operating in the −Δh_μ half of the plane produces a much denser set of states, resulting in a larger d_μ. In addition to the structural transformation, the ratchet in the +Δh_μ half-plane randomizes inputs as a dud, while the other derandomizes inputs as an eraser.

FIG. 7. After passing through the information engine, the input process, which has some initial h_μ and d_μ, is transformed into an output process with a potentially different h_μ and d_μ. By carefully selecting input, an information engine can be induced to act as a randomizer (Δh_μ > 0) or derandomizer (Δh_μ < 0) and a pattern constructor (Δd_μ > 0) or deconstructor (Δd_μ < 0). We show here that all four regions of the Δh_μ-Δd_μ plane are accessible to the Mandal-Jarzynski ratchet by carefully selecting parameters and input. The two insets on the left show the uncountable set of mixed states of an input process that the ratchet transduces to an IID output. The insets on the right show two uncountably infinite-state output processes produced by running the Mandal-Jarzynski ratchet on a biased coin. The parameters, clockwise from top right: δ_input = −0.98, ε = 0.01, τ = 0.1; δ_input = 0.3, ε = 0.5, τ = 0.1; δ_output = 0.8, ε = −0.96, τ = 0.75; δ_output = 0.0, ε = 0.9, τ = 0.9.

B. Pattern deconstruction
In a complementary fashion, the ratchet can deconstruct patterns. In ideal pattern deconstruction, a ratchet transforms a memoryful input sequence, with C_μ > 0, into memoryless, IID output, with C_μ = 0. Taking a ratchet-focused view, as we do here, ideal pattern deconstruction is a more involved task than ideal pattern construction, since we must carefully design inputs that the ratchet will transform into a biased coin. Any correlations in the input must be recognizable by the ratchet, so that the ratchet can map them to randomness. Similar to the previous discussion, we consider the induced ratchet mixed states, but now given knowledge of the inputs. The algorithm for designing the required input process, given knowledge of the ratchet, is discussed in Appendix E.
Critically, pattern deconstruction is not possible for all ratchet parameters and desired outputs. That said, the Mandal-Jarzynski ratchet can perform as an engine, eraser, or dud while deconstructing patterns, as can be seen in Appendix E's Fig. 10. As τ increases, the parameter-space region in which the ratchet can extract patterns shrinks, until as τ → ∞ pattern extraction occurs only along the line δ = ε. Mirroring pattern construction, generating the input processes requires reference to the uncountably infinite set of mixed states of the ratchet. In general, this implies that an input process which maps to a memoryless output process also has an uncountably infinite set of states, so that ΔC_μ → −∞. In other words, to properly ensure that the output symbols are temporally uncorrelated, the input process must remember its infinite past. Once again, the associated statistical complexity dimension d_μ, now of the set of input mixed states, quantifies the rate of the memory-resource divergence.
Two examples of ideal pattern deconstruction are placed on the left side of the Δh_μ-Δd_μ plane in Fig. 7, along with the ratchet mixed states, plotted on the 2-simplex, and the output machine. The value of Δd_μ is approximate, being based on the dimension of the ratchet mixed states, which are conjectured to have the same dimension as the input mixed states.

C. Thermodynamic taxonomy of construction and deconstruction
From its highly stochastic nature and from parameter sweeps like the one shown in Appendix D's Fig. 9, we conclude that for almost all parameters the Mandal-Jarzynski ratchet is only able to construct patterns with infinite sets of predictive features (mixed states). We conjecture that, likewise, it is only able to perfectly deconstruct infinite patterns. Interestingly, the input and output mixed-state sets and their dimensions are asymmetric. The asymmetry is visible in Fig. 7, which sketches the Δh_μ-Δd_μ plane and shows examples of the Mandal-Jarzynski ratchet operating in all four quadrants. Infinite-state output constructed by the ratchet may span the simplex, but the mixed states of the ratchet, while acting as a deconstructor, always lie along a line in the simplex. This implies that while the ratchet may construct patterns up to Δd_μ = 2.0, it is only able to deconstruct patterns up to Δd_μ = −1.0. The difference in Δd_μ between these two modalities points to a difference in memory-resource divergence for pattern construction versus pattern deconstruction.
This asymmetry is not necessarily surprising. Recall the asymmetry in the ratchet's ability to randomize and derandomize behavior: the combined area of the dud and engine regions in Fig. 6 comprises the ratchet's randomizing regime, while the derandomizing regime is the comparatively small eraser region. One interpretation of this asymmetry comes from the thermodynamic constraints relating Δh_μ and ⟨W⟩: while an increase in Δh_μ is thermodynamically unbounded, the second law constrains Δh_μ to drop only as low as the minimum asymptotic work allows. This strongly suggests that there is a thermodynamic taxonomy of structural transformation, one that parallels our existing thermodynamic taxonomy of randomness transformation. We must leave finding such a taxonomy, and the analysis of more general ratchets with input-dependent structural behavior, to the future.

VII. RELATED EFFORTS
We can now place the preceding methods and results in the context of prior efforts to identify the thermodynamic functioning of information engines. In short, having revealed the challenge of exact entropy calculations and the inherent divergence in structural complexity, these methods appear to call for a substantial reevaluation of previous claims. We start by noting a definitional difference and then turn to more consequential comparisons.
The framework of information reservoirs discussed here differs from alternative approaches to the thermodynamics of information processing, which include (i) active feedback control by external means, where the thermodynamic account of the demon's activities tracks the mutual information between measurement outcomes and system state [19-31]; (ii) the multipartite framework where, for a set of interacting stochastic subsystems, the second law is expressed via their intrinsic entropy production, correlations among them, and transfer entropy [32-35]; and (iii) steady-state models that invoke timescale separation to identify a portion of the overall entropy production as an information current [36,37]. A unified approach to these perspectives was attempted in Refs. [38-40].
These differences noted, Maxwellian demon-like models designed to explore plausible automated mechanisms that do useful work by decreasing physical entropy, at the expense of a positive change in reservoir Shannon information, have been broadly discussed elsewhere [6,36,41-45]. However, these too neglect correlations among the information-bearing components and, in particular, the mechanisms by which those correlations develop over time. In effect, they account for thermodynamic information processing by replacing the Shannon information of the components as a whole with the sum of the components' individual Shannon informations. Since the latter is larger than the former [46], using it can lead to either stricter or looser bounds than the correct bound derived from differences in total configurational entropies. Of more concern, though, bounds that ignore correlations can simply be violated. Finally, and just as critically, these bounds refer to configurational entropies, not the intrinsic dynamical entropy over system trajectories: the Kolmogorov-Sinai entropy. A more realistic model was suggested in Ref. [44]. Issues aside, these designs have been extended to enzymatic dynamics [47], stochastic feedback control [48], and quantum information processing [49,50].
In comparison, our approach expands on Ref. [7], which considers a demon in which all correlations among the system components are addressed and accounted for. As shown above, this has significant impact on the analysis of demon thermodynamic functionality. To properly account for correlations, we developed a suite of tools for quickly and efficiently analyzing nonunifilar HMMs and related stochastic controllers, removing the mathematical intractability of analyzing correlations for arbitrary demons. We note that our approach and results are consistent with analyses that consider the entropy of the system as a whole and thereby treat correlations implicitly, an approach epitomized by Ref. [51]. Since correlations are not ignored there, that approach is fully consistent with our treatment. That said, insofar as such work does not address a specific partitioning of the system, it does not offer an explicit accounting of the system's internal correlations, as is done here. As previously discussed, one may derive information-ratchet-type results from that approach by considering an explicit partitioning [12,14,52]. While the results are consistent, leaving the role of correlations implicit does not allow investigating how best to leverage them, nor does it give a way to analyze internal computational structure. These remarks highlight the importance of explicitly considering information-engine-style partitioning.
The dynamical-systems methods additionally allowed us to consider a demon's internal structure, which had previously been investigated only for unifilar ratchets in Ref. [12]. From engineering and cybernetics to biology and now physics, questions of structure, and of how an agent (here, the ratchet) interacts with and leverages its environment (the input), are topics of broad interest [53,54]. General principles for how an agent's structure must match that of its environment will become essential tools for understanding how to take thermodynamic advantage of correlations in structured environments, whether those correlations are temporal or spatial. Ashby's law of requisite variety (a controller must have at least as much variety as its input so that the whole system can adapt to and compensate for that variety and achieve homeostasis [53]) was an early attempt at such a general principle of regulation and control. For information engines, a controller's variety should match that of its environment [14]. Paralleling this, but somewhat surprisingly, we showed above that for the Mandal-Jarzynski ratchet to extract patterns from its environment, the input must have an uncountably infinite set of memory states synchronized to the ratchet's current mixed state. One cannot but wonder how such requirements manifest physically in adaptive thermodynamic nanoscale devices and biological agents.

VIII. CONCLUSIONS
Thermodynamic computing has blossomed, of late, into a vibrant and growing research domain, driven by applications and experiments [2,19,28,32,36,39,47,48,51,55-60]. As such, it is vital that analytical tools accurately relate information processing and thermodynamic functionality. While the original class of Maxwellian information engines was flexible and well suited to specific applications, accurate analysis and correct functional classifications were previously hampered by the challenge of determining the entropy rate of temporally correlated sequences, sequences that are inevitably induced by Maxwellian ratchets or are present in their possible environments. Previously useful and seemingly reasonable approximations to the entropy rate are not up to this task. As we demonstrated, they can fail miserably, even leading to incorrect attributions of thermodynamic function and, worse, to violations of the second law.
Here, we introduced techniques from dynamical systems and ergodic theory (dimension theory, iterated function systems, and random matrix theory) that overcome these hurdles and, in the process, constructively solve Blackwell's longstanding question of the entropy rate of processes generated by hidden Markov models. They allow us to accurately determine the thermodynamic functioning of Maxwellian information engines with arbitrary ratchet design, over all possible inputs. In this way, the results significantly expand the set of analyzable engines. In short, this shifts the research program from studying highly constrained toy examples to broadly surveying engine designs, a boon to theory, experiment, and engineering.
Furthermore, these tools allowed us to look under the hood, so to speak: to examine not merely quantitative changes in the intrinsic randomness of processes, but also how ratchets impact structure and correlation. Most strikingly, we showed that, in general, stochastic ratchets generate outputs whose optimal prediction requires uncountably infinite sets of predictive features, even when driven by trivial (temporally uncorrelated) input.

APPENDIX A: MIXED-STATE PRESENTATIONS

From this follows the probability of transitioning from mixed state η(w) to η(wx) on observing symbol x. This defines the mixed-state dynamic W over the mixed states. Together, the mixed states and their dynamic give the HMM's mixed-state presentation (MSP) U = {R, W} [15].
Given an HMM presentation, though, we can explicitly calculate its MSP. The probability of generating symbol x when in mixed state η is Pr(x|η) = η T^{(x)} 1, with T^{(x)} the HMM's labeled transition matrices and 1 a column vector of 1's. Upon seeing symbol x, the current mixed state η_t is updated to η_{t+1} = η_t T^{(x)} / (η_t T^{(x)} 1) [Eq. (A3)], with η_0 = η(λ) = π and λ the null sequence. Thus, given an HMM presentation, we can calculate the mixed state of Eq. (A1) by iterating Eq. (A3) over the symbols of w.
The mixed-state transition dynamic is then Pr(η_{t+1} = η', x | η_t = η) = Pr(x|η) when η' is the update of Eq. (A3) and zero otherwise [Eq. (A5)], since Eq. (A3) tells us that, by construction, the MSP is unifilar. That is, the next mixed state is a function of the previous one and the emitted (observed) symbol.
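A sketch of this construction: given labeled transition matrices, iterate the deterministic update (η, x) → η′ and enumerate the transient mixed states breadth first. The two-state matrices are illustrative; the walk makes the unifilarity of the MSP concrete, since each (η, x) pair yields exactly one successor.

```python
import numpy as np

# Illustrative labeled transition matrices; T[0] + T[1] is row stochastic.
T = [np.array([[0.0, 0.5], [0.3, 0.0]]),
     np.array([[0.2, 0.3], [0.0, 0.7]])]

def next_mixed_state(eta, x):
    """Unifilar MSP step: returns (eta', Pr(x | eta))."""
    v = eta @ T[x]
    p = v.sum()
    return (v / p if p > 0 else None), p

# Stationary distribution pi of T[0] + T[1] via the left eigenvector of
# eigenvalue 1.
M = T[0] + T[1]
w, vecs = np.linalg.eig(M.T)
pi = np.real(vecs[:, np.argmax(np.real(w))])
pi = pi / pi.sum()

# Breadth-first enumeration of mixed states up to word length 3.
frontier, seen = [tuple(np.round(pi, 10))], set()
for _ in range(3):
    nxt = []
    for eta in frontier:
        seen.add(eta)
        for x in (0, 1):
            eta2, p = next_mixed_state(np.array(eta), x)
            if eta2 is not None and p > 0:
                nxt.append(tuple(np.round(eta2, 10)))
    frontier = nxt
n_states = len(seen | set(frontier))
print(f"{n_states} distinct mixed states after 3 symbols")
```

For a nonunifilar HMM this enumeration typically never closes, which is exactly the uncountable-state phenomenon discussed next.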
Transient mixed states are the state distributions after having seen finite sequences w, while recurrent mixed states are those remaining with positive probability in the limit of word length ℓ → ∞. When their set is minimized, recurrent mixed states exactly correspond to causal states S [61]. Now, with a unifilar presentation one is tempted to directly apply Eqs. (6) and (7) to compute measures of randomness and structure, but another challenge prevents this. With a small number of exceptions, the MSP of a process generated by a nonunifilar HMM has an uncountable infinity of states η [18]. Practically, this means that one cannot construct the full MSP, that direct application of Eq. (6) to compute the entropy rate is not feasible, and that |S| diverges and, typically, so does C_μ.

Entropy rate of nonunifilar processes
The collection of mixed states over all of a process's allowed sequences, i.e., ℓ → ∞, and the mixed-state dynamic W induce a (Blackwell) measure μ on the (N − 1)-dimensional simplex R of state distributions Pr(R). The entropy rate is then h_μ = −∫_R dμ(η) Σ_x Pr(x|η) log_2 Pr(x|η) [Eq. (A7)], where Pr(x|η) is the mixed-state dynamic of Eq. (A5) and x is the first symbol of an arbitrarily long sequence generated by the mixed-state process. This handily addresses accurately estimating the entropy rate of nonunifilar processes. And so, we are left to tackle the issue of these processes' structure with the statistical complexity dimension, which requires a deeper investigation of the MSP.
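The Blackwell average can be estimated by sampling: iterate the mixed-state dynamic along a long simulated sequence and average the symbol entropy of Pr(x|η). A minimal sketch with illustrative two-state matrices, also comparing against the single-symbol entropy H_1 (which always upper bounds h_μ):

```python
import numpy as np

rng = np.random.default_rng(1)
# Illustrative labeled matrices; T[0] + T[1] is row stochastic.
T = [np.array([[0.0, 0.5], [0.3, 0.0]]),
     np.array([[0.2, 0.3], [0.0, 0.7]])]

def h2(p):
    """Shannon entropy (bits) of a distribution p."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

eta = np.full(2, 0.5)
h_mu_sum, counts, burn, L = 0.0, np.zeros(2), 2000, 60_000
for t in range(L):
    p = np.array([(eta @ Tx).sum() for Tx in T])   # Pr(x | eta)
    if t >= burn:
        h_mu_sum += h2(p)      # sample of the Blackwell integrand
    x = rng.choice(2, p=p)
    counts[x] += t >= burn
    v = eta @ T[x]
    eta = v / v.sum()          # mixed-state update

h_mu = h_mu_sum / (L - burn)               # estimate of Eq. (A7)
H1 = h2(counts / counts.sum())             # single-symbol entropy
print(f"h_mu ~ {h_mu:.3f} bits/symbol <= H_1 ~ {H1:.3f}")
```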

Statistical complexity dimension
C_μ diverges for processes generated by generic HMMs, as they are typically nonunifilar and that, in turn, leads to an uncountable infinity of mixed states. To quantify these processes' memory resources one tracks the rate of divergence, the statistical complexity dimension d_μ of the Blackwell measure μ on R: d_μ = lim_{ε→0} H_ε[R] / log_2(1/ε), where H_ε[Q] is the Shannon entropy (in bits) of the continuous-valued random variable Q coarse grained at size ε, and R is the random variable associated with the mixed states η ∈ R. d_μ is determined by the measured process's entropy rate h_μ^B, as given by Eq. (A7), and the mixed-state process's spectrum of Lyapunov characteristic exponents (LCEs). The latter is calculated from an HMM's labeled transition matrices, which map the mixed states η_t ∈ R according to Eq. (A3). The LCE spectrum Λ = {λ_1, λ_2, ..., λ_N : λ_i ≥ λ_{i+1}} is determined by time averaging the contraction rates along the N eigendirections of this map's Jacobian. The statistical complexity dimension is then bounded by a modified form of the LCE dimension [62]: d_μ ≤ d_LCE = k + (h_μ^B + Σ_{i=1}^k λ_i) / |λ_{k+1}|, where k is the greatest index for which h_μ^B + Σ_{i=1}^k λ_i > 0. Reference [11] introduces this bound for an HMM's statistical complexity dimension, interprets the conditions required for its proper use, and explains in fuller detail how to calculate an HMM's LCE spectrum.
In short, the set of mixed states generated by a generic HMM is equivalent to the attractor of a nonlinear, place-dependent iterated function system (IFS). Exactly calculating dimensions, say d_μ, of such sets is known to be difficult. This is why we adapt d_LCE to iterated function systems here. The estimate is conjectured to be accurate in "typical systems" [62-64]. Even so, in certain cases where the IFS does not meet the open set condition [64], the relationship becomes a strict inequality: d_μ < d_LCE. This case, which is easily detected from an HMM's form, is discussed in more detail in Ref. [11].
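For a two-state HMM the mixed-state simplex is one dimensional and the LCE-dimension estimate reduces to h/|λ|, capped at the simplex dimension. A sketch under that assumption, with illustrative matrices chosen so that both symbols occur with probability 1/2 from every mixed state (hence h = 1 bit exactly) and the IFS maps are affine:

```python
import numpy as np

rng = np.random.default_rng(2)
# Strictly positive labeled matrices: nonunifilar, contracting IFS maps.
T = [0.5 * np.array([[0.6, 0.4], [0.3, 0.7]]),
     0.5 * np.array([[0.8, 0.2], [0.4, 0.6]])]

def f(q, x):
    """Mixed-state update restricted to the 1-simplex coordinate
    q = Pr(state 0)."""
    eta = np.array([q, 1.0 - q]) @ T[x]
    return eta[0] / eta.sum()

q, lam, h, n, eps = 0.5, 0.0, 0.0, 20_000, 1e-6
for _ in range(n):
    p = np.array([(np.array([q, 1 - q]) @ Tx).sum() for Tx in T])
    h += -(p * np.log2(p)).sum()           # Blackwell entropy sample
    x = rng.choice(2, p=p)
    # Time-average log2 of the local contraction rate (finite difference).
    lam += np.log2(abs(f(q + eps, x) - f(q - eps, x)) / (2 * eps))
    q = f(q, x)
lam, h = lam / n, h / n
d_lce = min(1.0, h / abs(lam)) if lam < 0 else 1.0
print(f"h ~ {h:.3f} bits, lambda ~ {lam:.3f}, d_LCE ~ {d_lce:.3f}")
```

Here the attractor is a self-similar Cantor-like subset of the interval, and d_LCE quantifies how fast coarse-grained state counts diverge as resolution increases.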

APPENDIX B: MANDAL-JARZYNSKI RATCHET
To work with the Mandal-Jarzynski ratchet, we reformulated it in computational mechanics terms, as explained in Sec. II. In its original conception, the model was imagined as a single symbol ("bit") interacting with a dial that may smoothly transition between three positions, as shown on the left in Fig. 3. This results in six possible states of the joint dial-symbol system {A ⊗ 0, A ⊗ 1, B ⊗ 0, B ⊗ 1, C ⊗ 0, C ⊗ 1}. The transitions among these six states are modeled as a Poisson process, where R_ij is the infinitesimal transition probability from state j to state i, with i, j ∈ {A ⊗ 0, ..., C ⊗ 1} [6]. The weight parameter ε, so named because it is intended to model the effect of attaching a mass to the side of the dial, impacts the probabilities of transitions among the six states by making 0 → 1 transitions energetically distinct from 1 → 0 transitions. This creates a preferred "rotational direction," since bit flips in one direction are more energetically beneficial than in the other. This is what allows the ratchet to do useful work.
Explicitly, the transition rate matrix R follows from these rates. To express the ratchet's evolution over a single interaction interval of length τ, we calculate T(τ, ε) = e^{R(ε)τ}, the transition matrix of the six-state Markov model representing the Mandal-Jarzynski model. In turn, this six-state model, with states {A ⊗ 0, A ⊗ 1, B ⊗ 0, B ⊗ 1, C ⊗ 0, C ⊗ 1}, may be transformed into a three-state transducer, with states {A, B, C} and input and output symbols in {0, 1}. To do this, we define projection matrices P_in and P_out that select the joint states carrying a given symbol value. The transducer input-output matrices K^{in,out}(τ, ε) for a given T(τ, ε) are then given by K^{in,out}(τ, ε) = (P_in)^T T(τ, ε) P_out.
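The pipeline from rate matrix to transducer blocks can be sketched as follows. The generator entries and the projection convention are placeholders, not the Mandal-Jarzynski values, and the matrix exponential is a plain Taylor series, adequate for small matrices:

```python
import numpy as np

def expm(A, terms=40):
    """Matrix exponential by plain Taylor series (fine for small A)."""
    out, term = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

rng = np.random.default_rng(3)
# Placeholder 6-state generator: column sums zero, matching the convention
# that R_ij is the rate from state j to state i.
R = rng.random((6, 6))
np.fill_diagonal(R, 0.0)
np.fill_diagonal(R, -R.sum(axis=0))   # probability conserved

tau = 0.5
Tfull = expm(R * tau)                 # 6x6 over joint states {A,B,C} x {0,1}

# With joint-state order (A0, A1, B0, B1, C0, C1), P[b] selects symbol b.
P = [np.kron(np.eye(3), np.array([[1.0], [0.0]])),
     np.kron(np.eye(3), np.array([[0.0], [1.0]]))]

# K[(x_in, x_out)]: 3x3 ratchet-state block for reading x_in and writing
# x_out (row/column conventions assumed for illustration).
K = {(i, o): P[o].T @ Tfull @ P[i] for i in (0, 1) for o in (0, 1)}

# Conservation check: for each read symbol, summing over written symbols
# gives a column-stochastic 3x3 matrix.
for i in (0, 1):
    Mi = K[(i, 0)] + K[(i, 1)]
    assert np.allclose(Mi.sum(axis=0), 1.0)
print("transducer blocks conserve probability")
```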

Composing a ratchet with an input process' machine
Given an input process generated by an HMM with transition matrices T^{(x)}, x ∈ {0, 1}, we may exactly calculate the transition matrices T'^{(x')} of the output process's HMM: T'^{(x')}_{(R_N, S_N),(R_{N+1}, S_{N+1})} = Σ_x K^{(x,x')}_{R_N, R_{N+1}} T^{(x)}_{S_N, S_{N+1}}, noting that the state space of the output HMM is the Cartesian product of the state space of the transducer R and the state space of the input machine S. Although presented in the setting of the Mandal-Jarzynski ratchet, this method applies to any input machine and transducer, provided the transducer is able to recognize the input [65]. That said, there are several interesting points specific to the Mandal-Jarzynski ratchet we should highlight. As noted in the previous section, the Mandal-Jarzynski transducer matrices are entrywise positive, guaranteeing that the output machine is nonunifilar, although disallowed state transitions in input machines are preserved in the output. (Composing the Mandal-Jarzynski ratchet with the golden mean process in Fig. 3 illustrates this effect.) This is characteristic of any transducer defined via the rate-transition-matrix method outlined above. The conclusion is that the techniques required to analyze nonunifilar HMMs are required in general.
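The composition can be checked mechanically: for each output symbol, sum Kronecker products over the hidden input symbol. The transducer and input matrices below are illustrative, not the Mandal-Jarzynski values.

```python
import numpy as np

rng = np.random.default_rng(4)

# Input HMM: 2 states, symbols {0,1}; Tin[0] + Tin[1] is row stochastic.
Tin = [np.array([[0.0, 0.5], [1.0, 0.0]]),
       np.array([[0.5, 0.0], [0.0, 0.0]])]

# Transducer: 3 ratchet states; K[(x, y)] maps ratchet states while
# reading x and writing y. Sum over y of K[(x, y)] is row stochastic.
def rand_split(n):
    A, B = rng.random((n, n)), rng.random((n, n))
    Z = (A + B).sum(axis=1, keepdims=True)
    return A / Z, B / Z

K = {}
for x in (0, 1):
    K[(x, 0)], K[(x, 1)] = rand_split(3)

# Output machine: 6 = 3 x 2 states, one labeled matrix per output symbol
# y, summing Kronecker products over the hidden input symbol x.
Tout = {y: sum(np.kron(K[(x, y)], Tin[x]) for x in (0, 1)) for y in (0, 1)}
assert np.allclose((Tout[0] + Tout[1]).sum(axis=1), 1.0)
print("output machine is stochastic over", Tout[0].shape[0], "states")
```

Note that the zero entries of the input matrices survive in the output blocks, mirroring the observation that disallowed input transitions are preserved under composition.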

APPENDIX C: BIASED COIN PARAMETER SWEEP
As Sec. V A discusses, we recreated the results for Mandal and Jarzynski's original ratchet [6] using the techniques outlined in this section and in Appendix A. There, the ratchet is driven by a memoryless biased coin and the functional thermodynamic regions are identified via Eq. (3) [6]. These results are shown in Fig. 8 (left) and demonstrate close agreement with the original results. As previously noted, calculating the thermodynamic regions via Eq. (5) does not significantly change the identified regions, as comparison with Fig. 8 (right) shows. Although not shown here, we also recreated the results at τ = 10, which again show strong agreement with the results reported in Ref. [6].

APPENDIX D: INFORMATION RATCHET MIXED-STATE ATTRACTOR SURVEY
To emphasize how exploring a ratchet's mixed states elucidates the underlying physics, Fig. 9 presents the attractors of the Mandal-Jarzynski (MJ) ratchet driven by the biased coin, as a function of ε and the input bias δ, in analogy with Fig. 8. Each square in the grid shows the mixed-state attractor of the output HMM produced by composing the Mandal-Jarzynski ratchet at the given ε with a biased coin at the given bias δ. The grid is laid out identically to the functional thermodynamic plots above, with ε varying on the y axis and the input bias δ varying on the x axis. Note that the squares are not at the same scale: each is magnified to show the structure of the attractor, with the magnification factor given in the lower right corner. Compare with Fig. 4 to see the mixed-state attractors in further detail. Additionally, the attractors are color coded to show thermodynamic functionality: red for engines, blue for erasers, and black for duds.
The symmetry of the Mandal-Jarzynski ratchet around ε = 0 is revealed by how the structure of the output-HMM attractors is reflected and reversed over the ε = δ line. Along this diagonal, the mixed-state attractor collapses to a single point. This reflects the fact that for any ε = δ the output HMM is the input biased coin, so ⟨W⟩ = ΔH_1 = Δh_μ = 0. Furthermore, we see that the structure of the mixed-state attractor does not strongly determine thermodynamic functionality: very similar attractors act as duds and as erasers on either side of the ε = 0 line. This is as expected since, although thermodynamic functionality appears to change suddenly, the grids in Figs. 6, 8, and 9 actually sweep over output machines with smoothly changing transition probabilities. The changes in functionality marked by the boundaries of the thermodynamic regions are due to small, smooth changes in the comparative magnitudes of ⟨W⟩ and Δh_μ. Figure 9 illustrates this clearly, as the mixed-state attractor changes smoothly under the parameter sweep.
Note that the construction of Fig. 9 was only possible due to the dynamical-systems techniques outlined in Appendices A 2 and A 3. The recently developed guarantee of ergodicity and the quick generation of mixed states allow us to easily plot and investigate the mixed-state attractors of arbitrary HMMs. This, in turn, allows parameter sweeps over the attractors of HMM families and rapid calculation of their entropy rates. The latter was required to determine the thermodynamic-functionality color coding in Fig. 9.
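The quick generation of mixed states referred to here can be sketched as random iteration of the mixed-state (Bayesian-update) map. The following minimal illustration uses a made-up two-state, binary-alphabet HMM, not one of the ratchet HMMs analyzed above:

```python
import numpy as np

# T[x] is the substochastic transition matrix labeled by symbol x;
# their sum T[0] + T[1] is row stochastic.
rng = np.random.default_rng(0)
T = {0: np.array([[0.3, 0.2],
                  [0.1, 0.0]]),
     1: np.array([[0.0, 0.5],
                  [0.6, 0.3]])}

eta = np.array([0.5, 0.5])   # initial mixed state on the 1-simplex
attractor = []
for t in range(5000):
    # Pr(x | eta) for each symbol x
    probs = [eta @ T[x] @ np.ones(2) for x in (0, 1)]
    x = rng.choice(2, p=probs)            # sample the next symbol
    eta = (eta @ T[x]) / probs[x]         # Bayesian update: next mixed state
    if t > 500:                           # discard the transient
        attractor.append(eta.copy())

attractor = np.array(attractor)           # points on the mixed-state attractor
```

Plotting `attractor` (here points on the 1-simplex; for a three-state HMM, on the 2-simplex) gives pictures analogous to Figs. 4 and 9.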
of ε. As τ → ∞ these regions grow in size, until at large τ the only parameter region capable of pattern deconstruction is δ = ε. This is where the ratchet becomes memoryless, so it is trivially a pattern deconstructor along this line.

FIG. 4. Mixed states η ∈ R of the output process generated by the ratchet driven with memoryless input [Fig. 2 (top row)], plotted on the 2-simplex. Corner labels give the mixed-state probability distributions η = (Pr(A ⊗ D), Pr(B ⊗ D), Pr(C ⊗ D)). Mixed states at the simplex corners correspond to the HMM being in exactly one of its states, while mixed states in the simplex interior are mixtures of the possible HMM states, with η = (1/3, 1/3, 1/3) lying at the center. (Left) Ratchet parameters δ = −0.98, ε = 0.01, and τ = 0.1. (Right) Ratchet parameters δ = 0.4, ε = 0.5, and τ = 0.1. Insets: detail of the mixed-state sets, magnified by the amount indicated in the upper right corner.

FIG. 5. Asymptotic work production W, single-symbol ΔH_1 bound, and Kolmogorov-Sinai-Shannon Δh_μ bound when the ratchet is driven by period-2 memoryful input. Since the input has no parameters, the parameter sweep is only over ε, with τ = 10.

mixed-state presentations to estimate h_μ. Comparing Eqs. (3) and (5) in Fig. 5 to the asymptotic work production shows that Eq. (3) is not violated. As predicted above, it is a tighter bound on W than Eq. (5). Although it may seem desirable to use the tighter bound, the single-symbol and entropy-rate bounds identify the ratchet's thermodynamic functioning differently: Since W ≤ k_B T ln 2 ΔH_1 < 0 for all values of ε, the single-symbol entropy bound classifies the ratchet as an eraser, dissipating work to reduce the randomness in the input. However, when considering temporal correlations, we see that the ratchet is in plain fact a dud: Δh_μ > 0. That is, the ratchet dissipates work while increasing the tape's intrinsic randomness. This mischaracterization of thermodynamic function by the single-symbol entropy highlights an important lesson: Bounding the asymptotic work production as tightly as possible is not the same as correctly identifying the functional thermodynamics. As Ref. [12] recently showed, rather than being merely a bound, Eq. (5) is meaningful only when comparing k_B T ln 2 Δh_μ to W. The difference between the two quantifies the amount of work the ratchet could do if it were an optimal, globally integrated information processor. This shows that even when it may appear to outperform Eq. (5), in general Eq. (3) cannot serve as a reliable bound on asymptotic work production. We return to this in our final example, where applying Eq. (3) implies a violation of the second law.
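The gap between the single-symbol quantity and the true entropy rate is easy to reproduce on a toy example. For a period-2 sequence (an illustrative stand-in, not the ratchet's actual output), the single-symbol entropy H_1 is a full bit even though the entropy rate h_μ vanishes; a sketch using block-entropy differences H(L) − H(L−1) to estimate h_μ:

```python
import numpy as np
from collections import Counter

def block_entropy(seq, L):
    """Shannon entropy (bits) of the length-L blocks occurring in seq."""
    counts = Counter(tuple(seq[i:i + L]) for i in range(len(seq) - L + 1))
    p = np.array(list(counts.values()), dtype=float)
    p /= p.sum()
    return float(-(p * np.log2(p)).sum())

# Period-2 sequence: each symbol alone looks like a fair coin (H_1 = 1 bit),
# yet the process is perfectly predictable (h_mu = 0).
seq = [0, 1] * 5000
H1 = block_entropy(seq, 1)
h_mu_est = block_entropy(seq, 11) - block_entropy(seq, 10)  # -> h_mu as L grows
```

Here `H1` is exactly 1 bit while `h_mu_est` is numerically zero, so any bound built from single-symbol statistics alone badly misjudges the process's intrinsic randomness.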

FIG. 9. Mixed-state attractors of the output HMMs of the Mandal-Jarzynski ratchet driven by a biased coin, as a function of ε and input bias δ, given above each square; (ε, δ) ∈ [0, 1] × [−1, 1]. Each plot shows 1000 mixed states from the attractor, at the magnification noted in the lower right corner. The attractors may be compared with Fig. 8 to determine thermodynamic functionality.
h_μ = −∫_R dμ(η) Σ_{x∈A} Pr(x|η) log_2 Pr(x|η). (A6)

However, this expression was previously intractable, due to the often fractal nature of the Blackwell measure μ(η) and the MSP's uncountable state space. Recently, Refs. [10,11] introduced a constructive approach to evaluating this integral by establishing contractivity of the simplex maps, the substochastic transition matrices of Definition 1, showing that the generation of mixed states is ergodic. Given this, rather than integrate over μ(η) as required by Blackwell's Eq. (A6), we may use a time average over an iterated sequence of mixed states to obtain the entropy rate [10,11].
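A minimal sketch of this time-average estimator, using the golden-mean process (a standard textbook example, not one of the paper's ratchets) whose exact entropy rate is 2/3 bit per symbol:

```python
import numpy as np

# Time-average entropy-rate estimate: iterate the mixed-state map and
# average the symbol entropy -sum_x Pr(x|eta_t) log2 Pr(x|eta_t).
rng = np.random.default_rng(1)
# Golden-mean HMM: symbol 1 never occurs twice in a row.
T = {0: np.array([[0.5, 0.0],
                  [1.0, 0.0]]),
     1: np.array([[0.0, 0.5],
                  [0.0, 0.0]])}

eta = np.array([1.0, 0.0])
total, N = 0.0, 20000
for t in range(N):
    probs = np.array([eta @ T[x].sum(axis=1) for x in (0, 1)])  # Pr(x|eta)
    total += -sum(p * np.log2(p) for p in probs if p > 0)
    x = rng.choice(2, p=probs)
    eta = (eta @ T[x]) / probs[x]   # next mixed state

h_mu = total / N   # approaches 2/3 bit per symbol for this process
```

The same loop, run over the mixed states of a ratchet's output HMM, yields the h_μ values used for the thermodynamic classifications above.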