Non-Markovian Momentum Computing: Universal and Efficient

All computation is physically embedded. Reflecting this, a growing body of results embraces rate equations as the underlying mechanics of thermodynamic computation and biological information processing. Strictly applying the implied continuous-time Markov chains, however, excludes a universe of natural computing. We show that expanding the toolset to continuous-time hidden Markov chains substantially removes the constraints. We make the general point concrete by analyzing two eminently useful computations that are impossible to describe with a set of rate equations over the memory states. We design and analyze a thermodynamically costless bit flip, providing a first counterexample to rate-equation modeling. We generalize this to a costless Fredkin gate, a key operation in reversible computing that is computation universal. Going beyond rate-equation dynamics is not only possible, but necessary if stochastic thermodynamics is to become part of the paradigm for physical information processing.

The burgeoning field of thermodynamic computing leverages recent progress in nonequilibrium thermodynamics and in information and computation theories [1-5] to establish a new physical paradigm for computation [6]. Working upward from fundamental laws of physics, it promises to increase computational efficiency and power and to reduce energy dissipation in a next generation of computers. More broadly, a general framework rooted in thermodynamics, as thermodynamic computing is, will provide the tools to understand the physics of computation in all its many forms. However, recent efforts inadvertently and unnecessarily limit its potential scope. The following illustrates the breadth of that scope by introducing non-Markovian momentum computing, which is both computation universal and thermodynamically efficient.
A computation over a time interval $t \in (0, \tau)$ is described by the conditional input-output mapping between memory states $m(0), m(\tau) \in \mathcal{M}$:
$$p_{m(0) \to m(\tau)} = \Pr[m(\tau) \mid m(0)].$$
The mapping p gives the probability of the final memory state given the initial memory state. Attempts to establish a general framework for the required mappings in thermodynamic computing have assumed that the memory state m is a physical state of a system: it obeys stochastic Markovian dynamics, with time evolution depending only on the system's current state [7, 8, and references therein]. In particular, weak coupling to a thermal bath and separation of time scales are often invoked to justify such Markovian thermodynamic behavior [2, 9-12]. Separating the memory system's time scale from that of the heat bath eliminates any memory in the heat bath from the system's behavior, since the heat bath is assumed to relax rapidly to a local equilibrium. As a result, transitions among memory states may be stochastic, but they are Markovian: they depend only on the current state [2].
Thus, these systems obey continuous-time Markov chains (CTMCs), which are equivalently represented by rate equations [7, 8]. Heat-bath interactions allow for a broader suite of behaviors among memory states than purely deterministic Hamiltonian evolution allows on its own. However, this framework is still restrictive. For example, when memory-state dynamics are restricted to obey CTMCs, only input-output mappings whose matrices have positive determinant are allowed. And this, for better or worse, eliminates a wide range of possible and common computations, including flipping a single bit of information [13]. So, while CTMCs are a powerful framework for stochastic thermodynamics [2, 14, 15], they fail to capture a broad swath of physically realizable computing. They neglect the possibility that physical variables carry hidden memory of the past beyond the immediate computational memory configuration. In a thermodynamic system consisting of a collection of particle positions and momenta $(\vec{x}, \vec{p})$, the dynamics are Langevin: stochastic, but Markovian [16]. The underlying dynamic that governs the system's full microstate and the thermal bath together, though, is deterministic and Hamiltonian. The system's stochastic evolution results from coarse-graining over the bath degrees of freedom. Focusing on the system's Markov dynamic alone is a conventional modeling strategy, especially since one is typically uninterested in the bath's details. Similarly, if the system's memory states are determined only by particle position, that choice of state coarse-grains away an additional component of the system and bath: system momentum. However, unlike the microstates of an ideal thermal bath, system momentum may carry memory of past behavior, and this memory contributes to the memory-state dynamics. Soberingly, the analytical treatment of partially observed (and therefore non-Markov) systems is highly nontrivial [5, 12, 17-19].
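The determinant restriction can be checked numerically. A CTMC with rate matrix Q produces the transition matrix $e^{tQ}$, whose determinant is $e^{t\,\mathrm{tr}\,Q} > 0$ by Jacobi's formula, whereas the bit-flip matrix has determinant -1. The sketch below is an illustration, not code from the paper: the helper names `expm_taylor` and `random_rate_matrix` are hypothetical, the scaling-and-squaring exponential is hand-rolled so that only NumPy is needed, and random two-state generators are an assumption chosen for illustration.

```python
import numpy as np

def expm_taylor(a, terms=30, squarings=10):
    """Matrix exponential via scaling and squaring of a truncated Taylor series."""
    scaled = a / 2.0**squarings
    result = np.eye(a.shape[0])
    term = np.eye(a.shape[0])
    for n in range(1, terms):
        term = term @ scaled / n
        result = result + term
    for _ in range(squarings):
        result = result @ result
    return result

def random_rate_matrix(n_states, rng):
    """Random CTMC generator: nonnegative off-diagonal rates, rows summing to zero."""
    q = rng.uniform(0.0, 2.0, size=(n_states, n_states))
    np.fill_diagonal(q, 0.0)
    np.fill_diagonal(q, -q.sum(axis=1))
    return q

# Bit flip p_{m(0) -> m(tau)}: rows are inputs, columns are outputs.
bit_flip = np.array([[0.0, 1.0],
                     [1.0, 0.0]])
print("det(bit flip) =", np.linalg.det(bit_flip))

rng = np.random.default_rng(0)
dets = []
for _ in range(1000):
    q = random_rate_matrix(2, rng)
    t = rng.uniform(0.1, 2.0)
    p = expm_taylor(t * q)
    dets.append(np.linalg.det(p))
    # Jacobi's formula: det(exp(tQ)) = exp(t * trace(Q)), which is always positive.
    assert np.isclose(dets[-1], np.exp(t * np.trace(q)))
print("smallest CTMC determinant:", min(dets))
```

Every sampled CTMC transition matrix has a strictly positive determinant, so none can equal the bit-flip matrix, whose determinant is negative.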
That said, removing the CTMC restriction permits a broader range of computations. Thus, despite the additional analytical burden, systems that operate in the regime where hidden states carry computationally useful memory are a topic of current focus [5, 20-24]. We argue that the appropriate setting for thermodynamic computing is continuous-time hidden Markov chains (CTHMCs), in which hidden variables may store computationally relevant information. Recently, this setting was recognized as sufficient for a broad class of input-output mappings p, by introducing ancillary hidden states that implement sequences of logical operations, each of which individually obeys CTMC dynamics [8, 25]. However, CTHMCs implement more general computations still. The following first implements a thermodynamically costless bit flip, a simple computation that is explicitly forbidden by CTMCs. It then generalizes this to a costless Fredkin gate [26], a key component in reversible computing that is also impossible to implement with CTMCs. Implementing this universal and reversible logic gate via CTHMCs demonstrates that non-Markov dynamics are essential to thermodynamic computing.

Bit Flip
To execute a single bit flip over a time interval $t \in [0, \tau]$, the first step is to store a bit of information. One candidate is a particle with a single position dimension $x \in \mathbb{R}$ and corresponding momentum $p \in \mathbb{R}$ in a double-well potential energy landscape:
$$V_{DW}(x) \equiv \alpha x^4 - \beta x^2,$$
where both α and β are positive and determine the locations of the potential minima, $x^* = \pm\sqrt{\beta/2\alpha}$. The parameters are set so that information is stored robustly at the beginning of the computation interval (t = 0). The particle's environment is a thermal bath at temperature T. As the height of the potential energy barrier at x = 0 rises relative to the thermal energy scale $k_B T$, the probability that the particle transitions between left (x < 0) and right (x ≥ 0) decreases exponentially.
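As a quick numerical illustration of the double well, the snippet below evaluates the well positions, the barrier height $\Delta V = \beta^2/(4\alpha)$, and the well curvature for α = 2 and β = 16, the values reported for the simulations in the appendix (with $k_B T = 1$). The Arrhenius-style factor $e^{-\Delta V / k_B T}$ is only a rough, assumed proxy for the exponential suppression of barrier crossings mentioned above.

```python
import numpy as np

alpha, beta = 2.0, 16.0          # parameters used in the appendix simulations (k_B T = 1)

def v_dw(x):
    """Double-well potential V_DW(x) = alpha x^4 - beta x^2."""
    return alpha * x**4 - beta * x**2

# Minima solve V'(x) = 4 alpha x^3 - 2 beta x = 0 away from x = 0.
x_star = np.sqrt(beta / (2 * alpha))
barrier = v_dw(0.0) - v_dw(x_star)               # equals beta^2 / (4 alpha)
curvature = 12 * alpha * x_star**2 - 2 * beta    # V''(x_star), sets each well's width
print(x_star, barrier, curvature)                # 2.0 32.0 64.0
print(np.exp(-barrier))                          # rough Arrhenius factor exp(-Delta V / k_B T)
```

A barrier of 32 $k_B T$ makes spontaneous transitions between the left and right wells negligible on the time scales considered here.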
In this way, if we assign the left half of the space to memory state 0 and the right half to memory state 1, the energy landscape can metastably store a bit m ∈ {0, 1}. To execute a flip operation, we instantaneously reduce the coupling to the thermal reservoir to zero, so that the particle follows dissipationless Hamiltonian dynamics. Simultaneously, the potential energy landscape changes to a positive quadratic well: $V(x, t = 0^+) = kx^2/2$. The resulting particle motion is harmonic oscillation:
$$x(t) = A\cos\left(\sqrt{k/m}\,t + \phi\right),$$
where A is the maximum distance from the origin over the cycle and φ is the phase offset from maximum distance at time $t = 0^+$. If we keep the decoupled system in the quadratic potential energy landscape for half the period of oscillation, $t \in (0, \pi\sqrt{m/k})$, then the particle's new position becomes
$$x(\pi\sqrt{m/k}) = A\cos(\pi + \phi) = -A\cos\phi = -x(0).$$
That is, over the computation interval $\tau = \pi\sqrt{m/k}$, the position flips sign, so the memory state flips as well: $m(\tau) = 1 - m(0)$. Finally, we instantaneously return the potential energy landscape to the original double well and recouple to the thermal bath.
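The half-period flip can be checked by integrating the frictionless motion in the quadratic well. This is a minimal sketch, assuming unit mass and stiffness and using velocity Verlet as a generic symplectic integrator; the function name `half_period_flip` and the initial conditions (positions scattered around the wells at ±2, thermal momenta at $k_B T = 1$) are hypothetical choices for illustration.

```python
import numpy as np

def half_period_flip(x0, p0, k=1.0, mass=1.0, n_steps=20000):
    """Velocity-Verlet integration in V(x) = k x^2 / 2 for tau = pi sqrt(mass / k)."""
    tau = np.pi * np.sqrt(mass / k)
    dt = tau / n_steps
    x = np.array(x0, dtype=float)
    p = np.array(p0, dtype=float)
    for _ in range(n_steps):
        p -= 0.5 * dt * k * x      # half kick
        x += dt * p / mass         # drift
        p -= 0.5 * dt * k * x      # half kick
    return x, p

rng = np.random.default_rng(0)
# Hypothetical initial conditions near the double-well minima with thermal momenta.
x0 = rng.choice([-2.0, 2.0], size=10) + rng.normal(0.0, 0.125, size=10)
p0 = rng.normal(0.0, 1.0, size=10)
x_tau, p_tau = half_period_flip(x0, p0)

m0 = (x0 >= 0).astype(int)         # memory state: 1 on the right, 0 on the left
m_tau = (x_tau >= 0).astype(int)
print(np.max(np.abs(x_tau + x0)), np.max(np.abs(p_tau + p0)))   # both tiny
print(np.array_equal(m_tau, 1 - m0))                              # True
```

Both position and momentum reverse sign, independent of the initial phase, which is why every stored bit flips regardless of its thermal initial condition.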
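The only work comes from the instantaneous changes in the potential energy at t = 0 and t = τ. Since the particle position simply flips sign between t = 0 and t = τ, and both the double-well and the quadratic landscapes are even in position, the energy invested when switching on the quadratic well is exactly recovered when switching it off. So, this time-symmetric protocol generates zero net work on every trajectory.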
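For a single trajectory, the work bookkeeping reads as follows, using $x(\tau) = -x(0)$ and the evenness of both $V_{DW}$ and $kx^2/2$:

```latex
\begin{aligned}
W &= \Big[\tfrac{k}{2}x(0)^2 - V_{DW}(x(0))\Big] + \Big[V_{DW}(x(\tau)) - \tfrac{k}{2}x(\tau)^2\Big] \\
  &= \Big[\tfrac{k}{2}x(0)^2 - V_{DW}(x(0))\Big] + \Big[V_{DW}(-x(0)) - \tfrac{k}{2}x(0)^2\Big] \\
  &= V_{DW}(-x(0)) - V_{DW}(x(0)) = 0 .
\end{aligned}
```

The first bracket is the work to switch on the quadratic well at t = 0; the second is the work to restore the double well at t = τ.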
Not only does this computation go beyond what is physically allowable according to rate-equation dynamics over the memory states, but the states only change while the Hamiltonian control is fixed. Thus, the computation is passive, meaning that it fits the information-ratchet framework introduced by Ref. [27].

Fredkin Gate
The bit-flip implementation may seem obvious in its simplicity. Can sophisticated and functional computing, in fact, be built from such simple passive processes? We answer in the affirmative by showing that a similar strategy implements the Fredkin gate, a reversible and universal logical gate [26]. This straightforwardly establishes that the CTHMC framework for thermodynamic computing gives easy access to complex and universal Turing computing.
The Fredkin gate operates on three bits, $\mathcal{M} = \{0, 1\}^3$. That is, we encode our physical system as three particle-position variables (x, y, z), each separated into negative and positive memory-state regions as above. This splits the memory states into eight respective octants: (x < 0, y < 0, z < 0) corresponds to memory state m = 000, (x < 0, y ≥ 0, z < 0) to m = 010, and so on. The information-storing potential is a straightforward sum of bistable double wells, one in each dimension:
$$V_{\text{store}}(x, y, z) = V_{DW}(x) + V_{DW}(y) + V_{DW}(z).$$
This provides metastable regions corresponding to each memory state $m_x m_y m_z \in \{0, 1\}^3$.
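The octant encoding and the stored-energy landscape can be written down directly. In this sketch the helpers `memory_state` and `v_store` are hypothetical names, and the parameters α = 2, β = 16 are taken from the appendix; it checks that all eight wells sit at the same energy, so no memory state is energetically favored.

```python
import itertools

ALPHA, BETA = 2.0, 16.0   # double-well parameters from the appendix (assumed here)

def v_dw(c):
    """Single double well V_DW(c) = alpha c^4 - beta c^2."""
    return ALPHA * c**4 - BETA * c**2

def v_store(x, y, z):
    """Information-storing potential: one independent double well per coordinate."""
    return v_dw(x) + v_dw(y) + v_dw(z)

def memory_state(x, y, z):
    """Octant label m_x m_y m_z, with bit 1 for a coordinate >= 0 and 0 otherwise."""
    return "".join("1" if c >= 0 else "0" for c in (x, y, z))

x_star = (BETA / (2 * ALPHA)) ** 0.5   # well position, 2.0 for these parameters
minima = {
    memory_state(*pt): v_store(*pt)
    for pt in itertools.product((-x_star, x_star), repeat=3)
}
print(memory_state(-1.9, 2.1, -2.0))   # 010, as in the text
print(minima)                          # eight labels, all at the same energy
```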
Within this framework, we consider physical transformations that implement the Fredkin gate, and do so robustly. The Fredkin gate is also known as the controlled-swap gate, as it swaps inputs $m_y$ and $m_z$ only if the control $m_x$ is set to 1. In other words, the gate maps every input to itself, except 101 and 110, which swap with each other. The implementation uses the same strategy as the bit flip: decouple from the bath and switch on a harmonic potential over the time interval $t \in (0, \tau)$, then recouple and restore the original information-storing potential. The only difference is that the harmonic potential driving the computation is now embedded in the higher-dimensional space.
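The logical content of the gate is easy to tabulate. This sketch (with a hypothetical helper `fredkin`) builds the 8 × 8 deterministic input-output matrix $p_{m(0) \to m(\tau)}$, confirms that only 101 and 110 move, and shows that the gate is an involution whose matrix has determinant -1, since it is a single transposition. By the same determinant argument as for the bit flip, no CTMC over the eight memory states can realize it.

```python
import itertools
import numpy as np

def fredkin(bits):
    """Controlled swap: exchange m_y and m_z only when the control m_x is 1."""
    m_x, m_y, m_z = bits
    return (m_x, m_z, m_y) if m_x == 1 else (m_x, m_y, m_z)

states = list(itertools.product((0, 1), repeat=3))    # (m_x, m_y, m_z), from 000 to 111
index = {s: i for i, s in enumerate(states)}

# Deterministic input-output matrix: row = input state, column = output state.
P = np.zeros((8, 8))
for s in states:
    P[index[s], index[fredkin(s)]] = 1.0

moved = [s for s in states if fredkin(s) != s]
print(moved)                                  # [(1, 0, 1), (1, 1, 0)]
print(np.allclose(P @ P, np.eye(8)))          # True: the gate is an involution
print(np.isclose(np.linalg.det(P), -1.0))     # True: a single transposition
```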
To execute the Fredkin gate, first note that the memory-state x-index must always be fixed: $m_x(\tau) = m_x(0)$. Moreover, behavior in the y-z plane should depend on x only through whether x is positive or negative. Thus, we first split the potential into two pieces:
$$V(x, y, z, t) = \begin{cases} V(x < 0, y, z, t), & x < 0, \\ V(x \ge 0, y, z, t), & x \ge 0. \end{cases}$$
If $m_x(0) = 0$, then $m_y$ and $m_z$ must also not change. This suggests using the information-storing potential for this region of state space, $V(x < 0, y, z, t) = V_{\text{store}}(x, y, z)$, so that the memory state is held fixed: $m(\tau) = m(0)$ whenever $m_x(0) = 0$. For $m_x = 1$, however, we must nontrivially compute on $m_y$ and $m_z$:
$$V(x \ge 0, y, z, t) = V_{DW}(x) + V_{\text{comp}}(y, z), \qquad t \in (0, \tau).$$
Here, $V_{\text{comp}}$ is the part of the potential that implements the swap 101 → 110 and 110 → 101; it remains unchanged over $t \in (0, \tau)$. Because the x-axis decouples, particle behavior in either the positive or the negative x region can be treated as purely two-dimensional dynamics.
To swap 101 and 110 while keeping 111 and 100 fixed, consider a new basis for the yz-space. Define new variables $y' = (y - z)/\sqrt{2}$ and $z' = (y + z)/\sqrt{2}$, such that the local equilibrium distributions for states 110 and 101 are centered around $z' = 0$ and those for states 111 and 100 are centered around $y' = 0$. Our goal is thus to swap the distributions in the $y'$-coordinate while preserving their $z'$-coordinate. Given this, we split the computational potential again into independent components: $V_{\text{comp}}(y, z) = V(y') + V(z')$. Flipping in the $y'$-coordinate employs the same potential as the Bit Flip protocol: $V(y') = k y'^2/2$. As a result, after waiting half a period, $\tau = \pi\sqrt{m/k}$, the $y'$-coordinate changes sign, $y'(\tau) = -y'(0)$, as does its momentum.
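The choice of rotated axes can be checked on the well centers. Assuming wells at ±2 (the appendix parameters), this sketch rotates the four $m_x = 1$ centers into $(y', z')$ and confirms the claim above: 110 and 101 lie on the $y'$-axis at opposite positions, while 111 and 100 lie on the $z'$-axis. The helper names `to_rotated` and `from_rotated` are hypothetical.

```python
import math

x_star = 2.0   # well position for alpha = 2, beta = 16 (assumed, from the appendix)

def to_rotated(y, z):
    """y' = (y - z)/sqrt(2), z' = (y + z)/sqrt(2)."""
    return (y - z) / math.sqrt(2), (y + z) / math.sqrt(2)

def from_rotated(yp, zp):
    """Inverse rotation: y = (y' + z')/sqrt(2), z = (z' - y')/sqrt(2)."""
    return (yp + zp) / math.sqrt(2), (zp - yp) / math.sqrt(2)

# Well centers (y, z) for the four states with m_x = 1.
centers = {
    "110": (x_star, -x_star),
    "101": (-x_star, x_star),
    "111": (x_star, x_star),
    "100": (-x_star, -x_star),
}
rotated = {label: to_rotated(y, z) for label, (y, z) in centers.items()}
for label, (yp, zp) in rotated.items():
    print(label, round(yp, 3), round(zp, 3))
```

Flipping the sign of $y'$ therefore exchanges the 110 and 101 centers and leaves the 111 and 100 centers in place.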
We choose the $z'$-coordinate's potential to be quadratic as well, but with a period of oscillation that is half as long: $V(z') = 2k z'^2$. (Writing $2kz'^2 = (4k)z'^2/2$ shows that the stiffness is four times larger, so the frequency doubles.) Then $z'$ undergoes a full cycle over the duration $\tau = \pi\sqrt{m/k}$, returning to its original value, $z'(\tau) = z'(0)$, as does its momentum.
The resulting full potential over the control interval is piecewise. Figure 1 shows the potential in the x < 0 and x ≥ 0 regions during the computation interval:
$$V(x, y, z, t) = \begin{cases} V_{\text{store}}(x, y, z), & x < 0, \\ V_{DW}(x) + \frac{k}{2} y'^2 + 2k z'^2, & x \ge 0, \end{cases}$$
for $t \in (0, \tau)$. Translating back to the original coordinates, $y = (y' + z')/\sqrt{2}$ and $z = (z' - y')/\sqrt{2}$, we find that for x ≥ 0 this passive Hamiltonian transforms the particle's state by swapping y and z:
$$(y(\tau), z(\tau)) = (z(0), y(0)), \qquad (p_y(\tau), p_z(\tau)) = (p_z(0), p_y(0)),$$
while it holds the particles in the other four octants, where $m_x = 0$, in their respective potential minima. Thus, the transformation swaps y and z only when $m_x = 1$, implementing the Fredkin gate.
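The swap in the x ≥ 0 region follows from the closed-form harmonic solution in the rotated coordinates. This sketch assumes unit mass and stiffness by default; the function name `controlled_region_evolution` and the random test states are hypothetical. It evolves $y'$ for half a period and $z'$ (at twice the frequency) for a full period, rotates back, and checks that positions and momenta of y and z are exchanged.

```python
import numpy as np

def controlled_region_evolution(y, z, p_y, p_z, k=1.0, mass=1.0):
    """Exact frictionless evolution over tau = pi sqrt(mass/k) in V_comp = k y'^2/2 + 2k z'^2."""
    s = np.sqrt(2.0)
    yp, zp = (y - z) / s, (y + z) / s              # rotate positions
    pyp, pzp = (p_y - p_z) / s, (p_y + p_z) / s    # momenta rotate the same way
    w_y = np.sqrt(k / mass)                        # y' frequency
    w_z = 2.0 * w_y                                # 2k z'^2 = (4k) z'^2 / 2, so z' oscillates twice as fast
    tau = np.pi / w_y

    def evolve(q, p, w):
        """Closed-form harmonic motion: q(t) = q cos(wt) + p/(m w) sin(wt)."""
        c, sn = np.cos(w * tau), np.sin(w * tau)
        return q * c + p / (mass * w) * sn, p * c - mass * w * q * sn

    yp, pyp = evolve(yp, pyp, w_y)                 # half a period: y' -> -y'
    zp, pzp = evolve(zp, pzp, w_z)                 # full period:  z' -> z'
    # Rotate back: y = (y' + z')/sqrt(2), z = (z' - y')/sqrt(2).
    return (yp + zp) / s, (zp - yp) / s, (pyp + pzp) / s, (pzp - pyp) / s

rng = np.random.default_rng(2)
y0, z0, py0, pz0 = rng.normal(size=(4, 5))
y1, z1, py1, pz1 = controlled_region_evolution(y0, z0, py0, pz0)
print(np.allclose([y1, z1, py1, pz1], [z0, y0, pz0, py0]))   # True: y and z swap
```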
For a particular trajectory (x, y, z)(t), the work invested comes only from the initial and final instantaneous changes in the energy landscape:
$$W = \big[V(x(0), y(0), z(0), 0^+) - V_{\text{store}}(x(0), y(0), z(0))\big] + \big[V_{\text{store}}(x(\tau), y(\tau), z(\tau)) - V(x(\tau), y(\tau), z(\tau), \tau^-)\big].$$
Recall that x(t) is exponentially unlikely to change sign, because the energy barrier between states is much higher than the vast majority of thermal fluctuations can access. Thus, we assume that paths maintain a single sign of x(t). If x(t) is negative, there is no instantaneous change, since the system is held in the same double-well potential, so W = 0. If x(t) is positive, the work invested also vanishes, for the following reasons.
The x-dependent part of the potential decouples from the y-z subspace and remains constant. Thus, the x-dependent terms make no work contributions. Additionally, both the information-storing potential and the computational potential in the y-z subspace are symmetric under exchange of the y and z coordinates. So, the energy differences above vanish for the y- and z-dependent terms as well. (Recall that the action of the potential over our interval is to swap the y and z coordinates, so that $(y(\tau), z(\tau)) = (z(0), y(0))$.) And so, the average work production is nearly zero: only the exponentially suppressed barrier-crossing events can contribute nonzero work values. Figure 2 shows the evolution of the phase space for an ensemble of initial conditions drawn from the equilibrium distribution of $V_{\text{store}}(x, y, z)$. As the particle coloring shows, those that start in 110 and 101 swap while all others stay fixed. Moreover, no particle's x-coordinate changes sign, confirming the effectiveness of the overall transformation.
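The per-trajectory work cancellation can be verified numerically by applying the ideal gate map (x unchanged, y and z exchanged when x ≥ 0) to a scatter of initial positions. In this sketch the initial points, drawn around the eight well minima with a spread of 0.125, are a hypothetical stand-in for the equilibrium ensemble, and `v_during` is a hypothetical name for the piecewise potential during (0, τ). The switch-on work is large for x ≥ 0, yet the net work vanishes up to round-off.

```python
import numpy as np

ALPHA, BETA, K = 2.0, 16.0, 1.0   # appendix parameters (assumed here)

def v_dw(c):
    return ALPHA * c**4 - BETA * c**2

def v_store(x, y, z):
    return v_dw(x) + v_dw(y) + v_dw(z)

def v_during(x, y, z):
    """Piecewise potential on (0, tau): V_store for x < 0, V_DW(x) + V_comp(y, z) for x >= 0."""
    yp, zp = (y - z) / np.sqrt(2), (y + z) / np.sqrt(2)
    return np.where(x >= 0, v_dw(x) + 0.5 * K * yp**2 + 2.0 * K * zp**2, v_store(x, y, z))

rng = np.random.default_rng(1)
# Hypothetical initial positions scattered around the eight minima (+/-2, +/-2, +/-2).
x0, y0, z0 = (rng.choice([-2.0, 2.0], 1000) + rng.normal(0.0, 0.125, 1000) for _ in range(3))

# Ideal gate: x unchanged; y and z exchanged only where x >= 0.
swap = x0 >= 0
x1, y1, z1 = x0, np.where(swap, z0, y0), np.where(swap, y0, z0)

w_on = v_during(x0, y0, z0) - v_store(x0, y0, z0)     # switch on V_comp at t = 0
w_off = v_store(x1, y1, z1) - v_during(x1, y1, z1)    # restore V_store at t = tau
work = w_on + w_off
print(np.min(w_on[swap]), np.max(np.abs(work)))       # large switch-on work, ~0 net work
```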

Langevin Simulation
The preceding stipulated that the logical system be isolated from its thermal environment during the swap. However, the impact of adding thermal coupling is minimal. To demonstrate this, we investigated how robust the operation is to thermal coupling using underdamped Langevin dynamics. A simulation was carried out by initializing 20000 particles in equilibrium with a thermal reservoir under the information-storing potential $V_{\text{store}}(x, y, z)$. Next, as described above, we do work on the system by turning on the computational potential $V_{\text{comp}}$ in the region x ≥ 0. However, rather than reducing the thermal coupling to λ = 0, we drop the coupling coefficient to a nonzero value in the weak-coupling regime. This coupling value and potential are held fixed for a time $\tau = \pi\sqrt{m/k}$. (The appendix provides additional detail.) The particles experience thermal fluctuations as the weak coupling to the bath perturbs their trajectories away from the otherwise expected harmonic motion. The work gained from shutting off the potential will not generally equal the work invested to turn it on (as it does in the idealized case of zero thermal coupling). In fact, the Second Law guarantees that, on average, nonnegative work is invested in such cyclic transformations, because the net change in equilibrium free energy is zero. Nevertheless, one expects the behavior to approximate the desired Fredkin-gate dynamics if the coupling is sufficiently weak. Although the energetic cost of implementing the gate is no longer exactly zero at nonzero coupling, Fig. 3 shows that the logical fidelity approaches unity as the coupling decreases. And it does so with zero slope, revealing that this Fredkin gate implementation is robust even in the presence of thermal fluctuations. Thus, the Fredkin gate (CTHMC) dynamics do not rely on removing the thermal reservoir. As expected and as shown in Fig. 3, the work invested approaches zero with decreasing coupling.
However, as the coupling to the thermal reservoir increases, the average work required to compute increases to multiples of $k_B T$. This cost is much larger than predicted by the microscopic detailed-balance dynamics that underlie the Langevin simulation. This suggests the existence of a lower bound on entropy production, one that accounts for the coarse-graining, as predicted in Ref. [28].

Conclusion
Rate-equation dynamics is certainly a venerable and powerful framework, central to reaction kinetics in chemistry [29, 30] and key to the master equations of applied statistical mechanics [2, 14, 15]. In fact, perhaps due to the remarkable successes of continuous-time Markov chain predictions of many thermodynamic behaviors, it might seem natural to claim that, to be "physically realizable", thermodynamic computing and biological information processing can only be described and analyzed as rate-equation dynamics [7]. However, we demonstrated that this framework cannot form a complete basis for thermodynamic computing. Moreover, its strict application levies a penalty that precludes engineering and analyzing Maxwellian information ratchets, which are the physical equivalent of Turing machines [27, 31-33]. The limits are especially draconian, since efficient time-symmetrically controlled general computations consist of involutions [28]: operations composed of bit swaps and identity maps in positional memory (or any memory that is even under time reversal). As a constructive alternative, we proposed employing continuous-time hidden Markov chains to realize non-Markovian momentum computing. We demonstrated that it provides a more complete framework, using two explicit examples that are forbidden if one is restricted to rate equations to describe the evolution between memory states [7]. More helpfully, we introduced explicit mechanisms for implementing both with zero work, proving that they are most certainly "physically realizable".
Not only are hidden Markov chains more general, but their added generality is critical in many circumstances. The fact that the Fredkin gate can be executed robustly, even when thermal fluctuations perturb the particle trajectories, suggests that this implementation will have practical use for reversible universal computing. The robustness of the gate to fluctuations separates it from other implementations of reversible computing, such as ballistic computing with billiards, that are dynamically unstable [26]. We do, however, fully acknowledge the increased analytical complexity posed by CTHMC dynamics. Fortunately, the requisite tools have been developed that render the behaviors analytically tractable and in closed form [34, 35]. In short, there is little impediment to reaching the full generality of thermodynamic computing with CTHMCs.
Given that convincing physically realizable implementations of the bit flip and Fredkin gate [26, 36, 37] have been known for some time, one can only conclude that computing devices must be able to operate beyond the restrictions imposed by rate-equation dynamics. The examples presented here were intentionally couched in the thermodynamics of information to help bridge an apparent gap in understanding general computing. Specifically, the conception of memory must be modified, from the realization of a microscopic physical state to a mesoscopic coarse-graining, to fully realize the power and breadth of physical computations.

Non-Markovian Momentum Computing: Universal and Efficient
Kyle J. Ray, Alexander B. Boyd, Gregory W. Wimsatt, and James P. Crutchfield

Langevin Simulations
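Simulations were carried out using the Langevin equations of motion
$$d\vec{x} = \vec{v}\,dt, \qquad m\,d\vec{v} = \big[-\nabla V(\vec{x}, t) - \lambda \vec{v}\big]\,dt + \sqrt{2\lambda k_B T\,dt}\;\vec{r}(t),$$
where $\vec{x} = (x, y, z)$ and r(t) is a memoryless Gaussian random variable with zero mean and unit variance. Since we track behavior only as the thermal coupling parameter λ is swept, it is convenient to consider a particle with unit mass (m = 1) and to set $k_B T = 1$. This yields a very simple dynamic that is readily interpreted:
$$d\vec{x} = \vec{v}\,dt, \qquad d\vec{v} = \big[-\nabla V(\vec{x}, t) - \lambda \vec{v}\big]\,dt + \sqrt{2\lambda\,dt}\;\vec{r}(t).$$
The parameter λ (the thermal coupling coefficient) sets the damping force the particle experiences from the thermal bath when it has unit velocity. It is commonly called the damping coefficient or the inverse mobility.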
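One way to check that the noise amplitude $\sqrt{2\lambda k_B T\,dt}$ is consistent with the damping is to verify equipartition in a simple trap. This sketch uses a semi-implicit Euler-Maruyama step (a swapped-in scheme, not the RK4/Euler combination described in the next paragraph); the function name `langevin_step` and the harmonic test potential $V = x^2/2$ are illustrative assumptions. With $k_B T = 1$, both $\langle v^2 \rangle$ and $\langle x^2 \rangle$ should settle near 1.

```python
import numpy as np

def langevin_step(pos, vel, force, lam, dt, rng, kT=1.0, mass=1.0):
    """One semi-implicit Euler-Maruyama step of underdamped Langevin dynamics."""
    noise = np.sqrt(2.0 * lam * kT * dt) / mass
    vel = vel + dt * (force(pos) - lam * vel) / mass + noise * rng.normal(size=vel.shape)
    pos = pos + dt * vel
    return pos, vel

# Equipartition check in a harmonic trap V = x^2 / 2 (force = -x), starting far from equilibrium.
rng = np.random.default_rng(0)
pos = np.zeros(5000)
vel = np.zeros(5000)
for _ in range(1000):          # total time 10 with dt = 0.01 and lam = 1
    pos, vel = langevin_step(pos, vel, lambda q: -q, lam=1.0, dt=0.01, rng=rng)
mean_v2, mean_x2 = np.mean(vel**2), np.mean(pos**2)
print(mean_v2, mean_x2)        # both close to k_B T = 1
```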
The simulation employed the fourth-order Runge-Kutta method for the deterministic portion and Euler's method for the stochastic portion of the integration. (Python NumPy's Gaussian random number generator was used to generate the memoryless Gaussian variable r(t).) Figure 3 displays 3σ error bars, but the errors are generally small enough that they are barely visible. Statistical errors were estimated using standard procedures for sample means and proportions. Figure 3 was generated from simulation using the following procedure. First, an ensemble of 20000 trials was drawn from an approximate equilibrium distribution of $V_{\text{store}}(x, y, z)$ with α = 2 and β = 16, using a Monte Carlo algorithm. Second, this ensemble was thermalized while coupled to a bath (λ = 1) until the ensemble energy changed by no more than 1 part in 1000 over a unit time interval. Third, this ensemble was used as the start state for the Fredkin gate operation: we dropped λ to a low coupling value and exposed the unit-mass particles to the potential in Eq. (2) with α = 2, β = 16, and k = 1. Fourth, at this point we measured the work required to change the potential across the ensemble. Fifth, the potential was held fixed for a time $\tau = \pi\sqrt{m/k} = \pi$, using an integration step dt ≈ 0.0005. Finally, immediately following the computation interval, we measured the second work contribution: the work that would be harvested by dropping the potential back to $V_{\text{store}}$. The average net work is the ensemble average of the difference between the work invested when raising the potential and the work harvested when lowering it. Figure 2 was generated by starting the particles in the equilibrium distribution described above and running the simulation with λ = 0, to simulate dissipationless oscillatory dynamics. The plot shows a sample of 200 trials, rather than the full 20000, for clarity.
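The procedure above can be roughly re-created in a few dozen lines of NumPy. This is a sketch under stated assumptions, not the authors' code: it swaps in semi-implicit Euler-Maruyama integration for the RK4/Euler hybrid, and it draws initial conditions from a Gaussian (harmonic) approximation to equilibrium in $V_{\text{store}}$ instead of Monte Carlo sampling followed by thermalization. The function names `potential`, `force`, and `run_gate` are hypothetical. It reports the logical fidelity and the mean net work for a given coupling λ.

```python
import numpy as np

ALPHA, BETA, K = 2.0, 16.0, 1.0       # appendix parameters; mass = 1 and k_B T = 1
SQ2 = np.sqrt(2.0)

def v_dw(c):
    return ALPHA * c**4 - BETA * c**2

def dv_dw(c):
    return 4.0 * ALPHA * c**3 - 2.0 * BETA * c

def potential(pos, computing):
    """V_store, or (if computing) V_DW(x) + V_comp(y, z) wherever x >= 0."""
    x, y, z = pos.T
    v = v_dw(x) + v_dw(y) + v_dw(z)
    if computing:
        yp, zp = (y - z) / SQ2, (y + z) / SQ2
        v = np.where(x >= 0, v_dw(x) + 0.5 * K * yp**2 + 2.0 * K * zp**2, v)
    return v

def force(pos, computing):
    """Minus the gradient of potential(pos, computing)."""
    x, y, z = pos.T
    grad = np.stack([dv_dw(x), dv_dw(y), dv_dw(z)], axis=1)
    if computing:
        on = x >= 0
        yp, zp = (y - z) / SQ2, (y + z) / SQ2
        g_yp, g_zp = K * yp, 4.0 * K * zp          # dV_comp/dy', dV_comp/dz'
        grad[on, 1] = ((g_yp + g_zp) / SQ2)[on]    # chain rule back to y
        grad[on, 2] = ((g_zp - g_yp) / SQ2)[on]    # chain rule back to z
    return -grad

def run_gate(n=1000, lam=0.01, dt=5e-4, seed=0):
    """Return (logical fidelity, mean net work) for one weakly coupled Fredkin run."""
    rng = np.random.default_rng(seed)
    x_star = np.sqrt(BETA / (2.0 * ALPHA))
    curvature = 12.0 * ALPHA * x_star**2 - 2.0 * BETA      # V_DW''(x*) = 64
    # Assumed initial ensemble: Gaussian approximation to equilibrium in V_store.
    pos = rng.choice([-x_star, x_star], size=(n, 3)) + rng.normal(0.0, curvature**-0.5, (n, 3))
    vel = rng.normal(0.0, 1.0, (n, 3))
    bits0 = (pos >= 0).astype(int)

    w_on = potential(pos, True) - potential(pos, False)   # switch on V_comp at t = 0
    tau = np.pi * np.sqrt(1.0 / K)
    n_steps = int(round(tau / dt))
    h = tau / n_steps
    for _ in range(n_steps):   # semi-implicit Euler-Maruyama with damping lam
        vel += h * (force(pos, True) - lam * vel) + np.sqrt(2.0 * lam * h) * rng.normal(size=(n, 3))
        pos += h * vel
    w_off = potential(pos, False) - potential(pos, True)  # restore V_store at t = tau

    expected = bits0.copy()
    ctrl = bits0[:, 0] == 1
    expected[ctrl, 1], expected[ctrl, 2] = bits0[ctrl, 2], bits0[ctrl, 1]
    fidelity = np.mean(np.all((pos >= 0).astype(int) == expected, axis=1))
    return fidelity, np.mean(w_on + w_off)

for lam in (0.0, 0.01):
    print(lam, run_gate(lam=lam))
```

At λ = 0 the net work should vanish up to integration error; for small λ > 0 the fidelity should stay essentially at unity while the mean work becomes nonzero.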