Harnessing Fluctuations in Thermodynamic Computing via Time-Reversal Symmetries

We experimentally demonstrate that highly structured distributions of work emerge during even the simple task of erasing a single bit. These are signatures of a refined suite of time-reversal symmetries in distinct functional classes of microscopic trajectories. As a consequence, we introduce a broad family of conditional fluctuation theorems that the component work distributions must satisfy. Since they identify entropy production, the component work distributions encode the frequency of the various mechanisms of success and failure during computing and give improved estimates of the total irreversibly-dissipated heat. This new diagnostic tool provides strong evidence that thermodynamic computing at the nanoscale can be constructively harnessed. We experimentally verify this functional decomposition and the new class of fluctuation theorems by measuring transitions between flux states in a superconducting circuit.

Physics dictates that all computing is subject to spontaneous error. These days, this truism repeatedly reveals itself: despite the once-predictable miniaturization of nanoscale electronics, computing performance increases have dramatically slowed in the last decade or so. In large measure, this is due to the concomitant rapid decrease in the number of information-bearing physical degrees of freedom, rendering information storage and processing increasingly susceptible to corruption by thermal fluctuations. All computing is thermodynamic. Controlling fluctuations and removing heat pose key technological challenges to further progress. One comparable setting that gives some optimism, though, is the overtly functional behavior exhibited by biological cells, which presumably achieve functional information processing by harnessing and controlling fluctuations in informational and energetic resources and in engine functioning. For this, one appeals to fluctuation theorems that capture exact time-reversal symmetries and predict entropy production leading to irreversible dissipation [8][9][10][11][12][13][14]. We are now on the doorstep of the very far-from-equilibrium thermodynamics needed to understand the physics of computing. And, in turn, this has started to reveal the physical principles of how nature processes information in the service of biological functioning and survival.
To close the gap, we show how to diagnose thermodynamic computing on the nanoscale by explaining the signature structures in work distributions generated during information processing. These structures track the mesoscale evolution of a system's informational states and reveal classes of functional and nonfunctional microscopic trajectories. We show that the informational-state evolutions are identified by appropriate conditioning and that they obey a suite of trajectory-class fluctuation theorems, which give accurate bounds on work, entropy production, and dissipation. The result is a new tool that employs mesoscopic measurements to diagnose nanoscale thermodynamic computing. For simplicity and to make direct contact with previous efforts, we demonstrate the tools on Landauer erasure of a bit of information in a superconducting flux qubit.
As a reference, we first explore the thermodynamics of bit erasure in a simple model: a particle with position and momentum in a double-well potential V (x, t) and in contact with a heat reservoir at temperature T . (Refer to Fig. 1.) An external controller adds or removes energy from a work reservoir to change the form of the potential V (·, t) via a predetermined erasure protocol {(β(t), δ(t)) : 0 ≤ t ≤ τ }. β(t) and δ(t) change one at a time piecewise-linearly through four protocol substages: (1) drop barrier, (2) tilt, (3) raise barrier, and (4) untilt. (See SM VI.) The system starts at time t = 0 in the equilibrium distribution for a double-well V (x, 0) at temperature T . Being equiprobable, the informational states associated with each of the two wells thus contain 1 bit of information [25].
The default potential, V (·, 0) = V (·, τ ), has two symmetric wells separated by a barrier. Following common practice we call the two wells, from negative to positive position, the Left (L) and Right (R) informational states, respectively.
The erasure protocol is designed so that the particle ends in the R state with high probability, regardless of its initial state. Over 3.5 × 10^6 runs of our simulation, 96.2% of the particles were successfully erased into the R state. Thus, as measured by the Shannon entropy, the initial 1 bit of information was reduced to 0.231 bits. Note that we chose the protocol to give partially inaccurate erasure in order to illustrate our main results on diagnosing success and failure.
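For reference, the residual information can be computed with the binary Shannon entropy (the function name is ours). A minimal sketch:

```python
import math

def binary_entropy(p):
    """Shannon entropy (in bits) of a two-state distribution {p, 1 - p}."""
    if p in (0.0, 1.0):
        return 0.0
    q = 1.0 - p
    return -p * math.log2(p) - q * math.log2(q)
```

Evaluating `binary_entropy(0.962)` gives roughly 0.23 bits; the slight difference from the quoted 0.231 bits reflects that 96.2% is itself a rounded success rate.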
At all other times t, V(·, t) has either one or two local minima, naturally defining metastable regions in which a particle is constrained and gradually evolves toward local equilibrium. We therefore define the informational states at time 0 ≤ t ≤ τ to be the metastable regions, labeling them R and, if two exist, L, ordered from most positive to most negative in position.
Since the protocol is composed of four simple substages, we coarse-grain the system's response by its activity during each substage at the level of its informational state. Specifically, for each substage, we assign one of three substage trajectory classes: the system (i) was always in the R state, (ii) was always in the L state, or (iii) spent time in each. Sometimes there is only one informational state, and so the latter two classes are not achievable during those substages.
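The three-way substage classification can be phrased as a small function over the sequence of informational states visited during a substage. A minimal sketch (state labels and the function name are ours, not from the original analysis):

```python
def substage_class(states):
    """Classify a substage by the informational states visited:
    (i) always in R, (ii) always in L, or (iii) time spent in each.
    `states` is the sequence of 'L'/'R' labels observed during the substage."""
    visited = set(states)
    if visited == {"R"}:
        return "all-R"
    if visited == {"L"}:
        return "all-L"
    return "both"
```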
We then focus on a single mesoscopic observable: the thermodynamic work expended during erasure. An individual realization generates a trajectory of system microstates, with W(t, t′) being the work done on the system between times 0 ≤ t < t′ ≤ τ; see SM VI. Let W_s = W(t_{s−1}, t_s) denote the work generated during substage s and C_s the substage trajectory class. Figure 1 (Outer plot sequence) shows the corresponding substage work distributions Pr(W_s, C_s) obtained from our simulations. (See SM VII.) The drop-barrier and tilt substage work distributions are rather simple, being narrow and unimodal. The raise-barrier distributions have some asymmetry, but are also similarly simple. However, the untilt work distributions (farthest right in Fig. 1) exhibit unusual features that are significant for understanding the intricacies of erasure. Trajectories that spend all of the untilt substage in either the R state or the L state form peaks at the most positive (red) and negative (orange) work values, respectively. This is because the R-state well is always increasing in potential energy while the L-state well is always decreasing during untilt. In contrast, the other trajectories contribute a log-linear ramp of work values (blue) dependent on the time spent in each. The ramp's positive slope signifies that more time is typically spent in the R state.
FIG. 1 (caption fragment): Potential V(x, t_s) at substage boundary times t_s, s = 0, 1, 2, 3, 4 (Table S1), with the evolution of the position distribution Pr(x). Starting at t = t_0, the potential evolves clockwise, ending at t = t_4 in the same configuration as it starts.

Looking at the total work W_total = W(0, τ) generated for each trajectory over the course of the entire erasure protocol, we partition the trajectories, based on their untilt-substage behavior, into three classes: Success, Fail, and Transitional. Figure 2 (Three front plots) shows the work distribution for each of these three trajectory classes. Together they recover the total work distribution over all trajectories shown in Fig. 2 (Rear plot). Each class obeys a trajectory-class fluctuation theorem (TCFT) that bounds its class-averaged work:

⟨W⟩_C ≥ W^min_C = ΔF + k_B T ln [ P(C)/R(C^R) ] ,   (1)

with ΔF the change in equilibrium free energy over the protocol, P(C) the probability of realizing the class C during the protocol, and R(C^R) the probability of obtaining the time reverse of class C under the time-reverse protocol. k_B is Boltzmann's constant.
The TCFTs lead to several consequences. First, well-formulated trajectory classes allow accurate estimates of the works for their trajectories, even with limited knowledge of system response under the protocol and its time reverse. Second, they strictly and more strongly bound the average work over all trajectories compared to the equilibrium free energy change ΔF. Third, they provide a new expression for obtaining equilibrium free-energy changes:

ΔF = −k_B T ln [ ⟨e^{−W/k_B T}⟩_C P(C)/R(C^R) ] .   (2)

Remarkably, this only requires statistics for a particular class C and its reverse C^R to produce the system's free energy change. Since rare microstate trajectories may generate sufficiently negative works that dominate the average exponential work, this leads to a substantial statistical advantage over direct use of Jarzynski's equality ΔF = −k_B T ln ⟨e^{−W/k_B T}⟩_{→Z} for estimating free energies [26]. To explore these predictions, we selected a superconducting flux qubit composed of paired Josephson junctions (Fig. 3(A)), resulting in a double-well nonlinear potential that supports information storage and processing (Fig. 3(B)). SM VIII A explains the physics underlying their nonlinear equations of motion, comparing the similarities and differences with our model's idealized Langevin dynamics. Although control protocols exist for double-well potentials that perform accurate and efficient bit erasure [27], we run the flux qubit in a mode that yields imperfect erasure (Fig. 3(C)). As with the simulations, our intention is to illustrate how trajectory classes and the TCFTs can be used to diagnose and interpret success and failure in microscopic information processing using only mesoscopic measurements of work. Interplay between the geometric, linear magnetic inductances and the nonlinear Josephson inductances gives rise to a potential landscape that can be controlled with external bias fluxes.
It is natural to call the φ_x and φ_xdc fluxes, threading the differential mode and the small SQUID loop, respectively, the tilt and barrier controls. (See the Fig. 3(A) caption.) SM VIII presents a derivation of the flux qubit potential and details its calibration. All experiments presented here were carried out at a temperature of 500 mK. To execute an erasure protocol, we first choose an information-storage state with a tall barrier and two equal-depth wells. The two-dimensional potential for this at the calibrated device parameters is depicted in Fig. 3(B). We implement the bit erasure protocol as a time-domain deformation imposed by the two control fluxes that starts and ends at the storage configuration. The amplitudes of the control waveforms in reduced units are small; see Fig. 3(C). Hence, the microscopic energetics change linearly as a function of the control fluxes. We use a local dc-SQUID magnetometer to continuously monitor the trapped flux state in the device: Readout 1 in Fig. 3(A). The digitized signal has a rise time of 100 µs, after which the two logical states are discriminated virtually without error. A typical magnetometer trace V(t) acquired during the execution of the erasure protocol is shown in Fig. 3(C). We operate the magnetometer with a low-amplitude AC current bias at 10 MHz to avoid an increase in the effective temperature during continuous readout of the flux state due to wideband electromagnetic interference.
To collect work statistics, we repeat the erasure protocol 10^5 times. We identify the logical-state transitions from the magnetometer traces as zero-crossings, recording the direction δ_i (sign convention: +1 for an L-to-R transition, −1 for an R-to-L transition) and the time t_i relative to the start of the protocol. We evaluate a single-shot work estimate

W = Σ_i δ_i U_LR(t_i) ,

where U_LR(t_i) is the biasing of the potential minima at time t_i. Making use of the linearity of the system energetics and the choice of offsets and compensation coefficients, we find U_LR(t) = A (φ_x(t) − φ_x(0)), with the coefficient A = 210 K × k_B evaluated from the calibrated potential. The above work estimate based on the logical-state transitions is an accurate estimate of the true microscopic work, assuming that the timescales for the state transitions and for changes in the control parameters are much slower than the intra-well equilibration. (See SM VI.) The total work distribution estimated from the flux qubit experiments is shown as the rear-most distribution in Fig. 3(D). Using the previous microstate trajectory partitioning into the Success, Fail, and Transitional trajectory classes reveals a decomposition of the total work distribution, given in Fig. 3(D) (Three front panels). The close similarity with our simulations (Fig. 2) is notable, especially given the rather substantial differences between the simulated system (idealized double-well potential and thermal noise, exactly one-dimensional, ...) and the experimental system (complex potential in two dimensions, nonideal fluctuations, ...). A priori it is not clear that the theoretical predictions of the informational classes should apply so directly and immediately to the real-world qubit. In point of fact, these differences serve to emphasize the descriptive power of the mesoscopic work fluctuation theorems: despite substantial differences in system detail, they successfully diagnose the information-processing classes of microscopic trajectories.
This robustness will be especially helpful in monitoring thermodynamic computing in biological systems, where, in many cases, information-bearing degrees-of-freedom cannot be precisely modeled.
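Under the stated timescale assumptions, the transition-based single-shot work estimate reduces to a sum over logical transitions for a cyclic protocol. A minimal sketch (the function name and arguments are ours):

```python
def work_from_transitions(times, signs, u_lr):
    """Single-shot work estimate for a cyclic two-state protocol:
        W = sum_i signs[i] * U_LR(times[i]),
    where signs[i] = +1 for an L-to-R transition, -1 for R-to-L, and
    u_lr(t) is the instantaneous bias U_L(t) - U_R(t) between the minima.
    Boundary terms vanish for a cyclic protocol (U_LR(0) = U_LR(tau) = 0)
    with intra-well relaxation much faster than the drive."""
    return sum(s * u_lr(t) for t, s in zip(times, signs))
```

With the calibration described above, `u_lr` would be `lambda t: A * (phi_x(t) - phi_x(0))` for the measured flux waveform `phi_x`.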
We experimentally demonstrated that work fluctuations generated by information engines are highly structured. Nonetheless, they strictly obeyed a suite of time-reversal symmetries-the trajectory-class fluctuation theorems introduced here. The latter are direct signatures of how a system's informational states evolve and they identify functional and nonfunctional microscopic trajectory bundles. We showed that the trajectory-class fluctuation theorems naturally interpolate between Jarzynski's integral and Crooks' detailed fluctuation theorems, providing a unified diagnostic probe of nonequilibrium thermodynamic transformations that support information processing.
Using them we gave a detailed mechanistic analysis of the thermodynamics of the now-common example of erasing a bit of information as an external protocol manipulated a stochastic particle in a double-well potential (simulation) and the stochastic state of a flux qubit (experiment). To give insight into the new level of mechanistic analysis possible, we briefly discussed the untilt trajectory-class partitioning. Though it ignores the other protocol stages, this was sufficient to capture the basic trajectory classes that generate the overall work distribution's features. Partitioning on informational-state occupation times during barrier raising and untilting, an alternative used in follow-on studies, yields an even more incisive decomposition of the work distributions and diagnosis of informational functioning. The corresponding bounds on thermodynamic resources obtained via the TCFTs also improve on current estimation methods. The net result is that trajectory-class analysis can be readily applied to debug thermodynamic computing by engineered or biological systems.

Competing Financial Interests Statement: The authors declare that they have no competing financial interests.

Data Availability:
The data that support the findings of this study are available from the corresponding author on reasonable request.

Supplementary Materials
Materials and Methods: Derivation of trajectory-class fluctuation theorems, further discussion and interpretation, and experiment implementation, calibration, and work estimation methods.

Materials and Methods
The following presents details, derivations, and explanations for the theoretical claims and experimental results. First, we give a synopsis of recently developed principles of thermodynamic computing. Then, we introduce the model microscopic stochastic thermodynamical system, including its equations of motion and its physical calibration. The following sections explain how to use and interpret the trajectory-class fluctuation theorems and provide their derivation. Their practical application to work estimation for trajectories and classes, in light of the alternative kinds of work (inclusive and exclusive) and methods for experimental estimation, is laid out. A brief commentary on the substage work distributions then follows. Finally, we turn to describe the flux qubit, its equations of motion, implementation, calibration, and measurement.

I. PRINCIPLES OF THERMODYNAMIC COMPUTING: A RECENT SYNOPSIS
A number of closely-related thermodynamic costs of computing have been identified, above and beyond the housekeeping heat that maintains a system's overall nonequilibrium dynamical state. First, there is the information-processing Second Law [28] that extends Landauer's original bound on erasure [2] to dissipation in general computing and properly highlights the central role of information generation measured via the physical substrate's dynamical Kolmogorov-Sinai entropy. It specifies the minimum amount of energy that must be supplied to drive a given amount of computation forward. Second, when coupling thermodynamic systems together, even a single system and a complex environment, there are transient costs as the system synchronizes to, predicts, and then adapts to errors in its environment [29][30][31]. Third, the very modularity of a system's organization imposes thermodynamic costs [32]. Fourth, since computing is necessarily far out of equilibrium and nonsteady state, there are costs due to driving transitions between information-storage states [33]. Fifth, there are costs to generating randomness [34], which is itself a widely useful resource. Finally, by way of harnessing these principles, new strategies for optimally controlling nonequilibrium transformations have been introduced [27,[35][36][37].

II. MICROSCOPIC STOCHASTIC THERMODYNAMICAL SYSTEM
For concreteness, we concentrate on a one-dimensional system: a particle with position and momentum in an external potential V(x, t) and in contact with a heat reservoir at temperature T. An external controller adds or removes energy from a work reservoir to change the form of the potential V(·, t) via a predetermined erasure protocol {(β(t), δ(t)) : 0 ≤ t ≤ τ}. The potential takes the form:

V(x, t) = a x^4 − b_0 β(t) x^2 + c_0 δ(t) x ,

with constants a, b_0, c_0 > 0. During the erasure protocol, β(t) and δ(t) change one at a time piecewise-linearly through four protocol substages: (1) drop barrier, (2) tilt, (3) raise barrier, and (4) untilt, as shown in Table S1. The system starts at time t = 0 in the equilibrium distribution for a double-well V(x, 0) at temperature T. Being equiprobable, the informational states associated with each of the two wells thus contain 1 bit of information [25]. The effect of the control protocol on the system potential and system response is graphically displayed in Fig. 1. We model erasure's physical information processing with underdamped Langevin dynamics:

m ẍ = −γ ẋ − ∂_x V(x, t) + √(2 γ k_B T) r(t) ,

where k_B is Boltzmann's constant, γ is the coupling between the heat reservoir and system, m is the particle's mass, and r(t) is a memoryless Gaussian random variable with ⟨r(t)⟩ = 0 and ⟨r(t) r(t′)⟩ = δ(t − t′). For comparison to experiment, we simulated erasure with the following parameters, sufficient to fully specify the dynamics: γτ/m = 500, 2 k_B T τ^2 a/(m b_0) = 2.5 × 10^5, b_0^2/(4 a k_B T) = 7, and 8a/b_0^3 c_0 = 0.4. The resulting potential, snapshotted at times during the erasure substages, is shown in Fig. 1 (Inner plot sequence). Reliable information processing dictates that we set time scales so that the system temporarily, but stably, stores information. To support metastable-quasistatic behavior, at all times the relaxation rates of the informational states are much faster than the rate of change of the potential, keeping the system near metastable equilibrium throughout. The entropy production for such protocols tends to be minimized.
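A minimal Euler-Maruyama sketch of this kind of simulation, in reduced units with illustrative (not the calibrated) parameter values:

```python
import math
import random

# Sketch of the underdamped Langevin dynamics in reduced units.
# The potential is V(x) = a*x**4 - b*x**2 + c*x, with (b, c) driven by the
# protocol; all parameter values here are illustrative, not the calibrated ones.
def simulate_substage(x, v, b_of_t, c_of_t, a=1.0, m=1.0, gamma=5.0,
                      kT=0.5, dt=1e-3, steps=1000, seed=1):
    """Euler-Maruyama integration of
       m dv = [-gamma*v - dV/dx] dt + sqrt(2*gamma*kT*dt) * N(0, 1),
    accumulating the work done on the system by the changing potential."""
    rng = random.Random(seed)
    work = 0.0
    for i in range(steps):
        t0, t1 = i * dt, (i + 1) * dt
        # Work increment: change in V at fixed x as the controls move.
        work += (b_of_t(t0) - b_of_t(t1)) * x**2 + (c_of_t(t1) - c_of_t(t0)) * x
        # Deterministic force, damping, and thermal noise.
        force = -(4 * a * x**3 - 2 * b_of_t(t1) * x + c_of_t(t1))
        v += (force - gamma * v) / m * dt \
            + math.sqrt(2 * gamma * kT * dt) / m * rng.gauss(0.0, 1.0)
        x += v * dt
    return x, v, work
```

For example, dropping the barrier over one time unit: `simulate_substage(1.0, 0.0, lambda t: 2.0 * (1.0 - t), lambda t: 0.0)`.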

III. TRAJECTORY-CLASS FLUCTUATION THEOREMS: USE AND INTERPRETATION
Here, we describe the trajectory-class fluctuation theorems, explaining several of their possible implications and exploring their application to both the simulations and flux qubit experiment. Their derivations are given in the section following. First, consider a forward process distribution P, defined by the probabilities of the system microstate trajectories →z ∈ →Z due to an initial equilibrium microstate distribution evolving forward in time under a control protocol. Then, the reverse process distribution R is determined by preparing the system in equilibrium in the final protocol configuration and running the reverse protocol. The reverse protocol is the original protocol conducted in reverse order, but also with objects that are odd under time reversal, like magnetic fields, negated. The time-reversal →z^R of a trajectory →z is the sequence of microstates visited in reverse order, each with its time-odd coordinates (e.g., momenta) negated. For a measurable subset of trajectories C ⊂ →Z, let ⟨·⟩_C denote an average over the ensemble of forward process trajectories conditioned on the trajectory class C. Let P(C) and R(C^R) denote the probabilities of observing the class C in the forward process and the reverse class C^R = {→z^R | →z ∈ C} in the reverse process, respectively.
We first introduce a trajectory-class fluctuation theorem (TCFT) for the class-averaged exponential work ⟨e^{−W/k_B T}⟩_C:

⟨e^{−W/k_B T}⟩_C = [ R(C^R)/P(C) ] e^{−ΔF/k_B T} ,   (S2)

with ΔF the system equilibrium free energy change. We also introduce a class-averaged work TCFT:

⟨W⟩_C = ΔF + k_B T ln [ P(C)/R(C^R) ] + k_B T D_KL[ P(→z | C) ∥ R(→z^R | C^R) ] .   (S3)

This employs the Kullback-Leibler divergence D_KL[ · ∥ · ] taken between forward and reverse process distributions over all class trajectories →z ∈ C, conditioned on the forward class C and reverse class C^R, respectively. If we disregard this divergence, which is nonnegative and would generally be difficult to obtain experimentally, we then find the lower bound W^min_C = ΔF + k_B T ln [ P(C)/R(C^R) ] on the class-averaged work of Eq. (1).
In the limit of class C possessing only a single trajectory, we recover detailed fluctuation theorems as in Ref. [12]. If, however, we take C to be the entire set of trajectories →Z, we recover integral fluctuation theorems such as Jarzynski's equality [10]. Thus, the TCFTs are a suite that spans the space of fluctuation theorems between the extreme of the detailed theorems, which require very precise information about an individual trajectory, and the integral theorems, which describe the system's entire trajectory ensemble. SM IV below provides proofs for both TCFTs.
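The integral-theorem limit is easy to sanity-check numerically. For a Gaussian work distribution with mean μ and variance 2μ k_B T (and ΔF = 0), Jarzynski's equality ⟨e^{−W/k_B T}⟩ = 1 holds exactly; a sketch in units of k_B T:

```python
import math
import random

def jarzynski_average(works, kT=1.0):
    """Estimate <exp(-W/kT)> from a sample of work values."""
    return sum(math.exp(-w / kT) for w in works) / len(works)

# Gaussian work distribution with mean mu and variance 2*mu*kT (dF = 0):
# this variance-mean relation makes Jarzynski's equality hold exactly.
rng = random.Random(0)
mu = 1.0  # mean work, in units of kT
works = [rng.gauss(mu, math.sqrt(2.0 * mu)) for _ in range(200_000)]
est = jarzynski_average(works)  # should be close to exp(-dF/kT) = 1
```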
We can rearrange Eq. (S2) to obtain Eq. (2), an expression for estimating equilibrium free energy changes:

ΔF = −k_B T ln [ ⟨e^{−W/k_B T}⟩_C P(C)/R(C^R) ] .   (2)

Thus, to estimate free energy, one sees that statistics are needed for only one particular class and its reverse. Generally, this gives a substantial statistical advantage over direct use of Jarzynski's equality, since rare microstate trajectories may generate negative work values that dominate the average exponential work [26]. The problem is clear in the case of erasure. Recall from Fig. 2 (Three front panels) that Fail trajectories generate the most-negative work values. In the limit of higher success-rate protocols that maintain low entropy production, failures generate more and more negative works, leading them to dominate when estimating average exponential works.
In contrast, to efficiently determine the change in equilibrium free energy from Eq. (2), its form indicates that one should choose a class that (i) is common in the forward process, (ii) has a reverse class that is common in the reverse process, and (iii) generates a narrow work distribution. This maximizes the accuracy of statistical estimates for the three factors on the RHS. For example, while the equilibrium free energy change in the case of our erasure protocol is theoretically simple (zero), the Success class fits these criteria.
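Equation (2) translates directly into an estimator that takes class-conditioned work samples and the two class probabilities (the function and argument names are ours):

```python
import math

def free_energy_from_class(class_works, p_fwd, p_rev, kT=1.0):
    """Estimate dF via the TCFT of Eq. (2):
        dF = -kT * ln( <exp(-W/kT)>_C * P(C) / R(C_R) ),
    from work samples of forward trajectories in class C, the forward
    class probability p_fwd = P(C), and the reverse-class probability
    p_rev = R(C_R)."""
    exp_avg = sum(math.exp(-w / kT) for w in class_works) / len(class_works)
    return -kT * math.log(exp_avg * p_fwd / p_rev)
```

Taking C to be the full trajectory ensemble (`p_fwd = p_rev = 1`) recovers the direct Jarzynski estimator.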
We can then monitor the class-averaged work in excess of its bound:

E_C ≡ ⟨W⟩_C − W^min_C .

The inequality in Eq. (1) is a refinement of the equilibrium Second Law, and therefore the bound W^min_C generally provides a more accurate estimate of the average work of trajectories in a class than does the equilibrium free energy change ΔF. More precisely, as we will see below, an average of the excess E_C over all classes C in a partition of trajectories must be smaller than the dissipated work ⟨W⟩ − ΔF. For trajectory classes with narrow work distributions, this can be a significant improvement. We can see this by Taylor expanding the LHS of Eq. (S2) about the mean dimensionless work ⟨W/k_B T⟩_C. This shows that Eq. (1) becomes an equality when the variance and higher moments vanish. SM V below delves more into moment approximations. In any case, trajectory classes with narrow work distributions have small excess works E_C.
To estimate R(C^R), we ran 3.5 × 10^6 simulations of the reverse process. Table S2 shows that the Success and Fail classes have small excesses and, as seen in Fig. 2 (Three front panels), these classes indeed have narrow work distributions. Elsewhere we explore these and additional partition schemes, finding that the Transitional trajectories can be further partitioned to yield narrow work distributions, so that all trajectory classes have small excesses E_C. In short, this demonstrates how well-formulated trajectory classes allow accurate estimates of the works for all trajectories.
To measure the efficacy of a given partition Q of trajectories into classes, we ask what the ensemble average of class-average excess works is:

E_Q ≡ Σ_{C ∈ Q} P(C) E_C .

Combining with Eq. (1), we see that E_Q = ⟨W⟩ − W^min_Q, where W^min_Q is the coarse-grained lower bound on ensemble-average dissipation from Ref. [38]:

⟨W⟩ ≥ W^min_Q = ΔF + k_B T D_KL[ P(C) ∥ R(C^R) ] ,

where D_KL[ · ∥ · ] is here the Kullback-Leibler divergence between forward and reverse process distributions over the trajectory classes C ∈ Q. Since Kullback-Leibler divergences are nonnegative, such a bound always provides an improvement over the equilibrium Second Law. Table S2 shows both W^min_Q and E_Q for the trivial partition {→Z}, our three-class partition, labeled Untilt-Centric I, and the improved partition described in follow-on work, labeled Untilt-Centric II. In this case, the latter two also provide an improvement on the nonequilibrium Second Law which, assuming metastable starting and ending distributions, provides a lower bound on the average work equal to 0.533, the change in nonequilibrium free energy.
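The coarse-grained bound can be computed from class statistics alone. A sketch (names are ours), taking the forward and reverse class probabilities as dictionaries over class labels:

```python
import math

def partition_work_bound(p_fwd, p_rev, dF=0.0, kT=1.0):
    """Coarse-grained lower bound on the ensemble-average work:
        <W> >= dF + kT * D_KL( P(C) || R(C_R) ),
    where p_fwd and p_rev map each class label C to P(C) and R(C_R)."""
    dkl = sum(p * math.log(p / p_rev[c]) for c, p in p_fwd.items() if p > 0)
    return dF + kT * dkl
```

When the forward and reverse class probabilities coincide, the divergence vanishes and the bound reduces to the equilibrium result ΔF.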
We can appeal to Landauer's erasure bound, k_B T ln 2 ≈ 0.693 k_B T, to calibrate the excesses E_C and E_Q. We see for the simulation data that our three-class partition Untilt-Centric I provides class-average work bounds that, on average, are only about 11% of k_B T ln 2 from the actual class-average works. The more refined Untilt-Centric II partition reduces this excess to about 5%, while the trivial partition fails by about 91% of k_B T ln 2.
The experimental data match these results, with the largest discrepancy occurring in the class-average excess for the Fail class. This is not wholly coincidental, since we determined the parameters of the experimental protocol by adjusting the parameters of a simple two-state Markov simulation to obtain a work distribution similar to that obtained by our Langevin simulations of the Duffing-potential system described in the main text. However, it is interesting that this was sufficient to provide matches in both the clean decomposition of the total work distribution by trajectory class and the quantities of Table S2. We also recover the equality of Ref. [38] for the ensemble-average work by averaging Eq. (S3) over each class:

⟨W⟩ = ΔF + k_B T D_KL[ P(C) ∥ R(C^R) ] + k_B T Σ_{C ∈ Q} P(C) D_KL[ P(→z | C) ∥ R(→z^R | C^R) ] ,

which of course is lower bounded by W^min_Q. These results suggest a criterion for optimal trajectory partitions: select a partition sufficiently refined to yield tight bounds on class-average works, but no finer. Machine learning methods for model-order selection may provide a basis for a natural classification scheme for trajectories that captures all relevant thermodynamics and information processing. By changing our forward and reverse processes P and R to begin in system microstate distributions other than equilibrium, a yet-broader class of TCFTs emerges. We can then find analogous results for heats and comparisons with works and nonequilibrium free-energy changes. We explore these in depth elsewhere.

IV. TCFT DERIVATIONS
We now present derivations for the two TCFTs introduced in Eqs. (S2) and (S3). Assume that the system dynamics is described by a Hamiltonian specified in part by an external control protocol, as well as by a weak coupling to a thermal environment that induces steady relaxation to canonical equilibrium. Start the system in equilibrium distribution π_0 for Hamiltonian H_0 and run a protocol until time τ, causing the system Hamiltonian to evolve to H_τ. If we then hold the Hamiltonian at H_τ for a long time, the system relaxes into the equilibrium distribution π_τ. The system's ensemble entropy change from t = 0 to t = ∞ is then:

ΔS = −k_B ⟨ln π_τ(z)⟩_{π_τ} + k_B ⟨ln π_0(z)⟩_{π_0} .

The trajectory-wise system entropy difference is defined to be:

Δs_sys(→z) ≡ −k_B ln π_τ(z_τ) + k_B ln π_0(z_0) ,

where z_0 and z_τ are the initial and final microstates of system microstate trajectory →z, respectively. Averaged over all trajectories →z ∈ →Z, this then becomes the ensemble entropy change. Let p(→z | z_0) denote the probability of obtaining system microstate trajectory →z via the protocol, conditioned on starting the system in state z_0 = →z(0). Now, start the system Hamiltonian at H_τ and run the reverse protocol, ending the Hamiltonian at H_0. We then obtain the trajectory →z with a different conditional probability: r(→z | z_0).
Assuming microscopic reversibility, and given a system trajectory →z, the change in the heat bath's entropy is:

Δs_bath(→z) = k_B ln [ p(→z | z_0) / r(→z^R | z_τ^R) ] = −Q(→z)/T ,

where β = 1/k_B T, Q(→z) is the net energy that flows out of the heat bath into the system given the trajectory →z, and (·)^R denotes time-reversal. This holds for systems with strictly finite energies and Markov dynamics that induce the equilibrium distribution when control parameters are held fixed [39]. Both our simulated Duffing-potential system and flux qubit obey these requirements at sufficiently short time scales. Then we can express the total trajectory-wise change in entropy production due to a trajectory →z as the sum of system and heat reservoir entropy changes:

Δs_tot(→z) = Δs_sys(→z) + Δs_bath(→z) .

Since π_t(z) = e^{−β(H_t(z) − F_t)}, with F_t the system's equilibrium free energy at time t, we can write:

Δs_sys(→z) = [ H_τ(z_τ) − H_0(z_0) − ΔF ] / T .

Using Eq. (S5), we also have:

Δs_tot(→z) = [ W(→z) − ΔF ] / T ,

with:

W(→z) = H_τ(z_τ) − H_0(z_0) − Q(→z) .

Combining, we obtain a detailed fluctuation theorem:

π_0(z_0) p(→z | z_0) / [ π_τ(z_τ^R) r(→z^R | z_τ^R) ] = e^{[W(→z) − ΔF]/k_B T} .   (S6)

From here, we derive our first TCFT by integrating each side of Eq. (S6) over all trajectories →z in a measurable set C ⊂ →Z. Starting with the LHS and recalling the Iverson bracket [·], which is 1 when the interior expression is true and 0 when false, we have:

∫_{→Z} d→z [→z ∈ C] π_0(z_0) p(→z | z_0) e^{−[W(→z) − ΔF]/k_B T} = P(C) ⟨e^{−W/k_B T}⟩_C e^{ΔF/k_B T} ,

while integrating the RHS gives R(C^R). Combining, we have our first TCFT, Eq. (S2).
To obtain the second TCFT, we first change the form of Eq. (S6):

W(→z) − ΔF = k_B T ln { π_0(z_0) p(→z | z_0) / [ π_τ(z_τ^R) r(→z^R | z_τ^R) ] } .

Then we calculate the class-average. The equilibrium free energy change is unaffected, while the rightmost term becomes:

⟨ k_B T ln { π_0(z_0) p(→z | z_0) / [ π_τ(z_τ^R) r(→z^R | z_τ^R) ] } ⟩_C = k_B T ln [ P(C)/R(C^R) ] + k_B T D_KL[ P(→z | C) ∥ R(→z^R | C^R) ] ,

which gives Eq. (S3)'s TCFT.

V. CLASS-AVERAGED WORK APPROXIMATION FOR NARROW DISTRIBUTIONS
Here, we demonstrate that the class-averaged work ⟨W⟩_C approaches its bound W^min_C when the variance and higher moments of the class' distribution of works vanish. One concludes that W^min_C is a good approximation for ⟨W⟩_C when the class' work distribution is narrow. We first express the LHS of Eq. (S2) in terms of the unitless distance of work from its class average:

x ≡ (W − ⟨W⟩_C)/k_B T , so that ⟨e^{−W/k_B T}⟩_C = e^{−⟨W⟩_C/k_B T} ⟨e^{−x}⟩_C .

Then, we Taylor expand the exponential inside the class average:

⟨e^{−x}⟩_C = Σ_{n=0}^{∞} (−1)^n ⟨x^n⟩_C / n! = 1 + ⟨x^2⟩_C/2 − ⟨x^3⟩_C/6 + ··· ,

since ⟨x⟩_C = 0. Equation (S2) then gives:

⟨W⟩_C = W^min_C + k_B T a , with a ≡ ln ⟨e^{−x}⟩_C .

Since e^{−x} is convex, Jensen's inequality gives ⟨e^{−x}⟩_C ≥ e^{−⟨x⟩_C} = 1, so a ≥ 0. Then:

⟨W⟩_C ≥ W^min_C .

The inequality becomes an equality when a goes to zero, which occurs as the variance and higher moments vanish.
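For a Gaussian class work distribution, the correction a = ln⟨e^{−x}⟩_C equals σ²/2 exactly, with σ the work standard deviation in units of k_B T, so the excess shrinks quadratically as the distribution narrows. A numerical check under that Gaussian assumption (names are ours):

```python
import math
import random

def excess_a(works, kT=1.0):
    """a = ln <exp(-x)>_C with x = (W - <W>_C)/kT; the class excess
    <W>_C - W_min_C equals kT * a."""
    mean = sum(works) / len(works)
    n = len(works)
    return math.log(sum(math.exp(-(w - mean) / kT) for w in works) / n)

# Narrow Gaussian class work distribution, sigma in units of kT.
rng = random.Random(3)
sigma = 0.2
works = [rng.gauss(5.0, sigma) for _ in range(100_000)]
a = excess_a(works)  # expect ~ sigma**2 / 2 = 0.02 for a Gaussian
```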

VI. WORK DEFINITIONS AND EXPERIMENTAL ESTIMATION
Properly estimating the required works and dissipated heats from experimental devices undergoing cyclic control protocols requires explicitly and consistently accounting for energy and information flows between the system, its environment, and the controlling laboratory apparatus. To this end, we construct a model Hamiltonian universe for common processes involving small systems interacting with laboratory apparatus and a thermal environment. After deriving key equalities for two definitions of work, the inclusive and exclusive works, we define a method of approximating them in appropriate cyclic protocols.

A. The Model Universe and Hamiltonian
To study a small system that exchanges energy with its environment in the forms of heat and work, we introduce a model universe: a system of interest, a heat bath, and a lab (laboratory apparatus) that controls the system and derives any needed energy from a work reservoir. The system directly interacts with both the heat bath and the lab, but the heat bath and lab are not directly coupled. We assume that a Hamiltonian $\mathcal{H}$ describes the universe's evolution and that there is a set of generalized coordinates that can be sensibly partitioned into those for the system, heat bath, and lab. Then, we decompose the universe Hamiltonian into the following form:
$$\mathcal{H}(s, b, l) = H_S(s) + H_B(b) + H_L(l) + h_{S,B}(s, b) + h_{S,L}(s, l) ~,$$
where $s$, $b$, and $l$ denote both the generalized coordinates and conjugate momenta for the system, bath, and lab, respectively. For any universe Hamiltonian $\mathcal{H}$, there can be many choices for this decomposition. We also define the system Hamiltonian $H$ as the three components that depend on the system coordinates:
$$H(s, b, l) = H_S(s) + h_{S,B}(s, b) + h_{S,L}(s, l) ~.$$

Consider the subset of lab coordinates $l$ on which $h_{S,L}$ has nontrivial dependence. These so-called protocol parameters $\lambda$ are often simple and much fewer than the entire set of $l$. We often assume that we have total control of their evolution. More precisely, under an appropriate preparation of the lab at time $t = 0$, a specific trajectory for the protocol parameters $\{\lambda(t)\}_t$ for $0 \le t \le \tau$ is guaranteed for all preparations of the heat bath and system coordinates. We refer to the parameter trajectory as the protocol.

Suppose the heat-bath degrees of freedom that interact with the system change much faster than the system's. Then the system's response to the bath resembles Brownian motion. On the time scale of changes in the system coordinates, then, we ignore the system-bath interaction term $h_{S,B}$ in writing the system Hamiltonian $H$:
$$H(s, \lambda) = H_S(s) + h_{S,L}(s, \lambda) = T(s) + V(s, \lambda) ~.$$
The latter decomposition into kinetic energy $T$ and potential energy $V$ can be used to write Langevin equations of motion for the system.
Furthermore, if the heat bath has a relaxation time sufficiently short that it is roughly in equilibrium at all times with fixed temperature, then its influence on the system will be memoryless.
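A minimal sketch of the resulting dynamics: the bath enters only as damping plus white noise in an Euler-Maruyama integrator for $H = T(p) + V(s, \lambda)$. The quartic double-well potential and all parameter values below are illustrative assumptions, not those of the experimental system:

```python
import numpy as np

rng = np.random.default_rng(2)

def langevin_step(s, p, lam, dt, m=1.0, gamma=0.5, kT=1.0):
    """One Euler-Maruyama step for H = p^2/(2m) + V(s, lam),
    with V(s, lam) = s^4/4 - lam*s^2/2 (a hypothetical double well)."""
    dVds = s**3 - lam * s
    s = s + (p / m) * dt
    # memoryless bath: linear damping plus matched white noise
    p = p + (-dVds - gamma * p) * dt + np.sqrt(2 * gamma * kT * dt) * rng.normal()
    return s, p

s, p = 1.0, 0.0            # start in the right-hand well
for _ in range(10_000):
    s, p = langevin_step(s, p, lam=1.0, dt=1e-3)
```

The noise amplitude $\sqrt{2\gamma k_B T\, dt}$ is fixed by the fluctuation-dissipation relation, which is what makes the fixed-$\lambda$ dynamics relax to the equilibrium distribution assumed in Section IV.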

B. Inclusive and Exclusive Works and Heats
The basic scenario for executing a protocol is as follows. The universe coordinates begin according to a given initial distribution $\Pr(s, b, l)$ at time $t = 0$ and evolve in isolation until $t = \tau$. As above, we assume that a well-defined protocol $\{\lambda_t\}_t$ emerges due to our preparation of the lab coordinates.
We label all energy exchanged between the system and lab as work and all energy exchanged between the system and heat bath as heat. Since the lab is directly coupled only to the system, the work they exchange is given by the change in energy of the lab's work reservoir. Similarly, since the heat bath is directly coupled only to the system, the heat exchanged is given by the change in the heat bath's energy.
Note that this requires choices as to what constitutes the energies of the three universe subsystems. While $H_B$, $H_S$, and $H_L$ define energies for the heat bath, system, and work reservoir, respectively, what of $h_{S,L}$ and $h_{S,B}$? If all subsystems were macroscopic, these interaction terms would be negligible. While it may be desirable to assume that the system is only weakly coupled to the heat bath, so that $h_{S,B}$ can be ignored, $h_{S,L}$ can be significant in many important small systems. And so, in general, we define the system energy to be $H_S$ plus any portions of $h_{S,L}$ and $h_{S,B}$. Then the work reservoir energy is $H_L$ plus the rest of $h_{S,L}$, while the heat bath energy is $H_B$ plus the rest of $h_{S,B}$. To make these distinctions clear, we label two types of work, each corresponding to one of the two extremes for allocating $h_{S,L}$ between the system and work reservoir: the inclusive work $W$ and the exclusive work $W_0$ [40]. Specifically:
$$\dot{W} \equiv -\frac{dH_L}{dt} \quad \text{and} \quad \dot{W}_0 \equiv -\frac{d}{dt}\left(H_L + h_{S,L}\right) ~.$$
We can similarly define the inclusive heat $Q$ and exclusive heat $Q_0$, depending on how we allocate $h_{S,B}$ between the system and heat bath:
$$\dot{Q} \equiv -\frac{dH_B}{dt} \quad \text{and} \quad \dot{Q}_0 \equiv -\frac{d}{dt}\left(H_B + h_{S,B}\right) ~.$$
The inclusive work corresponds to fully including $h_{S,L}$ in the system energy, while the exclusive work corresponds to excluding it. Inclusive and exclusive heat correspond similarly with respect to $h_{S,B}$. There is a key relation between the inclusive and exclusive works:
$$W(t_0, t_1) = W_0(t_0, t_1) + \Delta h_{S,L} ~. \tag{S7}$$
That is, the inclusive work for an interval of time equals the sum of the exclusive work and the change in the system-lab interaction term $h_{S,L}$. In the above expressions, calculating the rate of change of a work or heat requires the time derivative of one or both of $H_L$ and $H_B$. This can be problematic. Fortunately, there are alternate forms that are amenable. One can show that the inclusive work rate is given by:
$$\dot{W} = \dot{\lambda} \cdot \frac{\partial h_{S,L}}{\partial \lambda}(s, \lambda) ~.$$
This is a more common definition for the work rate in small-system nonequilibrium thermodynamics.
And, it allows the work to be calculated as:
$$W(t_0, t_1) = \int_{t_0}^{t_1} dt \, \dot{\lambda}(t) \cdot \frac{\partial h_{S,L}}{\partial \lambda}\big(s(t), \lambda(t)\big) ~.$$
The exclusive work $W_0$ has a corresponding form:
$$\dot{W}_0 = -\dot{s} \cdot \frac{\partial h_{S,L}}{\partial s}(s, \lambda) ~.$$
For the case where $h_{S,L}$ is a scalar potential for $s$, this is the product of the corresponding force with velocity. This makes the exclusive work equal to the familiar mechanics definition of work as the integral of the dot product of force and displacement:
$$W_0(t_0, t_1) = \int F \cdot ds ~, \quad F = -\frac{\partial h_{S,L}}{\partial s} ~.$$
In this way, we write the inclusive and exclusive work rates in terms of the rates of change of the system and work-reservoir interaction term $h_{S,L}$ with respect to either the work-reservoir or system coordinates.
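The relation $W = W_0 + \Delta h_{S,L}$ of Eq. (S7) is easy to verify numerically along a single simulated trajectory. Here $h_{S,L}(s,\lambda) = (s-\lambda)^2/2$ and the overdamped dynamics are illustrative assumptions; splitting each time step into a $\lambda$-update at fixed $s$ (the inclusive work increment) and an $s$-update at fixed $\lambda$ (minus the exclusive work increment) makes the identity hold exactly, by telescoping:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical interaction term: h_SL(s, lam) = (s - lam)^2 / 2.
h = lambda s, lam: 0.5 * (s - lam)**2

dt, steps = 0.01, 500
lam = np.linspace(0.0, 2.0, steps + 1)
s, W, W0 = 0.0, 0.0, 0.0
h_init = h(s, lam[0])
for t in range(steps):
    # inclusive increment: lab moves lambda while s is fixed
    W += h(s, lam[t + 1]) - h(s, lam[t])
    # overdamped move of s, then exclusive increment: -ds * dh/ds at fixed lambda
    s_new = s - (s - lam[t + 1]) * dt + np.sqrt(2 * dt) * rng.normal()
    W0 -= h(s_new, lam[t + 1]) - h(s, lam[t + 1])
    s = s_new

dh = h(s, lam[-1]) - h_init
print(W, W0 + dh)   # the two totals agree to floating-point precision
```

The same bookkeeping is what licenses the $W = W_0$ shortcut used below for cyclic protocols, where $\Delta h_{S,L}$ vanishes over the full cycle.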

C. Approximating Inclusive Work Experimentally
For the flux qubit experimental system investigated here, we assume the following:
$$H(s, \lambda) = T(s) + V(s, \lambda) ~.$$
That is, as far as the flux qubit and work reservoir are concerned, the only relevant energies at least partially ascribable to the flux qubit are its kinetic energy and the potential energy shared with the work reservoir. $h_{S,L}$ must then capture the change in the potential $V$ due to changes in the protocol parameters. We could simply define $h_{S,L}(s, \lambda) = V(s, \lambda)$, so that $H_S(s) = T(s)$. However, it is more useful to allocate the initial potential energy to $H_S$. That is:
$$H_S(s) = T(s) + V(s, \lambda_0) \quad \text{and} \quad h_{S,L}(s, \lambda) = V(s, \lambda) - V(s, \lambda_0) ~.$$
For cyclic protocols where $V(\cdot, \lambda_0) = V(\cdot, \lambda_\tau)$, such as in our erasure operation, $h_{S,L}(s(t), \lambda(t))$ vanishes for all trajectories at $t = 0, \tau$. By Eq. (S7) we then have the useful equality $W = W_0$ between inclusive and exclusive works taken over the entire protocol.
Estimating $W$ for a system trajectory is then equivalent to estimating $W_0$ for the cyclic protocols we consider. In the flux qubit, the form of $h_{S,L}$ is known, as is the specific protocol $\{\lambda_t\}_{t \in [0,\tau]}$. Unfortunately, we lack sufficient information about the instantaneous state $s$ at all times, since the device's physics precludes precise measurement of the system flux $\phi$, the part of $s$ relevant for determining $h_{S,L}$. We do, however, reliably measure large and stable changes in the flux $\phi$. This specifically monitors when the system moves between wells of the double-well potential $V(\cdot, \lambda(t))$, provided the rate of transitions between wells is sufficiently slow.
And so, we can use information about the flux φ to approximate the exclusive work contribution at each moment in time. Then, adding up these contributions yields an approximation to the total exclusive work W 0 over the entire protocol and therefore of the inclusive work W over the entire protocol. Note that the protocols used here maintain two wells at all times for the system flux φ. We develop the approximation in two steps.

First-Order Approximation
We first partition the potential in flux space into three segments. Two segments constitute the wells, in which the system spends nearly all its time apart from very brief transitions between wells. The third segment connects the two wells, capturing the dynamics of crossing the barrier that separates them.
We require that the partitioning permits the following two approximations. First, the particle spends negligible total time in between the two wells. Second, the wells do not change shape over the protocol; they simply rise or fall in potential at different times, if they change at all. This means that the shape of the system-lab interaction term $h_{S,L}(\cdot, \lambda(t))$ at any time $t$ is very simple within the two wells: flat.
The result is that the exclusive work over any time duration is easily calculated from the experimental data. During times when the flux remains in a well, the exclusive work must be zero, since $h_{S,L}$ does not change with $s$. During a transition, the shape of $h_{S,L}$ does not change, due to the first approximation. Then the exclusive work is the difference in heights of the two wells as measured by $h_{S,L}$:
$$\Delta W_0 = h_{S,L}\big(w_0, \lambda(t)\big) - h_{S,L}\big(w_1, \lambda(t)\big) ~,$$
where $\lambda(t)$ is the protocol parameter setting at any time during the transition and $w_0$ and $w_1$ are arbitrary flux values in the starting and ending wells, respectively.
Thus, the total inclusive work W over the protocol for a trajectory is simply the sum of the jump contributions above for each transition.

Second-Order Approximation
In point of fact, the potential wells do change shape. Fortunately, our method for calculating the inclusive work over the protocol remains valid under weaker constraints on the protocol.
We first require the protocol to maintain two metastable regions, the informational states, at all times, each possessing a unique local potential minimum continuously in time. We denote the flux value at the potential minimum of informational state $i$ at time $t$ as $\phi_i^t$. The protocol must also evolve slowly enough that the potential landscape changes slowly compared to the system's relaxation rate in each metastable region. Both of these criteria are met by our erasure protocol.
Consider a short duration $\Delta t$ during which the potential $V(\cdot, \lambda(t))$ changes little, yet which is long compared to the relaxation times of the informational states. Two cases arise: either the system crosses the barrier between the two informational states during this time or it remains in one informational state.
First, suppose that the system transitions from one informational state $i$ to the other $j$. Denote the system flux at the beginning of the transition as $\phi^t$ and at the end as $\phi^{t+\Delta t}$. By Eq. (S7), the exclusive work contribution $\Delta W_0^\text{trans}$ is the difference of the inclusive work contribution and the change in the system-lab interaction term $\Delta h_{S,L}$. The change $\Delta h_{S,L}$ can itself be broken down into two terms, one for the difference in $h_{S,L}$ between the informational-state minima and the other for the change in $h_{S,L}$ local to the respective minima. In other words:
$$\Delta h_{S,L} = \Delta m_t + \Delta l_t ~, \tag{S12}$$
with
$$\Delta m_t = h_{S,L}\big(\phi_j^{t+\Delta t}, \lambda(t+\Delta t)\big) - h_{S,L}\big(\phi_i^{t}, \lambda(t)\big)$$
and
$$\Delta l_t = \Big[ h_{S,L}\big(\phi^{t+\Delta t}, \lambda(t+\Delta t)\big) - h_{S,L}\big(\phi_j^{t+\Delta t}, \lambda(t+\Delta t)\big) \Big] - \Big[ h_{S,L}\big(\phi^{t}, \lambda(t)\big) - h_{S,L}\big(\phi_i^{t}, \lambda(t)\big) \Big] ~,$$
where $\Delta m_t$, Eq. (S12)'s first term, is the change in $h_{S,L}$ at the informational-state minima and $\Delta l_t$, Eq. (S12)'s second term, is the change in $h_{S,L}$ of the system with respect to the informational-state minima. Our protocol ensures that the total number of transitions is so small and the time durations so narrow that we can ignore the total contribution of inclusive works $\Delta W^\text{trans}$ due to these transition durations. Then, we approximate the exclusive work contribution during a transition via:
$$\Delta W_0^\text{trans} \approx -\Delta h_{S,L} = -\Delta m_t - \Delta l_t ~. \tag{S13}$$
Suppose, now, that the system remains in one informational state $i$ during a time interval $\Delta t$. Since the relaxation rate is fast compared to the duration $\Delta t$, we assume that the system visits all microstates in the informational state roughly in proportion to the local equilibrium distribution. Then, the inclusive work contribution $\Delta W^\text{stay}$ is approximately independent of the specific system trajectory during this time and, instead, is determined by the time duration and the informational state $i$. If during this time we simultaneously shifted the entire potential up by a given amount, we would add an inclusive work contribution equal to the potential shift, but the system trajectory would be unchanged.
Thus, the actual inclusive work contribution equals an amount due to the change in the system-lab interaction term at the informational-state minimum plus an amount due solely to the change in potential shape at the informational state with respect to its minimum. That is:
$$\Delta W^\text{stay} \approx \Delta m_t + \Delta W_s ~,$$
where $\Delta W_s$ is the inclusive work contribution due to the change in potential shape at the informational state. Equation (S12) applies equally well here in describing the change in the system-lab interaction term, now with $j = i$. Thus, the exclusive work contribution $\Delta W_0^\text{stay}$ for this time interval is:
$$\Delta W_0^\text{stay} = \Delta W^\text{stay} - \Delta h_{S,L} \approx \Delta W_s - \Delta l_t ~.$$
The result is that we have exclusive work contributions both for durations when the system transitions between informational states and for durations when it remains in one.
To find the total exclusive work over the protocol for a given trajectory, we add up the contributions. The sum of all local $h_{S,L}$ changes $\Delta l_t$ over all durations telescopes to the net local change in $h_{S,L}$. Recall, though, that the minima of the informational states begin and end at the same values. And so, the total local change in $h_{S,L}$ reduces to the net change in $h_{S,L}$ itself. However, since we chose $h_{S,L}(\cdot, \lambda(0)) = h_{S,L}(\cdot, \lambda(\tau)) = 0$, this must vanish:
$$\sum_t \Delta l_t = 0 ~.$$
We can now specify our final approximation: at any time $t$, the inclusive work contribution $\Delta W_s$ due to the change in potential shape is independent of the informational state. This is reasonable for our erasure protocol, since the asymmetric contribution to the change in potential, the tilt, is slight. While it clearly breaks the symmetry of the double-well potential by changing the well heights, it has less effect on the well shapes and even less in making those shapes distinct.
Then, we can assume that the sum of $\Delta W_s$ for a trajectory is the same as that for a trajectory staying in one informational state the entire time. Since the protocol is very slow and cyclic, though, a particle that stays in one informational state the entire time must receive approximately zero inclusive work $W$. Given that the sum over all $\Delta W_s$ must equal $W$ for such a trajectory, it must also be negligible.
Altogether, the total exclusive work is approximately given by the sum, over all transitions between informational states, of the difference in potential at the informational-state minima:
$$W_0(0, \tau) \approx -\sum_\text{transitions} \Delta m_t \approx \sum_\text{transitions} \Big[ h_{S,L}\big(\phi_i^{t}, \lambda(t)\big) - h_{S,L}\big(\phi_j^{t}, \lambda(t)\big) \Big] ~,$$
where each transition at time $t$ takes the system from informational state $i$ to $j$. To reiterate, since $\Delta h_{S,L} = 0$ over the full cycle, this is also the total inclusive work $W(0, \tau)$ for a trajectory over the entire protocol.
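In code, the resulting estimator needs only the measured jump times and directions together with the known time course of the well minima. The helper below is a hypothetical sketch with an invented two-state protocol, using our reading of the sign convention: each jump from state i to j at time t contributes the starting-well height minus the ending-well height.

```python
def work_estimate(t_jumps, from_states, well_height):
    """Tally the transition-based work estimate.

    t_jumps: times of the recorded inter-well jumps
    from_states: informational state (0 or 1) occupied before each jump
    well_height(state, t): potential at that state's minimum at time t
    """
    W = 0.0
    for t, i in zip(t_jumps, from_states):
        j = 1 - i                                  # two informational states
        W += well_height(i, t) - well_height(j, t)
    return W

# Illustrative protocol: state-1 well rises linearly while state-0 stays put.
height = lambda state, t: 2.0 * t if state == 1 else 0.0
# A trajectory that hops 0 -> 1 at t = 0.25 and back 1 -> 0 at t = 0.75:
print(work_estimate([0.25, 0.75], [0, 1], height))
```

Only the jump record enters the estimate; between jumps the contributions cancel by the arguments above, which is what makes the method compatible with flux measurements that resolve only well-to-well transitions.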

VII. SUBSTAGE WORK DISTRIBUTIONS COMMENTARY
Here, we briefly interpret several features of the substage work distributions observed in Fig. 1(Outer left plots).
The distributions for barrier dropping and tilting are narrow, symmetric peaks; see Fig. 1 (Outer left plots). Barrier raising also has a rather narrow peak, composed primarily of trajectories always in the R state, but exhibits a bulge toward positive work as well; see Fig. 1 (Top right). Note that the L state is created mid-way through barrier raising, allowing for trajectories that spend some time in either informational state, but disallowing trajectories that spend all time in the L state. The former induce the bulge toward less negative works; while notable, this will not be explored further here.
The substage work distribution for untilting presents the most striking picture; see Fig. 1 (Bottom right). Always-R trajectories induce a large positive work peak (red), always-L trajectories induce a large negative work peak (orange), and all other trajectories induce a ramp between them (blue).
These features can be directly interpreted by following the locations of the potential minima over time and noting how the shifting potential adds or removes energy from a particle. During barrier dropping, to take one example, the protocol raises both minima by over 7 k B T , resulting in a narrow, peaked work distribution with a mean near 7 k B T .
Most interesting is the untilt substage. Since most particles start and then stay in the R state for this substage, a large positive work is probable, due to the rising R-state well. However, it is also possible for the system to start in and then get stuck in the L-state well, resulting in a large negative work. The final possibility is transitioning between states during untilting, resulting in an intermediate range of less-likely work values. Trajectories that do transition between states during untilting are likely to spend more time in the R state, since it is energetically favored; this produces a probability that rises with increasing work, giving the log-linear ramp in the work distribution.
Note that there are small peaks on each end of this third class' distributions that require a more nuanced explanation. When a particle crosses a barrier-due to random thermal excitation-the surplus energy may quickly send the particle back to the previous well before it can be dissipated. Such particles then spend almost all of the substage in this first well, generating a work value accordingly. Statistics of the ramp proper are due to particles that have time to locally equilibrate before crossing any barriers.
Follow-on work develops the theory underlying this detailed mechanistic analysis and analyzes similar behavior in all metastable-quasistatic processes.

VIII. FLUX QUBIT DEVICE, CALIBRATION, AND MEASUREMENT
The benefits of the flux qubit device are several-fold. First, its physics provides genuinely two-degree-of-freedom dynamics, while other comparable experiments on Maxwellian demons and bit erasure are very high dimensional, only indirectly providing effectively few-degree-of-freedom dynamics [19,21,41]. Second, it operates at very high frequency, so one readily captures the substantial amounts of data required to accurately estimate rare-fluctuation statistics. Third, it leverages recent advances in manufacturing technology led by efforts in quantum computing. Fourth, being constructed via modern integrated-circuit technology, it forms the basis of a technology that will scale to large, multicomponent circuit devices for more sophisticated thermodynamic computing. And, finally, in the near future flux qubits will facilitate experiments that probe the thermodynamics of the transition to quantum information processing.
At the microscopic level, a fraction of the electrons in a superconducting metal form bosonic Cooper pairs, a quantum-coherent condensate. For designing superconducting electronic circuits, though, one can forgo the microscopic description and work with higher-level phenomena, such as flux quantization and the Josephson relations for weak links. Importantly, the circuit-level degrees of freedom are not merely coarse-grained quantities, but display a full range of quantum behavior, including quantized excitations, coherent superpositions, and entangled states. For our purposes here, however, we run the device so that it exhibits only classical stochastic dynamics, reserving explorations of quantum information thermodynamics for the future.
This section lays out the basic physics of the flux qubit device and details of the experimental implementation.

A. Flux qubit physics
Our experimental information processor is a special type of superconducting quantum interference device (SQUID) with two degrees of freedom: a gradiometric flux qubit, or the variable-$I_c$ rf SQUID introduced by Ref. [42]. Notably, the energies associated with motion perpendicular to and along the escape direction differ substantially, by about a factor of 12. Practically, this asymmetry reduces the two-dimensional potential to one dimension. The net result is a device with an effective double-well potential, with barriers as low as $\Delta U \sim k_B T$, that operates at frequencies in the GHz range. The potential shape is set by fluxes that are readily adjusted via currents. The SQUID device parameters used to determine the potential shape and energy scales were all independently measured.
The variable-$I_c$ rf SQUID replaces the single Josephson junction of a standard rf SQUID with a symmetric dc SQUID of small inductance parameter $\beta_{dc} = 2\pi \ell I_{c0}/\Phi_0 \ll 1$, where $2\ell$ is the dc SQUID loop inductance, $I_{c0} = I_{c1} + I_{c2}$ is the sum of the critical currents of the two junctions, and $\Phi_0$ is the flux quantum $h/2e$. This architecture gives a device whose parameters can be accurately measured and that can be selected to exhibit a range of phenomena, including thermal activation, macroscopic quantum tunneling, incoherent relaxation, photon-induced transitions, and macroscopic quantum coherence. It also allows us to perform, as we demonstrate, nanoscale thermodynamic computing.
Its macroscopic dynamical variables are the magnetic flux $\Phi$ through the rf SQUID loop and the flux $\Phi_{dc}$ through the dc SQUID loop. Based on the resistively-and-capacitively-shunted junction (RCSJ) model of Josephson junctions, in the classical limit the variable-$I_c$ rf SQUID's deterministic equations of motion take the form of damped Newtonian dynamics in the two fluxes [42]:
$$C \ddot{\Phi} = -\frac{\partial U}{\partial \Phi} - \frac{\dot{\Phi}}{R} ~, \qquad C_{dc} \ddot{\Phi}_{dc} = -\frac{\partial U}{\partial \Phi_{dc}} - \frac{\dot{\Phi}_{dc}}{R_{dc}} ~,$$
where $C$ and $R$ ($C_{dc}$ and $R_{dc}$) are the effective capacitance and resistance of the in-phase (out-of-phase) mode. In units of $\Phi_0/2\pi$, the 2D potential for the variable-$I_c$ rf SQUID is $U(\phi, \phi_{dc}) = U_0 f(\phi, \phi_{dc})$ with:
$$f(\phi, \phi_{dc}) = \frac{1}{2}(\phi - \phi_x)^2 + \frac{\gamma}{2}(\phi_{dc} - \phi_{xdc})^2 - \beta_0 \cos\frac{\phi_{dc}}{2} \cos\phi - \delta\beta \sin\frac{\phi_{dc}}{2} \sin\phi ~,$$
where $U_0 = \Phi_0^2/(4\pi^2 L)$. Here, $\gamma = L/(2\ell)$ is the ratio of rf and dc SQUID inductances; $\phi_x$ ($\phi_{xdc}$) is the external flux applied to the rf (dc) SQUID loop; $\phi$ ($\phi_{dc}$) is the flux enclosed in the rf (dc) SQUID loop; $\beta_0 = 2\pi L I_{c0}/\Phi_0$; and $\delta\beta = 2\pi L (I_{c2} - I_{c1})/\Phi_0$.
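A sketch of this potential (with Josephson-term signs that are our reconstruction and should be checked against Ref. [42]) confirms that the quoted parameters produce a double well along $\phi$ when $\beta_0 \cos(\phi_{dc}/2) > 1$. Here we evaluate the stiff-$\gamma$ slice $\phi_{dc} = \phi_{xdc}$, since the dc mode is tightly confined for $\gamma = 12$, and count local minima:

```python
import numpy as np

# Device parameters from the text; a symmetric bias point is assumed.
beta0, gamma, dbeta = 6.2, 12.0, 0.2

def f(phi, phi_dc, phi_x, phi_xdc):
    """Dimensionless 2D potential (our reconstructed sign convention)."""
    return (0.5 * (phi - phi_x)**2 + 0.5 * gamma * (phi_dc - phi_xdc)**2
            - beta0 * np.cos(phi_dc / 2) * np.cos(phi)
            - dbeta * np.sin(phi_dc / 2) * np.sin(phi))

phi = np.linspace(0, 2 * np.pi, 2001)
# stiff-gamma slice: pin phi_dc at phi_xdc; bias phi_x = pi (symmetric wells)
u = f(phi, phi_dc=0.0, phi_x=np.pi, phi_xdc=0.0)
# count interior local minima: expect two wells since beta0*cos(0) = 6.2 > 1
minima = np.where((u[1:-1] < u[:-2]) & (u[1:-1] < u[2:]))[0] + 1
print(len(minima), "wells at phi =", phi[minima])
```

Raising $\phi_{xdc}$ toward $2\pi$ drives $\beta_0 \cos(\phi_{dc}/2)$ below 1 and merges the wells, which is the barrier-control knob used by the erasure protocol.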
For large-amplitude tuning of the external controls, the system response to $\phi_x$ ($\phi_{xdc}$) is $2\pi$ ($4\pi$) periodic. We use these global features to accurately determine the coefficients of the potential.
In the experiment, cross-coupling between the barrier and tilt controls was canceled by an affine transformation (φ x , φ xdc ) → (φ x + αφ xdc , φ xdc ), with the coefficient α chosen such that the equilibrium population of the left and right wells was unaffected to first order by the barrier control φ xdc .
Operating the magnetometer generates wide-band local electromagnetic interference that can affect the dynamics of the flux qubit. A careful study of the back-action indicates that low-amplitude operation of the magnetometer can induce transitions in a manner that corresponds to a shift in the effective tilt and flux controls. Importantly, the effective temperature under magnetometer operation was not elevated above 500 mK.
The dynamical variable $\phi$ describes the in-phase motion of the two junctions, which results in a current circulating in the rf SQUID loop. The dynamical variable $\phi_{dc}$ describes the out-of-phase motion, resulting in a current circulating in the dc SQUID loop. The shape of the effective potential is completely determined by the dimensionless function $f(\phi, \phi_{dc})$, and its energy scale is set by $U_0$. With suitable device parameters and applied fluxes ($\phi_x$ and $\phi_{xdc}$) one obtains a double-well potential. The barrier height $\Delta U$ separating the two wells is readily adjusted by varying $\phi_{xdc}$. The effective potential is plotted in Fig. 3(B) with parameters $\beta_0 = 6.2$, $\gamma = 12$, and $\delta\beta = 0.2$.

B. Experimental implementation
The junctions were $1 \times 1\,\mu\text{m}^2$ Nb/Al$_2$O$_3$/Nb tunnel junctions with very low subgap leakage, typically having a quality parameter $V_m \approx 70$ mV at 4.2 K.
We followed a standard procedure (see, e.g., Ref. [42]) for calibrating the flux qubit parameters. An outline of the steps is given below. A complete description of the measurements is presented elsewhere.
First, by executing wide-range sweeps of the coil currents $I_\text{tilt}$ and $I_\text{barrier}$, parameter values corresponding to single-valued and bistable potential landscapes are recorded. A linear transformation from $I_\text{tilt}$ and $I_\text{barrier}$ to ($\phi_x$, $\phi_{xdc}$) is established by matching the experimental periodicity with the theoretical one ($2\pi$, $4\pi$). Linear cross-talk from $I_\text{barrier}$ to $I_\text{tilt}$ is calibrated by orthogonalizing the global response. Cross-talk from $I_\text{tilt}$ to $I_\text{barrier}$ can be assumed to be small, due to the symmetry of the on-chip flux lines, and is taken to be zero.
The parameter values $\beta_0 = 6.2$, $\gamma = 12$, and $\delta\beta = 0.2$ are determined by equating the observed extent of hysteresis at $\phi_{xdc} = 0$ and the differential flux response $d\phi/d\phi_x$ at $\phi_{xdc} = 2\pi$ to theoretical predictions. The prefactor $U_0/k_B = 56.3$ K is determined by equating the observed escape energy for inter-well transitions at high temperatures with $k_B T$. The plasma frequency $\omega_p = 1/\sqrt{LC} = 2\pi \times 13.7$ GHz is determined from the observed low-temperature crossover temperature $T_\text{cr} = 103$ mK to macroscopic-quantum-tunneling (MQT) dominated dynamics. We obtain an upper bound $Q = \omega_p R C < 130$ from the coupling to the passive shunt resistor of the magnetometer. Parameter-calibration measurements are performed in such a way that the effect of magnetometer back-action is nulled through pulsing of the readout or otherwise minimized. The effective temperature under continuous magnetometer operation was determined by repeating the measurement of the escape energy for inter-well transitions and comparing the result to that obtained under pulsed magnetometer operation.