Principles of Low Dissipation Computing from a Stochastic Circuit Model

We introduce a thermodynamically consistent, minimal stochastic model for complementary logic gates built with field-effect transistors. We characterize the performance of such gates with tools from information theory and study the interplay between accuracy, speed, and dissipation of computations. With a few universal building blocks, such as the NOT and NAND gates, we are able to model arbitrary combinational and sequential logic circuits, which are modularized to implement computing tasks. We find generically that high accuracy can be achieved provided sufficient energy consumption and time to perform the computation. However, for low-energy computing, accuracy and speed are coupled in a way that depends on the device architecture and task. Our work bridges the gap between the engineering of low dissipation digital devices and theoretical developments in stochastic thermodynamics, and provides a platform to study design principles for low dissipation digital devices.


I. INTRODUCTION
The last decade has seen an exponential growth in energy consumption associated with information, communications, and computing technologies. Such resource demands are not sustainable, and thus there is a need to design devices with reduced energetic costs. While the problem of computing efficiency dates back to Landauer, 1,2 with modern developments in stochastic thermodynamics, this problem is actively being revisited. 3,4 The main goal of this paper is to bridge the gap between developments in nonequilibrium statistical physics and circuit engineering by proposing a model for stochastic logic circuits that is thermodynamically consistent, and thus amenable to physical analysis and constraints, but simple enough to be extendable to complex computing tasks. By treating thermal fluctuations in electron transport explicitly at a mesoscopic scale, our model reproduces the behavior of a robust circuit in the low noise limit, but describes errors accurately away from this limit. With this model we explore the consequences of carrying out computations at low thermodynamic costs and finite time, and provide design principles for low dissipation computing devices.
State-of-the-art semiconductor devices are typically built from metal-oxide-semiconductor field-effect transistors on the scale of a few nanometers, enabling billions of transistors to be packed on a single chip. In order to mitigate heating and large energy consumption burdens, it would be advantageous to operate such small devices with small bias voltages; however, as biases approach thermal scales, fluctuations increase, which necessitates a careful treatment of thermal noise. 5,6 The conventional treatment of thermal noise is largely phenomenological and involves either a correction to the power spectral density, 7 or transformation of the internal noise into external independent sources. 8,9 Such models are typically valid only near equilibrium where the fluctuation-dissipation theorem can be invoked to constrain their functional form, 10 whereas higher-order correlations are needed in general to determine the full response. [11][12][13] While these models can provide insight into how thermal noise may put a physical limit on the density of transistors, 14 their validity in non-linear electrical networks operating far from equilibrium is uncertain.
Stochastic thermodynamics provides a theoretical way to move beyond an equilibrium description of thermal noise and its impact on information processing. 15 While information theory provides limits on the accuracy of typical communication, 16,17 stochastic thermodynamics provides generalized fluctuation-dissipation relationships, and places limits on the work required to implement a physical process in finite time and the spectrum of its fluctuations. [18][19][20][21][22] The link between information theory and stochastic thermodynamics has generated a wealth of expressions relating precision, speed, and dissipation, including the thermodynamic uncertainty relationships, speed limits, and fluctuation theorems. For example, dissipation bounds the rate at which a system transforms between different states. 23-29 Dissipation also provides an upper bound for the precision of a current. [30][31][32] A universal tradeoff between power, precision, and speed has been proposed for communication systems as well. 33 These theoretical results have found application in many biological processes that natively operate near thermal energy scales. [34][35][36][37][38][39] Placed in the context of artificial computing, these relationships have shed light on fundamental constraints on the design of computing devices to minimize thermodynamic costs. 3,4,[40][41][42][43] While such theoretical results are general, applying them to the problem of computing design requires a realistic physical representation of information processing, such as bit storage, measurement, and erasure. Some progress has been made with nonlinear single-electron devices and Coulomb blockade systems, [44][45][46] where the logical states are represented by the presence of a few electrons. More recently, thermodynamically consistent stochastic models have been proposed for transistors and non-linear electronic circuits using either continuous or discrete degrees of freedom.
[47][48][49] For example, two-terminal devices, such as tunnel junctions, diodes, and metal-oxide-semiconductor (MOS) transistors, have been modeled as bi-directional Poisson processes embedded in a Markovian graph representing electron transfer. 49 While such models can reproduce nonlinear current behaviors and noise characteristics, the nonlinearity has to be encoded by parametrizing the voltage dependence of the forward and backward rates. In this paper, we adopt a different approach where single logic gates are described by a tunnel junction model on the mesoscopic scale, combined with a capacitive circuit model for the charging and manipulation of the device. In this case, nonlinearity emerges from many interacting gates. Such an approach is able to describe electron transport processes consistent with the fluctuation theorems, 50,51 but also consistent with the complementary metal-oxide-semiconductor (CMOS) circuit platform used widely in modern computing devices. Therefore, it provides an ideal platform to study circuit behaviors with the tools of stochastic thermodynamics.
In what follows, we demonstrate principles for low dissipation computing by constructing a stochastic model for logic circuits from a bottom-up approach. By working with elementary linear components, we can build nonlinear circuits that are thermodynamically consistent. We first introduce a model for single gates, including the NOT gate and the NAND gate, and discuss their physical properties. We then study the collective behaviors of these basic components, including spatial correlations within combinational circuits, and temporal correlations within sequential circuits, where the emphasis will be on circuit design principles. The logic circuits are finally modularized and scaled up to a computing device to illustrate how multiple components are synchronized to complete a computing task. Throughout, the thermodynamically consistent model enables a description of errors and dissipation.

II. MODEL FOR SINGLE GATES
Modern CMOS circuits implement logic functions by integrating two different types of transistors differentiated by their majority charge carriers, so-called N-type and P-type transistors. Here we choose a mesoscopic tunnel junction model to describe electron transport in a single gate. 52 The transistors are modeled by two single-electron levels of energy ε_i with i = N, P for the N-type and P-type transistors. The electrodes are modeled by electron reservoirs with chemical potential µ_j with j = s, d, g denoting the source, drain, and gate, respectively. Electron transfer among them is described by a Markovian master equation, parametrized by transition rates k_ji that describe the exchange rate of an electron from site i to j. The transition rates are chosen to satisfy a local detailed balance condition,

k_ji / k_ij = e^{−β(E_j − E_i)},    (1)

thus guaranteeing thermodynamic consistency, where β = 1/k_B T, k_B is the Boltzmann constant, and T is the temperature of the device. The energy E_i is given by either the band energy for an electron in a transistor, ε_i, or a chemical potential, µ_j, for an electron in an electrode. The condition of local detailed balance is a prerequisite for the application of stochastic thermodynamics, as it ensures a correct description of dissipation away from equilibrium, and relaxation to a Boltzmann distribution at equilibrium. While local detailed balance models each microscopic transition as being thermally mediated, emergent nonlinear behaviors resulting from collections of transitions can take the system arbitrarily far from equilibrium. 53 The energy levels of the transistors are controlled by an input voltage denoted V_in. In the case of a field-effect transistor, V_in refers to the gate voltage that switches the transistor on and off. In the limit of high gate capacitance, V_in changes the energy levels of the transistors approximately linearly, 52

ε_N = ε_N^0 − qV_in,   ε_P = ε_P^0 + qV_in,    (2)

where ε_{i=N,P}^0 are reference energies and q is the unit of electric charge.
The sign of the slope differentiates the N- and P-type transistors with their different charge carriers. In our model, a voltage also uniquely determines the energetics of the electrodes by modulating their chemical potentials, µ_j = −qV_j. Throughout, we will differentiate between two different types of electrodes. The first type, including the source and drain electrodes, is kept at fixed potentials, V_s and V_d, respectively. The second type, the gate electrode, satisfies a capacitive charging model with a fluctuating voltage V_g for reading out a gate. This is justified by the fact that in CMOS circuits, the output of a single gate is usually used as the input of another gate, in which case the two are connected through a capacitor. The dynamics of V_g is described by the equation of motion

C_g dV_g/dt = −qJ_g,    (3)

where C_g is the capacitance and J_g is the electron current flowing into the electrode from the transistors. The constant capacitance implies a quadratic energy for charging the electrode, E = C_g V_g^2/2. We adopt a semi-classical ballistic transport model for the rate of transfer of an electron from an electrode into or out of a transistor. [54][55][56] Such a description is valid in the limit of weak coupling between a transistor and an electrode relative to the thermal energy, and for transistors that are small in scale relative to the mean free path of the electron. We restrict our analysis to single energy level transistors, for which the corresponding transition rates between transistor i and electrode j are

k_ij = Γ f_j(ε_i),   k_ji = Γ [1 − f_j(ε_i)],    (4)

where f_j(x) = [e^{β(x−µ_j)} + 1]^{−1} is the Fermi distribution. The prefactor Γ is related to contact resistances and is chosen so that the timescale of electron transitions is longer than the timescale of thermal fluctuations, and thus the broadening of energy levels due to the coupling is smaller than thermal fluctuations.
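These rates take only a few lines to implement; as a consistency check, their ratio reproduces the local detailed balance condition of Eq. 1. The sketch below assumes units with β = 1 and q = 1, so energies are measured in k_B T, and uses the rate prefactor quoted in the parameter choices below:

```python
import math

GAMMA = 0.2  # rate prefactor, in units of 1/(beta hbar), as in the text

def fermi(x):
    # Fermi function f(x) = 1/(e^x + 1) with beta = 1 (energies in k_B T)
    return 1.0 / (math.exp(x) + 1.0)

def rate_in(eps, mu):
    # electron hops from an electrode at chemical potential mu into level eps
    return GAMMA * fermi(eps - mu)

def rate_out(eps, mu):
    # electron hops from level eps back into the electrode
    return GAMMA * (1.0 - fermi(eps - mu))

# Local detailed balance, Eq. 1: ln(k_in / k_out) = -(eps - mu)
eps, mu = 1.7, -0.4
lhs = math.log(rate_in(eps, mu) / rate_out(eps, mu))
```

The identity k_ij/k_ji = e^{−β(ε_i − µ_j)} holds for any level and any electrode, which is what guarantees relaxation to a Boltzmann distribution at equilibrium.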
In making these assumptions to simplify our model, we have neglected effects such as scattering within the transistor, delocalization between the electrode and the transistor, and electron correlations, each of which can be incorporated into our model as long as thermodynamic consistency is retained. Since we will be considering energy scales on the order of thermal fluctuations at room temperature, we use V_T = k_B T/q ≈ 26 mV and βℏ ≈ 25 fs as our units of voltage and time, where ℏ is the reduced Planck constant. The voltage signal-to-noise ratio V_d/V_T in our model will be on the order of 10, a prerequisite for low dissipation in the computing process, since the dissipation and the signal-to-noise ratio are closely related. While this ratio is much lower than in current technology, and requires delicate operation of the device, it can be experimentally achieved by designs such as the single-electron box. 57,58 We reference potentials relative to the source voltage so that V_s = 0, and take ε_P^0 = 0 and ε_N^0 = 1.5qV_d so that there exists only one independent energy parameter, V_d. The transition rate constant is chosen as βℏΓ = 0.2 to ensure the weak coupling assumption is valid. 59 To study the dynamics of the gates, we use both the exact steady-state solution of the master equation when possible, and Gillespie simulations 60 to sample individual trajectories. We set C_g = 200q/V_T in order to separate the timescales of capacitor charging from individual electron transfer events, simplifying the Gillespie simulations. Details of the numerical methods and the justification of the parameters can be found in Section I of the Supplementary Information (SI).

A. NOT gate
The NOT gate, also known as the inverter, takes a single binary input X and generates its complement as the output Y. The circuit diagram of the NOT gate, composed of two transistors, is shown in Fig. 1A. The N-type transistor is connected to a lower source voltage V_s = 0 on its left, while the P-type transistor is connected to a higher drain voltage V_d on its right. Both transistors are controlled by an input voltage V_in as in Eq. 2, which is treated as fixed for a single gate, while the output voltage V_out is measured between the two transistors from the capacitor voltage V_g, which evolves according to Eq. 3. The kinetic diagram for our Markovian model is also shown in Fig. 1A. Electrons can move ballistically between adjacent sites in the kinetic diagram according to a master equation, the details of which can be found in the SI (Eq. S4). A NOT gate is typically characterized by its voltage transfer curve (VTC), shown in Fig. 1B. The VTC reports the average V_out in response to V_in in the long time limit. Generically, we find increasing V_in results in a decrease in V_out, in agreement with the expected response of an inverter. However, its behavior depends on the scale of the thermal noise relative to V_d. The limiting values of V_out approach 0 and V_d for V_in = V_d and 0, respectively, and the transition between these limits sharpens with increasing V_d. Both features result from tuning the band energies of the two transistors in or out of resonance with their respective electrodes, as the transistor band energies depend on V_d through Eq. 2. Increasing V_d with V_in = 0 or V_d increasingly suppresses the leakage current into the gate capacitor through the off transistor. In the limit that current flows from only one electrode with fixed voltage, the gate electrode reaches an equilibrium state at that same voltage.
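A minimal Gillespie sketch of this kinetic diagram can be written directly from Eqs. 2-4 (units β = q = 1, voltages in V_T, and the parameter values quoted in Sec. II); this is an illustrative simplification, not the production code behind the figures:

```python
import math
import random

def fermi(x):
    # Fermi function with beta = 1 (energies in units of k_B T)
    return 1.0 / (math.exp(x) + 1.0)

def gillespie_not(v_in, v_d=5.0, gamma=0.2, c_g=200.0, n_events=50_000, seed=0):
    """Sample one trajectory of the NOT gate and return the final V_g.

    Units: q = 1, beta = 1, voltages in V_T, time in beta*hbar. The state is
    (n_N, n_P, V_g); each electron transfer between the gate node and a
    transistor shifts V_g by +/- 1/C_g (Eq. 3)."""
    rng = random.Random(seed)
    eps_n = 1.5 * v_d - v_in   # Eq. 2, N-type level lowered by the input
    eps_p = 0.0 + v_in         # Eq. 2, P-type level raised by the input
    mu_s, mu_d = 0.0, -v_d     # mu_j = -q V_j
    n_occ, p_occ, v_g, t = 0, 0, 0.0, 0.0
    for _ in range(n_events):
        mu_g = -v_g
        moves = []  # entries: (rate, d n_N, d n_P, d V_g), rates from Eq. 4
        if n_occ == 0:
            moves.append((gamma * fermi(eps_n - mu_s), 1, 0, 0.0))        # s -> N
            moves.append((gamma * fermi(eps_n - mu_g), 1, 0, 1.0 / c_g))  # g -> N
        else:
            moves.append((gamma * (1 - fermi(eps_n - mu_s)), -1, 0, 0.0))        # N -> s
            moves.append((gamma * (1 - fermi(eps_n - mu_g)), -1, 0, -1.0 / c_g)) # N -> g
        if p_occ == 0:
            moves.append((gamma * fermi(eps_p - mu_d), 0, 1, 0.0))        # d -> P
            moves.append((gamma * fermi(eps_p - mu_g), 0, 1, 1.0 / c_g))  # g -> P
        else:
            moves.append((gamma * (1 - fermi(eps_p - mu_d)), 0, -1, 0.0))        # P -> d
            moves.append((gamma * (1 - fermi(eps_p - mu_g)), 0, -1, -1.0 / c_g)) # P -> g
        total = sum(m[0] for m in moves)
        t += rng.expovariate(total)   # exponential waiting time
        r = rng.random() * total      # choose a move proportionally to its rate
        for rate, dn, dp, dv in moves:
            if r < rate:
                n_occ, p_occ, v_g = n_occ + dn, p_occ + dp, v_g + dv
                break
            r -= rate
    return v_g

v_final = gillespie_not(0.0)  # X = 0: the gate capacitor charges toward V_d
```

With input X = 0 the trajectory charges the capacitor toward V_d through the P-type pathway, while X = 1 holds the output near the source voltage, as the VTC describes.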
The approach to this limiting behavior is exponential in V_d. The VTC is also symmetric around V_in = V_out = V_d/2, at which point the difference between the energy level of each transistor and its connecting reservoir is roughly the same for the N-type and P-type transistors.
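The VTC can be reproduced at the mean-field level with a short root-finding sketch. Assuming equal couplings Γ to both electrodes and replacing transistor occupations by their steady values, zero net current into the gate node gives a condition solvable by bisection; the symmetry point V_in = V_out = V_d/2 comes out exactly:

```python
import math

def fermi(x):
    return 1.0 / (math.exp(x) + 1.0)

def vtc(v_in, v_d=5.0, tol=1e-10):
    """Mean-field voltage transfer curve of the NOT gate.

    Zero net electron current into the gate node requires (beta = q = 1)
        [f(eps_P + V_g) - f(eps_P + V_d)] + [f(eps_N + V_g) - f(eps_N)] = 0,
    with eps_P = V_in and eps_N = 1.5 V_d - V_in from Eq. 2. The left-hand
    side decreases monotonically in V_g, so bisection finds the unique root."""
    eps_p, eps_n = v_in, 1.5 * v_d - v_in

    def net(v_g):
        return (fermi(eps_p + v_g) - fermi(eps_p + v_d)
                + fermi(eps_n + v_g) - fermi(eps_n))

    lo, hi = -1.0, v_d + 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if net(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Here vtc(2.5) returns exactly V_d/2 by the symmetry noted above, while vtc(0) falls slightly below V_d because of the residual leakage through the N-type transistor.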

Performance as a computing unit
When used as a computing unit, our first concern is whether our model generates the correct output with high probability. We define a perfect gate or device as one that generates a deterministic output according to the truth table, e.g., Y should be the complement of X for a perfect NOT gate. However, in the presence of noise, the deterministic output becomes stochastic and subject to finite error rates. As can be anticipated from the behavior of the VTC, in the limit of high V d /V T , or the low-noise limit, the performance of our model approaches that of a perfect NOT gate, whereas the behavior is nontrivial at smaller V d .
The input and output signals are given as voltages in this model, so we map them to binaries by

X, Y = 1 if V > (1 − α)V_d ;  0 if V < αV_d ;  ∅ otherwise,    (5)

where ∅ represents an invalid result that cannot be designated and α represents an error tolerance with 0 < α ≪ 1. We choose α = 0.02 so that the resultant error is below 10^−10 for V_d = 40V_T, comparable to current technologies, but our qualitative results are insensitive to this choice.
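Eq. 5 translates directly into a small helper, with α = 0.02 as in the text and None standing in for the invalid symbol ∅:

```python
def to_binary(v, v_d, alpha=0.02):
    """Map a voltage to {1, 0, None} following Eq. 5; None encodes the
    invalid outcome inside the tolerance band."""
    if v > (1.0 - alpha) * v_d:
        return 1
    if v < alpha * v_d:
        return 0
    return None
```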
To characterize the accuracy of the gate, we define the error rate ξ as the probability of observing an output different from that of the perfect gate in a single shot. In the case of X = 0, the error rate can be calculated from the empirical distribution of V_out in steady state, as ξ(X = 0) = p[V_out < (1 − α)V_d | V_in = 0] = 0.36. A comprehensive characterization of the accuracy that takes into account the error rates for both cases X = 0/1 is the channel capacity

C = max_{p(X)} I(X; Y),    (6)

which is the highest information rate that can be achieved with arbitrarily small error. 16 We compute C numerically from the mutual information I(X; Y) between the input and output at steady state as a function of V_d (see details in SI Section II), as shown in Fig. 1C. For a binary channel, the capacity is between 0 and 1, with 1 corresponding to a perfect gate. Here the capacity is computed to be C = 0.60 for a channel operated at V_d = 5V_T, given the slight difference between the error rates for X = 0 and 1. While from the VTC the mean V_out is influenced by both the source and drain electrodes at finite V_d, we find its distribution to be Gaussian with variance 1/(βC_g) within the steady state (Fig. S1). This is expected from a Boltzmann distribution, reflecting a proximity to equilibrium despite the presence of persistent currents. To reach a higher capacity, we need the average output V_out to approach the limits 0 or V_d. This can be achieved by operating at a higher V_d so that the leakage current flowing through the higher energy level transistor is even smaller. Given the Gaussian statistics, asymptotically for large V_d the error scales as ξ ∼ exp[−βC_g α²V_d²/2] / (αV_d √(2πβC_g)), and the channel capacity scales as C ∼ 1 − ξ(1 − log_2 ξ), consistent with Fig. 1C.
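The capacity of the binary (generally asymmetric) channel in Eq. 6 can be evaluated by a direct search over the input distribution. In this sketch, ξ0 and ξ1 are the error rates for inputs X = 0 and X = 1; the numerical values in the test are illustrative, not the simulated ones:

```python
import math

def h2_term(p):
    # one term of the Shannon entropy, -p log2 p, with the convention 0 log 0 = 0
    return 0.0 if p <= 0.0 else -p * math.log2(p)

def capacity(xi0, xi1, n_grid=20_001):
    """Capacity C = max_p I(X;Y) of a binary channel with error rate xi0 for
    input X = 0 and xi1 for X = 1, by a grid search over p = P(X = 0)."""
    best = 0.0
    for k in range(n_grid):
        p = k / (n_grid - 1)
        # for a NOT gate, X = 0 should yield Y = 1; errors flip the output
        py1 = p * (1.0 - xi0) + (1.0 - p) * xi1
        h_y = h2_term(py1) + h2_term(1.0 - py1)
        h_y_x = (p * (h2_term(xi0) + h2_term(1.0 - xi0))
                 + (1.0 - p) * (h2_term(xi1) + h2_term(1.0 - xi1)))
        best = max(best, h_y - h_y_x)
    return best
```

For asymmetric error rates the optimal prior deviates from 1/2, which is why the maximization over p(X) in Eq. 6 matters for the imperfect gate.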

Trade-off among accuracy, speed, and dissipation
While the accuracy of the gate improves dramatically for V_d ≫ V_T, its performance is compromised by significantly increasing costs in computation time and energy consumption. Upon receiving a distinct input signal, the gate requires time to charge or discharge the capacitor to reach a steady state output signal. The average relaxation to steady state is shown in Fig. 2A for an initially discharged capacitor with input X = 0. The relaxation is monotonic and nearly exponential, but with a characteristic decay time that depends on V_d. Under this initial condition and input voltage, ε_N ≫ µ_s, so that few electrons can flow between the source and the capacitor. The lower energy level ε_P facilitates the transfer of electrons from the capacitor to the drain following the concentration gradient, gradually building up a higher voltage.
We define the time it takes for V_out to reach (1 − α)V_d, the threshold voltage for Y = 1, as the propagation delay time τ_p. While the threshold voltage increases linearly with V_d, the average propagation delay τ_p grows exponentially. The propagation delay time τ_p follows an inverse Gaussian distribution 61 with a long exponential tail (Fig. S1). Note that τ_p coincides with the time required for the error rate to decay below 0.5. Figure 2B shows the decay of the error rate with time for V_d = 5, 8, 10V_T, scaled by the propagation delay τ_p for each V_d. As the distribution of V_out remains Gaussian, the time dependence of the error reflects the charging of the gate capacitor, and specifically follows the evolution of the mean V_out. While we consider the single-shot error, the exponential scaling of τ_p with V_d implies that associating an error rate with a time-averaged measurement of V_out would yield a non-monotonic relationship between the waiting time to reach a set error threshold and V_d.
For moderate V_d, the slower decorrelation time will cause waiting times to increase with V_d, while for large V_d the suppressed fluctuations will dominate and decrease waiting times.
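The exponential growth of the delay with V_d can be illustrated with a deterministic, mean-field estimate: integrating the average charging equation (the same zero-noise current balance used for the VTC) up to the threshold voltage. This ignores the stochastic first-passage statistics, so only the trend with V_d is meaningful:

```python
import math

def fermi(x):
    return 1.0 / (math.exp(x) + 1.0)

def delay_estimate(v_d, gamma=0.2, c_g=200.0, alpha=0.02, n=50_000):
    """Mean-field estimate of the propagation delay for input X = 0: the time
    for the average V_g to climb from 0 to (1 - alpha) V_d under
        C_g dV_g/dt = (gamma/2) [f(V_g) - f(v_d) + f(1.5 v_d + V_g) - f(1.5 v_d)],
    computed by midpoint quadrature of tau = integral dV / (dV/dt)."""
    v_top = (1.0 - alpha) * v_d

    def speed(v):
        return (gamma / (2.0 * c_g)) * (fermi(v) - fermi(v_d)
                                        + fermi(1.5 * v_d + v) - fermi(1.5 * v_d))

    dv = v_top / n
    return sum(dv / speed((k + 0.5) * dv) for k in range(n))

tau = {v_d: delay_estimate(v_d) for v_d in (5.0, 8.0, 10.0)}
```

The delay is dominated by the final slow approach to the threshold, where the net charging current is exponentially small in V_d; this is the origin of the exponential scaling of τ_p.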
When the gate is used repeatedly to process a sequence of inputs X = {X_1, X_2, · · · , X_N}, there is no need to reinitialize the gate after each computation, and the residual charge on the capacitor may help reduce the computational cost. We call this a memory effect, which introduces temporal correlations between consecutive data transmission processes. For such an information channel with memory, the accuracy can be characterized with the average information rate per data point, which is a generalization of the channel capacity, 62 details of which can be found in the SI (Section III). Using this metric, we find the memory effect plays a significant role at intermediate τ_obs, enhancing the robustness of transmission by up to 30 percent (Fig. S2). For times much longer than τ_p, the memory effect wears off and the information rate is set by the channel capacity.
The energy consumption of a gate can be quantified by the heat dissipated to the environment. From stochastic thermodynamics, the heat dissipation of the NOT gate during a long observation time τ_obs can be computed from the product of the electron current and its conjugate affinity along the two separate pathways, 48

Σ(τ_obs) = q ∫_0^{τ_obs} dt [J_{s→N}(V_g − V_s) + J_{d→P}(V_g − V_d)],    (7)

where J_{s→N} is the electron current flowing from the source to the N-type transistor, and J_{d→P} is the current from the drain to the P-type transistor (Eq. S5). In the process described in Fig. 2A, the pathway through the N-type transistor is essentially blocked due to the high energy level ε_N, so the main contribution in Eq. 7 is the second term in the sum. This second term has a similar form to the work required to quasi-statically charge the capacitor from V_g = 0 to V_g ≈ V_d, and thus is close to C_g V_d²/2. This initial charging process is the dominant contribution to the heat dissipation over short times, and represents the reversible limit of the NOT gate (Fig. S1).
Once the system reaches the steady state, there is still a steady entropy production coming from the leakage currents through both pathways, but the entropy production rate within the steady state is much smaller and decreases exponentially with V_d (Fig. S3). This is because the output voltage V_out is very close to V_d, leaving the affinity across the drain and the output nearly zero. Further, the corresponding leakage current from the source to the output is small due to the high energy level ε_N. The opposing dependences of these two contributions to Σ(τ_obs) on V_d imply that for each observation time τ_obs, there exists an optimal V_d that minimizes Σ(τ_obs), as confirmed in Fig. 2C. The optimal V_d shifts to higher values with increasing time, as the accumulated steady-state dissipation grows with τ_obs and counterbalances the higher heat dissipation during charging at larger V_d.
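This competition can be illustrated with a toy cost built from the two contributions named above: a charging cost C_g V_d²/2 and a steady-state term that accumulates as τ_obs e^{−V_d}. The leakage prefactor A below is a hypothetical placeholder, so only the trend of the optimum with τ_obs, not its location, is meaningful:

```python
import math

C_G = 200.0  # gate capacitance in units of q / V_T, as in the text
A = 1.0      # hypothetical leakage prefactor; only the trend is meaningful

def total_dissipation(v_d, t_obs):
    # charging cost C_g V_d^2 / 2 plus leakage accumulating as t_obs * e^{-V_d}
    return 0.5 * C_G * v_d ** 2 + A * t_obs * math.exp(-v_d)

def optimal_vd(t_obs, v_max=40.0, n_grid=4000):
    vds = [v_max * (k + 1) / n_grid for k in range(n_grid)]
    return min(vds, key=lambda v: total_dissipation(v, t_obs))

short = optimal_vd(1e3)   # short observation: a small V_d is optimal
long_ = optimal_vd(1e6)   # long observation: the optimum moves to higher V_d
```

Because the quadratic charging cost and the exponentially suppressed leakage pull in opposite directions, the minimizing V_d grows (roughly logarithmically) with the observation time, mirroring the shift seen in Fig. 2C.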

B. NAND gate
We have presented a Markovian model for the NOT gate, which reproduces the performance of a perfect gate in the limit of high V_d, and for which there is a complex interplay between energy consumption and time. Within the framework presented, it is straightforward to construct an analogous model of a NAND gate. A NAND gate takes two binary inputs X_A, X_B, and outputs Y = 0 only when both inputs are 1. As shown in Fig. 3A, the kinetic diagram, similar to the circuit diagram, is composed of two P-type transistors P_A, P_B, and two N-type transistors N_A and N_B. The energy levels of P_A and N_A depend on the first input voltage V_in,A, while the energy levels of P_B and N_B are controlled by the second input V_in,B (Eq. S14). More details on the model, including the definition of the heat dissipation, can be found in the SI (Section IV). The two-dimensional VTC for V_d = 5V_T is shown in Fig. 3B, which agrees with the truth table for a perfect NAND gate. While the dynamical properties of the NAND gate are very similar to those of the NOT gate, an asymmetry arises in the NAND gate due to the different pathways in the kinetic diagram, a feature absent in the NOT gate. Consider the three different inputs (X_A, X_B) = (0, 0), (1, 0) and (0, 1) shown in Fig. 3C for V_d = 5V_T. While for a perfect NAND gate these three inputs should all correspond to the output Y = 1, the evolution of the error rate ξ and its converged values in the steady state are not exactly the same at finite V_d. In the case of (X_A, X_B) = (0, 0), as both P_A and P_B have relatively low energy levels, there are two pathways to charge the capacitor, resulting in a faster error decay rate. For the cases (X_A, X_B) = (0, 1) and (1, 0), one of the pathways is blocked due to the high energy level of the corresponding P transistor, so the error rate decays much more slowly, reflecting the slower charging of the capacitor.
While the latter two cases also differ slightly due to the asymmetry between N_A and N_B, such differences shrink drastically when we increase V_d to 8V_T in Fig. 3D. The three cases now converge to similar error rates in the steady state. In fact, as we further increase V_d, all such asymmetries vanish, another example of which can be found in Fig. S4, where we plot a one-dimensional cut of the VTC along the line V_in,A = V_in,B. As in the case of the NOT gate, our model behaves as a perfect NAND gate as qV_d approaches 1 eV. For concreteness, we define the propagation delay τ_p of a NAND gate as the time required for V_out to reach the threshold (1 − α)V_d for the input (X_A, X_B) = (0, 1), which is close to τ_p for the NOT gate at the same V_d.

III. LOGIC CIRCUITS
Equipped with a model for the NOT and NAND gates, we now in principle have the tools to implement arbitrary logic functions. While any logic function can be represented in multiple ways, the topology of the circuit has an influence on its accuracy and thermodynamic costs. 43 In the following section, we first explore spatial propagation effects arising from assembling multiple gates in a combinational circuit, and then demonstrate memory effects arising from the feedback loop in a sequential circuit. Understanding the behavior of these basic computing circuits will be crucial to building up a computing device.
Each logic circuit is itself a computing module made up of multiple logic gates. While each gate has an intermediate output, we reserve the symbol V_out for the specific V_g that corresponds to the overall output Y of the module. Intermediate input and output voltages are not converted to binaries; only the final output V_out is. While the output of each gate is used as the input of the ensuing gate, we neglect the back reaction on V_out, so that the occupation of the ensuing transistors does not affect V_out, consistent with the high capacitance assumption made in Eq. 2. Unless specified otherwise, all gates are initialized at V_g = 0V_T at the start of the computation, but no re-initialization is done afterwards. While the channel capacity is a more comprehensive characterization of the accuracy and provides the best-case scenario, the much larger input space and complicated memory effects make it cumbersome to calculate for logic circuits. We thus use the error rate in the final output instead, and consider the worst-case scenario in choosing the inputs to provide an upper bound for the error rate whenever possible.

A. Combinational Circuit
A combinational circuit maps a given set of inputs to a single output using a number of gates, such as an adder that computes the sum of its inputs or a XOR gate that computes their parity. As the simplest example, we study the behavior of an array of L NOT gates indexed by i = 1, 2, · · · , L, connected such that V_in^(i+1) = V_g^(i). A schematic of the system can be found in Fig. 4A. The input of the circuit X determines V_in^(1), and the output is measured from the last gate, V_out = V_g^(L). The spatial dimension adds complexity to the evolution of V_g, as illustrated in Fig. 4B for V_d = 5V_T, X = 0. In the steady state, we expect the output voltages of the odd gates to be close to V_d, and those of the even gates close to zero. For a gate to reach its steady state, its input, which depends on the dynamics of the previous gate, must first reach its expected value, so the propagation delay increases with the gate index i. As the odd gates are initialized far from their steady state, they take a significant amount of time to reach their expected output. While an odd gate has not yet reached its steady state, the ensuing even gate sees a lower input voltage, resulting in an overshoot of its voltage before it eventually decays to its expected lower output. The turnover in voltage of the even gates corresponds to the inflection point on the VTC.
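The overshoot of the even gates can be reproduced at the mean-field level by integrating the coupled charging equations for a short chain. This is a forward-Euler sketch with the same units and mean-field current as for a single gate; the step size is chosen for numerical stability rather than taken from the paper:

```python
import math

def fermi(x):
    return 1.0 / (math.exp(x) + 1.0)

def simulate_chain(n_gates=4, v_d=5.0, gamma=0.2, c_g=200.0,
                   dt=200.0, n_steps=15_000, v_in0=0.0):
    """Forward-Euler integration of the mean-field charging equations for a
    chain of NOT gates with V_in^(i+1) = V_g^(i); returns the trajectory."""
    v = [0.0] * n_gates
    traj = []
    for _ in range(n_steps):
        inputs = [v_in0] + v[:-1]   # each gate reads the previous gate's output
        v = [vg + dt * (gamma / (2.0 * c_g))
             * (fermi(vin + vg) - fermi(vin + v_d)
                + fermi(1.5 * v_d - vin + vg) - fermi(1.5 * v_d - vin))
             for vin, vg in zip(inputs, v)]
        traj.append(list(v))
    return traj

traj = simulate_chain()
v2 = [row[1] for row in traj]  # gate 2: rises, overshoots, then decays
```

While gate 1 is still charging, gate 2 sees a low input and charges toward a high output; once gate 1 approaches V_d, gate 2's target drops and its voltage turns over, which is the overshoot described above.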
A consequence of the connectivity between gates is the corruption of the initial input. While the input voltage of the first gate is always 0V_T, at finite V_d the maximum input voltage of the second gate will be slightly lower than V_d, and thus corrupted. As the VTC of the NOT gate is a non-increasing function, a corrupted input will inevitably cause a higher error rate in the output, which will propagate along the array. This is shown in Fig. 4C, where the error rate for individual gates in the steady state rises initially with gate index before converging to a constant value after a few gates, and is always higher than that of a single gate. A similar behavior is found for the propagation delay time, which increases sharply for the first few gates and converges to a slower linear increase afterwards (Fig. S5). This implies that circuit designs with deeper layered structures are unfavorable in terms of both accuracy and propagation delay.
The convergence behavior is intriguing as it implies the existence of a pair of fixed points (V*_odd, V*_even) for the intermediate outputs in the steady state. Indeed, the fixed-point solution corresponds to the point on the VTC (V_in = V*_odd, V_out = V*_even) satisfying the condition that its reflection (V_in = V*_even, V_out = V*_odd) is also on the VTC (Fig. S5), whereas the speed of approaching the fixed point characterizes the spatial correlation in the system. We fit the decay of |V_g^(i) − V*|/V_T with an exponential function exp[−κi], and report the rate κ for different V_d in Fig. 4D. For V_d = 5V_T, the spatial correlation length 1/κ is on the order of 1, which means spatial correlation exists between neighboring gates. As a consequence, it is more probable to observe consecutive errors along the array, which is shown by an error analysis of simulated trajectories in the SI (Section V). As the VTC becomes sharper with increasing V_d, the correlation length between gates decreases. In the limit of high V_d, the fixed-point solution is found exactly at (V_in = 0, V_out = V_d), which means that the input becomes uncorrupted. To summarize, the combination of gates introduces longer propagation delays and input corruption, and thus deeper layered circuit designs are advised against. By operating at a higher V_d to reduce spatial correlation, the latter problem can be mitigated, but of course this comes at the cost of an even longer propagation delay.

B. Sequential Circuit: RS latch
While combinational circuits are typically used to carry out arithmetic computations, modern computing devices often include another type of logic circuit to handle memory: the sequential circuit. Fig. 5A shows an example of such a circuit, known as the RS latch. The RS latch consists of two NAND gates where the output of gate 1, V_g^(1), is sent as an input of gate 2, V_in,A^(2), and similarly, the output of gate 2, V_g^(2), is fed back as V_in,B^(1). The remaining two inputs, V_in,A^(1) and V_in,B^(2), correspond to the two external binary inputs X_S and X_R, respectively. The output of the circuit, V_out, which coincides with V_g^(1), depends not only on the external inputs X_S and X_R, but also on the stored information in V_g^(1) and V_g^(2). This is the defining characteristic of a sequential circuit, which makes it useful as memory storage. More specifically, for a perfect RS latch, in the "set" stage where the external inputs are set as X_S = 0, X_R = 1 or X_S = 1, X_R = 0, there exists only one dynamically stable state for the system, so that we can unambiguously designate the memory at V_out as 1 or 0. In the "hold" stage where X_S = X_R = 1, however, the system is bistable and its state depends on the initialized values of V_g^(1) and V_g^(2). In the vicinity of the fixed points, an effective Hamiltonian description of the RS latch is quartic in V_out with two minima and a maximum between them. 63 This emergent bistability resulting from the feedback loop allows the RS latch to function as a memory storage device.
To function as a memory storage device, a circuit must have at least two distinguishable states in which information can be stored. For our stochastic model in Fig. 5A, these states correspond to the steady-state solutions that satisfy the feedback conditions with the external inputs held at V_in,A^(1) = V_in,B^(2) = V_d. An intuitive way to find their location is to overlap the VTCs of the two NAND gates along the cuts V_in,A^(1) = V_d and V_in,B^(2) = V_d, which are not exactly the same due to the asymmetry in the non-perfect NAND gates. We show several scenarios at different V_d in Figs. 5D-F. At V_d = 3V_T, the highly asymmetric VTCs cross only once, at (V_in,B^(1), V_in,A^(2)) = (0.67V_T, 2.61V_T), indicating that the system has a single stable state and does not qualify as a memory storage device. As V_d increases to 5V_T, two dynamically stable informational states start to emerge at (V_in,B^(1), V_in,A^(2)) = (0.19V_T, 4.92V_T) and (4.89V_T, 0.20V_T), though the slight asymmetry suggests different dynamics around the two states. While a third intersection point is found at (V_in,B^(1), V_in,A^(2)) = (2.93V_T, 2.26V_T), it corresponds to an unstable saddle point. At an even higher V_d = 40V_T, the two states converge to (V_in,B^(1), V_in,A^(2)) = (0V_T, 40V_T) and (40V_T, 0V_T), and symmetry is restored.
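Numerically, such crossings are fixed points of the composition of the two transfer curves. A minimal sketch, using a hypothetical tanh-shaped VTC in place of the simulated NAND curves, locates them as sign changes of v − VTC(VTC(v)):

```python
import numpy as np

def vtc(v_in, vd, gain):
    """Model VTC of an inverting gate: a smooth sigmoidal step from vd
    down to 0 (hypothetical tanh form, standing in for the simulated
    NAND curves along the cuts of Fig. 5)."""
    return 0.5 * vd * (1.0 - np.tanh(gain * (v_in - 0.5 * vd)))

def fixed_points(vd, gain, n=100000):
    """Locate steady states as sign changes of v - vtc(vtc(v))."""
    v = np.linspace(0.0, vd, n)
    h = v - vtc(vtc(v, vd, gain), vd, gain)
    idx = np.where(np.sign(h[:-1]) != np.sign(h[1:]))[0]
    return v[idx]

# A sharp VTC yields three crossings: two stable states near 0 and vd,
# plus the unstable point near the midpoint.
pts = fixed_points(vd=5.0, gain=2.0)
```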
While the existence of two distinguishable informational states is guaranteed at sufficiently high V_d, there remains the question of whether these informational states are robust against noise. While in both the set and hold stages, V_g^(1) and V_g^(2) are usually sufficiently far from each other that it is possible to distinguish them definitively, there do exist occasions where the noise can mediate a transition. One such example is shown in Figs. 5B and C. As the outputs of the gates evolve from their initialization towards the steady-state solution, there is a significant overlap between the two outputs around t = 0.5τ_p, which leads to about 13% of the trajectories failing to retain the information and evolving to the wrong fixed point. This kind of perturbation happens when the overlap region includes the unstable intersection point on the VTC, and the change of convexity of the effective Hamiltonian brings the trajectory towards a different stable state. Such an initialization error is rare to observe in either the set or the hold stage, and we show additional evidence for the robustness of the circuit in Fig. S7. In addition, at a higher V_d, as the VTC becomes sharper, not only do the two minima in the Hamiltonian become more separated, but their vicinities also become steeper, both of which facilitate the differentiation between the two states and thus drastically improve the robustness of the device.

C. Sequential Circuit: D flip-flop
With the RS latch as a basic computing unit, we can model a memory storage module that synchronizes with the clock generator: the D flip-flop. Modern computing devices typically include a pulse generator that oscillates between 0 and 1, with a clock cycle τ_c. To see how the clock is incorporated into the D flip-flop, we show the circuit diagram of a D flip-flop in Fig. 6A, built up from 4 NAND gates and 1 NOT gate. The circuit can be readily modularized as a memory storage unit, denoted with the symbol D, that takes in an input X representing the data, another input X_WE synchronized with the clock, and generates an output Y. The two NAND gates with the feedback loop on the right-hand side constitute an RS latch, which is responsible for the memory storage. When the write-enable input X_WE = 1, the D flip-flop sets its output V_out in agreement with the data X, whereas when X_WE = 0, the D flip-flop holds its stored value as its output, which can be further processed for computing purposes.
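The write-enable logic can again be illustrated with ideal gates. The sketch below assumes the standard gated D latch topology of Fig. 6A (two input NANDs gating the data, a NOT forming the complement, and the cross-coupled NAND pair storing it); the function names are our own:

```python
def nand(a, b):
    """Ideal Boolean NAND."""
    return 0 if (a == 1 and b == 1) else 1

def d_latch(d, we, q1, q2):
    """One settling of a gated D latch built from 4 NANDs and a NOT
    (ideal-gate sketch). `we` is the write-enable X_WE; (q1, q2) are
    the outputs of the cross-coupled RS pair."""
    s = nand(d, we)            # active-low set line
    r = nand(nand(d, d), we)   # NOT(d) gated by we: active-low reset
    for _ in range(4):         # let the feedback pair settle
        q1, q2 = nand(s, q2), nand(r, q1)
    return q1, q2

# we = 1 writes the data into the latch; we = 0 holds the stored bit.
```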
The clock cycle τ_c, or the clock frequency 1/τ_c, is an important parameter as it determines how fast data can be read and stored. In Figs. 6B and C we illustrate how the clock cycle influences the accuracy and dissipation of the data transmission process for a D flip-flop with V_d = 8V_T. We start with X_WE = 1 and send in a stream of data X = {1, 0, 1, 0, ...}. While X_WE alternates between 1 and 0 every τ_c/2, the data input only changes every τ_c. This input data sequence is chosen to maximize the alternation in the output, and thus minimize the memory effect discussed earlier for the NOT gate. Therefore, the error rate and dissipation in this case are expected to be the highest among all possible input sequences. The error rate ξ is measured according to the output V_out at the end of each cycle, and is reported separately for the cycles with X = 1 and 0. The evolution of V_out as a function of the cycle number can be found in Fig. S8. Similar to the behavior of the single NAND gate in Fig. 3D, the error rate for X = 0 starts to decrease monotonically when τ_c is longer than the single-gate propagation delay τ_p. The error rate for X = 1, however, first increases with τ_c before eventually decreasing. This counterintuitive behavior comes from the memory retention in the RS latch. Once data are stored in the RS latch, they tend to stay in the memory by influencing the transmission of the following data, and thus introduce temporal correlations between the outputs. The influence of the data can only be erased given sufficient time to transmit the following data. This temporal correlation time, or memory retention time, again coincides with the propagation delay τ_p. In this example, as the first input is X = 1, the output retains the memory of a higher output at short τ_c, so the error rate for X = 1 is deceptively low, and the error rate for X = 0 is high.
At τ_c ≈ τ_p, the output is stuck between the high and low outputs before reaching either steady state, so that the error rate for either cycle is high. In this regime, the average dissipation accumulated within each cycle rises fast with τ_c, as charging processes contribute heavily to energy costs. When τ_c > τ_p, the memory effect is eventually overcome and the error rates for both cycles start to decay. The average dissipation rate also converges to a smaller constant value, as within each cycle the system is able to reach the steady state, in which much less dissipation is generated. The exponential scaling of τ_p with V_d implies that while the asymptotic error is expected to decrease when operating far from thermal energies, V_d ≫ V_T, the speed with which the D flip-flop can function with that lower error is significantly slower. Due to this lag, comparing a lower and a higher V_d, the error is expected to be much lower in the former case for a fixed computing time on the order of the τ_p of the lower V_d.

IV. PARITY COMPUTING DEVICE
With the combinational circuit modularized as the arithmetic logic unit (ALU), and the sequential circuit as the memory storage device, we can combine the two components to model a computing device. We choose the task of computing the parity of a sequence of inputs X = {X_1, X_2, ..., X_N} of length N, which has wide applications in error detection. Such a task can be easily implemented by combining (N − 1) XOR gates in a sequential manner. However, when N is relatively large, due to limited resources it is beneficial to break the task into several steps and store intermediate results in memory. The clock generator synchronizes the operation of different components to ensure correct sequencing.
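The task itself is a chain of XORs; the staged variant below mirrors breaking the task into steps with intermediate results held in memory. This is a schematic Python illustration (names are ours), not the circuit model:

```python
from functools import reduce
from operator import xor

def parity(bits):
    """Parity of a bit sequence via a chain of (N - 1) two-input XORs."""
    return reduce(xor, bits)

def parity_staged(bits, width=2):
    """Staged evaluation: XOR `width` values at a time and append the
    intermediate result to the work list, mimicking storing it in a
    D flip-flop for later post-processing. Equivalent to `parity` by
    associativity and commutativity of XOR."""
    vals = list(bits)
    while len(vals) > 1:
        head, vals = vals[:width], vals[width:]
        vals.append(reduce(xor, head))
    return vals[0]
```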
As an example, we consider two XOR gates as an ALU, and four D flip-flops, D1 to D4, as a memory device to check the parity of a sequence of length N = 12. Fig. 7A shows the schematic of our design, while the complete circuit diagram can be found in Fig. S9. Each XOR gate takes in 2 binary inputs at a time, the source of which is controlled by 2 input two-way switches, shown in red in Fig. 7A. When the switch is connected to terminal 1, the input comes from the data sequence X; whereas when terminal 2 is connected, the input comes from the data stored in a D flip-flop. At the end of each XOR gate is an output two-way switch, shown in green in Fig. 7A, which controls where to store the output. We store new data only on free D flip-flops, where the data stored at an earlier time has already been read out for post-processing and no longer needs to be held. In total, the system requires modeling 28 gates. We start the computation by sending in pairs of input data from the data sequence, and computing their parities with the XOR gates. The D flip-flops are set by outputs from the ALU (first D1, D2 and then D3, D4), and once all D flip-flops have been set, we free them by sending the stored information back to the ALU for further processing. The computation is terminated when all inputs are taken into account in the final output, and the entire task can be completed in 6 clock cycles. A more detailed description of the protocol, and a computational tree graph that illustrates how intermediate outputs are related to the final output, can be found in the SI (Section VI) and Fig. S10.
As before, we are interested in the time and dissipation required to achieve a certain accuracy. In Fig. 7B, we show the error rate for the final output at t = 6τ_c, and the average dissipation per gate (averaged over the 28 gates in this device) per clock cycle, Σ̄, as a function of τ_c with V_d = 8V_T. Both results are averaged over more than 10^4 inputs, which are sequences of independent and identically distributed Bernoulli random variables with equal probability of being 0 or 1. As expected, the average error rate decays with the clock cycle until τ_c ≈ 3τ_p, as the extended spatial dimension of the circuit increases the propagation delay of the final output. At such a high V_d, spatial correlations do not extend beyond neighboring gates, and are even weaker between different modules, especially for clock cycles longer than τ_p. We further analyze how the error in the final output is correlated along its computational path in the SI. The average dissipation first increases sharply and then converges to a linear growth in the limit of large τ_c, similar to the D flip-flop, but slightly lower than that in Fig. 6C for the same τ_c. This is because the input sequences are randomly chosen instead of alternating between 0 and 1, and the memory effect can help shorten the charging process, which contributes most to the entropy production. Additionally, because of the synchronization, the D flip-flops may remain at a steady state for a few cycles before they are freed to store new data. During such periods, the dissipation is especially low, as the entropy production in the steady state is minimal due to relatively small leakage currents. Therefore, computational protocols that minimize changes on the memory storage device are desirable for low dissipation computing.
Taking into consideration both the accuracy and dissipation, the optimal clock cycle to operate with is τ c ≈ 2τ p , as lowering the speed further will only result in higher dissipation from the steady state.

V. DISCUSSION AND CONCLUSION
We have illustrated a promising model for stochastic logic gates, and demonstrated its utility in building arbitrary logic circuits. Information manipulations, such as bit storage and erasure, are represented by the charging and discharging of the capacitors, which is consistent with current data storage technology. While our model performs as a perfect logic circuit when operated in the limit of low noise, its thermodynamic consistency allows us to study the rich interplay between speed, accuracy, and dissipation in the intermediate regimes, from which we can derive useful design principles for low dissipation computing devices. For instance, we have provided a physical origin of input corruption in combinational circuits, as well as of feedback robustness in sequential circuits, and illustrated how each can be improved drastically by operating at a slightly higher voltage. In addition, memory effects should be exploited as much as possible to minimize dissipation. With modularization, it is straightforward to scale up our model to even larger and more complex systems, making it a useful model to study collective behaviors of circuits. It is useful to bear in mind that the signal-to-noise ratio regime explored in this work is two orders of magnitude lower than current technology. However, with the exponential growth of the number of computations per unit of energy dissipated, as observed by Koomey's law, 64 such a low dissipation regime will soon become relevant. While the model we propose is not intended for a direct comparison with current CMOS technology, the fact that it reproduces the input-output behaviors of universal logic gates makes it a promising tool to study fundamental physical limits on computation.
One of the major motivations of this work is to enable the design of low dissipation computing devices with maximal accuracy and speed. While there exist several theoretical results that propose bounds on the thermodynamic costs of computing, 4,43 understanding under what circumstances they are saturated requires a realistic model for the thermal noise. As each dynamical process in our model obeys local detailed balance, we are able to harness the lessons of stochastic thermodynamics to define and analyze the time dependence and fluctuations of the entropy production. Note that the Σ we have referred to throughout the paper differs from the total dissipation, the heat released by the system, by a term T∆S, the change in the Shannon entropy of the system times the bath temperature. Nevertheless, we have used the two terms interchangeably since, for the timescales studied, the boundary term ∆S is orders of magnitude smaller than the cumulative term Σ, which is very large due to the large gate capacitance. This then raises the question of how to further decrease the irreversible dissipation, including that associated with charging the gates. This problem is the crux of optimal control theory and adiabatic circuit design, 65,66 from which some design principles can be borrowed. For example, while we have kept the input voltage of the transistors V_in fixed within each cycle, one could design an optimal feedback protocol that controls it according to the state of the capacitor, in order to minimize the irreversible dissipation throughout the process. Such optimal feedback protocols already exist for simple thermodynamic engines, 67 and we believe our model provides an ideal testing ground for applying more advanced stochastic control algorithms. 68 Marrying our model with a framework that integrates information with thermodynamics, 69,70 we hope to get a step closer to a computing design that minimizes dissipation while maximizing accuracy and speed.

VI. MATERIALS AND METHODS
Simulations were done with both an iterative, numerically exact diagonalization of the master equation and Gillespie simulations. 60 In both, we employ a separation of timescales between electron transfer to or from a transistor and gate charging, afforded by the large gate capacitance. Specifically, the large capacitance means we can update V_g in discrete time steps, chosen to be 10β, and compute rates at fixed V_g in between these dynamical updates. More details on the models and calculations can be found in the SI. All our codes and data can be accessed on GitHub: https://github.com/chloegao12.

VII. ACKNOWLEDGEMENT
The authors thank Gavin E. Crooks for suggesting this topic for study and for invaluable discussion and comments. C.Y.G. and D.T.L. are supported by the US Department of Energy, Office of Basic Energy Sciences, through the CPIMS Program Early Career Research Program under Award DE-FOA0002019.

I. NOT GATE
The Markovian system is described by the occupation numbers of the two single-electron levels, n_N, n_P = 0/1, and the gate voltage V_g. Electrons can jump between the transistors and the reservoirs only if the target site is empty. Denoting the transition rate from state i to j as k_ji, the rates describing the exchange of electrons between a transistor level ε and reservoir i are given by

k_in = Γ f_i(ε),  k_out = Γ [1 − f_i(ε)],    (S1)

where f_i(x) = [e^{β(x−µ_i)} + 1]^{−1} is the Fermi distribution. The transition rate between the two transistors depends on their relative energy levels; for example, in the case of ε_P > ε_N,

k_{N←P} = Γ [1 + n(ε_P − ε_N)],  k_{P←N} = Γ n(ε_P − ε_N),    (S2)

where n(x) = [e^{βx} − 1]^{−1} is the Bose-Einstein distribution. The rate constant Γ = 0.2/β ∼ ps^{−1} is chosen so that electron transitions happen on a longer timescale than quantum tunneling, and the broadening of energy levels due to the coupling to electrodes is smaller than thermal fluctuations.
The dynamics of the capacitor is solved with the equation of motion

C_g dV_g/dt = q J_g,    (S3)

where C_g is the capacitance and J_g is the electron current flowing into the electrode from the transistors. While the transfer of electrons changes V_g, the capacitor is treated as an electron reservoir at constant chemical potential µ_g = −qV_g within each time interval t_int. Thus, the assumptions made here are that the electron transfer within each t_int is small compared to C_g V_T, and that the electron relaxation within the capacitor is fast compared to t_int. We have chosen C_g = 200q/V_T and t_int = 10β in order to justify these assumptions.
To obtain a numerically exact solution to the Markovian dynamics, for each interval t_int we solve for the average occupation numbers n_N, n_P from the stationary solution of the master equation, which describes how the probability of the configuration, p_{n_N,n_P}, evolves with time:

dp/dt = W p,  W_jj = −S_j,  S_j = Σ_{i≠j} W_ij,    (S4)

for the rate matrix W. Note that the transition rates concerning the gate (g) are time dependent through V_g, while a local equilibrium approximation is invoked within each integration interval t_int. The current J_g flowing into the capacitor is then computed as the sum of two terms,

J_g = J_{N→g} + J_{P→g},    (S5)

where each component current is the net rate of electron transfer from the corresponding transistor into the gate. In the Gillespie simulation, the electron jumping processes are modeled explicitly as chemical reactions, with M = 10 reaction rates corresponding to the allowed electron jumps. We use the Monte Carlo method to sample the probability that reaction i happens after time t, and the currents between two sites are calculated from the discrete number of jumps between the two sites. The discretization error in voltages between the average protocol and the Gillespie simulation is on the order of q/C_g = 0.005V_T.
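As a concrete illustration of the Gillespie scheme, the following minimal sketch (our own, not the repository code) simulates a single level exchanging electrons with one reservoir at rates Γf and Γ(1 − f); the time-averaged occupation relaxes to the Fermi factor, as required by detailed balance. The full model additionally couples two such levels and updates V_g every t_int as described above.

```python
import math
import random

def fermi(x, mu, beta=1.0):
    """Fermi distribution f(x) = 1/(exp(beta*(x - mu)) + 1)."""
    return 1.0 / (math.exp(beta * (x - mu)) + 1.0)

def gillespie_occupation(eps, mu, gamma=0.2, t_max=2e5, seed=1):
    """Gillespie simulation of a single-electron level coupled to one
    reservoir: jump in at rate gamma*f(eps), out at gamma*(1 - f(eps)).
    Returns the time-averaged occupation (in units where beta = 1)."""
    rng = random.Random(seed)
    t, n, occupied_time = 0.0, 0, 0.0
    while t < t_max:
        # Only one reaction is possible from each state.
        rate = gamma * (1.0 - fermi(eps, mu)) if n else gamma * fermi(eps, mu)
        dt = rng.expovariate(rate)
        if n:
            occupied_time += min(dt, t_max - t)  # truncate at t_max
        t += dt
        n = 1 - n
    return occupied_time / t_max

# The long-time occupation approaches the Fermi factor f(eps).
```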

II. COMPUTATION OF CHANNEL CAPACITY
For the NOT gate, we observe Gaussian distributions of V_out in the steady state regardless of V_d, with the same variance 1/(βC_g). Under the Gaussian assumption, the error rate is uniquely determined by the average output voltage ⟨V_out⟩. For example, the accuracy rate for X = 0 is determined by the conditional probability that the output lies above the decision threshold,

p(Y = 1 | X = 0) = ∫_{V_d/2}^{∞} dV N(V; ⟨V_out⟩, 1/(βC_g)),

where N(V; m, σ²) denotes a Gaussian density. Marginalizing the conditional probabilities, the mutual information can be computed as

I(X; Y) = Σ_{x,y} p(x, y) log₂ [ p(x, y) / (p(x) p(y)) ].

To compute the channel capacity, we numerically maximize I(X; Y) over the base probability distribution p(X), where the conditional probabilities are computed using ⟨V_out⟩ in the steady state.
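Under the stated Gaussian assumption, the capacity computation reduces to a one-parameter maximization over p(X). A minimal numerical sketch (the threshold placement, function names, and parameter values are our own illustration):

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x) = P(Z > x), Z standard normal."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def binary_capacity(m0, m1, sigma, theta):
    """Capacity of a thresholded-Gaussian channel: Y = 1 when the output
    exceeds theta, with V_out ~ N(m_x, sigma^2) conditioned on X = x.
    I(X;Y) is maximized over p = P(X=1) by a dense grid search."""
    p1_given = [q_func((theta - m0) / sigma), q_func((theta - m1) / sigma)]

    def h(p):  # binary entropy in bits
        if p <= 0.0 or p >= 1.0:
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    best = 0.0
    for i in range(1, 1000):
        p = i / 1000.0
        py1 = (1 - p) * p1_given[0] + p * p1_given[1]
        mi = h(py1) - (1 - p) * h(p1_given[0]) - p * h(p1_given[1])
        best = max(best, mi)
    return best

# Well-separated outputs (sigma = sqrt(1/(beta*C_g)) ~ 0.07 V_T with
# V_d = 5 V_T and threshold V_d/2) give a capacity close to 1 bit.
```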

III. COMPUTATION OF AVERAGE INFORMATION RATE
For an information channel with memory, the average information rate per datum is defined as

Ī = I(X_1, ..., X_{N_data}; Y_1, ..., Y_{N_data}) / N_data,

which, in the limit of N_data → ∞ and upon maximizing over the input probability distribution p(X), is the generalization of the channel capacity. As an example, for N_data = 2, the mutual information is computed as

I = Σ p(x_1, x_2, y_1, y_2) log₂ [ p(x_1, x_2, y_1, y_2) / (p(x_1, x_2) p(y_1, y_2)) ].

To incorporate the memory effect, note that the probability distribution of the ith output Y_i is not only a function of X_i, but also of the output voltage of the previous datum, V_out^{i−1}. The dependence can be expressed with the conditional probability p(V_out^i | x_i, V_out^{i−1}), which we sample by Gillespie simulations of more than 10^7 trajectories. The dependence of Ī on the observation time τ_obs thus comes from this conditional probability.
Let V_out^0 = 0V_T; we can then write the joint probability

p(x_1, ..., x_N, V_out^1, ..., V_out^N) = Π_i p(x_i) p(V_out^i | x_i, V_out^{i−1}),

which follows from the Markovian nature of the memory effect, and the fact that the input data x_i are independent of each other. The joint distributions can then be computed by marginalization, e.g., defining the mapping between Y and V_out in Eq. 5 as Y = dig(V_out),

p(x_1, ..., x_N, y_1, ..., y_N) = Σ p(x_1, ..., x_N, V_out^1, ..., V_out^N),

where the sum is over all V_out^i, discrete in our simulation, that correspond to y_i. As the numerical maximization is difficult for large N_data, without loss of generality we choose as our input a sequence of independent and identically distributed Bernoulli random inputs with equal probability of being 0 or 1. We show in Fig. S2 the average information rate at V_d = 5V_T for N_data = 1, 2, 3 as a function of τ_obs, the processing time for each individual datum from input to output. For N_data = 1, the information rate first decreases at small τ_obs, as ξ(X = 1) inevitably increases at short times due to the initialization V_g = 0V_T, and rises sharply around the propagation delay τ_p, which is the time required for ξ(X = 0) to decay. As we increase N_data, the memory effect is expected to be especially helpful when consecutive inputs share the same value, and thus should on average improve the information rate. This effect is not evident for extremely small τ_obs, where the error rate for X = 0 is too high to be corrected by the memory effect. However, at intermediate τ_obs the memory effect plays a significant role, bringing up to a 30% increase in the average information rate.
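The structure of this calculation can be illustrated with a discretized toy channel whose output distribution depends on both the input and the previous output. The sketch below is our own (binary outputs stand in for the discretized V_out); it enumerates the joint distribution exactly for small N_data and i.i.d. fair inputs:

```python
import math
from itertools import product

def avg_info_rate(p_y, n_data=2):
    """Average information rate I/N for a binary channel with one-step
    memory: p_y[(x, y_prev)] = P(Y = 1 | X = x, previous output y_prev).
    Inputs are i.i.d. fair bits; the joint probability factorizes along
    the Markov chain as in the text, with Y_0 fixed to 0."""
    seqs = list(product([0, 1], repeat=n_data))
    joint = {}
    for xs in seqs:
        for ys in seqs:
            p, y_prev = 0.5 ** n_data, 0
            for x, y in zip(xs, ys):
                p1 = p_y[(x, y_prev)]
                p *= p1 if y == 1 else (1.0 - p1)
                y_prev = y
            joint[(xs, ys)] = p
    pys = {ys: sum(joint[(x2, ys)] for x2 in seqs) for ys in seqs}
    total = 0.0  # I(X_1..X_N; Y_1..Y_N) in bits
    for (xs, ys), p in joint.items():
        if p > 0.0:
            total += p * math.log2(p / ((0.5 ** n_data) * pys[ys]))
    return total / n_data

# A memoryless, noiseless NOT-like channel (Y = 1 - X regardless of the
# previous output) transmits 1 bit per use.
rate = avg_info_rate({(0, 0): 1.0, (0, 1): 1.0, (1, 0): 0.0, (1, 1): 0.0})
```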
All curves eventually converge in the long time limit as the memory effect wears off for times much longer than the propagation delay.

IV. NAND GATE
The NAND gate is described by four occupation numbers (n_PA, n_PB, n_NA, n_NB) and the gate voltage V_g. The energy levels of the four transistors are determined in the same manner as in Eq. 2 of the main text, with NA, PA corresponding to V_in,A, and NB, PB corresponding to V_in,B. The transition rates between the transistors and reservoirs are described analogously to Eqs. S1 and S2. To avoid numerical issues in Eq. S2 when adjacent transistors have the same energy levels, we add a 10^{−3} regularizer in the denominator of the Bose-Einstein distribution n(x). The output V_g is again treated as a capacitor described by Eq. S3, where the current J_g is the sum of J_{PA→g}, J_{PB→g}, and J_{NA→g}, each defined as in Eq. S5. As in the NOT gate, both an average protocol and a Gillespie simulation consisting of 16 chemical reactions are used to study the dynamics. The heat dissipation during an observation time τ_obs is computed in the same manner as for the NOT gate, by accumulating the energy exchanged with the reservoirs along each trajectory.

V. ERROR ANALYSIS IN AN ARRAY OF NOT GATES
We simulate an array of NOT gates of length L and examine the correlation of errors between adjacent gates. In other words, given that an error occurs at gate i, the probability of observing another error at its neighboring gate is enhanced due to the spatial correlation.
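This enhancement can be quantified directly from simulated trajectories. A minimal sketch (the function name and data layout are our own): given a boolean matrix of error indicators across trajectories and gates, compare the conditional neighbor-error probability with the marginal one:

```python
import numpy as np

def consecutive_error_enhancement(errors):
    """Given a (trajectories x gates) boolean error matrix, compare the
    conditional probability of an error at gate i+1 given an error at
    gate i with the marginal error probability at gate i+1, pooled over
    all neighboring pairs. A ratio above 1 signals spatial correlation
    between neighboring gates."""
    e = np.asarray(errors, dtype=bool)
    p_pair = (e[:, :-1] & e[:, 1:]).mean()   # joint error on a pair
    p_cond = p_pair / e[:, :-1].mean()       # conditional on left error
    p_marg = e[:, 1:].mean()                 # marginal on right gate
    return p_cond / p_marg
```

For uncorrelated errors the ratio is 1; perfectly correlated columns double it (for 50% error rate).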
VI. PARITY COMPUTING DEVICE

A. Computational protocol

Here the clock cycle is chosen as τ_c = 10^7 β. All gates are operated at V_d = 8V_T, while all capacitors are initialized with zero charge. We highlight with red crosses the time points where outputs are being read out from the D flip-flops. At t = 0, we send in 4 input data, X_1 to X_4, by connecting all the input two-way switches to terminal 1. Since all the D flip-flops are free at the moment, we can store the computing results of the XOR gates into D1 and D2 by switching both output two-way switches to terminal 1 as well. The clock stays at 0 within the first half cycle while computations are being done at the ALU, until t = τ_c/2, when the clock switches to 1 and the outputs of the ALU are written into D1 and D2. At t = τ_c, as the clock returns to 0, another 4 input data are sent in while the output two-way switches are connected to terminal 2, so that outputs can be sent to and stored at D3 and D4. At the end of the second cycle, when the memory devices are full and cannot take in new inputs, we read out the outputs at D1 to D4 and send them back to the ALU as inputs by connecting all input two-way switches to terminal 2. We continue the computation in this manner, until all input data are taken into account and the final output is read from D1 at t = 6τ_c.

B. Error analysis in the parity computing device
The computational tree graph of the computing device is shown in Fig. S10A, with 12 input nodes in the zeroth layer representing the input data, and a single final output node, l_4^(1), that computes the parity of the inputs. The nodes in layers 1 to 3 represent intermediate computation results. If an error is observed in any node whose layer is deeper than 0, we look at its parent nodes to trace where the error originates, and at its child node (if it exists) to see how far the error propagates. We call such a record of errors vertically along the computational tree graph an error path, and denote its length by l_e. In this computational tree graph, the maximum value of l_e is 4, which means the error propagates from layer 1 all the way to the final output, while the minimum value of l_e is 1. Among the 1.28 × 10^4 simulations we have done with different input sequences, we make a histogram of the error paths of length l_e for different clock cycles τ_c, which is shown in Fig. S10D.
For the shortest clock cycle plotted, τ_c = 5 × 10^6 β, which is too short for the gates to reach their steady states, we observe an overwhelmingly high number of error paths of length l_e = 4. However, when τ_c is longer, we see an exponential decay in the number of error paths with increasing l_e. This exponential decay rate characterizes the temporal correlation between intermediate computation results. The rate increases with longer τ_c, indicating diminishing correlations, i.e., the weakening of the memory effect at longer clock cycles. With τ_c > 2 × 10^7 β, it is almost impossible to find an error path with l_e = 4 that propagates through the computational tree graph.

[Figure caption fragments: (A) The heat dissipation during the propagation delay τ_p (dots) for X = 0, which converges to the quadratic function C_g V_d²/2 (line) at high V_d. (B) The heat dissipation rate in the steady state, σ_ss, decreases exponentially with V_d for both X = 0/1. (C) ... τ_c = 3 × 10^7 β. The input data sequence starts from X_data = 1 and alternates between 1 and 0. All gates are operated at V_d = 8V_T, and are initialized with V_g = 0V_T.]
We discard the first few cycles and average over more than 50 cycles when computing the average error rate and dissipation in Fig. 6, so that their values have no dependence on the initialization.