Stochastic Thermodynamics of Non-Linear Electronic Circuits: A Realistic Framework for Computing around kT

We present a general formalism for the construction of thermodynamically consistent stochastic models of non-linear electronic circuits. The devices constituting the circuit can have arbitrary I-V curves and may include tunnel junctions, diodes, and MOS transistors in subthreshold operation, among others. We provide a full analysis of the stochastic non-equilibrium thermodynamics of these models, identifying the relevant thermodynamic potentials, characterizing the different contributions to the irreversible entropy production, and obtaining different fluctuation theorems. Our work provides a realistic framework to study thermodynamics of computing with electronic circuits. We demonstrate this point by constructing a stochastic model of a CMOS inverter. We find that a deterministic analysis is only compatible with the assumption of equilibrium fluctuations, and analyze how the non-equilibrium fluctuations induce deviations from its deterministic transfer function. Finally, building on the CMOS inverter, we propose a full-CMOS design for a probabilistic bit (or binary stochastic neuron) exploiting intrinsic noise.


I. INTRODUCTION
The growing energy consumption of data-intensive technologies is raising concern [1,2]. Semiconductor-based electronic circuits are the dominant technology for information processing. In order to reduce energy consumption, those circuits need to function in regimes where the information-carrying signals are increasingly close to thermal fluctuations [3][4][5][6][7][8][9]. Indeed, proposals have been made to trade reliability for energetic costs in certain applications [10][11][12], or to exploit intrinsic thermal fluctuations in order to solve stochastic optimization problems [13][14][15]. But while non-linear electronic circuits are essential for computation, the proper description of thermal noise in these circuits is a highly non-trivial problem, especially if one aims at preserving thermodynamic consistency [16][17][18][19][20][21][22]. Traditional methods employed in engineering for the description of noise are usually based on the linearization of a given element response around an operating point, or only consider external noise generated by linear resistors (the internal or intrinsic noise is first mapped to external sources) [19,23]. Although these approaches might offer accurate estimations of the noise in some applications, they are not thermodynamically consistent and therefore are not suited for situations in which thermal fluctuations are relevant or, in particular, are exploited as a resource.
The discovery of the so-called fluctuation theorems [24][25][26][27][28] and the ensuing development of the theory of stochastic thermodynamics [27,[29][30][31], established very general constraints on the thermal fluctuations of different physical systems, even if they are highly non-linear and arbitrarily far from thermal equilibrium. The theory was used to study colloidal particles, chemical reaction networks, and molecular motors [27,[32][33][34], but also electronic systems. There, it was applied to linear electrical circuits ranging from simple circuits [35][36][37] to complex networks (even in quantum regimes) [38], although these circuits are of limited use for implementing computations. Also, after experimental and theoretical progress on the study of non-linear single electron devices and Coulomb blockade systems [16,[39][40][41][42][43], stochastic thermodynamics was successfully applied to reach a deep understanding of thermal fluctuations in these systems [44][45][46][47][48][49][50][51]. In those devices, the nature of the conduction channels (typically tunnel junctions) and the nanoscopic size of the conductors (which implies low capacitances and high charging energies) make it possible to design circuits that process information with very low energy requirements [40]. The logical states are represented here by the presence of just one or a few electrons. Unfortunately, the requirement of low temperatures and challenges in the fabrication of these devices have so far prevented their practical application in computing. In the foreseeable future, regular complementary metal-oxide-semiconductor (CMOS) circuits will probably remain the most relevant platform for computing [52,53]. In this context, the quest for speed and low energy consumption fueled spectacular progress in the miniaturization of transistors, and nowadays integrated circuits with typical feature sizes of around 5 nm can be mass produced.
In these circuits, a transistor can be activated by just a few hundred electrons in its gate terminal. Thus, CMOS circuits are approaching regimes where a description in terms of single electrons becomes relevant [9]. The fact that both single electron devices and CMOS transistors (in some modes of operation) display shot noise [54] suggests that the rigorous tools and methods that have been used for the modeling and simulation of single electron devices could also be applied to study CMOS circuits. While recent studies used stochastic thermodynamics for the detailed characterization of individual devices such as diodes and transistors [51,55,56], the study of complex networks and circuits comprising such devices remains unexplored.
Stochastic thermodynamics also offers a framework to study the energetic costs associated to computing in a systematic way [57,58]. Although it was realized early on that information processing can in principle be done without energy expenditure [59,60], this is only true in idealized setups in which either the computation is extremely slow or the computing device is perfectly isolated from the environment. Practical (artificial or natural) computing done in finite time and in noisy environments involves the dissipation of energy. In this context, the modern theory of stochastic thermodynamics was employed to clarify early notions like the relation between thermodynamic and logical reversibility [57,58,61], the energetic costs associated to measurement and erasing [61][62][63][64], and the cyclic operation of computing devices [57]. Also, stochastic thermodynamics allows one to rigorously evaluate the dissipation of nonequilibrium and finite-time processes [65][66][67][68], which has applications for optimal erasing protocols [69][70][71][72]. The thermodynamic costs associated to the structure of complex information processing networks were also studied [73,74]. However, these efforts involve either extremely simple and idealized models or abstract formulations aimed at obtaining fundamental bounds, where no particular computing architecture is considered.
In this paper we report three major achievements. First, in sections II to IV, we resolve the long-standing and fundamental problem of rigorously describing noise in non-linear electronic circuits. We do so by developing a general formalism to construct thermodynamically consistent stochastic models of arbitrary non-linear circuits. Although charging effects cause the behavior of each device to be modified by its environment [39], we show how, given the I-V curve characterization of a single arbitrary device, to describe its stochastic behavior once it is introduced in an arbitrary circuit, avoiding pitfalls that lead to unphysical steady states. In this way we can seamlessly accommodate many different devices, like tunnel junctions [40,47], diodes [20], MOS transistors in subthreshold operation [20,54,75], or more exotic devices like nanoscale vacuum channel transistors [76]. The devices may even be time-dependently driven and at different temperatures.
Second, in sections V and VI, we provide a complete thermodynamic characterization of these non-linear electronic circuits using stochastic thermodynamics. We formulate the first and second law of thermodynamics for these circuits at the ensemble averaged as well as at the fluctuating level. In doing so we establish the relation between heat and electric currents and identify the thermodynamic potentials and forces at work in these circuits. We also formulate a general version of the Landauer principle as well as the different fluctuation theorems known to date.
Third, by formulating a stochastic thermodynamics describing a large family of technologically relevant electronic circuits, we provide a rigorous framework to study the thermodynamics of computation implemented with realistic architectures instead of toy models. To support this claim, in section VII we construct and analyze a stochastic model of a CMOS inverter (or NOT gate) and of a probabilistic bit (p-bit). The CMOS inverter is an important primitive in electronic design, from which more complex devices like oscillators and memories can be built. We show how, due to non-equilibrium fluctuations, the transfer function of the gate deviates from the one obtained by a deterministic treatment. We also compute the full counting statistics of the current fluctuations and illustrate the validity of a detailed fluctuation theorem. A p-bit can be considered a faulty memory with a controllable bias and error rate; it is a physical implementation of what in machine learning is known as a binary stochastic neuron [77,78]. Such devices were recently employed in proof-of-concept experiments to solve stochastic optimization problems and emulate artificial neural networks [13,14]. Other proposals to exploit physical noise include cryptographic applications [79]. To the best of our knowledge, our design is the first full-CMOS proposal for a p-bit exploiting intrinsic noise, and it can be implemented with current technology. The methods employed in these examples can be directly applied to model arbitrary logic gates at the stochastic level, in both asynchronous computing schemes and synchronous ones requiring an external clock signal.
Our work provides bridges between computer engineering, mesoscopic physics and nonequilibrium statistical physics. In doing so it may contribute to the search for new practical and energy-efficient computing paradigms, and also to the design of experiments taking advantage of the versatility of electronic circuits in order to test new developments in statistical physics.
Please note that the reader mainly interested in the stochastic modeling of circuits might want to skip Sections V-VI on a first read, and go directly to the applications discussed in Section VII.

II. BASIC SETUP
We consider an arrangement of N_0 ideal conductors characterized by their total charges {q_n}_{n=1,...,N_0} and electrostatic potentials {V_n}_{n=1,...,N_0}, where the latter are measured with respect to some reference or 'ground' (see Figure 1). Basic electrostatic theory shows that the charges and potentials are related by the linear relation:

q_0 = C_0 V_0,   (1)

where q_0 = (q_1, ..., q_{N_0})^T and V_0 = (V_1, ..., V_{N_0})^T are column vectors containing the charges and voltages, and the N_0 × N_0 symmetric matrix C_0 (known as the Maxwell capacitance matrix) encodes the mutual and self capacitances of the conductors. These capacitances depend on the shape, size, and relative position and orientation of the conductors. The electrostatic energy contained in such a system is given by the quadratic form:

E(q_0) = (1/2) q_0^T C_0^{-1} q_0.   (2)

Some of the conductors can have their potential fixed by voltage sources, and in that case we say that the circuit is open. We will refer to conductors with fixed potentials as regulated conductors, and to the rest as free conductors. Thus, we have N_0 = N + N_r, where N is the number of free conductors and N_r the number of regulated conductors. The vector q_0 will be split into vectors q and q_r containing the charges of the free and regulated conductors, respectively. If the circuit is open, then the number of degrees of freedom of the system is reduced. This can be seen from Eq. (1), since fixing the potential of a conductor imposes a linear relationship between all the charges. Then, the state of the system is fully specified by the charges q of the free conductors. We also consider two-terminal devices or channels that allow the transport of elementary charges of value q_e between pairs of conductors, forming a network or circuit. Each of these channels is modeled as a bi-directional Poisson process (BPP).
This choice offers some generality while keeping the formalism concrete and simple and, as mentioned before, makes it possible to describe relevant devices like tunnel junctions [40,47], diodes [20], MOS transistors in subthreshold operation [54,75], and other devices like nanoscale vacuum channel transistors [76]. The common feature of all these devices is that they display shot noise (i.e., the current noise spectral density is proportional to the average current for large biases [80]; see section III A). This is also a limitation of the BPP modeling choice, since it does not allow a proper description of regular resistors or of source-drain conduction in MOS transistors operating in saturation mode, where the current noise spectral density is approximately independent of the average current (as for Johnson-Nyquist noise) [80]. However, this is not a serious limitation, especially if we are interested in the regime of ultra-low energy consumption, where the subthreshold and unsaturated operation of MOS transistors is optimal [75].
Let ρ = 1, ..., M index the two-terminal devices present in the circuit. Then, given a device ρ connecting conductors n and m, we associate to it two basic Poisson processes: a 'forward' one in which an elementary charge is transported from conductor n to conductor m, and a 'reverse' one in which a charge is transported in the opposite direction, with respective rates λ_ρ(q, t) and λ_{-ρ}(q, t). The forward direction is of course chosen arbitrarily. Note that the rates λ_{±ρ}(q, t) depend explicitly on the full state q of the system, which makes it possible to model externally controlled conduction channels. Thus, in a transition ±ρ the state of the system changes as:

q → q + q_e ∆_{±ρ},   (3)

where q_e is the value of the elementary charge involved in all the possible transitions and ∆_{-ρ} = -∆_ρ. The vector ∆_ρ encodes to which conductors the device ρ is connected and what is the change in their number of charges during the transitions. If a device ρ is connected between one of the free conductors, n, and one with fixed voltage, the corresponding vector ∆_ρ is given by:

(∆_ρ)_k = δ_{k,n},  k = 1, ..., N,   (4)

where the forward direction was chosen as the one leaving the conductor with fixed potential. One can imagine more complex devices that involve three or more conductors in an irreducible way, for example by taking one charge from conductor m, one from n, and transporting them to conductor o. These kinds of devices can also be treated with the formalism we develop here, although most discussions will be focused on two-terminal devices (which, however, might be controlled externally).
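To make the setup concrete, the following minimal sketch (with arbitrary, illustrative capacitance and voltage values, not taken from the text) computes the charges induced by a set of potentials via Eq. (1), the electrostatic energy, and the update of the state in a single transition:

```python
import numpy as np

# Toy arrangement of N0 = 2 conductors (all values illustrative).
# C0 is the symmetric Maxwell capacitance matrix of Eq. (1), in farads.
C0 = np.array([[ 2.0e-15, -0.5e-15],
               [-0.5e-15,  1.0e-15]])

V0 = np.array([1.0, 0.2])                  # potentials w.r.t. ground (V)
q0 = C0 @ V0                               # Eq. (1): induced charges
E  = 0.5 * q0 @ np.linalg.inv(C0) @ q0     # electrostatic energy (quadratic form)

# The two equivalent quadratic forms for the energy must agree:
assert np.isclose(E, 0.5 * V0 @ C0 @ V0)

# A device transporting one elementary charge from conductor 2 to
# conductor 1 has Delta = (+1, -1); a transition updates the state as
qe = 1.602e-19
Delta = np.array([1.0, -1.0])
q0_new = q0 + qe * Delta
assert np.isclose(q0_new.sum(), q0.sum())  # total charge is conserved
```

For a closed circuit this illustrates why the total charge of a connected component is a conserved quantity: every two-terminal device only moves charge between conductors.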

A. Reduced incidence matrix, cycles and conservation laws
The vectors ∆_{ρ>0} can be grouped in an N × M reduced incidence matrix:

∆ = (∆_1 | ∆_2 | ... | ∆_M).   (5)

This matrix is analogous to the stoichiometric matrix in chemical reaction networks [33,81]. [Footnote 1: A given circuit with two-terminal devices can be mapped to a chemical reaction network by mapping conductors to chemical species A, B, C, ... and devices to chemical reactions X ↔ Y. This mapping only concerns the structure of the circuit and network, not the dynamics or thermodynamics.] For closed circuits it coincides with the full incidence matrix of the directed graph obtained by mapping conductors to nodes and two-terminal devices to directed edges (with the direction given by the forward one). The reduced incidence matrix of an open circuit is obtained from the one of the closed circuit by eliminating the rows corresponding to regulated conductors. The right null eigenvectors of ∆ define cycles, i.e., sequences of transitions that leave the circuit state invariant:

∆ c_α = 0,  α = 1, ..., N_c.   (6)

The elements of the vectors c_α can always be chosen to be 0, 1, or -1. The number of independent cycles is N_c = dim(Ker(∆)). The left null eigenvectors of ∆ correspond to conservation laws, since if

ν^T ∆ = 0,   (7)

then the quantities

L_ν(q) = ν^T q   (8)

will not change under any transition, i.e., they are determined by the initial state of the circuit. The elements of ν can always be considered to be 0 or 1. For each connected component of the full circuit in which no conductor is regulated, we have a conserved quantity that is just the total charge of the conductors in that component. In fact, these are the only conserved quantities. Thus, the number of independent conservation laws, N_l = dim(Ker(∆^T)), equals the number of closed connected components of the circuit. Whenever a closed circuit is opened by connecting one of its conductors to a voltage source (see Figure 1), we might either break a conservation law or create a new cycle. This can be seen in the following way. The rank-nullity theorem applied to the matrix ∆, using rank(∆) = M - N_c = N - N_l, can be expressed as:

M = N - N_l + N_c.   (9)

This is valid for closed as well as for open circuits.

Let us assume however that in the previous equation N, N_l and N_c correspond to the matrix ∆ of the circuit in which all the voltage sources are disconnected. Then we connect N_r voltage sources, and thus the number of conductors involved in the new matrix ∆' is now N' = N - N_r. Applying the rank-nullity theorem to ∆' we obtain:

M = N' - N_l' + N_c' = N - N_r - N_l' + N_c'.   (10)

Subtracting the previous two equations we see that:

N_r = (N_l - N_l') + (N_c' - N_c).   (11)
Thus, the number of voltage sources connected to the circuit equals the number of broken conservation laws, N_l - N_l', plus the number of emergent cycles, N_c' - N_c. This is easily understood: if a previously closed component of the circuit is connected to a voltage source, its total charge ceases to be a conserved quantity. However, if we further connect another voltage source to another conductor of the same component, then a new cycle is created (the one in which a charge is injected by one source, transported through the component, and removed by the second source).
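This counting relation (number of sources = broken conservation laws + emergent cycles) can be checked numerically. The sketch below uses a toy three-conductor ring (an illustrative example, not one from the text) and computes N_c and N_l from the rank of the incidence matrix:

```python
import numpy as np

def cycles_and_laws(D):
    """Return (N_c, N_l) for a reduced incidence matrix D of shape (N, M):
    N_c = dim Ker(D) and N_l = dim Ker(D^T), via the rank-nullity theorem."""
    N, M = D.shape
    r = np.linalg.matrix_rank(D)
    return M - r, N - r

# Toy closed circuit: three conductors joined in a ring by three devices,
# rho = 1 (1 -> 2), rho = 2 (2 -> 3), rho = 3 (3 -> 1).  Columns are Delta_rho.
D_closed = np.array([[-1,  0,  1],
                     [ 1, -1,  0],
                     [ 0,  1, -1]])
Nc, Nl = cycles_and_laws(D_closed)   # one cycle, one conserved total charge

# Open the circuit: fix the potential of conductor 3, i.e. drop its row.
D_open = D_closed[:2, :]
Nc_p, Nl_p = cycles_and_laws(D_open)

# Number of sources = broken conservation laws + emergent cycles:
N_r = 1
assert N_r == (Nl - Nl_p) + (Nc_p - Nc)
```

Here opening the ring with a single source breaks the total-charge conservation law without creating a cycle; dropping a second row would instead create an emergent cycle.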

B. Stochastic and deterministic dynamics
At any given time the state of the circuit is described by a probability distribution P(q, t) over the state space. It evolves according to the master equation:

d_t P(q, t) = Σ_ρ [ J_ρ(q - q_e ∆_ρ, t) - J_ρ(q, t) ],   (12)

where the probability currents are defined as:

J_ρ(q, t) = λ_ρ(q, t) P(q, t).   (13)

They are simply the probability per unit time to observe a transition ρ in state q. The summation in Eq. (12) is over positive and negative values of ρ, i.e., it is over transitions and not over devices. The set of currents {J_ρ(q, t)}_{ρ=±1,...,±M} can be considered the components of a vector function of the state, which we denote J(q, t).
We define the following operator over those functions:

D_q^ρ[J] ≡ J_ρ(q - q_e ∆_ρ, t) - J_{-ρ}(q, t).   (14)

Then, D_q^ρ[J] is the net probability current arriving at state q corresponding to transitions ±ρ, and the master equation reads:

d_t P(q, t) = Σ_ρ D_q^ρ[J].   (15)

Note that the change in a scalar quantity F(q) in a transition q → q + q_e ∆_ρ can also be expressed by trivially extending the operator D_q^ρ[·] to these functions:

D_q^ρ[F] ≡ F(q - q_e ∆_ρ) - F(q),   (16)

so that the change of F in a transition q → q + q_e ∆_ρ is D_q^{-ρ}[F] = F(q + q_e ∆_ρ) - F(q). The dynamical description based on the master equation in Eq. (12) is valid at timescales that are large compared to the duration of each transition or conduction event, which here is considered to be instantaneous. This dynamics can be compared to the deterministic dynamics obtained by the usual methods of circuit theory [82]. In those deterministic descriptions, the charges q_n are considered to be continuous variables, and the charge vector q evolves according to the following differential equation:

d_t q(t) = Σ_{ρ>0} ∆_ρ I_ρ,   (17)

where I_ρ is the electric current associated to device ρ. The previous equation is closed by providing the I-V curve characterization of all the devices, and by Eq. (1) relating voltages and charges. For example, the current I_ρ through a two-terminal device connected from conductor n to conductor m is considered to be a function I_ρ(∆V_{n,m}) of the voltage drop ∆V_{n,m} = V_n - V_m (see Section III A). Then, ∆V_{n,m} can be expressed in terms of q by inverting Eq. (1). The relation between the deterministic and stochastic descriptions is non-trivial and will be examined in the particular example of the CMOS inverter in section VII A.
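The master equation, Eq. (12), can be sampled directly with the Gillespie algorithm. The sketch below simulates a single island exchanging electrons with ground through one junction; the parameters and the single-electron rate function (the standard form, which satisfies detailed balance with respect to the charging energy) are illustrative choices, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters: one island of capacitance C coupled to ground
# through a single Ohmic tunnel junction, at room temperature.
qe, kT = 1.602e-19, 4.14e-21     # elementary charge (C), thermal energy (J)
beta = 1.0 / kT
C, R = 1e-18, 1e6                # island capacitance (F), junction resistance (Ohm)

E = lambda n: (n * qe) ** 2 / (2 * C)   # charging energy for n excess charges

def lam(dE):
    """Jump rate for a transition with energy cost dE.  This standard
    single-electron form obeys lam(dE)/lam(-dE) = exp(-beta*dE)."""
    if dE == 0.0:
        return kT / (qe**2 * R)
    return (1.0 / (qe**2 * R)) * dE / np.expm1(beta * dE)

def gillespie(n0, steps):
    """Direct stochastic simulation of the master equation, Eq. (12)."""
    n, t, traj = n0, 0.0, []
    for _ in range(steps):
        lp = lam(E(n + 1) - E(n))                    # rate for n -> n + 1
        lm = lam(E(n - 1) - E(n))                    # rate for n -> n - 1
        t += rng.exponential(1.0 / (lp + lm))        # exponential waiting time
        n += 1 if rng.random() < lp / (lp + lm) else -1
        traj.append((t, n))
    return traj

traj = gillespie(n0=0, steps=20000)
mean_n = np.mean([n for _, n in traj])   # should fluctuate around 0 at equilibrium
```

Since the circuit is closed and isothermal, the trajectory relaxes to equilibrium fluctuations around n = 0, consistent with the Gibbs state discussed in the next subsection.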

C. Equilibrium states and detailed balance
An equilibrium state P_eq(q, t) of the circuit is defined as one in which the global detailed balance condition holds:

λ_ρ(q, t) P_eq(q, t) = λ_{-ρ}(q + q_e ∆_ρ, t) P_eq(q + q_e ∆_ρ, t),   (18)

for every state q and transition ρ. By Eq. (15), if an equilibrium state exists it is also a stationary one. In general no equilibrium state exists. However, for closed and isothermal circuits, consistency with equilibrium thermodynamics demands the following Gibbs state to be an equilibrium one:

P_Gibbs(q, t) = Z^{-1} e^{-β E(q)} Π_ν δ[L_ν(q), L_ν(q^(i))],   (19)

where δ[a, b] = 1 if a = b and 0 otherwise, q^(i) is the initial state of the circuit, and ν runs over a set of independent conservation laws. The partition function Z is such that P_Gibbs(q, t) is normalized and thus depends on the inverse temperature β and the quantities {L_ν(q^(i))}. More general equilibrium states can be obtained by mixing Gibbs distributions like Eq. (19) according to a distribution P(q^(i)) on the initial state. The demand that P_Gibbs(q, t) must be an equilibrium state when the circuit is closed and isothermal imposes minimal conditions on the transition rates λ_{±ρ}(q, t). These are the local detailed balance (LDB) conditions, which for closed circuits and isothermal settings are:

log [ λ_ρ(q, t) / λ_{-ρ}(q + q_e ∆_ρ, t) ] = -β [ E(q + q_e ∆_ρ) - E(q) ],   (20)

for each ρ. They can also be written in vector form as:

log λ(q, t) - log λ~(q, t) = -β δE(q),   (21)

where λ(q, t) is a vector function of the state with components {λ_ρ(q, t)}_{ρ=±1,...,±M}, λ~(q, t) has components λ~_ρ(q, t) = λ_{-ρ}(q + q_e ∆_ρ, t), δE(q) has components δE_ρ(q) = E(q + q_e ∆_ρ) - E(q), and the log(·) function is applied element-wise. Thus, the rates λ_{±ρ}(q, t) characterizing a given two-terminal device ρ must fulfill the constraints imposed by Eq. (20). We now generalize the LDB conditions to open circuits and non-isothermal settings.
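As a numerical illustration, one can verify that a Gibbs state like Eq. (19) is stationary whenever the rates obey local detailed balance, Eq. (20). The example below reduces a toy closed circuit to a birth-death chain in the number n of transferred charges; the quadratic energy, the charging scale, and the symmetric rate parametrization are assumptions made for the sake of the example:

```python
import numpy as np

kT = 4.14e-21
beta = 1.0 / kT

# Toy closed circuit: one device between two conductors; n counts the
# elementary charges transferred.  Illustrative charging energy ~ kT:
Ec = 2.0 * kT
E = lambda n: Ec * n**2

K = 6                          # truncate the state space to n = -K..K
ns = np.arange(-K, K + 1)
G0 = 1.0                       # bare attempt frequency (arbitrary units)

def lam(dE):
    """Rates obeying lam(dE)/lam(-dE) = exp(-beta*dE), i.e. local detailed balance."""
    return G0 * np.exp(-beta * dE / 2.0)

# Generator of the master equation restricted to the truncated chain:
W = np.zeros((len(ns), len(ns)))
for i, n in enumerate(ns):
    if n < K:    # transition n -> n + 1
        r = lam(E(n + 1) - E(n))
        W[i + 1, i] += r; W[i, i] -= r
    if n > -K:   # transition n -> n - 1
        r = lam(E(n - 1) - E(n))
        W[i - 1, i] += r; W[i, i] -= r

P_gibbs = np.exp(-beta * E(ns)); P_gibbs /= P_gibbs.sum()
assert np.allclose(W @ P_gibbs, 0.0)   # the Gibbs state is stationary
```

Because the truncated chain satisfies detailed balance state by state, the Gibbs distribution is an exact stationary state of the generator, illustrating the argument below Eq. (19).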

D. Energy difference and local detailed balance
We consider an open circuit in which some conductors have their potential fixed by voltage sources. In the same way we did with the charges, the vector V_0 is split into vectors V and V_r, containing the voltages of the free and regulated conductors, respectively. We can then express the relation of Eq. (1) between all the charges and voltages in block form:

( q_r )   ( C_r    C_m^T ) ( V_r )
(  q  ) = ( C_m      C   ) (  V  ),   (22)

where C_r is the N_r × N_r capacitance matrix of the regulated conductors, C is the N × N capacitance matrix of the free conductors, and C_m is the N × N_r matrix with the mutual capacitances between conductors of the two groups. The previous equation can be rewritten as:

V = C^{-1} (q - C_m V_r),   q_r = C_r V_r + C_m^T C^{-1} (q - C_m V_r),   (23)

from where it is clear that, given the potentials V_r, the charges q are enough to determine the rest of the variables, as discussed before. The total electrostatic energy is then:

E(q) = (1/2) q^T C^{-1} q + (1/2) V_r^T (C_r - C_m^T C^{-1} C_m) V_r.   (24)

We are interested in computing how the total energy of the system (conductors plus sources) changes in a transition q → q + q_e ∆_ρ. From the previous equation we see that the change in electrostatic energy is:

δE_ρ(q) = E(q + q_e ∆_ρ) - E(q) = q_e ∆_ρ^T C^{-1} q + (q_e^2/2) ∆_ρ^T C^{-1} ∆_ρ,   (25)

which is independent of the voltages V_r. In addition to this, we need to consider the change in the energy stored in the voltage sources. This can be computed as (minus) the work performed by them, which equals the charge transported from ground to the conductor to which each source is connected, times its voltage. There are two different contributions to this work. First, the transition ρ might directly involve one regulated conductor. If a charge q_e arrives at a conductor fixed to a potential V_r, it needs to be removed, and for this the source must perform an amount of work given by w_r = -q_e V_r. Secondly, even if the transition does not involve any regulated conductor, changes in the distribution of charge among the free conductors can induce a charging of the regulated conductors. From Eq. (23) we see that the induced charge is δq_r = q_e C_m^T C^{-1} ∆_ρ.
It follows that the total amount of work performed by the sources during transition ρ is:

W_ρ(q) = q_e V_r^T (C_m^T C^{-1} ∆_ρ - ∆_ρ^r),   (26)

where ∆_ρ^r is a vector encoding the change in the number of charges in the regulated conductors during transition ρ (if no regulated conductor is involved in transition ρ, then ∆_ρ^r = 0). Thus, the change in the energy of the system and sources can be written as:

δQ_ρ(q) = δE_ρ(q) - W_ρ(q) = Φ(q + q_e ∆_ρ) - Φ(q) + q_e V_r^T ∆_ρ^r,   (27)

where we have defined the potential:

Φ(q) = (1/2) (q - C_m V_r)^T C^{-1} (q - C_m V_r).   (28)

The first term in the right hand side of Eq. (27) is conservative, since its contribution vanishes in any cyclic sequence of transitions in the state space {q}. The second contribution is not conservative: its value depends on how the regulated conductors are involved in the cycle. Also, we note that the gradient of the potential Φ(q) gives the voltages of the free conductors:

∇_q Φ(q) = C^{-1} (q - C_m V_r) = V,   (29)

as can be verified from Eq. (23). The quantity δQ_ρ(q) is the energy required to perform the transition q → q + q_e ∆_ρ. By conservation of energy, it must be provided by the environment of the device ρ, which we assume to be at thermal equilibrium at temperature T_ρ (this implies that the two conductors to which ρ is connected should also be at temperature T_ρ). Therefore, δQ_ρ(q) is the heat associated to device ρ during the transition, and corresponds to an entropy change in its environment equal to -δQ_ρ(q)/T_ρ. Thus, the LDB condition now reads:

log [ λ_ρ(q, t) / λ_{-ρ}(q + q_e ∆_ρ, t) ] = -δQ_ρ(q) / (k_B T_ρ).   (30)

For closed and isothermal settings this condition reduces to Eq. (20).
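The decomposition of the heat in Eq. (27) into a conservative part and a source term can be checked against the energy and work balance. In the sketch below, the capacitance blocks, source voltages, and transition vectors are random illustrative values (in units where q_e = 1), not quantities from the text:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy open circuit: N = 3 free and N_r = 2 regulated conductors,
# working in units where the elementary charge q_e = 1.
N, Nr, qe = 3, 2, 1.0
A = rng.normal(size=(N + Nr, N + Nr))
C0 = A @ A.T + (N + Nr) * np.eye(N + Nr)   # symmetric positive definite
C  = C0[:N, :N]                            # free-free capacitance block
Cm = C0[:N, N:]                            # free-regulated mutual block
Vr = rng.normal(size=Nr)                   # fixed source voltages

q       = rng.integers(-3, 4, size=N).astype(float) * qe   # current state
Delta   = np.array([1.0, 0.0, 0.0])        # charge arrives at free conductor 1 ...
Delta_r = np.array([-1.0, 0.0])            # ... leaving regulated conductor 1

Cinv = np.linalg.inv(C)
E   = lambda q: 0.5 * q @ Cinv @ q                            # q-dependent energy
Phi = lambda q: 0.5 * (q - Cm @ Vr) @ Cinv @ (q - Cm @ Vr)    # potential

dE    = E(q + qe * Delta) - E(q)                       # electrostatic energy change
W_src = qe * Vr @ (Cm.T @ Cinv @ Delta - Delta_r)      # work performed by the sources
dQ    = dE - W_src                                     # energy drawn from the environment

# Eq. (27): the same heat, split into a conservative and a source term
assert np.isclose(dQ, Phi(q + qe * Delta) - Phi(q) + qe * Vr @ Delta_r)
```

The assertion holds for any choice of capacitances and voltages, since the V_r-dependent cross terms cancel identically; this is the content of Eq. (27).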

E. Infinite vs finite state spaces
Certain types of circuits, in particular single-electron devices, are such that the number of charges in a given conductor can only take a few distinct values. They can therefore be described by truncating their infinite state space {q} to the finite set of states relevant for the dynamics. This does not compromise thermodynamic consistency. Such approaches can also be used to model single-electron traps, which are useful to describe random telegraphic and 1/f noise, as we will discuss in section III B 3. For many other circuits, such a truncation to a small state space is not possible. Infinite state spaces are indeed a crucial ingredient of devices displaying a macroscopic limit. This is the case of CMOS circuits, for instance, where, as the typical size of the transistors is increased, the number of electrons in each conductor becomes very large. In this case the stochastic dynamics gives rise to a deterministic non-linear dynamics (described by Eq. (17)), which enables the emergence of complex phenomena such as bistabilities, oscillations, and chaos. Studying fluctuations can become very expensive numerically, and many techniques suitable for finite state spaces (e.g., spectral methods) are no longer applicable. Also, common approximation techniques, such as the second-order truncation of the Kramers-Moyal expansion of Eq. (12) leading to Fokker-Planck or Langevin equations, are known to produce incorrect results [83,84]. One must therefore resort to more elaborate methods such as path integrals and large deviations techniques [85][86][87]. Such methods have been used to study stochastic chemical reaction networks [88], where similar problems are encountered [84].

III. MODELS FOR DEVICES
A. I-V curve characterization

Two-terminal devices are usually characterized by measuring how the average electric current through them depends on the applied voltage across their terminals, as in Figure 2. Consider a device ρ subjected to a fixed voltage bias V. In this simple situation, the heat of Eq. (27) associated with a forward/reverse transition reduces to:

δQ_± = ∓ q_e V.   (31)

Modeling the device as a BPP with rates λ_+ and λ_-, and applying the LDB condition of Eq. (30), we obtain:

λ_+(V) / λ_-(V) = e^{β q_e V}.   (32)

The net amount of charge going through the device in the forward direction between time t and t + ∆t is:

q(∆t) = q_e [N_+(∆t) - N_-(∆t)],   (33)

where N_±(∆t) are independent Poisson processes with rates λ_±(V). The average current is then:

⟨I(∆t)⟩ = ⟨q(∆t)⟩/∆t = q_e [λ_+(V) - λ_-(V)] = I(V).   (34)

Thus, the BPP modeling assumption and the LDB condition make it possible to determine the rates λ_±(V) from the measurement of the I-V curve alone, via Eqs. (32) and (34). In turn, from these rates we can compute any statistical moment of the electric current I(∆t) = q(∆t)/∆t. Therefore the full statistics of the process is completely determined by just the mean value ⟨I(∆t)⟩. In particular, the second central moment is [20,21]:

⟨δI(∆t)^2⟩ = q_e^2 [λ_+(V) + λ_-(V)]/∆t = (q_e/∆t) I(V) coth(β q_e V/2),   (35)

which, at variance with the first moment ⟨I(∆t)⟩, depends explicitly on the integration time ∆t. This integration time is related via the Nyquist-Shannon sampling theorem to the frequency bandwidth ∆f = 1/(2∆t) of the measurement. Thus, in the limit of large bias (β q_e V ≫ 1) we obtain the usual expression for shot noise:

⟨δI^2⟩ = 2 q_e ⟨I⟩ ∆f.   (36)

Then, in this context, shot noise appears as a direct consequence of the BPP assumption and of the LDB condition. For this reason, the fluctuations in circuits with elements that do not display shot noise cannot be faithfully described with this formalism. In the opposite limit where thermal effects dominate (β q_e V ≪ 1), we recover the usual expression for Johnson-Nyquist noise, ⟨δI^2⟩ = 4 k_B T (I(V)/V) ∆f [54,89].
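In practice, Eqs. (32) and (34) fix both rates from any measured I-V curve. A minimal sketch, with illustrative room-temperature constants and an Ohmic junction as the example device:

```python
import numpy as np

kT = 4.14e-21          # thermal energy at room temperature (J)
qe = 1.602e-19         # elementary charge (C)
beta = 1.0 / kT

def rates_from_iv(I, V):
    """Poisson rates (lp, lm) reconstructed from an I-V curve I(V), for V != 0,
    by combining <I> = qe*(lp - lm) with the LDB ratio lp/lm = exp(beta*qe*V).
    Assumes I(V) has the sign of V (a passive device)."""
    x = beta * qe * V
    lp = I(V) / (qe * -np.expm1(-x))    # uses 1 - exp(-x) = -expm1(-x)
    lm = lp * np.exp(-x)
    return lp, lm

# Example: Ohmic tunnel junction, I(V) = V / R
R = 1e6
lp, lm = rates_from_iv(lambda V: V / R, 0.05)

assert lp > 0 and lm > 0
assert np.isclose(qe * (lp - lm), 0.05 / R)               # reproduces the I-V curve
assert np.isclose(lp / lm, np.exp(beta * qe * 0.05))      # satisfies LDB
```

The same function applied to non-Ohmic curves yields the diode and transistor rates discussed below; the V → 0 limit must be taken separately, where both rates tend to the same equilibrium value.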

Tunnel junctions
A tunnel junction is the simplest kind of device and the one for which the BPP model is most natural (in some regimes of operation) [40]. Here we will consider a tunnel junction consisting of a sufficiently small gap between two metallic islands at room temperature, such that electrons can tunnel through the gap. It typically displays an Ohmic I-V curve [40,90]: I = V/R_TJ, where the tunnel junction resistance R_TJ can be computed from the specific properties of the metallic conductors and the transmission coefficient of the junction. Using Eqs. (32) and (34) we obtain the rates:

λ_±(V) = ± (1/q_e) (V/R_TJ) / (1 - e^{∓β q_e V}).   (37)

These expressions are well defined for any positive or negative value of the elementary charge q_e, and if it changes sign the roles of λ_+ and λ_- are simply interchanged. Many other kinds of tunnel junctions exist, which can display strongly non-linear I-V curves depending on the spectral densities of the materials constituting the junction (see [40] for a quick review).

[Fragment of the Figure 3 caption: the four-terminal configuration of (a) preserves the symmetry between the source (S) and drain (D) terminals, which is broken by connecting S and B together to obtain a three-terminal device as in (b).]

Diodes
The characteristic curve of a p-n junction diode is often modeled via the ideal Shockley diode equation [91]:

I(V) = I_s (e^{V/V_T} - 1),   (38)

where in this case q_e is the positive electron charge and I_s > 0 is the reverse-bias saturation current. Then the Poisson rates are given by:

λ_+(V) = (I_s/q_e) e^{V/V_T},   λ_-(V) = I_s/q_e,   (39)

where we have defined the thermal voltage V_T = k_B T/q_e = (β q_e)^{-1}.
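A quick numerical check (with an illustrative saturation current, not a datasheet value) confirms that these rates reproduce the Shockley I-V curve and satisfy local detailed balance:

```python
import numpy as np

qe, kT = 1.602e-19, 4.14e-21
VT = kT / qe                  # thermal voltage, ~25.8 mV at room temperature
Is = 1e-12                    # illustrative reverse-bias saturation current (A)

def diode_rates(V):
    """Poisson rates of the ideal Shockley diode derived in the text:
    lp = (Is/qe)*exp(V/VT) and lm = Is/qe."""
    return (Is / qe) * np.exp(V / VT), Is / qe

V = 0.3
lp, lm = diode_rates(V)
assert np.isclose(qe * (lp - lm), Is * np.expm1(V / VT))   # Shockley I-V curve
assert np.isclose(lp / lm, np.exp(V / VT))                 # local detailed balance
```

Note that the reverse rate is bias-independent: all the voltage dependence sits in the forward rate, which is why the reverse-bias current saturates at -I_s.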

MOS transistors in weak inversion
MOS transistors are ubiquitous devices underlying most of modern digital and analog electronics. An enhancement-mode nMOS transistor like the one depicted in Figure 3 has two typical modes of operation: (i) a saturation mode, and (ii) a subthreshold or weak-inversion mode (see [53] for a rigorous discussion of all the modes of operation). In the saturation mode the transistor essentially behaves like a switch, allowing conduction between source (S) and drain (D) if the gate (G) voltage is above a certain threshold V_th (see Figure 3-(a)). If V_G < V_th then source-drain conduction is suppressed. However, whenever V_S ≠ V_D some small leakage current will still flow, and its magnitude will greatly depend on how far V_G is below V_th. This is the subthreshold mode of operation, on which we focus in the following. To describe this mode, we consider the Enz-Krummenacher-Vittoz model of the MOS transistor as developed in [52]. According to this model, the average drain current I_D can be naturally split into forward and reverse components given by:

I_D = I_f - I_r,   I_{f/r} = I_0 e^{(V_G - V_th)/(n V_T)} e^{-V_{S/D}/V_T},   (40)

where the voltages and the current I_D are defined as in Figure 3-(a). This model of the MOS transistor in subthreshold operation involves three parameters characterizing the device: the threshold voltage V_th, the 'specific' current I_0, and the 'slope factor' n ≥ 1. All these parameters can be determined from a microscopic description of the device as explained in [52]. The total average drain current is then:

I_D = I_0 e^{(V_G - V_th)/(n V_T)} (e^{-V_S/V_T} - e^{-V_D/V_T}).   (41)

In the previous expression the symmetry of the device is preserved, since we see that the current I_D is inverted if we interchange the roles of drain and source. For the more common three-terminal configuration of Figure 3-(b), the symmetry is broken and the current is given by:

I_D = I_0 e^{(V_G - V_S - V_th)/(n V_T)} (1 - e^{-(V_D - V_S)/V_T}).   (42)

The voltage bias driving this current is V_D - V_S, which plays the role of V in the I-V curve characterization. Using this last expression we obtain the following Poisson rates:

λ_+ = (I_0/q_e) e^{(V_G - V_S - V_th)/(n V_T)},   λ_- = λ_+ e^{-(V_D - V_S)/V_T}.   (43)

In principle this model is accurate only for I_{f/r} ≪ I_0.
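The same consistency checks can be run for the subthreshold nMOS rates of the three-terminal configuration; the transistor parameters below are illustrative assumptions, not values from the text or a datasheet:

```python
import numpy as np

qe, kT = 1.602e-19, 4.14e-21
VT = kT / qe                        # thermal voltage
I0, Vth, n = 1e-7, 0.4, 1.5        # illustrative: specific current, threshold, slope factor

def nmos_rates(VG, VD, VS):
    """Subthreshold Poisson rates for the three-terminal nMOS discussed in
    the text; qe*(lp - lm) reproduces the EKV drain current, and lp/lm
    satisfies LDB with respect to the bias VD - VS."""
    lf = (I0 / qe) * np.exp((VG - VS - Vth) / (n * VT))
    lr = lf * np.exp(-(VD - VS) / VT)
    return lf, lr

lp, lm = nmos_rates(VG=0.2, VD=0.1, VS=0.0)

# EKV drain current for the same bias point (1 - exp(-x) = -expm1(-x)):
ID = I0 * np.exp((0.2 - 0.0 - Vth) / (n * VT)) * -np.expm1(-(0.1 - 0.0) / VT)
assert np.isclose(qe * (lp - lm), ID)                  # reproduces the drain current
assert np.isclose(lp / lm, np.exp((0.1 - 0.0) / VT))   # LDB with bias VD - VS
```

The gate voltage only rescales both rates by a common factor, so it controls the magnitude of the current without affecting the detailed-balance ratio; this is what makes the transistor act as an externally controlled conduction channel in the sense of section II.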
In a pMOS transistor, conduction between drain and source is increasingly allowed as the gate voltage becomes negative with respect to the body, contrary to the nMOS transistor. However, all the expressions presented for the nMOS transistor are still valid for pMOS transistors, provided that the signs of the currents and voltages are reversed, as Figure 4 indicates.
In general treatments, the noise in MOS transistors is modeled by integrating infinitesimal Johnson-Nyquist sources along the drain-source channel [52,53]. However, for the subthreshold or weak-inversion mode in which we are interested, the results obtained in that way are fully compatible with those obtained from a simple BPP model as considered here [54]. This is not the case for other modes of operation. We note that this discussion only concerns the noise associated with the transport processes. Other kinds of noise associated with the presence of defects and charge traps (1/f and/or random telegraphic noise) also play a role in MOS transistors [92][93][94], and are relevant at low frequency. Random telegraphic noise can nonetheless be modeled within our framework using individual charge traps, as briefly mentioned in section II D. 1/f noise can be modeled as well, since it can be generated by an ensemble of random telegraphic sources [95].

IV. CHARGING EFFECTS
The previously discussed I-V characterization of two-terminal devices allows one to determine the Poisson rates λ_±ρ for a given device ρ in situations where the voltage across the device is kept fixed. In actual circuits this voltage will of course depend on the full state q of the circuit, in accordance with the relation of Eq. (1). Thus, the Poisson rates will be functions λ_±ρ(q) of the full state. However, naive constructions of the functions λ_±ρ(q), based on the I-V characterization and Eq. (1), fail to fulfill the LDB conditions. As a consequence, they would lead to unphysical non-thermal stationary states for closed and isothermal circuits. To discuss and illustrate this situation, we revisit the well-known 'Brillouin paradox' [22]. Based on the analysis of this problem, we derive a procedure to construct the rates λ_±ρ(q) for arbitrary devices and circuits in such a way that the LDB conditions are always respected.

A. Brillouin Paradox
We consider the circuit of Figure 5: a closed and isothermal circuit consisting of a diode and a capacitor connected in parallel. At any given time the voltage across the diode is equal to the capacitor voltage V = q/C, where q is the total charge in the upper capacitor plate. Then, to construct the Poisson rates λ_±(q) we might consider the following procedure: to obtain the rate for a transition q → q ∓ q_e, we evaluate the fixed-voltage rates λ_±(V) of Eqs. (39) at the voltage preceding the transition (we refer to this as a 'naive causality' assumption). In this way, we have:

λ^c_±(q) = λ_±(q/C). (45)

However, these rates do not fulfill the LDB condition of Eq. (20), which for this simple case reads:

log[λ_+(q + q_e)/λ_−(q)] = β(E(q + q_e) − E(q)) = βq_e(V + q_e/(2C)) = (q + q_e/2)/(C V_T), (46)

where E(q) = q²/(2C) is the energy of the circuit. As a consequence, the stationary distribution corresponding to the transition rates λ^c_±(q) is:

P_st(q) ∝ e^{−βE(q)} e^{−βq_e q/(2C)}, (47)

which deviates from the correct Gibbs equilibrium by a factor e^{−βq_e q/(2C)}. Since this factor is not an even function of the charge, it follows that the stationary mean value of the charge in the capacitor is strictly negative. If this were the case, the capacitor could be disconnected from the diode and employed as a source of energy, and this process could in principle be repeated indefinitely. This apparent violation of the second law is essentially the Brillouin paradox, and can be considered the electronic analogue of a Brownian ratchet. A way to solve this problem is to notice that the LDB condition of Eq. (46) would be fulfilled if the fixed-voltage rates λ_±(V) were evaluated not at the voltage before each transition, but at the average of the voltage before and after the transition. Using this midpoint rule we obtain the rates [20,21]:

λ_±(q) = λ_±((q ∓ q_e/2)/C), (48)

which lead to the correct Gibbs equilibrium. Later we will show that the midpoint rule is valid in general.
This means that it can be applied to the fixed-voltage rates λ_±ρ(V) of an arbitrary device ρ to obtain thermodynamically consistent transition rates λ_±ρ(q) when this device is embedded in an arbitrary circuit. Although this rule seems to be at odds with the notion of causality, it is actually not: the probability of a transition naturally depends on the final state as well as on the initial one. For the naive notion of causality to be preserved, one should modify the characteristic I-V curve of the device in question in a way that is context dependent. This in turn challenges the idea of modularity, i.e., the notion that the behavior of a device is not influenced by its environment and therefore can be plugged into different circuits without modifying its description, which is a basic assumption in the usual modeling of complex electronic circuits at the deterministic level. However, modifications to the characteristic curve of a device due to charging effects in its environment are well known in the study of single-electron devices, where the most explicit example is the Coulomb blockade effect [39-43, 90, 96]. In the following we illustrate the charging effects in the context of the Brillouin paradox.
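The difference between the two prescriptions can be made concrete in a few lines of code. The sketch below works in dimensionless units (q_e = C = V_T = 1) and assumes the standard BPP decomposition of an ideal diode, λ̃_+(V) = (I_0/q_e)e^{V/V_T} and λ̃_−(V) = I_0/q_e; it compares the local detailed balance ratio produced by each rule:

```python
import math

# Dimensionless units: q_e = C = V_T = I0/q_e = 1 (illustrative sketch).
def lam_fixed(V):
    """Fixed-voltage BPP rates of an ideal diode, I(V) = I0 (e^{V/VT} - 1)."""
    return math.exp(V), 1.0   # (forward: discharging jump, backward: charging jump)

def E(q):
    """Electrostatic energy of the capacitor, q^2/(2C)."""
    return 0.5 * q * q

def ldb_ratio_naive(q):
    """Rate ratio for the pair q -> q-1 and (q-1) -> q, with the rates
    evaluated at the voltage *before* each transition ('naive causality')."""
    lp, _ = lam_fixed(q)          # V = q/C before the forward jump
    _, lm = lam_fixed(q - 1.0)    # V = (q-1)/C before the backward jump
    return lp / lm

def ldb_ratio_midpoint(q):
    """Same ratio, with the rates evaluated at the average of the voltage
    before and after each transition (midpoint rule)."""
    lp, _ = lam_fixed(q - 0.5)    # average voltage for the jump q -> q-1
    _, lm = lam_fixed(q - 0.5)    # average voltage for the jump (q-1) -> q
    return lp / lm
```

In these units the naive prescription misses the required LDB ratio e^{β(E(q) − E(q−q_e))} by exactly the factor e^{βq_e²/(2C)} = e^{1/2}, which is the origin of the spurious factor in the stationary distribution; the midpoint rule matches it exactly.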

B. Charging effects in the I-V curve
Let ⟨I⟩_V be the average current for a given value of the capacitor voltage in the example of Figure 5. According to the correct rates of Eqs. (48), it reads:

⟨I⟩_V = q_e (λ_+(q) − λ_−(q))|_{q=CV} = I_0 (e^{(V − δV)/V_T} − 1), (49)

which matches the characteristic I-V function of Eq. (38) evaluated at a voltage shifted by δV = q_e/(2C). At the same time, the standard deviation of the voltage is σ_V = √(k_b T/C). Then, charging effects are relevant whenever δV is comparable to V and σ_V. Consequently, in order to observe or employ these effects, as done for example in single-electron transistors, one either needs to work with nanoscopic circuits (in order to achieve low values of C) or at low temperatures (usually a combination of both). The characteristic curves of Eqs. (38) and (49) are compared in Figure 6-(a). We note that, counter to intuition, the mean value ⟨I⟩_V does not vanish for V = 0. This seems to indicate that an initially empty capacitor would charge up when connected to the diode. However, in accordance with the second law, this anomaly is effectively neutralized by thermal fluctuations. This can be verified by computing the mean value of the current for the Gibbs equilibrium distribution P_eq(q) ∝ e^{−βq²/(2C)}:

⟨I⟩_eq = Σ_q P_eq(q) ⟨I⟩_{q/C} = 0. (50)

Also, for 0 < V < δV we have ⟨I⟩_V < 0, and thus it seems that for those values of V the device is actually delivering power. However, this is not the case, since the actual voltage can only take the discrete values V = 2nδV = n q_e/C, with n integer, as indicated with dots in Figure 6-(a). Finally, we note that if we take the limit C → ∞ while fixing the voltage V, which corresponds to a model of a perfect voltage source, the charging effects disappear (we go back to the picture of Figure 2).
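The cancellation of Eq. (50) can be checked numerically by averaging the shifted characteristic over the discrete Gibbs distribution. A dimensionless sketch (q_e = V_T = I_0 = 1; the value of x = v_e/V_T is an arbitrary assumption):

```python
import math

x = 0.5   # v_e/V_T = q_e^2/(C k_b T), assumed value for illustration

def mean_current(nq):
    """Average current <I>_V at charge q = nq: the diode characteristic
    shifted by dV = q_e/(2C), which is x/2 in these units (cf. Eq. (49))."""
    return math.exp((nq - 0.5) * x) - 1.0

ns = range(-60, 61)
weights = [math.exp(-0.5 * nq * nq * x) for nq in ns]   # Gibbs weights e^{-beta q^2/(2C)}
Z = sum(weights)
I_eq = sum(w * mean_current(nq) for w, nq in zip(weights, ns)) / Z
# Although <I>_V does not vanish at V = 0, its average over the thermally
# fluctuating charge vanishes, in accordance with the second law.
```

The exact cancellation follows from the shift identity Σ_n e^{−n²x/2} e^{(n−1/2)x} = Σ_n e^{−(n−1)²x/2}, so the truncated sum is zero up to floating-point rounding and exponentially small boundary terms.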

C. General case
Comparing Eqs. (32) and (30) we see that if V in the fixed-voltage rates λ̃_±(V) is replaced by ∓δQ_±ρ(q)/q_e, then the resulting state-dependent rates will automatically satisfy the LDB condition (recall the definition of δQ_±ρ(q) in Eq. (27) and note that it satisfies δQ_−ρ(q + q_e ∆_ρ) = −δQ_ρ(q)). Explicitly, we should consider (for ρ > 0): In turn, if device ρ is connected to conductors n and m (with n → m as the forward direction) then we have: where ∆V̄_ρ^{nm}(q) is the average of the voltage difference ∆V_nm = V_n − V_m before and after the transition q → q + q_e ∆_ρ. This justifies the midpoint rule mentioned above and can be easily verified from the relation of Eq. (23) and the definition of Φ(q) in Eq. (28). Particular care should be taken for the case of the MOS transistor (or, in general, for three-terminal devices that can be considered as externally controlled two-terminal devices). In this case the fixed-voltage transition rates λ̃_± for the source-drain conduction do not depend only on the voltage difference ∆V_DS = V_D − V_S between those terminals, but also on the 'control' voltage ∆V_GS = V_G − V_S. Generalizing the rates of Eq. (44) we can write: where the functions f and g_± are such that the following condition is satisfied: To construct state-dependent rates satisfying the LDB condition of Eq. (30), we should not only replace ∆V_DS by its average before and after the transition but also do the same with the control parameter ∆V_GS. Explicitly: An analysis similar to the one for the diode in section IV B also holds for more general circuits, including CMOS circuits. If C is the typical capacitance at a given node of a circuit, the standard deviation of the voltage fluctuations at that node can be estimated by its equilibrium value

σ_V = √(k_b T/C) = √(v_e V_T),

where we have defined the elementary voltage v_e = q_e/C, which represents the voltage change associated to a single-charge jump.
Values of C as low as 50–100 aF can be attained in modern CMOS fabrication processes [97], which at room temperature lead to elementary voltages as high as v_e ≈ 0.1 V_T, and thus σ_V ≈ 0.3 V_T. We see, then, that in modern subthreshold or near-threshold applications it is not possible to neglect charging effects or thermal fluctuations when the operating voltages are comparable to the thermal voltage V_T. This in turn opens new possibilities that are explored in the example of section VII B.
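For concreteness, these estimates follow directly from the physical constants; the sketch below evaluates them for an assumed node capacitance of 50 aF (the lower end of the quoted range) at 300 K:

```python
import math

kB = 1.380649e-23      # Boltzmann constant [J/K]
qe = 1.602176634e-19   # elementary charge [C]
T = 300.0              # room temperature [K]
C = 50e-18             # assumed node capacitance of 50 aF

VT = kB * T / qe                  # thermal voltage, ~26 mV
ve = qe / C                       # elementary voltage of a single-charge jump
sigma_V = math.sqrt(kB * T / C)   # equilibrium voltage fluctuation at the node
# Note the relation sigma_V = sqrt(ve * VT), so sigma_V/VT = sqrt(ve/VT).
```

With these numbers v_e/V_T ≈ 0.12 and σ_V/V_T ≈ 0.35, consistent with the estimates quoted in the text.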

D. Charging effects and non-linearity
Circuits containing devices with non-linear I-V curves are qualitatively different from circuits in which all devices are linear. For example, in linear RLC networks (where the degrees of freedom are continuous and the stochastic dynamics is described by a Fokker-Planck equation), the dynamics of the mean or expected values of voltages and currents is decoupled from the dynamics of higher-order moments, and therefore it matches the deterministic dynamics [38]. This is no longer the case if some non-linear element is present. However, in the kind of discrete models we are considering here, charging effects can induce non-linear behavior even if the characteristic I-V curves of all the devices in the circuit are linear, as is the case with tunnel junctions [39,42,43]. This is illustrated in Figure 6-(b), which was obtained in the same way as Figure 6-(a) for the diode. These induced non-linearities are a resource that is exploited in the construction of single-electron transistors and logic gates consisting only of small conductive islands and tunnel junctions between them [40]. However, in the macroscopic limit in which each conductor has many excess charges, or for high temperatures, the non-linear effects are washed out. Consequently, in those regimes, the expected values of voltages and currents in such circuits will obey closed and linear equations of motion that match the ones obtained from the deterministic description of Eq. (17). The question of the relation between the stochastic and deterministic descriptions in general circuits is non-trivial, and will not be addressed here in full generality. It will be analyzed for the particular example of the CMOS inverter in Section VII A. A general treatment will be considered elsewhere, based on Large Deviations Theory [88,98,99].

V. STOCHASTIC THERMODYNAMICS
So far we have established the general stochastic description of electronic circuits and we have shown how to construct the transition rates corresponding to different devices in a way that is thermodynamically consistent. In the following we explore the general properties of this kind of model. We start by analyzing the energy balance and the entropy production, i.e., we establish the first and second laws. For this we first define the production of heat in each device and its relation to the electric current.

A. Electrical currents and heat dissipation
Let us consider the following pair of transitions, which are the inverse of each other: The average electric current corresponding to this pair of transitions is: And then the net average electric current corresponding to device ρ is: In a similar way, the average rate at which heat is provided by the environment of device ρ corresponding to that pair of transitions is: where δQ ρ (q) is the change in energy of the system during transition ρ and is given by Eq. (27). Note that, as is usual in stochastic thermodynamics but contrary to what is normally done in electronics, heat is defined as positive when it increases the energy of the system. Recalling Eq. (52), if device ρ is connected to conductors n and m, we can write: This is the stochastic version of the usual formula for Joule heating. Note that it is only valid at the level of transitions, and that the average voltage difference is involved. The net heat rate associated to device ρ is: where we have employed Eq. (27) to substitute δQ ρ (q).

B. Balance of Energy
Let us consider the rate of change in the mean value of the potential Φ(q, t): The explicit time dependence of Φ accounts for possible external controls of the parameters entering its definition, like the elements of the capacitance matrix or the voltages of the regulated conductors. Thus, the contribution ∂_t Φ is interpreted as the rate of work done by this external control: Then, employing the master equation of Eq. (12), we can write: In the second and third lines of this equation we used the symmetry δQ_−ρ(q + q_e ∆_ρ) = −δQ_ρ(q). Note that, at variance with the first line, the sums in the last line only involve positive values of ρ and can therefore be considered as sums over devices. According to Section II D, the quantities are the average rates of work performed by the voltage sources corresponding to device ρ. In this way we obtain the following energy balance for a general circuit:

C. Entropy production
To the probability distribution P(q, t) we assign the Shannon entropy:

⟨S⟩(t) = −k_b Σ_q P(q, t) log P(q, t),

which, as the notation suggests, can be considered the average of the state-dependent entropy S(q, t) = −k_b log(P(q, t)). Its time derivative follows from the master equation. As usual, this rate of entropy change can be split into two components: Using the LDB condition in Eq. (31), the second term can be related to the average entropy production in the environment: Thus, combining the last two equations we obtain the following expression for the total average irreversible entropy production: which is explicitly non-negative. This constitutes a proof of the second law of thermodynamics in this context. We see that the entropy production Σ vanishes if and only if all the currents vanish, J_ρ(q) = 0 for every ρ and q, i.e., if the state is an equilibrium one (see Eq. (18)). The fact that Σ corresponds to the familiar concept of entropy production is further justified in the following.
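Both properties (non-negativity of the entropy production rate, and its exact vanishing at equilibrium) can be checked numerically, for instance on the closed diode-capacitor circuit of section IV A with midpoint-rule rates. A dimensionless sketch (q_e = V_T = k_b = 1; the value of x is an assumption):

```python
import math

x = 0.5    # q_e^2/(C k_b T), assumed value
N = 40     # truncate the charge states to n in [-N, N]

def lam_down(n):
    """Rate for n -> n-1 (forward conduction through the diode), midpoint rule."""
    return math.exp((n - 0.5) * x)

def lam_up(n):
    """Rate for n -> n+1 (reverse conduction), midpoint rule."""
    return 1.0

def entropy_production_rate(P):
    """Sum over transition pairs of (J+ - J-) ln(J+/J-), in units of k_b.
    Each term is non-negative, so the total is as well."""
    total = 0.0
    for n in range(-N + 1, N + 1):
        Jp = P[n] * lam_down(n)         # probability flux n -> n-1
        Jm = P[n - 1] * lam_up(n - 1)   # probability flux n-1 -> n
        total += (Jp - Jm) * math.log(Jp / Jm)
    return total

states = list(range(-N, N + 1))
# Gibbs equilibrium distribution P_eq(n) ~ e^{-n^2 x/2}
Zg = sum(math.exp(-0.5 * n * n * x) for n in states)
Peq = {n: math.exp(-0.5 * n * n * x) / Zg for n in states}
# A displaced, non-equilibrium distribution of the same shape
Zs = sum(math.exp(-0.5 * (n - 5) ** 2 * x) for n in states)
Pne = {n: math.exp(-0.5 * (n - 5) ** 2 * x) / Zs for n in states}
```

At the Gibbs state every pair of fluxes balances (detailed balance), so the entropy production rate vanishes; for the displaced distribution it is strictly positive.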

Isothermal conditions
If the temperature of all the devices is the same, then we can split the total entropy production into a potential term and a work term. To see this we combine Eqs. (70) and (66) and write: where T is the common temperature of all devices. Then, we obtain: where we have defined the average free energy as: Thus, the temperature times the total entropy production rate is expressed as the total work rate performed on the system minus the rate of change of a thermodynamic potential. Integrating Eq. (73), and using the fact that Σ ≥ 0, we recover the usual statement of the second law: the amount of work that can be extracted from the system during an arbitrary transformation is limited by minus the free energy difference.

D. Equilibrium states
Due to the LDB conditions, for time-independent, closed and isothermal systems there is always an equilibrium state satisfying Eq. (18), and it is given by Eq. (19). However, equilibrium states can also exist under more general conditions. For example, if an isothermal and closed circuit is opened by connecting some or all connected components to a voltage source (in such a way that there is only one voltage source connected to each component), then there also exists an equilibrium state, in which no currents flow. To see this, we notice that in such a case the second term in Eq. (27) is also conservative. In fact, we can write: where n_p indexes the regulated conductors, V_np is the voltage of conductor n_p, and L_np(q) is the total charge of the free conductors in the connected component to which conductor n_p belongs. Here the vectors l_np can be constructed as the reduction to the space of free conductors of the left eigenvectors of the incidence matrix ∆ of the full circuit, including the regulated conductors. We see then that in this case the energy change during a transition can be expressed as the change in a state function Ψ(q): with Thus, it follows that for isothermal conditions there exists an equilibrium state satisfying the global detailed balance conditions of Eq. (18). It is given by: where the index ν_c runs over the closed connected components of the circuit, L_νc(q) is the total charge of component ν_c as defined by Eq. (8), and q^(i) is the initial state. As with Eq. (19), the partition function Z is such that P_eq(q, t) is normalized and thus depends on the conserved quantities {L_νc(q^(i))}.

E. Fundamental non-equilibrium forces
Based on the previous discussion, we can now split the work performed by the sources into conservative and non-conservative contributions. For this we first split the set of regulated conductors into two categories. For each of the open connected components in the full circuit we arbitrarily select one of its regulated conductors. The conductors selected in this way are indexed by n p = 1, · · · , N p . As will be clear later, the subindex p stands for 'potential'. The total number N p of 'potential' conductors can be easily seen to match the number of broken conservation laws as defined in section II A. The rest of the regulated conductors are indexed by n f = 1, · · · , N f . In this case the subindex f stands for 'force', and N f = N − N p equals the number of emergent cycles. In this way, to each transition ρ involving a regulated conductor we can assign: (i) the voltage V nr(ρ) of the regulated conductor involved in that transition, and (ii) a reference voltage V np(ρ) , that is the voltage of the regulated conductor n p that was selected as 'potential' in the corresponding open connected component. Thus, the energy change during a transition ρ involving a regulated conductor can be rewritten as: where L np (q) is the total charge on the free conductors in the open connected component of conductor n p , as defined in Eq. (76). Thus, the heat rate of device ρ, Eq.
, can also be expressed as: where Ψ is the potential defined in Eq. (78). To each device ρ we can assign a voltage difference ∆V_ρ = −(∆^r_ρ)_{n_r(ρ)} V_{n_r(ρ)} − V_{n_p(ρ)}. The minus sign in this definition was introduced in order to make ∆V_ρ positive whenever the reference voltage is the lowest one in each connected component and the forward direction of device ρ is the one leaving the regulated conductor. For an 'internal' device not connected to any regulated conductor, or for a device connected to a regulated conductor at the reference voltage, ∆V_ρ = 0. Then, we see that at most N_f voltage differences ∆V_ρ can be different from zero. They are considered elements of a set {∆V_{n_f}}, n_f = 1, ..., N_f, of fundamental voltage differences, or non-equilibrium forces. Therefore, the total heat rate can be written as: where ∆V_{n_f} is one of the N_f fundamental non-equilibrium forces or voltage differences, and I_{n_f} its associated electric current.

F. Minimal decomposition of the isothermal entropy production
For isothermal settings, the entropy production can be decomposed in a similar way as in section V C 1, this time in terms of the grand potential and the fundamental forces ∆V n f . To see this, we start by computing the change in the potential Ψ: As before, we define a rate of work associated to the external control of the system: Note that this rate of work only coincides with Ẇ Φ in Eq. (63) if the voltages of the regulated conductors are time independent. Now, combining the last two equations with Eq. (82) and recalling that for isothermal settings we have T Σ e = − Q , we obtain: that leads to the following expression for the irreversible entropy production: where is naturally defined as the work rate associated to the fundamental voltage difference ∆V n f . We see that if the system is not driven ( Ẇ Ψ = 0) and there are no fundamental forces ∆V n f , then d t Ω = −TΣ ≤ 0. Also, from the fact that the capacitance matrix C is positive definite, it follows that the thermodynamic potential Ω is bounded from below. Thus, when Ẇ Ψ = Ẇ n f = 0, Ω is a Lyapunov function that reaches a minimum at equilibrium.

G. Non-equilibrium Landauer principle
Let us consider a transformation between two arbitrary, possibly non-equilibrium, states P^(i)(q) and P^(f)(q). This transformation is driven by changing in time the parameters of the circuit (for example the elements of the capacitance matrix or the properties of some of the devices), and/or by modifying the voltages of the regulated conductors. This can induce a parametric driving of the potentials Φ and Ψ (and thus also of the free energies F and Ω), as well as a change in the non-equilibrium forces ∆V_{n_f} to which the system is subjected. In the following we will consider the time-dependent equilibrium state P_eq(q, t), which is just the equilibrium state of Eq. (79) corresponding to the parameters of the system at time t, and which will serve as a reference state. Also, given an arbitrary state P(q, t) (compatible with the conserved quantities L_νc(q^(i))), we introduce its relative entropy with respect to the equilibrium state P_eq(q, t):

I(t) = D(P‖P_eq) = Σ_q P(q, t) log(P(q, t)/P_eq(q, t)), (89)

which in simple terms measures how much information should be provided in order to identify the state P(q, t) starting from P_eq(q, t). It vanishes if and only if P(q, t) = P_eq(q, t), and is positive otherwise. By employing the explicit form of the equilibrium state P_eq(q, t) it is easy to see that the relative entropy can be computed as a difference between average free energies: where ⟨Ω⟩(t) = Σ_q P(q, t)Ω(q, t) is the non-equilibrium free energy and Ω_eq = Σ_q P_eq(q, t)Ω(q, t) = −k_b T log(Z(t)) is the equilibrium one. Using Eq. (90), we can rewrite Eq. (87) as: Integrating this relation over time and using that Σ = ∫ Σ̇ dt ≥ 0, we obtain Thus, the previous expression provides a bound for the amount of work necessary to perform (or that can be extracted during) a transformation between arbitrary states.
Importantly, this bound explicitly takes into account the 'information content' of the initial and final states with respect to the 'uninformative' equilibrium, and it can be considered a general version of the Landauer principle [100]. The connection to the notion of logical or computational information is achieved by splitting the state space {q} into different logical subspaces. This was done in [61], where a bound equivalent to Eq. (92) was derived. The Landauer principle is commonly discussed in terms of physical memories that represent logical values as quasi-equilibrium metastable states, of which the proper thermal equilibrium is a mixture [61,[101][102][103]. However, it is important to notice that in some relevant kinds of electronic memories logical values are represented by non-equilibrium steady states (NESSs) that continuously produce entropy (for example, SRAM cells, or the probabilistic bit discussed in section VII B). Although the Landauer principle can anyway be applied to those cases, the bound obtained from the right-hand side of Eq. (92) does not take into account this continuous entropy production (or "housekeeping heat" [104]), or any other additional dissipation due to restrictions on the control parameters of the system [105]. Refinements of the Landauer principle based on lower bounds for the entropy production in non-adiabatic transformations can be obtained [66][67][68][69][70][71], but to the best of our knowledge the physics of computation with NESSs remains poorly explored.
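The relation of Eq. (90) between free energy differences and relative entropy is a purely algebraic consequence of the Gibbs form of P_eq, and can be checked on any toy state space. In the sketch below, the energies and the non-equilibrium distribution are arbitrary assumptions (dimensionless units k_b = T = 1):

```python
import math

# Toy discrete state space: energies e.g. q^2/(2C) for q = 0, 1, 2, 3 with C = 1
E = [0.0, 0.5, 2.0, 4.5]
Z = sum(math.exp(-e) for e in E)
Peq = [math.exp(-e) / Z for e in E]   # Gibbs equilibrium distribution
F_eq = -math.log(Z)                   # equilibrium free energy, -k_b T log Z

P = [0.4, 0.3, 0.2, 0.1]              # an arbitrary non-equilibrium distribution

# Non-equilibrium free energy <E> - T S, with S the Shannon entropy of P
F = sum(p * e for p, e in zip(P, E)) + sum(p * math.log(p) for p in P)
# Relative entropy D(P || P_eq)
I_rel = sum(p * math.log(p / peq) for p, peq in zip(P, Peq))
# Identity: F - F_eq = k_b T * D(P || P_eq) >= 0
```

Since the relative entropy is non-negative and vanishes only at equilibrium, the non-equilibrium free energy always exceeds the equilibrium one, which is what gives the Landauer-type bound its information-theoretic content.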

VI. STOCHASTIC TRAJECTORIES AND FLUCTUATION THEOREMS
In the previous sections we have studied the average or expected values of relevant quantities like the energy, entropy or work. In this section we turn to a lower-level description based on single trajectories in the state space of the circuit, which will allow us to formulate different fluctuation theorems. We closely follow the treatment of [33] for chemical reaction networks and of [28] for general Markov chains. Here we only present the main results and the necessary definitions. Additional details about the derivations can be found in the supplementary material.
We define a trajectory Q_t as a particular realization of the stochastic dynamics, from some initial time τ = 0 up to time τ = t. Thus, a particular trajectory is fully characterized by its initial state q_0, the set of transitions {ρ_l} that took place up to time t, and the times {t_l} at which they occurred. The index l takes the values l = 1, ..., N_t, where N_t is the number of transitions up to time t. All this information can be encoded in the trajectory probability current, a sum of delta contributions at the jump times, where q_t denotes the state immediately before instant t. Different trajectories will occur with different probabilities. If the evolution of the system is well described by the master equation of Eq. (12), then the probability density P[Q_t] of observing trajectory Q_t given that the initial state is q_0 satisfies: where we have defined t_{N_t+1} = t. The factors in the first product account for the probabilities of not having any transition during the periods [t_l, t_{l+1}), while the factors in the second product are proportional to the probabilities of each of the jumps taking place. If we average Eq. (93) over all trajectories, then we recover the probability currents of Eq. (13). The average quantities defined in the previous sections can be easily extended to individual trajectories. For example, the instantaneous electric current I_ρ(t) and heat rate Q̇_ρ(t) of device ρ are the stochastic versions of the average quantities in Eqs. (81) and (57), respectively, and are obtained by simply replacing the average current vector J(q, t) by the stochastic one, j(q, t), which is a vector function with components {j_ρ(q, t)}, ρ = ±1, ..., ±M. In these expressions ∆V_{n_f}(t) is one of the fundamental non-equilibrium forces defined in Section V E. The net change of a state function f(q, t) during a trajectory can be expressed in terms of the currents j_ρ(q, t) in the following way: Applying this expression to the potential Ψ(q, t) defined in Eq.
(78) we arrive at the following energy balance for a given trajectory: where W_Ψ is the external driving work performed during the trajectory, Q_ρ is the heat corresponding to device ρ, and W_{n_f} is the work performed by the fundamental non-equilibrium force ∆V_{n_f} (I_{n_f} is its associated electric current).
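Trajectories distributed according to this path probability can be sampled with the standard Gillespie algorithm: draw an exponential waiting time from the total escape rate, then select a jump with probability proportional to its rate. The sketch below does this for the closed diode-capacitor circuit of section IV A (dimensionless units, midpoint-rule rates, assumed value of x) and checks the sampled stationary statistics against the Gibbs equilibrium:

```python
import math
import random

random.seed(1)
x = 0.5   # q_e^2/(C k_b T), assumed value

def rates(nq):
    """(rate for nq -> nq-1, rate for nq -> nq+1): midpoint rule, ideal diode."""
    return math.exp((nq - 0.5) * x), 1.0

nq, t, t_end = 0, 0.0, 50000.0
acc_q = acc_q2 = 0.0
while True:
    r_down, r_up = rates(nq)
    r_tot = r_down + r_up
    dt = -math.log(random.random()) / r_tot     # exponential waiting time
    if t + dt >= t_end:
        acc_q += nq * (t_end - t)               # close the last interval
        acc_q2 += nq * nq * (t_end - t)
        break
    acc_q += nq * dt                            # time-weighted moments
    acc_q2 += nq * nq * dt
    t += dt
    nq += -1 if random.random() < r_down / r_tot else +1

mean_q = acc_q / t_end
var_q = acc_q2 / t_end - mean_q ** 2
# At equilibrium <q> = 0 and Var(q) matches the discrete Gibbs distribution
# (approximately 1/x for this value of x).
```

Averaging trajectory functionals such as heat or entropy production over an ensemble of such runs reproduces the master-equation averages of the previous sections.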

A. Stochastic entropy and the integral fluctuation theorem
As mentioned before, it is possible to define the entropy of a given state during a trajectory in the following way [106]:

S(t) = −k_b log P(q_t, t),

where P(q, t) is the solution of the master equation in Eq. (12) for a given initial distribution P(q, 0). The entropy flow, i.e., the production of entropy in the environment during a given trajectory, is Σ_e = −Σ_ρ Q_ρ/T_ρ. Then, the total entropy production during the trajectory is:

Σ = ∆S + Σ_e,

where ∆S = S(t) − S(0). It can be verified that the time derivatives of the averages ⟨Σ_e⟩ and ⟨Σ⟩ over all trajectories match the entropy production rates defined in Eqs. (70) and (71). Unlike its average ⟨Σ⟩, the entropy production Σ of a given trajectory is not always positive. However, the fluctuations of Σ are bound to satisfy a general integral fluctuation theorem (IFT). This fundamental result is expressed as the following equality:

⟨e^{−Σ/k_b}⟩ = 1, (105)

where the average is taken over all trajectories Q_t. In simple terms, this equality states that positive values of the full entropy production are more probable than negative ones. Accordingly, from this result and Jensen's inequality the usual statement of the second law follows: ⟨Σ⟩ ≥ 0. Eq. (105) is valid for transient or steady-state dynamics, in autonomous or time-dependent circuits.
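The IFT can be verified by direct sampling in the simplest non-equilibrium situation: a device held at a fixed voltage bias (the picture of Figure 2), for which the state does not change, ∆S = 0, and Σ/k_b = (N_+ − N_−)V/V_T, with N_± independent Poisson-distributed jump counts. A sketch with an assumed bias v = V/V_T = 0.5:

```python
import math
import random

random.seed(7)

def poisson_sample(lam):
    """Knuth's algorithm for Poisson-distributed jump counts."""
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

v = 0.5                              # V/V_T, the fixed bias (assumed value)
lam_minus = 1.0                      # backward jump rate (sets the time scale)
lam_plus = lam_minus * math.exp(v)   # forward rate fixed by local detailed balance
t, samples = 1.0, 200000

acc = 0.0
for _ in range(samples):
    n_plus = poisson_sample(lam_plus * t)
    n_minus = poisson_sample(lam_minus * t)
    sigma = (n_plus - n_minus) * v   # entropy production in units of k_b
    acc += math.exp(-sigma)
ift = acc / samples
# Individual trajectories can have sigma < 0, but the exponential average is 1.
```

In this case the equality ⟨e^{−Σ/k_b}⟩ = 1 can also be checked analytically from the Poisson generating functions, so the Monte Carlo estimate converges to 1 for any t and any bias.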

B. Detailed fluctuation theorems
Equation (105) is only one of several fluctuation theorems. Under certain conditions, quantities other than the full entropy production satisfy more stringent constraints. For example, in isothermal settings where β_ρ = (k_b T)^{−1} for all ρ, it is possible to obtain the following detailed fluctuation theorem (DFT):

P({W_{n_f}}, W_Ψ) / P†({−W_{n_f}}, −W_Ψ) = e^{β(W_Ψ + Σ_{n_f} W_{n_f} − ∆Ω_eq)}, (106)

where ∆Ω_eq is the difference between the final and initial equilibrium free energies. In this expression, P({W_{n_f}}, W_Ψ) is the probability to observe the values {W_{n_f}} of work for each of the fundamental forces and W_Ψ for the driving work during a forward protocol. This protocol consists of the initialization of the system state at t = 0 according to the equilibrium state of Eq. (79), and its subsequent evolution according to the transition rates λ(q, τ) up to time t (the explicit dependence of the rates on τ takes into account a possible external manipulation of the circuit parameters, leading to an inhomogeneous time evolution). Analogously, the quantity P†({−W_{n_f}}, −W_Ψ) is the probability to observe the work values {−W_{n_f}} and −W_Ψ during the corresponding backward protocol, in which the system is initialized in a state drawn from the equilibrium distribution of Eq. (79) (this time corresponding to the circuit parameters at time t), and evolves according to the rates λ(q, t − τ).
The previous result holds for arbitrary times t, under the condition that the initial state is an equilibrium one. A complementary relation can be obtained, which is only valid asymptotically (that is, for long times), but irrespective of the initial distribution (under the additional assumption that the circuit state is bounded to a finite region of the state space). It reads:

lim_{t→∞} (1/t) log [ P({W̄_{n_f}}, W̄_Ψ) / P†({−W̄_{n_f}}, −W̄_Ψ) ] = β (W̄_Ψ + Σ_{n_f} W̄_{n_f}), (107)

where we have defined the average work rates W̄_Ψ = t^{−1} W_Ψ and W̄_{n_f} = t^{−1} W_{n_f}. Note that in the case in which the voltage differences ∆V_{n_f} are constant we have W̄_{n_f} = Ī_{n_f} ∆V_{n_f}, where Ī_{n_f} = t^{−1} ∫_0^t dτ I_{n_f}(τ) are the average associated currents during the trajectory, and therefore Eq. (107) can be easily expressed in terms of the probabilities P({Ī_{n_f}}, W̄_Ψ). If there is no external driving of the circuit parameters, then P(·) = P†(·). The fluctuation theorems in Eqs. (106) and (107) express fundamental symmetries of energy exchange processes. For example, the recently developed thermodynamic uncertainty relations [107], as well as the usual Onsager relations (and their non-linear extension [56]), can be recovered from them. For completeness, a general proof of these and other fluctuation theorems is given in the supplementary material.

A. The CMOS inverter
The inverter or NOT gate is the most elementary logic gate. It has a single logical input, which is negated in its only output. A diagram of a possible implementation of this gate with MOS transistors is shown in Figure 7-(a). It is composed of one pMOS (top) and one nMOS (bottom) transistor, with common drain and gate terminals. The device is powered by applying a voltage difference V_dd − V_ss between the source terminals. When the input voltage is V_in < (V_dd + V_ss)/2, conduction in the nMOS transistor is suppressed while it is enhanced in the pMOS transistor, and therefore the output voltage V_out rapidly approaches V_dd. The situation is reversed for V_in > (V_dd + V_ss)/2, as shown in Figure 7-(b). Now we explain how to build a stochastic model of the inverter within our formalism. The first step is to model the MOS transistor as an externally controlled conduction channel, with associated capacitances. For example, the nMOS transistor at the bottom of the diagram of Figure 7-(a) can be represented as in Figure 8, i.e., as an externally controlled conduction channel between source and drain. The Poisson rates λ_n± associated to that channel are constructed as explained in section IV C, and will depend on the gate-to-source voltage as well as on the drain-to-source voltage.

FIG. 8: Model of an nMOS transistor as an externally controlled conduction channel between source and drain, with associated Poisson rates λ_n±. The gate-body interface is represented as a capacitor C_g, and another capacitor C_o takes into account the output capacitance. This is just a minimal model and does not pretend to be realistic. Other parasitic capacitances could also be taken into account, for example between drain and gate, but a proper description of them must take into account the physical dimensions of the device.

The gate-body interface is modeled as a capacitor
of capacitance C_g, and another capacitor C_o takes into account the output capacitance of the transistor. Using this mapping, we can model the full inverter with the diagram of Figure 9. In turn, this diagram corresponds to a set of four conductors, three of which are regulated by voltage sources, as shown in Figure 10. The relation between the charges and voltages in this system is linear, with q_out the charge of the only free conductor (the output of the gate), to which we will refer simply as q in the following. By comparison with Eq. (22) we can extract the blocks C, C_m and C_r of the capacitance matrix. Using them and Eq. (24) we obtain the internal energy of the circuit as a function of the only degree of freedom q, with ∆V = V_dd − V_ss. In the same way, from Eq. (28), we can obtain the potential, and by taking its gradient we obtain the output voltage as a function of q. Also, by selecting V_ss as the reference voltage to construct the potential Ψ (Eq. (78)), we find that ∆V as defined above is the only non-equilibrium force (with I_p, the current through the pMOS transistor, as the associated current). According to Eq. (98), the energy balance for this circuit at the trajectory level involves the heat currents Q̇_n and Q̇_p associated to each of the transistors. The irreversible entropy production is given by Eq. (87), which in this case involves the temperature T of both transistors and the potential Ω = Ψ − T S (we assume time-independent voltages, so the driving contribution of Eq. (85) is not present). For steady-state conditions (d_t⟨Ψ⟩ = d_t⟨S⟩ = 0), the entropy production ⟨Σ̇⟩ reduces to the entropy flow ⟨Σ̇_e⟩ and we recover the usual expression ⟨Σ̇⟩ = ⟨I_p⟩ ∆V/T. We now build the transition rates associated to both transistors according to the procedure of Section IV C. We begin with the nMOS transistor.
The voltage difference between drain and source is ∆V_DS = V_out − V_ss, and thus its average during the transition q → q ± q_e, with rates λ_n∓(q), is V_DS = (q ± q_e/2)/(2C_o) + ∆V/2. For the pMOS transistor, the voltage difference between source and drain is ∆V_SD = V_dd − V_out (recall that for pMOS transistors the references for voltages and currents are reversed), and its average for the same transitions, this time with rates λ_p±(q), is V_SD = −(q ± q_e/2)/(2C_o) + ∆V/2. Then, via the procedure of Section IV C and the fixed-voltage rates of Eq. (44), we obtain the transition rates

λ_n+(q) = (I_0/q_e) e^((V_in−V_ss−V_th)/(nV_T)),
λ_n−(q) = λ_n+(q) e^(−((q+q_e/2)/(2C_o)+∆V/2)/V_T)

for the nMOS transistor, and

λ_p+(q) = (I_0/q_e) e^((V_dd−V_in−V_th)/(nV_T)),
λ_p−(q) = λ_p+(q) e^(−(−(q−q_e/2)/(2C_o)+∆V/2)/V_T)

for the pMOS. Thus, the master equation for the distribution P(q, t) follows by balancing these jump rates. The master equation can be employed, for example, to find the steady state for given voltages V_in, V_dd and V_ss. As shown in the supplementary material, this can be done analytically, and the steady state is uniquely determined by the following recurrence relation:

(α_n + α_p γ e^(q/q_T)) P_st(q) = (α_p + α_n γ e^(−(q−q_e)/q_T)) P_st(q − q_e),

where we have defined the constants

α_n = e^((V_in−V_ss)/(nV_T)), α_p = e^((V_dd−V_in)/(nV_T)), γ = e^(−(∆V/2+q_e/(4C_o))/V_T), q_T = 2C_o V_T.

The constants a and b are such that

⟨e^(±(q−⟨q⟩_st)/q_T)⟩_st = e^(q_e/(2q_T)) e^(a±b),

where the mean value is taken in the stationary state given by Eq. (119) (see supplementary material). Thus, a and b quantify the fluctuations of the output charge around the mean value. They are defined so that if the stationary state were a thermal equilibrium state (as it is for V_dd = V_ss), then a = b = 0. The constant a is a measure of how the even moments of P_st(q) around the mean value deviate from those corresponding to equilibrium, while b is the same for the odd moments. Given a and b, the mean output charge is ⟨q⟩_st = q_T log(x), with x the positive root of a quadratic equation (see supplementary material). This constitutes an exact stochastic generalization of the deterministic result that, to the best of our knowledge, was not obtained before. In Figure 11-(a) we show the probability distribution of the output charge for different values of the power and input voltages.
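For concreteness, the four transition rates of the inverter can be sketched as follows. This is a minimal illustration only; the device parameters (I_0, V_th, C_o) are hypothetical values chosen for the example, not values from the text:

```python
import math

# Hypothetical device parameters, for illustration only
q_e = 1.602e-19   # elementary charge [C]
V_T = 0.026       # thermal voltage at room temperature [V]
n   = 1.0         # slope factor
I0  = 1e-7        # specific current [A] (hypothetical)
Vth = 0.3         # threshold voltage [V] (hypothetical)
Co  = 1e-17       # output capacitance [F] (hypothetical)

def rates(q, V_in, V_dd, V_ss):
    """Poisson rates of the four charge transitions of the inverter.

    Returns (lam_n_plus, lam_n_minus, lam_p_plus, lam_p_minus):
    lam_n_plus takes q -> q - q_e (nMOS forward), lam_n_minus takes
    q -> q + q_e, lam_p_plus takes q -> q + q_e (pMOS forward), and
    lam_p_minus takes q -> q - q_e.
    """
    dV = V_dd - V_ss
    ln_p = (I0 / q_e) * math.exp((V_in - V_ss - Vth) / (n * V_T))
    ln_m = ln_p * math.exp(-((q + q_e / 2) / (2 * Co) + dV / 2) / V_T)
    lp_p = (I0 / q_e) * math.exp((V_dd - V_in - Vth) / (n * V_T))
    lp_m = lp_p * math.exp(((q - q_e / 2) / (2 * Co) - dV / 2) / V_T)
    return ln_p, ln_m, lp_p, lp_m
```

Note that the forward rates are independent of q, while the reverse rates carry the exponential dependence on the instantaneous output charge, which is what enforces local detailed balance.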
When there is no voltage bias applied to the gate (V_dd = −V_ss = 0), the distribution is just the equilibrium one. When a bias is applied but there is no input voltage (V_dd = −V_ss = 5V_T and V_in = 0), the distribution is stretched out and ceases to be Gaussian. The application of a small input voltage tilts this distribution to one side, and a further increase of the input voltage generates an approximately Gaussian peak centered around the value corresponding to the deterministic solution. We see that the distribution of the output charge is in general asymmetric with respect to the deterministic value. Thus, its mean value and the deterministic one will differ. This is further evidenced in Figure 11-(b), where we plot the parameters a and b as a function of the input voltage. We see that the largest deviations from equilibrium occur around zero input voltage, when the two transistors are equally activated, while the deviations rapidly decrease as one of the transistors becomes more activated than the other. What happens here is that for large positive (negative) V_in the output conductor is approximately at equilibrium with the source V_ss (V_dd). Consequently, the current through the device (and therefore the entropy production) follows a similar pattern. Finally, in Figure 11-(c) we show the deviations of the actual transfer function of the inverter from the deterministic one, caused by the non-equilibrium fluctuations.
We now turn to the analysis of the current fluctuations. For this we employ the method of full counting statistics [41,45], which we review in the supplementary material. This method allows one to evaluate the characteristic function associated to the current fluctuations in terms of the generator of the master equation. Then, the characteristic function can be inverted to obtain the probability distribution. We consider the number N_t of charges that went through the pMOS transistor during a time t, starting from the stationary distribution. In Figure 12-(a) we show the probability distribution of N_t for different input voltages, with t = 10⁻¹ t_0, where t_0 = (q_e/I_0) exp(V_th/(nV_T)) is the natural time scale for this problem. Also, in Figure 12-(b) we illustrate the DFT of Eq. (106), which in this case reads

P(N_t)/P(−N_t) = e^(q_e ∆V N_t/(k_b T)),

assuming an initial state P_eq(q) ∝ e^(−βΨ(q)), with Ψ(q) given by Eq. (112). From Figures 11-(a) and 12-(a) we see that, except when the number of charges is too low (as in the bottom panel of Figure 12-(a)), the deterministic solution matches the most probable result of the stochastic treatment. This is analogous to what has been formally shown in the case of chemical reaction networks using large deviation theory [88]. These results and methods set the stage for more interesting problems, since the NOT gate is a basic primitive in electronic design, in terms of which more complex devices can be built. For example, connecting the output of the gate back to its input through some conduction channel, we can generate self-sustained oscillations. Connecting two NOT gates in a loop, we obtain a bistable system with two metastable NESSs, which is the basis of many designs of electronic memories, and also of the next example. More complex logic gates can be modeled in the same way.
We have only analyzed the stationary distribution of the inverter for a given input voltage, although in a real application the energetic cost of switching the inputs is a significant contribution to the total entropy production. This cost can be analyzed by letting the input change in time in a predefined way. Alternatively, it can also be studied in autonomous circuits (i.e., not requiring time dependent external driving) displaying bistability or limit cycles, as is done in the next section.

B. A full-CMOS probabilistic bit
A probabilistic bit (p-bit), or binary stochastic neuron, is a device with a single output b that can ideally take only two values, let us say 1 and −1. It outputs the value 1 with probability p, and −1 with probability 1 − p. The value of p is controlled by an input I. For large positive values of I, p → 1, while for large negative values p → 0. In a collection of N of these elementary devices, the inputs {I_i}_{i=1,...,N} could be adjusted as a function of the instantaneous state B = (b_1, b_2, ..., b_N), and in this way correlations between different p-bits can be established [14]. For example, given a cost function E(B), it is possible to derive functional relations I_i(B) such that the state B occurs with probability P(B) ∝ e^(−E(B)). The function E can then be chosen so that its minimum (the most probable state) encodes the solution to some problem of interest [108]. Alternatively, the function E(B) can be "learned" in order for P(B) to approximate the distribution of a given data set [77].
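As an illustration of how a network of ideal p-bits can sample P(B) ∝ e^(−E(B)), the following minimal sketch uses an Ising-type cost E(B) = −Σ_{i<j} J_ij b_i b_j − Σ_i h_i b_i and the corresponding sigmoidal relation between the input I_i(B) and the switching probability. This is a generic textbook choice for illustration, not the circuit model developed below:

```python
import math
import random

def sample_pbits(J, h, steps=20000, beta=1.0, seed=0):
    """Gibbs-style sampling of N ideal p-bits with Ising-type cost
    E(B) = -sum_{i<j} J_ij b_i b_j - sum_i h_i b_i  (J symmetric, J_ii = 0).
    At each step one p-bit is updated: it outputs +1 with sigmoidal
    probability of its instantaneous input I_i(B)."""
    rng = random.Random(seed)
    N = len(h)
    b = [rng.choice((-1, 1)) for _ in range(N)]
    counts = {}
    for _ in range(steps):
        i = rng.randrange(N)
        # input of p-bit i given the state of the others
        I = h[i] + sum(J[i][j] * b[j] for j in range(N) if j != i)
        p = 1.0 / (1.0 + math.exp(-2.0 * beta * I))   # P(b_i = +1 | rest)
        b[i] = 1 if rng.random() < p else -1
        key = tuple(b)
        counts[key] = counts.get(key, 0) + 1
    return counts
```

With a ferromagnetic coupling J_12 > 0, the two aligned states (1, 1) and (−1, −1) are visited far more often than the anti-aligned ones, as expected from P(B) ∝ e^(−E(B)).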
There have been recent proposals to implement p-bits with noisy electronic circuits. The most relevant one employs magnetic tunnel junctions (MTJs), a technology already used in some commercial memories [109], modified on purpose so that they are sensitive to thermal noise (by lowering the energy barrier separating the two possible states representing one bit) [13,14]. A previous proposal considers a "probabilistic switch" [11,110]: a regular CMOS inverter which is driven by external noise at its input, so that its output fluctuates between the two possible values. The advantage of this second proposal is that it is based on CMOS circuits only (not on less common devices like MTJs). However, its main drawback is that the intrinsic noise is actually neglected, instead of being exploited as a resource. The reason is that the description of the CMOS inverter is purely deterministic, and therefore access to an external source of noise is assumed. Also, the noise is considered to be Gaussian, which as we have seen in the last section is not always the case. These limitations are of course related to the difficulty of describing intrinsic noise in non-linear electronic circuits, as discussed in Section I. As we will see next, our formalism allows us to overcome those limitations, which highlights its practical value.
We propose a full-CMOS design for a p-bit that is self-sufficient: its stochastic behavior is due to intrinsic thermal noise, so no external source of noise is necessary. Also, while in p-bits based on MTJs the transition rate or error probability is fixed by the fabrication process (for a given temperature), our design allows one to control this parameter on the fly by just changing the power voltage. The basic circuit is shown in Figure 13 and is composed of two coupled NOT gates, as in regular SRAM cells. The logical circuit of Figure 13-(a) has two stable states: b = 1 and b̄ = −1, or b = −1 and b̄ = 1. The corresponding CMOS implementation of Figure 13-(b) has two degrees of freedom: the voltages v_1 and v_2 (or alternatively, the charges q_1 and q_2) at the output of each inverter. We consider V_out = v_1 to be the output used to monitor the state of the bit. If the powering voltage is above a critical value V*_dd (which for n = 1 can be found to be V*_dd = V_T ln(2) [99]), then the deterministic equations for the circuit have two possible steady solutions, which correspond to the two stable logical states of Figure 13-(a). At the stochastic level, they correspond to two metastable NESSs, for which V_out ≃ V_dd or V_out ≃ −V_dd, respectively. A stochastic model for the circuit of Figure 13-(b) can be built as before, by employing the mapping of Figure 8 and constructing the rates associated to each transistor with the procedure of Section IV C. In this case this is done automatically by a custom software package, which is also able to deal with general circuits [111]. The steady-state distribution can be obtained by constructing the generator of the master equation in Eq. (12) (truncated to some maximum number of charges), and computing its eigenvector of zero eigenvalue, as shown in Figure 14-(a). Also, the corresponding stochastic dynamics can be simulated with the Gillespie algorithm. In this way we can generate stochastic trajectories.
For example, in Figure 14-(b) we show two trajectories for different values of the power voltage. To obtain those results we considered the following parameters: V_T = 26 mV (room temperature), C_g = 50 aF and C_o = 10⁻² C_g. Crucially, these values of the capacitances are compatible with what is achieved in modern sub-7nm fabrication processes [97]. Also, for simplicity we took n = 1, and as before the parameters I_0 and V_th of the transistors just fix the time scale t_0 = (q_e/I_0) exp(V_th/(nV_T)). We clearly observe random transitions, or errors, between the two metastable NESSs, with a transition or error rate that depends on the power voltage V_dd. This is easily understood: frequent random transitions are expected whenever the standard deviation of the fluctuations around the output voltage, which can be estimated as σ_V = sqrt(k_b T/(2(C_o + C_g))), is comparable to the mean values V_out ≃ ±V_dd. Thus, one can control the transition rate by changing the powering voltage V_dd or, for fixed V_dd > V*_dd, by changing the temperature and/or the size of the transistors (which modifies the capacitances C_g and C_o). Indeed, to a very good approximation, the waiting time τ between transitions is exponentially distributed, P(τ) = λe^(−λτ), and the transition rate λ can be seen to scale as λ ∝ exp(−2(V_dd/σ_V)²/(n + 2)) to dominant order in V_dd/σ_V ≫ 1 [99]. Note that for the previous parameters V*_dd/σ_V is of order one at room temperature, but that it increases as the square root of the size of the transistors. Therefore, the transition rate λ decreases exponentially with the size of the transistors. As a consequence, the exploitation of these naturally occurring fluctuations as a resource at room temperature is only a real possibility for highly scaled, state-of-the-art fabrication processes, such as the ones considered above.
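As an illustration of the Gillespie approach, the following minimal sketch simulates the single inverter's output charge in dimensionless units (time in units of t_0, charge q = k q_e); the two-inverter p-bit is simulated analogously, with two coupled charges. The constants α_n, α_p, γ and ρ = q_e/q_T are those defined in the supplementary material; the numerical values used below are illustrative only:

```python
import math
import random

def gillespie_inverter(alpha_n, alpha_p, gamma, rho, t_max, seed=0):
    """Gillespie simulation of the inverter output charge q = k*q_e.
    Time is in units of t0 = (q_e/I0) exp(Vth/(n*V_T)); rho = q_e/q_T
    with q_T = 2*Co*V_T. The total jump rates follow the constants
    alpha_n, alpha_p, gamma of the master equation."""
    rng = random.Random(seed)
    t, k = 0.0, 0
    traj = [(t, k)]
    while t < t_max:
        r_down = alpha_n + alpha_p * gamma * math.exp(k * rho)   # q -> q - q_e
        r_up = alpha_p + alpha_n * gamma * math.exp(-k * rho)    # q -> q + q_e
        total = r_down + r_up
        t += rng.expovariate(total)               # exponential waiting time
        k += -1 if rng.random() < r_down / total else 1
        traj.append((t, k))
    return traj
```

For a symmetric input (α_n = α_p), the trajectory fluctuates around k = 0, with excursions set by the effective temperature of the output node.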
The error rate λ can be computed from the trajectories generated by the Gillespie algorithm, or also by spectral methods as explained in [99]. The results are shown in Figure 14-(c), for two different scales.
To complete the construction of the p-bit it is necessary to provide a mechanism to bias its output. There are different ways to achieve this, and here we will focus on the circuit of Figure 13-(c), where the biasing circuit is colored. It works as follows. A bias voltage V b > 0 is coupled to the outputs of the two inverters which form the core of the p-bit through the drain-source channel of two transistors. The transistor influencing the output of the first inverter is an nMOS, while the one influencing the output of the second inverter is a pMOS.
Both transistors have their bodies grounded, such that their activation depends only on the gate-body voltage V_in (the Poisson rates corresponding to this configuration are discussed in the supplementary material). For V_in = 0, both transistors are equally activated and the outputs of both inverters are very weakly biased towards V_b. For V_in > 0, conduction through the nMOS is enhanced, while it is suppressed for the pMOS, and therefore only the output of the first inverter is biased towards V_b. In that case the symmetry between the two possible metastable NESSs (V_out ≃ V_dd or V_out ≃ −V_dd) is broken in favour of the one with V_out ≃ V_dd. The situation is reversed for V_in < 0. In Figure 14-(d) we show two sample trajectories of the output voltage V_out for a positive and a negative value of the input voltage V_in. We see that V_out is indeed biased and spends more time around positive or negative values, respectively. The parameters of the transistors are the same as before, with the exception that the specific current I'_0 of the transistors in the biasing circuit is one order of magnitude lower than that of the others (I'_0 = I_0/10). Also, we considered a bias voltage V_b = V_T. In Figure 14-(e) we show how the probability p = P(V_out > 0) of the output being positive depends on the input voltage. We see that p is indeed given by a sigmoidal function of V_in, similar to the typical activation functions considered in artificial neural networks. Note however that, due to the asymmetric I-V curves of the transistors in the biasing circuit, the balanced case P(V_out > 0) = 1/2 is not achieved at V_in = 0 but at a slightly positive V_in.
We now analyze the energy consumption of the p-bit. For this we obtain the steady-state distribution as before, and compute the mean values of the current and heat rates associated to each transistor according to Eqs. (57) and (61), respectively. By symmetry, for the circuit constituting the core of the bit in Figure 13-(b), the steady-state electric current through all four transistors is the same, and its dependence on V_dd is shown in Figure 15-(a). We see that it increases monotonically for low powering voltages, and that it develops a peak right after the transition into bistability (at V_dd = V_T ln(2)), after which it settles to a constant value. This maximum in the current close to the transition into bistability is associated to the occurrence of errors, or transitions between the different NESSs: each switching event involves the relaxation of previously stored charge, which adds to the continuous flow of charge in each NESS. This is also seen in Figure 15-(b), where we plot the entropy production rate and its derivative with respect to the powering voltage. Since we are considering steady-state conditions, we have ⟨Σ̇⟩ = ⟨Σ̇_e⟩ = −(⟨Q̇_p1⟩ + ⟨Q̇_n1⟩ + ⟨Q̇_p2⟩ + ⟨Q̇_n2⟩)/T, where ⟨Q̇_(n/p)(1/2)⟩ is the rate of heat dissipation in the n/pMOS transistor of the first or second inverter. Interestingly, we see that the transition to bistability is signaled by a maximum in the derivative of ⟨Σ̇⟩ with respect to V_dd. In Figure 15-(c) we show all the currents for the full circuit of Figure 13-(c), including the bias circuit, for V_b/V_T = 1 and V_in = 0. We see that the behaviour of the currents through the transistors in the core circuit is qualitatively similar to the previous case, although the currents are naturally not all equal anymore. In Figure 15-(d) we show the total entropy production and its derivative for the full circuit.
In this case we split the total entropy production rate as ⟨Σ̇⟩ = ⟨Σ̇_core⟩ + ⟨Σ̇_bias⟩, where ⟨Σ̇_core⟩ = −(⟨Q̇_p1⟩ + ⟨Q̇_n1⟩ + ⟨Q̇_p2⟩ + ⟨Q̇_n2⟩)/T and ⟨Σ̇_bias⟩ = −(⟨Q̇_pb⟩ + ⟨Q̇_nb⟩)/T are the entropy production rates corresponding to the bit core and the biasing circuit, respectively (⟨Q̇_(n/p)b⟩ is the rate of heat dissipation in the nMOS or pMOS biasing transistor). We see that the transition to bistability is still signaled by a maximum in the derivative of ⟨Σ̇⟩, and that the dissipation associated to the biasing circuit is a small fraction of the total one. Finally, the behaviour with respect to the input voltage V_in, for fixed powering voltage V_dd/V_T = 1.15 and bias voltage V_b/V_T = 1, is shown in Figures 15-(e-f). In Figure 15-(f), we see that the entropy production rate has a maximum at zero bias. This is again due to the occurrence of transitions between the different NESSs, which of course become less frequent when the bias increases in either direction.
The average total dissipated heat per generated bit is given by Q̄ = T⟨Σ̇⟩/λ. It is interesting to note that for transitions generated by intrinsic thermal noise, Q̄ decreases as the speed or transition rate increases. For example, by reducing the powering voltage V_dd (always above V*_dd), the total rate of heat dissipation T⟨Σ̇⟩ decreases (Figures 15-(b) and (d)), while the transition rate λ increases exponentially (Figure 14-(c)), and therefore Q̄ decreases exponentially. For V_dd/V_T ≃ 1.1 we have an average dissipated heat per generated bit on the order of Q̄ = T⟨Σ̇⟩/λ ≃ 2 × 10³ k_b T ≃ 10 aJ. This can be compared to the MTJ p-bit of [14], which requires an energy of 2 fJ per random bit, two orders of magnitude higher than the previous estimation.

VIII. DISCUSSION
We have presented a formalism for the construction of stochastic models of non-linear electronic circuits in a thermodynamically consistent way. Devices with arbitrary I-V curves can be described, provided that their current fluctuations display shot noise. A complete analysis of the stochastic thermodynamics of these models was carried out. The relevant thermodynamic potentials were identified, and the different contributions to the irreversible entropy production were characterized. All these quantities were extended to individual trajectories, based on which we presented different detailed and integral fluctuation theorems. As a first application, we constructed a stochastic model of a subthreshold CMOS inverter, or NOT gate. We showed how to analytically find the steady state of the resulting master equation. Based on that solution, we analyzed how the non-equilibrium thermal fluctuations induce modifications in the transfer function of the gate. Also, we showed how to compute the full counting statistics of the current fluctuations and in that way illustrated a detailed fluctuation theorem. Finally, we proposed a full-CMOS design of a probabilistic bit, or binary stochastic neuron, in which intrinsic thermal noise is exploited as a resource to generate random bits of information in a controllable way. The energy consumption of our design is about two orders of magnitude lower than in previous proposals.
Of course, the formalism has some limitations, which are important to discuss here. In the first place, only devices displaying shot noise can be described. This excludes, for example, regular resistors, or MOS transistors in general modes of operation. In addition, it is not possible to describe inertial effects (i.e., inductances). These could be included at the price of mixing discrete and continuous variations of charge. Nevertheless, there is little practical motivation for this, since it is difficult to integrate inductances in nanoscale electronic circuits [112], and therefore inertial effects might only be relevant at extremely high frequencies. There are other possible extensions of the formalism, which are however less pressing, since they capture effects that can actually be emulated with the formalism as presented here. For example, although the stochastic dynamics we have considered is Markovian, non-Markovian effects can be described by considering a given circuit as part of a larger one (a procedure known as 'Markovian embedding'). Finally, although we have not focused on single-electron devices, our formalism can be directly applied to them. In that context, there are "cotunneling" effects (events in which two or more transitions happen at the same time, possibly leaving the state of the circuit unchanged) that can become relevant in the Coulomb blockade regime and that our formalism does not take into account [40].

Our work offers a connection between different subjects and communities. It shows how to employ the methods of stochastic thermodynamics and single-electron devices in order to model other kinds of circuits that are traditionally given a deterministic description, later supplemented by an approximate treatment of the noise. In this way we can describe the fluctuations in those circuits on a rigorous basis.
This is relevant and timely, given the impressive reduction in the size of CMOS circuits, and the need for new energy-efficient computing paradigms. On the other hand, the great versatility in the fabrication and control of electronic circuits makes them an excellent platform to study complex phenomena in statistical physics and non-equilibrium thermodynamics. Thus, our formalism also offers a valuable bridge between theory and experiment. In the future it could be interesting to explore the connection between the low-level description we propose here and the more abstract treatments of the stochastic thermodynamics of complex circuits in Refs. [73,74], in particular under which conditions the fundamental bounds obtained there can be approached within a given technology.
After finishing this work we became aware of a recent article [9], in which a similar stochastic description is employed to compute the error rate of a low-power SRAM memory cell. It should be noted, however, that the transition rates employed in that article are not thermodynamically consistent, i.e., they do not respect the local detailed balance conditions.

I. FLUCTUATION THEOREMS

A. Conservative-Nonconservative decomposition

In the last expression, ∆D is the change in the stochastic relative entropy between the actual distribution and the reference one, D(t) = k_b log(P(q_t, t)/P_ref(q_t, t)). The subindices nc, c, and d stand for 'non-conservative', 'conservative' and 'driving', respectively. The physical meaning of these different contributions to the entropy production will be clarified in the following. Before proceeding it is necessary to update the previous definitions of the forward and backward protocols. In the following, by forward protocol we mean a process that is initiated at time t = 0 by selecting an initial state according to the distribution P_ref(q, 0), and that subsequently evolves according to the rates λ_ρ(q, τ) up to time t. In the backward protocol, the initial state is drawn from the distribution P_ref(q, t), and evolves with rates λ†_ρ(q, τ). Crucially, the time inversion is also applied to the reference distribution, i.e., during the backward trajectory the reference distribution is P†_ref(q, τ) = P_ref(q, t − τ). Under these conditions, it can be seen that the quantities Σ_nc and Σ_d are odd under time inversion: Σ_nc(Q†_t) = −Σ_nc(Q_t) and Σ_d(Q†_t) = −Σ_d(Q_t). Owing to this 'involution' property, as explained in [2], the following detailed fluctuation theorem (DFT) holds:

P(Σ_nc, Σ_d)/P†(−Σ_nc, −Σ_d) = e^((Σ_nc+Σ_d)/k_b).   (10)

Here, P(Σ_nc, Σ_d) is the probability of obtaining the values Σ_nc and Σ_d of the observables Σ_nc(Q_t) and Σ_d(Q_t) during the forward protocol, and P†(Σ_nc, Σ_d) is the same for the backward protocol. Integrating the previous expression over all possible values of Σ_nc and Σ_d we obtain a new IFT:

⟨e^(−(Σ_nc+Σ_d)/k_b)⟩ = 1.

Notice that this is not equivalent to the IFT of Eq. (3). Also, we note that a DFT like Eq.
(10) for the full entropy production Σ does not hold in general, due to the fact that this observable does not satisfy the involution property. We now introduce a particular choice of the reference distribution that naturally leads to the 'conservative' and 'non-conservative' labels introduced before. This choice is given by the equilibrium distribution of Eq. (79) in the main text, at a reference inverse temperature β_ref. In this case the non-conservative and driving contributions can be expressed in terms of the potential Ω defined in Eq. (83) in the main text. Note that the resulting contribution to Σ_d is not fluctuating, i.e., it does not depend on the actual trajectory. We see that if the potential Ψ(q, t) is time independent, then the driving contribution vanishes. In isothermal conditions, where β_ρ = β_ref = (k_b T)⁻¹ for all ρ, since Σ_e = −Σ_ρ Q_ρ/T, the non-conservative contribution simplifies to Σ_nc = Σ_nf W_nf/T. Also, since the individual work contributions satisfy the involution property W_nf(Q_t) = −W_nf(Q†_t), the DFT of Eq. (10) can be generalized to:

P({W_nf}, W_Ψ)/P†({−W_nf}, −W_Ψ) = e^((Σ_nf W_nf + W_Ψ)/(k_b T)),

where P({W_nf}, W_Ψ) is the probability to observe the values {W_nf} of the work for each of the fundamental forces, and W_Ψ of the driving work, during the forward protocol, while P†({W_nf}, W_Ψ) is the same for the backward protocol.

B. Adiabatic-Nonadiabatic decomposition
In the context of Eq. (10), another relevant option for the reference state is the instantaneous stationary distribution. That is, we consider the distribution P_ref(q, t) that is left invariant by the transition rates λ_ρ(q, t) at fixed time t. Thus, P_ref(q, t) is the stationary distribution to which the system would eventually relax if the transition rates were frozen at their values at time t. For slowly varying protocols, d_t Σ_nc approximates the entropy production in the instantaneous steady state, while Σ_c is associated to the driving and relaxation. Accordingly, these contributions are now referred to as 'adiabatic' and 'non-adiabatic', respectively. As explained in [1], in this case both expected values are positive: ⟨Σ_nc⟩ ≥ 0 and ⟨Σ_c⟩ ≥ 0. If the instantaneous stationary distribution is an equilibrium state, and therefore satisfies the global detailed balance conditions D_ρq[J_ref] = 0 for all q and ρ, then the adiabatic contribution Σ_nc is strictly zero (at the trajectory level, not just on average). Thus, the DFT of Eq. (10) reduces to:

P(Σ_d)/P†(−Σ_d) = e^(Σ_d/k_b).

On the other hand, if the circuit parameters do not depend on time, then Σ_d = 0 and we have

P(Σ_nc)/P(−Σ_nc) = e^(Σ_nc/k_b).

Note that only the probabilities for the forward protocol appear in the previous equation, since in this case the forward and backward protocols are equivalent (the starting distributions are the same, i.e., the stationary one, and there is no driving).

C. Asymptotic average rates
In typical out-of-equilibrium processes the entropy production for long times is dominated by the entropy flow Σ_e since, contrary to the change ∆S in the internal entropy, it is an unbounded and time-extensive quantity (we assume that the state q dwells in a finite region of the state space). This property can be exploited to obtain a DFT for the stochastic asymptotic average work rates, or average powers. To see this we first note the following general identity:

Σ_nc + Σ_d = Σ_e + ∆S_ref,

where ∆S_ref = S_ref(t) − S_ref(0), with S_ref(τ) = −k_b log(P_ref(q_τ, τ)). If we are considering a process in which the entropy production Σ is time extensive, then the entropy flow Σ_e must also be time extensive, since ∆S_ref is bounded and cannot grow indefinitely. Thus, recalling that Σ_e = −Σ_ρ Q_ρ/T_ρ, for long times we can write Σ_e ≃ −t Σ_ρ Q̄_ρ/T_ρ, where Q̄_ρ is the average heat rate of device ρ during the trajectory. Thus, neglecting the non-time-extensive terms in Σ_nc + Σ_d, it is possible to arrive at an asymptotic DFT for the average heat rates Q̄_ρ, valid for long times. This result can be further simplified in isothermal conditions, since in that case T Σ_e = −Q = W_Ψ − ∆Ψ + Σ_nf W_nf. Again, assuming that the circuit state q is restricted to a finite region of the state space during the trajectory, only the quantities W_Ψ and W_nf can be time extensive, since ∆Ψ is bounded. Thus, again neglecting non-time-extensive terms, for long times we can write:

lim_{t→∞} (1/t) log[P({W̄_nf}, W̄_Ψ)/P†({−W̄_nf}, −W̄_Ψ)] = (Σ_nf W̄_nf + W̄_Ψ)/(k_b T),

where we have defined the average work rates W̄_Ψ = t⁻¹ W_Ψ and W̄_nf = t⁻¹ W_nf.

II. STEADY STATE OF THE CMOS INVERTER
In this section we find the steady-state distribution of the NOT gate using the ladder operator method introduced in [3], where it was applied to a diode in thermal equilibrium. Thus, we define the ladder operator X = e^(q_e d/dq) acting on functions over the state space: X f(q) = f(q + q_e), and therefore X⁻¹ f(q) = f(q − q_e). Then we can write the master equation for the NOT gate as

(q_e/I_0) e^(V_th/(nV_T)) d_t P(q, t) = { X⁻¹ [α_n γ e^(−q/(2C_oV_T)) + α_p] + X [α_n + α_p γ e^(q/(2C_oV_T))] − [α_n (γ e^(−q/(2C_oV_T)) + 1) + α_p (γ e^(q/(2C_oV_T)) + 1)] } P(q, t),

where we have defined the constants

α_n = e^((V_in−V_ss)/(nV_T)), α_p = e^((V_dd−V_in)/(nV_T)), γ = e^(−(∆V/2+q_e/(4C_o))/V_T), q_T = 2C_o V_T.
From this, we can conclude that for the stationary state P_st(q) the quantity

[α_n (1 − X⁻¹ γ e^(−q/q_T)) + α_p (γ e^(q/q_T) − X⁻¹)] P_st(q)

should be constant. Also, this constant should be 0, since P_st(q) → 0 for q → ±∞. In this way we arrive at the following recurrence relation:

(α_n + α_p γ e^(q/q_T)) P_st(q) = (α_p + α_n γ e^(−(q−q_e)/q_T)) P_st(q − q_e),

which can be iterated in order to find P_st(q). Its physical meaning is transparent: the left-hand side is proportional to the probability of the transition q → q − q_e, while the right-hand side is proportional to the probability of the transition q − q_e → q. The condition that these two probabilities balance each other uniquely determines the steady state. At equilibrium it reduces to the detailed balance condition. Furthermore, by summing the previous relation over all values of q we obtain:

α_n − α_n γ ⟨e^(−q/q_T)⟩_st = α_p − α_p γ ⟨e^(q/q_T)⟩_st,

where ⟨·⟩_st indicates an average over the stationary distribution. This condition is equivalent to demanding that in the stationary state the average current through both transistors must be the same. In the next section we use this relation to obtain an approximate analytical expression for the transfer function of the gate.
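The recurrence can be iterated numerically. The following minimal sketch works in dimensionless units (q = k q_e, ρ = q_e/q_T) and in log space for numerical stability; the parameter values in the usage below are illustrative only:

```python
import math

def stationary_distribution(alpha_n, alpha_p, gamma, rho, k_max=100):
    """Normalized stationary distribution P_st(k) for the output charge
    q = k*q_e, obtained by iterating the recurrence
    (alpha_n + alpha_p*gamma*e^{k*rho}) P(k)
        = (alpha_p + alpha_n*gamma*e^{-(k-1)*rho}) P(k-1),
    with rho = q_e/q_T, truncated to |k| <= k_max."""
    logp = {0: 0.0}
    # iterate upwards from k = 0
    for k in range(1, k_max + 1):
        num = alpha_p + alpha_n * gamma * math.exp(-(k - 1) * rho)
        den = alpha_n + alpha_p * gamma * math.exp(k * rho)
        logp[k] = logp[k - 1] + math.log(num / den)
    # iterate downwards from k = 0
    for k in range(0, -k_max, -1):
        num = alpha_n + alpha_p * gamma * math.exp(k * rho)
        den = alpha_p + alpha_n * gamma * math.exp(-(k - 1) * rho)
        logp[k - 1] = logp[k] + math.log(num / den)
    m = max(logp.values())
    p = {k: math.exp(v - m) for k, v in logp.items()}
    Z = sum(p.values())
    return {k: v / Z for k, v in p.items()}
```

For α_n = α_p the distribution is symmetric around k = 0, while increasing α_n (stronger nMOS conduction) shifts the mean output charge toward negative values, as expected from the discharge of the output node toward V_ss.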

III. STOCHASTIC CORRECTIONS TO THE TRANSFER FUNCTION OF THE CMOS INVERTER
The relation of Eq. (33) can be rewritten as:
\[
\alpha_n\big(1 - \gamma\, e^{-\langle q\rangle_{st}/q_T}\, \langle e^{-(q-\langle q\rangle_{st})/q_T}\rangle_{st}\big)
= \alpha_p\big(1 - \gamma\, e^{\langle q\rangle_{st}/q_T}\, \langle e^{(q-\langle q\rangle_{st})/q_T}\rangle_{st}\big).
\]
Now, one can notice that the quantities ⟨exp(±(q − ⟨q⟩_st)/q_T)⟩_st equal 1 in a deterministic limit. Their deviation from this value is a measure of the thermal fluctuations. If they were known, the previous equation could be solved to obtain ⟨q⟩_st. One possible approximation is to compute them using the equilibrium distribution. Doing so, one obtains:
\[
\langle e^{\pm(q-\langle q\rangle_{eq})/q_T}\rangle_{eq} = e^{q_e/(2 q_T)}.
\]
Therefore, under this approximation, the average output charge ⟨q⟩_st can be determined by finding the positive root of the following equation:
\[
\alpha_n\big(1 - \gamma_0\, x^{-1}\big) = \alpha_p\big(1 - \gamma_0\, x\big),
\]
where γ_0 = exp(−∆V/(2V_T)), and we have defined x = exp(⟨q⟩_st/q_T). This is precisely the same solution one finds with a classical deterministic analysis of the circuit (i.e., by finding the fixed value of the output charge or voltage that makes the currents through both transistors equal). Thus, we see that a deterministic analysis is only compatible with the assumption that the fluctuations in the circuit are the same as in equilibrium. To quantify the deviations of the actual fluctuations from equilibrium we define quantities a and b as follows:
\[
\langle e^{\pm(q-\langle q\rangle_{st})/q_T}\rangle_{st} = e^{q_e/(2 q_T)}\, e^{a \pm b}.
\]
In this way, the parameters a and b account for the deviations with respect to equilibrium of the even and odd central moments, respectively (if the stationary distribution is symmetric around its mean value, then b = 0). Then, given a and b, the output charge ⟨q⟩_st is determined by the positive root of:
\[
\alpha_n\big(1 - \gamma_0\, e^{a-b}\, x^{-1}\big) = \alpha_p\big(1 - \gamma_0\, e^{a+b}\, x\big).
\]
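Multiplying the last relation by x turns it into the quadratic α_p γ_0 e^{a+b} x² + (α_n − α_p) x − α_n γ_0 e^{a−b} = 0, whose positive root gives ⟨q⟩_st = q_T ln x. A minimal numerical sketch (illustrative bias values, not taken from the paper; setting a = b = 0 recovers the deterministic/equilibrium transfer function):

```python
import numpy as np

# Positive root of  a_p*g0*e^{a+b}*x^2 + (a_n - a_p)*x - a_n*g0*e^{a-b} = 0,
# with x = exp(<q>_st/qT). Bias values are illustrative.

VT, n_slope, Co, qe = 1.0, 1.0, 20.0, 1.0
Vdd, Vss = 2.0, -2.0
dV = Vdd - Vss
qT = 2.0 * Co * VT
g0 = np.exp(-dV / (2.0 * VT))

def q_out(Vin, a=0.0, b=0.0):
    """Average output charge <q>_st for a given input voltage Vin."""
    a_n = np.exp((Vin - Vss) / (n_slope * VT))
    a_p = np.exp((Vdd - Vin) / (n_slope * VT))
    A = a_p * g0 * np.exp(a + b)
    B = a_n - a_p
    C = -a_n * g0 * np.exp(a - b)
    x = (-B + np.sqrt(B * B - 4.0 * A * C)) / (2.0 * A)
    return qT * np.log(x)

print(q_out(0.0))   # 0 by the n/p symmetry of this bias choice (Vss = -Vdd)
print(q_out(0.5))   # negative: the NMOS dominates for inputs above midpoint
```

Since A > 0 and C < 0 for any a and b, the quadratic always has exactly one positive root, so the transfer function is well defined for all inputs.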

IV. FULL COUNTING STATISTICS OF THE CURRENTS
In this section we review a semi-analytical method to evaluate the characteristic function of the current fluctuations [4,5]. The main quantities we will consider are the numbers {N_ρ(t)} indicating the net number of charges that went through a given device ρ during a time t, obtained by integrating the stochastic electric current I_ρ(τ) along a trajectory (see Section VI in the main text). The characteristic function for these stochastic quantities is:
\[
\Phi(\chi) = \big\langle e^{\,i \sum_{\rho>0} \chi_\rho N_\rho(t)} \big\rangle, \tag{40}
\]
where χ is a vector of counting fields {χ_ρ}_{ρ>0}, one for each device, and the average is taken over all trajectories compatible with a given initial distribution. The function Φ(χ) can be evaluated in terms of a modified version of the generator associated with the master equation for the circuit. Before proceeding, it is convenient to define this generator as follows. First, we write the master equation (Eq. (12) in the main text) in matrix form:
\[
d_t\, |P(t)\rangle = -\hat{L}(t)\, |P(t)\rangle. \tag{41}
\]
Thus, {|q⟩} is an orthonormal basis in which each vector is associated with one particular circuit state (⟨q|q′⟩ = δ_{q,q′}), and |P(t)⟩ is a vector encoding the probability distribution at time t. The generator \(\hat{L}(t)\) is split into diagonal and non-diagonal parts: \(\hat{L}(t) = \hat\gamma(t) - \hat\Gamma(t)\), with
\[
\hat\gamma(t) = \sum_q |q\rangle\langle q|\, \gamma(q,t), \qquad
\hat\Gamma(t) = \sum_q \sum_\rho |q\rangle\langle q - q_e \Delta_\rho|\, \lambda_\rho(q - q_e \Delta_\rho, t). \tag{42}
\]
Then, the diagonal part \(\hat\gamma(t)\) encodes the escape rates γ(q,t) = Σ_ρ λ_ρ(q,t), while the non-diagonal part encodes the transition rates. Since the evolution given by \(\hat{L}(t)\) preserves the normalization of the distribution |P(t)⟩, it always has a zero eigenvalue, with a left eigenvector ⟨1| = (1, 1, ···, 1) and a right eigenvector |P_st⟩, where P_st(q,t) is the (instantaneous) stationary state, which we assume to be unique.
Also, we will consider the modified generator \(\hat{L}_\chi(t) = \hat\gamma(t) - \hat\Gamma_\chi(t)\), with
\[
\hat\Gamma_\chi(t) = \sum_q \sum_\rho |q\rangle\langle q - q_e \Delta_\rho|\, \lambda_\rho(q - q_e \Delta_\rho)\, e^{\,i \chi_{|\rho|}\, s(\rho)},
\]
where χ_{|ρ|} is the counting field associated with device |ρ| (note that the absolute value is necessary since in the previous expression the sum is over transitions, and therefore ρ takes positive and negative values), and s(ρ) is just the sign of ρ. It is important to note that the modified generator \(\hat{L}_\chi(t)\) does not preserve normalization. In fact, as shown in [4], the characteristic function defined in Eq. (40) can be expressed as:
\[
\Phi(\chi) = \langle 1|\, \mathcal{T}_\tau\, e^{-\int_0^t d\tau\, \hat{L}_\chi(\tau)}\, |P(0)\rangle,
\]
where \(\mathcal{T}_\tau\) is the time-ordering operator and |P(0)⟩ is the initial state. For χ = 0, \(U(t,0) = \mathcal{T}_\tau \exp(-\int_0^t d\tau\, \hat{L}_\chi(\tau))\) is just the evolution operator corresponding to the master equation. In that case, the previous equation is the sum of the probabilities in |P(t)⟩ and therefore we have Φ(0) = 1. For circuits with constant parameters we have the simpler result:
\[
\Phi(\chi) = \langle 1|\, e^{-t \hat{L}_\chi}\, |P(0)\rangle.
\]
The previous expression can be numerically evaluated by truncating the state space to an appropriate number of dimensions and constructing the operator \(\hat{L}_\chi\) in that truncated space. Once the characteristic function Φ(χ) has been obtained, the probability distribution for the quantities {N_ρ(t)} can be computed by inverse Fourier transform:
\[
P(\{N_\rho\}, t) = \prod_{\rho>0} \int_{-\pi}^{\pi} \frac{d\chi_\rho}{2\pi}\; e^{-i \sum_{\rho>0} \chi_\rho N_\rho}\, \Phi(\chi).
\]
This is how the results of Section VII A in the main text were obtained. Finally, we mention that from the previous definitions it is possible to obtain a fluctuation theorem that is a special case of Eq. (18) in the main text. Indeed, for time-independent circuits and isothermal settings (β_ρ = β for all ρ), we have the following symmetry of the operator \(\hat\Gamma_\chi\):
\[
\hat\Gamma_\chi^T = \hat{P}_{eq}^{-1}\, \hat\Gamma_{\bar\chi}\, \hat{P}_{eq},
\]
where \(\hat{P}_{eq}\) is a diagonal operator encoding the equilibrium distribution P_eq(q) = Z^{-1} exp(−βΨ(q)) (Eq. (79) in the main text), and χ̄ = {−χ_ρ + i q_e β ∆V_{n_f(ρ)}}_{ρ>0}, where ∆V_{n_f(ρ)} is the voltage difference associated to device ρ, as defined in Section V E in the main text. From this symmetry it follows that Φ(χ) = Φ(χ̄), and consequently
\[
P(\{N_\rho\}, t) = P(\{-N_\rho\}, t)\; e^{\,q_e \beta \sum_{\rho>0} N_\rho \Delta V_{n_f(\rho)}},
\]
which is our final result.
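The truncated-space procedure described above can be sketched as follows for the CMOS inverter of Section II, attaching a single counting field to the NMOS device. The splitting of the jump rates between the two devices (NMOS forward α_n and backward α_n γ e^{−q/q_T}, PMOS forward α_p and backward α_p γ e^{q/q_T}) follows the rates of Section II; the rate prefactor is set to 1 and all bias values are illustrative.

```python
import numpy as np
from scipy.linalg import expm

# Counting statistics of the net number N of charges through the NMOS
# transistor of the inverter: build the tilted generator L_chi on a
# truncated state space (only NMOS jumps carry e^{±i chi}), evaluate
# Phi(chi) = <1| exp(-t L_chi) |P(0)>, then invert the Fourier transform.

VT, n_slope, Co, qe = 1.0, 1.0, 2.0, 1.0
Vdd, Vss, Vin = 2.0, -2.0, 0.5
dV = Vdd - Vss
qT = 2.0 * Co * VT
a_n = np.exp((Vin - Vss) / (n_slope * VT))
a_p = np.exp((Vdd - Vin) / (n_slope * VT))
g = np.exp(-(dV / 2.0 + qe / (4.0 * Co)) / VT)

ns = np.arange(-32, 33)            # states q = n*qe
D = len(ns)

def L_chi(chi):
    """Tilted generator L_chi = gamma - Gamma_chi on the truncated space."""
    G = np.zeros((D, D), dtype=complex)
    for i, n in enumerate(ns):
        q = n * qe
        dn = a_n * np.exp(1j * chi) + a_p * g * np.exp(q / qT)    # q -> q - qe
        up = a_p + a_n * g * np.exp(-q / qT) * np.exp(-1j * chi)  # q -> q + qe
        if i > 0:
            G[i - 1, i] = dn
        if i < D - 1:
            G[i + 1, i] = up
    esc = (a_n + a_p * g * np.exp(ns * qe / qT)
           + a_p + a_n * g * np.exp(-ns * qe / qT))
    return np.diag(esc) - G

t = 2.0
P0 = np.zeros(D)
P0[D // 2] = 1.0                   # start from q = 0
M = 256
chis = 2.0 * np.pi * np.arange(M) / M
Phi = np.array([np.sum(expm(-t * L_chi(c)) @ P0) for c in chis])
PN = (np.fft.fft(Phi) / M).real    # P(N), with N defined modulo M
N = np.arange(M)
N = np.where(N < M // 2, N, N - M)
print("<N> =", float(np.sum(N * PN)))
```

Evaluating Φ(χ) on the discrete grid χ_k = 2πk/M and applying a DFT is equivalent to the inverse Fourier transform above, since the N_ρ take integer values.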

V. TRANSITION RATES FOR MOS TRANSISTORS WITH GROUNDED BODY
In this section we discuss the Poisson rates that one should assign to the drain-source conduction channel of a MOS transistor in which the body is grounded, instead of being connected to the source, as in Figure 3-(a). As discussed in the main text (recall Eq. (42)), in that case the symmetry between drain and source is preserved, but the mean current is not a function of the voltage bias V_D − V_S only:
\[
\langle I \rangle = I_0\, e^{(V_G - V_{th})/(n V_T)} \big( e^{-V_S/V_T} - e^{-V_D/V_T} \big).
\]
This is natural, since the body potential introduces an internal voltage reference. However, the Poisson rates can still be considered functions of the voltage bias V_D − V_S if we regard V_D and V_S as extra control parameters, in addition to V_G. Explicitly, we can consider the rates: