Thermodynamics of Error Correction

Information processing at the molecular scale is limited by thermal fluctuations. This can cause undesired consequences in copying information since thermal noise can lead to errors that can compromise the functionality of the copy. For example, a high error rate during DNA duplication can lead to cell death. Given the importance of accurate copying at the molecular scale, it is fundamental to understand its thermodynamic features. In this paper, we derive a universal expression for the copy error as a function of entropy production and work dissipated by the system during wrong incorporations. Its derivation is based on the second law of thermodynamics, hence its validity is independent of the details of the molecular machinery, be it any polymerase or artificial copying device. Using this expression, we find that information can be copied in three different regimes. In two of them, work is dissipated to either increase or decrease the error. In the third regime, the protocol extracts work while correcting errors, reminiscent of a Maxwell demon. As a case study, we apply our framework to study a copy protocol assisted by kinetic proofreading, and show that it can operate in any of these three regimes. We finally show that, for any effective proofreading scheme, error reduction is limited by the chemical driving of the proofreading reaction.


INTRODUCTION
Copying information is a fundamental process in the natural world: all living systems, as well as the vast majority of manmade digital devices, need to replicate information to function properly. The quality of a copy relies on it being an accurate reproduction of the original and can be quantified by the fraction $\eta$ of wrongly copied bits that it contains. Errors can be provoked by several hardware-specific causes, such as imperfections in the copying machinery. At the molecular scale, perfect copying does not exist, as thermal fluctuations constitute a fundamental source of error regardless of the system. Since the reliability of the copying process is ultimately limited by thermal noise, it must be understood in terms of thermodynamics, as recognized by von Neumann [1].
Therefore, a critical question is whether one can invoke the second law of thermodynamics to establish a universal connection between the error and physical quantities characterizing the copy process. This issue should be addressed in a general framework, incorporating two basic features of copying machineries. First, copying protocols often involve several intermediate discriminatory steps used to regulate the accuracy and speed of the process. This is a characteristic property of both natural and artificial error-correcting protocols. For example, accurate copying of DNA occurs via multistep reactions [2]. Second, due to the statistical nature of the second law, one should consider cyclically repeated copy operations rather than a single one [3]. This cyclical operation is also consistent with the behavior of polymerases when duplicating long biopolymers.
To understand the thermodynamics of copying, we introduce a general framework in which the copying protocol can be arbitrarily complex (as in models describing biochemical reactions [4-7]) and copy operations are cyclically repeated (as in models inspired by the physics of polymer growth [8-15]). Our framework describes template-assisted growth of a copy polymer (or "tape", see [16]) aided by a molecular machine, see Fig. 1. Gray and white circles represent two different monomer types.
The molecular machine, represented as a red circle in the figure, is situated at the tip of the copy strand and tries to match freely diffusing monomers with corresponding ones on the template. When a free monomer arrives at the tip, the machine transitions through a network of intermediate states to determine whether to incorporate or to reject it. Incorporation is more likely if the matching is right, i.e. the color of the monomer matches that of the template, than if it is wrong. On average, the copy strand elongates at a speed $v \ge 0$ and accumulates errors with probability $\eta$.

FIG. 2: ... [8, 11-13, 15]). Second example: kinetic proofreading, where after an intermediate state a backwards driven pathway removes errors to improve the overall accuracy of the copy [4, 13]. Third example: mRNA translation, where the three copying steps represent initial binding, GTP hydrolysis and final accommodation; a proofreading reaction is also present [18].
is quasi-static, $v \to 0$. The error is then $\eta_{eq} \approx \exp[-(\Delta E^w - \Delta E^r)/T]$, determined by the energy changes $\Delta E^r$ and $\Delta E^w$ of right and wrong monomer incorporation and independent of the copying protocol. In this case, the error can be reduced by increasing the gap $\Delta E^w - \Delta E^r$, in agreement with Bennett's idea that cyclic copying can be performed near equilibrium with arbitrary precision [3, 13]. This mechanism is however impractical, for example because of the low copying speed it requires. Instead, typical molecular machines spend chemical energy to copy at a finite speed and out of thermodynamic equilibrium. Non-equilibrium copying protocols can also reduce the error far below its equilibrium value. For example, the equilibrium estimate for the error in DNA duplication is $\eta_{eq} \sim 10^{-2}$, whereas the actual observed error is $\eta \sim 10^{-9}$ [2]. An important non-equilibrium mechanism underlying error correction is kinetic proofreading, which feeds on chemical energy to preferentially undo wrong copies [4, 5, 8]. Other non-equilibrium mechanisms such as induced fit [17] and kinetic discrimination [10, 13] complement kinetic proofreading to underpin the high accuracy of replication in biological systems.
In this work we demonstrate that, for the broad class of processes depicted in Fig. 1, a direct relation links copy errors with non-equilibrium thermodynamic observables. In particular, at fixed work budget, the error decreases exponentially with the total entropy produced per wrongly copied bit. This relation is completely general, in contrast with conditions setting hardware-specific minimum errors $\eta_{min}$ that characterize each particular copying protocol. When studying wrong matches alone, three copying regimes can be identified: error amplification, where energy is invested in increasing the error rate; error correction, where energy is invested in decreasing the error rate; and Maxwell demon, where the information contained in the errors is converted into work. We conclude by studying the specific copying protocol of kinetic proofreading. We show that proofreading can operate in all three of these regimes. Furthermore, for a broad class of proofreading protocols, we show that error reduction is limited by the chemical energy spent in the proofreading reaction.

Template-assisted polymerization
We start our discussion by detailing the stochastic dynamics of the template-assisted polymerization process sketched in Fig. 1. Its transition network is represented in Fig. 2A. The rectangles correspond to the states of the system after the copying machine has finalized incorporation of a monomer. We denote them with a string such as $\ldots rrwr$, which refers to a particular sequence of right and wrong matches (see also Fig. 1). Dashed circles enclose sub-networks of $n$ intermediate states, characteristic of the copying protocol. The intermediate states, represented as blue/green circles for right/wrong matches in Fig. 2A, are used by the machine to process a tentatively matched monomer and decide whether to incorporate it or not. We denote intermediate states as $\ldots rrwrr_i$, with $1 \le i \le n$, and analogously for wrong monomers. A copying protocol is fully specified by the topology of the sub-networks, assumed to be the same for right and wrong matches, and the kinetic rates $k^r_{ij}$ for right matches and $k^w_{ij}$ for wrong ones. Differences in the rates are responsible for discrimination. Possible examples of sub-networks of increasing complexity are represented in Fig. 2B.
Because of thermal fluctuations induced by the environment at temperature $T$, all kinetic transitions are stochastic. The states are thus characterized by time-dependent probabilities $P(\ldots r)$, $P(\ldots w)$, $P(\ldots r_i)$ and $P(\ldots w_i)$. Their evolution is governed by a set of master equations which can be solved at steady state, see Methods. Key to the solution is to postulate that errors are uncorrelated along the chain, so that
$$P(\ldots) \propto \eta^{N_w}(1-\eta)^{N-N_w},$$
where $N$ is the length of the chain and $N_w$ is the total number of incorporated wrong matches. The error $\eta$ can then be determined via the condition
$$\eta = \frac{v^w}{v^r + v^w}, \qquad (1)$$
where $v^r$ and $v^w$ are the average incorporation speeds for right and wrong monomers respectively. Substituting the solution for $P(\ldots)$ into the master equations leads to explicit expressions for $v^w$ and $v^r$ as a function of the error and all the kinetic rates. In this way, Eq. (1) becomes a closed equation for the only unknown $\eta$. Its solution can be used to explicitly compute other physical quantities, such as the net elongation speed $v = v^r + v^w$.

Thermodynamics of copying with errors
The kinetic rates $k^r_{ij}$ and $k^w_{ij}$ are determined by the energy landscape of the system, the chemical drivings $\mu_{ij}$ of the reactions, and the temperature $T$ of the thermal bath, as represented in Fig. 3A. The energy difference of an intermediate state with respect to the state before the candidate monomer incorporation is $\Delta E^r_i = E(\ldots r_i) - E(\ldots)$, and similarly for wrong incorporation. The energy change after finalizing incorporation of a monomer is $\Delta E^r = E(\ldots r) - E(\ldots)$, and analogously for wrong matches. Energetic discrimination arises when the wrong match is energetically more unstable than the right one, $\Delta E^w \ge \Delta E^r$. In addition, wrong matches can also be discriminated kinetically, i.e. by exploiting a different activation barrier $\delta_{ij}$ in the transitions performed by the machine when a right monomer is bound. In general, complex copying protocols can combine both these mechanisms [13, 19]. Full expressions of the rates are summarized in Fig. 3B.

FIG. 3: Energy landscape and kinetic rates. A Energetic diagram of a single transition in the reaction network. B Corresponding kinetic rates. The transition $j \to i$ can be driven by energy differences and the chemical driving $\mu_{ij}$. Transitions involving a right and a wrong monomer can be characterized by different kinetic barriers $\delta_{ij}$, as well as different energetic landscapes $\Delta E^w_j \neq \Delta E^r_j$. The bare rate $\omega_{ij}$ is the inverse characteristic time scale of each reaction.
Given a steady-state elongation speed $v$, the chemical drivings perform an average work per added monomer $\Delta W = \sum_{ij}\mu_{ij}(J^r_{ij} + J^w_{ij})/v$, where $J^r_{ij}$ and $J^w_{ij}$ are probability fluxes (see also Methods). Further, each monomer incorporation results in an equilibrium free-energy change of $\Delta F = -T\log(e^{-\Delta E^r/T} + e^{-\Delta E^w/T})$. In the quasi-static limit $v \to 0$, the system approaches equilibrium and the population of all states is determined by detailed balance. This implies that the equilibrium error is $\eta_{eq} = \exp[(-\Delta E^w + \Delta F)/T]$. When driving the dynamics out of equilibrium, the error will in general depart from its equilibrium value, leading to a positive total entropy production. In Methods, we derive that the total entropy production per copied monomer and the error are linked by the relation
$$T\Delta S_{tot} = \Delta W - \Delta F - T\,D(\eta\|\eta_{eq}) \ge 0, \qquad (2)$$
where
$$D(\eta\|\eta_{eq}) = \eta\log\frac{\eta}{\eta_{eq}} + (1-\eta)\log\frac{1-\eta}{1-\eta_{eq}}$$
is the Kullback-Leibler distance between the equilibrium and non-equilibrium error distribution, which is always non-negative and vanishes only for $\eta = \eta_{eq}$. Eq. (2) is a formulation of the second law of thermodynamics, as it states that the average performed work per monomer is greater than the incorporated free energy, $\Delta W - \Delta F \ge T\,D(\eta\|\eta_{eq}) \ge 0$. In this view, the Kullback-Leibler term in Eq. (2) can be interpreted as the intrinsic entropic cost of maintaining the error away from its equilibrium value. This cost sets a lower bound on the minimum dissipated work which is more refined than that of the traditional form of the second law. Eq. (2) relates the information content of the copy with thermodynamics. However, in many relevant cases, the entropy production is dominated by the dissipated work, so that in practice Eq. (2) reduces to the traditional form of the second law. Consider for example a case in which error correction is very effective, $\eta \ll \eta_{eq}$. In this limit, the Kullback-Leibler term tends to a constant, $D(\eta\|\eta_{eq}) \to -\log(1-\eta_{eq}) > 0$. Since usually the equilibrium error is already small, this constant is also small, $D(\eta\|\eta_{eq}) \approx \eta_{eq} \ll 1$. The reason is that, since errors are typically rare, their overall contribution will be small.
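As a numerical illustration of the quantities above, the following minimal Python sketch evaluates the free-energy change, the equilibrium error and the Kullback-Leibler term. It is not part of the original analysis: it assumes units with $k_B = 1$, the standard binary form $D(\eta\|\eta_{eq}) = \eta\log(\eta/\eta_{eq}) + (1-\eta)\log[(1-\eta)/(1-\eta_{eq})]$, and illustrative energies; the function names are hypothetical.

```python
import math

def free_energy_change(dE_r, dE_w, T=1.0):
    """Equilibrium free-energy change per incorporated monomer (k_B = 1)."""
    return -T * math.log(math.exp(-dE_r / T) + math.exp(-dE_w / T))

def equilibrium_error(dE_r, dE_w, T=1.0):
    """Equilibrium error eta_eq = exp[(-dE_w + dF) / T]."""
    dF = free_energy_change(dE_r, dE_w, T)
    return math.exp((-dE_w + dF) / T)

def kl_term(eta, eta_eq):
    """Binary Kullback-Leibler term D(eta || eta_eq); assumes 0 < eta, eta_eq < 1."""
    return (eta * math.log(eta / eta_eq)
            + (1.0 - eta) * math.log((1.0 - eta) / (1.0 - eta_eq)))

if __name__ == "__main__":
    T = 1.0
    # Illustrative (assumed) energies: right match at 0, wrong match at 3T.
    eta_eq = equilibrium_error(dE_r=0.0, dE_w=3.0 * T, T=T)
    for eta in (eta_eq, 1e-2 * eta_eq, 1e-6 * eta_eq):
        print(f"eta = {eta:.3e}, D = {kl_term(eta, eta_eq):.4e}")
    # For eta << eta_eq the term saturates at -log(1 - eta_eq).
    print("limit -log(1 - eta_eq) =", -math.log(1.0 - eta_eq))
```

Running the script shows the Kullback-Leibler term vanishing at $\eta = \eta_{eq}$ and approaching the small constant $-\log(1-\eta_{eq})$ as the error is reduced, which is why this term contributes little to the total entropy production per copied monomer.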
However, errors can have a large impact on the entropy produced per wrong incorporated monomer,
$$T\Delta S^w_{tot} = \Delta W^w - \Delta F - T\log(\eta/\eta_{eq}), \qquad (3)$$
where $\Delta W^w = \sum_{ij} J^w_{ij}\mu_{ij}/v^w$ is the work performed per wrong match (see Methods). Rearranging terms in Eq. (3) yields a general expression for the error in terms of thermodynamic observables,
$$\eta = \eta_{eq}\exp[-\Delta S^w_{tot} + (\Delta W^w - \Delta F)/T]. \qquad (4)$$
This result does not depend on microscopic details of the copying protocol, such as the discrimination barriers $\delta_{ij}$. Eq. (4) provides a direct link between thermodynamic irreversibility and accuracy of copying. It states that, given a fixed work budget, reduction of the error beyond its equilibrium value always comes at a cost in terms of entropy production. However, the dependence of the error on the dissipated work is non-trivial to derive from Eq. (4), as varying the work also affects the entropy production.
Three copying regimes can be identified:
1. Error amplification, $\Delta W^w - \Delta F > 0$ and $\eta > \eta_{eq}$. In this regime, dissipated work leads to an error higher than its equilibrium value. While in this case dissipating energy is counterproductive in terms of the achieved error, it can be justified by the need to achieve a high copying speed.
2. Maxwell demon, $\Delta W^w - \Delta F < 0$ and $\eta < \eta_{eq}$. In this regime, the machine extracts work while lowering the information entropy of the chain with respect to its equilibrium value, $-\eta\log(\eta) < -\eta_{eq}\log(\eta_{eq})$. This phenomenon can be considered as an instance of a Maxwell demon, as an apparent violation of the second law would appear from neglecting entropy production associated with information manipulation (see e.g. [20]). Note however that, including also the effect of right matches, the total dissipated work is always non-negative, see Eq. (2).
3. Error correction, $\Delta W^w - \Delta F > 0$ and $\eta < \eta_{eq}$. This is an error-correction scenario in which work is dissipated to achieve an error lower than the equilibrium error. In this case, which is the most common for biological machines, Eq. (4) implies a simple bound on the error, $\eta \ge \eta_{eq}\exp(-\Delta S^w_{tot})$.
Given the copying protocol and the kinetic rates, the copying machinery will achieve a certain error $\eta$ and operate in one of these three regimes. Varying the kinetic rates affects both the error and the thermodynamic observables, possibly shifting the operating regime of the machine. To better scrutinize these aspects, we now move to considering specific protocols.
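The classification into regimes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: units with $k_B = 1$, the entropy per wrong monomer taken in the form $\Delta S^w_{tot} = (\Delta W^w - \Delta F)/T - \log(\eta/\eta_{eq})$, which is the form consistent with the bound $\eta \ge \eta_{eq}\exp(-\Delta S^w_{tot})$ quoted above, and hypothetical function names and input values.

```python
import math

def classify_regime(dW_w, dF, eta, eta_eq):
    """Classify the copying regime from the dissipated work per wrong match
    (dW_w - dF) and the error eta relative to its equilibrium value eta_eq."""
    dissipated = dW_w - dF
    if dissipated > 0 and eta > eta_eq:
        return "error amplification"
    if dissipated < 0 and eta < eta_eq:
        return "Maxwell demon"
    if dissipated > 0 and eta < eta_eq:
        return "error correction"
    return "boundary or other"  # e.g. eta == eta_eq; outside the three regimes

def entropy_per_wrong_monomer(dW_w, dF, eta, eta_eq, T=1.0):
    """Assumed form: dS_w = (dW_w - dF)/T - log(eta/eta_eq), with k_B = 1."""
    return (dW_w - dF) / T - math.log(eta / eta_eq)

def error_correction_bound(dS_w, eta_eq):
    """Lower bound eta >= eta_eq * exp(-dS_w) in the error-correction regime."""
    return eta_eq * math.exp(-dS_w)

if __name__ == "__main__":
    # Hypothetical numbers, in units of T.
    dW_w, dF, eta, eta_eq = 2.0, 0.5, 1e-4, 1e-2
    dS_w = entropy_per_wrong_monomer(dW_w, dF, eta, eta_eq)
    print(classify_regime(dW_w, dF, eta, eta_eq))
    print("bound on eta:", error_correction_bound(dS_w, eta_eq), "actual eta:", eta)
```

In the error-correction case the bound is never larger than the actual error, and the two coincide only when the dissipated work per wrong match vanishes.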
In the simplest possible example, incorporation occurs in a single step, as sketched on the top panel of Fig. 2B (see also [8, 11-13, 15]). It can be shown that this protocol is always dissipative, $\Delta W^w - \Delta F \ge 0$. In general, wrong monomers can be discriminated by a kinetic barrier $\delta_{10}$ and an energy difference $\Delta E^w - \Delta E^r$ [13]. If the kinetic barrier is larger than the energy difference, $\delta_{10} > \Delta E^w - \Delta E^r$, it can be shown that $\eta < \eta_{eq}$, corresponding to error correction. If it is lower, then $\eta > \eta_{eq}$, which corresponds to error amplification [13]. In Fig. 4A we plot the different terms of the total entropy production, Eq. (2), for the error correction case. As discussed before, the information contribution to the total entropy production is negligible for small errors. Instead, note in Fig. 4B that the information term of Eq. (3) dominates over the work performed per wrong match. This implies that the universal expression for the error, Eq. (4), is very well approximated by the lower bound of error correction, as shown in Fig. 4C. The error departs from this bound only when it approaches its hardware-specific minimum $\eta_{min} \approx e^{-\delta_{10}/T}$.

Energetic bound to proofreading accuracy
In kinetic proofreading, a copying pathway that incorporates monomers at a speed $v_c \ge 0$ is assisted by a parallel pathway which preferentially removes wrong matches at a speed $v_p \le 0$, see Fig. 5A. To maintain a negative speed, the proofreading reaction must be driven backward either by performing a work $\Delta W_p$, or by exploiting a high free energy difference $\Delta F$ between the final and the initial state. By means of proofreading, one can achieve lower errors than those of the copying pathway alone, at the cost of spending additional chemical driving and reducing the net copying speed $v = v_c + v_p$.
We consider a proofreading protocol consisting of a copying pathway with one intermediate step in addition to the proofreading reaction, see middle panel in Fig. 2B. By tuning the rates, this model can operate in all three regimes described in the previous section, as shown in Fig. 5B. In particular, in the Maxwell demon regime, the error can be reduced up to one order of magnitude below its equilibrium value while at the same time extracting work from the wrong copying reaction. Very small errors are achieved in a strongly driven error correction regime, where the dissipated work is positive and the error satisfies $\eta \ge \eta_{eq}\exp(-\Delta S^w_{tot})$. However, at variance with the example of the previous section, here the entropy production quickly becomes much larger than this bound. The reason is that effective proofreading fundamentally involves dissipation of work. This dissipation, rather than the information term, dominates the entropy production of wrong matches at low errors.
To derive a better estimate of the error in proofreading, we now focus on the entropy production rate of proofread errors, $T\dot S^w_{p,tot} = -v^w_p\Delta W_p - v^w_p[\Delta F + T\log(\eta/\eta_{eq})]$, where $\Delta W_p$ is the proofreading work, equal to the chemical driving of the proofreading reaction. Using that in proofreading $v_p < 0$ while $\dot S^w_{p,tot} \ge 0$, we can derive (see Methods for details) the following bound for the error
$$\eta \ge \eta_{eq}\exp[-(\Delta W_p + \Delta F)/T]. \qquad (5)$$
This equation shows that error reduction in proofreading is limited by its energetic cost, either in the form of chemical work in the proofreading pathway [19] or free energy of the final state, which involves performing work in the copying pathway [4]. Similarly to Eq. (4), this bound does not depend on details of the copying protocol. In Fig. 5C, we show the error of the specific proofreading model of Fig. 5B as a function of the proofreading work. One can appreciate that the bound from Eq. (5) is tightly met for a wide range of errors. For very small values of $\Delta W_p$, when $v_p > 0$ and no proofreading occurs, the bound is not satisfied. Finally, for very large work values, the error approaches the hardware-specific minimum $\eta_{min}$. In this case, the value of $\eta_{min}$ can be obtained from the explicit solution of the model (see derivation in Methods).
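The step from the entropy production rate to the bound on the error can be spelled out as follows. This is a sketch rather than the derivation given in Methods: it assumes, as stated in the text, a negative proofreading speed for wrong matches and a non-negative entropy production rate, and it writes the equilibrium free-energy change as $\Delta F$.

```latex
\begin{align}
T\dot S^{w}_{p,\mathrm{tot}}
  &= -v^{w}_{p}\left[\Delta W_{p} + \Delta F + T\log\frac{\eta}{\eta_{eq}}\right] \ge 0 \\
\Rightarrow\quad
  \Delta W_{p} + \Delta F + T\log\frac{\eta}{\eta_{eq}} &\ge 0
  \qquad (\text{since } -v^{w}_{p} > 0) \\
\Rightarrow\quad
  \eta &\ge \eta_{eq}\,\exp\!\left[-\frac{\Delta W_{p} + \Delta F}{T}\right].
\end{align}
```

The sign of $v^w_p$ is essential: if the proofreading flux reverses, the inequality in the second line flips and the bound no longer holds.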
In the strongly driven regime, the error $\eta$ decreases with increasing proofreading work $\Delta W_p$. At the same time, $v_p$ becomes more negative as more copies are proofread. The minimum error is thus obtained in the limit of vanishing elongation speed, when the proofreading speed is negative enough to arrest copying, $v_p = -v_c$. Imposing this condition gives the hardware-specific minimum error This expression shows that the error of the first copying step, approximately equal to $e^{-\delta_{10}/T}$ because of the large kinetic barrier, is reduced by a factor $e^{(\delta_{02} - \Delta E^w + \Delta E^r)/T}$ due to the additional discrimination of the proofreading reaction.

DISCUSSION
In this paper, we analyzed template-assisted polymerization, where copies are cyclically produced by an arbitrarily complex reaction network. This broadly extends previous studies where monomer incorporation occurs in a single step [8-15]. In particular, the results presented here allow for analyzing the thermodynamics of realistic biological copying protocols, where the reaction network is responsible for error correction.
At variance with models for the copy of a single monomer [4-7], in template-assisted polymerization the entropy of the system grows during elongation due to the exponential increase in the number of possible states of the chain. This growth causes the appearance of an information term in the formula for the total entropy production, Eq. (2). A similar term appears in the context of Landauer's principle out of equilibrium [21], and was interpreted as the amount of information necessary to shift from the equilibrium distribution to the non-equilibrium one. On the other hand, this term should not be confused with a formally similar one derived in [9], which represents a physically different quantity, i.e. the entropy of the copy given the template.
The main result of this paper is that, thanks to the explicit dependence on the error, the second law of thermodynamics can be used to obtain general expressions and bounds on the copy error.This allows us to identify three different copying regimes: error amplification, error correction, and Maxwell demon, all of which can be achieved by kinetic proofreading.
Considering cyclic copying is analogous to considering cyclic transformations when studying the efficiency of thermodynamic engines. Besides being the natural choice to properly describe the thermodynamics of the process, template-assisted polymerization allows for out-of-equilibrium copying regimes which are absent in single-monomer models. For example, a lower bound to the error analogous to Eq. (5) is generally valid in closed networks [22, 23]. In template-assisted polymerization, this limit can be broken when the proofreading reaction reverts its flux, as seen in Fig. 5C for small values of the work.
We briefly discuss the relevance of our results for interpreting experimental data. Many biological copying pathways are driven by the hydrolysis of one single GTP molecule. The chemical work spent in this process is $\Delta\mu = \Delta\mu_0 + k_BT\log([GTP]/([GDP][P_i]))$. Taking as reference the bare potential of ATP, $\Delta\mu_0 = 14.5\,k_BT$, and typical concentrations [GTP] = 1 mM, [GDP] = 0.01 mM and [P_i] = 1 mM, we obtain $\Delta\mu_{GTP} \approx 20\,k_BT$. In a protocol involving proofreading, this information and Eq. (5) can be used to set a lower bound for the error. Assuming that the energy of GTP is all spent to increase the free energy of the chain, $\Delta F \approx \Delta\mu_{GTP}$, we obtain that the total error reduction is $\eta/\eta_{eq} \ge 10^{-9}$. The value of this bound is smaller than typically observed errors, which suggests that not all of the energy of hydrolysis is used to increase the free energy of the system.
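The order of magnitude of this estimate can be checked with a short Python sketch. It assumes the proofreading bound in the form $\eta/\eta_{eq} \ge \exp[-(\Delta W_p + \Delta F)/k_BT]$, energies expressed in units of $k_BT$, and the value $\Delta\mu_{GTP} \approx 20\,k_BT$ quoted in the text; the function name is hypothetical.

```python
import math

def proofreading_error_bound(dW_p, dF):
    """Lower bound on eta/eta_eq, assuming eta/eta_eq >= exp[-(dW_p + dF)],
    with both energies given in units of k_B*T."""
    return math.exp(-(dW_p + dF))

if __name__ == "__main__":
    dmu_gtp = 20.0  # k_B*T, value quoted in the text
    # All of the GTP energy is assumed to go into the free energy of the chain.
    bound = proofreading_error_bound(dW_p=0.0, dF=dmu_gtp)
    print(f"eta/eta_eq >= {bound:.2e}")  # of order 1e-9
```

Splitting the same 20 $k_BT$ between proofreading work and free energy gives the same bound, since only the sum $\Delta W_p + \Delta F$ enters.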
Given the flexibility of our framework, many complex copying mechanisms studied in the literature as non-cyclic processes [17-19] can be directly considered as template-assisted polymerization problems and studied from the point of view of thermodynamic efficiency. One limitation of our treatment is the lack of long-term memory: while processing a monomer, the machine does not keep track of the past errors encountered along the chain. A more general scheme could exploit correlations in the template sequence to reduce the error. An example of this is backtracking [24], where regions of the template containing many errors are entirely reprocessed. Generalization of template-assisted polymerization to these cases will be the subject of a future study.
The thermodynamic relations derived in this paper fundamentally limit the capabilities of stochastic machines to reduce and proofread errors. They are reminiscent of similar bounds derived for adaptation error in sensory systems [25], and constitute an important step towards understanding general thermodynamic principles [26] limiting the accuracy of non-equilibrium information processing.

Steady-state solution of template-assisted polymerization
In this section, we briefly outline how to solve the template-assisted polymerization model. We start by writing the master equations governing the evolution of probabilities of all main states $P(\ldots)$, and those of the intermediate states $P(\ldots r_i)$ and $P(\ldots w_i)$. The probability flux between two arbitrary intermediate states $\ldots r_j$ and $\ldots r_i$ is $J^r_{ij}(\ldots) = k^r_{ij}P(\ldots r_j) - k^r_{ji}P(\ldots r_i)$, and analogously for wrong matches (see Fig. 2A). The master equations for the intermediate states can be expressed in a compact form in terms of these fluxes $\dot P(\ldots r_i) =$ where the three sets of fluxes in each equation correspond to finalized incorporation of the last monomer in the main state, and attempted incorporation of a right and wrong monomer. Eqs. (7) are similar to those written for biochemical models, while Eqs. (8) are similar to those used for polymer growth.
Assuming uncorrelated errors along the chain,
$$P(\ldots) \propto \eta^{N_w}(1-\eta)^{N-N_w}. \qquad (9)$$
For the intermediate states we make the additional ansatz
$$P(\ldots r_i) = P(\ldots)\,p^r_i \quad\text{and}\quad P(\ldots w_i) = P(\ldots)\,p^w_i, \qquad (10)$$
where $p^r_i$ and $p^w_i$ are the occupancies of the intermediate states $1 \le i \le n$, assumed to be independent of $P(\ldots)$. Substituting Eqs. (9) and (10) in Eqs. (7) yields a system of $2n$ linear equations, from which the occupancies can be expressed as functions of the kinetic rates and the error $\eta$, still to be determined. It is now convenient to define the occupation fluxes $J^r_{ij}$ as
$$J^r_{ij} = N\,(k^r_{ij}\,p^r_j - k^r_{ji}\,p^r_i), \qquad (11)$$
and analogously for wrong matches, where $N = [1 + \sum_{i=1}^{n}(p^r_i + p^w_i)]^{-1}$ is a normalization constant. Occupation fluxes are related to the probability fluxes via $J^r_{ij}(\ldots) = P(\ldots)J^r_{ij}/N$, and analogously for wrong matches. The speed of right and wrong monomer incorporations can now be expressed as $v^r = \sum_i J^r_{n+1,i} = \sum_i J^r_{i0}$ and $v^w = \sum_i J^w_{n+1,i} = \sum_i J^w_{i0}$. Replacing the ansatz in Eqs. (8) and using these definitions results in Eq. (1), which can be finally used to determine the error.

Entropy production rate
To calculate the steady-state entropy production rate, we start with the general expression [27]
$$\dot S_{tot} = \frac{1}{2}\sum_{\ldots,i,j} J^r_{ij}(\ldots)\,\log\frac{k^r_{ij}P(\ldots r_j)}{k^r_{ji}P(\ldots r_i)} + (r \to w).$$
We now factorize the sum into one over strings (noted $\sum_{\ldots}$) and one over intermediate states (where $ij$ denotes links). Using the definition of the occupation fluxes, Eq. (11), we obtain:
$$\dot S_{tot} = \frac{\sum_{\ldots}P(\ldots)}{N}\left[\frac{1}{2}\sum_{ij} J^r_{ij}\,\log\frac{k^r_{ij}\,p^r_j}{k^r_{ji}\,p^r_i} + (r \to w)\right].$$
Since the sum over all states is normalized to one, we have that $\sum_{\ldots}P(\ldots) = [1 + \sum_{i=1}^{n}(p^r_i + p^w_i)]^{-1}$. Using the definition of $N$ in the previous section, the term outside the brackets is equal to 1. Substituting the definition of the rates of Fig. 3 into (
For an isolated network at steady state, all terms but the first one vanish by flux conservation [27]. However, in cyclic copying the states $i = 0$ and $i = n + 1$ receive

FIG. 4: Template-assisted polymerization without intermediate states. A Irreversible work $\Delta W - \Delta F$, entropy production and Kullback-Leibler term of Eq. (2) as a function of the error. Notice that the irreversible work dominates over the information term. B Same terms as in A, but for wrong monomers only. In this case, the information term dominates the entropy production. C Relation between error and entropy production of wrong monomers, together with thermodynamic (red, dashed) and hardware-specific (black, dashed) bounds. In all panels, the driving $\mu_{10}$ is varied to vary the error. Parameters are $\delta_{10} = 10$, $\Delta E^r_1 = 0$, $\Delta E^w_1 = 3T$.

FIG. 5: Regimes and bounds of kinetic proofreading. A Sketch of a generic proofreading scheme. Copying occurs at a net speed $v_c > 0$ through an arbitrary reaction network of intermediate states. After the copy is finalized, a proofreading reaction removes errors at a speed $v_p < 0$. The net average speed is $v = v_c + v_p \ge 0$. B Thermodynamic regimes of kinetic proofreading. The model combines a copying scheme with one intermediate state with kinetic proofreading, as represented in Fig. 2B. The shaded regions denote the three thermodynamic regimes discussed in the previous section. Parameters are $\delta_{10} = 5T$, $\delta_{21} = 0$, $\delta_{02} = 5T$, $\Delta E^w_2 = \Delta E^w_1 = 2T$, $\Delta E^r_2 = \Delta E^r_1 = 0$. For each value of the error $\eta$, the other free parameters ($\mu_{10}$, $\mu_{21}$, $\omega_{21}$, $\omega_{02}$) are determined by minimizing the entropy production per copied wrong monomer $\Delta S^w_{tot}$. C Minimum error as a function of the proofreading work $\Delta W_p = \mu_{02}$. For each curve, energies and activation barriers are fixed parameters as in the previous panel (except for $\delta_{02}$ which varies, as in the captions). For each value of $\mu_{02}$, the other free parameters ($\mu_{10}$, $\mu_{21}$, $\omega_{21}$, $\omega_{02}$) are determined by numerically minimizing the error $\eta$. Red-dashed and black-dashed lines represent thermodynamic and hardware-specific bounds, respectively.