Optimal control of nonequilibrium systems through automatic differentiation

Controlling the evolution of nonequilibrium systems to minimize dissipated heat or work is a key goal for designing nanodevices, both in nanotechnology and biology. Progress in computing optimal protocols has thus far been limited to either simple systems or near-equilibrium evolution. Here, we present an approach for computing optimal protocols based on automatic differentiation. Our methodology is applicable to complex systems and multidimensional protocols and is valid arbitrarily far from equilibrium. We validate our method by reproducing theoretical optimal protocols for a Brownian particle in a time-varying harmonic trap. We also compute departures from near-equilibrium behaviour for magnetization reversal on an Ising lattice and for barrier crossing driven by a harmonic trap, which has been used to represent a range of biological processes including biomolecular unfolding reactions. Algorithms based on automatic differentiation outperform the near-equilibrium theory for far-from-equilibrium magnetization reversal and driven barrier crossing. The optimal protocol for crossing an energy landscape barrier of 10kT is found to hasten the approach to, and slow the departure from, the barrier region compared to the near-equilibrium theoretical protocol.


Introduction
The control of nonequilibrium phenomena at microscopic scales is central to biology and nanotechnology.Evolution has exquisitely tuned cellular processes to perform out-of-equilibrium tasks, ranging from machines like ATP synthase to metabolic factories converting raw materials and energy into functional macro-molecules.Experimental advances allow phenomena on this scale to be probed in unprecedented detail [1,2,3], but determining precisely how specific processes work and the role of evolutionary optimization remains a major challenge.And while impressive progress has already been made engineering synthetic DNA [4,5] and protein [6,7] structures, we do not understand how to design de novo nanomachines for nonequilibrium environments well enough for nanotechnology to rival the complexity of cellular machines.For a microscopic system evolving out of equilibrium, thermodynamic quantities like entropy, heat, and work can be meaningfully defined only at the level of individual trajectories [8].An ensemble of trajectories has distributions of thermodynamic properties with forms that depend on the system's non-equilibrium evolution [8,9].This immediately suggests an optimization problem whereby a protocol λ(t) drives a system between given initial and final states to produce a desired distribution of thermodynamic properties.A common aim is to minimize average dissipated work.This is important for optimal bit erasure [10,11,12], as well as experimental measurements of the equilibrium free energy of biomolecules from nonequilibrium force pulling experiments and simulations [13,9,14].Other targets include protocols that maximize thermodynamic efficiency for synthetic [15,16] and biological [17] nanoengines and protocols minimize dissipated heat, for example in bit flipping operations [18,19].
Modelling nonequilibrium processes is notoriously difficult, even when the equations of motion are precisely known.A general method to optimize nonequilibrium driving protocols valid for systems of any complexity evolving arbitrarily far from equilibrium has heretofore not been elucidated.Existing work calculating optimal protocols has been limited to either extremely simple systems such a Brownian particle diffusing in a harmonic well or a single quantum dot [20,21,22,23], or applies only in the near-equilibrium regime [24,25,26].The assumption of near-equilibrium evolution restricts optimal protocols to free energy landscapes with low energy barriers, ruling out most systems of interest, such as RNA molecules with pseudoknots, proteins, and biomolecular motors like ATP synthase.
Inspired by recent computational advances in the machine learning community, we propose a method for computing optimal nonequilibrium protocols that is valid for complex systems evolving far-from-equilibrium.In particular, we leverage automatic differentiation (AD) [27,28,29,30], a technique for computing gradients that repeatedly applies the chain rule to elementary computational steps.AD optimization has been recently applied in a range of scientific contexts, from quantum devices to self-assembly [31,32].Using efficient AD algorithms developed in the context of training neural networks [33,34,35] in conjunction with sophisticated GPU (graphical processing units) and TPU (tensor processing units) hardware, we compute gradients by backpropagating through entire simulations, allowing us to find optimal protocols via gradient descent for a variety of systems.To illustrate the potential of this method, we here consider three canonical examples from the optimal protocol literature and drive evolution much farther out of equilibrium than previously possible.First, we consider Monte Carlo (MC) simulations and use AD to derive optimal protocols for flipping the magnetization of a 2D Ising lattice.The AD protocols perform similarly to existing near-equilibrium theoretical results in the linear regime and outperform the near-equilibrium theory in the far-from-equilibrium regime.Next, we treat molecular dynamics (MD) simulations, reproducing classic analytical results for a single Brownian particle in a time-varying harmonic potential.Finally, we examine barrier-crossing on a double-well potential driven by a moving harmonic potential, a problem that maps onto biomolecular unfolding processes [36,37].After recreating existing results, we probe the far-from-equilibrium regime of barrier crossing, demonstrating the capability of our method to capture departures from the near-equilibrium optimal protocols.

Differentiation of MC simulations: Nanomagnetic spin systems
Computers dissipate large amounts of heat when performing logical operations via bit erasure, which reverses nanomagnetic spins [38,39].This has motivated recent studies investigating minimum-dissipation protocols for magnetization reversal with the 2D Ising model [18,40,19].The system is described by the Hamiltonian where σ i = ±1 are the spins, ij indicates a sum over all nearest neighbour spins, J ij is the coupling between spins i and j, and B(t) is the (time-dependent) external magnetic field.
In the linear response (near-equilibrium) regime, Crooks and co-workers developed a general formalism for computing optimal protocols based on thermodynamic geometry [41,24].Rotskoff and Crooks applied this theory to yield the theoretical optimal protocol for varying external magnetic field B and spin-spin coupling strength J (equivalent to varying temperature) to reverse magnetization on an Ising lattice [18].More recently, Gingrich et al. [40] explored the same problem using a numeric approach, in which the space of low dissipation protocols is explored with a Monte Carlo scheme.This yields a number of degenerate, near-optimal protocols, but like the work of Rotskoff and Crooks [18] is limited by the assumption of near-equilibrium evolution.
Inspired by previous work, we examine the non-equilibrium magnetization reversal of a 2D periodic lattice of spins driven by a protocol that varies both magnetic field B(t) and temperature T (t): λ(t) = (B(t), T (t)), but push evolution beyond the near-equilibrium regime, benchmarking against the linear response formalism of Rotskoff and Crooks [18].
We seek to minimize the total entropy production ∆S, a proxy for the heat dissipated to the environment during the "bit flip" that quantifies the irreversibility of the process [9,42,43,44,40]: Here P F [x] is the probability of observing a particular trajectory x during the forward evolution of a system, and P R [x] is the probability of observing the exact time-reversal of that trajectory, x.
To find the optimal protocol for varying B(t) and T (t), we have written a Monte Carlo simulator using JAX [35], a python library with built-in automatic differentiation and just-in-time compilation.The code carries out standard Glauber dynamics [45] and iteratively updates the grid points with even then odd lattice index i using the spin flip probability where β = 1/k B T is the usual inverse thermal energy and the change in lattice energy resulting from the flip of spin σ i can be computed using the sum of its nearest neighbour spins j σ j : Our code compiles to run rapidly on GPUs or TPUs and is differentiable: given a Monte Carlo trajectory of spins under some protocol λ(t), we can compute the gradient of any function of the trajectory with respect the protocol.This gradient can be computed in either forward mode or reverse mode.In forward mode, the computational work for computing the gradient depends on the number of parameters characterizing the protocol λ(t).In reverse mode, the computational work is independent of the number of parameters but it is necessary to hold the entire trajectory and all intermediate derivatives in memory, which can be a significant constraint.
Here, the parameterizations we use for λ(t) are low dimensional relative to the length of the trajectories considered, so gradients are computed with forward mode differentiation.We have found it most convenient to parameterize the protocol using a Chebyshev polynomial basis where the T i is the i th Chebyshev polynomial and k is a hyperparameter (typically we choose k = 32), but many other parameterizations are possible.
The loss function ( 2) is computed for our trajectories as follows.Each time step in the forward evolution can be formulated as a sequence of two sub-steps: first, the external protocol parameters are updated (this is where external work is performed), and then the system performs a transition to a new microstate (this is where heat is exchanged with the bath) [42,43,44].If ρ A (x 0 ) is the probability of drawing microstate x 0 from the equilibrium distribution corresponding to state A, then the probability of observing the forward trajectory is given by where P (x i−1 → x i ; λ i ) is the transition probability between lattice states x i−1 and x i at protocol parameter values λ i .Correspondingly, the probability of observing the time-reversed trajectory x is Formulating the evolution as a succession of accepted and rejected spin flips and noting that for the Glauber transition probability (3), 1 − P (∆E) = P (−∆E), with ∆E is given by (4), we can combine ( 6) and ( 7) to obtain: Here, the products containing i include all spins that flipped successfully and those containing j are failed spin flips.Note that the probability for a spin not to flip is the same in the forward and reverse trajectories, while the ∆E terms for accepted spin flips along the forward and reverse trajectory differ in sign.The ratio of probabilities of drawing the initial and final states from their respective equilibrium ensembles, ρ A (x N ) ρ B (x0) , is given by ρ where Z B /Z A = 1 since the magnitude of the external field is identical (and equal to 1) in the initial and final states of our simulations.
Plugging (3) into (8) and rearranging (2), we find that the dissipation in our simulations is given by where we take a sum of the system energy changes ∆E i following successful spin flips i, multiplied by the inverse temperature β i at which the flip occurred.
We use JAX's automatic differentiation to compute the gradient ∇ θ ω of this dissipation.Due to the discrete choices inherent in Monte Carlo simulations, care must exercised in computing gradients.In particular, the dissipation depends on external parameters through discrete spin flip operations -dictated by a Metropolis-like acceptance criterion -which are not themselves differentiable; a similar issue arises in the context of training stochastic neural networks [46].Instead of directly backpropagating through the loss function, we proceed as follows.The average dissipation over all possible trajectories x(t) for external protocol parameters θ is given by (see also Ref. [47]): where Dx(t) is an integration over all possible trajectories, P [x(t); θ] is the probability weight associated with each trajectory, and ω[x(t), θ] is the dissipation (total entropy production) for each trajectory.
Applying the product rule and noting that ∇P = P ∇ ln P , the gradient of ω(θ) is Note that merely averaging the gradient ∇ θ ω over a batch of simulated trajectories does not yield the correct average, since the probability of observing a trajectory is itself dependent on protocol parameters θ.A similar approach to finding stochastic gradients is used for taking gradients in the REINFORCE algorithm [48], widely used in reinforcement learning.
We carry out N Monte Carlo simulations of the 2D Ising lattice evolving under assumed protocol λ(θ).For each trajectory, we compute the gradients ∇ θ ln P and ∇ θ ω.Plugging these into Equation (12) and averaging gives us a Monte Carlo estimate of the gradient of interest, ∇ θ ω(θ) .We then use the Adam optimizer [49] to minimize the loss (Eq.10).
Figure 1 (A) shows our lowest achieved average entropy production ω from AD protocols alongside ω for the near-equilibrium theoretical protocol of Rotskoff and Crooks [18] for seven different simulation lengths on a 32x32 lattice (τ =10, 50, 100, 500, 1000, 5000, and 10000 MC time steps).The longer the simulation, the closer the magnetization reversal is to quasi-static.ω values are averages over n=2560 trajectories.In all cases, AD outperforms the near-equilibrium theory.We repeated the t=50, t=100, t=500, and t=1000 simulations on multiple lattice sizes up to 512x512 and found the normalized entropy production on the 32x32 lattice is within 1% of its converged, infinite-lattice value; see SI Figure S3.The AD-derived protocols also outperformed the near-equilibrium theory regardless of lattice size.This suggests AD optimization can be effectively performed on smaller lattices to find optimal protocols for larger lattices.
Figure 1 (B) shows the optimal protocols corresponding to the simulation lengths in (A) along with the near-equilibrium theoretical curve of Rotskoff and Crooks [18].Curves for t=5000 and t=10000 are omitted as they were near-identical to the t=1000 curve.While the near-equilibrium theoretical protocol is necessarily time-symmetric [18], our protocols appear to break this symmetry; see SI Fig. S5.Like the near-equilibrium theoretical result, the AD protocols avoid the critical phase transition region [18], but they do not appear to be converging to the exact shape of the near-equilibrium theoretical curve as equilibrium is approached.We also observed a flat loss function for t=5000 and t=10000 over 1000 optimization iterations, as shown in SI Figure S2.The fact that our curves perform comparably to the near-equilibrium theoretical curve in the near-equilibrium limit, but are differently shaped, suggests that the entropy dissipation landscape is relatively flat in the region of optimal protocols, and that there is a degenerate space of nearly-optimal solutions.Indeed, all of the AD protocols in Fig. 1 (B) perform comparably well at longer simulation lengths; see SI Figure S6.This is in agreement with the findings of Gingrich et.al. [40] and noted by Rotskoff and Crooks [18], who predict 'weakly constrained' protocols in the non-critical region of (B(t), T (t))-space.trajectories (standard errors of the mean are smaller than the line widths).The AD protocols in each case were derived using the Adam optimizer [49] with gradients averaged over batches of N=256.AD protocols are able to outperform the optimal entropy production that results from using the near-equilibrium (NE) theoretical protocol of Rotskoff and Crooks [18] and approach the minimum dissipation value of Rotskoff and Crooks [18] as the near-equilibrium regime is approached.Inset: zoom-in of near-equilibrium region.(B) Optimal protocols for reversing magnetization for five simulation lengths: t=10, 50, 100, 500, and 1000.Forward evolution in time occurs along the counterclockwise direction.Curves for t=5000 and t=10000 are nearly identical to the t=1000 curve and are not shown.That the AD curves do not converge to the shape of the near-equilibrium (NE) theoretical curve, yet produce lower dissipation in (A), suggests a flat loss landscape in the optimal region in the near-equilibrium regime, in accordance with previous findings [18,40].

Optimal Protocols for Brownian Dynamics
We now consider the molecular dynamics (MD) of isothermal evolution of Brownian particles , where the total entropy production is equal to the dissipated external work (W D ) [9], which we use as our loss function in the following case studies.

Brownian particle in a harmonic potential
Some of the earliest work identifying optimal non-equilibrium protocols focused on the paradigmatic colloidal Brownian particle in a harmonic trap [50,22,20,47,8].Exact optimal protocols for varying the stiffness of the center and stiffness of the harmonic potential were found by Schmiedl and Seifert [22] using variational calculus.Strikingly, the solutions have discrete 'jumps' in the parameters [51,47].We note that approaches based on the linear response approximation are incapable of discovering these jumps since they assume protocols are differentiable [50,41].
We reproduce these early results by using JAX-MD [52] to automatically differentiate molecular dynamics simulations of a colloid subjected to the moving harmonic potential where here λ(t) is the time dependent position of the trap.We seek a protocol that that minimizes the total work dissipated in moving the trap from λ i to λ f .As before, and following Crooks [44], evolution can be formulated as proceeding in two stages: (i) the external protocol is updated and then (ii) the particle makes a random transition to a new state.External work is done in the first step, implying: Since the free energy is the same in the initial and final ensembles, this external work is equivalent to the 'dissipated work' W D .
We performed Brownian dynamics simulations using one of the sets of parameters considered by Schmiedl and Seifert [22]: ∆λ = λ f − λ i = 5; k B T = µ = 1, with k B T the usual thermal energy and µ the mobility of the colloid, and total simulation time of t f = 2.69 units, the time that theoretically yields the highest ratio between the work dissipated by a 'naive', linear trap protocol and the dissipated work corresponding to the optimal protocol [22].
To parameterize our protocols, we consider piecewise linear λ(t), specified by the values at 8 distinct time points.
Starting from an initial guess of a linear protocol, we perform optimization with Adam [49] on batches of N = 5000 MD simulations with learning rate α = 0.1, integration time step dt = 0.001, and an initial equilibration period of 0.1 simulation time units prior to trap motion, we are able to reproduce the exact theoretical optimal curve derived by Schmiedl and Seifert [22] within 100 optimization iterations, taking a few minutes on a GPU.Our calculation reproduces the theoretical ratio between the work dissipated by the optimal and linear protocols, W lin /W opt ≈ 1.14 for t f = 2.69 [22].
Our methodology also successfully reproduces the exact theoretical optimal protocol for varying the stiffness of a harmonic potential V (x, t) = λ(t)x 2 /2 within 100 optimization iterations; see Materials & Methods for details. Figure 2 summarizes our results.

Driven barrier crossing
We now turn to the more complex situation of driven Brownian motion on a bistable potential, a model used widely in soft matter to represent biomolecular unfolding via AFM or optical tweezers [53,54], as well as optimal protocols for bit erasure [55,12].Sivak and Crooks [36] consider a Brownian particle driven across a bistable potential energy landscape (see Figure 3 insets and Materials & Methods for details of the potential) by a harmonic trap with a time dependent minimum λ(t).The trap drives barrier crossing from one minimum to the other.
Following References [36] and [37], we performed molecular dynamics simulations of barrier crossing using parameters approximating DNA hairpin unfolding experiments with optical tweezers; see Materials & Methods for details.For quantitative comparison with previous work, we make the same simplifying assumptions as Sivak and Crooks [36]: (i) the free energies of the initial and final equilibrium states are equal and (ii) the two landscape wells have equal curvature.Note that our method does not require these assumptions.We proceed by calculating dissipated work with Eq. 14.We consider two free energy landscapes, with barrier heights 2. Figure 2: Automatic differentiation-based optimization rapidly converges to the exact theoretical optimal protocol for the cases of a Brownian particle in (A) a moving harmonic trap and (B) a trap with time-varying stiffness.The discrete jumps at the beginning and the end of the protocol found originally by Schmiedl and Seifert [22] are recaptured by our method.[36] in the near-equilibrium regime, corresponding to a 2.5 k B T barrier landscape.Inset: a 2.5 k B T barrier energy landscape.(B) The AD protocol is asymmetric and differs from the near-equilibrium (NE) theoretical result of Sivak and Crooks [36] in the farther-from-equilibrium regime represented by a 10 k B T barrier landscape.Upper right insets reveal a discrete jump at the beginning and end of the optimal protocol, in agreement with results for similar systems [22,20,51,59].Lower right inset: a 10 k B T barrier energy landscape.(C) Probability work distributions associated with the protocols in (A).AD ( W D = 0.104 ± 0.001 k B T) performs as well as the near-equilibrium (NE) theory ( W D = 0.104 ± 0.001 k B T) within error, and both outperform a naive linear protocol ( W D = 0.136 ± 0.002 k B T). (D) Probability work distributions for the protocols shown in (C).Here the AD protocol ( W D = 3.260 ± 0.007 k B T) significantly outperforms the near-equilibrium (NE) theory ( W D = 5.37 ± 0.02 k B T), which fails to unfold all particles and is thus also beaten by a naive linear protocol ( W D = 5.172 ± 0.009 k B T). from equilibrium.Here, a barrier height of 2.5 k B T (10 k B T) maps roughly onto the unfolding of a 6 (20) base pair DNA hairpin [56,57,58].
Figure 3 presents the results of using automatic differentiation-based optimization to find optimal trap protocols λ(t) for driven barrier crossing; landscape profiles are shown as insets.In the near-equilibrium regime (barrier height 2.5 k B T), optimizing over a batch of N = 2504 trajectories, our method converges to the near-equilibrium theoretical result of Sivak and Crooks [36] after 1000 optimization iterations, with most of the convergence achieved after a couple hundred optimization steps (taking a few hours on TPU).The shape of the optimal protocol -faster trap motion at the beginning and ends of the motion and a slowing down in the central barrier region -reflects the fact that the minimal work is dissipated if the trap slows down in the vicinity of the barrier to 'wait' for the system to harness thermal energy kicks to surmount it [36,37].
We compared the limiting probability distributions (across ∼1e5 MD simulations) of dissipated work for the nearequilibrium theoretical protocol, our result, and a naive linear protocol.The AD-optimized and near-equilibrium theoretical distributions agree within error, each having a mean work of W D = 0.104 ± 0.001 k B T. Both protocols outperform the naive linear protocol, which gives W D = 0.136 ± 0.002 k B T. Errors are standard errors of the mean.
The AD-based optimization allows us to probe far beyond the near-equilibrium regime (Figure 3 (C) and (D)).With a 10 k B T barrier landscape, the automatic differentiation-based optimal protocol outperforms the near-equilibrium theory.Here, our algorithm finds that a non-symmetric protocol is optimal, whereas linear theory necessarily predicts that it is symmetric [41].Intuitively, the trap needs to spend more time in the vicinity of the barrier to successfully 'drag' the particle along: the bimodal distribution of the near-equilibrium theoretical p(W ) in Figure 3 (D) reveals that not every particle successfully 'unfolds' under the near-equilibrium optimal protocol: some are left behind in the 'folded' state after the trap has completed its motion.These trajectories maximize the external dissipated work: after the trap stops moving, work can no longer accrue according to 14, even if the particle eventually hops to the unfolded well.In contrast, AD-based optimization finds a protocol that 'unfolds' all molecules in simulation time, leading to a significantly lower average dissipated work of W D = 3.260 ± 0.007 k B T compared to the near-equilibrium theoretical mean work of W D = 5.37 ± 0.02 k B T. Here, a naive linear protocol ( W D = 5.172 ± 0.009 k B T) also outperforms the near-equilibrium theory.
The AD-based λ(t) features discrete jumps at the beginning and end of the protocol that are absent in the linear response optimum (see upper insets of Figure 3 (C)).Discrete jumps have been observed in multiple other studies of minimum-dissipation protocols [20,51,59], and indeed were posited by Schmiedl and Seifert [22] to be a "generic feature of the optimal protocol for arbitrary potentials."Recent work corroborates the universality of jump features in optimal protocols [60].Since the near-equilibrium theory assumes protocols to be differentiable, it necessarily miss these features [41,24,10].
We note that although here we have focused here on the form of landscape studied in previous literature, our method allows the user to perform a similar barrier crossing optimization for virtually any free energy landscape, such as bespoke free energy landscapes containing nontrivial features -like intermediate states -that map to complex biomolecules.

Conclusion
We have demonstrated the viability of automatic differentiation (AD) to identify optimal non-equilibrium protocols for both Monte Carlo and molecular dynamics simulations.The method performed as well as existing near-equilibrium theoretical results in the near-equilibrium regime for both magnetization reversal on a 2D Ising lattice and driven barrier crossing.Critically, the AD algorithm easily extends to far-from-equilibrium conditions, where it significantly outperforms existing near-equilibrium theoretical protocols.
Our work considers fixed simulation times with one-and two-dimensional protocols, though the framework is much more general than this, and essentially arbitrary constraints and protocol parameters are possible, e.g.multidimensional external protocols and arbitrary loss functions.For example, one could attempt to maximize the accuracy of Jarzynskibased free energy landscape reconstructions for a given amount of experimental or computational time; optimize the speed or efficiency of nanoengines; or minimize the time taken to unfold a molecule.Further, in the JAX-MD code suite, non-Brownian dynamics can easily be simulated, opening up the possibilities of optimizing protocols for systems like the recently-proposed active-matter based thermodynamic engine [15].
The main limitation of AD-based protocol optimization at present is the high computational cost of backpropagating gradients through entire simulations.We are currently exploring strategies to mitigate this, including importance sampling of trajectories.Improving performance will allow more experimentally-realistic models of complex systems to be studied.
AD provides a valuable complement to existing near-equilibrium approaches to find optimal protocols, as it makes more complex systems and the far-from-equilibrium regime accessible.We are eager to see its manifold applications unfold in non-equilibrium protocol optimization and beyond.

2D Ising Model MC simulations
We implement Glauber dynamics [45] on a 32x32 lattice with periodic boundary conditions using the JAX Python code suite [35], and check our results using forward simulations on multiple other lattice sizes up to 512x512.Because the Hamiltonian (Eqn. 1) contains only nearest-neighbour spin interactions, we use a checkerboard update scheme [61] in which spin flips are proposed for all 'even' spins and then all 'odd' spins.Spin flips are accepted with the Glauber probability (Eqn.3).
As an initial protocol guess, we use a linear ramp for the external field and a quadratic curve for the temperature with fixed endpoints (B(0), T (0)) = (−1, 0.65) and (B(τ ), T (τ )) = (1, 0.65); see SI for details.The exceptions are our t=5000 and t=10000 simulations, for which we use the t=1000 and t=5000 AD protocols as initial guesses, respectively.Each of the B(t), T (t) protocols are parametrized as 32-degree Chebyshev polynomials capturing the deviation of the final protocol from the initial guess.
Gradients are clipped at norm = 1 and the Adam optimizer [49] is used with b1 = 0.9, b2 = 0.999, eps = 1 × 10 −8 , and a learning rate that begins at α = 0.1 and decays exponentially to α = 0.01 by the end of the optimization, which we run for 5000 iterations for the t=10, t=50, t=100, t=500, and t=1000 simulations.For the t=5000 and t=10000 simulations, the learning rate starts at α = 0.01 and decays exponentially to α = 0.001 over 1000 iterations.We compute gradients over batches of N=256 simulations.

Molecular dynamics simulations
We evolve a particle using the JAX-MD Python package [52] according to overdamped Langevin dynamics [62]: where r(t) is the particle's position; F (r(t)) are the forces arising from the potential; m is the particle mass; γ is the friction coefficient; k B T is the usual thermal energy; and η(t) is a random Gaussian-distributed noise term that simulates thermal coupling to the bath.

Brownian particle in harmonic trap
As noted in the text, we use µ = 1 mγ = 1, kT = 1, integration time step δt = 0.001, and run 100 steps with the Adam optimizer with with b1 = 0.9, b2 = 0.999, eps = 1 × 10 −8 and learning rate α = 0.1.Before beginning protocols, the colloid is equilibrated for 0.1 simulation time units, until its energy stabilizes.For the moving trap simulations: we use batches of N=5000 trajectories to estimate gradients; run for a total simulation time of t f = 2.69 simulation time units; set trap stiffness k = 1; and use λ i = 5 and λ f = 10.For the varying stiffness simulations, we use batches of N=100 000 trajectories; run for a total simulation time of t f = 0.5; and increase stiffness from λ i = 0.2 to λ f = 1.The average dissipated work is given by ( 12); however, the first term is zero for our Brownian evolution, as the randomness originates in Gaussian noise generated by a uniform probability distribution.This is analogous to the 'reparametrization trick' used in variational autoencoders [63].We therefore straightforwardly compute ∇ θ ω in our optimization.

Driven barrier crossing
We simulate Brownian motion on the bistable landscape where β is the usual thermal energy; κ L/R is the curvature of the left/right well; x is the particle position, ±x m are the distances from the potential minima to the barrier; and ∆E is the free energy difference between the 'folded' (left well) and 'unfolded' (right well) states.The full potential the particle is subject to is given by E m (x) plus a harmonic trap of stiffness k s with a time dependent minimum location λ(t): Following Sivak and Crooks [36], we set κ L m = κ R m = κ m , x m = 10nm, and ∆E = 0. We simulate two landscape barrier heights, E B = E m (0) − E m (x m ) = 2.5k B T and 10k B T. Setting the barrier height E B = E m (0) − E m (x m ) to be 2.5k B T and 10 k B T fixes the curvature of the wells to be κ m = 0.2627 and κ m = 0.8798, respectively.
In carrying out overdamped Langevin (i.e.Brownian) dynamics simulations according to (15), we again follow Sivak and Crooks [36] and use a diffusion coefficient, spring constant, and mass representative of the dielectric beads used in optical tweezer experiments: D = 0.44µm 2 /s, k s = 0.4pN nm −1 and mass = 1 × 10 −17 g.Our simulations are carried out at room temperature, k B T = 1 β = 4.114pN • nm, and friction coefficient γ = kBT Dm as per the Einstein relation.We use an integration time step of δt = 10 −6 s and equilibrate the particle for 0.001 s prior to beginning the protocol, which moves the trap from x = −10nm to x = 10nm in t f = 10 ms.
In practice, we found that using the REINFORCE method provided better convergence than straight computation of ∇ θ ω for the 10kT landscape; this effect has been noted elsewhere [64], and so we explicitly compute both terms in (12) for this case.
We optimize on batches of N = 2504 trajectories with the Adam optimizer and parameters b1 = 0.9, b2 = 0.999, eps = 1 × 10 −8 , and a learning rate that begins at α = 0.1 and decays exponentially to α = 0.001 by the end of the optimization, which we run for 1000 iterations.As an initial guess, we use a linear protocol.Our protocols are defined piecewise linearly between the first (t = 0) and second (t = δt) points and between the penultimate (t = t f − δt) and final (t = t f ) points and parametrized between δt and t f − δt using a degree 13 Chebyshev polynomial; this allows us to capture any discrete jumps at the beginning and end of the protocol.Figure 2: Convergence of the optimal protocol and entropy dissipation (loss) as a function of Adam optimization iteration for the (a) t=5000 and (b) t=10000 simulation lengths.For the t=5001 simulation, the t=1000 optimal curve is used as an initial guess and no significant decrease in loss is observed after N=1000 optimization iterations.For the t=10000 simulations, the t=5000 optimal curve is used as an initial guess, and again, no significant decrease in loss is observed after N=1000 optimization iterations.Gradients are averaged over batches of n=256 simulations.Here, the optimal protocol derived using automatic differentiation on a 32x32 lattice is used to run forward simulations on smaller and larger lattices to test whether optimizing on a 32x32 yields a good approximation of the infinite-lattice optimal curve.Forward simulations are also run using the near-equilibrium (NE) theoretical optimal curve of Sivak and Crooks [4].For each simulation duration plotted (t=50, 100, 500, 1000), the entropy production is normalized I. for lattice size, by dividing by lattice area, and II.for the fact that entropy dissipation decreases with for longer (i.e.closer-to-equilibrium) simulations, by dividing by entropy production for a t=50 simulation on a 4x4 lattice using the near-equilibrium (NE) theoretical optimal protocol of Sivak and Crooks [4].Top panel: A comparison between the entropy produced with the AD-derived optimal protocol (blue) and the nearequilibrium (NE) theoretical protocol (orange) for each simulation length, showing that the AD-derived protocols outperform near-equilibrium theory in the infinite-lattice limit; Bottom panel: zoom-in on the results of the AD-derived protocols, showing that the normalized entropy production has converged to the infinite-lattice limit.The 32x32 result, used in the main text, is shown as a red star, and agrees to within 1% of the infinite-lattice limit.Entropies are averages over N=2560 simulations.) for four different lattice sizes: 32x32 (featured in the main text), 128x128, 256x256, and 512x512, for both the near-equilibrium (NE) theoretical optimal protocol of Sivak and Crooks [4] and for the optimal protocol derived using automatic differentiation (AD) on 32x32 lattice simulations.Results for larger lattices collapse onto the 32x32 results, demonstrating that performing AD optimization on a smaller lattice provides a good estimation of the optimal protocol in the infinite lattice limit.6: Performance of optimal protocols derived for a specific simulation length (t=10, 50, 100, 500, 1000) on simulations of other lengths.Specifically, the optimal protocol found using automatic differentiation for t=10 magnetization reversal simulations was run on batches of n=2560 forward simulations of lengths t=50, 100, 500, and 1000, and similarly for the other simulation lengths.The 'bespoke' curve indicates the results of using the AD-derived protocol for the specific simulation length.Notably, far-from-equilibrium optimal protocols perform comparably to the near-equilibrium protocols in the near-equilibrium regime (the reverse is not true).Standard errors of the mean are less than line widths.7: Same as above, but zoomed in on the near-equilibrium region.Performance of optimal protocols derived for a specific simulation length (t=10, 50, 100, 500, 1000) on simulations of other lengths.Specifically, the optimal protocol found using automatic differentiation for t=10 magnetization reversal simulations was run on forward simulations of lengths t=50, 100, 500, and 1000, and similarly for the other simulation lengths.The 'bespoke' curve indicates the results of using the AD-derived protocol for the specific simulation length.Notably, far-from-equilibrium optimal protocols perform comparably to the near-equilibrium protocols in the near-equilibrium regime (the reverse is not true).Standard errors of the mean are less than line widths.9: Average cyclical dissipated work (solid lines, left ordinate) and half the variance in cyclical dissipated work (dotted lines, right ordinate) for barrier crossing simulations of various total lengths.For these cyclical simulations, a Brownian particle is dragged across its landscape and then back to its original position.The simulation times in the abscissa are half of the total cycle time, and longer simulations are closer to equilibrium.At or near equilibrium, the average cycle work is expected to equal half the variance [6].Thus, for simulations perfectly at equilibrium, we expect the dotted and solid lines to overlap.Here, the simulations on a 2.5k B T barrier satisfy the βW = β 2 σ 2 W /2 condition for simulations longer than 1ms, while the 10k B T barrier simulations begin to deviate from this condition already at 10ms simulation lengths.Here, a a batch of 5000 simulations is run at each simulation end time, and a linear trap protocol -not an optimal protocol -is used.All other parameters used are as described in Materials & Methods in the main text.The main text features simulations of length t=10ms, where the 2.5k B T barrier landscape dynamics are near-equilibrium and the 10k B T barrier landscape dynamics have begun to deviate from equilibrium.

Figure 1 :
Figure1: (A) Minimum entropy dissipation for different simulation lengths on a 32x32 lattice, averaged over n=2560 trajectories (standard errors of the mean are smaller than the line widths).The AD protocols in each case were derived using the Adam optimizer[49] with gradients averaged over batches of N=256.AD protocols are able to outperform the optimal entropy production that results from using the near-equilibrium (NE) theoretical protocol of Rotskoff and Crooks[18] and approach the minimum dissipation value of Rotskoff and Crooks[18] as the near-equilibrium regime is approached.Inset: zoom-in of near-equilibrium region.(B) Optimal protocols for reversing magnetization for five simulation lengths: t=10, 50, 100, 500, and 1000.Forward evolution in time occurs along the counterclockwise direction.Curves for t=5000 and t=10000 are nearly identical to the t=1000 curve and are not shown.That the AD curves do not converge to the shape of the near-equilibrium (NE) theoretical curve, yet produce lower dissipation in (A), suggests a flat loss landscape in the optimal region in the near-equilibrium regime, in accordance with previous findings[18,40].5 5 k B T and 10 k B T, corresponding to the near-equilibrium regime and a farther-from-equilibrium regime, respectively; SI Fig.S7contains details of how we quantified distance

Figure 3 :
Figure 3: (A)The automatic differentiation-based optimal protocol converges to the near-equilibrium (NE) theoretical result of Sivak and Crooks[36] in the near-equilibrium regime, corresponding to a 2.5 k B T barrier landscape.Inset: a 2.5 k B T barrier energy landscape.(B) The AD protocol is asymmetric and differs from the near-equilibrium (NE) theoretical result of Sivak and Crooks[36] in the farther-from-equilibrium regime represented by a 10 k B T barrier landscape.Upper right insets reveal a discrete jump at the beginning and end of the optimal protocol, in agreement with results for similar systems[22,20,51,59]. Lower right inset: a 10 k B T barrier energy landscape.(C) Probability work distributions associated with the protocols in (A).AD ( W D = 0.104 ± 0.001 k B T) performs as well as the near-equilibrium (NE) theory ( W D = 0.104 ± 0.001 k B T) within error, and both outperform a naive linear protocol ( W D = 0.136 ± 0.002 k B T). (D) Probability work distributions for the protocols shown in (C).Here the AD protocol ( W D = 3.260 ± 0.007 k B T) significantly outperforms the near-equilibrium (NE) theory ( W D = 5.37 ± 0.02 k B T), which fails to unfold all particles and is thus also beaten by a naive linear protocol ( W D = 5.172 ± 0.009 k B T).

Figure 1 :
Figure 1: Convergence of the optimal protocol and entropy dissipation (loss) as a function of Adam optimization iteration for the (a) t=50, (b) t=100, (c) t=500, and (d) t=1000 simulation lengths.The initial guess curves are quadratic in temperature and linear in magnetic field, and the AD curves in the text are the result of N=5000 optimization iterations.Gradients are averaged over batches of n=256 simulations.

Figure 3 :
Figure3: Normalized entropy production as a function of lattice size (4x4, 8x8, 12x12, 16x16, 24x24, 32x32, 40x40, 128x128, 256x256, and 512x512 shown).Here, the optimal protocol derived using automatic differentiation on a 32x32 lattice is used to run forward simulations on smaller and larger lattices to test whether optimizing on a 32x32 yields a good approximation of the infinite-lattice optimal curve.Forward simulations are also run using the near-equilibrium (NE) theoretical optimal curve of Sivak and Crooks[4].For each simulation duration plotted (t=50, 100, 500, 1000), the entropy production is normalized I. for lattice size, by dividing by lattice area, and II.for the fact that entropy dissipation decreases with for longer (i.e.closer-to-equilibrium) simulations, by dividing by entropy production for a t=50 simulation on a 4x4 lattice using the near-equilibrium (NE) theoretical optimal protocol of Sivak and Crooks[4].Top panel: A comparison between the entropy produced with the AD-derived optimal protocol (blue) and the nearequilibrium (NE) theoretical protocol (orange) for each simulation length, showing that the AD-derived protocols outperform near-equilibrium theory in the infinite-lattice limit; Bottom panel: zoom-in on the results of the AD-derived protocols, showing that the normalized entropy production has converged to the infinite-lattice limit.The 32x32 result, used in the main text, is shown as a red star, and agrees to within 1% of the infinite-lattice limit.Entropies are averages over N=2560 simulations.

Figure 4 :
Figure4: Normalized entropy production (all entropies are divided by the near-equilibrium (NE) theoretical protocol entropy dissipation for t=50) for four different lattice sizes: 32x32 (featured in the main text), 128x128, 256x256, and 512x512, for both the near-equilibrium (NE) theoretical optimal protocol of Sivak and Crooks[4]  and for the optimal protocol derived using automatic differentiation (AD) on 32x32 lattice simulations.Results for larger lattices collapse onto the 32x32 results, demonstrating that performing AD optimization on a smaller lattice provides a good estimation of the optimal protocol in the infinite lattice limit.

FieldFigure 5 :
Figure5: Symmetry breaking in the optimal protocols derived with automatic differentiation.Protocols are the result of automatic differentiation based optimization on a 32x32 lattice, for the four simulation lengths shown.By contrast, the near-equilibrium (NE) theoretical optimal protocol is necessarily time symmetric[4].Dotted lines are a guide for the eye and indicate what a symmetric reflection of the first half of each protocol would look like.In general, protocol symmetry increases as equilibrium is approached, as expected.

Figure
Figure6: Performance of optimal protocols derived for a specific simulation length (t=10, 50, 100, 500, 1000) on simulations of other lengths.Specifically, the optimal protocol found using automatic differentiation for t=10 magnetization reversal simulations was run on batches of n=2560 forward simulations of lengths t=50, 100, 500, and 1000, and similarly for the other simulation lengths.The 'bespoke' curve indicates the results of using the AD-derived protocol for the specific simulation length.Notably, far-from-equilibrium optimal protocols perform comparably to the near-equilibrium protocols in the near-equilibrium regime (the reverse is not true).Standard errors of the mean are less than line widths.

Figure
Figure7: Same as above, but zoomed in on the near-equilibrium region.Performance of optimal protocols derived for a specific simulation length (t=10, 50, 100, 500, 1000) on simulations of other lengths.Specifically, the optimal protocol found using automatic differentiation for t=10 magnetization reversal simulations was run on forward simulations of lengths t=50, 100, 500, and 1000, and similarly for the other simulation lengths.The 'bespoke' curve indicates the results of using the AD-derived protocol for the specific simulation length.Notably, far-from-equilibrium optimal protocols perform comparably to the near-equilibrium protocols in the near-equilibrium regime (the reverse is not true).Standard errors of the mean are less than line widths.

Figure 8 :
Figure 8: Convergence of the automatic differentiation-derived optimal protocols to the exact theoretical optimal protocols of Schmiedl and Seifert [5] for a Brownian particle in (a) a moving harmonic trap and (c) a harmonic trap with time-varying stiffness, as shown in the main text.Loss (dissipated work) convergence is also shown for (b) the moving trap and (c) the trap varying in stiffness.

Figure
Figure9: Average cyclical dissipated work (solid lines, left ordinate) and half the variance in cyclical dissipated work (dotted lines, right ordinate) for barrier crossing simulations of various total lengths.For these cyclical simulations, a Brownian particle is dragged across its landscape and then back to its original position.The simulation times in the abscissa are half of the total cycle time, and longer simulations are closer to equilibrium.At or near equilibrium, the average cycle work is expected to equal half the variance[6].Thus, for simulations perfectly at equilibrium, we expect the dotted and solid lines to overlap.Here, the simulations on a 2.5k B T barrier satisfy the βW = β 2 σ 2 W /2 condition for simulations longer than 1ms, while the 10k B T barrier simulations begin to deviate from this condition already at 10ms simulation lengths.Here, a a batch of 5000 simulations is run at each simulation end time, and a linear trap protocol -not an optimal protocol -is used.All other parameters used are as described in Materials & Methods in the main text.The main text features simulations of length t=10ms, where the 2.5k B T barrier landscape dynamics are near-equilibrium and the 10k B T barrier landscape dynamics have begun to deviate from equilibrium.

Figure 10 :
Figure 10: Loss versus Adam optimization iteration for the (a) 2.5k B T barrier and (b) 10k B T barrier optimal protocols shown in the main text.