Reliable extraction of energy landscape properties from critical force distributions

The structural dynamics of a biopolymer is governed by a process of diffusion through its conformational energy landscape. In pulling experiments using optical tweezers, features of the energy landscape can be extracted from the probability distribution of the critical force at which the polymer unfolds. The analysis is often based on rate equations having Bell-Evans form, although it is understood that this modeling is inadequate and leads to unreliable landscape parameters in many common situations. Dudko and coworkers [Physical Review Letters 96, 108101 (2006)] have emphasized this critique and proposed an alternative form that includes an additional shape parameter (and that reduces to Bell-Evans as a special case). Their fitting function, however, is pathological in the tail end of the pulling force distribution, which presents problems of its own. We propose a modified closed-form expression for the distribution of critical forces that correctly incorporates the next-order correction in pulling force and is everywhere well-behaved. Our claim is that this new expression provides superior parameter extraction and is valid even up to intermediate pulling rates. We present results based on simulated data that confirm its utility.


I. INTRODUCTION
The contribution of quantum processes notwithstanding [1], classical energy landscape theory [2][3][4][5] provides a useful framework for describing the evolution of biopolymers between various folded and unfolded configurations through a process of thermally driven escape from local confining potentials [6]. Developing tools of analysis within this framework has become ever more pressing, given the profound developments in single-molecule biophysics [7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22]. One of the key practical problems is how to infer the energy landscape, or at least a projection of it onto an appropriate reaction coordinate, from experimentally measured quantities. As is typical of inverse problems, recovery of the landscape from measured data is ill-conditioned: it is highly sensitive to experimental uncertainties and to any assumptions that go into the forward model.
In pulling experiments using optical tweezers [23], the determination of landscape features has historically been carried out based on Bell-Evans phenomenological theory [24][25][26][27][28], which assumes that the rate constant k(F) scales up exponentially with applied force from its unperturbed, intrinsic value k 0 according to the Ahrenius law, Here, β −1 = k B T is the thermal energy scale set by the aqueous environment; x ‡ is the minimum-to-barrier distance of the effective one-dimensional potential U 0 (x), a continuous (but not necessarily smooth) function of the end-to-end extension. A common experimental situation involves the application of a pulling force F = KVt that grows linearly in time until the rupture force F C is reached. Although other pulling protocols are sometimes employed [29][30][31][32][33], we focus on the case of constant pulling speed, and we ignore instrument-specific issues of compliance [34][35][36].
It is recognized that a description of pulling experiments based on the Bell-Evans formula for the force-induced rupture rate is in poor accord with results from numerical simulations [37]. The naive thermal-activation picture, represented by Bell-Evans, suffers from various inadequacies that are important to address. To begin, Eq. (1) is strictly applicable only in the limit of low pulling rate (KV KV min = k 0 /βx ‡ ) and ultra high barrier (∆G ‡ F x ‡ , k B T). Even in the moderate pulling regime, it incorrectly predicts the rupture force distribution. It also ignores self-consistency effects in the sense that it does not account for the fact that the distance x ‡ and the energy barrier ∆G ‡ are themselves force-dependent and both diminish with increasing F as the energy landscape is tilted. Nor does it properly account for the shape of the barrier, which plays a vital role in establishing the escape rate and the nature of the escape trajectory for more modest, biologically relevant barrier heights.
Consequently, there are many situations in which the phenomenological theory incorrectly predicts the results of pulling experiments. It tends to overestimate the rate of rupture k(F) at a given force F and to underestimate the mean and mostprobable rupture forces. Hence, when the Bell-Evans rate, k BE (F), is used as the basis for a fit to experimental data, the extracted parameters, ∆G ‡ , x ‡ , and k 0 , may be incorrectly predicted. Our main concern here lies in the reliable extraction of these physical quantities.
Attempts have been made to improve on Bell-Evans by introducing additional fitting parameters [38,39], sometimes in an ad hoc way. Dudko and coworkers have tried to make the analysis more rigorous [40]. They calculated k(F) and the corresponding probability density of the rupture force p(F C ) within the framework of Kramer's Theory [41] for two specific free energy surfaces-the cusp surface and the linear cubic surface-and showed that these two examples can be subsumed into a single result [appearing as Eq. (3) in Ref. 40], with interpolation provided by a shape parameter ν. This encompasses the Bell-Evans result, since k D (F) → k BE (F) arXiv:2001.02702v1 [physics.bio-ph] 8 Jan 2020 as ν → 1. It is clear, however, that for all ν 1 Eq. (2) has a dangerous point of nonanalyticity. The vanishing of the rate k D (F) → 0 as F → ∆G ‡ /x ‡ ν (for shape parameters in the range 0 < ν < 1) is manifestly unphysical; hence the Dudko expression is only appropriate for the pulling regime in which F ∆G ‡ /x ‡ ν. In fact, the region of validity is more constrained still, since we should further require that the escape rate grow with pulling force. As it turns out, the function k D (F) is monotonic increasing only for We pursue a different approach that produces no nonanalyticity and no obviously unphysical behavior. We compute log k(F)/k 0 order by order in the pulling force. Rather than truncate the expansion, we approximate the higher order terms as a resummation by geometric series-similar in spirit to the Random Phase Approximation or the infinite summation of ladder diagrams in many-body theory: Here κ ‡ is the reduced curvature of the well and barrier. The route to Eq. (4) is nothing more than a mathematical trick, but it rather elegantly cures the ill behavior of a truncated expansion, and it fortuitously leads to a closed-form expression for the cumulative probability distribution. Our attempts to benchmark Eq. (4) fall into two categories: prediction and parameter extraction, which correspond to the forward and inverse problems. In the forward direction, we determine the escape rates and the cumulative probability distribution of the critical force following the numerical method described in Sec. III. We compare the simulated behavior to the various analytical predictions. We find that our proposal outperforms the Bell-Evans and Dudko expressions, across many different choices of energy landscape and over a broad range of pulling rates. In the inverse direction, analytical forms for the cumulative probability distribution P(F C ) are fit to the simulated data to extract the optimal values of the intrinsic parameters k 0 , x ‡ , and κ ‡ .
The results we achieve are compelling. The values of the three parameters that we extract are in excellent agreement with the actual values that characterize the underlying energy landscape. Moreover, the agreement appears to hold over an unexpectedly large range of pulling rates, with KV/KV min spanning six or seven orders of magnitude.
In contrast, fits of simulation data to the Bell-Evans cumulative probability distribution, insofar as they are able to produce good values of k 0 and x ‡ at all, only do so at the very slowest pulling rates. It is difficult to speak definitively of how well Dudko's expression performs, since in that context fits must be carried in conjunction with a force cutoff somewhere below the point of nonanalyticity. This is an unwelcome complication. The cutoff itself introduces a significant element of uncertainty in the fit, since where best to put the cutoff cannot be determined if the landscape is not yet known.

Working Model
Folded The double-well potential has equilibrium positions at x l and x r , separated by a barrier at x b . A particle escaping from left to right experiences a barrier of height (Red) After application of a pulling force F, the energy landscape has tilted to favor the destination well on the right. Observe that the well positions have shifted and that the height of the barrier holding the particle in the left well has decreased.

II. FORMAL DEVELOPMENT
Kramers theory tells us that the escape rate depends weakly (polynomially) on the curvature at the bottom of the well and the top of the barrier but strongly (exponentially) on the height of the apparent energy barrier in the direction of travel [41,42]. We consider a double well potential U 0 (x), with wells at positions x l and x r separated by a barrier at x b (x l < x b < x r ), as illustrated in Fig. 1. The well escape rate from left to right is given by We allow for a pulling force F that tilts the potential landscape according to The corresponding rate equation becomes where δx l and δx r denote the shifts in the well positions as a result of the tilt. Taylor expansions of the extremal conditions and U(x l + δx l ) around x b and x l , respectively, up to second order in F, yields a rate equation of the form Here, x ‡ = x b − x l , and Successive terms in the expansion of log k(F)/k 0 have alternating sign, which is important for proper convergence of  7)]. The BE result grows exponentially without bound (but shows as a straight line because of the log-linear scale). The truncated expression turns over and becomes unphysical around 80 pN. The resummed form strikes a middle course, growing monotonically but saturating at a large, finite value, k 0 exp 2βκ ‡ (x ‡ ) 2 . the series. Indeed, there is no polynomial expression, arising as a truncation of the series at finite order, that does not either substantially over-or undershoot the true rate for large applied F. The negative-prefactor terms at even powers of F are particularly troublesome, because they lead to nonmonotonicity. As a workaround, we make use of the idea of infinite resummation, 1 − + 2 − · · · ≈ 1/(1 + ), which transforms Eq. (7) into Eq. (4), at least up to discrepancies at O(F 3 ). The transformed expression is well-behaved everywhere and displays no obviously unphysical behavior. See Fig. 2. Moreover, it leads to a closed-form expression for the cumulative probability distribution (with the correct normalization P(F C ) → 1 as F C → ∞; Dudko's expression, in contrast, cannot be properly normalized).
In the usual adiabatic limit, the expression for the cumulative probability distribution of the rupture force is given by Equations (1) and (9) together give the cumulative probability distribution of the rupture force as predicted by the Bell-Evans phenomenological model, If instead we put Eq. (4) into Eq. (9), we get a more complicated result, but one that is still simple enough to use for fitting (e.g., via Marquardt-Levenberg): The quantities F 1 and F 2 have units of force and are explicit functions of the critical value F C : The exponential integral Ei(x) = − ∫ ∞ −x dt t −1 e −t is a standard special function that is available in most data analysis software.
We now comment on the connection to the prior work of Dudko and coworkers. The unperturbed potential U 0 (x) can be expanded to quadratic order around the bottom of the well, U 0,l (x) = U 0 (x l ) + (κ l /2)(x − x l ) 2 , and around the peak of the barrier, where the two approximations take a common slope and match the functions smoothly there. The resulting piecewise composite curve has a total rise of which differs from the the true barrier height ∆G ‡ = U 0 (x b ) − U 0 (x l ) by a factor that Dudko labels 1/ν. That is, The equality ν = 2/3 holds for any degree-three polynomial. If the energy landscape is represented by a higher-degree polynomial, then the value of the shape parameter is idiosyncratic and should be viewed as drawn from a distribution with average 1/ν < 3/2. An advantage of working in terms of the shape parameter ν, rather than the effective curvature κ ‡ , is that the former can be defined even if the derivatives U (x l ) and U (x b ) vanish (e.g., a quartic well or barrier) or are not well defined (e.g., a cusp barrier). With Eq. (14) in mind, matching our resummed rate expression to that of Dudko order by order in the small-F limit produces This can be understood as a rewriting of Eq. (1), the Bell-Evans phenomenological rate, with an upward renormalization of the temperature, β → β/(1 + ν/4β∆G ‡ ), and a downward renormalization of the barrier distance x ‡ → x ‡ /(1 + νF x ‡ /4∆G ‡ ). Unlike Eq. (2), Eq. (15) is well-behaved everywhere.
In the case of an ultra-high barrier, defined by the double limit β∆G ‡ 1 and ∆G ‡ F x ‡ , Eq. (15) reduces to Eq. (1). For more modest barriers or higher temperatures, one or both of the factors (1 + ν/4β∆G ‡ ) and (1 + νF x ‡ /4∆G ‡ ) may differ appreciably from 1; this allows the rate expression to become aware of the details of the barrier's height and shape through the factor ∆G ‡ /ν.
The reliability of Eq. (15) was tested for six potential landscapes with different values of ν using the simulation scheme described in the next section. In every test example (see Fig. 3), our renormalized equation closely tracked the empirical escape rate determined from simulations. It noticeably outperformed the Bell-Evans and Dudko escape rate equations.

III. NUMERICAL SIMULATIONS
The reaction coordinate x was made to execute Langevin dynamics according to This was implemented using a modern reformulation [43] of the Verlet algorithm [44]. We mimicked the experimental situation by assuming stochastic motion of a molecule of effective mass m = 2 pg in a biquadratic potential. The data that appear in Figs. 4-7 correspond to the choice U 0 (x) = 4x 4 − 32x 2 + 64 (with x measured in nm and U 0 in pm). The molecule was assumed to be pulled from two ends along the reaction coordinate x by a laser potential with force constant K and pulling velocity V; i.e., with an instantaneous force F = KVt that in-creases linearly in time. For the given potential, the energy barrier was ∆G ‡ = 64 pN nm, the minimum-to-barrier distance x ‡ = 2 nm, and the effective curvature κ † = 42.7 pN nm −1 . The stochastic forces ξ(t) on the molecule were drawn randomly from a Gaussian distribution of width (2mγk B T δt) 1/2 with k B T = 4.1 pN nm, γ = 7 µs −1 , and a discrete timestep δt ranging from 10 −2 µs to 10 −6 µs. Pulling rates for the force F = KVt are measured with respect to KV min = k 0 /βx ‡ , which is the minimum rate for effectual pulling. Below KV min , the probability density p(F C ) is peaked at F C = 0; the particle escapes the well on its own before the applied force has appreciably modified the energy landscape. On the other hand, for rates above KV/KV min ≈ 10 6 , the barrier vanishes too quickly, long before the particle has moved any significant distance. Accordingly, we worked in the regime of pulling rates between these two extremes.
The simulation was initialized in the left well by drawing starting values of velocity and position x from the distributions e −βm 2 /2 and e −βU(x) Θ(x b − x), respectively, so that the each simulation began fully thermalized. The simulation flagged the value of pulling force at which x convincingly crossed the barrier or the barrier vanished; we took this to be the rupture or critical force F C . For each value of the pulling rate KV, the simulation was carried out 2500 times, each run generating a unique value of the rupture force. The cumulative probability distribution P(F C ) was constructed in the standard way-by sorting the measured rupture forces in ascending order and then pairing them with a uniform grid of values running from 0 to 1. The plot for P(F C ) so obtained was tested against Eq. (11) and against the Bell-Evans form, Eq. (10). The process was repeated for pulling rates ranging from KV = 10 −7 pN µs −1 to 0.6 pN µs −1 (roughly 1 KV/KV min 10 7 ) to determine how these expressions fare in the slow, intermediate, and fast pulling regimes.
In order to test parameter extraction, the original P(F C ) dataset for each pulling rate was bootstrapped [45] 100 times to generate 100 new instantiations. These data were fitted with Eq. (11) to extract the intrinsic parameters of the potential landscape: k 0 , x ‡ and κ ‡ . The spread in fit values was used to generate error estimates.
The datasets were also fitted to the Bell-Evans form given by Eq. (10) in order to extract the values of k 0 and x ‡ (κ ‡ does not appear in the Bell-Evans expression). The bootstrap-average values of the extracted parameters were compared to their known values. The theoretical intrinsic rate k 0 was computed according to the usual Kramers result, Our test potential corresponded to ω l = κ l /m = 8 µs −1 and ω b = κ b /m = 5.65 µs −1 . We verified that the theoretical value of k 0 = 1.192 × 10 −7 µs −1 was in agreement with numerical measurements of the escape rate for the nontilted energy landscape.

IV. RESULTS AND CONCLUSIONS
In the left column of Fig. 4, the rupture force distributions predicted by Eqs. (10) and (11) are compared to the results from simulation for three different pulling rates (corresponding to KV/KV min ≈ 10 1 , 10 3 , 10 5 ). For slow pulling (top row), the Bell-Evans theory and our resummed expression are wellmatched to each other and to the numerics. For intermediate pulling (middle row), the Bell-Evans result begins to deviate significantly, whereas our proposal continues to give accurate results (i.e., the solid green and red dotted lines coincide). Only at the highest pulling rates (bottom row) do we find significant deviation from the simulated rupture force distributions for both Eqs. (10) and (11); although, even there, our expression performs better and is in good agreement up to ∼ 25 pN.
It is instructive to look at the corresponding probability density of the rupture force, p(F C ) = P (F C ), obtained from Bell-Evans and our resummed form, as shown in the right column of Fig. 4. The Bell-Evans result systematically underestimates the pulling force required to traverse the barrier-and increasingly so for faster pulling. One observes that both its peak (typical rupture force) and its overall weight (mean rupture force) are positioned too far to the left (toward low force values). The same information is contained in the average critical force F C , which we obtained from the cumulative probability distributions, Eqs. (10) and (11), by numerical integration. Figure 5 shows a plot of F C as a function of the relative pulling rate. One can readily identify an intermediate regime (10 2 KV/KV min 10 5 ) in which the curve computed from the resummed rate tracks the true, numerically determined values of the average rupture force. In that same regime, the Bell-Evans curve deviates significantly.
The second part of the numerical analysis focussed on the inverse problem. Here, the simulated data were fitted using Eq. (11), and the intrinsic parameters of the energy landscape, viz., k 0 , x ‡ and κ ‡ , were determined by minimizing discrepancies between theory and data in the least-squares sense. The process was repeated for Eq. (10), but only k 0 and x ‡ (since κ ‡ does not appear in the fitting function). We found unambiguously that the parameter extraction is much more reliable using our resummed form. Indeed, use of the Bell-Evans theory was often quite misleading, because it would produce an apparently good fit that corresponded to incorrect values of the landscape parameters.
The top panel of Fig. 6, which shows a fast-pulling example with KV = 4 × 10 −2 pN µs −1 , emphasizes this point. The cumulative probability distribution appears to be equally well fit by Eqs. (10) and Eq. (11). The middle and bottom panels reveal this to be illusory. In the Bell-Evans analysis, the value of k 0 is systematically overestimated and x ‡ underestimated, and both ever more so as the pulling rate is ramped up. On the other hand, the analysis based on our resummed form yields values consistent with the correct landscape parameters. Moreover, even at low pulling rates, where Bell-Evans performs not too badly, our proposal is more reliable and produces less scatter in the parameter values.
We remark that fits of the simulation data to Eq. (11) yield astonishingly good values of κ ‡ , the effective curvature. See Fig. 7. In almost every case, regardless of pulling rate, the predicted value of κ ‡ coincides with the true value. This suggests to us that our inclusion of higher-order corrections in the rate equation plays an important role in improving the overall quality of the parameter extraction.
To conclude, our work highlights the known inadequacies of the Bell-Evans phenomenological well escape rate. It also suggests that the celebrated equation due to Dudko and coworkers is not an adequate fix. We propose a new expression, Eq. (4), that improves on Bell-Evans by including beyond-Arhenius contributions from the shape of the energy potential. Equation (4) clearly outperforms the Bell-Evans and Dudko expressions in terms of predicting the well escape rate (as is evident from Fig. 3). Crucially, it avoids the unphysical behavior that plagues Dudko's rate equation at large pulling force.
Of particular utility is that Eq. (4) integrates to give a manageable, closed-form expression for the cumulative probability distribution. The resulting Eq. (11) is straightforward to implement as a fitting function and can be incorporated into existing workflows with little additional effort. Rigorous numerical tests (illustrated in Figures 6 and 7) confirm that fits to Eq. (11) can be used to reliably extract the parameters that characterize the underlying energy landscape. , is plotted alongside the best fits for Eqs. (10) (blue dashed line) and (11) (red dotted line). The near indistinguishability of the curves illustrates the strong tendency toward overfitting. The Bell-Evans expression, though ill-suited for describing the behavior at this high pulling rate, is able to mimic the numerical data-but at the cost of producing fitting parameters that have drifted far from their true values. This is in contrast to the poor agreement in the bottom-left panel of Fig. 4, where there is no fitting and the known values of k 0 and x ‡ are used. Estimates of the intrinsic escape rate k 0 (middle panel) and the barrier distance x ‡ (bottom panel), as determined from fits of Eqs. (10) (blue crosses) and (11) (red diamonds) to simulation data over a range of pulling rates, are plotted alongside the actual value (green line). The plotted points (red diamonds) are estimates of the reduced curvature κ ‡ , as determined from fits of Eq. (11) to simulation data. They compare favorably to the true value (green line) over a range pulling rates spanning many decades.