Minimizing thermodynamic length to select intermediate states for free-energy calculations and replica-exchange simulations

In computational thermodynamics, a sequence of intermediate states is commonly introduced to connect two equilibrium states. We consider two cases where the choice of intermediate states is particularly important: minimizing statistical error in free-energy difference calculations and maximizing average acceptance probabilities in replica-exchange simulations. We derive bounds for these quantities in terms of the thermodynamic distance between the intermediates, and show that in both cases the intermediates should be chosen as equidistant points along a geodesic connecting the end states.


I. INTRODUCTION
Many computational thermodynamics applications involve the sampling of each of a sequence of ensembles lying between two equilibrium states. The choice of intermediate states can greatly affect the precision to which certain quantities can be derived from experimental measurements or computer simulations. In this work, we present results on how the intermediate states should be chosen in order to minimize the variance of computed free-energy differences ͑⌬F͒ and to ͑nearly͒ maximize average acceptance probabilities in replica-exchange ͑RE͒ simulations.
Estimation of ⌬F between two equilibrium states is an important problem with many applications, including the prediction of binding affinities of pharmaceutical compounds ͓1͔. The Bennett acceptance ratio method provides an asymptotically efficient estimator for ⌬F that uses equilibrium samples at the two states ͓2,3͔, but converges slowly unless the two states overlap significantly in phase space. The same problem applies to other free-energy perturbation methods, thermodynamic integration techniques, and various other approaches to the estimation of free-energy differences. It is thus common to introduce a sequence of intermediate states in which adjacent states overlap extensively and thus have a small variance in their estimated ⌬F. The intermediates should be chosen to minimize the total variance. Previous studies have suggested choosing intermediates so that the free-energy difference ͓4͔ or the entropy difference ͓5͔ between any two adjacent states is approximately equal; other heuristics have been proposed as well ͓6,7͔. In general, however, these choices do not minimize the total variance. To obtain well-converged estimate of ⌬F in practice, the intermediates are often determined by trial and error.
The choice of intermediate states is also important in RE simulations. RE enhances sampling by coupling the reference system to an "efficient sampler" ͓8͔, a replica of the system in a state where the sampling of the phase space is efficient, such as a state of high temperature ͑"temperature exchange"͒ or a state where specific energy barriers are attenuated ͓9-12͔. To allow more frequent exchanges between the efficient sampler and the reference system, intermediate replicas are introduced, with exchanges performed between adjacent replicas; we call such a sequence of replicas an RE schedule. For any given number of replicas, we wish to choose an RE schedule that maximizes the frequency of exchange between fixed end states. Extensive literature has been dedicated to the selection of intermediates in temperature exchange ͓13-17͔, but little is known in more general cases ͓18͔.
Suppose that each state of the system is characterized by a vector . The state of a gas, for example, might be represented by vector = ͑T , P͒, where the coordinate T specifies the temperature of the system and the coordinate P specifies its pressure. In a computer simulation, the coordinates might include parameters of the Hamiltonian. Selecting the intermediate states involves two steps: choosing a path in coordinate space and assigning the intermediates to specific points along this path.
In this work we analyze the choice of intermediates using methods from information geometry. Prior work has described a Riemannian metric on the space of coordinates , called the Fisher information metric or thermodynamic metric, as described below. The associated Riemannian length is called the thermodynamic length ͓19͔. It has been shown that the dissipation in a transition between two end states is minimized when following a geodesic-i.e., a path of minimal thermodynamic length-between them ͓20,21͔. We show here that geodesics also play an important role in selecting the intermediate states used in free-energy calculations and RE simulations: the choice of equidistant intermediates along the geodesic connecting the end states both minimizes the variance in the calculated free energy ͑Theorem 1͒ and provides an "almost optimal" RE schedule ͑Theorem 2͒.

II. PRELIMINARIES AND NOTATION
Suppose that the equilibrium states are determined by a vector = ͑ 1 , ... , d ͒, taking values in a convex set D ʕ R d . Let h be the Hamiltonian of the system at , multiplied by the inverse temperature ␤ = ͑k B T͒ −1 , where k B is the Boltz-mann constant and T is the temperature. The equilibrium probability density at a point x in phase space is p ͑x͒ = Z −1 exp͓−h ͑x͔͒, where Z = ͐e −h ͑x͒ dx is the partition function. Let f =−ln Z be the free energy multiplied by ␤, and ᐉ ͑x͒ =ln p ͑x͒ = f − h ͑x͒ be the log probability. In this notation, ␤ may be a component of . We will refer to h as the Hamiltonian and to f as the free energy of the system. We denote differentiation with respect to i by ‫ץ‬ i , and second derivatives by ‫ץ‬ ij . We assume throughout that we can differentiate with respect to i under the integral sign.
The Fisher information matrix g at is defined by where angled brackets denote expectation with respect to p ͑the second equality follows because ‫ץ͗‬ i ᐉ ͑x͒͘ =0͒. Under a few mild regularity conditions ͓͑22͔, 2.1͒, g is a well-defined positive-definite quadratic form varying smoothly with , endowing D with a Riemannian metric, called the Fisher information metric. The norm induced by g at D is denoted where ␥ = d␥ / dt. The Riemannian distance between a and b is defined as the length of the connecting geodesic, and is denoted L͑ a , b ͒.
The thermodynamic metric is closely related to the thermal fluctuations. This is most apparent when h is linear in , i.e., h ͑x͒ = ͚ i i X i ͑x͒, where X i are functions of the microstate x alone. In this case we say that the system is in Gibbs ensemble, and we have emphasizing the connection between the metric and fluctuations. We define the energy coordinates by U i ͑͒ = ‫ץ‬ i f = ͗X i ͘ ; in the coordinates U, the metric is given by g͑U͒ = g͑͒ −1 ͓͑22͔, 3.4, 4.1͒.
As an example, consider n molecules of ideal gas in the isothermal-isobaric ensemble. Here, h is given by the enthalpy multiplied by ␤: h͑x͒ = ␤E͑x͒ + ␤PV͑x͒, so the system is in Gibbs ensemble with respect to the coordinates ͑␤ , ␤P͒. The metric in these coordinates is where C V is the heat capacity and is the isothermal compressibility. Geodesics for an ideal gas are illustrated in Fig.  1. The connection between the thermodynamic length and fluctuations is intuitive in the energy coordinates, where the metric is g͑͗E͘ , ͗V͒͘ = g͑␤ , ␤P͒ −1 = ͑ var͑E͒ −1 0 0 var͑V͒ −1 ͒. The distance between two states differing in energy by d͗E͘ and in volume by d͗V͘ is dL = ͓d͗E͘ 2 / var͑E͒ + d͗V͘ 2 / var͑V͔͒ 1/2 ; the states are close if the fluctuations in energy and volume are large compared to the difference in their equilibrium values.

III. MINIMIZING VARIANCE IN FREE-ENERGY CALCULATIONS
We study the estimation of the free-energy difference ⌬f between two states a , b D, based on samples drawn from a sequence of intermediate states, 0 , 1 , ... , k , where 0 = a and k = b .
Theorem 1. Let ⌬f be an unbiased estimator for ⌬f based on a total of N observations. Then in the limit where the intermediates i are close, This bound can be achieved asymptotically as N → ϱ by choosing the i along the geodesic connecting a and b . Our argument will also show that to minimize the variance, the amount of sampling for the estimation of ⌬f i = f i+1 − f i should be proportional to L͑ i , i+1 ͒. In particular, when sampling equally at each i , the intermediates should be equidistant.
We first prove the theorem for a special case, which highlights that the thermodynamic length represents an inherent information-theoretical bound on the variance of a freeenergy estimator. Suppose that the system is in Gibbs ensemble and ⌬f͑x͒ is any unbiased estimator for ⌬f given an observation x drawn from b . Let Then we obtain an estimator for b , namely: b ͑x͒ = ⌬f͑x͒ − ͚ i ‫ץ‬ i f ͉ a · ⌬ i , whose variance is the same as that of ⌬f. Conversely, any unbiased estimator for b gives rise to an unbiased estimator for ⌬f with the same variance. Note that By the Cramér-Rao lower bound ͓22͔, using g͑U͒ = g͑͒ −1 and the invariance of the norm under coordinate transformations, we have Suppose now that we allow sampling at multiple intermediates and draw n i observations at each i . By Eq. ͑6͒, the variance for the estimation of ⌬f i is bounded by ʈ⌬ i ʈ i 2 / n i .
Minimizing subject to the constraint that N = ͚ i=1 k n i is a constant, we obtain that n i should be proportional to ʈ⌬ i ʈ i . In this case, assuming independent sampling at the various i , we have the bound This bound is rigorous. If the i are close enough and situated along a path ␥, we obtain in the limit k → ϱ Equality can be achieved asymptotically as N → ϱ, e.g., by using a maximum-likelihood estimator ͑the needed regularity conditions hold by our hypotheses on g ͓22͔͒. The variance is minimized when ␥ is a geodesic. We now turn to the general case, where the system is not necessarily in Gibbs ensemble. Consider the estimation of ⌬f without intermediates, based on a maximum-likelihood estimator given N samples. We assume for simplicity a sampling protocol where the two states are sampled equally; similar results hold for different ratios as well. The Cramér-Rao lower bound gives again an estimate for the variance ͓3͔, where the average is taken over all samples; the approximations are obtained by expanding to second order in ⌬ᐉ. In the limit of many samples, it follows from the protocol that ͗⌬ 2 ᐉ͘ → ͑͗⌬ 2 ᐉ͘ a + ͗⌬ 2 ᐉ͘ b ͒ / 2. Expanding in ⌬, the first-order term of ͗⌬ᐉ͘ a vanishes, so up to second order, where the second approximation follows directly from the definition of the metric. Combining this result with its counterpart for ͗⌬ 2 ᐉ͘ b in Eq. ͑9͒ we finally obtain This approximation becomes arbitrarily good as ⌬ vanishes. The theorem now follows in the same way as it does from Eq. ͑6͒.

IV. MAXIMIZING THE TOTAL ACCEPTANCE IN RE SIMULATIONS
Let p a , p b be the distributions defined by two replicas whose Hamiltonians are parametrized by a , b . The Me-tropolis acceptance probability for the exchange attempt ͑x , y͒ → ͑y , x͒ is min͑1, p a ͑y͒p b ͑x͒ p a ͑x͒p b ͑y͒ ͒, so the average acceptance probability is

͑12͒
Given a sequence of replicas along a path ␥, 0 = a , 1 , ... , k = b , we define the total acceptance by A͑␥͒ = ͟ i A͑ i , i+1 ͒. This is the probability that a state will propagate from 0 to k in k consecutive attempts. A͑␥͒ is not by itself an optimal measure for the efficiency of an RE schedule, as it does not take into account the number of replicas and the sampling quality. Other measures, such as the flux, the effective fraction, and the round-trip time ͓16,17,23͔, are more suitable for this end. Nevertheless, given k, maximizing A͑␥͒ is equivalent to optimizing these other measures. We note that the sampling efficiency of RE is only accurately reflected by the above measures under the assumption that at least one replica is a perfect sampler ͓17͔. Theorem 2. The total acceptance A͑␥͒ is maximized when the i are equidistant, and we have the bounds Once again, the bounds are maximized when ␥ is a geodesic. Even though sampling along a geodesic may not guarantee an optimal total acceptance, any other schedule cannot improve on it by a factor of more than exp͑L͑ a , b ͒ / ͱ 2͒.
For the proof, we examine the overlap integrals J q ͑ a , b ͒ ϵ ͵ 2 −1/q ͑p a q + p b q ͒ 1/q dx ͑q Յ 0͒. ͑13͒ J q is nondecreasing in q. We shall use the case q = −1 and the limiting case q → −ϱ, given explicitly by Using Jensen's inequality twice, we have: where ͑⌬ᐉ͒ is the standard deviation; by Eq. ͑10͒, ͑⌬ᐉ͒ a Ϸʈ⌬ʈ a . Up to second order, ͗⌬ᐉ͘ a Ϸ −ʈ⌬ʈ a 2 / 2, as can be seen by noting that ͗⌬ᐉ͘ a is the negative of the Kullback-Leibler divergence from p a to p b ͓͑22͔, 3.5͒. It follows that for a , b close enough, For J −1 , we write ln J −1 =ln͗2 / ͑1+e −⌬ᐉ ͒͘ a , expand to second order and use our previous approximations, . ͑17͒ Putting everything together, we obtain for a , b close enough and L = L͑ a , b ͒, To make connection to the average acceptance probability A͑ a , b ͒, note that we may regard D ϫ D as a manifold of distributions, where ͑ a , b ͒ corresponds to the product density p a ͑x͒p b ͑y͒. Defining J q on this manifold as before, it is clear from the definition ͑12͒ that The independence of p a ͑x͒ and p b ͑y͒ implies that the Fisher metric on D ϫ D is simply the product metric, given by If ␥ is a geodesic in D ϫ D, then by minimality the projections on each factor, ␥ 1 , ␥ 2 , are geodesics in D. It can be shown further that L DϫD ͑␥͒ = ͑L D ͑␥ 1 ͒ 2 + L D ͑␥ 2 ͒ 2 ͒ 1/2 . It follows that The proof of theorem 2 now follows by applying these bounds to all acceptances A͑ i , i+1 ͒ and maximizing the resultant bounds of A͑␥͒ subject to the constraint that the length of ␥ is constant.
As an example, consider temperature exchange. The Hamiltonian depends linearly on a single parameter, the inverse temperature ␤. The metric in the coordinate ln ␤ is g͑ln ␤͒ = C V ͑␤͒k B −1 , where C V ͑␤͒ is the heat capacity. If C V ͑␤͒ is constant, L͑␤ 1 , ␤ 2 ͒ is proportional to ͉ln ␤ 2 −ln ␤ 1 ͉. Hence, in order to obtain equidistant replicas, the temperatures should form a geometric progression, in accordance with previous results ͓13,14͔.

V. CONCLUSION
Previous studies have demonstrated examples in which well-chosen intermediates significantly improve the convergence of the estimated ⌬F ͓5͔ and the RE sampling efficiency ͓15͔. The present work shows how to select a nearly optimal set of intermediates in the general cases. We have shown that in the limit of a large number of intermediate states, choosing equidistant intermediates along the geodesic both minimizes the total variance in the estimate of ⌬F and guarantees that the total acceptance probability in RE simulations differs by no more than a system-dependent constant factor from the optimum. We suggest the following method to obtain such intermediates. First, conduct short simulations on a lattice in coordinate space and estimate the thermodynamic metric at each lattice point. After refining the lattice as needed, construct the shortest path between the end states using the metric estimates and select equidistant points along it as the intermediate states. Further development of efficient methods to estimate the thermodynamic metric will be of interest for purposes of determining accurate geodesics.