Variationally Derived Intermediates for Correlated Free Energy Estimates between Intermediate States

Free energy difference calculations based on atomistic simulations generally improve in accuracy when sampling from a sequence of intermediate equilibrium thermodynamic states that bridge the configuration space between two states of interest. For reasons of efficiency, usually the same samples are used to calculate the step-wise difference of such an intermediate to both adjacent intermediates. However, this procedure violates the assumption of uncorrelated estimates that is necessary to derive both the optimal sequence of intermediate states and the widely used Bennett acceptance ratio (BAR) estimator. In this work, via a variational approach, we derive the sequence of intermediate states and the corresponding estimator with minimal mean squared error that account for these correlations and assess its accuracy.


INTRODUCTION
Free energy calculations are widely used to investigate physical and chemical processes [1][2][3][4]. Their accuracy is essential to biomedical applications such as computational drug development [5][6][7][8] or material design [9][10][11]. Amongst the most widely used methods based on simulations with atomistic Hamiltonians are alchemical equilibrium techniques, including the Free Energy Perturbation (FEP) [12] and Thermodynamic Integration (TI) [13] methods. These techniques determine the free energy difference between two states, representing, for example, two different ligands bound to a target, by sampling from intermediate states whose Hamiltonians are constructed from those of the end states.
The choice of these intermediates critically affects the accuracy of the free energy estimates [14][15][16] by determining which parts of the configuration space are sampled to which extent [17], thereby performing a function similar to importance sampling [18]. In addition, different estimators that determine the free energy differences between these intermediates and the end states have been developed, most prominently the Zwanzig formula [12] for FEP, the Bennett Acceptance Ratio method (BAR) [19], and multistate BAR (MBAR) [20].
We have recently derived [21] the sequence of discrete intermediate states that yields, for finite sampling, the lowest mean squared error (MSE) of the free energy estimates with respect to the exact value. Notably, minimizing the MSE accounts not only for the variance, but also for possible bias. The result differs from the most common scheme, which linearly interpolates between the end states Hamiltonians H 1 (x) and H N (x), respectively, along a path variable λ, H s (x) = (1 − λ)H 1 (x, λ) + λH N (x, λ), λ ∈ [0, 1] (1) where x ∈ IR 3M denotes the coordinate vector of all M particles in the system. Here, the additional λ argument of the end states Hamiltonians indicates the commmon use of soft-core potentials [22][23][24] to avoid divergences for vanishing particle. Other approaches involve the interpolation of exponentially weighted Hamiltonians of the end states, such as Enveloping Distribution Sampling [25] or the Minimum Variance path [26,27] for TI.
In contrast, the variationally derived intermediates (VI) turn out to be coupled and thus determined through a system of equations [21]. For the setup shown in Fig. 1(a), where all states are labeled by integers s with 1 ≤ s ≤ N , sampling is conducted in the intermediates with even numbered s, governed by the optimal Hamiltonian where r s,t = Z s /Z t denotes the ratio of the configurational partition sums of states s and t. Virtual intermediates, i.e., the ones without sampling, are labeled with odd s with 2 < s < N − 1 and indicated by the dashed lines in Fig. 1 being used to calculate the difference between two adjacent states, as indicated by the arrows in Fig. 1. Furthermore, using the virtual target states described by Eq. (2) is equivalent to using BAR directly between two sampling states [21,28], and, therefore, Eq. 16 also describes the optimal intermediates for BAR. However, for BAR and VI to be optimal for multiple states, the free energy estimates to the states above and below an intermediate in the sequence have to be arXiv:2007.14095v1 [physics.comp-ph] 28 Jul 2020 based on separate, uncorrelated sample points [21], as illustrated by the separate yellow points in Fig. 1(a) that we refer to as the regular FEP setup. Yet, it would be twice as efficient to use the same sample points in both directions, as illustrated by Fig. 1(b), and as generally done in practice. However, this introduces correlations between the estimates to both adjacent intermediates, thereby violating the assumptions underlying the derivation of Eqs. (2) and (3). Therefore, in this case the above variational intermediates are not optimal anymore. Due to these correlations, we refer to the Fig. 1(b) as the correlated FEP (cFEP) setup.
Here, we derive the minimal MSE sequence of intermediate states for and the corresponding estimators for cFEP that take these correlations properly into account. As will be shown below, what might seem as a minor technical twist, markedly changes the shape of the optimal intermediates and considerably improves the accuracy of the obtained free energy estimates.

THEORY
For the cFEP scheme shown in Fig. 1(b), using N states, we aim to derive the sequence of intermediate along similar lines as before [21]. Here, ∆G 1,N denotes the free energy estimate based on a finite number of sample points n, and ∆G 1,N the exact difference between the end states 1 and N .
The cFEP variant in Fig. 1(b) only uses sampling in the intermediate states. Setups that, in addition, involve sampling in the end states, can also be treated with the formalism below. However, firstly, as we have tested, the accuracy for a given computational effort does not increase in this case. Secondly, mixing two different types of sample points (the ones used to evaluate ∆H to only one adjacent state vs. to both adjacent states) further complicates the analysis.
For cFEP, the estimated difference is As in Fig. 1 The first two lines of Eq. (7) have already been processed in Ref. 21, but the last term differs. Previously, as in the regular FEP scheme in Fig. 1(a), these last expectation values were originally derived from independent sample sets and were, therefore, uncorrelated. In the present context of cFEP, however, these estimates are correlated. Therefore, the term needs to be split in two sums, distinguishing between the pairs with samples from the same state and the ones from different states, where the expectation value of the product between the two estimates based on different sample sets has been separated, as these are uncorrelated.
As we are only interested in the intermediates that optimize the MSE, and not in the absolute value of the MSE, we focus on the terms that will not drop out in the optimization below.
Continuing with the expression inside the sum of the first term on the right hand side of Eq. 8, e −(Hs−1(xi)−Hs(xi)) .
As in the derivation of Ref. 21, the Hamiltonians are now shifted by a constant offset C s , i.e., H s (x) = H s (x) − C s . This offset will cancel out for a given shape of an intermediate when calculating the accumulated free energy difference in Eq. 6. However, as the intermediate states will turn out to be coupled, these offsets do influence the shape of these intermediates. The offsets can now be chosen such that the terms inside the logarithms of Eq. (10) are close to one. In this case, = ∆G s ,(s+1) [21], and, therefore, the two linear terms arising from Eq. (10) can be expressed in terms of the exact free energy differences.
Next, the product of the two sums in Eq. 10 is split into terms based on the same and different sample points, respectively, where the terms that can be expressed solely based on (constant) free energy differences are summarized by the term f s . Again, the first two terms of Eq. (12) can be expressed in terms of the free energy differences between s and s + 1 as well as between s and s − 1, respectively. Collecting all terms arising from Eq. (7) MSE ∆G where the function g s serves the same purpose as f s and can be dropped in the optimization below.
The condition of small ∆G (n) s →(s+1) is fulfilled by setting C s = − ln Z s . By variation of the MSE from Eq. (13), where ν is a Lagrange multiplier, the optimal sequence of Hamiltonians is obtained. For s even, we obtain For s odd and 2 < s < N − 1: where, as in Eqs. (2) and (3), the ratios r s,t of the partition sums between states s and t have to be determined iteratively. The above sequence, Eqs. (15) and (16), that we refer to as the correlated Variational Intermediates (cVI), yield the minimal MSE estimates for cFEP. Figure 2 shows the resulting configuration space densities of the above intermediates for the example of a start state with a harmonic Hamiltonian, H 1 (x) = 1 2 x 2 , and an end state with a quartic one, H N (x) = (x − x 0 ) 4 . Panel (a) shows the VI that are optimal for the regular FEP scheme in Fig. 1(a). Panel (b) shows the cVI, optimal for cFEP.
The yellow areas in Fig. 2, Eq. (17), provide a simple measure of the configuration space density overlap K between the end states 1 and N , Here, K = 0 indicates two separate distributions without any overlap, and K = 1 full overlap, i.e., identical configuration space densities. The two rows in Fig. 2(a) and (b) depict the result for two different values of x 0 , and correspondingly, varying K.
As can be inferred from Eq. (15), for N = 3, H 2 (x) diverges at the points where p 1 (x) = p 3 (x), and therefore, p 2 (x) = 0 at these points, as can also be seen for the intermediate sampling state shown in Fig. 2(a). More generally, H 2 (x) of cVI "directs" sampling away from the overlap regions and towards the ones that are only relevant for one, but not both end states. For instance, the tails of the start state in the upper row of (a) are sampled more for cVI than for VI. For larger horizontal shifts of x 0 , i.e., low values of K, the two variants become increasingly similar, as the additional term in Eq. (15) with respect to Eq. (2) becomes smaller compared to the first term.
For N = 7 states, Fig. 2(b) shows the converged resulting configuration space densities. The case of x 0 = 0, as shown in (a), was omitted in (b) as the visualization is more difficult in this case due to the higher number of states. In (b), the additional changes from VI to cVI become more complex. As in (a), the sampling states have smaller densities p(x) in the overlap regions of the end states, but, in contrast to (a), still differ between VI and cVI for smaller values of overlap K. The reason is that while the overlap between the end states vanishes with decreasing K, an overlap between adjacent intermediate states remains that affects the shape of the intermediates. Note that the divergences mentioned above introduce in-stabilities in solving the system of Eqs. (15) and (16). Hence, for N > 3 the factor 2 of the additional term in the logarithm Eq. (15) has been replaced by a factor κ that was set to slightly below 2 (κ = 1.95) in case of Fig. 2(b). See Appendix A for details.

cBAR Estimator
As mentioned above, using the Zwanzig formula [12] to evaluate the free energy difference between two sampling states with respect to the virtual intermediate, Eq. (3), of VI is equivalent to BAR [21,28]. Correspondingly, the virtual intermediate defined by Eq. (16) of cVI also corresponds to an estimator, that is optimal for the sampling states of cFEP and that we will refer to as correlated BAR (cBAR).
To derive cBAR, we use the relation between the two approaches. Determining the free energy difference between two sampling states labeled s−1 and s+1 by using the virtual intermediate s to evaluate the difference between the adjacent states yields Using the approach of Bennett [19] instead, where w(H s−1 (x), H s+1 (x)) is a weighting function. From Eqs. (21) and (19) where the MSE of the resulting estimates is minimal if all C s,t ≈ ∆G s,t . A numerator of 1 in Eq. 22 would yield the original BAR result.
Note that H s−2 (x), and H s+2 (x), are also virtual intermediates determined by Eq. 16. As such, the result is a system of weighting functions, i.e., one for every pair of adjacent sampling states. The optimal estimate can, therefore, only be found by iteratively solving for the free energy estimates between all sampling states at once. In this regard, the procedure is similar to MBAR [20].

TEST SIMULATIONS
To assess to what extent our new variational scheme improves accuracy, we consider the one-dimensional system with a harmonic and a quartic end state shown in Fig. 2. Rejection sampling is used to obtain uncorrelated sample points. The free energy estimate, obtained from these finite sample sets, is compared to the exact free energy difference. The MSE, Eq. (5), is then calculated by averaging over one million of such realizations. With this procedure, different combinations of overlap K, numbers of states N and sample points n are considered.
We compare three variants. Firstly, using VI, Eqs. (2) and (3), with FEP, i.e., the scheme in Fig. 1(a). Here, the estimates to both adjacent states are based on separate sample sets and, therefore, not correlated. Secondly, also using VI, but now with cFEP, shown in Fig. 1(b). In contrast to variant 1, these estimates are based on the same sample sets and, therefore, correlated. In order to keep the total computational effort constant, the number of sample points per set (i.e., per yellow point in Fig. 1) is two times larger for cFEP than for FEP. Thirdly, using cVI, Eqs. (15) and (16), that accounts for these correlations, also with cFEP.

RESULTS
For N = 3 states, Fig. 3(a) shows the MSEs of the three variants for different numbers of sample points. Here, for the quartic end state, x 0 = 0, corresponding to K = 0.85, was used. The corresponding configuration space densities of VI and cVI are shown in the upper row of Fig. 2(a).
As can be seen, cVI with cFEP, shown by the dark blue line, yields the best MSE for all numbers of sample points except very few ones. The other two variants, i.e., VI with FEP (grey line) and cFEP (red line) yield very similar MSEs. As such, the gain in information from evaluating the Hamiltonians to both adjacent states for all sample points yields only a very small improvement compared to using separate sample sets for this purpose.
In order to quantify the improvement of cVI compared to VI for cFEP, Fig. 3(b) shows the ratio of the MSEs of the two variants, again in relation to the number of sample points per set. The dark green curve (K = 0.85), corresponds to the MSEs shown in (a) (i.e., the values of the red curve divided by the blue curve). The improvement in the MSE plateaus slightly above two for more than two hundred sample points per state. In addition, the improvements for setups with different overlap K between the end states are shown (orange to light green). This improvement becomes smaller for smaller values of K, but the qualitative dependence on the number of sample points remains the same.   Fig. 2(b). Here, VI with FEP and cFEP still yield similar MSEs, whereas cVI with cFEP, in contrast to N = 3, now yields the best MSE for all K. The improvement to VI ranges from around 20 % for low K, to around 50 % for large K. This is in line with the observation from Fig. 2(b) that the configuration space densities between VI and cVI become more similar but do not fully converge for a larger number of states in the limit of small K.
Lastly, the cBAR estimator can be used with any choice of intermediate states for cFEP. To assess how much the cBAR estimator improves the accuracy of free energy estimates compared to BAR for cFEP, we conducted test simulations where the sampling states were chosen as in Eq. (1), i.e., by linear interpolation between the Hamiltonians of the end states. Test simulations were conducted at varying values of K and at N = 5 and N = 7. Evaluating the MSE, we found a statistically significant improvement, however, only in the range of 1 − 2 % (data therefore not shown here). The improvement was independent of K and similar for both numbers of N .
Considering that the MSEs of cVI and VI can improve up to an order of magnitude compared to the linear intermediates defined in Eq. (1) (for a detailed comparison between VI and linear intermediates, see Ref. 21), the large majority of improvements is not due to an improved estimator, but due to the way samples are generated.

DISCUSSION AND CONCLUSION
In summary, we have derived a new variant of variational intermediates (cVI) that yield the optimal free energy estimate with minimal MSE when using the same sample points to evaluate the differences between the adjacent states above and below in the sequence (cFEP). This procedure is commonly used in free energy simulations, as it is computationally much cheaper to evaluate sample points at different Hamiltonians than to generate these. However, the resulting correlations between these estimates have not been considered yet.
Our test simulations for a one-dimensional Hamiltonian show that cVI with cFEP yields an improved MSE compared to the optimal sequence (VI) with FEP, i.e., using different sample points for estimates to states above and below in the sequence. For N = 3 states, the first variant improved the MSE by more than a factor of two for end states with high configuration space density overlap K, whereas at low K the MSEs were similar. For N = 7 states, the MSE improved between 20 % (low K) and 50 % (large K).
Interestingly, due to the correlations mentioned above, using VI with FEP yields only slightly worse MSEs for all K as using VI with cFEP, even though the latter involves twice as many evaluations of Hamiltonians from adjacent states. Only for cVI, thereby accounting for these correlations, the additional gain in information translates into a marked improvement of the MSE.
Similar to most other theoretical analyses and derivations of free energy calculation methods, we also needed to assume that all sample points within each intermediate state are uncorrelated. If atomistic simulations are used for sampling, the resulting time-correlations reduce the number of essentially independent sample points. Unfortunately, for our one-dimensional systems, cVI increases barrier heights, thereby increasing correlation times. We have so far not tested our method on any complex biomolecular systems, so it is unclear if these barriers can be circumvented or what the expected in-crease in correlation times is. However, to avoid such correlations between sample points in atomistic simulations, usually only a small subset of all sample points is used to calculate free energy differences. Based on our findings and in contrast to common practice, we therefore recommend to use different subsets to evaluate the free differences to different adjacent states.
The above derivation provides an example on how optimal intermediates and estimators with minimal MSE can be derived for different types of setups based on finite sampling that may help to incorporate a variety of assumptions and models into future theoretical approaches.
causes numerical instabilities in solving the system of Eqs. (15) and (16). Replacing the factor 2 in Eq. (15) in the logarithm with a factor κ, i.e., for s even, and setting, e.g., κ = 1.95, avoids these complications. As can be easily validated, the inside of the logarithm in Eq. 24 is larger than zero for 0 < κ < 2 for all H s−1 (x) and H s+1 (x). As shown for cVI in Fig. 2(b), κ < 2 prevents p s (x) to go to zero at the crossing points of p s−1 (x) and p s+1 (x) of the neighboring states, but is still lowered at these points.