Hidden Structure in Protein Energy Landscapes

Inherent structure theory is used to discover strong connections between simple characteristics of protein structure and the energy landscape of a Go model. The potential energies and vibrational free energies of inherent structures are highly correlated, and both reflect simple measures of networks of native contacts. These connections have important consequences for models of protein dynamics and thermodynamics.

described as vibrations about local minima, punctuated by transitions between neighboring basins. A key assumption in IS theory is that vibrations are similar about minima with the same potential energy; however, importantly, IS theory allows for diversity among vibrations that have different potential energies. Guo and Thirumalai [10] have used IS theory to analyze fluctuations in the neighborhood of the native state of a coarse-grained model of a designed four-helix bundle protein.
Baumketner, Shea, and Hiwatari [11] have applied IS theory to study the glass transition in a coarse-grained model of a 16-residue polypeptide; by IS analysis of molecular dynamics trajectories, they demonstrated the ability to rigorously calculate the glass transition temperature due to freezing in the native-state basin. In a more recent study, Nakagawa & Peyrard [12] used IS theory to analyze the energy landscape of a protein G B1 domain using a coarse-grained model, finding that the density of minima increases exponentially with the energy. Importantly, their analysis relied on an assumption that vibrations are the same about all potential energy minima. However, because proteins become less rigid as noncovalent bonds are broken [7], vibrations are expected to be different for different minima, especially for minima with different potential energies. Diversity in vibrations not only would change the density of minima, but also would have important implications for the kinetics of transitions among conformational substates [1]; however, if vibrations are the same for different minima, their role in determining the kinetics of transitions would be trivial.
To characterize the diversity in vibrations among different protein inherent structures, we used IS theory to analyze a coarse-grained Gō model of the same protein fragment considered by Nakagawa & Peyrard [12]: GB1, a protein G B1 domain (Protein Data Bank [13] entry 2GB1 [14]). GB1 has 56 amino acids and consists of a four-stranded β-sheet packed against a single helix.
A configuration x is represented by the set of C α positions, and the Gō model potential energy U(x) for C α configurations x of all proteins is similar to that used by the GB1 studies in Refs. [15] and [12]: (1) The first term in Eq. (1) is the contribution from neighboring backbone C α bond distances, the second is from angles between neighboring bonds, the third is from dihedral angles, the fourth is from noncovalent interactions between native contacts, and the fifth is from noncovalent interations between other pairs of atoms. The crystal structure was used as the reference structure to determine r i,0 , θ i,0 , and r ij,0 , with native contacts determined using a cutoff distance of 5.5Å. Other parameter values are 18ǫ 0 , and C = 4Å. The absolute energy unit ǫ 0 = 1.89 Kcalmol −1 is determined as in Ref. [15], assuming a folding temperature of 350 K. Frustration is introduced through the dihedral angle terms, which are not defined with respect to the reference structure.
Langevin dynamics simulations were performed using a time step 0.0007τ and a friction coefficient of 0.2/τ , where τ = 1.47 ps (following Ref. [15]). The collapse temperature T θ , at which extended and collapsed configurations are approximately equally likely, was located by analysis of the specific heat [12]. A single trajectory at temperature T θ with structures for analysis. Local minima e α , corresponding to inherent structures α, were found using conjugate gradient minimization terminated when a step resulted in an energy change of just 10 −12 . The protein exhibited multiple transitions between extended and collapsed states during the course of the simulation, and the inherent structure ensembles exhibited a bimodal probability distribution P IS (e α , T θ ) of collapsed and extended inherent-structure potential energies e α , similar to the distribution in Ref. [12].
Like a previous application of IS theory to proteins by Baumketner, Shea, and Hiwatari [11], we replace the configurational integral in the partition function for an isolated protein with a sum over contributions from individual inherent structures: which defines the vibrational free energy F v (α, T ). In Eq. (2), R(α) is the basin surrounding inherent structure α, Λ i is the thermal wavelength of atom i, and σ is a factor to account for symmetries.
Values of F v (α, T ), calculated as differences with respect to the native structure α = 0 (the same holds for values of e α ), were estimated at the collapse temperature T θ using a harmonic approximation, where λ (α) i is the i th eigenvalue of the Hessian h jk = ∂ 2 U/∂x j ∂x k calculated at the energy minimum corresponding to inherent structure α, and λ that the corresponding modes describe the bond vibrations. Therefore bond vibrations do not change significantly among different inherent structures. However, the total F v , which includes contributions from the lowest 2/3 of the eigenvalues, changes significantly with e α (Fig. 1). The assumption of constant F v by Nakagawa & Peyrard [12] therefore is only valid for the modes that involve bond vibrations. This result is consistent with studies of the loss of protein rigidity upon protein unfolding [7], and is also consistent with molecular dynamics studies suggesting that vibrations can be diverse for different protein conformational substates [16,17].
As demonstrated by the fit in Fig. 1, F v is well-modeled using the function Equation (4) is essentially a piecewise-linear function with slope k 1 for e α < e c , and slope Inherent structure theory [8,9] assumes that F v (α, T ) = F v (e α , T ) (validated for the present application in Fig. 1), and relates the vibrational free energy F v (e α ) and the probability distribution P IS (e α ) to the density of states Ω IS (e α ) through which generalizes a similar equation in Nakagawa & Peyrard [12] to values of F v (e α , T ) that vary with e α .
We used Eq. (6) along with the calculated P IS (e α ) and F v (e α ) from Eq. (4) to model the density of inherent structures Ω IS (e α ). At energies below e c , Ω IS (e α ) exhibits an exponential increase, but with a slight increase in the exponent factor above an energy e r , giving rise to a knee in the plot of log Ω IS (e α ) vs. e α (Fig. 2). The knee is located at a minimum in P IS (e α ) between the extended and collapsed states, and is thus associated with the transition state.
Such a knee was also seen in a previous model of Ω IS (e α ) that did not consider vibrations [12].
Above e c , Ω IS (e α ) plateaus and decreases at the highest energies, which is a consequence of the structure in both P IS (e α ) and F v (e α , T θ ). Rather than being exponential in form [12], from Eqs. [4] and [6], Ω IS (e α ) in this region has the form Ω IS (e α ) = e (1+k 2 )eα/k B T e (k 1 −k 2 )ec/k B T θ P IS (e α , T ) P IS (e 0 , T ) .
Because k 2 is close to -1 at T θ , the structure of Ω IS (e α ) closely resembles that of P (e α , T θ ) above e c . We found (Fig. 3) that e α is closely related to the number of broken native contacts,n α through the piecewise-linear function The slopes h 1 and h 2 correspond to the amount of energy required to break a native contact below (h 1 ) and above (h 2 ) a critical number of broken contactsn r . Data for GB1 are wellmodeled using h 1 = 0.997, h 2 = 0.622, andn r = 60. Belown r , breaking a native contact requires more potential energy than aboven r . Therefore,n r is associated with a change in protein stress.
There are interesting connections between the structure of Ω IS (e α ) below e c (Fig. 2) and the dependence of e α onn α (Fig. 3). The change in the slope of Ω IS at e r is closely related to the change in the slope of e α (n α ) atn r , suggesting that Ω IS has a simple exponential dependence onn r below e c . However, the knee in Ω IS occurs at e r = 47.4k B T θ , which is smaller than the value e α (n r ) = 59.6k B T θ at the knee in Fig. 3. While the density of inherent structures might truly be enhanced in the gap between e r and e α (n r ), we note that the shift of e r with respect to e α (n r ) might indicate that the inherent structure basins associated with the transition state are especially large (as noted above, e r is associated with the transition state), and that the harmonic approximation might be especially ill-suited to estimating their free energies for use in Eq. (6).
The source of the plateau in Ω IS (e α ) above e c may be understood in terms of the dependence of the free energy on bothn α and the number of residues m α for which all native contacts are broken. As shown in Fig. 4, a plot of m α vs.n α is well-modeled by the function supporting an expectation that breaking a contact is only likely to create a residue with no native contacts at highn α . The following simple model for F v then successfully captures the structure of F v in Fig. 1: with m α given by Eq. (10). Using ν = −0.32 and µ = −1.07 yields good agreement between values of F v obtained either directly from the Hessian or using Eq. (10), with a correlation coefficient of 0.993 for values calculated from all inherent structures. We conclude that the change in the slope of F v vs. e α at e c , and therefore the plateau in Ω IS (e α ) above e c , is associated with an increase in the likelihood that breaking a native contact will increase the number of residues with no native contacts.
We found that protein stress and rigidity are closely tied to the network of native contacts through Eqs. [8] and [10]. This finding is remeniscent of an analysis of protein folding by Rader et al. [7], in which the loss of network rigidity was associated with protein unfolding.
It is therefore tempting to associate the region between e r and e c in Fig. 2  The maximum in the density of states above e c is a consequence of considering diversity in vibrations, and is not observed when uniform vibrations are assumed [12]. Interestingly, a similar structure for the density of states, in which an exponential increase is followed by a maximum, has been observed for many structural-glass-forming liquids [9]. It will therefore be interesting to improve the estimation of the density of states by obtaining more accurate estimates of F v than are possible using a harmonic approximation [11].
Studies of two other Gō models of proteins yielded results that are similar to those found here for GB1 (unpublished results), suggesting the possibility that a simple phenomenological relationship between the network of native contacts and the energy landscape might exist for all Gō models. It will be interesting to explore this relationship for a large number of proteins and seek representations in which it is identical for different proteins. Discovery of such "universality" would enable the prediction of important properties of the energy landscapes of Gō models without performing numerical simulations.
It will be important to extend the present results to models whose energy landscapes exhibit more frustration than Gō models. For example, consider a modified model in which there is a weak attractive interaction for non-native contacts. In contrast to the simple relation illustrated in Fig. 3, in such a model, inherent structures with the same potential energy would likely have diverse numbers of native contacts. However, by extending the parameter space, the energy still might be simply related to a combination of both the number of native contacts and the number of non-native contacts. Similarly, the vibrational free energies might exhibit diversity among inherent structures with the same energy, but might still be simply related to both the number of native contacts and non-native contacts through an equation analogous to Eq. (10). Ultimately, it will be interesting to incrementally increase the complexity of the model, extending the present results (as far as computationally feasible) to realistic, all-atom models of proteins that include explicit solvent and other effects that are important in controlling protein function. Use of such all-atom models will enable the link between analyses based on IS theory and network rigidity to be further explored.
The present results demonstrate that simple connections to protein structure are hidden within the energy landscape of a Gō model. The potential energies and vibrational free energies of inherent structures are highly correlated, and both reflect simple measures of networks of native contacts. Through use of IS theory, these regularities should enable significant simplification of thermodynamic models of proteins [8,9,12].
We gratefully acknowledge Arthur Voter and Donald Jacobs for discussions. This work was supported by the Department of Energy. * E-mail: mewall@lanl.gov