Role of Water in the Selection of Stable Proteins at Ambient and Extreme Thermodynamic Conditions

Valentino Bianco, Giancarlo Franzese, Christoph Dellago, and Ivan Coluzza Faculty of Physics, University of Vienna, Vienna 1090, Austria Secció de Física Estadística i Interdisciplinària–Departament de Física de la Matèria Condensada, Facultat de Física & Institute of Nanoscience and Nanotechnology (IN2UB), Universitat de Barcelona, Barcelona 08028, Spain (Received 21 October 2016; revised manuscript received 1 February 2017; published 26 June 2017)


I. INTRODUCTION
Proteins are molecules made of a sequence of amino acid residues that fold to target structures (native states) for a range of temperatures T and pressures P. Different proteins, with different sequences, have different T-P ranges of stability for their native state.As a matter of fact, many life forms survive under extreme T-P conditions [1,2], implying that their proteins fit that environment.Recent works [3][4][5][6][7][8][9][10] have evidenced that changes in T and P significantly alter the water-mediated hydrophilic and hydrophobic interactions of the residues along the chain, with an effect that depends on the sequence.However, a direct observation of how the protein selection responds to extreme changes in the aqueous conditions is still lacking.One of the reasons for this is that, for the few notable studies accounting for explicit water [11][12][13][14][15][16], an exhaustive sampling of the T-P stability region of the selected proteins is beyond the reach of current computers.Here, we fill this gap.We present a computational study based on a novel strategy that mimics the adaptation process of solvated proteins in a wide range of thermodynamic conditions.Our results reveal the border beyond which artificial protein design and natural evolution fail.Moreover, we show that high-T adaptation alone selects for superstable sequences that are highly resistant to both cold and pressure denaturation.Here, we define the proteins as superstable if their average stability region encompasses the average stability region of proteins designed at ambient conditions.Our results confirm, for the first time, the hypothesis that a strong stability of the folded state at high T corresponds to strong stability also at low T and high P [17][18][19][20].
We develop a protein design strategy that focuses on the relation between sequence and folded structure, allowing us to calculate the stability of many proteins in a wide range of T and P in explicit water, a formidable task that is currently infeasible for atomistic models.Following the standard approach introduced by Shakhnovich and Gutin [21,22], we adopt a coarse-grained lattice representation of proteins that is computationally effective and yields results that can also be extended to off-lattice proteins, as recently demonstrated for the implicit solvent case [23][24][25].The main difference between our strategy and the standard approach is that here, instead of considering the phenomenology of water implicitly, we explicitly include water in our coarsegraining.As an explicit solvent, we adopt a many-body water model that has recently been shown to provide relevant information on the role of water in the coldand pressure-denaturation mechanisms of proteins [9].

II. DESIGN APPROACH
Protein and solvent interactions are described by the Hamiltonian (see Appendix A for details) residue-residue interactions and the second term for the residue-hydrophobic or hydrophilic interaction with the solvent, respectively.The third term, H ðbÞ w;w , accounts for the water-water interaction in the bulk, including the isotropic van der Waals interaction, as well as the directional and cooperative components of the hydrogen bonds (HBs) [8,9,[32][33][34][35][36]. The last term, H ðhÞ w;w , describes the water-water interaction in the hydration layer-the first layer of water molecules in contact with the protein-and accounts for the effect of the protein's hydrophobic and hydrophilic residues on the interaction and structure of the hydration water [9].
By assuming that the protein interface affects the water properties in the hydration shell, we use a design strategy based on the enthalpy associated with the hydrated protein, regardless of the bulk contribution, , with a bias that enhances the sequence heterogeneity to avoid homopolymer design solutions [23,24,[29][30][31].In the above expression for H ðhÞ R , V is the total volume of the system and V b is the volume associate with the bulk water, i.e., the water not in direct contact with the protein.Since our optimization accounts for changes in the aqueous conditions, the enthalpies of each tested sequence must be averaged over the water configurations.We then envisage two possible design protocols based on measuring the average sequence enthalpies hH ðhÞ R i, where the angular brackets h…i refer to the thermodynamic average calculated over equilibrium water configurations, including the bulk.The first strategy consists in minimizing the average enthalpy hH ðhÞ R;f i of the hydrated protein in its folded (f) conformation.The second consists in maximizing the enthalpic gap ΔH ðhÞ R between the f state and the unfolded (u) protein conformation, both hydrated.As a u state, we use a completely stretched structure that is a representative of all equivalent unbounded configurations.For both protocols, the enthalpy of the hydration water is averaged over equilibrium configurations, accounting for the energy, density, and entropy of the solvent at the given thermodynamic conditions and protein interface.We refer to the two protocols as MIN ENTHALPY and MAX GAP, respectively.In order to quantify the solvent contribution, we compare the results obtained with these two protocols with those of a simpler method (IMPLICIT SOLVENT) based on the minimization of the residue interactions without any water contribution.

A. Thermal and pressure stability of proteins
We focus our attention on ten different target proteins with lengths between 30 and 90 amino acids that are compact enough that the exposed surface to the solvent is the smallest possible.For sake of clarity, here we show only the calculations for the 30 amino acid proteins.The same results hold for longer proteins.In each protocol, we design the protein at a given temperature T d and pressure P d and test its stability in a wide range of T and P [37].We choose T d and P d uniformly over values ranging from the liquidgas transition to the glass-transition temperature, and from negative pressure to ≃1 GPa.For each (T d , P d ) and each protocol, we identify 5-15 optimized sequences, for a total of more than 1.5 × 10 3 optimizations, a number far beyond the capability of any fully atomistic protein model.For each designed sequence, we test the stability in T-P by checking, with Monte Carlo simulations, if the average equilibrium conformation coincides with the protein nativetarget structure.Then, we average the stability regions of all sequences optimized for a given target-native state.For all designed proteins, we find stability regions that resemble those predicted by theory [38], previous numerical works [5,7,9,39], and those observed in experiments [18,[40][41][42][43][44][45][46][47][48][49][50][51][52] (Fig. 1).Throughout the paper, we use IU for T and P defined in Methods.We find that the boundaries of the stability regions extend from low T, where proteins undergo cold denaturation, to high T, where thermal denaturation occurs.The T range of stability depends on the protein sequence and size.Small proteins undergo cold denaturation for T < 0.1 IU [53], consistent with previous results [9].Longer proteins, with higher content of hydrophobic residues, cold-denaturate at higher T. All the sequences we consider are unstable above the threshold pressure P ≃ 0.7 IU (Figs. 5 and 6 in Appendix D), qualitatively consistent with what is observed in Refs.[40,54,55].We find that denaturation occurs also at low or negative P, consistent with recent findings [9,39,56].
Experimental data for different proteins [Fig.1(b)] show that the higher the thermal stability of the proteins, the higher the pressure-stability region [17,18].Our simulations have the same trend.In particular, we find that the stability region is more extended for sequences designed at higher T d and intermediate P d , corroborating the hypothesis that stability ranges in T and P are positively correlated: Proteins with a pronounced thermal stability are also more stable with respect to pressure.Furthermore, we observe that sequences designed at high T d are stable both at high and low T, while those designed at lower T d can fold only at low T. For example, sequences designed at P ¼ 0 and T d ¼ 0.1 IU are stable up to T 1 ≃ 0.3 IU, while to get a sequence that also folds at higher T 2 ¼ 0.4 IU, we need to design it at T d ¼ T 2 [Fig.1(a)].These results are consistent with experiments comparing the stability range of mesophilic and thermophilic proteins [57] and with data revealing the higher resistance of thermophilic proteins to cold denaturation [19,20].More generally, we find that proteins that are designed at high T d are superstable; i.e., they also show a remarkable stability at low T and at high P.

B. Water effect on the protein surface and core hydropathy profiles
We rationalize this fundamental property of protein evolution in terms of the relative fraction of hydrophobic and hydrophilic residues in the sequence and how it changes depending on the thermodynamic state of the solvent.We calculate the fraction P PHI surf ðT d ; P d Þ of hydrophilic (PHI) amino acids on the protein surface-number of hydrophilic amino acids on the surface divided by the total number of surface amino acids-and the fraction Q PHO core ðT d ; P d Þ of hydrophobic (PHO) amino acids into the protein core-number of hydrophobic amino acids divided by the total number of core amino acids (Figs. 2 and 3).The comparison between P PHI surf ðT d ; P d Þ and Q PHO core ðT d ; P d Þ characterizes the level of segregation of the sequence in the folded state.By segregation, we mean the tendency of the optimized sequence to have a surface exposed to the solvent that is rich in hydrophilic amino acids and a core that is abundant in hydrophobic amino acids.For all designed proteins and for both design protocols with solvent, we find that the optimized sequences at ambient conditions (P d ≃ 0.7 × 10 −4 IU, T d ≃ 0.35 IU) are mostly hydrophilic on the surface and hydrophobic into the core, as expected.However, upon lowering T d , both protocols with solvent generate protein sequences that, in their folded state, have a lower segregation between a hydrophilic surface and a hydrophobic core (Fig. 3).These proteins with less segregation in their sequence are less stable at high T [Fig.1(a)].
This finding is consistent with previous studies on thermophilic proteins, i.e., proteins stable at high T, showing that the higher thermostability is correlated with a stronger segregation between the polar surface [60,61] and the hydrophobic core [62,63].In particular, a systematic analysis of the hydropathy profile of thermophilic [58], mesophilic [58] (i.e., stable at intermediate T), and icebinding proteins (i.e., proteins that interfere with ice growth, Fig. 7) [59] reveals that these categories of proteins have, on average, compositions that are close to those that we calculate with our design strategies at high T (T ≳ 0.45 IU and P ≃ 0), ambient T (T ¼ 0.35 IU and P ≃ 0), and low T (T ≲ 0.15 IU and P ≃ 0).Indeed, the data [58] (Fig. 3) show that ≃78.8% of the surface of thermophilic proteins is hydrophilic and ≃54.5% of their core is hydrophobic, while for mesophilic proteins, the percentages are ≃78.4% and ≃54.7%, respectively.Moreover, our direct analysis of ice-binding proteins [59] (Fig. 7) shows that ≃67.0% of their surface is hydrophilic and ≃48.5% of their core is hydrophobic.Hypothesizing that the protein's composition is optimized at the environmental conditions at which the protein works, we observe that the real proteins have a segregation between hydrophilic surface and hydrophobic core that follows the same qualitative trend as our prediction (Figs. 3 and 8).Hence, both our theory and the real data show that the higher the temperature of stability of the protein, the stronger the segregation of the amino acid sequence into the hydrophilic surface and hydrophobic core, although an extreme segregation does not imply the stability of the folded state at higher T, as shown by the sequences generated with the implicit solvent design [Figs.1(a) and 3].
To get insight into the role of water in the evolutionary process of sequence segregation, we perform a detailed comparison of the results of the two design protocols MIN ENTHALPY and MAX GAP.We find that they select similar sequences at high T d but not at low T d .
In particular, at high T d and intermediate P d , e.g., T d ¼ 0.45 IU and P d ¼ 0.2 IU, both protocols select sequences with strong segregation between the hydrophilic surface and the hydrophobic core (Fig. 2).Instead, at low T d , e.g., T d ¼ 0.1 IU and P d ¼ 0.2 IU, the MIN ENTHALPY protocol selects sequences with a large number of surface hydrophobic residues with minor changes in the core with respect to the high-T d case [Figs.2(a) and 2(c)], while the MAX GAP protocol leads to proteins with a large number of hydrophilic residues in the core without major changes on the surface with respect to the high- These differences in the selection are a consequence of the fact that at low T the hydrophobic amino acids are more soluble [3,9,64,65] because the energetic gain of forming water-water HBs at the hydrophobic interface compensates for the loss of water-hydrophilic residue HBs [9].Hence, on the one hand, the first protocol minimizes the enthalpy of the f state by increasing the number of exposed hydrophobic residues.
On the other hand, the increased solubility at low T of a largely hydrophobic core would decrease the enthalpy of the u state and, as a consequence, the relative enthalpic gain of the f state.Hence, to maximize this gain, the second protocol selects sequences with a less hydrophobic core with respect to the high-T case and without changes of the surface composition, i.e., of the enthalpy of the f state.
Remarkably, both categories of sequences-those with less hydrophilic surface and those with less hydrophobic core-have similar stability at low T [Figs.1(a  segregated-sequences that are larger than at high T. Furthermore, we observe that proteins designed at low T d -e.g., at (T d ¼ 0.2 IU, P d ¼ 0)-have a stability range that includes ambient conditions [Fig.1(a)].Hence, the lack of sequence segregation found in naturally evolved proteins [66,67] is a way to maximize the number of folding sequences at ambient conditions, taking advantage of the T-P-dependent water properties.
This observation allows us to reconsider the usual understanding of the high number of hydrophobic residues on the surface of natural proteins as a necessary compromise between solubility and functionality [66,67].To clarify this point, we apply the IMPLICIT SOLVENT protocol, which uses implicit water without accounting for the water contribution to the enthalpy; we find that it generates only sequences that are highly segregated-with ≃90% hydrophilic residues on the surface and ≃55% hydrophobic residues in the core-and that are unstable at extreme conditions, as well as those designed at ambient T and P [Fig.1(a)] [68].Therefore, our results suggest that at extreme T and P, proteins are more stable if not completely segregated in their sequence, and this is the result of an adaptation process in water since water is a solvent that changes properties with temperature and pressure.

C. High-pressure design and stability
We now focus on the role of water in selecting sequences that fold at high P.While proteins designed at high T d are stable in their f state in a range of T > T d , we find that those generated by the protocols at high P d fold only at P < P d [Fig.1(a)].Furthermore, for P ≳ 0.7 IU, no protein folds.
These findings are surprising because, although the design protocols at high P d select for proteins that are more homogeneous than those generated at lower P d -with surfaces less hydrophilic and cores less hydrophobic (Fig. 2)-the generated sequences are such that the hydrated protein enthalpy of the f state is much less than that of the u state (Fig. 4, inset).Therefore, the larger instability of the folded protein is due to all the terms in the Gibbs free energy that are not explicitly included in our calculation of the hydrated protein enthalpy, i.e., to the bulk-water contributions.
To clearly show this result, we calculate, as a function of P and T, the denaturation free-energy gain Δ G ≡ G f − G u , where G f and G u are the free energy for the entire systemprotein, hydration water, and bulk water-calculated for the f state and for a completely stretched conformation representative of the u state, respectively.This quantity is given by the difference Δ G ¼ Δ H − Δ TS between the denaturation enthalpy gain Δ H ≡ H f − H u and the folding entropy surplus Δ TS ≡ TðS f − S u Þ, where H α (S α ) are the enthalpy (entropy) of the α state with α ¼ f, u.A positive value of Δ H favors the denaturation, while upon protein folding, there is an entropy surplus, Δ TS > 0, due to the larger bulk-water contribution.Hence, the balance Δ H − Δ TS regulates the protein stability, favoring the u state when Δ G > 0 and the f state when Δ G < 0.
We find that for proteins designed at high P d and low T d , e.g., P d ¼ 0.8 IU and T d ¼ 0.2 IU (Fig. 4), both Δ H and Δ TS become larger for increasing P at constant T, with the denaturation enthalpy gain increasing faster than the folding entropy surplus inducing the unfolding of the protein.The increase of Δ TS is expected because, for increasing P, the number of HBs decreases, making the entropy difference between the f and the u state larger when the water-protein interface decreases upon folding.On the other hand, our results (Fig. 4, inset) suggest that the protein and hydrationwater contributions to Δ H are negative for the considered range of P and T and are slowly changing with P at T ¼ 0.15 IU.Therefore, the increase of Δ H implies that the contribution coming from bulk water is increasingly larger for increasing P. We conclude that the proteins designed at high P d are unstable in their native state under pressurization because of the bulk-water contribution to the total free energy G of the system.This contribution completely dominates G for those pressures for which the number of HBs is vanishing-in our case, at P > 0.6 IU [69,70].
Hence, according to our simulations, both the cold-and the pressure-denaturation boundaries are due to the freeenergy contribution of the hydration and bulk water [71].This limit is a consequence of the specific features of the HB, representing a natural barrier above which folding is not expected.More importantly, such boundaries are also barriers for evolution and can be crossed only by introducing artificial interactions between the amino acids to stabilize the protein native state (Fig. 1) [72].

IV. CONCLUSIONS
We present a computationally efficient model capable of describing the artificial evolution of protein stability regions as a function of the thermodynamic properties of water.With our method, we study a large number of scenarios and demonstrate that the resulting stability regions are qualitatively similar to those of natural proteins.
Our results elucidate the role that water has in the selection process of protein sequences.In particular, we show that the maximum denaturation pressure ≃1 GPa above which all proteins denature in experiments is a consequence of the specific features of the hydration and bulk-water hydrogen bonds.
We adopt design protocols that mimic the natural selection of proteins and find that the proteins selected at high T d are superstable (Fig. 1), i.e., are stable in a wide range of T and P, while those selected at lower T d are stable only in a reduced range of T and P. We observe that proteins designed to be stable at different conditions of T and P are characterized by sequences with different degrees of segregation between the hydrophilic surface and the hydrophobic core (Figs. 2 and 3).The optimal degree of segregation is selected spontaneously as a consequence of the free-energy balance of the protein aqueous solution without the necessity of an external active process of selection.In particular, we find that the segregation decreases by decreasing T d , and at ambient conditions, it is moderate (≃70% of the surface is hydrophilic and ≃50% of the core is hydrophobic).This result is consistent with a trend observed in the composition of thermophilic, mesophilic, and ice-binding proteins (Fig. 3).The broader stability of high-T designed proteins implies that these segregated sequences are a subset of those designed at lower T d .
Furthermore, the general observation that many proteins expose the solvent to a high percentage of hydrophobic residues [66,67], as predicted by our model, suggests that such an exposure is not a compromise between stability and biological functionality of the proteins but rather a natural consequence of the water properties.As a matter of fact, selecting artificial sequences with an extreme segregation does not increase the stability of proteins but rather reduces it.We believe that our findings could potentially improve the engineering of artificial biopolymers since the aggregation can be prevented or enhanced in different thermodynamic conditions, according to the hydrophilic-hydrophobic ratio of the protein surface, although an experimental validation of the proposed design strategies remains crucial.
Putting the hydrophobic effect in an evolutionary perspective, our results substantiate an intriguing hypothesis: Many features observed in natural proteins generally arise when the selection process explicitly takes into account the thermodynamic properties of the solvent.

ACKNOWLEDGMENTS
We are thankful to L. Rovigatti, E. Locatelli, and M. Goethe for helpful discussions and suggestions.V. B. and I. C. acknowledge support from the Austrian Science Fund (FWF), Grant No. P 26253-N27.V. B. acknowledges support from the FWF, Grant No. M 2150-N36.G. F. acknowledges support from Spanish MINECO Grants No. FIS2012-31025 and No. FIS2015-66879-C2-2-P.The computational results presented here have been achieved using the Vienna Scientific Cluster (VSC).

APPENDIX A: PROTEIN-WATER MODEL
At any P and T, we partition the total volume VðP; TÞ of our system into N regular, nonoverlapping cells of volume v ≡ V=N ≥ v 0 , where v 0 is the water-excluded volume, each occupied by one of the N C protein's residues or by water molecules (Fig. 10).We adopt a coarse-grain representation of the protein that follows previous works [29][30][31] and has been extensively used in the literature to get a qualitative understanding of protein properties (e.g., Refs.[4,5,9,73,74]) and protein design [21,22,[75][76][77][78].For the solvent, we consider a coarse-grain model capable of reproducing in a qualitative way, at least, the properties of water [8,[32][33][34]36,79] and its contribution to the protein folding [9,35].
The Hamiltonian for the entire system is given by where each term is defined in the following.As in Refs.[29][30][31], we assume that the protein's residues have only nearest-neighbor interactions, where the indices i and j run over the residues and j 0 runs over the N W ≤ N − N C water molecules in the hydration shell [80]; C is a contact matrix, with C l;k ¼ 1, if l and k are first neighbors, and 0 otherwise; S ij are elements of the Miyazawa-Jernigan residue-residue interaction matrix S [29,81,82] (solvent independent-see Table 1 in Appendix C) accounting for the correlations between real amino acids, and for the protein-water interaction [9].
Each cell accommodates, at most, one molecule, with the average O─O distance between next-neighbor water molecules given by r ¼ v 1=3 .We associate a variable n i ¼ 1 BIANCO, FRANZESE, DELLAGO, and COLUZZA PHYS.REV.X 7, 021047 (2017) 021047-6 with each cell, if the cell i is occupied by a water molecule and has v 0 =v > 0.5, and n i ¼ 0 otherwise.Hence, n i is a discretized density field replacing the water translational degrees of freedom.The Hamiltonian of bulk water is The first term, with UðrÞ ≡ ∞ for r < r 0 ≡ v 1=3 0 ¼ 2.9 Å (water van der Waals diameter), UðrÞ ≡ 4ϵ½ðr 0 =rÞ 12 − ðr 0 =rÞ 6 for r ≥ r 0 , with ϵ ≡ 2.9 kJ=mol, and UðrÞ ≡ 0 for r > r c ≡ 3r 0 (cutoff), accounts for the O─O van der Waals interaction between molecules i and j.The sum runs over all possible water-molecule couples.
The second term represents the directional component of the HB interaction, where N ðbÞ HB ≡

P
hiji n i n j δ σ ij ;σ ji is the total number of bulk HBs.The sum is over each nearest-neighbor pair, and the argument is nonzero if the following conditions are satisfied: (i) n i n j ¼ 1, i.e., r ij ≤ 2 1=3 r 0 ¼ 3.6 Å, the maximum distance that is conventionally associated with a HB (while n i n j ¼ 0 otherwise), and (ii) δ σ ij ;σ ji ¼ 1, with δ ab ¼ 1 if a ¼ b, and 0 otherwise.Here, σ ij ¼ 1; …; q is a bonding index.Conventionally, a HB is broken if d OOH > 30°, implying that only 1=6 of the entire range of values [0,360°] for the d OOH angle is associated with a bonded state.Thus, choosing q ¼ 6, we correctly account for the entropy variation due to HB formation and breaking.With this definition, each molecule can form up to four HBs with its neighbors.Bifurcated HBs are excluded.
The third term accounts for the HB cooperativity due to the quantum many-body interaction [83,84] and leads to the low-P tetrahedral structure [85].The cooperativity is defined as the sum N ðbÞ coop ≡

P
i n i P lk δ σ ik ;σ il , over all the water molecules i and over all the lk pairs of the bonding indices σ il and σ ik of the molecule i.We choose J σ ≪ J to guarantee an asymmetry between the two HB terms.
The formation of a HB in the bulk leads to the local volume increase v ðbÞ HB =v 0 , with an enthalpic variation −J þ Pv ðbÞ HB , which accounts for the P-disrupting effect on the HB network.Here, v ðbÞ HB =v 0 represents the average volume increase between high-density ices VI and VIII and low-density (tetrahedral) ice Ih, and it is chosen as an approximation of the average volume variation per HB when a tetrahedral HB network is formed.Hence, the volume of bulk molecules is The presence of the hydrophobic or hydrophilic protein interface affects the water-water hydrogen bonding in the hydration shell [7,[86][87][88][89][90][91][92][93][94] (water molecules that are first neighbors of the protein amino acids).Hence, the Hamiltonian for water, including hydration molecules and the many-body effect on HB formation close to the protein interface, reads where Within the square brackets of Eq. (A5), we have explicitly indicated the contribution from the water molecules at the protein-interface (h) neighboring hydrophobic or hydrophilic residues.
The hydrophobic interface strengthens the water-water hydrogen bonding in the first hydration shell [7,[86][87][88][89][90] and increases the local water density upon pressurization [90,[95][96][97].Therefore, we assume J PHO > J for HBs between water molecules at the hydrophobic interface and J PHO σ > J σ for the cooperative component.This condition guarantees that the solvation free energy of a hydrophobic amino acid decreases at low T [65].Following Ref. [9], we express the average volume change per water-water HB at the hydrophobic interface as a series expansion in P up to the linear term v PHO HB =v PHO HB;0 ≡ 1 − k 1 P [98], where v PHO HB;0 is the volume change associated with the HB formation in the hydrophobic hydration shell at P ¼ 0 and k 1 > 0. Therefore, the volume contribution V PHO to total volume V due to the HBs in the hydrophobic shell is V PHO ≡ N PHO HB v PHO HB , where V PHO and N PHO HB are the hydrophobic hydration shell volume and number of HBs, respectively.We assume that the water-water hydrogen bonding and the water density at the hydrophilic interface are not affected by the protein since they are equal to the bulk-water values.Therefore, J PHI ¼ J, J PHI σ ¼ J σ , and HB .Moreover, the polarization effect of hydrophilic residues on the HB network in the hydrophilic shell is not included here because, in our coarse-grain description, it does not show a qualitative change in protein denaturation mechanisms [9], consistent with previous observations [99].A HB between two water molecules, one hydrating a PHO amino acid and the other a PHI amino acid, is formed with a coupling constant ðJ PHO þ J PHI Þ=2, and it leads to a local increase of volume ðv PHO HB þ v PHI HB Þ=2.Hence, the total volume is where V ðhÞ ≡ V PHO þ V PHI is the volume due to the HBs in the hydration shell.
In order to favor the visualization and the understanding of our results, we adopt a model representation in two dimensions [4,5,7,[75][76][77][78]100].We choose the parameters in Eq. (A1) in such a way as to get proteins that are stable for 250 ≲ T=K ≲ 350 and P < 1 GPa, consistent with experimental observations [40][41][42][43][44][45][46][47][48][49][50][51]54,55,[101][102][103].We express all the quantities in IU: adopting 8ϵ as the energy unit, v 0 as the volume unit, 8ϵ=k B as the temperature unit, and 8ϵ=v 0 as the pressure unit.Accordingly, we fix the interaction constants (all expressed in IU) J ¼ 0.3 and J σ ¼ 0.05 for bulk water, J PHI ¼ J and J PHI σ ¼ J σ for water at hydrophilic interfaces, and J PHO ¼ 1.2 and J PHO σ ¼ 0.2 for water at hydrophobic interfaces.Moreover, we fix k 1 ¼ 4 and v PHO HB;0 ¼ 2 (in IU).These choices have two effects: (i) They balance the residueresidue, residue-water, and water-water interactions and make the proteins stable for thermodynamic conditions comprised in the (stable and metastable) liquid phase, including ambient conditions; (ii) they account for the lower surface volume ratio in the two-dimensional system with respect to a three-dimensional one, enhancing the interface interactions.Changes in parameters combine in a nontrivial way, resulting in a shift, broadening, or reduction of the stability regions of proteins, but leaving the qualitative scenario unaffected [104].Lastly, since our preliminary results on the three-dimensional many-body water model [105] show a phase diagram qualitatively similar to the one in two dimensions [36], we expect that our findings will remain substantially unaltered in a more realistic threedimensional version of the model.

APPENDIX B: DESIGN AND FOLDING SIMULATIONS
The concept of protein design refers to an optimization scheme for the sequence of amino acids, aiming to maximize the probability of folding into a specific target conformation.The design simulation consists in a broad sampling of the space of sequences, on top of a fixed protein structure that defines the native-target conformations (Fig. 10).We perform Monte Carlo simulations, keeping P, T, N C , and N constant, with point mutation of the sequence and residue swapping moves with the following acceptance probability: where Δ is the difference between the new and the old configurations in hH ðhÞ R i or hΔH ðhÞ R i or H R;R , depending on the design protocol; T o ¼ 0.05 IU is the optimization temperature, N n P and N o P are the number of permutations for the new (n) and old (o) amino acid sequences, respectively; and ϵ p ¼ 14 is a weighting parameter.The term ðN n P =N o P Þ ϵ p is added to bias towards highly heterogeneous sequences, which are better folders [24,30].Therefore, we minimize the enthalpy of the folded structure via a Monte Carlo scheme with separate acceptance criteria for the water moves and the sequence moves.While the water is simulated at T d and P d , the sequences are sampled at low optimization temperature T o .In Fig. 8, we show that tuning ϵ p does not significantly affect the hydropathy profile of proteins, although a low value of ϵ p generates homopolymer solutions.We set T o ≪ T, the design temperature, because we look for sequences with either minimum or maximum values of the enthalpy.A Monte Carlo step consists in an attempt to modify the protein sequence followed by a number of water moves that is large enough to equilibrate the solvent around the fixed protein.For the MAX GAP design protocol, we also perform a simulation with a completely stretched protein, whose sequence is identical to the folded protein, in order to calculate the enthalpy difference between the two conformations (native and unfolded).For both the waterdependent design protocols, the protein enthalpy that we consider is hH ðhÞ R i ≡ hH p i þ hH ðhÞ w;w i þ PhV ðhÞ i.We perform the design for a large number of thermodynamic state points, sampling ½3; 5 × 10 8 independent sequences for each T, P and protein target-native structure.Each water-dependent design is performed on a grid of ≃90 different T-P points.Implicit water design is performed once per structure.We sample the averages hH ðhÞ R i and hΔH ðhÞ R i for each sequence over ≃10 2 water configurations.We sort, in ascending order, the sequences according to their values of hH ðhÞ R i, −hΔH ðhÞ R i, and H R;R , and consider, for characterization and stability analysis, only the top 5; 15 sequences from each list.Overall, we perform more than 1500 independent designs.
We test the validity of the design by folding the protein with Monte Carlo simulations at constant values of P, T, N C , and N. We start from a stretched protein conformation and, keeping the amino acid sequence, allow the protein to move using local corner flip, pivot, and crankshaft algorithms [106].Water is equilibrated using cluster moves [36].For each sequence and each state point, we calculate the free energy as a function of the number of native contacts.A protein is defined to be stable if, at the free-energy minimum, 90% of its native contacts are formed.This definition guarantees that a stable protein folds in its native state.Computing the free energy on a grid of values of T and P yields the stability region in the T-P plane.By averaging the stability region over all sequences designed at the same T d and P d , we calculate the average stability curve.

APPENDIX C: RESIDUE-RESIDUE INTERACTION MATRIX S
In Table I we report the interaction matrix between the amino acids.This matrix does not include solvent contributions, since these are explicitly stated in the Hamiltonians H R;w and H ðhÞ w;w , and has been scaled by a factor 2 to balance with the water-water HB interaction.Amino acids are indicated with letters following the FASTA code.The amino acids I, V, L, F, C, M, and A are assumed to be hydrophobic, according to the Kyte-Doolittle hydropathy scale [82].

APPENDIX D: ADDITIONAL FIGURES
In this Appendix we report supplementary figures.
w;w þ H ðhÞ w;w , where the first term accounts for the Published by the American Physical Society under the terms of the Creative Commons Attribution 4.0 International license.Further distribution of this work must maintain attribution to the author(s) and the published article's title, journal citation, and DOI.

FIG. 1 .
FIG.1.T-P stability regions of designed and experimental proteins.(a) Average stability regions (closed curves) within which the protein sequences designed with the MIN ENTHALPY protocol at different T d and P d fold into the native state.The regions for proteins designed at P d ¼ 0 (red, green, and blue curves), close to ambient pressure, are enclosed one into another with an extension that is proportional to T d .All of these regions enclose the stability region of proteins designed at higher P d (pink curve).The dotted line shows the stability region for a sequence calculated with the IMPLICIT SOLVENT protocol.The "glass-transition" (solid black) line defines the temperatures below which the system does not equilibrate.The long-dashed line represents the limit of stability (spinodal) of the liquid with respect to the gas phase.Pressure and temperature are expressed in internal units (IU).(b) Experimental stability region for different proteins indicated in the legend (adapted from Refs.[40][41][42][43][44][45][46][47][48][49]).The long-dashed line is the liquid-gas transition line.The data show a clear positive correlation between the T range and P range of stability.Pressure denaturation is observed in a range of −0.1 ≲ P=GPa ≲ 0.6, while stability at higher P is reached by introducing artificial covalent bridges between the amino acids, as in the case of Zn cytochrome c[43].In both panels, the shaded region corresponds to ambient conditions.

FIG. 3 .
FIG. 3. Hydrophilic profile of a protein surface designed at ambient pressure.The fraction P PHI surf ðT d ; P d Þ of surface hydrophilic amino acid decreases with the design temperature T d , for both the design protocols MIN ENTHALPY (black circles) and MAX GAP (diamonds).Data are from Figs. 2(a) and 2(b) at P ¼ 0.Our results qualitatively follow the trend observed for the average hydropathy of real thermophilic[58] (large red circle), mesophilic[58] (large green circle), and ice-binding proteins[59] (large blue circle), plotted at high T (T ¼ 0.45 IU), intermediate T (T ¼ 0.3 IU), and low T (T ¼ 0.15 IU), respectively, to ease the comparison.Such values of T resemble the average working temperature of real thermophilic, mesophilic, and ice-binding proteins.The hydropathy profile (independent of T d ) of sequences designed with implicit solvent (dashed line) is unable to reproduce the observed trend.
FIG. 2. Hydropathy profile of the designed protein surface and core.The color-coded contour plots of the fraction P PHI surf ðT d ; P d Þ of surface hydrophilic amino acid [panels (a) and (b)] and the fraction Q PHO core ðT d ; P d Þ of core hydrophobic amino acid [panels (c) and (d)] as a function of the design temperature T d and pressure P d , calculated with both design protocols, MIN EN-THALPY [panels (a) and (c)] and MAX GAP [panels (b) and (d)], show a stronger segregation between the hydrophilic surface and the hydrophobic core for proteins designed at high T d and low or intermediate P d .The straight (green) line in the lowerright corner of each panel marks the liquid-gas spinodal line of the solvent.T d and P d are expressed in internal units.

FIG. 4 .
FIG. 4. Isothermal free energy, enthalpy, and entropy variation.Inset: Color-coded (in IU) maximum value of hΔH ðhÞ R i found for the sequences designed with the MAX GAP protocol.Main panel: The variations of the denaturation freeenergy gain hΔ G ðPÞ − Δ G ð0Þi, of the denaturation enthalpy gain hΔ H ðPÞ − Δ H ð0Þi, and of the folding entropy surplus hΔ TS ðPÞ − Δ TS ð0Þi increase with P along the isotherm T ¼ 0.15 IU for a protein designed at P d ¼ 0.8 IU and T d ¼ 0.2 IU.We calculate hΔ H i as thermodynamic average over equilibrium configurations, hΔ G ðPÞ − Δ G ð0Þi ¼ R P 0 hV f ðPÞ − V u ð0ÞidP and hΔ TS ðPÞ−Δ TS ð0Þi¼hΔ H ðPÞ−Δ H ð0Þi−hΔ G ðPÞ− Δ G ð0Þi.

FIG. 5 .FIG. 6 .
FIG. 5. Average stability regions for proteins designed at constant pressure according to the MIN ENTHALPY protocol.The proteins are designed along the isobar P d ¼ 0.2 IU.We observe how designed sequences at high T d are more resistant to thermal fluctuations.

FIG. 8 .FIG. 7 . 10 FIG. 10 .
FIG.8.Hydropathy profile of proteins designed at ambient pressure for different values of ϵ p .Here, we report how the hydrophilic profile of the protein surface and the hydrophobic profile of the protein core, respectively shown in panels (a) and (b), changes as a function of T d , at ambient pressure, according to the parameter ϵ p , appearing in Eq. (B1).The data are calculated following the protocol MIN ENTHALPY.In all cases, the segregation between the hydrophilic surface and the hydrophobic core decreases with T.

FIG. 9 .
FIG. 9. Average stability regions for proteins designed according to the MAX GAP protocol.Average stability regions for proteins designed along the isobar P d ¼ 0 and the isotherm T d ¼ 0.2 IU, respectively, shown in panels (a) and (b).Arrows point at the design pressure P d ¼ 0 IU and to the design temperature T d ¼ 0.2 IU.

TABLE I .
Residue-residue interaction energy expressed in internal units.