Dimensional reduction in evolving spin-glass model: correlation of phenotypic responses to environmental and mutational changes

The evolution of high-dimensional phenotypes is investigated using a statistical physics model consists of interacting spins, in which genotypes, phenotypes, and environments are represented by spin configurations, interaction matrices, and external fields, respectively. We found that phenotypic changes upon diverse environmental change and genetic variation are highly correlated across all spins, consistent with recent experimental observations of biological systems. The dimension reduction in phenotypic changes is shown to be a result of the evolution of the robustness to thermal noise, achieved at the replica symmetric phase.

Biological systems generally consist of a huge number of components. Biomolecules (proteins) consist of a large number of monomers (amino acids), whereas cells consist of a variety of proteins, mRNAs, and other chemicals. Despite such high-dimensionality, however, there is growing evidence that the responses of phenotypes to external changes are often restricted to a low-dimensional subspace.
For instance, the concentrations of a huge variety of components such as mRNAs and proteins have been recently measured against a variety of environmental stresses. The changes in the (logarithmic) concentrations of mRNAs or proteins are found to be correlated [1][2][3] or proportional [4][5][6] across all components, against a variety of environmental stresses. This global proportionality suggests that phenotypic changes against environmental perturbations are constrained along a one-or low-dimensional manifold, a manifestation of a drastic dimension reduction from the high-dimensional composition space [7,8]. Indeed, such dimension reduction would be rather universal in biological systems, as reported in studies of protein dynamics [9], ecological systems [10], and neural learning dynamics [11]. This global proportional change is also extended to the evolutionary dimension. Changes in each concentration upon genetic mutation and those upon environmental perturbations are also highly correlated [12][13][14][15][16]. It has been recently conjectured that such dimension reduction is a consequence of the evolution to achieve functional phenotypes that are robust to perturbations. Although some evolution simulations of catalytic-reaction networks support this conjecture [7,17], thus far, the concept remains an intuitive sketch, and an underlying mathematical structure remains elusive.
At this moment, a statistical physics approach would be useful to address the question of if and how the dimension reduction evolves. Previously, we studied a sta-tistical physics model of spins, whose stochastic change is governed by a Hamiltonian that includes the two-body spin-spin interaction J i,j under thermal noise, specified by the temperature [18,19]. In the model, the following correspondences are taken: phenotypes → spin configurations {S i }; rule to shape the phenotype → Hamiltonian for spin-spin interaction H = − i,j J ij S i S j ; environmental condition → external field h i to each spin in the Hamiltonian. The evolution process is introduced by the "mutation" in J i,j and a selection according to the fitness defined from the spin configuration. By evolving the Hamiltonian under a certain temperature, we have previously demonstrated the evolution of Hamiltonians to shape phenotypes to be robust to perturbations at an intermediate temperature, whereas replica symmetry breaking (RSB) was demonstrated to lead to a nonrobust phenotype at a lower temperature.
By taking advantage of this spin model and evolving it under a certain temperature, one can investigate if the dimension reduction in phenotypic changes, as observed in biological systems, is formulated and understood in terms of statistical physics. Specifically, we focus on the following questions: (i) Are high-dimensional phenotypic changes against various environmental changes correlated? (ii) Are the changes induced by environmental and genetic changes correlated? (iii) If the above two correlations are observed, are they a result of dimension reduction from a high-dimensional phenotypic space, shaped by evolution? (iv) Finally, within what range of temperature are the above questions answered affirmatively? In other words, is the appropriate noise relevant to the evolution of dimension reduction? By answering these questions, we will elucidate the origin of dimension reduction in terms of statistical physics, in possible relationship with replica symmetry (breaking). Now, we define a spin-statistical physics model for phenotypic evolution, in which the phenotype is denoted by spins S = [S 1 , · · · , S N ] ∈ {−1, +1} N . The dynamics of the spins are given by the stochastic dynamics, prescribed by the Hamiltonian H as where superscript T denotes the transpose, and J ∈ R N ×N is a symmetric matrix whose diagonal components are zero. With this Hamiltonian, the spin dynamics with discrete time t is given by the transition probability where S (t) is the phenotype at step t, and The inverse temperature β = T −1 describes the stochasticity of the phenotype expression process. The elements of the interaction matrix are chosen as J ij ∈ Ω J (i = j) with Ω J = {−1/ √ N , 0, 1/ √ N }, and J ii = 0 (i = 1, · · · , N ). This matrix represents the genotype, which evolves over generations, as will be described later.
The fitness is generally given as a function of phenotypes, i.e., the spin configuration. Here, we assume that a part of the spins, named targets i ∈ T , contributes to the fitness, such as the active site residues of protein. As more of the target spins have the same value +1 or −1, the fitness ψ(J ) is higher, as defined as where N T is the size of T , and · · · denotes the average over the trajectories of the phenotype expression dynamics, which depend on genotype J . The evolution to select genotypes with higher fitness is represented by the following stochastic update rule with discrete time, represents the selection pressure; as T J decreases, only the genotypes with sufficient fitness survive to the next generation.
We mainly describe the results for N = 100 and ρ ≡ N T /N = 0.05, unless otherwise mentioned. For the phenotype dynamics eq.(2), we adopt the Markov Chain Monte Carlo (MCMC) method with detailed balance condition. After a sufficient number of updates, the distribution of S is expected to converge to the equilibrium distribution, P (S) ∝ exp(−βH(S|J )), for a given genotype. We numerically calculated the thermal average over t f = 2×10 4 MC steps, after discarding the initial t i = 10 4 steps.
At each generation g, The candidates of genotype J (g+1) are generated by introducing the mutations with probability p µ = 0.05. The values of J ij (i = j) change into one of the components in Ω J \J ij with equal probability, where A\a denotes the members of A, excluding a. We numerically update genotypes over generation g max at T J = 0.05. [20]. Without a loss of generality, hereafter, we set the target sites as T = {1, · · · , N T }. We numerically obtain 100 genotypes evolved at ρ and T with different initial conditions, and the set is denoted as J (T ).
First, we present the existence of three phases that depend on T [18,19]. Fig.1(a) shows the temperature dependence of the averaged fitness over J (T ). At T > T c1 , the fitness value is at the level expected by the random spin configuration, which indicates that the target spins do not show any order, and we define the phase T > T c1 as paramagnetic phase. The high-fitness phase is separated into two phases at T = T c2 . This phase boundary is characterized by the convergence of the belief propagation (BP) algorithm [21]. From the correspondence between the replica analysis and BP algorithm, it is expected that the convergence limit of the BP algorithm corresponds to the replica-symmetry-breaking transition [22]. As shown in Fig.1(b), the fraction of J ∈ J (T ), in which the BP algorithm does not converge within 10 5 steps, increases from zero at T c2 . Hence, the phases T c1 ≤ T ≤ T c2 and T > T c2 correspond to the replica symmetric (RS) and RSB phases, respectively. Now, we discuss if the response to different environmental conditions is correlated or not, depending on the phase. Hereafter, we study the symmetry breaking local magnetization µ i = sign(m T )S i , considering the Z 2 symmetry. Under the infinitesimal external fields, the difference between expression patterns δµ where χ ij (h, J ) = ∂µ i (h, J )/∂h j is the susceptibility. We regard eq.(5) as the response of the i-th component to the additional external field, for a system with genotype J subject to external field h. For simplicity, we consider the case that an external field δh i , whose i-th component is δh i ( = 0), otherwise 0, is applied to the system at h = 0. The first-order response of the j-th component to δh i is χ ji (0, J ). At the equilibrium, where · h means the average according to the equilibrium distribution under the external field h; P (S) ∝ exp(−βH(S|J ) + βh T S).
We numerically compute χ ij by MCMC simulation as Here, we ignore the responses of µ 1 and µ 2 to remove the trivial strong response directly to δh 1 and δh 2 itself. In Fig.1(e), T -dependence of the correlation coefficient {χ i1 } · · · {χ i,N T } is shown, which is averaged over J (T ). The correlation between the responses to external fields δh i (i ∈ T ) is discernible in the RS phase [23]. Next, we study the correlation between responses to the environment, δµ where M i,jk = ∂µ i (J )/J jk , which corresponds to β( S i S j S k − S i S j S k ) at the equilibrium. For the comparison between δµ i (J ), we assume that the components of δh and δJ independently follow a Gaussian distribution with mean 0 and variance for δh, and variance / √ N for δJ , respectively. The expected squared responses are given by where E δh [·] and E δJ [·] denote the average over δh and δJ , respectively, and χ i = j =i χ 2 ij , and M i = N −1 j<k,j,k =i M 2 i,jk . The quantities χ i (h, J ) and M i (J ) correspond to the spin-glass susceptibility and These numerical simulations indicate that the evolution under thermal fluctuation that leads to the RS phase induces the correlations between the responses. To understand the emergence of the correlation, we decompose the evolved genotypes into eigenvalues and eigenvectors as J = ΞΛΞ T , where Λ ∈ R N ×N is a diagonal matrix consisting of eigenvalues Λ ii = λ i (λ 1 ≥ λ 2 ≥ · · · ≥ λ N ), and Ξ = [ξ 1 , · · · , ξ N ] ∈ R N ×N is the set of corresponding eigenvectors. Fig.3(a) shows the averaged values of the first and second eigenvalues over J ∈ J (T ). The first eigenvalue is much larger in the RS phase than those in the other phases. The evolutionary change of the second eigenvalue is vanishingly small for any T . This tendency is common for any λ i (i ≥ 2). Hence, the dominancy of the first eigenmode is enforced as a result of the evolution at T c2 ≤ T ≤ T c1 .
On the basis of the large contribution of the first eigenvalue in the RS phase, we apply a 1-rank approximation of genotype J ∼ η 1 ξ 1 ξ 1 T . By a straightforward calcula-tion, the local magnetization is expressed as at sufficiently large N . Therefore, when the first eigenmode is dominant, the relationship ξ 1 i ∝ atanh(µ i ) should hold at h = 0. Fig.3(b) shows the correlation coefficient between {atanh(µ i )} and {ξ 1 i }. In the RS phase, the correlation coefficient approaches 1; hence, ξ 1 i ∼ atanh(µ i ) is a reasonable approximation. We note that the expression of J = η 1 ξ 1 ξ 1 T is similar to those of the Mattis model [24,25], which is the Hopfield model with a single embedded pattern [26]. The present embedded pattern, however, is √ η 1 ξ 1 , in contrast to a discrete vector with ±1 in the Mattis model. For sufficiently small ρ, the distribution of µ i is almost random, and the embedded pattern after the evolution is a random pattern, except the target spins [27] [28]. Last, we show that the dominancy of the first eigenmode of genotype induces a correlation between the responses to environmental and genetic changes, as observed in the RS phase. From eq.(9), we obtain the expression for susceptibility under the 1-rank approximation where v i = β(1 − µ 2 i ) and δ ij is Kronecker's delta. Because of the randomness of the embedded pattern, it is reasonable to assume that χ ij (i = j) is sufficiently small; hence, S i S j ∼ µ i µ j holds. Applying the equilibrium relationship M i,jk = ∂ S j S k /∂h i , we obtain M i,jk ∼ χ ij µ k + µ j χ ik . Because {µ i } is expected to be randomly distributed, M i = jk χ 2 ij µ 2 k holds, neglecting the cross-term. Setting Q ≡ N −1 i µ 2 i , we obtain Hence, the proportionality between {χ i } and {M i } is a consequence of the dominance of the first eigenmode evolved in the RS phase, i.e., the evolutionary dimensional reduction [29] The relationship eq.(11) is indicated by the solid line in Fig.2. We quantify the deviation of the observed χ-M relationship from the theoretical line eq.(11), by the normalized mean squared error d Fig.3(c) shows the T -dependence of d averaged over J (T ). In the RS phase, d is close to 0; hence, eq.(11) holds with high accuracy, which is a result of the emergence of the dominant first eigenmodes, accompanied by randomness in the non-target spins.
When T (< T c2 ) is close to the RS-RSB boundary, d is close to 0, as with the RS phase. The difference between the RS and RSB phase is clear for finite h and ∆J , which is a deviation of J from J (T ). We randomly generate h ∼ N (0, 2 I) and symmetric ∆J , where ∆J ij ∼ N (0, 2 /N ), and ∆J ii = 0 ∀i. We quantify the relationship between χ i (h, J ) and M i (J + ∆J ) using d. Fig.3(d) shows -dependence of the averaged d over J (T ) and 100 samples of h and ∆J for T = 1 (RS) and T = 0.5 (RSB). In the RSB phase, d increases faster than it does in RS phase, even when d at = 0 is close to zero. This robustness of the proportionality is also a consequence of the dominant first eigenmode [30].
The proportionality between {χ ij } and {χ ik } (j, k ≤ N T , j = k), shown in Fig.1(b), is also a consequence of the dominant first eigenmode. From eq.(10), the leading term of susceptibility is In the RS phase, both v j and ξ 1 j are functions of µ j ; hence, χ ij /χ ik ∼ 1 holds when µ j µ k . This is the origin of the linear relationship between {χ ij } and {χ ik } [31].
In summary, we applied an evolving spin-statistical physics model, representing phenotypes, genotypes, and the environment by spin configuration, interaction matrix, and the external field, respectively, and have answered the questions addressed at the beginning of this paper. (i) Correlated responses across different environmental changes are demonstrated by the correlation in susceptibilities χ ij and χ i in the evolved genotypes at the RS phase. (ii) Proportional responses to mutation and environmental changes are demonstrated by the proportionality between the "susceptibility to interaction matrix" M i and spin-glass susceptibility χ i . (iii) These proportional responses originate in the reduction of rank in the interaction matrix. (iv) Such dimension reduction and proportional changes are observed for the evolved genotypes at the RS phase, i.e., at an intermediate level of thermal noise.
Hence, robustness of phenotypes to noise [32,33] is essential to the evolutionary dimension reduction, leading to the correlated responses in the high-dimensional phenotypes to different types of perturbations. Although the present statistical physics model is highly simplified, it gives a theoretical basis for dimension reduction in biological systems, in which robustness to noise is also essential. For example, recent reports on protein dynamics suggest the existence of large collective motion, which may be a manifestation of dimension reduction [9,[34][35][36]. Although dynamics at the cellular level are not represented by a Hamiltonian, the similarity between spin-glass dynamics and gene expression dynamics with mutual activation and inhibition is now well recognized [32,33,[37][38][39].
In terms of statistical physics, the evolution to the RS phase under appropriate levels of noise should be considered, in which both higher fitness and robustness to noise are achieved with the dimension reduction. If the temperature is reduced, robustness in the phenotype is lost by RSB, even though a higher fitness state is reached after sufficient time steps of expression. Here, we have studied the simplest fitness condition. For higher biological functions, the response to diverse environmental conditions, say, different target spin configurations upon the application of different external fields, may be required. The extension to such problems would be straightforward, in which the need for both robustness and plasticity may lead to dimension reduction with higher ranks.