Prediction of Cross-Fitness for Adaptive Evolution to Different Environmental Conditions: Consequence of Phenotypic Dimensional Reduction

How adaptive evolution to one environmental stress improves or suppresses adaptation to another is an important problem in evolutionary biology. For instance, in microbiology, the evolution of bacteria to be resistant to different antibiotics is a critical issue that has been investigated as cross-resistance. In fact, recent experiments on bacteria have suggested that the cross-resistance of their evolution to various stressful environments can be predicted by the changes to their transcriptome upon application of stress. However, there are no studies so far that explain a possible theoretical relationship between cross-resistance and changes in the transcriptome, which causes high-dimensional changes to cell phenotype. Here, we show that a correlation exists between fitness change in stress tolerance evolution and response to the environment, using a cellular model with a high-dimensional phenotype and establishing the relationship theoretically. The present results allow for the prediction of evolution from transcriptome information in response to different stresses before evolution. The relevance of this to microbiological evolution experiments is discussed.

How adaptive evolution to one environmental stress improves or suppresses adaptation to another is an important problem in evolutionary biology. For instance, in microbiology, the change of resistance to one antibiotic by resistance acquisition by another drug is a critical issue that has been investigated as cross-resistance. Recent experiments on bacteria have suggested that the cross-resistance of their evolution to various stressful environments can be predicted based on the transcriptome changes after evolution under the corresponding stresses. However, there are no studies so far that explain a possible theoretical relationship between cross-resistance and changes in the transcriptome, which causes high-dimensional changes to cell phenotype. Here, we show that a correlation exists between fitness change in stress tolerance evolution and response to the environment, using a cellular model with a high-dimensional phenotype and establishing the relationship theoretically.
In the present study, we numerically evolved a high-dimensional gene regulation network, where genes determined the network and gene expression dynamics. From this network, the fitness of cells under given environmental conditions was determined by expression patterns of output genes. By numerically evolving such cells to satisfy several relationships between environmental inputs and protein expression outputs, we demonstrated that evolutionary changes in phenotypes are constrained to a low-dimensional subspace, whose dimension is given by the number of input-output relationships. This dimensional reduction is explained by the separation of a few eigenvalues in the Jacobian matrix for the expression dynamics. Additionally, we applied a large variety of environmental changes to these evolved cells and examined further evolution under these stress conditions. The correlation of the evolutionary change in fitness in response to different stresses is well predicted by the non-evolutionary environmental responses to each stress. This prediction is indeed possible, as evolutionary changes and adaptive responses are within the same constrained subspace. Therefore, by taking advantage of dimensional reduction, we formulated a potential theory of phenotypic changes caused by environmental and genetic changes. Then, the cross-fitness to different environmental stresses was computed by the Hessian matrix of the potential, which supports the results of the numerical evolution. Finally, we applied the theory to experimental data on bacterial evolution under antibiotics. The data demonstrate the predicted correlation between the fitness changes by evolution and transcriptome changes upon environmental stresses. The present results allow for the prediction of evolution from transcriptome information in response to different stresses before evolution. The relevance of this to microbiological evolution experiments is discussed.

I. INTRODUCTION
Generally, organisms change their state to adapt to various environmental stresses. This ability is thought to have been acquired through evolution [1,2]. Those who evolved to adapt to one environment may increase or decrease the degree of their adaptation to another environment. For example, adaptive evolution to one stressful environment may increase or decrease fitness to man- * takuya.sato.zs@riken.jp † chikara.furusawa@riken.jp ‡ lapikaneko@gmail.com age another stress as compared with that of the organism before evolution. This correlated change in fitness is called cross-resistance [3][4][5][6][7][8][9][10][11]. If the adaptive evolution to one environmental stress increases or decreases the fitness for another, the cross-resistance is positive or negative, respectively. In medicine, understanding the cross-resistance of bacteria to different antibiotics is a crucial issue. Can such cross-resistance be predicted? Extensive studies have been conducted to uncover specific genetic mutations which allow adaptive evolution to individual environmental stresses and to unveil functional changes that occur as a result of such mutations. Molecular changes caused by mutations have been identified in certain genes, which allow for resistance to environmental stress [12,13]. Molecular changes caused by mutations have been identified in certain genes, which allow for resistance to environmental changes. However, the detailed mechanisms of cross-resistance remain unclear. Crossresistance between different environmental conditions involves interactions among diverse components that are influenced by the mutation and are not explained directly by specific molecular changes. Examination of the correlation between fitness changes across different environmental conditions using standard molecular biology methods that focus on a one-to-one correspondence between genes and functions is not easy.
How can we compare adaptive evolution under different environmental conditions? For this purpose, we need to consider changes to the cellular state that is shaped by a wide variety of components. Such a cellular state can be represented by the concentrations of these components. Changes in the cellular state in response to environmental changes, such as antibiotics, temperature, and nutritional conditions, will lead to a change in the growth of a cell. The correlation of changes in the cellular state across different environmental changes will provide information on how organisms evolve to them. Such information involves high-dimensional data that characterize the cellular state.
Recent advances in experimental techniques have enabled the acquisition of high-dimensional data of cellular states, such as the transcriptome, proteome, and metabolome [14][15][16]. Using these high-dimensional data, a detailed analysis of the cellular state is now possible. However, how can we extract relevant information from high-dimensional data with thousands of components to obtain the correlation between evolutionary adaptation to different conditions?
A recent experimental report examined transcriptome changes throughout the evolution of bacteria in response to a variety of environmental stresses [8,17]. In these studies, the authors measured the cross-resistance, that is, how adaptive evolution to one environment, E 1 , changed the growth rate of bacteria in another environment, E 2 . Then, by measuring transcriptome changes through adaptive evolution, they constructed a lowdimensional linear model for these changes, explaining the observed cross-resistance. Notably, the environmental stresses adopted in their experiments had a variety of molecular effects on cells. The transcriptome of E. coli used in their experiment was high-dimensional data with over 4000 dimensions. Despite this complexity, low-dimensional information extracted from highdimensional information is suggested to be relevant to predict cross-resistance to a variety of conditions to a certain degree.
If cellular states moved throughout the entire highdimensional space during adaptive evolution to various stress environments, changes in phenotype (i.e., cellular state) in response to different stress environments would not be correlated, and predictions of cross-resistance by the environmental response would not be possible. How-ever, such predictions may be possible if transcriptome changes due to adaptive evolution are restricted to a relatively low-dimensional subspace. Is there general support for a such low-dimensional reduction in adaptive changes to cellular states?
Several recent experiments have suggested that changes in cellular state in response to environmental stresses are constrained in low-dimensional space [18][19][20][21][22][23]. Changes in the transcriptome of E. coli across various stress environments were found to be strongly correlated. Horinouchi et al. also showed that transcriptomic changes in independent evolutionary lineages converge along the common principal component (PC) space in the adaptive evolution of E. coli under ethanol stress. These results suggest that phenotypic changes in the adaptation and evolution of cells in response to environmental stress occur within a low-dimensional space.
How are phenotypic changes constrained to a lowdimensional space? By simulating a catalytic chemical reaction network model with thousands of components, it was found that high-dimensional concentration changes in response to environmental or mutational changes are constrained to a common low-dimensional space as a result of evolution to increase the fitness [24][25][26]. This constraint is then formulated in terms of dynamical systems theory as a separation of a few slow eigenmodes for the relaxation dynamics of the rate equation representing the cellular state changes.
Can we, then, theoretically predict cross-resistance using the information in such low-dimensional constraints [27]? In the presence of phenotypic constraints, responses to environmental and evolutionary changes are restricted to a common, lower-dimensional subspace. Accordingly, one does not need the entire high-dimensional data to predict the fitness change; information within the lowdimensional subspace will be sufficient to estimate the fitness changes across environmental conditions. Thus, the information needed to predict cross-resistance is significantly reduced. In the present study, we used a gene regulatory network (GRN) model to demonstrate such low-dimensional phenotypic constraints by evolution, and then demonstrated that cross-resistance is predicted by cellular responses to stress before evolution by taking advantage of phenotypic constraints [28][29][30][31][32][33].
The remainder of this paper is organized as follows: In Sec.II, we introduce the GRN model of a cell used in the present study and describe the procedure of simulated evolution. Next, in Sec.III, we show that phenotypic constraints are produced when the GRNs are evolved under fitness to satisfy multiple input-output relationships. We demonstrate that the degree of the phenotypic constraint acquired through evolution is determined by the number and strength of the postulated input-output relationships. We also explain such constraints in terms of the nature of gene regulatory matrices. In Sec.IV, we show the results of simulations of adaptive evolution to a variety of environmental stresses, by using the evolved GRNs obtained in Sec.III as the ancestor. Then, we com-puted the cross-fitness, that is, the fitness of a cell that has evolved under another environment for a new environment. We demonstrated that this cross-fitness is approximated using low-dimensional variables along with phenotypic constraint coordinates. In particular, when a P -dimensional phenotypic constraint exists, the crossfitness and cross-resistance that are desired from it are approximately described by a function of P variables. Then, the cross-fitness as a result of evolution is predicted by the correlation in transcriptome changes upon environmental stresses. In Sec.V, the approximate form of cross-fitness in Sec.IV is derived by assuming that fitness is given by a potential function of low-dimensional environmental and genetic coordinates. In Sec.VI, we apply the present theory to experimental data of evolution of resistance to antibiotics in E.coli. The experimental data well reproduce the predicted correlation between the cross-fitness and transcriptome changes. In Sec.VII, we summarize the result and discuss its relevance to crossresistance observed in experiments of bacterial evolution of antibiotic resistance.

A. Cell Model
We adopted a GRN as a model for the cellular state. The GRN is composed of N genes whose expression is represented by the N -dimensional vector x = (x 1 , x 2 , · · · , x N ), and the cell state is given by this vector. The time evolution of the state follows the rate equation: G is an N × N matrix representing the interactions between genes, satisfying G ij ∈ {−1, 1} (i = j), G ij = (i = j). G ij > 0 indicates that the product of jth gene positively regulates the ith gene, that is, accelerates its transcription. G ij < 0 represents negative regulation. η is an N I -dimensional vector that represents the input signal from the external environment to the cell, satisfying η i ∈ {−1, 1}. The strength of the interactions between the input signal and the GRN is represented by the N ×N I matrix I, satisfying I ij ∈ {−1, 1}. E is an Ndimensional vector representing the environmental stress. In the parametric region used in this study, cellular states always reach a unique fixed point x * = (x * 1 , x * 2 , . . . , x * N ) as a result of time evolution using the rate equation (Eq.(1a)). In this study, we refer to the fixed point x * of Eq.(1a) as the phenotype. The phenotype x * is uniquely determined for genotype (I, G, O) and environment η, E.
Both terms Iη and E represent the interactions between the external environment and the GRN, but their biological meanings are different. Iη represents the signal inputs from the external environment. Such input from the environment appears frequently over longterm, evolutionary timescales, allowing cells to adapt to these environments through evolution. For such evolved cells, we applied environmental stress E for a laboratory timescale, much smaller than the long-term evolutionary timescale (consider, for instance, the application of antibiotics to wild-type bacteria). Against such inputs, cells may be required to evolve by transient adaptations, which are lost in the long-term evolutionary time scale.
In this model, the fitness of a cell is determined by the expression of the output genes, that is, the vector o = (o 1 , o 2 , . . . , o N O ). The stationary expression of the output genes is given by Here, we postulate that the fitness for each condition η (n) is defined by the negative distance −|o * − t (n) | 2 between the output gene expression and the optimal gene pattern t (n) corresponding to each input signal η (n) , that is, the fitness takes a maximum value of zero if the expression pattern of the output genes o matches the optimal gene pattern t (n) (n = 0, 1, . . . , P − 1). Now, we consider P different environmental conditions with input signal η (n) (n = 0, 1, 2, . . . , P − 1) and an optimal gene pattern t (n) . In this study, we consistently use N = 100, N I = 8, and N O = 8.

B. Evolution
Evolutionary simulations were performed using the following procedure. In each generation, M mutant cells were created from L mother cells. The total population was M L. Mutant cells were generated by reversing the sign of each matrix element of the genotype (I, G, O) of the mother cell with probability ρ. In this study, ρ = 1/N 2 = 0.0001 was used. The fitness of each mutant cell was then calculated as follows: We calculated the fixed points x * and o * using the rate equation Eq.(1a) using the 4-degree adaptive Runge-Kutta method [34], and used this to calculate the fitness. Initial states for the calculation of the fixed point were randomly chosen from the uniform distribution 0 < x i < 1 (i = 1, 2, . . . , N ). However, because the model adopted in the present study has only one fixed point in the parameter region, the choice of initial values does not affect the results. Finally, the top L fitted cells were selected for the next generation of mother cells. In this study, we use M = 4 and L = 25.

A. Fitness
First, we performed evolution from randomly generated cells with given matrices I ini , G ini and O ini . I ini , G ini and O ini are randomly generated with probability p to take ±1 as follows: whereas G ii is set to 0. As mentioned, we assumed that cells need to respond appropriately to external inputs to survive; as such, output genes should take the appropriate expression pattern t (n) upon input signal η (n) . Fitness for a input-output pair (η (n) , t (n) ) under environmental stress E is given as followed; where o * | η (n) is stationary expression pattern of output genes with input signal η (n) and environmental stress E. Note that µ n (E) ≤ 0 and µ m (E) takes 0 only if the stationary expression pattern of output genes agrees with the target pattern. In this section, by considering P input-output relationships without environmental stress, that is E = 0, we used the following fitness functionμ: µ takes a maximum value of 0 only when the output gene expression pattern agrees with the target pattern t (n) for each of the input signals η (n) (n = 0, 1, . . . , P −1) from the environment. In this study, we used a nonsignal condition and the followingP pairs of signals and expression patterns of the output genes (i.e. P = 2P +1): Here, α is a parameter that represents the strength of the required output gene response and we define α (n) i )/2. The first signal-target relationship (η (0) , t (0) ) requires that there is no response to output genes when there are no environmental signals. ForP ≥ 1, we set a pair of patterns 2P −1 and 2P is symmetric from the case with no input signal. In addition, each input pattern η (2n) (n = 1, 2, . . . ,P ) is chosen to be linearly independent. The α (2n) (n = 1, 2, . . . ,P ) is linearly independent. That is, (η (2m) · η (2n) ) = 0 (m = n) and (α (2m) · α (2n) ) = 0 (m = n). When a set of (η (n) , t (n) ) (n = 0, 1, . . . , 2P ) is given by the above methods, there areP signal-target relationships. The purpose of the above pairwise signal-target relationship is to ensure that the symmetry of the phenotypic constraints is obtained as a result of evolution. However, the phenotypic constraints discussed below are obtained even when the signal-target relationship is randomly assigned, without the above symmetry. As a result of evolution, the fitness approached maximumμ ∼ 0, with µ n (E = 0) ∼ 0 for n ≤ 2P , as long asP and α are not so large( Fig.1). In the following sections, we study the behavior of such evolved networks.

B. Evolutionary dimension reduction
The phenotype of cell x * changes when environmental stress E is imposed. We denote the phenotypic change in response to environmental stress as δx * (E) = x * (E) − x * (0), where x * (E) represents the phenotype of the cell under environmental stress E, calculated using the environmental signal η (0) . We calculated phenotypic changes δx * (E) for cells evolved under various P and α, subjected to 10,000 randomly generated environmental stresses E. These environmental stresses E were generated such that each element followed a normal distribution with a mean of 0 and a variance of 1. We investigated the change in phenotype with environmental stress δx * (E) in the N -dimensional phenotypic space. However, as it is too high-dimensional, we performed principal component analysis (PCA) of over 10,000 phenotypic changes δx * (E) and examined if the variance was explained by a few components. The dependence of the explained variance onP and α is illustrated in Fig.2.
To study the validity of dimension reduction, we examined the dependence of the explained variance ratio (EVR) onP and α. As shown in Fig.2(a), the contribution of the topP PCs are large, whereas the components beyondP remain small. Recall that α is a parameter that represents the strength of the required target gene response; the larger α, the larger the response. The topP PCs account for a larger portion of the phenotypic change δx * (E) than other PCs do. This result implies that P , which represents the number of independent signaltarget relationships, determines the dimension of the phenotypic constraint. In contrast to the one-dimensional constraint studied earlier [21,25,35], the constraint tõ P (> 1)-dimensional constraint is generated, corresponding to the degree of freedom of environmental conditions in which the adaptive evolution progressed [36].
In summary, the dimension of the phenotypic constraint agrees with the degrees of freedomP of the signaltarget relationship, and the magnitude of the variance in these directions is correlated with the magnitude of the required target response. Note that the environmental stresses adopted to compute phenotypic variations are not included in the environment where evolution has taken place. However, the response to novel environmental changes is restricted toP -dimensional space after evolution. We also observed this in phenotypic changes caused by genetic mutations (Fig.S2) in the highdimensional gene expression space and the corresponding dynamical system analysis for an origin of phenotypic constraint in the dynamical system (see Sec.S3, S4, and S5 in supplementary material). These phenotypic changes due to environmental stresses and genetic mutation are restricted to a common low-dimensional space. This will be important for the correspondence between phenotypic changes in response to environmental stresses and due to adaptive evolution, to be studied in the following sections.

A. Fitness
In the previous section, we numerically evolved cells to realize the appropriate target pattern t (n) (n = 0, 1, . . . , 2P ) in response to each input signal η (n) (n = 0, 1, . . . , 2P ). As a result, phenotypic changes δx * in response to environmental stress E and mutation to geno-type G were constrained to the same subspace withP dimensions.
In this section, we adopt cells that have already evolved as in the previous section, achieved the phenotypic constraint, and then studied the evolution of adaptation to novel environmental stresses. This corresponds to the short-term adaptive evolution in laboratory experiments. Using this setup, we computed the cross-fitness, that is, the fitness of a cell that has evolved to adapt to an environmental stress E (1) , exposed to another environmental stress E (2) , and we show that the cross-fitness can be predicted by phenotypic constraints in a low-dimensional subspace.
In this section, we used the fitness µ 0 (E) (see Eq.3) in the evolutionary simulations. Evolution with this fitness requires that the appropriate target pattern t (0) be realized in response to the input signal η (0) in the presence of environmental stress E. Although η (0) and t (0) were used here, the qualitative results did not change when the other pairs of input signals and target gene response patterns were adopted. We calculated the fitness with one input-target relationship, assuming evolution under a constant environment over a short period, such as laboratory evolution.

B. cross-fitness
Now, we introduce the cross-fitness µ cross (E (1) , E (2) ) which is defined as the fitness of genotype G * (E (1) ), that is, the fitness when the cell, which has evolved to adapt to environmental stress E (1) , is exposed to environmental stress E (2) . Thus, it is represented as In other words, the cross-fitness µ cross (E (1) , E (2) ) represents the degree of adaptation under environmental stress E (2) of the cells that evolved to adapt to a different environmental stress E (1) .
In Fig.3, µ cross (E (1) , E (2) ) is plotted as a heat map with the evolved environment E (2) as the horizontal axis and the environment E (1) used to measure the fitness as the vertical axis. From the figure, it is difficult to obtain information from the heat map in which the environmental stresses E are randomly ordered.
To predict cross-fitness, we must find an appropriate feature variable y(E) that captures the effective internal state corresponding to environmental stress E. y(E) is a quantity determined by the cellular state before evolution to adapt to the stress.
Here, as a possible candidate for y(E), we adopted the PCs of the phenotypic changes δx * (E) against environmental stress E for the cells before the evolution because the dominantP PCs capture the phenotypic change under the phenotypic constraint, to which δx * (E) under environmental stress is restricted. The PC space was calculated using 10,000 random environmental stresses E, whose elements followed a normal distribution with a mean of 0 and variance of 1. The value y i (E) is the i th principal component value for the phenotypic change δx * (E). In Fig.4(a), the cross-fitness µ cross (E (1) , E (2) ) across 10,000 random environments is plotted as a function of y 1 (E (1) ) − y 1 (E (2) ) by red dots. It can be seen that the cross-fitness µ cross (E (1) , E (2) ) can be approximated by a single curve; that is, the cross-fitness µ cross (E (1) , E (2) ) is approximately represented by a single functionμ cross (δy 1 (E (1) , E (2) )) with δy 1 (E (1) , E (2) ) = y 1 (E (1) ) − y 1 (E (2) ), This is possible because of the existence of a 1-dimensional phenotypic constraint, as we adoptedP = 1 in this case.
Because the dimension of the phenotype constraint has been increased from 1 to 2, the cross-fitness µ cross (E (1) , E (2) ) is estimated as a function of a 2-dimensional PC plane in Fig.4(c). The difference in the colors of the dots in the Figure corresponds to the cross-fitness values. It can be observed that the points with the same crossfitness are distributed in a doughnut shape in the 2dimensional PC plane. Cross-fitness µ cross (E (1) , E (2) ) is represented by a function of two-dimensional arguments (δy 1 (E (1) , E (2) ), δy 2 (E (1) , E (2) )). It is suggested that when a D-dimensional phenotypic constraint exists, the cross-fitness µ cross (E (1) , E (2) ) can be approximated as a functionμ cross (δy 1 , . . . , δy D ).
In this section, we demonstrate the existence of an approximation function for the cross-fitness. These results suggest that the response of cells to environmental changes and evolution can be linked by phenotypic constraints, from which we can predict the cross-fitness in terms of a few, that is,P , PCs of the phenotypic change before evolution to novel environmental stresses.

C. Prediction of cross-resistance by cosine-similarity
So far, we have shown that cross-fitness can be approximated by a single curved surface using the information on the phenotypic constraints of the cell. To obtain this approximation function, information regarding phenotypic constraints, as represented by PCs, is required in advance. However, in a real cell, it may not be easy to determine this information: a large number of PCs are required to provide phenotypic constraints. Here, we propose a simpler alternative measure for predicting cross-fitness and demonstrate its reliability usingP = 3 conditions. Instead of the difference between the PCs of phenotypic changes in response to environmental stresses, we . , E (100) ) were randomly generated. The vertical axis represents the environmental stress used to measure adaptation, and the horizontal axis represents the environmental stress used for evolution. Stresses are ordered by the number of random seeds used to generate environmental stress. Here, y1(E (i) ) is the phenotypic change of δx * (E (i) ) when environmental stress E (i) is applied to the cells before evolution.
adopted a simple measure between two phenotypic responses to environmental stresses: cosine-similarity for phenotypic change δx * (E) in response to the stress environment E (1) , E (2) defined as follows: This is a quantity characterizing orientations between phenotypic changes δx * (E (1) ) and δx * (E (2) ); it takes 1 if they are oriented in the same direction, -1 if they are oriented in the exact opposite direction, and 0 if they are uncorrelated [37]. The cosine-similarity is symmetric for the stress environments E (1) and E (2) .
In Fig.5(a), the cross-fitness is plotted against cosinesimilarity across the pairs of randomly generated environments (both red and gray points). ForP = 3, one can see the correlation between cross-fitness and cosine-similarity (correlation coefficient 0.57). However, the correlation might not be significant, as shown in Fig.5(a). The main reason for this is that for some stresses E, the response is rather small, so the cosine-similarity and fitness change are small. To eliminate such "non-response" cases, we replotted the data across only the environment pairs under which the fitness decrease was larger than 0.1 (red dots in Fig.5(a)), for which the correlation coefficient was 0.79. The prediction of cross-fitness using cosine-similarity does not require direct information on phenotypic constraints. However, such constraints are necessary for the correlation between cross-fitness and cosine-similarity. Owing to the low-dimensional constraint, the environmental and evolutionary responses are correlated in the low-dimensional space, which reflects the cosinesimilarity (see also the discussion in the next section). ForP = 0, in which no phenotypic constraints evolved as no input-output relationship was postulated, such a correlation was not observed (Fig.5(b), correlation coefficient 0.25).

V. REPRESENTATION OF CROSS-FITNESS BY FITNESS POTENTIAL FUNCTION
A. Potential approximation of cross-fitness in low-dimensional phenotype space In the previous section, we showed that cross-fitness can be approximated based on the information that the phenotypic response of cells to environmental stresses is constrained in low-dimensional space. In this section, we describe a potential theory that characterizes the phenotypic response by representing fitness as a function of environmental and genetic changes.
For this, we consider a fitness function u(X) of Ddimensional variable X = (X 1 , X 2 , . . . , X D ). X is given as a function of genotype G and environment E. When phenotypic constraints exist, the phenotypic changes caused by environmental stress and genotypic mutations are restricted to a low, D-dimensional submanifold within the total N -dimensional phenotypic space.
We assume that the fitness function u(X(G, E)) has a maximum value at G (0) , E (0) . This means that the cell with genotype G (0) is adapted to environment E (0) . By expanding the fitness function u(X (G, E)) around G = G (0) , E = E (0) up to the second order, we obtain the following: where Note that the first-order derivatives (∂u/∂E) and (∂u/∂G) are equal to 0, because the fitness function reaches a maximum at G (0) , E (0) . Fitness decreases with environmental stress E; however, it is recovered by changing genotype G. At the end of the evolution, the fitness reaches a local maximum with genotype G * (E). Hence, G * (E) should satisfy the following condition: The solid lines represent the second-order approximation curve predicted from the theory. The second-order coefficient of the blue line is calculated by the least square method from the data on phenotypic changes under 10,000 randomly chosen environmental stresses. The second-order coefficient of the orange line is the (∂ 2 µ0/∂y 2 1 ), calculated using the information for the pre-evolutionary genotypes as given in Sec.V B. (b)P = 2. When two-dimensional phenotypic constraints exist, we cannot approximate the cross-fitness with the function of a one-dimensional feature value. (c)P = 2. The pair of environments (E (1) , E (2) ) used to measure adaptive evolution and cross-adaptation degree is transformed into a two-dimensional feature value space (y1(E (1) )−y1(E (2) ), y2(E (1) )−y2(E (2) )), plotted with colors coded according to the cross-adaptation µcross(E (1) , E (2) ). It can be seen that the pairs of environments corresponding to different adaptations are distributed in a doughnut shape. This is because the cross-fitness µcross(E (1) , E (2) ) atP = 2 can be approximated by a monotone univalent function whose arguments are (y1(E (1) ) − y1(E (2) ), y2(E (1) ) − y2(E (2) ).
By expanding the above equation to the second-order terms of δG and δE, we obtain: where δX i = δX i (G * (E), E) and δG * (E) = G * (E) − G (0) . Because the fitness function takes the maximum value at G (0) , E (0) , all eigenvalues of the matrix H = {H ij = (∂ 2 u/∂X i ∂X j ) G (0) ,E (0) } are negative, and H satisfies x T Hx 0 for any vector x, x T Hx takes 0 and only under the condition x = 0 (we assume that the change by stress is not so large, and remains within the range of the linear approximation in Eq.(8a) is valid). Therefore, when fitness is completely recovered by the genotype change G (0) → G * (E), δX(δG * (E), δE) should be a zero vector. Then, the following relationship holds: Here, (∂X/∂E) G (0) ,E (0) · δE is the first-order approximation of the phenotypic changes when the cell with genotype G (0) is subjected to environmental change E (0) → E. The conditions Eq. (11) indicate that the phenotypic changes caused by environmental change E (0) → E are cancelled out by genetic changes G (0) → G * (E) (justification of Eq.(11) is discussed in Sec.S6). Note that the existence of a genotype G * (E) may not always be guaranteed. However, as the dimension of the genotypic space is much larger than that of the phenotypic space, such a genotype will exist.
When the conditions Eq.(11) are satisfied, then Eq.(8c) with G * (E (1) ) and E (2) can be written as follows: This equation implies that the fitness of the cell that has evolved under environment E (2) , placed in environment E (1) , is given as a function of the difference between the phenotypic changes δX E (δE) under environments δE (1) and δE (2) . In Sec.IV B, the change δX E in Eq.(12a) is given by the changes in the PCs. Then, cross-fitness µ cross (E (1) , E (2) ) = µ 0 (E (2) ) with G = G * (E (1) )) can be approximated as a function of the difference in the PC changes in the phenotypes between E (1) and E (2) .

B. Application of potential theory to the result of evolution simulation
Following the argument in the last section, we estimate the coefficient of the second term of the crossfitness (∂ 2 µ cross /∂y 2 1 ) (y 1 is the first PC of the phenotypic change δx * ) from the change in µ cross against the change in y 1 . The solid lines in Fig.4 are predicted curves according to the above theory forP = 1 and α = 0.45. The coefficient of the blue one is calculated by the leastsquares method with µ cross = −cy 2 1 changing c. This curve approximates cross-fitness well, especially in regions where phenotypic changes are not too large.
The first and second terms in the above equation can then be interpreted as second-order approximations of fitness changes under stress environments E (1) and E (2) . The third term represents the interaction between stress environments E (1) and E (2) , that is, the fitness change in E (2) owing to adaptive evolution under E (1) . This term is proportional to the inner product of the phenotypic changes δX E (δE (1) ) and δX E (δE (2) ) under the metric H = {∂ 2 u/∂X i ∂X j | G (0) ,E (0) }. In other words, the third term corresponds to the difference in the orientation of phenotypic changes in E (1) and E (2) ; the similarity between the environments. In particular, when the first and second terms take the same value (fitness changes in E (1) and E (2) are the same), the cross-fitness is proportional to the cosine-similarity under the metric H. This proportional relationship between cross-fitness and cosine-similarity supports the results presented in Sec.IV C. Note that this relationship is obtained because the Hessian matrix of the cross-fitness can be approximated well by a constant multiple of the unit matrix in the present model.

VI. APPLICATION FOR LABORATORY EVOLUTION OF RESISTANCE TO ANTIBIOTICS
In this section, we apply the present theory to the experimental evolution of antibiotic resistance in E.coli, to confirm the prediction of cross-fitness by transcriptome changes. For details of the experiment evolution, see Sec.VIII and [8].
Here, 6 antibiotics A (1) , A (2) , . . . , A (6) were used for the experimental evolution. First, each antibiotic A k was added to the parental strain up to the level as long as the growth is sustained. Indeed this level is called the Minimum Inhibitory Concentration (MIC), which is the lowest concentration of an antibiotic that prevents visible bacterial growth, and is used as a measure to quantify antibiotic resistance, which corresponds to the fitness here. As a measure of phenotypic changes, we used logtransformed transcriptome responses, following our previous study [21], because changes in gene expression typically occur on the logarithmic scale. Namely, the phenotypic change was measured by the transcriptome change as δX i (A (k) ) = log 2 (x i (A (k) )/x i (N D)), where x(A k ) is the transcriptome data when an antibiotic A k is added near the MIC to the parent strain before evolution and x(N D) is the geometric mean of 3 independently measured transcriptome data under no-drug condition. As the fitness measure, we used log-transformed MIC values [log 2(µ g/ml] based on the previous study [8], which showed a linear correlation between log-transformed transcriptome changes and log-transformed MIC values. As MIC is larger, the fitness under the antibiotic is larger, so the former can be used as a measure of fitness Assuming that the laboratory evolution results in complete adaptation to antibiotics, we computed the relative MIC R M IC (A (k) , A (l) ) = M IC(A (k) , A (l) ) − M IC(A (l) , A (l) ), where M IC(A (k) , A (l) ) is the log-transformed MIC for A (l) of the strain that evolved to be resistant to A (k) , and used it as the measure of cross-fitness [38]. This quantity is non-positive which takes zero when A (k) and A (l) are the same antibiotics.
We then performed principal component analysis (PCA) for the transcriptome changes δX(A (1) ), δX(A (2) ), · · · , δX(A (6) ) and plotted in PC plane (Fig.6(a)). The contribution of the first principal component accounts for a high percentage of 63%. Then, similarly to Sec.IV C, we computed the differences y 1 (A (k) ) − y 1 (A (l) ) of the first principal component for an antibiotic A (k) used for the evolution of resistance and antibiotics A (l) used to measure the resistance of the evolved strain. In Fig.6(b), we plotted the measure of cross-fitness R M IC (A (k) , A (l) ) against y 1 (A (k) ) − y 1 (A (l) ), which would correspond to Fig.4(a). However, a clear one-dimensional curve as in Fig.4(a) was not observed. One possible explanation for this is that the phenotypic changes were not constrained in a 1-dimensional space.
From these data, we plotted the correlation between R M IC (A (k) , A (l) ) and S c (A (k) , A (l) ) ( Fig.6(c)), which corresponds to Fig.5(a), showing a significant correlation between them (Pearson's correlation coefficient is 0.70 with a p-value of 1.5 × 10 −5 ). Recalling that the gene expression dynamics involve hundreds of genes, this value is remarkable. Indeed it is about a similarly high value as obtained from the simulation. This result indicates that the present theory applies to the laboratory evolution of resistance to antibiotics.

VII. DISCUSSION
In the present study, we first evolved the gene regulatory network to be capable of multiple input-output relationships, to demonstrate that phenotypic changes due to environmental stress and genetic mutations are constrained to a lower-dimensional subspace, whose dimension corresponds to the degrees of freedom of the input-output relationship required for fitness. Phenotypic changes due to environmental stress and genotypic mutation are constrained to the common subspace, as formulated by dynamical system theory (see also [35,[39][40][41][42][43][44] for the relevance of dimensional reduction in biological systems).
In the present model, phenotypic constraints were caused by the separation ofP eigenvalues in the positive direction from the eigenvalues of the genotype G.
The inputs corresponding to these separatedP eigenvalues are amplified in the gene regulatory network, and the dominant phenotypic changes owing to environmental changes and genetic mutations are constrained in the directions of the eigenvectors of these eigenvalues. This suggests that the subspace of the phenotypic constraints reflects signals that have been important in the evolutionary process.
of the phenotypic change δx * (E (1) ) and δx * (E (2) ), cellular state changes to each stress E, before the evolution to stress. Thus, the cross-fitness µ cross (E (1) , E (2) ) can be well represented by low-dimensional (P ) phenotypic variables. This indicates that the fitness of the stress environment can be predicted by measuring the phenotypic changes in pre-evolutionary cells by the application of environmental stress.
To provide an approximate estimation of the crossfitness observed in evolutionary simulations, we introduced the fitness potential. The decrease in fitness due to stress is recovered when the phenotypic change in evolution completely cancels out the phenotypic change due to the stressful environment. By using the potential theory, cross-fitness can be approximated by the difference in phenotypic responses to the stress used for the evolution and that applied later for the test.
In this potential approximation, the cross-fitness is symmetric against E (1) and E (2) , that is, µ cross (E (1) , E (2) ) = µ cross (E (2) , E (1) ). Note that this is obtained by expanding the fitness up to the second order of phenotypic changes due to environmental changes and genetic mutations by assuming that these are not very large. If perturbations are much larger, the thirdor higher-order effects are not negligible, and the above symmetry no longer holds. However, even if the crossfitness is not symmetric, it is expected to correlate with the cosine-similarity (which is symmetric by definition) to a sufficient degree.
Notably, in the model in the present study, the Euclidian distance between the expression pattern of the output gene and the target pattern is used as the fitness function, which is symmetric for the input-output relationship (η (0) , t (0) ). This symmetry eliminates the third-(or odd-) order terms in the potential form. Hence, the symmetry could be violated to some degree, depending on the choice of the input-output relationship and fitness function.
Finally, we discuss the applications of our theory to experimental studies on laboratory evolution. In the present study, we first discuss the prediction of crossfitness using the PCs of phenotypic changes in response to environmental change. To do this, information on the representation of the fitness by the PCs is needed in addition to information on phenotypic constraints, which may not be obtained directly from experimental data. Later, however, we demonstrated the correlation between cross-fitness and cosine-similarity in phenotypic changes in response to the stressful environment. In fact, the transcriptome changes of laboratory evolution of E. coli confirm this correlation whereas fitness changes were not merely represented by the 1sr PC. The trend in fitness changes after evolution, thus, could be predicted by the transcriptome changes due to antibiotics before evolution(Sec.VI).
In the potential theory in the present study, we focused only on the full recovery of fitness via adaptive evolution in a stressful environment. However, in actual evolution, fitness may not be fully restored. We expect that our study will still provide relevant information in such cases, as long as phenotypic constraint exists and evolution occurs along it under a given fitness landscape. However, when a single mutation introduces drastic phenotypic changes or strong epistasis occurs during evolution, the correlation between cross-fitness and transcriptome cosine-similarity may not be clearly observed.
In the present study, long-term evolution provided a phenotypic constraint to the cellular state, and later adaptive evolution of such cells to a stress environment follows the already created constraint. This implicitly assumes that there is a time-scale gap between the evolution of the present cells and their laboratory evolution to gain stress tolerance. The phenotypic constraint itself was shaped by the former evolutionary process but was not altered by the latter, shorter-term evolutionary process.
Thus far, we have discussed cross-fitness in the presence of phenotypic constraints. Another commonly adopted measure of relative changes by evolution and adaptation is cross-resistance r(E (1) , E (2) ), given by It is defined as cross-fitness between E (1) and E (2) minus the fitness of the pre-evolutionary cell because we are mostly concerned with the relative fitness changes as a result of adaptive evolution. Therefore, even if the cross-fitness is symmetric, as in the present model, crossresistance is not. We should note this point when applying the present theory to cross-resistance using Eq. (14). Still, the present theory can be used to predict the cross-resistance (see Fig.7) for an example of the correlation between cross-resistance and transcriptome cosinesimilarity. This quantity is usually used as a measure to quantify antibiotic resistance. This will be useful for experimental verification of the present theory.
Cross-resistance is plotted against cosine-similarity forP = 3 case with 3-dimensional phenotypic constraint. Red dots represent the data from the stress environment, under which fitness change is larger than -0.1. Gray dots represent data with smaller fitness changes. The correlation coefficient across all data is 0.69, and that across only red dots is 0.85.

VIII. MATERIALS AND METHOD
We describe the method of laboratory evolution and the acquisition of data used in Sec.VI. The Insertion sequence-free Escherichia coli strain MDS42 [45] was purchased from Scarab Genomics. The cells were cultured in 200 µL modified M9 medium [46] in 96well microplates with shaking at 900 strokes min −1 at 34 • C. We prepared precultures by shaking -80 • C glycerol stocked MDS42 strains for 23 h without antibiotics. The cells precultured were diluted to an OD 600 nm of 1 × 10 −4 into 200 µL of fresh modified M9 medium in 96-well microplates with and without antibiotics. The final concentrations of antibiotics used in this study were as follows; 3.9 ×10 −3 µg/mL for Cefoperazone (CPZ), 1.2 ×10 −2 µg/mL for Cefixime (CFIX), 4.0 µg/mL for Amikacin, 2.0 µg/mL for Neomycin (NM), 3.1 ×10 −2 µg/mL for Enoxacin (ENX), and 2.0 ×10 −3 µg/mL for Ciprofloxacin (CPFX), respectively. The cultures were grown to an OD 600 nm in the 0.072∼0.135 range (the equivalent of 10 generations). 180 µL of exponential cultures were withdrawn rapidly, and cells were killed immediately by the addition of an equal volume of ice-cold ethanol that contained 10% (w/v) phenol. The cells were collected by centrifugation at 20,000 × g at 4 • C for 5 min, and the pelleted cells were stored at -80 • C prior to RNA extraction. Total RNA was isolated and purified from cells using RNeasy micro Kit with oncolumn DNA digestion (Qiagen) in accordance with the manufacturer's instructions.
Transcriptome analysis was performed as a previous study [8] by using a custom-designed Agilent 8 × 60 K array for E. coli W3110. Briefly, 100 ng of each purified total RNA sample was labeled using the Low Input Quick Amp WT Labeling kit (Agilent Technologies) with Cya-nine3 (Cy3) according to the manufacturer's instructions. Cy3-labeled cRNAs were fragmented and hybridized to the microarray for 17 h at 65 • C in a hybridization oven. Washing and scanning of microarrays were performed in accordance with the manufacturer's instructions Microarray image analysis was performed using Feature Extraction version 10.7.3.1 (Agilent Technologies).
The MIC values of evolved E. coli strains for the aforementioned 6 antibiotics were obtained in the previous study [8]. The transcriptome data and MIC values are available upon request.
In the data analysis, the intensity values were normalized using the quantile normalization method. We then excluded the following genes which the parent strain lacks fhuA, yagE, yagF, yagG, yagM,yagX, appY, ycdR, ymfD, ymfI, ycgG, paaJ, ydbD, cheW, yfjL, yqiG, yqiI, yhhZ, yrhA, intB, yjhI, fimD, hsdR, and yjiY. Furthermore, we excluded genes with low expression levels (≤ 100 a.u. in any strain) and with relatively small expression change in response to all 6 antibiotics (δX i (A k ) = log 2 (x i (A k )/x i (N D)) ≤ 1 for all A k ), since the expression changes of such low expression or relatively unchanged genes were dominated by the experimental errors.

ACKNOWLEDGMENTS
We thank Tetsuhiro Hatakeyama for his insightful comments. This research was partially supported by a Grant-in-Aid for Scientific Research (A) 431 (20H00123), a Grant-in-Aid for Scientific Research on Innovative Areas (17H06386) from the Ministry of Education, Culture, Sports, Science and Technology (MEXT) of Japan, the Japan Society for the Promotion of Science (20J12168), and the Japanese Science and Technology agency (JST) ERATO (JPMJER1902). This research was also supported by the Novo Nordisk Foundation.
Appendix A: Cosine similarity between phenotypic changes due to environmental stress and genetic changes through adaptive evolution In Sec.V, we assumed that phenotypic changes due to environmental changes would be completely canceled out by phenotypic changes due to genotypic changes through evolution. To test that this assumption held in the evolutionary simulations, we measured the cosine similarity between phenotypic changes against environmental stress and the evolution under it; where δx * evo (E) is the phenotypic change in response to environmental stress E and δx * evo (E) is the phenotypic change from evolution with environmental stress E.
The potential theory discussed in this section assumes the ideal limit in which the phenotypic change δx * evo (E) due to genotypic change will completely cancel out the phenotypic change δx * env (E) due to environmental change, thereby recovering the fitness. The histogram is shown in blue in Fig.8(a) represents the genotypes that evolved atP = 0, α = 0.45. In this case, there are no phenotypic constraints, and the histogram does not deviate from the peak at similarity ∼ 0.
In contrast, the histogram plotted in orange represents the data examined for adaptive evolution from genotypes that evolved atP = 1, α = 0.4, where a one-dimensional phenotypic constraint is achieved. The distribution of cosine similarity is extended into negative regions, as shown in Fig.8(a). This implies that, when phenotypic constraints are present, the increased proportion of evolved cells has a cosine similarity closer to −1. The closer the cosine similarity is to −1, the more the phenotypic change δx * evo (E) in adaptive evolution is correlated with the phenotypic change δx * env (E) in response to environmental change.
To investigate how a larger proportion of evolution leads to cosine similarity close to -1, the histograms in the presence of one-dimensional phenotypic constraints were computed separately according to fitness when the pre-evolutionary cells were subjected to environmental stresses. As plotted in Fig.8(b), the cosine similarity, as a result of evolution against environmental stresses, shifted to a negative value, heading towards -1 as the reduction in fitness inflicted on the pre-evolutionary cells was greater than that observed in the post-evolutionary cells. This tendency was not observed in the absence of phenotypic constraints (Fig.8(c)).
In this model, the phenotypic space is 100-dimensional, whereas the number of output genes is eight. Moreover, for genotypes evolved withP = 1, the interaction matrix O between the gene regulatory network and the output gene is effectively a rank 1 matrix as a result of the evolution, with each row vector approximately 80% correlated. This implies that a large number of phenotypic patterns capable of realizing a given target pattern exist. Accordingly, there is a huge variety of phenotypic changes that cancel out changes in fitness caused by environmental stress. When phenotypic constraints are present, phenotypic changes due to adaptive evolution are more likely to occur along these constraints in the direction opposite to that of the response to environmental stress. In fact, in Fig.8, the distribution of the cosine similarity extends to the negative region in the presence of phenotypic constraints.
. The blue histogram represents the data of the genotypes evolved underP = 0. The orange histogram represents the data for the genotypes that evolved underP = 1, and we can see that the distribution shifts more negatively when there is a one-dimensional phenotypic constraint. (c) Histograms of (a) are divided into several parts according to the degree of fitness. (b) WhenP = 0, the shape and position of the distribution do not change even if the fitness changes, but whenP = 1, the larger the change in fitness, the more the distribution shifts in the negative direction.
FIG. S1. Explained variance ratio (EVR) of phenotypic changes for the first 5 principal components (PCs) when random environmental stresses E generated by Ei ∼ N (0, 1) were applied. The changes in x * for the evolved gene regulatory network were computed. The phenotypic changes δx * (E) were obtained for 10,000 independent environmental stresses, from which PCs were computed. Each variance was calculated for the cell with the top fitness value in the population that evolved for (a) P = 0 (b)P = 1 (c)P = 2 (d)P = 3, and was plotted against the stress strength α. The error bars represent the standard deviation of the five independent strains.  S2. Explained variance ratio (EVR) of phenotypic changes due to genotypic changes. The genotypic change G0 → G0 + δG was generated in the same manner in the evolution simulation with mutation rate ρ = 10/N 2 = 0.001. 10,000 phenotypic changes δx * (δG) were used in the principal component analysis (PCA). Each variance was calculated for the cell with the top fitness value in the population that evolved at (a)P = 0 (b)P = 1 (c)P = 2 (d)P = 3. The error bars represent the standard deviation of the five independent strains.

S3. PHENOTYPIC CONSTRAINT OF THE PRESENT MODEL IN DYNAMICAL SYSTEM
The formulation of phenotypic constraints in dynamical systems is given in [S1], extending those in [S2, S3]. In this chapter, we give the specific formula for the gene regulatory network in this paper. First, we restate the rate equations the gene regulatory network model in this paper follows.
The phenotype x * of the cell is given as a fixed point in Eq.(S1). Thus, the following equation is satisfied. x Now, considering the total derivative of Eq.(S2) by the parameter s Parameter t corresponds to environmental stress E or genotype G in this paper. By rearranging Eq.(S3), we obtain the following equation.
Using the Jacobi matrix J and the susceptibility tensor R s of the rate equation Eq.(S1), Eq.(S4) can be rewritten in the following form.
(S7b) L = J −1 . Phenotypic constraint means that the phenotypic change δx * (δs) of a cell due to a parameter change s 0 → s 0 + δs is restricted to a low-dimensional space. In other words, when an ensemble is taken with a random parameter change δt, the variances in the direction of certain eigenmodes become larger.

S4. ORIGIN OF DIMENSIONAL REDUCTION
In randomly generated genotypes in the present model, the response of the output genes to the environmental signal is quite small compared to the required response. Therefore, to achieve proper signal-target relationships, the input to the gene regulatory network must be amplified and transmitted to the target gene. Phenotypic constraints are acquired as a consequence of such evolution. This is suggested by the fact that the acquired phenotypes are constrained in theP -dimensional space depending on the magnitude of required output gene response α. Under a linear approximation, the phenotypic changes δx * (E) satisfies the following equations (see Sec.S3 for derivation); where L = J −1 is the inverse of the Jacobian matrix J in Eq.(S1) around the fixed point and R E = {[R E ] ij = (∂x i /∂E j )} is the partial derivative of the rate equation Eq.(S1) against environmental stress E. In the present model, J and R E are as follows; This means that, within a linear approximation, the explained variance of the phenotypic change δx * (E) with environmental change depends on the magnitude of the singular value of LR E . In fact, as shown in Fig.S3, the dependence of the singular value of LR E onP and α shows the same behavior as in Fig.S1. This suggests that the phenotypic constraints observed in the simulations of this model can be explained by the linear mode around the fixed point of the dynamical systems (see Sec.S3).

S5. EIGENVALUES OF G
Let us investigate the origin of the separation ofP -dimensional components in terms of the eigenvalue structure of G that changes directly during evolution. In this model, the G of a pre-evolutionary cell was generated so that the diagonal elements are 0, and the off-diagonal components and chosen to be −1 or 1 with equal probability. It is known that the eigenvalue distribution of such a random matrix is uniformly distributed within a circle of radius 1 centered at N → ∞ (Circular Law) [S4].
In the case of N = 100, used in this paper, the eigenvalue distribution of G ini is almost uniformly distributed inside a circle of radius 1 centered at the origin. In the early stages of evolution, the phenotypic changes x * under all input signal η (m) are not sufficiently large enough that output genes make the corresponding target patterns ( Fig.S4(a)). On the other hand, the eigenvalue structure of the evolved genotypes on the complex plane is plotted in Fig.S4(b). It can be seen that one eigenvalue is separated in the positive direction from the other eigenvalues. Except for the first eigenvalue, the eigenvalues follow the uniform distribution within a circle of radius 1 centered at the origin, which is preserved as in G ini . The introduction of phenotypic constraints did not destroy the original structure of the random matrix in G ini [S5]. This difference in the eigenvalue structures of the pre-and post-evolution cell was evolutionarily acquired. In Fig.S4(c), the real part of the eigenvalues of G of the cell with the highest fitness is plotted across generations, forP = 1 and α = 0.45. WhenP = 2, two eigenvalues are separated from the circle, as in the case withP = 1. Such preservation of the original eigenvalue structure is possible only whenP is small. AsP is larger, the radius of the circle in which the eigenvalues are distributed shrinks throughout evolution from the initial circle for a random matrix.