Universal constraints on selection strength in lineage trees

We obtain general inequalities constraining the difference between the average of an arbitrary function of a phenotypic trait, which includes the fitness landscape of the trait itself, in the presence or in the absence of natural selection. These inequalities imply bounds on the strength of selection, which can be measured from the statistics of trait values and divisions along lineages. The upper bound is related to recent generalizations of linear response relations in Stochastic Thermodynamics, and shares common features with Fisher's fundamental theorem of natural selection, and with its generalization by Price, although they define different measures of selection. The lower bound follows from recent improvements on Jensen's inequality, and both bounds depend on the variability of the fitness landscape. We illustrate our results using numerical simulations of growing cell colonies and with experimental data of time-lapse microscopy experiments of bacteria cell colonies.


I. INTRODUCTION
Quantifying the strength of selection in populations is an essential step in any description of evolution. With the development of single cell measurements, a large amount of data on cell lineages is becoming available both at the genotypic and phenotypic level. By analyzing the statistics of cell divisions in population trees, one can measure selection more accurately than using classical population growth rate measurements [1]. Similarly, by tracking phenotypes on cell lineages, one can obtain statistically reliable estimations of the fitness landscape of a given trait and of the selection strength of that trait [2]. In addition, an optimal lineage principle can be used to infer the population growth rate [3] or selective forces [4] from lineage statistics. All these methods contribute to bridging the gap between single-cell experiments at the population level and molecular mechanisms [5].
An alternative method to infer selection in evolution focuses on dynamical trajectories of frequency distributions [6,7]. In these works, Mustonen et al. introduced the notion of fitness flux to characterize the adaptation of a population by taking inspiration from Stochastic Thermodynamics. In fact, ideas from Stochastic Thermodynamics can be applied directly at the level of individual cell trajectories [8]. By following this kind of approach, we have derived general constraints on dynamical quantities characterizing the cell cycle such as the average number of divisions or the mean generation time [9,10]. These constraints are universal because they hold independently of the specific cell dynamic model and they are indeed verified in experimental data. Other examples of universality in the context of evolution include the identification of universal families of distributions of selected values and the use of methods from extreme value statistics [11,12].
Here, we derive universal constraints for the average value of a function of a trait, and for its selection strength, by exploiting a set of recent results known under the name of Thermodynamic Uncertainty Relations (TUR). These relations take the form of inequalities, which generalize fluctuation-response relations far from equilibrium [13], and which capture important trade-offs for thermodynamic and non-thermodynamic systems [14], as recently reviewed in [15]. Although our results are framed in the context of cell populations in lineage trees, they apply more broadly to general stochastic processes defined on any branched tree.
* arthur.genthon@espci.fr
We start in section II by laying the theoretical framework and the definitions of the forward and backward samplings of lineages within a tree, which are at the core of the notions of fitness landscape and selection strength.
In section III, we derive a general upper bound for the difference between the average values of an observable with respect to two different probability distributions, which we use in section IV to obtain an upper bound for the strength of selection. That result goes beyond the Gaussian approximation. In section V, we study the case of small variability which leads to a simple expression of the strength of selection, reminiscent of the Gaussian case. Those expressions have mathematical similarities with Fisher's fundamental theorem of natural selection and Price's equation, although they correspond to different definitions of selection, as detailed in section VI. To complement the upper bound on the strength of selection, we use a recent sharpened version of Jensen's inequality to derive in section VII a lower bound for the strength of selection. Both bounds are tested with simulations and experimental data in section VIII, showing a very good agreement with the theory. Finally, we conclude in section IX. Several appendices (appendices A to G) present the details of the calculations, supplementary figures, and numerical comparisons between our results and results previously published.

II. A FRAMEWORK FOR LINEAGE STATISTICS
A colony of cells can be represented as a branched tree, whose branches are called lineages and whose nodes correspond to cell divisions. We assume that each cell in the population divides after a stochastic time into exactly $m$ daughter cells. In order to extract relevant statistics from such a tree, one needs to sample the lineages following a weighting scheme. The backward (or retrospective) and the forward (or chronological) samplings have been introduced in the context of populations of cells [2,5,16], and previously defined in the mathematical literature [17,18]. The backward sampling of lineages assigns a uniform weight $N(t)^{-1}$ to each of the $N(t)$ lineages, leading to an over-representation of cells coming from sub-populations that divided more than average. To compensate for this bias, the forward sampling takes into account the number of divisions $K$ along a lineage and assigns to the lineage a weight $N_0^{-1} m^{-K}$, where $N_0$ is the size of the initial population. Intuitively, a lineage is followed forward in time from a cell in the initial population, by choosing with uniform probability $1/m$ which daughter to follow among the $m$ daughters at each division. In this sense, the forward sampling cancels the effect of selection because the sister cells born from the same division have the same weight, regardless of their reproductive successes, i.e. the sizes of the sub-populations they generate. Thus, the statistics obtained with a forward sampling of the lineages within a tree reproduce the statistics obtained in single-lineage experiments, as in the mother-machine configuration [19]. A graphical example of the two samplings for a simple tree is given in fig. 1.
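The two weighting schemes can be illustrated with a minimal sketch. The function name `lineage_weights` and the three-lineage tree are hypothetical choices made for illustration; only the weights $N(t)^{-1}$ and $N_0^{-1} m^{-K}$ come from the text.

```python
def lineage_weights(divisions, n0=1, m=2):
    """Return (backward, forward) weights for lineages with the given division counts.

    divisions: number of divisions K along each lineage present at time t.
    n0: size of the initial population, m: number of daughters per division.
    """
    n = len(divisions)
    backward = [1.0 / n for _ in divisions]             # uniform weight 1/N(t)
    forward = [1.0 / (n0 * m ** k) for k in divisions]  # weight 1/(N0 * m^K)
    return backward, forward


# Hypothetical tree grown from one cell: one daughter divides again, the other does not.
divisions = [1, 2, 2]
backward, forward = lineage_weights(divisions)
print(backward)  # three equal weights
print(forward)   # [0.5, 0.25, 0.25]
```

Both sets of weights sum to one for any complete tree, which is a convenient sanity check when the division counts are extracted from data.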
A general phenotypic trait $S$ then admits a forward and a backward distribution, respectively defined by $p_{\rm for}(s,t) = \sum_{K=0}^{\infty} n(s,K,t)/(N_0 m^K)$ and $p_{\rm back}(s,t) = n(s,t)/N(t)$, where $n(s,t)$ is the number of lineages featuring a cell with trait value $s$ at time $t$, and $n(s,K,t)$ is the number of those lineages with $K$ divisions. Comparing $p_{\rm for}(s,t)$ and $p_{\rm back}(s,t)$ offers some insight on the effect of selection on trait $S$. For this purpose, we define the fitness landscape as [2]
$$h_t(s) = \Lambda_t + \frac{1}{t} \ln\left[\frac{p_{\rm back}(s,t)}{p_{\rm for}(s,t)}\right], \tag{1}$$
where $\Lambda_t = \ln(N(t)/N_0)/t$ is the population growth rate. Note that in the classical framework of evolutionary dynamics, the notion of fitness landscape finds its origin in Wright's seminal work [20], and is defined as a mapping between the values or versions of a phenotype or a genotype and their associated fitnesses [21]. The notion of fitness in biology has multiple meanings, but is often understood in this context as the reproductive success, or growth rate. In contrast, this is not the case for the fitness landscape we defined, which therefore should not be confused with the growth rate of the sub-population carrying the trait value $s$. Indeed, the reproductive success is defined by the comparison of the frequencies of a trait in a population over time, whereas the fitness landscape as we defined it compares the frequencies of a trait at the same time but in ensembles with and without selection. Therefore, $h_t(s_1) > h_t(s_2)$ means that the trait value $s_1$ benefits more from selection than $s_2$, but not necessarily that $s_1$ has a greater reproductive success than $s_2$. As a consequence, cells carrying the value $s_1$ could still be less represented in the population than those carrying the value $s_2$.
The two points of view are linked by simple relations as detailed in appendix A, and we argue in section VI that this subtle difference leads to different definitions of the effect of selection, and that the point of view which compares chronological and retrospective distributions could be more suitable to describe selection for certain applications.
When the statistics of trait $S$ are unaffected by selection, that is when there is no correlation between the number of divisions undergone by a cell and the value $s$ for this trait, then $p_{\rm back}(s,t) = p_{\rm for}(s,t)$ and the fitness landscape is flat, equal to the population growth rate. Instead, if the statistics of the trait are strongly perturbed by selection, then the fitness landscape is rougher and exhibits large deviations from its mean value.
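As a minimal sketch of how the fitness landscape can be estimated from lineage data, the snippet below assumes the definition $h_t(s) = \Lambda_t + \ln[p_{\rm back}(s,t)/p_{\rm for}(s,t)]/t$ and a list of (trait value, number of divisions) pairs. The function name `fitness_landscape` and both toy trees are hypothetical.

```python
import math
from collections import defaultdict


def fitness_landscape(lineages, t, n0=1, m=2):
    """lineages: list of (trait value s, number of divisions K) at time t.

    Returns (h, growth_rate) with h[s] = growth_rate + ln(p_back/p_for) / t.
    """
    n = len(lineages)
    p_back, p_for = defaultdict(float), defaultdict(float)
    for s, k in lineages:
        p_back[s] += 1.0 / n              # backward weight 1/N(t)
        p_for[s] += 1.0 / (n0 * m ** k)   # forward weight 1/(N0 * m^K)
    growth_rate = math.log(n / n0) / t
    h = {s: growth_rate + math.log(p_back[s] / p_for[s]) / t for s in p_back}
    return h, growth_rate


t = 1.0
# Hypothetical tree with two initial cells: one yields four lineages with K = 2,
# the other yields two lineages with K = 1.
correlated = [("a", 2), ("a", 2), ("a", 2), ("a", 2), ("b", 1), ("b", 1)]
uncorrelated = [("a", 2), ("a", 2), ("b", 2), ("b", 2), ("a", 1), ("b", 1)]

h_corr, rate_corr = fitness_landscape(correlated, t, n0=2)
h_flat, rate_flat = fitness_landscape(uncorrelated, t, n0=2)
print(h_corr, rate_corr)
print(h_flat, rate_flat)
```

In the second tree the trait is distributed identically in both ensembles, so the landscape reduces to the population growth rate for every trait value.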
Therefore, the variance of the fitness landscape appears as a natural candidate to quantify the roughness of the fitness landscape. However, this variance can be computed in either the forward or the backward ensemble, giving related but different results, and it is therefore unclear which of the two should be used. To resolve this issue, we define the strength of selection $\Pi_S$ acting on the trait $S$ as the change in mean fitness landscape between the ensembles with and without selection [2]:
$$\Pi_S = \langle h_t \rangle_{\rm back} - \langle h_t \rangle_{\rm for}. \tag{2}$$
This quantity indeed reflects the roughness of the fitness landscape, since it is null when the fitness landscape is flat and becomes larger as the difference between the backward and forward statistics for trait $S$ increases. This behavior is well understood by writing the strength of selection as [2]:
$$\Pi_S = \frac{1}{t}\, J(p_{\rm back} | p_{\rm for}), \tag{3}$$
where $J$ is Jeffreys' divergence, a non-negative and symmetric information-theoretic distance between the two distributions $p_{\rm back}(s,t)$ and $p_{\rm for}(s,t)$, defined as $J(p(x)|q(x)) = \int (p(x) - q(x)) \ln(p(x)/q(x))\, \mathrm{d}x$. Let us briefly comment on two points. First, the strength of selection defined here should not be confused with the coefficient of selection, usually defined as the relative difference in fitness associated with two values of a phenotypic trait [6]. Second, the strength of selection is a function of time, since fitness landscapes are time-dependent by definition. Only if a steady state is reached in the long time limit does $h_t(s)$ tend to a constant equal to the steady-state population growth rate $\Lambda$, in which case the strength of selection tends to 0, as expected since selection no longer shifts trait frequencies.
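The equivalence between the two expressions of the strength of selection can be checked numerically. The sketch below assumes $\Pi_S = \langle h_t \rangle_{\rm back} - \langle h_t \rangle_{\rm for}$ and $t\,\Pi_S = J(p_{\rm back}|p_{\rm for})$ for a discrete trait; the distributions and the value of $t$ are arbitrary illustrative choices.

```python
import math


def strength_of_selection(p_for, p_back, t):
    """Difference of the mean fitness landscape between backward and forward ensembles.

    The growth rate cancels in the difference, so only h - Lambda_t is needed.
    """
    h_shifted = {s: math.log(p_back[s] / p_for[s]) / t for s in p_for}
    mean_back = sum(p_back[s] * h_shifted[s] for s in p_for)
    mean_for = sum(p_for[s] * h_shifted[s] for s in p_for)
    return mean_back - mean_for


def jeffreys(p, q):
    """Jeffreys' divergence between two discrete distributions with the same support."""
    return sum((p[s] - q[s]) * math.log(p[s] / q[s]) for s in p)


t = 2.0
p_for = {0: 0.5, 1: 0.3, 2: 0.2}    # hypothetical forward distribution
p_back = {0: 0.2, 1: 0.3, 2: 0.5}   # hypothetical backward distribution

pi_s = strength_of_selection(p_for, p_back, t)
print(pi_s, jeffreys(p_back, p_for) / t)
```

The two printed numbers agree, and the function returns zero when the two distributions coincide, that is when the fitness landscape is flat.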
In the particular case of Gaussian distributions, the strength of selection $\Pi_S$ and the variance of the fitness landscape are in fact linked by a very simple relation. More precisely, when the forward distribution of the fitness landscape is Gaussian and $h_t(s)$ is a bijective function, its backward distribution is also Gaussian, with the same standard deviation but a shifted average value, leading to (see appendix B)
$$\Pi_S = t\, \sigma^2(h_t), \tag{4}$$
where the variance can be taken over either the forward or the backward sampling. Note that we recover here a result known from [2], in a more direct way and with fewer assumptions, since in that reference the authors derived this relation assuming that the joint distribution of $h_t(s)$ and $K$ was a bivariate Gaussian distribution. However, the Gaussian case only covers a small portion of realistic cases, and fitness landscapes can exhibit strong deviations from Gaussian distributions. In the context of age-controlled divisions [10], it can be shown that the distribution of the age fitness landscape is non-Gaussian and depends on the shape of the division rate as a function of the age. Moreover, we show in fig. 3 in section VIII (and in fig. 6 in appendix G) experimental fitness landscapes that are non-Gaussian.
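The Gaussian relation can be recovered in a few lines. The following is only a sketch, not the derivation of appendix B: it assumes that the forward distribution of $h_t$ is Gaussian with mean $\mu$ and variance $\sigma^2$ (symbols introduced here for illustration), and that the backward and forward densities of $h_t$ are related by the factor $e^{t(h - \Lambda_t)}$ implied by the definition of the fitness landscape.

```latex
\begin{align*}
p_{\rm back}(h) &= p_{\rm for}(h)\, e^{t (h - \Lambda_t)}
  \propto \exp\!\left[-\frac{(h-\mu)^2}{2\sigma^2} + t h\right]
  \propto \exp\!\left[-\frac{(h-\mu - t\sigma^2)^2}{2\sigma^2}\right], \\
\langle h_t \rangle_{\rm back} &= \mu + t\sigma^2,
  \qquad \sigma^2_{\rm back}(h_t) = \sigma^2_{\rm for}(h_t) = \sigma^2, \\
\Pi_S &= \langle h_t \rangle_{\rm back} - \langle h_t \rangle_{\rm for} = t\,\sigma^2 .
\end{align*}
```

Completing the square shifts the mean by $t\sigma^2$ and leaves the variance unchanged; normalization of $p_{\rm back}$ additionally fixes $\Lambda_t = \mu + t\sigma^2/2$ under these assumptions.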
In this article, we derive universal relations going beyond the Gaussian assumption, and obtain a set of upper and lower bounds for the strength of selection, in terms of both the forward and backward variances for the fitness landscape. To do so, let us first derive a general inequality constraining the difference in average value for an observable between two probability distributions.

III. GENERAL FLUCTUATION-RESPONSE INEQUALITY
We consider a general system described by a reference probability distribution $p_a(s,t)$, where $s$ is the value taken by a state variable $S$. By perturbing the system, we change the distribution of the variable $S$ from $p_a(s,t)$ to $p_b(s,t)$. We consider an observable depending on the variable $S$ through a function $g_t(s)$, and ask how the mean value of this observable is modified when the system is perturbed.
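Assuming that $p_a(s,t)$ and $p_b(s,t)$ have the same support, we can define the ratio
$$q_t(s) = \frac{p_b(s,t)}{p_a(s,t)}. \tag{5}$$
Let us now compute the covariance between $g_t(s)$ and $q_t(s)$ with respect to $p_a(s,t)$:
$$\mathrm{Cov}_a(g_t, q_t) = \langle g_t q_t \rangle_a - \langle g_t \rangle_a \langle q_t \rangle_a = \langle g_t \rangle_b - \langle g_t \rangle_a, \tag{6}$$
where we used $\langle q_t \rangle_a = 1$, due to the normalization of $p_b$, and $\langle q_t g_t \rangle_a = \langle g_t \rangle_b$.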
Following the method used in [14] to derive mean-variance trade-off bounds in horse race gambling, we use the Cauchy-Schwarz inequality for the covariance:
$$\mathrm{Cov}_a(g_t, q_t)^2 \le \sigma_a^2(g_t)\, \sigma_a^2(q_t), \tag{7}$$
with $\sigma_a^2$ the variance with respect to $p_a(s,t)$. Finally, by combining eqs. (6) and (7), we obtain a general bound for the difference in average values:
$$|\langle g_t \rangle_b - \langle g_t \rangle_a| \le \sigma_a(g_t)\, \sigma_a(q_t). \tag{8}$$
The inequality can be understood as an out-of-equilibrium generalization of the fluctuation-dissipation theorem, because it involves a comparison between a reference unperturbed dynamics and a perturbed dynamics. The difference between the unperturbed and the perturbed averages of the function $g_t(s)$ is bounded by the unperturbed fluctuations of this function, measured by $\sigma_a(g_t)$, times $\sigma_a(q_t)$, which is an information-theoretic distance between the two probability distributions. Indeed, since $\langle q_t \rangle_a = 1$, the variance of $q_t$ is given by $\sigma_a^2(q_t) = \int \mathrm{d}s\, (p_b(s) - p_a(s))^2 / p_a(s)$, and thus the larger $\sigma_a(q_t)$, the further away $p_b(s,t)$ and $p_a(s,t)$ are from each other.
To derive eq. (8), we adopted the point of view of the unperturbed statistics $p_a(s,t)$ as reference, but a similar bound can be obtained in terms of standard deviations with respect to the perturbed dynamics $p_b(s,t)$. We consider the covariance between $g_t(s)$ and $r_t(s) = 1/q_t(s)$, with respect to $p_b(s,t)$:
$$\mathrm{Cov}_b(g_t, r_t) = \langle g_t r_t \rangle_b - \langle g_t \rangle_b \langle r_t \rangle_b = \langle g_t \rangle_a - \langle g_t \rangle_b. \tag{9}$$
Following the same steps, and using the Cauchy-Schwarz inequality for this covariance, we finally obtain
$$|\langle g_t \rangle_b - \langle g_t \rangle_a| \le \sigma_b(g_t)\, \sigma_b(r_t), \tag{10}$$
where the term $\sigma_b(r_t)$ is similarly interpreted as an information-theoretic distance measure between the two distributions. Thus, combining eqs. (8) and (10), the change in mean value of the function $g_t$ of the variable $S$ between an unperturbed and a perturbed statistics is bounded by
$$|\langle g_t \rangle_b - \langle g_t \rangle_a| \le \min\left(\sigma_a(g_t)\, \sigma_a(q_t),\ \sigma_b(g_t)\, \sigma_b(r_t)\right). \tag{11}$$
A similar bound for $|\langle g_t \rangle_b - \langle g_t \rangle_a|$ was derived by Dechant et al. in [13], using Jensen's inequality. Their bound (eq. 5 or 11 in their text) also involves a measure of the distance between the two probability distributions (Kullback-Leibler divergence) and the standard deviation of the observable considered in the unperturbed dynamics. We carry out a numerical comparison between the two bounds in appendix C, to find which of the two is tighter. This shows that the relative performance of the two bounds depends on the shape of the perturbed and unperturbed distributions. In any case, our bound is easy to evaluate since it does not require an optimization over a free parameter, as is the case in [13] (see eq. (C2)).
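As an illustration, the two Cauchy-Schwarz bounds can be evaluated for a discrete state variable. This minimal sketch assumes the bounds $|\langle g_t \rangle_b - \langle g_t \rangle_a| \le \sigma_a(g_t)\sigma_a(q_t)$ and $|\langle g_t \rangle_b - \langle g_t \rangle_a| \le \sigma_b(g_t)\sigma_b(r_t)$ with $q_t = p_b/p_a$ and $r_t = 1/q_t$; the distributions, the observable and the function names are hypothetical.

```python
import math


def mean(p, f):
    return sum(p[s] * f[s] for s in p)


def std(p, f):
    mu = mean(p, f)
    return math.sqrt(sum(p[s] * (f[s] - mu) ** 2 for s in p))


def response_bounds(p_a, p_b, g):
    """Return the shift of the mean of g and the two upper bounds on its magnitude."""
    q = {s: p_b[s] / p_a[s] for s in p_a}   # ratio q = p_b / p_a
    r = {s: p_a[s] / p_b[s] for s in p_a}   # inverse ratio r = 1 / q
    shift = mean(p_b, g) - mean(p_a, g)
    bound_a = std(p_a, g) * std(p_a, q)     # unperturbed statistics as reference
    bound_b = std(p_b, g) * std(p_b, r)     # perturbed statistics as reference
    return shift, bound_a, bound_b


p_a = {0: 0.5, 1: 0.3, 2: 0.2}   # hypothetical unperturbed distribution
p_b = {0: 0.2, 1: 0.3, 2: 0.5}   # hypothetical perturbed distribution
g = {0: 0.0, 1: 1.0, 2: 4.0}     # arbitrary observable

shift, bound_a, bound_b = response_bounds(p_a, p_b, g)
print(shift, bound_a, bound_b, min(bound_a, bound_b))
```

Which of the two bounds is tighter depends on the distributions and on the observable, which is why the minimum of the two is retained.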

IV. THE STRENGTH OF SELECTION IS BOUNDED BY THE VARIABILITY IN FITNESS LANDSCAPE
The results derived in the previous section for general distributions $a$ and $b$ are now used to obtain constraints on the strength of selection. Indeed, by setting the unperturbed distribution $a$ to be the forward distribution of a phenotypic trait $S$ and the perturbed distribution $b$ to be the backward distribution of this trait (which is allowed since the forward and backward distributions have the same support), the difference $\langle g_t \rangle_{\rm back} - \langle g_t \rangle_{\rm for}$ is the change of mean value for $g_t$ between an ensemble without selection (forward) and with selection (backward), while the perturbation measures selection itself. In this context, the ratio $q_t(s)$ and the fitness landscape $h_t(s)$ are linked by the simple relation
$$q_t(s) = \frac{p_{\rm back}(s,t)}{p_{\rm for}(s,t)} = \exp\left[t\left(h_t(s) - \Lambda_t\right)\right]. \tag{12}$$
An important application of the above results is when the arbitrary function $g_t(s)$ is the fitness landscape $h_t(s)$ itself. In this case, eqs. (6) and (9) read
$$\Pi_S = \mathrm{Cov}_{\rm for}(h_t, q_t) = -\mathrm{Cov}_{\rm back}(h_t, r_t). \tag{13}$$
These equalities generalize the linear relation between the strength of selection and the variance of the fitness landscape, valid in the Gaussian case (eq. (4)). To better highlight the role of the variability of the fitness landscape, we write eq. (11) in this context:
$$\Pi_S \le \min\left(\sigma_{\rm for}(h_t)\, \sigma_{\rm for}(q_t),\ \sigma_{\rm back}(h_t)\, \sigma_{\rm back}(r_t)\right), \tag{14}$$
where the absolute values in the l.h.s. can be removed because the strength of selection is non-negative, as follows from eq. (3). Note that the l.h.s. of eq. (14) involves averages with respect to the two probability distributions, unlike what happens in the standard TUR where only one such average is present. The reason is that in Stochastic Thermodynamics, the two relevant probability distributions correspond to a forward and a time-reversed dynamics, and the quantity which replaces $g_t(s)$ is a current, which changes sign under time reversal. Here no such symmetry is present, hence the two averages are not the opposite of one another.
We obtained a universal upper bound for the strength of selection acting on trait $S$, which involves the information-theoretic distances $\sigma_{\rm for}(q_t)$ and $\sigma_{\rm back}(r_t)$ between the backward and forward statistics, and the variances of the fitness landscape in both ensembles, which are in general different from each other.
Even if the interpretation of $\sigma_{\rm for}(q_t)$ as a distance in the framework of linear-response theory is general, $\sigma_{\rm for}(q_t)$ can also be expressed in terms of measurable quantities for cell colonies:
$$\sigma_{\rm for}(q_t) = \frac{\sigma_{\rm for}\left(e^{t h_t}\right)}{\left\langle e^{t h_t} \right\rangle_{\rm for}}. \tag{15}$$
Thus, $\sigma_{\rm for}(q_t)$ quantifies the relative fluctuation of the quantity $\exp[t h_t(s)]$, which itself represents the ratio of the expected number of lineages ending with trait value $s$, rescaled by the number $N_0$ of initial cells, to the forward probability of this trait value (see appendix D). A similar interpretation can be given for the term $\sigma_{\rm back}(r_t)$.
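The upper bound on the strength of selection and the interpretation of $\sigma_{\rm for}(q_t)$ as a relative fluctuation can be checked on a toy example. The sketch below assumes $q_t = \exp[t(h_t - \Lambda_t)]$, the bound $\Pi_S \le \min(\sigma_{\rm for}(h_t)\sigma_{\rm for}(q_t), \sigma_{\rm back}(h_t)\sigma_{\rm back}(r_t))$, and hypothetical values for the distributions, for $t$ and for $\Lambda_t$.

```python
import math


def mean(p, f):
    return sum(p[s] * f[s] for s in p)


def std(p, f):
    mu = mean(p, f)
    return math.sqrt(sum(p[s] * (f[s] - mu) ** 2 for s in p))


t = 2.0
growth_rate = 0.7  # hypothetical Lambda_t; it cancels out of Pi_S and of both bounds
p_for = {0: 0.5, 1: 0.3, 2: 0.2}    # hypothetical forward distribution
p_back = {0: 0.2, 1: 0.3, 2: 0.5}   # hypothetical backward distribution

h = {s: growth_rate + math.log(p_back[s] / p_for[s]) / t for s in p_for}
q = {s: math.exp(t * (h[s] - growth_rate)) for s in h}   # equals p_back / p_for
r = {s: 1.0 / q[s] for s in h}

pi_s = mean(p_back, h) - mean(p_for, h)
upper = min(std(p_for, h) * std(p_for, q), std(p_back, h) * std(p_back, r))

# Relative fluctuation of exp(t * h) in the forward ensemble.
exp_th = {s: math.exp(t * h[s]) for s in h}
relative_fluctuation = std(p_for, exp_th) / mean(p_for, exp_th)

print(pi_s, upper)
print(relative_fluctuation, std(p_for, q))
```

The second line of output shows two equal numbers, since the forward average of $\exp[t h_t]$ equals $\exp[t\Lambda_t]$ by normalization of the backward distribution.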

V. A LINEAR RESPONSE EQUALITY
Let us now investigate precisely the conditions under which the previous inequalities become saturated. It is straightforward to show that when the forward and backward statistics are equal, inequalities (11) and (14) are saturated. Indeed, the l.h.s. terms are 0 and the r.h.s. terms are null because they contain the standard deviation of the constant quantities $q_t(s) = r_t(s) = 1$.
We now study the case where the two probability distributions approach each other. One possible measure of the distance between the two distributions is $\sigma(q)$, or equivalently $\sigma(\ln q) = t\sigma(h_t)$. In the limit $t\sigma(h_t) \to 0$, referred to as the small variability limit, the l.h.s. of eq. (11) reads (see appendix E)
$$\langle g_t \rangle_{\rm back} - \langle g_t \rangle_{\rm for} \simeq t\, \mathrm{Cov}(g_t, h_t), \tag{16}$$
and the l.h.s. of eq. (14), when the function $g_t$ is the fitness landscape itself, reads
$$\Pi_S \simeq t\, \sigma^2(h_t), \tag{17}$$
where the variance and the covariance can be equivalently taken over the forward or backward sampling. When computing the r.h.s. of eqs. (11) and (14), we obtain that eq. (14) is saturated in this limit whereas eq. (11) is not. The limit can also be written $t \ll \sigma(h_t)^{-1}$, which defines a characteristic timescale of the system. In practice, this limit can be reached either for short times or in the case of a strong control mechanism on the divisions, leading the lineages to stay synchronized even after a finite time. It is also possible to regard this limit as a regime of weak selection [22], since the strength of selection is small precisely because of eq. (14).
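The saturation of the upper bound in the small variability limit can be probed numerically. In this sketch, the backward distribution is built from a forward distribution as $p_{\rm back} \propto p_{\rm for}\, e^{\epsilon \delta(s)}$, where the profile $\delta(s)$ and the scale $\epsilon$ are hypothetical quantities introduced for illustration, so that $t\sigma(h_t) = \epsilon\, \sigma(\delta)$; time is set to $t = 1$.

```python
import math


def mean(p, f):
    return sum(p[s] * f[s] for s in p)


def std(p, f):
    mu = mean(p, f)
    return math.sqrt(sum(p[s] * (f[s] - mu) ** 2 for s in p))


def selection_ratios(p_for, delta, eps):
    """Return (upper bound / Pi_S, Pi_S / sigma_for(h)^2) for t = 1."""
    weights = {s: math.exp(eps * delta[s]) for s in p_for}
    norm = mean(p_for, weights)
    q = {s: weights[s] / norm for s in p_for}     # ratio p_back / p_for
    p_back = {s: p_for[s] * q[s] for s in p_for}
    h = {s: math.log(q[s]) for s in p_for}        # h - Lambda_t when t = 1
    pi_s = mean(p_back, h) - mean(p_for, h)
    bound = std(p_for, h) * std(p_for, q)
    return bound / pi_s, pi_s / std(p_for, h) ** 2


p_for = {0: 0.5, 1: 0.3, 2: 0.2}   # hypothetical forward distribution
delta = {0: 0.0, 1: 1.0, 2: 3.0}   # hypothetical landscape profile

for eps in (1.0, 0.1, 0.01):
    print(eps, selection_ratios(p_for, delta, eps))
```

Both ratios approach 1 as $\epsilon$ decreases, consistent with the bound becoming an equality and with the strength of selection reducing to $t\sigma^2(h_t)$ in that limit.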

VI. COMPARISON WITH FISHER'S FUNDAMENTAL THEOREM AND PRICE'S EQUATION
In this section, we highlight the similarities between our results and the relations derived by Fisher and Price, in which the population growth rate, or fitness, associated with a trait value $s$ plays a similar role to our fitness landscape $h_t(s)$. However, because these notions of fitness are distinct, as explained in section II and further analyzed in appendix A, the interpretations of selection contained in these equations are qualitatively different.
Fisher's fundamental theorem of natural selection states that the time derivative of the mean fitness of a population is equal to the variance of the fitness across the population [23,24]: $\mathrm{d}\Lambda_t/\mathrm{d}t = \mathrm{Var}_{\rm back}(\Lambda_i(s,t))$, where $\Lambda_i(s,t) = (\mathrm{d}n(s,t)/\mathrm{d}t)/n(s,t)$ is the instantaneous growth rate, or instantaneous fitness, of the subpopulation of size $n(s,t)$ carrying the trait value $s$. The variance is computed with respect to the backward distribution, which puts equal weights on individuals, and therefore is the natural distribution to consider. The r.h.s. of both eqs. (4) and (17) and of Fisher's theorem involve the variance of a certain kind of fitness within the population. In contrast, the l.h.s. in Fisher's theorem is a measure of evolution of the population, while the l.h.s. in our result is a trait-dependent measure of selection. Moreover, some well-known limitations of Fisher's theorem lie in the implicit assumption that natural selection is the only possible phenomenon leading to a change in the gene frequencies [24]. This assumption neglects many important phenomena such as mutations and recombination events [22], random drift due to finite population size, and specific features of seascapes [7]. In contrast, our result does not suffer from any of these limitations, since it only requires the population to be represented as a branched tree, and is completely independent of the dynamics that generates the tree.
Price's equation [24] predicts the time evolution of the mean value of a trait, and involves two terms: a covariance term representing the selection effect, and the 'environment change term', or dynamic effect, which accounts for all the other sources of variability leading to a change in the mean value of the trait. The part of the time evolution of $\langle s \rangle_{\rm back}$ that is due to the separate effect of natural selection, in Price's sense, which we denote with the superscript NS, can then be written as
$$\Delta \langle s \rangle^{\rm NS}_{\rm back} = \frac{\mathrm{Cov}_{\rm back}\left(s, \Lambda(s)\right)}{\bar{\Lambda}}, \tag{18}$$
where $\Delta \langle s \rangle^{\rm NS}_{\rm back} = \left(\langle s \rangle_{\rm back}(t+\tau) - \langle s \rangle_{\rm back}(t)\right)^{\rm NS}$, $\Lambda(s) = n(s, t+\tau)/n(s,t)$, and $\bar{\Lambda} = \langle \Lambda(s) \rangle_{\rm back}$. We can draw a parallel between this equation and eq. (16), as their r.h.s. both involve the covariance of the trait subjected to selection and a fitness associated with it. Note that there is no 'environment change term' in eq. (16) because the strength of selection is defined precisely in such a way as to isolate the effect of selection from other potential sources of variability.
Price's equation should be viewed as a way to separate the effect of selection from the effect of the environment, rather than as a predictive or quantitative formula to compute them, as remarked in [25]. The same can be said of all our results, where the strength of selection and the covariance between a trait value and its associated fitness landscape value can be viewed as two possible definitions of selection. These two notions of natural selection are different because of the distinction between growth rate and fitness landscape: one is concerned with the change in the frequencies of the trait values over time, and is computed by counting individuals, while the other represents the shift in the frequencies of the trait values at snapshot time $t$ between situations with and without selection, and is based on the comparison between chronological and retrospective samplings of the lineages.
Let us give a minimal example for which the strength of selection is non-zero while the mean value of the trait $S$ is unchanged, because of the balance between heterogeneity in reproductive success and phenotypic switching at division. This case is illustrated in fig. 1 for a trait $S$ taking only two values: $s = 1$ and $s = 2$. Individuals with trait value 1 typically reproduce twice as fast as those carrying trait value 2, but they can also switch to trait value 2 randomly at division. For simplicity, the values 1 and 2 of the trait cannot themselves change over time; in other words, there is no environment effect here. The average value of trait $S$ is the same at time $t = 0$ and at time $t$, the covariance term in Price's equation is zero, and there is no selection in Price's sense. Therefore, from this point of view, there is no difference between this situation and the situation where both values 1 and 2 reproduce at the same rate, without phenotypic switching at division. On the other hand, individuals with trait value 1 are over-represented in the backward statistics as compared to the forward statistics, while the opposite is true for trait value 2, meaning that the fitness landscapes for $s = 1$ and $s = 2$ are different. Indeed, $p_{\rm back}(s=1,t) = p_{\rm back}(s=2,t) = 1/2$, $p_{\rm for}(s=1,t) = 3/8$, $p_{\rm for}(s=2,t) = 5/8$, which leads to $t h_t(s=1) = \ln(4/3) + t\Lambda_t$ and $t h_t(s=2) = \ln(4/5) + t\Lambda_t$, with $t\Lambda_t = \ln 3$, using eq. (1). This difference in fitness landscape results in a non-zero strength of selection, $t\Pi_S = \ln(5/3)/8$, using eq. (2). We argue that the strength of selection $\Pi_S$ may be a more appropriate way to define selection, since it gives a non-zero measure of selection for the example discussed above, and thus is more representative of the selection occurring in the population.
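The numbers quoted for this two-state example can be reproduced directly from the four probabilities and from $t\Lambda_t = \ln 3$. The short check below takes these values from the text and assumes the definitions $t h_t(s) = t\Lambda_t + \ln[p_{\rm back}(s,t)/p_{\rm for}(s,t)]$ and $\Pi_S = \langle h_t \rangle_{\rm back} - \langle h_t \rangle_{\rm for}$; the variable names are illustrative.

```python
import math

p_back = {1: 1 / 2, 2: 1 / 2}
p_for = {1: 3 / 8, 2: 5 / 8}
t_lambda = math.log(3)  # t * Lambda_t, as quoted in the text

# t * h_t(s) for each trait value
t_h = {s: t_lambda + math.log(p_back[s] / p_for[s]) for s in p_back}

# t * Pi_S: difference between the backward and forward averages of t * h_t
t_pi = sum((p_back[s] - p_for[s]) * t_h[s] for s in p_back)

print(t_h[1] - t_lambda, math.log(4 / 3))
print(t_h[2] - t_lambda, math.log(4 / 5))
print(t_pi, math.log(5 / 3) / 8)
```

Each printed pair agrees, which confirms the values $\ln(4/3)$, $\ln(4/5)$ and $\ln(5/3)/8$ given above.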

VII. A LOWER BOUND FOR THE STRENGTH OF SELECTION
We showed how the equality between the strength of selection and the variance of the fitness landscape distribution, that holds in the Gaussian case, becomes an inequality in general. To complement the upper bound on the strength of selection given by eq. (14), we now derive a non-trivial lower bound, which presents an interest to quantify the minimal effect of selection on a particular trait.
Using a property of Jeffreys' divergence, the strength of selection can be decomposed as a sum of two Kullback-Leibler (KL) divergences: $J(p_{\rm back}|p_{\rm for}) = D_{\rm KL}(p_{\rm back}|p_{\rm for}) + D_{\rm KL}(p_{\rm for}|p_{\rm back})$, where $D_{\rm KL}(p|q) = \int p(x) \ln(p(x)/q(x))\, \mathrm{d}x$. The positivity of both KL divergences, ensured by Jensen's inequality, gives $\Lambda_t - \langle h_t(s) \rangle_{\rm for} \ge 0$ and $\langle h_t(s) \rangle_{\rm back} - \Lambda_t \ge 0$. By combining these two inequalities, we recover that the strength of selection is a non-negative quantity.
We can therefore improve the trivial bound on the strength of selection, which is 0, by improving the two inequalities separately, using a sharpened version of Jensen's inequality, derived in [26]. Let us now detail how this works in our problem.
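We define the convex functions $\phi_{\rm for}(x) = e^{tx}$ and $\phi_{\rm back}(x) = e^{-tx}$, and the function
$$\psi(x; \nu) = \frac{\phi(x) - \phi(\nu)}{(x - \nu)^2} - \frac{\phi'(\nu)}{x - \nu}, \tag{19}$$
where $\phi'$ stands for the derivative of $\phi$. The sharpened version of Jensen's inequality reads
$$\left\langle \phi_{\rm for}(h_t) \right\rangle_{\rm for} - \phi_{\rm for}\left(\langle h_t \rangle_{\rm for}\right) \ge \sigma^2_{\rm for}(h_t)\, \inf_{h} \psi_{\rm for}\left(h; \langle h_t \rangle_{\rm for}\right), \tag{20}$$
and is used to improve upon the inequality $\Lambda_t - \langle h_t(s) \rangle_{\rm for} \ge 0$. A similar improvement is obtained for $\langle h_t(s) \rangle_{\rm back} - \Lambda_t \ge 0$ by considering $\phi_{\rm back}$ instead of $\phi_{\rm for}$. Combining the two results gives (see appendix F)
$$\Pi_S \ge \frac{1}{t} \ln\left[1 + e^{-t \langle h_t \rangle_{\rm for}}\, \sigma^2_{\rm for}(h_t)\, \psi_{\rm for}\left(h_{\min}; \langle h_t \rangle_{\rm for}\right)\right] + \frac{1}{t} \ln\left[1 + e^{t \langle h_t \rangle_{\rm back}}\, \sigma^2_{\rm back}(h_t)\, \psi_{\rm back}\left(h_{\max}; \langle h_t \rangle_{\rm back}\right)\right], \tag{21}$$
which shows that the lower bound depends on the forward and backward variances of the fitness landscape, as well as on its average values and on the minimal (resp. maximal) values of these distributions, denoted $h_{\min}$ (resp. $h_{\max}$). When the fitness landscape is a monotonic function of the value of the trait, which is the case for cell age and size [10], or for the number of divisions, these extreme values are given by the extreme values of the trait itself. Several weaker but simpler forms of this inequality, this time independent of the average fitness landscape values, or independent of the extreme values, or independent of both, are derived in appendix F. In any of those cases, the lower bound is a linear combination of the forward and backward variances of the fitness landscape.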
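A numerical sketch of such a lower bound is given below. It is not the authors' implementation: it assumes the sharpened Jensen inequality in the form $\langle \phi(X) \rangle - \phi(\langle X \rangle) \ge \sigma^2(X) \inf_x \psi(x; \langle X \rangle)$ with $\psi(x;\nu) = [\phi(x) - \phi(\nu)]/(x-\nu)^2 - \phi'(\nu)/(x-\nu)$, and that the infimum is reached at $h_{\min}$ for $\phi_{\rm for}$ and at $h_{\max}$ for $\phi_{\rm back}$. The name `psi`, the distributions and the value of $\Lambda_t$ are hypothetical.

```python
import math


def mean(p, f):
    return sum(p[s] * f[s] for s in p)


def variance(p, f):
    mu = mean(p, f)
    return sum(p[s] * (f[s] - mu) ** 2 for s in p)


def psi(phi, dphi, x, nu):
    """Assumed form of the auxiliary function in the sharpened Jensen inequality."""
    return (phi(x) - phi(nu)) / (x - nu) ** 2 - dphi(nu) / (x - nu)


def lower_bound(p_for, p_back, h, t):
    """Lower bound on Pi_S; undefined for a flat landscape (division by zero in psi)."""
    h_min, h_max = min(h.values()), max(h.values())
    m_for, m_back = mean(p_for, h), mean(p_back, h)
    psi_for = psi(lambda x: math.exp(t * x), lambda x: t * math.exp(t * x), h_min, m_for)
    psi_back = psi(lambda x: math.exp(-t * x), lambda x: -t * math.exp(-t * x), h_max, m_back)
    term_for = math.log(1 + math.exp(-t * m_for) * variance(p_for, h) * psi_for) / t
    term_back = math.log(1 + math.exp(t * m_back) * variance(p_back, h) * psi_back) / t
    return term_for + term_back


t = 2.0
growth_rate = 0.7  # hypothetical Lambda_t
p_for = {0: 0.5, 1: 0.3, 2: 0.2}
p_back = {0: 0.2, 1: 0.3, 2: 0.5}
h = {s: growth_rate + math.log(p_back[s] / p_for[s]) / t for s in p_for}

pi_s = mean(p_back, h) - mean(p_for, h)
lower = lower_bound(p_for, p_back, h, t)
print(lower, pi_s)
```

For this toy example the bound is strictly positive and lies below the strength of selection, so it improves on the trivial bound 0.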

VIII. TESTS OF THE LINEAR RESPONSE RELATIONS
We now illustrate the various bounds for growing cell populations, using both simulations and time-lapse video-microscopy experimental data [27].
First, we test eq. (8) for the number of divisions $K$, and for the linear function $g_t(K) = K$, so that the inequality bounds $\langle K \rangle_{\rm back} - \langle K \rangle_{\rm for}$. We simulate lineage trees starting from one cell, for a particular agent-based model in which cells are described by their sizes. Cell sizes continuously increase at a constant rate between divisions, and cells divide after a stochastic time depending only on their sizes. Each simulation of such a tree yields a single point on the scatter plot of fig. 2, which shows the ratio of $\sigma_{\rm for}(K)\, \sigma_{\rm for}(q_t)$ to $\langle K \rangle_{\rm back} - \langle K \rangle_{\rm for}$ versus the population growth rate $\Lambda_t$. Two sets of points are presented, which only differ in the final time of the simulation. As expected from eq. (8), all points in both sets are above 1. When the duration of the simulation is short ($t = 3$), the final population is small, around $N \sim 20$; therefore, for a given tree, the lineages do not have time to differentiate significantly and the variability in the number of divisions among the lineages is small. In that case, simulation points approach the horizontal dashed line at $y = 1$ corresponding to the saturation of the inequality. The final population $N$ fluctuates significantly from one simulation to the next, because the simulation time is short and all simulations start with a single cell with random initial size. As a result, the dispersion of values of $\Lambda_t$ is large. Now, when doubling the duration of the simulation, the cloud of scattered points is considerably reduced in both directions. The horizontal dispersion decreases because, as $t$ increases, the state of the system at the final time becomes less and less affected by the initial condition. On the vertical axis, there is a gap between the lower part of the scatter plot and the horizontal line at $y = 1$, due to the increase of heterogeneity in the number of divisions in the lineages with the simulation time.
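A test of this kind can be sketched with a few lines of simulation. The model below is a hypothetical stand-in, not the one used for fig. 2: sizes grow at unit rate, the division rate is assumed to be $x^2$, division is symmetric, and the initial size is fixed to 1. The bound checked is $\langle K \rangle_{\rm back} - \langle K \rangle_{\rm for} \le \sigma_{\rm for}(K)\, \sigma_{\rm for}(q_t)$.

```python
import math
import random
from collections import defaultdict


def simulate_divisions(t_end, rng, x0=1.0):
    """Return the number of divisions K of every lineage alive at t_end.

    Hypothetical model: dx/dt = 1, division rate x**2, two equal daughters.
    """
    counts = []
    stack = [(x0, 0.0, 0)]  # (birth size, birth time, divisions so far)
    while stack:
        x_birth, t_birth, k = stack.pop()
        # Exact sampling of the division size for hazard x**2 and unit growth rate:
        # the integrated hazard (x_div**3 - x_birth**3) / 3 is exponentially distributed.
        x_div = (x_birth ** 3 + 3.0 * rng.expovariate(1.0)) ** (1.0 / 3.0)
        t_div = t_birth + (x_div - x_birth)
        if t_div > t_end:
            counts.append(k)  # this cell is still alive at t_end
        else:
            stack.extend([(x_div / 2.0, t_div, k + 1)] * 2)
    return counts


def division_bound(counts, n0=1, m=2):
    """Return (<K>_back - <K>_for, sigma_for(K) * sigma_for(q))."""
    n = len(counts)
    p_back, p_for = defaultdict(float), defaultdict(float)
    for k in counts:
        p_back[k] += 1.0 / n
        p_for[k] += 1.0 / (n0 * m ** k)
    q = {k: p_back[k] / p_for[k] for k in p_for}
    mean_for = sum(p * k for k, p in p_for.items())
    mean_back = sum(p * k for k, p in p_back.items())
    var_k = sum(p * (k - mean_for) ** 2 for k, p in p_for.items())
    var_q = sum(p * (q[k] - 1.0) ** 2 for k, p in p_for.items())
    return mean_back - mean_for, math.sqrt(var_k * var_q)


rng = random.Random(0)
counts = simulate_divisions(3.0, rng)
shift, bound = division_bound(counts)
print(len(counts), shift, bound)
```

Repeating the simulation with different seeds and final times would give one point per tree, in the spirit of the scatter plot described above.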
Second, we test our results on real data extracted from [27], which consist of 11 population trees corresponding to the growth of E. coli in different nutrients. We focus on the number of divisions $K$, the size $X$ and the age $A$, which are easily accessible and for which we previously studied the theoretical fitness landscapes [10]. Only plots for the size are presented in the main text, and similar plots for the age can be found in appendix G.
The first step is to determine the fitness landscapes, which are shown for three particular experimental conditions in fig. 3. Each row corresponds to a particular experiment, the first column displays the fitness landscapes as functions of the size $x$, and the second column shows the distributions of the corresponding fitness landscapes, computed with the forward size distributions. It is straightforward to demonstrate that in the case where the number of divisions $K$ is completely determined by the value $s$ of the trait, then $h_t(s) = h_t(K(s)) = K(s) \ln 2 / t$ [2,10]. In this case, the fitness landscape is an ensemble of plateaus corresponding to the values of $K$ featured in the population at time $t$, and cells on the same plateau have undergone the same number of divisions, even though they have different values of $s$. Those predicted plateaus are actually observed for several experiments, as shown in fig. 3a,c. Indeed, we evaluate the mean number of divisions for the set of cells used to compute each point of the fitness landscape, and represent it with a color code. We also plot the theoretical plateaus, with the discrete number of divisions $K$ corresponding to the plateau on the right y-axis of each plot. We see a very good agreement between the mean number of divisions of cells aligned on a particular plateau and the value $K$ corresponding to this plateau on the y-axis. This suggests a strong correlation between the value of the size and the number of divisions on the lineage. The dots between the plateaus correspond to sizes that have been reached by cells with different numbers of divisions (leading to non-integer mean values), as highlighted by the gradation from one color to another. Going from the top experiment to the bottom one, the plateaus gradually blur and are replaced in fig. 3e by a smoother curve, in good agreement with the logarithmic prediction we made in [10].
This happens when lineages de-synchronize because of the cumulative effect of various noises, leading to a weaker dependence of the number of divisions K on the final value of the trait s.
In the right column of the figure, we see that fitness landscapes strongly deviate from being normally distributed, which justifies the need to go beyond the results known in the Gaussian case. More precisely, on fig. 3b,d, the fitness landscape distributions exhibit peaks at values of $h$ corresponding to plateaus appearing on the left-column plots. We notice that not all of the 3 plateaus of fig. 3a (resp. 2 plateaus of fig. 3c) are matched by a peak in the corresponding forward fitness landscape distribution on fig. 3b (resp. fig. 3d). This is because the cell size distribution tends to zero for extreme sizes (that is, 0 and $+\infty$), so cells of large sizes aligning on plateaus defined by small $K$ and cells of small sizes aligning on plateaus defined by large $K$ contribute very little to the cell size distribution, and thus to the fitness landscape distribution.
Then, we test the upper and lower bounds on the strength of selection acting on cell size using these data. We show on fig. 4 the upper bound $U_X$ given by eq. (14) and the lower bound $L_X$ given by eq. (21), normalized by the strength of selection $\Pi_X$. The x-axis labels, in no particular order, the colonies grown in different nutrient media [27]. As expected, points representing the upper bound and those representing the lower bound are respectively above and below the horizontal dashed line at $y = 1$. Experiments for which the normalized upper bound approaches 1 indicate that cell cycles are almost synchronized, and thus that there is little variability in the number of divisions among the lineages.
Note that Nozoe et al. also proved [2] that the strength of selection acting on the number of divisions bounds the strength of selection acting on any trait: $0 \leq \Pi_S \leq \Pi_K$. This bound is typically not as tight as eq. (14) (see appendix G for a comparison). To improve upon it, one can use $S = K$ in eq. (14) to obtain a bound for $\Pi_K$ itself.
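The quantities tested in this section can be estimated from a lineage tree along the following lines. This is a minimal Python sketch under stated assumptions, not the analysis pipeline used for fig. 4. It assumes the lineage weights of [2], which are not restated in this section (backward weight $1/N(t)$ and forward weight $m^{-K}/N_0$ for a lineage with $K$ divisions), the definition $\Pi_S = \langle h_t\rangle_{\rm back} - \langle h_t\rangle_{\rm for}$, and an upper bound of the form $\min(\sigma_{\rm for}(h_t)\sigma_{\rm for}(q_t), \sigma_{\rm back}(h_t)\sigma_{\rm back}(r_t))$ for eq. (14). The function name `selection_statistics`, the binned trait labels and the four-lineage toy tree are hypothetical.

```python
import math
from collections import defaultdict


def selection_statistics(lineages, t, m=2, n0=1):
    """Estimate h_t(s), the strength of selection and an upper bound.

    lineages: list of (K, s) pairs, one per lineage alive at time t, where K
    is the number of divisions and s a discrete (binned) trait value.
    Assumed weights: backward 1/N(t), forward m**(-K)/n0.
    """
    n_t = len(lineages)
    p_back = defaultdict(float)
    p_for = defaultdict(float)
    for k, s in lineages:
        p_back[s] += 1.0 / n_t
        p_for[s] += m ** (-k) / n0

    # Population growth rate and fitness landscape, as in eq. (1).
    growth_rate = math.log(n_t / n0) / t
    h = {s: growth_rate + math.log(p_back[s] / p_for[s]) / t for s in p_back}

    def mean(p, f):
        return sum(p[s] * f[s] for s in p)

    def std(p, f):
        mu = mean(p, f)
        return math.sqrt(max(0.0, sum(p[s] * (f[s] - mu) ** 2 for s in p)))

    q = {s: p_back[s] / p_for[s] for s in p_back}  # backward / forward
    r = {s: p_for[s] / p_back[s] for s in p_back}  # forward / backward
    strength = mean(p_back, h) - mean(p_for, h)
    upper = min(std(p_for, h) * std(p_for, q),
                std(p_back, h) * std(p_back, r))
    return h, strength, upper


# Toy tree with N0 = 1: one lineage divided once, one twice, two three times.
tree = [(1, "large"), (2, "large"), (3, "small"), (3, "small")]
h_size, pi_size, u_size = selection_statistics(tree, t=1.0)
# Using K itself as the trait gives the strength of selection on divisions.
h_div, pi_div, u_div = selection_statistics([(k, k) for k, _ in tree], t=1.0)
print(pi_size, pi_div, u_div)
```

On this toy tree the ordering $\Pi_S \leq \Pi_K \leq U_K$ can be checked directly. Note that for a trait taking only two values, $q_t$ is an affine function of $h_t$, so the upper bound coincides with $\Pi_S$.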

Figure 3 (caption fragment). For (b,d), the fitness landscapes are highly non-Gaussian, and the peaks in these distributions correspond to the value of one of the plateaus.

IX. DISCUSSION
The general idea of comparing the response of a system in the presence of a perturbation to its fluctuations in the absence of the perturbation lies at the heart of the Fluctuation-Dissipation theorem, which has a long history in physics, with some applications to evolution [1,28]. Remarkably, the present framework with forward (unperturbed) and backward (perturbed) dynamics can be conveniently applied to population dynamics without having to perform additional experiments, since both probabilities can be calculated from the same lineage tree. Our main result is a set of inequalities for the average of an arbitrary function of a trait or for its fitness landscape, valid beyond the Gaussian assumption, and which constrain the strength of selection in populations. These inequalities are universal because they only rely on the branching structure of the population tree and are completely independent of the dynamics of the tree, that is, the set of rules governing the division of the branches. In the context of cell populations, this means for instance that our results are valid for any control strategy (sizer, timer, adder, ...) and in the presence of any source of noise (variability in single cell growth rate or size at division, asymmetry in resource partitioning between sister cells, ...) [5], in the presence of possible mutations, and regardless of the nature of the cell (bacteria, yeast, stem cell, ...). Although we illustrated our results with cell populations, they apply to any stochastic process defined on a branched tree. In particular, they could be insightful in the context of ecology, where such trees are used to represent phylogeny [29]. In this case, each lineage could represent a species or a version of a gene, instead of an individual, and the divisions would represent speciations or mutations, respectively. 
The notions of fitness landscape and selection strength appear meaningful in this setting, as a quantification of the correlations between a feature of a species/genetic information and the number of speciations/mutations its phylogenetic lineage underwent.
For applications, we focused on phenotypic traits, like cell size or age, and in these cases, we found our upper bound on the strength of selection to be tight. In future work, it would be interesting to apply this framework to genotypic traits instead of phenotypic ones [22] and possibly exploit recent methods of lineage tracking [30]. This could open new perspectives to address a number of important problems like antibiotic resistance [4], the differentiation of stem cells [31] or virus evolution.
The search for universal principles in evolution is an active field of research [28,32]. An important step in this endeavor was made by Fisher, who boldly compared his theorem to the second law of thermodynamics. While the theorem turned out not to be as general as expected, Fisher nevertheless had the correct intuition about its importance for evolutionary biology, and he was also correct in expecting that such a general principle should be related to thermodynamics.

Appendix A
The fitness landscape $h_t(s)$ of eq. (1) should not be confused with the fitness associated with the value $s$ of the trait, which can be identified with the growth rate of the subpopulation carrying that trait. However, they are linked by a simple relation. In line with the definition of the population growth rate, we define the growth rate of the subpopulation carrying the value $s$ of the trait $S$ as
$$\Lambda_t(s) = \frac{1}{t} \ln \frac{n(s,t)}{n(s,0)} , \tag{A1}$$
where $n(s,t) = p_{\rm back}(s,t) N(t)$ is the number of cells with trait value $s$ at time $t$. Since $N(t) = N(0) e^{t\Lambda_t}$, eq. (A1) is equivalent to $\Lambda_t(s) = \Lambda_t + \frac{1}{t} \ln [p_{\rm back}(s,t)/p_{\rm back}(s,0)]$. Of course, defining this quantity only makes sense between two times $t = 0$ and $t$ for which the value $s$ is present in the population; otherwise the expression is undefined. By comparing eqs. (1) and (A1), we can link the growth rate associated with a trait value $s$ to the fitness landscape of this trait value:
$$h_t(s) = \Lambda_t(s) + \frac{1}{t} \ln \frac{p_{\rm back}(s,0)}{p_{\rm for}(s,t)} = \Lambda_t(s) + \frac{1}{t} \ln \frac{p_{\rm for}(s,0)}{p_{\rm for}(s,t)} . \tag{A2}$$
The last equality follows from the fact that, at $t = 0$, the cells have not divided yet and so the forward and backward samplings of the population are identical.
Finally, we see that the three quantities $\Lambda_t$, $\Lambda_t(s)$ and $h_t(s)$ are different but intimately linked. In the long-time limit, for which equilibrium distributions no longer depend on time, they all become equal to the steady-state population growth rate $\Lambda$. Moreover, as we already noted, eq. (1) indicates that the sign of $h_t(s) - \Lambda_t$ tells us how the backward and forward probabilities of trait value $s$ compare in the population at time $t$, or in other words, whether the trait value $s$ is over-represented in the population as compared to a situation without selection. Thus, $h_t(s) - \Lambda_t$ quantifies the separate effect of selection. Following the intuitive understanding of the growth rate associated with a trait value $s$, eq. (A1) means that the sign of $\Lambda_t(s) - \Lambda_t$ tells us how the backward statistics of the trait value $s$ compare at time $t$ and at time 0. A trait value is favored by the population dynamics, which includes all phenomena leading to changes in trait value frequencies, if its growth rate is larger than the population growth rate, which corresponds to an increase of the frequency of that trait value in the population over time.
The last relation, eq. (A2), provides a new insight: the sign of $h_t(s) - \Lambda_t(s)$ is linked to the comparison between the forward probability of trait value $s$ at time 0 and at time $t$. The forward statistics is constructed to balance the effect of selection occurring in tree-structured data. However, it is affected by all the other sources of variability, such as mutations. Therefore, the sign of $h_t(s) - \Lambda_t(s)$ indicates how the frequency of the value $s$ of trait $S$ evolves over time due to every phenomenon but selection.
Eq. (1) can be written in the form of a fluctuation relation [2,10], with an exponential bias between the forward and backward probabilities, similar to the Crooks fluctuation relation for work in stochastic thermodynamics. The same can be done with eqs. (A1) and (A2); however, unlike eq. (1), these new fluctuation relations each link two probability distributions that may not have the same support.

Appendix B: Gaussian case
We show in this section that a linear relation between the strength of selection and the variance of the fitness landscape holds when the fitness landscape is normally distributed. To do so, let us first derive a very useful result: by isolating $p_{\rm back}(s,t)$ in eq. (1) and integrating over $s$ using the normalization of $p_{\rm back}$, the population growth rate is expressed as a forward average
$$\Lambda_t = \frac{1}{t} \ln \langle e^{t h_t(s)} \rangle_{\rm for} . \tag{B1}$$
We can do the same, isolating $p_{\rm for}(s,t)$ this time, leading to
$$\Lambda_t = -\frac{1}{t} \ln \langle e^{-t h_t(s)} \rangle_{\rm back} . \tag{B2}$$
We now assume that $h_t(s)$ can be described by a continuous probability distribution, even though the trait $S$ may be discrete, as is the case for the number of divisions. We take a Gaussian forward distribution with mean $\langle h_t \rangle_{\rm for}$ and variance $\sigma_{\rm for}(h_t)^2$ for the fitness landscape $h_t(s)$; then $\exp(t h_t(s))$ follows a log-normal distribution of mean
$$\langle e^{t h_t} \rangle_{\rm for} = e^{t \langle h_t \rangle_{\rm for} + (t \sigma_{\rm for}(h_t))^2/2} . \tag{B3}$$
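Together with eq. (B1), this relation shows that for a given forward average fitness landscape, the growth rate is positively affected by the variability between the lineages. The backward average of the fitness landscape is given by the forward average of a biased fitness landscape:
$$\langle h_t \rangle_{\rm back} = e^{-t \Lambda_t} \int h_t(s)\, e^{t h_t(s)}\, p_{\rm for}(s,t)\, ds . \tag{B4}$$
We make the hypothesis that the fitness landscape is a bijective function of the trait value and use the conservation of probability, $p_{\rm for}(s,t)\, ds = p_{\rm for}(h)\, dh$, leading to a solvable Gaussian integral
$$\langle h_t \rangle_{\rm back} = e^{-t \Lambda_t} \int h\, e^{t h}\, p_{\rm for}(h)\, dh = e^{-t \Lambda_t} \left( \langle h_t \rangle_{\rm for} + t \sigma_{\rm for}(h_t)^2 \right) e^{t \langle h_t \rangle_{\rm for} + (t \sigma_{\rm for}(h_t))^2/2} . \tag{B5}$$
Finally, combining eqs. (B1), (B3) and (B5), we obtain
$$\langle h_t \rangle_{\rm back} = \langle h_t \rangle_{\rm for} + t \sigma_{\rm for}(h_t)^2 , \tag{B6}$$
and thus
$$\Pi_S = \langle h_t \rangle_{\rm back} - \langle h_t \rangle_{\rm for} = t \sigma_{\rm for}(h_t)^2 . \tag{B7}$$
Moreover, combining eqs. (B1), (B3) and (B7), we deduce that $\langle h_t \rangle_{\rm for}$ and $\langle h_t \rangle_{\rm back}$ are not only respectively smaller and greater than $\Lambda_t$, as discussed in section VII, but are actually symmetrical around this value: $\langle h_t \rangle_{\rm back} - \Lambda_t = \Lambda_t - \langle h_t \rangle_{\rm for} = t \sigma_{\rm for}(h_t)^2/2$. In other words, in this particular case, the KL divergence is symmetrical: $D_{KL}(p_{\rm for}|p_{\rm back}) = D_{KL}(p_{\rm back}|p_{\rm for})$.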
When $h_t(s)$ follows a Gaussian distribution in the forward statistics, it also follows a Gaussian distribution in the backward statistics, because the bias of the fluctuation relation between $p_{\rm back}$ and $p_{\rm for}$ is exponential in $h_t$. Since $h_t$ follows a Gaussian distribution of mean $\langle h_t \rangle_{\rm back}$ and standard deviation $\sigma_{\rm back}(h_t)$ in the backward statistics, $\exp[-t h_t(s)]$ follows a log-normal distribution of mean
$$\langle e^{-t h_t} \rangle_{\rm back} = e^{-t \langle h_t \rangle_{\rm back} + (t \sigma_{\rm back}(h_t))^2/2} . \tag{B8}$$
We now take the inverse of this formula and use eqs. (B2) and (B6) to replace the backward average:
$$e^{t \Lambda_t} = e^{t \left( \langle h_t \rangle_{\rm for} + t \sigma_{\rm for}(h_t)^2 \right) - (t \sigma_{\rm back}(h_t))^2/2} . \tag{B9}$$
By comparing eqs. (B3) and (B9), using eq. (B1), it follows that $\sigma_{\rm back}(h_t) = \sigma_{\rm for}(h_t)$. Finally, the standard deviation in eq. (B7) can be taken with respect to either statistics, and we omit the index to write the general version of eq. (B7):
$$\Pi_S = t \sigma(h_t)^2 . \tag{B10}$$
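The Gaussian relations of this appendix can be checked numerically. The sketch below is illustrative only: it assumes the definition $\Pi_S = \langle h_t\rangle_{\rm back} - \langle h_t\rangle_{\rm for}$, works directly with the distribution of $h$ (midpoint quadrature on a truncated grid), and uses a hypothetical function name and arbitrary parameter values.

```python
import math


def gaussian_landscape_check(mu, sigma, t, n=20000, width=12.0):
    """Return (growth rate, strength of selection) for a Gaussian forward
    distribution of the fitness landscape, by midpoint quadrature.

    mu, sigma: forward mean and standard deviation of h.
    The grid spans mu +/- width*sigma, which must cover the backward
    distribution (shifted by t*sigma**2).
    """
    dh = 2.0 * width * sigma / n
    grid = [mu - width * sigma + (i + 0.5) * dh for i in range(n)]
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    p_for = [norm * math.exp(-0.5 * ((h - mu) / sigma) ** 2) for h in grid]

    # Forward average of exp(t h), which gives the growth rate (eq. (B1)).
    z = sum(p * math.exp(t * h) for p, h in zip(p_for, grid)) * dh
    growth_rate = math.log(z) / t

    # Backward average as a forward average of the biased landscape.
    h_back = sum(p * h * math.exp(t * h) for p, h in zip(p_for, grid)) * dh / z
    h_for = sum(p * h for p, h in zip(p_for, grid)) * dh
    return growth_rate, h_back - h_for


growth_rate, strength = gaussian_landscape_check(mu=1.0, sigma=0.3, t=2.0)
# Compare with mu + t*sigma**2/2 and t*sigma**2 respectively.
print(growth_rate, 1.0 + 2.0 * 0.3 ** 2 / 2.0)
print(strength, 2.0 * 0.3 ** 2)
```

Both printed pairs should agree up to quadrature error, reflecting the linear relation between the strength of selection and the variance of the landscape.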

Appendix C: Upper bounds numerical comparison
In this section we compare numerically the upper bound $U_{GL}$ obtained in eq. (11), restated as eq. (C1), with the upper bound $U_{DS}$ derived by Dechant and Sasa (Eq. 5 in [13]), eq. (C2), where $\sigma = {\rm sign}(\langle g_t \rangle_b - \langle g_t \rangle_a)$ and $K^a_{g_t}(\gamma) = \ln \langle \exp(\gamma g_t) \rangle_a$ is the cumulant-generating function of $g_t$. Both quantities $U_{GL}$ and $U_{DS}$ bound the difference $|\langle g_t \rangle_b - \langle g_t \rangle_a|$ between the average values of a function $g_t$ of a variable $S$ with respect to probability distributions $p_a$ and $p_b$.
To compare them, we took beta distributions for $p_a$ and $p_b$, having the same support $[0, 1]$ so that both bounds are defined.
Beta-distributed variables admit a probability density function (pdf) of the form $f(s, \alpha, \beta) = B(\alpha, \beta)\, s^{\alpha-1} (1-s)^{\beta-1}$, where $B(\alpha, \beta)$ is a normalization constant, and their mean is given by $\langle s \rangle = \alpha/(\alpha+\beta)$. We fix the pdf in the ensemble $b$ to $p_b(s) = f(s, 3, 3)$, whose bell shape is reminiscent of a Gaussian distribution on a finite interval. The pdf in the ensemble $a$ is taken as $p_a(s) = f(s, \alpha, \beta)$, where $\alpha$ and $\beta$ are varied in the interval $[2, 4]$. We choose the simple function $g_t(s) = s$.
Results are shown on fig. 5. The first row of figures shows the difference between the upper bound and the actual difference $|\langle s \rangle_b - \langle s \rangle_a|$, for our bound $U_{GL}$ (fig. 5a), and for Dechant-Sasa's bound $U_{DS}$ (fig. 5b). As expected, all points on these two plots are positive. We plot on fig. 5c the real difference $\langle s \rangle_b - \langle s \rangle_a$, which is in complete agreement with the theory: $\langle s \rangle_b - \langle s \rangle_a = 1/2 - \alpha/(\alpha+\beta)$.
Finally, fig. 5d shows a comparison between our bound and Dechant-Sasa's bound: blue regions represent sets of parameters $(\alpha, \beta)$ where our bound is numerically tighter, and the opposite is true in red regions. We note that, although the blue region is larger than the red region, the advantage of one bound over the other, $|U_{GL} - U_{DS}|$, is generally larger in the red region. Therefore, the answer to the question 'Which bound is tighter?' depends on the actual distributions $p_a(s)$ and $p_b(s)$. However, we note that our bound is easier to compute since it does not require an optimization over an external parameter, which is the case for $U_{DS}$ in [13] (parameter $\gamma$ in eq. (C2)). Note that this optimization can be bypassed by choosing a specific value of $\gamma$ in eq. (C2), but then the corresponding bound is less tight than the version with the infimum.
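The beta-distribution test of $U_{GL}$ can be reproduced along the following lines. This is a sketch rather than the code behind fig. 5: it assumes that $U_{GL}$ has the form $\min(\sigma_a(g)\sigma_a(p_b/p_a), \sigma_b(g)\sigma_b(p_a/p_b))$ (the expression is symmetric in $a$ and $b$, so the labelling does not matter), uses midpoint quadrature in plain Python, and does not implement $U_{DS}$. The function names are hypothetical.

```python
import math


def beta_pdf(s, a, b):
    """Beta density on (0, 1), normalized with log-gamma functions."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(s) + (b - 1) * math.log(1 - s))


def gl_bound_and_gap(alpha, beta, n=20000):
    """Return (assumed form of U_GL, |<s>_b - <s>_a|) for g(s) = s,
    with p_a = Beta(alpha, beta) and p_b = Beta(3, 3)."""
    ds = 1.0 / n
    grid = [(i + 0.5) * ds for i in range(n)]  # midpoints avoid s = 0 and 1
    p_a = [beta_pdf(s, alpha, beta) for s in grid]
    p_b = [beta_pdf(s, 3.0, 3.0) for s in grid]

    def mean(p, f):
        return sum(pi * fi for pi, fi in zip(p, f)) * ds

    def std(p, f):
        mu = mean(p, f)
        var = sum(pi * (fi - mu) ** 2 for pi, fi in zip(p, f)) * ds
        return math.sqrt(max(0.0, var))

    q = [pb / pa for pa, pb in zip(p_a, p_b)]  # ratio weighted under a
    r = [pa / pb for pa, pb in zip(p_a, p_b)]  # ratio weighted under b
    bound = min(std(p_a, grid) * std(p_a, q), std(p_b, grid) * std(p_b, r))
    gap = abs(mean(p_b, grid) - mean(p_a, grid))
    return bound, gap


bound, gap = gl_bound_and_gap(2.0, 4.0)
print(bound - gap)  # non-negative, as on fig. 5a
```

For $(\alpha, \beta) = (2, 4)$ the gap equals $|1/2 - 1/3| = 1/6$, in line with the analytical expression above, and the bound lies above it.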

Appendix D: Information-theoretic distance $\sigma(q)$ between perturbed and unperturbed dynamics in terms of measurable quantities
We show in this section how the information-theoretic distance $\sigma_{\rm for}(q_t)$ is linked to measurable quantities in cell colonies, for a general trait $S$, and for the particular case of age models. Combining eqs. (1) and (5) gives the ratio $q_t(s) = \exp[t(h_t(s) - \Lambda_t)]$, which combined with eq. (B1) leads to eq. (15) in the main text. Thus, $\sigma_{\rm for}(q_t)$ is the coefficient of variation in the forward statistics of the quantity
$$e^{t h_t(s)} = e^{t \Lambda_t} \frac{p_{\rm back}(s,t)}{p_{\rm for}(s,t)} = \frac{n(s,t)}{N_0\, p_{\rm for}(s,t)} , \tag{D1}$$
where $n(s,t) = N(t) p_{\rm back}(s,t)$ is the number of cells with the value $s$ of trait $S$ at time $t$.
In the case where $S$ is the number of divisions, the fitness landscape is called the lineage fitness and is given by $h_t(K) = K \ln m / t$. Therefore, $\sigma_{\rm for}(q_t) = \sigma_{\rm for}(m^K)/\langle m^K \rangle_{\rm for}$ is the relative fluctuation of the quantity $m^K$, representing the expected number of lineages that underwent $K$ divisions, normalized by the initial population $N_0$, divided by the forward probability of $K$.
For age models, where division is only controlled by the age of the cell, we know a fluctuation relation linking the forward and backward distributions of generation times $\tau$, defined as the time between two consecutive divisions on the same lineage [33]:
$$f_{\rm back}(\tau) = m f_{\rm for}(\tau) e^{-\Lambda \tau} . \tag{D2}$$
This relation can be understood as a version of the fluctuation relation on the number of divisions [2,10] at the scale of the cell cycle. However, let us note two differences: first, unlike the one for the number of divisions, eq. (D2) is only true in the long-time limit, when the population is growing at a constant steady-state growth rate $\Lambda$; and second, the distributions $f_{\rm back}$ and $f_{\rm for}$ are not snapshot distributions at time $t$, but distributions of generation times computed along the weighted lineages. We define $q(\tau) = f_{\rm back}(\tau)/f_{\rm for}(\tau)$ in the same way (and correspondingly $r(\tau) = f_{\rm for}(\tau)/f_{\rm back}(\tau)$), and following the same steps for a general function $g(\tau)$ we derive
$$|\langle g(\tau) \rangle_{\rm back} - \langle g(\tau) \rangle_{\rm for}| \leq \min\left( \sigma_{\rm for}(g(\tau))\, \sigma_{\rm for}(q(\tau)),\ \sigma_{\rm back}(g(\tau))\, \sigma_{\rm back}(r(\tau)) \right) . \tag{D3}$$
From eq. (D2), we express the information-theoretic distance term as
$$\sigma_{\rm for}(q(\tau)) = \frac{\sigma_{\rm for}(e^{-\Lambda \tau})}{\langle e^{-\Lambda \tau} \rangle_{\rm for}} , \tag{D4}$$
which is the relative fluctuation in the forward sampling of the quantity $\exp[-\Lambda \tau] = f_{\rm back}(\tau)/(m f_{\rm for}(\tau))$. We know from [34] that the backward distribution is also the generation time distribution of the direct ancestor cells. Therefore, $\exp[-\Lambda \tau]$ represents the ratio of the probability for the ancestor cell to divide at age $\tau$ to the expected number of daughter cells born from that division that divide at age $\tau$.
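For a concrete age model, the distance term of this appendix can be evaluated numerically. The sketch below is an illustration under assumptions that are not taken from the text: the forward generation-time distribution is chosen to be a gamma distribution, the steady-state growth rate $\Lambda$ is obtained by bisection from the normalization of $f_{\rm back}$ in eq. (D2), that is $m \langle e^{-\Lambda\tau} \rangle_{\rm for} = 1$, and all names and parameter values are hypothetical.

```python
import math


def gamma_pdf(tau, shape, scale):
    """Gamma density, used here as an assumed forward distribution f_for."""
    return math.exp((shape - 1) * math.log(tau) - tau / scale
                    - math.lgamma(shape) - shape * math.log(scale))


def age_model_distance(shape=4.0, scale=0.25, m=2, tau_max=20.0, n=4000):
    """Return (Lambda, sigma_for(q)) for an age model with gamma f_for."""
    d = tau_max / n
    taus = [(i + 0.5) * d for i in range(n)]
    f_for = [gamma_pdf(x, shape, scale) for x in taus]

    def forward_mean_exp(rate):
        # Forward average of exp(-rate * tau).
        return sum(f * math.exp(-rate * x) for f, x in zip(f_for, taus)) * d

    # Bisection on m * <exp(-Lambda tau)>_for = 1 (normalization of f_back).
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if m * forward_mean_exp(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    growth_rate = 0.5 * (lo + hi)

    # q = f_back / f_for = m exp(-Lambda tau); its forward mean is 1.
    q = [m * math.exp(-growth_rate * x) for x in taus]
    mean_q = sum(f * w for f, w in zip(f_for, q)) * d
    var_q = sum(f * (w - mean_q) ** 2 for f, w in zip(f_for, q)) * d
    return growth_rate, math.sqrt(max(0.0, var_q))


growth_rate, sigma_q = age_model_distance()
print(growth_rate, sigma_q)
```

For a gamma distribution the forward average of $e^{-\Lambda\tau}$ is known in closed form, $(1 + \Lambda\theta)^{-k}$ for shape $k$ and scale $\theta$, which provides an independent check of the quadrature: $\Lambda = (m^{1/k} - 1)/\theta$.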

Appendix E: Small variability limit
In this section, we study the two sides of the fluctuation-response inequality for an arbitrary function $g_t$ of a phenotypic trait $S$ (eq. (11)) in the limit where the forward and backward distributions approach each other, and show that they are mathematically equivalent in this limit when the function $g_t$ is the fitness landscape $h_t$. The difference between the two distributions is captured by the 'distance' term $\sigma(q_t(s))$, or equivalently $\sigma(\ln q_t(s)) = \sigma(\ln(p_{\rm back}(s,t)/p_{\rm for}(s,t))) = t \sigma(h_t(s))$, where the standard deviations can be taken either in the backward or forward statistics. From now on, we refer to the limit where the forward and backward distributions are close to each other as the small variability limit, defined by $t \sigma(h_t(s)) \to 0$.
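We first use this limit in the forward statistics: $t \sigma_{\rm for}(h_t(s)) \to 0$. Starting from eq. (1), we isolate $p_{\rm back}(s,t)$, multiply both sides by $g_t(s)$ and integrate over $s$, leading to the expression of the backward average of the function $g_t$ as the forward average of a biased version of the same function:
$$\langle g_t \rangle_{\rm back} = e^{t(\langle h_t \rangle_{\rm for} - \Lambda_t)} \int g_t(s)\, e^{t(h_t(s) - \langle h_t \rangle_{\rm for})}\, p_{\rm for}(s,t)\, ds . \tag{E1}$$
In order to expand the exponential, we assume that for any $s$, $t(h_t(s) - \langle h_t \rangle_{\rm for})$ is small, which corresponds to $t \sigma_{\rm for}(h_t)$ small because $\sigma_{\rm for}(h_t)$ is the characteristic distance to the mean. Therefore,
$$\langle g_t \rangle_{\rm back} \approx e^{t(\langle h_t \rangle_{\rm for} - \Lambda_t)} \left[ \langle g_t \rangle_{\rm for} + t \left( \langle g_t h_t \rangle_{\rm for} - \langle g_t \rangle_{\rm for} \langle h_t \rangle_{\rm for} \right) \right] , \tag{E2}$$
where $\langle g_t h_t \rangle_{\rm for} - \langle g_t \rangle_{\rm for} \langle h_t \rangle_{\rm for} = {\rm Cov}_{\rm for}(h_t, g_t)$ is the covariance of $h_t$ and $g_t$ with respect to the forward probability. The term in the bracket is a first-order correction to $\langle g_t \rangle_{\rm for}$ in $t \sigma_{\rm for}(h_t)$. Now we need to compute the prefactor, which involves $\exp[-t \Lambda_t]$, starting with eq. (B1) and using the same expansion:
$$e^{t \Lambda_t} = \langle e^{t h_t} \rangle_{\rm for} \approx e^{t \langle h_t \rangle_{\rm for}} \left[ 1 + \frac{(t \sigma_{\rm for}(h_t))^2}{2} \right] , \tag{E3}$$
which is a second-order correction to $\exp[t \langle h_t \rangle_{\rm for}]$ in $t \sigma_{\rm for}(h_t)$.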
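Combining eqs. (E2) and (E3), we find at first order
$$\langle g_t \rangle_{\rm back} - \langle g_t \rangle_{\rm for} \approx t\, {\rm Cov}_{\rm for}(h_t, g_t) . \tag{E4}$$
As mentioned at the beginning of this section, we can use the backward point of view, with a first-order expansion in $t \sigma_{\rm back}(h_t(s))$ instead. In this case,
$$\langle g_t \rangle_{\rm for} = e^{t(\Lambda_t - \langle h_t \rangle_{\rm back})} \int g_t(s)\, e^{t(\langle h_t \rangle_{\rm back} - h_t(s))}\, p_{\rm back}(s,t)\, ds \approx e^{t(\Lambda_t - \langle h_t \rangle_{\rm back})} \left[ \langle g_t \rangle_{\rm back} - t\, {\rm Cov}_{\rm back}(h_t, g_t) \right] . \tag{E5}$$
The prefactor is computed with the same expansion starting from eq. (B2):
$$e^{-t \Lambda_t} = \langle e^{-t h_t} \rangle_{\rm back} \approx e^{-t \langle h_t \rangle_{\rm back}} \left[ 1 + \frac{(t \sigma_{\rm back}(h_t))^2}{2} \right] , \tag{E6}$$
which is a second-order correction in $t \sigma_{\rm back}(h_t)$. Combining eqs. (E5) and (E6), we find
$$\langle g_t \rangle_{\rm back} - \langle g_t \rangle_{\rm for} \approx t\, {\rm Cov}_{\rm back}(h_t, g_t) . \tag{E7}$$
When comparing eqs. (E4) and (E7), we conclude that the covariance can be taken equivalently in the forward or backward statistics. Let us now turn to the r.h.s. of inequality eq. (11). Using the expression of $\sigma_{\rm for}(q_t)$ as the forward coefficient of variation of the quantity $\exp[t h_t(s)]$ (eq. (15)) (resp. $\sigma_{\rm back}(r_t)$ as the backward coefficient of variation of the quantity $\exp[-t h_t(s)]$), it is straightforward to show from the same kind of Taylor expansion that
$$\sigma_{\rm for}(g_t)\, \sigma_{\rm for}(q_t) \underset{t\sigma \to 0}{\sim} t\, \sigma_{\rm for}(h_t)\, \sigma_{\rm for}(g_t) , \tag{E8}$$
$$\sigma_{\rm back}(g_t)\, \sigma_{\rm back}(r_t) \underset{t\sigma \to 0}{\sim} t\, \sigma_{\rm back}(h_t)\, \sigma_{\rm back}(g_t) . \tag{E9}$$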
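Thus, the inequality eq. (11) is not necessarily saturated in this limit: by the Cauchy-Schwarz inequality, $|{\rm Cov}(h_t, g_t)| \leq \sigma(h_t) \sigma(g_t)$, with equality when $g_t$ is an affine function of $h_t$. However, in the particular case where $g_t(s)$ is the fitness landscape $h_t(s)$, eq. (E4) reads
$$\Pi_S = \langle h_t \rangle_{\rm back} - \langle h_t \rangle_{\rm for} \underset{t\sigma \to 0}{\sim} t\, \sigma_{\rm for}(h_t)^2 , \tag{E10}$$
and thus the inequality eq. (14) is saturated in this limit.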
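The saturation of the bound in the small variability limit can be illustrated numerically on a toy example. The sketch below assumes a trait with three values, a hypothetical forward distribution and a landscape whose spread is controlled by a parameter `eps`; it uses $\Pi_S = \langle h_t\rangle_{\rm back} - \langle h_t\rangle_{\rm for}$ and the forward branch $\sigma_{\rm for}(h_t)\sigma_{\rm for}(q_t)$ of the upper bound. None of these numerical choices come from the data analyzed in the paper.

```python
import math


def small_variability_ratios(eps, t=1.0):
    """Return (Pi_S / (t * var_for(h)), upper bound / Pi_S) for a
    three-valued toy landscape of spread proportional to eps."""
    p_for = [0.2, 0.5, 0.3]                   # assumed forward distribution
    h = [1.0, 1.0 + eps, 1.0 + 3.0 * eps]     # assumed fitness landscape

    # Growth rate from the forward average of exp(t h), eq. (B1).
    growth_rate = math.log(sum(p * math.exp(t * x) for p, x in zip(p_for, h))) / t
    # Ratio q = p_back / p_for = exp(t (h - Lambda)), from eq. (1).
    q = [math.exp(t * (x - growth_rate)) for x in h]
    p_back = [p * w for p, w in zip(p_for, q)]

    def mean(p, f):
        return sum(pi * fi for pi, fi in zip(p, f))

    def var(p, f):
        mu = mean(p, f)
        return sum(pi * (fi - mu) ** 2 for pi, fi in zip(p, f))

    strength = mean(p_back, h) - mean(p_for, h)
    upper = math.sqrt(var(p_for, h) * var(p_for, q))
    return strength / (t * var(p_for, h)), upper / strength


for eps in (1.0, 0.1, 0.001):
    print(eps, small_variability_ratios(eps))
```

As `eps` decreases, both ratios approach 1, while for large `eps` the upper bound stays above the strength of selection without being saturated.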
Figure 6. Experimental fitness landscapes for age and their forward distributions, computed with data from [27]. Each row corresponds to a different experiment, and the first column shows fitness landscapes $h_t(a)$ as functions of age $a$. (a,c): the grey horizontal dashed lines correspond to theoretical plateaus, equal to $K \ln 2 / t$, predicted when $K$ is fully determined by the value $s$ of the trait. The integers $K$ corresponding to the plateaus are indicated on the right y-axis. (e): the plateaus are blurred and replaced by a smoother scatter plot in good agreement with the general shape of the theoretical prediction, made in the case of an age-controlled model in steady state [10]. $\Lambda$ is the population growth rate and the constant $C$ was adjusted to fit the scatter plot. (a,c,e): each dot is made of all the cells having the same age, and the mean number of divisions among those cells is represented by the color of the dot. This shows that dots aligning on a plateau corresponding to a number $K$ of divisions truly come from cells that underwent $K$ divisions. (b,d,f): the second column represents the distribution $p_{\rm for}(h)$ of the corresponding age fitness landscapes (i.e. on the same row) with the forward age distribution. For all three rows, the fitness landscapes are highly non-Gaussian, and for (b,d) the peaks in these distributions correspond to the values of some of the plateaus.

[...] be expressed as [2]: $t h_t(s) = \ln \sum_K m^K R_{\rm for}(K|s)$, in terms of the conditional forward probability $R_{\rm for}(K|s)$ of the number of divisions. Since $m \geq 1$ and $K \geq 0$, this relation implies that the fitness landscape is a non-negative quantity. In this case, eq. (F5) (resp. eq. (F6)) gives a non-trivial bound based on the first two moments (resp. second moment) of the forward fitness landscape distribution.
The simplest bound is thus obtained by considering eq. (F6) with $h_{\min} = 0$ and $h_{\max} = +\infty$, which cancels the second term in the bracket and gives eq. (F7).

Appendix G: Further tests on experimental data from [27]
On fig. 3 in section VIII, we showed the size fitness landscapes $h_t(x)$ as a function of the cell size $x$, and the distribution $p_{\rm for}(h)$ of the size fitness landscape computed with the forward cell size distribution, for three of the eleven experiments from [27]. We now show the corresponding plots when choosing the age $A$ of the cell as the phenotypic trait $S$. The experiment on row $i$ of fig. 6 is the same as the experiment on row $i$ of fig. 3. By comparing the two figures, we see that for the first two rows the theoretical plateaus at $h = K \ln 2 / t$ are the same for $h_t(a)$ and for $h_t(x)$, which is expected since the cells are the same, and both the age and the size are highly correlated with the number of divisions. On fig. 6e, as for the size on fig. 3e, the plateaus start to blur and give rise to a smoother scatter plot, whose shape matches the linear prediction we made for age-controlled models in steady state [10]. As in the case of the cell size, the distributions of fitness landscapes shown in the second column are highly non-Gaussian.
Figure 8. Ratio of the general bound $\Pi_K$ to our upper bound $U_X$ for size (top plot) and $U_A$ for age (bottom plot) for the 11 experiments from [27], in no particular order. All points are above the black horizontal dashed line at $y = 1$, which indicates that our upper bound is always smaller, and thus tighter, than $\Pi_K$.

Then, we test the upper and lower bounds on the strength of selection acting on cell age using the same data. We show on fig. 7 the upper bound $U_A$ given by eq. (14) and the lower bound $L_A$ given by eq. (21), normalized by the strength of selection $\Pi_A$ acting on age. The x-axis numbers the colonies grown in different nutrient media [27]. As expected, points representing the upper bound and those representing the