Origin of lognormal-like distributions with a common width in a growth and division process

Lognormal statistical distributions are observed in a variety of scientific fields. The widths of these distributions in the log scale are often similar, but the underlying mechanism that maintains these widths within a small range has not been well explained. We show that a stochastic process of halving followed by addition can yield a stationary distribution that resembles the universal lognormal distribution with a certain width. The mechanism that we propose here would provide insight into the essence of why lognormal-like distributions in many systems have a common width.


I. INTRODUCTION
The shape of a frequency distribution curve provides clues to the underlying processes that form the distribution.The lognormal distribution is a type of probability distribution in which the logarithm of a variable is normally distributed [1,2].Lognormal distributions are observed in various fields of science including the life sciences [3], materials sciences, and social sciences [4].Interestingly, although the kinetics that govern the systems represented by such distributions are likely to be diverse, the width of observed lognormal distributions on the logarithmic scale is often similar across fields when they are normalized, as seen in some of the examples from literatures plotted in Fig. 1(a).More specifically, the variance of the natural logarithm of the variable (V) ranges from about 0.2 to 0.5 in these examples [4][5][6][7][8][9][10][11][12][13][14], and in this paper we refer to it as the "common width." Here we are specifically interested in the fact that the sizes of the living cells from bacteria to mammalian cells also fall into lognormal distributions with the common width [Fig.1(a)].Cell size distributions should be determined by the growth and division processes [5], and the kinetics of consecutive division (fragmentation) processes are well studied and known to yield lognormal distributions under certain conditions [15,16].In this paper, we consider a simple growth and division model mimicking bacteria as a basis for the investigation of the underlying mechanism that produces a lognormal distribution with the common width.We show that a simple process involving the stochastic decrease of a variable by halving (binary division) followed by an increase by growth can yield a lognormal-like distribution with the common width [V ≈ (ln 2)/2 = 0.35] as a stationary probability distribution (SPD).We also discuss the universality and applicability of this model, by showing that this characteristic distribution appears regardless of the values of some set of parameters that govern the dynamics.

II. MODEL
A bacterium grows in size and divides into two individuals, which initially are each half the size of the original.Mimicking * yomo@ist.osaka-u.ac.jp these dynamics, we assume an object with a positive variable x (e.g., the size of bacteria) that grows at a rate h g (x) and divides into b different individuals, each of which has a valuable that is 1/b of the original (b > 1; b = 2 for halving) at a probability h d (x) per unit time [Fig.1(b)].Then, we derive the rate equation where f(x,t) is the frequency of the objects with variable ) dx increases over time as a result of the increase in the number of objects by the division process; therefore, we normalized f(x,t) to obtain the probability density function p(x,t) = f (x,t)/N(t).By integrating ∂f (x,t) ∂t with respect to x, we obtain dN(t) dt = (b − 1)ν(t)N (t), where ν(t) = ∞ 0 h d (x) p(x,t) dx.Taking this increase in the total frequency into account, we derive the rate equation of p(x,t): This is the master equation for the general model.As representative functions for h g (x) and h d (x) to be analyzed, we examined cases in which the growth rate and the division probability are proportional to arbitrary powers of x(n g and n d , respectively), i.e., h g (x) = k g x n g and h d (x) = k d x n d , where k g and k d are constants.These are monotonic functions with a positive value for x > 0. This description can generally cover various possible cases; for * [11] (0.48), liposome surface area (square of the diameter) [12] (0.41), nanoparticle length [13] (0.32), and number of words in a sentence - [14] (0.48).The dashed black line shows the SPD obtained from the numerical simulation using the bacterial model [Eq.( 1) , which gave V = 0.43.The bold gray line shows the lognormal curve with V = (ln 2)/2 = 0.35.Our model gives a value for V that roughly falls within the range described in the literature.The normalized variable on the horizontal axis was calculated by subtracting a value for the invalid range of variables where the frequency was zero from the original value.The resulting value was then normalized so that the variable at the maximum frequency (i.e., the mode) of the moving average of three data points was one.The frequency of the normalized variable on the linear scale from the literature was converted to frequency on the logarithmic scale by multiplying by the variable itself.(b) Schematic diagram of the general model described by Eq. ( 1).
example, when n g < 0, the object with a larger value of x grows more slowly, and when n g > 0, the object with a larger value of x grows faster.For such cases, Eq. ( 1) is rewritten as where μ i (t) is the ith raw moment of p(x,t).

A. Distribution of the variable derived from the bacterial model
Let us first show a simple case regarding bacterial size.We assume that both of the growth rate and the division probability are proportional to x, i.e., n g = n d = 1 in Eq. ( 2) [17].We also assume that the division is a halving, i.e., b = 2.We designate this condition as the bacterial model.We have used the finite difference method to numerically solve Eq. ( 2).Because x spans multiple orders of magnitude, we used a logarithmic scale (grid width in the simulation was set to create a constant bin for the logarithm of x) in the calculation.As a result, we obtained a lognormal-like SPD [Fig.1(a), black dashed line].Again, the shape of this SPD is similar not only to that of the size distributions of bacteria [5][6][7][8] but also to that of the size distributions of other single-celled microorganisms [8], mammalian cells [8][9][10], and organs [11] as well as nonliving material such as liposomes [12] and nanoparticles [13] and the number of words in a sentence [14] [Fig.1(a), symbols].Thus, the bacterial model representation of Eq. ( 2) might possess an intrinsic feature that is common to a variety of examples.

B. Dependence of the shape of the SPDs on the constants
To show how the model applies to more general cases other than the bacterial model, we analyze the dependence of the shape of the SPD on the constants in Eq. ( 2) (k g and k d ,b, and n g and n d ).

Dependence on k g and k d
Primarily, we show that the shape of the SPD is independent of the rate constants k g and k d when n g = n d (including the bacterial model).We will prove this by transforming the equation.Let us define q(z) as the SPD for a normalized variable z = x/a, where a = k g /k d , which satisfies p(x) dx = q(z) dz.In this case, the shape of the probability distribution for ln(z) is the same as that for ln(x), with the only difference being in the scale.At the stationary state (∂p(x,t) ∂t = 0), Eq. ( 2) is transformed to In this formulation, a is canceled out when n g = n d .Therefore, in the bacterial model and in its extended model, the shape of the SPD is independent of the rate constants k g and k d .

Dependence on b
We performed a numerical simulation of Eq. ( 2) for various values of b when n g = n d = 1 [Fig.2(a)].The resultant SPDs fit a lognormal distribution; however, when b = 2, the widths deviated from the common width.We calculated the variance of ln(x) (defined as VV) that represents the width of the SPD in the log scale and plotted it against b [Fig.2(b), •].The plot shows a monotonic increase of V with increasing b.We have thus shown that b = 2 is an important parameter that determines the common width.

Dependence on n g and n d
We again performed a numerical simulation and plotted the SPDs for various values of n g and n d with b = 2 as a constant [Fig.2(c)].The results show that the SPDs are similar when n g = n d = n (n = −1∼4), although the SPD becomes slightly narrower as n recedes from 1 [Fig.2(c)].When n g = n d [n g = 1.5 and n d = 1; Fig. 2(c), ], the SPD is still lognormal-like, but there is a significant change in the width of the distribution.Therefore, n g = n d is also an important condition that determines the common width.

Approximated analytical solution of V
To further investigate the dependence of the width of the lognormal-like distribution on the constants, let us derive an approximate analytical solution of V.By multiplying both sides of Eq. ( 2) by x m and integrating both sides with respect to x (0∼∞), we obtain an equation for the raw moments of the SPD in the stationary state: From this equation, we obtain two equations for m = 1 and m = 2.Then, using the lognormal approximation with the mean of ln(x) M and the variance of ln(x) V, the ith raw moment of the lognormal can be expressed as μ i = exp(iM + i 2 V /2).From these three equations we obtain the equation for V as b + e 2n d V = 2be (n g −1)V . ( In Eq. ( 3), the rate constants k g and k d are again canceled out.When n g = n d = 1, V is obtained as (ln b)/2, which agrees with the value of V derived from the numerical simulation [Fig.2(b), •] and shows that V is approximately proportional to ln b.As b increases, the deviation of V from (ln b)/2 increases because the approximation for V is based on the raw moments of the linear variable x; the disagreement is enhanced when the width of the SPD in the log scale increases.We then plotted a contour map of V for n g and n d as derived from Eq. ( 3) with b = 2 [Fig.2(d)].This plot clearly shows that V is smaller when n g is smaller (objects with a smaller variable grow relatively faster) and n d is larger (objects with a larger variable divide relatively more frequently).More importantly, the isocontours are parallel to the line n g = n d .This result implies that V is nearly constant when n g = n d , regardless of the values of each.We also used Eq. ( 3) to estimate an approximate analytical solution for V when n g = n d .By assuming that V is proportional to ln b for ln b near 0 and for n g = n d = n, differentiating both sides of Eq. ( 3) with ln b gives V = (ln b)/2.Indeed, the value of V from the SPD derived from the numerical simulation is insensitive to n and approximately (ln b)/2 even when n = 1 [Fig.2(b)].As mentioned above, V is approximately (ln b)/2 regardless of k g ,k d , and n as long as n g = n d ; that is, the common width appeared [V ≈ (ln 2)/2] when the division is halving and the growth rate and the division probability are proportional.

C. When the division is not exactly even
We have so far assumed that the variables of daughter objects after division are exactly 1/b of the parental object.Realistically, resultant variables after division often fluctuate.To consider the effect of fluctuations in the division process, we revise the division term in Eq. ( 2) with the conditions n g = n d = n and b = 2.When the probability of the variable after division is set to the uniform distribution function, the first term on the right side in Eq. ( 2) becomes 2 where s is the integration variable.This type of formulation of the division has been described as a fragmentation of a cluster [16,18,19].In this case of the uniform distribution function, the SPDs derived from the numerical simulation were skewed in the log scale, and their widths were greater than V = (ln 2)/2 [Fig.3(a)].This result indicates that there might be some limit to the variability of the division process.We next examined the case when the division fluctuates within a Gaussian probability distribution.In this case, objects with variable x divide into two daughters with variable ξ (0 < ξ < x) and x−ξ , where the probability of ξ follows a Gaussian distribution function with a mean value x/2 and a standard deviation σ x [G(x/2,σ x,ξ )] [Fig.3(b), insets].This function was truncated to satisfy the expression Thus, the first term on the right side of Eq. (2) becomes 2 σ s,x)p(s,t) ds.For the bacterial model (n = 1), the results of the numerical simulation for various values of σ show that the skewness of ln(x) is close to zero for a value of σ up to 0.1 [Fig.3(b)].This result implies that if the fluctuation in b is Gaussian and moderate (σ < 0.1), the shape of the SPD is insensitive to fluctuations in the division process.

IV. DISCUSSION
Our numerical and analytical investigations show that lognormal-like size distributions with the common width [V ≈ (ln 2)/2; Fig. 1(a), bold gray line] result from a balance between growth (continuous additive increase) and stochastic division (discrete multiplicative decrease) processes that have The effect on the model when the variables for daughter objects after halving are not exactly half of the parental object.(a) SPDs derived from numerical simulation when the probability of the variable after division is set to the uniform distribution with n = 0 (•), 1 (•), 2 (+), and 3 (×).The solid gray line shows the lognormal with V = (ln 2)/2.By differentiating both sides of the master equation by x in the stationary state, the SPD can be solved only when n is 0 or 1 as 4a −2 xexp(−2a −1 x) or a −1 exp(−a −1 x), respectively.(b) The σ dependency of the skewness of ln(x) for SPDs calculated from the numerical simulation when the probability of the variable after division is set to the truncated normal distribution and when n = 1.The insets show the probability distribution of the daughter variable ξ (0 to x) for each σ .the following characteristics: (i) division is a halving process (b = 2) and (ii) growth rate and division probability are proportional (n g = n d ).We will now discuss the mechanism that produces a stationary distribution such as size homeostasis.Let us consider the average rates of increase and decrease for the variables controlling the growth and division processes.Because division occurs stochastically, the average decrease rate should be the product of the division probability and the amount of decrease in a single division.Here, the amount of decrease in a single division is proportional to the variable (i.e., x−x/b = (1−1/b)x] because the division is a multiplicative process [Fig.1(b)].From requirement (ii) above, the growth rate h g (x) and the division probability h d (x) are proportional such that the difference between the average increase rate h g (x) and the average decrease rate h . For smaller variables, the average increase rate then becomes greater than the average decrease rate, while for larger variables, the reverse applies.Because of this feature, the variable remains within a range that forms a stationary distribution.If the amount of decrease in a single division is constant for all ranges of the variable, then the ratio of the increase rate and the decrease rate is constant for all ranges of the variable.In such a case, the distribution diffuses due to the stochasticity in the division, and there is then no stationary distribution.As mentioned above, when requirement (ii) is true, the major factor for producing stationary distribution is the multiplicative property of the division.For a rigorous proof of the existence of the stationary distribution, further studies are required.
Next, we discuss the factors that produce the lognormality and the common width of the SPD.In our model, the variability of the value of x results only from the discreteness of the stochastic division process.Because the amount of the discrete decrease in a single division in the log scale is constant for all ranges of the variable, i.e., ln(x)−ln(x/b) = ln b, the distribution appears as a normal-like shape in the log-scale curve.This aspect is in accordance with existing fragmentation theories [15,16].An important finding of our model is that the distribution becomes stationary over time due to the presence of the continuous growth term, and the result for the width [V ≈ (ln b)/2] is determined by the amount of the discrete decrease in a single division in the log scale.
The two prerequisites of our model for producing a lognormal distribution with the common width are reasonable assumptions in nature.Halving might be applicable to a process where division appears to be b > 2 because division can be thought of as a sequence of halvings.Furthermore, the proportionality of the growth rate and the division probability is not a strict assumption because growth and division dynamics may be governed by the same factors.In reality, specific dynamics could be involved in producing each frequency distribution.For instance, specific regulatory mechanisms might play important roles in determining the size of specific bacteria or of mammalian cells.Thus, the applicability of our model to particular examples should be examined in each case.However, the consideration of growth and division processes would provide general insight into the essence of the underlying processes that form lognormal-like distributions with the common width.The combination of the processes of addition and halving might contribute to the ubiquity of these distributions in nature.
x at time t.The first term on the right-hand side, b 2 h d (bx)f (bx,t), represents the increase in the frequency due to cell division [Fig.1(b), arrow i].Because the range of x for the parental objects [b x in Fig. 1(b)] is b-fold greater than that of the daughter objects ( x) and the division of a single parental object bx produces b daughter objects of x, this term is multiplied by b 2 .The second term, −h d (x)f (x,t), represents the decrease in the objects' frequency due to division [Fig.1(b), arrow ii].The third term, − ∂h g (x)f (x,t) ∂x , represents the flux of the frequency due to continuous growth [Fig.1(b), difference of arrows iii and iv].In this model, the total frequency

FIG. 1 .
FIG. 1. (Color online) (a) Probability distributions of the variables for various objects reported in the literature.From these data, the variance of the natural logarithm of the variable (V) was derived by fitting to a normalized lognormal distribution exp[−X 2 /(2V )], where X is the natural log of the normalized variable.Systems include the following: E. coli size by forward scattering • [6] (V = 0.35), E. coli volume from a Coulter counter • [7] (0.28), mouse lymphoblast volume + [9] (0.29), human platelet volume × [10] (0.33), mouse islets of Langerhans size (cube root of cell number)*[11] (0.48), liposome surface area (square of the diameter)[12] (0.41), nanoparticle length[13] (0.32), and number of words in a sentence -[14] (0.48).The dashed black line shows the SPD obtained from the numerical simulation using the bacterial model [Eq.(1)with h g (x) = 2.3 × 10 −7 x,h d (x) = 3.1 × 10 −3 x,b = 2], which gave V = 0.43.The bold gray line shows the lognormal curve with V = (ln 2)/2 = 0.35.Our model gives a value for V that roughly falls within the range described in the literature.The normalized variable on the horizontal axis was calculated by subtracting a value for the invalid range of variables where the frequency was zero from the original value.The resulting value was then normalized so that the variable at the maximum frequency (i.e., the mode) of the moving average of three data points was one.The frequency of the normalized variable on the linear scale from the literature was converted to frequency on the logarithmic scale by multiplying by the variable itself.(b) Schematic diagram of the general model described by Eq. (1).
FIG. 2. (Color online) The dependency of SPD on b, n g , and n d .(a) SPDs derived from numerical simulations of Eq. (2) with n g = n d = n = 1 and b = 1.2 (×), 2 (•), and 5.7 (+).The solid lines show the fit of each SPD to the lognormal.(b) The b dependency of the variance of ln(x) (V).V was calculated from the SPDs derived from the numerical simulation of Eq. (2) with various b values when n = 0 (•), 1 (•), 2 (+), and 3(×).The solid gray line shows V = (ln b)/2.(c) SPDs with various n g and n d .The SPDs were derived from numerical simulations of Eq. (2) with b = 2 for n g = 1.5 and n d = 1 ( ) and for n g = n d = n = −1 ( ), 0 (•), 1 (•), 2 (+), 3 (×), and 4( ).The gray dashed line shows the fit of to the lognormal, and the solid gray line shows the lognormal with V = (ln 2)/2.(d) The dependency of V on n g and n d .The V values were calculated from Eq. (3) with b = 2 and n d − n g > −1 because n d − n g > −1 is a sufficient requirement for the existence of the SPD.The dashed line shows n g = n d .

FIG. 3 .
FIG. 3. (Color online)The effect on the model when the variables for daughter objects after halving are not exactly half of the parental object.(a) SPDs derived from numerical simulation when the probability of the variable after division is set to the uniform distribution with n = 0 (•), 1 (•), 2 (+), and 3 (×).The solid gray line shows the lognormal with V = (ln 2)/2.By differentiating both sides of the master equation by x in the stationary state, the SPD can be solved only when n is 0 or 1 as 4a −2 xexp(−2a −1 x) or a −1 exp(−a −1 x), respectively.(b) The σ dependency of the skewness of ln(x) for SPDs calculated from the numerical simulation when the probability of the variable after division is set to the truncated normal distribution and when n = 1.The insets show the probability distribution of the daughter variable ξ (0 to x) for each σ .