Dynamics of DNA Replication in Yeast

We present a mathematical model for the spatial dynamics of DNA replication. Using this model, we determine the probability distribution for the time at which each chromosomal position is replicated. From this we show, contrary to previous reports, that mean replication time curves cannot be used to directly determine origin parameters. We demonstrate that the stochastic nature of replication dynamics leaves a clear signature in experimentally measured population-average data, and we show that the width of the activation time probability distribution can be inferred from these data. Our results compare favorably with experimental measurements in Saccharomyces cerevisiae.

DNA replication starts at specific locations in the chromosome called replication origins. Most bacterial genomes are replicated from a single origin, but the much greater size of eukaryotic genomes requires multiple origins per chromosome to ensure that the replication process does not take too long. Genome replication has been comprehensively studied in the model organism S. cerevisiae (brewer's yeast). Origin locations in S. cerevisiae are determined by specific DNA sequences and are thus fixed in every yeast cell [1]. The positions of origins in yeast have been comprehensively catalogued [2]. However, a given origin may not be active during replication, because origins must be licensed before the start of S phase (the part of the cell cycle where DNA replication takes place). Licensing consists of a series of specific protein complexes binding at origin locations, culminating in the loading of pairs of Mcm2-7 molecules. If licensing of a certain origin is not completed in a given cell by the time S phase starts, that origin is unable to function [3].
High-throughput methods have allowed the measurement of replication times as a function of chromosomal position for the whole genome [4]. These experiments yield average replication times over large cell populations (typically $>10^7$ cells) and therefore can mask the cell-to-cell variability present in the system [5]; to date, single-cell and single-molecule studies are not able to measure the kinetics of whole-genome replication [5,6]. The low abundance of the molecules involved in triggering origin activation strongly suggests that origins have stochastic activation times [7]. This is often ignored in the biological literature, where there is a pervasive notion of a "replication program," in which origins are considered to be programmed to fire in a precisely controlled order. This idea has frequently led to erroneous interpretations of replication time profiles (reviewed in [8]).
There has been much interest recently in mathematical modeling of DNA replication. Two different modeling approaches have been used: simulations to capture the replication dynamics at a single-cell level [9-12], and probabilistic models that characterize the dynamics of replication at a population level [13,14]. Some models of DNA replication [14,15] are closely related to Kolmogorov's classical model of nucleation processes [16]. Our model can be regarded as an inhomogeneous model of nucleation with quenched disorder, where nucleation starts at specific sites. Inhomogeneous models of nucleation have been studied in the context of statistical physics and have relevance to surface science and other areas [17,18].
Reference [14] is particularly relevant for this work because it proposes that origins fire stochastically in time, and goes on to show that this can lead to reproducible replication dynamics population wide, without the need to invoke a "replication program." Although valuable insights have been gained from previous works, they ignore the possibility that origins can fail to license, and we will show that this has a crucial effect on the system's dynamics. In addition, most of the existing models are numerical. In this work, we introduce an analytical model of eukaryotic DNA replication which fully takes into account the stochastic nature of both origin activation and the licensing process. Using a simple two-origin chromosome, we illustrate how replication time curves from measurements are influenced by the stochasticity of origin activation as well as by the possibility that licensing fails, reinforcing results we obtained previously by direct simulations [8].
Published by the American Physical Society under the terms of the Creative Commons Attribution 3.0 License. Further distribution of this work must maintain attribution to the author(s) and the published article's title, journal citation, and DOI. 0031-9007/11/107(6)/068103(4).
We establish that the shape of the average replication time profile carries a signature of the stochasticity of the replication process, even though it is a quantity defined by a population average. We derive an analytical expression relating the replication time in the regions between two origins to the standard deviation $\Delta t$ of the activation time distribution of the origins. This is a valuable result, since few single-cell experiments have been able to give direct information about the stochastic properties of the replication dynamics.
Our results allow $\Delta t$ to be obtained from widely available population-wide measurements. We apply this result to measured replication time data and estimate $\Delta t$ from them, obtaining a value in agreement with current estimates in the literature.
In our model we consider a chromosome with $N$ origins, where each origin $i$ is defined by the following: its chromosomal position $x_i$; the probability $q_i$ that the origin achieves licensing (in a given cell within a population) and is thus capable of activating; and the activation time probability distribution $p_i(t)$, which is the probability density of origin $i$ activating and starting bidirectional replication forks at time $t$. Since an origin may not be competent in every cell within the population, in general $q_i < 1$, and $p_i$ satisfies $\int_{-\infty}^{+\infty} p_i(t)\,dt = q_i$. The fundamental quantity from which all statistical properties of this system can be calculated is the probability density $P(x,t)$, defined such that $P(x,t)\,dt$ is the probability that chromosomal position $x$ is replicated between times $t$ and $t + dt$. If only origin $i$ were present, $P$ would be given by $P(x,t) = p_i(t - |x - x_i|/v)$, where $v$ is the fork velocity, which we assume to be constant.
In the presence of all $N$ origins, the calculation of $P(x,t)$ is complicated by the fact that position $x$ can be replicated by forks originating from any of the origins. Let us assume that position $x$ is replicated between times $t$ and $t + dt$ by a fork from origin $i$. This requires that (i) origin $i$ activated at time $t - |x - x_i|/v$, so that the fork arrives at $x$ at time $t$; and (ii) all other origins $j \neq i$ have either not activated or have activated but their forks would arrive at $x$ later than $t$. The probability density for event (i) is $p_i(x,t) = p_i(t - |x - x_i|/v)$, and the probability for event (ii) is $Q_i(x,t) = \prod_{j \neq i} M_j(x,t)$, where $M_i$ is the probability that a fork from origin $i$ arrives later than $t$, or that origin $i$ fails to activate: $M_i(x,t) = s_i + \int_t^{+\infty} p_i(x,y)\,dy$, where $s_i = 1 - q_i$ is the probability of origin $i$ not being competent. Therefore, the probability density $P_i(x,t)$ that position $x$ is replicated by origin $i$ at time $t$ is
$$P_i(x,t) = p_i(x,t)\,Q_i(x,t). \tag{1}$$
Finally, the probability density that position $x$ is replicated at time $t$, irrespective of which origin the fork started from, is
$$P(x,t) = \sum_{i=1}^{N} P_i(x,t). \tag{2}$$
One of the most important quantities for comparison with experimental data is the average replication time $T(x)$ at position $x$, which is given in terms of $P$ as
$$T(x) = \frac{1}{1 - \prod_{i=1}^{N} s_i} \int_{-\infty}^{+\infty} t\,P(x,t)\,dt, \tag{3}$$
where $1 - \prod_{i=1}^{N} s_i = \int_{-\infty}^{+\infty} P(x,t)\,dt$ is the probability that at least one of the origins will activate. The average replication time across whole chromosomes [$T(x)$ curves] has been measured in a number of organisms. However, caution is required when interpreting $T(x)$ curves. In some of the biological literature, $T(x)$ curves are used to directly infer origin parameters [4]. For example, it is widely accepted that the values of $T$ at $x_i$ are the average activation times of origins. However, Eq. (3) shows that $T(x)$ is determined collectively by all origins [8]. This suggests that simple interpretations of $T(x)$ are not justifiable.
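As an illustration of how Eqs. (1)-(3) can be evaluated in practice, the following minimal sketch integrates $P(x,t)$ numerically for origins whose activation times are uniform over a window. It is not the authors' implementation: the function names, the dictionary layout, and all parameter values (positions in kb, times in min, a fork velocity of 2 kb/min) are assumptions chosen for the example.

```python
from math import prod

def origin(x, q, t_av, dt):
    """Hypothetical container: position x, competence q, mean activation
    time t_av, and width dt of a uniform activation window."""
    return {"x": x, "q": q, "t_av": t_av, "dt": dt}

def window(o, x, v):
    """Earliest and latest times at which a fork from origin o can reach x."""
    travel = abs(x - o["x"]) / v
    return o["t_av"] - o["dt"] / 2 + travel, o["t_av"] + o["dt"] / 2 + travel

def p_arrive(o, x, t, v):
    """p_i(x, t) = p_i(t - |x - x_i|/v); the window carries total mass q_i."""
    lo, hi = window(o, x, v)
    return o["q"] / o["dt"] if lo <= t <= hi else 0.0

def m_later(o, x, t, v):
    """M_i(x, t): origin o is not competent, or its fork reaches x after t."""
    lo, hi = window(o, x, v)
    tail = min(max((hi - t) / o["dt"], 0.0), 1.0)  # fraction of window after t
    return (1.0 - o["q"]) + o["q"] * tail

def density(origins, x, t, v):
    """P(x, t) = sum_i p_i(x, t) * prod_{j != i} M_j(x, t), Eqs. (1)-(2)."""
    return sum(
        p_arrive(o, x, t, v)
        * prod(m_later(k, x, t, v) for k in origins if k is not o)
        for o in origins
    )

def mean_time(origins, x, v, n=20000):
    """Midpoint-rule estimate of T(x), Eq. (3). Also returns the integral of
    P(x, t), which should approximate 1 - prod_i s_i."""
    t0 = min(window(o, x, v)[0] for o in origins)
    t1 = max(window(o, x, v)[1] for o in origins)
    h = (t1 - t0) / n
    num = den = 0.0
    for k in range(n):
        t = t0 + (k + 0.5) * h
        w = density(origins, x, t, v) * h
        num += t * w
        den += w
    return num / den, den

# Illustrative two-origin chromosome (assumed values, not taken from the paper).
two = [origin(0.0, 0.7, 0.0, 10.0), origin(60.0, 0.9, 5.0, 10.0)]
T0, total = mean_time(two, 0.0, 2.0)
print(round(T0, 2), round(total, 3))
```

With these assumed values the normalization should come out close to $1 - s_1 s_2 = 0.97$, and the mean replication time at the first origin should be several minutes later than that origin's mean activation time of 0, because in 30% of the cells the position is replicated by a fork arriving from the other origin.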
We now use the general theory presented above to study replication dynamics in a simple setting. From now on we focus on the case of a hypothetical linear chromosome with just two origins. We define the chromosomal coordinates so that one of the origins has position $x_1 = 0$; the other origin has position $x_2 = D$. We assume for simplicity that each origin can activate within a time window $\Delta t$ with uniform probability; we will argue later that our conclusions are largely independent of the precise shape of the probability distribution. We choose the origin of time so that the average activation time of the first origin is 0. The other origin has an average activation time $\tau$, and we assume without loss of generality that $\tau \ge 0$. Thus the activation time distributions are
$$p_i(t) = \frac{q_i}{\Delta t}, \qquad t_i^{\mathrm{av}} - \frac{\Delta t}{2} \le t \le t_i^{\mathrm{av}} + \frac{\Delta t}{2}, \tag{4}$$
where $i = 1, 2$, $t_1^{\mathrm{av}} = 0$, and $t_2^{\mathrm{av}} = \tau$. $p_1$ and $p_2$ are set to zero outside the stated intervals.
Using Eqs. (3) and (4), we can write analytical expressions for the probability density $P(x,t)$ and the average replication time $T(x)$. From Eqs. (2) and (4), $P(x,t)$ vanishes outside the intervals $I_1$ and $I_2$ given by
$$I_i = \left[\, t_i^{\mathrm{av}} + \frac{|x - x_i|}{v} - \frac{\Delta t}{2},\; t_i^{\mathrm{av}} + \frac{|x - x_i|}{v} + \frac{\Delta t}{2} \,\right], \tag{5}$$
where $x_1 = 0$ and $x_2 = D$. In total there are five scenarios (depending on the relative values of $\tau = t_2^{\mathrm{av}}$, $\Delta t$, $D$, and $v$) that differ in the dynamics of how the chromosome is replicated. From here on we will consider just the case where the condition $\tau + \Delta t < D/v$ is satisfied, since this is the case for many pairs of origins in real chromosomes. This means that the variations $\Delta t$ in the activation time are small enough that a fork from one origin can replicate the other origin only if that origin is not competent. The expression for $T(x)$ is then given by Eq. (6). A plot of $T(x)$ for different values of $q_1$ is shown in Fig. 1. We see that $T(x)$ has discontinuous derivatives at the origin locations, because the forks originate there. At the origins, the mean replication times are
$$T(x_1) = \frac{s_1 q_2}{1 - s_1 s_2}\left(\tau + \frac{D}{v}\right), \qquad T(x_2) = \tau + \frac{s_2 q_1}{1 - s_1 s_2}\left(\frac{D}{v} - \tau\right), \tag{7}$$
where $s_i = 1 - q_i$. It is commonly assumed in the replication literature that $T(x)$ has a minimum at an origin, and that the value of this minimum directly gives the average activation time for the origin. However, Eq. (7) shows that this is not the case and, in fact, $T(x_i) \ge t_i^{\mathrm{av}}$: the mean replication time at an origin location is equal to or greater than the origin's average activation time. Only when the other origin never fires or the origin itself has $q_i = 1$ can $T(x_i) = t_i^{\mathrm{av}}$, because if an origin fails to activate in a given cell, the DNA at the origin location will not be replicated until a fork from another origin arrives. This means that $T(x_i)$ is higher for origins that are more likely to fail, as seen directly in Fig. 1. Another important conclusion from Eq. (7) is that even when both origins have the same average activation time ($\tau = 0$), generally we have $T(x_1) \neq T(x_2)$. This is again due to the possibility of origins not activating. Therefore, the origin with the lower minimum of $T(x)$ does not necessarily activate earlier than the other origin: minima of $T(x)$ cannot be used to draw conclusions on the relative activation times of the corresponding origins, as previously assumed [4,10]. Equations (1)-(6) show that in general $T(x)$ at any point depends collectively on the parameters of all origins. However, if an origin is highly competent, early activating, and isolated from other origins, $T(x)$ at that origin's position will be close to the origin's average activation time.
Equation (6) also challenges the assumption that origins are located at minima of $T(x)$. From Eq. (6), the slope of $T$ near the first origin (for $x > 0$) is
$$T'(x) = \frac{q_1 - s_1 q_2}{v\,(1 - s_1 s_2)}. \tag{8}$$
This expression shows that the slope is a function of the competencies $q_i$ of both origins as well as the fork velocity $v$. For the origin at $x = 0$ to be a minimum of $T(x)$, we must have $T' > 0$ for $x > 0$, from which we get the condition $q_1 > q_2/(1 + q_2)$. This shows that if an origin has low competence compared to its neighbor, it may not be a minimum of $T(x)$, which can be seen in Fig. 1. This phenomenon has been observed in experimental data [4]. Note that if $q_1 > 1/2$, this condition is always satisfied and a minimum is guaranteed for this two-origin system. In addition, Eq. (8) shows that the fork velocity is not given by the slope of $T(x)$, an assumption widely used in the literature [4].
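The two claims above (that $T(x_i)$ exceeds the origin's average activation time when the origin can fail, and that an origin is a minimum of $T(x)$ only if $q_1 > q_2/(1 + q_2)$) can be checked with a single-cell Monte Carlo sketch of the two-origin chromosome. This is an illustrative simulation under the uniform-window assumption, not the authors' code; the function names and the parameter values ($\tau = 5$ min, $\Delta t = 10$ min, $D = 60$ kb, $v = 2$ kb/min) are assumptions.

```python
import random

def simulate_T(x, q1, q2, tau, dt, D, v, cells=50_000, seed=0):
    """Population-average replication time at position x for two origins.

    Each simulated cell licenses origin 1 (at 0) with probability q1 and
    origin 2 (at D) with probability q2, draws uniform activation times,
    and records the earliest fork arrival at x.
    """
    rng = random.Random(seed)
    total, counted = 0.0, 0
    for _ in range(cells):
        arrivals = []
        if rng.random() < q1:
            arrivals.append(rng.uniform(-dt / 2, dt / 2) + abs(x) / v)
        if rng.random() < q2:
            arrivals.append(tau + rng.uniform(-dt / 2, dt / 2) + abs(x - D) / v)
        if arrivals:  # cells with no competent origin are excluded
            total += min(arrivals)
            counted += 1
    return total / counted

def minimum_condition(q1, q2):
    """Condition quoted in the text for origin 1 to be a minimum of T(x)."""
    return q1 > q2 / (1 + q2)

# Assumed illustrative parameters; the same seed gives common random numbers,
# so the difference T(5) - T(0) has little sampling noise.
params = dict(tau=5.0, dt=10.0, D=60.0, v=2.0)
for q1 in (0.7, 0.3):
    t_at_origin = simulate_T(0.0, q1, 0.9, **params)
    rise = simulate_T(5.0, q1, 0.9, **params) - t_at_origin
    print(q1, round(t_at_origin, 1), rise > 0, minimum_condition(q1, 0.9))
```

In this sketch the sign of the change in $T$ over the first 5 kb to the right of the origin should agree with `minimum_condition`, and the value at the origin should lie well above the origin's mean activation time of 0 in both cases.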
Figure 1 shows that $T(x)$ has sharp corners at origin locations. The reason for this is that in every cell the forks always start at the same locations (the origins), which causes a discontinuous change in the proportion of left-propagating compared to right-propagating replication forks, which in turn causes the discontinuity in the derivative of $T(x)$ at the origins. In contrast, Fig. 1 shows that the local maximum of $T(x)$ between two origins is a smooth curve. The reason is that in different cells in a population forks meet each other and terminate at different locations on the DNA, because of the stochastic variations in activation times. This reasoning suggests that the shape of the maxima of $T$ could be used to infer information about the width $\Delta t$ of the activation time distribution. We expect that sharp maxima should correspond to forks meeting within a narrow time window, and consequently a small value of $\Delta t$; conversely, a broad maximum corresponds to a high $\Delta t$. This can be seen in Fig. 2, where $T(x)$ is plotted for various values of $\Delta t$.
FIG. 1 (color online). Replication time curves $T(x)$ for differing values of competence $q_1$ ($q_1 = 0.9$, $0.7$, and $0.5$). Axes: $x$ (horizontal), with the origin positions labeled ORI 1 and ORI 2, and $T(x)$ (vertical).
In order to investigate this more quantitatively, we use the modulus of the second derivative of $T(x)$ at the maximum to measure how broad the maximum is: low values of $|T''|$ correspond to broad peaks. We now use Eq. (6) to find the relationship between $|T''|$ and the origin parameters. At the maximum of $T(x)$ this yields Eq. (9), according to which $|T''|$ is inversely proportional to $\Delta t$. Notice also that $|T''|$ does not depend on $\tau$, which means it is independent of the origins' average activation times. This expression can be used to calculate $\Delta t$ from an experimental replication time profile $T(x)$, if the origin competencies and the fork velocity are known. This is a very useful result because it allows the determination of $\Delta t$, a quantity characterizing stochastic properties of the system, from $T(x)$, which is defined by a population average. This is valuable because experiments to directly measure $\Delta t$ are technically difficult, and there are few results available [5,6]. We note that this does not require assuming that all cells in the population are synchronized, since in each individual cell in an asynchronous population, the statistics of the relative activation times of origins remain unaltered [8]. Equation (9) was obtained using a simple uniform distribution for $p_i(t)$. However, we expect it to be a good approximation for any single-peaked distribution function $p_i(t)$, since Eq. (9) only involves the second moment (the variance) of the distribution, and the replication dynamics are mostly determined by the average activation time and the width of the activation distribution, that is, by the first and second moments of $p_i(t)$. To test this assumption we used Eqs. (1)-(6) to numerically compute $T(x)$ for pairs of origins with either a Gaussian or a skewed distribution that lead to sigmoidal cumulative distributions [14]. Choosing parameters such that all these distributions have the same mean and variance, we find that in all cases $T''$ never differs between distributions by more than 10%.
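The inverse proportionality between $|T''|$ and $\Delta t$, and its independence of $\tau$, can be checked numerically. The sketch below is a rederivation under the uniform-window assumption, not a transcription of Eq. (6) or Eq. (9): it uses the identity $\min(a,b) = (a + b - |a - b|)/2$ for the two fork arrival times, together with the mean absolute difference of two independent uniform variables of equal width. Function names and parameter values are assumptions made for the example.

```python
def mean_time_between(x, q1, q2, tau, dt, D, v):
    """T(x) for 0 <= x <= D with two origins and uniform windows of width dt."""
    mu1 = x / v                # mean arrival time of the fork from origin 1
    mu2 = tau + (D - x) / v    # mean arrival time of the fork from origin 2
    d = abs(mu1 - mu2)
    # E|a - b| for independent uniform arrival times a, b of width dt
    gap = d if d >= dt else dt / 3 + d * d / dt - d ** 3 / (3 * dt * dt)
    both = 0.5 * (mu1 + mu2) - 0.5 * gap   # E[min(a, b)], both origins competent
    s1, s2 = 1 - q1, 1 - q2
    return (q1 * q2 * both + q1 * s2 * mu1 + s1 * q2 * mu2) / (1 - s1 * s2)

def curvature_at_maximum(q1, q2, tau, dt, D, v, h=0.01):
    """|T''| at the maximum of T(x) between the origins (central differences).

    Assumes T(x) has a single interior maximum, which holds when both
    origins satisfy the minimum condition quoted in the text.
    """
    def f(x):
        return mean_time_between(x, q1, q2, tau, dt, D, v)
    lo, hi = 0.0, D
    for _ in range(200):       # ternary search for the maximum
        a, b = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if f(a) < f(b):
            lo = a
        else:
            hi = b
    x_max = 0.5 * (lo + hi)
    return abs(f(x_max + h) - 2 * f(x_max) + f(x_max - h)) / (h * h)

# Assumed illustrative values: D = 60 kb, v = 2 kb/min, q1 = 0.7, q2 = 0.9.
for tau in (0.0, 5.0):
    for dt in (5.0, 10.0, 20.0):
        c = curvature_at_maximum(0.7, 0.9, tau, dt, 60.0, 2.0)
        print(tau, dt, round(c * dt, 4))
```

If the scaling holds, the printed product $|T''|\,\Delta t$ should be the same for every row, whatever the values of $\tau$ and $\Delta t$.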
Although we have been considering a hypothetical two-origin chromosome, we expect Eq. (6) to be a good approximation for chromosomes with many origins when two neighboring origins are relatively isolated from other origins. To test this, we looked at experimental data [4] for S. cerevisiae chromosome X, specifically the region containing origins ARS1012/13 and ARS1014 (Fig. 3). The smoothness of the curve (ignoring the fluctuations caused by experimental noise) is direct evidence for stochastic origin activation, in agreement with other results [5,6]. We fitted a parabola through the data points and from this determined the value of $|T''|$. Using Eq. (9) we estimate the value of $\Delta t$ as 10.8 min [19]. This value is in agreement with the limited number of single-cell measurements that have been made at other S. cerevisiae origins [6].
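The fitting step described above can be sketched as follows. The data here are synthetic (a noisy parabola generated inside the example), not the measurements of Ref. [4], and the result is not meant to reproduce the 10.8 min estimate. The conversion from $|T''|$ to $\Delta t$ is an assumption as well: it uses the relation $|T''| = 4q/[(2 - q)\,v^2\,\Delta t]$, which follows from the uniform-window model when both origins have the same competence $q$, and which may differ in form from Eq. (9). All function names are hypothetical.

```python
import random

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_parabola(xs, ts):
    """Least-squares fit t = a*u^2 + b*u + c with u = x - mean(x).

    Returns (a, b, c). Centering on the mean keeps the normal equations
    well conditioned.
    """
    xm = sum(xs) / len(xs)
    u = [x - xm for x in xs]
    S = [sum(ui ** k for ui in u) for k in range(5)]               # sums of u^k
    R = [sum(ui ** k * ti for ui, ti in zip(u, ts)) for k in range(3)]
    A = [[S[4], S[3], S[2]], [S[3], S[2], S[1]], [S[2], S[1], S[0]]]
    rhs = [R[2], R[1], R[0]]
    d = det3(A)
    coeffs = []
    for col in range(3):                                           # Cramer's rule
        Ac = [row[:] for row in A]
        for r in range(3):
            Ac[r][col] = rhs[r]
        coeffs.append(det3(Ac) / d)
    return tuple(coeffs)

def dt_from_curvature(curv, q, v):
    """Window width from |T''|, ASSUMING equal competences q (see lead-in)."""
    return 4 * q / ((2 - q) * v * v * curv)

# Synthetic profile around a maximum: positions in kb, times in min.
rng = random.Random(1)
xs = [100.0 + 0.5 * k for k in range(-20, 21)]
ts = [30.0 - 0.02 * (x - 100.0) ** 2 + rng.gauss(0.0, 0.1) for x in xs]
a, b, c = fit_parabola(xs, ts)
curv = 2 * abs(a)   # |T''| of the fitted parabola
print(round(curv, 3), round(dt_from_curvature(curv, 0.9, 2.0), 1))
```

The fit should only span the region around the maximum where forks from the two origins can meet; including the linear flanks of $T(x)$ would bias the curvature toward lower values.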

FIG. 2 (color online). Replication time curves for different widths of the activation time window.