When the mean is not enough: Calculating fixation time distributions in birth-death processes

Studies of fixation dynamics in Markov processes predominantly focus on the mean time to absorption. This may be inadequate if the distribution is broad and skewed. We compute the distribution of fixation times in one-step birth-death processes with two absorbing states. These are expressed in terms of the spectrum of the process, and we provide different representations as forward-only processes in eigenspace. These allow efficient sampling of fixation time distributions. As an application we study evolutionary game dynamics, where invading mutants can reach fixation or go extinct. We also highlight the median fixation time as a possible analog of mixing times in systems with small mutation rates and no absorbing states, whereas the mean fixation time has no such interpretation.


I. INTRODUCTION
If a group of mutants is introduced into a population of wild-type individuals, how long does it take for the population to reach a homogeneous state? This is a typical question asked in population genetics and evolutionary biology. It can be addressed using the theory of stochastic processes and approaches from statistical physics.
In its simplest form, evolutionary dynamics is modeled as a Markov process in a population of N individuals of two types [1]. Fixation occurs when the process arrives at one of its absorbing states. As the time to fixation is itself a random variable, only the computation of the distribution of fixation times provides a complete answer to our opening question. The majority of existing studies avoid this mathematical challenge and instead focus on calculating only the mean fixation time [2-4]. While the first moment can provide a good indication of the outcome in some circumstances, this approach can be insufficient when the distribution of fixation times is broad [5,6]. As we show in this paper, such scenarios arise in examples from evolutionary game theory.
Although the master equation describing the birth-death dynamics is linear, calculating fixation time distributions is more intricate than one may initially think. Nested expressions for all moments of fixation times are known [6-8], and from these the distribution can in principle be constructed recursively up to arbitrary precision. However, this approach does not provide a simple closed-form solution or a means of efficiently sampling from the arrival time distribution.
An alternative approach is to diagonalize the linear operator of the master equation and to carry out the analysis in eigenspace. This leads to a theorem attributed to Karlin and McGregor [9,10], which states that arrival times can be written as the sum of independent, exponentially distributed random variables parametrized by the eigenvalues of the master equation. Alternatively stated, the arrival time distribution is given by the convolution of the exponential distributions from which the independent random numbers are sampled, i.e., it is a phase-type distribution [11]. This theorem has been discussed in numerous sources in the probability theory literature [10,12-19]. The discussion of these matters is usually very terse and not easily accessible to physicists or researchers in adjacent disciplines. Researchers in the theoretical biosciences have only recently begun to use these ideas for the purpose of model reduction [20,21]. Existing results are limited to specific initial conditions and types of birth-death chains, and a clear understanding of the analysis in eigenspace is lacking.
We consider one-step birth-death processes with two absorbing states and a general initial condition. This model describes the invasion (or extinction) of a number of mutants in a wild-type population. It is a generalization of existing studies, which focus on a single absorbing boundary and a specific initial condition [9]. The purpose of our work is to provide a systematic and explicit closed-form solution for fixation time distributions, and a physical interpretation of different representations in eigenspace [14,17,18]. By generalizing the Karlin-McGregor result to account for arbitrary initial conditions and multiple absorbing states, we show that these different representations can be traced back to one common origin. To generate samples from the fixation time distribution it is sufficient to simulate forward-only processes in eigenspace. This provides effective model-reduction schemes.
We use our results to relate fixation processes to the equilibration dynamics of evolutionary systems with mutation (and hence with no absorbing states). These equilibration processes are often characterized by the so-called mixing time, the time it takes the system to come within a set distance of its stationary distribution [22,23]. We thus discuss an appropriate analog of the fixation time in the limit of small mutation rates, and outline the limitations of this relation.
The paper is organized as follows: In Sec. II we describe the model used and the principal idea behind the work. In Sec. III we outline the mathematical procedure that takes us from the model to the arrival time distributions. Details of the calculations can be found in the Appendixes. In Sec. IV we apply our results to specific examples of evolutionary games, confirming both the accuracy and efficiency of our method. In Sec. V we discuss the relationship between the fixation time and mixing time in the limit of small mutation rates, before drawing conclusions in Sec. VI.
FIG. 1. (Color online) One-step birth-death process in a population of N individuals. The variable i denotes the number of invading mutants. The states i = 0 (extinction) and i = N (fixation) are absorbing. Birth rates are labeled b_i and death rates d_i.

II. MODEL DEFINITION AND MAIN IDEA
We study a one-step birth-death process with states i = 0, 1, …, N, characterized by the birth and death rates b_i, d_i (i = 1, …, N − 1), as illustrated in Fig. 1. This describes a population of constant size N with i individuals of the mutant type and N − i of the resident wild type [1]. The states i = 0 and i = N are absorbing in the absence of mutation, and so the dynamics will eventually end in one of these two states. Our objective is to calculate the distribution of arrival times at these absorbing states for a general starting point i_0.
The outcome of our analysis (detailed below) is the set of reduced forward-only processes in eigenspace illustrated in Fig. 2, which have the same arrival time distribution as the original birth-death process. The forward jump rates are determined by the eigenvalues λ_α of the original birth-death process. The schematic in Fig. 2(a) shows multiple forward-only channels, where dashed arrows indicate that some steps are to be skipped. Figure 2(b) shows a single forward-only channel. Long arrows indicate direct jumps to the final eigenstate. In both cases, the number of possible paths is determined by the initial condition and the final state (i = 0 or i = N) of the original birth-death process.
Arrival time samples of the original process are generated from Fig. 2(a) in the following way: One of the channels is chosen with probability determined by the birth-death rates, the initial condition, and the arrival state of the original real-space process. After a channel has been selected, the clock is started and the forward-only process of the channel is traversed. The clock is stopped when the final state in the schematic is reached (absorption).
Arrival time samples are generated from Fig. 2(b) by traversing the forward-only chain. The quantities 1 − F α are the probabilities to reach absorption directly from the intermediate eigenstate α. These probabilities are again determined by the birth-death rates, the initial condition and the final state of the original process and explicit formulas can be found in Appendix C 2.
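The sampling scheme of Fig. 2(b) can be sketched in a few lines of code. The eigenvalues and forward probabilities below are placeholder values for illustration, not those of any particular model:

```python
import random

def sample_arrival_time(lams, F, rng):
    """Traverse the single forward-only channel of Fig. 2(b).

    At eigenstate alpha the waiting time is Exp(lams[alpha]); with
    probability F[alpha] the walker moves on to alpha+1, otherwise it
    jumps directly to absorption and the clock stops.
    """
    t = 0.0
    for lam, f in zip(lams, F):
        t += rng.expovariate(lam)      # exponential waiting time at this eigenstate
        if rng.random() >= f:          # early jump to absorption
            break
    return t

rng = random.Random(1)
lams = [1.0, 2.0, 4.0]                 # placeholder eigenvalues of -A
F = [1.0, 1.0, 1.0]                    # F = 1 everywhere: no skips, plain chain
samples = [sample_arrival_time(lams, F, rng) for _ in range(200000)]
mean = sum(samples) / len(samples)
# With F = 1 the analytic mean is sum(1/lam) = 1 + 0.5 + 0.25 = 1.75
```

With F_α < 1 the same routine produces the mixture of truncated exponential chains described above.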

III. ANALYSIS
To derive these results it is convenient to focus on the interior states i = 1, …, N − 1 of the birth-death process in Fig. 1. The probability to be in state i at time t, p_i(t), satisfies the equation ṗ = A · p, with formal solution p(t) = exp(At) · p(0). The matrix A describes the transient states and has elements a_{i,i} = −(b_i + d_i), a_{i,i−1} = b_{i−1}, and a_{i,i+1} = d_{i+1}. Probability continuously flows into the two absorbing states; hence all eigenvalues of A are negative, and so we focus on −A and its positive eigenvalues. The matrix exponential can be evaluated by Laplace transformation, and subsequent back transformation allows one to calculate Ṗ_{0|i_0}(t) = d_1 p_1(t) and Ṗ_{N|i_0}(t) = b_{N−1} p_{N−1}(t). Mathematical details can be found in Appendix B. Up to normalization these are the conditional arrival time distributions at states 0 and N, respectively. For the remainder of the paper we focus only on absorption at state N; analogous expressions for absorption at 0 can be found in the Appendixes. We find the following expression for the probability flux (per unit time) into state N:

Ṗ_{N|i_0}(t) = φ_{N|i_0} [E^(λ_1) * E^(λ_2) * ⋯ * E^(λ_{N−1}) * R](t),   (1)

where the E^(λ_α)(t) = λ_α e^{−λ_α t} are exponential distributions with the eigenvalues λ_α > 0 of −A as parameters, and the symbol * represents a convolution. The object R is of the form

R(t) = [(δ + y_1^{−1} δ′) * ⋯ * (δ + y_{i_0−1}^{−1} δ′)](t),   (2)

where y_α > 0 are the eigenvalues of the submatrix −A^(i_0−1). In these expressions δ is the usual Dirac distribution, and δ′ is its derivative, as described in Appendix A 1. Here φ_{N|i_0} is the probability that the original system gets absorbed in state N (as opposed to 0). The effective dynamics shown in Fig. 2 are obtained by evaluating the convolutions in Eq. (1). To illustrate this it is useful to first consider the convolution of the exponential distribution E^(λ)(t) with an object of the form δ(t) + y^{−1}δ′(t) (y > 0). In Appendix A 4 we show that

[E^(λ) * (δ + y^{−1}δ′)](t) = (λ/y) δ(t) + (1 − λ/y) E^(λ)(t).   (3)

Assuming λ/y < 1 (which will be the case throughout our analysis), this describes a convex combination of a point mass at zero and an exponential distribution.
For samples, set t = 0 with probability λ/y, otherwise draw t from E (λ) (t).
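This sampling rule can be implemented directly; a minimal sketch with arbitrary illustrative values of λ and y, checking the resulting mean against the analytic value 1/λ − 1/y:

```python
import random

def sample_conv(lam, y, rng):
    """Sample from E^(lam) convolved with (delta + delta'/y), Eq. (3):
    a point mass at 0 with weight lam/y, otherwise Exp(lam).
    Assumes lam < y."""
    if rng.random() < lam / y:
        return 0.0
    return rng.expovariate(lam)

rng = random.Random(7)
lam, y = 1.0, 4.0                       # illustrative parameters, lam/y = 1/4
samples = [sample_conv(lam, y, rng) for _ in range(200000)]
mean = sum(samples) / len(samples)
# Analytically the mean is (1 - lam/y)/lam = 1/lam - 1/y = 0.75
```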
To arrive at the dynamics depicted in Fig. 2(a), each of the i_0 − 1 terms of the form δ + y_α^{−1}δ′ in Eq. (1) is paired up with a separate exponential, as described in Appendix C 1. This creates a total of 2^(i_0−1) possible forward-only channels with up to i_0 − 1 exponential steps skipped, as shown in Fig. 2(a). To arrive at the dynamics shown in Fig. 2(b), the i_0 − 1 objects of the form δ + y_α^{−1}δ′ are successively convoluted with the full chain E_{N−1} from the right, as described in Appendix C 2. This leads to i_0 channels, in which 0, 1, …, i_0 − 1 exponential steps are skipped. This can be illustrated as a single forward-only channel, with appropriate rates of jumping from intermediate eigenstates to absorption. By considering these forward-only processes we arrive at closed-form expressions for the arrival time distributions. For arrival at state N, the distribution is given by Eq. (4) (see Appendix C 3). The representation shown in Fig. 2(a) corresponds to the picture obtained for a restricted set of processes by probabilistic methods in Ref. [14]. On the other hand, Fig. 2(b) reflects the findings of Refs. [17,18], derived from the construction of intertwining processes. Our analysis shows that these different decompositions originate from one common structure, Eq. (1). The explicit schemes in Fig. 2 provide a computational method to generate samples from the arrival time distribution efficiently, for example by carrying out simulations of these forward processes using the Gillespie algorithm [24]. It is important to keep in mind that the eigenstates shown in Figs. 2(a) and 2(b) cannot be mapped one to one to the states in Fig. 1. The equivalence of the real and eigenspace representations only holds on the level of arrival time statistics.

IV. EVOLUTIONARY GAMES
As an application of this theory we now consider examples of evolutionary dynamics with frequency-dependent selection [1,25,26]. Such models are used to describe the interaction of invading mutants (A) and resident wild-type individuals (B). These scenarios can be formulated as evolutionary games [1]. We focus on a 2 × 2 normal-form game, where π_A(i) and π_B(i) are the expected payoffs in a population of i individuals of type A and N − i individuals of type B. For this example we assume a pairwise comparison process, leading to birth and death rates given by

b_i = (i/N) ((N − i)/N) [1 + e^{−β[π_A(i) − π_B(i)]}]^{−1},
d_i = (i/N) ((N − i)/N) [1 + e^{+β[π_A(i) − π_B(i)]}]^{−1}.

The parameter β > 0 is the so-called intensity of selection [1,27,28]. The parameters R, S, T, P specify the interaction, and we consider three possible scenarios: coexistence (T > R and S > P, stable heterogeneous population); coordination (R > T and P > S, unstable heterogeneous population); and the prisoner's dilemma (T > R > P > S, stable homogeneous population) [1].
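A minimal sketch of such rates, assuming the standard Fermi-rule form of the pairwise comparison process; the concrete payoff expressions (which exclude self-interaction) and the parameter values are illustrative assumptions, not taken from the text:

```python
import math

def payoffs(i, N, R, S, T, P):
    """Expected payoffs of A (mutant) and B in a well-mixed population
    with i players of type A (assumed form, excluding self-interaction)."""
    piA = (R * (i - 1) + S * (N - i)) / (N - 1)
    piB = (T * i + P * (N - i - 1)) / (N - 1)
    return piA, piB

def rates(i, N, R, S, T, P, beta):
    """Birth and death rates b_i, d_i of a pairwise comparison process."""
    piA, piB = payoffs(i, N, R, S, T, P)
    x = i * (N - i) / N**2              # probability to pick an A-B pair
    b = x / (1.0 + math.exp(-beta * (piA - piB)))
    d = x / (1.0 + math.exp(+beta * (piA - piB)))
    return b, d

# Prisoner's dilemma ordering T > R > P > S (illustrative values)
N, beta = 20, 0.5
R, S, T, P = 3.0, 0.0, 5.0, 1.0
b, d = rates(5, N, R, S, T, P, beta)
# Self-consistency of the Fermi rule: b_i / d_i = exp(beta * (piA - piB))
```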
As shown in Fig. 3, arrival time distributions can be broad and skewed, such that the mean fixation time contains only limited information. Figure 3 also demonstrates the computational benefits of our formalism. Evaluating the distribution in Eq. (4) requires O(N^3) operations, as the spectra of −A and −A^(i_0−1) are needed. Generating arrival time distributions from simulations of the original birth-death process instead can take exponentially long. For the coordination game and the prisoner's dilemma, fixation at N is rare, and the bottleneck of direct simulations is the limited sampling. For coexistence games, fixation times are exponentially long in N. This impedes accurate simulations. Direct numerical integration of the master equation would suffer from the same problem.
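For comparison, direct simulation of the original chain of Fig. 1 with the Gillespie algorithm can be sketched as follows. The neutral rates b_i = d_i used here are an assumption chosen so that the fixation probability i_0/N is known exactly and the estimate can be checked:

```python
import random

def gillespie_fixation(N, i0, b, d, rng):
    """Simulate the one-step process of Fig. 1 until absorption
    at 0 or N.  Returns (absorbing state, fixation time)."""
    i, t = i0, 0.0
    while 0 < i < N:
        bi, di = b(i), d(i)
        t += rng.expovariate(bi + di)          # waiting time to the next event
        if rng.random() < bi / (bi + di):      # birth: i -> i + 1
            i += 1
        else:                                  # death: i -> i - 1
            i -= 1
    return i, t

rng = random.Random(3)
N, i0 = 10, 5
neutral = lambda i: i * (N - i) / N**2         # b_i = d_i: neutral drift
runs = [gillespie_fixation(N, i0, neutral, neutral, rng) for _ in range(2000)]
phi = sum(1 for i, _ in runs if i == N) / len(runs)
# For neutral drift the exact fixation probability is i0/N = 0.5
```

For the games discussed above, where fixation at N is rare or fixation times grow exponentially in N, this direct approach becomes the bottleneck that the eigenspace representation avoids.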

V. EQUILIBRATION PROCESSES IN SYSTEMS WITH MUTATION
We now consider birth-death processes without absorbing states by adding mutation occurring at a rate u ≪ 1, such that b_0 = O(u) and d_N = O(u). All other transition rates depicted in Fig. 1 are O(u^0) and are only affected at subleading order by u (exact transition rates can be found in Appendix D 2). The timescale of the dynamics is characterized by the so-called mixing time, t_mix. This is the time taken for the probability distribution, P(t), to come within a specified distance of the stationary distribution P^st, i.e., t_mix is the first time at which d[P(t_mix), P^st] = ε. The distance between distributions P and Q commonly used in this context is d(P,Q) = (1/2) Σ_i |P_i − Q_i| [22,23]. Using our results we can determine if and when there is a correspondence between the mixing time and the fixation time. For very small mutation rates (0 < uN ≪ 1) the stationary distribution is of the form P^st_i ≈ (1 − σ)δ_{i,0} + σ δ_{i,N}, with σ = O(u^0). In the strict absence of mutation (u = 0) the system reaches fixation, so its terminal distribution, Π, is of the form Π_{i|i_0} = (1 − φ_{N|i_0})δ_{i,0} + φ_{N|i_0} δ_{i,N}. It is clear that these two distributions are different; P^st in systems with u > 0 is independent of the initial condition. Thus there is no obvious connection between fixation times and mixing times in the limit u → 0.
However, equilibration in many systems with rare mutations is a two-step process; the system first reaches an intermediate distribution that depends on the initial condition, before leaking on a longer timescale into the final stationary state [23]. This is analogous to the quasistationary distribution before reaching absorption in systems without mutation [16]. Our analysis suggests that this intermediate distribution of systems with 0 < uN ≪ 1 is close to the terminal distribution Π of the system with u = 0, and that both systems initially evolve along similar trajectories (see Appendix D 2). This can be seen in Fig. 4, where we represent the probability distribution at a given time as a single point in the (P_0, P_N) plane. The distribution first approaches the terminal distribution before slowly converging to the stationary distribution. The most appropriate analog of fixation times in systems with small mutation rates is thus the time to reach this intermediate distribution, not the time to reach stationarity (the mixing time).
The mixing time can be related to the fixation time in a different way: The probability distribution initially satisfies d[P(t), P^st] = d[P(t), Π], which is a consequence of the distance measure used (solid lines in Fig. 4). This equivalence holds while P_0(t) < 1 − σ and P_N(t) < σ (see Appendix D 2). Prior to the first time that this condition is violated, P(t) is approximately the same as in the system without mutation, where we have Pr(t_fix < t) = 1 − d[P(t), Π] for the cumulative fixation time distribution. The condition d[P(t), Π] = 1/2 therefore translates into the time at which half of the samples have reached fixation, and provided the above conditions hold there is a correspondence between the mixing time and the median fixation time. This is illustrated in Fig. 5, where we consider the example of a coordination game with a symmetric stationary distribution [23]. If P^st is asymmetric and σ ≪ 1 − σ (or vice versa), then the condition P_N(t) < σ is violated after a short period of time. In such cases the above correspondence may not hold for the median fixation time, but only for a lower percentile. This is the case for the example shown in Fig. 6.

VI. CONCLUSION
In summary, we have constructed eigenspace representations that capture the full arrival time statistics of one-step birth-death processes. The mapping into eigenspace has a clear interpretation in terms of forward-only exponential processes. Sampling of the original arrival time distributions reduces to simulating these forward-only processes, or equivalently to evaluating a finite sum of exponential random variables, turning our results into an effective tool for model reduction. The compact structure of the forward-only processes allows us to derive exact, closed-form expressions for the arrival time distributions of the original process in terms of its spectrum. As we have demonstrated, the numerical evaluation of these expressions is an efficient polynomial-time method to obtain full arrival time statistics. We have also established a link between equilibration times in systems with small mutation rates and the median fixation time in the absence of mutation in some circumstances. Our work advances the understanding of the mathematical structure underpinning the dynamics of fixation. We have placed existing representations for simpler cases into a wider and more coherent context [14,17,18]. Nevertheless, there are fundamental open questions. Claims of probabilistic interpretations of Karlin and McGregor's theorem have been made [15,16], but in our view this picture is still incomplete. We would argue that a full probabilistic interpretation of the representations in eigenspace is only reached when each time step of the forward-only process can be constructed directly and uniquely from realizations of the original process alone. Whether or not this is possible remains unclear.

ACKNOWLEDGMENT

P.A. gratefully acknowledges support from the EPSRC.

1. Dirac distribution and its derivative
The Dirac-δ distribution has support {0}. It can be written in its Fourier representation, δ(t) = (1/2π) ∫_{−∞}^{∞} dω e^{iωt}. The distributional derivative, δ′, can be conveniently defined by its Fourier transform as well [29]. It has the form δ′(t) = (1/2π) ∫_{−∞}^{∞} dω (iω) e^{iωt}. When this is convoluted with a test function, f(t), with infinite support one obtains (after integration by parts) (δ′ * f)(t) = f′(t). If a test function, g(t), has finite support, say t ≥ 0, then one finds (δ′ * g)(t) = g′(t) + g(0⁺)δ(t), where g′ denotes the derivative taken for t > 0.

2. Laplace transform of an exponential distribution
We frequently use the Laplace transform of an exponential distribution in our subsequent derivations. This is a standard result, but it is useful to include it here. We consider an exponential distribution with parameter λ > 0, such that E^(λ)(t) = λ e^{−λt} (t ≥ 0). The Laplace transform is obtained as follows:

L[E^(λ)](s) = ∫_0^∞ dt e^{−st} λ e^{−λt}.

This integral is only convergent in the region Re(s) > −λ. Within this region we have

L[E^(λ)](s) = λ/(s + λ).

3. Laplace transform of an object δ(t) + z^{−1}δ′(t)
We now show that the Laplace transform of δ(t) + z^{−1}δ′(t) is 1 + z^{−1}s. The object δ′(t) is the derivative of the Dirac-δ distribution δ(t) (see Appendix A 1 above), and z > 0 is a constant. We have

L[δ + z^{−1}δ′](s) = ∫_{0^−}^{∞} dt e^{−st} [δ(t) + z^{−1}δ′(t)] = 1 + z^{−1}s.

This expression has no singularities, and thus the region of convergence in terms of s is the entire complex plane. Note that we explicitly define the lower integration limit as 0^− to include the δ function in the integral, and we have used lim_{t→0^−} δ(t) = 0.

4. Convolution of an exponential distribution with an object δ(t) + z^{−1}δ′(t)
The convolution of an exponential distribution E^(λ)(t) with an object of the form δ(t) + z^{−1}δ′(t) (for constant z), as shown in Eq. (3), is

[E^(λ) * (δ + z^{−1}δ′)](t) = (λ/z) δ(t) + (1 − λ/z) E^(λ)(t).

APPENDIX B: CALCULATION OF ABSORPTION TIME DISTRIBUTIONS VIA LAPLACE TRANSFORMS

Laplace representation
As mentioned in the main text, it is convenient to focus on the states i = 1, …, N − 1 of the birth-death process shown in Fig. 1 of the main paper. The dynamics of these states is given by ṗ = A · p, where A is an (N − 1) × (N − 1) matrix, and where p_i(t) is the probability that the system is in state i at time t. We note that this is not a probability-conserving master equation, as probability mass continuously leaks into the absorbing states. We will use the notation p_i (lower case) when we discuss the restricted system (with Σ_{i=1}^{N−1} p_i(t) ≤ 1), and we will write P_i(t) (upper case) when we discuss the full system, i = 0, …, N. For the latter one always has Σ_{i=0}^{N} P_i(t) = 1. We can take the Laplace transform and write the matrix exponential in the resolvent form

p(s) = (s𝟙 − A)^{−1} · p(0),

where 𝟙 denotes the (N − 1) × (N − 1) identity matrix. We have here written p(s) for the Laplace transform L[p(t)].
We consider initial conditions of the form p_i(0) = δ_{i,i_0} (1 ≤ i_0 ≤ N − 1). Our strategy is to compute p_1(t) and p_{N−1}(t); from these the rates d_1 p_1(t) and b_{N−1} p_{N−1}(t), with which probability arrives in the absorbing states, can be obtained. Thus we are interested in the matrix elements

p_1(s) = [(s𝟙 − A)^{−1}]_{1,i_0},   p_{N−1}(s) = [(s𝟙 − A)^{−1}]_{N−1,i_0}.

The (i,j)th element of the inverse of an invertible matrix B is given by the cofactor formula (B^{−1})_{i,j} = C_{j,i}/det(B), where C_{j,i} is the (j,i)th cofactor of B. We apply this to the (1, i_0)th element of (s𝟙 − A)^{−1}, and likewise for the (N − 1, i_0)th element.

Calculation of cofactors
The cofactor C_{i_0,1} is found by dropping column 1 and row i_0 from s𝟙 − A and evaluating the resulting determinant, where a_i = b_i + d_i is introduced for compactness. Using Laplace's formula, this can be written as a product of transition rates multiplied by the determinant of a matrix of the form s𝟙 − A^(·), where A^(·) is a submatrix of A. For the (i_0, N − 1)th cofactor we proceed analogously, obtaining a product of birth rates multiplied by det[s𝟙 − A^(i_0−1)]. The matrix A^(i_0−1) consists of rows and columns 1, …, i_0 − 1 of A.

Arrival time distributions
Putting things together, we obtain expressions for p_1(s) and p_{N−1}(s), in which we have used det(s𝟙 − A) = ∏_{β=1}^{N−1} (s + λ_β), where λ_β > 0 are the eigenvalues of −A. Using L[Ṗ_0](s) = d_1 p_1(s) and L[Ṗ_N](s) = b_{N−1} p_{N−1}(s), we obtain the Laplace transforms of the absorption time distributions at sites i = 0 and i = N, respectively. They are given in Eq. (B9).

Time representation of solutions
Combining Eqs. (A5), (A6), and (B9), we can write the transforms in the factorized form of Eq. (B10). Using the fact that L^{−1}[f(s) · g(s)] = f * g, we can perform the inverse Laplace transform of the expressions in Eq. (B10). We find Eq. (B11), which is the expression shown in Eq. (1) of the main text. We identify the prefactors in square brackets as the fixation probabilities φ_{0|i_0} and φ_{N|i_0}, respectively.

Symmetry of the fixation time distributions
By choosing the initial conditions i_0 = N − 1 and i_0 = 1, the expressions in Eq. (B11) can be reduced to

Ṗ_{0|N−1}(t) = φ_{0|N−1} E_{N−1}(t),   Ṗ_{N|1}(t) = φ_{N|1} E_{N−1}(t),

where E_ℓ = E^(λ_1) * ⋯ * E^(λ_ℓ). From this we conclude that the conditional arrival time distribution at state i = 0 given i_0 = N − 1 is equal to the conditional arrival time distribution at state i = N given i_0 = 1. This symmetry has been known for the mean fixation time [3,30], and it was recently shown that the correspondence holds for the full distribution [21]. Our approach offers an alternative way to obtain this intriguing result.
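This symmetry can be checked numerically: integrating ṗ = A · p for an arbitrary (assumed) set of asymmetric rates, the flux d_1 p_1(t) started from i_0 = N − 1 and the flux b_{N−1} p_{N−1}(t) started from i_0 = 1 should differ only by a constant factor, the ratio of the fixation probabilities. A sketch using simple Euler integration:

```python
def flux_series(N, b, d, i0, target, dt=1e-3, T=25.0, checkpoints=(5.0, 10.0, 20.0)):
    """Euler-integrate p' = A.p on the interior states 1..N-1 and record
    the flux into the chosen absorbing state (0 or N) at checkpoint times."""
    p = [0.0] * (N + 1)
    p[i0] = 1.0
    out, t, k = [], 0.0, 0
    for _ in range(int(T / dt)):
        q = p[:]
        for i in range(1, N):
            gain = (b[i - 1] * p[i - 1] if i > 1 else 0.0) + \
                   (d[i + 1] * p[i + 1] if i < N - 1 else 0.0)
            q[i] = p[i] + dt * (gain - (b[i] + d[i]) * p[i])
        p, t = q, t + dt
        if k < len(checkpoints) and t >= checkpoints[k]:
            out.append(d[1] * p[1] if target == 0 else b[N - 1] * p[N - 1])
            k += 1
    return out

# arbitrary asymmetric rates b_i, d_i for i = 1..N-1 (assumed for illustration)
N = 6
b = [0.0, 0.30, 0.50, 0.20, 0.40, 0.10, 0.0]
d = [0.0, 0.40, 0.10, 0.60, 0.20, 0.35, 0.0]
f0 = flux_series(N, b, d, N - 1, 0)   # flux into 0, started at i0 = N - 1
fN = flux_series(N, b, d, 1, N)       # flux into N, started at i0 = 1
ratios = [x / y for x, y in zip(f0, fN)]
# The ratios agree at all checkpoints: both fluxes follow the same E_{N-1}(t)
```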

APPENDIX C: COMPUTING EIGENSPACE REPRESENTATIONS
As the convolution operator is commutative, we can order the convolutions in Eq. (B11) in any way. The exponential distributions and the objects of the form δ(t) + z^{−1}δ′(t) in Eq. (B11) can hence be combined in multiple ways.

1. Convolutions I: Pairing δ + z_α^{−1}δ′ with individual exponential distributions
In this section we choose to pair E^(λ_{N−α})(t) with δ(t) + x_α^{−1}δ′(t) for the purposes of Ṗ_{0|i_0}(t), and with δ(t) + y_α^{−1}δ′(t) when we calculate Ṗ_{N|i_0}(t). Carrying out these convolutions in Eq. (B11), we arrive at the corresponding expressions for Ṗ_{0|i_0}(t) and Ṗ_{N|i_0}(t). We stress that the objects δ(t) + z^{−1}δ′(t) can be paired with any of the exponential distributions. We chose to match them at the end of the exponential chain so that the reduced chains can be systematically compared. This is the representation described in Fig. 2(a).

2. Convolutions II: Recursively convolving with the exponential chain
If we write Eqs. (B11) in the form

Ṗ_{N|i_0}(t) = φ_{N|i_0} [E_{N−1} * (δ + y_1^{−1}δ′) * ⋯ * (δ + y_{i_0−1}^{−1}δ′)](t),

where E_ℓ = E^(λ_1) * ⋯ * E^(λ_ℓ), then we can recursively convolute the objects involving δ functions onto the chain of exponentials from the right. We note that

[E_ℓ * (δ + z^{−1}δ′)](t) = (λ_ℓ/z) E_{ℓ−1}(t) + (1 − λ_ℓ/z) E_ℓ(t),

which follows directly from Eq. (A7). Hence each of the recursive convolutions introduces a new exponential chain with one step less. Performing all the convolutions leads to the explicit expressions of Eq. (C5). These expressions can be written as

Ṗ_{N|i_0}(t) = Σ_α G^(y)_{N−α} E_{N−α}(t),

and analogously for Ṗ_{0|i_0}(t) with constants G^(x)_{N−α} [Eq. (C6)], where the G^(x)_{N−α} and G^(y)_{N−α} are constants (independent of time). The fixation time distributions are thus linear combinations of the distributions E_{N−α}. We also note the equalities Σ_α G^(x)_{N−α} = φ_{0|i_0} and Σ_α G^(y)_{N−α} = φ_{N|i_0}, which follow because each E_{N−α} is normalized. We now proceed to express the above linear combination of exponential chains [Eq. (C6)] as the single chain shown in Fig. 2(b). In the schematic shown in this figure the system can transition to two possible states if currently in eigenstate α: either α → α + 1 or α → N. These paths have transition rates F_α λ_α and (1 − F_α)λ_α, respectively. The total rate of transitioning out of α is then λ_α, and the waiting time at α is an exponential distribution with parameter λ_α, no matter whether the system transitions to α + 1 or to N. The quantity F_α denotes the probability that the next state of the dynamics in eigenspace is α + 1, if the system is currently in eigenstate α. With probability 1 − F_α the next state is eigenstate N. Evaluating the probability of a trajectory in terms of the F_α, and then matching with Eq. (C6), determines the F_α; all transition rates in Fig. 2(b) then follow as F_α λ_α and (1 − F_α)λ_α.

3. Evaluation of bottom-line arrival time distributions
The final expressions for the (un-normalized) arrival time distributions follow directly from Eq. (C6). First we note that the convolution of exponential distributions with pairwise-distinct parameters has the form

E_n(t) = [E^(λ_1) * ⋯ * E^(λ_n)](t) = Σ_{α=1}^{n} [∏_{β≠α} λ_β/(λ_β − λ_α)] λ_α e^{−λ_α t}.

Substituting this expression into Eq. (C6), or equivalently Eq. (C5), we arrive at the final expressions for Ṗ_{0|i_0}(t) and Ṗ_{N|i_0}(t). Evaluating these expressions requires the calculation of the eigenvalues x_α, y_α, and λ_α.
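This convolution formula (the standard hypoexponential density for pairwise-distinct rates) can be coded and cross-checked directly; the rates below are illustrative:

```python
import math

def conv_exp_density(lams, t):
    """Density of the convolution E^(l_1) * ... * E^(l_n) of exponential
    distributions with pairwise-distinct rates (hypoexponential density)."""
    total = 0.0
    for a, la in enumerate(lams):
        w = 1.0                        # weight prod_{c != a} l_c / (l_c - l_a)
        for c, lc in enumerate(lams):
            if c != a:
                w *= lc / (lc - la)
        total += w * la * math.exp(-la * t)
    return total

# cross-check n = 2 against the familiar two-rate convolution
l1, l2, t0 = 1.0, 3.0, 0.7
two_rate = l1 * l2 / (l2 - l1) * (math.exp(-l1 * t0) - math.exp(-l2 * t0))
# and check normalization for n = 3 by a simple Riemann sum over [0, 50)
norm = sum(conv_exp_density([1.0, 2.0, 5.0], 0.01 * k) for k in range(5000)) * 0.01
```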

4. Accuracy and efficiency of simulation-based approaches
In the inset of Fig. 3 we show that computing arrival time distributions directly from simulations is less efficient than computing them exactly using our formalism. To generate the CPU-time curve for the simulation method we measure the distance, d, between the histogram of arrival times at state N, ρ_N, and the exact distribution,

d = (1/2) Σ_{m=1}^{M} |ρ_N(m) − p(m)|,

where M is the number of histogram bins and p(m) is the exact probability mass in bin m. Note that this is the continuous analog of the distance measure between distributions to be discussed in the next section. The histogram, ρ_N, is populated with an increasing number of independent realizations until the distance falls below d = 1/2. We then plot the time taken to complete this number of simulation runs of the given process.

1. Dynamics without mutation
In the system without mutation all realizations reach fixation eventually. If the dynamics is started from state i_0 [i.e., P_i(t = 0) = δ_{i,i_0}], the stationary state of the birth-death process, i.e., the terminal distribution, is of the form

Π_{i|i_0} = (1 − φ_{N|i_0}) δ_{i,0} + φ_{N|i_0} δ_{i,N}.

The quantity φ_{N|i_0} is the probability that the process reaches the absorbing state N. The probability of being absorbed at 0 is 1 − φ_{N|i_0}.
Let us now consider the distance of the distribution P(t) from this terminal distribution Π = (1 − φ_{N|i_0}, 0, …, 0, φ_{N|i_0}). In line with the existing literature [22,23] we use the distance measure

d(P,Q) = (1/2) Σ_{i=0}^{N} |P_i − Q_i|

for two distributions P and Q. We then have

d[P(t), Π] = (1/2) |P_0(t) − (1 − φ_{N|i_0})| + (1/2) |P_N(t) − φ_{N|i_0}| + (1/2) Σ_{i=1}^{N−1} P_i(t).

Probability continuously flows into the absorbing states, hence P_0(t) ≤ 1 − φ_{N|i_0} and P_N(t) ≤ φ_{N|i_0} [P_0(t) approaches 1 − φ_{N|i_0} from below with time, and similarly for P_N(t)]. We can therefore simplify the above expression, and we are left with

d[P(t), Π] = 1 − P_0(t) − P_N(t).

This means that the distance d(t) = d[P(t), Π] is given by the probability that the system has not yet reached fixation at either of the absorbing states by time t. This in turn means that 1 − d(t) = Pr(t_fix ≤ t) is the probability to have reached fixation by time t, i.e., it is the cumulative distribution of the unconditional fixation time t_fix. The quantity −ḋ(t) is therefore the probability density function of the unconditional fixation time.
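The identity d[P(t), Π] = 1 − P_0(t) − P_N(t) can be verified numerically for a small chain; neutral rates are assumed here so that φ_{N|i_0} = i_0/N is available in closed form:

```python
def evolve_full(N, b, d, i0, dt, T):
    """Euler-integrate the full (N+1)-state master equation; states 0 and N
    only receive probability (absorbing boundaries)."""
    P = [0.0] * (N + 1)
    P[i0] = 1.0
    for _ in range(int(T / dt)):
        Q = P[:]
        for i in range(1, N):
            Q[i] += dt * (b[i - 1] * P[i - 1] + d[i + 1] * P[i + 1]
                          - (b[i] + d[i]) * P[i])
        Q[0] += dt * d[1] * P[1]            # flux into extinction
        Q[N] += dt * b[N - 1] * P[N - 1]    # flux into fixation
        P = Q
    return P

N, i0 = 8, 3
b = [0.0] + [i * (N - i) / N**2 for i in range(1, N)] + [0.0]
d = b[:]                                    # neutral drift: b_i = d_i
P = evolve_full(N, b, d, i0, dt=1e-3, T=30.0)
phi = i0 / N                                # exact neutral fixation probability
Pi = [1 - phi] + [0.0] * (N - 1) + [phi]    # terminal distribution
tv = 0.5 * sum(abs(p - q) for p, q in zip(P, Pi))
unfixed = 1.0 - P[0] - P[N]
# tv and unfixed coincide: the distance to the terminal distribution equals
# the probability of not yet having fixed
```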
As a side remark we note that the mean unconditional fixation time can be expressed as follows:

⟨t_fix⟩ = ∫_0^∞ dt t [−ḋ(t)] = ∫_0^∞ dt d(t),

where we have integrated by parts. Thus the mean unconditional fixation time is the area under the curve d(t).

2. Dynamics with mutation

a. Definitions
We now consider systems with mutation, which occurs at a rate u ≪ 1. This means that the states 0 and N are no longer absorbing. More specifically, we consider systems in which b_0 = O(u) and d_N = O(u), i.e., escape from the states 0 and N occurs at a rate proportional to u. All remaining transition rates (b_i, d_i, i = 1, …, N − 1) are O(u^0), and hence are only affected at subleading order by the introduction of mutation. For u = 0 one recovers the case with absorbing states (b_0 = d_N = 0). Below we compare the system with small mutation rates to the system without mutation.
The rates used for the analysis shown in Figs. 4, 5, and 6 are those given in Ref. [23].

b. Stationary state
For u > 0 there are no absorbing states and the dynamics reaches a stationary distribution, P^st, with full support, i.e., P^st_i > 0 for all i = 0, 1, …, N. This distribution can be expressed in closed form in terms of the transition rates [31]. In the limit of small mutation rates (0 < uN ≪ 1), it can be seen that the probability masses in the interior states are suppressed relative to those at the boundaries. Together with the normalization condition (Σ_{i=0}^{N} P^st_i = 1) we can determine that P^st_0 and P^st_N must be O(u^0), and the remaining probability masses are O(u). With this we can write

P^st_i = (1 − σ) δ_{i,0} + σ δ_{i,N} + O(u)

for i = 0, …, N, where σ = O(u^0). This indicates that in the limit of small mutation the distribution is peaked at the boundaries.
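For a one-step chain the stationary distribution of Ref. [31] reduces to detailed balance, P^st_{i+1}/P^st_i = b_i/d_{i+1}. A minimal sketch with assumed rates in which the boundary escape rates are O(u), illustrating the boundary-peaked shape:

```python
def stationary(b, d):
    """Stationary distribution of a one-step birth-death chain from
    detailed balance: w[i+1] = w[i] * b[i] / d[i+1], then normalize."""
    N = len(b) - 1
    w = [1.0]
    for i in range(N):
        w.append(w[-1] * b[i] / d[i + 1])
    Z = sum(w)
    return [x / Z for x in w]

u = 0.01
# escape from the boundaries is O(u); interior rates are O(1) (values assumed)
b = [u, 0.30, 0.25, 0.20, 0.0]
d = [0.0, 0.20, 0.25, 0.30, u]
Pst = stationary(b, d)
sigma = Pst[-1]    # weight sigma at the boundary i = N
```

For this choice most of the stationary mass sits on the two boundary states, in line with P^st_i ≈ (1 − σ)δ_{i,0} + σδ_{i,N}.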

c. Intermediate distribution
The system without mutation (u = 0) has terminal distribution Π, as discussed above. In particular, P_0(t) and P_N(t), the probability masses concentrated in the absorbing states, grow with time, and we have P_0(t) → 1 − φ_{N|i_0} and P_N(t) → φ_{N|i_0} as t → ∞.
The rates of the system with small mutation differ from those of the system without mutation only by corrections of O(u). At small mutation rates we therefore expect the dynamics on a short timescale (t ≪ u^{−1}) to be essentially that of the system without mutation; the effects of mutation only set in at longer times. Of course the boundary states are no longer absorbing, but we argue that the system initially approaches a distribution close to Π before reaching its stationary distribution P^st.
This can be seen mathematically as follows: Let P^(u=0)(t) = [P^(u=0)_0(t), …, P^(u=0)_N(t)] be the probability distribution of the system without mutation. Its time evolution is described by the master equation Ṗ^(u=0) = M · P^(u=0), where M is an (N + 1) × (N + 1) transition matrix (to be distinguished from the truncated matrix A). Let P^(u)(t) be the distribution in the system with mutation, whose evolution is described by Ṗ^(u) = (M + uQ) · P^(u), where the matrix uQ reflects the difference between the systems with and without mutation. The elements of both matrices M and Q are O(u^0). Now, let q(t) = P^(u)(t) − P^(u=0)(t), such that

q̇ = M · q + uQ · P^(u).
We want to calculate how the separation, q, grows in time, given that both systems (with and without mutation) start from the same initial condition [i.e., q(0) = 0]. For this purpose it is convenient to work in the eigenspace of M. This matrix has two zero eigenvalues, μ_0 = μ_1 = 0, with eigenvectors (v_0)_i = δ_{i,0} and (v_1)_i = δ_{i,N}. These correspond to the absorbing states of the system without mutation. Decomposing q(t) = Σ_α q̃_α(t) v_α into eigendirections v_α of M, we have

dq̃_α/dt = μ_α q̃_α + u g_α(t),

where we have written Q · P^(u)(t) = Σ_α g_α(t) v_α and we note that g_α(t) = O(u^0). This can be integrated to give

q̃_α(t) = u e^{μ_α t} ∫_0^t e^{−μ_α τ} g_α(τ) dτ.
On short time scales (t ≪ u^{−1}) this gives q̃_α(t) = O(u), and hence the separation q(t) is also O(u). That is to say, in the limit u → 0 both systems (with and without mutation) initially evolve along the same trajectory. On this time scale both systems approach the distribution Π. On a longer time scale [t = O(u^{−1})], differences between the two systems become O(u^0). However, these differences are concentrated on the states i = 0 and i = N. Effectively, a redistribution of probability mass between the boundary states takes place. The distribution of the system with mutation evolves from Π_{i|i_0} = (1 − φ_{N|i_0})δ_{i,0} + φ_{N|i_0}δ_{i,N} to P^st_i = (1 − σ)δ_{i,0} + σ δ_{i,N}, as shown in Fig. 5.

d. Distance from stationarity and mixing times
Approximating the stationary distribution of the system with small mutation rate as P^st_i ≈ (1 − σ)δ_{i,0} + σ δ_{i,N}, we have

d[P^(u)(t), P^st] = (1/2) |P^(u)_0(t) − (1 − σ)| + (1/2) |P^(u)_N(t) − σ| + (1/2) Σ_{i=1}^{N−1} P^(u)_i(t)   (D15)

for the distance between the distribution P^(u)(t) of the system with mutation at time t and the stationary distribution. While P_0(t) and P_N(t) are monotonically increasing with time in the system without mutation, this is not necessarily the case if there is mutation. Hence we cannot easily drop the absolute values in Eq. (D15) as in the case without mutation. We observe, though, that P^(u)_0(t = 0) = 0 and P^(u)_N(t = 0) = 0 for 0 < i_0 < N. Hence, there is an initial phase of the dynamics in which P^(u)_0(t) < 1 − σ and P^(u)_N(t) < σ. Let us write t* for the first time at which either P_0(t*) = 1 − σ or P_N(t*) = σ (whichever happens first). Prior to this time we have

d[P^(u)(t), P^st] = 1 − P^(u)_0(t) − P^(u)_N(t).

This is the same as the distance to the fixation distribution, Π, in the system without mutation, given in Eq. (D4). From this we can conclude that

d[P^(u)(t), P^st] ≈ 1 − Pr(t_fix ≤ t).

That is to say, the mixing time is related to the cumulative distribution of fixation times. The first time at which d[P^(u)(t), P^st] = ε, provided it is smaller than t*, is approximately the (1 − ε)th percentile of the unconditional fixation time distribution. Choosing ε = 1/2, we obtain the equivalence between the mixing time and the median fixation time [Eq. (D18)]. This equivalence holds for the example shown in Figs. 4 and 5. However, it does not hold in the example shown in Fig. 6, where the stationary state is asymmetric in the (P_0, P_N) plane. In this case the equivalence in Eq. (D18) breaks down prior to the time at which d[P^(u)(t), P^st] = 1/2.