Early Real-Time Estimation of the Basic Reproduction Number of Emerging Infectious Diseases

Bahman Davoudi, Joel C. Miller, Rafael Meza, Lauren Ancel Meyers, David J. D. Earn, and Babak Pourbohloul* Mathematical Modeling Services, British Columbia Centre for Disease Control, Vancouver, British Columbia, Canada Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts, USA Section of Integrative Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, Texas, USA Department of Mathematics & Statistics and the M.G. DeGroote Institute of Infectious Disease Research, McMaster University, Hamilton, Ontario, Canada School of Population & Public Health, University of British Columbia, Vancouver, British Columbia, Canada (Received 22 July 2011; revised manuscript received 6 April 2012; published 30 July 2012)

goal of early outbreak analysis is to obtain a reliable estimate for the basic reproduction number, R 0 .
Over the past few years infectious disease epidemic processes have gained attention from the physics community. Much of the work to date, however, has focused on the analysis of an epidemic process in which the disease has already spread widely within a population; conversely, very little attention has been paid, in the physics literature or elsewhere, to formulating the initial phase of an outbreak. Careful analysis of this phase is especially important as it could provide policymakers with insight on how to effectively control an epidemic in its initial stage.

Introduction
The basic reproduction number, R 0 , is a fundamental characteristic of the spread of an infectious disease. It is generally defined as the expected number of new infections caused by a typical individual during the entire period of his/her infection in a totally susceptible population [1][2][3]. Because R 0 is a simple scalar quantity, and perhaps because in many circumstances it determines the expected (average) final size of an outbreak [3][4][5][6][7], it has been widely used to gauge the degree of threat that a specific infectious agent will pose as an outbreak progresses [8][9][10]. While it is clear that knowing the value of R 0 can be very useful for policymakers in planning a response; it is not as straightforward to obtain a reliable estimate of R 0 , especially in early stages of an outbreak, before large scale uncontrolled transmission has taken place and before the basic biology and transmission pathways of the pathogen have been characterized.
Early in an outbreak, the pattern of disease spread is predominantly influenced by the probabilistic nature of infection transmission. Consequently, a wide array of outcomes is possible, ranging from the outbreak fizzling out, even in the absence of an intervention, to circumstances where the initial stage expands into a large-scale epidemic. Once a fullblown epidemic develops, several assumptions can be made that simplify the estimation of R 0 , as has been discussed in detail in the literature [1,3,5,8].
In many cases, it is necessary to assess the impact of various intervention strategies before a large-scale epidemic occurs. In doing so, stochastic manifestations of disease transmission as well as the underlying structure of the contact network should be taken into account. The first aspect has been widely studied. For example, the Reed-Frost model is a chain-binomial stochastic model where each infected individual can infect susceptible individuals and they are all assumed to have the same contact rate [11][12][13][14].
Another example is the methodology developed by Becker [15], and Ball and Donnelly [16] based on a branching process susceptible-infected-recovered (SIR) model. Branching processes have received wide attention because they facilitate the evaluation of the basic reproduction number as well as the final epidemic size and epidemic probability [17]. More recently, the 2003 global outbreak of severe acute respiratory syndrome (SARS) inspired the development of new methodologies based on the daily number of new cases and the distribution of the serial interval between successive infections [18][19][20][21]. However, none of these methods take into consideration the influence of the contact network underlying an epidemic process or it is assumed that the contact network is a classic random graph.
Several new methods to estimate the basic reproduction number, R 0 , were proposed during or shortly after the 2009 H1N1 influenza pandemic. Notably, Nishiura et al. [22] employed an age-structured model to derive an estimate for R 0 . Katriel et al. [23] used a new discrete-time stochastic epidemic SIR model that explicitly takes into account the diseases specific generation-time distribution and the intrinsic demographic stochasticity inherent to the infection process. Balcan et al. [24] employed a method that is based on the distribution of arrival times of the H1N1 in 12 different countries seeded by the Mexico epidemic using one million computationally simulated epidemics. Nishiura et al. [25] also developed a discrete time stochastic model that accounts for demographic stochasticity and conditional measurement and applied to estimate the R 0 value using the weekly incidence of H1N1 pandemic influenza in Japan. Although all of these constitute an important advancement in the literature, none of them simultaneously addresses analytically the stochasticity due to the underlying contact network and the transmission process.

Outline Summary
In the following we first describe the basic notion of network model. We then define infection hazard or infectivity function, the removal hazard or removal function and their related quantities namely the transmissibility and the removal probability density function. Furthermore, we derive a stochastic renewal equation that allows us to understand the relationship between the rate of newly-infected individuals at a given time and all preceding infected cases in the initial stage of an outbreak. We study the renewal equation in the exponential regime and obtain the Wallinga-Lipsitch equation [26]. In this regime, we also obtain an equation that express the generation interval distribution in term of the transmissibility and the removal probability distribution function. In next part, we obtain an equation for the basic reproduction of removed individual. This equation acts as a constrain allowing us to find one unknown. We will then draw an algorithm and show how one can evaluate the basic reproduction number. We finally present our extensive numerical results showing the accuracy of the methodology in predicting the basic reproduction number for different network models and parameters types and values.

Network basis
This section briefly introduces the idea of contact network epidemiology and defines the key concepts of infection rate, removal rate, transmissibility and removal probability density. We map a collection of N individuals to a network where each vertex represents an individual and each edge shows a pathway of possible infection transmission between two individuals. We use k to denote the degree of a given individual (the number of contacts that s/he has) represented graphically by the number of edges emanating from a vertex, and we use p k to denote the degree distribution (the probability that a randomly chosen vertex has degree k (k contacts)). Several important quantities can be derived once a network's degree distribution is known. The moments of the degree distribution are k n = ∞ κ=0 k n p κ . For n = 1, k is the average number of nearest neighbours of a randomly chosen individual, which we denote z 1 . The average number of second nearest neighbours of a randomly selected individual, z 2 , can be expressed as k 2 − k [27]. To estimate R 0 , we count the number of edges along which an individual can infect others, once that individual has become infected. This quantity, usually termed excess degree, represents the number of edges emanating from a vertex (individual), excluding the edge that was the source of the infection. One can show that the average excess degree (Z x ) is given by the ratio z 2 z 1 [29].
We denote the time at which an individual acquires the infection by t i , and the time since acquiring infection by τ = t − t i (also known as age of infection implies that for small δτ the conditional probability that infection occurs across an edge between times τ and τ + δτ , given that it did not occur by time τ , can be approximated by λ i (τ )δτ . Typically, λ i (τ ) is initially zero during the latent period, it increases to a certain level and then declines during the infectious period, before finally vanishing and returning to zero at the time of permanent recovery. Figure 1 shows four hypothetical infectivity functions, the first of which is the typical case. In practice, the functional form of λ i (τ ) should be estimated from the actual transmission profile corresponding to a specific disease. Note that the only technical restriction on λ i (τ ) is that it must be a non-negative integrable function.
Given the infectivity function, one can evaluate the probability that an individual transmits the disease to one of his/her contacts during a specific time period. Let T (τ ) denote the probability of disease transmission along one edge for an individual with infection age τ . Then T (τ ) satisfies [27,30] In general, the time-to-removal varies from one individual to another and there is no a priori knowledge of the exact value of this quantity for each individual. Therefore we must account for its variability as well. Let λ r (τ ) denote the removal hazard or removal function, i.e., the instantaneous rate of removal for an individual with infection age τ . This implies that for small δτ the conditional probability that an individual is removed between  can be related to death or various interventions, such as quarantine, reduction of social activity due to severity of illness, behaviour change etc. Let Ψ(τ ) denote the probability that an individual has time-to-removal ≥ τ . Then [30] subject to the condition Ψ(∞) = 0. The removal probability density function is given by . Using Equation (1) and ψ(τ ) (or Ψ(τ )) one can calculate the expected transmissibility, i.e the probability of disease transmission -across a given edge: The basic reproduction number, which represents the average number of infections caused by a typical infected individual in a fully susceptible population, can be written as the product of the expected excess degree and the expected transmissibility [27] R 0 = Z x T.

Disease dynamics on networks
In this section we present some examples of the spread of an infectious agent on a contact network. The pattern of disease spread on a network can be categorized into three different regimes: stochastic, exponential and declining. The process of disease spread is stochastic in nature, given that the disease transmission along an edge occurs in a probabilistic manner and that the degree of the next infected individual cannot be determined a priori. The stochastic behaviour is dominant in the initial stage of disease spread when the number of infectious individuals is comparatively small (stochastic regime). The effect of stochasticity becomes much less pronounced when the number of newly infected individuals becomes significant, and stochastic fluctuations are smoothed out (exponential regime). Progression of disease spread starts to decline as the cumulative number of infected cases becomes comparable to the size of the network, at which point network finite-size effects becomes important (declining regime) [29].
From now on, we use the tilde notation to make the distinction between the realization of a stochastic process (with tilde) and its mean field value (without tilde). We definej(t) as the time series of infection events, which is a sum of delta Dirac functions located at each infection time. The case countC(t, δt) that gives the number of infections between times t and t+δt can be expressed asC(t, δt) = t+δt tj (t ′ )dt ′ . We defineJ(t) =C(t, δt)/δt as the incidence rate of new infections at time t, where δt is the maximum time resolution.
In the exponential regime, the incidence rate of new infections grows exponentially and therefore can be expressed as:J for some α > 0. The above-mentioned regimes are represented in Figure 2.

Stochastic dynamics of disease
In this section we outline a general framework to estimate the basic reproduction number assuming that all information about a specific realization of the epidemic process up to time t is known. We start by first deriving a renewal equation for the rate of new infections,J(t).

A renewal equation forJ(t)
Let's consider the first person in the population infected with the disease and assume that her/his infection occurred at time 0. From equation (3), we can infer that the expected number of infections that this individual will cause by time t is given by (assuming that his/her excess degree is equal to the average excess degree). This leads in the limit t → ∞ to the usual value of R 0 = Z x T . The above expression also implies that the mean contribution of this individual to the incidence rate of new infections at his/her infection age τ is given by The above equations can be readily generalized to address the random process of infection spread on a contact network. In particular, one can compute the contribution of the individuals infected at time t−τ ,J(t−τ )δt, to the number of new infections occurring in initial stage of an outbreak at time t,J(t)δt, namelỹ whereΘ(τ, t) denotes the fraction of those edges where disease is actually transmitted exactly at time t. This is a random function with expectation given by dT (τ ) dτ . Please note that Z x and Ψ(τ ) are a function of t in the most general case. Expression (8) is a generalization of the classical Lotka renewal equation for population growth [31,32], here applied to epidemic dynamics when taking into account the structure of the underlying contact network and the stochasticity inherent to the transmission process.

Exponential regime and the generation interval distribution
In this section we show the relationship between our results and the methodology developed by Wallinga and Lipsitch [26]. During the exponential regime we can ignore stochastic fluctuations and replace all quantities with their expected values. In particular, if we ignore stochastic effects we can re-write the renewal equation (8) as dτ . Equation (9) then takes the simpler form which is the well-known Lotka renewal equation [31,32].
From the definitions of χ(τ ), the expected transmissibility (Eq. (3)) and the basic reproduction number (Eq. (4)) we can see that in Eq. (10) and taking the limit when t → ∞ we obtain that It is worth mentioning that the exact exponential regime can be reached when t → ∞ for an infinite-size network and that is why it is valid to take the limit. This means that there should be a slight deviation from the exponential behaviour for a "finite-size" system at "finite time" once the outbreak has surpassed the stochastic regime. Dividing both sides of Eq. (11) by R 0 we find that [26] whereχ(τ ) = χ(τ )/R 0 is defined as the generation interval distribution. This equation relates the basic reproduction number to the Laplace transform of the generation interval distribution in the asymptotic case (infinite-size, infinite-time). Now, in our formulation the generation interval distribution can be written as This equation describes how the transmissibility, T (τ ), and the distribution of the time to removal, Ψ(τ ), determine together the generation interval distribution.

The generation interval distribution for constant parameters
Equation (11) has a simpler form for constant λ r and λ i . This can be obtained by replacing Therefore, we can express the rate of exponential growth of an epidemic in terms of the mean excess degree (Z x ), the infectivity (λ i ) and removal (λ r ) rates.
Furthermore, we can also explicitly compute the generation interval distribution (Eq. For constant parameters the generation interval is an exponential random variable with mean 1/(λ i + λ r ). Using this fact and Eq. (12) we obtain the following expression for R 0 This equation relates the value of R 0 to the rate of growth during the exponential phase of an epidemic (α) the infection rate (λ i ) and the removal rate (λ r ). Notice that in contrast to the results obtained from a deterministic SIR model, where the mean generation interval is equal to the mean duration of infection [26], here we find that the mean generation interval also depends on the infection rate.  16)). This shows the consistency between the simulated epidemic curve and its expected (exponential growth) behaviour. Figure 4 shows the same results for an exponential network with p k = 1 − e −1/κ e −k/κ , κ = 4, z1 = 3.52, and Z x = 7.0416.
In this example, the stochastic, exponential and declining regimes are approximately between 0 ≤ t < 10, 10 ≤ t < 20 and 20 ≤ t time intervals, respectively. The lines  whereS(t) denotes the number of susceptible individuals at time t. As illustrated in Figure 5, the total number of infected cases can be written as which in turn, implies that   For instance, a recovered/removed predecessor, denoted by superscript r inR r (t), may only come from a group whose members were recovered while their predecessors were still infectious (R i ), or from a group in which the person and his/her predecessor are removed (denoted by the self-loop fromR r to itself). Similarly, the recovered predecessor (r) of an infected person (Ĩ r (t)), may only come from a group of recovered individuals whose respective predecessors were infectious (R i (t)) or from a group whose respective predecessors were already recovered (R r (t)).   (20) and (21)), and the symbols from direct counting of the infectious and removed individuals at any given time for a specific realization of the process.

The transmissibility of removed individuals
Our methodology to estimate R 0 is based on a detailed analysis of the characteristics of Therefore, we instead derive theoretical expressions that can help us estimate some of these quantities. First we write an expression for the total number of secondary contacts of those individuals already removed by time t. Using ideas similar to those above, the total excess degree of removed individuals can be written as Here,Z r x (t) represents the total number of edges of already removed individuals that could have transmitted the infection by time t. However, only a fraction of these links actually transmitted the disease successfully. This latter fraction is given byĨ r (t) +R r (t), whereĨ r (t) andR r (t) represent the number of infectious and removed individuals at time t, respectively whose predecessor is already removed. The ratio of these two quantities represents the fraction of potential transmissions that actually occurred, which we shall refer to as the expected transmissibility of removed individuals orT r (t).
Estimates for the expected values for these quantities can, in principle, be calculated based on the rate of new infections, J(t), using arguments similar to those above. These expressions are derived in the next section. Equations (23) and (11) form a set of equations that allows us to find two unknowns, lets say, the amplitude and variance of infectivity profile that is the subject of other study. Here we use equation (23) to solely estimate the basic reproduction number, R 0 .

Expression for the transmissibility of removed individuals,T r (t)
As discussed in the previous section, our methodology is based on a careful analysis of the characteristics of the removed individuals. In particular, the expected transmissibility of removed individuals,T r (t), will play a crucial role in the estimation of R 0 . We now derive an alternative expression forT r (t) and other expressions related to equation (23).
The expected transmissibility of removed individuals,T r (t), can also be obtained as an extension of Equation (3) by replacing the removal distribution, ψ(τ ), with the conditional distribution of removal time, given that it occurred before time t, defined asψ r (τ, t). The quantityψ r (τ, t)δτ is proportional to the number of individuals already removed by time t that were removed after exactly τ units of time, i.e.,ψ r (τ, t)δτ ∝ ψ(τ ) t−τ 0J (τ ′ )dτ ′ δτ . This probability function, after incorporating the proper normalization, can be written as The expected transmissibility of removed individuals can then be calculated as (see Equa- whereT (τ, t) is an extension of T (τ ) that takes into account the stochastic effects in the disease transmission process (represented by the explicit dependence of this quantity on t).

The basic reproduction number of removed individuals and an equation forR 0
The total excess degree of removed individuals (given in equation (22)) takes the simpler Combining the last equation with equation (23) one obtains We define the right hand side of the previous equation as the reproduction number of the removed individualsR r 0 (t), i.e.,R r Using Equations (27) and (28), we can write a time-dependent estimator of the basic reproduction number,R 0 = Z x T : On the right hand side of the last expression, the expected value ofĨ r (t) +R r (t) can be calculated asĨ whereη(τ, t ′ , t) is the fraction of infected individuals who are removed by time t and who may have infected others at time t ′ ≤ t.η(τ, t ′ , t) can be written as whereJ i (τ, t ′ ) =J(t ′ − τ )Ψ(τ ). Figure 9 illustrates how the new infection rate at time t ′ depends on the infection rate at time t ′ − τ .
ζ(τ, t ′ ) is the probability function that an infection at time t ′ was caused by any of these individuals.ζ(τ, t ′ ) can be written as Substituting the expressions ofη(τ, t ′ , t) andζ(τ, t) in Equation (30) we obtaiñ Notice that the right-hand side depends only on the rate of new infections,J(t), and disease transmissibility (represented byΘ(τ, t)). It is also important to notice that outcome of Equation (29) is invariant under J(t) −→ const * J rep (t) where J rep (t) is rate of new reported cases. Finally, we notice that for a disease with the constant removal function, . Therefore,Ĩ r (t) +R r (t) =R(t). This means that the first fraction on the right hand side in Equation (27) equals unity, and thusR r 0 (t) =T r (t)Z x = 1. The expression forR 0 (t) then takes the simpler form: An algorithm for the estimation of the basic reproduction number 1. Evaluate the conditional distribution of removal time given that it occurred before time t,ψ r (τ, t), using Equation (24) and J(t).
2. CalculateT (τ, t) by equating the left-and right-hand side of Equation (27). We should use equation (25) to evaluate the left-hand side of (27). It is worth mentioning that since we use only one equation, we can estimate only one parameter. This means that we must assume a functional form forT (τ, t) that depends on at most one parameter value. For example, we could assume thatT (τ, t) = T (τ )Ã(t), where T (τ ) denotes the dependence of the transmissibility on the age of infection andÃ(t) denotes an amplitude effect that captures the stochasticity of the transmissibility as a function of time. Assuming that T (τ ) is given, thenT (τ, t) depends only on the multiplicative parameterÃ(t).
3. Calculate an estimator of the expected transmissibility,T (t), usingT (τ, t), ψ(τ ) and Equation (3). Notice that the dependence ofT (t) on t denotes that we are using only information up to time t.
4. The estimated reproduction number at time t is given byR 0 (t) = Z xT (t).
The above algorithm can be modified depending on the information available for the esti- framework. For more details please see [28] Numerical results To test the framework presented so far, we performed epidemic spread simulations on the two networks introduced earlier (binomial and exponential) and in each case collected the "time series" of case counts resulting from the simulations.

Constant infectivity and removal functions
In Figure 10 we present the basic reproduction number of the removed individuals, R r 0 (t), as a function of the number of removed individuals, R(t). The symbols show the results from direct counting during the simulation, whereas the lines show the results obtained from analytically evaluating (after settingT (τ, t) = T (τ ) andΘ(τ, t) = dT (τ ) dτ in the left-and right-hand sides of Equation (27)) each of the terms in Equation (27). The green/blue and red/pink colors correspond to the left and right hand side of Equation (27), respectively. The small asymptotic deviation of our estimate for the exponential network comes from finite size effects [29] (for more details, see figure 13 in the appendix). To compute the WL estimates we require knowledge of the epidemic exponential phase's growth rate (α). For each simulation, we estimated α from the logarithm of the cumulative incidence using simple linear regression and a window of four units of time of data for each time point. The figure shows that in both cases our estimator converges and becomes stable quicker than the WL estimator. This is because our methodology does not make an explicit assumption of exponential epidemic growth and is therefore able to incorporate and appropriately weight the information from the stochastic phase of the epidemic.
In order to assess the variability of our estimator, we simulated one hundred different realizations of the epidemic process and then estimated the value of R 0 for each of them.  Figure 12 shows the mean estimated value plus/minus one standard deviation (averaging across realizations) for each network. Notice that the variability for the exponential network is bigger than for the binomial network. We attribute this to the fact that the exponential degree distribution has a larger variance. In addition, the R 0 estimate for the exponential network also appears to have a negative bias. As mentioned above, we attribute this phenomena to finite size effects, which are stronger for this network in comparison to the binomial.
It is worth noting thatT r (t) is a function of λ i (τ ), λ r (τ ) and J(t) (see Equations (3), (24) and (25)). Therefore, when λ r is constant, the conditionT r (t)Z x = 1 allows one to evaluate, for a given time series, one of the three quantities λ i , λ r or Z x , if the other two quantities are assumed to be known (this statement holds even if λ r is a function of time, in which case the more complex Equation (27)    As mentioned earlierT r (t) is a function of λ i (τ ), λ r (τ ) and J(t) (see Equations (3), (24) and (25)). Therefore, when λ r is constant, the conditionT r (t)Z x = 1 allows one to evaluate, for a given time series, one of the three quantities λ i , λ r or Z x , if the other two quantities are assumed to be known (this statement holds even if λ r is a function of time, in which case the more complex Equation (27) should be used). For instance, we simulated again the epidemic on the binomial and exponential networks presented earlier, but this time with constant values for λ r and λ i . Using these values and the derived/calculated J(t), we calculated the value of Z x , which is presented in Figure (
The results show that although the methodology is sensitive to misspecifications in the functional forms of λ i and λ i , the estimated R 0 values are still relatively close to the true value. Figure 15 shows the sensitivity of the estimated reproduction number to misspecifications of the excess degree. To test this, we vary the assumed excess degree between {4 − 6} for the binomial network (true value equal to 5) and between {6.04 − 8.04} for the exponential network (true value equal to 7.04). The results show that although we assumed a misspecification in the excess degree of up to 20% for the binomial and up to 14.2%

Sensitivity Analysis
for the exponential network, the estimates had an error of at most 3% and 3.1%, respec-R R R Figure 14. The estimated basic reproduction number for the binomial (left panel; z 1 = 5, λ i = τ /(1 + τ 2 )(1 + 0.5τ 2 ) and λ r = 2τ /(1 + τ 2 )) and exponential (right panel; κ = 4) networks in terms of the number of removed individuals. The red line and green curves correspond to the real and the estimated value. The blue line shows the estimates if we incorrectly assume that λ i is constant. The pink line shows the corresponding estimates if we incorrectly assume that both λ i and λ r are constant tively. As described earlier in the description of the algorithm, we can use Equation (27) to estimate one model parameter, in this case λ i . And then use its value to evaluate the expected transmissibility and then the expected reproduction number (the first identity in Equation (29)). This shows the usefulness of Equation (27), which acts as a strong constraint that allows us to estimate the basic reproduction number with good precision regardless of misspecification in other input parameters.

Discussion
Using concepts from network theory and stochastic processes, we presented a method that provides a reliable estimate of the basic reproduction number, R 0 . Our method takes into account the stochasticity in disease spread and does not make an explicit assumption about exponential epidemic growth and therefore is able to provide estimates of R 0 at an Instantaneous rate of infection. λ i (τ )δτ gives the probability of disease transmission across an edge between infection age τ and τ + δτ , given it occurred after age τ . Removal hazard or removal function λ r (τ ) Instantaneous removal rate. λ r (τ )δτ gives the removal probability of an infectious individual between its infection age τ and τ + δτ , given it occurred after age τ . Transmissibility T (τ ) Probability of disease transmission by infection age τ . It is calculated by T (τ ) = 1 − exp −   Total number of removed individuals at time t whose predecessor is already removed. R i (t) Total number of removed individuals at time t whose predecessor is still infectious. I r (t) Total number of infectious individuals at time t whose predecessor is already removed. I i (t) Total number of infectious individuals at time t whose predecessor is still infectious. Z r x (t) Total number of excess links of removed individuals, calculated by Eq. (22). Transmissibility of removed individuals T r (t) Gives the transmissibility of removed individuals at time t and it is calculated by Eqs. (23) and (25). χ(τ ) χ(τ )δτ gives the expected number of new infections produced by an infectious individual between ages of infection τ and τ + dτ . Generation interval distributionχ (τ )χ(τ )δτ gives the conditional probability that given an infection, it occurred between ages of infection τ and τ + dτ .