Heterogeneity can markedly increase final outbreak size in the SIR model of epidemics

We study the SIR model of epidemics on positively correlated heterogeneous networks with population variability, and explore the dependence of the final outbreak size on the network heterogeneity strength and basic reproduction number $R_0$ -- the ratio between the infection and recovery rates per individual. We reveal a critical value $R_0^c$, above which the maximal outbreak size is obtained at zero heterogeneity, but below which, the maximum is obtained at finite heterogeneity strength. This second-order phase transition, universal for all network distributions with finite standardized moments indicates that, network heterogeneity can greatly increase the final outbreak size. We also show that this effect can be enhanced by adding population heterogeneity, in the form of varying inter-individual susceptibility and infectiousness. Notably, our results provide key insight as to the predictability of the well-mixed SIR model for the final outbreak size, in realistic scenarios.

Introduction.The SIR (susceptible-infected-recovered) model [1][2][3] has been a topic of great interest during the past decades [4], and is one of the most conceptually basic, yet powerful models that describes the spread of an infectious disease.The model includes three population classes: susceptible (S), infected (I) and recovered (R).A contact between S and I individuals can give rise to the infection of S. Conversely, an infected individual can recover and move to the R class.Remarkably, this simple model provides an adequate description to a wide variety of infectious diseases including COVID-19 pandemic [5].
Many works dealing with the SIR model assume a wellmixed topology; i.e., each individual interacts with all others (or has the same number of contacts) [2,3,[6][7][8].While this assumption is valid in some limits, in realistic scenarios one has to account for each individual's connectivity and deal instead with a population network.In recent years, there have been several works dealing with the SIR model on heterogeneous random networks, where different individuals have varying connectivity [9][10][11][12][13][14].In most of these works a mean-field approach is taken; i.e., the stochastic nature of the interactions and discreteness of individuals are neglected.Indeed, there have been other works that accounted for the demographic stochasticity in the SIR model, and studied the final outbreak size distribution [15][16][17][18][19][20][21].But even in the absence of demographic noise, while several authors have studied epidemic spreading on heterogeneous networks [9][10][11]14], to the best of our knowledge the direct influence of the network topology on the final outbreak size has not been studied.Importantly, this may be key for predicting the outcome of such a disease, as we show that the well-mixed (fully-connected) setting does not necessarily provide an upper bound for the final outbreak size.
Here we discover a novel second-order phase transition in the maximal outbreak size as a function of the network heterogeneity.Intuitively one would think that as the network heterogeneity increases, the final outbreak size should decrease, and thus, the outbreak size is maximized at zero heterogeneity.This is indeed the case for large values of the basic reproduction number R 0 = β/γ, describing the ratio between the infection rate β and recovery rate γ per individual.However, it turns out that there exists a critical value of R 0 , which we denote by R c 0 , below which the maximal outbreak size is obtained at nonzero heterogeneity.Furthermore, as R 0 is decreased below R c 0 , the magnitude of heterogeneity which maximizes the final outbreak size is increased.Interestingly, by introducing population heterogeneity in the form of varying susceptibility and/or infectiousness across individuals [22][23][24][25][26][27][28], this effect is enhanced, and the phase transition moves to increasingly larger values of R c 0 .In contrast, we find that the value of R c 0 decreases as the degree-degree correlation between neighboring nodes increases.Finally, we show that this phase transition is universal where R c 0 is independent on the network topology, as long as the degree distribution has finite standardized moments.Importantly, our results provide key insight as to the limits of applicability of the simplified well-mixed SIR model on real-life heterogeneous networks with respect to the outbreak size.
SIR model on networks.In the SIR model the sum of susceptibles S, infected I and recovered R is conserved: S + I + R = N .Here, N represents to network size, i.e., the number of agents spreading the infection.Below, we use concentrations of susceptibles, S = S/N , infected, I = I/N , and recovered, R = R/N .Denoting R 0 = β/γ, rescaling time t → γt, and assuming a well-mixed setting, in the limit of N 1 the dynamics read: (1) Notably, Eq. (1) ignores demographic noise, whose relative magnitude scales, in general, as N −1/2 1 [29,30].Moreover, in deriving Eq. ( 1) we used a fully-connected network, where each individual interacts with all others.
We now account for network heterogeneity by considering a population network, where each node represents an individual who can be either susceptible, infected or recovered, and edges between nodes represent interactions between them.We follow the formalism developed by Miller [9] and define p(k) as the network degree distribution.Namely, p(k) is the probability for a node to have k neighbors.We furthermore assume that the network has positive degree-degree correlations [31], see below.
Let us denote θ(t) as the probability that a random edge has not transmitted an infectious contact up to a time t.This definition is equivalent to the probability that a node of degree 1 is still susceptible at time t [10].Thus, the probability of an individual node with k neighbors to remain susceptible at time t is given by θ k .As a result, the fraction of susceptibles at time t is given by Here, σ is the standard deviation of the network degree distribution, where k 0 = k kp(k), is the distribution's mean.Notably, ψ(θ, σ) is the probability generating function of p(k); its derivatives with respect to θ at θ = 0 provide the complete distribution, p(k), while the derivatives at θ = 1 provide the distribution's moments; e.g., k 0 = ∂ θ ψ| θ=1 .While ψ depends on the entire distribution p(k), we have added an explicit dependence on σ, the heterogeneity strength, since we focus on the dependence of the final outbreak size on σ.
We now derive the governing equation for θ(t, σ) in order to obtain S ∞ = ψ(t → ∞), and the final outbreak fraction, R ∞ = 1 − S ∞ , where S ∞ is the final susceptible fraction.Below we set γ = 1, such that time is measured in units of γ −1 , and rescale β → β/k 0 , such that β now denotes the infection rate of a suscepetible node per infected neighbor.Defining an auxiliary variable φ(t) as the probability that a node v is infectious but has not transmitted the disease to its neighbor u, φ denotes the fraction of all v−u edges in the network where v is infected but has not (yet) directly infected u.Thus, θ = −βφ [9].
The dynamics of φ(t) satisfies: φ = −(β + 1)φ − ḣ.Here, φ decreases when the neighbor u is infected from v at rate βφ, or when node v is recovered at a rate of γφ = φ.On the other hand, φ increases when a susceptible node v becomes infected.Here, h(t) is the probability that v remains susceptible, and thus, − ḣ(t) is the rate at which v becomes infected from any of its neighbors except u.Accounting for positive degree-degree correlations, the probability that a neighbor of a degree-k node has degree k, i.e., the two-point degree correlation function, satisfies: [31], where α measures the correlation strength [32].Therefore, which can be integrated over time, using the fact that φ(0) = 0, θ(0) 1 and h(0) = 1.As a result, where we have used the definition of R 0 = k 0 β.This is a first-order nonlinear differential equation, which strongly depends on the network topology and degree correlations.While its time-dependent solution can be found numerically, we here study its steady-state solution, θ ∞ ≡ θ(t → ∞).Indeed, putting θ = 0 in Eq. ( 3) we find: Note that, for α = 0 the results of [9] are recovered.Maximal outbreak size.Equation ( 4) can be numerically solved for various network topologies, p(k), having mean k 0 and standard deviation σ.An example for the dependence of R ∞ on σ, for various values of R 0 , can be seen in Fig. 1(a) where we have used a bimodal network, with p(k) = 1/2 (δ k,k0−σ + δ k,k0+σ ).Remarkably, as R 0 is lowered below some threshold R c 0 , the maximum of R ∞ shifts from σ = 0 to σ > 0. That is, while for R 0 > R c 0 , the final outbreak size is maximized when the network is homogeneous, for R 0 < R c 0 the maximum is obtained at finite heterogeneity.This result is counter intuitive.As σ is increased, the final outbreak size should decrease, as nodes with very high degree become more abundant.Due to their high degree, these nodes get infected (and recovered) much quicker than lower-degree nodes, which causes a more rapid decrease in the effective infection rate per individual, and correspondingly, in the final outbreak size, compared to the homogeneous case.Yet, here we show that this phenomenon is not universal, but rather depends on the underlying value of R 0 .
We have studied the dependence of the threshold, R c 0 , on the network's degree distribution.In Fig. 1(b), we plot the value of the coefficient of variation (COV), σ/k 0 , which maximizes the final outbreak size, for bimodal, symmetric beta, gamma and uniform distributions, versus R 0 , for k 0 = 20 and α = 0.The fact that all curves collapse indicates that R c 0 is universal and is independent on the particular details of the network details, see below.
To find R c 0 we realize that at the threshold, R 0 = R c 0 , the maximum of R ∞ is obtained exactly at σ = 0, namely dR ∞ /dσ| σ=0 = 0. Above R c 0 this derivative is negative, whereas below R c 0 the maximum is obtained for σ > 0, see Fig. 1(a).Differentiating R ∞ = 1 − ψ(θ ∞ , σ) with respect to σ, using Eqs.( 2) and ( 4), and demanding that the derivative dR ∞ /dσ be zero at σ = 0, we arrive at This is an exact algebraic equation, whose solution provides R c 0 .In general it can be solved numerically, whereas analytical progress can be made for k 0 1.Here we seek for the solution perturbatively by assuming θ ∞ = 1 − with = O(k −1 0 ) 1 (to be verified a-posteriori).First, we establish a connection between and R c 0 by plugging θ ∞ = 1− into (4), and putting σ = 0, i.e., using a homogeneous distribution, p(k) = δ k,k0 .Keeping leading order terms we arrive at Going back to Eq. ( 5), for 1 corrections in the exponent.Thus, the two terms ∂ θ ψ(θ, σ) and ∂ θθ ψ(θ, σ) evaluated at θ = θ ∞ and σ = 0, read: Notably, the terms involving derivatives with respect to σ in Eq. ( 5) are more involved as one has to use the definition of ψ from Eq. (2).To proceed, we write where this expression has to be evaluated at θ = θ ∞ and σ = 0. Here, we added k 0 , and subtracted k 0 by subtracting k 0 from k in the exponent.The term in the brackets is (up to a minus sign) the generating function of the central moments (around the mean) {µ n } ∞ n=0 .Taylorexpanding in powers of (k − k 0 ), we find: where µ 1 = 0 and µ 2 = σ 2 .For p(k) with finite standardized moments, μn , one can show that µ n = σ n μn .As a result, plugging this series back into Eq.( 8), all terms with powers of σ greater than 2 vanish, since we set σ = 0 after the differentiation, and one finally obtains: . Plugging this along with Eq. ( 7) into (5), and using Eq. ( 6), in the leading order of k 0 1 the critical R c 0 is found to be For uncorrelated networks, α = 0, we find R c 0 = (3/2) ln 3 1.648.Plugging R c 0 into Eq.( 6) verifies aposteriori that = O(k −1 0 ).We have checked that as k 0 is increased, the numerical value of R c 0 approaches our theoretical prediction given by Eq. ( 9), see Fig. 1(c) [33].
To verify our results we ran Gillespie simulations [34] on correlated, bimodal and gamma distributed networks, of size N = 10 4 and mean degree k 0 = 100.To achieve a given correlation α, for each degree-k node having initial k stems, a fraction α of its stems were connected to stems of other degree-k nodes, while the rest were connected randomly, as in the configuration model [35].This algorithm creates a network with correlation α for small α, while it tends to lose accuracy as α grows, due to finite size effects.In Fig. 1(d) our theoretical prediction ( 9) is shown to agree well with simulations at low α's.While we focus on α > 0 indicative of social networks [36], we checked that for α < 0, R c 0 grows as expected.What is the reason for the second-order phase transition observed in Fig. 1 2(a).We propose to approximate R ∞ as R ∞ cI max ∆t, where I max is the maximal value of I (that defines herd immunity), and ∆t is the typical wave's duration: the time interval during which I is greater than a fraction f (yet to be found) of I max , while c is a constant.For the distributions we have studied, f and c were found to satisfy f ≈ 0.27 and c ≈ 0.785 for a wide range of R 0 = O(1) and σ values.In Fig. 2(b) the approximate and exact solutions for R ∞ agree well, for a bimodal networks [37].
To explain the appearance of a phase transition at R c 0 , we denote by Ĩmax (and similarly for ∆ t) the ratio of I max at given σ and its value at σ = 0, see Fig. 2(c)-(d), such that R∞ = Ĩmax ∆ t.Thus, we have R ∞ (σ)/ R∞ (σ) is negative for any σ.Yet, as R c goes below R c 0 a non-monotone regime appears, which gives rise to a maximum in R ∞ at σ > 0.
This can be understood as follows.As the network heterogeneity strength σ is increased, there are more very high degree nodes (hubs), which get infected first due to their high degree, and infect the entire network rapidly.This rapid epidemic spread causes I to surge, but also causes the epidemic's duration ∆t to decrease.For low infection rates, R 0 < R c 0 , increasing σ initially causes the increase of R ∞ as the increase of I max cannot be balanced by the decrease of ∆t, see Fig. 2(c)-(e).Notably, as σ exceeds σ max the rate of spread of the hubs is so rapid such that low-degree nodes are hardly infected, and thus, R ∞ starts to decrease.Exactly at the onset of decrease of R ∞ , i.e. at σ = σ max , the disease spread rate is optimal such that the total number of infected nodes, R ∞ is maximized.Importantly, increasing R 0 has a similar effect to increasing σ.That is, when R 0 grows, the increase of σ is no longer needed to increase the rate of disease spread.Thus, if σ is also increased, one exceeds the optimal disease spread rate which yields a decline in R ∞ .Therefore, if at R 0 < R c 0 the maximum of R ∞ is obtained at σ = σ max > 0, as R 0 is increased, σ max shifts towards zero, as increasing R 0 is complementary to increasing σ.
Population heterogeneity.We now add variability across the population (population heterogeneity) and study its effect on the phase transition, by using the formalism of [27] and modulating the infection rate β by the mean population's susceptibility x, such that R 0 → xR 0 .While for homogeneous populations x(t) = 1, for heterogeneous populations, x decays in time, as the highly susceptible individuals get infected and recover relatively quickly thereby decreasing x.In the well-mixed case, denoting by s(x, t) the fraction of susceptibles having infection rate between x to x + dx, the total fraction of susceptibles is S(t) = ∞ 0 s(x, t)dx.Thus, s(x, t) satisfies ∂ t s = −R 0 xsI, and the mean susceptibility becomes [27] x To find x(t), a new time scale τ = R 0 R is defined, measuring the epidemic spreading.Thus, dτ /dt = R 0 I, such that ∂ τ s = −xs, which yields: s(x, τ ) = s 0 (x) exp(−τ x).We incorporate population heterogeneity by taking a gamma-distributed initial susceptibility, s 0 (x) ∼ x −1+a e −ax , with average 1 and standard deviation σ p = a −1/2 [38].With this distribution, x given by Eq. ( 10) decays in time as x = 1 + τ σ 2 p −1 [27].
To find R c 0 under both population and network heterogeneity, we numerically compute the steady-state solution of Eq. ( 11) [39], which allows finding R ∞ (R 0 ).Here, as x decreases over time, the effective disease spread rate, xR 0 , decreases, which can be compensated by more highly connected nodes.Thus, R c 0 increases as population heterogeneity increases, namely as σ p increases.This is demonstrated for a bimodal network in Fig. 3(a).
Discussion.We have discovered a previously unknown phase transition in the maximum value of the final outbreak size R max ∞ , as function of the network heterogeneity strength, σ, as R 0 crosses a threshold of R c 0 .While for ∞ is obtained at σ > 0. This counter-intuitive result stems from an intricate balance between the increase in the peak and decrease in the duration of the epidemic wave, as the network heterogeneity grows.We also showed that population heterogeneity and degree correlations between neighboring nodes strongly affect the value of R c 0 .What are the implications of this phase transition for realistic scenarios?For diseases such as the smallpox, monkeypox, diphtheria or COVID-19, R 0 > 2 is above R c 0 [40][41][42][43][44]. Here, the prediction of the well-mixed SIR model gives an upper bound for R ∞ .Yet, for R 0 < R c 0 , taking the well-mixed SIR prediction as an upper bound may be erroneous; e.g., for seasonal influenza (R 0 = 1.28 [45]), σ max /k 0 0.857 for a gamma-distributed network with k 0 = 20.This yields R ∞ 0.466, higher by ∼16% than the well-mixed prediction, R ∞ 0.403.Notably, for positively correlated networks, R c 0 decreases, whereas adding population heterogeneity decreases R c 0 .Yet, in Fig. 3

(b) the decrease in R max
∞ for all values of σ, due to population heterogeneity, supersedes the increase in R max ∞ due to network heterogeneity.Thus, while evaluating R c 0 and R max ∞ in realistic scenarios is highly nontrivial, it may provide important insight as to the outcome of the epidemics in the worst-case scenario.
Acknowledgements.AL and MA acknowledge support from the ISF grant 531/20.
(b)?The total outbreak size satisfies R ∞ = ∞ 0 I(t)dt.Several examples of epidemic waves for various COV values are shown in Fig.