Limited data on infectious disease distribution exposes ambiguity in epidemic modeling choices

Traditional disease transmission models assume that the infectious period is exponentially distributed with a recovery rate fixed in time and across individuals. This assumption provides analytical and computational advantages, however it is often unrealistic. Efforts in modeling non-exponentially distributed infectious periods are either limited to special cases or lead to unsolvable models. Also, the link between empirical data (infectious period distribution) and the modeling needs (corresponding recovery rates) lacks a clear understanding. Here we introduce a mapping of an arbitrary distribution of infectious periods into a distribution of recovery rates. We show that the same infectious period distribution at the population level can be reproduced by two modeling schemes -- host-based and population-based -- depending on the individual response to the infection, and aggregated empirical data cannot easily discriminate the correct scheme. Besides being conceptually different, the two schemes also lead to different epidemic trajectories. Although sharing the same behavior close to the disease-free equilibrium, the host-based scheme deviates from the expected epidemic when reaching the endemic equilibrium of an SIS transmission model, while the population-based scheme turns out to be equivalent to assuming a homogeneous recovery rate. We show this through analytical computations and stochastic epidemic simulations on a contact network, using both generative network models and empirical contact data. It is therefore possible to reproduce heterogeneous infectious periods in network-based transmission models, however the resulting prevalence is sensitive to the modeling choice for the interpretation of the empirically collected data on infection duration. In absence of higher resolution data, studies should acknowledge such deviations in the epidemic predictions.


I. INTRODUCTION
The infectious period has a key role in the progression of an infectious disease.It is the time interval during which an infected host can transmit the pathogen to other susceptible individuals, therefore it is closely linked to the ecological persistence of the disease and the challenges of its eradication.The infectious period depends both on disease natural history, and on the interventions possibly put in place: treatment, for instance, can be effective in reducing it.Collecting and characterizing this type of data is quite challenging, as demonstrated by statistical studies on measles [1] and long-lived infections such as HIV [2].Recent examples about COVID-19 [3] and monkeypox [4] show that a common feature of empirical infectious period distributions is their overdispersion or underdispersion, deviating from an exponential dis-tribution.This is in contrast with the traditional analytical approach adopted in the physics community, based on modeling the infectious period with a fixed recovery rate.By this approach, each infectious host recovers at a rate µ.This paradigm has a twofold advantage.First, it is easy to handle both analytically and numerically; secondly, it maps the process into a spontaneous 1-body reaction, allowing to borrow solutions from other fields of physics, like reactiondiffusion processes, and decays.However, this approach uniquely constrains the distribution of the infectious period τ to an exponential distribution having expected value ⟨τ ⟩ = µ −1 .Therefore, fitting ⟨τ ⟩ from real data leaves no extra degrees of freedom to model the dispersion of the data.
Efforts to overcome this problem already exist, e.g. through additional infectious compartments [5][6][7], or by splitting hosts into epidemiologically relevant groups [8,9].An alternative approach is to directly plug heterogeneous recovery rates into the model [10,11].All three approaches have limitations: adding compartments limits the choice of possible distributions, partitioning individuals is limited to specific epidemiological contexts, and using distributed recovery rates is not justified by a clear link with a corresponding distribution of the infectious periods.On the other hand, parameterizing models with an explicit distribution of the infectious period makes them harder to solve.
In this study, we develop a theory to include arbitrary distributions of infectious periods, so that they can be informed by real data.We treat both the infectious period τ and the recovery rate µ as stochastic variables, and determine the mapping between their probability distributions.We treat individual recovery as a spontaneous process occurring at a given rate µ, but we sample that rate from a distribution f appropriately chosen so that the resulting distribution of infectious periods τ at the population level follows a desired distribution g, possibly fitted on data.Our mapping allows to analytically derive f (µ) from g(τ ).
This mapping also provides a clear understanding of the link between empirical data (the infectious disease distribution) and the modeling needs (the definition of the corresponding recovery rate).The infectious period distribution g(τ ) is usually reconstructed from population data collected through surveillance and observational studies.The recovery rate µ, instead, is a variable defined by the modeling scheme at the individual level.This implies a degeneracy of models that may assign individual rates differently, but produce the same g(τ ).We study this by defining the host-based scheme, by which each host is given a recovery rate µ that is fixed in time and sampled from f , and the population-based scheme, by which every time a host recovers it re-samples its recovery rate from f .The latter scheme models a scenario in which the chance of recovery for a specific host changes after each re-infection, for instance, because of a difference in the immunity response.We show that both schemes recover the same distribution g computed from f through our mapping.However, we prove that, while the two schemes share their critical behavior (close to the epidemic threshold), they significantly differ in the endemic equilibrium.This has far-fetching implications, as the choice of the correct scheme may be nonunivocal, depending on the epidemic context, but data collection often does not allow to empirically discriminate between the hostbased and the population-based schemes.
In Sec.II., we build and discuss the analytical mapping from g to f .We also define the host-based and population-based schemes.We then characterize the epidemic threshold (Sec.III.) and the endemic equilibrium (Sec.IV.).In Sec.V., we apply our methodology to the spread of nosocomial infections in healthcare settings using empirical contact data.
In Sec.VI. we discuss the implications of our results for the modeling community.

II. MAPPING INFECTIOUS PERIODS INTO RECOVERY RATES
We consider the Susceptible-Infected-Susceptible (SIS) model [12,13].A susceptible individual becomes infected at rate λ upon contact with an infected host.It then recovers spontaneously to the susceptible state at rate µ.Recovery confers no immunity.The transmission rate λ is constant, while the recovery rate µ is a stochastic variable with distribution f .A varying λ with constant µ was studied in Ref. [14] for other purposes.Let τ be the stochastic variable representing the infectious period, with distribution g.In the case of a fixed µ, the distribution g would be the exponential distribution.
Just as in a standard reaction-diffusion process, recovery is Markovian once the recovery rate is fixed, i.e. τ |µ ∼ Exp(µ).Under this assumption, we can write where L is the Laplace transform operator.By integrating in τ and solving for f , we obtain where Ĝ is the tail distribution function of τ , i.e.Ĝ(τ ) = ∞ τ dx g(x).Eq. ( 2) is solvable either by explicit computation of the inverse Laplace transform -when possible -or by numerical integration [15].In the latter case, it is possible to determine beforehand if the solution exists by noticing that one can generate the moments of f by repeatedly deriving Eq. ( 1) with respect to τ , and then setting τ = 0: where m n is the n-th moment of f , i.e. m n = ∞ 0 dµµ n f (µ).As such, determining if the solution of Eq. (2) exists maps onto a Stiltjes moment problem [16] (see Appendix A).
Table I reports the expression of f (µ) for some commonly used distributions of infectious periods: exponential, gamma-distributed, power-law.
The mapping introduced above works for both the host-based and population-based schemes.The difference relies on the fact that in the former scheme each host samples its µ from f only once at the beginning, while in the latter each host re-samples its µ every time it recovers.Therefore, in terms of infectious period, in the population-based scheme each individual follows the same infectious period distribution g(τ ), while in the host-based scheme each individual is characterized by a different (exponential) infectious period distribution, producing the distribution g(τ ) when aggregated at the population level.

III. EPIDEMIC THRESHOLD
The epidemic threshold is the critical value λ c of the transmission rate that discriminates between the disease-free state (λ < λ c ), and the endemic regime (λ > λ c ).The computation of the epidemic threshold provides an important public health metric to evaluate intervention policies [9,17].
The epidemic threshold depends on both disease features (transmission, recovery), and the topology of the underlying network of contacts along which the spreading occurs.We assume a network of N nodes with adjacency matrix A. Let x i (t) be the probability that node i is infectious at time t, with i = 1, . . ., N .In the host-based scheme, node i has a fixed recovery rate µ i .Following the microscopic Markov chain formalism [18][19][20], we can write the differential equations describing the evolution of the disease as a perturbation of the disease-free state (thus neglecting O (x i x j ) and higher orders): In matrix form this reads ẋ = (−M + λA)x, where x = (x 1 (t), . . ., x N (t)) and M = diag{µ 1 , . . ., µ N } is the diagonal matrix containing all the recovery rates, which have been sampled from f (µ) at t = 0.The epidemic threshold λ c then solves the equation as proven in [20,21], where ρ indicates the spectral radius, i.e. the largest eigenvalue.
In the population-based scheme, we can observe that, close to the disease-free equilibrium, re-infection events are suppressed and can thus be dropped in the threshold computation as higher-order terms.This means that we can neglect the update mechanism of µ, and retrieve the same epidemic threshold as in the host-based scheme.
In the standard case of exponentially distributed τ (i.e., constant µ), Eq. ( 6) reduces to given that the rate of the exponential distribution coincides with the inverse of its expected value.We now solve Eq. ( 6) in the case of a non-exponential distribution g(τ ), i.e. with heterogeneous recovery rates µ i .
In practical applications, the matrix A often comes from a generative network model designed to reproduce key topological features of the contact structure of the population under study.Ref. [22] argues that generative network models are representable in terms of adjacency matrices whose rank equals the number of node features constrained.For instance, if one just fixes the expected degree of each node (the so-called configuration model [23][24][25][26]), one will get the rank-1 adjacency matrix A = KK T / (N ⟨k⟩), where K is the N -dimensional vector containing the expected degree of each node, and ⟨k⟩ is the average expected degree.
For the generic rank-r model one can write where V is an N × r matrix encoding node properties, and ∆ is a r-dimensional bilinear form encoding the geometry of the model (see Ref. [22]).The epidemic threshold of the generic network model requires plugging Eq. ( 8) in Eq. ( 6), and working out the calculations under the assumption that the node properties fixed by the model are uncorrelated with recovery rates (see Appendix B for explicit computations).Unexpectedly, this leads to the following expression of the epidemic threshold which is the same as Eq. ( 7), when expressed in terms of the average infectious period.This result shows that only the average infectious period impacts the epidemic threshold.The distribution g(τ ) may be arbitrarily complex, but its first moment is enough to discriminate between disease extinction and endemicity.A model with fixed µ is therefore sufficient to study the critical behavior of disease spreading, and this is beneficial in two aspects: i) such a model is analytically and numerically the simplest possible, ii) estimating ⟨τ ⟩ from data is easier than fitting the full distribution, especially if the available sample is small.However, Eq. ( 9) also warns us about the misuse of the recovery rate.It rigorously proves that the relevant observable is indeed the average infectious period, and not the average recovery rate.Replacing 1/ ⟨τ ⟩ with ⟨µ⟩ in Eq. ( 7) would lead to an overestimation of the epidemic threshold, because the identity (provable from Eq. ( 1)), combined with Jensen's inequality [27], implies that ⟨µ⟩ ≥ ⟨τ ⟩ −1 .Overestimating the epidemic threshold is potentially harmful, as it leads to underestimation of the risk of the disease becoming endemic.

IV. ENDEMIC PREVALENCE
Above the epidemic threshold, the SIS model converges to an endemic equilibrium characterized by a certain disease prevalence (i.e., fraction of population infected at a given time) [28].Computing this quantity completes the epidemiological characterization of the SIS epidemic.Quantifying the endemic prevalence, alongside the epidemic threshold, is relevant from a public health perspective, as it allows to anticipate the impact of the disease spreading in the long-term.
The endemic equilibrium is typically harder to derive analytically concerning the epidemic threshold: no closed-form solution exists beyond homogeneous mixing even in the case of exponentially distributed τ .Here we focus on homogeneous mixing, i.e. a sequence of Erdős-Rényi networks [29], and compute the corresponding endemic prevalence in the population-based and host-based scheme.Appendix C contains a generalization to the configuration model for the population-based scheme.
To proceed, we divide compartments I and S into sub-classes according to the recovery rate.Classes I j (t) and S j (t) represent, respectively, the number of infected and susceptible individuals at time t with recovery rate equal to µ j .Recovery rate thus gets formally discretized in an arbitrarily large number of values.The spreading equations are where the average connectivity of the homogeneous network is absorbed in the transmission rate λ.At time t = 0, we have a fraction f (µ j )dµ j of the total number of individuals N that have rate µ j .In the population-based scheme, as time passes, large values of µ are replaced sooner, as they generate, on average, shorter infectious periods.Likewise, smaller values of µ persist longer.This implies that the fraction of hosts with rate µ j at a given time deviates from the initial fraction f (µ j ) dµ j as time passes.Notwithstanding, if we look exclusively at compartment S j (t), we can state that S j (t) = f (µ j )dµ j S(t), because a new µ j is assigned after recovery, and the inter-event time between recovery and re-infection does not depend on the recovery rate µ j of the susceptible individual.By inserting this in Eq. ( 11), we can then set the rhs equal to zero and sum over j to get rid of the discretization and obtain the endemic prevalence at equilibrium So we find that, as for the epidemic threshold, the endemic equilibrium in the population-based scheme depends only on the average infectious period, regardless of its distribution, and coincides with the equilibrium obtained assuming a homogeneous recovery rate It is a different matter for the host-based scheme.We find a new formula to analytically derive endemic prevalence x eq for homogeneous mixing, as a solution of the following equation: Details of the derivation can be found in Appendix D. In general Eq. ( 13) is solvable numerically.In some cases it leads to an analytic expression for x eq .One of such cases is obviously when τ is exponentially distributed, giving the same result as in Eq. ( 12).If τ is gamma-distributed (see Tab. I), Eq. ( 13) becomes relatively simple: (1 + x eq λ ⟨τ ⟩ /κ) −κ = 1 − x eq , where we used the parameterization ⟨τ ⟩ = κθ.Then, further assuming κ = 1/2, gives x eq = 1 . This example explicitly shows how different the endemic equilibrium can be from the exponentially-distributed case (Eq.( 12)).
In Sec.III.we showed that the average infectious period ⟨τ ⟩ alone completely determines the epidemic threshold.Equation ( 13) instead shows that higher moments of τ have an impact on the endemic equilibrium, in the case of the host-based scheme.In Fig. 1 we keep ⟨τ ⟩ fixed, and explore different levels of dispersion around it in case of gamma-distributed, and power-law-distributed infectious period.Comparison with exponentially-distributed τ (at same ⟨τ ⟩) shows that in the host-based scheme i) higher variance gives consistently lower endemic prevalence; ii) at fixed variance, gamma-distributed τ leads to lower prevalence than power-law-distributed τ .

V. APPLICATION TO MRSA DIFFUSION IN HEALTHCARE SETTINGS
Methicillin-resistant Staphylococcus aureus (MRSA) is responsible for severe bacterial infections.Its acquired resistance to antimicrobial treatment makes it one of the most dreaded infections occurring in healthcare settings [30].Patients can get colonized through direct contact with asymptomatic carriers (including healthcare workers).Outbreaks of MRSA infection increase mortality, hospitalization times, and are difficult and costly to contain [30].
MRSA carriage duration is non-exponentially distributed.We analyzed data on time to observed clearance [31] to reconstruct the distribution of carriage time g(τ ).We fitted the data through maximum likelihood using exponential and Gamma distributions.The Akaike Information Criterion (AIC) selected the Gamma distribution as the best-fitting model (see Fig. 2).This supports therefore the application of the approach presented here, as accurate model predictions are needed to improve surveillance and response against MRSA diffusion.
We use the estimated g(τ ) to simulate the spread of MRSA carriage on a real network of contacts among 590 individuals (both patients and healthcare workers) collected through wearable sensors in a long-term and rehabilitation facility in Northern France [32,33].
We simulated both the host-based, and the populationbased schemes.We also considered the scenario of constant recovery rate (exponentially-distributed τ ) as benchmark, with the same ⟨τ ⟩.The results of the simulations are displayed in Fig. 2

(b) and (c).
Epidemic trajectories in Figure 2(b) show that heterogeneity in the recovery rates has an effect in slowing down disease spread with respect to the homogeneous scheme.When approaching the endemic equilibrium, Figure 2(c) confirms that the homogeneous modeling scheme with a fixed rate and the population-based scheme share the same endemic prevalence, even in the case of a realistic temporal contact network.Instead, in the host-based scheme the predicted endemic prevalence turns out to be smaller.As the transmission rate λ decreases, the values for the three schemes converge, supporting the idea that they all hold the same epidemic threshold.

VI. DISCUSSION
A fixed recovery rate across individuals and throughout the epidemic outbreak fails in reproducing realistic distributions of infectious periods.Yet, it is the key modeling ingredient of traditional approaches because it treats recovery as a Markovian process, i.e. a spontaneous decay.This assumption allows analytical calculations, and largely simplifies numerical implementations.Based on a novel mapping of the infectious period distributions into a recovery rate distribution, we introduced a modeling framework that can capture and integrate arbitrary infectious period distributions that can be informed by empirical data, while remaining analytically and numerically treatable.
Collected empirical data usually provide information on the infectious period at the population level, not at the individual level, through a distribution of values.This lack of resolution at the host level opens the path to two possible modeling schemes, assuming either an immutable recovery rate per host (host-based scheme), or a rate that can be updated at each infection episode because it is altered by factors affecting the immune response of the individual (populationbased scheme).When data on re-infection of the same individual are too scarce to estimate a distribution for each host (as it is often the case, with a few exceptions [34]), the two schemes become empirically equivalent, but they conceal significant differences in terms of predictions.
We analytically prove that the epidemic threshold, in the case of any generative network model for hosts interactions, does not depend on the scheme chosen.We also prove that such threshold depends only on the average infectious period, making the standard assumption of constant recovery rate sufficient to correctly model the behavior around the disease-free state.Differences emerge when the sys-tem moves away from the critical point.The endemic disease prevalence -a predictor of how easy it is to eradicate a disease in a population -is quantitatively different in each scheme, as shown in theoretical examples of disease spread and in a case study applied to the spread of the multiresistant bacteria MRSA in healthcare settings.
The difference extends to the out-of-equilibrium dynamics, as disease spreading in the population-based scheme is faster with respect to the host-based scheme.
Our findings show that modeling heterogeneity within the population-based or the host-based paradigms, although apparently equivalent, has a considerable impact on the endemic disease prevalence, and caution should be taken when addressing specific epidemic contexts where data do not allow to distinguish between the schemes.This problem might disappear naturally in contexts in which individual recovery rates have a concrete meaning or can be measured directly, e.g. in information diffusion processes [35], but in the general case of biological diseases, modelers should be aware that an arbitrary choice of the scheme may represent a potential source of bias to be considered.

APPENDIX A: Existence of the function f
The conditions on g(τ ) under which there exists a probability density function f (µ) solving Eq.( 1) can be described in terms of a moment problem [16].Let us evaluate the n-th derivative of g(τ ) in τ = 0 from Eq.( 1) where m n indicates the n-th moment of f (µ).Thus a solution of Eq.( 1) exists if and only if the sequence is a Stieljies moment sequence, i.e. it represents the sequence of moments of a measure on the interval [0, +∞).We set m 0 = 1 as we are looking for a probability measure.A sufficient and necessary condition for a real sequence {m n } n∈N to be a Stieljies moment sequence states that the Hankel matrices need to be positive semi-definite for any n ∈ N [16].This property is useful to assess if the framework in terms of recovery rates is applicable or not given a certain g(τ ).
In Section II.we presented the analytical form of the distribution of recovery rates f (µ) when g(τ ) is a Gamma(κ, θ) with κ < 1.We can show that such distribution does not exist when κ = 2. Indeed, for g(τ ) = θ −2 τ e −τ /θ , the sequence turns out to be m n = −(n − 1)θ −n and the Hankel matrix 2 is not positive semi-definite.

APPENDIX B: Epidemic threshold of the generic network model
The generic rank-r network model is defined by its metadegrees (r properties for each node), encoded in the n × r matrix V , and the signature of the nonsingular metric ∆.See Ref. [22] for further details.The adjacency matrix of such model is the rank-r matrix A = V ∆V T .Let M be a diagonal matrix containing the recovery rate µ j of node j in its j-th diagonal entry.Then, the linearized evolution of the disease close to the disease-free state follows the vector equation ẋ = (−M + λV ∆V T )x, where x j (t) is the probability that node j is infectious at time t.Finding the epidemic threshold means finding the lowest value of λ for which −M + λV ∆V T as a zero eigenvalue.We can compute the characteristic polynomial of this matrix using Ref. [36]: , where ρ is the spectral radius.
The last step consists in proving the following: where v α is the α-th column of the matrix V , i.e. the α-th metadegree, and B is the matrix V T V ∆, obtained from A after rank reduction, following the notation in [22].As the matrix A and B share the spectral properties, we can conclude that APPENDIX C: Endemic equilibrium in the population-based scheme In this section we derive the endemic prevalence in the population-based scheme and we show that it coincides with the one obtained when using a homogeneous recovery rate.We consider a contact network with a fixed degree distribution P (k), the so-called configuration network model [37].Within the homogeneous modeling scheme, a constant recovery rate µ is assigned to each individual in the population, so that the infectious period τ is exponentially distributed with mean ⟨τ ⟩ = µ −1 .Let x k (t) be the probability that a node with degree k is infected at time t.According to the degree-based mean-field approximation [38,39], the equation describing the evolution of the SIS model is where P (k ′ |k) is the probability that a node with degree k is connected with a node of degree k ′ .In order to derive the number of infected individuals at equilibrium, one must solve for I k the following equation where I k is the number of infected at equilibrium that have degree k.Now we assume the population-based scheme and consider a general distribution g(τ ) for the infectious period, and the corresponding distribution f (µ) for the recovery rates derived from Eq. ( 2).Each individual is assigned a recovery rate µ j from f (µ), updated by resampling after each recovery.We can still reason in terms of classes of degree k, but we need to further divide each class according to the recovery rate.Let us define x jk (t) as the probability that a node with recovery rate µ j and degree k is infected at time t.Then we can write where P (µ h , k ′ |µ j , k) is the probability that a node with recovery rate µ j and degree k has a link to a node with recovery rate µ h and degree k ′ .We assume that recovery rate and degree are uncorrelated, i.e.
In terms of number of infected I jk with recovery rate µ j and degree k, we obtain The quantity S jk (t) is the number of susceptible nodes at time t with recovery rate µ j and degree k.This is equal to S k (t) multiplied by the probability of being assigned the recovery rate µ j at the time of the last recovery, i.e. S jk (t) = P (µ j )S k (t).
Solving for the equilibrium, and summing over the index j, we obtain which is equal to Eq. ( 19) provided that the mean infectious period ⟨τ ⟩ = µ −1 is the same as the one assumed in the homogeneous modeling scheme.In conclusion, within the population-based scheme, the endemic prevalence depends only on the average infectious period, and not on its distribution.
APPENDIX D: Endemic equilibrium in the host-based scheme In this section we derive the endemic prevalence in the host-based scheme, in the case of homogeneous mixing, as stated in the main text in Eq. ( 13).Let x j (t) be the probability that an individual with recovery rate µ j is infected at time t.Then, where the average connectivity of the homogeneous network is absorbed in the transmission rate λ.By setting dx j (t)/dt = 0, we find where x eq is the endemic prevalence.In continuous terms, by integrating on both sides over all possible values of µ, we get the following equation for the endemic prevalence x eq : We rewrite it as follows: Expanding the inside of the integral as a geometric series, we get: The integral in lhs is µ −(n+1) , so we can use Eq. ( 10), re-index the sum, and get to Now the lhs is by definition the moment-generating function of g, evaluated in −λx eq .Given that such argument is never positive, this is also -by definition of moment-generating function -the Laplace transform of g, evaluated in λx eq .We thus get to the final form of the equation of the endemic equilibrium: TABLE I: Mapping from the infectious period distribution g(τ ) to the recovery rate distribution f (µ), and corresponding average infectious period ⟨τ ⟩, for some commonly used infectious period distributions.δ is the Dirac delta distribution.H is the Heaviside step function.For gamma-distributed g, the displayed f is valid for κ < 1.For power-law distributed g, h > 2 ensures the existence of both f , and ⟨τ ⟩.

FIG. 1 :
FIG.1: Endemic prevalence in the host-based scheme as a function of the relative variance in g(τ ).The gray line represents the endemic equilibrium for exponentially-distributed τ (constant µ).Red and blue dots represent the numerical solution of Eq. (13), respectively for gamma-distributed and power-law-distributed τ (see also Tab.I).Box-plots represent values of the endemic equilibrium obtained from 100 SIS stochastic simulations on an Erdős-Rényi network of N = 10 3 nodes and average degree 5. Disease parameters were fixed at: ⟨τ ⟩ = 300 time steps, λ = 5λ c .

FIG. 2 :
FIG.2: MRSA infection spreading over a contact network within a hospital.(a) Gamma vs. exponential estimated cumulative distribution for MRSA colonization duration.Black represent empirical data on time to clearance from[31].Values of parameters estimated at maximum likelihood: r = 0.007, κ = 0.62, θ = 238.04(see Tab.I for the definition).AIC = 316 and 319 for gamma and exponential distribution respectively.(b) Colonization prevalence over time, for the host-based (red) and the population-based scheme (green) with a gamma-distributed infectious period, in comparison with the homogeneous scheme (black) assuming an exponential distribution and a constant recovery rate.Here λ was fixed at 0.2.(c) Colonization prevalence at equilibrium, for different values of transmissibility λ. Results are averaged over 100 stochastic runs.Uncertainty bars and shaded areas indicate 95% probability ranges.Parameters of the gamma-distributed τ are the fitted estimates.The rate of the exponential distribution is fixed to have the same ⟨τ ⟩ as the gamma-distributed one.Contact data are aggregated at a weekly time scale, getting a time-evolving, weighted network where weights encode contact duration over a specific week.