Message passing and moment closure for susceptible-infected-recovered epidemics on finite networks

The message passing approach of Karrer and Newman [Phys. Rev. E 82, 016101 (2010)] is an exact and practicable representation of susceptible-infected-recovered dynamics on finite trees. Here we show that, assuming Poisson contact processes, a pair-based moment closure representation [Sharkey, J. Math. Biol. 57, 311 (2008)] can be derived from their equations. We extend the applicability of both representations and discuss their relative merits. On arbitrary time-independent networks, as was shown for the message passing formalism, the pair-based moment closure equations also provide a rigorous lower bound on the expected number of susceptibles at all times.


INTRODUCTION
There is a large, rapidly growing body of research relating to 'network epidemiology', where real-world hostto-host disease dynamics are modelled as stochastic processes that take place upon contact networks [1]. In many other areas, including transportation, logistics and social interactions, stochastic processes are also propagated across networks [2]. These random processes which act directly on network nodes, or across network links, alter the states of the individuals in the network. Finding mathematical descriptions for the time evolution of the probabilities of the states of the individuals is essential to understanding the dynamics of these systems and how to control them. Recently, two practicable and provably exact representations of stochastic epidemic models on finite trees have been proposed. These are the message passing approach of Karrer and Newman [3] and the pairbased moment closure representation of Sharkey et. al. [4][5][6]. Here we generalise and compare these two representations, and show that the pair-based equations can be derived from the message passing formalism.

SUSCEPTIBLE-INFECTED-RECOVERED DYNAMICS ON TREE NETWORKS
We define an arbitrary network to be a directed graph (digraph) D = (V, A), where V represents the set of all individuals in the population and the existence of an arc (i, j) ∈ A (i, j ∈ V ) represents the ability of i to make contacts to j. The population size is denoted by N = |V |. Following Karrer and Newman [3], while allowing heterogeneity in the ordered-pairwise interactions, we define the functions f ij (τ ) (∀i, j ∈ V : (j, i) ∈ A) such that the probability that j makes an infectious contact to i within time period t of being infected is given by t 0 f ij (τ )dτ . If i ∈ V is susceptible when it receives an infectious contact then it immediately becomes infected. The time that it takes i to become recovered (infectious contacts from i permanently cease), after it has been infected, is distributed according to an individual-specific probability density function r i (τ ). We will also assume that f ij (τ ) = h ij (τ ) ∞ τ r j (τ ′ )dτ ′ , where t 0 h ij (τ )dτ is the probability that j makes any contact to i within time period t of being infected. The above defines susceptibleinfected-recovered (SIR) dynamics on an arbitrary network. Throughout, we assume that the initial states of individuals are statistically independent.
In this section we consider SIR dynamics on the full class of digraphs where the underlying graphs are trees or forests. We refer to such digraphs as tree networks.

Message passing formalism for individuals
The fundamental quantity in the message passing formalism for tree networks is H i←j (t) (also defined for nontree networks), where (j, i) ∈ A(D), which is the probability that i has not received an infectious contact from j by time t given that i is in the cavity state (contacts from i are not permitted). Note that placing an individual in the cavity state does not affect its fate since its ability to contact others only comes into play if it is infected. Here we will allow heterogeneity in the contact and recovery processes and initial conditions, and also allow individuals to be vaccinated. Thus we can express the message passing equation as where y j and z j are the probabilities that j is recovered at t = 0 (vaccinated) and susceptible at t = 0 respectively (j is initially infected with probability 1 − y j − z j ) and Φ j i (t) is the probability that j has not received an infectious contact by time t given that i and j are both in the cavity state. For the tree networks considered in the present section, we can substitute in Eq. 1 [3]: where (j, i) ∈ A(D) and N j is the set of individuals (neighbours) from which there are arcs towards j. If ∄k : k ∈ N j , k = i, then we define the right-hand side of Eq. 2 to be equal to 1. Here, Φ j i (t) can be expressed as a product of probabilities of independent events because of the simple structure of tree networks, i.e. there is no more than one simple directed path from any individual to any other, and no cycles. This is discussed in more detail in the subsequent section on non-tree networks.
Equation 1 can now in principle be solved (at every point in time) for every i, j ∈ V : j ∈ N i via the associated system of integral equations, as discussed in [3] (the number of these equations is equal to |A(D)|). It is then straightforward to obtain the time evolution of the probability of an individual being in a particular state [3]: where A i and A i t−τ are the probabilities that i is in state A ∈ {S, I, R} (susceptible, infected or recovered) at time t and at time t − τ respectively. Note that the expected number of individuals in state A at time t is given by i∈V A i . By setting z i , y i ∈ {0, 1} ∀i ∈ V , we can consider any pure initial system state (note that we assume initially infected individuals 'become' infected at t = 0). In this case, we could eliminate y i from our equations by removing from the network those individuals that are vaccinated. However, we can also consider mixed (probabilistic) initial system states, as long as the states of individuals are initially independent. For instance, we might consider the case where every individual is independently vaccinated with probability y.
We obtain the specific form in the Karrer and Newman article by setting f ij = f, r i = r, z i = z, y i = 0 ∀i, j ∈ V : j ∈ N i . In this case, the solution represents a measure of an 'average epidemic' but we note that the initial distribution for the total number of infecteds is binomial and includes the possibility of no initial infecteds. Typically, we are more interested in the expected outcome when a single initial infected individual is seeded uniformly at random in the population (see for example [7]). We identify two methods for computing this (appendix A).

Message passing formalism for pairs
To start to link the message passing method with the pair-based models [4][5][6], we express some relevant probabilities for connected pairs in tree networks. The probability S i S j that neighbouring individuals i and j ((i, j) ∈ A or (j, i) ∈ A) are susceptible at time t is given by which follows from the fact that this pair state can only be destroyed by the infection of i, not via j, or the infection of j, not via i. The probability S i I j that i is susceptible and j is infectious at time t (in a tree network) is more difficult to formulate (here there is an arc from j to i). Firstly, we place both i and j in the cavity state, with the exception that we allow infectious contacts from j to i. Now, we know that for this pair state to occur we need i to be susceptible at t = 0 and for it not to receive an infectious contact from any of its neighbours (ignoring j for the moment) by time t; the probability of this is z i Φ i j (t). Secondly, we multiply this quantity by the probability P ij (t) of the independent event that j is initially susceptible and receives an infectious contact from a neighbour other than i (or is infected at t = 0) and then remains infectious, without making an infectious contact to i, until time t. Thus: However, 1 − P ij (t) can be computed as a sum of probabilities of mutually exclusive events, i.e. j is initially recovered or vaccinated (with probability y j ), or j is initially susceptible and does not receive an infectious contact (with probability z j Φ j i (t)), or j does receive an infectious contact (or is initially infected) but then also makes an infectious contact to i or recovers without making such a contact, by time t. Therefore, we now have: where dτ is the probability that j recovers from the infection within time period t of being infected, without contacting i during this period.

Equations for Poisson contact processes
For the remainder of this section, we consider a special case of the message passing formalism where the contact processes are Poisson, i.e.
f ij (τ ) = T ij e −Tij τ ∞ τ r j (τ ′ )dτ ′ where T ij is the rate at which j, when it is infected, makes infectious contacts to i. In this case, it can be shown (see appendix B) that a deterministic time series for A i , ∀A ∈ {S, I, R}, ∀i ∈ V , can in principle be obtained by integrating the following integro-differential equations, with H i←j (0) = 1 (∀i, j ∈ V : j ∈ N i ): and then making use of Eq. 3.
The above message passing system (Eqs. 3 and 7), in conjunction with Eqs. 4 and 6 for the states of connected pairs, implies the following pair-based system of integrodifferential equations (see appendix B): and and S i S j can be expressed in terms of Φ i j (t) (Eq. 4). We recognise Eq. 8 as a generalisation (to arbitrary recovery processes) of the pair-based moment closure equations [6] which assume the following approximation for the probability of the state of a connected triple: In the case of SIR dynamics on tree networks we know that this approximation is exact when B = S (susceptible), as in Eq. 8 [6,8]. It is clear that the number of equations will be of the order of the number of directed links in the network, i.e. |A(D)|.
These systems are exact for any tree network. More specifically, they are exact for any digraph where there is no more than one simple directed path from any individual to any other individual, and no cycles. The reason for this will be made clear in the later section on nontree networks. Indeed, for some non-tree networks the equations are still exact or may become exact for certain initial conditions, as discussed by Sharkey et al. [6] in relation to the pair-based representation for Markovian SIR dynamics. The message passing system (Eqs. 3 and 7) and the pair-based system (8) are equivalent in the sense that they produce the same time series for the probabilities of the states of individuals. Here, we shall consider both systems for exponential and fixed recovery processes.

Exponential infectious periods
For exponential recovery we have r i (τ ) = γ i e −γiτ where γ i is an individual-specific constant. Substituting this into Eq. 7, we obtain a system of ordinary differential equations (ODEs) which are straightforward to integrate numerically: The time evolution of the probability of an individual being in a particular state is then obtained via Eq. 3 with the exception that we can now conveniently obtain the probability of an individual being recovered via the differential equation:˙ After computing the time derivatives of the probabilities of an individual being in each of the possible states (using Eqs. 3 and 10), and expressing the results in terms of individuals and pairs (Eqs. 3 to 6), the pairbased moment closure system of Sharkey et al. [6] can be derived. Alternatively, it can be derived from the more general pair-based system (8) via the substitution r i (τ ) = γ i e −γiτ and making use of Eqs. 3 to 6.

Fixed infectious periods
For fixed infectious periods (or fixed time to recovery) we have r i (τ ) = δ(τ − ω i ) where δ(τ ) is the Dirac δ function and ω i is the time it takes i to recover once it has been infected. Substituting this into Eq. 7, we obtain a system of delay differential equations (DDEs) which are straightforward to integrate numerically: where θ(t) is the Heaviside step function. For the individuals we have: Similarly, a DDE version of system 8 can also be derived. Fig 1 matches the output from Eqs. 12 and 13 to stochastic simulation for a tree network of 10 individuals, illustrating exactness.

SIR DYNAMICS ON NON-TREE NETWORKS
Karrer and Newman [3] proved that their message passing formalism, when applied to non-tree networks, provides a rigorous lower bound on S i ∀i ∈ V . Here we repeat their analysis in order to confirm that this bound is still obtained in our slightly more general setting, and to show that the pair-based system (8) consequently provides the same bound on S i ∀i ∈ V .
Following Karrer and Newman, we represent SIR dynamics on an arbitrary digraph D = (V, A) by randomly weighting and removing the arcs as follows: 1) assign an infectious period τ i to every individual i ∈ V , sampling from r i . 2) weight every arc (j, i) ∈ A with a contact time ω ij , sampling from contact functions h ij . 3) for every arc (j, i) ∈ A, if its weighting ω ij > τ j then completely remove it from the digraph. 4) for every individual i ∈ V , with probability y i , completely remove every arc emanating from i. Here we investigate SIR dynamics (Poisson contact, fixed recovery) on an undirected tree network of 10 individuals. We set the infectious period to unity for all individuals and Tij = 1 ∀i, j ∈ V : j ∈ Ni. Two non-adjacent indexindividuals were selected to be initial infecteds, while each non-index individual was vaccinated with probability 1/10 and susceptible otherwise (at t = 0). The line represents the output from our representation (Eqs. 12 and 13) while the crosses indicate corresponding numerical results from 10,000 full stochastic simulations.
The resulting weighted digraph is denoted D ′ . n iB (D ′ ), where B ⊆ N i , denotes the set of individuals from which i can be reached by a simple directed path of total weighting less than t, such that a member of B is the penultimate individual, given that i is in the cavity state (all arcs emanating from i are removed).
Let i ← B, where B ⊆ N i , denote the event that i (in the cavity state) does not receive any infectious contacts from any of the members of B by time t. Let |N i | = M and let us label each of these neighbours as N (1) i , N where the ordering is arbitrary. We can now express S i as a product of conditional probabilities: The particular way in which D ′ is constructed means that, for any j ∈ N i , we have: where the expectation operator is here applied to a function of the random weighted digraph of which D ′ is a sin-gle realisation, and the product is assumed to be equal to 1 when n ij = ∅. Equation 15 follows from the fact that all members of n ij (D ′ ) must be initially susceptible if D ′ is to represent the event that i (in the cavity state) does not receive an infectious contact from j by time t. z k /(1 − y k ) is the probability that k is initially susceptible given that it is not vaccinated (we excluded the possibility of a member of n ij (D ′ ) being vaccinated in step 4 of the construction of D ′ ). Similarly, for any B ⊂ N i : j / ∈ B, we can write: Now, since n ij \ n iB ⊆ n ij , with set equality occurring for tree networks, we have: with equality occurring for tree networks. In fact, n ij (D ′ ) and n iB (D ′ ) are necessarily disjoint sets if there is no more than one simple directed path from any individual to any other individual in D.
Inequality 17 implies that the conditioning in each term of the product in Eq. 14 can only serve to increase the total probability. Therefore Inequality 17 also implies that where we have ignored P (j ← i | i in cavity) since it is necessarily equal to 1 (Φ j i (t) is the probability that j has not received an infectious contact by time t given that i and j are both in the cavity state, where (j, i) ∈ A(D)see Eq. 1). Now, taking i out of the cavity state, we only increase (or leave the same) the probability of infectious contact across any arc (replacing the arcs emanating from i cannot result in n jk losing any of its members), and so with equality occurring for tree networks. Notice that this, in conjunction with equality in 17 and 19, implies Eq. 2 for tree networks. However, we also get equality in 20 whenever there are no cycles in the digraph D. Therefore, sufficient requirements for the exactness of Eqs. 2 to 8 are that (1) there is no more than one simple directed path from any individual to any other individual in D and (2) there are no cycles in D. Equations 1, 19 and 20 imply that (21) Following Karrer and Newman [3], we define the function: and note that it corresponds to the way in which H i←j (t) can be expressed for tree networks ( Eqs. 1 and 2). Now, using the iterative procedure which they suggest (see appendix C), it can be shown that H i←j (t) ≥ F i←j (t) (∀i, j, t : j ∈ N i ). This means that For Poisson contact processes, the (approximate) dynamics can be cast as systems of differential equations in both formalisms (all occurrences of H, S, I and R, in Eqs. 1 to 8, are changed respectively to F, X, Y and Z -indicating inexactness). Since they are implied by the message passing formalism, the solution of the pairbased equations (8) on non-tree networks, i.e. arbitrary digraphs, provides a rigorous lower bound on S i and approximations for I i , R i ∀i ∈ V . Figure 2 illustrates the application of the message passing approach to SIR dynamics with Poisson contact processes and fixed recovery processes on a non-tree network (we use Eqs. 2, 12 and 13, changing all occurrences of H, S, I and R, to F, X, Y and Z, respectively).

CONCLUSIONS
For Poisson contact processes, the message passing formalism can be cast as a system of integro-differential equations, which conveniently simplify to ODEs for exponential recovery processes and DDEs for fixed recovery processes. However, we note that for certain other biologically feasible sets of functions {f ij (τ ) : i, j ∈ V, j ∈ N i }, which do not correspond to Poisson contact processes, the message passing formalism may still allow the dynamics to be obtained via systems of ODEs or DDEs. See, for example, the 'top hat' function discussed in [3]. This is a clear advantage of the message passing formalism over the moment-closure formalism, the latter seeming to require the contact processes to be Poisson. In fact, for arbitrary contact and recovery processes, the message passing formalism is theoretically solvable as a system of integral equations. Other advantages of the message passing approach are its applicability in the domain of random graph ensembles and, by considering H i←j (t) (or F i←j (t)) ∀i, j : j ∈ N i in the limit as t → ∞, its connection to percolationbased theory for final outcome statistics [3].
The pair-based formalism is a special case of the message passing approach in the sense that it seems to only apply to Poisson contact processes. In this case, the message passing system (7) is more efficient than the pairbased system (8) in terms of the number of equations. However, it is not immediately obvious how to extend the applicability of the message passing equations. For example, to generate exact equations for non-tree networks, or to susceptible-infected-susceptible dynamics. Conversely, the pair-based (moment closure) representation can allow both of these extensions in a straightforward way [8,9]. Since the physical meaning of each term in the pair-based system is clear, it is also straightforward to make this system applicable to multiple competing diseases on the same network -the number of equations then grows linearly with the number of diseases. However, we note that in the context of configuration network ensembles and Poisson contact and recovery processes, Miller [10] has shown that the dynamics for competing diseases can be solved via a low-dimensional message-type system (see also Karrer and Newman [11]).
In our endeavour to understand the relationship between these two formalisms, we have shown that the pair-based moment closure formalism is applicable to arbitrary recovery processes and, for non-tree networks, provides a lower bound on S i ∀i -for tree networks the representation is exact. On the other hand, we have shown that the message passing formalism is applicable to arbitrary finite networks, where the contact and/or recovery processes are pair-specific or individual-specific, and can incorporate any pure initial system state including vaccinated individuals -or any mixed initial system state where the states of individuals are independent.

APPENDIX A
For the case where y i = 0 ∀i ∈ V , the expected outcome when a single initial infected is seeded uniformly at random (in a tree network) can be computed via the following methods: 1) solve the system N times with each individual in turn as the single initial infected, and then average. This would be an exact but relatively timeconsuming approach. 2) increase z towards 1 such that the ratio between the probability of there being one initial infected to the probability of there being more than one becomes large. We are then left with a sum of two terms, one corresponding to zero initial infecteds (contributing nothing to the time series) and the other to a single initial infected seeded uniformly at random. Thus, dividing the resulting time series (expected number infected) by the probability of having at least one initial infected, i.e. 1 − z N , approximates the desired result. We have achieved considerable success with this second approach in our numerical computations.

APPENDIX B
Setting t ′ = t − τ and applying Leibniz's rule, the time derivative of Eq. 1 can be written: and by then setting f ij (τ ) = T ij e −Tij τ ∞ τ r j (τ ′ )dτ ′ , for Poisson contact processes, Eq. 7 is obtained. Now, substituting from Eqs. 2, 3 and 6, and setting g ij (τ ) = r j (τ )e −Tij τ , for Poisson contact processes, we get: It is thus straightforward to derive the pair-based system represented by Eq. 8 by computing the time derivatives of the right-hand-sides of Eqs. 3, 4 and 6, and then using these same equations to express the derivatives in terms of individual states and pair states. In the case of exponentially distributed infectious periods the pairbased moment closure equations of Sharkey et al. [6] emerge.

APPENDIX C
Let F i←j 0 (t) = H i←j (t) ∈ (0, 1] (∀i, j : j ∈ N i ), and define an iterative process (as in [3]): (C1) From Eq. 21 we have: and since this is true in all cases, we have: and so in general Since F i←j n (t) is bounded below by 1 − t 0 f ij (τ )dτ [3], this iterative procedure must converge from above such that F i←j n (t) → F i←j (t) ≤ H i←j (t) as n → ∞. (C5)