Influential spreaders for recurrent epidemics on networks

The identification of which nodes are optimal seeds for spreading processes on a network is a non-trivial problem that has attracted much interest recently. While activity has mostly focused on non-recurrent type of dynamics, here we consider the problem for the Susceptible-Infected-Susceptible (SIS) spreading model, where an outbreak seeded in one node can originate an infinite activity avalanche. We apply the theoretical framework for avalanches on networks proposed by Larremore et al. [Phys. Rev. E 85, 066131 (2012)], to obtain detailed quantitative predictions for the spreading influence of individual nodes (in terms of avalanche duration and avalanche size) both above and below the epidemic threshold. When the approach is complemented with an annealed network approximation, we obtain fully analytical expressions for the observables of interest close to the transition, highlighting the role of degree centrality. Comparison of these results with numerical simulations performed on synthetic networks with power-law degree distribution reveals in general a good agreement, leaving some questions open for further investigation.

The identification of which nodes are optimal seeds for spreading processes on a network is a non-trivial problem that has attracted much interest recently. While activity has mostly focused on non-recurrent type of dynamics, here we consider the problem for the Susceptible-Infected-Susceptible (SIS) spreading model, where an outbreak seeded in one node can originate an infinite activity avalanche. We apply the theoretical framework for avalanches on networks proposed by Larremore et al. [Phys. Rev. E 85, 066131 (2012)], to obtain detailed quantitative predictions for the spreading influence of individual nodes (in terms of avalanche duration and avalanche size) both above and below the epidemic threshold. When the approach is complemented with an annealed network approximation, we obtain fully analytical expressions for the observables of interest close to the transition, highlighting the role of degree centrality. Comparison of these results with numerical simulations performed on synthetic networks with power-law degree distribution reveals in general a good agreement, leaving some questions open for further investigation.

I. INTRODUCTION
Some epidemic outbreaks die out very rapidly, affecting just a few individuals, while others, caused by the same pathogen, survive for much longer times, infecting considerably larger fractions of the overall population [1]. A meme posted by a celebrity on a social network is rapidly spread to millions of users, while a similar meme posted by a less connected individual rarely becomes viral. What decides the fate of a single spreading process? One of the most important factors is where, in the connectivity pattern mediating the spreading process, the initial seed is located. Intuitively, an infection seeded in a highly connected node has a higher chance of reaching a large number of individuals, but other, less local network features may in principle play a role: For example, a node sitting at the boundary between two communities can be highly effective in giving rise to large outbreaks even if having only a few direct connections. The non-trivial question of how to identify "influential spreaders" in a network has been the focus of an impressive amount of work [2,3] in the last decade. In a nutshell, the problem is the prediction of how large is, on average, a spreading event started by a single node in the network. 1 A slightly more limited goal is to rank network nodes depending on their spreading capability and to correlate such a ranking with purely topological node features, such as degree or other types of centrality.
The seminal paper by Kitsak et al. [4] posed in a clear manner the question, highlighting its non-triviality and reporting a strong correlation between the average size of an outbreak seeded in node n and a particular type of 1 In the epidemic framework a spreading event started in a single node is called an outbreak. We will often use in the following the more general term avalanche. centrality measure, the K-core index [5]. A huge number of other centralities have been proposed and tested as ways to identify influential spreaders [3,6,7], with different levels of success, depending in many cases on the type of network substrate considered for the empirical validation. Among the most firmly grounded results, Ref. [8] exploited a mapping to bond percolation to relate spreading influence at the epidemic transition to nonbacktracking centrality [9,10], while Ref. [11] extended the approach presenting a message-passing framework allowing to compute the average outbreak size for any value of the control parameter.
All these results were found for the Susceptible-Infected-Removed (SIR) dynamics [12], the simplest model for epidemics allowing permanent immunity, or simple variants of it. For this type of models the definition of spreading influence is straightforward: Throughout the whole phasespace, outbreaks have a finite duration and the average size of an outbreak seeded in a node is the measure of how influential the node is. The other large class of epidemic models, whose simplest example is the Susceptible-Infected-Susceptible (SIS) model, describes recurrent dynamics, where each node can be reinfected over and over again [12]. While some results are available [13,14], this class has received nevertheless much less attention. Our purpose in this paper is to fill this gap, performing a systematic study of the problem of identifying influential spreaders for a parallel, discrete-time, version of SIS dynamics. As for non-recurrent (SIR-like) epidemic models, the phase diagram for SIS dynamics consists in a healthy phase for small values of the control parameter (see below) separated from an epidemic phase by a critical value of the control parameter, the epidemic threshold. While the nature of the healthy phase is the same for both classes of models, the possibility of multiple reinfections changes the nature of the epidemic phase. Above the threshold there is a nonvanishing chance that a stationary "endemic" state is reached, where the epidemic survives forever, with a steady finite density of infected nodes. This possibility implies that for SIS dynamics there are different possible definitions of what an influential spreader is. Below the critical point, since all avalanches eventually end, one can take as a measure of spreading influence of node n the total duration or the total size of an avalanche seeded in n, just as in the SIR case. In the supercritical region instead, some avalanches are infinite, i.e. they give rise to the (neverending, at least in the thermodynamic limit) endemic state. In this case being an influential spreader may have two distinct meanings. Good spreaders can be nodes giving rise to an infinite avalanche with large probability or nodes giving rise to large or long finite avalanches. These definitions of influential spreaders do not necessarily coincide. For this reason we consider three main observables of interest [15]: • The probability b n that an avalanche seeded in node n is finite (i.e., it does not lead to the endemic state above the transition). Below the transition, b n = 1; above it, b n < 1.
• The average duration T n of (finite) avalanches seeded in node n. In the following we will consider a discrete-time dynamics, hence the duration of each avalanche will be an integer.
• The average size S n of (finite) avalanches seeded in node n. This quantity is equal to the total number of activation events. Since each node can be activated more than once, S can in principle be larger than the total number of nodes N .
Another observable of interest is the coverage, the average number of distinct nodes reached by an avalanche [16]. We performed some comparisons between this quantity and S n , finding minimal differences. For this reason we will not consider the coverage in the following.

A. Quenched Mean-Field theory
The identification of influential spreaders for SIS dynamics is greatly helped by the application of the general quenched mean-field (QMF) [17][18][19][20] theoretical approach for avalanches on networks proposed by Larremore et al. in Ref. [15]. In that paper a comprehensive theory is presented for the statistical properties of avalanches when the dynamical process is determined by a matrix A nm , whose individual entry is the probability that the avalanche propagates from node n to node m. The approach predicts a transition between a subcritical regime with all avalanches finite (b n = 1) and a supercritical regime where a fraction 1 − b n of all avalanches lasts forever. The transition depends on the largest eigenvalue λ of the A matrix. If λ < λ QM F c = 1 the system is subcritical, if λ > λ QM F c the system is supercritical and infinite avalanches start to appear.
We can apply this theory to the SIS dynamics by considering a parallel discrete-time version of it, analogous to the Independent Cascade Model (ICM) version of SIR dynamics [21]. In this dynamics, at each time step each infected node attempts to transmit the infection to all its direct neighbors and each independent attempt succeeds with probability p. After attempting to infect all its neighbors an infected node recovers and switches back to the susceptible state with a recovery time fixed to 1. The probability that an avalanche propagates from node n to m is therefore A nm = pa nm , where a nm is the unweighted adjacency matrix of the network [22]. Clearly, the largest eigenvalue of A is λ = pΛ M , where Λ M is the largest eigenvalue of the adjacency matrix a. As a consequence the epidemic threshold is p QM F c = 1/Λ M .

Probability of observing a finite avalanche
Within QMF theory the probability b n of having a finite avalanche is obtained by solving iteratively the equation As shown in Ref. [15] this equation admits a solution b n = 1 for any λ and for λ > 1, i.e. in the supercritical regime, another stable solution b n < 1.

Average avalanche duration
Let us consider c n (t) as the probability that an avalanche has a duration smaller than or equal to t, with c n (0) = 0, by definition. The probability c n (t) fulfills the recurrence equation [15] In the limit t → ∞, c n (t) → b n , where b n = 1 in the subcritical phase, and b n < 1 in the supercritical phase; hence Eq. (1). Consider c n (t) = b n − f n (t), with f n (0) = b n . For sufficiently large t, f n (t) is small, so that we can linearize Eq.
(2) to obtain [15] f where for the SIS process that, in the subcritical phase, where b n = 1, reduces to D nm = A nm = pa nm . The avalanche duration distribution p n (t) is given by Therefore, we can write the average avalanche duration T n as If f n (t) decreases sufficiently fast, we can assume that the linearization of f n (t) is valid for all times t, to obtain Combining the previous equations, we have We can obtain an exact expression at the QMF level by inverting this linear relationship, obtaining where T = (T 1 , . . . , T N ) T , B = (b 1 , . . . , b N ) T , and I is the identity matrix.

Average avalanche size
From Ref. [15], the generating function of the avalanche size distribution p n (s), fulfills the equation where in the SIS process. Taking logarithms of both sides of Eq. (11), deriving with respect to z and setting z = 0 one obtains From the definition of the generating function we have φ n (0) = 1, while φ n (0) is minus the average avalanche size, Hence Eq. (13) reads Inverting this linear relationship in matrix format, we can write the QMF solution where S = (S 1 , S 2 , . . . , S N ) T and 1 N is a column vector of ones, of size N . Notice that, contrary to the case of T n , no long-time assumption has been made to derive Eq. (15). It is interesting to observe that, due to the similarity relation H nm = bm bn D nm [15], the average size and duration at the QMF level are not independent, but related by the identity Below the threshold, when b n = 1, average time and duration are equal. This fact can be interpreted in terms of avalanches that infect just a new node in average every time step. On the contrary, in the supercritical regime, b n < 1 in general, and thus S n > T n in average. We remark also that, so far, we have not made any assumption on the value of p, hence these predictions are valid throughout the whole phase-diagram.

B. Annealed network approximation
Eqs. (1), (8) and (15) constitute sets of N closed equations, whose straightforward solutions provide predictions about the ability of each node in a generic network to spread influence It is possible to obtain fully analytical results (and thus obtain more physical insight) by implementing an additional approximation. The annealed network approximation [23] assumes that the network is fully rewired at each time step while keeping individual degrees fixed. In this way dynamical correlations among different nodes are destroyed and the state of each node can depend only on its degree k. Mathematically, the annealed network approximation consists in replacing, for uncorrelated networks, the adjacency matrix by the probabilistic form [24] a nm k n k m k N .
This allows us to write for any function F (k) depending on the degree.

Probability of observing a finite avalanche
Taking logarithms on both sides of Eq. (1), we have where we have assumed p(1 − b m ) to be small, i.e., either p is small, or the system is close to criticality (b n 1). Assuming that b n is a function of the degree k n , and using Eq. (19), we can then write leading to where we have defined the quantity which depends implicitly on p, through the average bk . We can make this dependence explicit by considering the self-consistent equation The value θ = 0, corresponding to b n = 1, is always a solution of Eq. (23). The condition for the existence of an additional non-zero solution is that the function y = Ψ(θ) crosses the line y = θ at a point θ > 0, something that happens when dΨ(θ) dθ θ=0 > 1. This condition translates into p > p ann c , where the threshold corresponds to the inverse of the largest eigenvalue Λ M of the adjacency matrix in the annealed network approximation.
For finite networks of size N , all moments of the degree distribution are finite. In this case, we can expand Ψ(θ) to second order in pθ, corresponding again to the vicinity of the transition point, Solving the previous equation for non-zero θ we obtain where we have defined λ 0 ≡ p/p ann c . This leads to the finite size expressions and

Average avalanche duration
Close to and below the critical point, i.e., assuming again p(1 − b m ) 1 we can write D nm pb n a nm . Plugging this into Eq. (8) we have that can be written as where X n = T n /b n . Given Eq. (30), assuming that X n is a function of k n , and in view of Eq. (19), we can make the ansatz X n = 1+Ãk n , withÃ a constant to be determined. Inserting the ansatz into Eq. (30), we havẽ Using the annealed network approximation, Eq. (19), we haveÃ leading toÃ so that In the subcritical phase, b n = 1 and we obtain the simple explicit form In the supercritical phase, T n depends on moments of k and b in a more complicated way. Assuming we are close to the critical point, in a finite network, we can use Eq. (27) to obtain Substituting into Eq. (34), we finally have that can be written in the fully explicit form

Average avalanche size
From Eq. (17), we have S n = T n /b n , which translates, from the annealed network solution for T n , into Hence in the subcritical phase we have while in the supercritical phase The relation S n = T n /b n implies that in the supercritical case avalanche size and duration are related in a nontrivial way. The explicit expressions point out that the average size (minus one) is proportional to the degree of the seed close to the epidemic threshold. The same is true for the average duration, but nonlinear effects are stronger in this case.  21) and (27), respectively. In (c), λ0 = p k 2 / k corresponds to the annealed network threshold.

III. NUMERICAL SIMULATIONS
In this Section we perform a systematic comparison between the results of the theoretical approaches presented in the previous Section and numerical results obtained by simulating the SIS dynamics on synthetic uncorrelated power-law distributed networks, built according to the Uncorrelated Configuration Model [25]. The comparison is aimed at validating the accuracy of the theoretical predictions and at identifying the origin of possible inaccuracies in the various approximations performed. The networks considered have a degree distribution P (k) ∼ k −γ between a minimum value k min = 3 and a maximum k max = N 1/2 (for γ < 3) and k max = N 1/(γ−1) for γ > 3.
We consider as infinite all avalanches whose duration is longer than t inf = 300 steps. This threshold, chosen for computational convenience, is larger than the average duration of finite avalanches except for a narrow interval around the critical point. To improve the readability of figures, plotted values of T n , S n and b n are binned over suitably chosen intervals.
We consider two values of γ corresponding to the two different scenarios occurring for the SIS transition [12]. For γ < 5/2 the transition is triggered by a subextensive network subset (the max K-core), composed of densely mutually interconnected nodes [26]. In this case QMF the-  21) and (27), respectively. In (c), λ0 = p k 2 / k corresponds to the annealed network threshold.
ory and the annealed network approximation are known to work well, providing an estimate of the position of the epidemic threshold which is asymptotically exact in the large size limit [27,28]. For γ > 5/2 the epidemic transition is triggered by a different and highly nontrivial mechanism. Nodes with high connectivity (large hubs) considered together with their immediate neighbors, form star graphs. When p becomes larger than p QM F c = 1/Λ M ≈ 1/ √ k max the largest of these star graphs becomes able to sustain epidemic activity for a long time, proportional to exp(k max ). For larger values of p other star graphs can independently sustain activity. QMF theory describes only the behavior of isolated stars. However, the epidemic in each of them survives for a time proportional to exp(k i ), i.e., there is no endemic activity in the system (surviving for a time exp(N )) if stars are isolated [29,30]. As a consequence the QMF estimate p QM F c = 1/Λ M is only a (not strict) lower-bound for the actual epidemic threshold. The consideration of long-range interactions among stars, neglected by QMF theory, allows to understand how endemic activity can be established and to calculate the actual critical threshold p c > p QM F c [16,31].

Probability of observing a finite avalanche
In Fig. 1(a) we first compare the probability 1 − b of observing a finite avalanche calculated within the QMF approach [Eq. (1)] with numerical results for γ = 2.25. It turns out clearly that QMF is very accurate in reproducing the behavior of 1 − b . The bending of the numerical curves for small λ − 1 is due to finite size effects reflecting the existence of a size-dependent effective threshold. As N increases the agreement between theory and numerics extends to lower values of λ − 1, confirming that the QMF threshold estimate is asymptotically exact [27,28]. The other two panels display instead the accuracy of the fully analytical predictions obtained when also the annealed network approximation is implemented. Also in this case we find a very good agreement between theory and simulations. Note that no parameter has been fitted.
In Fig. 2 the same analysis is performed for γ = 3.50. The scenario is completely different. Also in this case the agreement between theory and simulations for what concerns 1 − b holds above a size-dependent threshold, but now the threshold moves toward larger values of λ − 1 as N is increased. This shows that the actual SIS threshold is larger than the QMF prediction, λ c > λ QM F c = 1, and the discrepancy grows with the system size [16,31]. The disagreement is even more evident in panels (b) and (c): for λ = 1.1 all b n = 1, despite the QMF prediction that b n < 1 as λ > λ QM F c = 1.

Average avalanche duration and size
In Fig. 3 we report a comparison between QMF predictions (along the horizontal axis) and numerical results (vertical axis) for γ = 2.25. Results are presented for both the average duration T n (left column) and the average avalanche size S n (right column). In the subcritical case an excellent agreement is found for S n for any value of λ, while the same is true only for durations up to T n ≈ 1, corresponding to small values of λ. For larger durations a disagreement starts to appear for T n . This difference can be rationalized based on the different treatments of Section II A. The prediction for S n is valid for any size, without any small size assumptions. This failure of QMF theory in predicting the avalanche duration can be instead attributed to the breakdown of the assumption made to obtain Eq. (7). Close to the transition point, critical slowing down leads to a slow decay of f n (t) toward zero and it is not appropriate to assume linearization to hold starting from t = 1.
For the average avalanche size in the supercritical case, Fig. 3(d), large discrepancies between theory and simulations occur for λ slightly larger than 1, while an excellent agreement is found again for larger λ. This finding can be ascribed to the intrinsic impossibility of discriminating precisely finite or infinite avalanches in networks of finite size. At the critical point the distribution of avalanche times features a power-law decay plus incipient infinite avalanches, characterized by long but still finite durations. Some of them are classified as finite according to the criterion T < t inf and therefore contribute to T n increasing its value by a large amount. Reducing t inf alleviates this problem, but at the price of risking to spuriously consider as infinite avalanches those belonging to the power-law decay. This difficulty is naturally mitigated by considering larger systems, or going to larger λ, as the separation between the two components of the duration distribution becomes more pronounced. For the average avalanche duration in the supercritical case, Fig. 3(c) this intrinsic problem with finite size (which tends to increase T n ) competes with the breakdown of the linearization for f n (which tends to decrease T n ). As a consequence, numerical results are not far from QMF predictions. Figure 4 reports the same analysis for γ = 3.50; the same qualitative scenario emerges concerning the agreement between QMF theory and simulations. This is somewhat surprising, given the known failure of QMF in describing accurately the transition in this case. However, this failure becomes most evident for very large networks, while only moderate values of N are considered here [28].
Further insight into what happens for long subcritical avalanches close to the threshold is provided in Fig. 5, where we plot the average avalanche size versus the corresponding duration, obtained from numerical simulations. As we can see here, for small values of λ, the QMF prediction S n ∼ T n is well fulfilled. Closer to the critical point, on the other hand, we observe a departure from the linear behavior, which reflects the mismatch between the poor agreement with the QMF theory for large λ of the avalanche duration T n and the very good agreement of the avalanche size S n . Numerically, we observe instead a good fit for λ close to 1 to the form S n ∼ T 2 n , see  and (d), which is a consequence of the critical scalings of p n (s) and p n (t) [15]. in the subcritical regime, and rather poor away from the critical point in the supercritical regime. This is due to the fact that the annealed network threshold p ann c largely overestimates the actual threshold. A consequence of this overestimate is that for λ sligthly larger than 1, Eqs. (39) and (42) predict negative values of T n and S n .
Finally, in Fig. 8 we report the dependence of the average avalanche duration T n and size S n in the supercritical regime, as a function of the probability of observing a finite avalanche b n . For both γ = 2.25 and γ = 3.50, we find continuously decreasing functions of b n , with large but not huge fluctuations. The monotonicity allows us to assume, as a rule of thumb, that if a node has a larger probability 1 − b n to originate an infinite avalanche than a second node, the first will also give rise on average to longer and larger finite avalanches. In other words, ranking nodes according to b n , T n or S n produces the same ordering, apart from fluctuations.

IV. CONCLUSIONS
In this paper we present a thorough investigation, both analytical and numerical, of the problem of identifying influential spreaders for SIS dynamics on generic networks. We first apply to this problem the theoretical QMF framework introduced by Larremore et al. in Ref. [15]. In this way we are able to write down closed equations, whose numerical solution allows us to calculate all the observables of interest in the whole phase-diagram. Within this approach a further step, consisting in the implementation of the annealed network approximation, allows us to derive explicit analytical predictions valid below and above (but close to) the critical point. These predictions point out the importance of degree centrality in determining the spreading influence. Comparison of these results with analytical simulations, performed on synthetic networks with power-law distributed degrees P (k) ∼ k −γ , confirms the substantial accuracy of the theoretical approaches below the threshold, in particular for what concerns the avalanche size. Above the expected QMF threshold discrepancies arise, which tend to disappear inside the active region of the phase-diagram.
Let us discuss in detail strengths and limitations of the theoretical approaches considered here. The QMF theory of Larremore et al. [15] is based on three approximations: The neglect of dynamical correlations among neighbors' states; the locally tree-like assumption that allows to write down Eqs. (1) and (2); the linearization entailed in Eq. (3) for the derivation of T n . The annealed network approximation adds another assumption on top of them. For strongly heterogeneous unclustered networks, such as those generated by the Uncorrelated Configuration Model with γ < 5/2, the first two assumptions are essentially correct. Studies on SIS dynamics have shown that even the annealed network approximation works very well in this case. The linearization holds instead only if one is sufficiently far from the epidemic threshold. Close to it, the regime with exponential decay into the stationary state is preceded by a long transient during which the decay is power-law (critical slowing down), f n is not small and linearization does not hold; the dominating contribution to the overall duration of avalanches comes from this long preasymptotic regime. For this reason predictions for T n are not consistent with simulations, for values of p close to the threshold. When γ > 5/2, the agreement between our theoretical results and simulations is expected to be poorer. The reason is that already the first approximation, the neglect of dynamical correlations, fails to capture the complex physical mechanisms underlying the epidemic transition and thus introduces large errors. See Ref. [12] for more on this. A fundamental consequence is that already the QMF estimate of the epidemic threshold is qualitatively and quantitatively incorrect. This leads to the expectation that predictions for influential spreaders in the supercritical case are largely off target. The relatively small errors observed here are believed to be an effect of the relatively small network sizes considered. The additional annealed network approximation, which does not work for γ > 5/2, makes the theoretical approaches perform even worse. Recent results [31] have pointed out the great complexity of the physical mechanisms under-lying the epidemic transition in random networks with γ > 5/2. This understanding constitutes the necessary groundwork for reaching a satisfactory predictive ability for influential spreaders also in this case, a goal that remains to be reached.
Finally, it is worth remarking that our results have been derived for a particular, discrete-time, version of SIS dynamics, rather different from the usual continuoustime version. The present version, with parallel dynamics and recovery time fixed to 1, allows a freshly infected node to immediately reinfect the node that infected it in the preceding time step. This kind of process is instead strongly hindered in continuous-time SIS, where the neighbor j that infected node i remains infected for a while so that the number of connections of i available for further spreading is effectively k i − 1. This kind of dynamical correlations has strong effects for influential spreaders in SIR dynamics. In that case degree centrality is a good approximation of the spreading influence of individual nodes, but close to the transition a better approximation is provided by the non-backtracking centrality [8,11,32], which may differ markedly from the former. The quest for accurate predictors of spreading influence for continuoustime SIS dynamics close to the transition is an interesting open avenue for further research.