A unified framework for percolation on networks

In recent years, many variants of percolation have been used to study network structure and the behavior of processes spreading on networks. These include bond percolation, site percolation, $k$-core percolation, bootstrap percolation, the generalized epidemic process, and the Watts Threshold Model (WTM). We show that, except for bond percolation, each of these processes arises as a special case of the WTM, and that bond percolation arises from a small modification. In fact "heterogeneous $k$-core percolation", a corresponding "heterogeneous bootstrap percolation" model, and the generalized epidemic process are equivalent to one another and to the WTM. We further show that a natural generalization of the WTM in which individuals "transmit" or "send a message" to their neighbors with some probability less than $1$ can be reformulated in terms of the WTM, and so this apparent generalization is in fact not more general. Finally, we show that in bond percolation, finding the set of nodes in the component containing a given node is equivalent to finding the set of nodes activated in the WTM if that node is initially activated and the node thresholds are chosen from the appropriate distribution. A consequence of these results is that mathematical techniques developed for the WTM apply to these other models as well, and techniques that were developed for some particular case may in fact apply much more generally.


INTRODUCTION
The percolation properties of a static network G help us learn about processes spreading on G. Percolation comes in many flavors, and the information we gain depends on which variety we choose. Most frequently, we study bond or site percolation, but researchers have also found that k-core percolation, bootstrap percolation, the generalized epidemic process, and the Watts threshold model (WTM) provide valuable insights [2,6,8,10,11,14]. These processes are closely related, and indeed similar mathematical approaches have been used to study several of these processes [9,13]. Our main result is that all of these (and some related) processes can be derived in the context of the WTM.
We briefly review these processes. In bond percolation, each edge is independently retained with probability $p$ and deleted with probability $1-p$. Similarly, in site percolation each node is independently retained with probability $p$ and deleted otherwise. Typically our interest is in identifying the nodes in the connected components of the residual network, and whether a "giant" component exists (that is, a component whose size is proportional to the network size in the infinite network limit).
In k-core percolation all nodes with degree less than some specified k are removed. This removal may reduce some nodes' degrees below k. If so, these are removed. This "pruning" process repeats until reaching a state in which all nodes have degree at least k. This remaining network is called the "k-core" of the network. In a variant, "heterogeneous k-core" percolation [4], each node is assigned its own threshold value. We note that many authors have used the term "bootstrap percolation" to denote k-core percolation, and indeed this appears to be the original term [2,8,10], but we reserve "bootstrap percolation" for a closely related dual process.
In bootstrap percolation (introduced in [2] where it is called "diffusion percolation") a collection of nodes is initially "activated". Then any inactive node with at least m active neighbors becomes active. The process repeats until all remaining inactive nodes have fewer than m active neighbors. We introduce a natural generalization analogous to heterogeneous k-core percolation in which each node is assigned its own threshold. This "heterogeneous bootstrap percolation" does not appear to have been studied previously.
In the "generalized epidemic process" (GEP) [6,11], we think of an infection spreading through the network. If a node has a single infected neighbor, its probability of becoming infected is $p_1$. If it escapes infection, but a second neighbor becomes infected, then its probability of becoming infected is $p_2$. This repeats, and the probability of successful transmission on the $m$-th neighbor's infection is $p_m$. If $p_m = p$ for all $m$, then this is the classical Reed-Frost model [1] for a susceptible-infected-recovered disease [3]. If $p_m$ decreases as $m$ increases, this could model decreasing susceptibility due to an improved immune response as exposures accumulate, or it could simply represent pre-existing heterogeneities in susceptibility that are revealed as the number of exposures increases. An increasing $p_m$ would model some synergistic or cumulative effect of exposures as seen in "complex contagions" [7]. For comparison with other models, we allow $p_m$ to depend on $d_u$, the degree of node $u$.
In the WTM, each node $u$ is assigned an individual threshold $r_u$, which we assume is assigned to $u$ independently at random, with a probability that may depend on its degree $d_u$. The probability that node $u$ has a given threshold $r$ is $P(r_u = r|d_u) = q(r|d_u)$. A node begins either active or inactive. If an inactive node $u$ has at least $r_u$ active neighbors then it becomes active. We assume that the initially active nodes may be chosen independently at random (which can be modelled by having $q(r|d) > 0$ for some $r \le 0$), or they may be chosen by some other rule, in which case we treat the set of initially active nodes as an input to the algorithm. Often a common threshold $r^*$ is chosen so that $P(r_u = r^*) = 1$, or a common fraction $\rho^*$ is chosen so that $P(r_u = \rho^* d_u|d_u) = 1$.
Typically we are interested in the final set of active nodes, but sometimes we may be interested in the temporal dynamics as these nodes become active [5,13]. If we are interested in the temporal dynamics, then we must assign additional rules for how long it takes for a node to become active. Although the timing will depend on the details of the additional rules, the final set of active nodes is uniquely determined once the network, thresholds, and initially active nodes are chosen. We focus just on the final state.
We will show that by appropriately choosing the distribution of $r$ and the initial set of active nodes, we can recover other versions of percolation from the WTM, including site percolation, k-core percolation, bootstrap percolation and the GEP. Going a step further, we show that the nodes deleted in heterogeneous k-core percolation, the nodes activated in heterogeneous bootstrap percolation, and the set of "infected nodes" in the GEP are in fact all equivalent to the set of active nodes emerging from the WTM. That is, given one model and the corresponding distribution of thresholds, we can define the distribution of thresholds of the other models to yield the same sets of nodes with the same probabilities. A natural generalization of the WTM has each node "transmitting" or "passing a message" with some fixed probability $T$. We show that by modifying the threshold distribution, the original WTM (with $T = 1$) can recover the same outcomes as for any other value $T_0 < 1$, and thus allowing for $T < 1$ does not enlarge the set of possible outcomes.
Finally, we investigate the relation with bond percolation. If our interest in bond percolation is to identify the connected component containing a given node $u$, then we can find this component using the WTM with $u$ as the initially active node and an appropriate threshold distribution. To find all connected components, we can start the WTM with one initially active node, run it to completion, and then choose a remaining inactive node and rerun the WTM (retaining the progress already made toward each inactive node's threshold), iterating until no inactive nodes remain. The sets of nodes activated in successive passes correspond exactly to the components found in bond percolation.

ANALYSIS
We begin by explicitly describing an algorithm which implements the WTM in Figure 1. Each node is assigned a weight $w$ uniformly between 0 and 1. We use a model-dependent function dist_func to convert the weight into a threshold $r$, possibly depending on the degree of the node. Typically we choose the function to return the largest value $r_u$ such that $\sum_{r < r_u} q(r|d_u) \le w_u$, so that $r_u$ takes the value $r$ with probability $q(r|d_u)$. If there are specified initially active nodes, they are given a threshold of zero. Alternately we can allow the randomly assigned threshold to permit values $r_u \le 0$, in which case these nodes are initially active. The iterative process then begins: for each active node, we reduce the threshold of any inactive neighbor by 1. If a node's threshold reaches 0 it activates.

FIG. 1: Algorithm implementing the WTM.
Input: network G, function dist_func generating numbers from a distribution, and set of initially active nodes I0.
Output: set ActivatedNodes of activated nodes.
function WTM_Assign_Weights(G)
    for u in G.nodes do
        assign weight[u] uniformly from (0, 1)
    return weight
Once the random thresholds and initially active nodes are set, the final outcome of the WTM is deterministic. To show that the other percolation processes give the same behavior, we will show how to structure these processes to start from the same random weights $w_u$ and deterministically yield a final state that is identical to the state found by the WTM for some threshold distribution.
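The procedure described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the network is assumed to be a plain adjacency dictionary, the names `assign_weights`, `threshold_from_weight`, and `wtm` are hypothetical, and the weight-to-threshold rule is an inverse-transform step assumed to match the role of dist_func.

```python
import random
from collections import deque


def assign_weights(graph, rng):
    """Assign every node an independent Uniform(0, 1) weight."""
    return {u: rng.random() for u in graph}


def threshold_from_weight(w, pmf):
    """Assumed inverse-transform rule: return the largest r in `pmf` whose
    preceding cumulative probability is <= w. `pmf` maps r -> q(r|d)."""
    cumulative, chosen = 0.0, None
    for r in sorted(pmf):
        if cumulative <= w:
            chosen = r
        cumulative += pmf[r]
    return chosen


def wtm(graph, thresholds, initially_active=()):
    """Run the cascade to completion and return the set of active nodes."""
    remaining = dict(thresholds)
    for u in initially_active:
        remaining[u] = 0  # specified initially active nodes get threshold zero
    active = {u for u in graph if remaining[u] <= 0}
    queue = deque(active)
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in active:
                remaining[v] -= 1  # one more active neighbor seen by v
                if remaining[v] <= 0:
                    active.add(v)
                    queue.append(v)
    return active


if __name__ == "__main__":
    rng = random.Random(0)
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    weights = assign_weights(path, rng)
    # Hypothetical distribution: threshold 1 or 2 with equal probability.
    thresholds = {u: threshold_from_weight(weights[u], {1: 0.5, 2: 0.5})
                  for u in path}
    print(sorted(wtm(path, thresholds, initially_active=[0])))
```

Because a node's remaining threshold only ever decreases, the returned set does not depend on the order in which active nodes are processed; the queue here could be replaced by a stack.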

Site Percolation
In site percolation, each node is retained with probability $p$ or deleted with probability $1-p$. To simulate site percolation, we can generate a random number $w_u \in (0, 1)$ independently and uniformly at random for each node $u$. If $w_u < p$ (which occurs with probability $p$) we keep $u$, otherwise we delete it. It is straightforward to see that this is identical to the algorithm in Figure 1 if the threshold is set to be $r_u = 0$ whenever $w_u < p$ and $r_u = d_u + 1$ otherwise. In this case, with probability $p$ the node has threshold 0 and so is initially active, while with probability $1-p$ it has threshold $d_u + 1$, and so can never become active as it will have at most $d_u$ active neighbors. Thus, nodes are retained in site percolation iff they are active in the WTM. This is demonstrated in Figure 2.
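As a small check of this correspondence, the sketch below compares the direct site-percolation rule with a WTM run using the thresholds just described, starting from the same weights. The function names and the adjacency-dictionary representation are assumptions made for illustration.

```python
import random


def wtm(graph, thresholds, initially_active=()):
    """Minimal WTM cascade: a node activates once its remaining threshold hits 0."""
    remaining = dict(thresholds)
    remaining.update({u: 0 for u in initially_active})
    active = {u for u in graph if remaining[u] <= 0}
    stack = list(active)
    while stack:
        u = stack.pop()
        for v in graph[u]:
            if v not in active:
                remaining[v] -= 1
                if remaining[v] <= 0:
                    active.add(v)
                    stack.append(v)
    return active


def site_percolation(weights, p):
    """Direct rule: keep node u iff its weight is below p."""
    return {u for u, w in weights.items() if w < p}


def site_thresholds(graph, weights, p):
    """WTM thresholds from the text: 0 if w_u < p, else d_u + 1 (unreachable)."""
    return {u: 0 if weights[u] < p else len(graph[u]) + 1 for u in graph}


if __name__ == "__main__":
    rng = random.Random(1)
    ring = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
    weights = {u: rng.random() for u in ring}
    kept = site_percolation(weights, 0.5)
    active = wtm(ring, site_thresholds(ring, weights, 0.5))
    print(kept == active)
```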

k-core Percolation
We now consider k-core and heterogeneous k-core percolation. The classical k-core percolation is deterministic: each node with fewer than $k$ neighbors is deleted. This iterates until all remaining nodes have at least $k$ neighbors among the remaining nodes. To reproduce this with the WTM, we set $r_u = d_u - k + 1$ regardless of $w_u$.
All nodes with $d_u < k$ activate immediately in the WTM. In k-core percolation these same nodes are immediately deleted. For a given node $u$ not in this set, let $n_u$ denote the number of its neighbors that have been activated/deleted. In the WTM, any remaining node with $d_u - k < n_u$ then activates. In k-core percolation, any node with $d_u - n_u < k$ is deleted. These two conditions are the same, so again the nodes are the same. Iterating as shown in Figure 3, the set of activated nodes in the WTM is the set of deleted nodes in k-core percolation.
We can repeat this for heterogeneous k-core percolation. We assign weights $w_u$ to each node and map each weight to a heterogeneous k-core threshold $k_u$. We can map the same weight to a WTM threshold such that if the node is assigned a given $k_u$, it is assigned $r_u = d_u - k_u + 1$ for the WTM. Then the WTM and heterogeneous k-core percolation are equivalent: a node is deleted in heterogeneous k-core percolation iff it is activated in the WTM.
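The mapping $r_u = d_u - k_u + 1$ can be checked on a small example. The sketch below prunes a network directly and compares the surviving nodes with the complement of the WTM's active set. All names are illustrative assumptions, and per-node values `k_of[u]` are used so that the classical case is simply a constant dictionary.

```python
def wtm(graph, thresholds):
    """Minimal WTM cascade; nodes with threshold <= 0 start active."""
    remaining = dict(thresholds)
    active = {u for u in graph if remaining[u] <= 0}
    stack = list(active)
    while stack:
        u = stack.pop()
        for v in graph[u]:
            if v not in active:
                remaining[v] -= 1
                if remaining[v] <= 0:
                    active.add(v)
                    stack.append(v)
    return active


def k_core(graph, k_of):
    """Direct pruning: delete nodes with fewer than k_of[u] surviving neighbors."""
    alive = set(graph)
    changed = True
    while changed:
        changed = False
        for u in list(alive):
            if sum(v in alive for v in graph[u]) < k_of[u]:
                alive.remove(u)
                changed = True
    return alive


def kcore_wtm_thresholds(graph, k_of):
    """Threshold mapping from the text: r_u = d_u - k_u + 1."""
    return {u: len(graph[u]) - k_of[u] + 1 for u in graph}


if __name__ == "__main__":
    # A triangle (0, 1, 2) with a two-node tail (3, 4) hanging off node 2.
    g = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
    k_of = {u: 2 for u in g}
    deleted = wtm(g, kcore_wtm_thresholds(g, k_of))
    print(sorted(set(g) - deleted), sorted(k_core(g, k_of)))
```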

Bootstrap Percolation
In bootstrap percolation, some initial nodes are activated, and nodes become active once they have at least $k$ active neighbors ($k$ is the same for all nodes). This is similar to k-core percolation, but k-core percolation is subtractive while bootstrap percolation is additive [4].

FIG. 2: (top) Each node is assigned a weight. (middle) Site percolation: if the weight is less than $p$, the node is kept, otherwise it is deleted. (bottom) WTM: if the weight is less than $p$ the node is given a threshold of 0. Otherwise it is given $d + 1$. Those with threshold 0 are shown in color, and activate immediately. Those with threshold larger than their degree are uncolored and never activate.

We consider bootstrap percolation with a set $I_0$ of initially active nodes, and compare it to the WTM with $r_u = k$ for all nodes except the nodes in $I_0$, which are initially active. Following a similar argument to the WTM/k-core percolation equivalence, we see that with this definition, the WTM adds nodes to the system exactly when bootstrap percolation does.
If we consider heterogeneous bootstrap percolation, then a similar argument also shows that it is equivalent to the WTM. Because of the correspondence between the WTM and heterogeneous k-core percolation, this means that heterogeneous bootstrap percolation is equivalent to heterogeneous k-core percolation, with the deleted nodes in heterogeneous k-core percolation matching the activated nodes in heterogeneous bootstrap percolation.
At first glance, this contrasts with the observations of [4], which showed that the k-core and the activated nodes in bootstrap percolation are not the same and can have different internal structure. In fact, the distinction between the two turns out to be that the nodes defined to be active for the bootstrap version are the nodes deleted in the k-core version. They are complementary processes. Any scaling behavior observed in heterogeneous k-core percolation can be observed in heterogeneous bootstrap percolation. This was previously known [2]. Figure 4 demonstrates the equivalence between heterogeneous bootstrap and heterogeneous k-core percolation.
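The complementarity can be illustrated without going through the WTM at all. In the sketch below, heterogeneous bootstrap percolation is run with thresholds $m_u = d_u - k_u + 1$ (an assumed name for the bootstrap threshold, chosen to mirror the k-core mapping), and its active set is compared with the nodes pruned from the heterogeneous k-core. The function names and the synchronous update rule are illustrative choices.

```python
def bootstrap(graph, m_of, seeds=()):
    """Heterogeneous bootstrap percolation in synchronous rounds.
    Nodes with m_of[u] <= 0 are active from the start, as are any seeds."""
    active = set(seeds) | {u for u in graph if m_of[u] <= 0}
    while True:
        newly = {u for u in graph
                 if u not in active
                 and sum(v in active for v in graph[u]) >= m_of[u]}
        if not newly:
            return active
        active |= newly


def k_core(graph, k_of):
    """Heterogeneous k-core by direct pruning."""
    alive = set(graph)
    changed = True
    while changed:
        changed = False
        for u in list(alive):
            if sum(v in alive for v in graph[u]) < k_of[u]:
                alive.remove(u)
                changed = True
    return alive


def bootstrap_thresholds_from_k(graph, k_of):
    """Complementary thresholds: m_u = d_u - k_u + 1."""
    return {u: len(graph[u]) - k_of[u] + 1 for u in graph}


if __name__ == "__main__":
    g = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
    k_of = {0: 2, 1: 2, 2: 2, 3: 2, 4: 2}
    activated = bootstrap(g, bootstrap_thresholds_from_k(g, k_of))
    print(sorted(activated), sorted(set(g) - k_core(g, k_of)))
```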

Generalized Epidemic Process
We now consider the generalized epidemic process (GEP) [6,11], in which the $m$-th "infected" neighbor infects node $u$ (given that the previous $m-1$ did not) with probability $p_m(d_u)$. So $p_m(d_u)\prod_{j=1}^{m-1}[1-p_j(d_u)]$ is the probability that the first $m-1$ infected neighbors do not infect $u$ but the $m$-th does. As before we assign a random number $w_u \in (0, 1)$ and map this to $m_u$, the number of infected neighbors required to infect $u$.
For the WTM we use the same mapping from $w_u$ to $r_u$, so $r_u = m_u$. The node $u$ activates exactly when its $r_u$-th neighbor activates, which is the same step at which $u$ is infected in the GEP. Thus any GEP can be expressed as a WTM. Showing the converse is straightforward, and so the GEP and WTM are equivalent. If we do not allow $p_m$ to depend on $d_u$ (as in the original version), then the GEP is a special case of the WTM.
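To make the mapping from $w_u$ to $m_u$ concrete, the sketch below converts a list of conditional infection probabilities into a threshold distribution and then into a threshold for a given weight. It is an illustration under stated assumptions: the list `p` holds $p_1, \dots, p_d$ for one node, the value `len(p) + 1` is used here to mean "never infected", and both function names are hypothetical.

```python
def gep_threshold_pmf(p):
    """Distribution of m_u given conditional probabilities p[0], p[1], ...
    pmf[m] = p_m * prod_{j<m} (1 - p_j); the leftover mass goes to len(p) + 1,
    used here as an unreachable threshold."""
    pmf, escape = {}, 1.0
    for m, p_m in enumerate(p, start=1):
        pmf[m] = escape * p_m
        escape *= 1.0 - p_m  # probability of escaping the first m exposures
    pmf[len(p) + 1] = escape
    return pmf


def gep_threshold(w, p):
    """Map a Uniform(0, 1) weight to m_u: the first m whose cumulative
    infection probability exceeds w."""
    cumulative, escape = 0.0, 1.0
    for m, p_m in enumerate(p, start=1):
        cumulative += escape * p_m
        escape *= 1.0 - p_m
        if w < cumulative:
            return m
    return len(p) + 1


if __name__ == "__main__":
    # Constant p_m = p gives the Reed-Frost case, with a geometric threshold.
    print(gep_threshold_pmf([0.2, 0.2, 0.2]))
    print(gep_threshold(0.25, [0.2, 0.2, 0.2]))
```

With constant $p_m = p$ the resulting distribution is $p(1-p)^{m-1}$, the same geometric form that appears for bond percolation.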

Bond Percolation
We finally consider bond percolation. Typically in bond percolation, we can consider the edges in any order, choosing to keep each edge with probability $p$ or delete it with probability $1-p$ independently of the others. We then identify the connected components of the network. We will focus our attention just on identifying which nodes form connected components after bond percolation; we are not interested in which edges exist in the components. In Figure 5 we compare a bond percolation approach to finding the component containing a particular node with a WTM approach for finding the same component. We first perform bond percolation. We then select an initial node (highlighted in the figure), and follow edges out from that node in the percolated network to find its component. Nodes are labeled with $r$, where $r$ is the number of edges of the original network that were encountered (but deleted) prior to an undeleted edge.
We can think of this as being indistinguishable from selecting an initial node and following edges out from that node in some order, where each time an edge is considered, it is deleted with probability $1-p$ or followed with probability $p$. The probability that the first $r$ edges to a node are deleted but the next is not is $p(1-p)^r$.
We compare this with the WTM with a threshold of $\tau_u = r_u + 1$. The activated nodes are identical to the component found using bond percolation. In general, assigning nodes a threshold of $\tau$, where $\tau \ge 1$ is taken with probability $p(1-p)^{\tau-1}$, will yield a set of active nodes from an initially active node which comes from the same distribution as the component of that node following bond percolation. The breadth-first search in the figure is the implementation of the WTM shown in Figure 1, but we highlight that an alternate implementation with a depth-first search would yield the same outcomes.
In fact, we can generalize this approach to find all the components. The steps in our process are to begin with a network, and assign thresholds using a geometric distribution: the probability of a threshold $\tau$ is $p(1-p)^{\tau-1}$. We then select a node and successively add nodes to its component once their threshold number of neighbors have been visited. This process is likely to terminate without exploring all nodes. If this happens, we iteratively select a new node and add nodes to its component whenever their threshold number of neighbors have been visited (either in this stage or while building a previous component). The resulting components match the components observed in bond percolation. Nodes are activated exactly when they are added to a component in the bond percolation, and identifying in which iteration they are activated tells us which component they are part of. We again highlight that the order in which the active nodes in a given iteration are processed is not significant. As long as the initial nodes of each pass are chosen in the same order, the precise details of the search algorithm do not determine which nodes belong to which component.

FIG. 6: Activated clusters found using depth-first (left) and breadth-first (right) searching using the WTM with a threshold of $\tau$ occurring with probability $p(1-p)^{\tau-1}$ (independently of $d$) and $p = 0.51$ for a $9 \times 9$ lattice. The number at each node is its threshold. The circled nodes are the initial nodes chosen for each cluster. The bottom left node is chosen first, and its cluster traced out. The next cluster is initialized by the bottom-most of the left-most remaining nodes. Thick colored edges formed the final interaction that caused activation. Non-existent edges failed to cause activation (but moved the node closer to its threshold). Dashed black edges were not tested because both nodes were already active when the edge was considered. The clusters remain the same for both search orders (but edges change).
To arrive at bond percolation, the thresholds for the WTM process are assigned from a geometric distribution. It would be interesting to study whether a different distribution could be interpreted in the context of a generalized bond percolation.
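The multi-pass procedure for finding all components can be sketched as follows. This is a minimal illustration under stated assumptions: the network is an adjacency dictionary, thresholds are drawn by repeated coin flips (which gives the geometric distribution $p(1-p)^{\tau-1}$ for $0 < p \le 1$), seeds are taken in dictionary order unless `order` is given, and the names `geometric_threshold` and `wtm_components` are hypothetical.

```python
import random


def geometric_threshold(rng, p):
    """Draw tau >= 1 with probability p * (1 - p)**(tau - 1)."""
    if not 0 < p <= 1:
        raise ValueError("p must satisfy 0 < p <= 1")
    tau = 1
    while rng.random() >= p:
        tau += 1
    return tau


def wtm_components(graph, thresholds, order=None):
    """Repeated WTM passes. Progress toward each threshold is kept between
    passes, so a neighbor visited while building an earlier component still
    counts toward activation later."""
    remaining = dict(thresholds)
    component_of = {}
    components = []
    for seed in (order or list(graph)):
        if seed in component_of:
            continue
        label = len(components)
        component_of[seed] = label  # the seed is activated by hand
        members, stack = [seed], [seed]
        while stack:  # depth-first here; breadth-first gives the same sets
            u = stack.pop()
            for v in graph[u]:
                if v not in component_of:
                    remaining[v] -= 1
                    if remaining[v] <= 0:
                        component_of[v] = label
                        members.append(v)
                        stack.append(v)
        components.append(set(members))
    return components


if __name__ == "__main__":
    rng = random.Random(2)
    ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
    thresholds = {u: geometric_threshold(rng, 0.51) for u in ring}
    print(wtm_components(ring, thresholds))
```

Two limiting cases give a quick sanity check: with $p = 1$ every threshold is 1 and the passes return the connected components of the original network, while very large thresholds leave every node in its own component.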

DISCUSSION
Many percolation processes have been studied in networks. We have shown that site percolation, bootstrap percolation, k-core percolation, and the GEP are all special cases of the WTM. In fact, the GEP we consider is equivalent to the WTM, and if we allow a node-specific threshold then both bootstrap and k-core percolation are also equivalent. Which one should be considered the "base" model is a matter of personal choice.
Bond percolation is closely related to the WTM, but to arrive at an equivalent model, the WTM assigns thresholds from a geometric distribution, activates a node, follows the WTM process to completion, and then activates another node. The successive sets of activated nodes occur with the same probability as would be found in bond percolation.
We have further shown that generalizing the WTM to allow for a homogeneous transmission probability T from active nodes to neighboring inactive nodes results in a model which can be thought of as a special case of the WTM. Thus the potential space of models is not increased by this modification.
This commonality helps to explain why similar behaviors are observed and similar mathematical methods apply to these different processes.