Criticality in spreading processes without time-scale separation and the critical brain hypothesis

Spreading processes on networks are ubiquitous in both human-made and natural systems. Understanding their behavior is of broad interest; from the control of epidemics to understanding brain dynamics. While in some cases there exists a clear separation of time scales between the propagation of a single spreading cascade and the initiation of the next -- such that spreading can be modelled as directed percolation or a branching process -- there are also processes for which this is not the case, such as zoonotic diseases or spiking cascades in neural networks. For a large class of relevant network topologies, we show here that in such a scenario the nature of the overall spreading fundamentally changes. This change manifests itself in a transition between different universality classes of critical spreading, which determines the onset and the properties of an avalanche turning epidemic or neural activity turning epileptic, for example. We present analytical results in the mean-field limit giving the critical line along which scale-free spreading behaviour can be observed. The two limits of this critical line correspond to the universality classes of directed and undirected percolation, respectively. Outside these two limits, this duality manifests itself in the appearance of critical exponents from the universality classes of both directed and undirected percolation. We find that the transition between these exponents is governed by a competition between merging and propagation of activity, and identify an appropriate scaling relationship for the transition point. Finally, we show that commonly used measures, such as the branching ratio and dynamic susceptibility, fail to establish criticality in the absence of time-scale separation calling for a reanalysis of criticality in the brain.

At this critical point, these systems exhibit shared scale-free statistics exhibiting characteristic power-laws. Neuronal spreading cascades or avalanches with similar critical exponents to that of directed percolation have been observed experimentally in neural systems that differ in size by many orders of magnitude, from in vitro slices of a few hundred neurons, to whole-brain in vivo calcium imaging and functional magnetic resonant imaging (fMRI) [8,9,11,16,17]. These neuronal avalanches are at the core of the critical brain hypothesis, and link together self-organizing principles in brain dynamics and connectivity with optimal information processing [13,15]. Equivalent power-law distributions have been found to emerge naturally in the course of training artificial neural networks, suggesting they are a generic property of neural networks [18].
Although the mapping of neuronal avalanches to branching processes has proven quite successful, unavoidable discrepancies have appeared in recent years. There's an ongoing debate on whether these systems are really critical, quasi-critical, sub-critical, or a different definition altogether [19,20]. This debate stems at least in part from challenges identifying a suitable order parameter and determining whether observed avalanches are truly power-laws following the predicted mean-field values. Although an avalanche size distribution p(S) ∼ S −τ with τ ≈ 1.5 (consistent with mean-field) has been widely reported in the literature [11,16,21], avalanches with exponents ranging from 1.2 to 2.5 have also appeared [10,14,[22][23][24]. Whether the critical exponents even belong to directed percolation has also been challenged, with some proposing a oscillation-synchronization transition instead of a percolation transition [23][24][25][26]. These observations are complicated by the experimental limitations of sub-sampling and coarse-graining [27]. Further, experimental constraints make it challenging to distinguish sub-critical avalanches from those truncated by finite-size effects.
Some of these issues however, might be due to the often overlooked fact that real systems rarely show proper time-scale separation; a classical branching process only allows for nodes to be excited when induced to do so by an antecedent node, and branching processes typically presuppose one "root" node. In other words, a branching process description assumes avalanches propagate and terminate on timescales much faster than the initiation of new avalanches. Neural systems are not so simple however -neurons can spontaneously activate due to "minis" or due to external sensory input. For smaller and intermediate systems, where multiple independent cascades are rare, the assumption of time-scale separation is not particularly limiting. If the rate of spontaneous activity is not too high and all activity is aggregated into a single avalanche, then τ changes from the expected 1.5 to 1.25 [28]. However, in large neural systems, there are no global quiet periods with which to delimit neuronal avalanches as in smaller systems. Some attempts at defining avalanches by activity exceeding a threshold exist, but it is not clear whether these thresholds identify a genuine critical point [25,26]. This reflects the general challenge of defining criticality in strongly driven systems. However, some progress has been made. To disentangle neural avalanches in whole-brain fMRI and study their statistics, Tagliazucchi et al. used physical proximity (i.e. nearest neighbour connections) to delimit avalanches, instead of binning all activity together [11]. The same method was necessary to assess criticality in whole-brain zebra-fish data [29]. The approach of using network topology to identify avalanches makes sense, as information processing can only occur between connected elements of the network. To this end, neuronal avalanches have been generalized by "causal webs", which uses network structure to separate out independent avalanches [30].
It has remained an open question whether a genuine critical point with scale-free activity can exist alongside spontaneous activity, particularly as other markers of a directed percolation transition (such as a unity branching ratio and the appearance of an active fraction) are all affected in different ways by spontaneous activity. We address the issue here using a minimal spreading process with a spontaneous activation rate, making it a discretetime ε-SIS (susceptible-infected-susceptible) model, and study it on a variety of complex networks.
Without spontaneous activations, there are no concurrent independent cascades of activity, and a directed percolation phase transition is present. We show that the introduction of spontaneous activations means that the macroscopic markers used to identify the directed percolation transition, (i.e. the appearance of an active fraction, or a branching ratio of one) no longer identify a phase transition. Nonetheless, by using the network structure to disentangle causally unrelated avalanches, we can define a phase transition even in the presence of spontaneous activations, with scaling-relations between exponents and finite-size scaling. We perform an extensive study of the critical properties at this phase transition on a variety of network structures and show that the presence of any spontaneous activation changes the underlying universality class from that of directed percolation to that of undirected percolation, while preserving some features of directed percolation. To explain these results, we derive an analytical mean-field theory for branching processes with spontaneous activation and show that the appearance of undirected percolation exponents is a direct result of the merging of initially independent avalanches. The dynamics here exhibit two independent avalanches, one with two roots (node 1 at time t = 1 and node 4 at time t = 2), and one with a single root (node 0 at time t = 3). With spontaneous activations, spatially distinct events can overlap in time (e.g. t = 3 and t = 4) and initially distinct cascades of activity can overlap to form larger avalanches.

A. Model
To study the effect of spontaneous activations, we consider here a discrete time ε-SIS process [31] on directed networks equipped with spontaneous activations (see Fig. 1). At each time step, a given node can be activated by means of a spontaneous activation with probability p or through an incoming link by an infected parent with probability q. More precisely, the probability that node i is activated at time t + 1 is given by: where m(i, t) counts how many parents of node i were active at time t. Therefore spreading events occurs on a timescale of 1, while spontaneous activations occur on a timescale of p −1 . This model be considered a type of Domany-Kinzel cellular automaton [32], or akin to some form of mixed-percolation [33]. In our model, nodes do not remain activated, but recover and so may be reactivated the following time step. In the Appendix H, we also consider a variant of this model that includes the immunization of nodes, i.e., a susceptible-infectedrecovered (SIR) process, with multiple initial spreaders. Large systems with spontaneous activations will have concurrent and possibly unrelated avalanches. We employ the network structure itself to identify independent avalanches, using the "causal-webs" approach described in [30]. To summarize the approach, we identify nodes with no active parents (i.e., no possible source of network-borne activation) as "roots" of newly-initiated avalanches (see Fig. 1). Nodes with active parents inherit the avalanche labels of their parents. As avalanches overlap, they are merged together, so that nodes only ever have one label. This merging reflects that true causal information is often obscured in real systems, or that contributions from both streams of activity are necessary for activation. This description of avalanches maps naturally to the clusters of traditional percolation.
The model has two limiting cases: i) For p = 0, this model is a pure branching process with branching parameter q, which belongs to the directed-percolation universality class. This case corresponds to neural activity where avalanches are infrequent and a single leading neuron can be positively identified in each avalanche. ii) q = 0 corresponds to the ordinary percolation model on a directed network with probability p. This corresponds to neural activity that is entirely driven by external sources or spontaneous activation and do not spread on the network, such as in the retina. Both limits exhibit continuous phase transitions, but fall into different universality classes, and are characterized by power laws exhibiting different critical exponents. Establishing that a phase transition exists for p and q that are simultaneously nonzero and the universality class of this transition are the principle efforts of this paper.

B. Numerical results
The most experimentally accessible indicator of criticality in systems with activity spreading is the size distribution of clusters, which shows a different critical exponent in directed and undirected percolation. For all p, at some critical q c (p) we observe a transition that defines a critical line. Below the critical point, with q < q c (p), avalanches are limited in size (see Fig. 2a), while above the critical point, a permanent giant component affecting a non-zero fraction of the network appears. At the critical point, the exponential cut-off that characterizes the sub-critical phase diverges and the avalanche distribution is described asymptotically by a power law. The appearance of these power laws is used to identify the critical line in our simulations. By studying the critical exponent characterizing these power laws, we can identify the universality class of the critical line.
Because critical exponents depend on the network topology and dimension, we study avalanche distributions with extensive simulations on a variety of relevant network architectures, including the small-world [34] and power-law networks relevant to disease spreading (Figs. 3a, 3b, 3d, and 3e), the hierarchical modular network [22] recently used as a brain connectome analogue (Fig. 3f), and the analytically-tractable k-regular network (Fig. 3e). Strikingly, for every level of spontaneous activity there is a transition between two power-law exponents in the critical avalanche distribution.
We find that in all cases the first power-law exponent is consistent with the directed-percolation exponent for that system, while the latter exponent is indistinguishable from the pure percolation exponent (see Table I). Yet, to the best of our knowledge, no directed-percolation Approach to the phase transition in directed percolation. a. Avalanche distributions for with different spreading parameters q with spontaneous activation rate p → 0 on random 10-regular networks. Sub-critical avalanches are distributed with an exponentially-truncated power law p(s) ∼ s −3/2 exp[−s/sc]. At criticality, avalanches are scale-free sc → ∞ such that p(s) ∼ s −3/2 . The absolute size of the giants in the super-critical phase scales linearly with the network size (empty symbols N = 10 4 , filled N = 10 5 ). b. The giant fraction g of nodes participating in the largest avalanche for the same simulations as in a. avalanche exponent has been reported for directed smallworld networks. To check the consistency of our findings for these networks, we can lower the density of longrange connections. Indeed, we find that the directedpercolation exponent for small-world networks tends toward the (1+1)-dimensional directed-percolation limit of ≈ 1.108 expected of a circulant graph (see Fig. 12 and associated text). For the power-law network, the corresponding degree exponent was chosen such that the undirected-percolation exponent would change (from the 5/2 mean-field value to 8/3 as predicted in [37]) while leaving the directed-percolation exponent at 3/2 (as predicted in [36]).
In hierarchical modular networks, we observe nonscale-free behaviour for avalanches below the base module size M (s < M ), a p dependent power law for intermediate-size avalanches (M < s < M p −2/3 ), and finally a single power law in the tail (cf. Fig. 3f ). The varying exponent for intermediate-size avalanches is consistent with reports of a Griffiths phase in modular networks with the SIS model [22,38,41] which belongs to the universality class of directed percolation. The largest avalanches are governed by an exponent of ≈ 2.1, which matches with the undirected-percolation exponent for q = 0 (cf. Fig. 16b). We hypothesize that this exponent is close to the pure 2-dimensional percolation exponent, because the hierarchical modular network has a backbone that is very nearly one dimensional, and so the percolation process sees an effectively 2-dimensional lattice upon the introduction of time. All critical-avalanche distributions exhibit a universal curve collapse for various p by re-scaling the distribution by p −2/3 (Figs. 3b, 3d-3f). This indicates that for all p > 0, the critical point belongs to the same universality class for that network topology. In the super-critical regime, a giant component appears, just as in directed and undirected percolation. The probability that a randomly-selected node is in the giant component is the giant component fraction g, which exhibits TABLE I. Calculated critical exponents for different network structures. In each column, we report both the theoretical value from percolation theory, and the value determined in this work, either analytically (by application of generating functions) or from numerical simulations (where they are reported with decimal values). Errors in 1/ν indicate the range over which an acceptable curve-collapse was obtained. A number of additional critical exponents were determined for the k-regular network, these are summarized in Table III.   and p = 10 −4 , respectively, while solid lines are the analytical calculations for infinite lattices. b. As in a, but re-scaled to produce a finite-size scaling curve collapse. c. Symbols are as in a, but studying the susceptibility χ = s 2 c for finite clusters, which shares the same finite-size scaling exponent. d. Curve collapse for power-law networks (degree distribution p(k) ∼ k −3.5 ) with p = 10 −4 . Solid line is g ∼ (q − qc) β for β = 2, consistent with theoretical prediction. a power-law scaling, g = G N T ∼ (q−q c ) β , where G denotes the size of the largest cluster, N the number of nodes in the system, and T the simulation duration. However, g exhibits a strong finite-size effect, with smaller systems having a larger effective critical point. Below the effective critical point, the largest cluster G does not scale with the simulation duration and is not percolating. For this reason, we resort to finite-size scaling to reveal the critical behaviour of g on finite networks (Fig. 4). As N → ∞, the effective critical point q c (N ) tends towards the true critical point, with (q c (N ) − q c ) ∼ N −1/ν . The correctness of our finite-size scaling is confirmed by considering the finite cluster susceptibility χ ≡ s 2 c (where the c de-notes an average over all clusters), which should obey the same scaling collapse with χN −γ/ν (Fig. 4c). For mean field, 1/ν = 1/3 and for pure percolation on power-law networks (with degree distribution p(k) ∼ k −3.5 ), we expect that 1/ν = 1/5 [37]. Here, we find that the best scaling collapse occurs near these values, with 1 ν ≈ 0.36 for 10-regular networks and 1 ν ≈ 0.25 for power-law networks.
Above the effective critical point, the giant component agrees with our analytical predictions for the infinitesize limit (Fig. 4a). As expected, the giant components emerge with β = 1 for the mean-field case of random 10regular networks (Figs. 4a and 4b). As can be seen in Fig. 4d, the giant component grows with β = 2 for the given power-law networks. Since p(k) ∼ k −3.5 and there are no correlations between the indegree and outdegree, it is known that β = 2 for undirected percolation [37] and β = 1 for directed percolation [36]. This implies that the emerging giant component in our system is in the universality class of undirected percolation.

C. Analytical results
In this section we establish analytically that the universality class of the phase transition is lifted from directed to undirected percolation by the addition of spontaneous activations. To understand the transition between directed-and undirected-percolation exponents, we consider the analytically tractable k-regular network. Using the generating function formalism, we can explicitly derive scaling exponents related to the avalanche size and emergence of the giant component, as well as derive a critical line. By studying the sizes of singlyrooted avalanches on this critical line, we can identify the size at which the merging of independent clusters of activity becomes the predominate mechanism for cluster growth, and thereby explain the scaling collapse effected by p 2/3 observed in Figs. 3c-3f. Additionally, the directed-percolation universality class exhibits two distinct diverging correlation lengths at the critical point. However, only one correlation length diverges in our model, reinforcing that this is an undirected-percolation transition.
The avalanche distribution of our model is akin to the cluster-size distribution of percolation and directed percolation; this distribution has been analytically determined on a variety of infinite random networks for both types of percolation by using probability generating functions (PGFs) [1,36,37,42]. The technique's key assumption is that there are no loops and that all nodes are equivalent, in the sense that their network properties are independent of the properties of their neighbours. Although this is only an approximation, this tree-like approximation can still perform well in cases where loops are prevalent [43]. This assumption lets one write down a self-consistent equation for the PGF in terms of the number of connected neighbours where the cluster that each neighbours connects to is distributed according to the original PGF. Our approach, detailed in Appendix B, follows that same spirit, except that a system of two self-consistently coupled PGFs are required to describe the total cluster distribution. One PGF corresponds to the sizes of clusters reached from a direct descendent, while the other describes the cluster size reached when two independent cascades merge. On a tree-like network, merging occurs when spontaneous activations meet and become a larger avalanche. Therefore, the directedpercolation-like behaviour is entirely contained within the first PGF, while the second PGF captures the effect of new spontaneous activations.
This pair of PGFs combines to define the PGF H 0 (x) = ∞ s=1 P n (s)x s which corresponds to the avalanche-size distribution P n (s) obtained from sampling random active nodes (denoted with the sub-script n). Derivatives of H 0 (x) evaluated at x = 0 directly yield P n (s). The average cluster size is just given by s n = H ′ 0 (1) and the susceptibility χ = s 2 c = where the sub-script c indicates sampling over clusters as opposed to active nodes. Since the giant component is the unique infinite-size avalanche, the probability a random active node belongs to it is just 1 − H 0 (1), and so the giant component is also determined by H 0 . For sufficiently small p and q, the giant component is zero and all clusters are finite, but as the critical line is approached the susceptibility χ diverges as χ ∼ |q c − q| −1 (for fixed p) or χ ∼ |p c − p| −1 (for fixed q) as derived in Appendix B. This divergence defines the critical line, which can be simply expressed as where σ(p, q) is the reproduction number or branching ratio, and σ m (p, q) is the merging number, corresponding to the number of cascades of activity leading to a randomly-selected active node with at least one parent. Clearly then, the critical line has σ < 1 for all σ m > 0, meaning that giants can occur even before an average reproduction number of 1 is attained. As shown in Fig. 5a, σ = 1 only in the p → 0 and q → 1 k limit of directed percolation for a k-ary tree [40], where it agrees with the derived critical line. At the directed-percolation critical point, the active fraction of nodes Φ = Φ(t) exhibits a divergence in its dynamic susceptibility χ 0 , with χ 0 ≡ ∂Φ ∂p diverging as In the context of neural systems with mixed time-scales and a fixed level of spontaneous activation, the maximum of this dynamic susceptibility defines a "Widom" line (cf. Fig. 10 and associated text) and has been proposed as a quasi-critical line [44]. Although all three of these measures identify the directed-percolation critical point p = 0 and q = 1 k , they disagree as soon as spontaneous activation is introduced (p = 0) and exhibit distinct scaling (cf. Fig. 11). In the p ≪ 1 limit, the Widom line scales as p ∼ ( 1 k − q), the σ = 1 line scales Widom Line as p ∼ ( 1 k − q) 2 and the critical line scales as As for the q = 0 endpoint to the critical line, χ diverges when p = 1 (2k−1) and q = 0, the pure percolation critical point for the Bethe lattice of coordination number 2k. Hence, the critical line contains members belonging to two distinct universality classes.
To understand the appearance of the p −2/3 scaling of the transition point shown in Fig. 3, we can consider the distribution of avalanches with only one root. These avalanches are described by a branching process, on a k-ary tree, with a branching probability ) corresponding to the probability that a daughter branch activates with exactly one parent. The probability distribution for the size of the singly rooted avalanche is denotes the characteristic scale above which avalanches merge and P d is the probability a site does not activate, despite having an active parent. Hence, we expect that the exponent s −3/2 should be exponentially suppressed at s m . In the limit of p → 0 on the critical line (see Eq. (C4)) s m scales as: Combining Eqs. (3) and (4) shows that the characteristic size before merging scales as s m ∼ p −2/3 on the critical line.
Simulations confirm that the smallest avalanches typically only have one root (Fig. 5b), while the largest avalanches have a number of roots that scale with the avalanche size (Fig. 5c). This means that there are two competing processes at play in these avalanches, both the propagation of the avalanche, which belongs in the directed-percolation universality class, and the merging of initially-independent events, which falls into the percolation universality class. This explains the appearance of the two power-laws and the associated curve collapse in Fig. 3. The first power-law is governed by the spreading of activity from a single initiation site, while the second power-law is governed by the merging of activity springing from multiple sites. This − 2 3 scaling is a good approximation for random graphs that are close to mean-field. In Appendix F, we consider small-world networks with a low shortcut density. These networks are locally one-dimensional, and as a consequence exhibit a different scaling, s m ∼ p −0.75 (cf. Fig. 12) due to the directed-percolation phase being (1+1)-dimensional instead of mean-field.
This transition between the directed-and undirectedexponents also manifests itself in the approach to the critical point. For instance, for q < q c the susceptibility can be approximated by where Θ is the Heaviside step function, and s ξ is the size cut-off of the pure-percolation tail, s ξ ∼ |q c − q| −1/σ and F is a universal scaling function. Then, using that This suggests that we see a transition between exponents when δq ≈ 3 √ p, precisely as observed in Fig. 5d. Now, since χ ∼ δq γ defines γ, we have arrived at the usual scaling relation γ = 3−τ σ . This scaling relation holds for both the directed (γ DP = 3, σ DP = 1 2 , τ DP = 3 2 ) and undirected (γ = 1, σ = 1 2 , and τ = 5 2 ) percolation regimes.
A transition from directed-percolation exponents also appears in the dynamical exponents relating the size of avalanches to their duration (cf. Fig. 5e), where the exponent transitions from s ∼ T σνz= 1 2 to a power-law consistent with s ∼ T 1 4 . The onset of this transition again occurs with avalanches of size s m ∼ p − 2 3 , which defines a characteristic time to merging, T m ∼ √ s m ∼ p − 1 3 . The scaling of this characteristic time captures an exponent transition in the distribution of the avalanche durations (cf. Fig. 3f, with P (T ) ∼ T α , with α DP = 2 and a new asymptotic α ≈ 7.0. Intriguingly, the directedpercolation scaling relation τ −1 α−1 = σνz is satisfied even in the merging regime, assuming α = 7, τ = 5/2, and σνz = 1 4 . The existence of robust scaling relations and of curve collapses (Figs. 5b-5f ) that appear universal indicate that the critical line (for p > 0) belongs to a single universality class. Since this includes the point p = 1 2k−1 and q = 0, which we know is exactly undirected percolation, it suggests that the entire critical line (save for p = 0, q = 1/k) belongs to the universality class of undirected percolation.
We can further strengthen the argument that the critical line is an undirected-percolation transition by studying the correlation lengths of the system. Undirected percolation exhibits a single isotropic diverging correlation length ξ ∼ |δq| −ν , while directed percolation exhibits two diverging correlation lengths ξ ⊥ ∼ |δq| −ν ⊥ and ξ ∼ |δq| −ν corresponding to spatial and temporal correlation lengths respectively. We consider the correlation lengths corresponding to ξ ⊥ and ξ for our system, and will show that only one diverges on the critical line, precluding a directed-percolation transition. The twopoint connectedness function, γ(i, t i , j, t j ) measures the probability that node i at time t i and node j at time t j belong to the same cluster over the ensemble average. If we denote the shortest path connecting nodes i and j as d ij then we expect that the average connectedness function should decay with d ij . This can be seen by studying the exponential decay of the average connectedness function, g(d, t) = γ(i, t i , j, t j ) dij =d, t=tj −ti, i active which measures the decay of activity away from an active node.
Typically, activity decays exponentially, with g(2d, 0) ∼ exp[−d/ξ ⊥ ] and g(0, t) ∼ exp[−t/ξ ] defining the two correlation lengths ξ ⊥ and ξ [32]. In the loop-less (large N ) approximation, g(0, t) = δ t0 , as in the absence of loops activity can never return to the same site, meaning the correlation length ξ vanishes. Meanwhile, the perpendicular correlation length is given by , which implies that the perpendicular correlation length diverges when (1 − σ) 2 = σ 2 β, i.e. on the critical line where s n diverges. The divergence of ξ ⊥ and the decay of g(2d, 0) is compared to its analytical form in Fig. 13. The non-divergence of ξ on the critical line suggests that the critical line is not a directed-percolation transition. One might object that the non-divergence of ξ is a problem with its construction. In the Appendix G, we consider also an isotropic correlation length that diverges on the critical line, and show that an alternative definition of ξ based on the typical number of generations of direct descendants to an active node only diverges on the σ = 1 line (cf. Fig. 5a). Additionally, the fact that ξ ⊥ diverges with ν ⊥ = 1, as in undirected percolation, instead of ν ⊥ = 1 2 as expected in mean-field directed percolation reinforces that the critical line is an undirected-percolation transition.
In summary, the critical line is an undirected (as opposed to directed) percolation transition except at a singular point. This is supported by avalanche distribution exponents, exponents of the order parameter g, undirected-percolation scaling relations, and the divergence of a single correlation length. Many critical exponents of the directed percolation remain observable on small scales, such as in the beginning of the avalanche size distribution or in the susceptibility χ. These exponents then shift to the undirected exponents when the merging of initially-independent avalanches becomes prevalent. Meanwhile, other measures of criticality that hold for directed percolation, such as the divergence of the dynamical susceptibility χ 0 and a reproduction number of one, no longer capture critical behaviour. Instead, they predict phase-curves that agree only in the p = 0 limit, and scale with different power-laws near the directedpercolation limit. Specifically, 1 k − q ∼ p a , with a = 1 for the Widom line, a = 2 for the σ = 1 line, and a = 3 for the critical line (c.f. Fig. 11). Thus, the directedpercolation transition is not robust with respect to the introduction of spontaneous activation -any level of exogenous driving will introduce independent outbreaks, which on the largest scales will begin to merge. This is perhaps surprising, because the undirected-percolation limit q = 0 obeys detailed balance, while for the remainder of the critical line with q > 0 detailed balance is not respected.

A. General model observations
We have described a two-parameter spreading process that includes spontaneous activations and exhibits a phase line along which the critical exponents and behaviour of both directed and undirected percolation appear. When there is no spontaneous activation, the model exhibits a directed-percolation transition, marked by a divergence in the dynamical susceptibility, a reproduction number of 1, power-law distributed outbreak sizes and the appearance of a giant component. However, the introduction of spontaneous activation means that the dynamical susceptibility no longer diverges, and that the reproduction number is shifted. Nonetheless, by considering the cluster-size distribution and statistics related to the cluster size, a critical line -exhibiting universal curve collapses and finite-size scaling -can be defined. This means that even in the presence of spontaneous activity, a genuine phase transition exists. The introduction of spontaneous activity destroys the transition to the absorbing state, and shifts the phase line into the universality class of undirected percolation. We showed numerically, on a variety of relevant network topologies, that in the largest clusters as merging becomes increasingly dominant it is the undirected-percolation exponents that dominate. Although all the networks we considered were nominally directed, we expect that our results survive on undirected networks, as our small-world networks were comprised predominately of bidirectional connections.

B. Critical brain hypothesis
Our results have repercussions for the critical brain hypothesis. Although the hypothesis itself is not new [45,46], it gained traction with the seminal work on neuronal avalanches [16,21] and with the development of largescale brain recording techniques. It is still a highly debated topic [19,20], and recent work has focused mostly on (i) the appropriate definition of an order parameter and its tuning and (ii) whether neural activity distributions show critical power-law statistics.
Regarding (i) the main objective is to find a plausible mechanism by which the brain is able to tune its own activity to a critical point; ongoing research focuses on self-organized criticality [47, 48], excitatory-inhibitory activity balance [49], up and down states [50,51], adaptive mechanisms [52], and learning [53] amongst others. Many of these concepts are deeply related, and all of them might play a role. Regarding (ii), early work focused on whether the measured statistics followed a real powerlaw or just an approximation, and whether the activity was sub-or super-critical instead [54]. However, this is a challenging issue to solve experimentally due to the role of finite-size and sub-sampling effects [55][56][57], and due to the real lack of separation of time scales as we report here. The initial reports on neuronal avalanches reported an exponent close to τ ≈ 1.5 for the size distributions, consistent with a mean-field branching process [9,16], but experiments on a variety of neural systems have reported a variety of exponents, usually in the range 1.2 -2.5. While this discrepancy might be partly explained by the technical challenges in probing the tails of powerlaw distributions, or attributed to to variation in network topology between studies and heterogeneous dynamical properties [14], there yet remain reasons to think spontaneous activation has a significant bearing on the critical brain hypothesis. For instance, critical in vivo neuronal avalanches have only been reported for waking states, when animals presumably are stimulated by sensory input [50,54,58]. In light of our work, this might indicate that criticality is only attained with the help of external drive.
Our work addresses both (i) and (ii). For (i), we show that a susceptibility χ based on causal webs accurately identifies the critical line, with the excellent finite-size scaling one expects from a true phase-transition. This offers a well-defined observable that is coherent in the present of spontaneous activity, unlike the branching ratio or a global measure like the dynamic susceptibility. As for (ii), we show that power-laws (with exponents that vary based on network structure) are present at the critical point for any level of spontaneous activation. While power-law statistics can also appear in non-critical systems with simpler dynamics as a recent critique showed [59], the presence of scaling relations between the critical exponents [9] is only true in pure critical systems. We demonstrate such scaling relations in our model with coexisting power-law regimes, showing for the first time that neural networks with spontaneous activity can still be genuinely critical. Additionally, our prediction of two power-laws may help to explain the variety of exponents fitted to power-laws in the literature.
Only recently other works have started to pay attention to the role of spontaneous activity for the critical brain hypothesis [28,30,44,[60][61][62][63]. Most of the previous definitions and requirements for criticality cannot be satisfied in the presence of spontaneous activity since time-scale separation is barely satisfied for any realistic activity rate. Several attempts have been made to recover the original definition of criticality and powerlaw statistics by accurately choosing and exploring the appropriate temporal bin size for the avalanche definition [14]; but it is still not clear whether that approach really recovers the same underlying dynamics, and might only hold if the system is exactly at the critical point and for intermediate size systems. What is clear however, is that in the thermodynamic limit, with a fixed spontaneous activation rate, there will always be unrelated avalanches occurring, and so avalanches defined by global observables (such as fluctuations in Φ or those delimited by global quiescence) are not well-defined. As shown in the present work, in the absence of time-scale separation, it is essential to know the network structure to resolve the underlying dynamics. There is currently no way to recover the correct exponents without access to the network structure, but approaches that first try to infer the structure from the dynamics appear to be promising [64].
Another key point that will have to be tackled in the future in relation to the critical brain hypothesis in the presence of spontaneous activity has to do with information transmission. Can it still be optimal in this regime? Spontaneous activity can indeed enhance information transmission from a sensory system [65], or shift an inherently sub-critical system closer to the criti-cal point (see Fig. 5). However, a read-out of this activity might require an error-correction code to be present.

C. Aspects of disease spreading
Much study of spreading on complex networks has been driven by a desire to study disease spreading in human contact networks, and we have borrowed heavily from that discipline in this paper. We would be remiss not to mention that a small number of studies have relaxed the patient-zero assumption of the typical spreading process, by including multiple initial spreaders [66][67][68][69][70][71][72]. A limited few have reflect a disease reservoir that can cause new outbreaks even as old ones spread [31,73,74]. To the best of our knowledge however, the question of the universality class of spreading as new spreaders are introduced has not yet been addressed in the literature.
By endowing a spreading process with a spontaneous infection rate, we describe disease with an off-network reservoir. This might be an appropriate description for diseases like Zika virus, which can spread via human sexual networks, but also "off-network" via mosquito [75]. Other zoonotic diseases lack an obvious patient zero, as they exhibit periodic reintroduction from animal reservoirs [76]. For example, two-thirds of new cases in the 2018 Democratic Republic of the Congo Ebola outbreak could not be linked to existing cases, potentially representing hidden network links or new outbreaks originating from contaminated bush meat [77,78]. In this case, the spontaneous infection rate could represent either infection from the environment, or a first order approximation for a failure in contact-tracing.
Our model predicts that a non-zero spontaneous infection rate lowers the epidemic threshold. One consequence is that the local reproduction number can be less than one even in the epidemic phase. Further, for zoonotic diseases, the average outbreak size near the epidemic threshold will not follow the typical directed percolation scaling, but rather the isotropic percolation scaling exponents. Perhaps more importantly, close to the directed-percolation limit, a small change in the spontaneous infection rate can have a drastic impact on the average outbreak size. This suggests that a dual approach to epidemics, targeting both the transmission between individuals and the initial routes through which diseases enter the population, may represent a more efficient allocation of epidemiological intervention. For example, one can imagine disease control as a constrained optimization problem attempting to minimize Φ, with finite financial resources for disease interventions that affect p and q. To compute Lagrange multipliers and find an optimal allocation requires calculating ∂Φ ∂p and ∂Φ ∂q . We analytically do this for k-regular networks, but our approach can be directly adopted to networks more pertinent for disease spreading. By mapping real diseases onto our "normal-form", a concrete optimization problem can be constructed.
Finally, our SIS-based model describes a population that can be readily reinfected by a disease, i.e. with no acquired immunity. Although this is not representative of all diseases, the inclusion of immunization (as in the SIR model) does not impact our conclusions about the universality class of disease outbreaks with spontaneous infection. In the appendix (cf. Fig. 15 and associated text), we consider a variant of our model that does not allow for the reinfection of nodes, and find outbreak distributions that exhibit a transition between directedand undirected-percolation power-laws. This is a consequence of the fact that immunization only plays a role when a node would be reactivated i.e. after activity traverses a loop in the network. As we have shown with our tree-like (loop-free) analysis of the SIS case, these loops do not play a significant role in the mean-field limit. Immunization may play a stronger role in undirected networks or directed networks with many short-range loops.

D. Other applications
The generality of our model makes it applicable to other systems where the timescales of spontaneous activation and propagation of activity are comparable, e.g., rumour spreading on social networks [5] or the distribution and propagation of computer viruses [4]. It remains to be seen how our findings translate to self-organized systems and in particular to those that are known to exhibit a self-organized critical (SOC) regime under timescale separation [79][80][81]. Previous studies have shown that a sufficiently high driving rate can induce a transition from avalanche dynamics to continuous flow in SOC systems [82]. Within the context of zoonotic diseases, a multilayer network approach that incorporates humananimal interactions directly [83] or including cooperative diseases [84] might open the door for novel dynamics in the absence of time-scale separation. It would also be interesting to see which metrics, such as k-shell decomposition [85] or a local analysis [86], identify significant spreaders in our model and if network interventions, such as those connected to explosive percolation [87][88][89] and others [90], could be used to stymie or promote epidemics in the presence of spontaneous activation. These remain exciting challenges for the future.

IV. CONCLUSIONS
Spreading processes on networks frequently appear in natural and human systems. The inclusion of spontaneous activity changes the phase transition in these systems from directed percolation to isotropic percolation, because previously independent streams of activity can merge together. These universality classes have differing critical exponents, meaning that diseases with spontaneous infections (e.g. zoonotic diseases) will show different growth profiles near the critical point. This also has several implications for the critical brain hypothesis. Global quantities -such as the active fraction and its susceptibility, the branching ratio, or avalanches defined by global periods of quiescence -do not capture critical behaviour when spontaneous activity is considered. As such, criticality in the brain should be re-assessed using measures that tolerate spontaneous activity. This requires that the use of network structure (e.g. tractography) be paired with dynamics measurement (e.g. fMRI). Proximity to criticality should be assessed using an order parameter based on causal webs, such as susceptibility (χ). Other measures of criticality, like the branching ratio, might lead to critical behaviour being interpreted as sub-critical. If the brain as a whole is critical, then the largest avalanches will have isotropic, rather than directed, percolation critical exponents as merging becomes the dominant growth mechanism.

Network generation
For the finite networks, we generate finite directed kregular networks via the configuration model, shuffling connections to avoid self-links and multi-links. To generate power-law networks we employed a variation of the configuration model described in [91], with a degree distribution p in/out (k) = k −3.5 with a domain k ∈ [5, ..., 1000], with rejection parameters κ = 0.5, δ = 0.05. We generate the small-world networks using a directed network generalization of the Watts-Strogatz model [34], using rewire probability 10 −2 and with average degree 10 (a rewire probability of 10 −3 is also considered in Appendix F). We generate the hierarchical modular networks described as "HMN-2" in [22] as a backbone for our modular networks. Within each base module with n c connections in the module backbone, we place 10 2 + αn c nodes with α = 4 being the four inter-modular connecting nodes. The first M = 10 2 nodes we draw from the out-degree distribution p(k) ∼ e −(k−10)/(2×0.5 2 ) and connect to other uniformly drawn nodes in the same module. For the next αn c nodes, we draw from the same outdegree distribution, but connect to the first 10 2 nodes in the other modules, according to the module backbone wiring.

Simulation of model on finite networks
Networks are initiated with no active nodes. Each time-step, the number of nodes that will activate is drawn from the binomial distribution, with activation probability p. That number of nodes are randomly selected with uniform weighting, re-drawing duplicates. Nodes that activate spontaneously and had no active parents in the preceding time-step initiate a new cluster. Then, all nodes that had active parents in the preceding time step that were not already activated spontaneously are checked for activation. Each node with m active parents in the previous time-step is activated with probability 1 − (1 − q) m . Nodes inherit the cluster label of their parents. If a node would inherit more than one cluster label, then those clusters are merged into a single cluster by relabelling all nodes belonging to the smaller cluster with the label of the larger cluster. Clusters that are found to have no active nodes in a given time-step are terminated, and their size, duration, and number of roots recorded.

Simulations on infinite k-regular networks
For the infinite networks, we begin at a randomly selected active node. We then check its immediate neighbours to see whether they are part of the same cluster. For those that are included, we then check their neighbours for inclusion. We can perform this process such that we need only count the number of unexplored neighbours, of which there are two types: (I) daughters that haven't been checked for inclusion and (II) parents that are known to be included, but whose neighbours haven't been checked. If we're beginning from a root node, there are initially k unchecked daughter branches (type I). If we're beginning from a randomly-active node, we begin with one type (I) neighbour and one type (II) neighbour. The algorithm proceeds to check each unevaluated connection (of type-I or type-II), possibly adding more as it goes, until none remain or the cluster exceeds a given size (typically 10 10 ). Each type of connection is added as follows: • (Type I): We check each type-I, by assuming it has m d other active parents (drawn from a binomial distribution P (m d ) = k−1 m d Φ m d (1−Φ) k−1−m d of k − 1 other parents, activated with probability Φ, the active fraction, given by Eq. (B11)), and include each type-I with probability 1 − (1 − p)(1 − q) m d . If it is included, then we add k type-I con-nections from this daughter and m d type-II connections.
• (Type II): Each type II is included with probability 1. It adds k − 1 additional type-I connections, and m p type-II parents, with m p drawn from the distribution in Eq. (A1): The probabilities of adding a daughter or parent are as derived in the appendix. For the purposes of measuring the two-point connectedness function, the above algorithm can be easily extended to also include the number of time-steps, by simply tracking how many times each active front has followed a daughter branch or a parent branch.
FIG. 6. Numerically determined critical lines for various networks. Points correspond to the avalanche simulations plotted in Fig. 3 in the main article, except for the small-world network with a re-wire of 10 −3 , which is only studied in the appendix. Solid lines are approximate fits of the form p(q) = c(qc,DP − q) a for constant c, a, and qc,DP , whose values are summarized in Table II. k-regular phase curve is exact.

Critical point determination
Accurate determination of the critical point is necessary to effect accurate finite-size scaling. In the case of the k-regular network, the critical point can be determined analytically. However, for the power-law, smallworld, and hierarchical modular networks, determination of the critical point can be done in two ways. The naive approach is to simply tune q for fixed N and p until power-laws appear in the avalanche distribution. However, this is prone to finite-size effects: for fixed p and N , the largest power-laws in the P (s) distribution will appear at the pseudo-critical point, corresponding to a larger q c (N ) value than the true q c (N → ∞). Finitesize effects similarly cripple approaches based on just the , which was shown to have no finite-size dependence at the critical point. Therefore, for fixed p, the critical point q c can be found as the intersection point of the B(q) curves for different values of N (see Fig. 12a). This enables the numerical determination of the p, q critical lines for the networks discussed in the main text (cf. Fig. 6 and Table II).

Appendix B: Generating Functions
Before deriving the generating functions associated with the average cluster size, we begin with a brief review of probability generating functions (PGFs). For a discrete random variable X drawn from the probability mass function p(x), the probability generating function can be defined as g X generates the probability p(x) in the sense that g X (0) = p(0), and the n th derivative yields: 1 n! g (n) X (0) = p(n). The probability generating function can be used to obtain the moments of X, as X = g ′ X (1), X(X − 1) = g ′′ X (1), and so on. The final property of probability generating functions we will use is perhaps its most useful: when a family of independent and identically distributed variables {X 1 , X 2 , . . . X N } generated by g X (z) are summed Y = N X i , with N also being a random variable generated by g N (z), then g Y (z) = E(z Y ) = E(z N X ) = ∞ n=1 p(n = N ) E z X n = g Y (g X (z)). Although this may seem esoteric, it means that the sum of a collection of some random number of random variables can be concisely expressed using generating functions.
We will derive the PGFs corresponding to the cluster size distribution beginning from randomly selected active sites on a random network. If there are no loops in the network (the tree-like approximation), we can express the  of k parents n p active of k daughters n d active FIG. 7. A firing pattern example represented both as the sum of variables and the product of generating functions. a. An example activation pattern beginning from a randomly selected initial active node, I, on a 4-regular network. Thick edges indicated connected active nodes. b. The number of parent and daughter edges contributing to the cluster are labelled by np and n d , random variables that could vary from 0 to k. Nodes are labelled with their size contribution to the clusternodes that do not activate contribute zero, the initially considered node contributes one activation, while active parents and daughter contribute a random variable. c. The same activity pattern labelled with the the probability generating functions corresponding to each random variable.
cluster size s starting from a random active as where n d is the number of active daughters of the initial site, n p is the number of active parents of the initial site, s d,l is the size of the cluster reached from the l th active daughter, and s p,m is the size of the cluster reached from the m th active parent. This means that the PGF for the total cluster size s is for the PGFs A o (generating n d ), B i (generating n p ), H p (x) (generating the s d,l ) and H d (x) (generating the s p,m ). The connection between the activation pattern and Eqs. (B1,B2) is illustrated in Fig. 7. An active daughter of I, here labelled Y , will have the number of parent branches that can be considered reduced by one, so, whereñ p ranges from 0 to k − 1 and counts the parents other than I. s d is therefore generated by H p which obeys the following self-consistent equation: (B3) Similarly, an active parent of I, here labelled X, will have one fewer daughter branch to consider, so its cluster size contribution is whereñ d ranges from 0 to k−1, and counts the daughters other than I. s p is generated by H d , which obeys the following self-consistent equation: The relationship between the three size generating functions, H 0 , H d , and H p are illustrated in Fig. 8.
To summarize, H p corresponds to the cluster size reached when arriving at a node from one of its parent branches and H d corresponds to the cluster size reached when arriving at a node from one of its daughter  Fig. 7, but centred on three different nodes, X, I, and Y . a. Corresponding to H0, and beginning from a randomly selected initial active node, I, on a 4-regular network. Thick edges indicated connected active nodes. Two neighbouring active nodes, a parent and daughter (X and Y respectively) are highlighted as corresponding to the other two generating functions. b. An example activation pattern near X. Since one daughter connection leads to I, only k − 1 are available for other connections. This restriction on daughters is why H d differs from H0. c. An example activation pattern near Y . Since one of the parents of Y is I, only k − 1 other parents need to be considered. In part due to the restrictions on parents, Hp differs from H0.
As the giant component appears, the average cluster size diverges. Therefore, identifying the conditions under which the average cluster size diverges is a natural way to identify the critical line. For q ≤ q c , when H 0 (1) = H p (1) = H d (1) = 1, the average cluster size is given by Therefore H ′ p (1) and H ′ d (1) diverge when the determinant of the above matrix is zero, i.e., when 0 = (1 − . This condition will yield the critical line, when supplied with the PGFs for A and B.

Neighbour generating functions for k-regular networks
So far, we've been quite generic in developing the generating function H 0 . To proceed further, we must supply A i/o and B i/o for a given network. For simplicity, we focus on the k-regular network. This will allow us to develop expressions for Φ, the active fraction, and P d , the probability that the daughter of an active site activates in the next time step. The first quantity we will need is the active fraction -the proportion of nodes activated in each time step. A randomly selected (not necessarily active) node will have m active parents with probability k m Φ m (1 − Φ) k−m , as each parent is independent. With m parents, the probability of activating is 1 − (1 − p)(1 − q) m . Now since the probability of activation for a random node is also Φ, we can write (using the notation p = 1 − p to denote complementary probabilities) the self-consistent equation: It will be useful, when performing asymptotic analysis in the limit that p → 0, to have a closed-form approximation for Φ. If we assume that Φ ≪ 1, we can truncate the expression Φ = p(1 − kqΦ + k(k−1) 2 q 2 Φ 2 + . . .) to first or second order in Φ and solve for Φ, from which we obtain the first order approximation and the second order approximation (choosing the positive root, since Φ > 0) For P d , we have one active parent, and k − 1 parents that are independently active with probability Φ. Hence, and simplifying using Eq. (B11) Note that σ = kP d defines the branching ratio. Now that we have both P d and Φ, we can derive A i/o and B i/o . The simplest to derive are A o (x) and B o (x), because they describe the number of activated daughters, and the activation of each daughter is independent of the others. Considering a single daughter, whose activation can be described by a single random variable m ∈ {0, 1}, with m = 1 only if the single daughter activates. The PGF corresponding to m is If n is the number of activated daughters for a site with l available daughters, then n = l i=1 m i for m l being independent and identically distributed (iid) Bernoulli variables generated by C(x). Then, taking l = k for A o , we have For B o , we have one fewer daughter from which to choose, because we arrived at the node in question by means of one active daughter, so we take l = k − 1 to find (B16) Now, for A i and B i , we cannot treat the parents' activation as independent. This is because we must condition on the knowledge that their daughter must activate and, in the case of A i , also on the presence of other active parents.
Treating B i (x) first, we are considering an active site (labelled X) that we arrived at by means of an active daughter (in Fig. 8, I). Therefore, we have no knowledge about the number of active parents, save for the fact that they successfully activated the node in question. Considering the probability mass function in Eq. (B7), Bayes' theorem allows us to write P (n p active parents of X | X active) = P (X active | n p active parents)P (n p active parents) P (X active) .
However, P (X active | n p parents) = 1 − p q np by definition of the model (Eq. (1) of the main text), while the probability of n p active parents, unconditioned on anything else is just given by P (n p active parents) = k np Φ np Φ k−np . Lastly, the probability that X is active, conditioned on nothing else, is just the active fraction Φ. Thus, P (n p active parents of X | X active) which is exactly Eq. (A1). The generating function cor-responding to B i is therefore given by P (n p active parents of X | X active)x np and can therefore be expressed as For A i (x), we are considering a node, Y , that we arrived at from an active node (labelled I in Fig. 8) that is one of Y 's parent branches. A i (x) is the generating function for the number of additional active parents of Y .
Considering the probability mass function in Eq. (B5), and applying Bayes' theorem, P (ñ p active parents of Y excluding I | Y active & I active) = P ( Y active | I active &ñ p other active parents of Y ) × P (ñ p of the k − 1 parents other than I active ) P (Y active | I active) .
Each of these probabilities are known.
by definition of the model (Eq. (1) of main text), and P (Y active | I active) = P d . Hence, Now, the generating function A i is given by so after some algebra we have This concludes the calculation of the four generating functions A i/o and B i/o for the k-regular network. These calculations can also be conducted for other random networks, although the calculation is more technically involved when the in-degree can vary or correlations exist between the in-and out-degrees. In summary, and in terms of Φ and P d , the PGFs A i/o and B i/o for the k-regular network may be expressed as and

Observables from the generating function
Here, we summarize how to extract observables, such as the size fraction of the giant component g, suscep-tibility χ, and cluster distribution P c (s) from the generating function H 0 (x). Practically speaking, we solve Eqs. (B3) and (B4) self-consistently for H d (x) and H p (x) via a Newton-Raphson scheme for a given set of model parameters p, q, and x. With H d (x) and H p (x) in hand, we can insert these into Eq. (B2) and obtain H 0 (x).
The first quantity we can obtain from H 0 (x) is the fraction of nodes involved in finite clusters, which is just H 0 (1) = s p(s)s = 1 − P ∞ . So the giant component fraction g, the fraction of all nodes at all times that are part of the infinite cluster, is just g = Φ(1 − H 0 (1)). For the susceptibility, χ = s 2 = s 2 p c (s), we must make the distinction between the cluster size distribution p c (s) (for numerical simulations, reported simply as P (s)) and the per-node cluster size distribution P n (s). The latter describes the cluster sizes observed by sampling random active nodes, and is directly calculated by the generating function approach, or accessed by simulating avalanches on the infinite lattice. Clearly, P n (s) = AsP (s), for a normalization factor A. Since P (s) = 1, A = . As was pointed out in [35], numerically evaluating this derivative for large s is most easily accomplished via a contour integral on the circle z = e iφ for φ ∈ [0, 2π]. z d becomes highly oscillatory at large d, so convergence of this integral can be improved via standard numerical techniques for oscillatory integrals [93]. The cluster probability distribution can then be accessed as P (s) = 1 As P n (s).

Phase-diagram for the k-regular network
We can study the divergence of χ ∼ s n by solving Eq. (B10), and inserting the solution into Eq. (B9) to obtain is the expected number of other active parents, to an active node with one already known parent. That is, σ m describes the rate of merging of initially independent clusters. Clearly, s n diverges when k(1 − σ) 2 − (k − 1)σσ m = 0 (Eq. (2) of the main text). This result could also have been arrived at by setting the determinant of Eq. (B10) to zero. A reparameterization that will be convenient when considering the correlation length is to replace σ m with β = (k−1)σm kσ , meaning that the critical line diverges when The set of (p c , q c ) that cause this divergence define a critical line (see Fig. 5a in the main text). . Average cluster size for k-regular networks approaching critciality. Numerical simulations (points represent mean of 10 6 realizations) on an infinite 10-regular graph yield good agreement with analytical predictions (lines) for s n Solving 0 = (1 − σ) 2 − σ 2 β for q, and assuming Φ ≪ 1 (as occurs in the p ≪ 1 limit with q < q c ) yields Inserting the first-order closed form approximation for Φ ≪ 1 (Eq. (B12)) into the solution for q yields the small p expansion for the phase-curve (Eq. (3) in the main text) The average cluster size s n (and therefore susceptibility χ) diverges for (p, q) near to points on the critical line (p c , q c ) as s n ∼ |p c − p| −γ (for q = q c ) and s n ∼ |q c − q| −γ (for p = p c ) with γ = 1. This is a direct consequence of the fact that the numerator and denominator of Eq. (B27) cannot both be simultaneously zero (except for the degenerate q = 1 case). Hence, the behaviour near the critical line will depend only on how the denominator f (p, q) = (1 − σ) 2 − k−1 k σσ m scales near its zero p c , q c . As ∂f ∂p = 0 and ∂f ∂q = 0 at (p c , q c ), the Taylor series approximation f (p, q) ≈ ∂f ∂p (p − p c ) + ∂f ∂q (q − q c ). Choosing p = p c or q = q c immediately yields the powerlaw scaling exponent γ = 1. This divergence can be visualized in Fig. 9. H p (1) are strictly decreasing functions of q, g ∼ (q − q c ) identifying the critical exponent β = 1.

Appendix C: Mergeless Avalanches
The mergeless clusters are exactly those clusters with one root. The number of configurations of singly rooted clusters of size s is given by the Fuss-Catalan numbers C ks s , which count the number of incomplete k-ary trees with s vertices [94]. Such a tree has perimeter (unoccupied branches) of length t = (k − 1)s + 1. Nodes are included in the tree with probability denoting the probability that a given daughter node is activated while having exactly one parent. The excluded nodes on the perimeter occur with probability P d , which is the probability of not activating, despite having an active parent. Hence, the probability of observing a mergeless cluster of size s is given by P (s) = C (k) Applying Stirling's approximation we find where the characteristic mergeless size is In the limit of p ≪ 1 and 1 k − q ≪ 1, expressing P d (Eq. (B14)) and P d1 (Eq. (C1)) in p, q, and Φ, and again using the closed form approximation for Φ (Eq. (B12)), we find to lowest order in p and ( 1 k − q): (C3) and if we apply Eq. (3) to observe how the cut-off scales on the critical line, we find: which was Eq. (4) from the main text. Since q c,DP = 1 k and the relation s m ∼ (q c,DP − q) −1/σ DP defines the directed percolation exponent σ DP , we have also recovered the usual directed percolation exponent σ DP = 1 2 (cf. Table III). In equilibrium critical points, divergence in the correlation length is associated with a divergence in the susceptibility of the order parameter to an infinitesimal application of an external field. In directed percolation, the order parameter is Φ. The susceptibility measures activity of the system in response to an external stimuli. We can imagine that the external stimuli is an infinitesimal increase in the average spontaneous activity of the system, and hence we can define the dynamic susceptibility as χ 0 ≡ ∂Φ ∂p . So, using Eq. (B11) we find: and simplifying with Eq. (B11) we obtain In the limit p → 0 with Φ → 0 and q → 1/k, χ 0 is, asymptotically, χ 0 ∼ 1 k 1 k − q −1 , and therefore diverges at the directed percolation critical point q = 1 k and p = 0. This susceptibility has been studied in the context of neural systems, where the mixing of initiation and spreading time-scales means χ 0 no longer diverges (cf. Fig. 10), but instead is maximized on a quasi-critical "Widom" line, where the fluctuations Var(Φ(t)) are also maximized [44]. In directed percolation, there are several indicators of the critical point. The mean cluster size diverges, the branching ratio is one, and the dynamic susceptibility  Table I of the main text, where δq = |qc − q| and δqDP = |qc,DP − q|, where qc,DP denotes the directed percolation critical point (i.e. at p = 0). Exponents related to temporal dynamics (i.e. α and 1/σνz) reflect numerical observations from Fig. 17, while all other exponents are analytically determined.
Exponent Quantity This work Literature diverges. However, with the introduction of spontaneous activation, it is clear that these indicators no longer agree (cf. Fig. 5a of the main text, or Fig. 11). In fact, the branching ratio is no longer a clear signal, because independent streams of activity can merge together and nodes can spontaneously activate. Meanwhile, the dynamic susceptibility no longer diverges, but instead attains a maximum one what is referred to as the Widom line.
Although the Widom line, unity branching ratio (σ = 1) line, and line of diverging cluster size all agree as p → 0, they obey different power laws in their approach to that point (Fig. 11). In this section, we will derive the different scalings associated with these critical and quasicritical lines.
The scaling for the σ = 1 line is given by 1 k − q 2 ∼ p, which can be seen by solving Eq. (B14) for q and using the closed form for Φ (Eq. (B12)), which immediately yields k 2 k−2 1 k − q 2 ≈ p on the σ = 1 line.
As for the Widom line, by setting ∂χ0 ∂q = 0 and applying some simple algebraic manipulation, the Widom line can be found to consist of the q and p satisfying 0 = 1 − kqΦ 2 − 2Φ + qΦ 2 . The first order approximation for Φ given by Eq. (B12) is poor in the vicinity of the Widom line (after all, it is in the vicinity of the point of maximum susceptibility in Φ) and so a second order approximation for Φ (Eq. (B13)) is necessary. With this approximation, the Widom line becomes (upon expansion around p = 0 and q = 1 k ): p ≈ k 2 ( 1 k − q). The critical line was previously shown (Eq. (3)) to obey the scaling 1 k − q 3 ∼ p. These scalings are illustrated in Fig. 11.
Appendix F: Phase diagram and scaling collapse on small-world networks By numerically determining the critical point for different p values, we can build a critical line for the small-  The numerically derived critical line is represented with symbols, the black line is the non-linear least-squares fit to the data. c The average number of roots, exhibits a transition that scales with sm ∼ p −0.75 . d This same sm effects a curve-collapse in the avalanche distribution, which we see separates the mergeless and merging avalanches. world networks. In the main-text, we showed that the avalanche distribution P (s) exhibits a scaling collapse when assuming s m ∼ p −2/3 , for a small-world network with a re-wire probability of 10 −2 . This was somewhat surprising, as the small-world network still exhibits a large number of recurrent connections and is very nearly a circulant graph. However, for a lower re-wire probability, 10 −3 , recurrent connections play an even larger role, and p −2/3 does not provide such a robust scaling. Instead, we find the collapse is best for p −0.75 (cf. Fig. 12). This can be understood by considering the root-size distribution at the critical point, as is done in Fig. 12c. Clearly, the characteristic merging size s m scales as s m ∼ p −0.75 .
Interestingly, this also allows us to estimate the directed percolation 1/σ DP exponent for the small-world network. Given a phase-line that scales as p ∼ (q c,DP − q) a = δ a DP and a curve collapse effected by p b , and that we expect the curve collapse to scale as s m ∼ δ 1/σDP DP , we have the scaling relation 1/σ DP = ab. The phase diagram for this small-world network (Fig. 12b) relates p ∼ (q c,DP − q) 3.0908 for q c,DP ≈ 0.11533, which implies σ DP ≈ 1 3.09×0.75 = 0.43 for the small-world network with a re-wire probability 10 −3 . When the shortcut density is low, the small-world network is approximately a 1-dimensional circulant graph, which suggests we should use the 1 + 1-dimensional directed percolation exponents. Our result of σ DP ≈ 0.43 compares reasonably well with the σ DP = 0.391 reported in the literature [39]. Similarly, we should identify the avalanche exponent τ DP = 1.108, which matches well with our numerical results (cf. Fig. 12).

Appendix G: Correlation length
In the main text, we introduce the pair connectedness function g(d, t). By consider the (typically exponential) decay of this pair-connectedness function, we can define correlation lengths ξ of the form g(d) ∼ exp[−d/ξ]. We will show analytically that the perpendicular correlation length ξ ⊥ corresponding to the decay of g(d, 0) diverges on the critical line.

Divergence of perpendicular correlation length
We can derive the divergence of the perpendicular correlation length ξ ⊥ by computing g(2d, 0). We consider 2d, because when a daughter branch is followed, t advances by one, while a parent branch decreases t by one. However, ∆t = 0, so the number of parent and daughter branches must both be equal.
That being said, not all routes with d daughters and d parent connections are are equally likely. For instance, whenever a parental connection follows a daughter connection, the route requires that two initially independent avalanches merge at that point. It turns out that the most convenient way to compute g(2d, 0) is to sum over collections of routes that have a fixed number of merges. The number of routes with m merges is given by Given a sequence of parent / daughter network hops, the combinatorial factor d m counts the number of ways that the parent / daughter network hops could be rearranged without altering the number of merges. A network hop that follows a network hop of the same kind (i.e. a parent hop following a parent hop, or a daughter hop following a daughter), contributes k possible paths. Every time a change in direction occurs (i.e. a daughter followed by parent or parent by daughter), only k − 1 links are available, because one was taken to arrive at the node in question. The factor of k 2d−1 k−1 k 2m captures the number of possible paths, given a sequence of daughter / parent network hops. A correction of 1 is required, to account for those paths with one or two fewer change in direction (a boundary condition effect). A daughter or parental connection occurs with weight P d or P p = P d , except when a parental connection follows a daughter connection, when it instead contributes P p1 . So, since the number of merges could range from 0 to d we can write: This expression is compared to simulations on the infinite lattice in Fig. 13, and has the closed form expansion: where 2 F 1 denotes the Gauss hypergeometric function, and σ = kP d and β = (k−1) 2 Pp1 k 2 P d as before.
Since in the limit of large d, c d e f The isotropic correlation length diverges with a power-law of (qc − q) −1 .

Alternative correlation lengths
An alternative parallel correlation length ξ for random graphs is the length characterizing the decay of direct descendants of an active site. This can be measured with g(t, t) = σ t = exp[−t/ξ d ] where ξ d = −1/ ln(σ) denotes the descendant correlation length. This correlation length is given strictly by the branching ratio, σ, and diverges as ξ d ∼ |q(σ = 1, p) − q| −1 . However, the σ = 1 line on which ξ d diverges is well into the super-critical regime, save for the singular point p = 0 and q = 1 k . We've considered a correlation length ξ ⊥ that corresponds to the directed percolation perpendicular correlation length. ξ ⊥ is anisotropic, which may appear to make it unrelated to the isotropic correlation length of undirected percolation. However, the presence of a diverging anistropic correlation length implies that any isotropic correlation length will also diverge. To illustrate this, consider the correlation length, ξ defined by the mean number of sites active after d (isotropic) network hops away from an active node, i.e. g iso (d) =  g(d, 0) ∼ exp[−d/(2ξ ⊥ )], which tends to a constant as q → q c , we know therefore that ξ also diverges in the same limit, as can be seen in Fig. 14.
Appendix H: Static model Many diseases exhibit immunity after spreading. In this paper, we have predominately considered a model with re-excitable nodes, corresponding to the SIS model. It is, however, also of interest to consider diseases that might have multiple initiation points, but which cannot reinfect an individual after they have contracted the disease. This is a modification of the canonical SIR model. To consider such diseases, we initially infect some fraction p of the network. Each connection between nodes transmits the infection with probability q. The analogy to our initial model is not exact because we have disposed of the temporal aspect of the model, i.e. de novo infections are not repeatedly introduced to the system. Nonetheless, there are two modes by which the disease grows -a period of spreading followed by the merging of large clusters. Unsurprisingly, this model also shows two power-laws in the cluster size, with an exponent of −2.5 for q → 0 and an exponent of −1.5 for p → 0 (cf. Fig. (15)) on directed k-regular networks in the percolation and directed percolation limits respectively.
In contrast to our results with the SIS model, we anticipate that in SIR this transition is more sensitive to the details of the network. Immunization plays a role when activity traverses a loop in the network. Percolation on directed k-regular networks is mean-field, and so loops play little role. However, for networks where loops are prevalent (such as small-world networks, or undirected networks) immunization may have a stronger effect. It is known, for instance, that the SIR model falls into the dynamical percolation universality class and shows an undirected percolation phase transition on undirected networks [1,97]. The directed percolation exponents are only demonstrated with SIR on directed networks [36]. It would be interesting to see which aspects of the spreading-merging transition remain, if the SIR model were to be simulated on undirected networks.

Appendix I: Additional figures
In this section we include several figures to supplement the main text. Fig. 16 contains un-scaled avalanche distributions for the hierarchical modular networks and 10-regular networks corresponding to Fig. 3. Similarly, Fig. 17 consists of the exponent transitions for 10-regular graphs without the exponent transitions presented in Fig. 5.