Binary-state dynamics on complex networks: Stochastic pair approximation and beyond

Theoretical approaches to binary-state models on complex networks are generally restricted to infinite size systems, where a set of non-linear deterministic equations is assumed to characterize its dynamics and stationary properties. We develop in this work the stochastic formalism of the different compartmental approaches, these are: approximate master equation (AME), pair approximation (PA) and heterogeneous mean field (HMF), in descending order of accuracy. Using different system-size expansions of a general master equation, we are able to obtain approximate solutions of the fluctuations and finite-size corrections of the global state. On the one hand, far from criticality, the deviations from the deterministic solution are well captured by a Gaussian distribution whose properties we derive, including its correlation matrix and corrections to the average values. On the other hand, close to a critical point there are non-Gaussian statistical features that can be described by the finite-size scaling functions of the models. We show how to obtain the scaling functions departing only from the theory of the different approximations. We apply the techniques for a wide variety of binary-state models in different contexts, such as epidemic, opinion and ferromagnetic models.


I. INTRODUCTION
Binary-state models on complex network are a very general theoretical framework to study the effect of interactions in the dynamics of a population of individuals.They are composed by a set of nodes that are connected between them through a particular random network, where each node holds a binary ("spin"-like, two values) variable that evolves in time by some given transition rates.Typical problems that can be mapped in this scheme include models of epidemic spreading [1][2][3][4], language competition [5][6][7][8][9], social interaction [10][11][12][13][14][15], financial markets [16][17][18][19][20][21], among many others.
Recently, there has been a lot of effort in the development of highly accurate mathematical descriptions of the dynamics of these models.Typically, we can distinguish between two types of approaches depending on the variables that one chooses to describe the system: (i) individual based-approaches [22][23][24][25][26], where the "spin" or state of each node of the network is considered as an independent variable, (ii) compartmental approaches [11,[27][28][29][30][31][32], where nodes sharing the same topological property such as, for example, the number of neighbors in the network, are aggregated in a single variable, being this an integer (occupation) number.Depending on the level of description, i.e. the number of variables and its nature, one distinguishes between different compartmental approaches: approximate master equation (AME) [30,31,33,34], pair approximation (PA) [35][36][37][38][39][40] and heterogeneous mean field (HMF) [27][28][29].Only the individual node and AME approaches can be considered as a complete description of the models, while the PA and * Electronic address: afperalta@ifisc.uib-csic.esHMF introduce constraints between variables which may or may not be fulfilled, thus they are generally a worse approximation.Except in a few cases where fluctuations are taken into account at some extent [41,42] and in its completeness [25,26], the approaches are usually followed by a deterministic type of description [23,27,29,43], where the stochastic nature of the models defined by the individual transitions rates is neglected.The deterministic approach enables one to obtain some important quantities of the models such as the critical point (e.g. the epidemic threshold), or the time evolution of the global state of the system (e.g. the density of infected population).The accuracy and suitability of the different approaches has been widely discussed in the literature.For example in the determination of the epidemic threshold, it has been shown that the two approaches, individual and compartmental, may give contradictory results [44,45] and a general recipe for choosing one or another was given in [46].
Although the deterministic approach gives us relevant information in all situations, it is accurate only in the strict infinite system size limit.Depending on the model, the variables chosen, the values of the parameters and the network, the difference between the deterministic approach and the numerical results may be very important in finite networks [47].Finite-size effects become relevant even for extremely large system sizes, specially if the system is close to a critical point, or the network has high degree heterogeneity.Besides, there are some types of models where the deterministic approach does not provide the relevant information sought.For example, the noisy voter (Kirman) model [48][49][50][51] is an opinion model that considers neighbor imitation and random switching of opinion as basic ingredients.Different versions of the model have been applied in many different contexts, the most important in our perspective being the study price fluctuation in financial markets [16,17] and vote share distributions in electoral data [52][53][54].In this context, the global opinion does not take a fixed deterministic value but shows heavy fluctuations around the mean instead.The statistics of these fluctuations are the most important feature of study, as it shows deviations from the Gaussian behavior (the global variable distributes as a Beta distribution [16,50]) with very similar properties to financial series and vote share fluctuations.The model has a finite-size critical point that vanishes in the thermodynamic limit and thus a stochastic approach is mandatory in order to achieve the correct characterization [19,32].Additionally, the noisy voter model is of major importance because of its simplicity and the possibility of obtaining analytical results, which are helpful to fully understand its properties.Recent generalizations of the model include: the effect of non-linear copying mechanisms [55][56][57], non-Markovian memory effects [58][59][60], zealots [61] and contrarians [62], more than two states [53,63,64], the role of different noise and copying mechanisms in the nature of the transition (continuous or discontinuous) [65,66], etc.
The main aim of this work is to give a general theoretical approach to binary-state models on complex networks that takes into account stochastic effects, going beyond simple incomplete deterministic approaches.With this intention we will first find the general master equation of the individual and compartmental approaches.The master equation corresponds to a full characterization of any Markovian process and one can derive easily the deterministic equations from it [67,68].Although the master equation of the individual and AME approaches give a very accurate result, they are hardly impossible to solve in most situations even computationally [69].In order to overcome this issue and obtain at least an appropriate approximate solution, we will apply different expansion techniques of the master equation.The first one is a van Kampen-like system-size expansion [67,[70][71][72][73], where the variables are split between its deterministic value plus finite-size corrections.The van Kampen approach can be understood as an expansion in inverse powers of system size, and it is known to be accurate far from criticality with increasing accuracy when the system size increases [73].As a natural extension of the van Kampen expansion we will also apply the Kramers-Moyal expansion [67] and derive the corresponding (continuous) Fokker-Planck equation from the original master equation.Close to a critical point, the finite-size effects of the models are well captured by their finite scaling functions [40,42,47,56,74] which can not, a priori, be derived from the van Kampen expansion.The correct way of obtaining the theoretical scaling functions is to apply a similar system-size expansion of the master equation but with an anomalous scaling with system size [75,76].We will check the accuracy and suitability of the expansions as a function of the parameters for different models and networks.
The paper is organized as follows: in Section II we in-troduce the general definitions and notation of binary state models and the main characteristics of the networks.In Section III we construct the master equation for individual and compartmental approaches.In Section III we apply the van Kampen expansion to the general master equation and we show its connection to the Kramers-Moyal expansion.We re-derive the deterministic nonlinear equations [31], together with a set of linear equations for the correlations and average values of the stochastic corrections.In Section IV we compare the results of the numerical simulations in the stationary state of small systems with the theoretical results of the van Kampen approach for different models and networks.In Section V we apply the expansion of the master equation close to a critical point and compare it to the scaling functions obtained numerically in Section V.In Section VI we extend the numerical results to time dependent quantities, together with its comparison to the van Kampen results.We end with a summary and conclusions in Section VII.In the appendices we explain the intermediate steps in the derivation of the equations: in Appendix A we write out the expressions of the matrices involved in the equations of the van Kampen and Kramers-Moyal expansions, while Appendix B contains the details of the expansion around the critical point.

II. GENERAL ASPECTS, MODELS AND NETWORKS
A binary-state model is composed of a population of size N , where each member of the population can be in two states 1, "adopter", or 0, "non-adopter".Depending on the model and the context, the states may represent different properties of the individuals, for example magnetic spin, opinion on a topic, spoken language, infection state, etc.This is naturally described by a set of timedependent binary variables n(t) ≡ {n i (t) = 0, 1} i=1,...,N .Each individual i = 1, . . ., N is regarded as a node of a, single-connected, undirected network, which can be mapped into the usual (symmetric) adjacency matrix A = {A ij }, with coefficients A ij = 1 if nodes i and j are connected and A ij = 0 otherwise, where self-loops are avoided A ii = 0.The degree of node i is defined as the total number of nodes connected to it The degree k can be heterogeneous within the population and one defines the number N k of nodes with degree k, and the associated fraction P k = N k /N called degree distribution.It is also useful to define the moments of degree m as µ m = k∈[kmin,kmax] P k k m , with short notation µ 1 = µ, which corresponds to the average degree.Networks are assumed to be generated by the configuration model [77], with fixed degree distribution P k , which produces uncorrelated networks if k max ≤ √ µN (no degree-degree correlations and no triangles).
The dynamical model under study is defined by the in-dividual rates r ± i , which determine the time evolution of the state variables n i (t).They are defined as the probability per unit time that the transition n i = 0 → 1 occurs, with rate r + i , and n i = 1 → 0, with rate r − i .The rates may depend, in general, on the full set of states r ± i (n), however, most common models assume a dependence only through the number of neighbors in state 1, A ij n j , in addition to the total number of neighbors k i .For this reason we will term the individual rates as r ± i (n) ≡ R ± ki,qi , depending only on k i , q i .In our study we will focus on global quantities, such as the total density of nodes in state 1, m ≡ 1 N N i=1 n i ∈ (0, 1).For symmetrical models R + k,q = R − k,k−q it is more natural to define the magnetization as m S ≡ 2m − 1 ∈ (−1, 1) and we will use one quantity or another depending on the symmetries.The density of active links ρ, i.e. links connecting nodes in different states, is computed as One of the interesting properties of ρ for binary-state models is that it can be used as an alternative to m or m S to measure the level of order or agreement on one of the options, a situation in which ρ approaches zero, independently of the option.The stochastic dynamics produces variability across realizations/trajectories of the stochastic process.For this reason, one performs an average over realizations of the macroscopic quantities m(t) , ρ(t) to characterize the global state of the system.A way to measure fluctuations and variability across realizations is by calculating the variance of the magnetization: which is also traditionally called magnetic susceptibility in spin models, as it also quantifies how the system responds to an external perturbation such as a magnetic field.Note that after the average over the ensemble of realizations/trajectories is produced, one usually performs additional averages over the ensemble of networks generated with the configuration model with the same degree distribution P k .This is because we consider the degree distribution as the only relevant characteristic of the network.

III. THE MASTER EQUATION
The most detailed characterization of models whose dynamics is defined by stochastic rules is achieved by the knowledge of the probability P (x, t) of finding the system in state x at time t.The time-evolution of this probability is governed by a master equation.In order to construct a general master equation we consider: (i) a set of integer variables x ≡ (x 1 , . . ., x M ), and (ii) a set of processes ν = 1, . . ., K characterized by the changes in the variables x j → x j + (ν) j , j = 1, . . ., M , with rates W (ν) (x).Once we have these ingredients the general master equation reads [67,68,78]: where E j is the step operator acting on any function f (x) of the variable x j as E j [f (x 1 , . . ., x j , . . ., x M )] = f (x 1 , . . ., x j + , . . ., x M ).For example, if we choose to include in our description the full set of node-state variables x = n, we have the following K = 2N processes: ν = (i, +) where n i = 0 → 1, and ν = (i, −) where n i = 1 → 0, for i = 1, . . ., N .The changes in the variables are (i,±) j = ±δ i,j and the respective rates When the individual rates r ± i depend only on the number k i of neighbors and the number q i of those in the state 1, r ± i = R ± ki,qi an alternative to the description based on the full set of node-state variables is to consider a compartmental approach also known as AME [94].This mesoscopic description in terms of the number of nodes with the same transition rate, was studied in detail in [30,31] and generalizations of this approach have been developed for multi-state models [79] and weighted networks [80].The occupation numbers are defined as the number of nodes x ≡ {N n,k,q } that are in state n = 0, 1 and have degree k = k min , k min + 1, . . ., k max among which q = 0, 1, . . ., k are adopter neighbor nodes (nodes in state 1).The level of description consists of M = k,q 2 = (1 + k max − k min )(2 + k max + k min ) variables, which are not all independent.The total number of nodes that have degree k is fixed by the network, i.e.N k = n,q N n,k,q , which constitutes a total of k max − k min + 1 constraints between variables.Another more subtle constraint is that in an undirected network the number of 0-1 links is equal to the number of 1-0, i.e.
k,q qN 0,k,q = k,q (k − q)N 1,k,q .Interestingly, in the limit of uncorrelated networks k max ∝ √ N it is M ∝ N , which indicates that the number of variables is of a similar magnitude compared to the nodestate approach.Consequently, the occupation number approach will correspond to a significant decrease in the number of variables only when the degree distribution extends over a limited range of degree values k max √ N .The global variables of interest, used to portray the macroscopic state of the system, are the total number of adopter nodes N 1 = k,q N 1,k,q and the number of active links (connecting nodes in state 0 to 1 or viceversa) L = k,q qN 0,k,q , and their respective densities m = N 1 /N , ρ = 2L/(µN ) defined in Section II.
In this occupation number approach {N n,k,q }, however, the construction of the master equation is more cumbersome as we need to identify the possible processes ν and the associated rates W (ν) , and this will be our con- 3)
cern in the remainder of this section.Still, the elementary process of the dynamic is the state transition of a node i compatible with the number N n,k,q changing from n i = 0 to n i = 1 or viceversa, but all processes that lead to the same change of the occupation number variables are grouped under the same label ν.In an elementary process, 2(k+1) changes of the set of description variables {N n,k,q } are produced, two for the variables associated to the central node and two for each one of its neighbors, see Fig. 1 as a schematic example.The variables that change during this process depend on the values k, q of the chosen node i, and additionally on the set {k j , q j } j=1,...,k of the k neighbors of i.We adopt to order the list of neigh-bors such that {k j , q j } j=1,...,k−q correspond to the neighbors in state 0, and {k j , q j } j=k−q+1,...,k to the neighbors in state 1.Therefore, the characterization of a process ν requires of the knowledge of the full set of variables, i.e. ν = (n, k, q, {k j , q j } j=1,...,k ).The problem now is that, in principle, one is not able to know from the variables {N n,k,q } the set {k j , q j } j=1,...,k , and consequently we need some approximation to attain a closed treatment of the dynamics.We make the ansatz that the rate of each process is calculated as the total change rate of the central node N n,k,q R ± k,q times the probability of having a particular configuration of the neighborhood, this is: Here, we introduced P 0 (1, k j , q j ), defined as the probability that an edge leaving a node in state 0 connects to a node in state 1 with k j , q j , and equivalently for P 0 (0, k i , q i ), P 1 (0, k i , q i ) and P 1 (1, k j , q j ).These probabilities can be calculated using the description variables N n,k,q as: For example, P 0 (1, k, q) is the fraction of edges coming out of nodes in state 1 with k, q that go to nodes in state 0, divided by the total number of 0-1 edges and similarly for the other expressions.Note that the approximation in this method is that we assume the probability of the neighborhood configuration of a node to be a product of independent single event probabilities, which is of general validity for uncorrelated networks.
We now define n,k,q as the change in the variable N n,k,q → N n,k,q + (ν) n,k,q in the process ν, which are computed as (see Fig. 1 for a guide): Once the processes ν, rates W (ν) and changes in the variables n,k,q are defined, we can draw on the general theory of stochastic processes [67,68,73,78] in terms of the master equation (3).
Coarser levels of description are also possible.Let N n,k = q N n,k,q be the number of nodes in state n with degree k, and L n,k = q qN n,k,q the number of links that connect nodes of degree k and state n with nodes in state 1 (adopter nodes).The next level of description is the Pair Approximation (PA) that considers the set The pair approximation reduces the number of variables to M = 3(k max − k min + 1) with the conservation of the total number of 0-1 links k L 0,k = k (N 1,k k − L 1,k ) as the only constraint.The master equation requires to write the rates W (ν) as a function only of the description variables.To achieve this, one introduces an approximation based on the ansatz that the variable N n,k,q appearing in the rates Eqs.(4, 5) can be expressed as where Bin k,q [p] = k q p q (1 − p) k−q is the binomial distribution.In this paper, we restrict our study to this version of the pair approximation, but other variants exist in the literature, such as the so-called heterogeneous pair approximation [38], where one includes in the description the number of active links L k,k that join nodes of degree k and k that are in different states, or the original version [37] (also called homogeneous pair approximation) that takes into consideration just the global density L of active links.
An even cruder level of description is the Heterogeneous Mean Field (HMF), which considers the set of vari- , reducing the number of variables to M = k max − k min + 1 with no constraints.The closure of the rates W (ν) in terms of this set of variables is achieved by a similar binomial ansatz but with a simpler single event probability: The coarsest possible description is the Mean Field (MF) in which a single description variable Note that while the formulation of the node-state approach does not need any approximations, the different occupation number approaches, whether AME, PA, HME or MF, use approximations in the calculation of the rates that limit the validity of their predictions.In particular, as discussed previously, we expect the AME to be accurate only for uncorrelated networks.The fact that the master equation for the node-state approach is free of approximations does not mean in general that we are able to solve such equation, and different approximations are then needed to obtain a solution [25,26].The advantage of the occupation number approach is that it has some particularities that enables us to apply accurate methods to solve the master equation.These different techniques are explained and explored in the next sections.

IV. APPROXIMATE SOLUTION OF THE MASTER EQUATION Formulation
The main reason of the convenience of the occupation number approach is that the description variables are extensive.This means that for a fixed degree distribution P k , if we increase the system size N → λN , the variables scale in the same way N n,k,q → λN n,k,q and similarly for N n,k , L n,k , N 1,k and N 1 .This property is useful because it allows us to apply the well known system-size expansions of the master equation.Note that the rates Eqs.(4,5) are extensive functions W (ν) (x) = N w (ν) x N , where N = n,k,q N n,k,q is the total number of nodes and w (ν) are the set of intensive rate functions.In this case, following [32,73], we can use a van Kampen type of system-size expansion, that we now explain in detail.
In the case of the AME, the expansion splits the variables as x = Nφ φ φ + N 1/2 a a a + N 0 b b b, in components N n,k,q = N φ n,k,q + N 1/2 a n,k,q + N 0 b n,k,q , where φ n,k,q are a set of deterministic variables, while a n,k,q and b n,k,q are random variables.This is an expansion which is assumed to be of general validity in the thermodynamic limit N → ∞ and which yields the first stochastic correction terms to the deterministic approach [30,31].The deterministic evolution of the system fulfills a set of nonlinear differential equations characterized by the drift term defined as Φ n,k,q (φ φ φ) n,k,q w (ν) (φ φ φ) which leads, after some algebra using Eqs.(4-13), to: Here β s , γ s , β i and γ i are the individual rates R ± k,q at which a neighbor of a central node changes state averaged with the probabilities Eqs.(6)(7)(8)(9), where the symbol β, γ reflects the state of the neighbor node 0, 1, while the super index s, i reflects the state of the central node 0, 1, namely: Note that, at the deterministic level, the set of differential equations (16)(17)(18)(19)(20)(21)(22) coincides with the original work of Gleeson [30,31], as it is naturally expected.The advantage of the stochastic formalism presented here Eqs.(3)(4)(5)(6)(7)(8)(9)(10)(11)(12)(13) is that, in addition, we will be able to obtain results for the average deviations a n,k,q , b n,k,q from the deterministic solution, and also for the fluctuations/correlations C n,k,q;n ,k ,q = a n,k,q a n ,k ,q − a n,k,q a n ,k ,q .In the van Kampen expansion, the set of differential equations for these quantities are linear and read in vector notation [73]: where B is the Jacobian matrix related to the Hessian matrices of Φ Φ Φ.For reason of space, the explicit expressions of these matrices are written down in Appendix A.
In the case of the Pair Approximation, and proceeding with the general theory, we split the variables like The evolution equation at the deterministic level is: In order to obtain the deterministic drift functions Φ Φ Φ, we have to perform sums in Eqs.(17)(18) as Φ k = q Φ 1,k,q and Φ n,k = q qΦ n,k,q , which leads to where one must replace φ n,k,q by the binomial ansatz The corresponding Jacobian B and G matrices of this Pair Approximation can be found in Appendix A.
In the case of the Heterogeneous Mean Field, the variable splitting in this case is and deterministic equation dφ k dt = Φ k , where the drift functions Φ Φ Φ are obtained by summing Eqs.(17)(18) like Φ k = q Φ 1,k,q , which leads also to but now φ n,k,q are given by φ 0,k,q Again, the corresponding Jacobian B and G matrices can be found in Appendix A.
In a previous work [73] we explained how to solve equations (23)(24)(25) and we developed a very stable and fast convergent implicit Euler method to find the numerical solution of the correlation matrix C. It is worth mentioning that a general result of the van Kampen expansion is that the stationary probability distribution Π st (a a a) of the first stochastic correction a a a is Gaussian [67,73] with zero mean a a a st = 0: Besides, if the initial condition Π(a a a, t = 0) is Gaussian, then the time-dependent Π(a a a, t) is also a Gaussian (32) replacing the stationary correlation matrix C st → C(t) by the time-dependent one.
The van Kampen expansion will be accurate in the thermodynamic limit N → ∞, for example in the determination of the magnetic susceptibility, this is: According to the van Kampen approach, the susceptibility defined as Eq. ( 2) does not depend on system size N , which is obviously not true for a finite system N .What we are obtaining in this approach is the thermodynamic limit lim N →∞ χ N .With respect to the average values of the macroscopic quantities m(t) , ρ(t) they are computed as: This is nothing but the deterministic solution plus a correcting factor of order O(N −1 ) (note that a a a = 0 Eq. ( 23)).
An alternative less restrictive system size expansion is the Kramers-Moyal expansion, which transforms the master equation (3) into a continuous PDE for the intensive variables ϕ ϕ ϕ.If we define the densities ϕ ϕ ϕ = x/N , the Kramers-Moyal expansion [67] leads to the Fokker-Planck equation [81] for the probability density Π(ϕ ϕ ϕ; t) of the intensive variables: where Φ i and G ij are the same functions defined previously.The problem with the Kramers-Moyal expansion is that, in most occasions, it is as complicated to solve as the original master equation (3), while the van Kampen expansion corresponds to a linearization of Eq. ( 36) (this is why it is also called linear noise approximation) where we assume ϕ ϕ ϕ to weakly fluctuate around the deterministic value φ φ φ, this is ϕ ϕ ϕ ≈ φ φ φ + N −1/2 a a a.
In the next subsection we apply the van Kampen expansion method to several models of interest and check its accuracy and validity.

Comparison with numerical simulations
We will now compare the results of the theory explained in the previous section to the numerical simulations.We will focus on stationary quantities in order to study how the results change with the parameters of the models.
The first model that we consider is the SIS (susceptible-infected-susceptible) epidemic model [1,82] on a scale-free network with rates R + k,q = ε + λq and R − k,q = µ.Here λ is the transmission rate, µ is the recovery rate, and ε is the rate at which an outbreak appears in the system.Note that we incorporate the parameter ε in order for the system to have a properly defined stationary result, in principle one recovers the traditional SIS model by letting ε → 0. In Fig. 2 we see that the van Kampen approach predicts accurately the stationary susceptibility for a small system size N = 100 with increasing accuracy as it increases to N = 400.The finitesize corrections to the average value ρ st are plotted in Fig. 2, with an improvement in the deterministic solution.If we focus on the comparison between the different approximations we observe, as expected, an increase in accuracy as AME > PA > HMF.Although the difference between the AME and the PA is small with respect to the deterministic φ φ φ and fluctuations χ, the finite size corrections of the average values N −1 b b b are well captured only by the AME.This indicates that the PA is a good approximation at the deterministic and linear (Jacobian) level Eq.(23,25) but not for the second order correction (Hessian) level Eq. ( 24).
The second model to which we apply our theory is the Ising model defined on an Erdős-Rényi network with Glauber rates [83 where J is the coupling strength and T the temperature.Note that it is a symmetric model and thus, as discussed in Section II, we choose m S = 2m − 1 to calculate the susceptibility.It is well known that the Ising model has a critical point and χ st = N m 2 S st and eliminate those points to the right of the peak of the first expresion.In Fig. 3 we see a good prediction of the stationary susceptibility for the small system N = 100 with increasing accuracy for N = 400.Note that the theory predicts the divergence of the susceptibility at the critical point T c , as χ st ∝ |T − T c | −γ , which can be strictly true only in the thermodynamic limit N → ∞.For a finite system it can be shown, see Section V, that if we approach the critical point as an inverse power of the system size |T − T c | ∝ N −r , the susceptibility scales as a positive power χ st ∝ N 2υ−1 , with appropriate exponents r > 0 and υ > 1/2 determined in Section V.This implies that the van Kampen expansion presents discrepancies with the numerical results that are important, for a finite system N , in a region of the critical point whose width decreases with system size.Similarly, we see in Fig. 3 that the stochastic correction to the density of active links ρ st is only accurate outside the critical region, while it diverges at the critical point, at odds with the numerical result.In the comparison between the different approximations we observe again an increase in accuracy as AME > PA > HMF.As proven in [31] the deterministic part of the PA and AME are completely equivalent for all models fulfilling the microscopic reversibility condition where c is a constant, for the Ising Glauber this is the case with c = e 4J/T .We also observe that the AME and the PA offer results for the susceptibility which are indistinguishable at the resolution of the figure.Although the AME and PA agree at the deterministic and fluctuation level, the finite size corrections to the average values N −1 b b b are only accurate for the AME, confirming the results obtained for the SIS model.
The third model is the majority-vote model [84] on a z-regular network with rates where Q is the rate of spontaneous opinion switching.It is also a symmetric model and it has similar phenomenology to the Glauber model with a critical point Q c , see Fig. 4. The most notorious difference is that in this case the AME and PA results are very different even at the deterministic level, and thus for this model only the AME gives more accurate results.The reason for this difference is that the rates do not fulfill the microscopic reversibility condition, see [31].Note also in Fig. 4 that the AME and PA predict similar critical points Q c , but the scaling |m S | st ∝ (Q c − Q) β is β = 1/2 for the AME and β = 1/4 for the PA.In fact, according to our discussion in the next section, the scaling behavior of the magnetization and susceptibility around a critical point depends on the normal form of the bifurcation.For example, for a typical continuous phase transition as in the Ising model we have β = 1/2 and γ = 1 which corresponds to mean-field exponents.This justifies the common knowledge that critical exponents in complex networks coincide with those of mean-field  33), while in the right panel the solid line is the deterministic approach and the dashed lines the corrected average values Eq. (34).theory, see [85] where the critical exponent of the heat capacity is determined to be α = 0 (discontinuous heat capacity, which is the mean field result) for the Ising model in a small-world network.Note that this is true as long as the deterministic solution depends on degree moments µ m that are well defined in the thermodynamic limit N → ∞.This may not be the case on scale free networks [31], where the deterministic solution may depend on the µ 2,3,4,... degree moments that diverge, depending on the value of the exponent of the power law degree distribution, as N → ∞.In this case, this may imply that the critical exponents depend on the details of the degree distribution [86].As explained in the next section, one may redefine the finite-size scaling functions and critical exponents to take into account the N -dependence of the degree moments.Ising model with Glauber rates.We choose as parameter J = 1 on an Erdős-Rényi network with average degree µ = 5.Points correspond to numerical simulations of the model with N = 100 (solid squares) and N = 400 (empty squares) averaged over an ensemble of 100 networks.Lines of different colors are the theoretical prediction of the different approximations (the curves that do not appear are superposed).In the left panel the solid lines are the van Kampen result Eq. (33), while in the right panel the solid line is the deterministic approach and the dashed lines the corrected average values Eq. ( 34).
In the next section we propose a different method for solving the master equation close to a critical region, where the van Kampen expansion fails.We also show how to determine the exponents and scaling properties of the models close to a critical point, for a finite-system and also in the thermodynamic limit N → ∞.

V. THE EXPANSION AROUND A CRITICAL POINT Formulation
Usually, the rates of the model depend on a set of parameters.Take, for example, a single parameter T for 0.0  33), while in the right panel the solid line is the deterministic approach and the dashed lines the corrected average values Eq. (34).
simplicity.It may happen that at a determined value T = T c , one of the eigenvalues of the linearized deterministic dynamics becomes equal to zero D 1 = 0, this is called critical or bifurcation point.The proposed system size expansion x = Nφ φ φ + N 1/2 a a a + N 0 b b b in this case leads to singular, divergent, results for the correlations and average value corrections [73].The mathematical divergence of the correlations near the critical point is an accurate description only in the strict thermodynamic limit N → ∞.When N is finite, near the critical point, we have an anomalous scaling with system size, which implies that we have to consider a different ansatz for the system size expansion [32].
In order to deal with such situations, we start by finding the linear transformation that diagonalizes the Jacobian matrix B st = PDP −1 being D the diagonal matrix composed by the eigenvalues and P the matrix of change of basis whose columns are the corresponding eigenvectors, all evaluated at the critical point T = T c .We define the transformed variables in the eigenvector basis u = P −1 φ φ φ, such that the deterministic dynamics of the new variables is At the critical point we have U i (T c , u st ) = 0 and The Center Manifold Theory [87][88][89] states that, in this case, there exists a special trajectory or center manifold u i = h i (T, u 1 ) for i = 1 with u st i = h i (T, u st 1 ) and ∂ u1 h i (T c , u st 1 ) = 0, that describes locally the dynamics of u close to the critical point T c and near the fixed point u st .This implies that the time dependence of the fast variables u i>1 (t) is enslaved to the slow variable u 1 (t).We can write the dependence of h i (T, u 1 ) as a series expansion where the other terms of the expansion are neglected, for example (T The dynamics of u 1 inside the center manifold is u1 = U 1 (T, u 1 , h 2 , h 3 , . . . ) whose series expansion reads where β (0m) , m ≥ 2, and β (1n) , n ≥ 0, are the lowest non-zero terms in the expansion in powers of (u 1 − u st 1 ) and higher-order terms are neglected.The expressions of the first coefficients β (10) , β (11) , β (02) , β (03) are given in Appendix B.
Equation ( 39) is called the normal form of the bifurcation [89] and depending on the value of the coefficients it characterizes three types of critical points/bifurcations.If β (10) = 0 the bifurcation is a saddle node; while if β (10) = 0, but β (11) = 0, the bifurcation is said to be transcritical for m even, or pitchfork for m odd.
From the normal form one can determine the critical exponent β.Setting the time derivative of Eq. ( 39) equal to zero and keeping in mind that u st 1 refers to the fixed point at the critical point u st 1 (T c ) we obtain: (i) Saddle node, 0 = β (0m) (u st 1 − u st 1 ) m + β (10)  1) (m even for the transcritical and odd for the pitchfork).
Thus β = 1/m for the saddle and β = 1/(m − 1) for the transcritical and pitchfork bifurcations.Note that in Eq.( 39) we only keep the two most important terms of the expansion to study the behavior of the stable fixed point close to the transition, and the others can be neglected.This can be checked introducing the first order result the expansion Eq. ( 39) and evaluating the order of each term.
Once this is understood, we propose the following system-size expansion based on the results of [32,75,76].If we approach the critical point as (T − T c ) ∼ N −r , 0 < r < 1 and we define the transformed y i variables as y i = j P −1 ij x j , then y i follows the center manifold with small deviations of order N 1/2 , while the stochastic part of y 1 has an anomalous scaling N υ , 1/2 < υ < 1, namely: Note that r and υ are parameters to be determined and that fluctuations inside the slow center manifold are assumed to scale differently that fluctuations outside it.Using this change of variables (T, y 1 , y i>1 ) → (ξ 0 , ξ 1 , ξ i>1 ), we can expand the master equation (3) in powers of N , this is done in detail in Appendix B. During the expansion we determine that for a saddle node, β (10) = 0, it is r = υ = m m+1 , while for the transcritical or pitchfork bifurcations, β (10) = 0, it is υ = m m+1 , r = m−1 m+1 .After the expansion of the master equation we obtain a Fokker-Planck equation for the probability Π(ξ 1 ; t) of the slow variable ξ 1 , which for the transcritical and pitchfork bifurcation reads with a noise intensity F 11 = i,j P −1 1i P −1 1j G ij .For the saddle node we obtain the same equation but replacing β (11) ξ 0 ξ 1 by β (10) ξ 0 .Note that the equation evolves at a slow time scale τ = N (m−1)/(m+1) , this is known in the literature as critical slowing down.In the stationary state for the transcritical and pitchfork bifurcations we have Π st (ξ 1 ) ∝ exp β (11) This corresponds to a Gaussian distribution with a saturation term that the van Kampen approach does not take into account.For the saddle node one should replace β (11) ξ 0 ξ 2 1 by 2β (10) ξ 0 ξ 1 and the distribution is no longer Gaussian.Note that if m is even we can not integrate the probability Eq. ( 44) in the entire range of ξ 1 and we have to restrict it to the "stable" zone, where fluctuations are not big enough to drive the dynamics to a zone where the deterministic dynamics is unstable and evolves towards infinity.
Any moment of the y 1 variable can be computed integrating the distribution Eq. ( 44), for example the variance: where σ 2 [ξ 0 ] is the variance of the ξ 1 variable.Now, the average values and correlations of the x variables can be related to the transformed variables y as: and from this it is straightforward to determine m st , ρ st and χ st with the definitions given in Section II.
In the thermodynamic limit N → ∞, one can show that the van Kampen result is recovered naturally.Take for example a pitchfork bifurcation with m S st = 0.In this case, according to Eq. (46,47) the scaling properties of |m S | st and χ st with N are where m(ξ 0 ) and χ(ξ 0 ) are the respective scaling functions determined from Eqs. (44-47).In the limit N → ∞ of Eqs.(48,49), the argument ) and if we assume the scaling relations m ∼ ξ β 0 and χ ∼ ξ −γ 0 , with appropriate exponents β and γ such that |m S | st and χ st are N -independent, we obtain con- Another quantity that is of great interest and that we will use in the next section is the Binder cumulant, defined as the ratio of moments It is easy to show that the scaling of this function is given by: where is nothing but the Binder cumulant of the ξ 1 variable, that can be determined using the probability Eq. ( 44) (note that this is independent of the eigenvector coefficients P i1 , since they cancel out when computing the ratio of moments).
In the next section we apply this method to the models studied in Section IV and we check if it corrects the problems of the van Kampen expansion in the critical zone.The SIS model can not be studied using these techniques because the model has an absorbing state for ε = 0, and the noise intensity becomes equal to zero in this case, F 11 = 0 (evaluated at the absorbing state and at the critical point ε = 0, λ = λ c ) and thus, the stationary probability Eq. ( 44) is ill-defined.The finite-size scaling of this type of epidemic models is non-trivial and requires other type of techniques [4].We will thus consider only the Ising Glauber and majority-vote models.
The scaling functions m and χ generally depend on the degree moments µ m which may scale with systemsize N in a non-trivial way for certain types of highly heterogeneous networks, such as scale-free.It is possible, depending on the model, to reabsorb this N −dependence by redefining the scaling functions Eqs.(48,49), see [56], and this may imply network dependent critical exponents β, γ, see [86].

Comparison with numerical simulations
We will start with the Ising model with Glauber rates for the network and parameter specifications in the caption of Fig. 3.The critical point predicted by the AME and PA approximations is T c (AME/PA) = 4.93 . . .while for the HMF it is T c (HMF) = 4.96 . . .In order to determine the critical point numerically from the Monte Carlo (MC) simulations, we use a standard technique of statistical mechanics [90], which consists in computing the Binder cumulant defined as for different system sizes N , such that all the different curves cross at the critical T c , see Fig. 5.We obtain in this case T c (MC) = 4.93 ± 0.01 in perfect accordance to the AME/PA results.After computing the coefficients of the normal form of the bifurcation Eq. ( 39) we obtain for all three approaches AME/PA/HMF that β (10) = 0, β (11) < 0, β (02) = 0 and β (03) < 0 which indicates that, according to our discussion in Section V, we have a pitchfork bifurcation with m = 3, r = 1/2 and υ = 3/4.If the theory is correct, and the scaling properties Eqs.(48)(49)(50) are valid, if we rescale |m S st , χ st , U 4 by N 1/4 , N 1/2 and N 0 respectively, and the temperature by N 1/2 , all the curves should collapse on a single universal one m(ξ 0 ), χ(ξ 0 ) and u(ξ 0 ).In Figs. 5 and 6 we compute this numerically and compare it with the theoretical scaling functions derived from Eq. ( 44).The matching between numerical and theory is very good which proves the validity of the method.Note also that the scaling functions of the AME and PA coincide, while the HMF shows some deviations.This indicates that there is a strong relation between the validity of the deterministic solution and the scaling functions.The parameters are J = 1 on an Erdős-Rényi network with average degree µ = 5 and the results were averaged over an ensemble of 100 networks.Points correspond to numerical simulations of the model with different system sizes N specified in the legend, while lines are the theoretical scaling functions determined from Eq. (44,50).
The next model that we study is the majority-vote model with the same specifications of Fig. 4. The critical point predicted by the AME is Q c (AME) = 0.099 . . ., for the PA it is Q c (PA) = 0.100 . . .and the HMF is Q c (HMF) = 0.167 . . . .The numerical critical point obtained from the Binder cumulant in Fig. 7 is Q c (MC) = 0.100±0.01,compatible with the results of the AME and PA but not with the HMF.When we compute the coefficients of the normal form of the bifurcation Eq. ( 39) we obtain β (10) = 0, β (11) < 0, β (02) = 0 and β (03) < 0 for the AME and HMF, which corresponds again to a pitchfork bifurcation with m = 3, r = 1/2 and υ = 3/4.Surprisingly, for the PA we obtain instead β (03) = 0 which suggests a different type of pitchfork with m = 5, r = 2/3 and υ = 5/6.This could be already seen in Fig. 4, as  ) and average magnetization N 1/4 |mS| st as a function of N 1/2 (T −Tc) on an Erdős-Rényi network with average degree µ = 5 for the Glauber Ising model with J = 1.Points correspond to numerical simulations of the model with different system sizes N specified in Fig. 5, while lines are the theoretical scaling functions determined from Eqs. (44,48,49).
the scaling properties in this case.In Figs. 7 and 8 we compared the theoretical scaling functions with the numerical simulations, where we see that the theoretical scaling of the AME offers a reasonable agreement.Note, however, how the convergence to the theoretical scaling is very slow for Q < Q c .This is because for the AME β (03) is small, and the other higher order terms of the normal form Eq. ( 39) may be important, unless N is extremely large.This also explains the failure of the PA that actually predicts β (03) = 0.

VI. TIME DEPENDENCE
In the previous sections we have focused on stationary averages.The methods, however, are straightforwardly generalized for time dependent results.For the van Kampen approach, we have to solve the deterministic dynamics dφ φ φ dt = Φ Φ Φ and, at the same time, the dynamics of the average values and correlations Eqs.(23)(24)(25).On the other  (44,50).The finitesize scaling for the PA result is not displayed as it predicts incorrect scaling properties.
hand, if we are close to a critical point in the parameter space, we assume that dynamics evolve following the center manifold and we have to solve Eq. ( 43), obtaining Π(ξ 1 ; t).This corresponds to a separation of time scales, which implies that the dynamics outside the manifold is very fast compared to the dynamics inside and thus negligible.We will apply these methods to two different models of interest, not considered in the previous sections, the SI (susceptible-infected) epidemic model and the Threshold model.We chose these models because their dynamics are more interesting than the stationary properties.
We start with the application of the van Kampen expansion for the SI epidemic model with rates R + k,q = λq and R − k,q = 0 for a very small system of N = 25 nodes, see Fig. 9.For the susceptibility χ(t) the AME and PA give a good approximation with slight differences between both approaches, while the HMF shows important 0.0 0.5 1.0 ) and average magnetization N 1/4 |mS| st as a function of N 1/2 (Q − Qc) on a 3regular random network for the majority-vote model.Points correspond to numerical simulations of the model with different system sizes N specified in Fig. 7, while lines are the theoretical scaling functions determined from Eq. (44,48,49).The finite-size scaling for the PA result is not displayed as it predicts incorrect scaling properties.
discrepancies.For the average value m(t) , the deterministic AME and PA give the same results, while HMF again shows discrepancies.We observe that, similarly to what happens in the stationary state Fig. 2, although the deterministic part of the AME and PA part is equal, the stochastic corrections happen to be only accurate for the AME approach.
For the Threshold model [91,92], with rates R + k,q = 1 if q ≥ M k and R + k,q = 0 if q < M k (where M k is a set on integer parameters), R − k,q = 0, we conclude that the methods presented in this paper are not appropriate for all times t.In Fig. 10 we see that the van Kampen system size expansion is accurate only for the AME approach and until a certain time t < t c .After that point fluctuations increase with system size and finite size effects become very important.The numerical results of Fig. 10 suggest that t c increases with system size and so do the finite size corrections, at variance with the traditional system size expansion x x x(t) = φ φ φ(t) + b b b(t) N .In this special case we should apply alternative methods to solve the Master equation (3) for t > t c , that are beyond the scope of the current paper.

VII. SUMMARY AND CONCLUSIONS
In this paper, we have introduced theoretical tools to study stochastic effects in binary-state models on complex networks.First, we constructed the general master equation of the different compartmental approaches: approximate master equation (AME), pair approximation (PA) and heterogeneous mean field (HMF).After that, we elaborated on the different approximate methods for solving the master equation, in particular we explored the van Kampen expansion, valid far from a critical point, and a critical expansion, accurate at the critical zone.From the van Kampen expansion we were able to obtain equations for the correlation matrix of the set of variables and the corrections to their average values, while from the critical expansion we got their finite-size scaling functions.
We applied these techniques to characterize the stationary properties of the SIS epidemic, Glauber Ising and majority-vote models.When comparing the performance of the different compartmental approaches to numerical simulations we conclude that, if AME and PA have equal or similar results at the deterministic level, the same goes for the fluctuations of the van Kampen expansion and the scaling functions of the critical expansion, but that is not the case for the finite-size corrections to the average values which are only accurate for the AME.This is what we observe for the Glauber model where PA and AME have equivalent deterministic, fluctuations and scaling functions but different finite-size corrections to the average values.This is an indication that, although the PA may work very well in the determination of certain quantities such as average values, the binomial restriction between variables is not necessarily fulfilled by the stochastic trajectories.For the majority-vote model, the AME and PA give different results at all levels, where the PA even predicts incorrectly the scaling coefficients (critical exponents).In general, we can highlight that the probabilistic description using the AME gives very accurate results for stationary and also time-dependent results (as it is shown in Section VI for the SI epidemic model) within the range of validity of the expansion methods.Certain type of models may not fit in the scope of the expansion methods and their characterization is left for further studies, this is the case of the Threshold model.For t < t c (with t c increasing with N ) the van Kampen expansion gives correct results, but for long enough times t t c finitesize effects become more important and deviate from the predicted value.
The solution of the equations for the average values and the fluctuations has been performed numerically using an efficient method developed in [73].It is left for a future work to explore the possibility of obtaining analytical results of the models and the general conditions for the AME-PA equivalence at the stochastic level.A particularly interesting case that has not been considered in this work is the noisy-voter (Kirman) model for which the linearity of the rates allows one to close the equations for the moments and correlations without the need to resort to the van Kampen approximation.This was done for the homogeneous pair approximation in [32] and it would be interesting to extend these results to the more complicated compartmental models considered in this work.Initial results indicate that the AME corrects the lacks of the PA in specific cases.The differences being specially notorious for low dense -highly heterogeneous networks [93].

FIG. 2 :
FIG.2: Stationary susceptibility χst and average density of active links ρ st as a function of the transmission rate λ for the SIS model.We choose as parameters ε = 10 −2 and µ = 1 on a scale free network P k ∼ k −2.5 with kmin = 2 and kmax = 10.Points correspond to numerical simulations of the model with N = 100 (solid squares) and N = 400 (empty squares) averaged over an ensemble of 100 networks.Lines of different colors are the theoretical prediction of the different approximations.In the left panel the solid lines are the van Kampen result Eq.(33), while in the right panel the solid line is the deterministic approach and the dashed lines the corrected average values Eq.(34).

FIG. 3 :
FIG.3: Stationary susceptibility χst and average density of active links ρ st as a function of the temperature T for the Ising model with Glauber rates.We choose as parameter J = 1 on an Erdős-Rényi network with average degree µ = 5.Points correspond to numerical simulations of the model with N = 100 (solid squares) and N = 400 (empty squares) averaged over an ensemble of 100 networks.Lines of different colors are the theoretical prediction of the different approximations (the curves that do not appear are superposed).In the left panel the solid lines are the van Kampen result Eq.(33), while in the right panel the solid line is the deterministic approach and the dashed lines the corrected average values Eq.(34).

FIG. 4 :
FIG.4: Stationary susceptibility χst and average magnetization |mS| st as a function of Q on a 3-regular random network for the majority-vote model.Points correspond to numerical simulations of the model with N = 100 (solid squares) and N = 400 (empty squares) averaged over an ensemble of 100 networks.Lines of different colors are the theoretical prediction of the different approximations (the curves that do not appear are superposed).In the left panel the solid lines are the van Kampen result Eq.(33), while in the right panel the solid line is the deterministic approach and the dashed lines the corrected average values Eq.(34).

FIG. 5 :
FIG.5: Binder cumulant as a function of the temperature T (left panel) and as a function of the rescaled temperature N 1/2 (T − Tc) (right panel), for the Ising Glauber model with different system sizes N , specified in the figure.The parameters are J = 1 on an Erdős-Rényi network with average degree µ = 5 and the results were averaged over an ensemble of 100 networks.Points correspond to numerical simulations of the model with different system sizes N specified in the legend, while lines are the theoretical scaling functions determined from Eq.(44,50).

FIG. 7 :
FIG.7: Binder cumulant as a function of Q (left panel) and as a function of the rescaled N 1/2 (Q − Qc) (right panel), for the majority-vote model with different system sizes N specified in the legend on a 3-regular random network, and results were averaged over an ensemble of 100 networks.Points correspond to numerical simulations of the model with different system sizes N specified in the legend, while lines are the theoretical scaling functions determined from Eqs.(44,50).The finitesize scaling for the PA result is not displayed as it predicts incorrect scaling properties.

FIG. 9 :
FIG.9: Density of active nodes m(t) and susceptibility χ(t) as a function of time t, for the SI epidemic dynamics with λ = 1 on a scale free network with P k ∼ k −2.5 , kmin = 2 and kmax = 5, and N = 25.Dots are numerical simulations average over 100 trajectories and 100 networks, while lines are: (a) in the left panel it is the results of solving the dynamical Eq. (25) for the different approaches, (b) in the right panel the solid lines are the deterministic result φ φ φ(t), while the dashed are corrected by the second order term φ φ φ(t) + b b b (t)/N Eq.(24).

FIG. 10 :
FIG. 10: Density of active nodes m(t) and susceptibility χ(t) as a function of time t, for the Threshold model with M k = 2, ∀k on a five-regular random network.Dots (N = 100 triangles, N = 1000 circles, and N = 10000 squares) are numerical simulations average over 100 trajectories and 100 networks, while lines are: (a) in the left panel it is the results of solving the dynamical Eq. (25) for the different approaches, (b) in the right panel the solid lines are the deterministic result φ φ φ(t).
. As a consequence, we conclude that the PA is not able to capture correctly