Conditions for viral influence spreading through multiplex correlated social networks

A fundamental problem in network science is to predict how certain individuals are able to initiate new networks that spring up around "new ideas". Frequently, these changes in trends are triggered by a few innovators who rapidly impose their ideas through "viral" influence spreading, producing cascades of followers that fragment an old network to create a new one. Typical examples include the rise of scientific ideas or abrupt changes in social media, like the rise of Facebook.com to the detriment of Myspace.com. How this process arises in practice has not been conclusively demonstrated. Here, we show that a condition for sustaining a viral spreading process is the existence of a multiplex correlated graph with hidden "influence links". Analytical solutions predict percolation phase transitions, either abrupt or continuous, where networks are disintegrated through viral cascades of followers, as in empirical data. Our modeling predicts the strict conditions to sustain a large viral spreading via a scaling form of the local correlation function between multilayers, which we also confirm empirically. Ultimately, the theory predicts the conditions for viral cascading in a large class of multiplex networks ranging from social to financial systems and markets.


I. INTRODUCTION
People's adoption of new ideas or products, and even of novel scientific theories, often depends upon the prior support of a few innovators, pioneers or knowledgeable individuals. These early adopters disseminate the new idea through viral influence spreading that leads to cascades of followers [1-6]. Examples are found in the rise of brand-new consumer products, where a few early adopters can have a large effect on the entire population, a process leading to modern engagement strategies of "viral marketing". Similarly, scientists in a given academic field work on specialized topics by developing collaboration networks. As innovative ideas arise, the majority of them may rapidly transition to the study of the new topic. When the pioneers migrate to develop a new idea, their former social network weakens at its roots, leading to its rapid disintegration [7-11]. A prominent example is the rise of the social networking community Facebook.com to the detriment of the previously dominant Myspace.com, as shown in Fig. 1a. The conditions favoring such a process of disintegration at the tipping point of a dominant network or community, and the rise of a competing network, have not been conclusively demonstrated so far.
Typically, the innovators exert their influence not only through the regular channels of communication in their social networks (such as mutual connectivity through friendship, collaborations, family or other types of direct contact) but also through unidirectional links based upon their "cognition influence". This means that scientific innovators, for instance, would lead the introduction of new scientific ideas by engaging learners through "links of influence". These are hidden directed links that can be found in many situations, e.g., people following trends set by popular singers and actors, even though the actors do not "know" their followers. Equally important are financial networks and markets, where one company or bank may depend on others for financial or technical reasons [12]. This illustrates a fundamental property of social networks: while network functionality depends on a layer of mutual connectivity links, network stability depends on a hidden "guided influence" network quantified by the state of being influenced by knowledgeable individuals.
Systems consisting of network layers with multiple types of links, such as those treated here, are referred to as multiplex networks [13-15], which in some cases are equivalent to interdependent networks [10]. Such a network-of-networks structure has been shown to be crucial for cascading failure [10], transport [16], diffusion [17], evolution of cooperation [18], competitive percolation [19] and neuronal synchronization [20]. Specifically related to spreading processes, previous research has addressed the spreading of human cooperation in multiplex networks [18], showing that it depends significantly on the properties of the correlations between network layers, as described in [21]. Our approach is closely connected with recent work on generalized percolation [22]. Furthermore, the percolation modeling that we apply is related to the study of percolation in multiplex networks in the context of interdependent networks [23,24].

II. RESULTS
The multiplex structure can be investigated in the collaboration networks formed by scientists [25] and in online social blogging communities of information dissemination such as LiveJournal.com [26,27]. Formally, we consider a network with two types of links: connectivity and influence links. The connectivity links are undirected and correspond to a close relationship between two nodes: for instance, scientists who have coauthored at least $s$ articles during a specified interval in the journals of the American Physical Society (APS). Underlying this basic structure, there is a hidden network of influence links among scientists that can be quantified through the citation lists of papers. When author A systematically cites papers of author B, we assume that there is a directed outgoing influence link A → B. A similar structure can be defined in the LiveJournal network of information dissemination, which is studied below.
We start by tracking the upsurge and disappearance of Physics trends through the creation and removal of fields in the Physics and Astronomy Classification Scheme (PACS) compiled by the APS from 1975 to 2010 (see Supplementary Information for details). PACS is a hierarchical classification of the literature in the physical sciences where each published paper is assigned one or more PACS numbers. Figure 1b shows the number of scientists in the largest connected component (analogous to the giant component in the thermodynamic limit) of the Statistical Physics community which, until around 1998, was publishing under PACS 64.60: "General Studies of Phase Transitions". After 2000, many of those researchers quickly switched to publishing in the field of "Complex Networks" (under a series of PACS 89.75-k created in 2001). The number of scientists in the giant component of the collaboration networks (Fig. 1b) behaves similarly to the number of Myspace and Facebook users in Fig. 1a. This similarity hints at possible generic features acting when a new trend competes with an older trend for the same pool of users.
To quantify the viral cascading process, we perform a real-time percolation analysis [28,29] [...] Complex Networks. This situation explains why the frequency of citation in Fig. 1c does not decay as rapidly as the giant component in Fig. 1b, but remains relatively constant or presents a small decay after scientists start to publish in Complex Networks.

A. Empirical results on APS collaboration networks
We calculate the size, $N_\infty$, of the giant connected component of authors in each field from the articles published in the pre-Complex Network period (1997-2001). Then, we identify the fraction $1 - p$ of pioneers from each field, defined as those scientists who published at least one paper in Complex Networks in 2001. To quantify the effect of cascades, we measure the size of the largest component of the original collaboration network, $n_\infty$, at a later time (2005-2009). Figure 2a shows the fast decay of the fraction of nodes in the largest connected component, $P_\infty = n_\infty / N_\infty$, with $1 - p$ for the different PACS fields, as evidence of the rapid disintegration of each physics community. Recall that $1 - p$ is the fraction of departing pioneers. Thus, Fig. 2a should be interpreted, for instance for the field of Phase Transitions, as follows: a small fraction $1 - p = 10\%$ of departing pioneers leads to a large 81% shrinking of the largest component of the network. The cascading behavior triggered by the departure of the $1 - p$ pioneers is visualized in the influence network representation of Fig. 2b and the connectivity representation in Fig. 2c (see also SI Fig. 6).
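The measurement described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function names, the toy edge list and the `departed` set (all authors no longer present in the later period) are hypothetical, and the largest component is found with a simple union-find.

```python
from collections import defaultdict


def largest_component_size(nodes, edges):
    """Size of the largest connected component among `nodes` (union-find)."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in edges:
        # Ignore edges that touch a node outside the current node set.
        if a in parent and b in parent:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb

    sizes = defaultdict(int)
    for v in nodes:
        sizes[find(v)] += 1
    return max(sizes.values(), default=0)


def giant_fraction(nodes, edges, departed):
    """P_inf = n_inf / N_inf: largest component after departures,
    relative to the largest component of the original network."""
    n_before = largest_component_size(nodes, edges)
    survivors = [v for v in nodes if v not in departed]
    n_after = largest_component_size(survivors, edges)
    return n_after / n_before if n_before else 0.0
```

For a toy chain of five authors, `giant_fraction([0, 1, 2, 3, 4], [(0, 1), (1, 2), (2, 3), (3, 4)], {2})` removes the middle author and leaves a largest component of two out of five nodes.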
Besides the largest component, many networks contain other clusters. For this reason, we have also calculated the size of the second largest cluster for the five networks involved in the APS communities: Chaos, Fluctuations, Interfaces, Phase Transitions and Thermodynamics. We find that the sizes of the second largest clusters are 31, 21, 49, 29 and 46, respectively. These numbers are small compared with the sizes of the largest components of the networks: 1126, 522, 232, 193 and 87, respectively. In principle, the second largest clusters can be analyzed in the same way as the largest component. However, we find that the second largest clusters are too small to yield statistically meaningful results.
The disintegration process can be interpreted as a percolation phase transition at a critical threshold $p_c$, defined by $P_\infty(p_c) = 0$. For each network, $1 - p_c$ quantifies the minimum fraction of departing nodes (pioneers) able to break down the network [28,29]. The data seem to percolate at remarkably large values of $p_c$ (Fig. 2a), implying that the collaboration networks are highly vulnerable. This result is in sharp contrast to the prediction of classical percolation theory on scale-free networks without influence links: $p_c = 0$ [28-32] (collaboration networks have been found to have a power-law tail in the degree distribution [25], $P(k) \sim k^{-\gamma}$ with $\gamma < 3$; see also SI). The prediction $p_c = 0$ exemplifies the extreme resilience of scale-free networks under random removal of nodes.
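To make the classical prediction concrete, the sketch below evaluates the standard random-removal threshold for an uncorrelated network without influence links, $p_c = \langle k \rangle / (\langle k^2 \rangle - \langle k \rangle)$, from a degree sequence. The function names and the sampling scheme (the integer part of a Pareto variable, capped at $N - 1$) are assumptions made for illustration only; they are not taken from the paper.

```python
import random


def classical_pc(degrees):
    """Classical threshold p_c = <k> / (<k^2> - <k>) for random node removal.

    p is the fraction of nodes that remain. The result is capped at 1,
    which means no giant component exists even without any removal.
    """
    n = len(degrees)
    k1 = sum(degrees) / n
    k2 = sum(k * k for k in degrees) / n
    if k2 <= k1:
        return 1.0
    return min(1.0, k1 / (k2 - k1))


def sample_power_law_degrees(n, gamma, k_min=1, seed=0):
    """Approximate power-law degrees P(k) ~ k^-gamma (floor of a Pareto draw)."""
    rng = random.Random(seed)
    exponent = -1.0 / (gamma - 1.0)
    return [min(n - 1, int(k_min * (1.0 - rng.random()) ** exponent))
            for _ in range(n)]
```

Evaluating `classical_pc(sample_power_law_degrees(n, 2.5))` for growing `n` should give thresholds that drift toward zero, because the second moment grows with system size when $\gamma < 3$.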
In principle, a plausible explanation of the extreme fragility of the scientific communities could be that the respective networks are being disrupted by the departure of the most connected people: scale-free networks are resilient to random removal of nodes ($p_c = 0$) [29,31], but they are very vulnerable ($p_c$ close to 1) to hub departure [30]. However, we find empirically that the pioneers are not the highly connected scientists. Rather, they are minor players who develop a novel, appealing idea leading to the creation of an entire new community and to the disintegration of the old system (see Fig. 2d and SI Table I). The collapse is amplified because the most well-connected individuals follow the new trend, sustaining a viral cascade of influences.
We compare the average connectivity degree of pioneers in the largest connected component, $k_{\rm pio}$, for each PACS community with the degrees found in the network, and find that $k_{\rm pio}$ is much smaller than the maximum degree in the network, $k_{\max}$, as shown in Fig. 2d and SI Table I. Furthermore, $k_{\rm pio}$ is smaller than or of the same order as the average degree of the nodes, $\langle k \rangle$. This result indicates that the pioneers are not always the hubs in the network. Instead, the pioneers have a degree very close to that of randomly selected nodes.

B. Percolation modeling
To identify the conditions for viral cascading leading to network fragility, we develop a generic model and search the space of solutions by calculating the percolation threshold. The network contains undirected connectivity links and directed influence links. Each node in the network is characterized by the degree $k$ of its connectivity links, the degree $k_{\rm in}$ of incoming influence links, and the degree $k_{\rm out}$ of outgoing influence links (Fig. 3a; by definition $\langle k_{\rm in} \rangle = \langle k_{\rm out} \rangle$). In the most general case, these quantities are correlated, as measured by the joint probability distribution function $P(k, k_{\rm in}, k_{\rm out})$. Below we demonstrate that viral cascades can be sustained only when there is a positive correlation between $k$ and $k_{\rm out}$, which we indeed find empirically (Fig. 3g and Fig. 5b).
We demonstrate a cascading process (Fig. 3a-f) initiated by the removal of a node who creates a new idea and moves to a new field of science. We map this process onto a correlated percolation model to find analytical solutions that predict $p_c$, as well as the universal boundaries of the phase diagram over an ensemble of correlated random graphs. The main analytical treatment of the problem is based on the method of generating functions [10,28,33]. We generalize the previous uncorrelated theory [10,28,33] to the case of a correlated network using the generating function of the joint degree distribution:

$$G(x_1, x_2, x_3) = \sum_{k, k_{\rm in}, k_{\rm out}} P(k, k_{\rm in}, k_{\rm out})\, x_1^{k}\, x_2^{k_{\rm in}}\, x_3^{k_{\rm out}}. \qquad (1)$$

At the heart of the model, there is a cascading process mimicking the departure of nodes following influential nodes. Such a process is described by a probability $q_h$, which is estimated from the data and determines the departing process as follows. Imagine that node A is following $k_{\rm out} = 15$ other nodes. The model takes into account that node A will leave the network with probability $q_h$ when one of its 15 influential nodes leaves. This probability is a parameter of the model and is determined from the experimental data as analyzed in Fig. 9 and Section X.
For the sake of argument, imagine that node A follows departing nodes with low probability, say $q_h = 0.2$. Applying this probability to each link of node A would imply that node A has a 20% chance of leaving the network when one of its $k_{\rm out} = 15$ followees departs. A direct implementation of this rule would lead to a rather intractable model from the mathematical point of view. Rather than implementing this probability in the model directly, we perform a mapping to an equivalent process: we first create an equivalent network where we reduce the number of original outgoing links $k_{\rm out}$ to an equivalent $k^{\rm equiv}_{\rm out}$, for instance, for node A from 15 to 3 (that is, $k^{\rm equiv}_{\rm out} = q_h \times k_{\rm out} = 3$). We then consider that if any of the 3 followees in the equivalent network leaves, then node A leaves with probability one. In the statistical ensemble, both networks are fully equivalent. The main parameter of the model is the average number of effective outgoing links, $\langle k_{\rm out} \rangle$, which is obtained from the real data as explained in Fig. 9 and Section X. This mathematical trick, which has been introduced previously in [32], renders an intractable mathematical model tractable. The probability $q_h$, which determines the effective outgoing links, is a parameter of the model, and we will show in Fig. 2 that the experimental data on the five considered APS communities lie within the lower and upper bounds predicted by the theory for the effective values $\langle k^{\rm equiv}_{\rm out} \rangle = 0.44$ and $\langle k^{\rm equiv}_{\rm out} \rangle = 0.83$. Figure 3a-f illustrates the cascading process in a simple network considered as the giant component in the model (black links, red nodes) plus the directed influence network (green links).
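The equivalence between the per-link probability $q_h$ and a thinned network can be checked numerically. The sketch below assumes one particular way of realizing the mapping, namely keeping each outgoing link independently with probability $q_h$, so that the expected number of kept links is $q_h \times k_{\rm out}$; the paper's own construction may differ. All function names are hypothetical. Under this assumption both rules give a departure probability of $1 - (1 - q_h)^m$ when $m$ followees have departed.

```python
import random


def leaves_original(k_out, n_departed, q_h, rng):
    """Original rule: each departed followee triggers departure w.p. q_h."""
    return any(rng.random() < q_h for _ in range(n_departed))


def leaves_equivalent(k_out, n_departed, q_h, rng):
    """Equivalent network (assumed construction): keep each out-link w.p. q_h,
    then leave with probability one if any kept link points to a departed node.
    Links 0 .. n_departed-1 are taken to point to the departed followees."""
    kept = [i for i in range(k_out) if rng.random() < q_h]
    return any(i < n_departed for i in kept)


def departure_frequency(rule, k_out, n_departed, q_h, trials=20000, seed=0):
    """Monte Carlo estimate of the probability that the node leaves."""
    rng = random.Random(seed)
    hits = sum(rule(k_out, n_departed, q_h, rng) for _ in range(trials))
    return hits / trials
```

With $k_{\rm out} = 15$, $q_h = 0.2$ and four departed followees, both `departure_frequency(leaves_original, 15, 4, 0.2)` and `departure_frequency(leaves_equivalent, 15, 4, 0.2)` should fluctuate around $1 - 0.8^4$.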
For simplicity we describe the process in the equivalent network, which is the one that is solved analytically. In Fig. 3a, a given pioneer departs as indicated. Such a departure produces a regular percolation process (not cascading) that disconnects two other nodes from the giant component, as seen in Fig. 3b. Additionally, the pioneer node produces an influence-induced departure of an extra node as indicated in Fig. 3c, which in turn produces another two influence-induced departures as shown in the same figure. At this point, the cascading starts, since the influence-induced departures produce extra percolation disconnections of three nodes from the giant component, as depicted in Fig. 3d. The process now continues back and forth between the simple percolation departure and the influence-induced departure until the cascading stops. For instance, one extra node departs in Fig. 3e due to influence, leading to the final giant component of Fig. 3f, where all the remaining influence links point towards nodes in the giant component and, therefore, no more cascading processes are possible.
The full cascading process is mathematically modeled on the equivalent network as follows (see SI for a detailed derivation): (i) We first apply a classical percolation process of random removal of a node (Fig. 3a), and remove all the nodes that become disconnected from the giant component due to the loss of the corresponding connectivity links to the network (Fig. 3b). In terms of generating functions, this process leads to the set of recursive equations SI Eqs. (16)-(18), as shown in [10]. (ii) The next step corresponds to the process of node removal through correlated influence links. When a node leaves the network, the followers connected to the node via $k_{\rm in}$ links leave too (Fig. 3c), triggering a cascading effect described by SI Eqs. (23)-(25). Notice that here we are applying this percolation step to the equivalent network and not the original one. Thus, a node in the original network will still leave the network with probability $q_h < 1$ when one of its $k_{\rm out}$ followees departs. That is, the model is mathematically solved in the equivalent network (where nodes leave with probability one following influential nodes, but with a smaller number of links), while the real dynamics is still applied to the original network with the original number of out-links, where nodes leave the network with probability $q_h < 1$. (iii) This influence-induced departure of followers can be mapped back to a second percolation removal of nodes (Fig. 3d) and the subsequent removal by influence (Fig. 3e). The whole process is captured by the set of iterative SI Eqs. (28), which describe a cascading process that terminates when all the influence links of the nodes in the giant component point to unremoved nodes in the same component (Fig. 3f).
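Steps (i)-(iii) above can be sketched as a direct simulation on the equivalent network, where a node leaves with probability one as soon as any node it follows has left. This is an illustrative sketch and not the authors' code: `build_adjacency`, `largest_component` and `cascade` are hypothetical names, `follows[v]` lists the nodes that `v` follows (its outgoing influence links), and ties between equally large components are broken arbitrarily.

```python
from collections import deque


def build_adjacency(nodes, edges):
    """Undirected connectivity network as a dict of neighbor sets."""
    adj = {v: set() for v in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def largest_component(alive, adj):
    """Largest connected component restricted to the `alive` nodes (BFS)."""
    seen, best = set(), set()
    for start in alive:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w in alive and w not in seen:
                    seen.add(w)
                    comp.add(w)
                    queue.append(w)
        if len(comp) > len(best):
            best = comp
    return best


def cascade(adj, follows, pioneers):
    """Alternate connectivity percolation and influence removal until stable."""
    alive = set(adj) - set(pioneers)
    while True:
        before = len(alive)
        # Steps (i)/(iii): keep only the giant component of connectivity links.
        alive = largest_component(alive, adj)
        # Step (ii): a node leaves if any node it follows has left.
        while True:
            leaving = {v for v in alive
                       if any(u not in alive for u in follows.get(v, ()))}
            if not leaving:
                break
            alive -= leaving
        if len(alive) == before:
            return alive
```

The ratio of `len(cascade(...))` to the size of the initial largest component then plays the role of $P_\infty$ for a given set of departing pioneers.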

C. Solving the cascading process
(i) The first stage in the cascading process is described by $\tilde{p} = p$, where $\tilde{p}$ is the fraction of links remaining in the giant component at a given stage in the cascading process. We obtain the size of the giant component $x$ and the survival probability $t$ of remaining nodes after the first removal from Eqs. (2) and (3). The physical meaning of $t$ is that a node with connectivity degree $k$ has a probability $1 - t^{k}$ to belong to the giant component [36].
(ii) After the first undirected connectivity percolation step, we arrive at the second-stage removal process, which is caused by influence links. In this new process, nodes are removed if and only if they reach any node outside the giant component by following influence links. This removal process corresponds exactly to percolation on the directed influence network, where the correlation $P(k, k_{\rm in}, k_{\rm out})$ needs to be explicitly taken into account. The equations describing this process are derived in SI Eqs. (20)-(25); they involve a quantity $h$ that is proportional to the sum of the in-degrees of all nodes in $x$. The physical meaning of $y$ is that a node with out-degree $k_{\rm out}$ survives this removal process with probability $y^{k_{\rm out}}$. It implies that integrating the initial removal in (i) and these two processes in (ii) is equivalent to randomly removing each node from the original network with probability $1 - p\,y^{k_{\rm out}}$.
(iii) Therefore, the whole process (i) and (ii) can be thought of as a single removal in the original network with the definition $\tilde{p} = p\,G_{x_1}(1, 1, y)$, while $x$ and $t$ remain the same as in Eqs. (2) and (3) (detailed derivations in SI Eqs. (26)-(27)).
This new "initial" removal can be described exactly by generating functions. Thus, we arrive again at stage (i) to perform a modified undirected connectivity percolation step. The process continues until the cascading avalanche is over.
The above analysis leads to a set of recursive relations defining the cascading process. After the second stage, the current cascading effect can be mapped to a removal process in the original network. This property allows us to write down the cascading process as recursive equations (see also SI Eqs. (28)), which we solve for the whole cascading process by finding their fixed point.
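As an illustration of solving such recursions by fixed-point iteration, the sketch below treats only the first stage in its standard single-layer form, which is an assumption and not the paper's full three-variable system: $t = 1 - p + p\,G_1(t)$ and $x = p\,[1 - G_0(t)]$, with $G_0(t) = \sum_k P(k)\,t^k$ and $G_1(t) = \sum_k k\,P(k)\,t^{k-1} / \langle k \rangle$. This form is consistent with the statement that a node of degree $k$ belongs to the giant component with probability $1 - t^k$. The function name and the dictionary representation of $P(k)$ are hypothetical.

```python
def giant_component_fraction(pk, p, tol=1e-12, max_iter=100000):
    """Fixed-point solution of classical site percolation on a random graph.

    pk : dict mapping degree k -> probability P(k)
    p  : fraction of nodes that remain after random removal
    Returns (x, t): x is the giant-component fraction and t the probability
    that a link does not lead to the giant component.
    """
    mean_k = sum(k * q for k, q in pk.items())

    def g0(t):
        return sum(q * t ** k for k, q in pk.items())

    def g1(t):
        return sum(k * q * t ** (k - 1) for k, q in pk.items() if k > 0) / mean_k

    t = 0.0  # start from the fully connected guess and iterate upward
    for _ in range(max_iter):
        t_next = 1.0 - p + p * g1(t)
        if abs(t_next - t) < tol:
            t = t_next
            break
        t = t_next
    return p * (1.0 - g0(t)), t
```

For a random 3-regular graph, `giant_component_fraction({3: 1.0}, 0.8)` converges to $t = 0.25$, while any `p` below the classical threshold of 0.5 drives $t$ to 1 and the giant component to zero.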
Integrating the above three stages, we can rewrite the cascading process as a set of recursion relations (see also SI Eqs. (28)). The physical meaning of these recursion relations is that, after each first stage, a node that is not removed in the initial attack has survival probability $1 - t^{k}$. After the second stage, the surviving nodes can be mapped to a removal process on the original network in which each node is removed with probability $1 - p\,y^{k_{\rm out}}$. These two properties allow us to write down the formula for the relative size of the giant component, $P_\infty$, at the final stable state of the cascading process, Eq. (7).

D. Phase diagram
The model predicts the existence of first-order [38] and second-order phase transitions. When the transition is second order, we obtain an explicit formula for the percolation threshold, $p_c^{\rm II}$, given in Eq. (8) (see SI Eq. (39) for the derivation). Equation (8) generalizes the classical uncorrelated percolation result [29] to networks with influence links and generic correlations, $P(k, k_{\rm in}, k_{\rm out})$. The threshold for a first-order transition, $p_c^{\rm I}$, is obtained through the implicit formula of Eq. (9) (SI Eq. (40)), where $t(y, p)$ and $y(t, p)$ are the functions describing the influence-induced percolation process according to SI Eqs. (40)-(42). The boundary between the first- and second-order transitions in phase space is obtained by setting $p_c^{\rm I} = p_c^{\rm II}$, leading to SI Eq. (43).
To determine the conditions for viral cascading of influence, we consider two cases in turn: uncorrelated and correlated networks. For uncorrelated networks, the joint distribution $P(k, k_{\rm in}, k_{\rm out})$ factorizes into a product of three functions, one for each degree, which are generic probability distributions. In this case the transition is of second order and $p_c^{\rm II,unc}$ is obtained explicitly in Eq. (10) (see SI Eq. (47)), where $q_0$ is the fraction of nodes with $k_{\rm out} = 0$.
Surprisingly, we still find $p_c = 0$ for scale-free networks (due to the diverging second moment in Eq. (10) for $\gamma < 3$) despite the existence of influence links. This means that, without correlations, the influence links alone cannot sustain a viral spreading process that breaks down the strong resilience of scale-free networks; i.e., viral cascades cannot be sustained in an uncorrelated scale-free influence network. Empirically, however, we find that there exist strong correlations between $k_{\rm in}$, $k_{\rm out}$ and $k$: the most active authors with large collaborative projects tend to receive and provide the largest influence from and to their peers (Fig. 3g). We find

$$k_{\rm out} \sim k^{\alpha}, \qquad k_{\rm in} \sim k^{\beta}, \qquad (11)$$

where the correlation exponents are close to one ($\alpha = 0.91 \pm 0.04$ and $\beta = 1.04 \pm 0.05$ for the APS data). When these correlations are included in Eq. (8), we predict a non-zero correlated percolation threshold, $p_c^{\rm II,cor}$, given in Eq. (12) (see SI Eq. (50)). Equation (12) is remarkable in two respects. First, the value of $p_c$ increases sharply from zero as $\alpha$ increases, up to a maximum value that depends on $\gamma$ and $\langle k_{\rm out} \rangle$ (Fig. 4a). The vulnerability increases until the term $k^{\alpha}$ becomes dominant and stabilizes the network, with the concomitant decrease of $p_c^{\rm II,cor}$ back to zero as $\alpha \to \infty$. Second, $p_c^{\rm II,cor}$ is independent of $\beta$, since $x_2 = 1$ in Eq. (8), implying, rather surprisingly, that the influence exerted by the large number of ingoing influence links of the hubs is not enough to produce viral spreading.
The theoretical results are tested against computer simulations of numerically generated scale-free networks with a prescribed set $(\alpha, \beta, \gamma, \langle k_{\rm out} \rangle)$. We first generate a scale-free network with a given value of $\gamma = 2.5$ and minimal degree equal to one. For a given node with connectivity degree $k$, the influence out-degree is proportional to $k^{\alpha}$ and the in-degree is proportional to $k^{\beta}$. We choose the numbers of in and out influence links from Poisson distributions with averages given by these two values. The Poisson distributions $P(k_{\rm out}|k)$ and $P(k_{\rm in}|k)$ are validated for the five APS communities in SI Fig. 8. Thus, we generate the connectivity network and the correlated directed influence links according to the connectivity degree of each node and $(\alpha, \beta)$. We then calculate $P_\infty(p)$ numerically by performing a percolation cascading process directly on the network. Then, we calculate $G(x_1, x_2, x_3)$ to obtain the theoretical predictions of Eqs. (29) and (39). Figure 3h shows the comparison between the theoretical results and the simulations. We obtain very good agreement between the predicted $P_\infty(p)$ and $p_c^{\rm II}$ and the numerical estimates obtained by applying a percolation process to a correlated network with influence links, with the parameters given in the figure.
It is important to note that the generating function formalism is based on a locally tree-like assumption. Such an assumption is satisfied locally in random networks as well as in scale-free networks, and this is the reason for the very good agreement between theory and simulation in Fig. 3h. However, real networks are not tree-like, and local clustering is an important property of any real-world network. For instance, clustering in complex networks can be classified into two different classes, weak and strong. Strong clustering occurs where triangles in the network share edges, so that the multiplicity of edges can be high. Weak clustering occurs when triangles do not share edges. A formalism for weakly clustered networks has been recently considered in [39]. However, strong clustering occurs more often in real networks. Thus, a more realistic theory would need to be considered to capture the existence of strong clustering in real-world systems.
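The numerical construction of correlated networks described above can be sketched as follows, using only the Python standard library. Several choices here are assumptions made for illustration and are not stated in the text: connectivity links are wired with a configuration model that drops self-loops and multi-edges, out-degrees are Poisson with mean proportional to $k^{\alpha}$, and targets are drawn with probability proportional to $k^{\beta}$, so that in-degrees are only approximately Poisson with mean proportional to $k^{\beta}$. All function names are hypothetical.

```python
import itertools
import math
import random


def sample_degrees(n, gamma, rng, k_min=1):
    """Approximate power-law degrees P(k) ~ k^-gamma (floor of a Pareto draw)."""
    expo = -1.0 / (gamma - 1.0)
    return [min(n - 1, int(k_min * (1.0 - rng.random()) ** expo))
            for _ in range(n)]


def poisson(lam, rng):
    """Poisson sample: Knuth's method, normal approximation for large means."""
    if lam > 30.0:
        return max(0, round(rng.gauss(lam, math.sqrt(lam))))
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def configuration_model(degrees, rng):
    """Connectivity network from a degree sequence (self-loops, multi-edges dropped)."""
    stubs = [v for v, k in enumerate(degrees) for _ in range(k)]
    rng.shuffle(stubs)
    adj = {v: set() for v in range(len(degrees))}
    for a, b in zip(stubs[::2], stubs[1::2]):
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def influence_links(degrees, alpha, beta, mean_k_out, rng):
    """follows[v]: nodes that v follows; <k_out|k> ~ k^alpha, targets ~ k^beta."""
    n = len(degrees)
    out_w = [k ** alpha for k in degrees]
    cum_in = list(itertools.accumulate(k ** beta for k in degrees))
    scale = mean_k_out * n / sum(out_w)  # fixes the average out-degree
    follows = {}
    for v in range(n):
        k_out = poisson(scale * out_w[v], rng)
        targets = rng.choices(range(n), cum_weights=cum_in, k=k_out)
        follows[v] = {u for u in targets if u != v}
    return follows


def build_network(n, gamma, alpha, beta, mean_k_out, seed=0):
    """Convenience wrapper returning (degrees, adjacency, follows)."""
    rng = random.Random(seed)
    degrees = sample_degrees(n, gamma, rng)
    return (degrees, configuration_model(degrees, rng),
            influence_links(degrees, alpha, beta, mean_k_out, rng))
```

Because duplicate targets and self-links are discarded, the realized average out-degree comes out slightly below the requested `mean_k_out`.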
Taken together, these results paint a picture of a viral cascading process where a few small players (not the hubs) initiate the cascades. However, the hubs play a key role in sustaining the cascades, not as pioneers but as followers: since the well-connected nodes receive a greater influence (via $k_{\rm out} \sim k^{\alpha}$), they are more "aware" of the latest developments. This allows the hubs to "jump" to the new trend easily. In percolation terminology, the random removal of nodes (which targets mostly low-degree individuals) becomes, at a later stage in the cascade, a targeted attack on the hubs via their large number of outgoing influence links. This is due to the cascading effect, where low-degree pioneering nodes can have easy access to the well-connected hubs through the hubs' large number of $k_{\rm out}$ links. This effect explains the condition for the catastrophic fragility and the viral spreading in the highly correlated influence network. Contrary to expectations, the large ingoing influence of the hubs (via $k_{\rm in} \sim k^{\beta}$) plays no role in sustaining the cascade.

E. Testing the theoretical results
We test our theoretical predictions by calculating $P_\infty(p)$ from Eq. (7) and comparing it with the collaboration networks of Fig. 2a (a comparison with LiveJournal is performed in the next section). The statistical estimation of parameters (via, for instance, standard maximum likelihood for power-law distributions) to produce inputs to the mathematical model is a drawback of the modeling. Indeed, the mathematical model has probabilistic underpinnings, and estimation based directly on those underpinnings would be more appropriate. Therefore, we insert the empirical degree distribution directly into the theory to provide the theoretical estimate of the giant component in the $(p, P_\infty)$-plane in Fig. 2a.
Additionally, current approaches in the statistical literature [40] model the observed links of networks as the fundamental level of the data hierarchy through which network structure is considered. Thus, rather than using amalgamated data, such as the degree distribution, for the base level of the model, we perform a statistical analysis in terms of exponential random graph models (ERGM) [40], using stochastic algorithms based on the Markov chain Monte Carlo (MCMC) method to allow the use of network statistics (the degree distribution) as predictors of links. This method further allows for the estimation of distributions of functions based on a model for a network (predictive distributions).
For the exponential-family random graph model, the network statistics are the $p_k(\omega)$, where $p_k(\omega)$ is the probability of randomly choosing a node with degree $k$ in a network $\omega$, each with an associated parameter $\theta_k$. First, we use the empirical network to estimate all the parameters $\theta_k$. Then, we use MCMC to generate 100 networks with the same features captured by the model. We use the ERGM software obtained at http://statnetproject.org to estimate the parameters $\theta_k$ and generate the 100 networks [41] (more details in SI).
We then employ a bootstrap method, combined with maximum-likelihood estimation and the Kolmogorov-Smirnov test [34], to estimate the exponent $\gamma$ of the degree distribution. We obtain $\gamma = 2.97$ (standard deviation 0.01).
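A bootstrap estimate of a power-law exponent can be sketched as below. This is a simplified illustration under stated assumptions rather than the procedure of [34] as used by the authors: the lower cutoff `k_min` is taken as given instead of being selected with the Kolmogorov-Smirnov test, and the exponent uses the common discrete approximation $\gamma \approx 1 + n \left[ \sum_i \ln \frac{k_i}{k_{\min} - 1/2} \right]^{-1}$, which is most accurate for larger `k_min`. The function names and the synthetic data generator are hypothetical.

```python
import math
import random


def mle_exponent(degrees, k_min):
    """Approximate discrete maximum-likelihood estimate of gamma for k >= k_min."""
    tail = [k for k in degrees if k >= k_min]
    if not tail:
        raise ValueError("no degrees at or above k_min")
    log_sum = sum(math.log(k / (k_min - 0.5)) for k in tail)
    return 1.0 + len(tail) / log_sum


def bootstrap_exponent(degrees, k_min, n_boot=200, seed=0):
    """Mean and standard deviation of the estimate over bootstrap resamples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resample = rng.choices(degrees, k=len(degrees))  # with replacement
        estimates.append(mle_exponent(resample, k_min))
    mean = sum(estimates) / n_boot
    var = sum((e - mean) ** 2 for e in estimates) / (n_boot - 1)
    return mean, math.sqrt(var)


def synthetic_degrees(n, gamma, k_min, seed=0):
    """Test data only: rounded Pareto samples with tail exponent gamma."""
    rng = random.Random(seed)
    expo = -1.0 / (gamma - 1.0)
    return [round((k_min - 0.5) * (1.0 - rng.random()) ** expo)
            for _ in range(n)]
```

Calling `bootstrap_exponent(synthetic_degrees(5000, 2.5, 6), 6)` should return a mean close to the exponent used to generate the data, together with a bootstrap standard deviation that shrinks as the sample grows.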
Furthermore, we use $(\alpha, \beta) = (0.91, 1.04)$ obtained from the real data. We also estimate the lower and upper bounds of the average influence degree $\langle k_{\rm in} \rangle = \langle k_{\rm out} \rangle$ from the data (see SI). The empirical lower bound is $\langle k^{\rm equiv}_{\rm out} \rangle = 0.44$, which gives rise to the predicted $P_\infty(p)$ shown as the red curve in Fig. 2a, with percolation threshold 0.54. The figure shows that this theoretical prediction provides a lower bound to the empirical data, providing support for the model. For larger $\langle k_{\rm out} \rangle$, at fixed $(\alpha, \beta)$, the vulnerability of the network increases, with larger percolation thresholds (inset of Fig. 4a). Furthermore, a second-order transition at small $\langle k_{\rm out} \rangle$ turns into an abrupt first-order transition at large $\langle k_{\rm out} \rangle$, as shown in the phase diagram of Fig. 4b. As $\langle k_{\rm out} \rangle$ increases, the threshold condition changes from Eq. (8) to Eq. (9), and the transition becomes discontinuous.
For instance, while for $\langle k_{\rm out} \rangle = 0.44$ the transition is just at the boundary between the first-order and second-order transitions (red dot in Fig. 4b), for $\langle k_{\rm out} \rangle = 0.83$ the transition becomes first-order. The values $\langle k_{\rm out} \rangle = 0.44$ and 0.83 provide a lower and an upper bound of the empirical data $P_\infty(p)$, as shown by the red and blue curves in Fig. 2a, providing further support for the model.
The first-order scenario implies a dramatic viral spreading where a network near its threshold will suddenly disintegrate upon the departure of an infinitesimal number of its members. Since the empirical data in Fig. 2a are close to the upper bound provided by $\langle k_{\rm out} \rangle = 0.83$ (especially the fields Phase Transitions, Fluctuations and Interfaces), it is plausible that these scientific communities are being disintegrated by catastrophic discontinuous events.
We would like to note that we do not claim to fit the five data points on the APS communities with a single functional form obtained from the theory. In fact, each point may correspond to an independent percolation process given by a different $\langle k_{\rm out} \rangle$. Instead, we show that these five data points lie within the upper and lower bounds of the theory. Later we will repeat the same analysis for the LJ communities. These communities are much larger in number, totaling 10,981 LJ communities. The data for this large number of communities are strikingly well fitted by the theory, as we will see in Fig. 5.
One assumption of the model is that the scientists who leave a declining network are leaving due to the influence exerted by the pioneers. However, scientists might leave a field for a variety of reasons, such as the decline of the field itself, and they might also move to other destinations.
To study these questions, we have measured the ratio of departing scientists who moved to Complex Networks to the total number of departing scientists from a given field. The percentages for the different fields are Fluctuations: 86.0%, Chaos: 88.1%, Thermodynamics: 95.7%, Phase Transitions: 85.1% and Interfaces: 36.4%. Thus, except for the field of Interfaces, the fields show a large percentage of departing scientists going to Complex Networks.
Yet, the fact that scientists leave a field to work in Complex Networks does not necessarily mean that they are following the pioneers. That is, the motives behind such a decision could be different, and attributing the full attrition of a field to the influence of pioneers implies an assumption of causality which may not be satisfied. For instance, a scientist may leave the field under the impression that the original field is declining.
While a definite answer to this question would require a survey of the actual motives of scientists, the good agreement between model and empirical data (including the LiveJournal network studied next) suggests that the influence percolation model reproduces the disintegration of communities well. This result, in turn, suggests that the correlated influence links are an important mechanism behind the network disintegration.
We note that, while the model assumes local correlations between the different degrees of a given node, correlations between the connectivity and influence networks themselves are neglected. To test for these correlations, we measure the average influence degree (in and out) of the nearest neighbors, $k_{nn}$, of a node with connectivity degree $k$ connected via an influence link in the APS networks (SI Fig. 10). We find a lack of correlations, indicated by a flat $k_{nn}$. If such a correlation existed, it could contribute to an under-estimation of the cascade effect. Therefore, in the SI we develop an extension of the theory to treat these correlations as well.
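The nearest-neighbor measurement described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name and input layout are hypothetical, and it assumes that "influence degree (in and out)" means the total number of influence links of a neighbor and that neighbors include both the nodes a scientist follows and the scientist's followers.

```python
from collections import defaultdict


def knn_by_connectivity_degree(conn_degree, influence_links):
    """Average influence degree of influence-neighbors, grouped by k.

    conn_degree: dict mapping node -> connectivity degree k.
    influence_links: iterable of unique (follower, leader) pairs.
    Returns a dict mapping k -> average k_nn over nodes with that k.
    """
    infl_degree = defaultdict(int)   # assumed: in-degree + out-degree
    neighbours = defaultdict(set)    # followers and followed nodes alike
    for follower, leader in influence_links:
        infl_degree[follower] += 1
        infl_degree[leader] += 1
        neighbours[follower].add(leader)
        neighbours[leader].add(follower)

    sums, counts = defaultdict(float), defaultdict(int)
    for node, nbrs in neighbours.items():
        k = conn_degree.get(node, 0)
        # Mean influence degree over this node's influence-neighbors.
        sums[k] += sum(infl_degree[n] for n in nbrs) / len(nbrs)
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}
```

A flat dependence of the returned values on $k$ would correspond to the absence of correlations reported in the text.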

F. Disintegration of LJ communities
The mathematical modeling assumes that the settings are mutually exclusive, while in practice this may not be true in the example of the APS communities. For this reason, we also tested the model on another dataset: the communities formed by bloggers in LiveJournal.com (all datasets are available at http://lev.ccny.cuny.edu/∼hmakse/softdata.html).
LiveJournal (LJ) is a large social network of 8.3 million users who post information and articles of common interest.This community has been used in network studies of information flow and influence [26,27], since it was shown to have features consistent with other large-scale social networks.
We recorded the posts in the LiveJournal social network from February 14th, 2010 to November 21st, 2011. We also sampled the full network of LJ users together with the declared interest of each user, which defines the community to which the user belongs. The network was collected every 1.5 months, giving 14 snapshots of the entire LJ structure, while the entire history of posts was recorded continuously over the studied period. This information allows us to define the variables used in the model to describe the disintegration of communities:
(a) The connectivity network: users $i$ and $j$ are connected when they declare their friendship in the network.
(b) The influence network: when user $i$ cites posts of user $j$, we consider a directed influence link $i \to j$, since user $i$ is a follower of user $j$. Thus, we can define the respective degrees of connectivity links, $k$, and influence links, $k_{\text{in}}$ and $k_{\text{out}}$, and search for correlations between these variables.
(c) The communities: each user in LJ declares a community to which the user belongs (sports, literature, etc.).
Therefore, we have the three main ingredients of the theory: connectivity links, influence links and well-defined communities that we can track over an extended period of time. Crucially, we are able to track 10,981 communities that are created and disintegrated. In LJ, the communities are declared by the users, who change interests very often, so that communities are created and disintegrated frequently. In this case, the settings may be mutually exclusive, since the communities appear to be very dynamic and users change interests rapidly.
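The definitions of the connectivity and influence networks can be illustrated with a short sketch that turns raw records into the degree triple $(k, k_{\text{in}}, k_{\text{out}})$ of each user. The function name and the record format (pairs of user identifiers) are assumptions made for illustration; repeated citations between the same pair are counted here as a single influence link, which is also an assumption.

```python
from collections import defaultdict


def node_degrees(friendships, citations):
    """Return {user: (k, k_in, k_out)} from raw records.

    friendships: iterable of (i, j) declared friendships (undirected).
    citations: iterable of (i, j) meaning user i cited a post of user j,
               i.e. a directed influence link i -> j (i follows j).
    """
    friends = defaultdict(set)
    follows = defaultdict(set)      # out-going influence links of a user
    followed_by = defaultdict(set)  # in-going influence links of a user
    for i, j in friendships:
        if i != j:
            friends[i].add(j)
            friends[j].add(i)
    for i, j in citations:
        if i != j:
            follows[i].add(j)
            followed_by[j].add(i)

    users = set(friends) | set(follows) | set(followed_by)
    return {
        u: (len(friends.get(u, ())),
            len(followed_by.get(u, ())),
            len(follows.get(u, ())))
        for u in users
    }
```

The resulting triples are the raw material for the correlation analysis between $k$, $k_{\text{in}}$ and $k_{\text{out}}$.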
The analysis of the LJ communities reveals a remarkable result. In Fig. 5a we study the size of the giant components $P_\infty$ of the disintegrated networks to produce the percolation plot of $P_\infty$ as a function of the fraction of leaving pioneers $1 - p$. We find that the disintegration of the LJ communities closely follows a generic curve in the $(p, P_\infty)$-plane, indicating rapid disintegration and great fragility of the communities via cascading effects, with critical percolation threshold $p_c = 0.962$.
The empirical curve can be well-fitted by the theoretical model (Fig. 5a) when similar local correlations between connectivity and influence links as in the APS communities are taken into account (Fig. 5b).Our results are consistent with an abrupt first-order transition occurring during the disintegration process as indicated in the phase diagram of Fig. 4b.

III. DISCUSSION
From the development of new ideas and brand-new products to political trends, the present model shows the conditions for viral spreading: when $k_{\text{out}}$ and $k$ become highly correlated (large $\alpha$), a few individuals, who are not necessarily the hubs, can trigger a large cascade leading to network fragmentation. In conclusion, through mathematical and empirical calculations, we establish two emergent properties that result when overlying multiplex networks interact:
(i) We mathematically derive the necessary conditions for sustaining a viral spreading process. We show how damage in a network can, in turn, damage the influence network and vice versa, leading to viral cascades of followers. Our modeling predicts the conditions for these interactions to sustain a large viral spreading via a precise scaling form of the correlation function between multi-layers, Eqs. (11).
(ii) This theoretical prediction is in agreement with our empirical observations (Figs. 2a and 5a). We find that the conditions for viral spreading, Eqs. (11), are valid in the studied networks. This viral effect is empirically quantified by the large percolation threshold ($p_c$ in Figs. 2a and 5a), which is also predicted by the theory. Contrary to expectation, the innovators are not the hubs, but the small players.
A related question is whether there is a universal model to explain the fall of all kinds of communities, trends and topics, even when they are characterized by different time scales. Such a model would be difficult to implement. Here we show that in two different networks, the disintegration of communities can be understood via a modified percolation model in a multiplex network including correlated influence links. These two networks have different time scales for disintegration: communities rise and fall in a matter of weeks (LJ) or over years or even decades, as in scientific trends. Certainly we have not exhausted all the cases, but the present data are indicative enough to suggest that the same modeling could be applied to other networks. In particular, it could be applied to Facebook or Twitter, where trends appear and disappear in a similar fashion as in LJ.
Our results have consequences for a range of social, natural and also engineered systems. They cause us to rethink the assumptions about robustness and resilience of social networks, with implications for understanding viral spreading in social systems and the design of robust multiplex interconnected networks. In the present study, we have tested the theoretical results on typical cases of scientific collaboration networks and online information dissemination, but the results apply equally to a variety of interconnected multiplex systems with correlated influence links. These systems range from political networks to financial markets and the economy at large.

FIG. 2(d), continued. [...] degree of pioneers in the largest cluster, $k_{\text{pio}}$, versus the maximum connectivity degree $k_{\max}$ over the members of each PACS field. We find that in general the degree of the pioneers is much smaller than the maximum degree, indicating that the pioneers are not the hubs (see also SI Table I).

(e) Influence-induced percolation process: a final node is removed due to the influence link to one of the nodes removed in (d). (f) At the end of the cascade, the giant component is reduced to six nodes. (g) Empirical study of local correlations between influence degree and connectivity degree averaged over all APS networks. We obtain $k_{\text{out}} \propto k^{\alpha}$ and $k_{\text{in}} \propto k^{\beta}$ with $\alpha = 0.91 \pm 0.04$ and $\beta = 1.04 \pm 0.05$. (h) Comparison between simulations and theoretical results. The symbols denote the simulation results and the curves are the prediction of the theory for $P_\infty(p)$. We use a scale-free network with $\gamma = 2.5$ and minimal degree 1. We first generate the connectivity network, then generate the correlated directed influence links according to the connectivity degree of each node and $(\alpha, \beta)$, and calculate $P_\infty(p)$ by performing a percolation analysis directly on the network. Then, we calculate $G(x_1, x_2, x_3)$ to obtain the theoretical predictions of Eqs. (29) and (39). We find that the theoretical results agree very well with the simulations.

[...] and found that the fraction satisfies pioneers > followers > the rest. This suggests that the departures follow the influence links in a cascade of followers.

NETWORKS
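We consider a network with both bidirectional connectivity links and directed influence links. Each node has three degrees, $(k, k_{\text{in}}, k_{\text{out}})$, measuring the number of connectivity links, in-going influence links and out-going influence links, respectively. The properties of such a network are described by the three-dimensional generating function:
$$G(x_1, x_2, x_3) = \sum_{k,\, k_{\text{in}},\, k_{\text{out}}} P(k, k_{\text{in}}, k_{\text{out}})\, x_1^{k}\, x_2^{k_{\text{in}}}\, x_3^{k_{\text{out}}},$$
where the joint probability distribution $P(k, k_{\text{in}}, k_{\text{out}})$ describes the local correlations among $(k, k_{\text{in}}, k_{\text{out}})$.
In the following we denote the partial derivatives by subscripts, e.g. $G_{x_1}(x_1, x_2, x_3) \equiv \partial G / \partial x_1$, and analogously for higher orders. If node $i$ is being influenced by node $j$, there is a directed influence link from node $i$ to node $j$.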
In a real system, the influence of the peers is applied with a given probability $q_h$, which in general is smaller than 1. That is, even if there is an influence link from $i$ to $j$, if $j$ departs, then $i$ will depart with probability $q_h$. In SI Section X we explain in detail how to estimate $q_h$ from the APS data. This effect is taken into account in the model. To simplify the problem, the removal following influence links is treated as equivalent to randomly deleting a fraction $1 - q_h$ of the influence links, and then assuming that all of the remaining nodes connected via influence links are removed with probability 1. Without loss of generality, in our analysis we set this probability to $q_h = 1$, as done in previous percolation studies [32]. Next, we analyze the cascading process following the recursive stages: percolation process → influence-induced percolation process → percolation process → influence-induced percolation process → ...
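The recursive stages can be illustrated with a small simulation sketch. It is not the authors' implementation: the function names are hypothetical, and it assumes that a node counts as departed once it is either removed or disconnected from the giant component, and that a follower departs when any node it follows has departed. The random wrapper thins the influence links with probability $1 - q_h$ before running the deterministic cascade, mirroring the simplification described above.

```python
import random
from collections import deque


def giant_component(alive, adj):
    """Largest connected set of `alive` nodes under the connectivity links."""
    seen, best = set(), set()
    for start in alive:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v in alive and v not in seen:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        if len(comp) > len(best):
            best = comp
    return best


def cascade(nodes, conn_edges, influence_links, initially_removed):
    """Alternate percolation and influence-induced removal until stable."""
    adj = {n: set() for n in nodes}
    for i, j in conn_edges:
        adj[i].add(j)
        adj[j].add(i)
    leaders = {n: set() for n in nodes}  # nodes that each node follows
    for follower, leader in influence_links:
        leaders[follower].add(leader)

    alive = set(nodes) - set(initially_removed)
    while True:
        # Percolation stage: only the giant component survives.
        giant = giant_component(alive, adj)
        # Influence stage: followers of departed nodes depart as well.
        survivors = {n for n in giant if leaders[n] <= giant}
        if survivors == alive:
            return survivors
        alive = survivors


def random_cascade(nodes, conn_edges, influence_links, p, q_h=1.0, seed=None):
    """Keep each influence link with probability q_h, each node with p."""
    rng = random.Random(seed)
    kept = [link for link in influence_links if rng.random() < q_h]
    removed = {n for n in nodes if rng.random() >= p}
    return cascade(nodes, conn_edges, kept, removed)
```

Dividing the size of the returned set by the number of nodes gives a simulated estimate of $P_\infty(p)$ for one realization; averaging over many realizations would be needed for a smooth curve.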

A. First Stage: Classical Percolation Process
The cascading process is triggered by initially removing a fraction $1 - p$ of the nodes at random. We use $p$ to denote the fraction of nodes remaining after the initial random removal. Thus, at this first stage, we have: [...]
The generating function of the connectivity degree distribution related to the branching process is $G_{x_1}(1,1,1)$ [10, 28, 33]. Thus the giant component size $x$ after the first stage of the percolation process can be written as: [...] where [...] and [...]
The physical meaning of the quantity $t$ is that a node with connectivity degree $k$ has probability $1 - t^{k}$ of staying in the giant component [36]. Accordingly, we get the average in-degree in the giant component of size $x$, $k^{x}_{\text{in}}$, as: [...]

B. Second Stage: Influence-Induced Percolation Process
In order to treat the local correlations between $(k, k_{\text{in}}, k_{\text{out}})$, we develop, for the first time, a generating function theory that combines the percolation process on the connectivity links and the directed influence links in a correlated fashion. It is instructive to first treat the second stage assuming that the network has influence links that are uncorrelated; we then generalize the results to the case where correlations exist.
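As a numerical illustration of the first stage (classical percolation), the sketch below solves the textbook site-percolation equations on a random network with a given connectivity degree distribution. The equations used, $t = 1 - p + p\,G_1(t)$ and $x = p\,[1 - G_0(t)]$ with $G_0(t) = \sum_k P(k)\,t^k$ and $G_1(t) = G_0'(t)/G_0'(1)$, are the standard form consistent with the stated meaning of $t$; they are an assumption here and not necessarily the exact equations of the paper. The function name is hypothetical.

```python
import numpy as np


def first_stage_giant_component(pk, p, tol=1e-12, max_iter=100000):
    """Solve t = 1 - p + p*G1(t) and return (t, x) with x = p*(1 - G0(t)).

    pk: sequence where pk[k] is the probability of connectivity degree k.
    p:  fraction of nodes remaining after the initial random removal.
    """
    pk = np.asarray(pk, dtype=float)
    pk = pk / pk.sum()
    k = np.arange(len(pk))
    mean_k = (k * pk).sum()

    def g0(t):
        return (pk * t ** k).sum()

    def g1(t):
        # Derivative of g0 normalised by the mean degree.
        return (k[1:] * pk[1:] * t ** (k[1:] - 1)).sum() / mean_k

    # Iterating from t = 0 converges to the smallest fixed point.
    t = 0.0
    for _ in range(max_iter):
        t_new = 1.0 - p + p * g1(t)
        if abs(t_new - t) < tol:
            t = t_new
            break
        t = t_new
    return t, p * (1.0 - g0(t))
```

For a network in which every node has degree 3, this fixed point can be checked by hand: the non-trivial solution is $t = (1 - p)/p$, and the giant component vanishes for $p \le 1/2$.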
Let $H(u)$ be the generating function for the probability of reaching an outgoing component of a given size by following a directed link on the original network. According to reference [37], $H(u)$ can be written as [...]

To calculate $f^{\max}_w$, we consider the number of authors that cite a given author and count as active those influence links that come from a follower who actually leaves the network: $f^{\max}_w = a^{\max}_w / A_w$ and $f^{\min}_w = a^{\min}_w / A_w$, where $a^{\max}_w$ and $a^{\min}_w$ are the maximum and minimum numbers of active influence links with weight $w$, and $A_w$ is the number of links from the followers to the pioneers with weight $w$, including both active and inactive links. For the calculation of $f^{\min}_w$, if a follower who moves to Complex Networks depends on several pioneers, we consider only the influence link with the largest weight. SI Figure 9 illustrates the calculation with an example. Applying this calculation to the APS communities, we find the lower and upper bounds of the average influence links as $k_{\text{out}} \in [0.44, 0.83]$. The lower bound $k_{\text{out}} = 0.44$ is used to calculate $P_\infty(p)$ in Fig. 2a, which shows how all the empirical data lie between the estimated bounds.
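The counting rule for $f^{\min}_w$ and $f^{\max}_w$ can be sketched as follows. The function name, the input layout and the toy data in the usage note are hypothetical; ties between equally weighted links of the same follower are resolved by counting one link at that weight, which is an assumption not stated in the text.

```python
from collections import Counter
from fractions import Fraction


def active_link_fractions(links, movers):
    """Return (f_min, f_max), each a dict mapping weight w -> fraction.

    links:  list of (follower, pioneer, weight) citation links.
    movers: set of followers who moved to the new field.
    """
    total = Counter(w for _, _, w in links)                 # A_w
    # Upper bound: every link of a moving follower is active.
    a_max = Counter(w for f, _, w in links if f in movers)
    # Lower bound: only the heaviest link of each mover is active.
    strongest = {}
    for f, _, w in links:
        if f in movers and w > strongest.get(f, 0):
            strongest[f] = w
    a_min = Counter(strongest.values())

    f_max = {w: Fraction(a_max[w], total[w]) for w in total}
    f_min = {w: Fraction(a_min[w], total[w]) for w in total}
    return f_min, f_max
```

For example, with the made-up links `[(12, 1, 3), (12, 2, 1), (12, 3, 1), (13, 1, 1), (14, 2, 2)]` and `movers = {12}`, only the weight-3 link is active in the lower bound, while the upper bound also counts the two weight-1 links of follower 12.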
To measure the correlations between the connectivity degree and the in- and out-degrees, we first employ the above method to measure the influence probability of each directed influence link in a given network. Different weights of the directed links give rise to different probabilities. We keep each directed link with its probability and record all pairs $(k, k_{\text{out}})$ and $(k, k_{\text{in}})$. We repeat this calculation 20 times and compute the average in- and out-degree over all the nodes whose connectivity degree is $k$. Using the pairs $(k, k_{\text{out}})$ and $(k, k_{\text{in}})$, the correlation scaling law of Fig. 2d can be obtained.
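Once the pairs $(k, k_{\text{out}})$ have been recorded, a scaling exponent such as $\alpha$ in $k_{\text{out}} \propto k^{\alpha}$ can be estimated. The sketch below averages the influence degree over nodes with the same $k$ and fits a straight line in log-log space; the text does not specify the fitting procedure, so the least-squares fit and the function name are assumptions.

```python
import numpy as np


def fit_degree_scaling(k, k_infl):
    """Fit <k_infl>(k) = c * k**exponent; return (exponent, c).

    k:      connectivity degree of each recorded node.
    k_infl: influence degree (in or out) of the same nodes.
    """
    k = np.asarray(k, dtype=float)
    k_infl = np.asarray(k_infl, dtype=float)
    ks = np.unique(k)
    # Average influence degree over all nodes sharing the same k.
    means = np.array([k_infl[k == value].mean() for value in ks])
    # Logarithms require strictly positive values.
    mask = (ks > 0) & (means > 0)
    slope, intercept = np.polyfit(np.log(ks[mask]), np.log(means[mask]), 1)
    return slope, np.exp(intercept)
```

The same function applies unchanged to the pairs $(k, k_{\text{in}})$ to estimate $\beta$.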

WORKS
We investigate the correlations between both networks by measuring the average influence degree (in and out) of the nearest neighbors, $k_{nn}$, of a node with connectivity degree $k$ connected via an influence link.
For the nodes with degree $k$, we measure the average degree $k_{nn}$ of the nodes they follow and of their followers. We find that there are almost no correlations between $k_{nn}$ and $k$, as shown in Fig. 10.
The lack of correlations is indicated by a flat curve in the plot. We note that, because the degree distribution is a power law, the second moment is very broad, so that it makes the error bars [...]

We observe that the hubs with a larger number of cooperators $k$ and that are influenced by more scientists (larger $k_{\text{out}}$) will move into a new field with larger probability.
[...] on the most important fields contributing to the rise of Complex Networks. These fields include: Fluctuations (PACS 05.40), Chaos (PACS 05.45), Phase Transitions (PACS 64.60), Thermodynamics (PACS 05.70), and Interfaces (PACS 68.35). These fields have experienced either a decline or slower growth in the number of published papers after 2000, as shown in Fig. 1c. We notice that scientists publishing in Complex Networks still include in their papers the original PACS number of, for instance, Phase Transitions or Chaos. Therefore, the decay in the frequency of citations plotted in Fig. 1c is not very large, even though scientists are already working in the field of Complex Networks.

FIG. 1. Rise and fall of communities. (a) Comparison of activity in "Myspace.com" vs "Facebook.com" from 2004 to 2012 through Google's Search Volume Index [35], which measures the number of Google searches of each word. The fall of Myspace coincided with the rise of Facebook, suggesting that users moved from one network to the other. The tipping point can be identified on March 25, 2007. (b) The size of the largest (giant) connected components of scientists studying "Phase Transitions" (PACS 64.60) and "Complex Networks" (PACS 89.75) from 1997 to 2009. The steady increase in the study of Phase Transitions declined as the Complex Networks field started to grow in the physics community. (c) The frequency of citations per year of the top five fields contributing scientists to the rise of Complex Networks.

FIG. 2. Cascades of followers through the influence of pioneers in APS. (a) Relative size of the largest component, $P_\infty(p)$, of collaborating scientists in the indicated fields after the departure of $1 - p$ pioneers to the field of Complex Networks ($s = 2$, other values yield similar results). The solid curves denote the different theoretical predictions. Black curve (extreme resilience): classical percolation theory on a scale-free network predicting $p_c = 0$ [29-32]. Red curve (high vulnerability): prediction of influence-induced correlated percolation for $(\alpha, \beta, k_{\text{out}}) = (0.91, 1.04, 0.44)$ with a predicted threshold very close to the boundary between first- and second-order transitions, $p^{\mathrm{I}}_c \approx p^{\mathrm{II}}_c = 0.54$. Blue curve (extreme vulnerability): prediction of influence-induced correlated percolation for $(\alpha, \beta, k_{\text{out}}) = (0.91, 1.04, 0.83)$, giving rise to a first-order transition with $p^{\mathrm{I}}_c = 0.97$. This means that the departure of 3% of pioneers will cause cascading followers and the collapse of the original network. The red and blue curves are bounds to the real data. (b) The influence network (blue links) of collaborating scientists in the field of Phase Transitions. Green nodes are a sample of pioneers of the field of Complex Networks and the yellow nodes are their closest followers that departed afterwards. The large cascading effect produced by the departing nodes is apparent. Black links are connectivity links. (c) The largest connected component of the collaboration networks of Phase Transitions up to 2001 and its reduced state in 2005-2009 with the concomitant creation of Complex Networks. (d) Pioneers are not the hubs. The average de-

FIG. 3. Modeling the cascading process of followers. (a) Sketch of the model network (considered as the giant component) with undirected connectivity links (solid black links) superimposed by a network of directed influence links (green links). The nodes are characterized by $(k, k_{\text{in}}, k_{\text{out}})$

FIG. 4. The phase diagram predicted by the influence percolation model with correlation. (a) Prediction of the percolation threshold versus $\alpha$ according to Eqs. (9) and (12) for $(\beta, \gamma) = (1.04, 2.90)$ and $k_{\text{out}}$ as indicated. Solid lines denote the region in $\alpha$ of second-order transitions, while dashed lines denote first-order transitions. Inset shows the increase of $p^{\mathrm{II}}_c$ with $k_{\text{out}}$ for $(\alpha, \beta, \gamma) = (0.91, 1.04, 2.90)$. (b) Phase diagram denoting the areas in the plane $(\alpha, \beta)$ of first-order and second-order regimes for two values of $k_{\text{out}}$ as indicated. The first-order regime is inside the indicated curve, while the second-order regime is outside, for a given value of $k_{\text{out}}$. The location of the APS networks in the phase diagram is indicated as a red dot. The networks are located near the boundary of the transitions for $k_{\text{out}} = 0.44$ and inside the first-order region for $k_{\text{out}} = 0.83$. The LJ network is also located in the first-order transition regime.

FIG. 5. Disintegration of communities of bloggers in LiveJournal.com. (a) Percolation plot in the plane $(p, P_\infty)$ of the declining communities. The communities follow a generic curve in the plane $(p, P_\infty)$ quantifying the rapid disintegration and cascading effects. We find a very good [...]

[...] $w$, as shown in SI Fig. 9. The lower and upper bounds of the averages $k_{\text{out}}$ ($= k_{\text{in}}$) are estimated from the interval $\left[\sum_w n_w f^{\min}_w / N,\ \sum_w n_w f^{\max}_w / N\right]$, where $n_w$ is the number of influence links with weight $w$ in the giant component, $N$ is the size of the giant component, and $f^{\min}_w$, $f^{\max}_w$ are the minimum and maximum fractions of active influence links, estimated as follows.

FIG. 6. Comparison of the fraction of papers published by pioneers, followers and followers of followers. Followers are the scientists who cited at least one paper of the pioneers between 1997 and 2001 and are not cooperators of the pioneers. For a given PACS number as indicated in the figures, we record the number of papers published in the PACS number and the number of papers published only in 89.75 (Complex Networks) for each author. Then we get the total number of papers published in Complex Networks, $N_{\text{net}}$, and the number of papers with these two PACS numbers, $N_{\text{all}}$, for every author. The vertical axis shows the fraction $N_{\text{net}} / N_{\text{all}}$ as a function of time, which approximately satisfies pioneers > followers > the rest. This result suggests that influence links are important for cascading dynamics.

FIG. 9. Estimation of $f^{\min}_w$ and $f^{\max}_w$ to obtain the bounds on $k_{\text{out}}$ from the empirical data. In the figure, we show three pioneers (red nodes), citation links with weights $w$ given by the number of citations, and all of the followers of the pioneers (open circles). The purple open circles denote the followers who move to Complex Networks, while the blue open circles denote those who do not move. For the calculation of $f^{\max}_w$, we consider as active the links that come from followers who move to Complex Networks. In the figure, all the green and purple directed links are active. We estimate $f^{\max}_w = a^{\max}_w / A_w$ and $f^{\min}_w = a^{\min}_w / A_w$ as indicated in the text. For the calculation of $f^{\min}_w$, if a follower who moves to Complex Networks depends on several pioneers, we consider only the influence link with the largest weight. For instance, among the green influence links in the figure, the link from node 12 to node 1 with weight 3 is active, and the links from node 12 to nodes 2 and 3 are not active. In this case we obtain: $f^{\max}_1 = 3/7$, $f^{\max}_2 = 2/2 = 1$, $f^{\max}_3 = 1$, and $f^{\min}_1 = 2/7$, $f^{\min}_2 = 1/2$, $f^{\min}_3 = 1$.