Effect of Coagulation of Nodes in an Evolving Complex Network

Wataru Miura, Hideki Takayasu, and Misako Takayasu
Department of Computational Intelligence and Systems Science, Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, 4259-G3-52, Nagatsuta-cho, Midori-ku, Yokohama 226-8503, Japan
Sony Computer Science Laboratories, 3-14-13, Higashigotanda, Shinagawa-ku, Tokyo 141-0022, Japan
Meiji Institute of Advanced Study of Mathematical Sciences, Meiji University, 1-1-1 Higashimita, Tama-ku, Kawasaki 214-8571, Japan
Physical Review Letters 108, 168701 (2012)
(Received 17 November 2011; published 16 April 2012)

Recently, network structures have been attracting a lot of attention among physicists because such structures are considered to be gateways for clarifying complex inhomogeneous interactions among many elements. The scale-free network [1][2][3][4][5] is one of the universal key words in this area of study, where the distribution of the number of links, called the degree distribution, follows a power law.
The BA model [6,7] proposed by Barabási and Albert was the first simple model of network growth realizing a power law, introducing the concept of preferential attachment, that is, a new link is more likely to be attached to a node having a larger number of links, and many variant models have since been introduced [6][7][8][9][10][11][12]. Although the original BA model is applicable only for a monotonically growing network, Moore et al. introduced annihilation of nodes, and a power-law degree distribution was shown to be realized when the rate of creation outweighs that of annihilation [11]. Moore's model has wider applicability; however, the model can explain only power-law degree distributions with exponents larger than 3. In the real world, there are cases in which nodes appear and disappear like the situation in Moore's model, but the degree distributions follow power laws with exponents smaller than 3, as in the case of business firm networks [13]. In the case of a business relationship network in Japan, it is confirmed that the cumulative degree distributions clearly follow power laws with exponents of about 1.3 (i.e., the probability density exponent is 2.3) for both the money flow network and for its conjugate, the material flow network [13]. Therefore, we need a new model to understand the power laws.
The particle models of aerosol and colloids extensively studied around 1990 provide new insights into the mechanism of power-law mass distribution [14][15][16][17][18]. In these models, the statistically steady state is realized by the balance of particle numbers: an increase of particles by continuous injection of small particles and decrease of particle numbers by coagulation. In the process of coagulation, the mass of the aggregated particle equals the sum of the initial two particles. In this case, the mass distribution in the steady state follows a power law with an exponent in a wide range, which depends on the collision probability [19,20].
Here, we propose a new model of network growth that considers the coagulation process of nodes, corresponding to the merger of firms, in addition to annihilation and creation. Similar ideas of merging nodes have already been considered in the study of complex networks for the purpose of coarse graining the structure of networks [21][22][23]. However, here we introduce the real merging of nodes, as in the coagulation of aerosol masses. By solving the model both numerically and analytically, we show in this Letter that coagulation of nodes plays an essential role in producing the steady power-law distribution of the number of links, just as it does for the power-law mass distribution of aerosols.
We start with a data analysis of the network structure of business relationships. The first step toward constructing a growth model of networks is to observe the distribution of firms' lifetimes. In Fig. 1(a) the cumulative distribution of lifetimes is plotted on a semilog scale; it is well characterized by an exponential function, $P(\geq t) \propto \exp(-t/\tau)$, where $\tau = 18.8$ years is the characteristic decay time. There is a cutoff at a lifetime of 137 years simply because the credit reporting agency started data collection in 1868. This exponential distribution is roughly consistent with the simple assumption that a firm disappears randomly following a Poisson process.
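As an illustration of how a characteristic decay time can be read off such data, the following minimal sketch estimates it for an exponential lifetime distribution. The lifetimes here are synthetic (drawn with a decay time of 18.8 years), not the firm data of the Letter; the function names are hypothetical, and the 137-year cutoff is ignored for simplicity.

```python
import math
import random


def fit_decay_time(lifetimes):
    """Maximum-likelihood decay time for P(>= t) ~ exp(-t / tau).

    For an untruncated exponential distribution the estimate is the sample mean.
    """
    return sum(lifetimes) / len(lifetimes)


def cumulative_fraction(lifetimes, t):
    """Empirical P(>= t): fraction of lifetimes that are at least t."""
    return sum(1 for x in lifetimes if x >= t) / len(lifetimes)


# Synthetic lifetimes in years (an assumption for illustration only).
rng = random.Random(0)
tau_true = 18.8
synthetic = [rng.expovariate(1.0 / tau_true) for _ in range(20000)]

tau_hat = fit_decay_time(synthetic)
print("estimated decay time:", tau_hat)
# At t = tau the cumulative distribution should be close to exp(-1).
print(cumulative_fraction(synthetic, tau_hat), math.exp(-1))
```

On a semilog plot of the cumulative fraction against lifetime, the same decay time appears as the inverse of the slope's magnitude.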
As a second step, we observe the effect of preferential attachment. To observe the preferential attachment rate from the data, we categorize the business firms in the network into two groups: old firms (those founded before 2004) and new firms (those founded in 2004 or after). We construct the network structure with the old firms and check the degrees of business partners one by one for each entry of a new firm to determine $Q(k)$, the probability of connecting to an old firm with degree $k$ in the money flow network representation. Then, we divide this probability by $N(k)$, the number of nodes with degree $k$, and the preferential attachment rate $\Pi(k)$ is defined as $Q(k)/N(k)$. To reduce fluctuation, we observe the following integrated attachment rate function, as introduced by Jeong et al. [24]:

$$\kappa(k) = \int_0^k \Pi(k')\,dk'. \qquad (1)$$

As shown in Fig. 1(b), the obtained $\kappa(k)$ is approximated by power laws for both in-degree and out-degree, with exponents of 2.09 and 2.05, respectively. These values are close to 2, demonstrating that the preferential attachment rate is actually proportional to the average number of links for real business firms.

The third step is to observe the relation between the number of links and the lifetime. In Fig. 2(a), the value of degrees averaged over the bins of a lifetime is shown as a function of the lifetime in a semilog plot; the results imply that the degree grows exponentially with the lifetime, $k(t) \propto \exp(At)$, where $A = 0.017$. This exponential growth observed in the real network is not consistent with that of the BA model, in which the mean degree is known to grow proportionally to the square root of the lifetime [6]. Combining the above two empirical exponential relations, we can naively derive a power-law cumulative degree distribution $P(\geq k) \propto k^{-\gamma}$, where $\gamma = 1/(A\tau) = 3.1$; however, this value is not consistent with the exponent for the real degree distribution, $\gamma = 1.3$, as shown in Fig. 2(b).
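The naive estimate can be spelled out in a few lines. This is a sketch that assumes every firm follows the mean growth curve exactly; $k_0$ is a hypothetical initial degree introduced only for the derivation, $\tau = 18.8$ years denotes the decay time of the lifetime distribution, and $\gamma$ denotes the cumulative degree exponent.

```latex
\begin{align}
k(t) = k_0 e^{At} \;&\Longrightarrow\; t(k) = \frac{1}{A}\ln\frac{k}{k_0}, \\
P(\geq k) = P(\geq t(k)) \;&\propto\; \exp\!\left(-\frac{t(k)}{\tau}\right)
  = \left(\frac{k}{k_0}\right)^{-1/(A\tau)}, \\
\gamma = \frac{1}{A\tau} \;&=\; \frac{1}{0.017 \times 18.8} \approx 3.1 .
\end{align}
```

A firm reaches degree $k$ only if it survives at least $t(k)$ years, so the exponential survival probability turns into a power law in $k$ whose exponent is fixed by the product $A\tau$.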
Apparently, the combined effects of random annihilation and growth by preferential attachment are not sufficient to capture the scale-free nature of a network in the real world. For this reason, we add one more effect to the evolution of the network, that is, the merger of two business firms or the purchase of a small firm by a big firm, both represented by node-node coagulation in the network representation, as schematically shown in Fig. 3(a).
We now introduce a simplified model of the growth of a business network in the following way. We start with $N_0$ nodes with any configuration of links and evolve the system by choosing one of the following three events stochastically for $T$ time steps.
Annihilation.-A randomly chosen node is removed, along with all links connected to this node.
Creation.-A new node having two links (one is an out link and the other is an in link) is added. Each link is connected to a node chosen randomly following the preferential attachment.
Coagulation.-A randomly chosen node is merged with a partner node randomly chosen following the preferential attachment. All the links connected to the nodes are also rewired to the partner node. If two identical links appear after this rewiring, those identical links are merged to produce a single link.
The occurrence probabilities of these events are $a$, $b$, and $c$, respectively, satisfying $a + b + c = 1$. An additional condition, $b = a + c = 0.5$, is required to realize a statistically steady state in the number of nodes. This is because the number of nodes decreases by one in each annihilation or coagulation event, whereas it increases by one in each creation event; without this condition, the number of nodes would either monotonically decrease or increase on average. In this model, we define the preferential attachment rate to be proportional to the degree plus 1, which makes it possible for an isolated node to grow.

Figures 3(b) and 3(c) show numerical simulation results for the case of $c = 0.3$, the initial number of nodes $N_0 = 10^5$, and the number of time steps $T = 10^7$. Because annihilation and coagulation occur randomly, the distribution of the lifetime of nodes is confirmed to follow an exponential distribution with a decay rate of $5.02 \times 10^{-6}$ per step. Comparing this decay rate with the decay time of 18.8 years of the real network shown in Fig. 1(a), we can estimate that 1 yr in the real network corresponds to $1/(5.02 \times 10^{-6} \times 18.8) \approx 10\,600$ steps in our simulation.

The time evolution of the distribution of the number of links is shown in Fig. 3(b), starting with two extreme initial link configurations: all bare nodes having no links, and the complete network, in which all pairs of nodes are linked in both directions. We can observe that the effect of the initial condition gradually vanishes and the statistics of links converge to those of a steady state by time step $10^7$. In the steady state, the link number distributions for both in links and out links follow power laws with an exponent of about 1.3, consistent with the real network. As shown in Fig. 3(c), we can confirm that the basic property of exponential growth of the mean number of links is reproduced automatically in this steady state.
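The three events and the steady-state condition can be sketched as a small Monte Carlo routine. This is a minimal illustration under stated assumptions, not the simulation code used in the Letter: the preferential weight is taken as the total degree (in plus out) plus 1, self-loops created by coagulation are dropped, creation is forced whenever fewer than three nodes remain, and the function names and small default sizes are hypothetical.

```python
import random


def pick_preferential(out_nbrs, in_nbrs, rng, exclude=None):
    """Pick a node with probability proportional to (total degree + 1)."""
    nodes = [n for n in out_nbrs if n != exclude]
    if not nodes:
        return None
    weights = [len(out_nbrs[n]) + len(in_nbrs[n]) + 1 for n in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]


def remove_node(out_nbrs, in_nbrs, u):
    """Delete node u together with every link attached to it."""
    for v in out_nbrs.pop(u):
        in_nbrs[v].discard(u)
    for v in in_nbrs.pop(u):
        out_nbrs[v].discard(u)


def simulate(n0=200, steps=5000, c=0.3, seed=0):
    """Evolve the network; returns (out_nbrs, in_nbrs) adjacency sets."""
    rng = random.Random(seed)
    a, b = 0.5 - c, 0.5  # steady-state condition b = a + c = 0.5
    out_nbrs = {i: set() for i in range(n0)}  # start from bare nodes
    in_nbrs = {i: set() for i in range(n0)}
    next_id = n0
    for _ in range(steps):
        r = rng.random()
        if r < b or len(out_nbrs) < 3:
            # Creation: new node with one out link and one in link.
            dst = pick_preferential(out_nbrs, in_nbrs, rng)
            src = pick_preferential(out_nbrs, in_nbrs, rng)
            new, next_id = next_id, next_id + 1
            out_nbrs[new], in_nbrs[new] = set(), set()
            if dst is not None:
                out_nbrs[new].add(dst)
                in_nbrs[dst].add(new)
            if src is not None:
                out_nbrs[src].add(new)
                in_nbrs[new].add(src)
        elif r < b + a:
            # Annihilation: remove a uniformly chosen node and its links.
            remove_node(out_nbrs, in_nbrs, rng.choice(list(out_nbrs)))
        else:
            # Coagulation: merge a uniformly chosen node u into a partner v.
            u = rng.choice(list(out_nbrs))
            v = pick_preferential(out_nbrs, in_nbrs, rng, exclude=u)
            outs = out_nbrs[u] - {v}  # dropping u<->v avoids self-loops
            ins = in_nbrs[u] - {v}
            remove_node(out_nbrs, in_nbrs, u)
            for w in outs:  # sets merge identical links automatically
                out_nbrs[v].add(w)
                in_nbrs[w].add(v)
            for w in ins:
                in_nbrs[v].add(w)
                out_nbrs[w].add(v)
    return out_nbrs, in_nbrs


out_nbrs, in_nbrs = simulate(n0=200, steps=5000, c=0.3, seed=1)
in_degrees = sorted((len(s) for s in in_nbrs.values()), reverse=True)
print(len(out_nbrs), "nodes; largest in-degrees:", in_degrees[:5])
```

Storing links as sets keeps the merging of identical links implicit. The linear-time weighted choice is adequate for small systems, but runs at the scale quoted in the text would need a faster sampling structure.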
The steady-state distribution depends on the value of the coagulation rate parameter $c$, as shown in Fig. 3(d). For small values of $c$, the link number distribution converges to a distribution close to an exponential distribution, whereas for larger values of $c$, the steady distribution is well approximated by a power law. We can show this basic property analytically by solving the master equation, Eq. (2), for the in-degree probability density at time $t$, $P(k, t)$. In Eq. (2), the effects of the three basic processes, annihilation, creation, and coagulation, have coefficients $a$, $b$, and $c$, respectively, on the right-hand side. This equation belongs to the class of Smoluchowski equations, which describe irreversible mass aggregation processes of colloids or aerosols under the mean field approximation. Namely, the aggregation of links caused by the merging process of nodes in a complex network is mathematically equivalent to the mass aggregation process caused by irreversible collisions of particles.

We solve this master equation analytically for two extreme cases: no coagulation ($c = 0$) and maximum coagulation ($c = 0.5$). In the case of no coagulation, we obtain the steady-state equation by substituting the mean degree conservation, $b - a\langle k \rangle = 0$, into the master equation, Eq. (2). As there are no nonlinear terms in the no-coagulation case, Eq. (2) can be solved analytically by introducing the generating function, giving

$$P(k) \sim (k + 2)\,\Gamma(0, (k + 1)\log 2) - 2^{-k}, \qquad (3)$$

where $\Gamma(x, d)$ is the incomplete gamma function. For large values of $k$, the distribution function of degree decays exponentially, which is consistent with our numerical simulation.
In the case of maximum coagulation ($c = 0.5$), it is known that this type of Smoluchowski equation has a power-law solution, $P(k) \sim k^{-\gamma - 1}$, when the coagulation kernel for particles with masses $k_1$ and $k_2$ is proportional to $1$, $k_1 k_2$, or $k_1 + k_2$ [25]. Assuming the functional form as a power law, we can confirm that Eq. (2) is satisfied only in the case of $\gamma = 1$. Namely, the steady-state solution in the maximum coagulation case is given by the cumulative distribution $P(\geq k) \sim k^{-1}$, which is consistent with numerical simulation, as shown in Fig. 3(d).
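The step from the probability density to the cumulative distribution can be written out explicitly. This is a sketch that approximates the sum over degrees by an integral for large $k$; $C$ is a hypothetical normalization constant and $\gamma > 0$ denotes the cumulative exponent.

```latex
\begin{equation}
P(\geq k) = \sum_{k' \geq k} P(k')
  \approx \int_{k}^{\infty} C\,k'^{-\gamma - 1}\,dk'
  = \frac{C}{\gamma}\,k^{-\gamma} .
\end{equation}
```

With $\gamma = 1$ this gives the $k^{-1}$ cumulative law of the maximum coagulation case. The same relation connects the density exponent 2.3 and the cumulative exponent 1.3 quoted for the business firm network.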
We can solve Eq. (2) for the steady state using the computer algebra system Mathematica with the boundary condition $P(k_{\max} + 1) = 0$, where $k_{\max}$ is the largest value of the number of links. For comparison, we performed Monte Carlo simulations with the same parameter sets in parallel, and we numerically estimated the parameter values needed to solve Eq. (2), $\langle k \rangle$ and $k_{\max}$. In all cases, the analytical solutions and the results of Monte Carlo simulation agreed very well, as shown in Fig. 3(d). In all cases, the link number distribution decays exponentially for very large $k$, but there exists a range of $k$ which can be approximated by a power law. This is a typical property of mass distributions of random coagulation systems with injection, such as the case of aerosols. The distribution that best agrees with the real link distribution, Fig. 2(b), is the case of $c = 0.3$. We cannot conclude whether the apparent power-law exponent, $\gamma = 1.3$, is a genuine value; however, the master equation, Eq. (2), definitely describes the essence of the growth dynamics of the coagulation of nodes in a complex network.
To summarize, we have shown that in the growth of a complex network, the effect of coagulation plays a central role, along with those of random annihilation, creation, and preferential attachment. Previous theoretical studies of complex networks revealed the importance of one-by-one rewiring of links [7]; here, we found that massive simultaneous rewiring of links caused by node-node coagulation is very effective, especially in the real world example of a business firm network. We expect that there are other network systems in which the effect of node-node aggregation is not negligible, and the balance of coagulation and creation governs the statistically steady state.