Interacting discovery processes on complex networks

Innovation is the driving force of human progress. Recent urn models reproduce well the dynamics through which the discovery of a novelty may trigger further ones, in an expanding space of opportunities, but neglect the effects of social interactions. Here we focus on the mechanisms of collective exploration and we propose a model in which many urns, representing different explorers, are coupled through the links of a social network and exploit opportunities coming from their contacts. We study different network structures showing, both analytically and numerically, that the pace of discovery of an explorer depends on its centrality in the social network. Our model sheds light on the role that social structures play in discovery processes.

Innovation is the driving force of human progress. Recent urn models reproduce well the dynamics through which the discovery of a novelty may trigger further ones, in an expanding space of opportunities, but neglect the effects of social interactions. Here we focus on the mechanisms of collective exploration and we propose a model in which many urns, representing different explorers, are coupled through the links of a social network and exploit opportunities coming from their contacts. We study different network structures showing, both analytically and numerically, that the pace of discovery of an explorer depends on its centrality in the social network. Our model sheds light on the role that social structures play in discovery processes.
Discoveries are essential milestones for the progress of our societies [1][2][3][4][5][6][7][8][9][10][11]. Recently, different mathematical approaches have been proposed to model the dynamics of innovation [12][13][14][15][16][17][18][19][20][21][22][23]. Among these, of particular interest are those based on random processes with reinforcement [24][25][26], such as Pòlya urns [27,28]. Urns have been extensively used to study and model a variety of systems and processes, from evolutionary economics, voting and contagions [29][30][31][32] to language and folksonomies [33,34]. More recently, they have been employed to filter information [35] and grow social networks [23]. Interestingly, urns can also be used to model discovery processes, if opportunely combined with the concept of the adjacent possible (AP)-the set of all those things which are one step away from what is already known (Kauffman [36]). This formulation of the AP, which dates back to concepts previously introduced by Farmer, Langton and others [37][38][39], has been translated into the urn model with triggering (UMT), a particular process in which the space expands together with the discovery dynamics, and the appearance of a novelty opens up the possibilities of further discoveries [4,[40][41][42][43]. UMTs could successfully replicate the basic signatures of real-world discovery processes, such as the famous Heaps' and Zipf's laws [44,45], often recurrent in complex systems [15,[46][47][48][49][50], as well as Taylor's law [51]. It turns out that the Heaps' law, a sublinear growth of the number of distinct elements D(t) ∼ t β with the number of elements t, well describes the pace at which scientists discover concepts, or users collect new items [40,52,53], with higher values of β denoting a faster exploration of the AP. However, despite the fact that existing models can capture essential underlying mechanisms behind the discovery of novelties, little emphasis is given to the collective dynamics of exploration and to the benefits that social interactions could bring. In fact, with the exception of * E-mail: i.iacopini@qmul.ac.uk † E-mail: v.latora@qmul.ac.uk Ref. [23], the modeled exploration dynamics refers to a single entity, representing, for example, the joint effort of researchers within a field [52]. Without taking into account the multiagent nature of the process, these models (i) do not capture the heterogeneity of the pace of the individual explorers and (ii) do not include the benefits brought by social interactions and collaborations. Indeed, empirical evidence of these mechanisms has been found in various contexts [54][55][56], such as music listening, politics, voting, and language [57][58][59].
In this Letter, we propose a model of interacting discovery processes where an explorer is associated to each of the nodes of a social network [60][61][62], and its dynamics is governed by an UMT. Hence, the local dynamics of each node accounts for the presence of an AP, more precisely the adjacent possible in the space of concepts. The social network makes the exploration a collective one, since processes of neighboring urns are coupled. This coupling expands the notion of the AP by adding a social dimension, represented by the set of opportunities one is possibly exposed to through his/her social contacts. We call this the adjacent possible in the social space. Social networks have been extensively used as a substrate on top of which dynamical processes take place [63,64]. Notice, however, that our setting crucially differs from the typical approach in which the network mediates, for example, the diffusion of innovations or social contagions [65,66]. Here, the interactions among the many discovery processes reveals the twofold nature of the AP of each individual, highlighting the crucial role played by the social structure in determining the individual exploration dynamics.
Model.-Let us consider an unweighted directed graph G(N , E), where N and E are, respectively, a set of N = |N | nodes and a set of E = |E| links. Each node of the graph represents an individual/agent, while link (i, j) denotes the existence of a directed social relation from individual i to j (such that i can benefit from j). The graph is described by its adjacency matrix A ≡ {a ij }, whose element a ij is equal to 1 if link (i, j) is present, and is 0 otherwise. Each node i is FIG. 1. Illustration of the model in the case of a network with two nodes. Each node is equipped with an urn obeying the UMT with the same parameters ρ = 2, ν = 1, and M0 = ν + 1. At the time t, the urns start with two balls, one red (R) and the other blue (B). Then, each node extracts a ball (1:R, 2:B), and therefore ρ additional balls of the same colors are added to the respective urns (reinforcement). Also, since in both cases, the extracted balls represent a novelty for the respective nodes, ν + 1 balls of new colors are also added (adjacent possible). At t + 1, node 1 has access to all its balls plus two extra ones coming from the adjacent possible in the social space, i.e., the set of balls available through its neighbor (dashed borders).
equipped with an UMT that describes the discovery process of the agent i [40]. We indicate the urn i at time t as U i (t), while S i (t) denotes the sequence of balls generated up to time Each urn i is characterized by two parameters, ρ i and ν i . As in the original UMT, the reinforcement parameter ρ i accounts for the number of balls of the same color that are added to the urn i whenever a ball of a given color is extracted at time t. Furthermore, the triggering parameter ν i controls the size of the adjacent possible in the space of concepts, as (ν i + 1) balls of new colors are added to the urn of node i whenever at time t a color is extracted for the first time [40]. In this abstract representation, the space of concepts-made by all the colors-expands in time together with each discovery process, without relying on a predefined structure [41]. Discovery processes of different individuals are then coupled through the links of the network, representing social interactions. Namely, at each time t, the individual i draws a ball from an enriched urn, the so-called social urn of node i,Ũ i (t), composed by its own urn plus the additional balls present at time t in the urns of its neighbors, without their reinforcement. The latter represents the AP in the social space. Figure 1 illustrates the case of two nodes with a directed link. We thus have: is the underlying set of the multiset U j (t) (with multiplicity m = 1), i.e., the set of size U j (t) = |U j (t)| formed by its unique elements. Duplicates in the urn associated to node j at time t are indeed not considered. Thus, the "memory" of node j due to the reinforcement does not influence node i. Similarly, let us denote with S i (t) the underlying set of the sequence S i (t), i.e., the sequence of all the unique elements of S i (t). We consider synchronous updates for all the urns.
Pace of discovery.-As previous works have shown [40], the dynamics of novelties and innovations share a number of commonalities and can, thus, be thought as two sides of the same process; a novelty refers to the discovery of something by an individual (already known to others), while innovations are novelties that are new to everybody. Here, we are interested in studying the asymptotic growth of the number of novelties-of each sequence-as a function of time (sequence length), representing the pace of discovery. We know, from standard results on the UMT [40], that an isolated urn i follows a Heaps' law, i.e., a power law behavior D i (t) ∼ t βi [44], D i (t) = |S i (t)| being the number of different elements contained in the sequence S i (t) up to time t. Thus, the Heaps' exponent β i quantifies the speed at which the urn discovers new elements (by definition bounded by β i ≤ 1). Let us consider now a node i that interacts through the network. In general, since D i (t) increases by one every time a ball is extracted for the first time, we can write 1] is the probability that the ball extracted at node i at time t never appeared in S i (t) before. In other words, and we can express it as the fraction of discoverable balls over the total number of balls available to node i at time t. This leads to an equation for the asymptotic Heaps' dynamics that in the continuous time limit reads: where A B denotes the multiset obtained by removing all the elements in set B from the multiset A (all duplicates are removed). Notice that if a node i has an out-degree j a ij = 0, its associated Eq. (2) reduces to the one of an isolated urn, for whichŨ i (t) = U i (t). Thus, its Heaps dynamics for ρ > ν follows D i (t) ∼ t ν/ρ for t → ∞ [40,51] (see Supplemental Material [67]). In the most general case, where each node i is equipped with a UMT(ρ i , ν i ), the equation for the Heaps' laws of each node i ∈ N can be written as in Eq. (2), by accounting for all the neighbors that are part of the social urn of node i. This can be done by using the non-zero elements of A, so that the number of ballsŨ i (t) in the social urn of node i at time t reads: where M 0 is the initial number of balls in each urn, and δ ij stands for the Kronecker delta. Finally, the large time behavior of the number of different elements D i (t) for each node i can be written as Equation (4) forms a system of N coupled non-linear ordinary differential equations, with initial conditions D i (0) = 0 ∀i ∈ N , that can be numerically integrated for any network topology {a ij }. Numerical results.-We start exploring the behavior of our model on the famous Zachary Karate Club network (ZKC) [73], where each node is equipped with an UMT(ρ = 6, ν = 3) with same parameters and initial conditions. We run different simulations and observe, for each node i, the average growth of the number of distinct elements D i (t) as a function of time. We then extract the values of the Heaps' exponents of each node as β i = β i (T ), where β i (t) = ln D i (t)/ ln t and T = 10 4 . Figure 2 shows the nodes of the networks colored accordingly. Notice the higher pace of discovery displayed by the notoriously central nodes corresponding to the instructor (node 1) and the administrator of the club (node 34). This proves that nodes with identical UMTs can have completely different dynamics, suggesting that a strategic location on the social network correlates with the discovery potential of an individual. To further investigate this relation, we study the dynamics on five small directed networks. Figure 3(a-e) shows the temporal evolution of D i (t) for each node i of the networks displayed on the left. We report the simulated Heaps' laws (colored points), whose extracted exponents β i are shown in the legend. In addition, to assess the validity of Eq. (4), we also plot the curves (continuous black lines) obtained using the appropriate {a ij }. It can be seen that the analytical formalism introduced perfectly captures the Heaps' laws, since lines are almost indistinguishable from (simulated) points. In particular, in Fig. 3(a) we observe the highest pace of discovery in the node with more outgoing links. However, the non-trivial behaviors observed in Fig. 3(b-e) for chains and graphs containing cycles indicate that the exponent of a node does not depend solely on local node properties. For instance, in Fig. 3(d) node 2 has two outgoing links, while the others have one link only. In contrast with what is observed in Fig. 3(a), here the highest pace of discovery is the one of node 1, whose social urn gets the benefits of the urn of node 2. Moreover, in Figs. 3(c) and (d) a simple change of direction of link 4 → 2 translates into completely different dynamics. We also notice that in both typical of empirical trajectories of diffusion process [74], here are strongly influenced by the topology of the network. This can be seen by comparing the two β 1 (t) of Fig. 3(f) and (g), both approaching-as we will see later-the asymptotic value ν/ρ = 0.5 but at very different timescales. Nevertheless, the ranking induced by the pace of discovery persists at all finite times. In the next section we will further investigate this characteristic behavior, ultimately proving its universality for all networks (see Supplemental Material [67]).
Analytical results.-In order to extract the asymptotic values of the Heaps' exponents, and their dependence on the network topology, we derive an analytical solution of Eq. (4) for t → ∞. Let us suppose ρ i = ρ and ν i = ν ∀i ∈ N . For sufficiently high values of ρ we have lim t→∞ D i (t)/t = 0 ∀i, so that the denominator of the rhs. of Eq. (4) can be approximated by ρt, leading to: where D(t) ≡ {D i (t)} i=1,...,N , I denotes the N × N identity matrix, and we have introduced the constant matrix M = f (A) = ( ν ρ I + ν+1 ρ A). By operating the change of variable t = e z , Eq. (5) can be rewritten as d z D(z) ≈ M D(z), a standard first-order differential system, which leads to the solution where {λ } =1,...,r and {m } =1,...,r are the eigenvalues of M with their respective multiplicities, and c p are vectors defined by the initial conditions. The asymptotic behavior of D i (t) is then governed by the leading term in Eq. (6), so that: where λ(i) is the eigenvalue of M with the biggest real part such that the i-th entry of at least one of its eigenvectors c p is different from zero. Similarly, p(i) is the maximum value of p among these eigenvectors, and, in general, can be less than the multiplicity of the eigenvalue λ(i) minus one. For example, in the case of a chain as in Fig. 3(b), the asymptotic solution is D i (t) ∼ u i ln N −i (t) t ν/ρ . In this example all the exponents tend to ν/ρ at large times, while at finite times nodes with higher powers in the logarithm show higher paces of discovery, thus explaining the behavior seen in Fig. 3(g) (see Supplemental Material [67]).
In the case of strongly connected graphs, Eq. (7) simplifies: the logarithmic correction disappears and all the asymptotic exponents are equal to the maximum eigenvalue λ = f ( µ) of M . In fact, for the Perron-Frobeniusnius theorem [75,76], A has a simple and positive maximum eigenvalue µ corresponding to an eigenvector u with all positive entries. Thus, the approximated solution becomes: where u i is proportional to the Bonacich eigenvector centrality [77] of node i, a global indicator of centrality that recursively quantifies the importance of a node from that of its neighbors, and not just from the number of neighbors. As a consequence of Eq. (8), for strongly connected graphs every node has approximately the same behavior t λ . What makes a node different from another is precisely the multiplicative factor u i . In cycles and cliques, nodes are all structurally equivalent (u i = u ∀i), meaning that they all have the same D i (t).
On the contrary, in graphs such as the ZKC (see Fig. 2), the different values of u i play a very important role. Most central nodes, as the instructor and the chief administrator, are the fastest explorers (highest β i ), even having the same asymptotic Heaps' exponent λ.
In the more general case in which a graph is not strongly connected, Eq. (7) still holds, and the same argument can be applied to each of the strongly connected components to recursively find the values of u i , p(i), and λ(i) (see Supplemental Material [67]). In such cases, the eigenvector centrality needs to be replaced by its natural extension to non-stronglyconnected graphs, i.e., the α-centrality [78]. We have investigated the correlation between the α-centrality and the pace of discovery in real-world networks. Figure 4 shows the scatter plot of the number of discovered colors D i (T ) and the normalized α-centrality c max in four empirical social networks: (a) the ZKC [73], (b) a Twitter network of followers [79], (c) a co-authorship network in network science [80] and (d) a collaboration network between jazz musicians [81] (see Supplemental Material [67]). The high values of the Spearman's rank correlations (r S ≥ 0.97 in all cases) found in both undirected [ Fig. 4(a,c,d)] and directed networks [ Fig. 4(b)] is in agreement with our predictions. This confirms that, together with the AP in the space of concepts, it is crucial to take into account of an AP in the social space.
In conclusion, we have presented a first example in which stochastic (and not deterministic) processes are coupled over the nodes of a complex network, and analytical insights on the relations between structure and dynamics are possible. The results highlight that the structural-not just local-properties of the nodes can strongly affect their ability to discover novelties. Our networked model of social urns is not just a simple extension of UMTs. What makes it novel and different is the very same idea of coupling together many urns over a complex social network, and the concept of "social urn" we have introduced. It is such a network coupling that spontaneously produces novel behaviors, such as different exponents of the Heaps' law in a single system, and has the potential to open new areas of research and applications. This work represents only a first step toward the inclusion of structured interactions in discovery processes. Urns can, in fact, result oversimplified models for the dynamics of individual explorers. Future works could consider non-identical urns, or even explore the effects of having individuals with a finite storage capacity, or where the adoption of the new might trigger the abandoning of the old, as for substitutive systems [82]. Another natural extension would be considering discoveries and social relationships unfolding across different network layers [83] or higher-order structures [84,85]. In addition, it would be interesting to study the relationship with existing models of social spreading and meme popularity [86][87][88]. Finally, our results could be directly applied in studies on efficient team structures in cooperative creative tasks [89][90][91][92][93].  In this section, we will study in more detail the analytical solutions we derived in the main text. We start by reviewing the case of an individual urn, which is equivalent to the urn model with triggering. We will then move on to more complicated cases, such as a pair of nodes, a chain, a cycle, a clique, ending with the formulation for the very general networks. Moreover, we will derive an algorithmic solution that allows deriving an analytical solution for each of the small networks studied in Fig. 3 of the main text. In every case, we will set the same parameters for each urn, so that ρ i = ρ (reinforcement) and ν i = ν (triggering) ∀i = 1, . . . , N . Each urn will be initialized with M 0 balls of different colors. These and the other colors-added from an individual i when triggered by a discovery-will be taken from a single predefined set of discoverable balls of different colors. Notice that this set is shared by all the urns so that once a ball is drawn from an urn, it will not be available anymore to the others, except when enlarging the urn through the social adjacent possible (if they are connected).

A. The single urn model
Let us consider the simplest case of an isolated urn, or equivalently, an urn on a node i for which the out-degree j a ij is null. In this case, the dynamics will be the same of the Urn Model with Triggering (UMT) [40,51], since the node does not have access to the balls of the neighbors, implying that its social urn will not be enriched (Ũ i (t) = U i (t)). For such a node, the equation for the asymptotic Heaps' dynamics, Eq. (2) of the main text, reduces to: where A B denotes the multiset obtained by removing all the elements in set B from the multiset A (all duplicates are removed). Equation (S1) can now be written as a function of the parameters of the model. In particular, we can write the total number of balls in the urn up to time t, U (t), as the initial number of balls M 0 , plus the ρ balls added (t times) as reinforcement, plus the (ν + 1) balls added (D(t) times, one for each novelty) due to the triggering mechanism: Similarly, the number of unique elements in the urn at time t, U (t), can be obtained by subtracting from U (t) the ρt repeated balls coming from the reinforcement, that is: Thus, using Eq. (S2) and Eq. (S3) in Eq. (S1) we obtain: From now onwards we suppose that t M 0 , so that we can disregard M 0 in Eq. (S4) and in the similar equations we will obtain in the following sections. Therefore, after the introduction of the auxiliary variable z(t) = D(t) t , Eq. (S4) can be rewritten as: which can be integrated as: The asymptotic solution (t → ∞) depends on the parameters ρ and ν. It can be shown, as in the Supplemental Material of Ref. [40,51], that the asymptotic solution for D(t) is that is precisely the Heaps' law [44], with sublinear growth for ρ > ν and linear for the other cases. As empirical data has shown [40,44], Heaps' laws usually have a sublinear behavior. For this reason, in this paper, we focus only on the case ρ > ν.

B. Two coupled urns
Let us consider now the simplest case of two coupled urns, that is a network with only two nodes connected by a directed edge (1 → 2), as in Fig. 1 of the main text. This is equivalent to a directed chain of N = 2 nodes, that will be discussed in the next Section for a general number N of nodes. The associated equations to determine the asymptotic Heaps' laws can be written expressing the probabilities P new i (t) to draw a new ball as the the fraction of discoverable balls over the total number of balls available to node i at time t: Notice that the right-hand side of Eq. (S8b) is simplified since node 2 does not have any outgoing link, and therefore its dynamics is the same of an isolated urn for whichŨ 2 (t) = U 2 (t). Thus, following the procedure discussed in the previous section, we have, for ρ > ν: The denominatorŨ 1 (t) of Eq. (S8a) can be expressed in terms of the two contributions coming from the two urns at time t, which reads:Ũ (S10) Similarly, the numerator of Eq. (S8a), consisting in the number of balls present in the social urn of node 1 at time t which did not appeared yet in S 1 (t), can be written as the total number of balls in the social urn of 1 at time t, minus the number of duplicates, minus the number of balls that do not represent a novelty anymore with respect to the sequence S 1 (t), i.e.: Then, using Eq. (S10) and Eq. (S11), the final expression for Eq. (S8a) reads: . (S12) For large times (t M 0 ) we can approximate Eq. (S12) as . (S13) Let us assume now that the dynamics of node 2 relaxes before the one of node 1, so that we can solve Eq. (S13) independently from Eq. (S9). In addition, if we suppose that lim t→∞ D(t)/t = 0, Eq. (S13) can be approximated as: The related homogeneous equation has a similar solution of Eq. (S9), i.e.: We now look for a solution like D 1 (t) = κ(t)D 1 (t) that, plugged into Eq. (S14), leads to: Thus, from Eq. (S9) and Eq. (S15) we get: whose solution is The asymptotic solution (t → ∞) of D 1 (t) is then approximated by: In conclusion, comparing the solutions in Eq. (S9) and Eq. (S19) the presence of an outgoing link increases the number of novelties with respect to an isolated urn dynamics. However, as we have shown here, this increase is approximately only logarithmic, meaning that we can see a slight increase at finite times which practically disappears for larger times. Le us also notice that this applies to the directed case, while in the case of an undirected link we would get identical Heaps' laws for both nodes i = 1, 2, without logarithmic corrections, but with higher exponents. This particular case is a cycle of two nodes, and as we will see in a dedicated section, cycles have their own behavior.

C. Chain of N urns
Let us consider now a slightly more complicated case. Let us suppose that the network is composed by an open chain of N urns, where there are only directed links (i → i + 1) , with i = 1, 2, . . . , N − 1. This is the case considered in Fig. 3(b, g) of the main text, where in that case N = 4. Analogously to the previous case, the associated set of equations governing the growth of the number of novelties can be approximated to: We can solve the system by solving each equation, starting from the last one and recursively substituting its solution into the equation above. Indeed, since node i = N does not have any outgoing link its independent Eq. (S20d) can be immediately solved, resulting in the known asymptotic solution: As in the previous case, in Eq. (S20c) we can consider D N −1 (t) to be the only unknown variable. Then, following the same analytical steps presented in previous section leads to: The same reasoning can be iterated for each node i. Let us now prove by induction on i that the asymptotic solution is We have already proved that this holds for i = N and i = N − 1. Let us now suppose that it holds for i and let us prove it for i − 1, with 1 < i < N . In the asymptotic limit, the equation for the growth of the number of novelties of node i reads . (S24) For the induction hypothesis, in Eq. (S24) the only unknown variable is D i (t). Therefore, we can consider the homogeneous associated equation which provides the approximated solution: As for the case of two coupled urns, we now look for a solution like D i−1 (t) = κ(t)D i−1 (t), that, plugged into Eq. (S24), leads to: Thus, we get whose solution is .

(S29)
Finally, after combining Eq. (S26) and Eq. (S29), we reach the solution for the dynamics of node i − 1, that reads: which completes the proof by induction. Finally, it is worth observing that the Heaps' laws would be very different if the links were undirected. This would indeed result, similarly to undirected cycles, in higher asymptotic Heaps' exponents.

D. Cycle of N urns
Directed cycle-Let us consider the case of directed cycles. As we will see, this is the simplest system leading to asymptotic Heaps' exponents that are higher than that of an individual urn. Let us hence suppose that every node i is connected just to the following one, node i + 1, with a directed link (i → i + 1), with i = 1, 2, . . . , N , where we identify node N + 1 with node 1. For a generic node i, the asymptotic differential equation for the growth of the number of novelties reads: For symmetry reasons, the dynamics of each node is the same, implying that D 1 (t) ≈ · · · ≈ D i (t) ≈ · · · ≈ D N (t). Hence, Eq. (S31) becomes that is equal to the equation of an individual urn [see Eq. (S4)], with ν = 2ν + 1. Therefore, if ρ > ν we have the solution Undirected cycle-Let us now consider cycles composed by undirected links. Let us suppose that N > 2, considered that for N = 1 the network reduces to an individual urn, and for N = 2 it is equivalent to a directed cycle of 2 nodes. For N > 2, each node i is therefore connected to two different nodes i − 1 and i + 1, and the associated equations to be solved are: Again, for symmetry reasons, we can equivalently write Eq. (S34) as that is equal to the equation of an individual urn [see Eq. (S4)], with ν = 3ν + 2. Therefore, if ρ > ν we have the solution Notice that for undirected cycles, since all connections are mutual, the resulting paces of discovery are higher than those in the directed case. However, in both cases, directed and undirected, the dynamics of each node does not depend on the length of the cycle.

E. Clique of N urns
Let us consider a N -clique, that is a fully connected network of N nodes, equivalently directed or undirected. Being every node i connected to all other nodes, all nodes are equivalent, and the general equation for the growth of the number of novelties of node i reads: For symmetry reasons, each urn follows the same dynamics and we can equivalently write Eq. (S37) as that is equal to the equation for an individual urn [see Eq. (S4)], with ν = N (ν + 1) − 1. Therefore, if ρ > ν we have the solution Let us observe that for any network with N nodes, the maximum allowed Heaps' exponent is hence [N (ν + 1) − 1]/ρ, which occurs only in the case of a fully connected network.

F. The general case
Let us consider a general graph G(N , E), either directed or undirected. In order to write and solve the equations for the growth of the number of novelties, we first have to calculate the probability P new i (t) of drawing a new ball from the urn of each node i. This can be done by considering the number of different colors present in the social urnŨ i (t) of node i at time t that have not been discovered yet by i, divided by the total number of ballsŨ i (t) present in its social urn at that time. The numerator can be expressed as |Ũ i (t) S i (t)|, which is the length of the multiset obtained by removing from the multisetŨ i (t) all the elements appeared in the sequence (taking out all duplicates). In other words, it is the number of unique colors present in the urn of node i and in the one of its neighbors (without their multiplicity) minus the number of colors already drawn (unique elements in the sequence of i). Considering that all (and only) the already discovered balls are those that have been reinforced and that the number of triggered colors added to the urn j is exactly (ν + 1)D j (t), we can write: or, equivalently: For t M 0 we can disregard the presence of M 0 in Eq. (S41). As shown above for N -cliques, in the asymptotic limit t → ∞ the growth of the number of novelties obeys an Heaps' law with maximum exponent [N (ν + 1) − 1]/ρ. This means that if ρ is high enough, we can approximate the denominator on the r.h.s. of Eq. (S41) to ρt. After finding the approximated solution, we will estimate the set of parameters for which this approximation is valid for any topology. Therefore, in the asymptotic limit and with a proper choice of the parameters, Eq. (S41) can be rewritten as: which can be expressed in a more compact way as: where I is the N × N identity matrix and M = f (A), with f (x) = ν ρ + ν+1 ρ x. By operating the change of variable t = e z , Eq. (S43) can be rewritten as a standard first-order differential system, i.e. d z D(z) ≈ M D(z), which leads to the solution where {λ } =1,...,r and {m } =1,...,r are the eigenvalues of M with their respective multiplicities, and c p are vectors defined by the initial conditions. The asymptotic behavior of the number of novelties D i (t) discovered by node i at time t is then governed by the leading term in Eq. (S44), so that we can write: where λ(i) is the eigenvalue of M with the biggest real part such that the i-th entry of at least one of its eigenvectors c p is different from zero. Similarly, p(i) is the maximum value of p among these eigenvectors with non-zero i-th entries. In general, then, λ(i) might not be the maximum eigenvalue of M , like p(i) might be less than the multiplicity of the eigenvalue λ(i) minus one. Moreover, different nodes may have different values for these exponents. In particular, we have the same exponents for nodes in the same strongly connected components (SCCs), while they may vary from SCC to SCC. In the following paragraphs we will investigate this aspect.
Strongly connected network-Let us suppose that the graph G(N , E) is strongly connected. In this case the solution given by Eq. (S45) simplifies. Indeed, in this case, the corresponding adjacency matrix A = {a ij } is irreducible [94]. Let us recall that for irreducible matrices the Perron-Frobenius theorem holds [75,76], according to which there exists a positive eigenvalue µ greater or equal to (in absolute value) all other eigenvalues. Such eigenvalue corresponds to a simple root of the characteristic equation and the corresponding eigenvector u has all positive entries too. The latter vector is a multiple of the Bonacich eigenvector centrality vector [77]. Widely used in network science, the Bonacich eigenvector centrality is a measure that recursively accounts for local and global properties of the network, relying on the notion that a node can be highly central either by having a high degree or by being connected to others that themselves are highly central [62]. Simple algebraic steps can prove that if µ is an eigenvalue for A, then λ = f (µ) is an eigenvalue for M . Moreover, if u is an eigenvector corresponding to the eigenvalue µ of A, then u is also an eigenvector corresponding to the eigenvalue λ = f (µ) of M . Therefore, if µ is the maximum eigenvalue of A, then λ = f ( µ) = ν ρ + ν+1 ρ µ > 0 is the highest eigenvalue of M , and with the same positive eigenvector u. Thus, for strongly connected graphs, the approximated solution given by Eq. (S45) becomes meaning that all nodes have similar Heaps' laws, and the key difference is made by their eigenvector centrality. As we saw in the main text (and we will see here more in details), these differences, more pronounced in transient times, will contribute to determine the fastest explorers in the network. Moreover, we deduce that the approximation used in Eq. (S42) is valid provided that λ = f ( µ) < 1, that is ρ > ν + (ν + 1) µ, while for higher values of ρ the solution is bounded by the linear solution as seen for the individual urn in Eq. (S7), since in the original system in Eq. (S40) we have d t D i (t) ≤ 1.
Non-strongly connected network-Let us now consider the most general case, that is a directed or undirected graph with any hypotheses of connectivity. Let us construct an algorithm to determine the pace of discovery of each node, which will help us better understand analytically why some nodes have higher paces of discovery. To do this, let us partition the graphs into its strongly connected components (SCCs), i.e. maximal strongly connected subgraphs of G, which can be found in linear computational time, for example with a DFS-based algorithm [68]. Let all the SCCs be indexed as C 1 , . . . , C p , with C i ∩ C j = ∅ ∀i = j.
Without loss of generality, let us suppose that the graph G is weakly connected, because otherwise we can repeat the same reasoning for each weakly connected component. Let us also suppose that the number of SCCs is p > 1, because otherwise the graph would be strongly connected, which we already discussed in the previous paragraph. Since G is weakly connected, for each SCC C q there must exist another component C l , with l = q, such that there are some links from C q to C l or viceversa. However, there cannot be links in both directions (from C q to C l and viceversa), because otherwise they would be a unique SCC. It is also easy to show that there is always a SCC without any outgoing links to other SCCs. Eventually permutating the indexes of the SCCs, let us call C 1 , . . . , C p1 all the components with no outer links. Then, for each 1 ≤ q ≤ p 1 , the respective system of differential equations for D i , i ∈ C q , does not depend on any outer variable D j , j ∈ C l = C q . Therefore, we can consider C q as an independent strongly connected subgraph of G, for which the reasoning in last paragraph holds. The solution for these SCCs is then: where λ (q) is the maximum eigenvalue of the adjacency matrix of subgraph C q and γ (q) i is a multiple of the eigenvector centralitiy for node i in C q . Found all the Heaps' laws relative to the nodes in C 1 , . . . , C p1 , it is possible to show that there exist SCCs C p1+1 , . . . , C p2 that have links only towards the previously studied SCCs C 1 , . . . , C p1 , with p 2 > p 1 . Then, choosing C q one of these other SCCs, let λ (q) be the highest eigenvalue of the adjacency matrix of C q . Let alsoλ (q) = max l≤p1 (γ ql λ (l) ) be the maximum of the Heaps' exponents in Eq. (S47) of the SCCs reachable from C q , where γ qr = 1 if there is at least a link from C q to C l , γ qr = 0 otherwise. As we will see further in this section, the Heaps' solutions for the nodes in these SCCs is: meaning that the Heaps' exponent λ (q) for node i in C q , p 1 + 1 ≤ q ≤ p 2 , is that is the maximum of the highest eigenvalue λ (q) of M relative to C q and the highestλ (q) of the Heaps' exponents λ (l) for , a factor ln(t) appears in the solution. The same procedure can be repeated for all other successive SCCs C q , keeping in mind that now a higher power ln p(q) (t) of log(t) can appear. In this algorithmic process, let us now consider a generic SCC, say C q , and let us suppose we have solved inductively all the equations for the Heaps' law of the nodes in the already examined SCCs, that is C 1 , . . . , C q−1 . Let us recall that we arranged the indexes in such a way that the only outgoing links from C q are pointed to nodes in previous SCCs, i.e. in some of the SCCs C 1 , . . . , C q−1 . For this reason, in order to solve the asymptotic differential equations responsible for the Heaps' law of the nodes in C q , we can consider only the equations relative to the nodes in C q in Eq. (S43), since the previous SCCs have been already solved and the following variables do not appear in these equations. We hence have to solve the following approximated equations: hence the solution: .
In conclusion, when dealing with a network with multiple strength connected components, we solve the equations for the components that are independent from the others. Then we consider the SCCs that have links only to previous SCCs, applying the method just described. This is repeated until every SCC is studied, thus solving the whole system and describing the pace of discovery of each node of the entire network analytically, obtaining solutions of the type in Eq. (S45). In the next section this algorithmic method is applied to simple networks with N = 4 nodes, as we have already implicitly done above for a two nodes network and for chains.
G. Application to the five graphs in Fig. 3 As an application of the analytical results of the previous sections, we study here the very same five networks reported in Fig. 3 of the main text. In particular, we will be able to provide an explicit expression for the growth of the number of novelties at each of the four nodes of the social network.
Graph a-Let us consider a network where nodes 2, 3, and 4 do not have any outgoing links, while node 1 has the links 1 → 2, 1 → 3, and 1 → 4 to all other nodes (see network representation in Table S1). Let us observe that the dynamics here is very similar to the case of a couple of urns with the only link 1 → 2. Nodes 2, 3, and 4 can be considered as three individual urns, for which the Heaps' law is the same to the classic one in Eq. (S9), that is: As for node 1, the differential equation for the Heaps' law is approximated by: The resolution of Eq. (S65) is the same as the one done for the couple of urns, with only a multiplicative factor 3. Therefore, the Heaps' solution for node 1 is: which means that node 1 has a higher pace of discovery than nodes 2, 3, and 4, but at asymptotic times they will show the same Heaps' exponent. Moreover, it is clear that in star-like networks adding more nodes does not increase significantly the pace of discovery.
Graph b-The next network we studied is a chain of 4 nodes, with links 1 → 2, 2 → 3, and 3 → 4. This network has already been studied in Sec. 1.3, and the solutions are: This analytical result shows us why node 1 has an higher pace of discovery than the other nodes, with lower Heaps' exponents for higher nodes. This is due to the presence of different powers of the logarithm. In the end, however, they all have the same asymptotic Heaps' exponent, meaning that the difference is visible only at finite times.
Graph c-Let us consider a network made by a directed cycle between nodes 2, 3 and 4, with links 2 → 3, 3 → 4, and 4 → 2, and another node 1 linked directly to node 2 (1 → 2). In this case, we can distinguish two SCCs, the cycle and node 1. Since there is no link going out from the cycle, we start solving the Heaps' law equations related to it. As we have seen in Sec. 1.4, the solution is given by Eq. (S33) with N = 3, that is: Now let us consider the remaining SCC, namely node 1. Its equation is the same as Eq. (S13) for the two coupled urns case in Sec. 1.2, with the only difference that here the solution of D 2 (t) has a higher exponent. Then, if we search for a solution like being the solution of the associated homogeneous equation, we get: whose solution is: which gives the asymptotic solution: We could have obtained the same result using the algorithm developed in the last section. In this case, node 1 gets the same dynamics of the nodes in the cycle, with just a scaling factor (ν + 1)/ρ, since the maximum eigenvalue of its SCC (node 1 itself) is lower than the maximum eigenvalue of the SCCs he is linked to (the cycle).
Graph d-In this case we consider the same network as the last graph we just analyzed swapping the direction of the link 4 → 2. Therefore, the cycle is broken (see network representation in Table S1), and as we are about to see, the dynamics is much more similar to a chain. We could give a detailed solution as done for the chain; instead, we are going to use directly the algorithm we developed to assess all the exponents in the Heaps' laws for every node. Let us start from node 4, which has no outgoing links. This node is hence an individual urn, with the usual solution: Let us move on to the SCC with outgoing links only towards previously studied SCCs, that is the SCC composed by node 3. If this SCC had no outgoing links, then it would be an isolated urn, therefore with the same exponent of the other SCC studied (node 4), meaning that the actual solution for node 3 has that exponent and a logarithmic factor. Indeed, the dynamics of node 3 is the same derived for the couple of urns in Sec. 1.2, which is: Proceeding with node 2, we compare its exponent if it was isolated to the maximum of the exponents of node 3 and 4, which are all the same. Moreover, since node 3 has a higher power in the logarithm than node 4, in the asymptotic solution, we can disregard the presence of the link 4 → 2. Thus, the solution for node 2 has another logarithmic factor and another constant multiplicative factor than those of node 3, that is we have the solution: To complete, similarly we obtain the solution for node 1, i.e.: TABLE S1. Summary of the asymptotic Heaps' laws derived analytically for the 4 nodes composing the five networks reported in Fig. 3 of the main text (here displayed at the top). The coefficients ui have not been reported to focus on the exponents of the power laws and the logarithms, when present.
We can hence see that the solutions are equal to those of the chain in Sec. 1.7.b, and there are only some slight differences at finite times due to the presence of another link, but not significantly.
Graph e-The last case to be examined is again similar to Graph c, but this time we swap the direction of the link between nodes 1 and 2 (see network representation in Table S1). Here the order with which we study the SCCs is inverted, because now only node 1 has no outer links. Therefore, the Heaps' law for node 1 is the classic individual one in Eq. (S7). Then we have to solve the equations for the cycle, which in this case are: (S76) In this system, we can consider D 1 (t) known, working at large time-scales. Therefore, following the algorithm described in Sec. 1.6.2, we first solve this system without the external sources (i.e. node 1), in order to find the leading solution and then compare the exponents with the external sources ones. The solution of the associated homogeneous system is the same of a directed cycle as in Eq. (S33), i.e. a power-law function with exponent 2ν + 1/ρ. Now, we observe that the Heaps' exponent of the cycle is higher than the exponents of outer SCCs it is linked to, that is just node 1 with exponent ν/ρ. Then, the asymptotic solution for the nodes in the cycle corresponds to the solution of the cycle as if it had no outer links. Explicit solutions are given in Table S1.

II. NODE RANKING AND HEAPS'LAW
In this section, we study more in details the validity of the eigenvector centrality and α-centrality to rank the nodes in a social network according to their discovery dynamics. First, we describe the real-world data sets considered. Then, we test the persistence of the nodes ranking based on the fitted Heaps' exponents at different times. Finally, we explain why the eigenvector centrality and the α-centrality lead to the same ranking of the Heaps' exponents for strongly-connected and generic networks respectively. All simulations in this section are performed with model parameters: ρ = 10, ν = 1, M 0 = ν + 1.

A. Description of the data sets
We consider four data sets of real-world networks representing different types of social interactions: the Zachary Karate Club (ZKC) network [73], a network of follower relationships among Twitter users [79], a co-authorship network in Network Science [80], and a collaboration network between jazz musicians [81]. The network of Twitter from the original data set (Ref. [79]) has been reduced by performing a random walk sampling. Some basic properties of the networks are summarized in Table S2, like the total number of nodes N , the total number of links E, the average degree k , and the maximum eigenvalue µ of the related adjacency matrix. Moreover, we have shown some properties of connection of the networks. In particular, we distinguished weakly-connected components (CCs) and strongly-connected components (SCCs), because they play an important role in the dynamics under investigation. Therefore we showed the number of both CCs and SCCs, as well as the size of the respective largest one. As we can see, the networks we have chosen have all very different properties, either in size, average degree, and connection.

B. Rank persistence
In the main text, we have developed a networked model for the dynamics of discovery that introduces an heterogeneity in the paces of discovery, as it happens in real-world social networks. In the previous sections, we concentrated on finding an analytical asymptotic solution of the Heaps' laws. However, for most of applications we are interested in transient times. As can be seen in Fig. 3 of the main text, the paces of discoveries, represented by the fitted Heaps' exponents, change in time, depending on the network characteristics and the model parameters. Nonetheless, the ranking of the nodes based on these fitted exponents remains almost the same. To show this, we plot in Fig. S1 the scatter plot and the Spearman's rank correlation coefficient between the fitted Heaps' exponents β(T ) at T = 10 4 and T = 10 8 , together with their distributions, for the four real-world networks presented in the last section. In all cases, we get a Spearman's correlation of 1.00, meaning that even though the distribution of fitted exponents change, the ranking is time-invariant and does not depend on the particular T at which Heaps' exponents are fitted. Let us observe that we used a set of parameters that in all cases invalidate the approximations used in the analytical study, i.e. ρ < ν + (ν + 1) µ. This is evident in the scatter plot of Fig. S1(b), where, apart from a set of nodes whose exponents span across the entire range, most of the nodes present a very low pace of discovery, with fitted exponents very close to 0. A similar thing can be seen in Fig. S1(d), for which we have the highest eigenvalue and hence the highest Heaps' exponents among the four networks (with all Heaps' exponents very close to 1). All this is a strong indication that the various paces of discovery have to depend on some structural characteristics of the networks.
In the following sections, we keep investigating the relations between Heaps' exponents and network measures. In particular, we focus on the eigenvector centrality and the α-centrality, respectively useful for strongly-connected graphs and generic graphs. More insights on these centrality measures will be provided, both from a numerical and an analytical point of view.

C. Heaps' exponents and the eigenvector centrality
In the main text, we have shown that in strongly connected graphs each urn has the same asymptotic Heaps' exponent, and the driving factor for each node is the associated asymptotic coefficient. As we saw when we derived the asymptotic expression of the Heaps' law for strongly connected graphs in Eq. (S45), the Heaps' exponent corresponds to the maximum eigenvalue λ of the matrix M = ν ρ I + ν+1 ρ A, where A is the adjacency matrix. In particular, because of the Perron-Frobenius theorem [75,76], we know that λ is positive and simple, and the related eigenvector u has all positive entries. We also derived that the coefficients of the Heaps' laws are all multiples of this eigenvector. A lot of importance has been given in the past to this vector, from which we can derive the eigenvector centrality, also known as the Bonacich centrality [77]. As a definition, the eigenvector centrality c where λ is the highest positive eigenvalue [75]. This centrality measure accounts for both local and global properties of the network, as it is not just dependent on the degree of the node, but also on the positioning of each node in the network [69].
FIG. S1. Scatter plot and Spearman's rank correlation coefficient rS between fitted Heaps' exponents βi(T ) at T = 10 4 and T = 10 8 associated to the i = 1, . . . , N nodes off the four empirical networks considered: (a) the Zachary Karate Club network [73], (b) a network of follower relationships of Twitter [79], (c) a co-authorship network in network science [80] and (d) a collaboration network between jazz musicians [81].
Our analytical investigation showed us that for strongly connected components we expect the same asymptotic Heaps' exponents. However, the same analysis showed us that the coefficients depend on the eigenvector centrality. This factor plays a role in the transient times, when we are far from the asymptotic regime, and it is thus especially important for real-world systems.
To complement the results presented in the main text, we now test numerically the correlation between the eigenvector centralities and the measured Heaps' exponents at transient times for the Zachary Karate Club network. Figure S2(a) shows the scatter plot and the Spearman's rank correlation of the eigenvector centralities and the fitted Heaps' exponents at time T = 10 4 for the (largest strongly connected component of) ZKC network and in Fig. S2(b) its visualization with color-coded nodes (cfr Fig. 2 of the main text). The resulting Spearman's rank correlation higher than 0.98 persists changing the parameters in the simulations, even for sets of parameters in contrast with the approximations used in the analytical study, i.e. ρ < ν + (ν + 1) µ. We can hence conclude that the eigenvector centrality is an optimal proxy for the distribution of Heaps' exponents in strongly connected social networks, and it can be used to give a faithful ranking of the individual expected paces of discovery.

D. Heaps' exponents and the α-centrality
In this section we focus on generic directed graphs and the usage of the α-centrality as a proxy for the ranking of the nodes based on their pace of discovery in these more general cases. The α-centrality, widely used in network analysis [70,71], has been first introduced in Ref. [78] to extend the eigenvector centrality to asymmetric graphs. The underlying idea is to tune the influence of the adjacency matrix structure with a parameter α to add exogenous sources to the centrality [62,78]. Formally, it is defined as the vector u such that where e is an N -dimensional vector of ones. The matricial form of Eq. (S78) reads: where I is the N -dimensional identity matrix. It has also been shown that this centrality is equivalent to Katz-centrality [72] given by with a being an attenuation factor. In fact, it has been shown that the equality c (K) = − e + c (α) holds, i.e. these two centralities differ only by a constant [78]. From Eq. (S78) and (S79), it is clear that the α-centrality can be both a local and global measure. In fact, for α → 0 + , the relative importance of the structure given by the adjacency matrix A decreases, in favor of the exogenous factor given by e. With higher values of α, instead, the role of the exogenous part is damped.
For an undirected graph, the α-centrality becomes proportional to the eigenvector centrality when α → (1/ µ) − , where µ is the highest positive eigenvalue of the adjacency matrix. In fact, in this case all eigenvalues are real and the eigenvectors are orthogonal. Following Ref. [78], let {µ } and { u } be the (eventually multiple) eigenvalues and eigenvectors of the adjacency matrix A, with µ = µ 1 > µ for = 1. Then we can write A = N =1 µ u u T . Considering that A k = N =1 µ k u u T , from Eq. (S79) we have: When α → (1/ µ) − , the factor relative to = 1 in the last term of Eq. (S81) becomes the leading term, thanks to the Perron-Frobenius theorem, so that we can write: where we have noted with c (E) the eigenvector centrality.
Let us now generalize the analytical steps above to understand why the α-centrality correlates with the fitted Heaps' exponents for generic graphs, as we showed numerically in the main text for real-world social networks. Let us suppose that the social network is a weakly-connected directed graph, since otherwise we can repeat the same argument for each weakly-connected component. As we have shown before, the asymptotic behavior of the Heaps' law for node i is of the type u i ln p(i) (t)t λ(i) . We have shown also that the values of p(i) and λ(i) for each strongly connected component can be determined algorithmically. Here we will show that not only the α-centrality can account for the coefficient u i like the eigenvector centrality, but also for the different values of p(i) and λ(i). Let us first concentrate on what happens with multiple eigenvalues, for which the biggest difference is primarily given by p(i). Therefore, let us suppose for now that all SCCs in the graph have the same Heaps' exponent λ(i) = λ, but different values of p(i), and that in the leading terms the maximum value assumed by p(i) is p max < N . This is the case for example of an open chain (already studied above), where p(i) = 0, 1, . . . , N − 1 for i = N , N − 1, . . . , 1 respectively, and p max = N − 1. Notice that, in this particular case, the adjacency matrix has only one eigenvalue µ, related to the Heaps' exponent λ through the relationship λ = f ( µ), with f (x) = ν ρ + ν+1 ρ x. Therefore, the Jordan canonical form of the adjacency matrix is: where P = u 1 u 2 . . . u N has the generalised eigenvectors in each column, and J j denotes the N × N matrix with ones only in the (j + 1)-th upper diagonal and null everywhere else, with J 0 = I. It is possible to show that Hence, from Eq (S79), similarly to what we have done in Eq. (S81), we have: From the above, it is clear that the nodes for which ( u 1 ) is positive have the greatest α-centrality when α → 1/ µ, since they they are associated to the highest power in the logarithm p(i) = p max . Among these, as with the eigenvector centrality, nodes with higher coefficients (corresponding to the eigenvector centralities in that SCC) have higher ranking. Then the nodes who have zeroes in u 1 but positive entries in u 2 are next in the ranking, and so on. This confirms the fact that, when comparing nodes with same asymptotic Heaps' exponent, those with higher discovery rates, i.e. those with higher powers in the logarithm factor, have the highest α-centrality.
A similar approach to the one we used to derive the algorithmic solution of the Heaps'law for a generic graph can be used to treat generic weakly-connected graphs. Let us divide the network into its SCCs. For each component C q , we denote µ (q) the maximum between the maximum eigenvalue the component would have if isolated and the maximum eigenvalue of the neighboring SCCs, following the same order used with the developed algorithm. In this setting, it is then possible to compute the α-centrality at α → (1/µ (q) ) − , that might be different across SCCs. The final ranking is given by ordering the evaluated α-centralities starting from those with the highest µ (q) .
It is worth noticing that this method can be computationally not efficient, especially for big networks. For this reason, we have tested how reliable the α-centrality with the same value of α is when comparing it to the Heaps' exponents, regardless of the procedure above. In Fig. 4 of the main text we have investigated the relation between Heaps' exponents and α-centralities setting α to 0.85/ µ. Here, we further investigate how the correlation changes with α. This is shown in Fig. S3, where we plot the Spearman's rank correlation coefficient between the paces of discovery β i (10 4 ) and the α-centralities c [α] i as a function of α for all the nodes i = 1, . . . , N composing the four considered real-world networks. Although panel (d) displays a decrease in the correlation when approaching 1/ µ, however, setting α < 1/ µ leads to Spearman's rank correlation coefficients r S > 0.89 in all four cases (cfr main text).