Hierarchical route to the emergence of leader nodes in real-world networks

A large number of complex systems, naturally emerging in various domains, are well described by directed networks, resulting in numerous interesting features that are absent from their undirected counterparts. Among these properties is a strong non-normality, inherited from the strong asymmetry that characterizes such systems and guides their underlying hierarchy. In this work, we consider an extensive collection of empirical networks and analyze their structural properties using information-theoretic tools. A ubiquitous feature is observed amongst such systems as the level of non-normality increases. When the non-normality reaches a given threshold, highly directed substructures aiming towards terminal (sink or source) nodes, denoted here as leaders, spontaneously emerge. Furthermore, the relative number of leader nodes describes the level of anarchy that characterizes the networked systems. Based on the structural analysis, we develop a null model to capture features such as the aforementioned transition in the networks' ensemble. We also demonstrate that the role of leader nodes at the pinnacle of the hierarchy is crucial in driving dynamical processes in these systems. This work paves the way for a deeper understanding of the architecture of empirical complex systems and the processes taking place on them.

Many real systems in nature are constituted by single entities that interact with one another through complex structures. The architecture of these interactions has been the subject of study within the field of network science over the past two decades [1,2]. Recently, the directedness and hierarchical nature of real networks have attracted significant focus [3-5]. One prime example is the literature demonstrating how many real networks, from biological to social, possess both a strong asymmetry and non-normality [4,6,7]. This ubiquitous property of real networks has resulted in the concept of non-normal networks, described as those whose adjacency matrix $A$ is (strongly) non-normal, by definition implying that $A^T A \neq A A^T$ [8]. One striking feature of this finding is the implication that empirical networks are structurally similar to directed acyclic graphs (DAG), from which they also inherit their strong non-normality [6]. Previous results have illustrated the effects of non-normality on the collective dynamics of a variety of processes in areas such as ecosystem stability [9], synchronization of networked electrical devices [10], neuronal dynamics [11], network resilience [7], trophic relationships [12], pattern formation [13,14], and information transmission [15]. The asymmetric nature of the networks, a ubiquity within the field of complex systems [6], has been shown to result in qualitatively different behavior for the processes taking place on them in comparison to those observed upon their symmetric counterparts. In particular, perturbations of a stable state cause a transient growth proportional to the level of non-normality in the linear regime, which may result in a permanent instability in the nonlinear regime [6-8].
Motivated by the structural and dynamical properties of non-normal networks, in this paper we aim to gain a higher-level understanding of the relationship between the hierarchy and directedness of networked structures and the behavior of the dynamical processes therein. With this aim we consider a classical measure that bridges these two aspects: the entropy rate (ER) [16]. This measure provides an estimation of the inherent randomness, arising from the underlying network structure, within a given dynamical process [17,18]. The rationale behind the choice of an entropic measure is based upon the intuition that real networks with a strongly non-normal structure should have a lower entropy rate, due to their hierarchical (directed) topology, in comparison to a general random network [6]. In this sense, we view the network as transporting a quantity of mass (e.g., molecules, information, energy) across it, driven by its DAG-like structure. Through this interpretation, the non-normality becomes a meaningful measure to quantify the level of polarization within the flow's equilibrium states as an immediate consequence of the network's topological properties.
With these ideas in mind, we proceed to study the entropy rate of the generic random walk taking place on a large collection of real-world networks from a wide variety of domains. Surprisingly, we notice that in the ensemble of networks the ER collapses to zero as the non-normality increases. This change in ER is generally monotonic for lower values of non-normality, but once some threshold for higher values of non-normality is surpassed the ER abruptly decreases to zero, indicating a sharp transition. Such a pervasive vanishing of the ER measure within the ensemble of empirical networks is a remarkable occurrence and is, to our knowledge, a novel observation. To understand this evidently universal phenomenon, we first observe that the ER becomes zero when each of the Strongly Connected Components (SCC) in which mass is trapped is constituted by a single node with only incoming edges (sink node), which in this paper we denote as a leader.
It is quite surprising, firstly, that all of the trapping states are simultaneously constituted by sink nodes (no SCC with multiple nodes coexists with leader ones), and secondly, that this property depends upon a global structural feature such as non-normality. A visual illustration of this behavior is shown in Fig. 1. To shed light on the mechanism responsible for the emergence of leader nodes, we extend our analysis to each network as a whole, looking for hierarchical structures. Once a hierarchical ranking of the nodes is obtained based upon their proximity to the leaders, an immediate conclusion is reached: real networks share the same underlying pattern of hierarchy. In fact, the edges can be classified in three groups: the first constitutes the hierarchical backbone, the DAG substructure made by links that are part of (at least) one directed path to the leaders; the remaining links, considerably fewer in number, either show a common distribution linking higher hierarchical levels with lower ones or create rich-club communities between nodes of the same hierarchical level.
FIG. 1. The universal emergence of leaders with increasing non-normality. Schematic illustration of the evolution of hierarchical structure within networks: starting from random graphs with small non-normality (left panel), a hierarchical structure develops as the network becomes increasingly non-normal (middle panel). Finally, when the non-normality surpasses an empirical threshold, leader nodes, those with no out-degree at the top hierarchical level (the magenta nodes), simultaneously emerge across the ensembles of networks. Blue edges represent links directed up the hierarchy, while red edges are those directed downwards. Green edges connect nodes at the same hierarchical level.
Based on these empirical observations, we propose a mechanistic null model for generating non-normal networks with emerging leader nodes. Following the recipe proposed in Ref. [6], we deploy the classical Price's model [19] to generate the DAG-backbone links. Subsequently, we complete the model by adding reciprocal links in such a way as to compensate for the amount of non-normality, controlled by an external parameter. Our model accurately captures the empirically observed relation between the entropy rate and the level of non-normality within the ensemble of networks and, importantly, the abrupt emergence of leader nodes.
We believe that the ubiquitous emergence of leader nodes is not accidental, but a result of an evolutionary process driven by strong benefits in the collective dynamics of interacting individuals. To emphasize this idea, we consider a classical framework for competition dynamics [7,20], and show that the hierarchical structure results not only in an obvious benefit for the leaders, who have considerably higher survivability, but also for the individuals directly related to them.

I. RESULTS
In this study we aim to understand the relationship between the emerging structural properties of real-world networks and their level of non-normality, and how such features affect the dynamical processes that take place upon them. To conduct this analysis we must first quantify a network's non-normality which, in general, means that the underlying adjacency matrix satisfies $A^T A \neq A A^T$ [8]. With this aim, we consider the normalized Henrici departure
$$\hat{d}_F(A) = \frac{\sqrt{\|A\|_F^2 - \sum_i |\lambda_i|^2}}{\|A\|_F},$$
where $\|\cdot\|_F$ denotes the Frobenius norm and $\lambda_i$ are the eigenvalues of the matrix $A$ [8]. This quantity varies between the extreme values of a symmetric network ($\hat{d}_F(A) = 0$) and an exact DAG ($\hat{d}_F(A) = 1$).
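As a minimal illustrative sketch (not part of the original analysis), the normalized Henrici departure can be evaluated numerically as the square root of $(\|A\|_F^2 - \sum_i |\lambda_i|^2)/\|A\|_F^2$. The function name `normalized_henrici` and the two toy matrices are assumptions introduced here for illustration only.

```python
import numpy as np

def normalized_henrici(A):
    """Normalized Henrici departure from normality of a square matrix A."""
    A = np.asarray(A, dtype=float)
    fro_sq = np.sum(A ** 2)                       # squared Frobenius norm
    if fro_sq == 0:
        return 0.0                                # empty network: treat as normal
    eig_sq = np.sum(np.abs(np.linalg.eigvals(A)) ** 2)
    # Clip tiny negative values caused by floating-point round-off.
    return float(np.sqrt(max(fro_sq - eig_sq, 0.0) / fro_sq))

# Hypothetical toy examples: an undirected triangle and a 3-node DAG.
sym = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
dag = np.array([[0, 1, 1], [0, 0, 1], [0, 0, 0]])
print(normalized_henrici(sym), normalized_henrici(dag))
```

The symmetric example sits at the normal extreme and the strictly triangular one at the DAG extreme, matching the two limiting cases described above.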
Since in our analysis the entropy rate will be a diagnostic tool for the level of structural non-normality and the consequent emergence of leaders, we need to formalize its definition with a specific dynamical process. Without loss of generality, we consider in the sequel the generic random walk that describes the flow of some quantity, which we refer to as mass, that moves between the nodes of a network following the rules specified in the dynamics of the process. We define the fraction of mass present on node $j$ at time $t$ as $q_j(t)$, which may move to the neighboring nodes with some probability dependent on the number of connections the node has [21] (see Methods for more details). Specifically, the transition rate $T_{ij}$ describes the probability of the particle (constituting the mass, information, etc.) moving from node $j$ to node $i$ in each time step. We describe this process via its stationary distribution $q_j^* = \lim_{t\to\infty} q_j(t)$, which describes the steady state of mass on each node. The entropy rate of the process is then given by
$$h = -\sum_{i,j} T_{ij}\, q_j^* \log T_{ij}.$$
This quantity provides a gauge of the level of randomness of the network structure through the transition matrix associated with a given stochastic dynamical process [16,17].
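The following sketch illustrates how the entropy rate of the random walk could be computed, using the standard Markov-chain form $h = -\sum_{i,j} T_{ij} q_j^* \log T_{ij}$. Several choices here are assumptions rather than the authors' implementation: the convention `A[j, i] = 1` for an edge from `j` to `i`, the rule that mass on a node without outgoing edges stays there, and the use of a lazy update (averaging the current and next state) to damp oscillations on periodic networks. The system is started from a uniform distribution and iterated for a fixed number of steps without a convergence check.

```python
import numpy as np

def transition_matrix(A):
    """Column-stochastic T with T[i, j] = P(j -> i); A[j, i] = 1 encodes edge j -> i."""
    A = np.asarray(A, dtype=float)
    k_out = A.sum(axis=1)
    T = np.zeros_like(A)
    for j in range(len(A)):
        if k_out[j] > 0:
            T[:, j] = A[j, :] / k_out[j]
        else:
            T[j, j] = 1.0              # assumption: mass on a sink stays there
    return T

def entropy_rate(A, steps=2000):
    """Entropy rate of the random walk started from a uniform mass distribution."""
    T = transition_matrix(A)
    n = len(T)
    q = np.full(n, 1.0 / n)            # each node starts with mass 1/N
    for _ in range(steps):
        q = 0.5 * (q + T @ q)          # lazy step (assumption) to avoid oscillations
    logT = np.log(T, where=T > 0, out=np.zeros_like(T))
    return float(-np.sum(T * logT * q))  # column j is weighted by q[j]

triangle = np.ones((3, 3)) - np.eye(3)                  # undirected triangle
funnel = np.array([[0, 1, 1], [0, 0, 1], [0, 0, 0]])    # all mass ends on node 2
print(entropy_rate(triangle), entropy_rate(funnel))
```

In the second toy network all mass ends up on a single sink, so the entropy rate vanishes, which is the situation associated with leaders in the text.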
Equipped with these tools, we now proceed to consider a large variety of empirical networks and the effect that their structure has upon the corresponding process. Subsequently, we will use the entropy rate as the main observable of a null model for generating synthetic networks that mimic the properties observed in the empirical ones.

A. Real-world networks
We now consider a large collection of 124 (directed) real-world networks from a wide range of domains spanning from biology to social interactions, including communication, ecological, and transport networks, among many others. To make our analysis compatible with all the networks under scrutiny, we take the following steps. Firstly, since the calculation of the entropy rate requires knowledge of the stationary distribution which, in the case of directed graphs, is not necessarily unique [21], we initialize the system uniformly, such that each node has mass equal to the reciprocal of the network size, before proceeding to observe the dynamics until convergence. The second step is the rescaling of the entropy rate in order to make two distinct networks comparable with one another. As such, we propose using the following quantity,
$$\hat{h} = \frac{h_A}{h_{H(A)}},$$
which we call the relative entropy rate. The term $h_A$ describes the entropy rate of the network and $h_{H(A)}$ that of the Hermitian part of its adjacency matrix, $H(A) = (A + A^T)/2$, which may be viewed as a symmetrized version of the network. This choice is motivated by the fact that the random walk diffusion tends to accumulate the mass in the nodes of higher degree. Such a rescaling is crucial in distinguishing the effect of the directedness of a network's structure upon the equilibrium state of the random walk diffusion in comparison to a related symmetric network [1,3,4]. An important property of this measure is related to how we choose the adjacency matrix. In fact, in the data describing real networks, the direction of edges can vary based upon interpretation, namely both the adjacency matrix $A$ and its transpose $A^T$ may be eligible, according to what an outgoing (incoming) edge physically means. To avoid this possible mismatch and make our measure uniform across all the networks, we choose the direction of edges that minimizes $\hat{h}$. Lastly, we highlight that, following this convention, a sink node can behave as a source node and vice versa depending upon the interpretation of directionality.
We present in Fig. 2 the results of this simulation for the entire dataset of empirical networks we have collected. The first fact that can immediately be noticed is that for most of the networks (more precisely 85% of them), the entropy rate equals zero. Such a result is surprising, since an entropy rate equal to zero implies that the mass has been accumulated predominantly in sink nodes (nodes with only incoming edges) [22]. The reader can readily verify this by referring to Eq. (S5) and the Methods section. In the analysis to follow, we shall call these nodes leaders. The other interesting fact is that for the remaining 15% of the networks there appears to exist a negative correlation between the non-normality and the corresponding normalized entropy rate. In fact, the most remarkable finding is that there appears to be an orderly monotonic decrease of the ER for four families of networks, namely those describing roads, trade, neuronal, and animal relationships, in which there is a collapse of entropy values once the level of non-normality underlying the networks within said domains surpasses a certain value. An exception to this trend is observed only in the subdomain describing levels of travel between airports, but we view this outcome as abnormal due to the physical structure of the networks [23]. These two important aspects that characterize the empirical networks in our dataset raise the question, which we consider in the section to follow, of what the underlying mechanisms are that result in the simultaneous emergence of leaders in real-world networks, and why their occurrence is so ubiquitous among empirical systems.
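A hedged sketch of the rescaling step is given below. It assumes that the relative entropy rate is the ratio of the entropy rate of the directed network to that of its symmetrized version $H(A) = (A + A^T)/2$, evaluated for both $A$ and $A^T$ and keeping the smaller value. The helper names, the handling of nodes without outgoing edges (mass stays in place), the lazy update, and the `nan` returned when the symmetrized entropy rate vanishes are all assumptions made for this illustration.

```python
import numpy as np

def entropy_rate(W, steps=2000):
    """Entropy rate of the random walk on weights W (W[j, i] > 0 means edge j -> i)."""
    W = np.asarray(W, dtype=float)
    n = len(W)
    T = np.zeros((n, n))
    for j in range(n):
        s = W[j].sum()
        if s > 0:
            T[:, j] = W[j] / s         # T[i, j] = P(j -> i)
        else:
            T[j, j] = 1.0              # assumption: a sink keeps its mass
    q = np.full(n, 1.0 / n)            # uniform initial mass
    for _ in range(steps):
        q = 0.5 * (q + T @ q)          # lazy step (assumption) for convergence
    logT = np.log(T, where=T > 0, out=np.zeros_like(T))
    return float(-np.sum(T * logT * q))

def relative_entropy_rate(A):
    """min(h_A, h_{A^T}) / h_{H(A)} with H(A) = (A + A^T) / 2."""
    A = np.asarray(A, dtype=float)
    h_sym = entropy_rate((A + A.T) / 2)
    if h_sym == 0:
        return float("nan")            # ratio undefined (assumed convention)
    return min(entropy_rate(A), entropy_rate(A.T)) / h_sym

sym = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])     # symmetric network
dag = np.array([[0, 1, 1], [0, 0, 1], [0, 0, 0]])     # single sink (leader)
mixed = np.array([[0, 1, 1], [1, 0, 1], [0, 1, 0]])   # one reciprocal link removed
for name, M in [("sym", sym), ("dag", dag), ("mixed", mixed)]:
    print(name, relative_entropy_rate(M))
```

Taking the minimum over the two orientations mirrors the convention of choosing the edge direction that minimizes the relative entropy rate.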

1. Emergence of leaders
The observed monotonic behavior in the relative entropy rate $\hat{h}$ across networks from numerous domains, as the level of non-normality increases and eventually surpasses a certain threshold, implies that the dynamics of the processes taking place on these networks become less random as a consequence of the increasing directedness of the underlying structure. This behavior is a result of an increasing accumulation of mass within a given set of nodes that receive from, but do not contribute to, other nodes in the network (terminal SCC). However, the abundance of cases in which a complete collapse of the entropy rate occurs indicates that the accumulation has in fact reached a critical state whereby the mass has been accumulated on single nodes, where it is trapped forever. In this paper, we focus on how these leaders emerge as a consequence of the non-normality underlying the network structure. As such we pose the following question: are they created at random while the non-normality increases? To address this question, we have measured the number of leaders in each empirical network, as shown in Fig. 3. It can be immediately noticed that (almost) all the scrutinized empirical networks which have $\hat{h} \neq 0$ have no leaders in their structure. Furthermore, we observed a collapse of the entropy rate across the ensemble of networks from all domains. Such results validate our belief that the emergence of leaders is not an unorganized behavior; on the contrary, it occurs simultaneously once a given threshold of non-normality is reached.
FIG. 2. Normalized entropy rate dependence on the network polarization in real-world networks. The network non-normality, quantified by the normalized Henrici departure $\hat{d}_F$, versus the normalized entropy rate $\hat{h}$ for 124 empirical networks from a large range of domains is shown. At first glance, the set of networks seems to be grouped into a consistent majority (∼ 85%) having $\hat{h} = 0$ and a small minority (∼ 15%) with $\hat{h} \neq 0$. In particular, for those data with non-zero entropy rate, a monotonically decreasing relationship of the entropy rate with the non-normality is observed. When some threshold value of the Henrici measure is reached across the ensemble of networks, a transition-like behavior occurs for four of the subdomains under consideration, which results in an abrupt collapse of the entropy rate towards zero. This occurrence indicates that the mass is accumulated exclusively in single nodes without outgoing edges, or leaders. Inset: the four subdomains where the transition occurs (neuronal, trade, animal relationship, and roads) are shown for a more detailed inspection of the emergent behavior. Also shown are a number of percentiles (0.5%, 25%, 50%, 75%, and 99.5%) of the two quantities obtained from $10^4$ realizations of the proposed null model ($N = 100$) with increasing threshold. These realizations indicate how our model manifests a similar transition in the entropy rate with increasing non-normality.
Figure 3(a) gives us further information regarding the percentage of leaders in the real-world networks. Inspired by the term leader we use in this paper, we characterize the networks according to the relative number of leaders per network (similarly to those found in Ref. [4]). If the accumulation of mass in the steady state within a given terminal structure is shared between several nodes, then we view the network as having an oligarchic structure; this is the case, for instance, for all the networks with $\hat{h} \neq 0$. On the contrary, when the leaders emerge with increasing non-normality, the organization can be considered autocratic or anarchic if the overall fraction of leader nodes is low or high, respectively. For example, most empirical networks belonging to the neuronal, animal relationship, and social relationship subdomains, among others, tend to have a strong autocratic structure. On the other hand, networks such as metabolic and genetic ones are highly anarchic, with a star-like shape. Other domains such as food webs and communication networks, however, may vary from an autocratic to a more anarchic organization. This classification is further demonstrated via a k-means clustering approach [24] in Fig. 3(b), which naturally finds the three classes of organizational structure described above.

2. Hierarchical Structure
The results presented so far consider only a special subset of nodes, the leaders, without any discussion as to how they relate to the other nodes of the network. Motivated by the identification of emergent behavior across our ensembles of non-normal empirical networks, we now proceed to consider how the remainder of the topology describing these networks is shaped in relation to their leaders. Specifically, we consider a ranking of the nodes in relation to their position in the hierarchical structure underlying the leader nodes. In this sense each node has a level of importance based upon its proximity to a leader. We first identify each of the leaders before searching for shortest paths originating from these nodes [1] to each other node in the network. This results in each of the network's constituents having a hierarchical label based upon its minimum distance to a leader, as outlined in the Supplementary Material (SM). The resulting ranking is such that leaders have a hierarchical label of zero, their direct neighbors a label of one, and so on.
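The ranking procedure can be sketched as a multi-source breadth-first search started from all leaders at once. This is an illustrative sketch only: it assumes that leaders are nodes with no outgoing and at least one incoming edge, and that the distance of a node is the length of the shortest directed path leading from it to a leader (so the search follows edges backwards). The exact procedure is the one outlined in the SM; the function name and toy edge list are hypothetical.

```python
from collections import deque

def hierarchy_levels(n, edges):
    """Map each node to the length of its shortest directed path to a leader.

    edges is a list of (u, v) pairs meaning u -> v. Nodes that cannot reach
    any leader are left out of the returned dictionary.
    """
    out_deg = [0] * n
    preds = [[] for _ in range(n)]        # preds[v] = nodes u with an edge u -> v
    for u, v in edges:
        out_deg[u] += 1
        preds[v].append(u)
    leaders = [v for v in range(n) if out_deg[v] == 0 and preds[v]]
    level = {v: 0 for v in leaders}       # leaders sit at level zero
    queue = deque(leaders)
    while queue:
        v = queue.popleft()
        for u in preds[v]:                # walk edges backwards, away from leaders
            if u not in level:
                level[u] = level[v] + 1
                queue.append(u)
    return level

# Hypothetical toy network with a single leader (node 0).
toy_edges = [(1, 0), (2, 0), (3, 1), (3, 2), (4, 3), (1, 2)]
levels = hierarchy_levels(5, toy_edges)
print(levels)
```

Because the search starts from every leader simultaneously, each node receives its minimum distance over all leaders in a single pass.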
With the labels obtained for all nodes in each network, we proceed to determine the types of relationship facilitated by each edge, schematically visualized in Fig. 4. Firstly, there are ascending edges (blue) that are aimed towards the leader, and thus contribute to directing the flow to it. In contrast, we denote as descending edges (red) those that shift the flow from nodes of higher hierarchy, that is, nearer to the leader, towards those of lower hierarchy. Lastly, edges between two nodes with the same hierarchical level are called neutral edges (green); they keep mass at a certain level. We highlight that, by definition, neither descending nor neutral edges can originate from leader nodes.
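Given hierarchical labels, the three edge types described above can be assigned by comparing the labels of the two endpoints. The sketch below is illustrative: the function name, the `"unranked"` label for edges touching nodes without a hierarchical label, and the toy data are assumptions introduced here.

```python
from collections import Counter

def classify_edges(edges, level):
    """Label each directed edge u -> v using the hierarchical levels of u and v."""
    kinds = {}
    for u, v in edges:
        if u not in level or v not in level:
            kinds[(u, v)] = "unranked"     # endpoint without a hierarchical label
        elif level[v] < level[u]:
            kinds[(u, v)] = "ascending"    # points towards the leaders (blue)
        elif level[v] > level[u]:
            kinds[(u, v)] = "descending"   # points away from the leaders (red)
        else:
            kinds[(u, v)] = "neutral"      # stays within the same level (green)
    return kinds

# Hypothetical toy data: node 0 is the leader (level zero).
toy_levels = {0: 0, 1: 1, 2: 1, 3: 2}
toy_edges = [(1, 0), (2, 0), (3, 1), (1, 2), (1, 3)]
kinds = classify_edges(toy_edges, toy_levels)
counts = Counter(kinds.values())
print(counts)
```

Since a leader has no outgoing edges, it can never be the source of a descending or neutral edge, consistent with the remark above.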
The organization of the nodes following the recipe described above is now explicitly considered for a set of empirical networks from different domains: the citation network to the Small & Griffith paper up to the year 2001 [25], the email network from the Democratic National Convention in 2016 [26], the E. coli gene regulatory network [27], and lastly the network of concatenated words in Dr. Seuss's novel Green Eggs and Ham [3]. For an extended analysis of the entire dataset of networks, the interested reader may refer to the SM. In each of the leftmost panels, we consider the fraction of (weighted) blue and red edges that correspondingly enter or leave the hierarchical levels indicated on the horizontal axis. The inset histograms show the total number of edges for each of the three edge types above. We see in each case a common structural pattern (the same pattern can be observed for most of the empirical networks in the SM) where, in particular, a considerably larger fraction of blue and green edges in comparison to red ones indicates that the flow up the hierarchy is considerably higher than that in the opposite direction. The center panels provide an insight into the exact hierarchical structure by indicating the fraction of edges between each pair of hierarchical levels, where we again notice a considerably larger proportion of blue (represented here by the upper triangular elements) and green (those along the diagonal) edges compared to red ones (in the lower triangular part). This analysis provides quantitative evidence that the structure of empirical networks benefits those nodes closer to the pinnacle of the hierarchical structure. The right-side schematics provide an illustrative indication of the type of structure present in the network. Remarkably, it can be noticed here that nodes belonging to the hierarchical levels right after the leader have, on average, a high concentration of incoming ascending edges and neutral self-loops (this can also be further noticed in the SM). This occurrence is suggestive of a rich-club-like effect [28]. As we shall show in the sequel, both these features prove to be ultimately beneficial to the leaders and the nodes immediately associated with them, which we denote here as entourage nodes.
FIG. 4. In all the cases, a significantly higher number of upward edges is observed for the nodes at the hierarchical level immediately next to the leader nodes, the entourage nodes. It can also be noticed that the downward red links, fewer in number, tend to redistribute the flow from higher to lower hierarchical levels. Lastly, more green links are associated with the entourage set of nodes, yielding a rich-club effect.

B. Network generation models
After the preceding systematic empirical study of the hierarchical properties of the real-world networks, we now consider mechanistic models with the aim of shedding light, both analytically and through simulation, upon the possible mechanisms that relate a network's non-normality and the corresponding emergence of leader nodes. With this motivation we propose a novel model which is based on the structural features observed within the empirical networks shown in Sec. I A and, most importantly, can capture the emergence of leader nodes.
The generation mechanism is organized in two stages. First, we create a network using the renowned Price's model, originally used to model the emergence of a citation network [19,29]. According to this recipe, at each time step, a new node $j$ creates $m$ directed edges to $m$ (distinct) already present nodes, where the likelihood of joining to a node $i$ is proportional to its in-degree. Importantly, in the case of $m > 1$ this network immediately entails two interesting features: it is an exact DAG with one leader node (the first to appear), and it also exhibits a hierarchical structure that may contain each of the three types of edges described in Sec. I A 2. In fact, although the wiring of new incoming nodes is more likely to be towards the nodes with high in-degree (and generally closer to the leader), in the case of two or more links, connections to nodes with the same or lower hierarchical level may also occur. In order to control variations in the level of non-normality, we move to the second stage, whereby we consider creating reciprocal edges of those generated in the first stage, similarly to Ref. [6]. The reasoning behind considering the reciprocal links is to capture the entire spectrum of behaviors from symmetric to DAG networks, which would otherwise be impossible. An important observation here is that, if the distribution of reciprocal edges is uniform, this will lead to a larger number of edges from the seed node in the Price's model (which generally has a large number of incoming edges). This goes against the hierarchical structure observed in the empirical networks in Fig. 4.
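The first stage described above can be sketched as follows. This is a minimal illustration and not the authors' code: the constant `offset` added to the in-degree (needed so that nodes with zero in-degree can be selected, as in the standard formulation of Price's model), the rule that the earliest nodes connect to all existing nodes when fewer than `m` are available, and the function name are assumptions. The second stage (reciprocal edges) is not included.

```python
import random

def price_model(n, m, offset=1.0, seed=None):
    """Grow a directed network; each new node links to up to m older nodes.

    Returns a list of edges (new, old). Attachment probability is proportional
    to in-degree + offset (the offset is an assumption of this sketch).
    """
    rng = random.Random(seed)
    in_deg = [0] * n
    edges = []
    for new in range(1, n):
        candidates = list(range(new))                 # nodes already present
        weights = [in_deg[i] + offset for i in candidates]
        k = min(m, new)                               # early nodes have fewer choices
        targets = set()
        while len(targets) < k:                       # resample until k distinct targets
            targets.add(rng.choices(candidates, weights=weights)[0])
        for old in targets:
            edges.append((new, old))
            in_deg[old] += 1
    return edges

edges = price_model(n=50, m=3, seed=1)
out_deg = [0] * 50
for u, v in edges:
    out_deg[u] += 1
sinks = [v for v in range(50) if out_deg[v] == 0]
print(len(edges), sinks)
```

Every edge points from a newer node to an older one, so the result is a DAG by construction, and the seed node is the only node without outgoing edges, that is, the single leader.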
To deal with this issue, we distribute the reciprocal edges according to a fitness model inspired by the well-known Bianconi-Barabási model [30]. We generate a reciprocal edge $i \to j$ with probability proportional to $1/k_i^{\mathrm{out}}$, such that an edge is included if this quantity surpasses a certain threshold $p$, with which the level of non-normality may be varied. This decreases the role of a node's importance in the first stage and maintains the distribution of hierarchical edges observed in the empirical networks. Note that for $p = 0$ all reciprocal edges are drawn, resulting in a symmetric network, while $p = 1$ implies an exact DAG.
A schematic demonstration of this model at multiple growing stages is provided in Fig. 5(a), while simulations of such networks with $m = 3$ and their corresponding normalized entropy rate as a function of the parameter $p$ are shown in Fig. 5(b). To validate our model, we compare properties from ensembles of synthetic graphs to the ground truth data where the entropy rate transition occurs. It can be observed in the inset of Fig. 2 that the model, in spite of its relative simplicity, fits the data very well and captures the emergent behavior of the leader nodes. Simpler generation models that, although not able to describe the behavior of real data, give good intuition about the relationship between non-normality and the ER are considered in the Supplementary Material.

C. The role of leaders in dynamical processes
So far, we have demonstrated that empirical networks are characterized by a rich structure which apparently evolves across ensembles of networks with increasing non-normality, culminating in the occurrence of leader nodes.
Although the importance of strongly directed hierarchies has been shown to be a signature of many complex systems in nature, and the decisive role of non-normality in the dynamics has similarly been highlighted on several occasions [6-9,15], to the best of our knowledge, the ubiquitous occurrence of an abundance of leader nodes within natural systems, and in particular their apparent emergence in relation to a global measure such as non-normality, has yet to be discussed. Consequently, it is important to illustrate, from a holistic point of view, the role of said leaders in the dynamical processes taking place on such systems. With the aim of providing this exemplification in a generic manner, we consider a competition process where $N$ identical individuals, among whom a leader exists, compete for energy or mass (or resources, in more general terms), measured by $x = [x_1, x_2, \ldots, x_N]$, which flows through the connections between the individuals encoded by the adjacency matrix $A$. To keep the model simple, we consider here a bistable dynamical system with two possible (stable) states that each individual $i$ can have, namely it can either go extinct, where $x_i = 0$, or survive, in the case $x_i > 0$.
We describe this process mathematically via the following system of diffusively coupled equations,
$$\dot{x}_i = r x_i \left(\frac{x_i}{A} - 1\right)(1 - x_i) + D \sum_{j=1}^{N} L_{ij} x_j,$$
where $x_i$ describes the density of the $i$-th species, $L_{ij} = T_{ij} - \delta_{ij}$ are the entries of the random walk Laplacian matrix, $r$ is the reproductive rate, $D$ is the diffusion coefficient, and $A$ is a parameter which allows the introduction of an unstable state, necessary for the bistability. Notice that the transport operator used here is the mean-field equivalent of the random walk process considered throughout for the entropy rate [1]. This model resembles that used to describe a phenomenon known in ecology as the Allee effect, the principle that undercrowding, or a small density of a species' population, decreases the likelihood of said species surviving [20]. Recently it has been shown that, for the case of symmetric networks, when the initial densities are small, i.e., $0 < x_i(0) \ll 1, \forall i$, each individual becomes extinct. Conversely, for a non-normal system the behavior can result in some species surviving with some equilibrium density $x_e \neq 0$ [7].
To provide some understanding of the particular role of leaders, we simulate this process upon an empirical network constituted by a group of female Japanese macaque monkeys, where the edges of the network represent dominance interactions between two animals [31], and in which a single leader exists. Figure 6(a) shows the evolution of the mass of each node, $x_i(t)$, through the solid lines, and we see that the leader (blue) survives along with two other nodes (red and green) which are at the next level of hierarchy within the network. The network itself (direction of edges omitted) is shown in Fig. 6(b), and we can observe the proximity of the surviving nodes both to one another and to the leader. The outcome of these dynamics provides an indication as to how the specific hierarchical ranking of the individuals within the network can be a benefit not only for the leaders but also for those who position themselves in close proximity to said leaders. In order to further comprehend this phenomenon we consider the linearized model, whose evolution is governed by the system of equations
$$\dot{x}_i = -r x_i + D \sum_{j=1}^{N} L_{ij} x_j,$$
where $r$ is the decay (death) rate and $D$ is the diffusion coefficient as before. The evolution of the system in this case is presented, with initial conditions equivalent to the nonlinear case, in the inset of Fig. 6(a). This simplified system has a unique fixed point, $x_e = 0$, which ultimately also defines the final outcome. Nevertheless, since the survivability of each node depends on the balance of mass received and released per unit of time, when the decay is slow compared to the diffusion rate, $r \ll D$, it might occur that nodes with a high positive balance initially accumulate a larger quantity of mass in comparison to the other nodes before eventually losing it in the asymptotic regime. This behavior is known in the literature as transient growth [8] and characterizes a large number of real systems [6]. However, such transient growth can turn into an instability mechanism when we deal with nonlinear systems, pushing the system unexpectedly far from the steady state predicted by the linear analysis. This is the case for the individuals who survive in our scenario. In particular, having the role of a leader implies that the flow balance will always be large and positive, constituting a major benefit for the individual under consideration.
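The transient growth of the linearized model $\dot{x}_i = -r x_i + D \sum_j L_{ij} x_j$ can be illustrated on a toy network. The sketch below is not the simulation behind Fig. 6: it assumes a hypothetical three-node chain whose last node is the leader, takes $L = T - I$ with $T_{ij}$ the probability of moving from $j$ to $i$, assumes that the leader retains its mass (a unit diagonal entry in $T$), and uses a simple forward-Euler scheme with arbitrarily chosen parameters satisfying $r \ll D$.

```python
import numpy as np

def simulate_linear(T, r, D, x0, dt=0.01, t_max=200.0):
    """Forward-Euler integration of dx/dt = -r x + D (T - I) x."""
    L = T - np.eye(len(T))                 # random walk Laplacian
    x = np.array(x0, dtype=float)
    history = [x.copy()]
    for _ in range(int(t_max / dt)):
        x = x + dt * (-r * x + D * (L @ x))
        history.append(x.copy())
    return np.array(history)

# Hypothetical chain 0 -> 1 -> 2; node 2 is the leader. T[i, j] = P(j -> i).
T = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])            # T[2, 2] = 1: the leader keeps its mass (assumption)
traj = simulate_linear(T, r=0.05, D=1.0, x0=[0.1, 0.1, 0.1])
leader = traj[:, 2]
print(leader.max() / leader[0], leader[-1])
```

The leader's mass first rises well above its initial value, because it receives flow without releasing any, and only later decays towards the fixed point $x_e = 0$. In a nonlinear bistable model, such a transient excursion is what may carry a node past the unstable state.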

II. DISCUSSION AND CONCLUSIONS
In this paper we have studied the architecture of the hierarchical structures underpinning a large collection of empirical networks through the lens of the emerging leader nodes. Based on a tool borrowed from information theory, the entropy rate, we conducted a study aimed at quantifying the amount of randomness underlying the distribution of the equilibrium state for a random walk process occurring on top of the network under analysis. In particular, we have related the configuration of equilibrium states to the polarization of the network structure by quantifying the latter through a global measure of the network, namely the graph non-normality [6-8]. Considering such a setting, we observed a remarkable property of the entropy rate, which applies across the real-world networks considered: it decreases monotonically as the level of non-normality describing the networks increases.
One particularly interesting and surprising result is that the entropy rate exhibits an abrupt collapse across the ensemble of networks once their non-normality level exceeds a certain threshold. We show that this phenomenon is directly related to the emergence of leader nodes, namely those with only incoming (or only outgoing) edges, across the networks. In fact, these nodes are not present in the more normal networks and instead appear once a certain threshold has been reached. With these nodes identified, we proceeded to obtain a hierarchical ranking of the other nodes based upon their distance from the leaders. We used the resulting orientation to identify three categories of links: those that direct the flow towards the leaders, those that redistribute the mass from the leaders to other nodes lower in the hierarchy, and lastly, the intermediate edges that link nodes of the same hierarchical level. Ubiquitously, the links that "feed" the leaders form a considerable majority compared to both other categories. Based on these observations, we developed a null model for the generation of non-normal networks with the aforementioned topological properties, capturing the ground truth relationship between the entropy rate and the network non-normality, particularly the discontinuous transition behavior which yields the leader nodes. This apparently ubiquitous behavior across domains is characteristic of first-order phase transitions [32].
The leader nodes, either sinks or sources depending upon interpretation, can prove crucial in different scenarios. Possible examples can be found in ecology, e.g., the dominance hierarchies among individual animals (sink nodes that receive the "benefits" from other members) or food webs (sink nodes where the biomass accumulates); control engineering, e.g., the master-slave coupling of oscillators (source node, the "master" node that imposes the oscillating frequency and phase); social interactions, e.g., contagion dynamics (source node that seeds the infection), etc. However, the leaders' role in collective behavior, particularly concerning the underlying non-normal dynamics, has been neglected so far. The present paper briefly illustrates this role in the case of a simple competitive dynamics between individuals occurring on an empirical dominance hierarchical network. We show that the privileged status of a leader node is related to the fact that it absorbs the flow without the constraint of releasing it. Of particular importance is the balance of incoming and outgoing flux that a node has, which results in an advantage even for the entourage nodes, those immediately connected to the leaders.
Based on the apparent ubiquity of leaders in real-world systems, we are confident that our finding will trigger future exciting research directions, and contribute to better understanding how different dynamical systems are affected by this emergent phenomenon.

A. Random walk process and its entropy
The random-walk process considered in this article describes some quantity, which we refer to as mass, that is transported between nodes such that at each time its particles move from node $j$ to one of its neighbors $i$ with transition probability

$$T_{ij} = \frac{w_{ij}}{k_j^{\mathrm{out}}},$$

where $k_j^{\mathrm{out}} = \sum_i w_{ij}$ is the outdegree of node $j$. Note that, following this definition, the mass cannot leave the nodes without outgoing edges. We consider, as in [17,33], the probability $q_j(t)$ that the random walker who represents a unit of mass is present at node $j$ at time $t$, such that the vector $q(t) = [q_1(t), q_2(t), \dots, q_N(t)]$ describes the proportion of mass at each node at time $t$ (with $\sum_j q_j(t) = 1$). The dynamics of this system is thus given by

$$q(t+1) = T q(t).$$

In general we are concerned with the long-time behavior of these systems, i.e., the stationary distribution $q^*_j = \lim_{t \to \infty} q_j(t)$. The existence and uniqueness of this distribution depend strongly on the structure of $T$, which is itself determined by $A$. Finally, the entropy rate, which represents the amount of information required to describe the diffusion process in question [16,17], is given by

$$h = -\sum_{i,j} T_{ij}\, q^*_j \ln(T_{ij}).$$

Importantly, in the case of (sink) leader nodes, this process results in an entropy rate of zero: all the mass is accumulated in these nodes, and the only contributing term is due to the leader node $j$ with $T_{jj} = 1$, for which $\ln(T_{jj}) = 0$.
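The definitions above can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the walk is started from a uniform distribution, and a lazy iteration (half of the mass stays put at each step) is an assumption introduced here so that the iteration also settles on periodic networks. Sink nodes are given a self-loop $T_{jj} = 1$, as stated in the text.

```python
import numpy as np

def transition_matrix(W):
    """Column-stochastic T with T[i, j] = W[i, j] / k_out[j].

    W[i, j] is the weight of the edge from node j to node i.
    Nodes without outgoing edges keep their mass (T[j, j] = 1).
    """
    W = np.asarray(W, dtype=float)
    k_out = W.sum(axis=0)              # k_out[j] = sum_i w_ij
    T = np.zeros_like(W)
    for j in range(W.shape[0]):
        if k_out[j] > 0:
            T[:, j] = W[:, j] / k_out[j]
        else:
            T[j, j] = 1.0
    return T

def stationary_distribution(T, tol=1e-12, max_iter=100_000):
    """Long-time distribution reached from a uniform start.

    Assumption of this sketch: the lazy walk (I + T) / 2 is iterated
    instead of T; it has the same stationary states but no periodicity.
    """
    n = T.shape[0]
    q = np.full(n, 1.0 / n)
    lazy = 0.5 * (np.eye(n) + T)
    for _ in range(max_iter):
        q_next = lazy @ q
        if np.abs(q_next - q).sum() < tol:
            return q_next
        q = q_next
    return q

def entropy_rate(W):
    """h = -sum_{i,j} T_ij q*_j ln(T_ij), with 0 ln 0 taken as 0."""
    T = transition_matrix(W)
    q = stationary_distribution(T)
    log_T = np.zeros_like(T)
    mask = T > 0
    log_T[mask] = np.log(T[mask])
    return float(-((T * log_T).sum(axis=0) @ q))

# Directed chain 0 -> 1 -> 2: node 2 is a sink leader, so h vanishes.
W_chain = np.zeros((3, 3))
W_chain[1, 0] = 1.0
W_chain[2, 1] = 1.0
h_chain = entropy_rate(W_chain)

# Undirected triangle: each step is a fair choice between two neighbors.
W_triangle = np.ones((3, 3)) - np.eye(3)
h_triangle = entropy_rate(W_triangle)
```

In the first example every column of $T$ is deterministic, so the entropy rate is zero; in the second it equals $\ln 2$.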

Supplementary Note I. NON-NORMALITY METRICS
Throughout this work we focus on directed, weighted graphs described by the $N \times N$ adjacency matrix $A$, which has elements $w_{ij}$ describing the weight of an edge from node $j$ to node $i$. In order to quantify the level of non-normality present in a given network we make use of two measures from matrix theory. The first of these is known as the Henrici departure from normality,

$$d_F(A) = \sqrt{\|A\|_F^2 - \sum_{i=1}^{N} |\lambda_i|^2},$$

where $\|\cdot\|_F$ denotes the Frobenius norm and $\lambda_i$ represents the eigenvalues of the matrix [8]. As this quantity does not have a natural scale, we instead consider the normalized Henrici departure from normality $\hat{d}_F(A) = d_F(A)/\|A\|_F$, which varies between the extreme values of a symmetric network ($\hat{d}_F(A) = 0$) and an exact DAG ($\hat{d}_F(A) = 1$).

The second measure of non-normality considered in the article is the unbalance $\Delta$ between the entries in the upper and lower triangular elements of the adjacency matrix,

$$\Delta = \frac{|K_< - K_>|}{K_< + K_>}, \qquad K_< = \sum_{i<j} \tilde{A}_{ij}, \qquad K_> = \sum_{j<i} \tilde{A}_{ij},$$

where $\tilde{A}$ represents a relabeled version of the original adjacency matrix obtained via an optimization procedure. A heuristic strategy looks for the matrix which maximizes the unbalance between its upper and lower triangles; the search space of this procedure is navigated by simultaneously swapping two randomly picked rows and their two corresponding columns of the original adjacency matrix. The heuristic implemented in this case is simulated annealing, similar to [6]. Its output should approximate the closest the network structure may be to a DAG in one of the triangles of the resulting matrix.

The two metrics are shown in Supplementary Figure 1 for the 124 empirical networks used in this study (the exact numeric quantities can be found in Supplementary Note III). A strong positive correlation between the two quantities is immediately apparent, demonstrating their usefulness in describing the level of non-normality present in a network and the pervasive nature of this feature among empirical systems. This relationship has previously been commented on for a smaller collection of networks in Ref. [6]; the analysis provided here thus gives further evidence of the finding.
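As an illustration of the two metrics, the sketch below evaluates the normalized Henrici departure and the unbalance for a given adjacency matrix. The function names are hypothetical, and the unbalance is computed for the node ordering supplied by the caller only; the simulated-annealing relabeling described above is not reproduced here. Summing the entries of each triangle (rather than counting them) is an assumption that coincides with the count for unweighted networks.

```python
import numpy as np

def henrici_normalized(A):
    """Normalized Henrici departure: sqrt(||A||_F^2 - sum |lambda_i|^2) / ||A||_F."""
    A = np.asarray(A, dtype=float)
    fro_sq = float(np.sum(A ** 2))
    if fro_sq == 0.0:
        raise ValueError("the adjacency matrix has no edges")
    eig_sq = float(np.sum(np.abs(np.linalg.eigvals(A)) ** 2))
    # Clip tiny negative values caused by floating-point round-off.
    return float(np.sqrt(max(fro_sq - eig_sq, 0.0) / fro_sq))

def unbalance(A):
    """Delta = |K_< - K_>| / (K_< + K_>) for the ordering as given."""
    A = np.asarray(A, dtype=float)
    k_upper = np.triu(A, k=1).sum()    # entries with i < j
    k_lower = np.tril(A, k=-1).sum()   # entries with j < i
    return float(abs(k_upper - k_lower) / (k_upper + k_lower))

# A symmetric (undirected) pair of nodes versus a three-node DAG.
A_sym = np.array([[0.0, 1.0],
                  [1.0, 0.0]])
A_dag = np.array([[0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0],
                  [0.0, 0.0, 0.0]])
```

For `A_sym` both measures vanish, while for `A_dag` (all eigenvalues zero, all edges in one triangle) both equal one, matching the extreme values quoted in the text.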

Supplementary Note II. SYNTHETIC NETWORK MODELS
A. An Exactly Solvable Model: The Chain Network
We now consider a linear graph, in particular an unweighted unidirectional chain network that has been complemented with backward loops of weight $\epsilon$. The latter will be the control parameter with which we can tune the level of non-normality of our toy network. A similar network model has also been considered in [14,15]. In this scenario, the network varies from a simple unidirectional chain when $\epsilon = 0$ to a fully symmetric version of the network when $\epsilon = 1$. An illustration of such a network is provided in Supplementary Figure 2(a).
We now consider the entropy rate of the generic random walk taking place upon this network, given by Eq. (S5), $h = -\sum_{i,j} T_{ij}\, q^*_j \ln(T_{ij})$, where each term is as in the main text. We begin from this network's adjacency matrix, from which the transition matrix of the random walk taking place on this system follows, Eq. (S7). The stationary distribution of the random walk process occurring on the network is also required to determine the entropy rate, and it is readily obtained, Eq. (S8). Lastly, substituting Eqs. (S7) and (S8) into (S5) allows one to calculate the entropy rate exactly, Eq. (S9). From here it is immediately possible to determine the behavior at the extreme values of the control parameter $\epsilon$. First, we see that $\lim_{\epsilon \to 0} h = 0$, i.e., as the network approaches a completely hierarchical or DAG structure the entropy becomes zero, as expected, due to all the mass accumulating at the top node in the network. The other extreme is when the chain network becomes entirely symmetric, and so

$$\lim_{\epsilon \to 1} h = \frac{N-2}{N-1} \log(2).$$

Results from simulation of this dynamical process on synthetic networks, alongside the corresponding estimates from Eq. (S9), are shown in Supplementary Figure 2(b). Since this model is exactly solvable, the perfect agreement observed between theory and simulation is fully expected. This model is attractive because its analytical tractability allows an insight into the monotonic relationship that exists between the non-normality and the entropy rate of the random walk.
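The two limits quoted above can be checked numerically with a short sketch. It assumes a particular edge layout that is not spelled out in the text: forward edges of weight 1 from node $j$ to node $j+1$ and backward edges of weight $\epsilon$ from $j+1$ to $j$, with the convention $W_{ij}$ for an edge from $j$ to $i$. The function names are hypothetical, and the stationary distribution is taken from the eigenvector of $T$ with eigenvalue 1 rather than from the closed form of the original derivation.

```python
import numpy as np

def chain_adjacency(n, eps):
    """Chain with forward edges (weight 1) and backward edges (weight eps)."""
    W = np.zeros((n, n))
    for j in range(n - 1):
        W[j + 1, j] = 1.0   # forward edge j -> j+1
        W[j, j + 1] = eps   # backward edge j+1 -> j
    return W

def chain_entropy_rate(n, eps):
    """Entropy rate of the random walk on the chain, computed numerically."""
    W = chain_adjacency(n, eps)
    k_out = W.sum(axis=0)
    T = np.zeros_like(W)
    for j in range(n):
        if k_out[j] > 0:
            T[:, j] = W[:, j] / k_out[j]
        else:
            T[j, j] = 1.0   # top node is a sink when eps = 0
    # Stationary distribution: eigenvector of T for the eigenvalue closest to 1.
    vals, vecs = np.linalg.eig(T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    q = v / v.sum()
    # Entries with T = 0 contribute nothing (0 ln 0 is taken as 0).
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(T > 0, T * np.log(T), 0.0)
    return float(-(terms.sum(axis=0) @ q))

n_nodes = 6
h_dag = chain_entropy_rate(n_nodes, 0.0)
h_sym = chain_entropy_rate(n_nodes, 1.0)
h_sym_expected = (n_nodes - 2) / (n_nodes - 1) * np.log(2)
```

Under these assumptions `h_dag` vanishes and `h_sym` agrees with $\frac{N-2}{N-1}\log(2)$; intermediate values of `eps` give entropy rates between the two, consistent with the monotonic relationship described in the text.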

B. Non-normal Scale-free Networks
Our focus now turns to the case of a synthetic model of non-normal scale-free networks, similar to those introduced in Ref. [6]. With motivation coming from an extension of the original Price's model [19], we start by generating a scale-free network via the configuration model [1], which provides an undirected graph with a power-law degree distribution $P(k) \propto k^{-\gamma}$. Being symmetric, this network is structurally normal by definition. In order to introduce a level of non-normality, we modify the network such that the new adjacency matrix $\tilde{A}$ is given by

$$\tilde{A} = A_{\mathrm{upper}} + \epsilon\, A_{\mathrm{lower}},$$

where $A_{\mathrm{upper}}$ describes the upper triangular elements of the original adjacency matrix and likewise $A_{\mathrm{lower}}$ describes the lower triangular elements. Note that this adjacency matrix reverts to the original one in the case $\epsilon = 1$ and represents a perfect DAG in the case $\epsilon = 0$. Although an approximate formulation of the entropy rate can be found for the case of symmetric networks [17], the asymmetric case considered here is not amenable to analysis, and as such we address the problem of calculating the entropy rate numerically. Supplementary Figure 3 shows the entropy rate as a function of the strength of backward edges $\epsilon$ for various values of the parameter $\gamma$. The simulations are averaged over 100 realizations of these networks with a size of $N = 100$ nodes.

Algorithm 1. Hierarchical label identification for each node in a given network of size $N$. First, it finds all leaders, i.e., nodes with no outdegree. If there are no leaders, the algorithm stops and does not return labels. Otherwise, it proceeds to find shortest paths over the network from each of these leaders. Every node is given a label corresponding to its minimum distance to a leader node.

With these labels at hand, we proceed to analyze the structure of edges within each network. In particular, we focus upon the hierarchical levels of the nodes at the ends of each edge in the network. We then classify edges based upon their contribution to the network's structure: those joining a larger to a smaller hierarchical level (and thus contributing towards the leader nodes) are blue, those in the opposite direction (contributing away from the leader) are red, and those which join two nodes of the same hierarchical level are green. Furthermore, we also analyze the general structure by looking at the $H \times H$ matrix, where $H$ is the number of hierarchical levels, with entries $\psi_{ij}$ describing the sum of the weights of edges joining nodes in hierarchical level $j$ to those in hierarchical level $i$. Importantly, the upper triangular elements of this matrix describe the blue edges, the diagonal elements the green edges, and the lower triangular elements the red edges.
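The labeling and edge classification described above can be sketched as follows. Several choices here are assumptions of this illustration rather than details confirmed by the text: distances are hop counts along directed paths that lead towards a leader, a single multi-source breadth-first search replaces the per-leader loop with a running minimum (both give the same labels), nodes that cannot reach any leader are left unlabeled, and all function names are hypothetical.

```python
from collections import deque
import numpy as np

def hierarchical_labels(W):
    """Minimum hop distance from each node to a leader (node with no outgoing edge).

    W[i, j] > 0 denotes an edge from node j to node i.
    Returns None when the network has no leaders.
    """
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    leaders = [j for j in range(n) if W[:, j].sum() == 0]
    if not leaders:
        return None
    labels = [None] * n
    queue = deque()
    for leader in leaders:
        labels[leader] = 0
        queue.append(leader)
    while queue:
        i = queue.popleft()
        # Nodes j with an edge j -> i sit one level further from the leaders.
        for j in np.nonzero(W[i, :])[0]:
            if labels[j] is None:
                labels[j] = labels[i] + 1
                queue.append(int(j))
    return labels

def level_flow_matrix(W, labels):
    """psi[a, b] = total weight of edges from level b to level a."""
    W = np.asarray(W, dtype=float)
    n_levels = max(l for l in labels if l is not None) + 1
    psi = np.zeros((n_levels, n_levels))
    for i, j in zip(*np.nonzero(W)):
        if labels[i] is not None and labels[j] is not None:
            psi[labels[i], labels[j]] += W[i, j]
    return psi

def edge_weight_by_colour(psi):
    """Blue: towards the leaders; green: within a level; red: away from the leaders."""
    return {
        "blue": float(np.triu(psi, k=1).sum()),
        "green": float(np.trace(psi)),
        "red": float(np.tril(psi, k=-1).sum()),
    }

# Toy network: node 0 is the only leader.
W_toy = np.zeros((4, 4))
W_toy[0, 1] = 1.0   # 1 -> 0
W_toy[0, 2] = 1.0   # 2 -> 0
W_toy[2, 1] = 1.0   # 1 -> 2 (same level)
W_toy[1, 3] = 1.0   # 3 -> 1
W_toy[3, 1] = 1.0   # 1 -> 3 (away from the leader)

toy_labels = hierarchical_labels(W_toy)
toy_colours = edge_weight_by_colour(level_flow_matrix(W_toy, toy_labels))
```

In the toy network, nodes 1 and 2 form the entourage (level 1) and node 3 sits at level 2; three edges point towards the leader, one stays within a level, and one points away.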
We proceed to visualize, as in Fig. 5 of the main text, the structure of each network in our collection. For each case we provide three visualizations:

FIG. 2. Normalized entropy rate dependence on the network polarization in real-world networks. The network non-normality, quantified by the normalized Henrici departure $\hat{d}_F$, versus the normalized entropy rate $\hat{h}$ for 124 empirical networks from a large range of domains is shown. At first glance, the networks appear to be grouped into a consistent majority (∼85%) having $\hat{h} = 0$ and a small minority (∼15%) with $\hat{h} \neq 0$. In particular, for those data with non-zero entropy rate, a monotonically decreasing relationship of the entropy rate with the non-normality is observed. When some threshold value of the Henrici measure is reached across the ensemble of networks, a transition-like behavior occurs for four of the subdomains under consideration, which results in an abrupt collapse of the entropy rate towards zero. This occurrence indicates that the mass is accumulated exclusively in single nodes without outgoing edges, or leaders. Inset: the four subdomains where the transition occurs (neuronal, trade, animal relationship, and roads) are shown for a more detailed inspection of the emergent behavior. Also shown are a number of percentiles (0.5%, 25%, 50%, 75%, and 99.5%) of the two quantities obtained from $10^4$ realizations of the proposed null model ($N = 100$) with increasing threshold. These realizations indicate how our model manifests a similar transition in the entropy rate with increasing non-normality.

FIG. 4. Hierarchical structure of real-world networks. (a) The fraction of edges to each hierarchical level entering (respectively leaving) from a lower level, blue lines, and (respectively) from a higher hierarchical level, red lines. The inset plot shows the total weight of edges which are upwards (blue), downwards (red), and between hierarchies (green) in the case of the citations graph to Small & Griffith (2001). (b) The fraction of edges between each hierarchical level, where we see that a large fraction of edges either move up hierarchical levels or else stay within their own hierarchy. (c) Illustrative schematic of the network's hierarchical structure. Equivalent plots for (d)-(f) the email network at the Democratic National Convention (2016), (g)-(i) the gene regulation network of Escherichia coli, and (j)-(l) the word association network from the novel Green Eggs and Ham. In all cases, a significantly higher number of upwards edges is associated with the nodes of the hierarchical level immediately next to the leader nodes, the entourage nodes. It can also be noticed that the downwards red links, fewer in number, tend to redistribute the flow from higher to lower hierarchical levels. Lastly, more green links are associated with the entourage set of nodes, yielding a rich-club effect.

FIG. 5. Entropy rate of the generic random walk on synthetic networks. (a) Schematic demonstration of the null model at multiple points in its growth process. (b) Normalized entropy rate for the null model for a number of different network sizes with $m = 3$ and threshold values $p$. The inset shows the same quantities with the horizontal axis on a log scale. Each point shown is the average over $10^4$ realizations.

FIG. 6. Dynamical processes taking place on a dominance network of Japanese Macaque monkeys. (a) The time evolution of $x_i(t)$ for both the Allee model (solid) and the linearised version of the model (inset). In spite of the network having only one leader (whose behavior is shown by the blue lines), three nodes survive in the Allee model, shown by the colored lines. Note that the same three nodes show the strongest transient growth in the linearised model. (b) The network representation, with the direction of edges omitted to improve clarity, where the size of each node is inversely proportional to its outdegree (the leader is thus the largest) while its color represents the node's final density.

Supplementary Figure 1. The strong asymmetric and non-normal structure of empirical networks. The normalized Henrici departure from normality $\hat{d}_F$ versus the structural measure of asymmetry $\Delta$ for 124 networks from a large range of domains is shown. We note the positive correlation between the two measures. The data are grouped in 6 domains, represented by the color of the symbols, which in turn are divided into several subdomains identifiable by different shapes.

Supplementary Figure 2. Entropy rate of the generic random walk on synthetic chain networks. (a) Schematic representing the chain network of length $N$. The DAG structure is represented by the blue lines of unitary weight, while the normality is introduced through the red backward edges of weight $\epsilon$. (b) Entropy rate for the generic random walk taking place on these chain networks as a function of $\epsilon$ for a number of network sizes, where dots represent the simulated values and lines the theory as per Eq. (S9). The three larger networks are practically indistinguishable until being close to symmetric, as shown in the inset plot.

for i = 1 to (number of leaders) do
    leader ← leaders(i)
    labels ← shortestpathdistancesfrom(G, leader)
    for j = 1 to N do
        L[j] ← min(L[j], labels[j])

contained in the DBLP computer science bibliography as of May of

The hierarchical organization in empirical networks: oligarchy vs. anarchy. (a) The fraction of leader nodes (nodes without outgoing links) in each network versus the level of non-normality present in the network, captured via its normalized Henrici departure from normality $\hat{d}_F$. The color and shape of the points indicate the domains to which the networks belong, with the same notation as in Fig. 2. (b) Equivalent to (a) but where the two quantities are clustered via a k-means approach with $k = 3$. We see a clear pattern of clusters: the first grouping separates networks by their fraction of leader nodes, which describes the form of leadership, ranging from autocratic to anarchic; the second distinguishes the level of directionality of the networks, captured via the non-normality measure.