Geometric detection of hierarchical backbones in real networks

Hierarchies permeate the structure of real networks, whose nodes can be ranked according to different features. However, networks are far from treelike structures and the detection of hierarchical ordering remains a challenge, hindered by the small-world property and the presence of a large number of cycles, in particular clustering. Here, we use geometric representations of undirected networks to achieve an enriched interpretation of hierarchy that integrates features deﬁning the popularity of nodes and similarity between them, such that the more similar a node is to a less popular neighbor the higher the hierarchical load of the relationship. The geometric approach allows us to measure the local contribution of nodes and links to the hierarchy within a uniﬁed framework. Additionally, we propose a link ﬁltering method, the similarity ﬁlter , able to extract hierarchical backbones containing the links that represent statistically signiﬁcant deviations with respect to the maximum entropy null model for geometric heterogeneous networks. We applied our geometric approach to the detection of similarity backbones of real networks in different domains and found that the backbones preserve local topological features at all scales. Interestingly, we also found that similarity backbones favor cooperation in evolutionary dynamics modeling social dilemmas.

Many real systems display a hierarchical organization [1] where higher status members dominate over lower-graded ones, according to a certain measure of power, wealth, importance or influence.Examples are ubiquitous in living systems, including molecular regulators governing gene expression [2], animal communities of eusocial insects [3], dominantsubordinate relationships in mammals [4], and different structures -companies, political parties, courts, military, organized religion etc-in human society [5].Additionally, hierarchical organization can be found in non-living systems such as computer generated imagery (CGI) [6], grammatical theory of language [7] or the structure of a musical composition [8].Hierarchies are, thus, ubiquitous, and shape more easily controllable structures [9] that can emerge as the result of opposing forces, such as competition between individuals [10] or a combination of cooperation and imitation strategies [11].
Complex networked systems, however, present important challenges when it comes to the detection of hierarchies due to the lack of a unique and unambiguous stratification scheme, possibly including nestedness or layered structure.This is caused by the small world property and the presence of a large number of cycles of different lengths in their topolo- * marian.serrano@ub.edugies, in particular clustering [12], so that networks' organization deviates strongly from tree-like.In directed networks, link directionality can be exploited to ease the problem and hierarchical order can be detected using, for instance, penalty-function minimization strategies [13][14][15][16].Nevertheless, most frequently the only meaningful or available representation of a complex system is an undirected graph.Within this architecture, a hierarchy is typically defined as a ranking where the status of a node becomes determined by some heterogeneous topological property, for instance degree [17] or some other centrality measure [9,18].However, other attributes shape as well the hierarchical structure of real networks, like similarity between nodes [19].Clearly, the control exerted by a higher-status node over a lower-status one will be more effective when there exists closeness or affinity between them.Conversely, the strength of hierarchical relations gets dissolved as nodes loose their proximity and become dissimilar.
Here, we integrate degree rank, or popularity, and similarity between nodes in an enriched interpretation of hierarchy, valid for real networks.For this purpose, we capture network architecture using geometric network maps [20,21] and models [19,22], which naturally encode popularity and similarity attributes of nodes as coordinates in an underlying metric space.Exploiting the geometric approach, we are able to characterize the individual contribution of each node and each link to the hierarchical structure of a net-work.Moreover, we exploit the great heterogeneity found in the hierarchy load of links to propose a filtering method, the similarity filter, that offers a practical procedure to extract a hierarchical similarity backbone of a network.The obtained similarity backbones contain the links that represent statistically relevant interactions with respect to the maximally random geometric network organization, constrained by node degrees and clustering [23].The similarity filter preserves network features at all scales while pruning a large number of links, in the spirit of the disparity filter for weighted networks [24].However, the similarity filter has a different purpose and operates on the basis of different models in a completely distinct framework, that of unweighted undirected networks.To illustrate the use and results of the similarity filter, we extracted and analyzed the hierarchical similarity backbones of several real-world networks from different domains.Finally, we explored the role of hierarchical similarity backbones in evolutionary dynamical processes, which historically have been argued to be sensitive to the hierarchical organization of complex architectures [25,26].Implementing an evolutionary prisoner's dilemma game [27][28][29][30] on the real networks under study, we discovered that the similarity backbones tend to achieve final states of greater cooperation as compared to the corresponding original networks, when initial conditions are equivalent.

I. HIERARCHY LOAD OF LINKS AND NODES
We base our definitions of hierarchy on geometric maps of real networks obtained from geometric network models.In the S 1 model [19], the contributions of popularity and similarity dimensions to network connectivity are independent and explicit.Nodes lie on a one-dimensional circle representing the similarity space and thus have an angular coordinate θ i , so that the more similar two nodes are the shorter is the angular separation, ∆θ ij , between them.Nodes are also characterized by a hidden degree κ i , directly related to the observed degree, standing for popularity or status.The probability that two nodes i and j connect is a function that increases with their similarity and with the product of their hidden degrees where R is the radius of the similarity circle, which we choose to be equal to N/(2π) for N nodes, so that the density of nodes in the circle is equal to one, and parameters β and µ control the mean clustering coefficient and the average degree of the network, respectively.
The S 1 model is able to reproduce many of the features widely observed in real complex networks, such as scale-freeness, high levels of clustering and the small-world property, among others [19].Interestingly, the S 1 model is the only model able to produce maximum entropy ensembles with power-law degree distributions and clustering and without nonstructural degree correlations [23].Moreover, the model also allows us to construct geometric maps of real networks through a process called network embedding.Given a real network, one can find the values {κ i , θ i } for every node i, as well as the global parameters µ and β, that maximize the likelihood for the observed network to be generated by the model [20].The congruency of the S 1 model with real networks results in very meaningful embeddings, which have proven to be useful for network navigation [20,31,32], and for the discovery of symmetries in the structure of real networks such as self-similarity at different length scales [33].Notice that the distribution of angular positions of nodes in the S 1 model is assumed to be homogeneous, while typically real network embeddings show that nodes form geometric communities in similarity space [34][35][36][37].The geometric embeddings of real networks used in this work were computed using the mapping tool Mercator [21].
We use the purely geometric isomorphic version of the S 1 model, named the H 2 model [22], for visualization purposes.In the H 2 model the hidden degree is transformed into a radial coordinate in a hyperbolic two-dimensional disk, such that higher degree nodes are placed closer to the centre of the disk.The angular coordinate remains as in the S 1 circle and the connection probability becomes a decreasing function of the hyperbolic distance between nodes, see Fig. 1A.For further details refer to section Methods B.

A. Definition of hierarchy load
We characterize the local contribution of a link or a node to the hierarchical structure of a network by measuring its hierarchy load, which depends on status, similarity, and the reference provided by a null model to discount the effects of random fluctuations.We use the S 1 as a null model since it is the maximum entropy model for geometric networks with heterogeneous degrees [23].It provides expectations for the distribution of hierarchy loads in a pure random assignment of angular positions of nodes given a degree distribution and a level of clustering, so that anomalous fluctuations can be detected.
Given a geometric network embedding, where nodes have coordinates {κ obs , θ obs }, we consider that popularity, corresponding to the hidden degree or, equivalently, to the radial position in the hyperbolic plane, is a measure of status.This means that a node i has a lower-status neighbor j when κ obs j < κ obs i .
Figure 1.Hierarchy load in geometric networks.A) Illustration of a geometric network in the hyperbolic disk.Node i and its subordinates l, m, n are highlighted inside a blue square.Notice node i has a fourth neighbor which is not a subordinate since it lies at a smaller radial position from the center of the disk and thus has higher degree than node i. B) Hierarchy load of a link, hij, between node i and a lower-status neighbor j, located at angular distance ∆θ obs ij .Notice in this example hij is necessarily hij < 0.5 as j is located at an angular distance greater than the expected from the null model, which appears depicted in grey.The hierarchy load of node i, hi, is obtained from averaging all link hierarchy loads corresponding to this node.C) Hierarchy load of node i as the circular mean of the angular coordinates of lower-status neighbors l, m, n.Mean resultant vector is depicted by a black arrow of |R| < 1. Vectors added in the calculation of h * i (see Eq.2) appear as arrows in yellow, orange and red inside a circle of unit radius.
Similarity between the two nodes is represented by their angular separation in the network map, First, we define the hierarchy load h ij of a link between node i and its lower-status neighbor j as the probability of obtaining a similarity distance between them in the S 1 model greater than the one observed in the map Being a probability, h ij is bounded in the interval [0,1], while the angular separation between nodes ∆θ obs ij ∈ [0, π].Furthermore, Eq. ( 2) can be computed analytically giving an expression depending on the coordinates of the nodes in the embedding (see Methods D).In particular, in a synthetic network generated with the S 1 model, the expected value of h ij for any {i, j} is h ij = 1/2 since, in that case, the observed angular distance ∆θ obs ij is generated by the model and, hence, it is a random variable distributed according to the same distribution as ∆θ ij .In other words, Eq. ( 2) reduces to the probability for two equally distributed variables a and b to fulfill a > b.As a consequence, Eq. ( 2) has the clear advantage of being size independent, in the sense that the value h ij = 1/2 defines the S 1 -model hierarchy baseline for any N , therefore allowing us to compare the hierarchy structure of networks of different size.
Link hierarchy loads inform about how substantial is the contribution of a link towards the hierarchy by comparison with the reference level provided by the S 1 null model, which is h ij = 1/2 as explained above.Accordingly, when the link hierarchy load is higher than the reference, h ij > 1/2, nodes i and j are closer than expected in angular distance, and so they are more similar and their relationship is more hierarchical.In contrast, when the probability is low, h ij < 1/2, i and j are more dissimilar than expected by the null model, meaning that they lie farther away in the angular space and, thus, their relationship is less hierarchical.In fact, Finally, within the same framework we also define a measure of hierarchy load for nodes as where the sum runs over the n i lower-status neighbours j of node i satisfying κ j < κ i .

B. Hierarchy load of nodes in terms of angular concentration
Alternatively, one can also measure the hierarchy load of a node i as the angular concentration of its n i lower-status neighbors by computing the circular mean of their angles {θ j } j=1,...,ni .This method is different but very similar to the node hierarchy measure used in [36] for the analysis of the World Trade Web.When lower-status neighbors are concentrated in a narrow angular sector it means that node i has a tendency to establish links with more similar nodes, hence node i is contributing towards a more hierarchical structure and thus carrying a higher hierarchy load h i .Conversely, when the lower-status neighbors of i are distributed in a very broad angular sector, the hierarchy load of node i is low, indicating that it is able to establish links with more dissimilar lowerstatus nodes, thus supporting a flatter organization.
The hierarchy load of node i in terms of angular concentration of lower-status neighbors is computed as the length of the mean resultant vector, see Fig. 1C.The modulus of the circular mean vector R ∈ [0, 1] is a good proxy of angular confinement since it is 0 for angles pointing in opposite directions and it becomes 1 when the angles are totally aligned.The measure is simple enough so that it generalizes well to networks of very different domains as long as they admit a geometric interpretation.It is worth mentioning that, since the average angular separation between nodes decreases with the network size N , this quantity increases with network size.This should be taken into account when comparing h * i measurements among networks of different size.

C. Hierarchy load vs geometric communities
To understand how the global distribution of angles, and, in particular, the existence of geometric communities, could affect the spectrum of hierarchy loads, we consider synthetic networks with controllable angular concentration of nodes.Typically, real network maps present angular regions more densely populated than others, which define a partition of the network into different geometric groups [34][35][36][37].Geometric communities can be accurately detected using algorithms such as the Topological Critical Gap Method [36] and reproduced by geometric network models [38,39].We generated synthetic networks with tunable geometric communities using the soft communities in similarity space (SCSS) model [38] (see Methods B).The SCSS model is an extension of the S 1 model that enables to create scale-free networks with high clustering while controlling for the global heterogeneity of the distribution of angular coordinates.Essentially, this model relies on a preferential attachment process in similarity space [39], so that the angular coordinates of nodes depend on the angular coordinates of higher-degree nodes.The effect of this similarity preferential attachment is regulated by a parameter Λ.The SCSS recovers the S 1 model in the limit of homogeneous angular distributions, which corresponds to Λ → ∞.
We show the spectra of hierarchy loads h i , Eq. (3), and h * i , Eq. ( 4), in Fig. 2A-B for synthetic networks with very different geometric community strengths (Λ = 0.01 and Λ = 10.0).The spectra of hierarchy loads is measured by averaging the node hierarchy loads over degree classes.The two measures display different results, but in both cases the global angular heterogeneity has a minor effect in shaping the hierarchy loads of nodes.Therefore, the spectrum of hierarchy loads of nodes is not merely a measurement of geometric community structure.As expected, h i spectra are flat and lie around the average hierarchy load h = N i=1 h i /N for the whole network.At large Λ values, h tends to 0.5 because, by construction, SCSS networks recover the S 1 model in this limit.Heterogeneous angular distributions in the limit of small Λ values reduce the average angular distance between nodes in the network, and as a consequence the average hierarchy load of the network is slightly above 0.5.
This effect is also evident in the spectra of hierarchy loads h * i , where h * is lower for more homogeneous angular distributions (Λ = 10.00,h * = 0.89 ± 0.10), as compared with networks with more heterogeneous distributions that present higher values (Λ = 0.01, h * = 0.95 ± 0.06).In this case, the average hierarchy level of the synthetic networks is rather high ( h * ≈ 0.9), basically due to sustained large values of hierarchy load for a wide range of node degrees.

D. Hierarchy spectrum of real networks
We measured and compared the hierarchy spectrum of several real networks from different domains: the email communication network within the Enron company (Enron) [40], the Internet at the autonomous system level (Internet) [20,41], the one-mode projection onto metabolites of the human metabolic network at the cell level (Metabolic) [35] and the network of chord transitions in western popular music (Music) [42].For more information see Table I and Methods A.
Real networks show a variety of profiles, see blue curves in Fig. 2C-F.They also present in all cases an average hierarchy load h below the reference of 1/2.In particular, Enron shows the highest average hierarchy load at the node level, h = 0.47 ± 0.19, followed by Metabolic, Music and, lastly the Internet with h = 0.29 ± 0.20.Variations in h across networks conform to their distinct spectra.For instance, whereas Music and Internet networks show noticeable fluctuations in h i across degree classes, these are milder for Enron and Metabolic networks.In general, however, all networks show a tendency for the lowest degree nodes to have the highest hierarchy loads and for the highest degree nodes, or hubs, to approach h i = 1/2.This last observation may be attributable to great heterogeneity in the hierarchy load of links h ij and the fact that hubs present numerous connections with lower status nodes which are averaged when computing h i .
Regarding the hierarchy load of nodes in terms of angular concentration, the Internet and Metabolic are significantly more hierarchical ( h * ≈ 1) than Music and Enron.The specificity of each profile has its roots in each network's specific degree sequence, whereas angular communities show again minor influence as revealed by the resemblance between the spectra of the 4 real networks with the spectra of their corresponding geometrically randomized counterparts (see Methods B and Supplementary Fig. S1).The randomized counterparts consist in replicas of the real networks where the distribution of angular coordinates is homogeneized, thus eliminating geometric community structure, while the rest of properties are preserved and the replica network remains maximally geometric.Moreover, we note that the hierarchy loads of nodes tend to show strong heterogeneity and an inverse correlation with the degree, so that more popular nodes (higher k) contribute more to dilute the hierarchical structure by A-B) Link hierarchy loads of the Enron and Metabolic networks, respectively.Each dot indicates the value of hij of a link stablished between node i and one of its lowerstatus neighbors j with hidden degree κj indicated by the color code.Notice node labels in the x-axis are sorted by increasing hidden degree κi, so that more popular i nodes appear to the right.
connecting to less affine lower-popularity neighbors.This trend is remarkably pronounced for the Internet, Enron, and Music, while heterogeneity is less pronounced in Metabolic.In fact, this happens because the Metabolic network presents more modular hubs, less prone to connect with dissimilar nodes due to an unusually marked partition of the similarity space.

II. THE SIMILARITY FILTER AND HIERARCHICAL BACKBONES OF REAL NETWORKS
Strong heterogeneity is found in real networks if we analyze the contribution of lower-popularity neighbors to the hierarchy load of nodes, see Fig. 3.We can take advantage of this heterogeneity to filter out the connections that dominate the hierarchical organization of the network in terms of popularity and similarity, what we name the hierarchical similarity backbone (HSB).A hierarchical similarity backbone contains the links that represent statistically significant contributions with respect to the null hypothesis given by the S 1 model, that is the only model able to produce maximum entropy ensembles constrained by power-law degree distributions and clustering and without nonstructural degree correlations [23].

A C
Fraction of hubs  [36] (in red) filtered with α = 0.4, on top of the complete network (pale blue).B) Plots from I to IV show: the relative number of nodes in the backbone, against the relative number of edges in the backbone (subindex 0 refers to the complete network); the relative number of nodes in the backbone giant connected component (gcc), against the relative number of edges in the backbone gcc; mean clustering coefficient of the backbone for increasing α; number of disconnected components against increasing values of the significance α.C) Fraction of nodes considered hubs, for threshold value τ = 0.25, found in backbones obtained with increasing α.D) Topological features of the HSBs of the Music network, obtained for α's from 0.1 to 0.8.Value α = 0.0 corresponds to the original unaltered network, whereas in the most restrictive HSB (α = 0.8) 0.60% of the nodes and 14% of the links remain.From I to III: complementary cummulative distribution of rescaled degrees, kres = k/ k , degree dependent clustering coefficient over rescaled-degree classes, normalised average nearest neighbor degree knn res = knn(kres) k / k 2 .
The link hierarchy load in Eq.( 2) measures the probability under the null hypothesis that the similarity distance between a node and a lower-status neighbor is larger than the observed in the embedding of the network, what is known as p-value in statistical inference.By imposing a significance level α, the links that carry a hierarchy load that are not compatible with the random angular distribution of angles in the S 1 model, and reject the null hypothesis, can be filtered out.A hierarchical similarity backbone is then obtained by preserving all the links that satisfy the criterion h ij ≥ α, while discounting the rest.As we increase the significance level α ∈ [0, 1], the filter progressively focuses on more relevant links to obtain a sequence of nested subgraphs, each with a more strict condition for a link to belong to the HSB of the network, see an illustration in Fig. 4A.Noteworthy, since the similarity filter is applied to the links, nodes of any degree may find a place in a very hierarchical backbone if they are found to have significantly strong hierarchical connections.
We tested the performance of the similarity filter by exploring the hierarchical similarity backbones of the four real networks considered in this work.For instance, when filtered with α = 0.25, the Internet, Metabolic and Music HSBs contain a proportion of edges that is already less than half of the original, so E/E 0 < 0.5.In contrast, the proportion of nodes in the same backbones remains very high, the lowest case being the Internet, but with still 85% of the nodes.The results in Fig. 4B-II show similar behaviour for the reduction in nodes and edges of the giant connected components of the backbones, for the 4 networks under study.Only the decay in number of nodes in the gcc of the backbones tends to be less abrupt than in Fig. 4B-I and start sooner, at smaller values of the filtering parameter α.Moreover, we observe in Fig. 4B-III that the mean clustering coefficient of the filtered similarity backbones does not have strong fluctuations and varies little with α, with the exception of the Internet which shows a clear decreasing trend.
In Fig. 4C we inspect the participation of hubs in the HSBs.For this purpose, we sort the nodes in the network from highest degree to lowest and tag as "hubs" all nodes lying within a top slice of the list, delimited by a threshold value τ .For instance, when τ = 0.25, the top 25% of the nodes in the ranked list are considered as hubs.Subsequently, we keep track of the proportion of such high-degree nodes in every hierarchical backbone for increasing α. Figure 4C demonstrates that, even when considering that a large fraction of the network (τ = 0.25) are nodes of high degree, in fact these nodes only represent, at best, half of the backbone composition (see Internet, Metabolic and Music in Fig. 4C for α = 0.9).In Supplementary Fig. S4 we show analogous plots to Fig. 4C for a wider range of threshold values τ ∈ [0.05 − 0.30] providing further evidence that, while more restrictive HSBs become enriched with hubs, the similarity backbones still present an assorted composition in terms of node degrees.
Finally, we find that topological features -degree distribution, clustering coefficient and average nearest neighbors degree-of the original network are preserved in the subgraphs as we increase α and the backbone is progressively restricted to exceedingly hierarchical links, see Fig. 4D and Supplementary Fig. S2.Only for very high values of the significance level, α 0.8, when not only the number of links but also the number of nodes is strongly reduced, the measured topological properties start to deviate from the original curves.This suggests some grade of selfsimilarity across the sequence of HSBs.

III. GAME DYNAMICS IN HSBS
As an illustration of the importance of HSBs, we study a dynamical process with intriguing behavior in networked systems: the evolution of cooperation.The evolution of cooperation has been studied in different fields [43][44][45][46], but it is not well understood yet in scale-free networks [47][48][49][50][51] or in real networks [52], which additionally present high levels of clustering and finite size effects.In fact, only recently some mechanisms have been proposed to aid cooperation in real networks [53], based precisely in the geometric approach followed in this contribution.Here, we show that, counterintuitively, our HSBs capture the links of real networks that better support cooperative behavior in evolutionary dynamics.We consider an evolutionary prisoner's dilemma game consisting of two players deciding to cooperate or defect with one another, and gaining a specific reward depending on which of the four possible outcomes takes place.The game proceeds in successive rounds; in each round every node accumulates a payoff π i resulting from playing the game with all its neighbors; after that, and before moving to the next round, the strategy (cooperate or defect) played by every node is updated simultaneously taking into account an imitation mechanism.The imitation step consist of each node i deciding to adopt the strategy of a randomly chosen neighbor j with a probability dependant upon the difference of their payoffs (π i − π j ), to reflect the tendency of copying more successful neighbors.See section C in Methods for more details.
We simulated the dynamics on the four real systems analyzed in this work and on two different HSBs for each of them (with α values and corresponding sizes in number of nodes and links reported in Supplementary Table S1).The results are provided in Fig. 5A-H.Notice that the similarity backbones are always selected so that they face a considerable reduction in the number of links while their number of nodes does not decay drastically.That is, for a given real network, the HSBs where we run the game dynamics lie along the slope change part of the blue curves in Fig. 5I-L, and are identified by blue symbols.We use random surrogates to discern wether the results of the dynamics on HSBs are due to their hierarchical nature.The surrogate backbones are obtained by removing a number of links at random so that the they have exactly the same number of links as the corresponding HSBs (see matching fraction of edges between red and blue symbols in Fig. 5I-L).As a result, the fraction of remaining nodes in a given HSB may be higher or lower than in the analogous random surrogate.Regardless of the situation, however, the dynamics results stay consistent, see red curves in Fig. 5I-L.Note that the fraction of nodes in the random surrogates is still very high even when the number of links has been greatly reduced, akin to the case of similarity backbones.This was expected from the reported robustness of scale-free networks to random removals [54,55].
The evolutionary dynamics are initiated by distributing a proportion of initial cooperators uniformly at random among the nodes of a network.For each of the three graphs (original network and the two similarity backbones), we vary the proportion of initial cooperators and quantify the level of cooperation achieved in the network at the end of the dynamics by measuring the fraction of final cooperators N fin coop /N gcc after 10 5 rounds of the game.This metric is averaged over 100 realizations in all showcased curves in Fig. 5B showing the results for the dynamics on HSBs.Notice the system can reach a quenched state before the maximum number of rounds is achieved if all agents become either cooperators or defectors.In that situation the imitation mechanism does not induce any further evolution and the dynamics becomes effectively frozen.
The results in Fig. 5A-H show that the real networks have a tendency for their hierarchical similarity backbones to display final cooperation levels equal or greater than the achieved in the original network for equal proportions of initial cooperators, despite their radically reduced number of links.This is general for  S1).For an HSB (blue curve with symbols), the corresponding random surrogate appears as a grey dashed line together with the original network in black line for reference.I-L) Relative number of nodes against relative number of edges in the gcc of similarity backbones, and their corresponding surrogates, for the four networks under study.Blue lines correspond to backbones obtained using the similarity filter and red lines to surrogates obtained by random link removal.Blue circle and triangle symbols highlight the fraction of nodes and edges of the 2 similarity backbones filtered with α o and α , respectively.The same information is featured by red symbols for the random surrogates.
all networks in Fig. 5A-H, but specially for Metabolic and Music whose HSBs curves are visibly above the curves for the original networks (in black) for a wide range of initial conditions, N ini coop /N gcc ∈ (0.4 − 0.9).For instance, the HSB with α = 0.59 of the Music network (see blue symbols curve in Fig. 5H) has 73% less links than the original network while preserving 83% of nodes, and still sustains up to ≈ 8 times more final cooperation than the original network, for a fraction of initial cooperators of 0.8.To further ensure that the enhanced cooperation actually stems from the categorical structure of the backbones one should compare an HSB curve with that of its corresponding random surrogate.By doing so, we observe that the random surrogates happen to reproduce closer the cooperative behaviour of the original network, that is LR curves in Fig. 5A-H follow the profile of the original network instead of appreciably deviating upwards, thus revealing that the surrogates do not provide a substantial gain in cooperation as opposite to HSBs.In general, the surrogates also require a higher proportion of initial cooperators, around N ini coop /N gcc 0.6, to produce any sizable increase in final cooperation with respect to the original network.This indicates that the internal hierarchical organization of the HSBs is key to sustain enhanced cooperation.Actually, the similarity filter preferentially removes links with lower hierarchy load, usually consisting of long-range connections stablished by high degree nodes, whereas the random removal makes no distinction.In fact, given the scale-freeness of real networks, deleting a long-range link at random is less likely due to their scarcity.Therefore, during the dynamics, similarity backbones may develop clusters of same-strategy nodes that are more stable through the evolutionary process than those found in random surrogates, the reason being the former are less exposed to distant contacts belonging to clusters of opposite strategy.This means the hierarchical structure of HSBs enables a better shielding for the groups of cooperators in the shape of metric clusters [53], which in turn can explain the increased cooperation levels found in similarity backbones.To additionally validate our results we choose to explore four more different combinations of payoff values for the Music network in Supplementary Fig. S6.We observe that modifying the payoffs produces the same qualitative results as discussed above, with HSBs curves clearly surpassing the original network and evidencing that similarity backbones can reach superior cooperation.

IV. DISCUSSION
The existence of a metric space underlying complex networks allows us to provide an enriched interpretation of hierarchy that integrates two dimensions: popularity, or degree rank, and similarity between nodes, thus overcoming the problem of detecting hierarchies in the presence of clustering and the small world effect.The metric approach enables a unified framework to define the hierarchy loads of nodes and links.
Interestingly, the spectra of hierarchy loads of real networks revealed that, in general, these networks are less hierarchical than the reference provided by the maximum entropy null model and show greater variation in the hierarchy load of nodes across degree classes.Particularly, the lowest degree nodes typically contribute more towards the hierarchical structure, although their fluctuations are remarkable.
Moreover, we introduced the similarity filter, a link pruning method which exploits the heterogeneity found in the hierarchy load of links.The filter extracts the connections that dominate the hierarchical structure of networks in terms of popularity and similarity, providing what we name hierarchical similarity backbones (HSB).The analysis of such backbones uncovered that, strickingly, the similarity filter is able to preserve network topological features at all scales while discarding a large number of links.Accordingly, from a fundamental point of view, hierarchical backbones could help provide new insight about the percolation properties of highly stratified real networks, aiding control of cascading failures, as well as have the potential to become a standard methodology for the detection, visualization and inspection of hierarchical clusters [56] in machine learning and data science environments.
From a practical point of view, the similarity filter has proven to be an exceptional tool to unravel the backbone that sustains enhanced cooperation in social dilemmas on structured populations.This is in line with previous simulations of prisoners dilemma type dynamics on adaptive networks, showing that cooperation combined with imitation can lead to a hierarchical structure [11].When this dynamics is played on heterogeneous contact networks with underlying metric structure, the evolution of cooperation leads to the formation of clusters of cooperators in the similarity subspace [53].In the presence of these clusters, heterogeneity in the degrees was nevertheless found to hinder cooperation.Those findings reveal a tension between the popularity and similarity dimensions in evolutionary dynamics modelling social dilemmas.Our findings here solve this opposition by identifying the similarity backbones composed of significant links that are simultaneously hierarchical in terms of popularity and similarity, and which are expressly relevant in supporting and fostering cooperation.
Lastly, the methods developed in this contribution can be used to study the hierarchical nature of complex networks of any domain as long as they admit a geometric representation.The detection of hierarchical similarity backbones could for instance help in designing controllability of gene regulatory networks, improve communicability in information systems and infrastructures or assess robustness to species loss in ecological networks.Other possibilities for our framework include its extension to multiplex networks, opening promising future lines of research.

A. Empirical data
All real complex networks used in this paper have been mapped into their hyperbolic latent geometry using the embedding method Mercator [21].This method mixes machine learning and maximum likelihood approaches to infer the coordinates of the nodes in the underlying hyperbolic disk, while ensuring best congruency between the real network topology and the S 1 geometric model.
Enron.The network captures the email communication activity (125,409 emails) within employees from the Enron company.Edges are stablished between email addresses that shared correspondence.We use the dataset provided in [57] which includes also information about the organizational roles of 130 users.
Internet.We use the adjacency data for the Internet at the autonomous systems (AS) level assambled by the Archipelago project [41] during June 2009.
Human metabolic.This network is the onemode projection of metabolites of the bipartite metabolic network of human cell metabolisms.In this representation [35], there is a link between two metabolites if they participate in the same biochemical reaction.
Music.Nodes of the network are chords-sets of musical notes (see [42]) played in a single beat while edges represent detected transitions between these chords.We use a sparser, undirected version of the network reconstructed in [33].

B. Models of Network Geometry
H 2 model An isomorphism exists between the S 1 and the H 2 models [22], so that hidden degrees κ are mapped into radial coordinates, r, in a hyperbolic disk of radius R H 2 , such that κ ∼ e (R H 2 − r)/2 .Consequently, in the hyperbolic version, nodes with larger radial coordinates are located towards the edge of the hyperbolic disk and show lower expected degree.
Particularly, in the H 2 model every node i is defined by the tuple (r i , θ i ), and the probability that a link exists between two nodes i and j depends on their distance d ij , as measured in the hyperbolic space using the hyperbolic law of cosines cosh(d ij ) = coshr i coshr j −sinhr i sinhr j cos∆θ ij .Nodes closely positioned in the hyperbolic disk have higher chances of being connected, thus the connection probability p(d ij ) must be a decreasing function of distance between them and, specifically, it can be chosen to be where the parameter β still controls the network's clustering coefficient.In this paper, we mainly use the S 1 model for calculation purposes, and its equivalent H 2 version for visualization tasks.

Soft Communities in Similarity Space (SCSS)
The SCSS model [38] produces synthetic geometric networks with inhomogeneous angular distributions, derived from geometric preferential attachement mechanisms, which were conceived in growing geometric network models [39,58].This means, for a network represented in an underlying (hyperbolic) metric space, the initial attractiveness of different angular regions during a geometric preferential attachemnt process is controlled by a parameter Λ.
Consequently, this parameter regulates the heterogeneity of the angular coordinate, so that heteorgeneity is a decreasing function of Λ, with Λ → ∞ recovering the homogeneous distribution.The SCSS model, then takes such heterogeneous angular distribution defined by Λ and adjusts it to an independent power-law degree distribution (P (k) ∼ k γ ) and a tunable level of mean clustering c , controlled through parameter β.The SCSS model does so by introducing correlations between κ and θ coordinates of nodes of the geometric network.

Geometric Randomization (GR)
The GR [59] is a model for the randomization of complex networks with geometric structure, which allows to uniformize their angular coordinate distribution, while preserving the exact degree sequence of the network.It thus applies to both real and synthetic networks where nodes have an observed degree and exist in a similarity space.In the GR model, angular coordinates θ are assigned to the nodes, chosen uniformly at random from [0, 2π].The network is then rewired following a likelihood maximization process that ensures the new topology is one generated by the S 1 model, while the observed degrees (and thus the number of edges) remain unaltered.The model is implemented using a single parameter β controlling the mean clustering of the resultant rewired network.The rewiring and maximization procedure executed by the GR are specially useful to produce faithful real network replicas where only geometric (soft) communities have been supressed.

C. Evolutionary Prisoner's dilemma
The evolutionary prisoner's dilemma game [27], conducted on a network, considers that individual nodes playing with their contacts choose to either cooperate (C) or defect (D) every turn.The choice of strategies of the two interacting agents leads to specific payoffs, summarized the payoff-matrix That is, if both players cooperate, they both receive the reward R for cooperating.If both players defect, they both receive the punishment payoff P .Lastly, if one of them defects while the other agent cooperates, the defector receives the temptation payoff T , while the cooperator receives the "sucker 's" payoff, S. In order for the game to be recognized as a prisoners's dilemma the condition T > R > P > S must apply.Different ordinalities of the parameters define further classes of games [29].In this paper, the prioner's dilemma is defined with parameter values: T = 1.5, R = 1, P = 0, S = −0.5.Further parameter values are explored in Supplementary material.The game proceeds in successive rounds.After each round, the strategies (C or D) of all nodes are updated synchronously according to the outcome of the imitation dynamics [27,60,61] that outline the evolutionary mechanism.This means, during a single round, each individual node i collects payoffs given by Eq.6 from the interactions with all its neighbors and obtains an accumulated payoff π i .All players chose then between their old strategy and the strategy of a randomly picked up neighbor j.In this way, node i will adopt j's strategy with a probability that depends upon the difference between collected payoffs of both nodes (π i − π j ) as which reflects the popular tendency of individuals to copy more successful neighbors.Such updating rule is a common choice in evolutionary dynamics [62], known as the Fermi rule since it is based in the Fermi distribution from Statistical Mechanics.The variable 1 τ , which in Physics stands for the inverse temperature, can be interpreted here as the intensity of the selection.That is, parameter 1 τ (which we set to 0.5) controls the noise added to the decision-making process of, otherwise, perfectly rational players.After the simultaneous update of strategy of all nodes, their accumulated payoffs are reset and a new round begins.

D. Probability distribution of angular distances between connected nodes in the S 1 model
The hierarchy load of an observed link, h ij , can be computed analytically given the angular distance between the two nodes observed in the embedding, ∆θ obs ij .To derive a closed expression for it, we first need to write down an expression for the probability distribution function of the angular separation between two nodes in the S 1 model conditioned to the fact that they are connected.Using Bayes' rule, we see that where a ij is the adjacency matrix element corresponding to the two nodes.In the above expression, p (a ij = 1|∆θ ij ) is the connection probability, Eq. (1), while the distribution of angular distances is simply p (∆θ ij ) = 1/π, given that angular coordinates are homogeneously distributed in the S 1 model.The denominator can be obtained by direct integration, p (a ij = 1) = π 0 p (a ij = 1|∆θ ij ) p (∆θ ij ) d∆θ ij .Using the definition of h ij , we obtain where 2 F 1 (a, b; c; z) is the hypergeometric function.Equation ( 9) yields the hierarchy load of a given link in terms of the angular separation and product of hidden degrees of the nodes at both ends of the edge, as well as of the global parameters R, µ, and β.
Finally, let us also show that the expected value of h ij for any link in a network generated by the S 1 model is h ij S 1 = 1/2.To calculate h ij S 1 , we only need to notice that, in this case, the angular separation between the nodes at the ends of the edge in the resulting network, ∆θ obs ij , is itself a random variable with distribution ρ ∆θ obs ij given by Eq. (8) (that is, ρ ∆θ obs ij = p ∆θ obs ij |a ij = 1 and, therefore, the expected value of h ij is In the last step, we have used the fact that ρ(z) is normalized, Figure 2C-F in the main paper shows the spectrum of hierarchy loads of nodes in terms of angular concentration, h * i , for 4 real networks.Here, we provide the same metric for angularly randomized versions of such real networks, obtained using the Geometric Randomization model (see Methods B in main paper).By comparing the spectra in Fig. 2C-F and Fig. S1 in this Supplementary one can notice that the complete homogenization of the angular coordinates of nodes (and thus the full elimination of geometric communities) does not translate into radical changes in the hierarchy load profiles.Instead, below (Fig. S1) we observe for each of the randomized networks that the inverse correlation with the degree of the hierarchy loads is compatible with that of the original network, and that the average hierarchy load h * values of the GR networks are not remarkably different to that of the real ones.The most noticeable influence of geometric community structure in the hierarchy load measurement is found for the Metabolic network, which turns to be a natural effect given the pronounced bimodal distribution of node angles of the original real network.First, in Fig. S2 we provide results for the topological features of similarity backbones examined in Fig. 4D of the main paper for the rest of real networks under study.Secondly, in Fig. S3 we cover further topological metrics of the HSBs and showcase them against the α parameter controlling the filtering procedure.Table S1: Backbones used to study evolutionary game dynamics.HSB stands for hierarchical similarity backbone, meaning these backbones are obtained using the similarity filter with specific α.RLR stands for random link removal, thus this method gives random surrogates where a specific number of links have been removed, indicated by LR (links removed).The results in Fig. 5

Figure 2 .
Figure 2. Hierarchy load spectra of synthetic and real networks.A-B) Hierarchy load spectrums for synthetic SCSS networks of size N = 1000 nodes, generated with power-law degree distribution exponent γ = 2.50, clustering parameter β = 2.50 and variable attractiveness Λ. Results are averaged over 10 network realizations for each choice of Λ. Purple curves correspond to hi in Eq. 3 computed from the link hierarchy loads while yellow curves correspond to h * i in Eq. 4 as given by the circular mean resultant vector.Dashed purple and dashed yellow lines indicate the corresponding average hierarchy load values for the whole network.C-F) Hierarchy load spectra of 4 real networks.Blue curves indicate hi while red curves correspond to h * i .Dashed blue and dashed red lines indicate the average hierarchy load h for matching color profiles.

Figure 3 .
Figure 3. Link hierarchy loads of real networks.A-B) Link hierarchy loads of the Enron and Metabolic networks, respectively.Each dot indicates the value of hij of a link stablished between node i and one of its lowerstatus neighbors j with hidden degree κj indicated by the color code.Notice node labels in the x-axis are sorted by increasing hidden degree κi, so that more popular i nodes appear to the right.

Figure 4 .
Figure 4. Hierarchical similarity backbones.A) A hierarchical similarity backbone of the World Trade Web[36] (in red) filtered with α = 0.4, on top of the complete network (pale blue).B) Plots from I to IV show: the relative number of nodes in the backbone, against the relative number of edges in the backbone (subindex 0 refers to the complete network); the relative number of nodes in the backbone giant connected component (gcc), against the relative number of edges in the backbone gcc; mean clustering coefficient of the backbone for increasing α; number of disconnected components against increasing values of the significance α.C) Fraction of nodes considered hubs, for threshold value τ = 0.25, found in backbones obtained with increasing α.D) Topological features of the HSBs of the Music network, obtained for α's from 0.1 to 0.8.Value α = 0.0 corresponds to the original unaltered network, whereas in the most restrictive HSB (α = 0.8) 0.60% of the nodes and 14% of the links remain.From I to III: complementary cummulative distribution of rescaled degrees, kres = k/ k , degree dependent clustering coefficient over rescaled-degree classes, normalised average nearest neighbor degree knn res = knn(kres) k / k 2 .
Figure 4B-I shows that, for all real networks, low values of α reduce the number of links drastically while most of the nodes are preserved in the backbone.Notice that α increases from right to left in Figs.4B-I-II.

Figure 5 .
Figure 5. Evolutionary game dynamics in HSBs of real networks.A-H) Fraction of final cooperators against fraction of initial cooperators for similarity backbones of Enron, Internet, Music and Metabolic networks.For every network we show two plots, each with the results of an HSB filtered with either α o or α (see numeric values in Supplementary TableS1).For an HSB (blue curve with symbols), the corresponding random surrogate appears as a grey dashed line together with the original network in black line for reference.I-L) Relative number of nodes against relative number of edges in the gcc of similarity backbones, and their corresponding surrogates, for the four networks under study.Blue lines correspond to backbones obtained using the similarity filter and red lines to surrogates obtained by random link removal.Blue circle and triangle symbols highlight the fraction of nodes and edges of the 2 similarity backbones filtered with α o and α , respectively.The same information is featured by red symbols for the random surrogates.

Figure S1 .
Figure S1.Hierarchy load spectrums for Geometric Randomization (GR) model replicas of each of the 4 real networks under study.Each spectrum is the average over 10 independent realizations.Dashed lines indicate the average hierarchy load of the network, obtainded as h * = N −1 N i=1 h * i , yielding to the following values for each network: h * Enron = 0.82 ± 0.19, h * Internet = 0.99 ± 0.02, h * Metabolic = 0.99 ± 0.04 and h * Music = 0.95 ± 0.11.

Figure S2 .
Figure S2.Topological features of the hierarchical backbones of A) Enron, B) Internet, C) Metabolic , obtained for α ∈ [0.1 − 0.8].In each case, value α = 0.0 corresponds to the original network.The properties in each row, from left to right, are: complementary cummulative distribution of rescaled degrees (kres = k/ k ), degree dependent clustering coefficient over rescaled-degree classes, normalised average nearest neighbor degree knn res = knn(kres) k / k 2 .

Figure S3 .
Figure S3.Topological properties of hierarchical backbones of the four real networks against increasingly restrictive filtering parameter (α) values.Plots from I to IV show, for increasing values of α: the normalised number of nodes in the backbone; the normalised number of edges in the backbone; normalised average degree of the backbone and the HSB normalised maximum degree.
-C and Fig.S5-C are the average of 10 random surrogate realizations.Because of this, and since by construction we fix the number of edges in the random surrogates, only their number of nodes can fluctuate and hence display an error interval.

Figure S5 .Figure
Figure S5.Topological features of random surrogates and original network of A) Enron, B) Internet, C) Metabolic D) Music.In each case, value LR indicates the number of links that have been removed at random to produce the surrogate backbone.The properties in each row, from left to right, are: complementary cummulative distribution of rescaled degrees kres = k/ k , degree dependent clustering coefficient over rescaled-degree classes, normalised average nearest neighbor degree knn res = knn(kres) k / k 2 .

Table I .
Properties of the data sets under consideration: N , size of the network; E number of edges; parameter β estimated from the embedding of the real network; kmax, highest degree; k , average degree; and c , average clustering coefficient; h , average hierarchy load; h * , alternative average hierarchy load in terms of angular concentration.