Link prediction with hyperbolic geometry

Link prediction is a paradigmatic problem in network science with a variety of applications. In latent space network models this problem boils down to ranking pairs of nodes in the order of increasing latent distances between them. The network model with hyperbolic latent spaces has a number of attractive properties suggesting it should be a powerful link prediction tool, yet past work in this direction reported mixed results. Here we perform a systematic investigation of the utility of latent hyperbolic geometry for link prediction in networks. We first show that some measures of link prediction accuracy are extremely sensitive to inaccuracies in the inference of latent hyperbolic node coordinates, which motivates us to develop a new coordinate inference method that maximizes the accuracy of such inference. Applying this method to synthetic and real networks, we then find that while there exists a multitude of competitive methods to predict obvious, easy-to-predict links, among which hyperbolic link prediction is rarely the best but often competitive, it is the best, often by far, when the task is to predict less obvious missing links that are really hard to predict. These links include missing links in incomplete networks with large fractions of missing links, missing links between nodes that do not have any common neighbors, and missing links between dissimilar nodes at large latent distances. Overall these results suggest that the harder a specific link prediction task is, the more seriously one should consider using hyperbolic geometry.

Latent space network models [16][17][18][19][20] offer an intuitive and simple approach to link prediction. In these models, network nodes are points in a latent space, while connections are established with probabilities that decrease with latent distances between nodes. Latent distances model similarity between nodes, and the main idea behind these models is homophily: more similar nodes are more likely to be linked. Link prediction then reduces to ranking unconnected node pairs in the order of increasing latent distances between them: the closer the two unlinked nodes in the latent space, the higher the probability of a missing link [4,[21][22][23].
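As a minimal sketch of this ranking step (toy scalar coordinates and a stand-in distance function of our own choosing, not the paper's model), unconnected node pairs can be sorted by increasing latent distance as follows:

```python
from itertools import combinations

def rank_missing_link_candidates(coords, edges, dist):
    """Return unconnected node pairs sorted by increasing latent distance."""
    edge_set = {frozenset(e) for e in edges}
    candidates = [p for p in combinations(coords, 2)
                  if frozenset(p) not in edge_set]
    return sorted(candidates, key=lambda p: dist(coords[p[0]], coords[p[1]]))

# Toy 1-D latent space: nodes "a".."d" with scalar coordinates.
coords = {"a": 0.0, "b": 0.1, "c": 2.0, "d": 2.1}
edges = [("a", "b"), ("b", "c"), ("c", "d")]
ranking = rank_missing_link_candidates(coords, edges,
                                       dist=lambda u, v: abs(u - v))
```

The top-ranked pair is the closest unconnected one, which is exactly the most likely missing link candidate under a decreasing connection probability.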
Among the many latent space models considered in the literature, only the one that assumes a hyperbolic latent space reproduces sparsity, self-similarity, scale-free degree distributions, strong clustering, the small-world property, and community structure [19,[24][25][26]. All these properties are often observed in many real networks [27][28][29], and hyperbolic geometry captures them all. In addition, the hyperbolic network model is likely the simplest, i.e., most parsimonious, model with respect to these properties, as in some of its limiting regimes it has been proven to be statistically unbiased, satisfying the maximum entropy principle [30,31].
Given the combination of these attractive properties, one could naturally expect the hyperbolic latent space model to be a powerful link prediction tool. Yet previous studies on this subject reported mixed results [23,[32][33][34][35][36].
Here we perform a systematic investigation of the efficiency of link prediction using latent hyperbolic geometry. In Section II, we first recall the definitions of the hyperbolic latent space network model, which for short we call hyperbolic random graphs (HRGs), and of the main measures of link prediction accuracy: AUC (Area under the Receiver Operating Characteristic curve), AUPR (Area under the Precision-Recall curve), and Precision. We also discuss there what the AUC and AUPR measures actually measure: while AUPR cares mostly about the most obvious, easy-to-predict missing links, AUC puts more weight on less obvious, harder-to-predict missing links between more dissimilar nodes at large latent distances, albeit at the cost of not caring as much about false positives. Our main results are then in Sections III and IV.
Section III: Upper bounds of link predictability with hyperbolic geometry. Here we calculate analytically the AUC and AUPR on HRGs with known hyperbolic coordinates of all nodes. That is, the same coordinates are used both to generate HRGs and to predict missing links in them, an ideal situation yielding the upper bound for the link prediction accuracy achievable with hyperbolic geometry. To understand the robustness of link prediction in the case where coordinates are inferred (Section IV), so that they are not exactly equal to the true coordinates, we add uniform noise to the true coordinates, and analyze the AUC, AUPR, and Precision as functions of the noise amplitude to find that:
• AUC is not that sensitive to noise, but
• AUPR and Precision decrease quickly as the noise grows.
The latter result implies that the AUPR and Precision scores of link prediction using hyperbolic geometry in real networks can be high only if node coordinates are inferred with sufficiently high accuracy. This is because the most likely missing link candidates are those between similar nodes at small hyperbolic distances, which are most sensitive to coordinate inaccuracies.
Section IV: Link prediction in real networks and HRGs with inferred coordinates. To predict missing links in networks with unknown coordinates one first needs to infer these coordinates. Motivated by the results in Section III calling for high-accuracy coordinate inference, and given that no existing hyperbolic coordinate inference algorithm is sufficiently accurate, we develop a new one, which we call the HyperLink Embedder, whose focus is on high precision in coordinate inference. We describe it in Appendix E, where we also compare it to some existing inference algorithms to show that its accuracy is indeed higher. We then apply it to a collection of HRGs with "forgotten" coordinates, and to real networks, referring to the overall link prediction procedure as the HyperLink method. Comparing HyperLink to a representative collection of other link prediction methods, we find that:
• HyperLink's AUC scores are always the best.
• HyperLink's AUPR and Precision scores are:
– in HRGs: the best if, surprisingly, clustering is weak and/or the degree distribution exponent γ is high;
– in real networks: rarely the best but often competitive, especially if the fraction of missing links is high.
• In the prediction of nonlocal links, i.e., links between nodes that do not have any common neighbors (such links are really hard to predict and tend to comprise significant fractions of all missing links), HyperLink is always by far the best, according to all three measures: AUC, AUPR, and Precision.
In Section V we conclude the paper with a more detailed summary of these results, their discussion, and an outline of open problems.

II. METHODS
We begin the exposition by discussing the latent-geometric link prediction framework and the null model that we utilize to predict missing links.
A. Link prediction with latent geometry
Link prediction with hyperbolic geometry is a two-step procedure. First, one needs to infer node coordinates in the hyperbolic space and calculate hyperbolic distances between node pairs. This coordinate inference procedure is often referred to as network mapping or embedding. The second step of the procedure is to identify the most likely missing link candidates. This subsection focuses on the second step of this procedure, while the technical details of the null geometric model and the network mapping algorithm constituting the first step are provided, respectively, in Section II B and Appendix E. We refer to the network mapping algorithm and the entire hyperbolic link prediction framework as the HyperLink Embedder and HyperLink, respectively.
The latent-geometric link prediction framework is applicable to all latent geometric models, where connections are established independently with a decreasing connection probability function p(x). Intuitively, the smaller the latent distance between two nodes, the higher the probability of a link between them. Then, if two nodes located close to each other in the latent space are not connected, it is likely that there is a missing link between them.
Specifically, consider a latent-geometric model where nodes are assigned positions {x_i} in a certain latent space M, and every node pair {ij} is connected with probability p_ij = p(x_ij), where x_ij = d(x_i, x_j) is the latent distance between the nodes, and p : R+ → [0, 1] is the decreasing connection probability function specified by the model. After all connections are established, some links are removed with probabilities 1 − q_ij. These pairs of nodes are referred to as missing links.
Any unconnected node pair {ij} in the resulting network is either not connected in the network formation process, or connected during network formation and later removed with probability 1 − q_ij. Therefore, the probability for an unconnected pair of nodes {ij} separated by x_ij to be a missing link is

m_ij = (1 − q_ij) p(x_ij) / (1 − q_ij p(x_ij)). (1)

In the particular case of a decreasing connection probability function p(x) and a uniformly random link removal process, q_ij = q, m_ij is a decreasing function of x_ij for any q > 0. Thus, the most probable candidates for missing links are indeed unconnected node pairs located at small latent distances, as stated, and the latent-geometric link prediction algorithm only needs to rank unconnected node pairs in increasing order of the latent distance between them.
It is important to note, however, that this approach is only guaranteed to work in the case where links are removed uniformly at random. In the general case, the missing link probabilities in Eq. (1) depend both on the latent distances {x_ij} and the missing link rates {1 − q_ij}, and further information on the nature of {q_ij} is needed to rank missing link candidates properly.
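The monotonicity of the missing-link probability can be checked directly. The sketch below assumes a Fermi-Dirac connection probability with illustrative parameter values (R, T, q are our choices, not fitted values):

```python
import math

def p_conn(x, R=15.0, T=0.5):
    """Fermi-Dirac connection probability, decreasing in distance x."""
    return 1.0 / (1.0 + math.exp((x - R) / (2.0 * T)))

def missing_link_prob(x, q=0.5, R=15.0, T=0.5):
    """Eq. (1): probability that an unconnected pair at distance x is a
    missing link, for a uniform link-removal rate 1 - q."""
    p = p_conn(x, R, T)
    return (1.0 - q) * p / (1.0 - q * p)

# m(x) decreases with x, so ranking by distance ranks by likelihood.
ms = [missing_link_prob(x) for x in (5.0, 10.0, 15.0, 20.0)]
```

Since m is an increasing function of p and p decreases with x, the computed values decrease monotonically with distance, confirming that distance ranking suffices.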

B. Hyperbolic Random Graphs
While the latent-geometric framework described above is applicable to all latent-space models, in our work we use the HRG model as a null model for link prediction.
The latent space of the HRG model is the two-dimensional hyperbolic disk of constant negative curvature K = −1 and radius R. The hyperbolic distance x between any two points in the hyperbolic disk is given by the hyperbolic law of cosines:

cosh x = cosh r cosh r′ − sinh r sinh r′ cos ∆θ, (3)

where (r, θ) and (r′, θ′) are the hyperbolic coordinates of the two points within the disk and ∆θ = π − |π − |θ − θ′|| is the angle between them.

Node pairs at hyperbolic distance x are connected with probability

p(x) = 1 / (1 + e^{(x−R)/(2T)}), (6)

where T ≥ 0 is the temperature parameter.
We summarize basic HRG properties in Appendix C: parameter α controls the exponent γ = 2α + 1 of the power-law degree distribution, while clustering is a decreasing function of temperature T, approaching zero in the N → ∞ limit as T → 1. In this limit, clustering is zero for any T ≥ 1.
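The HRG construction described above (quasi-uniform node density controlled by α, Fermi-Dirac connections at temperature T) can be sketched as follows; the exact radial density and the choice of R here follow the standard construction and are assumptions rather than a transcription of the paper's appendix:

```python
import math
import random

def hyperbolic_distance(r1, theta1, r2, theta2):
    """Hyperbolic distance in the disk of curvature K = -1."""
    dtheta = math.pi - abs(math.pi - abs(theta1 - theta2))
    c = (math.cosh(r1) * math.cosh(r2)
         - math.sinh(r1) * math.sinh(r2) * math.cos(dtheta))
    return math.acosh(max(c, 1.0))

def generate_hrg(N, R, alpha, T, seed=0):
    """Sample an HRG: radii with density ~ alpha*sinh(alpha*r) (inverse-CDF
    sampling), uniform angles, Fermi-Dirac connection probabilities."""
    rng = random.Random(seed)
    nodes = []
    for _ in range(N):
        u = rng.random()
        r = math.acosh(1.0 + u * (math.cosh(alpha * R) - 1.0)) / alpha
        nodes.append((r, rng.uniform(0.0, 2.0 * math.pi)))
    edges = set()
    for i in range(N):
        for j in range(i + 1, N):
            x = hyperbolic_distance(nodes[i][0], nodes[i][1],
                                    nodes[j][0], nodes[j][1])
            if rng.random() < 1.0 / (1.0 + math.exp((x - R) / (2.0 * T))):
                edges.add((i, j))
    return nodes, edges

nodes, edges = generate_hrg(N=100, R=8.0, alpha=0.75, T=0.5, seed=42)
```

With α = 0.75 this corresponds to γ = 2α + 1 = 2.5; the O(N²) pairwise loop is fine for illustration but not for large graphs.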

C. Link Prediction Accuracy
We evaluate the accuracy of HyperLink as well as of other link prediction methods through random link removal experiments. To this end, we first remove existing links uniformly at random with probability 1 − q from the network of interest G. We refer to the remaining network as the pruned network. We refer to the removed links as missing links and denote them by Ω_R. The set of remaining links is denoted by Ω_E.
To test the link prediction method of interest we compute likelihood scores for all unconnected node pairs in the pruned network, which include both missing links Ω_R and true non-links Ω_N. We then rely on these scores to rank unconnected node pairs in decreasing order of missing link likelihood and refer to them as missing link candidates. We denote the fraction λ ∈ [0, 1] of the most likely missing link candidates as the set Ω_M(λ). In the case λ = 0, Ω_M(λ) is the empty set, while in the case λ = 1 it contains all unconnected node pairs.
In the case where the exact number of missing links is known, the most direct way to assess link prediction accuracy is to consider the same number of the most likely missing link candidates and evaluate its intersection with the set of missing links. This metric is known as Precision and is formally defined as

Precision = |Ω_M(λ_R) ∩ Ω_R| / |Ω_R|,

where the fraction λ_R = |Ω_R| / (|Ω_R| + |Ω_N|) corresponds to selecting exactly |Ω_R| candidates. The Precision score is bounded by 0 and 1, with the upper bound corresponding to the ideal link predictor ranking all missing links in Ω_R higher than the non-links in Ω_N.
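The Precision computation above can be sketched in a few lines (the function name and the toy scores are ours):

```python
def precision_at_k(scores, removed):
    """Precision with k = |removed|: the fraction of the k top-scored
    candidates that are truly missing links."""
    k = len(removed)
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    return sum(1 for pair in top if pair in removed) / k

# Three candidates, two of which are true missing links.
scores = {("a", "b"): 0.9, ("c", "d"): 0.8, ("e", "f"): 0.1}
p = precision_at_k(scores, removed={("a", "b"), ("e", "f")})
```

Here the top two candidates are ("a","b") and ("c","d"), only one of which was actually removed, so Precision is 0.5.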
In practical circumstances, however, the exact number of missing links is often unknown. Further, depending on the application, one might be interested in minimizing the number of false positives in the prediction set, possibly at the expense of false negatives, or, vice versa, minimizing the number of false negatives at the expense of false positives. One example of the former case, where one is interested in minimizing the number of false positives, i.e., good citizens misclassified as criminals, is the criminal justice system. This example is in contrast to cancer screening, where the number of false negatives, or unidentified cancer cases, should be minimized. In both cases one is interested in exploring the performance of the link predictor over a range of Ω_M(λ) sizes.
A number of link prediction metrics have been developed to this end, with the Receiver Operating Characteristic (ROC) and the Precision-Recall (PR) curves being the most popular.
To formally introduce the ROC and PR curves we first define the confusion matrix. The latter consists of four values, the numbers of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN), Fig. 1, and is extensively used in statistical classification problems. Link prediction is not a genuine classification problem, since one is only interested in predicting links and not their absence. Non-link node pairs are predicted implicitly as unconnected node pairs that are not part of Ω_R. In the context of link prediction, the number of true positives is the number of correctly identified missing links in Ω_M(λ), Eq. (8). The number of false negatives is the remaining number of missing links that are not part of Ω_M(λ), Eq. (9). The number of false positives is the number of missing link candidates in Ω_M(λ) that are not correctly identified, Eq. (10). Finally, the number of true negatives is the number of unconnected node pairs that are neither true positives, nor false positives, nor false negatives, see Eq. (11) and Fig. 1.
Since network sizes vary, it is common to normalize the confusion matrix elements, obtaining the true positive, false positive, false negative, and true negative rates, formally defined as tpr = TP/(TP + FN), fpr = FP/(FP + TN), fnr = FN/(TP + FN), and tnr = TN/(FP + TN). The ROC curve is defined as the parametric plot of the true positive rate tpr(λ) as a function of the false positive rate fpr(λ) obtained by varying the fraction of considered link candidates λ ∈ [0, 1]. The ideal predictor is expected to rank all node pairs corresponding to missing links, Ω_R, higher than non-links, Ω_N, resulting in a unit true positive rate and zero false positive rate at λ = 1 − q: tpr(1 − q) = 1, fpr(1 − q) = 0. The corresponding ROC curve of the ideal predictor thus passes through the upper left corner (0, 1) of the ROC space. A fully random link predictor, on the other hand, guesses missing links at random from among all unconnected node pairs and is expected to yield equal true positive and false positive rates, tpr(λ) = fpr(λ) for all λ values, resulting in the diagonal ROC curve, Fig. 2a.
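A minimal sketch of tracing the ROC curve over a ranked candidate list (candidate identifiers here are opaque labels; any hashable pair works):

```python
def roc_curve(ranked, removed):
    """tpr/fpr values as the ranked candidate list is scanned from the
    most to the least likely missing link."""
    n_pos = len(removed)
    n_neg = len(ranked) - n_pos
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for pair in ranked:
        if pair in removed:
            tp += 1
        else:
            fp += 1
        tpr.append(tp / n_pos)
        fpr.append(fp / n_neg)
    return fpr, tpr

# Ideal ranking: both missing links precede both non-links.
fpr, tpr = roc_curve(["m1", "m2", "n1", "n2"], removed={"m1", "m2"})
```

For the ideal ranking above, tpr reaches 1 while fpr is still 0, which is exactly the curve through the upper left corner (0, 1).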
The standard way to quantify ROC-based prediction accuracy is through the Area under the ROC curve (AUC),

AUC = ∫_0^1 tpr(fpr) d fpr.

AUC values vary between 0 and 1, with AUC = 0.5 corresponding to the fully random predictor and AUC = 1.0 to the perfect predictor.
The AUC score can be interpreted as the probability that a randomly chosen missing link is assigned a higher link prediction score than a randomly chosen non-link. ROC curves are easy to read and interpret, which is arguably the basic reason behind their popularity.
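This probabilistic interpretation yields a direct Monte Carlo estimator of AUC (a sketch under the usual convention that ties count as 1/2; the function name is ours):

```python
import random

def auc_estimate(scores, removed, nonlinks, trials=10000, seed=1):
    """Monte Carlo AUC: probability that a random missing link scores
    higher than a random non-link (ties count as 1/2)."""
    rng = random.Random(seed)
    removed, nonlinks = list(removed), list(nonlinks)
    wins = 0.0
    for _ in range(trials):
        a = scores[rng.choice(removed)]
        b = scores[rng.choice(nonlinks)]
        wins += 1.0 if a > b else (0.5 if a == b else 0.0)
    return wins / trials

# Perfectly separated scores give AUC = 1 exactly.
auc = auc_estimate({"m": 1.0, "n": 0.0}, removed={"m"}, nonlinks={"n"},
                   trials=500)
```

For large candidate sets this sampling estimator is often more practical than integrating the full ROC curve.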
At the same time, there is a growing consensus that ROC curves and the corresponding AUC scores are insensitive in class imbalance problems, where the number of positives is disproportionately small compared to the number of negatives [43]. Link prediction in sparse networks is one example of class imbalance. Here the number of missing links is of the order of N and is significantly smaller than the number of non-links, which is of the order of N^2. Intuitively, in this situation the true positive rate tpr(λ) grows much faster than the false positive rate, since the latter is normalized by |Ω_N|, and, as a result, most ROC curves tend to lie substantially above the random baseline, yielding AUC scores close to 1.0 regardless of the link prediction method.
An alternative to the ROC curve is the PR curve, defined as the parametric plot of the precision rate pr(λ) as a function of the recall rate rc(λ) obtained by varying λ ∈ [0, 1], where the two rates are defined by

pr(λ) = |Ω_M(λ) ∩ Ω_R| / |Ω_M(λ)|, (18)
rc(λ) = |Ω_M(λ) ∩ Ω_R| / |Ω_R|. (19)

That is, the recall rate is identical to the true positive rate, while the precision rate differs from it by the normalization: by the number of predicted links instead of the number of removed links.
In the case of the ideal predictor the precision rate is maximized, pr(λ) = 1.0 for λ ≤ 1 − q, while the recall grows from rc(0) = 0 to rc(1 − q) = 1, resulting in the rectangular PR curve passing through the upper right corner (1, 1) of the PR space. A fully random predictor, on the other hand, maintains a constant precision rate equal to the ratio of the number of true missing links to the total number of unconnected node pairs, pr_rand(λ) = |Ω_R| / (|Ω_R| + |Ω_N|). The AUPR score is the area under the PR curve; AUPR values vary between |Ω_R| / (|Ω_R| + |Ω_N|) and 1, with the unit score corresponding to the ideal predictor. In the case of sparse networks |Ω_R| ≪ |Ω_N|, leading to AUPR ≪ 1 in the case of a random predictor. Unlike ROC curves, the PR characteristic does not directly depend on the number of true negatives and, as a result, does not suffer from the class imbalance problem in the case of sparse networks.
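One common discretization of the area under the PR curve is average precision, i.e., precision evaluated at each recall step and then averaged (a sketch of this standard approximation, not necessarily the exact integration used in the paper):

```python
def average_precision(ranked, removed):
    """AUPR approximated by average precision: precision at each rank
    where a true missing link is retrieved, averaged over |removed|."""
    tp, ap = 0, 0.0
    for rank, pair in enumerate(ranked, start=1):
        if pair in removed:
            tp += 1
            ap += tp / rank
    return ap / len(removed)
```

An ideal ranking gives 1.0, while pushing a missing link below a non-link immediately lowers the score, reflecting AUPR's emphasis on the top of the ranking.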

D. AUC versus AUPR
While both AUC and AUPR quantify link prediction accuracy, they tend to weigh missing link candidates differently. AUPR scores tend to emphasize highly ranked missing link candidates, i.e., those corresponding to small λ values. AUC scores, on the other hand, put more weight on missing link candidates corresponding to larger λ values.
Indeed, AUPR averages the precision rate pr(λ) over the recall rate rc(λ). Since the recall rate is given by Eq. (19), good link predictors tend to reach rc(λ) = 1 already when the size of the missing link candidate set Ω_M(λ) becomes comparable to that of Ω_R, so that AUPR is dominated by the highly ranked candidates. In summary, AUC and AUPR scores complement each other by weighing missing link candidates in Ω_M differently. Thus, in our work we compute both metrics to obtain a comprehensive view of the utility of hyperbolic geometry in link prediction. In addition to AUPR and AUC scores, we also compute Precision scores, which are the scores to use if the number of missing links is known exactly, although such knowledge is rare in practice.

III. LINK PREDICTION WITH KNOWN COORDINATES
Before investigating link prediction accuracy in real networks, we conduct link prediction experiments on HRG graphs with known coordinates. In doing so we pursue several goals. First, the HRGs provide the upper bound for the link prediction accuracy of HyperLink if the same node coordinates are used both for the construction of the graphs and for link prediction [23]. Thus, we want to quantify the upper bound of link predictability on HRG graphs. Second, we want to measure the link prediction accuracies of other methods, listed in Appendix B, and compare them to that of HyperLink. Establishing these results provides a baseline for interpreting link prediction results on real networks.
We start with the analysis of HyperLink accuracy in the case of randomly missing links in HRGs. After generating an HRG graph we visit each of its links and remove it with probability 1 − q, arriving at the pruned network. We then rank missing link candidates using distances between all unconnected node pairs, calculated with the coordinates from which the network was originally generated.
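The link-removal step of this experiment can be sketched as follows (the function name is ours):

```python
import random

def prune_links(edges, q, seed=0):
    """Keep each link independently with probability q; the removed links
    play the role of missing links in the experiment."""
    rng = random.Random(seed)
    kept, removed = set(), set()
    for e in edges:
        (kept if rng.random() < q else removed).add(e)
    return kept, removed
```

The removed set is then the ground truth against which the distance-based ranking of unconnected pairs is evaluated.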
As seen in Fig. 3, the predictive power of HyperLink is maximized as T → 0 and decreases as T increases. This result is expected. In the T → 0 limit the HRG model is deterministic, since the connection probability in Eq. (6) becomes the Heaviside step function, p(x) → Θ(R − x). As a result, all node pairs with x < R are connected and all other node pairs are not. Then an unconnected pair of nodes at distance x < R is guaranteed to be a true positive, and all unconnected pairs at x ≥ R are true negatives. As T increases, connections are allowed at distances x > R with increasing probability and, as a result, the underlying geometry plays a smaller role in the formation of links, explaining the decreasing link prediction accuracy as a function of T, as quantified by all the scores in Fig. 3.

FIG. 3. In all experiments we remove links uniformly at random with probability 1 − q = 0.5. Missing links are then predicted using hyperbolic distances between unconnected node pairs. Link prediction accuracy is quantified using a AUC, b AUPR, and c Precision scores plotted as functions of the HRG temperature T. All results correspond to HRG graphs with N = 10^4 nodes, γ = 2.5 and mean degree k = 10. The HyperLink prediction scores are compared to those of the AA, RA, JC, CN, CRA, and SPM methods, see Appendix B.
Even though all scores, AUC, AUPR, and Precision, are decreasing functions of T, they behave differently. AUC scores remain constant in the T ∈ [0, 1/2] interval and then exhibit a slow decay to AUC = 0.95 at T = 0.9. AUPR and Precision scores, on the other hand, decrease rapidly over the entire testing interval T ∈ [0, 0.9], from AUPR = 1 (Precision = 1) at T = 0 to AUPR = 0.34 (Precision = 0.29) at T = 0.9.

A. AUC
To understand the behavior of AUC scores as functions of HRG parameters we define the distance-dependent true positive tpr(x) and false positive fpr(x) rates as the fractions of true and false positives, respectively, contained among unconnected node pairs separated by distances up to x:

tpr(x) = (N^2 / 2E) ∫_0^x n(y) p(y) dy, (21)
fpr(x) = ∫_0^x n(y) [1 − p(y)] dy / ∫_0^{2R} n(y) [1 − p(y)] dy, (22)

where E is the true number of links in the network, p(y) is the connection probability in the HRG given by Eq. (6), and n(y) is the distance distribution for node pairs in the HRG. It is seen from Eqs. (21) and (22) that in the T → 0 limit p(y) = Θ(R − y), resulting in fpr(x) = 0 for x ≤ R and tpr(x) = 1 for x ≥ R, which yields the ideal ROC curve, Fig. 2a, and AUC = 1.
To quantify the behavior of the AUC in the T > 0 regime we rely on the functional form of n(y), which was found in [44]. It follows from Ref. [44] that n(y) is bounded above by

n(y) ≤ c e^{(y−2R)/2}, (24)

where c is a constant, and this upper bound also provides the leading-term behavior of n(y) in the large-N limit for α > 1/2. Using Eq. (24), we can evaluate the true and false positive rates, up to proportionality coefficients, as

tpr(x) ∝ I(z; T), fpr(x) ∝ z − I(z; T),

where z ≡ e^{(x−R)/2},

I(z; T) ≡ ∫_0^z dt / (1 + t^{1/T}) = z 2F1(1, T; 1 + T; −z^{1/T}),

and 2F1 is the Gaussian hypergeometric function. In the z ≪ 1 regime I(z; T) ≈ z and, thus, tpr(x) ≈ e^{(x−R)/2} and fpr(x) ≈ 0 for x < R, Fig. 4a,b.
This result is easy to interpret. The connection probability p(x) is close to unity for x < R, and thus all unconnected node pairs with x < R are almost guaranteed to be true positives, resulting in negligible false positive rates. Since missing links are removed uniformly at random, the number of true positives for x < R grows proportionally to the number of node pairs in the hyperbolic disk.
In the z ≫ 1 regime I(z; T) ∼ I(T), where I(T) = πT / sin(πT), explaining the saturation of the true positive rate, tpr(x) → 1 as x approaches 2R, and the exponential growth of the false positive rate, fpr(x) ∼ e^{x/2} for x > R, Fig. 4a,b.
The behavior of tpr(x) and fpr(x) provides a qualitative explanation for the nearly perfect AUC scores observed in Fig. 3a. The false positive rate fpr(x) takes large values only when x approaches 2R; at the same time, as x approaches 2R, tpr(x) approaches 1.
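The small-z linearity and the large-z saturation of I(z;T) = ∫_0^z dt/(1 + t^{1/T}) can be checked numerically; a quick pure-Python sketch using trapezoidal quadrature (the function name I_num is ours):

```python
import math

def I_num(z, T, n=200000):
    """Trapezoidal estimate of I(z;T) = ∫_0^z dt / (1 + t^(1/T))."""
    h = z / n
    total = 0.5 * (1.0 + 1.0 / (1.0 + z ** (1.0 / T)))  # endpoint terms
    for k in range(1, n):
        t = k * h
        total += 1.0 / (1.0 + t ** (1.0 / T))
    return total * h

# Small-z regime: I(z;T) ≈ z.
small = I_num(0.01, 0.5)
# Large-z regime: I(z;T) saturates at pi*T / sin(pi*T); pi/2 for T = 0.5.
large = I_num(1000.0, 0.5)
```

For T = 1/2 the integrand is 1/(1 + t²), whose improper integral is π/2, in agreement with πT/sin(πT) at T = 1/2.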
To obtain an analytical estimate of the AUC as a function of HRG parameters we represent it as

AUC = 1 − ∆_1 − ∆_2,

where ∆_1 and ∆_2 are correction terms quantifying the deviation from the ideal predictor. By making use of Eqs. (21) and (22) we arrive at expressions for ∆_1 and ∆_2 in terms of n_c(x) ≡ ∫_0^x n(y) dy. In the case of sparse networks the first correction term ∆_1 ∼ N^{−1} and can be ignored in the large-N limit. The second correction term requires further analysis. It is straightforward to verify that in the T → 0 limit ∆_2 ∼ N^{−1} and can also be ignored. Indeed, in this case p′(x) = −δ(x − R), and N^2 n_c(R) equals the number of node pairs in the hyperbolic disk with distances up to R, all of which are connected in the T → 0 limit.
To estimate the behavior of ∆_2 in the case of T > 0 we need to understand the behavior of its integrand in Eq. (31). Depending on T, the integrand is either sharply peaked near x = R or grows monotonically as a function of x. The evaluation of ∆_2 in the latter regime is quite involved and not informative. Instead, we elect to compute an upper bound for ∆_2, which also provides a lower bound for the AUC scores. In doing so we note that the leading-term behavior of n(x) given by Eq. (24) is also its upper bound, see Ref. [44], and that e^{R/2} ∼ N in the case of sparse HRGs, see Eq. (C3). Taken together, the results above show that the AUC scores for HRG graphs with known coordinates converge to 1 in the large-N limit.

B. AUPR

AUPR scores can be evaluated in a similar fashion:

AUPR = ∫_0^1 pr(rc) d rc,

where pr(x) and rc(x) are, respectively, the distance-dependent precision and recall functions for hyperbolic distances up to x:

pr(x) = (1 − q) (N^2/2) ∫_0^x n(y) p(y) dy / N_d(x),
rc(x) = ∫_0^x n(y) p(y) dy / ∫_0^{2R} n(y) p(y) dy,

where N_d(x) is the number of disconnected node pairs with distances up to x:

N_d(x) = (N^2/2) ∫_0^x n(y) [1 − q p(y)] dy. (39)

Using Eqs.
(21) and (39) we obtain

pr(x) = (1 − q) ∫_0^x n(y) p(y) dy / ∫_0^x n(y) [1 − q p(y)] dy.

In the T → 0 limit pr(x) = 1 for all x < R, while rc(x) = E(x)/E, resulting, as expected, in AUPR = 1.

FIG. 4. In all experiments we remove links uniformly at random with probability 1 − q = 0.5. Missing links are then predicted using hyperbolic distances between unconnected node pairs. All results correspond to HRG graphs with N = 10^4 nodes, γ = 2.5 and mean degree k = 10. The insets display the same plots as the main panels but in log-linear format. Solid lines correspond to analytical estimates.
Here E(x) is the cumulative number of links between the node pairs with distances up to x.
In the x ≪ R regime I(e^{(x−R)/2}; T) ≈ e^{(x−R)/2} and pr(x) → 1. In the x ≫ R case I(e^{(x−R)/2}; T) ∼ πT/sin(πT) and, as a result, precision decays exponentially, pr(x) ∼ e^{−x/2}, independently of T, Fig. 4c.
The dependence of AUPR on T arises from the recall function, or rather its derivative, rc′(x), quantifying the expected distance-dependent link density and, consequently, the density of missing links.
The link density rc′(x) ∝ n(x)p(x) grows exponentially as e^{x/2} for x ≪ R and decays as e^{(x/2)(1 − 1/T)} for x ≫ R, reaching its maximum at x ≈ R. Thus, as T increases, missing links are more likely to be located at larger distances, where precision pr(x) is smaller, resulting in lower AUPR scores, consistent with our observations in Fig. 3.
We also note that AUPR scores have only a weak dependence on the node density parameter α, and consequently on the degree distribution exponent γ = 2α + 1. Indeed, as seen from Eqs. (41) and (40), the precision and recall rates depend on α only through the node pair distribution n(x), which depends on α only in sub-leading terms, as shown in Ref. [44].

C. Coordinate uncertainty and link prediction accuracy
While HyperLink provides the upper bound for link prediction on HRGs, it is important to note that its accuracy is comparable to that of other link prediction methods, in particular the Resource Allocation (RA) and Adamic-Adar (AA) indices, Fig. 3. This observation motivates the question: How accurately does one need to infer node coordinates to ensure the superior performance of HyperLink?
To answer this question we analyze the impact of node coordinate uncertainty on HyperLink accuracy. To this end, we add synthetic noise to the original angular node coordinates, while keeping the radial node coordinates unchanged:

θ̃_i = θ_i + ξ_i, ξ_i ∼ Uniform[−a, a],

where a > 0 is the noise amplitude. The effects of synthetic noise on HyperLink accuracy are depicted in Fig. 5. Our results indicate that AUPR and Precision scores, Fig. 5b,c, decrease rapidly as functions of the noise amplitude, while AUC scores remain largely unchanged even at noise amplitudes a > 1 radian.
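The noise-injection step can be sketched as follows (assuming, as the text's "uniform noise" suggests, i.i.d. uniform perturbations on [−a, a]; the function name is ours):

```python
import math
import random

def add_angular_noise(thetas, a, seed=0):
    """Perturb each angular coordinate by i.i.d. uniform noise on [-a, a]
    (radians), wrapping back onto [0, 2*pi); radii stay unchanged."""
    rng = random.Random(seed)
    return [(th + rng.uniform(-a, a)) % (2.0 * math.pi) for th in thetas]
```

Distances are then recomputed from the perturbed angles and the original radii, and the ranking experiment is repeated as before.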
To better understand the effects of noise on link prediction accuracy we juxtapose HyperLink prediction results with those of the RA method, which is its leading competitor according to Fig. 3. We show RA accuracy with dashed lines of matching color in Fig. 5. Consistent with our earlier observations, we find that HyperLink AUC scores are robust to noise, preserving HyperLink's top ranking among the link prediction methods, Fig. 5a.
In contrast, as quantified by AUPR and Precision scores, HyperLink is superior to the RA method only if the coordinate uncertainty is sufficiently small. The maximum tolerable noise amplitude a_c increases as T increases, see the insets of Fig. 5b,c. While a_c does not exceed 10^{−2} radians in the case of T = 0.1, the noise tolerance in the case of T = 0.9 is significantly higher, a_c ≈ 0.5 radians, suggesting, somewhat surprisingly, that HyperLink is better off on networks characterized by larger T values or, equivalently, smaller clustering coefficients.
Qualitatively, the observed fast degradation of the AUPR and Precision scores is due to the sensitivity of the hyperbolic distance to the angular distance ∆θ between the nodes. It follows from Eq. (3) that even a small change in ∆θ may significantly change the corresponding hyperbolic distance, adversely affecting the ranking of missing link candidates at small distances x, Appendix D. Since AUPR and Precision emphasize the link prediction accuracy of the most likely candidates, proper ranking of unconnected node pairs at small x values is crucial. AUC scores, on the other hand, place more emphasis on less obvious link candidates and are less affected by coordinate uncertainty. We find that uniform synthetic noise adversely affects the distance-dependent true positive rate tpr(x|a), see Appendix D. The robustness of the AUC scores to synthetic noise in HRGs can be qualitatively explained by the fact that AUC scores emphasize the prediction of missing links at large distances x. Large hyperbolic distances are affected by synthetic noise to a lesser extent than small hyperbolic distances. This effect follows directly from Eq. (3) and can be observed in Fig. 12a, which displays the saturation tpr(x|a) → 1 as x approaches 2R, regardless of the noise amplitude a.
Our conclusions in this section are different for AUC and AUPR/Precision metrics.
The AUPR and Precision metrics emphasize the prediction of the most likely missing link candidates and are highly sensitive to the accuracy of node coordinate inference. Synthetic noise added to the original node coordinates smears hyperbolic distances among missing link candidates, adversely affecting HyperLink accuracy. Our results suggest that one needs to maximize the accuracy of the network mapping in order to efficiently predict missing links. We also find that as temperature T increases, the performance of other link prediction methods, as measured by AUPR and Precision, decreases faster than that of HyperLink, suggesting that the latter has a competitive advantage on networks characterized by large T values.
AUC scores, on the other hand, emphasize less obvious link candidates that correspond to node pairs at larger hyperbolic distances. Since larger hyperbolic distances are affected by coordinate uncertainty to a lesser extent, the AUC scores of HyperLink are robust to synthetic noise, suggesting that HyperLink is capable of predicting less obvious missing links even under less accurate mapping conditions.

IV. LINK PREDICTION WITH INFERRED COORDINATES
In this section we build upon the results of the previous section to analyze HyperLink accuracy on networks with unknown node coordinates. We first conduct a systematic analysis of HyperLink accuracy on HRGs with unknown node coordinates and then apply HyperLink to several real networks. In both cases node coordinates are unknown, and in order to predict missing links we first infer them by mapping the networks of interest to the two-dimensional hyperbolic disk. To this end, we developed a mapping algorithm tailored to the link prediction problem. This algorithm is referred to as the HyperLink Embedder and is fully described in Appendix E.

A. Tests on HRG graphs with inferred coordinates
To evaluate the HyperLink accuracy on HRG networks with unknown node coordinates we perform the following experiments. After generating an HRG network we remove a fraction of its links: each existing link is removed with probability 1 − q. Occasionally, after links are removed, the remaining network splits into several components. If this is the case, we limit our consideration to the largest connected component of the pruned network. We refer to the resulting connected component of the pruned network as the training network. To predict missing links we erase our knowledge of the true node coordinates and then infer node coordinates by mapping the training network to the hyperbolic disk using the HyperLink Embedder, see Appendix E for details on the mapping procedure. After the mapping is complete, we use the inferred node coordinates to calculate distances between all unconnected node pairs in the training network and rank these pairs in increasing order of distance.
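The skeleton of this experiment can be sketched as follows (an illustration with hypothetical helper names, not the HyperLink code; the distance function implements the hyperbolic law of cosines of Eq. (3)):

```python
import math, random

def hyperbolic_distance(r1, th1, r2, th2):
    """Hyperbolic law of cosines, Eq. (3), with wrap-around angular distance."""
    dth = math.pi - abs(math.pi - abs(th1 - th2) % (2 * math.pi))
    arg = (math.cosh(r1) * math.cosh(r2)
           - math.sinh(r1) * math.sinh(r2) * math.cos(dth))
    return math.acosh(max(arg, 1.0))  # clip guards against round-off below 1

def make_training_network(edges, q, rng):
    """Keep each link with probability q (i.e., remove with 1 - q), then
    return the largest connected component of the pruned network."""
    kept = [e for e in edges if rng.random() < q]
    adj = {}
    for u, v in kept:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(adj[node] - comp)
        seen |= comp
        components.append(comp)
    lcc = max(components, key=len)
    return lcc, [(u, v) for u, v in kept if u in lcc and v in lcc]
```

Missing link candidates are then all unconnected pairs within the training network, ranked by `hyperbolic_distance` evaluated at the inferred coordinates.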
Figures 6, 7, and 8 show the results for the AUPR, Precision, and AUC scores, respectively. Each panel in these figures is a heatmap aggregating the link prediction accuracy scores for HRGs with γ ∈ [2.1, 2.9] and T ∈ [0.1, 0.9], both varied in increments of 0.1. We compare HyperLink to the RA method, which is its leading competitor in these experiments, cf. Fig. 3a,b.
The results for the AUPR and Precision scores are similar. Quantified by these scores, the HyperLink accuracy is nearly independent of the degree distribution exponent γ and at the same time decreases rapidly as the temperature T increases, see Figs. (6,7)a,d,g. This observation is consistent with our theoretical analysis in Section III, where we establish that AUPR scores decrease as T increases and do not strongly depend on γ.
Even though RA performs similarly to HyperLink, Figs. (6,7)b,e,h, we note that RA is more accurate at lower T values and less accurate than HyperLink at higher T values. To compare the two methods directly, we plot the difference between their AUPR (Precision) scores in Figs. 6c,f,i (7c,f,i). In agreement with our theoretical considerations in Fig. 5, we find that HyperLink is superior to RA in the region of the γ-T phase space corresponding to higher T values; these regions are demarcated with dashed lines in Figs. (6,7)c,f,i.
Compared to RA, HyperLink yields better link prediction accuracy for larger fractions of missing links. In the case 1 − q = 0.1, for instance, HyperLink is better than RA only in a small upper-right corner region of the γ-T phase space, Figs. (6,7)c. On the other hand, when 50% of links are missing, 1 − q = 0.5, HyperLink outperforms RA for the majority of γ-T values, with the exception of the smallest, T = 0.1, and largest, T = 0.9, temperature values, Figs. (6,7)i.
The better performance of HyperLink compared to RA in Figs. (6,7)i is the result of two effects. On the one hand, the HyperLink accuracy appears to increase as 1 − q increases. This effect is consistent with a recent observation in Ref. [23] that the upper bound of link predictability in edge-independent graphs increases with 1 − q. On the other hand, as 1 − q increases, the accuracy of RA decreases. RA, as well as the other similarity-based methods, e.g., CRA, AA, CN, and JC, predicts missing links based on the similarity of node neighborhoods, e.g., the number of common neighbors: the higher the similarity, the higher the probability of a missing link, Appendix B. Neighborhood similarities are local measures, reflecting network structure in the network-based vicinity of the node pair of interest and ignoring the structure of the remaining network. The larger the fraction of missing links, the smaller the fraction of links in the training network, and, as a result, the poorer the link prediction results. While this is true for all link prediction methods, the similarity-based methods are the ones that suffer most. Since links are established independently in HRGs, and each link is retained with probability q, the number of common neighbors between any node pair on average decreases proportionally to q². All extensive HRG properties, on the other hand, depend on q linearly. HyperLink, as a global method, uses the structure of the entire network to map it, so it is less sensitive to network incompleteness.

FIG. 6: Link prediction accuracy for HRGs with inferred coordinates: AUPR. Panels a-i correspond to random missing links, and j-l to nonlocal missing links. Each panel is a heatmap displaying AUPR values as functions of the T and γ = 2α + 1 parameters of the HRG model. We compare the link prediction accuracy of HyperLink to that of the RA and SPM methods, which are its leading competitors in the cases of randomly missing links and nonlocal missing links, respectively. In each random missing link experiment links are removed uniformly at random with prescribed probabilities: a-c, 1 − q = 0.1; d-f, 1 − q = 0.3; and g-i, 1 − q = 0.5. Panels a, d, g and b, e, h show the AUPR values for HyperLink and RA, respectively. Panels j-l show the AUPR values of HyperLink and SPM, as well as their difference, for nonlocal links, i.e., links connecting nodes with no common neighbors, which are a subset of randomly removed links with 1 − q = 0.5. The dashed curves in panels c, f, i, l demarcate the regions in the γ-T parameter space where the HyperLink accuracy is higher than that of the competing method.
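The q² argument can be verified with a short Monte Carlo check (illustrative only): a common neighbor s of a pair (i, j) survives the pruning only if both of the links i-s and j-s survive, which happens with probability q·q, while the total number of links scales as q.

```python
import random

rng = random.Random(0)
q = 0.7            # link retention probability; fraction of missing links is 1 - q
trials = 100_000

survivors = 0
for _ in range(trials):
    # the common neighbor contributes only if both of its two links survive
    if rng.random() < q and rng.random() < q:
        survivors += 1

frac = survivors / trials   # close to q**2 = 0.49, while edge counts scale as q
```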
An attractive feature of a global method is that it is capable of predicting nonlocal missing links, i.e., links between node pairs with no common neighbors. To quantify the HyperLink accuracy for nonlocal links we consider the subset of nonlocal links within the set of links removed with probability 1 − q = 0.5, Figs. (6,7)j, which comprises from 20% (for γ = 2.1, T = 0.9) to 86% (for γ = 2.9, T = 0.1) of all removed links. Similarity-based methods, RA, AA, CN, and JC, cannot predict nonlocal missing links since the corresponding node pairs have no common neighbors at all and, consequently, have zero similarity. Therefore, in nonlocal link prediction experiments we compare HyperLink to the SPM index, which is a global method and the leading competitor to HyperLink for nonlocal links. As seen in Figs. (6,7)k,l, the SPM index yields substantially lower link prediction accuracy than HyperLink for all the considered values of γ and T.
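A minimal sketch (with hypothetical helper names) of how nonlocal candidate pairs are identified, and why neighborhood-overlap scores such as RA vanish identically on them:

```python
def neighbors(edges):
    """Adjacency sets of an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def resource_allocation(adj, i, j):
    """RA score: sum of 1/degree over the common neighbors of i and j."""
    return sum(1.0 / len(adj[s]) for s in adj[i] & adj[j])

def nonlocal_pairs(adj, candidates):
    """Candidate pairs sharing no common neighbors; RA (and CN, AA, JC, CRA)
    assign all such pairs the same zero score."""
    return [(i, j) for i, j in candidates if not (adj[i] & adj[j])]
```

On a path graph 0-1-2-3, for example, the pair (0, 3) is nonlocal, so `resource_allocation` returns 0 for it, while a global method can still rank it.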
Overall, we observe that according to the AUPR and Precision scores, the more incomplete the network, the higher HyperLink's competitive advantage, and HyperLink is particularly strong at predicting nonlocal links.
According to AUC scores, HyperLink offers superior link prediction accuracy across the entire γ-T parameter space, surpassing its leading competitors, RA for all links and SPM for nonlocal links, Fig. 8. This result is again consistent with our calculations in Section III showing that HRG-based AUC scores are robust with respect to coordinate uncertainty.

B. Tests on real networks
Finally, we apply HyperLink to real networks: the Internet at the Autonomous System level [45], the network of human metabolism [46], and the Pretty-Good-Privacy (PGP) web of trust [47]. Basic properties of these networks, as well as the data curation steps, are discussed in Appendix A.
Our link prediction experiments on real networks are performed identically to those on HRG networks with inferred coordinates, and the results are shown in Figs. 9, 10, and 11.
According to the AUPR and Precision metrics, HyperLink offers competitive performance in random link removal experiments, Figs. 9-11a-f, but at the same time is not the most accurate, except for the case of the Internet at 1 − q = 0.5, Fig. 9d. We do note that the relative performance of HyperLink is better in the case of the higher missing link rate, 1 − q = 0.5, which is consistent with our results on HRG networks.
We also note that HyperLink offers superior performance in the prediction of nonlocal links, where it is the best, with the SPM method being its leading competitor, Figs. 9g-i, 10g-i, and 11g-i. This observation comes in sharp contrast with the nearly random performance of the similarity-based methods, RA, AA, CN, JC, and CRA, in nonlocal link prediction.
By examining the Precision-Recall curves in Figs. 9, 10, and 11 we observe that the Precision scores of HyperLink exceed those of its competitors at larger Recall values. This observation supports our earlier conclusion that HyperLink performs best at identifying hard-to-predict links.
In contrast to AUPR-based rankings where HyperLink is rarely the most accurate method, HyperLink is ranked best in all experiments according to the AUC metric, in agreement with all the AUC-related results above.

V. DISCUSSION
TABLE I: Trends of the HyperLink accuracy scores in HRGs, Section III, by metric (AUC; AUPR/Precision). Fraction of missing links 1 − q: increasing; increasing. Noise amplitude a: ≈ const; decreasing.
Tables I and II summarize the results of Sections III and IV, respectively. We see that when it comes to predicting obvious missing links that are easy to predict, employing hyperbolic geometry may not be the best way to proceed. One can safely use a variety of other methods that may be better at this task, according to the AUPR or Precision measures. This is because the most obvious missing links are the links between the closest nodes in the latent hyperbolic space, and to rank them exactly at the top of the disconnected node pair list, one has to infer the coordinates nearly exactly, Section III.
However, if the task is to identify missing links that are really hard to predict, then this is where hyperbolic network geometry should be used. The most striking example is the prediction of missing links between nodes that do not share any common neighbors. Here HyperLink is by far the best, according to all of the AUC, AUPR, and Precision measures, in all the considered real and synthetic networks. It is not surprising that local methods do a poor job in predicting such links: they are simply not designed to do so. In contrast, HyperLink is not a local but a global method, taking the whole network structure into account; but so is the highly advanced SPM method, which outperforms a vast collection of other methods according to [48], and we see that HyperLink outperforms even it by far at this task.
We also see that according to the AUC measure, HyperLink is the best in all the considered situations. This is because the AUC does not care that much about false positives, in which case HyperLink achieves the best balance between the true and false positive rates by doing better than any other method at finding missing links between highly dissimilar nodes located at large distances in the latent hyperbolic space.
We have also shown that the weaker the clustering (the higher the T), and the larger the fraction of missing links 1 − q in HRGs with inferred coordinates, the better off HyperLink is. This does not mean that HyperLink's link prediction accuracy scores improve in these more difficult conditions; its scores do degrade. But the other methods' scores degrade faster than HyperLink's.
Overall, it appears that the harder a specific link prediction task is, the better HyperLink is at it.
Our results also resolve the controversy among earlier reports on link prediction using hyperbolic geometry [23, 32-36]. These reports approached link prediction using different measures of link prediction accuracy. To reiterate, if applied to sparse networks, the AUPR emphasizes the prediction of a small fraction of the most likely missing links and, as a result, is extremely sensitive to inaccuracies in the node coordinate inference. On the other hand, the AUC is more robust to coordinate uncertainties, as it emphasizes the prediction of less likely missing links between dissimilar nodes at large latent distances.
To maximize HyperLink's link prediction accuracy, we have developed a new hyperbolic network mapping method, the HyperLink Embedder, that maximizes the accuracy of coordinate inference. Its accuracy comes at the computational complexity cost of O(n²). While faster methods for hyperbolic mapping have been developed recently [49-53], an optimal balance between the accuracy and speed of hyperbolic mapping is still to be found. Ideally, it would be highly desirable to have a method that is at least as accurate as the HyperLink Embedder and that runs in O(n) time.
Finally, we emphasize that link prediction using latent hyperbolic geometry is expected to yield best results only if this geometry is there in a given network. That is, the network structure must be consistent with the existence of this geometry. It is well known that HRGs are characterized by sparsity, self-similarity, scale-free degree distributions, and strong clustering, meaning that these properties are necessary conditions for the presence of hyperbolic geometry. It is also well known that many real networks do possess these properties as well. The results in [31] suggest that clustering is also a sufficient condition for network geometricity, but these results apply only to homogeneous large-world networks, and ignore coordinate entropy. In other words, the detailed sufficient conditions for the presence of latent hyperbolic geometry are currently unknown, remaining a subject of ongoing research.

FIG. 9: Link prediction accuracy for the Internet with a-c 10% (q = 0.9) randomly missing links, d-f 50% (q = 0.5) randomly missing links, and g-i nonlocal missing links, i.e., links connecting node pairs that have no common neighbors. Nonlocal links constitute 32% of the q = 0.5 missing link set. Panels a, d, g, j depict Precision, AUPR, and AUC link prediction scores. Panels b, e, h, k and c, f, i, l show, respectively, the ROC and PR curves.

TABLE III: Basic properties of the considered real networks. N is the number of nodes, E is the number of edges, k is the average degree, γ is the degree distribution exponent, which we estimated using methods from Ref. [55], c is the average degree-dependent clustering coefficient, and T is the corresponding HRG temperature.
Appendix B: Link Prediction: Alternative Methods and Scoring Techniques

Alternative Methods
We compare the accuracy of the HyperLink link prediction method against the following set of link prediction methods: Common Neighbors (CN) [56], Adamic and Adar (AA) [7], Resource Allocation (RA) [57], Local Community Resource Allocation (CRA) [58], Jaccard's index (JC) [59], and the Structural Perturbation Method (SPM) [48]. All these methods, as well as HyperLink, assign scores to (a subset of) all not directly connected pairs of nodes (non-links), and all such pairs are then ranked according to these scores from the most to the least likely interaction prediction. To briefly describe these methods, it is thus sufficient to tell how these scores are calculated, for which we use the following notations: k_i is the degree of node i; Γ(i) is the set of i's neighbors (directly connected nodes); γ_ij(s) is the subset of Γ(s) that are neighbors of both i and j; e_i^j, i's j-external degree, is the number of i's neighbors that are not j's neighbors; A is the network adjacency matrix.

Common Neighbors (CN). The score for a pair of nodes i and j is defined as the cardinality of the intersection of their sets of neighbors, s_ij = |Γ(i) ∩ Γ(j)|.

Jaccard's index (JC). The score is a normalized measure of the overlap of i's and j's sets of neighbors, s_ij = |Γ(i) ∩ Γ(j)| / |Γ(i) ∪ Γ(j)|.

Adamic-Adar index (AA). The score assigns more weight to the less-connected common neighbors, s_ij = Σ_{s ∈ Γ(i) ∩ Γ(j)} 1 / ln k_s.

Resource Allocation index (RA). The score is similar to the AA score, but punishes high-degree nodes more strongly, s_ij = Σ_{s ∈ Γ(i) ∩ Γ(j)} 1 / k_s.

Local Community Resource Allocation index (CRA). The score is similar to the RA score, but takes into account the subset of nodes shared between nodes i, j and their common neighbors s: s_ij = Σ_{s ∈ Γ(i) ∩ Γ(j)} |γ_ij(s)| / k_s.

Structural Perturbation Method (SPM). This method is based on repetitive perturbations of the adjacency matrix A by removals of small fractions of links that we denote by ∆E. The original adjacency matrix can then be written as A = A′ + ∆A, where A′ is the adjacency matrix of the network after removal of the links ∆E, and ∆A is the adjacency matrix constructed on the set
of the removed links ∆E. Denoting the eigenvectors and eigenvalues of A′ by x_k and λ_k, the perturbations of the original eigenvalues λ_k due to the perturbation matrix ∆A are ∆λ_k ≈ (x_k^T ∆A x_k) / (x_k^T x_k), so that the perturbed adjacency matrix is Ã = Σ_k (λ_k + ∆λ_k) x_k x_k^T. All non-links i, j are then ranked by Ã_ij. In our experiments we repeat this perturbation procedure 10 times and then average the perturbed matrices over these trials, thus obtaining an averaged perturbed matrix ⟨Ã⟩, so that the SPM score is s_ij = ⟨Ã⟩_ij.

Appendix C: Basic properties of the HRG model

The hyperbolic geometry inference algorithm relies on several properties of the HRG model, which we review in this section.
Degree distribution. HRG models are characterized by scale-free degree distributions, P(k) ∼ k^{−γ}, where γ = 2α + 1. Indeed, the expected degree of a node located at (r, θ) is independent of its angular coordinate θ, k(r, θ) = k(r, 0) = k(r), and is given by Eq. (C1), see [19]. The average degree of the model, Eq. (C2), follows by averaging k(r) over the radial node density; as seen from Eq. (C2), k in the most general case depends on the network size N.
To achieve sparse models with k independent of N one sets the radius of the hyperbolic disk to R = 2 ln(N/ν), Eq. (C3), where ν > 0 is a tuning parameter directly related to k. Indeed, with R(N) given by (C3), the resulting average degree, Eq. (C4), prescribes the value of ν for the target values of k, α, and T. It has been shown in [60] that in the sparse limit the probability of a node located at (r, θ) to have k connections can be approximated by the Poisson distribution with mean k(r), Eq. (C5). The degree distribution of the HRG then follows by averaging Eq. (C5) over the radial node density. It follows from Eqs. (C1) and (C4) that the model parameters α and R can be used to control the degree distribution exponent γ and the average degree of the model, respectively.

Clustering coefficient. As seen from Eq. (6), the connection probability p(x) decreases exponentially for distances x > R at the rate 1/(2T). Thus, the temperature parameter T tunes the role of large distances in the formation of links: the higher the T, the more likely are long-distance connections. As a result, T controls the clustering coefficient of the HRG. In the T → 0 limit connections are only possible at hyperbolic distances x < R and the clustering coefficient is maximized. Conversely, the clustering coefficient decreases as T increases and vanishes asymptotically in the T ≥ 1 case [19].
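For illustration, the HRG construction reviewed in this appendix can be sketched as follows, assuming the quasi-uniform radial density ρ(r) ≈ α e^{α(r−R)}, uniform angles, R = 2 ln(N/ν), and the Fermi-Dirac connection probability p(x) = 1/(1 + e^{(x−R)/(2T)}). This is a naive O(N²) sampler, not the code used in our experiments:

```python
import math, random

def generate_hrg(n, gamma=2.5, temperature=0.5, nu=1.0, seed=0):
    """Sample a hyperbolic random graph: nodes quasi-uniform in a disk of
    radius R = 2*ln(n/nu), pairs linked with Fermi-Dirac probability p(x)."""
    rng = random.Random(seed)
    alpha = (gamma - 1) / 2                      # gamma = 2*alpha + 1
    R = 2 * math.log(n / nu)
    # inverse-CDF sampling of rho(r) ~ alpha * exp(alpha * (r - R)), clipped to [0, R]
    rs = [max(R + math.log(1 - rng.random()) / alpha, 0.0) for _ in range(n)]
    ths = [rng.uniform(0, 2 * math.pi) for _ in range(n)]
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            dth = math.pi - abs(math.pi - abs(ths[i] - ths[j]))
            arg = (math.cosh(rs[i]) * math.cosh(rs[j])
                   - math.sinh(rs[i]) * math.sinh(rs[j]) * math.cos(dth))
            x = math.acosh(max(arg, 1.0))        # hyperbolic law of cosines, Eq. (3)
            if rng.random() < 1 / (1 + math.exp((x - R) / (2 * temperature))):
                edges.append((i, j))
    return rs, ths, edges
```

The T → 0 limit of the acceptance test reduces to a hard threshold x < R, which is the regime of maximal clustering described above.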
where n(x|a) is the node pair distribution in the hyperbolic disk with coordinate noise.
Due to the uniform initial angular distribution ρ(θ), the node pair distribution is independent of the noise, n(x|a) = n(x), Fig. 12b. Further, for a sufficiently large noise amplitude a, the true positive rate tp(x|a) is suppressed by the noise, and, as a result, in the case x < R the precision scales as pr(x|a) ∼ a^{1−2α}, see Fig. 12c. Since the distance-dependent recall function is proportional to the true positive rate, the resulting AUPR score follows the scaling shown in Fig. 12d. This result suggests that the impact of coordinate uncertainty on link prediction is higher in HRGs with larger γ = 2α + 1 values. Intuitively, this is the case since networks with larger γ values have larger fractions of small-degree nodes. Small-degree nodes in the HRG are characterized by large radial coordinates, and hyperbolic distances between points with large radial coordinates are most affected by angular coordinate uncertainties.
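The noise experiment itself is simple to state in code (an illustrative fragment, not the analysis code): each angular coordinate is shifted by uniform noise on [−a, a] and wrapped back onto the circle, after which hyperbolic distances are recomputed from the noisy angles.

```python
import math, random

def add_angular_noise(thetas, a, rng):
    """Perturb each angular coordinate by uniform noise on [-a, a],
    wrapping the result back onto the circle [0, 2*pi)."""
    return [(th + rng.uniform(-a, a)) % (2 * math.pi) for th in thetas]
```

Radial coordinates are left untouched, which is why the noisy pair distribution n(x|a) coincides with n(x) while individual distances, and hence the ranking, change.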

The average degree of the noisy subgraph
Here we derive the leading-term behavior of the average degree of the noisy subgraph G_y(a) for y < R. As shown in the subsection above, the number of true positives tp(y|a) is related to the average degree of the noisy subgraph G_y(a). To define G_y(a), we add uniform noise of amplitude a to the original angular coordinates of the HRG and calculate noisy hyperbolic distances x̃_ij between all node pairs using the noisy coordinates. G_y(a) is the HRG subgraph formed by node pairs with noisy hyperbolic distances x̃_ij < y. The average degree of G_y(a) is given by Eq. (D14), where ρ(r) is given by Eq. (5), and ρ(θ|θ̃) is the conditional probability of the true angle θ given the inferred angle θ̃. In the case of uniform noise, ρ(θ|θ̃) is also a uniform distribution centered at θ̃. Throughout the calculation of k_y(a) we will rely on a number of assumptions. We are primarily interested in HRGs with 2 < γ < 3, which correspond to 1/2 < α < 1. Further, we focus on the values y > R/2, since these values allow us to simplify the calculations; this constraint is justified because the number of node pairs in the HRG at distances y ≤ R/2 is very small. To identify the leading terms we will also recall the scaling of R with the system size, N ∼ e^{R/2}. Since the hyperbolic distance x in Eq. (3) depends on θ_1 and θ_2 only through their difference, and the angles are distributed uniformly on [−π, π], ρ(θ̃_{1,2}) = 1/(2π), we can simplify Eq. (D14) to Eq. (D19), where Θ[x] is the Heaviside theta function. Similar to the calculation of k in HRGs, Ref. [19], we can rewrite Eq. (D19) as Eq. (D21).

FIG. 13: Integration domain for k_y(a) in the case y < R.
The integration is performed over the intersection of two hyperbolic disks. The first disk (yellow) corresponds to the latent space of the HRG, has radius R, and is centered at the origin. The second disk (blue) has radius y and is centered at (r_1, 0). The third disk depicts the integration radius r_2 that sweeps the integration domain. The angle φ_y ≈ 2e^{(y−r_1−r_2)/2} corresponds to the intersection of disks y and r_2. Based on the R, y, and r_1 values we distinguish three configurations. a, Disk y contains the origin and is fully contained within R, regions I and II. b, Disk y contains the origin and is partially contained within R, region III. c, Disk y does not contain the origin and is partially contained within R, regions IV and V. d, The shaded region corresponds to the integration domain for k_y(a). Vertical dashed lines separate the five integration regions. The phase space below the blue dashed line corresponds to the case of disk r_2 fully contained within disk y; the phase space above the blue line corresponds to the case of disk r_2 intersecting disk y. The red dashed line is given by r_2 + r_1 = R − 2 ln(a/2) and corresponds to the loci of the integrand maxima in regions II, III, and IV.
where k_y(r|a) is the average degree of a node with radial coordinate r in the noisy subgraph G_y(a), Eq. (D22), and the angles φ ≡ ∆θ_12 and φ̃ ≡ ∆θ̃_12 are introduced to ease the notation.
To evaluate k_y(r_1|a) we note that the integration region in Eq. (D22) is given by the intersection of two hyperbolic disks. The first one is of radius R and is centered at the coordinate system origin, (0, 0). The second disk is of radius y and is centered at (r_1, 0). We perform the integration for the two regimes y ∈ [0, R] and y ∈ [R, 2R] separately.
To evaluate k_y(r_1|a) for the y ≤ R case we need to distinguish five intervals of r_1 between 0 and R, which we depict for convenience in Fig. 13.
Region I: 0 ≤ r_1 ≤ (R−y)/2 − ln(a/2). In this region the disk y is fully contained within the disk R. Further, since y > R/2, disk y is guaranteed to include the coordinate system origin for all r_1 ∈ [0, R − y], Fig. 13a. In this case the integral in k_y(r_1|a) splits into two parts, I_1 and I_2, where φ_y is the angle given by the intersection of the disk of radius r_2 centered at r = 0 and that of radius y centered at r = r_1. To estimate φ_y we consider the triangle formed by the origin (0, 0), the disk y center at (r_1, 0), and the intersection of r_2 with y. The triangle has sides equal to r_1, r_2, and y, with φ_y being the angle between r_1 and r_2. Thus, φ_y is given by the hyperbolic law of cosines. In the case of sufficiently large r_1, r_2, and y values we can approximate cos φ_y as cos φ_y ≈ 1 − 2e^{y−r_1−r_2}. (D27) Since φ in the first integral sweeps the entire 2π angle, I_1 is given by Eq. (D28). Then, since x(r_1, r_2, φ) ≤ r_1 + r_2 ≤ R, p[x(r_1, r_2, φ)] ≈ 1, leading to Eq. (D29). The evaluation of I_2 is more involved and requires further approximations. We notice that φ_y ≪ 1 since r_2 ∈ [y − r_1, y + r_1], and it can be further approximated as in Eq. (D30). Then, for sufficiently large noise amplitudes a ≫ φ_y, we can approximate the integral ∫_{φ̃−a}^{φ̃+a} dφ by 2∫_0^a dφ, resulting in Eq. (D31). Since r_1 < (R−y)/2 − ln(a/2), r_2 < y + r_1, and y < R, it follows that x(r_1, r_2, φ) < r_1 + r_2 + 2 ln(a/2) < R and, as a result, exp[(x(r_1, r_2, φ) − R)/(2T)] ≪ 1, resulting in I_2 ∝ e^{−αR} e^{αy} e^{(α−1)r_1}. (D32) We are primarily interested in the γ > 2 case, which corresponds to α > 1/2. In this case I_2 ≫ I_1, and k_y(r_1|a) ∝ e^{−αR} e^{αy} e^{(α−1)r_1}.

(D33)
Region II: (R−y)/2 − ln(a/2) ≤ r_1 ≤ R − y. Similar to Region I, the hyperbolic disk y lies fully within disk R, Fig. 13a. Thus, k_y(r_1|a) is given by the same expression, now written as the sum of I_3 and I_4. The calculation of I_3 is identical to that of I_1. Different from Region I is the calculation of I_4. Indeed, in the case r_1 ≥ (R−y)/2 − ln(a/2) and r_2 ∈ [y − r_1, y + r_1], the hyperbolic distance x(r_1, r_2, φ) is no longer guaranteed to be smaller than R, and p[x(r_1, r_2, φ)] can no longer be approximated by unity. We first split I_4 into two parts, I_41 and I_42, and calculate them separately. By approximating the hyperbolic law of cosines in Eq. (3) as x(r_1, r_2, φ) ≈ r_1 + r_2 + 2 ln(φ/2) and making use of Eq. (D30), we obtain for I_41 an expression involving the same function I(z; T) as in Eq. (27).
Recall that for small z ≪ 1 the function I(z; T) ≈ z, while for z ≫ 1 it saturates to I(∞; T) ≡ I(T). With these approximations in mind we split the integration in I_41 into two subregions, in which I(z; T) ≈ z and I(z; T) ≈ I(T), respectively. Using these approximations we obtain I_41 to the leading order. Following the same approximation steps we evaluate I_42 in the case T < 1/2. Taken together, I_41 and I_42 yield I_4. Finally, since y < R, we conclude that I_3 ≪ I_4, which gives k_y(r_1|a) in Region II.

Region III: R − y ≤ r_1 ≤ y. In this region disk y is partially contained within the disk R. Since r_1 ≤ y, disk y still contains the coordinate system origin, Fig. 13b. Similar to regions I and II, we split the calculation of k_y(r_1|a) into two parts, I_5 and I_6, where φ_y ≪ 1 is the intersection angle of disk r_2 with that of y, see Fig. 13b, and is given by Eq. (D30). We first note that the integral in I_5 is identical to those in I_3 and I_1. The integration in I_6 is very similar to that in I_4, with the only difference being the upper integration bound r_2 ≤ R.
The evaluation of I_6 is therefore straightforward and requires the same approximation steps as in I_4. A quicker estimate can be obtained by noting that the upper bound for r_2 in I_4 does not contribute to the leading term. The reason is that I_42 is dominated by r_2 in the vicinity of the r_1 + r_2 = R − 2 ln(a/2) line, so I_6 splits into two parts with integrands identical to those of I_41 and I_42. Since the integrand in I_42 is dominated by smaller r_2 values, we conclude that I_6 has the same leading term as I_4. Finally, I_6 dominates I_5 for α > 1/2, which gives k_y(r_1|a) in Region III.

Region IV: y ≤ r_1 ≤ (R+y)/2 − ln(a/2). In this region, the hyperbolic disk y is partially contained within R and does not include the origin, Fig. 13c. Using arguments similar to those of Region III, we obtain the leading term of k_y(r_1|a) in this region.

Region V: (R+y)/2 − ln(a/2) ≤ r_1 ≤ R. Similar to the situation in Region IV, the hyperbolic disk y intersects disk R and does not include the coordinate system origin. Different from Region IV is that the point r_2 = R − r_1 − 2 ln(a/2) lies outside the r_2 integration region, and we can no longer relate k_y(r_1|a) to those in the other regions.
To evaluate k_y(r_1|a) in Region V we recall that φ_y ≪ 1, and for sufficiently large a ≫ φ_y we obtain the corresponding leading term. In the regime y ≥ R the hyperbolic disk y always contains the origin, Fig. 14. To evaluate k_y(r_1|a) in this regime we need to distinguish two cases: (VI) 0 ≤ r_1 ≤ y − R, and (VII) y − R ≤ r_1 ≤ R.
Region VI: 0 ≤ r_1 ≤ y − R. In this regime the hyperbolic disk R is fully contained within the hyperbolic disk y, Fig. 14a, and k_y(r_1|a) = k(r_1), where k(r_1) is the average degree of a node at r_1 in the HRG graph. Indeed, the radial coordinates of all points are within disk R, and all distances from the point (r_1, 0) to any point within disk R are guaranteed to be smaller than y, x(r_1, 0, r_2, θ) < y for any θ ∈ [0, 2π]. Since the integral over φ sweeps the entire circle, θ ∈ [0, 2π], synthetic noise does not affect the integration in the case 0 ≤ r_1 ≤ y − R.
Region VII: y − R ≤ r_1 ≤ R. In this regime the hyperbolic disk R is partially contained within y and the calculation of k_y(r_1|a) splits into two integrals, I_9 and I_10, where φ_y is the angle of intersection of disks R and y, Fig. 14b. We note that the integration region for I_9 is identical to that of I_1. Different from the case of I_1 is the condition y > R. In this case x(r_1, r_2, φ) is no longer guaranteed to be less than R, and p[x(r_1, r_2, φ)] cannot be approximated by 1. We start evaluating I_9 by performing the integration over φ, which leads to I((r_1 + r_2 − R)/2; T) ≈ I(∞; T). The remaining integration steps in I_9 are straightforward. Since y > R and α > 1/2, we obtain the leading term of I_9. In order to calculate I_10 we first need to estimate the cutoff angle φ_y, which is given by the intersection of disks R and y, Eq. (D27). φ_y takes values from φ_y ≈ 2e^{(y−2R)/2} at r_1 = r_2 = R to φ_y = π at r_2 = y − r_1. Thus, we can no longer use the φ_y ≪ a approximation, as in I_2. To proceed further we note that the integration domain in I_10 is given by the area above the r_2 = y − r_1 line, Fig. 14c. We recall that the integration in the case y < R is dominated by points in the vicinity of the r_1 + r_2 = R − 2 ln(a/2) line, see the red dashed line in Fig. 13c. Let us assume that this is also the case in the y ≥ R regime, red dashed line in Fig. 14c. We next note that in the vicinity of the r_1 + r_2 = R − 2 ln(a/2) line, cos φ_y ≈ 1 − 2e^{y−R+2 ln(a/2)}. For a sufficiently small noise amplitude, such that y < R − 2 ln(a/2), the cutoff angle φ_y ≪ 1 and can be approximated by Eq. (D30), and we can employ the same approximation techniques as in I_2.
Our strategy now is to split the integration domain of I_10 into two parts by a curve r_2 = R(r_1) such that (i) this curve is below the r_1 + r_2 = R − 2 ln(a/2) line, and (ii) above this curve, r_2 > R(r_1), the cutoff angle φ_y ≪ 1. One possibility for such a curve is the line R(r_1) = A − r_1, where A = (y+R)/2 − ln(a/2), green dashed curve in Fig. 14c. Then Region VII splits into two subregions, VIIA and VIIB, corresponding to r_1 ∈ [y − R, 2 ln(2/a)] and r_1 ∈ [2 ln(2/a), R], respectively, see Fig. 14c. We expect the contribution to k_y(a) from VIIA to be much smaller than that from VIIB, since the latter contains the r_1 + r_2 = R − 2 ln(a/2) line and the former does not. Therefore, we estimate the upper bound for k_y(r_1|a) in VIIA by replacing φ_y with π. In subregion VIIB we split the integration over r_2 into two intervals, r_2 ∈ [0, R(r_1)] and r_2 ∈ [R(r_1), R].
Subregion VIIA: y − R ≤ r_1 ≤ 2 ln(2/a). Here the integral splits into two parts. Following our strategy, we evaluate the upper bound for I_12 by replacing the integration limit φ_y with π. Subregion VIIB: 2 ln(2/a) ≤ r_1 ≤ R. Here we distinguish three intervals, k_y(r_1|a) = I_13 + I_14 + I_15, (D79) where R(r_1) = (y+R)/2 − ln(a/2) − r_1. We evaluate the upper bound for I_14 by replacing the φ_y cutoff with π, which leads to the same calculation steps as in I_9. To evaluate I_15 we use the φ_y ≪ 1 assumption, which enables us to use Eq. (D30); this approximation holds since r_2 > R(r_1). Then, by following the same simplification steps as in I_4, we split I_15 into I_151 and I_152 and, following the same evaluation steps as in I_4, we confirm that both I_151 and I_152 are dominated by points in the vicinity of the r_1 + r_2 = R − 2 ln(a/2) line. By comparing Eqs. (D91) and (D86) we establish that I_15 ≫ I_13 + I_14, since R(r_1) < R and y > R, confirming our hypothesis, in the case 2 ln(2/a) ≤ r_1 ≤ R. Taken together, our results for regions VI and VII are summarized in Eq. (D93). Using Eq. (D93) together with Eq. (D21) we finally obtain k_y(a); comparing its two resulting contributions for y > R establishes its leading term for y ≥ R. Equation (D64) together with Eq. (D97) establishes the baseline for the calculation of AUPR(a) in Subsection D1.

Appendix E: HyperLink Embedder
The original hyperbolic geometry inference algorithm was developed in Ref. [38] and is based on maximum likelihood estimation (MLE). While the algorithm is rather slow, with an overall computational complexity of $O(N^3)$, it has been shown to accurately infer node coordinates in $\mathbb{H}^2$, leading to a number of promising applications ranging from interdomain Internet routing [38] to understanding the growth of large-scale networks [25].

Finite size effects and model parameter inference
The HRG model has four parameters: the number of nodes $N$, the hyperbolic disk radius $R$, the node density parameter $\alpha$, and the temperature $T$.
To infer $\alpha$ we first estimate the degree distribution exponent $\gamma$ by inspecting the network degree distribution $P(k)$. The node density parameter $\alpha$ is then related to $\gamma$ through Eq. (C7).
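The $\gamma \to \alpha$ step above can be illustrated with a short sketch. It assumes the standard HRG relation $\gamma = 2\alpha + 1$ (consistent with the $\gamma = 2.5$, $\alpha = 0.75$ pairing quoted in Fig. 12) and uses a simple continuous-MLE (Hill-type) tail estimate of $\gamma$; the function name and the `k_min` cutoff are illustrative choices, not part of the paper's procedure.

```python
import math

def estimate_gamma_alpha(degrees, k_min=5):
    """Estimate the degree-distribution exponent gamma from the tail of a
    degree sequence via the continuous MLE (Hill-type) estimator, then map
    it to the HRG density parameter alpha through gamma = 2*alpha + 1."""
    tail = [k for k in degrees if k >= k_min]
    gamma = 1.0 + len(tail) / sum(math.log(k / (k_min - 0.5)) for k in tail)
    alpha = (gamma - 1.0) / 2.0
    return gamma, alpha
```

In practice one would also verify that the tail is long enough for the estimate to be stable, e.g., by varying `k_min`.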
The estimation of $N$ and $R$ is less straightforward due to finite-size effects. First, in a real network one can normally observe only nodes with non-zero degrees. In contrast, the HRG model may generate nodes of zero degree, which are accounted for in the calculation of the network's average degree $\langle k \rangle$, Eq. (C2).
Second, due to finite-size effects, there is a cutoff value for the smallest node radius, $R_0$, affecting $\langle e^{-r/2} \rangle$ and, as a result, the observables $\bar{k}(r)$ and $\langle k \rangle$, Eqs. (C2) and (C1). Specifically, the radius cutoff $R_0$ enters through the finite-size correction coefficient $\lambda(\alpha, x)$. In networks with $\alpha$ close to $1/2$ ($\gamma$ close to $2$) the convergence of $\lambda$ is slow, and one needs to account for a non-zero $R_0$. Third, one needs to account for missing links, which affect all observable properties of the HRG model. In the particular case where links are missing uniformly at random with probability $1-q$, the connection probability function $p(x)$ is attenuated by a factor of $q$, Eq. (E4), affecting all observable network properties.
Taken together, zero-degree nodes, the minimum radius cutoff, and missing links modify the observable network properties; in the corresponding corrections, $k_{\max}$ is the maximum degree observed in the network and $P(0)$ is the fraction of zero-degree nodes. The latter can be estimated by averaging the conditional degree distribution $P(k=0|r)$ in Eq. (C5) over possible $r$ values, which yields a closed-form expression in terms of the upper incomplete gamma function $\Gamma[s, x]$.
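As a numerical stand-in for the incomplete-gamma closed form, $P(0)$ can also be evaluated by direct quadrature. The sketch below assumes the standard HRG properties that degrees at radius $r$ are Poisson-distributed with mean $\bar{k}(r) \propto e^{-r/2}$ and that the radial density is $\rho(r) = \alpha \sinh(\alpha r)/(\cosh(\alpha R) - 1)$; the quadrature resolution `n` is an arbitrary choice.

```python
import math

def p_zero(avg_k, alpha, R, n=20000):
    """Numerically average the Poisson zero-degree probability
    P(k=0|r) = exp(-kbar(r)) over the radial density
    rho(r) = alpha*sinh(alpha*r)/(cosh(alpha*R) - 1),
    with kbar(r) = c*exp(-r/2) and c fixed so that the mean of kbar(r)
    over rho equals the target average degree avg_k."""
    dr = R / n
    rs = [(j + 0.5) * dr for j in range(n)]
    norm = math.cosh(alpha * R) - 1.0
    w = [alpha * math.sinh(alpha * r) / norm * dr for r in rs]   # rho(r)*dr
    mean_e = sum(wi * math.exp(-r / 2.0) for wi, r in zip(w, rs))
    c = avg_k / mean_e                                           # <kbar(r)> = avg_k
    return sum(wi * math.exp(-c * math.exp(-r / 2.0)) for wi, r in zip(w, rs))
```

As expected, denser networks (larger $\langle k \rangle$) yield a smaller fraction of zero-degree nodes.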
The caveat here is that the parameter estimation presumes knowledge of the missing link probability $1-q$. While this information is available in our synthetic experiments, it may not be available in real networks. In case the fraction of missing links is small, one can assume $q = 1$. The most general case of substantially incomplete networks, where $q \ll 1$, is beyond the scope of this work and will be studied elsewhere. Finally, the temperature parameter $T$ needs to be estimated numerically by solving $\bar{c}(T) = \bar{c}_0$, where $\bar{c}_0$ is the average clustering coefficient of the network of interest and $\bar{c}(T)$ is the average clustering coefficient of the HRG model generated with temperature $T$. We utilize this approach to infer $T$ for real networks in Section IV B, while in the experiments with HRG models we use the actual $T$ values.
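Since the average clustering of the HRG decreases monotonically with $T$ on $(0, 1)$, the temperature equation can be solved by simple bisection. In this sketch, `clustering_of_T` is a hypothetical user-supplied callable that generates an HRG at temperature $T$ and returns its average clustering coefficient (in practice averaged over several realizations to tame sampling noise).

```python
def infer_temperature(c0, clustering_of_T, t_lo=0.01, t_hi=0.99, tol=1e-3):
    """Bisection solve of cbar(T) = c0, using the fact that the HRG
    average clustering cbar(T) decreases monotonically with T."""
    lo, hi = t_lo, t_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if clustering_of_T(mid) > c0:
            lo = mid      # clustering still too high -> raise T
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, with a mock monotone response `lambda T: 0.8 * (1 - T)` and a target clustering of 0.4, the bisection converges to $T \approx 0.5$.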

MLE-based inference of radial node coordinates
To infer radial node coordinates we extremize the logarithm of the likelihood function,
$$\frac{\partial}{\partial r_\ell} \ln \mathcal{L}\left(\{x_i\}|a_{ij}, P, q\right) = 0, \quad (E16)$$
obtaining
$$2\alpha T \coth(\alpha r_\ell) + \sum_j \frac{1 - p(x_{\ell j})}{1 - q\,p(x_{\ell j})}\left(a_{\ell j} - q\,p(x_{\ell j})\right)\frac{\partial x_{\ell j}}{\partial r_\ell} = 0. \quad (E17)$$
In the case of sufficiently large $r_\ell$ values, $\coth(\alpha r_\ell) \approx 1$. Further, one can approximate $x_{\ell j}$ as
$$x_{\ell j} \approx r_\ell + r_j + 2\ln\sin\frac{\theta_{\ell j}}{2}, \quad (E18)$$
resulting in $\partial x_{\ell j}/\partial r_\ell \approx 1$. Taken together, these approximations allow us to simplify Eq. (E17) to
$$2\alpha T + \sum_j a_{\ell j} - q\sum_j p(x_{\ell j}) = 0 \quad (E19)$$
for $1 - q \ll 1$. Note that the first summation in Eq. (E19) is the degree of node $\ell$, $\sum_j a_{\ell j} = k_\ell$, while the second summation is the expected degree of node $\ell$ with radial coordinate $r_\ell$, $\bar{k}_\ell(r_\ell) = q \sum_j p(x_{\ell j})$. As a result, the value of $r_\ell$ extremizing the likelihood is given by
$$\bar{k}_\ell(r_\ell) = k_\ell + 2\alpha T, \quad (E20)$$
where $\bar{k}_\ell(r)$ is the observable expected degree of a node with radial coordinate $r$. Since the latter is given by
$$\bar{k}_\ell(r) = q\,\lambda(\alpha, R - R_0)\,\langle k\rangle\,\frac{e^{-r/2}}{\langle e^{-r/2}\rangle}, \quad (E21)$$
one can estimate $r_\ell$ as
$$r_\ell = 2\ln\left[\frac{q\,\lambda(\alpha, R - R_0)\,\langle k\rangle}{\left(k_\ell + 2\alpha T\right)\langle e^{-r/2}\rangle}\right]. \quad (E22)$$
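Equation (E22) assigns radii degree-by-degree once the model-level quantities are fixed. A minimal sketch; the finite-size coefficient $\lambda(\alpha, R - R_0)$ and $\langle e^{-r/2} \rangle$ are assumed here to be precomputed from the inferred model parameters and passed in as plain numbers.

```python
import math

def infer_radii(degrees, alpha, T, mean_e_half, q=1.0, lam=1.0):
    """Radial coordinates from Eq. (E22):
        r_i = 2*ln( q*lam*<k> / ((k_i + 2*alpha*T) * <e^{-r/2}>) ).
    Higher-degree nodes are placed closer to the disk center."""
    mean_k = sum(degrees) / len(degrees)
    return [2.0 * math.log(q * lam * mean_k / ((k + 2.0 * alpha * T) * mean_e_half))
            for k in degrees]
```

Note how hubs are mapped to small radii: for a degree sequence `[100, 10, 1]` the returned radii are strictly increasing.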

MLE inference of angular node coordinates
To infer angular node coordinates one needs to maximize the likelihood $\ln \mathcal{L}\left(\{x_i\}|a_{ij}, P, q\right)$ in Eq. (E5) with respect to the angular coordinates $\{\theta_i\}$, given the MLE values of the radial coordinates $\{r_i\}$. Since the simultaneous maximization over all angular coordinates is computationally demanding, we proceed in a layered fashion. We first define the set $C$ of all nodes with degrees $k > 1$. We then rank all nodes in $C$ in decreasing order of their degree and split the resulting node list into $m$ layers with logarithmically growing sizes $s_i$, $i = 0, \dots, m-1$, where $N(k>1)$ denotes the number of nodes with degree $k > 1$ and $s_0 \ll N$. Unless otherwise noted, we set $s_0 = 20$. Finally, all $k = 1$ nodes are assigned to the outer layer $s_m$.
Complementary to the layers $\{s_i\}$, we also define self-enclosed cores $\{cr_i\}$, $i = 0, \dots, m$, such that core $cr_i$ contains all layers with indices $j \le i$, $cr_i = \bigcup_{j=0}^{i} s_j$, as well as the sequence of nested subgraphs $\{G_i\}$, $i = 0, \dots, m$, spanned by the nodes in the corresponding cores, see Fig. 15.
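The layer and core construction can be sketched as follows. The geometric size growth $s_i = s_0 \cdot 2^i$ is an assumption made for illustration; the text specifies only "logarithmically growing sizes" with $s_0 \ll N$.

```python
def build_layers_and_cores(degrees, s0=20):
    """Rank k > 1 nodes by decreasing degree, cut them into layers of
    geometrically growing sizes, put all k = 1 nodes into the outer
    layer, and form self-enclosed cores cr_i = union of layers 0..i."""
    ranked = sorted((i for i, k in enumerate(degrees) if k > 1),
                    key=lambda i: -degrees[i])
    layers, pos, size = [], 0, s0
    while pos < len(ranked):
        layers.append(ranked[pos:pos + size])
        pos += size
        size *= 2                      # assumed growth rule
    layers.append([i for i, k in enumerate(degrees) if k == 1])  # outer layer
    cores, acc = [], []
    for layer in layers:
        acc = acc + layer
        cores.append(list(acc))
    return layers, cores
```

The innermost layer thus holds the $s_0$ highest-degree nodes, and the outermost core covers the entire network.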
We start by inferring the angular coordinates of nodes $i \in cr_0$ by maximizing the $G_0$-specific likelihood $\ln \mathcal{L}[G_0]$. We then utilize the inferred angles $\{\hat\theta_i\}_{i \in cr_0}$ as the initial approximation to maximize $\ln \mathcal{L}[G_1]$. We continue the angular coordinate inference in this nested fashion until we find the angular values maximizing $\ln \mathcal{L}[G_m]$. We maximize each log-likelihood $\ln \mathcal{L}[G_\ell]$ iteratively by visiting $G_\ell$ nodes in rounds. In each round every node $i$ in $G_\ell$ is visited once and placed at the angle $\hat\theta_i$ maximizing its local log-likelihood $\ln \mathcal{L}[G_\ell]_i$ with respect to the current angular positions of the other nodes in $G_\ell$. The procedure is continued until we arrive at a stable angular configuration, $\max_i |\Delta\hat\theta_i| < \epsilon$, where $0 < \epsilon \ll 1$ is the precision parameter and $\Delta\hat\theta_i$ is the difference between the angular positions of node $i$ in two consecutive rounds. In our experiments we set $\epsilon = 10^{-4}$ radians.
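The round-based maximization loop can be sketched generically. Here `place_node(i, theta)` stands for the local-likelihood maximizer of the individual node placement subroutine described later in this appendix; the wrap-around handling of angular differences is an implementation detail assumed for the sketch.

```python
import math

def refine_angles(theta, place_node, eps=1e-4, max_rounds=10):
    """Visit all nodes once per round, moving each to the angle returned
    by the local-likelihood maximizer; stop when no node moved by more
    than eps radians, or after max_rounds rounds (the cap used in the
    text to sidestep slowly converging, near-degenerate likelihoods)."""
    for _ in range(max_rounds):
        max_shift = 0.0
        for i in range(len(theta)):
            new = place_node(i, theta)
            # angular difference wrapped to (-pi, pi]
            d = (new - theta[i] + math.pi) % (2.0 * math.pi) - math.pi
            max_shift = max(max_shift, abs(d))
            theta[i] = new
        if max_shift < eps:
            break
    return theta
```

With a contracting toy placement rule such as `lambda i, th: th[i] / 2.0`, the loop drives all angles toward the fixed point within the round budget.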
The required total number of all-node visit rounds is typically small, of the order of the network average degree. In certain circumstances, e.g., when the global maximum of $\ln \mathcal{L}[G_\ell]$ is close in value to the second-largest maximum, the procedure may require a large number of rounds to converge. To avoid such scenarios we limit the maximum number of rounds to $10$ per $G_\ell$.
Our experiments indicate that the resulting HyperLink link prediction accuracy is highly sensitive to the correct placement of the highest-degree nodes. Thus, to further improve the angular inference of the most connected nodes, we split the procedure into two parts, $\ell = 0, 1, \dots, m/2$ and $\ell = m/2+1, \dots, m$, respectively. The first part is repeated independently $\mathrm{maxiter} = 20$ times, starting from different initial angle values, and for each repetition the resulting $\ln \mathcal{L}\left[G_{m/2}\right]$ value is computed. The second part is carried out only once, using the $\{\hat\theta_i\}$ values corresponding to the repetition with the largest $\ln \mathcal{L}\left[G_{m/2}\right]$ value. Since the $\ell = 0, 1, \dots, m/2$ cores are significantly smaller than the $\ell = m/2+1, \dots, m$ cores, the first part is carried out much faster than the second, despite the large number of repetitions.
After each round we perturb the angular coordinates $\hat\theta_i$, $i \in cr_\ell$, by adding random noise,
$$\hat\theta_i \leftarrow \hat\theta_i + a(\ell) X_i, \quad (E30)$$
with amplitude $a(\ell)$, which we decrease linearly as $a(\ell) = \frac{\pi}{4}\left(1 - \frac{\ell}{m}\right) + a_0$. These coordinate perturbations allow us to avoid getting trapped in local maxima of the log-likelihood function and to arrive at the optimal angles $\{\hat\theta_i\}$ faster. We also stress the importance of the non-zero residual noise amplitude $a_0$. At the final $\ell = m$ stage the residual noise effectively "repels" $k = 1$ nodes connected to the same node. Without residual noise at the $\ell = m$ step, all $k = 1$ nodes connected to the same node are likely to be placed very close to each other and to their common neighbor. As a result, pairs of these $k = 1$ nodes would be ranked as the most likely candidates for link prediction, adversely affecting the HyperLink accuracy. Our experiments indicate that the HyperLink accuracy is not sensitive to the specific $a_0$ value, as long as $a_0 \in \left[10^{-6}, 10^{-3}\right]$. In all our experiments we set $a_0 = 10^{-4}$ radians. The angular inference procedure is summarized in Alg. 1.
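The perturbation step of Eq. (E30) is small enough to sketch directly. Uniform noise in $[-a, a]$ is an assumption here; the text does not specify the distribution of $X_i$.

```python
import math
import random

def perturb(theta, layer, m, a0=1e-4):
    """Post-round perturbation of Eq. (E30): add noise of amplitude
    a(l) = (pi/4)*(1 - l/m) + a0, which shrinks linearly with the layer
    index l and keeps the residual a0 at the last (l = m) stage to repel
    degree-1 nodes sharing a common neighbor."""
    a = (math.pi / 4.0) * (1.0 - layer / m) + a0
    return [t + random.uniform(-a, a) for t in theta]
```

At the final stage (`layer == m`) the noise amplitude reduces to exactly `a0`, while at `layer == 0` it is close to $\pi/4$.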
Having sketched the angular inference procedure, we now focus on the individual node placement subroutine. We determine $\hat\theta_i$ for each node by maximizing the corresponding local log-likelihood $\ln \mathcal{L}[G_\ell]_i$. To this end, we split the angular space $[-\pi, \pi]$ evenly into $O(N_\ell)$ regions, where $N_\ell$ is the number of nodes in $G_\ell$. By placing node $i$ into each of these regions we identify the $\hat\theta_i$ maximizing its local likelihood. Since the calculation of $\ln \mathcal{L}[G_\ell]_i$ takes $O(N_\ell)$ steps for each candidate $\theta_i$ value, it takes $O(N_\ell^3)$ steps to execute each round over $G_\ell$. As a result, the overall running-time complexity for $m$ layers, $O(mN^3)$, is prohibitive for large networks. To reduce the running-time complexity to $O(m \langle k \rangle N^2)$, where $\langle k \rangle$ is the average degree of the entire network, we utilize the following approximation, first offered in Ref. [38]. For each node we first obtain a rough estimate $\tilde\theta_i$ by taking into account only its neighboring nodes in $G_\ell$. To this end we find the nearly-optimal placement $\tilde\theta_i$ by maximizing
$$\ln \tilde{\mathcal{L}}[G_\ell]_i = \sum_{j \neq i \in G_\ell} a_{ij} \ln p(x_{ij}). \quad (E32)$$
Since the summation in Eq. (E32) runs only over the neighbors of node $i$, it takes $O(k_i N_\ell)$ steps to find $\tilde\theta_i$. Having obtained this initial approximation, we then look for the optimal angle $\hat\theta_i$ in the neighborhood of $\tilde\theta_i$ maximizing the full local likelihood $\ln \mathcal{L}[G_\ell]_i$, which takes $O(L N_\ell)$ steps, where $L$ is the size of the neighborhood centered at $\tilde\theta_i$. Specifically, we search for $\hat\theta_i$ within $L = 300\,N_\ell/N$ regions on both sides of $\tilde\theta_i$.

To validate the hyperbolic geometry inference algorithm we compare the inferred coordinates in an HRG model network to its true coordinates. The parameters of the HRG are taken to be $N = 5{,}000$, $\langle k \rangle = 10$, $T = 0.5$, and $\gamma = 2.5$. As seen from Figs. 16a-c, the accuracy of the angular coordinate inference does not decline significantly for small-degree nodes. This is mainly due to the nested inference, with inference cores $cr_i$ covering all network nodes, in contrast to the original algorithm of Ref. [38], where the cores cover only the most connected nodes.
As seen from Fig. 16d, Eq. (E22) allows for accurate inference of small radial coordinates. At the same time, radial coordinate inference is less accurate for large radial coordinates. To explain this observation, we recall that the key assumption behind Eq. (E22) is that a node's degree in the HRG is fully determined by its radial coordinate. In other words, we assume that possible node degree values are narrowly distributed around the expected value given by Eq. (E21). This assumption breaks down for nodes with large radial coordinates: their expected degrees are small, so relative degree fluctuations are large, making the degree-based estimate of $r$ less reliable.

FIG. 1: Confusion matrix and a toy example of link prediction. (Top) Confusion matrix for link prediction. (Bottom) Toy link prediction example. Existing links are shown with solid black lines. Missing links, Ω_R = {13}, are shown with red dotted lines, while predicted missing links, Ω(λ) = {13, 14}, are shown with green dashed lines. In this example the sizes of the confusion matrix sets are TP = 1, FP = 1, FN = 0, and TN = 1.

FIG. 4: Link prediction with known coordinates. a, True positive rate tpr(x), b, false positive rate fpr(x), c, precision pr(x), and, d, link density n(x)p(x) as a function of hyperbolic distance x. In all experiments we remove links uniformly at random with probability 1 − q = 0.5. Missing links are then predicted using hyperbolic distances between unconnected node pairs. All results correspond to HRG graphs with N = 10^4 nodes, γ = 2.5 and ⟨k⟩ = 10. The insets display the same plots as the main panels but in log-linear format. Solid lines correspond to analytical estimates.

FIG. 5: Effects of synthetic noise on link prediction accuracy. HyperLink accuracy quantified using, a, AUC, b, AUPR, and, c, Precision scores as a function of the noise amplitude a for HRG graphs with different T values. All results correspond to HRG graphs with N = 10^4 nodes, γ = 2.5 and ⟨k⟩ = 10. The HyperLink accuracy is compared to that of RA, i.e., its top competitor according to Fig. 3. Corresponding scores of the RA index are shown with dashed lines of matching color. The insets of panels b and c display the maximum tolerable coordinate noise amplitude as a function of T, i.e., the values of a at which the HyperLink and RA accuracies are equal.

FIG. 7: Link prediction accuracy for HRGs with inferred coordinates: Precision. The legend is identical to that of Fig. 6.

FIG. 8: Link prediction accuracy for HRGs with inferred coordinates: AUC. The legend is identical to that of Fig. 6.

FIG. 12: HyperLink accuracy in the case of coordinate uncertainty. All plots correspond to HRG graphs of N = 10^5 nodes, γ = 2.5 (α = 0.75), T = 0.1 and ⟨k⟩ = 10. a, Distance-dependent true positive rate tp(x|a) evaluated for different noise amplitude values. For x < R, tp(x|a) grows as e^{x/2}, see the dashed line for reference. The inset tests the scaling tp(x|a) ∼ a^{1−2α} for x < R. b, The cumulative number of node pairs in the hyperbolic disk as a function of the hyperbolic distance between the nodes. Note that the cumulative number of node pairs is independent of the noise amplitude. c, Distance-dependent precision pr(x|a) for different a values. pr(x|a) is nearly constant for x < R, since both tp(x|a) and n(x|a) grow as e^{x/2}, and decays as e^{−x/2} for x > R. The inset tests the scaling pr(x|a) ∼ a^{1−2α} for x < R. d, The scaling test for AUPR(a) of the HRG with N = 5000, γ = 2.5, and ⟨k⟩ = 10. Note that a^{4α−2} AUPR(a) grows linearly as a function of R + 2 ln(a/2), confirming Eq. (D11).

FIG. 14: Integration domain for k_y(a) at y > R. The integration is performed over the intersection of two hyperbolic disks. The first disk (yellow) corresponds to the latent space of the HRG, has radius R, and is centered at the origin. The second disk (blue) has radius y and is centered at (r_1, 0). The third disk (green) depicts the integration radius r_2 that sweeps the integration domain. The angle φ_y corresponds to the intersection of disks y and r_2. Based on the R, y, and r_1 values, we distinguish two configurations. a, Disk y fully contains disk R, region VI. b, Disk y partially overlaps disk R, region VII. c, The integration domain of k_y(a) is shown by the shaded region. Vertical dashed lines separate the domain into two integration regions, VI and VII. Region VII further splits into subregions VIIA and VIIB. The phase space below the blue dashed line corresponds to the case of disk r_2 fully contained within disks y and R; the phase space above the blue line corresponds to the case of disk r_2 intersecting disk y. The red dashed line is given by r_1 + r_2 = R − 2 ln(a/2) and corresponds to the loci of the integrand maxima in region VII. The green dashed line corresponds to the curve R(r_1). By construction, φ_y ≪ 1 for r_2 ≥ R(r_1).

Here $I(z; T)$ is given by Eq. (27). Recall that $I(z; T) \approx z$ if $z \ll 1$ and $I(z; T) \approx I(\infty; T)$ if $z \gg 1$. Thus, to evaluate $I_9$ we split the integration over $r_2$ into two integrals: in the first integral $\frac{\pi}{2} e^{\frac{r_1 + r_2 - R}{2}} < 1$ and we approximate $I\left(\frac{\pi}{2} e^{\frac{r_1 + r_2 - R}{2}}; T\right) \approx \frac{\pi}{2} e^{\frac{r_1 + r_2 - R}{2}}$, while in the second integral $\frac{\pi}{2} e^{\frac{r_1 + r_2 - R}{2}} > 1$ and $I\left(\frac{\pi}{2} e^{\frac{r_1 + r_2 - R}{2}}; T\right) \approx I(\infty; T)$.

FIG. 15: Layered network structure for MLE inference. Nodes are sorted in decreasing order of their degree and placed into layers of logarithmically growing sizes. The outer layer contains only k = 1 nodes.

FIG. 16: Testing the hyperbolic geometry inference algorithm. Here we plot inferred vs. true node coordinates for the HRG graph that we map to the hyperbolic space. All plots correspond to the same HRG network with N = 5,000, ⟨k⟩ = 10, T = 0.5, and γ = 2.5. Panels a and b display angular coordinates for nodes with degrees k > 25 and k > 2, respectively. Panel c displays angular coordinates of all nodes. Panel d displays radial coordinates of all nodes in the graph.

This leads to the overall running-time complexity of $O(m \langle k \rangle N^2)$ steps. The individual node placement subroutine is summarized in Alg. 2.
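The two-stage placement subroutine can be sketched as follows. Here `loglik_pair(i, j, th)` is a hypothetical callable returning the pairwise log-likelihood contribution of the pair (i, j) when node i sits at angle th, and the grid sizes `n_coarse` and `n_refine` are illustrative stand-ins for the $O(N_\ell)$ coarse regions and the $L$-region refinement window of the text.

```python
import math

def place_node(i, theta, adj, loglik_pair, n_coarse=64, n_refine=31):
    """Two-stage placement of node i (sketch of Alg. 2).
    Stage 1: scan n_coarse evenly spaced angles scoring only i's
    neighbors (the Eq. (E32) shortcut) to get a rough estimate.
    Stage 2: rescan one coarse-grid cell around that estimate with the
    full local likelihood over all other nodes."""
    def score(th, nodes):
        return sum(loglik_pair(i, j, th) for j in nodes if j != i)

    # stage 1: neighbors-only scan of the full circle, O(k_i * n_coarse)
    cand = [-math.pi + 2.0 * math.pi * k / n_coarse for k in range(n_coarse)]
    rough = max(cand, key=lambda th: score(th, adj[i]))
    # stage 2: full-likelihood scan of a narrow window around `rough`
    width = 2.0 * math.pi / n_coarse
    fine = [rough + width * (k / n_refine - 0.5) for k in range(n_refine + 1)]
    return max(fine, key=lambda th: score(th, range(len(theta))))
```

With a toy pairwise score peaked at a known target angle, the subroutine recovers that angle to within the fine-grid resolution.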

TABLE I: Summary of the results of Section III: HyperLink's link prediction accuracy measures in HRGs with known node coordinates, as functions of the model parameters of Section III.

TABLE II: Summary of the results of Section IV: HyperLink's link prediction accuracy measures in HRGs with inferred coordinates and in real networks, as well as those for nonlocal links, compared to other methods. The parameters are the same as in Table I.