Ollivier-Ricci curvature convergence in random geometric graphs

Connections between continuous and discrete worlds tend to be elusive. One example is curvature. Even though there exist numerous nonequivalent definitions of graph curvature, none is rigorously known to converge in any limit to any traditional definition of curvature of a Riemannian manifold. Here we show that Ollivier curvature of random geometric graphs in any Riemannian manifold converges in the continuum limit to Ricci curvature of the underlying manifold. This result establishes the first rigorous link between a definition of curvature applicable to random graphs and a traditional definition of curvature of smooth space.

Curvature is one of the most basic geometric characteristic of space. The original definitions of curvature apply only to smooth Riemannian or Lorentzian manifolds, yet recent extensions of curvature to graphs and networks have seen a surge of interest in areas as diverse as network/data science and quantum gravity. In network science, graph curvature is interesting in general since many real-world networks were found to possess different flavors of geometry [1]. On the application side, graph curvature was used to characterize congestion in telecommunication networks [2,3], detect cancer cells [4], analyze robustness and other properties of the Internet, financial, and brain networks [5][6][7][8][9], and in a number of classic applications such as community inference [10] and network embedding in machine learning [11]. In general relativity, scalar Ricci curvature is a key player since it appears in the Einstein-Hilbert action whose least-action extremization leads to Einstein's equations [12]. Therefore, various extensions of Ricci curvature to graphs and simplicial complexes representing "quantum spacetimes" have attracted significant research attention in different approaches to quantum gravity including causal dynamical triangulations [13][14][15], causal sets [16][17][18][19][20][21][22], and combinatorial quantum gravity [23,24]. It is impossible to list all areas of science where curvature appears and plays an important role.
However, there exist not one or two but quite many nonequivalent definitions of curvature applicable to graphs. Here is a very incomplete list of notable definitions due to Steiner [25], Regge [26,27], Bakry-Emery [28], Gromov [29,30], Higuchi [31], Eckmann-Moses [32], Forman [33][34][35], Ollivier [36][37][38], Lin-Yau [39,40], Knill [41,42], Keller [43,44], and hybrids thereof [45]. Unfortunately, none of these definitions of graph curvature is known to converge in the continuum limit to any traditional curvature of any Riemannian manifold. Therefore, it is often unclear how one should interpret different measurements of different curvatures in different graphs and networks, and how reliable such interpretations are. This problem is becoming particularly acute in view of growing evidence that different graph curvature definitions may disagree even about the sign of curvature in some paradigmatic graphs and networks [46,47].
Here we show that the Ollivier curvature [36][37][38] of random geometric graphs [48][49][50] in any Riemannian manifold converges to the Ricci curvature of the manifold in the continuum limit. This is the first result of this sort, linking rigorously a definition of graph curvature to the traditional Ricci curvature of a Riemannian manifold. To the best of our knowledge, the closest, in spirit, previous result is Cheeger et al.'s proof [27] of Regge's seminal observation [26] that an angle-defect-based curvature of increasingly finer-grained simplicial triangulations of a manifold converges to its Ricci curvature.
To proceed, we first recall what random geometric graphs (RGGs) and Ollivier curvature are. Given any Riemannian manifold, which henceforth we will often call just space, the RGGs in it are defined constructively via the following two-step procedure: 1) sprinkle points uniformly at random in the space via a Poisson point process of rate r > 0 driven by the volume form defined by the metric in the space [51], and 2) connect by edges all pairs of points whose pairwise distances in the space are smaller than threshold t > 0, Fig. 1(a). In topology, RGGs are fundamental because they are 1-skeletons of Vietoris-Rips complexes [52] whose topology was proven to converge to the space topology under very mild assumptions [53]. Here we show that their geometry also converges to the space geometry.
As with any graph curvature, Ollivier curvature captures a particular aspect of Ricci curvature, and then encodes it such that it can be extended to graphs. In Ollivier curvature, this aspect is how balls shrink or expand under parallel transport. If Ricci curvature of a space is Nodes are sprinkled randomly in the space via the Poisson point process with rate r = n. Node x is connected to all the nodes that happen to lie within the connectivity threshold radius t = εn from it in the space. The Ollivier ball radius is δn ≥ εn. Node y is at distance δn from x.
Vector v (not shown) is the tangent vector at x towards y. The probability distribution µx is the uniform distribution over all the nodes that happen to lie within distance δn from x in the graph. These are not exactly the nodes within distance δn from x in the space lying in a larger disk, but at large n the difference is negligible.
livier curvature this aspect is how balls shrink or expand under parallel transport. If Ricci curvature of a space is negative, zero, or positive, then balls in this space expand, stay the same, or shrink, respectively, under parallel transport.
To capture this property, one needs to consider the Wasserstein a.k.a. transportation distance W (µ 1 , µ 2 ) between two probability distributions µ 1 , µ 2 in any nice (Polish to be exact) metric space: where d(x, y) is the distance between points x and y in the space, and the infimum is taken over all joint probability distributions µ whose marginals are µ 1 and µ 2 . If µ 1 , µ 2 are represented by sand piles, then W (µ 1 , µ 2 ) is the minimum cost to transport pile µ 1 into pile µ 2 , where the grain of sand at x gets transported to y incurring cost d(x, y). Let x and y be now two points at a small distance d M (x, y) = δ in a Riemannian manifold M , and let µ 1 = µ x and µ 2 = µ y be the normalized restrictions of the volume form in M onto the balls B M (x, δ), B M (y, δ) of radius δ centered at x, y, i.e., µ Ollivier curvature between x and y is then defined by It was shown in [37] that where D is M 's dimension, and Ric(v, v) is the Ricci curvature at x along v which is the unit tangent vector at x pointing along the geodesic from x to y. Ricci curvature Ric(v, v) is equal to the average of sectional curvatures at x over all tangent planes containing v. The key point is that in Riemannian manifolds, the rescaled Ollivier curvature converges to Ricci curvature in the limit of small ball sizes. Moving from manifolds M to simple unweighted graphs G, let x, y be graph vertices, d G (x, y) the shortest path distance in hops between x and y, and µ x , µ y the uniform probability distributions over all vertices lying from x, y at shortest path distances d G ≤ δ hops. In the simplest settings, x and y are restricted to be neighbors, so that d G (x, y) = δ = 1 and µ x , µ y are the uniform distributions over the neighbors of x, y, representing the standard random walk in the graph. With these settings, Ollivier curvature was shown to be bounded by −2 and 1 [53]. It was measured in a great variety of synthetic and real-world networks [4-6, 10, 46, 47, 53, 54], and was investigated at great depths in connection to quantum gravity [13-15, 23, 24].
It is evident that the settings above are too restrictive to talk about any curvature convergence simply because Ollivier curvature of graphs in these settings is always between −2 and 1, while Ricci curvature of a Riemannian manifold can be any real number. Therefore, our settings below are slightly different. First, we consider RGGs whose edges are weighted by manifold distances, and then relax this requirement.
Our weighted RGG continuum limit n → ∞ is as follows. Given any nice D-dimensional Riemannian manifold M , we first fix any point x in it and any unit tangent vector v at x. We are concerned with curvature at x in the v-direction. For a given n, we add the second point y at distance δ n from x, d M (x, y) = δ n , along the geodesic from x in the v-direction. The distance δ n is also the radius of the balls in the Ollivier curvature definition above, so that it must go to zero in the limit, δ n → 0. We then add the Poisson point process (PPP) of rate r ∼ n in M to the two nonrandom points x, y. Henceforth, f (n) ∼ g(n) means lim n→∞ f (n)/g(n) = c ∈ (0, ∞), and f (n) ≈ g(n) means c = 1. The connection distance threshold t in RGGs defined on top of this PPP+{x, y} is then set to t = ε n , where ε n → 0 and ε n ≤ δ n . That is, all pairs of points are linked in RGG G if the distance between them in M does not exceed ε n . In the continuum limit of this setup, our PPP samples the manifold increasingly densely, while the RGGs become increasingly "microscopic." Every created edge is then weighted by the distance in M between the pair of nodes that the edge connects. For any two vertices z 1 and z 2 in RGG G, we thus have two distances: the manifold distance d M (z 1 , z 2 ) and the Nodes are sprinkled randomly via the Poisson point process with rate r = n. Node x is connected to all the nodes that happen to lie within the connectivity threshold radius t = εn from it in the space(time). The Ollivier ball radius is δn ≥ εn. Node y is at distance δn from x. Vector v (not shown) is the tangent vector at x towards y. The probability distribution µx is the uniform distribution over all the nodes that happen to lie within distance δn from x in the graph. These are not exactly the nodes within distance δn from x in the space(time), but at large n the difference is negligible, Appendix A 2. negative, zero, or positive, then balls in this space expand, stay the same, or shrink, respectively, under parallel transport.
To capture this property, one needs to consider the Wasserstein a.k.a. transportation distance W (µ 1 , µ 2 ) between two probability distributions µ 1 , µ 2 in any nice (Polish to be exact) metric space: where d(x, y) is the distance between points x and y in the space, and the infimum is taken over all possible transportation plans which are joint probability distributions µ whose marginals are µ 1 and µ 2 . If µ 1 , µ 2 are represented by sand piles, then W (µ 1 , µ 2 ) is the minimum cost to transport pile µ 1 into pile µ 2 , where the grain of sand at x gets transported to y incurring cost d(x, y). Let x and y be now two points at a small distance d M (x, y) = δ in a Riemannian manifold M , and let µ 1 = µ x and µ 2 = µ y be the normalized restrictions of the volume form in M onto the balls where vol(z) dz is the volume element in M . Ollivier curvature between x and y is then defined by It was shown in [37] that where D is M 's dimension, and Ric(v, v) is the Ricci curvature at x along v which is the unit tangent vector at x pointing along the geodesic from x to y. Ricci curvature Ric(v, v) is equal to the average of sectional curvatures at x over all tangent planes containing v. The key point is that in Riemannian manifolds, the rescaled Ollivier curvature converges to Ricci curvature in the limit of small ball sizes.
Moving from manifolds M to simple unweighted graphs G, let x, y be graph vertices, d G (x, y) the shortest path distance in hops between x and y, and µ x , µ y the uniform probability distributions over all vertices lying from x, y at shortest path distances d G ≤ δ hops. In the simplest settings, x and y are restricted to be neighbors, so that d G (x, y) = δ = 1, and µ x , µ y are the uniform distributions over the neighbors of x, y, representing the standard random walk in the graph. With these settings, Ollivier curvature was shown to be bounded by −2 and 1 [54]. It was measured in a great variety of synthetic and real-world networks [4-6, 10, 46, 47, 54, 55], and was investigated at great depths in connection to quantum gravity [13-15, 23, 24].
It is evident that the settings above are too restrictive to talk about any curvature convergence simply because Ollivier curvature of graphs in these settings is always between −2 and 1, while Ricci curvature of a Riemannian manifold can be any real number. Therefore, our settings below are slightly different. First, we consider RGGs whose edges are weighted by manifold distances, and then relax this requirement.
Our weighted RGG continuum limit n → ∞ is as follows. Given any nice D-dimensional Riemannian manifold M , we first fix any point x in it and any unit tangent vector v at x. We are concerned with curvature at x in the v-direction. For a given n, we add the second point y at distance δ n from x, d M (x, y) = δ n , along the geodesic from x in the v-direction. The distance δ n is also the radius of the balls in the Ollivier curvature definition above, so that it must go to zero in the limit, δ n → 0. We then add the Poisson point process (PPP) of rate r ∼ n in M to the two nonrandom points x, y. Henceforth, f (n) ∼ g(n) means lim n→∞ f (n)/g(n) = c ∈ (0, ∞), and f (n) ≈ g(n) means c = 1. The connection distance threshold t in RGGs defined on top of this PPP+{x, y} is then set to t = ε n , where ε n → 0 and ε n ≤ δ n . That is, all pairs of points are linked in RGG G if the distance between them in M does not exceed ε n . In the continuum limit of this setup, our PPP samples the manifold increasingly densely, while the RGGs become increasingly "microscopic." Every created edge is then weighted by the distance in M between the pair of nodes that the edge connects. For any two vertices z 1 and z 2 in RGG G, we thus have two distances: the manifold distance d M (z 1 , z 2 ) and the graph distance d G (z 1 , z 2 ) defined as the shortest path distance in the weighted graph G. If z 1 and z 2 are linked in G, then d G (z 1 , z 2 ) = d M (z 1 , z 2 ), also equal to the weight of edge (z 1 , z 2 ). Ollivier's balls centered at x, y are then the sets of G's vertices z lying within graph distances d G ≤ δ n from vertices x, y: The probability distributions µ x , µ y are the uniform distributions on the vertex sets B G (x, δ n ), B G (y, δ n ). Finally, Ollivier graph curvature κ G (x, y) is defined by the same equation (2), except that probability distributions µ x , µ y are over balls not in manifold M but in graph G, Fig. 1(a): We emphasize that as opposed to previous literature on Ollivier curvature of graphs, we do not require the RGG connection radius ε n to be equal to the Ollivier ball radius δ n , Fig. 1(a). This is because we do not want to exclude sparse and ultrasparse graphs from the consideration in the limit. We call graphs dense, sparse, and ultrasparse or truly sparse if their expected average , and ∼ const., respectively. In the continuum limit of (truly) sparse graphs, the connection radius ε n may be so small that all that one-hop neighborhoods of individual edges can "feel" is locally Euclidean flatness, so that we need to consider much larger neighborhoods in graphs, δ n ε n , to "feel the curvature." With these settings and notations, we can show that in the continuum limit, rescaled Ollivier curvature in random graph G converges to Ricci curvature in the underlying manifold M in the following strong sense if the RGG connectivity radius ε n and Ollivier's ball radius δ n shrink to zero with n as with exponents α, β satisfying If the two balls shrink at the same rate ε n ∼ δ n ∼ n −α , then we need The expectation · in (5) is w.r.t. the RGG ensemble, in which κ G (x, y) is random. The result in (5) is strong in the sense that it implies not only the convergence of the expected value of rescaled Ollivier curvature to Ricci curvature, κ G (x, y) /δ 2 n ≈ Ric(v, v)/2(D + 2), but also the concentration of random κ G (x, y) around its expected value, Prob[|κ G (x, y)/δ 2 n − Ric(v, v)/2(D + 2)| > ε] → 0 for any ε > 0.
We note that the conditions (8-10) allow graphs to be arbitrarily sparse, but not exactly ultrasparse. This is because β can be arbitrarily close to zero, albeit not exactly zero, but the closer the β to zero, the closer the α can be to 1/D, while α = 1/D corresponds to the ultrasparse limit. We note though that the closer the β to zero, the slower the convergence, simply because Ollivier's balls shrink too slowly in this case.
We prove (5) in three steps, outlined in Appendix A. First, we approximate distances in RGGs G by distances in space M , and find the associated error. Then we show that the Wasserstein distance between the uniform distributions on balls in G with respect to the two distances is negligible. At the last step, we show that the Wasserstein distance between the uniform and continuous distributions on graph and space balls is also negligible. All these approximations must be done with care to guarantee that the error introduced at each step is bounded by δ 3 n → 0. This is because if the error is so small, then (5) follows from (3).
The fact that we can prove the Ollivier→Ricci convergence only under the conditions (8-10) does not mean that there is no convergence outside of this parameter region, and this is indeed what we observe in simulations.
For simulations we select the sphere, torus, and the Bolza surface as the three 2D manifolds of constant Ricci curvature +1, 0, and −1. The Bolza surface [56,57] is the simplest hyperbolic 2D manifold of genus 2 with no boundaries. We consider the Bolza surface, versus something simpler of negative curvature, such as a hyperbolic disk, because we want our manifolds to have no boundaries. We want this, because computing Ollivier curvature in simulations is a major challenge at large n, so that we do not want any boundary effects at any n.
We then fix x and y as described above, sprinkle n points on the three manifolds uniformly at random according to the manifold volume form, and link all pairs of points at distances ≤ ε n on the manifold. We then find Ollivier's balls B G (x, δ n ), B G (y, δ n ), set up the uniform probability distributions µ x = 1/|B G (x, δ n )|, µ y = 1/|B G (y, δ n )| on them, and compute all the pairwise distances d G (x i , y j ) in graph G between nodes i ∈ B G (x, δ n ) and j ∈ B G (y, δ n ) whose coordinates are x i and y j . All this data allow us to compute the Wasserstein distance W G (µ x , µ y ) by solving the linear program where the minimization is over transportation plans ρ ij whose entries describe pairwise movements of probability masses. Having computed this W G (µ x , µ y ), Ollivier Ollivier-Ricci curvature convergence in constant-curvature random geometric graphs. Panel (a) shows the simulation data for rescaled Ollivier curvature (4) in weighted random geometric graphs on the D = 2-dimensional torus, sphere, and Bolza surface. The scaling exponents of the connectivity radius (6) and Ollivier's ball radius (7) in the graphs are set to α = β = 0.16 < 1/6 lying within the proof-accessible regime (8)(9)(10). Panel (b) further shows apparent convergence outside of this regime with α = {1/4, 1/2} and β = 1/4. The value of α = 1/2 corresponds to ultrasparse graphs. The error bars represent the standard error σ/ √ ns, where σ is the standard deviation and ns = 10 · 2 17 /n is the number of sampled random graphs of size n. The smallest graph size n is chosen to ensure the graphs are connected, while the largest size n = 2 17 is bounded by memory constraints of the MOSEK package. curvature is finally given by (4). Further details on the simulations are in Appendix C.
The results are shown in Fig. 2. Remarkably, we observe that rescaled Ollivier curvature in our RGGs converges to Ricci curvature of the underlying manifold not only in the parameter regime accessible to our proofs, Fig. 2(a), but also well outside of this regime, for much sparser graphs, Fig. 2(b). In particular, we observe that the convergence is as good for ultrasparse graphs-that is, graphs with constant average degree corresponding to α = 1/2-as for denser graphs.
Yet it is certainly the case that the weighted RGGs that we have been considering thus far contain too much information about the manifold in their edge weights. Is it at all possible to talk about any curvature convergence in unweighted RGGs? As mentioned above, since Ollivier curvature of unweighted graphs is always between −2 and 1, the answer is obviously 'no'. However, if all edges are weighted by the same weight, playing the role of the scale of the system, such as the Planck scale, then the situation changes drastically.
This "Planck scale" in our case is the RGG connectivity radius ε n because if the weights of all edges are set to be ε n , then the distance d G (z 1 , z 2 ) = ε n d G (z 1 , z 2 ), where d G (z 1 , z 2 ) is now the shortest path hop distance between vertices z 1 and z 2 in the unweighted RGG G, is an approximation to the manifold distance d M (z 1 , z 2 ). Unfortunately, how good this approximation is (characterized by stretch d G /d M ) has been rigorously documented only for the RGGs in the Euclidean plane [58]. Since the distances we approximate are bounded by Ollivier's ball radius δ n that tends to zero, and since any Riemannian manifold is locally Euclidean, our result below relying on [58] holds for any 2D-manifold. Analogous results for higher dimensions can be worked out as soon as stretch results for higher dimensional RGGs are obtained.
The result is that if all edges in our RGGs are now weighted by ε n , then we can prove the Ollivier→Ricci convergence in the same strong sense (5) if Similar to the weighted case, these conditions allow for arbitrarily sparse but not ultrasparse graphs (β → 0, α → 1/2). Unlike the weighted case, these conditions imply that, as expected, the convergence is slower: β < 1/9 (2D unweighted) versus β < 1/6 (2D weighted). Another difference with the weighted case is that these conditions do not allow for the connectivity radius ε n to be the same as Ollivier's ball radius δ n : α ≥ 3β ⇒ δ n ε n . We believe this is not a deficiency in our proof techniques, but a reflection of reality. It seems obvious that one must consider large "mesoscopic" graph neighborhoods of size δ n ε n to "feel" any curvature in unweighted or scaleweighted graphs since one-hop "microscopic" neighborhoods of individual links in these graphs are so small that all they can "feel" is locally Euclidean flatness.
The unweighted proof, outlined in Appendix B, follows a strategy similar to the weighted case. The key difference is that in the weighted case, the graph distance already provides a sufficiently accurate approximation to the manifold distance. This is no longer true if we consider just the shortest path hop counts. However, if these counts are multiplied by ε n , then the resulting rescaled lengths of sufficiently long paths in our RGGs still approximate sufficiently well the corresponding manifold distances [58].
We have thus established a rigorous connection between curvatures of discrete and continuous objectsrandom graphs and Riemannian manifolds. Anecdotally, this connection is reminiscent of what Riemann had in mind working on the foundations of Riemannian geometry [59]. The closest results to ours, dealing with convergence of curvature of simplicial triangulations of manifolds in Regge's calculus [26,27] and its many derivatives [60], are very different. Informally, while any simplex in any simplicial triangulation of any manifold is a chunk of space, graphs are much more primordial objects in that they do not have any space attached to them whatsoever.
Our proofs appear to leave much space for improvement since the simulations suggest that the convergence holds well outside of the parameter regions accessible by our proof techniques, which is an exciting news for a variety of applications, such as manifold learning and many others mentioned at the beginning. It would be also interesting to know what other definitions of graph curvature converge, if any.
Even more interesting, especially in the context of quantum gravity, is the inverse problem of geometrogenesis [23,61]: can we go from graphs to manifolds? One option to address this question is conceptually similar to [62]. One can consider a canonical Boltzmann/Gibbs ensemble of random graphs whose edges or shortest paths have some fixed expected values of Ollivier curvature, and whose entropy is maximized under these constraints. The questions then are: is this ensemble equivalent (in the continuum limit) to the RGG ensemble on a Riemannian manifold with the corresponding values of Ricci curvature, and do we have any (second-order) phase transitions in this ensemble? Indeed, our results suggest unambiguously that the analogy of the Einstein-Hilbert action in random graphs must be properly averaged Ollivier curvature whose continuum limit is Ricci curvature. Therefore, the temperature in this canonical ensemble must be analogous to the gravitational coupling constant. In that context, the settings considered in this paper are particularly interesting because they allow one to probe graph curvature at any scale-Ollivier curvature (4) is well defined for any shortest path between any pairs of nodes x, y located at any distance in the graph.
However, for quantum gravity applications one first needs to address the difference between Riemannian spaces and Lorentzian spacetimes. To the best of our knowledge, no notion of RGGs or Rips complexes have been defined for the latter. Since causal diamonds, a.k.a. Alexandroff sets, Fig. 1(b), in Lorentzian geometry play the role of balls in Riemannian geometry-the topology defined by Alexandroff sets agrees with the base topology in any nice Lorentzian spacetime [63]-we propose to define Lorentzian RGGs as shown in Fig. 1(b). Could results similar to the ones presented here be then obtained for Lorentzian spacetimes? Universities. Experiments were conducted using the Discovery cluster, supported by Northeastern University's Research Computing team, and the Béluga, Graham, and Cedar clusters, supported by Compute Canada and its regional partners Calcul Québec, Compute Ontario, and Westgrid. If d M (x, y) ≤ Cδ n , for some C > 0, then The second term in the last equation is δ 3 n by (A5), and so is the first term if (9) holds. We therefore conclude that Finally, the Wasserstein distances W M and W M are defined by distances d M and d M , respectively. Therefore we immediately conclude that n for any two probability distributions µ 1 and µ 2 on M .
If now x, y are also any two nodes of our RGG G, then d G (x, y) = d M (x, y), so that it follows that if µ 1 and µ 2 are defined on G's nodes, then The calculations above thus show that from now on we can always work only with the Wasserstein distance W M in the space M , and this indeed what we do at the next steps.

Approximating probability distributions in graphs
The next hurdle we need to overcome is comparing the uniform probability distribution µ G x on B G (x, δ n ) with µ M x , the normalized restriction of the volume form to B M (x, δ n ). The main difficulty here lies in that d M (x, z) ≤ δ n does not necessarily imply that d G (x, z) ≤ δ n . In general, the shape of the intersection B G (x, δ n ) ∩ B M (x, δ n ) can be highly non-trivial. We tame this shape by introducing a new probability distributionμ G x which is the uniform probability distribution on the set B G := B M (x, δ n ) ∩ G which is the set of G's nodes that happen to lie within distance δ n from x in the space, i.e., all nodes in the rose ball in Fig. 1(a). Observe that First, consider (A6). If the conditions (8-10) hold, then λ n δ 3 n and λ n /ε n δ 2 n . It follows then that there exists a ξ n δ n such that For instance, we can take the ξ n to be ξ n = max{ 3λ n /ε n , λ 1/3 n }. Define now the new radii δ ± n = (δ n ± ξ 3 n )/(1 ∓ ξ 2 n ), and note that δ − n < δ n < δ + n , so that x denote the uniform probability distribution on B ± G . We then obtain an upper bound for W M (μ G x , µ + x ) by considering the following joint probability distribution a.k.a. transport plan µ for is the infimum over all joint distributions (1), and sinceμ G x and µ + x are discrete, it follows that Since we are dealing with a PPP, both |B + G \B G | and |B + G | are Poisson distributed with diverging means. Although they are not completely independent, it can be shown that This fraction is of the order ξ 2 n δ 2 n , so that we conclude Finally, let node z lie in the manifold ball of radius δ − n around node x, d M (x, z) ≤ δ − n . Then (A10) implies that the graph distance between x and z is upper bounded by δ n : This means that z ∈ B G (x, δ n ), so that we have the following sandwich: Using calculations similar to the ones above for this sandwich, we obtain Therefore, by the triangle inequality of the W M distance, The calculations above thus show that from now on we can work with the probability distributionμ G x instead of µ G x . The former is more convenient to work with than the latter because it is over G's nodes that happen to lie within manifold ball B M (x, δ n ), the rose ball in Fig. 1(a).

Going from discrete to continuous probability distributions
The final step is to go from the discrete probability distributionμ G x to the continuous distribution µ M x . Similar to the goal of the previous section, the task is to show that This is done by applying results on the matching distance between a PPP and points of a grid on the same space [64][65][66][67]. Note that bothμ G x and µ M x are now defined on B M (x, δ n ). We now need to devise a transport plan fromμ G x to µ M x that assigns to any (measurable) set A ⊆ B M (x, δ n ) how much mass from each PPP point in B G = G ∩ B M (x, δ n ) is used to make up µ M x (A). We do so as follows.
First, suppose that space M is flat, the D-dimensional Euclidean space. We construct our transport plan in three steps: 1. we first place a grid on the space; 2. we next find a minimal matching between this grid and the PPP points in B G ; and 3. finally, for each (measurable) set A in B M (x, δ n ), we find all the PPP points in B G that are matched to the grid points that lie in A.
Observe that this transport plan is such that every PPP point, i.e., graph vertex in B G contributes an equal fraction of its mass to make up µ M x (A). This plan is also such that the largest distance any amount of mass has to move is given by the largest matching distance. Relying on the results from [64][65][66][67], it can then be shown that where the last inequality holds if the conditions (8,9) are satisfied. Next, we relax the condition that the space M is flat. Indeed, it can be any nice Riemannian manifolds because δ n → 0, so that we can map the ball B M (x, δ n ) to the flat D-dimensional tangent space at x using the exponential map. We have to be careful here though, because the exponential map does not preserve distances. Still, by fixing a sufficiently small neighborhood U around the origin of the tangent space, we can ensure that for large enough n, exp −1 B M (x, δ n ) ⊂ U , and the distances are distorted by a fixed small amount: where B D (0, δ) is the D-dimensional Euclidean ball of small radius δ around the origin in the tangent space. Finally, relying on this mapping and on the PPP Mapping Theorem [51] that applies to U , we see that the mapped PPP is still a PPP with intensity ∼ n, now relative to the Euclidean volume form. Moreover, since the mapped ball is sandwiched between two balls whose radii scale as δ n , the results for the D-dimensional Euclidean space above yield matching lower and upper bounds for the Wasserstein distance, from which (A12) follows.
The ε n in (B1) is δ 3 n if the lower bound in (13) holds. Therefore, to translate (B1) to (A10), all we need to do is to show that γ n δ 2 n . If this holds, then we can simply take the ξ n in (A10) to be ξ n = max{ √ γ n , ε which is indeed true if the upper bound in (13) holds.
is the hyperbolic distance between the points in the Poncaré model. We compute what nodes are linked in parallel using a multi-GPU solution, where each thread works on one pair of nodes, except in the case of the Bolza surface where each thread computes one or two of 49 distances and each thread block works on two node pairs. The adjacency matrix is first tiled such that (1) the data required to generate each tile fits on a single GPU and (2) the total number of tiles is a multiple of the number of GPUs available. This decomposition provides a scalable solution independent of the number or type of GPUs present in the system. The algorithm is improved by using the shared L1 memory cache to store the Bolza generators and partial results, warp shuffling to accumulate results, function templating to eliminate kernel branches, and asynchronous CUDA calls to pipeline data transfers across different GPUs.

Distance matrix computation
After constructing the graph, we identify the balls B G (x, δ n ) and B G (y, δ n ). When ε n = δ n , these are simply the nearest neighbors of x and y, respectively; otherwise, they are calculated using Dijkstra's algorithm [68]. For the largest graphs we consider, the size of these balls can be up to tens of thousands of nodes, resulting in over billions of pairwise distances between the nodes in the two balls. The overall simulation bottleneck is the computation of the matrix of the weighted shortest-path distances between these large sets of nodes.
We employ a custom multi-GPU A * search algorithm using the methods described in [69]. The standard A * algorithm works by constructing a priority queue of visited nodes z using a binary heap. In computing the shortest path distance between a source node x i and destination node y j , priorities are assigned to node z in the queue according to the heuristic function where d G (x i , z) is the weighted graph distance between z and source x i ∈ B G (x, δ n ), while h(z, y j ) is the lowerbound estimate of the distance between z and destination y j ∈ B G (y, δ n ). As soon as new node z ∈ B G (y, δ n ) is added to the priority queue, its weighted graph distance d G (x i , z ) is added to the distance matrix. The A * algorithm is implemented on a single GPU by utilizing multiple priority queues, one per CUDA thread, so that it is efficient for graphs with large average degrees of the order of hundreds. Each priority queue extracts multiple states, after which we detect duplicates using a technique called parallel hashing with replacement, which is a modification of the cuckoo hashing scheme that avoids hash conflicts by allowing some duplicates to remain [69,70]. During this step, the heuristic for extracted nodes is updated, and they are re-added to the priority queues using a parallel heap insertion. This procedure continues until the destination node y j has been extracted by at least one of the priority queues.

Wasserstein distance computation
Having in place the distance matrix between the sets of nodes in the two balls B G (x, δ n ) and B G (y, δ n ), we compute the Wasserstein distance between the uniform probability distributions on these balls by solving the linear program (11). For the simulations presented here, we found it sufficient to use the MOSEK package [71] as long as the number of variables, given by |B G (x, δ)||B G (y, δ)|, is roughly less than 1.3 × 10 9 , past which we run out of memory. In most cases with smaller balls, however, we solve the linear program quite quickly using the standard primal-dual interior point method [72].