Small worlds and clustering in spatial networks

Networks with underlying metric spaces attract increasing research attention in network science, statistical physics, applied mathematics, computer science, sociology, and other fields. This attention is further amplified by the current surge of activity in graph embedding. In the vast realm of spatial network models, only a few reproduce even the most basic properties of real-world networks. Here, we focus on three such properties (sparsity, small worldness, and clustering) and identify the general subclass of spatial homogeneous and heterogeneous network models that are sparse small worlds and that have nonzero clustering in the thermodynamic limit. We rely on the maximum entropy approach where network links correspond to noninteracting fermions whose energy dependence on spatial distances determines network small worldness and clustering.

In spatial networks, nodes are positioned in a geometric space, and the distances between them in the space affect their linking probability in the network [1]. In real-world systems, such spaces can be explicit/physical, as in geographically embedded networks [2,3] or in the Ising model with long-range interactions [4-6]. Yet these spaces can also be hidden/latent. Latent similarity spaces have been employed for nearly a century to model homophily in social networks, for instance [7,8]: the closer two people are in a virtual similarity space, the more similar they are, and the more likely they are to know each other [9]. Another field where the space can be virtual is graph embedding in computer science and machine learning, with applications including network compression, visualization, and node labeling [10,11].
In models of spatial networks, the space is usually explicit. Perhaps the simplest spatial network model is that of random geometric graphs, which have been extensively studied in mathematics and physics since the early 1960s [12-15]. In these graphs, nodes are positioned in a space randomly using a point process, usually a Poisson point process, and two nodes are linked in the graph if the distance between them in the space is less than a fixed threshold. If the intensity of the point process does not depend on the graph size $n$, then the resulting graphs are sparse and have nonzero clustering in the thermodynamic $n \to \infty$ limit, thus sharing these two properties with many real-world complex networks [16,17]. Yet many of these networks are also heterogeneous small worlds, while random geometric graphs are homogeneous large worlds. This mismatch was resolved in [18,19], where a class of models of spatial networks that are sparse heterogeneous small worlds with nonzero clustering was introduced. Networks in these models have some additional properties commonly observed in real-world networks, such as self-similarity [18,20] and community structure [21-23]. Yet the following question remains: what are the general requirements on spatial network models so that networks in these models possess the properties of real-world networks?
Here, we first focus on just three properties: (1) sparsity, (2) small worldness, and (3) nonzero clustering. Simplifying the results a bit, we show that spatial networks in $\mathbb{R}^d$ have all three properties at once only if the probability $p_{ij}$ of connection between nodes $i$ and $j$ scales with the distance $x_{ij}$ between them in $\mathbb{R}^d$ as $p_{ij} \sim x_{ij}^{-\beta}$ with $\beta \in (d, 2d)$. We then add (4) heterogeneity to the list of requirements, and show that $\beta$ must be within the same range $(d, 2d)$ if the variance of the degree distribution is finite. If it is infinite, however, e.g., if the degree distribution is a power law with exponent $\gamma \in (2, 3)$, then the networks are always ultrasmall worlds, and any $\beta > d$ satisfies all four requirements. Finally, we show that if we also want to suppress nonstructural degree correlations, then the unique shape of the connection probability in the heterogeneous case is as in [18,19]: $p_{ij} = 1/\{1 + [x_{ij}^d/(\hat\mu\kappa_i\kappa_j)]^{\beta/d}\}$, where $\kappa_i$, $\kappa_j$ are the expected degrees of nodes $i$, $j$, and $\hat\mu$ is a constant that sets the average degree.
To obtain these results, we take a statistical physics stance in which we interpret spatial network models as probabilistic mixtures of grand canonical ensembles that maximize ensemble entropy under certain constraints, and are thus statistically unbiased. We call these mixtures hypergrandcanonical ensembles, as some of their parameters are random, and random parameters are known as hyperparameters in statistics.
Settings and notations. We consider a very general class of spatial network models. The space is any compact homogeneous and isotropic Riemannian manifold of dimension $d$ and volume $n$, with no boundaries. We require the curvature of the manifold to go to zero at $n \to \infty$. That is, the space is locally Euclidean, and it is exactly the Euclidean space $\mathbb{R}^d$ in the thermodynamic limit. Examples are the $d$-sphere or $d$-torus of size growing with $n$ such that its volume is $n$. Any growing compact $d$-dimensional hyperbolic manifold with no boundaries is also fine. On such a manifold we sprinkle $n$ points uniformly at random according to the manifold metric. These points are thus the binomial point process of rate 1 on the manifold, and they form the node set of a random graph. Conditioned on node coordinates on the manifold, nodes $i$ and $j$ are connected independently with probabilities $p_{ij} = p(x_{ij})$, where $x_{ij}$ is the distance between $i$ and $j$ on the manifold. By $a_{ij}$ we denote the adjacency matrix of these random graphs: conditioned on node coordinates, the $a_{ij}$ are independent Bernoulli random variables with success probabilities $p_{ij}$. These graphs are known as soft random geometric graphs [24,25].
We interpret these random graph ensembles as probabilistic mixtures of grand canonical ensembles that maximize ensemble entropy under the constraints that the average number of particles and average energy are fixed to given values. Particles are edges $a_{ij}$ here, and their energies $\varepsilon_{ij}$ depend on distances $x_{ij}$: $\varepsilon_{ij} = f(x_{ij})$. The connection probability function $p(x)$ then takes the familiar Fermi-Dirac form, see Sec. A,
$$p(x) = \frac{1}{e^{\beta[f(x)-\mu]} + 1} = \frac{1}{e^{\beta f(x)}/\lambda + 1}. \tag{1}$$
The Lagrange multipliers corresponding to the number-of-particles and energy constraints are the chemical potential $\mu$ and inverse temperature $\beta \ge 0$, as usual. We assume that neither $f(x)$ nor $\beta$ depends on $n$, but $\mu$, and consequently the absolute activity $\lambda = e^{\beta\mu}$, can depend on $n$, as they usually do in statistical physics. Since energies $\varepsilon_{ij}$ are not fixed as in grand canonical ensembles but are random instead, we call this ensemble a hypergrandcanonical ensemble. We require our networks to be always sparse, meaning that the expected average degree in them is fixed to a finite positive constant $\langle k \rangle$ for any network size $n$. We call a network model a small world if the average hop distance of shortest paths in the model networks grows slower than any polynomial of $n$. In particular, average distances growing as any polynomial of $\ln n$ in a model would render the model a small world. The model is also an ultrasmall world if the average distance grows slower than any polynomial of $\ln n$. If a model is not a small world, we call it a large world. By clustering we mean the average local clustering coefficient. The symbol '$\sim$' in $a_n \sim b_n$ or $a(x) \sim b(x)$ means that $a_n/b_n$ or $a(x)/b(x)$ converges to a finite positive constant at $n \to \infty$ or $x \to \infty$, respectively.
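The settings above can be illustrated with a short Python sketch that samples a soft random geometric graph on a flat $d$-torus of volume $n$, assuming the Fermi-Dirac connection probability $p(x) = 1/(e^{\beta f(x)}/\lambda + 1)$ with $f(x) = \ln x$. The function names, the choice of the torus, the quadratic-time loop over node pairs, and the treatment of $\lambda$ as a free input (rather than a value solved for from the target average degree) are assumptions made for illustration only; this is not the simulation code behind the figures.

```python
import math
import random

def fermi_dirac(x, beta, lam):
    """p(x) = 1 / (exp(beta*f(x))/lam + 1) with the assumed energy f(x) = ln x."""
    return 1.0 / (x ** beta / lam + 1.0)

def torus_distance(u, v, side):
    """Euclidean distance on a flat torus with the given side length."""
    total = 0.0
    for a, b in zip(u, v):
        delta = abs(a - b)
        delta = min(delta, side - delta)  # wrap around the periodic boundary
        total += delta * delta
    return math.sqrt(total)

def soft_rgg(n, d, beta, lam, seed=0):
    """Sprinkle n points uniformly on a d-torus of volume n and link pairs independently."""
    rng = random.Random(seed)
    side = n ** (1.0 / d)  # side length such that the torus volume equals n
    points = [[rng.uniform(0.0, side) for _ in range(d)] for _ in range(n)]
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            x = torus_distance(points[i], points[j], side)
            if rng.random() < fermi_dirac(x, beta, lam):
                edges.append((i, j))
    return points, edges

# Hypothetical usage: d = 2, beta in the (d, 2d) range, lambda chosen by hand.
points, edges = soft_rgg(n=300, d=2, beta=3.0, lam=1.0, seed=1)
print("average degree:", 2.0 * len(edges) / len(points))
```

In practice $\lambda$ would be tuned, for instance by bisection, until the measured average degree matches the target $\langle k \rangle$.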
Homogeneous spatial networks. In any network model satisfying the settings above, the degree distribution is homogeneous: in the thermodynamic limit $n \to \infty$ it converges to the Poisson distribution with the mean equal to the average degree in the network [27,28]. By network homogeneity we mean here not only degree homogeneity, but also all the consequences of the manifest invariance of these ensembles with respect to the group of isometries of the manifold. In particular, the expected values of any graph property of any two nodes in these random graphs are the same. For instance, not only the expected degree, but also the expected clustering of any two nodes is the same and equal to the average clustering in the network. The main question is under what conditions these networks are small worlds and have nonzero clustering in the thermodynamic limit.
TABLE I. The result summary for homogeneous networks (columns: parameter regime, small world, clustering). Small world: yes/no means that the networks are small/large worlds. Clustering: yes/no means that the networks have nonzero/zero clustering in the thermodynamic ($n \to \infty$) limit. ER: Erdős-Rényi random graphs [29]. RGG: sharp random geometric graphs in $\mathbb{R}^d$ [12]. Parameters: $\beta$ is the inverse temperature, $d$ the space dimension, $l_{\inf} = \liminf_{x\to\infty} f(x)/\ln x$, and $l_{\sup} = \limsup_{x\to\infty} f(x)/\ln x$, where $f(x)$ is the energy function, $\varepsilon_{ij} = f(x_{ij})$, and $x_{ij}$ is the distance between nodes $i$ and $j$ in the space. Note that $l_{\sup} = 0$ corresponds to $f(x)$ growing slower than logarithmically, in which case the networks are in the first regime for any value of $\beta < \infty$. Note that $l_{\inf} = \infty$ corresponds to $f(x)$ growing faster than logarithmically, in which case the networks are in the last regime for any value of $\beta > 0$. Note that $f(x) = c \ln x + o(\ln x)$ corresponds to $l_{\inf} = l_{\sup} = c$. Cases such as $l_{\sup} \ge 2 l_{\inf}$ require further details about the specific shape of $f(x)$ to classify the network into one of the three shown classes.
The results are summarized in Table I. Intuitively, they are easy to comprehend. If $f(x)$ grows too fast with $x$, so that $p(x)$ decays too fast, then the network does not have the sufficiently long links that are needed for small worldness. The network is thus necessarily a large world. On the other hand, if $f(x)$ grows too slowly with $x$, so slowly that with an $n$-independent $p(x)$ the average degree diverges, then to have an $n$-independent average degree the absolute activity $\lambda$ must depend on $n$ and go to zero at $n \to \infty$, meaning that the $p_{ij}$ go to zero as well. But since clustering scales with $n$ the same way as the $p_{ij}$ do (recall that average clustering is the probability that two random neighbors of a random node are connected), it is zero at $n \to \infty$. Luckily, there exists a "sweet spot" at which the rate of growth of $f(x)$ is neither too fast nor too slow, so that the networks are small worlds and have nonzero clustering at the same time, the second regime in Table I.
To show that this sweet spot (or range indeed) is as shown in Table I, we first observe that the average degree in our graphs is
$$\langle k \rangle \sim \int^{n^{1/d}} p(x)\, x^{d-1}\, dx \sim \lambda I_n, \qquad I_n = \int^{n^{1/d}} x^{d-1} e^{-\beta f(x)}\, dx. \tag{2}$$
This is because the number of nodes at distances $[x, x + dx]$ from a given node $i$ in $\mathbb{R}^d$ is proportional to $x^{d-1} dx$, node $i$ is connected to each of those nodes with probability $p(x)$, and we integrate up to the space diameter, which is $\sim n^{1/d}$. The lower integration limit is any positive constant.
If the integral $I_n$ diverges with $n$, then $\lambda$ must depend on $n$ and go to zero at $n \to \infty$ to yield an $n$-independent $\langle k \rangle$ above. But if $\lambda$ tends to zero, then $p_{ij} \sim \lambda e^{-\beta\varepsilon_{ij}}$ tends to zero as well, and so does clustering. The integral $I_n$ diverges if the monotonic function $f(x)$ does not grow sufficiently fast. In particular, $I_n$ diverges if $l_{\sup} = \limsup_{x\to\infty} f(x)/\ln x < d/\beta$, the first regime in Table I. On the other hand, if $I_n$ converges (in particular, it does so if $l_{\inf} = \liminf_{x\to\infty} f(x)/\ln x > d/\beta$, the second row in Table I), then $\lambda$ is strictly positive, and so are the $p_{ij}$ and clustering.
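The divergence criterion can be made concrete for the logarithmic energy $f(x) = c \ln x$, for which $I_n = \int_1^{n^{1/d}} x^{d-1-c\beta}\,dx$ has an elementary closed form. The Python sketch below evaluates it; the function name, the lower integration limit of 1, and the parameter values are illustrative assumptions.

```python
import math

def integral_I_n(n, d, beta, c=1.0):
    """I_n = integral of x^(d-1) * exp(-beta*f(x)) from 1 to n^(1/d), for f(x) = c*ln(x)."""
    upper = n ** (1.0 / d)      # the space diameter scales as n^(1/d)
    a = d - c * beta            # the integrand is x^(a-1)
    if abs(a) < 1e-12:
        return math.log(upper)  # marginal case c*beta = d: logarithmic divergence
    return (upper ** a - 1.0) / a

# beta < d/c: I_n grows with n, so lambda ~ <k>/I_n must vanish (zero clustering).
# beta > d/c: I_n saturates, so lambda stays positive (nonzero clustering).
for n in (10**2, 10**4, 10**6):
    print(n, integral_I_n(n, d=2, beta=1.0), integral_I_n(n, d=2, beta=3.0))
```

With $d = 2$, the first printed column of integrals keeps growing with $n$ while the second stays below 1, in line with the first and second regimes described above.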
Turning to small worldness now, the network is a small world only if it contains links connecting nodes located at distances $x_{ij}$ of the order of the space diameter $\sim n^{1/d}$, as well as at all other smaller distances. Let $l(x)$ be the distribution of link lengths $x$, defined as distances between linked nodes in the space. Observe that $l(x) \sim x^{d-1} p(x) \sim x^{d-1} e^{-\beta f(x)}$, which for $f(x) = c \ln x$ is a power law $l(x) \sim x^{-\delta}$ with $\delta = c\beta - d + 1$. Since networks are sparse, there are $\sim n$ links. The expected maximum value among $n$ samples from a power law with exponent $\delta$ is $\sim n^{\xi}$ with $\xi = 1/(\delta - 1)$ [30]. The network is a small world only if this expected maximum link length is larger than the space diameter $\sim n^{1/d}$, which implies $\xi > 1/d$ or $\beta < 2d/c$, cf. the second row in Table I with $l_{\sup} = c$. If $f(x)$ grows faster than logarithmically, $l_{\inf} = \infty$, then $l(x)$ decays faster than a power law, $\xi = 0$, and there are no long links at all, so that our networks are necessarily large worlds, the last regime in Table I.
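The necessary condition $\xi > 1/d$ can be checked mechanically. The following Python sketch assumes $f(x) = c \ln x$, so that the link length exponent is $\delta = c\beta - d + 1$ (the value consistent with the condition $\beta < 2d/c$ quoted above). The function names are hypothetical, and returning infinity for $\delta \le 1$ is a convention of this sketch for the regime in which the link length distribution is not a normalizable power law.

```python
import math

def link_length_exponent(beta, d, c=1.0):
    """delta in l(x) ~ x^(-delta), assuming f(x) = c*ln(x)."""
    return c * beta - d + 1.0

def max_link_length_exponent(beta, d, c=1.0):
    """xi = 1/(delta - 1): the longest of ~n links scales as n^xi."""
    delta = link_length_exponent(beta, d, c)
    if delta <= 1.0:
        return math.inf  # sketch convention: links reach across the whole space
    return 1.0 / (delta - 1.0)

def passes_small_world_condition(beta, d, c=1.0):
    """Necessary condition xi > 1/d, equivalent to beta < 2d/c."""
    return max_link_length_exponent(beta, d, c) > 1.0 / d

for beta in (1.0, 3.0, 4.0, 5.0):
    print(beta, passes_small_world_condition(beta, d=2))
```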
This logic is about the necessary conditions for small worldness, but they have been proven to be also sufficient [31-33], and we confirm all the results above in simulations in Fig. 1 (small worldness) and in Fig. 3 (clustering) in Sec. E. Figure 1 shows that the average shortest path length $l_s$ scales with the network size $n$ as $l_s \sim \ln^b n$ if $\beta < 2d$, and as $l_s \sim n^b$ if $\beta \ge 2d$. In the small-world regime $\beta < 2d$, the exponent $b$ in $l_s \sim \ln^b n$ is close to 1 for any $\beta < d$, while for $\beta \in (d, 2d)$ it is a growing function of $\beta$ that appears not to diverge but to approach some finite maximum value as $\beta$ approaches $2d$. In the large-world regime $\beta \ge 2d$, the exponent $b$ in $l_s \sim n^b$ is also a growing function of $\beta$, ranging in values from some minimum value at $\beta = 2d$ that does not appear to be zero, to its theoretical maximum $b = 1/d$ at zero temperature $\beta \to \infty$, corresponding to sharp RGGs. The nature of the small-to-large world phase transition at $\beta = 2d$ appears to be an interesting open question [32]. The simulations can hardly reach network sizes that are sufficiently large to provide any hints regarding whether this transition is continuous or discontinuous, yet the results in Fig. 1 suggest the latter, since a continuous transition would yield small-world $b \to \infty$ at $\beta \to 2d^-$ and large-world $b \to 0$ at $\beta \to 2d^+$.
Heterogeneous spatial networks. Instead of the chemical potential $\mu$, the Lagrange multiplier that fixes the expected average degree in the homogeneous ensemble, in the heterogeneous ensemble we have $n$ Lagrange multipliers $\alpha_i$ that fix the expected degree $k_i = \sum_j a_{ij}$ of each individual node to a desired value $\kappa_i$. The relations between $\kappa_i$ and $\alpha_i$ are documented in Sec. C. Here, we assume that the parameters $\kappa_i$ are hyperparameters, meaning they are random and sampled from a fixed distribution $\rho(\kappa)$, in which case we have the same hypergrandcanonical ensemble as in the homogeneous case above, except that the connection probability changes from (1) to
$$p_{ij} = \frac{1}{e^{\beta \varepsilon_{ij} + \alpha_i + \alpha_j} + 1}. \tag{3}$$
The degree distribution in this ensemble converges to the mixed Poisson distribution $P(k) = (1/k!) \int \kappa^k e^{-\kappa} \rho(\kappa)\, d\kappa$, whose shape "follows" the shape of $\rho(\kappa)$ [27,28]. Heterogeneous spatial network models of this type were first introduced in [18], and many other similar classes of models have been defined and studied since then [34-36].
The qualitative behavior of clustering (zero versus nonzero in the thermodynamic limit) is exactly the same in these heterogeneous models as in the homogeneous one. Indeed, the expression (2) for the average degree $\langle k \rangle$ changes to
$$\langle k \rangle \sim \iint d\alpha\, d\alpha'\, \rho(\alpha) \rho(\alpha') \int^{n^{1/d}} \frac{x^{d-1}\, dx}{e^{\beta f(x) + \alpha + \alpha'} + 1} \sim \bar\lambda I_n, \tag{4}$$
where $\bar\lambda = \langle e^{-\alpha} \rangle^2$, and $\rho(\alpha)$ is the distribution of Lagrange multipliers determined by the distribution of expected degrees $\rho(\kappa)$. Following exactly the same reasoning as in the homogeneous case, albeit applied to $\bar\lambda I_n$ instead of $\lambda I_n$, we thus conclude that clustering is zero or nonzero at $n \to \infty$ depending on whether $I_n$ diverges or converges. For $f(x) = \ln x$, for example, this means that the situation is exactly the same as in the homogeneous case: the clustering is zero if $\beta < d$ and nonzero if $\beta > d$.
Turning to small worldness, we assume henceforth that $f(x) = \ln x$. We do so not only to simplify the discussion, but also because we prove in Sec. B that $f(x) = \ln x$ is unique in the sense that this is the only possible form of $f(x)$ that does not induce any degree correlations other than the structural ones [30]. We also assume that the distribution $\rho(\kappa)$ of expected degrees $\kappa$ is the Pareto distribution
$$\rho(\kappa) = (\gamma - 1) \kappa_0^{\gamma - 1} \kappa^{-\gamma}, \tag{5}$$
where $\kappa \ge \kappa_0 > 0$ and $\gamma > 2$.
We note that the networks defined by (3,5) with $f(x) = \ln x$ were introduced in [18] and are equivalent to random hyperbolic graphs [19].
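Expected degrees distributed according to the Pareto law with density $(\gamma - 1)\kappa_0^{\gamma-1}\kappa^{-\gamma}$ for $\kappa \ge \kappa_0$ can be sampled by inverting its tail function $P(K > \kappa) = (\kappa_0/\kappa)^{\gamma - 1}$. The Python sketch below does so using only the standard library; the function names and parameter values are illustrative assumptions, and the mean formula $\kappa_0(\gamma-1)/(\gamma-2)$ is the standard Pareto mean, valid for $\gamma > 2$.

```python
import random

def sample_pareto(n, gamma, kappa0, seed=0):
    """Draw n expected degrees with density (gamma-1) * kappa0^(gamma-1) * kappa^(-gamma)."""
    rng = random.Random(seed)
    exponent = -1.0 / (gamma - 1.0)
    # 1 - rng.random() lies in (0, 1], which avoids raising zero to a negative power.
    return [kappa0 * (1.0 - rng.random()) ** exponent for _ in range(n)]

def pareto_mean(gamma, kappa0):
    """Mean of the Pareto distribution above; finite only for gamma > 2."""
    return kappa0 * (gamma - 1.0) / (gamma - 2.0)

kappas = sample_pareto(200_000, gamma=3.5, kappa0=1.0, seed=1)
print("empirical mean:", sum(kappas) / len(kappas))
print("theoretical mean:", pareto_mean(3.5, 1.0))
```

To target a given average degree, one would set $\kappa_0 = \langle k \rangle (\gamma - 2)/(\gamma - 1)$.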
The calculation of the link length distribution $l(x)$ in this case in Sec. D yields $l(x) \sim x^{-\delta}$ with $\delta = \beta - d + 1$ if $\beta < d(\gamma - 1)$, and $\delta = d(\gamma - 2) + 1$ otherwise. Following the same logic behind the necessary conditions for small worldness as in the homogeneous case, which says that the networks can be small worlds only if $\xi = 1/(\delta - 1) > 1/d$, we conclude that small worlds are possible if $\beta < 2d$ or $\gamma < 3$, or both. The networks are necessarily large worlds if $\beta > 2d$ and $\gamma > 3$. A more detailed analysis proves that these necessary conditions for small worldness are also sufficient [38-40]. In fact, the qualitative clustering/small-worldness yes/no diagram for any $\gamma > 3$ is exactly the same as in Table I for the homogeneous ensemble with $f(x) = \ln x$ and $l_{\inf} = l_{\sup} = 1$. If $\gamma < 3$, then our networks are not only small but also ultrasmall worlds, regardless of the value of $\beta$ [41,42].
TABLE II. The result summary for the heterogeneous networks with $f(x) = \ln x$ and Pareto $\rho(\kappa)$ as in [18]. The abbreviations are: HSCM: the hypersoft configuration model [37]; RHG: sharp random hyperbolic graphs in $\mathbb{H}^{d+1}$ [19]; USW: ultrasmall worlds; SW: small worlds; LW: large worlds; ZC: zero clustering at $n \to \infty$; PC: positive clustering at $n \to \infty$. If $f(x)$ grows slower or faster than logarithmically, then the networks are in the first and last rows, respectively. The $\gamma = \infty$ case is the homogeneous ensemble in Table I.
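The regime boundaries stated in the text for $f(x) = \ln x$ and Pareto $\rho(\kappa)$ can be collected into a small lookup, sketched below in Python. The function names are hypothetical, the labels reuse the abbreviations USW/SW/LW and ZC/PC, and the marginal values ($\beta = d$, $\gamma = 3$, and $\beta = 2d$ with $\gamma > 3$) are deliberately rejected because the text above does not classify them.

```python
def link_length_exponent(beta, gamma, d):
    """delta in l(x) ~ x^(-delta) for the heterogeneous model with f(x) = ln x."""
    if beta < d * (gamma - 1.0):
        return beta - d + 1.0
    return d * (gamma - 2.0) + 1.0

def classify(beta, gamma, d):
    """Return (world, clustering) labels following the regimes stated in the text."""
    if gamma <= 2.0:
        raise ValueError("the Pareto exponent must satisfy gamma > 2")
    if beta == d or gamma == 3.0 or (gamma > 3.0 and beta == 2 * d):
        raise ValueError("marginal case, not classified in this sketch")
    clustering = "PC" if beta > d else "ZC"  # nonzero clustering requires beta > d
    if gamma < 3.0:
        world = "USW"                        # infinite degree variance
    elif beta < 2 * d:
        world = "SW"
    else:
        world = "LW"
    return world, clustering

for beta, gamma in [(1.0, 3.5), (3.0, 3.5), (5.0, 3.5), (1.0, 2.5), (5.0, 2.5)]:
    print(beta, gamma, classify(beta, gamma, d=2), link_length_exponent(beta, gamma, d=2))
```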
Table II collects all the results, and Fig. 2 and Figs. 3-4 in Sec. E confirm them in simulations. One sees in Fig. 2 that if $\gamma > 3$, then the simulation results are qualitatively similar to the homogeneous case, except that they are noisier, and the values of exponent $b$ are significantly smaller. If the network size is small, such small values of $b$ can be deceiving, making these large worlds appear as small worlds.
We finally remark that the homogeneous ensemble is the $\gamma \to \infty$ limit of the heterogeneous one, because at $\gamma \to \infty$ the Pareto distribution $\rho(\kappa)$ becomes the degenerate distribution $\delta(\kappa - \kappa_0)$, so that $\rho(\alpha) \to \delta(\alpha + \beta\mu/2)$, recovering (1) from (3). In the infinite temperature $\beta \to 0$ limit, the connection probability (3) is equal to $1/(e^{\alpha_i + \alpha_j} + 1)$, which is the connection probability in the hypergrandcanonical or hypersoft configuration model that defines the unique ensemble of unbiased random graphs whose entropy is maximized across all graphs with a given degree distribution [37]. In the opposite zero temperature $\beta \to \infty$ limit, the ensemble is equivalent to random hyperbolic graphs with a sharp connectivity threshold [19]. Finally, the $\gamma \to \infty$, $\beta \to 0$ limit is ER.
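The two limits just described can be verified numerically. The Python sketch below assumes that the heterogeneous connection probability (3) has the form $1/(e^{\beta f(x) + \alpha_i + \alpha_j} + 1)$ with $f(x) = \ln x$, which is the form that both quoted limits require; the function names and test values are illustrative.

```python
import math

def p_heterogeneous(x, beta, alpha_i, alpha_j):
    """Assumed form of Eq. (3) with f(x) = ln x."""
    return 1.0 / (math.exp(beta * math.log(x) + alpha_i + alpha_j) + 1.0)

def p_homogeneous(x, beta, mu):
    """Fermi-Dirac form of Eq. (1) with f(x) = ln x."""
    return 1.0 / (math.exp(beta * (math.log(x) - mu)) + 1.0)

def p_hscm(alpha_i, alpha_j):
    """Hypersoft configuration model: the beta -> 0 limit quoted in the text."""
    return 1.0 / (math.exp(alpha_i + alpha_j) + 1.0)

beta, mu, x = 3.0, 0.7, 2.5
alpha = -beta * mu / 2.0  # all multipliers equal, as for the degenerate rho(alpha)
print(p_heterogeneous(x, beta, alpha, alpha), p_homogeneous(x, beta, mu))
print(p_heterogeneous(x, 0.0, 0.3, -1.1), p_hscm(0.3, -1.1))
```

Each printed pair should agree up to floating-point rounding.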
In summary, in spatial networks that are either homogeneous or have a finite degree distribution variance, the decay of the connection probability function with distance $x$ in a space of dimension $d$ must be between $\sim x^{-d}$ and $\sim x^{-2d}$ to yield sparse small worlds with nonzero clustering. If the degree distribution variance is infinite though, then the spatial networks are ultrasmall worlds with any connection probability, and they have nonzero clustering if this probability decays with $x$ faster than $x^{-d}$. Small worldness is linked to link energy and the distribution of link lengths. Networks are small worlds if they contain links of all lengths up to the space diameter. Clustering is dictated by the integrability of the connection probability function. If it is not integrable, then it must decay with the network size $n$ to let the network be sparse, but then clustering is zero. This is directly related to the important notion of projectivity [43,44]: if the connection probability depends on $n$, then the network model is not projective, leading to nonlocal effects that cannot be present in any real-world network [37,45,46]. We thus see that any realistic model of sparse spatial networks must necessarily have nonzero clustering, which is natural.
As a final comment, we have presented spatial network models as hypergrandcanonical ensembles, probabilistic mixtures of grand canonical ones. In the latter ensembles, the constraints under which the ensemble entropy is maximized are clear: the average energy and the average number of particles in the ensemble, which fix the average link length and average degree, or a sequence of expected degrees, respectively. What remains unclear is under what constraints the considered hypergrandcanonical ensembles are entropy maximizers. Are these constraints similar to the grand canonical ones, or are they completely different, perhaps related to the expected number of triangles in the network [47]? In other words, what are the unbiased maximum entropy spatial network models for sparse heterogeneous small worlds with nonzero clustering?
where $\rho(\alpha)$ is the distribution of $\alpha$ defined by $\rho(\kappa)$ given the relations between the $\kappa$'s and $\alpha$'s as documented in the subsequent section, and function $F$ is defined in Eq. (B2). To find under which conditions $P(\alpha'|\alpha)$ is independent of $\alpha$, we differentiate (B1) with respect to $\alpha$ and equate the result to zero to obtain Eq. (B3). Since the right-hand side of this equation does not depend on $\alpha'$, function $F$ is of the form $F(x) = a e^{bx}$, with $a$ and $b$ some constants. Define $q(x) \equiv e^{f(x)}$ and $z \equiv e^{-(\alpha + \alpha')/\beta}$ to rewrite the uncorrelatedness condition as Eq. (B4). That is, the network is uncorrelated at the level of hidden variables $\alpha$, $\alpha'$ whenever Eq. (B4) holds for any value of $z \in \mathbb{R}_+$, with $a$ and $b$ some constants.
We are next to prove that in the small-world regime where $f(x)/\ln x \in (l_{\inf}, l_{\sup})$ for $x > X$, with $l_{\inf} = \liminf_{x\to\infty} f(x)/\ln x$, $l_{\sup} = \limsup_{x\to\infty} f(x)/\ln x$, and some constant $X > 0$, the assumption that Eq. (B4) holds implies that $f(x) = c \ln x$ $\forall x \in \mathbb{R}_+$. We consider two cases.
Since $q(x) = e^{f(x)}$, we have Eq. (B7). If we define $\tilde q(x) \equiv q(x)/x^c$, Eq. (B4) can be written as Eq. (B8). If this equation holds for all values of $z \in \mathbb{R}_+$, then the integral in it must be a power of $z$ for any $z$, including $z \gg 1$. Let us split this integral in two, Eq. (B9), where $x_c(\varepsilon)$ is such that for any $x > x_c(\varepsilon)$ we have that $|\tilde q(x) - 1| < \varepsilon$. We thus see that function $\tilde q$ is bounded in the integration domain of the second integral in Eq. (B9). This implies Eq. (B10), whose right-hand side is a constant independent of $z$. At the same time, the limit $z \to \infty$ of the first integral in Eq. (B9) is zero because the domain of integration goes to zero and the integrand does not diverge at zero. Combining all these observations with Eq. (B8), we conclude that $b\beta = d/c$ and that Eq. (B11) holds. This is possible only if $\tilde q(x) = 1$, and hence $f(x) = c \ln x$.
b. Case $l_{\inf} \neq l_{\sup}$
Let us assume now that $l_{\inf}$ and $l_{\sup}$ are both positive and finite but not necessarily equal. The condition for uncorrelatedness in Eq. (B4) implies that there must exist a value of $b$ such that the limit exists. However, if $f(x)/\ln x$ is squeezed between $l_{\inf}$ and $l_{\sup}$ at $x \gg 1$, then this integral is squeezed between $z^{d/l_{\sup}}$ and $z^{d/l_{\inf}}$ at $z \gg 1$, and the limit does not exist, so that we arrive at a contradiction. We thus conclude that the only possibility is that $l_{\inf} = l_{\sup} = c$, so that $f(x) = c \ln x$.
Finally, we remark that $c$ can always be set to 1 by a proper choice of energy units.
Appendix C: Relations between $\kappa$ and $\alpha$
Here, we derive these relations for the heterogeneous hypergrandcanonical ensemble with the energy function $f(x) = \ln x$, the Poisson point process of intensity 1 in $\mathbb{R}^d$, and any distribution of expected degrees $\rho(\kappa)$. The cases with $\beta > d$ and $\beta < d$ must be considered separately.

Case β > d
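In this case, thanks to the integrability of the connection probability w.r.t. the spatial distance, we can work directly in the thermodynamic limit in $\mathbb{R}^d$. Since the space is homogeneous, we assume without loss of generality that a node with variable $\alpha$ is at the origin, and we want to calculate its expected degree $\kappa$. It is convenient to work in spherical coordinates in $\mathbb{R}^d$, in which the volume element is
$$dV = x^{d-1}\, dx\, dV_{S^{d-1}}, \tag{C1}$$
where $dV_{S^{d-1}}$ is the volume element on the unit $(d-1)$-sphere whose volume is
$$S_{d-1} = \frac{2\pi^{d/2}}{\Gamma(d/2)}. \tag{C2}$$
The expected degree $\kappa$ of our node is then
$$\kappa = S_{d-1} \int \rho(\alpha')\, d\alpha' \int_0^\infty \frac{x^{d-1}\, dx}{1 + e^{\alpha + \alpha'} x^{\beta}}. \tag{C3}$$
This result implies that the edge-state energy $\varepsilon_{ij}$ and chemical potential $\mu$ in the ensemble are given by
$$\varepsilon_{ij} = \ln \frac{x_{ij}}{(\kappa_i \kappa_j)^{1/d}}, \qquad \mu = \frac{1}{d} \ln \hat\mu, \qquad \hat\mu = \frac{\beta\, \Gamma(d/2) \sin(\pi d/\beta)}{2 \pi^{1 + d/2} \langle k \rangle}.$$
The connection probability can then be written as
$$p_{ij} = \frac{1}{e^{\beta(\varepsilon_{ij} - \mu)} + 1} = \frac{1}{1 + \left[ x_{ij}^d / (\hat\mu \kappa_i \kappa_j) \right]^{\beta/d}}. \tag{C9}$$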
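For $f(x) = \ln x$ and $\beta > d$, rescaling the integration variable in the expected degree reduces it to the radial integral $\int_0^\infty t^{d-1}\,dt/(1 + t^\beta)$, which has the standard closed form $\pi/[\beta \sin(\pi d/\beta)]$. The Python sketch below checks this closed form with a simple midpoint rule. The function names are hypothetical. The constant `mu_hat` is computed under the assumption that the connection probability is $1/\{1 + [x^d/(\hat\mu\kappa_i\kappa_j)]^{\beta/d}\}$ with the average degree equal to the mean of $\kappa$; it is a reading of the derivation, not a quotation of the original equations.

```python
import math

def radial_integral_numeric(d, beta, steps=100_000):
    """Midpoint-rule estimate of the integral of t^(d-1)/(1+t^beta) over (0, inf), beta > d.
    The tail (1, inf) is folded onto (0, 1) with the substitution t -> 1/t."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += (t ** (d - 1) + t ** (beta - d - 1)) / (1.0 + t ** beta)
    return total * h

def radial_integral_exact(d, beta):
    """Closed form pi / (beta * sin(pi*d/beta)), valid for beta > d."""
    return math.pi / (beta * math.sin(math.pi * d / beta))

def unit_sphere_volume(d):
    """Volume S_{d-1} = 2*pi^(d/2)/Gamma(d/2) of the unit (d-1)-sphere."""
    return 2.0 * math.pi ** (d / 2.0) / math.gamma(d / 2.0)

def mu_hat(beta, d, mean_degree):
    """Assumed constant: 1 / (S_{d-1} * radial integral * <k>)."""
    return 1.0 / (unit_sphere_volume(d) * radial_integral_exact(d, beta) * mean_degree)

print(radial_integral_numeric(2, 3.0), radial_integral_exact(2, 3.0))
print(mu_hat(beta=3.0, d=2, mean_degree=10.0))
```

For $d = 1$ the expression for `mu_hat` reduces to $\beta \sin(\pi/\beta)/(2\pi\langle k \rangle)$.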

Case β < d
In this case, the connection probability is not integrable w.r.t. distance, so that we have to take the finite size effects into account. This implies that the answer depends on a particular choice of the manifold family.
Yet recall that our general settings are such that for any $n$ the manifold volume is $n$, so that
$$n = V_d R^d, \tag{C10}$$
where $R$ is the linear size of the manifold, while $V_d$ is its volume at $R = 1$. For example, if the manifold is a $d$-torus, then $R$ is its side length and $V_d = 1$. If it is a $d$-sphere, then $R$ is its radius, and $V_d$ is the volume $S_d$ of the unit $d$-sphere:
$$S_d = \frac{2\pi^{(d+1)/2}}{\Gamma[(d+1)/2]}. \tag{C11}$$
We consider the case with the $d$-sphere for concreteness. Since the $d$-sphere is homogeneous, we assume without loss of generality that a node with variable $\alpha$ is at its north pole. The volume element on the $d$-sphere with $\theta$ the polar angle is given in Eq. (C12). The connection probability that follows depends on $n$ and tends to zero as $\sim 1/n^{1-\beta/d}$ since so does $\mu$. We also note that it cannot be written as a Fermi-Dirac distribution function, meaning that the energy of edges cannot be defined in the case $\beta < d$.

FIG. 1. The average shortest path length $l_s$ in the homogeneous spatial networks as a function of the network size $n$. This function is measured for different values of inverse temperature $\beta$ in the connection probability (1) with $f(x) = \ln x$ used to generate random networks on the $d = 2$-dimensional sphere of area $n$. The average degree in all these networks is fixed to $\langle k \rangle = 10$ by the appropriate choice of the chemical potential $\mu$, and the results are averaged over 10 random network realizations for each data point. The functions $l_s(n)$ are then fit with $a \ln^b n$ for $\beta < 2d$ (left panels) and with $a n^b$ for $\beta \ge 2d$ (right panels), and the exponents $b$ of these fits as functions of $\beta/d$ are shown in the bottom panels. The dashed red line is $\sim n^{1/2}$, the distance scaling in the two-dimensional sharp RGGs corresponding to $\beta \to \infty$.

FIG. 2. The average shortest path length $l_s$ in the heterogeneous spatial networks as a function of the network size $n$. The settings are the same as in Fig. 1, except that the networks are heterogeneous (3) with Pareto $\rho(\kappa)$ (5) and $\gamma = 3.5$.

1. If $f(x) = c \ln x$, then Eq. (B4) holds
We first notice that the energy function $f(x) = c \ln x$ is a sufficient condition for uncorrelatedness, since then Eq. (B4) trivially holds with $b\beta = d/c$ and
$$a = \int_0^\infty \frac{t^{d-1}\, dt}{1 + t^{c\beta}}. \tag{B5}$$
2. If Eq. (B4) holds, then $f(x) = c \ln x$
a. Case $l_{\inf} = l_{\sup} = c$
In this case, function $f(x)/\ln x$ has a limit,

Changing variables, we simplify this to $\kappa = S_{d-1}\, e^{-\alpha d/\beta} \ldots$ (C4)
Taking the average of Eq. (C4), we find the relation between the term $e^{-\alpha d/\beta}$ and the average degree $\langle k \rangle$, which plugged again into Eq. (C4) leads to the final relation between $\alpha$ and $\kappa$: $\ldots$
$dV_{S^d} = \sin^{d-1}\theta\, d\theta\, dV_{S^{d-1}}$, (C12)
so that the expected degree of our node is $\kappa = S_{d-1} R^d \int \rho(\alpha')\, d\alpha' \ldots \frac{R^{-\beta}}{d - \beta}\, e^{-(\alpha + \alpha')}$. (C14)
Using this expression in Eq. (C13), we conclude that for $n \gg 1$ the relation between $\alpha$ and the expected degree $\kappa$ is given by $\alpha = -\ln \kappa + \ldots$

FIG. 4. The average local clustering coefficient in homogeneous and heterogeneous spatial networks as a function of the network size $n$ for different values of inverse temperature $\beta$ and power-law exponent $\gamma$. The space is the $d = 2$-sphere of area $n$. The average degree is fixed to $\langle k \rangle = 10$. The results are averaged over 10 random network realizations for each data point. The bottom right panel shows the average clustering as a function of $\beta/d$ for the largest network size $n = 3 \times 10^5$.