Scaling Laws in Spatial Network Formation

Geometric constraints impact the formation of a broad range of spatial networks, from amino acid chains folding into protein structures to rearranging particle aggregates. How the network of interactions dynamically self-organizes in such systems is far from fully understood. Here, we analyze a class of spatial network formation processes by introducing a mapping from geometric to graph-theoretic constraints. Combining stochastic and mean field analyses yields an algebraic scaling law for the extent (graph diameter) of the resulting networks with system size, in contrast to the logarithmic scaling known for networks without constraints. Intriguingly, the exponent falls between that of self-avoiding random walks and that of space-filling arrangements, consistent with the experimentally observed scaling (of the spatial radius of gyration) for protein tertiary structures.

Most networks forming in the real world are spatially extended and often geometrically constrained. Common examples include volume exclusion in the dynamics of polymers, chemical interactions in folding proteins and local electromagnetic forces in ferrofluidic aggregates [1-6]. How geometric constraints impact the dynamic formation processes of spatial networks, and thereby their function, is far from fully understood.
In many physical, chemical and biological systems, interaction structure and geometrical arrangement are equally important [7], in particular for their dynamics. Key examples include proteins folding into their tertiary structures [8,9]. During the folding process, amino acids interact not only with their neighbours along the chain but also with units that are far apart in the chain but close in space [10,11]. On the level of abstract contact networks [12-14], the process of protein folding can thus be considered as adding interaction links to a network, akin to percolation [15-17], but spatially transforming the network at the same time.
In this Letter, we demonstrate that geometric constraints induce algebraic scaling laws in the formation of spatial networks, suggesting self-similar ('fractal') structures. We introduce a stochastic model that explicitly captures the essential impact of such geometric constraints on establishing spatial contacts and map them to constraints on graph-theoretic link additions. Combining probabilistic analysis with mean field calculations, we show that the extensions of the resulting networks exhibit an algebraic scaling law with system size. In stark contrast, network formation processes without such constraints exhibit logarithmic scaling [18], so geometric constraints qualitatively change the nature of the scaling law. Intriguingly, the algebraic scaling law per se as well as its exponent are consistent with the scaling of the experimentally observed spatial radius of gyration with the chain length of protein tertiary structures.
Geometrically constrained network formation. To understand basic principles underlying geometrically constrained network formation dynamics, consider an initial chain of identical, spatially extended units, each in contact with its nearest-neighbor units (as all units are identical, this is a special case of a coin graph [19]). For later analytic accessibility, we take the space to be two-dimensional and the chain to be closed to a single cycle such that initially the units are indistinguishable. The latter does not change the scaling behaviour, because folding an open chain results in a collection of closed cycles, as we will see below. This chain represents the original aggregate, such as an unfolded protein where the units are amino acids, or an initial contact sequence of ferrofluidic particles. In a time-discrete network forming process (Fig. 1), the units randomly come into contact with each other under the geometric constraints that in each step (i) no two units overlap and (ii) units in contact at some point in time stay in contact. The chain thus deforms non-locally each time a new contact forms (Fig. 1). The sequence of connections models the emergence of pairwise contacts between interacting units moving in space under the above constraints. In the model, new contacts keep forming until no additional contacts are consistent with the constraints. Thus, the resulting network is a collection of non-overlapping disks arranged in rigid triangles in two-dimensional space (or spheres arranged in rigid hexagonal layers in three-dimensional space). Checking whether this rigidity property can still be achieved for every newly established contact constitutes a non-local, computationally hard problem and is not readily feasible.
To analytically access the problem, we first map the spatial contact process with geometric constraints to a link addition process of network formation, with constraints on changes in the network topology only (Fig. 2). The map yields an approximate ensemble of networks that represent the spatial structure formation process. The topological constraints in the network model become: (a) Links can only form between two units that are part of the same face of the graph (region enclosed by a cycle in the network). This ensures that geometric constraint (i) is not violated by links crossing. (b) Links do not form across the outer face. This ensures that no unit can be enclosed by fewer than six other units (which is geometrically impossible) such that (i) stays satisfiable. (c) The maximum degree of each unit is six. This also ensures that (i) is not violated by forcing more than six units around one given unit. (d) Once connected by a link, pairs of units do not disconnect, representing geometric constraint (ii).
The spatial extension of an aggregate is often measured by its radius of gyration
$$R_g = \sqrt{\frac{1}{2N^2} \sum_{i,j=1}^{N} \left| \mathbf{r}_i - \mathbf{r}_j \right|^2} \sim N^{\nu}, \tag{1}$$
quantifying the average distance between any pair out of $N$ units. Here, $\mathbf{r}_i$ is the spatial position of unit $i \in \{1, \dots, N\}$ and $1/\nu$ is the scaling dimension. Real three-dimensional protein structures indeed exhibit an algebraic scaling law (Fig. 3a) with an exponent $\nu \approx 0.42 \pm 0.04$ above a lower bound $\nu_{\mathrm{SF}} = 1/3$ implied by compact space-filling aggregates [20-23] and below an upper bound $\nu_{\mathrm{RW}} = 3/5$ resulting from self-avoiding random walks in three dimensions without further restrictions [24-26], together yielding
$$\nu_{\mathrm{SF}} \le \nu \le \nu_{\mathrm{RW}}. \tag{2}$$
For spatially embedded networks where each unit occupies space of the same order of magnitude, we expect the diameter $D$ to increase linearly with spatial extension. Direct numerical simulations of the model processes for various system sizes indicate an algebraic scaling law as found for biological protein tertiary structures, see Fig. 3.
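As a small illustration of how the radius of gyration separates extended from compact arrangements, the following Python sketch evaluates it in the standard pairwise form, $R_g^2 = \frac{1}{2N^2}\sum_{i,j}|\mathbf{r}_i - \mathbf{r}_j|^2$, and in the equivalent centre-of-mass form. The function names and the two test configurations (a straight chain and a filled square block on a unit grid) are hypothetical choices made for this example only; they are not the protein data or the model configurations discussed in the text.

```python
import math

def radius_of_gyration(points):
    """Pairwise form: R_g^2 = (1 / (2 N^2)) * sum_{i,j} |r_i - r_j|^2."""
    n = len(points)
    total = 0.0
    for p in points:
        for q in points:
            total += sum((a - b) ** 2 for a, b in zip(p, q))
    return math.sqrt(total / (2 * n * n))

def radius_of_gyration_cm(points):
    """Equivalent form: R_g^2 = (1 / N) * sum_i |r_i - r_cm|^2."""
    n = len(points)
    dim = len(points[0])
    centre = [sum(p[k] for p in points) / n for k in range(dim)]
    squared = sum((p[k] - centre[k]) ** 2 for p in points for k in range(dim))
    return math.sqrt(squared / n)

def straight_chain(n):
    """Fully stretched arrangement (illustrative only): R_g grows like N."""
    return [(float(i), 0.0) for i in range(n)]

def square_block(side):
    """Space-filling arrangement in two dimensions: R_g grows like N^(1/2)."""
    return [(float(i), float(j)) for i in range(side) for j in range(side)]

def effective_exponent(points_small, points_large):
    """Slope of log R_g against log N between two system sizes."""
    ratio = radius_of_gyration_cm(points_large) / radius_of_gyration_cm(points_small)
    return math.log(ratio) / math.log(len(points_large) / len(points_small))
```

For the square block the effective exponent approaches the two-dimensional space-filling value 1/2, while the stretched chain gives an exponent close to 1; a self-similar aggregate would fall in between.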
The scaling exponent obtained from these simulations, $\nu \approx 0.62 \pm 0.04$, satisfies the same types of upper and lower bounds (Eq. 2) as experiments on proteins suggest: it lies between that of space-filling configurations (in two dimensions $\nu_{\mathrm{SF}} = 1/2$) and that of self-avoiding random walks ($\nu_{\mathrm{RW}} = 3/4$).
Network formation integrating constraints. To understand the emergence of this scaling law and estimate its exponent, we mathematically analyze the network formation in the simplified network model with graph-theoretic constraints (a)-(d) inherited from the geometric ones (i) and (ii).
Consider at time $t = 1$ an initial graph consisting of one cycle of $N$ units that evolves in a process in discrete time $t \in \{1, 2, \dots\}$, with exactly one link added at a time. Each new link divides one cycle into two smaller cycles. Such a process exclusively generates networks that are planar graphs consisting of cycles.
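The link-addition process with constraints (a)-(d) can be sketched in a few lines of Python. This is a minimal illustration, not the simulation code behind Fig. 3: it assumes that at each step a cycle that can still be split is chosen uniformly at random and that a link is then chosen uniformly among the admissible pairs in that cycle, which is one possible reading of "randomly". The names `admissible_chords` and `simulate` are hypothetical. Constraint (b) holds automatically here because links are only ever added inside existing cycles.

```python
import random

MAX_DEGREE = 6  # constraint (c)

def admissible_chords(cycle, degree, max_degree=MAX_DEGREE):
    """Index pairs (i, j) of non-adjacent units of one cycle that may be linked."""
    n = len(cycle)
    chords = []
    for i in range(n):
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue  # first and last unit are already neighbours on the cycle
            if degree[cycle[i]] < max_degree and degree[cycle[j]] < max_degree:
                chords.append((i, j))
    return chords

def simulate(n_units, seed=0, max_degree=MAX_DEGREE):
    """Add links until no admissible link remains; return (cycles, degree)."""
    rng = random.Random(seed)
    degree = [2] * n_units           # closed chain: two neighbours per unit
    cycles = [list(range(n_units))]  # inner faces as cyclic lists of units
    while True:
        # Assumption: uniform choice of a splittable cycle, then of a chord.
        options = []
        for index, cycle in enumerate(cycles):
            chords = admissible_chords(cycle, degree, max_degree)
            if chords:
                options.append((index, chords))
        if not options:
            return cycles, degree
        index, chords = rng.choice(options)
        i, j = rng.choice(chords)
        cycle = cycles[index]
        degree[cycle[i]] += 1
        degree[cycle[j]] += 1
        # The new link lies inside one face (constraint (a)) and splits it in two;
        # existing links are never removed (constraint (d)).
        cycles[index] = cycle[i:j + 1]
        cycles.append(cycle[j:] + cycle[:i + 1])
```

Each added link raises the total length of all cycles by two and the number of cycles by one. Without the degree cap the process ends with exactly N - 2 triangles; with the cap, some larger cycles may remain in this simplified sketch.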
How does the above scaling emerge? How do the constraints impact the structure formation process on the network level? The graph-theoretical diameter of the dual graph of a given network serves as a natural quantity measuring the network's extension. The vertices of the dual graph are the faces enclosed by the cycles of the original graph, with two vertices connected if the two cycles they result from are neighboring, that is, share an edge in the original graph. At time $t$, the diameter $D_t$ of the dual therefore equals the length of (one of) its longest paths, representing a longest sequence of neighboring cycles in the original graph. We call such a sequence a diameter path. The union of all diameter paths (all sequences of cycles of the same, largest length) in the original graph is called the diameter graph.
For small times $t$, the cycles are typically of different lengths; for larger times they become similar, and eventually all become triangles. Thus, for sufficiently large times $t$, the diameter of the network is proportional to that of the dual (Fig. 4). We thus take a mean field view and simply talk about the diameter, also when analyzing the scaling of the diameter of the dual. Since no two cycles share more than one link, and no unit of the original network becomes enclosed in any path (due to condition (b)), the resulting dual graph stays a tree at all times. The diameter graph thus is the union of all paths of cycles of length $D_t$. We note that the total number of cycles present at time $t$ equals $t$.
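The dual graph and its diameter can be computed directly from a list of cycles. The sketch below is illustrative only: the helper names are hypothetical, and the two-pass breadth-first search it uses is valid only when the dual is a tree, as is the case for the process considered here. The two example inputs are hand-made triangulations of a six-unit cycle.

```python
from collections import deque

def dual_tree(cycles):
    """Adjacency of the dual: two cycles are neighbours if they share an edge."""
    edge_to_cycles = {}
    for index, cycle in enumerate(cycles):
        for a, b in zip(cycle, cycle[1:] + cycle[:1]):
            edge_to_cycles.setdefault(frozenset((a, b)), []).append(index)
    adjacency = {index: [] for index in range(len(cycles))}
    for owners in edge_to_cycles.values():
        if len(owners) == 2:  # edge shared by two inner cycles
            first, second = owners
            adjacency[first].append(second)
            adjacency[second].append(first)
    return adjacency

def farthest(adjacency, source):
    """Breadth-first search; returns a most distant vertex and its distance."""
    distance = {source: 0}
    queue = deque([source])
    last = source
    while queue:
        node = queue.popleft()
        last = node  # vertices leave the queue in order of increasing distance
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return last, distance[last]

def dual_diameter(cycles):
    """Diameter of the dual tree via two breadth-first searches."""
    adjacency = dual_tree(cycles)
    end, _ = farthest(adjacency, 0)
    _, diameter = farthest(adjacency, end)
    return diameter

def count_end_cycles(cycles):
    """Number of degree-one vertices of the dual."""
    return sum(1 for nbrs in dual_tree(cycles).values() if len(nbrs) == 1)

# Two triangulations of a six-unit cycle (illustrative inputs).
fan = [[0, 1, 2], [0, 2, 3], [0, 3, 4], [0, 4, 5]]    # all links meet at unit 0
star = [[0, 1, 2], [2, 3, 4], [4, 5, 0], [0, 2, 4]]   # central triangle 0-2-4
```

In the fan the four triangles form a chain in the dual, whereas in the star three triangles attach to a central one, so the same number of units and links yields different diameters.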
We now derive a recurrence relation for the average diameter $D_t$ to then estimate how the final diameter scales with the chain length. Let $V_t$ be the expected number of cycles on the diameter graph and let $E_t$ be the number of end cycles (degree-one vertices of the dual) on any diameter path, as shown in Fig. 4. The average diameter $D_t$ evolves with time in three different ways. First, if a new link divides a cycle that is not part of the diameter graph, the diameter $D_t$ stays unchanged. Second, if a new link divides an end cycle of the diameter graph (Fig. 4), which in mean field approximation occurs with probability $E_t/t$, $D_t$ grows by one. Finally, if a new link divides a cycle of the diameter graph that is not an end cycle, which analogously occurs with probability $(V_t - E_t)/t$, $D_t$ grows by one if the splitting is transverse to a diameter path, which in turn occurs with some probability $P^+_t$; otherwise, if the splitting is parallel to the diameter path, $D_t$ remains unchanged, compare Fig. 5. We thus obtain the recurrence relation
$$D_{t+1} = D_t + \frac{E_t}{t} + P^+_t \, \frac{V_t - E_t}{t} \tag{4}$$
for the expectation value of the diameter. It remains to estimate $P^+_t$, $E_t$ and $V_t$ and then to iterate the recurrence relation in time to obtain the diameter of the final network.
Approximating $P^+_t$. To find $P^+_t$, we first compute the probability $P_t(D_t\ \text{increases} \mid \ell)$ that the diameter increases given that a link is added in a cycle of length $\ell$ on the diameter path [27]. There are two ways a link can be added, see Fig. 5. If the new link splits the cycle parallel to the diameter path, the newly created cycle becomes a side arm of the path, leaving the diameter unchanged. Alternatively, if the new link splits the cycle transversally to the direction of the path, the diameter extends by one. Let $h_1$ and $h_2 = \ell - h_1$ be the numbers of units in the two fractions transversal to the diameter path (Fig. 5). Increasing the diameter thus requires connecting one of the $h_1$ units to one of the other $h_2$ units. Then
$$P(D_t\ \text{increases} \mid \ell, h_1) = \frac{2\left[h_1(\ell - h_1) - 2\right]}{\ell(\ell - 3)},$$
because there are $\ell(\ell - 3)/2$ ways of connecting any two non-adjacent units in the cycle and $h_1(\ell - h_1)$ ways of forming a transversal connection, the term "$-2$" taking care of the two links that already exist between the two fractions of the original cycle. As every splitting of the cycle into two parts is equally likely for part sizes $h_1 \in \{1, \dots, \ell - 1\}$, we find
$$P_t(D_t\ \text{increases} \mid \ell) = \frac{1}{\ell - 1} \sum_{h_1 = 1}^{\ell - 1} \frac{2\left[h_1(\ell - h_1) - 2\right]}{\ell(\ell - 3)} = \frac{\ell + 4}{3\ell}. \tag{5}$$
Finally, the probability $P_t(\ell)$ of picking a cycle of length $\ell$ on the diameter path depends on the entire past history and cannot be rigorously derived. We thus approximate $P^+_t = \sum_{\ell = 4}^{N} P_t(D\ \text{increases} \mid \ell)\, P_t(\ell) \approx P_t(D\ \text{increases} \mid \langle \ell \rangle_t)$, where $\langle \ell \rangle_t$ is the expected cycle length, by its rigorous lower bound given by Jensen's inequality.
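The average over the part sizes $h_1$ can be checked with exact rational arithmetic. The sketch below assumes the counting stated in the text, namely $\ell(\ell-3)/2$ possible links in a cycle of length $\ell$ and $h_1(\ell - h_1) - 2$ transversal ones; the closed form $(\ell + 4)/(3\ell)$ against which it can be compared is our own evaluation of that sum. The function names are hypothetical.

```python
from fractions import Fraction

def p_increase_given_split(length, h1):
    """P(D increases | l, h1) = 2 (h1 (l - h1) - 2) / (l (l - 3))."""
    transverse = h1 * (length - h1) - 2          # links crossing the diameter path
    possible = Fraction(length * (length - 3), 2)  # all links that can be added
    return Fraction(transverse) / possible

def p_increase(length):
    """Uniform average over the part sizes h1 = 1, ..., l - 1."""
    values = [p_increase_given_split(length, h1) for h1 in range(1, length)]
    return sum(values) / (length - 1)

def p_increase_closed_form(length):
    """Evaluated sum: (l + 4) / (3 l), decreasing from 2/3 at l = 4 towards 1/3."""
    return Fraction(length + 4, 3 * length)
```

Because this probability is a convex function of the cycle length, evaluating it at the mean length gives a lower bound on its mean, which is the Jensen argument used above.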
We take the desired expected cycle length for sufficiently small times $t$ to be the average length $\langle \ell \rangle_t = [N + 2(t - 1)]/t$ of all cycles at time $t$. As no links can be added to cycles of fewer than $\ell = 4$ units, we take $\langle \ell \rangle_t = 4$ once the previous average reaches that value from above, $[N + 2(t - 1)]/t \le 4$, i.e., for $t \ge N/2$ for sufficiently large $N$, yielding
$$P^+_t \approx \begin{cases} \dfrac{\langle \ell \rangle_t + 4}{3 \langle \ell \rangle_t} = \dfrac{N + 6t - 2}{3\,(N + 2t - 2)} & \text{for } t < N/2, \\[2ex] \dfrac{2}{3} & \text{for } t \ge N/2. \end{cases} \tag{6}$$
We now approximate the detailed dynamics (6) by its time average over the duration of the process,
$$P^+ = \langle P^+_t \rangle_t.$$
Next we estimate the average number of cycles in the diameter graph,
$$V_t = D_t + \sum_b P_b\, V_b(t),$$
given by two contributions, the average diameter and the summed sizes $V_b(t)$ of all side branches $b$ of an arbitrary but fixed diameter path, weighted with the probability $P_b$ that branch $b$ creates an alternative path overlapping with the original. As longer side chains are exponentially suppressed, the second term is negligible for the scaling in the limit of large $N$ (see Supplemental Material for more details).
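The time average of $P^+_t$ can be evaluated numerically. The following sketch rests on two assumptions that are ours rather than stated in the text: that $P_t(D\ \text{increases} \mid \ell)$ takes the closed form $(\ell + 4)/(3\ell)$ obtained by carrying out the uniform average over $h_1$, and that the time average weights all steps $t = 1, \dots, N - 2$ equally. Under these assumptions the large-$N$ limit of the average is $5/6 - (\ln 2)/3 \approx 0.60$, which the code uses as a reference value. All function names are hypothetical.

```python
import math

def mean_cycle_length(n_units, t):
    """Average cycle length (N + 2 (t - 1)) / t, held at 4 once it gets there."""
    return max(4.0, (n_units + 2 * (t - 1)) / t)

def p_plus(n_units, t):
    """Mean field estimate of P+_t, assuming P(increase | l) = (l + 4) / (3 l)."""
    length = mean_cycle_length(n_units, t)
    return (length + 4.0) / (3.0 * length)

def time_averaged_p_plus(n_units):
    """Assumption: uniform average over the steps t = 1, ..., N - 2."""
    final_time = n_units - 2
    total = sum(p_plus(n_units, t) for t in range(1, final_time + 1))
    return total / final_time

# Large-N limit of the average under the assumptions above (our evaluation).
LARGE_N_LIMIT = 5.0 / 6.0 - math.log(2.0) / 3.0
```

The resulting value lies between 1/2 and 2/3 and is close to the exponent found in the direct simulations.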
Iterated recurrence and scaling law. Because the side-branch contribution is negligible, $D_t$ and $V_t$ scale in the same way, and $E_t$ can therefore be neglected in Eq. (4) without changing the scaling behaviour. With $V_t \approx D_t$ and $P^+_t$ replaced by its time average $P^+$, the recurrence (4) becomes
$$D_{t+1} = \left(1 + \frac{P^+}{t}\right) D_t.$$
The solution through the initial condition $D_2 = 1$ is $D_t = 2\,\Gamma(P^+ + t)/[\Gamma(P^+ + 1)\,\Gamma(t)] \approx [2/\Gamma(P^+ + 1)]\; t^{P^+}$, where $\Gamma(\cdot)$ is the Gamma function. In the limit of large $t = N - 2$, a power law with specified exponent results,
$$D_{\mathrm{final}}(N) \sim N^{\nu_{\mathrm{theory}}}, \qquad \nu_{\mathrm{theory}} = P^+.$$
As found above already through direct numerical simulations, the scaling law now also obtained analytically is consistent with the experimentally obtained law (1) for proteins, with a scaling exponent between the upper and lower bounds (2), compare Fig. 3b with Fig. 3a. Interestingly, the generally concave form of the dynamics of $P^+_t$ (see SM) indicates that any estimate of the time average $\nu_{\mathrm{theory}} = P^+$ must lie within an interval $\nu_{\mathrm{theory}} \in [\nu_{\min}, \nu_{\max}]$, where $\nu_{\max} < 2/3$ and $\nu_{\min} > 1/2$. Thus, even without the approximation of the dynamics (6), an algebraic scaling is guaranteed and its exponent is above that for space-filling aggregates, $\nu_{\mathrm{theory}} > \nu_{\mathrm{SF}}$.
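That a multiplicative recurrence of the form $D_{t+1} = (1 + P^+/t)\,D_t$ produces a power law with exponent $P^+$ can be verified numerically. The sketch below assumes exactly this simplified recurrence with a constant $P^+$ and leaves the starting time and starting value as free parameters, since only the exponent, not the prefactor, matters for the scaling. The function names and the value 0.6 used in the checks are illustrative.

```python
import math

def iterate_diameter(p_plus, t_start, d_start, t_end):
    """Iterate D_{t+1} = (1 + p_plus / t) * D_t from t_start up to t_end."""
    d = d_start
    for t in range(t_start, t_end):
        d *= 1.0 + p_plus / t
    return d

def closed_form(p_plus, t_start, d_start, t_end):
    """Product of (s + p_plus) / s for s = t_start, ..., t_end - 1 via Gamma functions."""
    log_ratio = (math.lgamma(t_end + p_plus) - math.lgamma(t_end)
                 - math.lgamma(t_start + p_plus) + math.lgamma(t_start))
    return d_start * math.exp(log_ratio)

def effective_exponent(p_plus, t):
    """Slope of log D against log t between the times t and 2 t."""
    ratio = closed_form(p_plus, 2, 1.0, 2 * t) / closed_form(p_plus, 2, 1.0, t)
    return math.log(ratio) / math.log(2.0)
```

The effective exponent converges to $P^+$ for large $t$ because $\Gamma(t + P^+)/\Gamma(t) \approx t^{P^+}$, independently of the initial condition.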
The scaling law intrinsically results from the geometric constraints: without such constraints, the process analyzed above exactly reduces to the formation of Watts-Strogatz small-world networks with new links randomly added to a circular graph [28-30]; for sufficiently many links, the diameter of such networks exhibits logarithmic scaling that is thus inconsistent with the algebraic scaling we found. Roughly speaking, due to the geometric constraints, any new link between two units drastically increases the probability of creating further links in these units' respective neighborhoods. As a consequence, the structures cannot be arbitrarily compact. Our numerical results as well as the analytic derivations above indicate that the spatial extent is modified qualitatively, changing a logarithmic to an algebraic scaling law.
Conclusion and outlook. Taken together, we uncovered an algebraic scaling law for network formation processes under geometric constraints. We have analyzed a spatial network formation model by mapping geometric constraints in space to purely graph-theoretical constraints on the topological changes of a network. Direct numerical simulations as well as analytic mean field calculations strongly indicate a scaling law with the graph diameter growing algebraically with system size, representing spatially self-similar ('fractal') networks. This algebraic scaling law is largely independent of the details of the model setup and clearly induced by geometric constraints. Even without the time-averaging approximation of the dynamics (6), an algebraic scaling is guaranteed, exhibiting an exponent larger than that of a space-filling aggregate, $\nu > \nu_{\mathrm{SF}}$, thus indicating self-similar features. Both the algebraic scaling per se and its exponent are consistent with the experimentally observed scaling of protein tertiary structures in real space [20,21,23]. More generally, our results may suggest that geometric constraints generically induce algebraic (rather than logarithmic) scaling laws of networks forming in space.
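For comparison, the unconstrained counterpart is easy to sketch: links are added between randomly chosen pairs of units on a ring, with no restriction on faces or degrees. This is only a rough illustration of the small-world effect, not a reproduction of the results cited above; the function names, the system sizes and the choice of one added link per unit are assumptions made for the example.

```python
import random
from collections import deque

def ring_with_random_links(n_units, n_links, seed=0):
    """Ring of n_units with n_links extra links between random non-adjacent pairs."""
    rng = random.Random(seed)
    adjacency = [{(u - 1) % n_units, (u + 1) % n_units} for u in range(n_units)]
    added = 0
    while added < n_links:
        u, v = rng.sample(range(n_units), 2)
        if v not in adjacency[u]:  # skip pairs that are already linked
            adjacency[u].add(v)
            adjacency[v].add(u)
            added += 1
    return adjacency

def eccentricity(adjacency, source):
    """Largest breadth-first distance from source to any other unit."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return max(distance.values())

def diameter(adjacency):
    """Graph diameter as the maximum eccentricity over all units."""
    return max(eccentricity(adjacency, u) for u in range(len(adjacency)))
```

A bare ring of $N$ units has diameter $N/2$ for even $N$; adding on the order of $N$ unconstrained random links reduces it to a value far below that, in line with the logarithmic scaling discussed above.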

FIG. 1. (color online) Mapping spatial structure formation onto network formation. Units coming into spatial contact (green dashed lines) induce additional links on the network level. The network becomes more and more compact as links are added, in two dimensions yielding a subgraph of the triangular grid. For illustration, panels show networks of $N = 11$ units for time steps $t \in \{1, 2, \dots, 8\}$.

FIG. 2. (color online) Mapping constraints from spatial geometry to network topology. Links in the contact graph form when two randomly chosen units come into spatial contact, subject to the constraints (a)-(d) specified in the text. Process 1: adding a link (green dashed line) is allowed because all conditions (a)-(d) are satisfied. Process 2: adding a link (red dashed line) is forbidden due to condition (a), to avoid overlapping units. Process 3: adding a link on the outer face is forbidden due to condition (b), to avoid the possibility that a unit (here the one shaded yellow) becomes enclosed by fewer than six other units through a link added in a subsequent step (red dotted line).

FIG. 3. (color online) Algebraic scaling laws in spatial network formation. (a) Scaling of chain lengths of experimentally analyzed proteins vs. their radius of gyration (Eq. 1) (37162 data points from [20], log-binned, with error bars indicating standard deviations). Best fits suggest algebraic (red) rather than logarithmic (gray) scaling; the logarithmic fit uses $R_g = a \ln(bN + c) - a \ln(c)$, ensuring that $\lim_{N \to 0} R_g(N) = 0$. (b) Algebraic scaling of the graph diameter $D_{\mathrm{final}}(N)$ as derived in this Letter (orange line), plotted vs. the chain length $N$. Black dots indicate 450 stochastic realizations of network formation processes (uniformly sampled on a logarithmic chain length scale, binned and evaluated as in (a)), showing the diameter of the original graph with best algebraic fit (red line). The algebraic scaling law, with the (inverse) scaling dimension $\nu$ lying between that of self-avoiding random walks $\nu_{\mathrm{RW}}$ (green dashed lines) and that of space-filling aggregates $\nu_{\mathrm{SF}}$ (blue dotted lines), is consistent with biological data but inconsistent with logarithmic scaling as expected from network formation processes without geometric constraints.

FIG. 4. (color online) Diameter path, diameter graph and end cycles. A diameter path is a sequence of cycles of maximum length (here $D_t = 7$, indicated by the dashed red line). For large graphs with defined average cycle length, $D_t$ is proportional to the diameter of the original graph (black dots, black solid lines; pink solid lines indicate the diameter). The diameter graph is the union of all such diameter paths (all shaded regions). $V_t$ denotes the number of cycles on the diameter graph (here $V_t = 12$) and $E_t$ the number of end cycles (with only one neighbour) on any diameter path (here $E_t = 5$, shaded light rose).

FIG. 5. (color online) Diameter-increasing vs. diameter-conserving link addition. Example illustrating three cycles of length $\ell$, $\ell'$ and $\ell''$ along the diameter graph, with the dashed lines signifying the rest of the network. Adding a link (red) parallel to the diameter path leaves the diameter constant and creates a branch. Adding a link (blue) transverse to the path (and thus parallel to the edges indicated by wiggled lines) increases the diameter by one. Out of the $\ell(\ell - 3)/2$ potential links to add, $h_1 h_2 - 2$ may be added transversely.