Emergence of complex network topologies from flow-weighted optimization of network efficiency

Transportation and distribution networks are a class of spatial networks that have been of interest in recent years. These networks are often characterized by the presence of complex structures such as central loops paired with peripheral branches, which can appear both in natural and man-made systems, such as subway and railway networks. In this study, we investigate the conditions for the emergence of these non-trivial topological structures in the context of human transportation in cities. We propose a minimal model for spatial networks generation, where a network lattice acts as a spatial substrate and edge velocities and distances define an effective temporal distance which quantifies the efficiency in exploring the urban space. Complex network topologies can be recovered from the optimization of joint network paths and we study how the interplay between a flow probability between two nodes in space and the associated travel cost influences the resulting optimal network. In the perspective of urban transportation we simulate these flows by means of human mobility models to obtain Origin-Destination matrices. We find that when using simple lattices, the obtained optimal topologies transition from tree-like structures to more regular networks, depending on the spatial range of flows. Remarkably, we find that branches paired to large loops structures appear as optimal structures when the network is optimized for an interplay between heterogeneous mobility patterns of small range travels and longer range ones typical of commuting. Finally, we show that our framework is able to recover the statistical spatial properties of the Greater London Area subway network.


I. INTRODUCTION
Cities represent one of the most fascinating man-made complex systems, exhibiting complex features ranging on different scales: from their structure and dynamical behavior, up to the scaling of socio-economic factors with their size [1][2][3][4][5].These features represent a strong hint towards the existence of universal underlying mechanics behind apparently very different cities [6][7][8].Out of these structural properties, one of the most relevant, as it plays a fundamental role mediating the complex interplay between human dynamics [9,10] and mobility in urban context, are transportation networks [11][12][13][14][15].These networks are a class of spatial networks whose properties have been investigated in the literature during the last two decades [14,16].In particular, they have been studied under the lens of optimality conditions and minimization of cost-based functionals [16], in order to identify specific features behind efficient networks.The concept of optimal networks [2] and energy-like minimization [17] has its natural understanding in the physics language.States of a system which minimize a functional defining trade-offs between system's observables (e.g., free energy) represent the most likely to be observed states of many real world systems.While in some complex systems, such as cities, these physical variables can not be derived from first * Corresponding author: manlio.dedomenico@unipd.itprinciples, these analogies and concepts can still offer a valid perspective and provide an embedding of these systems in a space where the interplay between their structure and dynamics can be unfolded and better understood.Simple laws have been studied [16,18] to better understand the emergence of hierarchy and the role of traffic in the network state.Moreover, global and local optimization criteria lie in the evolution of man-made systems where policy makers and planners can adopt some of these criteria in their plans [14].
Transportation networks are often characterized by the presence of complex structures [19][20][21] such as loops paired with branches [22], which can appear both in natural [23] and man-made systems [14], like railway and subway networks.These structures represent the key topological elements behind efficient public transportation systems [20].In this study we investigate the conditions for the emergence of these non-trivial structures [18,24] in the context of human transportation in cities.We aim to reconstruct these topologies by means of an optimal configuration [25] of the network state.Under the assumption of a fixed total cost and a limited set of high-capacity connections (e.g., a constraint in the expenditure available on infrastructure), the optimal configuration is the assignation of connections' velocities, or edges' weights, such that the joint amount of time required to travel between two nodes is minimized for all pairs of nodes.Moreover, as these networks represent the arteries in urban exploration/navigation via public transportation, we study the role of traffic [18,26] in weighting the importance of specific set of connected edges (paths).We model the urban morphological structures which generate heterogeneous distributions of human mobility in space, biasing these optimal networks to converge towards specific topological features.We aim to explore the minimum requirements and the conditions for these optimization processes to reproduce the empirical structures aforementioned.
At variance with the recent works on network efficiency, we adopt some fundamentally different modeling choices.We evaluate the efficiency in terms of time necessary to explore the network, where edges' weights act as travel speeds.We also weight path efficiency by the traffic probability between nodes.The underlying network lattice (as represented in its simplest form by the triangular lattice in the next sections) acts as a substrate that allows the network to evolve [27] and possibly exhibit the network topological features typical to real world systems.On this framework, we show how introducing simple probabilities biasing the optimal efficiency between points in space force a transition between a tree-like topology and a network resembling a simple lattice.We show also that the modeling of traffic-like flows forces the emergence of preferential shared paths in space.The optimal configuration of these shared paths leads to complex topologies, which ultimately shows features seen in real systems.Features such as a bi-modality in the edges' velocity distribution, characteristic of multilayered transportation, and the central core with loops paired with branches typical of subway systems [20,22] are recovered.We finally show an application of the model within the Greater London Area, finding similarities of the optimal model with its London Underground network.

II. FRAMEWORK FOR URBAN SPATIAL MORPHOLOGY
We introduce here a general framework for spatial networks with the aim of recovering a minimal model for urban morphologies that encode both transportation properties and urban features such as population and density of points of interest (POIs).To this aim, we begin from the definition of a network substrate which acts as an effective discretization of the spatial dimension.Its simplest form can be found in an hexagonal 2-dimensional tiling [28] and its planar dual, the triangular lattice.More formally, in this network substrate each tile in space is represented by a node, connected to its set of neighboring nodes (see Fig. 1).The existence of a physical edge between nodes/tiles i and j is encoded in the adjacency matrix A where A ij = 1 if the regions are neighbors in the lattice.Distances and metrics are therefore computed on top of this network substrate and are biased by nodes' features.Nodes of this network can encode spatial features at the urban scale, such as population or amenities' density in a given node.We therefore have a minimal representation of a urban morphological structure (see Fig. 1), and a network substrate that acts as a transportation system and can be optimized to generate optimal transportation networks [27].The path-based temporal distance on top of the transportation network acts as the fundamental metric we aim to minimize.The rationale behind a network-based distance is grounded on the assumption that in the context of public transportation, urban systems are not navigated by considering geographical distance but rather by evaluating the travel-time between departure and arrival.More specifically, multi-layer transportation networks [11,21] are characterized by layers having a hierarchical organization with different characteristic speeds [29].Thus, an effective temporal distance becomes fundamental in determining accessibility and efficiency in urban space exploration.
In this model, we denote e as an edge in the network, and w e as the associated edge's weight which can be seen as a velocity of the edge in the transportation network.d e is the euclidean distance of edge e between the nodes it is connecting; here edge weights are visually mapped as widths of the links.Information about edges distance in this framework can be relevant when generalizing to the case of random spatial networks where edges have different lengths.In the case of a general non-spatial network, where there is no notion of spatial distances, the model can be adapted by fixing d e = 1.Finally, we define Ω Γij as the set of paths connecting the two nodes.We then maximize the efficiency of this underlying substrate.The transportation efficiency between two nodes i−j is computed as a cost in terms of time [30], and we do not account for congestion, which can be introduced in a possible extension of this framework.We find the path (a set of connected edges starting from source node i and ending in destination node j) with the smallest cumulative time, where the time delay introduced by choosing en edge e is measured as a ratio between its euclidean distance and its weight, representing a proxy of speed.Refer to Fig. 1 for a graphic depiction.Here G({w e }) is used to indicate the network configuration with the associated set of edges weights {w e }.We therefore aim to find the assignment of weights {w e } as a trade-off between net- Loop Dimension vs β: Minimum cycle basis is used as a network's observable to study the appearance of loops.For each point the median and its absolute deviation are shown.A) The average size (number of edges) of the loops that constitute the cycle basis.B) Cycle basis dimension as number of loops.The optimal network ranges from a tree structure to a lattice-like with small loops, as the probability of long range movement decreases (large β).A transition in the cycle basis property is observed at β ∼ 1.0 for the triangular lattice under study, where the optimal network results in an intermediate state with large loops.
work efficiency in transportation, by minimizing the set of costs {c ij } of travelling between pair of nodes i − j where each element c ij is defined as: and in absence of further information, the optimization procedure is the equivalent of minimizing the travel costs ij c ij .Here we add a novel ingredient, in which we couple the optimization of the network temporal distances with a traffic flow or probability between pairs of nodes.Operationally, when dealing with real world Origin-Destination (OD) matrices, this probability can be then mapped to a traffic T ij between two points.T ij represents the probability of a person from node i to travel to node j, and a traffic can be recovered when information about populations in source and target nodes is added.T ij effectively acts as a rank in the importance of a specific path in the network.As paths connecting different pairs of nodes may share common edges of the network substrate, complex topologies emerge from the shared paths jointly optimizing the network efficiency.The flow-weighted transportation efficiency therefore becomes: We also require that the total network infrastructure cost, defined as the cumulative sum of edges weights per unit length, multiplied by edge distance C G = e∈G d e w e is conserved.This is a generalization of a standard optimization process, in the sense that when T ij = 1, ∀(i, j), the efficiency is optimized for all possible trip pairs (i, j) with equal importance, where the Minimum Spanning Tree often represents the optimal solution [14].
Before tackling the problem of traffic-like (OD) flows, we study a simpler definition of T ij , which allows to understand the role of distance in the optimization process, in absence of other nodes' features: The coefficient β appearing in Eq. 3 is introduced as a penalizing parameter and determines how relevant is the pair-wise distance d ij when computing probabilities.We can understand it as the inverse of a characteristic traveling distance for an agent on the network β ∼ 1 d0 .While several alternatives on the integration of distance in spatial-dependent probabilities (such as power-laws ) can be employed, we focus on the exponential dependence as it represents the foundational result from the maximum entropy derivation of gravity flows [31].The introduction of gravity-like flows will be discussed in Section IV.
In the following, we introduce the application of the model on simple substrates to explore the role of β in absence of spatial urban features.

III. OPTIMIZATION OF SIMPLE NETWORK SUBSTRATES
In order to asses the role of the characteristic distance parameter β in the emergence of specific topologies, we compute networks statistics on a set of generative models for both spatial and non spatial networks.As hexagonal tiling of space is preferable when an analysis includes aspects of connectivity [28], the first model we study is a triangular lattice.The reason behind this choice is that it represents the planar dual [14,32] of the hexagonal lattice.Therefore, as space is discretized in hexagonal tiles, the spatial network connecting its centers is the triangular lattice, which is isotropic and presents less equivalent degenerate paths of a rectangular lattice.As a direct reference to hexagonal tiling, we refer to this network as HEX (see Fig. 2).We also extend the analysis also to the case of a random network model where nodes are not embedded in a metric space.Specifically, we study an Erdős-Rényi (ER) network topology, where the definition of distance between nodes L ij can be defined in terms of topological shortest path distance [33].
As a first benchmark we simplify flows as a spatial probability T ij = p ij that decays exponentially with distance and does not consider nodes features, the resulting equation for p ij is: Where i =j d ij is the average distance of points in the network and acts as a normalization factor (euclidean distance r in case of a spatial network or topological L for the ER network).
Therefore p ij encodes how much of the nearby space is explored by a single source node.An illustration of the spatial dependence of target probabilities and samples of the resulting optimal topologies are presented in Fig. 2.
For a range of β values the optimization process is performed on an ensemble of these models.To assess the emergence of complex structures we observe the number of loops that emerge in the optimal state.This measure is relevant in the context of spatial networks, where loops break the symmetry introduced by optimal structures such as trees.We compute the minimum cycle basis set as a metric to observe the emergence of loops [36]: i.e. the minimum set of loops (where a single loop is encoded in a set of edges that defines a closed path in the graph) such that any other closed path in the network can Minimal models of urban morphology and attractiveness distributions under study (3-Points and Exponential decay): Morphology of POIs, where attractiveness Wj is mapped with color intensity (yellow being higher).Optimized edges weights distributions P ({we}) are characterized by the bi-modal nature that reveals the multi-layered structure of the optimal transportation networks when close-range flows are paired with long-range traffic typical of commuting towards city centers (Insets P (Tij) with peaks on large-flows due to POIs).Gaussian KDE is shown in orange as a visual aid.A) 3-points polycentric distribution of POIs, resembling the euclidean Steiner Tree problem [34,35] for three points.The network is optimized with β = 0.1 and β = 4.0, and shows the appearance of branches connecting the POIs paired to large loops in the periphery.B) Optimal state and distribution of speeds with exponential decay of wj from the center and an exemplifying result with β = 3.0.The optimal topology is characterized by a central loop paired with branches.
be reconstructed via combination of this cycle basis [36].Specifically, we investigate the cycle basis dimension (the number of loops that constitute this set) and the average loop size, against a range of β values.This metric allows to quantify the emergence of spatial topological features that differentiate the optimal state from a tree structure.Results for these synthetic systems are presented in Fig. 1-2.A tree-like topology is recovered when the flows probabilities are distributed uniformly across all nodes in space (when β → 0 and distance is therefore not a penalizing variable in Eq. 4), while loops emerge when farther targets become less likely to be explored and the network is globally optimized for close-range trips.Notably, in Fig. 3 around β ≈ 1.0, we observe a sharp transition in the average loop size in the HEX lattice under analysis: connections appear between neighboring nodes which are far from the tree-root as it becomes more efficient to have a direct link.In this β regime the tree topology does not guarantee the most efficient configuration for peripheral nodes, which have their high probability targets in their direct neighborhood (see Eq. 4).Thus in the optimization process edges appear between leaves nodes which are in separated branches: this ultimately breaks the tree structure and leads to the emergence of largescale loops.Eventually the optimal network converges to a simpler structure with small loops as the network is optimized for nodes to target only direct neighbors in the lattice.Finally, in SM Section 2 we show an application on the case of a single target node in the perimeter of the lattice, where the model reproduces leaves venation patterns [14,37].

IV. SPATIAL ATTRACTIVENESS AND TRAFFIC-LIKE FLOWS
In the context of urban systems, optimal transportation networks need to be devised to accommodate traffic flows [26] towards specific areas of interest, e.g.due to high commercial and business land use density.Hence we extend the efficiency optimization framework in the case where we have more realistic traffic on top of the urban networks, as the presence of nodes with high attractiveness (POIs) biases the flows towards them.In urban scenarios we adopt spatial-interaction models to mimic more traffic-like flows.In these models, flows are obtained via a gravity-like equation: T ij ∝ p i p j exp (−βd ij ) [10] which can be derived from first principles via entropy maximization, thus representing the most likely set of flows to be observed.In the context of urban exploration, the gravity equation can be mapped to a model for spatial interaction [31,38] where nodes with a given attractiveness W j compete as possible targets for traffic: Normalization Z accounts for all possible trip alternatives k W k exp (−βd ik ).P i is the population density in node i and W j encodes a suitable definition of benefit/attractiveness of node j as a possible target [38].T ij is therefore the fraction of population in node i commuting/travelling on average to node j.To better understand the role of nodes' attractiveness, we start with the simplest assumption of equal population distribution on all nodes: P i = 1.0 ∀i; we will introduce more realistic population distribution in the next section with the London case study.
We apply these models on the triangular lattice to unravel the optimal topologies that emerge when traffic probabilities are biased towards some nodes having high attractiveness (simulating POIs) and we study two spatial configurations for nodes' W j .In the first configuration high W j is assigned to three nodes (POIs) placed at the vertices of an equilateral triangle.We study the 3points distribution as it mimics a prototypical polycentric distribution of city-centers, and it can be linked to the solution of the euclidean Steiner Tree problem [34,35].The Steiner Tree is a class of problems where given a set of N points in a plane the goal is to find the set of lines connecting the points with minimum cumulative length.In our case, the solution would lie in the central node of the lattice being the Fermat point [35] and the Steiner node, which connects the three vertices of the high W j triangle, as illustrated in Fig. 4 panel A. The second case is a distribution of W j that decays exponentially from the center, mimicking a more realistic morphology for a urban monocentric structure.The two morphologies are depicted in Fig. 4.
We find that due to nodes in the network biasing the traffic flows, as it can be seen in the insets of Fig. 4 A, the traffic flows get divided into two types: a close range paired to a long range set of flows, due to POI polarization.We show in Fig. 4 optimal solutions for values of β = 0.1, 4.0.Interestingly, optimal solutions are characterized by three central lines branching from the center (which therefore acts as Steiner node) and connecting the three nodes with high attractiveness, therefore resembling the solution of the Steiner tree problem.Moreover, in the case of more localized flows (β = 4.0) these lines are also paired with large scale loops connecting farther nodes.We also find that the heterogeneity of traffic flows forces the appearance of a second mode in the distribution of speeds w e (see Fig. 4).The two peaks in the optimal P (w e ) can be interpreted as two different levels of speed, which suggests that the entire process can be decomposed in two distinct mechanisms which can be mapped as a bi-layer network: one layer at high capacity with long-range/commuting trajectories and the other one at low velocity with short-range paths.These two layers can be ideally separated, hinting towards a possible extension of the model to multilayer networks.

V. GREATER LONDON AREA: GENERATIVE MODEL FOR THE SUBWAY SYSTEM
We extend in this section the application of the model by integrating data from a real urban structure.Specifi-cally, we model the morphology of Greater London Area (GLA) on top of our framework and apply the efficiency optimization process with the aim of understanding if the temporal efficiency optimization of the spatial substrate paired with realistic flows is sufficient to yield a transportation network with similar topological features (such as a central core paired with peripheral branches [22]) as the London subway system.To extend the model to real urban scenarios, we first obtain the distribution of amenities [39] from OpenStreetMap [40] and we use this density of points in space as a proxy to estimate the attractiveness W j of a tile.Census data for Greater London Area yards from 2014 is used to recover population density P i .These densities are then mapped to Uber's H3 tiling to recover the spatial discretization in hexagonal tiles, such that we can have direct mapping to the nodes on a triangular lattice, as in the examples discussed in previous sections.We thus have the ingredients to finally simulate the spatial interaction flows T ij in Eq. 5.In Fig. 5 the integration of urban data describing the London's morphology in the model is explained and we provide a depiction of the OD matrix that arises from the spatial interaction model.With the aim of reproducing real features, we impose an upper limit on edge weight, so that the distribution of weights is bounded during the optimization process: w e ∈ (0, w * ).This better simulates the upper bound in speed of real multilayer systems.Further explanation of data recovery and integration in the model is provided in SM Section 3. We find (see SM Figure 6) that {w e } distribution displays a bi-modal shape, and this allows the analysis of the generated network in a sub-graph defined by the set of high speed edges.In Fig. 5, panel C, we show a sample result for β = 0.35 of this sub-graph.The characterization of the network into a central core paired with peripheral branches as the optimal state can be visually observed.The model's subgraph of high speed edges is compared to the real tube network in the Greater London Area [21] to assess the similarities between the optimal structure and the real subway system.We quantify this similarity by means of spatial scaling laws [22], these are convenient to highlight the recovery of the central core structure characterized by loops, paired with quasi monodimensional lines branching from the core.We investigate the distribution of nodes stations using the profile function N (r) that quantifies the total number of stations at a distance r from the network barycenter, computed as the average location of all station nodes [22].Results of this scaling analysis for the real and simulated networks are presented in Fig. 6.The two scaling regimes indicate the separation of core and branches: the scaling of r 2 in the core center and a second trend due to monodimensional branches for r > r C , where r C is the radius of the core structure.The second trend can be computed analytically via an integral curve for N (r > r C ) which can be approximated by a power law r γ (γ = 1.25 ± 0.02, see SM Section 4), as in Ref. [22].The curve of N (r) is consistent with the real network and confirms scaling Scaling properties of GLA Tube stations: Profile of the number of stations (nodes in the optimal discretized network (see SM Section 3) reproducing GLA underground) versus the distance from the barycenter.The scaling of N (r) profile of the model is compared with the real network system.Scaling properties predicted in [22] are verified, finding the two different scaling regimes separated at r r C ∼ 1 for core paired with branches systems, where rC is the core radius (characterized by r 2 scaling), and NC is the number of stations in the core.The scaling exponent γ = 1.25 ± 0.02 is obtained as a linear fit of the integral curve [22] for r > rC (see SM Section 4 for more details).

VI. DISCUSSION
Starting from simple conditions on temporal efficiency on a spatial network substrate, we show that network optimization paired with traffic-like flows weighting the importance of specific connections in space can reproduce complex networks features from man-made transportation networks.Specifically, we devise a framework for spatial networks where nodes can encode features of urban systems and can ultimately lead to the study of optimal topologies in real scenarios.A key novelty lies in the optimization process happening on a spatial substrate, such that edges of the resulting optimal network are optimized to improve the efficiency of the shared space by all nodes in the network.We show how the probabilities of moving from one point to another in space force a transition between a tree-like and a lattice-like topology in the optimal network.Fixing certain target points in space with a higher attractiveness for flows can reproduce theoretical results such as the Steiner tree solution or leaves venation patterns.We also show that extending these probabilities using urban spatial information and traffic-like flows modeling forces the emergence of shared preferential paths that are organized as complex topologies, resulting from traffic weighted optimization of network time efficiency, which ultimately exhibits the characteristics seen in real systems.We recover features such as a bi-modality in the speed distribution of the edges of the network, characteristic of multilayer transportation.Or the appearance of a central core with loops coupled to branches typical of underground systems, as in the case of the London underground system.We find that branches paired to large loops structures appear as optimal structures when the network is optimized for an interplay of traffic flows mixed between small range travels and longer range ones typical of commuting.This novel framework for the optimization of spatial networks in urban contexts may show further improvements and extensions to better accommodate the concepts of multilayer and shared space.It could be extended also to the case of inter-cities transportation, where specific nodes in the network substrate represent cities.To conclude, in this work the problem is addressed in a theoretical way with the aim of reproducing and understanding some features observed in real spatial networks, but future works can exploit this framework as a basis to understand how to generate optimal transportation networks in a urban planning scenario.

S2. LEAVES PATTERNS
Here we show an application of the model to reproduce leaf venation patterns [37].A single attracting node (single target or sink) is considered at one of the perimeter nodes of the lattice, and the substrate is optimized using all the other nodes as sources.

FIG. 1 .
FIG. 1.Spatial network model for urban morphology: A) Mapping population distribution and urban transportation network to a minimal spatial network where nodes encode urban features.Example with hexagonal tiling mapped to the triangular lattice.B) Network-based distance cΠ ij versus euclidean distance dij; edges weights/velocity we are depicted as widths.C) Edges weights of the lattice substrate are optimized via simulated annealing to unravel spatial features of the optimal transportation network.

FIG. 2 .
FIG. 2.Optimization of synthetic networks: The role of β is studied for two network models: the triangular lattice (HEX) and the non-spatial (ER) network.A) Heatmap of target nodes probabilities pij from source node (yellow) under two different β values: as the penalty parameter grows, farther nodes are more penalized and flows tend to stay close to the source.B) Samples of the associated optimized network states: when flows are not affected by distances (β = 0.1) source nodes target all the other nodes in the network with approximately equal probability, the optimal network converges to a tree-like structure.With larger β (β = 5.0), trip probabilities are more localized and the presence of loops appear in the optimal structure.
FIG. 4.Minimal models of urban morphology and attractiveness distributions under study (3-Points and Exponential decay): Morphology of POIs, where attractiveness Wj is mapped with color intensity (yellow being higher).Optimized edges weights distributions P ({we}) are characterized by the bi-modal nature that reveals the multi-layered structure of the optimal transportation networks when close-range flows are paired with long-range traffic typical of commuting towards city centers (Insets P (Tij) with peaks on large-flows due to POIs).Gaussian KDE is shown in orange as a visual aid.A) 3-points polycentric distribution of POIs, resembling the euclidean Steiner Tree problem[34, 35]  for three points.The network is optimized with β = 0.1 and β = 4.0, and shows the appearance of branches connecting the POIs paired to large loops in the periphery.B) Optimal state and distribution of speeds with exponential decay of wj from the center and an exemplifying result with β = 3.0.The optimal topology is characterized by a central loop paired with branches.

FIG. 5 .
FIG. 5. Optimal network model for Greater London Area subway system: Application of the efficiency optimization with realistic flows on the urban structure of the Greater London Area.A) Urban morphology data is recovered from Census and OSM and population and POI densities are mapped to the H3 tiling.B) Data is mapped to the triangular lattice, with nodes having features which allow the calculation of traffic-like flows, a sample OD matrix is shown where Tij are computed with β = 0.35.C) Optimal network state for the London model, where only edges and nodes corresponding to the second mode are shown (see SM Section 3).Central core structure with loops paired with peripheral branches can be visually seen.