Designing optimal networks for multi-commodity transport problem

Designing and optimizing different flows in networks is a relevant problem in many contexts. While a number of methods have been proposed in the physics and optimal transport literature for the one-commodity case, we lack similar results for the multi-commodity scenario. In this paper we present a model based on optimal transport theory for finding optimal multi-commodity flow configurations on networks. This model introduces a dynamics that regulates the edge conductivities to achieve, at infinite times, a minimum of a Lyapunov functional given by the sum of a convex transport cost and a concave infrastructure cost. We show that the long time asymptotics of this dynamics are the solutions of a standard constrained optimization problem that generalizes the one-commodity framework. Our results provide new insights into the nature and properties of optimal network topologies. In particular, they show that loops can arise as a consequence of distinguishing different flow types, complementing previous results where loops, in the one-commodity case, were obtained as a consequence of imposing dynamical rules to the sources and sinks or when enforcing robustness to damage. Finally, we provide an efficient implementation of our model which convergences faster than standard optimization methods based on gradient descent.


I. INTRODUCTION
Optimizing networks for the distribution of quantities like passengers in a transportation network or data packets in a communication network is a relevant matter for network planners. Similar problems arise in natural systems like river basins and vascular networks. A variety of models have been proposed to study these systems within an optimization framework [1][2][3][4]. The standard goal is to find the values of flow and the network topology that minimize a transportation cost. A common choice for this cost is the total power dissipation [1,2,[5][6][7][8][9], but alternatives can be adopted depending on the application, see for instance [10]. More recently, different approaches based on a dynamical adaptation of network properties coupled with conservation laws have been proposed [5,6]. These models can be reformulated within the framework of optimal transport theory, following the work of [11][12][13][14][15][16][17]. Very efficient computational techniques have been developed for solving such optimal transport based models [13][14][15].
In all these systems there is a unique undistinguishable flow traveling through the network. However, it may occur that flows of different types compete in the network infrastructure, yet all the physical models mentioned above have been developed for one type of flow only. One could use these methods to analyze multi-commodity problems by either aggregating together all flow types or by treating them independently. In either case, one loses the important information of how interacting commodities affect * alessandro.lonardi@tuebingen.mpg.de † caterina.debacco@tuebingen.mpg.de the flow, which constitutes the multi-commodity character of these settings. Multi-commodity-specific methods that rely on standard optimization suffer of high computational costs caused by the simultaneous assignment of multiple interacting paths to minimize a global cost function. As a consequence, existing multi-commodity flow algorithms rely on ignoring these interactions, or use greedy heuristics and approximations that lead to suboptimal solutions [18]. Approaches based on statistical physics and message-passing algorithms have improved results [19,20] but remain computationally costly.
In this work, we propose a model to design the topology of optimal networks where multiple resources are moved together. This is based on principles of optimal transport theory similar to those studied in [16,17]. Assuming potential-driven flows, this optimal design problem is posed as that of finding the distribution of multicommodity fluxes that minimize a global cost functional, or equivalently, as that of finding the optimal edge conductivities. The cost functional is the multi-commodity extension of the optimal-transport Lyapunov functional proposed in [14,15]. It is given by the sum of the convex cost incurred in transporting all the commodities across the network, summed to a concave cost proportional to the total flux on the network. This second term can be interpreted as the cost for building and maintaining the transport infrastructure, and controls traffic congestion on the network edges by either distributing fluxes on many edges, or by concentrating them on fewer edges following a principle of economy of scale.
Additionally, we show that the problem of minimizing the proposed cost functional is equivalent to a constrained optimization problem that generalizes the one-commodity case. The optimal distribution of fluxes is used to identify the optimal network topology by discarding edges where conductivities are small. Within this optimization framework, numerical experiments supported by analytical evidence lead to the important result that optimal network topologies may have loops as a consequence of distinguishing flow types. Generally, loops are pervasive in both natural and anthropic networks [7,[21][22][23][24]. However, in one-commodity settings, several studies have shown that trees are often optimal [1,2], while few results show that loops can be obtained by fluctuating flows or by aiming at increased robustness to damage [3,5,7]. This implies either changing the type of cost function or introducing stochasticity in the sources and sinks. Instead, in our multi-commodity model loops emerge naturally as a consequence of the presence of different flow types.
In order to minimize the highly non-linear and nonconvex cost functional mentioned before, we propose a particular set of dynamical equations for the edge conductivities, generalizing to a multi-commodity scenario those proposed in [16,17], and find their stationary solution. We demonstrate that the cost functional is indeed a Lyapunov functional (i.e. it is strictly decreasing along the solution trajectories) for the proposed dynamics. Altogether, our results extend the theoretical insights of two separate lines of literature, optimal transport and network dynamics. Two principled algorithms for solving the multi-commodity problem are proposed. They have similar computational complexity that largely improves that of techniques based on gradient descent or Monte Carlo methods, thus making the model scalable to large datasets and the only computationally viable optimization alternative for large problems.

II. MODEL
Consider a graph G made of a set of N nodes V interconnected by a set E of E edges. We want to model transport through the network of M ≥ 1 commodities, each identified by a color. The inflow/outflow rate of each commodity is given by a vector S i ∈ R N such that v S i v = 0 for all i = 1, . . . , M to ensure global mass preservation. Let the "colored-flux" F e = (F 1 e , . . . , F M e ) be a vector with entries F i e , which represent the commodities flux passing through edge e. In standard one-commodity cases, the flux per unit time could represent a water or an electrical current, and typically is "colorless", i.e. F e is a scalar quantity. In turn, the components F i e can be thought as fluxes of immiscible substances traveling through the same edge. Denote with B the signed network incidence matrix, with entries B ve = +1, −1 if node v ∈ V is the starting or ending point of edge e ∈ E, respectively, and zero otherwise. We require the flux to obey the "colored" local Kirchhoff's law: where each edge e = (u, v) has length e > 0. We could assume that the components (colors) of the flux derive from differences of a colored-potential (pressure) defined on nodes p i v , and a colored-conductivity µ i e : The commodity index i can be any arbitrary attribute of the mass traveling through the network without impacting the validity of our model. In fact, the important idea behind multi-commodity optimization is that mass of different type interacts while being transported in a shared infrastructure, and a suitable cost needs to be minimized. In Fig. 1 we show a simple example of the model construction.
Up to this point, we have a set of independent onecommodity flows, one per color i. Taking them separately and then superimposing each individual flux or conductivity a posteriori would be a naive strategy, neglecting possible complex interactions. For example, it may be more convenient to gather multiple flows through one channel with high capacity (the conductivity µ i e ). More generally, the optimal network design mechanism must take into account all commodities at once. Deciding how this should be done is an open problem in the context of optimal transport theory, the approach we take here.

Topology
Transport network Here, number inside nodes correspond to their mass inflow, we assume for each commodity to have its mass concentrated in a single vertex; in the gray node no mass is entering nor exiting, i.e. S i 3 = 0 for every i. Widths of edges are drawn proportional to F i e , in case all F i e = 0 links are not drawn, and arrows denote the direction of the colored fluxes F i e . Notice how blue and orange mass share the same edges, thus creating possible traffic congestion. The inflow/outflow rates of each color are are S 1 = (1, 0, 0, −1) (blue), S 2 = (0, 3, 0, −3) (green) and S 3 = (−2, 0, 0, 2) (orange).

A. Introducing a shared conductivity
Our first model assumption is that all the conductivities must be equal, namely: ∀e ∈ E, ∀i = 1, . . . , M .
The quantityμ e plays the role of a "colorless" conductivity. Given that the conductivity can be seen as proportional to the size of an edge, Eq. (3) can be interpreted as allocating the same edge capacity for the different colors. This is a reasonable assumption in systems for which there is no priority between commodity types or users. In communication networks like the internet, this captures the situation where all users share the same bandwidth, and no privileged user exists who has access to more bandwidth, which is often the case. Notice, however, that the flux F i e still depends on the color, because the difference in potential does. This implies that users can transfer different amounts of data packages, with potential for traffic congestion when they overload the network, e.g. when streaming videos. This is one of the many possible alternatives of coupling between colors. Other more complex choices could be made, for instance by introducing an explicit coupling involving the fluxes, or opting for controlling some global functions of the conductivities, e.g. their sum or the magnitude of their fluctuations across colors. However, we find that our choice, while being analytically convenient, it allows for a rigorous generalization of the one-commodity case with fixed and fluctuating loads [1-3, 5, 7] and leads to rich topological behaviors, as we show below.

B. The dynamics
Having defined how colors move through the network, we now turn our attention on describing the mechanism to design the network. Formally, we propose the following dynamics for the colorless conductivity: where we define ∆P uv as a vector of pressure differences with entries Note that p i , and thus ∆P uv , are implicit functions ofμ(t) because of Eq. (1) and (2). The parameter β determines the type of optimization associated to this dynamics. In the standard one-commodity case, for β < 1 one aims at minimizing traffic congestion and obtains loopy topologies, for β > 1 the aim is to consolidate paths and optimal networks are trees; the case β = 1 is shortest path-like. This dynamics describes a feedback mechanism. If the total flux through an edge is large, its conductivity increases. If the flux decreases, the conductivity decreases over time and becomes negligible when no flux occurs. The systems of Eq. (1)-(4) represents our model for multi-commodity flow optimization.
In the presence of only one commodity, our model is similar to the dynamics used to solve the basis pursuit problem on networks [15] and as a principled mechanism for filtering networks from redundancies [16]. However, both cases are limited to one-commodity scenarios. A similar dynamics is also proposed in [5], where the authors focus on the average time-evolution of a stochastic model with fluctuating loads. Analogously to these onecommodity cases, one can efficiently solve the system in Eq. (1)-(4) using optimized numerical methods, however in our case the complexity increases with the number of colors, see Appendix Sec. E 2 for more details. Our model also bears a close mathematical relationship to a recent work, where similar ideas have been studied in a multi-commodity setup [17]. Beyond the fact that this work focuses on the case β = 1 (i.e. shortest path-like) and thus on a convex optimization scenario, there is one other main conceptual difference with our model. Notice that Eq. (4) couples together the various colors by means of f (∆P uv ) = ||∆P uv || 2 2 , i.e. the 2-norm squared of the pressure difference. Instead, they consider the 1-norm and 2-norm (not squared). Analyzing the solutions of the dynamics under different f (∆P uv ) is an interesting avenue for future work.
The key insight of optimal transport theory, is that Eq. (4) admits a a Lyapunov functional (a functional decreasing in time along solution trajectories) having the nice interpretation of being the transportation cost: where is a function implicitly defined as solution of Eq. (1)-(2) when imposing Eq. (3). The first term corresponds to the energy dissipated during transport, it can be interpreted as the operating costs, whereas the second is the cost of designing the infrastructure. The equilibrium point ofμ e is stationary at the previous Lyapunov functional, and for β ≤ 1 it acts also as the global minimizer due to its convexity. For β > 1, while the first term (operating cost) is convex, the second (infrastructural cost) is not. As a consequence, the transportation cost is not convex, thus in general the functional will present a rich landscape with several local minima towards which the dynamics will be attracted.
We formally show that Eq. (5) defines a well-defined Lyapunov functional for the dynamics of Eq. (4) in Appendix A 1, following similar arguments as in [17]. This extends the work of Bonifaci et. al. [11], where a similar functional has been proposed to complete the characterization of the dynamics regulating slime molds' evolutionary feedback mechanism.

C. Mapping to standard optimization setups
Although not evident, our dynamics is connected with an optimization problem analogous to previous models for the one-commodity case [1,2]. Specifically, the stationary solutions of our system minimizes the network total transportation cost J = 1 2 e∈E ê µe ||F e || 2 2 subject to the global constraint of constant material cost e∈E eμ 2−β e = K 2−β and local Kirchhoff's law on nodes as in Eq. (1) (using Kirchhoff's law one can show that J is equivalent to the first term in Eq. (5), for more details see Appendix A 2). Formally the optimization problem is: such that This optimization problem is analogous to that in [2], except here the flux appears in terms of its 2-norm. As in the one-commodity case, this leads to an optimal configuration where the conductivities similarly scale with the fluxes,μ and the proportionality constant can be fully determined analytically (see Appendix B for detailed derivations). Using Eq. (9) we can rewrite the total transportation cost in terms of the flux as where , which is analogous to the optimization problem of Banavar et al. [1], where there was no conductivity in the setup. Notice that all these results generalize the one-commodity case [1,2] by means of the 2-norm ||F e || 2 of the colored flux. If there were only one color, and thus ||F e || 2 2 = F 2 e , our model reduces exactly to them. Similar relations can be obtained with a stochastic approach as the one proposed in [3,7,8], but by considering ensemble averages instead of the 2-norm of the fluxes. In these works, the authors study a setup where sources and sinks' positions are extracted randomly from a distribution on the network nodes. They also find loops in non-trivial regimes. While the mathematical formulations show some similarities, there are main conceptual differences between these models and ours. These approaches are stochastic, thus the main quantities are calculated with ensemble averages and loops arise as a consequence of stochastic fluctuations, or when randomly cutting edges in the network. Instead, our problem is deterministic and loops arise as a result of an optimization process while assuming a shared conductivity.
Solving this optimization problem directly by means of gradient descent is computationally expensive (see Appendix Sec. E 1). Methods relying on Monte Carlo schemes [2] can also be computationally demanding, and they are valid only when the optimal topology is known to be a tree. Instead, we derive update rules which have similar complexity as that of finding the steady states of our dynamics, and can be implemented with efficient numerical solvers. They consist in iterating between updating conductivities and fluxes as follows: complemented with Kirchhoff's law in Eq. (1), and can be put within the framework of fixed-point iterations. This generalizes results obtained adopting a similar approach for the one-commodity case [2,3]. We make available an open source implementation of the two approaches which we summarize here: finding the steady state of the dynamics by solving the systems of Eq. (1)-(4) (Dynamics); extracting the solution of the optimization problem with the iterative updates of Eq. (11)-(12) (Optimization).
We provide a pseudo-code for each of these in Appendix Algorithms (1) and (2). They have similar computational complexity that scales as O(M N 2 ), and are much faster than techniques based on gradient descent, see Appendix Sec. E 1.

A. Optimal topologies may have loops
Now, we address the important question of which network topologies are optimal for the cost in Eq. (10). For the analogous models in the one-commodity case, there is a phase transition at β = Γ = 1 where optimal networks pass from being trees (1 < β < 2, 0 < Γ < 1) to containing loops (0 < β < 1, 1 < Γ < 4/3) [1,2], see [9] for a thorough investigation of this transition. Remarkably, we obtain that in the multi-commodity case, loopy structures can be optimal also in the regime where trees were optimal in the previous models, depending on the values and locations of sources S i v and on the edge lengths e . The loopy structures in what was previously a tree-like regime arise from the colored Kirchhoff's law (1), distinguishing different commodities entering and exiting a node. Had we imposed a similar but "colorless" constraint i e B ve F i e = i S i v , trees would have been optimal.

B. Phase diagram tree-loops (S fixed)
To illustrate this, we consider the simple triangular loop Kirchhoff's law, we can write all the fluxes as a function of F a = F 1 a , F 2 a . Then, by choosing two arbitrary values of F 1 a , F 2 a we propose a loopy solution G L to compare against the trees, this is the leftmost bottom triangle in Fig. 2. We show that there are values of 0 < Γ < 1 for which this loopy solution has lower transportation cost than any of the trees. One can compute all the costs using Eq. (10) (see Appendix. C for details) and then find values of holds. To find an example solution, one could fix certain values for these parameters and then numerically solve Eq. (13); for few simple cases this can also be done analytically. We show an example phase diagram obtained by varying in Fig. 3, where we plot the values of Γ crit such that for Γ ≥ Γ crit , the cost J Γ (G L ) is optimal, i.e. we have a phase transition between trees to loopy optimal topologies. Notice that such values of Γ crit depend on the selected values of ( , F a ), and that optimal loopy solutions are not guaranteed to exist for any arbitrary configuration of these values. This can be numerically investigated using similarly reasoning as done for the case above. The important point here is that we could find at least one setting of ( , F a ) for which we have loopy solutions in the non-trivial regime 0 < Γ < 1. Similar arguments can be used to find phase diagrams in S when fixing , see Appendix C, Fig. 5.

C. Phase diagram tree-loops (lengths fixed)
To make more clear the consequences of the implicit interaction between different fluxes when imposing a shared conductivity and the optimization process is run, we show results on a simple synthetic toy model where we vary the load of one color, while keeping the others fixed.
Specifically, we study the triangle topology of Fig. 2 and consider two different configurations of S. The first has S 1 1 = −S 1 3 = 1 for the yellow commodity, i.e. one unit of mass of type i = 1 is moving from node 1 to node 3, while the purple commodity has S 2 2 = +2, and S 2 3 = S 2 1 = −1, i.e. two units of mass of type i = 2 are injected in node 2 and are equally split between destination nodes 1 and 3. This corresponds to the green triangle in the phase diagram of Fig. 4a. The second configuration has the same sources and sinks for the yellow commodity, while the purple mass is doubled, i.e. S 2 2 = +4, and S 2 3 = S 2 1 = −2. This corresponds to the red triangle in Fig. 4a. As we can see from Fig. 4b, both the optimal network topologies and the fluxes of individual colors differ in the two configurations. The important point here is that the fluxes of the yellow change, even though its forcing S 1 does not change between the two configurations. This is a consequence of having distinct commodities sharing a common infrastructure: acting solely on the purple mass impacts also the path taken by the yellow, and consequently the overall optimal network topology.
Additionally, the way the topology changes between these two configurations depends on the exponent Γ . In this simple scenario, we can have either a tree or a loop at Γ = 0.6 (Γ < Γ crit 0.77) or Γ = 0.8 (Γ > Γ crit ) respectively, see Fig. 4b. In particular, the case of Γ = 0.8 is a simple example of how the routing mechanism is responsible for the generation of loops in a multi-commodity setting. Finally, if we were to consider two separate unicommodity scenarios and solve the optimization for the two colors independently, we would have obtained a different result, as shown in Fig. 4c. In this case, the yellow remains the same in the two configurations, while the purple would simply double the amount of fluxes along edges, but the set of edges being used would stay the same.
In addition to the numerical analysis presented to study the generation of loops, in Appendix D we adapt to our "colored" case the proof of Proposition 2.1 in Xia [25], where it was demonstrated that one-commodity (i.e. "colorless" Kirchhoff's law) optimal transport paths are trees. Here we show that for our model optimal networks may contain loops.

IV. CONCLUSIONS
Although we have a rigorous theoretical understanding of the behavior of one-commodity flows in networks, comparable theoretical insights for flows of different types have been lacking. Here we propose a model for multi-  for which optimal solutions can be loopy; these are associated with the markers drawn on the heatmap, the green star is the configuration of the toymodel in Fig. 2. In particular, running our dynamics on the pink-diamond graph (resp. the yellow-plus) leads to a loopy configuration for Γ > Γcrit or to a tree if Γ < Γcrit. Running our dynamics on the blue-cross graph returns always since its Γcrit is equal to 1. commodity flows that extends and generalizes various results obtained for the one-commodity case. It assumes that all the commodities have the same priority by imposing their conductivities to be equal and that their dynamics is regulated by the 2-norm squared of the fluxes. By drawing from theoretical results of optimal transport theory, the equilibrium solutions of our dynamics are also stationary points of a cost function that can be interpreted as the sum of operating and infrastructural costs. As we tune a parameter β, our dynamics can solve various type of routing optimization problems. Its numerical implementation is efficient and scalable to large systems. Remarkably, our model shows how optimal loopy topologies can arise from simple dynamical rules. We explain how this emerges as a consequence of the colored Kirchhoff's law and how the theoretical proof valid in one-commodity fails when fluxes are vectors. We provide example phase diagrams on a simple toy model that illustrates how optimal topologies evolve from being trees to containing loops.
Our model is applicable to all situations where it is relevant to distinguish flow types and to consider how these interact. One important example of such an instance is in communication networks where packets of information need to be delivered at different destinations.
In our formulation the underlying network topology is given in terms of sets of nodes and edges. While our model allows for edge removal (and node removal as a consequence), it does not provide a mechanism for adding new connections. In order to allow for this, the natural modification of our approach would be to consider a continuous formulation as in [15,16]. In this case, we would have no underlying topology to start with, except the presence of sources and sink nodes at given locations in space. This is an interesting direction for future work.
In addition to solving multi-commodity problems, our model allows to draw a rigorous mapping between two different formalisms. In fact, while both the physics and optimal transport communities are actively investigating these systems, we still miss a clear connection between them, even for the one-commodity case. We make a first attempt to fill this gap by showing how our dynamics maps to a standard optimization setup while also generalizing to the multi-commodity case. Furthermore, we deploy two numerical methods that have lower computational complexity compared to others based on gradient descent.
We expect that our formalism can be further extended in the future to accommodate for more sophisticated in-Γ = 0.6 teraction between commodities or in multilayer networks [26] thus better representing specific application scenarios. Similarly, modifying the dependence of the fluxes in driving the dynamics and investigating possible mappings to suitable optimization setups are natural next steps.
We foresee that the insights gained about the structure of optimal topologies and in combining principles of optimal transport and physics will open the way to further studies targeting these systems. To facilitate this, we provide an open source implementation of our code at https://github.com/aleable/McOpt.
Here we prove that the functional proposed in Eq. (5) is a Lyapunov for the dynamics in Eq. (4) for 0 < β < 2.
To do that, we follow the derivations proposed in [17] for a similar problem. We need to show: (i) L β ≥ 0, (ii) L β ≤ 0 andL β = 0 if and only if {μ e } is a stationary point for the dynamics. The first condition is trivial. In order to prove the second one, we first define the quantity: which is the entry (u, v) of the Laplacian of a graph with adjacency matrix with entries A uv =μ uv / uv . We can thus rewrite Eq. Now, we claim that for each edge: with ∆P e being an M -dimensional vector of entries , v). Summing over i gives: notice that the term in parentheses in the l.h.s. is exactly the "operating cost" of Eq. (A14)

Equivalence between the Lyapunov transportation cost and the dissipated energy
We prove that the transportation cost J = which is the equality we wanted to show.

Appendix B: Mapping the dynamics to an optimization problem
We show that a constrained optimization problem with a cost function representing the total dissipated energy over the whole network has a solution with the same scaling as in Eq. (9).
Formally, consider the constrained optimization problem of Eq. (6)- (8). This can be turned into an unconstrained optimization problem by introducing Lagrange multipliers: Here we introduced a multiplicative factor 1/2(2 − β) for the Lagrange multiplier λ to ease calculations. Taking the partial derivatives w.r.t.μ e and setting them to zero (the optimality condition on the derivative of J β with respect to F e will be treated later on) yields, for each edge, (B2) This is the same scaling relationship obtained from the stationary state of the dynamics in Eq. (9), up to a multiplicative constant. It is also the natural "colored" generalization of the one-commodity case presented in [1,2,10], where instead of having ||F e || 2 one has the absolute value |F e |, as F e is a scalar quantity there. Imposing the global constraint in Eq. (7) allows to determine the value of the multiplier λ: which is analogous to that of the one-commodity case Eq. (5) in [2]. The same exact scaling can be recovered from our dynamics by settingμ e = 0 as shown in Eq. (A13)-(A14). The total dissipation is obtained by substituting Eq. (B5) inside Eq. (6), leading to: This cost is again analogous to that of the one-commodity case: Eq. (6) in [2] for γ = 2−β. Using similar arguments, i.e. noticing that the function x (3−β)/(2−β) = x (1+γ)/γ is monotonically increasing for 0 < γ = 2 − β < 2, the original minimization problem reduces to that of minimizing with respect to {F e } the cost of Eq. (10): which is analogous to the model of Banavar et al. [1]. Lastly, we can set to zero also the derivative w.r.t. F i e in Eq. (B1): recovering the classical result stating that the pressure p is (minus) the Lagrange multiplier obtained when we minimize the dissipated energy J β ({μ e } , {F e }) under the Kirchhoff's laws constraints.

Appendix C: Phase diagram for a toy model
Here, we discuss in more detail the computations described in Sec. III B to enforce the claim that networks with loops can be optimal for 0 < Γ < 1. The simple triangular loop of Fig. 2 (top) admits three possible tree topologies T i , i = 1, 2, 3, drawn at the bottom right of Fig. 2. Exploiting Kirchhoff's law we write the fluxes as a function of F a = F 1 a , F 2 a . Then, computing all the costs using Eq. (10), we get: Thus, we need to find values of { * = ( * a , * b , * c ), F * a , Γ * } for which Eq. (13) is satisfied. In practice, the lengths are usually given in input, thus we set a = b = 3 2 c , c = 1; we then propose F * a = (0, −1). Thus: Analytically from Eq. (C5)-(C7) or numerically solving Eq. (13) (fixing F * a as proposed for G L ) one can show that Γ * 0.83. Notice that such Γ * is not optimal, in the sense that for other choices of F a we may find lower values of the exponent Γ enabling loopy networks to be optimal, we denote the minimum of this values as Γ crit . Numerically solving Eq. (13) for the toy model just discussed returns Γ crit 0.77, as shown in Fig. 3  (green star). However, notice that the key point of this derivation is that we could find at least one choice of ( , F a ) for which we have loopy solutions in the nontrivial regime 0 < Γ < 1. Indeed, at Γ = Γ * we have The same procedure can be used to find the phase diagram of Fig Sketch of the trimming procedure described in Appendix D, on the toy model of Fig. 2. The tree obtained in the colored case is not optimal, while the one obtained in the colorless has lower cost but violates Kirchhoff's law. among these weighted graphs, i.e. loop-less topologies with weights minimizing J Γ as defined in Eq. (10). These trees (not necessarily unique) can be obtained by taking a weighted graph G L with a single loop denoted as L, and cutting the loop by trimming one of its edges, then redistributing the fluxes passing through the trimmed edge over the remaining links of L. We assign an arbitrary orientationê to the edges of L so that ê ,ê = ±1, where the directionê of each link of a graph is uniquely determined by its incidence matrix. The edge to be cut is the one with smallest weight over the edges in the loop with a negative direction with respect to the graph's orientation. Its flux is redistributed over the remaining edges, which now make a tree. Formally, we assign to the edges of T fluxes F * e such that their entries are F i e * = F i e + ê ,ê F i min ∀e ∈ E, ∀i = 1, . . . , M , (D1) with F min = (F 1 min , . . . , F M min ) = arg min e { e ||F e || Γ 2 : ê ,ê = −1} and F e are the fluxes of G L . The orientation of L can be switched in case the set of edges with negative orientation is empty. In the one-commodity case, as in Xia [25], there is a similar trimming, but with scalar weights on the tree being F * e = F e + ê ,ê F min , where now all the fluxes are numbers. The key effect of having a scalar trimming is that F * e can become zero as a result of having a negative orientation ê ,ê ; in words, the flux F min adds negatively to the fluxes originally present in G L along the edges in the loop with negative orientation, if F min = F e then F * e = 0. Here instead, in Eq. (D1) we add a vector. While we might have that for certain components the flux cancels out, the norm of the whole vector F e might not be zero, because not all the components (colors) cancel. This results from imposing a colored Kirchhoff's law.
We illustrate this procedure on the triangle network in Fig. 2. In particular, Fig. 6 shows how this trimming applies in the colored case (our case) against a colorless case. The tree obtained in the colored case is not optimal, while the one obtained in the colorless has lower cost but is not valid, as it violates the constraints enforced by Kirchhoff's law. The consequence is that now loops can be the optimal solutions while in the one-commodity case optimal networks were trees. Specifically, there exists a Γ crit ∈ (0, 1) (or β crit ∈ (1, 2)) such that we have a phase transition between trees and loopy structures. The value of Γ crit depends on (S, { e }). two consecutive time steps, which can be arbitrarily set in input. Solutions of Kirchhoff's law have been computed using a sparse direct solver. Lastly, we impose the following convergence criteria: convergence is achieved when these conditions are satisfied where τ dyn , τ opt > 0 are parameters arbitrarily set in input.
To test our methods, we developed a momentum-based gradient descent as baseline algorithm. This consists in the component-wise iterative update of the fluxes using with η, δ > 0 fixed increment rates and V i e 0 = F i e . We fixed the convergence criteria analogously to what done for the other two methods: max e ||F e || t+1 2 − ||F e || t 2 /η < τ gd , with τ gd > 0 a parameter that needs to be set in input. In our experiments we set it to τ gd = 10 −2 . From a theoretical point of view, the comparison with a standard gradient descent method was proposed in light of the equivalence of our dynamics and a mirror-descent approach for the Lyapunov functional, as proved for β = 1 in [27]. The dynamics automatically preserves positiveness of the conductivities {μ e }, thus a large time-step can be used. On the contrary, using purely gradient descent approaches, the time step size must be reduced when some entries of the vector {μ e } go to zero. After running our algorithms until convergence, the original network is trimmed by removing edges with negligible fluxes. Formally, we remove links for which ||F e || 2 < τ , with τ > 0 arbitrarily fixed. Typically, as we empirically found, the distribution of ||F e || 2 over the edges is divided in two sets having values differing by several orders of magnitude. It is thus straightforward to distinguish what edges to be trimmed, i.e. those that have negligible values compared to the rest of the distribution.

Computational complexity
Each temporal step executed in our algorithms requires the approximate solution of M linear systems of dimension N . This operation has been carried out by means of a sparse direct solver (UMFPACK) that performs a LU decomposition for each column of the right hand side of Eq. (1). The total computational complexity of this process scales as O(M N 2 ). To have a better understanding of this we tested our models with several synthetic Waxman networks obtained by placing N nodes uniformly at random in a square domain of size 1. Nodes are connected with probability p = A exp (−d/αL), where A, α, L are parameters that we fix arbitrarily to A = 1/4, α = 1/4 and L = 1; d is the euclidean distance between a pair of nodes. The matrix S is constructed assigning a total inflowing mass of 10 4 at random to M nodes, and redistributing on the nodes of the network proportionally to their inflows.
We test the efficiency of our schemes by measuring the total running time (in seconds) to reach convergence for different values of β, M, N . Results are shown in Fig. 7. We notice that the two algorithms Dynamics and Optimization have similar computational complexity. Their small running time's differences are negligible and only due to how convergence is precisely defined, i.e. how the corresponding parameters τ dyn , τ opt are set. The running time is shorter for β < 1 (traffic optimization, loopy) than in the opposite scenario of β > 1 (minimization of infrastructural cost, tree-like). The case β = 1 is more nuanced, as the cost transitions between two opposite situations. In this case, Optimization fails to converge for M/N < 1, if convergence is defined in terms of variations of ||F e || 2 between iteration steps. This is because the algorithm gets lost in degenerate local minima, configurations with same cost but different set of fluxes. This lack of con-vergence suggests that, for β = 1, the energy landscape around these minima is flat, i.e. there are many configurations with same cost but non-negligible differences in their fluxes. The Optimization routine keeps switching between these different states. In this case, one can simply pick one of these possible many solution as an example local optima. The dynamics does instead converge. This suggests that Dynamics is biased towards one of these degenerate solutions. For M/N = 1, Optimization converges with same running time as Dynamics, suggesting that as we enlarge M the landscape becomes less flat. A possible cause is that by increasing M the system has more constraints to be satisfied via Kirchhoff's law, which reduces the number of possible degenerate solutions. This claim is also supported by the behavior of Dynamics' running time, which does not monotonically increase with M/N in this case, as shown in Fig. 7b. These behaviors highlight relevant differences between the two implementations. Finally, we note that the computational complexity could in principle be further reduced to O(M N ) using multigrid methods [28], we do not explore this here.