Enhancing transport properties in interconnected systems without altering their structure

Units of complex systems -- such as neurons in the brain or individuals in societies -- must communicate efficiently to function properly: e.g., allowing electrochemical signals to travel quickly among functionally connected neuronal areas in the human brain, or allowing for fast navigation of humans and goods in complex transportation landscapes. The coexistence of different types of relationships among the units, entailing a multilayer represention in which types are considered as networks encoded by layers, plays an important role in the quality of information exchange among them. While altering the structure of such systems -- e.g., by physically adding (or removing) units, connections or layers -- might be costly, coupling the dynamics of subset(s) of layers in a way that reduces the number of redundant diffusion pathways across the multilayer system, can potentially accelerate the overall information flow. To this aim, we introduce a framework for functional reducibility which allow us to enhance transport phenomena in multilayer systems by coupling layers together with respect to dynamics rather than structure. Mathematically, the optimal configuration is obtained by maximizing the deviation of system's entropy from the limit of free and non-interacting layers. Our results provide a transparent procedure to reduce diffusion time and optimize non-compact search processes in empirical multilayer systems, without the cost of altering the underlying structure.


I. INTRODUCTION
A wide variety of social, natural and artificial systems are inherently complex [1].For instance, societies exhibit rich microscopic dynamics, at the level of single individuals, that might lead to emergent collective phenomena at larger scales [2] -e.g., financial collapses or revolutions -which are usually difficult to predict [3].Natural systems, like ecological ones, are characterized by complex topologies that, in presence of interdependencies with other networks, exhibit a richer response to perturbations with respect to the case where they are considered in isolation [4].Similar phenomena have been observed also for engineering systems, from transportation to communication networks, where their resilience to targeted attacks or random failures of their units is important for applications [5][6][7], such as robustness enhancement or recovery strategies [8,9].
In fact, complex systems are characterized by a wide variety of physical attributes and dynamics, making difficult to analyze them within a unified framework.What do electrochemical signals exchanged among neurons in the human brain, mobility of goods/people/vehicles between different areas of a urban ecosystem, spreading of an infectious pathogen through social contact patterns and financial transactions through the backbone of a stock market have in common?From a very general perspective, the corresponding systems consist of interconnected units exchanging information similarly to how computers and servers exchange packets of bits through the Internet.In fact, it has been originally conjectured * Corresponding author: mdedomenico@fbk.euby Murray Gell-Mann that despite their differences, complex systems resemble one another in the way they handle information and, consequently, studying how information is exchanged and processed provides a promising starting point for exploring transport phenomena in order to understand how such systems operate.
A general way to model the propagation of information through complex structures is by means of diffusive processes, such as random walks [10], since they are versatile models that allows one to cope with the uncertainty in the structure -e.g., temporary link failures or damages -of complex systems, they are based on local knowledge of structure [11] -which is usually less expensive than models relying only on the global knowledge of the topology, like shortest-path-based ones.Nevertheless, there is increasing evidence that units of real complex systems do not preferentially exchange information along shortest paths, too expensive in terms of routing.In fact, the human brain operates in an intermediate communication regime, neither based on shortest-path routing nor on a diffusion process [12], while the Internet relies on routing tables more specifically, forwarding tables chosen by the routing algorithm on each machine to find sub-optimal communication pathways through the routing hierarchy [13,14].Even individual human mobility, usually assumed to be far from random [15] is affected by arbitrariness of individuals actions responsible for a stochastic component [16] in trip patterns characterizing human flows [17], which is usually modeled by means of stochastic processes ranging from random walks to variants of brownian motion and Levy flights [18].In this work, we present a way to alter the dynamics on top of multilayer networks, while keeping their structure (see Fig. 1).Without changing the underlying structure, shortest paths across the whole multiplex system do not In structural reduction (left) one alters the structure by iteratively aggregating layers, e.g., by summing the corresponding adjacency matrices.However, this type of intervention usually comes with some costs (e.g., temporal, infrastructural, economic, etc.).Here, we propose functional reduction (right), an alternative approach where the structure of the system is not altered while coupling the dynamics on the top of layers.While any dynamics is allowed in principle, diffusive processes are particularly suitable ones because they allow for theoretical and computational treatment.In this work, random walk dynamics is considered and layers are functionally coupled if their network states are not distinguishable from the point of view of a walker (i.e., the same color is assigned to functionally coupled layers).From an application perspective, this approach corresponds to shared tickets for public multimodal transport infrastructure or shared promotions for users who are part of multiple social networks at the same time.
change accordingly.This unfortunate feature of shortest paths makes them useless to enhance transport properties under the hard constraint of keeping the structure unaltered.
Therefore, constituents of complex systems tend to or are designed for exchanging information in a very efficient way to function properly.An emblematic example is given by the human brain, where the collective dynamics of billions of neurons is responsible for coordinating the human body and cognitive activities, while keeping energy cost as low as possible [19].However, a deeper understanding of how to act on the system to enhance or hinder transport phenomena -such as navigability [20] continues to elude us, because they depend on the interplay between structure and dynamics of a complex system, which is usually difficult to model and quantify [21].Nevertheless, understanding such an interplay represents one of the most important challenges in complexity science, with promising advances.In fact, it has been recently shown that the relation between structure and dynamics of information flow can be unraveled by analyzing how perturbations propagate through the system [22].The complex interplay between structure and dynamics of information propagation has been also mapped to its latent geometry to better understand network-driven contagion phenomena [23] and, more recently, it has been shown how signal propagation is able to capture the role of network connectivity in propagating local information, thus linking the topology to the observed spatio-temporal spread of perturbative signals across it [24].
However, the same task is even more challenging when the system of interest can be described in terms of a multilayer network [25][26][27][28][29][30][31].For instance, multilayer models for urban and regional transportation systems -accounting for different means, from rails, to ships and flights, serving the same geographical areas simultaneously -have highlighted that an efficient flow might be hindered by the lack of synchronization between different layers [32].
Up to date, it is still unknown how to enhance flow distribution in multilayer systems, especially under constraints.For instance, one might want to add new fast connections (e.g., highways, tube, flights, etc.) between distant geographic areas to speed up the traffic flow.However, since each connection comes with an economic cost, it is very unlikely that most of them can be realized in practice.Similarly, one might avoid to cut existing connections, even if desirable in some cases, because they might have a high societal cost in terms of impact on the population.It has been recently shown that for multilayer networks with interlinks, changing the weights of interlinks can enhance the diffusion on top of the net-works [33] and lead to the phenomenon of "super diffusion".However, in many complex systems, as the ones which can be modeled by coupled networks, interlinks either do not exist or it is not trivial to assign weights to them.Therefore, the challenge is to enhance transport properties without altering the existing structure: this can be achieved, in principle, by functionally coupling together layers with similar flow patterns.When each layer of the multilayer system is represented by a specific color, an emblematic way to understand functional coupling of two layers is to assign the same color to both of them and make them indistinguishable, as illustrated in Fig. 1.
It is worth remarking that this approach leaves the structures of layers intact, while only altering the dynamics on top of them, in a way that the dynamical process can not recognize functionally coupled layers as distinct ones.For instance, this type of solutions is adopted, to some extent, by airlines proposing shared flights to their customers.
Intuitively, one might think that reducing structural redundancy -defined in terms of replicated connections across layers -can provide a solution.Remarkably, the recent analysis of common estimators of structural redundancy, such as edge overlapping, revealed that the presence of redundant connectivity might boost the robustness of multiplex networks [34].However, when more complex indicators of redundancy are needed, our understanding is dramatically more limited.On the one hand, recent studies focused on quantifying the distance between two complex networks in terms of their structural dissimilarities [35] -as a distance between probability distributions extracted from the networks -or their spectral information content [36] have been successfully used to compare classical, single-layer, networks.On the other hand, structural dissimilarities [37] and spectral entropy divergence [38] between layers have been recently proposed as effective approaches for reducing the structure of such systems, based on different ordering strategies.However, methods relying only on structural features are not suitable for our purposes -as they are costly in practice -and they perform poorly, as we show later.The reason is that they are based on heuristics, lacking a fundamental understanding of the underlying physics.
We show that a fundamentally different perspective must be considered instead of structural redundancy: by using random walk dynamics as a proxy for information flow, transport phenomena are enhanced when subsets of layers are functionally grouped together in such a way that the corresponding diffusion pathways are maximally different.Therefore one must analyze the functional redundancy of a multiplex system, defined by abundant and redundant diffusion pathways that distribute the flow between different system's units.
In the following, for sake of clarity, to avoid any confusion with structural redundancy, we will refer to functional reduction and functionally reduced networks to indicate a loss of redundancy in diffusion pathways across layers.At variance with structural reducibility, functionally reducible multiplex systems consists of layers that can be functionally grouped into subsets to enhance transport, whereas irreducible systems do not allow for any functional aggregation.The difference between structural reducibility that alters the structure of multilayer networks, and the proposed functional reducibility that only alters the dynamics on top of multilayer networks, is illustrated in Fig. 1.
In this work, we develop tools familiar to physicists -such as partition function and mean-field calculations -to investigate the interplay between structure and dynamics of multilayer systems from a spectral perspective -an approach revealing successful for a similar purpose in the case of classical networks [39][40][41][42].By exploiting the relation between the flow distribution in a multiplex network and the spectral diversity of its layers, our framework can provide a broad spectrum of applications -from more optimal supply strategy that combine transportation of goods to more efficient navigability of urban areas by allowing for multimodal trips with the same ticketthat can be achieved by devising tailored policies targeting the dynamics of systems, without the cost of changing their structure.

II. STATISTICAL PHYSICS OF RANDOM WALKS IN COMPLEX NETWORKS
In this section, we introduce the state of multiplex networks, discuss the role of their structure in hindering the information flow, find an inequality relating partition functions of single layers to the partition function of the whole system and develop a mean-field approach to allow for analytical derivations of the next sections.Note that most mathematical details are provided in the Appendices, while we provide here the theoretical grounds required for defining and understanding the functional reducibility of multilayer systems.Information flow in multiplex networks.-Multiplexnetworks are a special class of multilayer systems providing a successful and widely used model for a broad variety of empirical systems [29,30].This class of networks is characterized by nodes with multiple types of relationships or interactions, simultaneously, which are encoded by layers.Here, physical nodes correspond to the physical units of the system, whereas state nodes correspond to the replicas of a physical node across layers.
Information flow in complex networks has been successfully modeled by diffusive processes such as random walks [10].However, while random walk dynamics has been previously introduced for interconnected multilayer systems [6,26], the case of multiplex systems has been recently studied only by using the full aggregation of the system into a single-layer network [43].Clearly, the dynamics of a random walker on the aggregated representation of a multiplex system can not coincide with the multiplex dynamics.
To describe the random walk dynamics on top of multiplex networks, we first find the corresponding transition matrix, that is given by T = T ( ) (See Appendix B), and encodes the probability of jumps from each node to any other.Similarly, the normalized Laplacian matrix corresponding to the multiplex random walk dynamics can be obtained as L = I − T = L ( ) , being I the identity matrix.This means that the normalized Laplacian matrix encoding random walks on a multiplex network is the weighted average of Laplacian matrices of layers.Importantly, each layer's weight corresponds to its contribution to the overall flow of information.As these weights are often hard to assess or not known at all, when this is the case we assume that there is no preference on layers, and from now on we consider uniform distribution of weights, 1/L, for a multiplex network of L layers.Nevertheless, it is worth remarking that the proposed method and the theoretical framework are valid for any general distributions of weights.
The propagator (see Appendix B) governs the temporal evolution of the random walk and, for this reason, it encodes information about the interplay between the structure of the system and the random search dynamics on the top of it.In fact, it has been recently used to define the state of the system [36], similar in spirit to the approach widely used in quantum statistical mechanics to define the mixed state of entangled quantum units.In this framework, the network state is defined by ρ(τ ) = e −τ L Z , where Z(τ ) = Tr e −τ L is a normalization factor, playing the same role of the partition function of the system.This state, which is formally identical to a density matrix [44], is then used to calculate the Von Neumann entropy of a complex network as S(τ ) = −Tr (ρ(τ ) log 2 ρ(τ )).See Appendix A for further details.Physical meaning of the partition function.-Here,we provide first a deeper understanding of the meaning of the partition function.In fact, Z(τ ) = N R(τ ), being e −τ λi the average return probability of random walker and λ i is the i-th eigenvalue of L (see Appendix C).As mentioned before, random walk can be used as a proxy of information transport in complex networks.Intuitively, high average return probability is associated with the tendency of structure to trap the information flow in its initial place.So, this correspondence between partition function of the system and average return probability, unravels how partition function relates to transport phenomena in complex networks.Furthermore, one can define density matrices in terms of any valid Laplacian encoding other types of dynamics (e.g., continuous diffusion).In this case, while the average return probability can not even be defined, the partition function still exists and corresponds to the amount of information that is trapped in its initial place and flow more difficultly towards peripheral parts of the network.
As mentioned above, high average return probability indicates that the random walker can not explore efficiently the rest of the network.Consequently, the presence of structural symmetries and abundance of redundant diffusion pathways for information to propagate between system's units are expected.For instance, in an ordered and symmetric structure like a regular lattice, where each node interacts only with its first neighbors and no long-range interactions are possible, the information exchange between distant nodes is slow and inefficient.Conversely, low average return probability indicates that the random walker can navigate the network faster.For instance, breaking down structural order by adding long range interactions between distant nodes, generating topological shortcuts and other defects like in small-world [45] and scale-free [46] networks, information flows faster and transport properties are enhanced, accordingly.Z(τ ), as a measure of transport that we name average dynamical trapping in the following, is able to encode these phenomena.The same arguments apply to multiplex networks, with the difference that in this case we are wondering if the redundancy of diffusion pathways across layers can be reduced by superimposing subsets of layers, in order to reduce the average dynamical trapping.
Fundamental inequality for multiplex systems.-Multiplexity of interactions among units generate nontrivial dynamical correlations between layers that have no counterpart when layers are considered in isolation.This important difference can be characterized in terms of differences in information content, quantified by average entropy distance of the multiplex and its layers (see Appendix D), that we call intertwining.In the next sections, we discuss the central role of intertwining in functional reducibility.Directly from intertwining, a fundamental inequality between the partition function of a multiplex system as whole and the partition functions of its layers can be derived.This inequality is important, as it relates the transport phenomena of multiplex system and layers, through average dynamical trapping (see Appendix D for details): where equality holds if and only if all the layers are the same, i.e. the random exploration of the network does not depend on any specific layer of the system.This equality defines the non-interacting scenario where layerlayer correlations do not alter the underlying dynamics: the entropy S ( ) (τ ) of each layer is calculated separately and the overall entropy is given by their average S nint (τ ) = S ( ) (τ ) .Conversely, any topological alteration of the non-interacting scenario introduces a dynamical correlation between layers, requiring the exploration of layers to gather more information about the system: in this case the network consists of interacting layers, where the diffusion dynamics on the whole multiplex network is considered to measure the entropy S int (τ ).Mean-field entropy.-Ananalytical treatment of Von Neumann entropy is, in general, difficult and we have developed a spectral mean-field theory to cope with this complexity (Appendices E-F):

Minimum-Redundancy Functional Coupling
In the next section, we use mean-field entropy to find a simple representation of intertwining, in terms of S int and S nint .

III. QUANTIFYING LAYER-LAYER INTERACTIONS: THE INTERTWINING
Using fundamental inequality (1) and mean-field entropy(2), it can be shown that S int (τ ) ≤ S nint (τ ) (Appendix G): i.e., layer-layer interactions lower system's entropy and, consequently, its partition function Z(τ ).We exploit the average information divergence of layers from the whole (see Eq. D2) rescaled by its upper bound, to provide a normalized measure that quantifies the importance of describing the system in terms of coupled networks rather than networks in isolation.For values of τ sufficiently large, and in absence of isolated state nodes, it can be shown that the relative intertwining reduces to the deviation between interacting and non-interacting entropies (see Appendix I for details): which is bounded between 0 (i.e., the system is reducible to a single network) and 1 (i.e., the system is irreducible).Before presenting how to use the relative intertwining for the analysis of multiplex networks, we derive analytically its direct dependence on the partition function and the eigenvalues of the normalized Laplacian matrix, especially on λ 2 -the second smallest one -which governs many transport phenomena, as shown in the following.From the master equation, the Laplacian matrix of the multiplex network is given by L = L ( ) : here, we write the Laplacian matrix L ( ) of each layer as the perturba-tion L ( ) = L + ∆L ( ) , reflecting in the corresponding eigenvalues as λ It can be shown that ∆λ ( ) = 0 (see Appendix I) and that (∆λ ( ) ) 2 ≥ 0, the latter quantifying the influence of the perturbation to each layer.The variance is zero only in the case of identical layers.
A quantity of interest is the variance averaged across all layers, i.e. ∆λ ( ) 2 , which measures the overall influence of the perturbation: it is expected to increase for increasing deviation of layers from the average, i.e. for increasing diversity of the layers.In fact, we show that the relative intertwining is directly proportional to this variance (Appendix I): which increases as the diversity of diffusion pathways of layers increases.
Similarly, Z ( ) (τ ) = Z int (τ ) + ∆Z ( ) (τ ) holds for partition functions and we show that (see Appendix I for details) Equations ( 4) and ( 5) provide the first fundamental result of this work: they show that by minimizing the partition function of the system one maximizes the relative intertwining while favoring the maximum functional diversity of layers.The second fundamental result of this work is the direct relationship between the relative intertwining and the second smallest eigenvalue λ 2 of the normalized Laplacian matrix (see Appendix M for details): This relation is important because it highlights how maximizing intertwining corresponds to maximize λ 2 , which in turn is equal to the inverse of diffusion time and it plays a crucial role for navigability as two important transport properties of complex networks (see Appendix L).
Thus, by maximizing the relative intertwining, one actually enhances the most important transport phenomena on multiplex systems and, as a byproduct, reduces their dimensionality.To use our framework for practical applications, we need a strategy for identifying similar layers in a multiplex network, aggregating them accordingly, and a stopping criterion.We use a novel approach, namely collective cosine distance, which is computationally faster and more reliable than existing methods when dealing with isolated nodes and multiple connected components (see Appendix J) to cluster layers.Note that we perform an exhaustive pairwise search among possible functional groups, de facto getting rid of heuristics such as hierarchical clustering [36,38].

IV. FUNCTIONAL REDUCIBILITY OF MULTIPLEX SYSTEMS
For each value of the time τ , we calculate all pairwise distances between layers to to find the most similar ones: at each step m of this sequence, the corresponding value I (τ ; m) of the relative intertwining is calculated.Finally, the values of relative intertwining are averaged over τ as I (m) = I (τ ; m) and used for comparison against navigability, diffusion time at each aggregation step.We repeat this procedure until the original system is aggregated into a single network, representing the superposition of the whole system from the dynamical perspective.The average relative intertwining is expected to reach a maximum value if any desirable superposition of subsets of layers exists (see Appendix I analytical details): in this case, the resulting system is characterized by enhanced transport properties.At this optimal grouping, in the sense that it is the most desirable configuration among the possible ones, the remaining number of layers can be indicated by L opt and one can define the reducibility of a multilayer network as χ = (L − L opt )/(L − 1) to characterize this system's feature with one scalar.Note that χ quantifies the fraction of layers grouped together: one can use either structural reducibility [38] -maximizing the topological distinguishability from the aggregate -or functional reducibility -maximizing differences in diffusion pathways across layers -defined in this work.However, when transport phenomena are a concern, functional reducibility must be preferred.Synthetic systems.-Wehave performed several numerical experiments to validate our theory on toy models with functionally reducible or irreducible systems.We show one representative case for each case in Fig. 2. For the reducible case, see Fig. 2A-C, our framework is able to detect functional redundancy and reduce it as expected, while lowering diffusion time, average return probability and dynamical trapping, and increasing navigability.For the irreducible case, see Fig. 2D-F, a multiplex system consisting of noisy and independent layers is not functionally reducible, as expected.For further synthetic systems, see Appendix N.
Remarkably, all transport properties considered here are enhanced by functionally coupling layers according to our theoretical framework.
In the following, we apply our framework to analyze the functional reducibility of empirical multilayer systems, including biological, social and transportation ones.Biological system.-As a representative biological system, we consider the largest connected component of molecular interactions of 263 proteins characterizing the proteome of Xenopus Laevis (the African clawed frog) [38], where layers encode different genetic interactions (Association, Direct interaction, Physical association, Colocalization and Suppressive genetic interaction defined by inequality) [49].Note that the last layer con-  Our analysis suggests that, from a functional perspective, protein-protein interactions characterized by Association and Colocalization in this organism can be coupled together.This result is in striking disagreement with the result obtained from structural reducibility, where the other two layers were aggregated because indistinguishable from an information-theoretic perspective [38].Social system.-Thesocial system considered is the largest connected component of the collaboration network of the 475 scientists affiliated to the Pierre Auger experiment [47], the largest observatory of ultra-highenergy cosmic rays.Here, collaborators work together in different research topics on specific tasks defining the 16 layers (e.g., Source detection, Mass composition, Experimental enhancements, Shower reconstruction, etc) of the multilayer co-authorship system.Note that one layer was so sparse with respect to the other layers that we have discarded it from the analysis.Results are shown in Fig. 3B and summarized in Tab.II.
In this case, our analysis suggests the existence of two functional groups of layers.The two groups do not reflect similarities observed in the multilayer community organization of scientists [47], highlighting how the suggested functional coupling does not reflect trivial or complex structural features.Urban and large-scale transportation systems.-Wefurther consider two transportation systems at different scales: the urban public transportation system of London consisting of 11 tube lines, DLR and Over-functional group layer ID (layer name)  ground [6] and 369 stations, and the European airport network [48], consisting of 417 airports connected by flight routes served by 37 airlines defining layers.Results are shown in Fig. 3C-D and summarized in Tab.IV-III, respectively.
Remarkably, in the case of London transportation, the result differs from the one obtained from structural reducibility, which instead predicts the aggregation of only one layer (χ struct = 8% [38]).This effect is due to the fact that distinguishability with respect to the aggregate network is not sufficient, alone, to capture the complex effects due to layer-layer coupling which, instead, are perfectly accounted for by the system's intertwining.In fact, the functional reducibility χ obtained in this case is 33.3%, with the advantage of lowering both the diffusion time and dynamical trapping.Interestingly, DLR and Overground, which are very different from other tube lines, are not functionally coupled with other layers, confirming that, from a functional perspective, they are not redundant for flowing passengers.
In the case of the EU airport network we observe a functional reducibility of about 35%.It is worth mentioning that the analysis of structural reducibility of a larger data set -encoding flight routes in Europe from 175 airlines serving 1064 airports -reported a reduction of 6% [38]: we will discuss better this result in the next section.
It is worth discussing about results obtained for the navigability of all empirical systems considered here.In fact, this transport descriptor is not enhanced, at variance with the other descriptors, when intertwining is maximized.However, this is not a limitation of the proposed framework because the equation relating intertwining to navigability has been obtained under an approximation that, evidently, is not satisfied when many isolated state nodes are present.In fact, even the coverage, and consequently the navigability, has been originally in-functional group layer ID (layer name)  troduced to deal with interconnected multilayer systems where state nodes exist in all layers.In this work, we have introduced its generalization to the case of noninterconnected multiplex networks: when state nodes are defined in all layers, as in our synthetic benchmarks, both the coverage and the navigability behave as expected and, in fact, are correctly enhanced.When a large fraction of state nodes are present only in a few layers, as in most empirical systems, random walkers are more unlikely to reach those nodes, dramatically reducing the overall probability of hitting the corresponding physical nodes.Since the study of a new measure of coverage and navigability is beyond the scope of this work, we leave it for future studies devoted to better define such measures to deal with more exotic topological scenarios.

V. CONCLUSIONS AND OUTLOOK
This study provides a transparent framework to better understand the interplay between complex topologies and non-compact search dynamics [50], allowing one to exploit the multifaceted interactions among constituents of complex systems -usually encoded by multilayer networks -to enhance transport phenomena without altering the underlying structure.
First, we defined the average dynamical trapping of a complex network to quantify the efficiency in information exchange between its constituents: the flow distribution is more efficient when this trapping is minimized.Therefore, we demonstrated that this concept is intimately related to the spectral partition function of a complex network, allowing us to map the analysis of transport phenomena into the analysis of system's spectral entropy.
By applying the same principle to multilayer networks, we have identified a strategy for coupling subsets of layers -according to a functional criterion -in order to minimize the overall partition function.In fact, layers are grouped together by considering their superposition with functional group layer ID (layer name)  respect to random walk dynamics.The choice of layers to superpose is driven by the deviation of system's entropy from the limit of non-interacting layers.This approach, justified from an information-theoretic perspective, has an elegant physical meaning: redundant diffusion pathways across layers are reduced until the multilayer system consists of layers with very different transport properties.This results is confirmed theoretically, showing that the average variance of the corresponding eigenvalue spectra must be maximum.
Results from synthetic benchmarks were in agreement with our expectations.Similarly, results from empirical multilayer systems confirmed the possibility to functionally couple group of layers to enhance information flow in the corresponding systems.
For instance, in the case of the collaboration network of Pierre Auger scientists, we provided evidence that some tasks should be coupled together.A mailing list -or any other shared communication mean -involving all scientists collaborating on the corresponding tasks would facilitate exchange of relevant information among peers.In fact, at the time described by the data set (i.e., between 2010 and 2012), an independent mailing list was assigned to each task and scientists had to subscribe to multiple ones to be updated with relevant information.However, this approach was not really efficient: it was not unusual to receive multiple times the same information.
The case of the European airport network was interesting as well.From a policy perspective, the functional coupling of routes corresponding to different airlines suggests that passenger flows would benefit from shared tickets.Among the identified functional groups, many are very reasonable when the location of the corresponding headquarters is considered: e.g., Air France and Swiss International Air Lines, Iberia and TAP Portugal, Ryanair and Flybe.Remarkably, while airline companies tend to avoid overlapping routes, as demonstrated by a structural reducibility of 6% [38], they are correlated (functional reducibility of 35%) in a such a way that, from a functional point of view, they can still improve their transportation service.
In the case of London public transport, we have found that a few configurations close to the functionally optimal one are also plausible in terms of enhanced transport properties and intertwining.This result was not surprising: the layers of this multiplex system have treelike structures which tend to avoid topological overlap towards peripheral areas, thus different functional configurations might enhance transport in a similar way.
On the one hand, our findings open the doors to a broad spectrum of applications where superimposing layers with tailored policies induces relevant and controllable changes in transport phenomena, reducing diffusion time and enhancing navigability in multilayer networks.Nevertheless, the proposed approach can be easily adapted to scenarios where one physically acts on system's topology, not necessarily multilayer, for instance by adding (or removing) either its constituents or its connections.On the other hand, the theoretical framework developed in this work can be used further explore the bridge between statistical physics and network information theory.
Appendix A: Basic notation Given a complex network of N nodes, undirected and weighted, the corresponding normalized Laplacian matrix governing the random walk dynamics, is defined by being W the adjacency matrix of the network and D the diagonal matrix with entries The corresponding density matrix [36] is defined by The spectral entropy of the network is therefore calculated according to the definition of Von Neumann entropy in case of a quantum system: where Z(τ ) = Tr e −τ L is the partition function.
In the following we will use the notation Z int (τ ) and S int (τ ) to indicate, respectively, partition function and spectral entropy of a multiplex network.We will avoid to specify the index where this choice does not generate ambiguity.

Appendix B: Random walk dynamics on multiplex
Random walk dynamics on top of multiplex systems has not been yet defined.Therefore, for our purposes, we firstly introduce a more general framework to derive the multiplex master equation as a function of diffusion time τ .
In fact, when exploring a layer isolation, a random walker jumps from node i to node j with probability T , where T ( ) is the transition matrix of the random walk, A ( ) is the adjacency matrix of the layer and k ij is degree of node i in layer .Interacting layers in a multiplex network exhibit a richer dynamics, as the random walker in node i can first switch from layer to layer with probability P (i) , and then jump from node i to node j with probability T ( ) ij .Although all results of this paper are valid for any probability distribution, in absence of knowledge about the relative importance of weights,let us assume in the following that the probability to switch layer is uniform, i.e.P = 1/L: then the overall probability to observe a transition from i to j, regardless of the layer, is given by ij , which can be written in a more compact notation as T = T ( ) .The weighted average, in case of further knowledge about the layer's weights, becomes: ( ) , where P ( ) is the contribution of layer in the total flow.
It is worth remarking once again that the dynamics is dramatically different from aggregating all layers to a single network and then calculating the transition matrix of the random walk.In fact, two layers encode networks which are represented by adjacency matrices: aggregating two layers is achieved by summing entry-wise their adjacency matrices.Similarly, superimposing layers makes them indistinguishable to the eyes of random walker, so the dynamics can be seen as random walk on the aggregated network.Evidently, the distinguishability of layers' dynamics, plays an important role in transport phenomena of complex systems and cannot be simply neglected.In the following, we will use the terms aggregation and superposition interchangeably.Two or more layers are functionally grouped together, or aggregated, if the corresponding networks are superimposed.
Having the transition matrix, we can find the Laplacian matrix of multiplex given by L = I − T. This allows for finding the master equation for random walk on multiplex networks.If the i-th components of the vector p(τ ) indicates the probability to find the random walker in node i at time τ , its evolution is governed by the master equation which, in the continuous-time approximation reduces to (Eq.(B2).) with solution given by p(τ ) = p(0)e −τ L .Here, L indicates the normalized Laplacian matrix of the multiplex, p(τ ) is the vector encoding the probability of finding the random walker in a specific node at time τ , and e −τ L is the propagator of random walk dynamics.Note that the Laplacian matrix of the multiplex network is derived in terms of the average of Laplacian matrices L ( ) ( = 1, 2, ..., L) of the single layers, i.e., L = L ( ) .

Appendix C: Dynamical trapping
To show this, let p(0) = p(i|0) ≡ (0, ..., 1, 0, ..., 0) indicate that the walk originates in node i at time 0 with probability 1.In practice p(i|0) = e i is the i-th canonical vector in R N , being N the size of the network.The return probability for node i at time τ is the probability of finding the random walker in i at time τ steps later assuming it originated in i at time 0, i.e. p(τ )p † (i|0), where † indicates the transpose operator.From the solution of Eq. (B2) in terms of the random walk propagator: Consequently, the average return probability is given by Appendix D: Properties of the partition function Spectral entropy can be used to compare two networks [36].One measure of similarity is the Kullback-Leibler (KL) divergence.Spectral KL divergence is nonnegative, exactly as its information-theoretic counterpart.
Given a multiplex network with density matrix ρ(τ ) and any of its layers, indicated by l = 1, 2, ..., L, the following inequality holds: Note that the explicit dependence on τ is omitted for sake of clarity.
The average entropy distance of multiplex and its layers, can be quantified by the Kullbeck-Leibler entropy divergence, and in case of uniform distribution of layer weights, defines the intertwining as follows: which is zero if the system as a whole does not provide any additional information about its layers in isolation.
Conversely, a large average divergence between the system and its layers highlights the necessity for using the multiplex description to encode the layer-layer interactions.In the next sections, we will show how the normalized version of this measure can be understood in terms of the entropy of layers in isolation and multiplex as a whole, and how it can be used to enhance transport properties while superimposing subsets of layers.We emphasize again that also intertwining can be found for any arbitrary set of layer weights, in case we have a way to asses the relative importance of layers.Using Eq. (A3) we obtain As it is true for all layers, we sum over layers and divide by the number of layers and exploit the relation L = from which the following fundamental inequality, relating the partition functions of layers with the one of the multiplex system, is obtained: A direct consequence of this result is that at least one layer must have a partition function such that Z ( ) (τ ) ≥ Z(τ ), or inequality in Eq. (D4) would not be satisfied.It follows: which will be useful in the following.This last result holds for networks consisting of a single connected component, a scenario that could not be satisfied by many empirical systems.Its generalization to the case of multiple connected components is obtained as follows.First, let us assume that there are C connected components in the system.If τ 1 then Z(τ ) ≈ C and the following approximation holds: the last step justified by the fact that λ 1 , ..., λ C = 0 for a network with C connected components, which includes the possibility for C isol isolated nodes, i.e. nodes not interconnected with the rest of the system.For sake of completeness, it is worth remarking that Tr (L) = N if there are multiple connected components but no isolated nodes, whereas Tr (L) = N − C isol if isolated nodes are present.
The mean-field approximation consists in neglecting higher-order terms in the following: To calculate the mean values, it is worth noting that: It follows that which, for a network with no isolated nodes and only one connected component, it reduces to for N 1.It follows that in the thermodynamic limit the mean-field entropy is given by or, more generally: which, in the thermodynamic limit, reduces to if C scales sub-linearly with N , i.e., if the number of isolated nodes and disconnected components is much smaller than the size of the system, which might not be the case for finite empirical systems.
Taylor expanding the logarithm, keeping the first term in the limit of large τ , The goodness of the mean-field approximation is numerically investigated for a variety of network models (see Fig. 4 and 5 for emblematic examples).

Appendix F: First order correction
In the previous section we have neglected the crosscorrelation term (λ − λ)(ν(τ ) − ν(τ )) to obtain the mean-field approximation of the spectral entropy.It is straightforward to show that this term can be considered as a correction to the mean-field: in order to write the spectral entropy as S(τ ) = S MF (τ ) + δS MF (τ ).Note that δ i = λ i − λ , where λ i is the i-th eigenvalue of the normalized Laplacian matrix and λ = 1 in the thermodynamic limit.
In fact, Z(τ ) ≥ 1 for any value of τ , therefore the following inequality holds: where we have expanded the exponential in its Taylor series truncated at second order and we have used the fact that δ 2 = λ 2 + 1.In the case of C disconnected components, this difference is From the definition of Snint (τ ) and the above results, it is straightforward to show that that in our case reduces to Var(λ) ≤ 1, for the distribution of normalized Laplacian matrix's eigenvalues.It follows that 0 Appendix G: Non-negativity of Intertwining From the fundamental inequality , it's straightforward to show that If τ 1 then Z(τ ) ≈ 1 and the following approximation holds: which leads to By summing side by side inequalities (G1) and (G3) we obtain S int (τ ) ≤ Snint (τ ), (G4) leading to non-negativity of the intertwining for τ 1.We wonder to which extent this inequality holds for smaller values of τ .
After expanding Z around δ i = λ i − λ = λ i − 1 we have at first-order: It follows that Eq. ( G2) is satisfied provided that (N − 1)e −τ 1, which is equivalent to require that τ should be larger than the τ c where τ c > log(N − 1) in order for mean-field approximation to be very accurate.
In the more general case, we have Z(τ which similarly holds if τ c > log( N C − 1) so that log Z C can be expanded in the same way.Note that if there are isolated nodes, the fundamental inequality must be proven.
Appendix H: When and why the structural reducibility fails Of course, the non-interacting scenario described here is barely observed in reality and S nint (τ ) would be equal to the entropy of one single layer.However, our framework suggests that even in the case of different layers, we can interpret S nint (τ ) as an upper bound -obtained by exploring the system using each layer separately -to S int (τ ) (see Appendices E, F and G).Then, any deviation from this upper bound highlights the existence of layer-layer interactions and can be used functional reduciblity of the system.In the light of this distinction, we are now able to better understand existing measures in the literature.For instance, the classical structural reducibility [38] is based on comparing S nint against the entropy of the fully aggregated system S agg : clearly this method has to fail when some layers are exactly the same and, more generally, when they are very similar with each other (Appendix H).
The classical structural reducibility method is based on maximizing the quality function where n is the aggregation step, and S agg is the entropy of the fully aggregated system.Let us consider a multiplex network with L layers, where m layers are identical and have entropy S (same) = S ( ) ( = 1, 2, ...m), while the remaining layers have different topologies and we indicate by their average entropy.We expect a suitable optimization procedure to identify the m − 1 redundant layers and reduce the multiplex system to L − m + 1 layers.
During the first m aggregation steps, the layers that are supposed to be aggregated are exactly the same.Remarking that the entropy remains unchanged if the adjacency matrix is multiplied by a constant, when two out of m equal layers are aggregated, the entropy of the resulting network is still S (same) .So when calculating the average entropy of layers, aggregation of every pair of equal layers is exactly equal to removing one of them.This way it is straightforward to write the average entropy of layers as a function of aggregation step n which follows, Writing one of the entropies in terms of the other one S (dif ) = δ + S (same) we have Therefore, if S (dif ) > S (same) then δ is positive, and the aggregation of similar layers increases the average entropy of layers S ( ) (n), while decreasing the classical quality function: Sagg .This case corresponds to an irreducible structure, an evident failure of the method because it is only comparing the average entropy of different groups of layers.A convincing example is given in Fig. 6, where the method is applied to a multiplex system consisting of 10 layers (9 are identical).FIG. 6. Functional versus structural redundancy.A multiplex network with 10 layers is given, in which 9 layers are exactly the same and expected to be grouped into one network.In scenarios like this one, the structural reducibility fails to identify the redundant layers to superimpose.The curve corresponding to functional redundancy has been rescaled to match the scale of the other curve to facilitate the comparison of trends and the identification of maxima.

Appendix I: Using intertwining for dimensionality reduction
If L indicates the normalized Laplacian matrix of the multiplex network, let us introduce perturbation matrices ∆L ( ) such that the normalized Laplacian matrix of each layer can be written as What is the effect of such perturbations on the corresponding partition functions?In fact, we have already shown that either for the multiplex network or for each layer separately.This result suggests that it is sufficient to find a relation between the second-order moments of the eigenvalue distribution for networks separately and the whole multiple system.
In the following, we use the perturbation approach suggested by Lord Rayleigh, where we implicitly assume that perturbations to the Laplacian matrices are small.Since matrices L ( ) are diagonalizable, then λ with ∆λ where q i indicates the i-th eigenvector of L. Since λ ( ) = 1 and λ = 1, it follows that ∆λ ( ) = 0, whereas where the covariance term λ∆λ ( ) = 0 because perturbed eigenvalues are not depending from the eigenvalues of L. It follows that We can also write the product of partition functions of layers in terms of perturbations: we can use the approximation Zint(τ ) at first order, to obtain where Above the time τ c , Eq. (G2) holds and we can write entropy in mean-field regime as On the one hand, let us consider the average informa-tion divergence given by where, multiplying by τ + 1 we obtain the important relation with the entropy deviation given by On the other hand, since log we can write It is worth remarking the dependence of the intertwining on the average variance of single-layer eigenvalues, stressing the fact that it can be used as a measure of topological diversity of layers.In practice, it is desirable to work with the relative intertwining, defined by Since S int (τ ) ≤ Snint (τ ) then 0 ≤ Snint (τ ) − S int (τ ) ≤ Snint (τ ), therefore 0 ≤ I (τ ) ≤ 1.For small perturba-tions, the relative intertwining reduces to It is worth noting that the second-order approximation used in this section requires that τ 2 2 ∆λ ( ) 2 1 and 1.It follows that the relative intertwining can be well approximated by The last relation highlights how the intertwining is expected to increase for increasing deviation of single-layer eigenvalue spectra from the overall average.Additionally, the relative intertwining can be approximated as . If the perturbation is sufficiently small, i.e. ∆Z ( ) (τ ) Z int (τ ), the approximation reduces to I * (τ ) ≈ ∆Z ( ) (τ ) Zint(τ )−1 , which directly relates the intertwining to the partition function of multiplex network.
The integral over time of the relative intertwining is used as a quality function in the main text in order to have a parameter-independent quantification of diversity.If m = 0, 1, ..., L − 1 indicates the aggregation step of the functional reduction procedure, we define the quality function by where τ c > log(N − 1) and τ max can be arbitrarily large.
To allow for a meaningful comparison with some transport properties, as shown in the next section, we can set τ max = N without loss of generality: this choice corresponds to let random walkers explore the network with a number of steps that is of the order of system's size.Since I (m, τ ) = f (τ )φ(m), i.e. its dependence on vari-ables can be separated, the resulting quality function will not be affected by the integral of f (τ ) at different aggregation steps, therefore we can focus only on the influence of φ(m) = ∆λ ( ) 2 .
To better understand the behavior of φ(m), let us consider the case of a multiplex network consisting of L layers, among which m 0 < L are identical.For sake of simplicity, but without loss of generality, we can assume that the identical layers are the ones labeled by 1, 2, ..., m 0 , where non-identical layers are labeled by m 0 + 1, ..., L. It follows that suggesting that the difference in the quality function between two successive aggregation steps depends on the trade-off between a negative term (the first one, since m 0 < L) and a positive term (the second one).This difference is positive if and only if The last result provides us with a deep insight on the behavior of the quality function: it increases after aggregating two layers that are identical -or, equivalently, with an eigenvalue spectrum whose variance is identically displaced from the overall average -if and only if the average variance of the eigenvalue spectra of non-identical layers is larger than the variance of the eigenvalue spectra of identical ones.Roughly speaking, the quality function increases when superimposing two layers which provide redundant information in the multiplex system whereas it decreases when it is not the case.Since at the latest possible aggregation step, i.e. m = L − 1, the intertwining is zero by construction, when redundant layers are present we expect an optimal step m opt to exist in correspondence of the step where topological diversity is maximum.

Appendix J: Collective cosine distance between layers
A standard way to quantify dissimilarity between layers is to use Jensen-Shannon entropy divergence.Indi-cating by ρ ( ) (τ ) and ρ ( ) (τ ) be the density matrices corresponding to two layers and of a multiplex network, their Jensen-Shannon divergence is defined by J [ρ ( ) (τ )||ρ ( ) (τ )] = S (mix) (τ ) − 1 2 [S ( ) (τ ) + S ( ) (τ )], formally equivalent to the measure widely used in quantum computing.Here, S (mix) is the entropy of the mixture matrix defined by µ ( , ) = [ρ ( ) (τ ) + ρ ( ) (τ )]/2.The quantity J [ρ ( ) (τ )||ρ ( ) (τ )] defines a metric distance between layers that is a function of diffusion time τ .However, Jensen-Shannon distance (JSD) might be not reliable in the case of multiplex systems consisting of multiple connected components and isolated nodes across layers, scenarios typical when analyzing empirical systems.Moreover, it depends on τ : for multi-resolution analysis this is desirable, while it is less desirable when one is not interested in having results at different time scales.
In this section, we introduce a novel distance measure, that we name collective cosine distance (CCD), which is not affected by the above limitations.At variance with JSD, which is an information-theoretic measure, CCD is a geometric measure based on the calculation of the average angular distance between transition vectors.The transition vector T i associated to node i is a column of the transition matrix T, and encodes the probability of jumps from that specific node to any other node in the network.As the role of a certain node in distribution of flow can vary from layer to layer, T Note that in case of isolated state nodes, the corresponding transition vector can be taken as zero.The collective cosine distance (CCD) between two layers, accounting for the collective angle, is defined by and can be used to quantify the pairwise dissimilarity of layers while accounting for dissimilarities of their diffusion pathways.
It is worth remarking that CCD is more convenient than JSD for the goal of this study.From a computational perspective, it is dramatically faster than JSD and avoid hierarchical clustering: therefore, reduction is not based on a heuristics but it can be the result of an exhaustive search through all possible functional couplings.From a theoretical perspective, CCD does not depend on a temporal scale and naturally deals with the presence of isolated nodes and multiple connected components.
Appendix K: Laplacian matrix in presence of isolated nodes Empirical multiplex systems usually consist of several isolated state nodes and, sometimes, of disconnected components.The presence of these isolated nodes leads to singularities when modeling the dynamics on top of the network.To avoid such singularities, a typical workaround is to add self-loops connecting isolated nodes to themselves.However, this mathematical trick boosts the dynamical trapping and consequently, it dramatically alters the resulting dynamics in an unpredictable way.Another possibility is to add virtual links, with extremely small weights, connecting an isolated state node to all state nodes of the corresponding layer, or to a number of them.This workaround provides a more realistic approach to model a diffusion process on top of multiplex networks.Unfortunately, when the number of isolated state nodes is large, a large portion of the flow goes through the virtual links and, consequently, the resulting dynamics in an unpredictable way.
To overcome those issues, we propose an alternative approach to find the right Laplacian matrix of a multiplex network and its layers when an arbitrary number of isolated state nodes is present.To this aim, we first question the physical nature of isolated state nodes.As they have no contribution in any interaction, they should not have any impact on the overall flow as well.More technically, instead of focusing on the structure, we associate suitable transition vectors to the isolated state nodes in order to remove their influence on the overall dynamics.
Mathematically, let us indicate by T ( ) i the transition vector for node i in layer , and, if the node is isolated in that layer, let us indicate the vector by T ( ,iso) i .Our goal is to exclude the effects of isolated state nodes on the transition vector of node i.Let us assume that node i is connected in only m out of L layers of the multiplex network: the corresponding multiplex transition vector is given by Once transition vectors for isolated state nodes are obtained, one can find the transition matrix of layers with isolated nodes, and consequently the transition matrix of the multiplex, as the weighted average of transition matrices of layers.

4 FIG. 1 .
FIG.1.Structural versus functional reducibility of multilayer systems.In structural reduction (left) one alters the structure by iteratively aggregating layers, e.g., by summing the corresponding adjacency matrices.However, this type of intervention usually comes with some costs (e.g., temporal, infrastructural, economic, etc.).Here, we propose functional reduction (right), an alternative approach where the structure of the system is not altered while coupling the dynamics on the top of layers.While any dynamics is allowed in principle, diffusive processes are particularly suitable ones because they allow for theoretical and computational treatment.In this work, random walk dynamics is considered and layers are functionally coupled if their network states are not distinguishable from the point of view of a walker (i.e., the same color is assigned to functionally coupled layers).From an application perspective, this approach corresponds to shared tickets for public multimodal transport infrastructure or shared promotions for users who are part of multiple social networks at the same time.

FIG. 2 .
FIG.2.Functional reducibility in synthetic systems (A) A multiplex network with 100 nodes, consisting of 5 identical layers and 5 distinct random networks with wiring probability 0.2 is illustrated in its simulated form (Original System), its form after functionally coupling its layers according to the framework presented in this study (Minimum-Redundancy Functional Coupling) and the functional form corresponding to the coupling of all layers simultaneously (Aggregate).(B) Heatmap encoding pairwise dissimilarity (the darker the color, the larger the distance) between layers shown in panel (A), according to collective cosine distance (see Appendix J).For clarity, cells are re-arranged in such a way that closer layers have smaller distance.(C) From top to bottom: intertwining compared to transport properties such as average return probability (rescaled by N ), diffusion time and navigability, respectively, that can be enhanced on this multiplex network by using functional reducibility.Note that average return probability R(m) at reduction step m, integrated over τ , has been then transformed by − log(1 − R(m)/ max m {R(m))} for clarity.(D) A multiplex network with 100 nodes where layers are uncorrelated random networks with wiring probability 0.2.(F) As in (B), but for the system considered in panel (D).(F) Transport properties are not enhanced by functional coupling because spectral diversity is maximized when the original system is not functionally aggregated.See Fig.7for results on additional synthetic models.

FIG. 3 .
FIG. 3. Functional reducibility of empirical multilayer systemsThe same analysis shown in Fig.2is performed on different multilayer systems, namely the: (A) protein-protein interactions of Xenopus Laevis (the African clawed frog)[38]; (B) co-authorship network of the Pierre Auger experiment[47]; (C) public transportation of London[6]; (D) European airport network[48].See the text for further details about the data set.Note that entries dij of distance matrices have been transformed by − log(dij) to enhance the visualization of the corresponding heatmaps.(Theaverage return probability is rescaled by N )

(
Assuming the same contribution for all isolated state nodes in their average, T

TABLE I .
Functional coupling of layers in the multiplex protein-protein interaction network of Xenopus Laevis.Reducibility: χ f unc = 33.3%.

TABLE II .
Similar to Tab.I, for the Pierre Auger collaboration network.Reducibility: χ f unc = 21.4%.