Algorithmic complexity of multiplex networks

Multilayer networks preserve full information about the different interactions among the constituents of a complex system, and have recently proven quite useful in modelling transportation networks, social circles, and the human brain. A fundamental and still open problem is to assess if and when the multilayer representation of a system is a qualitatively better model than the classical single-layer aggreagated network approach. Here we tackle this problem from an algorithmic information theory perspective. We propose an intuitive way to encode a multilayer network into a bit string, and we define the complexity of a multilayer network as the ratio of the Kolmogorov complexity of the bit strings associated to the multilayer and to the corresponding aggregated graph. We find that there exists a maximum amount of additional information that a multilayer model can encode with respect to an equivalent single-layer graph. We show how our measure can be used to obtain low-dimensional representations of multidimensional systems, to cluster multilayer networks into a small set of meaningful super-families, and to detect tipping points in different time-varying multilayer graphs. These results suggest that information-theoretic approaches can be effectively employed in the study of multi-dimensional complex systems, and pave the way to a more systematic analysis of static and time-varying multidimensional complex systems.

Multilayer networks preserve full information about the different interactions among the constituents of a complex system, and have recently proven quite useful in modelling transportation networks, social circles, and the human brain. A fundamental and still open problem is to assess if and when the multilayer representation of a system is a qualitatively better model than the classical single-layer aggreagated network approach. Here we tackle this problem from an algorithmic information theory perspective. We propose an intuitive way to encode a multilayer network into a bit string, and we define the complexity of a multilayer network as the ratio of the Kolmogorov complexity of the bit strings associated to the multilayer and to the corresponding aggregated graph. We find that there exists a maximum amount of additional information that a multilayer model can encode with respect to an equivalent single-layer graph. We show how our measure can be used to obtain low-dimensional representations of multidimensional systems, to cluster multilayer networks into a small set of meaningful super-families, and to detect tipping points in different time-varying multilayer graphs. These results suggest that information-theoretic approaches can be effectively employed in the study of multi-dimensional complex systems, and pave the way to a more systematic analysis of static and time-varying multidimensional complex systems.

I. INTRODUCTION
The success of network science in modelling real-world complex systems [1][2][3][4][5] relies on the hypothesis that the interconnections among the elementary units of a system -i.e., the network of their interactions-are responsible for the emergence of complex dynamical behaviours [6][7][8]. Traditionally, relevant contributions towards a better understanding of complex networks have come from statistical physics [9][10][11][12][13], where the main aim is to characterise the ensembles of random graphs comparable with an observed real-world network. However, really interesting results have also come from information theory [14][15][16][17]. In particular, quite prolific lines of research in this area have focused on the adaptation of classical concepts and methods from information theory to networks analysis [18][19][20][21][22], on the definition of entropy measures on empirical networks [23,24], and on the quantification of the significance of structural indicators based on algorithmic information theory [25][26][27].
A current hot research topic in network science is represented by multi-layer and multiplex networks, which take into account different kinds of relations among the same set of nodes at the same time [28][29][30][31][32]. The main idea behind the investigation of high-dimensional network representations is that retaining full information about the structure of a system under study is fundamental to fully understand its behaviour. Indeed, multi-layer networks have helped unravelling interesting structural properties in transportation systems [33,34] and neuroscience [35][36][37], and have revealed qualitatively new emerging phenomena [38] including abrupt cascading failures [39], super-diffusion [40], explosive synchronisation [41], and * Corresponding author: v.nicosia@qmul.ac.uk the appearance of new dynamical phases in opinion formation [42][43][44], spreading [45] and epidemic processes [46][47][48][49].
These encouraging results have transformed our understanding of many physical systems, but an overarching question remains about whether it is indeed necessary to incorporate all the available data about a system in order to fully characterise its behaviour. Some recent studies have indeed shown that the multi-layer version of some dynamical processes cannot be reduced to the corresponding single-layer process on any simple combination of the existing layers [50]. Nevertheless, determining if -and to which extent-it is possible to obtain and use an equivalent lower-dimensional representation of a given multi-layer system, while still observing the same structural and dynamical richness, is mostly an open question. The recent attempts to formalise multi-layer dimensionality reduction in terms of a quantum information problem [51] represents a concrete step in that direction. However, we still lack a convincing method to quantify the amount of information contained in a multi-layer network model, and to compare the information content of different multi-layer networks.
In this paper, we take an algorithmic complexity perspective on this problem, and we define the complexity measure C(M) to quantify the amount of information contained in a multiplex network M. The measure leverages the classical concept of Kolmogorov Complexity [52,53], according to which the complexity of a bit string is equal to the length of the shortest possible program that can produce that string as its output. In particular, we show that C(M) is quite useful in determining the optimal number of layers needed to represent a multi-layer network, and in detecting similarities among multi-layer networks from different domains.
FIG. 1. The complexity C(M) of a multiplex network M is defined as the ratio between its Kolmogorov complexity and the Kolmogorov complexity of the associated single-layer aggregated graph. The multiplex is transformed into a string of bits by means of the prime-weight matrix. Since Kolmogorov complexity is not computable, we rely on an upper-bound based on the size of the compressed string of bits associated to each object. A common way to obtain an upper bound is by computing the length of the string compressed through the gzip algorithm [66].

II. RESULTS
The formalism we propose here to quantity the complexity of a multiplex network over N nodes and M layers is based on the comparison of the Kolmogorov complexity of the multiplex and of the corresponding aggregated graph. We start by encoding the multiplex M into the N × N prime-weight matrix Ω defined as follows: The prime-weight matrix is obtained by assigning a distinct prime number p [α] to each of the M layers of the multiplex, and then setting each element Ω ij equal to the product of the primes associated to the layers where an edge between node i and node j actually exists. Note that, given a certain assignment of prime numbers to the M layers, the matrix Ω is uniquely determined. Moreover, thanks to the unique factorisation theorem, the prime-weight matrix preserves full information about the multiplex network M, i.e., about the placement of all its edges.

A. Complexity of multiplex networks
We define the complexity of a multiplex network M with N nodes and M layers as the ratio: where the numerator is the Kolmogorov Complexity of M and the denominator is the Kolmogorov Complexity of the weighted aggregated graph associated to M. In particular, the matrix Ω is the prime-weight matrix representation of M, while W is the single-layer network obtained by flattening all the M layers (see Materials and Methods). We compute an approximation of the Kolmogorov Complexity of a matrix by looking at the size of the compressed weighted edge list (see Material and Methods and Supplementary Information for more details). The measure of complexity in Eq. (2) effectively quantifies the relative amount of additional algorithmic information needed to encode the multiplex network with respect to the amount needed to encode the corresponding single-layer aggregated graph. As a particular case, C(M) = 1 if the multiplex network consists of M identical layers, but in general C(M) ≥ 1, since the different possible arrangements of edges across the layers require more than one symbol to be encoded. The main hypothesis is that the higher the value of C(M), the larger the amount of information lost when representing the multiplex as a single-layer graph. In practice, if C(M) ≈ 1 it would not make much difference to represent the multiplex as a single-layer graph, since the multiplex representation is not adding much more information. Conversely, when C(M) > 1 the aggregation of the multiplex into a single-layer graph would discard relevant information, and the larger the value of C(M) the more important it is to retain the full multiplex model.

B. Synthetic multiplex networks
The fundamental ingredients contributing to the complexity of a multiplex M as quantified by C(M) are the number of distinct pairs of nodes connected by an edge, and the actual number of distinct symbols present in the prime-weight matrix Ω. In fact, both a larger number of connected pairs of nodes and a larger number of distinct symbols will in general result in a larger encoding, and a larger value of C(M). Indeed, the number of symbols present in Ω is equal to the number of different multiplex motifs with two nodes present in the system [36].
The structural edge overlap o is a an easy-to-compute proxy for the variety of different multi-edge configurations in a multiplex network (see Materials and Methods). In order to understand the effect of edge overlap on C(M), we considered ensembles of synthetic multiplex networks with different number of layers, where the total number of nodes and the average node degree on , indicating that there exists indeed a maximum amount of additional information that a multiplex can encode with respect to the corresponding aggregate graph. We find it quite remarkable that the range at which C(M) peaks is compatible with the typical values of structural edge overlap observed in many real systems [50,54] (see Table I).
Structural hysteresis. -In order to fully explore the behaviour of the complexity C, we considered an ensemble of synthetic multiplex networks where we iteratively decrease and increase the structural overlap o (see Materials and Methods for details). Interestingly, we found two robust and distinct trajectories when the structural overlap is decreased (resp. increased), characterised by a hysteresis loop (Fig. 2B). In the simulations, we start from a multiplex network with N = 10000 nodes and k = 6, where the M = 4 layers are identical Erdös-Renyi random graphs, and we iteratively rewire the links in order to decrease the total edge overlap until we reach a multiplex network with o = 0. After that, we successively increase the structural overlap until the system results in a multiplex network with o = 1. Remarkably, C(M) remains a non-monotonic function of the structural edge overlap, but the plot of C(M) as a function of o reveals the presence of a structural hysteresis, i.e., the trajectory leading from o = 1 to o = 0 is different from the one obtained when the structural overlap is increased from o = 0 to o = 1. This indicates that the procedures used to decrease and increase edge overlap are not ergodic, due to the intrinsic difference between the way overlap is created and destroyed. Indeed, if we start from a multiplex M with identical layers, the total number of ways in which overlap randomly reduce its overlap is significantly larger than the total number of ways in which edge overlap can be increased through random moves.

C. Reducibility and multiplex cartography
One of the main issues of multi-dimensional data sets is that they normally contain redundant information. Consequently, the direct transformation of each type of relation available in a multi-dimensional data set into a distinct layer of a multiplex network will possibly result in a structurally redundant representation of the original system. However, dealing with parsimonious models is always desirable, and is especially important in the case of multi-dimensional systems, where additional model complexity usually yields additional computational costs and interpretation issues. The "reducibility problem", originally formulated in Ref. [51], is the problem of finding low-dimensional representations of a multiplex network which preserve as much structural information as possible about the original system.  The Multiplex Complexity C(M) that we have defined provides a natural and meaningful way of obtaining reduced (low-dimensional) versions of a multiplex networks over M layers. If we start from the original multiplex network M and we aggregate some of its layers, we obtain a reduced multiplex network X with X ≤ M layers, which will have a multiplex complexity C(X ). We propose to quantify the normalised information content of the reduced multiplex network X as: where K X is the number of distinct links in the multiplex X . The normalisation by log K X is necessary, since in general the Kolmogorov Complexity of a bit string of length n is not smaller than c + log n, for some c ≥ 0 [55,56]. The length of the bit string associated to a multiplex network is proportional to the number of distinct links in the multiplex hence, on average, a multiplex with a larger number of edges is expected to have a higher multiplex complexity.
Notice that q(X ) behaves as a quality function, meaning that larger values of q(X ) indicate that the (possibly reduced) multilayer network X encodes a relatively larger amount of information with respect to the corresponding weighted aggregated graph W X . Hence, our goal is to find argmax [q(X )], which represents the optimally reduced multiplex X max yielding the maximum value of information with respect to the aggregated graph. In particular, if all the layers of the multiplex network X are identical, then the maximum of q will always be at 1 = X ≤ M layers, since the multiplex network and the aggregate graph are equivalent.
Maximising q(X ) by enumerating all the possible partitions of the M layers is not feasible, since their number increases super-exponentially with M . We employed here a classical greedy algorithm to approximate the optimal solution (see Materials and Methods). The results are reported in Table I. Notice that most of the technological and biological multiplex networks in the Table admit reduced representations which have only a slightly smaller number of layers than the original systems. This is in agreement with the observation that in technological systems structural redundancy is purposely avoided. Similarly, the poor redundancy observed in biological multiplex networks is in line with the functionally different role played by each layer (protein interaction, functional dependence, mechanical interaction, and so on). However, technological systems exhibit consistently larger values of multiplex complexity than biological systems. A comparison with the reducibility algorithm proposed in Ref. [51] shows that our definition of multiplex complexity is more conservative, and often yields an optimal partition that has a slightly larger number of layers. We argue that this is a nice feature of the algorithm, since aggressive aggregation is known to produce structural artefacts.

D. Time-varying multiplex networks
The multiplex complexity C(M) can be used to track the temporal evolution of the structure of time-varying 1 9 1 0 1 9 3 0 1 9 5 0 1 9 7 0 1 9 9 0 2 0 1 0  multiplex networks (see Methods for details about the data sets). In Fig. 4 we show how the complexity of five large-scale multiplex networks has changed over time. Interestingly, C(M) provides a very good picture of the alternating behaviour of the IMDB movie co-starring network over about a century (Fig. 4A), and of the network of 35 major assets in the NYSE and NASDAQ financial markets in the period 1998-2013 (Fig. 4B). In both cases, local maxima of complexity are consistent with the most notable periods of crisis in each data set, while local minima of complexity seem to be precursors of renaissance in IMDB or stability in the financial market. The value of complexity of scientific collaboration networks (APS and Web of Science, Fig. 4C) has remained stable around C(M) = 1 over the last 35 years. This is mainly due to the fact that in these multiplex networks each layer represents a different field or sub-field of science, and authors normally tend to publish in one or at most a couple of fields. In fact, the structural overlap of those multiplexes is always very small, and the large majority of pairs of nodes are connected in at most two layers. As a result, there is not indeed much benefit in considering the multiplex representation, since the information encoded in the different layers is comparable to that contained in the corresponding aggregated graph. It is worth noting that the complexity of the FAO multiplex network of food exchange has kept increasing steadily in the last 30 years (Fig. 4D). This is most probably linked to the globalisation of commercial exchanges in general, which is reflected also in a more intricate pattern of relations among countries across a wide range of products.
Mapping multiplex networks. -Finally, in Fig. 3 we show how multiplex complexity can be used to obtain a planar embedding of multiplex networks of different kind, and to reveal the presence of interesting clusters. For each multiplex network in the data set, we used the maximum value of the quality function max q(•) as one of the coordinates, and the maximal entropy rate per nodeh max as the other one (see Methods for details). Note thath max is linked to the dispersiveness of random walks on a graph, and it thus carries information about the the large-scale dynamical properties of a multiplex [57]. As a consequence, it provides a perspective on a system that is orthogonal to that captured by multiplex complexity, which is instead a purely structural quantity. In Fig. 3A we indicated with different colours the two largest group obtained through hierarchical clustering, while in Fig. 3B we show the corresponding dendrogram. Interestingly, biological multiplex networks are all grouped together, while social and technological systems are put in another cluster.

DISCUSSION
Quantifying the structural information encoded in a network is of fundamental importance to identify the key components of the system it represents. Indeed, information theory has already proven quite successful at extracting meaningful structural information from graphs [20,51,58] and at providing sound null-models for different network-related tasks [59][60][61][62]. In the case of multi-dimensional data sets, and in particular of the multi-layer networks constructed from them, assessing the actual amount of information contributed by each additional layer is of fundamental importance. Despite the current trends in Data Science seem to suggest otherwise, more data is not always a blessing. Not all additional data available is indeed informative, and quite often more data implies more redundancy and more noise.
The algorithmic information approach proposed in this paper allows to condense the structural properties of a multiplex in a number -the multiplex complexity C(M). But since C(M) is defined as the ratio between the Kolmogorov complexity of the multiplex and that of the corresponding aggregated graph, its values allow to assess to which extent a given multiplex representation of a system is more informative than a single-layer graph (e.g., if C(M) is larger than 1). Consequently, it is meaningful to rank different multiplex networks according to their value of C(M), since those values are indeed telling us how much a multiplex representation deviates from the corresponding null-model hypothesis, i.e., that the system can be represented instead as a single-layer graph.
One of the most appealing aspects of C(M) is that it can be successfully employed to detect redundancy in a multiplex network, and to obtain meaningful lowerdimensional representations of a system. The interesting results about multiplex reducibility shown in the paper have a double-pronged significance. On the one hand, the fact that almost all the multiplex networks analysed admit a more compact lower-dimensional version is a warning against the quest to obtain more and more detailed data. There is a clear indication that not just more data points (edges), rather more informative data points are needed to complement the information already provided by existing layers. On the other hand, the possibility of reducing the number of layers of a multiplex has a lot of practical implications. Indeed, even simple multilayer structural descriptors, such as clustering coefficient, average shortest path or any centrality measure based on paths scale super-linearly or exponentially as a function of the number of layers. Hence, a sound procedure to reduce the dimensionality of a network, without sacrificing information, allows to speed-up most of the computations on multiplex networks without loosing accuracy.
We argue that, although other definitions of complexity of multilayer networks have been recently explored [51], the multiplex complexity proposed here links closely to the traditional meaning of complexity of a system as the amount of information needed to fully describe it. This link is made possible by the prime-weight matrix, which encodes the full structure of the system in a string of bits. It is true that, in principle, the prime-weight matrix encoding depends on the chosen assignment of labels to nodes and primes to layers. However, it is remarkable that the assignment of primes to layers in increasing order of total number of edges provides a consistent approximation of Kolmogorov complexity, although a quite conservative one.

III. MATERIALS AND METHODS
Prime-weight matrix encoding. -A multiplex network M over N nodes is a set of M graphs (layers), each representing one type of interaction among the N nodes. In this framework, each node has a replica on each of the M layers, and the structure of each of the layers is in general distinct. The classical way to represent an unweighted multiplex network is by means of a vector of adjacency matrices A = {A ij of the adjacency matrix A [α] at layer α is equal to 1 if and only if node i and node j are connected by a link at that layer, and zero otherwise. If we assign a distinct prime number p [α] to each of the M layers, we can define the prime-weight matrix Ω whose elements are: The matrix Ω ∈ R N ×N is a compact encoding of the vector of adjacency matrices A. In fact, thanks to the unique factorisation theorem, the adjacency matrix of a generic layer α can be obtained from Ω by considering all the elements Ω ij which are divisible by the corresponding prime p [α] . Notice that this encoding works also for matrices with integer weights, by associating to each ex- ij is the weight of edge (i, j) on layer α. Although the actual set of primes associated to the layers does not impact the construction of Ω, for practical reasons it makes sense to always use the sequence of the first M primes {2, 3, 5, ...}, since the actual number of bits required to store the matrix Ω is Notice that in general a multiplex network M admits M ! different prime-weight matrices, one for each of the possible permutations of the primes associated to the M layers. In this paper we choose a "canonical prime association", that is the one that associates prime numbers to layers in increasing order of their total number of edges. In practice, we assign the prime 2 to the layer with the smallest total number of edges ij , the prime 3 to the layer with the second-smallest total number of edges, and so on. See SI for details.
Multiplex Complexity. -The Kolmogorov Complexity (KC) of a bit string S is defined as the length of the shortest computer program that generates S as output [14,52,53]. However, it is easy to prove that the Kolmogorov Complexity is an non-computable function [64], thus it is only possible to obtain approximations of KC. A common approach is to compress S using a compression algorithm of your choice, and to consider the length of the compressed string S as estimate of the KC of S. In fact, it is possible to obtain S from the compressed string S by using the de-compression routine corresponding to the compression algorithm used to obtain S . Thus, the concatenation of S and of the decompression routine is a program able to generate S, and its length is an upper bound for KC. The bit string S(M) associated to the multiplex network M and the corresponding primeweight matrix Ω is the bit string of the edge list of Ω, where edges are listed in lexicographic order and with their corresponding weight (see SI for a discussion about fluctuations due to node labelling) [65]. We define the Kolmogorov Complexity KC(M) of the multiplex M as the length of the bit string S (M) obtained by compressing S(M) with gzip [66].
The complexity C(M) of a multiplex network M is equal to the KC of its prime-weight matrix Ω divided by the KC of the single-layer weighted matrix W , obtained by considering the aggregate binary matrix multiplied by the largest entry of Ω. In formula: Notice that we constructed W so that when the multiplex M consists of M identical layers, we would always obtain C (M) = 1. Conversely, values of complexity larger than 1 somehow indicate that the multiplex contains more information that the corresponding aggregate. In general, the complexity of a multiplex might depend on the association of prime numbers to layers and on the actual node labelling. Numerical evidence confirms that the value of multiplex complexity obtained using the canonical association is always in the right-most tail of the corresponding distribution (see SI for details). As a consequence, it represents a conservative upper-bound of the actual value of complexity.
To reduce the effect of the other source of variability in the values of C(M) (i.e., the actual labelling of nodes, which affects the lexicographic ordering of the edge list), we define C(M) as the average of the multiplex complexity obtained by using the canonical ordering on 10 3 realisations of node relabelling on the same multiplex graph. See SI for details on the distribution of C(M) as a function of node relabellings.
Structural edge overlap. -Given a multiplex M and a pair of nodes (i, j), the overlap o ij of the pair is defined as the number of layers in which an edge exists between node i and node j. The matrix O = {o ij } is the overlapping matrix associated to the multiplex M [67]. The edge overlap o s of a multiplex network is the expected number of layers in which a pair of nodes is connected by an edge [43,67]: The plots are obtained by starting from a multiplex network with M identical Erdös-Rényi random graphs as layers, corresponding to o = 1, and iteratively rewiring the edges on each layer in order to decrease the structural overlap to o = 0. Edge rewiring is performed by selecting a pair of edges and swapping their end-points uniformly at random. This rewiring procedure is similar to the one used in Ref. [50], and preserves the degree sequence at each layer. Consequently, the layers of all the multiplex networks obtained through relabelling are Erdös-Rényi random graphs belonging to the same ensemble. The value of multiplex complexity corresponding to a certain value of structural overlap is obtained by averaging over 10 2 realisations with that specific value of structural overlap.
The algorithm to increase the structural overlap of the multiplex is similar to that used to decrease it, with the only difference that a rewiring is accepted only if results in the increase of the edge overlap of at least one of the two edges involved in the rewiring.
Reducibility. -Computing the global maximum of the quality function q(•) is in general computationally unfeasible, since it requires to enumerate all the possible partitions of M objects. This is a NP-hard problem that requires a number of operations that scales superexponentially with M . In order to avoid this problem, we used instead a greedy algorithm, which reduces the time complexity to O(M 2 ). The algorithm starts from the original multiplex with M layers and at each step computes the complexity of the two-layer multiplex networks corresponding to all the possible pairs of layers. Then, the pair of layers D yielding the smallest value of complexity q(D) is aggregated in a single one, taking the union of the edges of the two layers. The rationale behind this choice is that if two layers form a multiplex with small complexity then they are similar enough and can be thus flattened in a single layer. The iteration of this procedure will result in a sequence of multiplex networks with {M, M − 1, M − 2, . . . , 2, 1} layers. Among those M reduced multiplex networks, we choose the one yielding the largest value of q(•). Notice that unlike the process described in Ref. [51], we use the same quality measure q(•) both to perform layer aggregation and to select the best (sub-)optimal partition.
Maximal Entropy Rate. -The maximal entropy rate is a measure of the dispersiveness of a random walk on a graph, in the limit of long trajectories. In this sense, it summarises how diffusion on the graph depends on its global structure [57]. In the case of a multiplex network M, the maximal entropy rate is defined as: where λ max is the maximum eigenvalue of the overlapping matrix O associated to M [57]. To account for the dependence of λ max on the total number of nodes in the graph, we use the normalised maximal entropy rate: Multiplex data sets. -More information about all the data sets used in this paper are available in the SI.

Multiplex
M o Mopt(C) [max q(·)C ] Copt Mopt(V N ) q(·)V N London Tube [51] 13 0.  I. Reducibility of technological, social, and biological multiplex network. From left to right, the columns report the number of layers in the original system (M ), the structural edge overlap (o), the number of optimal layers (Mopt(C)) obtained when maximising the quality function q(•), the value max q(•), and the optimal value of complexity Copt observed. The last two columns show the optimal number of layers Mopt(V N ) and the corresponding value of the quality function q(·)V N obtained when using the multiplex structural reducibility procedure introduced in Ref. [51]. Although the two methods yield different results, they share similar features, i.e. technological multiplex networks are less likely to be reduced compared to biological and social systems.