Mathematical Formulation of Multi-Layer Networks


A network representation is useful for describing the structure of a large variety of complex systems. However, most real and engineered systems have multiple subsystems and layers of connectivity, and the data produced by such systems is very rich. Achieving a deep understanding of such systems necessitates generalizing "traditional" network theory, and the newfound deluge of data now makes it possible to test increasingly general frameworks for the study of networks. In particular, although adjacency matrices are useful to describe traditional single-layer networks, such a representation is insufficient for the analysis and description of multiplex and time-dependent networks. One must therefore develop a more general mathematical framework to cope with the challenges posed by multi-layer complex systems. In this paper, we introduce a tensorial framework to study multi-layer networks, and we discuss the generalization of several important network descriptors and dynamical processes (including degree centrality, clustering coefficients, eigenvector centrality, modularity, Von Neumann entropy, and diffusion) for this framework. We examine the impact of different choices in constructing these generalizations, and we illustrate how to obtain known results for the special cases of single-layer and multiplex networks. Our tensorial approach will be helpful for tackling pressing problems in multi-layer complex systems, such as inferring who is influencing whom (and by which media) in multichannel social networks and developing routing techniques for multimodal transportation systems.

I. INTRODUCTION
The quantitative study of networks is fundamental for the study of complex systems throughout the biological, social, information, engineering, and physical sciences [1][2][3]. The broad applicability of networks, and their success in providing insights into the structure and function of both natural and designed systems, have thus generated considerable excitement across myriad scientific disciplines. For example, networks have been used to represent interactions between proteins, friendships between people, hyperlinks between Web pages, and much more. Importantly, several features arise in a diverse variety of networks. For example, many networks constructed from empirical data exhibit heavy-tailed degree distributions, the small-world property, and/or modular structures; such structural features can have important implications for information diffusion, robustness against component failure, and many other considerations [1][2][3].
Traditional studies of networks generally assume that nodes are connected to each other by a single type of static edge that encapsulates all connections between them. This assumption is almost always a gross oversimplification, and it can lead to misleading results and even the sheer inability to address certain problems. For example, ignoring time-dependence throws away the ordering of pairwise human contacts in the transmission of diseases [4], and ignoring the presence of multiple types of edges (which is known as "multiplexity" [5]) makes it hard to take into account the simultaneous presence and relevance of multiple modes of transportation or communication.
Multiplex networks explicitly incorporate multiple channels of connectivity in a system, and they provide a natural description for systems in which entities have a different set of neighbors in each layer (which can represent, e.g., a task, an activity, or a category). A fundamental aspect of describing multiplex networks is defining and quantifying the interconnectivity between different categories of connections. This amounts to switching between layers in a multi-layer system, and the associated inter-layer connections in a network are responsible for the emergence of new phenomena in multiplex networks. Inter-layer connections can generate new structural and dynamical correlations between components of a system, so it is important to develop a framework that takes them into account. Note that multiplex networks are not simply a special case of interdependent networks [6]: in multiplex systems, many or even all of the nodes have a counterpart in each layer, so one can associate a vector of states to each node. For example, a person might currently be logged into Facebook (and hence able to receive information there) but not logged into some other social networking site. The presence of nodes in multiple layers of a system also entails the possibility of self-interactions. This feature has no counterpart in interdependent networks, which were conceived as interconnected communities within a single, larger network [7,8].

Figure 1: Schematic of a multi-layer network. When studying ordinary networks, one represents the interaction of entities (i.e., nodes) using an adjacency matrix, which encodes the intensity of each pairwise inter-relationship. However, as shown in the left panel, entities can interact in different ways (e.g., depending on the environment in which they are embedded). In this panel, each layer of a multi-layer network corresponds to a different type of interaction (e.g., social relationships, business collaborations, etc.) and is represented by a different adjacency matrix. As shown in the right panel, the layers can also be interconnected. This introduces a new type of relationship and complicates the description of this networked system.
To study multiplex and/or temporal networks systematically, it is necessary to develop a precise mathematical representation for them as well as appropriate tools to go with such a representation. In this paper, we develop a mathematical framework for multi-layer networks using tensor algebra. Our framework can be used to study all types of multi-layer networks (including multiplex networks, temporal networks, cognitive social structures [33], multivariate networks [34], interdependent networks, etc.). To simplify exposition, we will predominantly use the language of multiplex networks in this paper, and we will thus pay particular attention to this case.
There are numerous network diagnostics for which it is desirable to develop multi-layer generalizations. In particular, we consider degree centrality, eigenvector centrality, clustering coefficients, modularity, Von Neumann entropy, and random walks. Some of these notions have been discussed previously in the context of multiplex networks [12,21,22,24,27,28]. In this paper, we define these notions for adjacency-tensor representations of multi-layer networks. Our generalizations of these quantities are natural, and this makes it possible to compare these multi-layer quantities with their single-layer counterparts in a systematic manner. This is particularly important for the examination of new phenomena, such as multiplexity-induced correlations [19] and new dynamical feedbacks [26], which arise when generalizing the usual single-layer networks. In Fig. 1, we show a schematic of multi-layer networks. In the left panel, we highlight the different interactions (edges) between entities (nodes) in different layers; in the right panel, we highlight the connections between different layers.
The remainder of this paper is organized as follows. In Section II, we represent single-layer (i.e., "monoplex") networks using a tensorial framework. We extend this to multi-layer networks in Section III, and we discuss several descriptors and diagnostics for both single-layer and multi-layer networks in Section IV. We conclude in Section V.

II. SINGLE-LAYER ("MONOPLEX") NETWORKS
Given a set of N objects n i (where i = 1, 2, . . . , N and N ∈ N), we associate to each object a state that is represented by a canonical vector in the vector space R N . More specifically, let e i ≡ (0, . . . , 0, 1, 0, . . . , 0) † , where † is the transposition operator, be the column vector that corresponds to the object n i (which we call a node). The i th component of e i is 1, and all of its other components are 0.
One can relate the objects n_i with each other, and our goal is to find a simple way to indicate the presence and the intensity of such relationships. The most natural choice of vector space for describing the relationship is created using the tensor product (i.e., the Kronecker product) R^N ⊗ R^N = R^{N×N} [35]. Thus, 2nd-order (i.e., rank-2) canonical tensors are defined by E_{ij} = e_i ⊗ e_j^† (where i, j = 1, 2, . . . , N). Consequently, if w_{ij} indicates the intensity of the relationship from object n_i to object n_j, we can write the relationship tensor as

    W = \sum_{i,j=1}^{N} w_{ij} E_{ij} = \sum_{i,j=1}^{N} w_{ij} \, e_i ⊗ e_j^† .    (1)

Importantly, the relationships that we have just described can be directed. That is, the intensity of the relationship from object n_i to object n_j need not be the same as the intensity of the relationship from object n_j to object n_i. In the context of networks, W corresponds to an N×N weight matrix that represents the standard graph of a system that consists of N nodes n_i. This matrix is thus an example of an adjacency tensor, which is the language that we will use in the rest of this paper. To distinguish such simple networks from the more complicated situations (e.g., multiplex networks) that we discuss in this paper, we will use the term monoplex networks to describe such standard networks, which are time-independent and possess only a single type of edge that connects its nodes.
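As a concrete (and purely illustrative) sketch, the construction of W from the canonical tensors E_ij maps directly onto outer products in numpy; the node count and weights below are arbitrary toy values, not from the text:

```python
import numpy as np

N = 3  # toy network size (arbitrary)
e = np.eye(N)  # rows are the canonical vectors e_i of R^N

# Hypothetical intensities w_ij of (possibly asymmetric) relationships
w = {(0, 1): 2.0, (1, 0): 0.5, (1, 2): 1.0}

# W = sum_ij w_ij E_ij, where E_ij = e_i (x) e_j^† is a canonical rank-2 tensor
W = np.zeros((N, N))
for (i, j), wij in w.items():
    W += wij * np.outer(e[i], e[j])

# W is the familiar N x N weighted (and here directed) adjacency matrix
```

In practice one writes down the adjacency matrix directly, of course; the loop simply makes the basis decomposition of Eq.-style constructions explicit.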
Tensors provide a convenient mathematical representation for generalizing ordinary static networks, as they provide a natural way to encapsulate complicated sets of relationships that can also change in time [12,36]. Matrices are rank-2 tensors, so they are inherently limited in the complexity of the relationships that they can capture. One can represent increasingly complicated types of relationships between nodes by considering tensors of higher order. An adjacency tensor can be written using a more compact notation that will be useful for the generalization to multi-layer networks that we will discuss later. We will use the covariant notation introduced by Ricci and Levi-Civita in Ref. [37]. In this notation, a row vector a ∈ R N is given by a covariant vector a α (where α = 1, 2, . . . , N ), and the corresponding contravariant vector a α (i.e., its dual vector) is a column vector in Euclidean space.
To avoid confusion, we will use Latin letters i, j, . . . to indicate, for example, the i th vector, the (ij) th tensor, etc; and we will use Greek letters α, β, . . . to indicate the components of a vector or a tensor. With this terminology, e α (i) is the α th component of the i th contravariant canonical vector e i in R N , and e α (j) is the α th component of the j th covariant canonical vector in R N .
With these conventions, the adjacency tensor W can be represented as a linear combination of tensors in the canonical basis:

    W^α_β = \sum_{i,j=1}^{N} w_{ij} \, E^α_β(ij) = \sum_{i,j=1}^{N} w_{ij} \, e^α(i) e_β(j) ,    (2)

where E^α_β(ij) ∈ R^{N×N} indicates the tensor in the canonical basis that corresponds to the tensor product of the canonical vectors assigned to nodes n_i and n_j (i.e., it is E_{ij}).
The adjacency tensor W^α_β is of mixed type: it is 1-covariant and 1-contravariant. This choice provides an elegant formulation for the subsequent definitions.

III. MULTI-LAYER NETWORKS
In the previous section, we described a procedure to build an adjacency tensor for a monoplex (i.e., single-layer) network. In general, however, there might be several types of relationships between pairs of nodes n_1, n_2, . . . , n_N, and an adjacency tensor can be used to represent this situation. In other words, one can think of a more general system represented as a multi-layer object in which each type of relationship is encompassed in a single layer k̃ (where k̃ = 1, 2, . . . , L) of a system. We use the term intra-layer adjacency tensor for the 2nd-order tensor W^α_β(k̃) that indicates the relationships between nodes within the same layer k̃. The tilde symbol allows us to distinguish indices that correspond to nodes from those that correspond to layers.
We take into account the possibility that a node n_i from layer h̃ can be connected to any other node n_j in any other layer k̃. To encode information about relationships that incorporate multiple layers, we introduce the 2nd-order inter-layer adjacency tensor C^α_β(h̃k̃). Note that C^α_β(k̃k̃) = W^α_β(k̃), so the inter-layer adjacency tensor that corresponds to the case in which a pair of layers represents the same layer k̃ is equivalent to the intra-layer adjacency tensor of such a layer.
Following an approach similar to that in Section II, we introduce the vectors e^γ̃(k̃) (where γ̃ = 1, 2, . . . , L and k̃ = 1, 2, . . . , L) of the canonical basis in the space R^L, where the Greek index indicates the components of the vector and the Latin index indicates the k̃-th canonical vector. The tilde symbol on the Greek indices allows us to distinguish these indices from the Greek indices that correspond to nodes. It is straightforward to construct the 2nd-order tensors E^γ̃_δ̃(h̃k̃) = e^γ̃(h̃) e_δ̃(k̃) that represent the canonical basis of the space R^{L×L}.
We can write the multi-layer adjacency tensor discussed earlier in this section using a tensor product between the adjacency tensors C^α_β(h̃k̃) and the canonical tensors E^γ̃_δ̃(h̃k̃). This yields

    M^{αγ̃}_{βδ̃} = \sum_{h̃,k̃=1}^{L} C^α_β(h̃k̃) E^γ̃_δ̃(h̃k̃) = \sum_{i,j=1}^{N} \sum_{h̃,k̃=1}^{L} w_{ij}(h̃k̃) E^{αγ̃}_{βδ̃}(ijh̃k̃) ,    (3)

where w_{ij}(h̃k̃) are real numbers that indicate the intensity of the relationship (which may not be symmetric) between node n_i in layer h̃ and node n_j in layer k̃, and E^{αγ̃}_{βδ̃}(ijh̃k̃) ≡ e^α(i) e_β(j) e^γ̃(h̃) e_δ̃(k̃) indicates the 4th-order (i.e., rank-4) tensors of the canonical basis in the space R^{N×N×L×L}.

The multi-layer adjacency tensor M^{αγ̃}_{βδ̃} is a very general object that can be used to represent a wealth of complicated relationships among nodes. In this paper, we focus on multiplex networks. A multiplex network is a special type of multi-layer network in which the only possible types of inter-layer connections are ones in which a given node is connected to its counterpart nodes in the other layers. In many studies of multiplex networks, it is assumed (at least implicitly) that inter-layer connections exist between counterpart nodes in all pairs of layers.
However, (i) this need not be the case; and (ii) this departs from traditional notions of multiplexity [5], which focus on the existence of multiple types of connections and do not preclude entities from possessing only a subset of the available categories of connections. We thus advocate a somewhat more general (and more traditional) definition of multiplex networks.
When describing a multiplex network, the associated inter-layer adjacency tensor is diagonal. Importantly, connections between a node and its counterparts can have different weights for different pairs of layers, and inter-layer connections can also be different for different entities in a network [38]. For instance, this is important for transportation networks, where one can relate the weight of inter-layer connections to the cost of switching between a pair of transportation modes (i.e., layers). For example, at a given station (i.e., node) in a transportation network, it takes time to walk from a train platform to a bus, and it is crucial for transportation companies to measure how long this takes [39]. See Fig. 2 for schematics of multiplex networks.
As we discussed above, entities in many systems have connections in some layers but not in others. For example, a user of online social networks might have a Facebook account but not use Twitter. Such an individual can thus broadcast and receive information only on a subset of the layers in the multiplex-network representation of the system. In a transportation network, if a station does not exist in a given layer of a multi-layer network, then its associated edges also do not exist. The algebra in this paper holds for these situations without any formal modification (one simply assigns the value 0 to associated edges), but one must think carefully about the interpretation of calculations of network diagnostics.
If there is only a single layer, there is no distinction between a monoplex network and a single-layer network, so we can use these terms interchangeably. However, the difference is crucial when studying multi-layer networks. Importantly, because it is convenient (for instance, for the implementation of computational algorithms), one can represent the multi-layer adjacency tensor M^{αγ̃}_{βδ̃} as a special rank-2 tensor that one obtains by a process called matricization (which is also known as unfolding and flattening) [40]. The elements of M^{αγ̃}_{βδ̃}, which is defined in the space R^{N×N×L×L}, can be represented as an N^2 × L^2 or an NL × NL matrix. Flattening a multi-layer adjacency tensor can be very helpful, though of course this depends on the application in question. Recent studies on community detection [12,41], diffusion [21], random walks [22], social contagions [27,28], and clustering coefficients [42] on multi-layer networks have all used matrix representations of multi-layer networks for computational (and occasionally analytical) purposes. For many years, the computational-science community has stressed the importance of developing tools for both tensors and their associated unfoldings (see the review article [40] and numerous references therein) and of examining problems from these complementary perspectives. We hope that this paper will help foster similar achievements in network science.
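As an illustration of matricization, the following sketch flattens a hypothetical (randomly weighted) rank-4 tensor in R^{N×N×L×L} into an NL × NL "supra-adjacency" matrix; the chosen axis ordering is one common convention, not the only possible one:

```python
import numpy as np

N, L = 4, 3  # toy sizes (arbitrary)
rng = np.random.default_rng(0)
# Hypothetical multi-layer adjacency tensor with index order (alpha, beta, h, k)
M = rng.random((N, N, L, L))

# Matricization: group (layer, node) pairs so that each N x N diagonal block
# is an intra-layer adjacency matrix and each off-diagonal block holds the
# inter-layer couplings C(hk).
supra = M.transpose(2, 0, 3, 1).reshape(N * L, N * L)

# Block (h, k) of the flattened matrix equals M[:, :, h, k]
block_12 = supra[1 * N:2 * N, 2 * N:3 * N]
```

Grouping by (node, layer) instead of (layer, node) gives an equally valid flattening; the two differ only by a permutation of rows and columns.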
Another important special case of multi-layer adjacency tensors are time-dependent (i.e., "temporal") networks. Multi-layer representations of temporal networks have thus far tended to include only connections between a given node in one layer and its corresponding nodes in its one or two neighboring layers. For example, the numerical calculations in Refs. [12] and [36] only use these "ordinal" inter-layer couplings, which causes the off-diagonal blocks of a flattened adjacency tensor to have non-zero entries only along their diagonals, even though the theoretical formulation in those papers allows more general couplings between layers. Indeed, this restriction does not apply in general to temporal networks, as it is important for some applications to consider more general types of inter-layer couplings (e.g., if one is considering causal relationships between different nodes or if one wants to consider inter-layer coupling over a longer time horizon). In temporal networks that are constructed from coupled time series, the individual layers in a multi-layer adjacency tensor tend to be almost completely connected (though the intra-layer edges of course have different weights). In other cases, such as most of the temporal networks discussed in Ref. [4], there might only be a small number of non-zero connections (which represent, e.g., a small number of phone calls in a particular time window) within a single layer.

IV. NETWORK DESCRIPTORS AND DYNAMICAL PROCESSES
In this section, we examine how to generalize some of the most common network descriptors and dynamical processes for multi-layer networks. First, we use our tensorial construction to show that the properties of a multi-layer network when it is made up of only a single layer reduce to the corresponding ones for a monoplex network. We obtain these properties using algebraic operations involving the adjacency tensor, canonical vectors, and canonical tensors. We then generalize these results for more general multi-layer adjacency tensors.

A. Monoplex Networks
Degree Centrality. Consider an undirected and unweighted network, which can be represented using the (symmetric) adjacency tensor W α β . Define the 1-vector u α = (1, . . . , 1) † ∈ R N whose components are all equal to 1, and let U β α = u α u β be the 2 nd -order tensor whose elements are all equal to 1. We adopt Einstein summation notation (see Appendix A) and interpret the adjacency tensor as an operator to be applied to the 1-vector.
We thereby calculate the degree centrality vector (or degree vector) k_β = W^α_β u_α in the space R^N. It is then straightforward to calculate the degree centrality of node n_i by projecting the degree vector onto the i-th canonical vector: k(i) = k_β e^β(i). Analogously, for an undirected and weighted network, we use the corresponding weighted adjacency tensor W^α_β to define the strength centrality vector (or strength vector) s_β, which can be used to calculate the strength (i.e., weighted degree) [43,44] of each node. With our notation, the mean degree is

    ⟨k⟩ = \frac{1}{N} k_β u^β = \frac{1}{N} W^α_β u_α u^β .

Directed networks are also very important, and they illustrate why it is advantageous to introduce contravariant notation. Importantly, in-degree centrality and out-degree centrality are represented using different tensor products. The in-degree centrality vector is k_β = W^α_β u_α, whereas the out-degree centrality vector is k^α = W^α_β u^β. We then recover the usual definitions for directed networks. For example, the in-degree centrality of node n_i is k_in(i) = W^α_β u_α e^β(i). In directed and weighted networks, the analogous definition yields the in-strength centrality vector and the out-strength centrality vector. These calculations with directed networks are simple, but they illustrate that the proposed tensor algebra makes it possible to develop a deeper understanding of networks, as the tensor indices are related directly to the directionality of relationships between nodes in a network.
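In numpy, these contractions map directly onto `einsum` expressions; the small directed network below is a made-up example used only to show which index is contracted in each case:

```python
import numpy as np

# Made-up directed network: W[alpha, beta] is the weight of the edge
# from node alpha to node beta
W = np.array([[0, 1, 1],
              [0, 0, 1],
              [0, 0, 0]], dtype=float)
u = np.ones(3)  # the 1-vector

# In-degree vector  k_beta  = W^alpha_beta u_alpha (contract the source index)
k_in = np.einsum('ab,a->b', W, u)
# Out-degree vector k^alpha = W^alpha_beta u^beta  (contract the target index)
k_out = np.einsum('ab,b->a', W, u)

# Mean degree: contract the adjacency tensor with u on both indices
mean_degree = np.einsum('ab,a,b->', W, u, u) / 3
```

The two `einsum` strings differ only in which index is summed, mirroring how the covariant and contravariant positions distinguish in- and out-degree in the tensorial notation.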
Clustering Coefficients. Clustering coefficients are useful measures of transitivity in a network [45]. For unweighted and undirected networks, the local clustering coefficient of a node n_i is defined as the number of existing edges among the set of its neighboring nodes divided by the total number of possible connections between them [46]. Several different definitions for local clustering coefficients have been developed for weighted and undirected networks [44,[47][48][49] and for directed networks [50]. Given a local clustering coefficient, one can calculate a different global clustering coefficient by averaging over all nodes. Alternatively, one can calculate a global clustering coefficient as the total number of closed triples of nodes (where all three edges are present) divided by the number of connected triples [2].
One can obtain equivalent definitions of clustering coefficients in terms of walks on a network. In standard network theory, suppose that one has an adjacency matrix A and a positive integer m. Then, each matrix element (A^m)_{ij} gives the number of distinct walks of length m that start from node n_i and end at node n_j. Therefore, taking j = i and m = 3 yields the number of walks of length 3 that start and end at node n_i. In an unweighted network without self-loops, this yields the number of distinct 3-cycles t(i) that start from node n_i. One then calculates the local clustering coefficient of node n_i by dividing t(i) by the number of 3-cycles that would exist if the neighborhood of n_i were completely connected. For example, in an undirected network, one divides t(i) by k(i)(k(i) − 1), which is the number of ways to select two of the neighbors of n_i. In our notation, the value of (A^m)_{ij} is

    (A^m)_{ij} = W^α_{ρ_1} W^{ρ_1}_{ρ_2} ⋯ W^{ρ_{m−1}}_β \, e_α(i) e^β(j) ,    (4)

which reduces to t(i) = W^α_ρ W^ρ_σ W^σ_β e_α(i) e^β(i) for j = i and m = 3. One can then define the local clustering coefficient by dividing the number of 3-cycles by the number of 3-cycles in a network for which the neighborhood of the node n_i is completely connected. This yields

    c(i) = \frac{W^α_ρ W^ρ_σ W^σ_β \, e_α(i) e^β(i)}{W^α_ρ F^ρ_σ W^σ_β \, e_α(i) e^β(i)} ,    (5)

where F^α_β = U^α_β − δ^α_β is the adjacency tensor corresponding to a network that includes all edges except for self-loops. To use the above formulation to calculate a global clustering coefficient of a network, we need to calculate both the total number of 3-cycles and the total number of 3-cycles that one obtains when the second step of the walk occurs in a complete network. A compact way to express this global clustering coefficient is

    c(W^α_β) = \frac{W^α_ρ W^ρ_σ W^σ_α}{W^α_ρ F^ρ_σ W^σ_α} .    (6)

One can define a clustering coefficient in a weighted network without any changes to Eqs. (5) and (6) by assuming that W^α_β corresponds to the weighted adjacency tensor normalized such that each element of the tensor lies in the interval [0, 1]. If weights are not defined within this range, then Eqs. (5) and (6) do need to be modified. One might also wish to modify Eq. (5) to explore generalizations of the several existing weighted clustering coefficients for ordinary networks [47].
We now modify Eq. (6) to consider weighted clustering coefficients more generally. Let N be a real number that can be used to rescale the elements of the tensor. Define W̃^α_β = W^α_β / N, where one can define the normalization N in various ways. For example, it can come from the maximum element (so that N = max_{α,β} {W^α_β}). It is straightforward to show that c(W̃^α_β) = c(W^α_β)/N. Therefore, we redefine the global clustering coefficient c(W̃^α_β) from Eq. (6) using this normalization:

    c(W̃^α_β) = \frac{W̃^α_ρ W̃^ρ_σ W̃^σ_α}{W̃^α_ρ F^ρ_σ W̃^σ_α} .    (7)

The same argument applies in the case of the local clustering coefficient for weighted networks. The choice of the norm in the normalization factor N is an important consideration. For example, the choice N = max_{α,β} {W^α_β} ensures that Eq. (7) reduces to Eq. (6) for unweighted networks.
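The walk-based clustering coefficients above translate directly into matrix products; the following sketch (on a made-up four-node network, a triangle with a pendant node) computes both the local and the global versions:

```python
import numpy as np

# Made-up undirected, unweighted network: a triangle (0,1,2) plus node 3
# attached to node 0
W = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]], dtype=float)
N = W.shape[0]
F = np.ones((N, N)) - np.eye(N)  # complete network without self-loops

W3 = W @ W @ W    # (W^3)_ii counts the 3-cycles t(i) through node i
WFW = W @ F @ W   # denominator: second step taken on the complete network

# Local clustering coefficient of each node (nan for degree-1 nodes)
with np.errstate(divide='ignore', invalid='ignore'):
    c_local = np.diag(W3) / np.diag(WFW)

# Global clustering coefficient: ratio of the two traces
c_global = np.trace(W3) / np.trace(WFW)
```

For this toy network the triangle nodes 1 and 2 have local clustering 1, node 0 has 1/3, and the pendant node yields 0/0 (undefined), which is the usual degree-1 caveat.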
Eigenvector and Katz Centralities. Numerous notions of centrality exist to attempt to quantify the importance of nodes (and other components) in a network [5]. For example, a node n i has a high eigenvector centrality if its neighbors also have high eigenvector centrality, and the recursive nature of this notion yields a vector of centralities that satisfies an eigenvalue problem.
Let A be the adjacency matrix for an undirected network, v be a solution of the equation Av = λ 1 v, and λ 1 be the largest ("leading") eigenvalue of A. Thus, v is the leading eigenvector of A, and the components of v give the eigenvector centralities of the nodes. That is, the eigenvector centrality of node n i is given by v i [51,52].
In our tensorial formulation, the eigenvector centrality vector is a solution of the tensorial equation

    W^α_β v_α = λ_1 v_β ,    (8)

and v_α e^α(i) gives the eigenvector centrality of node n_i. For directed networks, there are two leading eigenvectors, and one needs to take into account the difference between Eq. (8) and its contravariant counterpart. Moreover, nodes with only outgoing edges have an eigenvector centrality of 0 if the above definition is adopted. One way to address this situation is to assign a small amount b of centrality to each node before calculating centrality. One incorporates this modification of eigenvector centrality by finding the leading-eigenvector solution of the eigenvalue problem v = aAv + b1, where 1 is a vector in which each entry is a 1. This type of centrality is known as Katz centrality [53]. One often chooses b = 1, and we note that Katz centrality is well-defined if λ_1^{−1} > a. Using tensorial notation, we obtain

    v_β = a W^α_β v_α + b u_β .    (9)

To calculate Katz centrality from Eq. (9), we need to calculate the tensor inverse T^α_β, which satisfies the equation

    (δ^α_γ − a W^α_γ) T^γ_β = δ^α_β .    (10)

Modularity. It is often useful to decompose networks into disjoint sets ("communities") of nodes such that (relative to some null model) nodes within each community are densely connected to each other but connections between communities are sparse. Modularity is a network descriptor that can be calculated for any partition of a network into disjoint sets of nodes. Additionally, one can attempt to algorithmically determine a partition that maximizes modularity to identify dense communities in a monoplex network. There are many ways to maximize modularity as well as many other ways to algorithmically detect communities (see the review articles [54,55]). We will consider Newman-Girvan modularity [56], which is the most popular version of modularity and can be written conveniently in matrix notation [57,58].
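As a sketch, both centralities reduce to standard linear algebra; the three-node path network and the choice a = 0.5/λ₁ below are arbitrary toy values chosen for illustration:

```python
import numpy as np

# Made-up undirected network: a three-node path 0 - 1 - 2
W = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
N = W.shape[0]

# Eigenvector centrality: leading eigenvector of the (symmetric) tensor
vals, vecs = np.linalg.eigh(W)
lam1 = vals[-1]           # eigh returns eigenvalues in ascending order
v = np.abs(vecs[:, -1])   # leading eigenvector, made non-negative

# Katz centrality: v = a W v + b 1  <=>  (I - a W) v = b 1, valid for a < 1/lam1
a, b = 0.5 / lam1, 1.0
katz = np.linalg.solve(np.eye(N) - a * W, b * np.ones(N))
```

Solving the linear system plays the role of the tensor inverse in the text; for large sparse networks one would use an iterative solver instead.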
Let S^α_a be a tensor in R^{N×M}, where α indexes nodes and a indexes the communities in an undirected network, which can be either weighted or unweighted. The value of a component of S^α_a is defined to be 1 when a node belongs to a particular community and 0 when it does not. We introduce the tensor B^α_β = W^α_β − k^α k_β / K, where K = W^α_β U^β_α. It follows that the modularity of a network partition is given by the scalar

    Q = \frac{1}{K} B^α_β S_α^a S^β_a .    (11)

To consider a general null model, we write B^α_β = W^α_β − P^α_β, where P^α_β is a tensor that encodes the random connections against which one compares a network's actual connections. With this general null-model tensor, modularity is also appropriate for directed networks (though, of course, it is still necessary to choose an appropriate null model).
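A minimal numpy sketch of the modularity computation, using a made-up four-node path network and a two-community partition tensor S:

```python
import numpy as np

# Made-up four-node path 0-1-2-3 with the partition {0,1}, {2,3}
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
k = W.sum(axis=1)   # degree vector
K = W.sum()         # K = W^alpha_beta U^beta_alpha (counts each edge twice)

# Partition tensor S: S[alpha, a] = 1 iff node alpha belongs to community a
S = np.array([[1, 0],
              [1, 0],
              [0, 1],
              [0, 1]], dtype=float)

B = W - np.outer(k, k) / K     # modularity tensor, Newman-Girvan null model
Q = np.trace(S.T @ B @ S) / K  # modularity of the partition
```

The trace expression is the matrix form of the double index contraction: it sums B over all pairs of nodes that share a community.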
Von Neumann Entropy. The study of entropy in monoplex networks has been used to help characterize complexity in networked systems [59][60][61][62][63]. As an example, let us consider the Von Neumann entropy of a monoplex network [64]. Recall that the Von Neumann entropy extends the Shannon (information) entropy to quantum systems. In quantum mechanics, the density matrix ρ is a positive semidefinite operator that describes the mixed state of a quantum system, and the Von Neumann entropy of ρ is defined by H(ρ) = −tr(ρ log_2 ρ). The eigenvalues of ρ must sum to 1 to have a well-defined measure of entropy.
We also need to recall the (unnormalized) combinatorial Laplacian tensor, which is a well-known object in graph theory (see, e.g., [65,66] and references therein) and is defined by L^α_β = ∆^α_β − W^α_β, where ∆^α_β = W^η_γ u_η e^γ(β) δ^α_β is the strength tensor (i.e., a diagonal tensor whose elements represent the strength of the nodes). The combinatorial Laplacian is positive semidefinite, and the trace of the strength tensor is ∆ = ∆^α_α. The eigenvalues of the density tensor ρ^α_β = ∆^{−1} L^α_β sum to 1, as required, and they can be used to define the Von Neumann entropy of a monoplex network using the formula

    H(ρ^α_β) = −ρ^α_β \, (log_2 ρ)^β_α .    (12)

Using the eigen-decomposition of the density tensor, one can show that the Von Neumann entropy reduces to

    H(ρ^α_β) = −Λ^α_β \, (log_2 Λ)^β_α ,    (13)

where Λ^α_β is the diagonal tensor whose elements are the eigenvalues of ρ^α_β (see Appendix C).

Diffusion and Random Walks. A random walk is the simplest dynamical process that can occur on a monoplex network, and random walks can be used to approximate other types of diffusion [2,67]. Diffusion is also relevant for many other types of dynamical processes (e.g., for some types of synchronization [68]). Let x_α(t) denote a state vector of nodes at time t. The diffusion equation is

    \frac{dx_β(t)}{dt} = D W^α_β x_α(t) − D s_γ e^γ(β) x_β(t) ,

where D is a diffusion constant. Recall that s_γ = W^α_γ u_α is the strength vector and that s_γ e^γ(β) x_β(t) = s_γ e^γ(β) δ^α_β x_α(t). This yields the following covariant diffusion law on monoplex networks:

    \frac{dx_β(t)}{dt} = −D L^α_β x_α(t) ,    (14)

where L^α_β = W^η_γ u_η e^γ(β) δ^α_β − W^α_β is the combinatorial Laplacian tensor. The solution of Eq. (14) is x_β(t) = x_α(0) e^{−D L^α_β t}.
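Both the Von Neumann entropy and the diffusion dynamics can be sketched with an eigendecomposition of the combinatorial Laplacian; the star network below is a toy example, and the convention 0 log 0 = 0 is applied by discarding zero eigenvalues:

```python
import numpy as np

# Made-up star network: node 0 is connected to nodes 1 and 2
W = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
s = W.sum(axis=1)            # strength vector
L_comb = np.diag(s) - W      # combinatorial Laplacian tensor
rho = L_comb / s.sum()       # density tensor; its eigenvalues sum to 1

# Von Neumann entropy from the eigenvalues (convention: 0 log 0 = 0)
lam = np.linalg.eigvalsh(rho)
lam = lam[lam > 1e-12]
H = -np.sum(lam * np.log2(lam))

# Diffusion x'(t) = -D L x(t): propagate via the eigendecomposition of L
D, t = 1.0, 50.0
x0 = np.array([1.0, 0.0, 0.0])
evals, evecs = np.linalg.eigh(L_comb)
xt = evecs @ (np.exp(-D * evals * t) * (evecs.T @ x0))
# For large t the state relaxes toward the uniform vector
```

Because the uniform vector spans the kernel of the combinatorial Laplacian, the diffusion conserves the total amount of the quantity and spreads it evenly at long times.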
Random walks on monoplex networks [2,67,69] have attracted considerable interest because they are both important and easy to interpret. They have yielded important insights on a huge variety of applications and can be studied analytically. For example, random walks have been used to rank Web pages [70] and sports teams [71], optimize searches [72], investigate the efficiency of network navigation [73,74], characterize cyclic structures in networks [75], and coarse-grain networks to illuminate meso-scale features such as community structure [76][77][78].
In this paper, we consider a discrete-time random walk. Let $T^{\alpha}_{\beta}$ denote the tensor of transition probabilities between pairs of nodes, and let $p_{\alpha}(t)$ denote the vector of probabilities to find a walker at each node. Hence, the covariant master equation that governs the discrete-time evolution of the probability from time $t$ to time $t+1$ is $p_{\beta}(t+1) = T^{\alpha}_{\beta}\, p_{\alpha}(t)$. One can rewrite this master equation in terms of evolving probability rates as $\dot{p}_{\beta}(t) = -L^{\alpha}_{\beta}\, p_{\alpha}(t)$, where $L^{\alpha}_{\beta} = \delta^{\alpha}_{\beta} - T^{\alpha}_{\beta}$ is the normalized Laplacian tensor. The normalized Laplacian governs the evolution of the probability-rate vector for random walks.
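As a quick numerical illustration (a sketch with a made-up three-node weighted network), one can iterate the master equation and verify that the stationary distribution of an unbiased walk is proportional to node strength, and that the normalized Laplacian annihilates constant vectors:

```python
import numpy as np

W = np.array([[0., 2., 1.],
              [2., 0., 1.],
              [1., 1., 0.]])           # hypothetical weighted undirected network

# Unbiased walk: T[a, b] is the probability of jumping from node a to node b.
T = W / W.sum(axis=1, keepdims=True)

# Master equation p_b(t+1) = sum_a T[a, b] p_a(t).
p = np.array([1., 0., 0.])
for _ in range(500):
    p = T.T @ p

s = W.sum(axis=1)                      # strength vector
print(np.allclose(p, s / s.sum()))     # True: stationary distribution ~ strength

# Normalized Laplacian governing probability rates.
Lrw = np.eye(3) - T
print(np.allclose(Lrw @ np.ones(3), 0.0))  # True: each row sums to zero
```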

B. Multi-Layer Networks
Because of its structure, a multi-layer network can incorporate a lot of information. Before generalizing the descriptors that we discussed for monoplex networks, we discuss some algebraic operations that can be employed to extract useful information from an adjacency tensor.
Contraction. Tensor contractions yield interesting quantities that are invariants when the indices are repeated (see Appendix A). For instance, one can obtain the number of nodes in a network (which is an invariant quantity) by contracting the Kronecker tensor: $N = \delta^{\alpha}_{\alpha}$, where we have again used the Einstein summation convention. For unweighted networks, one obtains the number of edges (which is another invariant) by calculating the scalar product of the adjacency tensor $W^{\alpha}_{\beta}$ with the dual 1-tensor $U^{\beta}_{\alpha}$ (whose components are all equal to 1).
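These contractions map directly onto `numpy.einsum`, which sums over repeated indices exactly as the Einstein convention prescribes (a sketch with a hypothetical unweighted network):

```python
import numpy as np

N = 5
rng = np.random.default_rng(0)
W = (rng.random((N, N)) < 0.4).astype(float)   # hypothetical unweighted adjacency
np.fill_diagonal(W, 0.)

# Number of nodes: contraction of the Kronecker tensor, N = delta^alpha_alpha.
delta = np.eye(N)
n_nodes = np.einsum('aa->', delta)
print(n_nodes == N)                            # True

# Number of edges: scalar product of W^alpha_beta with the 1-tensor U^beta_alpha.
U = np.ones((N, N))
n_edges = np.einsum('ab,ba->', W, U)
print(n_edges == W.sum())                      # True
```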
Single-Layer Extraction. In some applications, it can be useful to extract a specific layer (e.g., the $\tilde{r}$th one) from a multi-layer network. Using tensorial algebra, this operation is equivalent to projecting the tensor $M^{\alpha\tilde{\gamma}}_{\beta\tilde{\delta}}$ onto the canonical tensor $E^{\tilde{\delta}}_{\tilde{\gamma}}(\tilde{r}\tilde{r})$ that corresponds to this particular layer. The 2nd-order canonical tensors in $\mathbb{R}^{L \times L}$ form an orthonormal basis, so the product $E^{\tilde{\gamma}}_{\tilde{\delta}}(\tilde{h}\tilde{k})\, E^{\tilde{\delta}}_{\tilde{\gamma}}(\tilde{r}\tilde{r})$ equals 1 for $\tilde{h} = \tilde{k} = \tilde{r}$, and it equals 0 otherwise. Therefore, we use Eq. (3) to write
$$C^{\alpha}_{\beta}(\tilde{r}\tilde{r}) = M^{\alpha\tilde{\gamma}}_{\beta\tilde{\delta}}\, E^{\tilde{\delta}}_{\tilde{\gamma}}(\tilde{r}\tilde{r}),$$
which is the desired adjacency tensor that corresponds to layer $\tilde{r}$. Clearly, it is possible to use an analogous procedure to extract any other tensor (e.g., ones that give inter-layer relationships). In practical applications, for example, it might be useful to extract the tensors that describe inter-layer connections between pairs of layers in the multi-layer network in order to compare the strengths of the coupling between them. Another important application, which we discuss later, is the calculation of multi-layer clustering coefficients.
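The projection above can be sketched numerically. The index layout `M[a, b, g, d]` (node indices first, layer indices second) is our own storage convention, not notation from the text:

```python
import numpy as np

# Hypothetical unweighted multi-layer network stored as M[a, b, g, d].
N, L = 4, 3
rng = np.random.default_rng(1)
M = rng.integers(0, 2, size=(N, N, L, L)).astype(float)

def extract_layer(M, r):
    """Project M onto the canonical layer tensor E(rr) for layer r."""
    L = M.shape[2]
    E = np.zeros((L, L))
    E[r, r] = 1.0                       # E^delta_gamma(rr)
    return np.einsum('abgd,dg->ab', M, E)

C1 = extract_layer(M, 1)
print(np.allclose(C1, M[:, :, 1, 1]))   # True: the intra-layer block of layer 1
```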
Projected and Overlay Monoplex Networks. In some cases, one constructs a monoplex network by aggregating multiple networks. This is useful in many situations. For example, the first step in studying a temporal network is often to aggregate over time. When studying social networks, one often aggregates over different types of relationships, different communication platforms, and/or different social circles. To project a multi-layer network onto a weighted single-layer network, we multiply the corresponding tensor by the 1-tensor $U^{\tilde{\delta}}_{\tilde{\gamma}}$.
The projected monoplex network that we obtain is $P^{\alpha}_{\beta} = M^{\alpha\tilde{\gamma}}_{\beta\tilde{\delta}}\, U^{\tilde{\delta}}_{\tilde{\gamma}}$. Importantly, a projected monoplex network is different from the weighted monoplex network (which one might call an overlay monoplex network) that one obtains from a multi-layer network by summing the edges over all layers for each node. In particular, the overlay network ignores the non-negligible contribution of inter-layer connections, which are also important for quantifying the properties of a multi-layer network. One obtains the overlay network from a multi-layer adjacency tensor by contracting the indices that correspond to the layer components:
$$O^{\alpha}_{\beta} = M^{\alpha\tilde{\gamma}}_{\beta\tilde{\gamma}}. \quad (17)$$
In Fig. 3, we show schematics to illustrate the difference between projected and overlay monoplex networks.
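The two aggregations differ by exactly one contraction pattern; a sketch (index layout `M[a, b, g, d]` is again our own convention, with made-up weights):

```python
import numpy as np

# Hypothetical weighted multi-layer network stored as M[a, b, g, d].
N, L = 4, 3
rng = np.random.default_rng(2)
M = rng.random((N, N, L, L))

P = np.einsum('abgd->ab', M)   # P^a_b = M^{ag}_{bd} U^d_g: sums every layer-pair block
O = np.einsum('abgg->ab', M)   # O^a_b = M^{ag}_{bg}: intra-layer blocks only

# The projection includes the inter-layer blocks that the overlay discards.
print(np.all(P >= O))          # True for non-negative weights
```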
Network of Layers. It can be useful to construct a global observable to help understand relations between layers at a macroscopic level. For instance, suppose that two layers have more inter-connected nodes than other pairs of layers, or that there are no connections at all across a given pair of layers. One can build a network of layers to help understand the structure of such inter-connections. Such a network is weighted, though the weighting procedure is application-dependent. As an example, let us consider the most intuitive weighting procedure: for each pair of layers $(\tilde{h}\tilde{k})$, we sum all of the weights of the connections between their nodes to obtain the edge weights $q_{\tilde{h}\tilde{k}} = C^{\alpha}_{\beta}(\tilde{h}\tilde{k})\, U^{\beta}_{\alpha}$. For the special case of multiplex networks with unit weights between pairs of nodes in different layers, we obtain $q_{\tilde{h}\tilde{k}} = N$ if layers $\tilde{h}$ and $\tilde{k}$ are connected. The resulting weighted adjacency tensor of layers in the space $\mathbb{R}^{L \times L}$ is
$$\Psi^{\tilde{\gamma}}_{\tilde{\delta}} = \sum_{\tilde{h},\tilde{k}} q_{\tilde{h}\tilde{k}}\, E^{\tilde{\gamma}}_{\tilde{\delta}}(\tilde{h}\tilde{k}).$$
Hence, one can calculate $\Psi^{\tilde{\gamma}}_{\tilde{\delta}}$ from the multi-layer adjacency tensor with the formula
$$\Psi^{\tilde{\gamma}}_{\tilde{\delta}} = M^{\alpha\tilde{\gamma}}_{\beta\tilde{\delta}}\, U^{\beta}_{\alpha}.$$
One can then normalize the resulting tensor in a way that is appropriate for the application of interest. For multiplex networks, for example, the most sensible normalization constant is typically the number of layers $L$.
In the insets (in the top left corners) of the panels of Fig. 2, we show representations for networks of layers that correspond to three different topologies of connections between layers.
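The network of layers amounts to contracting the node indices against the 1-tensor; a sketch (index layout `M[a, b, g, d]` is our own convention):

```python
import numpy as np

# Hypothetical weighted multi-layer network stored as M[a, b, g, d].
N, L = 4, 3
rng = np.random.default_rng(3)
M = rng.random((N, N, L, L))

# Psi[h, k] = q_hk: the total edge weight between layers h and k.
Psi = np.einsum('abgd->gd', M)

# Each entry equals the total weight of the corresponding layer-pair block C(hk).
print(np.isclose(Psi[0, 2], M[:, :, 0, 2].sum()))  # True
```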
Degree Centrality. We now show how to compute degree centrality for multi-layer networks by performing the same projections as in the case of monoplex networks using 1-tensors of an appropriate order (recall that a 1-tensor is a tensor that contains a 1 in every component). We thereby obtain a multi-degree centrality vector $K^{\alpha} = M^{\alpha\tilde{\gamma}}_{\beta\tilde{\delta}}\, U^{\tilde{\delta}}_{\tilde{\gamma}}\, u^{\beta}$. After some algebra, we obtain
$$K^{\alpha} = \sum_{\tilde{h},\tilde{k}} k^{\alpha}(\tilde{h}\tilde{k}),$$
where $k^{\alpha}(\tilde{h}\tilde{k})$ is the degree centrality vector that corresponds to connections between layers $\tilde{h}$ and $\tilde{k}$. Even in the special case of multiplex networks, it is already evident that $K^{\alpha}$ differs from the degree centrality vector that one would obtain by simply projecting all layers of a multi-layer network onto a single weighted network. The definitions of mean degree, second moment, and variance are analogous to their monoplex network counterparts, except that one uses $K^{\alpha}$ instead of $k^{\alpha}$.

FIG. 4. Triangles can be closed using intra-layer connections from different layers. In the figure, we show two different situations that can arise. For example, the left panel might represent a multi-layer social network in which nodes A and B are friends of node C but are not friends with each other (first layer), but nodes A and B still have a social tie because they work in the same company even though node C does not work there (third layer). In the right panel, one might imagine that each layer corresponds to a different online social network: perhaps node B tweets about an item; node C sees node B's post on Twitter (first layer) and then posts the same item on Facebook (second layer); node A then sees this post and blogs about it (third layer), and node B reads this blog entry.
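The multi-degree vector and its layer-pair decomposition can be sketched with one `einsum` each (index layout `M[a, b, g, d]` is our own convention):

```python
import numpy as np

# Hypothetical weighted multi-layer network stored as M[a, b, g, d].
N, L = 4, 3
rng = np.random.default_rng(4)
M = rng.random((N, N, L, L))

K = np.einsum('abgd->a', M)          # multi-degree K^a: sum over all other indices
k = np.einsum('abgd->agd', M)        # k[:, h, k]: degrees from the (h, k) block

print(np.allclose(K, k.sum(axis=(1, 2))))  # True: K^a = sum_{h,k} k^a(h, k)
```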
Clustering Coefficients. For multi-layer networks, it is nontrivial to define a clustering coefficient using triangles as a measure of transitivity. As shown in Fig. 4, a closed set of three nodes might not exist on any single layer, but transitivity can still arise as a consequence of multiplexity. In the left panel, for example, suppose that nodes A and B are friends with node C but not with each other, but that nodes A and B still have a social tie because they work at the same company (but node C does not). Information can be passed from any one node to any other, but connections on multiple layers might be necessary for this to occur.
As with monoplex networks, we start by defining the integer power of an adjacency tensor and use a contraction operation. For instance, one can calculate the square of $M^{\alpha\tilde{\gamma}}_{\beta\tilde{\delta}}$ by constructing the 8th-order (i.e., rank-8) tensor $M^{\alpha\tilde{\gamma}}_{\beta\tilde{\delta}}\, M^{\rho\tilde{\sigma}}_{\eta\tilde{\epsilon}}$ and then contracting $\beta$ with $\rho$ and $\tilde{\delta}$ with $\tilde{\sigma}$. One computes higher powers analogously. We define a global clustering coefficient on a multi-layer adjacency tensor by generalizing Eq. (6) for 4th-order tensors:
$$C(M^{\alpha\tilde{\gamma}}_{\beta\tilde{\delta}}) = \frac{M^{\alpha\tilde{\gamma}}_{\beta\tilde{\delta}}\, M^{\beta\tilde{\delta}}_{\eta\tilde{\epsilon}}\, M^{\eta\tilde{\epsilon}}_{\alpha\tilde{\gamma}}}{\mathcal{N}\, M^{\alpha\tilde{\gamma}}_{\beta\tilde{\delta}}\, F^{\beta\tilde{\delta}}_{\eta\tilde{\epsilon}}\, M^{\eta\tilde{\epsilon}}_{\alpha\tilde{\gamma}}}, \quad (21)$$
where we again define $F^{\beta\tilde{\delta}}_{\eta\tilde{\epsilon}} = U^{\beta\tilde{\delta}}_{\eta\tilde{\epsilon}} - \delta^{\beta\tilde{\delta}}_{\eta\tilde{\epsilon}}$ as the adjacency tensor of a complete multi-layer network (without self-edges). The choice of the normalization factor $\mathcal{N}$ is again arbitrary, but the choice $\mathcal{N} = \max_{\alpha,\beta,\tilde{\gamma},\tilde{\delta}} \{M^{\alpha\tilde{\gamma}}_{\beta\tilde{\delta}}\}$ ensures that Eq. (21) is well-defined for both weighted and unweighted multi-layer networks.
The tensor contractions in Eq. (21) count all of the 3-cycles, including ones in which a walk goes through any combination of inter-layer and intra-layer connections. Thus, for multiplex networks with categorical layers, Eq. (21) counts not only fully intra-layer 3-cycles but also the inter-layer 3-cycles that are induced by the connection of nodes to their counterparts in all of the other layers.
A more traditional, and also simpler, approach to calculating a global clustering coefficient of a multi-layer network is to project it onto a single weighted network [i.e., the overlay network $O^{\alpha}_{\beta}$ defined by Eq. (17)] and then to calculate a clustering coefficient for the resulting network. In this case, we obtain
$$C(O^{\alpha}_{\beta}) = \frac{O^{\alpha}_{\beta}\, O^{\beta}_{\eta}\, O^{\eta}_{\alpha}}{\mathcal{M}\, O^{\alpha}_{\beta}\, F^{\beta}_{\eta}\, O^{\eta}_{\alpha}}, \quad (22)$$
where $\mathcal{M} = \max_{\alpha,\beta} \{M^{\alpha\tilde{\gamma}}_{\beta\tilde{\gamma}}\}/L$. In the chosen normalization, note that we need to include a factor for the number of layers $L$, because the construction of the overlay network discards the information about the number of layers. For example, adding an empty layer to a multi-layer network does not affect the resulting overlay network, but it increases the number of possible walks that one must consider in Eq. (22). In general, the clustering coefficient in Eq. (22) is different from the one in Eq. (21), because Eq. (22) discards all inter-layer connections. However, there are some special cases in which Eq. (22) and Eq. (21) take the same value. In particular, this occurs when there is only a single layer or, more generally, when there are no inter-layer edges and all of the intra-layer networks are exactly the same.
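The triple contraction in the numerator of the multi-layer clustering coefficient can be sketched via the supra-adjacency unfolding, where it becomes the trace of a matrix cube (index layout `M[a, b, g, d]` is our own convention, with made-up weights):

```python
import numpy as np

# Hypothetical weighted multi-layer network stored as M[a, b, g, d].
N, L = 3, 2
rng = np.random.default_rng(5)
M = rng.random((N, N, L, L))

# Unique unfolding into an NL x NL supra-adjacency matrix A[(a, g), (b, d)].
A = M.transpose(0, 2, 1, 3).reshape(N * L, N * L)

num = np.einsum('abgd,bedf,eafg->', M, M, M)     # closed 3-cycles (all walk types)
print(np.isclose(num, np.trace(A @ A @ A)))      # True

# Denominator with F = U - delta (complete multi-layer network, no self-edges).
F = np.ones((N * L, N * L)) - np.eye(N * L)
den = np.trace(A @ F @ A)
C_global = num / (M.max() * den)                 # normalization N = max M
print(C_global >= 0)                             # True for non-negative weights
```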
The global clustering coefficient defined in Eq. (22) sums the contributions of 3-cycles for which each step of a walk is on the same layer and those for which a walk traverses two or three layers. We decompose this clustering coefficient to separately count the contributions of 3-cycles that take place on one, two, and three layers [42]. To do this using our tensorial framework, we modify Eq. (22) by employing the layer extraction operations that we discussed earlier: we introduce a vector $w_{\Omega}$ that weights the contribution of 3-cycles that span $\Omega$ layers and a function $\delta_{\Omega}$ (where $\Omega = 1, 2, 3$) that selects the cycles with $\Omega$ layers. This yields a weighted clustering coefficient $C(M^{\alpha\tilde{\gamma}}_{\beta\tilde{\delta}}, w_{\Omega})$ [Eq. (23)]. We recover Eq. (22) with the choice $w_{\Omega} = (1/3, 1/3, 1/3)$.
(We note that $C(M^{\alpha\tilde{\gamma}}_{\beta\tilde{\delta}}, w_{\Omega}) = C(M^{\alpha\tilde{\gamma}}_{\beta\tilde{\delta}}, c\, w_{\Omega})$ for any nonzero constant $c$, though we still normalize $w_{\Omega}$ by convention.) By contrast, with the choice $w_{\Omega} = (1, 0, 0)$, we only consider 3-cycles in which every step of a walk is on the same layer.
Importantly, we can define the weight vector $w_{\Omega}$ in the clustering coefficient in Eq. (23) so that it takes into account that there might be some cost for crossing different layers. As discussed in [42], one determines this "cost" based on the dynamics and the application of interest. For example, if one is studying a dynamical process whose time scale is very fast compared to the time scale (i.e., the cost) associated with changing layers, then it is desirable to consider the contribution from only one layer (i.e., the one in which the dynamical process occurs). For other dynamical processes, it is necessary to also include contributions from two or three layers. To give a brief example, consider transportation at the King's Cross/St. Pancras station in London. This station includes a node in a layer that describes travel via London's metro transportation system, a node in a layer for train travel within England, and a node for international train travel. A relevant cost is then related to how long it takes to travel between different parts of the station [39]. One needs to compare such intra-station travel time to the schedule times of the several transportation mechanisms. By contrast, there is typically very little cost associated with a person seeing information on Facebook and then posting it on Twitter.
In the above definition, the entries of $F^{\beta\tilde{\delta}}_{\eta\tilde{\epsilon}}$ are all equal to 1 except for self-edges. Sometimes, we instead need to use a tensor $\tilde{F}^{\beta\tilde{\delta}}_{\eta\tilde{\epsilon}}$ that we construct by setting some of the off-diagonal entries of $F^{\beta\tilde{\delta}}_{\eta\tilde{\epsilon}}$ to 0. If the original multi-layer network cannot have a particular edge, then the tensor $\tilde{F}^{\beta\tilde{\delta}}_{\eta\tilde{\epsilon}}$ needs to have a 0 in the corresponding entry. For example, this is necessary for multi-layer networks whose structural constraints forbid the existence of some nodes in certain layers or forbid certain inter- and intra-layer edges. It is also necessary for multiplex networks, for which inter-layer edges can only exist between nodes and their counterparts in other layers. Note, however, that using the tensor $\tilde{F}^{\beta\tilde{\delta}}_{\eta\tilde{\epsilon}}$ instead of $F^{\beta\tilde{\delta}}_{\eta\tilde{\epsilon}}$ influences the normalization of clustering coefficients, as it affects the set of potential 3-cycles that can exist in a multi-layer network.
Eigenvector Centrality. Generalizing eigenvector centrality for multi-layer networks is not trivial, and there are several possible ways to do it [24].
References [21,22,27] and [28] recently introduced the concepts of supra-adjacency (i.e., "super-adjacency") and supra-Laplacian (i.e., "super-Laplacian") matrices to formulate and solve eigenvalue problems in multiplex networks. Such supra-matrices correspond to unique unfoldings of the corresponding 4th-order tensors into square matrices. It is worth noting that the tensorial space in which the multi-layer adjacency tensor exists is $\mathbb{R}^{N \times N \times L \times L}$, and there exists a unique unfolding (up to the $L!$ permutations of diagonal blocks of size $N \times N$ in the resulting space) that provides a square supra-adjacency tensor defined in $\mathbb{R}^{NL \times NL}$. We now exploit the same idea by arguing that a supra-eigenvector corresponds to a rank-1 unfolding of a 2nd-order "eigentensor" $V_{\alpha\tilde{\gamma}}$. According to this unique mapping, if $\lambda_1$ is the largest eigenvalue and $V_{\alpha\tilde{\gamma}}$ is the corresponding eigentensor, then it follows that
$$M^{\alpha\tilde{\gamma}}_{\beta\tilde{\delta}}\, V_{\alpha\tilde{\gamma}} = \lambda_1\, V_{\beta\tilde{\delta}}.$$
Therefore, similarly to monoplex networks, one can calculate the leading eigentensor $V_{\alpha\tilde{\gamma}}$ iteratively. Start with a tensor $X_{\alpha\tilde{\gamma}}(t=0)$, which we can take to be $X_{\alpha\tilde{\gamma}}(t=0) = U_{\alpha\tilde{\gamma}}$. By writing $X_{\alpha\tilde{\gamma}}(0)$ as a linear combination of the 2nd-order eigentensors and by observing that $X_{\beta\tilde{\delta}}(t) = \left(M^{\alpha\tilde{\gamma}}_{\beta\tilde{\delta}}\right)^{t} X_{\alpha\tilde{\gamma}}(0)$, one can show that $X_{\alpha\tilde{\gamma}}(t)$ is proportional to $V_{\alpha\tilde{\gamma}}$ in the $t \to \infty$ limit. The convergence of this approach is ensured by the existence of the unfolding of $M$, since the iterative procedure is equivalent to the one applied to the corresponding supra-matrices.
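The iterative procedure above can be sketched as power iteration on the supra-adjacency unfolding (made-up symmetric weights; index layout `M[a, b, g, d]` is our own convention):

```python
import numpy as np

# Hypothetical undirected multi-layer network: M^{ag}_{bd} = M^{bd}_{ag}.
N, L = 4, 2
rng = np.random.default_rng(6)
M = rng.random((N, N, L, L))
M = M + M.transpose(1, 0, 3, 2)

A = M.transpose(0, 2, 1, 3).reshape(N * L, N * L)   # supra-adjacency unfolding

# Start from the 1-tensor U and iterate X(t) = M^t X(0), normalizing each step.
X = np.ones(N * L)
for _ in range(300):
    X = A @ X
    X /= np.linalg.norm(X)

lam1 = X @ A @ X                      # Rayleigh quotient approximates lambda_1
w = np.linalg.eigvalsh(A)
print(np.isclose(lam1, w[-1]))        # True: converged to the leading eigenvalue

V = X.reshape(N, L)                   # fold back into the eigentensor V_{alpha gamma}
```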
We thereby obtain a multi-layer generalization of Bonacich's eigenvector centrality [51,52]: the components of the leading eigentensor $V_{\alpha\tilde{\gamma}}$ give the centrality of each node in each layer. The monoplex notion of eigenvector centrality grants importance to a node based on its connections to other nodes. One needs to be careful about both intra-layer and inter-layer connections when interpreting the results of calculating a multi-layer generalization of it. For example, the intra-layer connections in one layer might be more important than those in others. For inter-layer connections, one might ask how much of a "bonus" an entity earns based on its presence in multiple layers. (This contrasts with the "cost" that we discussed previously in the context of transportation networks.) For instance, many web services attempt to measure the influence of people on social media by combining information from multiple online social networks, and one can choose which communication modes (i.e., layers) to include. Moreover, by considering an overlay monoplex network or a projected monoplex network, it is possible to derive separate centrality scores for different layers. In the above example, this would reflect the different levels of importance that somebody has on different social media.

Modularity.
A multi-layer generalization of modularity was derived in Ref. [12] by considering random walks on networks. Let $S^{a}_{\alpha\tilde{\rho}}$ be a tensor in $\mathbb{R}^{N \times L \times M}$, where $(\alpha, \tilde{\rho})$ indexes nodes and $a$ indexes the communities in an undirected multi-layer network, which can be either weighted or unweighted. A component of $S^{a}_{\alpha\tilde{\rho}}$ is defined to be 1 when a node belongs to a particular community and 0 when it does not. We introduce the tensor $B^{\alpha\tilde{\rho}}_{\beta\tilde{\sigma}} = W^{\alpha\tilde{\rho}}_{\beta\tilde{\sigma}} - P^{\alpha\tilde{\rho}}_{\beta\tilde{\sigma}}$, where $K = W^{\alpha\tilde{\rho}}_{\beta\tilde{\sigma}}\, U^{\beta\tilde{\sigma}}_{\alpha\tilde{\rho}}$ and $P^{\alpha\tilde{\rho}}_{\beta\tilde{\sigma}}$ is a null-model tensor that encodes the random connections against which one compares a multi-layer network's actual connections. It follows that the modularity of a partition of a multi-layer network is given by the scalar
$$Q = \frac{1}{K}\, S^{a}_{\alpha\tilde{\rho}}\, B^{\alpha\tilde{\rho}}_{\beta\tilde{\sigma}}\, S^{\beta\tilde{\sigma}}_{a}. \quad (26)$$
There are numerous choices for the null-model tensor $P^{\alpha\tilde{\rho}}_{\beta\tilde{\sigma}}$. The null models discussed in Refs. [12,36,79] give special cases of the multi-layer modularity in Eq. (26).
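In supra-matrix form, the modularity scalar becomes a trace, which can be sketched as follows. The Newman-Girvan-style null model on the supra-adjacency below is one possible choice of $P$, not the only one discussed in the references, and the index layout `M[a, b, g, d]` is our own convention:

```python
import numpy as np

# Hypothetical undirected weighted multi-layer network and a random partition.
N, L, n_comm = 4, 2, 2
rng = np.random.default_rng(7)
M = rng.random((N, N, L, L))
M = M + M.transpose(1, 0, 3, 2)
A = M.transpose(0, 2, 1, 3).reshape(N * L, N * L)

K = A.sum()                          # K = W U (total weight)
k = A.sum(axis=1)
P = np.outer(k, k) / K               # null-model tensor (one possible choice)
B = A - P

# S[(alpha, rho), a] = 1 iff node-layer pair (alpha, rho) is in community a.
labels = rng.integers(0, n_comm, size=N * L)
S = np.eye(n_comm)[labels]

Q = np.trace(S.T @ B @ S) / K        # modularity Q = K^{-1} S B S
print(-1.0 <= Q <= 1.0)              # True
```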
Von Neumann Entropy. To generalize the definition of Von Neumann entropy to multi-layer networks, we need to generalize the definition of the Laplacian tensor. Such an extension is not trivial because one needs to consider eigenvalues of a 4 th -order tensor.
As we showed previously when generalizing eigenvector centrality, the existence of a unique unfolding into supra-matrices allows one to define and solve the eigenvalue problem
$$L^{\alpha\tilde{\gamma}}_{\beta\tilde{\delta}}\, V_{\alpha\tilde{\gamma}} = \lambda\, V_{\beta\tilde{\delta}},$$
where $L^{\alpha\tilde{\gamma}}_{\beta\tilde{\delta}} = \Delta^{\alpha\tilde{\gamma}}_{\beta\tilde{\delta}} - M^{\alpha\tilde{\gamma}}_{\beta\tilde{\delta}}$ is the multi-layer Laplacian tensor, $\Delta^{\alpha\tilde{\gamma}}_{\beta\tilde{\delta}} = M^{\eta\tilde{\epsilon}}_{\rho\tilde{\sigma}}\, U_{\eta\tilde{\epsilon}}\, E^{\rho\tilde{\sigma}}(\beta\tilde{\delta})\, \delta^{\alpha\tilde{\gamma}}_{\beta\tilde{\delta}}$ is the multi-strength tensor (i.e., the rank-4 counterpart of the strength tensor $\Delta^{\alpha}_{\beta}$ that we defined previously for monoplex networks), $\lambda$ is an eigenvalue, and $V_{\alpha\tilde{\gamma}}$ is its corresponding eigentensor (i.e., the unfolded rank-1 supra-eigenvector). We note that there are at most $NL$ different eigenvalues and corresponding eigentensors. Let $\Delta = \Delta^{\alpha\tilde{\gamma}}_{\alpha\tilde{\gamma}}$ be the trace of the multi-strength tensor. The eigenvalues of the multi-layer density tensor $\rho^{\alpha\tilde{\gamma}}_{\beta\tilde{\delta}} = \Delta^{-1} L^{\alpha\tilde{\gamma}}_{\beta\tilde{\delta}}$ sum to 1, so we can use them to define the Von Neumann entropy of a multi-layer network as
$$H(\rho^{\alpha\tilde{\gamma}}_{\beta\tilde{\delta}}) = -\Lambda^{\alpha\tilde{\gamma}}_{\beta\tilde{\delta}}\, (\log_2 \Lambda)^{\beta\tilde{\delta}}_{\alpha\tilde{\gamma}},$$
where $\Lambda^{\alpha\tilde{\gamma}}_{\beta\tilde{\delta}}$ is the diagonal tensor whose elements are the eigenvalues of $\rho^{\alpha\tilde{\gamma}}_{\beta\tilde{\delta}}$.

Diffusion and Random Walks. Diffusion in multiplex networks was investigated recently in Ref. [21]. A diffusion equation for multi-layer networks needs to include terms that account for inter-layer diffusion. Let $X_{\alpha\tilde{\gamma}}(t)$ denote the state tensor of nodes in each layer at time $t$. The simplest diffusion equation for a multi-layer network is then
$$\dot{X}_{\beta\tilde{\delta}}(t) = D\, M^{\alpha\tilde{\gamma}}_{\beta\tilde{\delta}}\, X_{\alpha\tilde{\gamma}}(t) - D\, \Delta^{\alpha\tilde{\gamma}}_{\beta\tilde{\delta}}\, X_{\alpha\tilde{\gamma}}(t).$$
As in the case of monoplex networks, we introduce the multi-layer combinatorial Laplacian to obtain the following covariant diffusion equation for multi-layer networks:
$$\dot{X}_{\beta\tilde{\delta}}(t) = -D\, L^{\alpha\tilde{\gamma}}_{\beta\tilde{\delta}}\, X_{\alpha\tilde{\gamma}}(t). \quad (31)$$
The solution of Eq. (31) is $X_{\beta\tilde{\delta}}(t) = X_{\alpha\tilde{\gamma}}(0)\, e^{-L^{\alpha\tilde{\gamma}}_{\beta\tilde{\delta}} t}$, and this provides a natural generalization of the result for monoplex networks. The study of random walks is important for many applications in multi-layer networks. For instance, they were used to derive multi-layer modularity [12] and to develop optimized exploration strategies [22]. As we illustrate in Fig. 5, a random walk on a multi-layer network induces nontrivial effects because the presence of inter-layer connections affects its navigation of a networked system.
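Multi-layer diffusion can be sketched numerically by unfolding the multi-layer combinatorial Laplacian into a supra-Laplacian (made-up weights; index layout `M[a, b, g, d]` is our own convention):

```python
import numpy as np

# Hypothetical undirected weighted multi-layer network.
N, L = 3, 2
rng = np.random.default_rng(8)
M = rng.random((N, N, L, L))
M = M + M.transpose(1, 0, 3, 2)
A = M.transpose(0, 2, 1, 3).reshape(N * L, N * L)
np.fill_diagonal(A, 0.)

Lap = np.diag(A.sum(axis=1)) - A                 # supra combinatorial Laplacian
D, t = 1.0, 20.0

# X(t) = exp(-D Lap t) X(0), via the eigendecomposition of the symmetric Lap.
w, V = np.linalg.eigh(Lap)
X0 = np.zeros(N * L); X0[0] = 1.0
Xt = V @ (np.exp(-D * w * t) * (V.T @ X0))

print(np.isclose(Xt.sum(), 1.0))                 # True: the total state is conserved
print(np.allclose(Xt, Xt.mean(), atol=1e-2))     # near-uniform at long times (connected)
```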
As with monoplex networks, we consider discrete-time random walks. Let $T^{\alpha\tilde{\gamma}}_{\beta\tilde{\delta}}$ denote the tensor of transition probabilities for jumping between pairs of nodes and switching between pairs of layers, and let $p_{\alpha\tilde{\gamma}}(t)$ be the time-dependent tensor that gives the probability to find a walker at a particular node in a particular layer. Hence, the covariant master equation that governs the discrete-time evolution of the probability from time $t$ to time $t+1$ is $p_{\beta\tilde{\delta}}(t+1) = T^{\alpha\tilde{\gamma}}_{\beta\tilde{\delta}}\, p_{\alpha\tilde{\gamma}}(t)$. We rewrite this master equation in terms of evolving probability rates to obtain $\dot{p}_{\beta\tilde{\delta}}(t) = -L^{\alpha\tilde{\gamma}}_{\beta\tilde{\delta}}\, p_{\alpha\tilde{\gamma}}(t)$, where $L^{\alpha\tilde{\gamma}}_{\beta\tilde{\delta}} = \delta^{\alpha\tilde{\gamma}}_{\beta\tilde{\delta}} - T^{\alpha\tilde{\gamma}}_{\beta\tilde{\delta}}$ is the multi-layer normalized Laplacian tensor.

FIG. 5. A random walk in a multiplex network. A walker can jump between nodes within the same layer, or it might switch to another layer. This illustration evinces how multiplexity allows a random walker to move between nodes that belong to different (disconnected) components on a given layer.

V. CONCLUSIONS AND DISCUSSION
In this paper, we developed a tensorial framework to study general multi-layer networks. We discussed the generalization of several important network descriptors, including degree centrality, clustering coefficients, eigenvector centrality, and modularity, for our multi-layer framework. We examined different choices that one can make in developing such generalizations, and we also demonstrated how our formalism yields results for monoplex and multiplex networks as special cases.
As we have discussed in detail, our multi-layer formalism provides natural generalizations of network descriptors, and this allows systematic comparisons of multi-layer diagnostics with their single-layer counterparts. As we have also illustrated (e.g., for global clustering coefficients), our formalism also allows systematic comparisons between different ways of generalizing familiar network concepts. This is particularly important for the examination of new phenomena, such as multiplexity-induced correlations [19], that arise when generalizing beyond the usual single-layer networks. This occurs even for simple descriptors like degree centrality, for which the tensor indices in our formulation are related directly to the directionality of relationships between nodes in a multi-layer network.
The mathematical formalism that we have introduced can be generalized further by considering higher-order (i.e., higher-rank) tensors. This will provide a systematic means to investigate networks that are, for example, both time-dependent and multiplex.
Our tensorial framework is an important step towards the development of a unified theoretical framework for studying networks with arbitrary complexities (including multiplexity, time-dependence, and more). When faced with generalizing the usual adjacency matrices to incorporate a feature such as multiplexity, different scholars have employed different notation and terminology, and it is thus desirable to construct a unified framework that unifies the language for studying networks. Moreover, in addition to defining mathematical notation that simplifies the handling and generalization of previously known diagnostics on networks, a tensorial framework also offers the opportunity to unravel new properties that remain hidden when using the classical approach of adjacency matrices. We hope to construct a proper geometrical interpretation for tensorial representations of networks and to ultimately obtain an operational theory of dynamics both on and of networks. This perspective has led to significant advances in other areas of physics, and we believe that it will also be important for the study of networks.
Appendix A: Einstein Summation Convention

Einstein notation is a summation convention, which we adopt to reduce the notational complexity of our tensorial equations, that is applied to repeated indices in operations that involve tensors. For example, we use this convention in the left-hand sides of the following equations:
$$A^{\alpha}_{\beta}\, B^{\beta} = \sum_{\beta=1}^{N} A^{\alpha}_{\beta} B^{\beta}, \qquad A^{\alpha}_{\beta}\, B^{\beta}_{\gamma} = \sum_{\beta=1}^{N} A^{\alpha}_{\beta} B^{\beta}_{\gamma},$$
whose right-hand sides include the summation signs explicitly. It is straightforward to use this convention for the product of any number of tensors of any order. Repeating an index, such that one occurrence is a subscript and the other is a superscript, is equivalent to performing the tensorial operation known as a contraction. Contracting indices reduces the order of a tensor by 2. For instance, the contraction of the 2nd-order tensor $A^{\alpha}_{\beta}$ is the scalar $A^{\alpha}_{\alpha}$, and the 2nd-order tensors $A^{\alpha}_{\beta} B^{\beta}_{\delta}$ and $A^{\delta}_{\beta} B^{\gamma}_{\delta}$ are obtained by contracting the 4th-order tensor $A^{\alpha}_{\beta} B^{\gamma}_{\delta}$. It is important to adopt Einstein summation on repeated indices in a way that is unambiguous. For example, $A^{\alpha}_{\alpha} = A^{\beta}_{\beta}$, but $A^{\alpha}_{\beta} B^{\beta}_{\gamma} C^{\gamma}_{\alpha}$ is not equivalent to $A^{\alpha}_{\beta} B^{\beta}_{\beta} C^{\beta}_{\alpha}$ because of the ambiguity about the index $\beta$ in the latter expression: it is not clear whether the contraction $B^{\beta}_{\beta}$ should be calculated before the product with the other tensors or vice versa. Another situation that deserves particular attention is equations that involve a ratio between tensorial products, where one should apply the Einstein convention separately to the numerator and the denominator. Thus, one should not contract indices such as $\alpha$ and $\beta$ that are repeated across the numerator and the denominator of a ratio of products of tensors $B^{\alpha}_{\beta}$ and $C^{\alpha}_{\beta}$. This occurs, for example, in Eq. (5) in the main text.
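The convention maps directly onto `numpy.einsum`, which sums over every index repeated between its input labels; a brief sketch with made-up tensors:

```python
import numpy as np

A = np.arange(9.).reshape(3, 3)
B = np.arange(9., 18.).reshape(3, 3)
C = np.arange(18., 27.).reshape(3, 3)

# Contraction of a 2nd-order tensor: the scalar A^alpha_alpha (the trace).
print(np.einsum('aa->', A) == np.trace(A))                 # True

# Contracting the 4th-order tensor A^alpha_beta B^gamma_delta over beta = gamma
# yields the 2nd-order tensor A^alpha_beta B^beta_delta (a matrix product).
print(np.allclose(np.einsum('ab,bd->ad', A, B), A @ B))    # True

# An unambiguous triple contraction: A^alpha_beta B^beta_gamma C^gamma_alpha.
s = np.einsum('ab,bg,ga->', A, B, C)
print(np.isclose(s, np.trace(A @ B @ C)))                  # True
```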

Appendix B: Definition of the Tensor Exponential and Logarithm
The exponential of a tensor $B^{\alpha}_{\beta}$ is a tensor $A^{\alpha}_{\beta}$ such that $e^{B^{\alpha}_{\beta}} = A^{\alpha}_{\beta}$. The tensor exponential is defined by the power series [80]
$$e^{B^{\alpha}_{\beta}} = \sum_{k=0}^{\infty} \frac{1}{k!} \left(B^{\alpha}_{\beta}\right)^{k},$$
where the $k$th power is calculated by repeated contraction. A complete discussion of the properties of the tensor exponential is beyond the scope of the present paper. However, we show an example of how to calculate it for diagonalizable tensors. Let $B^{\alpha}_{\beta}$ be a diagonalizable tensor. In other words, there exists a diagonal tensor $D^{\alpha}_{\beta}$, whose elements are the eigenvalues of $B^{\alpha}_{\beta}$, and a tensor $J^{\alpha}_{\beta}$, whose columns are the eigenvectors of $B^{\alpha}_{\beta}$, such that $B^{\alpha}_{\beta} = J^{\alpha}_{\sigma}\, D^{\sigma}_{\tau}\, (J^{-1})^{\tau}_{\beta}$. It follows that
$$e^{B^{\alpha}_{\beta}} = J^{\alpha}_{\sigma}\, e^{D^{\sigma}_{\tau}}\, (J^{-1})^{\tau}_{\beta}.$$
The exponential of a diagonal tensor is the tensor obtained by exponentiating each of the diagonal elements of $D^{\alpha}_{\beta}$, so it is straightforward to calculate $e^{D^{\alpha}_{\beta}}$.
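The diagonalization route can be checked against a truncation of the defining power series; a sketch with a made-up symmetric (hence diagonalizable) tensor:

```python
import numpy as np

# e^B = J e^D J^{-1} for a diagonalizable tensor B = J D J^{-1}.
B = np.array([[2., 1.],
              [1., 2.]])                     # symmetric, hence diagonalizable
w, J = np.linalg.eigh(B)                     # columns of J are eigenvectors
expB = J @ np.diag(np.exp(w)) @ np.linalg.inv(J)

# Power series sum_k B^k / k!, truncated after 30 terms.
series = np.zeros_like(B)
term = np.eye(2)
for k in range(1, 31):
    series += term                           # adds the (k-1)th term B^{k-1}/(k-1)!
    term = term @ B / k
print(np.allclose(expB, series))             # True
```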
The logarithm of a tensor $A^{\alpha}_{\beta}$ is defined as the tensor $B^{\alpha}_{\beta}$ that satisfies the relation $e^{B^{\alpha}_{\beta}} = A^{\alpha}_{\beta}$. It is straightforward to show for a diagonalizable tensor $A^{\alpha}_{\beta}$ that
$$\log A^{\alpha}_{\beta} = J^{\alpha}_{\sigma}\, (\log D)^{\sigma}_{\tau}\, (J^{-1})^{\tau}_{\beta},$$
where $D^{\alpha}_{\beta}$ is the diagonal tensor whose elements are the eigenvalues of $A^{\alpha}_{\beta}$, and $J^{\alpha}_{\beta}$ is a tensor whose columns are the eigenvectors of $A^{\alpha}_{\beta}$.

Appendix C: Derivation of Von Neumann Entropy
The Von Neumann entropy of a monoplex network is defined by Eq. (11). Let $\xi_{\alpha}$ be the $i$th eigenvector ($i = 1, 2, \ldots, N$) of the density tensor $\rho^{\alpha}_{\beta}$, and let $\Xi^{\alpha}_{\beta}$ be the tensor of eigenvectors. The density tensor is defined by rescaling the combinatorial Laplacian, so it has positive diagonal entries and non-positive off-diagonal entries. It is positive semidefinite and has non-negative eigenvalues. We diagonalize the density tensor to obtain $\rho^{\alpha}_{\beta} = \Xi^{\alpha}_{\sigma}\, \Lambda^{\sigma}_{\tau}\, (\Xi^{-1})^{\tau}_{\beta}$, where $\Lambda^{\alpha}_{\beta}$ is a diagonal tensor whose elements are the eigenvalues of $\rho^{\alpha}_{\beta}$. These eigenvalues are equal to the eigenvalues of the combinatorial Laplacian tensor rescaled by the scalar $\Delta^{-1}$, where $\Delta = \Delta^{\alpha}_{\alpha}$ and $\Delta^{\alpha}_{\beta}$ is the strength tensor. It follows that
$$\rho^{\alpha}_{\beta}\, (\log_2 \rho)^{\beta}_{\alpha} = \Lambda^{\alpha}_{\beta}\, (\log_2 \Lambda)^{\beta}_{\alpha}, \quad (C1)$$
where we have exploited the relation $\Xi^{\alpha}_{\sigma}\, (\Xi^{-1})^{\sigma}_{\tau} = \delta^{\alpha}_{\tau}$. We obtain Eq. (12) by multiplying both sides of Eq. (C1) by $-1$.
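The derivation reduces the entropy to a sum over the Laplacian spectrum, which is easy to verify numerically (a sketch with a small, made-up monoplex network):

```python
import numpy as np

# Von Neumann entropy from the spectrum of the rescaled combinatorial Laplacian.
W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
Lap = np.diag(W.sum(axis=1)) - W
rho = Lap / W.sum()                          # trace(Delta) equals the total strength

lam = np.linalg.eigvalsh(rho)
print(np.isclose(lam.sum(), 1.0))            # True: eigenvalues sum to 1

lam = lam[lam > 1e-12]                       # use the convention 0 log2 0 = 0
H = -np.sum(lam * np.log2(lam))
print(0.0 <= H <= np.log2(4))                # True: bounded by log2(N)
```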