Multiplex Markov chains: Convection cycles and optimality

Multiplex networks are a common modeling framework for interconnected systems and multimodal data, yet we still lack fundamental insights into how multiplexity affects stochastic processes. We introduce a novel "Markov chains of Markov chains" model called multiplex Markov chains (MMCs) in which, with probability (1 − ω), random walkers remain in the same layer and follow (layer-specific) intralayer Markov chains, whereas with probability ω ∈ [0, 1] they move to different layers following (node-specific) interlayer Markov chains. One main finding is the identification of multiplex convection, whereby a stationary distribution exhibits circulating flows that involve multiple layers. Convection cycles are well understood in fluids but are insufficiently explored on networks. Our experiments reveal that one mechanism for convection is an imbalance between the (intralayer) degrees of nodes in different layers. To gain further insight, we employ spectral perturbation theory to characterize the stationary distribution in the limits of small and large ω, and we show that MMCs inherently exhibit optimality at intermediate ω in terms of their convergence rate and the extent of convection. As an application, we conduct an MMC-based analysis of brain-activity data, finding MMCs to differ between healthy persons and those with Alzheimer's disease. Overall, our work suggests MMCs and convection as two important new directions for network-related research.

Multiplex networks [1], a type of multilayer network [2,3] in which layers encode different types of intralayer edges and interlayer edges couple the layers, have been used to model interconnected complex systems including transportation networks [4][5][6], critical infrastructures [7], and different types of relationships [8]. In the context of data science, they provide frameworks for data integration/fusion [9][10][11] and stratification [12,13]. Here, we propose a multiplex generalization of Markov chains [14], which are memoryless processes for stochastic transitions between discrete states and provide a theoretical foundation for diverse applications, such as queuing theory [15], population dynamics [16], and machine-learning algorithms that rely on Markov chain Monte Carlo [17], hidden Markov models [18], and/or Markov decision processes [19]. Similar to Markov chains, multiplex Markov chains (MMCs) are expected to find diverse applications within, and beyond, the study of networks.
According to an MMC, random walkers move along (layer-specific) intralayer Markov chains with probability (1 − ω), where ω ∈ [0, 1], and with probability ω they transition to new layers following (node-specific) interlayer Markov chains. MMCs can be used to study random walks on multiplex networks, and diffusion physics for multiplex networks is already a burgeoning field [20][21][22][23][24][25][26]. Most approaches rely on a generalization of the graph Laplacian called a supra-Laplacian matrix, which is constructed by first multiplexing the network layers' adjacency matrices into a supra-adjacency matrix and then creating a Laplacian matrix by treating the supra-adjacency matrix as if it were an ordinary adjacency matrix (i.e., neglecting that intra- and interlayer edges are different). Both normalized [20] and unnormalized supra-Laplacians [21][22][23][24][25][26] have been studied; in the latter case, one can simply couple the unnormalized Laplacians of the layers. Other formulations have also been proposed to study centrality and consensus [27][28][29][30][31][32]. Despite the significant advances that have been made, this field remains in its infancy [33].
By coupling "Markov-chain layers" rather than "network layers," we identify and study novel multiplexity-induced phenomena, including multiplex imbalance and multiplex convection, which, along with the convergence rate λ_2, are found to be optimized at intermediate ω. These properties are shown to have an interesting and complicated relation to the imbalances of nodes' intralayer degrees. We analyze MMCs with spectral perturbation theory to characterize the stationary distribution when there is a separation of timescales between intra- and interlayer transitions: as ω → 0, the intralayer Markov chains each approach (local) stationary solutions, and these (layer-specific) solutions are balanced by the interlayer Markov chains. Analogously, as ω → 1, the interlayer Markov chains individually approach (local) stationary solutions, and these (node-specific) solutions are balanced by the intralayer Markov chains. This work also reveals that the optimality of λ_2(ω) is an inherent property of MMCs.
Model. Consider a set of intralayer Markov chains (the "layers") with size-N transition matrices P^(i) for i ∈ {1, . . ., I} and a set of (node-specific) interlayer Markov chains with size-I transition matrices P̃^(n) for n ∈ {1, . . ., N}. Each P^(i)_nm gives the transition probability from node n to node m in layer i, while P̃^(n)_ij gives the transition probability from layer i to layer j for node n. Equivalently, these give transitions between node-layer pairs: from (n, i) to (m, i) in the case of P^(i)_nm, and from (n, i) to (n, j) in the case of P̃^(n)_ij. An MMC is a stochastic process in which the states are node-layer pairs (n_p, i_p), which we enumerate p ∈ {1, . . ., NI} so that i_p = ⌈p/N⌉ and n_p = p − (i_p − 1)N (⌈·⌉ denotes the ceiling function). Transitions between states follow a supratransition matrix

P(ω) = (1 − ω) diag[P^(1), . . ., P^(I)] + ω Σ_n P̃^(n) ⊗ E^(n), (1)

where the coupling strength ω ∈ [0, 1] tunes the probability that random walkers use interlayer versus intralayer Markov chains, diag(·) is a block-diagonal matrix in which the argument matrices are placed along the diagonal, ⊗ denotes the Kronecker product, and E^(n) = e^(n)[e^(n)]^T, where e^(n)_m = 1 if n = m and 0 otherwise. Each [P(ω)]_pq gives the transition probability from (n_p, i_p) to (n_q, i_q). Under the assumption of uniform coupling, we let P̃^(n) = P̃ ∀n, and the last term in Eq. (1) simplifies to ωP̃ ⊗ I, where I is a size-N identity matrix.
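To make Eq. (1) concrete, the following sketch (our own code, not from the original work; the helper name supra_transition is ours) assembles P(ω) from lists of intralayer and interlayer transition matrices:

```python
import numpy as np

def supra_transition(intra, inter, omega):
    """Assemble the MMC supratransition matrix of Eq. (1).

    intra: list of I row-stochastic N x N matrices P^(i)
    inter: list of N row-stochastic I x I matrices Ptilde^(n)
    """
    I, N = len(intra), intra[0].shape[0]
    # Intralayer part: block-diagonal matrix diag[P^(1), ..., P^(I)]
    D = np.zeros((N * I, N * I))
    for i, P in enumerate(intra):
        D[i * N:(i + 1) * N, i * N:(i + 1) * N] = P
    # Interlayer part: sum_n Ptilde^(n) (x) E^(n), where E^(n) = e^(n) [e^(n)]^T
    C = np.zeros((N * I, N * I))
    for n, Pt in enumerate(inter):
        E = np.zeros((N, N))
        E[n, n] = 1.0
        C += np.kron(Pt, E)
    return (1 - omega) * D + omega * C

# Toy example: two layers, three nodes, uniform coupling Ptilde = [[0, 1], [1, 0]]
N = 3
P1 = np.full((N, N), 1.0 / N)              # uniform intralayer chain
P2 = np.eye(N)                             # lazy intralayer chain
Pt = np.array([[0.0, 1.0], [1.0, 0.0]])
P = supra_transition([P1, P2], [Pt] * N, omega=0.3)
assert np.allclose(P.sum(axis=1), 1.0)     # P(omega) is row stochastic for any omega
```

With uniform coupling, the interlayer part reduces to ωP̃ ⊗ I, as noted above.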
Let x^(t) be a length-NI row vector such that [x^(t)]_p gives the expected fraction of random walkers at node-layer pair (n_p, i_p) at time t. Given an initial condition x^(0), x^(t) evolves following the linear discrete map x^(t+1) = x^(t)P(ω). If P(ω) is nonnegative, irreducible, and aperiodic, then x^(t) → v(ω) converges to a unique stationary distribution, which is the left dominant eigenvector of P(ω). It is convenient to define π^(i_p)_{n_p}(ω) = [v(ω)]_p and to drop the subscript p, so that π^(i)_n(ω) is the density of walkers at node n in layer i.

Application to a multiplex network. MMCs provide a new form of diffusion on multiplex networks. Let P^(i) = [D^(i)]^{−1}A^(i) and P̃^(n) = [D̃^(n)]^{−1}Ã^(n) be the intra- and interlayer transition matrices, respectively, of Markov chains derived from a multiplex network in which A^(i) and Ã^(n) are intra- and interlayer adjacency matrices, and D^(i) and D̃^(n) are diagonal matrices with entries D^(i)_nn = Σ_m A^(i)_nm and D̃^(n)_ii = Σ_j Ã^(n)_ij that encode the intra- and interlayer node degrees. In Fig. 1(A), we visualize such an MMC in which P^(1) and P^(2) are intralayer transition matrices associated with chain and star networks (both are undirected and unweighted). Self-edges are added to the first and last nodes of the chain to make it 2-regular. We enumerate the nodes n ∈ {1, . . ., 11} clockwise around the chain, starting at the center node. We implement uniform coupling with P̃ = [0, 1; 1, 0]. Node colors in Fig. 1(A) depict π^(i)_n(ω) for ω = 0.5. Note that π^(i)_n(ω) is largest for node-layer pair (n, i) = (1, 2) (the hub node in the star network); π^(i)_n(ω) is also large for node-layer pair (1, 1), since it is coupled to (1, 2) by an interlayer edge.
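As a minimal numerical companion (our own sketch; the chain and star here are smaller than the 11-node example of Fig. 1), one can build the intralayer chains from adjacency matrices via P^(i) = [D^(i)]^{−1}A^(i) and iterate the linear map to obtain the stationary distribution:

```python
import numpy as np

# Intralayer chains from adjacency matrices: P^(i) = [D^(i)]^{-1} A^(i)
N = 5
A_chain = np.zeros((N, N))
for n in range(N - 1):
    A_chain[n, n + 1] = A_chain[n + 1, n] = 1.0
A_chain[0, 0] = A_chain[N - 1, N - 1] = 1.0   # self-edges make the chain 2-regular
A_star = np.zeros((N, N))
A_star[0, 1:] = A_star[1:, 0] = 1.0           # node 0 is the hub

P1 = A_chain / A_chain.sum(axis=1, keepdims=True)
P2 = A_star / A_star.sum(axis=1, keepdims=True)

# Supratransition matrix with uniform coupling Ptilde = [[0, 1], [1, 0]]
omega = 0.5
D = np.block([[P1, np.zeros((N, N))], [np.zeros((N, N)), P2]])
C = np.kron(np.array([[0.0, 1.0], [1.0, 0.0]]), np.eye(N))
P = (1 - omega) * D + omega * C

# Iterate x^(t+1) = x^(t) P(omega) until it converges to the stationary distribution
x = np.full(2 * N, 1.0 / (2 * N))
for _ in range(5000):
    x = x @ P
assert np.allclose(x, x @ P) and np.isclose(x.sum(), 1.0)
pi = x.reshape(2, N)   # pi[i, n] approximates the walker density at node n in layer i
```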
Observe for layer i = 1 that the π^(1)_n(ω) values monotonically decrease as one moves clockwise around the chain. That is, the π^(i)_n(ω) values of MMCs are influenced by the global structure, which can be beneficial, for example, if one seeks to study the importances of nodes and/or layers [29,31,32]. This contrasts with diffusions based on unnormalized supra-Laplacians [21][22][23][24][25][26] (since their stationary distribution is the uniform distribution) and the normalized supra-Laplacian of [20] (whose entries are proportional to the node-layer pairs' total degrees). See the Supplementary Material for further discussion.
Multiplex imbalance and convection. The stationary distribution induces a stationary flow F_pq(ω) = [v(ω)]_p [P(ω)]_pq across each edge. We decompose F(ω) into its symmetric, (F_pq + F_qp)/2, and skew-symmetric, Δ_pq/2, components to define the stationary flow imbalance Δ_pq = F_pq − F_qp across each edge. Δ_pq > 0 indicates that there is a greater flow from (n_p, i_p) to (n_q, i_q) than in the reverse direction, whereas Δ_pq < 0 indicates the opposite. We quantify the total flow imbalance ||Δ||_F using the Frobenius norm, and a Markov chain is reversible iff ||Δ||_F = 0 (i.e., Δ_pq = 0 ∀p, q). We define multiplex imbalance as the phenomenon whereby the multiplex coupling of reversible Markov chains yields an irreversible MMC. In that case, ||Δ||_F quantifies the total multiplex imbalance.
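These definitions can be sketched in a few lines (our own helper flow_imbalance): for a single reversible chain, such as a random walk on an undirected graph, detailed balance forces Δ = 0:

```python
import numpy as np

def flow_imbalance(P, v):
    """Stationary flows F_pq = v_p P_pq and their imbalance Delta = F - F^T."""
    F = v[:, None] * P
    Delta = F - F.T
    return Delta, np.linalg.norm(Delta, 'fro')   # ||Delta||_F

# Random walk on an undirected path graph is reversible: v_p P_pq = v_q P_qp
A = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
P = A / A.sum(axis=1, keepdims=True)
v = A.sum(axis=1) / A.sum()                      # stationary: proportional to degree
Delta, total = flow_imbalance(P, v)
assert np.isclose(total, 0.0)                    # reversible => zero total imbalance
```

Applying the same helper to a supratransition matrix P(ω) and its stationary distribution v(ω) generally yields ||Δ||_F > 0, even when every constituent chain is reversible.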
In Fig. 1(B), we visualize Δ_pq for the MMC from Fig. 1(A). Observe that these values exhibit a circulating flow imbalance involving more than one layer, which we call multiplex convection. Because P^(1), P^(2), and P̃ are all transition matrices of reversible Markov chains, the irreversibility of the MMC is an emergent multiplexity-induced property. It is worth noting that multiplex convection does not require multiplex imbalance, nor vice versa, and so these are related, but notably different, phenomena.
Timescale separations. We now analyze π^(i)_n(ω) in two limits: walkers rarely move between layers as ω → 0, which yields a type of layer decoupling; walkers rarely remain in the same layer as ω → 1, which yields a type of layer aggregation.
In the Supplementary Material, we show that, as ω → 0^+,

π^(i)_n(ω) → α_i v^(i)_n, (2)

where v^(i) is the stationary distribution of intralayer Markov chain i and α = [α_1, . . ., α_I] is the dominant left eigenvector of an "effective" interlayer transition matrix X̃ with entries X̃_ij = Σ_n v^(i)_n P̃^(n)_ij. That is, each intralayer Markov chain i obtains its stationary distribution v^(i), and these local solutions are balanced by the stationary distribution of an "effective" interlayer Markov chain that depends on all inter- and intralayer Markov chains.
In the case of uniform coupling, P̃^(n) = P̃, X̃ = P̃, and α = ṽ (the left dominant eigenvector of P̃). We analyze the limit ω → 1 in a similar way, except we first implement a change of basis via the (unitary) stride-permutation matrix U that reorders the node-layer pairs as layer-node pairs: Q(ω) = U^T P(ω)U is a supratransition matrix for the same MMC as P(ω); the only difference is that the rows and columns have been permuted. We obtain

Q(ω) = ω diag[P̃^(1), . . ., P̃^(N)] + (1 − ω) Σ_i P^(i) ⊗ Ẽ^(i),

where Ẽ^(i) = ẽ^(i)[ẽ^(i)]^T and ẽ^(i) is a length-I unit vector. Observe that the form of Q(ω) qualitatively matches that of Eq. (1); only the inter- and intralayer transition matrices have been swapped. Thus, one can equally interpret an MMC as intralayer Markov chains coupled by interlayer ones, or as interlayer Markov chains coupled by intralayer ones; these are formally the same. We can thus make use of our earlier results to obtain, as ω → 1^−,

π^(i)_n(ω) → α̃_n ṽ^(n)_i, (3)

where α̃ = [α̃_1, . . ., α̃_N] is the dominant left eigenvector of a transition matrix X with entries X_nm = Σ_i ṽ^(n)_i P^(i)_nm for an "effective" intralayer Markov chain.
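The ω → 0^+ prediction of Eq. (2) can be checked numerically. The sketch below (our own code, with randomly generated chains and helper names of our choosing) compares the exact stationary distribution at small ω against α_i v^(i)_n:

```python
import numpy as np

rng = np.random.default_rng(0)
row_norm = lambda M: M / M.sum(axis=1, keepdims=True)

def left_dominant(P):
    """1-normalized left eigenvector of a row-stochastic matrix for eigenvalue 1."""
    w, V = np.linalg.eig(P.T)
    v = np.real(V[:, np.argmax(np.real(w))])
    return v / v.sum()

N, I = 4, 3
intra = [row_norm(rng.random((N, N))) for _ in range(I)]   # P^(i)
inter = [row_norm(rng.random((I, I))) for _ in range(N)]   # Ptilde^(n)

# Effective interlayer chain: Xtilde_ij = sum_n v^(i)_n Ptilde^(n)_ij
v_intra = [left_dominant(P) for P in intra]
X = np.zeros((I, I))
for i in range(I):
    for n in range(N):
        X[i] += v_intra[i][n] * inter[n][i]
alpha = left_dominant(X)

# Prediction: pi^(i)_n(omega) -> alpha_i v^(i)_n as omega -> 0+
pred = np.concatenate([alpha[i] * v_intra[i] for i in range(I)])

# Exact stationary distribution at small omega, via Eq. (1)
omega = 1e-4
D = np.zeros((N * I, N * I))
C = np.zeros((N * I, N * I))
for i in range(I):
    D[i * N:(i + 1) * N, i * N:(i + 1) * N] = intra[i]
for n in range(N):
    E = np.zeros((N, N))
    E[n, n] = 1.0
    C += np.kron(inter[n], E)
v = left_dominant((1 - omega) * D + omega * C)
assert np.abs(v - pred).max() < 1e-2   # agreement to first order in omega
```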
In Fig. 2, we validate the accuracy of Eqs. (2) and (3).

Optimal imbalance and convergence rate. Next, we show that the total multiplex imbalance ||Δ||_F is maximized at some value ω*_Δ = argmax_ω ||Δ||_F ∈ (0, 1). We use the same intralayer Markov chains as in Fig. 1, but now allow the interlayer Markov chains to be different for every node: P̃^(n) = [(1 − a_n), a_n; a_n, (1 − a_n)], where a_n tunes the probability of switching layers at node n and is chosen by one of four strategies (identical, increasing, decreasing, or random a_n; see the Supplementary Material for definitions). In Fig. 3(A), we plot ||Δ||_F versus ω; each panel depicts a different strategy. First, because a → 1 recovers the interlayer transition matrix used in Fig. 1 for all strategies, the (blue solid) curves for a = 0.99 are nearly identical in all top panels. As a decreases, the different strategies yield different ||Δ||_F curves: (I)-(II) the curves for identical and increasing a_n flatten as a decreases and the location of the optimum shifts toward larger ω; (III) the ||Δ||_F curves for decreasing a_n are insensitive to a; and (IV) the curves for random a_n seem to change randomly, but generally decrease. These responses can be understood by noting that a_1 strongly varies with a for the first two strategies, remains unchanged for the decreasing-a_n strategy, and is random for the last strategy, although its expectation decreases. Parameter a_1 determines the optimality of ||Δ||_F in this case because the net flow Δ_pq is largest from node-layer pair (1, 2) to (1, 1), and a_1 tunes the probability that walkers make this transition. In Fig. 3(B), we show that λ_2(ω), the second-largest-in-magnitude eigenvalue of P(ω), is also optimized at a similar intermediate value ω*_λ2 = argmin_ω λ_2(ω); λ_2(ω) governs the convergence rate, with smaller λ_2(ω) implying faster convergence. Importantly, because we found P(ω) to have I-dimensional and N-dimensional dominant eigenspaces when ω = 0 and ω = 1, respectively, and due to the continuity of eigenvalues [35], it follows that λ_2(ω) → 1 in either limit. Furthermore, dλ_2(ω)/dω < 0 at ω = 0 and dλ_2(ω)/dω > 0 at ω = 1, and so Rolle's theorem [36] guarantees that a minimum exists.
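The optimality at intermediate ω can be reproduced in a short sweep (our own sketch; we use random reversible intralayer chains built from symmetric weight matrices rather than the specific networks of Fig. 1):

```python
import numpy as np

rng = np.random.default_rng(1)
N, I = 5, 2
# Reversible intralayer chains: random walks on undirected weighted graphs
intra = []
for _ in range(I):
    W = rng.random((N, N))
    W = W + W.T
    intra.append(W / W.sum(axis=1, keepdims=True))
a = 0.8
Pt = np.array([[1 - a, a], [a, 1 - a]])   # uniform (hence reversible) coupling

D = np.zeros((N * I, N * I))
for i in range(I):
    D[i * N:(i + 1) * N, i * N:(i + 1) * N] = intra[i]
C = np.kron(Pt, np.eye(N))

def diagnostics(omega):
    """Return (||Delta||_F, lambda_2) for the supratransition matrix P(omega)."""
    P = (1 - omega) * D + omega * C
    w, V = np.linalg.eig(P.T)
    order = np.argsort(-np.abs(w))
    v = np.real(V[:, order[0]])
    v /= v.sum()
    lam2 = np.abs(w[order[1]])            # second-largest-modulus eigenvalue
    F = v[:, None] * P
    return np.linalg.norm(F - F.T, 'fro'), lam2

omegas = np.linspace(0.01, 0.99, 99)
imb, lam = zip(*[diagnostics(o) for o in omegas])
# Both ||Delta||_F and lambda_2 are optimized at interior coupling strengths
assert 0 < int(np.argmax(imb)) < len(omegas) - 1
assert 0 < int(np.argmin(lam)) < len(omegas) - 1
```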
Interestingly, the locations ω*_Δ and ω*_λ2 of the optima for ||Δ||_F and λ_2(ω) appear to be strongly correlated for some strategies. We explore this further in the Supplementary Material, but generally find that there does not seem to be a simple relationship between multiplex imbalance, convection, and the optimality of ||Δ||_F and λ_2(ω).
Application to brain networks reveals a mechanism for multiplex imbalance. Next, we study a mechanism that contributes to the optimality of multiplex imbalance: Δ_pq is often large for an interlayer edge associated with a node n whose intralayer degrees differ greatly across layers, i.e., for which (d^(i_p)_{n_p} − d^(i_q)_{n_q}) is large. Observe in Fig. 1(B), for example, that Δ_pq is largest from node-layer pair (1, 2) to (1, 1), and this edge is also associated with the largest imbalance of intralayer degrees: d^(2)_1 − d^(1)_1 = 10 − 2 = 8 (recall that the chain layer has self-edges to make it 2-regular). We now further study this phenomenon for an MMC representation of an (empirical) multiplex brain network [12] with N = 148 nodes (brain regions) and I = 7 layers, which represent pairwise coherences of magnetoencephalography (MEG) signals at different frequency ranges: {[1, 4), [4, 8), [8, 10.5), [10.5, 13), [13, 20), [20, 30), [30, 45)}, measured in Hz. We uniformly couple the layers with an interlayer Markov chain with transition matrix P̃_ij = 1 if |i − j| = 1 and i ∈ {1, 7}; P̃_ij = 1/2 if |i − j| = 1 and i ∉ {1, 7}; and P̃_ij = 0 otherwise.

In Fig. 4, we plot Δ_pq versus (d^(i_p)_{n_p} − d^(i_q)_{n_q}) separately for (A) intralayer edges (i.e., i_p = i_q) and (B) interlayer edges (i.e., n_p = n_q). (Since Δ_pq = −Δ_qp, we only plot the positive values.) Different columns show results for different ω. First, observe that the Δ_pq are approximately 50× larger for interlayer edges than for intralayer edges [i.e., compare the y-axis of (B) to that of (A)]. Also, note that the Δ_pq obtain their largest magnitudes near ω = 0.5 (center column), which is consistent with our finding that ω*_Δ ≈ 0.5 for this MMC. The strong correlation between Δ_pq and (d^(i_p)_{n_p} − d^(i_q)_{n_q}) supports our hypothesis that the largest Δ_pq occur for interlayer edges associated with the largest intralayer-degree imbalances. In the Supplementary Material, we study the Pearson correlation coefficient R between Δ_pq and (d^(i_p)_{n_p} − d^(i_q)_{n_q}) for frequency-multiplexed brain networks of 50 persons: 25 healthy and 25 with Alzheimer's disease. For healthy persons and ω = 0.5, we find R = 0.52 (−0.37) for interlayer (intralayer) edges. For persons with Alzheimer's disease, we find R = 0.56 (−0.37). We also find max_ω ||Δ||_F to be 6.6% larger, on average, for persons with Alzheimer's disease. Given our observation that imbalanced intralayer degrees contribute to multiplex imbalance and convection, and that both are increased for persons with Alzheimer's disease, our findings are consistent with previous work that found persons with Alzheimer's disease to have a loss of brain inter-frequency hubs [12].
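The seven-layer interlayer chain described above is a simple birth-death chain, which is reversible on its own. A quick sketch (our own code) verifies detailed balance with respect to its stationary distribution, so any imbalance in the brain MMC is again multiplexity-induced:

```python
import numpy as np

I = 7
Pt = np.zeros((I, I))
for i in range(I):
    if i > 0:
        Pt[i, i - 1] = 1.0 if i == I - 1 else 0.5   # boundary layers hop with prob. 1
    if i < I - 1:
        Pt[i, i + 1] = 1.0 if i == 0 else 0.5
assert np.allclose(Pt.sum(axis=1), 1.0)

# Stationary distribution of this birth-death chain and detailed-balance check
v = np.array([1.0, 2, 2, 2, 2, 2, 1]) / 12.0
F = v[:, None] * Pt
assert np.allclose(v @ Pt, v)
assert np.allclose(F, F.T)   # detailed balance: the interlayer chain is reversible
```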
Conclusion. We formulated MMCs as the multiplexing of intra- and interlayer Markov chains, explored novel physical phenomena, and developed a theoretical foundation using perturbation theory. Our work was inspired by the study of diffusion on multiplex networks [20][21][22][23][24][25][26][27][28][29][30][31][32], and we focused our attention on aspects of MMCs that are in striking contrast to these models. In particular, stationary distributions of MMCs can exhibit multiplex imbalance and convection. These emergent, multiplexity-induced phenomena, along with the convergence rate λ_2, were shown to be optimized at intermediate coupling of the Markov chains. These new insights into how multiplexity affects stochastic processes provide a step toward bringing multiplexity theory to Markov-chain applications within, and beyond, the study of networks [15][16][17][18][19].
DT thanks Per Sebastian Skardal and Naoki Masuda for helpful feedback on this manuscript.DT was supported in part by the Simons Foundation (grant #578333).

II. SPECTRAL PERTURBATION THEORY FOR TIMESCALE SEPARATION
We first study the dominant eigenvector of a supratransition matrix P(ω) in the limit ω → 0 + , which corresponds to when transitions rarely occur using an interlayer Markov chain.
Let µ^(i)_1, v^(i), and u^(i) denote the largest positive eigenvalue and the corresponding left and right eigenvectors of each transition matrix P^(i) of the intralayer Markov chains, where i ∈ {1, . . ., I} is a layer index. Furthermore, let µ̃^(n)_1, ṽ^(n), and ũ^(n) denote the same mathematical objects for each transition matrix P̃^(n) of the interlayer Markov chains, where n ∈ {1, . . ., N} is a node index. Note that µ^(i)_1 = µ̃^(n)_1 = 1 for any i and n, since the transition matrices are row stochastic. Also, their corresponding right eigenvectors u^(i) = [1, . . ., 1]^T and ũ^(n) = [1, . . ., 1]^T are vectors in which all the entries are ones. It is advantageous to let the left eigenvectors represent probability distributions, and so we normalize them in the 1-norm: Σ_n v^(i)_n = 1 and Σ_i ṽ^(n)_i = 1. We do not normalize the right eigenvectors (i.e., Σ_n u^(i)_n = N and Σ_i ũ^(n)_i = I for any i and n). Provided that the transition matrices P^(i) and P̃^(n) are nonnegative and irreducible, Perron-Frobenius theory for nonnegative matrices [10] guarantees that the left eigenvectors v^(i) and ṽ^(n) are unique and contain positive entries.
Turning our attention to the spectrum of P(ω), we denote its largest positive eigenvalue by λ_1(ω) and its left and right dominant eigenvectors by v(ω) and u(ω), respectively. We can write the dominant eigenvector equations as v(ω)^T P(ω) = λ_1(ω)v(ω)^T and P(ω)u(ω) = λ_1(ω)u(ω). Because P(ω) is row stochastic for any ω, we have λ_1(ω) = 1 and u(ω) = [1, . . ., 1]^T. That is, both are independent of ω, and we can drop ω as an argument. (This will be more rigorously supported in a theorem below.) Provided that P(ω) is a nonnegative irreducible matrix, v(ω) and u(ω) are uniquely defined and have positive entries [10]. Note that this explicitly assumes ω ∈ (0, 1). We denote the ω → 0^+ limits of v(ω) and u(ω) by v(0^+) and u(0^+). However, when ω = 0 (i.e., ω is exactly zero), P(ω) is not irreducible. We provide the following theorem to characterize the eigenspace associated with λ_1 = 1 in this case.
Theorem II.1 Let P(ω) be a supratransition matrix of a multiplex Markov chain, and assume that each intralayer transition matrix P^(i) is nonnegative and irreducible. Then the geometric and algebraic multiplicities of the eigenvalue λ_1 = 1 of P(0) are both I (recall that I is the number of intralayer Markov chains), and the left and right eigenspaces are spanned by the orthogonal eigenvectors {e^(i) ⊗ v^(i)} and {e^(i) ⊗ u^(i)}, respectively, where e^(i) denotes the unit vector (i.e., all entries are zero except for the i-th entry, which is a 1) and ⊗ denotes the Kronecker product.
Remark II.2 We refer to the vectors e^(i) ⊗ v^(i) and e^(i) ⊗ u^(i) as 'block vectors'; they consist of zeros, except in the i-th blocks, which are v^(i) and u^(i), respectively.
Proof. First, we show that e^(i) ⊗ v^(i) and e^(i) ⊗ u^(i) are left and right eigenvectors of P(0) corresponding to the eigenvalue λ_1 = 1: ([e^(i)]^T ⊗ [v^(i)]^T) P(0) = [e^(i)]^T ⊗ ([v^(i)]^T P^(i)) = [e^(i)]^T ⊗ [v^(i)]^T and P(0)(e^(i) ⊗ u^(i)) = e^(i) ⊗ (P^(i)u^(i)) = e^(i) ⊗ u^(i). It is also straightforward to show that these sets of eigenvectors are orthogonal: for i ≠ j, ([e^(i)]^T ⊗ [v^(i)]^T)(e^(j) ⊗ v^(j)) = ([e^(i)]^T e^(j)) ⊗ ([v^(i)]^T v^(j)) = 0, and similarly for the right eigenvectors. These results use the Kronecker-product identity (a ⊗ b)(c ⊗ d) = (ac) ⊗ (bd) (assuming the dimensions appropriately match). Provided that each P^(i) is nonnegative and irreducible, the dominant eigenvalue µ^(i)_1 = 1 of each P^(i) has geometric and algebraic multiplicity equal to 1. Thus, the eigenvalue λ_1 = 1 of P(0) has geometric and algebraic multiplicity equal to I, and the sets of eigenvectors {e^(i) ⊗ v^(i)} and {e^(i) ⊗ u^(i)} are eigenbases for the left and right eigenspaces of λ_1 = 1.
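A numerical illustration of Theorem II.1 (our own sketch with random intralayer chains): at ω = 0, the block-diagonal P(0) has eigenvalue 1 with multiplicity I, with the stated block eigenvectors:

```python
import numpy as np

rng = np.random.default_rng(2)
row_norm = lambda M: M / M.sum(axis=1, keepdims=True)
N, I = 4, 3
intra = [row_norm(rng.random((N, N))) for _ in range(I)]

# P(0) is block diagonal: walkers never leave their layer
P0 = np.zeros((N * I, N * I))
for i in range(I):
    P0[i * N:(i + 1) * N, i * N:(i + 1) * N] = intra[i]

w = np.linalg.eigvals(P0)
assert np.sum(np.abs(w - 1.0) < 1e-8) == I   # eigenvalue 1 has multiplicity I

e = np.eye(I)
for i in range(I):
    wi, Vi = np.linalg.eig(intra[i].T)
    vi = np.real(Vi[:, np.argmax(np.real(wi))])
    vi /= vi.sum()
    left = np.kron(e[i], vi)                 # block vector e^(i) (x) v^(i)
    right = np.kron(e[i], np.ones(N))        # block vector e^(i) (x) u^(i)
    assert np.allclose(left @ P0, left)      # left eigenvector for lambda_1 = 1
    assert np.allclose(P0 @ right, right)    # right eigenvector for lambda_1 = 1
```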
Next, we present our main analytical result for when there is a separation of time scales and transitions are far more likely to utilize an intralayer Markov chain versus an interlayer one.
Theorem II.3 Let P(ω) be a supratransition matrix of a multiplex Markov chain, and assume that each intralayer transition matrix P^(i) is nonnegative, irreducible, and has dominant eigenvalue µ^(i)_1 = 1. We define v(0^+) = lim_{ω→0^+} v(ω) as the limiting left dominant eigenvector of P(ω). Then

v(0^+) = Σ_i α_i e^(i) ⊗ v^(i), (9)

where the vector α = [α_1, . . ., α_I]^T has positive entries that satisfy Σ_{i=1}^I α_i = 1 and is the unique solution to α^T X̃ = α^T, with X̃_ij = Σ_n v^(i)_n P̃^(n)_ij.

Remark II.4 Since each P̃^(n) is row stochastic, the matrix X̃ is also row stochastic: Σ_j X̃_ij = Σ_n v^(i)_n Σ_j P̃^(n)_ij = Σ_n v^(i)_n = 1. It follows that [1, . . ., 1]^T is a right eigenvector of X̃ for an eigenvalue equal to 1. Therefore, X̃ is an "effective" interlayer transition matrix that represents a type of weighted aggregation of the interlayer Markov chains {P̃^(n)}.
Remark II.5 When the intralayer Markov chains are uniformly coupled, i.e., P̃^(n) = P̃ for each node n, it then follows that X̃ = P̃ and α = ṽ, which is the left dominant eigenvector of P̃.
Remark II.6 When the intralayer Markov chains have doubly stochastic transition matrices, i.e., Σ_n P^(i)_nm = Σ_m P^(i)_nm = 1 for each layer i, then v^(i)_n = 1/N for each i, and X̃ is the mean of the interlayer transition matrices: X̃_ij = (1/N) Σ_n P̃^(n)_ij.
Proof. We Taylor expand v(ω) for small ω as v(ω) = v_0 + ωv_1 + ω²v_2 + · · ·. Successive terms in this expansion represent higher-order derivatives of v(ω) with respect to ω, and we assume that v(ω) has the appropriate smoothness [i.e., v(ω) ∈ C^(k)(0, 1)]. The ω → 0^+ limit of v(ω) then becomes v_0 = v(0^+). Focusing on the first-order approximation, we insert v(ω) ≈ v_0 + ωv_1 into the eigenvalue equation v(ω) = P(ω)^T v(ω), writing P(ω)^T = P(0)^T + ωP'(0)^T with P'(0) = −diag[P^(1), . . ., P^(I)] + Σ_n P̃^(n) ⊗ E^(n). The second-order terms are negligible as ω → 0, and so we separately collect the zeroth-order and first-order terms in ω to obtain two consistency equations:

v_0 = P(0)^T v_0 and v_1 = P(0)^T v_1 + P'(0)^T v_0.

The consistency equation arising from the zeroth-order terms is exactly the eigenvalue equation with ω = 0, as expected. It implies a solution of the form given by Eq. (9): v_0 = Σ_j α_j e^(j) ⊗ v^(j).

To determine α, we left multiply the first-order consistency equation by [e^(i) ⊗ u^(i)]^T, yielding [e^(i) ⊗ u^(i)]^T v_1 = [e^(i) ⊗ u^(i)]^T P(0)^T v_1 + [e^(i) ⊗ u^(i)]^T P'(0)^T v_0. However, [e^(i) ⊗ u^(i)]^T P(0)^T = [e^(i) ⊗ u^(i)]^T, so the term on the left-hand side is canceled by the first term on the right-hand side, which yields [e^(i) ⊗ u^(i)]^T P'(0)^T v_0 = 0. Substituting v_0 = Σ_j α_j e^(j) ⊗ v^(j), the block-diagonal term contributes v_0^T diag[P^(1), . . ., P^(I)] (e^(i) ⊗ u^(i)) = v_0^T (e^(i) ⊗ u^(i)) = α_i [v^(i)]^T u^(i) = α_i (recall that u^(i)_n = 1 for each n and Σ_n v^(i)_n = 1), while the interlayer term contributes v_0^T Σ_n (P̃^(n) ⊗ E^(n))(e^(i) ⊗ u^(i)) = Σ_j α_j Σ_n P̃^(n)_{ji} v^(j)_n = (α^T X̃)_i. Setting these contributions equal finally yields α^T X̃ = α^T.

III. EXTENDED STUDY OF OPTIMAL MULTIPLEX IMBALANCE
In Fig. 3 of the main text, it appears that there is a relationship between the value of ω at which the total multiplex imbalance ||Δ||_F is optimized [which we denote ω*_Δ = argmax_ω ||Δ||_F ∈ (0, 1)] and the value at which λ_2(ω) is optimized [which we denote ω*_λ2 = argmin_ω λ_2(ω)]. To further explore the relationship between the optima ω*_λ2 and ω*_Δ, we repeated this experiment for many values of a ∈ (0, 1). Recall that a tunes the heterogeneity of the node-specific interlayer Markov chains with transition matrices P̃^(n). In Supplementary Fig. 2(A) and (B), we plot ω*_Δ and ω*_λ2, respectively, as functions of a. In each panel, we show results for the four strategies for creating interlayer Markov chains. Observe that in the limit a → 1, all of the optima occur at approximately the same coupling strength, ω*_λ2 ≈ ω*_Δ ≈ 0.4, which is expected since all four strategies then yield uniform coupling: P̃^(n) → P̃ = [0, 1; 1, 0]. As one decreases a, the optima shift to larger values of ω for all strategies except the strategy with decreasing a_n. This response is largest for the strategies with identical a_n and increasing a_n, and it is less clear for the strategy with random a_n (in which case, the dependence of ω*_Δ on a is more random).

FIG. 2. Validation of theory for timescale separations. We compare observed and predicted values of π^(i)_n(ω) for the MMC from Fig. 1 in two limiting cases: (A) Eq. (2) for ω → 0; and (B) Eq. (3) for ω → 1. The symbols' colors correspond to the same color scale as that shown in Fig. 1(A). Note that 22 symbols are plotted in each panel, but because they overlap, only a few are perceptible.

FIG. 4. Multiplex imbalance and degree imbalance for a frequency-multiplexed functional brain network. We separately plot Δ_pq versus (d^(i_p)_{n_p} − d^(i_q)_{n_q}) for (A) intralayer edges and (B) interlayer edges for an MMC representation of an empirical brain network with N = 148 nodes (brain regions) and I = 7 layers (correlated MEG signals at different frequency ranges). Δ_pq and (d^(i_p)_{n_p} − d^(i_q)_{n_q}) are positively (negatively) correlated for interlayer (intralayer) edges, and the Δ_pq have their largest magnitudes for ω = 0.5 (center column).

The node-specific interlayer Markov chains have transition matrices P̃^(n) = [(1 − a_n), a_n; a_n, (1 − a_n)]. Here, the values a_n tune the probability of switching layers at each node n, and we considered four strategies for choosing a_n: (i) identical a_n: we define a_n = a for each n; (ii) increasing a_n: we define a_n = a + (n − 1)δa for n ∈ {1, . . ., N} with δa = (1 − a)/(N − 1); (iii) decreasing a_n: we let a_n = 1 − (n − 1)δa; (iv) random a_n: we sample a_n uniformly at random from [a, 1].
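The four strategies can be written compactly (our own sketch; the values of N and a are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
N, a = 11, 0.5
strategies = {
    "identical":  np.full(N, a),
    "increasing": np.linspace(a, 1.0, N),    # a_n = a + (n - 1) * (1 - a)/(N - 1)
    "decreasing": np.linspace(1.0, a, N),    # a_n = 1 - (n - 1) * (1 - a)/(N - 1)
    "random":     rng.uniform(a, 1.0, size=N),
}
# Each a_n yields an interlayer chain Ptilde^(n) = [[1 - a_n, a_n], [a_n, 1 - a_n]]
inter = {name: [np.array([[1 - an, an], [an, 1 - an]]) for an in a_n]
         for name, a_n in strategies.items()}
for a_n in strategies.values():
    assert np.all((a_n >= a) & (a_n <= 1.0))   # all strategies stay within [a, 1]
```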