A framework for the construction of generative models for mesoscale structure in multilayer networks

Multilayer networks allow one to represent diverse and coupled connectivity patterns — e.g., time-dependence, multiple subsystems, or both — that arise in many applications and which are diﬃcult or awkward to incorporate into standard network representations. In the study of multi-layer networks, it is important to investigate mesoscale (i.e., intermediate-scale) structures, such as dense sets of nodes known as communities, to discover network features that are not apparent at the microscale or the macroscale. The ill-deﬁned nature of mesoscale structure and its ubiquity in empirical networks make it crucial to develop generative models that can produce the features that one encounters in empirical networks. Key purposes of such generative models include generating synthetic networks with empirical properties of interest, benchmarking mesoscale-detection methods and algorithms, and inferring structure in empirical multilayer networks. In this paper, we introduce a framework for the construction of generative models for mesoscale structures in multilayer networks. Our framework provides a standardized set of generative models, together with an associated set of principles from which they are derived, for studies of mesoscale structures in multilayer networks. It uniﬁes and generalizes many existing models for mesoscale structures in fully-ordered (e.g., temporal) and unordered (e.g., multiplex) multilayer networks. One can also use it to construct generative models for mesoscale structures in partially-ordered multilayer networks (e.g., networks that are both temporal and multiplex). Our framework has the ability to produce many features of empirical multilayer networks, and it explicitly incorporates a user-speciﬁed dependency structure between layers. We discuss the parameters and properties of our framework, and we illustrate examples of its use with benchmark models for community-detection methods and algorithms in multilayer networks


I. INTRODUCTION
One can model many physical, technological, biological, financial, and social systems as networks.The simplest type of network is a graph [1], which consists of a set of nodes (which represent entities) and a set of edges between pairs of nodes that encode interactions between those entities.One can consider either unweighted graphs or weighted graphs, in which each edge has a weight that quantifies the strength of the associated interaction.Edges can also incorporate directions to represent asymmetric interactions or signs to differentiate between positive and negative interactions.
However, this relatively simple structure cannot capture many of the possible intricacies of connectivity patterns between entities.For example, in temporal networks [2,3], nodes and/or edges change in time; and * Both authors contributed equally. in multiplex networks [4], multiple types of interactions can occur between the same pairs of nodes.To better account for the complexity, diversity, and dependencies in real-world interactions, one can represent such connectivity patterns using "multilayer networks" (see Section II A) [4][5][6][7][8].We use the term single-layer network (which is also called a "monolayer network" or a "graph") for a multilayer network with a single layer (i.e., an 'ordinary' network), and we use the term "network" to refer to both single-layer and multilayer networks.
By using a single multilayer network instead of several independent single-layer networks, one can account for the fact that connectivity patterns in different layers often "depend" on each other.(We use the term dependent in a probabilistic sense throughout the paper: an object A depends on B if and only if P(A|B) = P(A).)For example, the connectivity patterns in somebody's Facebook friendship network today may depend both on the connectivity patterns in that person's Facebook friendship network last year (temporal) and on the connectivity patterns in that person's Twitter followership network today (multiplex).Data sets that have multilayer structures are increasingly available (e.g., see Table 2 of [4]).A natural type of multilayer network consists of a sequence of dependent single-layer networks, where layers may correspond to different temporal snapshots, different types of related interactions that occur during a given time interval, and so on.Following existing terminology [8,9], we refer to an instance of a node in one "layer" as a "state node" (see Section II A).
There is a diverse set of models for multilayer networks.(We overview them in Section II C.) Many of these models take a specific type of dependency (e.g., temporal) as their starting point.In the present paper, we introduce a framework for the construction of generative models for multilayer networks that incorporate a wide variety of structures and dependencies.It is broad enough to unify many existing, more restrictive interlayer specifications, but it is also easy to customize to yield multilayer network models for many specific cases of interest.Key purposes of such generative models include (1) generating synthetic networks with empirical features of interest, (2) benchmarking methods and algorithms for detecting mesoscale structures, and (3) inferring structure in empirical multilayer networks.

A unifying framework
A key feature of multilayer networks is their flexibility, which allows one to incorporate many different types of data as part of a single structure.In this spirit, our goal is to provide a general, unifying framework that enables users to construct generative models of multilayer networks with a large variety of features of interest in empirical multilayer networks by appropriately constraining the parameter space.We accomplish this in two consecutive steps.First, we partition the set of state nodes of a multilayer network.Second, we allocate edges, given a multilayer partition.We focus on modeling dependency at the level of partitions (as was done in [10]), rather than with respect to edges.Additionally, we treat the process of generating a multilayer partition separately from that of generating edges for a given multilayer partition.This modular approach yields random structures that can capture a wide variety of interlayer-dependency structures, including temporal and/or multiplex networks, appearance and/or disappearance of entities, uniform or nonuniform dependencies between state nodes from different layers, and others.For a specified interlayer-dependency structure, one can then use any (single-layer or multilayer) network model with a planted partition (i.e., a planted-partition network model ) to generate a wide variety of network features, including weighted edges, directed edges, and spatially-embedded layers.
The flexibility of our framework to generate multilayer networks with a specified dependency structure between different layers makes it possible to (1) gain insight into whether, when, and how to build interlayer dependen-cies into methods for studying many different types of multilayer networks and (2) generate tunable "benchmark models" that allow a principled comparison for community-detection (and, more generally, "meso-setdetection") tools for multilayer networks, including for complicated situations that arise in many applications (such as networks that are both temporal and multiplex) but thus far have seldom or never been studied.In many benchmark models, one plants a partition of a network into well-separated "meso-sets" (e.g., communities), and one thereby imposes a so-called "ground truth" (should one wish to use such a notion) [11] that a properly deployed meso-set-detection method ought to be able to find.Benchmark networks with known structural properties can be important for (1) analyzing and comparing the performance of different meso-set-detection tools; (2) better understanding the inner workings of meso-setdetection tools; and (3) determining which tool(s) may be most appropriate in a given situation.
One can also use our framework to generate synthetic networks with desired empirical properties, to generate null networks, and to explore "detectability limits" of mesoscale structures.(See, for example, the study of detectability thresholds in [10] for a model that is a special case of our framework.)With some further work, it is also possible to use our framework to develop models for statistical inference.In our concluding discussion (see Section VII), we suggest directions for how to apply our framework to the task of statistical inference.Our intention in designing such a flexible framework is to ensure that the generative models that it provides remain useful as researchers consider progressively more general multilayer networks in the coming years.

Paper outline
Our paper proceeds as follows.In Section II, we give an introduction to multilayer networks, overview mesoscale structure in networks, and review related work on generative models for mesoscale structure.In Section III, we formally introduce the notation that we use throughout the paper.In Section IV, we explain how we generate a multilayer partition with a specified dependency structure between layers.We also give examples of how to constrain the parameter space of our framework to generate several different types of multilayer networks and discuss what we expect to be common use cases (including temporal structure [10,12,13] and multiplex structure [9,13]) in detail.In Section V, we describe how we generate edges that are consistent with the planted partition.In Section VI, we illustrate the use of our framework as a way to construct benchmark models for multilayer community detection.In Section VII, we summarize our main results and briefly discuss possible extensions of our work to enable statistical inference on empirical multilayer networks with our framework.
Along with this paper, we include code [14] that users Figure 1.Toy example of a multilayer network with three physical nodes and two aspects.We represent undirected intralayer edges using solid black lines, directed interlayer edges using dotted red lines with an arrowhead, and undirected interlayer edges using dashed blue arcs.The first aspect is ordered and corresponds to time.It has two time points (which we label as "1" and "2").Therefore, L1 = {1, 2}.
can modify to readily incorporate different types of "null distributions" (see Section IV A), "interlayer-dependency structures" (see Section IV B), and planted-partition network models (see Section V).The model instantiations that one needs for generating the figures in Section VI are available at https://dx.doi.org/10.5281/zenodo.3304059.

II. BACKGROUND AND RELATED WORK A. Multilayer networks
The simplest type of network is a graph G = (V, E), where V = {1, . . ., n} is a set of nodes (which correspond to entities) and E ⊆ V ×V is a set of edges.Using a graph, one can encode the presence or absence of connections (the edges) between entities (the nodes).However, in many situations, it is desirable to include more detailed information about connections between entities.A common extension is to allow each edge to have a weight, which one can use to represent the strength of a connection.We assign a weight to each edge using a weight function w : E → R.
As we mentioned in Section I, one can also generalize the notion of a graph to encode different aspects of connections between entities, such as connections at different points in time or multiple types of relationships.We adopt the framework of multilayer networks [4,[6][7][8]15] to encode such connections in a network.In a multilayer network, a node can be present in a variety of different states, where each state is also characterized by a variety of different aspects.In this setting, edges connect state nodes, each of which is the instantiation of a given node in a particular state.The set of all state nodes of a given entity corresponds to a single physical node (which represents the entity), and the set of all state nodes in a given state is in one layer of a multilayer network.In the remainder of this paper, we use the terms "physical node" and "node" interchangeably and the terms "layer" and "state" interchangeably.One can think of the aspects as features that one needs to specify to identify the state of a node.In other words, a state is a collection of exactly one element from each aspect.For convenience, we introduce a mapping that assigns an integer label to each element of an aspect.That is, we map the l a elements of the a th aspect to the elements of a set {1, . . ., l a } of integers.Aspects can be unordered (e.g., social-media platform) or ordered (e.g., time).Most empirical investigations of multilayer networks focus on a single aspect (e.g., temporal [16][17][18] or multiplex [19][20][21]).However, many real-world situations include more than one aspect (e.g., a multiplex network that changes over time).For an ordered aspect, we require that the mapping respects the order of the aspect (e.g., t i → i for time, where t 1 ≤ . . .≤ t la is a set of discrete time points).A multilayer network can include an arbitrary number of ordered aspects and an arbitrary number of unordered aspects, and one can generalize these ideas further (e.g., by introducing a time horizon) [4].
To illustrate the above ideas, consider a hypothetical social network with connections on multiple socialmedia platforms ('Facebook', 'Twitter', and 'LinkedIn ') between the same set of people at different points in time.In this example network, there are two aspects: socialmedia platform and time.(One can consider 'type of connection' (e.g., 'friendship' and 'following') as a third aspect, but we restrict our example to two aspects for simplicity.)The first aspect is ordered, and the number of its elements is equal to the number of time points or time intervals.(For simplicity, we often refer simply to "time points" in our discussions.)The second aspect is unordered and consists of three elements: 'Facebook' (which we label with the integer "1"), 'Twitter' (which we label with "2"), and 'LinkedIn' (which we label with "3").If we assume that the time resolution is daily and spans the year 2010, an example of a state is the tuple ('01-Jan-2010', 'Twitter'), which is (1, 2) in our shorthand notation.We show a multilayer representation of this example social network in Fig. 1.
Edges in a multilayer network can occur either between nodes in the same state (i.e., intralayer edges) or between nodes in different states (i.e., interlayer edges).An example of an intralayer edge in Fig. 1 is (1, (1, 1)) ↔ (2, (1, 1)), indicating that entity 1 is friends with entity 2 on Facebook at time 1.All interlayer edges in Fig. 1 are diagonal (because they connect state nodes that correspond to the same physical node).Interlayer edges be-tween layers at different times for a given social-media platform are ordinal (because they connect state nodes with successive time labels [22]) and directed.Interlayer edges between concurrent layers that correspond to different social-media platforms are categorical and undirected.(In this example, such edges occur between state nodes from all pairs of concurrent layers, although one can envision situations in which interlayer edges occur only between state nodes from a subset of such layers.)An example of an ordinal intralayer edge in Fig. 1 is (1, (1, 1)) → (1, (2, 1)), indicating a connection from entity 1 in Facebook at time 1 to entity 1 in Facebook at time 2.An example of a categorical intralayer edge in Fig. 1 is (1, (1, 1)) ↔ (1, (1, 2)), indicating a connection between node 1 in Facebook at time 1 and node 1 in Twitter at time 1.One can also imagine other types of interlayer edges in an example like the one in Fig. 1, as there may be edges between state nodes at successive times and different social-media platforms (e.g., (1,

B. Mesoscale structures in networks
Given a (single-layer or multilayer) network representation of a system, it is often useful to apply a coarsegraining technique to investigate features that lie between those at the microscale (e.g., nodes, pairwise interactions between nodes, or local properties of nodes) and those at the macroscale (e.g., total edge weight, degree distribution, or mean clustering coefficient) [1].One thereby studies mesoscale features such as community structure [23][24][25][26], core-periphery structure [27,28], role structure [29], and others.
Our framework can produce multilayer networks with any of the above mesoscale structures.Notwithstanding this flexibility, an important situation is multilayer networks with "community structure", which are the most commonly studied type of mesoscale structure [1].Community detection is part of the standard toolkit for studying single-layer networks [23,24,30], and efforts at community detection in multilayer networks have led to insights in applications such as brain and behavioral networks in neuroscience [31], financial correlation networks [16], committee and voting networks in political science [32,33], networks of interactions between bacterial species [21], disease-spreading networks [12], social networks in Indian villages [34], and much more.
Loosely speaking, a community in a network is a set of nodes that are "more densely" connected to each other than they are to nodes in the rest of the network [23,24,30,35,36].Typically, a 'good community' should be a set of nodes that are 'surprisingly well-connected' in some sense, but what one means by 'surprising' and 'well-connected' is often applicationdependent and subjective.In many cases, a precise definition of "community" depends on the method that one uses to detect communities.In particular, many popular community-detection approaches in single-layer and multilayer networks define communities as sets in a partition of a network that optimizes a quality function such as modularity [1,24,37]; stability [38][39][40]; Infomap and its variants [9,41]; likelihood functions that are derived from stochastic block models (SBMs), which are models for partitioning a network into sets of nodes with statistically homogeneous connectivity patterns [21,[42][43][44][45]; and others [4,46,47].In this paper, we refer to a partition of the set of nodes of a single-layer network as a single-layer partition and to a partition of the set of state nodes of a multilayer network as a multilayer partition.The two primary differences between a community in a multilayer partition and a community in a singlelayer partition are that (1) the former can include state nodes from different layers; and ( 2) "induced communities" (see Section III) in one layer of a multilayer network may depend on connectivity patterns in other layers.For many notions of (single-layer or multilayer) community structure -including the most prominent methodsone cannot exactly solve a community-assignment problem in polynomial time (unless P = NP) [30,[48][49][50]; and popular scalable heuristics currently have few or no theoretical guarantees on how closely an identified partition resembles an optimal partition.These issues apply more generally to the field of cluster analysis, such as in graph partitioning [51], and many of the problems that plague community detection also apply to detecting other kinds of mesoscale structures.
Throughout our paper, to make it clear which results and observations apply to community structure in particular and which apply to mesoscale structure more generally, we use the term "community" when referring to a set in a partition of the set of nodes (or the set of state nodes) of a network that corresponds to community structure and the term "meso-set" (see Section III) when referring to a set in a partition that corresponds to any type of mesoscale structure.(In particular, a community is a type of meso-set.)

C. Generative models for mesoscale structure
The ubiquity and diversity of mesoscale structures in empirical networks make it crucial to develop generative models of mesoscale structure that can produce features that one encounters in empirical networks.Broadly speaking, the goal of such generative models is to construct synthetic networks that resemble real-world networks when one appropriately constrains and/or calibrates the parameters of a model using information about a scenario or application.Generative models of mesoscale structure can serve a variety of purposes, such as (1) generating benchmark network models for comparing mesoset-detection methods and algorithms [10,[52][53][54][55][56]; (2) undertaking statistical inference on empirical networks [10,43,57]; (3) generating synthetic networks with a desired set of properties [20,58]; (4) generating null models to take into account available information about an em-pirical network [59]; and (5) investigating "detectability limits" for mesoscale structure, as one can plant partitions that, under suitable conditions, cannot subsequently be detected algorithmically, despite the fact that they exist by construction [10,60,61].
One of the main challenges in constructing a realistic generative model (even for single-layer networks) is the breadth of possible empirical features in networks.The available generative models for mesoscale structure in single-layer networks usually focus on replicating a few empirical features at a time (rather than all of them at once): heterogeneous degree distributions and community-size distributions [43,53,62], edge-weight distribution [52,57,63], spatial embeddedness [12,64], and so on.Multilayer networks inherit all of the empirical features of single-layer networks, and they also have a key additional one: dependencies between layers.These interlayer dependencies can be ordered (as in most models of temporal networks), unordered (as in multiplex networks), combinations of these, or something more complicated.However, despite this variety, existing generative models for mesoscale structure in multilayer networks allow only a restrictive set of interlayer dependencies.They assume either a temporal structure [10,12,54,[65][66][67], a simplified multiplex structure with the same planted partitions across all layers [20,44,[68][69][70], or independent groups of layers such that layers in the same group have identical planted partitions [9,21].Using an alternative approach, a very recent model is able to generate multilayer partitions that satisfy the constraint that nonempty induced meso-sets in different layers that correspond to the same meso-set in the multilayer partition are identical [71].Recent work [13] on the link between multilayer modularity maximization and maximum-likelihood estimation of multilayer SBMs allows either temporal or multiplex interlayer dependencies with induced partitions that can vary across layers, but it makes restrictive assumptions on interlayer dependencies (e.g., all layers have the same set of nodes, interdependencies are "diagonal" and "layer-coupled", and so on).Importantly, in all aforementioned generative models of mesoscale structure in multilayer networks, interlayer dependencies are either (1) not explicitly specifiable or (2) special cases of the framework that we discuss in the present paper.

III. NOTATION
We now present a comprehensive set of notation for multilayer networks.Different subsets of notation are useful for different situations.
We consider a multilayer network M = (V M , E M , V, L) with n = |V| nodes (i.e., physical nodes) and l = |L| layers (i.e., states).We use d to denote the number of aspects and L a = {1, . . ., l a } to denote the labels of the a th aspect (where a ∈ {1, . . ., d}).We use O to denote the set of ordered aspects and U to denote the set of unordered aspects.For each ordered aspect a ∈ O, we assume that the labels L a reflect the order of the aspect.That is, for all α, β ∈ L a , we require that α < β if and only if α precedes β.We say that a multilayer network is fully-ordered if U = ∅, unordered if O = ∅, and partiallyordered otherwise.The set L = L 1 × . . .× L d of states is the Cartesian product of the aspects, where a state α ∈ L is an integer-valued vector of length d and each entry specifies an element of the corresponding aspect.Note that l = |L| = d a=1 l a .We use (i, α) ∈ V M ⊆ V × L to denote the state node (i.e., "node-layer tuple" [4]) of physical node i ∈ V in state α ∈ L. We include a state node in V M if and only if the corresponding node exists in that state.The edges E M ⊆ V M × V M in a multilayer network connect state nodes to each other.We use ((i, α), (j, β)) to denote a directed edge from (i, α) to (j, β).For two state nodes, (i, α) and (j, β), that are connected to each other via a directed edge ((i, α), (j, β)) ∈ E M , we say that (i, α) is an in-neighbor of (j, β) and (j, β) is an out-neighbor of (i, α).We categorize the edges into intralayer edges E L , which have the form ((i, α), (j, α)) and link entities i and j in the same state α, and interlayer (i.e., coupling) edges E C , which have the form ((i, α), (j, β)) for α = β.We thereby decompose the edge set as We define a weighted multilayer network by introducing a weight function w : E M → R (analogous to the weight function for weighted single-layer networks), which encodes the edge weights within and between layers.For an unweighted multilayer network, w(e) = 1 for all e ∈ E M .We encode the connectivity pattern of a multilayer network using an adjacency tensor A, which is analogous to the adjacency matrix for single-layer networks, with entries Note that G M = (V M , E M ) is a graph on the state nodes of the multilayer network M .We refer to G M as the flattened network of M .The adjacency matrix of the flattened network is the "supra-adjacency matrix" [4,15,72,73] of the multilayer network.One obtains a supra-adjacency matrix by flattening [74] an adjacency tensor (see Eq. ( 1)) of a multilayer network.The multilayer network and the corresponding flattened network encode the same information [4,75], provided one keeps track of the correspondence between the nodes in the flattened network and the physical nodes and layers of the multilayer network.In a similar vein, aspects provide a convenient way to keep track of the correspondence between state nodes and, for example, temporal and multiplex relationships.
We denote a multilayer partition with n set sets by S = {S 1 , . . ., S nset }, where nset s=1 S s = V M and S s ∩S r = ∅ for s = r.We represent a partition S using a partition tensor S with entries S i,α , where S i,α = s if and only if the state node (i, α) is in the set S s .We refer to a partition of a temporal network (a common type of multilayer network) as a temporal partition and a partition of a multiplex network (another common type of multilayer network) as a multiplex partition.A multilayer partition induces a partition S| α = {S 1 | α , . . ., S nset | α } on each layer, where S s | α = {i ∈ V : (i, α) ∈ S s }.We refer to a set S s of a partition S as a meso-set, to S| α as the induced partition of S on layer α, and to S s | α as the induced meso-set of S s on layer α.A community is a set S s in a partition that corresponds to community structure.We call s ∈ {1, . . ., n set } the label of meso-set S s .One way to examine overlapping meso-sets is by identifying multiple state nodes from a single layer with the same physical node.

IV. GENERATING MULTILAYER PARTITIONS
The systematic analysis of dependencies between layers is a key motivation for analyzing a single multilayer network, instead of examining several single-layer networks independently.The goal of our partition-generation process is to model interlayer dependency in a way that can incorporate diverse types of dependencies.We now motivate our partition-modeling approach, and we describe it in detail in Sections IV A-D.
The complexity of dependencies between layers can make it difficult to explicitly specify a joint probability distribution for meso-set assignments, especially for unordered or partially-ordered multilayer networks.To address this issue, we require only the specification of conditional probabilities for a state node's meso-set assignment, given the assignments of all other state nodes.The idea of conditional models is old and follows naturally from Markov-chain theory [76,77].Specifying conditional models (which capture different dependency features separately) rather than joint models (which try to capture many dependency features at once) is convenient for numerous situations.For example, conditionally specified distributions have been applied to areas such as spatial data modeling [78], imputation of missing data [77], secure disclosure of information [79], dependence networks for combining databases from different sources [80], and Gibbs sampling [81,82].
A problem with conditional models is that the different conditional probability distributions are not necessarily compatible, in the sense that there may not exist a joint distribution that realizes all conditional distributions [80,[82][83][84][85].Although several methods have been developed in recent years for checking the compatibility of discrete conditional distributions, these either make restrictive assumptions on the conditional distributions or have scalability constraints that significantly limit practical use [77,86].Nevertheless, the employment of conditional models (even if potentially incompatible) is common [87], and they have been useful in many applications, provided that one cautiously handles any potential incompatibility [77].For our use case, potential incompatibility arises for unordered or partially-ordered mul-tilayer networks.An issue that can result from it is the non-uniqueness of a joint distribution.We carefully design our partition-generation process such that it is welldefined irrespective of whether conditional distributions are compatible.In particular, we show that convergence is guaranteed and we address non-uniqueness by appropriately sampling initial conditions (see Appendix B).We also suggest empirical checks to ensure that a generated partition reflects planted interlayer dependencies.
In our framework, we define conditional probabilities in two parts: we separately specify (1) independent layerspecific random components and (2) interlayer dependencies.For a choice of interlayer dependencies, there are several features that one may want to allow in a multilayer partition.These include variation in the numbers and sizes of meso-sets across layers (e.g., meso-sets can gain state nodes, lose state nodes, appear, and disappear) and the possibility for these meso-set variations to incorporate features of the application at hand.(For example, in a temporal network, one may not want a meso-set to reappear after it disappears.)We ensure that such features are possible for any choice of interlayer dependency via independent layer-specific null distributions that specify the set of possible meso-set assignments for each layer and determine the expected size and expected number of meso-sets in the absence of interlayer dependencies.
Interlayer dependencies should reflect the type of multilayer network that one is investigating.For example, in a temporal network, dependencies tend to be stronger between adjacent layers and weaker for pairs of layers that are farther apart.It is common to assume that interlayer dependencies are uniform across state nodes for a given pair of layers (i.e., interlayer dependencies are "layer-coupled" [4]) and occur only between state nodes that correspond to the same physical node (i.e., interlayer dependencies are "diagonal" [4]) [12,16,21,33,61,72,73,88].However, these assumptions are too restrictive in many cases (e.g., situations in which dependencies are state-node-dependent or in which dependencies exist between state nodes that do not correspond to the same physical node) [9,46,70,[89][90][91].To ensure that one can relax these assumptions, we allow dependencies to be specified at the level of state nodes (or at the level of layers, when a user assumes that dependencies are layercoupled).We encode these interlayer dependencies in a user-specified interlayer-dependency tensor that determines the extent to which a state node's meso-set assignment in one layer "depends directly" on the assignment of state nodes in other layers (see Sections IV B and C).Our independence assumption on the null distributions allows us to encode all of the interlayer dependencies in a single object (an interlayer-dependency tensor).The entries of an interlayer-dependency tensor specify an "interlayerdependency network" and correspond to the causal links for the flow of information (in the information-theoretic sense) between different layers of a multilayer network.Therefore, they should reflect any constraints that one wishes to impose on the direct flow of information between layers.Longer paths in an interlayer dependency network yield indirect dependencies between structures in different layers.
After specifying the interlayer-dependency structure, we define the conditional meso-set assignment probabilities such that we either sample a state node's mesoset assignment from the corresponding null distribution or obtain it by copying the assignment of another state node (based on the interlayer-dependency tensor).Using these conditional probabilities, we define an iterative update process on the meso-set assignments of state nodes to generate multilayer partitions with dependencies between induced partitions in different layers.When updating the meso-set assignments of state nodes, we respect the order of an aspect (e.g., temporal ordering).For a fully-ordered multilayer network, our update process reduces to sequentially sampling an induced partition for each layer based on the induced partitions of previous layers.For an unordered multilayer network, our update process defines a Markov chain on the space of multilayer partitions.We sample partitions from a stationary distribution of this Markov chain.This sampling strategy is known as (pseudo-)Gibbs sampling [80][81][82]87].(We use the word "pseudo" because the conditional probabilities that we use to define the Markov chain are not necessarily compatible.)For a partially-ordered multilayer network, our update process combines these two sampling strategies.
In Section IV A, we describe possible choices for the independent layer-specific null distributions.In Section IV B, we explain our framework for generating a multilayer partition with general interlayer dependencies.In Section IV C, we focus on the specific situation in which interlayer dependencies are layer-coupled and diagonal, and we also assume that a physical node is present in all layers (i.e., the network is "fully interconnected" [4]).In Section IV D, we illustrate the properties of example temporal and multiplex partitions that are generated by models that we construct from our framework.Additionally, we take advantage of the tractability of the special case of temporal networks to highlight some of its properties analytically.

A. Null distribution
We denote the null distribution of layer α by P α 0 and the set of all null distributions by P 0 = {P α 0 , α ∈ L}.A simple choice for the null distributions is a categorical distribution, where for each layer α and each meso-set label s, we fix the probability p α s that an arbitrary state node in layer α is assigned to a meso-set s in the absence of interlayer dependencies.That is, where n set is the total number of meso-sets in the multilayer partition and nset s=1 p α s = 1 for all α ∈ L. The set {1, . . ., n set } is the set of meso-set labels, and the support of a null distribution is the set of labels that have nonzero probability.In the absence of interlayer dependencies, a categorical null distribution corresponds to specifying independent multinomial distributions for the sizes of induced meso-sets on each layer and fixing the expected size np α s of each induced meso-set.Therefore, by choosing the probabilities p α s , one has some control over the expected number and sizes of meso-sets in a sampled multilayer partition.A natural choice for p α is to sample it from a Dirichlet distribution, which is the conjugate prior for the categorical distribution [92,93].One can think of the Dirichlet distribution, which is the multivariate form of the beta distribution, as a probability distribution over the space of all possible categorical distributions with a given number of categories.Any other (probabilistic or deterministic) choice for p α is also possible.We give further detail about the Dirichlet distribution in Appendix A, where we discuss how one can vary its parameters to control the expected number and sizes of meso-sets in the absence of interlayer dependencies.This can allow a user to generate, for example, a null distribution with equally-sized meso-sets or a null distribution with a few large meso-sets and many small mesosets.Furthermore, irrespective of the particular choice of categorical distribution, it may be desirable to have meso-sets that have a nonzero probability of obtaining state nodes from the null distribution in some, but not all, layers.In Appendix A, we give examples of how one can sample the support of the null distributions before the probability vectors to incorporate this property when modeling the birth and/or death of meso-sets in temporal networks and the presence and/or absence of meso-sets in multiplex networks.
In general, the choice of the null distributions can have a large effect on the set of sampled multilayer partitions and ought to be guided by one's use case.For our numerical examples in Section VI, we fix a value of n set ∈ {1, . . ., nl}, where n is the number of physical nodes in each layer and l the number of layers (see Section III); and we use a symmetric Dirichlet distribution with parameters q = n set and θ = 1 (see Appendix A) to sample probability vectors p α of length n set .This produces multilayer partitions in which the expected meso-set labels are the same across layers (and given by {1, . . ., n set }) in the absence of interlayer dependencies and for which the expected induced meso-set sizes (which are np α s in the absence of interlayer dependencies) differ across layers.

B. General interlayer dependencies
We denote the user-specified interlayer-dependency tensor by P , where P j,β i,α is the probability that state node (j, β) copies its meso-set assignment from state node (i, α), for any two state nodes (i, α), (j, β) ∈ V M , and where P is fixed throughout the copying process.The probability that state node (j, β) copies its meso-set assignment from an arbitrary state node when state node (j, β)'s meso-set assignment is updated is where we require that pj,β ≤ 1 for all state nodes (j, β) ∈ V M .We also require that all intralayer probabilities are 0; that is, P j,α i,α = 0 for all i, j ∈ V and α ∈ L. We say that a state node (j, β) depends directly on a state node (i, α) if and only if P j,β i,α is nonzero.By extension, we say that a layer β depends directly on a layer α if there exists at least one state node in layer β that depends directly on a state node in layer α.The interlayer-dependency tensor induces the interlayerdependency network, whose edges are all interlayer, directed, and pointing in the direction of information flow between layers.The in-neighbors (see Section III) of a state node (j, β) in this network consists of the state nodes from which (j, β) can copy a meso-set assignment with nonzero probability.The support of the null distribution P β 0 corresponds to the only possible meso-set assignments for a state node (j, β) whenever the state node does not copy its assignment from one of its in-neighbors in the interlayer-dependency network.
For a given null distribution P 0 and interlayerdependency tensor P , a multilayer partition that results from our sampling process depends on four choices: (1) the way in which we update a state node assignment at a given step; (2) the order in which we update state node assignments; (3) the initial multilayer partition; and (4) the criterion for convergence of the iterative update process.We discuss points (1), (2), and (3) in the remainder of this subsection.We describe the sampling process in more detail and discuss convergence in Appendix B.

Update equation
A single meso-set-assignment update depends only on the choice of state node to update and on the current multilayer partition.Let τ be an arbitrary update step of the copying process.Suppose that we are updating the meso-set assignment of state node (j, β) at step τ and that the current multilayer partition is S(τ ) (with partition tensor S(τ )).We update the meso-set assignment of state node (j, β) either by copying the meso-set assignment in S(τ ) from one of its in-neighbors in the interlayer-dependency network or by obtaining a new, random meso-set assignment from the null distribution P β 0 for layer β.In particular, with probability pj,β , a state node (j, β) copies its meso-set assignment from one of its in-neighbors in the interlayer-dependency network; and with probability 1 − pj,β , it obtains its meso-set assignment from the null distribution P β 0 .This yields the following update equation at step τ of our copying process: The update equation ( 4) is at the heart of our framework.It is clear from Eq. ( 4) that the set of null distributions P 0 is responsible for the specification of meso-set assignments in the absence of interlayer dependencies (i.e., if P j,β i,α = 0 for all (i, α), (j, β)).In general, pj,β determines the relative importance of interlayer dependencies and the null distribution on the meso-set assignment of state node (j, β).Specifically, when pj,β = 0, the mesoset assignment of (j, β) depends only on the null distribution; and when pj,β = 1, the meso-set assignment of (j, β) depends only on the meso-set assignments of its in-neighbors in the interlayer-dependency network.

Update order
The order in which we update meso-set assignments of state nodes via Eq.( 4) can influence a generated multilayer partition.As we mentioned in Section II A, an aspect of a multilayer network can be either ordered or unordered, and the update order is of particular importance when generating a multilayer partition with at least one ordered aspect.In particular, the structure of the interlayer-dependency tensor should reflect the causality that arises from the order of the aspect's elements.For an ordered aspect, structure in a given layer should depend directly only on structure in previous layers.Formally, for each ordered aspect a ∈ O of a multilayer network, we require that where α a denotes the element of state α corresponding to aspect a and where (as we stated in Section III) we require that the labels of an aspect's elements reflect the ordering of those elements [94].For example, if we think of the interlayer edges in Fig. 1 as interlayer edges in the interlayer-dependency network, then Eq. ( 5) states that all edges with nonzero edge weights must respect the arrow of time.That is, it is impossible to have edges of the form ((j, (2, α 2 )), (i, (1, α 2 ))), with nodes i, j ∈ {1, 2, 3} and aspect α 2 ∈ {1, 2, 3}.The update order for the state nodes also needs to reflect the causality from the ordered aspects.In particular, if a ∈ O is an ordered aspect and we update a state node in a layer β, then the final meso-set assignment for state nodes in layers α with α a < β a should already be fixed.For example, in Fig. 1, an edge direction should respect the arrow of time; and it is also necessary that the assignment of all state nodes in the first time point are fixed before one updates the assignments of state nodes in the second time point.To achieve the latter, we divide the layers into sets of "order-equivalent" layers, such that two layers α and β are order-equivalent if and only if α a = β a for all a ∈ O.We also say that layer α precedes layer β if and only if α a < β a for all a ∈ O. (For example, the three layers in Fig. 1 that correspond to the first time point are order-equivalent and precede the three layers that correspond to the second time point.)Based on this equivalence relation, we obtain an ordered set of equivalence classes by inheriting the order from the ordered aspects.If we sort the classes of order-equivalent layers based on "lexicographic" order [95] of the ordered aspects, then (as a consequence of Eq. ( 5)) the meso-set assignment of state nodes in a given layer depends only on the assignment of state nodes in order-equivalent or preceding layers.This ensures that our partition-generation process reflects notions of causality that arise in multilayer networks with at least one ordered aspect.

Sampling process
The general idea of our sampling process is to simultaneously sample the meso-set assignment of state nodes within each class of order-equivalent layers and to sequentially sample the meso-set assignment of state nodes in non-order-equivalent layers, conditional on the fixed assignment of state nodes in preceding layers.
More specifically, our sampling algorithm proceeds as follows.First, we sample an initial multilayer partition from the null distribution (i.e., S i,α (0) ∼ P α 0 ).Second, we sample a partition for the first class of orderequivalent layers using (pseudo-)Gibbs sampling.In particular, we iteratively sample a layer uniformly at random from the first class of order-equivalent layers and update the meso-set assignment of all state nodes in that layer [96] based on Eq. ( 4).This defines a Markov chain on a subspace of multilayer partitions.We repeat the update process for sufficiently many iterations such that we approximately sample the meso-set assignments of the first class of order-equivalent layers from a stationary distribution of this Markov chain.Third, we sample state node assignments for subsequent classes of orderequivalent layers in the same way, based on a fixed statenode assignment from preceding layers.In particular, at each update step τ in Eq. ( 4), a state node can either copy a fixed meso-set assignment from an in-neighbor in a preceding layer, copy a current meso-set assignment from an in-neighbor in an order-equivalent layer, or obtain a meso-set assignment from the null distribution.
For a fully-ordered multilayer network (i.e., a multilayer network with U = ∅), each class of order-equivalent layers consists only of a single layer.Consequently, one needs only a single update for each state node for the sampling process to converge.(Subsequent updates would constitute independent samples from the same distribution.)For an unordered multilayer network, our sampling algorithm reduces to (pseudo-)Gibbs sampling, because all layers are order-equivalent and there is thus only one class of order-equivalent layers.For a partially-ordered multilayer network, there are multiple non-singleton classes of order-equivalent layers.We use (pseudo-)Gibbs sampling on each class, conditional on the meso-set assignment of state nodes in preceding layers.
In Appendix B, we explain our sampling process in more detail.We describe our (pseudo-)Gibbs sampling procedure for each class of order-equivalent layers, and we show pseudo-code that one can use to sample partitions in unordered, partially-ordered, and fully-ordered multilayer networks in Algorithm 1.We also discuss the convergence properties of our sampling procedure for unordered and partially-ordered multilayer networks, including the effect of the potential incompatibility of the distributions that are defined by Eq. ( 4).

C. Layer-coupled and diagonal interlayer dependencies
As we mentioned at the beginning of Section IV, a particularly useful restriction of the interlayer-dependency tensor that still allows us to represent many situations of interest is to assume that it is layer-coupled and diagonal.For simplicity, we also assume that the multilayer network is fully interconnected.Under these assumptions, we can express the interlayer-dependency tensor P using a layer-coupled interlayer-dependency tensor P with elements P j,β i,α = δ(i, j) P β α .The probability that state node (j, β) copies its meso-set assignment from an arbitrary state node when (j, β)'s meso-set assignment is updated is given by As before, we require that pβ ≤ 1 and P α α = 0 for all α ∈ L. The update equation (Eq.( 4)) then simplifies to which depends on the interlayer-dependency tensor P (which is now independent of state nodes) and the null distributions P 0 .Each term P β α quantifies the extent to which an induced partition in layer β depends directly on an induced partition in layer α.By considering different P , one can generate multilayer networks that correspond to several important scenarios, including temporal networks, multiplex networks, and multilayer networks with Layer-coupled interlayer dependency tensors (which, in this case, are matrices) for different types of multilayer networks with a single aspect.(a) In a typical temporal network, an induced partition in a layer depends directly only on the induced partition in the previous layer.Therefore, the only nonzero elements of the layer-coupled interlayer dependency tensor occur in the first superdiagonal.(b) In a typical multiplex network, an induced partition in a layer depends directly on the induced partitions in all other layers.We show a layer-coupled example.The copying probability in Eq. ( 6) is p β for the temporal layer-coupled interlayer-dependency matrix in panel (a) and p for the multiplex layer-coupled interlayer-dependency matrix in panel (b).We suppress the subscript β in panel (b), as the copying probability is the same for all layers.more than one aspect (e.g., with combinations of temporal and multiplex features).That is, the above restriction reduces the dimension of the interlayer-dependency tensor significantly, while still allowing one to analyze several very important situations.
In Fig. 2, we show layer-coupled interlayer-dependency tensors for two types of single-aspect multilayer networks: a temporal network and a multiplex network.As we mentioned in Section IV, it is useful to think of interlayer dependencies as causal links for the flow of information between layers.In a temporal network, it is typical to assume that an induced partition in a layer depends directly only on induced partitions in the previous layer.There are thus l − 1 copying probabilities (one for each pair of consecutive layers), which we are free to choose.Common examples include choosing the same probability for each pair of consecutive layers [10,12] or making some of the probabilities significantly smaller than the others to introduce change points [44,97].In a multiplex network, an induced partition in any layer can in principle depend directly on induced partitions in all other layers.This yields l(l − 1) copying probabilities to choose.In Fig. 2b, we illustrate the simplest case, in which each layer depends equally on every other layer.The layercoupled interlayer-dependency tensors in Fig. 2 are matrices, so we sometimes refer to them and other examples of layer-coupled interlayer-dependency tensors with a single aspect as layer-coupled interlayer-dependency matri- Figure 3. Block-matrix representation of a layer-coupled interlayer-dependency tensor for a multilayer network that is both temporal and multiplex.This example has more than one aspect, and it combines the features of the examples in Fig. 2. It has two classes of order-equivalent layers (where each corresponds to one time point), as indicated by the presence of two diagonal blocks.An induced partition in a layer depends directly on the induced partitions of its order-equivalent layers and on the induced partition of the corresponding layer in the preceding class of order-equivalent layers.If we think of the interlayer edges in Fig. 1 as edges in an interlayer-dependency network, this matrix with diagonal blocks of size 3 × 3 would be the corresponding layer-coupled interlayer-dependency tensor.

ces.
We can also generate multilayer networks with more than one aspect and can thereby combine temporal and multiplex features.In Fig. 3, we illustrate how to construct a layer-coupled interlayer-dependency tensor to generate such a multilayer network on a simple example with two aspects, one of which is multiplex and the other of which is temporal.

D. Temporal and multiplex partitions
In Fig. 4 and Fig. 5, we show example multilayer partitions that we obtain with the interlayer-dependency tensors of Fig. 2. The examples in Fig. 4 are temporal, and the examples in Fig. 5 are multiplex.For simplicity, we assume in Fig. 2a that dependencies between contiguous layers are uniform (i.e., p β = p ∈ [0, 1] for all β ∈ {2, . . ., l}).To illustrate the effect of the interlayer dependencies, we also show a heat map of the normalized mutual information (NMI) [98] between induced partitions in different layers.NMI is a measure of similarity between partitions, so we expect the values of NMI to  (with the exception of p = 0, which we include for completeness).We choose a node ordering for each visualization that (whenever possible) emphasizes "persistent" mesoscale structure [16].We show only the first 15 layers of each multilayer partition, because (as one can see in the NMI heat maps) similarities between induced partitions for p < 1 decay steeply with the number of layers when dependencies exist only between contiguous layers.
reflect the planted dependencies.As expected, for the temporal examples, partitions are most similar for adjacent layers.Similarities decay with the distance between layers and increase with the value of p.For the multiplex examples, we obtain approximately uniform similarities between pairs of layers (for all pairs of layers), where the similarities increase with the value of p.We use the examples of interlayer dependency of Figs. 4 and 5, as well as non-uniform and multi-aspect examples, in our The parameter values match those in the numerical examples in Section VI (with the exception of p = 0, which we include for completeness).We choose a node ordering that (whenever possible) emphasizes "persistent" mesoscale structure [16].
numerical experiments of Section VI.
To provide a detailed illustration of the steps that result in a multilayer partition, we focus on the temporal examples in Fig. 4. For the important special case of temporal interlayer dependencies, the generative model for multilayer partitions simplifies significantly.In particular, there is a single ordered aspect, so the layer index α ∈ N is a scalar and the order of the layers corresponds to temporal ordering.Furthermore, as we mentioned in Section IV B, for a fully-ordered multilayer network, we require that the order of the meso-set-assignment update process in Eq. ( 4) respects the order of the layers.The update order of state nodes (1, α) . . .(n, α) in any given layer α can be arbitrary (so we can update them simultaneously), but each update is conditional on the meso-set assignment of state nodes in layer α − 1.
The update process that we described in Section IV B reduces to three steps: (1) initialize (in an arbitrary order) the meso-set assignments in layer α = 1; (2) take the meso-set assignment of (i, α + 1) to be that of (i, α) with probability p, and sample the meso-set assignment of (i, α + 1) from P α+1 0 with complementary probability 1 − p; and (3) repeat steps ( 1) and ( 2) until α + 1 = l.As we mentioned in Section IV B, convergence is not an issue for this case (or, more generally, for any fully-ordered multilayer network), as we need only one iteration through the layers.This three-step generative model for temporal partitions was also suggested by Ghasemian et al [50].They used it to derive a detectability threshold for the case in which the null distributions are uniform across meso-sets (i.e., θ → ∞ in Section IV A) and intralayer edges are generated independently using the standard SBM.In other words, one replaces the degree-corrected SBM in Section V with the non-degree-corrected SBM of [42].In Appendix C, we highlight properties of a sampled temporal partition that illustrate how the interplay between p and the null distributions affects the evolution of meso-sets across layers (e.g., growth, birth, and death of induced meso-sets).The properties that we highlight hold for any choice of null distribution.In the same appendix, we subsequently illustrate that the particular choice of null distributions can greatly influence resulting partitions.For example, a non-empty overlap between the supports of null distributions that correspond to contiguous layers is a necessary condition for meso-sets to gain new state nodes over time.The properties and observations in Appendix C are independent of one's choice of planted-partition network model.

V. GENERATING NETWORK EDGES
There are diverse different types of multilayer networks [4,8].One common type of empirical multilayer network to study is one with only intralayer edges.There are also many empirical multilayer networks with both intralayer and interlayer edges (e.g., multimodal transportation networks [99]), as well as ones with only interlayer edges (e.g., temporal networks with edge delays, such as departures and arrivals of flights between airports [5,100]).One can use our framework to generate all three types of examples, provided the underlying edge generation model is a planted-partition network model.
Having generated a multilayer partition S with dependencies between induced partitions in different layers, the simplest way to generate edges is to use any singlelayer planted-partition network model (e.g., SBMs [42,43,49,57,63,101], models for spatially-embedded networks [12,64], and so on) and to generate edges for each layer independently.This yields a multilayer network with only intralayer edges, such that any dependencies between different layers result only from dependencies between the partitions that are induced on the different layers.For our numerical experiments in Section VI, we use a single-layer network model that is a slight variant (avoiding the creation of self-edges and multiedges) of the degree-corrected SBM (DCSBM) benchmark from [43], where "DCSBM benchmark" designates the specific type of DCSBM that was used in the numerical experiments of [43].We include pseudocode for our implementation of the DCSBM benchmark in Algorithm 3 of Appendix D.
One can also include dependencies between layers other than those that are induced by planted mesoscale structures.For example, one can introduce dependencies between the parameters of a single-layer plantedpartition network model by ( 1) sampling them from a common probability distribution (e.g., interlayer degree correlations [4,102] in a DCSBM) or by ( 2) introducing interlayer edge correlations, given a single-layer partition on each layer [103].For temporal networks, one can also incorporate "burstiness" [2,3,104] in the interevent-time distribution of edges.In such a scenario, the probability for an edge to exist in a given layer depends not only on the induced partition on that layer but also on the existence of the edge in previous layers.For example, one can use a Hawkes process to specify the time points at which edges are active [105,106].
In Appendix D, we describe a generalization of the DCSBM in [43] to multilayer networks that one can use to generate intralayer edges and/or interlayer edges.Our generalization constitutes a framework within which we formulate the parameters of a single-layer DCSBM in a multilayer setting.One can use our multilayer DCSBM (M-DCSBM) framework to incorporate some of the features that we described in the previous paragraph (e.g., degree correlations) in a multilayer network with intralayer and/or interlayer edges.

VI. NUMERICAL EXAMPLES
In this section, we use our framework to construct benchmark models for multilayer community-detection methods and algorithms.We use the examples of interlayer-dependency tensors from Figs. 2 and 3 to generate benchmark models.We also consider a variant of Fig. 2b in which we split the layers into groups, use uniform dependencies between layers in the same groups, and treat layers in different groups as independent of each other.These examples cover commonly studied temporal and multiplex dependencies, and they illustrate how one can generate benchmark multilayer networks with more than one aspect.We focus on a couple of popular quality functions for multilayer community detection in our numerical examples, rather than investigating any given method or algorithm in detail.
We compare the behavior of several variants of [107], which are similar to the the locally greedy Louvain computational heuristic [108], to optimize a multilayer modularity objective function [16,33] using the standard Newman-Girvan null model (which is a variant of a "configuration model" [109]).Modularity is an objective function that is often used to partition sets of nodes into communities that have a larger total internal edge weight than the expected total internal edge weight in the same sets in a "null network" [16], which is generated from some null model.Modularity maximization consists of finding a partition that maximizes this difference.For our numerical experiments, we use the generalization of modularity to multilayer networks of [33].For multilayer modularity, the strength of the interactions between different layers are governed by an interlayer-coupling tensor that controls the incentive for state nodes in different layers to be assigned to the same community.We use multilayer modularity with uniform interlayer coupling, so the strength of the interactions between different layers of the network depends on a layer-independent and node-independent interlayer coupling weight ω ≥ 0. We use diagonal and categorical (i.e., between all pairs of layers) interlayer coupling with weight ω for the multiplex examples in Section VI A. We use diagonal and ordinal (i.e., between contiguous layers) interlayer coupling with weight ω for the temporal examples in Section VI B. In Appendix E, we describe the Louvain algorithm and the variants of it that we use in this section.
We compare the results of multilayer modularity maximization with those of multilayer InfoMap [110] [9], which uses an objective function called the "map equation" (which is not an equation), based on a discrete-time random walk and ideas from coding theory, to coarsegrain sets of nodes into communities [111].In multilayer InfoMap, one uses a probability r ∈ [0, 1] called the "relaxation rate" to control the relative frequency with which a random walker remains in the same layer or moves to other layers.(A random walker cannot change layers when r = 0.) The relaxation rate thus controls the interactions between different layers of a multilayer network.We allow the random walker to move to all other layers when r = 0 for the multiplex examples in Section VI A, and we allow the random walker to move only to adjacent layers for the temporal examples in Section VI B. In contrast to multilayer modularity (where ω = 0 yields single-layer modularity), single-layer InfoMap is not equivalent to choosing r = 0 in multilayer InfoMap because placing state nodes that correspond to the same physical node in the same community can contribute positively to the quality function even when r = 0 [9].Consequently, we compute single-layer InfoMap separately and reference it by the data point "s" on the horizontal axis.
In all experiments in this section, we generate a multilayer partition using our copying process in Section IV (see Algorithm 1 in Appendix B) and a multilayer network for a fixed planted partition using a slight variant of the DCSBM benchmark network model in [43] that avoids generating self-edges and multi-edges (see Algorithm 3 in Appendix D).This produces multilayer networks that have only intralayer edges and in which the connectivity patterns in different layers depend on each other.Following [43], we parametrize the DCSBM benchmark in terms of its distribution of expected degrees and a community-mixing parameter µ ∈ [0, 1] that controls the strength of the community structure in the sampled network edges.For µ = 0, all edges lie within communities; for µ = 1, edges are distributed independently of the communities, where the probability of observing an edge between two state nodes in the same layer depends only on the expected degrees of those two state nodes.We use a truncated power law for the distribution of expected degrees.
The DCSBM benchmark imposes community structure as an expected feature of an ensemble of networks that it generates.The definition of the mixing parameter µ of the DCSBM benchmark ensures that the planted partition remains community-like -as intra-community edges are more likely to be observed and inter-community edges are less likely to be observed than in a network that is generated from a single-block DCSBM with the same expected degrees -for any value of µ < 1.Consequently, given sufficiently many samples from the same DCSBM benchmark (i.e., all samples have the same planted partition and same expected degrees), one should be able to identify the planted community structure for any value of µ < 1 (where the necessary number of samples goes to infinity as µ → 1).This feature makes the DCSBM benchmark an interesting test for the ability of multilayer community-detection methods to aggregate information from multiple layers.
We use NMI [98] to compare the performance of different community-detection algorithms.For each of our partitions, we compute the mean of the NMI between the partition induced on each layer by the output partition and that induced by the planted partition.That is, The quantity NMI is invariant under permutations of the meso-set labels within a layer.Consequently, NMI is well-suited to comparing multilayer communitydetection methods with single-layer community-detection methods.In particular, it allows us to test whether multilayer community-detection methods can exploit dependencies between layers of a multilayer network when p 0 (see Eq. ( 6)).In Appendix F, we show numerical experiments in which we compute NMI between multilayer partitions.We denote the NMI between two multilayer partitions by mNMI.In all of our numerical experiments in Sections VI A and B, we sample the benchmark networks in the following way.For each value of p, we generate 10 sample partitions.For each sample partition and value of µ, we generate 10 sample multilayer networks.This yields 100 benchmark instantiations for each pair (p, µ).We run each community-detection algorithm 10 times on each instantiation.In Figs.6-9, we show NMI between planted and recovered multilayer partitions averaged over sample partitions, sample networks, and algorithmic runs for each value of µ.The results for different planted partitions are generally similar.The only exception is In-foMap on temporal networks with p = 0.99 and p = 1, where we observe large differences between results for different partitions for certain values of µ.

A. Multiplex examples
In this section, we consider two stylized examples of multiplex networks.Multiplex networks arise in a variety of different applications, including international relations and trade [112][113][114], social networks [115], and ecological networks [116].In our multiplex examples, we consider simple dependency structures in which we expect multilayer community-detection methods to outperform single-layer methods by exploiting interlayer dependencies.
In Fig. 6, we consider multiplex networks with uniform dependencies between community structure in different layers.In Fig. 7, we consider multiplex networks with nonuniform dependencies between community structure in different layers.In both figures, we parametrize the amount of interlayer dependency in a network by the probability p (see Eq. ( 6)) that a state node copies its community assignment from a neighbor in the interlayer-dependency network.All multilayer networks have n = 1000 physical nodes and l = 15 layers.Each node is present in every layer, so there are a total of 15000 state nodes.In Fig. 2b, we show the layer-coupled interlayer-dependency matrix that we use to generate the uniform multiplex networks.For the nonuniform multiplex networks, we split the layers into 3 groups of 5 layers each.We use uniform dependencies between layers in the same group, and layers in different groups are independent of each other.The resulting layer-coupled interlayer-dependency matrix is block diagonal with diagonal blocks as in Fig. 2b and 0 entries on the off-diagonal blocks.We use a Dirichlet null distribution with n set = 10, θ = 1, and q = 1 (see Appendix A) to specify expected community sizes and the M-DCSBM benchmark (see Appendix D) with η k = 2, k min = 3, and k max = 150 to generate intralayer edges.We perform 200 iterations of our update process (see Appendix B).
Comparing our results for GenLouvain (Figs. 6a-e and 7a-c) and GenLouvainRand with reiteration and post-processing (Figs.6f-j and 7d-f), we see that the choice of optimization heuristic for a given quality function can significantly affect the quality of resulting identified partitions.In particular, for GenLouvain, we observe two distinct regimes and what appears to be a sharp transition between them [117].For ω < 1, we obtain a partition that is of similar quality to what we obtain by maximizing single-layer modularity; however, for ω > 1, we obtain a partition with identical induced partitions for each layer.We call the latter an aggregate partition.
Although the aggregate partition can be more similar to the planted partition than the single-layer partition when p is sufficiently large (e.g., this occurs in Figs.6d and e), we do not observe an interval of ω values between the two regimes in which GenLouvain recovers the planted partition better than both the single-layer and aggregate partitions.By contrast, GenLouvainRand with reiteration and post-processing identifies partitions that match the planted partition more closely than either the singlelayer or aggregate partitions in most cases.The exceptions are uniform multiplex networks with p = 0.5 (see Fig. 6f), where it is unable to exploit the weak dependencies to outperform the single-layer case; and p = 1 (Fig. 6j), where the aggregate partition is always best.Most of the improvement in the results for GenLou-vainRand over those for GenLouvain comes from reiteration.The additional randomization helps smooth out the transition at ω ≈ 1. Post-processing only yields a minor improvement in the value of NMI .However, its effect is more pronounced if we compute the mNMI between multilayer partitions (see Appendix F).
Multilayer InfoMap exhibits some problematic behavior in these benchmark experiments.Our results for multilayer InfoMap are noticeably worse than those that we obtain with single-layer InfoMap (corresponding to the data point "s" on the horizontal axis) for networks with comparatively weak community structure (specifically, µ ≥ 0.4), unless the planted partition for different layers is very similar (i.e., unless p ≥ 0.99).Furthermore, the methods that are based on multilayer modularity outperform multilayer InfoMap in our experiments for networks with weak community structure and similar layers (i.e., when both µ and p are close to 1).In-foMap does not identify meaningful community structure in networks with µ > 0.7 in any of our numerical examples, whereas GenLouvain and GenLouvainRand identify meaningful structure even for networks with very weak community structure (e.g., with µ = 0.9) for sufficiently large values of ω.
Our results for nonuniform multiplex networks in Fig. 7 are similar to those that we showed for the uniform multiplex networks in Fig. 6.In particular, we see that GenLouvainRand can exploit dependencies between community structure in different layers (although the improvement over single-layer modularity is less pronounced than in the uniform examples), whereas Gen-Louvain cannot.The main difference is that for p = 1, .Multiplex networks with uniform interlayer dependencies.Effect of interlayer coupling strength ω and relaxation rate r on the ability of different community-detection algorithms to recover planted partitions as a function of the communitymixing parameter µ in benchmark networks with uniform multiplex dependencies (see Fig. 2b).Each multilayer network has 1000 nodes and 15 layers, and each node is present in all layers.All NMI values are means over 10 runs of the algorithms and 100 instantiations of the benchmarks.(See the introduction of Section VI.)Each curve corresponds to the mean NMI values that we obtain for a given value of µ, and the shaded area around a curve corresponds to the minimum and maximum NMI values that we obtain with the 10 sample partitions for a given value of p.
the aggregate partitions from GenLouvain and Gen-LouvainRand do not recover the planted partition for sufficiently large values of ω, because the induced partitions are not identical across layers in the planted partition.In principle, multilayer InfoMap should have an advantage over methods based on maximizing multilayer modularity in this case study, as the former's quality function is designed to detect breaks in community structure, whereas multilayer modularity forces some persistence of community labels between any pair of layers.In practice, however, multilayer InfoMap correctly identifies the planted community structure only when it is particularly strong (as we show in Appendix F, it correctly identifies the different groups of layers for networks with µ ≤ 0.3); and it is outperformed by single-layer InfoMap for networks with µ ≥ 0.4.
We suspect that at least some of the shortcomings of multilayer InfoMap in these experiments are due to the use of a Louvain-like optimization heuristic, rather than from flaws in InfoMap's quality function.As we have seen from the results with the heuristics GenLouvain and GenLouvainRand, seemingly minor adjustments of an optimization heuristic can have large effects on the quality of the results.We are thus hopeful that similar adjustments can also improve the results for multilayer InfoMap.

B. Temporal examples
Temporal networks arise in many different applications [2,3], such as the study of brain dynamics [17], financialasset correlations [16], and scientific citations [18].When representing a temporal network as a multilayer network, one orders the layers in a causal way, such that structure in a particular layer depends directly only on structure in previous layers (and not in future layers).To gen- Effect of interlayer coupling strength ω and relaxation rate r on the ability of different community-detection algorithms to recover planted partitions as a function of the community-mixing parameter µ in a multiplex benchmark with nonuniform interlayer dependencies.Each multilayer network has 1000 nodes and 15 layers, and each node is present in all layers.The layer-coupled interlayer-dependency matrix is a block-diagonal matrix with diagonal blocks of size 5 × 5.For each diagonal block, we set the value in the interlayer-dependency matrix to a value p (so each diagonal block has the same structure as the matrix in Fig. 2b); for each off-diagonal block, we set the value to pc = 0, thereby incorporating an abrupt change in community structure.All NMI values are means over 10 runs of the algorithms and 100 instantiations of the benchmarks.Each curve corresponds to the mean NMI values that we obtain for a given value of µ, and the shaded area around a curve corresponds to the minimum and maximum NMI values that we obtain with the 10 sample partitions for a given value of p.
erate multilayer networks that have such structure, we use our sampling process for fully-ordered multilayer networks and the interlayer-dependency tensor of Fig. 2a.
We consider two stylized examples of temporal networks.In Fig. 8, we show results for temporal networks that have uniform dependencies between consecutive layers (and hence tend to evolve gradually).To generate these networks, we set p β = p for all layers in Fig. 2a.In Fig. 9, we show results for temporal networks with change points.To generate these networks, we set p β = p for all layers except layers 25, 50, and 75, for which p β = p c = 0, resulting in abrupt changes in community structure.Each multilayer network in these two examples has n = 150 physical nodes and l = 100 layers.Each node is present in every layer, so there are a total of 15000 state nodes.We use a Dirichlet null distribution to specify expected community sizes and set n set = 5, θ = 1, and q = 1 (see Appendix A).For the temporal networks with change points in Fig. 9, we choose the support of the null distribution so that communities after a change point have new labels.We use the M-DCSBM benchmark (see Appendix D) with η k = 2, k min = 3, and k max = 30 to generate intralayer edges.
Both multilayer-modularity-based algorithms (i.e., GenLouvain and GenLouvainRand) can exploit interlayer dependencies for these temporal benchmark networks and identify partitions with significantly larger NMI than those that we obtain with single-layer modularity (i.e., with ω = 0).Typically, the peak of NMI seems to occur when 1 ω 4. When p < 1, one expects NMI to decrease for sufficiently large values of ω, as increasing ω further favors "persistence" [16] in the output partition that is not present in the multilayer planted partition.
For multilayer InfoMap, the results are less promising.For most parameter choices, the best result for multilayer InfoMap is at best similar and often worse than the result for single-layer InfoMap.(The NMI value for single-layer InfoMap is the data point that we label with "s" on the horizontal axis.)An exception are the results for p = 1 and uniform temporal dependencies (see Fig. 8o).In this example (where induced partitions are the same across layers), increasing the value of the relaxation rate r enhances the recovery for all sampled planted partitions when µ ≤ 0.4 and for a subset of sampled planted partitions when µ = 0.5 and µ = 0.6.The results for multilayer InfoMap on temporal benchmarks with uniform interlayer dependencies with p = 0.99 (see Fig. 8n for µ = 0.3 and µ = 0.4) and p = 1 (see Fig. 8o for µ = 0.5 and µ = 0.6) are the only instances where we observe large differences in results for different partitions that are generated by our model with the same parameter values.
Comparing results for GenLouvain (see Figs. 8a-e  and 9a-c) and GenLouvainRand with reiteration and post-processing (see Figs. 8f-j and 9d-f), we see that the difference in results between the two optimization heuristics for multilayer modularity is even more pronounced in these temporal examples than in the multiplex examples in Section VI A. In particular, GenLouvain exhibits an abrupt change in behavior near ω = 1; this is related to the transition behavior that was described in [16].As we explained in Section VI A (where we also observed this phenomenon), this transition occurs for the following reason: for values of ω that are above a certain threshold [16], only interlayer merges occur in the first phase of the Louvain heuristic.This abrupt  2a).Each multilayer network has 150 nodes and 100 layers, and each node is present in all layers.All NMI values are means over 10 runs of the algorithms and 100 instantiations of the benchmark.(See the introduction of Section VI.)Each curve corresponds to the mean NMI values that we obtain for a given value of µ, and the shaded area around a curve corresponds to the minimum and maximum NMI values that we obtain with the 10 sample partitions for a given value of p.
transition no longer occurs with the additional randomization in GenLouvainRand, which exhibits smooth behavior as a function of ω.However, unlike in our multiplex experiments, we observe benefits from the behavior of GenLouvain in some situations, especially for networks with weak community structure (specifically, for µ ≥ 0.8).This phenomenon first becomes noticeable for networks with p = 0.95, and it becomes more pronounced for progressively larger values of p. Compare Figs.8c-e with Figs.8h-j; and compare Figs.9b and c with Figs.9e  and f.Our results for temporal benchmark networks with change points (see Fig. 9) are mostly similar to those for temporal benchmark networks with uniform dependencies when considering NMI to compare planted and recovered partitions.Only the results for p = 1 are noticeably different.(Compare Figs.9c, f, and i with Figs.8e, j, and o.)Because of the change points, the induced planted partitions for each layer are not identical, even when p = 1.Consequently, unlike in Fig. 8, the results for p = 1 in Fig. 9 are qualitatively similar to those for smaller values of p.In principle, one might expect multilayer InfoMap to have an advantage over multilayer-modularity-based methods in this experiment, as the former's quality function is designed to detect abrupt changes in community structure, whereas maximizing multilayer modularity always favors some persistence of community labels from one layer to the next.However, this theoretical advantage is not borne out in our experiment.When comparing the planted and recovered partitions using the NMI between multilayer partitions, we see that multilayer InfoMap can correctly recover the change points only for p = 1 and µ = {0, 0.1} (see Appendix F).

C. Multi-aspect examples
In this section, we illustrate the ability of our framework to generate multilayer networks with more than one aspect.A common situation that gives rise to multilayer networks with two aspects is a multiplex network that changes over time (e.g., citations patterns that change over time [118]).In our framework, such a situation corresponds to the interlayer-dependency tensor in Fig. 3.For the illustrative examples in this section, we use uniform multiplex and temporal dependencies; and we parametrize the interlayer dependency tensor as fol- (i) p = 0.99, a = 0.9 Figure 10.Multi-aspect networks with uniform interlayer dependencies.Effect of the strength of temporal interlayer coupling ωt and multiplex interlayer coupling ωm on the ability of maximization of multilayer modularity to recover the planted partition in two-aspect multilayer networks that we generate using our framework.The first aspect is temporal (and thus ordered) and the second aspect is multiplex (and thus unordered).The heat maps show the mean ( NMI , 0 1) over all layers of the NMI between induced planted partitions and recovered community structure.A black cross indicates the maximum value of NMI .The dependence between structure in different layers increases from top to bottom, and the importance of the temporal aspect versus the multiplex aspect increases from left to right.We average our results for each pair of parameters over 10 runs of GenLouvainRand with reiteration.The generated networks have n = 100 nodes and l = 100 layers with |LT| = |LM| = 10, so there are a total of 10000 state nodes.We fix the community-mixing parameter of the M-DCSBM benchmark to be µ = 0.5.

lows:
where a controls the balance between multiplex and temporal dependencies and p controls the overall strength of the interlayer dependencies.
To recover communities, we optimize a multi-aspect generalization, of the multilayer modularity function that was introduced in [33].
We specify the interlayer-coupling tensor C (a multiaspect generalization of the "interslice coupling" of [33]), such that it reflects the planted interlayer dependencies, by writing where ω t denotes the coupling parameter for the temporal dependencies and ω m denotes the coupling parameter for the multiplex dependencies.In Fig. 10, we illustrate the effects of the two interlayer coupling parameters, ω t and ω m , on the extent to which multilayer modularity maximization can recover a planted partition.We use GenLouvainRand with reiteration but without post-processing to find a local optimum of Eq. ( 8).For all of the values of p and a that we consider in Fig. 10, we see some improvement in the performance of multilayer modularity maximization with nonzero interlayer coupling compared with the case where both ω t = 0 and ω m = 0.As the latter case corresponds to independently maximizing modularity on each layer of a network, this demonstrates that multilayer modularity is able to exploit the interlayer dependencies to better recover the planted partition.As expected, increasing a (which gives more weight to temporal dependencies) leads to a shift of the region of the parameter space with good planted-partition recovery to larger values of ω t and smaller values of ω m .For progressively larger p, we observe a small overall increase in the value of NMI , but its dependence on ω t and ω m remains similar.

VII. CONCLUSIONS
We introduced a unifying and flexible framework for the construction of generative models for mesoscale structures in multilayer networks.The three most important features of our framework are the following: (1) it includes an explicitly parametrizable tensor P that controls interlayer-dependency structure; (2) it can generate an extremely general, diverse set of multilayer networks (including, e.g., temporal, multiplex, "multilevel" [119], and multi-aspect multilayer networks with uniform or nonuniform dependencies between state nodes); and (3) it is modular, as the process of generating a partition is separate from the process of generating edges, enabling a user to first generate a partition and then use any planted-partition network model.Along with our paper, we provide publicly available code [14] that users can modify to readily incorporate different types of null distributions (see Section IV A), interlayer-dependency structures (see Section IV B), and planted-partition network models (see Section V).
The ability to explicitly specify interlayer-dependency structure makes it possible for a user to control which layers depend directly on each other (by deciding which entries in the interlayer-dependency tensor are nonzero) and the strengths of such dependencies (by varying the magnitude of entries in the interlayer-dependency tensor).One can thereby generate multilayer networks with either a single aspect or multiple aspects (e.g., temporal and/or multiplex networks) and vary dependencies between layers from the extreme case in which induced partitions in a planted multilayer partition are the same across layers to the opposite extreme, in which induced partitions in a planted multilayer partition are generated independently for each layer from a null distribution for that layer.To the best of our knowledge, this level of generality is absent from existing generative models for mesoscale structures in multilayer networks, as those models tend to only consider networks with a single aspect (e.g., temporal or multiplex) or limited interlayerdependency structures (e.g., with the same induced partitions in a planted multilayer partition across all layers).
We illustrated several examples of generative models that one can construct from our framework, with a focus on a few special cases of interest, rather than on trying to discuss as many situations as possible.We focused on community structure in our numerical experiments, because it is a commonly studied mesoscale structure, but one can also use our framework to construct generative models of intra-layer mesoscale structures other than community structure (e.g., core-periphery structure, bipartite structure, and so on) by taking advantage of the flexibility of our framework's ability to use any singlelayer planted-partition network model.For our singleaspect examples, we assumed that interlayer dependencies exist either between all contiguous layers (a special case of temporal networks), between all layers (a special case of multiplex networks), or between all contiguous groups of layers (another special case of multiplex networks).For both our temporal and multiplex examples, we considered both uniform and nonuniform interlayer dependencies.We also combined some of these scenarios in an example with two aspects, a temporal aspect and a multiplex aspect.However, our framework's flexibility allows us to construct generative models of multilayer networks with more realistic features.For example, for temporal networks, one can introduce dependencies between a layer and all layers that follow it (such that Fig. 2a is an upper triangular matrix with nonzero entries above the diagonal) to incorporate memory effects [120].One can also consider interlayer dependencies that are not layer-coupled.For example, dependencies can be diagonal but nonuniform, or they can be nonuniform and exist only between sets of related nodes.
In Section VI, we consider the commonly studied case of a multilayer network with only intralayer edges and connectivity patterns in the different layers that depend on each other.We use a slight variant of the DCSBM benchmark from [43] to generate edges for each layer (see Algorithm 3).However, other types of multilayer networks are also important [4], and one can readily combine our approach for generating multilayer partitions with different network generative models that capture various important features.For example, one can use an SBM to generate interlayer edges (e.g., using our M-DCSBM framework, which we discuss in Appendix D), or one can replace the degree-corrected SBM in Section V with any other planted-partition network model or other interesting models (e.g., other variants of SBMs [42,49] and models for networks whose structure is affected by space (and perhaps spatially embedded) [12] or arbitrary latent features [64]).In all of these examples, dependencies between connectivity patterns in different layers arise only from a planted multilayer partition.It is also possible to modify our network generation process (see Section V) to incorporate additional dependencies between layers beyond those that are induced by a planted mesoscale structure, such as by introducing dependencies between a node's degree in different layers [102,121] or burstiness [104] in inter-event-time distributions of edges.
Our work has the potential for many useful and interesting extensions, and we highlight three of these.First, although we have given some illustrative numerical examples in Section VI, the area of benchmarking communitydetection methods in multilayer networks is far from fully developed.Generative models are useful tools for understanding the behavior of community-detection methods in detail and thus for suggesting ways of improving heuristic algorithms without losing scalability.One can use our framework to construct benchmark models that provide a test bed for gaining insight into the advantages and shortcomings of community-detection methods and algorithms (and, more generally, of meso-setdetection methods and algorithms).Importantly, we expect these benchmark models to be very informative for detecting potential artifacts of algorithms that can sometimes be masked in real-world applications.Second, a well-understood generative model can be a powerful tool for statistical inference (i.e., inferring the structure of a multilayer network rather than generating a multilayer network with a planted structure) [24].For temporal networks, closed forms for the joint distribution of mesoset assignments have been derived for models that are special cases of our framework [10,13].These results may be useful for statistical inference.Additionally, it seems likely that it is possible to adapt Bayesian inference techniques (including Gibbs sampling [122][123][124] or variational methods [125]) that have been developed for SBMs both for inferring a multilayer partition and for inferring an interlayer-dependency tensor.A key advantage of the generality of the framework that we have developed in the present paper is that it may be possible to frame many model-selection questions in terms of posterior estimation of the interlayer-dependency tensor.Finally, it is important to model interdependent data streams and not just fixed data sets.For example, for any fully-ordered multilayer network, our generative model respects the causality of layers (e.g., temporal causality), and one can thus update a multilayer network with a new layer without the need to update any previous layers.It is critical to develop generative models that are readily adaptable to such situations, and our work in this paper is a step in this direction.

Appendix A: Example null distributions
In this appendix, we discuss parameter choices for a categorical null distribution and give concrete examples that can be useful for modeling mesoscale structure in temporal or multiplex networks.
In Section IV A, we described the general form of a categorical null distribution: where p α s is the probability for a state node in layer α to be assigned to a meso-set s in the absence of interlayer dependencies, n set is the number of meso-sets in the multilayer partition, and nset s=1 p α s = 1 for each α ∈ L. The support G α of the null distribution P α 0 is G α = {s : P α 0 [s] = 0}.We say that a label s is active in a layer α if it is in the support of the null distribution P α 0 (i.e., if P α 0 [s] = 0); and we say that a label is inactive in layer α if it is in the complement of the support of P α 0 (i.e., if P α 0 [s] = 0).A natural choice is to sample p α from a Dirichlet distribution, which is the conjugate prior for the categorical distribution [92,93].The Dirichlet distribution over q variables has q parameters θ 1 , . . ., θ q .Its probability density function is , where x i ∈ (0, 1) and θ i > 0 for each i ∈ {1, . . ., q}.
The case in which all θ i are equal is called a "symmetric Dirichlet distribution", which we parametrize by the common value θ (the so-called "concentration parameter") of the parameters and the number q of variables.The concentration parameter θ determines the types of discrete probability distributions that one is likely to obtain from the symmetric Dirichlet distribution.For θ = 1, the symmetric Dirichlet distribution is the continuous uniform distribution over the space of all discrete probability distributions with n set states.As θ → ∞, the Dirichlet distribution becomes increasingly concentrated near the discrete uniform distribution, such that all entries in p α are approximately equal.As θ → 0, it becomes increasingly concentrated away from the uniform distribution, such that p α tends to have 1 (or a few) large entries and all other entries are close to 0. Consequently, to have very heterogeneous meso-set sizes, one would choose θ ≈ 1.To have all meso-sets be of similar sizes, one would choose a large value of θ.To have a few large meso-sets and many small meso-sets, one would choose θ to be sufficiently smaller than 1.The value of n set also affects the amount of meso-set label overlap across layers in the absence of interlayer dependencies.For example, if p α is the same for all layers, then larger values of n set incentivize less label overlap across layers (because there are more possible labels for each layer) and smaller values of n set incentivize more label overlap across layers (because there are fewer possible labels for each layer).
In some situations -e.g., when modeling the birth and death of communities in temporal networks or the appearance and disappearance of communities in multiplex networks -it is desirable to have meso-set labels that have a nonzero probability in (A1) only in some layers.For these situations, we suggest sampling the support of the distributions before sampling the probabilities p α .Given the supports for each layer, one then samples the corresponding probabilities from a symmetric Dirichlet distribution (or any other distribution over categorical distributions).That is, where G α = {s ∈ {1, . . ., n set } : P α 0 [s] = 0} is the complement of the support, with n set = max α∈L max(G α ).
A simple example for a birth/death process for mesosets is the following.First, fix a number of meso-sets and a support (i.e., active community labels) for the first layer.One then sequentially initializes the supports for the other layers by removing each meso-set in the support of the previous layer with probability r d ∈ [0, 1] and adding a number, sampled from a Poisson distribution with rate r b ∈ [0, ∞), of new meso-sets (with new labels that are not active in any previous layer).In temporal networks, for example, this ensures that if a meso-set label is not in the support of a given layer, then the label is also not in the support of any subsequent layers.For this process, the expected number |G α | of meso-sets in a layer approaches r b /r d as one iterates through the layers.Therefore, one should initialize the size of the support for the first layer close to this value to avoid transients in the number of meso-sets.For this process, the lifetime of meso-sets follows a geometric distribution.The nature of the copying process that we use to introduce dependencies between induced partitions in different layers typically implies that meso-sets that have been removed from the support of the null distribution do not lose all of their members instantly, but instead shrink at a speed that depends on the values of the copying probabilities in the interlayer-dependency tensor.
One can also allow labels to appear and disappear when examining multiplex partitions.For example, given a value for n set , one can generate the support for each layer by allowing each label s ∈ {1, . . ., n set } to be present with some probability q and absent with complementary probability 1 − q.This yields a sets of active and inactive meso-set labels for each layer.One can then sample the nonzero probabilities in p α that correspond to active labels from a Dirichlet distribution and set p α s to 0 for each inactive label s.Because multiplex partitions are unordered, there is no notion of one layer occurring after another one, so we do not need to ensure that an inactive label in a given layer is also inactive in "subsequent" layers.

Appendix B: Partition sampling process
In this appendix, we provide a detailed description of the way in which we sample multilayer partitions, including a discussion of the convergence properties of our sampling process.As we mentioned in Section IV B, we assume that the multilayer partitions are generated by a copying process on the meso-set assignment of state nodes.Recall that the conditional probability distribution for the updated meso-set assignment of a state node, given the current state of the copying process (Eq.( 4) in Section IV B), is i,α δ(S i,α (τ ), s) where P 0 is a given set of independent layer-specific null distributions and P is a given interlayer-dependency tensor.Further recall that a multilayer network can have both ordered aspects and unordered aspects, where we denote the set of ordered aspects by O and the set of unordered aspects by U. Our goal is to sample multilayer partitions that are consistent with generation by Eq. (B1) while respecting the update order from its ordered aspects (if present).Recall that two layers, α and β, are order-equivalent (which we denote by α ∼ O β) if α a = β a for all a ∈ O. Based on this equivalence relation, we obtain an ordered set of equivalence classes L/∼ O by inheriting the order from the ordered aspects (where, for definiteness, we consider the lexicographic order [95] over ordered aspects).We want to simultaneously sample the meso-set assignment of state nodes within each class of order-equivalent layers and we want to sequentially sample the meso-set assignment of state nodes in non-orderequivalent layers, conditional on the fixed assignment of state nodes in preceding layers (see Algorithm 1).

Scan order and compatibility
To sample the induced partitions for each class of order-equivalent layers, we use a technique that resembles Gibbs sampling [81].As in Gibbs sampling, we define a Markov chain on the space of multilayer partitions by updating the meso-set assignment of state nodes by sampling from Eq. (B1).In Gibbs sampling, one assumes that the conditional distributions that are used to define this Markov chain are compatible, in the sense that there exists a joint distribution that realizes these conditional distributions [82][83][84][85].However, for a general interlayer-dependency tensor, there is no guarantee that the distributions that are defined by Eq. (B1) are compatible.Following [80], we use the term "pseudo-Gibbs sampling" to refer to Gibbs sampling from a set of potentially incompatible conditional distributions.We can define different Markov chains by changing the way in which we select which conditional distribution to apply at each step (i.e., which state node to update at each step).
The order in which one cycles through the conditional distributions is known as the scan order [87].We use the term sampling Markov chain for the Markov chain that is defined by pseudo-Gibbs sampling with a specified scan order.Common scan orders are cycling through conditional distributions in a fixed order and sampling the update distribution uniformly at random from the set of conditional distributions.In pseudo-Gibbs sampling, the choice of scan order deserves careful attention.In pseudo-Gibbs sampling, different scan orders correspond to sampling from different distributions [87].(By contrast, in Gibbs sampling, the choice of scan order influences only the speed of convergence.)In particular, when cycling through conditional distributions in a fixed order, conditional distributions that are applied later have an outsized influence on the stationary distributions of the sampling Markov chain [87].Sampling the update distribution uniformly at random mitigates this problem, at the expense of introducing computational overhead.
We can improve on a purely random sampling strategy by exploiting the structure of the interlayer-dependency tensor.In particular, note that the conditional distributions for state nodes in the same layer are independent (as we assume that the interlayer-dependency tensor has no intralayer contributions) and hence compatible.When updating a set of state nodes from a given layer, we can update them in any order or even update them concurrently.To take advantage of this fact, we first sample a layer uniformly at random from the current class of orderequivalent layers and then update all state nodes in that layer.The sequence of layer updates defines a Markov chain on the space of multilayer partitions, and its convergence properties determine the level of effectiveness of our sampling algorithm.

Convergence guarantees
We now discuss two key properties -aperiodicity and ergodicity -that determine the convergence behavior of finite Markov chains.
First, note that sampling layers uniformly at random guarantees that the Markov chain is aperiodic.To see this, note that a sufficient condition for aperiodicity is that, for all states, there is a nonzero probability to transition from the state to itself.This clearly holds for our sampling chain, as it is possible for two consecutive updates to update the same layer, and (because the probability distributions in Eq. (B1) remain unchanged at the second update) there is a positive probability that the second update does not change the partition.
Second, note that if pi,α < 1 for all state nodes (i, α) ∈ V M such that α ∈ [α], where [α] is a class of order-equivalent layers, then the sampling Markov chain for that class is ergodic over a subspace of multilayer partitions that includes the support G [α] = α∈[α] G α of the null distributions.We have already shown that the sampling Markov chain is aperiodic, so all that remains is to verify that there exists a sample path from any arbitrary initial partition S 0 to any arbitrary multilayer partition S g ∈ G [α] in the support of the null distributions.Such a sample path clearly exists, as Therefore, one can achieve the desired transition using one update for each state node.Sample paths of a finite, aperiodic Markov chain converge in distribution to a stationary distribution of the Markov chain.However, different sample paths can con-verge to different distributions, and the eventual stationary distribution can depend both on the initial condition and on the transient behavior of the sample path.If the Markov chain is ergodic, the stationary distribution is unique, so it is independent of the initial condition and transient behavior.
After sufficiently many updates (i.e., by specifying a sufficiently large value for the expected number n upd of updates for each state node in Algorithm 1), we approximately sample the state of the Markov chains for each class of order-equivalent layers from their stationary distributions.For fully-ordered multilayer networks (i.e., for multilayer networks with U = ∅), setting n upd = 1 is sufficient for convergence, because the equivalence classes include only a single layer in this case and subsequent updates are thus independent samples from the same distribution.
Sampling initial conditions in an 'unbiased' way can be important for ensuring that one successfully explores the space of possible partitions, as the updating process is not necessarily ergodic over the support of the null distributions when (i,α)∈V M P j,β i,α = 1 for some state nodes.For example, if pi,α = 1 for all (i, α) ∈ V M , then any partition in which all state nodes in each weakly connected component of the interlayer-dependency network have the same meso-set assignment is an absorbing state of the Markov chain.
However, provided the updating process has converged, any partition that one generates should still reflect the desired dependencies between induced partitions for different layers.Therefore, to circumvent the problem of non-unique stationary distributions when the updating process is not ergodic, we reinitialize the updating process by sampling from the null distribution for each partition that we want to generate.Reinitializing the initial partition from a fixed distribution for each sample always defines a unique joint distribution, which is a mixture of all possible stationary distributions when the updating process is not ergodic.When the updating process is ergodic, it is equivalent to sample partitions by reinitializing or to sample multiple partitions from a single long chain.Using a single long chain is usually more efficient when one needs many samples and it is not problematic if there is some dependence between samples [126].In our case, however, we usually need only a few samples with the same parameters, and ensuring independence tends to be more important than ensuring perfect convergence (provided the generated partitions exhibit the desired interlayer-dependency structure).Using multiple chains thus has clear advantages even when the updating process is ergodic, and it is necessary to use multiple chains when it is not ergodic.

Convergence tests
It is difficult to determine whether a Markov chain has converged to a stationary distribution, mostly because it as measured by mNMI.On the horizontal axis, we show the value of n upd that we use to generate the first partition in the comparison.The lag is the number of additional updates per state node that we use to generate the second partition in the comparison.We average results over a sample of 100 independent chains with different initial partitions.The shaded area indicates one standard deviation above and below the mean.Note that the large amount of noise for small values of the lag is not a result of differences between chains in the sample; instead, it is an inherent feature of all chains in the sample.
is difficult to distinguish the case of a slow-mixing chain becoming stuck in a particular part of state space from the case in which the chain has converged to a stationary distribution.There has been much work on trying to define convergence criteria for Markov chains [127], but none of the available approaches are entirely successful.In practice, one usually runs a Markov chain (or chains) for a predetermined number of steps.One checks manually for a few examples that the resulting chains exhibit behavior that is consistent with convergence by examining autocorrelations between samples of the same chain and cross-correlations between samples of independent chains with the same initial state.When feasible, one can also check whether parts of different, independent  B2)).On the horizontal axis, we show the value of n upd that we use to generate the first partition in the comparison.The lag is the number of additional updates per state node that we use to generate the second partition in the comparison.We average results over a sample of 100 independent chains with different initial partitions.The shaded area indicates one standard deviation above and below the mean.
chains or different parts of the same chain are consistent with sampling from the same distribution.
To estimate the number of updates that we need in the numerical experiments of Section VI A, we examine the autocorrelation of the Markov chain in two different ways.In Fig. 11, we consider the multilayer NMI (mNMI) between sampled partitions at different steps of the Markov chain.We are interested in the behavior of the mNMI both as a function of n upd (the expected number of updates per state node that we use to generate the first partition in the comparison) and the lag (the number of additional updates per state node that we use to generate the second partition in the comparison).As the Markov chain converges to a stationary distribution, the value of the mNMI for a given lag should become inde-pendent of n upd (so the curves in Fig. 11 should become flat).The results in Fig. 11 suggest that the number of updates that we need for convergence increases moderately with p, where n upd ≈ 50 is sufficient for convergence even when p = 1.In Fig. 12, we examine whether the dependency pattern between induced partitions in different layers -we characterize such a pattern by the NMI between induced partitions in different layers -has converged using the mean absolute distance between the NMI matrices between induced partitions in a multilayer partition.Figs.11 and 12 reveal additional important information about the behavior of our sampling process.We can estimate the mixing time of the sampling Markov chain at stationarity from Fig. 11, because the mixing time corresponds to the lag that is necessary for the mNMI to converge to 0. The mixing time at stationarity increases with p and becomes infinite as p → 1, because the Markov chain converges to an absorbing state.From Fig. 12, we observe that, for all values of p, the mean absolute distance d between NMI matrices converges to a value near 0 and becomes essentially independent of the lag.This corroborates our assumption that different partitions that we generate from our sampling process have similar interlayer-dependency structure.

Appendix C: A generative model for temporal networks
In this appendix, we examine the generative model for temporal networks that we considered in Section IV D. In this model, we assume uniform dependencies between contiguous layers (i.e., p β = p ∈ [0, 1] for all β ∈ {2, . . ., l} in Fig. 2a).For this choice of interlayerdependency tensor, our generative model reduces to Algorithm 2. In this case (and, more generally, for any fully-ordered multilayer network), convergence is automatic, because we need only one iteration through the layers.
A first important feature of the generative model in Algorithm 2 is that it respects the arrow of time.In particular, meso-set assignments in a given layer depend only on meso-set assignments in the previous layer (e.g., the previous temporal snapshot) and on the null distributions  2a).
P 0 .That is, for all s ∈ {1, . . ., nl} and all α ∈ {2, . . ., l}, we have where S| α is the n-node partition that is induced on layer α by the multilayer partition S. The value of p determines the relative importance of the previous layer and the null distribution.When p = 0, meso-set assignments in a given layer depend only on the null distribution of that layer (i.e., on the second term of the right-hand side of Eq. (C1)).When p = 1, meso-set assignments in a given layer are identical to those of the previous layer (and, by recursion, to meso-set assignments in all previous layers).Using Eq. (C1), we see that the marginal probability that a given state node has a specific meso-set assignment is Computing marginal probabilities can be useful for computing expected meso-set sizes for a given choice of null distributions.
We now highlight how the copying update (i.e., Step C) and the reallocation update (i.e., Step R) in Algorithm 2 govern the evolution of meso-set assignments between consecutive layers.Steps C and R deal with the movement of nodes by first removing some nodes ("subtraction") and then reallocating them ("addition").In Step C, a meso-set assignment s in layer α can lose a number of nodes that ranges from 0 to all of them.It can keep all of its nodes in layer α + 1 (i.e., S s | α+1 = S s | α ), lose some of its nodes (i.e., S s | α+1 ⊂ S s | α ), or disappear entirely (i.e., S s | α+1 = ∅ and S s | α = ∅).The null distribution in Step R is responsible for a meso-set assignment s gaining new nodes (i.e., S s | α+1 ⊂ S s | α ) or for the appearance of a new meso-set label (i.e., S s | α+1 = ∅ and S s | α = ∅).When defining the null distributions P 0 , it is necessary to consider the interplay between Step C and Step R.
To illustrate how the meso-set-assignment copying process and the null distribution in Algorithm 2 can interact with each other, we give the conditional probability that a label disappears in layer α and the conditional probability that a label appears in layer α.For all s ∈ {1, . . ., nl} and all α ∈ {2, . . ., l}, the conditional probability that a label disappears in layer α is The expression (C2) depends only on our copying process and simplifies to (1 − p) Ss|α−1 when P α 0 [S i,α = s] = 0 (i.e., when the probability of being assigned to label s is 0 using the null distribution of layer α).Furthermore, for progressively larger P α 0 [S i,α = s], the probability that a label disappears is progressively smaller.
For all s ∈ {1, . . ., nl} and all α ∈ {2, . . ., l}, the conditional probability that a label appears in layer α is (C3) The expression (C3) gives the probability that at least one node in layer α has the label s, given that no node in layer α − 1 has the label s.When Sr|α−1∈S|α−1 P α 0 [S i,α = r] = 0, the probability that a label appears depends only on our meso-set-assignment copying process and is given by 1 − p n .Furthermore, larger values of Sr|α−1∈S|α−1 P α 0 [S i,α = r] reduce the probability that a label appears in layer α.
All discussions thus far in this appendix hold for any choice of P 0 .In the next two paragraphs, we give two examples to illustrate some features of the categorical null distribution from Section IV A. In particular, we focus on the effect of the support of a categorical null distribution on a sampled multilayer partition.(See Section IV A for definitions of some of the notation that we use below.)The support G α of a categorical null distribution P α 0 is G α = {s : p α s = 0}, where s ∈ {1, . . ., n set }.An important property of the support for Fig. 2a is that overlap between G α and G α+1 (i.e., G α+1 ∩G α = ∅) is a necessary condition for a meso-set in layer α to gain new members in layer α + 1.
Let c α denote the vector of expected induced meso-set sizes in layer α (i.e., c α s = np α s ), and suppose that the probabilities p α are the same in each layer (i.e., p α = p for all α).The expected number of meso-set labels is then the same for each layer, and the expected number of nodes with meso-set label s is also the same in each layer and is given by c α s .This choice produces a temporal network in which nodes change meso-set labels across layers in a way that preserves both the expected number of induced meso-sets in a layer and the expected sizes of induced meso-sets in a layer.Now suppose that we choose the p α values such that their supports do not overlap (i.e., G α ∩ G β = ∅ for all α = β).At each Step C in Algorithm 2, an existing meso-set label can then only lose members; and with probability 1 − p n , at least one new label will appear in each subsequent layer.For this case, one expects pc α s members of meso-set s in layer α to remain in meso-set s in layer α + 1 and (1 − p)c α s members of meso-set s in layer α to be assigned to new meso-sets (because labels are nonoverlapping) in layer α + 1.This choice thus produces multilayer partitions in which the expected number of new meso-set labels per layer is nonzero (unless p = 1) and the expected size of a given induced meso-set decreases in time.

Appendix D: M-DCSBM sampling procedure
In this appendix, we discuss a generalization to multilayer networks of the degree-corrected SBM (DCSBM) from Ref. [43].In other words, it is a multilayer DCSBM (M-DCSBM).The parameters of a general, directed M-DCSBM are a multilayer partition S (which determines the assignment of state nodes to meso-sets), a block tensor W (which determines the expected number of edges between meso-sets in different layers), and a set σ of state-node parameters (which determine the allocation of edges to state nodes in meso-sets).The probability of observing an edge (or the expected number of edges, if we allow multi-edges) from state node (i, α) to state node (j, β) with meso-set assignments r = S i,α and s = S j,β in a M-DCSBM is where W s,β r,α is the expected number of edges from state nodes in layer α and meso-set S r to state nodes in layer β and meso-set S s , the quantity σ β i,α is the probability for an edge starting in meso-set S r in layer α and ending in layer β to be attached to state node (i, α) (note that  the dependence on S r is implicit in σ β i,α ), and σ j,β α is the probability for an edge starting in layer α and ending in meso-set S s in layer β to be attached to state node (j, β).(Note that the dependence on S s is implicit in σ j,β α .)For an undirected M-DCSBM, both the block tensor W and the state-node parameters σ are symmetric.That is, W r,α s,β = W s,β r,α and σ i,α β = σ β i,α .The above M-DCSBM can generate multilayer networks with arbitrary expected layer-specific in-degrees and out-degrees for each state node.(The DCSBM of [43] can generate single-layer networks with arbitrary expected degrees.)Given a multilayer network with adjacency tensor A, the layer-α-specific in-degree of state node (j, β) is and the layer-β-specific out-degree of state node (i, α) is The layer-α-specific in-degree and out-degree of state node (i, α) are the "intralayer in-degree" and "intralayer out-degree" of (i, α).For an undirected multilayer network, layer-β-specific in-degrees and out-degrees are equal (i.e., k i,α β = k β i,α ).We refer to their common value as the "layer-β-specific degree" of a state node.For an ensemble of networks generated from an M-DCSBM, the associated means are As we mentioned in Section V, we consider undirected multilayer networks with only intralayer edges for our experiments in Section VI.The block tensor W thus does not have any interlayer contributions (i.e., W s,β r,α = 0 if α = β).Furthermore, we can reduce the number of node parameters that we need to specify the M-DCSBM to a single parameter σ i,α = σ α i,α = σ i,α α for each state node (i, α).
We sample the expected intralayer degrees e i,α = k α i,α , where k α i,α = j∈V M A j,α i,α , for the state nodes from a truncated power law [128] with exponent η k , minimum cutoff k min , and maximum cutoff k max .We then construct the block tensor W and state-node parameters σ for the M-DCSBM from the sampled expected degrees e and the meso-set assignments S. Let κ s,α = i∈Ss|α e i,α , S s ∈ S be the expected degree of meso-set s in layer α; and let be the expected number of edges in layer α.Consequently, is the probability for an intralayer edge in layer α to be attached to state node (i, α), given that the edge is attached to a state node that is in layer α and is part of the community S i,α .The elements W s,β r,α = δ(α, β) (1 − µ)δ(r, s)κ s,α + µ κ r,α κ s,α 2w α (D4) of the block tensor give, for r = s, the expected number of edges between state nodes in meso-set s in layer β and state nodes in meso-set r in layer α.For s = r and β = α, the block-tensor element W s,β r,α gives twice the expected number of edges.One way to think of the DCSBM benchmark is that we categorize each edge that we want to sample as an intra-meso-set edge with probability 1 − µ or as a "random edge" (i.e., an edge that can be either an intra-meso-set edge or an inter-meso-set edge) with probability µ.To sample an edge, we sample two state nodes (which we then join by an edge).We call these two state nodes the "end points" of the edge.We sample the two end points of an intra-meso-set edge with a frequency that is proportional to the expected degree of their associated state nodes, conditional on the end points being in the same meso-set.By contrast, we sample the two end points of a random edge with a frequency proportional to the expected degree of their associated state nodes (without conditioning on meso-set assignment).We assume that the total number of edges in layer α is sampled from a Poisson distribution [129] with mean w α .Although our procedure for sampling edges describes a potential algorithm for sampling networks from the DCSBM benchmark, it usually is more efficient to sample edges separately for each pair of mesosets.
In Algorithm 3, we show pseudocode for the specific instance of the model in Eq. (D1) that we use to sample intralayer network edges (independently for each induced partition) for a given multilayer partition in our numerical experiments of Section VI.The only difference between Algorithm 3 and the sampling algorithm of [43] is that we use rejection sampling to avoid creating selfedges and multi-edges.(In other words, if we sample an edge that has already been sampled or that is a self-edge, we do not include it in the multilayer network; instead, we resample.)Rejection sampling is efficient provided all blocks of the network are sufficiently sparse, such that the probability of generating multi-edges is small.For dense blocks of a network, we instead sample edges from independent Bernoulli distributions with success probability Eq. (D1).This algorithm for sampling networks from a DCSBM is very efficient; it scales linearly with the number of edges in a network.
In the above discussion, we defined the parameters in Eq. (D1) to generate intralayer edges in a multilayer network by specifying layer-specific in-degrees and outdegrees.That is, the layer-β-specific in-degree and outdegree of state node (i, α) are 0 if β = α.We can also extend the model in Eq. (D1) to generate interlayer edges.For a directed multilayer network with interlayer edges, we sample expected layer-β-specific in-degrees e i,α β and out-degrees e β i,α for each state node (i, α) and layer β from appropriate distributions.Given expected layerspecific degrees e and a multilayer partition S, we then construct the block tensor W and state-node parameters σ for the M-DCSBM analogously to the special case that we described above.Let For directed networks, the expected layer-specific indegrees and out-degrees generated by the above model do not correspond exactly to the values from the input parameters e, except when µ = 1.for all β ∈ {2, . . ., l} in Fig. 2a).Each multilayer network has 150 nodes and 100 layers, and each node is present in all layers.
ous.For example, in a temporal network, if a community in one layer splits into multiple communities in the next layer, there are multiple plausible ways to define labels across layers (e.g., all new communities get new labels or the largest new community keeps the original label).Moreover, different quality functions reward interlayerassignment labeling in different ways, and mNMI conflates this issues with how well a quality function recovers structure within layers.In particular, a small value of mNMI can hide an optimal value of NMI (as NMI = 1 is necessary, but not sufficient, to have mNMI = 1).We include only a subset of the values of p and p  2a to p.We define the null distributions in these experiments so that there is no overlap in community labels between different groups of layers.
that we considered in Figs. 6 and 8, as these suffice for the purpose of this appendix.In the numerical experiments of Section VI, we observed that post-processing does not have a major effect on NMI .This suggests that any misassignment across layers that underemphasizes the "persistence" of a multilayer partition (which post-processing tries to mitigate [16]) does not affect an algorithm's ability to identify structure within lay-ers.(Recall that post-processing relabels an assignment without changing the induced partitions.)We include figures without post-processing in the experiments of this appendix, as we expect the effect to be more noticeable when examining NMI values between multilayer partitions (which accounts for assignments both within and across layers).In Figs. 13 and 14, we use the same benchmark networks as in the multiplex examples in Figs. 6 and 7, respectively.In Figs. 15 and 16, we use the same benchmark networks as in the temporal examples of Figs. 8  and 9, respectively.All mNMI values are means over 10 runs of the algorithms and 100 instantiations of the benchmark.(See the introduction of Section VI for more details on how we generate these instantiations.)Each curve in each figure corresponds to the mean mNMI values that we obtain for a given value of the communitymixing parameter µ, and the shaded area around a curve corresponds to the minimum and maximum mNMI values that we obtain with the 10 sample partitions for a given value of p or p.
For experiments with nonuniform interlayer dependencies in Figs. 14 and 16, it seems that InfoMap is better at detecting abrupt differences ("breaks", such as in the form of change points for temporal examples) in community structure than both GenLouvain and GenLou-vainRand, especially for the multiplex case (where this manifests for all examined values of p).However, multilayer InfoMap correctly identifies the planted community structure only when it is particularly strong (specifically, for µ ≤ 0.3).Recall from our experiments in Figs.7 and 9 that both GenLouvain and GenLouvain-Rand are better than InfoMap at detecting an induced partition within a layer (especially in cases where the planted community structure is quite weak).The reason that multilayer modularity maximization does not perfectly recover breaks in community structure of the planted multilayer partition is because multilayer modularity maximization with uniform ordinal or categorical coupling (see the introduction of Section VI) incentivizes "persistence" of community labels between layers (and nonzero persistence is achievable even when one independently samples induced partitions on different layers) [16].However, by design, our benchmark networks have no persistence whenever there is a break in community structure because of how we defined the support of the null distributions.Therefore, we do not expect multilayer modularity maximization to recover the planted labels between layers when p c = 0; this manifests as a de facto maximum in the mNMI between the output partition and the planted partition.EP/N510129/1.
LGSJ acknowledges a CASE studentship award from the EPSRC (BK/10/39).MB, LGSJ, AA, and MAP were supported by FET-Proactive project PLEXMATH (FP7-ICT-2011-8; grant #317614) funded by the European Commission; and MB, LGSJ, and MAP were also supported by the James S. McDonnell Foundation (#220020177).MAP was also supported by the National Science Foundation (grant #1922952) through the Algorithms for Threat Detection (ATD) program.Computational resources that we used for this re-search were supported in part by the National Science Foundation under Grant No. CNS-0521433, in part by Lilly Endowment, Inc. through its support for the Indiana University Pervasive Technology Institute, and in part by the Indiana METACyt Initiative.The Indiana METACyt Initiative at Indiana University Bloomington was also supported in part by Lilly Endowment, Inc.We thank Sang Hoon Lee, Peter Mucha (and his research group), Brooks Paige, and Roxana Pamfil for helpful comments.

Figure 4 .
Figure 4. Example temporal partitions for (n, l) = (150, 100).(Recall that n is the number of physical nodes and l is the number of layers.)We use the interlayer-dependency tensor from Fig. 2a with uniform probabilities p β = p for all β ∈ {1, . . ., l} and a Dirichlet null distribution with q = 1, θ = 1, and nset = 5 (see Appendix A).For (a) p = 0, (b) p = 0.5, (c) p = 0.85, (d) p = 0.95, (e) p = 0.99, and (f) p = 1, we show color-coded meso-set assignments ( , top) for a single example output partition and NMI values [98] (0 1, bottom) between induced partitions in different layers averaged over a sample of 10 output partitions.The parameter values match those in the numerical examples in Section VI(with the exception of p = 0, which we include for completeness).We choose a node ordering for each visualization that (whenever possible) emphasizes "persistent" mesoscale structure[16].We show only the first 15 layers of each multilayer partition, because (as one can see in the NMI heat maps) similarities between induced partitions for p < 1 decay steeply with the number of layers when dependencies exist only between contiguous layers.

Figure 5 .
Figure 5. Example multiplex partitions for (n, l) = (1000,15).We use the interlayer-dependency tensor in Fig.2band a Dirichlet null distribution with q = 1, θ = 1, and nset = 10 (see Appendix A).We perform 200 updating iterations (see Appendix B).For (a) p = 0, (b) p = 0.5, (c) p = 0.85, (d) p = 0.95, (e) p = 0.99, and (f) p = 1, where p = (l − 1)p is the probability that a state node copies its assignment from another state node, we show color-coded meso-set assignments ( , top) for a single example output partition and NMI values[98] (0 1, bottom) between induced partitions in different layers averaged over a sample of 10 output partitions.(For the temporal case in Fig.4, note that p = p.)The parameter values match those in the numerical examples in Section VI (with the exception of p = 0, which we include for completeness).We choose a node ordering that (whenever possible) emphasizes "persistent" mesoscale structure[16].

Figures 6 -
9 correspond to different choices for the interlayer-dependency tensor.In each figure, rows correspond to different choices for the community-detection algorithm, and columns correspond to different values of the copying probabilities (with the strength of interlayer dependencies increasing from left to right).All benchmark instantiations that we use for Figs.6-10 are available at https://dx.doi.org/10.5281/zenodo.3304059.

Figure 6
Figure 6.Multiplex networks with uniform interlayer dependencies.Effect of interlayer coupling strength ω and relaxation rate r on the ability of different community-detection algorithms to recover planted partitions as a function of the communitymixing parameter µ in benchmark networks with uniform multiplex dependencies (see Fig.2b).Each multilayer network has 1000 nodes and 15 layers, and each node is present in all layers.All NMI values are means over 10 runs of the algorithms and 100 instantiations of the benchmarks.(See the introduction of Section VI.)Each curve corresponds to the mean NMI values that we obtain for a given value of µ, and the shaded area around a curve corresponds to the minimum and maximum NMI values that we obtain with the 10 sample partitions for a given value of p.

Figure 7 .
Figure 7. Multiplex networks with nonuniform interlayer dependencies.Effect of interlayer coupling strength ω and relaxation rate r on the ability of different community-detection algorithms to recover planted partitions as a function of the community-mixing parameter µ in a multiplex benchmark with nonuniform interlayer dependencies.Each multilayer network has 1000 nodes and 15 layers, and each node is present in all layers.The layer-coupled interlayer-dependency matrix is a block-diagonal matrix with diagonal blocks of size 5 × 5.For each diagonal block, we set the value in the interlayer-dependency matrix to a value p (so each diagonal block has the same structure as the matrix in Fig.2b); for each off-diagonal block, we set the value to pc = 0, thereby incorporating an abrupt change in community structure.All NMI values are means over 10 runs of the algorithms and 100 instantiations of the benchmarks.Each curve corresponds to the mean NMI values that we obtain for a given value of µ, and the shaded area around a curve corresponds to the minimum and maximum NMI values that we obtain with the 10 sample partitions for a given value of p.

= 1 Figure 8 .
Figure 8. Temporal networks with uniform interlayer dependencies.Effect of interlayer coupling strength ω and relaxation rate r on the ability of different community-detection algorithms to recover planted partitions as a function of the communitymixing parameter µ in a temporal benchmark with uniform interlayer dependencies (i.e., p β = p ∈ [0, 1] for all β ∈ {2, . . ., l} in Fig.2a).Each multilayer network has 150 nodes and 100 layers, and each node is present in all layers.All NMI values are means over 10 runs of the algorithms and 100 instantiations of the benchmark.(See the introduction of Section VI.)Each curve corresponds to the mean NMI values that we obtain for a given value of µ, and the shaded area around a curve corresponds to the minimum and maximum NMI values that we obtain with the 10 sample partitions for a given value of p.

Figure 9 .
Figure 9. Temporal networks with nonuniform interlayer dependencies.Effect of interlayer coupling strength ω and relaxation rate r on the ability of different community-detection algorithms to recover planted partitions as a function of the community-mixing parameter µ in a temporal benchmark with nonuniform interlayer dependencies.Each multilayer network has 150 nodes and 100 layers, and each node is present in all layers.Every 25th layer (i.e., for layers 25, 50, and 75), we set the value of p β in Fig. 2a to pc = 0, thereby introducing an abrupt change in community structure; and we set all other values of p β in Fig. 2a to p.All NMI values are means over 10 runs of the algorithms and 100 instantiations of the benchmark.Each curve corresponds to the mean NMI values that we obtain for a given value of µ, and the shaded area around a curve corresponds to the minimum and maximum NMI values that we obtain with the 10 sample partitions for a given value of p.

Figure 11 .
Figure 11.Autocorrelation of the sampling Markov chain, as measured by mNMI.On the horizontal axis, we show the value of n upd that we use to generate the first partition in the comparison.The lag is the number of additional updates per state node that we use to generate the second partition in the comparison.We average results over a sample of 100 independent chains with different initial partitions.The shaded area indicates one standard deviation above and below the mean.Note that the large amount of noise for small values of the lag is not a result of differences between chains in the sample; instead, it is an inherent feature of all chains in the sample.

Figure 12 .
Figure 12.Convergence of the dependency pattern between induced partitions in different layers, as measured by the mean absolute distance d between NMI matrices (see Eq. (B2)).On the horizontal axis, we show the value of n upd that we use to generate the first partition in the comparison.The lag is the number of additional updates per state node that we use to generate the second partition in the comparison.We average results over a sample of 100 independent chains with different initial partitions.The shaded area indicates one standard deviation above and below the mean.
, r = S i,α .(D3) κ s,α β = i∈Ss|α e i,α β , κ β s,α = i∈Ss|α e β i,α , S s ∈ Sbe the expected layer-β-specific in-degree and out-degree of meso-set s in layer α; and let number of edges from layer α to layer β.(Note the consistency constraint on expected in-degrees and expected out-degrees.)It then follows that σ

Figure 15 .
Figure15.Effect of interlayer coupling strength ω and relaxation rate r on the ability of different community-detection algorithms to recover planted partitions as a function of the community-mixing parameter µ in a temporal benchmark with uniform interlayer dependencies (i.e., p β = p ∈ [0, 1] for all β ∈ {2, . . ., l} in Fig.2a).Each multilayer network has 150 nodes and 100 layers, and each node is present in all layers.

Figure 16 .
Figure16.Effect of interlayer coupling strength ω and relaxation rate r on the ability of different community-detection algorithms to recover planted partitions as a function of the community-mixing parameter µ in a temporal benchmark with nonuniform interlayer dependencies.Each multilayer network has 150 nodes and 100 layers, and each node is present in all layers.Every 25th layer (i.e., for layers 25, 50, and75), we set the value of p β in Fig.2ato pc = 0 (thereby incorporating an abrupt change in community structure); we set all other values of p β in Fig.2ato p.We define the null distributions in these experiments so that there is no overlap in community labels between different groups of layers.
The parameter n upd is the expected number of updates for each state node.
(We showed examples of such NMI matrices in Figs.4 and 5.) As with mNMI, the mean absolute distance d should become independent of n upd once the Markov chain has converged to a stationary distribution.This yields estimates for the minimum number of updates for convergence that are consistent with those from the results in Fig.11.It is always safe to use a larger number of updates to sample partitions, and the results from Figs. 11 and 12 suggest that our choice of n upd = 200 is conservative.
Pseudocode for generating partitions of a temporal network with uniform interlayer dependencies p between successive layers (i.e., p β = p for all β ∈ {1, . . ., l} in Fig.