Epidemic spreading in group-structured populations

Individuals involved in common group activities/settings -- e.g., college students that are enrolled in the same class and/or live in the same dorm -- are exposed to recurrent contacts of physical proximity. These contacts are known to mediate the spread of an infectious disease, however, it is not obvious how the properties of the spreading process are determined by the structure of and the interrelation among the group settings that are at the root of those recurrent interactions. Here, we show that reshaping the organization of groups within a population can be used as an effective strategy to decrease the severity of an epidemic. Specifically, we show that when group structures are sufficiently correlated -- e.g., the likelihood for two students living in the same dorm to attend the same class is sufficiently high -- outbreaks are longer but milder than for uncorrelated group structures. Also, we show that the effectiveness of interventions for disease containment increases as the correlation among group structures increases. We demonstrate the practical relevance of our findings by taking advantage of data about housing and attendance of students at the Indiana University campus in Bloomington. By appropriately optimizing the assignment of students to dorms based on their enrollment, we are able to observe a two- to five-fold reduction in the severity of simulated epidemic processes.


I. INTRODUCTION
From the Black Plague in the 13th century to the recent COVID-19 pandemic, epidemics have always represented significant threats for humanity [1].Network science has been at the core of many advances made in epidemic modeling, given that the fate of an epidemic process is fundamentally determined by the structure of the social network where spreading occurs [2][3][4][5].Notably, network structural properties play non-trivial roles in shaping not only the dynamics of spreading, but also mitigation strategies [2,[6][7][8][9][10].
Contacts of physical proximity that mediate spreading among individuals often occur in group activities/settings [5,11,12].For example in a college, two students can get in contact because they are enrolled in the same class and/or live in the same dorm.Similarly, physical proximity between school-going children happen in classroom/family settings.Realistic models of epidemic spreading are informed by data accounting for the existence of multiple types of group interactions that individuals are exposed to [13].This aspect is also accounted for in some recent theoretical meta-population models of epidemic spreading [5,11,12,[14][15][16].
A common feature of the above-mentioned studies is considering the exposure of individuals to multiple types of group interactions as an input rather than a free, or tunable, parameter of the spreading model.For example, Granell and Mucha assume that the groups of a population are given and then analytically study how the threshold value of an epidemic spreading in the group- * These authors contributed equally to this work.structured population depends on the mobility of individuals among groups [11].In this paper, we change perspective by focusing our attention on the effects that group structure and interrelation among groups have on the properties of a Susceptible-Infected-Recovered (SIR) epidemic process.To this end, we represent interaction patterns among individuals of a population as edge-colored graphs [17] with block/community structure [18].Each color or layer in these graphs represents a specific social setting where two individuals may get in physical proximity, e.g., a classroom or a dorm in the case of college students.Blocks or communities are instead used to model group interactions in the layers.In the example of a population of college students, blocks represent specific classes attended by students or specific dorms where students reside in.The framework allows us to control for the strength of the block structure of the layers, e.g., the likelihood that two students in the same class/dorm have a proximity contact, as well as for the similarity of groups across layers, e.g., the likelihood that two students live in the same dorm and attend the same class.We study the importance of these two factors in determining the fate of epidemic processes.
A study that is immediately related to ours is the one by Fan et al. [19].They consider epidemic spreading in edge-colored graphs that are embedded in geometric space.They numerically estimate the epidemic threshold for graphs with variable levels of correlation between the layers' embeddings, and find that such a quantity grows as geometric correlation increases.In other words, the severity of an epidemic is reduced in a network with geometrically correlated layers compared to a network whose layers are not correlated.
As there exists a tight analogy between geometric embedding and community structure [20], the work by Fan arXiv:2306.04733v2[physics.soc-ph]20 Dec 2023 et al. directly relates to ours.In our work, however, we go well beyond the state of the art by expanding the analysis by Fan et al. in two main, fundamental respects.First, we provide a full characterization of the importance of group strength and correlation in epidemic spreading by monitoring not only static quantities such as the size of the outbreak and the epidemic threshold, but also dynamical quantities such as the duration and intensity of the epidemic.Looking at dynamical metrics of epidemic severity turns out to be essential for a full understanding of how topological correlations affect spreading dynamics.In particular, there are cases where changes in group correlation do not lead to any variations in outbreak size and epidemic threshold, but to apparent variations of both duration and intensity.In addition, we consider the role of correlation in interventions for disease containment, an aspect that is not considered by Fan et al., but that actually displays a marked dependence on the topological correlations that characterize a network.We find that herd immunity in a network with correlated group structure can be achieved by immunizing a fraction of individuals that is even five-fold smaller than in the case of a network with non-correlated group structure.Second and more important, Fan et al. do not apply their approach to the analysis of real-world systems.The method is in fact limited by a series of challenges in practical applications.The approach requires knowledge of the network topology of the layers to perform the embedding in the hyperbolic space, but detailed network data are in many cases not readily available.Also, once the embedding is performed, manipulating the network to reduce the severity of a disease outbreak is not an intuitive task as it requires changing coordinates of nodes in the hyperbolic space.Our approach instead does not require knowledge of the network of contacts.We simply use information about groups' composition.This information is generally available in real systems.Further, interventions on the network are easy to formulate/implement as they require only changing the group assignments of nodes.In this paper, we actually demonstrate the utility of our proposed framework in the analysis of a real-world scenario of data concerning enrollment and housing of college students at the Indiana University campus in Bloomington.We show that a simple strategy of re-assigning students to housing facilities can (i) dramatically reduce the severity of an epidemic outbreak, both in terms of intensity and duration, as well as (ii) greatly enhance the effectiveness of interventions for disease containment.

A. Synthetic graphs
In Fig.  Here, the community structures of the layers are maximally correlated (NMI = 1.0, with NMI standing for normalized mutual information), meaning that there is a one-to-one map from shapes to colors.Connections among pairs of nodes are created such that the average degree is ⟨k (ℓ) ⟩ = 3; a fraction µ = 0.025 of these edges connects a node to other nodes outside its own community.Using these parameters, we generate edges in layer ℓ = 1 (full gray) and in layer ℓ = 2 (dashed red).(b) Same as in (a), but for uncorrelated community structure (NMI = 0.0), i.e., colors and shapes are assigned to nodes randomly.(c) We run V = 1, 000, 000 simulations of the SIR model on top of the graphs of panels (a) and (b) by setting the spreading rate β = 0.4, i.e., setting the basic reproduction number R0 = β(⟨k (1) ⟩ + ⟨k (2) ⟩) = 2.4.We display the fraction of infected nodes in the population as a function of time.We consider bins of size 0.05 and only report the average values over at least 100 surviving runs.(d) Same as in (c), but here we are displaying the fraction of recovered nodes in the population as a function of time.
on an edge-colored graph with N nodes [2,17].For simplicity, in most of our analysis we assume that there are only two layers of interactions.To have a concrete example in mind, think of a population of college students, where layer ℓ = 1 represents interactions in housing facilities, and layer ℓ = 2 represents classroom interactions.Individual layers of the network have pre-imposed block/community structure constructed according to the rules of the stochastic block model [18].Once more for simplicity, we assume the model to be homogeneous both in terms of degree and block sizes.Specifically, nodes' degrees in layer ℓ obey a Poisson distribution with average ⟨k (ℓ) ⟩.Also, each individual i is associated to a group σ ℓ) in layer ℓ, with all groups having the same size q (ℓ) .The mixing parameter µ quantifies the fraction of connections that each node has towards other nodes outside its own group.Please note that we use the same mixing parameter for both layers just for convenience, but this is not a requirement of our model.The correlation between the community structure of two layers is measured in terms of normalized mutual information (NMI) [21].Perfectly aligned communities generate large NMI values, whereas uncorrelated block structures have low NMI values.Concerning SIR dynamics, the recovery rate is set equal to one, whereas spreading events occur in both layers at rate β.The choice of using identical spreading rates for both layers is made for simplicity; the effective rate of spreading is anyway also a function of ⟨k (ℓ) ⟩.Initial conditions of the dynamics are such that all nodes are in the susceptible state, except for a single random node that is set in the infected state.
In the example of Fig. 1, we play only with one ingredient, that is the correlation of the community structure of the layers.Individual layers have a neat block structure, as they are generated for µ = 0.025.However, the two structures are in one case maximally correlated (NMI = 1.0) and in the other case completely uncorrelated (NMI = 0.0).In the example of the population of college students, correlated blocks occur when the conditional probability for two students to attend the same class given that they live in the same dorm is higher than the probability of two random students to attend the same class; uncorrelated community structure indicates that the two probabilities are equal.
The effect that correlation has on the outcome of SIR spreading is particularly interesting even for a small population like the one considered in Fig. 1.Outbreaks are comparable in size between the correlated and uncorrelated cases; however, graphs with correlated community structure display milder and longer epidemics than graphs with uncorrelated groups.In essence, topological correlations among the layer-wise community structure of the network may have an impact on how the epidemic unfolds, but not on its magnitude.This is a fundamentally important aspect that was not captured in the analysis by Fan et al. [19].
We study in a systematic manner the effect that the correlation among the community structure of the graph layers has on SIR spreading.To this end, we employ a simple algorithm that allows us to generate block structures anywhere between the maximally correlated and the uncorrelated configurations, see Methods for details.In Fig. 2, we display results for a graph with N = 10, 000 nodes.Model parameters are: ⟨k (1) ⟩ = 3, ⟨k (2) ⟩ = 10, q (1) = 5, q (2) = 25, and µ = 0.025.We consider three distinct values of the SIR spreading rate β = 0.2, 0.4 and 0.6, all of them corresponding to the endemic regime of the dynamics.As the figure clearly displays, increasing the correlation between the block structure of the layers reduces the severity of the epidemic.There is no appar-ent reduction in terms of outbreak size, unless the layer community partitions are very close to the configuration corresponding to maximum correlation.Severity reduction is instead mostly visible in terms of intensity and duration.Essentially, spreading slows down as the correlation of the partitions is increased, with a clear reduction in the peak value of the fraction of infected and an apparent increase in the total duration of the epidemic.
The above findings are valid for network layers that are sufficiently modular.When the strength of the community structure varies, nothing happens if the layers' partitions are uncorrelated, see Figs. 2d-f.However, if partitions are correlated, we observe an increase of epidemic severity as the community structure progressively becomes loose.Further, we observe an interesting trend in the metrics of epidemic severity as the number/size of communities is varied (Figs.S1 and S2).The trend is visible only if partitions are correlated, and community structure is sufficiently strong.In such a case, the outbreak size, the duration and the peak of infected nodes grow as the size of the blocks decreases.Once more, if partitions are not correlated, then no change in the values of metrics of epidemic severity is visible as the size/number of clusters is varied.All the above considerations are still valid even if we consider models with heterogeneous community sizes (Fig. S3) and node degrees (Fig. S4).Further, results for edge-colored graphs composed of three layers are almost identical to those obtained on two-layer edge-colored graphs (Fig. S5).
Also, we analyze how the two main ingredients of our network model, i.e., the strength of the community structure of the individual layers and the correlation among the community partitions of the layers, influence the effectiveness of immunization in suppressing an epidemic, see Fig. 3. To this end, we simply change the initial conditions of the dynamics by imposing that a fraction of randomly chosen nodes is set to the recovered state.All other nodes are in the susceptible state, except for one randomly chosen initial spreader that is set in the infected state.The effectiveness of random immunization is greatly enhanced when communities are correlated: at parity of spreading rate β, correlating the group structures leads to even a five-fold reduction in outbreak size and peak of infection compared to the uncorrelated case; also, the overall behavior of the system obtained for β = 0.6 in presence of group correlation is almost identical to the one observable for β = 0.2 in a population with uncorrelated group structure.A similar conclusion is reached by measuring the site-percolation thresholds of the networks, see Fig. S6.The site-percolation model is closely related to immunization, as the immunized nodes and their connections are effectively removed from the system and the spreading dynamics is governed by the network formed by the edges between the non-immunized nodes.
All the above results are obtained by means of numerical simulations.In the Supplemental Material (SM), we describe the derivation of two theoretical approx- FIG. 2. Epidemic spreading in synthetic group-structured populations.(a) We consider edge-colored graphs with N = 10, 000 nodes and parameters ⟨k (1) ⟩ = 3, q (1) = 5, ⟨k (2) ⟩ = 10, q (2) = 25, and µ = 0.025.We tune the correlation among the community structure of the layers by swapping community memberships of nodes as explained in the Methods section.We simulate SIR dynamics, and measure the size of the outbreak.We plot it as a function of the NMI between the layers' partitions.Results are averaged over V = 5, 000 repetitions.Different colors/symbols refer to results valid for different choices of the spreading rate β, i.e., β = 0.2, 0.4 and 0.6, all corresponding to supercritical spreading rates [see inset in panel (c)].The values of the reproduction number are R0 = 2.6, 5.2, and 7.8, respectively.(b) Same as in (a), but for the peak value of the fraction of infected.(c) Same as in (a), but for the duration of the epidemic.(d) We consider only the configurations corresponding to maximum (full curves, solid symbols) and minimum (dashed curves, transparent symbols) correlation among layers' partitions, and generate networks with variable mixing parameter µ.We plot the size of the outbreak as function of µ. Results represent averages over V = 5, 000 realizations of the model.(e) Same as in (d), but for the peak of the fraction of infected.(f) Same as in (d), but for the average duration of the epidemic.
imations: the individual-based mean-field approximation (IBMFA) and the group-based mean-field approximation (GBMFA).IBMFA is a standard theoretical approach that uses as input the topology of the network [2].GBMFA is an approximation introduced in this paper, inspired by the so-called degree-based mean-field approximation where nodes with identical degrees are treated as indistinguishable elements [2].The main difference in GBMFA is that classes of indistinguishable nodes are defined based on the layer-wise community structures.IBMFA and GBMFA generate similar predictions; one of these predictions is that the epidemic threshold equals −1 for any level of correlation existing between the layers' community structures , see Fig. S7.Qualitatively, this theoretical prediction supports our numerical findings on the weak dependence of the long-term metrics of epidemic severity from topological correlations among the network layers, highlighting once more fundamental differences existing between our results and those by Fan et al. [19].From the quantitative point of view, however, we find the mean-field approximations to be accurate enough only for populations with sufficiently large communities or sufficiently large aver-age degree, see Figs.S8-S11.These regimes, however, are of little importance in realistic settings where groups are generally much smaller than the size of the entire population and the average degree of individual nodes is small.

B. College housing/attendance network
The paper has thus far presented simulations on synthetic group-structured populations aimed at understanding the fundamental principles of epidemic spreading in these systems.In this section, we demonstrate the significance and effectiveness of these principles in a realworld scenario.We examine data about housing and attendance at Indiana University Bloomington (IUB), see Methods for details.There are N = 10, 132 individuals in the IUB dataset, each representing a student who resided in one of the campus facilities during the Fall 2019 semester.We use layer ℓ = 1 to represent contacts between students in housing settings.We form Q (1) = 396 groups composed of students living in the same floor of a large dormitory, or in the same Greek house.The average size of these groups is ⟨q (1)  to represent classroom interactions.We form Q (2) = 600 groups of students based on their enrollment program (e.g., Computer Science, Finance, Mathematics, Physics) and their education level (from Freshman to Ph.D.).The average size of these groups is ⟨q (2) ⟩ = 16.86.Note that the partitions within the classroom and residential lay-ers are disjoint.Moreover, the group sizes in the two partitions are not homogeneous, see Fig. S12.
The ground-truth community partitions display a correlation level similar to that of the uncorrelated case.Specifically, the NMI between the partitions obtained from the data is 0.389, which is comparable with the one obtained by randomizing the dormitory assignment of students which is 0.327.We also create an artificial configuration, where students are re-assigned to residences so that the NMI between the partitions of the two layers is maximized.The optimization technique we employ is described in the Methods section.The optimized configuration corresponds to the NMI value 0.934.
Given the three configurations described above, we generate modular network layers using the Lancichinetti-Fortunato-Radicchi (LFR) model [22], see Methods for details.Nodes' degrees obey Poisson distributions with average equal to ⟨k (1) ⟩ = ⟨k (2) ⟩ = 5.The strength of the community structure is controlled by the value of the mixing parameter, set here to µ = 0.025.We stress that the sizes of the communities in the two layers is always set equal to the one learned directly from the real data.
In Figs.4a-c, we display the metrics of epidemic severity as functions of the spreading rate.No difference between the ground-truth and randomized configurations is noticeable.The optimized configuration displays instead a less severe epidemic compared to the other two configurations.Also, we notice that random immunization becomes more effective when communities are correlated than when no correlation among the layers' partitions is present, Figs.4d-f.Qualitatively similar results are obtained also for µ = 0.1, see Fig. S13.

III. DISCUSSION
In this paper, we studied susceptible-infectedrecovered (SIR) spreading in group-structured populations.The real system that inspired our work is a population of college students that reside on campus during an academic semester.Recurrent interactions among students are due to group activities/settings, e.g., housing and class enrollment; each setting is responsible for the formation of network layers with modular structure.At the level of individual layers, modular structure is characterized by the strength and size of the blocks/communities that are present in the layer.In the example of the population of college students, these correspond to the size of classes or dorms, and to the level of mixing between students depending on their enrollment or housing assignment.Correlation among the layer-wise community structures is indicative instead for the likelihood of two nodes to belong to the same community in both layers, i.e., the conditional probability of two students to attend the same class given that they live in the same dorm.All these parameters play fundamental roles for the determination of properties characterizing the spread of an infectious disease within the population.
In the first part of the paper, we considered synthetic populations where the above ingredients can be finely tuned.By means of extensive numerical simulations, we showed that SIR dynamics is characterized by mild and long spreading processes if the communities of the individual layers are neat, and the correlation among the layer-wise community structure is sufficiently high.On contrary, weak and/or uncorrelated community structures generally correspond to short and intense outbreaks.Immunization in populations with correlated communities is also greatly more effective than in populations with uncorrelated layer-wise community structures.
In the second part of the paper, we took advantage of real data about housing/attendance of students at the Indiana University Bloomington campus, and numerically simulated SIR dynamics in this population.Groundtruth data correspond to layer-wise partitions that are uncorrelated.By appropriately re-assigning students to housing facilities on the basis of their enrollment profiles, we were able to show that the severity of an epidemic can be significantly reduced compared to the case in which no re-assignment is performed.The re-assignment strategy also enhances the effectiveness of random immunization in suppressing disease spreading.
The above results are obtained under the assumption that the layer-wise group structure is sufficiently neat, i.e., for sufficiently low values of the mixing parameter µ.This assumption can be empirically justified by analyzing publicly available datasets concerning contact networks within group-structured populations.First, we consider a contact network for a primary school.The dataset consists of events of physical proximity between students and teachers [23].Each event of physical proximity is associated to a duration.The system contains 242 individuals and 11 communities (10 classrooms + teachers).Two days of observation are included in the dataset: day one consists of 37, 414 contact events and day two of 40, 188 contact events.We find that the value of the weighted mixing parameter ⟨µ⟩, i.e., the average fraction of contact time spent by the students outside their classrooms, are ⟨µ⟩ = 2.3 × 10 −5 and ⟨µ⟩ = 2.6 × 10 −5 , respectively.Next, we look at a dataset concerning physical proximity in an office building [24].The dataset consists of 9, 827 contact events between 92 individuals organized in 5 departments (communities).We find ⟨µ⟩ = 9.1 × 10 −4 .
Our findings stem from the basic principle that the severity of an epidemic is proportional to the overall level of mixing between individuals of a population.Correlated modular structure in our framework corresponds to redundant interactions among subsets of individuals, which speeds up the propagation of a disease within a subset, but also slows down the diffusion across subsets thus in the overall population.Although having a radically different implementation, the mechanism of reducing the severity of an epidemic by enhancing group correlation is similar to the one at the basis of the mitigation strategy proposed by Meidan et al., consisting of an alternating lockdown policy that involves half of the population at a time [25].
A very important remark is that our findings are apparent only from numerical simulations of SIR dynamics, but they do not emerge from the numerical integration of mean-field equations aimed at approximating SIR dynamics.The mismatch between ground-truth dynamics and its approximations is particularly apparent in realistic settings where groups are small compared to the population size; also, the mismatch mostly concerns earlyand mid-stages of SIR dynamics, rather than its longterm features.
All our results are based on important simplifications concerning the spreading dynamics of realistic diseases as well as the contact networks that are the basis of the spread of infectious diseases within real-world populations.Nonetheless, we believe they provide some easyto-implement principles to reduce the severity of real epidemics that managers can take under consideration when planning group activities in schools, colleges or other large organizations.

A. Edge-colored block-structured graphs
We consider edge-colored graphs where N nodes are connected via L different types or layers of interactions [17].We use L = 2 in most of our analysis; some results for L = 3 are presented in the SM.The topology of the layer ℓ = 1, 2, . . ., L is fully specified by the adjacency matrix A (ℓ) .The generic element of this matrix A

Synthetic graphs
In the majority of our experiments on synthetic graphs, network layers are instances of the version of the stochastic block model (SBM) [18] named planted partition model [26].Inputs for the generation of the network layer ℓ are the average degree ⟨k (ℓ) ⟩, the mixing parameter µ (ℓ) , the size q (ℓ) of each community, and the vector ⃗ σ (ℓ) = (σ N ) denoting the community membership of individual nodes, i.e., node i belongs to community σ total number of blocks in layer ℓ; the output is the adjacency matrix A (ℓ) .The element A (ℓ) ij is a Bernoulli random variate equal to one with probability p j , and with probability p Please note that this is the only model used to generate edge-colored graphs with L = 3 layers.
In some of our experiments on synthetic graphs, we generate network layers using the Lancichinetti-Fortunato-Radicchi (LFR) model [22], i.e., a variant of the SBM designed to deal with heterogeneous community sizes and/or nodes' degrees.More specifically, in the set of experiments where we consider power-law distributed community sizes, we first generate the community sizes q (ℓ) 1 , q (ℓ) 2 , . . ., q (ℓ) Q (ℓ) .These are integer numbers randomly extracted from the distribution P (q) ∼ q −τ for q ∈ [q min , q max ] and P (q) = 0 otherwise, with τ , q min and q max input parameters of the model.Clearly, we have that Please note that the same sequence for the community sizes is used for all layers of the network.Nodes are assigned to communities according to their sizes to form the community vector ⃗ σ (ℓ) .For the generation of the topology of the graph layer ℓ, we first set the degree of each node to be a random variate extracted from a Poisson distribution with average ⟨k (ℓ) ⟩.
We then connect pairs of nodes at random, taking care of preserving for each node its pre-assigned degree value, as well as the ratio µ (ℓ) between inter-community and total connections.Here the average degree ⟨k (ℓ) ⟩ and the mixing parameter µ (ℓ) are input parameters of the model.In the set of experiments where we consider power-law degree distributions, the size of all communities in layer ℓ is equal to q (ℓ) , an input parameter of the model.We then assign degrees to nodes by extracting random numbers from the power-law distribution P (k) ∼ k −γ for k ∈ [k min , k max ] and P (k) = 0 otherwise, with γ, k min and k max input parameters of the model.The degree sequence of each layer is generated independently.We then connect pairs of nodes at random, taking care of preserving for each node its pre-assigned degree value, as well as the ratio µ (ℓ) between inter-community and total connections, with the mixing parameter µ (ℓ) being an input parameter of the model.

College housing/attendance network
Also, we consider a dataset containing information about class attendance of students who resided in the Indiana University Bloomington (IUB) campus during the Fall 2019 semester [27].The dataset comprises N = 10, 132 students.We use layer ℓ = 1 to represent housing, and group together students living in the same floor of a dormitory building or in the same Greek house.We form Q (1) = 396 groups of average size ⟨q (1) ⟩ = 25.58.Please note that the size of the various groups is not constant, see Fig. S12.We use layer ℓ = 2 to represent class interactions.Each student in the dataset is associated to one of the 218 degree programs (e.g., Law, Mathematics, Physics) and 6 education levels (i.e., Freshman, Sophomore, Junior, Senior, Master, and Ph.D.).On the basis of our data, we form Q (2) = 600 blocks of average size ⟨q (2) ⟩ = 16.86, by grouping together students belonging to the same degree program and education level.Note that not all possible compartments defined by the pairs degree program/education level contain at least one stu-dent.Also for layer ℓ = 2, the size of the various groups is not a constant, see Fig. S12.Network layers considered in the analysis of the IUB dataset are generated according to the LFR model.Inputs required for the generation of the topology of layer ℓ are the average degree ⟨k (ℓ) ⟩, the mixing parameter µ (ℓ) , and the community-membership vector ⃗ σ (ℓ) .For the generation of the topology of the graph layer ℓ, we first set the degree of each node to be a random variate extracted from a Poisson distribution with average ⟨k (ℓ) ⟩.We then connect pairs of nodes at random, taking care of preserving for each node its preassigned degree value, as well as the ratio µ (ℓ) between inter-community and total connections.

B. Measuring and tuning correlation among the community structures of the layers
We measure the similarity between the community structure of two network layers, namely ⃗ σ (1) and ⃗ σ (2) , in terms of normalized mutual information (NMI) [21].By definition, NMI values are in the range [0, 1]; however, the actual minimum and maximum values that NMI can assume depend on the size and number of clusters in the two partitions.Small values of the NMI are associated to uncorrelated partitions, whereas large NMI values indicate correlated partitions.

Synthetic graphs
In our systematic tests on synthetic network models that are constructed using the SBM, we generate perfectly correlated partitions as follows.First, we fix the size q (1) of the clusters in layer ℓ = 1; then we determine the size of the clusters in layer ℓ = 2 as q (2) = d q (1) , where d is a tunable integer parameter.For simplicity, we appropriately choose q (1) and d so that both q (1) and q (2) are divisors of N .For example, in many of our tests we have N = 10, 000, q (1) = 5, d = 5, and q (2) = 25.Finally, we set σ (ℓ) i = ⌊i/q (ℓ) ⌋ + 1 for all i = 1, . . ., N and for ℓ = 1, 2, with ⌊•⌋ floor function.
In the tests where we take advantage of the LFR model, the size of the communities is identical in both layers.We thus generate perfectly correlated partitions by setting σ for all nodes i in the network.Starting from the partitions ⃗ σ (1) and ⃗ σ (2) that are maximally correlated, their NMI is reduced by considering each node, and swapping with probability r its community membership in layer ℓ = 1 with that of another randomly selected node.For r = 0, no swaps are actually performed so that the correlation of the community partitions is preserved; for r = 1, community labels in layer ℓ = 2 are completely randomized compared to the initial configuration so that correlation between the layers' community memberships becomes minimal.
In the set of experiments where we consider edgecolored graphs composed of L = 3 SMB-generated net-work layers, all communities have the same size thus we set σ for all nodes i in the network to generate perfectly correlated partitions.To decrease correlation, we consider each node and swap with probability r its community membership in layers ℓ = 1 and ℓ = 2 with that of another randomly selected node.Please note that we only consider the cases r = 0 and r = 1 for this specific sets of experiments.

College housing/attendance network
To create completely uncorrelated partitions, we simply swap the housing assignment of each student with that of another randomly selected student.
To create a configuration with highly correlated partitions, we take advantage of the following greedy optimization technique.We define the tuple , where z v denotes the remaining availability of group v in layer ℓ.We initially set z , where q (ℓ) v , i.e., the size of group v in layer ℓ, is imposed by the IUB housing/attendance data.We set M = 0 denoting the number of elements that have been assigned to groups in the layers.We then iterate the following operations: 1. We find the group label r (ℓ) corresponding to the largest element in P(ℓ).We can either have r (1) ≥ r (2) , or r (2) ≥ r (1) .We indicate with r (ℓt) the larger of the two values, where ℓ t is the label of the corresponding layer; we use r (ℓs) to indicate the smaller of the two values, with ℓ s denoting the label of the corresponding layer.
4. Until all the individuals have been assigned to a group, i.e., M < N , we go back to point 1, otherwise, we end the iterative procedure.
In the above step 2, we maximize the overlap of the largest groups still available in the layers.This greedy choice is iterated multiple times until all individuals have been assigned to groups in both layers.At the end of the algorithm, all groups of a given layer will have the desired size.Since the labels of the nodes are completely arbitrary, the obtained partitions can be thought as equivalent to re-assigning students to dorms based on their enrollment.

C. SIR dynamics on edge-colored graphs
We consider continuous-time Susceptible-Infected-Recovered (SIR) dynamics on an edge-colored graph [19].Without loss of generality, we assume that the rate of recovery equals one.This is a standard setting as the behaviour of SIR models is determined by the ratio of the spreading and recovery rates rather than their raw values.We indicate with ρ i (t) = S, I or R the state of node i at time t.Our initial conditions are generally such that all nodes are in the S state, except for a randomly selected node v that is in state I, i.e., ρ i (t = 0) = S for i = 1, . . ., N and i ̸ = v, while ρ v (t = 0) = I.In some of our experiments, we also impose that a fraction f of randomly selected nodes is in the recovered state, i.e., ρ i (t = 0) = R with probability f for all i = 1, . . ., N .
Starting from an initial condition, the rules of SIR dynamics are as follows.A generic node i can change its state in the infinitesimal time interval [t, t + dt] if: (i) Node i is in the infected state and recovers with probability dt, i.e., ρ i (t) = I → ρ i (t + dt) = R with probability dt; (ii) Node i is in the susceptible state and gets infected with probability β (ℓ) dt by its infected neighbor j on layer ℓ, i.e., ρ i (t) = S → ρ i (t + dt) = I with probability β (ℓ) dt if ρ j (t) = I and A (ℓ) ij = 1.Please note that spreading events of type of (ii) can happen for all individual edges in all layers in the graph, thus if node i is in the S state at time t, the overall probability to get infected in the time interval i , where g ij δ(ρ j (t), I) is the number of infected neighbors that node i has in layer ℓ at time t with δ(x, y) = 1 if x = y and δ(x, y) = 0 otherwise.The dynamics proceeds until no nodes are in the infected state.The model is efficiently implemented via the Gillespie's algorithm [28].

Metrics of epidemic severity
We characterize the behavior of the SIR model on a given edge-colored graph by measuring the values of the following metrics: (i) The outbreak size, i.e., the fraction of nodes that are in state R at the end of the epidemic.
(ii) The peak of infection, i.e., the maximum value of the fraction of infected nodes that are simultaneously present in the network.
(iii) The epidemic duration, i.e., the instant of time when the last infected node recovers.
Given an edge-colored graph and the parameters β (ℓ) of the epidemic model, we simulate multiple times SIR dynamics.Values of the above metrics displayed in our figures correspond to their sample averages.
In the inset of Fig. 2(c), we approximate the pseudocritical epidemic threshold of the given edge-colored graph as the value of the spreading rate β corresponding to the maximum of the ratio between the sample standard deviation and the sample average of the outbreak size.

D. Code and data availability SUPPLEMENTAL MATERIAL Community sizes and epidemic dynamics on edge-colored graphs
In Fig. S1, we vary the group size in synthetic edgecolored graphs and show its effect on the metrics of epidemic severity.We look at the outbreak size, the fraction of the population infected during the peak, and the duration in the correlated and uncorrelated edge-colored graphs as a function of the spreading rate β.We set the community sizes q (1) = q (2) = q and consider three different values of q.We consider graphs with size N = 10, 000, mixing parameter µ = 0.025, and average degree ⟨k⟩ = 6.Furthermore, we consider the same setting and plot the outbreak size, the fraction of the population infected during the peak, and the duration in the correlated and uncorrelated edge-colored graphs as functions of the group size q in Fig. S2.
We also explore the effect of the heterogeneous community sizes on the outbreak.We consider edge-colored graphs with N = 10, 000 nodes, average degrees ⟨k (1) ⟩ = ⟨k (2) ⟩ = 6 and µ = 0.025.We choose the community sizes q from the power-law distribution P (q) ∼ q −τ for q ∈ [10, 100] and P (q) = 0 otherwise.We consider exponent values τ = 2, 3, and 4. Community sizes q 1 , q 2 , . . ., q Q are generated from the above powerlaw distribution and used in both layers to define their block structure.As a term of comparison, we also display results valid when communities have all identical size q = 10.We tune the correlation among the community structure of the layers by swapping community memberships of nodes as explained in the Methods section.We run V = 5, 000 simulations of the SIR dynamics, and display average values in Figure S3.

Edge-colored graphs with heterogeneous degree distributions
In Fig. S4, we explore the effect of heterogeneous degree distributions in synthetic group-structured populations using edge-colored graphs with N = 10, 000 nodes.We consider degrees following power-law degree distribution defined by P (k) ∼ k −γ for k ∈ [2, 10], and P (k) = 0 otherwise.We fix the mixing parameter µ = 0.025 and community sizes at q = 100.We vary the exponent values γ from 2 to 4 for graphs with either maximally correlated (solid curves) or uncorrelated (dotted curves and transparent symbols) group structures.We consider three values of the spreading rate β = 0.2, 0.4, and 0.6, corresponding to reproduction numbers R 0 = 2.4, 4.8, and, 7.2, respectively.We show (a) the outbreak size, (b) the peak infected population fraction, and (c) the total outbreak duration, all as functions of the exponent γ.These values are averages over V = 5, 000 simulations.We see that the outbreak size and the peak infected population reduce as the degree exponent γ increases, i.e., as the degree distribution becomes less heterogeneous.

Edge-colored graphs with three layers
We study spreading on three-layer edge-colored graphs with correlated and uncorrelated group structures.We consider networks with N = 10, 000, the average degrees on the three layers is set to ⟨k (1) ⟩ = ⟨k (2) ⟩ = ⟨k (3) ⟩ = 4, and the mixing parameter is set to µ = 0.025.We fix the group sizes q = 20 in all the layers and compare the case where the three layers have a completely correlated and uncorrelated community structure.We tune the pairwise correlation among the community structure of the layers by swapping community memberships of nodes as explained in the Methods section.We run V = 5, 000 simulations of the SIR dynamics and display average values in Figure S5.We further compare the results obtained on the three-layer edge-colored graphs with the two-layer case.To make the results comparable, we set ⟨k (1) ⟩ = ⟨k (2) ⟩ = 6, so that the overall average degree of the nodes is the same as for the three-layer edge-colored graphs.

Site percolation on edge-colored graphs
In the ordinary site-percolation model, nodes are randomly occupied with probability p.For p = 0, no nodes are present in the system, so all clusters have null size; for p = 1, only one single cluster, coinciding with the whole network, is present.The term percolation transition refers to the structural change, between these two extreme configurations, observed as a function of the occupation probability p. Please note that no actual phase transition occurs in a network of finite size.We can, however, still well describe the overall change in connectedness of the network by looking at how the relative size of its largest cluster varies with p, and identify the pseudocritical point p c as value of the occupation probability p where the ratio of the standard deviation and the average value of the largest cluster, across multiple realizations of the model, is maximal.
To mimic the effect of random immunization on edgecolored graphs, we compute the site-percolation threshold of the aggregated version of the graph, i.e., the singlelayer network where nodes are connected if there exists an edge between them in any of the layers.
We use the same experimental setting as in Figs. 2 (d-f) of the main paper: number of nodes N = 10, 000 nodes, average degrees k (1) = 3 and k (2) = 10, community sizes q (1) = 5 and q (2) = 25.We consider edge-colored graphs with label shuffling probabilities r = 0 (correlated community structure) and r = 1 (uncorrelated community structure).We plot the site-percolation thresholds of the aggregated edge-colored graphs with correlated (blue circles) and uncorrelated (orange triangles) community structures as a function of the mixing parameter (µ) in Epidemic spreading in synthetic group-structured populations with variable group sizes.We consider edge-colored graphs with N = 10, 000 nodes and label shuffling probabilities r = 0 (correlated community structure) and r = 1 (uncorrelated community structure).Graphs with ⟨k (1) ⟩ = ⟨k (2) ⟩ = 6 and µ = 0.025 are considered.We set the community sizes q (1) = q (2) = q, and consider q = 20, 100, and 400.The solid and dotted curves show the indicated epidemic properties as a function of the spreading rate β for edge-colored graphs with correlated and uncorrelated community structures, respectively.We run V = 5, 000 simulations of the SIR dynamics, and display average values in the plot.(a) Outbreak size, (b) fraction of the infected population at the peak, and (c) total duration of the outbreak are plotted as functions the spreading rate β.Epidemic spreading in synthetic group-structured populations with variable group sizes.We consider edge-colored graphs with N = 10, 000 nodes and label shuffling probabilities r = 0 (correlated community structure) and r = 1 (uncorrelated community structure).Graphs with ⟨k (1) ⟩ = ⟨k (2) ⟩ = 6 and µ = 0.025 are considered.We set the community sizes q (1) = q (2) = q, and vary q from 10 to 1250.The solid and dotted curves show the indicated epidemic properties as a function of the group size q for edge-colored graphs with correlated and uncorrelated community structures, respectively.We run V = 5, 000 simulations of the SIR dynamics, and display average values in the plot.(a) Outbreak size, (b) fraction of the infected population at the peak, and (c) total duration of the outbreak are plotted as functions of the spreading rate β.Fig. S6.We use the Newman-Ziff algorithm to efficiently simulate the site-percolation model [29].

MEAN-FIELD APPROXIMATIONS
We consider two mean-field approximations for continuous-time susceptible-infected-recover (SIR) dynamics [2] on an edge-colored graph composed of two layers [17].Each layer is generated according to the planted partition model (PPM) [26].Nodes are partitioned in Q groups of identical size q = N/Q.We denote with σ (ℓ) i = 1, . . ., Q the community membership of node i in layer ℓ.Two generic nodes i and j in layer ℓ are connected with probability

and with probability
The parameter ⟨k⟩ denotes the average number of con-nections of nodes in a layer; µ is the mixing parameter of the PPM.
We initially assume that σ (2) i for all i = 1, . . ., N .This ensures perfect correlation among the layer-wise partitions.Correlation is reduced by shuffling, with probability 0 ≤ r ≤ 1, the community membership of each node in layer ℓ = 1 with that of another randomly selected node.For r = 0, partitions are still perfectly correlated; for r = 1, no correlation exists between the two partitions.
We denote with β and γ the spreading and recovery rates of SIR dynamics, respectively.
Group-based mean-field approximation (GBMFA) Since the system is perfectly symmetric, meaning that the number of communities in each layer equals Q, and FIG.S3.Epidemic spreading in synthetic group-structured populations with heterogeneous group sizes.We consider edge-colored graphs with N = 10, 000 nodes.Graphs with ⟨k (1) ⟩ = ⟨k (2) ⟩ = 6 and µ = 0.025 are considered.We choose the community sizes q from the power-law distribution P (q) ∼ q −τ for q ∈ [10, 100] and P (q) = 0 otherwise.We consider exponent values τ = 2, 3, and 4. As a term of comparison, we also display results valid when communities have all identical size q = 10.We fix the spreading rate to β = 0.2 corresponding to reproduction number R0 = 2.4.We tune the correlation among the community structure of the layers by swapping community memberships of nodes as explained in the Methods section.We run V = 5, 000 simulations of the SIR dynamics, and display average values in the plot.(a) Outbreak size, (b) fraction of the infected population at the peak, and (c) total duration of the outbreak are plotted as functions of the normalized mutual information between the two group structures.(a) We consider edge-colored graphs with N = 10, 000 nodes.Nodes in the layers are organized in communities of size q = 100, and their degrees are chosen at random from a power-law degree distribution given by P (k) ∼ k −γ for k [2, √ q] and P (k) = 0 otherwise.The mixing parameter is µ = 0.025.We consider the cases when the layer-wise community structures are maximally correlated (solid curves and solid symbols) and when they are uncorrelated (dotted curves and transparent symbols).We plot (a) the outbreak size, (b) the fraction of the infected population at the peak, and (c) the total duration of the outbreak as functions of the exponent γ.Different combinations of colors and symbols are obtained for different values of the spreading rate β.Results are obtained over V = 5, 000 simulations.(d-f) Same as in (a), (b), and (c), respectively, but for q = 1, 000.
the probability of observing respectively within-group and between-group connections are p in and p out , then we can consider that there are only two types of nodes: 1) nodes with identical community labels and 2) nodes with different community labels.Let us indicate with n 1 the number of nodes in each of the groups with identi-cal labels, and with n 2 the number of nodes in each of the groups with distinct labels.There are a total of Q groups of nodes with identical labels, and Q 2 − Q groups of nodes with distinct group labels.We consider edge-colored graphs with N = 10, 000 nodes.We consider two versions of the system, respectively composed of two and three layers.In both versions, the group sizes q are set equal to 20 in all the layers and the mixing parameter is µ = 0.025.In the two-layer case, we fix ⟨k (1) ⟩ = ⟨k (2) ⟩ = 6, whereas in the three-layer case we have ⟨k (1) ⟩ = ⟨k (2) ⟩ = ⟨k (3) ⟩ = 4.We contrast the cases where the layers have a completely correlated and uncorrelated community structure.We tune the pairwise correlation among the community structure of the layers by swapping community memberships of nodes as explained in the Methods section.We run V = 5, 000 simulations of the SIR dynamics and display average values in the plot.S6.Effect of community correlation on the sitepercolation threshold of edge-colored graphs.Edgecolored graphs with label shuffling probability r = 0 (correlated community structure) and r = 1 (uncorrelated community structure) are considered.The number of nodes N = 10, 000 nodes, average degrees k (1) = 3 and k (2) = 10, community sizes q (1) = 5 and q (2) = 25 in the two layers.The site percolation thresholds of the aggregated edge-colored graphs with correlated (blue circles) and uncorrelated (orange triangles) community structures are plotted as a function of the mixing parameter (µ).
The expected number of nodes of type 1 is Before nodes are re-assigned, there are N/Q nodes with identical community labels (a, a) for each a value.Out of these nodes, (1 − r)N/Q are not re-assigned.Among those that are re-assigned, i.e., rN/Q × 1/Q are reassigned to their original cluster, thus the second term.The expected number of nodes of type 2 is obtained from Connectivity matrix for type-1 and type-2 nodes.
We first focus on the number of spreading events that, on average, a generic node of type 1 is involved with in the network.Indicate with (a, a) the cluster labels of this generic node.We can identify four main types of spreading events that this node has with other nodes in the network: 1.A spreading event between a node of type 1 with community labels (a, a) and another node of type 1 with community label (a, a) happens at rate 2βp in .This follows from the fact that the two nodes are connected with probability p in in each layer, and that there are two layers.There are n 1 − 1 of such potential spreading events as there are other n 1 − 1 nodes with community labels (a, a).

2.
A spreading event between a node of type 1 with cluster labels (a, a) and another node of type 1 with community label (b, b), with a ̸ = b, happens at rate 2βp out .There are (Q − 1)n 1 of such potential spreading events as there are other Q − 1 pairs of community labels (b, b), with a ̸ = b, each containing n 1 nodes.
is the total number of pairs of distinct community labels, and we discounted from this number the other communities already considered at point 3.
We summarize the various types of spreading events involving a node of type 1 in Table S1.
TABLE S1.Connectivity matrix for a node of type 1 with community labels (a, a).
We can now repeat a similar enumeration for a node of type 2 with community labels (a, b) and a ̸ = b.We can identify five main types of spreading events that this node has with other nodes in the network: 1.A spreading event between a node of type 2 with community labels (a, b), with a ̸ = b, and another node of type 1 with community labels (a, a) or (b, b) happens at rate β(p in + p out ).The term p in is due to the fact that the two nodes belong to the same community in one of the layers; the term p out is due to the fact that the two nodes belong to distinct communities in one of the layers.There are 2n 1 of such potential spreading events as there are only 2 groups of this type each composed of n 1 nodes.
is the total number of pairs of distinct community labels, and we discounted from this number the other clusters already considered at points 3 and 4.
We summarize the various types of spreading events involving a node of type 2 in Table S2.

Mean-field equations
The differential equation that determines the evolution of individual nodes of type 1 and 2 are respectively where S x and I x are the probabilities that a generic node of type x = 1, 2 is susceptible or infected, respectively.We consider edgecolored graphs with N = 1, 000 nodes and label shuffling probabilities r = 0 (correlated community structure, blue circles) and r = 1 (uncorrelated community structure, orange triangles).We set the community q (1) = q (2) = q = 100, the mixing parameter to µ = 0.025 and the overall average degree ⟨k⟩ = 10.One layer has average degree ⟨k (1) ⟩, whereas the other has average degree ⟨k (2) ⟩ = ⟨k⟩ − ⟨k (1) ⟩.Once the graph has been generated, we numerically estimate the largest eigenvalue of the adjacency matrix λmax.We plot λmax vs. ⟨k of the epidemic curves obtained from numerical simulations is due to the increasingly smaller number of surviving runs that are obtained as time increases.A total of V = 5, 000 simulations are run for each configuration.In all cases, SIR dynamics is started from an initial conditions in which all nodes are in the S state, except for a randomly chosen node in the I state.
In Fig. S11, we compare mean-field predictions of the size of outbreak, the peak value of infected, and the duration of the epidemic with estimates from numerical simulations (averages over V = 1, 000 realizations).Please note that the average of the peak value of the fraction of infected does not correspond to those visualized in Fig. S8.In the measurements done in Fig. S11, the peak value can occur at any instant of time during SIR dynamics, instead the results of Fig. S8 display averages at specific ranges of time.

Group sizes for the IUB data
We take advantage of data about housing and attendance at Indiana University Bloomington (IUB) to construct realistic edge-colored group-structured graphs.There are N = 10, 132 individuals in the IUB dataset, each representing a student who resided in one of the campus facilities during the Fall 2019 semester.We use layer ℓ = 1 to represent contacts between students in housing settings.We form Q (1) = 396 groups composed of students living on the same floor of a large dormitory, or in the same Greek house.The average size of these groups is ⟨q (1) ⟩ = 25.58.Layer ℓ = 2 is used to represent classroom interactions.We form Q (2) = 600 groups of students based on their enrollment program (e.g., Computer Science, Finance, Mathematics, Physics) and their education level (from Freshman to Ph.D.).The average size of these groups is ⟨q

FIG. 1 .
FIG.1.Modeling framework and metrics of epidemic severity.(a) As a basic illustration of the modeling framework, we consider an edge-colored graph composed of N = 200 nodes and two layers of interactions.Nodes are organized in four groups of size q (ℓ) = 50 for both layer ℓ = 1 and ℓ = 2. Colors of the nodes (i.e., orange, purple, blue, green) represent the community structure in layer ℓ = 1; shapes (i.e., circle, square, triangle, diamond) indicate the community memberships in layer ℓ = 2. Here, the community structures of the layers are maximally correlated (NMI = 1.0, with NMI standing for normalized mutual information), meaning that there is a one-to-one map from shapes to colors.Connections among pairs of nodes are created such that the average degree is ⟨k (ℓ) ⟩ = 3; a fraction µ = 0.025 of these edges connects a node to other nodes outside its own community.Using these parameters, we generate edges in layer ℓ = 1 (full gray) and in layer ℓ = 2 (dashed red).(b) Same as in (a), but for uncorrelated community structure (NMI = 0.0), i.e., colors and shapes are assigned to nodes randomly.(c) We run V = 1, 000, 000 simulations of the SIR model on top of the graphs of panels (a) and (b) by setting the spreading rate β = 0.4, i.e., setting the basic reproduction number R0 = β(⟨k(1) ⟩ + ⟨k(2) ⟩) = 2.4.We display the fraction of infected nodes in the population as a function of time.We consider bins of size 0.05 and only report the average values over at least 100 surviving runs.(d) Same as in (c), but here we are displaying the fraction of recovered nodes in the population as a function of time.

FIG. 3 .FIG. 4 .
FIG.3.Immunization in synthetic group-structured populations.(a) We consider the same experimental setting as in the bottom row of Fig.2.We set the mixing parameter µ = 0.025 and consider three values of the spreading rate β, i.e., 0.2, 0.4, and 0.6, corresponding to the reproduction number R0 = 2.6, 5.2, and 7.8, respectively.We change, however, the initial condition of the dynamics, by immunizing a random fraction nodes.We then plot the size of the outbreak as a function of the fraction of immunized nodes.Results are obtained by averaging the outcome of V = 5, 000 repetitions of the epidemic process.(b) Same as in (a), but for the peak fraction of infected nodes.(c) We rescale the abscissa values of panel (a) by the outbreak size that is observed when a null fraction of nodes is immunized.
ji = 1 if nodes i and j are connected, and A FIG. S1.Epidemic spreading in synthetic group-structured populations with variable group sizes.We consider edge-colored graphs with N = 10, 000 nodes and label shuffling probabilities r = 0 (correlated community structure) and r = 1 (uncorrelated community structure).Graphs with ⟨k (1) ⟩ = ⟨k (2) ⟩ = 6 and µ = 0.025 are considered.We set the community sizes q (1) = q (2) = q, and consider q = 20, 100, and 400.The solid and dotted curves show the indicated epidemic properties as a function of the spreading rate β for edge-colored graphs with correlated and uncorrelated community structures, respectively.We run V = 5, 000 simulations of the SIR dynamics, and display average values in the plot.(a) Outbreak size, (b) fraction of the infected population at the peak, and (c) total duration of the outbreak are plotted as functions the spreading rate β.
FIG. S2.Epidemic spreading in synthetic group-structured populations with variable group sizes.We consider edge-colored graphs with N = 10, 000 nodes and label shuffling probabilities r = 0 (correlated community structure) and r = 1 (uncorrelated community structure).Graphs with ⟨k (1) ⟩ = ⟨k (2) ⟩ = 6 and µ = 0.025 are considered.We set the community sizes q (1) = q (2) = q, and vary q from 10 to 1250.The solid and dotted curves show the indicated epidemic properties as a function of the group size q for edge-colored graphs with correlated and uncorrelated community structures, respectively.We run V = 5, 000 simulations of the SIR dynamics, and display average values in the plot.(a) Outbreak size, (b) fraction of the infected population at the peak, and (c) total duration of the outbreak are plotted as functions of the spreading rate β.

4 FIG
FIG. S4.Epidemic spreading in synthetic group-structured populations with heterogeneous degree distributions.(a) We consider edge-colored graphs with N = 10, 000 nodes.Nodes in the layers are organized in communities of size q = 100, and their degrees are chosen at random from a power-law degree distribution given by P (k) ∼ k −γ for k [2,√ q] and P (k) = 0 FIG.S5.Epidemic spreading in synthetic group-structured populations.We consider edge-colored graphs with N = 10, 000 nodes.We consider two versions of the system, respectively composed of two and three layers.In both versions, the group sizes q are set equal to 20 in all the layers and the mixing parameter is µ = 0.025.In the two-layer case, we fix ⟨k(1) ⟩ = ⟨k (2) ⟩ = 6, whereas in the three-layer case we have ⟨k(1) ⟩ = ⟨k(2) ⟩ = ⟨k (3) ⟩ = 4.We contrast the cases where the layers have a completely correlated and uncorrelated community structure.We tune the pairwise correlation among the community structure of the layers by swapping community memberships of nodes as explained in the Methods section.We run V = 5, 000 simulations of the SIR dynamics and display average values in the plot.(a) Outbreak size, (b) fraction of the infected population at the peak, and (c) total duration of the outbreak are plotted as functions of the spreading rate β.
FIG.S5.Epidemic spreading in synthetic group-structured populations.We consider edge-colored graphs with N = 10, 000 nodes.We consider two versions of the system, respectively composed of two and three layers.In both versions, the group sizes q are set equal to 20 in all the layers and the mixing parameter is µ = 0.025.In the two-layer case, we fix ⟨k(1) ⟩ = ⟨k (2) ⟩ = 6, whereas in the three-layer case we have ⟨k(1) ⟩ = ⟨k(2) ⟩ = ⟨k (3) ⟩ = 4.We contrast the cases where the layers have a completely correlated and uncorrelated community structure.We tune the pairwise correlation among the community structure of the layers by swapping community memberships of nodes as explained in the Methods section.We run V = 5, 000 simulations of the SIR dynamics and display average values in the plot.(a) Outbreak size, (b) fraction of the infected population at the peak, and (c) total duration of the outbreak are plotted as functions of the spreading rate β.

3 . 4 .
A spreading event between a node of type 1 with community labels (a, a) and another node of type 2 with community labels (a, b) or (c, a), with b ̸ = a and c ̸ = a, happens at rate β(p in + p out ).The term p in is due to the fact that the two nodes belong to the same community in one of the layers; the term p out is due to the fact that the two nodes belong to distinct communities in one of the layers.There are 2(Q − 1)n 2 of such potential spreading events.There are in fact Q−1 groups with labels (a, b) and b ̸ = a, and other Q − 1 groups with labels (c, a) and c ̸ = a.Each of these groups contain n 2 nodes on average.Finally, a spreading event between a node of type 1 with community labels (a, a) and another node of type 2 with community labels (b, c), with c ̸ = b, c ̸ = a and b ̸ = c, happens at rate 2βp out and there are (Q 2 − 3Q + 2) n 2 potential events of this type.The coefficient Q Type of node Number of contacts Effective spreading rate 1, (a, a) n1 − 1 2βpin 1, (b, b)

2 .
A spreading event between a node of type 2 with community labels (a, b), with a ̸ = b, and another node of type 1 with community labels (c, c), with a ̸ = c and b ̸ = c, happens at rate 2βp out .This follows from the fact that the two nodes are connected with probability p out in each of the two layers.There are (Q−2)n 1 of such potential spreading events as there are only Q − 2 groups of this type each composed of n 1 nodes.3.A spreading event between a node of type2 with community labels (a, b), with a ̸ = b, and another Type of node Number of contacts Effective spreading rate 1, (a, a) ∨ (b, b)

5 .
Finally, a spreading event between a node of type 2 with community labels (a, b), with a ̸ = b, and another node of type 2 with community labels (c, d), with a ̸ = c, b ̸ = d and c ̸ = d, happens at rate βp out .This follows from the fact that the two nodes are connected with probability p out in each of the two layers.There are (Q 2 − 3CQ + 3) n 2 potential events of this type.The coefficient Q FIG. S7.Theoretical prediction of the epidemic threshold in group-structure populations.(a) We consider edgecolored graphs with N = 1, 000 nodes and label shuffling probabilities r = 0 (correlated community structure, blue circles) and r = 1 (uncorrelated community structure, orange triangles).We set the community q (1) = q (2) = q = 100, the mixing parameter to µ = 0.025 and the overall average degree ⟨k⟩ = 10.One layer has average degree ⟨k(1) ⟩, whereas the other has average degree ⟨k(2) ⟩ = ⟨k⟩ − ⟨k(1) ⟩.Once the graph has been generated, we numerically estimate the largest eigenvalue of the adjacency matrix λmax.We plot λmax vs. ⟨k(1) ⟩.The dashed horizontal line indicates λmax = ⟨k⟩.(b) Same as in (a), but for µ = 0.3.(c) Same as in (a), but for q = 25.(d) Same as in (a), but for ⟨k⟩ = 100 and q = 250.(e) Same as in (d), but for µ = 0.3.(f) Same as in (d), but for q = 100.

( 2 )
FIG. S10.Comparison between mean-field predictions and numerical estimates of the epidemic curves.Same as in Fig.S8, but for ⟨k⟩ = 40.
1, we provide an example of our modeling framework, see Methods for details.Continuous-time Susceptible-Infected-Recovered (SIR) dynamics happens

TABLE S2 .
Connectivity matrix for a node of type 2 with community labels (a, b) with a ̸ = b.node of type 2 with community labels (a, b) happens at rate 2βp in .This follows from the fact that the two nodes are connected with probability p in in each of the two layers.There are n 2 of such potential spreading events as there is only 1 group of this type each composed of n 2 nodes.4.A spreading event between a node of type 2 with community labels (a, b), with a ̸ = b, and another node of type 2 with community labels (a, c) or (c, b), with a ̸ = c and b ̸ = c, happens at rate β(p in +p out ).This follows from the fact that the two nodes are connected with probability p in in one layer, and with probability p out in the other layer.There are are 2(Q − 2)n 2 of such potential spreading events as there are only 2(Q − 2) groups of this type each composed of n 2 nodes.