Super-Resolution Community Detection for Layer-Aggregated Multilayer Networks

Applied network science often involves preprocessing network data before applying a network-analysis method, and there is typically a theoretical disconnect between these steps. For example, it is common to aggregate time-varying network data into windows prior to analysis, and the trade-offs of this preprocessing are not well understood. Focusing on the problem of detecting small communities in multilayer networks, we study the effects of layer aggregation by developing random-matrix theory for modularity matrices associated with layer-aggregated networks with $N$ nodes and $L$ layers, which are drawn from an ensemble of Erdős–Rényi networks with communities planted in subsets of layers. We study phase transitions in which eigenvectors localize onto communities (allowing their detection) and which occur for a given community provided its size surpasses a detectability limit $K^*$. When layers are aggregated via a summation, we obtain $K^* \propto \mathcal{O}(\sqrt{NL}/T)$, where $T$ is the number of layers across which the community persists. Interestingly, if $T$ is allowed to vary with $L$, then summation-based layer aggregation enhances small-community detection even if the community persists across a vanishing fraction of layers, provided that $T/L$ decays more slowly than $\mathcal{O}(L^{-1/2})$. Moreover, we find that thresholding the summation can, in some cases, cause $K^*$ to decay exponentially, decreasing by orders of magnitude in a phenomenon we call super-resolution community detection. In other words, layer aggregation with thresholding is a nonlinear data filter enabling detection of communities that are otherwise too small to detect. Importantly, different thresholds generally enhance the detectability of communities having different properties, illustrating that community detection can be obscured if one analyzes network data using a single threshold.


I. INTRODUCTION
Network-based modeling provides a powerful framework for analyzing high-dimensional data sets and complex systems [1]. Often, a network is best represented by a set of network layers that encode different types of interactions, such as categorical social ties [2] or a network at different instances in time [3], and an important pursuit involves extending network theory to the multilayer setting [4,5]. Sometimes, however, a multilayer framework can require too much computational overhead or can represent an overmodeling (e.g., when the layers are correlated, either in terms of the edge overlap [6] or other properties [7–9]), and it can be beneficial to aggregate layers [9–11]. In particular, aggregation provides a crucial step for analyzing temporal network data, which is often binned into time windows [12,13] (see Fig. 1). Layer aggregation and other types of network preprocessing (e.g., sparsification [14], network inference [15], and denoising [16,17]) can greatly influence the resulting network structure, which in turn influences the outcomes of network analyses and their many applications. In general, there remains a significant need for improved theoretical understanding of how such network preprocessing influences network-analysis methodology.
We study the effects of layer aggregation on community detection, one of the most widely used methods for studying social, biological, and physical networks [18–21]. Communities are typically studied as dense subgraphs and can represent, for example, coordinating neurons in the brain [13] or a social clique [22] in a social network. (Hereafter, we restrict our usage of the term "clique" to the graph-theoretical meaning of a subgraph with all-to-all coupling.) Of particular interest is the detection of small-scale communities, which is a paradigmatic pursuit for anomaly detection within the fields of signal processing and cybersecurity [23–28]. In this context, small communities can represent anomalous events such as attacks [23], intrusions [24], and fraud [25].
Given these and many other applications, there is great interest in understanding fundamental limitations on community detection [11,26–36]. We highlight recent detectability results for multilayer [10,11,37] and temporal networks [29]. It is worth noting that much of the detectability research has focused on large-scale communities whose sizes are $\mathcal{O}(N)$, where $N$ is the number of nodes in the network [29–35], and the phase transitions are typically driven by varying the prevalence (e.g., edge density) of the communities. In contrast, detectability phase transitions for small communities can also be triggered by varying their size $K$ [11,26–28] and are thus a type of resolution limit [36]. We note that the literatures on detectability and resolution limits have developed independently, and there is a need for a better understanding of the relationship between these topics. In particular, a planted clique in a single-layer Erdős–Rényi (ER) network is detectable via a spectral analysis only if its size $K$ surpasses a detectability limit $K^* \propto \mathcal{O}(\sqrt{N})$ [26], in which case a dominant eigenvector (here, the one corresponding to the second-largest eigenvalue of the adjacency matrix) localizes onto the clique. Extending previous research for the detectability of a clique planted in single-layer networks [26–28] and a clique that persists across all layers of a multilayer network [11], herein we study the detectability of small communities (including, but not limited to, cliques) planted in a subset of layers in a multilayer network.
With the application of detecting small communities in mind, we study the effects of layer aggregation as a network preprocessing step. We first ask a foundational question: Across how many layers must a community persist in order for layer aggregation to benefit detection? To this end, we study a multilayer network model in which small communities are hidden in network layers generated as ER networks with $N$ nodes and $L$ layers with (possibly) heterogeneous edge probabilities. We study detectability phase transitions wherein eigenvectors localize onto communities, which we analyze by developing random matrix theory for the eigenvectors of modularity matrices associated with an aggregation of the layers. When the aggregation is given by summation of the adjacency matrices, the detectability phase transition occurs when a community's size $K \ll N$ surpasses a critical value $K^* \propto \mathcal{O}(\sqrt{NL}/T)$, where $T$ is the number of layers across which a community persists. Note that if $T$ depends on $L$, then summation-based layer aggregation benefits small-community detection even if the fraction $T/L$ of layers containing the community vanishes, provided that the fraction decays more slowly than $\mathcal{O}(L^{-1/2})$.
We additionally study network preprocessing via thresholding; that is, we threshold a summation of layers' adjacency matrices at some value $\tilde{L}$ so that there exists an unweighted edge between two nodes in the aggregated network if and only if there exist at least $\tilde{L}$ edges between them across the $L$ layers. While it is well known that thresholding can be used to simultaneously sparsify and dichotomize a network, here we introduce thresholding as a nonlinear data filter [38] for enhancing small-community detection. Specifically, we find that thresholding can, in some cases, reduce $K^*$ by orders of magnitude, revealing communities that are otherwise too small to detect. We call this phenomenon super-resolution community detection and show, for clique detection in sparse networks, that $K^*$ decays exponentially with $T^2/L$ for threshold $\tilde{L} = T$. Importantly, we find that different thresholds enhance the detection of communities with different properties (e.g., size and edge density), illustrating how community structure can be obscured if one uses a single threshold, which is an important insight for network preprocessing in general.
The remainder of this paper is organized as follows. In Sec. II, we further specify our model. In Sec. III, we study the effects of layer aggregation on detectability phase transitions characterized by eigenvector localization. In Sec. IV, we highlight implications of our findings with a numerical experiment involving small-community detection in a temporal network. We provide a discussion in Sec. V.

II. MODEL

A. Multilayer networks with planted small communities
We generate $L$ network layers with $N$ nodes so that each layer $l \in \{1, \dots, L\}$ is an ER random graph with edge probability $p_l \in (0,1)$, which is allowed to vary across the layers. We plant $R$ communities via the following process. For $r \in \{1, \dots, R\}$, uniformly at random, we select a set $\mathcal{T}_r \subset \{1, \dots, L\}$ of layers and a set $\mathcal{K}_r \subset \mathcal{V} = \{1, \dots, N\}$ of nodes, and we define an edge probability $\rho_r$. The variable $K_r = |\mathcal{K}_r| \ll N$ denotes the size of community $r$, and we refer to $T_r = |\mathcal{T}_r|$ as its persistence across network layers. Then, for each $r$, we construct a dense subgraph between nodes $\mathcal{K}_r$ in layers $\mathcal{T}_r$ by first removing edges between them occurring under the ER model and creating new edges with probability $\rho_r$. To ensure that the communities are denser than the remaining network, we assume $\rho_r > \langle p_l \rangle$, where $\langle \cdot \rangle$ denotes the mean value across all layers. We allow self-edges in both the ER model and the planted communities. We note that the layers are not required to have a particular ordering, and the community is not restricted only to consecutive layers. Moreover, we restrict our study to nonoverlapping communities by assuming that the communities involve different nodes so that $\mathcal{K}_r \cap \mathcal{K}_s = \emptyset$ for any $r \neq s$. We leave open the study of eigenvector localization in the case of overlapping communities. Finally, we assume $\sum_r K_r \ll N$ so that only a small fraction of nodes are involved in communities, making them anomalous structures.

FIG. 1. Preprocessing networks (including multilayer representations of temporal networks) often involves aggregating network data into bins (or time windows). We study how many layers must contain a community in order for aggregation to enhance its detection and introduce layer aggregation with thresholding as a filter enabling super-resolution community detection.
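The generative process of Sec. II A can be sketched as follows. This is a minimal illustration rather than the code used for the experiments: the function names `sample_symmetric` and `generate_multilayer`, the 0-indexed node and layer labels, and the small example sizes are all assumptions made here for readability.

```python
import numpy as np

def sample_symmetric(n, prob, rng):
    """Symmetric 0/1 matrix; entries on and above the diagonal are i.i.d. Bernoulli(prob)."""
    upper = np.triu(rng.random((n, n)) < prob)  # diagonal kept: self-edges are allowed
    return (upper | upper.T).astype(np.int8)

def generate_multilayer(N, p_layers, communities, rng):
    """Return an array of shape (L, N, N) with one adjacency matrix per layer.

    communities: iterable of (nodes, layers, rho) triples (0-indexed, hypothetical
    format); node sets are assumed disjoint, as in the nonoverlapping model.
    """
    A = np.stack([sample_symmetric(N, p, rng) for p in p_layers])
    for nodes, layers, rho in communities:
        nodes = np.asarray(nodes)
        block = np.ix_(nodes, nodes)
        for l in layers:
            # Discard the ER edges among the community's nodes and redraw with probability rho.
            A[l][block] = sample_symmetric(nodes.size, rho, rng)
    return A

rng = np.random.default_rng(0)
nodes = rng.choice(200, size=6, replace=False)   # node set, drawn uniformly at random
layers = rng.choice(8, size=3, replace=False)    # layer set, drawn uniformly at random
A = generate_multilayer(200, np.full(8, 0.05), [(nodes, layers, 1.0)], rng)
```

Here a single clique ($\rho_r = 1$) of size $K_r = 6$ is planted in $T_r = 3$ of $L = 8$ layers; the layers need not be consecutive.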

B. Layer-aggregation methods
Layer aggregation is a preprocessing step for multilayer networks that can be used to reduce data size and/or to act as a data filter that benefits network-analysis outcomes such as community detection. Following the approach in Ref. [10], we study two methods for aggregating layers of a multilayer network: (i) The summation network corresponds to the weighted adjacency matrix $\bar{A} = \sum_l A^{(l)}$, where $A^{(l)}$ denotes the symmetric adjacency matrix encoding each network layer $l \in \{1, \dots, L\}$. (ii) The family of thresholded networks represented by unweighted adjacency matrices $\{\hat{A}^{(\tilde{L})}\}$ is obtained by applying a threshold $\tilde{L} \in \{1, \dots, L\}$ to the entries $\{\bar{A}_{ij}\}$ of matrix $\bar{A}$,

$$\hat{A}^{(\tilde{L})}_{ij} = \begin{cases} 1, & \bar{A}_{ij} \ge \tilde{L}, \\ 0, & \text{otherwise}. \end{cases} \tag{1}$$

Note that thresholding dichotomizes the network, and one can vary $\tilde{L}$ to tunably sparsify the network.
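As a concrete illustration of the two aggregation methods, the following sketch computes the summation network and a thresholded network from a stack of layer adjacency matrices. The function names and the tiny three-layer example are hypothetical and chosen only for illustration.

```python
import numpy as np

def summation_network(layers):
    """Weighted adjacency matrix: entrywise sum of the layers' adjacency matrices."""
    return np.asarray(layers).sum(axis=0)

def thresholded_network(layers, threshold):
    """Unweighted adjacency matrix: 1 where at least `threshold` layers contain the edge."""
    return (summation_network(layers) >= threshold).astype(int)

# Three layers on two nodes (self-edges allowed).
layers = np.array([[[0, 1], [1, 0]],
                   [[0, 1], [1, 1]],
                   [[0, 0], [0, 1]]])
A_bar = summation_network(layers)        # [[0, 2], [2, 2]]
A_hat = thresholded_network(layers, 2)   # [[0, 1], [1, 1]]
```

Raising the threshold from 2 to 3 in this example removes every edge, which shows how the threshold sparsifies the aggregated network.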

III. DETECTABILITY OF SMALL COMMUNITIES WITH EIGENVECTOR LOCALIZATION
We now develop random matrix theory to analyze how layer aggregation affects small-community detection. In Sec. III A, we present results for aggregation by summation, studying the fraction of layers that must contain a community in order for layer aggregation to enhance detection. In Sec. III B, we present results for layer aggregation with thresholding, highlighting that certain threshold values can yield super-resolution community detection.

A. Layer aggregation via summation

1. Random matrix theory for modularity matrices
We first describe the statistical properties of the matrix entries $\{\bar{A}_{ij}\}$. For edges $(i,j) \notin \cup_r \{\mathcal{K}_r \times \mathcal{K}_r\}$, the entries $\{\bar{A}_{ij}\}$ are independent and identically distributed (i.i.d.) random variables following a Poisson binomial distribution, $P(\bar{A}_{ij} = a) = f_{PB}(a; L, \{p_l\})$, where

$$f_{PB}(a; L, \{p_l\}) = \sum_{\mathcal{S} \in S_a} \prod_{l \in \mathcal{S}} p_l \prod_{m \in \{1,\dots,L\} \setminus \mathcal{S}} (1 - p_m), \tag{2}$$

and $S_a$ denotes the set of $\binom{L}{a}$ different subsets of layers $\{1, \dots, L\}$ that have cardinality $a$ (i.e., $S_1 = \{\{1\}, \{2\}, \dots\}$, $S_2 = \{\{1,2\}, \{1,3\}, \dots\}$, and so on). We note that $f_{PB}(a; L, \{p_l\})$ has mean $L\langle p_l \rangle$ and variance $L\langle p_l(1 - p_l)\rangle$. When the edge probability is identical across the layers (i.e., $p_l = p$), Eq. (2) simplifies to the binomial distribution,

$$f(a; L, p) = \binom{L}{a} p^a (1 - p)^{L - a}, \tag{3}$$

with mean $Lp$ and variance $Lp(1 - p)$.
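For within-community edges $(i,j) \in \{\mathcal{K}_r \times \mathcal{K}_r\}$ associated with community $r$, the entries $\{\bar{A}_{ij}\}$ are i.i.d. random variables following $f_{PB}(a; L, \{q_l^{(r)}\})$, where $q_l^{(r)} = \rho_r$ for $l \in \mathcal{T}_r$ and otherwise $q_l^{(r)} = p_l$. It follows that the entries have mean $T_r \rho_r + \sum_{l \in \{1,\dots,L\} \setminus \mathcal{T}_r} p_l$ and variance

$$T_r \rho_r (1 - \rho_r) + \sum_{l \in \{1,\dots,L\} \setminus \mathcal{T}_r} p_l (1 - p_l). \tag{4}$$

Because the layers $\mathcal{T}_r$ are selected uniformly at random, the expected mean and variance across all possible choices for $\mathcal{T}_r$ are given by $T_r \rho_r + (L - T_r)\langle p_l \rangle$ and $T_r \rho_r (1 - \rho_r) + (L - T_r)\langle p_l (1 - p_l)\rangle$, respectively.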
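The Poisson binomial distribution rarely needs to be evaluated through the subset sum in its definition. A minimal sketch, assuming NumPy and a hypothetical helper name `poisson_binomial_pmf`, builds the probability mass function by convolving the per-layer Bernoulli distributions and checks the stated mean, variance, and binomial special case.

```python
import numpy as np
from math import comb

def poisson_binomial_pmf(probs):
    """PMF of a sum of independent Bernoulli(p_l) variables; index a holds P(sum = a)."""
    pmf = np.array([1.0])
    for p in probs:
        # Adding one layer: stay at the same count w.p. 1 - p, or increase it by one w.p. p.
        pmf = np.convolve(pmf, [1.0 - p, p])
    return pmf

probs = np.array([0.1, 0.2, 0.3])
pmf = poisson_binomial_pmf(probs)
support = np.arange(pmf.size)
mean = (support * pmf).sum()                   # equals L * <p_l> = 0.6
var = ((support - mean) ** 2 * pmf).sum()      # equals L * <p_l (1 - p_l)> = 0.46

# Homogeneous layers reduce to the binomial distribution.
L, p = 5, 0.3
binomial = np.array([comb(L, a) * p**a * (1 - p) ** (L - a) for a in range(L + 1)])
```

For a within-community entry, the same function applies after replacing $p_l$ by $\rho_r$ for the layers that contain the community.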
We now study the spectra of the modularity matrix [39],

$$B = \bar{A} - L\langle p_l \rangle \mathbf{1}\mathbf{1}^T, \tag{5}$$

based on an ER null model in which each edge has expected weight $L\langle p_l \rangle$. Importantly, this null model does not use knowledge that edges $(i,j)$ between nodes $i, j \in \mathcal{K}_r$ have different expected edge probability [i.e., $T_r \rho_r + (L - T_r)\langle p_l \rangle$ vs $L\langle p_l \rangle$], which respects our assumption that it is unknown which nodes are in the hidden community. We note that one could also define the ER null model with the observed mean edge probability $L\langle p_l \rangle + \sum_r [(K_r^2 T_r)/(N^2 L)](\rho_r - \langle p_l \rangle)$ to account for the slight increase in overall edge probability due to the presence of small communities. However, this change does not affect the position of the dominant eigenvalues relative to the bulk, which is the relevant issue for community detectability, as we will see below. In particular, since $(K_r^2 T_r)/(N^2 L) \ll 1$ for each $r$, even the shift of the single associated eigenvalue within the bulk is negligible; therefore, we focus on the null model with expected edge weight $L\langle p_l \rangle$.
We develop random matrix theory based on the analysis in Refs. [27,40]. To this end, we note that $B$ can be written in the form

$$B = \sum_r \theta_r u^{(r)} (u^{(r)})^T + X, \tag{6}$$

where $\sum_r \theta_r u^{(r)} (u^{(r)})^T$ is a rank-$R$ matrix with eigenvalues given by

$$\theta_r = K_r T_r (\rho_r - \langle p_l \rangle), \tag{7}$$

and $\{u^{(r)}\}$ are normalized indicator vectors for the $R$ communities that have entries

$$u^{(r)}_i = \begin{cases} 1/\sqrt{K_r}, & i \in \mathcal{K}_r, \\ 0, & \text{otherwise}. \end{cases} \tag{8}$$

The random matrix $X$ has zero-mean entries $X_{ij}$ with variance $T_r \rho_r (1 - \rho_r) + (L - T_r)\langle p_l (1 - p_l)\rangle$ if $(i,j) \in \mathcal{K}_r \times \mathcal{K}_r$, and $L\langle p_l (1 - p_l)\rangle$ otherwise. In the $N \to \infty$ limit, and assuming the sizes $\{K_r\}$ grow more slowly than $N$, the $\sum_r K_r^2 \ll N^2$ matrix entries corresponding to communities become negligible and $X$ limits to a Wigner matrix [41]. This allows us to use known results for the limiting dominant eigenvector of low-rank perturbations of Wigner matrices with variance $1/N$. Specifically, we define $\gamma = 1/\sqrt{NL\langle p_l (1 - p_l)\rangle}$ so that the matrix $\gamma X$ has entries with variance $1/N$ in the limit. We similarly define

$$\bar{\theta}_r = \gamma \theta_r = \frac{K_r T_r (\rho_r - \langle p_l \rangle)}{\sqrt{NL\langle p_l (1 - p_l)\rangle}}, \tag{9}$$

so that $\gamma B = \sum_r \bar{\theta}_r u^{(r)} (u^{(r)})^T + \gamma X$. It follows that the limiting $N \to \infty$ dominant eigenvectors $\{v^{(r)}\}$ of $\gamma B$ (and of $B$, since scalar multiplication does not affect eigenvectors) satisfy [40,42]

$$|\langle v^{(r)}, u^{(r)} \rangle|^2 = \begin{cases} 1 - 1/\bar{\theta}_r^2, & \bar{\theta}_r > 1, \\ 0, & \text{otherwise}. \end{cases} \tag{10}$$

Note we assume that the dominant eigenvectors have been suitably enumerated so that $v^{(r)}$ corresponds to the eigenvector localizing on community $r$. The value $\bar{\theta}_r = 1$ identifies critical points at which there is a phase transition in eigenvector localization and detectability for community $r$, and this gives the critical community size

$$K_r^* = \frac{\sqrt{NL\langle p_l (1 - p_l)\rangle}}{T_r (\rho_r - \langle p_l \rangle)}. \tag{11}$$

In other words, a small community can be detected using a dominant eigenvector $v^{(r)}$ of $B$ only when $K_r > K_r^*$. We note that setting $L = T_r = 1$, $\rho_r = 1$, and $p_l = p$ in Eq. (11) yields $K^* = \sqrt{Np/(1 - p)}$, which describes the detectability transition for a single planted clique in a single-layer network [26].
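The predictions for summation-based aggregation reduce to a few lines of arithmetic. The sketch below evaluates the rescaled eigenvalue $\bar{\theta}_r = K_r T_r(\rho_r - \langle p_l \rangle)/\sqrt{NL\langle p_l(1-p_l)\rangle}$, the predicted overlap $|\langle v^{(r)}, u^{(r)}\rangle|^2$ of Eq. (10), and the critical size $K_r^*$ of Eq. (11). The function names are hypothetical, and the example assumes homogeneous layers with $p_l = 0.01$ instead of Gaussian-distributed $\{p_l\}$.

```python
import numpy as np

def theta_bar(K, N, p_layers, T, rho):
    """Rescaled eigenvalue of the rank-one community term, K T (rho - <p>) / sqrt(N L <p(1-p)>)."""
    p = np.asarray(p_layers, dtype=float)
    noise = np.sqrt(N * p.size * np.mean(p * (1.0 - p)))
    return K * T * (rho - p.mean()) / noise

def predicted_overlap(K, N, p_layers, T, rho):
    """Limiting value of |<v, u>|^2 from Eq. (10)."""
    th = theta_bar(K, N, p_layers, T, rho)
    return 1.0 - 1.0 / th**2 if th > 1.0 else 0.0

def critical_size(N, p_layers, T, rho):
    """Critical size K* of Eq. (11), i.e., the K at which theta_bar equals 1."""
    return 1.0 / theta_bar(1.0, N, p_layers, T, rho)

p_layers = np.full(16, 0.01)                               # assumed homogeneous layers
K_star = critical_size(10**4, p_layers, T=2, rho=1.0)      # roughly 20
overlaps = [predicted_overlap(K, 10**4, p_layers, 2, 1.0) for K in (6, 26, 86)]
```

With $N = 10^4$, $L = 16$, $T = 2$, and $\rho = 1$, this gives $K^* \approx 20$, so a clique of size 6 lies below the transition, one of size 26 lies just above it, and one of size 86 lies well above it.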
We highlight two important consequences of Eq. (11). First, if the community persists across some fixed fraction of the layers, $T_r(L) = cL$, then $K_r^* \propto \sqrt{N/L}$; therefore, if $N$, $p$, and $T_r/L$ are held fixed and $L$ increases, then $K_r^*$ vanishes with scaling $\mathcal{O}(L^{-1/2})$. This square-root scaling behavior is similar to that obtained for detection in layer aggregation of large-scale communities that persist across all layers [10]. Second, for fixed $N$ and $p$, a community of fixed size $K_r$ and persistence $T_r$ will become impossible to detect as $L$ increases because $K_r^*$ increases with scaling $\mathcal{O}(L^{1/2})$. This result highlights the importance of knowing which layers potentially contain the community, since the aggregation of layers lacking the community can severely inhibit its detection.
Going further, one can let $T_r$ vary with $L$ and then ask how $K_r^*$ depends on the scaling behavior for $T_r$. For $T_r \propto L^{\beta}$, Eq. (11) implies

$$K_r^* \propto L^{1/2 - \beta}. \tag{12}$$

In other words, $T_r$, the number of layers containing the community, must increase with $L$ at least as $\mathcal{O}(L^{1/2})$; otherwise, summation-based layer aggregation will inhibit (rather than promote) small-community detection. Note that $T_r \propto L^{1/2}$ is a critical case in which $K_r^*$ is independent of $L$. We highlight that Eq. (12) is somewhat surprising since summation-based aggregation benefits detection even if the fraction $T_r/L$ of layers containing the community vanishes with $L$, provided that it decays more slowly than $\mathcal{O}(L^{-1/2})$.
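The scaling of $K_r^*$ with $L$ can be checked numerically by evaluating Eq. (11) with $T_r = L^{\beta}$ and fitting the slope on a log-log scale. The sketch below is only an illustration: it assumes homogeneous layers, allows non-integer $T_r$, and uses a hypothetical function name.

```python
import numpy as np

def critical_size(N, L, T, p, rho):
    """K* of Eq. (11) for homogeneous layers with p_l = p (an assumption of this sketch)."""
    return np.sqrt(N * L * p * (1.0 - p)) / (T * (rho - p))

N, p, rho = 10**4, 0.01, 1.0
Ls = np.array([16.0, 64.0, 256.0, 1024.0])

slopes = {}
for beta in (0.25, 0.5, 0.75):
    T = Ls**beta                                   # persistence growing as L^beta
    K_star = critical_size(N, Ls, T, p, rho)
    # Slope of log K* vs log L; expected to equal 1/2 - beta.
    slopes[beta] = np.polyfit(np.log(Ls), np.log(K_star), 1)[0]
```

The fitted slopes are positive for $\beta < 1/2$ (aggregation inhibits detection), zero for $\beta = 1/2$ (the critical case), and negative for $\beta > 1/2$ (aggregation promotes detection).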

2. Numerical validation and scaling behavior
We support Eqs. (10) and (11) in Fig. 2, using numerical experiments with $N = 10^4$ nodes and edge probabilities $\{p_l\}$ drawn from a Gaussian distribution with mean $p = 0.01$ and standard deviation $\sigma_p = 0.001$. We focus on the case of clique detection (i.e., $\rho = 1$), hiding the clique in $T = 2$ of the $L = 16$ layers. In Fig. 2(a), we plot the entries $\{v^{(r)}_i\}$ (symbols) of the dominant eigenvector of the modularity matrix for the summation network as well as the entries $\{u^{(r)}_i\}$ for the indicator vector, which are nonzero only for nodes $i \in \mathcal{K}$ involved in the clique. We show results for community sizes $K_r \in \{6, 26, 86\}$, which respectively place the system below, just above, and well above the phase transition. The illustration highlights that as $K$ increases, vector $v^{(r)}$ aligns with $u^{(r)}$. We quantify this localization phenomenon by plotting in Fig. 2(b) observed (symbols) and predicted values of $|\langle v, u \rangle|^2$ given by Eq. (10) (curve). Note that the values of $|\langle v^{(r)}, u^{(r)} \rangle|^2$ depict a phase transition that occurs at a critical subgraph size $K_r^*$ given by Eq. (11): $|\langle v^{(r)}, u^{(r)} \rangle|^2 > 0$ when $K_r > K_r^*$, whereas $|\langle v, u \rangle|^2 = 0$ when $K_r \le K_r^*$. This phase transition in eigenvector localization drives a phase transition for community detection based on $v^{(r)}$. Arrows indicate the values of $K_r$ used in panel (a).
In Fig. 3(a), we compare observed (symbols) and predicted values of $|\langle v, u \rangle|^2$ given by Eq. (10) (curves) for varying $K_r$ with $T_r \in \{1, 2, 4, 8\}$. Open symbols indicate the parameters used in Fig. 2, whereas filled symbols indicate the mean value of $|\langle v, u \rangle|^2$ for 10 trials in which the layers' edge probabilities $\{p_l\}$ are drawn uniformly from $[0, 0.02]$. Note that as $T_r$ increases, the curves shift to the left, illustrating that as the community persists across more layers, the localization phenomenon is stronger and the hidden community is easier to detect. In Fig. 3(b), we study the dependence of $K_r^*$ on the number of layers, $L$, and we compare the effect of keeping $T_r$ fixed vs allowing $T_r$ to grow with $L$. Specifically, we set either $T_r(L) = 20$ or $T_r(L) = L$, and we plot the value of $K_r^*$ given by Eq. (11). Note that if the community persists across a fraction of the layers, that is, $T_r(L) = cL$ for some constant $c$, then $K_r^*$ vanishes with scaling $\mathcal{O}(L^{-1/2})$. However, if $T_r$ is held fixed, then $K_r^*$ increases with scaling $\mathcal{O}(L^{1/2})$.
In summary, these experiments illustrate how layer aggregation through summation can enhance small-community detection if the community persists across sufficiently many layers, but it can obscure detection if the community is present in too few layers. We will see in the next section that thresholding the summation can help overcome this problem, potentially reducing the detectability limit by orders of magnitude to yield super-resolution community detection.

B. Thresholding as a nonlinear data filter

1. Random matrix theory for modularity matrices
We now study layer aggregation with thresholding as a filter that enhances small-community detection. We begin by solving for effective edge probabilities for the thresholding process [10]. Thresholding the summation $\sum_l A^{(l)}$ at $\tilde{L}$ yields a binary adjacency matrix $\hat{A}^{(\tilde{L})}$ with entries $\hat{A}^{(\tilde{L})}_{ij} \in \{0, 1\}$ indicating whether or not $\bar{A}_{ij} \ge \tilde{L}$. For edges $(i,j) \notin \cup_r \{\mathcal{K}_r \times \mathcal{K}_r\}$, $\bar{A}_{ij}$ follows a Poisson binomial distribution $f_{PB}(a; L, \{p_l\})$ given by Eq. (2), and the inequality is satisfied with probability

$$\hat{p}^{(\tilde{L})} = 1 - F_{PB}(\tilde{L} - 1; L, \{p_l\}), \tag{13}$$

where $F_{PB}(a; L, \{p_l\})$ is the associated cumulative distribution function (CDF). For edges $(i,j) \in \{\mathcal{K}_r \times \mathcal{K}_r\}$, $\bar{A}_{ij}$ follows a Poisson binomial distribution $f_{PB}(a; L, \{q_l^{(r)}\})$ given by Eq. (2), and the inequality is satisfied with probability

$$\hat{\rho}_r^{(\tilde{L})} = 1 - F_{PB}(\tilde{L} - 1; L, \{q_l^{(r)}\}), \tag{14}$$

where $q_l^{(r)} = \rho_r$ for $l \in \mathcal{T}_r$ and otherwise $q_l^{(r)} = p_l$. In the case of a clique (i.e., $\rho_r = 1$), Eq. (14) can be written as

$$\hat{\rho}_r^{(\tilde{L})} = \begin{cases} 1, & \tilde{L} \le T_r, \\ 1 - F_{PB}(\tilde{L} - T_r - 1; L - T_r, \{p_l\}_{l \notin \mathcal{T}_r}), & \tilde{L} > T_r. \end{cases} \tag{15}$$

Given the effective edge probabilities for the network and a community (i.e., $\hat{p}^{(\tilde{L})}$ and $\hat{\rho}_r^{(\tilde{L})}$, respectively), it is straightforward to study the detectability limits of a community for thresholded networks using Eqs. (10) and (11). In particular, we substitute $L = T_r = 1$ to obtain

$$|\langle \hat{v}^{(r)}, u^{(r)} \rangle|^2 = \begin{cases} 1 - 1/\hat{\theta}_r^2, & \hat{\theta}_r > 1, \\ 0, & \text{otherwise}, \end{cases} \tag{16}$$

where $\hat{v}^{(r)}$ is a dominant eigenvector of the modularity matrix $\hat{B}^{(\tilde{L})} = \hat{A}^{(\tilde{L})} - \hat{p}^{(\tilde{L})} \mathbf{1}\mathbf{1}^T$ and

$$\hat{\theta}_r = \frac{K_r (\hat{\rho}_r^{(\tilde{L})} - \hat{p}^{(\tilde{L})})}{\sqrt{N \hat{p}^{(\tilde{L})} (1 - \hat{p}^{(\tilde{L})})}}, \tag{17}$$

which gives the critical size

$$\hat{K}_r^* = \frac{\sqrt{N \hat{p}^{(\tilde{L})} (1 - \hat{p}^{(\tilde{L})})}}{\hat{\rho}_r^{(\tilde{L})} - \hat{p}^{(\tilde{L})}}. \tag{18}$$

Equations (16)-(18) illustrate that the detectability limits for thresholded networks depend only on the effective edge probabilities; however, these depend sensitively on the choice of threshold $\tilde{L}$. Importantly, $\hat{K}_r^*$ given by Eq. (18) can potentially be orders of magnitude smaller than $K_r^*$ given by Eq. (11), a phenomenon we call super-resolution detection.

In addition to numerical experiments that will follow below, we further study this phenomenon by comparing $\hat{K}_r^*$ and $K_r^*$ for network parameters wherein we can obtain deeper insight. We consider clique detection (i.e., $\rho_r = 1$) in a sparse network (i.e., $p_l \ll 1$) and focus on the threshold value $\tilde{L} = T_r$. Using these assumptions also in Eqs. (13) and (15), we find the effective edge probabilities $\hat{p}^{(T_r)} = 1 - F_{PB}(T_r - 1; L, \{p_l\})$ and $\hat{\rho}_r^{(T_r)} = 1$. Furthermore, we apply Hoeffding's inequality [43] to obtain $\hat{p}^{(T_r)} \le e^{-2L(\langle p_l \rangle - T_r/L)^2}$. Noting $0 < \langle p_l \rangle \ll T_r/L$, we find the $\langle p_l \rangle \to 0$ limiting bound

$$\hat{p}^{(T_r)} \le e^{-2T_r^2/L},$$

illustrating that $\hat{p}^{(T_r)}$ and $\hat{K}_r^*$ decay exponentially with $T_r^2/L$. On the other hand, we use the sparsity assumption in Eq. (11) to obtain

$$K_r^* \approx \frac{\sqrt{NL\langle p_l \rangle}}{T_r}.$$

Thus, in this case, $K_r^*$ decays as $\mathcal{O}(1/\sqrt{T_r^2/L})$, whereas $\hat{K}_r^*$ decays exponentially (i.e., considerably faster) with $T_r^2/L$.

FIG. 2. [...] for nodes $i \in \{1, 100\}$. As shown by the illustration, as $K_r$ increases, $v^{(r)}$ aligns with the indicator vector $u^{(r)}$, which is nonzero only for the $K_r \ll N$ entries $u^{(r)}_i$ that correspond to nodes in the community, $\mathcal{K}_r$. (b) Observed (symbols) and predicted (curves) values of $|\langle v^{(r)}, u^{(r)} \rangle|^2$ given by Eq. (10) quantify this localization phenomenon. Arrows indicate the values of $K$ used for panel (a). The critical size $K_r^*$ such that $|\langle v^{(r)}, u^{(r)} \rangle|^2 = 0$ for $K_r \le K_r^*$, whereas $|\langle v^{(r)}, u^{(r)} \rangle|^2 > 0$ for $K_r > K_r^*$, marks a phase transition, both in terms of eigenvector localization and detectability of the community.

FIG. 3. Influence of community persistence $T_r$ on eigenvector localization for summation-based layer aggregation. (a) Observed (symbols) and predicted values of $|\langle v^{(r)}, u^{(r)} \rangle|^2$ given by Eq. (10) (curves) vs $K_r$ for $T_r \in \{1, 2, 4, 8\}$. Open symbols indicate the parameters used in Fig. 2, whereas filled symbols indicate when the layers' edge probabilities $\{p_l\}$ are drawn uniformly from $[0, 0.02]$; we plot the mean value of $|\langle v^{(r)}, u^{(r)} \rangle|^2$ across 10 choices for the sets $\mathcal{K}_r$ and $\mathcal{T}_r$. (b) Critical size $K_r^*$ given by Eq. (11) vs $L$ for fixed $T_r$ (dashed line) and $T_r = L$ (solid line). As indicated by Eq. (12), layer aggregation by summation can enhance or inhibit detection depending on whether or not the scaling for $T_r(L)$ exceeds $\mathcal{O}(L^{1/2})$.
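The effective edge probabilities of Eqs. (13) and (14) and the thresholded detectability limit of Eq. (18) can be evaluated directly from Poisson binomial tail probabilities. The following sketch does so for a clique in a sparse network and compares the result with the summation limit of Eq. (11) and with the Hoeffding bound. The function names are hypothetical, layers are 0-indexed, and the layers are assumed homogeneous with $p_l = 0.01$.

```python
import numpy as np

def pb_pmf(probs):
    """Poisson binomial PMF obtained by convolving per-layer Bernoulli distributions."""
    pmf = np.array([1.0])
    for p in probs:
        pmf = np.convolve(pmf, [1.0 - p, p])
    return pmf

def tail_prob(probs, threshold):
    """P(sum of independent Bernoulli(probs) >= threshold) = 1 - F_PB(threshold - 1)."""
    return float(pb_pmf(probs)[max(int(threshold), 0):].sum())

def effective_probs(p_layers, community_layers, rho, threshold):
    """Effective edge probabilities (p_hat, rho_hat) of Eqs. (13) and (14)."""
    p = np.asarray(p_layers, dtype=float)
    q = p.copy()
    q[list(community_layers)] = rho      # layers containing the community use rho
    return tail_prob(p, threshold), tail_prob(q, threshold)

def critical_size_thresholded(N, p_hat, rho_hat):
    """Critical size of Eq. (18); infinite if the community is not denser than the background."""
    if rho_hat <= p_hat:
        return float("inf")
    return float(np.sqrt(N * p_hat * (1.0 - p_hat)) / (rho_hat - p_hat))

N, L, T, p = 10**4, 16, 5, 0.01
p_hat, rho_hat = effective_probs(np.full(L, p), range(T), rho=1.0, threshold=T)
K_hat = critical_size_thresholded(N, p_hat, rho_hat)     # thresholded limit, Eq. (18)
K_sum = np.sqrt(N * L * p * (1 - p)) / (T * (1.0 - p))   # summation limit, Eq. (11)
hoeffding = np.exp(-2 * L * (p - T / L) ** 2)            # upper bound on p_hat
```

In this example the threshold $\tilde{L} = T_r$ leaves the clique intact ($\hat{\rho}_r = 1$) while suppressing almost all background edges, so the thresholded limit falls far below the summation limit.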

2. Numerical validation and super-resolution detection
We now support Eqs. (13)-(18) with numerical experiments and illustrate that certain thresholds lead to super-resolution community detection. We consider the detection of a dense subgraph that is hidden in both (a) a dense network with $\langle p_l \rangle = 0.5$ and (b) a sparse network with $\langle p_l \rangle = 0.01$. Both networks were constructed with $N = 10^4$, $\sigma_p = 0.001$, $\rho_r = 1$, $L = 16$, and $T_r = 5$.
In Fig. 4, we compare observed (symbols) and predicted values (curves) of the effective edge probabilities $\hat{p}^{(\tilde{L})}$ given by Eq. (13) and $\hat{\rho}_r^{(\tilde{L})}$ given by Eq. (14) as a function of the threshold $\tilde{L}$. Note in both panels that the effective edge probability $\hat{p}^{(\tilde{L})}$ of the background network always decays with increasing $\tilde{L}$. In contrast, the effective edge probability between nodes in the community depends on whether or not $\tilde{L} > T_r$: $\hat{\rho}_r^{(\tilde{L})} = 1$ when $\tilde{L} \le T_r$ since $\rho = 1$, whereas $\hat{\rho}_r^{(\tilde{L})}$ decays with increasing $\tilde{L}$ for $\tilde{L} > T_r$. Importantly, the rate of decay depends on the network's mean edge density $\langle p_l \rangle$: $\hat{\rho}_r^{(\tilde{L})}$ slowly decreases for the dense network, whereas it abruptly drops for the sparse network.
In Fig. 5, we plot observed (symbols) and predicted values (curves) for $|\langle \hat{v}^{(r)}, u^{(r)} \rangle|^2$ given by Eq. (16) vs $K$ for different choices of $\tilde{L}$. The parameters used are identical to those of Fig. 4, and panels (a) and (b) again depict results for $\langle p_l \rangle = 0.5$ and $\langle p_l \rangle = 0.01$, respectively. We highlight several important observations. First, note in both panels that $\tilde{L} = T_r = 5$ yields better detectability than $\tilde{L} = 1$. However, when $\tilde{L} > T_r$, we find contrasting results for sparse and dense networks. For the sparse network shown in Fig. 5(b), the hidden community becomes harder to detect when $\tilde{L} > T_r$ (see curve for $\tilde{L} = 16$), which intuitively occurs because $\hat{\rho}_r^{(\tilde{L})}$ rapidly decays and the thresholded networks will no longer contain a dense subgraph. On the other hand, for the dense network depicted in Fig. 5(a), increasing $\tilde{L}$ can improve detectability when $\tilde{L} > T_r$ (see curve for $\tilde{L} = 10$).
We now present an experiment highlighting the occurrence of super-resolution community detection for certain threshold values. In Fig. 6, we study the dependence of the critical community size $\hat{K}_r^*$ on the threshold $\tilde{L}$. We plot $\hat{K}_r^*$ given by Eq. (18) as a function of $\tilde{L}$ for $p \in \{0.01, 0.05, 0.2, 0.5\}$, $N = 10^4$, $\rho = 1$, $\sigma_p = 0.001$, $L = 16$, and either (a) $T_r = 5$ or (b) $T_r = 10$. Note that for the sparsest network, i.e., $p = 0.01$, the minimum value of $\hat{K}_r^*$ occurs when $\tilde{L} = T_r$ (vertical dashed line). Interestingly, as the mean edge density $p = \langle p_l \rangle$ increases, the threshold $\tilde{L}$ at which $\hat{K}_r^*$ attains its minimum value shifts from $\tilde{L} = T_r$ towards $\tilde{L} = L$. The horizontal lines on the right edge of the panels indicate $K_r^*$ given by Eq. (11) for the summation network.
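The dependence of the thresholded detectability limit on the threshold can be explored with a short sweep. The sketch below evaluates Eq. (18) for every threshold from 1 to $L$ and reports the minimizer. It assumes homogeneous layers and a clique planted in the first $T_r$ layers (0-indexed), and its function names are hypothetical.

```python
import numpy as np

def tail_prob(probs, threshold):
    """P(sum of independent Bernoulli(probs) >= threshold), via the convolved PMF."""
    pmf = np.array([1.0])
    for p in probs:
        pmf = np.convolve(pmf, [1.0 - p, p])
    return float(pmf[threshold:].sum())

def k_hat(N, p_layers, T, rho, threshold):
    """Thresholded critical size of Eq. (18) for a community in the first T layers."""
    p = np.asarray(p_layers, dtype=float)
    q = p.copy()
    q[:T] = rho
    p_hat, rho_hat = tail_prob(p, threshold), tail_prob(q, threshold)
    if rho_hat <= p_hat:
        return float("inf")
    return float(np.sqrt(N * p_hat * (1.0 - p_hat)) / (rho_hat - p_hat))

def best_threshold(N, p_layers, T, rho):
    """Return the threshold in {1, ..., L} minimizing Eq. (18), plus all critical sizes."""
    sizes = [k_hat(N, p_layers, T, rho, th) for th in range(1, len(p_layers) + 1)]
    return 1 + int(np.argmin(sizes)), sizes

# Sparse example: N = 10^4, L = 16, T_r = 5, rho = 1, p_l = 0.01 for all layers.
best, sizes = best_threshold(10**4, np.full(16, 0.01), T=5, rho=1.0)
```

For this sparse example the minimizing threshold coincides with the persistence $T_r = 5$; rerunning the sweep with a larger edge probability gives a way to examine how the minimizer moves.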
Importantly, note that for a wide range of parameters, $\hat{K}_r^*$ for the thresholded networks is significantly smaller than $K_r^*$ for the corresponding summation networks. In particular, one can observe for $p = 0.1$ and $\tilde{L}/L = T_r/L$ in Fig. 6(b) that $\hat{K}_r^*$ is many orders of magnitude smaller than $K_r^*$ [$\mathcal{O}(10^{-6})$ times here]. In other words, thresholding the summation can dramatically improve detectability as compared to summation without thresholding. This surprising result contrasts with our previous findings for the detectability of large communities that persist across all layers [10], where it was found that thresholding always inhibited detection (although optimal thresholds were found to minimize inhibition).

IV. SMALL-COMMUNITY DETECTION IN TIME-VARYING NETWORKS
We now present an experiment involving small-community detection in time-varying networks to highlight several practical insights following from our theoretical results. Note that unlike Sec. III, where there were no restrictions on which layers a community persists, we now assume that each community persists across consecutive layers. We conducted experiments for a synthetic temporal network with $N = 10^4$ nodes and $L = 32$ time layers, each of which is drawn from an ER network with edge probability $p_l$, which we drew from a Gaussian distribution with mean $p = 0.01$ and standard deviation $\sigma_p = 0.001$. We then planted $R = 4$ communities, each involving $K_r = K = 8$ nodes, in the following sets of layers: $\mathcal{T}_1 = \{3, 4, 5\}$ for community 1, $\mathcal{T}_2 = \{7, \dots, 15\}$ for community 2, $\mathcal{T}_3 = \{18, \dots, 22\}$ for community 3, and $\mathcal{T}_4 = \{24, \dots, 30\}$ for community 4. In Fig. 7(a), we provide a representative illustration of the temporal network, where we indicate in which layers the communities are present. We also illustrate by the shaded region an example time window, or bin, $W_w(t) = \{t - (w-1)/2, \dots, t + (w-1)/2\}$ for $t \in \{(w-1)/2, L - (w-1)/2\}$, that contains layers to be aggregated.

FIG. 5. Detectability phase transitions for threshold-based layer aggregation. We plot $|\langle \hat{v}^{(r)}, u^{(r)} \rangle|^2$ vs community size $K_r$ with identical parameters to those used to produce Fig. 4 except with selected choices for the threshold $\tilde{L}$.

FIG. 6. Super-resolution community detection for threshold-based layer aggregation. We plot $\hat{K}_r^*$ given by Eq. (18) as a function of $\tilde{L}$ for $p \in \{0.01, 0.05, 0.2, 0.5\}$, $N = 10^4$, $\rho = 1$, $\sigma_p = 0.001$, $L = 16$, and either (a) $T_r = 5$ or (b) $T_r = 10$. Note that the $\tilde{L}$ value yielding the minimum $\hat{K}_r^*$ occurs at $\tilde{L} = T_r$ (vertical dotted lines) for sparse networks, whereas it increases with increasing $p$ [e.g., compare $p = 0.01$ and $p = 0.5$ in panel (b)]. The horizontal lines on the right edge of the panels indicate $K_r^*$ given by Eq. (11) for summation networks. Importantly, thresholding can potentially decrease $\hat{K}_r^*$ by many orders of magnitude as compared to $K_r^*$.
We first consider aggregation by summation. In Fig. 7(b), we illustrate by color the values $|\langle v^{(r)}, u^{(r)} \rangle|^2$ for the aggregation of layers across bins $W_w(t)$. In particular, we show Eq. (10) under the variable substitutions $T_r(W_w(t)) \mapsto T$ and $w \mapsto L$, where $T_r(W_w(t)) = |W_w(t) \cap \mathcal{T}_r|$ is the number of layers in which community $r$ is present in bin $W_w(t)$. We show results for several bin widths $w \in \{1, 3, 5, 7, 9\}$. The green arrows indicate, for each $r$, the bin location and $w$ value at which $|\langle v^{(r)}, u^{(r)} \rangle|^2$ obtains its maximum. As expected, $|\langle v^{(r)}, u^{(r)} \rangle|^2$ obtains its maximum for each community $r$ when the bin $W_w(t)$ is exactly the set of layers in which community $r$ is present, $W_w(t) = \mathcal{T}_r$ (i.e., when $T_r = w$).
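The binning procedure can be sketched as follows: for each window we count the layers that contain the community and evaluate Eq. (10) with that count in place of $T$ and the window width in place of $L$. The sketch assumes homogeneous layers with $p_l = 0.01$, $\rho = 1$, 1-indexed layers, odd window widths, and window centers restricted so that every window stays within layers $1, \dots, L$; the function names are hypothetical.

```python
import numpy as np

def window(t, w):
    """Layers in the bin of odd width w centered at layer t."""
    half = (w - 1) // 2
    return set(range(t - half, t + half + 1))

def persistence_in_window(community_layers, t, w):
    """Number of layers in the bin that contain the community."""
    return len(window(t, w) & set(community_layers))

def overlap_in_window(K, N, p, rho, community_layers, t, w):
    """Eq. (10) with the in-window persistence as T and the bin width w as L."""
    T = persistence_in_window(community_layers, t, w)
    theta = K * T * (rho - p) / np.sqrt(N * w * p * (1.0 - p))
    return 1.0 - 1.0 / theta**2 if theta > 1.0 else 0.0

L, N, K, p, rho = 32, 10**4, 8, 0.01, 1.0
community = range(7, 16)                       # a community present in layers 7, ..., 15
scores = {(w, t): overlap_in_window(K, N, p, rho, community, t, w)
          for w in (1, 3, 5, 7, 9)
          for t in range(1 + (w - 1) // 2, L - (w - 1) // 2 + 1)}
best = max(scores, key=scores.get)             # (bin width, bin center) with largest overlap
```

Under these assumptions the largest predicted overlap occurs for the bin that coincides exactly with the community's layers, and every single-layer bin ($w = 1$) gives zero overlap.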
Before studying aggregation by summation and thresholding, we first make several important observations using Fig. 7. First, note that for $w = 1$ in panel (b), no communities are detectable. In other words, all communities are undetectable if the layers are studied in isolation. However, they can be detected if the layers are binned into time windows. Second, because the optimal bin size $w$ is unique to every community (i.e., because they have different persistence $T_r \in [3, 9]$), there is no bin size that is best for all communities. In fact, detectability requires $K_r > K_r^*$ given by Eq. (11), which requires that, for each community, $w$ is not too large or too small. For example, community 1 is only detectable when $w = 3$, and community 3 is only detectable when $w \in [3, 7]$.
One final important observation for Fig. 7(b) is that even when communities are detectable, the values $|\langle v^{(r)}, u^{(r)} \rangle|^2$ are not very large; specifically, $|\langle v^{(r)}, u^{(r)} \rangle|^2 \le 0.7$ in all cases. This can be problematic since detection error rates increase as $|\langle v^{(r)}, u^{(r)} \rangle|^2$ decreases, approaching 100% error as $|\langle v^{(r)}, u^{(r)} \rangle|^2 \to 0$. (See Ref. [27] for an analysis of error rates based on a hypothesis-testing framework for clique detection in single-layer networks.) Because $|\langle v^{(r)}, u^{(r)} \rangle|^2$ remains small for community 1 for all choices of $w$, it effectively remains undetectable by summation-based layer aggregation.
We now illustrate layer aggregation with thresholding as a filter that can allow greatly improved small-community detection for the temporal network shown in Fig. 7(a), including the accurate recovery of community 1. In Fig. 8, we plot $|\langle \hat{v}^{(r)}, u^{(r)} \rangle|^2$ given by Eq. (16) with the variable substitutions $T_r(W_w(t)) \mapsto T$ and $w \mapsto L$ into Eqs. (13)-(18). Results reflect the aggregation of layers into bins $W_w(t)$ for each of the four communities $r \in \{1, 2, 3, 4\}$ and with bin sizes $w \in \{1, 3, 5, 7, 9\}$. Panels (a)-(c) indicate results for different thresholds, $\tilde{L} \in \{w, 0.8w, 0.5w\}$.

FIG. 7. Detectability of small communities in temporal networks with summation-based binning into time windows. (a) Illustration of a temporal network with $L = 32$ time layers and hidden communities that persist across different time layers. The shaded region indicates a bin, or time window, of size $w \le L$ at time $t$ for which the layers will be aggregated, which is a process that can be used to discretize and/or smooth the network data. The bin contains layers $W_w(t) = \{t - (w-1)/2, \dots, t + (w-1)/2\}$. (b) We illustrate by color the values $|\langle v^{(r)}, u^{(r)} \rangle|^2$ for the aggregation of layers across bins $W_w(t)$ for each of the four communities $r \in \{1, 2, 3, 4\}$. In particular, we show Eq. (10) under the variable substitutions $T_r(W_w(t)) \mapsto T$ and $w \mapsto L$, where $T_r(W_w(t))$ is the number of layers in which community $r$ is present in bin $W_w(t)$. Layer aggregation across each bin was implemented by summation. We study a temporal network with $N = 10^4$, $L = 32$, $p = 0.01$, $\sigma_p = 0.001$, and we show results for several bin widths $w \in \{1, 3, 5, 7, 9\}$. The hidden communities all contain $K_r = 8$ nodes and have different persistence lengths $T_r$ as depicted in panel (a). The green arrows indicate, for each $r$, the bin location and $w$ value at which $|\langle v^{(r)}, u^{(r)} \rangle|^2$ obtains its maximum.
Our first observation for Fig. 8 is that none of the communities can be detected (for any threshold) if the layers are analyzed in isolation (see results for window size $w = 1$). This result is similar to that shown in Fig. 7(b) for summation without thresholding: whenever $w = 1$, we find $|\langle v^{(r)}, u^{(r)} \rangle|^2 = 0$ both with and without thresholding. In other words, the detectability of communities is only made possible through layer aggregation.
Our next observation is that the values $|\langle v^{(r)}, u^{(r)} \rangle|^2$ obtained with thresholding are either zero or close to one, in sharp contrast to the values of $|\langle v^{(r)}, u^{(r)} \rangle|^2$ shown in Fig. 7(b) for summation, which take many values across the range [0, 0.7]. In other words, in this experiment, the use of thresholding as a filter allows small communities to be either strongly detected or not detected; there is no middle ground of weak detection, unlike for layer aggregation without thresholding. This is important since error rates for community detection vanish as $|\langle v^{(r)}, u^{(r)} \rangle|^2 \to 1$ [27].
Our final observation is that different threshold values enhance the detectability of different communities. For example, community 1 is detectable when $w = 3$ for $\tilde{L} \ge 0.8w$ but not for $\tilde{L} = 0.5w$ [compare panels (a) and (b) to panel (c)]. Similarly, community 3 is detectable when $w = 9$ for $\tilde{L} \le 0.8w$ but not for $\tilde{L} = w$ [compare panels (b) and (c) to panel (a)]. Interestingly, in this experiment, we were able to identify a combination of parameters $(\tilde{L}, w)$ that allows accurate detection of all four communities; that is, $|\langle v^{(r)}, u^{(r)} \rangle|^2 \approx 1$ for bin $W_w(t)$ only when community $r$ is present in time layer $t$ [i.e., $t \in \mathcal{T}_r$]; otherwise, $|\langle v^{(r)}, u^{(r)} \rangle|^2 \approx 0$. We highlight these values of $(\tilde{L}, w)$ in panel (b) with a violet box. However, we stress that these "best" values for $(\tilde{L}, w)$ arise in this experiment because the communities are relatively similar in size (i.e., $K_r \in [3, 9]$) and density (i.e., $\rho_r = 1$). In general, one should not expect a single choice of parameters $(\tilde{L}, w)$ to work well for all communities, since the detectability-limit criterion given by Eq. (18) depends on a complex interplay between the network and community parameters $\{p_l\}$, $\rho_L$, $T_r$, $K_r$, $L$, and $\tilde{L}$.

V. DISCUSSION
There is considerable need to better understand how network preprocessing affects network-analysis methodologies. Herein, we studied how different methods for layer aggregation affect the detectability of small-scale communities in multilayer networks (including multilayer representations of temporal networks). Small-community detection is widely used for anomaly detection in network data [23][24][25][26][27][28]; in cybersecurity, for example, it allows detection of harmful events such as attacks [23], intrusions [24], and fraud [25]. Understanding limitations on small-community detection provides insight into the detectability of these harmful activities. Although most networks inherently change in time, previous theory for limitations on small-community detection has been restricted to single-layer networks [26,27] or summation-based aggregation [11]. We highlight that our model and analysis generalize these previous works in several ways: (i) a community has edge probability $\rho \in (0, 1]$ and is not necessarily a clique, (ii) a community can persist across a subset of layers, (iii) the mean edge probability $p_l$ can vary across network layers, and (iv) the multilayer or temporal network can simultaneously contain several communities.
FIG. 8. Detectability of small communities in temporal networks with time-window binning by summation and thresholding. We illustrate by color the values $|\langle v^{(r)}, u^{(r)} \rangle|^2$ given by Eq. (16) for each of the four communities $r \in \{1, 2, 3, 4\}$ with the substitutions $T_r(W_w(t)) \mapsto T$ and $w \mapsto L$ into Eqs. (13)-(18). Results are shown for bins of width $w \in \{1, 3, 5, 7, 9\}$ for a temporal network with $N = 10^4$ nodes, $L = 32$ time layers, and hidden communities as depicted in Fig. 7(a). The communities each contain $K_r = K = 8$ nodes and have different persistence lengths $T_r$. Layer aggregation across each bin was implemented by summation and thresholding at $\tilde{L}$. Panels (a)-(c) respectively indicate the choices $\tilde{L} = w$, $\tilde{L} = 0.8w$, and $\tilde{L} = 0.5w$. The violet box in panel (b) indicates combinations of thresholds and bin sizes that yield accurate detection of all four communities. We stress, however, that since the criterion given by Eq. (18) depends on a complex interplay between the community and network characteristics, one should not, in general, expect there to exist a single best combination for all communities.
Motivated in this way, we developed random-matrix theory [27,40] to analyze detectability phase transitions in which the dominant eigenvectors of modularity matrices associated with layer-aggregated multilayer networks localize onto communities, thereby allowing their detection. We developed theory for when a community with $K_r \ll N$ nodes is hidden (i.e., planted) in $T_r \le L$ layers of a multilayer network with $N$ nodes and $L$ layers. We found a detectability phase transition to occur for a given community $r$ when its size $K_r$ surpasses a detectability limit. When layers are aggregated by summation, the detectability limit $K_r^*$ is given by Eq. (11) and has the scaling behavior $K_r^* \propto \sqrt{NL}/T_r$. If $L$ is allowed to vary, this implies that summation-based aggregation enhances community detection even if the community exists in a vanishing fraction $T_r/L$ of layers, provided that $T_r/L$ decays more slowly than $\mathcal{O}(L^{-1/2})$. This result is surprising since layer aggregation still benefits community detection despite the fact that most layers carry no information about the community.
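To make the condition on $T_r/L$ concrete, consider the illustrative assumption (not made in the analysis above) that the persistence grows as a power law, $T_r = c L^{\alpha}$ with constants $c > 0$ and $0 \le \alpha \le 1$. Substituting into the scaling of the detectability limit gives:

```latex
K_r^* \;\propto\; \frac{\sqrt{NL}}{T_r}
      \;=\; \frac{\sqrt{N}}{c}\, L^{1/2 - \alpha},
\qquad
\frac{T_r}{L} \;=\; c\, L^{\alpha - 1}.
```

Under this assumption, $K_r^*$ decreases with $L$ exactly when $\alpha > 1/2$. In that regime the fraction $T_r/L$ still vanishes as $L \to \infty$ whenever $\alpha < 1$, but it does so more slowly than $L^{-1/2}$, which is the condition stated above.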
We also introduced and studied the utility of layer aggregation with thresholding as a nonlinear data filter to enhance small-community detection. Our analysis [particularly, Eq. (18)] revealed that in addition to implementing sparsification and dichotomization, thresholding can allow super-resolution community detection, whereby the detectability limit decreases by several orders of magnitude (see Fig. 6). In particular, we showed in Sec. III B that $K_r^*$ decays exponentially with $\sqrt{L}/T_r$ for clique detection in layer-aggregated sparse networks filtered by threshold $\tilde{L} = T_r$.
To illustrate practical implications of our results, in Sec. IV we presented an experiment involving the detection of small communities in a time-varying network, highlighting the following key insights: (i) Aggregating time layers into appropriate-sized bins can allow the detection of small communities that would otherwise be undetectable (that is, if the layers were considered in isolation or if all layers were aggregated). (ii) Layer aggregation by summation enhances community detection if the community persists across sufficiently many [specifically, $\mathcal{O}(L^{1/2})$] layers; otherwise, it can obscure detection.
(iii) Layer aggregation with thresholding is a filter that can allow super-resolution community detection of small communities that are otherwise too small for detection. (iv) The threshold that best enhances the detection of a small community depends on many parameters, and the detection of multiple communities should, in general, utilize multiple thresholds.
We have thus provided a theoretical framework supporting how small-community detection in temporal network data can be improved through network preprocessing in which network layers are binned into time windows and are aggregated using summation with thresholding. This filtering, however, should not be approached as a "one-size-fits-all" procedure. In particular, we find that there exist optimal time-window sizes $w$ and layer-aggregation strategies that, in general, are unique to each community (i.e., depending on its size, density, persistence across the layers, etc.). While it is important to consider a range of window sizes and layer-aggregation methods, this leads to an unavoidable trade-off between computational cost and sufficient exploration of different parameters.
Before concluding, we discuss implications of our work regarding the topic of eigenvector localization in complex networks, which is an important topic in network science [44,45] for the study of centrality [46][47][48], spatial analysis [49], and core-periphery structure [50,51]. In particular, there is growing interest in extending these ideas to time-varying [52] and multilayer networks [53]. Recently, Ref. [54] showed that an Anderson-localization-type transition occurs for material transport on several real-world networks (e.g., interconnected ponds of melting sea ice, porous human bone, and resistor networks) and noted that they did not observe the wave interference and scattering effects that typically occur for Anderson localization (a widely studied phenomenon in which eigenfunctions localize onto defects in disordered materials [55,56]). Reference [54] found the phase transition to coincide with a phase transition in network connectivity due to eigenvector localization onto different connected components. Our work complements these findings, showing that a similar localization phenomenon can be brought on by small communities; that is, localization does not necessarily require network fragmentation. (We note in passing that connected components can be interpreted as one, and perhaps the strictest, notion of a community.) Future research should further explore the connection between community-based and connected-component-based eigenvector localization on networks, and their relationship to Anderson localization in materials. (See Refs. [57,58] for related research using network-based models for disordered and composite materials.)
Finally, we highlight other extensions to our work that would be interesting to pursue. Motivated by applications for data fusion, recent research [11] considered weighted averaging of adjacency matrices, which allows the weights for the different network layers to be optimized. It would be interesting to extend our research to weighted averages, which should be fairly straightforward by redefining $\langle \cdot \rangle$ in Eqs. (9)-(11) with weights. We leave open the joint optimization of weighting and thresholding. It would also be interesting to use our method to study the temporal behavior of communities [59], such as a set of nodes that form a recurring community in different time windows (i.e., periodically or stochastically).

FIG. 2. Eigenvector localization yields detectability phase transition. (a) Entries $v_i^{(r)}$ (symbols) of a dominant eigenvector of the modularity matrix for the summation network of a multilayer network with a hidden community of size $K_r$. Parameters include $T_r = 2$, $L = 16$, $N = 10^4$, $\rho = 1$, and the edge probabilities $\{p_l\}$ of layers are Gaussian distributed with mean $\langle p_l \rangle = 0.01$ and standard deviation $\sigma_p = 0.001$. To allow visualization, we assume nodes $i \in \{1, \dots, K\}$ are in the community, and we only visualize $v_i^{(r)}$

FIG. 4. Effective edge probabilities for threshold-based layer aggregation. Observed (symbols) and predicted values given by Eqs. (13) and (15) (curves) for the effective edge probability of the background network, $p^{(\tilde{L})}$, and for a community, $\rho_r^{(\tilde{L})}$, as a function of $\tilde{L}$. Network parameters include $N = 10^4$, $L = 16$, $T = 5$, and $\sigma_p = 0.001$ and either (a) $\langle p_l \rangle = 0.5$ or (b) $\langle p_l \rangle = 0.01$. Note that for the sparse network in panel (b), $\rho_r^{(\tilde{L})}$ undergoes an abrupt drop when $\tilde{L}$ surpasses $T_r = 5$.
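The abrupt drop noted for the sparse case in Fig. 4(b) can be reproduced qualitatively with a simplified calculation. The sketch below is not Eqs. (13) and (15): it assumes, for illustration only, that every layer has the same background edge probability $p$ (i.e., it ignores $\sigma_p$), that an edge inside the community appears independently with probability $\rho$ in each of its $T$ layers and with probability $p$ in the remaining $L - T$ layers, and that an edge survives thresholding when it appears in at least $\tilde{L}$ layers, with $\tilde{L}$ an integer. The function names are hypothetical.

```python
from math import comb

def binom_tail(n, q, k):
    """P(X >= k) for X ~ Binomial(n, q); k is an integer."""
    k = max(k, 0)
    return sum(comb(n, j) * q**j * (1 - q) ** (n - j) for j in range(k, n + 1))

def background_prob(L, p, threshold):
    """Probability that a background edge appears in >= threshold of L layers
    (assumes the same edge probability p in every layer)."""
    return binom_tail(L, p, threshold)

def community_prob(L, T, p, rho, threshold):
    """Probability that a within-community edge survives thresholding.

    Assumes the edge appears with probability rho in each of the T layers
    containing the community and with probability p in the other L - T layers.
    """
    total = 0.0
    for j in range(T + 1):
        # probability of exactly j appearances in the T community layers
        in_comm = comb(T, j) * rho**j * (1 - rho) ** (T - j)
        # the remaining threshold - j appearances must come from the other layers
        total += in_comm * binom_tail(L - T, p, threshold - j)
    return total
```

With $L = 16$, $T = 5$, $\rho = 1$, and $p = 0.01$, `community_prob` stays at one for thresholds up to 5 and falls to $1 - 0.99^{11} \approx 0.105$ at threshold 6, whereas with $p = 0.5$ it remains above 0.99 at threshold 6, consistent with the contrast between panels (a) and (b).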