Abstract
Applied network science often involves preprocessing network data before applying a network-analysis method, and there is typically a theoretical disconnect between these steps. For example, it is common to aggregate time-varying network data into windows prior to analysis, and the trade-offs of this preprocessing are not well understood. Focusing on the problem of detecting small communities in multilayer networks, we study the effects of layer aggregation by developing random-matrix theory for modularity matrices associated with layer-aggregated networks with nodes and layers, which are drawn from an ensemble of Erdős–Rényi networks with communities planted in subsets of layers. We study phase transitions in which eigenvectors localize onto communities (allowing their detection) and which occur for a given community provided its size surpasses a detectability limit . When layers are aggregated via a summation, we obtain , where is the number of layers across which the community persists. Interestingly, if is allowed to vary with , then summation-based layer aggregation enhances small-community detection even if the community persists across a vanishing fraction of layers, provided that decays more slowly than . Moreover, we find that thresholding the summation can, in some cases, cause to decay exponentially, decreasing by orders of magnitude in a phenomenon we call super-resolution community detection. In other words, layer aggregation with thresholding is a nonlinear data filter enabling detection of communities that are otherwise too small to detect. Importantly, different thresholds generally enhance the detectability of communities having different properties, illustrating that community detection can be obscured if one analyzes network data using a single threshold.
1 More- Received 14 September 2016
DOI:https://doi.org/10.1103/PhysRevX.7.031056
Published by the American Physical Society under the terms of the Creative Commons Attribution 3.0 License. Further distribution of this work must maintain attribution to the author(s) and the published article’s title, journal citation, and DOI.
Published by the American Physical Society
Physics Subject Headings (PhySH)
Popular Summary
Networks can represent relationships between the online behaviors of individuals such as emailing, file sharing, and purchasing history. Small-community detection, which attempts to identify anomalous clusters within such networks, has important applications in cybersecurity for detecting attacks, intrusions, and fraud. Before analyzing the structural patterns of these networks, data are often preprocessed using a variety of techniques. While there is a wealth of theory for network analysis methodology, most preprocessing is done on an ad hoc basis, and there is significant need for an encompassing theory bridging these two steps.
Here, we focus on the task of detecting small communities in large multilayer networks, wherein “layers” encode different types of connections, such as different instances in time. Because it can be beneficial to aggregate layers that are similar, we analyze the effects of layer aggregation on the detectability of communities. We analyze a novel, important metric: the number of layers a community must persist across in order for aggregation to benefit detection. In addition, we introduce layer aggregation with thresholding as a nonlinear data filter, showing that this preprocessing step can allow super-resolution detection of communities that are otherwise too small to detect.
This work paves the way for a new class of holistic methods that simultaneously address both the data preprocessing and network analysis steps. Such an approach will lead to improved pattern analytics in cybersecurity and broadly benefit the analysis of diverse networks arising in the social, biological, and engineering sciences.