• Open Access

Hierarchical Block Structures and High-Resolution Model Selection in Large Networks

Tiago P. Peixoto
Phys. Rev. X 4, 011047 – Published 24 March 2014
PDFHTMLExport Citation

Abstract

Discovering and characterizing the large-scale topological features in empirical networks are crucial steps in understanding how complex systems function. However, most existing methods used to obtain the modular structure of networks suffer from serious problems, such as being oblivious to the statistical evidence supporting the discovered patterns, which results in the inability to separate actual structure from noise. In addition to this, one also observes a resolution limit on the size of communities, where smaller but well-defined clusters are not detectable when the network becomes large. This phenomenon occurs for the very popular approach of modularity optimization, which lacks built-in statistical validation, but also for more principled methods based on statistical inference and model selection, which do incorporate statistical validation in a formally correct way. Here, we construct a nested generative model that, through a complete description of the entire network hierarchy at multiple scales, is capable of avoiding this limitation and enables the detection of modular structure at levels far beyond those possible with current approaches. Even with this increased resolution, the method is based on the principle of parsimony, and is capable of separating signal from noise, and thus will not lead to the identification of spurious modules even on sparse networks. Furthermore, it fully generalizes other approaches in that it is not restricted to purely assortative mixing patterns, directed or undirected graphs, and ad hoc hierarchical structures such as binary trees. Despite its general character, the approach is tractable and can be combined with advanced techniques of community detection to yield an efficient algorithm that scales well for very large networks.

  • Figure
  • Figure
  • Figure
  • Figure
  • Figure
  • Figure
  • Figure
1 More
  • Received 5 November 2013

DOI:https://doi.org/10.1103/PhysRevX.4.011047

This article is available under the terms of the Creative Commons Attribution 3.0 License. Further distribution of this work must maintain attribution to the author(s) and the published article’s title, journal citation, and DOI.

Published by the American Physical Society

Authors & Affiliations

Tiago P. Peixoto*

  • Institut für Theoretische Physik, Universität Bremen, Hochschulring 18, D-28359 Bremen, Germany

  • *tiago@itp.uni-bremen.de

Popular Summary

Many social, technological, and biological networks are known to organize into modules or “communities,” groups of nodes with a similar connectivity pattern. At first sight, the task of identifying and characterizing these modules seems a simple one. This simplicity is, however, deceiving. Despite a large amount of effort aimed at implementing this task, several fundamental questions still remain open: How should one formally quantify the network’s modular structure? Given a formal characterization, what is the optimal method to identify modules from empirical or simulated data? How should one distinguish modules that arise from some underlying organizational principle from those that occur purely by chance? Why do some existing methods have a “resolution” limit, in other words, fail to detect modules smaller than a certain size? And, how does one overcome such limits? In this work, we show that the concepts of statistical inference and modular hierarchy may hold the key to all these questions, in particular, in terms of providing a principled means for distinguishing structure from noise and a detection method without a resolution limit.

We approach the module-detection problem in two steps. First, we view networks as the results of a nested generative model that is capable of incorporating arbitrary hierarchical modular structures present at different size scales. Indeed, this model serves as a general and formally satisfying definition of arbitrary modular structure and provides a more faithful description for many empirical networks. Based on this view, we then use principled statistical inference methods to detect modules of different sizes and show that this detection approach can reliably distinguish genuine modules from spurious ones arising from noise while at the same time not suffering from severe resolution limits present in other approaches.

Our modular-hierarchy approach does not impose any specific constraints on the types of modular structures, and the statistical inference methods are derived from basic principles rather than being ad hoc. Despite its general character, the approach can be effectively combined with some of the existing community-detection techniques for efficient implementations on very large empirical networks.

Key Image

Article Text

Click to Expand

Supplemental Material

Click to Expand

References

Click to Expand
Issue

Vol. 4, Iss. 1 — January - March 2014

Subject Areas
Reuse & Permissions
Author publication services for translation and copyediting assistance advertisement

Authorization Required


×
×

Images

×

Sign up to receive regular email alerts from Physical Review X

Reuse & Permissions

It is not necessary to obtain permission to reuse this article or its components as it is available under the terms of the Creative Commons Attribution 3.0 License. This license permits unrestricted use, distribution, and reproduction in any medium, provided attribution to the author(s) and the published article's title, journal citation, and DOI are maintained. Please note that some figures may have been included with permission from other third parties. It is your responsibility to obtain the proper permission from the rights holder directly for these figures.

×

Log In

Cancel
×

Search


Article Lookup

Paste a citation or DOI

Enter a citation
×