Detectability thresholds and optimal algorithms for community structure in dynamic networks

We study the fundamental limits on learning latent community structure in dynamic networks. Specifically, we study dynamic stochastic block models where nodes change their community membership over time, but where edges are generated independently at each time step. In this setting (which is a special case of several existing models), we derive the detectability threshold exactly, as a function of the rate of change and the strength of the communities. Below this threshold, we claim that no algorithm can identify the communities better than chance. We then give two algorithms that are optimal in the sense that they succeed all the way down to this limit. The first uses belief propagation (BP), which gives asymptotically optimal accuracy, and the second is a fast spectral clustering algorithm based on linearizing the BP equations. We verify our analytic and algorithmic results via numerical simulation, and close with a brief discussion of extensions and open questions.

Relational or interaction variables are a common feature of modern data sets, and these are often represented as a network. Examples include friendships or communication within a social network, regulatory interactions among genes, transportation between cities, and relations or hyperlinks in information systems. Many, perhaps most, of these systems are also dynamic in nature, and their evolving structure is commonly represented as a sequence of graphs [1-8]. Recently, a variety of techniques have been developed for automatically detecting communities (a task similar to traditional clustering [9], but on graphs) in these dynamic networks. These techniques include variants of multilayer or temporal modularity optimization [5, 10, 11], non-negative matrix or tensor factorization [3, 6, 8, 12, 13], minimum description length [14, 15], and probabilistic models [4, 7, 16-20]; see Refs. [21, 22] for reviews. Despite these advances, relatively little is known about their optimality or about the fundamental difficulty of detecting community structure in dynamic networks. In this paper, we derive a mathematically precise threshold on the detectability of communities in dynamic networks and give two algorithms that are optimal in the sense that they succeed all the way down to this threshold.
Community detection in dynamic networks inherits many of the challenges of community detection in static networks, including learning the number of communities, their sizes and memberships, and the pattern of connections among communities (e.g., assortative, disassortative, or core-periphery). It also poses new challenges, because both the network edges and the community memberships may evolve over time. A common approach is simply to take the union of the graphs over a certain time window and treat the resulting graph with techniques from static network analysis [1], thereby ignoring the dynamics within the window. Here, we explicitly model the dynamic nature of these networks and the way community memberships change over time, integrating information about the communities in an optimal way.
Our approach relies on probabilistic generative models, which can be used to learn latent community structure in real networks via Bayesian inference, and to generate synthetic networks with known structure that can serve as benchmarks. A number of such models have recently been proposed for detecting communities in dynamic networks [4, 17], including those based on the stochastic block model (SBM) [16, 18] and its mixed-membership counterpart [7]. Indeed, the variant of the stochastic block model [23, 24] we analyze here is a special case of some of these models: namely, nodes change their community membership over time, but edges are generated independently at each time step. As a result, the network of connections between nodes at different times is locally treelike, which makes a belief propagation approach asymptotically optimal and allows us to compute the detectability threshold exactly.
In static networks, it has recently been shown that there exists a phase transition in the detectability of communities [25-28], such that below the transition no algorithm can recover the true communities better than chance (for two groups of equal size), while efficient algorithms exist above it. Here, we generalize this result to dynamic networks, deriving a mathematically precise expression for where the detectability transition occurs as a function of both the strength of the communities and how quickly their memberships change. When temporal correlations in community membership are present, we show that community detection in dynamic networks improves substantially over detection in static networks (or over clustering each graph of a dynamic network independently).
Finally, we give two principled and efficient algorithms for community detection in dynamic networks. Specifically, we use belief propagation (BP) to pass messages between neighbors, both within a given graph and between time-adjacent graphs, to integrate information over the network's history in an optimal way. We then linearize BP to obtain a spectral algorithm based on a dynamic version of the non-backtracking matrix [29, 30]. We show experimentally that these algorithms can accurately recover the true community structure in dynamic networks all the way down to the threshold.

I. A DYNAMIC STOCHASTIC BLOCK MODEL
The stochastic block model (SBM) is a classic model of community structure in static networks. Here, we use a variant of the SBM in which the community labels of nodes change over time but edges are independent, which is a special case of several models previously introduced for community detection in dynamic networks [4, 7, 16-18]. Crucially, our variant captures the important behavior of changing community labels while remaining analytically tractable.
Under the SBM, a graph G = (V, E) is generated as follows. Using a prior distribution q_r over k group or community labels, we assign each of the n nodes i ∈ V to a group g_i. We then generate the edges E according to the probabilities specified by a k × k community interaction matrix p and the group assignments g. In the sparse case, where |E| = O(n), the resulting network is locally treelike and the number of edges between groups is Poisson distributed with parameter c_rs = n p_rs.
In a dynamic network, we have a sequence of graphs G(t) = (V, E(t)) with 0 ≤ t ≤ T, where each graph has its own group assignment vector g(t) = {g_i(t) | i ∈ V}. To generate these assignments, we first draw each g_i(0) from the prior, so that each node has probability q_r of being in community 1 ≤ r ≤ k. With probability η, each node keeps its label from one time step to the next, g_i(t) = g_i(t−1); otherwise it chooses a new label g_i(t) from the prior q_r. Formally, the transition probability for community memberships is

P(g_i(t) = s | g_i(t−1) = r) = η δ_{rs} + (1 − η) q_s ,     (1)

where δ_{a,b} = 1 if a = b and 0 otherwise. The edges E(t) are then generated independently for each t according to the community interaction matrix p, by connecting each pair of nodes i, j at time t with probability p_{g_i(t), g_j(t)}. Note that while the group assignments may change over time, the matrix p remains constant. Subsequently, we use A^(t) to denote the adjacency matrix of the graph (V, E(t)) at time t, and D^(t) to denote the diagonal matrix of node degrees at time t, i.e., D^(t)_{uu} = Σ_w A^(t)_{uw}. At successive times in this model, edges are correlated only through the group assignments {g(t)}. Given these, the full likelihood of a graph sequence under this dynamic SBM is

P({G(t)}, {g(t)}) = P({g(t)}) Π_{t=0}^{T} Π_{i<j} [p_{g_i(t), g_j(t)}]^{A^(t)_{ij}} [1 − p_{g_i(t), g_j(t)}]^{1 − A^(t)_{ij}} ,     (2)

where P({g(t)}) = P(g(0)) Π_{t=1}^{T} Π_i P(g_i(t) | g_i(t−1)). For our subsequent analysis, we focus on the common choices of a uniform prior q_r = 1/k and a matrix c_rs = n p_rs with two distinct entries: c_rs = c_in if r = s and c_rs = c_out if r ≠ s. In this setting, the average degree of each graph is c = [c_in + (k − 1) c_out] / k. We are interested in the sparse regime where c = O(1), because most real-world networks of interest are sparse (e.g., the Facebook social network), and sparsity allows us to carry out asymptotically optimal inference. Note that the case where groups have distinct average degrees is easier than the equal-average-degree case we consider, because distinct average degrees give prior information about group memberships.
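The generative process above is straightforward to simulate. The following sketch (in Python with NumPy; the function name and the dense adjacency representation are our own illustrative choices, not from the paper) samples label trajectories and edge sets from the dynamic SBM with a uniform prior:

```python
import numpy as np

def generate_dynamic_sbm(n, T, k, c_in, c_out, eta, rng=None):
    """Sample group labels and adjacency matrices from the dynamic SBM:
    uniform prior q_r = 1/k; labels copied with probability eta,
    otherwise redrawn from the prior; edges drawn independently at each
    time step with p_rs = c_rs / n."""
    rng = np.random.default_rng(rng)
    p_in, p_out = c_in / n, c_out / n
    g = np.empty((T + 1, n), dtype=int)
    g[0] = rng.integers(k, size=n)                 # g_i(0) ~ uniform prior
    for t in range(1, T + 1):
        keep = rng.random(n) < eta                 # copy label w.p. eta
        g[t] = np.where(keep, g[t - 1], rng.integers(k, size=n))
    A = np.zeros((T + 1, n, n), dtype=int)
    iu, ju = np.triu_indices(n, 1)                 # each pair i < j once
    for t in range(T + 1):
        same = g[t][iu] == g[t][ju]
        edges = rng.random(len(iu)) < np.where(same, p_in, p_out)
        A[t][iu[edges], ju[edges]] = 1
        A[t] += A[t].T                             # symmetrize
    return g, A
```

The expected degree of each graph, c = (c_in + (k − 1) c_out)/k, and the label-persistence probability η + (1 − η)/k can both be checked empirically on the output.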

II. THE DETECTABILITY THRESHOLD IN DYNAMIC NETWORKS
The fundamental question we now consider is, under what conditions can we detect, better than chance, the correct time-evolving labeling of the latent communities in this model?
Previous work on community detection in static networks has shown that there exists a sharp threshold below which no algorithm can perform better than chance in recovering the latent community structure [25, 26], at least in the case k = 2. This threshold occurs at positive values of the difference between the internal and external connection probabilities, meaning that community structure may still exist but be undetectable. In terms of the SBM's parameters, this phase transition occurs at

|c_in − c_out| = k √c .     (3)

In a dynamic network where community memberships are correlated across time, we will exploit these correlations to improve upon the static detectability threshold. In the worst case, when these temporal correlations are absent, i.e., η = 0, we should do no worse than the static threshold. To facilitate our analysis, we define an extended graph structure, called a spatiotemporal graph, in which we take the graphs G(t) and add special "temporal" edges connecting each node i(t) with its time-adjacent versions i(t−1) and i(t+1). Under our model, the "spatial" edges E(t) are independent and sparse, implying that this spatiotemporal graph is locally treelike.
Consider a particular node i(t) as n → ∞ and T → ∞. Moving outward in space and time, inference becomes a tree reconstruction problem, with a stochastic transition matrix along each edge. Along each spatial edge this matrix is

σ = λ I + (1 − λ) J / k ,     (4)

where I is the identity matrix, J is the matrix of all 1s, and

λ = (c_in − c_out) / (kc) .     (5)

Similarly, along each temporal edge we have the stochastic matrix

τ = η I + (1 − η) J / k .     (6)

Thus, moving along a spatial or temporal edge copies a community label with probability λ or η, respectively, and otherwise randomizes it according to the prior. That is, these edges multiply the distribution of labels by the stochastic matrices σ and τ, whose eigenvalues are λ and η, other than the trivial eigenvalue 1 corresponding to the uniform distribution.
Since each node in the spatiotemporal graph has Poi(c) (a Poisson-distributed number, with mean c) spatial edges but exactly two temporal edges, the tree is generated by a two-type branching process. Each spatial edge gives rise to two temporal edges (to the time-adjacent versions of its endpoint), each temporal edge gives rise to one temporal edge (continuing in the same direction in time), and both give rise to Poi(c) spatial edges. Thus the matrix describing the expected number of children (where we multiply a column vector of populations on the left) is

M = [[c, c], [2, 1]] .

Using the results of Ref. [31], the detectability threshold occurs when the largest eigenvalue of the weighted matrix

[[cλ², cλ²], [2η², η²]]

exceeds unity, which yields

cλ² > (1 − η²) / (1 + η²) .     (7)

When η = 0, i.e., when there is no temporal correlation in community assignments, Eq. (7) recovers the static detectability threshold cλ² > 1, which is equivalent to Eq. (3). On the other hand, when η = 1, i.e., when the community assignments are fixed across time, we may simply aggregate the graph over T, making it arbitrarily dense. We then have detectability for any λ > 0, implying that any amount of community structure can be detected. At intermediate values of η, the detectability threshold falls between these two extremes. This analysis corresponds to robust reconstruction on trees, where we are given noisy information at the leaves of a tree and want to propagate this information to the root [31]. For k = 2 groups, it is known rigorously in the static case [26] that detecting the communities below this bound is information-theoretically impossible; we conjecture that the same is true in the dynamic case. For k > 4 groups, it has been conjectured [25] that it is information-theoretically possible to succeed beyond the Kesten-Stigum bound, but that doing so takes exponential time.
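The spectral condition and the closed form of Eq. (7) can be checked against each other numerically. A minimal sketch (function names are our own; the equivalence follows from evaluating the characteristic polynomial of the weighted matrix at 1):

```python
import numpy as np

def detectable(c, lam, eta):
    """Kesten-Stigum criterion for the dynamic SBM: detectable when the
    largest eigenvalue of the weighted branching matrix exceeds unity."""
    M = np.array([[c * lam**2, c * lam**2],
                  [2 * eta**2,     eta**2]])
    return bool(np.max(np.abs(np.linalg.eigvals(M))) > 1)

def detectable_closed_form(c, lam, eta):
    """Equivalent closed form, Eq. (7)."""
    return c * lam**2 > (1 - eta**2) / (1 + eta**2)
```

At η = 0 both reduce to the static condition cλ² > 1, and as η → 1 the threshold vanishes, so any λ > 0 becomes detectable.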

III. BAYESIAN INFERENCE OF THE MODEL
Given an observed graph sequence {G(t)}, we use Bayesian inference to learn the posterior distribution of latent community assignments:

P({g(t)} | {G(t)}) = P({G(t)}, {g(t)}) / Σ_{{g(t)}} P({G(t)}, {g(t)}) .     (8)

FIG. 1. A schematic representation of belief propagation messages (see Eqs. (9) and (10)) being passed along spatial and temporal edges in a spatiotemporal graph.
This distribution is hard to compute in general because the summation runs over an exponential number of terms. However, when the spatiotemporal graph is sparse, as generated by our model, we may make a controlled Bethe approximation (known as belief propagation (BP) in machine learning and as the "cavity method" in statistical physics) that allows us to carry out Bayesian inference in an efficient and asymptotically optimal way. We now describe a BP algorithm for learning our model from data, which we then linearize to obtain a fast spectral approach based on a dynamic version of the non-backtracking matrix. This yields two inference algorithms that perform accurately all the way down to the transition.

A. Belief propagation
Instead of inferring the joint posterior distribution, we use belief propagation to compute the posterior marginal probabilities {μ^i_r(t)} of the node labels over time. Belief propagation assumes conditional independence of these marginals, which is exact when the graph is a tree and a good approximation when the graph is locally treelike, as in our spatiotemporal graph. In our setting, nodes update their current beliefs about the marginals according to the marginals of both their spatial and temporal neighbors. That is, we define two types of messages: spatial messages passed along spatial edges and temporal messages passed along temporal edges. Figure 1 illustrates this message-passing scheme for a spatiotemporal graph.
A spatial message μ^{i→j}_r(t) gives the marginal probability that node i at time t is in community r when we consider node j to be absent at time t. This message is computed as

μ^{i→j}_r(t) = (1/Z^{i→j}(t)) e^{−h_r(t)} [Σ_s τ_{sr} μ^{i(t−1)→i(t)}_s] [Σ_s τ_{sr} μ^{i(t+1)→i(t)}_s] Π_{ℓ ∈ ∂i(t)\j} [Σ_s c_{sr} μ^{ℓ→i}_s(t)] ,     (9)

where Z^{i→j}(t) is the normalization. The temporal message μ^{i(t)→i(t+1)}_r (or μ^{i(t)→i(t−1)}_r) represents the marginal probability that node i at time t is in community r when we consider node i to be absent at time t+1 (or t−1), and has a similar form:

μ^{i(t)→i(t+1)}_r = (1/Z^{i(t)→i(t+1)}) e^{−h_r(t)} [Σ_s τ_{sr} μ^{i(t−1)→i(t)}_s] Π_{ℓ ∈ ∂i(t)} [Σ_s c_{sr} μ^{ℓ→i}_s(t)] .     (10)

When t = 0 or t = T, we remove the term corresponding to the temporal edge coming from outside the domain of t. Furthermore, following past work on BP for the static SBM [25, 32], we exploit these networks' sparsity to reduce the computational complexity of the spatial updates, at the cost of introducing only O(1/n) corrections in sparse graphs. Specifically, we let μ^{ℓ→i}_s(t) be the same for all of ℓ's non-neighbors i. We then model the effect of all such non-edges as an adaptive external field on each node, which depends on the current estimated marginals μ^ℓ_s(t). That is, we let

Π_ℓ Σ_s (1 − p_rs) μ^ℓ_s(t) ≈ e^{−h_r(t)} , where h_r(t) = (1/n) Σ_ℓ Σ_s c_rs μ^ℓ_s(t) ,

which has the effect of preventing belief propagation from putting all the nodes at a given time into the same community. The adaptive fields only need to be updated after each BP iteration. This approximation yields a significant improvement in efficiency, reducing the computational complexity so that it is proportional to the total number of edges in the spatiotemporal graph, O(cnT), rather than O(n²T).
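As a small illustration of this sparse-update trick, the adaptive external field h_r(t) for one time step reduces to a single matrix-vector product over the current marginals. A sketch (the function name and array conventions are our own assumptions):

```python
import numpy as np

def adaptive_fields(mu, c_mat):
    """Adaptive external field h_r(t) = (1/n) sum_l sum_s c_rs mu_s^l(t),
    recomputed once per BP sweep from the current marginals.
    mu: (n, k) array of the marginals of all nodes at one time step;
    c_mat: k x k matrix with entries c_rs = n * p_rs."""
    n = mu.shape[0]
    return (c_mat @ mu.sum(axis=0)) / n     # shape (k,): one field per group
```

With uniform marginals μ^ℓ_s = 1/k, every field equals the average degree c, so the factorized fixed point feels no net bias toward any group; the field only breaks symmetry once the marginals do.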
Once the BP messages converge, we compute the marginal probability μ^i_r(t) that node i belongs to group r at time t. This computation is identical to Eqs. (9) and (10), except that we take all incoming edges into account. We then obtain a partition by marginalization, which assigns each node to its most likely group:

ĝ_i(t) = argmax_r μ^i_r(t) .     (11)

It is well known in Bayesian inference [33] that if the marginals are exact, then this marginalized partition is the optimal estimator of the latent community labels. Because spatiotemporal graphs under our model are sparse, the marginals given by BP are asymptotically correct as n → ∞. Thus, our BP algorithm succeeds all the way down to the detectability threshold given by Eq. (7), and gives an asymptotically optimal partition in terms of accuracy.

B. Spectral clustering
The BP equations described above can be linearized to obtain a fast spectral approach for detecting community structure in dynamic networks. It is easy to verify that in our setting, when q_r = 1/k, the average degree in each group is c. This implies that the BP equations always have a uniform solution, which we call the factorized fixed point. This fixed point reflects only the permutation symmetry of the system, and it may be unstable to random perturbations. If we use the correct parameters in the BP equations, i.e., the same parameters used to generate the observed network, then in the language of physics the system lies on the Nishimori line [33]. That is, if the BP messages deviate from the factorized solution, they are correlated with the latent community labels, and there is no spin glass phase in the system [33]. This allows us to simplify the BP equations by studying how the messages deviate from the factorized solution, which results in a linearized version of BP. In the static SBM, this linearization is equivalent to a spectral clustering algorithm using the non-backtracking matrix [29].
To do this, we rewrite the BP messages μ^{i(t)→j(t)}_r as the uniform fixed point 1/k plus a deviation away from it. The vector of deviations is given by

ε^{i(t)→j(t)}_r = μ^{i(t)→j(t)}_r − 1/k ,     (12)

and the linearized BP equations are then

ε^{i(t)→j(t)} = Σ_{ℓ(t) ∈ ∂i(t)\j(t)} U ε^{ℓ(t)→i(t)} + V [ε^{i(t−1)→i(t)} + ε^{i(t+1)→i(t)}] ,     (13)
where ∂i(t) denotes the spatial neighbors of i(t), and U and V denote derivatives of the update equations, with respect to the incoming spatial and temporal messages respectively, evaluated at the factorized fixed point. Solving Eq. (13) amounts to finding eigenvectors of the Jacobian matrix B composed of these derivatives of the BP messages. However, B is a square matrix whose dimension is the number of directed messages, cnT + 2n(T−1), which is relatively large for an eigenvector problem. Using the non-backtracking matrix approach [29], we convert this into a smaller eigenvector problem, of size 4nT × 4nT, by defining a matrix B′ built from the following blocks: I denotes the nT-dimensional identity matrix; A^temp is the adjacency matrix of temporal edges, with A^temp_{(u,t),(v,t′)} = δ_{uv} (δ_{t,t′+1} + δ_{t,t′−1}); D^temp is the diagonal matrix of temporal degrees, with D^temp_{(u,t),(u,t)} = 2 if 0 < t < T, and 1 if t = 0 or t = T; A^spatial is the nT-dimensional matrix consisting of all the spatial edges, i.e., A^spatial = ⊕_t A^(t), meaning A^spatial_{(u,t),(v,t′)} = δ_{tt′} A^(t)_{uv}; and D^spatial = ⊕_t D^(t) is the diagonal matrix of spatial degrees, with D^spatial_{(u,t),(u,t)} = D^(t)_{uu}. We now obtain a spectral clustering algorithm using B′ as follows: given a spatiotemporal graph, we construct the matrix B′, take the vectors composed of the first nT entries of the eigenvectors associated with the largest (in absolute value) eigenvalues, and perform k-means clustering on the matrix composed of these vectors. This yields a partition of the nodes; if the desired number of clusters is two, we simply use the signs of the entries of the relevant eigenvector to separate the nodes into two communities.
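The four blocks entering B′ are simple to assemble. A sketch of their construction (the function name, the dense representation, and the index convention (u, t) ↦ tn + u are our own illustrative choices; a practical implementation would use sparse matrices):

```python
import numpy as np

def spatiotemporal_blocks(A_seq):
    """Build the nT x nT blocks A_temp, D_temp, A_spatial, D_spatial used
    to assemble the dynamic non-backtracking operator B'.
    A_seq: length-T list of n x n adjacency matrices, one per time step;
    node u at time t is mapped to row/column t*n + u."""
    T, n = len(A_seq), A_seq[0].shape[0]
    path = np.zeros((T, T))                   # path graph on the T time steps
    idx = np.arange(T - 1)
    path[idx, idx + 1] = 1
    path[idx + 1, idx] = 1
    A_temp = np.kron(path, np.eye(n))         # temporal edges i(t) -- i(t+1)
    D_temp = np.diag(A_temp.sum(axis=1))      # 2 in the bulk, 1 at the ends
    A_spatial = np.zeros((n * T, n * T))
    for t, A in enumerate(A_seq):             # block-diagonal direct sum
        A_spatial[t*n:(t+1)*n, t*n:(t+1)*n] = A
    D_spatial = np.diag(A_spatial.sum(axis=1))
    return A_temp, D_temp, A_spatial, D_spatial
```

Since only the few eigenvectors with the largest-magnitude eigenvalues of B′ are needed, a sparse eigensolver (e.g., implicitly restarted Arnoldi) keeps the cost linear in the number of spatiotemporal edges.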
From the principle of linearization, we know that the real eigenvalues of the non-backtracking matrix B′ describe the stability of the fixed points of the BP equations: a real eigenvalue larger than unity indicates that the factorized fixed point is unstable, so that BP flows toward a different, stable fixed point. Conversely, if the BP equations have a stable fixed point correlated with the latent community labels, then B′ should have a real eigenvalue larger than unity whose eigenvector yields a partition of the nodes correlated with those labels. Thus, our spectral clustering algorithm should work whenever BP works, implying that it also succeeds all the way down to the detectability transition in sparse networks.
In Fig. 2 (left) we show the spectrum of B′ in the complex plane for a network in the detectable regime, generated by our model. As with existing non-backtracking approaches [29], most of the eigenvalues are confined to a disk, while several real eigenvalues fall outside it. In this example, the entries of the eigenvector associated with the largest real eigenvalue all have the same sign, so this leading or "ferromagnetic" eigenvector does not yield information about the latent community structure. In practice, we can apply regularizations that push such ferromagnetic eigenvectors back into the bulk, thereby lifting the eigenvectors correlated with the latent community structure to the top positions. Eigenvectors associated with the other real eigenvalues outside the bulk are correlated with the latent community structure. In this case, because we have two groups, we obtain the inferred partition from the signs of the entries of the second real eigenvector v₂.

IV. NUMERICAL VERIFICATION
To verify our claims about the detectability transition in dynamic networks, and the accuracy of our algorithms, we conduct the following numerical experiment. Using our generative model of dynamic networks with community structure, we generate a number of dynamic networks for various choices of (ε, η), where ε = c_out/c_in. When ε = 0, communities are maximally strong, with every edge falling within a community, while at ε = 1 we have Erdős-Rényi random graphs with no community structure. We then use our BP or spectral algorithm to infer the group assignments, assuming within each sequence that the parameters {η, ε, c} are known. For each choice of (ε, η), we average our results over 100 dynamic networks with T = 40 graphs and n = 512 nodes (for 20,480 nodes total), with average degree c = 16, divided into k = 2 latent communities.
We measure the accuracy of the inferred community labels by the overlap between the latent partition g* and the inferred one ĝ. This is the fraction of nodes labeled correctly, maximized over all k! permutations of the groups, and normalized so that it is 1 if ĝ = g* and 0 if ĝ is uniformly random. In Fig. 2 (right) we show the overlap obtained by BP for dynamic networks as a function of ε for several choices of η. The detectability threshold for each η, from Eq. (7), is shown as a vertical line in the lower panel. When η = 0, we recover the static detectability threshold given by Eq. (3). As we increase η, the phase transition occurs at increasing values of ε, as predicted, with the largest increase occurring at η = 1.
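The overlap measure can be implemented directly by brute-force maximization over the k! label permutations, which is fine for small k. A sketch (the function name is our own):

```python
import numpy as np
from itertools import permutations

def overlap(g_true, g_pred, k):
    """Overlap between latent and inferred partitions: the fraction of
    correctly labeled nodes, maximized over all k! label permutations,
    rescaled so that chance (1/k, for equal-size groups) maps to 0 and
    perfect recovery maps to 1."""
    g_true, g_pred = np.ravel(g_true), np.ravel(g_pred)
    best = max(np.mean(g_true == np.take(perm, g_pred))
               for perm in permutations(range(k)))
    return (best - 1.0 / k) / (1.0 - 1.0 / k)
```

For a dynamic network, g_true and g_pred are the full (T+1) × n label arrays, so a single permutation is applied across all time steps at once.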
Similar results are obtained for other choices of n and T, with better agreement for larger networks. The slight deviation between the numerical and analytic transition points observable in Fig. 2 (right) is a finite-size effect, which we numerically estimate to decay like O(1/√(nT)).
Figure 3 shows the overlap throughout the (ε, η) plane for both the BP and spectral algorithms, along with the threshold line given by Eq. (7). Notably, both algorithms perform similarly: for small ε they achieve large overlap, indicating that the learned partition is highly correlated with the latent community structure. As ε increases (weakening the community structure), both algorithms encounter a second-order phase transition at which the overlap decreases from a finite value to zero. Separate numerical experiments indicate that the convergence time of BP diverges in the vicinity of the phase transition, in agreement with past work on the detectability threshold in static networks [25]. We also find that at each point in the (ε, η) plane the accuracy of BP is always larger than that of the spectral algorithm, especially away from the transition, reflecting the optimality of our BP algorithm.

V. CONCLUSIONS
We have derived a mathematically precise and general limit to the detectability of communities in dynamic networks.This threshold assumes a probabilistic model of community structure that is a special case of several previously developed methods to detect dynamic communities: specifically, where nodes may change their community membership over time, but where edges are generated independently at each time step.We also gave two efficient algorithms for learning latent community structure that are optimal in the sense that they succeed all the way down to the detectability threshold in dynamic networks.
A simple extension of our algorithm is to apply our BP equations to a dense network consisting of all spatial edges from all graphs projected to a single time t, handling the message passing across time steps by using a damping factor τ^{|t−t′|}. This approach extends our analysis to networks that evolve in continuous time rather than in discrete time steps.
For larger numbers of groups, such as k > 4, it has been conjectured [25] that there is a "hard but detectable" regime where the factorized fixed point described in Section III B is locally stable, but where one or more accurate fixed points exist as well.In such a regime, community detection is information-theoretically possible, but we believe that it takes exponential time (though see [34] for the case where the number of groups grows with n).We propose this as a direction for further work.
Other directions for future work include handling cases where the community interaction matrix p may also change over time (a situation similar to change-point detection in networks [35]), where edges are not generated independently at each time step, or where networks have edge weights [32] or node annotations.

FIG. 2. (left) Spectrum (in the complex plane) of the matrix B′ for a network generated by our model with n = 300, c = 3, k = 2 groups, and (ε, η) = (0.05, 0.5). The complex eigenvalues are circumscribed by the circle. (right) Overlap as a function of ε for different values of η (given in the legend). The detectability thresholds for each choice of η, according to Eq. (7), are shown as vertical lines in the lower panel, and the hatched area shows the region of detectability in static networks [25]. Each data point is the average of 100 instances of dynamic networks from our model, with n = 512, T = 40, k = 2 groups, and average degree c = 16.

FIG. 3. Heat maps showing the numerically estimated overlap for the (left) belief propagation and (right) spectral algorithms. The detectability threshold from Eq. (7) is shown as a solid line. Each point shows the average over 100 instances of dynamic networks drawn from our model with n = 512, T = 40, k = 2 groups, and average degree c = 16.