Consistency between ordering and clustering methods for graphs

A relational dataset is often analyzed by optimally assigning a label to each element through clustering or ordering. While similar characterizations of a dataset would be achieved by both clustering and ordering methods, the former has been studied much more actively than the latter, particularly for the data represented as graphs. This study fills this gap by investigating methodological relationships between several clustering and ordering methods, focusing on spectral techniques. Furthermore, we evaluate the resulting performance of the clustering and ordering methods. To this end, we propose a measure called the label continuity error, which generically quantifies the degree of consistency between a sequence and partition for a set of elements. Based on synthetic and real-world datasets, we evaluate the extents to which an ordering method identifies a module structure and a clustering method identifies a banded structure.

When a graph consists of subgraphs in each of which vertices are densely connected, the graph structure is referred to as a community structure. A common approach for extracting a community structure is the partitioning of graphs, termed community detection [2,3] or graph clustering [1]. In this approach, an algorithm assigns a group label to each vertex such that vertices with the same group label are densely connected. Alternatively, we may also identify densely connected vertices through an ordering method that infers the optimal ordering of vertices such that vertices close to each other in the sequence are densely connected. The corresponding optimization problems are collectively termed the minimum linear arrangement [11-13] or envelope reduction [14,15], and the inferred structural property is called a banded structure or sequentially local structure [10]. As exemplified in Fig. 1, densely connected vertices are clearly revealed by visualizing the graph and the adjacency matrix based on an appropriate vertex ordering.
Despite the similarity between these two approaches, the clustering problem has received far more attention in the literature. Figure 2 shows the number of articles with keywords that represent ordering (pink bars) or clustering (blue bars) problems. Most of the keywords for ordering problems represent general matrix ordering problems rather than vertex ordering problems for graphs (i.e., adjacency matrices), whereas the keywords for clustering problems mostly capture problems for graphs. Clustering methods have thus been studied and applied much more actively than ordering methods.

FIG. 1. Simple example of a graph (left) and its adjacency matrix (right) for identifying a community structure through the optimal ordering of vertices without specifying the group labels. The (i, j) element of the adjacency matrix is one (highlighted) when vertices i and j are connected, and zero (not highlighted) otherwise.
Spectral methods are popular in both ordering and clustering problems; the former and the latter are respectively termed spectral ordering [8,14,15] and spectral clustering [17][18][19]. In both methods, the leading eigenvector(s) of a Laplacian or its variant is used to identify the optimal ordering or clustering of vertices. Specifically, when a graph is partitioned into two groups based on the sorting of the eigenvector elements [17], the result of spectral clustering is generally consistent with the vertex sequence inferred by spectral ordering. However, spectral ordering and clustering algorithms are not generally consistent. For instance, when graphs are partitioned into more than two groups, it is common to employ the K-means algorithm [20] on K(> 2) leading eigenvectors to achieve a K-way partitioning [19]. By contrast, to identify the optimal vertex sequence using the spectral ordering method, we always use the eigenvector associated with the second leading eigenvalue. Therefore, it is nontrivial to determine the extent to which the two methods are quantitatively consistent. Even when we partition a graph into two groups, the result of spectral clustering may not be consistent with the vertex sequence obtained by spectral ordering when the K-means algorithm is used to obtain a partition.
We conduct a systematic investigation to evaluate the consistency between the spectral ordering and clustering methods. We first introduce a generic measure, referred to as the label continuity error (LCE), to quantify the difference between a sequence and partition for a set of elements (e.g., vertices of graphs). Intuitively, a sequence and partition are more consistent with each other if, for a given number of groups, the group label flips less often when following the elements in the specified order. We provide a more precise definition in the next section. Although we use this measure throughout the study, it is not the only method of quantifying consistency; we will revisit this point in Sec. V.
There are also several modern spectral clustering algorithms with unexplored ordering counterparts. These include the methods based on the modularity matrix [2,22], Bethe Hessian [23,24], and regularized Laplacian [25][26][27][28][29]. To fill this gap, we show how spectral ordering algorithms can be derived from optimization problems using the matrices on which these modern spectral clustering methods are based. Spectral ordering problems based on these matrices are formulated as variants of the classical spectral ordering problem [14,15] with different penalty terms and/or constraints.
The remainder of this paper is organized as follows.
Section II formally introduces the LCE to quantitatively evaluate the consistency between ordering and clustering methods and examines its properties. Section III formulates spectral ordering methods corresponding to existing spectral clustering methods for graphs. Using the LCE introduced in Sec. II and the methods formulated in Sec. III, we analyze the consistency between spectral ordering and clustering methods using synthetic and real-world networks in Sec. IV. Finally, Sec. V discusses the results of this study.

II. LABEL CONTINUITY ERROR
Let G(V, E) be a graph, where V = {v_1, ..., v_N} is the vertex set and E is the edge set. We assume that every vertex in the graph is distinguishable and let I = {1, ..., N} be the ordered set indicating the original sequence of the vertices, which corresponds to the subscripts in {v_1, ..., v_N}. For vertex v_i ∈ V (i ∈ I), we denote π(i) = π_i ∈ {1, ..., N} as the index after permutation (i.e., we use π as both a mapping and a variable) and π = {π(i) | i ∈ I} as the reordered sequence of the vertices. Similarly, we denote σ(i) = σ_i ∈ {1, ..., K} as the group label of vertex v_i and σ = {σ(i) | i ∈ I} as the partition of the vertex set. We also denote N_k as the number of vertices with group label k. Throughout this study, π̂ and σ̂ represent the sequence and partition inferred by algorithms, respectively. We denote d_i for the degree of vertex v_i.

A. Definition
We introduce a measure to quantify the consistency between a sequence π and partition σ. We define the sequence π as consistent with σ if vertices with the same group label are maximally adjacent to each other in the sequence π. For instance, if the original indices I are consistent with group labels σ, the sum Σ_{i=1}^{N−1} δ(σ(i), σ(i+1)) is maximized, where δ(a, b) represents the Kronecker delta; Fig. 3(a) presents an example. To evaluate the consistency between π and σ, we introduce a measure that we refer to as the label continuity, defined by

C(π, σ) = [1/(N − 1)] Σ_{i'=1}^{N−1} δ(σ(π^{−1}(i')), σ(π^{−1}(i'+1))),

where π^{−1} is the inverse mapping of π and i' is the index label after the permutation; that is, π^{−1}(i') is the label i in the original indices satisfying π(i) = i'. The number of times that the group labels are flipped when following the vertices in the order of π is expressed as (N − 1)(1 − C(π, σ)), and the group labels must be flipped at least K − 1 times. Considering this feature, we define the label continuity error (LCE) as

∆(π, σ) = [(N − 1)(1 − C(π, σ)) − (K − 1)] / (N − 1).

Hereafter, we abbreviate C(π, σ) and ∆(π, σ) as C and ∆, respectively, as long as there is no possibility of confusion. For a given partition σ and different vertex sequences, we can evaluate which vertex sequence is more consistent with σ using the LCE (e.g., Figs. 3(a) and 3(b)). Similarly, for a given sequence π and different partitions, we can also evaluate which partition is more consistent with π, keeping the group sizes {N_k} fixed (e.g., Figs. 3(a) and 3(c)).
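As a concrete illustration, the label continuity and the LCE can be computed in a few lines. The following Python sketch (function names are ours, not from the article) assumes the LCE takes the form ∆ = [(N − 1)(1 − C) − (K − 1)]/(N − 1), i.e., the number of label flips beyond the minimum K − 1, normalized by N − 1:

```python
def label_continuity(pi, sigma):
    """Fraction of consecutive pairs (in the order pi) sharing a group label.

    pi[i] is the position of vertex i in the sequence (0-indexed here,
    whereas the article uses 1, ..., N); sigma[i] is the group label of vertex i.
    """
    N = len(pi)
    inv = [0] * N                     # inverse permutation: inv[position] = vertex
    for i, p in enumerate(pi):
        inv[p] = i
    same = sum(sigma[inv[j]] == sigma[inv[j + 1]] for j in range(N - 1))
    return same / (N - 1)


def label_continuity_error(pi, sigma):
    """LCE: label flips in excess of the minimum K - 1, normalized by N - 1."""
    N, K = len(pi), len(set(sigma))
    C = label_continuity(pi, sigma)
    return ((N - 1) * (1.0 - C) - (K - 1)) / (N - 1)
```

For example, the identity sequence with labels (0, 0, 1, 1) flips once, which is the minimum for K = 2, giving ∆ = 0; the maximally alternating labels (0, 1, 0, 1) give ∆ = 2/3.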

B. Properties of the LCE
The LCE can take only small values when the number of groups K is very small or very large. For example, it is obvious that ∆ is zero when K = 1 or K = N. In other words, the resolution of the LCE is low in such regions. Moreover, this property depends on the distribution of the group sizes {N_k}. In this section, we quantify these intuitions.
The minimum value of ∆ is zero by construction. The maximum value of ∆ is attained when the labels are flipped the maximum possible number of times. Maximizing ∆ over the sequence π for an arbitrary partition σ with group sizes {N_k} is equivalent to maximizing ∆ over the partition σ (constrained to {N_k}) for a given sequence π. We denote this maximum by max ∆({N_k}). As derived in Appendix A, we have

max ∆({N_k}) = [min{N − 1, 2(N − max_k N_k)} − (K − 1)] / (N − 1),    (5)

which attains the unconstrained maximum whenever max_k N_k ≤ ⌈N/2⌉, where ⌈·⌉ denotes the ceiling function.

We next investigate the statistical properties of the LCE. First, we calculate the probability P(m) that the number of times two consecutive vertices in a sequence have the same group label is m, where m = (N − 1)C(π, σ). When every sequence is realized uniformly at random, we have

P(m) = (1/N!) Σ_{π'} δ(m, (N − 1)C(π', σ)),    (6)

where σ is an arbitrary partition with group sizes {N_k} and the sum runs over all N! possible sequences. Note that Eq. (6) is also the distribution obtained when each distinct partition is realized uniformly at random. This equivalence might sound peculiar because there are only N!/Π_{k=1}^{K} N_k! distinct partitions, whereas there are N! possible sequences. However, because every distinct partition is overcounted exactly Π_{k=1}^{K} N_k! times in the summation of Eq. (6), the distribution P(m) is identical for random sequences and random partitions.
Although Eq. (6) is a straightforward expression, the strict constraint on {N_k} makes analytical calculations complicated. Therefore, we instead calculate the distribution of bootstrapped group labels σ* as an approximation. That is, we generate a random group assignment σ* by sampling independently from the empirical distribution Prob[k] = N_k/N (k ∈ {1, ..., K}); in other words, we randomly resample group labels from σ with replacement. The distribution of the group labels σ* is

P(σ*) = Π_{i=1}^{N} (N_{σ*_i}/N).

This approximation for random group labels is expected to be accurate if each element in {N_k} is sufficiently large. Using the bootstrapped group labels, the mean value of C is obtained as

E[C] = Σ_{k=1}^{K} (N_k/N)².

Therefore, the mean value of the LCE under random partitioning is

E[∆]({N_k}) = 1 − Σ_{k=1}^{K} (N_k/N)² − (K − 1)/(N − 1).    (9)

As the LCE does not practically become greater than E[∆]({N_k}), this mean value is a more meaningful reference value than Eq. (5) as the upper bound. We can also derive the variance Var[∆] (the derivation is shown in Appendix B), which vanishes as O(1/N), showing that ∆ converges to E[∆] by the law of large numbers. Furthermore, in Appendix C, we show that the probability distribution is asymptotically normal when the group sizes are equal and K = O(1), implying that the higher-order moments vanish.

Let us summarize the results obtained in this section. As the number of groups K (> 1) increases, the upper bound of the LCE (max ∆ in Eq. (5)) decreases monotonically as long as the partition is not highly skewed, i.e., max_k N_k < ⌈N/2⌉. However, as illustrated in Fig. 4, the LCE for a random sequence (E[∆] in Eq. (9)) is a convex function with respect to K. When K is small, the LCE increases with K because the chance of label flips increases, whereas for larger K the LCE decreases owing to the increase in the minimum number of label flips, K − 1. For equipartitioning (Fig. 4(a)), the mean LCE E[∆] is peaked at an integer close to K = √(N − 1). As a partition becomes more skewed (Fig. 4(b)), max ∆ and E[∆] are peaked at larger values of K.
Therefore, when evaluating the LCE, we must implement appropriate normalizations.
In this study, we focus on comparing partitions with the same number of groups K. In Appendix D, however, we discuss nested partitions (subpartitions of another partition) as an example in which different partitions have different numbers of groups.
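The overcounting argument for the equivalence between random sequences and random partitions can be checked by brute-force enumeration on a small instance. The sketch below (helper names are ours) compares the distribution of m over all N! sequences, for a fixed two-group partition, with the distribution over all distinct partitions with the same group sizes, for the fixed original sequence:

```python
from itertools import combinations, permutations
from collections import Counter
from fractions import Fraction


def same_label_pairs(labels):
    # m: number of consecutive pairs with equal group labels
    return sum(labels[j] == labels[j + 1] for j in range(len(labels) - 1))


def dist_over_sequences(sigma):
    """Distribution of m over all N! vertex sequences, partition fixed."""
    N = len(sigma)
    cnt = Counter(same_label_pairs([sigma[i] for i in perm])
                  for perm in permutations(range(N)))
    total = sum(cnt.values())
    return {m: Fraction(c, total) for m, c in cnt.items()}


def dist_over_partitions(N, n1):
    """Distribution of m over all distinct two-group partitions with sizes {n1, N - n1},
    read in the fixed original order 0, ..., N - 1."""
    cnt = Counter()
    for grp in combinations(range(N), n1):
        labels = [0 if i in grp else 1 for i in range(N)]
        cnt[same_label_pairs(labels)] += 1
    total = sum(cnt.values())
    return {m: Fraction(c, total) for m, c in cnt.items()}
```

The two distributions coincide exactly, as the overcounting argument predicts.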

III. SPECTRAL ORDERING METHODS
In this section, we describe variants of spectral ordering methods using different matrices. After reviewing the derivation of the standard methods based on the unnormalized and normalized Laplacians, we show how spectral ordering problems can be formulated with the modularity matrix, regularized Laplacian, and Bethe Hessian.

A. Unnormalized Laplacian
Spectral ordering is derived as a continuous relaxation of the discrete optimization problem called envelope reduction [14]. This problem optimizes the vertex sequence π such that each connected pair of vertices is located close to each other in the sequence. To this end, the following objective function is considered:

H_2(π; A) = Σ_{i<j} A_{ij} (π_i − π_j)²,    (11)

which is the sum of the squared distances (π_i − π_j)² over the set of connected vertex pairs. The sequence that minimizes this function is the solution to the envelope reduction problem.
As the minimization of Eq. (11) is not computationally feasible, we consider its continuous relaxation. That is, we represent π using a continuous vector x ∈ R^N. However, if we simply replace π with x, x = 0 would be the trivial minimizer of H_2(x; A). Thus, we constrain x such that Σ_{i=1}^{N} x_i² is a positive constant (i.e., the spherical constraint) to reflect the fact that Σ_{i=1}^{N} π_i² is positive regardless of the choice of sequence. Therefore, we consider the minimization of the following function:

Λ(x; A) = H_2(x; A) − λ (Σ_{i=1}^{N} x_i² − 1),    (12)

where λ is the Lagrange multiplier. The extremum condition of Eq. (12) yields the following eigenvalue equation with an eigenvector ν:

L ν = λ ν.    (13)

Here, L ≡ D − A is the unnormalized (or combinatorial) Laplacian, where D = diag(d_1, ..., d_N) is the degree matrix (d_i = Σ_{j=1}^{N} A_{ij}). Although we would have a vector proportional to 1 (the vector of ones) as the minimizer of Eq. (12), which is also the eigenvector associated with the smallest eigenvalue of L, we cannot infer the optimal sequence from 1 because all of its elements are identical. Therefore, we exclude vectors proportional to 1, which is equivalent to imposing the constraint in Eq. (12) that x is perpendicular to 1, i.e., Σ_{i=1}^{N} x_i = 0. Then, the minimizer of the objective function is the eigenvector ν_2 of L associated with the second-smallest eigenvalue.
The estimate π̂ of the optimal sequence using the spectral ordering method is

π̂_i = rank(ν_{2i}),    (14)

where ν_{2i} is the ith element of ν_2, and rank(ν_{2i}) is the index of ν_{2i} in the array in which the elements of ν_2 are sorted in ascending or descending order.
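As a minimal illustration of this procedure (a sketch, assuming a dense NumPy adjacency matrix, not the authors' code), the following computes the eigenvector of the second-smallest eigenvalue of L = D − A and ranks its elements:

```python
import numpy as np


def spectral_ordering_unnormalized(A):
    """Order vertices by the second-smallest eigenvector (Fiedler vector) of L = D - A."""
    d = A.sum(axis=1)
    L = np.diag(d) - A
    w, V = np.linalg.eigh(L)          # eigenvalues in ascending order
    fiedler = V[:, 1]                 # eigenvector of the second-smallest eigenvalue
    return np.argsort(fiedler)        # vertex indices read in the inferred order
```

Applied to two triangles joined by a single edge, the inferred sequence places each triangle contiguously (up to the overall sign ambiguity of the eigenvector).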

B. Normalized Laplacian
A spectral ordering method with the normalized Laplacian was derived in [15]. Note that the objective function (11) does not have a periodic boundary condition. Therefore, while the distance from a vertex at the end of the sequence to another vertex ranges from 1 to N − 1, the distance from a vertex in the middle of the sequence ranges from 1 to ⌊N/2⌋, where ⌊·⌋ is the floor function. This implies that when a graph has a vertex with a considerably large degree (i.e., a hub), it is typically more beneficial for the minimization objective to assign such a vertex near the middle of the sequence. To incorporate this feature, we replace the spherical constraint in Eq. (12) with the following ellipsoidal constraint:

Σ_{i=1}^{N} d_i x_i² = 1,    (15)

which tends to restrict x_i with a large d_i to relatively small values (recall that a variable with a large coefficient typically takes relatively small values on an ellipsoid). Therefore, Eq. (15) constrains x such that x_i of a hub vertex v_i is near the origin, and when x is discretized, the hub vertices are likely to be located near the middle of the sequence. Note also that the mean of {x_i} is located at the origin because of the perpendicular constraint Σ_{i=1}^{N} x_i = 0. Consequently, Eq. (13) is replaced with the following generalized eigenvalue equation with respect to its second-smallest eigenvalue λ_2:

L ν_2 = λ_2 D ν_2.    (16)

This is equivalent to

𝓛 z_2 = λ_2 z_2,    (17)

where 𝓛 ≡ D^{−1/2} L D^{−1/2} is the normalized Laplacian and z_2 ≡ D^{1/2} ν_2. As ν_2 is a continuous relaxation of the sequence π, we estimate the optimal sequence π̂ as

π̂_i = rank(d_i^{−1/2} z_{2i}).

C. Modularity matrix
The modularity matrix Q appears in the spectral clustering method for modularity maximization in community detection [2]. Its elements are commonly defined as

Q_{ij} = A_{ij} − d_i d_j / (2M),    (18)

where M is the total number of edges in the graph.
To formulate the spectral ordering problem with the modularity matrix, we again consider the objective function H_2(π; A) of the envelope reduction problem and its continuous relaxation with the spherical constraint

Σ_{i=1}^{N} x_i² = 1.    (19)

Herein, we add the following penalty terms to the objective function:

[1/(2M)] (Σ_{i=1}^{N} d_i x_i)² − Σ_{i=1}^{N} d_i x_i².    (20)

The penalty terms ensure that {x_i} are "balanced" around the origin. The first term prevents the {x_i} of hub vertices from being located only on the positive or only on the negative side of the real interval [−1, 1]. Owing to the second term, the {x_i} associated with hub vertices also tend to be away from the origin. Therefore, the penalty term Eq. (20) decreases when {x_i} are more symmetrically distributed around the origin. Using a Lagrange multiplier, the objective function to be minimized is then

Λ(x; A) = H_2(x; A) + [1/(2M)] (Σ_i d_i x_i)² − Σ_i d_i x_i² − λ (Σ_i x_i² − 1) = −x^T Q x − λ (x^T x − 1).    (21)

Here, we do not impose the perpendicular constraint in Eq. (21), because a vector proportional to 1 is not a trivial minimizer. The extremum conditions of Eq. (21) yield

Q ν_1 = λ_1 ν_1,    (22)

where λ_1 represents the largest eigenvalue of Q and ν_1 is the associated eigenvector; ν_1 is the minimizer of Eq. (21) provided that it is not proportional to 1. Analogously to Eq. (14), we estimate the optimal sequence π̂ as

π̂_i = rank(ν_{1i}).    (23)

The ellipsoidal constraint forces the {x_i} of hub vertices to concentrate around the origin, whereas the penalty terms (20) force them to be distributed evenly at the positive and negative ends of the real line. Therefore, the results of the spectral ordering methods using the normalized Laplacian and the modularity matrix are expected to be quite distinct for graphs with heterogeneous degree distributions.
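The resulting procedure, ranking the elements of the leading eigenvector of Q, can be sketched as follows (a minimal illustration with a dense NumPy matrix, not the authors' code):

```python
import numpy as np


def spectral_ordering_modularity(A):
    """Order vertices by the eigenvector of the largest eigenvalue of Q = A - d d^T / (2M)."""
    d = A.sum(axis=1)
    M = d.sum() / 2.0                        # total number of edges
    Q = A - np.outer(d, d) / (2.0 * M)
    w, V = np.linalg.eigh(Q)                 # eigenvalues in ascending order
    leading = V[:, -1]                       # eigenvector of the largest eigenvalue
    return np.argsort(leading)
```

On two triangles joined by a single edge, the leading eigenvector of Q separates the two triangles, so each appears contiguously in the inferred sequence.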

D. Bethe Hessian
Bethe Hessian is also a matrix that is originally formulated to perform spectral clustering [23,24]. This method is inspired by the statistical inference of the stochastic block model, which will be explained in Sec. IV A. This section considers a spectral ordering method using the Bethe Hessian.
The derivation of spectral ordering with the Bethe Hessian is analogous to that with the normalized Laplacian. However, instead of imposing an ellipsoidal constraint, we treat the degree-weighted sum τ Σ_{i=1}^{N} d_i x_i² as a penalty term. Thus, we consider the following objective function:

H_2(x; A) + τ Σ_{i=1}^{N} d_i x_i²,    (24)

where τ is an arbitrary constant (hyperparameter) that can be either positive or negative. To avoid the trivial minimizer x = 0, we impose the spherical constraint Σ_{i=1}^{N} x_i² = 1. Using a Lagrange multiplier, the objective function to be minimized is then

x^T B x − λ (Σ_{i=1}^{N} x_i² − 1),    (25)

where, under the parametrization τ = (1 − r)/r, x^T B x equals r times Eq. (24) up to the constant shift (r² − 1) Σ_i x_i², and

B_{ij} = (r² − 1) δ_{ij} + d_i δ_{ij} − r A_{ij}    (26)

is a matrix element of the Bethe Hessian. The extremum conditions of Eq. (25) yield an eigenvalue equation with respect to B.
We estimate the optimal sequence π̂ as

π̂_i = rank(ν_{2i}),    (27)

where ν_2 is the eigenvector associated with the second-smallest eigenvalue λ_2, i.e.,

B ν_2 = λ_2 ν_2.    (28)

Note that there is no guarantee that ν_2 always provides the best estimate in terms of H_2(π; A) among all the eigenvectors. In fact, we confirmed that the eigenvector yielding the best estimate in terms of H_2(π; A) (when we employ the rounding rule in Eq. (27)) depends sensitively on the value of r, particularly when r is small (see Sec. S1 in the Supplemental Material for details). However, we employ Eq. (27) because the estimate with ν_2 offers the smallest value of H_2(π; A) as long as r is sufficiently large. Throughout this study, we set r to the square root of the average degree, as it is a commonly employed value in spectral clustering. The hyperparameter τ is negative when r > 1. Thus, the {x_i} of hub vertices are aligned near the ends of the real interval [−1, 1] so that a sequence achieves a lower value of Eq. (24) through the penalty term. By contrast, when r < 1 (i.e., τ > 0), the {x_i} of hub vertices are likely to be located near the origin, implying that the resulting sequence is similar to that obtained by the spectral ordering method based on the normalized Laplacian.
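A minimal sketch of the Bethe Hessian ordering follows, using the standard matrix B(r) = (r² − 1)I − rA + D of [23,24] and assuming the common choice r = √(average degree) mentioned above:

```python
import numpy as np


def bethe_hessian(A, r):
    """Standard Bethe Hessian B(r) = (r^2 - 1) I - r A + D."""
    d = A.sum(axis=1)
    return (r ** 2 - 1.0) * np.eye(len(d)) - r * A + np.diag(d)


def spectral_ordering_bethe(A):
    """Order vertices by the second-smallest eigenvector of B(r), r = sqrt(mean degree)."""
    r = np.sqrt(A.sum(axis=1).mean())     # assumed hyperparameter choice
    B = bethe_hessian(A, r)
    w, V = np.linalg.eigh(B)              # eigenvalues in ascending order
    return np.argsort(V[:, 1])
```

On two triangles joined by a single edge, the second-smallest eigenvector of B again separates the two triangles.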

E. Regularized Laplacian
During the past decade, it has been found that the performance of Laplacian-based spectral clustering can be considerably improved by adding a constant value to every element in the adjacency matrix [26,28] or to the diagonal elements in the degree matrix [25,27]. Although the two variants of the Laplacian are often termed differently, we collectively refer to them as the regularized Laplacian [28] for simplicity, and we denote the former version as L^(τ) and the latter version as L̃. The spectral clustering method based on L̃ can also be interpreted as a continuous relaxation of the minimization of the core cut function [29]. This section considers the spectral ordering method using a regularized Laplacian.
Similar to the formulation of the spectral ordering method with the modularity matrix, we consider the continuous relaxation of H_2(π; A) with a penalty term. We consider the following objective function:

H_2(x; A) + τ N Var[x],    (29)

which is minimized with respect to the continuous vector x. Here, τ is an arbitrary positive constant (hyperparameter) and

Var[x] = (1/N) Σ_{i=1}^{N} (x_i − x̄)²,  x̄ = (1/N) Σ_{i=1}^{N} x_i,

is the variance of the elements of x. To ensure that x is not a vector of zeros, we impose the following ellipsoidal constraint:

Σ_{i=1}^{N} (d_i + τ) x_i² = 1.    (31)

Incorporating this constraint, the objective function to be minimized is given by

Λ(x; A) = H_2(x; A) + τ N Var[x] − λ (Σ_i (d_i + τ) x_i² − 1).    (32)

Because a vector proportional to 1 is a trivial minimizer of Eq. (32), we also impose the constraint that x is perpendicular to 1, i.e., Σ_{i=1}^{N} x_i = 0. The extremum conditions then yield the generalized eigenvalue equation

(L + τ (I − 11^T/N)) ν_2 = λ_2 (D + τ I) ν_2,    (33)

where I is the identity matrix, λ_2 is the second-smallest eigenvalue of the generalized eigenvalue equation, and ν_2 is the associated generalized eigenvector. Equation (33) is equivalent to

(D + τ I)^{−1/2} L^(τ) (D + τ I)^{−1/2} z_2 = λ_2 z_2,

where L^(τ) ≡ D + τ I − A − (τ/N) 11^T is the regularized Laplacian and z_2 = (D + τ I)^{1/2} ν_2. Similar to the spectral ordering method with the normalized Laplacian, we estimate the optimal sequence π̂ as

π̂_i = rank((d_i + τ)^{−1/2} z_{2i}).    (36)

The contribution of hub vertices is more complicated than in the other methods. As shown in Fig. 5, whereas the {x_i} of hub vertices tend to be relatively small because of the ellipsoidal constraint, the variance Var[x] is minimized when all {x_i} take the same value. Therefore, when the hyperparameter τ is small, the result is similar to that obtained using the spectral ordering method based on the normalized Laplacian. As τ increases, the hub vertices are less likely to be located in the middle of the sequence because of the penalty term.

FIG. 5. Although most of the coordinates on the ellipse have x_1 > x_2, the variance is smaller when x_1 and x_2 are closer.
As mentioned above, we also consider L̃ ≡ (D + τI)^{−1/2} (D + τI − A) (D + τI)^{−1/2} as the definition of a regularized Laplacian. Unlike L^(τ), only the degree matrix is perturbed by a constant value in L̃. If we consider

H_2(x; A) + τ Σ_{i=1}^{N} x_i²    (37)

as the objective function to be minimized and impose the ellipsoidal constraint (31), we obtain the eigenvalue equation with respect to L̃ as a result of the extremum conditions. If we also impose the constraint

Σ_{i=1}^{N} x_i = 0,    (38)

this objective function becomes equivalent to Eq. (29). We do not impose this constraint here because L̃ does not have 1 as a trivial eigenvector, unlike L^(τ). The eigenvectors of L^(τ) and L̃ are therefore distinct. As a spectral ordering method with the regularized Laplacian L̃, we replace z_2 in Eq. (36) with the eigenvector associated with the second-smallest eigenvalue of L̃. Here, we use the second-smallest eigenvalue because L̃ approaches the normalized Laplacian 𝓛 as τ → 0, whose smallest eigenvalue is associated with the trivial eigenvector z_1 ∝ D^{1/2} 1. Hereafter, when we refer to the spectral ordering method with the regularized Laplacian, we employ L̃ because it is more computationally efficient. Throughout this study, we set τ as the average degree of the graph, as it is a commonly employed value [27].
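A sketch of the degree-regularized variant follows (our illustration, assuming the form L̃ = (D + τI)^{−1/2}(D + τI − A)(D + τI)^{−1/2} with τ equal to the average degree):

```python
import numpy as np


def spectral_ordering_regularized(A, tau=None):
    """Order vertices by the second-smallest eigenvector of the
    degree-regularized normalized Laplacian (D+tI)^{-1/2} (D+tI-A) (D+tI)^{-1/2}."""
    d = A.sum(axis=1)
    if tau is None:
        tau = d.mean()                     # average degree, as assumed in the text
    s = 1.0 / np.sqrt(d + tau)
    Lreg = s[:, None] * (np.diag(d + tau) - A) * s[None, :]
    w, Z = np.linalg.eigh(Lreg)            # eigenvalues in ascending order
    return np.argsort(Z[:, 1])
```

On two triangles joined by a single edge, the second-smallest eigenvector of L̃ separates the triangles, so each appears contiguously in the inferred sequence.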
The constraints and penalty terms for each method are summarized in Table I. Compared to the classical method based on the normalized Laplacian, where hub vertices are concentrated around the middle of the sequence ("hub-centered"), the spectral ordering methods obtained with the modularity matrix, Bethe Hessian, and the regularized Laplacian may assign hub vertices at both ends of the sequence ("hub-at-the-corner"). Particularly, for the Bethe Hessian and the regularized Laplacian, we can choose the "hub-centered" or "hub-at-the-corner" alignment by tuning the hyperparameter.
Although we found the penalties and constraints that provide the spectral ordering methods corresponding to the ones considered in spectral clustering, we have not confirmed whether these choices of penalties and constraints are unique. In addition, there is no guarantee that the resulting spectral ordering methods exhibit high performance in practice. The next section investigates the practical performance of these spectral ordering and clustering methods using synthetic and real-world datasets. Although one might expect all the methods to work similarly when the graph is close to regular, it is not trivial to determine whether this always holds; this is investigated using synthetic datasets in Secs. IV A and IV B. The effect of heterogeneous degree distribution in each method is examined using real-world datasets in Sec. IV C.

IV. PERFORMANCE ANALYSIS
We conduct a numerical performance analysis of the spectral ordering and clustering methods using synthetic graphs and real-world networks. For experiments on synthetic graphs, we consider a random graph model with a prespecified module structure, referred to as the stochastic block model (SBM) [30-32], and a random graph model with a prespecified sequentially local structure, referred to as the ordered random graph model (ORGM) [10].

TABLE I. (fragment) Bethe Hessian — small τ: concentrate around the middle; large τ: avoid concentration around the middle. Regularized Laplacian — small τ: concentrate around the middle; large τ: avoid concentration around the middle.

A. Stochastic block model
The SBM is often used as a generative model for the inference of module structures in graphs [33,34] and in several theoretical studies in the community detection literature [35]. In the SBM, each vertex has a "planted" (or preassigned) group assignment; we denote the corresponding partition as σ^B. Each vertex pair is connected by an edge, independently and randomly, based on the planted group assignments. The probabilities for the upper-triangular elements of the adjacency matrix are given as follows:

Prob[A_{ij} = 1] = p_{σ^B_i σ^B_j},    (39)

where p_{kℓ} is the probability that a vertex in group k and a vertex in group ℓ are connected (in Eq. (39), k = σ^B_i and ℓ = σ^B_j). We have A_{ij} = A_{ji} for any pair of elements because we consider undirected graphs. In general, the SBM can generate graphs with complex module structures. Herein, however, we focus on the SBM with a community structure that is characterized by the following group-wise connection probabilities:

p_{kℓ} = p_in δ_{kℓ} + p_out (1 − δ_{kℓ}),    (40)

where 0 < p_out ≤ p_in ≤ 1; that is, vertices are more densely connected within the same planted group than between different groups. In particular, when the group sizes are equal, it is common to parametrize the model using the average degree c and the fuzziness parameter ε = p_out/p_in, which are related to p_in and p_out as

p_in = Kc / [N (1 + (K − 1) ε)],  p_out = ε p_in.    (41)

As ε approaches unity, the planted community structure becomes less clear. This particular case of the SBM is known as the planted partition model [36]. For a given average degree c, the critical value of ε above which an algorithm cannot detect the planted block structure better than chance is called the (algorithmic) detectability limit [37-42].
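Sampling from the planted partition model is straightforward; the following sketch (our helper, not from the article) draws the independent upper-triangular entries and symmetrizes:

```python
import numpy as np


def sample_sbm(sizes, p_in, p_out, rng=None):
    """Sample an undirected planted-partition SBM adjacency matrix and its planted labels."""
    rng = np.random.default_rng(rng)
    sigma = np.repeat(np.arange(len(sizes)), sizes)        # planted group labels
    N = len(sigma)
    P = np.where(sigma[:, None] == sigma[None, :], p_in, p_out)
    U = rng.random((N, N))
    A = np.triu((U < P).astype(int), k=1)                  # independent upper-triangular draws
    return A + A.T, sigma                                  # symmetric, zero diagonal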
Using the SBM and the spectral ordering methods, we investigate the following questions:

1. What would the adjacency matrix reordered by a spectral ordering method look like? Can we visually identify the planted community structure through the matrix?

2. How and when would the spectral ordering algorithms lose their correlations with the planted partition in the SBM?

To answer these questions, we apply both spectral ordering and clustering methods to graphs generated by the SBM. We first investigate the former question. Figure 6 shows the results of spectral ordering applied to instances of the SBM. Vertices in the same planted group are indeed located close to each other in the inferred sequence when the community structure is strong. Even when the community structure is weak, the planted group labels and the inferred sequence are correlated. In both examples, however, the boundaries of the groups are ambiguous. Therefore, if we did not know the planted group labels (and without the coloring of the adjacency matrix elements), it would not be clear from the reordered adjacency matrix whether the identified structure is a community structure or a banded structure. Note that, as discussed in [10], even when a graph is generated from a uniformly random graph model, one can identify a weak banded structure owing to the ordering of vertices.
Next, we address the latter question. The consistency between the sequence π̂ inferred by a spectral ordering method and the planted partition σ^B is measured with the normalized LCE ∆(π̂, σ^B)/E[∆]({N^B_k}). Here, {N^B_k} is the set of group sizes in σ^B and E[∆]({N^B_k}) is the mean LCE under a random sequence, defined in Eq. (9). When ∆(π̂, σ^B) saturates (i.e., the normalized LCE reaches unity) as ε increases, the spectral ordering method does not infer σ^B better than random; the algorithm is then deemed to have reached the detectability limit.
The consistency between the partition σ̂ inferred by a spectral clustering method and the planted partition σ^B is measured using the normalized mutual information (NMI) [43], defined as

NMI(σ_1, σ_2) = 2 I(σ_1, σ_2) / [H(σ_1) + H(σ_2)],

where

H(σ) = −Σ_k q(k) log q(k)

is the entropy with respect to the frequency q(k) of group labels, and

I(σ_1, σ_2) = Σ_{k,k'} q(k, k') log [q(k, k') / (q(k) q'(k'))]

is the mutual information. Here, q(k, k') is the fraction of co-occurrences in which a vertex belonging to group k in partition σ_1 belongs to group k' in partition σ_2. The NMI is unity when a pair of partitions coincides perfectly. When NMI(σ̂, σ^B) reaches (nearly) zero as ε increases, the spectral clustering method does not infer σ^B better than random, which again indicates the detectability limit. The detectability analysis of spectral clustering methods is not new and has been conducted in several theoretical and benchmark studies [40-42, 44, 45].

We evaluate ∆(π̂, σ^B) and NMI(σ̂, σ^B) to compare the performances of the spectral ordering and clustering methods for each of the matrices considered in the previous section. Figure 7 shows the performances of the ordering and clustering methods on the SBM for different graph sizes N, numbers of blocks K, and values of the fuzziness parameter ε. When graphs are small (and thus relatively dense), there is no clear saturation in the curves of the LCE and the NMI, and it is difficult to evaluate whether the ordering or clustering methods exhibit superior performance in terms of the detectability limit. Moreover, the differences in performance among the different matrices are not noticeable, except for the unnormalized Laplacian. When graphs are large, we can clearly identify the saturation. For the unnormalized and normalized Laplacians, the values of the LCE gradually decrease even where the values of the NMI saturate, indicating that the spectral ordering methods are superior to their clustering counterparts.
By contrast, the detectability limits of the modularity matrix, regularized Laplacian, and Bethe Hessian are not very different between the ordering and clustering methods. In addition, the methods with the regularized Laplacian and Bethe Hessian perform similarly and are superior to the other matrices, whereas the methods with the unnormalized Laplacian are clearly inferior.
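The NMI used above can be computed directly from the joint label frequencies; the sketch below assumes the normalization 2I/(H_1 + H_2), one common convention for [43]:

```python
import numpy as np


def nmi(sigma1, sigma2):
    """Normalized mutual information 2 I / (H1 + H2) between two partitions."""
    s1, s2 = np.asarray(sigma1), np.asarray(sigma2)
    k1, k2 = np.unique(s1), np.unique(s2)
    # q[k, k']: fraction of vertices with label k in sigma1 and k' in sigma2
    q = np.array([[np.mean((s1 == a) & (s2 == b)) for b in k2] for a in k1])
    q1, q2 = q.sum(axis=1), q.sum(axis=0)     # marginal label frequencies
    H1 = -np.sum(q1 * np.log(q1))
    H2 = -np.sum(q2 * np.log(q2))
    nz = q > 0
    I = np.sum(q[nz] * np.log(q[nz] / np.outer(q1, q2)[nz]))
    return 2.0 * I / (H1 + H2)
```

The NMI is one for identical partitions (up to a relabeling of the groups) and zero for statistically independent ones.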

B. Ordered random graph model
We have observed how and to what extent the community structure can be inferred using spectral ordering methods. This section discusses the opposite scenario. That is, we analyze whether the spectral clustering methods can infer banded structures. To this end, we conduct a performance analysis using the ORGM. This section uses the K-means method to determine the group labels in spectral clustering.
The vertex set in the ORGM has a planted sequence, just as the vertex set in the SBM has a planted partition. We let the planted sequence coincide with the original sequence I. The edges in the ORGM are generated independently and randomly by referring to the planted sequence. We divide the space of adjacency matrix elements into two regions, Ω_in and Ω_out: Ω_in (resp. Ω_out) is the set of elements whose edges connect two vertices deemed "close" (resp. "not close") to each other. An edge is generated between a vertex pair with probability p_in if the pair is "close" and with probability p_out otherwise. Therefore, the probabilities for the upper-triangular elements of the adjacency matrix are given as follows:

Prob[A_{ij} = 1] = p_in if (i, j) ∈ Ω_in, and p_out if (i, j) ∈ Ω_out.    (46)

We set the boundary between Ω_in and Ω_out as

Ω_in = {(i, j) : |i − j| ≤ r},    (47)

where r is the bandwidth that specifies the boundary of the regions. Although Eq. (47) is a simple choice, the boundary in the ORGM can be more complex in general. In the following, instead of p_in and p_out, we specify the edge density by the average degree c and the strength of the banded structure ε = p_out/p_in; when ε = 0, the nonzero elements of the adjacency matrix are completely confined within Ω_in, whereas the model is uniformly random when ε = 1 (see Fig. 8(a) for an example of the resulting adjacency matrix of the ORGM). In summary, except for the number of vertices N, which is a nuisance parameter, the parameters of the ORGM are the average degree c, the strength of the banded structure ε, and the bandwidth ratio r/N.
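A sampler for this banded model can be sketched as follows (our helper, assuming the simple band Ω_in = {(i, j) : |i − j| ≤ r} and specifying p_in and p_out directly rather than via c and ε):

```python
import numpy as np


def sample_orgm(N, r, p_in, p_out, rng=None):
    """Sample an ORGM-like banded random graph: pairs with |i - j| <= r use p_in."""
    rng = np.random.default_rng(rng)
    i, j = np.indices((N, N))
    P = np.where(np.abs(i - j) <= r, p_in, p_out)      # banded connection probabilities
    A = np.triu((rng.random((N, N)) < P).astype(int), k=1)
    return A + A.T                                     # symmetric, zero diagonal
```

In the deterministic limit p_in = 1, p_out = 0 with r = 1, the sampler returns the path graph's tridiagonal band, a convenient sanity check.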
Using the ORGM and the spectral clustering methods, we investigate the following questions: 1. What would the reordered adjacency matrix look like? Can we visually identify the banded structure through the matrix?
2. How and when would the spectral clustering algorithms lose their correlations with the planted ordering in the ORGM?
We first investigate the former question. Figure 8 shows the results of a spectral clustering method with different values of K applied to a graph generated by the ORGM. A graph tends to be partitioned into equally sized groups (see also Fig. S2 in the Supplemental Material for quantitative evidence). Recall that we observe a banded structure through a spectral ordering method even when the graph is generated from the SBM (Fig. 6). Analogously, we can identify block-diagonal structures in Fig. 8 although the graph is generated from the ORGM. This is an interesting observation because it implies that some of the community structures identified in the literature may be better described as banded structures. Figure 9 shows the normalized LCE ∆(I, σ̂)/⟨∆({N_k})⟩ between the planted sequence I and the inferred partition σ̂, where {N_k} is the set of group sizes in σ̂. The normalized LCE is generally low when r/N is neither too small nor too large and ε is small.
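The building block of the (normalized) LCE is the label continuity C of a sequence-partition pair: the fraction of adjacent positions in the sequence whose vertices share a group label (cf. Appendix A, and Appendix C, where ∆ is obtained from the distribution of C by a constant shift). The following sketch, with our own function names rather than the paper's implementation, computes C and a permutation-based Monte Carlo baseline of the kind used for normalization.

```python
import numpy as np

def label_continuity(order, labels):
    """C(pi, sigma): fraction of adjacent positions in the sequence whose
    vertices share a group label."""
    lab = np.asarray(labels)[np.asarray(order)]
    return float(np.mean(lab[1:] == lab[:-1]))

def random_baseline(labels, trials=1000, rng=None):
    """Monte-Carlo mean of C over uniformly random vertex sequences."""
    rng = np.random.default_rng(rng)
    N = len(labels)
    return float(np.mean([label_continuity(rng.permutation(N), labels)
                          for _ in range(trials)]))
```

For two equally sized groups aligned consecutively, C is close to 1, whereas a perfectly alternating alignment gives C = 0; the random baseline sits in between.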
The existence of detectability limits is implied by Fig. 9. In the limit N → ∞, there exists a critical value of ε above which the normalized LCE is unity for any value of r/N. Moreover, for a given ε, there also exists an upper limit (and possibly a lower limit) of the bandwidth ratio r/N beyond which a spectral clustering method is no better correlated with the planted sequence than a random guess. These critical values depend on the average degree (see Fig. S3 in the Supplemental Material for the numerical phase diagrams).
Analogous to the analysis for the SBM, the performance of the unnormalized Laplacian is notably inferior in terms of the normalized LCE; for most parameter sets, it performs no better than a random guess. The behaviors of the modularity matrix, Bethe Hessian, and regularized Laplacian are similar; moreover, the results for the latter two matrices are nearly identical. In contrast to the analysis of graphs generated by the SBM, the performance of the normalized Laplacian is as good as or even better than that of the Bethe Hessian and regularized Laplacian.
The inferior performance of spectral clustering with the unnormalized Laplacian can also be characterized by the distribution of the group sizes {N_k}. The fraction of the largest group, max_k N_k/N, is nearly unity, i.e., most of the vertices belong to the same group (see Fig. S2 in the Supplemental Material for the experimental results). In such a case, the result of clustering contains very little information about the inherent ordering in the graph; as shown in Fig. 4(b), the upper bound ∆_max and the mean value under random sequences ⟨∆⟩ are small when a partition is highly skewed, reflecting the fact that the group labels tend to be aligned consecutively for any sequence. A possible mechanism for such skewed distributions of group sizes is the emergence of localized eigenvectors [41,46], which deteriorates the performance of spectral clustering. However, we do not pursue the detailed mechanisms behind this outcome in the present study.
In summary, we have confirmed that some spectral clustering methods detect community structures that are correlated to the inherent sequential structure of the ORGM, and that there are nontrivial limits of detectability.

C. Real-world networks
We now apply the spectral ordering and clustering methods to five empirical adjacency matrices. Descriptions of the empirical datasets examined are provided in Table S1. Note that many empirical datasets exhibit high degree heterogeneity, whereas the synthetic graphs in Secs. IV A and IV B do not. As discussed in Sec. III, spectral orderings with different matrices are characterized as the minimization problem of H_2 with different constraints and penalty terms (Table I), and these differences become prominent when vertex degrees are heterogeneous.
In Fig. 10, we see a banded structure for the vertex orderings based on the normalized and unnormalized Laplacians for the karate club (Fig. 10(a)) and political books (Fig. 10(b)) datasets, where the hub vertices tend to be located around the middle of the optimized sequence. In contrast, the ordering method with the modularity matrix Q locates vertices with large degrees at both ends of the sequence, as expected from the penalty term in the objective function (20). A similar observation applies to the methods using the Bethe Hessian B and the regularized Laplacian (Figs. 10, S4, and S7). Importantly, however, the vertex orderings based on these matrices are critically influenced by the regularization parameter τ.
For many empirical graphs, the hyperparameter r = √(Σ_i d_i²/Σ_i d_i − 1) in the Bethe Hessian takes a large positive value, i.e., τ < 0 (Eq. (24)). Thus, the penalty term τ Σ_i d_i x_i² contributes to reducing the objective function, and hence hub vertices tend to be aligned at the ends of the vertex sequence. In contrast, the spectral ordering with the regularized Laplacian has an exogenous regularization parameter τ in its constraint and penalty terms (see Eqs. (32) and (38)), where we set τ to the average degree. As discussed in Sec. III E, a larger value of τ tends to avoid locating hub vertices around the middle of the sequence. Although the validation analysis based on synthetic graphs suggested that the sequences inferred from these matrices are fairly similar (Sec. IV), they do not necessarily coincide in general. Note also that, as τ → 0, the Bethe Hessian B approaches the unnormalized Laplacian, and the regularized Laplacian approaches the normalized Laplacian (Table I). Therefore, when τ is small in absolute value, the optimal vertex sequences based on the Bethe Hessian and regularized Laplacian are close to those obtained from the unnormalized and normalized Laplacians, respectively (Figs. S8 and S9). Indeed, the location of vertices with large degrees in the optimal vertex sequence can be tuned by varying τ, from a "hub-centered" alignment to a "hub-at-the-corner" alignment.

Figure 10 also shows the normalized LCE representing the consistency between the inferred sequence π̂ and group labels σ̂ for each matrix used in the spectral ordering and clustering methods. When we set K = 2 in the clustering method, as shown in Figs. 10(a) and 10(b), π̂ and σ̂ are perfectly consistent in terms of the LCE. For K ≥ 3, the LCEs are mostly lower than 0.8 and typically around 0.5 (Figs. 10 and S10-S14), suggesting that the vertex sequences optimized by spectral ordering convey some information about a non-random structure.
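The spectral orderings compared here share a common recipe: sort vertices by the components of a low-lying eigenvector of the chosen matrix and evaluate the squared sequential distance H_2 of the resulting sequence. A minimal sketch with the (un)normalized Laplacians follows; the function names are ours, and H_2 is written per-edge as Σ_{(i,j)∈E} (π_i − π_j)², consistent with the objective discussed in Sec. III.

```python
import numpy as np

def spectral_ordering(A, normalized=False):
    """Order vertices by the Fiedler vector (eigenvector of the
    second-smallest eigenvalue) of a graph Laplacian."""
    d = A.sum(axis=1).astype(float)
    L = np.diag(d) - A
    if normalized:
        inv_sqrt = np.zeros_like(d)
        nz = d > 0
        inv_sqrt[nz] = 1.0 / np.sqrt(d[nz])
        L = inv_sqrt[:, None] * L * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return np.argsort(vecs[:, 1])

def H2(order, A):
    """Squared sequential distance: sum over edges of (pi_i - pi_j)^2,
    where pi_i is the position of vertex i in the sequence."""
    pos = np.empty(len(order), dtype=int)
    pos[order] = np.arange(len(order))
    i, j = np.nonzero(np.triu(A, k=1))
    return int(((pos[i] - pos[j]) ** 2).sum())
```

For a path graph the Fiedler vector is monotone along the path, so the recovered sequence is the natural one (up to reversal) and attains the minimum H_2 of N − 1.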
We also find that some methods yield similar LCEs for all datasets, whereas the LCEs obtained with the (un)normalized Laplacians exhibit different behaviors (Figs. S10-S14). This is consistent with the previous numerical observation that the spectral orderings based on the (un)normalized Laplacians are quite distinct from those obtained from the modularity matrix, Bethe Hessian, and regularized Laplacian (Fig. 10).
Interestingly, even though different objective functions yield distinct optimized sequences, the values of the normalized LCE can be very close to each other. Therefore, adjacency matrices may exhibit the same or similar structures from the perspective of community structure and be differentiated only by the detailed orderings within each group.

V. SUMMARY AND DISCUSSION
This study analyzed the relationship between ordering and clustering methods for graphs by quantifying, through the LCE, the extent to which vertices close to each other in the optimized sequence have the same group label. To obtain analytical insight into spectral ordering, we first showed that the spectral ordering problem is formulated as a minimization of the squared sequential distance H_2 subject to a particular penalty function and constraints, depending on the matrix representation of a graph (e.g., normalized Laplacian, modularity matrix). The numerical results suggested that the spectral ordering methods, except the one based on the unnormalized Laplacian, often yield optimized sequences in which vertices in the same group are close to each other; that is, the normalized LCEs are considerably below 1 as long as strong community structures exist.
Several issues remain to be addressed in future studies. First, we defined the LCE to quantify the continuity of group labels for a given vertex sequence. The consistency between ordering and clustering can also be measured in other ways; for example, one could instead quantify the continuity of the indices in a vertex sequence for given group labels on the vertices. Second, we focused on unipartite graphs, whose connectivities are represented by square matrices (i.e., adjacency matrices). In principle, the proposed method can also be applied to non-square matrices, such as those of bipartite graphs. Third, we implemented the ordering and clustering methods independently and examined their consistency. Given that we found some consistency between the two, it would be possible to develop a clustering method that incorporates information about the inherent vertex sequence. Analogously, the spectral ordering method could be adjusted such that the obtained vertex sequence reflects group labels. We expect our paper to stimulate further research in these directions.

Appendix A: Upper bound of the label continuity error
We derive the upper bound of the LCE by explicitly constructing a worst-case sequence. We assume that a partition σ is given (i.e., the number of groups K and the group sizes {N_k} are given) and that the first group is the largest (i.e., max_k N_k = N_1 = |V_1|). When N_1 > N/2, some vertices in V_1 must be aligned consecutively. As exemplified in Fig. 11(a), the LCE is maximized when the vertices in V_1 and those in ∪_{k>1} V_k are aligned as alternately as possible. In this case, there are 2(N − N_1) vertices that are aligned alternately with different group labels, and the label continuity is C = (2N_1 − N − 1)/(N − 1). Therefore, the maximum LCE corresponds to the upper case of Eq. (5).

When N_1 is less than or equal to the sum of the sizes of all the other groups (Figs. 11(b), 11(c), and 11(d)), the vertices can be aligned such that no group labels are consecutive. Such a sequence is constructed as follows. We first align the vertices in V_1 and the vertices in ∪_{k>1} V_k as alternately as possible. In this step, all the vertices in V_1 are aligned, and Σ_{k>1} N_k − N_1 vertices are not yet aligned; here, within ∪_{k>1} V_k, we preferentially consume the labels with larger N_k (Fig. 11(b) and Step 1 in Figs. 11(c) and 11(d)). When there are remaining vertices, we regard each set of alternately aligned vertices as a fundamental unit and treat all such sets as "super vertices" with the same labels. We then align the super vertices and the remaining vertices in the same manner as in the previous step. We repeat this procedure until all vertices are aligned. We can always align the vertices and super vertices alternately because the number of remaining vertices with the same label never exceeds the number of already aligned vertices or super vertices. Therefore, we can establish a sequence for which the label continuity C is zero, which yields the upper bound of the LCE.

FIG. 11. Vertex sequences yielding the maximum LCE for given group sizes {N_k}. Panels (a) and (b) show the cases N_1 > N/2 and N_1 = N/2, respectively, and panels (c) and (d) show the cases N_1 < N/2. The sequence with the maximum LCE can be constructed by aligning the vertices with different labels alternately in the procedure shown in each step. Vertices in a box are to be aligned in the following steps. The vertex indices are omitted because they are not essential for the construction of a sequence.

Appendix B: Variance of the label continuity error in random partitions

The second moment of ∆ is

Thus, the variance Var[∆] is

Appendix C: Probability distribution of the label continuity error in random partitions

This section derives the probability distribution of the label continuity error ∆(π, σ) when group labels are assigned randomly based on bootstrapped group labels σ*.
To derive the probability distribution of ∆(π, σ), it is sufficient to calculate that of the label continuity C(π, σ).
The probability of (N − 1)C = m is

Here, I is the identity matrix. In Eq. (C1), we used the identity

which is an integral around the origin of the complex plane.
Using the eigenvalue decomposition, F can be expressed as

where u_k (2 ≤ k ≤ K) is an eigenvector of F that is perpendicular to 1, and we have

Because the second term in Eq. (C5) vanishes when the group sizes are equal, the exact probability distribution can be derived as follows:

Equivalently,

Therefore, (N − 1)C follows a binomial distribution. This result can be interpreted as follows. Suppose that N elements are linearly aligned and that we assign group labels from one end. As we focus only on the consecutive property of the group labels, the label of the first element can be arbitrary. For each of the next N − 1 elements, the probability that the label is consecutive to (i.e., identical to) the previous one is 1/K, whereas the complementary probability is 1 − 1/K because the group label can be arbitrary as long as it is not identical to the previous one. We sum over all possible patterns that have consecutive labels m times to obtain P(C = m/(N − 1)).
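The binomial characterization can be checked numerically. The sketch below assumes i.i.d. uniform label assignment (the equal-group-size limit described above), under which the N − 1 adjacent-match indicators are independent Bernoulli(1/K) variables; the function name is ours.

```python
import numpy as np

def matches(labels):
    """m = (N - 1) C: number of adjacent positions with identical labels."""
    labels = np.asarray(labels)
    return int(np.sum(labels[1:] == labels[:-1]))

rng = np.random.default_rng(7)
N, K, trials = 101, 4, 2000
ms = np.array([matches(rng.integers(0, K, size=N)) for _ in range(trials)])
# Under i.i.d. uniform labels, m ~ Binomial(N - 1, 1/K):
# mean (N - 1)/K = 25 and variance (N - 1)(1/K)(1 - 1/K) = 18.75.
```

The empirical mean and variance of m over many trials match the binomial values up to Monte Carlo fluctuations.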
Even when the group sizes are not equal, Eq. (C6) is close to the actual distribution as long as the second term in Eq. (C5) is negligible. When N ≫ 1 and the size of each group is of constant order, i.e., N/K = O(1), Eq. (C6) is well approximated by a Poisson distribution. Furthermore, when N/K ≫ 1, the distribution is nearly normal. The distribution of ∆(π, σ) is obtained by shifting the distribution (C7) by a constant factor.

Appendix D: Label continuity errors for nested partitions
As an example of partitions with different numbers of groups, we here investigate the difference in the LCEs between a partition σ with K groups and its nested partition σ′. The partition σ′ is obtained by subpartitioning the vertex set V_K, which has the Kth group label in σ, into V_{K,1} and V_{K,2} (V_{K,1} ∪ V_{K,2} = V_K); we denote the sizes of these two groups by N_{K,1} and N_{K,2} (N_{K,1} + N_{K,2} = N_K) and write {N′_k} = {N_1, ..., N_{K−1}, N_{K,1}, N_{K,2}}. The partitions σ and σ′ are only locally different. The difference in the LCEs for σ and σ′ with the same sequence π is bounded as

The lower bound is trivial because the label continuity C is a nonnegative quantity, and C cannot become smaller when C = 0 before the subpartition. The upper bound of Eq. (D1) can be derived as follows. The difference in the LCE is maximized when the difference in C is maximized. Note that the number of label flips is maximized when the labels before the subpartition are aligned completely consecutively, e.g., the case in Fig. 12(a). In this case, we can maximize the difference in C by aligning the vertices in V_{K,1} and V_{K,2} as alternately as possible. The achieved difference is

Equation (D1) indicates that the LCE is a local quantity; that is, the bound on the variation in the LCE is characterized by N_{K,1} and N_{K,2}, and the variation tends to be small when min{N_{K,1}, N_{K,2}} is small. However, the specific difference, as opposed to its bounds, depends not only on the subsequence within V_K but also on the position of V_K in the entire sequence π (see Fig. 12 for specific examples). The present result implies that a comparison of LCEs is generally complicated when the partitions have different numbers of groups.
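One direction of the comparison admits a simple monotonicity check: because σ′ refines σ, every adjacent pair sharing a label under σ′ also shares a label under σ, so the label continuity can only decrease under subpartitioning. The sketch below, with hypothetical function names of our own, relabels half of one group and verifies this.

```python
import random

def label_continuity(labels):
    """C: fraction of adjacent positions sharing a group label."""
    n = len(labels)
    return sum(labels[i] == labels[i + 1] for i in range(n - 1)) / (n - 1)

def subpartition(labels, target, new_label, seed=0):
    """Relabel a random half of the `target` group as `new_label`,
    producing a nested partition sigma' of sigma."""
    rng = random.Random(seed)
    idx = [i for i, l in enumerate(labels) if l == target]
    chosen = set(rng.sample(idx, len(idx) // 2))
    return [new_label if i in chosen else l for i, l in enumerate(labels)]
```

Whatever half is chosen, C(π, σ′) ≤ C(π, σ) holds, consistent with the trivial lower bound of Eq. (D1).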

Supplementary Materials
"Consistency between ordering and clustering methods for graphs"
Tatsuro Kawamoto, Masaki Ochi, and Teruyoshi Kobayashi

S1: Hyperparameter dependency in the Bethe Hessian

This section investigates how the hyperparameter r and the choice of eigenvector in the Bethe Hessian B affect the spectral ordering. Figure S1 shows, for each π̂ estimated from the kth (k = 1, 2, 3, 4) eigenvector, the achieved value of H_2(π̂; A) as we sweep the hyperparameter r. In this experiment, we used an instance of the ORGM with N = 100, c = 6, ε = 0.1, and r/N = 0.1. The dashed line in the figure represents the default value of r.
This result indicates that the eigenvector with which H_2(π̂; A) is minimized varies as r increases, particularly when r is relatively small. However, when r is sufficiently large, the estimate π̂ based on the eigenvector ν_2 associated with the second-smallest eigenvalue yields the lowest value of H_2(π̂; A).
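A sweep of this kind can be sketched as follows. We assume the standard Bethe Hessian form B(r) = (r² − 1)I − rA + D, which reduces to the unnormalized Laplacian D − A at r = 1; the function names are ours, not the paper's implementation.

```python
import numpy as np

def bethe_hessian(A, r):
    """B(r) = (r^2 - 1) I - r A + D; B(1) reduces to the Laplacian D - A."""
    d = A.sum(axis=1)
    return (r**2 - 1) * np.eye(len(d)) - r * A + np.diag(d)

def H2(order, A):
    """Squared sequential distance of a vertex sequence."""
    pos = np.empty(len(order), dtype=int)
    pos[order] = np.arange(len(order))
    i, j = np.nonzero(np.triu(A, k=1))
    return int(((pos[i] - pos[j]) ** 2).sum())

def sweep_r(A, r_values, k=2):
    """For each r, order vertices by the eigenvector associated with the
    k-th smallest eigenvalue of B(r) and record the achieved H2."""
    results = []
    for r in r_values:
        _, vecs = np.linalg.eigh(bethe_hessian(A, r))
        results.append(H2(np.argsort(vecs[:, k - 1]), A))
    return results
```

Applying `sweep_r` over a grid of r values and several k, as in Fig. S1, reveals which eigenvector minimizes H_2 for each r.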
Based on this observation, we employ ν_2 for ordering with the Bethe Hessian. It should also be noted that, as shown in Fig. S1, the global minimum of H_2(π̂; A) is typically achieved when r is lower than the default value we employed. Therefore, although it is beyond the scope of this study, a better performance could be obtained by optimizing with respect to r.

[1] https://graph-tool.skewed.de