Contribution of directedness in graph spectra

In graph analyses, directed edges are often approximated by undirected ones so that the adjacency matrices become symmetric. However, the validity of this simplification has not been thoroughly verified. In this study, we investigate how directedness affects graph spectra by introducing random directization, which is the opposite operation to neglecting edge directions. We analytically reveal that uniformly random directization typically conserves the relative spectral structure of the adjacency matrix in the perturbative regime. This result implies that the relative spectral structure of the adjacency matrix can be conserved after the directedness is ignored.


I. INTRODUCTION
Many real-world datasets are represented by directed graphs. In social networks, the follower-followee relationship defines a directed edge [1]. In nervous systems, a signal transduction between neurons occurs only in one direction [2]. In this way, the directedness renders the relationship between a pair of vertices asymmetric and characterizes many properties on graphs, such as the diffusion [3] and reciprocity [4]. Nevertheless, in the studies of complex networks, the edge directions in graphs are often ignored, and directed graphs are converted to the undirected counterparts. Here, we refer to such simplification as undirectization. While undirectization may affect the result of an analysis only negligibly, it can be critical in some cases. In this study, we investigate the importance of the directedness in graph analyses by focusing on the change in graph spectra using the matrix perturbation theory.
Graphs are typically represented by matrices, such as adjacency matrices, combinatorial and normalized Laplacians [5], and non-backtracking matrices [6]. The associated graph spectra offer important tools for capturing the global properties of graphs, such as module structures [7] and network centralities [8]. For example, in spectral clustering for undirected graphs, the number of clusters is determined by the number of eigenvalues that are isolated from the (asymptotically) continuous spectral band. The eigenvectors corresponding to these isolated eigenvalues provide a partitioning of the graph [5,9]. There have been a number of studies on the spectral properties of a variety of matrices in the context of statistical physics and random matrix theory [10][11][12]. In spectral graph theory, several bounds for the largest and second-largest eigenvalues have been derived.
FIG. 1. Adjacency matrix spectra for the macaque-cortex network [16]: (a) original network with directed edges and (b) undirectized network in which every directed edge is converted to an undirected edge.
Spectral methods for undirected graphs and directed graphs are not completely analogous to each other. Because the matrices are asymmetric for directed graphs, their spectra contain eigenvalues with nonzero imaginary parts. This is part of the reason why researchers and practitioners ignore edge directions. To formulate spectral methods for directed graphs, several graph Laplacians for the spectral clustering of directed graphs have been proposed in the literature [17][18][19][20][21][22]. Although the choice of Laplacian is an important issue, herein we focus only on the adjacency matrices of directed and undirected graphs. There have been several studies on the spectra of directed graphs. For example, in spectral graph theory, bounds on the spectral radius of adjacency matrices for directed graphs have been studied [23,24]. In random graph theory and statistical physics, spectral densities of adjacency matrices for random directed graphs have been investigated [25,26]. In contrast, we consider the typical spectra of directed graphs relative to those of their undirected counterparts. Figure 1(a) shows the eigenvalue distribution of the adjacency matrix of a macaque-cortex network [16], which is partially directed; Fig. 1(b) presents the undirectized counterpart. We note that, in the undirectized graph, all eigenvalues are projected onto the real axis and the scale along the real axis is slightly larger. Despite these differences, the relative distances among the five largest eigenvalues along the real axis are almost unchanged between Figs. 1(a) and 1(b).
The last observation in this example motivates us to theoretically investigate the relationship between the spectral structures of directed graphs and their undirectized counterparts. To this end, we introduce random directization as an opposite operation of ignoring edge directions. That is, we consider an undirected graph as the original graph and randomly make undirected edges directed. We consider the typical variation of eigenvalues and eigenvectors under random directization. When the fraction of directized edges is sufficiently small compared to the total number of edges, the resulting adjacency matrix can be considered as a perturbed matrix of the original one. We apply the matrix perturbation theory to analytically evaluate variations of eigenvalues and eigenvectors after directization. As shown below, an important prediction of the perturbation theory is that the relative spectral structure along the real axis of the adjacency matrix is approximately conserved when the edges are directized uniformly randomly. This conversely explains our observation in Fig. 1 on undirectization.
There have been many works on the perturbative analysis of undirected graph spectra. Let $A$ and $V$ be real symmetric $n \times n$ matrices, and let us perturb $A$ by adding $V$. Let $\lambda_i$ and $v_i$ respectively denote the $i$th eigenvalue and eigenvector of $A$, and let $\tilde{\lambda}_i$ and $\tilde{v}_i$ respectively denote the $i$th eigenvalue and eigenvector of $A + V$. The bound for the variation of eigenvalues under this perturbation is known as Weyl's theorem [27]:
$$|\tilde{\lambda}_i - \lambda_i| \le \|V\|, \tag{1}$$
where $\|V\|$ represents the spectral norm of $V$. As for the variation of eigenvectors, the Davis-Kahan theorem [28] explains how an eigenvector can change under the same perturbation: the angle between $v_i$ and $\tilde{v}_i$ is bounded as
$$\sin\angle(v_i, \tilde{v}_i) \le \frac{2\|V\|}{\min_{j \ne i} |\lambda_i - \lambda_j|}, \tag{2}$$
where the sine of the angle between two vectors is defined by $\sin\angle(v, w) = \sqrt{1 - (v \cdot w / |v||w|)^2}$. In addition, many variants of Weyl's theorem and the Davis-Kahan theorem have been proposed, such as one that is convenient for application in statistical contexts [29] and ones that assume low-rank matrices for $A$ and random matrices for $V$ [30,31]. Sarkar et al. [32] extended the theorem in order to algorithmically estimate the number of modules in undirected graphs. Karrer and Newman [33] experimentally investigated the effect of adding edges to undirected graphs on their spectra. Note that in these studies, both the unperturbed and perturbed graphs were undirected, whereas in our study, we consider a directizing perturbation.
The rest of the paper is organized as follows. In Sec. II, we formally define random directization and conduct a perturbative analysis. Then, we evaluate the variation in the spectra under undirectization in Sec. III. Finally, Sec. IV is devoted to a summary and discussions. The symbols used in this paper are listed in Appendix A.

II. RANDOM DIRECTIZATION
An undirected graph has a symmetric adjacency matrix $A$, in which $A_{ij} = A_{ji} = 1$ when vertices $i$ and $j$ are connected and $A_{ij} = 0$ otherwise. The degree, defined by $d_i = \sum_j A_{ij} = \sum_j A_{ji}$, represents the number of neighbors of vertex $i$. For a directed graph, in contrast, the adjacency matrix is asymmetric, i.e., $A_{ij} = 1$ and $A_{ji} = 0$ when an edge has a direction from vertex $i$ to vertex $j$ only. Hereafter, we regard an undirected edge in a directed graph as a pair of directed edges in both directions. The in-degree and out-degree, defined by $d_i^{(\mathrm{in})} = \sum_j A_{ji}$ and $d_i^{(\mathrm{out})} = \sum_j A_{ij}$, denote the number of in-neighboring vertices (with in-coming edges) and out-neighboring vertices (with out-going edges) of vertex $i$, respectively.
We define uniformly random directization as follows. Let $G(V, E)$ denote an undirected graph that consists of a set of vertices $V$ and a set of edges $E$, and let $\tilde{G}(V, \tilde{E})$ denote a graph randomly directized from the original graph $G(V, E)$. The number of vertices is denoted by $N = |V|$ for both graphs, while the numbers of edges for $G(V, E)$ and $\tilde{G}(V, \tilde{E})$ are respectively denoted by $M = |E|$ and $\tilde{M} = |\tilde{E}|$; note that each undirected edge is doubly counted in the latter, but not in the former. Hereafter, $A_0$ and $\tilde{A}$ denote the adjacency matrices of $G(V, E)$ and $\tilde{G}(V, \tilde{E})$, respectively. On directization, we alter an undirected edge $e_{ij} \in E$ between vertex $i$ and vertex $j$ to a directed edge $e_{i \to j} \in \tilde{E}$ (from $i$ to $j$) with probability $q/2$, to $e_{i \leftarrow j} \in \tilde{E}$ (from $j$ to $i$) with probability $q/2$, and leave it undirected with probability $1 - q$, where $0 \le q \le 1$; see Fig. 2. The number of directed edges after the uniformly random directization becomes $\tilde{M} = qM + 2(1 - q)M$ on average because we doubly count an undirected edge as a pair of directed edges with two directions. We express the adjacency matrix $\tilde{A}$ of the uniformly directized graph as
$$\tilde{A} = A_0 + V, \tag{3}$$
where $V$ is the perturbation matrix defined as follows: when $(A_0)_{ij} = 1$, the pair $(V_{ij}, V_{ji})$ takes the values $(0, -1)$, $(-1, 0)$, and $(0, 0)$ with probabilities $q/2$, $q/2$, and $1 - q$, respectively; otherwise, $V_{ij} = V_{ji} = 0$. We express the $i$th eigenvalue of $A_0$ as $\lambda_i$ and its eigenvector as $v_i$. The corresponding eigenvalue and eigenvector for $\tilde{A}$ are denoted by $\tilde{\lambda}_i$ and $\tilde{v}_i$, respectively. We express the ensemble of perturbation matrices $V$ for a given adjacency matrix $A_0$ as $S(A_0)$. Throughout this paper, the norm of each eigenvector is normalized to unity.
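The directization rule above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function name `random_directize`, the toy four-vertex graph, and the choice of random seed are assumptions made here, and the convention that `A[i, j] = 1` encodes an edge from `i` to `j` follows the definition of the adjacency matrix given earlier in this section.

```python
import numpy as np

def random_directize(A0, q, rng):
    """Return a uniformly randomly directized copy of a symmetric 0/1 matrix A0.

    Convention: A[i, j] = 1 means an edge from i to j.
    """
    A = A0.copy()
    rows, cols = np.nonzero(np.triu(A0, k=1))  # each undirected edge once (i < j)
    u = rng.random(rows.size)
    to_j = u < q / 2                  # edge becomes i -> j: delete A[j, i]
    to_i = (u >= q / 2) & (u < q)     # edge becomes j -> i: delete A[i, j]
    A[cols[to_j], rows[to_j]] = 0
    A[rows[to_i], cols[to_i]] = 0
    return A

rng = np.random.default_rng(0)
# Toy graph (hypothetical example): K4 with the edge (0, 3) removed.
A0 = np.array([[0, 1, 1, 0],
               [1, 0, 1, 1],
               [1, 1, 0, 1],
               [0, 1, 1, 0]])
A_tilde = random_directize(A0, q=0.5, rng=rng)
V = A_tilde - A0                      # perturbation matrix, entries in {0, -1}
M = int(np.triu(A0, k=1).sum())       # undirected edges, counted once
M_tilde = int(A_tilde.sum())          # remaining undirected edges are counted twice
```

With `q = 0` the graph is returned unchanged, and with `q = 1` every edge keeps exactly one direction, so `M_tilde` equals `M`.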

A. Perturbative analysis
Perturbation theory allows us to estimate the variation of each eigenvalue, $\delta\lambda_i = \tilde{\lambda}_i - \lambda_i$, and the variation of each eigenvector, $\delta v_i = \tilde{v}_i - v_i$, under perturbative directization. We calculate the ensemble average and variance of these variations up to the first-order approximation.

Eigenvalues
For a given graph, the variation of the $i$th eigenvalue $\lambda_i$ along the real axis in the first-order approximation [34,35] is
$$\delta\lambda_i = v_i^{\top} V v_i, \tag{4}$$
where $v^{\top}$ denotes the transpose of a vector $v$. Note that this expression holds for a specific graph instance, whereas we are interested in the ensemble of randomly directized graphs.
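To this end, we define a generating function for $\delta\lambda_i$,
$$\phi_i(\beta) = \left[ e^{\beta\,\delta\lambda_i} \right]_{V|A_0} = \left[ \exp\Bigl( \beta \sum_{\ell, m} v_{i\ell} V_{\ell m} v_{im} \Bigr) \right]_{V|A_0}, \tag{5}$$
where $\beta$ is an auxiliary parameter, $v_{i\ell}$ denotes the $\ell$th element of $v_i$, and $[\cdots]_{V|A_0}$ represents the random average
$$\left[ f(V) \right]_{V|A_0} = \sum_{V \in S(A_0)} P(V|A_0)\, f(V), \tag{6}$$
with $P(V|A_0)$ the probability of $V$ under uniformly random directization. We find the average and variance of the variation for $\lambda_i$ respectively by
$$\left[ \delta\lambda_i \right]_{V|A_0} = \left. \frac{\partial}{\partial\beta} \log \phi_i(\beta) \right|_{\beta = 0}, \tag{7}$$
$$\left[ \delta\lambda_i^2 \right]_{V|A_0} - \left[ \delta\lambda_i \right]_{V|A_0}^2 = \left. \frac{\partial^2}{\partial\beta^2} \log \phi_i(\beta) \right|_{\beta = 0}. \tag{8}$$
Because each edge is directized independently, the random average in Eq. (5) is calculated as
$$\phi_i(\beta) = \prod_{(\ell, m) \in E} \left[ 1 - q + q\, e^{-\beta v_{i\ell} v_{im}} \right]. \tag{9}$$
Then, the average and variance of the variation $\delta\lambda_i$ are given by
$$\left[ \delta\lambda_i \right]_{V|A_0} = -\frac{q}{2} \sum_{\ell, m} (A_0)_{\ell m} v_{i\ell} v_{im} = -\frac{q}{2} \lambda_i, \tag{10}$$
$$\left[ \delta\lambda_i^2 \right]_{V|A_0} - \left[ \delta\lambda_i \right]_{V|A_0}^2 = \frac{q(1 - q)}{2} \sum_{\ell, m} (A_0)_{\ell m} v_{i\ell}^2 v_{im}^2, \tag{11}$$
respectively, where we used the fact
$$\sum_{\ell, m} (A_0)_{\ell m} v_{i\ell} v_{im} = v_i^{\top} A_0 v_i = \lambda_i. \tag{12}$$
Equation (10) indicates that, in the range where the first-order approximation is valid, the average of the perturbed eigenvalue is
$$\bigl[ \tilde{\lambda}_i \bigr]_{V|A_0} = \left( 1 - \frac{q}{2} \right) \lambda_i. \tag{13}$$
Thus, the relative spectral structure is conserved under uniformly random directization as long as the first-order approximation is valid. Let us investigate the condition under which the fluctuation of the variation is small. We consider the ratio between the average and standard deviation for the $i$th eigenvalue:
$$\frac{\bigl| [\delta\lambda_i]_{V|A_0} \bigr|}{\sqrt{[\delta\lambda_i^2]_{V|A_0} - [\delta\lambda_i]_{V|A_0}^2}} = |\lambda_i| \sqrt{\frac{q}{2(1 - q)}} \left( \sum_{\ell, m} (A_0)_{\ell m} v_{i\ell}^2 v_{im}^2 \right)^{-1/2}. \tag{14}$$
Because each element of the eigenvector typically scales as $O(N^{-1/2})$ and the number of nonzero elements in the summation is $cN$, where $c$ denotes the average degree of the original undirected graph, defined by $c = \sum_i d_i / N$, the ratio above scales as
$$|\lambda_i| \sqrt{\frac{qN}{2(1 - q)c}}. \tag{15}$$
In the case of regular random graphs, the eigenvalue at the edge of the spectral band is $2\sqrt{c}$ [36]. Thus, the fluctuation of the variation is expected to be negligible when $\sqrt{2qN/(1 - q)}$ is sufficiently large for the eigenvalues outside the spectral band.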
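As a hedged numerical companion to the first-order analysis of the eigenvalues, the sketch below evaluates the ensemble mean $-(q/2)\lambda_i$ and the variance $q(1-q)\sum_{(\ell,m)\in E} v_{i\ell}^2 v_{im}^2$ of $\delta\lambda_i = v_i^{\top} V v_i$ directly from the eigenvectors of $A_0$. The function name `first_order_moments` and the toy graph are assumptions introduced only for illustration; the sum runs over each undirected edge once, which is equivalent to half the sum over all ordered pairs.

```python
import numpy as np

def first_order_moments(A0, q):
    """First-order mean and variance of delta lambda_i for every eigenvalue of A0."""
    lam, vec = np.linalg.eigh(A0)               # columns of vec are unit eigenvectors
    rows, cols = np.nonzero(np.triu(A0, k=1))   # undirected edges (l, m) with l < m
    prod = vec[rows, :] * vec[cols, :]          # prod[e, i] = v_il * v_im on edge e
    mean = -q * prod.sum(axis=0)                # equals -(q / 2) * lam
    var = q * (1 - q) * (prod ** 2).sum(axis=0)
    return lam, mean, var

# Toy graph (hypothetical example): K4 with the edge (0, 3) removed.
A0 = np.array([[0, 1, 1, 0],
               [1, 0, 1, 1],
               [1, 1, 0, 1],
               [0, 1, 1, 0]], dtype=float)
q = 0.2
lam, mean, var = first_order_moments(A0, q)
# Ratio of |mean| to standard deviation for the largest eigenvalue.
ratio_top = abs(mean[-1]) / np.sqrt(var[-1])
```

Because the graph is tiny, these moments can be checked against an exhaustive enumeration of all $3^M$ directization patterns; for large graphs the same function gives the first-order prediction at the cost of one symmetric eigendecomposition.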

Eigenvectors
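For a given graph, the variation of the $i$th eigenvector $v_i$ along the real axis in the first-order approximation [34,35] is
$$\delta v_i = \sum_{j \ne i} \frac{v_j^{\top} V v_i}{\lambda_i - \lambda_j}\, v_j. \tag{16}$$
For the $\ell$th element of $v_i$, we define its generating function by
$$\phi_{i\ell}(\beta) = \left[ e^{\beta\,\delta v_{i\ell}} \right]_{V|A_0}. \tag{17}$$
We find the average and variance of the variation of the $\ell$th element of $v_i$ respectively by
$$\left[ \delta v_{i\ell} \right]_{V|A_0} = \left. \frac{\partial}{\partial\beta} \log \phi_{i\ell}(\beta) \right|_{\beta = 0}, \tag{18}$$
$$\left[ \delta v_{i\ell}^2 \right]_{V|A_0} - \left[ \delta v_{i\ell} \right]_{V|A_0}^2 = \left. \frac{\partial^2}{\partial\beta^2} \log \phi_{i\ell}(\beta) \right|_{\beta = 0}. \tag{19}$$
As was done for the eigenvalues above, we can take the random average with respect to each edge, obtaining
$$\phi_{i\ell}(\beta) = \prod_{(m, n) \in E} \left[ 1 - q + \frac{q}{2} e^{-\beta W^{(i\ell)}_{mn}} + \frac{q}{2} e^{-\beta W^{(i\ell)}_{nm}} \right], \tag{20}$$
where we introduce the shorthand $W^{(i\ell)}_{mn} = \sum_{j \ne i} v_{j\ell} v_{jm} v_{in} / (\lambda_i - \lambda_j)$. From Eqs. (18) and (20), we find the average of $v_i$'s variation as follows:
$$\left[ \delta v_{i\ell} \right]_{V|A_0} = -\frac{q}{2} \sum_{j \ne i} \frac{v_{j\ell}}{\lambda_i - \lambda_j}\, v_j^{\top} A_0 v_i = 0, \tag{21}$$
because $v_j^{\top} A_0 v_i = \lambda_i\, v_j \cdot v_i = 0$ for $j \ne i$. Thus, in the first-order perturbative regime, uniformly random directization does not vary the eigenvectors of adjacency matrices on average. The variance of $v_i$'s variation is also obtained by using Eqs. (19) and (20):
$$\left[ \delta v_{i\ell}^2 \right]_{V|A_0} - \left[ \delta v_{i\ell} \right]_{V|A_0}^2 = \sum_{(m, n) \in E} \left[ \frac{q}{2} \left( \bigl(W^{(i\ell)}_{mn}\bigr)^2 + \bigl(W^{(i\ell)}_{nm}\bigr)^2 \right) - \frac{q^2}{4} \left( W^{(i\ell)}_{mn} + W^{(i\ell)}_{nm} \right)^2 \right]. \tag{22}$$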
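The vanishing first-order average of the eigenvector variation can be checked numerically with a short sketch. It relies on two observations: the first-order shift $\sum_{j \ne i} v_j\,(v_j^{\top} V v_i)/(\lambda_i - \lambda_j)$ is linear in $V$, and the ensemble mean of $V$ under uniformly random directization is $-(q/2) A_0$, so the mean shift equals the shift evaluated at the mean perturbation. The function name `first_order_eigvec_shift` and the toy graph are assumptions of this sketch, and a non-degenerate spectrum is assumed.

```python
import numpy as np

def first_order_eigvec_shift(A0, V, i):
    """First-order shift of the i-th eigenvector of symmetric A0 under perturbation V.

    Assumes the eigenvalue lam[i] is non-degenerate.
    """
    lam, vec = np.linalg.eigh(A0)
    coeff = vec.T @ V @ vec[:, i]     # coeff[j] = v_j^T V v_i
    gaps = lam[i] - lam
    gaps[i] = np.inf                  # removes the j = i term from the sum
    return vec @ (coeff / gaps)

# Toy graph (hypothetical example): K4 with the edge (0, 3) removed.
A0 = np.array([[0, 1, 1, 0],
               [1, 0, 1, 1],
               [1, 1, 0, 1],
               [0, 1, 1, 0]], dtype=float)
q = 0.2
mean_V = -(q / 2) * A0                # ensemble mean of the perturbation matrix
shifts = [first_order_eigvec_shift(A0, mean_V, i) for i in range(A0.shape[0])]
max_abs_shift = max(np.abs(s).max() for s in shifts)   # zero up to rounding error
```

For an individual sample of $V$ the same function returns a nonzero shift; only the ensemble average vanishes.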

B. Numerical confirmation
We now compare the first-order perturbative estimates with numerical calculations. We generated 10,000 uniformly randomly directized samples from the undirectized counterparts of real-world networks and numerically calculated the eigenvalues and eigenvectors of their adjacency matrices. Figure 3 exhibits the variation of the real part of each eigenvalue, $\delta\lambda_i$, and of the first element of each eigenvector, $\delta v_{i1}$, for the undirectized macaque-cortex network [16] under random directization. We show the variation of all eigenvalues in the left panel and the variation of the eigenvector elements corresponding to the top 20 eigenvalues in the right panel. In both panels, we observe that the first-order perturbative estimates and the numerical results agree well, particularly for the top eigenvalues, which are expected to be isolated eigenvalues. Figure 4 shows the $q$-dependency of the top five eigenvalues along the real axis for two real-world directed networks: the macaque-cortex network [16] and a social network of employees at a consulting company [37]. The fractions of directed edges are 0.30 and 0.41, respectively. We numerically calculated the perturbed eigenvalues normalized by the original eigenvalues. Based on Eq. (10), we estimate the normalized eigenvalue $\tilde{\lambda}_i / \lambda_i$ to be $1 - q/2$ regardless of the index of the eigenvalue, which is illustrated by the dashed line in Fig. 4. For both networks, the estimate is reasonable when $q$ is sufficiently small. As $q$ increases, the difference between the theoretical estimate and the numerical result increases.
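A comparison of this kind can be imitated on a toy example. The following sketch is not the experiment reported above: it uses a complete graph on eight vertices, 2,000 samples, and $q = 0.1$, all chosen here purely for illustration, and it only tracks the largest eigenvalue. It estimates the sample mean of $\mathrm{Re}\,\tilde{\lambda}_1 / \lambda_1$ and sets it against the first-order estimate $1 - q/2$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, q, n_samples = 8, 0.1, 2000               # illustrative values, not from the paper
A0 = np.ones((n, n)) - np.eye(n)             # complete graph K_8 (toy example)
rows, cols = np.nonzero(np.triu(A0, k=1))    # each undirected edge once
lam1 = np.linalg.eigvalsh(A0)[-1]            # largest eigenvalue of A0 (n - 1)

ratios = np.empty(n_samples)
for s in range(n_samples):
    A = A0.copy()
    u = rng.random(rows.size)
    to_j = u < q / 2
    to_i = (u >= q / 2) & (u < q)
    A[cols[to_j], rows[to_j]] = 0            # edge becomes i -> j
    A[rows[to_i], cols[to_i]] = 0            # edge becomes j -> i
    # Largest real part among the eigenvalues of the directized matrix.
    ratios[s] = np.linalg.eigvals(A).real.max() / lam1

estimate = 1 - q / 2                         # first-order prediction
mean_ratio = ratios.mean()
```

In this small, dense example the sample mean is expected to lie close to the first-order estimate; the agreement should deteriorate as `q` grows, in line with the trend described for Fig. 4.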

C. Distribution of directed in-degrees
Before we conclude this section, we consider the degree distribution of directized edges after uniformly random directization. We define the directed in-degree $k_i^{(\mathrm{in})}$ for a directed graph as the number of in-coming edges for each vertex:
$$k_i^{(\mathrm{in})} = \sum_j \tilde{A}_{ji} \bigl( 1 - \tilde{A}_{ij} \bigr). \tag{23}$$
Here, we do not count undirected edges for $k_i^{(\mathrm{in})}$. Note that because the directization of an edge affects the directed in-degrees of the vertices on both ends, the directed in-degrees of neighboring vertices are correlated with each other. Instead of deriving the directed in-degree distribution after actual random directization, we derive it after a process in which we uniformly randomly directize stubs (half-edges). This procedure is illustrated in Fig. 6. We independently alter an undirected stub to an in-coming one with probability $\omega$ and keep it undirected with probability $1 - \omega$. The probability that the number of directed stubs for vertex $i$ is $k$ is given by a binomial distribution:
$$P(k \mid d_i) = \binom{d_i}{k} \omega^k (1 - \omega)^{d_i - k}. \tag{24}$$
We regard an edge as directed if one end of the edge is directized while the other end is not. After random directization of stubs, an edge $e_{ij}$ is converted to $e_{i \to j}$ with probability $\omega(1 - \omega)$, converted to $e_{i \leftarrow j}$ with probability $\omega(1 - \omega)$, and remains undirected with probability $(1 - \omega)^2$. With probability $\omega^2$, an edge becomes bi-directed and the resulting object is no longer a directed graph of the kind that we consider. Nonetheless, when $\omega$ is sufficiently small such that the emergence of bi-directed edges is negligibly rare, this process is almost identical to the directization defined in Eq. (3) with $\omega = q/2$. Thus, when $q$ is sufficiently small, the probability that a vertex $i$ has a directed in-degree $k$ after uniformly random directization approximately follows a binomial distribution:
$$P\bigl( k_i^{(\mathrm{in})} = k \bigr) \simeq \binom{d_i}{k} \left( \frac{q}{2} \right)^k \left( 1 - \frac{q}{2} \right)^{d_i - k}. \tag{25}$$
Then, we obtain the directed in-degree distribution over a graph after uniformly random directization.
For a subset of vertices with the same degree $d$, the directed in-degree distribution approximately follows a binomial distribution:
$$P\bigl( k^{(\mathrm{in})} \mid d \bigr) \simeq \binom{d}{k^{(\mathrm{in})}} \left( \frac{q}{2} \right)^{k^{(\mathrm{in})}} \left( 1 - \frac{q}{2} \right)^{d - k^{(\mathrm{in})}}. \tag{26}$$
Thus, the directed in-degree distribution $P(k^{(\mathrm{in})})$ over the whole graph approximately follows the mixture of binomial distributions:
$$P\bigl( k^{(\mathrm{in})} \bigr) \simeq \sum_d Q(d) \binom{d}{k^{(\mathrm{in})}} \left( \frac{q}{2} \right)^{k^{(\mathrm{in})}} \left( 1 - \frac{q}{2} \right)^{d - k^{(\mathrm{in})}}, \tag{27}$$
where $Q(d)$ denotes the degree distribution of the original undirected graph.
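The mixture of binomial distributions can be evaluated from a degree sequence with the standard library alone. In the sketch below, the function name `directed_in_degree_distribution` and the example degree sequence are hypothetical; the empirical degree frequencies play the role of $Q(d)$ and each stub is directized with probability $q/2$, as in the small-$q$ approximation described above.

```python
from collections import Counter
from math import comb

def directed_in_degree_distribution(degrees, q):
    """Mixture-of-binomials estimate of P(k_in) for a given undirected degree sequence."""
    n = len(degrees)
    omega = q / 2                               # probability that a stub becomes in-coming
    p = [0.0] * (max(degrees) + 1)
    for d, count in Counter(degrees).items():   # count / n is the empirical Q(d)
        for k in range(d + 1):
            p[k] += (count / n) * comb(d, k) * omega**k * (1 - omega)**(d - k)
    return p

degrees = [2, 3, 3, 2]                          # toy degree sequence (hypothetical)
p_in = directed_in_degree_distribution(degrees, q=0.3)
mean_k_in = sum(k * pk for k, pk in enumerate(p_in))   # (q / 2) times the mean degree
```

The returned list is indexed by the directed in-degree, so `p_in[k]` approximates the fraction of vertices with $k$ in-coming directed edges.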

III. VARIATION OF SPECTRA UNDER UNDIRECTIZATION
We return to our original motivation of clarifying how the graph spectra are varied by ignoring edge directions, namely undirectization. Let $G_D(V, E_D)$ denote a directed graph and let $G_U(V, E_U)$ denote its undirectized graph. The numbers of edges for $G_D$ and $G_U$ are respectively denoted by $M_D = |E_D|$ and $M_U = |E_U|$. The fraction of directed edges, $\hat{q}$, which corresponds to the probability of directization in random directization, should satisfy $M_D = \hat{q} M_U + 2(1 - \hat{q}) M_U$, and hence is given by $\hat{q} = 2 - M_D / M_U$. The adjacency matrices for $G_D$ and $G_U$ are respectively denoted by $A_D$ and $A_U$.
The result from random directization implies that the relative spectral structure along the real axis is typically maintained under undirectization when the fraction $\hat{q}$ is sufficiently small. The result of directization, Eq. (10), conversely implies that, after ignoring the edge directions, the real part of the $i$th eigenvalue $\lambda_i^D$ of $A_D$ is altered to the corresponding eigenvalue $\lambda_i^U$ of $A_U$ as in
$$\lambda_i^U \simeq \frac{\mathrm{Re}\bigl[ \lambda_i^D \bigr]}{1 - \hat{q}/2}, \tag{28}$$
where $\mathrm{Re}[\cdots]$ denotes the real part of a complex value. The perturbative analysis explains why the two spectra shown in Fig. 1 share almost the same relative spectral structure for the real parts. In Fig. 7, we compare the spectra of the macaque-cortex network, in which $\hat{q} = 0.30$, with the theoretical estimates. The top panel shows the resulting spectrum after random directization in Eq. (10), and the bottom panel shows the resulting spectrum after undirectization in Eq. (28). Both panels show that the perturbative analysis is moderately accurate, particularly for isolated eigenvalues.
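As a rough illustration of the undirectization estimate, the sketch below computes $\hat{q} = 2 - M_D/M_U$ from a directed adjacency matrix and rescales the real parts of its eigenvalues by $1/(1 - \hat{q}/2)$. The function name `undirectization_estimate` and the toy graph are assumptions, and pairing the two spectra by sorting the real parts in descending order is a simplification adopted here rather than a procedure stated in the text.

```python
import numpy as np

def undirectization_estimate(A_D):
    """Return q_hat, the rescaled real parts of the spectrum of A_D, and the spectrum of A_U."""
    A_U = np.maximum(A_D, A_D.T)                 # ignore edge directions
    M_D = A_D.sum()                              # undirected edges are counted twice
    M_U = A_U.sum() / 2                          # each undirected edge counted once
    q_hat = 2 - M_D / M_U
    re_D = np.sort(np.linalg.eigvals(A_D).real)[::-1]
    lam_U = np.sort(np.linalg.eigvalsh(A_U))[::-1]
    return q_hat, re_D / (1 - q_hat / 2), lam_U

# Toy directed graph (hypothetical): K4 minus the edge (0, 3), with the edge
# between 0 and 1 kept only in the direction 0 -> 1.
A_D = np.array([[0, 1, 1, 0],
                [0, 0, 1, 1],
                [1, 1, 0, 1],
                [0, 1, 1, 0]])
q_hat, lam_est, lam_U = undirectization_estimate(A_D)
```

For this toy graph one of five edges is directed, so `q_hat` is 0.2, and the rescaled largest eigenvalue can be compared directly with `lam_U[0]`.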
Apart from the accuracy of the perturbation theory, we can assess, using Eq. (27), to what degree a directed graph can be regarded as a uniformly randomly directized one. Figure 8 compares the directed in-degree distribution of the original (directed) macaque-cortex network with the theoretical prediction [Eq. (27)] for a uniformly randomly directed graph. We assess the null hypothesis that the edges are directized uniformly randomly by the $\chi^2$ goodness-of-fit test for the distributions in the range $k^{(\mathrm{in})} \le 11$. As a result, we find that the p-value of the empirical distribution is less than 0.01, which implies that the macaque-cortex network may not be regarded as a uniformly randomly directed graph, despite the accuracy of the perturbative analysis.
Note that the uniformity in random directization is not a necessary condition for the conservation of the relative spectral structure. Thus, the $\chi^2$ goodness-of-fit test of the in-degree distribution itself is not a criterion for the validity of our perturbative analysis. Nonetheless, this test partly explains why our analysis is plausible when the null hypothesis is not rejected.

IV. SUMMARY AND DISCUSSIONS
In this study, we investigated the contribution of directedness in graph spectra. We introduced random directization as the inverse operation of ignoring the edge directions. We revealed that, in the perturbative regime, uniformly random directization does not destroy the relative spectral structure along the real axis of the undirected graphs. Additionally, we observed that the relative spectral structure along the real axis was also conserved for real-world datasets. Although the effects of directization and undirectization on the graph spectra are generally not symmetric, we showed that the results of random directization can be used to explain the behavior of undirectization.
Several comments are in order. In Sec. II, we analyzed up to the first-order term in the perturbative expansion. However, it is not obvious whether the contributions from higher-order terms are negligible. In our formulation of perturbative random directization in Eq. (3), q is not a perturbative expansion parameter, but simply defines the fraction of non-zero elements. Instead, we carry out the perturbative expansion with respect to the matrix V , which does not consist of infinitesimally small elements.
Here, let us consider the contribution from higher-order terms, especially the second-order term. The second-order term in the perturbative expansion of the $i$th eigenvalue $\lambda_i$ along the real axis [34,35] is given by
$$\delta\lambda_i^{(2)} = \sum_{j \ne i} \frac{\bigl( v_i^{\top} V v_j \bigr) \bigl( v_j^{\top} V v_i \bigr)}{\lambda_i - \lambda_j}. \tag{29}$$
We numerically calculate the average of this second-order contribution $\delta\lambda_i^{(2)}$ over randomly directized graphs and show the result in Fig. 9 as solid lines. We also show the real part of the deviation from the first-order approximation, $\delta\lambda_i - \delta\lambda_i^{(1)}$, where $\delta\lambda_i^{(1)}$ denotes the first-order term, in Fig. 9 as points. We observe that the second-order contribution is small when $q$ is sufficiently small, and the first-order approximation is indeed valid in that region. In addition, the average of the second-order term is not proportional to the original eigenvalues, whereas that of the first-order term is. Therefore, the spectral structure can be much more complicated when higher-order contributions are dominant. In Appendix B, we analytically calculate the second-order perturbation expansion up to the first order in $q$. As an exceptional case, we can analytically obtain the random average of the variation for random graphs when $q = 1$ using the cavity method. We show the result for the stochastic block model in Appendix C.
Second, we can easily show that the conservation property of the relative spectral structure cannot be generalized to arbitrary directizations. For example, let us consider non-uniform directization on a graph with a module structure with two blocks. When a specific edge $e_{ij}$ is altered to be directed as $e_{i \to j}$, the first-order variation of each eigenvalue is expressed as
$$\delta\lambda_k = -v_{ki} v_{kj}. \tag{30}$$
The largest eigenvalue $\lambda_1$ varies negatively, regardless of the choice of the edge $e_{ij}$, because $v_{1i}$ and $v_{1j}$ always have the same sign thanks to the Perron-Frobenius theorem.
On the other hand, the variation of the second-largest eigenvalue λ 2 , which is related to the module structure, depends on which edge is altered. When the edge e ij functions as a bridge between the two blocks, δλ 2 is expected to be positive because the eigenvector elements v 2i and v 2j typically have different signs. In contrast, when both ends of the edge e ij are located inside a common block, δλ 2 is expected to be negative because v 2i and v 2j typically have the same sign. Thus, the relative spectral structure is not conserved when the random directization is not uniformly random.
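The sign argument for a single directized edge can be illustrated on a small graph with two modules. The sketch below builds two 4-cliques joined by one bridge edge, a toy construction assumed here and not taken from the paper, and evaluates the first-order shift $-v_{ki} v_{kj}$ for the two largest eigenvalues. The helper names `two_cliques_with_bridge` and `first_order_shift` are likewise hypothetical.

```python
import numpy as np

def two_cliques_with_bridge(k):
    """Adjacency matrix of two k-cliques joined by one edge between vertices k - 1 and k."""
    A = np.zeros((2 * k, 2 * k))
    A[:k, :k] = 1
    A[k:, k:] = 1
    np.fill_diagonal(A, 0)
    A[k - 1, k] = A[k, k - 1] = 1
    return A

def first_order_shift(v, i, j):
    """First-order eigenvalue shift when the undirected edge (i, j) becomes i -> j."""
    return -v[i] * v[j]

A0 = two_cliques_with_bridge(4)
lam, vec = np.linalg.eigh(A0)          # eigenvalues in ascending order
v1, v2 = vec[:, -1], vec[:, -2]        # eigenvectors of the two largest eigenvalues

bridge = (3, 4)                        # edge connecting the two blocks
inside = (0, 1)                        # edge within the first block
shift_1_bridge = first_order_shift(v1, *bridge)   # negative
shift_2_bridge = first_order_shift(v2, *bridge)   # positive
shift_2_inside = first_order_shift(v2, *inside)   # negative
```

In this example the second eigenvector is antisymmetric between the two cliques, so directizing the bridge raises the second-largest eigenvalue to first order while directizing an intra-block edge lowers it, whereas the largest eigenvalue decreases in both cases.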
In this study, we evaluate the spectral variation only along the real axis. In the perturbative regime, all eigen-gaps along the real axis remain finite; thus, complex conjugate pairs never appear. As we further asymmetrize the adjacency matrix, some of the neighboring eigenvalues may collide at an exceptional point [35] and turn into a pair of complex conjugate eigenvalues. At the exceptional point, the corresponding eigenvectors become parallel to each other, and hence the matrix becomes undiagonalizable. The perturbation theory would no longer be valid because the matrices are assumed to be diagonalizable.
The random directization presented in this paper is applicable to the resampling of graphs. For many complex systems, only a single network dataset is available at a time, which makes it difficult to statistically analyze its properties, including the spectrum. To address this problem, several resampling methods have been proposed to generate similar undirected graphs from the original graph [38,39]. The present study implies that we can utilize random directization for the purpose of resampling directed graphs with the same configuration of edges, whose relative spectral structure is typically close to that of the original undirected graph.

Appendix A: List of symbols
Here, we show the list of notations used in the main article in Fig. 10.

Appendix B: Second-order perturbation
Here, we conduct an analytical calculation of the random average of the second-order terms of the eigenvalues and eigenvector elements up to the first order in $q$. Note that, from the numerical calculation in Fig. 9, the random average of the second-order terms is not linear in $q$, and thus this form of first-order evaluation is only valid when $q$ is sufficiently small.

Directization
The second-order terms in the perturbation theory of the $i$th eigenvalue $\lambda_i$ and the corresponding eigenvector $v_i$ along the real axis [34,35] are, respectively, given by Similarly to the case of the first-order approximation, we respectively obtain the average of the second-order term up to the first order in $q$ for the $i$th eigenvalue and the $\ell$th element of the $i$th eigenvector $v_i$ under uniformly random directization in the forms The detailed calculations are shown in Appendix D.
We find that the second-order variation of an eigenvalue is typically not proportional to the original one, unlike Eq. (10), and that the second-order variation of the eigenvector elements does not vanish, unlike Eq. (21). Thus, the relative spectral structure is not conserved under uniformly random directization when the contribution of the second-order term is sufficiently large. Figure 11 numerically compares the second-order terms, Eqs. (B3) and (B4), with the first-order terms, Eqs. (10) and (21), for the undirectized macaque-cortex network. We observe that the second-order term is much smaller than the first-order term for some of the top eigenvalues. On the other hand, in the region in which the eigenvalues gather densely around zero, the second-order term is comparable to the first-order term, and our first-order approximation is invalidated.

Appendix C: Spectrum out of the perturbative regime
We here investigate the behavior of eigenvalues out of the perturbative regime using synthetic graphs generated from the stochastic block model (SBM) [33,40], which is a random graph model with a preassigned module structure. We particularly consider the SBM with two equally sized blocks, namely the symmetric SBM; we let $B_1$ and $B_2$ denote the vertex sets of the two blocks, with $|B_1| = |B_2| = N/2$. For each pair of vertices, an edge is generated independently and randomly with probability $p_{\mathrm{in}} = 2c_{\mathrm{in}}/N$ if the vertices belong to the same block. Otherwise, they are connected with probability $p_{\mathrm{out}} = 2c_{\mathrm{out}}/N$. The average degree is given by $c = c_{\mathrm{in}} + c_{\mathrm{out}}$. Figure 12 shows the variation $\delta\lambda$ of the top five eigenvalues for graphs generated from the SBM. Similarly to the real-world networks in Fig. 7, it is confirmed that the perturbation theory is valid when $q$ is sufficiently small, and the differences between the estimates and the numerical results increase as $q$ increases.
We can analytically estimate the variation of isolated eigenvalues when the graph is fully directized, i.e., $q = 1$, by using the cavity method [26,41] for the symmetric SBM. After full directization, the average number of in-neighbors in the same block and that in the different block are given by $c_I = c_{\mathrm{in}}/2$ and $c_O = c_{\mathrm{out}}/2$, respectively, while the average number of out-neighbors in the same block and that in the different block are also given by $c_I = c_{\mathrm{in}}/2$ and $c_O = c_{\mathrm{out}}/2$, respectively. From the cavity method, the spectral band edge of the adjacency matrix spectrum in the
which we transform as Assuming that $q$ is small, we find Thus, we obtain the average of the second-order term in the perturbation theory as which we used in Eq. (B3).

Eigenvectors
From Eq. (B2), we define the three generating functions for the second-order term of the ith eigenvector v i as Then, the random average of the second-order term is given by We find the first generating function in the form For small q, we have The second generating function gives .
Finally, the third generating function takes the form which we used in Eq. (B4).