Community detection for correlation matrices

A challenging problem in the study of complex systems is that of resolving, without prior information, the emergent, mesoscopic organization determined by groups of units whose dynamical activity is more strongly correlated internally than with the rest of the system. The existing techniques to filter correlations are not explicitly oriented towards identifying such modules and can suffer from an unavoidable information loss. A promising alternative is that of employing community detection techniques developed in network theory. Unfortunately, this approach has focused predominantly on replacing network data with correlation matrices, a procedure that tends to be intrinsically biased due to its inconsistency with the null hypotheses underlying the existing algorithms. Here we introduce, via a consistent redefinition of null models based on random matrix theory, the appropriate correlation-based counterparts of the most popular community detection techniques. Our methods can filter out both unit-specific noise and system-wide dependencies, and the resulting communities are internally correlated and mutually anti-correlated. We also implement multiresolution and multifrequency approaches revealing hierarchically nested sub-communities with `hard' cores and `soft' peripheries. We apply our techniques to several financial time series and identify mesoscopic groups of stocks which are irreducible to a standard, sectorial taxonomy, detect `soft stocks' that alternate between communities, and discuss implications for portfolio optimization and risk management.


I. INTRODUCTION
Over the past couple of decades the amount of raw data available has started to grow at an exponential rate, doubling approximately every 12 months [1], while the amount of data being consumed by users remains linear [2]. The so-called 'Big Data' phenomenon imposes an urgent need to develop, possibly with the aid of high-speed computing and cheap data storage, efficient pattern detection methods and data mining techniques aimed at identifying a few but highly relevant pieces of information in an ever-increasing noisy or irrelevant background. One of the most important and widespread examples of the Big Data phenomenon is time series data, as witnessed by the impressive growth of databases of electronic and mobile-device communication patterns in large social systems, financial returns in stock markets, physiological signals such as heartbeat and brain dynamics, gene expression profiles, and finally climate, weather and earthquake activity. In all these examples, high-dimensional (multiple) time series originate from the dynamical activity of the constituent units (such as stocks, people, neurons, genes, etc.) of large systems with complicated internal interactions. For this reason, 'Big time series Data' offer an unprecedented empirical resource for the science of complex systems.
Multiple time series are in fact the key ingredient required in order to face one of the main challenges for our modern understanding of real-world complex systems: the identification of an emergent, mesoscopic level of dynamical organization which is intermediate between the microscopic dynamics of indivual units (e.g. neurons) and the macroscopic dynamics of the system as a whole (e.g. the brain). Many complex systems are indeed organized in a modular way, with functionally related units being correlated with each other, while at the same time being relatively less (or even negatively) correlated with functionally dissimilar ones. While the existence of such a modular organization is intuitively plausible, its empirical identification is still an open problem, complicated by the fact that modules are typically emergent, in the sense that they are not evident a priori from a local inspection of static, or even dynamic, similarities or connections among individual units. In neuroscience, for instance, 'functional brain networks' are precisely defined by the correlated dynamical activity of neurons, as opposed to 'structural brain networks' which are instead defined by static neuronal connections [3]. Remarkably, it has been proposed that the observed divergence between functional and structural brain networks represents a signature of the brain's many-to-one (degenerate) functionstructure relationships which allow diverse functions to arise from a static neuronal anatomy [3]. Similarly, in the analysis of financial markets it has been observed that groups of correlated stocks evolve in time and only partially overlap with industrial sectors, implying that the (static) industrial classification fails to capture the dynamical modularity of real markets [4][5][6][7][8].
The approaches proposed so far to infer some form of modular or hierarchical organization from multiple time series are based on (necessarily arbitrary) criteria used to filter information [4,5,9]. As we discuss in more detail below, these filtering criteria are either the introduction of thresholds or a geometric embedding in some metric space with pre-defined properties. Our aim in the present paper is that of going beyond the limitations imposed by these arbitrary criteria. We propose that, both conceptually and algorithmically, the identification of mesocopic modules whose dynamical activity is more correlated internally than with that of other modules, requires iterated recursions into many attempted partitions of the system, an inherently non-local operation. By their nature, threshold-based or geometric methods are unfortunately not suited to deal with this sort of iterative partitioning problem 1 .
Our strategy towards a solution is the adaptation of a different class of rapidly developing techniques, specifically those aimed at identifying the static mesoscopic organization in complex networks, a problem known as community detection [10,11]. Communities within networks are groups of nodes that are more densely connected to each other than would be expected under a suitable null hypothesis. Additionally, the nodes within a community are less connected to the nodes within other communities of the same network. Several methods have been proposed over the last decade in order to empirically detect communities within networks. Different techniques have explored different ways to optimize the search over all possible partitions of the system. Conceptually, these methods contain precisely the ingredients that we need in order to solve our problem of identifying the hidden mesoscopic organization encoded within multiple time series. Adapting the existing community detection techniques to deal with time series data is the main goal of this paper.
While the idea of using community detection algorithms in order to analyse time series data has been already exploited a few times in the past [14][15][16], the attempts made so far have basically replaced network data with cross-correlation matrices. Here we show that this procedure suffers from the limitation that the underlying null hypotheses used in network-based community detection algorithms are inconsistent with the properties of correlation matrices. We illustrate that one of the undesired consequences is a systematic bias in the search over partitions, that becomes stronger as the heterogeneity of the size of the 'true' communities increases.
Here we propose a solution to this problem by introducing appropriate redefinitions of the so-called modularity [10], the core quantity that most methods aim at maximizing when searching the space of possible partitions. While in ordinary community detection methods the modularity is defined in terms of a null model that is (approximately) correct for networks, in the methods we propose the modularity is defined in terms of different null models that are appropriate for time series data and therefore dictated by random matrix theory (RMT) 1 One might think of community detection itself as a geometric method, but this is not what we mean here. The geometric methods we are referring to consist in embedding techniques that reduce the complexity of the original system by projecting the latter into some metric space of low dimensionality. [7,17,18]. We also adapt three popular algorithms that have been proposed to find the optimal partition (in networks), i.e. the one that maximizes the modularity. As a result, we end up with three community detection algorithms that are consistent with time series data and represent the counterparts of the most popular techniques used in network analysis. We also provide extensions to resolve hierarchically nested subcommunities (multiresolution community detection) and 'hard' cores versus 'soft' peripheries inside communities (multifrequency and time dependent community detection). After introducing our theoretical framework, we put special emphasis on financial applications, where the units of the system are assets and the corresponding time series are sequences of logarithmic price increments [4,5,9]. Even though advanced techniques to analyse correlations have been developed in other fields as well, financial time series analysis is one of the most active domains in this respect (another important example is that of functional brain networks, as we have already mentioned). We show that our methods allow us to efficiently probe the mesoscopic structure of different financial markets and ascertain communities of corporations, based on the time series of their daily stock returns. We uncover a variety of correlations between stocks of different industry sectors, not intuitively obvious from the sectorial taxonomy alone, thus confirming in a more rigorous manner the aforementioned result that market correlations only partially overlap with industry classifications. More importantly, the communities we detect after removing noisy and market-wide dependencies turn out to be internally correlated and mutually anti-correlated, a feature of particular relevance for portfolio optimization and risk management. We also analyse the stability of communities over different frequency resolutions and time horizons, thereby identifying groups of 'hard stocks' that reside stably in the core of communities and groups of 'soft stocks' that alternate between communities.
The rest of the paper is organized as follows: in section II we briefly describe the most important approaches that have been proposed in order to filter correlation matrices and highlight their issues with characterizing the modular properties of systems described by multiple time series. In section III we show that the existing community detection algorithms are based on a null hypothesis that is inconsistent for time series data, making these methods inadequate as well. In sec. IV we then introduce alternative and appropriate null models based on RMT, and exploit them in order to redefine three of the most popular community detection algorithms, in a way that makes them consistent with time series data. In sec. V we apply our methods to several time series of daily stock returns, from various financial markets around the globe. In sec. VI we analyse the dependence of community structure on the temporal resolution (i.e. the frequency) of the original time series. In sec. VII we investigate the evolution of community structure over time. Finally, in sec. VIII we summarize our results and provide some conclusions.

II. EXISTING APPROACHES
We start by introducing some useful notation. Let us consider a system with N units. The single time series X i ≡ {x i (1), x i (2), . . . , x i (T )} (1) represents the temporally ordered activity of the i-th unit of the system over T timesteps. In the case of financial markets, i is typically one particular stock and x i (t) is the 'log-return' of stock i, i.e. the difference between the logarithms of the price of i at times t and t − 1 (more details will be given later). The whole set of N time series, denoted by {X 1 , X 2 , . . . , X N }, describes the synchronous activity of all the units of the system. The vast majority of the available techniques aimed at quantifying the level of mutual dependency within such a set of multiple time series exploit the information encoded in the N ×N crosscorrelation matrix. The cross-correlation matrix C measures the mutual dependencies among N time series on a scale between −1 and 1. The ij th entry of C is defined as the Pearson correlation coefficient where is the covariance of X i and X j and is the variance of X i . In the above equations, the bar denotes a temporal average, i.e.
Clearly, the diagonal entries of the correlation matrix are C ii = 1. We will assume, as routinely done in order to filter out the intrinsic heterogeneity of time series, that each series X i has been standardized by subtracting out the temporal average X i and dividing the result by the standard deviation σ i , i.e. that X i has been redefined to (X i − X i )/σ i . Then the following expressions hold: Var Note that, despite in statistics the notation Corr[X i , X j ], Cov[X i , X j ] or Var[X i ] usually denotes a population value, i.e. a theoretical value calculated using the knowledge of the (joint) probability distributions for X i and X j , all quantities we have defined so far are instead sample quantities, i.e. measured on the specific realized values of a set of time series. Our choice of a somewhat unconventional notation is merely due to the fact that it allow us to describe various operations more compactly. We will need to denote the population value of a quantity only in a few cases, and when this happens such population value will coincide with the expected value f (X, Y, . . . ) (over the joint probability distribution of the random variables X, Y, . . . involved) of the corresponding sample quantity f (X, Y, . . . ). We will therefore directly express population quantities in terms of expected values when necessary.
We stress that empirical cross-correlation matrices are intrinsically limited by the fact that they assume temporally stationary and linearly interdependent time series. Clearly, both assumptions are in general violated in real financial markets and many other complex systems. Nonetheless, cross-correlations are still the most widely used quantity. Improving the definition of correlations is a very important open problem, but is beyond the scope of this paper. Here, we want to overcome the limitations encountered when the methods introduced so far to process or filter correlation matrices are used in order to identify a mesoscopic modular structure. These current limitations are in place even when correlations are an appropriate measure, i.e. for stationary and linearly interdependent time series. Therefore, our goal is that of introducing a consistent methodology that makes optimal use of correlation matrices in order to resolve the mesoscopic organization of complex systems. If improved measures of interdependency are introduced, our approach will still represent a valuable guideline in order to implement a consistent community detection framework in that case as well.
In what follows, we review the most important correlation-based approaches and their limitations. We will put special emphasis on financial time series, even if our discussion is more general.

A. Asset Graphs
Among the proposed approaches to filter crosscorrelation matrices, the simplest one is perhaps that of focusing on the strongest (off-diagonal) correlations by introducing a threshold value and discarding all the correlations below the threshold. The result can be represented as a network, also known as an Asset Graph (AG) in the Econophysics literature [4,19,20], connecting the nodes whose time series are more strongly correlated. Since the method entirely depends on the choice of the threshold, one usually investigates how the properties of the AG change as the threshold is varied. The method is quite robust to noise, precisely because it discards the weakest correlations that are more subject to random fluctuations. However, for the same reason it fails in detecting a mesoscopic organization (if present) of the system. In fact, the use of a global threshold prevents the identification of modules whose internal correlations, even if below the threshold because they are weak with respect to the strongest ones, are still significantly stronger than the external correlations with different modules. Therefore, while valuable as a filtering technique, the AG discards a significant amount of information and is not best suited to detect emergent groups of correlated time series. We provide additional information about AGs, along with an explicit example, when we analyze real financial data in sec.V B.

B. Minimal Spanning Trees
Another filtering approach looks for the Minimal Spanning Tree (MST) obtained again from the strongest correlations, but now retaining only the N − 1 correlations that are required for each node to be reachable from any other node via a connected path, while discarding those that produce loops [9]. This procedure automatically produces an agglomerative hierarchical clustering (a dendrogram) of the original time series and requires that the correlation matrix is 'renormalized' at each iteration of the clustering according to some protocol (the one having some distinct theoretical advantage is the so-called Single-Linkage clustering algorithm [9]), until a final filtered matrix is obtained.
The MST method does not require the introduction of an arbitrary threshold, but it assumes that the original correlations are well approximated by the filtered ones. At a geometrical level, this corresponds to the assumption that the metric space in which the original time series are embedded (via the definition of a proper correlation-based distance) effectively reduces to a so-called ultrametric space where well-separated clusters of points are hierarchically nested within larger wellseparated clusters [21].
Even if the method exploits the correlations required for the MST to span the entire set of time series, it discards all the weaker correlations. Moreover, the approximating (renormalized) correlations are progressively more distant from the original ones as higher and higher levels of the taxonomic tree are resolved. This means that the method is more reliable when using the strongest correlations to determine the low-level structure of the taxonomic tree (small clusters of time series), while it is progressively less reliable when using the weaker correlations to determine the high-level taxonomy (medium-sized and large clusters).
With the above warning in mind, the method allows one to identify correlated groups of stocks lying on separate 'branches' of the MST or that become disconnected when the associated dendrogram is cut at some level. However, this comes at the price of introducing an arbitrary threshold on the value of the correlation again. Moreover, just like the AG technique, the MST one does not compare internal and cross-group correlations (possibly with the aid of a null model) in order to identify emergent mesoscopic modules.

C. Planar Maximally Filtered Graphs
An alternative approach, which is similar in spirit to the MST but discards less information, is the so-called Planar Maximally Filtered Graph (PMFG) [22,23]. This method allows one to retain not just the correlations required to form the MST, but also a number of additional ones, provided that the resulting structure is a planar graph (a network that can be drawn on a plane without creating intersecting links).
A nice feature of the PMFG is that it always contains the entire MST, so that the former provides additional, and not just different, information with respect to the latter. However, also this method is affected by some degree of arbitrariness, which lies again in the properties of the postulated, approximating structure. There is no obvious reason why stocks (or other time series) should find a natural embedding in a bidimensional plane. In fact, the PMFG has also been described as the simplest case of a more general procedure based on the embedding of high-dimensional data in lower-dimensional manifolds with a controllable genus (number of 'handles' or 'holes') [23]. The PMFG corresponds to the case when the genus is zero. So the arbitrariness of the method can be rephrased as its dependence on some value of the genus that must be fixed a priori.
The method has been extended in a variety of ways in order to produce a nested hierarchy of time series by exploiting the properties of the embedding space [24][25][26]. However, as with the MST, the target of these methods is that of finding the postulated approximating structure, rather than optimizing the search of groups of time series that are more correlated internally than with each other.

D. Random matrix theory
We finally mention an important technique, based on random matrix theory (RMT) [7,17,18], which is widely used in order to identify the non-random properties of empirical correlation matrices. We will use this technique extensively in the paper. A correlation matrix constructed from N completely random time series of duration T has (in the limits N → +∞ and T → +∞ with 1 < T /N < +∞) a very specific distribution of its eigenvalues, known as the Marcenko-Pastur or Sengupta-Mitra distribution [27,28]. This distribution reads 11) and ρ(λ) = 0 otherwise, where the maximum (λ + ) and minimum (λ − ) eigenvalues are given by The bulk of the eigenvalues of an empirical correlation matrix that fall within the range [λ − , λ + ] can be considered to be mostly due to random noise. Thus, any eigenvalues larger than the maximum eigenvalue λ + predicted by the Marcenko-Pastur distribution are deemed to represent meaningful structure in the data [6][7][8]. That being the case, any empirical correlation matrix C can be decomposed as the sum of two matrices: where (using bra| and |ket notation) is the 'random' component constituted from the eigenvalues {λ i } less than or equal to λ + (usually, the eigenvalues smaller than λ − are included as well) and their corresponding eigenvectors {|v i }, and C (s) = C − C (r) is the 'structured' component constituted from the remaining eigenvalues corresponding to eigenvalues larger than λ + . The deviation of the spectra of real correlation matrices from the RMT prediction provides an effective way to filter out noise from empirical data, and also illustrates some robust property of financial markets. For instance, in Fig. 1 we superimpose the eigenvalue density of the empirical correlation matrix obtained from T = 2500 logreturns of daily closing prices of N = 445 stocks of the S&P 500 index (from 2001 to 2011) and the corresponding expectation given by the Marcenko-Pastur distribution with the same values of N and T . As also observed in a multitude of previous studies [4,5], a typical feature of the spectrum of empirical correlation matrices is that the largest observed eigenvalue λ m is much larger than all other eigenvalues (see inset of fig. 1). The corresponding eigenvector |v m has all positive signs and one can therefore identify this eigencomponent of the correlations as the so-called market mode [4,5], i.e. a common factor influencing all stocks within a given market. Interpreting this, the bulk of the correlation between pairs of stocks is attributed to a single common factor, much as all boats in a harbor will rise and fall with the tide.
In order to clearly see which 'boats' are rising and falling relative to one another, one must subtract out the common 'tide', which in terms of the correlation matrix leads to the further decomposition where we have rewritten the structured component as (representing the 'market' mode) and (representing the remaining correlations). The correlations embodied by C (g) act neither at the level of individual stocks (uncorrelated noise), nor at that of the entire market. Such correlations act at the level of sub-groups of stocks within a market, and they are often referred to as the 'group' mode [4,28]. The eigenvectors contributing to C (g) have alternating signs, and this allows the identification of groups of stocks that are influenced in a similar manner by one or more common factors [6][7][8]. Broadly speaking, these groups are expected to reflect some sectorial or sub-sectorial classification of stocks according to their industrial category, however the overlap between nominal asset classes and groups of empirically correlated stocks is only partial [6][7][8].
We should at this point stress that the above discussion makes some strong assumptions, which have been recently criticized. In particular, the interpretation of the largest eigenvalue in terms of a 'market' mode and the assumption that the elimination of the market and 'noise' modes does not alter the information present in the re-maining subspace are not correct in general, and sometimes only approximate [29,30]. Moreover, the eigencomponents of the correlation matrix, and consequently the filtered correlation matrix itself, can end up being not proper correlation matrices, and alternative constructions enforcing the required properties have been proposed [31][32][33][34]. Finally, the way to filter out the 'market' and 'noise' modes is not unique [35].
Bearing these limitations is mind, RMT is still to be considered a valuable tool to filter empirical correlation matrices and clean them from both stock-level (random) and market-wide fluctuations. However, after this preprocessing, filtered correlation data still needs to be analyzed according to the particular research question. For instance, the matrix C (g) is often processed further and used as an alternative, filtered input in all the algorithms (AG, MST and PMGF) described above. So RMT alone is not enough in order to resolve the mesoscopic organization of markets, in the sense defined above.

III. COMMUNITY DETECTION IN GRAPHS AND ITS INCONSISTENCY WITH CORRELATION MATRICES
In the previous section, we clarified that many of the available techniques used to identify the most relevant correlations are not designed to isolate groups of time series whose dynamical activity is more correlated internally than with that of other groups. At an abstract level, achieving this task would require iterated recursions into many attempted partitions of the system, an inherently non-local and computationally demanding operation. Notably, an entire branch of Network Science is devoted to an analogous problem, known as community detection [10]. In this section, we briefly illustrate the principles of community detection in networks and show how that knowledge can be in principle transferred to our initial problem, namely the identification of a mesoscopic organization across multiple time series. We also show that despite the many progressive inroads made in this direction so far, they often rely on an inherently biased approach.

A. Community detection in networks
In network analysis, community detection is the process of identifying relatively dense clusters of nodes. There has been a flurry of research in the area of community detection over the last decade [10]. In this paper we focus on the method of modularity optimization [36], which is one of the most popular methods identifying non-overlapping communities. It should be noted that various alternative methods other than modularity optimization exist, including techniques that resolve overlapping communities [10,37]. However, this method has the advantage of being based on a null model, acting as a community-free benchmark to which the real network is compared. It is the appropriate modification of such benchmarks that will lead us, in sec.IV, to a redefinition of modularity optimization methods valid for correlation matrices.
We restrict ourselves to undirected networks, since they exhibit the same symmetry property as correlation matrices. Given a network with N nodes, one can introduce a number of partitions of the N nodes into nonoverlapping sets. Each such partition can be mathematically represented by an N -dimensional vector σ where the i-th component σ i denotes the set in which node i is placed by that particular partition. Then, one can introduce the so-called modularity Q( σ) as a measure of the effectiveness of a particular partition σ in identifying densely connected groups of nodes. The process of modularity optimization seeks to find the optimal partition that maximizes the value of Q( σ), by varying the communities to which the different nodes of the network belong. The modularity Q( σ) is expressed in the form where, here and throughout the paper, the sum is intended to run over all pairs of nodes even if we are considering undirected networks, and we are also including the diagonal elements corresponding to i = j since many expressions become simpler with this choice. The meaning of the different terms of the above expression is as follows. The delta function is δ(σ i , σ j ) = 1 if σ i = σ j and δ(σ i , σ j ) = 0 if σ i = σ j , ensuring that only nodes within the same community contribute to the sum. For binary networks, A ij is the entry of the adjacency matrix representing the presence (A ij = 1) or absence (A ij = 0) of a link between nodes i and j in the observed network. The initial pre-factor works to normalize the value of Q( σ) between −1 and 1, where A tot ≡ i,j A ij = 2m is twice the total number m of links. The term A ij is a key element determining the outcome of the entire community detection process. It mathematically represents a null model for the network, i.e. an expectation for A ij under some suitable null hypothesis. The most popular null model for a binary network, known as the Configuration Model, is one where the expected value k i of the degree k i (number of links) of each node i is equal to the value N j=1 A ij observed in the real network and where the topology is otherwise completely random. This null hypothesis ensures that the local heterogeneity of nodes, e.g. the fact that more popular people naturally have more friends in social networks, is appropriately controlled for. Mathematically, this model is approximately (i.e. only when the heterogeneity of the degrees is weak [38,39]) represented by the expression which gives a rough estimate of the probability that nodes i and j are connected, under the null hypothesis that the observed network's structure is completely explained on the basis of the different degrees of vertices. For weighted networks, A ij denotes the weight of the link between nodes i and j, k i is called the strength of node i and 2m is twice the total weight (of all links in the network). Still, eq.(19) is used without modifications [10] to determine the (again approximate [39,40]) expected edge weight under the null hypothesis that the network's structure is completely explained on the basis of the observed strengths of all vertices.
The accuracy and usefulness of the results obtained from the process of modularity optimization depend heavily on the choice and suitability of the null model. When the null hypothesis is true, no higher-order patterns (including communities) are present. Consistently, one expects the modularity in eq.(18) to be close to zero for every partition. In maximizing the modularity for a network which does have community structure, the nodes that are more tightly connected than one would expect on the basis of their individual characteristics will be clustered together in the same community, while the nodes for which the opposite occurs will be placed in different communities.
It should be noted that, in the context of network analysis, the modularity function defined in eq.(18) suffers from a main drawback: it cannot resolve communities below a typical scale [41]. This resolution limit was proven to be rooted in the specific mathematical form of eq. (19) used to represent the null model. However, it was not proven to be due to the concept of the null model itself, i.e. to the choice of comparing the real network with an ensemble of graphs with given degrees (or strenghts). In particular, we stress again that eq. (19) only approximately represents such an ensemble, the exact formula being a more complicated nonlinear equation [38][39][40]. Whether the resolution limit disappears if the exact expression is used in place of eq.(19) has never been investigated. Rather, it has been proposed [42] that a way to change the resolution of the community detection is the introduction of an extra resolution parameter φ > 0 in the null model, i.e. replacing eq. (19) with Many studies have indeed shown that, as φ is varied, different hierarchical levels of the community structure can be revealed, so that a so-called multiresolution method can be obtained [42][43][44]. In general, multiresolution methods can resolve smaller subcommunities, which are nested inside larger communities. One should however bear in mind that the resolution parameter was originally introduced in an ad hoc fashion and without a theoretical foundation, its main justification being an agreement a posteriori with the hierarchical community structure expected in some real-world networks. Only later, it was shown to have some physical interpretation in terms of an inverse time required to explore the network under certain assumptions [44]. When extending modularity-based algorithms to the analysis of multiple time series, we will address the problem of multiresolution community detection in a fundamentally different way, which avoids ad hoc parameters and is theoretically consistent with the properties of correlation matrices (see sec. IV C).
B. The inconsistency of modularity for cross-correlation matrices The appealing properties of community detection in networks clearly have the potential to solve our initial problem of finding groups of time series that are more correlated than we would expect. However, one should be very careful in identifying the correlation-based problem with the network-based one. A naïve approach would be that of treating the empirical correlation matrix C as a weighted network, and looking for communities using the modularity as defined in eq.(18), i.e. setting A ij = C ij . This would result in a modularity of the form where C norm = i,j C ij , C ij = k i k j /C norm and k i = N j=1 C ij . This idea has been recently exploited, sometimes with modifications, to study communities of interest rates [15] and stocks [14,16] in financial markets.
Unfortunately, although the above approach has made a lot of headway, it suffers from some fundamental flaws and can lead to biased results, as we now show. The problem arises because the null model defined in eq. (19), while (approximately [38][39][40]) valid when the matrix A describes a network, is inconsistent if A is replaced by a correlation matrix C. To see this, note that if A ij = C ij and if X i denotes a standardized time series i (see sec. II), then eq.(10) implies (22) where X tot = {x tot (1), x tot (2), . . . , x tot (T )} is the time series of the total increment x tot (t) ≡ N j=1 x j (t). Note that, even if all X i 's are standardized, X tot has zero mean but non-unit variance, and is therefore not standardized. Similarly, It then follows that We therefore arrive at an important conclusion: for correlation matrices, the 'naïve' modularity as ordinarily defined in eq.(18) with the ordinary specification given in eq. (19) corresponds to the following null hypothesis: When used within the modularity function, the above null model will not necessarily give more importance to pairs of strongly correlated time series, but rather to pairs of time series whose 'direct' correlation C ij is larger than the product of the correlations of each time series with the 'common signal' X tot . On the other hand, if we want to detect communities of time series that are empirically more correlated than expected under the hypothesis that all time series are independent of each other, we know that the correct null model (at least for infinitely long time series, a hypothesis that we will relax later) is i.e. the expected correlation matrix C should be the N ×N identity matrix I. Other acceptable forms of C ij based on realistic properties of correlation matrices will be discussed later.
The origin of the problematic discrepancy between eq.(25) and eq.(26) is the fact that the null model defined in eq.(19) is meant to represent networks with given degrees, i.e. matrices with given column and row sums. Any matrix that matches this constraint is admissible, in the sense that it represents a possible 2 network consistent with the hypothesis that degrees are an important structural constraint. By contrast, sums over rows or columns of correlation matrices do not represent any meaningful constraint, as evident from eq.(22). Moreover, not every symmetric real matrix with given row and column sums is a possible correlation matrix. Correlation matrices must also be positive-semidefinite, i.e. have non-negative eigenvalues. A little algebra shows that eq.(25) fulfills this property, but in a very extreme way: the eigenvalues of the matrix having elements C ij naive are λ = 0 (with multiplicity N − 1) and (with multiplicity 1). This result holds irrespective of the original data, e.g. also for correlated and finite-length time series. Our discussion of the spectrum of realistic correlation matrices in sec. II D strongly indicates that a sensible null model for correlation matrices should feature an eigenvalue distribution that is not easily reducible to the extremely simple one found above.
Similar conceptual limitations are encountered also in more sophisticated null models, which while allowing for both positive and negative link weights [45], still consider all possible matrices (many of which are inconsistent with correlation matrices) with given sums over rows and columns. More importantly, the above problems cannot be solved by the introduction of resolution parameters. If, in analogy with eq.(20), we consider the generalized null model (with φ > 0), we are still left with an expression that cannot be reduced to eq.(26) or some other meaningful alternatives, which we will introduce later in sec. IV A. For instance, the eigenvalues become φλ, where λ still takes only the two values shown above. Further aspects of this limitation are explicitly illustrated in a benchmark case below, and imply that appropriate multiresolution community detection methods for correlation matrices should be implemented in a completely different way (see sec. IV C).

C. The bias produced by the naïve approach
To have an idea of the consequence of using the naïve approach, i.e. the application of a network-based modularity directly to a cross-correlation matrix, we consider an ideal benchmark case where N infinitely long time series are divided into c 'true' communities, specified by a 'true' partition σ * . We assume that each community A is made of n A standardized time series (with c A=1 n A = N ) that are perfectly correlated with each other and completely uncorrelated to the time series in other communities, i.e.
In such a case, (where n σ * i is the number of time series in the community of the time series i) and From the last two equations it follows that eq.(25), or more generally eq.(28), can be rewritten as (with φ > 0), which is the fundamental result showing the inconsistency of the naïve approach, and the nature of the resulting bias. Equation (32) can never lead to the correct expectation (26) because it cannot produce off-diagonal zeros. If there are c equally sized communities of n = N/c time series each, then C ij naive = φ/c for all i, j, i.e. the distribution of C ij naive has a single peak and zero standard deviation. In this case, apart from the minor 3 problem of non-unit diagonal entries, the use of C ij naive in eq.(18) can still be justified on the basis of the fact that φ/c is a constant term having no effect on the modularity maximization. However, for heterogeneously sized communities, eq.(32) does not lead to a mere overall shift in the modularity. As the size heterogeneity increases, the distribution of the off-diagonal entries of C ij naive will become broader. In general, C ij naive is larger for pairs of time series belonging to larger communities. This effect is shown in Fig. 2 for three choices of benchmark communities. The above consideration implies that the standard deviation (irrespective of the average) of the off-diagonal (i = j) entries of C ij naive can be taken as a quantitative measure of the bias induced by eq.(32). This definition depends linearly on the multiresolution parameter φ. Alternatively, the coefficient of variation (standard deviation divided by average value) of the off-diagonal entries of C ij naive is a measure of the relative bias of the naïve approach, and is independent of φ. One should bear in mind that when the value of the coefficient of variation is much lower than one, the heterogeneity is moderate while when it approaches or exceeds one then the heterogeneity is such that the average value is no longer representative of the distribution.
In Fig. 3 we show both the bias (for φ = 1) and the relative bias as a function of size heterogeneity, the latter being in turn defined as the coefficient of variation of community size. We see that the (relative) bias first steadily increases as the size heterogeneity increases from zero to approximately two, and then decreases when the heterogeneity further increases. This decrease corresponds to entering an extremely heterogeneous regime where there is a giant community of O(N ) nodes, and other very small communities of O(1) nodes. In this regime, the effective number of communities is practically one and the distribution of C ij naive becomes sharp again, as most entries have the same value. So, for a very broad range of heterogeneity (say, when the coefficient of variation of community sizes is between 0.5 and 2.5), the (relative) bias is very strong. In this regime, eq.(32) gives a prediction C ij naive ≈ 0 (close to the correct expectation) only for pairs of time series belonging to the smallest community. For such time series the difference C ij − C ij naive is still close to one, and one therefore expects that the smallest community will be detected correctly. However, for time series belonging to larger communities C ij naive increases, progressively biasing the community detection. For the largest community, the expected internal correlation is always larger than the correlation among any pair of communities, so C ij − C ij naive is very low and this community is paradoxically difficult to detect.
It should be noted that the use of the multiresolution parameter φ does not help reduce the relative bias, as the latter is independent of φ. In order to reduce the absolute bias (which for φ = 1 has values around 0.3 in the relevant regime, see fig. 3) to small values (say of the order of 0.01), φ should be set to very small values (around 0.03), which is another way of saying that the null model in eq.(28) should effectively be replaced by that in eq.(26) (we recall that we are referring only to the off-diagonal entries here), thus confirming our previous discussion.
The above results lead us to conclude that the ordinary definition of modularity, even with the introduction of a multiresolution parameter, cannot properly detect communities. This limitation would systematically bias any modularity-based community detection algorithm. It is therefore clear that ordinary network-based clustering methods, when used with correlation matrices, lead to incorrect results. In the rest of the paper, we try to overcome this limitation.

IV. REDEFINING COMMUNITY DETECTION METHODS FOR MULTIPLE TIME SERIES
We now come to the most pertinent of our results, i.e. the introduction of improved and consistent methods to cluster multiple time series using appropriate null models. In sec.IV A we give three redefinitions of the modularity Q( σ) that make use of the results of RMT, which we summarized in sec.II D. In sec.IV B we introduce the correlation-based counterparts of three of the most popular community detection algorithms used in network analysis. In sec.IV C we discuss how these algorithms can be further extended in order to obtain appropriate, multiresolution community detection methods. Finally, in sec.IV D we benchmark our methods on various test cases.

A. Correlation-based redefinitions of modularity
From our previous discussion it should be clear that simply replacing network data with correlation matrices in eq.(18) leads to eq.(21) where C norm is i,j C ij and the null model C ij is incorrectly given by eq. (25). We now introduce three redefinitions of modularity based on appropriate null models. The end result of this redefinition will be a set of modularity functions that correctly identify communities of correlated time series. For  compactness, we postpone the possible (re)definition of C norm to the end of this discussion, in sec. IV A 4.

Infinite time series without global mode
We have already noted that, for infinitely long time series, the correct expression corresponding to the null hypothesis of independency is given by eq. (26). This leads us to a first redefinition of modularity with expectation C ij 1 ≡ δ ij , i.e. where

Finite time series without global mode
For finite-length independent time series, we should further modify our null model to one which anticipates a certain amount of noise, as determined by RMT (see sec.II D). In such a case, we know that the correct null hypothesis is C ij 2 ≡ C (r) ij where C (r) is given by eq. (14). This gives us a second redefinition of modularity for dealing with noisy correlation matrices: Note that now in general C (s) ii = 0 as a result of the eigendecomposition defined in eq. (13). However, the diagonal terms with i = j give an irrelevant constant contribution to the modularity, due to the fact that δ(σ i , σ i ) = 1 for all i, independently of the particular partition σ. This makes the above definition well defined even in the presence of non-zero diagonal entries.

Finite time series with global mode
Lastly, we consider the case where we expect an overall level of positive correlation among all time series, or 'global mode'. For instance, we have already mentioned that in financial markets the presence of the 'market mode' (see sec. II D) generally results in a positive correlation affecting all pairs of stocks altogether. The corresponding dominant positive component C (m) of C would make Q( σ) be maximized by the (trivial) partition where all time series are in the same community. In order to detect non-trivial communities, we can choose a null model that includes both the random component of the correlation matrix and the global or market mode, i.e.
where C (r) and C (m) are given by eqs. (14) and (16) respectively. This yields our third and final formulation for the modularity: In this case as well, C ii = 0 as a result of the eigendecomposition defined in eq.(15), but this does not affect the outcome of the community detection.
The above definition is now explicitly aimed at detecting mesoscopic communities, which are in between the 'microscopic' level of unit-specific noise and the 'macroscopic' level of system-wide fluctuations. While the existence of the market mode is well established in finance, for other types of time series it might be inappropriate to postulate the existence of a global mode. However, we also expect that, whenever the use of Q 1 ( σ) or Q 2 ( σ) yields only a single community, the most plausible reason is the existence of a global mode. Accordingly, we expect that the use of Q 3 ( σ) might be the most appropriate way to filter out global dependencies for a variety of systems, not only for financial markets. Moreover, as we discuss at length in sec.IV C, iteratively filtering out the global mode from the correlation matrices restricted to individual communities can result in the definition of a useful multiresolution method to resolve multiple hierarchical levels of community structure, if present.

A unified redefinition
For simplicity in what follows, it is useful to express the three definitions of modularity we gave in eqs. (33), (34) and (35) in unified form: where In what follows, given a choice of l we will refer to C (l) as the 'filtered' correlation matrix.
The overall constant C norm has no role in determining the final partition, but it does have a role when different systems, or different snapshots of the same system (including dynamical analyses of community structure), are compared. For simplicity we keep the same definition as in eq.(21), i.e.
This definition implies that the modularity is the sum of intra-community (filtered) correlations, divided by the variance of the total increment X tot . This variance is a natural measure of the volatility of the system over the considered time window, which in the case of financial time series is an important property of the market. In other words, eq. (38) automatically controls for the volatility of the system, a feature that is typically desirable when analysing the evolution of (the community structure of) wildly fluctuating systems. However, in some cases it might be interesting to compare the above modularity with one calculated using a different definition of C norm , e.g. one that does not control for the volatility.
It should be noted that the above definition is such that the typical (for real-world systems like financial markets) values of the modularity defined in eq.(36) will tend to be much lower than the typical (for real-world networks) values of the modularity defined in eq.(18), even for systems with well-defined communities. One should bear this consideration in mind when interpreting the (maximized) modularity value as a measure of the strength of community structure in the system. Unlike its network counterpart, our definition of the modularity does not quantify the strength of community structure in an absolute scale between −1 and +1. It only has a meaning in relative terms, and the more information is contained in the null model, the lower the value of the resulting modularity.
We remind the reader of the fact that, since the results of RMT used in the above definition hold only in the regime where N and T are both large (with T > N ), we require the original time series to respect these conditions. The requirement T > N is sometimes referred to as the 'curse of dimensionality' in the literature, since it implies that, in order to study the cross-correlations of a large set of time series, one needs to extend the time interval so much that the assumption of stationarity (implicit, as we mentioned, in the definition of crosscorrelations themselves) is violated. On the other hand, choosing sufficiently short time intervals to make the time series approximately stationary implies that the number N of time series be severely reduced. One should therefore choose the data in such a way that a reasonable compromise is achieved. This is an ordinary trade-off to be made in the analysis of any empirical (financial) cross-correlation matrix.
We finally stress that the three RMT-based null models we have adopted do not represent the only possible choices. One might for instance exploit more sophisticated results [29][30][31][32][33][34][35] and introduce refined null models that overcome some of the limitations of RMT that we mentioned in sec. II D. These alternative choices can then be incorporated into our approach by redefining C l and consequently C (l) . Exploring the entire space of possibilities is beyond the scope of this paper. The key point we are stressing here is that, whatever the choice of the null model, it must respect some realistic properties of correlation matrices. The network-based definition of modularity, which has been used so far, does not do so and as such is not the best choice. Our approach can therefore be considered as a guideline, in order to introduce improved techniques in the future.

B. Maximizing the new modularity
The discussion so far completes our first task of introducing modularity functions which are consistent with the properties of correlation matrices. Our second task is that of incorporating the above definition(s) into community detection algorithms that seek to maximize the modularity. Below, we start by briefly mentioning the algorithms we adapted in order to search for the optimal partition (more extended descriptions are in the Appendix) and then prove an important property of the optimal partition itself, namely the fact that its communities are internally positively correlated and mutually negatively correlated.

Redefining three community detection algorithms
Given our new definition of modularity in eq.(36), we cannot directly apply the traditional optimization algorithms devised for graphs, since the majority of these algorithms rely in some way or another on the properties of the original network-based definition of modularity, where the degrees of nodes are used to construct the null model. For this reason, we selected three of the most popular network-based community detection algorithms and reformulated them to be compatible with time series data and our new definition of modularity. The three algorithms we selected are known as the Potts (or spin glass) method [42,46], the Louvain method [47] and the spectral method [48]. Note that even if these techniques are customarily referred to as 'methods', they can actually be considered as three different algorithms implementing the same method of modularity maximization. Since the appropriate redefinition of these algorithms can require quite technical discussions, it is described in the Appendix.
We note that there exist many modularity maximization algorithms, some of which may already be much better suited to our definition of modularity. However, we wanted to choose popular algorithms whose original specifications required varying levels of rework, ranging from verification of its suitability to accommodate time series based modularity through to modifications of the underlying tenets of the algorithm itself.
Doing so, allows us the possibility to illustrate further differences between network-based and correlation-based community detection problems. The reader is again referred to the Appendix for a detailed discussion of these differences.

Identifying anti-correlated communities
We now prove the result that the partition maximizing the modularity (whichever method is used to search for it) is characterized by positive intra-community (filtered) correlations and negative inter-community (filtered) correlations.
Let us first define the 'renormalized' inter-community correlations (also see the Appendix) where the notation i ∈ A indicates that the node i belongs to the community A, and the sum is over all such nodes. Now, assume that we have identified the optimal partition maximizing the modularity, and consider the modularity change ∆Q l that would be obtained by further merging two different communities of the optimal partition, say A and B. From eq.(36), we can write this change as The above change cannot be positive, otherwise merging A and B would further increase the modularity, which is impossible since A and B are communities of the optimal partition. Therefore ∆Q l ≤ 0 which also implies C (l) AB ≤ 0. On the other hand, for every community A of the optimal partition we must haveC (l) AA ≥ 0, otherwise A would give a negative contribution to the modularity, which is impossible as the partition where all nodes of A are isolated communities would have higher modularity than the optimal partition. Taken together, these considerations imply that The above result follows simply from the maximization of eq.(36) and will be confirmed empirically in sec.V D. So our algorithms effectively partition the network into mutually anti-correlated communities of positively correlated time series, where it is intended that the term '(anti-)correlated' refers to the residual correlations remaining after applying the filtering procedure defined by eq. (37). For this reason, we will sometimes use the term 'residually (anti-)correlated' when referring to the sign of filtered correlations. As we will discuss in more detail in sec.V D, this property has important consequences for portfolio optimization and risk management.

C. Multiresolution community detection
We now come to the problem of introducing an appropriate multiresolution method. As we mentioned, one way to resolve a hierarchical community structure in ordinary networks using a modularity-based community detection algorithm is that of introducing a resolution parameter φ as in eq. (20). We have already noted, in our discussion of eq.(28), that the same operation would not cluster correlation matrices appropriately if applied to the naïve null model appearing in eq. (25). The same kind of limitation persists if we introduce a resolution parameter multiplying any of the three improved null models C l defining eq.(36) through eq. (37). While the range of any observed correlation coefficient C ij is [−1, +1] by construction, a resolution paramater would unreasonably map the range of the expected correlation C ij to [−φ, +φ]. Similarly, since the null correlation matrices C l we introduced are obtained from the eigencomponents of the observed correlation matrix C, rescaling them by φ is equivalent to an overall rescaling of the corresponding eigenvalues of C, which is again an unjustified operation.
Given the above limitations, which indicate a lack of theoretical foundation for resolution parameters in the case of correlation matrices, we introduce a completely different multiresolution approach that is specifically designed for multiple time series, and has no counterpart in network analysis. After running one of our newly introduced community detection algorithms on the original empirical correlation matrix C, for each community of size s in the optimal partition we consider the corresponding s × s sub-matrix C * of C. For this sub-matrix, we define the three null models C * l as discussed in sec.IV A for the original matrix C. By running our community detection algorithms recursively inside each of the communities, we can thus resolve subcommunities within communities. Iterating this procedure identifies a hierarchical community structure, if present. Within each community, the procedure stops automatically when it resolves no further subcommunities.
At each iteration, the 'noise' component C (r) * will have the same interpretation as when it is identified on the entire correlation matrix, since C * is the sub-matrix of the original matrix C and not of the filtered matrix C (l) defined in eq.(37), so it still contains the node-specific noise component (the reason why we do not consider the sub-matrix of C (l) is because, as we mentioned, the latter may not be a proper correlation matrix [31][32][33][34] and cannot thus be filtered further using RMT). The 'global' mode C (m) * is now interpreted as the 'community' mode, i.e. a common factor influencing all the time series within that particular community. This will now include both the system-wide mode C (m) , restricted to the subspace relative to C * , that would be identified on the entire matrix C (e.g., in the case of financial time series, the market mode) and a genuinely community-specific mode not shared with the time series in other communities. Different communities are therefore possibly characterized by different community modes, and the fact that both this mode and the restriction of the global mode are filtered out is precisely what allows the algorithm to resolve deeper hierarchical modules. Finally, the 'group' component C (g) * represents the effect of subgroups nested within the specific community, if present.
It should be noted that the original correlation matrix C typically has large dimensionality (large N ), a property ensuring that the results of RMT, in particular the expected eigenvalue distribution appearing in eq.(11), hold to a satisfactory level. However, when considering smaller subcommunities, RMT becomes less reliable because eq.(11) no longer holds for small sets of nodes. For this reason, for small submatrices (low-dimensional C * ) it is preferable to determine the eigenvalues λ ± not via eq.(12), but by randomly shuffling the temporal increments of the original time series and constructing the corresponding spectrum as shown in Fig. 1.
We conclude by noting that, for the particular case of multiple time series, there is another 'multiresolution' character, which can be attached to the problem of community detection, namely the fact that different communities can in principle be obtained for different choices of the initial temporal resolution, i.e. for different choices of the frequency of the original time series (e.g. second, minute, or daily returns). Note that this notion of temporal resolution is specific to correlation matrices and has no analogue in the ordinary problem of community detection in networks. It is also not necessarily attached to an idea of hierarchy, in the sense that we do not expect e.g. communities obtained at higher frequency to be necessarily nested within communities obtained at lower frequency (even if this can reasonably happen in some cases). To distinguish this specific notion from the usual one of multiresolution community detection, we will refer to it as the 'multifrequency' problem and address it separately in sec.VI.

D. Benchmarking our methods
Before applying our methods to the analysis of real correlation matrices, we ran a series of tests confirming that we can correctly detect correlated sets of time series in controlled benchmark cases. Our benchmarks consist of heterogeneously sized communities of time series that are internally correlated and additionally display varying levels of noise and global signal (market mode). The reason why we consider heterogeneous community sizes is because this is the more challenging case where we showed the naïve method to display a higher bias (see sec. III C).
We constructed these benchmarks by first choosing the number N of time series, the number c of communities and the desired number n A of time series in each community A, such that c A=1 n A = N (as in sec. III C). Then we generated c random and uncorrelated time series (with T > N ) with values γ A (t) (where 1 ≤ A ≤ c) drawn independently from a normal distribution with zero mean and unit variance. Then, we created n A identical copies of the A-th time series, for all A. To each of the resulting N time series, each labeled by an index i, we added a local noise β i (t) (a new normally distributed random variable with zero mean and unit variance, independent of all the other ones) multiplied by a 'noise parameter' ν ≥ 0 and a global signal α(t) (again, an independent normally distributed random variable with zero mean and unit variance) multiplied by a 'market-mode parameter' µ ≥ 0. This resulted in a set {Y 1 , . . . , Y N } of N time series with values y i (t) = µ·α(t)+ν ·β i (t)+γ A (t) i ∈ A, A = 1, c. (42) Note that this procedure is similar to the so-called 'factor models' used in financial analysis [5,29,30,[49][50][51]. The time series {Y 1 , . . . , Y N } were further standardized to obtain a final set {X 1 , . . . , X N } of N time series, each with zero mean and unit variance, in compliance with the general prescription mentioned in sec. II.
We generated several benchmarks according to the recipe described above, for various choices of N , c, {n A }, µ and ν. In general, when µ = ν = 0 the benchmark is  35,60,85,110,140,165,190, and 215 time series respectively), were initially generated according to eq. (42). Then, the 1000 × 1000 correlation matrix C was calculated. The heat maps in this figure show the values of the entries of the filtered matrix C (g) defined in eq.(17) and obtained by removing the noise and 'market-mode' components from the original correlation matrix. Blocks along the diagonal represent the residual correlations within each community, while off-diagonal blocks show the residual negative cross-correlations among communities. Our method, here using the Potts algorithm (see Appendix) to maximize the modularity Q3( σ) defined in eq.(35), was always able to correctly identify the target communities, even for values of µ and ν exceeding one. This is indicated by the value of V I (averaged over 10 runs of the community detection algorithm) calculated between the 'true' and the detected partition in each benchmark. The average (over multiple runs) maximum modularity value Q3( σ * ) is also shown in each case. similar to the ideal one described in sec.III C: the communities are completely correlated internally (all the time series in the same communities are identical) and uncorrelated with the time series in other communities. This results in a benchmark partition σ * such that, for infinite time series, C ij = δ(σ * i , σ * j ). However, for finite (but still such that T > N as prescribed by random matrix theory, see sec.II D) time series, C ij will be affected by noise. As µ and ν increase, additional noise will be generated and the community structure will be more difficult to detect. If µ = 1 (ν = 1) then the amplitude of the global mode (local noise) is the same as that of the community signal. Therefore when µ and/or ν approach or exceed one, the community detection problem becomes more challenging. Still, the ambition of our method is that of correctly identifying the benchmark partition σ * even in this 'hard' regime.
In Fig. 4 we show nine benchmarks, organized in a 3 × 3 table with different combinations of values for µ and ν. In all these cases, the communities to detect are the same set of c = 8 heterogeneously sized communities shown previously in fig.2c. The color maps show the values of the entries of the filtered correlation matrix C (g) defined in eq.(17), i.e. the residual correlations obtained after removing the noise and market-mode components. It can be seen that, even for values of µ and ν exceeding one, the filtered matrices always display a clear blockdiagonal structure with a visible contrast across diagonal and off-diagonal blocks.
In all these benchmarks we confirmed that, using the corresponding modularity Q 3 ( σ) defined in eq.(35), our method succeeded in detecting the correct partition σ * . We quantitatively measured the performance of our method in terms of a metric known as Variation of Information (V I) [52,53], which measures the entropy difference between two partitions of the same network, providing a rigorous way for us to quantify the similarity between the 'true' partition and the one identified by our method. More precisely, V I involves the use of Shannon's entropy to measure the amount of uncertainty that exists across the set of communities of two different partitions of the same network. It provides a quantitative measure of the difference between two partitions, a normalized value where zero implies the two partitions are completely identical and one implies that they are completely unrelated. As can be seen from Fig. 4, the values of V I (averaged over multiple runs of the community detection algorithm) are zero or extremely small, indicating a perfect or almost perfect performance of the method.
The average (over multiple runs) maximum modularity value Q 3 ( σ * ) obtained in the above benchmarks is also illustrated in Fig. 4. Lower values of the modularity imply that the network as a whole is more homogeneous in its construction, to the extent that the detected communities exhibit only a relatively weak increase in their collective correlation, above the ambient level. As expected, we see that the modularity decreases for increasing levels of market mode. Increasing levels of noise however do not have such a strong effect, since noisy time series tend to diminish the strength of the intra-community correlations, which enter in both the numerator and denominator of the modularity. In contrast, the market mode has significant impact on the inter-community correlations, which primarily end up only in the denominator of the modularity. Hence the observed decrease in modularity with an increase in market mode. The corresponding low values of the modularity confirm what we had anticipated about the effects of eq. (38). We should bear these effects in mind when interpreting the (low) values of the modularity arising from the partition of real financial time series, where the market mode is very strong. The fact that our method correctly identifies the benchmark partitions even for strong market mode (and low resulting modularity) makes us confident that it will also properly detect the community structure of real markets.

V. THE MESOSCOPIC ORGANIZATION OF REAL FINANCIAL MARKETS
Having redefined the modularity consistently with the properties of correlation matrices and appropriately reconfigured three different techniques for optimizing it, we are now in a position to apply our methodology to a variety of real-world data sets and evaluate the quality of the results. In particular we will apply our three algorithms and the null model expressed in eq.(35) to time series representing stock prices from a variety of stock indexes that span multiple industries and multiple countries. We first obtained static results, including the multiresolution community structure as introduced in sec. IV C, using time series of log-returns of daily closing prices for all the three indexes. These results are shown in this section. Then, we considered different temporal (frequency) resolutions and studied the time dynamics of community structure. These additional results are described in secs.VI and VII respectively.

A. Data and pre-processing
The indexes we used are the S&P 500 (US Large Cap. Stocks), the FTSE 100 (British Large Cap.) and the Nikkei 225 (Japanese Large Cap.). For each of these indexes, we considered a period of 2500 trading days, corresponding to approximately 10 years of market activity, from 2001Q4 to 2011Q3. We selected all stocks for which complete data are available during this period. This resulted in the selection of 445 S&P stocks, 78 FTSE stocks and 193 Nikkei stocks. All these stocks are classified within the Global Industry Classification Standard (GICS) 4 . The complete taxonomy can be found online 5 , however we briefly mention that there are ten top-level 'sectors' (see table I) split into 24 sub-categories called 'industry groups', which are in turn divided into 68 'industries'.
It is important to note that, although we would expect stocks within certain industry sectors to be correlated with each other, we do not expect to observe this effect within and throughout all industry sectors. Previous research in the area of stock clustering [4-6, 20, 54, 55] (see also our discussion in sec.II) has shown some relationships between the industry sectors and clusters of stocks identified by the various methods. We therefore expect to find a certain degree of overlap with this research. However, our choice of null models in conjunction with our tailored community detection algorithms is designed to uncover nontrivial correlations, beyond a direct mapping to industry sectors, such as finding stocks from different industry sectors that tend to move together, and even in opposition to other stocks in their own sector. It is therefore useful to use industry sectors not as a target, but as a baseline to highlight important and non-trivial deviations identified by the community detection algorithms.
As with the benchmarks described in sec.IV D, each set of financial time series was used to initially create a correlation matrix C that was then filtered to produce the matrix C (g) following the procedure described in secs. II D and IV A. Each such matrix was then operated on individually by the three community detection algorithms described in sec. IV. We found that all algorithms always generate very similar partitions. This important result, which for the sake of exposition, is postponed to sec. V E, implies that we can refrain from showing the results of every algorithm. For brevity, we will instead select representative exemplars, with the understanding that any one of the algorithms would generate very similar results.

B. Standard approaches
Before showing the main results of our own methodology, as a preliminary study we illustrate what would be obtained using some of the standard approaches available, in particular the correlation thresholding described in sec.II A and the community detection built on the network-based modularity, described in secs.III B and III C.

Asset Graph from Fisher-transformed correlations
As we discussed in sec.II A, imposing a threshold on the entries of a correlation matrix C allows us to obtain an Asset Graph where links connect the more strongly correlated pairs of stocks [4,19,20]. In fig. 5 we show the effect of this procedure on our S&P 500 data. Rather than showing the results for multiple choices of the threshold, we used a rough criterion to select a unique threshold that would in principle correspond to a standard level of statistical significance. This criterion is as follows.
Using general results in statistics [56], one can easily show that, under the null hypothesis, two time series X i and X j of length T representing T realizations of two independent and normally distributed random variables, the quantity (where C ij is the sample correlation coefficient) is distributed as a normal variable with zero mean (representing the population correlation coefficient in the case of independent variables) and standard error In other words, under the above null hypothesis we expect a concentration of values of z ij around zero, with standard error σ.
In order to detect significant deviations from the null hypothesis, one may select a threshold τ such that only the values outside τ standard errors, i.e. |z ij | > τ σ, are considered as statistically significant. This means that one can select a threshold z τ ≡ τ σ for z ij . In terms of the correlations C ij , the corresponding critical value is A suitable choice of the value of τ can be used to threshold the correlation matrix into an Asset Graph at the corresponding significance level: specifically, one can draw a link only if |C ij | > C τ . The advantage of introducing the above criterion is that, at least in principle, it associates a precise statistical significance level to any value of the threshold (there are however various problems with this approach, as we briefly comment later). This makes it possible to select a unique threshold value corresponding to a standard accepted level of significance. We used the above approach as a rough criterion to select an indicative threshold, choosing τ = 2 so that only the correlations lying two standard deviations away from the null hypothesis are in principle retained. The resulting Asset Graph for the stocks of the S&P 500, plotted in fig.5, was visualized using a clustered rendering [57] of all the stocks that do not end up completely isolated after the filtration, according to the Fruchterman-Reingold [58] force-based algorithm. As expected, we immediately see a significant correspondence between groups of densely connected nodes and industry sectors. However, there is no linear relationship between the attractive and repulsive forces defined by the graph drawing algorithm and the contribution of the corresponding correlations to the modularity. As such, the visualization of the graph cannot be directly used to partition the network into communities.
Moreover, it should be noted that the approach we have used to define a threshold has two main theoretical disadvantages: first, it assumes normally distributed logreturns (while it is well known that real log-return distributions are fat-tailed [12,13]); secondly, it does not in- The network is generated from the correlation matrix of the constituent stocks, after taking the Fisher transform and setting a threshold at 2 standard deviations. The color of each node represents the industry sector to which that stock belongs (see Table I). The force-based layout clearly indicates the existence of strong connections between stocks of the same industry sector, however this approach (like any other threshold-based approach) cannot identify communities of stocks that are internally more correlated than with the rest of the market, and mutually anti-correlated.
FIG. 6. The trivial, single community containing all stocks of the S&P 500 (log-returns of daily closing prices from 2001Q4 to 2011Q3), obtained by either naïvely treating the correlation matrix as a weighted network and using the ordinary networkbased modularity, or alternatively using the correlation-based modularity Q1( σ) (i.e. without filtering the correlation matrix). In both cases, the Louvain algorithm (see Appendix) has been used. The colors represent different GICS sectors (color legend in Table I) and span an area proportional to the number of stocks in each sector.
troduce multiple hypothesis test corrections. A more rigorous way to statistically validate links in a correlationbased network would be that of using numerical bootstrapping methods such as the one considered in ref. [59].
In any case, since no search over the space of possible partitions is performed, the Asset Graph method cannot identify communities of stocks that are more strongly correlated internally than with the rest of the market. As we anticipated in sec.II A, this leaves the problem we started with unsolved. We also recall from sec. IV B 2 that our methodology detects residually anti-correlated communities. This property, which we will illustrate in sec. V D for the data considered here, cannot be achieved by any threshold-based method, or any of the other available methods we described in sec.II.

Naïve application of community detection
As another baseline reference, in fig.6 we show the result of applying to the same S&P 500 data, the community detection described in sec. III B, i.e. by treating the correlation matrix C as a weighted network and running an ordinary (network-based) community detection algorithm [14,15]. We see that the resulting, trivial, community is a single one spanning the entire set of stocks. In such a case, the pie chart depicting the community merely illustrates the distribution of industries within the S&P 500. The same result is obtained if one uses the correlation-based modularity Q 1 ( σ) defined in eq.(33) in terms of the null model C = 1 (i.e. assuming that all time series are completely independent and of infinite length).
In the first case, this result is due to the inconsistent structure of the modularity and to the resulting bias of the algorithms used to maximize it, as we discussed in sec.III B. In the second case, it is due to the inadequacy of the null model defined in eq.(26) for financial correla-  Table I). The blue inter-community link weights are negative, indicating that the communities are all residually anti-correlated. The red circles around each community indicate that the total intracommunity correlations are all positive.
tions (see sec.IV A): the community detection algorithm finds only a single community because of the systemic correlation of the market mode affecting all stocks simultaneously.

C. Community detection using our method
We now come to the application of our own methodology described in sec.IV. In fig.7 we show the result of the application, to the same daily S&P 500 data, of the appropriately redefined community detection methods introduced in sec.IV B. Specifically, making use of the modularity Q 3 ( σ) defined in eq. (35). Since such null models discount both random and market-wide correlations, the community detection algorithms are now able to successfully find correlations that exist in between the microscopic and macroscopic levels. For the S&P 500, the result is a set of five mesoscopic communities whose relative size (the number of nodes in each community) is expressed by the size of the pie chart in the graph. The relative breakdown of the stocks in each community, classified according to their top level GICS sector (see table  I Table I). The blue inter-community link weights are negative, indicating that the communities are all residually anti-correlated. The red circles around each community indicate that the total intracommunity correlations are all positive. community.
In addition to the communities presented for the S&P 500, in figs. 8 and 9 we also provide the communities for the FTSE 100 and the Nikkei 225 respectively, again detected using the null model from eq. (35). As before, the naïve community detection would place all stocks into a single community (not shown). For all these data sets, the values of the maximized modularity Q 3 ( σ * ) achieved by the optimal partitions will be shown later in sec.V E.
While at first glance it may seem as though there is no particular pattern to the community structures in the three markets (as each community contains a plethora of stocks from different industry sectors), a closer look at the industries to which the stocks belong though does in fact yield some interesting observations. First and foremost, some of the industry sectors tend to dominate communities, where in some cases 100% of the stocks for a particular industry sector are in the same community, meaning that on average over the past ten years they have all remained correlated. Examples of this include Energy ( , community B), Financials ( , community C) and Information Technology ( , community D) in the S&P, Utilities ( , community A), Health Care ( , community A), Information Technology ( , community  Table I). The blue inter-community link weights are negative, indicating that the communities are all residually anti-correlated. The red circles around each community indicate that the total intracommunity correlations are all positive.
C), Telecom. Services ( , community C) and Energy ( , community E) in the FTSE, and finally Utilities ( , community B), Energy ( , community B) and Consumer Staples ( , community B) in the Nikkei. There are also instances where top-level sectors are split among different communities according to their subclassification (Industry Group and Industry). This is very interesting because it shows that subgroups of stocks within one sector are often more correlated with a different sector than their own sector.
Other interesting cross-sector correlations can be found too, with Health Care for example. In the FTSE communities, Health Care stocks ( ) are exclusively in community A, whereas in the S&P they are predominately in community E, with some in community D. Interestingly enough, in the latter case the one Health Care Technology industry sector stock from the Health Care sector is in community D, which also happens to be the community containing all of the Information Technology (IT) stocks ( ), whereas all of the Pharmaceutical stocks are in community E, which contains the bulk of the Consumer Staples ( ) stocks. The reader might at this point notice that the FTSE community A containing all Health Care ( ) stocks also contains the bulk of the Consumer Staples ( ) stocks. It is probably not surprising then to discover that those Health Care stocks are comprised of entirely Pharmaceuticals. We find an identical relationship between Pharmaceuticals and Consumer Staples in community B of the Nikkei 225 as well. Furthermore, the one other Health Care stock, a Health Care Equipment & Supplies stock trades in the same community as the IT stocks. This might not be particularly interesting except for the fact that in the Nikkei, the vast majority of IT sector stocks are sub-classified as Electronic Equipment.
One might continue finding interesting trends such as these, however our purpose is not to glean specific qualitative information regarding financial markets, but rather to illustrate how the underlying quantitative information can be ascertained from the raw data, through the appropriate choice of null models in conjunction with the process of community detection. The most important result of this process is the successful identification of mesoscopic communities of correlated stocks that are irreducible to a standard sectorial taxonomy and also anticorrelated with each other, as we now discuss.

D. Residually anti-correlated communities and portfolio optimization
The age old proverb, "Don't put all your eggs in one basket", could never be more insightful than when deciding how to invest one's money. Entire departments of almost every investment bank, insurance firm and hedge fund are dedicated to picking the right baskets for their customers' nest eggs. This process is often referred to as portfolio optimization (or asset allocation) and involves optimizing the way in which a sum of money is divided up between a variety of financial instruments such that one maximizes the return for a given risk, or alternatively minimizes the risk for a given return. According to Modern Portfolio Theory (MPT) [60][61][62], which is widely used in the financial world to calculate asset allocations, one of the most effective ways to accomplish this is through diversification, that is to select groups of assets which are as uncorrelated as possible, or even anti-correlated.
Clearly, we can identify numerous parallels between MPT and our community detection method. As we anticipated in our proof of eq.(41), a key property of the correlation-based modularity is that its maximization will identify mutually anti-correlated groups of time series (where anti-correlations are intended as residual, if some filtering has been applied). Indeed, in figs. 7, 8 and 9, all the links connecting different communities have negative weights, i.e. all communities are mutually anti-correlated.
The (residual) anti-correlations among communities allow us to identify combinations of stocks, which on top of the overall market mode and purely random fluctuations, move in opposition to each other. Recalling from eqs. (A6), (A8) and (A9) that whereX A ≡ i∈A X i , we obtain a practical recipe to construct a set {X A } of community-specific indexes (each built as the sum of the time series of the stocks within a community) such that, as follows from eq.(41), In other words, the two indexes are residually less correlated with each other than expected under the null model, i.e. their mutual filtered correlations are negative. This is a desirable trait from the point of view of risk management and portfolio optimization.

E. Comparative analysis of the three algorithms
We now show a result that we anticipated at the beginning of this section, i.e. the fact that the three algorithms we introduced in sec.IV B identify a very similar community structure on the data we considered. This makes the results shown so far quite robust under changes of the protocol used to derive them.
In figs. 10 and 11 we show the value of the maximized modularity Q 3 ( σ * ) and number of detected communities as the result of running all our three algorithms on the filtered correlation matrices for the S&P 500, the Nikkei 225 and the FTSE 100. We recall from the discussion following eq.(38) and from the benchmarks studied in sec. IV D that, unlike the corresponding problem in network analysis, our choice of C norm implies very small values of the maximized modularity, even in the presence of welldefined communities, when the market mode is strong. So the small values of Q 3 ( σ * ) shown in fig. 10 do not imply a poor or weak community structure. It can be seen from fig. 10 that all three algorithms perform very closely in terms of the maximized modularity value they achieve. Similarly, if we compare the number of communities found by the three methods (see fig.11) we find that the number of communities is quite stable as well.  In table II we quantify more rigorously the differences in the composition of the communities detected by the three algorithms, by showing the V I (see sec.IV D) among all pairs of algorithms, for all the three indexes. The values are quite low, indicating that the partitions found by different algorithms are very similar.

F. Hierarchical community structure of the market
We now come to the application of the multiresolution community detection approach we introduced in sec.IV C. In the case of financial markets, the community-specific correlation responsible for the modular structure shown so far can be regarded, from the perspective of all stocks within one community, as a 'micro market mode'. Just as the market mode discussed previously is responsible for the collective tide of an entire market, a similar force can be extrapolated at the community level. As discussed in sec.IV C, accounting for this 'community mode' in the leading eigenvalue and corresponding eigenvector of the correlation submatrix restricted to an individual community allows us to incorporate its effects, together with those of the overall market mode into the null model and again, detect any underlying structure, which surfaces upon the removal of its influence.
FIG. 11. Number of communities detected in each of the three markets (log-returns of daily closing prices from 2001Q4 to 2011Q3). Green is the Potts algorithm, orange is the Louvain algorithm and blue is the spectral algorithm. Figure 12 shows the result of a single layer of recursion into the five communities of the S&P 500 (depicted previously in fig. 7). Again, we note that the sub-communities are all residually anti-correlated with each other (within each parent community) but maintain an internal positive correlation. Although not obvious from the graph, the sub-communities tend to fall along GICS industry sector lines, with some interesting exceptions, as before.
To call out a few examples, in community D ( fig.12), which contains all of the IT stocks, we see the subcommunities separating along Industry Group and Industry lines 6 . Sub-community D5 is comprised of only Software stocks, D4 contains all of the Semiconductor & Semiconductor Equipment stocks, and D2 contains all of the Internet Software & Services stocks. Interestingly enough though, D2 also contains Amazon Inc. and Priceline Inc. from the Consumer Discretionary Sector, which one could argue are quite aligned with the Internet. Continuing the analysis further, we see that in community C the Finance community sub-community C2 contains all of the Commercial Bank stocks, while C3 contains all but one of the Insurance companies. C4 is exclusively Real Estate Investment Trusts (REITs) and accounts for all of them. Similar partitions can be seen in the other sub communities and further recursion into these communities produce still further separation, close to but not exactly in line with the GICS classification. Figure 13 depicts the hierarchical nature of the S&P 500 to three layers deep. The process can be continued until no single community can be partitioned further into any combination of two or more sets which are anti correlated with each other. For instance, community E ( fig.12) which contains a variety of stocks from various GICS sectors separates out such that the bulk of the stocks in the different sectors find themselves in their own sub community. If we further probe into community E1, which contains all of the Health Care stocks we see that the sub-communities (not shown) fall very closely along Industry lines, with five communities each comprised predominantly of Pharmaceutical, Biotechnol- FIG. 12. Our multiresolution community detection method resolves the sub-community structure of the five communities of the S&P 500 (see fig. 7). Community A mainly comprises Consumer Discretionary and Industrial stocks, B all the Energy stocks, C all the Finance stocks, D all the IT stocks, while E is highly heterogeneous but very well resolved into five separate subcommunities mainly comprising Utilities, Industrial, Health Care, Telecommunication Services, and Consumer Staples stocks respectively. Besides this relatively predictable partition, we note that Industrials stocks, and to a lesser extent also Materials and Consumer Discretionary stocks, are quite dispersed across different communities. (From cross-correlations of log-returns of daily closing prices from 2001Q4 to 2011Q3; the Louvain algorithm has been used). Similar outliers exist in the other communities as well. It may well be the case that there is good reason for their association, for example a shared parent company, sizable investment, common board members or some other significant relationship, or it may be purely coincidental. Gaining a better understand of this takes us to our next lines of experimentation.

VI. MULTIFREQUENCY COMMUNITY DETECTION
Having examined the mesoscopic structure of a set of financial markets, one might be curious as to whether that structure is specific to the chosen frequency of the original time series. That is, one would like to check whether the same communities would be retrieved if the returns which comprised the original time series were calculated every minute, every half hour or every two days. To answer this 'multifrequency' community detection problem, in this section we evaluate the robustness of partitions at a variety of temporal resolutions.

A. Multiple-frequency data
In order to maintain consistency with the results previously described in this paper, we use the same time frame but, instead of working with daily log-returns, we created new data using minute log-returns for the S&P 500 stocks. This has the initial effect of greatly increasing the amount of data being using: from 2500 data points per stock for the daily returns to approximately 900,000 for the minute returns. In order to accommodate some missing data from the minute returns, we had to reduce the set of 445 to 413 stocks, noting that the removed stocks were relatively evenly distributed across the top level sectors of the GICS, so as not to deplete any one particular sector. With the minute return time data of these 413 stocks we created nine new sets of time series, corresponding to a variety of different resolutions ∆ t spanning the same ten-year period: ∆ t ∈ {1, 5, 10, 15 & 30 mins, 1 hour, 0.5, 1 & 2 days}. For example, the 5-min data was created by taking the price of every stock every 5th minute throughout the day. From these nine sets of time series we then proceeded in the same fashion as was previously described for daily return data, creating correlation matrices and leveraging RMT filtering to produce the respective null models.

B. Robustness over multiple frequencies
To measure the effects of resolution we applied all three of the community detection algorithms discussed above to all nine data sets, yielding various values for the modularity Q 3 ( σ * ) of the partition (see fig. 14). Since there can be multiple peaks within a modularity landscape FIG. 14. Multifrequency analysis of the modularity Q3( σ * ) for the different methods, as the resolution goes from oneminute intervals to two-day intervals. A relatively consistent value of the modularity can been seen across all time step resolutions. (Potts algorithm in green squares, Louvain algorithm in orange circles and Spectral algorithm in blue triangles).
[63], all yielding the same value of Q 3 ( σ * ) but exhibiting different community structures, we use V I (see Sec. IV D) as a measure of the difference between the partitions. Since V I is a comparative measure, we (arbitrarily) use the community structure previously ascertained from the daily returns as the point of reference. Thus, as can be seen in fig. 15, the V I for the 1-day returns is 0 by construction, indicating perfect similarity, whereas the community structure for every other resolution shows some level of deviation.
Overall, it can be seen from the combination of figs. 14 and 15 that there exists a considerable amount of consistency between the communities detected at differing resolutions, with the Q 3 ( σ * ) values remaining almost constant and V I deviating slightly with each resolution interval but indicating in most cases no more than a 10% difference between the communities of a particular resolution and those of the 1-day resolution. This means that the correlations between large groups of stocks are not strongly dependent on the resolution of the chosen time step. One might expect to see fluctuations in the variance of stocks at smaller time resolutions, where the more volatile periods of trading (such as market open and market close) are captured. However since we are dealing with correlation matrices, this variance is normalized away. Moreover, we recall that our definition of C norm in eq.(38) controls for the varying volatitily (variance of the total log-return over all stocks), allowing us to focus solely on the relationships between the stocks themselves. Although the values of Q 3 ( σ * ) and V I do provide reasonable insight into the robustness of community structure at the different resolutions, we take the analysis one step further and examine the communities from the perspective of the individual stocks. That is, we can further examine the community affiliation of individual stocks at the various resolutions to ascertain the frequency of times any two stocks find themselves in the same community as each other. We show the results of this analysis in fig. 16, which is a heat map of the different stocks, such that the color of every pair indicates the frequency of cooccurrence in the same community, across all resolutions. For example, if two stocks were always in the same community (unit frequency) then their entry in the heat map is white, while if they were never in the same community regardless of the time step chosen (zero frequency) then their entry is black. Stocks which share a community for some time steps are shades of red (lower frequency) or yellow (higher frequency).
As we can see, the results are in line with the graph of V I in fig. 15. That is, the communities tend to consist of a large core of 'hard' stocks that are unwavering over the different resolutions, plus a small amount of 'soft' stocks that fluctuate between communities, presumably giving rise to the 10% fluctuation in community structure observed with the V I analysis. A significant finding is the existence of a group of soft stocks that alternate It should be noted that our identification of the 'hard' stocks that are most of the time part of the core of a community and the 'soft' ones that are instead alternating across communities is a way to take the potentially overlapping nature of communities into account, even if using a non-overlapping method like modularity maximization. This possibility has no counterpart in the standard network-based community detection problem, and is offered by the intrinsic dependence of correlation matrices on the frequency of the original time series. In what follows, we will use the dynamical evolution of correlations to explore another dimension of variability leading to an alternative way to resolve overlapping communities of multiple time series.

VII. TIME DYNAMICS
When optimizing a portfolio, there is a constant need to choose an adequate period of history from which to try to predict future behavior of the assets in the portfolio. Choosing too short a period will inaccurately bias one's results, because extreme events are weighted too heavily. Similarly, choosing too long a history can imply stability where none exists. In general, analyzing the stability of communities over time provides us with reassurance that our models are in fact producing statistically significant results as well as providing insightful information about the data itself. For example, it is well known in finance that markets become much more globally correlated during periods of economic decline. Stated in the terminology we have been using throughout this paper, they fall more under the influence of the market mode and relinquish the structure provided by the group mode. That being the case, we would expect to see communities lose coherence during periods in the dataset that we know to have been economically troublesome, for example the tech bubble bursting in 2000 -2001 or the sub-prime lending crisis, 2007 -2008.
Since we have shown that the 15-minute data set yield very similar communities to the daily data set for the S&P, we can feel reasonably assured that we can use 15minute data instead of the daily data, which will allow us to examine the S&P data set using a sliding time window of 2 years, corresponding to a ratio T /N = 6, and even look at more fine grained windows, e.g. 6 months.

A. Two-year window
We now seek to examine the community structure over sequential periods of two years to unearth any anomalies which might exist. We again apply the different methods of community detection using the two-year window time series sets of the S&P 500 and subsequent null models created using RMT filtering. As we did for our analysis of resolution, here we evaluate the modularity function Q 3 ( σ * ) for each period (see fig. 17) along with the V I (see fig. 18), where for V I we are comparing each window with the initial two-year window.
As before, we see that all three algorithms perform in a reasonably similar manner. However, unlike our analysis of robustness over different resolutions (which showed little change in community structure or in the modularity), here we see that Q 3 ( σ * ) fluctuates over the different windows. We recall again that, as we mentioned in our discussion following eq.(38), our choice of C norm is already discounting (the evolution of) the volatility of the market. Still, we see that Q 3 ( σ * ) rises slowly from the period ending in 2003 to the period ending in 2007, implying an increase in the strength of communities, and then falls by more than 50% by the end of the period ending in 2009. This drop implies a de-coherence of the communities throughout that period, quite possibly attributed to the financial crash of 2007 -2008. This seems in line with the observation that during periods of financial crisis, markets tend to become more globally correlated, overwhelming the effect of group-level correlations. However, it is interesting that the values of V I have remained quite small and stationary ( fig. 18). This indicates that, despite the fluctuating value of the modularity (i.e. of the relative intra-community correlations), the composition of the communities has remained very stable over time.

B. Six-month window
We can continue to probe this system at a finer grained resolution of time periods, to see if the observations made with the two year window hold up. Again, we plot both Q 3 ( σ * ) and V I for the same ten-year period of the S&P and present the results in figs. 19 and 20 respectively. We can immediately see that the homogeneity of community structure that we see when probing the data using two-year time windows still exists for the most part, but there exists some fluctuations in modularity and community composition over the various six-month periods. The graph of Q 3 ( σ * ) in fig. 19 reinforces the observation from fig. 17 of a significant drop in modularity around the time of the most recent financial crisis, and more accurately pinpoints it to the last half of 2007. The V I plot in figure 20 indicates again that although the strength of community structure, as measured by Q 3 ( σ * ), may have been decreasing, the overall composition of the communities remained relatively constant.
To further examine the coherence and fluctuations in communities across all of the six-month windows, in fig.  21 we provide a heat map showing the mutual V I between every two pairs of 6-month windows. Each square in the matrix is a colored representation of the value of V I between the i th and j th 6-month period. Of particular interest, we can see in the lower right corner of the image (which displays the V I between the most recent time windows) that the communities are slightly more similar than communities generated from the other windows. This indicates that there was less movement of stocks between communities during the most recent couple of years of the past decade. Additionally, these periods are closer to the community structure observed when we measured the entire ten-year period.
As far as explaining this behavior in financial and economical terms, we are again left to hypothesize. Perhaps the observed effect is due to a solidification of communities of stocks caused by the financial collapse, or perhaps it is merely the result of increased accessibility to the markets. With the advent of smartphones, tablets, ease of streaming and subscribing to news feeds and social networks in conjunction with faster trading systems, quantitative and high frequency trading, it is conceivable that our increased access to information and the ability to act on it in near real-time has caused a solidifying behavior of the stocks within communities.
C. Temporal coherence of communities: 'hard' and 'soft' stocks again We display here one final take on the results obtained from our sliding time window, but this time with a stockcentric view, similar to that which we performed for the multifrequency analysis in sec.VI. In the previous sections we have alluded to how communities change with time. One question that should be addressed in conjunction with the previous discussion of time scales is then how the composition of a community changes over time. We have already seen from a variety of V I plots that across each of the six-month periods, the sets of communities look slightly different from each other, but what changes are actually taking place? Are there groups of stocks that form tight knit, unwavering cores of communities? Or do they morph fluidly from one to the other, maintaining no coherence over the entire span of ten years?
To address this, we examined the sets of stocks that comprised the communities of each six-month time frame of the S&P over the course of ten years and created a co-occurrence matrix like the one previously shown in fig.16, where we calculated the frequency of periods during which any pair of stocks resided in the same community. The resulting heat map is presented in figure 22. Again, pairs of stocks which were in the same community all the time are white, and those which were never in the same community are black.
The list of stocks is too long to place on the figure as axis labels, but from observing the raw results we can make some very interesting observations, which we have tried to summarize by labeling again the graph with GICS industry sectors. We can see that over the course of ten years, the communities do exhibit strong cores which are unwavering in their construction and constantly anticorrelated with each other. For example there exists a set of core energy, IT and financial stocks which always reside in their own community, but never share a community with each other. Groups of Energy, Materials and Utilities stocks almost always share the same community, but there have been instances when they did not. Finance is broken into a couple of segments of stocks, such as Banks, Real Estate Investment Trusts (REITs), etc. where the smaller groups always trade with each other but not necessarily aggregated together in a larger community. Similarly, Health Care stocks are fractured in subsets of highly correlated groups of Pharmaceuticals, Services and Biotech, whose allegiance to the larger industry sectors, such as IT and Consumer Staples is more fluid. These trends display interesting overlap with the hierarchical community structure of the S&P discussed earlier in sec.V F. We also see individual stocks from one top-level industry sector spending most of their time in communities comprised predominantly of a different toplevel industry sector, for example Amazon (Consumer Discretionary) spends 90% of the time in the IT group, as does Motorola (Telecommunications).

VIII. CONCLUSIONS
In this paper we have addressed the challenging problem of the detection of communities of strongly correlated time series, whose importance resides in the possibility of identifying a mesoscopic level of organization in the dynamics of complex systems. While the available techniques to analyze correlation matrices failed to detect such modules, we have shown how the concepts of null models, modularity and community detection developed in network theory can be appropriately modified in order to successfully cluster matrices of multiple time series. Our redefinitions of the standard methods solve a number of problems encountered when correlation matrices are naïvely regarded as weighted networks and when ordinary community detection methods are used improp-

erly.
Through the use of various financial markets as examples, we have demonstrated how community detection can be used as a tool to extract specific structural information from time series data. By surfacing group correlations and trends of the stocks in the S&P 500, the FTSE 100 and the Nikkei 225, we were able to isolate well-defined communities of stocks such that each community exhibited an internal positive correlation between its constituent stocks, where those same stocks exhibited an aggregate residual anti-correlation with the stocks of each of the other communities. While some of these communities showed an association with the more qualitative classification expressed in the GICS industry sector taxonomy, our approach was able to uncover a host of interesting correlations between stocks of different sectors and industry groups, as well as unsuspected residual anti-correlations between stocks of the same sector. As such, our methods and results show that the observed patterns are irreducible to a standard taxonomy, and therefore highlight nontrivial patterns. Moreover, they could prove particularly useful in a number of different fields of finance, such as portfolio optimization and risk management.
It is worth pointing out that our modifications to the Potts, Louvain and Spectral Optimization algorithms for community detection, although beneficial in and of themselves, act as a proof of concept opening the door to the adaptation of other techniques existing in the field of community detection, allowing e.g. for overlapping, multiresolution, or hierarchical communities [10,64,65]. Similarly, alternative null models controlling for additional or more sophisticated features of the data can also be developed and incorporated in our approach. The key point is that these models, unlike the naïve approach, which has been used so far, should always be consistent with correlation matrices. We hope that our approach will stimulate further research in this direction. Moreover, although we have focused on financial time series as our primary example of real-world data, our general methodology can of course be applied or adapted to any type of time series data, hopefully yielding equally promising results.
We conclude by noting that, abstractly, the ordinary (network-based) community detection techniques and the (correlation-based) clustering that we have introduced can be thought of as lying at two opposite extremes of a more general problem, in the following sense. The network-based clustering is in the vast majority of cases aimed at identifying groups of statically linked objects (as captured by a single temporal snapshot of the network) while disregarding their possibly correlated evolution. By contrast, the correlation-based clustering that we have introduced assumes that the community-defining features are precisely those determining synchronized trends of dynamical activity among nodes, and that the presence (if any) of static dependencies among the latter can be disregarded. One could of course imagine a more general framework where both static linkages and temporal correlations contribute to the definition of communities, possibly overcoming the 'functional versus structural' dichotomy such as the one existing in brain network anal-ysis that we mentioned in the Introduction. The present work thus represents one step towards the introduction of a fundamentally more general interpolating formalism.

Modified Potts method
The first of the three methods we have selected is based on the so-called q-state Potts model [42,46]. It represents the system as a q-state spin glass, where each node maintains a spin state σ i (as given by some attempted partition σ) and the weights of the edges between nodes map to coupling strengths. So any partition of the network is regarded as a spin configuration σ. In this paradigm, the modularity Q( σ) is proportional to the negative energy −H( σ) of the system. The goal of optimization is then to find the ground state of a spin glass, which corresponds to the maximum value for the modularity. The use of a multi-state super-paramagnetic model for graph clustering was first introduced by Blatt, Wiseman and Domany [66] and later revised by Reichardt and Bornholdt [42,46]; it is upon the latter that we base our extension to incorporate multiple time series.
Within the q-state Potts spin glass model, Reichardt and Bornholdt construct a Hamiltonian by rationalizing a number of energy contributions from the edges between nodes within the same community and nodes in different communities: where the contributions from the various types of links can be tuned through the set of coefficients, a ij , b ij , c ij , d ij . Instead of directly maximizing the modularity Q( σ) defined in eq.(18), Reichardt and Bornholdt minimize the Hamiltonian H( σ). The latter (under certain conditions and some simplifying assumptions) can be condensed to where A ij is the observed value and A ij is the corresponding null model for that edge. The actual search over spin configurations is done using Simulated Annealing [67], which is an approximate technique that in general returns a different solution each time it is used.
In the same vein, introducing a Hamiltonian corresponding to our correlation-based modularity is equally straightforward, however we need to ensure that the logic and derivation that was used to develop the original network-based Hamiltonian holds true for a network created from time series data. For a correlation-based network we have the situation where every node is connected to every other node, in principle eliminating the energy contributed by non-links in eq.(A1) above. However, as is ordinarily done when applying the Potts model to weighted networks, we can replace the energy contribution of non-links with the energy contribution of links whose weight is less than expected, allowing us to immediately introduce our null model. Maintaining the balance between internal and external edges (a ij = c ij and b ij = d ij ) as was done by Reichardt and Bornholdt in their original derivation of the Hamiltonian, we end up with a variant of eq. (A2) directly derived from a complete weighted network where A ij and A ij are replaced by the observed correlation C ij and one of our three null models C ij l defined in sec. IV A, giving Apart from the absence of C norm , the r.h.s. of the above expression is the opposite of the r.h.s. of eq.(36). Therefore our optimization method using the Potts model will attempt to find the lowest value of the Hamiltonian, which will correspond to the highest modularity. For the rest, our algorithm is identical to the procedure described by Reichardt and Bornholdt [42,46]. Therefore the Potts model is a simple algorithm to adapt to correlation matrices, the reason being that although the modularity of the system is explained using a spin-glass model, the actual optimization process is performed using Simulated Annealing, which keeps working even if we use our redefinition of modularity as the cost function [67].

Modified Louvain method
We now consider a second approach to the problem of modularity optimization. Possibly one of the most successful approaches, the Louvain method [47] (named after the University from which it emerged) is a simple, greedy, agglomerative algorithm whose strength lies in the fact that it is computationally fast. Unlike the spinglass model, the Louvain method does not set up a framework for its optimization problem. It simply starts from the definition of modularity specified in eq. (18) and derives a new, more computationally efficient equation for testing the relative gain in modularity by moving a node from one community to another. It is this equation that allows the Louvain method to perform so well.
The method initially considers all nodes as placed in individual communities, and then calculates (sequentially for each node i) the gain of modularity associated with moving node i to the same community where each of its neighbours j belong. The algorithm explores all possible such moves and implements those that give the maximum gain in modularity, and the first iteration stops when no further improvement is possible. Then, a new 'renormalized' network is built by merging all nodes within the previously found communities into a single 'hypernode', and the algorithm is iterated again until there are no more possible changes and a maximum of modularity is attained. To do so, the renormalized weight of the link between two hypernodes is defined as the sum of the weight of the links between nodes in the corresponding two communities. Links between nodes of the same community lead to self-loops for the corresponding hypernode.
The key requirement of the Louvain method is that the system can be properly renormalized, i.e. that successive coarse-grainings of the system remain consistent with the meaning of the modularity at the corresponding level of aggregation. In an ordinary network this is relatively straightforward to show, i.e. a hypernode obtained merging two or more nodes can be legitimately interpreted (from the point of view of the modularity function) as a coarse-grained node with a self-loop to itself and renormalized interactions to all other (hyper)nodes. It is however not intuitively obvious whether our modularity defined in eq.(36) admits an equivalently consistent definition of 'renormalized time series' obtained by 'merging' two or more time series. And even if such a definition exists, one should understand how to correctly define also the renormalized interactions and self-loops.
To this end, we recall from eq.(10) that if X i and X j are two standardized time series then C ij = Cov[X i , X j ]. We can therefore exploit the fact that the covariance is a bilinear function of its arguments to calculate the following renormalized interactions between two hypernodes (communities) A and B: i∈A j∈B The above formula shows that, if we define the 'renormalized time series' of community A as then we can consistently define the renormalized interactions asC AB ≡ i∈A j∈B C ij = Cov X A ,X B (A6) and the renormalized self-loops as We therefore find that, for a graph composed of financial time series, renormalized interactions have a correct interpretation in terms of covariances, rather than correlations. They also show that the summation of a group of time series yields something that resembles an index fund of the set of stocks, so the concept of aggregating nodes maintains a strong grounding in reality. We now have to check whether the modularity function remains consistent with the null model when defined at the level of renormalized nodes. Note that the linearity of the definition ofC AB ensures that, given any of our null models defined in sec.IV A, we can write This means that the filtered quantity C ij − C ij l can be similarly renormalized as for each of the three cases in eq. (37). Now, imagine that in subsequent iterations of the model the hypernodes are further merged into 'communities of communities'. The resulting 'metapartition' can be specified by a vector σ of dimension smaller than (or equal to, if the metapartition is trivial) any of the original vectors σ. Each element σ A denotes the community to which the hypernode A is placed by the metapartition. If σ denotes the underlying (node-level) partition identified by the metapartition σ (i.e. σ i =σ A for all i ∈ A), we can define the renormalized modularitỹ where, in analogy with eq.(38), we have defined Equation (A10) coincides with the original modularity defined at the level of individual nodes. This means that the modularity is manifestly invariant under renormalization, implying that we can indeed consistently redefine a coarse-grained modularity at each iteration of the Louvain method. The second requirement of the Louvain method is the fact that the change in the modularity obtained by adding a previously isolated node to a given pre-existing community can be easily calculated. This ensures the computational efficiency of the algorithm. In adapting the model to correlation-based networks, we must start from eq. (36) and check whether this is still the case, and if so arrive at a new corresponding expression for the modularity change. We will do so using directly the invariant modularity defined in eq.(A10), so that we are sure that the result will hold at any aggregation level. Given the modularityQ l ( σ), we denote the modularity change obtained by adding the (hyper)node I to the community J by ∆Q (I→J) l and calculate it as the difference betweenQ l ( σ ) for a (meta)partition σ where I is part of the community J (i.e.σ I = J) andQ l ( σ ) for a (meta)partition σ where I is isolated in its own community (i.e.σ I = J). Sinceσ A =σ A for all A = I and δ(σ I ,σ A ) = 0 for all A = I, we can write this difference as That is, the change in modularity obtained from adding a (hyper)node I to a pre-existing community J is simply proportional to the renormalized interaction between I and J, i.e. the sum of the (filtered) correlations of all time series within I with all those within J. Note that in the above formula the notation A ∈ J implies A = I, since I does not (yet) belong to J. Similarly, it is possible to calculate the change in modularity −∆Q (I→J ) l obtained when a (hyper)node I belonging to a community J is disconnected from the latter and placed in its own isolated community. Combining these two contributions, we can easily calculate the change in modularity −∆Q obtained by moving a (hyper)node I from a community J to a different community J. So our reformulation above satisfies also the second requirement of the Louvain method at all aggregation levels, and allows us to define a computationally efficient method to detect communities of time series.

Modified spectral method
We now come to the third and final method of optimizing the modularity cost function. Spectral Optimization is the process of using matrix eigendecomposition to recursively bisect a network into communities of nodes according to the principle of maximizing the modularity function [48]. The matrix which is the subject of the eigendecomposition is the so-called modularity matrix appearing in eq.(18) and having entries B ij = A ij − kikj 2m . In other words, the modularity matrix B is the difference between the observed network, represented by the adjacency matrix A, and the null model A . In the spectral method, the modularity matrix is eigendecomposed into its constituent eigenvalues and eigenvectors, the intent being to isolate the eigenvector corresponding to the largest eigenvalue and use the signs of the elements of this vector to infer an optimal partition. Specifically, the network is split into two communities, each comprising the nodes corresponding to eigenvector components with the same sign. The process is implemented recursively in each partition (deriving a new modularity matrix for every community), until no further increase in modularity is obtained.
We need to extend this algorithm to accommodate correlation-based networks. In our case, as clear from eqs. (36) and (37), the modularity matrix is C (l) , i.e. the filtered matrix defined using one of our three null models. We fill therefore adapt the procedure outlined by Newman in the original paper, and implement the spectral optimization method by iteratively bisecting the network into two sub-communities (say A and B). Each such bisection can be denoted either by an appropriate partition vector σ or equivalently by a vector s having elements s i = −1 if node i belongs to (say) community A and s i = +1 if i belongs to community B. The correspondence between these vectors is given by δ(σ i , σ j ) = s i s j + 1 2 . (A13) Given a bisection, we can therefore rewrite our unified correlation-based modularity Q l ( σ) defined in eq.(36) as ij .
In Newman's original formulation the last term sums to zero, because the network-based modularity matrix B has the property that all of its rows sum to zero. However, this is not the case with our correlation-based modularity matrix C (l) defined in eq.(37). So we retain the second term and, defining C (l) ij , rewrite our modularity in matrix form as The vector s maximizing Q l ( s) is easily found as the vector matching the signs of the components of the eigenvector of C (l) corresponding to the largest eigenvalue.
Clearly, both C norm and C (l) tot have no effect on the result, making the original procedure of the spectral algorithm consistent with our reformulation. After the initial bisection, we need to calculate the potential modularity change ∆Q l obtained by further subdividing the communities yielded in the previous step. Let us consider the case where one community (say A) among the ones obtained thus far in the algorithm is further subdivided into two new communities (say A 1 and A 2 ). If s is a vector (restricted to the vertices in A only) denoting the bisection of A into A 1 and A 2 , then the modularity change associated with such bisection reads where C

(l)
A represents the sub-matrix of C (l) restricted to the subset of nodes within community A, and the nota-tionC (l) AA is borrowed from eq.(A9). As with the initial bisection, s is chosen to maximize ∆Q (A1|A2) l by selecting its elements to match the sign of the eigenvector corresponding to the largest eigenvalue of the matrix C (l) A . As for the original algorithm, our modified spectral method proceeds by iterating the above procedure until no further bisection can make the modularity increase.