How strong are correlations in strongly recurrent neuronal networks?

Cross-correlations in the activity in neural networks are commonly used to characterize their dynamical states and their anatomical and functional organizations. Yet, how these latter network features affect the spatiotemporal structure of the correlations in recurrent networks is not fully understood. Here, we develop a general theory for the emergence of correlated neuronal activity from the dynamics in strongly recurrent networks consisting of several populations of binary neurons. We apply this theory to the case in which the connectivity depends on the anatomical or functional distance between the neurons. We establish the architectural conditions under which the system settles into a dynamical state where correlations are strong, highly robust and spatially modulated. We show that such strong correlations arise if the network exhibits an effective feedforward structure. We establish how this feedforward structure determines the way correlations scale with the network size and the degree of the connectivity. In networks lacking an effective feedforward structure correlations are extremely small and only weakly depend on the number of connections per neuron. Our work shows how strong correlations can be consistent with highly irregular activity in recurrent networks, two key features of neuronal dynamics in the central nervous system.


I. INTRODUCTION
Two-point correlations are commonly used to characterize collective dynamics in extended systems [1][2][3][4][5][6][7]. Recent technical advances [8][9][10] make it possible to simultaneously record the activity of many neurons in networks in the brain. This allows for the measurement of two-point correlations for large numbers of neuronal pairs in spontaneous activity as well as upon sensory stimulation or in controlled behavioral conditions.
Correlations in neuronal activity impact the ability of networks to encode information [11][12][13][14]. Correlations are also functionally important in performing sensory, motor, or cognitive tasks [15]. For instance, correlated oscillatory activity has been hypothesized to be involved in visual perception [16]. In a recent study combining modeling, electrophysiology and analysis of behavior, we argued that non-oscillatory correlated neuronal activity in the central nervous system is essential in the generation of exploratory behavior [17]. Correlations are also important for the self-organization of neuronal networks through activity dependent plasticity [18]. Indeed, changes in synaptic strength are thought to depend on the temporal correlation of the activity of the pre- and postsynaptic neurons [19,20].

* Correspondence author: david.hansel@univ-paris5.fr
Neurons in cortex receive recurrent inputs from several hundreds [41] to a few thousands [42] of other neurons. Individual connections can induce post-synaptic potentials in a range of 0.1 mV to several mV [35]. Thus, 50 simultaneous inputs are sufficient to trigger or suppress a spike in a neuron. These facts have been incorporated in model networks with strongly recurrent connectivity in which connection strengths are O(1/√K), where K is the average number of inputs per neuron. This scaling is in contrast to the one used in standard mean-field models (e.g. [43]), where connections are O(1/K) and thus are much weaker. Recent experiments in cortical cultures are consistent with a 1/√K scaling of connection strength [44].
van Vreeswijk and Sompolinsky [45,46] showed that strongly recurrent networks consisting of two populations of neurons, one excitatory (E) and one inhibitory (I), randomly connected on a directed Erdös-Rényi graph, operate in a state in which the strong excitation is balanced by the strong inhibition. In this balanced regime, neurons receive strong excitatory and inhibitory inputs, each O(√K), but due to the recurrent dynamics these inputs cancel each other at the leading order. This cancellation, which does not require fine-tuning, results in O(1) net inputs into the neurons, whereas spatial heterogeneities and temporal fluctuations in the inputs are also O(1). As a result, the activity of the neurons remains finite and exhibits strong temporal irregularity and heterogeneity [47,48].
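This cancellation can be illustrated numerically. Below is a minimal sketch (all parameter values are hypothetical and chosen only to satisfy the balance condition): with 1/√K couplings, the excitatory and inhibitory drives are each O(√K), yet their sum stays O(1) as K grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy illustration of the balanced scaling: with synaptic strengths of
# order 1/sqrt(K), the excitatory and inhibitory drives are each
# O(sqrt(K)), but their sum stays O(1) when the rates satisfy the
# balance condition J_EE*m_E + J_EI*m_I + I_E = 0 (hypothetical values).
J_EE, J_EI, I_E = 1.0, -2.0, 0.3
m_E = 0.3
m_I = (J_EE * m_E + I_E) / (-J_EI)   # rate that cancels the O(sqrt(K)) terms

for K in [100, 400, 1600, 6400]:
    exc = (J_EE / np.sqrt(K)) * rng.binomial(1, m_E, size=K).sum() \
          + np.sqrt(K) * I_E          # external drive is excitatory here
    inh = (J_EI / np.sqrt(K)) * rng.binomial(1, m_I, size=K).sum()
    print(K, round(exc, 2), round(inh, 2), round(exc + inh, 2))
```

The excitatory and inhibitory columns grow like √K while the net input column fluctuates around zero with O(1) amplitude.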
The latter results were established in two-population sparsely connected networks. Renart et al. [28] showed that they also hold for densely connected networks. This is because in unstructured and strongly recurrent networks the dynamics suppress correlations [49]. Despite the fact that in these networks neurons share a finite fraction of their inputs, they operate in an asynchronous state with O(1/N ) correlations, where N is the number of neurons in the network [50].
These previous studies focused on strongly recurrent unstructured networks with two neuronal populations. In the brain, however, neural networks comprise a diversity of excitatory and inhibitory cell types, which differ in their morphology, molecular signature and, importantly for our purpose, in their connectivity. These networks are also structured at many levels. In particular, the probability of connection falls off with distance and depends on functional properties of pre- and postsynaptic neurons. For example, in mouse primary auditory cortex the probability that two excitatory neurons are connected decays to zero within ∼300 µm [35]. In cat primary visual cortex neurons interact locally on a range of ∼500 µm, whereas long range patchy connections are observed up to several mm [51,52].
In the present paper we investigate how structure in network connectivity can give rise to strongly correlated activity. Our goal is to explore the general architectural features that control the strength of pair-wise correlations in strongly recurrent neural circuits consisting of several excitatory and inhibitory neuronal populations. In Section II, we define the network architectures and the neuronal model we use. In Section III, we establish a set of constraints, the balanced correlation equations, that need to be satisfied in any strongly recurrent network if firing rates do not saturate. We derive in Section IV explicit expressions for the Fourier components of the correlations in two-population networks with spatially modulated connectivity. We establish the conditions under which correlations are strong and show that these do not violate the balanced correlation equations. Section V is devoted to networks with an arbitrary number of populations. We prove two theorems, which state for which network architectures correlations are O(1/N ) and for which they increase with K, when K is large. In Section VI, we apply these theorems to specific examples. In Section VII, we assume that K = O(N γ ) and derive a bound on γ for which the scaling theorems still apply. The paper closes with a discussion of our results.

II. THE MODEL

II.1. Architecture
We consider a neuronal network with a ring architecture, comprising D neuronal populations, some excitatory and others inhibitory (Fig. 1; [53,54]). For simplicity, we assume that all populations have the same number of neurons, N. Neuron i in population α (neuron (i,α)) is located at angle θ^α_i = 2πi/N, with i = 1, ..., N and α = 1, ..., D. The probability, P^αβ_ij, that a neuron (j,β) projects to neuron (i,α) depends on their distance on the ring (Fig. 1b). We write:

P^αβ_ij = (K/N) f_αβ(θ^α_i − θ^β_j),   (1)

where f_αβ is a 2π-periodic function whose spatial average is equal to 1. In this paper we assume a finite number of non-zero Fourier modes in f_αβ.
Thus, a neuron in population α receives, on average, K inputs from neurons in each of the populations β. We denote by Λ the adjacency matrix of the network connectivity: Λ^αβ_ij = 1 if neuron (j,β) projects to neuron (i,α), and Λ^αβ_ij = 0 otherwise. For simplicity we assume that all connections from population β to population α have the same strength, j_αβ. We thus define the connectivity matrix, J, as

J^αβ_ij = j_αβ Λ^αβ_ij.

Note that j_αβ is positive (negative) if population β is excitatory (inhibitory). In all this paper we focus on strongly recurrent networks, characterized by interactions which are O(1/√K) [45,46]. Thus, we scale the synaptic strengths with the mean connectivity as

j_αβ = J_αβ/√K,

where J_αβ is O(1).
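A sketch of how one realization of this connectivity can be sampled, assuming a single non-zero Fourier mode with a hypothetical amplitude f1:

```python
import numpy as np

# Sample the adjacency matrix of one (beta -> alpha) projection on the ring:
# connection probability P_ij = (K/N) * (1 + 2*f1*cos(theta_i - theta_j)),
# i.e. a single non-zero Fourier mode (f1 is a hypothetical value).
N, K, f1 = 1000, 100, 0.2
theta = 2 * np.pi * np.arange(N) / N
P = (K / N) * (1 + 2 * f1 * np.cos(theta[:, None] - theta[None, :]))
rng = np.random.default_rng(1)
Lam = rng.random((N, N)) < P          # adjacency: Lam[i, j] = 1 if j projects to i
print(Lam.sum(axis=1).mean())          # average in-degree, close to K
```

Note that f1 must be small enough that P stays in [0, 1]; the cosine modulation averages out over the ring, so the mean in-degree remains K.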

II.2. Single neuron dynamics
The state of neuron (i,α) is characterized by a binary variable, S^α_i. When the neuron is quiescent, S^α_i = 0, while if it is active, S^α_i = 1. The total synaptic current into neuron (i,α) at time t, h^α_i(t), is the result of all its recurrent interactions with the other neurons in the network, as well as the feedforward inputs coming from outside the network. It is given by

h^α_i(t) = Σ_β Σ_j J^αβ_ij S^β_j(t) + I^α_i,   (2)

where the external input I^α_i = √K I_α(θ^α_i) [46] is assumed to be constant in time. Here, I_α is O(1) and thus I^α_i is O(√K). The neurons are updated with Glauber dynamics at zero temperature [1,28,43]. Specifically, update times for neuron (i,α) are Poisson distributed with rate 1/τ: if neuron (i,α) is updated at time t, S^α_i is set to 1 if h^α_i(t) ≥ T_α and to 0 otherwise (for simplicity, we assume the same threshold, T_α, for all neurons in population α, and the same update rate for all neurons). Accordingly, the transition probability, w (Fig. 1c), can be written as

w(S^α_i → 1 − S^α_i) = (1/τ) [S^α_i + (1 − 2S^α_i) Θ(h^α_i − T_α)],

where Θ is the Heaviside function. We normalize time so that τ = 1.

FIG. 1 (caption fragment): The neurons in each population are located on a ring. The probability that neuron (j,β) is connected to neuron (i,α) depends on their distance. c. Left: Neurons are binary units with zero temperature Glauber dynamics. Right: The network operates in the balanced regime. Red: Excitatory input into a neuron. Blue: Inhibitory input into the same neuron. Black: The net input is on the order of the threshold (green, T = 1). Time is given in units of τ = 1.
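A minimal sketch of these dynamics for a two-population (E, I) network (Erdős-Rényi connectivity for brevity, spatial modulation omitted; all parameter values are hypothetical):

```python
import numpy as np

# Minimal zero-temperature Glauber dynamics for a two-population (E, I)
# balanced network; all parameter values are hypothetical.
rng = np.random.default_rng(2)
N, K = 400, 100
J = {('E','E'): 1.0, ('E','I'): -2.0, ('I','E'): 1.0, ('I','I'): -1.8}
I_ext = {'E': 0.3, 'I': 0.25}
T = {'E': 1.0, 'I': 0.7}

# Erdos-Renyi adjacency; synapses scaled by 1/sqrt(K).
A = {(a, b): (rng.random((N, N)) < K / N) * J[(a, b)] / np.sqrt(K)
     for a in 'EI' for b in 'EI'}
S = {a: rng.integers(0, 2, N) for a in 'EI'}

for step in range(60 * N):            # each neuron updated ~30 times on average
    a = 'E' if rng.random() < 0.5 else 'I'
    i = rng.integers(N)
    h = sum(A[(a, b)][i] @ S[b] for b in 'EI') + np.sqrt(K) * I_ext[a]
    S[a][i] = 1 if h >= T[a] else 0   # zero-temperature update rule

print({a: S[a].mean() for a in 'EI'})  # population-averaged activities
```

Despite the O(√K) external drive, the population rates settle at intermediate values, as expected in the balanced regime.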

III. SPATIO-TEMPORAL PROFILE AND CORRELATIONS OF THE ACTIVITY FOR N, K → ∞
We write the state of neuron (i,α) as

S^α_i(t) = m_α(θ^α_i) + ∆S^α_i + δS^α_i(t),   (3)

where m_α(θ) = [⟨S^α_i(t)⟩_t]_J is the average of S^α_i(t) over realizations of the network connectivity matrix and over time. In Eq. (3), the term ∆S^α_i is the quenched disorder, whereas δS^α_i(t) represents the temporal fluctuations in the activity of neuron (i,α). Similarly, we decompose the input as

h^α_i(t) = h_α(θ^α_i) + ∆h^α_i + δh^α_i(t),   (4)

where h_α(θ^α_i) = [⟨h^α_i⟩_t]_J is a smooth function of its argument. For large N it is given by:

h_α(θ) = √K [ Σ_β J_αβ ∫ (dθ'/2π) f_αβ(θ − θ') m_β(θ') + I_α(θ) ].   (5)

The second term in Eq. (4) is the quenched disorder in the input. It satisfies

[∆h^α_i]_J = 0.   (6)

Finally, the temporal fluctuations in the inputs are

δh^α_i(t) = Σ_β Σ_j J^αβ_ij δS^β_j(t).   (7)

Because we scale the synaptic strength as 1/√K, these temporal fluctuations are O(1) when correlations are weak. At first sight, Eq. (5) suggests that h_α(θ) is O(√K). This would imply that, depending on whether h_α(θ) is positive or negative, the neurons fire either at very high or at very low rate. This happens unless the network settles into a state in which excitation and inhibition are, on average, balanced. In this case, the net average inputs to the neurons are in fact O(1). On the other hand, large spatial and temporal correlations could in principle lead to temporal fluctuations and heterogeneities in the inputs of O(√K). Thus, in the presence of strong correlations it is not sufficient to require that the mean input, h_α(θ), is O(1) for the network to operate in a biologically relevant regime. For that, we need both the mean net inputs and the input fluctuations to be O(1) at any time and for all neurons. When this happens we say that the network operates in the balanced regime.

III.1. Balance equations for the quenched averaged population activities
As in [55,56], the requirement that the mean input is O(1) implies that the bracket in Eq. (5) must vanish to leading order,

Σ_β J_αβ ∫ (dθ'/2π) f_αβ(θ − θ') m_β(θ') + I_α(θ) = 0,

for all α ∈ {1, . . . , D} and all θ ∈ [0, 2π). In the large K limit this yields a set of linear equations which determines the functions m_α(θ) to leading order. These equations can be written in Fourier space:

Σ_β J^(n)_αβ m^(n)_β + I^(n)_α = 0,   (9)

where the superscript n denotes the n-th Fourier mode and we have used the shorthand notation

J^(n)_αβ = J_αβ f^(n)_αβ,

which is the n-th Fourier mode of the connectivity matrix.
In what follows, we consider J (n) αβ as the elements of a D × D matrix, J (n) .
The spatial average of the activities, m^(0)_α, must be non-negative for the balanced state to exist. This implies that the parameters J_αβ and the external inputs, I^(0)_α, must satisfy a set of inequalities. For example, for networks with two populations, one excitatory (α = E) and one inhibitory (α = I), solving Eq. (9) for n = 0 gives [46]

m^(0)_E = (J_EI I^(0)_I − J_II I^(0)_E)/∆^(0),  m^(0)_I = (J_IE I^(0)_E − J_EE I^(0)_I)/∆^(0),

with ∆^(0) = J_EE J_II − J_EI J_IE, and the inequalities express the positivity of these two quantities. In general, Eq. (9) also implies additional inequalities that must be satisfied by f^(n)_αβ to guarantee that m_α(θ) ≥ 0 for all α and θ [55,56]. In the present work we focus on the case where the external inputs are spatially homogeneous. Thus, m^(n)_α = 0 for n ≥ 1, and the only condition which is required for the balanced state to exist is: m^(0)_α > 0. To study the stability of the homogeneous balanced state it is useful to introduce the interaction matrix

J̃^(n)_αβ = √K g_α J^(n)_αβ,

where g_α is the population averaged gain (see Appendixes A, D; [46,49]). Small perturbations, δm_α(θ, t), of the activity profile around this homogeneous state evolve according to [46]:

(d/dt) δm^(n)_α = −δm^(n)_α + Σ_β J̃^(n)_αβ δm^(n)_β.

The stability of the balanced state with respect to perturbations in m_α(θ) requires that, for all n, all the eigenvalues of the matrices J̃^(n) have real parts smaller than 1. For instance, for a two population (E,I) network one must have in the large K limit

J^(n)_EE J^(n)_II − J^(n)_EI J^(n)_IE > 0

for all n. Note that the gain of the neurons, g_α, dropped from this equation. The balanced state undergoes a Turing bifurcation when, for some n ≥ 1, J^(n)_EI J^(n)_IE − J^(n)_EE J^(n)_II crosses 0 and becomes positive.
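At leading order, the n = 0 balance equation is linear, so the rates can be obtained by a matrix inversion. A sketch with hypothetical parameter values:

```python
import numpy as np

# Leading-order balanced rates from the n = 0 balance equation,
# J0 @ m + I0 = 0  =>  m = -inv(J0) @ I0 (hypothetical parameter values).
J0 = np.array([[1.0, -2.0],
               [1.0, -1.8]])          # rows: E, I targets; columns: E, I sources
I0 = np.array([0.3, 0.25])
m = -np.linalg.solve(J0, I0)
print(m)                               # both components must be positive
assert (m > 0).all(), "balanced state does not exist for these parameters"
```

If either component of m comes out negative, the corresponding inequality is violated and no balanced state exists for those parameters.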

III.2. Balance equation for pair-wise correlations
The time-lagged auto- and cross-correlation functions of the activities of a pair of neurons, (i,α), (j,β), for (j,β) ≠ (i,α), are

a^α_i(τ) = ⟨S^α_i(t) S^α_i(t + τ)⟩_t − ⟨S^α_i⟩²_t,
c^αβ_ij(τ) = ⟨S^α_i(t) S^β_j(t + τ)⟩_t − ⟨S^α_i⟩_t ⟨S^β_j⟩_t.

In what follows, it is convenient to define c^αα_ii = 0. Due to the randomness of the connectivity, the number of excitatory and inhibitory inputs varies from neuron to neuron, resulting in heterogeneous firing rates between neurons even when they belong to the same population [46,48]. This structural randomness also results in heterogeneity of the auto-correlations of single neuron activities. It also contributes to the heterogeneity in pairwise cross-correlations. The latter is further enhanced by the spatial variability in the number of inputs shared by pairs of neurons. A full characterization of the distributions of the auto- and cross-correlations is beyond the scope of this paper. Instead, here we will focus on their population averages.
The population average auto-correlation, A_α, is given by A_α(τ) = (1/N) Σ_i a^α_i(τ), which in the thermodynamic limit is also A_α(τ) = [a^α_i(τ)]_J. With the architecture we use, the probability that neurons (i,α) and (j,β) share a common input from a third neuron, (k,γ), is

P^αγ_ik P^βγ_jk = (K/N)² f_αγ(θ^α_i − θ^γ_k) f_βγ(θ^β_j − θ^γ_k).

As is evident from this equation, the number of shared inputs, averaged over all pairs of neurons separated by the same distance on the ring, ∆, depends only on ∆. Thus we define the average cross-correlations as C_αβ(∆, τ) = ⟨c^αβ_ij(τ)⟩, where ⟨.⟩ denotes the average over pairs with |θ^α_i − θ^β_j| = ∆. In the thermodynamic limit, this quantity does not depend on the specific realization of the network connectivity matrix. The Fourier expansion of this function is

C_αβ(∆, τ) = Σ_n C^(n)_αβ(τ) e^{in∆},

where C^(n)_αβ(τ) = ∫ (d∆/2π) C_αβ(∆, τ) e^{−in∆}. Equation (9), which determines the population averaged firing rates, stems from the constraint that the net input in every neuron must be O(1) when K is large. The condition that for any pair of neurons the correlation in their inputs is finite in that limit leads to another constraint, which we now derive.
Let us consider the ND × ND matrix Q defined by

Q^αβ_ij = [⟨δh^α_i(t) δh^β_j(t)⟩_t]_J   (19)

for (i,α) ≠ (j,β), and Q^αα_ii = 0. Using Eq. (7), one finds that the Fourier modes of Q depend only on the Fourier modes of the correlations, which can be written in matrix form:

Q^(n) = K J^(n) (C^(n) + A/N) J^(n)†.   (21)

Here, C^(n) denotes the D × D matrix with elements C^(n)_αβ, A_αβ = A_α δ_αβ, and the superscript † denotes Hermitian conjugation ((X†)_αβ = X*_βα). Note that the diagonal of the matrix Q is not the variance of the inputs; the latter is given in Appendix E4. The requirement that cross-correlations of the inputs into pairs of neurons are at most O(1) implies that all the quantities Q^(n)_αβ are also at most O(1). This yields

K J^(n) (C^(n) + A/N) J^(n)† = O(1).   (23)

We call Eq. (23) the balanced correlation equation. It implies that, for all n for which the matrix J^(n) is invertible and has entries O(1), C^(n) is smaller than or equal to O(1/K) and is thus weak. For a broad class of network architectures, we show below that correlations are in fact O(1/N) and barely depend on K, for large K. When, however, J^(n) is singular for some n ≥ 0, correlations can be larger than O(1/K). In fact, we will see that in those cases correlations can even increase with K.
Finally, we note that one can also write a balanced correlation equation for the quenched disorder of the neural activity, using the fact that the Fourier modes of [∆h^α_i ∆h^β_j]_J must also be at most O(1). This requirement leads to the analog of Eq. (23), in which C^(n) is replaced by the n-th Fourier mode of the quenched covariance of the activities, [∆S^α_i ∆S^β_j]_J.

IV. CORRELATIONS IN TWO-POPULATION NETWORKS
In Appendix A we derive an equation for the spatial Fourier modes of the equal-time quenched average correlation functions, C_αβ(∆, 0). It yields (omitting the second argument)

2C^(n) = J̃^(n) (C^(n) + A/N) + (C^(n) + A/N) J̃^(n)†.   (26)

There we also show that the solution of this equation is a fixed point of the dynamics of the correlations which is always stable when the balanced state is stable with respect to perturbations in the population rates. This equation holds for a network with an arbitrary number of neuronal populations. In the case of two populations, it can be solved explicitly, yielding, after a straightforward calculation (see Appendix E), closed expressions for the C^(n)_αβ in terms of the trace, T^(n), and the determinant, ∆^(n), of the matrix J̃^(n) (Eq. (27)). The expansion of these expressions for large K (Eq. (28)) shows that, in general, when K is large, C^(n) is O(1/N) and, to leading order, independent of K. It should be noted that to derive Eq. (28) we assumed that T^(n) ≠ 0 and ∆^(n) ≠ 0. Equation (27) indicates, however, that when T^(n) = 0 and ∆^(n) ≠ 0, or T^(n) ≠ 0 and ∆^(n) = 0, C^(n) is still O(1/N). The situation is different if T^(n) = 0 and ∆^(n) = 0. Equation (27) shows that in this case it is possible to get correlations which are O(K/N).
In the rest of this section we consider in detail two-population networks in which only the first two Fourier modes of the probabilities of connection are non-zero:

P^αβ_ij = (K/N) [1 + 2 f^(1)_αβ cos(θ^α_i − θ^β_j)],   (29)

with α = E, I, β = E, I.
In this case, N C^(0)_αβ and N C^(1)_αβ are O(1) in the large K limit. Thus, in that limit, the spatial average and the modulation of the correlations within the E and I populations do not depend on K at the leading order. Moreover, C_EE(∆) and C_II(∆) depend on the synaptic strengths only because the autocorrelations A_E and A_I depend on these parameters. In all the cases shown in Fig. 2a, C_EE(∆) is very small (note the scale on the y-axis). When the connectivity is spatially modulated, C_EE(∆) varies with distance. However, the spatial averages are comparable in the three cases considered. This is in agreement with Eq. (30) since, to leading order, the autocorrelations, A_E and A_I, are not expected to depend on whether the connectivity is spatially modulated or not.
Note that according to Eq. (30), C EE (∆) and C II (∆) are negative for close-by neurons (∆ small) and positive for neurons that are far apart. This is the case for the set of parameters corresponding to the solid line in Fig. 2a (see also Fig. 2b-d). However, when finite K corrections are not negligible, which happens when |T (1) ∆ (1) | is sufficiently small, short range correlations can be positive and longer range correlations negative (Eq. (30)). This is the case in Fig. 2a, dashed line.

Figure 2b compares simulations (circles) and analytical results (solid lines) for the dependence on N of N C^(1)_αβ (parameters as in panel a, red solid line). It shows that the spatial modulation of the correlations in the simulation is close to the large K analytical results. Figures 2c-d depict the dependence of N C_αβ on K. In the whole range of K considered here, simulations and analytical results are close. The nearby correlations and modulation amplitudes of |C_αα| are much larger than those of |C_EI|.

IV.2. Networks with an explicit feedforward structure

We now consider a network in which f^(1)_EI is the only non-zero spatial modulation of the connectivity, i.e., f^(1)_EE = f^(1)_IE = f^(1)_II = 0. The spatially modulated component of the interaction has therefore an explicit feedforward structure (Fig. 3a, top panel) and T^(1) = ∆^(1) = 0.
Solving Eq. (27) shows that the correlations are on average O(1/N) and that their modulations are, to leading order,

C^(1)_EE ≃ (K/2N) (g_E J_EI f^(1)_EI)² A_I,   C^(1)_EI ≃ (√K/2N) g_E J_EI f^(1)_EI A_I,   C^(1)_II = O(1/N).   (31)

Figure 3 shows simulation results for this network. Correlations between E and I neurons (green) are weaker than those between E neurons. They are negative for nearby neurons, while for nearby excitatory pairs they are positive. All these features are in agreement with our analytical results. The spatial modulation of the EE correlations increases with K (Fig. 3b, circles). There is quantitative agreement between simulations and theory (Eq. (31)) up to K ≈ 4000 for N = 40000. For larger K, C_EE is larger than predicted by Eq. (31). This is because this equation was derived by linearizing the dynamics, which is only valid when correlations are not too large. In fact, simulation results for fixed K deviate less from Eq. (31) when N is increased (Fig. 3b-c).
As K increases, the spatial modulation of the EI correlations, C^(1)_EI, becomes more negative (Fig. 3c; green). Here too, for N = 40000, simulations agree well with Eq. (31) up to K ≈ 4000 and deviations are smaller when N is larger. The spatial modulations of the II correlations in the simulations are extremely small (Fig. 3c; blue), as the theory predicts.
Finally, according to Eq. (31), the spatial modulation of the II correlations, C^(1)_II, is O(1/N) and, to leading order, does not increase with K.

FIG. 3 (caption fragment): Correlations in a two population network with an explicit feedforward structure. Connection probabilities as in Eq. (29).

IV.3. Suppression of correlations by recurrence

The network investigated in IV.2 has an explicit feedforward structure. Adding any spatial modulation to the II connectivity (f^(1)_II ≠ 0; Fig. 4a) destroys this structure and now T^(1) ≠ 0 (while ∆^(1) = 0). Solving Eq. (26) for that case, one finds, to leading order in 1/N:

C^(1)_EE = K (g_E J_EI f^(1)_EI)² A_I / [N (√K g_I J_II f^(1)_II − 1)(√K g_I J_II f^(1)_II − 2)].   (32)

In the large N, K limit, C^(1)_EE is therefore O(1/N) and no longer increases with K. Thus the addition of a spatial modulation in the II interactions suppresses the correlations that inhibitory projections induce in the E population. To understand further the origin of this suppression, let us consider the quenched average correlations of the inputs, Q_αβ(∆) (Eq. (19)). Using Eq. (21), one finds that Q^(1)_EE contains terms proportional to K(C^(1)_II + A_I/N), where C^(1)_II is obtained from Eq. (26). When the II interactions are spatially modulated, C^(1)_II ≃ −A_I/N to leading order. Thus a cancellation between terms which are O(K/N) reduces Q^(1)_EE, and with it C^(1)_EE, to O(1/N). According to Eq. (32), the correlation in the E population always increases linearly with K, for small K. The cross-over between this regime and the large K regime, in which correlations are O(1/N), occurs at K ≈ (g_I J_II f^(1)_II)^−2. Figure 4c depicts this crossover in numerical simulations. Thus, although in the large N, K limit a transition from O(K/N) to O(1/N) correlations occurs as soon as f^(1)_II ≠ 0, this qualitative difference is significant only when K is sufficiently large. In other words, for a moderately large number of inputs per neuron, correlations can exhibit a close to linear increase even if the structure of the spatial modulation of the interaction matrix is not completely feedforward.
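The crossover can be reproduced with the same Lyapunov-form reading of Eq. (26) (hypothetical parameters; here b plays the role of g_E J_EI f^(1)_EI and d that of g_I J_II f^(1)_II):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def corr_mode(Jt, A, N):
    # Fixed point of 2C = Jt(C + A/N) + (C + A/N)Jt^T,
    # rewritten as a standard Lyapunov problem.
    M = Jt - np.eye(len(Jt))
    Q = -(Jt @ A + A @ Jt.T) / N
    return solve_continuous_lyapunov(M, Q)

A, N, b = np.diag([0.16, 0.19]), 40000, -1.0   # hypothetical values
for d in [0.0, -0.1]:                           # d = 0 -> explicit feedforward
    NC = [N * corr_mode(np.sqrt(K) * np.array([[0.0, b], [0.0, d]]), A, N)[0, 0]
          for K in (100, 400, 1600)]
    # N*C_EE grows linearly in K when d = 0, but saturates when d != 0.
    print(d, [round(x, 3) for x in NC])
```

With d = 0 the growth is exactly linear in K; with d ≠ 0 it bends over once K exceeds roughly 1/d².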

V. SCALING CORRELATION THEOREMS
In the previous section we studied networks with two neuronal populations. In this case, it is straightforward to analytically derive explicit expressions for the correlations. These expressions are simple enough to fully classify how the network structure affects the scaling (with K and N ) of the correlations. For networks consisting of more than two populations, analytical expressions for the correlations can in principle be derived. However, dealing with these expressions becomes rapidly impractical as the number of populations increases. In the following we adopt an alternative approach. We prove two general theorems, which, given the network architecture, allow us to determine how correlations vary with K, without computing these correlations explicitly.
To prove these theorems, we rewrite Eq. (26) in the Jordan basis of J̃^(n). We write

J̃^(n) = V^(n) J^(n)_jor U^(n),  with  V^(n) = (U^(n))^−1,

and define

Ĉ^(n) = U^(n) C^(n) (U^(n))†,  Â^(n) = U^(n) A (U^(n))†.   (36)

Here x* denotes the complex conjugate of x ((X†)_µν = X*_νµ), the rows of U^(n) are the generalized eigenvectors of J̃^(n), and J^(n)_jor is the Jordan form of J̃^(n). In this basis, Eq. (26) reads

2Ĉ^(n) = J^(n)_jor (Ĉ^(n) + Â^(n)/N) + (Ĉ^(n) + Â^(n)/N) (J^(n)_jor)†.

Note that while the matrix C^(n) is symmetric and the matrix A^(n) is diagonal, this is not in general the case for Ĉ^(n) and Â^(n). Let us assume that the network is in a stable balanced state and that the matrix Â^(n) has no zero elements.

Correlation Theorem 1: If J^(n)_jor does not have a Jordan block whose real part is a shift matrix, the n-th Fourier mode of the correlation matrix is O(1/N). (A shift matrix, S, of dimension P is a P × P matrix of the form S_µν = δ_µ+1,ν.)

Correlation Theorem 2: If J̃^(n) has at least one Jordan block whose real part is a shift matrix, the n-th Fourier mode of the correlation matrix is O(K^(P(n)−1)/N), where P(n) is the dimension of the largest block in J^(n)_jor whose real part is a shift matrix.

In Appendix B we also show how to extend these results to the case where Â^(n) has zero elements. The matrix U^(n) can be viewed as a transformation of the original network of D populations into a network of D effective populations. The condition that J̃^(n) has a Jordan block which is a shift matrix of size P can be interpreted as the existence of P effective populations whose effective interaction matrix is feedforward in its n-th Fourier mode. In other words, the original network exhibits a hidden feedforward structure, which is embedded in the n-th Fourier mode of its connectivity. Theorems 1 and 2 therefore imply that only when such a structure exists does the n-th mode of the correlation matrix increase with K. To know which elements in this matrix increase with K, one has to compute the matrices U^(n) and V^(n) (see Eq. (36)).
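In the real case, the block structure relevant to the two theorems can be read off the Jordan form symbolically. A sketch (example matrices are hypothetical; real shift blocks, i.e. zero-eigenvalue blocks, only):

```python
import sympy as sp

def largest_shift_block(M):
    """Size of the largest Jordan block with eigenvalue 0 (a shift block);
    returns 0 if the matrix has no zero eigenvalue."""
    _, J = sp.Matrix(M).jordan_form()
    best = cur = 0
    for k in range(J.rows):
        if J[k, k] == 0:
            cur += 1
            best = max(best, cur)
            # the block continues only if the superdiagonal entry is 1
            if k + 1 < J.rows and J[k, k + 1] != 1:
                cur = 0
        else:
            cur = 0
    return best

examples = {
    "explicit 3-chain":  [[0, 1, 0], [0, 0, 1], [0, 0, 0]],
    "partial structure": [[0, 1, 1], [0, 0, 0], [0, 0, -1]],
    "no structure":      [[-1, 2], [0, -2]],
}
for name, M in examples.items():
    P = largest_shift_block(M)
    print(name, P, "C = O(K^%d/N)" % (P - 1) if P > 1 else "C = O(1/N)")
```

Given an interaction mode, the size P of the largest shift block directly yields the predicted scaling K^(P−1)/N of that correlation mode.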

VI. APPLICATIONS OF THE CORRELATION THEOREMS
In this section we consider networks comprising D populations with connection probabilities

P^αβ_ij = (K/N) [1 + 2 f^(1)_αβ cos(θ^α_i − θ^β_j)].   (37)

VI.1. A two-population network with a hidden feedforward structure

Using Eq. (36), one finds that C^(1) increases with K when the interaction matrix is similar to a shift matrix, that is, when it is non-zero and nilpotent. For two populations this is the case when it can be written as

J̃^(1) ∝ [[ac, −a²], [c², −ac]],

where a, c > 0.
In this case the network has no explicit feedforward structure since all four interactions (EE, EI, IE, II) are spatially modulated. It has, however, a hidden feedforward structure, as revealed by the Jordan form of the interaction matrix.
For this network, the transformation matrices, U^(1) and V^(1), can be computed explicitly from the generalized eigenvectors of J̃^(1). We consider an example of such a network in Fig. 5. The parameters J_αβ and the external inputs are as in Figs. 2-4. Therefore, to leading order, the population averaged activities, m_α, the autocorrelations, A_α, and the population gains, g_α, are the same as in Figs. 2-4. The modulations of the connection probabilities, f^(1)_EE, f^(1)_EI, f^(1)_IE, f^(1)_II, are all non-zero (Fig. 5a) and tuned so that J̃^(1) is nilpotent: (J̃^(1))² = 0. The Jordan form of this matrix is graphically represented in Fig. 5b. The top panel in Fig. 5c depicts simulation results for the correlations in this network. They are all positive for close-by neurons and their spatial modulations increase approximately linearly with K (Fig. 5c, bottom). The top panel of Fig. 5d shows that most of the power in these correlations results from the element Ĉ12 (red). The latter increases linearly with K (Fig. 5d, bottom), while Ĉ11, Ĉ22 = O(√K/N) and Ĉ21 is two orders of magnitude smaller and does not exhibit significant change with K (note the log-log scale in this figure). These simulations are in quantitative agreement with Eq. (27) up to K ≈ 4000, and the deviations for larger K decrease with N (as in Fig. 4).
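A sketch checking the two key properties of this architecture (a, c and the remaining parameter values are hypothetical): the interaction mode [[ac, −a²], [c², −ac]] is nilpotent although all four entries are non-zero, and, with our Lyapunov-form reading of Eq. (26), the EE correlation mode grows approximately linearly with K.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Hidden feedforward interaction mode: all four entries non-zero, yet
# nilpotent (zero trace and zero determinant). a, c are hypothetical.
a, c = 1.0, 2.0
J1 = np.array([[a * c, -a * a], [c * c, -a * c]])
assert np.allclose(J1 @ J1, np.zeros((2, 2)))   # (J1)^2 = 0: shift block of size 2

# Its first correlation mode grows ~linearly with K (Theorem 2 with P = 2).
A, N = np.diag([0.16, 0.19]), 40000
for K in (100, 400, 1600):
    Jt = np.sqrt(K) * J1
    C = solve_continuous_lyapunov(Jt - np.eye(2), -(Jt @ A + A @ Jt.T) / N)
    print(K, round(N * C[0, 0], 2))
```

Quadrupling K roughly quadruples N C_EE, the signature of the hidden feedforward structure.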
The interaction matrix, J̃^(1), depends on the matrix J^(1) and on the population gains, g_E and g_I. If the connectivity matrix has an explicit feedforward structure, the interaction matrix also has such a structure, independently of the population gains. Thus, although changing the external inputs, I_E, I_I, modifies these gains, this does not destroy the feedforward structure and therefore does not change the scaling of the correlations with K and N. In contrast, in networks with a hidden feedforward structure, this scaling is sensitive to perturbations in the external inputs, since the hidden feedforward structure is destroyed unless the ratio g_E/g_I remains constant.

VI.2. Examples with three populations or more
The two networks depicted in Fig. 6 consist of one excitatory and two inhibitory populations. The positivity of the spatially averaged activities of these three populations, m^(0)_α (α = 1, 2, 3), constrains the parameters, J_αβ, through a set of inequalities (see Section III). We leave the calculation of these conditions to the reader.
In both cases, the first Fourier mode of the population averaged connectivity matrix has the form shown in the left panels of Figs. 6a-c, with J_12, J_13, J_23 < 0. In the network of Fig. 6a, J̃^(1) is nilpotent and its Jordan form is a single 3 × 3 shift matrix. This is graphically represented in Fig. 6a, middle panel.
In the network of Fig. 6b, the matrix J̃^(1) has a non-zero eigenvalue, λ < 0. Therefore, the interaction matrix, J̃^(1), is not nilpotent. Its Jordan form is

J^(1)_jor = [[0, 1, 0], [0, 0, 0], [0, 0, λ]].

The upper Jordan block is a shift matrix of degree 2. The corresponding feedforward structure is graphically represented in Fig. 6b (middle panel). According to Theorem 2, the first Fourier mode of the correlations is then O(K/N).

VII. CONSTRAINT ON SCALING OF THE NUMBER OF INPUTS WITH THE NETWORK SIZE
In this section we assume that K and N scale together,

K = O(N^γ),

with 0 < γ ≤ 1. Equation (26), which determines the correlations of the neuronal activities, is obtained under the Ansatz that the correlations between the inputs h^α_i(t) and h^β_j(t) are sufficiently small, namely o(1) (see Appendix A). This condition is more stringent than the balanced correlation equation, Eq. (23). It can be written as

K J^(n) (C^(n) + A/N) J^(n)† = o(1).   (41)

This condition constrains γ, as we now show. According to Theorem 1, if the Jordan form, J^(n)_jor, has no Jordan block whose real part is a shift matrix for any n, correlations will be O(1/N). This will also be the case for the correlations of the inputs, so that Eq. (41) is satisfied for any γ ≤ 1. If, for some n, J^(n)_jor contains a Jordan block whose real part is a shift matrix, we have to apply Theorem 2. In this theorem, P(n) is the dimension of the largest block in J^(n)_jor with a real part which is a shift matrix (see Section V). Let us denote by P_max the largest P(n) over all Fourier modes, i.e.,

P_max = max_n P(n).   (42)

In the Jordan basis, Eq. (41) yields

[J^(n)_jor (Ĉ^(n) + Â^(n)/N) (J^(n)_jor)†]_µν = o(1)

for all µ, ν. By definition of P_max, for at least one Fourier mode, n, the matrix J^(n)_jor has at least one block whose real part is a shift matrix of degree P_max. In general, there can be several such Jordan blocks. For example, in the network depicted in Fig. 7b, for which P_max = 2, there are two Jordan blocks with P = 2.
We first assume that all blocks which are a shift matrix of size P_max are real. Since for all such blocks the corresponding terms in J_jor Ĉ J†_jor scale in the same way with K, it is sufficient to consider the case where there is only one such block. We denote it by S, and by Ĉ_max the corresponding block in Ĉ. Equation (41) then yields

K [S Ĉ_max S†]_µν = o(1)

for all µ, ν. For instance, for P_max = 2, we have (see Eq. (B8)) (Ĉ_max)_22 = 0, and thus

S Ĉ_max S† = 0.

Therefore, for this block, Eq. (41) is always satisfied. The latter equation, however, also applies to other Jordan blocks and Fourier modes. This implies that γ < γ_max = 1. For P_max = 3, S Ĉ_max S† has a single non-zero element, which is O(K/N), so that Eq. (41) is satisfied only if γ < γ_max = 1/2. In general, for a P_max × P_max shift matrix, S Ĉ_max S† is O(K^(P_max−2)/N) = O(N^(γ(P_max−2)−1)). This implies that γ < γ_max with

γ_max = 1/(P_max − 1).
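The cancellation S Ĉ_max S = 0 for P_max = 2, and its failure for P_max = 3, can be checked numerically for a pure shift interaction mode (Lyapunov-form reading of Eq. (26); parameter values hypothetical):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def C_hat(P, K, N, A):
    # Correlation mode for a pure shift interaction, J_jor = sqrt(K) * S.
    S = np.diag(np.ones(P - 1), 1)     # shift matrix: S[i, i+1] = 1
    Jt = np.sqrt(K) * S
    C = solve_continuous_lyapunov(Jt - np.eye(P), -(Jt @ A + A @ Jt.T) / N)
    return C, S

N, K = 40000, 1600
for P in (2, 3):
    A = np.diag(np.full(P, 0.2))       # hypothetical autocorrelations
    C, S = C_hat(P, K, N, A)
    # ~0 for P = 2 (exact cancellation), O(K) for P = 3.
    print(P, np.abs(S @ C @ S.T).max() * N)
```

For P = 2 the doubly-shifted block vanishes identically, so only the other modes constrain γ; for P = 3 its largest element grows with K, which is what tightens the bound to γ < 1/2.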
Let us now consider networks in which there is at least one pair of complex conjugate Jordan blocks whose real parts are a shift matrix of size P_max. An example of such a network is depicted in Fig. 7d. The first Fourier mode of the population averaged connectivity matrix in this example is given in Fig. 7d, with a, b, c real and positive; the relevant eigenvalues are ±iω, with ω = √(ab). These complex conjugate blocks can in general be written as ±iωI + S, where I is the identity matrix of size P_max. The ±iω contributions cancel in the equation for the covariance of each block with itself, so that S Ĉ_max S scales with K as in the real case. As shown above, Eq. (41) is then satisfied only if γ < γ_max, with

γ_max = 1/(P_max − 1).

According to Theorem 2, if J^(n)_jor has a block whose real part is a shift matrix for at least one mode n, C = O(N^−α), where α = 1 − γ(P_max − 1). If γ < γ_max, correlations in the activity will decrease more slowly than 1/N when N is increased. If γ > γ_max, correlations will increase with N and the network will not operate in the balanced regime. Finally, if γ = γ_max, our theory gives O(1) correlations in the input, which is inconsistent with the Ansatz in Eq. (41). In this case substantial corrections to Eq. (26) should be taken into account. A different approach, similar to the one in [28], must be adopted to self-consistently determine these correlations.

VIII. DISCUSSION

VIII.1. Main results
We developed a theory for the emergence of correlations in strongly recurrent networks of binary neurons. Each neuron receives on average K inputs from each of D populations, with probabilities which are spatially modulated. The synaptic strengths scale as 1/√K and the network operates in the balanced regime [45]. For simplicity we considered networks with a one dimensional ring architecture with connection probabilities solely dependent on distance and on the nature (excitatory or inhibitory) of the pre- and postsynaptic populations. We present a balanced correlation equation, which, together with the balanced rate equation, defines the balanced regime and ensures that the mean inputs to the neurons and their fluctuations are both O(1). We derive a set of equations that determine the equilibrium values of the quenched averaged correlations and we show that the solution of these equations is stable provided that the solution of the balanced rate equations is stable.
Key results of our work are two correlation scaling theorems. The first shows that, generically, all the Fourier modes of the quenched averaged correlations are small when N and K are large. They are O(1/N) and, to leading order, independent of K. This is true in the large N limit even if we take K = pN, provided that p is not too large. The second theorem, however, states that there are recurrent network architectures in which some of the Fourier modes of the quenched averaged correlations increase with K. These architectures are characterized by an explicit, or a hidden, feedforward structure in those modes. This structure is revealed by the Jordan form of the interaction matrix averaged over realizations. If this Jordan form contains a block whose real part is a shift matrix of size P > 1, the corresponding mode of the correlations increases at least as O(K^(P−1)/N). Importantly, in these cases the network still operates in the balanced regime provided that K^(P−1) ≪ N.
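As a concrete illustration of how a hidden feedforward structure is exposed by the Jordan form, the following sketch (with hypothetical 2 × 2 mean interaction matrices; sympy is assumed available) checks whether the Jordan form of a matrix contains a shift-matrix block:

```python
import sympy as sp

# A hidden feedforward structure (hypothetical 2x2 mean interaction matrix):
# no explicit feedforward ordering, yet the Jordan form is a shift-matrix
# block of size P = 2, so Theorem 2 predicts amplified correlations.
J = sp.Matrix([[1, -1],
               [1, -1]])
T, Jjor = J.jordan_form()               # J = T * Jjor * T**-1
assert Jjor == sp.Matrix([[0, 1],
                          [0, 0]])      # a shift matrix
# A generic matrix, in contrast, is diagonalizable: Theorem 1 gives O(1/N)
T2, D = sp.Matrix([[1, -2],
                   [1, -1]]).jordan_form()
assert D.is_diagonal()
```

Here sympy's `jordan_form` plays the role of the similarity transformation to the Jordan basis used in Appendix B.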

VIII.2. Generality of the results
For simplicity, we assumed that synaptic weights depend only on the identities of the populations to which the pre- and postsynaptic neurons belong. The results, however, will not change if the weights are heterogeneous, with distributions that depend on the pre- and postsynaptic populations, as long as the mean and the variance of all these distributions are finite.
For notational simplicity we assumed that all populations have the same number of neurons, N , and each population receives, on average, inputs from K neurons in every population. The theory can be easily extended to networks in which population α has N α = ν α N neurons and the average number of connections from population β to population α is K αβ = κ αβ K. This will not affect the scaling of the correlations with N and K. Prefactors, however, will be different. For instance, assuming four times fewer inhibitory than excitatory neurons in the two-population networks of subsection IV.1, without changing the number of connections per neuron, the correlations of the inhibitory neurons will increase by a factor of 4.
We focused on networks with a one-dimensional ring architecture and connection probabilities which are solely distance dependent. This greatly simplifies the problem, because when averaged over realizations, correlations depend solely on distance. Furthermore, because of the linearity of the self-consistent equation for the correlations, the different Fourier modes decouple, allowing us to analyze each mode separately. However, our analytical approach does not require rotation invariance. It can be extended to any network architecture for which the Jordan normal form of the interaction matrix, averaged over realizations, can be established.
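The decoupling of the Fourier modes under rotation invariance can be checked concretely: a connectivity that depends only on distance on the ring gives a circulant matrix, and every Fourier mode is an eigenvector. A minimal numerical sketch, with a hypothetical Gaussian connection profile (not one of the paper's examples):

```python
import numpy as np

N = 64
# Hypothetical distance-dependent profile on a ring (Gaussian footprint, width 8)
dist = np.minimum(np.arange(N), N - np.arange(N))
profile = np.exp(-dist**2 / (2 * 8.0**2))
# Circulant matrix: entry (i, j) depends only on the distance between i and j
C = np.array([np.roll(profile, i) for i in range(N)])
# Each Fourier mode is an eigenvector; the eigenvalue is the DFT of the profile
n = 3
v = np.exp(2j * np.pi * n * np.arange(N) / N)
lam = np.fft.fft(profile)[n]
assert np.allclose(C @ v, lam * v)   # mode n decouples from all other modes
```

Because the profile is symmetric in the distance, the DFT coefficients are real, so each mode can be analyzed separately, as done in the main text.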

VIII.3. Robustness and self-consistency of the results
The theory presented here makes the Ansatz that correlations are sufficiently small that the dynamics of the cross-correlations can be linearized (Appendix A). If this Ansatz is correct, the theory is self-consistent. When the theory predicts correlations which are O(1), non-linear terms contribute and the correlations start to deviate from the theoretical value (see, for example, the simulation results for N = 40000 in Fig. 3b). Nevertheless, the order of the correlations is still correctly predicted.
In the large N, K limit, only networks with feedforward structures -hidden or explicit-can exhibit correlations that increase with the average number of inputs. However, when K and N are only moderately large, as is the case in biological systems, a strict tuning of the architecture is not necessary. This is because there is a crossover between the regimes of strong and weak correlations as K is increased (see Fig. 4c). As shown in Appendix B1, the value of K for which this crossover occurs depends on the eigenvalues of the interaction matrix.

VIII.4. Relation to previous works
Non-interacting neurons can exhibit correlations if they share feedforward inputs [57,58]. For instance, Litvak et al. [59] investigated a chain of layers of integrate-and-fire neurons lacking any recurrent interactions and coupled only in a feedforward manner. In their model, each neuron in a layer receives inputs from the same number of excitatory and inhibitory neurons in the previous layer, in such a way that their temporal averages exactly balance. They found a build-up of correlations along the chain. This is because the correlations induced by shared feedforward inputs are not suppressed during activity propagation: the network lacks any recurrent interactions and is thus purely feedforward.
Cortes and van Vreeswijk [60] studied a chain of strongly recurrent unstructured E-I subnetworks coupled through excitatory unstructured feedforward projections. They found a gradual build-up of correlations along the chain. These correlations, however, decrease if the connectivity, K, and the subnetwork size, N, increase together. This result is in agreement with our theory, which predicts that the correlations are O(1/N) throughout the whole chain.
In [43], Ginzburg and Sompolinsky considered networks of binary neurons with finite-temperature Glauber dynamics, unstructured, dense (K = pN) connectivity and weak interactions, i.e., of order O(1/K) rather than O(1/√K) as in our work. Mean-field theory shows that in these networks correlations are O(1/N) with a prefactor which diverges in the zero-temperature limit. This is in contrast to what happens in the strongly recurrent unstructured networks we considered here, where correlations also scale as O(1/N) but with a prefactor which remains finite even though we assumed zero-temperature Glauber dynamics. This is because in strongly recurrent networks, intrinsic noise emerges from the deterministic dynamics of the network.
They also demonstrated that in their model the correlations are amplified up to O(1) at the onset of a Hopf bifurcation [43]. At such an onset the dynamics exhibit critical slowing down, and this amplification is thus accompanied by a divergence of the decorrelation times. Our work demonstrates a different amplification mechanism: it occurs at a point where the Jordan form of the interaction matrix, J, contains a block which is a shift matrix. Since there is no critical slowing down at such a point, the decorrelation times are finite and on the order of the update time constant (data not shown). Our theory may thus account for substantial correlations with short time scales, as frequently observed in the brain [28,31,61].
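This difference can be illustrated with a linear surrogate model (a sketch, not the binary-network dynamics themselves): for dx/dt = (−I + √K S)x + ξ, with S a shift matrix and ξ unit white noise, all relaxation times equal the update time constant, yet the stationary variance grows linearly with K. Assuming scipy is available:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

K = 100
S = np.array([[0.0, 1.0],
              [0.0, 0.0]])              # shift matrix (feedforward block)
A = -np.eye(2) + np.sqrt(K) * S         # both eigenvalues equal -1: fast decay
# Stationary covariance Sigma of dx/dt = A x + xi solves A Sigma + Sigma A^T = -I
Sigma = solve_continuous_lyapunov(A, -np.eye(2))
print(np.linalg.eigvals(A))             # [-1. -1.]: no critical slowing down
print(Sigma[0, 0])                      # 25.5: variance amplified ~ K/4
```

The amplification here is purely non-normal: it comes from the shift-matrix structure, not from a slow eigenmode as at a Hopf onset.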
Renart et al. [28] and Helias et al. [49] investigated strongly recurrent unstructured networks with one excitatory and one inhibitory population of binary neurons which are densely connected, i.e., with K ∝ N. They found that in these networks mean pair correlations were O(1/N). As they only considered dense connectivity, however, they could not disentangle the dependence on N from the dependence on K. Here, we consider different relations between K and N and show that in unstructured networks the correlations are O(1/N) and, in practice, do not depend on K. This last result is remarkable, since one would expect correlations to increase with the degree of connectivity. This is not the case: the balance of excitation and inhibition prevents this from occurring in unstructured networks.
On the other hand, and somewhat surprisingly, we found that even if the fraction of common inputs shared by neurons is very small, a build-up of correlations can still occur for some network architectures. For example, in a network of four populations with a feedforward structure, correlations would be of O(K^3/N), and thus, to satisfy the balanced correlation equation, the scaling of K with the network size can be at most K = O(N^(1/3)). However, with this architecture and scaling, the probability that two neurons share a given input is O(N^(−4/3)) (Eq. (17)), and the expected number of inputs shared by two neurons is therefore O(N^(−1/3)). Thus, although the number of shared inputs goes to zero in the large N limit, correlations are amplified up to O(1) thanks to the feedforward architecture.
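A back-of-the-envelope check of this scaling arithmetic (purely illustrative numbers, not simulation results):

```python
# P = 4 feedforward example with the densest admissible scaling K = N**(1/3):
# the number of shared inputs per pair vanishes while the amplified
# correlations stay O(1).
for N in [10**6, 10**9, 10**12]:
    K = N ** (1 / 3)
    shared = K**2 / N        # expected number of inputs shared by a pair
    amplified = K**3 / N     # order of the amplified correlations
    print(f"N={N:.0e}: shared inputs ~ {shared:.1g}, correlations ~ O({amplified:.1g})")
```

As N grows, `shared` decreases as N^(−1/3) while `amplified` remains of order one.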
Rosenbaum et al. [39] have recently investigated how feedforward excitation can drive correlations in spatially structured E-I networks operating in the balanced regime. The specific architecture they considered is reminiscent of the particular example presented in Fig. 7a. In their study, the fluctuations that drove the correlated activity were those in the feedforward inputs; the contribution to the correlations of the fluctuations generated by the recurrent dynamics was neglected. In contrast, our work focuses on the role of the recurrent dynamics in the emergence of correlations. We recently studied the emergence of correlations in a network consisting of two strongly recurrent E-I subnetworks, the first projecting to the second with topographically organized feedforward connections [17]. The architecture in that work is also reminiscent of the example presented in Fig. 7a. We showed that while in the first subnetwork correlations were weak, O(1/N), in the second subnetwork the activity self-organized in such a way that correlations in macroscopic sub-populations of excitatory neurons were finite and did not depend on N and K when the latter were sufficiently large. This does not contradict our theory. In the architecture considered in [17], subsets of neurons in the first subnetwork project to a macroscopic fraction of neurons in the second subnetwork. This organization generates correlations between elements of the connectivity matrix, unlike in the models considered here. Generalizing our theory to such cases is possible but beyond the scope of this paper.

VIII.5. Directions for future work
The theorems of Section V tell us how the Fourier modes of the averaged correlations scale for large N and K.
In the examples considered in Sections IV, VI and VII, we focused on connectivities whose Fourier expansions involve only two modes. In these cases, the scaling of the spatial correlations can be immediately deduced from that of the Fourier modes. This will also be the case for interactions described by any finite number of Fourier modes. However, if the interactions are described by an infinite number of modes, inferring the scaling of the correlations from that of their modes can be more complicated. For instance, if one takes the large N limit with K ∝ N^γ, the convergence of the Fourier series may not be uniform in N. In that case the scaling of the spatial correlations with N may be highly non-trivial. We will address this issue in an upcoming paper.
The present paper focuses on locally averaged two-point correlation functions. Recent progress in experimental techniques will create large data sets of neuronal activity from which distributions of pairwise correlations can be extracted. The power of theoretical approaches to interpret such data will be greatly enhanced if they provide not only locally averaged correlations but also higher-order statistics of their distributions. It would thus be interesting to extend our approach to estimate the scaling of higher-order moments of the correlations in binary networks.
Several previous studies investigated E-I networks (e.g., [49,62]) in which inhibition and excitation were unstructured and their strengths were only a function of the presynaptic neurons, i.e., J_EE = J_IE and J_II = J_EI. Other studies assumed unstructured connectivity with J_αE = −J_αI, α ∈ {E, I} (e.g., [63]). In both cases the interaction parameters are on the edge of the region in which the network evolves towards the balanced state (see Eq. (11)). The network dynamics may be qualitatively different on the edge of this region than inside it. It is thus not clear that the scaling theorems presented here apply to these tuned cases. A further investigation of the correlation structure in such networks is a subject for future research.
Do the conclusions derived here for networks of binary neurons hold for networks with more realistic single-neuron dynamics? To approach this question, we performed extensive numerical simulations of strongly recurrent networks consisting of one inhibitory and one excitatory population of leaky integrate-and-fire (LIF) neurons. The detailed analysis of these simulations will be presented elsewhere. In brief, we found that in these networks the averaged pairwise correlations scale with K and N in a manner that is consistent with the theory presented here for binary networks. It would be very interesting to extend our analytical approach to this type of network.
To conclude, van Vreeswijk and Sompolinsky [45,46] investigated strongly recurrent networks of binary neurons with unstructured sparse connectivity. They showed how the dynamics of these networks evolve into a balanced state. Due to the sparseness of the connectivity, correlations are negligible in these networks. Subsequent studies [28,49] extended these results to unstructured networks with dense connectivity and found that here, too, the network dynamics evolve to a balanced state in which reverberations keep the correlations very small. In contrast, as shown here, in networks with structured connectivity correlations can be large. The theory presented here gives the conditions on the network architecture for it to evolve into a balanced state with strong correlations and shows how these correlations depend on the network size and the number of connections.

Appendix A: Dynamics of the correlations

The evolution of the equal-time cross-correlations is given by [1,28,43]. If we make the Ansatz that correlations are weak, we can, to leading order, take a gain which is independent of the correlations [28,43,49]. Thus, to leading order, the dynamics are linear in the correlations, with J̄^αβ_ij = g^α_i J^αβ_ij. Note that, for weak correlations, ⟨S^α_i(t)⟩_init does not depend on the correlations, so that ⟨S^α_i(t)⟩_init = ⟨S^α_i(t)⟩_t for our choice of initial conditions (⟨S^α_i(0)⟩_init = ⟨S^α_i(t)⟩_t). Hence, the equal-time autocorrelation a^α_i is independent of time; it is given in terms of J̄^αβ(Δ) = √K g^α J^αβ f^αβ(Δ). Here we have assumed that the correlations in the quenched disorder in the inputs to the neurons are small, so that, to leading order, the expected value of the gain does not depend on the neuronal position. We comment on this Ansatz in Appendix C.
Thus, for large N, the quenched average of Eq. (A1) yields a closed set of equations for the averaged correlations. Here we have not taken into account that c^αα_ii(t) = 0. It is easy to see, however, that in the cross-correlations this neglects only O(1/N^2) corrections. We show below that these corrections are indeed negligible in the large N limit.
The n-th Fourier mode of C^αβ(Δ, t), C^(n)_αβ(t), evolves according to an equation involving the n-th Fourier mode of the interactions, J^(n). This can be written more compactly in matrix form, where X denotes the D × D matrix with elements X^αβ and A is the corresponding source matrix. The equilibrium values of the Fourier components of the equal-time correlation functions thus satisfy Eq. (26).

Appendix B: Correlation theorems
In Appendix A we derived N^2 D^2 coupled linear differential equations that determine the evolution of the equal-time cross-correlations for all neuronal pairs in the network. Averaging these equations over the quenched disorder and using the rotation invariance of the connection probabilities yields a set of N D^2 coupled equations for the quenched averaged correlations. In Fourier space, these equations lead to N independent sets of D^2 coupled equations for the correlations. Here we prove Theorems 1 and 2 (see Section V), which state how these correlations scale with N and K.
To leading order, A is independent of K and N, but J^(n) is proportional to √K. Accordingly, we define J̄^(n) = J^(n)/√K, and we can rewrite the evolution equation of the n-th mode of the correlations in terms of J̄^(n). The D × D matrix J̄^(n) can be brought, by a similarity transformation, to its Jordan normal form J̄^(n)_jor, with elements [J̄^(n)_jor]_μν = λ^(n)_μ δ_μν + ε^(n)_μ δ_ν,μ+1, where λ^(n)_μ are the eigenvalues of J̄^(n) and ε^(n)_μ = 1 inside a Jordan block and 0 otherwise (for clarity, in the Jordan basis we use the subscripts μ and ν rather than the α and β used in the original basis).
Importantly, the Jordan forms of a matrix and of its Hermitian conjugate are complex conjugates of one another. We can thus write the evolution equations in the Jordan basis. For notational convenience, we will suppress the superscript (n) in the rest of this Appendix. We define Ĉ as the matrix of correlations in the Jordan basis, Λ_μν = 2 − √K(λ_μ + λ*_ν), and Â as the source matrix A transformed to the Jordan basis. We now assume that the connectivity is such that the system is stable to perturbations of the locally averaged rates. This implies that the real parts of all the eigenvalues of J̄ are less than 1/√K. The real part of Λ_μν is therefore positive, and thus Ĉ(t) converges to an equilibrium value, Ĉ^∞. To see this, first consider Ĉ_D,1(t), which obeys a closed equation and therefore converges to its equilibrium value. The evolution equation of Ĉ_D,2 can be written in terms of a drive M_D,2 which depends on Ĉ_D,1(t), Â_D,1 and Â_D,2. Since Ĉ_D,1(t) converges to its equilibrium value, M_D,2 converges to a constant. Because Re(Λ_D,2) > 0, Ĉ_D,2 also converges to its equilibrium value, Ĉ^∞_D,2. A similar argument shows that Ĉ_D−1,1 likewise converges to Ĉ^∞_D−1,1. One then sees by recursion that the whole matrix Ĉ converges to its equilibrium value, Ĉ^∞. Hence, Eq. (26) determines the stable equilibrium values of the correlations.
From here we only consider the correlations at equilibrium and, for notational simplicity, we drop the superscript ∞.
In the Jordan basis, the equilibrium values of the correlations satisfy Eq. (B3). Let us first consider the case where J̄ is diagonalizable, so that ε_μ = 0 for all μ. In this case Ĉ_μν = 0 for μ ≠ ν, and Ĉ_μμ = Â_μμ/Λ_μμ. Whether λ_μ + λ*_μ is O(1) or vanishes, Ĉ_μμ = O(1/N). Let us now assume that J̄ is not diagonalizable. Then the D × D Jordan form of J̄ consists of B Jordan blocks (1 ≤ B < D), which we denote by [J̄_jor]_i, i = 1, ..., B. The size of the i-th block will be denoted by s(i) × s(i). Without loss of generality, we can assume that the blocks are ordered by increasing size.
The indices μ and ν of the elements of the Jordan block i take values between l(i) and h(i), with l(i) = 1 + Σ_{j=1}^{i−1} s(j) and h(i) = Σ_{j=1}^{i} s(j). All the diagonal elements of this Jordan block are equal to one of the eigenvalues of J̄, which we denote by λ̄_i. The off-diagonal elements, ε_μ, are all equal to 1, except for μ = h(i) (i = 1, ..., B − 1), for which ε_μ = 0.
Let us consider Eq. (B3) for μ, ν in sector S_ij. Since ε_h(i) and ε_l(j)−1 are zero, the equation in this sector does not depend on elements of Ĉ outside of it. Thus, we can solve Eq. (B3) recursively to determine all the elements of Ĉ in this sector. The recursion goes as follows. First, one solves for Ĉ_h(i),l(j). We can then solve Eq. (B3) to get Ĉ_μν for (μ, ν) = (h(i) − 1, l(j)) and (μ, ν) = (h(i), l(j) + 1). This process is repeated until all the elements of Ĉ in the sector S_ij are determined. For λ̄_i + λ̄*_j = O(1), the recursion shows that Ĉ_μν = O(1/N) for all μ, ν in sector S_ij. If λ̄_i + λ̄*_j = 0, Ĉ_h(i),l(j) = O(1/N), and by recursion all the other elements in sector S_ij are Ĉ_μν = O(K^(ℓ/2)/N), where ℓ = ν − μ + h(i) − l(j). Thus, in this case Ĉ_l(i),h(j) is the largest entry in the sector. It satisfies

Ĉ_l(i),h(j) = O(K^((s(i)+s(j)−2)/2)/N).    (B9)

The scaling of the elements in sector S_ij thus depends on λ̄_i + λ̄*_j. The stability of the dynamics imposes that Re(λ̄_i) < 0 for all blocks (or positive but at most O(1/√K); see the next subsection), and thus the condition λ̄_i + λ̄*_j = 0 implies that, for large K, the real parts of λ̄_i and λ̄_j are zero. In other words, the real parts of [J̄_jor]_i and [J̄_jor]_j are shift matrices of sizes s(i) × s(i) and s(j) × s(j), respectively. For the sector S_ii this means that the real part of [J̄_jor]_i is a shift matrix of size s(i) × s(i).
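The scaling in Eq. (B9) can be checked numerically for a single sector. As a sketch, we assume the equilibrium condition reduces within the sector to 2Ĉ − √K(SĈ + ĈSᵀ) = Â/N, with S a shift matrix of size P; this is an illustrative reduction, not the full Eq. (B3). Solving this Sylvester equation with scipy:

```python
import numpy as np
from scipy.linalg import solve_sylvester

P, N = 3, 10**8
S = np.diag(np.ones(P - 1), k=1)         # shift matrix of size P
A = np.ones((P, P))                      # illustrative O(1) source term

def corr(K):
    # Solve (2I - sqrt(K) S) C + C (-sqrt(K) S^T) = A / N for C
    return solve_sylvester(2 * np.eye(P) - np.sqrt(K) * S,
                           -np.sqrt(K) * S.T, A / N)

ratio = np.max(np.abs(corr(4 * 10**4))) / np.max(np.abs(corr(10**4)))
print(ratio)   # close to 4**(P - 1) = 16, i.e. max |C| ~ K**(P-1)/N
```

Quadrupling K multiplies the largest entry of Ĉ by nearly 4^(P−1), as predicted by the K^(P−1)/N scaling, with O(1/√K) corrections.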
Proving Correlation Theorem 1 is now straightforward. If the real parts of all B Jordan blocks of J̄ differ from a shift matrix, then λ̄_i + λ̄*_j ≠ 0 in all sectors, and all the elements of Ĉ are O(1/N). So far, we have assumed that all the elements of Â are nonzero. The derivation can, however, be extended to situations where this is not the case, as follows.
Elements Â_μν = 0 in a sector where λ̄_i + λ̄*_j = O(1) might result in some elements of Ĉ in that sector being equal to 0 rather than O(1/N). However, this does not affect the overall scaling of the correlations.
Elements Â_μν = 0 in a sector S_ij where λ̄_i + λ̄*_j = 0 do not change the scaling in the matrix, provided that Â_h(i),l(j) ≠ 0. If Â_h(i),l(j) = 0, the effect depends on Â_h(i)−1,l(j) and Â_h(i),l(j)+1. If at least one of them is nonzero, the order of the largest entry in the sector is decreased by a factor of √K. If both of them are zero, the order of this entry decreases at least by a factor of K. Repeating this argument for further elements being zero, one reaches the following conclusion. Let {μ_ij, ν_ij} be the indices {μ, ν} in S_ij for which Â_μν ≠ 0 that maximize μ − ν. Then the maximal order of Ĉ_μν in the sector is O(K^(P_ij − 1)/N), with P_ij given by Eq. (B10).

In conclusion, the highest order in Ĉ is O(K^(P−1)/N), where P is the maximum value of the P_ij (i, j ∈ {1, ..., B}) defined by: 1) P_ij = 1 for sectors S_ij for which λ̄_i + λ̄*_j ≠ 0; 2) P_ij = (s(i) + s(j))/2 for sectors S_ij in which λ̄_i + λ̄*_j = 0 and all Â_μν ≠ 0; 3) P_ij given by Eq. (B10) if in sector S_ij, λ̄_i + λ̄*_j = 0 but some Â_μν are zero.

Here T^(n) = Tr J̄^(n) and Δ^(n) = det J̄^(n). After some algebra, this equation can be rewritten as Eq. (27).