Spontaneous and stimulus-induced coherent states of critically balanced neuronal networks

How the information microscopically processed by individual neurons is integrated and used in organizing the behavior of an animal is a central question in neuroscience. The coherence of neuronal dynamics over different scales has been suggested as a clue to the mechanisms underlying this integration. Balanced excitation and inhibition may amplify microscopic fluctuations to a macroscopic level, thus providing a mechanism for generating coherent multiscale dynamics. Previous theories of brain dynamics, however, were restricted to cases in which inhibition dominated excitation and suppressed fluctuations in the macroscopic population activity. In the present study, we investigate the dynamics of neuronal networks at a critical point between excitation-dominant and inhibition-dominant states. In these networks, the microscopic fluctuations are amplified by the strong excitation and inhibition to drive the macroscopic dynamics, while the macroscopic dynamics determine the statistics of the microscopic fluctuations. Developing a novel type of mean-field theory applicable to this class of interscale interactions, we show that the amplification mechanism generates spontaneous, irregular macroscopic rhythms similar to those observed in the brain. Through the same mechanism, microscopic inputs to a small number of neurons effectively entrain the dynamics of the whole network. These network dynamics undergo a probabilistic transition to a coherent state, as the magnitude of either the balanced excitation and inhibition or the external inputs is increased. Our mean-field theory successfully predicts the behavior of this model. Furthermore, we numerically demonstrate that the coherent dynamics can be used for state-dependent read-out of information from the network. These results show a novel form of neuronal information processing that connects neuronal dynamics on different scales.


I. INTRODUCTION
The cerebral cortex and hippocampus, the areas believed to be the origin of the versatile intelligent functionality of the mammalian brain, exhibit characteristic activities on two different scales. On the microscopic scale, neurons in these areas display various temporal patterns of firing activities in response to external stimuli or to being driven internally. These activities are correlated with fine features of the information the animal is processing [1][2][3][4][5]. On the macroscopic scale, electroencephalograms (EEGs) and measurements of local-field potentials (LFPs) have revealed a diverse range of rhythmic activities. These vary in both frequency and amplitude, but they are clearly correlated with the behavioral states of the animal, such as its attention and arousal levels [6][7][8][9]. Furthermore, in recent years, increasing numbers of experimental results have suggested that coherence of activities on these two scales (namely, the degree of temporal cross-correlation among the activities) is finely controlled, reflecting the mechanisms underlying the binding of sensory stimuli, sensorimotor coordination, and learning in behavioral tasks [10][11][12][13][14][15][16][17][18].
How patterns emerge in multiscale dynamics in highly non-linear and non-equilibrium regimes has been a subject of active research in statistical physics. From this perspective, understanding the multiscale dynamics in the brain and their coherence can be considered as a challenge in statistical physics. Physicists have thus far constructed various models of the activities in the brain and have investigated those models both numerically and theoretically. In particular, a mean-field theory (MFT) of randomly connected neuronal networks (RNNs) has provided a solid theoretical foundation that allows us to investigate neuronal dynamics using analytical methods similar to those employed for spin-glass systems [19]. To enhance its applicability, different versions of the theory have been developed for different models, ranging from simple networks of neurons described by one-dimensional firing-rate variables to structured networks of neurons described by binary spike variables or more realistic kinetic variables of biological membranes [20][21][22][23][24][25][26][27][28][29][30][31][32][33][34]. These studies have theoretically shown that their RNNs have dynamical phases with different characteristics, such as chaotic fluctuations in firing rates, asynchronous irregular firing, and regular and irregular synchronized firing [19,25,27,35]. Efficient computation that takes advantage of the dynamical properties of RNNs has also been investigated recently, in both biological and engineering contexts [36][37][38][39][40][41][42][43][44].
A primary constraint in modeling neuronal dynamics of cortical areas is the fact that neurons in a local circuit densely form synapses on one another, and that these synapses obey "Dale's law", a principle that prohibits neurons from forming both excitatory and inhibitory synapses. Despite the knowledge about RNNs and MFTs accumulated over decades, only recently have researchers successfully begun to develop MFTs for networks under these two constraints. In RNNs comprising N (≫ 1) neurons with these two constraints, O(N) excitatory and inhibitory neurons are connected with a fixed O(1) probability. As is usually the case for physical systems with random couplings, non-trivial dynamics are observed for synaptic strengths with O(1/√N) standard deviation. Dale's law then requires the means of the excitatory and inhibitory synaptic strengths to be +O(1/√N) and −O(1/√N), respectively. As a result, neurons receive very strong excitatory and inhibitory recurrent inputs from other neurons. By extending a previous theory [24], recent work showed that, as observed experimentally [45][46][47][48][49], feedback inhibition balances strong excitatory recurrent inputs to neurons and stably produces an asynchronous and irregular firing state in such a model [27]. Although this model did not have non-trivial population dynamics, two very recent studies [50,51] investigated spatially extended versions of such balanced RNNs in the presence of external inputs, reporting multiscale dynamics in which macroscopic spatiotemporal patterns and microscopic irregular firing of individual neurons coexisted.
Although these studies have successfully demonstrated the relationship between spatial structures and multiscale dynamics of balanced neuronal networks, there remains a fundamental issue that stems from a limitation inherent in these models. In the previous models, a change in the population activity caused by a small number of constituent neurons is quickly counter-balanced by the feedback-inhibition mechanism, resulting in only vanishingly small responses in the population dynamics [see further discussion in Sec.V C]. Presumably, this is the reason why the previous studies required a network-wide application of external inputs (that is, an extrinsic origin) to induce multiscale dynamics. The vanishingly small responses of their models, however, contrast with recent experimental results that a weak stimulation of a small number of excitatory neurons effectively evokes a population response within the local circuit, suggesting an intrinsic origin of multiscale dynamics [52,53]. This discrepancy may imply a fundamental difference between the nature of the dynamics of the previous models and those of the neuronal circuits investigated experimentally. Theoretically, the weak effects of the stimulation of neurons in the previous models originate from the fact that a set of population statistics of neuronal activities follows equations that are closed among themselves [51]. This implies that the time evolution of those population statistics is independent of the microscopic fluctuations in the activities of individual neurons; namely, that there exists a separation of scales. From a general point of view in statistical mechanics, finding such a separation of scales is a common step in constructing an MFT. The description of intrinsically generated, multiscale neuronal dynamics, however, requires a theory without such a separation of scales.
Although microscopic fluctuations are known to evoke very large responses in systems in critical states, the previous theories of critical phenomena are not of immediate use for this purpose, because the averages of critical fluctuations over time and population are still vanishingly small in those theories [54]. Thus, despite the observation of critical responses in the brain on both the microscopic and macroscopic scales [55], it remains theoretically unclear how the critical responses of neurons are reflected in the population dynamics.
In this study, we present a solution to this fundamental issue by constructing a novel type of MFT for densely connected RNNs with Dale's law, for which mean synaptic weights are set to critical values between those for excitation-dominant and inhibition-dominant states. In this theory, unlike the previous theories of critical dynamics, we show that fluctuations in individual neuronal activities are amplified by the strong excitation and inhibition to provide stochastic driving forces for the population dynamics. We also show that external inputs to O(√N) neurons effectively entrain the whole network, comprising N excitatory and N inhibitory neurons, through the same amplification mechanism. Then, we observe that the network dynamics undergo a transition from irregular dynamics to coherent dynamics as the magnitude of either the balanced excitation and inhibition or the external inputs is increased. The transition to a coherent state is found to be strongly dependent on the configuration of the random connectivity. These phenomena are predicted by our MFT, which yields good quantitative agreement with direct numerical results. Numerical results further suggest that such coherent dynamics can be used for reading out information from the network in a state-dependent manner. Although, for the sake of mathematical clarity, our theory is derived for a network of simplified neurons described by firing-rate variables, we confirm numerically that similar multiscale dynamics arise in a network of leaky integrate-and-fire (LIF) neurons.

II. MODEL
Our theory is formulated for a single pair of excitatory and inhibitory neuronal populations (denoted by index k = E, I, respectively), each of which consists of N neurons [Fig.1(a)]. The i-th neuron of population k is described by a single, real-valued dynamical variable h_i^{(k)}(t), which obeys

  (d/dt) h_i^{(k)}(t) = −h_i^{(k)}(t) + Σ_{ℓ=E,I} Σ_{j=1}^{N} J_{ij}^{kℓ} φ(h_j^{(ℓ)}(t)) + I_i^{(k)}(t).  (1)

In these equations, the function φ is the hyperbolic tangent function, which describes the sigmoidal response of the neurons. The quantities J_{ij}^{kℓ} and I_i^{(k)}(t) denote the strengths of the recurrent synapses on, and the external input to, the neuron. The synaptic strengths are independently and identically distributed (i.i.d.) quenched random variables, and their means and standard deviations are parameterized as g_{kℓ}/√N and σ_0/√N, respectively:

  J_{ij}^{kℓ} = g_{kℓ}/√N + (σ_0/√N) J̃_{ij}^{kℓ}.  (2)

Note that the means, but not the standard deviations, depend on the populations to which the presynaptic and postsynaptic neurons belong [Fig.1(a)]. To describe this randomness, we have used the i.i.d. quenched random variables J̃_{ij}^{kℓ} with zero mean and unit variance in Eq. (2). Unless otherwise stated, we consider the following critical choice of the means throughout this study:

  g_{EE} = g_{IE} = g_0, g_{EI} = g_{II} = −g_0,  (3)

with a parameter g_0 ≥ 0. For I_i^{(k)}(t) ≡ 0, this model is equivalent to the classical model investigated by a previous study [19]. For the case with external inputs, we apply a common input only to a small subset S of excitatory neurons,

  I_i^{(k)}(t) = I(t) for (k, i) ∈ S, and 0 otherwise,  (4)

where we define S def= {(k, i) | k = E, 1 ≤ i ≤ √N}. Neurons in cortical areas have been thought to receive such sparse inputs [56,57]. Previous experiments showed that inputs to a small number of cortical neurons can drive the whole local circuit [52,53]. We model these neuronal responses.
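As a concrete illustration of Eqs. (1)-(3), the following minimal Python sketch simulates the untuned network with the Euler method. The parameter values (N, g_0, σ_0, the time step, and the initial condition) are arbitrary choices for illustration and are not taken from the study.

```python
import numpy as np

# Minimal Euler simulation of Eqs. (1)-(3); all parameter values are illustrative.
rng = np.random.default_rng(0)
N, g0, sigma0 = 200, 1.5, 1.2
dt, T = 0.05, 1000                               # time step and number of steps

# Mean couplings g_kl / sqrt(N); rows index the postsynaptic population (E, I),
# columns the presynaptic one. Eq. (3): g_EE = g_IE = g0, g_EI = g_II = -g0.
G = np.array([[g0, -g0],
              [g0, -g0]]) / np.sqrt(N)
# Full weights, Eq. (2): mean part plus zero-mean disorder of std sigma0/sqrt(N).
J = np.kron(G, np.ones((N, N))) \
    + (sigma0 / np.sqrt(N)) * rng.standard_normal((2 * N, 2 * N))

h = 0.1 * rng.standard_normal(2 * N)             # state vector (h^E, h^I)
m_trace = np.empty(T)
for t in range(T):
    h += dt * (-h + J @ np.tanh(h))              # Eq. (1) with I = 0
    m_trace[t] = h[:N].mean()                    # mean activity of the E population
```

For g_0 = 0 this reduces to the classical model of [19]; increasing g_0 allows the mean activity (m_trace) to fluctuate macroscopically, as discussed in Sec. IV.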
In addition to the above model, we also study a model obeying the same dynamical equation as Eq.(1) with the following additional tuning of the synaptic weights:

  Σ_{1≤j≤N} J̃_{ij}^{kℓ} = 0 for every i, k, and ℓ.  (5)

Note that the sum of the random variations in the weights of the excitatory and inhibitory synapses on each neuron is thereby finely tuned to zero. The weight matrix J_{ij}^{kℓ} is known to have a finite number of outlier eigenvalues for the untuned synaptic weights obeying Eq.(2), even in the limit of infinitely large N, but not for the finely tuned synaptic weights obeying Eq.(5) [58,59]. We observe how this qualitative difference is reflected in the dynamics of the model.
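One possible way to realize the fine tuning of Eq. (5) numerically is to subtract, within each of the four population blocks, the row mean of the disorder part, so that Σ_j J̃_ij^{kℓ} = 0 holds exactly. The sketch below (with illustrative parameters) does this and also computes the spectral radii of the tuned and untuned weight matrices, which is one way to look numerically for the outlier eigenvalues discussed above.

```python
import numpy as np

# Fine tuning of Eq. (5) by per-block row-mean subtraction; parameters illustrative.
rng = np.random.default_rng(1)
N, g0, sigma0 = 200, 1.5, 1.2
G = np.array([[g0, -g0], [g0, -g0]]) / np.sqrt(N)

Jt = rng.standard_normal((2 * N, 2 * N))     # untuned disorder, zero mean, unit var
Jt_tuned = Jt.copy()
for rows in (slice(0, N), slice(N, 2 * N)):
    for cols in (slice(0, N), slice(N, 2 * N)):
        block = Jt_tuned[rows, cols]         # view into one population block
        block -= block.mean(axis=1, keepdims=True)   # now sum_j Jt_ij^{kl} = 0

W_untuned = np.kron(G, np.ones((N, N))) + (sigma0 / np.sqrt(N)) * Jt
W_tuned = np.kron(G, np.ones((N, N))) + (sigma0 / np.sqrt(N)) * Jt_tuned

# Spectral radii: the untuned matrix may show outlier eigenvalues beyond the
# bulk radius ~ sigma0 * sqrt(2), while the tuned one should not [58,59].
r_untuned = np.abs(np.linalg.eigvals(W_untuned)).max()
r_tuned = np.abs(np.linalg.eigvals(W_tuned)).max()
```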

III. MEAN-FIELD THEORY

A. Model with finely tuned synaptic weights
In this section, we formulate an MFT for the models described above. The detailed derivation is given in Appendix E. The MFT is slightly simpler for the finely tuned model obeying Eqs.(1), (3), and (5) with I_i^{(k)}(t) ≡ 0 than for the untuned model. Therefore, we first consider the dynamics of the finely tuned model.
Following a similar analysis to the previous one [32], we divide the dynamics of the model into macroscopic and microscopic parts. Let m_k(t) be certain averages of the neuronal variables h_i^{(k)}(t) for k = E, I. The microscopic deviations from these averages are defined as

  δh_i^{(k)}(t) = h_i^{(k)}(t) − m_k(t).  (6)

Similarly, we decompose the outputs of the neurons into macroscopic and microscopic parts as

  φ(h_i^{(k)}(t)) = φ_k(t) + δφ_i^{(k)}(t).  (7)

The precise definitions of m_k and φ_k will be stated below.
Here, we note that m_k and φ_k coincide with the averages of h_i^{(k)} and φ(h_i^{(k)}), respectively, over the population k up to the leading order in N.
With these decompositions, we can rearrange the model equations into the following form in the large-N limit:

  (d/dt) m_k(t) = −m_k(t) + (g_0/√N) Σ_{j=1}^{N} [δφ_j^{(E)}(t) − δφ_j^{(I)}(t)],  (8)
  (d/dt) δh_i^{(k)}(t) = −δh_i^{(k)}(t) + η_i^{(k)}(t).  (9)

The first of the above equations with a suitable initial condition defines m_k. The configuration of the random synaptic weights does not change during time evolution. In the framework of MFT, however, we consider the distributions of the microscopic variables, δh_i^{(k)} and δφ_j^{(ℓ)}, over a large ensemble of networks with different weight configurations. Therefore, the time evolution of these variables is stochastic. In the stochastic dynamics, the driving-force term on the right-hand side of Eq.(9),

  η_i^{(k)}(t) = (σ_0/√N) Σ_{ℓ=E,I} Σ_{j=1}^{N} J̃_{ij}^{kℓ} δφ_j^{(ℓ)}(t),  (10)

has the following property: given the entire time evolution of the mean activities m_E and m_I, the conditional probability distribution of (η_i^{(k)}(t_1), η_i^{(k)}(t_2), · · · ), for any finite set of time points t_1, t_2, · · · , is a zero-mean Gaussian that is i.i.d. with respect to the index i. This is intuitively justified by the central limit theorem applied to Eq.(10), under the assumption that the random synaptic weights and fluctuations in the neuronal outputs are almost independent [see Appendix E for further justification]. Furthermore, since the driving-force term in the linear equation (9) has a conditionally i.i.d. Gaussian distribution with zero mean, so also do the fluctuations in {δh_i^{(k)}(t)}. Then, the following first- and second-order moments fully characterize these Gaussian fluctuations:

  φ_k(t) = ⟨φ(m_k(t) + δh_i^{(k)}(t))⟩,
  D_k(t, s) = ⟨δh_i^{(k)}(t) δh_i^{(k)}(s)⟩,
  C_k(t, s) = ⟨δφ_i^{(k)}(t) δφ_i^{(k)}(s)⟩,  (11)

where the brackets denote averages over the Gaussian fluctuations. Note that φ_k is defined in the above. The correlation function for η_i^{(k)} is obtained from C_ℓ(t, s):

  ⟨η_i^{(k)}(t) η_i^{(k)}(s)⟩ = σ_0^2 Σ_{ℓ=E,I} C_ℓ(t, s).  (12)

Below, we omit the population index k because the excitatory and inhibitory populations have the same statistical properties in our model setting. Thus, we have

  ⟨η_i(t) η_i(s)⟩ = 2σ_0^2 C(t, s).  (13)

The statistics of the fluctuations defined above must satisfy certain consistency conditions. First, m and φ have been defined from dynamical variables that evolve under the influences of m and φ themselves [Eqs. (8), (9), and (11)].
Therefore, their values need to be determined in a self-consistent manner. Second, the two Gaussian fluctuations characterized by (m, D) and (φ, C) are related to each other because they originate from the same dynamical variables. Consistency among these statistics gives rise to self-consistent equations that determine the time evolution of φ, C, and D for given values of m and boundary conditions. Firstly, φ and C are represented as nonlinear functions of m and D [see Eqs.(F5) and (F6) in Appendix F for the details]:

  φ(t) = G_1(m(t), D(t, t)),
  C(t, s) = G_2(m(t), m(s), D(t, t), D(s, s), D(t, s)).  (14)

Secondly, the relation between η_i^{(k)} and δh_i^{(k)} results in the following dynamical equation:

  (∂/∂t + 1)(∂/∂s + 1) D(t, s) = 2σ_0^2 C(t, s).  (15)

An important difference between our MFT and conventional MFTs lies in the macroscopic driving force in Eq. (8). The right-hand side of this equation involves the summation of fluctuations in the outputs of individual neurons, which can be considered as independent random quantities with correlation C(t, s). Then, the sum of these quantities in Eq. (8),

  η(t) = Σ_{ℓ=E,I} g_{kℓ} [√N φ_ℓ(t) + (1/√N) Σ_{j=1}^{N} δφ_j^{(ℓ)}(t)]
       = (g_0/√N) Σ_{j=1}^{N} [δφ_j^{(E)}(t) − δφ_j^{(I)}(t)],  (16)

is also a random quantity (which is equal for k = E, I).
From the first line to the second line, we have used φ_E(t) = φ_I(t). This stochasticity contrasts starkly with the deterministic macroscopic dynamics in conventional MFTs. The central limit theorem implies that η obeys the following probability distribution:

  P[η] = F^{−1} exp[−(1/2) η^T (2g_0^2 C)^{−1} η],  (17)

where the normalization term F is given by

  F = det(4π g_0^2 C)^{1/2}.  (18)

Here, we have regarded η and C as a vector and a matrix, respectively, that consist of their values for infinitesimally discretized timesteps. With these stochastic dynamics for η, the macroscopic dynamics in Eq.(8) reads

  (d/dt) m(t) = −m(t) + η(t).  (19)

We note that, for a given history of η and m up to time t, the conditional distribution of η(t + ∆t) determined by Eq.(17) with small ∆t is approximately Gaussian. In this case, in the conditional distribution up to time t + ∆t, the correlation matrix C and normalization term F are independent of η(t + ∆t) up to the leading order in ∆t. Therefore, the deviations of the conditional probability distribution from a Gaussian distribution are negligibly small. This fact enables us to solve the stochastic dynamics numerically for η, m, φ, C, and D by iteratively updating their values with the Euler method [see Appendix F for the details].
In sum, the microscopic fluctuations in the neuronal activities obey a Gaussian distribution that depends on the mean activity m [Eqs. (14) and (15)], and the probability of realizing the mean activity m depends on the correlation matrix C of the microscopic fluctuations [Eqs. (17) and (19)]. Due to this strong link between the microscopic and macroscopic dynamics, the entire dynamics are, in general, non-Gaussian, even though the distribution of η resembles a Gaussian distribution [Eq. (17)]. This means that-unlike in conventional MFTs-a solution for the model cannot be completely determined by the first-and second-order moments.
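The iterative solution scheme can be sketched in simplified form. For illustration only, the self-consistent correlation C(t, s) is replaced here by a fixed exponential stand-in kernel; in the actual algorithm of Appendix F, C and D would be updated from Eqs. (14) and (15) at every step. Sampling η sequentially from its conditional Gaussian distribution is equivalent to multiplying the Cholesky factor of its covariance by i.i.d. innovations.

```python
import numpy as np

# Sketch of the Euler scheme for Eq. (19): dm/dt = -m + eta, with eta a zero-mean
# Gaussian process of covariance 2*g0^2*C(t,s). Here C is a fixed exponential
# kernel, a stand-in for the self-consistent solution of Eqs. (14)-(15).
rng = np.random.default_rng(2)
g0, dt, T = 1.5, 0.05, 200
ts = np.arange(T) * dt
K = 2 * g0**2 * np.exp(-np.abs(ts[:, None] - ts[None, :]))   # covariance of eta

# Lower-triangular Cholesky factor: row t expresses eta(t) through the
# innovations z[0..t], i.e. sequential conditional-Gaussian sampling.
L = np.linalg.cholesky(K + 1e-10 * np.eye(T))
eta = L @ rng.standard_normal(T)

m = np.zeros(T)
for t in range(T - 1):
    m[t + 1] = m[t] + dt * (-m[t] + eta[t])                  # Euler update, Eq. (19)
```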

B. Balance equations
In Eq.(16), we removed the population-averaged part by using φ_E(t) = φ_I(t). Before using this relation, the macroscopic part of the mean-field equations reads, to leading order,

  (d/dt) m_E(t) = −m_E(t) + √N [g_EE φ_E(t) + g_EI φ_I(t)],  (20)
  (d/dt) m_I(t) = −m_I(t) + √N [g_IE φ_E(t) + g_II φ_I(t)].  (21)

In these equations, the driving-force terms on the right-hand side are O(√N) and hence may diverge. Thus, the following conditions must hold for stable dynamics:

  g_EE φ_E(t) + g_EI φ_I(t) = 0,
  g_IE φ_E(t) + g_II φ_I(t) = 0.  (22)

Eqs. (22) are called "balance equations." In previous theories, the balance equations were often non-degenerate and determined unique values for φ_E(t) and φ_I(t). This implies that population-averaged activities exhibit only vanishingly small fluctuations if the entire dynamics are stable. In contrast, in our theory, for the values of g_kℓ satisfying Eq.(3), Eqs. (22) are degenerate, and they are satisfied as long as the equality φ_E(t) = φ_I(t) holds. Furthermore, this equality is always ensured to hold as a consequence of the fact that the macroscopic equation, Eq.(8), has exactly the same driving-force term for the excitatory and inhibitory populations, and hence m_E(t) = m_I(t). As a result, the averages of the neuronal outputs φ_k(t) are allowed to fluctuate strongly.
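The degeneracy argument can be checked mechanically: under Eq. (3) the matrix of mean couplings (g_kℓ) has proportional rows, so the balance equations (22) admit a one-parameter family of solutions with φ_E = φ_I, whereas a generic coupling matrix pins φ_E and φ_I uniquely at zero. A small numerical check, with an arbitrary non-critical matrix chosen here only for comparison:

```python
import numpy as np

# Rank of the mean-coupling matrix decides whether Eqs. (22) are degenerate.
g0 = 1.5
G_critical = np.array([[g0, -g0],
                       [g0, -g0]])        # Eq. (3): rows proportional
G_generic = np.array([[g0, -0.8 * g0],
                      [0.9 * g0, -g0]])   # arbitrary non-critical example

rank_c = np.linalg.matrix_rank(G_critical)  # 1: any phi_E = phi_I satisfies Eq. (22)
rank_g = np.linalg.matrix_rank(G_generic)   # 2: only phi_E = phi_I = 0 does
```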

C. Model with untuned synaptic weights
Next, we describe how the above theory is modified for the untuned model [Eqs. (1) and (2)]. In this case, the dynamical equation is divided into microscopic and macroscopic parts in a slightly different manner:

  (d/dt) δh_i^{(k)}(t) = −δh_i^{(k)}(t) + η_i^{(k)}(t),  (23)
  η_i^{(k)}(t) = (σ_0/√N) Σ_{ℓ=E,I} Σ_{j=1}^{N} J̃_{ij}^{kℓ} φ(h_j^{(ℓ)}(t)).  (24)

Note that, in Eq.(24), the fluctuation term δφ_j^{(ℓ)}(t) in Eq.(10) is replaced by the uncentered quantity φ(h_j^{(ℓ)}(t)). As a result, the microscopic driving-force terms have the following correlation functions:

  ⟨η_i^{(k)}(t) η_i^{(k)}(s)⟩ = σ_0^2 Σ_{ℓ=E,I} [C_ℓ(t, s) + φ_ℓ(t) φ_ℓ(s)].  (25)

This equation means that individual neurons receive additional synchronous inputs with random amplitudes, which can be represented as

  η_i^{(k)}(t) = η̃_i^{(k)}(t) + σ_0 Σ_{ℓ=E,I} ξ_i^{kℓ} φ_ℓ(t),  (26)

where η̃_i^{(k)} has the correlation function of Eq. (12) and {ξ_i^{kℓ}} are i.i.d. quenched Gaussian variables with zero mean and unit variance. Then, the modified self-consistent equation,

  (∂/∂t + 1)(∂/∂s + 1) D(t, s) = 2σ_0^2 [C(t, s) + φ(t) φ(s)],  (27)

together with Eqs. (14) and (26), determines the time evolution of φ, C, and D for a given orbit m. On the other hand, the macroscopic dynamics of m are described by Eqs. (17) and (19).

D. Application of external inputs
In Sec.IV C, we apply external inputs of strength I(t) to √N neurons in the E population of the finely tuned model. The stimulus-driven dynamics are analyzed by an MFT that is slightly modified from the one introduced above. It is described by

  (d/dt) m(t) = −m(t) + η(t) + g_0 ∆φ(t),  (29)

where the microscopic and macroscopic stochastic driving-force terms, η_i^{(k)} and η, are distributed according to the same equations as those for the autonomous case, namely, Eqs. (12) and (17), with the same self-consistent equations, Eqs. (14) and (15). The difference, ∆φ(t), between the averages of the stimulus-driven and undriven neuronal variables over the Gaussian fluctuations gives an additional driving-force term for the mean activity in Eq. (29).

[Fig. 2 caption: Solutions for the E and I populations from direct simulations (indicated by "direct") and solutions of the mean-field equations (indicated by "MFT") are depicted in red, blue, and gray, respectively. The numerically determined value of the largest Lyapunov exponent, λLE, is shown above each panel.]
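The stimulation protocol can be mimicked in a direct simulation: a common input is injected into √N excitatory neurons, the set S of Eq. (4). The sketch below uses a constant input and illustrative parameters; it is not the simulation setup of Sec. IV C.

```python
import numpy as np

# Direct simulation with a constant input I0 applied to sqrt(N) excitatory
# neurons (the set S of Eq. (4)); all parameter values are illustrative.
rng = np.random.default_rng(3)
N, g0, sigma0, I0 = 400, 1.5, 1.2, 1.0
dt, T = 0.05, 1000
n_stim = int(np.sqrt(N))                     # |S| = sqrt(N) stimulated neurons

G = np.array([[g0, -g0], [g0, -g0]]) / np.sqrt(N)
J = np.kron(G, np.ones((N, N))) \
    + (sigma0 / np.sqrt(N)) * rng.standard_normal((2 * N, 2 * N))

I_ext = np.zeros(2 * N)
I_ext[:n_stim] = I0                          # input enters only the set S

h = 0.1 * rng.standard_normal(2 * N)
m_E = np.empty(T)
for t in range(T):
    h += dt * (-h + J @ np.tanh(h) + I_ext)  # Eq. (1) with sparse external input
    m_E[t] = h[:N].mean()                    # population response to the input
```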

IV. RESULTS
A. Dynamics of the finely tuned model

Fluctuations in the mean activity
We first examine the finely tuned model described by Eqs.(1) and (5) without external inputs, for which the MFT takes the simplest form [Sec.III A]. For g 0 = 0, the MFT of this model is essentially equivalent to that studied previously [19]. The previous theory showed that the present model with g 0 = 0 undergoes a transition from a trivial fixed point to a chaotic state at σ 0 = 1/ √ 2, in which the mean activity, m, is constantly zero and individual neuronal activities, h (k) i , exhibit Gaussian fluctuations around it.
We are particularly interested in cases with non-zero values of g_0. We study these cases both by numerically solving the mean-field equations and by directly simulating the model for a large value of N. In these numerical simulations, we find only a trivial fixed-point solution for σ_0 ≤ 1/√2. In contrast, for σ_0 > 1/√2, we obtain non-trivial solutions. Since the repertoire of solutions is qualitatively the same for different values of σ_0, we show typical activity patterns only for σ_0 = 1.2 in Fig.2(a)-(d). In our MFT, the excitatory and inhibitory populations obey the same dynamical equations. Therefore, we plot only a single representative solution from the MFT for each value of g_0. In fact, in the plots from the direct simulations, the mean activities of the excitatory and inhibitory populations are almost equal, and the individual neuronal activities in the two populations exhibit similar temporal patterns. Comparing the plots from the MFT and direct simulations, we observe similar amplitudes and temporal patterns for the mean activities and the microscopic fluctuations around them. These results suggest that our theory successfully predicts the behavior of the model. Below, we further evaluate this point quantitatively.
As the value of g_0 is increased from zero, the mean activity of the model starts to fluctuate with non-zero amplitudes. This is expected from the MFT, which shows that the driving force for the mean activity is the summation of individual neuronal fluctuations scaled by g_0 [Eq. (16)]. With a further increase in the value of g_0, the model starts to show irregular, intermittent dynamics, varying between positive and negative values close to ±1, with patterns that are reminiscent of the UP-DOWN states observed in the brain [6,15]. Increasing the value of g_0 still further, we occasionally observe stable fixed points and regularly oscillating solutions, as well as irregular, chaotic solutions. Although these non-chaotic solutions are observed for networks with a fairly large number of neurons [Appendix C], further theoretical analyses suggest that these solutions are due to finite-size effects and are not stable in the thermodynamic limit [see the discussion in Sec.IV B 3 and Appendix H].

To examine the extent to which the description provided by our MFT is accurate, we calculate statistics of the dynamics from numerical solutions of the mean-field equations and of the original model equations. We calculate the autocorrelation functions for the mean activity and for the individual neurons as

  µ(τ) = ⟨m(t + τ) m(t)⟩,  (30)
  D(τ) = ⟨δh_i(t + τ) δh_i(t)⟩.  (31)

Since we expect the dynamics to be non-Gaussian, we also calculate the fourth-order statistics defined by

  µ_4(τ) = ⟨m(t + τ)^2 m(t)^2⟩ − ⟨m(t + τ)^2⟩⟨m(t)^2⟩ − 2µ(τ)^2,  (32)
  D_4(τ) = ⟨δh_i(t + τ)^2 δh_i(t)^2⟩ − ⟨δh_i(t + τ)^2⟩⟨δh_i(t)^2⟩ − 2D(τ)^2.  (33)

In these equations, bracketing indicates averaging over both time and configurations of the random connectivity. In Eqs. (31) and (33), we also take averages over population k in direct simulations and averages over microscopic Gaussian fluctuations in the corresponding MFT. The fourth-order statistics defined above vanish if the dynamics are Gaussian. For the direct simulations, we show only the statistics of the E population, because those of the I population are essentially the same.
The panels in Fig.3 compare these calculated statistics, and they show good agreement between the theory and direct simulations. This indicates that our theory predicts the behavior of the model quantitatively, at least statistically. In this figure, we also observe large fourth-order statistics for networks with large values of g_0, which implies the highly non-Gaussian nature of the dynamics.
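The statistics of Eqs. (30)-(33) can be estimated from any scalar time series by time averaging. The sketch below applies such estimators to a synthetic non-Gaussian series (an AR(1) process passed through a saturating nonlinearity), which stands in for the simulated mean activity and is not data from the model:

```python
import numpy as np

# Time-averaged estimators of the autocorrelation mu(tau) and a fourth-order
# cumulant; for a zero-mean Gaussian series the fourth-order statistic vanishes.
rng = np.random.default_rng(4)
T, a = 200_000, 0.99
x = np.empty(T)
x[0] = 0.0
for t in range(T - 1):                 # AR(1) process as a synthetic stand-in
    x[t + 1] = a * x[t] + 0.1 * rng.standard_normal()
m = np.tanh(3.0 * x)                   # saturating nonlinearity -> non-Gaussian

def mu(m, tau):
    """Autocorrelation <m(t+tau) m(t)>, averaged over time."""
    return np.mean(m[tau:] * m[:len(m) - tau])

def mu4(m, tau):
    """Fourth-order cumulant <m(t+tau)^2 m(t)^2> - <m^2>^2 - 2*mu(tau)^2."""
    return (np.mean(m[tau:] ** 2 * m[:len(m) - tau] ** 2)
            - np.mean(m ** 2) ** 2 - 2 * mu(m, tau) ** 2)
```

Applied to a pure Gaussian series, mu4 is close to zero; applied to the intermittent series m, it need not be, which is the diagnostic used in Fig.3.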
For the autocorrelation functions defined above, one would expect a perturbative expansion to provide a good analytical approximation, as it does for many physical systems. We can actually formulate such a method by expanding the dynamics around g_0 = 0. However, it is numerically intractable to carry out the calculation of even the first-order expansion [see Appendix G]. Here, we restrict ourselves to showing only the zero-th order term, µ(τ) ≈ g_0^2 D_0(τ)/σ_0^2, of this perturbative expansion. For larger values of g_0, the analytical approach encounters another difficulty in addition to the computational problems mentioned above. Fig.2(c) and (d) show that the trajectories of the mean activity are observed with frequencies that are obviously asymmetric with respect to the time reversal of the trajectories. Note that the mean activity overshoots immediately after it makes an intermittent transition between positive and negative values, and that the temporal order of the transition and the overshooting is never reversed. Analytical approaches, such as a perturbative expansion around g_0 = 0, however, yield only symmetric solutions [see Appendix G]. This inconsistency suggests the possibility of symmetry breaking with respect to time reversal. If a symmetry is broken, one cannot expect a symmetry-breaking solution to be obtained from a series expansion around the symmetric solutions. In the following, we use a heuristic approach to seek clues to the occurrence and mechanism of such symmetry breaking and to an understanding of the waveform of the mean activities for large g_0. In our MFT, the correlation function of the microscopic fluctuations is determined by Eqs. (14) and (15) for a given trajectory of the mean activity, which in turn determines the realization probability of the mean activity.
Since this dependence is complicated, we first focus on the case with constant mean activities of different values, expecting the results to provide some clue to the dynamics with time-varying mean activities. Applying the previous theory [19,32] to this analysis, we find multiple fixed-point solutions and chaotic solutions. Fig.4 shows the variance of the neuronal activities of these solutions for different constant values of the mean activity. A branch of chaotic solutions [the black solid line in Fig.4] coincides with the solution examined in a previous study [19] for m(t) = 0. As the absolute value of the mean activity increases, the neuronal fluctuations in these solutions decrease. Another branch of chaotic solutions with smaller neuronal fluctuations [the black dotted line in Fig.4] emerges at the value satisfying 2σ_0^2 φ'(m) = 1, |m| ≈ 0.76. The neuronal fluctuations in these solutions increase as the mean activity increases, and this branch eventually connects to that with larger neuronal fluctuations. From the numerical simulations, we find that the branch of chaotic solutions with larger fluctuations is stable, while that with smaller fluctuations is unstable. Fixed-point solutions and their stability can also be examined by applying the previous theory [32] (or the method presented in Appendix H), and we find two connected branches of unstable fixed-point solutions as well as trivial fixed-point solutions [Fig.4]. The trivial fixed-point solution is stable for |m| larger than the bifurcation point given by 2σ_0^2 φ'(|m|) = 1, while it is unstable below this point. Fig.4 suggests the following explanation for the waveform of the mean activity and its time-reversal asymmetry observed in Fig.2(c) and (d). Let us assume that, for a time-varying mean activity, the instantaneous behavior of the neuronal fluctuations is the same as the above solution for the corresponding value of the constant mean activity.
When the mean activity remains small for some time, the neuronal fluctuations increase. Since the neuronal fluctuations serve as a driving force for the mean activity, the mean activity is stochastically pushed to larger values. For larger values of the mean activity, the neuronal fluctuations decrease (along the black solid line in Fig.4), while still remaining chaotic. When the mean activity reaches a value for which there are no stable chaotic solutions, the neuronal fluctuations start to decay to the trivial fixed point. Then, the mean activity loses its driving force and decays to smaller values. In this descending part of the mean activity, the network state passes through the region with the unstable chaotic and fixed-point solutions (the lower branches of the nontrivial solutions in Fig.4). The profile of the neuronal fluctuations in this descending part is therefore different from that of the ascending part. Because of this passage through the region with unstable fixed points, both the neuronal fluctuations and the mean activity slow down, as we observe in Fig.2(c) and (d). We suggest that this hysteresis in the multiscale dynamics is the mechanism for the observed waveform of the mean activity and its time-reversal asymmetry.

Ferromagnetic effects and critical fluctuations
Thus far, we have examined balanced networks with parameter values satisfying the condition in Eq. (3). In this section, we briefly describe what happens if this condition is not satisfied. As in a previous study [32], if the balance equation is not degenerate, the mean activities of the neuronal populations either take a set of constant values uniquely determined by the balance equations or diverge. The remaining cases are described with two parameters, α and β. By definition, the parameter α is interpreted as the magnitude of the ferromagnetic interaction, while β is interpreted as the relative gain of the synaptic input to inhibitory neurons. The case we have examined in the previous sections corresponds to (α, β) = (0, 1). We examine the fluctuations in the mean activity by calculating their mean and variance averaged over a long period of simulations and by observing how they change as the value of α or β deviates from (α, β) = (0, 1) [Fig.5(a) and (b)]. We find that the mean activity diverges as α increases or β decreases, while the variance of the fluctuations decays to zero as α decreases or β increases. As shown in Fig.5(a) and (b), the rate of this divergence and decay is proportional to √N, which indicates that, in the N → ∞ limit, the macroscopic dynamics of the network are divergent or trivial for α ≠ 0, β = 1 and for α = 0, β ≠ 1. This behavior can be understood by first examining O(1/√N) values of g₀ and then taking the g₀ → ∞ limit. As shown in a previous study [32], for g_kℓ ∝ 1/√N, the dynamics of the mean activity are no longer subject to the balance between strong excitation and inhibition but instead are described by a simpler MFT for k = E and I, where φ̄_ℓ(t) is the population average of φ(h_j(t)). Fig.5(c) shows how the mean activity changes as g₀ increases for fixed values of α and β.
As g₀ increases for α > 0, β = 1 or α = 0, β < 1, the second term on the right-hand side of Eq.(35) starts to dominate, causing the trivial solution to bifurcate in a manner similar to a ferromagnetic transition [the results with (α, β) = (1/3, 1) and (0, 1/4) and the result for a purely ferromagnetic interaction shown in Fig.5(c)]. For α ≤ 0, β = 1 or α = 0, β ≥ 1, we find that the mean activity remains at zero.
In the cases of strict inequality, the second term on the right-hand side of Eq.(35) supplies feedback that suppresses changes in the mean activity, in a manner similar to anti-ferromagnetic effects. For α = 0, β = 1, however, such a feedback mechanism does not operate. These behaviors of the model with O(1/√N) values of g₀ account for the divergent or suppressed dynamics observed for O(1) values of g₀, which arise as the limit of the former. These results also suggest that the present model under the condition given by Eq. (3) is at the critical point between two states governed by extremely strong ferromagnetic and anti-ferromagnetic interactions.
B. Dynamics of the untuned model

Qualitatively different solutions
The behavior of the untuned model, described by Eqs. (1) and (2), is different from the results discussed above. We show plots of its activity patterns in Fig.6. Which of these diverse solutions is observed for a given set of parameter values depends on the configuration of the random connectivity of the directly simulated networks or on the sequence of the random numbers used for the simulations of the mean-field equations. In the regularly oscillating solutions, we also observe that both the mean activity and the activities of individual neurons are coherent, which means that the activities of individual neurons have various waveforms but are all phase-locked to the same rhythm [Fig.6(i)].

Strong dependence on the configuration of the random connectivity
Next, we examine in further detail the three qualitatively different solutions for larger values of g₀. Here, we emphasize that the type of the observed dynamics depends on the configuration of the random connectivity but not on the initial condition of the simulations. In Fig.7, we compare networks with the same parameter values but different configurations. We find that a network with a fixed configuration shows dynamics convergent to the same attractor when it is simulated from different initial conditions, while networks with the same parameter values but different configurations show various dynamics. Such individuality among networks with different configurations was expected from the outlier eigenvalues of the synaptic weight matrices [58-60], which we also confirm numerically [Appendix B]. The outlier eigenvalues in the synaptic weight matrices indicate that the untuned model has a strong configuration dependence at the level of its dynamical equation.
Despite this obvious configuration dependence, our mean-field equations reproduce activities similar to those of the directly simulated networks. We therefore expect the MFT to give us further insights into the configuration-dependent dynamics, and we examine this point below.

Fixed-point solutions and their stability
We first examine the observed fixed-point solutions. Suppose that the activity of the entire network is constant, with mean activity m(t) ≡ m_f. Recall that the dynamics of the mean activity are described by Eq. (19), rewritten here as Eq. (36), in which the fluctuation term η(t) is generated according to Eq. (17). If the network state remains at a fixed point, the correlation function of the microscopic fluctuations, C(t, s), is constant, and therefore the neuronal activities take normally distributed values that do not change temporally. Then, the fluctuation term η(t) does not change temporally either, because it is the sum of the microscopic neuronal fluctuations [Eq. (17)]. From Eq.(36), we find that Eq.(37) must hold in order for the network state to remain at the fixed point without requiring an external input. Applying the MFT, we find that there is a continuous band of values of m_f for which the stable solution for C(t, s) is constant. From that analysis, we expect that a solution satisfying Eq.(37) exists with a non-zero probability [also see the discussion at the end of Appendix H]. The existence of this band suggests the following mechanism for the appearance of fixed-point solutions: when the mean activity stays in this band, C(t, s) tends to be constant, and hence the microscopic fluctuations slow down. As a result, the mean activity, which receives its driving force from the microscopic fluctuations, also slows down. This leads to the convergence of the entire dynamics to an equilibrium point. This scenario is justified by a perturbative stability analysis, in which we examine the response of the system around a fixed-point solution to a temporary external perturbative input. Suppose that Eq.(37) holds and that the network state is set to a fixed-point solution with mean activity m(t) = m_f for a long time prior to t = 0. Then, suppose that temporary external inputs, collectively denoted by p, are applied for t > 0.
For t ≤ 0, the self-consistent equation, Eq.(15), reads as Eqs.(38) and (39), which involve the variance of δh [see Appendix H for the derivation]. Here, the angle brackets with subscript 0 denote averaging over the unperturbed dynamics with m(t) ≡ m_f. For t ≥ 0, we perturbatively expand the dynamics around the fixed-point solution. We calculate how a change in the mean activity, δm(t) = m(t) − m_f, evokes a response in the correlation D(t, s) and how the evoked response in the correlation generates additional fluctuations in η(t). We refer readers interested in the details of this analysis to Appendix H. From this analysis, we find that, up to first order, a self-consistent equation of the form of Eq.(40) must be satisfied, with i.i.d. unit-Gaussian coefficients ξ_jℓ and ξ′_jℓ determined by the configuration of the random connectivity. Here, the term p₀ is the component of the perturbative input that is uniformly applied to all neurons, and the terms d₁[δm], d₂,jℓ[δm], and d₃,jℓ[p] are certain linear transformations of δm or p. The operation denoted by (1 + ∂_t)⁻¹ is the causal inverse of the operator 1 + ∂_t, i.e., convolution with an exponential kernel: [(1 + ∂_t)⁻¹ x](t) = ∫_{−∞}^{t} e^{−(t−s)} x(s) ds. The solution of this self-consistent equation can be obtained explicitly. From this solution, we find that if the condition of Eq.(42) holds for the constant θ₁ calculated from the unperturbed dynamics [see Eq.(H26) for the details], then δm(t) → 0 holds with a non-zero probability as t → ∞, depending on the values of ξ_jℓ, and hence on the random weight configuration. This convergence of the mean activity, together with the microscopic stability given by Eq.(39), implies the stability of the entire dynamics around the fixed point, and hence justifies the scenario with the slowing-down of both the mean activity and the microscopic fluctuations. From the same analysis, we also see that, depending on the values of ξ_jℓ, the obtained solution may converge for a short time after the application of the input but eventually diverge.
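The operator (1 + ∂_t)⁻¹ appearing here is convolution with a causal exponential kernel. A minimal discretized sketch (forward Euler leaky integrator; the step size and the constant test signal are illustrative):

```python
def inv_one_plus_ddt(x, dt):
    """Apply (1 + d/dt)^(-1) to a sampled signal x:
    y(t) = integral_{-inf}^{t} exp(-(t - s)) x(s) ds,
    discretized as the leaky integrator y' = -y + x (forward Euler, y(0) = 0)."""
    y, out = 0.0, []
    for xi in x:
        y += dt * (-y + xi)
        out.append(y)
    return out

# sanity check: a constant input 1 is mapped to 1 (the kernel integrates to 1)
ys = inv_one_plus_ddt([1.0] * 20000, 0.01)
```

Applying (1 + d/dt) back to the filtered signal recovers the input, which is why this operator inverts the leaky dynamics appearing throughout the linearized equations.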
In summary, we find that the model has fixed-point solutions for which the mean and variance of the neuronal activities are determined in a configuration-dependent manner. We find that the conditions for stability are actually satisfied for a certain range of m_f [orange arrows in Fig.8(a)], and that all fixed points observed in numerical simulations, including those shown in Fig.6, fall in this range of m_f [Fig.8(b)]. This contrasts with the fixed points that are occasionally observed for the finely tuned model with large g₀: fixed points of that model never satisfy the corresponding stability condition, which implies that they do not exist in the thermodynamic limit. In the above analysis, we find that the condition given by Eqs. (39) and (42) itself does not depend on the value of g₀, while the probability of realizing stable fixed points does depend on it. For small g₀, the value of η(t) that satisfies Eq.(37) with m_f within the range of stability is very large, and the realization probability of such a large value of η(t) is expected to be small from Eq. (17). This explains why we do not observe fixed points for very small g₀, and it also suggests that stable fixed points still exist, albeit with a very small probability, for such small g₀.

Regularly oscillating solutions and their stability
Next, we examine the regularly oscillating solutions. If the entire network dynamics have stable oscillations, so do the microscopic parts of the dynamics. To find such microscopic oscillations for a given oscillatory orbit of the mean activity, m(t) = m_o(t), we solve the self-consistent equation, Eq.(28), which can be solved iteratively in the frequency domain [see Appendix I]. Using the mean activity observed in Fig.6(g) for m_o(t), we compute the autocorrelation D(t, s) and show its magnitudes in the frequency domain [Fig.9(a)]. We find that the solution has a nonzero value only at multiples of the basic frequency ω₀ of the mean activity, which indicates that the microscopic dynamics are completely entrained by the oscillatory mean activity. We also confirm this by checking the averaged autocorrelation function calculated from D(t, s): the autocorrelation function in Fig.9(d) shows that the neuronal variables are periodic. In contrast, as is well known from a previous study [19], the autocorrelation function D(t, s) for zero mean activity, m(t) ≡ 0, has the frequency representation D(ω₁, ω₂) = D₀(ω₁)δ(ω₁ − ω₂) with a continuous function D₀(ω₁) [Fig.9(c)]. The averaged autocorrelation function for this case is unimodal and tends to zero as τ → ∞ [Fig.9(f)]. The qualitative difference between these two autocorrelation functions suggests the occurrence of a phase transition from one to the other. To check this, we decrease the amplitude of the mean activity, m_o(t), without changing its waveform, and we find that the solution starts to have a continuous spectrum extending over frequencies other than the multiples of ω₀. After the transition, the autocorrelation function has both a continuous spectral component and discrete components at multiples of ω₀ [Fig.9(b)], and the averaged autocorrelation function has both a periodic component and a component that vanishes at infinity [Fig.9(e)].
As the amplitude of the mean activity decreases, the periodic component in the autocorrelation function gradually decays, disappearing at m(t) = 0. The analysis we perform in Appendix I shows that these observed entrained dynamics are indeed stably realized. The transition behavior observed above is qualitatively the same as that observed in a previous study [29], although that study used periodic inputs with random phases to induce the transition. This microscopic transition gives an intuitive explanation for the mechanism of the observed oscillations. Recall that the dynamics of the mean activity are described by Eq. (19). Because the driving-force term, η(t), is the sum of the microscopic fluctuations, it is entrained to the oscillation of the mean activity itself whenever the mean activity oscillates with an amplitude large enough to entrain the microscopic fluctuations. We suggest that this reverberation of entrainment between the mean activity and the microscopic fluctuations is the mechanism underlying the coherent oscillations we observe in Fig.6(g) and (h). The stability of this reverberation mechanism can be examined by using a perturbative method similar to that employed for fixed points: we derive a self-consistent equation with random coefficients that determines the linear responses to external inputs, and we construct its solution, from which the condition for the stability of the reverberation can be examined numerically. From this analysis, we draw the same conclusion about stability as that for the fixed points: regularly oscillating solutions of the untuned model, such as that observed in Fig.6(g), are linearly stable with a non-zero probability; the occasionally observed regular oscillations of the finely tuned model [Fig.16(c)], however, turn out to be unstable. These conclusions are consistent with the tendency observed in the results of direct simulations of networks with different system sizes [Fig.15(a),(b)].
Since this analysis is complicated and essentially the same as that for the fixed points, we omit its presentation here and refer interested readers to Appendix I. We only note that we cannot exhaustively examine oscillatory orbits, and therefore we cannot completely exclude the existence of stable limit-cycle solutions for the finely tuned model.
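The averaged autocorrelation that distinguishes the entrained (periodic) phase from the decaying one can be estimated directly from any sampled time series. A minimal sketch using a hypothetical periodic signal, not the model's D(t, s) itself:

```python
import math

def avg_autocorr(x, max_lag):
    """Time-averaged autocorrelation: mu(lag) = <x(t + lag) * x(t)>_t."""
    n_eff = len(x) - max_lag
    return [sum(x[t + lag] * x[t] for t in range(n_eff)) / n_eff
            for lag in range(max_lag + 1)]

# a periodic signal keeps oscillating in mu(lag); a chaotic/noisy one
# would instead decay to zero as the lag grows
xs = [math.sin(0.2 * t) for t in range(5000)]
mu = avg_autocorr(xs, 100)
```

For a mixed state (after the transition discussed above), mu(lag) would show a decaying part superposed on a persistent periodic part.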

C. External inputs to √N excitatory neurons

Coherent states induced by sinusoidal inputs
In this section, we apply sinusoidal inputs of amplitude A and period T_per to √N neurons in the E population of the finely tuned model, namely, through the input term in Eq.(4). In this model setting, we observe two qualitatively different types of behavior [Fig.10(a)-(h)]. We find that the activity patterns obtained from the direct simulations and from the MFT are quite similar, suggesting that our theory successfully predicts the behavior of the model for the case with external inputs as well. As we increase the amplitude A for a fixed value of g₀, the solutions undergo a transition from irregular chaotic dynamics partially entrained by the input to regular non-chaotic dynamics synchronous with the input. This indicates that inputs to an O(√N) number of neurons can effectively entrain the whole network in this model.
We further observe that this transition occurs at different values of A, depending on the configuration, but not on the initial condition, similarly to Fig.7 (not shown). Fig.10(i) shows a histogram depicting the percentage of twenty networks with random configurations that synchronize with the inputs for each value of A. We see that the transition point is highly variable among networks with different configurations. Nevertheless, in the autocorrelation function µ(τ ) of the mean activity averaged over configurations and time according to the definition in Eq.(30), we observe good agreement between direct simulations and MFT [ Fig.10(j)].
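As a point of contrast, in a generic random network without the critical balance, a sinusoidal drive applied to only √N of the units perturbs the mean activity only weakly; this is the O(1/√N) baseline that the amplification mechanism of the present model overcomes. A toy sketch (single population, Gaussian weights without Dale's law; all parameters are illustrative):

```python
import numpy as np

def simulate(N=400, g0=1.5, A=0.0, T_per=20.0, T=200.0, dt=0.05, seed=0):
    """Toy rate network dh/dt = -h + J tanh(h) + I(t), with the drive
    I(t) = A sin(2 pi t / T_per) applied to the first sqrt(N) units only.
    Returns the mean activity m(t) = <tanh(h_i(t))>_i."""
    rng = np.random.default_rng(seed)
    J = g0 * rng.standard_normal((N, N)) / np.sqrt(N)
    mask = np.zeros(N)
    mask[: int(np.sqrt(N))] = 1.0
    h = rng.standard_normal(N)
    steps = int(round(T / dt))
    m = np.empty(steps)
    for step in range(steps):
        inp = A * np.sin(2.0 * np.pi * step * dt / T_per) * mask
        h += dt * (-h + J @ np.tanh(h) + inp)
        m[step] = np.tanh(h).mean()
    return m

m_free = simulate(A=0.0)     # autonomous chaotic fluctuations, O(1/sqrt(N))
m_driven = simulate(A=5.0)   # drive on sqrt(N) = 20 of the 400 units
```

In this unbalanced toy, the drive modulates the mean activity only by O(A/√N); in the critically balanced model of the main text, the same √N-targeted input is amplified and can entrain the entire network.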

Stability of the entrained dynamics
The stability of the regular, entrained dynamics can be examined using the same perturbative method as that for the regularly oscillating solutions of the model without external inputs. In this case, the results of the analysis indicate that the numerically observed oscillations of the model are linearly stable [see Appendix I]. Consistent with this finding, in the direct numerical simulations, the induced coherent states are robustly observed for networks with large system sizes [see Appendix C]. In contrast with the untuned model, in the present case, the synaptic weight matrix of the model does not have configuration-dependent outlier eigenvalues. We observe, however, that the linear variational equation around the attractors does have coefficient matrices with outlier eigenvalues [see Appendix B for details]. These results suggest that the finely tuned model still shows strong configuration-dependence in its stimulus-driven dynamics.
Read-out of information from the coherent dynamics
The read-out we consider is a weighted sum of the deviations of the neuronal activities from their population averages, γ(t) = Σ_k Σ_i r_i^k [φ(h_i^{(k)}(t)) − φ̄_k(t)], where φ̄_k denotes the population average of φ(h_i^{(k)}(t)) over population k. The contribution of this population average is subtracted because the population-averaged activity simply replicates the external inputs, and reading out this component does not have much computational value.
If the dynamics are coherent, we expect the read-out values, γ(t), to be O(1). In contrast, if the dynamics are chaotic, the ensemble of neuronal activities can be regarded as incoherent Gaussian fluctuations, and therefore the values read out from them are expected to be O(1/√N). We numerically test this hypothesis by examining the values read out with weighting coefficients r_i^k set to such values that the initial value of γ(t) at time t_init is O(1). We show the network activities and typical read-out values obtained from them in Fig.11(a) and (b). The values read out from the coherent regular oscillation show a regular pattern of magnitudes comparable to the initial value, γ(t_init) [Fig.11(a)], while those read out from the irregular activity decay rapidly from the initial value [Fig.11(b)]. This observation is consistent with the above argument. To evaluate the magnitudes of the read-out values further, we calculate the normalized standard deviation, S_γ, of γ(t) for networks of different system sizes, averaging over the entire network and over a long time, of length T_avg, starting from a suitably chosen initial time t₀, in a simulation that starts from a random initial condition different from that of the simulation for which we determined the weighting coefficients r_i^k. We also show the activity patterns obtained from this initial condition in Fig.11(a) and (b). In Fig.11(c) and (d), we show the calculated values of the normalized standard deviations on logarithmic scales, and we find that the values read out from regular oscillations do not depend much on the system size, while those read out from irregular dynamics are roughly proportional to 1/√N, as we expect.
These results suggest that the above mechanism, which reads out O(1) values only when the network dynamics are coherent, enables neuronal networks to transmit information in a state-dependent manner. Note that we repeatedly read out the same pattern from the coherent dynamics, regardless of the independent initial conditions [Fig.11(a)]. Despite the symmetry among the neurons, which receive inputs through statistically the same set of synaptic weights, identical coherent dynamics (not coherent dynamics randomly reshuffled with respect to the neuronal indices) are always realized. Although the above coherent states are induced by an artificial sinusoidal input, similar results are obtained for the case with irregular input [see Appendix D for details].
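The expected size scaling of the read-out can be illustrated with synthetic ensembles standing in for the two regimes: phase-locked sinusoids for the coherent state and i.i.d. Gaussian samples for the chaotic one. This caricature is not the simulated network of Fig.11:

```python
import math, random

def normalized_readout_std(N, coherent, T=500, seed=7):
    """Read out gamma(t) = sum_i r_i (x_i(t) - xbar(t)) with fixed weights r
    and return the standard deviation of gamma over time."""
    rng = random.Random(seed)
    phase = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(N)]
    def frame(t):
        if coherent:  # phase-locked waveforms, identical on every cycle
            return [math.sin(0.2 * t + phase[i]) for i in range(N)]
        return [rng.gauss(0.0, 1.0) for _ in range(N)]  # incoherent stand-in
    x0 = frame(0)
    m0 = sum(x0) / N
    norm = sum((v - m0) ** 2 for v in x0)
    r = [(v - m0) / norm for v in x0]  # reading out x0 itself gives exactly 1
    gam = []
    for t in range(T):
        xt = frame(t)
        mt = sum(xt) / N
        gam.append(sum(ri * (v - mt) for ri, v in zip(r, xt)))
    mean = sum(gam) / T
    return math.sqrt(sum((g - mean) ** 2 for g in gam) / T)

s_coh = normalized_readout_std(1000, coherent=True)      # O(1), size-independent
s_inc = normalized_readout_std(1000, coherent=False)     # ~ 1/sqrt(N)
s_inc_big = normalized_readout_std(4000, coherent=False)
```

The coherent ensemble yields an O(1) read-out regardless of N, while the incoherent ensemble yields a read-out shrinking like 1/√N, mirroring Fig.11(c) and (d).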

Remarks on the untuned model under external inputs
The untuned model behaves in a qualitatively similar manner to the finely tuned model when both are driven by external inputs. The untuned model also shows transitions from irregular, partially entrained dynamics to regular dynamics that are completely synchronous with external inputs in a configuration-dependent manner. Thus, to avoid redundancy, we do not present the results for this model setting in this article. From a quantitative viewpoint, we note that the entrainment in this model setting is more complicated than that for the finely tuned model, presumably because the untuned model has inherent configuration-dependent rhythms, as observed in Fig.6.

D. Multiscale dynamics of critically balanced networks of LIF neurons
Thus far, we have focused on a highly simplified model with firing-rate variables. However, from the way we have constructed our MFT, we expect that, in principle, a similar theoretical framework will hold for networks of spiking neurons, and hence we expect multiscale dynamics similar to those observed in the rate model. To demonstrate this, we numerically examine a commonly investigated network of leaky integrate-and-fire (LIF) neurons (see e.g. [61]), in which the variables V_i^{(k)} (k = E, I) denote the membrane potentials of the neurons. As in the MFT of the rate model, fluctuations in the inputs to the neurons in the network are considered to be conditionally Gaussian, given the orbit of the population-averaged input to the neurons. Since the population-averaged input is given by the sum of the fluctuating inputs to individual neurons divided by √N, the population-averaged input is stochastic, and its realization probability is determined through the correlation of the microscopic fluctuations. The neuron model used here, however, is highly nonlinear, and thus solving the mean-field equations demands much more intensive numerics than those we presented in the previous sections. We therefore restrict ourselves to numerically simulating the model and examining whether similar multiscale dynamics arise intrinsically.
As the value of g 0 is increased from zero with the above model settings, we actually observe increasingly large fluctuations in the population-averaged activity [ Fig.12].
Examining the autocorrelation and cross-correlation functions of individual and population-averaged inputs to the neurons, we observe that the population-averaged input is significantly correlated with the excitatory, inhibitory, and overall inputs to individual neurons [Fig.12(d)-(f)]. As we increase the value of g₀, these dynamics undergo a transition to coherent dynamics in a configuration-dependent manner [Fig.12].
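As a rough illustration of this kind of direct simulation, a minimal dense E-I LIF network can be written as follows. The membrane time constant, the drive, and the weight statistics are illustrative placeholders, not the parameters behind Fig.12:

```python
import numpy as np

def lif_balanced(N=200, g0=1.0, T=1.0, dt=1e-4, seed=0):
    """Dense E-I LIF network: tau dV/dt = -V + I_ext + recurrent spike input;
    a neuron spikes when V >= 1 and resets to 0.  Returns the population
    firing-rate trace."""
    rng = np.random.default_rng(seed)
    tau, I_ext = 0.02, 1.2  # illustrative: constant suprathreshold drive
    # first N columns excitatory (+g0/sqrt(N)), last N inhibitory (-g0/sqrt(N))
    J = np.hstack([np.full((2 * N, N), g0 / np.sqrt(N)),
                   np.full((2 * N, N), -g0 / np.sqrt(N))])
    J *= 1.0 + 0.5 * rng.standard_normal(J.shape)  # quenched heterogeneity
    V = rng.uniform(0.0, 1.0, 2 * N)
    steps = int(round(T / dt))
    rate = np.empty(steps)
    for step in range(steps):
        spikes = V >= 1.0
        V[spikes] = 0.0                              # reset after spiking
        V += dt / tau * (-V + I_ext) + J @ spikes    # leak, drive, recurrence
        rate[step] = spikes.mean() / dt              # instantaneous pop. rate
    return rate

r = lif_balanced()
```

Sweeping g0 in such a simulation, one would monitor the fluctuations of the population rate trace, in the spirit of the g₀ dependence reported for Fig.12.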

V. DISCUSSION
In the present study, we developed a novel type of MFT for RNNs consisting of a pair of excitatory and inhibitory populations of simplified neurons with finely tuned or untuned synaptic weights that obey Dale's law. The mean strengths of the synaptic weights were assumed to take a set of critical values. In this theory, microscopic fluctuations in the neuronal activities that are amplified by the strong excitation and inhibition serve as driving forces for the macroscopic dynamics of the population activity, while the population activity constrains the statistics of the microscopic fluctuations. The investigated RNNs exhibited non-vanishing fluctuations in their population activity. When the magnitudes of excitation and inhibition were large, we found interesting dynamical properties in these fluctuations, such as high non-Gaussianity and asymmetry with respect to time reversal for the finely tuned model, and a strongly configuration-dependent transition to a static or oscillatory non-chaotic state for the untuned model. In the oscillatory state, neuronal activities have various waveforms, while they are all phase-locked to the rhythm of the population activity. Our theory successfully predicts these dynamical properties. We found that these multiscale dynamics occur at a critical point between extremely strong ferromagnetic and anti-ferromagnetic states.
In networks with external inputs, periodic inputs to an O(√N) number of excitatory neurons effectively entrained the whole network. As the amplitude of the inputs was increased, the networks underwent a transition from irregular, partially entrained dynamics to regular dynamics synchronous with the input; the transition point again depended on the configuration. Unlike in the autonomous case, the application of external inputs induced a transition to a coherent oscillation in the finely tuned model, indicating that the fine-tuning of the synaptic weights reduces but does not remove the configuration dependence of the network dynamics. We also showed numerically that the induced coherent dynamics can be used as media for transmitting information in a state-dependent manner.
Furthermore, by analogy, more biologically realistic networks of spiking neurons are expected to display similar multiscale dynamics. Although a theoretical prediction of their dynamics is much more computationally demanding and beyond the scope of the current study, we numerically confirmed this hypothesis for networks of LIF neurons.

A. Closely related results
The present study was largely inspired by a previous investigation of RNNs [32] and by a couple of published and unpublished studies on the same model as ours [60,62]. In the former study [32], the authors investigated a network with a balance between strong excitation and inhibition, providing a theoretical framework for dealing with a network with multiple neuronal populations and for analyzing its non-trivial fixed points and transitions to chaos. Our study used their theoretical framework as a starting point for analyzing the non-trivial population dynamics that they had not examined. In the latter, published study [62], the authors numerically analyzed the same network as ours and found similar oscillatory dynamics. They further analyzed the dynamics with approximate reduced equations and related them to the eigenvalues of the connectivity matrices. Although these results were quite inspiring, their approach, which focuses on the eigenvalues with the largest real parts, was not always sufficient to characterize the behavior of the model. It is known that the eigenvalues of the connectivity matrices of their networks with finely tuned weights in the N → ∞ limit are uniformly distributed over a disk [59] [Fig.13(b)]. This implies that it is difficult to select a single eigenspace that effectively determines the dynamics in this limit, an assumption on which their theory relied. The unpublished study [60] took a similar approach to the same model and came to a pessimistic conclusion about the usefulness of an MFT in this setting. Some time after we initially publicized the present results, a very recent work [63] examined the cases with g₀ ≈ 0 and g₀ ≈ ∞ for our networks. Those authors developed a perturbative mean-field theory combined with the approach of [62].
Although they gained insights into the behavior of the model analytically, their theory still largely relies on heuristic, approximate calculations, and its application is limited to nearly deterministic dynamics with small fluctuations. In contrast with these previous studies, we have here derived an MFT in a much more rigorous manner and have laid a foundation for further analysis. Our theory applies to the entire range of model parameters and gives accurate probabilistic descriptions of large macroscopic fluctuations.

B. Stimulus-induced suppression of chaos and reservoir computing
Prior to the present study, several authors theoretically studied the externally driven, non-chaotic dynamics of neuronal networks without balanced excitation and inhibition [29,64,65]. These studies showed that the chaoticity of the dynamics of RNNs is suppressed by external random inputs and that a non-trivial non-chaotic regime appears after a transition at some amplitude of the input. In particular, a seminal study [29] showed that sinusoidal inputs with random phases induce a coherent state similar to ours. The transition to a coherent state in the present model is closely related to this suppression of chaos by a stimulus, because the microscopic part of our model dynamics is statistically the same as the dynamics of a model without balanced excitation and inhibition that is suitably driven by a uniform external input, as shown by our MFT. In fact, the autocorrelation functions of the neuronal activities of our networks [Fig.9] behave in a similar manner to those observed in the previous study [29]. However, we note that the transition to this microscopic coherent state due to a uniform external input, rather than inputs with random phases, has not been well studied to date. Besides the fact that the transition induced by a uniform input is more difficult to analyze theoretically, the uniform application of an input often results in chaotic or trivial dynamics, and the transition to a coherent state is not found unless the waveform of the input is finely tuned. In the present model, the waveforms of the mean activity that induce a coherent state are determined by the network itself through the interactions between the microscopic and macroscopic dynamics, even in the case with external inputs. The main difference between the coherent states in the present study and those in previous studies lies in this spontaneity.
The spontaneously generated coherent states discussed above may have implications for learning with RNNs. In previous studies, learning was first considered in the context of the "edge of chaos," where the variety and stability of network dynamics at the transition point to chaos were exploited for learning [66-68]. More recent studies have focused on different non-chaotic dynamical phases induced by external or feedback inputs [40,65,69]. In particular, the authors of a seminal work [40] stably reconstructed desired patterns from the coherent dynamics induced by randomly weighted strong feedback from read-out values to all of the neurons in the network. This strong random global feedback is expected to induce coherent dynamics by a mechanism similar to that studied in [29] (see also [65,70] for a similar result with strong random global feed-forward input). This requirement for a strong global input, however, restricted the applicability of their framework to the supervised learning of a small number of temporal patterns. Our results suggest a new regime of dynamics, in which non-chaotic coherent dynamics emerge spontaneously and stably reproduce output patterns [Fig.11(a)], without being passively entrained by strong global inputs. Investigating learning based on the dynamical phase we have found is a worthy challenge for a future study.

C. Population dynamics and critical fluctuations
The relationship between population dynamics and individual neuronal activities has also been studied in previous models. In these studies, however, population dynamics and microscopic fluctuations in individual neurons were treated as statistically independent. Therefore, unlike our theory, none of the previously proposed theories for balanced networks could account for the experimentally observed strong impact of single neurons on the population activities [52,53].
In sparsely connected balanced networks of spiking neurons, population dynamics are unaffected by fluctuations in the irregular firing of individual neurons. In fact, a previous study [25] showed that even when individual neurons fire irregularly, the population-averaged activities exhibit regular slow oscillations except for tiny fluctuations due to finite-size effects. This indicates that the irregular firing of neurons exerts only negligible effects on the population dynamics of sparsely connected networks.
In densely connected balanced networks, stimuli to a small number of excitatory spiking neurons also induce a vanishingly small response in the entire population. Two recent studies investigated the responses of such networks with spatial structures to correlated external inputs applied to a large number [O(N )] of neurons [50,51]. It was shown analytically that a non-negligible population response can be induced only when the spatial extent of the input correlation is narrower than that of recurrent connections from a single neuron [51]. This theoretical result should also hold when stimuli are given to a small number [O( √ N )] of excitatory neurons because such stimuli generate correlated internal inputs to the surrounding neurons that are connected with the stimulated neurons. In this case, the previous theory indicates that the population response is negligibly small.
The difference in the impact of single neurons on the population dynamics between the previous models and our model can be understood from the strong ferromagnetic effects examined in Sec. IV A 3. Previous models focused on the dynamical regime in which the network activity is stabilized by strong inhibitory feedback that suppresses excessive excitation. This regime corresponds to the anti-ferromagnetic state we observed in Sec. IV A 3. The anti-ferromagnetic effects strongly suppress the responses of the neuronal population when a small number of neurons are stimulated. In contrast, we have focused on the dynamical regime emerging at the critical point between the ferromagnetic and anti-ferromagnetic states. Activated spontaneously or driven by stimuli to a small number of neurons, our model displays strong macroscopic fluctuations at the critical point. Remarkably, our MFT precisely describes the probabilistic behavior of these critical fluctuations, which was, to our knowledge, not possible with any of the previous MFTs developed in the statistical mechanics of disordered systems. Although the parameter values yielding the critical point are not generic, experiments have shown evidence for self-organized critical dynamics in the brain [55], and thus we can reasonably expect some adaptive mechanism to finely tune the system to the critical point.
The intrinsic origin of the multiscale dynamics may also be supported by the experimentally observed large cross-correlations between EEG/LFPs and individual neuronal activities [15]. EEG and LFPs are considered to reflect mainly a collective excitatory component of the synaptic inputs to (the apical dendrites of) neurons in the local circuit [71], and thus they should reflect the waveforms of the very large excitatory inputs to neurons. When strong excitatory and inhibitory inputs to neurons cancel out, the remaining fluctuations need not be strongly correlated with the original excitatory and inhibitory inputs, especially if the main driving force of the population dynamics is extrinsic. In fact, the multiscale dynamics of the previous model predict small cross-correlations between population dynamics and recurrent excitatory input (cf. Fig. 1 of [51]). In contrast, our model shows large cross-correlations [Fig. 12(d)-(f)]. Given that cortical activity switches between states with small and large cross-correlations of EEG/LFPs and neuronal activities [15], our theory and the previous theories may separately model two different operating regimes of the same cortical circuits.
In our theory, the dynamical nature of the critical fluctuations depends strongly on the detailed configuration of the synaptic connections. On the other hand, accumulating evidence suggests that the connectivity of local cortical circuits is rapidly remodeled [72]. Whether the fine connectivity structure has a strong impact on critical fluctuations in cortical network dynamics, and what functional implications such an impact would have, needs to be clarified in future work.

D. Limitations and future extensions of the theory
Despite the advantages of our theory mentioned above, it is fair to say that the validity of the theory is still restricted by the simplicity of the model settings. One of the most important steps toward widening the applicability of our theory is to extend it to networks of spiking neurons. In the present study, we gave priority to analytical tractability and simplicity in numerical simulations, and we restricted ourselves mainly to networks of firing-rate model neurons. However, the extension of our theory to networks of more realistic model neurons should be straightforward. This can be done by regarding the balanced inputs to individual neurons as Gaussian fluctuations and by determining the related statistics so as to ensure consistency with the nonlinear dynamics of single neurons. In this calculation, we can use our method in combination with a previous method [31] to describe the network dynamics. In that study, the mean-field equations for a simple RNN of nonlinear firing-rate units without balanced excitation and inhibition were solved by using statistics calculated from extensive numerical simulations of single units driven by random forces. Applying the combined method to a critically balanced network of spiking neurons would be computationally demanding but in principle doable.
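The single-unit step of this combined approach can be sketched as follows. This is a minimal sketch, not the authors' code: the recurrent balanced input to a neuron is replaced by generic Gaussian noise (here an Ornstein-Uhlenbeck process standing in for the self-consistent input statistics), and the output statistics of φ(h) that would close the self-consistency loop are estimated from many independent realizations. All parameter values are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (assumed parameters, not the authors' code) of the
# single-unit step in the scheme of [31]: the net input to a neuron is
# modeled as Gaussian (Ornstein-Uhlenbeck) noise, and the output
# statistics of phi(h) are estimated over many independent realizations.
# In the full method these statistics would be fed back to update the
# input noise until convergence.
rng = np.random.default_rng(0)
dt, n_steps, n_units = 0.05, 4000, 500
tau_in, sigma_in = 2.0, 1.0      # assumed input-noise parameters
phi = np.tanh                    # firing-rate nonlinearity

h = np.zeros(n_units)            # membrane-like variable, one per unit
xi = np.zeros(n_units)           # OU input noise, one per unit
rates = np.empty((n_steps, n_units))
for t in range(n_steps):
    xi += dt / tau_in * (-xi) \
        + sigma_in * np.sqrt(2.0 * dt / tau_in) * rng.standard_normal(n_units)
    h += dt * (-h + xi)          # leaky single-unit dynamics
    rates[t] = phi(h)

# statistics that would close the self-consistency loop
mean_rate = rates[n_steps // 2:].mean()
var_rate = rates[n_steps // 2:].var()
```

In the full scheme, the measured autocorrelation of φ(h) would replace the assumed input statistics on the next iteration.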
Although we leave this challenge for a future study, we note that the computational costs associated with the approach of [31] cannot be reduced by commonly employed approximate treatments such as a white Gaussian approximation of the inputs to neurons [25,61]. This is because we must take into account the time-dependent nonlinear interactions between the microscopic neuronal fluctuations and the macroscopic population activity underlying the critical multiscale dynamics; these interactions cannot be handled by the approximate treatments. This distinction between a full treatment and an approximate treatment may be related to the recent controversy about the transition in networks of spiking neurons from a state with irregular spiking at a constant rate to a state with irregular firing-rate fluctuations as the mean strength of the synaptic connections is increased [61]. Although the interpretation of this observation based on an approximate description was controversial [73], a full treatment is expected to give an accurate description.
Other simplified aspects of the model include the neglect of the different cellular properties of excitatory and inhibitory neurons and of the spatial structure of cortical circuits. Although our theory can be extended to include these elements, substantial work will be needed. For example, in models in which excitatory and inhibitory neurons have different membrane time constants, the automatic cancellation of excitatory and inhibitory inputs with φ̄_E = φ̄_I is not ensured by the condition in Eq. (3). However, it is plausible that some feedback mechanism dynamically clamps the population activity so as to satisfy φ̄_E = φ̄_I, in which case critical multiscale dynamics similar to those we have observed are expected to emerge. In experimental studies, characteristic spatial responses have been observed when a small number of neurons were stimulated [74]. Thus, it is an important open challenge to understand how critical multiscale dynamics emerge in a spatially extended model and to examine whether those dynamics are consistent with the experimentally observed spatial patterns.
From the theoretical point of view, another question that remains unanswered concerns the way qualitatively different solutions bifurcate in our model as we increase the magnitudes of excitation and inhibition. Theoretical analysis of this bifurcation is hard because the MFT is constructed by averaging over network configurations, while the bifurcation point depends strongly on the individual configuration. Regarding this point, a recent study [33] developed a theory of linear dynamics for disordered systems with individual configurations. The stochastic linear response theory shown in Appendices H and I also allows us to analyze the response dynamics around fixed points and regular coherent oscillations for individual configurations. However, to identify the type of a bifurcation, information is needed about the lowest-order nonlinear term relevant to the bifurcation. We expect higher-order corrections to the lowest-order result to provide nonlinear response dynamics valid for individual configurations and, thereby, information about the bifurcations.

Appendix A: Details of numerical simulations

In the main text, we directly simulate the model equations, Eq. (1) or (50), together with the synaptic weights described by Eq. (2) or (5). In this section, we describe the details of the simulations. We first describe the random variables J^{ij}_{kℓ}. As indicated in [60], we can choose the random variables for the connectivity such that the model does not violate Dale's law, the rule that prohibits a neuron from forming both excitatory and inhibitory connections. We use random variables with zero mean and unit variance. The sign in front of J^{ij}_{kℓ} is positive if the population ℓ is excitatory and negative otherwise. With the same sign convention for J^{ij}_{kℓ}, these random variables give the required statistics for both Eqs. (2) and (5). Note the effect of the adjustment in the second of Eq. (5).

With these random synaptic weights, we integrate the model equations using the fourth-order Runge-Kutta algorithm [75] with discrete timesteps of size ∆t = 0.05 for Eq. (1) and ∆t = 0.02 for Eq. (50). We use N = 10240 for most of the results, except that we use N = 40960 in Fig. 3.

We show the entire eigenvalue spectra of the synaptic weight matrices of the untuned and finely tuned models in Fig. 13(a) and (b). As pointed out in previous studies [58-60], the synaptic weight matrices of the untuned model have configuration-dependent outliers, while those of the finely tuned model do not. We also examine the coefficient matrix B of the linear variational equation (B1), which describes how an infinitesimal variation in h evolves along a simulated trajectory [Fig. 14(a)]. Since the sum of a row of B is not finely tuned to a fixed value even if J is the synaptic weight matrix of the finely tuned model, a previous result suggests that B has configuration-dependent outlier eigenvalues [58]. We observe this in Fig. 14(b). In Fig. 14(c), we also show the eigenvalue spectrum of the same random coefficient matrix whose entries are instead generated independently in such a manner that they have the same first- and second-order moments as those used for Fig. 14(b). These spectra have a common disk-shaped distribution of eigenvalues, with outlier eigenvalues at different positions, as expected from the previous study [58].

[Caption of Fig. 14: The eigenvalue spectra of B are also calculated using two sets of independent random weight configurations and independent random neuronal activities with the same first- and second-order moments as those of the simulated activities used in (b) (indicated as "random 1" and "random 2", respectively). In these spectra, outlier eigenvalues are observed [arrowheads in (b) and (c)], while most of the eigenvalues are distributed over a common disk. In (c), most of the eigenvalues for "random 2" in the common disk are hidden behind those for "random 1", although the distributions in the common disk are quite similar in the two settings.]

A previous study [65] also calculated the spectra of the coefficient matrices of the linear variational equations around the dynamics of RNNs and showed that they agreed with those estimated from random matrix theory.
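The outlier phenomenon invoked here can be reproduced in a few lines. The following sketch uses illustrative values, not the paper's matrices: an iid Gaussian matrix with entries of standard deviation g/√N has its eigenvalues on a disk of radius g (circular law), and adding a rank-one component u vᵀ produces a single outlier near the eigenvalue vᵀu of the rank-one part, as in [58], while the bulk disk is essentially unchanged.

```python
import numpy as np

# Sketch of the outlier result of [58] (illustrative values, not the
# paper's matrices).  Bulk: circular law of radius g.  The rank-one
# component u v^T contributes one outlier eigenvalue near v^T u.
rng = np.random.default_rng(1)
N, g = 1000, 0.5
J = g / np.sqrt(N) * rng.standard_normal((N, N))  # bulk: disk of radius g
u = np.ones(N) / np.sqrt(N)
v = 2.0 * np.ones(N) / np.sqrt(N)                 # v^T u = 2 -> outlier near 2

eig_bulk = np.linalg.eigvals(J)
eig_pert = np.linalg.eigvals(J + np.outer(u, v))
outlier = eig_pert[np.argmax(np.abs(eig_pert))]   # eigenvalue outside the disk
```

Plotting `eig_bulk` and `eig_pert` in the complex plane reproduces the disk-plus-outlier pattern described above.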
The results also agreed with the largest Lyapunov exponent calculated from the linear response theory based on an MFT. By analogy with this, our stochastic linear response theory derived in Sec. IV B 3, Sec. IV B 4, Appendix H, and Appendix I is expected to allow a further quantitative evaluation of the agreement between random matrix theory with outliers and our stochastic MFT. We leave this as a challenge for the future.

Appendix C: Frequencies of the qualitatively different solutions

Here, we investigate how often each of the qualitatively different solutions is observed in simulations of the network models with different system sizes. We do this by conducting simulations of the models with random weight configurations. We use the same model equations and parameter values as for Figs. 6(g), 10(c), and 12(c), and for the occasionally observed fixed-point and limit-cycle solutions of autonomous networks with finely tuned synaptic weights [Fig. 16]. We also examine the stimulus-driven dynamics of networks of LIF neurons by applying sinusoidal inputs to √N excitatory neurons, as described by Eqs. (4) and (46); the input term in these equations is added to the right-hand side of Eq. (50). As mentioned in the main text, the entire network becomes entrained to the sinusoidal input in a configuration-dependent manner as we increase the amplitude of the input [Fig. 17].
In the model settings for Figs. 6(g), 10(c), 12(c), and 17(b), we find that the frequency with which each type of solution is observed does not depend much on the system size [Fig. 15(b)-(e)], whereas it does in the model setting for Fig. 16 [Fig. 15(a)]. These findings are consistent with the results of the stability analysis we perform in Sec. IV B 3, Sec. IV B 4, Appendix H, and Appendix I.
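The largest Lyapunov exponent used to classify solutions throughout this study can be estimated with a standard two-trajectory (Benettin-type) scheme, sketched below for a generic random rate network with illustrative parameters rather than the paper's balanced model; the gain g > 1 places this surrogate network in its chaotic regime, so the estimated exponent should be positive.

```python
import numpy as np

# Benettin-type estimate of the largest Lyapunov exponent (standard
# technique; generic random rate network, not the paper's model).
# Two nearby trajectories are co-evolved and the separation is
# renormalized to d0 at every step; the mean log growth rate of the
# separation gives the exponent.
rng = np.random.default_rng(2)
N, g, dt, n_steps, d0 = 200, 2.0, 0.05, 20000, 1e-8
J = g / np.sqrt(N) * rng.standard_normal((N, N))  # g > 1: chaotic regime

def step(x):
    return x + dt * (-x + J @ np.tanh(x))         # Euler step of dx/dt = -x + J phi(x)

x = rng.standard_normal(N)
for _ in range(2000):                             # discard the transient
    x = step(x)
y = x + d0 * rng.standard_normal(N) / np.sqrt(N)  # perturbed twin trajectory

log_growth = 0.0
for _ in range(n_steps):
    x, y = step(x), step(y)
    d = np.linalg.norm(y - x)
    log_growth += np.log(d / d0)
    y = x + (d0 / d) * (y - x)                    # renormalize the separation

lyap = log_growth / (n_steps * dt)                # positive -> chaos
```

A negative value of `lyap` would instead indicate the non-chaotic, input-entrained or coherent regimes discussed in the text.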

Appendix D: Application of irregular external inputs
A different type of external input of particular interest is an irregular input with no periodicity. As an example of such an input, we use the following filtered noise:

I_ext(t) = A F⁻¹[Θ(ω_max − ω) Θ(ω − ω_min) F[I_WG]](t),

where F and F⁻¹ indicate the Fourier and inverse Fourier transforms, respectively, and I_WG denotes white Gaussian noise with unit variance, ⟨I_WG(t) I_WG(t−τ)⟩ = δ(τ). The domain of integration lies between ω_min and ω_max, which are given suitably in the discrete Fourier transforms we use for the numerical calculations below [75].
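A minimal numerical sketch of generating such an input (cutoffs and amplitude are illustrative assumptions, not the paper's values): white Gaussian noise is Fourier transformed, the components outside the band are zeroed, and the result is transformed back.

```python
import numpy as np

# Sketch of the band-limited irregular input (assumed cutoff values and
# amplitude): white Gaussian noise is filtered in the frequency domain
# by zeroing all components outside [omega_min, omega_max].
rng = np.random.default_rng(3)
dt, n_steps = 0.05, 2**14
A, omega_min, omega_max = 1.0, 0.1, 1.0              # assumed parameters

i_wg = rng.standard_normal(n_steps) / np.sqrt(dt)    # <I(t)I(t-s)> ~ delta(s)
omega = 2.0 * np.pi * np.fft.rfftfreq(n_steps, d=dt)
keep = (omega >= omega_min) & (omega <= omega_max)   # band-pass window
i_ext = A * np.fft.irfft(np.fft.rfft(i_wg) * keep, n=n_steps)
```

The resulting `i_ext` is irregular but has power only inside the retained frequency band.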
To filter out the high-frequency components, we use the step function Θ(ω) = 1 for ω ≥ 0 and Θ(ω) = 0 for ω < 0. Examining the properties of the network dynamics resulting from these irregular inputs, we find that the largest Lyapunov exponent decreases and becomes negative as we increase the amplitude A of the input. By analogy with the results shown in Sec. IV C 3, we hypothesize that the neuronal activities are coherent in the dynamics with negative largest Lyapunov exponents. In Fig. 18(a) and (b), we show typical activity patterns of the networks with negative and positive largest Lyapunov exponents given these irregular inputs. Coherence among the neurons is not apparent in the activity patterns. To further test the hypothesis, we read out values linearly from these networks according to Eqs. (47) and (48) and examine the scaling of the read-out values in the same manner as for Fig. 11(c) and (d). Fig. 18(c) and (d) show that the normalized standard deviations of the values read out from networks with negative exponents do not depend much on the system size, while those calculated from networks with positive exponents are roughly proportional to 1/√N. This suggests that the networks undergo a transition to coherent dynamics as we increase the amplitude of the irregular external input, even though coherence among the neurons is not obvious from their activity patterns.

Appendix E: Derivation of the mean-field theory

In this section, we derive the MFT described in Sec. III based on a path-integral representation of the dynamics. The approach based on path-integral representations has recently become increasingly popular in the analysis of RNNs and other disordered systems [34,65,76-80]. Our argument and notation follow [80]. We refer readers to the first two chapters of [80] for precise definitions of, and notation for, the path integrals we use below.
We first analyze the autonomous dynamics of the model with finely tuned synaptic weights described in Sec. III A, starting from the moment-generating functional for the dynamics of the present model from an initial condition a. Following [80], we collectively denote {h^{(k)}_i}_{k,i} by h as a single column vector and add the subscript α as the time index. The superscript T denotes transposition. The probability density over the sample paths is denoted by p. Hereafter, we sometimes use the same sort of collective notation without mentioning it. The main step in deriving the MFT for the present model is to transform this functional into an integral with respect to the sample paths of the mean activity. Setting the right-hand side of the model equation equal to f(h) and using the Fourier representation of the Dirac delta functional, the generating functional is transformed into the form given below. In this transformation, we introduce an auxiliary field, j̃, for calculating the response function (see [80]). In defining the path integrals for the dynamics, we adopt the Ito convention and take the noiseless limit.
In this section, we assume that the noiseless limit, the thermodynamic limit, and the stepsize limit all commute with one another. Note that we have Z[0, j̃] = 1 for any j̃ (E4), because this quantity is the limit of integrals of proper probability densities. In the following, we represent inner products with respect to time in the L² sense, using vectorial notation. Rewriting the above with a concrete form for f and making the dependence on the configurations explicit, we obtain the action defined below. We define the vector and the matrix whose entries are all unity at each timestep as 1 and M₁, respectively; the left action of M₁ is then given accordingly. The column vector consisting of φ(h^{(k)}_i) is denoted by φ(h^{(k)}). From the first line to the second line of Eq. (E6), we have inserted a Dirac delta functional equating m_k and 1ᵀh^{(k)} for each t, where 1ᵀ denotes the operation defined in Eq. (E9). Note that, under the condition in Eq. (3), inserting any value of φ̄_ℓ does not affect the value of the integrand in the last line of Eq. (E6) as long as φ̄_E = φ̄_I, and hence Σ_{ℓ=E,I} g_{kℓ} φ̄_ℓ = 0, holds. We will determine the precise value of φ̄_k below. Next, we take the configurational average of the integrand of the above equation. We consider unit Gaussian measures for the J^{ij}_{kℓ}, denoted by N(J^{ij}_{kℓ}); other distributions for these random variables can be analyzed in essentially the same manner, and any distribution with zero mean and unit variance gives the same result (to see this, expand the exponential in the integrand in terms of small values). Focusing on the term involving J^{ij}_{kℓ} and taking the product of the second line over k, ℓ, i, j, we obtain the part of the integrand involving {J^{ij}_{kℓ}}_{k,ℓ,i,j}. Here, φ̄_ℓ denotes the average of φ(h^{(ℓ)}_j) over population ℓ.
We now introduce auxiliary fields by inserting Dirac delta functionals of the form given below. Following the convention in [80], we regard Q_{ℓ,i} (i = 1, 2) as matrices and use the corresponding matrix notation. Using Eqs. (E11)-(E13), we take the average of the moment-generating functional over the probability distribution P_J(J) of the J^{ij}_{kℓ}. Using this representation, we note that the argument of the first exponential takes the form of an independent interaction between each neuron and the auxiliary fields when the values of θ_k, φ̄_ℓ, ψ_ℓ, Q_{ℓ,1}, and Q_{ℓ,2} are fixed.
In the above equation, we notice that, for fixed sample paths of θ and m, the integral of the first exponential function with respect to Q₁, Q₂, ψ, φ̄, h, and h̃ gives the generating functional for the dynamics of a fictitious RNN with a uniform external input θ_k(t). The dynamics of this fictitious RNN are no longer subject to the effects of balanced excitation and inhibition, and can therefore be solved by a conventional MFT. We first rewrite the averaged generating functional for these dynamics. The method for obtaining the dynamics described by this moment-generating functional has been developed in previous studies [32,80]. In the rewritten functional, h^{(k)} and h̃^{(k)} are no longer collections of variables corresponding to individual neurons but one-dimensional variables for a representative neuron subject to the mean fields. Similarly, l^{(k)} and l̃^{(k)} are one-dimensional variables that take values randomly drawn from the measure P_{j,j̃} corresponding to j and j̃. Applying the saddle-point method, we find that the entire probability mass of the path integral of the dynamics at j = j̃ = 0 concentrates at the values of φ̄_ℓ, ψ_ℓ, Q_{ℓ,1}, and Q_{ℓ,2} that maximize the integrand of the first of Eq. (E18).
Taking the functional derivatives and examining the stationarity conditions, we obtain these optimal values. Here, the bracket denotes the expected value of its argument over all possible sample paths, weighted according to the path-integral representation of the fictitious dynamics. The zeros of Q*_{ℓ,2} and ψ*_ℓ are obtained by taking the directional functional derivative with respect to j̃^{(k)}_i(t) = α^{(k)}(t) at α = 0 and j = 0 and using the normalization of the probability density (see also the arguments in [80,81]). Indeed, we have Q*_{ℓ,2} = ψ*_ℓ = 0 because Z*[0, j̃|θ] = 1 for any j̃, similarly to Eq. (E4). Then, following the argument in [80], we can regard the dynamics of individual neurons, as described by the second of Eq. (E18), as linear dynamics driven by a set of i.i.d.
Gaussian noise with correlation Q*_{ℓ,1} and drift term θ_k. This yields the solution for the fictitious RNN described by Eq. (E16). We now return to the generating functional for the original dynamics in Eq. (E15). Note that the integrand of Eq. (E17) gives a proper probability density over the sample paths of the fields. Then, because of the independence among neurons for fixed values of θ_k and m_k, the integrand in Eq. (E15) can be rearranged, with the aid of the central limit theorem, into the form given below. Here, we have set the values of φ̄_E = φ̄_I to ⟨φ(h^{(ℓ)}_j)⟩, the configuration and population average of φ(h^{(ℓ)}_j) over the fictitious dynamics for each sample path of θ_k. We have defined η_ℓ := (1/√N) 1ᵀ(φ(h^{(ℓ)}) − φ̄_ℓ 1). We determine the normalization term F_ℓ below. Noting that the auxiliary field m_k imposes equality between θ_k and Σ_ℓ g_{kℓ} η_ℓ, this equation determines the probabilities with which the sample paths of η_ℓ and θ_k are realized, and the density of each value of η_ℓ is given by the expression below. Note that this density is at most O(1) in terms of the neuron count and therefore does not affect the values of Q_{ℓ,1} and Q_{ℓ,2}, which carry most of the probability mass.
Because of the condition g_EE = g_IE = g₀ and g_EI = g_II = −g₀, we have θ_E = θ_I := θ from θ_k = Σ_ℓ g_{kℓ} η_ℓ. Since the right-hand sides of Eq. (E22) take the same values for k = E, I, the corresponding equalities follow. We have Q*_{E,1} = Q*_{I,1} = C from Eq. (E20), and hence F_E = F_I := F. Defining η = (√2/2)(η_E − η_I) and η̄ = (√2/2)(η_E + η_I), and combining these with Eqs. (E22) and (E24) and the equality θ_E = θ_I = g₀(η_E − η_I), we obtain the realization probability of m(t), which is equivalent to Eqs. (17) and (19). We also obtain the dynamical equation for the correlation matrix of the microscopic fluctuations from Eqs. (E22) and (E27), which is identical to Eq. (15). Finally, we identify the normalization term F. Consider the representation η = Hξ. If H is independent of η, this transformation, together with the probability density in Eq. (E28), yields i.i.d. unit Gaussian variables ξ. In this case, the normalization term is given by exp(−F) = exp(−(1/2) ln|C|) = |C|^{−1/2}, which is the Jacobian accompanying the transformation of probability densities from η to ξ. However, in the present setting, H depends on η. In this case, taking into account the nonlinear deformation of the coordinates spanned by ξ, we expect the normalization term F to be given by the log-Jacobian ln|H + (∂H/∂ξ)ξ|, so that the transformation η = Hξ still gives unit Gaussian variables. The Jacobian matrix can be rearranged to yield Eq. (18). Although a rigorous treatment of this point runs into the mathematical difficulties of rigorously defined path integrals, the validity of the above normalization term is supported by the fact that it yields the proper normalization of the path integral, consistently with Eq. (E4). This normalization needs to be kept in mind as we perform the perturbative expansion below.
Below, we describe in outline how the above theory is modified for the case with external input to √N excitatory neurons and for the case with untuned synaptic weights. For the case with I(t) ≠ 0 in Eq. (4), an O(√N) correction is introduced into Eq. (E18). In the subsequent saddle-point method, however, this O(√N) correction does not affect the values of Q*_{ℓ,j}, ψ*_ℓ, and φ̄*_ℓ in the leading order, so the neurons feel the same Gaussian fields as those described by Eq. (E20). The second exponential in Eq. (E15) is then modified accordingly, with φ̄_ℓ = ⟨φ(h^{(ℓ)}_j)⟩_{j∉S}. Here, 1_S (1_{S̄}) denotes a vector whose elements are one for indices belonging (not belonging) to S and zero otherwise. We easily find that the second term in the modified exponential gives the driving-force term, φ̃, for the mean dynamics [Eq. (29)]. We also find that excluding the sum of the √N random values in the first term, followed by the division by √N, does not affect the value of the driving-force term η in the leading order. Thus, with η given by Eq. (E28), we obtain the mean-field equations for the stimulus-driven dynamics as presented in Sec. III D.
For the untuned model, we easily see that averaging with respect to the untuned random synaptic weights leads to the corresponding replacement of ∆φ. By a straightforward application of the same argument as above, we obtain the results presented in Sec. III C.
Appendix F: Efficient method for solving the mean-field equations

In this section, we show how the stochastic mean-field equations are numerically solved by recursively updating the statistics characterizing the microscopic and macroscopic dynamics of the system. In the main text and Appendix E, we found that the self-consistent equations (14) and (15) determine the time evolution of the statistics of the microscopic fluctuations. Solving these equations directly requires us to retain large matrices. To reduce the computational cost, we recursively update matrices of smaller size by using an auxiliary matrix R that retains the effect of the boundary conditions; R is the solution of the ordinary differential equation (1 + ∂_{t₁})R = 2σ₀²C from suitable initial conditions, and the initial conditions do not affect the values of A at time indices distant from those of the initial conditions because of the decay with the unit decay constant. For the dynamics of the untuned model, the matrix C should be suitably replaced by C̃ = C + φ̄φ̄ᵀ.
Suppose that we have the values of m(t₁), η(t₁), φ(t₁), C(t₁, t₂), D(t₁, t₂), and R(t₁, t₃) for timesteps t ≥ t₁, t₂ ≥ t − T₁ and t − ∆t ≥ t₃ ≥ t − T₁. For the model with external inputs, also suppose that we have the values of φ̃(t₁). The length of the time interval T₁ should not be too large. In what follows, we obtain the values of these vectors and matrices for a time index one step beyond the currently available entries.
We first update the value of m, which is obtained from the values of η(t), φ(t), and φ̃(t) by applying the Euler method to Eq. (19) or (29).
Next, we update R. The values of R(t + ∆t, t 3 ) for t− ∆t ≥ t 3 ≥ t− T 1 are obtained by solving Eq.(F2) from the initial values, R(t, t 3 ), by using the Euler method.
For the values of R(t₁, t), we need to solve Eq. (F2) from the unknown initial value R(t − T₁, t). We obtain this initial value by applying a discrete Fourier transform to Eq. (F2) [Eq. (F3)]. This implicitly assumes that the regularity of the correlation function 2σ₀²C is captured by the discrete Fourier transform. Indeed, if the microscopic dynamics of the model are chaotic and have a relatively short correlation time, then C(t₁, t) ≈ 0 holds for t₁ ≤ t − T₁, and we can use the initial condition R(t − T₁, t) ≈ 0, which is also given by Eq. (F3). If the entire dynamics have periodic components with a relatively short period, the Fourier transform detects the corresponding peak of the correlation function in the frequency domain, and Eq. (F3) is expected to provide an accurate initial condition. At least one of these two situations almost always holds if we use a sufficiently large value of T₁. The exception among the cases we have investigated in this study is the one in which the external input is irregular and the network dynamics retain a finite, irregularly varying correlation over a very long time. In this case, the above two assumptions do not hold, and we do not attempt to obtain a numerical solution from the MFT.
We use the initial value of the solution obtained by solving Eq. (F3) and then applying the inverse discrete Fourier transform. Note that, although the discrete Fourier transform captures decaying and regular patterns in the autocorrelation function, it does not necessarily provide a good approximation to the solution over the entire time domain. Thus, we solve the ordinary differential equation (F2) again from the initial values thus obtained, rather than directly using the solution obtained from the inverse discrete Fourier transform.
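The Fourier-domain step can be sketched as follows: for a correlation function that is (approximately) periodic on the window, the equation (1 + ∂_{t₁})R = 2σ₀²C becomes (1 + iω)R̂ = 2σ₀²Ĉ in the frequency domain, which is solved by pointwise division and inverted. The correlation function below is a periodic stand-in, not the model's.

```python
import numpy as np

# Sketch of the Fourier-domain solve behind Eq. (F3): with a periodic
# correlation function, (1 + d/dt) r = 2 sigma0^2 c becomes
# (1 + i omega) r_hat = 2 sigma0^2 c_hat.  The cosine below is a
# stand-in for the correlation function C(t1, t), not the model's.
dt, n, sigma0 = 0.05, 2**12, 1.0
t = dt * np.arange(n)
period = n * dt / 8                       # an exact number of periods fits
c = np.cos(2.0 * np.pi * t / period)      # stand-in correlation function

omega = 2.0 * np.pi * np.fft.rfftfreq(n, d=dt)
r_hat = 2.0 * sigma0**2 * np.fft.rfft(c) / (1.0 + 1j * omega)
r = np.fft.irfft(r_hat, n=n)              # solves (1 + d/dt) r = 2 sigma0^2 c
# r[0] would supply the missing initial value R(t - T1, t).
```

As in the text, only the boundary value would be taken from this solution; the ODE is then re-integrated forward from it.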
Once we obtain the updated values of R, those of D are obtained straightforwardly with the Euler method by noting that Eqs. (F1) and (F2) yield

(1 + ∂_{t₂}) D(s, t₂) = R(s, t₂).    (F4)

We are not concerned about the initial conditions for this equation; for the values of D(s, t + ∆t) with s ≤ t, we can use the values of D(s, t) as initial conditions. The value of D(t + ∆t, t + ∆t) is almost independent of the initial value D(t + ∆t, t − T₁), since the variation at time t₂ = t − T₁ converges exponentially with unit time constant. Next, we obtain the updated values of C(t + ∆t, s), φ(t + ∆t), and φ̃(t + ∆t) from D and m. They are computed by using Eqs. (11), (14), and (29). In these computations, y₁, y₂, and w are independent unit Gaussian variables, with w integrated with respect to dN(w) = exp(−w²/2) dw/√(2π), and we use the abbreviations D_{αβ} = D(t_α, t_β) and m_α = m(t_α) for α, β = 1, 2. To calculate the double integral appearing in the computation of φ̃ efficiently, we use a table that retains the values obtained by performing one of the two integrations, with respect to y₁ or y₂, for a fixed value of w. More precisely, we prepare a table consisting of the values of the integral for different values of (α, β). In the calculation of Eq. (F5), we perform the double integration by interpolating the values in the table and integrating them with respect to w. Finally, we obtain a realization of the random variable η(t + ∆t). Recall that the realization probability of η is given by Eq. (20). The corresponding conditional Gaussian distribution has mean μ_{t+∆t} and variance ν_{t+∆t}, defined in terms of the column vector c_{t+∆t}(s₁) := C(s₁, t + ∆t) and the matrix C_{t:t−T₂}(s₁, s₂) := C(s₁, s₂) for the restricted range of time indices t − T₂ ≤ s₁, s₂ ≤ t. Since approximation errors may be larger at the boundary of the time domain, we use a smaller value of T₂ (< T₁).
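The conditional-Gaussian step can be sketched as follows: given past samples of a zero-mean Gaussian process with covariance C, the value at a new time is Gaussian with mean cᵀC⁻¹η_past and variance C(t,t) − cᵀC⁻¹c (a Schur complement). The exponential (Ornstein-Uhlenbeck) kernel below is a generic stand-in for the model's correlation function, chosen because its Markov property gives closed-form values to check against.

```python
import numpy as np

# Sketch of the conditional-Gaussian draw: conditional mean
# c^T C^-1 eta_past and variance K(t,t) - c^T C^-1 c.  The exponential
# kernel is a stand-in for the model's correlation function, with a
# small ridge on the inverse as described in the text.
rng = np.random.default_rng(4)
dt, n_past = 0.1, 50
times = dt * np.arange(n_past + 1)
K = np.exp(-np.abs(times[:, None] - times[None, :]))   # stand-in covariance

C_past = K[:n_past, :n_past]
c_vec = K[:n_past, n_past]                 # cov(past, new point)
eta_past = np.linalg.cholesky(C_past) @ rng.standard_normal(n_past)

C_inv = np.linalg.inv(C_past + 1e-10 * np.eye(n_past))  # small ridge
mu = c_vec @ C_inv @ eta_past                     # conditional mean
nu = K[n_past, n_past] - c_vec @ C_inv @ c_vec    # conditional variance
eta_next = mu + np.sqrt(max(nu, 0.0)) * rng.standard_normal()
```

For this Markov kernel, conditioning on the whole past reduces to conditioning on the last sample, so mu = e^{-dt} eta_past[-1] and nu = 1 − e^{-2dt}, which provides a check on the linear algebra.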
From the above, we obtain a realization of η(t + ∆t) using an independent unit Gaussian random variable ξ_{t+∆t}. In this way, we obtain all the necessary updated values. For the computation in Eq. (F12), we need to calculate the inverse matrix C⁻¹_{t:t−T₂}. Since our computation of the matrix C is based on numerical integration, small errors are inevitable, and when the inverse matrix is computed, the effects of small errors in the small eigenvalues of the matrix can be large. Thus, we introduce a small ridge, computing (C_{t:t−T₂} + ε diag(C_{t:t−T₂}))⁻¹ instead of C⁻¹_{t:t−T₂}, where the diagonal matrix diag(C_{t:t−T₂}) consists of the diagonal elements of C_{t:t−T₂}. This amounts to ignoring the small eigenvalues of C. Also, since taking a matrix inverse at every timestep is inefficient, we update the inverse matrix using the rank-one update formula for a square symmetric matrix A and column vectors p and q of the corresponding size. Applying this formula with A = C_{t:t−T₂}, p(t − T₂) = −1, p(t − s₁) = 0 for s₁ ≠ T₂, q(t − T₂) = 0, and q(s₂) = C(t − T₂, s₂) for t − T₂ < s₂ ≤ t gives the inverse matrix of C_{t:t−T₂+∆t} in the upper-left part of the output matrix. Applying it with A(t + ∆t, t + ∆t) = C(t + ∆t, t + ∆t), A(s₁, t + ∆t) = A(t + ∆t, s₁) = 0, A(s₁, s₂) = C(s₁, s₂), p(t + ∆t) = 1, p(s₁) = 0, q(t + ∆t) = 0, and q(s₁) = C_ℓ(t + ∆t, s₁) for t − T₂ < s₁, s₂ ≤ t then gives C⁻¹_{ℓ,t+∆t:t−T₂+∆t}. To avoid the accumulation of numerical errors, we directly compute the inverse matrix every 500 timesteps. In the main text, we use the following parameter values: T₁ = 480 and T₂ = T₁/2 for Figs. 6 and 10, and T₁ = 960 and T₂ = T₁/2 for Fig. 2. We use ε = 1.0 × 10⁻⁶ for g₀ = 0.25 in Fig. 2, ε = 1.0 × 10⁻³ for Figs. 6 and 10, and ε = 1.0 × 10⁻⁴ for the rest. We use discrete timesteps with stepsize ∆t = 15/128.
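The rank-one inverse update referred to above appears to be the Sherman-Morrison formula: for invertible A and vectors p, q with 1 + qᵀA⁻¹p ≠ 0, (A + pqᵀ)⁻¹ = A⁻¹ − (A⁻¹p)(qᵀA⁻¹)/(1 + qᵀA⁻¹p), which updates the running inverse in O(n²) per timestep instead of O(n³). A minimal sketch verifying it against a direct inverse (random test data, not the model's C):

```python
import numpy as np

# Sherman-Morrison rank-one inverse update, checked against a direct
# inverse.  Matrix and vectors are random test data, not the model's.
rng = np.random.default_rng(5)
n = 60
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)            # symmetric positive definite
A_inv = np.linalg.inv(A)
p, q = rng.standard_normal(n), rng.standard_normal(n)

Ap = A_inv @ p                         # O(n^2) pieces of the update
qA = q @ A_inv
updated_inv = A_inv - np.outer(Ap, qA) / (1.0 + q @ Ap)

direct_inv = np.linalg.inv(A + np.outer(p, q))
err = np.max(np.abs(updated_inv - direct_inv))
```

Periodically recomputing the inverse directly, as the text describes, bounds the accumulation of floating-point error from repeated updates.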
Each simulation is performed using 16 cores of recent Intel Xeon processors in parallel and takes from a few hours to a few days.
Appendix G: Perturbative expansion for chaotic solutions with mean activities of small amplitudes

In Sec. IV A, we mentioned a perturbative expansion around g_0 = 0 for the calculation of the moments of the mean activity. This perturbative calculation scheme, however, turns out not to work well. We briefly explain how it is performed and why it fails. According to our MFT, the probability distribution over sample paths of the mean activity m(t) is given in terms of C and F; in this section, we make explicit the dependence of C and F on η. Although this distribution is derived for sample paths from a fixed initial condition, we assume that it holds on the entire time axis. This is justified by the intuition that the mixing property of the chaotic dynamics keeps the calculated moments from being severely affected by the boundary values.
In what follows, for illustration, we focus on the calculation of the autocorrelation in the frequency domain, which is easier to carry out than that in the time domain. Here, ·̂ denotes the Fourier transform. Our objective is to compute this moment, for which we define the normalization constant Z and the Fourier transform of the autocorrelation. For this calculation, we need to compute the perturbed correlation matrix, which can be carried out by differentiating Eq. (E29) with respect to η. The first-order response in D(t, s), denoted δD^{(1)}(t, s), then obeys a linear equation with coefficient functions a_1(t − s), a_2(t − s), and a_3(t, s). Here, the average over the unperturbed dynamics with m(t) ≡ 0 is denoted by the angle bracket with subscript 0, and we use abbreviations such as φ_0 ≝ ⟨φ(h(t))⟩_0. These coefficients originate from the differentiation of C(t, s) with respect to D(t, s) and m(t), as summarized in Appendix J. Since φ(h(t)) − φ_0 and φ′(h(t)) − φ′_0 are odd and even functions of h(t), respectively, and since the dynamical variable h(t) is distributed symmetrically around h(t) = 0 in the unperturbed dynamics, we find a_3(t, s) = 0. Thus, there is no driving force in the equation, and we obtain δD^{(1)}(t, s) ≡ 0.
Next, we examine the second-order response in D(t, s), which obeys a non-homogeneous linear equation, Eq. (G5). To obtain a solution to this equation, we first compute an approximate solution δD^{(2)}_0 by ignoring the first three terms on the right-hand side of Eq. (G5); this approximate equation is solved in the frequency domain. The approximate solution leaves a residual error on the right-hand side of Eq. (G5). We then recursively correct the solution by treating the residual error as a new non-homogeneous term and solving

(1 + ∂_t)(1 + ∂_s) δD^{(2)}_{k+1}(t, s) = a_1(t − s) δD^{(2)}_k(t, s) + a_2(t − s) δD^{(2)}_k(t, t) + a_2(s − t) δD^{(2)}_k(s, s)

in the frequency domain.
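The structure of this recursion can be illustrated with a constant-coefficient toy version. In the actual calculation, a_1 and a_2 are kernels and the products become convolutions in the frequency domain; the function below is only a sketch of the iteration scheme:

```python
import numpy as np

def sum_series(S_hat, a, w1, w2, tol=1e-12, kmax=10_000):
    """Sum delta-D = sum_k delta-D_k for a constant-coefficient caricature of
    the recursion in Eq. (G8): each iterate is obtained from the previous one
    by multiplying by a and dividing by (1 + i*w1)(1 - i*w2) in the frequency
    domain. The series converges geometrically when |a| < |(1+i*w1)(1-i*w2)|.
    """
    G = 1.0 / ((1 + 1j * w1)[:, None] * (1 - 1j * w2)[None, :])
    Xk = G * S_hat            # zeroth iterate: non-homogeneous term only
    X = Xk.copy()
    for _ in range(kmax):
        Xk = G * (a * Xk)     # treat the residual as a new source term
        X = X + Xk
        if np.max(np.abs(Xk)) < tol:
            break
    return X
```

For a single zero-frequency component with a = 0.5 and a unit source, the summed series is the geometric series 1 + 0.5 + 0.25 + ⋯ = 2, which makes the slow convergence for a close to 1 (the difficulty discussed below) easy to see.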
Assuming that the series thus obtained converges, we obtain the second-order response in the form δD^{(2)}(t, s) = Σ_{k=0}^{∞} δD^{(2)}_k(t, s). Using this response, we obtain the response in C(t, s) as δĈ^{(2)}(ω_1, ω_2) = (1 + iω_1)(1 − iω_2) δD̂^{(2)}(ω_1, ω_2)/(2σ_0^2). Here, the second angle bracket denotes the average of only the connected contributions between the external legs and the interaction vertex, as is conventional in diagrammatic calculations [54]. The symbol ⊙ denotes the element-wise product of two matrices followed by integration with respect to the two argument variables. We omit the terms originating from the complicated dependence of F on η. In principle, we can calculate the desired moment from the above expansion. However, this calculation scheme turns out to be difficult to carry out. The difficulty originates from the slow convergence of the series: we need to compute δD^{(2)}_k up to a large k, but the diagrammatic calculation of δD^{(2)}_k for large k requires carrying out multiple integrals explicitly, which is computationally intractable. This happens because the diagrammatic averaging and the recursion in Eq. (G8) cannot be interchanged. In the main text, because of this difficulty, we restrict ourselves to the autocorrelation function calculated using the crudest approximation, ⟨η̂(p_1) η̂(p_2)⟩ ≈ ⟨η̂(p_1) η̂(p_2)⟩_0 [Fig. 3(a) and (c)].

Appendix H: Perturbative stability analysis of fixed points
In this section, we analyze the stability of the fixed-point solutions observed in Sec. IV B 3 of the main text. We consider the case in which the mean activity initially takes a constant value, m(t) = m_f for t ≤ 0, and we define δm(t) ≝ m(t) − m_f. According to the MFT, the mean activity is determined self-consistently in terms of δm(t). To analyze the dynamics of the mean activity, we first need to examine the associated microscopic Gaussian fluctuations determined by the past values of m(t). For a certain range of constant values of m(t), these Gaussian fluctuations have a stable fixed-point solution. At such a fixed point, the correlation matrices D(t, s) and C(t, s) take constant values, D_0 and C_0, which are obtained by solving the self-consistent equation, Eq. (38). As described in the main text, we assume that η(t) = m_f/√(2g_0) holds for t ≤ 0. Then, the network state stays at the fixed point without requiring external inputs, and we have δm(t) = 0 for t > 0 if there is no external input for t > 0. This condition is expected to be satisfied for some value of m_f with non-zero probability (see also the discussion at the end of this section). We then examine the linear response for t > 0 to temporary external perturbative inputs, collectively denoted by p.
From the above analysis, putting δD^{(1)}(t, s) = r(t) + r(s) and noting the first-order response in ⟨φ(t)φ(s)⟩, we obtain the correlation matrix V(t, s) for (1 + ∂_t)^{-1} η(t), with constant part V_0 and perturbation v(t). Noting that V(t, s) = V_0 and v(t) = v(s) = 0 hold for t, s ≤ 0, and that V(t, s) is the correlation matrix that determines the realization probability of (1 + ∂_t)^{-1} η(t), we obtain Eq. (H10), where δη^{(2)}(t) represents O(|p|) fluctuations due to the higher-order response in D(t, s). In the case with finely tuned synaptic weights, the correlation matrix for (1 + ∂_t)^{-1} η(t) is given by D(t, s)/(2σ_0^2), and we have the same representation with V_0 = D_0/(2σ_0^2) and v(t) = r(t)/(2σ_0^2 √V_0). Next, we consider the second-order response in D(t, s), denoted by δD^{(2)}(t, s), which is obtained by differentiating Eq. (H2) once again. We also define the random part of the input, p_{2,i}(t). The solution of the resulting non-homogeneous linear equation is obtained as the superposition of special solutions for the equations with each of the non-homogeneous terms on the right-hand side plus a solution of the homogeneous equation. Noting δD^{(1)}(t, s) = r(t) + r(s), the non-homogeneous terms are rewritten as

(b_1 + 4b_2 + 2b_4)(r(t)² + r(s)²) + b_5(δm(t)² + δm(s)²) + (2b_1 + 4b_3 + 4b_4) r(t) r(s) + b_6 δm(t) δm(s) + Σ_i p_{2,i}(t) p_{2,i}(s). (H12)

We ignore the first two terms above, because they only yield responses of O(|p|²) magnitude in η(t). This can be seen by checking that these two terms only make corrections to Eq. (H8) of the form shown in Eq. (H13). Also note that the response in ⟨φ(t)φ(s)⟩ due to δD^{(2)}(t, s) is negligible for the same reason.
Thus, we are interested in non-homogeneous equations of the form of Eq. (H14). In solving this equation, we first ignore the unknown quantities on the right-hand side and obtain an approximate solution. With this solution, the residual error a_1 q_1(t) q_1(s) + a_2(q_1(t)² + q_1(s)²) remains on the right-hand side of Eq. (H14). We then solve Eq. (H14) by regarding the residual-error term as a new non-homogeneous term, again ignoring the two unknown terms on the right-hand side. This yields two corrections, the second of which is negligible because of its O(|p|²) contribution to η. We can then recursively obtain δD^{(2)}_{q,j} (j ≥ 2) in the same manner and obtain the relevant part of the solution of Eq. (H11) as Σ_{1≤j<∞} δD^{(2)}_{q,j}(t, s). If we have a_1 < 1, this series converges. Here, ·̂ denotes the Fourier transform, and ‖·‖_1 denotes the L_1-norm. From the above analysis, we now have the first- and second-order responses in D(t, s), which yields D(t, s) ≈ D_0 + r(t) + r(s) plus the summed second-order corrections. We obtain d_{2,jℓ}(t) by applying the recursion relation in Eq. (H16) with q_ℓ(t) = d_{2,jℓ}(t) and q_0(t) = d_{2,1,0}(t) = √(2b_1 + 4b_3 + 4b_4) r(t) or d_{2,2,0}(t) = √(b_6) δm(t). We also define d_{3,jℓ}(t) by applying the recursion relation in Eq. (H16) with q_ℓ(t) = d_{3,jℓ}(t) and q_0(t) = d_{3,j,0}(t) = p_{2,j}(t). Note that v(t)v(s) needs to be suitably subtracted from these terms to compensate for the corresponding term in Eq. (H8), which modifies the definitions of d_{2,j1} and d_{3,j1} described above. We omit the precise expressions of these terms, because they do not affect our conclusion.
Noting that the second-order response in D(t, s) does not evoke a response in ⟨φ(t)φ(s)⟩ that leads to an O(|p|) response in η, we obtain the relevant part of the response in V(t, s). Putting d_1(t) = v(t)/√V_0, we now have δη^{(2)} in Eq. (H10) and obtain Eq. (H21), where we define i.i.d. unit Gaussian random variables {ξ_{jℓ}}_{j,ℓ} and {ξ′_{jℓ}}_{j,ℓ}. Here, note that the outer-product representation in Eq. (H20) gives a transform of the form (1 + ∂_t)^{-1} η = Uξ with UU^T = V. This gives a representation of η in terms of unit Gaussian variables ξ, as discussed in Appendix E. Together with the equation for m(t) for t > 0, Eq. (H21) gives a self-consistent equation that the first-order responses in m(t) must satisfy. Here, we define the uniform component of the input, p_0(t).
From the above relation, we obtain the first-order response in m(t) for a given set of values of {ξ_{jℓ}} and {ξ′_{jℓ}} by further iteration. Initially, we calculate d_1(t) and d_{3,jℓ}(t) for δm(t) = 0 (denoted by d̄_1(t) and d̄_{3,jℓ}(t), respectively). The solution of the resulting equation requires a correction to the right-hand side of the first of Eq. (H22), in which the corrections to d_1(t) and d_{2,jℓ}(t) are due to the change δm^{(0)}(t). We then recursively obtain δm^{(j)} and δη^{(j)} by alternately correcting the errors in Eqs. (H21) and (H22). The sum of the series thus obtained gives the desired first-order solution, δm(t) = Σ_{j≥0} δm^{(j)}(t), and this series converges with non-zero probability if we have a_1 < 1 and m_f θ_1 < 1, as we find from Eqs. (H5) and (H9). Here, note that the norm of d_{2,j0} is bounded by a certain multiple of the norm of δm^{(k)}. This implies that, for any positive θ_2, the corresponding bound holds with non-zero probability. Convergence in the 1-norm in the frequency domain implies uniform convergence in the time domain, which asserts that p(t) → 0 implies δm(t) → 0 as t → ∞, with the aid of the dominated convergence theorem. Hence, we have proved linear stability with non-zero probability. Also note that a single configuration of the random connectivity of the network corresponds to a single set of values of {ξ_{jℓ}}; otherwise, the above solution would not be consistent with the linearity of the response. Besides the above stability result, we can also examine how the response dynamics diverge depending on the configuration. For example, suppose that the random coefficients ξ_{jℓ} take large values for small ℓ, and that all the inputs initially take constant values for a sufficiently long time. Then, the constructed solution is likely to diverge at some t because of the divergence of d_{2,j0}. Suppose instead that we choose a very short time interval compared with the unit decay constant of (1 + ∂_t)^{-1}. This consideration indicates that, for any values of ξ_{jℓ} and ξ′_{jℓ}, the constructed solution converges over such a short time interval.
Also, consider the case in which the equality m_f = √(2g_0) η(0) is slightly violated and the mean activity is initially clamped to m_f by an external input until the clamping input is removed at t = 0. The violated equality then introduces a small, constant, uniform input term into the above self-consistent equation for δm. Up to first order, this results in the convergence of δm and η − m_f/√(2g_0) to non-zero values as t → ∞, with the same non-zero probability as for the stability to an external input shown above. This indicates that a non-zero interval of values of η(0) around m_f/√(2g_0) results in convergence to nearby fixed points. Thus, we reasonably expect the existence of fixed points with non-zero probability, although a rigorous proof would require careful evaluation of all higher-order terms of the perturbative expansion.
Appendix I: Perturbative stability analysis of regular oscillations

In this section, we extend the perturbation analysis developed in Appendix H and analyze the stability of the regular oscillations observed in the main text. Although this analysis is along the same lines as that in Appendix H, much more complex calculations are required for the present case, and this complexity would make it harder to grasp the basic idea behind them. Thus, we recommend that readers first read Appendix H and familiarize themselves with the basic idea before reading this section. In what follows, we first present the theoretical framework of the analysis and then, at the end of the section, show the values of the derived bounding constants for concrete cases.
Suppose that the mean activity of the network is initially set to a periodic orbit m_o(t) for t ≤ 0 and that temporary external perturbative inputs, collectively denoted by p, are applied for t > 0. Also suppose that the neuronal fluctuations are coherent and phase-locked to the oscillation for t ≤ 0. As we have seen in the main text, such coherent neuronal fluctuations are found by iteratively applying the self-consistent equation with the fixed mean activity m_o(t). Concretely, we first transform the autocorrelation function on the right-hand side to the frequency domain, Ĉ(ω_1, ω_2) = (1/2π) ∫∫ e^{−iω_1 t} e^{iω_2 s} C(t, s) dt ds. Then, the left-hand side is obtained by a simple algebraic computation in the frequency domain. After transforming this back to the time domain, we update the values of C(t, s) using Eq. (F5). Although this type of iterative approach is usually regarded as a heuristic method for finding a solution, the convergence argument we make below provides criteria for judging whether the solutions thus obtained are stable. In the unperturbed dynamics, we have an eigen-decomposition of the correlation matrix for (1 + ∂_t)^{-1} η(t). If the microscopic part of the dynamics is coherent, the eigenvectors can be expanded in the Fourier basis, e_ℓ(t) = e^{iℓω_0 t}, for the basic frequency ω_0, as v_i(t) = Σ_ℓ v_{iℓ} e_ℓ(t). In the initial unperturbed dynamics, the driving-force term η(t) is given by a superposition of these eigenmodes with a suitable set of values for {ξ_{o,i}}_i; otherwise, the sample path for η(t) is never realized [cf. Eq. (E28)]. Here, the term φ̃(t) is the external driving-force term defined in Eq. (29); for the model without external inputs, we set φ̃(t) to zero. For t > 0, the mean activity obeys a dynamical equation for δm(t) = m(t) − m_o(t), where we define δφ̃(t) as the deviation of φ̃(t) from its initial periodic orbit. The last term, p_0(t), represents the effect of the uniform component of the input.
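The iterative scheme described above, algebra in the frequency domain followed by an update of C(t, s) in the time domain, has the following skeleton. This is a schematic with a caller-supplied `update_C` standing in for Eq. (F5), and it uses the `fft2` sign convention on both axes (so the algebraic kernel is (1 + iω_1)(1 + iω_2) rather than the mixed-sign convention of the text):

```python
import numpy as np

def self_consistent_iteration(C_ts, dt, update_C, n_iter=50):
    """Fixed-point iteration for the self-consistent equation.

    Each sweep: transform C(t, s) to the frequency domain, divide by the
    algebraic kernel to obtain D, transform back, and recompute C(t, s)
    from the new D via the caller-supplied map update_C.
    """
    n = C_ts.shape[0]
    w = 2.0 * np.pi * np.fft.fftfreq(n, d=dt)
    G = 1.0 / ((1 + 1j * w)[:, None] * (1 + 1j * w)[None, :])
    for _ in range(n_iter):
        D_ts = np.real(np.fft.ifft2(G * np.fft.fft2(C_ts)))
        C_ts = update_C(D_ts)
    return D_ts, C_ts
```

Whether such an iteration converges is precisely the question addressed by the bounding constants derived below.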
To analyze the above response dynamics, we first examine how a change in m(t) evokes a response in D(t, s) for t, s ≥ 0. The first-order response in D(t, s), denoted δD^{(1)}(t, s), obeys

(1 + ∂_t)(1 + ∂_s) δD^{(1)}(t, s) = a_1(t, s) δD^{(1)}(t, s) + a_2(t, s) δD^{(1)}(t, t) + a_2(s, t) δD^{(1)}(s, s) + ⋯.

For the model with finely tuned synaptic weights, the above coefficient functions must be suitably replaced. The coefficients originate from the differentiation of C(t, s) and C̃(t, s) with respect to D(t, s) and m(t), which is summarized in Appendix J. The term p_{1,i} represents the effects of the component of the input that is correlated with the i-th mode of the unperturbed neuronal fluctuations.
To check the convergence and magnitude of the above first-order response, we iterate the recursion relation from different initial values and check the 1-norm of the final result. More precisely, we numerically examine {δv_i}_i for the initial input δm(t) = e^{iωt} with different values of ω. From this calculation, we estimate the value of the bounding constant θ_1 in √(2g_0) Σ_i |ξ_{o,i}| ‖δv_i‖_1 ≤ θ_1 ‖δm‖_1 + Σ_i θ′_{1,i} ‖p_{1,i}‖_1. For use below, we also estimate the value of the bounding constant θ_2 in Σ_i ‖δv_i‖_1 ≤ θ_2 ‖δm‖_1 + Σ_i θ′_{2,i} ‖p_{1,i}‖_1. Here, note that the 1-norm is evaluated as the sum of the discrete components over the multiples of the basic frequency ω_0 and the continuous component over the other frequencies.
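Numerically, such a bound is estimated by probing the linearized response map at each frequency and taking the worst case. The sketch below uses a generic `response_norms` callable as a stand-in for the iterated recursion producing {δv_i}; both names are ours, not the authors':

```python
import numpy as np

def estimate_theta(response_norms, omegas):
    """Estimate the bounding constant theta in
        sum_i ||dv_i||_1 <= theta * ||dm||_1   (with the p_{1,i} set to zero).

    response_norms(w) returns the 1-norms ||dv_i||_1 of the converged
    first-order responses to the probe dm(t) = exp(i*w*t), whose 1-norm
    is 1 by the convention used here, so the ratio is just the sum.
    """
    return max(float(np.sum(response_norms(w))) for w in omegas)
```

The frequency grid `omegas` should cover both the multiples of the basic frequency ω_0 (the discrete part of the spectrum) and a fine grid over the remaining frequencies (the continuous part).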
Appendix J: Price theorem and differentiation of the correlation matrices C and C̃

We have used the differentiation of the correlation matrix C with respect to D and m in Appendices G–I. The variables {z_α}_{α=1,2} are Gaussian random variables that have the same first- and second-order moments as {h_j^{(ℓ)}(t_α)}_{α=1,2} (ℓ = E, I), similarly to Eq. (F5). Also note that the above partial derivatives with respect to D_{αβ} are not total derivatives with respect to D(t_α, t_β) but derivatives with respect to the corresponding variables that appear in Eq. (F5). The second-order differentiation is performed similarly and is easily inferred from the above results. The differentiation of C̃ is easily obtained from the results for the differentiation of C and φ̃. The above results can be derived by differentiating both sides of Eq. (F5) and integrating by parts, but they are obtained more easily by performing the differentiation in the frequency domain, in a manner similar to the derivation of the Price theorem [82] (see also [80] for a simpler example). We outline this derivation below.