A Microscopic Theory of Intrinsic Timescales in Spiking Neural Networks

A complex interplay of single-neuron properties and the recurrent network structure shapes the activity of cortical neurons. The single-neuron activity statistics differ in general from the respective population statistics, including spectra and, correspondingly, autocorrelation times. We develop a theory for self-consistent second-order single-neuron statistics in block-structured sparse random networks of spiking neurons. In particular, the theory predicts the neuron-level autocorrelation times, also known as intrinsic timescales, of the neuronal activity. The theory is based on an extension of dynamic mean-field theory from rate networks to spiking networks, which is validated via simulations. It accounts for both static variability, e.g. due to a distributed number of incoming synapses per neuron, and temporal fluctuations of the input. We apply the theory to balanced random networks of generalized linear model neurons, balanced random networks of leaky integrate-and-fire neurons, and a biologically constrained network of leaky integrate-and-fire neurons. For the generalized linear model network with an error function nonlinearity, a novel analytical solution of the colored noise problem allows us to obtain self-consistent firing rate distributions, single-neuron power spectra, and intrinsic timescales. For the leaky integrate-and-fire networks, we derive an approximate analytical solution of the colored noise problem, based on the Stratonovich approximation of the Wiener-Rice series and a novel analytical solution for the free upcrossing statistics. Again closing the system self-consistently, in the fluctuation-driven regime this approximation yields reliable estimates of the mean firing rate and its variance across neurons, the inter-spike interval distribution, the single-neuron power spectra, and intrinsic timescales.


I. INTRODUCTION
Neural dynamics in the cerebral cortex of awake behaving animals unfolds over multiple timescales, ranging from milliseconds up to seconds and more [1][2][3][4][5]. Such a heterogeneity of timescales in the dynamics is a substrate for temporal processing of sensory stimuli [6] and reflects integration of information over different time intervals [3,4]. Intriguingly, in vivo electrophysiological recordings reveal a structure in the autocorrelation timescales of the activity on the level of single neurons [2,7]. This structure could arise from systematic variations in single-neuron or synaptic properties [8,9], from the intricate cortical network structure [10], or from a combination of both [11,12]. Furthermore, timescales may be influenced by the external input to the network, and depend on the chosen measurement procedure [13]. Thus, while these timescales are referred to as intrinsic timescales, they are shaped by intrinsic and extrinsic factors alike.
Explaining the timescales of individual neurons embedded in a network poses a theoretical challenge: How to account for a microscopic, neuron-level observable in a macroscopic theory? Clearly, a straightforward coarse-graining of the activity eliminates the microscopic observable of interest [14]. Dynamic mean-field theory (DMFT) [15][16][17] makes microscopic observables accessible because, instead of coarse-graining the activity of the neurons, it coarse-grains their input. Here, the term 'dynamic' specifies that the input is approximated as a stochastic process that varies in time, in contrast to the notion of a mean-field theory in physics, which usually describes processes embedded in a constant field. DMFT has led to significant insights into the interrelation between network structure and intrinsic timescales for recurrent networks of (non-spiking) rate neurons [15][16][17][18][19][20][21][22][23]. In particular, it has been shown that very slow intrinsic timescales emerge close to a transition to chaos in autonomous networks [15]. Interestingly, simply adding a noisy input to the network significantly reduces this effect and even leads to a novel dynamical regime [21]. Furthermore, increasing the complexity of the single-neuron dynamics reveals that timescales of slow adaptive currents are not straightforwardly expressed in the network dynamics [22], and leads to yet another dynamical regime termed "resonant chaos" [23]. In combination, these results suggest that the mechanisms shaping the intrinsic timescales in recurrent networks are highly involved.
A characteristic feature of neural communication in the brain is the spike-based coupling [24]. Consequently, spiking neural network models have already yielded notable insights into cortical neural dynamics. Prominent examples are the excitatory-inhibitory balance mechanism which dynamically generates strong fluctuations while keeping the activity in a physiological range [25,26] and the mechanism of recurrent inhibitory feedback leading to low cross-correlation between neurons despite the high number of shared inputs [27,28]. From a theoretical perspective, spike-based coupling further increases the complexity of the dynamics. This calls for an extension of DMFT to spiking networks, as recently achieved with a model-independent framework [29] (see also [30]).
Perhaps unintuitively, the main obstacle is not the reduction of the recurrent dynamics to the DMFT but the colored noise problem: obtaining the output statistics of a neuron driven by temporally correlated input. Previous works relied on numerical methods to address the colored noise problem [31][32][33][34] because the spiking nonlinearity renders this problem in general analytically intractable. Such a self-consistent numerical scheme already revealed an unexpected minimum instead of a maximum in the intrinsic timescales for spiking networks at a critical coupling strength [35]. However, numerical solutions have the drawback that they lead to noisy estimates of the autocorrelation function, which poses additional challenges for the inference of intrinsic timescales [36] and other dynamical quantities from the neuronal and network parameters. In addition, such a self-consistent numerical scheme is computationally intensive.
In this paper, we use analytical approaches to close the self-consistency equations for spiking networks. First, we transfer the theory for rate networks to one for spiking networks starting from the characteristic functional of the recurrent input. This shows that the first two cumulants (mean and variance) of the connectivity matrix suffice to fully characterize the effective stochastic input, and automatically take the static variabilities (firing rate, indegree) in the network into account. Since it is based on DMFT, the resulting theory indeed accounts for the timescales on the microscopic level, orthogonal to approaches where the activity of a population of neurons is reduced to an effective mesoscopic description [37]. Second, we derive an analytical solution to the colored noise problem for generalized linear model (GLM) neurons with exponential and error function nonlinearity. Using these analytical solutions, we validate that the self-consistent DMFT captures both the static second-order statistics, the distribution of firing rates across neurons, and the dynamic second-order statistics, the population-averaged autocorrelation function. Furthermore, we use the theory to investigate the conditions for longer intrinsic timescales, like those observed in in vivo electrophysiological recordings [2,7], in a balanced random network of GLM neurons. Due to the analytical tractability, our theory exposes the factors that shape the intrinsic timescale. Third, we derive a numerically efficient analytical approximation for the colored noise problem for leaky integrate-and-fire (LIF) neurons in the noise-driven regime based on the Wiener-Rice series and the Stratonovich approximation thereof [38,39]. For a different approach based on a Markovian embedding, which leads to multidimensional Fokker-Planck equations with involved boundary conditions that are solved numerically, see [40]. In contrast, our approximation leads to integrals of which the computationally most involved ones can be solved analytically. Lastly, we use these results to explore the parameter space of a balanced random network of LIF neurons for long timescales, and apply the theory to a more elaborate model with population-specific connection probabilities that are constrained by biological data [41].
We start this manuscript with the derivation of the DMFT equations from the characteristic functional of the recurrent input. The remainder of the results is structured according to the neuron model: First we consider GLM neurons with exponential and error function nonlinearity, respectively, then we turn to LIF neurons. For each neuron model, we begin by deriving the solution or approximation of the colored noise problem. We then describe the numerical method to solve the self-consistent DMFT equations for the given neuron type (GLM or LIF). Subsequently, we use our theory to investigate the timescale in the respective network models.

II. MICROSCOPIC THEORY OF INTRINSIC TIMESCALES
We consider random network topologies where the entries of the matrix J containing the synaptic strengths, i.e. the amplitudes of evoked post-synaptic currents due to incoming spikes, are independent and identically distributed (i.i.d.). A synapse from neuron j to neuron i exists (J_ij is non-zero) with probability p; each non-zero entry J_ij is independently sampled from the distribution of synaptic strengths with mean µ_J and variance σ_J² < ∞. The connectivity is thus taken to be pairwise Bernoulli, yielding at most one synapse from a given presynaptic to a given postsynaptic neuron. To account for Dale's law and further heterogeneities, we subdivide the network into populations, e.g., all pyramidal cells in cortical layer V, consisting of statistically identical neurons and denote the population by a Greek superscript. Within this generalization, the entries of J are still i.i.d. random numbers for a given pair of populations α, β, but p^{αβ} and the distribution of J_ij^{αβ} can vary for different pairs of populations (Fig. 1a). For example, if I denotes a population of inhibitory interneurons, all J_ij^{αI} are negative.
FIG. 1. (b) The theory reduces a population to a single neuron driven by an effective stochastic input η^α. The first- and second-order statistics µ_η^α and C_η^α of η^α depend self-consistently on the output statistics ν^α and C_x^α. (c) From the stationary spike train autocorrelation function C_x(τ) = νδ(τ) + Ĉ_x(τ), we obtain the correlation time τ_c, the asymptotic decay constant τ_∞, and the variability of the rate across neurons, σ_ν². (d) Instead of the stationary autocorrelation function we sometimes consider the power spectrum S_x(f), which saturates at the firing rate, S_x(f) → ν for f → ∞, and, for a renewal process, has the zero-frequency limit S_x(0) = ν CV². Throughout, we consider the population-averaged single-unit statistics (black curve) instead of the statistics of the population-averaged activity (gray curve).

In this manuscript, we focus on the situation where the average number of synapses per neuron, the indegree K^{αβ} = p^{αβ} N^β, is large: K^{αβ} ≫ 1 due to a large number of presynaptic neurons N^β ≫ 1 in combination with a moderate connection probability p^{αβ} on the order of 10%, in agreement with the situation in cortical networks [42]. In line with the theory of balanced networks [43], we assume that neither single spikes are sufficient to cause firing nor coherent input from all presynaptic neurons is necessary. Moreover, we consider networks which are in an asynchronous irregular state, as exhibited by cortical networks of awake, behaving animals [44].
In the following, we first consider a single population for clarity because the generalization to multiple populations is straightforward.

A. Input statistics
Dynamic mean-field theory reduces the dynamics of the recurrent network to a set of self-consistent stochastic equations. Its core idea is to approximate the recurrent input by independent Gaussian processes. Throughout this manuscript, x_i(t) denotes the spike train of neuron i, and the recurrent input to neuron i reads

$\eta_i(t) = \sum_{j=1}^{N} J_{ij}\, x_j(t - d), \quad (2)$

where d denotes the transmission delay. The sum in Eq. (2) extends over all N neurons, using that J_ij = 0 for neurons that are not connected.

Gaussian Process Approximation
Here, we sketch the derivation to expose necessary conditions for the DMFT. For the full treatment of the problem, we refer to the model-independent DMFT developed in [29], which is applicable to spiking networks.
We start from the deterministic input Eq. (2) and derive its approximation as independent Gaussian processes. To this end, let us consider the characteristic functional of the recurrent input. Because η_i(t) is a deterministic quantity, its distribution is a Dirac delta and its characteristic functional, defined by [38,45] (see also Appendix A, Eq. (47)), reads

$\Phi_\eta[u_1(t), \dots, u_N(t)] = \exp\Big( i \sum_{i,j=1}^{N} J_{ij}\, k_{ij} \Big), \qquad k_{ij} = \int u_i(t)\, x_j(t - d)\, \mathrm{d}t. \quad (3)$

Now we assume that the dynamics of the system are, on a statistical level, very similar for any given realization of the connectivity, i.e. we assume that the system is self-averaging. Thus, we can consider the average across realizations of J,

$\langle \Phi_\eta \rangle_J \approx \exp\Big( i \langle J \rangle \sum_{i,j=1}^{N} k_{ij} - \tfrac{1}{2} \langle \Delta J^2 \rangle \sum_{i,j=1}^{N} k_{ij}^2 \Big),$

where we used the independence of the J_ij, their characteristic function ⟨exp(ik_ij J_ij)⟩_{J_ij} = exp(i⟨J⟩k_ij − ½⟨ΔJ²⟩k_ij² + …), and neglected the cumulants of J_ij beyond the second-order cumulant (the variance) ⟨ΔJ²⟩. Due to the independence of the J_ij, the expectation factorizes into a product ∏_{i,j=1}^{N}, which leads to the sum ∑_{i,j=1}^{N} in the exponent. Within each factor, the first (second) cumulant leads to a linear (quadratic) term in the exponent. Next, we rewrite the square, k_ij² = ∬ u_i(t)\, x_j(t−d)\, x_j(t′−d)\, u_i(t′)\, dt\, dt′, and introduce the auxiliary fields

$\mu_\eta(t) = N \langle J \rangle\, \frac{1}{N} \sum_{j=1}^{N} x_j(t - d), \quad (4)$

$C_\eta(t, t') = N \langle \Delta J^2 \rangle\, \frac{1}{N} \sum_{j=1}^{N} x_j(t - d)\, x_j(t' - d). \quad (5)$

Using the auxiliary fields, the characteristic functional factorizes, ⟨Φ_η⟩_J ≈ ∏_{i=1}^{N} Φ̂_η[u_i(t)], with the individual factors given by

$\hat{\Phi}_\eta[u(t)] = \exp\Big( i \int u(t)\, \mu_\eta(t)\, \mathrm{d}t - \tfrac{1}{2} \iint u(t)\, C_\eta(t, t')\, u(t')\, \mathrm{d}t\, \mathrm{d}t' \Big),$

which is the characteristic functional of a Gaussian process with mean µ_η(t) and correlation function C_η(t, t′) [38,45] (see Appendix A, Eq. (48)). The factorization implies that the approximate inputs described by Φ̂_η[u(t)] are independent across neurons.
The above expressions reveal multiple assumptions we make in the DMFT. First, we assumed self-averaging. This is a necessary assumption if one wants to derive a statement that generalizes beyond a given connectivity matrix to its statistics only. For a broad class of rate networks, one can show rigorously that network-averaged quantities are indeed self-averaging [46,47]. Here, we check this assumption post-hoc by comparison of the theory with simulations for a single realization of the connectivity. Second, we implicitly assumed that ḡ := N⟨J⟩ and g² := N⟨ΔJ²⟩ do not scale with N such that the auxiliary fields remain finite for large networks. Using the mean number of inputs per neuron K = pN and the properties of J, we get

$\bar{g} = K \mu_J, \qquad g^2 = K \big( \sigma_J^2 + (1 - p)\, \mu_J^2 \big). \quad (6)$

Third, we neglected higher cumulants of the input. Using Eqs. (6), the neglected higher cumulants are suppressed relative to the retained ones by factors of order O(1/√K). Accordingly, in the regime K ≫ 1, neglecting the contributions from higher cumulants, e.g. due to shot noise effects [33], is justified.
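As a quick consistency check of Eqs. (6), the following minimal sketch (Python/NumPy; all parameter values are illustrative and not taken from this paper) samples a pairwise Bernoulli connectivity matrix and compares the empirical N⟨J⟩ and N⟨ΔJ²⟩ with Kµ_J and K(σ_J² + (1 − p)µ_J²):

```python
import numpy as np

# Illustrative parameters (not taken from the paper)
N, p, mu_J, sigma_J = 2000, 0.1, -0.5, 0.2
rng = np.random.default_rng(0)

# Pairwise Bernoulli connectivity with Gaussian synaptic strengths
mask = rng.random((N, N)) < p
J = np.where(mask, rng.normal(mu_J, sigma_J, (N, N)), 0.0)

K = p * N
g_bar_emp = N * J.mean()             # N <J>
g_sq_emp = N * J.var()               # N <Delta J^2>
g_bar_theo = K * mu_J                # Eq. (6)
g_sq_theo = K * (sigma_J**2 + (1 - p) * mu_J**2)

print(g_bar_emp, g_bar_theo)         # agree up to sampling noise
print(g_sq_emp, g_sq_theo)
```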

Self-consistency problem
Given these assumptions, the recurrent inputs η_i(t) can be approximated by independent Gaussian processes, which leads to a coarse-grained description of the dynamics: since all inputs are statistically equivalent, the neurons become statistically equivalent as well and the system reduces to N independent, identical stochastic equations. For N ≫ 1, we can replace the empirical averages in Eq. (4) and Eq. (5) by ensemble averages such that we arrive at a set of self-consistency equations. This step can be made rigorous using the formalism of [29], see Eqs. (2) and (3) and Appendix 1 therein.
In the stationary state, the self-consistency equations are given by

$\mu_\eta = \bar{g}\, \langle x \rangle_\eta, \qquad C_\eta(\tau) = g^2\, \langle x x \rangle_\eta(\tau). \quad (7)$

The averages ⟨x⟩_η ≡ ν and ⟨xx⟩_η(τ) − ν² ≡ C_x(τ) denote the mean (firing rate) and correlation function of the spike train produced by a neuron driven by the effective stochastic input η(t). Since the input statistics thereby appear on both the left-hand and the right-hand side, this poses a self-consistency problem. To recapitulate, DMFT approximates the input of a single neuron by an effective Gaussian process with self-consistent statistics (Fig. 1b). Thus, the description, albeit stochastic, is still on the level of individual neurons.

Static contribution
The networks we consider are heterogeneous even within a population: each neuron potentially has a different number of presynaptic partners and thus also a different firing rate [48]. At first glance, DMFT neglects this heterogeneity. However, Eqs. (7) in fact account for such static variabilities: on the r.h.s., the second moment of the spike train appears instead of the correlation function. Rewriting ⟨xx⟩_η(τ) = C_x(τ) + ν² reveals a first static component g²ν² of the variability of the effective input due to the firing rate of individual neurons. Moreover, C_x(τ → ∞) ≡ σ_ν² potentially saturates on a plateau which accounts for the variability of the firing rate across neurons (Fig. 1c). To make this explicit, we sometimes rewrite

$\eta(t) = \zeta + \xi(t), \quad (8)$

where ζ is a Gaussian random variable with µ_ζ = ḡν, σ_ζ² = g²(ν² + σ_ν²), and ξ(t) is a zero-mean Gaussian process with C_ξ(τ) = g²(C_x(τ) − σ_ν²).
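To make Eq. (8) concrete, the following sketch (NumPy; the exponential shape of C_x and all numbers are placeholders) draws one realization of the effective input η(t) = ζ + ξ(t) on a periodic grid, sampling the zero-mean Gaussian process ξ with the standard circulant (Fourier) method:

```python
import numpy as np

rng = np.random.default_rng(1)
dt, n = 1e-3, 4096                       # grid spacing [s] and length
tau = np.arange(n) * dt
tau = np.minimum(tau, n * dt - tau)      # periodic lag |tau|

# Placeholder self-consistent statistics (illustrative values only)
g_bar, g_sq, nu, sigma_nu2 = -50.0, 25.0, 5.0, 4.0
C_x = 100.0 * np.exp(-tau / 0.02) + sigma_nu2     # smooth part + plateau

# Static part: Gaussian random variable zeta, Eq. (8)
zeta = rng.normal(g_bar * nu, np.sqrt(g_sq * (nu**2 + sigma_nu2)))

# Dynamic part: stationary Gaussian process with C_xi = g^2 (C_x - sigma_nu^2)
C_xi = g_sq * (C_x - sigma_nu2)
lam = np.maximum(np.fft.fft(C_xi).real, 0.0)      # circulant eigenvalues
z = rng.standard_normal(n) + 1j * rng.standard_normal(n)
xi = np.fft.ifft(np.sqrt(n * lam) * z).real       # sample with autocorr C_xi

eta = zeta + xi                                   # one input realization
```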

B. Multiple populations
Using the expressions Eqs. (7) for a single population, we can straightforwardly generalize the theory to multiple populations. Due to the independence of the effective inputs in DMFT, both mean and correlation function are a simple sum over the contributions from all populations [18,49]:

$\mu_\eta^\alpha = \sum_\beta \bar{g}^{\alpha\beta}\, \nu^\beta, \quad (9)$

$C_\eta^\alpha(\tau) = \sum_\beta (g^{\alpha\beta})^2 \big[ C_x^\beta(\tau) + (\nu^\beta)^2 \big], \quad (10)$

with the corresponding generalizations of Eqs. (6), ḡ^{αβ} = K^{αβ} µ_J^{αβ} and (g^{αβ})² = K^{αβ}[(σ_J^{αβ})² + (1 − p^{αβ})(µ_J^{αβ})²]. This leads to one stochastic equation per population (Fig. 1b). As before, we can split the input into static and dynamic contributions, η^α(t) = ζ^α + ξ^α(t).

External input
We take the sum ∑_β to include external populations, e.g. excitatory neurons that drive the network dynamics with homogeneous Poissonian spike trains of rate ν_ext. In Eq. (9) and Eq. (10), such an external Poisson input leads to a term J^{α,ext} ν_ext and (J^{α,ext})² ν_ext δ(τ), respectively. If the network is driven by a constant external input, only Eq. (9) acquires an additional contribution µ_ext^α. An external zero-mean, stationary Gaussian process leads to an additional term C_ext(τ) in Eq. (10).

C. Output statistics
Approximating the input is only the first step. In a second step, the self-consistency problem has to be solved. To this end, the output statistics of a neuron driven by a non-Markovian Gaussian process have to be calculated. In other words, we need a solution for the colored noise problem. The full non-Markovian problem has to be considered because a Markovian approximation neglects the quantity of interest: the temporal correlations. For sufficiently simple rate neurons, the problem is analytically solvable [15,50]; the case of two spiking neuron models is discussed in the following sections. For the remainder of this section, let us assume that we are able to solve the colored noise problem to obtain a self-consistent solution of Eq. (9) and Eq. (10).

Timescale
Given a self-consistent solution, we can calculate the intrinsic timescale from the spike-train autocorrelation function C_x^α(τ). Since C_x^α(τ) always contains a delta peak [38], we consider only the smooth part Ĉ_x^α(τ) = C_x^α(τ) − ν^α δ(τ) of the autocorrelation function. To characterize the timescale, we use the definition of [38] (Fig. 1c):

$\tau_c^\alpha = \int_0^\infty \frac{\hat{C}_x^\alpha(\tau) - \hat{C}_x^\alpha(\infty)}{\hat{C}_x^\alpha(0) - \hat{C}_x^\alpha(\infty)}\, \mathrm{d}\tau. \quad (11)$

Note that the definition of the autocorrelation time is not unequivocal.
Other possible definitions use, e.g., the absolute value or the square of the normalized autocorrelation function [23]. We observed drastic differences between these definitions for empirical correlation functions directly obtained from the simulations. These differences are in part an artifact of the absolute value: the variance of the empirical estimate grows with τ [51]; due to the absolute value these fluctuations add up. The three functional forms carry with them different fluctuations, e.g. the squared fluctuations are typically much smaller because the normalized autocorrelation is below unity, and hence lead to different estimates. For theoretically predicted autocorrelations, the difference is less drastic and we choose Eq. (11) because it is the simplest definition. Due to this difficulty, we always use the theoretical prediction of the autocorrelation function to determine the timescale, after checking that it matches the empirical autocorrelation function well apart from fluctuations.
In addition to τ_c^α, we will also consider the asymptotic decay constant τ_∞^α, defined via

$\hat{C}_x^\alpha(\tau) - \hat{C}_x^\alpha(\infty) \propto e^{-\tau / \tau_\infty^\alpha} \quad \text{for} \quad \tau \to \infty, \quad (12)$

(Fig. 1c), because in special cases τ_∞^α directly follows from our theory. For a simple exponential autocorrelation function, the timescales in Eqs. (11) and (12) coincide. We work from the assumption that Eq. (12) is a good approximation to Eq. (11) and verify this assumption post-hoc.
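For a correlation function tabulated on a uniform grid, Eqs. (11) and (12) can be evaluated numerically; a minimal sketch (NumPy; a synthetic exponential Ĉ_x serves as input, for which both definitions must coincide):

```python
import numpy as np

def intrinsic_timescale(C_hat, dt):
    """Correlation time, Eq. (11): integral of the normalized,
    plateau-subtracted autocorrelation function (trapezoidal rule)."""
    plateau = C_hat[-1]                      # C_hat(inf) ~ last grid point
    c = (C_hat - plateau) / (C_hat[0] - plateau)
    return 0.5 * dt * (c[:-1] + c[1:]).sum()

def asymptotic_timescale(C_hat, dt, tail=0.5):
    """Asymptotic decay constant, Eq. (12): log-linear fit to the tail."""
    c = C_hat - C_hat[-1]
    i0 = int(tail * len(c))
    t = dt * np.arange(len(c))
    slope = np.polyfit(t[i0:-1], np.log(c[i0:-1]), 1)[0]
    return -1.0 / slope

# Synthetic example: single exponential with plateau, illustrative numbers
dt, tau_true = 1e-4, 0.02
t = np.arange(0, 0.5, dt)
C_hat = 3.0 * np.exp(-t / tau_true) + 1.0     # plateau of 1.0
print(intrinsic_timescale(C_hat, dt))          # ~0.02
print(asymptotic_timescale(C_hat, dt))         # ~0.02
```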
Recently, two approaches have been proposed to estimate the timescale directly from spiking data [36,52]. While both overcome important challenges, biases in the estimated timescale related to and independent of subsampling, respectively, we do not use them here because they rely on models which implicitly assume (a mixture of) exponential correlation functions: [52] assumes an autoregressive model and [36] a mixture of Ornstein-Uhlenbeck processes.

Spike train power spectrum
Instead of the autocorrelation function, we sometimes consider the spike train power spectrum (Fig. 1d)

$S_x^\alpha(f) = \int_{-\infty}^{\infty} e^{2\pi i f \tau}\, C_x^\alpha(\tau)\, \mathrm{d}\tau. \quad (13)$

Due to the delta peak in the autocorrelation function, the power spectrum always saturates at the firing rate, S_x^α(f) → ν^α for f → ∞. For a renewal process, the zero-frequency limit is S_x^α(0) = ν^α (CV^α)², which directly reveals the coefficient of variation CV^α of the inter-spike-interval (ISI) distribution.
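The relations around Eq. (13) translate directly into a numerical check; the following sketch (NumPy; rate, CV, and the exponential shape of the smooth autocorrelation part are illustrative) builds C_x(τ) = νδ(τ) + Ĉ_x(τ), Fourier transforms it, and verifies the high- and zero-frequency limits:

```python
import numpy as np

dt, n = 1e-4, 2**16
tau = np.arange(n) * dt
nu, cv2, tau0 = 5.0, 0.75, 0.02                # illustrative values

# Smooth part chosen such that S_x(0) = nu*CV^2 for a renewal process:
# S_x(0) = nu + 2*A*tau0  =>  A = nu*(cv2 - 1)/(2*tau0)
A = nu * (cv2 - 1.0) / (2.0 * tau0)
C_x = A * np.exp(-tau / tau0)
C_x[0] += nu / dt                              # discretized delta peak

# Two-sided spectrum via FFT of the symmetrized autocorrelation
C_sym = np.concatenate([C_x, C_x[:0:-1]])
S_x = np.fft.rfft(C_sym).real * dt

print(S_x[-1])                                 # -> nu (high-frequency limit)
print(S_x[0], nu * cv2)                        # -> nu*CV^2 (renewal, f -> 0)
```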

Comparison with simulations
In our theory, we consider disorder-averaged quantities and stationary processes. To compare the theory with a single simulation, we assume self-averaging in the sense that the activity distribution across neurons is approximately the same for each network realization. Since neurons with different indegrees have different disorder- and time-averaged inputs, in practice this means that we assume that neurons with comparable indegree have comparable activity statistics in each network realization.
The disorder averages preserve the static variability across neurons, as we consider the same connectivity statistics, and in particular the same indegree distribution, across realizations. Self-averaging works well when each neuron (or at least a sufficiently large proportion of neurons) receives input from a representative sample of the rest of the network.
Under stationarity, distributions across neurons of instantaneous rates at any given time point (but not of instantaneous rates across time points, which we do not consider here) equal distributions of time-averaged rates across neurons. To obtain the rate distributions from the simulations, we use time-averaged rates to reduce the variance of the corresponding estimates. Similarly, we use time averages to compute the single-neuron autocorrelation functions and power spectra.
We focus on the second-order statistics. Since first-order statistics, i.e. the firing rate, scale the power spectra and correlation functions [24], we plot S_x(f)/ν and C_x(τ)/ν² to eliminate this trivial dependency. Note that a multiplicative factor does not influence the intrinsic timescale, Eq. (11).

III. GENERALIZED LINEAR MODEL NEURONS
First, we consider generalized linear model (GLM) neurons [24,53]. GLM neurons are stochastic model neurons that spike according to an inhomogeneous Poisson process at a rate determined by the synaptic input. Due to their simplicity, GLM neurons are frequently fitted to experimental data [24]; here we consider them because they are analytically tractable.

A. Neuron dynamics
Each neuron generates a spike train according to an inhomogeneous Poisson process with intensity (rate)

$\lambda^\alpha(t) = c_1^\alpha\, \varphi\big( c_2^\alpha [ V^\alpha(t) - \theta^\alpha ] \big), \quad (14)$

where θ^α denotes the (soft) threshold, φ(V) is a smooth, nonnegative, monotonically increasing function, and c_1^α > 0, c_2^α > 0 are free parameters. The voltage is given by a linear filtering of the input,

$V_i^\alpha(t) = \sum_\beta \sum_j J_{ij}^{\alpha\beta} \int_0^\infty \kappa^\alpha(s)\, x_j^\beta(t - s - d^{\alpha\beta})\, \mathrm{d}s, \quad (15)$

where d^{αβ} allows for a transmission delay. For all simulations, we choose a filter with a single exponential with time constant τ_m^α, which corresponds to post-synaptic currents in the form of delta spikes:

$\kappa^\alpha(t) = \Theta(t)\, e^{-t / \tau_m^\alpha}. \quad (16)$

Here, Θ(t) denotes the Heaviside function ensuring causality of the filter. We rescale the synaptic weights J_ij^{αβ} and the threshold θ^α using c_2^α such that c_2^α = 1 throughout the rest of this section.

Colored noise problem
The effective stochastic input η^α(t) leads to stochastic voltage dynamics. Because the voltage is given by a convolution, the voltage becomes a Gaussian process with

$\mu_V^\alpha = \bar{\kappa}^\alpha \mu_\eta^\alpha, \qquad C_V^\alpha(\tau) = \big( \tilde{\kappa}^\alpha * C_\eta^\alpha \big)(\tau), \quad (17)$

where the filter determines κ̄^α = ∫_{−∞}^{∞} κ^α(t) dt and κ̃^α(t) = ∫_{−∞}^{∞} κ^α(s) κ^α(s + t) ds. For the single-exponential filter that we used in simulations, we have κ̄^α = τ_m^α and κ̃^α(t) = (τ_m^α/2) e^{−|t|/τ_m^α}. Note that the transmission delay cancels in the stationary case considered here.
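Eqs. (17) amount to one multiplication and one convolution; a sketch (NumPy; the input statistics are placeholders) using the FFT for the convolution with the filter autocorrelation κ̃:

```python
import numpy as np

dt, n, tau_m = 1e-4, 2**14, 0.02
t = np.arange(n) * dt
t_sym = np.minimum(t, n * dt - t)                   # periodic |t|

kappa_tilde = 0.5 * tau_m * np.exp(-t_sym / tau_m)  # (tau_m/2) e^{-|t|/tau_m}

# Placeholder input statistics mu_eta, C_eta (illustrative values)
mu_eta = -10.0
C_eta = 50.0 * np.exp(-t_sym / 0.01)

mu_V = tau_m * mu_eta                               # kappa_bar = tau_m
C_V = np.fft.ifft(np.fft.fft(kappa_tilde) * np.fft.fft(C_eta)).real * dt
```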
All cumulants of the resulting spike trains x(t) can be obtained from their characteristic functional [38] (see Appendix A, Eq. (53)):

$\Phi_x[u(t)] = \exp\Big( \int \lambda(t) \big[ e^{i u(t)} - 1 \big]\, \mathrm{d}t \Big).$

From here, we temporarily drop the population index for the sake of clarity. Averaging over realizations of the rates yields

$\langle \Phi_x[u(t)] \rangle_\lambda = \exp\Big( \int \mu_\lambda(t) \big[ e^{iu(t)} - 1 \big] \mathrm{d}t + \tfrac{1}{2} \iint C_\lambda(t, t') \big[ e^{iu(t)} - 1 \big] \big[ e^{iu(t')} - 1 \big]\, \mathrm{d}t\, \mathrm{d}t' + O(u^3) \Big),$

where we neglect terms of O(u³) since we are only interested in the first and second cumulants. Expanding also e^{iu(t)} − 1 to second order in u(t), we can simply read off the stationary cumulants

$\nu = \mu_\lambda, \qquad C_x(\tau) = \nu\, \delta(\tau) + C_\lambda(\tau), \quad (18)$

in agreement with the result of [54].
We are left with the task of calculating the first two cumulants of λ(t) from µ V and C V (τ ), depending on the choice of the nonlinearity φ(V ).

Exponential nonlinearity
First, we consider the commonly employed exponential nonlinearity [24]

$\varphi(V) = e^V. \quad (19)$

Both cumulants are straightforward to obtain from the characteristic functional of the voltage. We have (see Appendix A, Eq. (49) and Eq. (50)) ⟨e^{V(t)}⟩ = e^{µ_V + C_V(0)/2} and ⟨e^{V(t)} e^{V(t+τ)}⟩ = e^{2µ_V + C_V(0) + C_V(τ)}, where we used the stationarity of V. Including the prefactor and the threshold from Eq. (14), we get

$\mu_\lambda = c_1\, e^{\mu_V - \theta + \frac{1}{2} C_V(0)}, \quad (20)$

$C_\lambda(\tau) = \mu_\lambda^2 \big( e^{C_V(\tau)} - 1 \big). \quad (21)$

From Eq. (21) it follows that C_λ(τ) has a static part as long as C_V(∞) > 0. Since C_η(τ) contains a static part (see Eq. (8) and Eq. (10)), C_V(τ) and hence C_λ(τ) and C_x(τ) indeed also contain a static contribution and saturate on a plateau.

Rate distribution
The rate distribution across neurons is lognormal because the (static) input distribution is Gaussian and the f-I curve is a simple exponential [48]. The theory yields the mean ν = c_1 exp(µ_V − θ + ½ C_V(0)) and variance σ_ν² = C_x(∞) = ν² (e^{C_V(∞)} − 1) of the firing rate. We note that we can obtain the same result from a constant, Gaussian-distributed input with mean µ̃_V = µ_V + ½[C_V(0) − C_V(∞)] and variance σ̃_V² = C_V(∞). Parameterized in terms of µ̃_V and σ̃_V, the firing rate distribution is thus

$p(\nu) = \frac{1}{\nu}\, \mathcal{N}\big( \ln(\nu / c_1) \,\big|\, \tilde{\mu}_V - \theta,\, \tilde{\sigma}_V^2 \big), \quad (22)$

with the normal distribution N(x | µ, σ²).
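A quick Monte Carlo check of Eq. (22) (NumPy; all statistics are illustrative placeholders): sample the equivalent static voltage, map it through the exponential f-I curve, and compare the histogram of rates with the closed-form lognormal density:

```python
import numpy as np

rng = np.random.default_rng(2)
c1, theta = 100.0, 1.0
mu_V, C_V0, C_Vinf = 0.5, 0.6, 0.2                 # illustrative voltage stats

mu_t = mu_V + 0.5 * (C_V0 - C_Vinf)                # equivalent static mean
sig2_t = C_Vinf                                    # equivalent static variance

V = rng.normal(mu_t, np.sqrt(sig2_t), 100_000)     # static inputs across neurons
nu = c1 * np.exp(V - theta)                        # rates: lognormal

def p_nu(nu):                                      # Eq. (22)
    x = np.log(nu / c1)
    return np.exp(-(x - (mu_t - theta))**2 / (2 * sig2_t)) / (
        nu * np.sqrt(2 * np.pi * sig2_t))

hist, edges = np.histogram(nu, bins=100, density=True)
centers = 0.5 * (edges[1:] + edges[:-1])
print(np.max(np.abs(hist - p_nu(centers))))        # small up to sampling noise
```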

Error function nonlinearity
A drawback of the exponential function, Eq. (19), is that it allows for infinite rates. Thus, we also consider the bounded nonlinearity

$\varphi(V) = \tfrac{1}{2} \big[ 1 + \operatorname{erf}\big( V / \sqrt{2} \big) \big] = \int_{-\infty}^{V} \mathcal{N}(x \,|\, 0, 1)\, \mathrm{d}x. \quad (23)$

The integrals to determine the cumulants can be solved using the table [55] (details in Appendix B.1); the result is

$\mu_\lambda = c_1\, \varphi(h), \qquad h = \frac{\mu_V - \theta}{\sqrt{1 + C_V(0)}}, \quad (24)$

$C_\lambda(\tau) = c_1^2 \big[ \Phi_2\big( h, h; \rho(\tau) \big) - \varphi(h)^2 \big], \qquad \rho(\tau) = \frac{C_V(\tau)}{1 + C_V(0)}, \quad (25)$

where we again suppressed the population index, abbreviated the standardized distance to threshold by h, and Φ_2(·, ·; ρ) denotes the bivariate standard normal cumulative distribution with correlation coefficient ρ.

Rate distribution

Equivalent to the situation for the exponential nonlinearity, the input distribution across neurons is Gaussian. Again, we consider the equivalent static problem which, in this case, leads to σ̃_V² = C_V(∞)/(1 + C_V(0) − C_V(∞)) and µ̃_V = θ + (µ_V − θ) √((1 + σ̃_V²)/(1 + C_V(0))). Parameterized in terms of µ̃_V and σ̃_V, the firing rate distribution is

$p(\nu) = \frac{\mathcal{N}\big( \operatorname{probit}(\nu / c_1) \,\big|\, \tilde{\mu}_V - \theta,\, \tilde{\sigma}_V^2 \big)}{c_1\, \mathcal{N}\big( \operatorname{probit}(\nu / c_1) \,\big|\, 0, 1 \big)}, \quad (26)$

where probit(x) denotes the inverse of the standard normal cumulative distribution, i.e. probit(φ(V)) = V, and we used φ'(V) = N(V | 0, 1).
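The Gaussian integrals behind Eqs. (24) and (25) rest on the identity ⟨φ(V_1 − θ) φ(V_2 − θ)⟩ = Φ_2(h, h; ρ) for jointly Gaussian V_1, V_2; the following sketch (SciPy; all numbers illustrative) verifies this identity against a Monte Carlo estimate:

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

rng = np.random.default_rng(3)
mu, theta, C0, Ctau = 0.3, 1.0, 0.5, 0.2           # illustrative statistics

h = (mu - theta) / np.sqrt(1 + C0)
rho = Ctau / (1 + C0)

# Monte Carlo over the bivariate Gaussian voltage (V1, V2)
cov = [[C0, Ctau], [Ctau, C0]]
V = rng.multivariate_normal([mu, mu], cov, 1_000_000)
phi = norm.cdf(V - theta)                          # erf nonlinearity, Eq. (23)
mc = phi.prod(axis=1).mean()

# Closed form: bivariate standard normal CDF with correlation rho
closed = multivariate_normal(cov=[[1, rho], [rho, 1]]).cdf([h, h])
print(mc, closed)                                   # agree up to MC error
```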
Numerical solution of the self-consistency problem

We solve the self-consistency problem with a damped fixed-point iteration: given the current estimate of the output statistics ν^α and C_x^α(τ), we compute the input statistics via Eq. (9) and Eq. (10), the voltage statistics via Eqs. (17), and new output statistics via Eqs. (20) and (21) or Eqs. (24) and (25); the statistics are then updated incrementally, e.g. ν^α_{n+1} = ν^α_n + ε(ν̂^α_{n+1} − ν^α_n) for the firing rate. Here, the small update step ε < 1 is crucial because otherwise the fixed-point iteration is numerically unstable. Now we iterate and generate new voltage statistics. With the incremental update and the initialization ν^α = ½ c_1^α, the algorithm quickly converged to the fixed point corresponding to the simulation in the examples we considered. Due to the analytical solutions, the only bottleneck for the numerics is the convolution in Eqs. (17), which can be solved efficiently using the Fast Fourier Transform [56]. Thus, even the parameter scans with 5,000 points described in the following run on a laptop in less than two minutes.
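A compact sketch of the damped fixed-point iteration for a single population with the exponential nonlinearity, Eqs. (20) and (21) (NumPy; all parameters are illustrative and the external input is absorbed into θ_eff as in the next section):

```python
import numpy as np

dt, n, tau_m = 1e-4, 2**14, 0.02
t = np.arange(n) * dt
t_sym = np.minimum(t, n * dt - t)
kt = 0.5 * tau_m * np.exp(-t_sym / tau_m)          # filter autocorrelation
kt_f = np.fft.fft(kt)

c1, theta_eff = 10.0, 1.0                          # illustrative neuron params
g_bar, g_sq = -2.0, 0.5                            # illustrative Eq. (6) values
eps = 0.2                                          # damping, crucial for stability

nu = 0.5 * c1                                      # initialization
C_hat = np.zeros(n)                                # smooth part of C_x
for _ in range(200):
    # Input statistics, Eqs. (7), with C_x(tau) = nu*delta(tau) + C_hat(tau)
    mu_eta = g_bar * nu
    C_eta = g_sq * (C_hat + nu**2)
    C_eta[0] += g_sq * nu / dt                     # discretized delta peak
    # Voltage statistics, Eqs. (17), via FFT convolution
    mu_V = tau_m * mu_eta
    C_V = np.fft.ifft(kt_f * np.fft.fft(C_eta)).real * dt
    # Output statistics, Eqs. (18), (20), and (21)
    nu_new = c1 * np.exp(mu_V - theta_eff + 0.5 * C_V[0])
    C_hat_new = nu_new**2 * (np.exp(C_V) - 1.0)
    # Damped update, eps < 1
    nu += eps * (nu_new - nu)
    C_hat += eps * (C_hat_new - C_hat)
print(nu, C_hat[0])
```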

B. Balanced random network
As a first application of the theory, we consider a balanced random network of excitatory cells and inhibitory interneurons. The network contains two populations (Fig. 2a), α ∈ {E, I}, and it is driven by an excitatory external input which we incorporate into an effective threshold θ_eff = θ − µ_ext. Here, we use a constant external input rather than a Poisson drive because we are particularly interested in finding long timescales, which might be hindered by the lack of temporal correlation of Poisson spike trains. However, the theory can straightforwardly be applied to Poisson input. Although four times more excitatory cells are present in the network, we typically place it in an inhibition-dominated regime by increasing the synaptic weights of the inhibitory neurons. As is well known [26], this settles the network in the balanced state leading to asynchronous irregular activity of the neurons (see, e.g., Fig. 2b).
In line with Brunel's model A [26], we choose identical values for the single-neuron parameters. Since we also choose the same connection probability of 10% for all pairs of populations, both populations receive statistically identical input in the DMFT approximation. Due to identical single-neuron parameters and input statistics, the statistics of the activity is the same for excitatory and inhibitory neurons (see, e.g., Fig. 2b); therefore, we do not distinguish between the populations for the statistics in our plots. In contrast to the network examined by Brunel, we consider the somewhat more involved case of a fixed connection probability between a pair of neurons instead of a fixed number of incoming synapses per neuron (indegree). The fixed connection probability leads to a (binomially) distributed indegree across neurons, such that a strong variability across neurons is present in the network (see, e.g., Fig. 2c). This variability is already present on the level of mean firing rates, i.e. there is static variability in the network.

FIG. 2. (d) Population-averaged single-unit autocorrelation function from simulation (gray) and self-consistent theory (black) using Eq. (20) and Eq. (21), with the static contribution C_x(∞) subtracted. Parameters: N_E = 10,000, N_I = 2,500.
All simulations were performed using the NEST simulator version 2.20.1 [57]. In all GLM network simulations, we simulated 1 min of biological time with a time step of 0.1 ms and discarded an initial transient of 1 s. For the GLM neurons, we used the 'pp_psc_delta' neuron model. To allow for the error function nonlinearity, we modified the 'pp_psc_delta' model accordingly.

Exponential nonlinearity: absence of long timescales
First, we consider networks with an exponential nonlinearity (Fig. 2). The fixed-point iteration yields a rate distribution and an autocorrelation function that closely match the simulation (Fig. 2c,d). The theory for the rate distribution ( Fig. 2c) is slightly biased towards higher rates; a possible cause for this is a finite size effect because the mean inhibitory indegree K I = pN I = 250 is relatively small. Nonetheless, the theory predicts the autocorrelation function very well (Fig. 2d) and yields a timescale τ c ≈ τ m = 20 ms.
For the parameters in Fig. 2, the intrinsic timescale is close to the membrane time constant. This raises the question whether longer timescales can be achieved in a network of GLM neurons. To answer this question, we employ our theory and perform parameter scans. First, we vary the single-neuron parameters c_1 and c_2 (Fig. 3a,b). The rate increases monotonically with c_1, while c_2 has a smaller effect up to a certain threshold (Fig. 3a). Beyond this threshold, the rate diverges rapidly to infinity in the fixed-point iteration (white area in Fig. 3a). The timescale is close to the membrane time constant throughout the non-divergent regime and only increases slightly towards the threshold where the rate diverges (Fig. 3b). Next, we vary the strength of the external input by adjusting the effective threshold θ_eff and the inhibition dominance by varying J_I/J_E for constant J_E. We find a clear threshold of J_I/J_E beyond which the rate diverges (Fig. 3c). Again, this threshold corresponds to the regime where the timescale slowly starts to grow above the membrane time constant. Put together, these observations suggest that the rate divergence prevents recurrent dynamics with long timescales in balanced random networks of GLM neurons with exponential nonlinearity.

Error function nonlinearity: existence of long timescales
In the previous section, the rate divergence prevented long timescales. To avoid the divergence, we consider the bounded transfer function Eq. (23) and use our theory for parameter scans (Fig. 4). The effect of the single-neuron parameters c_1 and c_2 is similar to the unbounded case, but the rate divergence is absent (Fig. 4a). This allows for a parameter regime with longer timescales up to approximately 3τ_m (Fig. 4b). Similarly, varying θ_eff and J_I/J_E uncovers a regime with a rate close to the maximum c_1 when the network is not inhibition-dominated (Fig. 4c). Outside the inhibition-dominated regime, we expect that our theory does not yield quantitatively accurate predictions. The effect on the timescale is more subtle: within the inhibition-dominated regime, for any given J_I/J_E the timescale displays a maximum whose location depends on the external input (Fig. 4d).
What kind of dynamics is displayed by the network at such a local maximum of the timescale? The corresponding spike trains show a strong variability of firing rate across neurons and temporally correlated spikes (Fig. 5a). The rate distribution reveals that all rates between the minimum zero and the maximum c 1 are present, in excellent agreement with the theoretical prediction (Fig. 5b). In the example considered, the empirical estimate of the network-averaged single-unit autocorrelation displays an intrinsic timescale of approximately 2τ m ; again, the empirical estimate and the theoretical prediction agree closely (Fig. 5c). From the spike train power spectrum, a high CV > 2 is apparent (Fig. 5d). All of these characteristics agree with the "heterogeneous asynchronous state" uncovered in [58].

Error function nonlinearity: mechanism of timescale
To uncover the mechanisms that shape the timescale, in particular the local maximum in Fig. 4d, we develop a theory for the asymptotic timescale τ_∞, Eq. (12). To this end, we use that κ̃(t) = (τ_m/2) e^{−|t|/τ_m} is the fundamental solution to the differential operator τ_m^{−2}(1 − τ_m² d²/dτ²). Thus, we can rewrite Eqs. (17) into a differential equation:

$C_V(\tau) - \tau_m^2 \frac{\mathrm{d}^2}{\mathrm{d}\tau^2} C_V(\tau) = \tau_m^2\, C_\eta(\tau),$

where the dependence of C_η on C_V is determined by Eq. (10), Eqs. (18), and Eq. (25). Next, we rescale time such that τ_m = 1 and linearize this differential equation around the plateau value C_V(∞).

FIG. 5. (b) Firing rate distributions across all neurons from simulation (gray) and theory (black) using Eq. (26). (c,d) Population-averaged single-unit autocorrelation function and power spectrum from simulation (gray) and self-consistent theory (black) using Eq. (24) and Eq. (25). As in Fig. 2, the static contribution C_x(∞) is subtracted. Parameters: c_1 = 250 s⁻¹, c_2 = 0.075 mV⁻¹, further parameters as in Fig. 2.
This allows for an exponential solution with time constant

$\tau_\infty = \tau_m \left( 1 - \frac{\mathrm{d}C_\eta}{\mathrm{d}C_V}(\infty) \right)^{-1/2}, \quad (27)$

where we used Eq. (10) and Eqs. (18) to derive dC_η/dC_V(∞) = g² dC_λ/dC_V(∞) (in units where τ_m = 1). We see that there are two factors that determine the timescale: the cumulant of the connectivity g² and the gain of the rate autocorrelation dC_λ/dC_V(∞). For the latter, we obtain from Eq. (25)

$\frac{\mathrm{d}C_\lambda}{\mathrm{d}C_V}(\infty) = \frac{c_1^2}{2\pi} \frac{\exp\left( -\frac{(\mu_V - \theta_{\mathrm{eff}})^2}{1 + C_V(0) + C_V(\infty)} \right)}{\sqrt{ (1 + C_V(0))^2 - C_V(\infty)^2 }}. \quad (28)$

Thus, given a self-consistent autocorrelation C_x and the corresponding voltage statistics from Eqs. (17), the asymptotic timescale Eq. (27) can be directly evaluated. We vary θ_eff and J_I/J_E in Fig. 6a-c. First, we plot dC_λ/dC_V(∞) alone, which we refer to as the gain (Fig. 6a). Due to the interplay between the exponential suppression and the square root factor in Eq. (28), the gain already exhibits a maximum. The existence of the maximum is mainly determined by the exponential suppression with growing µ_V − θ_eff in Eq. (28): both in the excitation- and in the inhibition-dominated regime, µ_V is far from the effective threshold θ_eff. The precise location of the maximum is not necessarily at µ_V = θ_eff as it is also determined by the square root factor. The latter decays reciprocally to C_V(0) and C_V(∞). Both C_V(0) and C_V(∞) decay for growing effective threshold and inhibition dominance, which results in a larger square root factor that shifts the maximum towards the upper right and broadens it. The cumulant of the connectivity g² grows with (J_I/J_E)², which further broadens the region of the maximum (Fig. 6b). The resulting asymptotic timescale (Fig. 6c) agrees both qualitatively and quantitatively with the intrinsic timescale (Fig. 4d). This is likely due to the single-exponential shape of the autocorrelation function (Fig. 5c).
To investigate the interplay of the gain and the connectivity further, we vary the overall synaptic strength by varying the excitatory weight J_E while keeping J_I/J_E fixed at an inhibition-dominated value (Fig. 6d). Increasing J_E in the inhibition-dominated regime shifts µ_V away from the effective threshold and decreases the gain; conversely, g² grows with J_E². This interplay leads to a broad region in parameter space with an increased timescale. However, the exponential decrease of the gain is more pronounced than the quadratic increase of g², such that the asymptotic timescale does not continue to grow with J_E but saturates. Thus, although increasing J_E goes together with increased variability across neurons as in the "heterogeneous asynchronous state" described by Ostojic [58], this does not map systematically onto longer single-neuron timescales.

Error function nonlinearity: external timescale
Our theory allows arbitrary Gaussian processes as external input. To investigate the influence of an external timescale on the intrinsic timescale, we choose a zero-mean Ornstein-Uhlenbeck process with

$C_{\mathrm{ext}}(\tau) = \sigma_{\mathrm{ext}}^2\, \frac{\tau_m + \tau_{\mathrm{ext}}}{\tau_m^2\, \tau_{\mathrm{ext}}}\, e^{-|\tau| / \tau_{\mathrm{ext}}}. \quad (29)$

Here, the scaling factor ensures that the external timescale does not influence the resulting variance of the voltage, C_V(0) = ∫_{−∞}^{∞} κ̃(s) C_ext(s) ds = σ_ext² for κ̃(t) = (τ_m/2) e^{−|t|/τ_m}. We take the parameters from Fig. 5, where the intrinsic timescale is maximal in the absence of external input. Increasing the strength of the external input σ_ext² leads to an increased firing rate (Fig. 7a). As desired, by construction of Eq. (29), the external timescale has a negligible effect on the firing rate at constant σ_ext² (Fig. 7a). The effect of the external timescale on the intrinsic timescale is highly intuitive: if τ_ext is smaller than the intrinsic timescale without external input, it decreases the intrinsic timescale, and vice versa (Fig. 7b). The strength of this effect grows with the strength of the external input. In the limit of strong external input, the intrinsic timescale approaches the external timescale if τ_ext > τ_m; if τ_ext < τ_m, the intrinsic timescale approaches the minimum set by the membrane time constant.

IV. LEAKY INTEGRATE-AND-FIRE NEURONS
Considering GLM neurons is a convenient choice due to their analytical tractability. However, their intrinsic stochasticity might fundamentally alter the network dynamics. Thus, we consider the frequently used leaky integrate-and-fire neuron model in this section [24]. The synapses are taken to be current-based with an exponential time course. An analytical solution to the colored noise problem for LIF neurons is an open challenge. Here, we focus on the fluctuation-driven regime and employ an approach based on the Wiener-Rice series [39,59,60] and the Stratonovich approximation thereof [38,39]. Below, we briefly introduce both the Wiener-Rice series and its Stratonovich approximation. For a comprehensive and pedagogic introduction to this approach, in particular with a focus on LIF neurons, see [61].

A. Neuron dynamics
The dynamics of individual neurons are governed by

$\tau_m^\alpha \dot{V}_i^\alpha(t) = -V_i^\alpha(t) + I_i^\alpha(t), \quad (30)$

$\tau_s^\alpha \dot{I}_i^\alpha(t) = -I_i^\alpha(t) + \tau_s^\alpha \sum_\beta \sum_j J_{ij}^{\alpha\beta}\, x_j^\beta(t - d^{\alpha\beta}), \quad (31)$

where V_i^α denotes the membrane voltage, I_i^α the synaptic current, τ_m^α and τ_s^α the membrane and synaptic time constants, and the voltage is reset to V_r^α and held constant during the refractory period τ_ref^α whenever it reaches the threshold θ^α. Threshold crossing triggers a spike which arrives at another neuron after a delay d^{αβ}. We set the resting potential to zero without loss of generality and absorb the membrane resistance into the synaptic current.

Effective stochastic dynamics
The effective stochastic input with statistics governed by Eq. (9) and Eq. (10) leads to a stochastic current with

$\mu_I^\alpha = \tau_s^\alpha\, \mu_\eta^\alpha, \quad (32)$

$C_I^\alpha(\tau) = \big( \tilde{\kappa}^\alpha * C_\eta^\alpha \big)(\tau), \quad (33)$

where κ̃^α(t) = (τ_s^α/2) e^{−|t|/τ_s^α}, similar to Eqs. (17). Contrary to the GLM neurons, the voltage cannot become a stationary process for LIF neurons due to the fire-and-reset rule. To circumvent this problem, we use the Wiener-Rice series which relates the free process without reset to the spiking statistics.

Wiener-Rice series and Stratonovich approximation
We consider a LIF neuron after the refractory period and the voltage dynamics that result if we do not allow for another fire-and-reset. We denote this free voltage U(t). Moreover, we temporarily neglect the static contribution to the input variability and drop the population index. The process starts at U(0) = V_r and produces a system of random points {t_i} defined by the upcrossings U(t_i) = θ, U̇(t_i) > 0. For this system of random points, the probability that no point falls in the interval [0, T], i.e. the survival probability, is given by [38]

$S(T) = \exp\left( \sum_{s=1}^{\infty} \frac{(-1)^s}{s!} \int_0^T \!\!\cdots\! \int_0^T g_s(t_1, \dots, t_s)\, \mathrm{d}t_1 \cdots \mathrm{d}t_s \right),$

where the g_s(t_1, …, t_s) are related to the free upcrossing probabilities n_s(t_1, …, t_s) calculated below, similar to the relation between moments and cumulants. For example, g_1(t_1) = n_1(t_1) and g_2(t_1, t_2) = n_2(t_1, t_2) − n_1(t_1) n_1(t_2). Now we approximate the output process as a renewal process such that the survival probability is sufficient to describe the statistics. Instead of the survival probability, it is more convenient to consider the cumulative hazard H(T) = −ln S(T) [24], i.e.

$H(T) = \sum_{s=1}^{\infty} \frac{(-1)^{s+1}}{s!} \int_0^T \!\!\cdots\! \int_0^T g_s(t_1, \dots, t_s)\, \mathrm{d}t_1 \cdots \mathrm{d}t_s.$
This can be regarded as a resummation of the Wiener-Rice series in terms of the g s (t 1 , . . . , t s ) instead of the free upcrossing probabilities n s (t 1 , . . . , t s ) [39].
A much simpler alternative to the Stratonovich approximation would be to set g_s(t_1, …, t_s) = 0 for s ≥ 2, leading to H(T) = ∫_0^T n_1(t) dt. This approximation is sometimes referred to as the Hertz approximation. In particular, the Hertz approximation leads to a closed expression for the hazard function h(t) ≡ dH(t)/dt = n_1(t). Unfortunately, this approximation is too severe and strongly affects the resulting firing rate. The main difference between the two approximations is the asymptotic saturation of the hazard function. Thus, we employ an approximation suggested by Stratonovich for long times [38]: ∫_0^T Q(t, t′) n_1(t′) dt′ ≈ n_0 ∫_0^∞ Q(t, t′) dt′ ≈ n_0 η, with n_0 = lim_{t→∞} n_1(t) and η = lim_{t→∞} ∫_0^∞ Q(t, t′) dt′. Inserting this approximation into Eq. (34) leads to the simplified hazard function h_S(t) of Eq. (35), which combines the simplicity of the Hertz approximation with the asymptotic behavior of the Stratonovich approximation.
The asymptotic level is given by lim_{t→∞} h_S(t) = κ_S; to leading order in η we have κ_S = n_0 + O(η), which recovers the Hertz approximation. In the parameter regime we consider, Eq. (35) yields very similar results to Eq. (34) (see Appendix D). In all figures in the main text, we use Eq. (35).
From the hazard function, we obtain the firing rate as well as the inter-spike-interval distribution [24]:

$\nu = \left( \tau_{\mathrm{ref}} + \int_0^\infty S(T)\, \mathrm{d}T \right)^{-1}, \quad (36)$

$p(T) = h(T)\, e^{-H(T)}. \quad (37)$

From the Fourier transform of the inter-spike-interval distribution, p̃(f) = ∫_0^∞ e^{2πifT} p(T) dT, we obtain the spike-train power spectrum using [38]

$S_x(f) = \nu\, \frac{1 - |\tilde{p}(f)|^2}{|1 - \tilde{p}(f)|^2}. \quad (38)$

Thus, we are left with the task of calculating n_1(t_1) and Q(t_1, t_2).
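Given a hazard function on a grid, Eqs. (36)-(38) reduce to elementary numerics; a sketch (NumPy; the saturating hazard below is a synthetic placeholder, not the Stratonovich result of Eq. (35)):

```python
import numpy as np

dt, n, tau_ref = 1e-4, 2**16, 0.002
t = np.arange(n) * dt

# Placeholder hazard: saturates at an asymptotic level (illustrative numbers)
kappa_S = 20.0
h = kappa_S * (1.0 - np.exp(-t / 0.01))

H = np.cumsum(h) * dt                # cumulative hazard
S = np.exp(-H)                       # survival probability
p = h * S                            # ISI density for T > tau_ref, Eq. (37)

nu = 1.0 / (tau_ref + S.sum() * dt)  # firing rate, Eq. (36)

# Renewal power spectrum, Eq. (38), for f > 0 (f -> 0 limit equals nu*CV^2)
f = np.fft.rfftfreq(4 * n, dt)[1:]
p_f = (np.fft.rfft(p, 4 * n) * dt)[1:] * np.exp(-2j * np.pi * f * tau_ref)
S_x = nu * (1.0 - np.abs(p_f)**2) / np.abs(1.0 - p_f)**2

# CV^2 from the moments of the ISI distribution
m1 = (t * p).sum() * dt
m2 = (t**2 * p).sum() * dt
cv2 = (m2 - m1**2) / (tau_ref + m1)**2
print(nu, cv2, S_x[-1])              # S_x saturates at nu for f -> infinity
```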

Free upcrossing probabilities
The free voltage dynamics are governed by Eq. (30), τ_m U̇(t) = −U(t) + I(t), where I is a Gaussian process determined by Eq. (32) and Eq. (33), and the initial condition is U(0) = V_r. U is a nonstationary Gaussian process due to the initial condition. For a sufficiently smooth Gaussian process, the upcrossing probabilities are given by the Kac-Rice formulae [38,59,62]

$n_1(t) = \int_0^\infty \dot{U}_1\, p(\theta, \dot{U}_1 \,|\, V_r, \dot{U}_0)\, \mathrm{d}\dot{U}_1, \quad (39)$

$n_2(t_1, t_2) = \int_0^\infty \!\! \int_0^\infty \dot{U}_1 \dot{U}_2\, p(\theta, \dot{U}_2; \theta, \dot{U}_1 \,|\, V_r, \dot{U}_0)\, \mathrm{d}\dot{U}_1\, \mathrm{d}\dot{U}_2, \quad (40)$

where p(θ, U̇_1 | V_r, U̇_0) denotes the probability that the process is at the threshold after time t and has velocity U̇_1 given that it started at the reset at t = 0 with velocity U̇_0. Similarly, p(θ, U̇_2; θ, U̇_1 | V_r, U̇_0) denotes the joint probability to be at the threshold at t_1 and t_2 with velocities U̇_1 and U̇_2. All integrals are over positive velocities only, because we consider upcrossings.
In both equations, we need to specify the distribution of the initial velocity U̇_0. Here, it is important to take into account the biased sampling of the initial velocity [63]: at −τ_ref, the neuron spiked due to an increased input current; hence, the initial velocity τ_m U̇_0 = −V_r + I_0 is likely to be larger than for an I_0 drawn from the stationary current distribution. To keep the integral in Eq. (39) tractable, we assume that I_0 is Gaussian-distributed. To determine the mean and variance of this distribution, we use that the velocity of a stationary process at an upcrossing is Rayleigh-distributed [38] (details in Appendix C).
For n_2(t_1, t_2), we consider only the stationary two-point upcrossing probability, so that it becomes a function of the time difference t_2 − t_1 and loses the dependency on the initial velocity. After marginalizing the initial velocity in n_1(t), we obtain a closed-form expression for the upcrossing rate, and the stationary n_2(τ) leads to a stationary Q(τ) = 1 − n_2(τ)/n_0². This makes the integrals in Eq. (34) considerably easier to solve numerically (details in Appendix D).
Since the free dynamics are linear, p(θ, U̇_1 | V_r) and p(θ, U̇_2; θ, U̇_1) can be obtained analytically. Importantly, the integral in Eq. (39) as well as the double integral in Eq. (40) are analytically solvable using the table [55] (details in Appendix B.1 and Appendix C). This is a novel result, to the best of our knowledge, and it considerably simplifies the numerical evaluation of Eq. (35).
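For orientation, in the stationary limit the first Kac-Rice integral Eq. (39) reduces to the classical Rice rate n_0, which also enters the Stratonovich approximation; a sketch (NumPy; the stationary statistics of U are placeholder inputs):

```python
import numpy as np

def rice_rate(theta, mu_U, C_U0, C_Udd0):
    """Stationary rate of upcrossings of the level theta for a smooth,
    stationary Gaussian process U with mean mu_U, variance C_U(0) = C_U0,
    and second derivative of the autocorrelation at zero C_U''(0) = C_Udd0
    (C_Udd0 < 0; -C_Udd0 is the variance of the velocity)."""
    return (np.sqrt(-C_Udd0 / C_U0) / (2 * np.pi)
            * np.exp(-(theta - mu_U)**2 / (2 * C_U0)))

# Illustrative numbers: mean well below threshold (fluctuation-driven regime)
print(rice_rate(theta=20.0, mu_U=10.0, C_U0=9.0, C_Udd0=-9.0 / 0.01**2))
```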

Numerical solution of the self-consistency problem
Just as for the GLM networks, we solve the colored noise problem using a fixed-point iteration. To initiate the algorithm, we set the rates to ν^α = 1/τ_m^α. We use these rates to calculate the input mean, variance, and spectrum according to Eq. (9) and Eq. (10), beginning with the diffusion approximation S_x^α(f) = ν^α and σ_ν^α = 0 across neurons. Despite assuming initially equal rates across neurons, it is possible to have static input variability both due to distributed indegrees (see Eq. (8) and Fig. 8a) and due to the evolution of the rates during the fixed-point iteration. To account for the static variability, we consider an ensemble of inputs µ^α + ζ^α and determine the corresponding hazard functions h_S^α(t | µ^α + ζ^α), Eq. (35), output rates ν^α(µ^α + ζ^α), Eq. (36), ISI distributions p^α(T | µ^α + ζ^α), Eq. (37), and spectra S_x^α(f | µ^α + ζ^α), Eq. (38). From this ensemble, we obtain the final output statistics from a numerical average over the ensemble, e.g. S_x^α(f) = ⟨S_x^α(f | µ^α + ζ^α)⟩_{ζ^α} for the spectrum. We solve the above Gaussian integrals using Gauss-Hermite quadrature [56]. Gauss-Hermite quadrature of order k solves Gaussian integrals of polynomials up to degree 2k − 1 exactly by construction. This allows us to keep the ensemble very small; throughout we use k = 5. Finally, we update the statistics using incremental steps, e.g. ν^α_{n+1} = ν^α_n + ε(ν̂^α_{n+1} − ν^α_n) for the firing rate, where ν̂^α_{n+1} denotes the estimated rate based on the input at the previous step. Here, the small update step ε < 1 is crucial because otherwise the algorithm is numerically unstable. Now we iterate and generate new input statistics. Repeated application of this scheme suggests that the self-consistency problem for the type of networks under consideration possesses only a single fixed point, to which the algorithm always converges.
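The ensemble average over the static variability ζ^α ~ N(0, σ_ζ²) needs only k function evaluations with Gauss-Hermite quadrature; a sketch (NumPy; the exponential rate function is a placeholder for the full hazard-based ν^α(µ^α + ζ^α)):

```python
import numpy as np

def gauss_hermite_average(func, mu, sigma, k=5):
    """<func(mu + zeta)> for zeta ~ N(0, sigma^2), using k-node
    Gauss-Hermite quadrature (exact for polynomials up to degree 2k-1)."""
    x, w = np.polynomial.hermite.hermgauss(k)
    return np.sum(w * func(mu + np.sqrt(2.0) * sigma * x)) / np.sqrt(np.pi)

# Placeholder single-neuron rate as a function of the mean input (illustrative)
rate = lambda mu_in: np.exp(0.3 * mu_in)

print(gauss_hermite_average(rate, mu=2.0, sigma=0.5))
# Exact lognormal mean for comparison: exp(0.3*2 + 0.5*(0.3*0.5)**2)
print(np.exp(0.6 + 0.5 * 0.15**2))
```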

B. Balanced random network
First, we consider the same balanced random network as we did for the GLM neurons (Fig. 2a). In particular, we place the network in the inhibition-dominated regime, drive the network with a constant external input, and use identical single-neuron parameters for excitatory and inhibitory neurons. In order to obtain a biologically plausible activity below 10 spks/s, we keep the external input weak to place the network deep in the fluctuation-driven regime. In this regime, the mean input to a neuron is far below threshold and only occasional large fluctuations in the input drive it above the spike threshold (Fig. 8a,b). If the mean inter-spike interval exceeds the correlation time of the input, the renewal approximation is admissible. Indeed, since the firing rates are low by construction, even moderate input correlation times are smaller than the inverse firing rate.

Colored noise problem
First, we isolate the colored noise problem to gauge the above approximations. To this end, we examine a single step in the fixed-point iteration. For this single step, we compare the theory with a population of unconnected LIF neurons driven by independent Gaussian processes (GPs). A self-consistent solution using populations of GP-driven LIF neurons, where the colored noise problem is thus solved numerically, accurately captures the network-averaged power spectrum of the single-unit activity of the connected network [33]. In particular, this implies that cross-correlations as well as the non-Gaussianity of the input statistics can be neglected, in agreement with the DMFT prediction. Hence, if our theory yields results comparable to a population of GP-driven LIF neurons for a single step of the fixed-point iteration, the self-consistent fixed point will capture the simulation well. We compare the self-consistent theory with simulations in the following sections. We do not determine the effective input statistics using simulation results here, because this would preclude a systematic scan over the parameters of the input, which consists of both external and recurrent network contributions. Instead, we fix the effective external input and determine the statistics of the effective recurrent input in terms of the input spiking statistics: the rate ν_in, a vanishing rate variability σ_ν^in = 0 across neurons, and a spectrum corresponding to a gamma process with rate ν_in and ISI coefficient of variation CV_in, cf. Eq. (38). This leaves a two-dimensional parameter space spanned by ν_in and CV_in. From the spiking statistics, we obtain the statistics of the effective input using Eq. (9) and Eq. (10). Note that although σ_ν^in = 0, the static variability of the effective input is nonzero, σ_ζ > 0, due to the distributed indegree, see Eq. (8) and Fig. 8a. Hence, we can compare both the averaged output statistics and the rate variability in the population. For the comparison, we simulate 250 GP-driven LIF neurons for 50 s with a time step of 0.05 ms; we use the same interval and time step for the theory.
Guided by the regime attained in full simulations, we choose ν_in ∈ [0.5, 2.5] spks/s and CV_in ∈ [0.5, 1.5] (Fig. 8). The network is in the inhibition-dominated regime; thus the mean input decreases with ν_in, starting from the value that brings the membrane potential on average to threshold (Fig. 8a). In contrast, the static variability increases monotonically with ν_in (Fig. 8a). To measure the strength of the dynamic variability, we divide the resulting standard deviation of the free membrane voltage by the distance of the mean free membrane voltage to the threshold, σ_U/(θ − µ_U). Since the numerator grows with √ν_in while the denominator grows linearly with ν_in in inhibition-dominated networks, the standard deviation relative to the distance to threshold decreases with increasing ν_in; in contrast, it slightly increases with increasing CV_in (Fig. 8b). For the entire parameter regime, the absolute difference in the firing rate is smaller than 1 spks/s and it is maximal at the brink of the fluctuation-driven regime (Fig. 8c). For the static rate variability, we also consider the absolute difference, which is below 0.3 spks/s throughout the parameter space (Fig. 8d). Next, we compare the ISI distributions using their Kolmogorov-Smirnov distance, i.e. the maximal absolute difference between the cumulative distributions. The Kolmogorov-Smirnov distance is maximal deep in the fluctuation-driven regime where the firing rate is well below 1 spks/s and the estimate of the ISI distribution is noisy (Fig. 8e). Finally, we compare the output spectra using the maximum absolute distance between the scaled spectra S_x(f)/ν. Here, the deviation is below 0.1 in most parts of the parameter space except for low CV_in ≤ 0.6, high CV_in ≥ 1.3, and at the brink of the fluctuation-driven regime (Fig. 8f). To give meaning to the quantitative results, we plot two example ISI distributions (Fig. 8g) and spectra (Fig. 8h). For the ISI distribution, we see the noisy estimate at low rates. For the spectra, we note that the main difference is a constant offset caused by a small error in the rate, see Eq. (38), while the shape is well matched.
To conclude, the above approximations work well in the fluctuation-driven regime for moderate values 0.6 < CV_in < 1.3. Within this regime, the firing rate and its variability across neurons, the ISI distribution, and the power spectra are well predicted. Most importantly for the prediction of the intrinsic timescale, the theory closely predicts the scaled spectrum S_x(f)/ν.

Timescales in balanced random networks of LIF neurons
Having established the validity of the theory, we employ it to investigate the intrinsic timescale. It is well known that increasing the overall synaptic strength leads to a network state with long temporal correlations [35,58]. However, this state comes along with giant fluctuations of the membrane potential [64] which are well beyond the physiological regime and which our theory can capture only to a limited extent. Hence, we focus on the influence of the external input µ_ext and the inhibition dominance J_I/J_E, in line with our above investigations for GLM neurons. We solve the theory on a ∆t = 0.05 ms grid up to a maximum of T = 10 s, use an ensemble size of k = 5 for the Gauss-Hermite quadrature, and choose an update step ε = 0.2.
We investigate the regime J_I/J_E ∈ [4.1, 6] and µ_ext ∈ [21, 30] mV. Within this regime, the rate is below approximately 20 spks/s, increases with µ_ext, and decreases with J_I/J_E (Fig. 9a). In contrast, the intrinsic timescale decreases with µ_ext, increases with J_I/J_E, and reaches a maximum of approximately 60 ms = 3τ_m (Fig. 9b). The autocorrelation function reveals that the nature of these longer intrinsic timescales in LIF networks is fundamentally different from the GLM networks above (Fig. 9c): in the GLM networks, the autocorrelation function is positive, which corresponds to an increased probability to spike in succession; in the LIF networks it is negative, which corresponds to a prolonged effective refractory period caused by the fire-and-reset mechanism in combination with the input statistics. Indeed, in the corresponding power spectra and their zero-frequency limit, we see that the CV is well below 1 (Fig. 9d). Hence, the process is more regular than a Poisson process, as opposed to the high irregularity (CV > 2) along with bursty spiking in the GLM networks above.

Simulation of balanced random network of LIF neurons
We validate the theoretical predictions for the balanced random network of LIF neurons by comparing with a network simulation. To acquire sufficient statistics, we simulate the network for T = 2.5 min with time step ∆t = 0.1 ms and discard the first 1 s as an initial transient. After this transient, the network is in an asynchronous irregular state (Fig. 10a). The rates of individual neurons are mostly below 5 spks/s with a peak at around 1 spks/s (Fig. 10b). The theory closely predicts the ISI distribution apart from a slight overestimation of the tail (Fig. 10c). Thus, the resulting autocorrelation function is also well matched and the predicted intrinsic timescale of approximately 55 ms is confirmed (Fig. 10d). The scaled spectrum is also closely reproduced and reveals a CV² ≈ 0.75 (Fig. 10e). To illustrate the difference between the single-unit and the population statistics, we furthermore plot the power spectrum of the population activity y(t) (Fig. 10f). For vanishing cross-correlations, these two spectra would be proportional to each other. Already weak cross-correlations can shape the population spectrum since their contribution is of O(N²) compared to the O(N) contributions from the autocorrelations, leading to the clear differences we see between the single-unit and the population spectrum. A notable difference between the two spectra is the peak around 30 Hz in the population spectrum, contrasting with the roughly 10 Hz peak in the single-unit spectrum. Furthermore, the population spectrum displays increased power at low frequencies compared to high frequencies, while the reverse is true for the single-unit spectrum.

FIG. 11. (e) Population spectrum of the layer 4 excitatory population. (f,g) Spike-train power spectra obtained from simulations (colored) and theory (black) and the corresponding intrinsic timescale. Parameters as specified in [41].

C. Biologically Constrained Network Model
Thus far, we have only considered balanced random networks with identical excitatory and inhibitory neurons that reduce to a single effective population. Despite this simplification, these balanced random networks already span a large parameter space. Here, we apply our theory to a multi-population network model constrained by biological data [41]. Beyond the aspect of multiple populations, this network model allows us to highlight two additional features of our theory that we left out thus far: the possibility to include external Poisson input and distributed synaptic weights. We solve the theory on a $\Delta t = 0.05$ ms grid up to a maximum of $T = 10$ s, use an ensemble size of $k = 5$ for the Gauss-Hermite quadrature, choose an update step $\varepsilon = 0.1$, and initialize all populations with a rate of 10 spks/s.
The model represents the neurons under 1 mm² of surface of generic early sensory cortex. It comprises eight populations: layers 2/3, 4, 5, and 6, each with a population of excitatory cells and one of inhibitory interneurons (Fig. 11a). In total, this leads to 77,169 neurons connected via approximately $3 \times 10^8$ synapses, with population-specific connection probabilities $p_{\alpha\beta}$ based on an extensive survey of the anatomical and physiological literature. In contrast to the original model, we directly use the connection probabilities to create the connectivity such that the total number of synapses can vary across instantiations of the model, and we draw source and target neurons without replacement, so that multapses are not allowed. Transmission delays follow truncated normal distributions with mean ± standard deviation of 1.5 ± 0.75 ms for excitatory source neurons and 0.75 ± 0.375 ms for inhibitory source neurons, both with a cutoff at 0.1 ms. The synaptic strengths $J_{ij}^{\alpha\beta}$ are normally distributed with $\mu_J^{\alpha I} = -351.2$ pA for inhibitory source neurons and $\mu_J^{\alpha E} = 87.8$ pA for excitatory source neurons, except for connections from layer 4 excitatory to layer 2/3 excitatory neurons, which have a mean strength of 175.6 pA. For all synaptic strengths, the standard deviation is fixed to 10% of the mean. The network is driven by external Poisson input with layer-specific rates (for further details see [41]).
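A minimal sketch of how one block of such a connectivity could be drawn, assuming Bernoulli connections with probability $p$, normally distributed weights with a standard deviation of 10% of the mean, and redrawing of delays below the cutoff; function names, the example block size, and the connection probability are placeholders.

```python
import numpy as np

rng = np.random.default_rng(1234)

def connect(n_src, n_tgt, p, mu_J, mu_d, sd_d, d_min=0.1):
    """Draw one connectivity block: Bernoulli connections with probability
    p, normal weights (SD = 10% of the mean), and truncated-normal delays
    in ms (redrawn below the cutoff d_min)."""
    mask = rng.random((n_tgt, n_src)) < p
    J = np.where(mask, rng.normal(mu_J, 0.1 * abs(mu_J), mask.shape), 0.0)
    d = rng.normal(mu_d, sd_d, mask.shape)
    while np.any(d[mask] < d_min):                 # enforce the 0.1 ms cutoff
        redraw = mask & (d < d_min)
        d[redraw] = rng.normal(mu_d, sd_d, np.count_nonzero(redraw))
    return J, d

# e.g. an inhibitory block: mean weight -351.2 pA, delays 0.75 +/- 0.375 ms
J, d = connect(n_src=200, n_tgt=300, p=0.1, mu_J=-351.2, mu_d=0.75, sd_d=0.375)
```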
The intrinsic parameters of the neurons do not vary across populations. Shaped by the connectivity, a layer-specific activity arises (Fig. 11b) with mean firing rates between 1 and 10 spks/s (Fig. 11c) and a standard deviation across neurons between 1 and 5 spks/s (Fig. 11d). While the quantitative agreement is not perfect, our theory captures the specificity of both the mean firing rate and its variability across neurons well.
A prominent feature of the model is the population-level oscillations [65], which are already visible in the raster plot of only 2% of the population (Fig. 11b). These oscillations lead to a clear peak at about 80 Hz in the power spectrum of the population activity in all layers [65]. Here, we only show a representative population spectrum (Fig. 11e). These population-level oscillations clearly violate the independence assumption of the effective inputs. Thus, they could potentially explain the deviations of the predicted firing rate from the simulation.
For most populations, the peak in the population-level oscillations also manifests itself in the population-averaged single-unit spectra (Fig. 11f,g). Apart from this peak, our theory closely captures the shape of all spectra (Fig. 11f,g). Note that, despite the large heterogeneity of mean rates, the intrinsic timescale is similar across populations. As in the balanced random network, the intrinsic timescales are on the order of magnitude of the membrane time constant (here 10 ms); concretely, the intrinsic timescale is approximately twice as large.

V. DISCUSSION
We developed a self-consistent theory for the second-order statistics, in particular the intrinsic timescales as defined by autocorrelation decay times, in block-structured random networks of spiking neurons. Orthogonal to approaches based on the mean activity of a population of neurons, we consider population-averaged single-neuron statistics. To this end, we built on the model-independent dynamic mean-field theory (DMFT) developed in [29] and applied it to networks of spiking neurons. We sketched the derivation starting from the characteristic functional of the recurrent input, Eq. (3), to expose the inherent assumptions of the DMFT as well as its main result. In particular, we showed that the mean-field equations, Eq. (9) and Eq. (10), where the connectivity matrix enters only through its first two cumulants, account for both (static) inter-neuron variability and (dynamic) temporal fluctuations. In order to close the self-consistency problem, we derived a novel analytical solution for the output statistics of a generalized linear model (GLM) neuron with error-function nonlinearity driven by a Gaussian process (GP), Eq. (25), and an analytical approximation for the output statistics of a GP-driven leaky integrate-and-fire (LIF) neuron in the fluctuation-driven regime, Eq. (35). These theoretical results yield firing rate distributions, spike-train power spectra, and inter-spike interval distributions that are close to those obtained from numerical simulations (Fig. 2, Fig. 5, Fig. 10), even for a complex, biologically constrained network model (Fig. 11).
The excellent agreement between theory and simulations demonstrates the validity of the DMFT approximation, i.e., the approximation of the recurrent inputs as independent Gaussian processes. This validity is most clearly demonstrated by the networks of GLM neurons, where the DMFT assumption constitutes the only approximation and the remainder of the solution is exact; for the LIF networks, additional approximations are made, so the effects of the DMFT assumption are less well isolated.
Focusing on balanced random networks, we leveraged our theory to investigate the influence of network parameters on the intrinsic timescale for both GLM (Fig. 3, Fig. 4) and LIF (Fig. 9) neurons. For the former neuron model with error function nonlinearity, our theory unveils that a product of two factors determines the intrinsic timescale (Eq. (27), Fig. 6): the gain of the rate autocorrelation function with respect to changes in the membrane voltage autocorrelation function for $\tau \to \infty$, Eq. (28), and the variance of the connectivity, Eqs. (6). Furthermore, providing a temporally correlated external drive causes the intrinsic timescale to monotonically approach the extrinsic timescale as the input strength is increased (Fig. 7).
For both GLM neurons with error function nonlinearity and LIF neurons, we find parameter regimes where the intrinsic timescale $\tau_c$ is longer than the largest time constant of the single-neuron dynamics, the membrane time constant $\tau_m$ (Fig. 5, Fig. 9). This demonstrates that the recurrent dynamics shape the intrinsic timescale. Note that we consider a regime where the inverse firing rate $\nu^{-1}$ is large compared to $\tau_m$. In contrast, [18,66] consider the opposite regime where slow neuronal timescales lead to effective rate dynamics, and the spiking noise is either left out or treated perturbatively. Our results show that it is possible to obtain longer intrinsic timescales even in a regime where the white component of the spiking noise contributes non-negligibly to the membrane voltage fluctuations. However, the temporal structure that causes the prolonged intrinsic timescale is very different for the two models that we consider: For GLM neurons, the autocorrelation is positive for a period on the order of $\tau_c$, corresponding to an increased spiking probability. For LIF neurons, the autocorrelation function is negative, corresponding to a prolonged effective refractory period.
Furthermore, LIF networks exhibit a minimum in the intrinsic timescale [35] while the corresponding GLM networks exhibit a maximum (Fig. 6). We hypothesize that this difference is due to the difference in the temporal structure: The minimum in the timescale for LIF networks is caused by a switch from an increased effective refractory period (a negative autocorrelation function for $\tau \to 0$) to an increased probability for another spike (a positive autocorrelation function for $\tau \to 0$). For GLM networks, this switch is absent, and hence so is the minimum. Instead, the more subtle interplay between the gain and the variance of the connectivity leads to the maximum. The presence of a maximum rather than a minimum in the intrinsic timescales renders the GLM networks more similar to networks of rate units [15]. If similar mechanisms are at play as in rate networks, the white spiking noise of the input to the GLM neurons may temper the size of the largest possible timescale [21]. However, due to the inherent stochasticity of GLM neurons, it is unclear whether the maximum occurs at a transition to chaos as it does in rate networks [15].
Considering a more complex block-structured network model that is constrained by biological data [41] exposes limits of our theory: while the theory accurately captures the non-oscillatory components of the power spectra, it misses a high-frequency oscillation (Fig. 11). These high-frequency oscillations are caused by correlated activity on the population level [65]; hence, the peak in the population-averaged single-neuron spectra demonstrates an interplay between single-unit and population-level statistics that was absent in the simpler balanced random network models. By construction, our theory only accounts for population-averaged single-neuron statistics and thus misses the high-frequency peak. It is an interesting challenge to derive a self-consistent theory on both scales simultaneously.
In general, the limits of DMFT when applied to spiking networks merit further investigation. For example, assuming that the network is sparse, $K \ll N$ or $p \ll 1$, is not a necessary condition for a DMFT to apply [18]. Nonetheless, increasing sparsity reduces the pairwise correlations between the neurons [58,67,68], such that DMFT is expected to yield better results. Another important aspect is that for synaptic weights scaling as $1/\sqrt{K}$, the fluctuations of the mean input $\mu_\eta(t)$ can be $O(1)$, i.e., not scale with $K^{-\alpha}$, $\alpha > 0$, as the network size increases and $p$ is kept constant. In Eqs. (7), $\mu_\eta(t)$ and $C_\eta(t, t')$ are replaced by their average, neglecting fluctuations; including the fluctuations of the mean input would lead to an additional term in $C_\eta(t, t')$ [28]. Since these fluctuations of the mean input reflect pairwise correlations, the latter need to be small for the theory to be accurate. The above scaling argument shows that it is nontrivial that the pairwise correlations vanish, even in the large-network limit. They only do so for an asynchronous state in which the pairwise correlations are small already for finite networks, e.g., due to a sparse network or due to inhibitory feedback [27]. Conversely, if a network is in an asynchronous irregular state, which has low pairwise correlations by definition, DMFT is expected to yield reliable results.
The heterogeneity of timescales even within a cortical area [69] suggests another interesting extension, namely to calculate the variability of the timescale within a population. This requires calculating the variability of the second-order statistics, which has recently been achieved for linear rate networks [70] but to the best of our knowledge is an open challenge even for simple nonlinear rate networks, let alone for spiking networks.
The microscopic theory presented here enables direct comparisons with experimental measurements of neuron-level intrinsic timescales [2], in contrast to previous works which have considered population rate models [10,71]. It is important to distinguish between neuron-level and population-level autocorrelations, since the latter are shaped by $O(N^2)$ cross-correlations and can therefore differ substantially from neuron-level autocorrelations, as we have illustrated for the balanced random network model (Fig. 10e,f) and the biologically constrained network model (Fig. 11e-g).
Establishing a direct link between the connectivity and the emergent intrinsic timescales opens up the possibility of a thorough investigation of the effect of network architecture. Moreover, within our theory, it is possible to account for population-specific intrinsic neuron parameters. Thus, the theory also provides an avenue for investigations of the complex interplay between intrinsic parameters [8,9] and the network structure [10]. In this context, an interesting application is clustered networks, which feature slow switching between transiently active clusters [72]. In particular, clustered networks with both excitatory and inhibitory clusters [73][74][75] could be of interest because they robustly give rise to winnerless competition. From a modeler's point of view, uncovering mechanisms shaping intrinsic timescales could be used to fine-tune network models [76][77][78][79] to match the experimentally observed hierarchy of timescales [2]. Focusing on computational aspects, diverse timescales strongly enhance the computational capacity of a recurrent network [80][81][82], and neurons with long intrinsic timescales carry more information in a working memory task [83] (but see [84]). In this light, the results presented here may also contribute to an improved understanding of aspects of information processing in the brain.

Stochastic Processes
The characteristic functional of a stochastic process $\xi(t)$ is defined as
$$\Phi_\xi[u] = \left\langle \exp\!\left(i \int \mathrm{d}t\, u(t)\, \xi(t)\right) \right\rangle_\xi.$$
In terms of the cumulants $k_r(t_1, \ldots, t_r)$, the characteristic functional can be written as
$$\Phi_\xi[u] = \exp\!\left(\sum_{r=1}^{\infty} \frac{i^r}{r!} \int \mathrm{d}t_1 \cdots \mathrm{d}t_r\, k_r(t_1, \ldots, t_r)\, u(t_1) \cdots u(t_r)\right).$$
All properties of a stochastic process are determined by its characteristic functional. If all cumulants except for the first vanish, the process is deterministic and has the characteristic functional
$$\Phi_\xi[u] = \exp\!\left(i \int \mathrm{d}t\, k_1(t)\, u(t)\right).$$
In this case, the first cumulant coincides with the process itself, $k_1(t) = \xi(t)$. If only the first and the second cumulants are non-vanishing, the process is a Gaussian process. The corresponding characteristic functional reads
$$\Phi_\xi[u] = \exp\!\left(i \int \mathrm{d}t\, k_1(t)\, u(t) - \frac{1}{2} \int \mathrm{d}t\, \mathrm{d}t'\, u(t)\, k_2(t, t')\, u(t')\right).$$
If the Gaussian process is stationary, $k_1(t_1) = k_1$ and $k_2(t_1, t_2) = k_2(t_2 - t_1)$, the characteristic functional simplifies further to
$$\Phi_\xi[u] = \exp\!\left(i k_1 \int \mathrm{d}t\, u(t) - \frac{1}{2} \int \mathrm{d}t\, \mathrm{d}t'\, u(t)\, k_2(t' - t)\, u(t')\right).$$
The characteristic functional describes the statistics at all points in time. It is often useful to relate the characteristic functional to the distribution of the values of $\xi(t)$ at fixed points in time, for instance to compute the statistics of the current at upcrossings and after the refractory period, or to obtain marginal activity statistics which, given stationarity, reflect time-averaged activity. To this end, we can use the test functions $u(t) = u_1 \delta(t - t_1)$ and $u(t) = u_1 \delta(t - t_1) + u_2 \delta(t - t_2)$, which yield
$$\Phi_\xi(u_1) = \left\langle e^{i u_1 \xi(t_1)} \right\rangle, \qquad \Phi_\xi(u_1, u_2) = \left\langle e^{i\left(u_1 \xi(t_1) + u_2 \xi(t_2)\right)} \right\rangle.$$
These are the characteristic functions of a Gaussian with cumulants determined by $k_1$ and $k_2$. Knowing these characteristic functions for all times $t_1$ and $t_2$ provides the full picture; this is the marginalization property of Gaussian processes [85].
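The two-point characteristic function of a Gaussian process is easy to verify numerically; a minimal Monte Carlo check, assuming a zero-mean Ornstein-Uhlenbeck kernel $k_2(\tau) = e^{-|\tau|/\tau_c}$ purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

tau_c, dt, u1, u2 = 1.0, 0.5, 0.7, -0.3
k2 = lambda tau: np.exp(-abs(tau) / tau_c)         # assumed OU kernel, k1 = 0

cov = np.array([[k2(0.0), k2(dt)],
                [k2(dt), k2(0.0)]])
xi = rng.multivariate_normal([0.0, 0.0], cov, size=200_000)
empirical = np.mean(np.exp(1j * (u1 * xi[:, 0] + u2 * xi[:, 1])))
theory = np.exp(-0.5 * (u1**2 * k2(0.0) + 2 * u1 * u2 * k2(dt) + u2**2 * k2(0.0)))
print(abs(empirical - theory))                     # ~ 1/sqrt(samples)
```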

Point Processes
The equivalent of the characteristic functional for a point process is the generating functional. For a spike train $\{t_1, \ldots, t_n\}$ (a "system of random points" in Stratonovich's naming) with $t_i \in [0, T]$ for all $i$, the generating functional is defined by
$$L[u] = \left\langle \prod_{j=1}^{n} \left(1 + u(t_j)\right) \right\rangle.$$
Here, the number of spikes $n$ is itself a random variable because the average is taken with respect to all possible realizations of the spike train [86].
For point processes, the role of the moments is taken by the "distribution functions" $n_r(t_1, \ldots, t_r)$, which denote the probability of having at least one point in each interval $[t_i, t_i + \mathrm{d}t]$. The role of the cumulants is taken by the functions $g_r(t_1, \ldots, t_r)$, which are related to the distribution functions as the cumulants of a stochastic process are related to its moments. In terms of the $g_r(t_1, \ldots, t_r)$, the generating functional can be written as [86]
$$L[u] = \exp\!\left(\sum_{r=1}^{\infty} \frac{1}{r!} \int \mathrm{d}t_1 \cdots \mathrm{d}t_r\, g_r(t_1, \ldots, t_r)\, u(t_1) \cdots u(t_r)\right).$$
The generating functional is directly related to the characteristic functional of the stochastic process $\xi(t) = \sum_{j=1}^{n} \delta(t - t_j)$:
$$\Phi_\xi[u] = L\!\left[e^{iu} - 1\right].$$
This relation links the distribution functions $n_r$ through the $g_r$ to the cumulants of the spike train. For example, the characteristic functional of a Poisson spike train with rate $\nu(t)$ is
$$\Phi_\xi[u] = \exp\!\left(\int \mathrm{d}t\, \nu(t)\left(e^{iu(t)} - 1\right)\right).$$
Expanding the exponent on the r.h.s. of Eq. (52) to second order in $u(t)$, we obtain the relations between the $g_r$ and the first two cumulants of the spike train.
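For the special case of a constant test function on a window, the Poisson characteristic functional reduces to the characteristic function of the spike count, which is straightforward to check by Monte Carlo; the rate, window, and sample size below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

nu, T, u = 5.0, 2.0, 0.4                        # rate, window, test value
counts = rng.poisson(nu * T, size=200_000)      # spike counts on [0, T]
empirical = np.mean(np.exp(1j * u * counts))
theory = np.exp(nu * T * (np.exp(1j * u) - 1.0))
print(abs(empirical - theory))                  # ~ 1/sqrt(samples)
```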

B. Gaussian integrals
We solve several Gaussian integrals using the impressive table by Owen [55]. First, we introduce his notation for the standard normal PDF and CDF,
$$g(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}, \qquad G(x) = \int_{-\infty}^{x} g(y)\, \mathrm{d}y.$$
Furthermore, we need Owen's T function
$$T(h, a) = \frac{1}{2\pi} \int_0^a \frac{e^{-h^2(1+x^2)/2}}{1+x^2}\, \mathrm{d}x.$$
All formulas were numerically validated using numerical integration routines implemented in SciPy [87].
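Such a validation can be reproduced with a few lines of SciPy, comparing `scipy.special.owens_t` against direct quadrature of the definition above; the test points are arbitrary.

```python
import numpy as np
from scipy import integrate, special

def owens_t_quad(h, a):
    """Owen's T by direct quadrature of its defining integral."""
    integrand = lambda x: np.exp(-0.5 * h**2 * (1.0 + x**2)) / (1.0 + x**2)
    val, _ = integrate.quad(integrand, 0.0, a)
    return val / (2.0 * np.pi)

for h, a in [(0.5, 1.0), (1.3, -0.7), (2.0, 3.0)]:
    assert np.isclose(special.owens_t(h, a), owens_t_quad(h, a))
```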

Free upcrossing probabilities
For the free two-point upcrossing probability, we need integrals of the form
$$I_n(a, b) = \int_0^{\infty} x^n\, g(x)\, G(ax + b)\, \mathrm{d}x.$$
For arbitrary $n$, Eq. (10,01n.4) from [55] provides the solution in terms of $F_{\nu,\mu}(x)$, the cumulative distribution function of the noncentral t-distribution with $\nu$ degrees of freedom and noncentrality parameter $\mu$. Analytical expressions for $F_{\nu,\mu}(x)$ in terms of $g(x)$, $G(x)$, and $T(h, a)$ can be found in [88] (the ones in [55] contain typos). Using these expressions, the solutions for $n = 0, 1, 2$ can be expressed through $g$, $G$, and $T(bB, a)$, where we use the shorthand notation $B = 1/\sqrt{1 + a^2}$ and $M_0(a, b) = aB\, g(bB)\, G(-abB)$. Since we consider only up to $n = 2$, we are spared the increasingly cumbersome expressions for $n > 2$.
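Since the explicit expressions are cumbersome, a direct-quadrature reference is useful for validating them; the sketch below assumes the integral form stated above (our reading of the source) and uses SciPy for the normal PDF and CDF.

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm

def I_n(n, a, b):
    """Reference value of int_0^inf x^n g(x) G(a x + b) dx by quadrature."""
    integrand = lambda x: x**n * norm.pdf(x) * norm.cdf(a * x + b)
    val, _ = integrate.quad(integrand, 0.0, np.inf)
    return val

print([I_n(n, 0.8, -0.3) for n in (0, 1, 2)])
```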

C. Free upcrossing probabilities
The dynamics of the free membrane voltage and the current for the LIF neuron model are given by
$$\dot{U}(t) = -U(t) + I(t), \tag{54}$$
$$\tau_s \dot{I}(t) = -I(t) + \eta(t), \tag{55}$$
where we measure time in units of the membrane time constant $\tau_m$, i.e., we set $\tau_m = 1$. Furthermore, we set $\langle \eta \rangle = 0$, i.e., we measure $U$ and $I$ relative to the mean input. Lastly, we define $t = 0$ to be the end of the refractory period, i.e., the time when the free dynamics start evolving. First, we need the distribution of the voltage and the current. Since $\eta$ is a Gaussian process, both are Gaussian for arbitrary time arguments. Thus, it is sufficient to calculate the first two conditional cumulants. Throughout, we assume a correlation-free preparation [89], i.e., we assume that $\eta$ and $I$ are uncorrelated prior to $t = 0$.

Non-stationary mean and variance of U and I
We need the non-stationary mean and variance of $U$ and $I$ to calculate the free upcrossing probability. For a given initial current and initial voltage, Eq. (54) and Eq. (55) lead to
$$I(t) = I_0 e^{-t/\tau_s} + \frac{1}{\tau_s} \int_0^t e^{-(t-s)/\tau_s}\, \eta(s)\, \mathrm{d}s, \qquad U(t) = U_0 e^{-t} + \int_0^t e^{-(t-s)}\, I(s)\, \mathrm{d}s.$$
This leads immediately to the mean $\mu_I(t) = I_0 e^{-t/\tau_s}$. To obtain the variances numerically, we use that they follow linear differential equations: taking the temporal derivatives of $I(t)^2$, $I(t)U(t)$, and $U(t)^2$, using Eq. (54) and Eq. (55), and averaging leads to
$$\frac{\mathrm{d}}{\mathrm{d}t} \sigma_I^2(t) = \frac{2}{\tau_s}\left(\sigma_{I\eta}^2(t) - \sigma_I^2(t)\right),$$
$$\frac{\mathrm{d}}{\mathrm{d}t} \sigma_{IU}^2(t) = \sigma_I^2(t) - \left(1 + \frac{1}{\tau_s}\right)\sigma_{IU}^2(t) + \frac{1}{\tau_s}\sigma_{U\eta}^2(t),$$
$$\frac{\mathrm{d}}{\mathrm{d}t} \sigma_U^2(t) = 2\left(\sigma_{IU}^2(t) - \sigma_U^2(t)\right).$$
The initial conditions for all of the above differential equations are $\sigma_I^2(0) = \sigma_{IU}^2(0) = \sigma_U^2(0) = 0$. They are straightforward to solve numerically in the order that they appear, but they require two additional quantities,
$$\sigma_{I\eta}^2(t) = \frac{1}{\tau_s} \int_0^t e^{-\tau/\tau_s}\, C_\eta(\tau)\, \mathrm{d}\tau, \qquad \sigma_{U\eta}^2(t) = \frac{1}{\tau_s} \int_0^t A(\tau)\, C_\eta(\tau)\, \mathrm{d}\tau,$$
with $A(\tau)$ defined below, which can be numerically computed using a composite trapezoidal rule. If $C_\eta(\tau)$ contains a Dirac delta, $C_\eta(\tau) = \tilde{C}_\eta(\tau) + 2D\delta(\tau)$, we have to separate it analytically in $\sigma_{I\eta}^2(t)$:
$$\sigma_{I\eta}^2(t) = \tilde{\sigma}_{I\eta}^2(t) + \frac{D}{\tau_s}.$$
Note the factor $\frac{1}{2}$ because we only integrate "half" of the Dirac delta. In $\sigma_{U\eta}^2(t)$, the Dirac delta does not contribute because the integrand vanishes at zero, i.e., $\sigma_{U\eta}^2(t) = \tilde{\sigma}_{U\eta}^2(t)$. Ultimately, we need the cumulants of $U$ and $\dot{U}$ instead of $U$ and $I$. To relate the respective quantities, we use Eq. (54). For the initial conditions, we have $\dot{U}_0 = -U_0 + I_0$. The first cumulants are
$$\mu_U(t) = U_0 e^{-t} + \left(U_0 + \dot{U}_0\right) A(t), \qquad \mu_{\dot{U}}(t) = -\mu_U(t) + \left(U_0 + \dot{U}_0\right) e^{-t/\tau_s},$$
where we used Eq. (54) for $\mu_{\dot{U}}(t)$ and abbreviated
$$A(t) = \frac{\tau_s}{\tau_s - 1}\left(e^{-t/\tau_s} - e^{-t}\right).$$
The second cumulants do not depend on the initial conditions and we get from Eq. (54):
$$\sigma_{U\dot{U}}^2(t) = -\sigma_U^2(t) + \sigma_{IU}^2(t), \qquad \sigma_{\dot{U}}^2(t) = \sigma_U^2(t) - 2\sigma_{IU}^2(t) + \sigma_I^2(t).$$
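A minimal numerical sketch of this scheme, assuming $\tau_s \neq 1$ and a uniform grid, using forward Euler with the auxiliary correlators precomputed by a composite trapezoidal rule; names and the integration scheme are illustrative, not a description of our solver.

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

def free_variances(C_eta, tau_s, t, D=0.0):
    """Forward-Euler integration of the variance ODEs on the grid t.

    C_eta: delta-free part of the input autocorrelation (vectorized);
    D: weight of an optional 2*D*delta(tau) component.
    Returns columns sigma_I^2, sigma_IU^2, sigma_U^2.
    """
    dt = t[1] - t[0]
    A = tau_s / (tau_s - 1.0) * (np.exp(-t / tau_s) - np.exp(-t))
    c = C_eta(t)
    s_Ieta = cumulative_trapezoid(np.exp(-t / tau_s) * c, t, initial=0.0) / tau_s + D / tau_s
    s_Ueta = cumulative_trapezoid(A * c, t, initial=0.0) / tau_s
    out = np.zeros((len(t), 3))                    # sigma^2(0) = 0
    for i in range(len(t) - 1):
        sI, sIU, sU = out[i]
        out[i + 1, 0] = sI + dt * 2.0 / tau_s * (s_Ieta[i] - sI)
        out[i + 1, 1] = sIU + dt * (sI - (1.0 + 1.0 / tau_s) * sIU + s_Ueta[i] / tau_s)
        out[i + 1, 2] = sU + dt * 2.0 * (sIU - sU)
    return out
```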
Finally, we need to marginalize the initial velocity.
We assume that $\dot{U}_0$ is Gaussian distributed with mean $\mu_{\dot{U}_0}$ and variance $\sigma_{\dot{U}_0}^2$. Marginalizing $\dot{U}_0$ again results in a Gaussian distribution because $p(\dot{U}_0)$ and $p(U_1, \dot{U}_1 \mid U_0, \dot{U}_0)$ are Gaussian and the conditional mean is linear in $\dot{U}_0$. Hence, we only need to compute the cumulants. For the mean, we simply have to replace $\dot{U}_0 \to \mu_{\dot{U}_0}$. Since $\dot{U}_0$ enters the means linearly, the second cumulants acquire additional contributions,
$$\tilde{\sigma}_U^2(t) = \sigma_U^2(t) + A(t)^2\, \sigma_{\dot{U}_0}^2, \qquad \tilde{\sigma}_{U\dot{U}}^2(t) = \sigma_{U\dot{U}}^2(t) + A(t)\left(e^{-t/\tau_s} - A(t)\right)\sigma_{\dot{U}_0}^2, \qquad \tilde{\sigma}_{\dot{U}}^2(t) = \sigma_{\dot{U}}^2(t) + \left(e^{-t/\tau_s} - A(t)\right)^2 \sigma_{\dot{U}_0}^2.$$
With this, we can evaluate the mean and the variance numerically from the statistics of $\eta(t)$ and $\dot{U}_0$.

Initial velocity distribution
For the distribution of initial velocities, we assume that the voltage has reached a stationary distribution by the time it crosses the threshold. The velocity at an upcrossing of a stationary Gaussian process is Rayleigh distributed [38]. Because at the threshold we have $\dot{U}_\text{up} = -\theta + I_\text{up}$ (remember that $t = 0$ denotes the end of the refractory period, that the membrane resistance is absorbed into the current, and that time is rescaled such that $\tau_m = 1$), the current is also Rayleigh distributed, where $\sigma_I^2 = -\ddot{C}_U(0)$ with the stationary autocorrelation $C_U(\tau)$ of the free voltage. We assume that the further development of the current is also stationary, and neglect the conditional dependencies of the transition probability $p(I_0 \mid I_\text{up})$ on the threshold crossing beyond $I_\text{up}$, e.g., on $\dot{I}_\text{up}$ and $\ddot{I}_\text{up}$. This transition probability can thus be obtained from the unconstrained ('free') stationary statistics of the current (not conditioned on a threshold crossing), which are Gaussian: $p(I_0 \mid I_\text{up}) = p_\text{free}(I_0, I_\text{up}) / p_\text{free}(I_\text{up})$. The unconstrained joint and instantaneous distributions here function as auxiliary quantities for computing $p(I_0 \mid I_\text{up})$. The unconstrained joint distribution is a Gaussian with variance $\sigma_I^2$ and covariance $\sigma_I^2 R_I(\tau_\text{ref})$, where $R_I(\tau) = -\ddot{C}_U(\tau) / \sigma_{\dot{U}}^2$. We derive $C_U(\tau)$ most conveniently by Fourier transforming Eq. (54) and Eq. (55), which leads to
$$S_U(f) = \frac{S_\eta(f)}{\left(1 + (2\pi f)^2\right)\left(1 + (2\pi \tau_s f)^2\right)},$$
and using the Wiener-Khinchin theorem to obtain the autocorrelation. From the unconstrained joint and instantaneous distributions, we obtain the transition probability $p(I_0 \mid I_\text{up})$, which is again a Gaussian with [85]
$$\mu_{I_0 \mid I_\text{up}} = R_I(\tau_\text{ref})\, I_\text{up}, \qquad \sigma_{I_0 \mid I_\text{up}}^2 = \sigma_I^2 \left(1 - R_I(\tau_\text{ref})^2\right).$$
Marginalizing over the Rayleigh-distributed $I_\text{up}$ yields the distribution of $I_0$, which is not a Gaussian anymore. We only calculate the first two cumulants, and neglect the higher cumulants to arrive at a Gaussian approximation. Finally, after the refractory time we have $\dot{U}_0 = -V_r + I_0$. Combining the above equations leads to the mean $\mu_{\dot{U}_0}$ and variance $\sigma_{\dot{U}_0}^2$, which determine the Gaussian approximation of the initial velocity distribution.
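The two-step construction, Rayleigh upcrossing statistics followed by the Gaussian transition, can be checked by a small Monte Carlo experiment; all parameter values below are placeholders, and the mapping follows the equations above under our stated assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

theta, V_r, sigma_dU, R = 1.0, 0.0, 0.8, 0.6    # placeholders; R = R_I(tau_ref)
sigma_I = sigma_dU                              # convention of the text
dU_up = rng.rayleigh(sigma_dU, size=200_000)    # Rayleigh upcrossing velocity
I_up = theta + dU_up                            # from U'_up = -theta + I_up
I_0 = R * I_up + np.sqrt(sigma_I**2 * (1.0 - R**2)) * rng.standard_normal(I_up.size)
dU_0 = -V_r + I_0                               # initial velocity after refractoriness
print(dU_0.mean(), dU_0.var())                  # cumulants of the Gaussian approximation
```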
Due to the linearity of Eq. (54) and Eq. (55), the distribution $p(\theta, \dot{U}_1 \mid V_r)$ is a Gaussian with the cumulants we calculated above [89]. Hence, it takes the form
$$p(u) = \frac{1}{2\pi \sqrt{\det \Sigma}} \exp\!\left(-\frac{1}{2}\left(u - \mu\right)^{\mathrm{T}} \Sigma^{-1} \left(u - \mu\right)\right),$$
where $u^{\mathrm{T}} = (\theta, \dot{U}_1)$ and the mean and the correlation matrix are given by
$$\mu = \begin{pmatrix} \mu_U(t_1) \\ \mu_{\dot{U}}(t_1) \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \tilde{\sigma}_U^2(t_1) & \tilde{\sigma}_{U\dot{U}}^2(t_1) \\ \tilde{\sigma}_{U\dot{U}}^2(t_1) & \tilde{\sigma}_{\dot{U}}^2(t_1) \end{pmatrix},$$
evaluated with the initial condition $U_0 = V_r$ and the marginalized cumulants from above.

Stationary correlation function of $U$ and $\dot{U}$
For the stationary two-point upcrossing probability, we need the stationary correlation functions of $U$, $\dot{U}$, and between $U$ and $\dot{U}$. The power spectrum of $U$ follows from the power spectrum of $\eta$ using
$$S_U(f) = \frac{S_\eta(f)}{\left(1 + (2\pi f)^2\right)\left(1 + (2\pi \tau_s f)^2\right)}.$$
The Wiener-Khinchin theorem then yields $C_U(\tau)$; the correlation functions involving $\dot{U}$ follow by differentiation, $C_{U\dot{U}}(\tau) = \dot{C}_U(\tau)$ and $C_{\dot{U}}(\tau) = -\ddot{C}_U(\tau)$.
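Numerically, $C_U(\tau)$ can be obtained from $S_\eta(f)$ by evaluating the filter on a frequency grid and applying an inverse FFT; the grid parameters below are arbitrary, and $S_\eta$ is assumed to be a vectorized even function.

```python
import numpy as np

def C_U_from_S_eta(S_eta, tau_s, f_max=200.0, n=2**14):
    """C_U(tau) from S_eta(f) via the linear filter and an inverse FFT
    (Wiener-Khinchin); time in units of tau_m."""
    f = np.fft.fftfreq(n, d=1.0 / (2.0 * f_max))   # two-sided frequency grid
    S_U = S_eta(f) / ((1.0 + (2.0 * np.pi * f)**2)
                      * (1.0 + (2.0 * np.pi * tau_s * f)**2))
    df = f[1] - f[0]
    C = np.fft.ifft(S_U).real * n * df             # int df e^{2 pi i f tau} S_U(f)
    tau = np.fft.fftfreq(n, d=df)                  # conjugate time grid
    return np.fft.fftshift(tau), np.fft.fftshift(C)
```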