Correlated fluctuations in strongly-coupled binary networks beyond equilibrium

Randomly coupled Ising spins constitute the classical model of collective phenomena in disordered systems, with applications covering ferromagnetism, combinatorial optimization, protein folding, stock market dynamics, and social dynamics. The phase diagram of these systems is obtained in the thermodynamic limit by averaging over the quenched randomness of the couplings. However, many applications require the statistics of activity for a single realization of the possibly asymmetric couplings in finite-sized networks. Examples include reconstruction of couplings from the observed dynamics, learning in the central nervous system by correlation-sensitive synaptic plasticity, and representation of probability distributions for sampling-based inference. The systematic cumulant expansion for kinetic binary (Ising) threshold units with strong, random and asymmetric couplings presented here goes beyond mean-field theory and is applicable outside thermodynamic equilibrium; a system of approximate non-linear equations predicts average activities and pairwise covariances in quantitative agreement with full simulations down to hundreds of units. The linearized theory yields an expansion of the correlation- and response functions in collective eigenmodes, leads to an efficient algorithm solving the inverse problem, and shows that correlations are invariant under scaling of the interaction strengths.


I. INTRODUCTION
Understanding collective phenomena arising from the interactions in a many-body system is the challenging subject of many-particle physics. The characterization of the emerging states typically rests on the quantification of correlations [1]. Among the simplest classical models is the Ising (binary) model [2], or Glauber dynamics [3] in its kinetic formulation. While for symmetric random couplings [4] these systems are within the realm of equilibrium statistical mechanics [5], the asymmetric kinetic Ising model [6][7][8], even in the stationary state, does not reach thermodynamic equilibrium. In their Markovian formulation [9] these processes are studied in the field of non-equilibrium stochastic thermodynamics [for review see 10, chapter 6]. The methods derived in these fields have proven useful in a variety of disciplines, including computer science, biology, artificial intelligence, social sciences and economics [see e.g. 11-16, and references therein].
Neuronal networks of the central nervous system are prominent examples of non-equilibrium systems due to Dale's principle [17,18], which states that a neuron either excites or inhibits all its targets. The correlations between the activities of nerve cells are functionally important [19]: they influence the representation of information in population signals depending on the readout in an either detrimental [20] or beneficial [21] manner, their time-dependent modulations are linked to behavior [22], and they determine the influence of neuronal activity on synaptic plasticity [23][24][25], the biophysical substrate underlying learning. In the case of binary units (Ising spins), knowledge of pairwise covariances, moreover, proves useful for sampling-based inference [reviewed in 26], as they constitute the next order in the systematic expansion of the joint probability distribution beyond independence [cf. 3, eq. (22)].
Dynamic mean-field theory [27][28][29][30] in the N → ∞ limit effectively reduces the many-body problem of a network comprised of a large number of units to a single particle interacting with a self-consistently determined field, but neglects the cross-covariances of activities between units. Ginzburg and Sompolinsky [31] introduced pairwise correlations to the description of weakly coupled systems and Renart et al. [32] extended the analysis to strongly coupled units in the large N limit. Both approaches are, however, limited to averaged pairwise correlations. Similarly, approximate master equations for binary-state dynamics on complex networks, as recently discussed in the context of the pair approximation by Gleeson [33,34], are restricted to global dynamics in infinite networks.
In the current work, we derive a systematic cumulant expansion yielding an analytical description of correlations between the activities of individual pairs of units, which goes beyond averaged pairwise correlations [31,32,35] and is applicable to a single realization of an asymmetric (directed) network. The framework links the coupling structure to the emerging correlated activity and describes the first and second order statistics for single units and pairs of units in a large, but finite, network of possibly strongly interacting elements. The analytical expressions predict a distribution of pairwise correlations with a small mean but large standard deviation, as observed experimentally [36] in neural tissue and yet not explained theoretically. Moreover, the framework can be employed to infer the couplings between the units from the observed correlated activity, also termed the inverse problem [37][38][39][40].
In particular, we show that already a Gaussian truncation of the presented cumulant hierarchy yields good predictions for individual mean activities and pairwise correlations. Taking into account the subset of third order cumulants that for binary variables can be expressed by lower order ones, improves the prediction significantly. This finding demonstrates that the second order statistics suffices to capture the major features of the collective dynamics arising in random binary networks, even when the coupling is strong. Our approach consistently takes into account the network-generated fluctuations in the marginal statistics of the input to each unit in a similar spirit as the seminal work by van Vreeswijk and Sompolinsky [41]. This approach exposes peculiar features of networks of binary threshold units. For a fixed number of incoming connections to each unit, mean-field theory predicts their averaged activities to be identical. We here show how distributed mean activities arise solely from the correlations emerging in the network. Units with a hard activation threshold render covariances independent of synaptic amplitudes J: if all incoming connections as well as the threshold of a unit are changed proportionally, the mean activity and correlations are maintained. Hence, scaling the threshold appropriately, it is impossible to increase the influence of one unit on another by larger coupling amplitudes. This finding questions the customary division into strong (J ∝ N^{-1/2}) and weak coupling (J ∝ N^{-1}) in such systems. We here show in addition that a network with hard threshold units implements the strongest possible coupling between stochastic binary units.
The independence of covariances with respect to coupling strengths implies that the latter cannot be reconstructed from the knowledge of the activity alone. However, the amplitude and slope of covariance functions at zero time lag together uniquely determine a linearized effective coupling strength between units. A linear approximation of the dynamics of fluctuations around the stationary state leads to a modified Lyapunov equation. Decomposing the fluctuations into characteristic eigenmodes of the network shows that, to linear order, the kinetic binary network is equivalent to a system of coupled Ornstein-Uhlenbeck processes [42,43]. The linear description further allows for the decomposition of the response to external stimulations into eigenmodes of the system.

II. FIRST AND SECOND MOMENTS OF THE JOINT PROBABILITY DISTRIBUTION
We here consider the classical network model of stochastic binary units, and denote the activity of unit k as n_k(t), being either 0 or 1, where 1 indicates activity and 0 inactivity [see e.g. 31-35, 41, 44], following the notation of Buice et al. [35]. Since we may interpret a unit as representing an individual neuron, we use the terms "unit" and "neuron" interchangeably. The state of the network of N such units is described by a binary vector n = (n_1, . . . , n_N) ∈ {0, 1}^N. The model shows transitions at random points in time between the two states 0 and 1, controlled by transition probabilities. Using asynchronous update [45], in each infinitesimal interval [t, t + δt) each unit in the network has the probability δt/τ to be chosen for update [46], where τ is the time constant of the dynamics. An equivalent implementation draws the time points of update independently for all units. For a particular unit, the sequence of update points has exponentially distributed intervals with mean duration τ, i.e. the update times form a Poisson process with rate 1/τ. We employ the latter implementation in the globally time-driven [47] spiking simulator NEST [48], and use a discrete time resolution δt = 0.1 ms for the intervals. The stochastic update constitutes a source of noise in the system. Given the k-th neuron is selected for update, the probability to end in the up-state (n_k = 1) is determined by the activation function F_k(n), which depends on the activity n of all connected units. The probability to end in the down state (n_k = 0) is given by 1 − F_k(n).
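The asynchronous update scheme just described can be sketched in a few lines of Python (a hypothetical minimal implementation for illustration only; the function name, the event-driven loop, and the hard-threshold activation are our choices, not code from the study):

```python
import numpy as np

def simulate_binary_network(J, theta, T, tau=1.0, seed=0):
    """Sketch of a binary network with asynchronous (Poisson) updates:
    each unit is updated at Poisson times with rate 1/tau; at an update,
    unit k assumes the state F_k(n) = H(sum_j J_kj n_j - theta_k).
    The stochastic update times are the only source of noise here."""
    rng = np.random.default_rng(seed)
    N = J.shape[0]
    n = rng.integers(0, 2, N)              # random initial binary state
    t = 0.0
    while t < T:
        t += rng.exponential(tau / N)      # next update event in the network
        k = rng.integers(N)                # unit chosen uniformly for update
        h = J[k] @ n                       # local field h_k
        n[k] = 1 if h >= theta[k] else 0   # hard-threshold activation
    return n
```

Superimposing N independent Poisson processes of rate 1/τ gives network-wide events at rate N/τ with a uniformly chosen unit, which is why the single exponential waiting time above is equivalent to drawing update times per unit.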
The stochastic system at time t is completely determined by its probability distribution p(n, t). The time evolution of the two-point distribution p(n, t, q, s) (describing the probability that the system was in state q at time s and is in state n at time t) obeys, due to the Markov property, the same master equation (see (41) in Section X A) as p(n, t). Using the definitions of the first moment and the equal-time as well as the two-time-point second moments, one obtains a set of differential equations for the first and second moments [31,32,35,49], which read

τ (d/dt) m_k(t) = −m_k(t) + ⟨F_k(n(t))⟩,
τ (d/dt) c_kl(t) = −2 c_kl(t) + ⟨F_k(n(t)) n_l(t)⟩ − m_k(t) m_l(t) + ⟨F_l(n(t)) n_k(t)⟩ − m_l(t) m_k(t), k ≠ l,
τ ∂_t c_kl(t, s) = −c_kl(t, s) + ⟨F_k(n(t)) n_l(s)⟩ − m_k(t) m_l(s), t > s,     (2)

(for completeness, in the Appendix we included their derivation in (44)), where the expectation value ⟨⟩ defined in (1) can be interpreted as an average over realizations of the random dynamics. Note that the second line does not hold for k = l, but becomes the first line due to ⟨n_k²⟩ = ⟨n_k⟩ =: m_k. The third line becomes the second line for t → s. These equations are identical to eqs. (3.4)-(3.7) in [31], to eqs. (3.12) and (3.13) in [35], and to eqs. (18)-(22) in [32, Supplementary material]. These previous works have so far considered the second order statistics averaged over many pairs of neurons, i.e. ⟨n_α n_β⟩ = (1/(N_α N_β)) ∑_{i∈α, j∈β} ⟨n_i n_j⟩. These averages are closely related to the population-averaged activities n_α(t) = (1/N_α) ∑_{i∈α} n_i(t) and can therefore be treated by population-averaging mean-field methods. Here, we go beyond the population-averaged level and consider the second order statistics of individual pairs. Initially, we are interested in the stationary statistics of the mean activities of individual units and their zero-time-lag pairwise covariances c_kl = ⟨n_k(t) n_l(t)⟩ − ⟨n_k(t)⟩⟨n_l(t)⟩, which can be deduced from (2):

m_k = ⟨F_k(n)⟩,
c_kl = (1/2) [⟨F_k(n) n_l⟩ − m_k m_l + ⟨F_l(n) n_k⟩ − m_l m_k], k ≠ l.     (3)
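Given a set of sampled network states, the first and second moments entering these equations can be estimated directly; a small illustrative helper (the function name is ours, not from the original):

```python
import numpy as np

def empirical_moments(samples):
    """Estimate mean activities m_k = <n_k> and zero-time-lag covariances
    c_kl = <n_k n_l> - <n_k><n_l> from sampled network states
    (array of shape: number of samples x N).  For binary data the
    diagonal automatically fulfills c_kk = m_k (1 - m_k)."""
    m = samples.mean(axis=0)
    c = np.atleast_2d(np.cov(samples.T, bias=True))  # population covariance
    return m, c
```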

III. GAUSSIAN APPROXIMATION
Following [32,41], we now assume a particular form for the activation function and the couplings between neurons,

F_k(n) = f(h_k(n)), with the local field h_k(n) = ∑_i J_ki n_i,     (4)

such that the gain function depends on the state of the network only via the summed and weighted activity h_k, which is a scalar. Here J_ki denotes the matrix of synaptic couplings from neuron i to neuron k. For concreteness, when comparing the analytical predictions to numerical results, we choose f(x) = H(x − θ), where H(x) = 1 for x ≥ 0 and H(x) = 0 otherwise is the Heaviside function. Intermediate results also hold for arbitrary gain functions f. If the distribution of h_k were known, the expectation value ⟨F_k⟩ could directly be calculated.
We aim at a self-consistent equation for the first and second moments of the activity variables (2). Van Vreeswijk and Sompolinsky [41] and Renart et al. [32] invoke the central limit theorem to justify the assumption that the local field h_k typically follows a Gaussian distribution N(µ_k, σ_k²) with cumulants µ_k and σ_k². These cumulants are related to the cumulants of the activity variables n via

µ_k = ∑_i J_ki m_i, σ_k² = ∑_{i,j} J_ki J_kj c_ij,     (5)

with the covariance matrix c_ij = ⟨n_i n_j⟩ − ⟨n_i⟩⟨n_j⟩, which contains c_ii = ⟨n_i⟩(1 − ⟨n_i⟩) on the diagonal, and the vector of mean activities m_i = ⟨n_i⟩. In the seminal work [50], the influence of the cross-covariances on the variance of the input to a neuron was neglected. Instead, only the dominant contribution, given by the variances c_ii of the single neurons on the diagonal, was taken into account, leading to the expression σ_k² = ∑_i J_ki² c_ii. The additional dependence of σ_k² on the off-diagonal elements (5) is important for the distribution of mean activities, as we will show in the following. We note that due to the marginal binary statistics of n_i it follows that ⟨n_i²⟩ = ⟨n_i⟩, illustrating that the second moment is uniquely determined by the first moment. We will exploit this property in the subsequent sections by expressing a subset of higher order cumulants in terms of lower order ones. Using the central-limit-theorem argument, as outlined above, leads to

m_k = ⟨F_k⟩ = (1/2) erfc(−(µ_k − θ)/(√2 σ_k)).     (6)

To enable the extension to further corrections, we aim at a more systematic calculation of expectation values containing the gain function. Using the Fourier representation f̂(ω) of the gain function, we write

⟨F_k(n)⟩ = ∫ dω/(2π) f̂(ω) ⟨exp(iω ∑_j J_kj n_j)⟩_n = ∫ dω/(2π) f̂(ω) ⟨exp(iω h_k(n))⟩_n,     (7)

where we inserted the definition of h_k (4) in the second line. Defining the vector t with elements t_j = iω J_kj, the expectation value in the first line of the previous expression has the form ⟨exp(t ⋅ n)⟩_n, which is the definition of the characteristic function or moment generating function of the joint distribution of the binary variables p(n) [51, p. 32], which can also be expressed by the cumulant generating function Φ_n(t),

⟨exp(t ⋅ n)⟩_n = exp(Φ_n(t)).     (8)
Note that for compactness of notation we omitted the index j of t. The Taylor expansion of Φ_n(t) explicitly introduces a cumulant hierarchy, showing that cumulants of the activity variables n_j at all orders contribute to ⟨F_k(n)⟩. If we neglect cumulants higher than order two, thus effectively approximating the binary states as correlated Gaussian variables, we use the corresponding quadratic cumulant generating function

Φ_n(t) = ∑_j t_j m_j + (1/2) ∑_{i,j} t_i t_j c_ij     (9)

and obtain

⟨F_k(n)⟩ = ∫ dω/(2π) f̂(ω) exp(iω ∑_j J_kj m_j + (1/2)(iω)² ∑_{i,j} J_ki J_kj c_ij) = ∫ dω/(2π) f̂(ω) exp(µ_k iω + (1/2) σ_k² (iω)²).     (10)

Here we identified the sums in the penultimate line as the mean and variance of the input field (5). On the other hand, in the second line of (7) we identify the moment generating function φ_{h_k}(iω) ≡ ⟨exp(iω h_k(n))⟩_n of the input field h_k. From (10) we obtain its corresponding cumulant generating function as Φ_{h_k}(iω) = µ_k iω + (1/2) σ_k² (iω)². We therefore conclude that h_k ∼ N(µ_k, σ_k²) follows a Gaussian distribution. The mean activity then reads

m_k = ⟨f(h_k)⟩ = ∫ dh f(h) (2πσ_k²)^{−1/2} exp(−(h − µ_k)²/(2σ_k²)),     (11)

which for f(x) = H(x − θ) is identical to (6). The procedure hence reproduces the known result for the Heaviside gain function [32, 52, supplementary material, Sections 2.2, 2.3], but additionally yields a generalization for smooth gain functions f that takes into account the fluctuations of the synaptic input, in line with the treatment in [53, eq. (27)]. More importantly, this systematic calculation reveals the assumption underlying the Gaussian approximation for the input field, namely that cumulants higher than order two are ignored on the level of individual neuronal activities.
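For the Heaviside gain, the Gaussian-approximation mean activity is just the probability that a Gaussian field exceeds the threshold; a minimal sketch using scipy (the function name is ours):

```python
import numpy as np
from scipy.special import erfc

def mean_activity_gauss(mu, sigma, theta):
    """Mean activity in the Gaussian approximation for a Heaviside gain:
    m_k = <H(h_k - theta)> = 1/2 erfc(-(mu_k - theta)/(sqrt(2) sigma_k)),
    i.e. the probability that h_k ~ N(mu_k, sigma_k^2) exceeds theta."""
    return 0.5 * erfc(-(mu - theta) / (np.sqrt(2.0) * sigma))
```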
To obtain an expression for the zero-time-lag covariance, we start from (3). We hence need to evaluate expressions of the form ⟨F_k(n) n_l⟩, which do not factorize trivially, since the value of n_l may have an influence on neuron k through the gain function F_k(n). However, along the same lines as above, we can derive ⟨F_k(n) n_l⟩ using the identical approximation of ⟨exp(n ⋅ t)⟩, by first noting that ⟨exp(n ⋅ t) n_l⟩ = ∂/∂t_l ⟨exp(n ⋅ t)⟩. Expressing again the characteristic function by the cumulant generating function (9) yields

⟨F_k(n) n_l⟩ = ⟨F_k(n)⟩ m_l + S_k ∑_j J_kj c_jl,     (12)

where we generated the factor iω by a derivative ∂_{µ_k} in the last line. With (11) this yields

S_k = ∂⟨F_k(n)⟩/∂µ_k = (i.b.p.) ⟨f′(h_k)⟩_{h_k ∼ N(µ_k, σ_k²)},     (13)

where we defined the local susceptibility S_k that determines the influence of an input to neuron k onto its output at the time point of its update. For a differentiable gain function f the susceptibility is equal to the slope f′ averaged over the fluctuations of h_k [compare also 53, eq. (27)], which follows from integrating the second line of (13) by parts (i.b.p.). For the particular choice f(x) = H(x − θ), the susceptibility has the form

S_k = (1/(√(2π) σ_k)) exp(−(θ − µ_k)²/(2σ_k²)),     (14)

which, by (13), is the strongest possible coupling for a given input statistics. The self-consistent set of equations for the first and second cumulants thus reads

m_k = (1/2) erfc(−(µ_k − θ)/(√2 σ_k)),
c_kl = (1/2) ∑_j (S_k J_kj c_jl + S_l J_lj c_jk) for k ≠ l,
c_kk = m_k (1 − m_k).     (15)

To obtain the joint solution of equations (15), we use a damped fixed-point iteration with the n-th iteration value denoted as (m^{(n)}, c^{(n)}),

m^{(n+1)} = (1 − ρ) m^{(n)} + ρ m[m^{(n)}, c^{(n)}], c^{(n+1)} = (1 − ρ) c^{(n)} + ρ c[m^{(n)}, c^{(n)}],     (16)

where we used the damping factor ρ = 0.7 and determined µ_k and σ_k via (5) in each step. The iteration continues until the summed absolute change of m_k and c_kl is smaller than 10^{−14}. Figure 1 shows that the theoretical prediction yields mean activities and covariances in line with simulations of a random fixed-in-degree network of excitatory and inhibitory neurons.
While cross-covariances are explained with good accuracy, mean activities slightly, but systematically, deviate from simulated results: the scatter plot, showing the mean activities of individual neurons, has a slope below unity, indicating that the width of the distribution is underestimated by the theory. Moreover, we observe that the population-averaged covariances are slightly underestimated by the theory. In summary, however, the theory in the Gaussian approximation captures not only the mean and width of the covariance distribution to a large extent but also its general shape, as shown in Figure 2.
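The damped fixed-point scheme (16) for the self-consistent equations (15) can be sketched in numpy as follows (a simplified illustrative implementation; the function name, the looser tolerance, and the tiny two-unit test network are our choices):

```python
import numpy as np
from scipy.special import erfc

def solve_self_consistent(J, theta, rho=0.7, tol=1e-10, max_iter=10000):
    """Damped fixed-point iteration for mean activities m and zero-lag
    covariances c in the Gaussian approximation:
      m_k   = 1/2 erfc(-(mu_k - theta_k)/(sqrt(2) sigma_k)),
      c_kl  = 1/2 [(W c)_kl + (c W^T)_kl]  for k != l,  W = diag(S) J,
      c_kk  = m_k (1 - m_k),
    with mu_k, sigma_k^2 determined from (m, c) in each step."""
    N = J.shape[0]
    m = np.full(N, 0.5)
    c = np.diag(m * (1.0 - m))
    for _ in range(max_iter):
        mu = J @ m
        sigma2 = np.einsum('ki,ij,kj->k', J, c, J)   # includes cross-covariances
        sigma = np.sqrt(sigma2)
        m_new = 0.5 * erfc(-(mu - theta) / (np.sqrt(2.0) * sigma))
        S = np.exp(-(theta - mu) ** 2 / (2.0 * sigma2)) / (np.sqrt(2.0 * np.pi) * sigma)
        W = S[:, None] * J                            # effective coupling S_k J_kl
        c_new = 0.5 * (W @ c + c @ W.T)
        np.fill_diagonal(c_new, m_new * (1.0 - m_new))
        change = np.abs(m_new - m).sum() + np.abs(c_new - c).sum()
        m = (1.0 - rho) * m + rho * m_new             # damped update
        c = (1.0 - rho) * c + rho * c_new
        if change < tol:
            break
    return m, c
```

For a symmetric pair of mutually inhibiting units with thresholds tuned to m = 1/2, the iteration converges to a negative cross-covariance, as expected.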
[Figure 3: Distribution of the summed input. Circles show the distribution assuming uncorrelated activity of the input neurons with constant rate. Other parameters as in Figure 1.]

The systematic calculation presented here shows that the Gaussian assumption on the level of the individual statistics directly yields the linearized result of earlier work, in which expressions of the form ⟨F_k(n) n_l⟩ in (3) were obtained using relations between conditioned and unconditioned probability distributions of binary variables. After the Gaussian approximation, the resulting non-linear equation was linearized and second order terms were argued to be small in the large N limit. In contrast, the derivation here shows that the linearization is not an additional assumption, but appears naturally once the Gaussian approximation has been performed on the level of individual variables. Note also that no additional assumption regarding the strength of the coupling is necessary.

IV. THIRD ORDER CUMULANTS
The derivation in the previous section is systematic in the sense that all cumulants of order three and higher of the stochastic variables n are consistently neglected. The original idea of a mean-field description, however, seeks an approximation of the local field h_k sensed by an individual unit k rather than a truncation of the cumulants of the individual variables n themselves. Since the local field is by (4) a superposition of a large number of weakly correlated contributions, the distribution of h_k will be close to Gaussian by the central limit theorem. In the path-integral formulation, commonly employed to study disordered systems, the Gaussian approximation of h_k is equivalent to a saddle-point approximation to lowest order in the auxiliary field [see e.g. 27, eq. (3.5)] [28, eq. (3)]. Even though one may expect the central limit theorem to be applicable to the summed input h_k, it is easy to see that for networks of several hundreds of units higher order cumulants still have an effect. This is illustrated in Figure 3, showing the distribution of the input to a neuron receiving a sum of binary, uncorrelated signals. The Gaussian approximation with the same mean and variance as the exact distribution has a peak that is slightly shifted to the left. Taking into account the third order cumulant cures the displacement of the peak. The expansion in cumulants in Section III shows how the higher order cumulants can be taken into account in the calculation of the expectation values of the gain function.
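The effect illustrated in Figure 3 is easy to reproduce: for a sum of independent binary inputs the exact distribution is binomial, and the third cumulant enters an Edgeworth-type correction to the Gaussian (the numbers K and m below are hypothetical, chosen only for illustration):

```python
import numpy as np
from scipy.stats import binom, norm

# Sum of K i.i.d. binary inputs with mean m: exactly binomial.
K, m = 200, 0.2                            # hypothetical illustration values
k = np.arange(K + 1)
exact = binom.pmf(k, K, m)                 # exact input distribution
mu, var = K * m, K * m * (1 - m)           # first two cumulants of the sum
gauss = norm.pdf(k, mu, np.sqrt(var))      # matched Gaussian
kappa3 = K * m * (1 - m) * (1 - 2 * m)     # third cumulant of the sum
# Edgeworth-type correction: Gaussian times 1 + kappa3/(6 var^{3/2}) He3(x)
x = (k - mu) / np.sqrt(var)
edgeworth = gauss * (1 + kappa3 / (6 * var ** 1.5) * (x ** 3 - 3 * x))
```

The corrected density reproduces the skewness of the exact distribution, which the symmetric Gaussian misses entirely.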
In the following we expand the distribution of h_k up to third order cumulants. To this end we use the relationship that the n-th cumulant of a sum of variables is the sum of the n-th order joint cumulants of its constituents (49). We then use the property of binary variables that n_k^K = n_k for K ≥ 1, which relates cumulants of order K + L to cumulants of order L + 1 in cases where neuron k appears K times in the cumulant (see Section X D for the detailed calculation). The first two cumulants of the summed input h_k = ∑_l J_kl n_l are therefore given as before by κ_{1,k} = µ_k and κ_{2,k} = σ_k² (5). The third cumulant κ_{3,k} (by defining x_l = J_kl n_l and y = h_k and using (49)) reads

κ_{3,k} = ⟪y y y⟫ = ∑_{i,j,l} ⟪x_i x_j x_l⟫ = ∑_{i,j,l} J_ki J_kj J_kl ⟪n_i n_j n_l⟫,     (17)

where we use the notation ⟪⟫ to denote cumulants. For the mean activity (3) we need to evaluate the expectation value of a non-linear function applied to h_k. We follow an approach analogous to the previous section (see Section X C for details) and apply (51) to F_k(n) = H(∑_j J_kj n_j − θ), yielding a perturbative treatment of the effect of the third order cumulant,

⟨F_k(n)⟩ = (1 + (κ_{3,k}/3!) ∂³/∂µ_k³) (1/2) erfc(−(µ_k − θ)/(√2 σ_k)).     (18)

To obtain a correction of the covariances, we determine the contribution of the third cumulant to the term of the form ⟨F_k(n) n_l⟩ = ⟨H(h_k(n) − θ) n_l⟩. We apply the general result (52) with x_l = J_kl n_l, where we need to cancel a factor J_kl in the final result due to n_l = x_l / J_kl, to get

⟨F_k(n) n_l⟩ = m_l (1 + ∑_{q=1}^{3} ∆κ_{q,kl} ∂/∂κ_{q,k}) ⟨F_k(n)⟩,     (19)

with the displacements ∆κ_{q,kl} of the cumulants of h_k given in (20), where ⟨F_k(n)⟩ is given by (18). The form of the terms ∂/∂κ_{q,k} shows that, for q ≥ 1, they correspond to an infinitesimal displacement of the q-th cumulant κ_{q,k} by ∆κ_{q,kl}. These corrections come about by the presence of the variable n_l in the expectation value, which can alternatively be understood as a conditioned expectation value ⟨F_k(n) n_l⟩ = m_l ⟨F_k(n) ∣ n_l = 1⟩, where the condition n_l = 1 changes the cumulants of h_k. The first two terms in the sum (19) are identical to the Gaussian approximation in (12).
The difference in the approximation schemes on the level of n and h_k, respectively, is apparent in the term for q = 2, which contains a correction to the second order cumulant due to the presence of n_l. This term is neglected in the Gaussian truncation of cumulants of n in Section III. The consistent approximation of h_k up to third order cumulants analogously requires the term for q = 3. This correction in turn depends on the fourth order cumulant ⟪n_{i_1} ⋯ n_{i_q} n_l⟫.
We now use the properties of binary variables that allow us to express a subset of higher cumulants by lower order ones due to the property ⟨n_i^K⟩ = ⟨n_i⟩ (for all integers K ≥ 1). The k-th raw moment is given by the sum over all combinations of products of lower order cumulants [54]. For the third moment this yields

⟨n_l n_i n_j⟩ = ⟪n_l n_i n_j⟫ + ⟪n_l n_i⟫⟪n_j⟫ + ⟪n_l n_j⟫⟪n_i⟫ + ⟪n_i n_j⟫⟪n_l⟫ + ⟪n_l⟫⟪n_i⟫⟪n_j⟫.

If there are at least two identical indices, we can express the corresponding third order cumulant by the two lower orders. Both cases l = i ≠ j and l = i = j lead to the same expression (53). In the latter case we get the third cumulant of a binary variable, ⟪n_l n_l n_l⟫ = m_l − 3m_l² + 2m_l³, which is uniquely determined by its mean. We can therefore take into account the contribution of all third order cumulants (with at least two identical indices) to the statistics of h_k. Stated differently, we calculate the third order cumulant of h_k using (17) and neglect all cumulants of the binary variables where all indices are different (⟪n_i n_j n_l⟫ ≃ 0 for i ≠ j ≠ l). A straightforward calculation (see Section X D, eq. (54), for details) yields

κ_{3,k} ≃ 3 ∑_{l,j} (J ⊛ J)_{kl} J_kj (1 − 2m_l) c_lj − 2 ∑_l (J ⊛ J ⊛ J)_{kl} (1 − 2m_l) c_ll,

where we use the symbol ⊛ for the element-wise (Hadamard) product of two matrices [55], i.e. (J ⊛ J)_{kl} = J_kl². For the third cumulant appearing explicitly in (19) we perform a similar reduction that yields the matrix given in (55). The matrix ∆κ_3 takes an analogous form (see Section X D for details), where the fourth order cumulants are given by (56)-(58).
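The binary-variable cumulant identities used here are quickly verified numerically (a small sketch; the function name is ours):

```python
import numpy as np

def bernoulli_cumulants(m):
    """First three cumulants of a single binary (Bernoulli) variable with
    mean m, obtained from <n^K> = <n> for all K >= 1:
    kappa_1 = m, kappa_2 = m(1-m), kappa_3 = m - 3 m^2 + 2 m^3."""
    return m, m * (1.0 - m), m - 3.0 * m ** 2 + 2.0 * m ** 3
```

Note that κ_3 = m(1 − m)(1 − 2m) vanishes at m = 1/2, where the binary distribution is symmetric.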
To evaluate the expression (18) for the mean activity and (19) for the covariance, we need the n-th derivative of the complementary error function. We use that (d/dx)(1/2) erfc(−x) = (1/√π) e^{−x²} is a Gaussian and further that the n-th derivative of a Gaussian, (d/dx)^n e^{−x²} = (−1)^n H_n(x) e^{−x²}, can be expressed in terms of the n-th Hermite polynomial H_n [56, 18.3, "physicists' notation"]. Each differentiation with respect to µ_k yields an additional factor (√2 σ_k)^{−1}. Hence we get for n ≥ 1

(∂/∂µ_k)^n (1/2) erfc(−x_k) = (−1)^{n−1} (√2 σ_k)^{−n} (1/√π) H_{n−1}(x_k) e^{−x_k²}, x_k = (µ_k − θ)/(√2 σ_k).     (24)

Using this definition and the expansion e^{(κ_{3,k}/3!) ∂³/∂µ_k³} ≃ 1 + (κ_{3,k}/3!) ∂³/∂µ_k³, valid for κ_{3,k} small compared to σ_k³, we get the mean activity

m_k = (1/2) erfc(−x_k) + (κ_{3,k}/3!) (∂/∂µ_k)³ (1/2) erfc(−x_k).     (25)

For the term appearing in the covariance we get

⟨F_k(n) n_l⟩ = m_l m_k + m_l ∑_{q=1}^{3} ∆κ_{q,kl} (∂/∂κ_{q,k}) ⟨F_k(n)⟩.     (26)

The latter expression shows in particular that the correction to the mean activity reappears (as the factor m_k) in the first term, indicating that the contribution of the third cumulant to lowest order (q = 0 in (19)) drops out of the covariance, as the term m_l m_k is subtracted in (3). In a weakly correlated state of the network, the remaining terms are smaller than m_l m_k as they give rise to the pairwise covariance (3). This explains why the Gaussian approximation is already fairly accurate. Simultaneously solving (25) and (26) by a damped fixed-point iteration, analogous to (16), yields an approximation of the mean activities and covariances shown in Figure 4.
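The Hermite-polynomial form (24) of the derivatives of the error function can be checked against finite differences (a sketch; numpy's `hermite` module implements the physicists' convention used here, and the function names are ours):

```python
import numpy as np
from scipy.special import erfc
from numpy.polynomial.hermite import hermval

def mean_activity(mu, sigma, theta):
    """Gaussian-approximation mean activity, 1/2 erfc(-(mu-theta)/(sqrt(2) sigma))."""
    return 0.5 * erfc(-(mu - theta) / (np.sqrt(2.0) * sigma))

def dn_mean_activity(n, mu, sigma, theta):
    """n-th derivative of mean_activity with respect to mu (n >= 1):
    each derivative contributes a factor (sqrt(2) sigma)^{-1}, and the
    (n-1)-th derivative of the Gaussian brings in H_{n-1}."""
    x = (mu - theta) / (np.sqrt(2.0) * sigma)
    coeffs = np.zeros(n)
    coeffs[n - 1] = 1.0                   # select H_{n-1}
    Hnm1 = hermval(x, coeffs)             # physicists' Hermite polynomial
    return ((np.sqrt(2.0) * sigma) ** (-n) * (-1.0) ** (n - 1)
            * Hnm1 * np.exp(-x ** 2) / np.sqrt(np.pi))
```

For n = 1 this reduces to the susceptibility (14), the Gaussian evaluated at the threshold distance.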
The deviation of the mean activities (Figure 4a) from simulation results is reduced compared to the Gaussian approximation below significance level. The width of the distributions is only slightly underestimated compared to simulations, as exhibited by the scatter plot aligned to the diagonal and the bar graph. The pairwise averaged cross-covariances (Figure 4b) are within the error of the simulated results, in contrast to the Gaussian approximation (cf. Figure 1b). A small contribution to the remaining difference in variance stems from the finite simulation time, naturally leading to a wider distribution in Figure 4b compared to the theoretical prediction. The larger contribution presumably comes from the non-trivial third order cumulants ⟪n_i n_j n_k⟫, where all indices are unequal. Neglecting the non-trivial covariances shows that the variability of σ_k from neuron to neuron is reduced, which in turn reduces the width of the distribution of the mean activities. For networks with fixed in-degree this even leads to a uniform mean activity across neurons, which, however, still matches the averaged mean activities from simulations, indicating that averaged quantities are insensitive to variability in non-trivial higher-order cumulants (see Figure 5 in Section VI). Analogously we expect that the neglect of non-trivial third order cumulants underlies the reduced width of the covariances.

V. SCALE INVARIANCE OF COVARIANCES
The equation for cross-covariances in (15) yields an additional insight: assuming that mean activities and correlations were unchanged, a scaling of all incoming synapses to a neuron k by some factor α > 0 amounts to a scaling of the strength of synaptic fluctuations in the same manner (σ_k ∝ α). The fixation of mean activities can be achieved for different choices of incoming synaptic amplitudes by adapting the threshold such that (µ_k − θ)/σ_k remains constant (cf. (15)). A uniform rescaling of all synapses by a factor α > 0 therefore requires a change of the threshold by the same factor. The susceptibility S_k (14) then scales as σ_k^{−1} ∝ α^{−1}. Hence the term S_k J_kl appearing in the equation for covariances in (15) is invariant under this scaling. This implies that covariances are invariant with respect to the absolute value of synaptic amplitudes: the self-generated network noise causes a divisive normalization on the level of the synaptic input to each neuron. This also implies that scaling the synapses with J ∝ N^{−1}, J ∝ N^{−1/2}, or J ∝ 1 as the network size tends to infinity all yield the same covariances, given the thresholds are adapted so that the mean activity is preserved. On the level of population-averaged covariances this has been remarked earlier [53]. The independence of the network dynamics from the absolute value of the synaptic weights even holds exactly, as noted earlier [57]. The reason is the absence of a length scale of a hard threshold; the only relevant length scale is the amplitude σ_k of the synaptic noise itself. Considering a single neuron k, the condition for the neuron to be activated is h_k = ∑_l J_kl n_l > θ_k. Rescaling all incoming synapses as well as the threshold by the same factor α > 0 multiplies both sides of this inequality; the neuron switches at the same configuration n of incoming spins as in the original case. This consideration is completely in line with the work of [50].
The latter work found that, when increasing the number of neurons N, a scaling of J ∝ 1/√N is required to obtain a robust asynchronous state, if the system possesses variability of the thresholds on the scale defined as unity. The width of the distribution of the thresholds induces a second length scale into the system. Invariant behavior can therefore only arise if the synaptic noise σ_k and the standard deviation of the thresholds have a constant ratio. Scaling synapses as J ∝ 1/N in this setting leads to vanishing temporal fluctuations of h_k and hence spin glass freezing, whereas for J ∝ 1 the asynchronous state persists. Fixing J, while increasing the number of neurons N, results in more inputs per neuron and thus increased temporal fluctuations, which wash out the effect of distributed thresholds.
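The scale invariance argued above, namely that the effective coupling W = SJ is unchanged under J → αJ, θ → αθ at fixed activity statistics, is easy to verify numerically (a sketch with hypothetical parameter values; the function name is ours):

```python
import numpy as np

def effective_coupling(J, theta, m, c):
    """Effective (linearized) coupling W_kl = S_k J_kl for given mean
    activities m and covariances c.  Under J -> alpha J, theta -> alpha theta,
    mu and sigma scale by alpha, S by 1/alpha, so W is invariant."""
    mu = J @ m
    sigma2 = np.einsum('ki,ij,kj->k', J, c, J)
    S = np.exp(-(theta - mu) ** 2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)
    return S[:, None] * J
```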

VI. MAPPING OF FLUCTUATIONS TO ORNSTEIN-UHLENBECK PROCESSES
We here show an alternative interpretation of the Gaussian approximation in Section III. Despite the fact that the covariances obey different equations on the diagonal k = l and the off-diagonal k ≠ l (15), we can rewrite the equations as a single matrix equation (with W = SJ and S = diag(S_1, . . . , S_N)),

(1 − W) C + C (1 − W)ᵀ = D,     (27)

where D is a diagonal matrix with elements constrained by the condition that the diagonal entries of the covariance matrix fulfill (3),

c_kk = m_k (1 − m_k).     (28)

We ensure this latter condition in the following way. We will first solve (27) for a given D by diagonalizing W − 1 in terms of its left eigenvectors v_α, defined by v_αᵀ (W − 1) = λ_α v_αᵀ with eigenvalues λ_α. The corresponding right eigenvectors are u_α, which fulfill the bi-orthogonality v_αᵀ u_β = δ_αβ and completeness ∑_α u_α v_αᵀ = 1 relations. With the notation C_αβ := v_αᵀ C v_β (analogously for D) we have

C_αβ = D_αβ / (−λ_α − λ_β), and hence C = ∑_{α,β} u_α [D_αβ / (−λ_α − λ_β)] u_βᵀ,     (29)

where the latter expression follows from the completeness relation.
The expression reveals the connection between the eigenvalues λ_α of the linearized coupling matrix and the fluctuations in the system. In the case of a single eigenvalue close to instability, i.e. ℜ(λ_α) ≃ 0, the fluctuations in the corresponding eigendirection constitute the dominant contribution to the covariance matrix, C ≃ u_α [D_αα / (−2λ_α)] u_αᵀ. This scenario could experimentally be detected in a principal component analysis, with the dominant principal component pointing in the direction of u_α.
Evaluating expression (29) on the diagonal and using (28) yields

c_kk = ∑_{α,β} [(u_α ⊛ u_β)_k (v_α ⊛ v_β)ᵀ / (−λ_α − λ_β)] d =: (B d)_k,     (30)

where d = (D_1, . . . , D_N)ᵀ denotes the vector of diagonal elements of D and the symbol ⊛ is to be understood as the element-wise multiplication (Hadamard product) of the two vectors in the numerator. The penultimate expression is an ordinary matrix equation relating the (N-dimensional) vector d to the (N-dimensional) vector of variances c_kk. To determine D via d = B^{−1} (c_11, . . . , c_NN)ᵀ we hence need to invert the matrix B. The covariance matrix is then obtained by (29). The result (27) can be further interpreted as a mapping of the fluctuations from the binary dynamics to an effective system of Ornstein-Uhlenbeck processes. Given the set of coupled Ornstein-Uhlenbeck processes

τ dx/dt = −x + W x + ξ(t),     (31)

with the Gaussian white noise ⟨ξ(t) ξᵀ(s)⟩ = τ D δ(t − s), the stationary equal-time covariance matrix fulfills the same continuous Lyapunov equation [43, chapter 6.5] as the binary network (27). By the analogy of the continuous Lyapunov equations we see that the elements of the diagonal matrix D can be interpreted as the noise intensity injected into each neuron. The modified Lyapunov method (30) determines this intensity such that the variance of each continuous variable x_k agrees with that of the corresponding binary variable n_k given by (28), which, in turn, is fixed by the mean activity (15). It is important to note that W and D in (27) and (31) themselves depend on the covariances C via the susceptibility S. This leads to (29) remaining an implicit equation for C that needs to be solved iteratively. However, by neglecting the contribution of cross-covariances in (5), W and D become independent of C, rendering (27) a linear equation for the covariances and the above projection method an efficient algorithm to compute C. This linear approximation consequently fails to predict the distribution of mean activities for networks with fixed in-degree, as shown in Figure 5a.
Nevertheless, the width of the distribution of the covariances shown in Figure 5b is only slightly underestimated, showing that the distribution of mean activities contributes only marginally to the distribution of covariances.
The viability of the linear approximation for the covariances shows that fluctuations in the binary model are practically equivalent to those in linear Ornstein-Uhlenbeck processes. This equivalence has been reported earlier for effective equations of the population-averaged pairwise correlations [58]. The latter work used the approximate result c_kk ≃ D_k/2, which ignores the effect of the covariances on the auto-covariances in (27), i.e. it assumes that (WC)_kk is small compared to c_kk itself and hence neglects the right-hand side of (27) when determining the diagonal elements c_kk. In the absence of self-connections this amounts to neglecting the off-diagonal cross-covariances in C. For weakly correlated network states this is a good approximation. However, in the present network setting it slightly overestimates the population averages of the covariances as well as the width of their distribution (data not shown).
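The modified Lyapunov method described above can be sketched numerically. The following minimal Python example is an illustration, not the paper's actual algorithm: the effective connectivity W and the mean activities m are drawn arbitrarily (hypothetical values), and the diagonal noise matrix D is rescaled by fixed-point iteration until the solution of the continuous Lyapunov equation reproduces the binary variances c_kk = m_k(1 − m_k).

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(0)
N = 50
# hypothetical effective connectivity with spectral radius well below 1
W = rng.normal(0.0, 0.5 / np.sqrt(N), (N, N))
np.fill_diagonal(W, 0.0)
m = rng.uniform(0.1, 0.9, N)      # hypothetical mean activities
a = m * (1.0 - m)                 # binary variances c_kk fixed by the mean

A = W - np.eye(N)                 # drift matrix of the equivalent OU system
D = 2.0 * a                       # uncoupled guess: for W = 0 one has C = D / 2
for _ in range(200):
    # solve A C + C A^T = -diag(D) for the equal-time covariance C
    C = solve_continuous_lyapunov(A, -np.diag(D))
    D = D * a / np.diag(C)        # rescale noise to match the target variances

assert np.allclose(np.diag(C), a, atol=1e-8)
```

In this linear approximation W stays fixed, so the iteration only adjusts D; in the full theory W and D would additionally be updated through the susceptibility S at each step.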

VII. RESPONSE OF THE NETWORK TO EXTERNAL STIMULI
To study how the recurrent network processes an externally applied signal, we take into account an additional external input to each neuron k, denoted h_k,ext(t), such that h_k(t) = Σ_j J_kj n_j(t) + h_k,ext(t). We start from the differential equation (2) for the mean activities. As in the case without external input, the expectation value ⟨F_k(n(t), h_k,ext(t))⟩ can be treated in the Gaussian approximation. Subtracting the stationary activity state, δn_k(t) = n_k(t) − ⟨n_k⟩, we obtain after linearization an equation in which w_kj = S_k J_kj is the effective connectivity as defined in Section VI and S_k is the susceptibility (14). Hence the equation of motion for the perturbation in matrix notation reads (32). In order to excite the system into the direction of one eigenmode of the effective connectivity W, we choose a stimulus vector in which S := diag({S_k}) is the diagonal matrix containing the susceptibilities and u_α is the right-sided eigenvector of W − 1 as defined in Section VI. The strength of the stimulus is regulated by the parameter a and its temporal profile is determined by f(t). The parameter a allows us to control the amplitude of the stimulation in comparison to the strength of the synaptic noise received by each unit. For the linear approximation to hold, this amplitude must be chosen such that the input to each unit stays in the linear part of the expectation value of the gain function, which can hence be approximated by the slope (14). Stimulating into the direction of the real part of the eigenvector ensures that a complex mode is excited together with its complex conjugate, which is necessary since the activity in the network is real-valued.
Here, ũ_α is constructed such that it is normalized and compensates for the multiplication of the external input with the diagonal susceptibility matrix. We measure the deflection of the activity into the direction of the eigenmode by defining ⟨δn_α(t)⟩ := ṽ_α^T ⟨δn(t)⟩ as the projection of the activity vector onto the α-th eigenmode of the connectivity matrix W (y_α is defined analogously), where ṽ_α is the correspondingly normalized version of v_α, the left-sided eigenvector of W − 1 as defined in Section VI. The time evolution of the perturbation ⟨δn_α(t)⟩ is obtained by solving (32) projected onto v_α and v_α^*, and subsequently adding the results. Hence, for any stimulus f(t) (inserted into (32) as y(t) = S h^α_ext(t)) the perturbation obeys a convolution equation. We here consider an incoming DC input starting at time t_0 = 0 and stopping at t = T, i.e. we choose f(t) = H(t) − H(t − T), and evaluate the resulting time course of the perturbation. Figure 6 shows the response of the network activity projected onto one eigenmode to the DC stimulus in the direction of the same eigenmode. Choosing three stimulus directions associated with three representative eigenvalues from the full eigenvalue cloud of the network (Figure 6a), we demonstrate that the time course of fast (large negative eigenvalue), slowly (eigenvalue close to zero), and oscillatory (complex eigenvalue) decaying modes is captured by the linear approximation. This shows how the response of the network depends on the spatial structure of the stimulation. To lowest order, the transformation performed by the network is hence a spatio-temporal filtering of the input, where the responses with slowest decay correspond to the excitation of eigenmodes closest to instability.
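The projection argument can be made concrete with a small numerical check. The sketch below (random effective connectivity as a hypothetical stand-in for the network's W, τ = 1) evaluates the response of the linearized dynamics τ dδn/dt = (W − 1)δn + y to a step input along a right eigenvector u_α and confirms that the projection onto the corresponding left eigenvector follows the scalar single-mode solution (e^{λ_α t/τ} − 1)/λ_α.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
N = 40
tau = 1.0
W = rng.normal(0.0, 0.4 / np.sqrt(N), (N, N))   # hypothetical effective connectivity
A = W - np.eye(N)                               # drift matrix of the linearized dynamics

lam, U = np.linalg.eig(A)                       # right eigenvectors as columns of U
V = np.linalg.inv(U)                            # rows of V are the left eigenvectors
alpha = np.argmax(lam.real)                     # mode closest to instability
u = U[:, alpha]                                 # stimulus direction

# response to a step input along u: dn(t) = A^{-1} (expm(A t / tau) - 1) u
t = 3.0
dn = np.linalg.solve(A, (expm(A * t / tau) - np.eye(N)) @ u)

# projecting onto the left eigenvector reduces to the scalar mode solution
proj = V[alpha] @ dn
expected = (np.exp(lam[alpha] * t / tau) - 1.0) / lam[alpha]
assert np.allclose(proj, expected)
```

Because u_α is an exact eigenvector of A, the matrix solution collapses onto the scalar formula without approximation; for a complex λ_α the real response is recovered by adding the complex-conjugate mode, as described above.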

VIII. RECONSTRUCTION OF CONNECTIVITY
In the current section we investigate the inverse problem, i.e. to what extent the connectivity J_kl of the network can be inferred from the observed activity. The time-lagged cross-covariance function in a network of Ornstein-Uhlenbeck processes fulfills, for t > s, the differential equation (36) [43, chapter 6.5]. This means we can uniquely reconstruct the connectivity w_kl from knowledge of two matrices, the covariance and its slope at time lag zero. For an Ornstein-Uhlenbeck process obeying (27) and (36), the reconstruction of the connectivity w_kl is exact, as shown in Figure 8b.
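For the Ornstein-Uhlenbeck case this reconstruction can be verified directly. The following sketch (synthetic connectivity and unit noise, all values hypothetical) computes the stationary covariance from the Lyapunov equation, estimates the slope of the lagged covariance at zero lag from C(t) = e^{(W−1)t/τ} C(0), and recovers W = 1 + τ C′(0⁺) C(0)⁻¹.

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

rng = np.random.default_rng(2)
N = 30
tau = 1.0
W_true = rng.normal(0.0, 0.4 / np.sqrt(N), (N, N))   # hypothetical ground truth
A = W_true - np.eye(N)

# stationary covariance of the OU process with unit white noise
C0 = solve_continuous_lyapunov(A, -np.eye(N))

# lagged covariance C(t) = expm(A t / tau) C(0); slope estimated at zero lag
dt = 1e-5
slope = (expm(A * dt / tau) @ C0 - C0) / dt

# reconstruction from covariance and slope: W = 1 + tau * C'(0+) C(0)^{-1}
W_rec = np.eye(N) + tau * slope @ np.linalg.inv(C0)
assert np.allclose(W_rec, W_true, atol=1e-3)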
We will now demonstrate that a similar relationship approximately also holds in binary networks. For t > s, the time-lagged cross-covariance function c_kl(t, s) = ⟨n_k(t) n_l(s)⟩ − ⟨n_k(t)⟩⟨n_l(s)⟩ for the binary network fulfills

q_kl(t, s) := τ ∂/∂t c_kl(t, s) + c_kl(t, s) = ⟨F_k(n(t)) n_l(s)⟩ − ⟨n_k(t)⟩⟨n_l(s)⟩, t > s. (38)

An example of an auto- and a cross-covariance function together with the slope at zero time lag is shown in Figure 7.
In the limit of vanishing time lag, the form of q_kl = lim_{t→s} q_kl(t, s) = ⟨F_k(n(t)) n_l(t)⟩ − ⟨n_k(t)⟩⟨n_l(t)⟩ shows that q_kl measures the direct influence of fluctuations of neuron l on the gain function of neuron k. In the Gaussian approximation (see Section III) we get for k ≠ l the expression (39). For k = l, however, we have to evaluate ⟨H(h_k(n(t)) − θ) n_k(t)⟩ on the right-hand side. Since the term n_k appears in the expectation value, the statistics of h_k is effectively conditioned on the state n_k = 1. Since the transitions of neuron k directly depend on the value of h_k, this conditioning violates the close-to-Gaussian approximation of h_k given n_k = 1. An approximate treatment on the diagonal is possible under the assumption that the auto-covariance function of h_k is dominated by contributions of the auto-covariances of the binary variables n, yielding a non-linear differential equation for the temporal shape of the auto-covariance function [50, eq. (5.17)]. Approximating the temporal profile of the auto-covariance function as a(s) := ⟨H(h_k(t + s) − θ) H(h_k(t) − θ)⟩ ≃ m_k² + m_k(1 − m_k) e^(−s/τ), which has the correct asymptotic behavior (a(0) = m_k and a(s → ∞) = m_k²) and assumes fluctuations to be correlated on the time scale τ of the update, leads to the approximate value [by using 50, eq. (A.10)]. The corresponding approximation of the slope following from (38) is shown in Figure 8a. However, the resulting expression for the slope at zero time lag cannot be written in the form (39). Therefore, we assume that (39) also holds on the diagonal, although the resulting predicted slopes have a slight negative bias compared to simulation results.
Given the two covariance matrices entering (38), as well as the average activity m_i of each neuron, we can derive a procedure for reconstructing the relative connectivity J_kl/σ_k. The mean and variance of the input to a neuron are determined by (5). Inverting the relation (6), we obtain an expression for the ratio J_kl/σ_k of the incoming synaptic weight and the total synaptic noise of neuron k. This again shows that correlations are controlled only by the ratio J_kl/σ_k rather than by J_kl and σ_k individually, in line with the invariance found in Section V. Figure 8b shows the reconstruction of connectivity from the simulated covariance functions. Due to the previously discussed approximations, the exact reconstruction of the relative weights J_kl/σ_k is not possible for a binary network. The observed deviations from the identity line are predominantly caused by the approximation of the slope of the auto-covariance functions. The reconstructed connectivity correctly infers all excitatory and all inhibitory connections, but additionally yields a considerable number of false positive excitatory and inhibitory connections. The described procedure moreover allows us to determine the remaining parameters of the binary network. With regard to correlations, we are free to choose an arbitrary σ_k, e.g. σ_k = 1, to determine J_kl by (40). As the mean activity and correlations are known, the actual magnitude σ_k^loc of the fluctuations caused by the local inputs from the network follows from (5). If these fluctuations are smaller than or equal to our arbitrary choice (σ_k^loc ≤ σ_k = 1), it is possible to supply each neuron with additional noise of variance 1 − (σ_k^loc)². Finally, the threshold θ must be chosen such that the mean activities are reproduced. Related methods for systems with sequential Glauber update in discrete time steps have previously been proposed. Mézard and Sakellariou [39] present a method that relates the equal-time and one-step-delayed covariance matrices to the connectivity (see their eqs.
(8) and (18)). A similar approach, based on TAP mean-field theory, is described in [38, see their eq. (14)]. In contrast, in the time-continuous case considered here, the sum of the covariance function and its slope appears on the left-hand side of (38). Qualitatively, the results for discrete and for continuous dynamics agree, as the latter sum represents the first-order Taylor approximation of the covariance function at a time lag τ, which is equal to the average update interval.

IX. DISCUSSION
We here present a theoretical description of fluctuations in strongly coupled networks of large numbers of binary units that, for the first time, faithfully captures the statistics of individual units as well as pairs of units. The fluctuations are characterized by self-consistent equations for mean activity and pairwise correlations, including finite-size effects down to hundreds of units. The method can be applied to a wide range of networks, in particular networks with asymmetric and strong couplings.
Standard approaches describing symmetric systems in terms of a partition function cannot be extended to nonsymmetric coupling, because these systems do not reach thermodynamic equilibrium. Here, we set out to capture the statistics of the system in terms of the cumulant hierarchy for the activity variables. Truncating the hierarchy after second order yields a closed set of equations that already provides a good approximation for the zero time lag covariances between individual units. We show the equivalence of this truncation to the Gaussian approximation of the input field [32,50], which follows from the central limit theorem. Non-Gaussian, finite-size effects in networks of only several hundreds of units are effectively taken into account by incorporating a subset of third order cumulants that can be expressed in terms of lower order cumulants due to the binary nature of the activity variables. This significantly improves the prediction of mean activities, which are strongly affected by the non-linearity of the gain function. The resulting set of non-linear coupled equations can efficiently be solved numerically by damped fixed-point iteration.
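The damped fixed-point iteration mentioned above is a generic scheme; the sketch below illustrates it on a toy self-consistency equation m = tanh(g m + h) (hypothetical parameters, standing in for the paper's actual coupled equations for means and covariances).

```python
import numpy as np

def damped_fixed_point(f, x0, damping=0.3, tol=1e-10, max_iter=10_000):
    """Solve x = f(x) by the damped iteration x <- (1 - d) x + d f(x)."""
    x = x0
    for _ in range(max_iter):
        x_new = (1.0 - damping) * x + damping * f(x)
        if np.max(np.abs(x_new - x)) < tol:
            return x_new
        x = x_new
    raise RuntimeError("no convergence")

# toy self-consistency problem, solved element-wise
g, h = 0.8, 0.1
m = damped_fixed_point(lambda m: np.tanh(g * m + h), np.zeros(3))
assert np.allclose(m, np.tanh(g * m + h))
```

The damping factor trades convergence speed against stability: it suppresses the oscillatory divergence that a plain iteration x ← f(x) can exhibit when the fixed-point map is strongly contracting in some directions and expanding in others.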
We demonstrate that a Heaviside activation function, independent of the choice of coupling amplitudes, constitutes the strongest possible interaction for binary networks. This finding generalizes the invariance of population-averaged pairwise correlations under proportional scaling of couplings and threshold of all units [53,57] to the invariance of individual pairwise correlations under the scaling of the corresponding units' parameters. Weaker coupling in binary networks can only be achieved by a smoother activation function, but not by different scalings of coupling amplitudes, such as 1/N versus 1/√N, contrary to frequently employed arguments in the literature [as discussed in 57]. While the scaling invariance is generic, the presented Gaussian and close-to-Gaussian approximations of the input field hold for the commonly considered narrowly distributed couplings. For wide distributions, the statistics of the input field is likely to depart from the close-to-Gaussian assumption.
The contribution of cross-covariances to the marginal statistics of the input to each neuron causes a distribution of the mean activities, even in networks composed of neurons each receiving an identical number of inputs. The classical treatment of neurons with a Heaviside nonlinearity neglects this effect [50]. Still, distributed numbers of synaptic inputs typically dominate the distribution of mean activities in sparsely [50] and densely connected random networks [32,49].
The Gaussian closure, i.e. the truncation of the cumulant hierarchy after second order, yields a set of equations that accurately predicts the second-order statistics. This is in line with experimental evidence showing that pairwise correlations sufficiently constrain maximum entropy models of collective activity [59]. Analogously, the truncation after first order, i.e. the Curie-Weiss mean-field theory that neglects fluctuations altogether, already yields a good estimate of the first moments. In the context of balanced networks this description is sometimes referred to as "balance equations" [50, eqs. (4.1) and (4.2)]. Taking into account fluctuations in the input due to the variance of individual units [50] constitutes an intermediate step in the cumulant hierarchy. The variance can be determined by exploiting the binary character of the variables, which fixes all cumulants of a single unit by its mean. A consequence of the lower-order moments depending only weakly on the statistics of higher-order moments is that cross-covariances can be determined from linear fluctuations around a steady state that is itself determined neglecting correlations altogether. We may therefore speculate that an approximation of third- or higher-order correlations can be obtained from the fluctuations around a state that is itself determined self-consistently by taking into account only up to second-order correlations.
The consistent truncation of the cumulant hierarchy further provides deeper insights: The expression for covariances between individual neuron pairs follows naturally from a consistent Gaussian closure, without further approximation. In previous studies, it was obtained as a linear approximation for weak correlations [32, supplementary eqs. (31)- (33)] and population-averaged covariances [31, eq. (6.8)]. We here show that solving the modified Lyapunov equation naturally leads to a decomposition of fluctuations into eigenmodes of the system. The presented method solves the problem that the modified Lyapunov equation does not hold on the diagonal, in contrast to the case of population averages [31]. The approach moreover exposes that fluctuations in the Gaussian approximation are described by a set of coupled Ornstein-Uhlenbeck processes. It is well known that the fluctuations obtained from a systematic system-size expansion [60, chapter X], to lowest order, obey a Langevin equation. This result has been applied to networks of homogeneous populations [58,61,62]. The interesting point here is the feasibility of such a reduction directly on the level of individual binary variables. In contrast to the population-averaged activity, the jumps of a binary variable cannot be considered small compared to its value, hence the precondition to apply the system-size expansion is not valid. The approach taken here to expose the correspondence between binary dynamics and Ornstein-Uhlenbeck processes is hence complementary. In particular, the result shows that the effective noises appearing in the equivalent set of Ornstein-Uhlenbeck processes are not uniquely determined by the activities of the neurons alone, in contrast to the population-averaged case [58, sec. 3.1].
Decomposing externally applied signals to the network into the same eigenmodes as the intrinsically generated fluctuations, we obtain an expression for the susceptibility of the network on the single-unit level that explains the spatio-temporal filtering applied to external signals, complementing the result for interacting populations [31, sec. V]. The expression shows how the decay properties of the induced network response are determined by the eigenvalue associated with the eigendirections stimulated by the external signal. In particular, stimuli exciting modes close to an instability yield responses with a long memory lifetime. In Erdős-Rényi random binary networks with fixed weights, the spectral radius of the linearized connectivity matrix is bounded by √(2(1 − p)/π) ≤ √(2/π) < 0.8 < 1 [53], so that all modes of the network are stable. An instability of the mean activities can only be achieved with connectivity statistics or motifs that differ from an Erdős-Rényi network. This finding is in qualitative contrast to spiking networks, which indeed show a rate instability at a critical coupling weight [63,64]. This insight has consequences for the use of random networks with fixed weights in the framework of reservoir computing [65,66], where the highest computational performance [67] is achieved at the edge of chaos [for a review, see 68].
We show that the effective interaction strength, i.e. the product of a unit's susceptibility and the incoming coupling amplitude, can uniquely be reconstructed from the covariance matrix and the slope of the covariance functions at zero time lag. This relation builds on the regression theorem [43], which states that the fluctuations in the system, to linear order (31), follow the same differential equation as the time-delayed covariance functions (2). Coupling amplitudes, for fundamental reasons, cannot be inferred unambiguously: correlations only depend on J/σ, where σ measures the strength of the overall incoming fluctuations to the node. The inference of the effective interaction strength J/σ still allows the classification of connections into excitatory and inhibitory ones, but leads to a considerable rate of false positives. The method presented here requires the measurement of covariances between all pairs of units in the network. In case of severe undersampling, its application is restricted to small sub-networks. Thus, in general settings, more sophisticated approaches that take into account the influence of hidden units on the observed covariances, as e.g. in Tyrcha and Hertz [40], are more promising. The presented expressions that relate the covariance matrix and its slope at zero time lag to the connectivity may still prove useful to construct similar inference methods for the investigated class of networks. Moreover, the algorithm constructing a network that generates activity with desired mean activities and covariances is a useful tool to generate surrogate data.
Previous works have addressed several aspects of pairwise correlations. The smallness of the average magnitude of covariances in strongly coupled balanced networks [41] has been explained in the large-N limit by the influential work of Renart et al. [32]. In general, decorrelation follows from the dominant negative feedback in balanced networks at any network size [49,69]. These global properties are determined by collective fluctuations and can hence be described by population-averaged mean-field theory [31]. Here we discuss the statistics of individual units, which therefore goes beyond the existing approaches.
Our results are complementary to the method presented in Buice et al. [35], who consider Markov systems of interacting populations of neurons that consequently have integer numbers of active states, as opposed to the Glauber dynamics [3] of individual neurons considered here. In consequence, the truncation in the former work is performed on the level of normal-ordered cumulants, that represent an approximation around Poisson statistics, while here we derive an expansion in terms of ordinary cumulants [70]. Moreover, Buice et al. [35] rely on a smooth activation function, while our results are generally applicable, including deterministic single units with hard thresholds. Both formalisms generalize to higher order cumulants of activity variables.
A different approach treating the statistics of individual units in a single network realization analytically originates in the spin-glass literature. Initiated by the symmetrically coupled Sherrington-Kirkpatrick spin-glass model [4], an expansion of the free energy in the coupling strength (Plefka expansion [71]) leads to the Thouless-Anderson-Palmer mean-field theory [5,12,72,73]. By construction this method is restricted to weak coupling and, in its original form, starting from the partition function, also to systems in thermodynamic equilibrium. An extension to asymmetrically coupled non-equilibrium systems has been derived invoking information-theoretic arguments [74]. A related approach has been taken by [38]. Still, the extended methods rely on the smoothness of the activation function and the smallness of the coupling constants, two restrictions that we are able to overcome here. Compared to the previously mentioned works, the presented approach is more closely related to the mean-field theory of Mézard and Sakellariou [39]. Although they consider a system with sequential Glauber update in discrete time steps, their method can be extended to the asynchronous update investigated here. Their analysis does not require weak coupling, similar to ours, but in contrast still relies on a smooth activation function for individual units. Moreover, their theory neglects the influence of the cross-covariance on the marginal statistics (their eq. (4)), an important finite-size effect here shown to result in a distribution of mean activities due to correlations alone.
The mean-field methods employed in computer science, biology, artificial intelligence, social sciences, economics, and theoretical neuroscience [see e.g. 29,30,75] may be complemented by our results that go beyond population-averaged dynamics. The general formalism starting from the master equation can be widely adapted by defining model-specific transition rates [see Tab. 1 in 34]. In neuroscience, with the advancement of electrophysiology, experimental data with hundreds of simultaneously recorded neurons have become readily available [76,77]. The availability of parallel data progressively changes the focus in neuroscience from the study of single-cell responses to emergent phenomena arising through the interaction between neurons in networks [78]. Pairwise correlations are moreover closely linked to fluctuations of the population activity [69], which have been shown to shape experimentally accessible signals, such as the local field potential or the electroencephalogram (EEG) [79,80]. The effective description of the statistics of individual neurons, valid in the entire range of coupling strengths, forms the basis for studies of dynamics on adaptive networks [81], e.g. the interaction of neuronal dynamics with correlation-sensitive learning rules [82]. Explicit expressions not only for the zero-time-lag covariances but also for the slope of the covariance functions allow the definition of plasticity rules resembling spike-timing dependent plasticity [25]. The finding that the collective dynamics is captured by the non-trivial second-order statistics implies that theoretical descriptions of mechanisms, e.g. biologically realistic synaptic learning, which usually rely only on first- and second-order statistics, are now within reach. Balanced networks show widely distributed correlations across pairs of neurons with small mean [36]. This robust feature is captured by the presented linear equations for the covariance matrix.
The shape of the distribution has to date not been related to the properties of the underlying network structure. Further work is required to obtain analytical expressions exposing how the structural properties give rise to the statistics of the distribution of covariances. Such results would enable us to deduce statistics of the connections from the observed activity. The presented description of the structure of fluctuations in this archetypical model of collective phenomena by a set of non-linear equations provides a starting point in this endeavor and facilitates further development on disordered, coupled systems with large numbers of degrees of freedom.

A. Derivation of moment equations up to second order
For completeness and to establish a consistent notation, we here include the derivation of equations (2) for the first and second moments of the activity in a binary network. We follow the notation introduced in Buice et al. [35] and used in Helias et al. [49].
The stochastic system is completely characterized by the joint probability distribution p(n) of all N binary variables n. Knowing the joint probability distribution, arbitrary moments can be calculated, among them pairwise correlations. The occupation probability of each state follows the master equation (41). We denote by n_{i+} = (n_1, ..., n_{i−1}, 1, n_{i+1}, ..., n_N) the state in which the i-th neuron is active (n_i = 1), and by n_{i−} the state in which neuron i is inactive (n_i = 0). Since in each infinitesimal time interval at most one neuron can change state, for each given state n there are N possible transitions (each corresponding to one of the N neurons changing state). The probability fluxes into and out of the state must sum to the change of probability of the respective state [9], i.e.
∀ n ∈ {0, 1}^N. From this equation we derive expressions for the first moments ⟨n_k⟩ and second moments ⟨n_k n_l⟩ by multiplying with n_k n_l and summing over all possible states n ∈ {0, 1}^N, where we denote by ⟨f(n)⟩ = Σ_{n∈{0,1}^N} p(n) f(n) the average of a function f(n) with respect to the distribution p(n). Note that the term denoted G_i(n_{\i}) does not depend on the state of neuron i; we use the notation n_{\i} for the state of the network excluding neuron i, i.e. n_{\i} = (n_1, ..., n_{i−1}, n_{i+1}, ..., n_N). Separating the terms in the sum over i into those with i ≠ k, l and the two terms with i = k and i = l, the first group is obtained by explicitly summing over the state n_i ∈ {0, 1} (i.e. using Σ_{n∈{0,1}^N} = Σ_{n_{\i}∈{0,1}^{N−1}} Σ_{n_i=0}^{1} and evaluating the inner sum); this first sum vanishes. The remaining terms are of identical form with the roles of k and l interchanged; we hence only consider the first of them and obtain the other by symmetry. Taken together with the mirror term k ↔ l, we arrive at two conditions, one for the first moment (k = l, using ⟨n_k²⟩ = ⟨n_k⟩) and one for the second moment (k ≠ l). The time-lagged correlation function can be derived along completely analogous lines as (44), because the forward time evolution (the differential equation with respect to t) of the two-point probability distribution p(n, t; q, s) fulfills, due to the Markov property, the same master equation (41) as the equal-time probability distribution p(n, t). The resulting differential equation reads

τ ∂/∂t ⟨n_k(t) n_l(s)⟩ ≡ τ ∂/∂t Σ_{n,q} p(n, t; q, s) n_k q_l = −⟨n_k(t) n_l(s)⟩ + ⟨F_k(n(t)) n_l(s)⟩.
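The stationary first-moment condition ⟨n_k⟩ = ⟨F_k(n)⟩ implied by this derivation can be checked exactly for a tiny network by constructing the full 2^N-state master equation. In the sketch below the sigmoidal gain F_k is a hypothetical stand-in for the gain function of the paper, and τ = 1.

```python
import numpy as np
from itertools import product

# exact master equation for a tiny network with asynchronous updates;
# the sigmoidal gain below is a hypothetical stand-in for F_k
rng = np.random.default_rng(4)
N = 3
J = rng.normal(0.0, 1.0, (N, N))
np.fill_diagonal(J, 0.0)
F = lambda n, k: 1.0 / (1.0 + np.exp(-(J[k] @ n)))

states = [np.array(s) for s in product((0, 1), repeat=N)]
M = len(states)
Q = np.zeros((M, M))                  # transition-rate matrix between the 2^N states
for a, n in enumerate(states):
    for k in range(N):
        target = n.copy()
        target[k] = 1 - n[k]
        b = next(i for i, s in enumerate(states) if np.array_equal(s, target))
        rate = F(n, k) if target[k] == 1 else 1.0 - F(n, k)
        Q[b, a] += rate               # probability flux into the flipped state
        Q[a, a] -= rate               # and out of the current state

# stationary distribution: null vector of Q, normalized to sum one
w, v = np.linalg.eig(Q)
p = np.real(v[:, np.argmin(np.abs(w))])
p /= p.sum()

# stationary first-moment condition <n_k> = <F_k(n)>
for k in range(N):
    mean_n = sum(p[a] * states[a][k] for a in range(M))
    mean_F = sum(p[a] * F(states[a], k) for a in range(M))
    assert np.isclose(mean_n, mean_F)
```

The check works because summing the stationary master equation against n_k reproduces exactly the balance −⟨n_k⟩ + ⟨F_k(n)⟩ = 0 derived above; no approximation enters for this small state space.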
B. Cumulants of summed random variables

Let y = Σ_i x_i be the sum of random variables x_i that follow an arbitrary joint distribution p(x). The moment-generating functions of y and x are then related by ϕ_y = ϕ_x ∘ ι, where concatenation with the function ι: t ↦ (t, ..., t) replaces every t_i by the same t. The cumulant-generating function therefore follows as Φ_y(t) = ln ϕ_y(t) = Φ_x(ι(t)), with Φ_x(t) = ln⟨e^(t·x)⟩ the cumulant-generating function of the x and t·x the scalar product. Expanding in the cumulants κ^x_{ij...} of x and κ_1, κ_2, ... of y yields the relationship that each cumulant κ_q of the summed variable is given by the sum over the joint cumulants of the individual variables of the corresponding order q.
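For independent variables the joint cumulants with distinct indices vanish, and the statement reduces to the familiar additivity of cumulants. A quick numerical check with independent Bernoulli variables (hypothetical parameters):

```python
import numpy as np

rng = np.random.default_rng(3)
p = rng.uniform(0.1, 0.9, 5)     # means of five independent Bernoulli variables

# first three cumulants of a single Bernoulli(p_i)
k1 = p
k2 = p * (1 - p)
k3 = p * (1 - p) * (1 - 2 * p)

# empirical cumulants of the sum y = sum_i x_i
x = rng.random((1_000_000, 5)) < p
y = x.sum(axis=1)
assert np.allclose(y.mean(), k1.sum(), atol=0.01)
assert np.allclose(y.var(), k2.sum(), atol=0.02)
assert np.allclose(((y - y.mean()) ** 3).mean(), k3.sum(), atol=0.03)
```

For the first three orders the cumulants coincide with the central moments, which is why the empirical mean, variance, and third central moment can be compared directly against the summed single-variable cumulants.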

C. Functions of close-to Gaussian variables
To determine the mean activity, we need to apply a non-linear function f to a variable h_k whose statistics is close to Gaussian. For brevity, we suppress the index k in the following. We assume that the Fourier transform f̂(ω) exists. Let y be the random variable (46), which is a sum of a large number of individual variables x_i and is assumed to be close to Gaussian. For the expectation value we get

⟨f(y)⟩_x = (1/2π) ∫ dω f̂(ω) ⟨e^(iωy)⟩_x = (1/2π) ∫ dω f̂(ω) e^(Φ_y(iω)) = (1/2π) ∫ dω f̂(ω) e^(κ_1(iω) + κ_2(iω)²/2! + κ_3(iω)³/3! + ...),

where Φ_y is the cumulant-generating function of y (48) and κ_i denotes the i-th cumulant of y. We are interested in an approximation that treats the dominant first and second (Gaussian) cumulants of y explicitly and separates the effect of all higher cumulants, identifying in the last line the Fourier transform of a Gaussian with moments κ_1 and κ_2 via (50). For the covariance we need to evaluate terms of the form ⟨F_k n_l⟩. These terms can be obtained analogously, with ϕ_x the moment-generating function of x (47). Taking the derivative of the cumulant-generating function with (49), the product rule produces a factor 2 in the second term and a factor 3 in the third term, where we used the symmetry of the cumulants with respect to permutations of indices. Replacing (iω)^q by derivatives of the exponential with respect to the cumulants, we identify ⟨f(y)⟩_x in the last step using (51).
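For the Heaviside gain f = H(· − θ), keeping only the Gaussian cumulants κ_1 and κ_2 reduces the expectation to a complementary error function, ⟨H(h − θ)⟩ ≈ (1/2) erfc((θ − κ_1)/√(2κ_2)). A Monte-Carlo sanity check with arbitrary (hypothetical) parameters:

```python
import numpy as np
from scipy.special import erfc

# arbitrary input statistics and threshold
mu, s2, theta = 0.2, 0.8, 0.5

# Gaussian approximation of <H(h - theta)> for h ~ N(mu, s2)
gauss = 0.5 * erfc((theta - mu) / np.sqrt(2.0 * s2))

# Monte-Carlo estimate from samples of h
rng = np.random.default_rng(5)
h = rng.normal(mu, np.sqrt(s2), 1_000_000)
mc = (h > theta).mean()
assert abs(gauss - mc) < 3e-3
```

For an exactly Gaussian h the two values agree up to sampling error; for the close-to-Gaussian input field of the network, the higher cumulants separated off above supply the corrections.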
D. Trivial third and fourth order cumulants of binary variables expressed by lower order cumulants

Third order. Let n_l, n_i, n_j, n_r ∈ {0, 1} be binary variables. The raw third moment can be written as a sum of all combinations of cumulants up to order three (20):

⟨n_l n_i n_j⟩ = ⟪n_l n_i n_j⟫ + c_li m_j + c_ij m_l + c_jl m_i + m_l m_i m_j.
Using n_i^K = n_i for each integer K ≥ 1 in the case of binary variables n_i, we consider the case l = i ≠ j:

⟨n_l n_j⟩ = ⟪n_l n_l n_j⟫ + c_ll m_j + 2 c_lj m_l + m_l² m_j,

so that ⟪n_l n_l n_j⟫ = ⟨n_l n_j⟩ − c_ll m_j − 2 c_lj m_l − m_l² m_j = c_lj (1 − 2 m_l), where the last step uses ⟨n_l n_j⟩ = c_lj + m_l m_j and c_ll = m_l(1 − m_l). Together, these yield the expression (21) in the main text.
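The resulting identity ⟪n_l n_l n_j⟫ = c_lj(1 − 2m_l) can be checked directly on an explicit two-variable distribution (the joint pmf below is arbitrary), using the standard third-cumulant formula κ(X, X, Y) = ⟨X²Y⟩ − ⟨X²⟩⟨Y⟩ − 2⟨XY⟩⟨X⟩ + 2⟨X⟩²⟨Y⟩ together with n_l² = n_l.

```python
import numpy as np

# explicit joint pmf p[n_l, n_j] of two correlated binary variables
p = np.array([[0.5, 0.1],
              [0.1, 0.3]])

m_l = p[1].sum()                 # <n_l>
m_j = p[:, 1].sum()              # <n_j>
E_lj = p[1, 1]                   # <n_l n_j>
c_lj = E_lj - m_l * m_j          # covariance

# third joint cumulant <<n_l n_l n_j>>; uses n_l^2 = n_l for binary n_l
k3 = E_lj - m_l * m_j - 2 * m_l * E_lj + 2 * m_l**2 * m_j

# identity: <<n_l n_l n_j>> = c_lj (1 - 2 m_l)
assert np.isclose(k3, c_lj * (1 - 2 * m_l))
```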
The third cumulant (17) of h_k follows, with the assumption ⟪n_i n_j n_r⟫ ≃ 0 for pairwise distinct i ≠ j ≠ r, as

κ_3,k = Σ_{ijr} J_ki J_kj J_kr ⟪n_i n_j n_r⟫ ≃ 3 Σ_{i≠r} J_ki² J_kr ⟪n_i n_i n_r⟫ + Σ_i J_ki³ ⟪n_i n_i n_i⟫,

where the N(N−1)(N−2) terms with pairwise distinct indices are neglected, the first remaining sum collects the N(N−1) terms with two equal indices (the factor 3 counting their permutations), and the last sum collects the N diagonal terms; we included the term i = r in the first sum and compensated accordingly in the latter term. With c_ii(1 − 2m_i) = m_i − 3m_i² + 2m_i³ we obtain the expression (22) in the main text. The fourth-order case follows analogously, which yields the expression (23) in the main text.
Fourth order. The expansion of the raw fourth moment in cumulants is

⟨n_i n_j n_r n_l⟩ = ⟪n_i n_j n_r n_l⟫ + ⟪n_i n_j n_r⟫ m_l + ⟪n_j n_r n_l⟫ m_i + ⟪n_r n_l n_i⟫ m_j + ⟪n_l n_i n_j⟫ m_r + c_ij m_r m_l + c_ir m_j m_l + c_il m_r m_j + c_jr m_l m_i + c_jl m_r m_i + c_rl m_i m_j + c_ij c_rl + c_ir c_jl + c_il c_rj + m_i m_j m_r m_l.

Only those cumulants in which at most two different indices appear are fixed by first- and second-order cumulants. We need to distinguish three cases: the first case, i = j ≠ r = l, leads with (53) to a matrix expression; the second case, i = j = l ≠ r, yields a second matrix; and the third case, i = j = r = l, yields a vector, where we use the notation (x⊛)^n = x ⊛ ... ⊛ x for the element-wise n-th power of a vector. In (19) we need the term