Lyapunov spectra of chaotic recurrent neural networks

Brains process information through the collective dynamics of large neural networks. Collective chaos was suggested to underlie the complex ongoing dynamics observed in cerebral cortical circuits and to determine the impact and processing of incoming information streams. In dissipative systems, chaotic dynamics takes place on a subset of phase space of reduced dimensionality and is organized by a complex tangle of stable, neutral and unstable manifolds. Key topological invariants of this phase space structure, such as the attractor dimension and the Kolmogorov-Sinai entropy, have so far remained elusive. Here we calculate the complete Lyapunov spectrum of recurrent neural networks. We show that chaos in these networks is extensive, with a size-invariant Lyapunov spectrum, and characterized by attractor dimensions much smaller than the number of phase space dimensions. We find that near the onset of chaos, for very intense chaos, and for discrete-time dynamics, random matrix theory provides analytical approximations to the full Lyapunov spectrum. We show that a generalized time-reversal symmetry of the network dynamics induces a point-symmetry of the Lyapunov spectrum reminiscent of the symplectic structure of chaotic Hamiltonian systems. Fluctuating input reduces both the entropy rate and the attractor dimension. For trained recurrent networks, we find that Lyapunov spectrum analysis quantifies the error propagation and the stability achieved. Our methods apply to systems of arbitrary connectivity, and we describe a comprehensive set of controls for the accuracy and convergence of Lyapunov exponents. Our results open a novel avenue for characterizing the complex dynamics of recurrent neural networks and the geometry of the corresponding chaotic attractors. They also highlight the potential of Lyapunov spectrum analysis as a diagnostic for machine learning applications of recurrent networks.

When you dream, think, or read this sentence, in your brain gazillions of tiny cells called neurons are talking to each other. These neurons pass on the messages coming from your five senses by sending patterns of electric pulses to each other, or to your big toe if you need to run. Neuroscientists try to eavesdrop on this complex chatter and want to understand the language the neurons speak. I use math to build a super simplified imitation of this chatter. I have randomly connected thousands of neurons inside a big computer, and so they form a network that looks like a giant cobweb woven by a drunken spider. I call this imitation of real neurons my 'model'.
In my model, each neuron has a very simple rule. The rule tells the neuron how messages coming from other neurons are translated into some kind of activity. As part of its activity, it sends messages on to thousands of other neurons that it is connected to. Other scientists, using pen and paper instead of computers, found out that such networks are quiet as a mouse when the connections are weak, but they start a tumult of chatter when the connections between the neurons are strong enough. I want to better understand the space of activities and how complicated it is. It is much easier to imagine this space of possible activity patterns if you have only three neurons. You can give each neuron a number which says how active the neuron is at a particular moment in time. With three neurons, you can imagine the activity space as your bedroom. The activity of the first neuron is the direction from head to toe when you are lying in bed. The activity of the second neuron gives the position from left to right, and the activity of the third neuron gives the height above your bed. A combination of neuron activity states gives a point in space in the bedroom. Over time, the point moves around in space. Now we can ask: What patterns will form from the points of network activity if we wait long enough? Will they fill the whole room, or will they create patterns that lie only in a small subspace? For example, all activity could lie on a thin, crumpled layer on the floor like a blanket. Or the activity points could loop around a curved line, like a hula-hoop leaning against your desk. Now think of the same questions, but with thousands of neurons whose activities span thousands of directions. It is hard to imagine this space. I discovered that although the activity of the whole network looks like a complete mess, there is a hidden pattern in the space of activities. I reveal this using a theory called chaos theory. Chaos theory can be used to understand complex groups of many small things that interact, such as the swirling of gazillions of drops of water in the clouds on a rainy day. We call such a system chaotic when a tiny poke is enough to make it do something very different from what it would have done without the poke. Actually, chaos theory started already more than a hundred years ago. Back then, people wanted to understand whether our solar system is stable. By stable, they mean whether the paths of the planets are the same after you poke them a bit. If the solar system were chaotic, Earth or Mars might take a wrong turn and end up sizzling into the sun or being catapulted out of the Milky Way one day. A Russian mathematician named Lyapunov was worried about that and thought very hard about it. There are some numbers, known as Lyapunov exponents (after that Russian mathematician), that measure how fast things fly apart in a chaotic system after tiny poking or tickling.
There is an almost magical link between these Lyapunov exponents and the space of all imaginable gibberish the neurons could possibly ever talk about. I use this link to show that although the activity of a chaotic network looks like a random jumble of gibberish, there is a lot more secret order than what you would expect when only listening to the neurons one by one or two at a time. In the space of all imaginable network activity patterns, there are lots of holes, like in Swiss cheese. Actually, in the model I studied, this space is almost empty; it is made up mostly of thin air. Therefore specific network activity patterns can never occur. So is it that the neurons have secretly agreed never to talk about certain topics? No! It's instead that the wiring of the random network (the giant cobweb of the drunken spider) and the neuron rules somehow don't allow them to chat about certain things. In the following pages, I propose how to find out more about this secret order using other tricks from chaos theory. If you want to learn more, just write me an email :-)

II. INTRODUCTION
A major challenge in theoretical neuroscience, statistics, and statistical physics is to develop mathematical concepts to characterize high-dimensional activity and to find collective degrees of freedom and information representations of strongly interacting populations of elements, such as neurons. Theoretical work suggested that asynchronous rate activity in neural systems may originate from chaotic dynamics in recurrent networks. A seminal study showed that large networks of randomly connected firing-rate units display a sharp transition from an inactive state to a chaotic state [1] (Fig. 1). In this class of models, each rate unit maps its synaptic input h_i smoothly into a firing rate through a hyperbolic tangent input-output transfer function φ. Coupling strengths are drawn independently from a Gaussian distribution with zero mean and standard deviation g/√N, where N is the size of the network. A dynamic mean-field theory has been developed and is applicable in the large-network limit N → ∞. In this approach, the recurrent input into a typical unit is modeled by a Gaussian process whose statistics are determined self-consistently. For small coupling g < 1, the trivial fixed point h_i = 0 for all i is the only stable solution of the mean-field theory (Fig. 1A,B). For increasing coupling strength, this trivial fixed point loses stability and chaos emerges from the nonlinear interaction of unstable activity modes (Fig. 1C,D). Sompolinsky, Crisanti, and Sommers showed in the large-network limit N → ∞ that above a critical strength g_crit = 1, the only stable self-consistent solution is chaotic dynamics [1]. The transition to chaos occurs when the spectral radius λ_max of the stability matrix obtained by linearizing the rate dynamics around the fixed point h_i = 0 crosses unity (Fig.
1A,C).

Figure 1. Transition to chaos for sufficiently strong coupling g in rate networks. A Linear stability of the rate dynamics near the zero fixed point: real vs. imaginary parts of the eigenvalues λ_i of the stability matrix for g = 0.99. B For subcritical couplings (g = 0.99), the trivial fixed point of the system h_i = 0 is the only stable solution. C In large networks, the trivial fixed point loses stability at g_crit = 1, where the spectral radius crosses unity (gray dotted line), and chaos emerges from the nonlinear interaction of rate units. D Rate chaos for g = 1.2 (other parameters: network size N = 1000, integration step ∆t = 10⁻³ τ).
Recent developments in machine learning, including the renaissance of deep networks, have sparked additional interest in principles of stability and information processing in recurrent rate networks [31,32]. One reason for this is that recurrent networks can be unrolled in time into infinitely deep feed-forward networks with tied weights [33]. To avoid vanishing or exploding gradients during learning, this analogy suggests that learning in deep nonlinear networks is facilitated if the weights are initialized such that the corresponding recurrent networks are close to the edge of chaos (g_crit = 1) [31,34-37]. Intriguingly, transient rate chaos yields exponential expressivity in deep networks [32,38,39].
Here we calculate, to our knowledge for the first time, the full set of Lyapunov exponents of classical firing-rate networks. Previous studies only considered the largest Lyapunov exponent, which measures the average exponential rate of divergence or convergence of nearby network states. The full Lyapunov spectrum provides growth rates of volume elements along the trajectory and gives valuable additional insight into the collective dynamics of firing-rate networks.
We use concepts from the ergodic theory of dynamical systems to further characterize the complex collective dynamics of rate networks. Large-scale dissipative systems often evolve towards a low-dimensional attractor, but it is a challenge to identify and characterize this lower-dimensional manifold. Ergodic theory provides an estimate of the attractor dimensionality by characterizing the diversity of collective network activity states [40]. It also provides access to the dynamical entropy rate, which measures the amplification of dynamical uncertainty due to sensitivity to initial conditions. The dynamical entropy rate constrains the capability of information processing. Given that the initial state is known only with finite precision, the sensitive dependence on initial conditions makes predictions of future states impossible in chaotic systems [41,42]. This corresponds to a dynamical entropy rate because nearby states, which cannot be distinguished by a finite-precision readout initially, are pulled apart by the chaotic dynamics and become distinguishable later on. Therefore, the dynamical entropy rate quantifies the speed at which microscopic perturbations affect macroscopic rate fluctuations [41]. Sensitivity to initial conditions in cortical circuits might serve as a dynamical mechanism to pull nearby trajectories apart [43-45]. If the microscopic initial state contains a relevant signal, the dynamical entropy rate measures the rate at which this information becomes accessible. From a neural coding perspective, the dynamical entropy rate can contribute to the so-called noise entropy [46], because the dynamic amplification of microscopic noise by chaotic dynamics can impair coding capacity. Both the dynamical entropy rate and the attractor dimensionality are invariants of dynamical systems, i.e., they do not change under diffeomorphisms of the phase space [47-50], and can be obtained from the set of Lyapunov exponents [51]. This is the only known general way of
accessing the entropy of a high-dimensional differentiable dynamical system [40]. Sampling-based estimates of entropy rate and dimensionality, e.g., the Grassberger-Procaccia algorithm [52-54] that estimates the correlation dimension D_2, are intractable for systems with many degrees of freedom. The data required for such sampling-based estimates of the attractor dimensionality scales exponentially in D [55-58]. Our approach is applicable for arbitrary network structures and transfer functions φ. We show that both the dynamical entropy rate and the attractor dimensionality saturate with coupling strength g. Thus, both the uncertainty amplification due to sensitivity to initial conditions and the diversity of network activity states saturate for strong coupling. We find that time-discretization increases both the entropy rate and the dimensionality. Using random matrix theory, we analytically approximate the full Lyapunov spectrum in several limiting cases. We extend the analysis to a balanced network of threshold-linear units, where entropy rate and dimensionality peak as a function of coupling strength. We find that time-varying input reduces both the entropy rate and the dimensionality. Finally, we use the Lyapunov spectrum to quantify the stability of trained networks.

III. MODEL
We study the dynamics of a randomly wired network of nonlinear firing-rate units. The dynamics of the state h_i, for i = 1, 2, ..., N, of each firing-rate unit follows [1,14]

τ dh_i/dt = −h_i + Σ_{j=1}^{N} J_ij φ(h_j).

Here h_i is the total synaptic current received by firing-rate unit i and τ is the rate-unit time constant. We draw independent, identically distributed entries of the coupling matrix J_ij from a Gaussian distribution J_ij ∼ N(0, g²/N), remove self-coupling by setting J_ii = 0, and choose the transfer function φ(x) = tanh(x) [1].
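As a minimal illustration of these dynamics, the following Python/NumPy sketch (our own code, not the implementation used for the figures; all function and parameter names are ours) draws the coupling matrix and integrates the rate equation with a simple Euler scheme:

```python
import numpy as np

def simulate_rate_network(N=200, g=1.5, dt=0.1, tau=1.0, steps=2000, seed=0):
    """Euler integration of tau * dh/dt = -h + J @ tanh(h)."""
    rng = np.random.default_rng(seed)
    # couplings J_ij ~ N(0, g^2/N), no self-coupling
    J = rng.normal(0.0, g / np.sqrt(N), size=(N, N))
    np.fill_diagonal(J, 0.0)
    h = rng.normal(0.0, 1.0, size=N)
    traj = np.empty((steps, N))
    for t in range(steps):
        h = h + (dt / tau) * (-h + J @ np.tanh(h))
        traj[t] = h
    return J, traj
```

For g < 1 the activity decays to the trivial fixed point h_i = 0, while for g > 1 the network settles into sustained irregular fluctuations.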

IV. LYAPUNOV SPECTRUM OF CLASSIC RECURRENT NEURAL NETWORKS
To calculate the Lyapunov spectrum, we evaluate the Jacobian of the flow of the dynamics. This measures how infinitesimal perturbations of the network state evolve in the tangent space along the trajectory h_i. The instantaneous Jacobian is given by

D_ij(t) = −δ_ij + J_ij φ'(h_j(t)).

Thus, in our case, the Jacobian is a negative identity matrix plus the coupling matrix with columns scaled by the squared hyperbolic secant, φ' = sech², of the network activity states h_j. For strong g, the variance of h_i increases proportional to g [59], and most rates are in the saturated regime, so sech²(h_i) ≈ 0 for most i; hence most columns of D_ij(t_s) are close to zero, aside from the diagonal entries. The full Lyapunov spectrum is obtained by a reorthonormalization procedure [60], which is described in detail in Appendix B, including a detailed analysis of the convergence of the Lyapunov spectra. Briefly, calculating the Lyapunov spectrum involves two steps: First, we evolve an initially orthonormal system Q in the tangent space along the trajectory using the Jacobian D.
To this end, the variational equation τ Q̇ = D(t)Q has to be integrated. A continuous system can be transformed to a discrete system by considering a stroboscopic representation, where the trajectory is only considered at certain discrete time points. We use here the notation of discrete dynamical systems, where this corresponds to forming the product of Jacobians along the trajectory, Q̃_{s+1} = D_s Q_s. We study the discrete network dynamics in the limit of small time step ∆t. The notation can be extended directly to continuous systems [61]. Second, we extract the exponential growth rates using the QR decomposition, Q̃_{s+1} = Q_{s+1} R_{s+1}, which uniquely decomposes Q̃_{s+1} into an orthonormal matrix Q_{s+1} and an upper triangular matrix R_{s+1} with positive diagonal elements. Geometrically, Q_{s+1} describes the rotation of Q_s caused by D_s, the diagonal entries of R_{s+1} describe the stretching and shrinking of Q_s, and the off-diagonal elements describe the shearing. The Lyapunov exponents are given by the time-averaged logarithms of the diagonal elements of R_s:

λ_i = lim_{S→∞} 1/(S∆t) Σ_{s=1}^{S} ln (R_s)_{ii}.

Note that the QR decomposition does not need to be performed at every simulation step, just sufficiently often, i.e., once every t_ONS steps, with t_ONS chosen such that the tangent-space vectors neither leave the numerical range nor become too strongly aligned [60]. An initial transient should be disregarded in the calculation of the Lyapunov spectrum, because h first has to converge towards the attractor and Q has to converge to the unique eigenvectors of the Oseledets matrix (Eq. A4) [62]. A simple example of this algorithm in pseudocode is:

Jacobian-based algorithm for Lyapunov spectrum
    initialize h, Q
    evolve h until it is on the attractor (avoid initial transient)
    evolve Q until it converges to the eigenvectors of the backward Oseledets matrix
    for s = 1, ..., S: evolve h and Q for t_ONS steps using the Jacobian D;
        QR-decompose Q → Q_s R_s; accumulate γ_i += ln (R_s)_{ii}
    return λ_i = γ_i / (S t_ONS ∆t)
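The two-step procedure above can be sketched compactly in Python/NumPy (an illustrative implementation under our own naming, not the paper's code): propagate an orthonormal frame Q with the Jacobian of the single Euler step, reorthonormalize by QR every t_ons steps, and time-average the logarithms of the diagonal of R.

```python
import numpy as np

def lyapunov_spectrum(J, tau=1.0, dt=0.1, t_ons=5, steps=6000, discard=1000, seed=1):
    """Full Lyapunov spectrum of tau*dh/dt = -h + J@tanh(h) via QR
    reorthonormalization of an orthonormal frame Q in tangent space."""
    N = J.shape[0]
    rng = np.random.default_rng(seed)
    h = rng.normal(0.0, 1.0, size=N)
    Q = np.linalg.qr(rng.normal(size=(N, N)))[0]
    log_r = np.zeros(N)
    t_acc = 0.0
    I = np.eye(N)
    for s in range(steps):
        # Jacobian of one Euler step: I + (dt/tau) * (-I + J * sech^2(h_j))
        D = I + (dt / tau) * (-I + J / np.cosh(h) ** 2)
        h = h + (dt / tau) * (-h + J @ np.tanh(h))
        Q = D @ Q
        if (s + 1) % t_ons == 0:
            Q, R = np.linalg.qr(Q)
            if s + 1 > discard:  # skip transient: h not yet on the attractor
                log_r += np.log(np.abs(np.diag(R)))
                t_acc += t_ons * dt
    return np.sort(log_r / t_acc)[::-1]
```

The single-step Jacobian used here, I + (∆t/τ)(−I + J diag(φ'(h))), is simply the Euler discretization of the variational equation above.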

Extensive spatiotemporal network chaos
In dissipative systems, arbitrary initial conditions converge towards a lower-dimensional attractor. The dimensionality characterizing the diversity of collective dynamical states on this attractor, however, can be constant, grow in proportion to the size of the system, or have other, more complex dependencies. If the dimensionality is proportional to the system's size, the system is called extensive, which occurs when the shape of the Lyapunov spectrum is invariant with system size. Such an invariance also implies an extensive entropy rate.
In the case of the firing-rate networks studied here, we find extensive chaos, indicated by the invariance of the shape of the Lyapunov spectrum to network size N (Fig. 2A) for sufficiently large networks (although the structure of the attractor depends on the realization of the connectivity J_ij). The Lyapunov spectrum is point-symmetric around its constant mean value −1/τ (see Fig. 7A). We will investigate the origin of this symmetry of the Lyapunov spectrum in a later section. The largest Lyapunov exponent quickly saturates as a function of network size (Fig. 2B). We investigate the finite-size effect on the largest Lyapunov exponent, its convergence to the value predicted by dynamic mean-field theory, and the finite-size effect in the transition to chaos g_crit in Appendix D. The entropy rate H, also called the Kolmogorov-Sinai entropy rate, quantifies the amplification of small state differences by the chaotic dynamics. While formally defined via partitions of the phase space, it is, under weak mathematical constraints, given by the sum of the positive Lyapunov exponents: H = Σ_{λ_i>0} λ_i (see Appendix E). As a consequence of the size-invariant Lyapunov spectrum, the entropy rate also grows linearly with network size, as demonstrated over two orders of magnitude in Fig. 2C. The same is true for the attractor dimensionality (Fig.
2D), which is given by the interpolated number of Lyapunov exponents that sum to zero, the Kaplan-Yorke dimension:

D = k + (Σ_{i=1}^{k} λ_i) / |λ_{k+1}|,

where k is the largest index such that Σ_{i=1}^{k} λ_i ≥ 0. Intuitively, the attractor dimension is the dimensionality of the highest-dimensional infinitesimal hypersphere whose volume neither shrinks nor grows under the chaotic dynamics. In other words, on the attractor, growth along unstable manifolds is compensated by shrinking along the stable manifolds. Thus, a D-dimensional hypersphere is merely deformed over time, with its volume preserved on average. The extensivity of the Lyapunov spectrum for rate networks was conjectured earlier [1], but never before demonstrated. Extensive chaos is often found in extended systems that are decomposable into weakly interacting subsystems whose number grows linearly with system size [63]. As this is not fulfilled for this fully randomly connected rate network, extensive chaos in our networks is not a trivial property. Globally coupled networks, for instance, can exhibit nonextensive chaos [64].
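Both invariants follow directly from a sorted Lyapunov spectrum; the small Python sketch below (helper names are ours) implements the sum of positive exponents and the Kaplan-Yorke interpolation described above:

```python
import numpy as np

def ks_entropy(lyap):
    """Kolmogorov-Sinai entropy rate: sum of the positive Lyapunov exponents."""
    lyap = np.asarray(lyap)
    return lyap[lyap > 0].sum()

def kaplan_yorke_dimension(lyap):
    """D = k + (sum_{i<=k} lambda_i) / |lambda_{k+1}|, with k the largest
    index whose partial sum is still nonnegative (exponents sorted descending)."""
    lyap = np.sort(np.asarray(lyap))[::-1]
    csum = np.cumsum(lyap)
    if csum[0] < 0:
        return 0.0            # even the largest exponent is negative
    k = np.max(np.nonzero(csum >= 0)[0]) + 1   # number of exponents summed
    if k == len(lyap):
        return float(k)       # all partial sums nonnegative
    return k + csum[k - 1] / abs(lyap[k])
```

For example, the toy spectrum (1.0, 0.0, −0.5, −2.0) gives H = 1.0 and D = 3 + 0.5/2 = 3.25.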

Strong coupling intensifies chaos
Next, we investigate the role of the synaptic coupling strength g (Fig. 3). The full Lyapunov spectrum shows an interesting dependence on g (Fig. 3A): for increasing g, the first half of the Lyapunov spectrum is increasingly curved. Note that the Lyapunov spectrum is point-symmetric for all values of g. The largest Lyapunov exponent shows the theoretically predicted g-dependence in the stable regime g < 1 (Fig. 3B). In the chaotic regime g > 1, it grows first quadratically and then logarithmically with g, in agreement with previous work [1,59]. Note that the asymptotic large-g behavior λ_max ∝ log(g) is only expected when first sending N → ∞ and then g → ∞. The calculation of the largest Lyapunov exponent is confirmed both by tracking the amplitude of a small perturbation in direct numerical simulations and by using the Jacobian-based method [60] (Fig. 3B). While the exponential separation rate of nearby trajectories increases with growing g, the overall dissipation of the system, measured by the mean Lyapunov exponent λ̄, is independent of g and depends only on the time constant τ. The reasons for this are provided in Appendix H. We first focus on the entropy rate and attractor dimensionality.
The dynamical entropy rate is zero for g ≤ 1 and grows monotonically for increasing values of g (Fig. 3C). Our numerical results suggest that for large g, the dynamical entropy rate peaks as a function of g (see Fig. 21 and Appendix K). Again, this asymptotic behavior is only expected when first sending N → ∞ and then g → ∞. In case the specific initial state of the network does not encode relevant information, the growth of the entropy rate with g can be interpreted as an increasing contribution to noise entropy.

Attractor dimensionality bounded for strong coupling
We found that the attractor dimension first increases with g (Fig. 3D) in the chaotic regime g > g_crit and peaks as a function of g at less than 10% of the number of phase space dimensions N (Fig. 4A). This suggests that despite vanishing pairwise correlations [1], rate-unit activities are not independent of each other. Even for strongly chaotic networks, the strange attractor of the network dynamics does not fill the entire phase space but only a small, yet extensive, fraction of it. Note that the geometric structure of the attractor nevertheless changes when g is increased further.

Comparison of attractor dimension and PCA dimension
We compared the attractor dimensionality with a dimensionality estimate based on second-order statistics of the activity h_i and φ(h_i), given by the effective number of principal components that account for most of the variance (see Appendix G for details). Such dimensionality estimates of the network activity based on Principal Component Analysis (PCA) are commonly used in experimental and theoretical neuroscience [9,10,65-69], e.g., to quantify the spatiotemporal complexity of neural activity in a data set. PCA-based estimates of dimension are generally not invariant with respect to changes of coordinates and can be misleading if applied to limited data sets. Extensivity of the PCA-based dimension does not in general imply extensivity of the attractor dimension, nor vice versa. In addition, PCA analyses, because they are based on a pairwise correlation function, can miss low-dimensional structure hidden in higher-order correlations. In general, the PCA-based dimension can both under- and overestimate the attractor dimension. We found that the PCA-based dimension differs strongly depending on whether it is estimated from the statistics of the firing rates φ(h_i) or of h_i (Fig. 4). Generally, we find for all dimensionality estimates a growth of dimension with g in weakly chaotic networks. However, for large g ≫ 1, we find a peak and subsequent slight decay of the attractor dimension (Figs. 4 and 21). In contrast, both PCA dimensions saturate for g ≫ 1, but they saturate at different levels and with distinct rates. The PCA-based dimensionality (based on both h_i and φ(h_i)) grows extensively with network size N, as does the attractor dimensionality (Fig. 4B).
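For concreteness, one common PCA-based estimate, the participation ratio of the covariance eigenvalues, can be computed directly from simulated activity. Whether this particular estimator coincides with the one used in Appendix G is our assumption, so treat this sketch as illustrative:

```python
import numpy as np

def pca_dimension(X):
    """Participation ratio of covariance eigenvalues:
    D_PCA = (sum_i mu_i)^2 / sum_i mu_i^2, for activity X of shape (time, units).
    Equals the number of units for independent equal-variance units and 1 for
    perfectly correlated (rank-one) activity."""
    C = np.cov(X, rowvar=False)
    mu = np.linalg.eigvalsh(C)
    mu = np.clip(mu, 0.0, None)   # guard against tiny negative eigenvalues
    return mu.sum() ** 2 / (mu ** 2).sum()
```

Applied to φ(h_i) versus h_i, such an estimator generally returns different values, consistent with the coordinate dependence noted above.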

Lyapunov spectrum of discrete-time firing-rate network
We next assess the effect of introducing a finite temporal discretization. The dynamics of discrete-time rate networks has attracted much attention because it is mathematically simpler [8,68,70-73]. Here we aim to understand the impact of time-discretization on chaotic dynamics. We set τ = 1 and study the evolution of the map

h_i(t + ∆t) = (1 − ∆t) h_i(t) + ∆t Σ_j J_ij φ(h_j(t)).
In the limit ∆t → 0, the continuous-time dynamics [1,2,11] is recovered. For ∆t = 1, the discrete-time network [8,72,73] is obtained. The Jacobian of the discrete-time map is

D_ij(t) = (1 − ∆t) δ_ij + ∆t J_ij φ'(h_j(t)).

The full Lyapunov spectrum is again obtained by a reorthonormalization procedure applied to the Jacobians along a numerical solution of the map [60]; for details, see Appendix B. We found a drastic effect of time-discretization on the Lyapunov spectrum (Fig. 5). In discrete-time networks (∆t = 1), the Lyapunov spectrum is no longer point-symmetric (Fig. 5A). The largest Lyapunov exponent grows slowly as a function of coupling strength g (Fig. 5B), as expected from previous analytical results [8]. However, the slow increase of the largest Lyapunov exponent with coupling strength g is overcompensated by a faster decay of the number of positive Lyapunov exponents, which results in a peak of both the dynamical entropy rate H (Fig. 5C) and the attractor dimensionality D (Fig. 5D). For increasing g, the Lyapunov spectrum bends down to strongly negative values. Very negative Lyapunov exponents can be explained by an increasing fraction of rate units in saturation, resulting in a vanishing fraction of directions that carry the gradient dynamics. In the continuous-time case for large g, the Lyapunov spectrum would quickly fall to the negative inverse characteristic timescale −1/τ that originates from the leak term −h_i in the dynamical equation (Fig. 9C). Here, in the discrete-time case, no such intrinsic timescale is present, and the last Lyapunov exponent becomes progressively more negative (Fig.
5A), indicating a quick divergence of the condition number of the long-term Jacobian T_t(x_0). From a machine learning perspective, the leak term can be interpreted as mimicking a skip connection that preserves information about the network state across (unrolled) layers even if the rate units are saturated, thus ameliorating the problem of vanishing gradients [74]. Next, we study the effect of gradually decreasing the time-discretization ∆t. At finite ∆t, the Lyapunov spectrum loses its symmetry (Fig. 6A and Fig. 7A), although we demonstrate that the Lyapunov spectrum again approaches point-symmetry around i = N/2 and λ_i = −1/τ for ∆t → 0 by showing convergence of the Lyapunov spectrum towards its point reflection, i.e., |λ_i + λ_{N+1−i} − 2λ̄| → 0. Even for very small ∆t, however, a small asymmetry remains because of the neutral Lyapunov exponent. Removing the neutral Lyapunov exponent, which is associated with a perturbation in the direction of the flow (λ_neutral = 0), improves the point-symmetry of the Lyapunov spectrum. Note that the symmetry of the Lyapunov spectrum originates in the approximate time-reversal symmetry of the dynamics, which only becomes exact in the limit of large N (see section VII). Thus, the Lyapunov spectrum is only point-symmetric in the limits N → ∞ and ∆t → 0. While the largest Lyapunov exponent changes only moderately, and non-monotonically, as the step size increases (Fig. 6B), the dynamical entropy rate and attractor dimensionality both grow strongly for large ∆t (Fig. 6C,D). This growth of entropy rate and dimensionality is primarily caused by an increasing number of positive Lyapunov exponents (Fig. 6A). At the same time, the negative end of the Lyapunov spectrum decreases drastically (Fig. 6A). This also strongly lowers the mean Lyapunov exponent (Fig. 6A and Fig.
7B). The mean Lyapunov exponent λ̄ converges for small ∆t towards −1/τ. The dependence of the mean Lyapunov exponent on ∆t can be approximated analytically using random matrix theory (Appendix H). This analytical result agrees well with numerical simulations (Fig. 7B).

Figure 6. A The full Lyapunov spectrum reveals drastic changes for increasing ∆t. For finite ∆t, the Lyapunov spectrum loses its symmetry (see also Fig. 7 and compare with Fig. 3). While the majority of Lyapunov exponents decrease for increasing ∆t, the number of positive exponents increases. B The largest Lyapunov exponent converges for small ∆t; for increasing ∆t, it first decreases and then increases moderately. C The dynamical entropy rate converges for small ∆t and increases for large ∆t. D The attractor dimensionality behaves similarly to the dynamical entropy rate (other parameters: N = 1000, g = 10, t_ONS = τ, t_sim = 10⁴ τ; averages across 10 network realizations).
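A compact way to probe both the ∆t = 1 map and the Jacobian-shuffling diagnostic used in the next section is the following Python sketch (our own illustrative code and names, not the paper's implementation):

```python
import numpy as np

def discrete_lyapunov_spectrum(J, steps=1000, discard=200, shuffle=False, seed=2):
    """Lyapunov spectrum of the leakless map h -> J @ tanh(h) (Delta t = 1).
    The one-step Jacobian is D_ij(t) = J_ij * sech^2(h_j(t)); with
    shuffle=True the same Jacobians are multiplied in a random order."""
    N = J.shape[0]
    rng = np.random.default_rng(seed)
    h = rng.normal(size=N)
    jacs = []
    for t in range(steps):
        if t >= discard:                       # skip the initial transient
            jacs.append(J / np.cosh(h) ** 2)   # columns scaled by sech^2(h_j)
        h = J @ np.tanh(h)
    if shuffle:
        idx = rng.permutation(len(jacs))
        jacs = [jacs[i] for i in idx]
    Q = np.linalg.qr(rng.normal(size=(N, N)))[0]
    log_r = np.zeros(N)
    for D in jacs:
        Q, R = np.linalg.qr(D @ Q)
        log_r += np.log(np.abs(np.diag(R)))
    return np.sort(log_r / len(jacs))[::-1]
```

With shuffle=True the same Jacobians are multiplied in a random order; for the leakless map and large N, the resulting spectrum should be close to the unshuffled one.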

V. ANALYTICAL APPROXIMATIONS OF THE FULL LYAPUNOV SPECTRUM
The full Lyapunov spectrum is given by the logarithms of the eigenvalues of the Oseledets matrix [75],

Λ(h_0) = lim_{t→∞} [T_t(h_0)^⊤ T_t(h_0)]^{1/(2t)},

where T_t is the long-term Jacobian, T_t(h_0) = D(h_{t−1}) ⋯ D(h_1) D(h_0). As T_t(h_0) is a product of generally noncommuting matrices, it is considered difficult to calculate the full Lyapunov spectrum analytically [76]. However, we identified several limits in which temporal correlations between subsequent Jacobians vanish and analytical random-matrix approximations are justified [76]. First, we demonstrate an approximation in the stable regime g < g_crit; second, in the chaotic regime just above the transition, g → g⁺_crit; third, in the limit of large g → ∞; fourth, when each rate unit is driven by a strong Gaussian white-noise process with standard deviation σ, in the limit σ → ∞ (see Sec. IX for the definition); and finally, in the discrete-time case without a leak, ∆t = τ. We numerically confirmed that in these limits the Lyapunov spectrum becomes invariant under shuffling the sequence of Jacobians (Fig. 9). We calculated the distribution of the entries of the Jacobian analytically in the limit N → ∞, where all h_i follow a Gaussian distribution h ∼ N(0, ∆_0). First, we calculated the distribution of y = φ'(h) analytically:

p(y) = exp(−[artanh(√(1−y))]²/(2∆_0)) / (√(2π∆_0) y √(1−y)),

with support y ∈ [0, 1], where ∆_0 is obtained analytically from dynamic mean-field theory [1]. We can thus write the Jacobian as

D_ij = −δ_ij + J_ij y_j,       (9)

where the y_j are random numbers drawn from the distribution p(y). The analytically predicted distributions p(y) are in excellent agreement with the results from direct numerical simulations (Fig. 8). In the limits we discuss in the following, the long-term Jacobian can be approximated by a product of random matrices of the form of Eq. 9. In the stable regime g ≤ g_crit, the Lyapunov spectrum is given by the real parts of the eigenvalue spectrum of the stability matrix obtained by linearizing around the fixed point. Because the trivial fixed point h_i = 0 for all i is the only stable solution for large N, this reduces to

D_ij = −δ_ij + J_ij.

For J_ij drawn from a Gaussian distribution J_ij ∼ N(0, g²/N), we find that the real parts of the eigenvalues of D_ij follow the right-shifted Wigner semicircle distribution [76-78]:

p(λ) = (2/(πg²)) √(g² − (λ+1)²), λ ∈ [−1−g, −1+g].

Note that this is expected not only for J_ij drawn from a Gaussian distribution, but generally for many random-matrix ensembles [79]. The cumulative distribution function is given by

χ(λ) = 1/2 + [ (λ+1)√(g² − (λ+1)²)/g² + arcsin((λ+1)/g) ] / π.

The Lyapunov spectrum in the stable regime follows from the inverse, λ_i = χ⁻¹(1 − i/N), with λ_i measured in units of 1/τ. The analytical Lyapunov spectra are in excellent agreement with the results from direct numerical simulations (Fig. 9A, purple (g = 0.2) and maroon (g = 0.7) lines).
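The stable-regime prediction is easy to check numerically. The Python sketch below (our own code; the quantile construction is one way to read off a spectrum from the semicircle density) numerically inverts the cumulative distribution of the right-shifted semicircle and compares it to the sorted real parts of the eigenvalues of the stability matrix −δ_ij + J_ij:

```python
import numpy as np

def semicircle_spectrum(N, g, tau=1.0):
    """Quantiles of the right-shifted Wigner semicircle
    p(l) = 2/(pi g^2) * sqrt(g^2 - (l+1)^2) on [-1-g, -1+g],
    obtained by numerically inverting its cumulative distribution."""
    grid = np.linspace(-1.0 - g, -1.0 + g, 100001)
    x = np.clip(grid + 1.0, -g, g)
    cdf = 0.5 + (x * np.sqrt(np.maximum(g**2 - x**2, 0.0)) / g**2
                 + np.arcsin(x / g)) / np.pi
    q = 1.0 - (np.arange(1, N + 1) - 0.5) / N   # lambda_1 is the largest
    return np.interp(q, cdf, grid) / tau

# compare with the real parts of the eigenvalues of the stability matrix -I + J
rng = np.random.default_rng(0)
N, g = 500, 0.7
J = rng.normal(0.0, g / np.sqrt(N), (N, N))
np.fill_diagonal(J, 0.0)
numeric = np.sort(np.linalg.eigvals(-np.eye(N) + J).real)[::-1]
analytic = semicircle_spectrum(N, g)
```

For subcritical g, the two sorted spectra agree up to finite-size fluctuations, with the largest exponent close to g − 1.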
Figure 9. Analytical approximations of the full Lyapunov spectrum. A Lyapunov spectra of autonomous continuous-time rate networks for different coupling strengths g, color-coded from purple (small g) to red (large g); dashed lines are analytical results (Eq. 10) for the stable case and for g → g⁺_crit (Eq. 14), full transparent lines are numerical results. B In the discrete-time case without leak (∆t = 1), the Lyapunov spectrum is invariant under shuffling for any g and can be approximated analytically by the triangle law for small g (Eq. 16). C Lyapunov spectra of autonomous continuous-time rate networks for g ≫ 1. For large g, the Lyapunov spectrum becomes invariant under shuffling the temporal sequence of Jacobians at fixed ∆t = 0.1. D Lyapunov spectra of driven continuous-time rate networks for different input strengths σ at fixed g = 1000. Again, for large g and σ, the Lyapunov spectrum becomes invariant under shuffling the temporal sequence of Jacobians at fixed ∆t = 0.1, and the full Lyapunov spectrum can be approximated by a product of random matrices with entries given by Eq. 9. (Parameters unless stated otherwise: for g = 1.2 in A, N = 8000 and t_sim = 10³ τ; else N = 1000, t_sim = 10⁴ τ, ∆t = 0.1τ, t_ONS = τ.)
Next, we consider the limit g → 1⁺ close to the transition to chaos. For that, we need to estimate both the distribution of the Jacobian entries and their autocorrelations. The autocorrelations of the activity can be solved self-consistently [1,59]. Close to the chaotic instability g → g_crit^+, the timescale of the autocorrelations of h_i diverges when approaching g_crit [1,59]; the autocorrelations of D_ij therefore diverge with the same time constant, τ_D^{−1} = (g − 1)/√3. Consistent with these analytical considerations, numerical simulations show that for g ≳ 1 the Lyapunov spectrum obtained after shuffling the sequence of (almost identical) Jacobians is almost unchanged. Thus, we conjecture that the Lyapunov spectrum is given by the logarithms of the singular values of a product of almost identical random matrices, which is still approximately described by the Wigner semicircle distribution [76,78]. Here, we used the analytical knowledge of the largest Lyapunov exponent obtained from dynamical mean-field theory [1,59] in the limit g → g_crit^+ = 1⁺. The Lyapunov spectrum in the chaotic regime for g → g_crit^+ follows from the inverse of χ_chaos. The analytical Lyapunov spectra are in good agreement with the results from direct numerical simulations (Fig. 9A, red line for g = 1.2). The approximation becomes more accurate as g → g_crit^+ and breaks down if g is too large. Next, we consider the limit of large g → ∞. Expanding the solution of the self-consistency equation for the autocorrelation of the local fields h around t = 0 yields, in this limit, an autocorrelation with amplitude ∆_0 = 2(1 − 2/π) [1,59]. But how can we deal with correlations between subsequent Jacobians? We note that the autocorrelations of the Jacobians become arbitrarily short in the limit of large g, although the autocorrelations of the activity variables h approach Eq. 15.
For large g, the model behaves like the fully asymmetric Ising spin glass model [59,80]. Substituting ∆(t') = ∆_0 exp(−t'/τ_h) into the self-consistency equation and taking the large-t limit yields the relaxation time τ_h of the autocorrelation. Thus, the autocorrelation of D_ij relaxes approximately with τ_D ∼ τ_h/g. Intuitively, for large g, most rate units are in saturation, and rate units cross the non-saturated regime, where they are susceptible to perturbations, in shorter time windows. The vanishing autocorrelation time of the Jacobians D_ij explains why the Lyapunov spectrum becomes invariant under shuffling the sequence of Jacobians and justifies approximating the long-term Jacobian by a product of uncorrelated matrices of the form of Eq. 9. As expected, the analytical approximations of the Lyapunov spectra approach the results from direct numerical simulations as g increases (Fig. 9C). We also find that for strong uncorrelated input (see Sec. IX for numerical results), the Lyapunov spectrum becomes invariant under shuffling the sequence of Jacobians. With increasing input drive σ at fixed g, all Lyapunov exponents converge towards the negative inverse of the characteristic timescale, −1/τ (not shown). When simultaneously increasing g and σ, the Lyapunov spectrum becomes invariant under shuffling the Jacobians at finite nontrivial values of the Lyapunov exponents (Fig. 9D). Finally, in the discrete-time case ∆t = 1 without a leak, temporal correlations between subsequent Jacobians can be neglected for large N [8]. For g → 1⁺, the Lyapunov spectrum can thus be obtained from a product of uncorrelated Gaussian matrices, whose eigenvalue distribution approximately follows a triangle law [81,82]. The full Lyapunov spectrum in this limit (Fig. 9B) can thus be approximated by the triangle law (Eq. 16), where the largest Lyapunov exponent λ_max can be obtained analytically as described earlier, both in the discrete- and continuous-time case with constant input and frozen white noise drive [1,2,8,11,59].
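The discrete-time random-matrix approximation above can be probed directly: for a product of independent Gaussian matrices, the QR-based Lyapunov exponents approach the classical large-N form λ(x) ≈ ½ ln(g²(1 − x)) with x = i/N (Newman's result for products of i.i.d. Gaussian matrices, the density behind the triangle law). The sketch below is our own toy check with illustrative parameters, not the paper's code.

```python
import numpy as np

def product_spectrum(N=100, g=1.0, T=2000, seed=1):
    """Lyapunov spectrum of a product of i.i.d. Gaussian matrices
    J_t ~ N(0, g^2/N), computed by QR reorthonormalization."""
    rng = np.random.default_rng(seed)
    Q = np.linalg.qr(rng.normal(size=(N, N)))[0]
    log_r = np.zeros(N)
    for _ in range(T):
        J = rng.normal(0.0, g / np.sqrt(N), size=(N, N))
        Q, R = np.linalg.qr(J @ Q)
        log_r += np.log(np.abs(np.diag(R)))
    return np.sort(log_r / T)[::-1]

lam = product_spectrum()
# Large-N prediction: lambda(x) ~ 0.5*ln(g^2*(1 - x)) for x = i/N,
# so lambda_max -> ln g.
```

For g = 1 the largest exponent sits at ln g = 0 and the mid-spectrum value at x = 1/2 is ½ ln(1/2) ≈ −0.35.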

VI. LYAPUNOV SPECTRUM OF BALANCED RATE NETWORK WITH THRESHOLD-LINEAR TRANSFER FUNCTIONS
While odd, saturating sigmoid transfer functions such as φ(x) = tanh(x) are popular because of their mathematical tractability [1,59] and because the saturation prevents runaway activity, the firing rates of many cortical neuron types do not appear to be limited by intrinsic electrophysiological features in their physiological operating regime. Evidence for this comes from the observation that artificially driven neurons can fire at much higher rates [83] than observed in experimental recordings of awake behaving animals [84]. In balanced networks, large external excitatory currents are dynamically canceled by net inhibitory recurrent currents, which yields a broad parameter regime of asynchronous irregular activity in spiking network models [85][86][87][88][89]. Such balanced state models were recently extended from spiking networks to firing-rate networks [2,4,7,90].
Here we extend our Lyapunov spectrum analysis to balanced networks with a threshold-linear transfer function and investigate the role of the synaptic coupling strength g on the Lyapunov spectrum (Fig. 10). Threshold-linear transfer functions are also commonly used in deep learning [91,92]. Another reason to investigate this transfer function is that experimentally measured neural nonlinearities in sensory neurons have been approximated by a power-law threshold nonlinearity [93][94][95][96][97].
For simplicity, we focus on the dynamics of an inhibitory network of N threshold-linear rate units that balance a constant excitatory external input. The dynamics of each firing-rate unit follows τ dh_i/dt = −h_i + Σ_j J_ij φ(h_j) + I, where I is a positive constant and φ(x) = max(x, 0). We draw the entries of the coupling matrix J_ij from a Gaussian distribution J_ij ∼ N(−µ/N, g²/N) (similar to [2]). As in the previously considered tanh networks, the first half of the Lyapunov spectrum is increasingly curved for increasing g (Fig. 10A). For large g, the network dynamics turns unstable and the activities h_i diverge [2]; therefore, there is no chaotic large-g limit at fixed mean coupling strength. The divergence occurs at much larger g than displayed in Fig. 10. The largest Lyapunov exponent shows the analytically predicted behavior [2]. We confirmed the results obtained from the Jacobian-based method [60] by tracking the amplitude of a small perturbation in direct numerical simulations (Fig. 10B). The dynamical entropy rate is zero for g ≤ g_crit and first grows with increasing g (Fig. 10C) up to a peak value g_peak. Beyond the peak, the growth of the few largest Lyapunov exponents is overcompensated by the decreasing number of positive exponents; thus, the entropy rate decreases for large g. We found that the attractor dimensionality also peaks as a function of g for balanced networks with threshold-linear transfer functions (Fig. 10D), but at higher values of g than the entropy rate. Thus, for larger g, the diversity of network states as quantified by D decreases.
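A minimal sketch of this model, with the largest Lyapunov exponent estimated from a single tangent vector along the trajectory, can look as follows. All parameter values are illustrative (not the paper's), and only the weak-coupling, stable case is exercised here; at weak g the exponent should sit close to −1/τ.

```python
import numpy as np

def lmax_threshold_linear(N=300, g=0.1, mu=5.0, I=1.0, tau=1.0,
                          dt=0.05, T=4000, seed=2):
    """Largest Lyapunov exponent of tau*dh/dt = -h + J@phi(h) + I,
    phi(x) = max(x, 0), J_ij ~ N(-mu/N, g^2/N), via a tangent vector."""
    rng = np.random.default_rng(seed)
    J = rng.normal(-mu / N, g / np.sqrt(N), size=(N, N))
    h = rng.normal(size=N)
    u = rng.normal(size=N)
    u /= np.linalg.norm(u)
    log_growth, n_acc = 0.0, 0
    for t in range(T):
        phi = np.maximum(h, 0.0)
        dphi = (h > 0.0).astype(float)      # phi'(h) for threshold-linear
        h = h + dt / tau * (-h + J @ phi + I)
        u = u + dt / tau * (-u + J @ (dphi * u))
        r = np.linalg.norm(u)
        u /= r
        if t >= T // 4:                     # discard transient
            log_growth += np.log(r)
            n_acc += 1
    return log_growth / (dt * n_acc)
```

Increasing g from here and watching the exponent cross zero gives a quick numerical handle on the transition to chaos discussed above.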

VII. POINT-SYMMETRY OF LYAPUNOV SPECTRA FOR CONTINUOUS-TIME DYNAMICS
A symmetry of Lyapunov spectra around zero is usually found in dynamical systems with a symplectic structure [98,99]. This is the case, for example, in Hamiltonian systems, where the Lyapunov spectrum is symmetric around zero. Symmetry around a negative value was previously described in a class of dissipative dynamical systems with viscous damping [98].
The recurrent neural networks we considered have an asymmetric connectivity, J_ij ≠ J_ji. Thus, there is no conserved energy and no exact time-reversal symmetry, and a pseudo-Hamiltonian structure is also absent. Moreover, our findings indicate that symmetric Lyapunov spectra are not a generic feature of recurrent neural networks. But what is the origin of the symmetry in our case of recurrent neural firing-rate networks? The symmetry of the Lyapunov spectrum of recurrent networks in the continuous-time limit originates in an approximate time-reversal symmetry of the dynamics. This can be seen directly by a change of variables into a reference frame that contracts with time. Introducing the new variables z = e^{t/τ} h transforms the original equation of motion; making the replacement z̃(t) = z(−t) then yields the same dynamics with coupling matrix J̃_ij = −J_ij. J̃_ij follows the same distribution as −J_ij. Since, for large N, the Lyapunov spectrum does not depend on the realization of the network connectivity, which is drawn from a Gaussian distribution J_ij ∼ N(0, g²/N), the Lyapunov spectrum is invariant under flipping the sign of J_ij. Thus, the dynamics is statistically invariant under time reversal, where 'statistically invariant' means with respect to the statistics of the connectivity. Growing tangent space volume elements in forward time correspond to shrinking tangent space volume elements in backward time. Because of the time-reversal symmetry, the eigenvalues of the Oseledets matrix in the contracting reference frame are approximately inverse between the forward and backward time directions. Thus, the Lyapunov exponents, given by the logarithms of the eigenvalues of the Oseledets matrix, satisfy the point-symmetry λ_i + λ_{N+1−i} ≈ −2/τ, where the offset −1/τ comes from the shrinking reference frame. Note that, in contrast to Hamiltonian systems, where the symplectic structure implies an exact symmetry of the Lyapunov spectrum (around zero), here the symmetry is only approximate for finite-size networks and holds around the negative inverse of the characteristic time scale, which acts through the leak term as a global damping on the dynamics. Also note that in autonomous systems that are not at a fixed point, there is always a zero Lyapunov exponent λ_neutral = 0, corresponding to neutral shifts along the direction of the flow, that does not have a symmetric partner at λ = −2/τ. Finally, we note that our symmetry argument assumes that there exist statistically analogous backward trajectories, which is generally not correct but is justified in our case by the statistical mirror-symmetry of the connectivity distribution p(J_ij) around zero. For example, this does not generally hold after training, where the sign-flipped connectivity J̃_ij = −J_ij can yield statistically very different dynamics.
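The approximate point-symmetry λ_i + λ_{N+1−i} ≈ −2/τ can be checked numerically by computing the full spectrum of a continuous-time tanh network. The sketch below uses a plain Euler step with a small ∆t and illustrative parameters; the paper's own integration and reorthonormalization details may differ, and the symmetry check is deliberately loose since the symmetry is only approximate at finite N and ∆t.

```python
import numpy as np

def full_spectrum_tanh(N=100, g=2.0, tau=1.0, dt=0.02, T=10000, seed=3):
    """Full Lyapunov spectrum of tau*dh/dt = -h + J@tanh(h) via Euler
    steps and QR reorthonormalization of the tangent dynamics."""
    rng = np.random.default_rng(seed)
    J = rng.normal(0.0, g / np.sqrt(N), size=(N, N))
    h = rng.normal(size=N)
    for _ in range(2000):                       # discard transient
        h = h + dt / tau * (-h + J @ np.tanh(h))
    Q = np.linalg.qr(rng.normal(size=(N, N)))[0]
    log_r = np.zeros(N)
    for _ in range(T):
        dphi = 1.0 - np.tanh(h) ** 2            # phi'(h) at time t
        h = h + dt / tau * (-h + J @ np.tanh(h))
        Q = Q + dt / tau * (-Q + J @ (dphi[:, None] * Q))
        Q, R = np.linalg.qr(Q)
        log_r += np.log(np.abs(np.diag(R)))
    return np.sort(log_r / (T * dt))[::-1]
```

The mean exponent should sit near −1/τ, and partner exponents λ_i and λ_{N+1−i} should approximately sum to −2/τ.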

VIII. DELOCALIZATION OF THE FIRST COVARIANT LYAPUNOV VECTOR
To quantify how many rate units contribute to the chaotic dynamics at each moment in time, we investigated properties of the covariant Lyapunov vectors v^(k)(t). The first covariant Lyapunov vector gives, at any point in time, the direction in which almost all initial infinitesimal perturbations grow with average rate λ_max. It corresponds to the first Gram-Schmidt vector and is denoted here as v. The number of rate units contributing to the maximally growing direction at time t can be measured by the participation ratio P(t) = (Σ_i |v_i(t)|⁴)^{−1} [100][101][102]. If all rate units contribute equally to the Lyapunov vector, |v_i(t)| = 1/√N, the participation ratio is P(t) = 1/(N/N²) = N. If only one rate unit contributes to the Lyapunov vector, the participation ratio is P(t) = 1. The Lyapunov vector of firing-rate networks indicates that a temporally varying subset of rate units governs the most unstable direction (Fig. 11A, C). There are only moderate temporal fluctuations of the participation ratio (Fig. 11C), which decline proportionally to 1/√N with network size (not shown).
The mean participation ratio P = ⟨P(t)⟩ was independent of g both for networks with tanh and with threshold-linear input-output transfer functions (Fig. 11B).
To further characterize the nature of the chaotic collective network state, we investigated the scaling of the mean participation ratio P with network size. Whether the Lyapunov vector is called localized or delocalized depends on how P scales with network size N. A delocalized state is indicated by a linear scaling P ∼ N, while in a localized state, the participation ratio is independent of N. We found a linear scaling P ∼ N (Fig. 11D) for both the tanh and the threshold-linear transfer function. This is consistent with assuming that the entries of the first covariant Lyapunov vector are independent Gaussian with v_i ∼ N(0, 1/N), which yields P = N/3. This is in contrast to chaos in sparse spiking neural networks, where a sublinear scaling of the participation ratio with network size has been reported for sparse networks of quadratic integrate-and-fire neurons in the balanced state [89]. We conclude that the direction of greatest instability in random rate networks is supported by a macroscopic number of rate units, which indicates the existence of collective Lyapunov modes that characterize the instability of the collective dynamics. This is a promising direction for future research that might link the microscopic phase space structure to macroscopic modes of activity [103].
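The participation ratio and the Gaussian prediction P ≈ N/3 are straightforward to verify numerically (a minimal sketch; the function name is ours):

```python
import numpy as np

def participation_ratio(v):
    """P = (sum_i v_i^4)^(-1) for a normalized vector v."""
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)
    return 1.0 / np.sum(v ** 4)

rng = np.random.default_rng(0)
N = 100_000
P_delocalized = participation_ratio(rng.normal(size=N))   # ~ N/3
e1 = np.zeros(N)
e1[0] = 1.0
P_localized = participation_ratio(e1)                     # = 1
```

For Gaussian entries, E[v_i⁴] = 3/N², so Σ_i v_i⁴ ≈ 3/N and P ≈ N/3, the delocalized linear scaling; a one-hot vector gives the fully localized value P = 1.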

IX. LYAPUNOV SPECTRUM OF EXTERNALLY DRIVEN NETWORK
Thus far, we have analyzed the autonomous dynamics of a deterministic firing-rate network, but it is interesting to extend this to a non-autonomous system driven by time-varying input [8,10,11,73,104]. We consider an input-driven network with τ = 1, governed by τ dh_i/dt = −h_i + Σ_j J_ij φ(h_j) + ξ_i(t), where, in the case considered here, the ξ_i are fixed realizations of independent Gaussian frozen white noise processes with autocorrelation function ⟨ξ_i(t)ξ_i(t + t')⟩ = τσ²δ(t').
To assess the dynamic stability of the (frozen) stochastic differential equation, we employ the theory of random dynamical systems (RDS). This theory characterizes how reliably different initial states respond to a frozen external input realization. We call a system reliable if different initial conditions converge to the same (time-dependent) trajectory, and unreliable otherwise [105]. More formally, the evolution of a sample measure µ_ξ^t is studied for a frozen noise realization ξ(t) with t ∈ (−∞, ∞). This is described in more detail in Appendix F. The mathematical expression for the Jacobian of the flow of the dynamics is the same as in the autonomous case, Eq. 2. Despite this similarity, however, an external input can have a strong effect both on the distribution of h_i and on the autocorrelations ∆_i(τ) = ⟨δh_i(t)δh_i(t + τ)⟩.
First, input fluctuations increase the width of the distribution of h_i, meaning that more units are in the saturated regime and the Jacobian becomes sparser, which suppresses chaos [8]. Second, input fluctuations temporally decorrelate network states, which destroys temporal correlations of subsequent Jacobians, resulting in an independent dynamic reduction of chaos [11]. The full Lyapunov spectrum, which is independent of the input realization ξ [106], is again obtained by a reorthonormalization procedure of the Jacobians along a numerical solution of the stochastic differential equation integrated with the Euler-Maruyama method [60]. For details, see Appendix B. We explored the effect of increasing input strength σ on the Lyapunov spectrum (Fig. 12). For increasing σ, the Lyapunov spectrum is increasingly pushed towards the mean Lyapunov exponent −1/τ (Fig. 12A). Increasing σ monotonically reduces the largest Lyapunov exponent, as previously observed in discrete [8,73] and continuous-time networks [11] (Fig. 12B). A similar effect has been observed in rate networks driven by periodic input [9,10].
Figure 12. A For increasing input strength σ, the Lyapunov spectrum is increasingly pushed towards the mean Lyapunov exponent −1/τ. B The largest Lyapunov exponent decreases, and the transition is smoothed, consistent with previous work [11]. C The dynamical entropy rate H is reduced. D The relative attractor dimensionality D/N decreases for increasing σ. (Parameters: N = 1000, ∆t = 10⁻²τ, t_ONS = τ, t_sim = 10³τ, averages across 10 network realizations).

Input fluctuations reduce the dynamical entropy rate and attractor dimensionality
The dynamical entropy rate for a given external input, which is calculated from the sum of the positive Lyapunov exponents, decreases for increasing external input strength σ (Fig. 12C). For sufficiently strong input, the entropy rate drops to zero. Thus, time-varying input impedes the flow of information from the microscopic states to the macroscopic network states. If the information in the microscopic state is considered to be noise, one can conclude that stronger external input fluctuations reduce the noise entropy arising from the sensitivity to initial conditions. The attractor dimensionality also decreases for increasing input strength σ (Fig. 12D). Sufficiently strong input suppresses chaos, implying that the sample measure collapses onto a (wandering) random sink [107,108]. In other words, almost all initial conditions converge onto a set of measure zero. Thus, while the network dynamics with strong time-varying input might still appear high-dimensional, the attractor dimensionality given the external input can shrink drastically. Such a transition is relevant for information processing because the network loses its dependence on initial conditions, which can be a desirable feature if the network must reliably generate different output trajectories for different input patterns [14][15][16].
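The input-induced suppression of chaos can be illustrated with a small Euler-Maruyama sketch that tracks the largest Lyapunov exponent for two noise strengths. All parameters are illustrative, and only the top exponent is computed here (a single tangent vector rather than the full reorthonormalized basis):

```python
import numpy as np

def lmax_driven(g=4.0, sigma=0.0, N=200, tau=1.0, dt=0.01, T=20000, seed=4):
    """Largest Lyapunov exponent of the noise-driven network
    tau*dh = (-h + J@tanh(h))*dt + sigma*sqrt(dt/tau)*xi (Euler-Maruyama),
    estimated from the growth rate of a single tangent vector."""
    rng = np.random.default_rng(seed)
    J = rng.normal(0.0, g / np.sqrt(N), size=(N, N))
    h = rng.normal(size=N)
    u = rng.normal(size=N)
    u /= np.linalg.norm(u)
    log_growth, n_acc = 0.0, 0
    for t in range(T):
        dphi = 1.0 - np.tanh(h) ** 2
        h = (h + dt / tau * (-h + J @ np.tanh(h))
             + sigma * np.sqrt(dt / tau) * rng.normal(size=N))
        u = u + dt / tau * (-u + J @ (dphi * u))
        r = np.linalg.norm(u)
        u /= r
        if t >= T // 4:                 # discard transient
            log_growth += np.log(r)
            n_acc += 1
    return log_growth / (dt * n_acc)
```

With the same connectivity realization, the exponent for strong noise drive should come out below the autonomous value, consistent with the monotone suppression described above.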

X. APPLICATIONS TO QUANTIFYING STABILITY OF TRAINED RECURRENT NEURAL NETWORKS
The networks we studied up to this point had random connectivity, but collective network dynamics is strongly shaped by wiring, and learning algorithms for training recurrent neural networks in machine learning work by tuning connectivity. We now show that training a recurrent network to perform a task is reflected in its dynamic stability, as quantified by the Lyapunov spectrum, and show in several examples how training can affect the dimensionality and dynamical entropy rate.
During training, the network dynamics becomes confined to a low-dimensional manifold (Fig. 13). When initializing with a random network structure in the chaotic regime (g > 1), the dynamics evolves on a high-dimensional attractor that spans an extensive fraction of the full N-dimensional phase space (Fig. 2). A projection of the high-dimensional chaotic strange attractor onto the first two principal components is shown for a network of 50 rate units in Fig. 13B. After training the network to perform a simple sine oscillation through a linear readout, the network dynamics is confined to a periodic orbit (Fig. 13D). Note that the sine is computed by the coordinated activity of many rate units that individually have dynamics different from the sine target (Fig. 13C). For this task, the largest Lyapunov exponent becomes zero after training. This is expected for an autonomous network, because all directions but the neutral one along the flow become stabilized. Therefore, the dynamical entropy rate is trivially zero, and the attractor dimension is unity.
The Lyapunov spectrum can also be used to quantify how stable trajectories are after training. In Fig. 14, we compare the result of training a recurrent rate network to output an oscillation with temporally varying frequency in response to an input pulse with three different training algorithms: backpropagation through time (BPTT), FORCE [15], and full-FORCE [109]. In BPTT, the full recurrent weight matrix and a readout vector are iteratively adapted by minimizing an error function using (stochastic) gradient descent [110]. FORCE recursively updates a rank-one perturbation u_i w_j to J_ij such that the linear readout z(t) = Σ_j w_j φ(x_j(t)) matches a (potentially time-varying) target output. Full-FORCE performs a full-rank recursive update of a task-performing network to match the activity of each unit to a teacher network. For full-FORCE, the Jacobian of the dynamics has the same form as in the random case, with the trained full-rank coupling matrix; for FORCE, the rank-one feedback term u_i w_j adds to J_ij in the Jacobian. We obtained Lyapunov exponents by evolving an orthonormal basis along the trajectory using the analytical Jacobians, as described before and in more detail in Appendix B.
We find that the full-rank method full-FORCE results in a more negative largest Lyapunov exponent and thus microscopically more stable dynamics (Fig. 14). Moreover, the subsequent Lyapunov exponents drop more quickly towards the negative inverse of the characteristic timescale, −1/τ. The external input pulse makes the dynamics non-autonomous; therefore, no neutral Lyapunov exponent occurs. Note that convergence of infinitesimally different initial conditions does not necessarily imply stability with respect to finite-size perturbations. For example, spiking networks can exhibit multistability [111], and trained firing-rate networks also often exhibit multistability (not shown).

XI. LYAPUNOV SPECTRUM OF RECURRENT LSTM NETWORK
Training recurrent neural networks on tasks that involve long time lags with gradient-based methods is hampered by the loss of gradient information. Long short-term memory (LSTM) units were introduced to ameliorate this problem of vanishing or exploding gradients by adding a latent (potentially slow) additional degree of freedom for each rate unit, with dedicated input, output, and forget gates that conspire to retain information over extended time lags [112]. The dynamics of the LSTM units follow [112]

f_t = σ(W_f x_t + U_f h_{t−1} + b_f),
i_t = σ(W_i x_t + U_i h_{t−1} + b_i),
o_t = σ(W_o x_t + U_o h_{t−1} + b_o),
c_t = f_t ⊙ c_{t−1} + i_t ⊙ tanh(W_c x_t + U_c h_{t−1} + b_c),
h_t = o_t ⊙ tanh(c_t),

where ⊙ denotes the Hadamard product, σ(x) = 1/(1 + exp(−x)) is the sigmoid function, the entries of U_x are drawn from U_x ∼ N(0, g_x²/N), and the bias terms b_x are scalars for simplicity. The full Lyapunov spectrum is again obtained by a reorthonormalization procedure of the Jacobians along a numerical solution of the map [60]. For details, see Appendix B. As a proof of concept, we calculate Lyapunov spectra of recurrent LSTM networks in the autonomous case W_x = 0 (Fig. 15). We find that saturating the forget gates by increasing b_f results in slow latent modes and an accumulation of Lyapunov exponents close to 0 (Fig. 15A). For increasing bias current in the forget gate b_f, the first half of the Lyapunov spectrum is increasingly pushed towards zero; concomitantly, the autocorrelation of the latent state indicates the emergence of slow latent modes. This finding is consistent with previous theoretical work based on spectra of the state-to-state Jacobian D (analogous to Eq. 2 in our case), which suggested an accumulation of eigenvalues of the Jacobian close to 1 for closed forget gates [36,37]. At the same time, the second half of the Lyapunov spectrum drops sharply to very negative values for increasing b_f, similarly to the classical tanh rate network with discrete-time dynamics (Fig. 5A), with no plateau from an intrinsic characteristic timescale (such as the one coming from a leak term (Fig. 9C), a synaptic integration timescale, or an adaptation current [4,6,113]). Moreover, our results indicate that LSTM networks can have a high attractor dimensionality D even in a weakly chaotic state when the forget gates are saturated (Fig. 15D), as a growing number of near-zero Lyapunov exponents is necessary to yield a total sum of zero. In contrast, the dynamical entropy rate H is governed only by the positive Lyapunov exponents, which do not reflect the large number of Lyapunov exponents close to zero for increasing b_f (Fig. 15C). We find this phenomenon independent of network size N, which suggests extensive chaos (not shown), as in the classical rate networks (Fig. 2). Driving each LSTM unit with an independent Gaussian white noise process (W_x ≠ 0) leads to a reduction of chaos, with decreasing dynamical entropy rate H and attractor dimensionality D, as expected from the case of the classical rate networks [11] (Fig. 12) (not shown).
To what extent our findings on the asymptotic dynamics of random LSTM networks explain typical training scenarios, with time-dependent external input, performance-optimized network structure, and finite time intervals, remains to be determined in future studies.
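A toy version of the autonomous LSTM network (W_x = 0) can be sketched in a few lines, here estimating only the largest Lyapunov exponent from the separation of two nearby (h, c) trajectories rather than from analytical Jacobians. The parameter choices and the two-trajectory method are our simplifications, not the paper's procedure:

```python
import numpy as np

def lstm_lmax(N=100, g=1.5, b_f=-10.0, T=2000, seed=5):
    """Largest Lyapunov exponent of an autonomous LSTM network (W_x = 0),
    estimated by tracking a small perturbation of the (h, c) state."""
    rng = np.random.default_rng(seed)
    U = {k: rng.normal(0.0, g / np.sqrt(N), size=(N, N)) for k in "fioc"}
    b = {"f": b_f, "i": 0.0, "o": 0.0, "c": 0.0}
    sig = lambda x: 1.0 / (1.0 + np.exp(-x))

    def step(h, c):
        f = sig(U["f"] @ h + b["f"])
        i = sig(U["i"] @ h + b["i"])
        o = sig(U["o"] @ h + b["o"])
        c_new = f * c + i * np.tanh(U["c"] @ h + b["c"])
        return o * np.tanh(c_new), c_new

    h, c = 0.5 * rng.normal(size=N), 0.5 * rng.normal(size=N)
    eps = 1e-7
    h2, c2 = h + eps * rng.normal(size=N), c + eps * rng.normal(size=N)
    d0 = np.sqrt(np.sum((h2 - h) ** 2) + np.sum((c2 - c) ** 2))
    log_growth, n_acc = 0.0, 0
    for t in range(T):
        h, c = step(h, c)
        h2, c2 = step(h2, c2)
        d = np.sqrt(np.sum((h2 - h) ** 2) + np.sum((c2 - c) ** 2))
        if t >= T // 4:                     # discard transient
            log_growth += np.log(d / d0)
            n_acc += 1
        # renormalize the perturbation back to size d0
        h2 = h + (h2 - h) * d0 / d
        c2 = c + (c2 - c) * d0 / d
    return log_growth / n_acc
```

With weak coupling and a strongly negative forget-gate bias (gate effectively closed), the map contracts to a fixed point and the exponent is clearly negative; sweeping b_f upward from there probes the slow-mode regime discussed above.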

XII. RELATING GRADIENTS IN BACKPROPAGATION THROUGH TIME TO THE FULL LYAPUNOV SPECTRUM
We found a direct mathematical link between features of the Lyapunov spectrum of the recurrent network dynamics and the problem of vanishing and exploding gradients when training with backpropagation through time. In backpropagation through time, all connection weights are iteratively updated by stochastic gradient descent such that a loss is locally reduced [114][115][116][117]. The gradient of the loss with respect to the weights of the recurrent network is evaluated by unrolling the network dynamics in time. The resulting expression for the gradient involves the long-term Jacobian, which is also used to calculate the Lyapunov spectrum (see Appendix J). A common problem in training recurrent networks is that the gradients tend to vanish or explode, especially in the case of long temporal dependencies [33,110].
The singular values of the long-term Jacobian, which determine how quickly gradients vanish or explode during backpropagation through time, are directly related to the Lyapunov exponents of the dynamics: the Lyapunov exponents are given by the logarithms of the singular values of the long-term Jacobian, normalized by the time span (see Appendix J). This justifies several conclusions for backpropagation through time. To avoid diverging or vanishing gradients, recurrent networks should be initialized such that many singular values of the long-term Jacobian are close to one [31][34][35][36][37][38], which amounts to having many Lyapunov exponents of the forward dynamics close to zero, corresponding to slowly growing/shrinking directions in tangent space. As the product of Jacobians is generally numerically ill-conditioned, we suggest using the orthonormalization procedure discussed here to quantify the stability of the tangent space. Furthermore, the maximum time lag across which a recurrent neural network can be trained with BPTT before running into vanishing/exploding gradients is constrained by the Lyapunov exponents of the forward dynamics. We can thus use Lyapunov exponents to compare the effect of different initializations, nonlinearities, and optimizers on trainability. We predict that after learning long-term dependencies, some Lyapunov exponents should be close to zero, reflecting the slow timescales.

A. Summary
We used canonical measures from the ergodic theory of strange attractors to characterize the chaotic dynamics of randomly wired networks of firing-rate units. This is, to our knowledge, the first time the full Lyapunov spectrum of a continuous-time random rate network has been calculated and used to study the dynamical entropy rate and attractor dimensionality. We showed that, in the classical model, the dynamical entropy rate and the relative attractor dimensionality first grow and then saturate as a function of the coupling strength g. Thus, both the intensity and the diversity of network activity states saturate for strong coupling, despite a monotonically growing largest Lyapunov exponent. We analytically approximated the full Lyapunov spectrum in several limiting cases using random matrix theory. We found that time-varying input reduces both entropy rate and dimensionality.
We demonstrated that the shape of the Lyapunov spectrum is size-invariant, with a linear growth of attractor dimensionality and entropy rate with network size N. This is clear evidence of extensive chaos, which was previously conjectured in [1]. We further found the Lyapunov spectrum to be point-symmetric around the mean Lyapunov exponent −1/τ, which we derived analytically (Appendix H). Note that the symmetry would be around −1 if time were measured in units of τ. A symmetry of Lyapunov spectra around zero is usually found in dynamical systems with a symplectic structure [98,99]. Symmetry around a negative value was previously described in a class of dissipative dynamical systems with viscous damping [98]. We found a strong effect of time discretization: increasing the step size breaks the symmetry of the Lyapunov spectrum and sharply increases the entropy rate and dimensionality. This has methodological implications for further studies: the parallel update of chaotic discrete-time network dynamics has fundamentally different properties than the continuous-time limit.
In balanced networks of threshold-linear units, we found that both the entropy rate and the attractor dimensionality increase for small values of g, as in the classical model. For larger values of g, first the entropy rate and later also the attractor dimensionality peak as a function of g. Different from the classical tanh model, the Lyapunov spectrum is not point-symmetric. Time-dependent input reduces both the entropy rate and the attractor dimensionality. For strong input, we found that all trajectories collapse onto a time-dependent random sink. If the input is interpreted as an input signal, this means that trajectories are reliable across repetitions of the same frozen input realization and do not depend on the initial conditions of the recurrent network. Finally, we showed that Lyapunov spectra are a useful tool to characterize the dynamic stability properties of trained networks and to analyze the solution trajectories without assuming fixed points or 'slow points', as done, for instance, in [118][119][120][121]. Moreover, we showed a direct link between the Lyapunov exponents of the forward dynamics and the gradient stability when training recurrent neural networks with backpropagation through time, which constrains trainability and stability.

B. Relation to previous work
Firing-rate networks can generate spontaneous rate fluctuations by recurrent chaotic dynamics [1]. Mechanisms underlying rate chaos have attracted substantial attention in studies of network heterogeneity [3], bistability [5], external stimuli [8][9][10][11], and the roles of the single-unit transfer function [2] and slow synaptic dynamics [4,7] for the collective network state; see also, e.g., [72,73,90,122–126]. Our approach provides a toolkit from dynamical systems theory to analyze how these different factors shape the complex rate dynamics. We compared the attractor dimension with a dimensionality estimate based on principal component analysis, which is commonly used in neuroscience [9,10,65,66,127]. We find a qualitatively similar but quantitatively different behavior of the PCA-based dimensionality and the attractor dimension: both saturate with synaptic strength for g > 1, but they saturate at different levels and with distinct rates. Note that Lyapunov exponents and the attractor dimension are invariant under diffeomorphisms of the phase space [50], while PCA-based dimensionality estimates are generally not invariant with respect to changes of coordinates and can be misleading for limited data sets [128]. Because the PCA-based dimensionality estimates are based on a two-point correlation function, they miss low-dimensional structure hidden in higher-order correlations. Generally, the PCA-based dimensionality can both under- and overestimate the attractor dimensionality.
Our approach allows interpolating from continuous-time to discrete-time dynamics. Discrete-time dynamics of rate networks has previously been studied in random diluted networks [72], noise-driven networks [8], and ring networks [129,130]. Chaotic rate dynamics provides a substrate for complex nonlinear computations, such as learning input-output relations [13,15,17,26,119,131] and learning temporal sequences [16]. Intriguingly, transient chaos across layers yields exponential expressivity in deep networks [32].
Our tools facilitate quantifying the reorganization of the collective network dynamics during learning and the underlying mechanisms of different computing strategies. A suppression of chaos by time-dependent input was studied previously, both with white noise input in discrete-time [8] and continuous-time networks [11] and with sinusoidal input [10]. Such a transition is relevant for information processing because the network loses its dependence on initial conditions, which is expected to affect the ability of a network to generate controlled output trajectories in response to certain input patterns after learning [14][15][16]. A transition to complete control by an external stimulus, with concomitant independence of initial conditions, was previously studied in rate networks in the context of echo state networks for reservoir computing and termed the echo state property [132][133][134][135].
An accumulation of Lyapunov exponents close to zero is consistent with previous theoretical work based on spectra of the state-to-state Jacobian D (analogous to Eq. 2 in our case), which suggested an accumulation of eigenvalues of the Jacobian close to 1 for closed forget gates [36,37].

C. Outlook
We are only beginning to use ergodic theory to understand neural computation. By employing these concepts in large-scale rate networks, we have laid a foundation for further investigation. Computational ergodic theory of firing-rate networks is currently the only way to measure information-theoretic quantities in large recurrent circuits. It is an important challenge to obtain a more comprehensive understanding of how different factors shape collective network dynamics.
The link between firing-rate networks and spiking neural networks has been studied by investigating networks in the limit of very slow synaptic dynamics. In this limit, the synaptic input current integrates over a long time, and the network dynamics is analogous to that of a rate network [7], with quantitatively similar activity fluctuations. An interpolation from spiking to rate dynamics with increasing τ_s and a comparison of the associated Lyapunov spectra of rate and spiking networks might improve our understanding of chaos in both spiking and rate networks.
Collective network dynamics is expected to be strongly shaped by wiring, and learning algorithms operate by modifying connectivity. Investigating how features of connectivity shape the dynamics is therefore important and could also be pursued with these tools. The role of an excess of bidirectional connections [136], of other second-order motifs [137], and of strong self-coupling [5] could all be examined. A time-resolved analysis of the dynamic stability of the recurrent network dynamics using covariant Lyapunov vectors and local Lyapunov exponents can also help to understand the mechanisms of learning, and under what conditions the training of recurrent networks fails.
The largest Lyapunov exponent quantifies the average rate of exponential divergence or convergence of nearby initial conditions:

$\lambda_{\max}(x_0) = \lim_{t \to \infty} \lim_{\epsilon \to 0} \frac{1}{t} \log \frac{\|u_t\|}{\|u_0\|},$

where $u_t$ is an initially infinitesimal perturbation with $\|u_0\| = \epsilon$. In dynamical systems that are ergodic on the attractor, the Lyapunov exponents do not depend on the initial conditions, as long as the initial conditions are in the basin of attraction of the attractor. Note that it is crucial to first take the limit $\epsilon \to 0$ and then $t \to \infty$, as $\lambda_{\max}(x_0)$ would be trivially zero for a bounded attractor if the limits were exchanged, because $\lim_{t\to\infty} \frac{1}{t}\log \frac{\|u_t\|}{\|u_0\|}$ is bounded for finite perturbations even if the system is chaotic. To measure m Lyapunov exponents, one has to study the evolution of m independent infinitesimal perturbations $u_s$ spanning the tangent space:

$u_{s+1} = D_s(x_s)\, u_s,$

where the N × N Jacobian $D_s(x_s) = \mathrm{d}f(x_s)/\mathrm{d}x$ characterizes the evolution of generic infinitesimal perturbations during one step. Note that this Jacobian along the trajectory is equivalent to a stability matrix only at a fixed point, i.e., when $x_{s+1} = x_s$. We are interested in the asymptotic behavior, and therefore we study the long-term Jacobian

$T_t(x_0) = D_{t-1}(x_{t-1}) \cdots D_1(x_1)\, D_0(x_0). \quad (\mathrm{A3})$

Note that $T_t(x_0)$ is a product of generally noncommuting matrices. The Lyapunov exponents $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_N$ are defined as the logarithms of the eigenvalues of the Oseledets matrix

$\Lambda(x_0) = \lim_{t \to \infty} \left[ T_t(x_0)^\top T_t(x_0) \right]^{1/(2t)}, \quad (\mathrm{A4})$

where $\top$ denotes the transpose operation. The expression inside the brackets is the Gram matrix of the long-term Jacobian $T_t(x_0)$. Geometrically, the determinant of the Gram matrix is the squared volume of the parallelotope spanned by the columns of $T_t(x_0)$. Thus, the exponential volume growth rate is given by the sum of the logarithms of its first m (sorted) eigenvalues. Oseledets' multiplicative ergodic theorem guarantees the existence of the Oseledets matrix $\Lambda(x_0)$ for almost all initial conditions $x_0$ [75]. In ergodic systems, the Lyapunov exponents $\lambda_i$ do not depend on the initial condition $x_0$. However, for a numerical calculation of the Lyapunov spectrum, Eq. A4 cannot be used directly, because the long-term Jacobian $T_t(x_0)$ quickly becomes ill-conditioned, i.e., the ratio between its largest and smallest singular value diverges exponentially with time. For calculating the first m Lyapunov exponents, we exploit the fact that the growth rate of an m-dimensional infinitesimal volume element is given by $\sum_{i=1}^{m} \lambda_i$ [60]. The volume growth rates can be obtained via QR-decomposition. First, one evolves an orthonormal basis $Q_s = [q^1_s, q^2_s, \dots, q^m_s]$ in time using the Jacobian $D_s$:

$\tilde{Q}_{s+1} = D_s Q_s.$

Second, the volume growth rates are obtained by applying a QR-decomposition,

$\tilde{Q}_{s+1} = Q_{s+1} R_{s+1}.$

As a result, the non-orthonormal matrix $\tilde{Q}_{s+1}$ is uniquely decomposed into an orthonormal matrix $Q_{s+1}$ of size N × m, so $Q_{s+1}^\top Q_{s+1} = \mathbb{1}_{m \times m}$, and an upper triangular matrix $R_{s+1}$ of size m × m with positive diagonal elements.
Geometrically, $Q_{s+1}$ describes the rotation of $Q_s$ caused by $D_s$; the diagonal entries of $R_{s+1}$ describe the stretching or shrinking of $Q_s$, while the off-diagonal elements describe the shearing. Fig. 16 visualizes $D_s$ and the QR-decomposition for m = 2. The Lyapunov exponents are given by the time-averaged logarithms of the diagonal elements of $R_s$:

$\lambda_i = \lim_{t_{\mathrm{sim}} \to \infty} \frac{1}{t_{\mathrm{sim}}} \sum_s \log R^{ii}_s.$

Note that the QR-decomposition does not need to be performed in every simulation step, just sufficiently often that the accumulated Jacobian product between reorthonormalizations remains well-conditioned [60]. An appropriate reorthonormalization interval $w_{\mathrm{ONS}} = t_{\mathrm{ONS}}/\Delta t$ thus depends on the condition number, the ratio of the largest and smallest singular value. The condition number can be estimated based on the largest and smallest Lyapunov exponent that is calculated: $\kappa_2 \approx \exp\big((\lambda_1 - \lambda_m)\, t_{\mathrm{ONS}}\big)$. Thus, an appropriate reorthonormalization interval is given by $t_{\mathrm{ONS}} = O\big(\log(\kappa_2)/(\lambda_1 - \lambda_m)\big)$, where $\kappa_2$ is some acceptable condition number. The acceptable condition number depends on the desired accuracy of the entries of $R_{s+w}$. Because the dynamical system first has to converge onto the attractor, and the initially random orthonormal basis is not aligned, i.e., the first vector does not point in the direction of the first covariant Lyapunov vector, and so on, an initial transient should be discarded. It is guaranteed that under general conditions initially random orthonormal systems converge exponentially towards a unique basis given by the eigenvectors of the Oseledets matrix Eq. A4 [62]. A minimal example of this algorithm in pseudocode is shown in the main text (see IV). A feasible strategy to determine $t_{\mathrm{ONS}}$ is to first obtain a rough estimate of the Lyapunov spectrum using a short simulation time $t_{\mathrm{sim}}$ and a small $t_{\mathrm{ONS}}$, and then to repeat with a longer simulation time and a $t_{\mathrm{ONS}}$ based on the rough estimate. Another strategy is to iteratively adapt $t_{\mathrm{ONS}}$ on a short simulation run until the condition number is acceptable.
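To complement the pseudocode referenced above, here is a minimal, self-contained Python (NumPy) sketch of this reorthonormalization procedure for the rate network τ dh/dt = −h + J tanh(h) with Euler steps. All parameter values are illustrative defaults, not those used in any figure of the paper.

```python
import numpy as np

def lyapunov_spectrum(N=200, g=3.0, dt=0.05, tau=1.0, t_sim=200.0,
                      t_ons=1.0, m=20, t_warmup=20.0, seed=0):
    """First m Lyapunov exponents of tau*dh/dt = -h + J*tanh(h) (Euler scheme)
    by evolving an orthonormal basis in tangent space with repeated QR."""
    rng = np.random.default_rng(seed)
    J = rng.normal(0.0, g / np.sqrt(N), (N, N))   # random coupling, variance g^2/N
    h = rng.normal(0.0, 1.0, N)

    # discard an initial transient so the trajectory settles onto the attractor
    for _ in range(int(t_warmup / dt)):
        h = h + (dt / tau) * (-h + J @ np.tanh(h))

    Q = np.linalg.qr(rng.normal(size=(N, m)))[0]  # random orthonormal basis
    w_ons = max(1, int(round(t_ons / dt)))        # steps between QR reorthonormalizations
    n_ons = int(t_sim / (w_ons * dt))             # number of QR intervals
    log_diag = np.zeros(m)
    for _ in range(n_ons):
        for _ in range(w_ons):
            # one-step Jacobian of the Euler map:
            # D_ij = (1 - dt/tau) delta_ij + (dt/tau) J_ij phi'(h_j)
            phi_prime = 1.0 - np.tanh(h) ** 2
            Q = (1.0 - dt / tau) * Q + (dt / tau) * J @ (phi_prime[:, None] * Q)
            h = h + (dt / tau) * (-h + J @ np.tanh(h))
        Q, R = np.linalg.qr(Q)
        log_diag += np.log(np.abs(np.diag(R)))
    return log_diag / (n_ons * w_ons * dt)
```

For g > 1, the leading returned exponent is positive, reflecting chaos; the transient and basis-alignment caveats discussed above are handled crudely by the warm-up loop.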

Appendix C: Convergence of the Lyapunov spectrum
We checked the convergence of the Lyapunov spectrum as a function of different simulation parameters. First, the Lyapunov exponents were checked to converge with simulation time t_sim (Fig. 17), which shows the temporal convergence of selected Lyapunov exponents for ten random network realizations and different values of g and σ. The Lyapunov spectra were independent of initial conditions but showed some variability across different realizations of the random network structure. There are two main contributions to the variability of numerically calculated Lyapunov spectra: finite-time sampling noise and quenched fluctuations. Lyapunov exponents are asymptotic properties numerically estimated from finite-time calculations; this first contribution would vanish in the limit of long simulations for ergodic systems. Variability also arises from the quenched disorder of different random network realizations; this second contribution is expected to vanish in the large-network limit due to self-averaging, since self-averaging quantities converge to the ensemble average in the limit of large system size. Second, we confirmed that the orthonormalization interval was chosen sufficiently small (Fig. 18A). If the reorthonormalization is not carried out sufficiently often, the long-term Jacobian T_t(x_0) becomes ill-conditioned. As a consequence, the orthonormalization becomes numerically unstable, and errors start to accumulate. This results in a flattening of the Lyapunov spectrum beginning at the small Lyapunov exponents (Fig. 18A, D). As described above, a suitable orthonormalization interval scales inversely with the difference between the largest and smallest calculated Lyapunov exponent, |λ_1 − λ_k|. Therefore, it is no surprise that for large ∆t the errors in the Lyapunov spectrum grow faster with t_ONS (Fig. 18C, D), because the difference |λ_1 − λ_k| is larger (Fig. 18A). Third, we checked convergence with the integration time step ∆t (Fig. 6A). For large g, the integration time step ∆t has to be chosen smaller, because the autocorrelation time of the Jacobians becomes very short (τ_AC ≪ τ), although the autocorrelation of the dynamical variables h_i stays finite even for g → ∞ [1,59]. Fourth, we confirmed the convergence of the shape of the Lyapunov spectrum for large network size N (Fig. 2B). Note that even for very small ∆t, there exists a small asymmetry in the Lyapunov spectrum because of the neutral Lyapunov exponent (λ_i = 0). Thus, the Lyapunov spectrum is only symmetric in the limits N → ∞ and ∆t → 0. Fifth, we confirmed numerically that the neutral Lyapunov exponent (λ_i = 0) associated with a perturbation in the direction of the flow converges towards zero in the limit of small ∆t (not shown). Sixth, we confirmed numerically that the Lyapunov spectrum does not depend on the realization of the initially random orthonormal system. A dependence on the realization of the orthonormal system would indicate that the ONS did not converge to the eigenvectors of the Oseledets matrix Eq. A4 [62] (not shown).
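As a minimal illustration of such a finite-time convergence check, one can use a system whose largest Lyapunov exponent is known exactly: the logistic map x → 4x(1−x) has λ = ln 2, so the finite-time estimate can be compared against ground truth as t_sim grows. This is a toy sketch, not the network model of the paper.

```python
import math

def logistic_lyapunov(t_sim, x0=0.3, t_warmup=1000):
    """Finite-time Lyapunov-exponent estimate for x -> 4x(1-x); exact value is ln 2."""
    x = x0
    for _ in range(t_warmup):                        # discard the initial transient
        x = 4.0 * x * (1.0 - x)
    acc = 0.0
    for _ in range(t_sim):
        acc += math.log(abs(4.0 * (1.0 - 2.0 * x)))  # log |f'(x)| along the orbit
        x = 4.0 * x * (1.0 - x)
        x = min(max(x, 1e-12), 1.0 - 1e-12)          # guard against finite-precision escape
    return acc / t_sim

short_est = logistic_lyapunov(10_000)
long_est = logistic_lyapunov(1_000_000)
```

The deviation of the estimate from ln 2 shrinks roughly like 1/√t_sim, mirroring the finite-time sampling noise discussed above.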
Seventh, for large N, the numerical estimate of the largest Lyapunov exponent can be compared to the one calculated analytically using dynamical mean-field theory [1,2,8,11,59] (Fig. 19A, C).
Eighth, we confirmed that the Lyapunov spectrum does not systematically change when increasing the floating-point precision, by using arbitrary-precision floating-point arithmetic in spot checks (not shown).

Appendix D: Entropy rate

Chaos of a dynamical system is always associated with a dynamical entropy rate, because nearby states, which could not be distinguished by a finite-precision readout, are pulled apart by the sensitive dependence on initial conditions [41]. This concept was formalized by Kolmogorov and Sinai in 1959 and termed metric entropy (also called Kolmogorov-Sinai entropy or dynamical entropy rate) [40,42,51,58,145]. Ruelle showed that the sum of the positive Lyapunov exponents gives an upper bound on the Kolmogorov-Sinai entropy [146]:

$h_{\mathrm{KS}} \le \sum_{\lambda_i > 0} \lambda_i.$

Equality holds if and only if the system is endowed with an SRB (Sinai-Ruelle-Bowen) measure (Pesin entropy formula) [147]. An f-invariant Borel probability measure µ is an SRB measure if the conditional probability of µ on smooth manifolds is absolutely continuous [148]; f-invariant means here that $\mu(f^{-1}(A)) = \mu(A)$ for all measurable sets A. The Pesin entropy formula thus says that uncertainty in the prediction of future states comes from positive Lyapunov exponents, or more precisely from the expanding manifolds with smooth densities [42]. In several classes of dynamical systems, the existence of an SRB measure was proved [149]. The angles between unstable and stable manifolds can be used to test numerically whether a system is hyperbolic. If a dynamical system is hyperbolic, there is always a finite angle between stable and unstable manifolds. In this case, the existence of an SRB measure is guaranteed [51,150-152].
Appendix E: Attractor dimensionality

The trajectory of a dissipative chaotic system with N degrees of freedom does not cover the whole phase space. After a transient period, it relaxes onto an attractor, which has a dimensionality D ≤ N. This can be a zero-dimensional fixed point, a one-dimensional periodic orbit, a higher-dimensional quasi-periodic orbit, or, in the case of a chaotic system, a strange attractor with typically non-integer dimensionality. Such a strange attractor is often a fractal set, and one classical approach to measuring its dimensionality is box counting. The idea is to count the number M of N-dimensional boxes of side length a that are necessary to cover the attractor. The box-counting dimension is then defined as

$D_0 = -\lim_{a \to 0} \frac{\log M(a)}{\log a}.$

For increasing dimension, one runs into the curse of dimensionality, because the amount of data necessary for box counting scales exponentially with the dimensionality. A more general concept of dimensionality of fractals is given by the Rényi dimension (also called generalized dimension) [54]. The Rényi dimension of order α is given by

$D_\alpha = \lim_{a \to 0} \frac{1}{\alpha - 1} \frac{\log \sum_i p_i^\alpha}{\log a},$

where $p_i$ is the probability mass of the attractor in box i. For α = 0, the capacity dimension (box-counting dimension) is obtained; α = 1 gives the information dimension and α = 2 the correlation dimension [52]. Besides box counting, there exist other sampling-based techniques to obtain entropies and dimensionalities directly from data, e.g., the Grassberger-Procaccia algorithm [52,53], which estimates the correlation dimension D_2. As in the case of box counting, a strict lower bound on the data required to estimate the attractor dimensionality with a fixed desired accuracy scales exponentially in the number of degrees of freedom D [55,56]. It is well understood in nonlinear dynamics that such direct approaches to measuring dimensionality are inappropriate for high-dimensional dynamical systems [153]. A more tractable way to quantify the attractor dimension, and thus the number of degrees of freedom of a strange chaotic attractor, can be obtained based on the
Lyapunov spectrum if the equations of motion of the dynamical system are known and differentiable. The attractor dimension is then given by the interpolated number of Lyapunov exponents that sum to zero (Kaplan-Yorke dimension):

$D_{\mathrm{KY}} = k + \frac{\sum_{i=1}^{k} \lambda_i}{|\lambda_{k+1}|},$

where k is the largest index such that $\sum_{i=1}^{k} \lambda_i \ge 0$. The attractor dimension has been conjectured to be 'in general' equivalent to the information dimension D_1 [51,154-156]. While there exists no proof in general, it has been proven for several low-dimensional systems [157,158], and for other systems supporting numerical evidence has been found [159]. The following bound on the capacity dimension has been proven: $D_0 \le D_{\mathrm{KY}}$ [157,160]. Intuitively, the attractor dimension is the dimensionality of the highest-dimensional infinitesimal hypersphere whose volume neither shrinks nor grows under the chaotic dynamics. In other words, on the attractor, growth along the unstable manifolds is compensated by shrinking along the stable manifolds, and any D-dimensional hypersphere is merely deformed while its volume is preserved on average. Remembering the inequalities $h_{\mathrm{KS}} \le h$ and $D_0 \le D_{\mathrm{KY}}$, we will call $h = \sum_{\lambda_i > 0} \lambda_i$ the entropy rate and $D = D_{\mathrm{KY}}$ the attractor dimension throughout this paper.
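Both routes to a dimensionality — the direct box-counting estimate discussed above and the Lyapunov-based quantities defined here — can be sketched in a few lines of Python. The Cantor-set and Lorenz-like numbers below are purely illustrative: the middle-third Cantor set has D_0 = log 2/log 3 exactly, which makes it a convenient self-test for the box-counting fit.

```python
import numpy as np
from itertools import product

def box_counting_dimension(points, scales):
    """Slope of log M(a) vs log(1/a), with M(a) the number of occupied boxes of size a."""
    log_inv_a = [np.log(1.0 / a) for a in scales]
    log_M = [np.log(len(np.unique(np.floor(points / a)))) for a in scales]
    return np.polyfit(log_inv_a, log_M, 1)[0]

def entropy_rate(lam):
    """Upper bound on the KS entropy: sum of the positive Lyapunov exponents."""
    lam = np.asarray(lam, dtype=float)
    return lam[lam > 0].sum()

def kaplan_yorke_dimension(lam):
    """D = k + (sum_{i<=k} lam_i)/|lam_{k+1}|, with k the largest index for which
    the partial sum of the (descending) exponents is still nonnegative."""
    lam = np.sort(np.asarray(lam, dtype=float))[::-1]
    cum = np.cumsum(lam)
    nonneg = np.nonzero(cum >= 0)[0]
    if nonneg.size == 0:
        return 0.0                     # even the largest exponent is negative
    k = nonneg[-1] + 1                 # number of exponents in the partial sum
    if k == lam.size:
        return float(lam.size)         # no contracting directions left
    return k + cum[k - 1] / abs(lam[k])

# self-test of box counting on the middle-third Cantor set (depth-12 midpoints)
cantor = np.array([sum(c * 3.0 ** -(i + 1) for i, c in enumerate(digits))
                   for digits in product((0, 2), repeat=12)]) + 0.5 * 3.0 ** -12
d0 = box_counting_dimension(cantor, scales=[3.0 ** -j for j in range(2, 9)])

lam = [0.906, 0.0, -14.57]             # illustrative Lorenz-like spectrum
```

The box-counting fit recovers log 2/log 3 ≈ 0.631 here only because the set is low-dimensional and exactly self-similar; for a high-dimensional network attractor the required data grows exponentially, which is why the Kaplan-Yorke route via the Lyapunov spectrum is used instead.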
Appendix F: Random Dynamical Systems and trial-to-trial variability

The extension of concepts of the ergodic theory of dynamical systems to input-driven systems was achieved in the theory of random dynamical systems [161]. This can be useful for neuroscience to better understand trial-to-trial variability, controllability, and input-driven chaos (see, e.g., [46,162]). Consider a stochastic differential equation of the form

$dx_t = a(x_t)\,dt + \sum_i b_i(x_t)\,dW^i_t,$

where the $W^i_t$ are independent Brownian motions. An associated stochastic flow map is a solution of the dynamics, i.e., $F_{t_1,t_2;\zeta}(x_{t_1}) = x_{t_2}$ maps the state x from time $t_1$ to $t_2$, where ζ denotes the realization of the stochasticity. Instead of studying the temporal evolution of some initial measure µ in which each initial condition receives "private" noise, as is usually done in a Fokker-Planck approach, the theory of random dynamical systems studies the evolution of a sample measure $\mu^t_\zeta$, obtained by transporting µ from the distant past to time t with one fixed noise realization ζ, $\mu^t_\zeta = \lim_{s \to -\infty} F_{s,t;\zeta}\,\mu$. Two theorems for random dynamical systems link the sample measure $\mu^t_\zeta$ and the Lyapunov spectrum in chaotic and stable systems, respectively. First, Ledrappier and Young proved that if $\lambda_1 > 0$, then $\mu^t_\zeta$ is a random SRB (Sinai-Ruelle-Bowen) measure [163]. As a consequence, in contrast to autonomous systems, for random dynamical systems the Pesin identity $H = \sum_{\lambda_i > 0} \lambda_i$ is guaranteed to hold. Note that in contrast to SRB measures of autonomous systems, random SRB measures are time-dependent. However, they have a similar meaning: systems with an SRB measure have smooth conditional measures along the unstable manifolds. In addition, Baxendale and Le Jan showed that if $\lambda_1 < 0$, the stationary measure is ergodic, and some nondegeneracy conditions on the measure are fulfilled [107], then $\mu^t_\zeta$ is a random sink, which means $\mu^t_\zeta(x) = \delta(x - x_t)$, where $x_t$ is a solution of the stochastic dynamics for the given noise realization ζ [107,108]. This means that any trajectory of a stable rate network driven by white noise will after finite time be absorbed into one single trajectory,
which is independent of the initial condition but depends only on the noise realization. Likewise, any smooth initial measure will asymptotically coalesce into a time-dependent random sink. Note that the theorems by Baxendale and Le Jan do not say when the globally attracting random sink will be reached, which means that for very long transients, its asymptotic existence might have no practical relevance on biologically relevant timescales [42].
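The random-sink statement can be checked numerically for the noise-driven rate network: a stable network (g < 1) simulated with the same frozen noise realization from two different initial conditions collapses onto a single noise-dependent trajectory. This is a hedged sketch with an assumed Euler-Maruyama discretization and illustrative parameters.

```python
import numpy as np

def evolve_with_frozen_noise(h0, J, xi, dt=0.05, tau=1.0, sigma=1.0):
    """Euler-Maruyama steps for tau*dh = (-h + J tanh h) dt + noise, where the
    pre-drawn array xi (one row per step) plays the role of the frozen noise."""
    h = h0.copy()
    for w in xi:
        h = h + (dt / tau) * (-h + J @ np.tanh(h)) + sigma * np.sqrt(dt / tau) * w
    return h

rng = np.random.default_rng(1)
N, g, dt, T = 100, 0.5, 0.05, 50.0
J = rng.normal(0.0, g / np.sqrt(N), (N, N))
xi = rng.normal(size=(int(T / dt), N))          # one frozen noise realization
h_a = evolve_with_frozen_noise(rng.normal(size=N), J, xi)
h_b = evolve_with_frozen_noise(rng.normal(size=N), J, xi)
```

Because λ_max < 0 for g < 1, the distance ||h_a − h_b|| contracts roughly exponentially, so after 50τ the two trajectories agree to within numerical noise; with different noise realizations they would not coincide.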

Appendix G: Principal component-based dimensionality estimate
We compared the attractor dimension to a principal-component-based dimensionality estimate. Principal component analysis (PCA) has been widely used as a dimensionality-reduction technique both in experimental and theoretical neuroscience [9,10,65,66]. For a given data set, PCA provides the successive orthogonal directions that account for most of the variance in the data and the associated fraction of variance explained. Mathematically, PCA is given by the eigenvalue decomposition of the covariance matrix. The number of principal components necessary to account for the majority of the total variance gives an estimate of the number of degrees of freedom of the underlying dynamics. If a few principal components explain most of the variance, the dynamics is mostly constrained to a hyperellipsoid with few long axes. If many principal components are necessary, no such localized structure in the second-order statistics of the collective dynamics is detected. To avoid choosing an arbitrary threshold of variance (e.g., 95%), one can use a participation ratio, commonly used in physics to quantify, e.g., localization of collective activity modes [164], Anderson localization of waves in a disordered medium [165], or localized Lyapunov vectors [89,166]. We calculated PCA-based dimensionality estimates based on the covariance both of the total synaptic currents $h_i$ and of the rates $\phi_i = \tanh(h_i)$. For instance, for $h_i$, we compute the covariance matrix $C^h_{ij}$. A PCA-based dimensionality estimate is then given by the participation ratio

$D_{\mathrm{PCA}} = \frac{\left(\sum_n \mu^h_n\right)^2}{\sum_n \left(\mu^h_n\right)^2},$

where $\mu^h_n$ is the n-th eigenvalue of the covariance matrix $C^h_{ij}$. If all eigenvalues contribute equally, $D_{\mathrm{PCA}} = N$; if a single eigenvalue dominates, $D_{\mathrm{PCA}}$ approaches 1. The PCA of the firing rates was calculated the same way, but for the covariance matrix $C^{\tanh h}$ of the firing rates.

For gradient-based training of recurrent networks, the gradient of the loss with respect to the weights of the recurrent network has to be evaluated. This is done by unrolling the network dynamics in time [33]: $\partial h_t/\partial h_\tau = D_{t-1} \cdots D_{\tau+1} D_\tau$, where $D_\tau$ is the Jacobian Eq.
2 that we already considered when calculating the Lyapunov spectrum. The recursive dependence of the gradient on the previous network state results in a product of Jacobians, which takes the form of the long-term Jacobian T_t(h) (Eq. A3), whose inner product gives the Oseledets matrix (Eq. A4).
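Returning to the participation ratio of Appendix G, a minimal sketch follows; the data here are synthetic Gaussian surrogates, not network simulations.

```python
import numpy as np

def participation_ratio(X):
    """PCA-based dimensionality (sum_n mu_n)^2 / sum_n mu_n^2, where mu_n are the
    eigenvalues of the covariance matrix of X with shape (n_samples, N)."""
    mu = np.linalg.eigvalsh(np.cov(X, rowvar=False))
    return mu.sum() ** 2 / (mu ** 2).sum()

rng = np.random.default_rng(0)
iso = rng.normal(size=(20000, 10))                 # isotropic: all directions equal
rank1 = rng.normal(size=(20000, 1)) * np.ones(10)  # activity confined to one direction
```

For the isotropic data, all N = 10 eigenvalues contribute equally and the estimate approaches N; for the rank-one data, a single eigenvalue dominates and the estimate approaches 1.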
The singular values of the long-term Jacobian $T_t(h_\tau)$, which determine how quickly gradients vanish or explode during backpropagation through time, are directly related to the Lyapunov exponents of the forward dynamics: the Lyapunov exponents are the asymptotic growth rates of the logarithms of the singular values of the long-term Jacobian [61]. Thus, our results on how the global coupling strength g, simulation parameters (e.g., the time-discretization ∆t), time-dependent input, and the nonlinearity φ (e.g., threshold-linear vs. tanh) shape the Lyapunov spectrum can be translated directly into predictions of the gradient instability during backpropagation through time. As pointed out previously [35,36,38], the trainability of recurrent networks is constrained by the condition number κ of the long-term Jacobian $T_t(h_\tau)$. The condition number can be approximated from the Lyapunov spectrum:

$\kappa_2(T_t(h_\tau)) = \frac{\sigma_1(T_t(h_\tau))}{\sigma_N(T_t(h_\tau))} \approx \exp\big((\lambda_1 - \lambda_N)(t - \tau)\big),$

where t − τ is the time to be bridged by backpropagation through time.
Appendix K: Lyapunov spectra for very large coupling g

For very large values of g, we observe that both the attractor dimensionality D and the dynamical entropy rate H peak as a function of g (Fig. 21). For increasing network size N, this peak did not vanish or shift, indicating that it is not merely a finite-size effect. The peak in both H and D can be explained by a growing fraction of rate units in saturation and, consequently, an increasingly sparse Jacobian $D_{ij}(t_s)$: fewer units are active at each moment, and therefore there are fewer unstable directions in phase space, as indicated by the decreasing number of positive Lyapunov exponents. Our seemingly contradictory claim of a decreasing dynamical entropy rate H despite a growing largest Lyapunov exponent for large g is consistent for large N and finite g, where the relative contribution of the largest Lyapunov exponent to the sum of the positive Lyapunov exponents vanishes. We note that the analytical argument for the point-symmetry of the Lyapunov spectrum around i = N/2 and $\lambda_i = -1/\tau$, together with the fact that D is always bounded by N, already implies that H has to be bounded, $H\tau < D < N$, as

$H = \sum_{i=1}^{p} \lambda_i = \sum_{i=p+1}^{D} |\lambda_i| < \frac{D - p}{\tau} < \frac{D}{\tau},$

where p is the number of positive Lyapunov exponents; the second equality uses that the first D Lyapunov exponents sum to zero, and the inequality uses that $|\lambda_i| < 1/\tau$ for $p < i \le D \le N/2$ by the point-symmetry of the spectrum.
For notational simplicity, we assumed here an integer dimensionality D, but the argument holds generally.

S1 Code. Source code for Lyapunov spectrum of rate networks. We provide all code necessary to calculate the full Lyapunov spectrum, written in Julia [144]. The efficient implementation is parallelized using level-3 matrix-matrix operations from BLAS (Basic Linear Algebra Subprograms) called via LAPACK (Linear Algebra PACKage). The code also provides an alternative estimate of the largest Lyapunov exponent by tracking the evolution of a small but finite initial perturbation and resizing it iteratively [51]. Furthermore, the program provides bootstrapped 95% confidence intervals for the first and the last Lyapunov exponent, the Kolmogorov-Sinai entropy rate, and the attractor dimensionality. Optionally, a principal-component-based dimensionality estimate can also be calculated. Finally, the program reports the convergence of the Lyapunov spectrum in time. Input variables are the network size N, coupling strength g, time-discretization ∆t, simulation time t_sim, number of Lyapunov exponents to be calculated n_LE, orthonormalization time interval t_ONS, seed for initial conditions seed_IC, seed for the random network realization seed_net, seed for the orthonormal system seed_ONS, and finally the subdirectory where the results are stored. Code written in MATLAB®/Octave/Python is available on GitHub.

S2 Code. Source code for Lyapunov spectrum of input-driven rate networks. We also provide Julia code to obtain the full Lyapunov spectrum of a noise-driven rate network by a reorthonormalization procedure [60]. This is done along a numerical solution of the stochastic differential equation obtained with the Euler-Maruyama method [138]. The noise strength σ is an additional input parameter. Code written in MATLAB®/Octave/Python is available on GitHub.

Figure 1 .
Figure 1. Transition to chaos for sufficiently strong coupling g in rate networks. A Linear stability of the rate dynamics near the zero fixed point: real vs. imaginary part of the eigenvalues λi of the stability matrix for g = 0.99. B For subcritical coupling (g = 0.99), the trivial fixed point hi = 0 is the only stable solution. C In large networks, the trivial fixed point loses stability at gcrit = 1, where the spectral radius crosses unity (gray dotted line), and chaos emerges from the nonlinear interaction of rate units. D Rate chaos for g = 1.2 (other parameters: network size N = 1000, integration step ∆t = 10^-3 τ).

Figure 2 .
Figure 2. Extensive chaos revealed by the size-invariance of the Lyapunov spectrum. A Full Lyapunov spectra for different network sizes N lie on top of each other, indicating an identical shape. The Lyapunov spectrum is point-symmetric around the mean Lyapunov exponent λ̄ = −1/τ (see analytical derivation in Appendices VII and H). B The largest Lyapunov exponent quickly saturates with network size. C The Kolmogorov-Sinai entropy rate H grows linearly with N, as shown over two orders of magnitude. D The same holds for the attractor dimensionality D (other parameters: g = 10, ∆t = 0.1τ, tONS = τ, tsim = 10^3 τ).

Figure 3 .
Figure 3. Entropy rate and attractor dimensionality of firing-rate network dynamics. A Full Lyapunov spectra of rate networks with different coupling strengths g, color-coded from blue (small g) to red (large g). B The largest Lyapunov exponent shows the theoretically predicted linear growth for g < 1 and first quadratic and then logarithmic growth for g ≫ 1 as a function of g [1] (green dots: direct numerical simulations; black line: Jacobian-based method). C The dynamical entropy rate H grows with g but is bounded (see Appendix K). D The relative attractor dimensionality D/N peaks at D/N < 10% (averages over 20 network realizations in black; red error bars indicate double std across 20 network realizations; parameters: N = 1000, ∆t = 10^-2 τ, tsim = 10^4 τ, tONS = τ).

Figure 4 .
Figure 4. PCA and attractor dimensions of networks with tanh transfer function. A Principal component analysis (PCA)-based dimensionality estimate (green) and attractor dimension based on the Lyapunov spectrum (black) for different values of synaptic strength g. Both PCA dimensions saturate for g ≫ 1, but they saturate at different levels and with distinct exponential rates (error bars are double std across 20 network realizations). The PCA dimension estimate depends on whether tanh(hi) or hi is considered. B Both PCA-based dimensionality estimates seem to be extensive, as indicated by the approximately linear growth with N (other parameters: N = 1000, g = 10, ∆t = 0.1τ, tONS = τ, tsim = 10^4 τ).

Figure 5 .
Figure 5. Lyapunov spectra of discrete-time networks. A Full Lyapunov spectra of discrete-time rate networks with different coupling strengths g, color-coded from blue (small g) to red (large g). B The largest Lyapunov exponent grows, as expected, monotonically as a function of g [8]. C The dynamical entropy rate H peaks with coupling strength g. D The relative attractor dimensionality D/N also peaks (averages over 10 network realizations in black; red error bars indicate double std across 10 network realizations; parameters: N = 1000, ∆t = τ, tsim = 10^5 τ, tONS = τ).

Figure 6 .
Figure 6. Full Lyapunov spectrum for different time-discretizations ∆t. A The full Lyapunov spectrum changes drastically with increasing ∆t. For finite ∆t, the Lyapunov spectrum loses its symmetry (see also Fig. 7 and compare with Fig. 3). While the majority of Lyapunov exponents decrease with increasing ∆t, the number of positive exponents increases. B The largest Lyapunov exponent converges for small ∆t; with increasing ∆t, it first decreases and then increases moderately. C The dynamical entropy rate converges for small ∆t and increases for large ∆t. D The attractor dimensionality behaves similarly to the dynamical entropy rate (other parameters: N = 1000, g = 10, tONS = τ, tsim = 10^4 τ; averages across 10 network realizations).

Figure 8 .
Figure 8. Distribution of Jacobian factors φ′(h). Colored dots are from direct numerical simulations, and grey dashed lines are the analytical distributions (Eq. V) for different values of the variance ∆0 of the local fields h. For small ∆0, most probability mass is close to 1. For large ∆0, the distribution becomes bimodal, because most units are in the saturation of the nonlinearity, with corresponding φ′(h) close to zero, and few have φ′(h) near 1.

Figure 10 .
Figure 10. Entropy rate and dimensionality of a balanced firing-rate network with threshold-linear transfer function. A Top 200 Lyapunov exponents of rate networks for different coupling strengths g, color-coded from blue (small g) to red (large g). B The largest Lyapunov exponent grows monotonically as a function of g below the divergence (green dots: direct numerical simulations; black line: Jacobian-based method). C The dynamical entropy rate H peaks with coupling g. D The relative attractor dimensionality D/N has a peak as a function of g (black curves are averages over 20 network realizations; red error bars indicate double std across 20 network realizations; parameters: N = 4000, ḡ = 300, I = 300, tsim = 10^3 τ, tONS = τ).

Figure 11 .
Figure 11. Spatiotemporal analysis of network chaos and localization of the first covariant Lyapunov vector. A First covariant Lyapunov vector v(t) (gray-scale of |vi(t)| for a subset of 100 random directions). B The average participation ratio P vs. g stays constant both for the tanh network and for the balanced threshold-linear network, consistent with the prediction P = N/3. C Participation ratio P(t) of the first covariant Lyapunov vector (CLV) and corresponding local Lyapunov exponent λ^local_max(t). D Average participation ratio P vs. network size N indicates a delocalized CLV, P = N/3 (parameters: N = 1000, g = 2, ∆t = 10^-2 τ, tONS = τ, tsim = 10^3 τ; averages across 10 network realizations).

Figure 12 .
Figure 12. Time-varying stimuli reduce both the dynamical entropy rate and the attractor dimensionality. A For increasing input strength σ, the Lyapunov spectrum is increasingly pushed towards the mean Lyapunov exponent −1/τ. B The largest Lyapunov exponent decreases, and the transition is smoothed, consistent with previous work [11]. C The dynamical entropy rate H is reduced. D The relative attractor dimensionality D/N decreases with increasing σ (parameters: N = 1000, ∆t = 10^-2 τ, tONS = τ, tsim = 10^3 τ; averages across 10 network realizations).

Figure 13 .
Figure 13. Reorganization of the rate-network phase space during learning. A Local Lyapunov exponents λ^local_i(t), output z(t) = w^⊤ φ(x(t)), and activity of example rate units φi(t) before learning, in the chaotic state. B Chaotic network activity φ(t) before learning, projected on the first two principal components. C Same as A after learning a periodic task using FORCE [15]. D Same as B after training (parameters: N = 200, g = 1.5, ∆t = 10^-1 τ, tONS = 10^-1 τ, τ = 10^-2 s).

Figure 14 .
Figure 14.Quantification of dynamic stability after training rate networks on task using FORCE, full-FORCE, and backpropagation through time A Example output z(t) = w φ(x(t)) of a network of 500 units trained with FORCE (blue), full-FORCE (orange), and backpropagation through time (green).B Lyapunov exponents calculated at the end of the training.We find that the full-rank method full-FORCE results in a more negative largest Lyapunov exponent and thus a microscopically more stable dynamics.Moreover, subsequent Lyapunov exponents drop quicker towards the negative inverse of the characteristic timescale −1/τ .(other parameters: N = 500, g = 1.5, ∆t = 0.1τ , tONS = τ , tsim = 10 4 τ , σ = 0 averages across 10 network realizations).

Figure 15. Lyapunov spectrum, dynamical entropy rate, and dimensionality of a recurrent LSTM network. A For increasing bias current in the forget gate b_f, the first half of the Lyapunov spectrum is increasingly pushed towards zero; concomitantly, the autocorrelation of the latent state indicates the emergence of slow latent modes. B The largest Lyapunov exponent decreases for increasing b_f. C The dynamical entropy rate H is also reduced for growing b_f. D In contrast, the relative attractor dimensionality D/2N even increases with b_f, despite the decreasing λ_max. (Parameters: N = 200, ∆t = τ, t_ONS = τ, t_sim = 10^4 τ, g_f = g_o = g_i = g_c = 3.0, b_o = b_i = b_c = 0.0, averages across 10 network realizations.)
Appendix B: Algorithm for calculating Lyapunov spectrum of rate networks

Figure 16. Geometric illustration of the Lyapunov spectrum calculation. An orthonormal matrix Q_s = [q_s^1, q_s^2, ..., q_s^m], whose columns are the axes of an m-dimensional cube, is rotated and distorted by the Jacobian D_s into an m-dimensional parallelotope Q̃_{s+1} = D_s Q_s embedded in R^N. The figure illustrates this for m = 2, in which case the columns of Q̃_{s+1} span a parallelogram, which can be divided into a right triangle and a trapezoid and rearranged into a rectangle; thus, the area of the gray parallelogram is the same as that of the orange rectangle. The QR-decomposition reorthonormalizes Q̃_{s+1} by decomposing it into the product of an orthonormal matrix Q_{s+1} = [q_{s+1}^1, q_{s+1}^2, ..., q_{s+1}^m] and an upper-triangular matrix R^{s+1}. Q_{s+1} describes the rotation of Q_s caused by D_s. The diagonal entries of R^{s+1} give the stretching/shrinking along the columns of Q_{s+1}; thus the volume of the parallelotope formed by the first m columns of Q̃_{s+1} is V_m = ∏_{i=1}^m R^{s+1}_{ii}. The time-averaged logarithms of the diagonal elements of R^s give the Lyapunov spectrum: λ_i = lim_{t_sim→∞} (1/t_sim) Σ_s ln R^s_{ii}.
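The reorthonormalization scheme illustrated above can be sketched in a few lines of Python for the standard tanh rate network ẋ = −x + g J φ(x). This is a minimal illustration, not the paper's production code: it uses an explicit Euler step, reorthonormalizes after every step (t_ONS = ∆t), does not discard a transient, and the parameter values are chosen for speed rather than to reproduce any figure:

```python
import numpy as np

def lyapunov_spectrum(N=100, g=2.0, m=20, dt=0.1, t_sim=500.0, seed=0):
    """First m Lyapunov exponents of x' = -x + g J tanh(x) (tau = 1),
    via repeated QR reorthonormalization of an evolving orthonormal
    system, as in Fig. 16."""
    rng = np.random.default_rng(seed)
    J = rng.standard_normal((N, N)) / np.sqrt(N)      # random coupling
    x = rng.standard_normal(N)                        # network state
    Q = np.linalg.qr(rng.standard_normal((N, m)))[0]  # orthonormal start
    log_r = np.zeros(m)
    for _ in range(int(t_sim / dt)):
        phi_prime = 1.0 - np.tanh(x) ** 2
        # Jacobian of the Euler map x -> (1 - dt) x + dt g J tanh(x):
        # D_ij = (1 - dt) delta_ij + dt g J_ij phi'(x_j)
        D = (1.0 - dt) * np.eye(N) + dt * g * J * phi_prime[None, :]
        x = (1.0 - dt) * x + dt * g * (J @ np.tanh(x))
        Q, R = np.linalg.qr(D @ Q)                    # reorthonormalize
        log_r += np.log(np.abs(np.diag(R)))           # accumulate stretching
    return log_r / t_sim

lam = lyapunov_spectrum()   # lam[0] > 0 signals chaos at g = 2
```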

Figure 18. Convergence of the Lyapunov spectrum with reorthonormalization interval t_ONS. If the reorthonormalization is not performed sufficiently often, the Lyapunov spectrum flattens from the tail for large t_ONS. A Lyapunov spectra for t_ONS ∈ {0.1, 0.2, 0.3, 0.5, 1, 2, 5, 10, 20, 50, 100}τ for ∆t = 0.01. B ∆λ_max shows the deviation of the largest Lyapunov exponent for different t_ONS from the smallest, t_ONS = 0.1τ; the same is shown for H and D. For our typical parameter sets, an orthonormalization interval of t_ONS = τ is sufficient to keep errors in H and D orders of magnitude smaller than the deviations across network realizations due to quenched fluctuations. C Same as A for ∆t = 1. D Deviations of full Lyapunov spectra for different t_ONS from the smallest t_ONS = 0.1τ for ∆t = 0.01. (Other parameters: N = 1000, ∆t = 0.01τ, g = 10, t_sim = 10^4 τ, averages across 10 network realizations.)

Figure 19. Finite-size effects on the largest Lyapunov exponent and the transition to chaos. A Largest Lyapunov exponent across 100 network realizations as a function of network size N for g ∈ {2, 5, 10}. Dots indicate individual realizations, full lines are medians, dotted curves are the 20% and 80% percentiles, and dashed lines are the predictions obtained from dynamic mean-field theory; g is color-coded from blue (small g) to red (large g). B Critical coupling strength g_crit as a function of network size N for 100 network realizations, obtained by the bisection method. Dots indicate individual realizations, full lines are medians, dotted curves are the 20% and 80% percentiles, and the dashed line is the analytical prediction. C Difference between the mean-field theory prediction and the median across 100 realizations as a function of network size. D Same data as B, but with g_crit − 1 depicted on a log scale. (Other parameters: relative tolerance = 10^-10, t_sim = 10^4 τ, t_ONS = τ, median across 100 network realizations.)

Figure 21. Peak in dynamical entropy rate and attractor dimensionality for large g. A For increasing coupling strength g, the Lyapunov spectrum is increasingly bent upward, with a decreasing fraction of positive Lyapunov exponents (N = 2000). B The largest Lyapunov exponent grows monotonically with increasing g, as predicted analytically (Fig. 20B). For very large g, the largest Lyapunov exponent flattens (black lines: N = 2000; orange lines: N = 3000). Increasing N reduces the flattening, indicating a finite-N effect on λ_max. C The dynamical entropy rate H peaks as a function of g. D The relative attractor dimensionality D/N also peaks as a function of g. The positions of the peaks in D and H do not shift with N, indicating that the peaks are not a finite-N effect. (Parameters: relative tolerance = 10^-10, t_ONS = τ, t_sim = 10^3 τ, averages across 3 network realizations.)
µ_ζ^t = (F_{−s,t;ζ})_* µ, where the propagator (F_{−s,t;ζ})_* transports the initial measure µ, for some fixed white-noise realization ζ(t) defined for all t ∈ (−∞, ∞), along the flow F_{−s,t;ζ}. In other words, the sample measure µ_ζ^t is the conditional measure at time t given the infinite past history of ζ(t). Note that while µ_ζ^t in general depends both on the time t and on the noise realization ζ, it possesses invariant properties characterizing its structure. For example, the N Lyapunov exponents are independent of the input realization ζ [106].