Dynamics of random recurrent networks with correlated low-rank structure

A given neural network in the brain is involved in many different tasks. This implies that, when considering a specific task, the network's connectivity contains a component which is related to the task and another component which can be considered random. Understanding the interplay between the structured and random components, and their effect on network dynamics and functionality is an important open question. Recent studies addressed the co-existence of random and structured connectivity, but considered the two parts to be uncorrelated. This constraint limits the dynamics and leaves the random connectivity non-functional. Algorithms that train networks to perform specific tasks typically generate correlations between structure and random connectivity. Here we study nonlinear networks with correlated structured and random components, assuming the structure to have a low rank. We develop an analytic framework to establish the precise effect of the correlations on the eigenvalue spectrum of the joint connectivity. We find that the spectrum consists of a bulk and multiple outliers, whose location is predicted by our theory. Using mean-field theory, we show that these outliers directly determine both the fixed points of the system and their stability. Taken together, our analysis elucidates how correlations allow structured and random connectivity to synergistically extend the range of computations available to networks.


I. INTRODUCTION
One of the central paradigms of neuroscience is that computational function determines connectivity structure: if a neural network is involved in a given task, its connectivity must be related to this task. However, a given circuit's connectivity also depends on development and the learning of a multitude of tasks [1,2]. Accordingly, connectivity has often been depicted as containing a sum of random and structured components [3][4][5][6][7]. Given that structure emerges through adaptive processes on top of existing random connectivity, one would intuitively expect correlations between the two components. Nevertheless, the functional effects of the interplay between the random and the structured components have not been fully elucidated.
Networks designed to solve specific tasks often use purely structured connectivity [8][9][10] that has been analytically dissected [11]. The dynamics of networks with purely random connectivity were also thoroughly explored, charting the transitions between chaotic and ordered activity regimes [12][13][14][15][16][17]. Adding uncorrelated random connectivity to a structured one was shown to generate the activity statistics originating from the random component while retaining the functional aspects of the structured one [4][5][6][18].
A specific setting in which correlations between random and structured components arise is the training of initially random networks to perform tasks. One class of training algorithms, reservoir computing, only modifies a feedback loop on top of the initial random connectivity [19][20][21]. These algorithms can be used to obtain a wide range of computations [22,23]. Recently, a specific instance of a network trained to exhibit multiple fixed points was analytically examined [3]. It was shown that the dependence between the feedback loop and the initial connectivity is essential to obtain the desired functionality, but the explicit form of the correlations and the manner in which they determine functionality remained elusive.
Thus there is no general theory linking the correlations between random and structured components to network dynamics. Here we address this issue by examining the nonlinear dynamics of networks with such correlations. Because the dynamics of nonlinear systems vary between different areas of phase space, we focus on linearized dynamics around different fixed points. To facilitate the analysis, we consider low-rank structured components which were shown to allow a wide range of functionalities [4].
We develop a mean field theory that takes into account correlations between the random connectivity and the low-rank part. Our theory directly links these correlations to the spectrum of the connectivity matrix. We show how a correlated rank-one perturbation can lead to multiple spectral outliers and fixed points, a phenomenon that requires high-rank perturbations in the uncorrelated case [4]. We study dynamics around nontrivial fixed points, obtaining simple analytical expressions linking the spectrum, the fixed points and their stability. These expressions reveal a surprising universality: both the statistics and the stability of fixed points are determined entirely by the spectrum and are independent of the details of the connectivity. Taken together, we show how correlations between the low-rank structure and the random connectivity extend the computations of the joint network beyond the sum of its parts.

II. NETWORK MODEL
We examine the dynamics of recurrent neural networks with correlated random and structured components in their connectivity. The structured component P is a low-rank matrix and the random component J is a full-rank matrix. Network dynamics with such a connectivity structure have been analyzed for P being independent of the random connectivity [4]. The learning frameworks of echo state networks and FORCE also have such a connectivity structure [20,21]. There, however, the structure P is trained such that the full network performs a desired computation, possibly correlating P to J.
For most of this study, we set the rank of P to one and write it as the outer product of the two structure vectors m and n, P = m n^T. The matrix J and vector m are drawn independently from normal distributions, J_ij ~ N(0, g²/N) and m_i ~ N(0, 1), where N is the network size and g controls the strength of the random part [12]. The second vector n is defined in terms of J and m. In this sense, n carries the correlation between J and P. This is in line with the echo state and FORCE models, where n corresponds to the readout vector which is trained and therefore becomes correlated to J and m.
In contrast to these models, however, we constrain the statistics of n to be Gaussian. This allows for an analytical treatment and thus for a transparent understanding of how the correlations affect the network dynamics. The details of the construction of n are described later on. At this point we merely state that the entries of n scale with the network size as 1/N. The structure P is hence considered a perturbation to the random connectivity J, whose entries scale as 1/√N. All our results are valid in the limit of infinitely large networks, N → ∞. Throughout the work, we compare the theoretical predictions with samples from finite networks.
The network dynamics are given by standard rate equations. Neurons are characterized by their internal states x_i and interact with each other via firing rates φ(x_i). The nonlinear transformation from state to firing rate is taken to be the hyperbolic tangent, φ = tanh. The entire network dynamics are written as

    ẋ = −x + (J + P) φ(x),   (2)

with the state vector x ∈ R^N and the nonlinearity applied element-wise. The derivation of our results, Appendix D, further includes a constant external input I. The results in the main text, however, only consider the autonomous network.
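As an illustrative sketch (not the paper's simulation code; the network size, the value of g, and the simple uncorrelated choice n = 2m/N are our own assumptions), the dynamics can be integrated with a basic Euler scheme:

```python
import numpy as np

rng = np.random.default_rng(0)
N, g = 1000, 0.5

# Random part J and structure vectors m, n (here uncorrelated, for illustration)
J = rng.normal(0.0, g / np.sqrt(N), size=(N, N))
m = rng.normal(0.0, 1.0, size=N)
n = 2.0 * m / N              # entries of n scale as 1/N

W = J + np.outer(m, n)       # joint connectivity J + m n^T

# Euler integration of dx/dt = -x + W tanh(x)
dt, steps = 0.1, 2000
x = rng.normal(0.0, 1.0, size=N)
for _ in range(steps):
    x += dt * (-x + W @ np.tanh(x))

print(np.sqrt(x @ x / N))    # nonzero: the outlier n.T m ≈ 2 > 1 destabilizes the origin
```

With g < 1 and a single outlier above one, trajectories settle into one of two mirror-image nontrivial fixed points.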

III. LINEAR DYNAMICS AROUND THE ORIGIN
The origin x = 0 is a fixed point, since φ(0) = 0. It is stable if the real parts of all the eigenvalues of the Jacobian are smaller than one. Since φ′(0) = 1, the Jacobian is simply the connectivity matrix J + m n^T itself (up to a shift by the identity). Here we examine the spectral properties of this matrix.

A. Eigenvalues
The spectrum of the Gaussian random matrix J converges to a uniform distribution on a disk with radius g and centered at the origin for N → ∞ [24]. Previous studies have explored the effect of independent low-rank perturbations like in our model [25,26]. They found that the limiting distribution of the remaining eigenvalues, referred to as the bulk, does not change. Additionally, the spectrum contains outliers corresponding to the eigenvalues of the low-rank perturbation itself. In this sense, the spectra of the random matrix J and the low-rank perturbation decouple (although the precise location of each eigenvalue is affected by the perturbation). To our knowledge, the effect of correlated low-rank perturbations, which we explore below, has not been considered before.
To determine the spectrum, we apply the matrix determinant lemma [27]:

    det(A + m n^T) = (1 + n^T A^{-1} m) det(A),   (3)

where A ∈ C^{N×N} is an invertible matrix. For a complex-valued λ outside the spectrum of J, the matrix J − λ1 is invertible, resulting in

    det[(J + m n^T) − λ1] = (1 + n^T (J − λ1)^{-1} m) det(J − λ1).   (4)

The roots of this equation are the eigenvalues of J + m n^T, and are given by:

    n^T (λ1 − J)^{-1} m = 1.   (5)

For eigenvalues λ with absolute value larger than g, the matrix J/λ has a spectral radius smaller than 1. The inverse can thus be written as a series,

    Σ_k θ_k / λ^{k+1} = 1,   (6)

with the overlaps

    θ_k = n^T J^k m,   (7)

and sums without specified limits running from k = 0 to infinity. This series representation is the main result of this section. It indicates that the overlaps θ_k between m and n after passing k times through J determine the eigenvalues of the perturbed matrix. It is hence useful to characterize the correlations between J and the rank-one perturbation in terms of these overlaps. A random matrix J has the effect of decorrelating independent vectors: if the vectors m and n are uncorrelated to J, a single pass through the network already annihilates any overlap between n and Jm. In Appendix A, we formally show that we indeed have n^T J m = 0 in expectation, and deviations vanish as O(1/√N). The same holds true for any of the higher order overlaps θ_k with k ≥ 1. We can apply this to Eq. (6) and recover the well-known result [25,26]:

    λ = θ_0 = n^T m.   (8)

In other words, an independent rank-one perturbation yields a single outlier positioned at the eigenvalue of the rank-one matrix itself [Fig. 1(a)].
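A minimal numerical check of the uncorrelated case (the parameter values are our own choices): with n independent of J, the only outlier should sit near θ_0 = n^T m, while the bulk remains in the disk of radius g.

```python
import numpy as np

rng = np.random.default_rng(1)
N, g = 2000, 0.8
J = rng.normal(0.0, g / np.sqrt(N), size=(N, N))
m = rng.normal(0.0, 1.0, size=N)
n = 1.7 * m / N                          # uncorrelated to J; theta_0 = n.T m ≈ 1.7

eig = np.linalg.eigvals(J + np.outer(m, n))
outlier = eig[np.argmax(eig.real)]
print(outlier)                           # ≈ theta_0, real, outside the bulk of radius g
```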
If m n^T is correlated to J, the θ_k will not vanish for nonzero k. We introduce such correlations by explicitly constructing n from J and m. For example, if we set

    n = (1/N) [θ̂_0 m + (θ̂_1 / g²) J m],   (9)

then the overlaps will self-average to θ_k = θ̂_k for k = 0, 1 and θ_k = 0 for any k ≥ 2. This is shown formally and generalized to higher θ_k in Appendix A. We analyze two special cases of correlations: (i) For the case in Eq. (9), there are two outliers

    λ_{1,2} = [θ_0 ± √(θ_0² + 4 θ_1)] / 2.   (10)

This can give rise to complex conjugate outliers, as displayed in Fig. 1(c). More generally, K nonzero overlaps lead to K outliers via a polynomial equation [Eq. (B1)].
(ii) A second case is one of a converging series in Eq. (6). The simplest assumption is an exponential scaling, θ_k = θ_0 b^k with base b. Inserting into the eigenvalue equation (6) yields a single solution

    λ = θ_0 + b.   (11)

Remarkably, we see that correlation between the random matrix J and the rank-one perturbation does not necessarily lead to more than one outlier. This is shown in Fig. 1(b). The observation generalizes to correlations expressible as a sum of K exponentially decaying terms, leading to K outliers [Eq. (B3)]. We can apply this understanding to construct a network with a set of outliers and either one of the underlying correlation structures. This is discussed in Appendix B and applied in our numerical simulations, such as Fig. 1.
The simulations further show that the remaining eigenvalues cover the same disk as without the perturbation. While each individual eigenvalue changes, visual inspection does not reveal any change in the overall statistics.

B. Implementation of multiple outliers
So far we analyzed the outliers for given correlations between J and m n^T, as quantified by the overlaps θ_k. We now change the perspective and ask about the properties of the rank-one perturbation given a set of outliers. We saw that in principle a given set of outliers may have multiple underlying correlation structures, e.g., through a truncated set of non-zero overlaps or through a combination of exponentially decaying terms. Regardless of the correlation structure, however, we observe that the norm of n grows quickly with the number of outliers introduced, implying that strong perturbations are needed to generate a large number of outliers.
To understand analytically the origin of this phenomenon, we focus on a method to determine the least-squares n given J, m and the set of target outliers Λ. The resulting n can be formulated using the pseudoinverse, as detailed in Appendix C. The main result of this analysis is the scaling of the Frobenius norm of the rank-one matrix m n^T with the number of outliers K. The asymptotic behavior is given by

    ||m n^T|| ∝ Π_{k=1}^{K} |λ_k| / g,   (12)

that is, exponentially growing with the number of outliers. In comparison, the Frobenius norm of J is given by ||J|| = g√N. This means that if one aims to place more than a handful of outliers, the perturbation m n^T becomes the dominating term (for a fixed network size N). We illustrate this in Fig. 2 by plotting ||m n^T|| for sets of outliers Λ_K = {λ_1, ..., λ_K} with growing number K. The outliers λ_k were placed on the real line. Further tests including complex eigenvalues gave similar results (not shown). Deriving n from the pseudoinverse has previously been described in Ref. [28].
The scaling (12) shows another important point: the bulk radius g critically determines the norm of the rank-one perturbation. Indeed, the contribution of each outlier λ_k is measured relative to the radius. This holds even for a single outlier, where an increase in g leads to a decreasing norm. This observation suggests that a large random connectivity facilitates the control of the spectrum by a rank-one perturbation.
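The pseudoinverse construction can be sketched as follows (a minimal version of the idea under our reading of the text: each target outlier λ imposes the linear constraint n^T (λ1 − J)^{-1} m = 1, and the minimum-norm n satisfying all constraints is given by the pseudoinverse):

```python
import numpy as np

rng = np.random.default_rng(3)
N, g = 1000, 0.8
J = rng.normal(0.0, g / np.sqrt(N), size=(N, N))
m = rng.normal(0.0, 1.0, size=N)
I = np.eye(N)

def min_norm_n(targets):
    """Minimum-norm n placing outliers at `targets`: each lambda must satisfy
    n^T (lambda*I - J)^{-1} m = 1; solve the stacked constraints by pseudoinverse."""
    A = np.array([np.linalg.solve(lam * I - J, m) for lam in targets])
    return np.linalg.pinv(A) @ np.ones(len(targets))

norms = []
for K in range(1, 5):
    targets = [1.5 + 0.3 * k for k in range(K)]     # outliers on the real line
    n = min_norm_n(targets)
    norms.append(np.linalg.norm(m) * np.linalg.norm(n))  # Frobenius norm of m n^T
print(norms)   # grows rapidly with the number of outliers K
```

Since the constraint sets are nested here, the minimum norm is non-decreasing in K by construction; the rapid growth itself is the nontrivial observation.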

IV. NON-TRIVIAL FIXED POINTS
We now turn to the non-trivial fixed points of the network. At these, the internal states x obey the equation

    x = (J + m n^T) φ(x) = J φ(x) + κ m.   (14)

Here we defined the scalar feedback strength κ = n^T φ, using the vector notation φ = φ(x). The fixed points of related models have been analyzed in previous works. For infinitely large networks, the unperturbed system (P = 0) has a single fixed point at the origin if g < 1 [15]. For g > 1, the system exhibits chaotic dynamics [12]. In this regime, the number of (unstable) fixed points scales exponentially with the network size N [15]. Here we only focus on networks in the non-chaotic regime, where either g < 1 or the perturbation P suppresses chaos [4].

A. Fixed point manifold
Following Rivkind and Barak [3], the perturbed system with fixed points (14) can be understood using a surrogate system in which the feedback κ is replaced with a fixed scalar κ̄. For g < 1, every such value κ̄ corresponds to a unique fixed point x̄(κ̄), determined by

    x̄ = J φ(x̄) + κ̄ m.   (15)

This equation defines the one-dimensional nonlinear manifold

    M = { x̄(κ̄) | κ̄ ∈ R }.   (16)

The manifold M can be understood by looking at the asymptotics. For large input κ̄, the nonlinearity saturates and the manifold becomes approximately linear:

    x̄ ≈ κ̄ m + sign(κ̄) c,   (17)

with c = J sign(m). Around the origin, we linearize and obtain

    x̄ ≈ κ̄ a,   (18)

with a = (1 − J)^{-1} m. Applying orthonormalization to the triplet (m, c, a), we obtain a three-dimensional basis. We observe that, for N → ∞, the vectors m and c are orthogonal, and that the vector a becomes orthogonal to the other two in the limit g → 1. Accordingly, we name the coefficients of the basis (y_m, y_c, y_a). The projection of the manifold M on this basis is shown in Fig. 3(c) for three different values of g. Numerical evaluation of the reconstruction error shows that these three dimensions reconstruct the manifold very well, albeit with decreasing accuracy for increasing g (not shown).
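The surrogate manifold and its two asymptotic regimes can be traced numerically (a sketch; the parameters and the simple fixed-point iteration for the surrogate system are our own choices):

```python
import numpy as np

rng = np.random.default_rng(4)
N, g = 1000, 0.5
J = rng.normal(0.0, g / np.sqrt(N), size=(N, N))
m = rng.normal(0.0, 1.0, size=N)

def x_hat(kappa_bar, iters=200):
    """Surrogate fixed point for frozen feedback kappa_bar (iterating x = J phi(x) + kappa_bar m)."""
    x = np.zeros(N)
    for _ in range(iters):
        x = J @ np.tanh(x) + kappa_bar * m
    return x

# Trace the 1D manifold and check its asymptotics
x_small, x_large = x_hat(0.05), x_hat(20.0)
a = np.linalg.solve(np.eye(N) - J, m)        # linearized direction near the origin
print(np.corrcoef(x_small, a)[0, 1])         # ≈ 1 near the origin
print(np.corrcoef(x_large, m)[0, 1])         # ≈ 1 in the saturated regime
```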
Fixed points of the full system are obtained by determining κ self-consistently. They necessarily lie on the manifold M. One consequence is a strong correlation between pairs of fixed points, especially if both lie close to the origin or in the saturating regime. In Fig. 3(b-d), we numerically evaluate this correlation for three different randomness strengths g. One can observe that for g ≤ 0.5, the correlation does not drop below 90%. Even for g = 0.9, the correlation is low only if one fixed point is very close to the origin and the other one is far out.
So far we only considered the case g < 1. For g > 1, there is a minimal value κ̄_min for which the dynamics are stabilized and a unique stable fixed point emerges [4]. Here, the manifold M is unconnected and now reads

    M = { x̄(κ̄) | |κ̄| ≥ κ̄_min }.

Finally, we note that the constraints of a one-dimensional manifold are general and do not depend on the details of the vector n, especially not on its Gaussian statistics. This is particularly important for learning algorithms like the echo state framework or FORCE, which by construction only allow for the adaptation of the vector n [20,21]. Accordingly, fixed points in these cases are also strongly correlated, which may lead to catastrophic forgetting when trying to learn multiple fixed points sequentially [29].

B. Mean field theory
For non-trivial fixed points of the full network, Eq. (14), the scalar feedback κ needs to be consistent with the firing rates φ(x). Similar to prior works, we compute κ using a mean field theory [4]. The central idea of the mean field theory is to replace the input to each variable x_i by a stochastic variable with statistics matching the original system. The statistics of the resulting stochastic processes x_i are then computed self-consistently.
Because our model includes correlations between the random part J and the low-rank structure P , the correlations in the activity do not vanish as dynamics unfold. This phenomenon prevents the application of previous theories [4]. We hence develop a new theory. The details are elaborated in Appendix D. Here we give an outline of the analysis.
The starting point is the scalar feedback κ. The Gaussian statistics of n and the fixed point x allow us to factor out the effect of the nonlinearity via partial integration. We have

    κ = n^T φ = ⟨φ′⟩ n^T x,   (19)

with the average slope ⟨φ′⟩ evaluated at the fixed point [Eq. (D6)]. Inserting the fixed point equation (14) yields

    κ = ⟨φ′⟩ (n^T J φ + κ n^T m).   (20)

The first term in the parentheses vanished in previous studies with no correlation between P and J [4]. Here, it is evaluated by interpreting J^T n as a Gaussian vector and applying the partial integration as before. We can apply this scheme recursively and arrive at an equation linear in κ on both sides:

    κ = κ Σ_{k=0}^∞ θ_k ⟨φ′⟩^{k+1}.   (21)

We are looking at a non-trivial fixed point, so we can divide by the nonzero κ to obtain

    1 = Σ_{k=0}^∞ θ_k ⟨φ′⟩^{k+1}.   (22)

A comparison with Eq. (6) shows that the two equations are identical if

    λ = 1 / ⟨φ′⟩.   (23)

This is a remarkable relationship between the outliers and autonomously generated fixed points: each non-trivial fixed point x^(i) must be associated with a real eigenvalue λ_i such that the average slope of the firing rates at this fixed point, ⟨φ′⟩_i, fulfills the above condition (23).
In the special case of φ = tanh, the slopes ⟨φ′⟩_i are confined to the interval (0, 1], so the corresponding eigenvalues must be real and larger than one. One may hence look at the spectrum of the connectivity matrix alone and determine how many non-trivial fixed points there are. An instance of this phenomenon is illustrated in Fig. 4. The spectrum at the origin contains two outliers λ_i, i = 1, 2, each real and larger than one. The dynamics have two corresponding fixed points x^(i) located on the manifold M.
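This correspondence can be probed numerically (a sketch with our own parameter choices, using the correlated construction of Eq. (9)): place two real outliers above one, let the dynamics relax, and check that the average slope at the resulting fixed point satisfies 1/⟨φ′⟩ ≈ λ_1.

```python
import numpy as np

rng = np.random.default_rng(5)
N, g = 1500, 0.5
J = rng.normal(0.0, g / np.sqrt(N), size=(N, N))
m = rng.normal(0.0, 1.0, size=N)

# Place real outliers l1 > l2 > 1 via theta_0 = l1 + l2, theta_1 = -l1*l2
l1, l2 = 2.0, 1.3
n = ((l1 + l2) * m - l1 * l2 * (J @ m) / g**2) / N
W = J + np.outer(m, n)

# Relax the dynamics to a fixed point (Euler integration)
dt, x = 0.1, rng.normal(0.0, 1.0, size=N)
for _ in range(3000):
    x += dt * (-x + W @ np.tanh(x))

slope = np.mean(1.0 - np.tanh(x)**2)   # <phi'> at the fixed point
print(1.0 / slope)                     # ≈ l1: the dynamics select the largest outlier
```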

C. Stability of fixed points
The stability of each fixed point is determined by the spectrum of its Jacobian. The associated stability matrix (the Jacobian shifted by the identity) is

    S = (J + m n^T) R,   (24)

with the diagonal matrix of slopes R_ij = δ_ij φ′(x_i). Previous work [4] has found that the spectrum of S, too, consists of a bulk and a small number of exceptional eigenvalues: in the case of an uncorrelated rank-one perturbation, there are two nonzero eigenvalues obtained via mean field theory, only one of which has been found outside the random bulk. The radius of the bulk shrinks to g √⟨φ′²⟩ due to the saturation of the nonlinearity [4]. We find numerically that the bulk behaves alike in our model, too. For the rest of this section, however, we focus on the exceptional eigenvalues of the stability matrix S. Similar to the trivial fixed point, one can apply the matrix determinant lemma to derive an equation for the stability eigenvalues γ:

    1 = n^T R (γ1 − J R)^{-1} m.   (25)

We can apply the mean field theory introduced above to evaluate the right hand side. The details of this calculation are deferred to Appendix E. It turns out that the resulting γ are surprisingly compact. We now describe these stability eigenvalues.
Consider the fixed point x^(i). According to Eq. (23), this fixed point corresponds to the eigenvalue λ_i. Eq. (25) always has two solutions γ_± determined by a quadratic equation. These depend only on the outlier λ_i and the statistics of the fixed point x^(i), but are entirely independent of the remaining spectrum or other fixed points.
Their precise values are detailed in Eq. (F2). It turns out that γ_+ and γ_− always have real parts smaller than one. They hence do not destabilize the fixed point. Additionally, at least one of the two is always hidden within the bulk of the eigenvalues, as observed before for the case of no correlation [4]. In Fig. 4(b-c), the spectra of the Jacobian at two fixed points are compared with the theoretical predictions. In both cases, γ_± correspond to the two stars within the bulk. In Fig. 5(b), the bulk is smaller (g = 0.6) and γ_+ is visible.
If λ_i = λ_1 is the only outlier, then γ_± are the only two solutions of Eq. (25), and the fixed point x^(1) as well as its mirror image −x^(1) will be stable. However, if there are K ≥ 2 outliers {λ_1, ..., λ_K}, we find an additional set of K − 1 stability eigenvalues

    γ_j = λ_j / λ_i,  for j ≠ i.   (26)

This expression indicates a remarkable relationship between different fixed points: the existence of a fixed point x^(j) with outlier λ_j > λ_i will always destabilize the fixed point x^(i) corresponding to λ_i. Conversely, if there are no outliers with real part larger than that of λ_i, then x^(i) will be stable. Since λ = 1/⟨φ′⟩ implies that a larger λ corresponds to a larger fixed point variance ∆_0, one can say that only the largest fixed point can be stable. Such an interaction between two fixed points is illustrated in Fig. 4(b-c). The stars outside of the bulk correspond to the predicted eigenvalue γ_2 or γ_1. Comparison between the theoretical prediction (26) and numerical calculation for a sampled network shows good agreement for both fixed points. Furthermore, a simulation of the dynamics in Fig. 4(d) shows that indeed all trajectories converge to the larger fixed point x^(1) or its negative counterpart. Finally, note that a complex outlier λ_j also destabilizes a fixed point x^(i) if the real part of λ_j is larger than that of λ_i. Complex outliers do not have corresponding fixed points, since Eq. (23) is real. An example of such a case is shown in Fig. 5. There is only one real eigenvalue larger than one, and hence only a single non-trivial fixed point x^(3). Nonetheless, the two complex conjugate outliers λ_1 = λ_2^* destabilize this fixed point by virtue of Eq. (26), since their real parts are larger than the outlier corresponding to the fixed point, Re λ_1 = Re λ_2 > λ_3. Numerical simulations indicate that in such a case, the dynamics converge to a limit cycle.

V. RANK-TWO PERTURBATIONS
The previous section demonstrated two properties of networks with multiple non-trivial fixed points: they are highly correlated due to the confinement to the manifold M [Fig. 3(b-d) and Fig. 4(d)], and their stability properties interact [Eq. (26)]. We asked whether the latter is a result of the former. To approach this question, we extend the model to a rank-two perturbation, which allows for uncorrelated fixed points.
The rank-two connectivity structure is defined by

    P = m n^T + u v^T.   (27)

We assume J, m and u to be drawn independently. Similar to the rank-one case, the entries of both m and u are drawn from standard normal distributions, while n and v are Gaussian and dependent on J, m and u.
The outliers λ of the perturbed matrix J + m n^T + u v^T are calculated similarly to the rank-one case. Applying the matrix determinant lemma twice, we arrive at an equation of quadratic form:

    det(λ 1_2 − Q_λ) = 0.   (28)

In other words, λ is an eigenvalue of the 2 × 2 matrix

    Q_λ = [ [n^T m, n^T u], [v^T m, v^T u] ] + M_λ,   (29)

which depends on λ through

    M_λ = Σ_{k=1}^∞ (1/λ^k) [ [n^T J^k m, n^T J^k u], [v^T J^k m, v^T J^k u] ].   (30)

In general there are more than two solutions, but if the rank-two perturbation is uncorrelated to J, the matrix M_λ disappears in Q_λ. The solutions are then the two eigenvalues of the static overlap matrix in Eq. (29), in agreement with previous results [4].

Non-trivial fixed points of the network dynamics (2) with a rank-two perturbation (27) obey the equation

    x = J φ(x) + κ_1 m + κ_2 u,   (31)

with κ_1 = n^T φ and κ_2 = v^T φ. Similar to the rank-one case, we can apply the recursive insertion of the fixed point and partial integration, Eqs. (19) and (20), to compute the two-component vector κ = (κ_1, κ_2). We arrive at

    Q_λ κ = λ κ,  evaluated at λ = 1/⟨φ′⟩.   (32)

This equation has two consequences: First, we find that λ = 1/⟨φ′⟩, because both sides are eigenvalues of Q_λ [see Eq. (28)]. Second, the feedback vector κ is the corresponding eigenvector. This gives rise to three situations: (i) If Q_λ has two distinct eigenvalues, one of them is equal to λ. The corresponding eigenvector determines the direction of κ.
In the case of degeneracy, the geometric multiplicity, that is, the number of eigenvectors, determines the situation.
(ii) If there is only one eigenvector, the direction of κ is determined uniquely.
(iii) If λ has two corresponding eigenvectors, any direction is a solution. The length of κ is determined below, Eq. (33), and we obtain a ring attractor [4]. This situation arises in the case of precise symmetry, Q_λ = λ 1_2.
Finally, the length of κ is determined by the variance ∆_0 = x^T x / N of the fixed point, which obeys a self-consistency equation [Eq. (33)]. The fixed point stability is calculated based on the techniques introduced above; details can be found in Appendix G. The result is the same as in the rank-one case: the stability eigenvalues obey the same equations as before. Namely, if the spectrum of J + m n^T + u v^T has the outliers {λ_1, ..., λ_K}, there are always two stability eigenvalues γ_±, both with real parts smaller than one. At a fixed point x^(i), there are K − 1 additional stability eigenvalues γ_j = λ_j / λ_i for j ≠ i. This implies that the linear dynamics around a fixed point are completely determined by its statistics and the spectrum of the connectivity matrix: as long as the outliers are the same, the stability eigenvalues are independent of the rank of the perturbation P or its correlations to J.
This also answers our question about whether the correlation between fixed points is responsible for their strong influence on each other. The rank-two case, too, can be analyzed by replacing the feedback κ_1, κ_2 with two constant scalars. The corresponding manifold is now two-dimensional, and fixed points can be arbitrarily uncorrelated. In Fig. 6, we show an example: plotting the projection of the fixed points along the vectors m and u shows that the fixed points are almost orthogonal. Yet, the spectra at the origin and at each fixed point are identical to the corresponding rank-one case (compare with Fig. 4). The correlation between fixed points is hence not necessary for the mutual influence of different fixed points.
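For the uncorrelated rank-two case, the outlier prediction reduces to the eigenvalues of the 2 × 2 static overlap matrix, which is easy to verify (a sketch; the overlap values are our own choices):

```python
import numpy as np

rng = np.random.default_rng(7)
N, g = 2000, 0.5
J = rng.normal(0.0, g / np.sqrt(N), size=(N, N))
m, u = rng.normal(0.0, 1.0, size=(2, N))

# Uncorrelated rank-two structure: n, v independent of J
n = (1.6 * m + 0.4 * u) / N
v = (0.3 * m + 1.9 * u) / N

W = J + np.outer(m, n) + np.outer(u, v)
Q0 = np.array([[n @ m, n @ u],
               [v @ m, v @ u]])       # 2x2 overlap matrix (M_lambda = 0 here)
pred = np.linalg.eigvals(Q0)
top2 = np.sort(np.linalg.eigvals(W).real)[-2:]
print(np.sort(pred.real), top2)       # the outliers match the eigenvalues of Q0
```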

VI. DISCUSSION
Given a network with connectivity consisting of a random and a structured part, we examined the effects of correlations between the two. We found that such correlations enrich the functional repertoire of the network. This is reflected in the number of non-trivial fixed points and the spectrum of the connectivity matrix. We analyzed precisely which aspects of the correlations determine the fixed points and eigenvalues.
In our model, the overlaps θ_k quantify the correlations between the random and structured connectivity components. For uncorrelated networks, only θ_0 is nonzero, and the spectrum of the joint connectivity matrix has only a single outlier [25,26]. We showed that for correlated networks with higher θ_k nonzero, multiple outliers can exist, and that, as a consequence, multiple fixed points can be induced by a random plus rank-one connectivity structure. The correlations between the random part and the rank-one structure hence enrich the dynamical repertoire, in contrast to networks with uncorrelated rank-one structures, which can only induce a single fixed point [4]. Note, however, that our assumption of Gaussian connectivity limits the resulting dynamics to a single stable fixed point (discussed below).
Apart from multiple fixed points, the correlated rank-one structure can also lead to a pair of complex conjugate outliers, which in turn yield oscillatory dynamics on a limit cycle. In the absence of correlations, such dynamics require the perturbation to be at least of rank two [4]. Finally, we found that correlations allow for smaller structures: the norm of a correlated rank-one structure inducing a fixed point decreases with increasing variance of the random part, pointing towards possible benefits of a large initial random connectivity.
Constraining the model to Gaussian connectivity allowed us to analytically understand the mechanisms of correlations in a nonlinear network. We established a remarkable one-to-one correspondence between the outliers of the connectivity matrix and fixed points of the nonlinear dynamics: each real outlier larger than one induces a single fixed point. Surprisingly, the stability of the fixed points is governed by a simple set of equations and also depends only on the outliers of the spectrum at the origin. Through these results, we were able to look at the system at one point in phase space (the origin) and determine its dynamics in a different part of phase space. It remains an open question to which degree these insights extend to non-Gaussian connectivity. Other interesting connectivity models might include sparse connectivity [25], different neuron types [30], or networks of binary neurons such as the Hopfield model [9].
Our approach allows us to gain mechanistic insight into the computations underlying echo state and FORCE learning models, which have the same connectivity structure as our model [20,21]. There, the readout vector n is trained, which leads to correlations with the random part J [3,31]. Our results on multiple fixed points and oscillations show that these correlations are crucial for the rich functional repertoire. However, constraining our theory to Gaussian connectivity limits the insights, since the learning frameworks do not have this constraint. One study analyzing such non-Gaussian rank-one connectivity in the echo state framework showed that, as in our study, each fixed point has one corresponding outlier in the connectivity matrix [3]. However, multiple stable fixed points were observed, in contrast to our model, where the Gaussian connectivity only permits the fixed point with the largest variance to be stable. It would thus be interesting to extend our model beyond Gaussian statistics.
We pointed out a general limitation of networks with random plus rank-one connectivity: the restriction of fixed points to a one-dimensional manifold. This insight is independent of the Gaussian assumption and leads to high correlations between fixed points. Such correlations have been found to impede sequential learning of multiple fixed points [29]. An extension to rank-two structures allows for uncorrelated fixed points. Surprisingly, however, the strong influence of the largest outliers on the stability of fixed points still exists for Gaussian rank-two connectivity. Indeed, the fixed point statistics and their stability are determined solely by the spectral outliers of the connectivity matrix, independently of how these outliers were generated. Since these relations do not hold in the non-Gaussian case [3], we conclude that the Gaussian assumption poses a severe limitation on the space of solutions.
Further in accordance with the echo state and FORCE learning frameworks [20,21], we model the correlations as induced by one of the two vectors forming the rank-one structure. Some of the results, such as the overlaps θ_k, are symmetric under the exchange of the two vectors and should hence be unaffected. The result on the strongly increasing norm of the perturbation when placing multiple outliers, on the other hand, may depend on this assumption [28]. To which degree our results or the capabilities of trained networks are limited by this constraint is not clear. Our choice to model the structured part as a low-rank matrix was in part motivated by the computational models discussed above. Besides these, the existence of such structures may also be inspired by a biological perspective. Any feedback loop from a high-dimensional network through an effector with a small number of degrees of freedom may be considered as a low-rank perturbation to the high-dimensional network. Similarly, feedback loops from cortex through basal ganglia have been modeled as low-rank connectivity [28]. Even without such explicit loops, networks may effectively have such structure if their connectivity is scale-free or contains hubs [32]. Finally, low-rank connectivity also appears outside of neuroscience, for example in an evolutionary setting [33]. Whether low-rank matrices arise in general in learning networks, and to which degree such structure is correlated with the initially present connectivity, are interesting future questions to be approached with the theory we developed here.

Appendix A: Construction of n and self-averaging of the overlaps

In our model, J and m are drawn independently from Gaussian distributions. We construct n from these two quantities for a target set of overlaps θ̂_k. Specifically, we set

    n = (1/N) Σ_k (θ̂_k / g^{2k}) J^k m.   (A1)

Below we show that with this definition the actual overlaps θ_k converge to the targets θ̂_k with increasing network size N. Note that the scaling by 1/N renders the θ_k order one quantities.
We start by analyzing the uncorrelated case, for which θ̂_k = 0 for all k ≥ 1, so that n = θ̂_0 m. We need to show that θ_0 converges to θ̂_0 as N → ∞. To this end, we show that the expected value has this limit and that the variance vanishes. We calculate the scalar product

θ_0 = n^T m / N = θ̂_0 m^T m / N .

The expected value of m^T m / N is one, so we observe that, indeed, θ_0 equals θ̂_0 in expectation. For the variance, the cross terms between different entries of m vanish, and the remaining term is of order O(1/N), since the fourth moment of the entries does not depend on the network size. The scalar product hence has a self-averaging quality in the sense that it converges to its expected value with deviations of order O(1/√N).
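This scaling can be checked numerically. A minimal sketch (all parameter values are arbitrary choices), measuring the empirical standard deviation of the overlap m^T J m / N across realizations of J and m:

```python
import numpy as np

def overlap_std(N, g=0.8, trials=150, seed=0):
    """Empirical standard deviation of the overlap m^T J m / N."""
    rng = np.random.default_rng(seed)
    vals = []
    for _ in range(trials):
        J = rng.normal(0.0, g / np.sqrt(N), size=(N, N))  # entries of variance g^2/N
        m = rng.normal(0.0, 1.0, size=N)                  # standard Gaussian vector
        vals.append(m @ J @ m / N)
    return np.std(vals)

# deviations shrink like 1/sqrt(N): quadrupling N roughly halves the std
s_small, s_large = overlap_std(100), overlap_std(400)
```

Quadrupling N roughly halves the observed deviations, consistent with the O(1/√N) self-averaging.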
The next overlap, θ_1 = θ̂_0 m^T J m / N, is treated similarly. Here, the expected value is equal to zero because of the independence between J and m. The variance decays with N as before: the off-diagonal terms disappear since entries with different indices are independent. Similar calculations can be done for any of the higher-order overlaps θ_k, and one finds that the variance of these terms equally scales as O(1/N). This self-averaging property holds for any of the terms calculated below, so we omit the expectation symbol and simply write equality signs. For example, θ_0 = θ̂_0 and θ_1 = 0.

We now turn to the correlated case. Here, the terms θ̂_k may be non-zero. We only discuss the simplest case with θ̂_k = 0 for all k ≥ 2, since all other cases can be treated similarly. We write

n = θ̂_0 m + (θ̂_1 / g²) J m

and calculate the zeroth overlap

θ_0 = θ̂_0 m^T m / N + (θ̂_1 / g²) m^T J m / N .

The second term self-averages to zero due to the independence between J and m. Similar reasoning applies to the first overlap,

θ_1 = θ̂_0 m^T J m / N + (θ̂_1 / g²) m^T J^T J m / N .

Now, however, it is the first term that vanishes. The second one remains of order O(1): since J^T J self-averages to g² times the identity between vectors independent of J, it evaluates to θ̂_1 m^T m / N, and hence θ_1 = θ̂_1. The last calculation explains the scaling by 1/g^{2k} in the definition of n, Eq. (A1). It also points to a general feature of the algebra of scalar products: products of the sort a^T (J^T)^k J^l b, for vectors a and b independent of J, yield the expected value

a^T (J^T)^k J^l b = δ_{kl} g^{2k} a^T b ,   (A11)

since the innermost pairs J^T J self-average to g² times the identity, and any remaining factors of J average to zero between vectors independent of J. Applying this algebra, one obtains that for the construction of n according to Eq. (A1) one indeed finds θ_k = θ̂_k, valid in expectation and with deviations of order O(1/√N).
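The construction can be verified directly. The following sketch (not the authors' code; parameter values are arbitrary) truncates Eq. (A1) at k = 1 and compares the realized overlaps with their targets:

```python
import numpy as np

rng = np.random.default_rng(1)
N, g = 2000, 0.8
theta_hat = [1.0, 0.5]                 # targets theta_hat_0, theta_hat_1

J = rng.normal(0.0, g / np.sqrt(N), size=(N, N))
m = rng.normal(0.0, 1.0, size=N)

# Eq. (A1), truncated: n = sum_k theta_hat_k / g^(2k) J^k m
n = theta_hat[0] * m + theta_hat[1] / g**2 * (J @ m)

# realized overlaps theta_k = n^T J^k m / N
theta0 = n @ m / N
theta1 = n @ (J @ m) / N
theta2 = n @ (J @ (J @ m)) / N         # target is zero for k >= 2
```

The realized overlaps agree with the targets up to finite-size deviations of order 1/√N.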

Appendix B: Construction of outliers
Here we detail how to construct a rank-one perturbation mn^T such that the joint matrix J + mn^T has a set Λ = {λ_1, . . . , λ_K} of K outliers. Applying Eq. (A1) for n, the question reduces to finding the coefficients θ̂_k. As shown above, these converge to the actual overlaps θ_k for large networks. In the discussion below, we hence omit the hats for clarity, setting θ̂_k = θ_k.
The procedure for determining the θ_k from the λ_i allows some choices. We start by choosing whether the θ_k should form a truncated series or decay exponentially. For the truncated case with θ_k = 0 for all k ≥ K, the equation follows directly from the outlier equation (6):

Σ_{k=0}^{K−1} θ_k λ^{K−1−k} = λ^K .

To obtain K outliers from a non-truncated series, one can write the θ_k as sums of exponentially decaying terms with coefficients a_α and bases b_α. Evaluating the geometric series leads to the polynomial equation (B3). Either choice hence yields a polynomial of degree K, the roots of which need to be the target outliers λ_i. The coefficients of this polynomial can hence be obtained from a comparison with the coefficients of the polynomial p(λ) = Π_{i=1}^K (λ − λ_i). For example, in the case of a truncated series, the coefficients θ_k are determined, up to sign, by the elementary symmetric polynomials of the λ_i,

θ_k = (−1)^k Σ_{i_0 < · · · < i_k} λ_{i_0} · · · λ_{i_k} ,

where the sum runs over the possible choices of k + 1 different indices from the K available ones. Note that the indices for θ_k run from zero to K − 1, whereas those of λ_i run from 1 to K.
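As a numerical illustration (a sketch under the conventions above, with the rank-one part scaled as m n^T / N; parameter values arbitrary), placing K = 2 target outliers via Vieta's formulas:

```python
import numpy as np

rng = np.random.default_rng(2)
N, g = 1500, 0.8
lam = [2.0, -1.5]                     # target outliers, |lam_i| > g

# Vieta: lambda^2 - theta0*lambda - theta1 = (lambda - lam[0]) (lambda - lam[1])
theta0 = lam[0] + lam[1]              # 0.5
theta1 = -lam[0] * lam[1]             # 3.0

J = rng.normal(0.0, g / np.sqrt(N), size=(N, N))
m = rng.normal(0.0, 1.0, size=N)
n = theta0 * m + theta1 / g**2 * (J @ m)   # Eq. (A1), truncated at k = 1

eigvals = np.linalg.eigvals(J + np.outer(m, n) / N)
top2 = eigvals[np.argsort(np.abs(eigvals))[-2:]]   # two largest in magnitude
outliers = np.sort(top2.real)                      # expected near [-1.5, 2.0]
```

The two eigenvalues of largest magnitude land near the targets, outside the bulk of radius g.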
In the case of an exponentially decaying series, one can derive a similar equation from Eq. (B3). However, there are 2K parameters a_α and b_α, so there remains the freedom to choose K of them, as long as |b_α| < g.
Appendix C: Least-squares construction

The outlier conditions can be read as an underconstrained linear system An = 1, with the vector of ones 1 and the matrix A ∈ C^{K×N} defined by its rows, each built from a vector w_λ associated with one target outlier. The least-squares solution to the system is n = A^+ 1, with A^+ the pseudoinverse of A. We express the vector w_λ by its series expansion in powers of J and insert this into the term AA^T. Applying the algebra (A11) developed above yields an explicit expression, which, inserted into Eq. (C3), finally gives the solution n, with coefficients a_α determined by solving a K-dimensional linear equation.

Connecting to the above schemes of constructing n with deliberate overlaps θ_k = n^T J^k m / N, we compute these for the least-squares solution found here. We find that the overlaps decay exponentially, as in Eq. (B2), with bases set by the target outliers.

We observed numerically that for a large number of outliers K the rank-one perturbation became the dominant term in the matrix J + mn^T. To understand this, we consider its Frobenius norm, which is proportional to the norm of n. The squared norm of n can be obtained from the pseudoinverse defined above. To arrive at a general expression for this quantity, we calculated Eq. (C5) explicitly for the cases K = 1 and K = 2 and performed thorough numerical checks for larger K (see Fig. 2). For an increasing number of outliers, the offset by minus one in the resulting expression becomes negligible, and we arrive at the result stated in the main text, Eq. (12).
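The growth of the perturbation norm with the number of outliers is easy to observe in simulation. A sketch (arbitrary parameters) comparing the truncated construction for one and for three target outliers of similar magnitude:

```python
import numpy as np

def perturbation_norm(target_outliers, N=1500, g=0.5, seed=3):
    """Frobenius norm of the rank-one part m n^T / N when n is built by
    the truncated construction placing the given target outliers."""
    rng = np.random.default_rng(seed)
    J = rng.normal(0.0, g / np.sqrt(N), size=(N, N))
    m = rng.normal(0.0, 1.0, size=N)
    # prod_i (lam - lam_i) = lam^K - sum_k theta_k lam^(K-1-k)  (Vieta)
    theta = -np.poly(target_outliers)[1:]
    n = np.zeros(N)
    Jk_m = m.copy()
    for k, th in enumerate(theta):
        n += th / g**(2 * k) * Jk_m    # Eq. (A1), truncated
        Jk_m = J @ Jk_m
    return np.linalg.norm(np.outer(m, n)) / N

one = perturbation_norm([2.0])
three = perturbation_norm([2.0, 2.2, 2.4])
```

Placing three nearby outliers already inflates the norm of the rank-one part by more than an order of magnitude relative to a single outlier of the same size, in line with the scaling discussed above.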

Appendix D: Mean field theory with correlations
We analyze fixed points of the form x = Jφ(x) + κm + I. We extend the setting in the main text by allowing for a constant input vector I. Like the other vectors, we assume I to be Gaussian and uncorrelated with J. We want to compute κ = n^T φ(x) / N. The mean-field assumption is that x is a Gaussian variable; that is, its entries x_i are drawn from a normal distribution with zero mean and variance ∆_0. The variance ∆_0 is determined self-consistently below. Since the entries of the vector follow the same statistics, we look at a representative x_i and drop the index i, writing x = √∆_0 z_x, with the standard Gaussian random variable z_x ∼ N(0, 1). By the model assumption, the vector n is also Gaussian. One can express n explicitly in relation to x by introducing a second, independent standard Gaussian random variable z_n, with parameters σ_n² and ρ encoding the variance of n and its correlation to x. The self-averaging quality of the scalar product then allows us to compute κ by exchanging the scalar product n^T φ for the corresponding Gaussian integral over z_x and z_n, where Dz denotes the standard Gaussian measure. The part of n proportional to z_n vanishes under the integration over z_n; for the remaining summand, the integration over z_n evaluates to 1. The final step involves partial integration: for some function f and the Gaussian variable z ∼ N(0, 1), we have ∫Dz z f(z) = ∫Dz f′(z). This is also known as Stein's lemma [4]. Finally, the angled brackets ⟨·⟩ indicate the average over the fixed point statistics, ⟨f⟩ = ∫Dz f(√∆_0 z).

Such explicit representations of pairs of correlated Gaussian variables have been applied before [3,4]. Below, however, we encounter a multitude of such Gaussian vectors, for which this framework would become increasingly cumbersome. We therefore introduce a new formalism which allows us to model arbitrarily many Gaussian vectors. Let a and b be two such vectors of interest. As before, we are interested in the statistics of representative entries, namely a and b.
We define these as a = B_a · X and b = B_b · X. Here, B_a, B_b ∈ R^K are sets of real-valued coefficients, and X is a K-dimensional standard Gaussian variable with mutually independent entries X_α ∼ N(0, 1). The K-dimensional dot product is defined as B_a · X = Σ_{α=1}^K (B_a)_α X_α. Any additional vectors c, d, . . . are added by defining corresponding coefficients B_c, B_d, . . . ; one just needs to choose the dimension K of the embedding to be sufficiently large.
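A toy example may clarify the embedding: two coefficient vectors B_a and B_b (arbitrary values) generate correlated Gaussian samples whose covariance is the dot product B_a · B_b:

```python
import numpy as np

rng = np.random.default_rng(4)
K, samples = 2, 400_000

B_a = np.array([1.0, 0.0])          # arbitrary coefficient vectors
B_b = np.array([0.6, 0.8])

X = rng.normal(size=(samples, K))   # samples of the K-dim standard Gaussian X
a = X @ B_a                         # a = B_a . X
b = X @ B_b                         # b = B_b . X

cov_ab = np.mean(a * b)             # approaches B_a . B_b = 0.6
```

The empirical covariance of a and b converges to B_a · B_b, which is the property used throughout the calculations below.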
Since scalar products in the N-dimensional physical space are self-averaging, we can write

a^T b / N = B_a · B_b .

In line with the previous sections, the equality sign is only valid in the limit N → ∞, and the variance decays like 1/N. Ultimately, we are only interested in such scalar products. This allows us to use the coefficients B_a merely as placeholders inside calculations, without ever defining their actual values. With this notation we return to the computation of κ, where DX is now the standard Gaussian measure in K dimensions; the partial integration step is applied as before, and the scalar products are expressed via the coefficients defined in Eq. (D8). The advantage of the new formalism is that it allows us to continue the calculation. We insert the fixed point x = Jφ + κm + I. In the case of structure vectors drawn independently from J, the term n^T Jφ vanishes and one recovers the known result κ = ⟨φ′⟩ n^T (κm + I) / N [4]. For the general case, we go on calculating n^T Jφ. The self-averaging scalar product allows us to pull the random matrix to the left side. Recursively applying this strategy, we arrive at

κ = ⟨φ′⟩ n^T M (κm + I) / N ,   with   M = (1 − ⟨φ′⟩ J)^{−1} = Σ_{k=0}^∞ ⟨φ′⟩^k J^k .

For a driven network with nonzero n^T M I, one can compute κ by re-sorting:

κ = (⟨φ′⟩ n^T M I / N) / (1 − ⟨φ′⟩ n^T M m / N) .

The scalar ⟨φ′⟩, Eq. (22), is a function of the fixed point variance ∆_0 = x^T x / N. Due to the independence of m and I from J, the variance obeys the same equation as in previous studies [4],

∆_0 = g² ⟨φ²⟩ + κ² + σ_I² ,   (D14)

with ⟨φ²⟩ = ∫Dz φ²(√∆_0 z) and σ_I² the variance of the entries of I. The coupled nonlinear Eqs. (22), (D13), and (D14) can be solved numerically if the overlaps θ_k = n^T J^k m / N and n^T J^k I / N, which respectively enter n^T M m and n^T M I, are known.
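For the autonomous case I = 0, the closed mean field equations can be solved numerically. A minimal sketch, assuming φ = tanh and arbitrary values for g and the outlier λ, using Gauss–Hermite quadrature for the Gaussian averages, the closure ⟨φ′⟩ = 1/λ, and κ determined up to sign as κ = ±√(∆_0 − g²⟨φ²⟩):

```python
import numpy as np
from scipy.optimize import brentq
from numpy.polynomial.hermite_e import hermegauss

z, w = hermegauss(101)
w = w / w.sum()                      # normalized weights: standard Gaussian measure

def gauss_avg(f, delta0):
    """<f> = int Dz f(sqrt(delta0) z) over the standard Gaussian z."""
    return float(np.sum(w * f(np.sqrt(delta0) * z)))

g, lam = 0.8, 2.0                    # random strength g and target outlier lambda

# closure for the autonomous case: <phi'>(Delta_0) = 1 / lambda
phi_prime = lambda d: gauss_avg(lambda x: 1.0 - np.tanh(x)**2, d)
delta0 = brentq(lambda d: phi_prime(d) - 1.0 / lam, 1e-9, 50.0)

# kappa up to sign
phi_sq = gauss_avg(lambda x: np.tanh(x)**2, delta0)
kappa = np.sqrt(delta0 - g**2 * phi_sq)
```

brentq brackets the unique solution because ⟨φ′⟩ decreases monotonically from 1 to 0 as ∆_0 grows, so a sign change is guaranteed for any λ > 1.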
In the case of no input, I = 0, κ is not directly determined. Instead, ⟨φ′⟩ is given directly by the outliers via Eq. (23), as discussed in the main text. The mean field equations can be closed by numerically solving Eq. (D6) for ∆_0. The corresponding κ is then determined up to the sign by

κ = ±√(∆_0 − g²⟨φ²⟩) .   (D15)

Appendix E: Stability eigenvalues

The matrix JR′ − 1γ, with the diagonal matrix R′_ij = δ_ij φ′(x_i), is invertible as long as γ ∈ C is not within the spectrum of JR′. We apply the matrix determinant lemma to compute the characteristic polynomial; the first bracket of the resulting factorization has to vanish, so that we arrive at the condition for a nonzero γ.
We calculate the terms n^T R′ (JR′)^k m by applying the Gaussian mean field theory introduced above. For brevity we use induction. The hypothesis to be proven, Eq. (E3), expresses n^T R′ (JR′)^k m in terms of the fixed point x and the quantity q defined in Eq. (E4). We start the induction by calculating the corresponding Gaussian integral, using partial integration twice, which yields the third derivative of the nonlinearity. The steps above did not depend on the specific vectors m and n, so we apply the same steps below without explicitly writing out the Gaussian integrals. The entire derivation furthermore does not depend on the vector n. We make use of this independence in the induction step, where we assume the hypothesis (E3) to be true also after replacing n^T with n^T J. This term is identified by square brackets in the calculation below. The first step is to replace m with (JR′)^k m in Eq. (E5). The subsequent step involves the induction hypothesis for k − 1, and including the last term in the sum completes the induction.

The last term needs to be calculated separately; we show the corresponding identity in a separate induction. The start is trivial. For the induction step at k ≥ 1, we insert the fixed point equation; inserting the induction hypothesis for k − 1 then proves the statement. A few comments on the steps of the calculation:

• The crossed-out term in the first line vanishes because m and I are drawn independently of J. Showing this formally is a matter of applying the same techniques as introduced above recursively.
• The step from line two to three involves the vector algebra (A11) introduced above, according to which a^T J^T J b = g² a^T b for two vectors a, b independent of J.
• The fourth line is obtained by applying partial integration. For an arbitrary Gaussian vector a, we have φ^T R′ a = N ∫DX φ(B_x · X) φ′(B_x · X) (B_a · X) = x^T a ⟨φφ″ + φ′²⟩.

[Fig. 7 caption: Orange lines: radius of the bulk; any eigenvalues with smaller magnitude will not be observable, and hence not numerically testable, for finite-size networks. Where the absolute values of γ± coincide, the two form a pair of complex conjugates. Note that for g > 1, the minimal λ needed to stabilize the chaotic activity is larger than one.]

With q = g²⟨φφ″ + φ′²⟩ as defined above, Eq. (E4), we observe that γ± is entirely determined by the fixed point statistics. One can even reduce the problem to two parameters, for example the outlier λ and the network parameter g, which quantifies the strength of the random connectivity. This allows us to scan the numerical values of γ± thoroughly. The results can be observed in Fig. 7. The first observation is that both γ+ and γ− are always smaller than one. They hence do not destabilize the fixed point. Additionally, we can compare γ± to the radius of the bulk, which is given by r = g√⟨φ′²⟩ [4]. This shows that γ− always lies within the bulk and is hence not observable numerically.
The roots of the first bracket in Eq. (F1) are identified by comparing once again with Eq. (5), which defines the outliers λ̃ of the spectrum at the origin. In fact, the equation is identical, but now the variable is λγ; here, λ is the outlier corresponding to the fixed point under consideration. We hence need to fulfill λγ = λ̃ for some λ̃ in the set of outliers. Accordingly, the solutions are given by Eq. (26) in the main text.
Appendix G: Stability for rank-two perturbation

The stability of a fixed point in the case of a rank-two perturbation can be evaluated similarly to the rank-one case. One simply applies the matrix determinant lemma once more on the stability matrix. Let λ = 1/⟨φ′⟩ be the outlier corresponding to the fixed point under consideration. All mean field quantities will hence implicitly depend on λ, even though we omit this dependency in the notation. The quadratic equation for the stability eigenvalues γ involves the mean field form of the stability matrix and M̃_γ = R′ (1 − JR′/γ)^{−1}. As above, Eq. (E12), we have a corresponding expression for two vectors a and b, with M_λ = (1 − J/λ)^{−1} as before, Eq. (30). The second line is valid in the case of an autonomous fixed point, cf. Eq. (F1). Evaluating the terms appearing in Q̃_γ, we arrive at expressions in which we abbreviated A = ⟨φ′⟩λγ / ((γ − q)(γ − 1)) and ∆Q = γQ_λ − Q_{λγ}. The trace and determinant are then conveniently evaluated as

det(Q̃_γ) = (det(Q_{λγ}) / λ²) (1 + A κ^T Q_{λγ}^{−1} ∆Q κ) ,   (G5)
Tr(Q̃_γ) = (1/λ) Tr(Q_{λγ}) + A κ^T ∆Q κ .
We hence arrive at the same solutions as in the case of a rank-one perturbation, Eqs. (26) and (F2).