Suppression of chaos in a partially driven recurrent neural network

The dynamics of recurrent neural networks (RNNs), and particularly their response to inputs, play a critical role in information processing. In many applications of RNNs, only a specific subset of the neurons generally receive inputs. However, it remains to be theoretically clarified how the restriction of the input to a specific subset of neurons affects the network dynamics. Considering RNNs with such restricted input, we investigate how the proportion, $p$, of the neurons receiving inputs (the"inputs neurons") and the strength of the input signals affect the dynamics by analytically deriving the conditional maximum Lyapunov exponent. Our results show that for sufficiently large $p$, the maximum Lyapunov exponent decreases monotonically as a function of the input strength, indicating the suppression of chaos, but if $p$ is smaller than a critical threshold, $p_c$, even significantly amplified inputs cannot suppress spontaneous chaotic dynamics. Furthermore, although the value of $p_c$ is seemingly dependent on several model parameters, such as the sparseness and strength of recurrent connections, it is proved to be intrinsically determined solely by the strength of chaos in spontaneous activity of the RNN. This is to say, despite changes in these model parameters, it is possible to represent the value of $p_c$ as a common invariant function by appropriately scaling these parameters to yield the same strength of spontaneous chaos. Our study suggests that if $p$ is above $p_c$, we can bring the neural network to the edge of chaos, thereby maximizing its information processing capacity, by amplifying inputs.


I. INTRODUCTION
Large-scale recurrent neural networks (RNNs) exhibit various dynamical patterns, including limit cycles and chaos.They have been used to model brain functions such as working memory [1], motor control [2], and context-based learning [3].The rich dynamics exhibited by an RNN can also be used for information processing.Reservoir computing (RC) [4,5] is a machine learning framework that utilizes large RNNs, called "reservoirs", to reproduce the time series data of interest.RC is not restricted to RNNs, and indeed a wide range of dynamical systems can serve as the reservoir under appropriate conditions.Physical reservoir computing, in which a real physical system is used as the reservoir, has been an area of active research in recent years [6].
In general, RNNs must possess rich dynamics in order to assimilate a diverse set of signals to be learned.For this reason, it is advantageous for RNNs to exhibit chaotic spontaneous activity.On the other hand, for an RNN to successfully reproduce a target time series, it must converge to the same state each time it receives a particular set of input signals, regardless of its initial internal state.This property is known as the "echo state property" [4] in the context of RC.Hence, it is hypothesized that an RNN that displays varied spontaneous activity while maintaining consistency in response to inputs will exhibit superior computational performance.Such an RNN is commonly referred to as being at the "edge of chaos", and it is known empirically that reservoirs in this regime have the highest computational capacity [7].In * shotaro.takasu.63x@st.kyoto-u.ac.jp fact, there is experimental evidence suggesting that mammalian neuronal networks operate in this critical regime [8,9].
Lyapunov spectrum analysis [10] allows us to study the dynamics of RNNs in a quantitative manner.The maximum Lyapunov exponent (MLE) characterizes the exponential rate of separation of infinitesimally close trajectories.In the case that a dynamical system is driven by given input signals, the MLE is called the maximum conditional Lyapunov exponent (MCLE).Two identical RNNs with slightly different initial states will converge to the same state under the same inputs if and only if the MCLE is negative.Therefore, a negative MCLE is necessary for an RNN to exhibit consistency with respect to inputs.
It is known that the MCLE of a random RNN decreases with the strength of the input signal and eventually becomes negative [11][12][13][14][15].In other words, sufficiently amplified driving input signals can suppress chaotic dynamics.These findings suggest that it is possible to shift the state of an RNN exhibiting chaotic spontaneous dynamics toward the edge of chaos by appropriately amplifying the input, and then use its shifted state as an efficient reservoir [16].There is also experimental evidence indicating similar suppression of chaos in the brain [17].
Previous studies of the Lyapunov exponents of large random RNNs assume a model in which every unit in the RNN connects to the input layer and receives driving signals.Hereafter, we refer to RNNs of this type as "full-input RNNs".However, such a model is biologically implausible because biological synapses are sparse [18].Additionally, in physical reservoir computing, it is often unfeasible to connect the input to all reservoir units.Therefore, it is important to investigate the dynamics of RNNs in the case that only a subset of the neurons receive input signals.We refer to RNNs of this type as "partial-input RNNs."It remains to be elucidated whether sufficiently amplified inputs always suppress the chaotic activity in a partial-input RNN, as in the case of a full-input RNN.In this work, we address this question by analytically calculating the MCLE of a partial-input random RNN.

II. NETWORK MODEL
We investigate the discrete-time dynamics of a random sparse RNN with N ≫ 1 neurons of which only pN receive inputs (Fig. 1).We define the parameter p ∈ [0, 1], called the "input partiality," which determines the fraction of neurons coupling to the input unit.We consider the case in which the input signal s(t) at time t is a scalar for simplicity, although our theory can be straightforwardly extended to multi-dimensional inputs.The state of a neuron that receives inputs is represented by a dynamical variable x i (t) ∈ R (i = 1, • • • , pN ) whose evolution obeys the equation while the state of a neuron that does not receive inputs is represented by a dynamical variable y i (t) ∈ R (i = pN + 1, • • • , N ) whose evolution obeys the equation Here, u i (i = 1, • • • , pN ) is the coupling weight connecting the input signal to the ith neuron, J ij is a recurrent weight matrix determining the coupling from the jth to the ith neuron, and ϕ is the activation function.The value of u i is drawn randomly from a Gaussian distribution with zero mean and unit variance.We define J ij to be a random sparse matrix whose elements are nonzero with probability α ∈ (0, 1].The non-zero element is independently drawn from a Gaussian distribution with zero mean and variance g 2 /N .The gain parameter g represents the recurrent coupling strength.The activation function is chosen as ϕ(x) = erf( √ π 2 x) for analytic tractability.It has been found that in the absence of inputs, this RNN exhibits a transition from fixed-point dynamics to chaotic dynamics at g = 1 in the limit of a large network [19].In previous studies, zero mean white Gaussian noise has typically been used for the input signal s(t) [11][12][13][14]], but we do not limit the choice of s(t) in this way, and instead regard it to be a time series that satisfies a certain condition to be shown in Sec.IV .Overall, the model parameters to be varied are the input partiality, p, the recurrent connection sparsity, α, and the recurrent coupling strength, g.

III. DERIVATION OF THE MAXIMUM CONDITIONAL LYAPUNOV EXPONENT A. Mean-field theory
We can obtain the statistical properties of x i (t) and y i (t) using a mean-field approach [12,13] in the limit N → ∞.As seen from Eqs.( 1) and ( 2), x i (t + 1) and y i (t + 1) are sums of large numbers of identically distributed independent variables, {J ij ϕ(x j (t))} pN j=1 and {J ik ϕ(y k (t))} N k=pN +1 .Thus, according to the central limit theorem, we can consider x i (t) and y i (t) to follow Gaussian distributions.It thus suffices to determine their averages, ⟨x i (t)⟩ and ⟨y i (t)⟩, and variances, ⟨x 2 i (t)⟩ and ⟨y 2 i (t)⟩, where ⟨• • • ⟩ denotes the average over realizations of the quenched weights J ij and u i .
Taking the average of the evolution equation of x i (t) and x i (t) 2 over realizations of weights J ij and u i , we obtain Similarly, ⟨y i (t + 1)⟩ and ⟨y i (t + 1) 2 ⟩ are given by Note that we have used the assumption that ϕ(x j ) and ϕ(y k ) is independent of its incoming weight J ij and J ik .
This assumption is justified in the limit N → ∞ using the generating-function formalism [11,20].
Introducing the notation K(t) := ⟨y i (t) 2 ⟩, the Gaussian distributions exhibited by x i (t) and y i (t) can be expressed as follows: From Eqs.(3b) and ( 4), we can directly calculate K(t+1), obtaining where

B. Derivation of the MCLE
The time series {K(t)} t obtained above allows us to derive the MCLE, λ, of the RNN [11].The MCLE is defined as the asymptotic growth rate of the distance between two replicated RNNs, where ∥δ(t)∥ denotes the distance at time t between two replicated RNNs receiving the same input signals.
Linearizing the evolution equation of the RNN, we obtain the variational equation describing the evolution of infinitesimal perturbation δ(t), where the matrix Φ ′ (t) is the diagonal matrix whose ith diagonal entry is ϕ ′ (x i (t)) for 1 ≤ i ≤ pN or ϕ ′ (y i (t)) for pN + 1 ≤ i ≤ N .The typical growth rate, ∥δ(t + 1)∥/∥δ(t)∥, is determined by the spectral radius of the Jacobian J Φ ′ (t).According to the random matrix theory [21], the spectral radius ρ(t) is given by in the limit of large network size.Applying the results of the mean-field theory (Eqs.(4)), we can calculate λ, yielding λ = lim T →∞,δ(0)→0

2T
T −1 Figure 2(a) displays the MCLE, λ, as a function of the input intensity σ defined below, for p = 0.4 and p = 0.6, with g = 3.0 and α = 1.0.Here we assume the input to be Gaussian white noise with zero mean and standard deviation σ.The analytic results obtained from Eq.( 7) are consistent with the results of the numerical simulations.For both values of p, λ decreases monotonically as a function of σ.However, while the MCLE for p = 0.6 falls below 0 around σ = 20, the MCLE for p = 0.4 remains positive throughout the range of σ shown in the figure.

IV. THE MAXIMUM LYAPUNOV EXPONENT CONDITIONED BY INFINITELY LARGE INPUTS
In this work, our main interest is to determine whether the MCLE falls below 0 with sufficiently amplified inputs.However, it cannot be answered by observing the numerical simulations shown in Figure .2(a)even if simulations are performed for quite large σ, which motivates us to derive the analytic value of the MCLE conditioned by infinitely large inputs, λ ∞ .We introduce a scaling parameter of input signals, ξ > 0, and then represent the amplified input signals by {ξs(t)} t .The MCLE conditioned by infinitely large inputs is denoted by λ ∞ := lim ξ→∞ λ.
The quantity λ ∞ can be derived analytically as follows.We assume that K(t) converges to a certain value, K ∞ ,  7) is plotted with a solid curve for p = 0.4 and a dashed curve for p = 0.6, where the time average in Eq.( 7) is computed up to T = 10 5 .Error bars represent ±std of direct numerical simulations based on the definition of λ in Eq.( 6).For each plot, the coupling strength is set to g = 3.0, and the sparsity is set to α = 1.0.For the numerical simulations, the value of the MCLE was obtained by directly calculating Eq.( 6) for a sufficiently long time, i.e., T = 10 4 .(b) Analytic results (solid, dashed, dotdashed curves) and numerical results (error bars indicating ±std) for λ∞ with sparsity α = 1.0.In numerical simulations, the value of λ∞ was obtained by directly calculating Eq.( 6) for a sufficiently large input magnitude (σ = 10 3 ).In the cases of both (a) and (b), the input is assumed to be Gaussian white noise with zero mean and standard deviation σ.The values of the MCLE (λ and λ∞) were calculated for 10 different network realizations with a network size of N = 1000.
as ξ becomes sufficiently large.Whether this assumption is valid or not depends on the nature of the input time series.We will discuss a case where K(t) does not converge in Sec.VI.The value of K ∞ is given by This expression for K ∞ is obtained by replacing s(t) with ξs(t) in Eq.( 5) and considering the limit ξ → ∞.Taking this limit and substituting K ∞ for K(t) in Eq.( 7), we obtain We have confirmed that the value of λ ∞ obtained from Eqs.( 9) is consistent with the results of numerical simulations (Fig. 2(b)).
Figure 2(b) depicts the relationship between p and λ ∞ for various values of the recurrent weight intensity g.We define the value of p at λ ∞ = 0 as the "critical input partiality", p c .Because the condition λ ∞ > 0 always holds for p < p c , we conclude that even sufficiently amplified input signals cannot suppress the chaotic activity of the RNN if p < p c . Figure 3(a) depicts the dependence of p c on g with fixed α.We obtain the curve by solving Eq.( 9) for K ∞ with λ ∞ = 0 and substituting this K ∞ into Eq.(8), yield-ing Below the curve, λ ∞ is positive, and thus, in this region, chaos is not suppressed no matter how strong the input.As seen in Figure 3(a), p c is an increasing function of g.
We next investigate the effect of sparsity α on p c .Plotting the g-p c curves with several values of α, we find that a sparser RNN results in a smaller value of p c (Fig. 3(a)).This is intuitively understandable, because the dynamics of a sparser RNN are less chaotic, and thus a smaller value of the input partiality is sufficient to control the chaos.To take account of this relationship, we introduce the MLE of the RNN with no input, denoted by λ 0 .Clearly, λ 0 quantifies the strength of chaos in spontaneous activity of the RNN, and it can be determined analytically by substituting s(t) ≡ 0 into Eq.( 5) and Eq.( 7), yielding Interestingly, we find that when p c is plotted with respect to λ 0 , the resulting curves for all values of α coincide, as seen in Figure 3(b).The reason for this coincidence is easily understood by considering Eqs.( 10)-( 12).From Eqs.( 11) and ( 12), we see that λ 0 is a function of αg 2 .Writing the corresponding inverse function as αg 2 = f (λ 0 ), and substituting this into Eq.(10), we obtain p c expressed as a function of λ 0 alone.This finding implies that p c depends primarily on the strength of spontaneous chaos, independently of how sparse the recurrent connection is.

V. PERFORMANCE OF A PARTIALLY DRIVEN RESERVOIR COMPUTING
Finally, we study the information processing capability of a partial-input RNN employed as a reservoir for RC.Memory capacity [22] is a commonly used benchmarks for RC.It is a measure of the ability of a reservoir to perform short-term memory tasks through the reconstruction of its past input signals.The memory capacity is defined as follows.From the N neurons, K (1 ≤ K ≤ N ) lead-out units are randomly chosen, and represented by a vector x(t) ∈ R K .The reservoir's output is defined as ẑ(t) := w ⊤ x(t), where the vector w ∈ R K represents the output weights.In a τ -delay memory task, the reservoir at time t is required to output the previous input signal s(t − τ ), and the output weights are trained to minimize the mean squared error between ẑ(t) and the desired output s(t − τ ).This is accomplished with a least-squares method, and the trained output weights are determined as ŵ = (XX ⊤ ) −1 Xs, where X := (x(1) • • • x(T )) and s := (s(1) • • • s(T )) ⊤ (T being the length of the simulation).After training, we evaluate the task performance M τ defined as where the brackets represent the time average.Because the numerator of the second term in Eq.( 13) is the mean squared error, M τ approaches 1 as the reservoir learns to accurately reconstruct its past input s(t − τ ).The memory capacity M C is defined as the sum of the M τ , M C := ∞ τ =1 M τ .It has been mathematically proved that M C satisfies the inequality 0 ≤ M C ≤ K [22,23].
Assuming that the input signal s(t) is Gaussian white noise with zero mean and variance σ 2 , we calculated both λ and M C. The results are plotted as functions of σ in Figure 4(a) and (b).As previously noted, an increase in the input magnitude leads to a decrease in the MCLE (as seen in Fig. 4(a)), with the result that the plot in Fig. 4(c) shifts leftward as σ increases.From Eq.( 10), p c is found to be approximately 0.074 under the conditions employed in Fig. 4. When p = 0.15 and p = 0.50, the plots intersect the vertical line λ = 0 in Fig. 4(c), as our theory predicts, and the memory capacity reaches its maximum value near λ = 0. Contrastingly, the plots with p = 0.05 and p = 0.07 remain in the chaotic domain (λ > 0), and the memory capacity remains relatively low.It is thus seen that once input connections have been built such that p exceeds p c , optimal computational capability can be realized only by amplifying the input signals appropriately.This finding should be helpful for the physical reservoir computing paradigm, because amplifying input signals is generally easier and more cost effective than adding new input connections.

VI. DISCUSSION
In the present work, we have examined a partial-input RNN with rate neurons and have analytically shown the existence of a critical input partiality p c that determines whether the chaotic activity can be suppressed by input signals.In our theory, RNNs are assumed to have ratebased neurons and recurrent weights sampled from Gaussian distribution.It is our future work whether there exists critical input partiality in other types of an RNN such as that with spiking neurons or heavy tailed recurrent weights [24].
In the derivation of λ ∞ (Sec.IV), we have assumed lim ξ→∞ K(t) to exist.However, there are some examples where this assumption does not hold.For example, let us consider the case where an input time series, s(t), has a non-negligible number of zeros.Then, even if the scaling parameter ξ approaches infinity, ξs(t) also has a non-negligible number of zeros, and thus, the value of K(t) determined by Eq.( 5) does not converge.Although it remains to be seen what condition on input signals is required for the existence of lim ξ→∞ K(t), we believe that our theory holds for a wider variety of input signals than Gaussian white noise.
We have focused on the sign of an MCLE to investigate the echo state property (ESP) of an RNN.In RNNs, the sign of the MCLE has been widely regarded as a representative indicator of whether the ESP holds or not.
However, it should be noted that a negative MCLE does not necessarily guarantee that the ESP holds.We discuss two typical cases below.
Firstly, a negative MCLE does not always guarantee ESP.This distinction stems from the fact that the Lyapunov exponent primarily characterizes local stability, whereas ESP is related to the global stability of the network's response to identical input signals.For example, if there are two locally stable attractors, the different initial states in the reservoir can lead to convergence to different attractors.This scenario results in different outputs in response to identical inputs, violating the conditions of ESP.
Moreover, there is another counterexample where an RNN with a positive λ ∞ can behave as having a stable attractor under certain tuned input signals.This can be realized by inputs that are precisely constructed by chaos control methods, such as those used in the Poincaré map [25].However, in the context of reservoir computing, chaos control conditions are rarely satisfied in realistic situations because the input signals are typically predetermined as training data.
Taking all of the above into consideration, a negative MCLE is a reliable, though not infallible, indicator of ESP.In fact, our numerical simulations demonstrated that the ESP always holds with the negative MCLE (data not shown).Consequently, we believe that these exceptions does not substantially affect the general applicability of our results.
We have confirmed that memory capacity is maximized near the critical value of MCLE, λ = 0, which corresponds to the "edge of chaos."Our theory suggests that we can readily construct a partial-input RNN at the edge of chaos by tuning the magnitude of input signals, as long as the input partiality exceeds p c .The present study provides a possible novel approach to designing reservoir computing.
FIG. 1.A schematic depiction of the partial-input RNN studied in this work.The shaded region represents the neurons that receive input signals through input connectivity.

FIG. 2 .
FIG. 2. (a)The maximum conditional Lyapunov exponent (MCLE), λ, calculated for various values of the input partiality, p.The theoretical form given in Eq.(7) is plotted with a solid curve for p = 0.4 and a dashed curve for p = 0.6, where the time average in Eq.(7) is computed up to T = 10 5 .Error bars represent ±std of direct numerical simulations based on the definition of λ in Eq.(6).For each plot, the coupling strength is set to g = 3.0, and the sparsity is set to α = 1.0.For the numerical simulations, the value of the MCLE was obtained by directly calculating Eq.(6) for a sufficiently long time, i.e., T = 10 4 .(b) Analytic results (solid, dashed, dotdashed curves) and numerical results (error bars indicating ±std) for λ∞ with sparsity α = 1.0.In numerical simulations, the value of λ∞ was obtained by directly calculating Eq.(6) for a sufficiently large input magnitude (σ = 10 3 ).In the cases of both (a) and (b), the input is assumed to be Gaussian white noise with zero mean and standard deviation σ.The values of the MCLE (λ and λ∞) were calculated for 10 different network realizations with a network size of N = 1000.

FIG. 3 .
FIG. 3. (a)Relation between the coupling strength, g, and the critical input partiality, pc, given by Eq.(10) for several values of the sparsity, α.(b) The same data as in (a) plotted with respect to λ0 rather than g.It is seen that when plotted in this manner, the data for pc fall along the same curve for each value of α considered.

FIG. 4 .
FIG. 4. Relation between the memory capacity, M C, and the maximum conditional Lyapunov exponent (MCLE), λ, for network size N = 1000, coupling strength g = 1.5, sparsity α = 1.0, and number of lead-out nodes K = 10.The values of (a) λ and (b) M C are respectively plotted as functions of the standard deviation of the input signals, σ.(c) Each plot represents (λ, M C/K) with various values of σ (0.01 ≤ σ ≤ 20).The sum of the Mτ is calculated up to τ = 500.