Repeated sequential learning increases memory capacity via effective decorrelation in a recurrent neural network

Memories in neural systems are shaped through the interplay of neural and learning dynamics under external inputs. By introducing a simple local learning rule into a neural network, we found that the memory capacity is drastically increased by sequentially repeating the learning steps of input-output mappings. This enhancement is attributed to the generation of a pseudo-inverse correlation in the connectivity. It is associated with the emergence of spontaneous activity that intermittently exhibits neural patterns corresponding to embedded memories. Memories are stabilized by a distinct bifurcation from the spontaneous activity upon application of each input.

Through sequential learning, the brain learns to respond appropriately to various inputs. In neural systems, synaptic connections are modified to shape neural dynamics such that the applied stimulus and the desired response are adequately represented therein. After learning, the stimulus is represented according to the shaped neural dynamics [1][2][3][4][5]. How memories are successively embedded into neural dynamics through the interplay between the neural dynamics and the learning process is a crucial question in neuroscience.
To understand the representation of memories in neural systems, associative memory models are often studied. In conventional models [6][7][8], multiple memories are embedded into corresponding attractors, which are generated by a simple learning rule. Despite their success, these models do not take the interplay between neural dynamics and learning into account: when a new memory is learned, the change in a connection is determined only by the memory pattern, independently of the already-shaped neural dynamics.
In contrast, we previously proposed an associative memory model [9,10] that incorporates interactions between neural dynamics and the learning process. In these studies, however, each pattern was presented only once during learning, and existing memories were gradually eroded as new patterns were learned.
In the present Letter, we first introduce a theoretical formulation of a sequential and repeated learning process that interacts with neural dynamics. By studying this learning process, we investigate whether all memories can be successfully stored by repeated learning. If so, we then address what kind of neural-network structure enables this and how memories are represented in neural dynamics upon input. We also study spontaneous dynamics without input, which are suggested to be involved in computation in neural systems [11][12][13][14][15][16].

* kurikawt@hirakata.kmu.ac.jp
We consider a model that consists of $N$ continuous rate-coding neurons that memorize $M$ input-output (I/O) mappings. The activity $x = \{x_i\}$ ($i = 1, 2, \cdots, N$) takes values between $-1$ and $1$ and evolves according to

$$\dot{x}_i = \tanh\Big(\beta\Big(\sum_{j \neq i} J_{ij} x_j + \gamma \eta^\mu_i\Big)\Big) - x_i, \qquad (1)$$

where $J_{ij}$ denotes a connection from the $j$-th to the $i$-th neuron, $\beta$ is the gain parameter, the $N$-dimensional vector $\eta^\mu$ is an input pattern with strength $\gamma$, and $\mu$ is the index of the I/O mappings to be learned. In the following, $\gamma$ and $\beta$ are set to 1.0 and 4.0, respectively, unless otherwise stated. For each input $\eta^\mu$, we set an $N$-dimensional vector $\xi^\mu$ as a target. These input and target patterns are generated as random $N$-bit binary patterns, with probabilities $P(\xi_i = \pm 1) = P(\eta_i = \pm 1) = 1/2$. In the presence of each input $\eta^\mu$, the corresponding target $\xi^\mu$ is required to be recalled, i.e., an attractor matching $\xi^\mu$ is generated. The learning process must therefore modify the connectivity $J_{ij}$ so that the network can recall the targets.
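The rate dynamics described above can be sketched numerically as follows. This is a minimal illustration that assumes the form dx_i/dt = tanh(β(Σ_{j≠i} J_ij x_j + γ η_i)) − x_i, which keeps the activity between −1 and 1 as stated in the text. The function name `simulate`, the Euler step `dt`, the number of steps, the seed, and the random binary connectivity used for the demonstration are illustrative assumptions.

```python
import numpy as np

def simulate(J, eta, x0, beta=4.0, gamma=1.0, dt=0.05, steps=2000):
    """Euler-integrate dx/dt = tanh(beta * (J @ x + gamma * eta)) - x (assumed form)."""
    x = x0.copy()
    for _ in range(steps):
        # J has a zero diagonal, so J @ x is the sum over j != i.
        x += dt * (np.tanh(beta * (J @ x + gamma * eta)) - x)
    return x

rng = np.random.default_rng(0)
N = 200
eta = rng.choice([-1.0, 1.0], size=N)                      # random binary input pattern
J = rng.choice([-1.0, 1.0], size=(N, N)) / np.sqrt(N - 1)  # unlearned binary connectivity
np.fill_diagonal(J, 0.0)                                   # no self-connections
x_final = simulate(J, eta, rng.uniform(-0.1, 0.1, size=N))
```

Because each Euler update is a convex combination of the current state and a tanh value, the state stays inside [−1, 1] whenever it starts there and `dt` is below 1.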
Previously [10], we showed that such a memory structure is formed through a simple learning rule. To allow repeated sequential learning, we added a decay term:

$$\dot{J}_{ij} = \epsilon\,(\xi^\mu_i - x_i)(x_j - h_i J_{ij}), \qquad (2)$$

where $h_i = \sum_{j \neq i} J_{ij} x_j$. We use a learning rate $\epsilon = 0.03$ unless otherwise stated. According to this learning rule, $d(\sum_{j \neq i} J_{ij}^2)/dt \propto (1 - \sum_{j \neq i} J_{ij}^2)$. Thus, this rule preserves $\sum_{j \neq i} J_{ij}^2 = 1$ if this holds initially: before learning, $J_{ij}$ takes a binary value with probabilities $P(J_{ij} = \pm (N - 1)^{-1/2}) = 1/2$. The diagonal elements of $J$ are kept at zero throughout the process. The learning process stops automatically when the neural activity matches the target, because then $\dot{J}_{ij} = 0$; otherwise, it continues. We impose $M$ I/O maps successively: an input is applied to learn a target, and once learning of that map is completed, another input is applied to learn another target. The learning process for a single I/O map is called a learning step, and the number of learning steps is denoted by $T$. The maps are applied in order for $T \leq M$ steps and in random order for $T > M$ steps.

Fig. 1A shows the recall processes in response to two input patterns after learning. $\eta^1$ is applied from $t = 50$ to $100$. Under this input, the neural dynamics are modified and the neural state converges to $\xi^1$. When $\eta^2$ is applied instead of $\eta^1$, the neural dynamics are modified differently and the neural state converges to $\xi^2$. A neural state that provides a desired target pattern is thus an attractor. Here, both I/O maps are successfully recalled.
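The norm-preserving property of the learning rule can be checked directly. The sketch below assumes the rule has the form dJ_ij/dt = ε(ξ_i − x_i)(x_j − h_i J_ij) with h_i = Σ_{j≠i} J_ij x_j, which is one form consistent with the stated properties (a decay term, conservation of Σ_{j≠i} J_ij² = 1, and automatic stopping when the activity matches the target). The function name `learning_velocity` and the test state are hypothetical.

```python
import numpy as np

def learning_velocity(J, x, xi, eps=0.03):
    """dJ_ij/dt = eps * (xi_i - x_i) * (x_j - h_i * J_ij), an assumed form of the rule."""
    h = J @ x                                    # diagonal of J is zero, so j = i drops out
    dJ = eps * (xi - x)[:, None] * (x[None, :] - h[:, None] * J)
    np.fill_diagonal(dJ, 0.0)                    # keep self-connections at zero
    return dJ

rng = np.random.default_rng(1)
N = 50
J = rng.choice([-1.0, 1.0], size=(N, N)) / np.sqrt(N - 1)  # rows satisfy sum_j J_ij^2 = 1
np.fill_diagonal(J, 0.0)
x = rng.uniform(-1.0, 1.0, size=N)               # arbitrary network state
xi = rng.choice([-1.0, 1.0], size=N)             # target pattern

dJ = learning_velocity(J, x, xi)
row_norm_rate = 2.0 * np.sum(J * dJ, axis=1)     # d/dt of sum_j J_ij^2 for each row
dJ_at_target = learning_velocity(J, xi, xi)      # activity equal to the target
```

With normalized rows, `row_norm_rate` vanishes up to rounding error, and `dJ_at_target` is identically zero, which corresponds to learning stopping once the target is reached.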
We first analyzed how repeated learning enhances the memory capacity. For this purpose, we computed the temporal average of the overlap of neural activity $x$ with a target, $[\overline{m^\mu}]$ with $m^\mu = \sum_i x_i \xi^\mu_i / N$, in the presence of input $\eta^\mu$ ($\mu = 1, \ldots, M$). Here, $\overline{\cdots}$ and $[\cdots]$ represent the temporal average and the average over networks and trials, respectively (Fig. 1B). At the early stage of learning ($T = M$), networks can recall only one or two targets perfectly, and the overlaps with the other targets decrease rapidly, independently of $M$. After these targets are learned repeatedly ($T = 30M$), however, recall performance increases and about 60 targets are recalled perfectly. Networks fail to memorize target patterns beyond $M = 60$. Thus, $M = 60$ indicates the limit of memory.
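As a small illustration of the overlap measure, the following sketch computes m = Σ_i x_i ξ_i / N for a single state and its temporal average over a stored trajectory. The function names and the (time, neuron) array layout are assumptions made for the example.

```python
import numpy as np

def overlap(x, xi):
    """Overlap m = sum_i x_i * xi_i / N between a state x and a target xi."""
    return float(np.dot(x, xi)) / xi.size

def mean_overlap(trajectory, xi):
    """Temporal average of the overlap for a trajectory of shape (time, N)."""
    return float(np.mean(trajectory @ xi)) / xi.size

rng = np.random.default_rng(2)
xi = rng.choice([-1.0, 1.0], size=200)
trajectory = np.tile(xi, (10, 1))       # a state that sits exactly on the target
```

An overlap of 1 means perfect recall of the target, and an overlap of −1 corresponds to the sign-reversed pattern.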
To evaluate this memory capacity in detail, we calculated the averaged overlap $[\langle \overline{m^\mu} \rangle_\mu]$ over maps, networks, and trials, and plotted it for different $N$ in Fig. 1C, where $\langle \cdots \rangle_\mu$ represents the average over the index of I/O maps. After $T = M$ learning steps, the average overlap decreases rapidly with $\alpha = M/N$, whereas after $T = 30M$ learning steps, the overlap is maintained at around unity up to $\alpha = 0.3$. Therefore, the capacity of the present model is estimated to be $\alpha_c$ between 0.3 and 0.35. To explore the dependence of the memory on $T$, we examined $[\langle \overline{m^\mu} \rangle_\mu]$ for different $T$. We found that the memory capacity increases monotonically with $T$ and saturates around $T = 20M$ (see Fig. S1). Thus, we studied the behavior for $T = M$ and $T = 30M$ as typical samples of the earlier and later stages of learning.
The enhancement in memory capacity after iterative learning is not trivial: it depends on the learning speed $\epsilon$ and on $\beta$. As shown in Fig. S1, the memory capacity decreases as $\epsilon$ increases and as $\beta$ decreases. In particular, for $\epsilon \geq 1$ and for $\beta \leq 1$, only about one pattern is stored, and this number cannot be increased by repeated learning. This result indicates that an adequate relation between the timescales of neural and learning dynamics, as well as the nature of the neural dynamics, is important for enhancing memory capacity through repeated learning.

Next, we examined the nature of $J$ shaped throughout the learning process and its relevance to the enhancement in memory capacity. To this end, we calculated the singular values of the connectivity at different learning steps. A learned connectivity $J$ is decomposed as $J = U \Sigma V^t$, where $V^t$ is the transpose of $V$ and $\Sigma$ is a diagonal matrix whose elements are the singular values. The values are plotted in order of magnitude for $N = 200$, $M = 60$ in Fig. 2A. They decrease continuously at earlier learning steps, whereas after long learning a large discontinuity appears at 60. For different $N$, the singular values always show a discontinuous drop at $M$ in the later learning stage. This means that $M$ left and right singular vectors become dominant in the connectivity through learning [17].
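The singular-value analysis can be illustrated on a toy matrix. The sketch below does not use the learned connectivity of the model. As a stand-in it builds a rank-M matrix from random binary targets and inputs (outer products of differences and sums of the patterns) and adds a weak random background, then measures the drop between the M-th and (M+1)-th singular values. The noise amplitude and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 200, 60
xi = rng.choice([-1.0, 1.0], size=(M, N))    # targets, one pattern per row
eta = rng.choice([-1.0, 1.0], size=(M, N))   # inputs, one pattern per row

# Toy stand-in for a learned J: rank-M structure plus a weak random background.
J_toy = (xi - eta).T @ (xi + eta) / N + 0.001 * rng.standard_normal((N, N))

U, s, Vt = np.linalg.svd(J_toy)              # s is returned in descending order
drop_ratio = s[M - 1] / s[M]                 # size of the step at index M
```

A large `drop_ratio` is the numerical signature of the discontinuity in the sorted singular values: M singular vectors dominate and the remaining N − M carry only the background.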
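Recalling the connections in the Hopfield model [7] ($J = \sum_{\mu=0}^{M-1} \xi^\mu (\xi^\mu)^t / N$) and in our previous models [9, 10] ($J = \sum_\mu (\xi^\mu - \eta^\mu)(\xi^\mu + \eta^\mu)^t / N$), we hypothesize that these $M$ vectors consist mainly of linear combinations of $\xi$ and $\eta$, and that the other $N - M$ left and right singular vectors lie in the space normal to these combinations. To examine this hypothesis, we used

$$a^{\kappa,\mu} = \frac{1}{\sqrt{N}} \sum_i u^\kappa_i \xi^\mu_i, \quad b^{\kappa,\mu} = \frac{1}{\sqrt{N}} \sum_i u^\kappa_i \eta^\mu_i, \quad c^{\kappa,\mu} = \frac{1}{\sqrt{N}} \sum_i v^\kappa_i \xi^\mu_i, \quad d^{\kappa,\mu} = \frac{1}{\sqrt{N}} \sum_i v^\kappa_i \eta^\mu_i.$$

Here, $u^\kappa_i$ and $v^\kappa_i$ are the $i$-th elements of the $\kappa$-th left and right singular vectors, respectively. The contributions of $\xi^\mu$ and $\eta^\mu$ to $u^\kappa$ ($v^\kappa$) are roughly estimated by $a^{\kappa,\mu}$ and $b^{\kappa,\mu}$ ($c^{\kappa,\mu}$ and $d^{\kappa,\mu}$), respectively [18]. We measured $\langle \sum_\mu (a^{\kappa,\mu})^2 \rangle_{0 \leq \kappa \leq M-1}$, the average contribution of the targets to one of the dominant $M$ left singular vectors, and the corresponding quantities for $b$, $c$, and $d$ (Fig. 2B). All of the values are much higher than the chance level $M/N = 0.3$, meaning that the dominant $M$ vectors consist mainly of targets and inputs. In particular, the contributions of $a^{\kappa,\mu}$ and $c^{\kappa,\mu}$ increase with learning.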
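The chance level M/N quoted above can be reproduced with random vectors. The sketch below assumes that the contribution of a pattern to a unit vector is measured as the projection Σ_i u_i ξ_i / √N. It applies this to random orthonormal vectors, which play the role of singular vectors that are unrelated to the patterns. The helper name `contributions` is hypothetical.

```python
import numpy as np

def contributions(vectors, patterns):
    """Projections sum_i v_i * p_i / sqrt(N); vectors is (K, N), patterns is (M, N)."""
    return vectors @ patterns.T / np.sqrt(patterns.shape[1])

rng = np.random.default_rng(4)
N, M = 200, 60
xi = rng.choice([-1.0, 1.0], size=(M, N))

# Random orthonormal vectors stand in for singular vectors unrelated to the patterns.
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
random_units = Q.T[:M]

a_random = contributions(random_units, xi)
chance = float(np.mean(np.sum(a_random**2, axis=1)))   # expected to be close to M / N
```

For vectors unrelated to the patterns, the summed squared projection averages to about M/N, so values well above this level indicate that the vectors are built from the patterns.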
We also found that $a^{\kappa,\mu}$ is highly correlated with $b^{\kappa,\mu}$, and $c^{\kappa,\mu}$ with $d^{\kappa,\mu}$ (Fig. S2). Thus, the dominant $M$ left and right singular vectors are decomposed as

$$u^\kappa \simeq \frac{1}{\sqrt{N}} \sum_\mu a^{\kappa,\mu} (\xi^\mu + k^\mu \eta^\mu), \qquad v^\kappa \simeq \frac{1}{\sqrt{N}} \sum_\mu c^{\kappa,\mu} (\xi^\mu + l^\mu \eta^\mu),$$

where $k^\mu$ ($l^\mu$) is the correlation coefficient between $a^{\kappa,\mu}$ and $b^{\kappa,\mu}$ ($c^{\kappa,\mu}$ and $d^{\kappa,\mu}$). We also found that $a^{\kappa,\mu}$ is highly correlated with $c^{\kappa,\mu}$ across $\kappa$ for a given $\mu$, but not with $c^{\kappa,\nu}$ for $\nu \neq \mu$ (Fig. S2). In total, $J$ is then decomposed as

$$J \simeq \frac{1}{N} \sum_{\mu,\nu} S_{\mu\nu} (\xi^\mu + k^\mu \eta^\mu)(\xi^\nu + l^\nu \eta^\nu)^t,$$

where $S_{\mu\nu} = \sum_\kappa \sigma_\kappa a^{\kappa,\mu} c^{\kappa,\nu}$ and $\sigma_\kappa$ is the $\kappa$-th singular value. Note that, to enhance recall performance, the non-diagonal terms $S_{\mu\nu}$ ($\mu \neq \nu$) should be small. In our model, they are indeed much smaller than the diagonal terms of $S$, since there is no correlation between $a^{\kappa,\mu}$ and $c^{\kappa,\nu}$. Additionally, these non-diagonal terms are further reduced as learning progresses, as shown in Fig. 2C.

To achieve optimal memory capacity, it is generally believed that the inverse of the correlation matrix between targets has to be introduced into the connectivity [19][20][21] to reduce the interference due to correlations between targets. In our model, instead of obtaining the exact form of the connectivity [22], we focus on whether the learned connectivity effectively decorrelates the patterns. Recall that, in the standard Hopfield network, which corresponds to the case where $S$ is a diagonal matrix, the standard deviation of $\xi^\mu J \xi^\nu / N$ ($\mu \neq \nu$) follows $O((\alpha/N)^{1/2})$, whereas it follows $O(N^{-1/2})$ for the pseudo-inverse correlation matrix [19][20][21]. We hence measured $\xi^\mu J \xi^\nu / N$ ($\mu \neq \nu$) for the present connection matrix shaped by learning and estimated its dependence on $N$ and $\alpha$ (Fig. 2D and Fig. S2). We found that the standard deviation follows $O((\alpha/N)^{1/2})$ at the earlier stage of learning ($T = M$), but comes to follow $O(N^{-1/2})$ at the later stage ($T = 30M$).
This result implies that our learning rule effectively shapes the inverse correlation matrix into the connectivity throughout the learning process, optimally reducing interference.
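The interference measure used above can be written as a small diagnostic. The following sketch computes the standard deviation of ξ^μ J ξ^ν / N over pairs μ ≠ ν for any connectivity matrix. As a reference case it is applied to a pseudo-inverse (projection) connectivity built with `numpy.linalg.pinv`, not to the learned connectivity of the model. The function name `crosstalk_std` and the demonstration sizes are assumptions.

```python
import numpy as np

def crosstalk_std(J, xi):
    """Standard deviation of xi^mu J xi^nu / N over mu != nu; xi has shape (M, N)."""
    M, N = xi.shape
    C = xi @ J @ xi.T / N
    off_diagonal = ~np.eye(M, dtype=bool)
    return float(np.std(C[off_diagonal]))

rng = np.random.default_rng(5)
N, M = 200, 60
xi = rng.choice([-1.0, 1.0], size=(M, N))

# Reference case: projection onto the span of the targets (pseudo-inverse connectivity).
J_proj = xi.T @ np.linalg.pinv(xi.T)
sigma_proj = crosstalk_std(J_proj, xi)
```

For the projection matrix, J ξ^ν = ξ^ν, so the measured quantity reduces to the raw overlap between random targets and `sigma_proj` is close to N^(−1/2). A learned connectivity would be passed to the same function in place of `J_proj`.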
Next, we analyzed how memories are represented after the inverse correlation matrix is shaped. We first focus on how the neural dynamics change with the input strength $\gamma$. In Fig. 3A, we plot a bifurcation diagram against $\gamma$ for $T = 30M$. Neural activity for $\gamma = 0$, i.e., spontaneous neural activity, oscillates around the origin. As $\gamma$ increases, the activity moves towards a target while maintaining its oscillation amplitude. At a certain strength, the attractor of the neural dynamics bifurcates from an oscillation to a fixed point corresponding to the target. The neural dynamics around the bifurcation point, projected onto a 2-D plane, are plotted in Fig. 3. Neural activity with a large-amplitude oscillation collapses into a fixed point corresponding to the target between $\gamma = 0.65$ and $0.7$. Beyond the bifurcation point, the fixed point stays around the target as $\gamma$ is increased. Thus, neural activity corresponding to target recall is clearly distinguished from other activity through a bifurcation and is stable against changes in $\gamma$ beyond the bifurcation point.
We then explored the behavior of neural activity under a mixture of two learned inputs. As an example, the phase diagram of $m^{48}$ against the strengths of two learned inputs ($\mu = 48, 49$) is shown in Fig. 3B. The fixed point corresponding to $\xi^{48}$ forms a distinctive phase, at whose boundary a bifurcation from the target fixed point to oscillating dynamics occurs (Fig. S3). The fixed point of the other target ($\mu = 49$) yields a similar phase diagram (Fig. S3). As the input pattern is changed by increasing $\gamma$ in the form $\gamma \eta^{48} + (1 - \gamma) \eta^{49}$ ($0 \leq \gamma \leq 1$), the attractor changes from the fixed point of $\xi^{49}$ to oscillating neural activity close to $\xi^{49}$, then to oscillating activity close to $\xi^{48}$, and finally to the fixed point of $\xi^{48}$. These results show that a target is represented as a distinctive phase of the fixed point, which is separated by bifurcations from the oscillating attractor.
We next asked how robust these memories are against perturbations of the inputs. To examine the robustness, we applied quenched random noise with strength $s$ to the original input patterns, as $\eta'^\mu = \eta^\mu + s\zeta$ (where $\zeta$ is an $N$-dimensional vector whose elements are random numbers drawn from the uniform distribution on $[-1, 1]$), and analyzed the stability of the neural activity that recalls the target (Fig. 3C). For small $s$, the fixed point of the target is insensitive to the noise and the overlap remains around unity. Beyond the bifurcation point, the fixed point collapses into oscillating neural activity.
To close the analysis of neural dynamics, we explored how spontaneous activity is related to recall performance through learning. At the earlier learning stage, spontaneous activity shows chaotic dynamics that intermittently approach and depart from targets (Fig. S4A). Here, only a few targets, each of which is successfully recalled upon input, are approached (as well as their opposite patterns, owing to the parity symmetry of our model) (Fig. 4A and Fig. S4B). At the later stage, the spontaneous activity also approaches targets, but many targets are now approached more equally. We further analyzed the neural dynamics by using principal component (PC) analysis and by measuring the Lyapunov dimension (Fig. 4B). We found that the spontaneous activity has larger variation and is more chaotic as learning progresses and recall performance improves. Thus, for lower recall performance, the spontaneous activity is constrained to a lower-dimensional space along the axes connecting target patterns, whereas for higher performance it is distributed more isotropically with respect to the target patterns across higher dimensions.
To confirm that this relation between spontaneous activity and recall performance holds generally, we examined the spontaneous activity for different $\epsilon$ (Fig. S4). For smaller $\epsilon$, recall performance is higher and the spontaneous activity shows a high-dimensional distribution that is close to all of the targets. As $\epsilon$ increases, in contrast, recall performance decreases and the spontaneous activity becomes low-dimensional, approaching only a few targets, which are perfectly recalled. Finally, for quite large $\epsilon$ ($= 5$), the spontaneous activity reduces to a few fixed points that are target patterns, and only these targets are successfully retrieved. These results support the relation between the spontaneous activity approaching the targets and recall performance.
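The distinction between low-dimensional and isotropic activity can be illustrated with a principal component spectrum. The sketch below computes the fraction of variance carried by each principal component of a trajectory and applies it to two synthetic data sets, one confined to a plane and one isotropic. It is a generic PCA illustration on made-up data, not the analysis pipeline of the study, and all names are assumptions.

```python
import numpy as np

def pca_variance_ratios(trajectory):
    """Fraction of variance per principal component for a (time, N) trajectory."""
    centered = trajectory - trajectory.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)   # singular values, descending
    variance = s**2
    return variance / variance.sum()

rng = np.random.default_rng(6)
T, N = 1000, 50

# Synthetic activity confined to a two-dimensional plane in N dimensions.
plane = rng.standard_normal((2, N))
low_dim = rng.standard_normal((T, 2)) @ plane

# Synthetic activity spread isotropically over all N dimensions.
isotropic = rng.standard_normal((T, N))

ratios_low = pca_variance_ratios(low_dim)
ratios_iso = pca_variance_ratios(isotropic)
```

For the planar data the first two components carry essentially all of the variance, whereas for the isotropic data the variance is spread almost evenly across components.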
In summary, by studying neural networks that memorize I/O maps, we have shown how repeated learning stabilizes each memorized state and enhances memory capacity via the interplay between neural dynamics and learning. In usual sequential learning, e.g., gradient descent methods [23,24] and palimpsest memory [25][26][27], connections are shaped slowly: the network's output moves in the direction of the desired target but does not match it after a single step. In contrast, in the present study, connections are modified such that the network generates the correct target after each step in one shot. Thus, we can analyze how targets are embedded in neural dynamics and how the representation of these targets changes through learning. The interaction between neural dynamics and learning has been investigated in several studies to reveal how neural representations are shaped [1,2,[28][29][30][31]. These studies, however, did not focus on the parametric effects of neural dynamics (e.g., the gain parameter) and learning (e.g., the learning speed) on learning performance and the representation of memories.
Spontaneous activity that intermittently reproduces stimulus-evoked patterns is commonly reported in visual [12,32] and auditory [33] cortices. Theoretical studies [28,[34][35][36][37] have demonstrated how such spontaneous activity is shaped through learning. Our study provides another simple learning rule that forms such spontaneous activity. Further, we showed a relation between features of spontaneous activity and recall performance, which is consistent with the interpretation of spontaneous activity as a prior distribution in terms of probabilistic inference [11][12][13][14]. More generally, properties of neural dynamics relevant for information processing have been investigated [38][39][40][41], and the edge of chaos has been suggested as an appropriate regime. Our model suggests that high-dimensional chaos with intermittent visits to learned patterns is suitable for producing appropriate targets in response to inputs. The role of such itinerant dynamics [42] has been discussed for decades [43][44][45], and the present study provides a concrete demonstration of it.
The pseudo-inverse model [19][20][21] can achieve a higher memory capacity ($1.0N$) than the standard Hopfield network ($0.14N$) [7,8]. In this model, the inverse correlation matrix of the memories is included in the connectivity to reduce interference among memories during recall, and non-local information is required to shape this connectivity. Further, Diederich [21] proved that the local learning rule $\Delta J_{ij} = (1/N)(\xi^\mu_i - \sum_k J_{ik} \xi^\mu_k)\xi^\mu_j$ can shape such a connectivity after repeated learning. In our model, if we focus only on the relaxation dynamics in the vicinity of $\xi^\mu$, which is a fixed point of the neural dynamics in Eq. (1), the learning rule in Eq. (2) takes a form similar to the Diederich rule when the decay term is neglected. This may partially explain why our local, repeated learning shapes the connection matrix to include the inverse correlation matrix and enhances the memory capacity.
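The effect of repeating a local error-correcting rule can be checked numerically. The sketch below applies the update ΔJ_ij = (1/N)(ξ_i − Σ_k J_ik ξ_k)ξ_j to each pattern in turn, over many sweeps, starting from J = 0, and compares the result with the pseudo-inverse (projection) matrix computed by `numpy.linalg.pinv`. As a simplification that is not part of the model in the text, self-connections are not constrained to zero here. The sizes, sweep count, and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
N, M = 100, 10
xi = rng.choice([-1.0, 1.0], size=(M, N))    # one pattern per row

J = np.zeros((N, N))
for sweep in range(200):                      # repeated presentation of all patterns
    for mu in range(M):
        p = xi[mu]
        # Local error-correcting update: Delta J_ij = (p_i - sum_k J_ik p_k) * p_j / N
        J += np.outer(p - J @ p, p) / N

residual = float(np.max(np.abs(xi @ J.T - xi)))   # how far J xi^mu is from xi^mu
J_pinv = xi.T @ np.linalg.pinv(xi.T)              # projection onto the span of the patterns
max_difference = float(np.max(np.abs(J - J_pinv)))
```

Each single update makes the presented pattern an exact fixed vector of J, and repeated sweeps remove the interference this introduces for the other patterns, so that `residual` and `max_difference` both become small.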
In the present study, in contrast to ordinary associative memory models [7,8,[25][26][27], each memory is recalled through an input-induced bifurcation from the spontaneous neural activity. After repeated learning, the spontaneous activity and the fixed point of the recalled memory state are separated discontinuously by this bifurcation, resulting in the stability of the memory against a perturbed input pattern. Although the modulation of neural dynamics by input has been analyzed in some studies [46][47][48], our study suggests that the memory state is represented as a robust and distinct phase in the parameter space of input strength. This discrete representation of memory is often observed in auditory [49] and olfactory [50] cortices and in the hippocampus [51]. In these areas, neural activity patterns switch discretely between two memory states depending on the intensity of sensory inputs and/or the mixture ratio of two different inputs. Our model provides a simple learning rule that forms such memory representations and gives a prediction in terms of spontaneous activity properties and memory performance.

I. ACKNOWLEDGEMENT
We thank David Colliaux for fruitful discussion. This work was partly supported by KAKENHI (no. 18K15343) and Hitachi The University of Tokyo.
We studied how memory performance changes through learning. We plotted $[\overline{m^\mu}]$ against $\mu$ for increasing $T$, sorted in order of the magnitude of the overlap, in Fig. S1A(i). At the early learning stage, only a few targets are stored, whereas at the later stage the number of perfectly recalled targets increases rapidly. We measured $[\langle \overline{m^\mu} \rangle_\mu]$ as recall performance in Fig. S1A(ii). For $N = 200$ and $M = 60$, we plot the recall performance against the learning step $T$. The capacity increases rapidly up to $T = 10M$ and almost saturates at $T = 20M$.

B. Recall performance for different ε and β
For $\epsilon = 0.03$ and $\beta = 4$, the memory capacity is enhanced through repeated learning. We explored its dependence on parameters, in particular $\epsilon$ and $\beta$. $\epsilon$ sets the timescale of the learning process relative to that of the neural dynamics. We plotted the capacity curve for various $\epsilon$ in Fig. S1B. As $\epsilon$ increases, the number of patterns that are successfully recalled decreases, and for $\epsilon > 1$ only one pattern is recalled. We also explored the dependence of the recall performance on $\beta$. Generally, in randomly coupled neural network models, attractors change from fixed points to chaos as $\beta$ increases. We plotted the recall performance for different $\beta$ in Fig. S1C. As $\beta$ increases, the recall performance increases. For $\beta < 1$, only one or two memories are recalled successfully. These results show that the relationship between the timescales of the neural dynamics and the learning process is crucial for shaping successful memories.

E. Representation of targets
We explored how the overlap with the target behaves against inputs. In addition to Fig. 3B, Fig. S3A shows the bifurcation of the overlap with target 48 against the strength of input 48 in the presence of input 49. The bifurcation of the overlap with target 49 is plotted in Fig. S3C, whereas the bifurcation diagram for inputs 48 and 49 is shown in Fig. S3B. All the results support that recall of the target pattern is represented as a distinctive phase of the corresponding fixed-point attractor, separated from oscillating neural activity.

F. Spontaneous activity
We analyzed how the nature of spontaneous activity changes through learning. Spontaneous activity shows chaotic behavior that intermittently approaches some targets (Fig. S4A). At the earlier learning stage, we found a clear correlation between the recall performance $\langle m \rangle$ and the maximum overlap $\langle \max_{0<t<1000} m^\mu(t) \rangle_\mu$, as shown in Fig. S4B. The few targets that show nearly perfect recall performance are closely approached by the spontaneous activity. At the later learning stage, in contrast, no clear correlation appears. Almost all targets show perfect recall performance, and their closeness (the maximum overlap) is distributed around intermediate values.
Next, we explored spontaneous activity for different $\epsilon$. As $\epsilon$ decreases and recall performance increases (Fig. S1B and Fig. S4E), the spontaneous activity becomes more broadly distributed (Fig. S4D and Fig. S4F) and more chaotic (Fig. S4F). This relation between the spontaneous activity and recall performance is consistent with that found across learning steps. For quite large $\epsilon$, some fixed points are shaped instead of chaotic dynamics, one of which corresponds to the latest trained network in