Embedding Theory of Reservoir Computing and Reducing Reservoir Network Using Time Delays

Reservoir computing (RC), a particular form of recurrent neural network, is under explosive development owing to its exceptional efficacy and high performance in the reconstruction and/or prediction of complex physical systems. However, the mechanism behind these effective applications of RC remains unclear, awaiting deep and systematic exploration. Here, by combining the delayed embedding theory with the generalized embedding theory, we rigorously prove that RC is essentially a high-dimensional embedding of the original input nonlinear dynamical system. Using this embedding property, we unify into a universal framework the standard RC and the time-delayed RC, where time delays are introduced only into the network's output layer, and we further find a trade-off relation between the time delays and the number of neurons in RC. Based on these findings, we significantly reduce the network size of RC and promote its memory capacity in system reconstruction and prediction. More surprisingly, a single-neuron reservoir with time delays is sometimes sufficient for these tasks, while a standard RC of any large size but without time delays cannot complete them.

The last decades have witnessed the extensive application and development of machine learning in data-driven research as well as in high-technology industry. As a representative leader among machine learning techniques, the artificial neural network (ANN) has emerged as a powerful approach well suited to supervised learning problems. Among the various ANN architectures, reservoir computing (RC), a recently developed framework [1], is a special variant of the recurrent neural network, also known as a generalization of the echo-state network (ESN) [2] or the liquid state machine (LSM) [3], and has been reported to have great efficacy in the reconstruction and/or prediction of many complex physical systems based only on observational time-series data [4][5][6][7]. The architecture of RC is quite compact. As shown in Fig. 1(a), only three weight matrices are involved: the input matrix and the reservoir recurrent matrix are randomly generated and then fixed, while the output matrix is determined via training. As such, efficient least-squares optimization rather than the resource-consuming back-propagation algorithm is adopted in the training process [8]. Behind such a compact architecture, two questions arise naturally: "What is the fundamental mechanism resulting in the efficacy of RC?"
and "How can the structure be improved using the uncovered mechanism?" These questions have attracted great attention and motivated abundant discussion, ranging from the topology and complexity of random connections [9,10] to the spectral radius of random networks and the edge of chaos [11][12][13], from the fading-memory property [14] to the echo-state property [15,16], and from the choice of activation functions [17] to the training algorithm of the output layer [18]. Yet recent understanding of RC often rests on heuristic interpretation: it is widely believed that a successful RC should possess high dimensionality, nonlinearity, fading memory, and the separation property [6], but rigorous mathematical demonstrations are scarce.
In order to decipher the RC's capacity for reconstructing and forecasting nonlinear dynamics, several efforts from the viewpoint of dynamical systems have recently been made. For example, the regression model and the dynamic mode decomposition method were used to illustrate the usefulness of RC in forecasting chaotic dynamics [19,20], and, to demonstrate the approximation capability of RC, an embedding conjecture was studied and partially validated for a specific form of RC under the right technical conditions [21,22]. In the area of photonic neural networks, an architecture of photonic reservoir computing has been developed by using a spatiotemporal analogy to translate a delayed differential equation (DDE) into a virtual single-neuron reservoir network [23][24][25]. Still, despite these significant efforts and achievements, some key questions remain unsolved: "How can the network dimension of a general RC be understood using the theories of nonlinear dynamics and functional analysis?" and "How can a small-size RC network be designed while sustaining its efficacy?"
In this Letter, we rigorously study the mechanism of RC from the viewpoint of nonlinear dynamical systems and propose a novel framework of time-delayed RC. In particular, combining the delayed embedding theory with the generalized embedding theory, we first rigorously prove that a general reservoir network is a high-dimensional embedding of the original input nonlinear dynamical system. Then, we reveal a trade-off relation between the time delays and the number of neurons by unifying into a universal framework the standard RC without delays and the time-delayed RC, where the time delays are introduced into the network's output layer. This allows us to construct a random reservoir network with a significantly reduced physical dimension that achieves the efficacy of the original larger RC. Surprisingly, we show that a standard single-neuron reservoir, without introducing any DDE or time-division multiplexing technique, can sometimes work well for reconstructing and forecasting representative physical systems. Moreover, we find a flexible memory capacity in the time-delayed RC, which makes it possible to accomplish more challenging dynamics-reconstruction tasks that cannot easily be achieved using a standard RC of the same scale.
We start with a standard RC as sketched in Fig. 1(a). Here, the input data x_k ∈ R^n represents the state vector of a dynamical system evolving on a compact manifold M with the evolution operator ϕ ∈ Diff^2(M): x_{k+1} = ϕ(x_k). The vector r_k ∈ R^m represents the state of the m reservoir neurons at time step k, and the input weight matrix W_in and the reservoir network matrix W_res are, respectively, m × n and m × m random matrices generated according to certain distribution laws. The dynamical evolution of the reservoir neurons is governed by

(RN): r_{k+1} = (1 − α) r_k + α φ(W_res r_k + W_in x_k),

where α is the leakage factor and φ ∈ C^2(R, (−1, 1)) is set to a sigmoid function (e.g., tanh) in this Letter. The output vector y_k ∈ R^l is determined by the output weight matrix W_out ∈ R^{l×m} such that y_k = W_out r_k. In the task of nonlinear system reconstruction, given the time series x_k, k = 1, …, N + 1, as training data, the target is to train W_out so as to approximate the one-step dynamics prediction, i.e., y_k ≈ x_{k+1}. To achieve this, W_out is generally calculated by minimizing the L2-regularized loss function L = Σ_{k=1}^{N} ||x_{k+1} − W_out r_k||^2 + β ||W_out||^2. To rigorously establish an embedding theory for RC, we consider directly the evolution (RN) of the reservoir neurons with the leakage factor α = 1, i.e., r_{k+1} = φ(W_res r_k + W_in x_k). Here, the generic conclusion in Theorem 1 means that, for all [r_0, W_res, W_in] ∈ S, where S ⊂ I^m × R^{m×m} × R^{m×n} is an open and dense set, G_k[r_0, W_res, W_in] is an embedding for any sufficiently large k. A detailed and rigorous proof with respect to the C^1 topology is provided in the Supplemental Information (SI) [26]. Moreover, the echo-state property, a necessary condition for constructing an RC, requires that, with the generic configuration {W_in, W_res, φ}, the evolutions (RN) of the reservoir neurons, starting from any different initial values r_0, converge to one another asymptotically [15]. Hence, by virtue of Theorem 1, regardless of the choice of the initial value r_0, the dynamics of the reservoir neurons is determined by the input dynamics; that is, there exists a unique embedding Ψ such that r_k = Ψ(x_k) after a transient phase, while each component r_{ik} = Ψ_i(x_k) implies that the dynamics of each neuron is an observable of the original dynamics.
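As a concrete illustration of the evolution (RN) and the ridge-regression training of W_out, the following NumPy sketch can be used; the network size, spectral radius, and input scaling below are illustrative choices, not the parameter settings used in this Letter.

```python
import numpy as np

def run_reservoir(x, m=200, alpha=1.0, rho=0.9, sigma=0.1, seed=0):
    """Iterate (RN): r_{k+1} = (1 - alpha) r_k + alpha tanh(W_res r_k + W_in x_k).

    x has shape (N, n); returns the reservoir states R with shape (N, m)."""
    rng = np.random.default_rng(seed)
    n = x.shape[1]
    W_in = sigma * rng.uniform(-1.0, 1.0, (m, n))
    W_res = rng.uniform(-1.0, 1.0, (m, m))
    W_res *= rho / np.max(np.abs(np.linalg.eigvals(W_res)))  # set spectral radius
    r = np.zeros(m)
    R = np.empty((x.shape[0], m))
    for k in range(x.shape[0]):
        r = (1.0 - alpha) * r + alpha * np.tanh(W_res @ r + W_in @ x[k])
        R[k] = r
    return R

def train_readout(R, targets, beta=1e-6):
    """Closed-form minimizer of sum_k ||x_{k+1} - W_out r_k||^2 + beta ||W_out||^2."""
    m = R.shape[1]
    return targets.T @ R @ np.linalg.inv(R.T @ R + beta * np.eye(m))
```

Training then amounts to `W_out = train_readout(R[:-1], x[1:])`, after which the output W_out r_k can be fed back as the next input to generate the autonomous dynamics.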
In the standard RC investigated above, m, the number of reservoir neurons, also known as the reservoir dimension, is often required to be huge [6,8]. To design a different RC framework that significantly reduces m, we introduce time delays into the output layer, as sketched in Fig. 1(c). While the configuration {W_in, W_res, φ} and the input data x are set in the same manner, the reservoir network is assumed to include only q (< m) neurons. A new reservoir vector before the output layer is designated as r̃_k, formed by the lagged dynamics of each neuron: each of the q neurons contributes d_i lagged states [r_{i,k}, r_{i,k−τ}, …, r_{i,k−(d_i−1)τ}], where τ is a time delay. Correspondingly, the output matrix W_out is calculated by minimizing the L2 loss function L = Σ_k ||x_{k+1} − W_out r̃_k||^2 + β ||W_out||^2. We assign d = Σ_{i=1}^{q} d_i as the output dimension of this delayed RC. Now, we are in a position to demonstrate that the time-delayed RC with the above-assigned d has the same representation and computation ability as the standard RC involving m neurons without time delay under the same parameter settings, as long as d ≈ m.
Indeed, based on the delayed embedding theory and its applications [27][28][29][30], a combination of lagged observables also generically forms an embedding: for smooth observational functions h_1, …, h_q, the delay map F(x) = [h_1(x), h_1(ϕ^{−τ}(x)), …, h_1(ϕ^{−(d_1−1)τ}(x)), …, h_q(x), …, h_q(ϕ^{−(d_q−1)τ}(x))] is generically an embedding as long as Σ_{i=1}^{q} d_i > 2 dim(M). Using the above-obtained conclusion that each neuron is generically an observable, we further conclude that the proposed new reservoir vector r̃_k is also an embedding. Thus, the dynamics of the state vector r_k in the m-neuron reservoir network without time delay is topologically conjugate to the dynamics of the reservoir vector r̃_k of a q-neuron reservoir network, in the sense of embedding, as long as m = d with d = Σ_{i=1}^{q} d_i, as sketched in Fig. 1(b). Consequently, we conclude that the delayed observables of the RC state, seen as additional nonlinear observables, have the same computational power in system reconstruction.
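To make the construction of r̃_k concrete, the following sketch stacks lagged copies of each neuron's trajectory into the delayed feature vector; the function name and the uniform-lag assumption (the same number of lags and the same τ for every neuron) are ours for illustration.

```python
import numpy as np

def delayed_features(R, n_lags, tau):
    """Build the delayed reservoir vectors r~_k from neuron states R (shape N x q).

    Row k of the result is [r_{i,k}, r_{i,k-tau}, ..., r_{i,k-(n_lags-1)tau}]
    stacked over all q neurons; `start` is the first k with a full set of lags."""
    N, q = R.shape
    start = (n_lags - 1) * tau
    cols = [R[start - j * tau : N - j * tau] for j in range(n_lags)]
    return np.hstack(cols), start
```

The resulting matrix has q · n_lags columns, so the effective output dimension d = q · n_lags can match the m of a larger non-delayed reservoir, and the readout is trained on it exactly as before.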
To demonstrate the capability of our time-delayed RC, we first consider the benchmark Lorenz system. After a training phase including N = 6,000 samples, the autonomously generated dynamics of the RC are shown in Fig. 2(a). In particular, we use a standard RC, a time-delayed RC containing fewer neurons with uniform lagged dynamics for each neuron, and a time-delayed RC containing the same reduced number of neurons but with random lags for each neuron. Clearly, the time-delayed RC has almost the same system-reconstruction performance as the non-delayed one, no matter whether the lags are uniformly or randomly generated. This coincides with the above arguments from the viewpoint of embedding: the dynamics of the non-delayed RC is a generalized embedding of the input dynamics with generically 200 observables, while the dynamics of the time-delayed RC forms an embedding of dimension 200 when the sum of the lags equals 200, for either uniform or random lags. This trade-off relation is further illustrated in Fig. 2(b), where different numbers of neurons are combined with different numbers of lags per neuron, and, for each combination, a training error is calculated as the mean squared error (MSE) on the training data set, averaged over 20 independent runs. As depicted in Fig. 2(b), for a fixed moderate number of neurons, the training error decreases monotonically with the number of lags per neuron, and, for a fixed moderate number of lags, the training error also decreases monotonically with the number of neurons. Analogous results are obtained for the other benchmark systems, as presented in [26] (see Fig. S5). All of these further reinforce the conclusion that, whenever d ≈ m with moderate N_lag and N_neuron, the time-delayed and non-delayed RCs generally share the same ability in system representation, and the numbers of neurons and time lags can be traded off against each other.
Such a trade-off relationship further puts the non-delayed and time-delayed RCs into a unified framework in which the output dimension d becomes the effective reservoir dimension that finally decides the ability of system representation. The standard non-delayed RC is a degenerate form in this unified framework, one in which all neurons have zero lag. More surprisingly, we find that it is even possible to reduce the number of neurons to one and realize a single-neuron reservoir in the proposed framework. To see this, we consider a gene regulation model with multiple delays, which describes self-inhibition and self-activation with distinct delays τ_1 and τ_2; with specific parameters, this one-dimensional model exhibits chaotic dynamics [26,31,32]. Specifically, a time-delayed RC including only one neuron with 600 lags is used to reconstruct the dynamics, and the autonomously generated dynamics after training are shown in Fig. 3(a). The results confirm that the single-neuron, time-delayed RC performs well, achieving the same reconstruction ability as the time-delayed RC with multiple neurons. Admittedly, the single-neuron RC in this numerical illustration is a special case that is not universally suitable for every system reconstruction. Owing to the multi-scale property, reconstructing a multi-variable system using one reservoir network usually requires more than a single neuron. For the task in Fig. 2(a), a single-neuron reservoir is not adequate, even with multiple time delays, for successfully predicting the three components of the Lorenz system. In addition to the equivalent representation ability in the sense of embedding, we further discover that the time-delayed RC has a more flexible memory capacity, an essential measure of RC's ability to reconstruct delayed systems. In the dynamics-reconstruction job for the above gene regulation model in Fig. 3(a), the chaotic dynamics cannot be reconstructed by a standard RC, no matter how large the reservoir is, according to the dimension test [26]. However, in the same reservoir environment, the time-delayed RC [both RC#1 and RC#2 in Fig. 3(a)] fulfills the job quite well. To understand this phenomenon, we calculate the memory capacity (MC) for different RC frameworks, using the definition in [8], with different combinations of N_neuron and N_lag satisfying the same output dimension, i.e., N_neuron · N_lag = 600. Specifically, the MC of a reservoir refers to its ability to retain information from previous time steps and is defined in [8] as MC = Σ_k cov^2(x(t − k), ŷ_k(t)) / [var(x(t)) var(ŷ_k(t))], where a random sequence of input values x(t) is presented to the reservoir, the reservoir output ŷ_k(t) is trained to predict the previous input value x(t − k), and cov(·) and var(·) denote covariance and variance, respectively. Figure 3(b) clearly shows that, as N_lag increases, the reservoir computer with different delay settings has a stronger memory capacity while still keeping the fading-memory fashion. This is essential for dynamics reconstruction, particularly for time-delayed physical or biological systems such as the gene regulation model above. Thus, the proposed time-delayed RC framework has a more flexible capability to deal with dynamics-reconstruction jobs requiring tunable MC.
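A minimal estimate of the MC defined above can be obtained by training one linear readout per recall depth k and summing the normalized squared covariances; `max_k` truncates the (in principle unbounded) sum, and the ridge coefficient is an illustrative choice.

```python
import numpy as np

def memory_capacity(R, x, max_k=50, beta=1e-8):
    """MC = sum_k cov^2(x(t-k), yhat_k(t)) / (var(x(t)) var(yhat_k(t))),
    where each readout yhat_k is ridge-trained to recall the input k steps back."""
    mc = 0.0
    for k in range(1, max_k + 1):
        Rk, target = R[k:], x[:-k]          # align state at t with input at t - k
        w = np.linalg.solve(Rk.T @ Rk + beta * np.eye(R.shape[1]), Rk.T @ target)
        yhat = Rk @ w
        c = np.cov(target, yhat)
        if c[0, 0] > 0 and c[1, 1] > 0:
            mc += c[0, 1] ** 2 / (c[0, 0] * c[1, 1])
    return mc
```

Each summand is a squared correlation coefficient, so every term lies in [0, 1] and the MC of a fading-memory reservoir saturates as max_k grows.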
Finally, to further validate the efficacy of the time-delayed RC in reconstructing high-dimensional spatiotemporal systems, we consider the ideal storage cellular automaton model (ISCAM) simulating heterocatalytic reaction-diffusion processes at metal surfaces [33,34]. Considering the extremely high dimension (the 100 × 100 grid yields an input dimension of 10,000), reconstructing the chaotic spatiotemporal patterns is a challenging job. As shown in Fig. 4, with the same reservoir output dimension, the time-delayed RC has almost the same reconstruction ability as the non-delayed one.
Our framework uses a few hyper-parameters, such as d, the effective reservoir dimension, and τ, the time delay, which definitely affect RC's efficacy in system reconstruction. In fact, the existing literature includes some criteria for selecting such parameters in system reconstruction using delayed embedding theory. We thus implement these criteria, the dimension test and the delayed mutual information (DMI), to determine d and τ. From the perspective of embedding, d is only required to be larger than 2 · dim(M), while in practice the box-counting dimension of the manifold M is usually very small, e.g., dim(M) lies between 2 and 3 for the chaotic Lorenz attractor [35,36]. However, to design an effective RC, d is required to be moderately large (see all the examples above). This is probably because, although the generic property in embedding theory means open and dense in a topological sense, degenerate situations still occur in practice, particularly for randomly generated networks (see Fig. S1 in [26]). Moreover, to reveal the mechanism connecting representation to computation, recent efforts used the universal approximation theory [21] and the DMD framework [19], which further demonstrate the necessity of a large network size in RC for achieving good approximations. Thus, the dimension tests are used to seek a suitable d for each computation. As for the delay τ, either too small or too large a value renders the computation problematic in system reconstruction, which naturally prompts us to introduce a modified DMI test taking into account the intrinsic time scales of the neuronal dynamics in RC. Finally, it is noted that, for chaotic systems, lagged observables earlier than the Lyapunov time have diminishing predictive power for the current time step, so we suggest the constraint τ · ∆t · N_lag < Λ_max for the choice of τ and N_lag in practice, where ∆t is the sampling stepsize and Λ_max denotes the Lyapunov time. The details of the choice of these hyper-parameters are referred to [26].
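As one concrete instance of the DMI criterion mentioned above, the delayed mutual information between x(t) and x(t − τ) can be estimated from a two-dimensional histogram; picking τ near the first local minimum of this curve is the standard heuristic, and the bin count here is an illustrative choice.

```python
import numpy as np

def delayed_mutual_information(x, max_tau=50, bins=32):
    """Estimate I(x(t); x(t - tau)) for tau = 1..max_tau via a 2-D histogram.
    The DMI criterion picks tau near the first local minimum of this curve."""
    mi = np.empty(max_tau)
    for tau in range(1, max_tau + 1):
        joint, _, _ = np.histogram2d(x[:-tau], x[tau:], bins=bins)
        p = joint / joint.sum()
        px = p.sum(axis=1, keepdims=True)   # marginal of x(t - tau)
        py = p.sum(axis=0, keepdims=True)   # marginal of x(t)
        nz = p > 0
        mi[tau - 1] = np.sum(p[nz] * np.log(p[nz] / (px @ py)[nz]))
    return mi
```

For a strongly autocorrelated signal the curve starts high and decays, and its first local minimum marks a delay at which the lagged coordinate contributes nearly independent information.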
In conclusion, we have provided a deep and rigorous insight into the mechanism of RC from the viewpoint of embedding theory and nonlinear dynamical systems. Based on our analytical findings, we have studied the role of time delay in the reservoir network and proposed a new framework of time-delayed RC. This framework can significantly reduce the network size and promote the memory capacity, making its ability attain or even transcend that of the standard RC. Considering the computational costs, which depend crucially on the network size in the dynamical evolution of RC, and the hardware costs related to circuit size in the rapidly developing physical RCs [6], a smaller reservoir is always expected to promote real and extensive applications. Moreover, we notice a recently published and independent work [37], in which a method, different from the perspective of embedding theory and memory capacity presented here, was proposed to concatenate internal states through time in RC and realize model-size reduction. Lastly, any contribution to designing low-resource-consumption RC frameworks is believed to advance the direction of machine learning and thus be of broad applicability in solving data-driven science and engineering problems.

FIG. 1. RCs as different nonlinear dynamical systems and embeddings. (a) A standard RC without time delay. (b) The generalized embedding Ψ from the input dynamics to the standard non-delayed reservoir network, and the delayed embedding F from the input dynamics to the delayed reservoir network; together they constitute a topological conjugation between the dynamics of the non-delayed and delayed reservoir networks. (c) A time-delayed RC with a smaller network size in the reservoir layer.

The loss function is L = Σ_{k=1}^{N} ||x_{k+1} − W_out r_k||^2 + β ||W_out||^2 over the training data set, where β > 0, the L2-regularization coefficient, is introduced to make the optimization robust. After training, one can fix the output weight matrix W_out and redirect the output y_k = W_out r_k, as an approximation of x_{k+1}, into the input layer of the network, thus generating the autonomous dynamics of x_k for k > N.
Define the map G_k[r_0, W_res, W_in](x_0) = r^{b_0}_{k,x_0}, where r^{b_0}_{0,x_0} = b_0, b_0 ∈ I^m, and I = (−1, 1). Then we rigorously have the following result.

Theorem 1. Let m ≥ 2 dim(M) + 1 and [r_0, W_res, W_in] ∈ I^m × R^{m×m} × R^{m×n}, with dim(M) the box-counting dimension of the manifold M. Then there exists a number k* > 0 such that G_k[r_0, W_res, W_in] ∈ C^1(M, R^m) is generically an embedding for all k > k*.

FIG. 2. (a) Reconstructed dynamics of the Lorenz system by a non-delayed RC including 200 neurons, a delayed RC #1 including 40 neurons with uniformly 5 lags per neuron, and a delayed RC #2 including 40 neurons with random lags per neuron. Here, the time unit is the Lyapunov time, and the random lags are generated by a distribution centered at 5, as shown in the inset. (b) System-reconstruction test for the Lorenz system with different combinations of N_neuron and N_lag, where the training MSE on a log scale and the contour curves are, respectively, highlighted. Here, τ = 5 and the sampling stepsize is ∆t = 0.01. All other parameter settings are given in [26].

FIG. 3. (a) Reconstructed dynamics of the chaotic gene regulation model by the standard RC including 600 neurons (see the inset), the single-neuron time-delayed RC#1 with 600 lags, and the time-delayed RC#2 including 6 neurons with 100 lags per neuron. (b) MC test with different combinations of N_neuron · N_lag. Here, τ = 5 and the sampling stepsize is ∆t = 0.1. All other parameter settings are given in [26].

FIG. 4. Reconstructed chaotic patterns for the ISCAM model by a standard non-delayed RC including 5000 neurons and a time-delayed RC including 1000 neurons with 5 lags per neuron. (a) Selected dynamical patterns using different evolution rules. (b) Reconstruction errors, as deviations from the true dynamics, for the different RC frameworks.