Emergence of a resonance in machine learning

The benefits of noise to applications of nonlinear dynamical systems through mechanisms such as stochastic and coherence resonances have been well documented. Recent years have witnessed a growth of research in exploiting machine learning to predict nonlinear dynamical systems. It has been known that noise can act as a regularizer to improve the training performance of machine learning. Utilizing reservoir computing as a paradigm, we find that injecting noise into the training data can induce a resonance phenomenon with significant benefits to both short-term prediction of the state variables and long-term prediction of the attractor. The optimal noise level leading to the best performance in terms of the prediction accuracy, stability, and horizon can be identified by treating the noise amplitude as one of the hyperparameters for optimization. The resonance phenomenon is demonstrated using two prototypical high-dimensional chaotic systems.


I. INTRODUCTION
A challenging problem in nonlinear dynamics is model-free and data-driven prediction of chaotic systems. In general, there are two kinds of forecasting problems: short term and long term. In short-term forecasting, the goal is to predict the detailed dynamical evolution of the state variables from specific initial conditions, typically for a few cycles of oscillation (or Lyapunov times). In long-term prediction, the aim is to generate the attractor of the system with the correct statistical behaviors. According to conventional wisdom, for solving the prediction problems, noise would be detrimental. For example, in short-term prediction, because of the sensitive dependence on initial conditions, noise will make the predicted state evolution diverge exponentially from the true one. In long-term prediction, noise can induce the trajectory to cross the basin boundary, leading to a wrong attractor.
Recent years have witnessed the development of machine-learning techniques for predicting chaotic systems [24-41], where an extensively studied scheme is reservoir computing [42-45]. In machine learning, it has been known that adding noise to the training data can improve the generalizability through the mathematical mechanism of regularization [46]. Quite recently, in a study of reservoir computing used to learn the relationship between different state variables of a chaotic system, it was found that the best performance is achieved when noise amplitudes in the training and testing phases are matched [47]. What is the physical or dynamical mechanism underlying the benefit of noise and how do we find the optimal level of noise?
In this paper, we uncover a resonance phenomenon in which a certain amount of noise can significantly enhance the short-term and long-term prediction accuracy and robustness for chaotic systems, where the optimal noise level can be found through a generalized scheme of hyperparameter optimization. In particular, we consider reservoir computing and inject noise into the input signal. The machine-learning architecture contains a number of hyperparameters and the prediction performance depends on their values. Our simulations reveal that, if the hyperparameters are not optimized, noise in the training data can improve the prediction performance to a certain extent. However, in order to maximize the predictive power of a reservoir computer, it is necessary to find the optimal values of the hyperparameters, a task that can be accomplished through, e.g., Bayesian optimization [48,49]. The key to identifying the resonance is to treat the noise amplitude as one of the hyperparameters, i.e., to regard it as an intrinsic parameter of the reservoir computer. Bayesian optimization can then yield the optimal noise level. We demonstrate, using two prototypical high-dimensional chaotic systems, that noise with the determined amplitude can generate more accurate, robust, and stable predictions in both the short and long terms. We develop a physical theory by deriving an approximate Langevin equation to understand the emergence of the resonance.

[...] hidden layer, and the noise amplitude σ. To determine the optimal hyperparameter values, we use the SURROGATEOPT function in MATLAB [50], a Bayesian optimization procedure that employs a surrogate approximation function to estimate the objective function and finds the global minimum through sampling and updating. Specifically, the SURROGATEOPT algorithm [51] first samples several random points and evaluates the objective function at these trial points. The algorithm then creates a surrogate model of the objective function by interpolating a radial basis function through all the random trial points. From the surrogate function, the algorithm identifies the potential minima and samples the points about these minima to update the function.
We demonstrate the benefits of noise to both short-term and long-term prediction using two prototypical chaotic systems: the Mackey-Glass (MG) system described by a nonlinear delay differential equation and the spatiotemporal chaotic Kuramoto-Sivashinsky (KS) system. We use the Bayesian algorithm to obtain the optimal values of the six hyperparameters (including the noise amplitude σ). We then choose a number of σ values away from the optimal value and test the prediction performance. For each fixed σ value, we optimize the other five hyperparameters. For a different value of σ, the set of the other five hyperparameters is then different. As the noise amplitude deviates from the optimal value on either side, there is a gradual deterioration of the prediction performance, signifying the emergence of a resonance.
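The scan protocol just described (fix σ, optimize the remaining five hyperparameters, repeat over a set of σ values) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' MATLAB implementation: `toy_objective` is a hypothetical stand-in for the validation error of a trained reservoir computer, plain random search stands in for Bayesian optimization, and the grid range and the unit-interval search box are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_objective(log10_sigma, other):
    """Hypothetical stand-in for the validation RMSE of a trained reservoir.
    A real objective would train and test a reservoir computer here."""
    return (log10_sigma + 2.0) ** 2 + float(np.sum((other - 0.5) ** 2))

def optimize_others(log10_sigma, n_trials=200):
    """Random search over the five other hyperparameters
    (an assumed stand-in for Bayesian optimization)."""
    candidates = rng.uniform(0.0, 1.0, size=(n_trials, 5))
    scores = [toy_objective(log10_sigma, c) for c in candidates]
    best = int(np.argmin(scores))
    return candidates[best], scores[best]

# Noise amplitudes spaced uniformly on a logarithmic scale (illustrative range).
log10_sigmas = np.linspace(-8.0, -0.5, 16)

# For every fixed sigma, re-optimize the other hyperparameters and keep the best score.
best_scores = np.array([optimize_others(s)[1] for s in log10_sigmas])
best_log10_sigma = log10_sigmas[int(np.argmin(best_scores))]
print("best log10(sigma) on the grid:", best_log10_sigma)
```

Plotting `best_scores` against `log10_sigmas` would give the kind of curve in which a resonance shows up as a minimum at an intermediate noise amplitude.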

A. Emergence of a resonance from short-term prediction
Our first example is the MG system [52] described by ṡ(t) = a s(t − τ)/(1 + [s(t − τ)]^c) − b s(t), where τ is the time delay, and a, b, and c are parameters. The state of the system at time t is determined by the entire prior state history within the time delay, making the phase space of the system infinite dimensional. To be concrete, we use two values of the time delay, τ = 17 and τ = 30, and fix the other three parameters as a = 0.2, b = 0.1, and c = 10. For τ = 17, the system exhibits a chaotic attractor with one positive Lyapunov exponent: λ_+ ≈ 0.006. For τ = 30, the system has a chaotic attractor with two positive Lyapunov exponents [53]: λ_+ ≈ 0.011 and 0.003. To generate the one-dimensional MG time series data, we integrate the delay differential equation with the time step h = 0.01 and generate the training and testing data by sampling the time series every 100 steps: Δt = 100h = 1.0, where Δt is the evolutionary time step of the dynamical network in the hidden layer of the reservoir computer. To remove any transient behavior, we disregard the first 10 000Δt in the training data set. The length of the training data is T = 150 000Δt. The step after the training data marks the start of the testing data, whose length depends on whether the task is to make short-term or long-term prediction. The time series data are preprocessed by using z-score normalization: z(t) = [s(t) − s̄]/σ_s, where s(t) is the original time series, and s̄ and σ_s are the mean and standard deviation of s(t), respectively. For τ = 17 and τ = 30 in the MG system, the testing lengths for Bayesian optimization are T_opt = 900Δt and 300Δt, respectively, which are also the testing lengths for short-term prediction. The so-obtained optimal hyperparameter values are listed in Table I. Figure 1 shows the short-term and long-term prediction results for τ = 30. Visually and statistically, the predicted attractor cannot be distinguished from the true attractor. Prediction results for τ = 17 are presented in Fig. 2.
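A minimal sketch of the data-generation and normalization steps described above. The text does not state the integration scheme or the initial history, so forward Euler and a constant history `s0` are assumptions made here for illustration; the transient removal and the full training length are omitted to keep the example short.

```python
import numpy as np

def mackey_glass(tau=17.0, a=0.2, b=0.1, c=10.0, h=0.01,
                 n_samples=200, sample_every=100, s0=1.2):
    """Integrate ds/dt = a*s(t-tau)/(1 + s(t-tau)**c) - b*s(t).

    Forward Euler and the constant initial history s0 are assumptions;
    the returned series is sampled every sample_every*h time units.
    """
    delay = int(round(tau / h))            # delay in integration steps
    n_steps = n_samples * sample_every
    s = np.empty(delay + n_steps + 1)
    s[:delay + 1] = s0                     # constant history on [-tau, 0]
    for i in range(delay, delay + n_steps):
        s_tau = s[i - delay]
        s[i + 1] = s[i] + h * (a * s_tau / (1.0 + s_tau ** c) - b * s[i])
    return s[delay::sample_every][1:]      # n_samples points, spacing 100*h = 1.0

def zscore(x):
    """z(t) = [s(t) - mean] / std, as used for preprocessing."""
    return (x - x.mean()) / x.std()

series = mackey_glass(tau=17.0, n_samples=500)
z = zscore(series)
print(z[:5])
```

In practice one would generate a much longer series, discard the initial transient, and split the remainder into training and testing segments as described in the text.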
Our second example is the one-dimensional KS system [54,55], a paradigm not only in physics and chemistry but also in applications of reservoir computing for demonstrating the predictive power for high-dimensional dynamical systems [28]. The system equation is
\[ \frac{\partial u}{\partial t} + \mu \frac{\partial^4 u}{\partial x^4} + \phi \left( \frac{\partial^2 u}{\partial x^2} + u \frac{\partial u}{\partial x} \right) = 0, \]
where u(x, t) is a scalar field defined in the spatial domain 0 ≤ x ≤ L, and μ and φ are parameters. We set μ = 1 and φ = 1, and use the periodic boundary condition. As the domain size L increases, the system becomes progressively more high-dimensionally chaotic with the number of Lyapunov exponents increasing linearly with the system size [56]. As a representative case of high-dimensional chaos, we choose L = 60, where the system has seven positive Lyapunov exponents: λ_+ ≈ 0.089, 0.067, 0.055, 0.041, 0.030, 0.005, and 0.003. The length of the training data is about 1000 Lyapunov times (after disregarding a transient of about 300 Lyapunov times), where a Lyapunov time is defined as the inverse of the largest positive exponent. The testing data for short-term and long-term prediction are taken immediately after the training data of six and 100 Lyapunov times, respectively.

FIG. 1. Short-term and long-term prediction of the MG system for τ = 30. The optimal noise amplitude is 10^{−1.97}. (a) Machine-predicted system evolution (red trace) in comparison with the ground truth (blue). The predicted state evolution agrees with the true evolution for a time period that contains about 15 local maxima (T = 500Δt), a result that is significantly better than those without optimal noise. (b), (c) Representation of the true and predicted attractor in the {X ≡ s(t), Y ≡ s(t − τ)} plane. The prediction time length is T = 10 000Δt.

FIG. 2. Short-term and long-term prediction of the MG system for τ = 17. The optimal noise amplitude is determined to be 10^{−3.42}. Top row: Machine-predicted system evolution (red trace) in comparison with the ground truth (blue). The predicted state evolution agrees with the true evolution for more than 20 cycles of oscillation, a result that is significantly better than those without optimal noise. Bottom row: Representation of the true (blue) and the predicted (red) attractor in the {X ≡ s(t), Y ≡ s(t − τ)} plane. The prediction time length is T = 20 000Δt.
Figure 3 shows the results of short-term and long-term predictions of the KS system. It can be seen that the reservoir computing machine, with the aid of optimal noise, not only can accurately predict the short-term spatiotemporal evolution but also is able to replicate the long-term attractor with the correct statistical behavior. To demonstrate the emergence of a resonance for both short-term and long-term predictions, we ascertain that the optimal noise amplitude values from Bayesian optimization as listed in Table I are indeed optimal. To this end, we vary the noise amplitude (uniformly on a logarithmic scale) in the range [10^{−8}, 10^{−0.5}]. For each fixed noise amplitude, we optimize the other five hyperparameters (ρ, γ, α, β, and k). For different values of the noise amplitude, the so-obtained values of the other five hyperparameters are listed in three tables in Appendix B. To characterize the performance of short-term prediction, besides the conventional RMSE, we introduce two additional measures: prediction horizon and stability. The former (denoted as t_s) is the maximal time interval during which the RMSE is below a threshold, and the latter is the probability that a reservoir computer generates stable dynamical evolution of the target chaotic system in a fixed time window, which is defined as
\[ R_s(r_c) = \frac{1}{n} \sum_{i=1}^{n} \left[ \mathrm{RMSE}_i < r_c \right], \]
where r_c is the RMSE threshold, n is the number of iterations, RMSE_i is the RMSE of the ith iteration, and [•] = 1 if the statement inside is true and zero otherwise.
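The two additional measures can be sketched in a few lines. The function names are hypothetical, and one interpretive assumption is made: the prediction horizon is computed from the running RMSE accumulated from the start of the prediction, which is one plausible reading of "the maximal time interval during which the RMSE is below a threshold". The stability is the fraction of the n realizations whose RMSE falls below r_c.

```python
import numpy as np

def rmse_curve(y_true, y_pred):
    """Running RMSE over the first t steps, t = 1..T; inputs have shape (T, D)."""
    sq = np.mean((y_true - y_pred) ** 2, axis=1)        # average over components
    return np.sqrt(np.cumsum(sq) / np.arange(1, len(sq) + 1))

def prediction_horizon(y_true, y_pred, r_c, dt=1.0):
    """Longest initial interval over which the running RMSE stays below r_c."""
    above = np.nonzero(rmse_curve(y_true, y_pred) >= r_c)[0]
    n_ok = above[0] if above.size else len(y_true)
    return n_ok * dt

def prediction_stability(rmse_values, r_c):
    """R_s(r_c): fraction of realizations whose RMSE is below r_c."""
    rmse_values = np.asarray(rmse_values, dtype=float)
    return float(np.mean(rmse_values < r_c))

# Tiny worked example: the prediction is exact for three steps, then off by one.
y_true = np.zeros((5, 1))
y_pred = np.array([[0.0], [0.0], [0.0], [1.0], [1.0]])
horizon = prediction_horizon(y_true, y_pred, r_c=0.1)
stability = prediction_stability([0.05, 0.2, 0.01, 0.5], r_c=0.1)
print(horizon, stability)
```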
Figure 4 shows the RMSE, the prediction stability R_s(r_c), and the prediction horizon versus the noise amplitude σ for the MG system for τ = 30 (left column, r_c = 0.1), as well as the KS system (right column, r_c = 8.0). In both cases, an optimal noise level emerges in the sense that a prediction measure versus the noise amplitude exhibits either a bell-shape or an anti-bell-shape type of variation about an optimal point. Figure 4 thus provides strong evidence for a resonance associated with short-term performance of machine-learning prediction of chaotic systems.
Figure 5 illustrates the three quantitative measures (RMSE, prediction stability, and horizon) characterizing the short-term prediction versus the noise amplitude for the MG system for τ = 17. The emergence of an optimal noise level can be seen. The results in Figs. 2-5 provide strong evidence for the emergence of an optimal noise level and a resonance in reservoir-computing-based machine learning.

B. Emergence of a resonance from long-term prediction
We study the beneficial role of noise in long-term prediction of chaotic attractors. Due to the sensitive dependence on initial conditions in chaotic systems, an accurate prediction of the state evolution is possible only within a few Lyapunov times. However, as we have demonstrated in the main text, it is still possible to predict the long-term statistical behavior, e.g., the attractor of the system. If this is the case, the trained reservoir computer has captured the dynamical climate of the target system. It can also occur that a reservoir computer, in spite of training, fails to capture the climate of the target system. In this case, the attractor predicted by the machine deviates from the true one. Remarkably, we find that noise can enhance the reservoir computer's ability to capture the dynamical climate of the target system, providing a beneficial role in long-term prediction.
To compare two attractors, it is necessary to introduce a measure to quantify their mutual deviation. To gain insights, we first generate six examples of long-term prediction: two for the Mackey-Glass (MG) system for τ = 17, two for the MG system for τ = 30, and two for the Kuramoto-Sivashinsky (KS) system. For each example, we generate a case of successful prediction and a case of unsuccessful prediction. We cover a square region of the two-dimensional phase space with a grid of cells, count the frequencies of visit to each cell for both the true and predicted attractors in a fixed time interval, and define the deviation value (DV) as
\[ \mathrm{DV} \equiv \sum_{i=1}^{m_x} \sum_{j=1}^{m_y} \sqrt{\left( f_{i,j} - \hat{f}_{i,j} \right)^2}, \]
where m_x and m_y are the total numbers of cells in the x and y directions, respectively, and f_{i,j} and f̂_{i,j} are the frequencies of visit to the cell (i, j) by the true and predicted trajectories, respectively. If points of the predicted trajectory leave the square, we count them as if they belonged to the cells at the boundary, which the true trajectory never visits. The length of the time interval to demonstrate long-term prediction of the MG system for τ = 17 and τ = 30 is T = 20 000Δt. For the KS system, the length is 200 Lyapunov times. Different from RMSE, the DV value will not be large even if there is a collapse. It is therefore meaningful to calculate the average DV. We do this using 100 independent realizations of the reservoir computer for each example. Figures 6(a) and 6(b) show that the DV value for the successful case of prediction is much smaller than that for the unsuccessful case, for the MG system for τ = 17. In fact, for the unsuccessful cases without noise, the predictions are so bad that, after a transient time, the predicted trajectories largely deviate from the true attractor. Without noise facilitating the training, such unsuccessful cases of attractor prediction are not uncommon [26]. Figure 6(c) shows DV versus the noise amplitude for the MG system for τ = 17. In this case, there is a range of the noise amplitude in which the DV value is minimized. Note that this range contains the optimal value of the noise amplitude from the short-term prediction results in Fig. 2.

FIG. 6. [...] There is a range of noise amplitude in which the DV value is minimized, which contains the optimal noise level determined from the corresponding short-term prediction results in Fig. 2.

FIG. 7. Quantifying long-term prediction through the deviation value DV. (a), (b) Successful cases of attractor prediction in the presence of optimal noise for the MG system for τ = 30 and the KS system, respectively. (c), (d) Unsuccessful cases of attractor prediction without noise for the two systems. The two-dimensional phase space for the MG system is {X(t) ≡ s(t), Y(t) ≡ s(t − τ)}. For the KS system, the space is {X(t) ≡ u(4, t), Y(t) ≡ u(5, t)}. (e), (f) DV versus the noise amplitude for the MG and KS systems, respectively. There exists an optimal noise amplitude at which the DV value is minimized, which agrees with the optimal noise level determined from the corresponding short-term prediction results in Fig. 4.
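A deviation value of this kind can be sketched with a two-dimensional histogram. The sketch below makes several assumptions that the text does not fix: DV is taken as the sum over cells of the absolute difference between the visit frequencies of the true and predicted trajectories, the frequencies are normalized by trajectory length, the grid is 50 × 50, and predicted points outside the square are clipped onto the boundary cells. The function name `deviation_value` is hypothetical.

```python
import numpy as np

def deviation_value(true_xy, pred_xy, lo, hi, m_x=50, m_y=50):
    """Sum over grid cells of |f_true - f_pred| on the square [lo, hi]^2.

    true_xy, pred_xy: arrays of shape (T, 2) holding (X, Y) points.
    Frequencies are relative (assumption); the square should be chosen
    larger than the true attractor so that its boundary cells are empty.
    """
    edges_x = np.linspace(lo, hi, m_x + 1)
    edges_y = np.linspace(lo, hi, m_y + 1)
    f_true, _, _ = np.histogram2d(true_xy[:, 0], true_xy[:, 1],
                                  bins=[edges_x, edges_y])
    # Predicted points that leave the square are assigned to boundary cells.
    px = np.clip(pred_xy[:, 0], lo, hi)
    py = np.clip(pred_xy[:, 1], lo, hi)
    f_pred, _, _ = np.histogram2d(px, py, bins=[edges_x, edges_y])
    f_true /= len(true_xy)
    f_pred /= len(pred_xy)
    return float(np.sum(np.abs(f_true - f_pred)))

true_xy = np.array([[0.1, 0.1], [0.2, 0.2]])
escaped = np.array([[5.0, 5.0], [7.0, 7.0]])      # prediction that left the square
dv_same = deviation_value(true_xy, true_xy, lo=-1.0, hi=1.0, m_x=4, m_y=4)
dv_far = deviation_value(true_xy, escaped, lo=-1.0, hi=1.0, m_x=4, m_y=4)
print(dv_same, dv_far)
```

With relative frequencies the value is bounded by 2, which is reached when the two trajectories occupy disjoint sets of cells, so the measure stays finite even when a prediction collapses or escapes.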
Figure 7 illustrates the emergence of a resonance from long-term prediction for the MG system for τ = 30 (left column) and the KS system (right column). In each case, there is an optimal noise amplitude at which the DV value is minimized [Figs. 7(e) and 7(f)], which agrees with the optimal value of the noise amplitude from the short-term prediction results in Fig. 4, providing additional support for the emergence of a resonance in machine learning in terms of long-term prediction of chaotic attractors. Compared with short-term prediction, the resonance associated with long-term prediction is wider about the same optimal noise level. These results provide consistent support for the emergence of a resonance from the perspective of long-term prediction of chaotic attractors.

III. HEURISTIC REASON FOR THE OCCURRENCE OF A RESONANCE
Intuitively, the dynamical mechanism of the resonance is the result of a time-scale match. In particular, the input chaotic signal to the machine has an intrinsic time scale. When noise is present in the input, the recurrent nature of the neural network generates stochastic evolution of the dynamical state, inducing another time scale: the mean first-passage time. When these two time scales match, a resonance emerges. For reservoir computing with nonlinear activation, at present it is not feasible to develop a quantitative mathematical understanding of the resonance. However, the seminal work by Bollt [40] suggested that an approximate model of linear reservoir computing captures the essential dynamics of the neural learning mechanism. We thus consider this approximate model subject to input noise and provide a heuristic argument that the underlying stochastic dynamics can be described by a Langevin-like equation: ṙ ≈ f(r, u) + ξ, where r and u represent the reservoir hidden state and the input vectors, respectively, the deterministic force f(•) is a function of r and u, and ξ is a vector of random fluctuations representing the stochastic perturbations. The time-scale match can be demonstrated by applying the analysis of the Langevin equation in treating noise-enhanced temporal regularity (or coherence resonance) in chaotic systems [57,58].
A resonance in nonlinear dynamical systems broadly refers to the phenomenon in which noise can improve the performance of the system. For example, coherence resonance is characterized by the optimization of a measure of the temporal regularity of the state variables by noise, which was originally studied in neural dynamical systems [59] and observed in various other systems, such as climate systems [60], lasers [61], and biological systems [62]. Unlike a stochastic resonance [63-68], which describes the effect of noise on overcoming the system's energy barriers and improving the signal-to-noise ratio, coherence resonance concerns the temporal aspect of the signal and it does not require an external periodic driving.
In general, the underlying mechanism of any resonance phenomenon is a match between two time scales, which occurs at some optimal noise level. For example, in a coupled oscillator system, one time scale can be the characteristic average frequency of the individual oscillators and the other is induced by noise, e.g., the mean first-passage time. We have demonstrated that a certain level of noise in the data can improve, quite remarkably, the ability of a reservoir computer to predict both the short-term dynamical evolution and the long-term invariant sets of chaotic systems. Quantitatively, we find that a number of measures characterizing the short-term and long-term prediction performance exhibit the defining feature of a resonance: there exists an optimal noise amplitude for which the measures are optimized (accuracy, stability, and horizon are maximized, while RMSE and DV are minimized). Because of this remarkable consistency and agreement with the notion of a general resonance in nonlinear systems, we propose that the phenomenon uncovered in our work indeed represents a resonance.
A challenging issue is to identify the underlying dynamical mechanism responsible for the emergence of a resonance in machine learning. It is difficult to apply a mechanical model to the machine-learning system, as the dynamics of the high-dimensional neural network in the hidden layer are extraordinarily complicated. Our approach is to develop an approximate physical picture. Following Bollt's seminal work on explainable reservoir computing [40], we apply stochastic input and derive a Langevin type of equation to obtain a physical understanding of the numerically observed resonance phenomenon.
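The state evolution of the recurrent neural network in the hidden layer is described in Appendix A 1. For simplicity, we set α = 1 and rewrite the equation of dynamical evolution as
\[ \mathbf{r}(t + \Delta t) = \tanh\{ A \cdot \mathbf{r}(t) + W_{\rm in} \cdot [\mathbf{u}(t) + \boldsymbol{\xi}(t)] \}, \tag{3} \]
where Δt is the time step, ξ(t) represents the noise added to the input signal, and the activation is described by the hyperbolic tangent function tanh(x). For x ≪ 1, we have tanh(x) ∼ x and obtain the special class of linear reservoir computers [40]:
\[ \mathbf{r}(t + \Delta t) = A \cdot \mathbf{r}(t) + W_{\rm in} \cdot [\mathbf{u}(t) + \boldsymbol{\xi}(t)]. \tag{4} \]
To simplify notation, we set Δt = 1. Successive iterations of Eq. (4), starting from r(0) = 0, are
\[ \mathbf{r}(t + 1) = \sum_{j=0}^{t} A^{j} \cdot W_{\rm in} \cdot [\mathbf{u}(t - j) + \boldsymbol{\xi}(t - j)], \tag{5} \]
where A^0 = I. Since the output matrix W_out maps r(t) into the output signal v(t) that has the same dimension as that of the input vector u(t), we have
\[ \mathbf{v}(t + 1) = W_{\rm out} \cdot \mathbf{r}(t + 1) = \sum_{j=0}^{t} W_{\rm out} \cdot A^{j} \cdot W_{\rm in} \cdot [\mathbf{u}(t - j) + \boldsymbol{\xi}(t - j)], \tag{6} \]
indicating that a linear reservoir computer yields a vector autoregressive process (VAR) with a general form [40]:
\[ \mathbf{y}(t + 1) = \mathbf{c} + A_1 \mathbf{x}(t) + A_2 \mathbf{x}(t - 1) + \cdots + A_t \mathbf{x}(1) + \boldsymbol{\xi}(t + 1), \tag{7} \]
where x and y represent the input and output, respectively, c is a constant term, A_1, A_2, ..., A_t are coefficient matrices, and ξ denotes the stochastic process.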
Note that the reservoir state evolution Eq. (4) and its recursion Eq. (5) can be cast into the form
\[ \mathbf{r}(t + 1) - \mathbf{r}(t) = [A - I] \cdot \mathbf{r}(t) + W_{\rm in} \cdot \mathbf{u}(t) + W_{\rm in} \cdot \boldsymbol{\xi}(t), \tag{8} \]
which is similar in mathematical form to the Langevin equation
\[ \frac{d x_l}{dt} = f(x_l) + g(x_l)\, \xi(t), \tag{9} \]
that describes a particle moving under the influence of two forces: a deterministic force f(x_l) and a stochastic force g(x_l)ξ(t). Comparing Eqs. (8) and (9), we have that [A − I] • r + W_in • u represents the deterministic force while W_in • ξ is the stochastic force that provides the random driving to the reservoir intrinsic state. The Langevin equation of the form Eq. (9) was shown previously, through the corresponding Fokker-Planck equation, to yield a stochastic time scale required for matching with the dynamical time scale of the system in the context of coherence resonance [57,58]. The approximate equivalence of Eq. (8) to the particular form of the Langevin equation provides a physical reason for a resonance to arise in reservoir computing, as we have demonstrated numerically.
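The algebra behind the linear-reservoir argument can be checked numerically. The sketch below assumes the linear update r(t + 1) = A·r(t) + W_in·[u(t) + ξ(t)] with r(0) = 0 and Δt = 1, and uses small random matrices and signals that are purely illustrative. It verifies that iterating the update reproduces the unrolled sum over powers of A, and that the state separates exactly into a deterministic part driven by u and a stochastic part driven by W_in·ξ, which is the split used in the Langevin analogy.

```python
import numpy as np

rng = np.random.default_rng(1)
D_r, D_in, T = 6, 2, 20

A = rng.normal(size=(D_r, D_r))
A *= 0.9 / np.max(np.abs(np.linalg.eigvals(A)))    # rescale to spectral radius 0.9
W_in = rng.uniform(-1.0, 1.0, size=(D_r, D_in))
u = rng.normal(size=(T, D_in))                      # illustrative input signal
xi = 0.01 * rng.normal(size=(T, D_in))              # input noise

def iterate(inputs):
    """Iterate r(t+1) = A r(t) + W_in x(t) from r(0) = 0 and return r(T)."""
    r = np.zeros(D_r)
    for x in inputs:
        r = A @ r + W_in @ x
    return r

r_full = iterate(u + xi)

# Unrolled form: r(T) = sum_j A^j W_in [u(T-1-j) + xi(T-1-j)], with A^0 = I.
r_sum = sum(np.linalg.matrix_power(A, j) @ W_in @ (u[T - 1 - j] + xi[T - 1 - j])
            for j in range(T))

# Linearity: deterministic response plus noise-driven response.
r_det = iterate(u)
r_noise = iterate(xi)

print(np.allclose(r_full, r_sum), np.allclose(r_full, r_det + r_noise))
```

The exact separation holds only in the linear approximation; with the tanh activation the noise and the signal mix nonlinearly, which is why the Langevin picture is heuristic.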

IV. DISCUSSION
To summarize, we have uncovered the emergence of a resonance in machine-learning prediction of chaotic systems. Focusing on reservoir computing, we find that injecting noise into the training data can be beneficial to both short- and long-term predictions. In particular, for short-term prediction, a number of characterizing quantities such as the prediction accuracy, stability, and horizon can be maximized by an optimal level of noise that can be found through hyperparameter optimization. For long-term prediction, optimal noise can significantly increase the chance for the machine-generated trajectory to stay in the vicinity of (or to shadow) the true attractor of the target chaotic system. Intuitively, training with noise can enhance the machine's tolerance to chaotic fluctuations, which can be beneficial for the machine to learn the dynamical climate of the target chaotic system. This suggests that the optimal noise level should be on the same order of magnitude as the one-step prediction error in noiseless prediction, which is indeed so as verified by our numerical examples. Pertinent issues such as the requirement of prediction time for the emergence of a resonance, robustness of resonance against different scenarios of noise injection, and the beneficial role of noise in reducing the reservoir network size and computational complexity are addressed in Appendix C. Our work extends the ubiquitous phenomena of stochastic [63-68] and coherence [57,59] resonances in nonlinear dynamical systems to the realm of machine learning, where deliberate noise combined with hyperparameter optimization can be a practically feasible approach to enhancing the predictive power.
We note that the role of noise in neural network training was studied previously, e.g., adding noise to the training data for convolutional neural networks can play the role of regularization to reduce overfitting in the learning models [69]. In reinforcement learning, injecting noise into the signals can help the system reach the persistent excitation condition to facilitate parameter estimation [70,71]. How noise negatively affects the prediction of chaotic systems has recently been considered [72], where long short-term memory machines tend to be more resistant to noise than other machine-learning methods. The beneficial role of noise in machine-learning prediction has also been recognized [46,73-75]. We present a systematic study of the interplay between noise and machine-learning prediction of dynamical systems in this work, along with the demonstration of the resonance phenomenon in machine learning.
All relevant data are available from the authors upon request. All relevant computer codes are available from the authors upon request.
A reservoir computer is essentially a recurrent neural network (RNN), which consists of three components: an input layer, a hidden layer, and an output layer. Compared with the conventional RNNs, the key advantage of reservoir computing lies in its computational efficiency: the input weights and the hidden-layer neural network are predefined and only the weights of the output layer need to be determined from training through a standard linear regression.
As illustrated in Fig. 8, the input matrix W_in maps the input signal u(t) into a hidden layer. The hidden, recurrent layer hosts a neural network characterized by the adjacency matrix A, whose state vector is r(t), where the ith entry represents the dynamical state of the ith neuron in the network. The dynamical evolution of r(t) is determined by both the input and the recurrent structure:
\[ \mathbf{r}(t + \Delta t) = (1 - \alpha)\, \mathbf{r}(t) + \alpha \tanh[ A \cdot \mathbf{r}(t) + W_{\rm in} \cdot \mathbf{u}(t) ], \]
where α is the leakage parameter determining the temporal scale of the neural network, Δt is the time step of the dynamical evolution, and the activation is described by the hyperbolic tangent function tanh(x). The output matrix W_out maps r(t) into the output signal v(t) that typically has the same dimension as the input vector u(t).

FIG. 8. Reservoir computing structure. A reservoir computer consists of three components: an input layer, a hidden layer, and an output layer. The vectors u(t), r(t), and v(t) are the input signal, the dynamical state of the network in the hidden layer, and the output signal, respectively. The matrices W_in, A, and W_out represent the input weights, the network structure, and the output weights, respectively. The elements of W_in and A are predefined and fixed. The matrix W_out is determined by training through a linear regression.
Let D_in and D_r be the dimensions of the input vector u(t) and of the hidden-layer state vector r(t), respectively. The matrix W_in thus has the dimension D_r × D_in, where D_r ≫ D_in so that W_in maps a low-dimensional input vector to a high-dimensional hidden state vector. Prior to training, the weights (elements) of W_in are chosen uniformly from the interval [−γ, γ]. The dimension of the adjacency matrix A is D_r × D_r, which characterizes a symmetric random network with link probability p. The nonzero elements of A are drawn from a Gaussian distribution of zero mean and unit variance. We rescale A so that its spectral radius is given by the hyperparameter ρ. The output matrix W_out has the dimension D_out × D_r, which is determined by ℓ2 linear regression (ridge regression) as
\[ W_{\rm out} = U \cdot R'^{T} \cdot \left( R' \cdot R'^{T} + \beta I \right)^{-1}, \]
where I is the identity matrix of dimension D_r, β is the ℓ2 regularization coefficient, and U and R′ consist of u(t) and r′(t) at all time steps, respectively, in which a column represents the corresponding vector at a specific time step. The vector r′(t) is identical to r(t) except that all the entries in the even rows are squared. Note that u(t) is the training target for the reservoir to produce a one-step prediction. We inject Gaussian white noise of zero mean and standard deviation σ into each dimension of the training data and investigate the prediction performance for different values of the noise amplitude σ. We treat σ as one of the hyperparameters of the whole reservoir computer. For prediction and validation, no noise is applied. In particular, during the prediction phase, the output vector v(t) becomes the input vector u(t) and the reservoir computer generates one-step predictions. A step-by-step iterative process leads to a prediction signal, whose accuracy is determined by comparison with the real, noiseless testing data. For validation, we measure the performance of the trained reservoir computer using the root mean-square error (RMSE):
\[ \mathrm{RMSE}(\mathbf{y}, \hat{\mathbf{y}}) = \sqrt{ \frac{1}{T_{\rm stp}} \sum_{t=1}^{T_{\rm stp}} \frac{1}{D_{\rm out}} \sum_{n=1}^{D_{\rm out}} \left[ y_n(t) - \hat{y}_n(t) \right]^2 }, \]
where y and ŷ are the real and predicted signals, respectively, y_n(t) represents the nth component of y at time step t, and T_stp is the prediction time. We use the RMSE to characterize the short-term prediction performance, typically for about 4 to 5 Lyapunov times.
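The training and prediction procedure of this appendix can be sketched compactly. The code is a minimal illustration under stated assumptions, not the authors' MATLAB code: the leaky-tanh update and the ridge solution are written in their standard forms, "even rows" is read in 1-based indexing, a washout of 100 steps is added, the noisy data serve as both input and regression target, and a sine wave replaces the chaotic data so that the example runs quickly. All function names and numerical values are hypothetical.

```python
import numpy as np

def make_reservoir(D_r, D_in, rho, gamma, p, rng):
    """Symmetric random network A (spectral radius rho) and input weights W_in."""
    upper = np.triu(rng.normal(size=(D_r, D_r)) * (rng.random((D_r, D_r)) < p))
    A = upper + upper.T - np.diag(np.diag(upper))
    A *= rho / np.max(np.abs(np.linalg.eigvalsh(A)))
    W_in = rng.uniform(-gamma, gamma, size=(D_r, D_in))
    return A, W_in

def step(r, u, A, W_in, alpha):
    """One leaky-tanh update of the hidden state."""
    return (1.0 - alpha) * r + alpha * np.tanh(A @ r + W_in @ u)

def squared_even(r):
    """r'(t): copy of r with the entries in the (1-based) even rows squared."""
    out = r.copy()
    out[1::2] **= 2
    return out

def train(data, sigma, A, W_in, alpha, beta, rng, washout=100):
    """Drive the reservoir with noisy data and fit W_out by ridge regression."""
    noisy = data + sigma * rng.normal(size=data.shape)
    r = np.zeros(A.shape[0])
    states, targets = [], []
    for t in range(len(noisy) - 1):
        r = step(r, noisy[t], A, W_in, alpha)
        if t >= washout:                       # washout length is an assumption
            states.append(squared_even(r))
            targets.append(noisy[t + 1])       # one-step-ahead target
    R = np.array(states).T                     # shape (D_r, T)
    U = np.array(targets).T                    # shape (D_out, T)
    G = R @ R.T + beta * np.eye(R.shape[0])
    W_out = np.linalg.solve(G, R @ U.T).T      # equals U R^T (R R^T + beta I)^{-1}
    return W_out, r

def predict(W_out, r, A, W_in, alpha, n_steps):
    """Closed-loop prediction: each output is fed back as the next input."""
    v = W_out @ squared_even(r)
    preds = []
    for _ in range(n_steps):
        r = step(r, v, A, W_in, alpha)
        v = W_out @ squared_even(r)
        preds.append(v)
    return np.array(preds)

rng = np.random.default_rng(0)
signal = np.sin(0.1 * np.arange(2050))[:, None]   # toy stand-in for chaotic data
A, W_in = make_reservoir(D_r=100, D_in=1, rho=0.9, gamma=0.5, p=0.2, rng=rng)
W_out, r_last = train(signal[:2000], sigma=1e-3, A=A, W_in=W_in,
                      alpha=1.0, beta=1e-6, rng=rng)
pred = predict(W_out, r_last, A, W_in, alpha=1.0, n_steps=50)
rmse = float(np.sqrt(np.mean((pred - signal[2000:2050]) ** 2)))
print("closed-loop RMSE over 50 steps:", rmse)
```

Repeating the last four statements for several values of `sigma` and comparing `rmse` is the basic experiment behind the resonance curves, with the other hyperparameters re-optimized for each noise amplitude in the full study.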

Implementation of reservoir computing and hyperparameter optimization
In the past few years, reservoir computing has been shown to be effective for modeling the dynamics of low- and high-dimensional chaotic systems [24-41]. As noted in Ref. [35], if a reservoir computer can acquire the full state dynamics in the training phase, it outperforms the back-propagation-through-time approaches, such as long short-term memory (LSTM) systems, with respect to both short-term and long-term prediction even with much less training time. A main advantage of reservoir computing is that the input weights and the hidden-layer neural network are predefined prior to training, and only the output parameters need to be optimized at the end of the training phase through, e.g., an ℓ2 linear regression (ridge regression).
We choose MATLAB so that we can readily build up the reservoir-computing framework through adjusting the network structure, parameters, and noise input setting. In fact, a number of reservoir-computing packages now exist, such as RESERVOIRPY and ECHOTORCH in PYTHON or RESERVOIRCOMPUTING in JULIA. Regardless of the programming language, the core algorithm of reservoir computing is the same and the implementations of the algorithm are quite similar.
It is essential to choose an appropriate optimization algorithm to search for the optimal hyperparameters. In our work, we used Bayesian optimization to determine the optimal hyperparameter values. This is because some traditional optimization algorithms, such as the gradient-free optimization algorithms that use grid or random search, may not be suitable for solving such complex problems as predicting chaotic systems, while Bayesian optimization has the ability to handle difficult problems with fewer iterations [54]. The Bayesian optimization method can be implemented using PYTHON or other languages. Different packages for Bayesian optimization are now available, such as BAYESIAN-OPTIMIZATION and BOTORCH in PYTHON.

The emergence of a resonance in predicting the dynamical evolution of the chaotic system depends on the prediction time T_stp in the definition of RMSE. For a short prediction time, the RMSE is generally small, regardless of the training noise amplitude. As T_stp increases, the benefits of noise begin to stand out, leading to the emergence of a resonance. This behavior is illustrated in Fig. 9 for the MG and KS systems, where the color-coded RMSE values in the parameter plane (T_stp, σ) are shown. Here, for better visualization, the RMSE values are normalized to the unit interval with respect to the variation in the noise amplitude for each fixed value of T_stp. As shown in Fig. 9(a), for the MG system for τ = 17, a resonance emerges for T_stp ≳ 600Δt, whereas for τ = 30, a resonance emerges almost immediately as T_stp increases from zero, as shown in Fig. 9(b). For the KS system, it takes about two Lyapunov times for a resonance to emerge, as shown in Fig. 9(c).

Robustness of resonance against different scenarios of noise injection
The results in the main text and those discussed so far here are for the scenario where noise is injected into the entire training set. Specifically, we generate a matrix of Gaussian noise of zero mean and standard deviation σ, which has the same dimension and time length as the training data set, and add this noise matrix directly to the normalized training data matrix. When the reservoir state is updated with the input training data as a driving force, noise appears directly at the input. Simultaneously, the same noise is present at the output layer during the linear regression.
What if noise is added only to the input layer without appearing in the regression step? Figure 10 shows the RMSE versus the noise amplitude for the three examples in Fig. 9, together with the results from the scenario where noise is applied at both the input and output layers (for comparison). A resonance arises under both noise-injection scenarios. In fact, there is little difference between the results from the two scenarios, indicating that the occurrence of the resonance is robust with respect to the way in which noise is supplied to the reservoir computer.
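The two noise-injection scenarios can be illustrated with a deliberately simplified echo state network. The sketch below is not the configuration used in this work: the dense random reservoir, the fixed spectral radius of 0.9, the ridge parameter `beta`, and the function name `train_readout` are all assumptions made for illustration. The only point is where the noise matrix enters: always at the input, and at the regression targets only when `noise_on_target` is true.

```python
import numpy as np

def train_readout(data, sigma, noise_on_target, n_res=100, beta=1e-6, seed=0):
    """Train a linear readout for one-step-ahead prediction.

    data: normalized training data, shape (dim, T).
    sigma: standard deviation of the zero-mean Gaussian training noise.
    noise_on_target: True  -> the same noise also appears in the regression
                              targets (noise at input and output layers);
                     False -> targets are noise free (input-layer noise only).
    """
    rng = np.random.default_rng(seed)
    dim, T = data.shape
    # Noise matrix with the same dimension and time length as the data.
    noisy = data + sigma * rng.standard_normal((dim, T))

    # Hypothetical random reservoir, rescaled to spectral radius 0.9.
    W_in = rng.uniform(-1.0, 1.0, (n_res, dim))
    A = rng.uniform(-1.0, 1.0, (n_res, n_res))
    A *= 0.9 / np.max(np.abs(np.linalg.eigvals(A)))

    # Drive the reservoir with the noisy input and record its states.
    r = np.zeros(n_res)
    states = np.empty((n_res, T - 1))
    for t in range(T - 1):
        r = np.tanh(A @ r + W_in @ noisy[:, t])
        states[:, t] = r

    # Regression targets: noisy or clean next-step data.
    targets = noisy[:, 1:] if noise_on_target else data[:, 1:]
    # Ridge regression: W_out = Y R^T (R R^T + beta I)^(-1).
    gram = states @ states.T + beta * np.eye(n_res)
    return np.linalg.solve(gram, states @ targets.T).T
```

With the same seed, both scenarios share the reservoir and the noise realization, so any difference in the trained readout comes solely from the targets. For σ = 0 the two readouts coincide.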

Beneficial role of noise in reducing the reservoir network size and computational complexity
In general, the predictive power of a reservoir computer can be improved by increasing the size D_r of the random network in the hidden layer to enable the neural machine to generate more complex and richer dynamics. However, increasing the network size leads to higher computational complexity. Can noise be used to reduce the network size while maintaining the prediction accuracy?
Figure 11 shows, for the three examples in Figs. 9 and 10, the RMSE of short-term prediction versus the network size for four different values of the noise amplitude. For each example, the blue points are for a noise amplitude close to the optimal value for the resonance. It can be seen that optimal noise can lead to a tremendous reduction in the network size. For example, for the MG system for τ = 17 in Fig. 11(a), when optimal noise is added to the training data, the RMSE becomes small as the network size exceeds about 600, whereas this low value of RMSE can never be achieved for near-zero noise (e.g., σ = 10^{-8}) even if the network size is increased to 3000. A similar behavior occurs for the other two examples, as shown in Figs. 11(b) and 11(c).

FIG. 11. Demonstration of the beneficial role of noise in reducing the network size (thus computational complexity) while maintaining the prediction accuracy. Shown is the RMSE for short-term prediction versus the size D_r of the hidden-layer network for four different values of the noise amplitude for (a), (b) MG system for τ = 17 and τ = 30, respectively, and (c) KS system. The blue circles correspond to the case of optimal noise level at which a resonance arises, for which the RMSE values are low even for small network size. For different network sizes and noise amplitudes, the values of the five hyperparameters are fixed, which is the reason for the abnormal increase in RMSE at large network size in (c), as predicting the dynamical evolution of the KS system depends sensitively on the hyperparameters. Overall, with optimal noise, the reservoir computer can achieve a high prediction accuracy that cannot be achieved even with much larger networks without noise or when the noise level is not optimal.

FIG. 3. Short-term and long-term prediction of the KS system. (a), (b) True short-term (six Lyapunov times) and long-term (100 Lyapunov times) spatiotemporal evolution of the nonlinear field u(x, t), respectively. (c), (d) The predicted field û(x, t) in the short and long terms, respectively. (e) Difference between the predicted and true fields, defined as D(x, t) ≡ [u(x, t) − û(x, t)]^2. (f) Overlapped image of the true and predicted attractors in terms of the fourth and fifth dimensions of the KS system. The values of the optimal hyperparameters (including the optimal noise amplitude) are listed in Table I.

FIG. 4. A resonance associated with short-term prediction of chaotic systems. Shown are three measures of short-term prediction versus the noise amplitude for two examples: left column, MG system for τ = 30 (r_c = 0.1, length of prediction time window = 300Δt); right column, KS system (r_c = 8.0, length of prediction time window = five Lyapunov times); top row, RMSE; middle row, prediction stability R_s(r_c); bottom row, prediction horizon t_s. The error bars are obtained from an ensemble of 80 performing reservoir computers. For each chaotic system, a specific and unique noise level emerges at which each prediction measure is optimized, which is characteristic of a resonance.

FIG. 5. A resonance associated with short-term prediction for the MG system for τ = 17. Shown are the three measures of short-term prediction versus the noise amplitude: (a) RMSE, (b) prediction stability R_s(r_c), (c) prediction horizon t_s. The error bars are obtained from an ensemble of 80 performing reservoir computers. For this system, a specific and unique noise level emerges at which each prediction measure is optimized, which is characteristic of a resonance. The relevant parameter values are r_c = 0.1 and length of prediction time window = 900.

FIG. 9. Emergence of a resonance. Shown are the color-coded normalized RMSE values in the parameter plane (T_stp, σ) for (a), (b) MG system for τ = 17 and τ = 30, respectively, and (c) KS system. To reduce the statistical fluctuations, the normalized RMSE values are calculated from an ensemble of 80 independently trained reservoir computers.

FIG. 10. Resonance under two different noise-injection scenarios. The two scenarios are adding noise to both the input and output layers (yellow circles) and injecting noise to the input layer only (blue diamonds) for (a), (b) MG system for τ = 17 and τ = 30, respectively, and (c) KS system. The error bars are obtained using an ensemble of 80 independent realizations of the reservoir computer. Each data point is the ensemble average of the 80 best results out of 100 independent realizations of the reservoir computer.

TABLE I. Optimal hyperparameter values for MG and KS.

TABLE III. Optimal hyperparameter values for MG system with τ = 30.

TABLE IV. Optimal hyperparameter values for KS system.