Long-term prediction of chaotic systems with machine learning

Reservoir computing systems, a class of recurrent neural networks, have recently been exploited for model-free, data-based prediction of the state evolution of a variety of chaotic dynamical systems. The prediction horizon demonstrated has been about half dozen Lyapunov time. Is it possible to signiﬁcantly extend the prediction time beyond what has been achieved so far? We articulate a scheme incorporating time-dependent but sparse data inputs into reservoir computing and demonstrate that such rare “updates” of the actual state practically enable an arbitrarily long prediction horizon for a variety of chaotic systems. A physical understanding based on the theory of temporal synchronization is developed.

A recently emerged interdisciplinary field is a machinelearning-based, model-free prediction of the state evolution of nonlinear/chaotic dynamical systems [1][2][3][4][5][6][7][8][9][10][11][12]. A paradigm that has been exploited is reservoir computing [13][14][15][16], a class of recurrent neural networks. Starting from the same initial condition, a well-trained reservoir system can generate a trajectory that stays close to that of the target system for a finite amount of time, realizing a short-term prediction. Because of the hallmark of chaos-sensitive dependence on initial conditions, the solution of the reservoir system will diverge from that of the original system exponentially. Nonetheless, if training is done properly so that the single-step error is orders of magnitude smaller than the oscillation range of the chaotic signal [12], an accurate prediction can be achieved in a short time. So far, the prediction horizon achieved is about five or six Lyapunov time [7], where one Lyapunov time is the inverse of the maximum Lyapunov exponent.
Is it possible to extend significantly the prediction horizon of reservoir computing? We provide an affirmative answer in this Rapid Communication. The key observation is that, after training, prediction is enabled because the neural network system can replicate the dynamical evolution of the target system (or synchronize with it) but only for a transient period of time. In the conventional scheme, data from the target system are used only during the training phase. A solution to extend the transient time is to provide some "update" of the target system. We thus conceive the scenario where, after the initial training, infrequent or sparse updates in the form of new measurements or the observation of the target system are available. We demonstrate that even rare updates of the actual state enable an arbitrarily long prediction horizon to be achieved for a variety of chaotic systems. Essentially, before the trajectories of the reservoir and original systems diverge substantially (e.g., about to exceed a predefined accuracy), we correct the state of the reservoir system with a real measurement of a duration as small as a single data point. We develop a physical understanding based on the theory of temporal synchronization. Practically, with rare data updates, the reservoir computing system can replicate the evolution of the original system within some desired accuracy for an arbitrarily long time, in spite of chaos. This will have applications in fields where chaos arises.
The basic working of reservoir computing can be described briefly as follows. Suppose time series data represented by a relatively low-dimensional data vector from the target system to be predicted are available. As shown in Fig. 1, one feeds the time series into the input-to-reservoir (I/R) module to generate a time-dependent data vector whose dimension is significantly larger than that of the original data vector. The high-dimensional vector is then sent to a complex network constituting the core of the reservoir, whose size matches the dimension of the vector. That is, there is a one-to-one correspondence between any component of the high-dimensional vector and a node in the network, and the data from a component are fed into the corresponding node. The state of the reservoir network is updated according to some nonlinear function, and the resulting state vector is sent to a reservoirto-output (R/O) module whose function is opposite that of the I/R module, i.e., to convert the high-dimensional vector of the reservoir back into a vector of the same low dimension as that of the original data from the target system. The reservoir network, once chosen, is fixed, so all parameters associated with it are hyperparameters. Training is done through a relatively small set of adjustable parameters associated with the R/O module, which can be tuned (or "learned") based on the available input data. During the training phase, the system is open as it requires input from the target system. After training, one feeds the output from R/O directly into FIG. 1. Proposed reservoir computing (RC) system with rare state updates in the prediction phase, which is capable of generating an arbitrarily long prediction of the state evolution of any chaotic system. The I/R, reservoir, and R/O modules within the blue dashed box constitute the conventional reservoir computing system. Initial training with data from the target chaotic system is done in a conventional manner. The articulated scheme of rare state updates is represented by the C/I module inside the red dashed box, which couples sparse measurement data with the output of the system to generate updated inputs to the system. the I/R module, closing the system. The system then evolves by itself. Some key numbers involved in reservoir computing for model-free predictions are as follows. For example, for the Kuramoto-Sivashinsky equation (KSE) with spatiotemporally chaotic solutions [7], the typical number of the spatial measurement sites (the dimension of the input vector) is 64 and the size of the reservoir network is about 5000. The number of parameters in the R/O module to be trained is 4992 × 64.
Our proposed reservoir computing system functioning in the prediction phase leading to an arbitrarily long prediction horizon is shown schematically in Fig. 1, where the three modules inside the blue dashed box represent the conventional system and the one inside the red dashed box is a new module incorporating rare state updates. The I/R module is described by W in , a D r × D in random matrix that maps a D in -dimensional input vector v into a D r -dimensional vector r(t ), where D in ≪ D r . The elements of W in are generated from a uniform distribution in [−σ, σ ]. The reservoir is a complex network of D r nodes with an average degree d, whose connecting topology is described by the D r × D r matrix A. (For simplicity, we choose it to be a directed random network.) A recent work [12] has revealed that successful training and finite-time prediction can be achieved if the spectral radius of the network is in a finite range, which can be adjusted by properly normalizing the link weights in the network. The R/O module is represented by a D out × D r matrix, whose elements are parameters determined through training [3,7]. Typically, we have D out = D in . Without any state update, in the prediction phase, the reservoir computing system is a self-evolving dynamical system described by , where f (r) is the output function [3,7]: f i (r) = r i and f i (r) = r 2 i for odd and even index i (i = 1, 2, . . . , D r ), respectively. Associated with the dynamical evolution of the reservoir system are two sets of dynamical variables: the high-dimensional reservoir state vector r and the (typically) low-dimensional output vector v. The matrices W in and A are predefined while W out is determined by training during which v is replaced by the state vector u of the target system. After training is completed, to set the initial condition for the reservoir network, one approach is to continue to use u in place of v but only for a few time steps, after which the reservoir system executes natural dynamical evolution by itself.
In the conventional scheme [1][2][3][4][5][6][7][8][9][10][11][12], after withdrawing the true state vector u so that the system is closed, real measurements are no longer used, resulting in a relatively short prediction horizon for chaotic systems because of the exponential divergence between the trajectories of the reservoir and true systems. Our idea, as shown in Fig. 1, is to update the reservoir state with a sparsely sampled real state vector u ′ before the divergence exceeds a predefined tolerance limit, i.e., the updates are needed only rarely. In particular, during the update, the input to the I/R module can be written as v , where u ′ contains data at t and c is the coupling parameter. Most of the time during the evolution, we still have v ′ (t ) = v(t ). Updating is effectively an on-off coupling process between the reservoir and the true systems, where the "on" phase is significantly more sparse than the "off" phase. Prediction can then be viewed as a synchronization process [11,17] between the two systems that are coupled but only intermittently [18,19].
We test the predictive power of our proposed reservoir computing scheme with a large number of chaotic systems. Here, we present two examples of high-dimensional chaotic systems: KSE and the complex Ginzburg-Landau equation (cGLE). (Examples of a number of low-dimensional chaotic systems are presented in the Supplemental Material [20].) The KSE is y t + yy x + y xx + y xxxx = 0, where y(x, t ) is a scalar field in the interval x ∈ (0, L) with periodic boundaries. We divide the spatial domain into M uniform subintervals. In addition, to avoid overfitting of W out during the training process, we set the relevant bias parameter [7] to be η = 1 × 10 −4 . Figure 2(b) shows the difference between the evolution of the reservoir and true system, i.e., the prediction error (color coded). The prediction horizon is about five Lyapunov time.
We now instigate rare updates with the true data where y rc is the output of the reservoir system. We divide the output time series into equal time intervals, each of T time steps. In one interval, the actual data points are available for consecutive T 0 steps, and there is no update for the remaining T − T 0 steps. In the space, we select M c uniform spatial points from the total M measurement points. For illustration, we T 0 = 1 and M c = 64, i.e., each component of the input vector to the reservoir system (corresponding to a distinct measurement point in space) receives one true data point every T time steps. the error is reduced as compared the case of no data update [ Fig. 2(b)] and exhibits intermittency, implying an intermittently synchronous behavior between the reservoir and the true system. In this case, the time interval in which the error is below the tolerance emerges intermittently in time. As the updating becomes more frequent, the small-error intervals are generally enlarged, as shown in Fig. 2 Remarkably, for T = 80, the error is essentially zero in the whole time interval considered. This means that, insofar as a single true data point is used to update the reservoir system in every Lyapunov time, the prediction horizon becomes arbitrarily long.
To obtain a systematic picture of the prediction horizon, we calculate the average error δe over a long time interval, e.g., about 100 Lyapunov time, in the parameter plane (T 0 , T ) (for T 0 T ). Figure 3(a) shows, for M c = M/2 = 32 (the number of spatial coupling channels) and c = 2.0, the error behavior. In most of the parameter region, the error is small: δe < 0.01. Long-term prediction fails only for extremely rare updates, i.e., for small values of T 0 and large values of T . Fixing the value of T 0 , we obtain the curves of δe vs T . For T 0 = 1 (black line), δe increases rapidly with T . As more data points are included in each update, e.g., as the value of T 0 is increased from two to ten, δe decreases gradually. The cases considered so far are for rare but regular updates. What about random updates? We examine the parameter plane (p t , p s ), where p t is the probability of update in each time step and p s is the probability for a spatial point to receive updates. Figure 3(e) shows the average prediction error, which exhibits an approximately symmetric behavior with respect to p t and p s . Figure 3(f) shows, for fixed p s = 0.5 and p s = 1 (corresponding to the cases of M c = 32 and M c = 64 with regular coupling, respectively), random updating yields somewhat larger errors than those with regular coupling.
We now demonstrate the predictive power of our reservoir computing scheme for the 1D cGLE in the regime of spatiotemporal chaos, t ) is a complex field in the interval x ∈ (−L/2, L/2) with the periodic boundary condition, and α and β are parameters. The cGLE is a general model for a variety of physical phenomena [21][22][23]. For L = 18, α = 2, and β = −2, the 1D cGLE exhibits spatiotemporal chaos with the maximum Lyapunov exponent ≈ 0.23. We divide the whole interval into M = 32 equally spaced points-the measurement sites. For integration step t = 0.07, one Lyapunov time corresponds to about 65 time steps. Figure 4(a) shows the true spatiotemporal evolution pattern |A(x, t )|. Because the field A(x, t ) is complex, it is necessary to use both the real [A r (x, t )] and imaginary [A i (x, t )] parts for training the reservoir system. The parameters of the reservoir system are D in = D out = 2M, D r = 9984, σ = 1, d = 3, ρ = 0.1, and η = 2 × 10 −5 . Without any update, the error between the predicted and true state evolution is shown in Fig. 4(b), where the prediction horizon is about five Lyapunov time. To introduce rare updates of the true state, we use the coupling scheme A ′r(i) is the real (imaginary) part of the true data and A r(i) rc is the real (imaginary) part of the predicted state. We demonstrate updating with only a single data point, T 0 = 1. Figures 4(c)-4(e) show the prediction error for T = 195, 130, and 65, corresponding to three, two, and one Lyapunov time, respectively. Evidence of intermittent synchronization between the reservoir and the actual systems is shown in Figs. 4(c) and 4(d). As for the spatiotemporally chaotic KSE system, updating the reservoir system with a single data point every Lyapunov time [ Fig. 4(e)] leads to an arbitrarily long prediction horizon for the spatiotemporally chaotic state of the 1D cGLE.
Figures 4(f) and 4(g) show the time-averaged prediction error δe in the parameter planes (T 0 , T ) (regular updating scheme) and (p s , p t ) (random updating scheme), respectively, where the average is taken over approximately 100 Lyapunov time. For regular updating, a small prediction error can be achieved in most of the parameter plane. For random updating, the error decreases with an increase in p t and/or p s .
The theoretical explanation for the observed long-term prediction is that, after proper training the output vector of the reservoir system follows the state vector of the target system for a finite amount of time, indicating complete synchronization between the two systems. Without state updating, the synchronization state is slightly unstable, leading to a short prediction horizon. State updates, even applied rarely, represent a kind of perturbation that makes the synchronization state less unstable, prolonging the prediction horizon. When the frequency of the updates is such that there is one update within one Lyapunov time, the synchronization state becomes stable, giving rise to an arbitrarily long prediction horizon. This scenario has been verified using low-dimensional chaotic systems [20] through a synchronization stability analysis [24,25]. We also note that, for a variety of low-dimensional chaotic systems including the classic Rössler [26] and Lorenz [27] oscillators, the Hindmarsh-Rose neuron [28], and a chaotic food web [29], rare updates to a single state variable enables a properly trained reservoir system to predict all state variables for an arbitrarily long period of time [20]. For example, for the chaotic food web, it is necessary only to supply sparse vegetation data for the reservoir system to correctly predict the abundances of the herbivores and predators, for as long as one wishes.
To summarize, existing reservoir computing systems can predict the dynamical evolution of chaotic systems but only for a short period of time. We have articulated a scheme incorporating true state updates and demonstrated that rare updates of a subset of state variables can significantly prolong the prediction horizon. Of particular interest is the finding that, insofar as there is a state update of a single data point within one Lyapunov time, the prediction horizon can be made arbitrarily long. The machine-learning scheme proposed and studied here has the potential to extend significantly the application scope of reservoir computing in predicting complex dynamical systems.
The difference in our work is threefold: (1) We have introduced intermittent data updates into machine learning, (2) we have demonstrated that a long prediction time can be achieved, and (3) we have introduced the concept of temporal synchronization with on-off coupling to understand the working of the reservoir computing machine with state updating. Our work sheds light on the working of reservoir computing in predicting the state evolution of chaotic systems. In particular, in the synchronization-based scenario, the predictability of reservoir computing is the result of its ability to synchronize with the target chaotic system for a finite amount of time. At the start of the prediction phase, the reservoir system has the same initial condition as the target system. Because of temporal synchronizability, the reservoir system is able to follow the target system for some time before the synchronization error becomes significant. Without state updates, the time it takes for the error to grow to a predefined threshold value determines the prediction time. State updates, even being rare, reset the synchronization error from time to time, insofar as a new update is provided before the error exceeds the threshold.
In essence, the basic idea of our method is similar to that of data assimilation, where models and measurements (true state updates) are combined to generate accurate predictions with applications in, e.g., weather forecasting [30][31][32]. In a recent study, data assimilation and machine learning have been combined to emulate the Lorenz 96 model from sparse and noisy observations [33]. It is also noteworthy that, in predicting chaotic dynamical systems, an alternative machine-learning-based framework is radial basis function networks-artificial neural networks employing radial basis functions as activation functions [34,35]. Given a set of inputs, such a network outputs a signal that is a linear combination of radial basis functions of the inputs, and the parameters of the artificial neurons are determined through training based on, e.g., the standard backpropagation scheme. The method has been demonstrated to be effective for low-dimensional chaotic systems such as the logistic map and the classic Lorenz chaotic oscillator [36][37][38][39]. Whether the method can be superior to reservoir computing in predicting spatiotemporal chaotic systems is an open question. To understand the mechanism behind the realization of long prediction horizon in our proposed reservoir computing scheme with state updating, we resort to synchronization analysis. To facilitate the analysis, we apply our scheme to the chaotic Rössler oscillator given by [1]: x = y z, y = x + 0.2y, (S1.1) . The non-zero elements of the D r ⇥ D r matrix A are generated from the uniform distribution [ 1,1]. Figure S1(a) shows the prediction result with the conventional scheme, where the prediction horizon is about t ⇡ 75 (corresponding to approximately 15 average cycles of oscillation). Figure S1(b) shows that, with our proposed scheme, the prediction horizon is practically infinite, where an update of the actual data of a single dynamical variable, y act , is coupled into the system (T 0 = 1 and T = 50) once every 50 time steps. More specifically, in the iterative process of reservoir computing system, the input data y rc is replaced by y 0 rc = y rc + c(y act y rc ) once every 50 time steps for c = 0.8. Note that, in Fig. S1(b), only the true and predicted x time series are shown, but time series from the other two dynamical variables give essentially the same prediction result. Figure S1(c) shows the color-coded average prediction error e in the parameter plane (T 0 , T ), where there are multiple parameter regions in which the error is small. The patterns in Fig. S1(c) are reminiscent of the phenomenon of ragged synchronization in coupled chaotic oscillator systems [2,3], suggesting the use of synchronization theory to understand the working mechanism of our articulated reservoir computing machine.
A well trained reservoir computing system can be viewed as a high-dimensional replica of the target system. A previous calculation of the Lyapunov exponents of the reservoir dynamical network revealed that the first few exponents are indeed approximate values of the exponents of the true system, and the vast set of remaining exponents have large negative values [4]. This is anticipated as the large negative exponents are necessary to reduce the exceedingly high dimension of the reservoir network to the low-dimensional target system through a strong compression of the dynamics along vast majority of orthogonal directions in the phase space. For the chaotic Rössler system, the considerations suggest that the dynamics of the reservoir computing system be approximately described byẋ rc = y rc z rc , (S1.2) y rc = x rc + 0.2y rc + "(t)(y act y rc ), z rc = 0.2 + (x rc 9)z rc , where, in the coupling term "(t)(y act y rc ), "(t) specifies the on-off nature of the coupling: "(t) = " if nT t < t < T 0 t + nT t (n = 0, 1, 2, . . .) and "(t) = 0 otherwise. There is a linear relation between " and c: c = " t. The data of y act are generated from the target system Eq. (S1.1). In our scheme, predictability implies synchronization between the reduced reservoir computing system Eq. (S1.2) and the true system Eq. (S1.1) with only on-off coupling, where the "on" phase is typically significantly shorter than the "off" phase.
We use stability analysis to quantify synchronization. Let x = x rc x act , y = y rc y act , and z = z rc z act be the infinitesimal perturbations transverse to the synchronization manifold. The variational equations can be obtained by linearizing Eq. (S1.2) with respect to the true state of the target system as described by Eq. (S1.1): Combining Eqs. (S1.1) and (S1.3), we calculate the maximum transverse Lyapunov exponent ⇤. Stable synchronization requires ⇤ < 0. Figure S1(d) shows the color-coded value of ⇤ in the parameter plane (T 0 , T ), which exhibits typical features of ragged synchronization [2,3]. Comparing Fig. S1(c) with Fig. S1(d), we observe a striking degree of similarity, indicating synchronization between the reservoir computing and the target systems subject to on-off coupling as the dynamical mechanism responsible for realizing the long prediction horizon with rare data updating.

II. PREDICTING CHAOTIC LORENZ SYSTEM
We demonstrate that a practically infinite prediction horizon can be achieved for the classic chaotic Lorenz system [5] with rare state updating. The equations of the system arė x = 10(y x), y = x(28 z) y, (S2.4) We obtain the time series x(t), y(t), and z(t) with integration step size t = 0.01. The parameter setting of the reservoir computing system is D in = D out = 3, D r = 600, = 0.1, d 0 = 0.3, ⇢ = 1.2, and ⌘ = 1 ⇥ 10 5 . Figure S2(a) shows the prediction result with the conventional scheme without any state updating, where the prediction horizon is t ⇡ 5 (corresponding to approximately ten average oscillations). Figure S2(b) shows that, with rare state updating (T 0 = 1 and T = 40) of one of the dynamical variables, mathematically represented as replacement of y rc by y 0 rc = y rc + c(y act y rc ) once every 40 time steps, practically an arbitrarily long prediction horizon can be achieved. FiguresS2(c) and S2(d) show the difference between the predicted and true time series from the conventional and our proposed reservoir computing schemes, respectively. It can be seen that, with rare state updating, the prediction error is essentially zero for the time interval displayed, with relatively large errors occurring only at about a few dozen time steps (out of 2⇥10 4 time steps).

III. PREDICTING CHAOTIC HINDMARSH-ROSE NEURON DYNAMICS
We test our reservoir computing scheme with rare state updating for the chaotic Hindmarsh-Rose neuron model [6]:ẋ = y + 3x 2 x 3 z + 3.2, y = 1 5x 2 y, (S3.5) where x is the membrane potential, y and z are the transport rates of the fast and slow channels, respectively. The integration time step is t = 0.1. The three time series x(t), y(t), and z(t) are used to train the reservoir computing system with parameters D in = D out = 3, D r = 600, = 0.6, d 0 = 0.2, ⇢ = 0.3, and ⌘ = 1 ⇥ 10 7 . Figure S3  (c,d) The difference between the predicted and true state evolution for the cases in (a,b), respectively.

IV. PREDICTING POPULATION EVOLUTION IN A CHAOTIC FOOD WEB
We demonstrate an arbitrarily long prediction horizon for the following chaotic food web system [7]:ẋ = x 0.2 xy 1 + 0.05x , y = y + 0.2 xy 1 + 0.05x yz, (S4.6) z = 10(z 0.006) + yz, where x, y and z represent vegetation, herbivores and predators, and the evolution displays uniform phase evolution but with chaotic amplitude modulation. The integration step size is t = 0.1. The parameter values of the reservoir computing system are D in = D out = 3, D r = 600, = 0.15, d 0 = 0.2, ⇢ = 0.2, and ⌘ = 1 ⇥ 10 7 . Figure S4(a) shows the prediction result from the conventional scheme without any state updating, where the prediction horizon is t ⇡ 50 (containing seven or eight bursts in the predator population). In the food web system, vegetation data are relatively easy to be collected, so we use x act to perform rare state updating. Figure S4(b) shows the predicted and actual predator time series for T 0 = 1 and T = 50, i.e., we replace x rc by x 0 rc = x rc + c(x act x rc ) once every 50 time steps. Visually the two types of time series cannot be distinguished. Figures S4(c) and S4(d) show the corresponding evolution of the prediction error for Figs. S4(a) and S4(b), respectively. We see that, with rare state updating, the prediction error is exceedingly small in the long time interval (10 4 time steps) tested, indicating that a practically infinite prediction horizon has been achieved.