Anticipating synchronization with machine learning

In applications of dynamical systems, situations can arise where it is desirable to predict the onset of synchronization, as it can lead to characteristic and significant changes in the system performance and behaviors, for better or worse. In experimental and real settings, the system equations are often unknown, raising the need to develop a prediction framework that is model free and fully data driven. We contemplate that this challenging problem can be addressed with machine learning. In particular, exploiting reservoir computing or echo state networks, we devise a "parameter-aware" scheme to train the neural machine using asynchronous time series, i.e., time series collected in the parameter regime prior to the onset of synchronization. A properly trained machine possesses the power to predict the synchronization transition: for a given amount of parameter drift, whether the system would remain asynchronous or exhibit synchronous dynamics can be accurately anticipated. We demonstrate the machine-learning based framework using representative chaotic models and small network systems that exhibit continuous (second-order) or abrupt (first-order) transitions. A remarkable feature is that, for a network system exhibiting an explosive (first-order) transition and a hysteresis loop in synchronization, the machine-learning scheme is capable of accurately predicting these features, including the precise locations of the transition points associated with the forward and backward transition paths.


I. INTRODUCTION
As a universal phenomenon in nonlinear and complex dynamical systems, synchronization has attracted a great deal of research and continuous interest [1][2][3]. Synchronization represents a kind of coherent motion that typically arises in systems of coupled dynamical units when the interaction or coupling among the units is sufficiently strong. Depending on the specific form of the coherent motion, different types of synchronization can emerge, including complete chaotic synchronization [4], phase synchronization [5], and generalized synchronization [6]. The occurrence of synchronization has significant consequences for the system behavior and functions. An example is the occurrence of epileptic seizures in the brain neural system. Specifically, a widely adopted assumption is that hypersynchrony is closely associated with the occurrence of epileptic seizures [7], during which the number of independent degrees of freedom of the underlying brain dynamical system is reduced. Among the extensive literature in this field, it was demonstrated that partial and transient phase synchrony can be exploited to detect and characterize (but not to predict) seizures from multichannel brain data [8][9][10]. To date, reliable seizure prediction remains an unsolved problem. In real-world situations such as this, it is of interest to predict or anticipate synchronization before its actual occurrence. A daunting challenge is that it is often impossible to know the system equations, so any prediction attempt must be based on time series data obtained before the system evolves into some kind of synchronous dynamical state. We are thus motivated to ask the question: given that the system operates in a parameter regime where there is no synchronization, would it be possible to predict, without relying on any model, the onset of synchronization based solely on the dynamically incoherent time series measurements taken from the parameter regime of desynchronization? In this paper, we articulate a machine-learning framework based on reservoir computing [11,12] to provide an affirmative answer to this question.
To place our work in a proper context, here we offer a brief literature review of the fields of synchronization and reservoir computing.
Synchronization in nonlinear dynamics and complex networks. In the study of synchronization, the typical setting is coupled dynamical oscillators, where the bifurcation or control parameter is the coupling strength among the oscillators. A central task is to identify the critical point at which a transition from desynchronization to synchronization occurs [1,2]. Depending on the dynamics of the oscillators and the coupling function, the system can have a sequence of transitions, giving rise to distinct synchronization regimes in the parameter space. For example, for a system of coupled identical nonlinear oscillators, complete synchronization can arise when the coupling exceeds a critical strength that can be determined by the master stability function [13,14]. Systems of nonlinearly coupled phase oscillators, e.g., those described by the classic Kuramoto model [1], can host phase synchronization, and the critical coupling strength required for the onset of this type of "weak" synchronization can be determined by the mean-field theory [15,16]. Synchronization in coupled physical oscillators has been studied experimentally [17,18]. A counterintuitive phenomenon is that adding connections can hinder network synchronization of time-delayed oscillators [19]. With the development of modern network science in the past two decades, synchronization in complex networks has been extensively studied [20], due to the rich variety of complex network structures in natural and engineering systems and the fundamental role of synchronous dynamics in the system functioning. Earlier it was found that small-world networks, due to their small network diameters, are more synchronizable than regular networks of comparable sizes [21], but heterogeneity in the network structure presents an obstacle to synchronization [22]. Subsequently it was found that heterogeneous networks with weighted links can be more synchronizable than small-world and random networks [23]. The onset of chaotic phase synchronization in complex networks of coupled heterogeneous oscillators has also been studied [24]. The interplay between network symmetry and synchronization has been uncovered and understood [25,26]. In complex networks, the transition from desynchronization to synchronization is typically continuous, e.g., the synchronization error or the order parameter tends to change continuously through the critical point, which is characteristic of a second-order phase transition. However, studies of networked systems of coupled phase oscillators revealed that, if the network links are weighted according to the natural frequencies of the oscillators, the transition can be abrupt and discontinuous, i.e., a first-order phase transition [27][28][29]. This phenomenon, known as explosive synchronization, testifies to the strong influence of the network structure on the collective dynamics.
Reservoir computing for predicting chaotic systems. The idea and principle of exploiting reservoir computing for predicting the state evolution of chaotic systems were first laid out about two decades ago [11,12]. In recent years, model-free prediction of chaotic systems using reservoir computing has gained considerable momentum [30-48]. The neural architecture of reservoir computing consists of a single hidden layer hosting a complex dynamical network that receives input data and generates output data. In the training phase, the whole system is set to the "open-loop" mode, where it receives input data and optimizes its parameters to match its output with the true output corresponding to the input. In the standard reservoir computing setting [30-48], the adjustable parameters are those associated with the output matrix that maps the internal dynamical state of the hidden layer to the output layer, while the other parameters, such as those defining the network and the input matrix, are fixed (the hyperparameters). In the prediction phase, the neural machine operates in the "closed-loop" mode, where the output variables are fed directly into the input, so that the whole system becomes a self-evolving dynamical system. With reservoir computing, the state evolution of chaotic systems can be accurately predicted for about half a dozen Lyapunov times, longer than what can be achieved using the traditional methods of nonlinear time series analysis. A notable result is that, even though reservoir computing is unable to make long-term predictions of the state evolution of a chaotic system, it is still able to replicate the ergodic properties of the system [33]. This feature makes it possible to generate the bifurcation diagram of a nonlinear dynamical system without the equations [49]. More recently, a "parameter-aware" reservoir computing scheme was articulated to predict critical transitions, transient chaos, and system collapse [50].
Contributions of the present work. In situations where synchronization is undesired, such as epilepsy, the sudden onset of synchronization, or a "synchronization catastrophe," is of concern. Likewise, in situations where synchronization is deemed desirable, a sudden collapse of synchronization is to be avoided. It is thus of interest to predict the sudden onset or collapse of synchronization before it occurs. If the detailed network structure and system equations are known, the prediction task is in principle feasible. In real-world applications, it is often the case that the only available information about the system is a set of measured time series at a number of control parameter values [5,6]. For instance, in epilepsy, the details of the brain neuronal network generating the seizures are unknown, but the EEG signals upon applications of controlled drug doses can be obtained. In such applications, a model-free and fully data-driven approach to predicting a synchronization catastrophe is needed. In this regard, if the network is sparse and if the nodal dynamical equations have a simple mathematical structure, such as one that can be represented by a small number of power-series or Fourier-series terms, sparse optimization tailored to nonlinear dynamical systems [51][52][53] can be employed to discover the system equations so as to predict the onset of synchronization [54].
The scheme of reservoir computing incorporating a parameter input channel, so that the machine is able to sense and track the parameter variations of the target system [50], provides a solution to the problem of predicting the synchronization transition. The purpose of this paper is to demonstrate that this is indeed the case. The basic principle is that the dynamical "climate" of the target system, e.g., the dimension of its chaotic attractor [33], undergoes an abrupt change at the synchronization transition point, and a well-trained parameter-aware reservoir computing machine is able to capture this "climate" change. To be concrete, we take the coupling strength as the control parameter whose value is fed into the input parameter channel. The required training consists of optimizing the reservoir output matrix based on time series collected from a small number of distinct values of the control parameter in the desynchronization regime, so that the neural machine learns the "climate change" of the target system. As we will show, provided that the training is successful, the machine is able to predict a characteristic change in the collective dynamical behavior of the target system for any value of the control parameter that the input parameter channel receives. The scenarios tested consist of a broad spectrum of complex synchronization behaviors, including complete synchronization in coupled identical chaotic systems and explosive synchronization in networks of coupled nonidentical phase oscillators. Because of the model-free nature of our machine-learning framework for synchronization prediction, it can be applied directly to real-world systems.

II. MACHINE-LEARNING METHOD
Our reservoir computing machine consists of four modules: the I/R layer (input-to-reservoir), the control module, the hidden layer (the reservoir), and the R/O layer (reservoir-to-output). The I/R layer is characterized by W_in, a D_r × D_in-dimensional matrix that maps the input vector u_ε(t) ∈ R^{D_in} to the dynamical network in the reservoir hidden layer, where the input vector is acquired from the target system at time t for the specific bifurcation parameter value ε. The elements of W_in are randomly drawn from a uniform distribution within the range [−σ, σ]. The control module is characterized by the vector s = η(t)b, where η(t) is the time-dependent control parameter and b ∈ R^{D_r} is a bias vector. In general, the control parameter is related to the bifurcation parameter of the target system by a smooth function, η(t) = ε_k ε(t) + ε_bias, where a convenient choice is simply η(t) = ε(t). Effectively, η(t) can be regarded as an additional input component incorporated into the input vector u(t), where the elements of b are also drawn randomly from a uniform distribution within the range [−σ, σ]. The network in the hidden layer consists of D_r nonlinear elements (nodes), whose dynamics are governed by the rule

r(t + ∆t) = (1 − α)r(t) + α tanh[A r(t) + W_in u_ε(t) + (ε_k ε(t) + ε_bias) b],   (1)

where r(t) ∈ R^{D_r} is the state vector of the network at time t, ∆t is the time step, α is a leakage parameter, ε_k and ε_bias define a linear transformation of ε before it is input into the reservoir, and A is the D_r × D_r-dimensional matrix characterizing the connecting structure of the hidden-layer network. With probability p, the elements of matrix A are set to zero. The symmetric, nonzero elements of A are drawn from a uniform distribution within the range [−1, 1] and are normalized so as to make the spectral radius of the matrix equal to ρ. The output layer is characterized by a D_out × D_r-dimensional matrix W_out, which generates the D_out-dimensional output vector v(t) via

v(t + ∆t) = W_out f(r(t + ∆t)),   (2)

where f(r) is the output function and we set D_in = D_out. The elements of the output matrix are determined through training, where a general rule is to set f(r) = r for the odd nodes in the reservoir and f(r) = r^2 for the even nodes so as to enable proper optimization [38]. In particular, different from W_in, the elements of W_out are not known a priori but are to be "learned" from the input data through training, with the purpose of finding the proper matrix W_out such that the output vector v(t + ∆t) as calculated from Eq. (2) is as close as possible to the input vector u(t + ∆t) for t = (τ + 1)∆t, . . ., (L − 1)∆t, L∆t, where T_0 = τ∆t is the initial period discarded to remove the transient behavior in the reservoir's response to the training signal, and L is the length of the training time series. This can be done [33,34,38] by minimizing a cost function with respect to W_out, which gives

W_out = U F^T (F F^T + λI)^{−1},   (3)

where F is the D_r × L-dimensional state matrix whose kth column is f(r((τ + k)∆t)), U is the D_in × L-dimensional matrix whose kth column is u((τ + k + 1)∆t), I is the identity matrix, and λ is the ridge regression parameter.
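The open-loop training procedure described above can be sketched in code. The following is a minimal illustration with synthetic input data, arbitrary hyperparameter values, the simplifying choice ε_k = 1 and ε_bias = 0, and p treated as the fraction of nonzero reservoir links (an assumption); it is not the exact implementation used in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy dimensions and hyperparameters (illustrative values only).
D_in, D_r = 2, 100
sigma, rho, p, alpha, lam = 1.0, 0.9, 0.2, 1.0, 1e-5

# Input matrix W_in and bias vector b, uniform in [-sigma, sigma].
W_in = rng.uniform(-sigma, sigma, (D_r, D_in))
b = rng.uniform(-sigma, sigma, D_r)

# Random reservoir matrix A, rescaled to spectral radius rho.
A = rng.uniform(-1, 1, (D_r, D_r)) * (rng.random((D_r, D_r)) < p)
A *= rho / np.max(np.abs(np.linalg.eigvals(A)))

def step(r, u, eps):
    # Reservoir update with the parameter channel (eps_k = 1, eps_bias = 0):
    # r(t+dt) = (1 - alpha) r + alpha tanh(A r + W_in u + eps b)
    return (1 - alpha) * r + alpha * np.tanh(A @ r + W_in @ u + eps * b)

def f_out(r):
    # Output function: r for odd nodes, r^2 for even nodes.
    q = r.copy()
    q[::2] = q[::2] ** 2
    return q

# Synthetic stand-in for the measured time series, with a step-function
# parameter signal eps(t) covering three training values of the coupling.
T, tau = 600, 100
u_train = rng.uniform(0, 1, (T, D_in))
eps_train = np.repeat([0.20, 0.22, 0.24], T // 3)

r = np.zeros(D_r)
states, targets = [], []
for t in range(T - 1):
    r = step(r, u_train[t], eps_train[t])
    if t >= tau:                        # discard the transient of length tau
        states.append(f_out(r))
        targets.append(u_train[t + 1])  # one-step-ahead target

F = np.array(states).T   # D_r x L state matrix
U = np.array(targets).T  # D_in x L target matrix

# Ridge regression: W_out = U F^T (F F^T + lam I)^{-1}
W_out = U @ F.T @ np.linalg.inv(F @ F.T + lam * np.eye(D_r))
```

With real data, `u_train` would be the concatenated measurements from the target system rather than random numbers; the ridge-regression step is unchanged.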
After training, the elements of the matrix W_out are fixed, and the machine is ready for prediction: we first set the control parameter to a specific value of interest (not necessarily any of the parameter values used in the training phase) and then evolve the machine according to Eq. (1), replacing u_ε(t) with v(t + ∆t). Finally, by tuning ε to different values, we monitor the variation of the statistical properties of the reservoir outputs and predict the transition of the system dynamics with respect to ε.
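The closed-loop prediction mode can be sketched as follows. Here `W_out` is a random stand-in for a trained output matrix, so the snippet only illustrates the feedback mechanics, not a meaningful prediction.

```python
import numpy as np

rng = np.random.default_rng(2)
D_in, D_r, alpha = 2, 50, 1.0

W_in = rng.uniform(-1, 1, (D_r, D_in))
b = rng.uniform(-1, 1, D_r)
A = rng.uniform(-1, 1, (D_r, D_r)) * (rng.random((D_r, D_r)) < 0.2)
A *= 0.9 / np.max(np.abs(np.linalg.eigvals(A)))
W_out = rng.normal(0, 0.1, (D_in, D_r))  # stand-in for a trained output matrix

def f_out(r):
    q = r.copy()
    q[::2] = q[::2] ** 2
    return q

# Closed-loop evolution: the output v(t) is fed back as the next input, so the
# machine runs autonomously while the parameter channel is held at eps_probe.
eps_probe = 0.26
r, v = np.zeros(D_r), np.zeros(D_in)
trajectory = []
for _ in range(500):
    r = (1 - alpha) * r + alpha * np.tanh(A @ r + W_in @ v + eps_probe * b)
    v = W_out @ f_out(r)  # the prediction becomes the next input
    trajectory.append(v.copy())
trajectory = np.array(trajectory)
```

Sweeping `eps_probe` over a grid and computing statistics of `trajectory` at each value is how the transition curves in the following sections are produced.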
The main feature of our reservoir computing design is that the input data in the training phase contain two components: (1) the input vector u_ε(t) representing the time series measured from the target system and (2) the bifurcation parameter ε(t) under which u_ε(t) is obtained, whereas in the conventional scheme [30-48], only the first component (time series from a fixed value of the bifurcation parameter) is present. In particular, u_ε(t) consists of m segments of equal length T (i.e., L = mT) and, for each segment, the value of the bifurcation parameter ε(t) is fixed, so that, overall, ε(t) is a step function of time. (The proposed scheme is equally effective when the segments are not of equal length; see Appendix A.) In the prediction phase, the input vector u_ε(t) is replaced by v(t) as in the conventional scheme, but the value of the bifurcation parameter ε(t) is still needed as an input. Since our goal is to predict synchronization among a number of coupled oscillators, the coupling strength ε is a natural choice for the bifurcation parameter.

III. RESULTS

A. Predicting complete synchronization in coupled chaotic maps
We consider the following system of two coupled, identical chaotic maps:

x_{1,2}(n + 1) = (1 − ε)F(x_{1,2}(n)) + εH(x_{2,1}(n)),   (4)

where x_{1,2}(n) denote the dynamical variables of the system at the nth iteration and F(x) and H(x) are the map and coupling functions, respectively. As an illustrative example, we choose the one-dimensional chaotic logistic map defined on the unit interval, F(x) = 4x(1 − x), and set the coupling function to be H(x) = F(x). The critical coupling values for complete synchronization can be obtained using the master stability function [13,14], which gives that complete synchronization occurs for ε_1 < ε < ε_2, with ε_1 = 0.25 and ε_2 = 0.75. We obtain the training data for three different values of ε: 0.2, 0.22 and 0.24, all in the desynchronization regime. For each ε value, we collect the state variables {u(n)} = {x_1(n), x_2(n)} for T = 2 × 10^3 successive time steps after disregarding a transient of 10^3 time steps. The time series from the three values of ε are combined to form a single time series which, together with the step function ε(n) of the coupling parameter, is fed into the reservoir for training, yielding the optimal output matrix W_out. The values of the hyperparameters of the reservoir are (D_r, p, σ, ρ, α, ε_k, ε_bias) = (100, 0.2, 1, 10^−5, 1, 1, 0). The regression parameter for obtaining W_out is λ = 1 × 10^−5.
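Ground-truth time series for the coupled-map system can be generated directly; below is a minimal sketch (the seed and series lengths are illustrative choices, not the paper's values).

```python
import numpy as np

def F(x):
    return 4.0 * x * (1.0 - x)  # chaotic logistic map

def iterate_coupled_maps(eps, n_steps=2000, n_transient=1000, seed=0):
    # x_{1,2}(n+1) = (1 - eps) F(x_{1,2}(n)) + eps F(x_{2,1}(n)), with H = F.
    x1, x2 = np.random.default_rng(seed).random(2)
    out = []
    for n in range(n_steps + n_transient):
        x1, x2 = ((1 - eps) * F(x1) + eps * F(x2),
                  (1 - eps) * F(x2) + eps * F(x1))
        if n >= n_transient:
            out.append((x1, x2))
    return np.array(out)

# Below eps_1 = 0.25 the maps remain desynchronized; above it they synchronize.
desync = iterate_coupled_maps(0.20)
sync = iterate_coupled_maps(0.30)
err_desync = np.mean(np.abs(desync[:, 0] - desync[:, 1]))
err_sync = np.mean(np.abs(sync[:, 0] - sync[:, 1]))
```

The two columns of the returned array are exactly the state variables {x_1(n), x_2(n)} that are concatenated across coupling values to form the training input.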
To predict the synchronization transition, the trained reservoir computing machine must possess the ability to sense the change in the "synchronization climate" of the target system. Figures 1(a1-d1) show the predicted return maps (black dots) constructed from x_1 for four values of ε: 0.2, 0.22, 0.24, and 0.26. The first three values of ε are below ε_1, so there is no synchronization, and the last value is in the synchronization regime. The reservoir machine predicts these behaviors correctly. In particular, as the value of ε is increased from 0.2 to 0.26, the return map gradually evolves into the map function F(x) = 4x(1 − x) and the points (x_1, x_2) converge to the diagonal line, which are characteristic of complete synchronization. In fact, the black (predicted) and red (true) dots cannot be distinguished statistically, indicating the superior power of the machine in capturing the collective dynamics of the target system.
Note that the first three ε values (0.2, 0.22, and 0.24) are the ones used in training. It may thus not be surprising that the reservoir is able to correctly predict the distinct dynamical behaviors of the system at these parameter values, i.e., that there is no synchronization. What is remarkable is that the last ε value (0.26) is totally "new" to the machine, as it has never been exposed to data from this parameter value, yet it predicts, still quite correctly, that there is now synchronization. This means that training at different coupling parameter values in the desynchronization regime has instilled into the machine the ability to "sense" the "climate" change in the collective dynamics of the target system.
As the reservoir computing machine has been trained to capture the "climate" of the collective dynamics in the coupled chaotic map system, it should be able to predict the synchronization transition. In particular, the expectation is that it would correctly predict the two end points of the synchronization parameter regime, ε_1 and ε_2, at which a transition to synchronization occurs depending on the direction of parameter variation. To demonstrate the successful prediction of the transition point ε_1, we fix the output matrix W_out and increase the ε value systematically from 0.18 to 0.27 at the step size ∆ε = 1 × 10^−3. For each ε value, we let the machine generate a time series of length T = 1 × 10^3 and calculate the time-averaged synchronization error ∆x = ⟨|x_1 − x_2|⟩_T. Figure 2(a) shows the machine-predicted ∆x versus ε (red circles), where ∆x decreases continuously for ε ≳ 0.2 and becomes zero at ε = ε_1 ≈ 0.25. For comparison, the true behavior of ∆x is also included in Fig. 2(a), with which the predicted result agrees well for ε ≳ 0.2. At the opposite end of the synchronization regime, the reservoir machine does an equally good job of predicting the transition to synchronization at ε_2 ≈ 0.75 as ε decreases from a larger value, as shown in Fig. 2(b), where the training data are obtained from ε = 0.76, 0.78 and 0.8. The results in Fig. 2 are thus evidence that a properly trained reservoir computing machine has the power to accurately predict the critical point of the transition to synchronization.
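The true transition curve against which the machine prediction is compared can be reproduced by sweeping ε and measuring the synchronization error of the actual coupled maps; a sketch follows (the grid, seed, and collapse threshold are illustrative choices).

```python
import numpy as np

def F(x):
    return 4.0 * x * (1.0 - x)

def sync_error(eps, n=2000, transient=1000, seed=5):
    # Time-averaged synchronization error <|x1 - x2|> for the coupled maps.
    x = np.random.default_rng(seed).random(2)
    acc = 0.0
    for k in range(n + transient):
        x = (1 - eps) * F(x) + eps * F(x[::-1])  # x[::-1] swaps the two maps
        if k >= transient:
            acc += abs(x[0] - x[1])
    return acc / n

eps_grid = np.arange(0.18, 0.27, 1e-3)
errors = np.array([sync_error(e) for e in eps_grid])

# Estimate eps_1 as the first grid point where the error has collapsed.
eps_1_est = eps_grid[np.argmax(errors < 1e-3)]
```

Near the critical point the transverse contraction is slow, so the estimated threshold overshoots ε_1 = 0.25 slightly at finite integration times; longer transients tighten the estimate.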
The performance of the reservoir machine in predicting the synchronization transition is affected slightly by the number and locations of the training parameter values. We find that, insofar as the training is done with data from at least two distinct parameter values, the synchronization transition can be predicted. Training with data from increasingly more values of the bifurcation parameter, regardless of their order, in general improves the prediction accuracy. Also, the closer the training parameter values are to the critical point, the more accurate the prediction. (Details about the effects of the number, locations, and order of the training parameter values on the prediction performance are provided in Appendix A.)

B. Predicting synchronization transition in coupled chaotic Lorenz oscillators
The system of a pair of coupled identical chaotic Lorenz oscillators is given by

dx_{1,2}/dt = µ(y_{1,2} − x_{1,2}) + ε(x_{2,1} − x_{1,2}),
dy_{1,2}/dt = βx_{1,2} − y_{1,2} − x_{1,2}z_{1,2} + ε(y_{2,1} − y_{1,2}),
dz_{1,2}/dt = x_{1,2}y_{1,2} − γz_{1,2} + ε(z_{2,1} − z_{1,2}),   (5)

where the parameter setting is (µ, β, γ) = (10, 28, 2), for which an isolated oscillator has a chaotic attractor [55]. Analysis based on the master stability function gives that complete synchronization occurs for ε > ε_c ≈ 0.42.
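A direct simulation of the coupled pair illustrates the two regimes. The sketch below assumes diffusive coupling in all three variables (an assumed reconstruction of the coupling form) and probes ε values comfortably below and above the threshold ε_c ≈ 0.42; the initial conditions, time step, and probe values are illustrative.

```python
import numpy as np

mu, beta, gamma = 10.0, 28.0, 2.0

def rhs(s, eps):
    # Two identical Lorenz oscillators, diffusively coupled in all variables.
    x1, y1, z1, x2, y2, z2 = s
    return np.array([
        mu * (y1 - x1) + eps * (x2 - x1),
        beta * x1 - y1 - x1 * z1 + eps * (y2 - y1),
        x1 * y1 - gamma * z1 + eps * (z2 - z1),
        mu * (y2 - x2) + eps * (x1 - x2),
        beta * x2 - y2 - x2 * z2 + eps * (y1 - y2),
        x2 * y2 - gamma * z2 + eps * (z1 - z2),
    ])

def sync_error(eps, dt=0.01, n=20000, transient=10000):
    # Classical RK4 integration; returns the time-averaged |x1 - x2|.
    s = np.array([1.0, 1.1, 20.0, -1.0, 2.0, 25.0])
    acc, cnt = 0.0, 0
    for k in range(n):
        k1 = rhs(s, eps)
        k2 = rhs(s + 0.5 * dt * k1, eps)
        k3 = rhs(s + 0.5 * dt * k2, eps)
        k4 = rhs(s + dt * k3, eps)
        s = s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        if k >= transient:
            acc += abs(s[0] - s[3])
            cnt += 1
    return acc / cnt

e_weak = sync_error(0.20)    # below the threshold: desynchronized
e_strong = sync_error(0.60)  # well above the threshold: synchronized
```

For this coupling form the transverse perturbation is damped at rate 2ε relative to the largest Lyapunov exponent of a single oscillator, which is consistent with a threshold near ε_c ≈ 0.42.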
Figures 3(a1-a3) show the predicted evolution of the dynamical variables x_1 and x_2 of the two oscillators for the three training parameter values preceding the onset of synchronization. The machine correctly predicts that there is no synchronization at these parameter values. When the coupling parameter is set to ε = 0.45 (in the synchronization regime), the machine indeed predicts the synchronous behavior, as shown in Fig. 3(a4). To test if the machine can predict the critical transition point ε_c for synchronization, we increase the control parameter value in the machine from 0.2 to 0.5 systematically and calculate the time-averaged synchronization error ∆x = ⟨|x_1 − x_2|⟩_T over a time period of T = 1 × 10^4 (after discarding transients). Figure 3(b) shows ∆x versus ε. The machine-predicted synchronization error approaches zero at ε_c ≈ 0.42, which is in good agreement with the true value of the transition point. Similar results are obtained for alternative coupling configurations, e.g., coupling through a single variable or a cross-coupling scheme (Appendix B).

C. Predicting synchronization transition in coupled chaotic food-chain systems
We consider the following mutually coupled system of two food chains, each with three species [56]:

dx_{1,2}/dt = x_{1,2}(1 − x_{1,2}/K) − a_c b_c x_{1,2} y_{1,2}/(x_{1,2} + x_0) + ε(x_{2,1} − x_{1,2}),   (6)
dy_{1,2}/dt = a_c y_{1,2}[b_c x_{1,2}/(x_{1,2} + x_0) − 1] − a_p b_p y_{1,2} z_{1,2}/(y_{1,2} + y_0),   (7)
dz_{1,2}/dt = a_p z_{1,2}[b_p y_{1,2}/(y_{1,2} + y_0) − 1],   (8)

where x_{1,2}, y_{1,2}, z_{1,2} are the population densities of the resource, consumer, and predator species, respectively, in each food chain. We use the parameter setting [56] by which each isolated food chain is a chaotic oscillator: (K, a_c, b_c, a_p, b_p, x_0, y_0) = (0.99, 0.4, 2.009, 0.08, 2.876, 0.16129, 0.5).
We generate the training data from three distinct values of the coupling parameter, ε = 4.5 × 10^−3, 5.5 × 10^−3, and 6.5 × 10^−3, all in the desynchronization regime. For each value, we generate a time series of length T = 9600 δt with δt = 0.5, after disregarding the initial T_0 = 8000 steps to remove any transient behavior. All six dynamical variables of the coupled system, as well as the coupling strength, are fed as input to the reservoir machine. The values of the hyperparameters of the reservoir machine are set as (D_r, p, σ, ρ, α, ε_k, ε_bias) = (1000, 0.695, 2.30, 1.20, 0.27, 0.049, 1.53), where the elements of the reservoir network A are sampled from a standard normal distribution, and the regression parameter is λ = 3 × 10^−4.
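Integrating the coupled food chains supplies the training time series. The sketch below assumes the McCann-Yodzis three-species form with the parameter values listed above and diffusive coupling through the resource variable (both are assumptions about the precise equations of Ref. [56]); it only checks that the integrated populations stay positive and bounded.

```python
import numpy as np

# Parameters as listed in the text: (K, a_c, b_c, a_p, b_p, x_0, y_0).
K, ac, bc, ap, bp, x0, y0 = 0.99, 0.4, 2.009, 0.08, 2.876, 0.16129, 0.5

def chain(x, y, z, x_other, eps):
    # One three-species chain (assumed McCann-Yodzis form), with diffusive
    # coupling through the resource x.
    dx = x * (1 - x / K) - ac * bc * x * y / (x + x0) + eps * (x_other - x)
    dy = ac * y * (bc * x / (x + x0) - 1) - ap * bp * y * z / (y + y0)
    dz = ap * z * (bp * y / (y + y0) - 1)
    return dx, dy, dz

def rhs(s, eps):
    x1, y1, z1, x2, y2, z2 = s
    return np.array(chain(x1, y1, z1, x2, eps) + chain(x2, y2, z2, x1, eps))

# RK4 integration at one of the training couplings, eps = 5.5e-3.
dt, n, eps = 0.01, 20000, 5.5e-3
s = np.array([0.7, 0.2, 0.8, 0.6, 0.3, 0.7])
traj = np.empty((n, 6))
for k in range(n):
    k1 = rhs(s, eps)
    k2 = rhs(s + 0.5 * dt * k1, eps)
    k3 = rhs(s + 0.5 * dt * k2, eps)
    k4 = rhs(s + dt * k3, eps)
    s = s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    traj[k] = s
```

Subsampling `traj` at the measurement interval δt = 0.5 yields the six input variables fed to the reservoir, one run per training value of ε.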
We use the trained reservoir to predict the synchronization behaviors in the coupling range ε ∈ [0.004, 0.012] by calculating the time-averaged synchronization error ∆x = ⟨|x_1 − x_2|⟩_T over a time period of T = 2.4 × 10^4 steps for each parameter value (after discarding transients of T_0 = 4.8 × 10^4 steps). Figure 4 shows that the predicted error versus ε (red circles) is quite close to the true errors (black squares). The machine-predicted synchronization transition point agrees well with the true value. This is quite remarkable considering that only time series from the desynchronization regime are used as the training data.

D. Predicting explosive synchronization in coupled nonidentical phase oscillators
The routes to complete synchronization treated in Secs. III A, III B, and III C belong to the category of second-order phase transitions, where a physical quantity characterizing the degree of synchronization varies continuously (albeit non-smoothly) through the transition point. As demonstrated, our machine-learning scheme is fully capable of predicting the transition in all these cases. Another type of synchronization transition that has been extensively studied is the first-order phase transition, also known as explosive synchronization [27][28][29], where the onset of synchronization is abrupt in the sense that the underlying characterizing quantity changes discontinuously at the transition point. The question is whether our machine-learning scheme can predict explosive synchronization. Here we present an affirmative answer using the paradigmatic system in the literature [27][28][29] for explosive synchronization: coupled nonidentical phase oscillators.
For simplicity, we consider a small star network of N = 4 nodes, as shown in Fig. 5(a), where the natural frequencies of the peripheral (leaf) nodes are identical and that of the hub node is proportional to its degree [28]. The network dynamics are described by

dθ_l/dt = ω + ε sin(θ_h − θ_l),   l = 1, 2, 3,
dθ_h/dt = k_h ω + ε Σ_{l=1}^{3} sin(θ_l − θ_h),   (9)

where l = 1, 2, 3 denote the leaf nodes, h denotes the hub node, ε is the uniform coupling strength, and k_h = 3 is the degree of the hub node. The degree of network synchronization can be characterized by the order parameter

R = ⟨ |(1/N) Σ_{j=1}^{N} e^{iθ_j}| ⟩_T,   (10)

where j = 1, . . ., N is the node index, N = 4 is the network size, |·| denotes the modulus, and ⟨·⟩_T denotes the time average.
Setting ω = 1, we increase ε systematically from 0.3 to 0.8 and calculate the dependence of R on ε by simulating Eq. (9). In the numerical simulations, the initial conditions of the oscillators are randomly chosen from the range (0, 2π] and the integration time step is ∆t = 0.05. Representative numerical results are shown in Fig. 5(b) (black solid squares). It can be seen that, at about ε_f = 0.55, the value of R changes abruptly from about 0.45 to about 0.9, the feature of a discontinuous, first-order transition. The dynamical origin of this type of explosive synchronization lies in the interplay between the heterogeneity of the network structure and that of the dynamics, which occurs naturally in networks where the node degree and the natural frequency are positively correlated [29]. Our goal is to use the reservoir machine, trained by time series from the desynchronized states, to predict the critical coupling ε_f.
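The true R(ε) curve can be generated by direct simulation of the star network; a sketch follows. A small random detuning of the leaf frequencies is added here (an illustrative choice, in the spirit of the small frequency perturbations the text applies when observing the hysteresis) so that the statistics of the incoherent branch are robust.

```python
import numpy as np

def simulate_star(eps, omega=1.0, dt=0.05, n=40000, transient=20000, seed=3):
    # Star network: three leaves with frequency ~omega, hub with 3 * omega.
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0, 2 * np.pi, 4)  # theta[:3] leaves, theta[3] hub
    w_leaf = omega * (1 + 0.01 * rng.standard_normal(3))  # small detuning
    w_hub = 3 * omega
    acc, cnt = 0.0, 0
    for k in range(n):
        leaves, hub = theta[:3], theta[3]
        dtheta = np.empty(4)
        dtheta[:3] = w_leaf + eps * np.sin(hub - leaves)
        dtheta[3] = w_hub + eps * np.sum(np.sin(leaves - hub))
        theta = theta + dt * dtheta  # Euler step
        if k >= transient:
            acc += abs(np.mean(np.exp(1j * theta)))  # |(1/N) sum_j e^{i theta_j}|
            cnt += 1
    return acc / cnt

R_low = simulate_star(0.40)   # on the incoherent branch
R_high = simulate_star(0.70)  # on the synchronized branch
```

Sweeping ε upward and downward with the final state of one run used as the initial condition of the next reproduces the forward and backward branches of the hysteresis loop.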
The data used in training consist of time series from five values of ε, all inside the desynchronization regime: ε = 0.4, 0.425, 0.45, 0.475 and 0.5. For each ε value, we collect time series measurements from all nodes: x_j = sin(θ_j) and y_j = cos(θ_j). The input state vector is u = [x_1, y_1, x_2, y_2, x_3, y_3, x_4, y_4]^T, and we choose the data length to be 1.5 × 10^4, containing five segments of measurements, one from each value of ε. These data, together with the parameter function ε(t), are fed into the machine for determining the output matrix W_out. The parameters of the reservoir are (D_r, p, σ, ρ, α, ε_k, ε_bias) = (1000, 0.2, 1, 1.15, 1, 1, 0), and the regression parameter is λ = 1 × 10^−10.
In the prediction phase, we increase the control parameter ε systematically from 0.3 to 0.8 with the step ∆ε = 0.01 and calculate from the machine output the variation of the synchronization order parameter R, where the output vector u = [x_1, y_1, x_2, y_2, x_3, y_3, x_4, y_4]^T is transformed back to the original state variables [θ_1, θ_2, θ_3, θ_4]^T through θ_i = arctan(y_i/x_i) + π/2 for x_i ≥ 0 and θ_i = arctan(y_i/x_i) + π for x_i < 0. The prediction results are shown in Fig. 5(b) (red open circles). It can be seen that, at about ε_f^rc = 0.56, the value of R changes suddenly from about 0.38 to about 0.8, the feature of a first-order transition.
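The back-transformation from (x_j, y_j) = (sin θ_j, cos θ_j) to the phases can equivalently be done with the two-argument arctangent, which resolves the quadrant automatically; a sketch with hypothetical test phases:

```python
import numpy as np

# Machine outputs are (x_j, y_j) = (sin(theta_j), cos(theta_j)); atan2 recovers
# the phase in one call, without a case analysis on the sign of x_j.
theta_true = np.array([0.3, 1.9, 3.5, 5.8])      # hypothetical test phases
x, y = np.sin(theta_true), np.cos(theta_true)
theta_rec = np.mod(np.arctan2(x, y), 2 * np.pi)  # back in [0, 2*pi)

# The recovered phases then feed the order parameter R.
R = np.abs(np.mean(np.exp(1j * theta_rec)))
```

Note the argument order: with x = sin θ and y = cos θ, the phase is atan2(x, y), not the more familiar atan2(y, x).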
A distinct feature of explosive synchronization transitions is that, when ε is varied in the opposite direction, the variation of R follows a different path, the phenomenon of hysteresis [27][28][29]. To demonstrate this, we decrease ε from 0.8 to 0.3. To numerically observe the hysteresis in this context, we apply small random perturbations of amplitude 5% to the natural frequencies of the leaf nodes [28]. As shown in Fig. 5(b), the forward and backward transition paths coincide for ε > ε_f but diverge from each other for ε ≤ ε_f. In particular, at ε_f, we have R ≈ 0.9 for the backward path, whereas R ≈ 0.38 for the forward path. Along the backward path, as ε decreases from ε_f, the value of R remains large until the critical coupling ε_b ≈ 0.51, where R suddenly decreases from about 0.8 to about 0.4. Since ε_b < ε_f, a hysteresis loop of width ∆ε ≡ ε_f − ε_b emerges in the parameter region ε ∈ (ε_b, ε_f). The dynamical mechanism for the hysteresis loop is the bistability of the synchronization manifold in this region, deemed a necessary condition for generating a first-order phase transition [29].
To predict the backward transition path, we use five values of the coupling parameter in the strong-synchronization regime to obtain the training data: ε = 0.65, 0.625, 0.6, 0.575 and 0.55. The input vector is constructed in the same way as for predicting the forward transition, the hyperparameter values are (D_r, p, σ, ρ) = (1 × 10^3, 0.2, 1, 1.15), and the regression parameter is λ = 1 × 10^−7. We decrease ε systematically from 0.8 to 0.3. The dependence of R on ε predicted by the machine is shown in Fig. 5(b) (open green down-triangles). It can be seen that the machine predictions agree well with the true behavior of the backward transition, where, at the transition point ε_b, the value of R decreases suddenly from about 0.8 to about 0.45.

IV. DISCUSSION
We have articulated and tested a model-free, machine-learning scheme to predict the synchronization transition in systems of coupled oscillators. The machine is trained with time series collected at a small number of values of the coupling (control) parameter, all in the desynchronization regime, together with the value of the control parameter itself supplied through a specially designed input channel. Prediction is achieved by feeding any desired parameter value into the input parameter channel. A properly trained machine is able not only to reproduce, statistically, the nature of the collective dynamics at the training parameter values, but also to predict, quantitatively, how the collective dynamics change with respect to variations in the control parameter. Examples demonstrating the predictive power of our machine-learning scheme include complete synchronization in coupled identical chaotic oscillators and explosive synchronization in coupled nonidentical phase oscillators. For complete synchronization, both the critical coupling for synchronization and the variation in the degree of synchronization about the critical point can be well predicted. For explosive synchronization, our scheme not only predicts the forward and backward critical couplings, but also reproduces the hysteresis loop associated with a first-order transition. Due to the importance of synchronization to the functioning and operation of many natural and man-made systems, our machine-learning method may find broad applications.
Reservoir-computing based prediction of chaotic systems is an extremely active field of research at the boundary between nonlinear dynamics and machine learning [30][31][32][33][34][35][36][37][38][39][40][41][42][43][44][45][46][47][48]. The main contribution of the present work is the development of a reservoir-computing scheme to predict, or anticipate, synchronization in systems of coupled nonlinear oscillators. Our work focuses on predicting the collective dynamics rather than the state evolution, based on the general idea of viewing the predictive power of reservoir computing as an ability to replicate the dynamical "climate" of the target system [33], which is gained through training with time series data. Inspired by recent works on training at multiple parameter values to predict bifurcations [49,50], our work adopts a similar method: training the machine at a small number of control-parameter values in the desynchronization regime instills in it the ability to sense how the "synchronization climate" changes with the control parameter. With the desired (arbitrary) value of the control parameter fed into the input parameter channel, a well-trained reservoir-computing machine is then able to accurately predict the critical transition between desynchronization and synchronization, regardless of the nature of the underlying transition, e.g., second order or first order.
The types of collective dynamics tested in the present work are complete synchronization between a pair of coupled chaotic oscillators, for which the transition is second order in nature, and first-order, explosive synchronization in a small network. Extending our work to other types of collective dynamics in large complex systems, such as partial (cluster) synchronization, chimera-like states, and spiral waves, is worth pursuing. A difficulty with large systems is the requirement of large reservoir networks, so that the complexity of the machine can "overpower" that of the target system. Quantitatively, how the size of the reservoir network should be enlarged to accommodate an increase in the size of the target system, as characterized by, e.g., a scaling law, remains unknown at present. With the use of large reservoir networks come the issues of data requirements and computational overload, as training a large reservoir machine not only requires massive data but also imposes a serious demand on computational resources. One approach to this difficulty is the parallel reservoir-computing scheme [38,41]. However, a recent work revealed that the parallel scheme may fail to sense and predict the phase coherence between a pair of coupled, nonidentical chaotic oscillators [47]. Whether the parallel scheme can be exploited to predict the collective dynamics of a large number of coupled oscillators remains a worthy issue to study.
Effect of nonuniform segment length on prediction performance

In the main text, the lengths of the time series used for training at different values of the control parameter are identical. For example, for the system of a pair of chaotic logistic maps in Sec. III A, the length of the time series is T = 2 × 10^3. If time series of nonuniform length are used for training, the prediction performance is not affected. To demonstrate this, we calculate the machine-predicted synchronization error versus the coupling parameter using training time series of length T = 1000, 2000, and 3000 for ε = 0.2, 0.22, and 0.24, respectively, as shown in Fig. 6 (red discs). Comparing with the results from direct model simulations (black squares), we see that the reservoir machine still predicts well the transition about the critical point. Similar results are obtained when the training data have length T = 3000 for ε = 0.20, 2000 for ε = 0.22, and 1000 for ε = 0.24, as shown by the blue triangles in Fig. 6. The observation is that, insofar as each training time series is sufficiently long (e.g., T ≥ 1000), both the critical point and the associated transition behavior can be well predicted.

Effect of order of training at different control parameter values on prediction performance
In our machine-learning scheme, the training data are generated by combining the time series acquired at different values of the control parameter. We find that the order of training at these parameter values does not affect the prediction performance. To demonstrate this, we use the results in Fig. 2(a) in the main text as a reference and obtain prediction results for altered training orders. Since data are obtained at three values of the control parameter ε (0.2, 0.22, and 0.24), there are six distinct training orders. Figure 7 shows the machine-predicted synchronization error ∆x versus ε for all six cases, together with the true results obtained from direct simulation of the target system. Different orders of training lead to essentially the same result. A heuristic reason is that, in obtaining the output matrix W_out, the regression is conducted over the entire combined time series; for a linear regression, the order in which the time series are combined has no effect on the result.
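This order invariance of the ridge readout can be checked directly: W_out = (S^T S + λI)^{-1} S^T Y depends on the collected state-target pairs only through row-wise sums, so reordering the concatenated segments leaves it unchanged (up to floating-point roundoff). A toy check, with random data standing in for reservoir states from the three training couplings:

```python
import numpy as np

# Three (states, targets) segments, as would be collected at three
# training values of the coupling parameter; random stand-in data.
rng = np.random.default_rng(seed=3)
segA = rng.normal(size=(50, 8)), rng.normal(size=(50, 2))
segB = rng.normal(size=(60, 8)), rng.normal(size=(60, 2))
segC = rng.normal(size=(70, 8)), rng.normal(size=(70, 2))

def ridge(segments, lam=1e-7):
    """Ridge readout over the concatenated segments."""
    S = np.vstack([s for s, _ in segments])
    Y = np.vstack([y for _, y in segments])
    return np.linalg.solve(S.T @ S + lam * np.eye(S.shape[1]), S.T @ Y)

W1 = ridge([segA, segB, segC])
W2 = ridge([segC, segA, segB])  # different training order, same readout
```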

Impact of training control parameter values on prediction performance
For a given number of training parameter values, the closer they are to the transition point, the better the prediction performance. To demonstrate this, we consider the system of coupled chaotic logistic maps in the main text and test the prediction performance by changing the training values of ε to ε_M − 0.04, ε_M − 0.02, and ε_M, where ε_M is below the synchronization transition point so that all three parameter values remain in the desynchronization regime. We define a prediction error ∆x_e about the transition and examine how it varies with ε_M. Figure 8 shows that, as the training parameter values move closer to the critical point, the prediction performance is continuously improved.

Impact of the number of training parameter values on predication performance
Increasing the number m of control-parameter values in the desynchronization regime used for training can in general improve the prediction performance, but only incrementally, insofar as the training is conducted with time series data collected at no fewer than two values of the control parameter. To demonstrate this, we take the model in Fig. 2(a) in the main text and investigate how the value of m impacts the predicted synchronization error. For simplicity, we choose the m training parameter values from the interval [0.20, 0.24] (including the two end points). Figure 9 shows ∆x_e versus m. As the value of m increases from 2 to 4, the prediction error is reduced, but the trend slows down for m > 4, indicating that it is generally not necessary to conduct the training at more than three or four values of the control parameter.

FIG. 1. Predicted synchronization behavior for different values of the bifurcation parameter in coupled chaotic maps. The system consists of a pair of coupled chaotic logistic maps with coupling parameter ε and respective dynamical variables x1 and x2. Top row (a1-d1): the predicted (black dots) and true (red dots) return maps constructed from x1 for ε = 0.2, 0.22, 0.24, and 0.26, respectively; bottom row (a2-d2): the predicted (black) and true (red) relationship between x1 and x2 for the same set of parameter values, where a diagonal line represents complete synchronization.

FIG. 2. Predicting the synchronization transition in coupled chaotic logistic maps. (a) As the coupling is strengthened, the synchronization error ∆x gradually decreases to zero at about ε1 ≈ 0.25. (b) The error ∆x starts to increase from zero at about ε2 ≈ 0.75. The vertical dashed lines denote the coupling-parameter values used in generating the training data. The machine-predicted and true results are represented by red circles and black squares, respectively. The machine correctly predicts the transitions at both ends of the synchronization parameter regime (ε1, ε2).

FIG. 3. Predicting the synchronization transition in coupled chaotic Lorenz oscillators. (a1-a3) Machine-generated time evolution of x1 (black trace) and x2 (red trace) for the three training values of the control parameter in the desynchronization regime: ε = 0.25, 0.3, and 0.35. (a4) Predicted synchronization behavior for ε = 0.45. The machine has never been exposed to data from this parameter value, yet it successfully predicts synchronization. (b) Synchronization error ∆x versus ε. Red circles and black squares represent the machine-predicted and true results, respectively. The three vertical dashed lines indicate the locations of the training parameter values.

FIG. 4. Predicting synchronization transition in coupled chaotic food chains.Shown are the true synchronization errors (black squares) and the reservoir's predicted errors (red circles) for various values of the coupling strength ε.The three vertical blue dashed lines indicate the locations of the training parameter values.

FIG. 5. Predicting explosive synchronization transitions in coupled nonidentical phase oscillators. (a) A star network. (b) Synchronization order parameter R versus the coupling strength ε for the forward and backward transition paths. The true transitions obtained from model simulations are represented by black solid squares (forward transition) and blue solid up-triangles (backward transition). The corresponding machine predictions are displayed as red open circles (forward transition) and green open down-triangles (backward transition). The values of the coupling parameter used to generate the training data for predicting the forward and backward transitions are marked by the black and red arrows, respectively.

FIG. 7. Effect of the order of training at different control-parameter values on prediction performance. The model and parameters are identical to those in Fig. 2(a) in the main text, while the order of training at the three values of the control parameter (ε = 0.2, 0.22, and 0.24, as indicated by the vertical dashed lines) is altered. Shown is the synchronization error ∆x versus ε. A change in the training order has little effect on the performance.

FIG. 8. Impact of training control-parameter values on prediction performance. With the model for Fig. 2(a) in the main text, the values of the control parameter for training (ε = 0.2, 0.22, and 0.24) are shifted by a small amount toward the critical point ε1 = 0.25. Shown is the prediction error ∆x_e versus the parameter value for training, denoted ε_M. As the training parameter values move closer to the critical point (ε_M approaching ε1), the prediction performance is continuously improved.

FIG. 9. Impact of the number of training parameter values on prediction performance. The model and parameter values are identical to those in Fig. 2(a) in the main text, except that the number m of control-parameter values chosen to obtain the training time series now varies. Shown is the prediction error ∆x_e versus m. An increase in m above three improves the prediction performance only slightly, indicating that the reservoir-computing machine is already able to capture the dynamical "climate" of the target system with training conducted at three values of the control parameter.

FIG. 10. Predicting collective dynamical behaviors of the system of two coupled chaotic Lorenz oscillators under the x-coupling scheme. Shown is the time evolution of the reservoir outputs x1 (black curve) and x2 (red curve) for different values of the control parameter: (a) ε = 2.0, (b) ε = 2.5, (c) ε = 3.0, and (d) ε = 3.8. There is no synchronization in the first three cases, but there is in the last case, in agreement with the master-stability-function analysis predicting complete synchronization for ε > εc = 3.7.
FIG. 6. Effect of nonuniform segment length on prediction performance. Shown is the synchronization error ∆x versus the coupling parameter ε for the system of coupled chaotic logistic maps in the main text. The parameter values are the same as those in Fig. 2(a) in the main text, but the time series have nonuniform lengths for the different training parameter values. Red discs: T = 1000, 2000, and 3000 for ε = 0.2, 0.22, and 0.24, respectively; blue triangles: T = 3000, 2000, and 1000 for ε = 0.2, 0.22, and 0.24, respectively. The black squares denote the corresponding results from the target system. The vertical dashed lines denote the values of the control parameter for training.