Discover governing differential equations from evolving systems

Discovering the governing equations of evolving systems from available observations is essential and challenging. In this paper, we consider a new scenario: discovering governing equations from streaming data. Current methods struggle to discover governing differential equations with considering measurements as a whole, leading to failure to handle this task. We propose an online modeling method capable of handling samples one by one sequentially by modeling streaming data instead of processing the entire dataset. The proposed method performs well in discovering ordinary differential equations (ODEs) and partial differential equations (PDEs) from streaming data. Evolving systems are changing over time, which invariably changes with system status. Thus, finding the exact change points is critical. The measurement generated from a changed system is distributed dissimilarly to before; hence, the difference can be identified by the proposed method. Our proposal is competitive in identifying the change points and discovering governing differential equations in three hybrid systems and two switching linear systems.


I. INTRODUCTION
Research on the hidden information in measurements that performs a transformative impact on the discovery of dynamical systems has attracted the attention of scientific researchers [1][2][3][4]. Many canonical dynamics are rooted in conservation laws and phenomenological behaviors in physical engineering and biological science [5]. Enabled by the increasing maturity of sensor technology, data-driven methods have promoted various innovations in describing time series generated from experiments [4]. Due to the complexity of dynamical systems, revealing the underlying governing equations representing the physical laws from time-series data that gives a general description of the spatiotemporal activities is a tremendous challenge [3,[6][7][8][9][10][11][12].
In a series of developments, modeling methods for complex systems include empirical dynamic modeling [13], equation-free modeling [14,15], and modeling emergent behavior [16]. Discovery patterns contain normal form identification [17], nonlinear Laplacian spectral analysis [18], dynamic automatic inference [19], and nonlinear regression [20]. Overall, popular methods for datadriven discovery of complex systems are mainly based on sparse identification [21,22] and deep neural networks * kwu@xidian.edu.cn [23,24], such as sparse identification of nonlinear dynamics (SINDy) [25] and PDE-Net [26]. Existing methods have provided an encouraging performance on systems motivated by unchanging rules or architectures, where novelty and variability are seen as disruptive [27].
However, although capable of effectively handling spatiotemporal sequences, the above algorithms are batch learning methodologies whose common problem is ephemeral fitting [13,28], thereby leading to the abortive modeling for streaming data consecutively generated like water flow. The overwhelming majority of complex systems are constantly running and generating new observations one by one over time (such as financial markets, social networks, and neurological connectivity, etc.) [29][30][31][32], resulting in failing to update gradients in the SINDy. In real-time systems, we must adapt to different stream parts and produce an immediate output before seeing the next input.
Moreover, several SINDy-based methods have been demonstrated to perform online tasks. Autoencoder SINDy is trained offline and queried online to compute the time evolution of the full order system in different conditions [33]. [34] claims that the SINDy-based algorithm holds feasibility in online applications due to its few computational time for all cases, owing to a little tuning of parameters and Fast Fourier Transform. However, these methods cannot handle streaming data and discover the change points in evolving systems. The most related work is the method in [35], which processes so-lution snapshots that arrive sequentially. It is a small batch method. Its performance and training iterations depend on the length of snapshots, which will seriously affect the evaluation of change points. Thus, [35] has difficulty finding the change point hidden in solution snapshots on account of the nature of the mini-batch method, which limits the application of this method to switching systems.
To the concrete, we regard the measurements reflecting the system status as invariably changing streaming data. In this article, we present a novel method, called Online-Sparse Identification of Nonlinear Dynamics (O-SINDy), to develop a parsimonious model that most accurately represents the real-time streaming data. Our proposal handles samples sequentially by modeling subsampled spatiotemporal sequences as streaming measurements instead of processing the entire dataset directly. Moreover, O-SINDy can bypass batch storage and intractable cases of combinatorial brute-force search across all possible candidate terms.
Mathematically, we conduct experiments with diverse spatiotemporal measurements generated from dynamical systems to verify the excellent performance of O-SINDy. By recovering extensively representative physical system expressions solely from streaming data, the method is demonstrated to work on various canonical instances, including the nonlinear Lorenz system, Burgers' equation, reaction-diffusion equation, etc. Experimental results show that our framework provides an online technique for real-time online analysis of complex systems. We suggest that O-SINDy, which overcomes the limitations of batch learning methodologies, is competent for deducing the governing differential equations if sequential measurements of complex dynamic systems are available.
With the ability to handle streaming data, O-SINDy can ideally cope with the parameter estimation of timevarying nonlinear dynamics and is general enough to detect multiple types of evolutionary patterns. O-SINDy outperforms the state-of-the-art methods in discovering governing differential equations in the evolving twodimensional damped harmonic oscillators, Susceptible-Infected-Recovered disease model with time-varying transmission rates, and two switching linear systems.

II. METHODS
With large amounts of data arriving as streams, any offline machine learning algorithm that attempts to store the complete dataset for analysis will fail due to running out of memory. The rise of streaming data raises the technical challenge of a real-time system, which must do any processing or learning encouraged by each record in arriving order. Let u denote the state vector of a realtime system at time t, and u t represent its partial differentiation in time domain. The system receives streaming inputs: · · · , u t−3 , u t−2 , u t−1 , u t , u t+1 , u t+2 , u t+3 , · · · For instance, vector u represents the position information of the particle shown in Fig. 1(a). Every step, the particle moves along the trajectory; the data will be captured by sensors and sent to the system that needs to analyze or respond in time. Instead of storing the entire stream, continuous online learning faces one input u t at a time.
The rigorous model of a complex system is always a set of differential equations with unknown physics parameters [36]. Without losing generality, we consider a general parameterized physical system of the following form: where the terms with subscript x represent partial differentiation of u in space domain, "1" denotes the constant, and N (•) is an unknown combination of the nonlinear functions, partial derivatives, constant, and additional terms of u(x, t). Our goal is to select the correct terms that are most relevant to dynamic information on realtime circumstance. In view of measurements of all considered state variables U , the right-hand side of Eq. (1) can be expressed by multiplying the library function matrix Θ (U ) with the coefficient matrix Ξ as follows: where Columns of Θ (U ), θ s (U ) (s = 1, 2, . . . , p), correspond to p specific candidate terms for the governing equation, as shown in Fig. 1(b).
O-SINDy is divided into two steps: (i) build complete libraries of candidate terms; (ii) update the structure of governing equations via the FTRL-Proximal style methodology. Each procedure is described as follows.

A. Build a candidate library
We start by collecting the spatiotemporal series data at m time points and n spatial locations of the state variables. For each state variable, the captured state measurements are represented as a single column state vector u ∈ C nm×1 . Based on all the observables, e.g., variables x, y, and z in Lorenz systems, a series of functional terms associated with these quantities can be calculated and then reshaped into a single column as well, such as x 2 , xy, xyz, x 2 y, etc. Likewise, partial differential terms should be considered in the candidate library if PDEs govern the dynamics. Furthermore, "1" denotes the constant term that possibly appears in equations. The compositive function library Θ (U ) ∈ C nm×p is a matrix that contains p designed functional terms. Note that the computed time derivative u t ∈ C nm×1 is also a single-column vector presented on the left-hand side of Eq. (2). The constant term in the library Θ sufficiently considers the bias term in the governing differential equation so that  the model can be regarded as an unbiased representation of dynamics. For O-SINDy, an important prerequisite for revealing the correct governing differential equation is that the candidate function library contains all the members which constitute the concise dynamics expression, so as to ensure that the exact sparse coefficient Ξ is computed under iterations.

B. Optimization
Formulating a hypothesis that the governing differential equations are evolving at any moment, we model each example as a dynamic process to simulate the arrival of streaming data. Given a set of time-series state measurements at a fixed number of spatial locations in x, the goal is to construct the form of N (•) online. To fix this issue, the core of our innovation is utilizing real-time streaming data to denote the loss function. Considering a coefficient vector in Ξ, ξ j (j = 1, 2, . . . , d), which corresponds to the specific state variable u, the fitness function is designed as follows: where λ 1 ≥ 0 denotes the regularization coefficient, n is the number of positions, m denotes the length of the time series, ξ j = [ξ j1 , ξ j2 , . . . , ξ jp ] T , and L i (ξ j ) is defined as follows: whereU i is the time derivative of the ith state observation, Θ i is the ith row of the common library Θ that represents all candidate function values for the ith data. The sparsity constraints mean that the coefficient vector ξ j is sparse with only a few non-zero entries, each representing a subsistent item in the function library. Subsequently, we exploit the follow-the-regularized-leader (FTRL)-Proximal [37][38][39] style methodology to optimize the outcome by considering solely one instance each time.
Leveraging the fact that each instance is considered individually, we rewrite the loss function and define a single loss term (see Eq. (5)). For example, if the first state measurement u 1 is available, the loss is defined as: the second instance of u 2 arrives, the loss is defined as: In this way, the computer only needs to store information about a single example, thus relieving memory stress. On the ith sample, the gradient is calculated as follows: Normally, the online gradient descent algorithm can be used to update the coefficient vector ξ i j after the arrival of the ith data by using: However, this method has been proven to lack general applicability [38]. Correspondingly, the FTRL-proximal style approach is introduced to solve the online problem. We use the following equation to update the coefficient vector ξ i j : where λ 1 and λ 2 both denote the regularization coefficient, which is the positive constant, while σ k is on the ith learning rate η i aspect, defined by: Given the gradient ∇L i (ξ j ) a shorthand g i , we set g 1:i = i k=1 g k . Then the update in Eq. (8) can be rewritten as follows: where the last term 0.5 i k=1 σ k ξ k j 2 2 is a constant with respect to ξ j and negligibly impacts the update process.
and we ignore the last term and rewrite Eq. (10) as: (11) Let ξ js and Z i,s (s = 1, 2, . . . , p) represent the sth element of the vector ξ j and Z i , respectively. Equation (9) is generalized as Aiming to simplify the expression, we suppose w * to be the optimal solution of ξ i js and Φ ∈ ∇w * to be its gradient. Then, Eq. (13) is satisfied. Equation (14) shows the subdifferential of w = sgn(w).
Moreover, we set a remarkable learning rate Compute ξ i jk using Eq. (20); 10 for each element k in I do where α and β are both positive constants, g k,s denotes the sth element of the gradient g k and η is is the learning rate of g k,s . Thus, Eq. (9) can be rewritten as follows: The final form of the closed solution is obtained and expressed as where α and β are both positive constants, g k,s denotes the sth element of the gradient g k . Notably, ξ js is always updated according to the current input instance, thereby guiding the detection of change points if there exists any variation in the evolving system. Additionally, the mechanism frees the computer from storing the whole large-scale data. It should be highlighted that the method suggested above is entirely different from the batch learning methods, which update gradients depending on all available measurements. O-SINDy [41] takes advantage of a single available instance and updates gradients based on the current loss L i (ξ j ) (i = 1, 2, . . . , mn) in each iteration, thereby being easily extended to handle evolving systems. Note that the optimum selection of hyper-parameters in each case is determined by grid search (10 i , i = −8, −7, . . . , 1).

III. RESULTS
A. Discovering Single System from Streaming Data

Example -Discovering Chaotic Lorenz System
The online algorithmic procedure for identifying correct governing differential equations of the chaotic Lorenz system from time series is demonstrated in Fig. 1. Our proposal combines data collection, a library of potential candidate functions, and the online reconstruction method. The Lorenz system is highly nonlinear described by the following equations, according to which we construct a canonical dynamic instance for model discovery.
To indicate the nonlinear combination of Lorenz variables, common parameters are σ = 10, β=8/3, and ρ=28, with the initial condition of (x 0 , y 0 , z 0 ) T = (−8, 7, 27) T . Ref. [3] has employed the SINDy autoencoder to discover a parsimonious model with only seven active terms that seems to mismatch the original Lorenz system. Interestingly, the dynamics of the resulting model exhibit an attractor with a 2-lobe structure, which is qualitatively consistent with the true Lorentz attractor. Then the sparsity pattern can be rewritten in the standard form by choosing an appropriate variable transformation. Considering the chaotic nature, O-SINDy show the ability to capture the concise model on two Lorenz systems. Results in Fig. 1(d) and Table I imply that O-SINDy can accurately reproduce the attractor dynamics from chaotic trajectory measurements. In both cases, we conducted 10000 timesteps with time intervals dt=0.01.
Given time intervals and initial conditions, we simulate the Lorenz system within a certain time horizon. Specifically, state measurements are collected under the constraints of the transformed expressions, including historical data of the state U and its time derivative U t . The combination of spatial and temporal modes results in the trajectory of dynamics shown in Fig. 1(a). A library of potential candidate functions, Θ (U ), is constructed to find the least terms required to satisfy U t = Θ (U ) Ξ. Candidate functions can be polynomial functions, trigonometric functions, exponential functions, logarithmic functions, partial derivatives, constant items, and any additional terms about U . The proposed online technique can identify the coefficient Ξ = [ξ 1 ; ξ 2 ; . . . ; ξ d ] to determine active terms of the dynamics. To elude the obstacle of large-scale data storage and batch gradient update, we design a cumulative loss function to reflect the mode of streaming data by solely TABLE I. Summary of online identification results for a wide range of canonical models. O-SINDy is applied to reconstruct the correct model structure in each example. Settings for the spatial and temporal sampling of the numerical simulation data is given, along with the relative L2 error in recovering the parameters of these dynamical models.
x, y ∈ [−10, 10] , n = 512 analyzing the information of one sample at a time. The pre-established sparse scenario guarantees that the resulting solution obtained by the online method can effectively balance model complexity with description ability to avoid overfitting, thereby promoting its interpretability and extensibility. The prompt update of components and coefficients in the governing equation is one implementation of data arrival based synchronization. Consequently, we can subtly detect moderate or drastic variations in the system through changes in the resulting model.

Performance on Discovery for Canonical Models
We perform O-SINDy on ten canonical models in mathematical physics and engineering sciences, and the results are shown in Table I. These physical systems contain dynamics with periodic to chaotic behavior, ranging from linear to strongly nonlinear systems. The settings for spatial and temporal sampling and the coefficient errors are detailed in Table I. Encouragingly, O-SINDy can recover every physical system, even those with significant spatial subsampling. The notable results highlight the broad applicability of the method and the success of this online technique in discovering governing differen-tial equations. Remarkably, the capacity for capturing nontrivial active terms, in particular, has essential explanatory implications for model discovery.

B. Discovering Hybrid Systems
We first consider an evolving two-dimensional damped harmonic oscillator that changes from cubic to linear.
A particle in simple harmonic vibration is known as a harmonic oscillator, whose motion is the simplest ideal vibration model. Fig. 2 illustrates the dynamic data and the detection of change points. Firstly, we create the solution to Eq. (25), U 1 , with 2,500 timesteps and the initial data [2,0]. Then, we utilize the last instance in U 1 as the initial data and create the solution to Eq. timesteps as well, recorded as U 2 . Specifically, the change point arises when the system changes, and then the original distribution of generated data varies. Therefore, the corresponding loss value gradually decreases and stabilizes unless the change point appears. In this context, the coefficients in the former governing equation and the change location are simultaneously identified according to the sudden increase of loss value.
With an augmented nonlinear library including polynomials up to the fifth order, the change point is spotted in evolving circumstances. The loss curve tends to stabilize faster after resetting the training coefficient, thereby accelerating the convergence of O-SINDy because the former result may mislead the training process.
The transition from the KdV to the Kuramoto-Sivashinsky equation is simulated as an illustrative example that exhibits qualitatively different dynamics as the system evolves. Summary of O-SINDy and the stateof-the-art methods in SINDy for this identification in Table II demonstrates that our approach works well. Most data-driven methods regard batch measurements as a whole and update gradients integrally to obtain an ephemeral fitting. Owing to the ignorance of variations, existing methodologies are biased to strike a compromise solution, so that strenuous to identify the changes from consecutively generated data. Quite the contrary, O-SINDy succeeds in simultaneously estimating forms and identifying changes of the governing equations.
Moreover, we investigate the Susceptible-Infected-Recovered (SIR) disease model with time-varying transmission rates, widely studied in epidemiology [45,46]. The following equation is a mathematical description of the SIR model.Ṡ It is a time-dependent hybrid dynamical system, where v = d = 0.0027, N = 1000 is the total population, γ = 0.2, b = 0.8, andβ = 9.336 is a base transmission rate. We simulate the model for three years within the initial condition at [S 0 , I 0 , R 0 ] = [12,13,975], recording at a daily interval. A random perturbation is added to the start of each season with the same probability.
To illustrate the dynamics of the SIR model, we display the recovered population by subtracting 920 and scaling the loss in O-SINDy by the maximum value. As shown in Fig. 3, O-SINDy captures the transition between states by the dramatically increasing loss. In this example, the perturbation is able to be identified utilizing solely the information of the recovered population, which is ignored in Hybrid-Sparse Identification of Nonlinear Dynamics (Hybrid-SINDy) [46]. It is an offline batch learning method and outputs the governing equation of a single day by repeatedly clustering the whole time-series data during the model selection step. The identified governing equations of four random days in the four seasons are summarized in Table III. The results show that O-SINDy provides more precise reconstruction accuracy for a single day. Specifically, instances of the same coefficients share the hyper-parameters in O-SINDy. At the same time, Hybrid-SINDy selects a model of the highest frequency from results recovered by SINDy with different hyper-parameters for each certain instance. Moreover, the effect of cluster size is particularly vital to solutions in Hybrid-SINDy.

C. Discovering Switching Linear Systems
We also test O-SINDy on two switching linear systems, including synthetic and real-world datasets. Define x t ∈ R N as the sampled trajectory of a dynamical system, N is the feature dimension. Time-varying autoregressive model with low-rank tensors (TVART) [47] holds the assumption, x t+1 = A t x t , where A t is constant within a time window but varies in the switching system. On the contrary, O-SINDy discards the time window and admits that A t may change at every time point.
Regarding the synthetic example, we randomly generate two system matrices A 1 N ×N and A 2 N ×N to encourage the switching system. The dynamics follow A 1 N ×N and switch to A 2 N ×N after half of the time series. Results illustrated in Fig. 4 show that O-SINDy is competitive with the state-of-the-art technique TVART.
The posture dynamics of the worm Caenorhabditis elegans [48,49] is studied as the real-world example for  clustering via adaptive, locally linear models [50]. This time-series data details the escape behavior of the worm in response to a heat stimulus, e.g., the transition from forward, turn, to backward crawling. Fig. 5 characterizes the true dynamics and the discovered modes corresponding to its three typical dynamical regimes. It should be highlighted that the bifurcation of the worm dynamics is interpretable on a smaller time scale with the application of O-SINDy.

IV. CONCLUSION AND DISCUSSION
We have proposed an online modeling method, O-SINDy, capable of finding governing differential equations of evolving systems from streaming data. We demonstrate that O-SINDy works on various canonical instances. Results in Table I indicate that the inference of governing equation is accurate when utilizing O-SINDy on time series from numerical simulations. Additionally, we show the excellent performance of simultaneously estimating forms and identifying changes in time-varying dynamic systems.
More sampling points or longer time series correspond to the preferable identification of the internal control structure, while the KdV equation is an exception (see Table IV). One potential viewpoint is that approximating the soliton solution introduces great uncertainty. Nevertheless, it remains an open question to estimate the required time-series length for distilling the accurate underlying governing differential equations. The implementation of existing methodologies fundamentally depends on sufficiently large datasets, even though the dynamics are only a parsimonious representation. O-SINDy, on the contrary, is demonstrated to recover the true governing equations by iteratively reusing the finite available time series. Note that the computation of time derivatives results in the main error, which is magnified by the numerical roundoff. Thus, correctly estimating numerical derivatives is the most critical and challenging task for O-SINDy, especially in a noisy context.
O-SINDy is a viable tool capable of tackling stream-ing data from evolving systems for accomplishing the assignment of model discovery. The integration opens up a novel, interesting research insight for real-time modeling, online analysis, and control techniques of complex dynamic systems. For massive-scale datasets, sparse sampling can be used to reduce the data size. For identification, it should be noted that we only require a small number of spatial points and their neighbors, whose responsibility is estimating the partial derivative terms in the candidate library. That is, local information around each measurement is necessarily wanted. Distinction allowing for application to subsampled spatiotemporal sequences, to an extent, is critically important due to experimentally and computationally prohibitive implementation of collecting full-state measurements [4]. In terms of derivation, second-order finite differences [51] are devoted to the clean data from numerical simulations, while the easiest to implement and most reliable method for the noisy data is a polynomial interpolation [52]. In principle, we abandon those points close to the boundaries due to the absence of numerical derivatives. If we know any prior knowledge about the governing equation, for instance, if one of the potential terms is determined to be nonzero in advance, we can apply the additional information to the initialization phase. Moreover, truncation of the solution is a tool to maintain its sparsity, and different threshold values may provide distinguishing sparsity levels of the final output. As for the artificial noise added to the state measurements of governing equations, we use white noise. The noisy data is directly trained in experiments. In terms of the reaction-diffusion equation, exceptionally, we use the singular value decomposition (SVD) [53] to denoise some noisy spatiotemporal series, and the result is a low-dimensional approximation of datasets.
Appendix B: Canonical Models

Reaction Diffusion
The dynamic of reaction-diffusion systems is defined by the following equations, which provide great expressiveness and freedom although they are deceptively simple.
Given the vast number of data points, we subsample 5000 discretized spatial points with 30 time points each.

Hopf Norm Form
In this example, the application of O-SINDy is extended to a parameterized normal form of Poincaré-Andronov-Hopf bifurcation.
We explored a normal form of Hopf bifurcation on two-dimensional ordinary differential equations systems. Given w = 1 and A = 1, we collected data from the noisefree system for eight various values of the parameter µ.

Fokker-Planck Equation
Fokker-Planck equation that reflects the connection between the diffusion equation and Brownian motion has been taken into account. Considering the simplest form of the diffusion equation where the diffusion coefficient is 0.5 and the drift term is zero, the corresponding Fokker-Planck equation is u t = 0.5u xx . The Brownian motion model is usually simplified as a random walk in physics, such as the random movement of molecules in liquids and gases. To simulate the process, we add a distributed random variable with variance dt = 0.01 to the time series and sample the movement of the random walker.

Burgers' Equation
Here, we consider a fundamental partial differential equation in the fields of fluid mechanics, nonlinear acoustics, and aerodynamics. As a dissipative system in onedimensional space, the general form of Burgers' equation is described by a diffusion coefficient c (also known as kinematic viscosity). The speed of the fluid at the indicated spatial coordinate x and temporal coordinate t in a thin ideal pipe can be expressed as the following equation: where the diffusion term u xx contributes the advective form of the Burgers' equation.

Korteweg-de Vries Equation
The Korteweg-de Vries (KdV) equation is a nonlinear, partial differential equation for a function u of two real variables, x and t, which refer to space and time, respectively. Numerically, its solutions seem to be decomposed into a collection of well-separated solitary waves that are almost unaffected in shape by passing through each other. The soliton solution is given by where c stands for the phase speed, a is an arbitrary constant and sech (x) = 1/cosh (x) is the hyperbolic secant function. This equation describes a right-moving soliton that propagates with a speed proportional to the amplitude.
The correct equation cannot be distinguished from a single propagating soliton solution, due to the fact that some studied expressions may be solutions to more than one PDE [4]. For example, both the one-way wave equation u t + cu x = 0 and the KdV equation admit the same traveling wave solution of the form u = f (x − ct) if the initial data was a hyperbolic secant squared. Hence, we constructed time-series data for more than a single initial amplitude to rectify the ambiguity in selecting the governing PDE, thereby enabling the unique determination.
As the circumstance under a single propagating soliton solution, we respectively constructed two solutions without noise having the traveling speed c equal to 5 or 1 on grids with 6 timesteps and 256 spatial points. Ultimately, corresponding advection equations with different c were identified using a single traveling wave. We believe that two waves with different amplitudes and speeds may solve the KdV equation.

Kuramoto-Sivashinsky Equation
The Kuramoto-Sivashinsky (KS) equation is a fourthorder nonlinear PDE derived by Yoshiki Kuramoto and Gregory Sivashinsky in the late 1970s. Specifically, it provides two dissipative terms u xxx and u xxxx based on Burgers' equation, where the fourth-order diffusion term accomplishes the stabilizing regularization rather than the second-order diffusion term u xx , which leads to long-wavelength instabilities. By leveraging a spectral method, the numerical solution to the KS equation was created with 101 timesteps and 1024 spatial points.