Data-driven discovery and extrapolation of parameterized pattern-forming dynamics

Pattern-forming systems can exhibit a diverse array of complex behaviors as external parameters are varied, enabling a variety of useful functions in biological and engineered systems. First-principles derivations of the underlying transitions can be characterized using bifurcation theory on model systems whose governing equations are known. In contrast, data-driven methods for more complicated and realistic systems whose governing evolution dynamics are unknown have only recently been developed. Here we develop a data-driven approach, the {\em sparse identification for nonlinear dynamics with control parameters} (SINDyCP), to discover dynamics for systems with adjustable control parameters, such as an external driving strength. We demonstrate the method on systems of varying complexity, ranging from discrete maps to systems of partial differential equations. To mitigate the impact of measurement noise, we also develop a weak formulation of SINDyCP and assess its performance on noisy data. We demonstrate applications including the discovery of universal pattern-formation equations, and their bifurcation dependencies, directly from data accessible from experiments and the extrapolation of predictions beyond the weakly nonlinear regime near the onset of an instability.

Data-driven approaches to system identification are undergoing a revolution, spurred by the increasing availability of computational resources, data, and the development of novel and reliable machine learning algorithms [1][2][3].The sparse identification of nonlinear dynamics (SINDy) is a particularly simple and flexible mathematical approach that leverages efficient sparse optimization algorithms in the automated discovery of complex system dynamics and governing equations [4].In this Letter, we leverage the SINDy model discovery framework to understand parametric dependencies and underlying bifurcations in pattern-forming systems.Specifically, we develop the SINDY with control parameters (SINDyCP) to discover such parameterized dynamics.
It has been 30 years since Cross and Hohenberg's seminal and authoritative review consolidating an exceptionally large body of work on pattern formation across a broad range of physical systems [5].Universal equations determined by normal forms of canonical bifurcations [6], such as the complex Ginzburg-Landau equation [7], govern the formation of patterns near the onset of instabilities across scientific disciplines.Such equations continue to reveal insights into complex systems, including in the study of, for example, synchronization, biophysics, active matter, and quantum dynamics [8,9].
Despite the success of pattern-formation theory in modeling complex dynamics, ongoing challenges remain in applying such model equations more broadly.Firstprinciple derivations and the computation of normalform parameters in terms of physical driving parameters are tedious, costly, and error prone.Furthermore, the resulting weakly nonlinear models are only theoretically justified near the onset of instability, while inter-esting and important pattern-forming processes often occur far from the instability threshold.Recent advances in data-driven system identification are opening new avenues of research to address these challenges, including a paradigm for modeling strongly nonlinear regimes beyond the asymptotic approximations reviewed by Cross and Hohenberg [5].
The SINDy model discovery framework is particularly well suited to the modern analysis of bifurcations and normal forms, as it generates interpretable models that have as few terms as possible, balancing model complexity and descriptive capability.A variety of extensions of the SINDy approach have been developed since its introduction.For example, SINDYc enables the discovery of systems subject to external control signals [10][11][12], while PDEFind [13,14] enables the discovery of spatiotemporal dynamics characterized by partial differential equations (PDEs).Data-driven approaches can also learn to disambiguate between parametric dependency and governing equations and discover bifurcations [4,15,16].Model pattern-formation equations typically encode the effects of external drive through a number of driving parameters, which characterize the bifurcation leading to the onset of instability.Several recent works establish system identification on pattern-forming systems ranging from closure models for fluid turbulence [17][18][19][20] to biochemical reactions and active matter systems [21][22][23].These approaches show promise, but crucially, they have not demonstrated the ability to extrapolate by detecting pattern-forming instabilities that may develop when driving parameters differ from those used in the training data.
FIG. 1. Schematic of the SINDyCP approach.Data collected from sample trajectories collected under various driving parameters are processed to construct a matrix of time derivatives, a feature library Θ feat of possible governing terms, and a parameter library Θpar of parametric dependencies.Sparse regression is applied to the library coefficients ξ to identify a parameterized governing equation.
PySINDy repository [24,25], enabling other powerful methods to be used in conjunction (see Supplemental Material Sec.S1A [30]).In particular, we develop and assess a weak formulation [26][27][28][29] of SINDyCP, which shows excellent performance on noisy data.We demonstrate that the method can be easily and effectively employed to discover accurate parameterized models from the kind of data available in typical pattern-formation experiments and that these parameterized models enable extrapolation beyond the conditions under which they were developed.
Building the library.Figure 1 illustrates the SINDyCP approach applied to the spatiotemporal evolution of four trajectories of the complex Ginzburg-Landau equation which is described by a complex dependent variable A(x, t) in two spatial dimensions x = (x, y).Ginzburg-Landau exhibits a stunning variety of patterns, depending on the bifurcation parameters b and c.We generate four trajectories with parameter values (b, c) = (2.0,1.0), (2.0, 0.75), (0.5, 0.5), and (1.0, 0.75), which exhibit differing dynamical phases, corresponding to amplitude turbulence, phase turbulence, stable waves, and frozen spiral glasses, respectively [7].
Our goal is to discover the partial differential equation for the real and imaginary components A = X + iY parameterized by b and c from time series data.
As with most SINDy algorithms, we first form a matrix of the input data X, whose columns correspond to the dependent variables and whose rows correspond to the sample measurements of the dependent variables.In the case of Fig. 1, for example, X consists of two columns corresponding to the real and imaginary parts of A and 4N x N y N t rows, where N x , N y , and N t are the number of sample points in the corresponding spatiotemporal dimensions; again, there are four trajectories.We then determine the temporal derivative Ẋ for each sample point, either through numerical differentiation or through direct measurements.
In basic SINDy, we define a matrix of library terms Θ = Θ(X) depending on the input data, which includes all possible terms that may be present in the differential equation that describes the temporal derivatives.These terms may be built from polynomial combinations of the dependent variables and their spatial derivatives, for example, although more general libraries are possible.In the SINDYc approach, we augment the library dependence with an external control signal U, i.e., Θ = Θ(X, U).The library terms are typically determined by appending the control variables to the dependent variables and again forming polynomials and derivatives.In the case in Fig. 1, we can treat the parameters as external control signals, U = (b, c) and apply SINDYc, but the traditional implementation of this approach will fail for PDEs, as we show.
SINDYc aims to find a sparse linear combination of the library terms determined by the vector of coefficients ξ which minimizes the fit error where sparsity promoting regularizing term λ |ξ| 0 penalizes nonzero coefficients via the L0 norm.Crucially, all SINDy methods employ sparse regression (with appropriate regularization) to determine a sparse set of nonzero coefficients ξ * .Such sparsity is expected in physicallyrelevant dynamics and produces parsimonious and interpretable models.Here, we employ the sequentially thresholded least-squares algorithm [4], which iteratively eliminates library terms with coefficients that fall below a threshold hyperparameter.One challenge that arises when applying the traditional SINDYc to control parameters in PDEs with existing implementations such as PySINDy is that the matrix of library terms Θ is traditionally formed by computing all polynomial combinations of spatial derivatives of the de-pendent and control variables.However, since the control parameters are spatially constant, the spatial derivatives will vanish identically, leading to a singular matrix Θ.Such degeneracies lead to poor numerical results on data with control parameters when combining PDEFind and SINDYc without modification.To overcome this challenge, we propose constructing a more general library through products of a feature library Θ feat (X) and a parameter library Θ par (U), as where the product ⊗ here is defined to give the matrix consisting of all combinations of products of columns between the libraries, i.e., the ith row of A ⊗ B contains all the products of the form A ij B ik where j and k span the columns of A and B, respectively.By distinguishing the feature and parameter library dependencies with this SINDyCP approach, we can construct much more targeted and well-conditioned libraries.
Using a feature library consisting of spatial derivatives up to third order and polynomials up to third order along with a linear parameter library, the SINDyCP approach easily discovers Eq. ( 1) in Cartesian coordinates, as shown in Fig. 1.Additional demonstrations of SINDyCP for maps and ordinary differential equations (ODEs) are available in the Supplemental Material Sec.S1B, and further details of the complex Ginzburg-Landau equation (CGLE) integration, along with an animation illustrating the temporal evolution of the sample trajectories, are available in the Supplemental Material Sec.S2 [30].
Amplitude dynamics beyond weakly nonlinear theory.To illustrate the application of SINDyCP to pattern formation, we implement a numerical demonstration with the Belousov-Zhabotinksy chemical reaction system.We numerically integrate the Oregonator model [31], which describes the evolution of oscillating chemical concentrations C X , C Y , and C Z for given supplied concentrations C A , C B , and C H , and stoichiometric coefficient ν, which depends on the experimental setup.We vary the concentration of C B and define a control parameter µ where C c B is the critical value where the Hopf bifurcation occurs.Section S3A of the Supplemental Material details the Oregonator model, along with a data-driven approach to detect and characterize the bifurcation point when the model is unknown [30].Here, we aim to develop a data-driven extension The fit correctly exhibits a canard explosion far from the onset of the instability (insets show CX above and below the explosion, and an animation of the evolution is available in the Supplemental Material [30]).
of Eq. ( 1) that incorporates nonlinear parameter dependence describing the dynamics far from the bifurcation.
We expect the dynamics near a Hopf point to be constrained to the two-dimensional center manifold, which describes the evolution of the complex amplitude dynamics governed by Eq. ( 1).When the governing equations are known, the weakly nonlinear theory develops a perturbative expansion near the Hopf point to express the complex amplitude A in terms of the state space given by . This theory follows from a near identity transformation of the governing equations up to cubic order, as detailed in the Supplemental Material Sec.S3B [30].
To demonstrate our approach, we develop a datadriven construction of the amplitude A and its dynamical equations when the governing equations for the state space are unknown that is effective even far from the Hopf point.This approach follows from a SINDyCP fit on time series data for any two independent measurements of the state space (motivated by previous results [31], we choose to use C X and C Z here).Because we will employ the normal-form transformation below, we employ a cubic feature library with polynomial terms up to third-order and second-order spatial derivatives and a parameter library with polynomial terms up to second order for the control parameter µ 1/2 .Furthermore, we employ implicit SINDy [32] by including first-order temporal derivatives in the feature library.Inverting the resulting implicit equations results in governing equations that are cubic in the state variables with coefficients that are rational functions of the control parameters, enabling the discovery of nonlinear corrections to parameter dependencies in the weakly nonlinear theory.Finally, by eliminating nonresonant coefficients using the normal-form transformation for cubic equations, we discover amplitude dynamics of the form in Eq. ( 1), but with normal-form coefficients c(µ) and b(µ) with rational dependence on the control parameter.Additional details on the data-driven amplitude construction are available in the Supplemental Material Sec.S3C [30].
Figure 2(a) shows the R 2 score of the model on test trajectories corresponding to the parameter values that the model was trained on (a value of R 2 = 1 means that the fit perfectly predicts the temporal derivatives of the data).For reference, we also perform an unparameterized SINDy fit [purple dots in Fig. 2(a)] on the training trajectory with the smallest µ value, which produces a model very close to the weakly nonlinear theory.The SINDyCP fit performs significantly better than the unparameterized fit, with 1−R 2 nearly an order of magnitude smaller for the larger µ values.
The normal-form parameters b(µ) and c(µ) agree with the analytic values derived [33] from the original model as µ → 0, but here we are able to discover them directly from data without any knowledge of the governing equations.Furthermore, as shown in Fig. 2(b), the variation of the parameters becomes extreme for µ 1/2 > 0.35, which we were able to discover via the implicit version of SINDy.In fact, as shown in Fig. 2(c), the Oregonator model exhibits a canard explosion (in which the limit cycle amplitude expands abruptly due to highly nonlinear effects) [31] around µ 1/2 ≈ 0.39, where the weakly nonlinear theory breaks down.The SINDyCP model reflects this breakdown and enables the development new models that account for it.
Weak formulation.The weak formulation utilizes integration against compactly supported "test functions" to defined the SINDy problem.The weak method shows excellent performance for noisy data, owing to its ability to minimize the need for computing numerical derivatives.Rather than forming samples (rows in Fig. 1) from spatiotemporal points for each trajectory, the weak method constructs the system rows by projecting the data onto weak samples such as where Ω k is a compactly supported sample domain, ϕ is the test function, and X (ν) i denotes the νth partial derivative the ith dependent variable.By moving derivatives off of the data and onto the test functions via integration by parts, the weak method significantly reduces the impact of measurement noise on the SINDy library and improves the fit results [34].
To maximize the performance of the weak method, we have optimized and fully vectorized numerical integration for the weak formulation in PySINDy, which can be easily combined with the SINDyCP library class.Products of weak features do not generally form reasonable samples for a SINDy model, since multiplication and integration do not commute, so at first sight, it is not clear how to combine weak-form feature and parameter libraries with SINDyCP.However, when computing the weak samples corresponding to constant functions, such as those that form the parameter library, the integrals simply represent the spatiotemporal volume of the domain Ω k .Our implementation thus rescales the weak features for the temporal derivatives by the same volumetric factors.Details of our implementation are presented in Sec.S4 of the Supplemental Material [30].
Performance.Using 500 randomly distributed sample domains (measuring 1/10th the spatiotemporal domain size in each direction), the weak SINDyCP easily identifies the complex Ginzburg-Landau equation using the same data used for the traditional differential form shown in Fig. 1.Furthermore, it can do so in just a few seconds of run-time on a modern processor in this case (over five times faster than the differential form).
To assess the impact of noise, we inject random Gaussian noise of varying intensity [35] into the four trajectories used as the training data for the complex Ginzburg-Landau equation.We then generate two new sample trajectories to use as testing data, with b = 2.0, 1.5 and c = 1.5, 1.0, respectively.Using the training data, we perform the SINDyCP fits using both the differential and weak formulation and evaluate the R 2 score on our test trajectories.Figure 3(a) shows the results for the R 2 score on the test trajectories.While both methods provide good fits for low noise intensity, only the weak method exhibits a robust fit for parameterized equations for large noise intensities.
The SINDyCP fit also requires a sufficient amount of data to identify governing equations.Figure 3(b) shows the performance of SINDyCP on the testing data for fits performed with a varying number of trajectories n t = 2, 3, 4, 5 and of varying length corresponding to a number of time samples N t = 25, 50, 75, 100, with an injected noise intensity of 10 −3 .Unlike the trajectories in Fig. 1, the parameters for trajectories were randomly generated, with (b, c) distributed as Gaussian random variables with means (1.5, 1.0) and standard deviations (0.5, 0.25).For too little data, the fit fails to identify the correct model, and the value of 1 − R 2 is O(1).The models improve moderately with an increasing number of samples per trajectory (the product of N t with the number of spatial grid points).More importantly, a sufficiently large number of trajectories n t is required to achieve a good fit (at least 3 in this case).The amount of data required will further increase when including a larger number of possible library terms and when identifying a larger number of parameters.These requirements should be carefully assessed in order to achieve successful SINDyCP fits for more general pattern-forming systems.
Parameter extrapolation.As a final demonstration (Fig. 4), we consider the one-dimensional cubic-quintic Swift-Hohenberg equation with parameters d, e, and f describing the linear, cubic, and regularizing quintic terms, respectively.This model pattern-formation equation has been used to study defect dynamics incorporating quintic corrections beyond the weakly nonlinear approximation and has revealed universal snaking bifurcations leading to the formation of localized states for e > 0 and d < 0 [36].
The parameters d, e, and f are the minimal and natural set to describe the possible dynamics in the Swift-Hohenberg equation derived from normal-form theory.However, in typical pattern-formation applications, one does not have direct control over such parameters.Instead, experimentally accessible parameters will have a complicated and nonlinear relationship with the normalform parameters, which requires detailed knowledge and tedious calculations to derive, e.g., an expansion and center manifold transformation around a bifurcation point.The SINDyCP approach enables an automated discovery of such relationships, which can be used to extrapolate system behavior beyond a set of measurements.
To illustrate this idea, we generate random quadratic relationships between an experimental parameter ε and the normal-form parameters (d, e, f ), and we create three training trajectories using random values of the param- eter 1 < ε < 3 [Fig.4(a)].For all of the training trajectories, ε is sufficiently large that no localized or periodic states exist, and all trajectories decay to the trivial u = 0 solution.We perform the weak SINDyCP fit using these trajectories subject to injected white noise with intensity σ = 0.01 [35] and with a quadratic parameter library.To test the ability of SINDyCP to extrapolate beyond the parameter regime given in the input data, we simulate the identified model for the experimental parameter value ε = 0. Remarkably, even with limited and noisy training data, the method identifies an accurate relationship between ε and the normal-form parameters.
Simulations of the identified model with random initial conditions converge to localized states for ε = 0. Numerical continuation of these localized states [Fig.4(b)] with the AUTO package [37] reveals that the SINDyCP model exhibits snaking bifurcations that closely approximate those in the Swift-Hohenberg equation (see Sec. S5 of the Supplemental Material [30] for fits and continuations with differing noise intensity and training data).Thus, despite the significant extrapolation of the parameter value beyond the input data, the model captures the complex bifurcation structure in the dynamics.Discussion.The SINDyCP approach represents a simple but powerful generalization of SINDy with control.By disambiguating the feature and parameter components of the SINDy libraries, the method enables the discovery of systems of partial differential equations parameterized by driving parameters.Such equations arise naturally in the context of pattern formation, where the normal forms of bifurcations lead to parameterized equations near the onset of instabilities.The approach can be easily applied with the data available in typical patternformation experiments and promises to enable extrapo-lation beyond the regime that can be theoretically described with weakly nonlinear theory.For example, it may find application in the discovery of mechanisms leading to the formation of novel localized states beyond the snaking bifurcations of the Swift-Hohenberg equation [38,39].While new phenomena may be easily conjectured to occur at unseen parameter values, we emphasize that such predictions must be validated experimentally to ensure correct extrapolation.
In practice, two significant challenges must be overcome to discover good parameterized models with SINDyCP.First, the method requires sufficiently informative trajectory data.Samples should be collected on appropriate temporal and spatial scales, sufficiently many parameter values should be measured, and trajectories with persistent dynamics provide more information than transient trajectories.While the weak formulation significantly mitigates the problem, measurement noise impairs the fit and can corrupt results.Second, the method requires an appropriate state space with a good coordinate representation to discover sparse dynamics.Near a bifurcation, the normal-form theory helps provide information about the state space dimension and sparsity-promoting coordinate transformations.In the future, a more sophisticated data-driven phaseamplitude reconstruction [40] or autoencoder-assisted discovery of physical coordinates [41][42][43][44][45] will further enable researchers to discover parsimonious equations governing complex systems directly from data gathered through experiments conducted under various driving parameters.

Supplementary Material for "Data-Driven Discovery and Extrapolation of Parameterized Pattern-Forming Dynamics"
Zachary G. Nicolaou, Guanyu Huo, Yihui Chen, Steven L. Brunton, and J. Nathan Kutz

A. Installation
We recommend installing the PySINDy package from source by cloning the GitHub repository with git to obtain the most up-to-date version.We highly recommend doing so with a new Anaconda environment in order to avoid unexpected dependency conflicts.Anaconda can be installed following the instructions here: https://www.anaconda.com/download#downloads.The following bash commands will download the PySINDy repository, create an anaconda environment, and install the package: cd $HOME git clone https :// github .com / dynamicslab / pysindy .git cd $HOME / pysindy conda create -y -n pysindy_env pip Jupyter matplotlib conda activate pysindy_env pip install -e .
Once installed, the easiest way to develop code is in a Jupyter notebook, and the repository provides several example notebooks in the examples subdirectory of the package.We recommend starting with the first example notebook, which provides an overview of all the features implemented in the package.Running the commands cd $HOME / pysindy / examples jupyter notebook 1 _ featur e_overview .ipynb will launch a browser with the notebook, and running all the cells in sequence will demonstrate the package.Finally, the developers currently actively maintain the package, and issues with the code can be addressed on the GitHub page: https://github.com/dynamicslab/pysindy/issues.

B. Demonstrations
All the results in this manuscript can be reproduced by sequentially running all cells in the 17th example notebook, which can be launched with cd $HOME / pysindy / examples /17 _ p a r a m e t e r i z e d _ p a t t e r n _ f o r m a t i o n jupyter notebook p a r a m e t e r i z e d _ p a t t e r n _ f o r m a t i o n .ipynb Note that since PDE data is high-dimensional, some of the cells require up to 100GB of RAM to run successfully, so we recommend running the notebook on a cluster server if available or downsampling the input.
In addition to the results in the main text, the notebook illustrates simpler pedagogical demonstrations of SINDyCP in discrete maps, ODEs, and PDEs shown in Fig. S1.The left panels illustrate the logistic map, which is a discrete-time system with a single dependent variable x n and a single parameter r.This equation is the model for a universal period-doubling route to chaos as the parameter r increases past 3.56995.We perform the SINDyCP fit using four sample trajectories of 1000 iterations, corresponding to parameter values r = 3.6, 3.7, 3.8, 3.9 (red dotted lines in Fig. S1).We employ a library consisting of polynomials up to third order in the dependent variable x n and linear functions of the control parameter r, and the SINDyCP approach correctly identifies the parameterized equation.The middle panels illustrate the Lorenz system, ẋ = σ(y − x), ẏ = x(ρ − z) − y, ż = xy − βz, (S2) which consists of three ordinary differential equations in three dependent variables x, y, and z and three parameters σ, ρ, and β.This equation exhibits the iconic butterfly-shaped Lorenz attractor for certain parameter values.We perform the SINDyCP fit using five sample trajectories that have converged to their attractors, corresponding to the randomly selected parameter values σ = 10.0, 9.8, 9.9, 10.3, 9.5, ρ = 27.6,28.2, 28.3, 27.6, 28.1, and β = 3.1, 2.4, 2.4, 2.3, 2.4, respectively.We use feature and parameter libraries consisting of polynomials up to fourth order in the dependent variables (x, y, z) and linear functions in the parameters (σ, ρ, β), and the SINDyCP approach again correctly identifies the parameterized equation.Finally, the right panels illustrate the CGLE described in the main text.

S2. COMPLEX GINZBURG-LANDAU EQUATION
To integrate the complex Ginzburg-Landau equation, we use a pseudospectral integration method.We take a periodic domain of size of size L = 32π in each direction and discretize using N x = N y = 128 grid points in each spatial direction.Derivatives are calculated using fast Fourier transforms, and the discretized system is integrated with a 4(5) Runge-Kutta-Fehlberg method (which is also used for the other equations, with relative and absolute error tolerances of 10 −6 ).To produce states in the dynamical phases of interest, we take random initial conditions A 0 = nm α nm e iknm•x + ϵe ik 2 2 •x , where α nm are complex random Gaussian amplitudes with mean zero and variance σ 2 /(1 + n 2 + m 2 ), k nm = 2π(nx + mŷ)/L, the sum ranges over −2 ≤ n, m ≤ 2, and ϵ is the scale of an initial plane wave perturbation with wavevector k 2 2 .The mode amplitudes are determined by σ = 0.1, 0.1, 0.1, 1.0 and ϵ = 0.01, 0.01, 1.0, 0.01 for the four trajectories used in the main text.The system is allowed to approach an attractor for the first 90 time units, then the trajectory is formed by the next 10 time units, in steps of 0.01.We also provide an animation showing the phase and amplitude for longer runs of 100 time units (Fig. S2).

A. Model details and bifurcation identification
We mainly follow the analyses of the Oregonator model in Refs.[31,33], with realistic parameter values shown in Table I ) undergoes a Hopf bifurcation as the control concentration C B decreases below the critical value C c B = 0.787, leading to oscillatory chemical dynamics.While this is known for the Oregonator model, here we demonstrate that this bifurcation can be predicted and quantified if the governing equations were unknown using the SINDyCP approach.
We initially fix a spatial domain of length L = 1/ √ 0.05 cm ≈ 4.47 cm and employ the pseudospectral integration with an integration time of T = 400 s, and we produce trajectories for C B = (0.826, 0.834, 0.842, 0.850, 0.858, 0.866).We record the concentrations for time steps of dt = 0.594804 s, which corresponds to 1/10th the period of the critical frequency at µ = 1.Anticipating the two-dimensional state space required to describe a Hopf bifurcation, we consider fits for the variables (C X , C Z ).Each of these trajectories decays to a fixed point (C 0 X (µ), C 0 Z (µ)), and we estimate the fixed point for each C B by averaging the trajectories for the final 100 time steps.
We then construct scaled coordinates to fit with SINDyCP.We use the first 200 time steps as training data and the next 200 time steps as test data; Fig. S3(a) shows the spatial average ⟨ X⟩, ⟨ Ỹ ⟩ ≡ X, Ỹ dxdy/L 2 of the training data sets.The dynamics are dominated by exponential decay, so we use a linear feature library.Furthermore, since we assume no knowledge of the location of the bifurcation, we assume a linear parameter library for the control parameter C B as well.The SINDyCP fit produces a model The stability of the constant solution to this model is determined by the eigenvalues λ of the Jacobian matrix Figure S3 shows the real part of the eigenvalues as a function of C B .As expected, the spectrum is stable for the training data.The spectrum of the fit becomes unstable as C B decreases past C B = 0.784, closely predicting the true instability of C c B = 0.787 as well as the critical frequency.To assess the ability to extrapolate the dynamics, we also produce additional testing trajectories for C B = (0.748, 0.756, 0.763, 0.771, 0.779) and evaluated the R 2 score of the model (orange dots in Fig. S3).The model performs poorly, as expected given the strictly linear dynamics.More sophisticated fitting does not resolve this issue This simple amplitude construction is superior to naive strobing at the critical period and works well for our purposes, but we note that more sophisticated approaches employing, e.g., analytic signal theory [40], may be helpful in more complex scenarios.We rescale each trajectory by the time-and space-averaged amplitude |A|dtdxdy in order to weigh each trajectory equally in the SINDyCP fit. Figure S4 shows the spatially-averaged amplitude for the trajectories with µ < 0 corresponding to those in Section S3 A. As desired, the time averaging removes the fast oscillations from the data and produces a smooth envelop evolution.Given the expected scaling of the Hopf bifurcation, we include a parameter library with square root and linear features and a feature library with polynomial features up to cubic terms and spatial derivatives up to second order.The equations discovered by SINDyCP with a threshold of 0. where the subscripts denote the spatial derivatives.The model performs very well for the parameter regime that it was trained under, with a R 2 score greater than 0.999, but the dynamics for µ < 0 are too transient to give enough information to build a model that can extrapolate well.Thus, the model does not extrapolate very well, producing poor R 2 scores for test trajectories with µ > 0.
The weakly nonlinear CGLE results in Section S3 B rely on the Taylor series expansion Eq. (S6) for µ near zero, but the normal-form transformation eliminating non-resonant terms applies more generally to asymptotic series up to cubic order in the state variables with arbitrary parameter dependence (see, e.g.Lemma 3.7 in Ref. [6]).The values of the normal-form parameters under this general transformation remain b(µ) = Im(d)/Re(d) and c(µ) = −Im(g)/Re(g) where g and d are defined as in Eq. (S9)-(S10) as before, but now with µ-dependent values of F x n .Consistently, the normal form parameters for the fits with µ < 0 closely approximate the analytic results b(0) ≈ b 0 and c(0) ≈ c 0 in the training regime (Fig. S4), but significant variations emerge for µ > 0. As noted previously, the transient trajectories for µ < 0 do not contain enough information to build a model that extrapolates well.
For the µ > 0 trajectories from the main text, we use a scaled domain of size L = 1/ √ µ and integration time T = 500/µ according to the expectations from the Hopf bifurcation to obtain informative training trajectories.The amplitudes A are then calculated and interpolated onto a common temporal grid with 500 time points.The first fifth of the trajectory is discarded as the initial conditions relax to the chaotic attractor, the next two-fifths are used as training data, and the final two-fifths are used as test data.
For the smallest value of µ = 0.01, we perform an unparameterized SINDy fit with a threshold of 0.1 resulting in The normal-form transformation gives parameter values b = 0.189 and c = 2.586, which approximate the weakly nonlinear theory values of b 0 = 0.173 and c 0 = 2.379 fairly well (this fit can be improved with additional trajectory data).Thus, the data-driven amplitude construction works well enough to reproduce the results of weakly nonlinear theory even when the governing equations are unknown.
For the parameterized SINDyCP fit in the main text, we employ the implicit SINDy to fit, which includes additional features columns for Ẋ and Ẏ , leading to in highly nonlinear rational coefficients when solving for explicit governing equations.The resulting SINDyCP fit with a threshold of 0.1 is shown in Eqs.(S16)-(S17) below.This model extrapolates much better than the weakly nonlinear theory.Further sparsity is achieved by applying the normal-form transformation, leading to the parameter-dependent values of b(µ) and c(µ) [main text Fig. 2(b)].We also provide an animation the coordinates on the grid points are x i .The weak form requires us to calculate the integral of interpolated data f (x) weighted by the dth derivatives of test function ϕ(x), We choose to use test functions ϕ(x) = (x 2 − 1) p in our implementation (after centering and rescaling the spatial coordinates so that x 0 = −1 and x N = 1; we ignore the rescaling coefficients here), and thus their dth derivatives are We are provided with some feature values u i at the grid points, and we consider the value of a library function f applied to that feature, f i ≡ f (u i ).We linearly interpolate the function as Expanding the interpolation, and integrating the xϕ (d) (x) terms by parts, where Φ (d) (x) are the antiderivatives of ϕ (d) [i.e.Φ (d) (x) = ϕ (d−1) (x) for d > 0].By relabelling the dummy summation variables, we can recast Eq. (S20) as a dot product between the input data f j and a weight w j with where 0 < j < N − 1.At the left and right sides of the domain (for j = 0 and j = N − 1), we must adjust the weights to correct for boundary effects, Expressing the integrals along each dimension as dot products [Eq.(S21)] enables efficient vectorization with BLAS operations, and the integration weights [Eqs.(S22)-(S24)] are evaluated (in a vectorized fashion) just one time when the library is first initialized.We further vectorize the code by forming tensor products over all integration dimensions to calculate multidimensional integrals using a single tensor dot product.

S5. SWIFT HOHENBERG EQUATION AND LOCALIZED STATES
As noted in the main text, we randomly generated a quadratic relationship between the "experimental" parameter ε and the normal-form parameters d(ε), e(ε) and f (ε).Specifically, we take d(ε) = which we treat as ground truth.We selected these specific intervals for random sampling simply to localize the snaking bifurcations around ε = 0, as shown in Fig. 4(b) of the main text.
As before, we employ a pseudospectral method to integrate the Swift-Hohenberg equation, with N x = 256 discretization points, a domain of size L = 64π, an integration time of 5 time units, and random initial condition is given by the real part of u 0 = 20 n=−20 α n e iknx with k n = 2πn/L and α n complex random Gaussian amplitudes with mean zero and variance 1.0/(1 + |n|) 2 .We randomly selected five values from a uniform distribution on ε ∈ (1, 3) and generated high-ε trajectories well above the snaking bifurcations, and we similarly selected another five values on ε ∈ (−2, 1) to generate low-ε trajectories around the snaking bifurcations.Additionally, we added noise sampled from a Gaussian distribution with variance σ 2 and mean zero to produce the training data.
To maintain the reflection symmetry that ensures that homoclinic orbits can be continued under one-parameter continuation, we use a feature library consisting of odd-order polynomials up to order 5 and even spatial derivatives up to fourth order.This choice is simply a matter of convenience; when the reflection symmetry is broken slightly, localized states persist but begin to travel, which complicates the numerical continuation.On the high-ε training trajectories shown in Fig. S6

FIG. 2 .
FIG. 2. Corrections to the weakly nonlinear theory of the Oregonator model.(a) R 2 score for the parameterized SINDyCP model and for an unparameterized SINDy fit on test trajectories collected at the parameter values used to train the model.(b) Corrected normal-form parameter values relative to the weakly nonlinear values b0 and c0 as a function of the bifurcation parameter µ 1/2 .(c) Average limit cycle amplitude A 2 vs µ 1/2 for the Oregonator model and SINDyCP fit.The fit correctly exhibits a canard explosion far from the onset of the instability (insets show CX above and below the explosion, and an animation of the evolution is available in the Supplemental Material[30]).

FIG. 3 .
FIG. 3. Performance of SINDyCP for the fit of the complex Ginzburg-Landau equation with noisy data.(a) Model score vs noise intensity using the differential and weak forms of SINDyCP with nt = 4 trajectories.(b) Model score vs number of samples for varying number of randomly generated trajectories, varying trajectory length, and noise intensity 10 −3 .

FIG. 4 .
FIG. 4. Extrapolation of localized states in the cubic-quintic Swift-Hohenberg equation.(a) The randomly generated relationships between the normal-form parameters (d, e, f ) and the experimental parameter ε.Red dotted lines show the values used to train the SINDyCP fit and dashed colored lines show the coefficients derived from the fit.(b) Snaking bifurcations of localized states for Eq.(7) (purple lines) and the SINDyCP fit (red lines).Insets show the localized states found from random initial conditions with ε = 0 in the discovered model.
FIG.S1.Demonstrations of the SINDyCP approach for three models (top row) of nonlinear dynamics.Several trajectories produced from different parameter values (middle row) are supplied as input, and the SINDyCP fit (bottom row) correctly identifies the governing equations in each case.
FIG. S2.Snapshot of the animation showing the complex amplitude A = re iϕ of the trajectories.In each panel, the phase ϕ is shown in the upper left and the amplitude r is shown in the lower right.
FIG. S3.Estimation of the Hopf point from data.The left two panels show the spatially-averaged scaled coordinates as a function of time; the top-right panel shows the R 2 score for test data corresponding to the training parameters (blue dots) and for extrapolation to beyond the Hopf point (orange dots); and the bottom-right panel shows the real part of the linear spectrum as a function of C B .

FIG. S4 .
FIG. S4.SINDyCP fits for the µ < 0 data.The left two panels show the spatially-averaged amplitudes as a function of time; the top-right panel shows the R 2 score for test data in the corresponding to the training parameters (blue dots) and for extrapolation to µ > 0 test data (orange dots); and the bottom-right panel shows the predicted normal-form parameters as a function of µ.