SINDy-BVP: Sparse Identification of Nonlinear Dynamics for Boundary Value Problems

We develop a data-driven model discovery and system identification technique for spatially-dependent boundary value problems (BVPs). Specifically, we leverage the sparse identification of nonlinear dynamics (SINDy) algorithm and group sparse regression techniques with a set of forcing functions and corresponding state variable measurements to yield a parsimonious model of the system. The approach models forced systems governed by linear or nonlinear operators of the form $L[u(x)] = f(x)$ on a prescribed domain $x \in [a, b]$. We demonstrate the approach on a range of example systems, including Sturm-Liouville operators, beam theory (elasticity), and a class of nonlinear BVPs. The generated data-driven model is used to infer both the operator and/or spatially-dependent parameters that describe the heterogenous, physical quantities of the system. Our SINDy-BVP framework will enables the characterization of a broad range of systems, including for instance, the discovery of anisotropic materials with heterogeneous variability.


Introduction
Boundary value problems (BVPs) are ubiquitous in the engineering and physical sciences [1,2]. From heat transfer to elasticity, many fundamental technologies developed in the 20th century are formulated as linear BVPs whose solutions are used in engineering design. For example, the semi-conductor industry developed many critical technologies and chip architectures by solving BVPs that characterize the underlying quantum, thermal, and electromagnetic physics. Modern BVPs of interest often arise in complex systems characterized by nonlinearity and spatial heterogeneity, thus rendering standard analytic and computational techniques intractable since the governing equations and spatial variability are often unknown. Indeed, the governing BVPs for many emerging applications are often unknown and/or their spatial dependencies undetermined. Modern anisotropic material system designs provide a canonical example of the ability to leverage nonlinearity and heterogeneity in order to produce remarkable new materials. Data-driven methods provide a potential theoretical framework for characterizing such materials by discovering both the governing BVPs (linear and nonlinear) and their spatial dependencies through measurements alone. Toward this goal, we develop a sparse regression framework, previously used for the discovery of dynamical systems, in order to discover interpretable and parsimonious BVPs and their spatial dependencies.
The formulation of many canonical problems in physics resulted in the first BVPs. From as early as 1822, when Fourier formulated and solved the heat equation [3], BVPs played a central role in electromagnetism, wave propagation, quantum mechanics, and elasticity. Many of these BVPs resulted from applying a spacetime separation of variables decomposition to a governing partial differential equation (PDE). In different geometries and dimensions, the solutions to many canonical BVPs became known as special functions: Bessel, Laguerre, Hermite, Legendre, Chebyshev, spherical harmonics, radial basis, etc. More broadly, these canonical linear equations of mathematical physics were unified under the aegis of Sturm-Liouville theory. The impact of Sturm-Liouville theory in the 20th century is difficult to overestimate given its enormous breadth of applications ranging from the underlying theory of quantum mechanics to the propagation of electromagnetic energy in waveguides. The BVP theory for these two applications arise from a separation of variables solution of the Schrödinger equation and Maxwell's equations, respectively.
Linear BVPs are amenable to a number of solution strategies, foremost among these being eigenfunction expansions [1]. Such a solution technique is highly advantageous given the interpretability of the

Background
This work relies on adapting SINDy [10], PDE-FIND [17], and the Parametric PDE-FIND [18] algorithms to learn the differential operator L in BVPs of the form L[u(x)] = f (x), along with parametric and spatial heterogeneous dependencies. SINDy is a model discovery algorithm originally designed to discover gov- Ξ ← argmin Ξ U t − ΘΞ 2 Initial Ξ guess 3: for i = 1, ..., iters do 4: terms ← {i : r(Ξ(i)) > } Threshold by coefficient matrix 5: end for 8: return Ξ 9: end procedure erning equations for nonlinear dynamical systems. This method uses a sparse regression framework with a large library of candidate physics models to determine governing equations for physical systems that are often characterized with relatively few terms. This makes the governing equation sparse in the space of possible candidate functions included in the library. SINDy considers dynamical systems of the form: where x(t) ∈ R n represents the measured variables of the system at time t. The regression is formulated in a discrete matrix formulation where x is measured at discrete snapshots in time t. The snapshots are used to form the matrices X andẊ, whereẊ is either directly measured or numerically computed from the snapshots x(t). If the interval [0, T ] is discretized into m points, the matrices are: . . .ẋ n (t 2 ) . . . . . . . . .
where since x(t) is an n-dimensional vector, then X andẊ ∈ R m×n . The system identification problem is formulated in matrix form as an over-determined linear regression problem (Ax = b) for learning the governing equations: where Θ(X) ∈ R m×p contains p column vectors, each representing a possible candidate term in the governing equation to be learned. These columns contain candidate symbolic functions for characterizing the governing equations N in (1) by numerically evaluating the state-space at m discrete time points. The unknown coefficient vector of loadings, Ξ ∈ R p×n is learned via sparse regression. Candidate terms in Θ can be excluded from the learned governing equation by setting the corresponding coefficient in Ξ to 0, which is naturally done by a sparse regression. The sparse regression minimizes the 2 reconstruction error while enforcing sparsity. Rather than traditional 1 regularization terms, or the idealized 0 regularization, sparsity is achieved through an iterative thresholding procedure [10] whose convergence properties have been studied under various assumptions [19,20]. However, this problem can be solved using any sparse regression algorithm, such as lasso [21], sparse relaxed regularized regression (SR3) [22,23], stepwise sparse regression (SSR) [24], or Bayesian methods [25][26][27]. The iterative thresholding algorithm for SINDy is outlined in Algorithm 1.
Classical SINDy works well for model discovery and system identification on problems where the terms in the governing equation can be well-represented in the candidate library (Θ) and where the learned terms have constant coefficients with respect to the independent variable(s) of the system (i.e. time-invariant constant coefficients). Parametric PDE-FIND was developed as an extension of the SINDy algorithm to accommodate partial differential equations with time-variant or space-variant coefficients [18]. The approach is modified for the data-driven modeling framework in this work, as described in Section 3.

Methods
Our proposed innovation learns the differential operator L in BVPs and its spatially varying coefficients. This is accomplished by subjecting the system to a collection of known spatially-varying forcing functions and measuring the system's response. Each response, u j (x), to a forcing function, f j (x), is recorded as a trial and each trial is governed by the relationship: where L is the linear or nonlinear differential operator, x is the independent spatial variable, u j (x) is the measured system state variable quantifying the system's response when subjected to the force f j (x), and there are m total trials. The f j (x) are known applied forcing functions, which can be considered as probes for the system. The variable x is used to denote a single spatial scalar variable rather than a position vector.
The interval x ∈ [a, b] defines the spatial region of interest, where x = a and x = b are the boundaries of the BVP. Although a variety of boundary conditions can be realized in physical systems, this work uses Dirichlet boundary conditions which specify u(x = a) and u(x = b). Figure 1 shows the general principle of SINDy-BVP.

Problem Statement
To begin, we assume L is a second-order differential operator. In this case, it is known that L[u] contains the term u xx . This operator order assumption will be relaxed in later sections (Section 4.4). If u xx is in the governing equation L[u(x)] = f (x), we assume it can be represented as some generalized function N which contains f (x) and other terms in L: This is the BVP equivalent to (1). The BVP problem (4), which is formulated as a continuous variable over the domain x ∈ [a, b], is discritized into n spatial locations. We assume these to be equally spaced measurements or discretization locations. The discretized function u(x) is mapped to the vector u = [u(x 1 ) u(x 2 ) u(x 3 ) · · · u(x n )] T where x 1 = a and x n = b. With the vectorization of the data, we can adopt the SINDy nomenclature and restate the sparse regression for BVPs as where U xx is the second spatial derivative of the discretized vector of the state space u(x), Θ is a library of candidate functions similar to N (·), and Ξ is a vector of coefficients which prescribe the loadings of the columns of Θ. The coefficient vector can vary spatially with x, or more precisely, the discretization of x. This regression uses the input data set with both U ∈ R m x n and F ∈ R m x n with n discrete sampled spatial positions and m unique trials or forcings. Each trial is a system response u j to a corresponding Figure 1: SINDy-BVP studies steady-state systems subjected to a forcing function. One simple example system is a beam clamped at both ends (b) subjected to a static load (a). The beam deflects (c) in response to the load, and the forcing function and deflection are used for data-driven modeling via SINDy-BVP to learn the parametric coefficients (d) in the governing operator. The coefficients p(x) and q(x) vary spatially. The coefficients are directly related to the beam's spatially-varying mechanical properties. The grey boxes in (d) indicate that error can occur in the learned coefficients near the boundaries.
forcing function f j governed by the same operator L. The input data set U and F have the structure: Note this is different from dynamical systems SINDy, where m temporal snapshots of a dynamical system are sampled. The data in U is numerically differentiated (in space) to produce U x , and further derivatives of U with respect to x as needed. The numerically differentiated data is stacked as a vector and used as the outcome variable for the SINDy regression (5). The stacked vector U xx has the structure: Figure 2: Overview of constructing the data sets and regression for each discrete spatial point in the data set. Part (a) shows a collection of trials, where different forcings (fj(x)) are applied to a system yielding different responses (uj(x)).
A matrix containing a library of candidate terms Θ(U, F) is produced for each trial, where the rows are each a spatial point x k and each column contains a candidate function, as seen in part (b). A sliding procedure is used to select rows of a single x k from the libraries in part (b) to produce an aggregated library Θ (k) for each x k in the data set. In (c), the regression is performed for each x k to produce a vector of the parametric coefficients p(x) and q(x) at each x k .
Following the approach in [18], the candidate function matrix Θ(U, F) ∈ R (m x n) x (p x n) is constructed as a sparse block matrix to enable discovery of spatially-varying parametric coefficients. Θ(U, F) contains p symbolic candidate basis functions, each evaluated at the n spatial coordinates for all of the m trials. The candidate basis functions used in Θ(U, F) include nonlinearities of u(x) and spatial derivatives of u(x) as well as polynomials of the independent spatial variable x. The functions included in the library Θ(U, F) are further described in Section 3.5. The matrix Θ(U, F) is a diagonal sparse matrix with the structure: where Θ (k) ∈ R m×p is a symbolic function library with p candidate functions evaluated at a single spatial coordinate, x k , for all of the m trials: Since the forcing functions f j (x) are known to influence the observed behavior of the system, they must be in the learned function N (·), and therefore must be included in the candidate term library Θ(U, F). Furthermore, any complete, accurate model must include the f (x) term, so it is a natural method for selecting plausible parsimonious models that accurately describe the system behavior.
The problem formulation is tied together as a group regression problem through a set of group indices G = {l + p * k : k = 1, . . . , n; l = 1, . . . , p} where l counts through the p candidate functions and k counts through the n discrete spatial positions. The group indices effectively tie together the columns in Θ(U, F) corresponding to a single candidate function at all of the n discrete spatial positions.
Group sparsity is imposed to produce a solution which is sparse in the space of possible candidate functions, where the same candidate functions are used to describe the system behavior at all spatial points in the system. Group sparse regression is performed using the Sequential Grouped Threshold Ridge regression (SGTR) algorithm developed by Rudy et al. [18]. An intuitive way of thinking about this approach is presented in Figure 2. In the figure, a separate sparse regression is constructed for each of the n spatial coordinates. The regression aggregates data from m trials and enforces the solution's sparsity pattern across all of the spatial positions. As implied by the figure, this allows for inference of the operator L and its parametric coefficients.

Sequential Grouped Threshold Ridge Regression
SGTR [18] is a group regression technique which accomplishes group-level sparsity through an iterative thresholding process. This example assumes SINDy-BVP will be performed with the outcome variable U xx . Using the construction provided above, each group in G contains a set of indices which represent columns of a single candidate term in Θ(U, F) at all spatial positions and its corresponding coefficient in Ξ at all spatial positions. The algorithm achieves sparsity at the group level through a combination of ridge regression and iterative thresholding across all groups. The optimization for the SINDy-BVP example is shown in Algorithm 2.
The iterative thresholding loop in the SGTR algorithm progressively eliminates groups from Ξ and Θ by setting the columns in Θ to zero. This thresholding imposes sparsity on the candidate functions in Θ(U, F) based on the candidate function's coefficient vector Ξ. Recall each group is a single candidate function at all spatial positions. The evaluation function r in this work is the 2 norm, which means SGTR performs ridge regression and thresholds out candidate functions based on the 2 norm of the coefficient vector (i.e. r(Ξ (g) ). The result is a parsimonious function for U xx with relatively few nonzero entries in Ξ, where the non-zero coefficients are allowed to vary at each spatial position x k . P ← {g ∈ G : r(Ξ (g) ) < } Select groups below threshold 5: Ξ (P ) ← 0 Set to zero 6: Ξ ← argmin Ξ U xx − ΘΞ 2 + λ Ξ 2 Repeat regression 7: end for 8: Final fit 9: return Ξ 10: end procedure

Model Selection
Model selection follows the same procedure developed for Parametric PDE-FIND with the SGTR algorithm [18]. To improve the performance of SINDy-BVP, each candidate function is normalized to unit length. Prior to constructing the block diagonal matrix Θ(U, F) above, the entries Θ (k) are stacked and each column is normalized to unit length. More precisely, the matrixΘ ∈ R (m x n)xp which is assembled aŝ .Θ is normalized column-wise over each of the p columns, each containing a candidate function. Similarly, the outcome variable vector (e.g. U xx ) is normalized such that U xx = √ m. A set of tolerance values are computed by: where max and min are the highest and lowest tolerances that affect the sparsity of the predicted model. At a thresholding value of max , all coefficients are set to 0 after the first pass with SGTR. Conversely, using min as the thresholding tolerance would not eliminate any candidate functions with SGTR. A number of values (typically 50) spaced logarithmically between max and max are selected and used for thresholding, and the results are scored by the AIC-inspired loss function: where k is the number of nonzero coefficients in the identified model ( Ξ 0 /m) and mn is the number of rows in Θ(U, F). The loss function used to select the model assumes there is error in evaluating U xx , and thus a model that minimizes only mean squared error ( Θ(U, F)Ξ − U xx 2 2 ) is likely overfit. The value of β used in the loss function can be adjusted to accommodate noise in the state variable u observations. The original work on Parametric PDE-FIND explores the importance of selecting appropriate β and λ in more detail [18]. In this work these values are fixed: β = 10 −6 , and λ = 10 −5 .

Learning the Operator L
The operator can be inferred from the learned function for u xx (x). Using a simple ansatz that the differential operator is at least second order, it is presumed the operator contains a u xx (x) term. This means we can define a new function N such that: If u xx (x) has the spatially-varying coefficient φ(x), then the function N is related to φ(x) and the operator L by rearrangement of the original problem to: which shows the parametric coefficient for the term f (x) is 1/φ(x). If the operator is of the Sturm-Liouville form, φ(x) = p(x) > 0 and is therefore positive in the interval x ∈ [a, b]. Furthermore, the other terms with nonzero Ξ (g) correspond to additional terms in the operator L other than u xx (x). By identifying φ(x), we can directly infer the operator L from the learned function for u xx (x), f (x), φ(x), and N . This method can be extended to other types of differential operators as well, including fourth order linear operators and nonlinear operators. There is the question of whether this formulation holds any merits over a simpler formulation for directly learning the operator and coefficients, without algebraic manipulation. The simpler formulation with f (x) as the outcome variable (or left-hand side term) in regression provides the opportunity to directly learn the operator L through a regression of the form F = Θ(U, F)Ξ. In practice, keeping a highly accurate f in the library Θ appears to improve the ability of SINDy-BVP to handle noise when identifying the operator.
However, if f also contained noise there may not be an advantage to this construction.

Candidate Function Library
The candidate function library Θ(U, F) contains columns for derivatives of u(x), polynomials of x, and nonlinearities of u(x). In all cases, u(x), nonlinearities of u(x) up to fifth order, and polynomials of x up to fifth order are included. The polynomials of x are included to challenge SINDy-BVP; these polynomials could theoretically be used to fit a Taylor expansion-like solution to these ODE problems.
The derivatives in the library vary for different outcome variables. Assume the outcome variable for SINDy-BVP is the discrete form of d A u(x)/dx A j. In this case, Θ(U, F) contains derivatives d a u(x)/dx a of order a, for integers 0 < a < A. For example, if U xxxx is the outcome variable, the library contains columns for the derivatives u x (x), u xx (x), and u xxx (x). Finally, the products of u(x) and nonlinearities in u(x) with the spatial derivatives of u(x) (e.g.u(x) u x (x) and u 2 (x) u xx (x)) are included in Θ.

Boundary Value Problem Models
The models used in this work are solved on the interval x ∈ [0, 10] using the shooting method [28]. A tolerance of 0.001 is used for the right-side boundary condition, such that solutions which aim to achieve u(x = 10) = 0 can have an actual value u(x = 10) ∈ [−0.001, 0.001]. Importantly, boundary conditions are the same for all trials for each model. The following subsections describe the models used for this work.

Linear Sturm-Liouville
Sturm-Liouville form operators are an extremely common class of linear, self-adjoint, Hermitian operators. Sturm-Liouville theory is especially important in engineering applications, and its study focuses on operators of the form in Equation 11: where the state variable u(x) is a function of the spatial variable x, and p(x) and q(x) are in general functions of the spatial variable. Importantly, p(x) > 0 is nonzero and positive-valued in the interval [0, 10]. In our example model, the parametric coefficients are described by the functions: p(x) = 0.5 sin(x) + 0.1 sin(12x) + 0.25 cos(4x) + 2 q(x) = 0.4 sin(3x) + 0.15 cos(8x) + 1.
The boundary conditions u(0) = 0 and u(10) = 0 are enforced for solutions of this model.

Linear Second Order Poisson
Many simple physical systems are described by Poisson's equation. These elliptic differential equations are described by a Laplacian operator subjected to a force: ∆u = f . In our system, a parametric coefficient describing a material property, p(x), is introduced to the model.
Steady-state heat conduction is one example of a system that follows from this model. The coefficient p(x) could thus be considered as thermal diffusivity (often κ) of the material and is allowed to vary spatially. The material in this example system is a two-component composite that is anisotropic along the x coordinate and contains an exponentially-varying quantity of the two materials along the x direction. The model for p(x) in this problem is the simple arithmetic average: where v a (x) and v b (x) are the volume fractions of component a and b respectively, and vary spatially. The values p a and p b are the material properties for pure a and b. The components' material properties hold the value p a = 12 and p b = 3, which do not change.
Although this model is simple and the arithmetic average often overestimates the true observed material properties of composites [9], it is instructive to consider the ability of SINDy-BVP to learn a spatially varying anisotropic material property. The volume fraction of component b is described by an exponential decay function while component a makes up the remainder of the volume: A steady-state heat conduction problem, where one end has a higher temperature than the other, is modeled in this problem . Boundary conditions of u(0) = 0.8 and u(10) = 0 are applied.

Euler-Bernoulli Beam Theory
The Euler-Bernoulli beam theory is a fourth order operator similar to the Poisson form. The operator takes the form: where EI is the flexural rigidity of the material. In our model, the flexural rigidity varies spatially following a stepwise function as expected for a lamellar, laminate composite with the lamella oriented perpendicular to the x coordinate: The stepwise function EI(x) is a challenge for SINDy-BVP because of the discontinuities at the jumps in flexural rigidity which occur at x = 2, x = 4, x = 6, and x = 8. The beam in this problem is considered clamped at both ends such that u(0) = 0 and u(10) = 0.

Forcing Functions
The forcing functions for all examples are sinusoidal functions of the form a sin(bx) + c. The amplitude a, frequency b, and positive offset c are selected from a set of values which varies for each model.
The approach for this work is to generate a large library of solutions to the problem L[u j (x)] = f j (x), and then sub-sample the library of solutions to test the SINDy-BVP algorithm. This approach is physically and experimentally relevant. In real systems, a number of different conditions could be tested and compiled into a database of forcings F and corresponding responses U on the discretized spatial vector x.

Operator Identification and Parametric Coefficient Estimation
SINDy-BVP aims to achieve two primary goals: identification of the structure of a differential operator L and discovery of the parametric coefficients present in L for a forced system governed by the model L[u(x)] = f (x). The method is applied to the four models described in Section 4.1. Operator identification is only required in cases where the governing operator is unknown, and so two cases can be considered: known operator and unknown operator. The data used in this section is noise-free (up to numerical precision). Derivatives are computed using the finite differences method. Although this is physically unrealistic since measurements would introduce noise, this exercise provides insight into the capability of the method. Figure 3 shows the four models used in this paper, an example sub-sample of data used for training each of the SINDy-BVP data-driven models, and a plot of true and learned parametric coefficients in the operator L over the interval x ∈ [0, 10]. The parametric coefficient plots are taken from the case of an unknown operator, where both the operator and the parametric coefficients are learned by SINDy-BVP.
SINDy-BVP is effective at learning the coefficients p(x) and (if applicable) q(x) with relatively few trials for numerical precision data. Table 1 shows the number of trials required for SINDy-BVP to estimate the parametric coefficients to within 1% error for the middle 98% of the interval (i.e.[0. 1,9.9]). This metric is used to quantify the accuracy of learned coefficients because the error in the learned coefficients happens almost exclusively at the boundaries (inspect Figure 3).

Effects of Noise
The effect of noise is best exemplified by focusing on a single model. For these studies, the Nonlinear Sturm-Liouville model is used; see Section 4.1.2 for model details. Noise is introduced to the system by applying Gaussian white noise to the measurement data in each U j . The magnitude of noise is based on the standard deviation of the vector U j . The effects of noise will be discussed separately for the goals of Operator Identification and Parameter Estimation.

Noisy Operator Identification
Operator Identification is a challenging task for SINDy-BVP with noise. Prior works with SINDy have also described challenges in dealing with noise, so this is not a surprising finding. In order to enable effective operator identification, the data in each U j is filtered with a convolutional Gaussian filter, and differentiation is performed with Chebychev polynomial interpolation on subsets of the data as discussed in [18]. Figure 4 shows the effect of varying magnitudes of noise and different numbers of trials used in regression for the operator identification process. In the case of the Nonlinear Sturm-Liouville model, over 120 trials are required to routinely identify the correct model at 1% noise. With a constant 200 trials used, spurious terms show up in the learned function starting at 2.5% noise.

Noisy Parameter Estimation
If the operator is known, the focus shifts towards estimating the spatially-dependent parametric coefficients. SINDy-BVP is better at estimating parameters in noisy systems than identifying an unknown operator from noisy data. In order to enable effective operator identification, differentiation is performed with Chebychev polynomial interpolation on subsets of the data, as above, but unlike the process for operator identification no convolutional filter is applied. Figure 5 shows the effect of noise on parameter estimation. In the case of a known operator, this is an ordinary least squares regression. Figure 5(a) shows the error increase using a sample size of 200 trials. This is a relatively large overall number of trials compared to the trials needed to get decent parameter estimation in the 1% noise case in Figure 5(b). Despite the large number of trials, the error in parameter prediction significantly increases with small amounts of noise, reaching a reconstruction error over 0.2 by 5% noise.

Model Differential Order Selection
Knowing the dimension of the model's left hand side has not yet been discussed. For the Euler-Bernoulli beam theory example, for instance, the left hand side term is d 4 u/dx 4 . The order of the model can be determined by using a test data set that is excluded from a series of SINDy-BVP regressions. Using the methods described in section 3.4, the operator L can be identified from a generalized equation N which describes a given left-hand side term. A series of SINDy-BVP regressions is used to identify an operator L for a set of left hand side terms with increasing differential order (du/dx, d 2 u/dx 2 , d 3 u/dx 3 ,...). Each of these operators is then evaluated with the test data set for the error 1 T j=1,...,T L[U j ] − F j for the T test data sets. Figure 6 shows how the 2 norm error compares between data-driven models generated with increas- Figure 3: A summary of the models and operators studied with SINDy-BVP. The center column shows three example trials from the training data set (u1(x), u2(x), and u3(x)). The solutions are all of O(1). The final column shows the parametric coefficients in the operator, as well as the inferred parameters for clean data. Coefficients p(x) and q(x) are plotted with an offset, with markers every 30 points.
ing differential order (du/dx, d 2 u/dx 2 , d 3 u/dx 3 ,...). A significant decrease in error occurs at the model for d 4 u/dx 4 , indicating this is likely the correct model to use. This approach places an emphasis on the relationship Lu = f , which is central to this work.

Discussion
SINDy-BVP successfully adapts the data-driven modeling approach of SINDy to time-invariant, steadystate, spatially-varying BVP systems. The method is used to identify a differential operator, L, governing forced systems of the form L[u j (x)] = f j (x), where f j (x) is a known forcing function and u j (x) is the measured variable which quantifies the system's response to the forcing. Operator identification and parametric coefficient estimation are the two most important tasks. With noiseless data, SINDy-BVP is effective at identifying the operator and the parametric coefficients withiñ 1% error with relatively little data (see Table 1). However, noisy data makes both tasks more difficult.   Figure 4 suggests operator identification is challenging with as little as 1% noise. Parameter estimation can succeed within 15% error with as little as 10-15 trials (f j (x)-u j (x) pairs) in 1% noise ( Figure 5(b)). However, parameter estimation error does not improve significantly without using much larger sets of data (over 100 data sets). With and without noise in the data, SINDy-BVP often incurs error in both operator identification and parameter estimation near the boundaries of the system.
A variety of improvements and adaptations to the SINDy-BVP architecture can be made which may improve model convergence for operator identification and the accuracy of parametric coefficient estimation. One common error which induces error in model convergence is the inclusion of a term with a single large value in its Ξ (g) . The large value passes the thresholding step, which is based on the 2 norm of the entire coefficient vector. A simple approach to reduce erroneous terms originating from this issue is to include an ∞ norm threshold. Another strategy is to aggregate multiple spatial positions in each u  per regression rather a single position (e.g. x k−1 , x k , x k+1 rather than just x k ).
An additional approach to improving SINDy-BVP is to include physics-informed constraints on the regression optimization. For example, in the case of known Sturm-Liouville form operators, constraints could be added to the optimization that directly relate p(x) to its derivative p x (x). Conservation laws can also be included in the optimization to provide additional constraints, for example, on the energy within the system. Data used for modeling is a critically important aspect of any data-driven modeling approach. The data used for developing the data-driven model must be balanced to display behavior from each of the contributing terms in order for the SINDy-BVP to identify the full, correct model. For example, if the differential operator is the linear Sturm-Liouville form L[u] = −p(x)u xx − p x (x)u x + q(x)u = f but the data shows the term −p x (x)u x contributes very little, SINDy-BVP may exclude the term from the learned model. This results in an inaccurate model of L[u] = −p(x)u xx + q(x)u, but with low reconstruction errors and consequently low loss function values. Therefore, the selection of appropriate forcing functions for a given set of boundary conditions is important to adjust the order of solutions and to control the dominant balance of the observed response.
Noisy data is one of the most important barriers to successfully applying SINDy-BVP to physical systems. Noise is often amplified by numerical differentiation methods, so one approach to reducing noise is to use integral-based formulations of SINDy which was shown to improve noise-handling [29]. Alternatively, improved differentiation methods could be developed or black-box interpolation methods (e.g. neural networks) could be used to build 'clean' signals U j from noisy data. Similarly, it may be challenging to apply numerically precise forcings to a system.