Learning Hamiltonian neural Koopman operator and simultaneously sustaining and discovering conservation laws

Accurately finding and predicting dynamics from observational data with noise perturbations is of paramount significance but remains a major challenge. Here, for Hamiltonian mechanics, we propose the Hamiltonian Neural Koopman Operator (HNKO), which integrates knowledge of mathematical physics into learning the Koopman operator, making it automatically sustain and even discover conservation laws. We demonstrate that the HNKO and its extension outperform existing methods on a number of representative physical systems, even with hundreds or thousands of degrees of freedom. Our results suggest that appropriately feeding prior knowledge of the underlying system and the mathematical theory into the learning framework reinforces the capability of machine learning in solving physical problems.

Accurate reconstruction of nonlinear dynamical systems based solely on observational data with noise perturbations is a focal challenge in many fields of physics and engineering. Neural networks (NNs) equipped with inductive biases have remarkable abilities in learning and generalizing the intrinsic kinetics of the underlying systems from noisy data; examples include the Hamiltonian NNs [1,2], the Lagrangian NNs [3], the neural differential equations [4-6], and the physics-informed NNs [7-15]. These frameworks have been applied successfully to many tasks (e.g., generative tasks [16], dynamics reconstruction [17-20], intelligent control problems [21,22], and tipping-point detection [23,24]), sharing a common design idea: an appropriate loss function that enforces the model to nearly obey the physical principles. Although notable progress has been achieved, these frameworks, which either enlarge the network complexity or overfit the noisy data during training to decrease the loss, suffer from poor generalization. Recently, endowing NNs with natural physics priors has become an effective approach to promote the sample efficiency, robustness, and generalization ability of NNs [25-29].
Recent advances in the Koopman operator theory pave a direct way to identify intrinsic kinetics using infinite-dimensional linear representations for strongly nonlinear systems [30-35]. Several algorithms using observational data have been developed for approximating such an operator, including the dynamic mode decomposition (DMD) [32,36,37] and the extended dynamic mode decomposition (EDMD) [38,39]. Although all these algorithms try to obtain parsimonious models and maintain accurate reconstructions of the unknown systems, neither conservation properties nor accurate prediction is guaranteed. In this Letter, inspired by the advances of physics-informed learning and the Koopman operator theory, we articulate a framework to efficiently and robustly learn Hamiltonian dynamics based solely on observational data, even with noise perturbations. Noticing the unitary property of the Koopman operator for Hamiltonian dynamics, we use an orthogonal neural network to approximate the Koopman operator, so that the learned operator naturally sustains the conservation laws. We also include an auto-encoder structure in the NNs to identify the nonlinear coordinate transformation, mapping the original data to a low-dimensional manifold on a sphere. We test the proposed HNKO framework on a group of representative Hamiltonian systems and show its advantages over many mainstream methods in several aspects, including robust preservation of the conservation laws and accurate prediction of the dynamical behaviors.
Method.- To begin with, we consider a dynamical system whose state vector evolves along some smooth symplectic vector field f(x). The flow maps produced by this Hamiltonian dynamical system form an operator group, denoted by {F_t : F_t(x) = x + ∫_0^t f(x(s)) ds, x ∈ M, t > 0}. The Koopman operator K_t with regard to any flow F_t is an infinite-dimensional linear operator, acting on the function space F = {g : M → R} and satisfying K_t g = g ∘ F_t for g ∈ F. Specifically, if the time interval Δt and the initial state x_0 ∈ M are given, a state trajectory is generated by this flow, denoted as {x_k : x_k = F_{kΔt}(x_0), k = 0, 1, …, m}. For the sake of simplicity, we write K for K_{Δt} when Δt is given. Since, in practice, the observational data {x_i}_{i=0}^m often contain noise perturbations, it is really difficult for the existing methods to achieve a robust and accurate approximation of K and simultaneously preserve the energy-like quantity of the considered Hamiltonian dynamics. This motivates us to find a framework owning all these capabilities.
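As a concrete toy illustration of the composition property K_Δt g = g ∘ F_Δt (our own sketch, not part of the paper: the harmonic oscillator q̇ = p, ṗ = −q stands in for a generic Hamiltonian flow, whose time-Δt flow map is a rotation):

```python
import numpy as np

# The time-dt flow of the harmonic oscillator q' = p, p' = -q is a rotation.
def flow(x, dt):
    c, s = np.cos(dt), np.sin(dt)
    return np.array([[c, s], [-s, c]]) @ x

# The Koopman operator acts on an observable g by composition with the flow.
def koopman(g, dt):
    return lambda x: g(flow(x, dt))

g = lambda x: x[0]              # observable: the position coordinate
Kg = koopman(g, 0.1)            # (K_dt g)(x) = g(F_dt(x))

x0 = np.array([1.0, 0.0])
traj = [x0]
for _ in range(5):
    traj.append(flow(traj[-1], 0.1))

# Evaluating Kg along the trajectory reproduces g one step ahead, and the
# Hamiltonian flow preserves the energy (here, the norm of the state).
shifted_ok = np.allclose([Kg(x) for x in traj[:-1]], [g(x) for x in traj[1:]])
energy_ok = np.allclose([np.linalg.norm(x) for x in traj], 1.0)
print(shifted_ok, energy_ok)
```

The norm preservation in this linear toy case is exactly the unitarity that the HNKO exploits below.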
The DMD algorithm [32,36,37] constructs two data matrices from the observational data, X = [x_0, x_1, …, x_{m−1}] and X′ = [x_1, x_2, …, x_m]. Then, according to [36], the optimal linear operator K = X′X⁺ ∈ R^{n×n} satisfying KX ≈ X′ is regarded as an approximation of K, where X⁺ = (X^⊤X)^{−1}X^⊤ is the Moore-Penrose pseudoinverse. In particular, since the Koopman operator for conservative Hamiltonian dynamics is unitary, we should restrict the candidate K to the special orthogonal group SO(n). The DMD then actually solves the vanilla optimization problem arg min_{K∈SO(n)} ∥KX − X′∥_F, where ∥·∥_F is the Frobenius norm. The vanilla surrogate of the DMD has two major weaknesses: (i) the size of K is limited by the system's dimension n, which is not large enough to approximate the intrinsically infinite-dimensional operator K, and (ii) an orthogonal transformation preserves the norm of the state and hence constrains the induced dynamics {K^i x_0}_{i=0}^m to a sphere, while the conserved orbit of the original Hamiltonian dynamics may not lie on any n-dimensional sphere.
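A minimal numerical sketch of this DMD surrogate (illustrative, not the authors' implementation): we generate snapshots from a planar rotation, for which the true one-step operator is itself orthogonal, and recover it via the least-squares solution K = X′X⁺.

```python
import numpy as np

# Snapshot data from a planar rotation; the true one-step map is orthogonal.
dt = 0.1
R = np.array([[np.cos(dt), np.sin(dt)], [-np.sin(dt), np.cos(dt)]])
xs = [np.array([1.0, 0.3])]
for _ in range(50):
    xs.append(R @ xs[-1])

X = np.column_stack(xs[:-1])    # X  = [x_0, ..., x_{m-1}]
Xp = np.column_stack(xs[1:])    # X' = [x_1, ..., x_m]

# Least-squares DMD operator K = X' X^+ with K X ~ X'.
K = Xp @ np.linalg.pinv(X)
recovered = np.allclose(K, R)                   # recovers the rotation
orthogonal = np.allclose(K.T @ K, np.eye(2))    # and it is norm-preserving
print(recovered, orthogonal)
```

On noise-free linear data the fit is exact; with noise, plain DMD loses the orthogonality that the HNKO enforces by construction.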
To overcome the first weakness, the EDMD was developed in [38,39] to lift the dimension of K by introducing a dictionary of nonlinear observation functions {g_i}_{i=1}^p and obtaining the augmented state y = (g_1(x), …, g_p(x))^⊤. Analogous to the DMD, the two data matrices are constructed as Y = [y_0, …, y_{m−1}] and Y′ = [y_1, …, y_m], which gives an approximated Koopman operator K = Y′Y⁺ ∈ R^{p×p}. However, the EDMD uses the dictionary {g_i}_{i=1}^p as a basis of F and, in practice, requires an extremely large dictionary to approximate the coordinate functions, which makes accurate approximation inefficient and even impossible. To reduce the computational cost and promote the representation ability, we adopt an auto-encoder NN to encode y = (g_1(x), …, g_p(x))^⊤ as y = ϕ_{θ1}(x) and decode the coordinates as x = ϕ^{−1}_{θ2}(y), as shown in Fig. 1(a). We train the weights θ = (θ1, θ2) of this auto-encoder using the reconstruction loss L_recon(θ) = Σ_i ∥x_i − ϕ^{−1}_{θ2}(ϕ_{θ1}(x_i))∥² on the data.
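A hedged sketch of the EDMD lifting (the dictionary below is our own choice for illustration, not the paper's): for a planar rotation, the monomials up to degree 2 span a Koopman-invariant subspace, so the lifted linear operator K = Y′Y⁺ reproduces the lifted data exactly.

```python
import numpy as np

# Dictionary {g_i}: monomials of (q, p) up to degree 2, giving p = 5 observables.
def lift(x):
    q, p = x
    return np.array([q, p, q * q, q * p, p * p])

dt = 0.1
R = np.array([[np.cos(dt), np.sin(dt)], [-np.sin(dt), np.cos(dt)]])
xs = [np.array([1.0, 0.3])]
for _ in range(60):
    xs.append(R @ xs[-1])

Y = np.column_stack([lift(x) for x in xs[:-1]])   # Y  = [y_0, ..., y_{m-1}]
Yp = np.column_stack([lift(x) for x in xs[1:]])   # Y' = [y_1, ..., y_m]

# Lifted 5x5 Koopman approximation; exact here because the monomials of a
# rotation close under the dynamics (a Koopman-invariant subspace).
K = Yp @ np.linalg.pinv(Y)
exact_on_data = np.allclose(K @ Y, Yp, atol=1e-8)
print(exact_on_data)
```

For generic nonlinear systems no small polynomial dictionary closes like this, which is the inefficiency the auto-encoder is meant to remove.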
To conquer the second weakness, we embed the augmented state y onto some higher-dimensional sphere, denoted by S^p(r) = {y ∈ R^p | ∥y∥² = r²}, and obtain the constrained optimization problem arg min_{K∈SO(p)} ∥KY − Y′∥_F subject to the augmented states lying on S^p(r). Actually, it is still difficult to solve this optimization problem because of its highly nonconvex nature induced by the orthogonality constraint.
To solve this problem efficiently, we first note that the Lie exponential map exp: so(n) → SO(n), from the Lie algebra of skew-symmetric matrices to the special orthogonal group, is surjective [27]. There exists an isomorphism α from R^{n(n−1)/2} to so(n), α(A) = A − A^⊤, where A ∈ R^{n(n−1)/2} is identified with an upper-triangular matrix with zero diagonal elements. Thus, in our framework, we represent the orthogonal Koopman operator in the parameterized form K = exp(α(A)), where A owns n(n−1)/2 learnable parameters. We then train K using the prediction loss L_pred(K, θ) = Σ_i ∥K y_i − y_{i+1}∥². Significantly, our framework does not require the uninterpretable regularization terms in the loss function that all the conventional methods use [1,3]; indeed, the above configuration from mathematical physics automatically guarantees the orthogonality of the operator K throughout the training procedure.
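The orthogonality-by-construction trick can be checked in a few lines (a sketch, not the training code): for any values of the n(n−1)/2 free parameters in the strictly upper-triangular A, the matrix exp(A − A^⊤) lies in SO(n).

```python
import numpy as np
from scipy.linalg import expm

# Parameterize K = exp(A - A^T): A holds n(n-1)/2 free ("learnable") entries
# in its strictly upper triangle, and A - A^T is skew-symmetric.
rng = np.random.default_rng(0)
n = 4
A = np.triu(rng.standard_normal((n, n)), k=1)
K = expm(A - A.T)   # matrix exponential of a skew-symmetric matrix

orthogonal = np.allclose(K.T @ K, np.eye(n), atol=1e-10)   # K^T K = I
special = np.isclose(np.linalg.det(K), 1.0)                # det K = 1
print(orthogonal, special)
```

No projection or regularization is needed: gradient updates on A move K along SO(n) automatically, which is the point made in the text.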
To further ensure that the augmented state trajectory {y_i}_{i=0}^m lies on some p-dimensional sphere, we set the radius r of the embedded sphere as a learnable parameter and penalize the distance to the origin with another loss, L_sphere(θ, r) = Σ_i (∥y_i∥² − r²)². We prove that the trajectory generated by K ∈ SO(p) belongs to a manifold of dimension at most ⌊p/2⌋ [see Supplementary Information (SI)]. Hence, the degree of freedom of the trajectory {y_i}_{i=0}^m is no larger than ⌊p/2⌋. Notice that, once a hyperplane equation ⟨v, y⟩ = 0 holds for some nonzero vector v, the degree of freedom of y decreases by one. Thus, to restrict the degree of freedom of the augmented states, we introduce the loss L_deg(V, θ) = Σ_i ∥V^⊤ y_i∥², where V = (v_1, …, v_q) ∈ R^{p×q} and q ∈ Z⁺. To guarantee the linear independence of the column vectors of V, we introduce an orthogonal regularization term L_orth(V) = ∥V^⊤V − I_q∥_F. Finally, we train the parameters {θ, K, r, V} in a delicately designed framework of NNs, integrating the prior knowledge of mathematical physics and using the comprehensive loss L = L_recon + L_pred + L_sphere + L_deg + L_orth, where L_sphere together with L_deg embeds the original n-dimensional data into a (p − q − 1)-dimensional manifold on S^p(r), and the adjustable hyperparameter q satisfies p − ⌊p/2⌋ − 1 ≤ q ≤ p − 2. Significantly, the extracted features {g_i(x)}_{i=1}^p from the encoder span the Koopman invariant subspace. The eigenvector c = (c_1, …, c_p)^⊤ associated with the eigenvalue 1 of the orthogonal Koopman matrix K induces the analytical Hamiltonian of the original system, g_c(x) = Σ_{i=1}^p c_i g_i(x) (see SI).
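The discovery mechanism in the last step can be illustrated with a toy computation (our own sketch, not the paper's code): for an orthogonal K, an eigenvector c with eigenvalue 1 yields an observable g_c(y) = ⟨c, y⟩ that is constant along the linear dynamics y_{k+1} = K y_k. Here K is a 3D rotation built with Rodrigues' formula, whose fixed axis is exactly that eigenvector.

```python
import numpy as np

# A 3D rotation about a unit axis, via Rodrigues' formula:
# K = I + sin(t) W + (1 - cos(t)) W^2, with W the cross-product matrix.
axis = np.array([1.0, 2.0, 2.0]) / 3.0   # unit vector
W = np.array([[0.0, -axis[2], axis[1]],
              [axis[2], 0.0, -axis[0]],
              [-axis[1], axis[0], 0.0]])
t = 0.7
K = np.eye(3) + np.sin(t) * W + (1.0 - np.cos(t)) * (W @ W)

# Eigenvector of K for eigenvalue 1 (the rotation axis).
w, V = np.linalg.eig(K)
c = np.real(V[:, np.argmin(np.abs(w - 1.0))])

# g_c(y) = <c, y> is conserved along y_{k+1} = K y_k.
y = np.array([0.3, -1.2, 0.8])
vals = [float(c @ np.linalg.matrix_power(K, k) @ y) for k in range(10)]
conserved = np.allclose(vals, vals[0])
print(conserved)
```

In the HNKO the same eigenvector, pulled back through the encoder features g_i, gives the analytical conserved quantity g_c(x).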
Scalability for high-dimensional systems.- To reduce the computational complexity of the Lie exponential operation when applying the current HNKO to a high-dimensional system, we approximate the p-dimensional Koopman matrix K via K_1 ⊗ K_2, where K_i ∈ R^{p_i×p_i} (i = 1, 2) are two orthogonal matrices with p_1 p_2 = p, implying the orthogonality of K [40], and ⊗ is the Kronecker product. We name this extension the HNKO*. As such, by applying the HNKO* with p_{1,2} = √p, we reduce the number of learnable parameters from O(p²) to O(p) (see Appendix for more illustrations).
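The algebraic fact behind the HNKO* scaling can be verified directly (an illustrative sketch): if K_1 and K_2 are orthogonal, so is K_1 ⊗ K_2, while it carries only p_1(p_1−1)/2 + p_2(p_2−1)/2 free parameters instead of p(p−1)/2.

```python
import numpy as np

# Two small orthogonal factors (planar rotations).
def rotation(t):
    return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

K1, K2 = rotation(0.3), rotation(1.1)
K = np.kron(K1, K2)   # p = p1 * p2 = 4 here

# (K1 (x) K2)^T (K1 (x) K2) = (K1^T K1) (x) (K2^T K2) = I (x) I = I.
kron_orthogonal = np.allclose(K.T @ K, np.eye(4), atol=1e-12)
print(kron_orthogonal)
```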
Next, we numerically demonstrate several advantages of our framework and validate the natural existence of the conservation in our operator, which makes the framework extremely suitable for dealing with noise-perturbed data produced by Hamiltonian dynamics. We take a number of representative nonlinear Hamiltonian systems of physical significance, including the many-body problem, the stiff spring oscillator, and the Korteweg-De Vries (KdV) equation. Throughout, the noise-perturbed data are set as x̃_i = x_i + ξ_i, where the information on the noise {ξ_i} is provided in the SI.
Many-body problem.- We consider the classic n-body problem. First, we focus on the case n = 3, with the canonical Hamiltonian dynamics [41] q̇_i = ∂H/∂p_i, ṗ_i = −∂H/∂q_i, where q_i, p_i ∈ R² (i = 1, 2, 3) represent the space coordinates and the momenta, respectively. Here, H = Σ_i ∥p_i∥²/(2m_i) − Σ_{i<j} g m_i m_j/∥q_i − q_j∥ is the total conserved energy with masses m_i and gravitational constant g. The three bodies interact through the attractive gravitational force, which tends to infinity when two particles get close to each other. As shown in Fig. 2(a), the orbits of the three bodies evolve in a higher-dimensional spatiotemporal space, which poses a challenge for accurate prediction based on noise-perturbed observational data. We show the prediction performance of the HNKO framework. Particularly, we train the NNs with the noise-perturbed data on the time interval [0, 5], less than one period, and produce trajectories using the trained model on the interval [0, 50]. As shown in Fig. 2(b), the reconstructed and predicted dynamics are preserved in high fidelity for a fairly long time [42]. We also test the EDMD with a dictionary of at most 2nd-order Hermite polynomials [38], the HNN [1], the SympNets [43], the deep-learning Koopman operator (DLKO) [33], the CNN-LSTM [44,45], the Hamiltonian ODE graph networks (HOGN) [18], and reservoir computing (RC) [46,47] on the same noise-perturbed data. The produced trajectories [see Figs. 2(c)-2(i)] either fail to sustain the conservation laws or diverge extremely fast. Figure 2(j) displays a successful discovery of the Hamiltonian using the HNKO. Additional comparisons in Fig. 2(k) suggest that the HNKO framework attains the smallest prediction errors and is the only method that maintains the conservation law. In the SI, we further show successful examples using our framework when the length of the training data within a period is even shorter, and also show the reconstruction ability of the HNKO in dealing with the chaotic dynamics of the three-body system.
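The conserved quantity at stake here can be checked directly. Below is a hedged sketch (our own code, with m_i = g = 1 as in the figure and approximate figure-eight initial conditions, not the authors' pipeline): evaluating H along a symplectic (leapfrog) integration keeps it nearly constant, which is precisely the property the orthogonal Koopman parameterization preserves by construction.

```python
import numpy as np

# H = sum_i |p_i|^2 / (2 m) - sum_{i<j} g m^2 / |q_i - q_j|, with m = g = 1.
def hamiltonian(q, p, m=1.0, g=1.0):
    n = q.shape[0]
    kinetic = np.sum(p**2) / (2.0 * m)
    potential = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            potential -= g * m * m / np.linalg.norm(q[i] - q[j])
    return kinetic + potential

def accel(q, m=1.0, g=1.0):
    n = q.shape[0]
    a = np.zeros_like(q)
    for i in range(n):
        for j in range(n):
            if i != j:
                d = q[j] - q[i]
                a[i] += g * m * d / np.linalg.norm(d)**3
    return a

# Approximate figure-eight initial data (truncated from the known values).
q = np.array([[0.97, -0.243], [-0.97, 0.243], [0.0, 0.0]])
p = np.array([[0.466, 0.432], [0.466, 0.432], [-0.932, -0.864]])
E0, dt = hamiltonian(q, p), 1e-3

# Leapfrog (kick-drift-kick), a symplectic scheme: H stays nearly constant.
for _ in range(2000):
    p += 0.5 * dt * accel(q)
    q += dt * p            # m = 1, so dq/dt = p
    p += 0.5 * dt * accel(q)
drift = abs(hamiltonian(q, p) - E0)
print(drift < 1e-4)
```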
Next, we test our framework on a specific case, n = 1: the classic Kepler problem q̇ = ∂H/∂p, ṗ = −∂H/∂q with H = (m/2)∥p∥₂² − gm²∥q∥₂⁻¹ [48]. We also compare this framework with the other methods. As shown in Figs. 3(a)-3(d), the HNKO framework obtains the best prediction result and maintains the conservation law perfectly, while the corresponding results obtained by the EDMD and the HNN are far from the original dynamics and energies. Notice that, for this case, the results of the EDMD seem better than those in Fig. 2(c) obtained for the case n = 3. This suggests that the EDMD with a 2nd-order dictionary can provide acceptable predictions for complex systems of lower dimension. However, due to the curse of dimensionality, it cannot directly use a dictionary of higher-order polynomials to deal with data produced from even a three-body problem, which thus limits its practical usefulness. In the SI, we further show the efficacy of the HNKO framework in reconstruction and prediction for the chaotic three-body case and for the n-body problem with n ≫ 3 as well. Particularly, to show the advantages of the HNKO over the existing methods, we reconstruct the periodic solution, previously found in [49], of the 24-body problem in three-dimensional space (see Appendix Tab. I). In addition, the robustness of the HNKO against noise perturbations is demonstrated for both state and energy predictions, as shown in Fig. 3(e). Significantly, Fig. 3(f) shows a compelling and successful identification of the Hamiltonian using the HNKO.
Stiff mass-spring system.- We consider the friction-free mass-spring system q̇ = p/m, ṗ = −kq, where (q, p) ∈ R² are the canonical coordinates representing the position and the momentum, m is the mass, and k is the elastic coefficient [50]. With large k and m, the system becomes a typical slow-fast system with stiffness coefficient (SC) √(km) [51]. The system's conserved total energy (1/2)kq² + (1/(2m))p² corresponds to elliptic phase orbits whose eccentricity approaches 1 as the SC goes to infinity. As shown in Fig. 4(a), under a high SC the trajectory looks like a noise series wandering along a line segment, which leads to failure of reconstruction and prediction by ODE-solver-based learners [51].
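The geometric meaning of the stiffness coefficient can be made concrete (a sketch under our own normalization of the energy): the orbit (1/2)kq² + p²/(2m) = E is an ellipse with semi-axes √(2E/k) in q and √(2mE) in p, so the axis ratio equals √(km) and the eccentricity tends to 1 as the SC grows.

```python
import numpy as np

# Eccentricity of the constant-energy ellipse of the mass-spring system.
def eccentricity(k, m, E=1.0):
    aq = np.sqrt(2.0 * E / k)    # semi-axis in q
    ap = np.sqrt(2.0 * m * E)    # semi-axis in p; ap / aq = sqrt(k m) = SC
    b, a = sorted([aq, ap])
    return np.sqrt(1.0 - (b / a)**2)

for sc in (1, 10, 100):          # SC = sqrt(km); take k = m = sc
    print(sc, eccentricity(k=sc, m=sc))
```

At SC = 1 the orbit is a circle; at SC = 100 the ellipse is already visually indistinguishable from a line segment, matching the wandering trajectories in Fig. 4(a).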
We compare the prediction performance of the HNKO under different SCs with existing methods, including STEER, a method designed for stiff ODE problems [52]. Figures 4(b)-4(d) show that the HNKO alleviates the effects of stiffness, while the other methods perform unstably and diverge quickly as the SC grows.
KdV equation.- We finally consider the KdV equation u_t + u_xxx − 6uu_x = 0, a Hamiltonian partial differential equation with infinitely many integrals of motion (IOMs) [53]. This equation describes the behavior of shallow-water waves with periodic solitary-wave phenomena [54], as shown in Fig. 5(a). Two low-order IOMs of the system, ∫u dx and ∫u² dx, corresponding to the mass and the energy conservation, are selected as the indexes to evaluate the prediction performance. We conduct comparison studies using the HNKO, the DMD, and the neural ordinary differential equation (NODE), a recently developed and widely used framework [4]. The HNN and the EDMD cannot be applied here [55]. As shown in Figs. 5(a)-5(d), the HNKO robustly keeps the periodicity and the consistency of the solitary-wave solutions for a long time, while the DMD only holds a short-term forecast, showing a rapid decay, and the trained NODE shows high fluctuations owing to its weaker robustness against noise. After a sufficiently long time evolution, the required solitary-wave behavior is observed only in the model trained with the HNKO, as shown in Fig. 5(e). More significantly, owing to the orthogonality rooted in the HNKO, the conservation laws of both mass and energy are sustained only in the HNKO-trained model [see Figs. 5(f)-5(h)]. In Appendix Tab. II, we further apply the HNKO* successfully to high-dimensional tasks with as many as 1024 grid points.
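The two evaluation indexes can be checked numerically on the standard one-soliton solution u(x, t) = −(c/2) sech²(√c (x − ct)/2) of u_t + u_xxx − 6uu_x = 0 (a hedged sketch; assumptions: this closed-form soliton and a truncated domain wide enough for the tails to be negligible).

```python
import numpy as np

c = 1.0
x = np.linspace(-30.0, 30.0, 4001)
dx = x[1] - x[0]

def soliton(t):
    # One-soliton solution of u_t + u_xxx - 6 u u_x = 0 with speed c.
    return -0.5 * c / np.cosh(0.5 * np.sqrt(c) * (x - c * t))**2

# The two low-order IOMs: mass = int u dx and energy = int u^2 dx.
mass = [np.sum(soliton(t)) * dx for t in (0.0, 5.0)]
energy = [np.sum(soliton(t)**2) * dx for t in (0.0, 5.0)]
print(mass, energy)   # both are (numerically) independent of t
```

For c = 1 the exact values are ∫u dx = −2√c = −2 and ∫u² dx = (2/3)c^{3/2} = 2/3, so drift in these quantities is an immediate diagnostic of a non-conservative trained model.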
Concluding remarks.- We have proposed a machine-learning framework to approximate the Koopman operator for Hamiltonian dynamics. Different from the mainstream NN methods [1,3,56], our framework integrates the typical mathematical-physical structures of orthogonality and flexibility, and thus has natural advantages in accurately reconstructing Hamiltonian dynamics from noise-perturbed data while simultaneously achieving accurate prediction with perfect energy conservation [57,58]. It can therefore be applied to reconstruction and prediction problems in real-world scenarios where the conservation laws persist but the collected data are more or less shuffled and contaminated. More importantly, based on the Koopman invariant subspace spanned by the features of the encoder and the orthogonal spectral decomposition of the Koopman matrix, we show that the HNKO has the ability to discover conservation laws with analytical expressions, which makes an essential step towards solving the critical problem of discovering conservation laws [26,56,59,60].

FIG. 1. A sketch of the HNKO framework. (a) A combination of the auto-encoder with the orthogonal Koopman matrix K using NNs. (b) Geometrically, the encoder embeds the original data in R^n into a low-dimensional manifold on a p-dimensional sphere, and the decoder reverses this process, where K maps the trajectory on the embedded manifold.

FIG. 2. Comparison studies on the three-body problem. (a) The original noise-free dynamics of the interacting bodies. Here, the motion q_i = (q_i¹, q_i²), and the trajectories are the projections from the original spatiotemporal space q_i¹-q_i²-t onto the phase plane with normal vector (sin(−π/50)cos(π/4), sin(−π/50)sin(π/4), cos(−π/50)). The grey straight line indicates the time direction, and the terminal positions of the three bodies are highlighted in blue, purple, and orange, respectively. The reconstructed and predicted dynamics using the HNKO (b) and the other state-of-the-art machine-learning techniques (c)-(i) are shown, respectively. (j) The temporal variance on a logarithmic scale (log(var)) of the features and the discovered Hamiltonian. (k) The system's energies over time for the different methods. Here, we set m_{1,2,3} = g = 1.

FIG. 3. Comparison studies on the Kepler problem. (a) The original dynamics and the predicted phase orbits using different methods, with coordinate q = (q¹, q²). The changes of the kinetic energy E_k (b), the potential energy E_p (c), and the total energy E (d) over time for the different methods. (e) Prediction errors of the state and the energy versus the noise variance σ², using the HNKO. Here, m = g = 1. (f) log(var) of the features and the discovered Hamiltonian.

FIG. 4. Comparison studies on the stiff mass-spring system. (a) The mean squared error (MSE_data) between the normalized trajectories and the vertical line segment under different SCs. The subfigures show the trajectories in the least stiff and the stiffest cases. (b) The mean prediction MSE on the time interval [0, 9] for the different methods over SCs. The prediction MSE (c) and the energy (d) over time for the different methods.

FIG. 5. Comparison studies on the KdV equation. (a) The dynamics produced by the KdV equation, perturbed with Gaussian noise N(0, 0.03I). The predicted trajectories using the DMD (b), the NODE (c), and the HNKO (d). Different solitary waves produced by the different methods at t = 400 (e); the prediction error of the state (f), the total mass (g), and the total energy (h) over time for the different methods. The inset in (g) zooms in on the changes of mass using the NODE and the HNKO. Here, we discretize u(t, ·) on [0, 50] with 64 predefined grid points.