Adaptable Hamiltonian neural networks

The rapid growth of research exploiting machine learning to predict chaotic systems has revived interest in Hamiltonian Neural Networks (HNNs), a major class of physics-enhanced neural networks whose physical constraints are defined by Hamilton's equations of motion. We introduce a class of HNNs capable of adaptable prediction of nonlinear physical systems: by training the neural network on time series from a small number of bifurcation-parameter values of the target Hamiltonian system, the HNN can predict the dynamical states at other parameter values, even though the network has not been exposed to any information about the system at those values. The architecture differs from previous HNNs in that we incorporate an input parameter channel, rendering the HNN parameter-cognizant. We demonstrate, using paradigmatic Hamiltonian systems, that training the HNN with time series from as few as four parameter values bestows the neural machine with the ability to predict the state of the target system in an entire parameter interval. Using the ensemble maximum Lyapunov exponent and the alignment index as indicators, we show that our parameter-cognizant HNN can successfully predict the route of transition to chaos. Physics-enhanced machine learning is a forefront area of research, and our adaptable HNNs provide an approach to understanding machine learning with broad applications.


I. INTRODUCTION
A daunting challenge in machine learning is the lack of understanding of the inner workings of artificial neural networks. As machine learning is increasingly incorporated into the vital structures and systems that support the functioning of modern society, it is imperative to develop a general understanding of the inner gears of the underlying neural networks. For example, feed-forward neural networks or multilayer perceptrons constitute the fundamentals of modern deep learning machines, with broad applications in image, video, and audio processing [1]. Such a neural machine typically consists of an input layer, a large number of hidden layers, and an output layer. From the input layer on, nodes in the same layer do not interact with each other, but they are connected with the nodes in the next layer via a set of weights and biases whose values are determined through training, where the paradigmatic method of stochastic gradient descent (SGD) [2] is often used. How the networks in different layers work together to solve a specific problem remains unknown. In another line of research, reservoir computing, a class of recurrent neural networks [3][4][5][6], has gained considerable momentum since 2017 as a powerful paradigm for model-free, fully data-driven prediction of nonlinear and chaotic dynamical systems [7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22][23]. A reservoir computing machine consists of an input layer, a single hidden layer, and an output layer. Differing from the network structure of a multilayer perceptron, the network in the hidden layer of a reservoir computing machine has a complex topology in which the nodes are coupled with each other following some probability distribution.
* Ying-Cheng.Lai@asu.edu
Another difference lies in the training: in feed-forward neural networks, the weights and biases of all layers are determined by training, whereas in reservoir computing only the weights and biases connecting the hidden layer to the output layer are trained, with the input weights and the weights of the complex network in the hidden layer pre-defined. A well-trained reservoir machine can generate accurate predictions of the state evolution of a chaotic system for a duration that is typically several times longer than what can be achieved using traditional methodologies of nonlinear time series analysis. This is remarkable, considering the hallmark of chaos: sensitive dependence on initial conditions, which rules out long-term prediction. Yet, there is little understanding of how the internal network dynamics of reservoir computing machines behave or "manage" to replicate accurately (for some amount of time) the chaotic evolution of the true system.
At present, it is not feasible to develop a general explainable framework encompassing the various types of machine learning. In this regard, a promising direction of pursuit is the so-called physics-enhanced machine learning, in which the neural networks are designed to solve specific physics problems with the goal of enhancing the learning efficiency by exploiting the underlying physical principles or constraints. The idea was articulated almost three decades ago [24], when the principle of Hamiltonian mechanics was incorporated into the design of neural networks, leading to Hamiltonian Neural Networks (HNNs) that have recently gained renewed attention [25][26][27][28][29]. Compared with traditional neural networks, an HNN conserves energy. It has been demonstrated that an HNN can be trained to predict the dynamical evolution of the target Hamiltonian system in both integrable and chaotic regimes, provided that the network is trained with data taken from the same set of parameter values at which the prediction is to be made [25][26][27][28][29]. Recently, the principle of HNNs has been generalized [30] to systems described by the Lagrangian equations of motion [31], by general types of ordinary differential equations [32], or by coordinate transforms [33,34], with applications in robotics [35,36].
In this paper, we address adaptability, a fundamental issue in machine learning, for Hamiltonian neural networks. More precisely, we consider the situation where a target Hamiltonian system can experience slow drift or sudden changes in some parameters. Slow environmental variations can lead to adiabatic parameter drift, while external disturbances can lead to sudden parameter changes. We ask whether it is possible to design HNNs, trained with data from a small number of parameter values of the target system, that have predictive power for parameter values not in the training set. Inspired by recent work on predicting critical transitions and collapse in dissipative dynamical systems based on reservoir computing [37][38][39][40][41], we articulate a class of HNNs whose input layer contains a set of channels specifically used to input the values of the distinct parameters of interest to the neural network. The number of parameter channels is equal to the number of freely varying parameters in the target Hamiltonian system. The simplest case is where the target system has a single bifurcation or control parameter, so only one input parameter channel to the neural network is necessary. We demonstrate that, by incorporating such a parameter channel into a feed-forward type of HNN and conducting training using time series data from a small number of bifurcation parameter values (e.g., four), we effectively make the HNN adaptable to parameter variations. That is, the so-trained HNN has inherited the rules governing the dynamical evolution of the target Hamiltonian system. When a parameter value of interest, one not in the training set, is fed into the HNN through the parameter channel, the machine is capable of generating dynamical behaviors that statistically match those of the target system at this particular parameter value.
The HNN has thus become adaptable: it has never been exposed to any information or data from the target system at this parameter value, yet the neural machine can reproduce the dynamical behavior. Using the Hénon-Heiles model as a prototypical target Hamiltonian system, we demonstrate that our adaptable HNN can successfully predict the dynamical behaviors, integrable or chaotic, for any parameter values that are reasonably close to those in the training parameter set. Remarkably, by feeding a systematically varying set of bifurcation parameter values into the parameter channel, the HNN can successfully predict the transition to chaos in the target Hamiltonian system, which we characterize using two measures: the ensemble maximum Lyapunov exponent and the alignment index. It is worth emphasizing that, in the existing literature on HNNs [25][26][27][28][29], training and prediction are done at the same set of parameter values of the target Hamiltonian system; our work goes beyond this by making the HNN significantly more powerful, with enhanced and expanded predictability.
We remark that, in physics, machine learning has been exploited to solve difficult problems in particle physics [42,43], quantum many-body systems [44], inverse design of optical systems [45], and quantum information [46,47]. However, the working mechanisms of the underlying neural networks remain largely unknown [48]. The physics-enhanced HNNs studied here differ from these applications, as we focus on exploiting physical principles to endow neural networks with unprecedented predictive power with respect to parameter variations.
In Sec. II, we describe the architecture of the articulated parameter-cognizant HNNs and the method of training. In Sec. III, we present results of predicting the dynamical behavior of the Hénon-Heiles system in a wide parameter region, including the transition to chaos, based on calculating the ensemble maximum Lyapunov exponent and the minimum alignment index. In general, the prediction accuracy depends on how "close" the desired parameter value is to the training regime. In Sec. IV, we address a number of pertinent issues, such as the choice of training parameter values, multiple parameter channels, and HNNs for a Hamiltonian system defined by the one-dimensional Morse potential. A summarizing discussion and speculations are offered in Sec. V.

II. PARAMETER-COGNIZANT HAMILTONIAN NEURAL NETWORKS
The central idea of physics-enhanced machine learning is to "force" the dynamical evolution of the neural network to follow certain physical rules or constraints, examples of which are Hamilton's equations of motion [25][26][27][28][29], the Lagrangian equations [31], and the principle of least action [49,50]. In particular, the structure of HNNs is such that the underlying neural dynamical system is effectively a Hamiltonian system whose energy is conserved during the evolution. Different from previous work [25][26][27][28][29], the bifurcation parameter of the target Hamiltonian system serves as an input "variable" to the neural network through an additional input channel, so that the HNN learns to associate the input time series with the specific value of the bifurcation parameter. Trained with time series from a small number of distinct bifurcation parameter values, the HNN can gain the ability to "sense" the changes in the dynamics (or dynamical "climate") of the target system with the bifurcation parameter.
FIG. 1. Structure of parameter-cognizant HNN. The input channels are denoted by the blue circles, which are connected to the first hidden layer (purple circles). The blue circle denoted by "α" is the parameter input channel that feeds the value of the bifurcation parameter of the target Hamiltonian system, together with the time series q(t) and p(t) through the corresponding input channels, into the first hidden layer. There are two hidden layers. The output variables are the partial derivatives of the Hamiltonian of the target system with respect to the canonical coordinates and momenta, together with the Hamiltonian, which determine the dynamical state at the next time step.
The structure of our articulated parameter-cognizant HNN is shown in Fig. 1, where the input contains three parts: the position and momentum variables of the target system, and the bifurcation parameter. To be concrete, we use two hidden layers, each containing 200 artificial neurons (nodes). The third layer is the output, which contains a single node whose dynamical state corresponds to the Hamiltonian of the target system. Let y_i denote the set of dynamical variables of the ith layer. The transform from the dynamical variables in the ith layer to those in the (i + 1)th layer follows the rule

y_{i+1} = σ_i(W_i y_i + b_i),    (1)

where σ_i is a given nonlinear activation function, W_i is the weight matrix, and b_i is the bias vector associated with the neurons in the ith layer, which are to be determined through training. We set the output as the derivatives of the predicted Hamiltonian with respect to the input variables so as to force the dynamics of the neural network to follow Hamilton's equations of motion. The derivatives are calculated through the backpropagation algorithm. Once the output is known, the loss function, defined as

L = ‖∂H/∂p − dq/dt‖² + ‖∂H/∂q + dp/dt‖²,    (2)

can be calculated, where (q, p) denote the canonical coordinates and momenta and H is the Hamiltonian generated by the network. Through the training process, we optimize the weights and biases in Eq. (1) by minimizing the loss function, using the standard SGD method [2]. The whole process, from network construction and training to carrying out the prediction, is accomplished using the open-source packages TensorFlow and Keras [51,52]. Table I summarizes the structure and parameters of our HNN. It has about 40,000 unknown parameters to be optimally determined through training. The computation can be quite efficient even without parallel or GPU acceleration. The anticipation is that, after training with time series data from a small number of distinct bifurcation parameter values, the HNN can predict the dynamical behavior of the target Hamiltonian system in a wide parameter interval, where the parameter variations are implemented through the input parameter channel to the HNN.
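The layer rule and loss above can be sketched in plain NumPy. This is a minimal illustration, not the paper's implementation (which uses TensorFlow and Keras): the weights are random and untrained, a finite-difference gradient stands in for backpropagation, and only the layer sizes follow the two-hidden-layer, 200-neuron architecture described in the text.

```python
import numpy as np

# Parameter-cognizant HNN forward pass: input is (q1, q2, p1, p2, alpha),
# with alpha fed through the extra parameter channel; output is a scalar H.
rng = np.random.default_rng(0)
n_in, n_h = 5, 200
W1, b1 = rng.normal(0, 0.1, (n_h, n_in)), np.zeros(n_h)
W2, b2 = rng.normal(0, 0.1, (n_h, n_h)), np.zeros(n_h)
W3, b3 = rng.normal(0, 0.1, (1, n_h)), np.zeros(1)

def hnn_hamiltonian(z):
    """Forward pass y_{i+1} = sigma(W_i y_i + b_i); returns the scalar H."""
    y1 = np.tanh(W1 @ z + b1)
    y2 = np.tanh(W2 @ y1 + b2)
    return (W3 @ y2 + b3)[0]

def hnn_gradient(z, eps=1e-5):
    """dH/dz by central differences (backpropagation in the real scheme)."""
    g = np.zeros_like(z)
    for k in range(len(z)):
        zp, zm = z.copy(), z.copy()
        zp[k] += eps
        zm[k] -= eps
        g[k] = (hnn_hamiltonian(zp) - hnn_hamiltonian(zm)) / (2 * eps)
    return g

def hnn_loss(z, dqdt, dpdt):
    """Penalize deviation of (dq/dt, dp/dt) from the symplectic gradient of H."""
    g = hnn_gradient(z)
    dH_dq, dH_dp = g[0:2], g[2:4]      # the parameter channel carries no loss
    return np.sum((dH_dp - dqdt) ** 2) + np.sum((dH_dq + dpdt) ** 2)

z = np.array([0.1, -0.2, 0.3, 0.0, 0.5])   # (q1, q2, p1, p2, alpha)
loss = hnn_loss(z, dqdt=np.zeros(2), dpdt=np.zeros(2))
print(float(loss))
```

In the actual implementation, minimizing this loss over a batch of observed (state, derivative, parameter) triples by SGD determines the weights and biases.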

III. ADAPTABLE HAMILTONIAN NEURAL NETWORKS FOR PREDICTING TRANSITION TO CHAOS
To test the adaptability of our parameter-cognizant HNN in predicting state evolution and dynamical transitions in Hamiltonian systems, we use the paradigmatic Hénon-Heiles model [53]. It is a two-degree-of-freedom system for investigating distinct types of Hamiltonian dynamics, including integrable, mixed, and chaotic behaviors and the transitions among them. It originated from a model of stellar motion about a galactic center [53] and has found applications in contexts such as molecular dynamics [54][55][56].

A. System description
The Hénon-Heiles Hamiltonian is

H = (p1² + p2²)/2 + (q1² + q2²)/2 + α(q1²q2 − q2³/3),    (3)

where q1 and q2 denote the coordinates, p1 and p2 are the corresponding momenta, and α ≥ 0 is a bifurcation parameter that sets the magnitude of the nonlinear potential function, describing, e.g., the dissociation energy in molecules [54][55][56]. The dynamics of system Eq. (3) depend not only on α but also on the energy E, which is conserved during the dynamical evolution. The maximum value of the potential function is E_max = 1/(6α²). For α = 0, E_max diverges, so all trajectories are bounded. For α > 0, if the particle energy exceeds E_max, the Hamiltonian system becomes open, with scattering trajectories that can escape to infinity. To train the adaptable HNN, bounded trajectories are required, so we set 0 ≤ α ≤ 1 and E ≤ 1/6. (For particle energies above the threshold, chaotic scattering dynamics and fractal geometry can arise [57][58][59].) As the value of α increases from zero to one, characteristically different dynamical behaviors arise: integrable, mixed, and chaotic. In particular, for α = 0, the nonlinear term in Eq. (3) disappears and the system becomes a harmonic oscillator, an integrable system. In this case, the entire phase space contains only periodic and quasiperiodic orbits, as shown in Fig. 2(a). As α increases from zero, the system becomes nonlinear and chaotic seas amid the Kolmogorov-Arnold-Moser (KAM) islands can arise in the phase space, giving rise to mixed dynamics, as shown in Fig. 2(b) for α = 0.7 and E = 1/6. For α = 0.9 and α = 1 with E = 1/6, most trajectories in the phase space are chaotic, as shown in Figs. 2(c) and 2(d).
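For concreteness, here is a sketch of trajectory generation from Eq. (3): Hamilton's equations for the Hénon-Heiles system integrated with a fixed-step fourth-order Runge-Kutta scheme, with energy conservation as a sanity check. The step size and initial condition are illustrative choices, not the paper's settings.

```python
import numpy as np

# Hénon-Heiles equations of motion: dq/dt = dH/dp, dp/dt = -dH/dq from Eq. (3).
def deriv(z, alpha):
    q1, q2, p1, p2 = z
    return np.array([p1,
                     p2,
                     -q1 - 2.0 * alpha * q1 * q2,
                     -q2 - alpha * (q1**2 - q2**2)])

def energy(z, alpha):
    q1, q2, p1, p2 = z
    return 0.5*(p1**2 + p2**2) + 0.5*(q1**2 + q2**2) \
        + alpha*(q1**2*q2 - q2**3/3.0)

def rk4(z, alpha, dt):
    # Classical fourth-order Runge-Kutta step
    k1 = deriv(z, alpha)
    k2 = deriv(z + 0.5*dt*k1, alpha)
    k3 = deriv(z + 0.5*dt*k2, alpha)
    k4 = deriv(z + dt*k3, alpha)
    return z + dt*(k1 + 2*k2 + 2*k3 + k4)/6.0

alpha, dt = 0.7, 0.01
z = np.array([0.0, 0.1, 0.35, 0.0])     # bounded orbit: E < E_max = 1/(6 alpha^2)
e0 = energy(z, alpha)
for _ in range(10000):                  # integrate to t = 100
    z = rk4(z, alpha, dt)
print(abs(energy(z, alpha) - e0))
```

The energy drift over this interval should be far below the 10⁻⁶ tolerance quoted for the training data.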

B. Training and testing of adaptability
Our goal of training is to "instill" a certain adaptable power into the HNN. To achieve this, we choose a number of distinct values of the bifurcation parameter α. For each α value, we randomly choose initial conditions with energy below the escape threshold E_max and numerically integrate Hamilton's equations of motion to generate particle trajectories in the phase space. Because of the mixed dynamics, the training data contain both integrable and chaotic orbits. Specifically, each trajectory spans the time interval 0 ≤ t ≤ 1000, which contains hundreds of oscillation cycles, and we collect training data with the sampling time step dt = 0.1. The energy associated with the training data is conserved to within 10⁻⁶.
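One way to realize the random initial conditions at a prescribed energy is to sample positions with potential energy below the target and then fix the momentum magnitude so that the total energy matches exactly. The sampling box and the uniform momentum direction here are assumptions for illustration, not the paper's procedure.

```python
import numpy as np

def henon_heiles_V(q1, q2, alpha):
    # Potential part of Eq. (3)
    return 0.5*(q1**2 + q2**2) + alpha*(q1**2*q2 - q2**3/3.0)

def random_ic(alpha, E, rng):
    """Rejection-sample a position with V < E, then set |p| from 2(E - V)."""
    while True:
        q1, q2 = rng.uniform(-0.5, 0.5, size=2)   # assumed sampling box
        V = henon_heiles_V(q1, q2, alpha)
        if V < E:
            theta = rng.uniform(0.0, 2.0*np.pi)   # random momentum direction
            p = np.sqrt(2.0*(E - V))
            return np.array([q1, q2, p*np.cos(theta), p*np.sin(theta)])

rng = np.random.default_rng(1)
ic = random_ic(alpha=0.7, E=1.0/6.0, rng=rng)
print(ic)
```

By construction, the kinetic energy p²/2 equals E − V, so every sampled initial condition lies exactly on the target energy surface.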
In general, the weights and biases of the adaptable HNN determined by the SGD method depend on the training data set. To reduce the prediction error, an ensemble of HNNs can be used [41]. Concretely, for each value of α, we generate 20 different sets of data for training, leading to an ensemble of 20 HNNs. The parameter setting for training is listed in Tab. II.
After the training, all the weights and biases in Eq. (1) are determined. The Hamiltonian and its derivatives for each network in the HNN ensemble can then be evaluated for any input, and the derivative values are averaged over the ensemble. To characterize the prediction accuracy for different values of the bifurcation parameter, we use the root-mean-square error (RMSE) calculated from the difference between the HNN-predicted and the true orbits. For α = 0, the motion is integrable, so the predicted orbit is always close to some real orbit, leading to exceedingly small errors. In this case, we take advantage of a feature of HNNs: the network directly yields the Hamiltonian function, from which the potential function can be calculated. It is thus convenient to characterize the HNN performance by the relative error between the predicted and the true potential functions, defined as

∆V = ⟨|V_pred − V_real|⟩ / ⟨V_real⟩,    (4)

where the average ⟨·⟩ is taken over the region V_real < 1/6. The predicted potential profile is given by V_pred = H_pred − C with C = min(H_pred), so that the minimum value of V_pred is zero. Note that the average in Eq. (4) is calculated from an integral over a two-dimensional domain in the physical space, for which the boundary of the domain needs to be specified. A natural criterion to set the boundary would be V(q1, q2) < E_max = 1/6, but occasionally the predicted orbit diverges. Numerically, there are different ways to overcome this difficulty. For example, if the boundary is set according to the criterion max(V_pred, V_real) < 1/6, then almost all orbits are bounded, making the error ∆V calculable.

We demonstrate that the HNN can be used to reconstruct the Hamiltonian of the target system. Consider phase-space points with H(α, q1, q2, p1, p2) < 1/6 and expand the Hamiltonian about the origin as a Taylor series:

H_pred = Σ β_{i1 i2 i3 i4} q1^{i1} q2^{i2} p1^{i3} p2^{i4},    (5)

where the β's are the expansion coefficients and the sum is taken over 0 ≤ i1 + i2 + i3 + i4 ≤ 3, which contains 35 terms in total.
FIG. 3. (a) Relative error ∆V in the predicted potential function versus the bifurcation parameter α. Comparing with the true Hamiltonian Eq. (3), only six terms in the expansion are non-zero. In the shaded region that contains the four training values of α, the relative error is less than 2%, demonstrating the adaptability of the HNN in predicting the target Hamiltonian system for parameter values not in the training set. The adaptability extends even outside the shaded region, but with larger error (still within 8%). (b) Coefficients of the Taylor expansion for H_pred versus the bifurcation parameter α, where β2000, β0200, β0020, and β0002 correspond to the first four quadratic terms in Eq. (3), whose true value is 1/2, and β2100 and β0300 correspond to the two cubic terms that are proportional to α. Other terms in the expansion do not appear in the original Hamiltonian, among which the two largest are β3000 and β1200, corresponding to other cubic potential terms.
We train the HNN at four values of the bifurcation parameter: α ∈ {0.2, 0.4, 0.6, 0.8}. For each α value, we choose seven random initial conditions with energies below the threshold. Figure 3(a) shows the relative error in predicting the potential function for α ∈ (0, 1). The interval in α can be divided into two parts: the shaded region α ∈ [0.2, 0.8] that contains the four training values of α, and the blank regions on both sides of the shaded region. In the shaded region, the relative error is less than 2%, but the error increases away from the shaded region. Figure 3(b) shows the expansion coefficients for the predicted Hamiltonian. Compared with the terms in the real Hamiltonian, our HNN predicts the quadratic terms accurately. For the cubic terms, the HNN reproduces the variation with the bifurcation parameter α, where the errors are small in the shaded region in Fig. 3(b) but relatively large outside it. To examine the adaptability of our parameter-cognizant HNN in more detail, we take, for example, α = 0.7, in between the two training points α = 0.6 and α = 0.8, for which the vast majority of the orbits with energy E = 1/6 are quasiperiodic, as can be seen from Fig. 2(b). Figures 4(a) and 4(b) show the true and predicted potential functions for E < 1/6, respectively, which are essentially indistinguishable. Figures 4(c) and 4(d) show representative true and predicted orbits starting from the same initial condition, which agree with each other qualitatively but differ in detail. Particularly worth emphasizing is that, for the predicted quasiperiodic orbit, the energy is maintained at a constant value.
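The coefficient extraction behind Fig. 3(b) can be sketched as a linear least-squares fit of the 35-term cubic Taylor expansion described above. Here the true Hénon-Heiles Hamiltonian at α = 0.7 stands in for the network output H_pred, and the sampling range is an arbitrary choice; the fit recovers the quadratic coefficients 1/2 and the α-dependent cubic coefficients.

```python
import numpy as np
from itertools import product

# Fit H on sampled phase-space points to a cubic polynomial in
# (q1, q2, p1, p2); beta_{i1 i2 i3 i4} multiplies q1^i1 q2^i2 p1^i3 p2^i4.
alpha = 0.7
rng = np.random.default_rng(2)
Z = rng.uniform(-0.3, 0.3, size=(2000, 4))          # columns: q1, q2, p1, p2
H = (0.5*(Z[:, 2]**2 + Z[:, 3]**2) + 0.5*(Z[:, 0]**2 + Z[:, 1]**2)
     + alpha*(Z[:, 0]**2 * Z[:, 1] - Z[:, 1]**3 / 3.0))

# All exponent tuples with total degree <= 3 (35 terms)
powers = [idx for idx in product(range(4), repeat=4) if sum(idx) <= 3]
A = np.stack([np.prod(Z**np.array(idx), axis=1) for idx in powers], axis=1)
beta, *_ = np.linalg.lstsq(A, H, rcond=None)
coef = dict(zip(powers, beta))
print(coef[(2, 0, 0, 0)], coef[(2, 1, 0, 0)], coef[(0, 3, 0, 0)])
```

Since the target Hamiltonian lies exactly in the span of the monomial basis, the six non-zero coefficients are recovered to machine precision and the remaining 29 vanish.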
In fact, we have tested the method of reservoir computing [7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22][23] for predicting the orbit and found that, while it typically yields a more accurate orbit over short times (e.g., a few cycles), in the long run the energy is not conserved and the prediction error becomes large. Overall, since the testing bifurcation parameter value α = 0.7 is sandwiched between two training points, our parameter-cognizant HNN exhibits strong adaptability.
For α = 1 and energy E = 1/6, most of the orbits are chaotic, and the portion of the phase space occupied by KAM islands becomes relatively insignificant, as shown in Fig. 2(d). In this case, the contour map of the true potential function has a triangular shape, as shown in Fig. 5(a). The predicted potential contour map is shown in Fig. 5(b), which agrees reasonably well with the true one. Figures 5(c) and 5(d) show the true and predicted chaotic orbits from the same initial condition. While their details differ, the HNN predicts correctly that the orbit is chaotic. In fact, as will be shown in Sec. III C, for the two quantities characterizing the statistical behavior of the orbit, i.e., the maximum Lyapunov exponent and the alignment index, the predicted orbit yields the same results as the true orbit. In general, the closer the testing parameter value is to one of the training points, the higher the prediction accuracy.

C. Adaptable prediction of a Hamiltonian system
In a typical Hamiltonian system, the route of transition to ergodicity as a nonlinearity parameter increases is as follows [60]. In the weakly nonlinear regime, the system is integrable, where the motions are quasiperiodic and occur on tori generated by different initial conditions, as illustrated in Fig. 2(a) for the Hénon-Heiles system. As the nonlinearity parameter α increases, chaotic seas of various sizes emerge, leading to a mixed phase space, as exemplified in Fig. 2(b). In the regime of strong nonlinearity, e.g., α = 1, most of the phase space is occupied by chaotic seas, with only a small fraction still occupied by KAM islands, as shown in Figs. 2(c) and 2(d).
Here we provide strong evidence for the adaptability of our parameter-cognizant HNN by demonstrating that it can accurately predict the transition scenario, with training conducted on time series from only a handful of values of the nonlinearity parameter.
Distinct from dissipative systems, in which random initial conditions in the basin of attraction of an attractor (periodic or chaotic) lead to trajectories that all end up in the same attractor, in Hamiltonian systems different initial conditions typically lead to different dynamically invariant sets. Because of this feature, investigating the transition scenario requires computations from initial conditions across the whole phase space, leading to a statistical assessment and characterization of the resulting orbits. We focus on two statistical quantities: the largest Lyapunov exponent and the minimum alignment index, where the former characterizes the exponential separation rate of infinitesimally close trajectories and the latter measures the relative "closeness" of two arbitrary vectors along the trajectory (e.g., whether they become parallel, anti-parallel, or neither) [61]. For a chaotic trajectory, an infinitesimal vector stretches or contracts exponentially along the unstable or the stable direction, respectively. As a result, a random vector will approach the unstable direction along the trajectory, and two random vectors will quickly align with each other. In particular, given two initial deviation vectors u_1^0 and u_2^0, after i time steps they become u_1^i and u_2^i, respectively. The alignment index is defined as

γ_i = min(‖û_1^i − û_2^i‖, ‖û_1^i + û_2^i‖),    (6)

where û = u/‖u‖ denotes the normalized vector. When chaos sets in, the value of γ_i quickly approaches zero with time. For a properly trained HNN with its weights and biases determined, the output contains the predicted Hamiltonian, whose partial derivatives with respect to the coordinate and momentum vectors can be calculated directly from the architecture of the neural network. These partial derivatives constitute the velocity field of the underlying dynamical system, whose Jacobian matrix can then be determined, from which the machine-predicted Lyapunov exponents and the alignment index can be calculated (see Appendix A).
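The alignment-index computation can be sketched as follows, using the analytic Jacobian of the Hénon-Heiles flow in place of the HNN-derived one (in the actual scheme the Jacobian comes from derivatives of the predicted Hamiltonian). The initial condition, integration time, and step sizes are illustrative assumptions.

```python
import numpy as np

def flow(z, alpha):
    q1, q2, p1, p2 = z
    return np.array([p1, p2, -q1 - 2*alpha*q1*q2, -q2 - alpha*(q1**2 - q2**2)])

def jac(z, alpha):
    # Jacobian of the Hénon-Heiles velocity field, driving the deviation vectors
    q1, q2, _, _ = z
    return np.array([[0.0, 0.0, 1.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0],
                     [-1.0 - 2*alpha*q2, -2*alpha*q1, 0.0, 0.0],
                     [-2*alpha*q1, -1.0 + 2*alpha*q2, 0.0, 0.0]])

def sali(z0, alpha, t_max=100.0, dt=0.01):
    """Minimum alignment index: evolve the orbit (RK4) and two deviation
    vectors (Euler on the variational equations, renormalized each step),
    tracking gamma = min(|u1 - u2|, |u1 + u2|)."""
    z = z0.copy()
    u1, u2 = np.array([1.0, 0.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0, 0.0])
    gamma = np.inf
    for _ in range(int(t_max / dt)):
        k1 = flow(z, alpha); k2 = flow(z + 0.5*dt*k1, alpha)
        k3 = flow(z + 0.5*dt*k2, alpha); k4 = flow(z + dt*k3, alpha)
        J = jac(z, alpha)
        u1, u2 = u1 + dt*(J @ u1), u2 + dt*(J @ u2)
        u1, u2 = u1/np.linalg.norm(u1), u2/np.linalg.norm(u2)
        z = z + dt*(k1 + 2*k2 + 2*k3 + k4)/6.0
        gamma = min(gamma, np.linalg.norm(u1 - u2), np.linalg.norm(u1 + u2))
    return gamma

z0 = np.array([0.0, -0.2, 0.5366, 0.0])    # energy close to 1/6
g_regular = sali(z0, alpha=0.0)            # harmonic case: gamma stays O(1)
g_chaotic = sali(z0, alpha=1.0)            # strongly nonlinear case
print(g_regular, g_chaotic)
```

For α = 0 the two deviation vectors rotate in decoupled planes and γ stays at √2; for α = 1 at this energy most orbits are chaotic, so γ is expected to collapse toward zero, though the behavior depends on the particular initial condition.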
FIG. 6. (a-c) The ensemble maximum Lyapunov exponent λM, the ensemble minimum alignment index γm together with the threshold 10⁻⁸, and the fraction fc of chaos in the phase space versus the nonlinearity parameter α, respectively. The transition to chaos occurs at about α ≳ 0.7. The orange and blue colors correspond to the true and HNN-predicted results, respectively. There is a reasonable agreement between the predicted and true behaviors, attesting to the adaptable predictive power of the HNN.

The true values of the Lyapunov exponents and the minimum alignment index can be calculated directly from the original Hamiltonian (3) of the target system.
In our calculation, we take 100 equally spaced values of the bifurcation parameter in the unit interval α ∈ [0, 1]. For each α value, we choose 200 random initial conditions and calculate, for each initial condition, the largest Lyapunov exponent and the minimum alignment index. A trajectory is deemed chaotic [62] if the largest exponent is positive and the minimum alignment index is less than 10⁻⁸. We denote the maximum Lyapunov exponent and the minimum alignment index from the ensemble of 200 trajectories as λM and γm, respectively, which are functions of α. Another quantity of interest is the fraction of chaotic trajectories, denoted as fc, which also depends on α. The triplet of characterizing quantities, λM, γm, and fc, can be calculated from the HNN and from the original Hamiltonian as a function of α. A comparison can then be made to assess the adaptable predictive power of our parameter-cognizant HNN. Figures 6(a-c) show the machine-predicted and true values of λM, γm, and fc versus α, respectively, for particle energy E = 1/6. It can be seen that chaos arises for α ≳ 0.7, at which λM becomes positive, γm decreases to 10⁻⁸, and fc begins to increase from zero. In Fig. 6(a), the true value of λM for 0 ≤ α < 0.7 is essentially zero, but the HNN-predicted λM is slightly positive. The remarkable feature is that both the true and predicted values of λM begin to increase appreciably for α > 0.7. In fact, there is a reasonable agreement between the true and predicted behaviors of λM. Similar features are present in the behaviors of γm and fc versus α, as shown in Figs. 6(b) and 6(c), respectively. These results are strong evidence that our parameter-cognizant HNN is capable of adaptable prediction of distinct dynamical behaviors in Hamiltonian systems.

IV. ISSUES PERTINENT TO ADAPTABILITY OF HAMILTONIAN NEURAL NETWORKS
We address the adaptability of HNNs by asking the following three questions. First, can the adaptability of HNNs be enhanced by increasing the number of training values of the bifurcation parameter? Second, can adaptability be achieved with multiple parameter channels? Third, does adaptability hold for different target Hamiltonian systems?

A. Effect of number of training parameter values
So far, we have used four distinct values of the bifurcation parameter to train our parameter-cognizant HNN. We now investigate whether the adaptability can be enhanced by increasing the number of training parameter values. Here, by "enhancement" we mean a reduction in the overall error of predicting the Hamiltonian in a parameter interval that contains values not in the training set. To test this, we conduct the following numerical experiment. We choose N ≥ 3 training parameter values and, for each parameter value, we train the HNN M times using an ensemble of time series collected from M energy values below the threshold (10 time series from 10 random initial conditions for each energy value). To make the comparison meaningful, we choose the values of M and N such that NM is approximately constant. The resulting prediction errors for N = 3, 4, and 5 are shown in Fig. 7. It can be seen that the errors for N = 3 are generally larger than those for N > 3, while the errors for N = 4 and N = 5 are approximately the same, indicating that increasing N above four does not lead to a significant reduction in the errors of adaptable prediction. That is, by training with multiple time series from four values of the bifurcation parameter, the HNN has already acquired the necessary adaptability for predicting the system behavior at other nearby parameter values.

B. HNNs with two parameter channels
We construct parameter-cognizant HNNs with more than one parameter channel. For this purpose, we modify the Hénon-Heiles Hamiltonian Eq. (3) to

H = (p1² + p2²)/2 + (q1² + q2²)/2 + α1 q1²q2 − α2 q2³/3,    (7)

where α1 and α2 are two independent bifurcation parameters, requiring two independent parameter input channels to the HNN. The energy threshold for bounded motion can be evaluated numerically. We conduct training for a number of combinations of α1 and α2 values: α1, α2 ∈ {0.2, 0.4, 0.6, 0.8}. The training data are generated as follows: for each parameter pair, we choose five energy values below the threshold and, for each energy value, collect a single time series. After the training is done, we predict the potential function under parameter variation. Figure 8 shows the color-coded ensemble prediction error ∆V in the (α1, α2) plane. For some combinations of α1 and α2 with a relatively large difference in their values, the threshold energy is less than 1/6; for such cases, the integration domain in Eq. (4) is modified accordingly based on the threshold value. It can be seen that, in the parameter region α1, α2 ∈ [0.2, 0.8], the prediction error is about 5%, while the errors outside this region tend to increase. At the two off-diagonal corners, the errors are the largest, due to the strong asymmetry of the potential profile. Figure 8 demonstrates that HNNs with two parameter channels can be trained to be adaptable for prediction.

C. HNNs for a diatomic molecule system
We consider a different target Hamiltonian system, defined by the one-dimensional Morse potential that describes the interatomic interaction in a diatomic molecule [63]. This system was previously used in the study of chaotic scattering [64,65]. The Hamiltonian is given by

H = p²/2 + D(1 − e^{−ax})²,    (8)

where a > 0 is the bifurcation parameter controlling the width of the potential well and D is the dissociation energy. Figure 9 shows the predicted potential profile for a = 1.5, together with those for the two training points a = 1 and a = 2. The result is accurate for x around the potential minimum, but large errors arise when the position is far away from the minimum. A plausible reason is that, for large values of x, the potential varies slowly, resulting in small changes in the momentum. As a result, the corresponding portions of the time series exhibit less variation, leading to large prediction errors by the HNN. The trained HNN has apparently gained a certain adaptability, as the prediction for a = 1.5 is not an interpolation of those for a = 1.0 and a = 2.0.
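The non-interpolation point can be illustrated directly with the potential itself. Assuming the standard Morse form V(x) = D(1 − e^{−ax})² with D = 1 for illustration, the a = 1.5 potential differs markedly from the pointwise average of the a = 1 and a = 2 potentials:

```python
import numpy as np

def morse(x, a, D=1.0):
    # Standard one-dimensional Morse potential; D = 1 is an assumed scale
    return D * (1.0 - np.exp(-a * x))**2

x = np.linspace(-0.5, 3.0, 200)
v_mid = morse(x, 1.5)
v_avg = 0.5 * (morse(x, 1.0) + morse(x, 2.0))
diff = float(np.max(np.abs(v_mid - v_avg)))
print(diff)
```

The maximum deviation is of order one on this interval, so an HNN that merely averaged its two training potentials would show a visible error, which is not what Fig. 9 displays.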

V. DISCUSSION
Developing adaptable machine learning in general has broad applications to critical problems of current interest. For example, a problem of paramount importance is to predict how a system may behave in the future when some key parameters of the system have drifted, based on information that is available at the present. As an example, suppose an ecosystem is currently in a normal state. Due to environmental deterioration, some of its parameters, such as the carrying capacity and/or the species decay rates, will have drifted in the future. Is it possible to predict whether the system will collapse when a certain amount of parameter drift has occurred, when the system equations are not known and the only available information is time series data measured up to and including the present? Adaptable machine learning offers a possible solution. For example, it has been demonstrated recently [41] that incorporating a parameter-cognizant mechanism into reservoir computing machines enables prediction of possible critical transitions and system collapse in the future for any given amount of parameter drift. However, the state-of-the-art reservoir computing schemes under intensive current research [7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22][23] do not take into account physical constraints such as energy conservation, so they are suitable only for dissipative dynamical systems.
Combining the laws of physics and traditional machine learning has the potential to significantly enhance the performance and predictive power of neural networks. It has been demonstrated recently that enforcing Hamilton's equations of motion in traditional feed-forward neural networks can lead to improvements in the prediction accuracy for Hamiltonian systems in both integrable and chaotic regimes [25][26][27][28][29]. In these studies, training and prediction are conducted for the same set of parameter values of the target Hamiltonian system, so the underlying Hamiltonian neural networks are not adaptable in the sense that they are not capable of predicting the dynamical behavior of the system at a different parameter setting.
Do the adaptable HNNs that we have developed have any practical significance? From the point of view of predicting the future states of Hamiltonian systems subject to parameter drift, the answer is perhaps no. The main reason is that HNNs require time series of all the coordinates and momenta. For example, one may wish to predict whether a complicated many-body astrophysical system will lose its stability and become mostly chaotic in the future, where the only available information is the position and momentum measurements made prior to or at the present, when the system is still in a mostly integrable regime. Since the laws of physics for such a system are known, acquiring the full set of training time series is no less a burden than knowing the Hamiltonian itself. Nonetheless, our work generates insights into the inner workings of HNNs, as follows.
Our parameter-cognizant, adaptable HNNs add a parameter input channel to the standard multilayer network whose loss function is stipulated by Hamilton's equations of motion, and they are capable of successfully predicting the transition to chaos in Hamiltonian systems. In particular, through training with coordinate and momentum time series from four different values of the bifurcation (nonlinearity) parameter, the machine gains adaptability, as evidenced by its successful prediction of the dynamical behavior of the target system in an entire parameter interval containing the training parameter values. That is, the benefit of training is that the HNN learns not only the dynamical "climate" of the target Hamiltonian system but also how this "climate" changes with the bifurcation parameter. Machine learning can thus be viewed as a process by which the neural network self-adjusts its dynamical evolution rules to incorporate those of the target system.
When systematically varying values of the bifurcation parameter are fed into the HNN, it can predict the transition to chaos from a mostly integrable regime, as determined by the ensemble maximum Lyapunov exponent and the minimum alignment index, as well as the fraction of chaos, as functions of the bifurcation parameter. For a single parameter channel, the adaptable predictive power is achieved provided that the training parameter set contains at least three or four distinct values. For an HNN with two parameter channels, the required training parameter set should be at least four by four. Adaptable prediction has also been accomplished for a different Hamiltonian system, defined by the Morse potential function. We expect the principle of designing parameter-cognizant HNNs and the training method devised in this paper to hold for general Hamiltonian systems.
One issue is the dependence of the energy surface on the bifurcation parameter: as the parameter changes continuously, the energy surface evolves accordingly. If we intend to predict the system dynamics at some specific value of the bifurcation parameter for a fixed energy, the training data sets should contain time series collected at a larger energy value, so as to cover the pertinent phase-space region at the desired energy.
It should also be noted that the use of HNNs to predict the transition from integrable dynamics to chaos in the Hénon-Heiles system was first reported in Ref. [28], which relied on the energy E as the control parameter for a fixed value of the nonlinearity parameter (e.g., α = 1). Here we have studied the transition using α as the bifurcation parameter for a fixed energy value (e.g., E = 1/6). The two routes are equivalent because the Hénon-Heiles system Eq. (3) possesses a three-fold symmetry in the configuration space. Such an equivalence also arises in systems whose potential function contains nonlinear terms beyond the quadratic, e.g., the classical φ4 or FPU model [66,67]. However, for the two-parameter Hamiltonian Eq. (7) studied in this paper, the three-fold symmetry is broken, destroying the equivalence between varying the nonlinearity parameter and varying the energy. In fact, for Hamiltonian systems such as the Morse and double-pendulum systems, the equivalence does not hold either [63,65,68]. Our adaptable HNN does not rely on any such equivalence and can be effective in predicting the transition to chaos in any type of Hamiltonian system.

Appendix A: Computation of the Lyapunov exponents and the alignment index

Given a dynamical system dx/dt = f(x), the Jacobian matrix is J = ∂f/∂x. For a Hamiltonian system, the dynamical variables are x ≡ [q, p]^T and

J = (  ∂²H/∂q∂p    ∂²H/∂p²
      −∂²H/∂q²    −∂²H/∂q∂p ).

For an HNN, in principle, the Hamiltonian H is given by a sequence of operations of the neural network with the weights and biases in Eq. (1) determined by training. An alternative and efficient approach to calculating the Jacobian matrix J is the finite-difference method.
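The finite-difference route can be sketched as follows, applied here to the classic Hénon-Heiles vector field rather than to a trained HNN (the step size eps and the test point are illustrative choices):

```python
import numpy as np

def hh_flow(z):
    """Henon-Heiles vector field for z = (x, y, px, py):
    dx/dt = px, dy/dt = py, dpx/dt = -(x + 2xy), dpy/dt = -(y + x^2 - y^2)."""
    x, y, px, py = z
    return np.array([px, py, -(x + 2.0 * x * y), -(y + x * x - y * y)])

def jacobian_fd(f, z, eps=1e-6):
    """Finite-difference Jacobian J[i, j] = df_i/dz_j (central differences)."""
    d = len(z)
    J = np.zeros((d, d))
    for j in range(d):
        zp, zm = z.copy(), z.copy()
        zp[j] += eps
        zm[j] -= eps
        J[:, j] = (f(zp) - f(zm)) / (2.0 * eps)
    return J

z0 = np.array([0.1, -0.2, 0.05, 0.0])
J = jacobian_fd(hh_flow, z0)
```

For an HNN, `hh_flow` would be replaced by the network's predicted vector field; the finite-difference routine itself is unchanged.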
In particular, for a given initial condition, we generate an orbit of N points with time interval dt and calculate J at each time step. Let the sequence of Jacobian matrices be denoted J(t_0), J(t_1), ..., J(t_N), and let Y(t_0) be the identity matrix I. If the phase space of the target Hamiltonian system is D-dimensional (D = 4 for the Hénon-Heiles system), there are D Lyapunov exponents. Let λ ≡ (λ_1, λ_2, ..., λ_D)^T be the vector of the D Lyapunov exponents, with the initial values set to zero: λ(t_0) = (0, 0, ..., 0)^T. After N steps, we have

Y(t_N) = J(t_N) J(t_{N−1}) ··· J(t_1) Y(t_0).

Carrying out the QR decomposition of the matrix Y(t_N), with the resulting matrices denoted Q and R, we have Y(t_N) = QR, where R_jj is the jth diagonal element of the upper-triangular matrix R. The Lyapunov exponents are given by

λ_j = ln(R_jj) / (N dt),   j = 1, ..., D.

The maximum Lyapunov exponent is λ_M = max_j (λ_j).
Normalizing the vectors u_1(t_N) and u_2(t_N) by their respective magnitudes so that they have unit length, we obtain the minimum alignment index as

γ_m = lim_{N→∞} min( ‖u_1(t_N) + u_2(t_N)‖, ‖u_1(t_N) − u_2(t_N)‖ ).  (A7)
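A minimal sketch of this alignment-index computation, again using an illustrative hyperbolic linear map in place of the tangent maps of a Hamiltonian orbit: for chaotic dynamics both deviation vectors align with the most expanding direction, so the index decays toward zero, whereas it stays of order one for regular motion.

```python
import numpy as np

def alignment_index(jac_seq):
    """Smaller alignment index: push two deviation vectors through the
    tangent dynamics, renormalize each to unit length at every step,
    and return min(||u1 + u2||, ||u1 - u2||) at the end."""
    D = jac_seq[0].shape[0]
    u1, u2 = np.eye(D)[0].copy(), np.eye(D)[1].copy()
    for J in jac_seq:
        u1 = J @ u1
        u1 /= np.linalg.norm(u1)
        u2 = J @ u2
        u2 /= np.linalg.norm(u2)
    return float(min(np.linalg.norm(u1 + u2), np.linalg.norm(u1 - u2)))

# hyperbolic (chaotic-like) linear map: both vectors align with the
# expanding direction and the index decays toward zero
A = np.array([[2.0, 1.0], [0.0, 0.5]])
sali = alignment_index([A] * 60)
```

By construction the index lies in [0, 2]; the limit in Eq. (A7) corresponds to taking the number of steps large.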