Learning orbital dynamics of binary black hole systems from gravitational wave measurements

We introduce a gravitational waveform inversion strategy that discovers mechanical models of binary black hole (BBH) systems. We show that only a single time series of (possibly noisy) waveform data is necessary to construct the equations of motion for a BBH system. Starting with a class of universal differential equations parameterized by feed-forward neural networks, our strategy involves the construction of a space of plausible mechanical models and a physics-informed constrained optimization within that space to minimize the waveform error. We apply our method to various BBH systems including extreme and comparable mass ratio systems in eccentric and non-eccentric orbits. We show the resulting differential equations apply to time durations longer than the training interval, and relativistic effects, such as perihelion precession, radiation reaction, and orbital plunge, are automatically accounted for. The methods outlined here provide a new, data-driven approach to studying the dynamics of binary black hole systems.


I. INTRODUCTION
Classical physical theories begin with scientific laws as ansätze, which are validated by repeated scientific experiment. From these laws, one derives a set of equations (usually differential equations) that can be solved, either completely or partially, to deduce various conclusions about the physical system under consideration. In this paper, we follow a different approach to learning physical equations: we solve an optimization problem that isolates the most likely physical model (differential equations) that would deliver certain physical measurements (data). This approach is aligned with a growing trend in data-driven science; see, e.g., [1][2][3][4][5][6][7][8][9][10][11][12].
The focus of this paper is on learning mechanical models for binary black hole (BBH) systems through gravitational wave measurements. As the black holes orbit one another, the motion of these massive objects generate gravitational waves that radiate away to the far-field where they can be observed by an international-network of detectors [13]. Complicated partial differential equations (PDEs) govern the entire process, and in particular connect the near-field dynamics to the far-field gravitational radiation. Traditionally, black hole orbital dynamics and gravitational waves have been computed by expensive simulation codes [14] or approximations to general relativity such as the post-Newtonian formalism [15].
Our principal contribution is to show that two-body relativistic orbital models can be deduced from gravita- * Corresponding author: keith10@llnl.gov tional wave (GW) measurements by solving an inverse problem [16,17] where the control variable is the vector of weights and biases in a neural network. Our numerical examples use gravitational waveform measurements from both a noisy detector and "clean" measurements from numerical relativity (NR) simulations. As such, the techniques described here may apply to both traditional modeling endeavors that require NR data for calibration and GW astronomy. In the latter case, where the training data is comprised of GW observations, our inversion strategy avoids the need to solve Einstein's equation of general relativity to learn the orbital model.
A key goal of our paper is to develop the computational framework for learning binary black hole dynamical models from gravitational waves, which is a new approach to the modeling problem. As such, we focus on simple modeling choices and apply them to illustrative examples. We will show that simple ansatz models parameterized by feed-forward neural networks can be used to discover complicated dynamics. Indeed, despite starting with an essentially Newtonian ansatz model (cf. Eq. (5)), our trained models accurately capture both the relativistic dynamics and the waveform (cf. Sec. III). Potential applications are considered in the Discussion section.

A. Universal differential equations
In this work, we rely on the following general class of dynamical system models referred to in [8] as (au-arXiv:2102.12695v3 [gr-qc] 9 Nov 2021 tonomous) universal differential equations (UDEs), where x(t) is the solution vector and x 0 specifies the initial conditions. Here, F(x) is neural network and the overdot symbol "˙" denotes differentiation with respect to time t ∈ [0, T ]. For example, if F(x) = F(x; ξ) is a feed-forward deep neural network with two hidden layers, it can be written as with ξ = (W 1 , W 2 , W 3 , b 1 , b 2 , b 3 ). Here, W i are matrices (weights), b i are vectors (biases), and σ i are the chosen activation functions. We note that the UDE paradigm permits a immense variety of different parameterized functions F(x), not only neural network-based parameterizations. This flexibility provides numerous advantages, for example, prior scientific knowledge of the solution may be incorporated into the choice of function parametrization. We choose to use feed-forward neural networks, such as (2), because of the ease with which their parametrization can be determined using existing software [8].

B. BBH modeling
Our task is to define a family of physical models that can be used to describe relativistic orbital dynamics of two spherical objects of mass m 1 and m 2 . That is, we ask that our model provides the position of object 1, r 1 (t), and object 2, r 2 (t), which are the solutions to the dynamical system model. In Newtonian physics, this is the familiar two-body problem of Kepler whose solution has been known since the earliest days of classical mechanics. In relativistic physics, the field of computational relativity is largely devoted to providing numerical solutions to this problem by solving the equations of general relativity (a nonlinear, coupled, hyperbolic-elliptic PDE system) on large supercomputers [14]. In order to motivate a dynamical system model, we consider special cases whereby the dynamical motion described by PDEs can be approximately reduced to ordinary differential equations (ODEs). Such approximations have been well developed over many decades, and we refer the reader to [15,18,19] and references therein. Throughout this paper, we use geometric units where both the speed of light, c, and the gravitational constant, G, are set to unity.
First, for slow moving objects (v c), a powerful formalism known as the post-Newtonian approximation provides a systematic framework for adding relativistic corrections in powers of v/c to the Keplerian equations of motion. The post-Newtonian framework provides us with justification for treating the two-body problem as an effective one-body problem. Here, the relevant equations that govern the separation vector, r = r 1 − r 2 , can be used to reconstruct the two-body motion through the relations, where M = m 1 + m 2 . We call this an effective one-body problem as one often views there to be an "effective" body located at r relative to the system's center-of-mass. A different technique, known as the post-Minkowskian approximation, informs us that the dominant contribution to the gravitational radiation field can be computed according to the quadrupole formula, first derived by Einstein. Collectively, the two-body-to-one-body map and the quadrupole formula neglect a substantial amount of physics including terms proportional to v/c, as well as many more higher-order terms, some of which have been computed and others that remain unknown. Second, in the limit of m 1 m 2 , the two-body problem reduces to a simpler setup whereby the larger object is fixed at the coordinate system's origin. The smaller orbiting object's motion is then described by a geodesic path in the Schwarzschild geometry set by the larger black hole. Recent numerical evidence suggests that the geodesic equations of motion (11) with self-force corrections may work unreasonably well even for near-equal mass systems [20][21][22][23][24][25][26][27][28]. More importantly for our purposes, we know from blackhole perturbation theory results that using geodesic equations of motion to describe the two-body problem neglects a substantial amount of physics including terms proportional to m 2 /m 1 .
Having motivated some of the physics behind the problem, we now outline our strategy to write down a model inspired by a combination of Newtonian and relativistic physics. Our orbital model will omit a significant amount of important physics that will be accounted for by deep neural networks trained on gravitational waveform data.
We write the two-body problem as an effective onebody one, and associate the orbital separation vector, r, with the location of the fictitious effective body orbiting a fixed, spherically-symmetric central object. Owing to the spherical symmetry of the central object, we may assume, without loss of generality, that the effective object's trajectory lies in the equatorial plane, which we take to be the plane perpendicular to the z-axis and where the angle φ is between r and the x-axis. Orbits are specified by the orbital parameters eccentricity e(t) and semi-latus rectum p(t); in the Newtonian case these are constants while in the general relativistic case they are often interpreted as time-dependent functions. Finally, we use a well-known parameterization for the Euclidean norm of r, and evolve the anomaly χ(t) instead of r(t) because the anomaly increases monotonically through radial turning points. To summarize, we assume the equations of motion for the effective object can be described by four timedependent variables φ, χ, p, and e. The effective object's trajectory is provided φ, χ whereas p, e parameterize the orbital configuration.
Upon denoting x = (φ, χ, p, e), we propose the following family of UDEs to describe the two-body relativistic dynamics: with x(0) = (φ 0 , χ 0 , p 0 , e 0 ). Note that the functional form of Eqs. (5) have been inspired by Eq. (11), which are the geodesic equations of motion for an infinitesimally small "particle" orbiting a super-massive blackhole. In particular, Eqs. (5) are rotationally invariant because the right-hand side omits the φ-variable. Moreover, when F 3 = F 4 = 0, orbital energy, E(p, e), and orbital angular momentum, L(p, e), are conserved: Due to the emission of gravitational waves (so-called radiation-reaction), we have that bothĖ,L < 0 for all time. When each F j = 0, we recover Newtonian orbits. Eqs. (5a) through (5d) define a family of trajectories x(t) = x(t; ξ). Through Eq. (3), these trajectories determine the black hole orbits, Gravitational waves are generated by orbiting black holes, and so the waves encode detailed information about the dynamical variables r 1 (t) and r 2 (t). General relativity tells us that the dynamics and waves are connected through PDEs, which is a familiar scenario in the modeling of waves.
In the next section we summarize how to learn F j from gravitational wave measurements. Despite the simplicity of Eqs. (5), we show that the learned ODEs can describe dynamics beyond the base mechanical model (the base model corresponds to setting the neural network parameters ξ to zero and, thus, each F j = 0). In our first numerical experiment III A, for example, we show that bound orbits of a test particle following geodesic motion on a Schwarzschild geometry can be accounted for. In our final set of numerical experiments we show that the dissipative dynamics can also be accounted for. Due to the flexible framework of our waveform inversion technique, one can easily swap out our base model (5) for others; e.g., the EOB model [29]. This suggests many possible future applications of gravitational waveform inversion.
C. Quadrupole formula, the loss function, and model discovery Very far from a BBH system, where gravitational wave detectors are located, the gravitational radiation field is an outgoing spherical wavefront. On a sufficiently large sphere we can expand the radiation field into a complete basis of (tensorial) spherical harmonics labeled by ( , m) harmonic indices.
In this paper, we consider only the dominant ( , m) = (2, 2)-mode gravitational waveforms (cf. Appendix A 2), however, our waveform inversion technique could be modified to easily include subdominant modes. Accordingly, we denote all waveforms by the variable w = (r/M ) · Re{h 22 }, where and the trace-free mass quadrupole tensors I xx , I xy , and I yy are defined in Eq. (A10) (see also [30,). Eq. (8) is the well-known quadrupole formula which expresses the measurable waveform, h 22 , in terms of the orbits r 1 = (x 1 , y 1 , 0) and r 2 = (x 2 , y 2 , 0). The quadrupole formula is a very simple approximation that will necessarily introduce systematic error when learning F j , but is sufficient for our purpose. We assume that our waveform measurements appear as ordered pairs (t k , w k ), where w k denotes the value of the waveform data at time t k ∈ [0, T ]. In this setting, we define the mean-squared waveform error where J(x, t) = k w k − w(t) 2 δ(t − t k ) and bracket notation, · , denotes denotes averaging over the time interval. Accordingly, we choose to solve the inverse problem: In some situations, convergence to the solution of (10) can be improved by adding well-chosen, physics-informed penalty and regularization to (9); cf. Section III B. We note that the exclusive use of gravitational-wave data in the loss function is motivated by the consideration that in experimental settings only gravitationalwave observations will be available and never a direct view of black hole orbits. Even in computational relativity simulations, the numerical measurement of black hole trajectories are complicated by coordinate ambiguities of general relativity that make it difficult to assign physical significance to their values. Waveforms computed from computational relativity simulations, on the other hand, are well-defined and physically meaningful.
The ODE-constrained optimization problem (10) delivers the calibrated dynamical system modelẋ = f (x, F(x; ξ )), where ξ denotes the optimizer found by solving (10). This inverse problem can be solved with a number of standard methods. We choose to use a BFGS algorithm with backtracking line search [31] and an adjoint-based (implicit differentiation/adjoint sensitivity method) calculation of gradients [32] implemented with the Julia [33] software package DiffEqFlux [8]. The algorithm is described by the flowchart in Figure 1. Our code is available for download at [34].

III. RESULTS
In this section, we present results with three different examples. The first demonstrates the ability of (10) to recover known orbital equations. The second two showcase the discovery of new equations of motion for equal mass binary black hole mergers.
A. Extreme mass ratio systems As our first motivating example, we consider a special case of the relativistic two-body problem where the exact solution is known. We show that from short-duration gravitational wave observations we are able to discover differential equations that are valid over much longer time-scales.
In the regime of m 1 m 2 , formally the limit m 1 → M , m 2 is a "test particle" whose motion obeys [19,[35][36][37] while r 1 = (0, 0, 0) andė =ṗ = 0. We shall be interested in the parameter restriction 0 ≤ e < 1, for which the radial motion occurs between two turning points, pM/(1 + e) and pM/(1 − e) and the orbit is bounded. When e = 0, the orbit is circular. We let F 3 = F 4 = 0 and provide values for the initial conditions φ 0 = 0 and χ 0 = π. In our example, we set e = 0.5, p = 100, and m 1 = 1, although the results we show remain largely the same for other parameter values we have tested. For simplicity, in this first example we provide known values for e 0 , p 0 while in Sec. III B we show how our approach performs when these parameters are also learned. To prepare our ground-truth data, we numerically solve Eq. (11) on a dense time grid, thereby generating the black hole trajectory r 2 (t). We then apply the quadropole formula Eq. (8) to generate a gravitational waveform sampled at 250 equally-spaced points spanning the time interval [0, 0.6 · 10 5 ], and shown in Fig. 2 (bottom panel; black dots). Note that in the extreme mass ratio limit, m 2 → 0, and the waveform h 22 ∝ m 2 /m 1 goes to zero. Therefore, in this example, we use w = (m 1 /m 2 ) · Re{rh 22 } as gravitational-wave data; w is now independent of m 2 .

True trajectory
Learned trajectory Training interval 0 5 · 10 4 1 · 10 5 1.5 · 10 5 2 · 10 5 2.5 · 10 5 3 · 10 5 to learn the underlying two-dimensional dynamical system model governing the relativistic two-body problem in the extreme mass ratio limit with orbital parameters p = 100, e = 0.5. Top left: Learned (dashed red) and exact (solid blue) trajectories extrapolated 4× the training interval. We also show the portion of the orbit (black) corresponding to the gravitational-wave training window, although no orbital data was used to learn the dynamics. Top right: Relative error between the learned model (5) and the exact model (11). Bottom: Learned (dashed red) and exact (solid blue) waveforms extrapolated 4× the training interval.
Using the procedure summarized in Sec. II C, we recover the governing equations by optimizing for F 1 and F 2 . In this setting, both abstract functions only depend on cos χ. We exploit this periodic structure by defining F 1 and F 2 with cosine activation functions, σ j = cos. We then construct F 1 and F 2 as feed-forward neural networks with two hidden layers each; see, e.g., Eq. (2). The exact network architecture and numerical discretization can be found in the file EMR.jl in [34]. Finally, we learn the corresponding neural network weights and biases by optimizing (10).
This process delivers the red trajectory and waveform presented in Figure 2. Not only do both waveforms and trajectories match over the training interval consisting of about about 6 orbits, they continue to agree when the learned dynamics are extrapolated to about 31 orbits, after which the orbit's perihelion precession has undergone a full cycle. In Figure 2 we compare the true waveforms/trajectories to the learned waveforms/trajectories over the extended time interval [0, 3·10 5 ]. To compare the learned model (5) to the exact model (11), we also compute the error inφ andχ over the extended time interval. Evidently, not only do the waveforms and trajectories match upon visual inspection, the learned model matches the true mechanical model to about two orders of magnitude. The learned model also recovers important relativistic effects, notably perihelion precession, from just a few gravitational-wave cycles. Finally, we note that once the dynamical model is known, it can be used to generate very long orbits and gravitational wave signals by integrating the ODE (5) and post-processing the solution with the quadrupole formula (8).
This experiment demonstrates the potential power of waveform inversion in a simple setting with a known solution. It also demonstrates how the information content in the original waveform can be used to infer UDEs. Nevertheless, this system is conservative (ė =ṗ = 0), the quadrupole formula is prescribed exactly, and the learnable dynamics depend only on the χ-variable.
The following examples are more challenging because none of the aforementioned simplifications hold.

B. General relativistic orbital dynamics of binary black holes
In this pair of examples, we consider numerically generated waveform measurements from equal mass m 1 = m 2 = 0.5 binary black hole systems. Unlike the previous experiment, the orbital dynamics for these systems is much more complicated; the exact equations of motion are unknown and the dynamics include time-dependent values of the eccentricity and semi-latus rectum. Although an extensive body of literature exists for deriving these equations from approximations of general relativity [15,29,39], we are unaware of any data-driven approaches focused on discovering orbital dynamics from waveform measurements.
Computational relativity codes provide exact (up to numerical discretization error) solutions to the general relativistic two-body problem, including both the corresponding trajectories of the center of the black holes and gravitational-wave data. Although the location of the black holes are coordinate-dependent they can still be used to compare with the trajectories obtained from our model. However, such comparisons should no longer be understood as model error since the coordinate system used for the data and model are necessarily different.
In this new expression, we define where (f (t)) + = max{f (t), 0}, where 1 Ω denotes the indicator function on the set Ω ⊂ [0, T ], and finally where ξ denotes the 2 -norm of the expanded parameter vector ξ. It is standard practice to use large coefficients for penalty terms and small coefficients for regularization terms. However, as explained in the paragraphs below, our penalty terms are present to help avoid nonphysical local minima and are not active in the optimized model. For this reason, we do not make a concerted attempt to tune these coefficients. In both of the coming experiments, we somewhat arbitrarily fix γ 1 = 10 3 , γ 2 = 10 2 , γ 3 = 10 1 , and γ 5 = 10 −1 . In the first experiment, we take γ 4 = 1, while in the second experiment we use γ 4 = 0. The physical motivation for the terms in (13) relies on Eqs. (7). From these equations, we have that the distance between the two black holes r is proportional to p. Due to energy loss from the emitted gravitational waves, r(t) converges to zero at a rate that increases throughout the system's evolution. The penalty terms (ṗ) 2 + and (p) 2 + have been chosen to encourage the selection of solutions with this physical behavior. The first term in Eq. (14) encourages the selection of a positive eccentricity function e(t) for all time t. On the other hand, the final term in this definition is motivated by the stability condition p ≥ 6 + 2e for bound orbits. It is widely accepted that e decays in this range [35], and this term helps to direct the solution toward models with this property. Clearly, ifṗ,p, −e, (e − e 0 ) 2 · 1 {p>6+2e0} ≤ 0, then P 1 (x) = P 2 (x) = 0. Our experiments appear to indicate that optimal solutions p(t) and e(t) satisfy each of these bounds, therefore, the penalty terms only act as guardrails throughout the optimization process. The Tikhonov-Phillips regularization term (15) helps convergence by ensuring continuous dependence between the data and the solution [46]. The Tikhonov regularization term ξ 2 can also help avoid model degeneracies and overfitting in the presence of noisy data. For example, when the orbit is circular (e = 0) the model is degenerate in χ and, therefore, it is also degenerated in the weights and biases defining F 2 . Other penalty and regularization terms could be considered in future studies.
In the following pair of examples, we construct feedforward neural network parameterizations of F j , j = 1, . . . , 4, with tanh activation functions. The exact network architecture we use can be found in files SXS1.jl and SXS2.jl [34].  [38]. We also show the orbit computed from our learned dynamical system (blue and red lines). In the upper right panel, we show the evolution of the eccentricity and semi-latus rectum from our learned-dynamical system. The middle right panel shows the disagreement between the NR trajectories and the ones computed from the learned dynamical system. We caution the reader that this figure should not be understood as a relative error because the numerical relativity black hole trajectories and our learned model are expressed in different coordinate systems that are impossible to relate. Bottom: Learned (red line) and computational relativity (black dots) waveform data corresponding to the real part of the h 22 mode.

Near-circular orbits from clean GW observations
For this experiment, we consider a binary black hole system with negligible eccentricity during the initial inspiral. For inspection, the center of mass-corrected trajectories of the binary black hole system are depicted in the top left-hand corner of Figure 3 (solid black lines), with the associated 1000 equally-spaced waveform data points in the bottom panel (black dots). From now on, we let [0, T ] denote the time interval between the first (t = 0) and final (t = T ) measurement, where the final measurement occurs shortly before merger.
As in the previous experiment, we adopt the initial conditions φ 0 = 0 and χ 0 = π and assume that r 0 is known. Using Eq. (4), these assumptions provide us with an explicit expression for p 0 , M p 0 = r 0 · (1 + e 0 cos(χ 0 )). Due to the nearly-zero eccentricity of the initial trajectories, we opt for the simple initial condition e 0 = 0. In the next and final experiment, we treat the more realistic case where both e 0 and χ 0 are unknown.
In order to avoid local minima, we solve (10) on a sequence of increasing time intervals [0, T 0 ] [0, T 1 ] · · · [0, T ], using the optimal parameters ξ from each preceding optimization problem (plus a small amount of Gaussian noise) as initial data for the subsequent problem. Using this incremental procedure, we are able to recover the overwhelming majority of the black hole trajectories, as indicated by the visual agreement between the learned (red and blue) and NR (black) trajectories shown in the top left-hand panel of Figure 3. We also depict the relative disagreement between the NR trajectory of the first black holer 1 and the learned trajectory r 1 . The model also recovers important general relativistic effects, notably the learned functions F 3 and F 4 cause a runaway inspiral process that drives the black holes to merge. This process is seen most clearly by monitoring the behavior of p(t) in the upper right-hand panel of Figure 3. We also observe that our model is able to naturally include both the inspiral (p > 6) and plunge (p < 6) orbital regimes. Note the upper right-hand panel of Figure 3 shows that near this transition region the eccentricity quickly grows, which is at odds with our physical expectation for stable orbits and indicates a very different dynamical regime.
One complication in validating our learned dynamical system is how to perform meaningful comparisons with other models. Indeed, besides the waveform, the other three sub-panels shown in Fig. 3 depict gauge-dependent quantities; that is their value depends on the coordinate system being used. In particular, our trajectories are not expressed in the same damped harmonic gauge coordinates used by SpEC simulations [47]. Nevertheless, recent studies have noted surprisingly good agreement between NR trajectories and those computed with post-Newtonian (PN) models [48] and PN-augmented dynamical models [49][50][51]. In particular, Ref. [49] conjectures that the main source of disagreement is due to the PN formula being expressed in the harmonic gauge [15]. The close agreement between NR and UDE trajectories shown in Fig. 3 is another example of surprisingly good agreement [48,49].
To avoid gauge ambiguities, comparisons of BBH dynamics focus on comparing gauge-invariants that are computable within different frameworks. For example, Refs. [52,53] explore the conservative dynamics by comparing the relationship between the total energy and total angular momentum from NR data to the corresponding analytical predictions from PN and EOB theory. However, the identification of a conserved energy or angular momentum within our setup is not obvious as our equations are not derived from a Hamiltonian; we will return this issue in the Discussion section. Instead, we follow Ref. [28] and compute the accumulated orbital phase as a function of the orbital frequency. This quantity includes both dissipative and conservative effects, and can be computed within different modeling frameworks.
The NR orbital phase is defined in terms of the wave-form data as follows: We also compute the orbital phase from a recently developed precessing EOB model (SEOBNRv4PHM) [54], a numerical relativity surrogate model (NRSur7dq4) [55], and our UDE model. Both NRSur7dq4 and SEOBNRv4PHM are considered state-of-the-art and have been used by the LIGO-Virgo Collaboration to analyze recent gravitational-wave observations [13,56,57]. We represent each orbital phase by a degree 3 spline (using a smoothing factor of 0.0002), from which we compute the orbital frequency, by taking a derivative of the spline, and finally forming the function φ orb (Ω) = φ orb (t(Ω)). We then compare φ orb NRSur7dq4 , φ orb SEOBNRv4P , and φ orb UDE to φ orb NR . For each comparison, we form the difference, ∆φ = φ orb model − φ orb NR after phase alignment 1 . Figure 4 shows ∆φ for each model. Over the range of orbital frequencies shown, the L 2 -error in the phase is 2.5 × 10 −2 (UDE model), 2.0 × 10 −2 (NRSur7dq4), and 2.2 × 10 −2 (SEOBNRv4PHM). All three models do an excellent job at tracking the NR orbital phase throughout the late inspiral phase to Ω ≈ 0.16, which is when a common apparent horizon appears in the NR simulation. One of the most important practical uses of a dynamical model is as an intermediate step towards generating gravitational waveforms. While a full study is outside the scope of this paper, we provide a preliminary look at this here. We compute an L 2 -type error measurement (see Eq. 21 from Ref. [50]) between the complexified NR waveform and each model's prediction of the waveform after optimizing for phase and time alignments. We find the errors to be 3.1 × 10 −3 (UDE model), 1.2 × 10 −3 (SEOBNRv4PHM), and 1.1 × 10 −5 (NRSur7dq4). All three models have been calibrated to q = 1 NR waveform data, so this comparison is only meant to be suggestive of how well the modeling techniques can perform on the training set and not its generalization error.
We note that all three models used in our comparisons have been built in very different ways. The SEOBNRv4PHM model is a highly sophisticated analytical model thats been under investigation for two decades [29] while the NRSur7dq4 model was trained against 1528 NR simulations using numerical techniques that have been in development for nearly a decade [58,59]. By comparison, our UDE model is new and our modelization choices are simple. Given that our UDE model is able to perform comparably well against state-of-the-art models demonstrates the potential of waveform inversion as new tool for model builders to consider in future work.

Eccentric orbits from noisy GW measurements
This experiment proceeds in much the same way as the previous one. Here, however, we learn the dynamics of an eccentric binary black hole system whose trajectories are depicted in the top left-hand corner of Figure 5 (solid black lines) with the associated waveform data in the bottom panel (black dots). Unlike the previous experiments, we do not assume known values for the initial conditions e 0 , p 0 , or χ 0 but instead make these part of the learning process. We continue to adopt the initial conditions φ 0 = 0. As can be seen in Figure 5, we introduce additive Gaussian noise to the waveform data of the form w(t i ) + n(t i ), where n(t i ) is draw from a normal distribution of mean 0 and standard deviation of σ = 10 −2 . As the typical waveform amplitude is ∼ 0.1, this corresponds to a coefficient of variation of around σ/0.1 = 0.1.
In spite of these new challenges, we are still able to recover the original trajectories as indicated by the visual agreement between the learned (red and blue) and NR (black) trajectories shown in the top left-hand panel of Figure 5. This is achieved by simultaneously optimizing for both e 0 or χ 0 in (10), in addition to the neural network parameters ξ, and deducing the associated value of p 0 directly from Eq. (4). The model also recovers important general relativistic effects, notably, as in the previous example, radiation-reaction effects. In this case, the tendency for the orbit to circularize, e(t) → 0, is seen in the upper right-hand panel of Figure 5. As before, the eccentricity has an inflection point around p ≈ 6 and quickly grows thereafter.
As follow-on to this experiment, we test the stability of the learned solution to different signal-to-noise ratios (SNRs). Specifically, we use the learned solution parameters ξ, e 0 , and χ 0 from Figure 5 as initial guesses in a new set of model discovery problems that fit the original SXS:BBH:1356 waveform data, but have white Gaussian noise of different variance superimposed. After solving each of these new model discovery problems, we measure how much the learned waveform differs from the true waveform. The curve in Figure 6 shows the relative error in the learned waveform as a function of the SNR in the waveform measurement. The relative error grows as the SNR decreases, however, the results are surprisingly accurate down to SNR≈ 32. We note that while all BBH gravitational-wave detections to-date have had SNRs below 32 [13], future detectors such as LISA [60], Cosmic Explorer [61], or Einstein Telescope [62] will routinely detect events with an SNR greater than 32.

IV. DISCUSSION
We have presented a new, data-driven gravitational waveform inversion strategy which generates mechanical models of binary black hole systems. We start with a structurally very simple set of universal differential equations and parameterize the space of models with feedforward neural networks. Our differential equations are trained by solving a physics-informed constrained optimization problem that seeks to minimize the waveform error. We tested our method on various BBH systems including extreme and comparable mass ratio systems in eccentric and non-eccentric orbits, and train on portions of the waveform corresponding to orbital plunge right up to the time of merger. We find that the resulting differential equations agree remarkably well with the black hole trajectories computed through purely numerical means. Our models can be extrapolated in time and recover various known relativistic effects despite these being previously unknown to the universal differential equations. The main contribution of our work is to show that twobody relativistic models can be deduced from gravitational wave measurements.
To describe the computational framework, we have focused on simple choices such as our ansatz neural ODE model Eq. (5) and the quadrupole formula for computing the harmonic modes. These modeling choices are distinct from the overall computational framework and can be easily swapped out for different choices. For example, future high-accuracy studies should seek better orbit-towaveform maps as the quadrupole formula is likely a large source of systematic error.
Our framework for learning the dynamics of binary black holes is quite general, and we expect that it can be applied to a variety of cases we have not considered including unequal masses, aligned-spins systems, and precessing systems. Our method should be espe-  [38]. We also show the orbit computed from our learned dynamical system (blue and red lines). In the upper right panel, we show the evolution of the eccentricity and semi-latus rectum from our learned-dynamical system. The middle right panel shows the disagreement between the NR trajectories and the ones computed from the learned dynamical system. We caution the reader that this figure should not be understood as a relative error because the numerical relativity black hole trajectories and our learned model are expressed in different coordinate systems that are impossible to relate. Bottom: Learned (red line) and computational relativity (black dots) waveform data corresponding to the real part of gravitational waveforms. Here, we also include the learned imaginary part of the waveform (blue line), reconstructed with the quadrupole formula, and compare it with the reference imaginary part taken from the SXS database (black line).

Merger
cially useful for discovering equations of motion for systems where traditional approaches are less well-developed including eccentric binaries, the highly relativistic lateinspiral and plunge dynamical regimes, and beyond-GR theories. Given the close agreement with NR trajecto-ries, other possible applications could include setting NR initial data whereby the neural ODEs could be used to predict an NR trajectory before the simulation is performed.
One of the most important applications of our tech-  , which is related to the overlap error (cf. Ref. [50]) commonly used in gravitational-wave data analysis.
nique may involve calibrating existing orbital dynamics models (and high-accuracy gravitational-wave models) by using a base model different from Eq. (5). Given that all modern inspiral-merger-ringdown waveform models require calibration of unknown parameters to numerical relativity simulations, the waveform inversion technique described here could benefit these efforts. For example, if implemented within the effective-one-body (EOB) approach, a suitably modified version of our methodology could be used to calibrate for missing terms in the EOB Hamiltonian. Precessing NR surrogate models also require a dynamical model, which is found through a direct fitting for the right-hand-side of the relevant ODEs. This computationally costly step might be simplified and accelerated with our techniques.
Another potential use of our methodology is training dynamical models entirely from gravitational-wave datasets instead of solving or analyzing Einstein's equation of general relativity as is traditionally done. In Fig. 6 we explore how well the algorithm performs as the signal's SNR is systematically varied. We find that, at least for the examples considered here, the method continues to work down to an SNR of about 32. Consequently, our method is most applicable for future GW detections, including those made with LISA [60], Cosmic Explorer [61], Einstein Telescope [62], or perhaps the upcoming LIGO-Virgo-Kagra science run. Our method will need to be modified to achieve SNR levels of ap-proximately 10 (which would cover most GW detections to date [13]) without compromising accuracy. Some approaches could include: (i) comparing waveforms with the Wasserstein metric which is known to be more robust to phase trappings, which shows up in higher noise, (ii) detecting an ensemble of noisy signals, and training on this observation set, (iii) using Bayesian networks where the network parameters are probability functions, or (iv) to apply our technique to filtered waveforms using a model-agnostic approach such as wavelet methods or denoising methods. We leave such extensions for future work. lies in computing total derivatives of the functional with respect to ξ. Indeed, assume that we are working with the definition of J given in Eq. (9). Here, any gradient-based optimization algorithm requires the calculation of the total derivative One may easily note that, in the specific setting given to us through Eq. (9), we have J = J (x) and, therefore, the partial derivative ∂ ξ J vanishes. In more general situations, the term ∂ ξ J is routine to derive. On the other hand, the calculation of d ξ x is problematic due to the fact that x(ξ) is not available in closed form. One approach to computing d ξ J involves directly estimating d ξ x via finite differences, however, the cost of this approach scales linearly with the size of ξ. This makes it prohibitive for most practical problems. We choose to compute these gradients using in an alternative way often referred to as the adjoint sensitivity method [32]. The adjoint sensitivity method has been used extensively in engineering design [63,64] and, more recently, in machine learning research [4,8,65]. It involves the integration of two ODEs over the time interval [0, T ]: the original governing ODE (A1) and an adjoint ODE (integrated backwards in time).
The derivation of this method usually begins with the Lagrangian which comes from writing the ξ-minimization of (A2), constrained by solutions of (A1), as a saddle-point problem [66]. This functional is clearly designed such that d ξ J = d ξ L for any x satisfying (A1). In addition, one observes that if ∂ λ L = ∂ µ L = ∂ x L = 0. A straightforward calculation shows that the first two of these equations are equivalent to the original dynamical system (A1). On the other hand, ∂ x L = 0 is equivalent to the adjoint equation Evidently, the ODE system (A7) depends on the solution to (A1), x(t). Therefore, the algorithm for computing d ξ J must follow a specific order: 1. Integrateẋ = F(x; ξ) from t = 0 to T with the initial condition x(0) = x 0 .

Compute d
Extension of this algorithm to the scenario where the initial condition x 0 is also optimized for (as considered in, e.g., Section III B 2) is straightforward. For thorough accounts of the numerical implementation of the adjoint sensitivity method for UDEs, we refer the interested reader to [8,67].

Gravitational waves from an orbit
In the context of general relativity, gravitational waves are associated with the outgoing, radiative parts of the spacetime metric and are solutions to Einstein field equation. The motion of massive objects produce gravitational waves, and our model outlined in Sec. II B provides the equations of motion for object 1 of mass m 1 and position r 1 (t), and object 2 of mass m 2 and position r 2 (t).
The quadrupole formula provides one simple method of obtaining the gravitational radiation from these orbital trajectories. In this framework, we assume both black holes to be "point sources" (i.e. Dirac delta functions). The Newtonian mass density of two objects orbiting in the x-y plane is ρ(t, x, y, z) = m 1 δ(z)δ(x − x 1 (t))δ(y − y 1 (t)) + m 2 δ(z)δ(x − x 2 (t))δ(y − y 2 (t)) , (A8) and note that in the special case m 2 m 1 we have r 2 (t) = (0, 0, 0) and r 1 (t) = r(t). Given the density, the quadrupole formula tells us that the dominant quadrupole mode of the gravitational radiation field tensor in the transverse-traceless gauge is where the trace-free mass quadrupole tensor is δ ab is the Kronecker delta. In Cartesian coordinates, the indicies take on values of "x", "y", and "z". For example, we have H xx , H yy , H xy , etc. The components of the mass quadrupole tensor, I ab , that are relevant to Eq. (A9) (non-zero temporal derivatives) are I yy = d 3 xρy 2 = m 1 y 1 (t) 2 + m 2 y 2 (t) 2 , (A11b) I xy = d 3 xρxy = m 1 x 1 (t)y 1 (t) + m 2 x 2 (t)y 2 (t) , and by symmetry I xy = I yx . This framework computes the gravitational wave as perturbations, H ab , of the background spacetime metric. However, numerical simulations instead provide the plus, h + and cross, h × , gravitational-wave polarizations defined on a "sphere at infinity", which are obtained after contracting H ab with the polarization tensors [30]. On this sphere, it is common to define a complexified gravitational wave and subsequently expand h into a complete basis of spin =−2 weighted spherical harmonics labeled by ( , m) harmonic indices, −2 Y m . Here θ and φ are the polar and azimuthal angles. For example, the SXS (2, 2) mode gravitational waveform data, h 22 , was used extensively in this paper. Given the orbital trajectories, the (2, 2) mode, can be computed directly from the trace-free mass quadrupole tensor [30,.
To summarize, from the orbital motion we compute the three components of the mass quadrupole tensor Eq. (A11), compute the trace-free mass quadrupole tensor, compute the time derivatives using finite differences, and then finally assemble the (2,2)-multipolar component of the outgoing gravitational waves, h 22 , from Eq. (A14).
While a full discussion is outside the scope of this appendix, we point out that the quadrupole formula is the simplest possible one and, consequently, ignores a lot of the relevant physics.
Future work could consider incorporating more physics into the gravitational waveform model, including relativistic definitions of the density, higher-order post-Minkowskian corrections, subdominant harmonic modes, or near-field-tofar-field transport maps. Nevertheless, some of missing physics might already be accounted for through the orbital model, where the deep networks may try to account for missing physics in the waveform model by modifying the orbital dynamics model.