Surrogate models for precessing binary black hole simulations with unequal masses

Only numerical relativity simulations can capture the full complexities of binary black hole mergers. These simulations, however, are prohibitively expensive for direct data analysis applications such as parameter estimation. We present two new fast and accurate surrogate models for the outputs of these simulations: the first model, NRSur7dq4, predicts the gravitational waveform and the second model, surfinBH7dq4, predicts the properties of the remnant black hole. These models extend previous 7-dimensional, non-eccentric precessing models to higher mass ratios, and have been trained against 1528 simulations with mass ratios $q\leq4$ and spin magnitudes $\chi_1,\chi_2 \leq 0.8$, with generic spin directions. The waveform model, NRSur7dq4, which begins about 20 orbits before merger, includes all $\ell \leq 4$ spin-weighted spherical harmonic modes, as well as the precession frame dynamics and spin evolution of the black holes. The final black hole model, surfinBH7dq4, models the mass, spin, and recoil kick velocity of the remnant black hole. In their regime of validity, both models are shown to be more accurate than existing models by at least an order of magnitude, with errors comparable to the estimated errors in the numerical relativity simulations.


I. INTRODUCTION
As the LIGO [1] and Virgo [2] detectors reach their design sensitivity, gravitational wave (GW) detections [3][4][5][6][7][8][9] are becoming routine [10,11].To maximize the science output of the data collected by the network of detectors, it is crucial to accurately model the source of the GWs.Among the most important sources for these detectors are binary black hole (BBH) systems, in which two black holes (BHs) lose energy through GWs, causing them to inspiral and eventually merge.
Numerical relativity (NR) simulations are necessary to accurately model the late inspiral and merger stages of the BBH evolution.These simulations accurately solve Einstein's equations to predict the evolution of the BBH spacetime.The most important outputs of NR simulations are the gravitational waveform and the mass, spin, and recoil kick velocity of the remnant BH left after the merger.
For interpreting detected signals, model waveforms are used to compare with detector data and infer the properties of the source [12][13][14].The mass and spin of the remnant determine the black hole ringdown frequencies, which are used in testing general relativity [15][16][17].In addition, the recoil kick is astrophysically important because it can cause the remnant BH to be ejected from its * vvarma@caltech.eduhost galaxy [18][19][20].
Among BBHs, systems with BH spins that are misaligned with respect to the orbital angular momentum are complicated to model analytically or semi-analytically.For these systems, the spins interact with both the orbital angular momentum and each other, causing the system to precess about the direction of the total angular momentum [51].This precession is imprinted on the waveform as characteristic modulations in the amplitude and frequency of the GWs, and can be used to extract information about the spins of the source.One important application of the extracted spins is to distinguish between formation channels of BBHs [52][53][54][55].
The precessing BBH problem for quasicircular orbits is parametrized by seven parameters: the mass ratio q = m 1 /m 2 ≥ 1 and two spin vectors χ 1,2 , where the index 1 (2) refers to the heavier (lighter) BH.The total mass scales out of the problem and does not constitute an additional parameter for modeling.The surrogate models of Ref. [56] for the gravitational waveform, and Ref. [57] for the remnant properties, were the first to model the 7−dimensional space of generically precessing BBH systems, albeit restricted to mass ratios q ≤ 2, and dimensionless spin magnitudes χ 1,2 ≤ 0.8.Trained directly against numerical simulations, these models do not need to introduce additional assumptions about the underlying phenomenology of the waveform or remnant properties that necessarily introduces some systematic error.Through cross-validation studies, it was shown that both these models achieve accuracies comparable to the numerical simulations themselves [56,57], and as a result, are the most accurate models currently available for precessing systems, within their parameter space of validity.
In this paper, we present extensions of the above surrogate models to larger mass ratios.Our new surrogate models are called NRSur7dq4 and NRSur7dq4Remnant, for the gravitational waveform and remnant properties, respectively.They are trained against 1528 precessing NR simulations with mass ratios q ≤ 4, spin magnitudes χ 1 , χ 2 ≤ 0.8, and generic spin directions.Both models are made publicly available through the gwsurrogate [58] and surfinBH [59] Python packages; example evaluation codes are provided at Ref. [60] and Ref. [59], respectively, for NRSur7dq4 and NRSur7dq4Remnant.
The rest of the paper is organized as follows.Section.II covers some preliminaries to set up the modeling problem for precessing BBH systems.Section III describes the training simulations.Sec.IV describes the NRSur7dq4 waveform surrogate model.Section V describes the NR-Sur7dq4Remnant remnant properties surrogate model.Section VI compares these models against NR simulations to assess their accuracy.Finally, Sec.VII presents some concluding remarks.In App.A we examine how accurate these models are when extrapolated beyond mass ratio q = 4, and in App.B we investigate some features in the error distribution of the NR simulations.

II. PRELIMINARIES AND NOTATION
It is convenient to combine the two polarizations of the waveform into a single complex, dimensionless strain h = h + − ih × , and to represent the waveform on a sphere as a sum of spin-weighted spherical harmonic modes: Here −2 Y m are the spin =−2 weighted spherical harmonics, and ι and ϕ 0 are the polar and azimuthal angles on the sky in the source frame.For nonprecessing systems, the direction of orbital angular momentum (L) is fixed and the ẑ direction of the source frame is chosen to be along L by convention.The gravitational radiation is strongest along the directions parallel and antiparallel to L. Therefore, for nonprecessing systems the quadrupole modes ( = 2, m = ±2) dominate the sum in Eq. ( 1), but the nonquadrupole modes can become important at large mass ratios or ι close to π/2 [61][62][63][64][65][66][67][68][69][70] Coorbital frame FIG. 1.The real part of the (2, 2) and (2, 1) modes of the gravitational waveform in the inertial (top), coprecessing (middle), and coorbital (bottom) frames.In the inertial frame, the amplitude of the (2, 1) mode can be comparable to that of the (2, 2) mode.In the coprecessing frame, on the other hand, the (2, 2) mode always dominates.In addition, most effects of precession are removed by the rotation and the waveform in the coprecessing frame resembles that of a nonprecessing system.In the coorbital frame, finally, the waveform is further simplified and does not oscillate about zero.Mass ratio and initial spins used to produce this figure are indicated at the top of the plot.
By contrast, for precessing systems the direction of L varies due to precession [51] and so there is not a fixed axis along which the radiation is dominant.The standard practice is to choose ẑ of the source frame along the direction of L (or the total angular momentum) at a reference time or frequency.
Heuristically, one can think of a precessing system as a nonprecessing system with time-dependent frame rotations applied to it.In this non-inertial frame the rotation causes mixing of power between modes of fixed .For example, the power of the (2, ±2) modes leaks into the (2, ±1) and (2, 0) modes.This means that all = 2 modes can be dominant in Eq. (1).While this rotating-frame picture ignores some dynamical features such as nutation, it accounts for most of the effects of precession in the waveform.
By the same logic, one could apply a time-dependent rotation to a precessing system such that ẑ always lies along L(t).In this non-inertial frame, referred to as the coprecessing frame [71][72][73], the radiation is always strongest along ẑ, and the ( = 2, m = ±2) modes are dominant.In fact, since most precessional effects are accounted for by the frame rotation, the waveform in the coprecessing frame is qualitatively similar to that of a nonprecessing system (cf.Fig. 1).This observation has been exploited in the literature [21,24,27,56,74] to simplify the modeling of precessing systems.Here we proceed similarly, using the coprecessing frame described in Ref. [73] and denoting the strain in this frame as h copr m .The waveform can be made even simpler, and therefore easier to model, by applying an additional rotation about the z−axis of the coprecessing frame by an amount equal to the instantaneous orbital phase: Here we define the orbital phase, using the coprecessing frame strain.The waveform h coorb m (t) corresponds to a new frame, called the coorbital frame, in which the BHs are always on the x−axis, with the heavier BH on the positive x−axis 1 .More importantly, the waveform in the coorbital frame is nearly nonoscillatory, simplifying the modeling problem greatly.Figure 1 shows an example of a waveform in the inertial, coprecessing, and coorbital frames.

A. Parameter space coverage
We use 890 precessing NR simulations used in the construction of the surrogate models of Refs.[56,57], which provide coverage in the q ≤ 2 and χ 1 , χ 2 ≤ 0.8 regions of the parameter space.We also make use of 64 aligned-spin simulations with q ≤ 4 and χ 1 , χ 2 ≤ 0.8 used in the construction of the surrogate model presented in Ref. [82].Finally, we performed 574 new simulations with 2 < q ≤ 4, χ 1 , χ 2 ≤ 0.8 and generic spin directions-these simulations are presented here for the first time.The parameters for the first 204 of these are chosen based on sparse grids as detailed in Appendix A of Ref. [56].The remaining parameters are chosen as follows.We randomly sample 1000 points uniformly in mass ratio, spin magnitude, and spin direction on the sphere.We compute the distance between points a and b using the metric where ∆q = 4 − 1 = 3 and ∆χ = 0.8 are the ranges of these parameters.These normalization factors are somewhat arbitrary, although any choice of order unity should provide a reasonable criteria for point selection.For each sampled parameter, we compute the minimum distance to all previously chosen parameters.We then add the sampled parameter maximizing this minimum distance to the set of chosen parameters.This is done iteratively for 370 additional parameters.The new simulations have identifiers SXS:BBH:1346-1350 and SXS:BBH:1514-2082, and are made publicly available through the SXS public catalog [83].The parameter space covered by the 890+64+574=1528 NR simulations used in this work is shown in Fig. 2. Note that not all of these are independent simulations: for 154 of these cases we have q = 1, with χ 1 = χ 2 ; for each of these cases we effectively obtain an additional simulation by exchanging the labels of the two BHs.
The start time of these simulations varies between 4693M and 5234M before the peak of the waveform amplitude, where M = m 1 + m 2 is the total Christodoulou mass measured close to the beginning of the simulation at the "relaxation time" [84].The initial orbital parameters are chosen through an iterative procedure [85] such that the orbits are quasicircular; the largest eccentricity for these simulations is 9.8 × 10 −4 , while the median value is 3.8 × 10 −4 .

B. Data extracted from simulations
We make use of the following quantities extracted from the NR simulations: the waveform modes h m (t), the component spins χ(t), the mass ratio q, and the remnant mass m f , spin χ f , and kick velocity v f .
The waveform is extracted at several extraction spheres at varying finite radii from the origin and then extrapolated to future null infinity [84,86].Then the extrapolated waveforms are corrected to account for the initial drift of the center of mass [87,88].The time steps during the simulations are chosen nonuniformly using an adaptive time-stepper [84].Using cubic splines, we interpolate the real and imaginary parts of the waveform modes to a uniform time step of 0.1M ; this is dense enough to capture all frequencies of interest, including near merger.The interpolated waveform at future null infinity, scaled to unit mass and unit distance, is denoted as h m (t) in this paper.q FIG. 2. Parameters of the 1528 NR simulations used in the construction of the surrogate models in this paper.We show the distribution of mass ratio q and the spin components in standard spherical polar coordinates (χ, θ, φ) at −4300M from the waveform amplitude peak.The index 1 (2) refers to the heavier (lighter) BH.
The component spins χ 1,2 (t) and masses m 1,2 are evaluated on the apparent horizons [77] of the BHs.The masses at the relaxation time [84] are used to define the mass ratio q = m 1 /m 2 .Unless otherwise specified, all masses in this paper are given in units of the total mass M = m 1 + m 2 at relaxation.The spins are interpolated onto the same time array2 as used for the waveform, using cubic splines.
The remnant mass m f and spin χ f are determined from the common apparent horizon long after ringdown, as detailed in Ref. [84].The remnant kick velocity is derived from conservation of momentum, v f = −P rad /m f [90].
measured in the near zone [84].In this paper, we identify t with t.While this identification is gauge-dependent, the spin directions are already gauge-dependent.We, however, note that the spin and orbital angular momentum vectors in the damped harmonic gauge used by SpEC agree quite well with the corresponding vectors in post-Newtonian (PN) theory [89].
The radiated momentum flux P rad is integrated [91] from the strain h m .

C. Post-processing the output of NR simulations
After extracting the strain and spins from the simulations, we apply the following post processing steps before building the surrogate models.
First, we shift the time arrays of all waveforms such that t = 0 occurs at the peak (see Ref. [56] for how the peak is determined) of the total waveform amplitude, defined as: ( Then we rotate the waveform modes such that at a reference time t 0 = −4300M , the inertial frame coincides with the coorbital frame.This means that the ẑ direction of the inertial frame is along the principal eigenvector of the angular momentum operator [73] at the reference time.In addition, the x direction of the inertial frame is along the line of separation from the lighter BH to the heavier BH (in other words, the orbital phase is zero).The spin vectors χ 1,2 (t) are also transformed into the same inertial frame.
We then truncate the waveform and spin time series by dropping all times t < −4300M to exclude the initial transients known as "junk radiation".After the truncation, the reference time t = −4300M is also the start time of the data.
For t > −100M , the spin measurements from the apparent horizons start to become unreliable as the horizons become highly distorted.Following Ref. [56], starting at t = −100M , we extend the spins to later times using PN spin evolution equations.This evolution is done even past the merger stage, into the ringdown.We stress that the extended spins are unphysical but are a useful parametrization to construct fits at late times.
Finally we apply a smoothing filter (see Eq. ( 6) of Ref. [56]) on the spin time series to remove fast oscillations taking place on the orbital timescale.This smoothing helps improve the numerical stability of the ordinary differential equation (ODE) integrations described in Sec.IV B. Note that we use the filtered spins for the waveform surrogate (Sec.IV) but not for the remnant surrogate (Sec.V), for which we just use the unfiltered spins since there are no ODE integrations involved.

IV. WAVEFORM SURROGATE
To construct the waveform surrogate, we closely follow the model of Ref. [56], with some modifications to adapt it to higher mass ratios.We refer to the new waveform model as NRSur7dq4.and h − 2,2 (cf.Eq. 6), we take advantage of the above fact to move most of the oscillations from the larger to the smaller data piece.

A. Coorbital frame surrogate
Modelling slowly varying functions of time, rather than oscillatory functions, inevitably results in higher accuracy.Therefore, we first decompose the strain into several "data pieces", each of which is a slowly varying function of time, and build a surrogate for each of them.At evaluation time, we combine the various data pieces to reconstruct the inertial frame strain.To reduce the cost of these transformations, we first downsample the inertial frame strain onto a set of 2000 time values t coorb i that are approximately uniformly spaced in the orbital phase (using the method described in App.B of Ref. [56]).
As described in Sec.II, the waveform is simpler in the coorbital frame.A further simplification is possible by considering combinations of m > 0 and m < 0 counterparts of a fixed mode: Figure 3 shows an example of the simplification obtained with this combination.For all m = 0 modes we model the real and imaginary parts of h ± m .For m = 0 modes, we directly model the real and imaginary parts of the coorbital frame strain h coorb ,m .We construct an independent surrogate model for each of these data pieces and refer to the combination of these models as the coorbital frame surrogate.
As described in Ref. [56], for each waveform data piece, we construct a linear basis using singular value decomposition with an RMS tolerance of 3 × 10 −4 .We then construct an empirical time interpolant with the same number of empirical time nodes as basis functions for that data piece [92][93][94].The empirical time nodes are chosen as a subset of the 2000 coorbital time values (t coorb i ) and are specific to each data piece.Finally, for each empirical time node, we construct a parametric fit for the waveform data piece.The fits are parametrized as functions of the mass ratio and the spins in the coorbital frame at that time.We describe our fitting procedure in Sec.IV C. At evaluation time, the coorbital frame spins at any time are obtained using the dynamics surrogate described in Sec.IV B.

B. Dynamics surrogate
The surrogate described in Sec.IV A only models the strain in the coorbital frame.We also need to model the following quantities: 1.The orbital phase in the coprecessing frame, which is required to transform the strain from the coorbital frame to the coprecessing frame [cf.Eq. ( 2)]; 2. The quaternions describing the coprecessing frame, which are required to transform the strain from the coprecessing frame to the inertial frame; 3. The spins as a function of time, which are used in the evaluation of the parametric fits described in Sec.IV C.
We refer to the model for these quantities as the dynamics surrogate.Using the fitting method of Sec.IV C, we first construct parametric fits for ω(t), Ω coorb x,y (t), and χcoorb 1,2 (t) at selected time nodes referred to as the dynamical time nodes t dyn i .Here χcoorb 1,2 (t) are the time derivatives of the coprecessing frame spins transformed to the coorbital frame, ω(t) is dφ/dt (cf.Eq. ( 3)), and Ω coorb x,y (t) is the angular velocity of the coprecessing frame, transformed to the coorbital frame.These quantities are described in more detail in Sec.III of Ref. [56] ).This involves solving a coupled ODE as described in Sec.V of Ref. [56].At each step of the ODE integration, the coorbital frame spins at the current node t dyn i are first obtained.These are then used to evaluate the parametric fits for the derivative quantities mentioned above.Note that the spins used in the dynamics surrogate are the filtered spins mentioned in Sec.III C; this improves the accuracy of the ODE integration by making the spin time derivatives easier to model.

C. Parametric fits
For the coorbital frame surrogate of Sec.IV A, we need to construct parametric fits at various empirical time nodes for the different data pieces.Similarly, for the dynamics surrogate of Sec.IV B, we need to construct fits for various time derivatives at the dynamical time nodes t dyn i .We use the same procedure for each of these fits.Let us refer to the data to be fitted as y(Λ), where Λ is a seven-dimensional set of parameters.
For each of these fits, the seven parameters Λ must contain information on mass ratio q and coorbital frame spins χ coorb 1,2 (t i ) at the time corresponding to the fit.Following Ref. [57], we parametrize the fits using where χcoorb is the spin parameter entering the GW phase at leading order [12,26,95,96] and χ coorb a is the "anti-symmetric spin", We empirically found this parameterization to perform more accurately than the more intuitive choice ] used in Ref. [56].
Fits are constructed using the forward-stepwise greedy fitting method described in App.A of Ref. [74].We choose the basis functions to be a tensor product of 1D monomials in the components of Λ.The components of Λ are first affine mapped to the interval [−1, 1] before constructing the tensor product.We consider up to cubic powers in log(q) and up to quadratic powers in the spin parameters.We find that going to higher powers does not significantly improve the fit accuracy within the training region, but the mass ratio extrapolation errors estimated in App.A become much larger.
It is always possible to improve the accuracy of a fit by adding more basis functions.However, this can lead to over-fitting when the data contain some noise.Our source of noise is mostly due to NR truncation error, but also systematic errors such as waveform extrapolation and residual eccentricity.In order to safeguard against over-fitting, we perform 10 trial fits, leaving a random 10% of the dataset out as validation points in each trial, to determine the set of basis functions used in constructing the final fit.We allow a maximum of 100 basis functions for each fit.See App.A of Ref. [74] for more details.

D. Surrogate evaluation
To evaluate the surrogate, we begin with a user-specified mass ratio q and spins χ coorb 1,2 at the initial time t = −4300M .Note that at this time, the inertial frame coincides with the coorbital frame.These values are used to initialize the dynamics surrogate described in Sec.IV  ).The coorbital frame surrogate described in Sec.IV A is used to predict the strain in the coorbital frame.This involves evaluating the fits at the empirical time nodes for this surrogate using χ coorb  1,2 (t coorb i ) and q.Then, the orbital phase φ(t coorb i ) is used to transform the strain from the coorbital frame to the coprecessing frame (cf.Eq. 2).Finally, the coprecessing frame quaternions Q(t coorb i ) are used to transform the strain from the coprecessing frame to the inertial frame (this involves Wigner matrices, see App.A of Ref. [73]).This gives us h m (t coorb i ), which is interpolated onto any required time array t using cubic splines to get h m (t).

V. REMNANT SURROGATE
To construct the remnant properties surrogate, we closely follow the model of Ref. [57].We refer to the new model presented here as NRSur7dq4Remnant.
We model the remnant mass m f , spin χ f , and kick velocity v f .Before constructing the fits, χ f and v f are transformed into the coorbital frame at t = −100M .We model each component of the vectors independently.The fits are parametrized by the same Λ of Eq. ( 7), but using the component spins at t = −100M .Unlike the waveform surrogate case, we do not filter out orbitaltimescale oscillations.The filtered spins were found to be necessary for the accuracy of the time integration in Sec.IV B, which is not necessary here because the remnant properties can evaluated from the BBH parameters at a single time t = −100M .
All fits are performed using Gaussian Process Regression (GPR), as described in the supplementary materials of Ref. [57].We find that GPR fitting is, in most cases, more accurate but also significantly more expensive than the polynomial fitting method described in Sec.IV C. GPR becomes impractical to use for the waveform surrogate as there are hundreds of fits that need to be evaluated to generate the waveform.For the remnant fits, however, the additional cost of GPR is acceptable because one is only fitting 7 quantities (m, χ f , v f ).In addition, GPR naturally provides error estimates which can be useful in data analysis applications.The efficacy of the GPR error estimate in reproducing the underlying error of the surrogate models was investigated thoroughly in the supplementary materials of Ref. [57].
Although NRSur7dq4Remnant is parameterized internally by input spins specified in the coorbital frame at t = −100M , we allow the user to specify input spins at earlier times, and in the inertial frame; this case is handled by two additional levels of spin evolution.Given the inertial-frame input spins at an initial orbital frequency f 0 , we first evolve the spins using a post-Newtonian (PN) approximant -3.5PN SpinTaylorT4 [89,97,98] -until we reach the domain of validity of the more accurate NRSur7dq4 (t = −4300M from the peak).We then use the dynamics surrogate of NRSur7dq4 to evolve the spins until t = −100M .These spins are then transformed to the coorbital frame and used to evaluate the remnant fits.Thus, spins can be specified at any given orbital frequency and are evolved consistently before estimating the final BH properties.Note that NRSur7dq4 uses the filtered spins, while NRSur7dq4Remnant expects unfiltered spins at t = −100M , but we find that the errors introduced by this discrepancy are negligible compared to the errors due to PN spin evolution.

VI. RESULTS
We evaluate the accuracy of our new surrogate models by comparing against the waveform and remnant properties from the NR simulations used in this work.For this, we perform a 20-fold cross-validation study to compute "out-of-sample" errors as follows.We first randomly divide the 1528 training simulations into 20 groups of ∼ 76 simulations each.For each group, we build a trial surrogate using the ∼ 1452 remaining training simulations and test against these ∼ 76 validation ones, which may include points on the boundary of the training set. .We also show the corresponding NR waveforms.Each waveform is projected using all available modes for that model, along the direction that results in the largest mismatch for NRSur7dq4 (SEOBNRv3) in the top (bottom) panel.Note that NRSur7dq4 is evaluated using trial surrogates that are not trained using these cases.The binary parameters and the direction in the source frame are indicated in the figure text.All waveforms are time shifted such that the peak of the total amplitude occurs at t = 0 [using all available modes, according to Eq. ( 5)].The waveform modes are then rotated to have their orbital angular momentum aligned with the z-axis, and such that the orbital phase is equal to zero at t = −4300M .

A. Waveform surrogate errors
To estimate the difference between two waveforms, h 1 and h 2 , we use the mismatch where h(f ) indicates the Fourier transform of the complex strain h(t), * indicates a complex conjugation, Re indicates the real part, and S n (f ) is the one-sided power spectral density of a GW detector.We taper the time domain waveform using a Planck window [99], and then zero-pad to the nearest power of two.We further zero-pad the waveform to increase the length by a factor of eight before performing the Fourier transform.The tapering at the start of the waveform is done over 1.5 cycles of the (2, 2) mode.The tapering at the end is done over the last 30M .Note that our model contains times up to 100M after the peak of the waveform amplitude, and the signal has essentially died down by the last 30M .We take f min to be twice the waveform angular velocity (as defined by Ref. [100]) at the end of the initial tapering window, and f max is chosen to be 4 times the waveform angular velocity at t = 0; the extra factor of 4 is chosen to resolve up to m = 4 spherical-harmonic modes, with an extra margin of a factor of 2. We compute mismatches with a flat noise curve (S n = 1) as well as with the Advanced-LIGO design sensitivity noise curve [101].Mismatches are computed following the procedure described in Appendix D of Ref. [74].In particular, we optimize over shifts in time, polarization angle, and initial orbital phase.Both plus and cross polarizations are treated on an equal footing by using a two-detector setup where one detector sees only the plus and the other only the cross polarization.We compute the mismatches at 37 points uniformly distributed on the sky in the source frame, and we use all available modes of a given waveform model.Figure 4 summarizes the out-of-sample mismatches for NRSur7dq4 against the NR waveforms.In the left panel we show mismatches computed using a flat noise curve.We compare this with the truncation error in the NR waveforms themselves, estimated by computing the mismatch between the two highest available resolutions of each NR simulation.The errors in the surrogate model are well within the estimated truncation errors of the NR simulations.In addition, we also show the errors for the waveform model SEOBNRv3 [24,31], which also includes spin precession effects 3 .The surrogate errors are at least an order of magnitude lower than those of SEOBNRv3.
Apart from SEOBNRv3, another model commonly used in data analysis applications is IMRPhemomPv2 [27].IM-RPhemomPv2 was shown to be comparable in accuracy to SEOBNRv3 in Ref. [56], at least in order of magnitude.Therefore, for simplicity, we do not show comparisons of IMRPhemomPv2 to NR here.Note that updated versions of both SEOBNRv3 (based on Ref. [22]) and IMRPhemomPv2 (see Ref. [21]) are under development, but are not currently available publicly.We note that these models are calibrated only against aligned-spin NR simulations, using a much smaller set of simulations than our model.Both these factors contribute to the accuracy of these models.On the other hand, these models are expected to be valid for larger mass ratios and spin magnitudes than our model, although their accuracy in that region is unknown due to lack of sufficient number of simulations.
We note that the NR truncation mismatch distribution in the left panel of Fig. 4 has a tail extending to MM ∼ 0.1.We find that these cases occur when the spins of the two highest resolutions of the simulation are inconsistent with each other because of unresolved effects during junk-radiation emission, meaning that the two resolutions represent different physical systems.This means that comparing the resolutions for these cases gives us an error estimate that is too conservative and does not reflect the actual truncation error of the simulations.We expect the actual truncation error to be closer to the errors reproduced by the surrogate model (which is trained on the high resolution data set) in Fig. 4. Evidence for these claims is provided in App.B. The right panel of Fig. 4 shows mismatches computed using the Advanced LIGO design sensitivity noise curve [101].In this case, results depend on the total mass M of the system.Consequently, we show the median and Error histograms for NRSur7dq4Remnant for the remnant mass, spin magnitude, spin direction, kick magnitude, and kick direction for precessing BBH with mass ratios q ≤ 4 and spin magnitudes χ1, χ2 ≤ 0.8.The direction error is the angle between the predicted vector and a fiducial vector, taken to be the high-resolution NR case and indicated by .Square (triangle) markers indicate median (95th percentile) values.Also shown are the NR resolution errors and errors for different existing fitting formulae.In the bottom-right panel we show the joint distribution of kick magnitude and kick-direction error.
95th percentile values at different M , rather than full histograms.Once again, the surrogate errors are comparable to those of the NR simulations, and are at least an order of magnitude lower than that of SEOBNRv3.Over the mass range 50 − 200M , mismatches for NRSur7dq4 are always 8 × 10 −3 at the 95 percentile level.Fig. 5 shows a comparison of waveforms computed via NRSur7dq4, SEOBNRv3, and NR for the cases that lead to the largest error for NRSur7dq4 and SEOBNRv3 in the left panel of Fig. 4. The surrogate shows reasonable agree-ment with NR, even for its worst case, while SEOBNRv3 shows a noticeably larger deviation in both cases.
In Figs. 4 and 5 we use all available modes for NR-Sur7dq4 and SEOBNRv3.NRSur7dq4 models all modes ≤ 4, while SEOBNRv3 models only the = 2 modes.For the NR waveforms in Figs. 4 and 5, we include all modes ≤ 5 to account for the error due to neglecting > 4 modes in NRSur7dq4.To better understand what fraction of the SEOBNRv3 error comes from neglecting modes with > 2, we repeat the calculations leading to the SEOBNRv3 histogram in the left panel of Fig. 4 in Fig. 6, while restricting all waveforms to = 2.While there is a noticeable move towards lower mismatches when restricted to = 2, the median and 95th percentile values change only marginally, suggesting that the main error source for SEOBNRv3 are the = 2 modes themselves.

B. Remnant surrogate errors
We evaluate the accuracy of the remnant surrogate NRSur7dq4Remnant by comparing against the NR simulations through a cross-validation study as in Sec.VI A. Out-of-sample errors for the remnant properties predicted by NRSur7dq4Remnant are shown in Fig. 7. 95th percentile errors are ∼ 5×10 −4 M for mass, ∼ 2×10 −3 for spin magnitude, ∼ 4 × 10 −3 radians for spin direction, ∼ 4×10 −4 c for kick magnitude, and ∼ 0.2 radians for kick direction.Our errors are at the same level as the NR resolution error, estimated by comparing the two highest NR resolutions.The largest errors in the kick direction can be of order ∼ 1 radian.The bottom-right panel of Fig. 7 shows the joint distribution of kick magnitude and kick direction error for NRSur7dq4Remnant, showing that direction errors are larger at low kick magnitudes.Our error in kick direction is below ∼ 0.2 radians whenever v f 2 × 10 −3 c.
We also compare the performance of our fits against several existing fitting formulae for remnant mass, spin, and kick which we denote as follows: HBMR ( [32,33] with n M = n J = 3), UIB [34], HL [35], HLZ [36], and CLZM ( [37][38][39][40][41] as summarized in [42]).To partially account for spin precession, these fits are corrected as described in Ref. [102] and used in current LIGO/Virgo analyses [6,103]: spins are evolved using PN from relaxation to the Schwarzschild innermost stable circular orbit, and final UIB and HL spins are post-processed by adding the sum of the in-plane spins in quadrature.Figure 7 shows that our procedure to predict remnant mass, spin magnitude, and kick magnitude for precessing systems is more accurate than these existing fits by at least an order of magnitude.
Our fits appear to outperform the NR simulations when estimating the spin direction.Once again, this is due to the post-junk-radiation initial spins of the two highest resolutions being inconsistent with each other for some of our simulations, so that different resolutions represent different physical systems (cf.App.B).Therefore, the errors estimated by comparing the two highest resolutions is a poor estimate of the actual truncation error for these cases.The actual truncation error is likely to be close to the errors reproduced by the surrogate.
The NRSur7dq4Remnant fits in Fig. 7 are evaluated using the NR spins at t = −100M as inputs.In typical applications, one may have access to the spins only at the start of the waveform, rather than at t = −100M .For this case, as described in Sec.V, we use a combination of PN and NRSur7dq4 to evolve the spins from any given

Error
FIG. 8. Errors for NRSur7dq4Remnant in predicting remnant properties when spins are specified at an orbital frequency of f0 = 10 Hz.For four different total masses, we compute the differences between the surrogate prediction of various remnant properties with the value obtained in the NR simulation.For each mass, these differences are shown as a vertical histogram.Note that the distributions in these plots are normalized to have a fixed height, not fixed area.
starting frequency to t = −100M .These spins are then used to evaluate the NRSur7dq4Remnant fits.Thus, spins can be specified at any given orbital frequency and are evolved consistently before estimating the final BH properties.This is a crucial improvement (introduced by Ref. [57]) over previous results, which, being calibrated solely to non-precessing systems, suffer from ambiguities regarding the time/frequency at which spins are defined.Figure 8 shows the errors in NRSur7dq4Remnant when the spins are specified at an orbital frequency f 0 = 10 Hz.These errors are computed by comparing against 23 long NR (3 × 10 4 M to 10 5 M in length) simulations [84] with mass ratios q ≤ 4 and generically oriented spins with magnitudes χ 1 , χ 2 ∼ 0.5.None of these simulations were used to train the fits.Longer PN evolutions are needed at lower total masses, and the errors are therefore larger.These errors will decrease with an improved spin evolution procedure.Note, however, that our predictions are still more accurate than those of existing fitting formulae (cf.Fig. 7).

VII. CONCLUSION
We present new NR surrogate models for precessing BBH systems with generic spins and unequal masses.In particular, we model the two most-used outputs of NR simulations: the gravitational waveform and the properties (mass, spin, and recoil kick) of the final BH formed after the merger.Trained against 1528 NR simulations with mass ratios q ≤ 4, spin magnitudes χ 1,2 ≤ 0.8, and generic spin directions, both these models are shown to reproduce the NR simulations with accuracies comparable to those of the simulations themselves.
The waveform model, NRSur7dq4, includes all spinweighted spherical harmonic modes up to = 4.The precession frame dynamics and spin evolution of the BHs are also modeled as byproducts.Through a crossvalidation study, we show that the mismatches for NR-Sur7dq4 against NR computed with the Advanced LIGO design sensitivity noise curve are always 8 × 10 −3 at the 95 percentile level over the mass range 50 − 200M .This is at least an order of magnitude improvement over existing waveform models.NRSur7dq4 is made publicly available through the gwsurrogate [58] Python package, with example evaluation code at Ref. [60].
For the final BH model, NRSur7dq4Remnant, the 95th percentile errors are ∼ 5×10 −4 M for mass, ∼ 2×10 −3 for spin magnitude, ∼ 4×10 −4 c for kick magnitude.Once again, these are lower than that of existing models by at least an order of magnitude.In addition, we also model the spin and kick directions.Moreover, the GPR methods employed here naturally provide error estimates along with the fitted values.These uncertainty estimates can be incorporated into data analysis applications to marginalize over systematic uncertainties.NRSur7dq4Remnant is made publicly available through the surfinBH [59] Python package, which includes an example evaluation code.

A. Future work
In App.A we test the performance of these surrogate models when extrapolated outside their training range to q = 6.We find that our models become worse at these mass ratios, but are still comparable or better than existing models.Unfortunately, suitable precessing simulations are currently not available for testing at intermediate mass ratios 4 < q < 6.In general, we advice caution with extrapolation.A natural improvement of both NRSur7dq4 and NRSur7dq4Remnant is to extend their range of validity with new training simulations at higher mass ratios and spin magnitudes.We note, however, that both these regimes are increasingly expensive to model in NR.
Another important limitation of these models is that they are restricted to the same length as the NR simulations (starting time of ∼ 4300M before the peak or about 20 orbits).For LIGO, assuming a starting GW frequency of 20 Hz, the (2, 2) mode of the surrogate is valid for total masses M 66M .This number, however, depends on the mass ratio.Fig. 9 shows the mass range of validity of NRSur7dq4 as a function of mass ratio.We compare this with the parameters of the 10 BBH detections seen by LIGO and Virgo in the first two observing runs [9].NR-Sur7dq4 sufficiently covers the posterior spread of most but not all of these detections, the main limitation being the number of orbits covered by the model.However, see Ref. [104] for an example of NR surrogates used in data analysis with GW signals.A promising avenue to extend the length of the waveforms is to "hybridize" the simulations using PN waveforms in the early inspiral.This approach already was found to be successful for the case of aligned-spin BBH [82], but still needs to be generalized to precessing spins.Furthermore, it is not clear if the current length of the NR simulations is sufficient to guarantee good attachment of the PN and NR waveforms for precessing BBH.
Despite these limitations, in their regime of validity, the models presented in the paper are the most accurate models currently available for precessing BBHs.As shown in this paper, our models rival the accuracy of the NR simulations, while being very cheap to evaluate.As more and more BBHs are detected at higher signal-to-noise ratios, fast yet accurate models such as these will contribute to turning GW astronomy into high precision science.by the National Science Foundation (awards OCI-0725070 and ACI-1238993) and the state of Illinois.Blue Waters is a joint effort of the University of Illinois at Urbana-Champaign and its National Center for Supercomputing Applications.Simulations were performed on NSF/NCSA Blue Waters under allocation NSF PRAC-1713694; on the Wheeler cluster at Caltech, which is supported by the Sherman Fairchild Foundation and by Caltech; and on XSEDE resources Bridges at the Pittsburgh Supercomputing Center, Comet at the San Diego Supercomputer Center, and Stampede and Stampede2 at the Texas Advanced Computing Center, through allocation TG-PHY990007N.Computations for building the model were performed on Wheeler.In this Appendix we assess the performance of the NR-Sur7dq4 and NRSur7dq4Remnant models when evaluated at mass ratio q = 6.Doing so is effectively an extrapolation because q = 6 is outside the training range of the surrogates (q ≤ 4).The surrogate models are compared against 100 NR simulations with q = 6 and generically precessing spins with magnitudes χ 1 , χ 2 ≤ 0.8.These simulations have been assigned the identifiers SXS:BBH:2164 -SXS:BBH:2263, and are made publicly available through the SXS public catalog [83].
Figure 10 shows the q = 6 extrapolation mismatches for NRSur7dq4.Also shown are the mismatches for SEOB-NRv3 when compared against the same simulations.The mismatches are computed in the same manner as in the left panel of Fig. 4, which we reproduce here for compari-son.The surrogate errors become noticeably worse when extrapolating to q = 6, but are still much smaller than the corresponding errors for SEOBNRv3.Error histograms of the remnant mass, spin magnitude and kick magnitude when extrapolating NR-Sur7dq4Remnant to mass ratio q = 6.The training range errors from Fig. 7 are reproduced here for comparison.We show errors using the NR spins at t = −100M (yellow) as well as the initial NR spins (blue) as inputs for the model.Also shown are the errors for existing fitting formulae described in Sec.VI B; for the final mass and spin, we only show the minimum error among the HBMR, UIB, and HL fits.The square (triangle) markers indicate median (95th percentile) values.
Fig. 11 shows the performance of NRSur7dq4Remnant when extrapolating to q = 6.We show the errors when the fits are evaluated using the NR spins at t = −100M as well as when the spins are specified at the start of the NR simulations.In the latter case, we use the extrapolated dynamics surrogate of NRSur7dq4 to evolve the spins to t = −100M and then evaluate the fits.We reproduce the training range errors from Fig. 7 for comparison.Also shown are the errors for the existing fitting formulae described in Sec.VI B when compared against the same simulations.We find that NRSur7dq4Remnant performs noticeably worse when extrapolated to q = 6 but is still slightly better than the existing fitting formulae, except for the final spin where the existing fitting formulae perform slightly better.
In general, we find that the NRSur7dq4 and NR-Sur7dq4Remnant models become worse with extrapolation to q = 6 but are still better or comparable to existing models.Unfortunately, we do not have enough suitable precessing simulations with 4 < q < 6 with which to test at what mass ratio the degradation of these surrogate models becomes significant.We leave these tests, as well as extending the models to larger mass ratios by adding NR simulations, to future work.

Appendix B: On the high mismatch tail in NR errors
The histogram of NR errors in the left panel of Fig. 4 shows a significant tail to the right, i.e. at large mismatches.In Sec.VI A, this tail was attributed to different resolutions of the same NR simulation having different physical parameters, namely the "initial" spins, which are measured at the relaxation time [84] after the poorlyresolved junk-radiation transients have settled.In this Appendix we provide some evidence for this claim.Figure 12 shows the maximum mismatch (with a flat noise curve) over points in the sky versus the difference in the relaxation-time dimensionless spins between the two highest resolutions.We refer to the two highest resolutions as HiRes and MedRes, and their corresponding relaxationtime dimensionless spins are denoted by (χ HiRes ), respectively.We note that the largest mismatch occurs when the spin difference is largest between the two resolutions.For a significant fraction of the simulations the spins can be different by about 0.1; for these cases the two resolutions essentially represent two different physical systems, so the difference in waveforms between the two resolutions fails to be a good estimate of the truncation error in the simulations.
Figure 12 suggests that the high NR mismatch tail of Fig. 4 is artificially large, and if the two resolutions were to correspond to the same physical system, the tail would be shorter.We test this in Fig. 13, where we compare the surrogate against the MedRes simulations, but use the spins of the MedRes simulation (χ MedRes 1 , χ MedRes 2 ) to evaluate the surrogate.The surrogate mismatches against the HiRes simulations as well as the NR resolution mismatches (HiRes vs MedRes) are reproduced from Fig. 4 for comparison.We note that the surrogate mismatches when compared against the MedRes simulations always lie below ∼ 10 −2 and do not have the high mismatch tail  seen for the NR resolution mismatches.In this test, we are treating the surrogate, which is trained on the HiRes simulations, as a proxy for the HiRes dataset.Evaluating the surrogate with the parameters of a MedRes simulation is treated as a proxy for performing the HiRes simulation with the same parameters.Therefore, the green histogram in Fig. 13 can be treated as the "true" resolution error when the parameters of the resolutions are the same.As expected for this case, this estimate of the resolution error agrees with the errors for the surrogate model (red histogram).Together, Figs. 12 and 13 show that the high NR mismatch tail in the left panel of Fig. 4 is due to the difference in the parameters of the different NR resolutions.We believe this difference arises from spurious initial transients known as "junk radiation".These transients result from initial data that do not precisely represent a snapshot of a binary that has evolved from t = −∞.The transients quickly leave the simulation domain after about one or two binary orbits.It is computationally expensive to resolve the high spatial and temporal frequencies of the transients, so we typically choose not to resolve these tran-sients at all, and instead we simply discard the initial part of the waveform.Because some of the transients carry energy and angular momentum down the BHs, the masses and spins are modified, so we measure "initial" masses and spins at a relaxation time [84] deemed sufficiently late that the transients have decayed away.Because we do not fully resolve the transients, their effect on the masses and spins are not always convergent with resolution.
This issue should ideally be resolved with improved, junk-free initial data (see Ref. [105] for steps in this direction).In the meantime, we propose a change in how SpEC performs different resolutions for the same simulation.Currently, initial data are constructed by solving the Einstein constraint equations [77,106].The same constraint-satisfying initial data are then interpolated onto several grids of different resolution, and Einstein's equations are evolved on each grid independently.Our proposal is to first evolve the initial data using the high resolution grid until the transients leave the simulation domain, and then interpolate the data at that time onto grids of lower resolution, and evolve Einstein's equations on these lower-resolution grids independently.This way all resolutions start with the same initial data at a time after transients have decayed away instead of at the start of the simulation, and the masses and spins of the black holes should be convergent.
This proposal is tested in Fig. 14 for the case leading to the largest NR mismatch in the left panel of Fig. 4. We perform the resolution branching at t ∼ 1000M after the start of the high resolution simulation.The outer boundary is at ∼ 600M and this is sufficient time for junk radiation to leave the simulation domain.We find that the mismatches decrease significantly when the resolution branching is done post-junk, as the resolutions now correspond to the same physical system.

FIG. 4 .= 3 .6 χ 1 =FIG. 5 .
FIG.4.Mismatches for NRSur7dq4 and SEOBNRv3 models, when compared against precessing NR simulations using all ≤ 5 modes with mass ratios q ≤ 4, and spin magnitudes χ1, χ2 ≤ 0.8.The NRSur7dq4 errors shown are out-of-sample errors.Also shown are the NR resolution errors.Mismatches are computed at several sky locations using all available modes for each model: ≤ 4 for NRSur7dq4, and = 2 for SEOBNRv3.The NR error is computed using all ≤ 5 modes from the two highest available resolutions.Left panel: Mismatches computed using a flat noise curve.The square (triangle) markers at the top indicate the median (95th percentile) values.Right panel: Mismatches computed using the Advanced LIGO design sensitivity noise curve, as a function of total mass.The dashed (solid) lines indicate the median (95th percentile) values over different NR simulations and points in the sky.

mismatch SEOBNRv3 vs NR ( 5 )FIG. 6 .
FIG.6.Same as the left panel of Fig.4but using only = 2 modes for NR when compared to SEOBNRv3.The blue histogram from the left panel of Fig.4, where SEOBNRv3 is compared to NR with all ≤ 5 modes, is reproduced here for comparison.The square (triangle) markers at the top indicate the median (95th percentile) values.

FIG. 9 .
FIG. 9.The shaded region shows the regime of validity of the (2,2) mode of NRSur7dq4 with a starting frequency of 20 Hz.Also shown are the parameter ranges for the 10 BBH signals seen by LIGO and Virgo during the first two observing runs [9].The markers indicate the median values of the marginalized posteriors for the detector frame total mass M and mass ratio q.The error bars indicate the range between the 5th percentile and 95th percentile values of the posteriors.

FIG. 10 .
FIG. 10.Mismatch histogram when extrapolating the NR-Sur7dq4 waveform model to mass ratio q = 6.Also shown are mismatches for SEOBNRv3.The mismatches are computed using a flat noise curve.The training range errors from the left panel of Fig. 4 are reproduced here for comparison.The square (triangle) markers indicate median (95th percentile) values.
FIG. 11.Error histograms of the remnant mass, spin magnitude and kick magnitude when extrapolating NR-Sur7dq4Remnant to mass ratio q = 6.The training range errors from Fig.7are reproduced here for comparison.We show errors using the NR spins at t = −100M (yellow) as well as the initial NR spins (blue) as inputs for the model.Also shown are the errors for existing fitting formulae described in Sec.VI B; for the final mass and spin, we only show the minimum error among the HBMR, UIB, and HL fits.The square (triangle) markers indicate median (95th percentile) values.

FIG. 12 .
FIG.12.Dependence of the NR resolution error on the difference in the relaxation-time spins of the two highest resolutions (labeled HiRes and MedRes).The horizontal (vertical) axis shows the difference between the spin of the heavier (lighter) BH.The colors show the largest (flat noise) mismatch between the waveforms of the two resolutions over different points in the sky.Large mismatches occur when the difference between the relaxation-time spins of the two resolutions is large.

FIG. 13 .
FIG.13.Mismatch histograms for NRSur7dq4 when compared against the two highest available NR simulations (referred to as HiRes and MedRes).Also shown are mismatches between the two resolutions (labeled NR).The "NRSur7dq4 vs HiRes" and NR errors are the same as the red and black histograms, respectively, in the left panel of Fig.4.These are flat noise mismatches, computed at several points in the sky.The square (triangle) markers indicate median (95th percentile) values.
FIG. 14. NR resolution mismatches for the simulation leading to the largest NR mismatch in the left panel of Fig. 4. The different samples in the histogram correspond to comparisons at different angles on the sky.The blue histogram shows the current resolution errors when the two resolutions start with the same initial data at the start of the simulation.All points in the blue histogram are the same as those included in the left panel of Fig. 4. The green histogram shows the resolution errors for the same case when the two resolutions start with the same initial data at ∼ 1000M after start, at which point the junk radiation has left the simulation domain.
19, −0.78, 0.00] χ 2 = [0.01,−0.00, 0.80] FIG. 3. The top panel shows the real part of the (2, 2) and (2, −2) modes of the waveform in the coorbital frame.Notice that the orbital time scale oscillations of these two modes have opposite signs.The bottom panel shows the real parts of h + 2,2