Tests of General Relativity with Gravitational-Wave Observations using a Flexible--Theory-Independent Method

We perform tests of General Relativity (GR) with gravitational waves (GWs) from the inspiral stage of compact binaries using a theory-independent framework, which adds generic phase corrections to each multipole of a GR waveform model in frequency domain. This method has been demonstrated on LIGO-Virgo observations to provide stringent constraints on post-Newtonian predictions of the inspiral and to assess systematic biases that may arise in such parameterized tests. Here, we detail the anatomy of our framework for aligned-spin waveform models. We explore the effects of higher modes in the underlying signal on tests of GR through analyses of two unequal-mass, simulated binary signals similar to GW190412 and GW190814. We show that the inclusion of higher modes improves both the precision and the accuracy of the measurement of the deviation parameters. Our testing framework also allows us to vary the underlying baseline GR waveform model and the frequency at which the non-GR inspiral corrections are tapered off. We find that to optimize the GR test of high-mass binaries, comprehensive studies would need to be done to determine the best choice of the tapering frequency as a function of the binary's properties. We also carry out an analysis on the binary neutron-star event GW170817 to set bounds on the coupling constant $\alpha_0$ of Jordan-Fierz-Brans-Dicke gravity. We take two plausible approaches; in the first \emph{theory-agnostic} approach we find a bound $\alpha_0 \lesssim 2\times 10^{-1}$ from measuring the dipole-radiation for different neutron-star equations of state, while in the second \emph{theory-specific} approach we obtain $\alpha_0 \lesssim 4\times 10^{-1}$, both at $68\%$ credible level. These differences arise mainly due to different statistical hypotheses used for the analysis.


I. INTRODUCTION
Over the past half-decade, observations of gravitational waves (GWs) have gone from being elusive to the routine. Since the first detection of GWs in September 2015 [1], the LIGO [2] and Virgo [3] detectors have observed almost a hundred GW signals [4] from mergers of black holes (BHs), neutron stars (NSs) [5,6] and their mixture [7]. Placed alongside independent confirmations of these detections, as well as, claims of new ones [8][9][10][11][12][13], these results have firmly established the field of GW astronomy.
Besides attempting answers in astrophysics [14][15][16] and cosmology [17,18] in a manner complementary to electromagnetic astronomy, GWs are unique probes of fundamental physics. For more than a century, Albert Einstein's theory of General Relativity (GR) has been our description of gravitational interactions, having passed every experimental and observational challenge, so far. However, for the first time, the LIGO-Virgo GW observations have allowed us to probe GR in the large-velocity, * ajit.mehta@aei.mpg.de † alessandra.buonanno@aei.mpg.de ‡ rcotest1@jhu.edu § abhirup.ghosh@aei.mpg.de ¶ jan.steinhoff@aei.mpg.de highly dynamical, strong-field regime of gravity, a regime which is inaccessible with tests in the Solar System [19], in binary pulsars [20], and with observations around supermassive BHs at the center of galaxies [21][22][23].
Tests of GR with GW observations come in two distinct flavors: theory agnostic and theory specific. The first class of tests assumes that the underlying GW signal is well-described by GR, and any potential deviation is characterized by extra, phenomenological, non-GR degrees of freedom or parameters. These tests use observations of GWs to constrain the non-GR parameters and check for consistency with their nominal predictions in GR. While theory-agnostic tests can only comment on (dis)agreement with GR predictions, the above measurements of phenomenological non-GR parameters can be translated to specific modified theories of gravity, albeit there are subtleties, as we shall discuss below. Investigations that compare directly the data with modified theories of gravity belong to the theory-specific flavor of tests of GR.
Several tests of GR have been demonstrated using the observations of GW signals by the LIGO-Virgo Collaboration (LVC) [24][25][26][27][28][29]. Among them are the theoryagnostic parameterized tests of the inspiral, which check for the agreement of the early evolutionary (or inspiral) phase of a compact binary coalescence composed of BHs and/or NSs with the analytic post-Newtonian (PN) approximation for binaries in GR [30][31][32][33][34]. Parameter-ized GR waveforms have used the LIGO-Virgo observations to provide state-of-the-art bounds on possible deviations from the PN predictions [28]. At the same time, these theory-agnostic bounds have been used to conduct theory-specific tests and constrain particular modified theories of gravity (see, e.g., Refs. [35][36][37][38][39]).
And yet, parameterized inspiral tests are not all exactly the same; they can differ in the underlying GR waveform model, in how the non-GR parameters are introduced into it, and in the transition beyond the inspiral part. In this work, we develop a framework to examine how the details of the construction of parameterized waveform models systematically affect the tests of GR in which they are employed 1 . It is also important to be able to distinguish deviations from GR due to systematic uncertainties of the waveform model from true violations of the theory. This infrastructure allows us to add generic corrections to the inspiral portion of any gravitational waveform, thereby allowing tests of GR with a broader range of waveform models than previously possible (e.g., with the TIGER infrastructure [33,34]). We call this framework the flexible theory-independent (FTI) approach, and explore it using synthetic binary BH (BBH) GW signals. In addition, the conveniently adaptable design of the FTI framework allows us for easy construction of waveform models for theory-specific tests. As an example, we apply the FTI framework to the first binary NS (BNS) merger observed by the LIGO and Virgo detectors, GW170817, and set bounds on the Jordan-Fierz-Brans-Dicke (JFBD) scalar-tensor theory of gravity. We note that the FTI approach has already been extensively used by LIGO and Virgo data analysts in Refs. [26][27][28][29] and also by some of the authors of this manuscript in Ref. [37]. This paper is organized as follows. In Sec. II we introduce the FTI method for multipolar waveform models of compact-object binaries. After recalling the tenets of Bayesian inference in Sec. III, in Sec. IV we apply the FTI method to synthetic GW signals of BBHs. This allows us to discuss the effect of the FTI parameterization on the recovery of the BBH properties (excluding the GR-deviation parameters) and to study the robustness of the FTI method. In Sec. V, we use the FTI construction on real data, notably the BNS signal GW170817, to set bounds on the JFBD theory of gravity. Finally, in Sec. VI, we summarize our main conclusions and also discuss possible future work. The Appendix A collects the necessary PN results for the GW phase of BBH with aligned spins.
Henceforth, we use natural units such that the Newton constant G = 1 and the speed of light c = 1.

II. THE FTI APPROACH
In GR, gravitational signals from quasi-circular BBHs depend on the instrinsic parameters λ = {m 1 , m 2 , S 1 , S 2 }, where m i , S i are the masses and spins of the compact objects (i = 1, 2), as well as, a set of extrinsic parameters ξ = {ι, ϕ c , α, δ, ψ, d L , t c }. These are the angular position of the line of sight measured in the source frame (ι, ϕ c ), the sky location of the source in the detector frame (α, δ), the polarization angle ψ, the luminosity distance of the source d L and the time of arrival t c . Limiting ourselves to objects with non-precessing spins (i.e., spins aligned or antialigned with the orbital angular momentum L), the only (dimensionless) spin component on which the dynamics, and hence the waveform, depends is χ i = S i · L/(|L|m 2 i ). The set of instrinsic parameters consequently reduces to four λ = {m 1 , m 2 , χ 1 , χ 2 }. For convenience, we additionally introduce the following parameters: the mass ratio q = m 1 /m 2 ≥ 1, the symmetric mass ratio ν = q/(1 + q) 2 , the binary's total mass M = m 1 + m 2 , the chirp mass M = ν 3/5 M , and the effective spin χ eff = (m 1 χ 1 + m 2 χ 2 )/M . The above set of 11 parameters is enough to describe an aligned-spin BBH signal. For a binary involving NSs, this set increases by the tidal parameters (Λ 1,2 ), which encode the NS matter equation of state.
In GR, the GW signal can be decomposed into a set of modes by projecting the complex linear combination of its plus and cross polarizations During the inspiral, the GW signals from aligned-spin binaries satisfy a reflection symmetry about its orbital plane, which implies where * denotes the complex conjugation. As a consequence, we can restrict ourselves to the m ≥ 0 modes to describe the complete mode-content of aligned-spin inspiral waveforms. Furthermore, for such systems,h R m (f ), the Fourier transform of the real part of h m (t) is related to the imaginary part viã Using Eq. (3) and (4), the GW polarizations in the frequency domain read (see Appendix C of Ref. [42] for full derivation) where d 2 m (ι) denote the Wigner functions of weight −2 [43]. Being a complex function,h R m (f ) can be written ash The construction of a parameterized (or generalized) waveform model begins with the baseline model in GR (Eq. (6)). During the quasi-circular, adiabatic inspiral, the frequency-domain phase ψ m (f ) can be obtained from PN theory [44,45] using the stationary-phase approximation (SPA). In GR it reads where The quantity F is the orbital frequency which is related to the (Fourier) GW frequency f for a given ( , m)-mode through the equation above. The quantities ψ (PN) n and ψ (PN) n(l) are the (n/2)-PN coefficients 2 . which depend on the binary parameters. In the Appendix A, we provide explicit expressions for the PN coefficients up to 3.5PN order (including spin effects), the highest PN order to which the GW phasing is currently known. Note that the logarithmic terms in Eq. (7) arise from tails effects (i.e., terms that depend on the complete past history of the binary).
We generalize the GR waveform model by considering corrections to the phase that take a form similar to the PN expansion 2 The l in n(l) refers to PN coefficients alongside log v in addition to v n dependence (see the Appendix A) where δψ n and δψ n(l) are deviations to the (n/2)-PN phase coefficients ψ (PN) n and ψ (PN) n(l) defined above. These correction terms depend, in addition to λ, also on the corresponding deviation parameter δφ n or δφ n(l) , respectively. We include possible deviations at "pre-Newtonian" orders (n < 0) as these are predicted in some alternative theories of gravity. In particular the emission of dipole radiation (discussed in Sec. V A) leads to a nonvanishing deviation at n = −2. A solitary deviation at n = −1 is less well motivated and not discussed in this work. Parameterized deviations of this form can be mapped onto the predictions of any hypothetical theory of gravity provided that (i) the theory admits a weakfield, slow-velocity PN expansion as in GR, and (ii) the deviations from GR are parametrically smaller than the PN-expansion parameter v 2 . Note that this excludes theories that admit non-perturbative phenomena like dynamical scalarization, for which the naive PN expansion in Eq. (9) breaks down [46,47], and other methods are needed to observe such effects (see, e.g., Ref. [48]). Because GW detectors are more sensitive to the evolution of the signal's phase than its amplitude, considering also the computational cost of introducing several free parameters, we neglect deviations in the mode amplitudes A m .
Despite the generality of Eq. (9), the moderate signalto-noise ratios (SNRs) of most LIGO-Virgo observations, so far, do not allow us to place meaningful bounds on multiple deviation parameters concurrently. Hence, we vary one parameter at a time, keeping the rest fixed at their nominal GR prediction, which is zero. This assumption is validated by investigations [39,[49][50][51] that conclude that a signal containing deviations at several PN orders is likely to lead to a non-zero deviation measurement using a model with only a single deviation parameter. On the other hand, a recent work [52] following a principle component analysis identified certain combinations of these deviation parameters with the tightest constraints; in fact, the two dominant principle components rather capture the essence of the full multi-parameter test.
In this paper, we assume that each deviation parameter represents a fractional deviation to the corresponding PN coefficient in GR and for this reason, we also refer to the deviation parameters as non-GR parameters, We handle PN orders for which GR coefficients vanish slightly differently (i.e., for n = −2, −1, 1). For such cases, we let ϕ n represent an absolute deviation at that order instead. While Eq. (9) unambiguously details how to generalize GR waveforms containing only the inspiral, additional care must be taken for waveforms that contain later portions of the GW signal, such as the merger-ringdown. For the FTI approach, we require that the parameterized deviations satisfy the following properties 1. The early-inspiral (low-frequency) waveform has a phase ψ m (f ; λ; δφ n , δφ n(l) ) = ψ (GR) m (f ; λ) + δψ m (f ; λ; δφ n , δφ n(l) ), where δψ m takes the form of Eq. (9).
2. The post-inspiral (high-frequency) waveform has a phase ψ m (f ; λ; δφ n , δφ n(l) ) = ψ (GR) m (f ; λ) that exactly reproduces the underlying GR polarizations (Eq. (5)) up to some constant shift which represents the total dephasing from the GR polarization accumulated over the inspiral.
3. The waveform polarizations are C 2 smooth over all frequencies.
Using Eq. (10), the δψ m (f ; λ; δφ n , δφ n(l) ) in Eq. (9) read To smoothly apply these corrections over only the inspiral, we use a tapering function W (f ; v tape , ∆v tape ) given by which smoothly transitions between one and zero around v tape over the range of ∼ ∆v tape , where v is defined in Eq. (8). We construct the total phase correction for a given ( , m)-mode by combining this windowing function with the second derivative with respect to frequency of δψ m , which we denote as ψ m (f ), and reintegrating with appropriate integration constants to ensure C 2 smoothness. In summary, we use δψ m (f ; λ; δφ n , δφ n(l) ; v tape , ∆v tape ) = df δψ m (f , λ; δφ n , δφ n(l) ) is the reference frequency at which the phase of the ( m)-mode vanishes. This choice ensures that the definition of the reference frequency does not change when we add these corrections to the GR waveform. The second integration boundary is fixed by requiring that the first derivative of δψ m (f ) goes to zero at the frequency of the (2, 2)-mode's peak f peak 22 (see Eq. (A8) of Ref. [53]). This requirement ensures that the alignment between the GR waveform and the modified GR waveform in the time-domain remains the same.
Let us elaborate more on the tapering parameters v tape and ∆v tape that enter the final expression for the phase corrections (13). Using Eq. (8), the parameter v tape can be equivalently specified by the orbital frequency F tape at which the corrections are tapered off. Note that while the orbital tapering frequency F tape is the same for all modes, the tapering frequency in Fourier domain f tape m = mF tape depends on the mode. In the following, we fix the tapering frequency by specifying f tape 22 as a fraction of f peak 22 , say f tape with a constant α of order unity, and v tape = (πf tape 22 M ) 1/3 . Furthermore, instead of specifying the parameter ∆v tape directly, we find it more useful to fix the number of GW cycles over which the window function defined in Eq. (12) switches its value from 0 to 1. The number of GW cycles N GW between the GW frequencies f 1 and f 2 , or respectively v 1 and v 2 defined by Eq. (8), can be estimated as where we make use of the leading-order PN expressions for f (t) andḟ (t) in the last equality. Now, choosing v 1,2 = v tape ∓ ∆v tape /2, we solve for the small ∆v tape v tape , and N GW = 1. While we also employ these choices for the studies here, we devote Sec. IV B to investigate how results are effected when varying, in particular, the tapering frequency f tape 22 . Changing N GW in the range 0.8-3, instead, does not affect the results significantly.
At low frequencies, any GR description of the modes of the GW-phase reduces to the one of Eq. (7). Thus, we can apply the method outlined above to inspiraling PN models and also to inspiral-merger-ringdown GR models (if they are available in time domain we first perform a Fourier transform). Our method can treat the deviation coefficients in Eqs. (9) and (10) either as free parameters (see Sec. IV) or identify them to the ones predicted in specific alternative theories of gravity (see Sec. V), provided the latter have perturbative deviations from GR and are represented by PN-like coefficients. This is the eponymous flexibility of the FTI approach.
In this work, we apply the FTI approach to the multipolar, aligned-spin effective-one-body (EOB) waveform model SEOBNRHM [53,56] for BBHs, the aligned-spin EOB model SEOBNRT [53,57,58] for BNSs, and for some  Table I). The numerical value of the (2, 1)-mode is relatively small and hence it is not shown in the plot for the sake of clarity. We use as tapering frequency f tape studies, the aligned-spin inspiral-merger-ringdown phenomenological model PhenomX for BBHs [59] 3 . The SEOBNR, SEOBNRT and PhenomX waveforms contain the ( , m) = (2, 2) (dominant) mode, while the SEOBNRHM waveforms include four additional sub-dominant modes, ( , m) = (2, 1), (3,3), (4,4), and (5,5). We denote the waveforms to which we apply the non-GR phase corrections (10) parameterized waveforms and refer to them as pSEOBNRHM, pSEOBNRT and pPhenomX. We end this section with an illustration of the tapering using the pSEOBNRHM model. We choose a binary with parameters that correspond to the median of the GW190814 [54] event, as listed in Table I, and generate two pSEOBNRHM waveforms corresponding to a deviation parameter of δφ 2 = 0.5 and its GR limit δφ 2 = 0 (i.e., identical to SEOBNRHM). Figure 1 shows contributions of the phase corrections δψ The left panel highlights how the correction to the phase for each mode, δψ m (f ), approaches a constant value for f > f tape m where the correction is tapered off. Consistent with this, the right panel shows how the amplitude of the parameterized waveform (pSEOBNRHM) returns to the GR limit (SEOBNRHM) for f > f tape 44 4 We note that the parameterized waveform returns to the GR one when the overall phase difference, as illustrated in the left panel, becomes constant and is hence absorbed into the coalescence phase. In other words, the coalescence phase recovered using the parameterized waveforms and GR waveforms from the parameter-estimation studies is, in general, not expected to agree. In Fig. 2, we show the plus polarization of the parameterized and GR waveforms in time domain. The time-domain waveforms are generated from their frequency-domain counterparts via inverse Fourier transformation. The waveforms are aligned at the peak of their amplitudes. The vertical lines indicate the times corresponding to the tapering frequencies (i.e., F (t tape ) ≡ F tape ), where the quantity 2F (t) is computed from the first derivative of the (2,2)-mode's phase. For this figure, the tapering frequency used is = 2F tape , as stated before. For later analyses, we also indicate in the figure other choices of the tapering frequency. In the time-domain plot, it is visually clearer to see that the pSEOBNRHM waveform reduces to SEOBNRHM post-tapering (i.e., for t ≥ t tape ), while displaying a significant mismatch before the tapering (in the inspiral regime).

III. BAYESIAN INFERENCE
We can now use the parameterized waveforms introduced under the FTI approach in the previous section to perform tests of GR with LIGO-Virgo observations. The first step involves the measurement or inference of the waveform parameters, given the observed GW data. For this, we use a Bayesian approach in this paper. Given the hypothesis H that our data d = h(θ) + n consists of detector noise n and a single GW signal, which can be accurately described by our waveforms h(θ) depending on parameters θ = {λ, ξ, δφ n , δφ n(l) }, the Bayes' theorem states that Here, P (θ|d, H) is the posterior probability distribution of the parameters θ, and p(θ|H) is the prior probability distribution. The quantity E(d|H) is called the evidence of the hypothesis H, which is just the normalization constant of the posterior probability distribution P (θ|d, H). The quantity L(d|θ, H) denotes the likelihood of obtaining the data d = h(θ) + n given that the parameters of the waveform are θ under the hypothesis H, the latter entailing the waveform and noise models. It is computed using where .|. is the noise-weighted inner product, Equation (17) simply encodes our hypothesis of a stationary Gaussian noise model, with the power spectraldensity (PSD) of the GW detector noise denoted by S n (f ). We work with the three-detector network consisting of the two LIGO and the Virgo detectors. The quantities f high and f low denote the maximum and minimum frequencies, respectively, that enter in the likelihood estimation. The precise values of these quantities are dictated by the sensitivity bandwidth of the GW detectors. We will state them explicitly wherever required. The prior probability distributions chosen for our analyses in this paper are identical to Ref. [60]. More specifically, they are uniform in the component masses, isotropic in spin orientations and uniform in their magnitudes between [0, 0.99], uniform in the Euclidean volume for the luminosity distance, and isotropic in the sky location and binary orientation. We also choose a flat prior on our non-GR parmeters, δφ n , δφ n(l) . For the GW170817 analysis in Sec. V, we will introduce additional parameters and their priors below. With these prior distributions, we proceed to stochastically sample the parameter space using a Markov-Chain Monte Carlo (MCMC) algorithm provided in the LALINFERENCE code [60]. The one-dimensional posterior distribution of a specific parameter (demonstrated, e.g., in Fig. 3) is then computed by simply marginalizing P (θ|d, H) over the nuisance parameters.
We now utilize the Bayesian parameter inference outlined above to demonstrate the FTI approach in two specific cases: in the context of a theory-agnostic test with the BBH GW signals GW190412 and GW190814 (Sec. IV), and a theory-specific test with the BNS signal GW170817 (Sec. V).

IV. APPLICATION OF THE FTI APPROACH TO BINARY BLACK HOLES
The first of our two goals in this paper is to show the pliability of the FTI approach in performing theory-agnositc tests of GR, where one checks for the (dis)agreement between estimates of non-GR parameters with their GR predictions. In this section, we stress-test the FTI approach on GW signals, and explore its robustness against different assumptions made by different families of waveforms. We also investigate possible systematic biases when the underlying waveform model contains missing physics, for example, an absence of information due to higher modes (HMs). Finally, we scrutinize how "flexible" the FTI approach really is in its choice of internal settings like the tapering frequency. An incomplete model or an incorrect internal setting can potentially flag a violation of GR, and needs to be accounted for while performing theory-agnositc tests of GR.
These investigations also illuminate the robustness of parameterized theory-agnostic inspiral tests in general, which is more straightforward in an approach like FTI that strives to expose all flexibility (or arbitrariness) of the test. Comparing this to the TIGER infrastructure [33,34], we note that TIGER is designed and implemented within a particular family of waveform models, with a specific prescription for the transition to the merger-ringdown phase being essentially inherited from the underlying GR waveform.
A. FTI results with synthetic signals: GW190412-like and GW190814-like In this section, we explore the effect on theory-agnostic tests of varying the physical content in our waveform model, for instance, the impact of HMs. We restrict ourselves to GW signals from BBH mergers, specifically, the high-mass ratio GW190412 and GW190814 BBH signals, which show non-negligible evidence for the presence of HMs [54,55]. We note that the FTI analysis with HMs has been already performed by the LVC on these two events in Ref. [28] (see Appendix C therein). Let us recapitulate those results here, summarized by the unfilled black histograms in Fig. 3. However, note that in LVC publications, the FTI results are typically transformed  [54] and GW190412 [55] signals observed by LIGO and Virgo. We indicate the source-frame masses with the superscript s, to distinguish them from the detector-frame masses, mi = (1 + z) m s i , used in the rest of the paper. We use the median values for our synthetic signals: GW190814-like and GW190412-like signals.
Parameter GW190814 GW190412 and reweighed to match the normalization of the TIGER inspiral parameters. This permits comparisons between the two approaches, generally finding good agreement for the GR constraints (see also Sec. V of Ref. [28] for details). Throughout this work, we present the FTI results without such reweighing. We thus create two simulated signals, one with properties similar to GW190412 [55] and another similar to GW190814 [54], and choose our injection parameters to be the median values of these events [54,55] (see Table I) 5 . We choose the pSEOBNRHM waveform model and a zero-noise configuration (i.e., we assume the data only contains the signal and no noise, so as to remove noise-induced systematics and focus on imperfections in waveform modeling). The properties of the two signals are very different and we expect conclusions established through this study to hold for a wide set of BBH signals. For the analysis of these injections, we assume a threedetector network of the two LIGO and Virgo detectors, with their respective sensitivities representative of the third observing run (O3) configurations. Accordingly, we use the PSDs provided in Ref. [61]. We set the minimum frequency of the likelihood computation as f low = 20 Hz. The simulated signals are analyzed with two separate waveform models, pSEOBNRHM and pSEOBNR , to understand the effect of the presence/absence of HMs in our waveform model on the results. We again highlight here that we do not allow more than one deviation parameter to vary during the inference analysis at any given time.
Before investigating these synthetic signals, we highlight a few intriguing points about the black curves in Fig. 3. First, for both the events, the deviation parameter δφ −2 contains the GR value (i.e., zero) at the tail of its posterior distributions. Note that δφ 0 shows a sim- ilar albeit not as pronounced a behavior as δφ −2 . Additionally, for GW190814, the posterior distribution of δφ 5l shows a small secondary peak at ∼ 0.62, which gets enhanced upon the reweighing. These features could be due to the particular noise realization even when using the Gaussian approximation, or might be due to unaccounted systematics, either in our understanding of noise properties around the event, or our waveform modeling, especially in the high-mass regimes. Thus, it would be worthwhile to have the results also from the corresponding simulated signals, so that some of these questions can be answered. We now discuss this.
In Fig. 3 we show the results from the simulated signals next to the black curves (i.e., those from the actual events). We can see that the posterior distributions of δφ −2 obtained from the simulated signals with the HM waveform pSEOBNRHM peak at zero. Also, unlike in the case of the real event GW190814, δφ 5l from the simulated signal has a unimodal posterior distribution. These results suggest that the large bias seen in the case of real events are likely induced by the noise content 6 . Figure 4 shows the 90% bounds on the posterior distributions of the deviation parameters for both signals. The bounds from the real events and the corresponding simulated signals are different, however, they vary by much less than an order of magnitude. This is interesting because there could be noise artifacts in the real data and because the simulated signals may not exactly represent the real events, yet we see that the bounds are of the same order.
We also show, in the same figures, the results obtained with the pSEOBNR waveforms, which do not include HMs. One can see that the posterior distributions obtained with this waveform peak away from the injected value (zero) relative to the HM waveform pSEOBNRHM. Among all, the lower-order deviation parameters (δφ −2 , δφ 0 , δφ 1 , δφ 2 , δφ 4 ) have noticeable biases. This suggests that neglecting HMs would compromise on accuracy, which would be consequential at high SNRs. In   In addition to the accuracy, the inclusion of HMs also improves precision (i.e., the bounds) of the measurement of the deviation parameters (Fig. 4). The improvement is marginal, but it is expected to improve significantly at high SNRs. Furthermore, consistent with our expectation, the lower PN-order deviation parameters (δφ −2 , δφ 0 , δφ 1 , δφ 2 , δφ 3 , δφ 4 , δφ 5l ) are measured relatively better with the GW190814-like signals as this system has lower total mass and thus more of the low-frequency inspiral falls within the bandwidth of the detectors. To give an example, the 90% bound for δφ −2 obtained from GW190814-like signals is ∼ O(10 −3 ), while from GW190412-like signals it is ∼ O(10 −2 ), and thus there is an order of magnitude difference. On the other hand, for GW190412-like signals which have higher total mass, the high PN deviation parameters (δφ 6 and δφ 7 ) are better measured.
We end this section with a plot (Fig. 6) that show a typical FTI waveform. The shaded region shows the 90% uncertainty in the FTI waveforms from the analysis of a representative deviation parameter, in this case δφ 0 , with GW190412-like injected signal. We first computed the waveforms corresponding to the posterior samples and then determine their 90% uncertainty in each time bin. We also plot the corresponding GR waveform which corresponds to the maximum a posteriori (MAP) parameters in the GR analysis of the GW190412-like signal, to explore the uncertainties of the FTI waveforms around the true GR waveform. We align the GR waveform with the FTI MAP waveform at low frequency following the   Figure 6 shows that the MAP FTI and GR waveforms are quite close to each other. This is because their GR parameters are very similar, as we will discuss below.

B. Optimization and systematics of the FTI approach
In this section we would like to investigate how to optimize the null tests enabled by the FTI approach, and also explore possible systematics due to the settings, notably the choice of the tapering frequency, and the family of waveform models adopted.
In Sec. II we have discussed a particular choice of the tapering frequency (f tape 22 = 0.35f peak 22 ). Historically, this choice was motivated by comparisons of the FTI results with TIGER results 8 . Nevertheless, given that the FTI method allows for flexibility in the tapering frequency, unlike TIGER, we want to understand the effects of varying it. It is also theoretically difficult to justify exactly where (i.e., at which orbital frequency) the inspiral ends. We thus allow the tapering frequency to vary up until 7 The exact range of the orbital frequency chosen here for the alignment does not matter much since the parameters of the GR and non-GR waveforms are very close. 8 In the LIGO-Virgo analyses [24][25][26][27][28], the TIGER code [34]   the (approximate) merger frequency, f peak 22 , and explore its impact on our results. Varying the tapering frequency beyond the merger frequency would not make much sense for an inspiral test. We note that FTI analysis of BNS would not be affected by increasing the default tapering frequency, because for such systems, higher tapering frequency lie outside the sensitivity bandwidth of current GW detectors (e.g., 0.35f peak 22 1350 Hz for GW170817). In Fig. 7 we show the change in the 90% credible intervals of the deviation-parameter posterior distributions, as the tapering frequency is varied. More specifically, the exact change in the bounds depends on the underlying signal itself, for example, for the high-total-mass GW190412-like system (right panel), with less inspiral in the detectors' frequency bands, bounds on the lower PN order deviation parameters (δφ −2 , δφ 0 , δφ 1 , δφ 2 ,φ 3 ) change at most by a factor of ∼ 2-3 (if we push the tapering frequency up to the peak of h 22 ). This makes sense because those deviation parameters affect the waveform significantly only at low frequencies and thus increasing the tapering frequency does not make much difference to the results. The change on the bounds increases (up to ∼ 5) for the higher PN deviation parameters, and becomes the largest for δφ 6 , for which the variation is ∼ 7.
Extending the tapering frequency up to close to merger increases the available SNR (see Table II), and improves the measurement of the higher PN deviation parameters δφ 4 , δφ 5 , δφ 6 .
For low-total-mass GW190814-like systems, as one can see from the left panel of Fig. 7, for almost all deviation parameters, the tapering frequency has a somewhat larger impact on the bounds. Bounds for the lower PN order deviation parameters (δφ −2 , δφ 0 , δφ 1 , δφ 2 , and δφ 3 ) change by a factor of 1-5, while the higher-order ones change by a factor as large as ∼ 7 (forφ 5 ). Also in this case, extending the tapering frequency up to merger, can improve the measurement of the high PN deviation parameters. The above studies suggest that to optimize the FTI framework for BBHs, it would be beneficial to obtain a tapering frequency that depends on the available SNR and the number of GW cycles up to the peak of the waveform. Eventually, those quantities depend on the total mass of the binary, mass ratio and spins.
As stated above, the chirp mass M generally correlates with the deviation parameters, a property discussed in more detail in the following subsection. This means that the measurement of the chirp mass would get affected by the variation of the tapering frequency. This can be seen in Fig. 8. The left panel shows the correlation of M with δφ 0 for GW190814-like systems, while the right panel shows the correlation with δφ −2 for GW190412-like systems. From the left panel we observe that the variation in the tapering frequency only affects the width of the chirp-mass posterior distribution, while the right panel shows that, additionally, there could arise some biases (even in the deviation parameter δφ −2 ) if the tapering frequency is too low. The reason for this could be that, for relatively high-mass systems, the number of GW cycles in the detectors' frequency bands is already low to start with, and thus using too low tapering frequencies would essentially make the inference almost insensitive to δφ −2 . We note that for the GW190412-like source, the bias is quite reduced when we use the tapering frequency at the peak of the (2,2) mode. This is because for this high-mass binary the last few cycles before merger can increase significantly the SNR accumulated. Hence, if one should find evidence for a violation of GR, one must first check for a possible bias of the FTI results by varying the tapering frequency. We find that the other GR parameters, besides the chirp mass, remain mostly unaffected. This could be because they play a subdominant role in the orbital dynamics during the inspiral.
Finally, so far we have employed for our analyses the EOB-based waveform models: SEOBNRHM and SEOBNR. We show in Fig. 9 the results obtained from the two simulated signals when recovered with the aligned-spin phenomenological waveform pPhenomX, in addition to the SEOBNR waveform. Both models only contain the (2, 2) mode. As we can see, there is really no significant difference between the bounds obtained from these two different waveform families, except for δφ 7 with the highermass GW190412-like system (perhaps due to modeling differences at high frequencies). In addition, the details of the individual posterior distributions (not shown here) are very similar. We thus expect that the results established in this work do not depend much on the underlying family of GR waveform models used, as long as they have comparable accuracy to numerical relativity.

C. Impact of the FTI approach on the GR parameters
In Figs. 10 and 11 we show the posteriors of the GR parameters of the two simulated signals introduced above, when recovering them with the GR SEOBNRHM model and the pSEOBNRHM model. For the cases involving the parameters δφ −2 and δφ 0 , we observe that the measurement uncertainty of the chirp mass increases significantly, while other GR parameters, e.g., the total mass (M ), mass ratio (q), or effective spin (χ eff ), remain unaffected. For the GW190412-like signal, the addition of δφ −2 introduces is used for the analysis, as demonstrated in the previous subsection. Hence, the biases appear to be the consequence of an insufficient number of GW cycles or SNR.
The fact that the chirp-mass measurement is heavily correlated or degenerate with the measurement of some non-GR parameters can be understood by looking at the FTI formulation itself. Restricting ourselves to the (2, 2)mode, using the definition of v from Eq. (8), the n/2-PNorder coefficient in Eq. (11) that contributes to the total phase correction is where δφ n is the n/2-PN-order deviation parameter. For the leading-order term n = 0, this implies since ψ (GR) 0 = 1. For the -1PN and 0.5PN phase corrections (i.e., n = −2, 1), which are absolute corrections since they are absent in GR, the expressions read and It becomes clear from Eqs. (20), (21), and (22) that the deviation parameters δφ −2 , δφ 0 and δφ 1 are degenerate with the chirp mass, as Fig. 8 shows. There are also correlations between the chirp mass (and other binary parameters) and the deviation parameters at higher PN orders, but they are milder. Indeed the addition of δφ 2 , δφ 3 , δφ 4 , δφ 5l and δφ 6l do not affect estimates of GR parameters in any noticeable fashion, as we have verified using the results in Figs. 10 and 11. However, for the highest PN-order deviation parameters, δφ 6 and δφ 7 , posterior distributions of GR parameters can show features like bimodalities, depending on the underlying signal, see Fig. 11. This is because, unlike the cases of n = −2, 0, 1, for values of n ≥ 2 the PN coefficients ψ spins. In fact, the bimodalities observed in the δφ 6 and δφ 7 cases disappear when we perform the analysis with non-spinning waveforms, shown by the dotted lines in Fig. 11. This suggests that the deviation parameters δφ 6 and δφ 7 induce strong correlations between the GR parameters when the binary is spinning. We notice that the amount of bimodality also depends on the tapering frequency, notably on the GW cycles and SNR.

V. APPLICATION OF THE FTI APPROACH TO A BINARY NEUTRON STAR
Here we consider the application of the FTI approach to a specific alternative theory of GR: the Jordan-Fierz-Brans-Dicke (JFBD) scalar-tensor theory [62][63][64]. Initially formulated in the mid-20th century, JFBD gravity was the very first scalar-tensor theory-a theory in which gravity is mediated by both a tensor (the metric) and a scalar. Since then, significant work has been done to extend this notion beyond JFBD theory to broader, more generic classes of scalar-tensor theories (e.g., Horndeski theories [65], Beyond Horndeski theories [66], Degenerate Higher-Order Scalar-Tensor theories [67,68]). Yet, despite its simplicity, JFBD gravity remains relevant today, though more as a pedagogical archetype of modified gravity than as a truly viable alternative to GR. In this vein, constraining JFBD theory with a particular experiment offers an easily understood benchmark of its sensitivity to deviations from GR.
The action for JFBD gravity written in the Jordan frame is given by where φ is a massless scalar field, ω BD is a dimensionless coupling constant 9 , and S m represents the action for matter fields ψ minimally coupled to the metricg µν . (Here ψ should not be confused with the GW modes ψ m introduced earlier.) Alternatively, the action can be rewritten in the Einstein frame by performing the conformal transformation g µν ≡ φg µν 9 JFBD is also commonly known as simply Brans-Dicke gravity (BD); following the standard convention in the literature, we adopt this abbreviation when denoting the coupling constant ω BD .
where we have defined the dimensionless parameter α 0 ≡ (3 + 2ω BD ) −1/2 and introduced the scalar field ϕ ≡ log(φ)/(2α 0 ); note that α 0 is non-negative and that we have implicitly assumed that ω BD > −3/2. (Note that ϕ is unrelated to the FTI deviation parameters.) In the limit that α 0 → 0 (ω BD → ∞), the scalar field decouples from the metric and matter, and JFBD theory reduces to GR with an additional massless scalar that is minimally coupled to gravity only. The most accurate constraints on this parameter come from the Doppler tracking of the Cassini spacecraft through the Solar System [69] α 0 < 4 × 10 −3 (ω BD > 4 × 10 4 ), and binary pulsar obervations [20,[70][71][72]. In particular, the most recent results obtained in Ref. [73] with 16 years of observation of the double-pulsar J0737-3039 yield the bounds α 0 = 0.004083, and = 0.003148 at 95% credible level, when the stiff MPA1 and soft WFF1 EOS are employed, respectively. The recent advent of GW astronomy offers a new avenue to test gravity in the relativistic regime. The majority of GWs observed by LIGO and Virgo thus far were generated by the coalescence of BBHs; several tests of GR have already been conducted using these observations [24,25,27,35]. However, Hawking famously showed that stationary BHs in JFBD theory must have a trivial scalar profile, and thus are indistinguishable from the analogous solutions in GR [74]. Although there are some possible scenarios that evade this no-hair theorem (see, e.g., Ref. [75]), binary systems composed of BHs are generally expected to behave identically in JFBD theory and GR, and thus GWs from such systems are unable to constrain this scalar-tensor theory.
Unlike BHs, NSs source a nontrivial scalar field in JFBD gravity, and thus BNS systems can be used to constrain α 0 . In this section, we use the first GW observation of a coalescing BNS-GW170817 [5]-to constrain α 0 at the 68% (and 90%) credible levels. Though the constraint from GW170817 is not as strong as those previously quoted from other experiments, this result represents the first bound directly from the highly dynamical (orbital velocities v ∼ 10 −1 ) and strong-field (Newtonian potential Φ Newt = M/R ∼ 10 −1 ) regime of gravity.
This section is organized as follows. In Sec. V A, we detail the GW signature of JFBD theory in BNSs. Then, in Sec. V B, we present two Bayesian analyses to constrain α 0 with GW170817: the first directly uses the theoryagnostic analyses presented in Sec. II, while the second is tailored specifically to test JFBD gravity.
A. Gravitational-wave signature of JFBD gravity The predominant differences in GWs produced in JFBD gravity as compared to GR stem from the fact that only the latter respects the strong equivalence principle. This principle extends the universality of free fall by test particles implied by the Einstein equivalence principle to also include self-gravitating bodies; unlike in GR, the motion of a body through spacetime depends on its internal gravitational interactions (i.e., its composition) in scalar-tensor theories like JFBD gravity. This section details how this violation of the strong equivalence principle impacts the GWs produced by binary systems in JFBD gravity. This alternative theory of gravity falls within the class of scalar-tensor theories for which PN predictions have been computed. Though those results are available at next-to-next-to-leading PN order (and even higher-order PN calculations have been made recently [76][77][78][79]), we will assume that α 0 is sufficiently small that we can neglect all but the leading-order PN effects when describing the signature of JFBD gravity in a gravitational waveform.
The dominant effect on the inspiral from the new scalar introduced in JFBD gravity is the emission of dipole radiation, which enters into the phase evolution at -1PN order. In the notation of the FTI framework and using the fact that the quadrupolar GR radiation dominates over the dipolar one for small α 0 [76], this contribution is given by where α i is the scalar charge of body i, defined as where m i (ϕ) is the gravitational mass of body i measured in the Einstein frame (i = 1, 2). That a body's mass depends on the local value of the scalar field is unsurprising given the form of Eq. (24); a shift in ϕ modulates the physical metric e −2α0ϕ g µν that effects gravity upon the matter fields ψ, and thus also modulates any body's gravitational mass. This dependence is an explicit manifestation of violation of the strong equivalence principle. Note that in the limit that a body has no self-gravity (i.e., the test-body limit), the functional form of m i (ϕ) simplifies significantly to and thus its scalar charge reduces to α (test body) i = α 0 .
As strongly self-gravitating bodies, violations of the strong equivalence principle are particularly pronounced in NSs. This violation manifests as a scalar charge that differs significantly from the test-body charge α 0 . As the scalar charges of a binary's constituents-or rather their difference, the scalar dipole-control the dominant effect on the GW signal in JFBD gravity, we devote the remainder of this subsection to computing these quantities for various NSs.
We consider spherically symmetric, static solutions sourced by a perfect fluid as a model for an isolated, nonspinning NS. Under these assumptions, the field equations for Eq. (24) reduce to the Tolman-Oppenheimer-Volkoff (TOV) equations, given in the Einstein frame in  Ref. [71]. These solutions are parameterized by three degrees of freedom; for our purposes, these are most clearly manifested as the (i) background scalar field ϕ 0 , that is the scalar field far from the NS, (ii) the NS EOS, and (iii) the NS mass. 10 Note that the asymptotic scalar field ϕ 0 can be set to zero without loss of generality by rescaling the Jordan-frame bare gravitational constantG accordingly, that is ϕ 0 → 0 ⇒G →Ge 2α0ϕ0 . The remaining degrees of freedom can be mapped to boundary conditions for the matter and scalar field at the origin with a numerical shooting method [84]. These conditions are parameterized by the central pressure P c and the scalar field ϕ c , which serve as the inputs for numerically integrating the TOV equations. (Note that in this section ϕ c is different from the parameter used in Sec. II to define the binary's orientation.) The details for extracting the mass and scalar charge from the numerical solutions of the NS interior are given in Ref. [71]. Ultimately, we would like to combine the numerical calculations of the NS scalar charge outlined above with their anticipated effect on the GW signal (25) to constrain JFBD theory. However, evaluating the scalar charge directly for every point visited by the stochastic sampling algorithms used for parameter estimation would require an unreasonable amount of computational resources. Instead, we compute polynomial fits for the scalar charge, which, after having been derived, can be evaluated quickly and with little computational overhead.
We first construct solutions for various choices of EOS, NS mass, and scalar-tensor coupling α 0 . We interpolate tabulated EOS data for the sly [80], eng [81], 10 We define the NS mass as the tensor mass m T introduced in Ref. [83] because-as shown in that reference-it obeys the same conservation laws as the ADM mass in GR.
and H4 [82] EOSs; sly is a soft EOS (compact stars) whereas H4 is relatively stiff (diffuse stars). Then, we numerically construct NSs with masses ranging between m i ∈ [0.5 M , 2.0 M ] and scalar coupling α 0 ∈ [0.001, 1.0] and compute their scalar charge. We calculate two types of polynomial fits of the scalar charge for each EOS. For the first, we factor out the dominant linear dependence of α i on α 0 , fitting their quotient as a fourth-order polynomial in m i . We compute the polynomial fits with least-squares regression; the fits are given in Table III for each EOS that we consider. These fits match all of our data sets within 5% relative error, with the greatest discrepancy arising for masses close to 2 M .
Although these (effectively) one-dimensional polynomial fits are crucial for some of our analysis, it is possible to construct two-dimensional fits that are simpler (fewer terms) and more accurate using more sophisticated methods. We compute these fits using the greedy-multivariate-rational regression method developed in Ref. [85]. This method relies on a greedy algorithm to construct a multivariate fit: during each iteration, it adds a polynomial term to the current fit (up to a pre-specified maximum degree) so as to best improve the agreement with the inputted data. This process is repeated until sufficient accuracy is achieved, and then terms are systematically removed from the polynomial until the accuracy goal is saturated. Using this method, we construct fits that agree to within 1% relative error for each EOS-these are listed in Table IV. B. Constraining α0 with GW170817 Next, we use the tools introduced previously to place constraints on the scalar-tensor coupling α 0 in JFBD gravity with GW170817-the first GW event from a co-alescing BNS. We present two complementary analyses based off of the FTI infrastructure to achieve this result. These two methods follow the same overall approach, but adopt different statistical assumptions, utilize different waveform models, and use different numerical fits for the NS scalar charge α i . In both approaches, we employ a generalized waveform model that allows for an additional contribution to the phase evolution at -1PN order, so as to reproduce the behavior seen in Eq. (25); however, the parameterization of this -1PN deviation from GR differs in each approach. Ultimately, both analyses provide a bound on α 0 of the same order of magnitude.
The first approach we adopt directly uses the theoryagnostic constraints on a -1PN deviation, and it was originally obtained in Ref. [26]. We use in this analysis a generalization of the tidal, aligned-spin SEOBNRT [53,57,58] waveform model in which the -1PN term is parameterized by the deviation parameter δφ −2 . By assuming a particular NS EOS, we can use the polynomial fit in Table III in conjunction with Eq. (25) to map a measured value of δφ −2 to an inferred value on α 0 ; schematically, this mapping takes the form α 0 (δφ −2 , m 1 , m 2 ; EOS). Note that this mapping is infeasible using the multivariate fit in Table IV because of the nonlinear dependence of α i on α 0 . Though the exact NS EOS remains unknown, we can repeat this analysis for the three candidate EOSs detailed earlier, and then use the variance on the bounds of α 0 , recovered each time, as an estimate of the systematic error arising from our ignorance of the true NS EOS.
In practice, one does not measure the masses and deviation parameter δφ −2 with perfect accuracy, but instead uses Bayesian inference (Sec. III) to reconstruct the posterior distribution P (θ|d, H) on these parameters given some assumed prior distribution p(θ|H). So, rather than map a single point from one parameterization to another, we instead map the appropriate distributions to their counterparts in the new parameterization. These prior and posterior distributions transform respectively as and where the first term on the right-hand side of either equations is the inverse of the Jacobian of the aforementioned transformation.
In the analysis of Ref. [26], a flat prior was assumed on the component masses and deviation parameter δφ −2 .
(We note that the upper prior bound is dictated by the theory since δφ −2 ≤ 0, see Eq. (25).) These choices reflect the theory-agnostic nature of that test; without . This analysis provides a bound of α 0 2 × 10 −1 (5 × 10 −1 ) at a 68% (90%) credible level. We find that that the systematic error arising from our ignorance of the NS EOS does not impact our estimate at 68%, but change the bound at 90% CL by ∼ 30%. The bound on α 0 with GW170817 is about three orders of magnitude larger than what obtained with the doublepulsar J0737-3039 [73], which since 2003 has been tracked for about 60 000 orbital cycles. Future observations with GW detectors on the ground and in space, will allow to reach similar or better accuracies [86]. Due to the much smaller velocity (v/c ∼ 2 × 10 −3 ), the double-pulsar observation becomes quickly less constraining for high PN terms entering the GW phasing (i.e., the ones discussed in Sec. IV). We can straightforwardly translate our bounds on the coupling to the JFBD parameter ω BD using that α 2 0 = 1/(3 + 2ω BD ), see Table V. We find that the conservative bound in the theory-specific approach is ω BD 1.12 (−0.70) at a 68% (90%) credible level. We also find it useful to quote here a bound on the parameterized post-Newtonian (PPN) parameter γ PPN [72] that would correspond to our bound on α 0 , using |1 − γ PPN | = 2α 2 0 /(1+α 2 0 ). This leads us to the bound |1−γ PPN | 0.32 (0.77) at a 68% (90%) credible level. However, note that our constraint originates from the dipole radiation and not from the 1PN deformation of the metric that defines γ PPN .
The second approach we employ to constrain α 0 relies instead on a waveform model design specifically to test JFBD theory. Using the FTI infrastructure, we construct a generalized waveform model from SEOBNRT in which the deviation parameter is precisely α 0 . The appropriate form of the -1PN correction to the phase evolution is obtained by inserting the polynomial fit for α i (α 0 , m i ) found in Table IV for a particular choice of the EOS into Eq. (25). Additionally, unlike the previous theoryagnostic analysis in which the tidal parameters were allowed to vary freely, for this analysis, we express these parameters as functions of the respective NS masses and assumed EOS. 11 This step reduces the dimensionality of the waveform model by two parameters while ensuring that all matter effects are handled self-consistently. We assume a flat prior on α 0 ∈ [0, 1] for this analysis; beyond this upper bound, our assumption that JFBD effects of order α 2 0 are subdominant to the PN effects in GR is no longer valid. This prior distribution is depicted in Fig. 12 with a solid black curve. Using the generalized waveform described above, we perform parameter estimation to construct the marginalized posterior distribution on α 0 , shown in Fig. 13 with solid colored curves corresponding to the assumed EOS. We obtain the upper bound of α 0 4 × 10 −1 (8 × 10 −1 ) at a 68% (90%) credible interval where, as before, the systematic error due to ignorance of the true NS EOS does not contribute at this level of precision. As explained above, our constraint on α 0 can be translated to the bounds ω BD 14.37 (−0.02) or |1 − γ PPN | 0.06 (0.50) at a 68% (90%) credible level.
Comparing the bounds set by the two analyses, we see that the theory-agnostic test provides a stronger bound on α 0 . At first glance, this result may appear counterintuitive, as Fig. 12 shows that this test assumed a marginalized prior on α 0 with greater support away from zero. The predominant cause for this discrepancy stems from how the tidal parameters are handled by each waveform model. For the theory-agnostic test, these parameters are allowed to vary freely, independent of the masses of the NSs. However, in the theory-specific test, the tidal parameters are linked directly to the component masses. This latter restriction significantly affects the recovered posterior distribution on the component masses, placing much greater weight near equal-mass configurations than in the previous case. As can be seen in Eq. (25), in 11 We use polynomial fits to the tidal parameters as a function of NS mass that are constructed in GR, not in JFBD gravity. However, the differences between these two relations scale as α 2 0 , and thus can be neglected in favor of the simpler GR relation by the same reasoning that the other sub-dominant PN effects (e.g., at 0PN order, 0.5PN order, etc.) can be ignored. very symmetric configurations, the total deviation from the baseline GR waveform remains small even when α 0 is relatively large; as a result, the JFBD parameter is more poorly measured when the tidal parameters cannot vary freely, and thus we recover a weaker bound with this theory-specific test.

VI. CONCLUSION
In this paper we have developed a framework that allows us to introduce deviations from GR to any frequency-domain inspiral phasing, assuming that such corrections are small modifications to the GR signal. For theory-agnostic tests, our FTI framework has already been successfully applied to GWs observed by LIGO and Virgo detectors by the LVC in Refs. [24][25][26][27][28][29], while for theory-specific tests the FTI method was employed in Ref. [37] to set bounds on an effective-field theory of GR using BBH signals.
More specifically, building on the PN SPA phasing in frequency domain, we have described how to apply the FTI framework to multipolar aligned-spin waveforms, by introducing deviations parameters to GR PN terms up to 3.5PN order, and also to PN terms that are absent in GR, such as the -1PN and 0.5PN corrections. Al- though we apply our framework mainly to the inspiralmerger-ringdowm SEOBNRHM waveform model, which contains ( , m) = (2, 1), (3,3), (4,4), (5,5) modes in addition to the dominant (2, 2) mode, the method is general and can be used for any frequency-domain waveform (or the Fourier-transform of a time-domain waveform). The corrections introduced in the phasing are tapered off at a certain orbital frequency, which is a free parameter in this framework. The tapering process is introduced to ensure that the modified PN phase for each mode reduces to its corresponding GR phase during the late-inspiral-mergerringdown stages, up to a constant phase shift above the tapering frequency.
We have then discussed the application of the FTI framework to BBHs, notably to two specific high-massratio events observed by LIGO and Virgo, GW190412 and GW190814. The latter were previously analyzed in Ref. [28], where it was observed that the posterior distributions of the -1PN deviation parameter, δφ −2 , were peaking away from the GR prediction, suggesting possible modeling systematic biases. Here, by creating simulated signals with parameters corresponding to the median values of the two events, we have demonstrated that these features are most likely due to (unexplored) artifacts of the noise around the events rather than missing physics in the waveform models. Furthermore, using the simulated signals, we have showed that modes beyond the quadrupole do affect the accuracy of the deviationparameter measurements and the GR parameters, especially if the signals have relatively high-mass ratios and inclinations. In such cases, neglecting the high modes can significantly bias the measurements at high SNRs, leading, erroneously, to interpret the measurement results as a violation of GR.
We have also performed robustness tests of the FTI results, and showed that the bounds could be sensitive to the tapering frequency, depending on the parameters of the signal being analyzed. For very low total-mass systems like the BNS GW170817, the tapering frequency lies outside the frequency band where the majority of the SNR is accumulated, thus the bounds are not expected to change. On the other hand, for high-total-mass BBH systems, like GW190814 and GW190412, the bounds on the deviation parameters can change. More specifically, bounds on deviation parameters that are measured with an accuracy of few tens of percent or less, may change by a factor 2-5, depending on the binary's parameters, when the tapering frequency spans the last 3-4 GW cycles before the (2,2)-mode's peak. These results suggest to go beyond the choice of the tapering frequency adopted so far in Refs. [24][25][26][27][28], where it was fixed to a specific value to compare with the complimentary analysis provided by the TIGER framework [34]. In order to optimize the GR test with the FTI and exploit the full SNR accumulated during the inspiral, up to merger, comprehensive studies would need to be undertaken to determine the best choice of the tapering frequency as a function of the binary's properties and the accumulated SNR. By contrast, we have shown that the bounds are marginally affected by the use of a different but similarly accurate waveform model, e.g., the state-of-the-art aligned-spin (phenomenological) pPhenomX model, instead of pSEOBNR.
Moreover, we have investigated how the presence of the deviation parameters in the waveform model affects the measurement of the GR parameters. We have found that most deviation parameters such as δφ 2 , δφ 3 , δφ 4 , δφ 5l , and δφ 6l do not impact the GR parameters in any noticeable way. On the other hand, the lower-order deviation parameters δφ −2 , δφ 0 and δφ 1 affect the width of the chirp-mass measurement quite significantly. This is due to the fact that those deviation parameters are degenerate with the chirp mass. We have found that the extent of the correlation, however, also depends on the choice of the tapering frequency (i.e., on the amount of SNR). The remaining two deviation parameters, δφ 6 and δφ 7 , can cause bimodalities in the posterior distributions of the GR parameters. The strength of these bimodalities depends on the tapering frequency and the binary's parameters. We find that the bimodalities can be also caused by the fact that the 3PN and 3.5PN terms in the phasing are functions of the mass ratio and the spins, and depending on the binary's parameters, those PN coefficients can go to zero. It might be possible to avoid bimodalities in the GR posterior distributions by splitting the deviation parameters at 3PN and 3.5PN order in non-spinning and spinning parts, and do inference on those parts separately.
Finally, we have used the FTI method to perform a theory-specific test with the BNS event GW170817 and the JFBD gravity theory. We have obtained constraints on the JFBD parameter α 0 following two strategies and employing the tidal aligned-spin SEOBNRT waveform model. In the first approach, we directly converted the theory-agnostic -1PN (i.e., δφ −2 ) posterior samples into samples of α 0 (δφ −2 , m 1 , m 2 ; EOS) using the fits in Table III for different NS EOSs. This approach has provided us with a bound on α 0 2 × 10 −1 (5 × 10 −1 ) at the 68% (90%) credible interval, see also Table V for the corresponding bounds on ω BD and γ PPN . In the second approach, we have employed a waveform model that is specifically designed to test JFBD, that is we have chosen as deviation parameter directly α 0 . In this case, the phase corrections are given by Eq. (25) where α 1 and α 2 are determined using the fits in Table IV. Additionally, in this second approach, we have also fixed the tidal parameters since for a given EOS they can be determined from the component masses. This process reduces the dimensionality of the waveform model while ensuring that all matter effects are handled selfconsistently. Using a flat prior on α 0 ∈ [0, 1], we have found a bound on α 0 4 × 10 −1 (8 × 10 −1 ) at the 68% (90%) credible interval, see also Table V for the corresponding bounds on ω BD and γ PPN . The main reason for the difference in the bounds can be traced to how the tidal parameters are handled in the two approaches. In the theory-agnostic approach the tidal parameters were allowed to vary freely during the inference analysis. By contrast, in the theory-specific approach, tidal parameters and scalar sensitivities cannot be treated as independent, but must be computed consistently (for each NS mass) by fixing the EOS. Thus, theory-specific and theory-agnostic analyses test slightly different statistical hypotheses, and within a fully Bayesian framework, converting results from one to the other requires care. To our knowledge, the impact of statistical hypotheses on relating theory-specific and theory-agnostic bounds has not been studied in detail in the context of GW tests of GR, and thus offers an interesting new avenue for future work.
Lastly, the FTI framework can be applied to perform theory-specific tests with non-GR theories other that JFBD gravity, as done in Ref. [37]. Additionally, the framework can be adapted to constrain other effects that leave an impact on the inspiral phase -for example the existence of exotic compact objects having a spin-induced quadrupole moment different from the one of a BH in GR [87,88]. However, more crucial for future, more sensitive observations, is the extension of the FTI framework to the spin-precessing case. This will be possible by applying the frequency-domain corrections to the phasing in the co-precessing frame, where the GWs are usually well approximated by aligned-spin waveforms [89,90].