Gravitational-wave parameter inference with the Newman-Penrose scalar

Detection and parameter inference of gravitational-wave signals from compact mergers rely on the comparison of the incoming detector strain data d ( t ) to waveform templates for the gravitational-wave strain h ( t ) that ultimately rely on the resolution of Einstein’s equations via numerical relativity simulations. These, however, commonly output a quantity known as the Newman-Penrose scalar ψ 4 ( t ) which, under the Bondi gauge


I. INTRODUCTION
The observation of the gravitational-wave (GW) event GW150914 in 2015 by the Advanced LIGO detectors [2] opened a new window to explore the Universe [3].In barely half a decade, and after the addition of the Advanced Virgo [4] and KAGRA [5] detectors, the number of detections has grown to 90 events, all consistent with the merger of compact objects such as black holes (BHs) and neutron stars (NSs) [6][7][8][9].These events have provided us with invaluable knowledge about the BH population of our Universe [10,11] and their environments, star formation, tests of strong gravity [12][13][14][15][16][17][18][19][20] and the observation of new strong-field phenomena [21][22][23][24][25] to name a few.The retrieval of this information relies on an accurate extraction of the parameters of the GW source.This is commonly carried out through the comparison of the incoming strain detector data d(t) to pre-computed waveform templates for the gravitational-wave strain h(t; θ) [26][27][28][29][30] that span a continuous range of possible source parameters θ such as the masses and spins of merging compact objects.In this process, it is crucial that waveform templates are faithful representations of the incoming GWs.For the case of the GWs emitted during the early inspiral, these templates can be obtained through analytical approximated techniques such as post-Newtonian approximations [31] or effectiveone-body formalisms [32,33].However, modelling the full space-time dynamics taking place during the merger and ringdown stages of compact binary mergers requires solving the full Einstein's equations for the system, which can only be done through numerical simulations using numerical relativity (NR) [34][35][36][37][38][39][40][41][42][43][44][45].Consequently, approximated models are commonly calibrated to these simulations during the merger and ringdown stages [46][47][48][49][50][51][52].Alternatively, "surrogate" models can also be constructed by interpolating through a given set of NR simulations [53][54][55].
When available, NR provides the most accurate prediction for the GW emission of a given source.Therefore, if simulations are available in the parameter space of interest, a direct comparison of GW data to NR simulations Schematic comparison of our proposed data analysis framework and the currently used one.To date, the ψ4 magnitude outputted by numerical relativity simulations is converted to the strain h outputted by gravitational-wave detectors via a double integration that is subject to systematic errors (red path).Instead, we transform both the simulation ψ4 and the detector h (and powerspectral-density Sn) into a third quantity that we label by Ψ4, avoiding the integration process and the corresponding systematic errors (green paths).
is fundamental to -at least -check the robustness of the results provided by approximated models [56,57].Furthermore, in some cases such as highly eccentric or precessing sources, continuous semi-analytical models may not exist, leaving NR as the only option to analyse the data.Consequently, several studies have directly compared some of the existing signals to NR templates [56][57][58][59] and even used the latter as simulated signals to evaluate the efficacy of parameter estimation and detection algorithms [60][61][62][63][64].
Continuous models that include the entire inspiralmerger-ringdown process can only be built for regions of the parameter space densely covered by the available numerical simulations, namely sources with small orbital eccentricity and relatively equal masses.While these examples have sufficed to explain all GW signals detected to date, we are entering an era in which comparison with more exotic scenarios, for which only NR waveforms exist, is in order.Moreover, as of now, NR provides the only way to accurately model the dynamics of exotic compact objects and search for new physics beyond the neutron-star and Kerr black hole paradigm.For example, in Ref. [1] we recently compared GW190521 to numerical simulations of head-on mergers of exotic horizon-less objects known as Proca stars, demonstrating that the latter scenario is slightly more consistent with the data than the standard one based on BBH mergers.

II. EXTRACTION OF GRAVITATIONAL-WAVE STRAIN FROM NUMERICAL SIMULATIONS VIA THE NEWMAN-PENROSE SCALAR ψ4
While current GW detectors output a quantity known as GW strain h(t), the most extended type of NR simulations, which are based on the 3+1 formulation (or Cauchy evolution), return the so-called Newman-Penrose (NP) scalar ψ 4 (t) [65,66].Under the Bondi gauge, the scalar ψ 4 (t) is related to the strain by ψ 4 (t) = d 2 h(t) dt 2 [67].Obtaining h(t) therefore requires a double time integration, which is a non-trivial process involving fundamental difficulties.A well-known effect is the appearance of non-linear drifts in the resulting strain-waveform arising from the time-domain integration of finite length, discretely sampled and noisy data streams.These are independent of the parameters of the simulation, such as gauge or numerical method used [68].
Frequency-domain integration methods can avoid the effects arising from time-domain integration but at the cost of modifying the original data.One of the bestknown effects is the impact of spurious low-frequency modes in the strain waveform.It is known that the effect of these modes, resulting from either spectral leakage or aliasing effects, can be significantly suppressed through the usage of high-pass signal filters [68,69] that can reduce the energy within frequencies lower than a chosen cutoff ω 0 .This technique is commonly known as "fixed-frequency integration" (FFI).A common and wellmotivated choice for ω 0 is that corresponding to the lowest instantaneous frequency of the GW emission.This is, for instance, the strategy used by the existing parameterestimation code RIFT [58] to directly compare GW data to NR templates.In practice, however, the choice of ω 0 requires a certain amount of tuning.On the one hand, a small value will amplify nonphysical low-frequency components during the integration process.On the other hand, a large value may suppress the physical frequencies of the waveform.In some cases, such a choice can be clearly guided by the known features of the "true" signal.For instance, in the case of quasi-circular binaries, the inspiral GW 'chirp' frequency is a monotonically increasing function of time, which provides a natural way to associate a given choice of ω 0 with a given starting time for the strain waveform.
This choice, however, is not obvious or even well defined for cases where the GW frequency is not a monotonic function of time.On the one hand, this makes FFI itself a potential source of error that, as we will show, can qualitatively impact the interpretation of the source.On the other hand, in the best-case scenario, a "correct" obtention of the strain data requires an artisan and time-consuming trial-and-error process adapted to each particular type of source.This is the case of some of the most interesting sources that the astrophysical community is trying to detect for the first time in the next observing run of LIGO and Virgo.Such cases include eccentric mergers [59], highly precessing mergers, dynamical captures, [70] or even cases in which an inspiral stage does not exist at all.This is also the situation for the (academic) case of head-on collisions [71] that we will address in this work or for core-collapse supernovae waveforms, for which the bounce GW signal consists essentially of a burst [72][73][74].Finally, the act of integrating involves a choice of integration constants that can cause fundamental changes in the properties of the waveform.For instance, it is typically imposed that the average of the GW strain should be zero, which automatically removes/changes the effect of GW memory [75].
Integration-free extraction methods.While in this work we focus on numerical waveforms obtained through the NP formalism, whose use is much generalised in the numerical relativity community, we note that there exist alternative methods that can directly extract the GW strain.First, within the Cauchy evolution framework, the GW strain can be also directly extracted at finite radii through the so-called Regge-Wheeler-Zerilli (RWZ) method [76][77][78].Just as in the NP case, waveforms can then be extrapolated to null infinity through e.g.polynomial expansions [41,79].RWZ extraction has been long employed by the SXS collaboration [41], producing extensive catalogs of BBH waveforms that include the largest number of inspiral cycles in the literature.These waveforms have been consistently used to inform continuous waveform models (e.g.[54]), often in combination with NP ones (e.g.[50,51]), and to directly analyse several GW events [56,57].Second, within the so-called Cauchy Characteristic Extraction (CCE) framework [80], waveforms can be directly extracted at future null infinity.While early simulations were limited to output the Bondi news function [81] (given by N (t) = dh(t)/dt ), later developments made possible the direct extraction of the GW strain [82][83][84], including some simulations in the SXS catalog [84].
While the above approaches present advantages with respect to the NP formalism, there are reasons that motivate the generalised used of the latter.First, the master wave equations of the RWZ approach are obtained under the assumption that the background metric can be described by a Schwarzschild spacetime where perturbations are applied.Also, it has been found that the relative accuracy of RWZ and NP methods depend on the the case of application, e.g., [85,86].Second, while CCE can deliver extremely accurate waveforms, it involves specific complications that have so far prevented its widespread use, on top of its intensive computational cost.For instance, recent studies have discussed the weak hyperbolicity of the characteristic evolution equations [87][88][89].Also, initial spurious emission known as junk radiation has been found to last significantly longer in these simulations [84].In addition, characteristic evolution (based on null foliations) is known to be limited to describe BBH systems, as null surfaces may focus and form caustics [81].This triggered the design of so-called Cauchy Characteristic Matching methods [89,90].Finally, we note that NP presents several advantages of its own, as enumerated in [66]: a) it provides a first-order, gauge-invariant description of the radiation field (please see Koop & Finn [91] for a fully gauge invariant derivation of the detector response); b) it does not rely on any frequency or multipole decomposition; c) the Weyl scalars (ψ 4 among them) are defined in the full non-linear theory.A one-parameter perturbative expansion of this theory was proved to provide a reliable account of the problem [92]; and d), finally, the NP formulation provides a simpler framework to organise higher-order perturbation schemes.For an overview of different waveform extraction methods we refer the reader to [93] In this work we remove from GW data analysis the fundamental problems related to waveform integration by avoiding such step.We present a framework, schematically summarised in Fig. 1, to perform GW data analysis directly using ψ 4 (t).This provides an uniquely defined way to obtain GW waveforms for data analysis -free of human choices -which are by definition free of the systematic errors related to waveform integration.We showcase our framework in the context of parameter inference and model selection performed on both synthetic signals injected in LIGO-Virgo noise and on real GW signals.

A. Proca stars and Dark Matter
Proca stars belong to a family of theoretical exotic compact objects (ECOs) known as bosonic stars [94][95][96][97].These are part of the wider family of objects known as "BH mimickers" which, lacking the characteristic event horizon of BHs, can reproduce many of their propertiessee e.g.[98][99][100], avoiding issues related to the black hole singularity, as well as poorly understood issues related to quantum fields near event horizons [101].ECOs have been proposed, e.g., as dark matter candidates [102], in particular in models invoking the existence of hypothetical ultralight (i.e.sub-eV) bosonic particles, often referred to as fuzzy dark matter [103].One common candidate is the pseudo-scalar QCD axion [104], but other ultralight bosons arise, e.g., in the string axiverse [105].In particular, vector bosons are also motivated in extensions of the Standard Model of elementary particle [106] and can clump together forming macroscopic entities dubbed vector boson stars or Proca stars.
Bosonic stars are amongst the simplest and dynamically more robust ECOs proposed so far and their dynamics have been extensively studied, e.g.[71,[107][108][109].Scalar boson stars and their vector analogues, Proca stars [95,110], are self-gravitating stationary solutions of the Einstein-(complex, massive) Klein-Gordon [94] and of the Einstein-(complex) Proca [95] systems, respectively.These consist of complex bosonic fields oscillating at a well-defined frequency ω, which determines the mass and compactness of the star.Bosonic stars can dynamically form without any fine-tuned condition through the gravitational cooling mechanism [111,112].
While spinning solutions have been obtained for both scalar and vector bosons, the former are unstable against non-axisymmetric perturbations, in the simplest models wherein the bosonic field is free [113,114].Hence, we will focus on the vector case in this work.For non-self-interacting bosonic fields, the maximum possible mass of the corresponding stars is determined by the boson particle mass µ B .In particular, ultralight bosons within 10 −13 ≤ µ B ≤ 10 −10 eV, can form stars with maximal masses ranging between ∼1000 and 1 solar masses, respectively.In [1], we showed that GW190521 was consistent with the head-on collision of two Proca stars with µ B = 8.7 × 10 −13 eV.

B. Numerical simulations of Proca star mergers
We will demonstrate our ψ 4 data analysis making use of NR simulations of head-on collisions of spinning Proca stars.In addition to the quadrupole (ℓ, m) = (2, ±2) modes dominant for circular mergers, our simulations also yield the (2, 0) mode, co-dominant for the case of head-on collisions, and the much weaker (3, ±3) and (3, ±2) modes.Our set of waveforms is obtained from simulations of the collisions of two spinning Proca stars with aligned spin axes [1,115,116].Although starting from rest, the trajectories of the two stars are eccentric rather than strictly head-on due to frame-dragging.In our study's region of parameter space, all Proca-star progenitors are sufficiently massive and compact to trigger the gravitational collapse of the remnant.Therefore, the outcome of the collision always leads to BH formation after the merger.The simulations are performed with the Einstein Toolkit infrastructure [117][118][119], together with the Carpet package [120,121] for mesh-refinement.The Proca evolution equations are solved via a modified Proca thorn [71,113,122,123] to include a complex field.We consider both equal-mass and unequal-mass cases, as reported in our numerical Proca catalogue [116].The initial data consists of the superposition of two equilibrium solutions separated by D = 40/µ [1,71,109,124], in geometrized units, guaranteeing an admissible initial constraint violation.The equilibrium stars are numerically constructed using the solver fidisol/cadsol for non-linear partial differential equations of elliptic type, via a Newton-Raphson method (see [95][96][97] for more details).

IV. DATA ANALYSIS WITH ψ4
Consider an observation model where s(t) = F + h + (t) + F × h × (t) is the GW strain, F + and F × are the beam pattern functions of the + and × polarization states i.e. h + (t) and h × (t) respectively, and n(t) is the detector noise.Here we only consider a transient signal, therefore the beam pattern functions are approximated to be constant over the duration of the signal for a given sky localisation and polarisation angle.We can rewrite Eq. ( 1) as follows where h(t) = h + (t) − ih × (t).Taking second-time derivative on both sides yields where ψ 4 (t) = d 2 h(t) dt 2 and s ψ4 (t) = d 2 s(t) dt 2 .Now we have obtained the observation model with ψ 4 (t) directly involved.Since in practice we analyse the digital strain data which are discrete, we have to replace the second-order differential operator d 2 dt 2 by the secondorder difference operator δ 2 defined by where x[m] is a discrete time series (labelled by index m) with a sampling interval ∆t.We then have To express the above observation model with a closer notation connection to ψ 4 (t), i.e., the second derivative of h(t), we put a subscript Ψ 4 to represent a second-order differenced time series i.e. x Ψ4 [m] := (δ 2 x)[m].And we also reserve Ψ 4 [m] as a special notation for (δ 2 h)[m], in analogy with ψ 4 (t).With the new set of notations, we rewrite Eq. ( 5) as where ] is the second difference of the GW strain signal.In practice, parameter estimation is often performed on the data in the Fourier domain due to the more efficient evaluation of the likelihood function as compared to that in the time domain (see e.g.Ref. [15]).Applying Fourier transform on Eq. ( 6) yields (7) We note, however, that since the ψ 4 (t) extracted from NR simulations is sampled from the second derivative of h(t), ḧ[m], but not the second-order finite difference of the discrete strain (δ 2 h)[m]; we cannot directly use ψ 4 as templates for our analysis.Instead, as represented on the left side of Fig. 1, we need to transform these following the relation for which we provide the proof in Appendix A. In the above equation, ∆f = 1/(M ∆t) and M is the length of the discrete Ψ 4 [m].We finally obtain the observation model in the Fourier domain with ψ 4 directly involved as follows where where F + and F × are functions of the sky location of the source, i.e. the right ascension α, the declination δ, the polarisation angle ψ, and the event time t event , and θ ′ are other source parameters.
Another crucial ingredient for parameter estimation is the distribution of the second-differenced noise n Ψ4 [k] in order to obtain the likelihood function.It can be shown (see Appendix B) that if n(t) follows the stationary Gaussian distribution with power spectral density S n (f ), then n Ψ4 [m] also follows the stationary Gaussian distribution with power spectral density S nΨ 4 [k] as The likelihood function for source parameters θ given the second-differenced strain data d Ψ4 in the Fourier domain is therefore where s Ψ4 (θ) is the second-differenced template with parameters θ, and (a|b) denotes the noise-weighed inner product defined as [125] (a|b) = 4 Re with S nΨ 4 (f ) the power spectral density of the seconddifferenced detector noise given in Eq. (11).
A. Summarised recipe for a ψ4-analysis We summarise here our method to perform GW data analysis with ψ 4 .Consider the canonical situation where we have detector strain data d(t), the corresponding PSD S n (f ) and strain templates s(t; θ) for source parameters θ.Then, an analysis based on the Newman-Penrose scalar can be implemented by just replacing: 4)) PSD: S n (f ) → S nΨ 4 (f ) (Eq. ( 11)) Templates: s(f ; θ) → s Ψ4 (f ; θ) (Eq.( 8)). ( 14) Above, we assume that Ψ 4 templates are obtained from the ψ 4 outputted by NR simulations through Eq. ( 8), i.e., following the right path of Fig. 1, and are therefore free of integration systematics present in the strain templates.Consequently, both sides of Eq. ( 14) generally lead to different results, which is the point of our work.Nevertheless, one can also check that obtaining Ψ 4 as δ 2 h following the left side of Fig. 1 (taking second-order finite differences on the strain templates) makes both sides of Eq. ( 14) return identical results.

V. RESULTS: DATA WHITENING
GW data analysis ultimately relies on whitened data.This is, the detector data divided by the amplitude spectral density of the background noise d(f )/ S n (f ).This is then matched-filtered [26] with the whitened waveform templates h(f )/ S n (f ).Here we show that the transformations we perform on both of the strain data and PSD to obtain their Ψ 4 (t) versions lead to identical whitened data and templates.
Therefore, these lead to a completely equivalent analysis where the only difference is that Ψ 4 (t)-templates are free of systematic errors introduced during the obtention of the h(t) templates through integration.

A. Analytic case: Sine-Gaussian waveform
We start by considering the case of an analytical function h(t) for which we can analytically compute ψ 4 (t) = d 2 h(t)/dt 2 ≡ ḧ(t).This way, we have a "controlled" experiment where the "strain" h(t) and the corresponding "Newman-Penrose scalar" ψ 4 (t) at hand are free of potential differences introduced by systematic errors arising from the double integration of the former.In particular, we consider the case of a sine-Gaussian strain time series For this strain, we compute the corresponding ψ 4 (t) and obtain a corresponding finite sampling time-series . Next, we obtain a finite sampling-time series of the strain h[m] and compute the corresponding compute second- . Finally, we also obtain δ 2 h[m] by "correcting" the discretized second derivative ḧ[m], via the transformation in Eq. ( 8).The left panel of Fig. 2 shows these three time series.The inset therein shows that while the two finite-differenced time series are identical, ḧ(t) differs from them.
Next, we compare the result of whitening the strain time-series h[m] by a given strain PSD S n and that of whitening δ 2 h[m] by the corresponding transformed PSD S nΨ 4 .The inset of the right panel of Fig. 2 shows the difference between the whitened Fourier transforms of h[m] and δ 2 h[m].As before, we obtain the latter both by taking second-order finite differences on h[m] and by correcting ḧ[m] via Eq.( 8), which we denote ḧcorr .These differences are below 1 part in 10 12 .The main panel shows the differences between the whitened h[m] and ḧ[m], "wrongly" whitened by S nΨ 4 , which are nine orders of magnitude larger.
The above shows that given a continuous strain h(t) and its corresponding ψ 4 (t) ≡ ḧ(t), the processes of a) taking second-order finite differences on the finitesampling time-series of h[m] and b) correcting ḧ[m] via Eq.( 8) lead to identical time series that we call Ψ 4 (t).Second, it shows that given the PSD of a stochastic Gaussian and stationary strain time-series h[m] and the corresponding (δ 2 )h[m] = Ψ 4 [m], our estimation of the PSD S nΨ 4 correctly whitens the latter.In the following, in order to adapt to common literature, we drop the discrete notation, e.g.replacing h[m] by h(t).

B. Whitening of detector data
The left panel of Fig. 3 shows the whitened strain d(t) and d Ψ4 (t) time-series of the Livingston detector at the time of the event GW190521.Their differences, shown in the right panel, are well below one part in 10 12 .Again, this shows that our formalism correctly whitens the data and that, therefore, both types of analyses are totally equivalent provided that no artefacts are picked during the construction of the strain templates from the ψ 4 ones.

Whitening waveform templates: impact of the choice of ω0 during fixed-frequency integration
The left panel of Figs. 4 and 5 show raw strain templates two simulations of a Proca star merger h(t).These respectively correspond to a waveform consistent with GW190521 and to a larger mass-ratio and rather edge-on configuration with multi-modal structure [22,126,127] that, in a separate paper [128], we find consistent with the GW trigger 200114 020818 (S200114f in the following) [129] 1 .In both, cases, the strain h(t) has been obtained from ψ 4 (t) through an FFI using a given ω 0 cutoff.Overlaid, we show the corresponding Ψ 4 (t) obtained both as (δ 2 h)(t) and by correcting ψ 4 (t) outputted by NR, which in the following we simply call Ψ 4 (t).We scale h(t) by a suitable amplitude factor so that both waveforms can be plotted together.The right panel shows the corresponding whitened waveforms.
First, we note that while (as expected) the raw h(t) widely differs from the two Ψ 4 (t) waveforms, the whitened h(t) and (δ 2 h)(t) waveforms are identical but differ from the "direct" Ψ 4 .This is due to the impact of the choice of ω 0 used to obtain h(t) from ψ 4 (t).As we will show later, for the case shown in Fig. 4, these differences are not large enough to have a significant impact on parameter inference or model selection.However, for the case shown in Fig. 5, the choice of ω 0 removes enough "true" signal power to cause clear morphological alterations that impact both parameter estimation and model selection.

VI. PARAMETER INFERENCE AND MODEL SELECTION ON SIMULATED SIGNALS A. Summary of Bayesian Parameter Inference and Model selection
We test our framework by performing full Bayesian parameter estimation and model selection on simulated signals injected in zero-noise using the Bayesian inference library Parallel Bilby.We consider a reference signal or FIG. 2. Demonstration of our transformation and whitening scheme on sine-Gaussian pulses.The left panel shows the analytical second derivative ḧ(t) of a sine-Gaussian strain time-series h(t) and its second-order finite difference time-series δ 2 h(t).We obtain the latter both directly and from correcting ḧ(t) via Eq.( 8).The inset of the right panel shows the difference between the latter two time-series, whitened with a PSD Sn Ψ 4 , and the original strain h(t) whitened by the corresponding Sn.These are of the order of 1 part in 10 Here, π(θ) denotes the prior probability of the parameters θ while L M * (h M (θ True ) | θ) represents the likelihood of the data h M (θ True ) under the waveform model M * with the given parameters θ.We use the canonical likelihood for GW transients in Eq. (12).Finally, the term Z M M * denotes the Bayesian evidence for the data h M assuming the template model M * .This is equal to the integral of the numerator over the explored parameter space Θ, given by Given two template models M 1 and M 2 being compared to some data d, or some simulated signal h M (θ), the relative probability for those models, or relative Bayes factor We show the raw time-domain data for the case of a) Ψ4 directly coming from a numerical relativity simulation (through Eq. ( 8)) of a head-on Proca-star merger consistent with GW190521 (black), b) the strain obtained from ψ4 through double integration (blue) and c) the Ψ4 obtained from the latter through second-order finite differencing, denoted by δ 2 h(t).The strain in the left panel has been conveniently scaled to note the obvious morphological differences with respect to Ψ4. Right: corresponding whitened time-series.The zoomed boxes show how the h(t) and δ 2 h(t) are exactly identical while very small differences can be observed with respect to the original Ψ4.

B M1
M2 is given by Expressing these in terms of natural logarithms, it is commonly considered that the model M 1 is strongly preferred with respect to M 2 when log B M1

M2
= log(Z M1 ) − log(Z M1 ) ≥ 5. Finally, since the Bayesian evidence Z represents the Bayes Factor for the "model vs. noise" hypotheses, we will commonly refer to it as simply the "Bayes Factor", denoting it as B.
As it will become relevant later, it is important to note that, the evidence Z M M * is bounded above by the maximum value of the likelihood L M * (h M (θ True )|θ best ), achieved for the best fitting parameters θ best .This is, by the best fit that the model M * can provide for h M (θ).At the same time, in the absence of noise, such maximum likelihood is capped by the "optimal maximum likelihood" L M (h M (θ True )|θ True ).
To anticipate the expected consequences of respectively analysing and modelling a true GW using a tem-  Direct 4 (eq.8) Strain-derived 42 h(t) FIG. 5. Whitening of strain and Ψ4 templates.Impact of aggressive choice of ω0.Same as in Fig. 4 but for a waveform template consistent with S200114f [128,129].In this case, the differences between the Ψ4 directly extracted from the numerical simulation and the other two waveforms are significantly more noticeable.
plate affected by integration errors, let us consider two scenarios.First, consider that we model the true GW, i.e. our injection, as h M (θ True ) = Ψ 4 (θ True ) and try to recover it using templates (δ 2 h)(θ True ) which carry integration artefacts.Since such artefacts will change the frequency content of the templates, these will not perfectly match the injection, leading to a drop in the maximum likelihood and, therefore, of the corresponding Bayesian evidence in favour of the model.Second, any choice of the integration frequency cutoff ω 0 will remove some true power from the waveform.This will consequently lead to an under-estimation of the signal loudness for a given source distance, yielding a bias toward lower distances.Second, for this last reason, if we model the true signal using either h(θ True ) or (δ 2 h)(θ True ), this will cause an under-estimation of the signal loudness, the optimal maximum likelihood and, therefore, an intrinsic decrease of the maximum Bayesian evidence achievable in the analysis.

B. Specific set-up
We perform parameter estimation on injections generated in terms of Ψ 4 (t), h(t) and (δ 2 h)(t).For the latter two, we consider two cases.In the first case, we obtain the strain through FFI using a frequency cutoff Mω 0 ≃ 0.27, in geometric units 2 .In the second, we sim-ply apply a regularization at the pole given by ω 0 = 0, which we replace by the value for the lowest frequency multiplied by 10 −4 .We respectively label the resulting waveforms by F and NF sub-indexes, i.e., h F , (δ 2 h F ) and h NF , (δ 2 h NF ).
We recover these injections using different types of templates, as shown in Table I.We make two choices for the parameters θ, corresponding to the two cases shown in Figs. 4 and 5.These correspond to parameters consistent with GW190521 and the trigger S200114f under our Ψ 4 formalism.The most relevant difference between the corresponding simulations is the aggressiveness of the ω 0 used to obtain h(t).
As mentioned in the previous section, in the case of the simulation consistent with GW190521, we find that this does not subtract significant power from the portion of the signal falling into the Advanced LIGO sensitive band while in the second case (the S200114f-like simulation) it does.The expectation is that for the first case, results obtained through Ψ 4 and all h F -based analyses will be very similar; while in the second, those based on filtered waveforms will differ significantly.In particular, two types of differences are expected.First, if the frequency content of the waveforms is altered by the integration errors, this will limit the ability of the resulting strain waveforms (or rather, (δ 2 h)(t)) to fit the original Ψ 4 .This will translate into both a reduction of the Bayes Factor that may bias model selection and into potential parameter biases.Second, since any choice of ω 0 will remove a certain amount of signal power, this will result in an underrating of the strain-signal loudness.On the one hand, for identical parameters, this will lead to an under-estimation of the signal SNR.On the other hand, this will cause a bias in the distance estimate.
The significance of the above effects in model selection and parameter inference depends on the signal loud-ness, as louder signals require more accurate templates in order to avoid analysis biases.We evaluate the impact of these biases under various observing scenarios, we consider three types of signal loudness, characterised by the optimal signal-to-noise ratio (SNR) of the injection modeled by Ψ 4 .In the first case, where use the exact parameters best-fitting GW190521 and s200114f, the Ψ 4 injection has an SNR of ≃ 15 across the whole detector network, typical of current GW detections.Next, reducing the distance by a factor of 2, we study the case of signals with SNR ≃ 30, similar to the maximum SNR observed to date.Finally, we consider the case where the injection has an SNR of ≃ 60.For simplicity, in what follows, we will use "=" signs to refer to these cases.
Finally, as shown in the previous section, if our Ψ 4 formalism is equivalent to the classical one based on strain, results obtained through the injection and recovery of (δ 2 h F/NF )(t) and h F/NF (t) should be exactly equal, modulo the uncertainty associated to the sampling of the likelihood throughout the parameter space.In fact, Table I shows that the evidences obtained by such pairs of analyses differ at most by 0.5 (which would not impact our conclusions regarding model selection), even in the highest SNR cases where convergence is harder to achieve, with most cases ranging between 0 and 0.2.
When sampling the likelihood, we fix the mass-ratio and spins of the templates to those of the injection and sampling only over the total red-shifted mass of the source M total , the luminosity distance d L and orientation angles (ι, φ), the sky-location angles (α, δ), the polarization angle ψ and the time-of-arrival.The power spectral densities used for our two injections are those of the Advanced LIGO and Virgo detectors at the times of GW190521 and S200114f.When analysing the corresponding Ψ 4 or δ 2 h injections, we applied the correction factor in Eq. ( 11) to obtain the appropriate PSDs.We sample the parameter space using the nested sampler Dynesty [130] with 4096 live points for the cases with SNR=15 and 30, and 8192 live points for the cases with SNR = 60.

C. Results on simulated signals
Table I shows the natural log Bayes factor (log B) and the maximum log likelihood (log L max ) recovered by each of our analyses for each of our two types of injections.First, we note that in both cases, the analyses making use of strain waveforms h F(NF) and the corresponding second-order finite differenced waveforms δ 2 h F(NF) yield equivalent results even for SNRs of 60, corroborating that our formalism does not introduce any artefacts.This means there is no fundamental reason to prefer an analysis based on strain.Therefore, given that the Ψ 4 waveforms, directly obtained from ψ 4 , avoid a complete layer of systematic errors, in the following we use the results based on the injection and recovery of Ψ 4 itself as our reference results.Since we checked above that sampling errors introduce maximal uncertainties in the Bayes Factors of ≃ 0.5 (irrelevant for the purpose of model selection), we will assume that any significant difference between our reference analysis and the remaining ones are due to artefacts arising from the integration of ψ 4 (t) to obtain h(t).
Fig. 6 shows posterior parameter distributions for all injection-template combinations in Table I, for the case of our GW190521-like signal scaled to an SNR of 15 .The vertical bars show the true injection values.Similarly, Fig. 7 shows the same for our S200114f-like injection.In the first case, all distributions yield equivalent results.In particular, since our choice of ω 0 barely affects the signal morphology, there is no significant difference between the posteriors obtained when injecting either Ψ 4 (t), h F (t) or h NF (t) and recovering with any of the relevant template models.We find the same is true when we raise the SNR to 30 and 60.In particular, the top contours of Fig. 8 show the symmetric 90% credible intervals around the median obtained for the total mass as a function of the SNR of the Ψ 4 injection when this is recovered with Ψ 4 itself (blue) and δh 2 F , which carries potential integration and ω 0 -choice artefacts.Both these contours are essentially equal and converge to the true value for increasing SNR, indicating that, in this case, the aforementioned artefacts are mild enough to not to bias parameter estimation.However, the likelihoods reported in Table I and on the left panel of Fig. 9 show that the runs involving injections (h F (t) and (δ 2 h F )(t)) reach slightly lower log-likelihoods due to the (small) power eliminated by the choice of ω 0 , which reduces the SNR of the injection.Nevertheless, as Table I shows, for the cases with SNR = 15 and 30 such missing power does not cause changes in the Bayes factors that can lead to qualitatively different conclusions when performing model selection.Accordingly, we will later show that the analysis of GW190521 is not impacted at all by the usage of h F templates.This is however not the case when the SNR is raised to 60.In this situation, while parameter estimation is unaffected, we observe that analysing a true GW with our filtered waveforms causes a drop of ≃ 6 in the Bayesian evidence, sufficient to lead to model selection biases.
The situation is quite different for the case of our second injection.In this case, the choice of ω 0 does significantly affect both the morphology and signal power of the waveform.As a consequence, Fig. 7 shows clear shifts of the posteriors for the total mass and the polarization angle when we recover the Ψ 4 injection with the δ 2 h F templates at an SNR of only 15, even if these are not completely inconsistent with the others.Figure 8 shows, however, that such shifts bias the estimate of the total mass when the SNR is above 30.This turns even more dramatic when evaluating the impact on the recovered log-likelihood and on model selection.The yellow distribution in the right panel of Fig. 9 shows that the "F" templates (e.g.(δ 2 h F )) fail to recover a significant amount of power from Ψ 4 , therefore dramatically reducing the maximum log likelihood.This leads to a FIG. 6. Posterior parameter distributions for our GW190521-like injection, when scaled to a signal-to-noise ratio of 15 Posterior parameter distributions for our different analyses in Table I together with the true value represented by a dashed line.The color code denotes the type of injection used (Ψ4, strain h(t) or strain-derived Ψ4 denoted by δ 2 h(t)), and the type of template.All analyses yield equivalent results.In particular, no significant difference is observed when filtered or non-filtered injections and templates are used.The parameter θJN describes the angle formed between the total angular momentum of the source and the line-of-sight.We note that since our sources do not precess, this is equal to the parameter ι.
significant drop in the Bayes factor, as shown in Table I, that can change qualitative conclusions concerning model selection even when the SNR is only of 15.In fact, as we will show later, using h F templates has strong consequences for the analysis of S200114f.Finally, as expected, injecting any of the "F" waveforms leads to a significant drop in the power present in the injection and, therefore, in the recovered power and Bayes factor.

VII. RESULTS ON REAL DATA I: GW190521 AS A BOSON-STAR MERGER
We now demonstrate our framework on real GW data.In Ref. [1] we performed parameter estimation and model selection on 4 seconds of data around the time of GW190521, comparing this event to a "vanilla" quasi-circular BBH model employed by the LIGO-Virgo-KAGRA (LVK) collaboration [131] and to a set of numerical simulations for Proca star mergers.Here we reproduce this analysis both using the classical strain formalism and our new ψ 4 -based framework.We obtain our Ψ 4 input data by applying the transformations in Eqs. ( 4) and (11) to the public strain data and the corresponding strain PSDs.
We compare GW190521 to a family of numerical sim- .Moreover, the model can be extrapolated to q = 6 and a i = 0.99.In our original study [1], we made use of the parameter estimation software Bilby [29] and sampled the likelihood across the parameter space using nested sampler CPNest [133].In this work, however, we switch to the parallelizable version of Bilby, known as Parallel Bilby [134], and the sampler Dynesty [130].
Owing to this change in software, we also repeat our original analysis based on strain data.We impose the same parameter priors as in Ref. [1], as detailed below.

FIG. 7.
Posterior parameter distributions for our S200114f-like injection, when scaled to a signal-to-noise ratio of 15 Posterior parameter distributions for our different analyses in Table I together with the true value represented by a dashed line.The color code denotes the type of injection used (Ψ4, strain h(t) or strain-derived Ψ4 ≡ (δ 2 h)(t)) and the type of template.Recovering non-filtered injections with filtered waveforms leads to visible shifts in some posteriors equivalent results.This is due to the excessive aggressiveness of the integration filter.

Bayesian priors
For the intrinsic source parameters, we consider uniform priors on the field frequency ω/µ B ∈ [0.80, 0.93] and the total red-shifted mass M ∈ [50, 500] M ⊙ .For the extrinsic parameters, we impose a distance prior uniform in co-moving volume with d L ∈ [10, 10000] Mpc, flat priors on the polarisation angle and time of arrival, and isotropic priors on the source orientation and skylocation.For the BBH model, we set identical priors in all of the parameters shared with the BBS model.In addition, we set uniform priors on the dimensionless spin magnitudes and isotropic priors on their orientations.As in [1] we set a uniform prior on the mass ratio q ∈ [1/6, 1].Finally, we note that as in [128], and as in the analysis of S200114f we describe later, we have tried an alternative prior uniform on the (inverse) mass-ratio Q ∈ [1,6], and we have also tried to restrict the mass-ratio ranges to Q ∈ [1,4] and q ∈ [1/4, 1].All of these yield evidences that differ, at most, by 0.2, therefore leading to identical conclusions regarding model selection.

A. Model selection
Fig. 10 shows the whitened data of the Hanford, Livingston and Virgo detectors at the time of GW190521, for both the case of h(t) and Ψ 4 (t).Together, we overlay the maximum likelihood waveforms returned by the BBH model and by both our strain and ψ 4 -based analyses when using our equal-mass Proca star mergers.First, we note that the two latter analyses return essentially identical waveforms, once again showing that both analyses are equivalent modulo systematic errors coming from the obtention of h(t).
Table II shows the natural logarithm of the Bayes factor (log B) for our different models under different choices of the distance prior.First, we note that for the same prior and waveform model, the Ψ 4 and strain analyses produce almost identical results.Second, consistently with Ref. [1], the first column shows that when attaching to the standard distance prior which is uniform in comoving volume, both the equal and unequal-mass models yield log B only slightly larger than the BBH model.In particular, using Ψ 4 as our reference analysis, the equal (unequal-mass) model is e 0.8 ≃ 2 (e 1.7 ≃ 5.5) times more probable than the BBH one.The second column shows results obtained under the assumption of a uniform distance prior.Although this can be considered to be rather nonphysical, this prior effectively removes the intrinsic bias towards louder sources (as circular BBHs) that can be observed from much further away than much weaker head-on mergers, which was induced by the previous prior.Alternatively, results obtained under the GW190521-like injection S200114f-like injection 4 Injection vs. 4 template 4 Injection vs. h [2]  F template True value FIG. 8.
Total mass bias due to ψ4-integration and filtering as function of signal loudness The blue and orange contours denote the 90% credible intervals for the total mass for the case of our GW190521-like and S200114f-like injections as a function of the injection signal-to-noise ratio.The injection is always modelled by Ψ4, free of integration errors.The blue and orange contours denote, respectively, the result of analysing the injection with Ψ4 itself and with the (δ 2 hF ), which inherits the ψ4-integration errors together with the power loss due to the choice of an integration lowfrequency cutoff ω0. .
uniform prior can be considered as crude estimates of what would happen once numerical simulations for (intrinsically louder) less eccentric configurations of Proca star mergers become available 3 .Once again, using our Ψ 4 -analysis as a reference, the equal and unequal-mass models are favoured with probabilities e 3.3 = 27 : 1 and e 4.2 = 67 : 1 with respect to the BBH case.These results are perfectly consistent with those obtained from the analysis of strain data and with those reported in Ref. [1].

B. Parameter estimation
Table III shows our parameter estimates for GW190521.We report median values together with symmetric 90% credible intervals.The q = 1 rows correspond to results obtained with our equal-mass simulations while the q ̸ = 1 rows correspond to those obtained with our exploratory unequal-mass ones.For each of these, we report results from the analysis of both strain and Ψ 4 data. 3Here, we make the crude assumption that, for suitable combinations of the binary parameters, we would be similar mergerringdown signals as those obtained for our head-on mergers.First, we note that, as expected from the previous section, strain and Ψ 4 produce almost identical results.Consistent with Ref. [1], and taking Ψ 4 -analysis using q = 1 waveforms as a reference, we find that GW190521 can be interpreted as a head-on merger of two Proca stars with masses 117 +5 −8 M ⊙ that left behind a black-hole a final mass of M f = 233 +12 −16 M ⊙ and spin of a f = 0.70 +0.04 −0.03 ; observed at a distance of 544 +296 −163 Mpc.Due to the much lower intrinsic loudness of head-on mergers, the inferred distance and total mass are in large contrast with those inferred by the LVK Collaboration, respectively ≃ 5 Gpc and ≃ 150 M ⊙ .For the Proca stars, we infer a field frequency ω/µ B = 0.895 +0. 15  −0.15 .Combined with the total mass, this yields an ultralight boson mass of 8.60 +0. 63 −0.62 ×10 −13 eV.Our analysis making use of unequalmass stars yields consistent conclusions.In particular, it yields a boson-mass of 8.57 +0. 64 −0.67 × 10 −13 eV.

VIII. RESULTS ON REAL DATA II: THE TRIGGER S200114F
Finally, we show a real data example for which our framework makes an important difference.The trigger S200114f is a LIGO-Virgo high-mass trigger detected by a model agnostic search that identifies coherent excess power across the detector network, known as coherent WaveBurst [135], with a highly significant false-alarm-rate of 1/17 yr [129].This trigger has, however, challenged existing waveform models.In particular, while the LVK Collaboration analysed S200114f under three BBH waveform models, no pair of these models returned consistent parameter estimates.Far from indicating that this trigger is not of astrophysical origin, this a symptom that the mentioned waveform models disagree in the regions of the parameter space where they best reproduce the signal.However, this trigger is morphologically consistent with a type of noise transients known as Tomte glitches [136].In this situation, the LVK decided not to classify it as a confident or catalogued detection but, importantly, nor Whitened detector data and maximum likelihood waveforms for S200114f We show the whitened strain h(t) (light blue) and Ψ4(t) (grey) detector data around the time of GW190521 together with the maximum likelihood waveforms returned by the BBH model NRSur7dq4(blue) and by our equal-mass Proca star merger simulations obtained through strain (brown) and Ψ4(t) analyses (orange).Unlike in Fig. 9 for the case of GW190521, in this case, the brown and orange waveforms show visible differences particularly visible in the early part of the waveforms.These translate into a worse fit to the data in the brown case and qualitatively different conclusions in terms of model selection. .Columns labelled by h(t) correspond to a "classical" analysis performed with strain-data and templates; while those labelled by ψ4(t) make use of Ψ4(t)-data and templates.We quote results corresponding to a distance prior uniform in co-moving volume.
was it classified as a noise trigger.
The above characteristics make S200114f a tantalising candidate to compare to our simulation catalogue of Proca-star mergers, in the same way we previously treated GW190521.We note, however, that because we find that S200114f was poorly reproduced by the small simulation sets mentioned above, here we use an enhanced bank of nearly 759 simulations spanning a grid in the two-star oscillation frequencies the space ω 1,2 /µ B ∈ [0.80, 0.93].Due to this enhancement, we use priors that slightly differ from those of the GW190521 analysis, which we specify below.Just as for the case of GW190521, we perform our comparison to the BBH model NRSur7dq4 within the strain framework while for the Proca-star case we use both the strain and Ψ 4 formalisms.Unlike in the case of GW190521, however, we find that both methods return significantly different results that arise from the fact that integration/filtering issues affect the best-fitting strain waveforms.

Bayesian Priors
For the PSM model, we impose the same priors as in our GW190521 study, with the exception of the field frequencies.For these, we impose a prior uniform across the triangle defined by ω 1,2 /µ B ∈ [0.80, 0.93], with ω 1 /µ B ≥ ω 2 /µ B .Finally, for the BBH model, we impose the same priors as for the GW190521 case.However, from the four mass-ratio priors we discuss we retain the one yielding the largest Bayesian evidence for the Ψ 4 run.The goal of this is to be as conservative as we can in our statements in favour of the existence of Proca stars.These priors are identical to those imposed in [128], where we refer the reader to for further details..

A. Model selection
Fig. 11 shows the whitened time series from the three detectors around the time of S200114f.Overlaid, we show the maximum likelihood waveforms for the BBH case and for the Proca-star merger case, the latter both in terms of strain and Ψ 4 .Unlike for the case of GW190521, the latter two (brown and orange) clearly differ.This difference is particularly visible in the early pre-merger part of the signal, which is most prone to be affected by integration artefacts and choices of the integration frequency cutoff.Additional differences are also observable in the late ringdown part.While visually mild, such disagreement drives dramatically different values of the corresponding likelihood, which is 6 e-folds larger in the Ψ 4 case.This, in turn, has a great impact on model selection, as shown in Table IV.While under the Ψ 4 formalism we obtain log B = 2.0, this is reduced to log B = −7.6 when using strain templates, in the case where we use our distance prior uniform in co-moving volume.In other words, while the trigger is slightly preferred as a Proca-star merger under the artefact-free Ψ 4 analysis, such an option is conclusively discarded under the strain analysis due to the artefacts arisen during the waveform integration process.We note that this result is qualitatively consistent with that returned by the noise-free injection study described in section VI. C. In particular, the log B reported in the first (Ψ 4 vs.Ψ 4 ) and fifth (Ψ 4 vs. δ 2 h F ) rows of the fourth column of Table I differ by 6.1 units, as compared to the 9.2 units we obtain in real data.Finally, a similarly dramatic effect is observed when we use a prior uniform in distance.In this case, the usage of strain waveforms causes a reduction from log B = 5.3 to log B = −0.3,i.e., from a strong preference for the Proca-star scenario to rather equal preference for both scenarios.

B. Parameter estimation
Finally, for the sake of completeness, Table V shows our parameter estimates for S200114f under both the strain and Ψ 4 analyses.First, clear differences arise in the estimated luminosity distance, total redshifted mass, star frequency and spin parameters.These translate into biases in the boson mass and source-frame mass estimates.In particular, the boson-mass estimate from the strain formalism is highly consistent with that of GW190521 while it becomes highly inconsistent if using the integration-error-free Ψ 4 waveforms.

IX. DISCUSSION OF RESULTS: INTEGRATED STRAIN VS. NEWMAN-PENROSE SCALAR
Given our results, both on synthetic signals and on real data, the question arises of: what is special about the waveforms reproducing S200114f that makes strain waveforms problematic, as opposed to the case of GW190521?
The answer is: nothing in principle.As stated throughout the paper, obtaining "integration-error free" strain waveforms depends on a series of human choices (in particular that of ω 0 ) that are only reasonably well guided for the case of quasi-circular mergers.In other cases, obtaining clean strain waveforms through FFI (if possible at all ) involves a way more convoluted trial-anderror process.Moreover, we understand that in the absence of a "true" reference waveform, concluding that the obtained waveform is correct can only be done through the comparison to the Newman-Penrose one, similarly to what we do in the right panels of Figs. 4 and 5.
In this situation, we can only make the everything but scientific argument that the integration choices that hap- pened to work correctly for the waveforms best fitting GW190521 (making our results in [1] safe from integration artefacts), did not work for the waveforms best fitting S200114f.This is, at least partially, explained by the fact that our choice of Mω 0 does remove more true signal power for S200114f than for the GW190521-like injection.In this situation, it could be argued that better strain waveforms may have been obtained if further exploration of ω 0 -choices was performed, although there is no guarantee that this process would lead to a driftless waveform containing all the true signal power.In fact, FFI does by definition eliminate some true signal power even in quasi-circular cases.In our view, this exactly exemplifies the huge advantage that the usage of the Newman-Penrose scalar has over that of the integrated strain: the resulting waveforms are "uniquely defined", with no choices to be made beyond those pertaining the specific configuration of the numerical simulation itself.

X. CONCLUSIONS
Extracting the properties of GW sources requires accurate waveform templates that can be compared to detector data.NR provides the most precise way to obtain such templates and it is often the only way.Computing GW strain waveforms from ψ 4 outputted by NR simulations that can be compared to the strain detector data is a non-trivial process subject to well-known systematics that can impact the physical interpretation of the source.Moreover, easing these errors is a rather artisan process subject to human choices that are not always obvious or even well-motivated depending on the considered type of source.This is particularly problematic for some of the most astrophysically interesting sources LIGO and Virgo are starting to observe, like precessing mergers [8,137], or may observe in the future observing runs, like eccentric mergers or dynamical captures, for which there is no monotonic relation between GW frequency and time.We note that, even in cases where such relation exists, typical systematic errors of ∼ 1% in amplitude will always exist.Moreover, these are in practice impossible to know because the true waveform is not known [68].By taking second-order finite differences on the detector strain data, we have presented a data analysis framework that allows to directly compare GW data to ψ 4 , removing the need to extract the GW strain from numerical simulations and the associated systematic errors.We have shown that our framework is equivalent to the traditional strain one modulo the potential systematic errors present in the strain waveforms.Therefore, given that Ψ 4 -waveforms have one less layer of systematic errors than strain ones, classical strain analyses will, at best, be as faithful as Ψ 4 -based ones.
As a demonstration of our framework in real data analysis, we have first repeated our previous study comparing GW190521 to numerically simulated strain waveforms from Proca star mergers presented in Ref. [1], but using the direct Ψ 4 outputted by our numerical simulations.We obtain results completely consistent with the original ones, which is indicative that our strain waveforms best-fitting GW190521 suffered, at most, from mild integration errors that did not impact our original analysis.Second, we have analysed the high-mass trigger S200114f using an enhanced catalog of Proca-star merger simulations reported in [128].In this case, we find that the usage of strain waveformsaffected by integration errors -has a huge impact in the interpretation of this signal, yielding conclusions that differ dramatically with respect to those obtained by using error-free Ψ 4 waveforms.
Our framework removes the need to obtain strain waveforms from numerical relativity simulations, removing a complete layer of systematic errors.We note that while we have focused on the case of short numerical relativity simulations with rather "exotic" dynamics, the integration errors worsen with increasing waveform length.In particular, this makes such errors particularly troublesome in the task of constructing hybrid numericalrelativity -post-Newtonian waveforms [138] that are matched are early times [68].
While we have discussed our procedure under the prevalent scenario where the transverse-traceless (TT) gauge is considered, its application to alternative gaugeindependent formulations [91] is, in principle, straight forward.Similarly, while we have demonstrated our framework in the context of parameter inference and model selection, this is trivially applicable to the case of actual matched-filter searches for GW signals [139][140][141][142][143][144].Finally, we note that since LVK results have so far been obtained under the assumption of quasi-circular mergers, we have no reasons to believe such results may be affected by the errors we have discussed here.

ACKNOWLEDGEMENTS
We plan to publish and mantain our code to perform gravitational-wave data analysis using the Newman-Penrose scalar, within the software Bilby [29,134] at [145].The analysed LIGO-Virgo data and the corresponding power spectral densities, in their strain versions, are publicly available at the online Gravitational-Wave Open Science Center [146,147].This research has made use of data or software obtained from the Gravitational Wave Open Science Center (gwosc.org), a service of LIGO Laboratory, the LIGO Scientific Collaboration, the Virgo Collaboration, and KAGRA.LIGO Laboratory and Advanced LIGO are funded by the United States National Science Foundation (NSF) as well as the Science and Technology Facilities Council (STFC) of the United Kingdom, the Max-Planck-Society (MPS), and the State of Niedersachsen/Germany for support of the construction of Advanced LIGO and construction and operation of the GEO600 detector.Additional support for Advanced LIGO was provided by the Australian Research Council.Virgo is funded, through the European Gravitational Observatory (EGO), by the French Centre National de Recherche Scientifique (CNRS), the Italian Istituto Nazionale di Fisica Nucleare (INFN) and the Dutch Nikhef, with contributions by institutions from Belgium, Germany, Greece, Hungary, Ireland, Japan, Monaco, Poland, Portugal, Spain.KAGRA is supported by Ministry of Education, Culture, Sports, Science and Technology (MEXT), Japan Society for the Promotion of Science (JSPS) in Japan; National Research Foundation (NRF) and Ministry of Science and ICT (MSIT) in Korea; Academia Sinica (AS) and National Science and Technology Council (NSTC) in Taiwan.JCB is supported by a fellowship from "la Caixa" Foundation (ID 100010434) and from the European Union's Horizon 2020 research and innovation programme under the Marie Sk lodowska-Curie grant agreement No 847648.The fellowship code is LCF/BQ/PI20/11760016. JCB is also supported by the research grant PID2020-118635GB-I00 from the Spain-Ministerio de Ciencia e Innovación.KC acknowledges the MHRD, Government of India, for the fellowship support.JAF is supported by the Spanish Agencia Estatal de Investigación (grants PGC2018-095984-B-I00 and PID2021-125485NB-C21) and by the Generalitat Valenciana (PROMETEO/2019/071).This work is supported by the Center for Research and Development in Mathematics and Applications x[M + 1] = x [1], then δ 2 x is a discrete-time Gaussian process with mean µ 2 = T µ and covariance Σ 2 = T ΣT where T is a M ×M matrix with entries T j,j = −2/(∆t) 2 , T j,j+1 = T j,j−1 = 1/(∆t) 2 with a periodic boundary condition imposed on the matrix index i.e. index 0 implies index M and index M + 1 implies index 1, and otherwise zero.
Proof.The probability density function of the discretetime stationary Gaussian process is (B2) The second difference of x can be regarded as a linear transformation of x.Since      (δ 2 x) [1] (δ 2 x) [2] . . .
x [1] x [2] . . .or T j,j = −2/(∆t) 2 , T j,j+1 = T j,j−1 = 1/(∆t) 2 and a periodic boundary condition is imposed on the matrix index i.e. index 0 implies index N and index N + 1 implies index 1, and otherwise zero.The probability density function of δ 2 x is then q(δ 2 x) = p(T −1 δ 2 x) ∂(T −1 δ 2 x) ∂(δ 2 x) = p(T −1 δ 2 x) T −1 where we have used T T = T since T is a symmetric matrix.Therefore, δ 2 x is a discrete-time Gaussian process with mean T µ and covariance T ΣT .which depends on the time difference only.δ 2 x is there-fore also a stationary discrete-time Gaussian process.The autocovariance of δ 2 x is therefore  We quote the inclination in terms of the angle between the line-of-sight and the total angular momentum θJN as well as the azimuthal angle of the observer φ (see Appendix I in [1]).
where S[k] is the power spectral density of the stochastic process x.
Appendix C: Appendix: Parameters of simulated signals Table VI shows the parameters of the two injections discussed in Section VI.

FIG. 1 .
FIG. 1.Schematic comparison of our proposed data analysis framework and the currently used one.To date, the ψ4 magnitude outputted by numerical relativity simulations is converted to the strain h outputted by gravitational-wave detectors via a double integration that is subject to systematic errors (red path).Instead, we transform both the simulation ψ4 and the detector h (and powerspectral-density Sn) into a third quantity that we label by Ψ4, avoiding the integration process and the corresponding systematic errors (green paths).
ulations for head-on mergers of Proca stars with equalmass and spin.The spin of the Proca stars can be directly mapped onto the bosonic field frequency, which is uniformly distributed in ω/µ B ∈ [0.80, 0.93] with a resolution of ∆ω/µ B = 0.0025.In addition, as in[1], we use a secondary exploratory family of unequal-mass mergers in which the frequency of one of the stars is fixed to ω 1 /µ B = 0.895 and the other varies uniformly in ω 2 /µ B ∈ [0.80, 0.93].We perform model selection w.r.t. the classical circular BBH case for which we choose the waveform model NRSur7dq4[54] implemented in the LALSuite library[132].This model includes all gravitational-wave modes with ℓ ≤ 4 and is directly calibrated to precessing NR simulations with mass ratio q = m 1 /m 2 ∈ [1, 4] and individual spin magnitudes a i ∈ [0, 0.8]

FIG. 9 .FIG. 10 .
FIG.9.Likelihood posterior distributions for our two sets of parameter inference runs Left: Posterior distributions for the case of our GW190521-like injections.Right: same for our S200114f-like injections .
FIG.3.Whitening of strain and Ψ4 detector data.Left: whitened strain and Ψ4 gravitational-wave time-series from the Livingston detector around the time of GW190521.Right: difference between the absolute values of the corresponding Fourier-domain data.These are at the level of one part in 10 12 , so that both whitened detector data are equivalent for all practical purposes."injection"h M (θ True ) with source parameters θ True computed by a waveform model M .In our case, the model M corresponds to either Ψ 4 (t), h(t) or (δ 2 h)(t).Next, we recover the posterior distributions of the parameters p M * (θ | h(θ True )) using a different model M * as the signal template.This is given by

TABLE I .
Template M * log B log Lmax log B log Lmax log B log Lmax log B log Lmax log B log Lmax log B log Lmax Summary of injection recovery with different waveform models We report the log Bayes factor (for model M * vs. noise hypotheses) obtained from our different waveform models, together with the corresponding maximum log likelihood values.We show results for two types of injections of Proca-star merger signals, respectively consistent with the GW190521 signal and with the 200114f trigger, both with SNRs around 15. To show the increasing impact of ψ4 integration errors as the SNR raises, we further scale our injections by factors of 2 and 4, corresponding to SNRs of approximately 30 and 60.Log Bayes' Factors have typical uncertainties of ≃ 0.1.with maximum values of 0.5.

TABLE II .
Model selection for GW190521 We report the natural Log Bayes factor obtained for our different waveform models.For the Proca star merger model, analyses done under the classical strain formalism and our Ψ4-formalism are equivalent.

TABLE III .
Parameters of GW190521 as a head-on Proca star merger We report median values together with symmetric 90% credible intervals under the scenario of an equal-mass, equal-spin merger and under our exploratory unequal-mass model.

TABLE IV .
Model selection for s200114fWe report the natural Log Bayes factor obtained for our different waveform models under different signal models and distance priors.

TABLE V .
Parameters of S200114f as a head-on Procastar merger We report median values together with symmetric 90% credible intervals.The column labelled by h(t) corresponds to a "classical" analysis performed with straindata and templates; while that labelled by ψ4(t) makes use of Ψ4(t)-data and templates.We quote results obtained under a distance prior uniform in co-moving volume.

TABLE VI .
Parameters of the synthetic signals analysed in section VI.C.