Probing lens-induced gravitational-wave birefringence as a test of general relativity

,

GR also dictates that GWs have only two tensor polarisations (+, ×) which propagate independently from each other at the speed of light.However, in alternative theories of gravity, extra degrees of freedom (tensor, vector, scalar) [20] can mix with GWs as they propagate, producing phenomena similar to neutrino oscillations (i.e.due to interactions between different neutrino flavors [21]).In Lorentz invariant theories, the symmetries of the FRW restricts mixing effects to tensor degrees of freedom, either fundamental (e.g. in bigravity) or composite (e.g.multiple vector fields) [22][23][24][25][26].However, inhomogeneities spontaneously break Lorentz symmetry, allowing interaction between GWs and scalar or vector degrees of freedom [27,28].This leads to new, testable predictions, and opens new opportunities to probe the gravitational sector beyond the FRW limit.
The evolution of GWs on an inhomogeneous background is described via propagation eigenstates: linear combinations of the interaction eigenstates (h + , h × and perturbations of additional polarisations) with a welldefined dispersion relation (analogous to massive neutrinos).As the relation between interaction and propagation eigenstates and their speed depends on position and direction, an inhomogeneous region of space splits the original signal into several components, each arriving with a relative time delay [27].Moreover, if deviations from GR are small, two eigenstates correspond to mostly-tensorial polarizations (linear combinations of h + , h × plus a negligible correction distinguishing both), with a very small speed difference. 1  We will refer to the difference in propagation speed between the +, × polarizations as lens-induced birefrin-gence (LIB).LIB is analogous to the way a non-isotropic crystal, such as calcite, splits light in two beams.This splitting is caused by a difference in the refractive index of the linear electromagnetic polarizations, which depends on the alignment of the polarization vector with the crystal structure.In our case birefringence is caused not by anisotropies in a crystal, but by the background configuration of additional, non-GR fields which spontaneously break Lorentz symmetry.Moreover, LIB splitting is independent of the frequency (in the high frequency approximation assumed), which would correspond to a perfectly isochromatic birefringent crystal.Because GW detectors have excellent time resolution and bad sky localization, our main observable will be the time delay between splitted signals and not their angular separation.
If the arrival time difference between the mostlytensorial polarisations is larger than the duration of the binary merger signal then we would see only one polarisation at a time, mimicking a binary with either a zero inclination angle (face-on binary) or with 90 • inclination angle (face-off binary), appearing as GW echoes.Since the detectors are more sensitive to face-on binaries, one can expect excess of near zero inclinations in case of birefringence for the population of binaries.If the delay between the polarisations is larger than typical observing runs or the amplitude of one of the polarisations decays faster than the other, e.g. in Chern-Simons gravity [29], one would also expect an anisotropic inclination distribution.Current observations though show that the orientation distribution is consistent with being isotropic [30].
However, when the time delay is less than the duration the signal, the GW waveform would be distorted or "scrambled" due to the interference of both polarisations.Note that this effect is frequency-independent, and hence distinguishable from a different dispersion relation for the + and × modes or the circularly polarized combinations (L-R), as predicted in GR [31][32][33] and alternative theories [34][35][36].As it is not suppressed by the frequency, LIB is the dominant effect in the high-frequency limit for theories in which this effect is present.
Our study analyzes for the first time arrival time difference (∆t 12 ) between the two polarisation states due to different propagation speeds (frequency-independent dispersion relations) as a result of LIB.This is a new, model-independent test of a basic prediction of GR.We use these generic results to constrain GW lensing effects beyond GR, for example in scalar-tensor theories with derivative interactions [27].
LIB signatures are not linked to a specific regime of gravitational lensing in GR, such as strongly magnified or multiple images.The scale on which LIB can be observed is very sensitive to the theory parameters and independent of the Einstein radius R E , which characterizes the regimes of gravitational lensing.Hence, for sufficiently strong deviations from GR, LIB can be detected for impact parameters much larger than R E , typically associated to weak lensing.Therefore, LIB-tests can be applied to all the GW detections.In addition, LIB can be impor-tant for lenses very close to the source or the observer, for which R E vanishes.This is particularly interesting for sources merging near massive objects (e.g. a supermassive BH) since the background configuration of the additional fields enhances LIB.
The rest of the paper is organised as follows.In Sec.II we describe our LIB waveform model, methods for data analysis and introduce parameterized LIB observation probabilities.In Sec.III, we perform the birefringence test over a set of simulated GW events, and then to real events using the Bayesian model selection framework.In Sec.IV, we study the implications of the results in constraining LIB probabilities and beyond-GR theories.Finally, in Sec.V we summarize the main results and discuss future prospects.

II. METHOD
In GR, GWs have only two polarisations (+, ×) which propagate independently at the speed of light over the background FRW metric.A given ground-based detector I measures the GW signal, h(t) as a linear combination of these polarisations [37], where, F + I , F × I are the detector antenna pattern functions.In the case of compact binary coalescence (CBC), the relative amplitude of the polarisation modes depends on the inclination and polarisation angles of the binary w.r.t to the line of sight, and also depends on the sensitivity of the detector for the source location at the time of arrival.The overall amplitude of the signal is inversely proportional to the luminosity distance of the source.Masses and spins of the source dictate the frequency evolution of the signal and its amplitude.

A. Parameterized Lens-induced Birefringence Waveforms
When there is any inhomogeneity along the travel path of a GW, e.g. an intervening galaxy, the GW can be gravitationally lensed.Gravitational lensing of a GW can produce multiple images of the original signal (strong lensing) or cause distortions (microlensing), but, in GR, both polarisations are affected in the same way, i.e. the polarisation rotation is negligible for any sensible astrophysical lens [38,39].However, in alternative theories of gravity the additional fields can couple with the tensor polarizations around the lens and modify the GW propagation eigenstates.These eigenstates are a linear combination of original GW polarisations that evolve independently, each with a different speed, thus reaching the detectors at different times.We will assume spherically symmetric lenses, focus on the limit of small deviations from GR, so the mostly-metric propagation eigenstates 1. GW polarisations (left) and detector strain (right) for a CBC (30+30)M with birefringent time delays ∆t12 = 5, 10, 100 ms (top to bottom).The sky localization and detector orientation correspond to F + = −0.38,F × = 0.71 and LIB strain is given by Eq. 5 correspond to linear combinations of h + , h × (depending on the projected angle between the lens and the source), and neglect the additonal eigenstates (See Ref. [27] and footnote 1).
This class of LIB of GWs can be captured in a phenomenological manner as proposed in Ref. [27].After diagonalizing the propagation equations, the propagation eigenstates can be computed and one gets the transformation matrix S relating the polarisation amplitudes in GR and after the LIB: where and ∆ = e −iω∆t12 with ∆t 12 is the time delay between the polarisations and φ lens as the angle between the lens and the source, relative to the direction of GWs propagation that dictates the polarisation mixing.
It is easy to note that for φ lens = π/2, S = diag(1, ∆), hence the signal observed by the detectors will just be superposition of (+, ×) arriving at different times.
whereas, if ∆t 12 = 0 the LIB waveform morphology will be identical to the GR one, independent of φ lens .Fig. 1 compares GR v/s LIB waveform polarisations and the detector strains for various values ∆t 12 .Under LIB, the polarisations interfere leading to waveform distortions.Since lensing is an environmental effect which can occur through any local inhomogeneity in the path of GWs, the parameters ∆t 12 and φ lens are expected to vary between GW events.The time delay distribution depends on the theory and the (usually unknown) lens properties and the configuration relative to the source.In general, one can only predict the probability of the birefringence parameters given a gravitational theory and matter distribution (unless further information or assumptions are employed about the source's location or the signal's trajectory), see Sec.II D. This is in stark contrast to other tests of GW propagation (that are done with individual GW events) in which deviations represent a fundamental property of gravity (eg.massive graviton dispersion relations) and are thus the same across all events and only depend on their distance [20].

B. Template Mismatch Studies
In order to quantify distortions due to GW birefringence, we calculate the mismatch between the GR and LIB waveforms as seen the LIGO-Virgo detectors.At each detector (I), the mismatch between the injected waveform (h inj I ) and the recovery waveform (h rec I ) is given by: where, (• | •) symbolises the noise-weighted inner product: Here, ã, b represent the Fourier transform of the time series a(t), b(t); [f min , f max ] is the frequency range over which the inner product is evaluated; * represents complex conjugation and S n (f ) is the colored Gaussian noise power spectral density (PSD) at the detector.The norm ||h|| = (h|h) is the optimal SNR of a waveform.We define the total mismatch (M) for network of detectors as the signal-to-noise ratio (SNR, ρ I ) squared weighted average of individual detector matches, Note that the mismatch is a normalized quantity and is maximised over time and phase shifts.Thus, the mismatch quantifies differences in morphology between the signals.Whereas, during the parameter estimation (PE) from GW signals both the mismatch and SNR play a role as the log-likelihood ∝ M I ρ 2 I around the maximum likelihood parameters.We first wish to quantify the overall detectability of the birefringence.Later, we will estimate parameters using Bayesian inference, accounting for correlations between all parameters.LALSuite software package [41].The waveforms are then projected onto the LIGO and Virgo detectors using their antenna pattern functions, as implemented in the bilby [42] software package.The LIB waveforms have additional frequency modulations which depend on the two parameters: ∆t 12 and φ lens (see II A).Therefore we calculate the mismatch between the GR and LIB waveforms, keeping all the other parameters identical and fixed for the two waveforms.In practice, we calculate M I using pycbc.filter[43] module.The detector noise is generated using the zero-detuned high-power PSDs of Advanced LIGO and Advanced Virgo at their design sensitivities [44,45].
We consider two systems of binaries, one whose parameters resemble to that of the first CBC detection GW150914 and other of a higher mass-ratio CBC GW190814 where the presence of higher-order modes (HoMs) of GWs are significant.For both the systems, we inject a GR and a LIB waveform and recover with LIB waveform to calculate the mismatch.The parameters for both the CBCs are mentioned in Appendix A Fig. 3 shows mismatches for a GW150914-like CBC (top) and GW190814-like CBC (bottom).As expected for a GR injection i.e. ∆t inj 12 = 0 and φ inj lens = 0, the mismatch is minimum for ∆t rec 12 0, for all φ lens as expected from Eq. 3. Additionally, the local minimum of mismatch is at φ rec lens π/3, which could be because of vanishing polarisation (+ or ×) as seen at the detectors which further makes the mismatch independent of the time delay ∆t 12 .The waveform plots in Fig. 2 confirms this as the φ rec lens π/3 waveform resembles the GR ones more as compared to the φ rec lens π/5, especially in the Livingston (L1) detector for GW150914-like CBC.
We also checked the mismatch for the LIB injections (right panel Fig. 3) with ∆t inj 12 = 10 ms and φ inj lens = π/5 and the mismatch is minimum at, ∆t rec 12 ±10 ms and φ rec lens π/5, π/4 + π/5.We can infer the degeneracy between ∆t 12 and the coalescence time (t c ) as follows, from Eq. ( 1)-( 4) if φ lens → φ lens + π/4 and ∆ → 1/∆ then, one finds S → S/∆, which implies that the modification at (∆t 12 , φ lens , t c ) is the same as at (−∆t 12 , φ lens + π/4, t c + ∆t 12 ).Note that, this degeneracy stems from the fact that we do not know the composition of h +,× before it encounters the lens.Additional information about the source could break this degeneracy, but we leave these investigations for the future.

C. Bayesian Inference
Bayesian model selection allows us to assign evidences for various hypotheses pertaining to the observed data, and also derive posterior probability distributions of the model parameters conditioned on individual hypotheses.Given the set of data {d} from a network of detectors, the marginalized likelihood (or, Bayesian evidence) of the hypothesis H A can be computed by where θ is a set of parameters that describe the signal under hypothesis H A (including the masses and spins of the compact objects in the binary, location and orientation of the binary and the arrival time and phase of the signal), P (θ|H A ) is the prior distribution of θ under hypothesis H A , and P ({d}|θ, H A ) is the likelihood of the data {d}, given the parameter vector θ and hypothesis H A .
Given the hypothesis H A and data {d}, we can sample and marginalize the likelihood over the parameter space using an appropriate stochastic sampling technique such as nested sampling [46].Bayesian model selection allows us to compare multiple hypotheses.For e.g., the odds ratio O LIB GR is the ratio of the posterior probabilities of the two hypotheses LIB and GR.When O LIB GR is greater than one then hypothesis LIB is preferred over GR and vice versa.Using Bayes theorem, the odds ratio can also be written as the product of the ratio of the prior odds P LIB GR of the hypotheses and the likelihood ratio, or Bayes factor B LIB GR : Since GR has been tested well in a variety of settings, our prior odds are going to be highly biased towards it, i.e., P LIB GR 1.Hence, in order to claim evidence of birefringence the corresponding Bayes factor supporting the LIB hypothesis has to be very large.Since the Bayes factor is the only quantity that is derived from data, for the rest of the paper, we focus on the Bayes factor, i.e. the ratio of evidences under the two hypothesis.
The waveforms under the GR and LIB hypotheses at each detector is the same way as described in Sec.II B. We use the standard Gaussian likelihood model for estimating the posteriors of the parameters under different hypotheses (see, e.g., [37]).We use uniform priors in redshifted component masses of the binary, isotropic sky location (uniform in α, sin δ) and orientation (uniform in cos ι, φ 0 ), uniform in polarisation angle ψ, and a prior ∝ d2 L on luminosity distance.Additionally for the LIB hypothesis, we choose the priors on ∆t 12 as uniform ∈ [−100, 100] ms and φ lens as uniform ∈ [0, π/2].To estimate the posterior distribution and evidences for the GR and LIB hypotheses, we use the open-source parameter estimation package bilby package [42] coupled with the dynamical nested sampler dynesty [47].

D. Lensing Probabilities
The (non-)observation of birefringence can help us put constraints on theories beyond GR that predict LIB.According to GR, the strong lensing of GWs caused by galaxies occurs when sources lie inside the Einstein radius of the lens, which depends on lens mass and profile.This is the relevant scale determining the probability of lensing.However, birefringence beyond GR is in principle independent of the ratio between the impact parameter and the Einstein radius, changing the probability of observing LIB compared to strong lensing.It is thus possible to have LIB time delays without multiple images, but birefringence could also occur for strongly lensed GWs, in which case it applies to each image separately, as typical time delays between images will be larger than ∆t 12 [27].
Assuming that the lenses are randomly distributed, birefringence detection is described by Poisson statistics.A series of observations with L lensed and U unlensed GW events has an associated probability, The result depends on LIB rate for the i th event: Here S, L denote parameters corresponding to the source and lens/theory (i.e.beyond GR), P i is the posterior distribution of the source parameters and P is the prior, which includes relations between parameters (i.e. the measured ∆t 12 as a function of lens mass and beyond-GR parameters). 2The birefringence optical depth, τ (z s , p L ) is the fraction of the sky for which LIB is detectable for sources at a redshift z s .Hereafter we will assume the posterior to be sharply peaked at the mean source redshift z s and include the integration on the lens model parameters ( p L ) in the definition of the optical depth, so λ i ≈ τ (z s,i ).If birefringence is excluded in all events, the probability only depends on the total optical depth τ tot ≈ i=1..N τ i ≈ i=1..N λ i .In Appendix B we will comment on opportunities to study GW birefringence beyond the Poisson statistics.
The lensing optical depth τ (z s ) depends on the angular cross section σ (z s , p L ) and the density of lenses n ( p L ) [49].In the following we will explicitly write the lens redshift z L and let p L denote the remaining properties (i.e.lens mass & theory parameters).The total density of lenses at redshift z L is then n (z L , p L ) d p L .The optical depth is computed directly by adding-up the cross-sections weighted by the density at different redshifts, i.e.
is the physical volume given the solid angle δΩ, angular diameter distance to the lens D L and the Hubble parameter H(z).For simplicity, we will assume point mass lenses of mass M throughout.In GR, the lensing cross-section is σ = πθ 2 E , where is the Einstein angle, D S is the distance to the source from the earth and D LS is the distance between lens and source.
The relation between the LIB-time delays, the theory parameters and the configuration of the lensed system is complex (See Sec.VI of Ref. [27] for a worked-out example in a viable Horndeski theory).For this reason we will first consider two phenomenological models with generic dependences of lensing cross section.As a first example, we will assume that the relevant LIB scale is proportional to the Einstein angle, θ E X = α X θ E , so that the cross section becomes Then the optical depth is given by Eq. ( 74) in Ref. [27] where the lenses have been assumed point-like.
Under these assumptions, lensing probabilities are independent of the mass function. 3 In our second example we assume that the relevant LIB scale is given by a constant physical scale associated to each halo.Moreover, we assume that this scale depends on the halo mass as a power law.Accordingly, the crosssection reads, The scale R 12 fixes the probability of lensing for halos with M = 10 12 M , while n allows us to extrapolate to different halo masses.Below we will discuss some cases of interest.We now generalize the expression for the optical depth presented in Ref. [27] (Eq.76) to include a realistic halo mass function.The optical depth from Eq. ( 15) is given by where Here f (M, z) = M 2 ρ0 dn dM is the scaled differential mass function (dimensionless) with ρ 0 as the matter density of universe at z = 0. We will use the Tinker et al. form [52] as implemented in the Colossus package [53].As our approach is phenomenological, we assume a Planck ΛCDM cosmology [54].The true optical depth of a consistent LIB model will typically depend more strongly on the theory parameters (e.g.entering Eq. ( 15) via R 12 ) than on the precise values of H(z), f (M, z) of the underlying LIB cosmology, including the effects of deviations from GR in cosmological expansion and structure formation.This is the case for the example theory discussed in Sec.IV B: GW lensing effects are orders of magnitude more sensitive than solar-system tests (cf.Fig. 8), in turn more stringent than current cosmological observations [55,56] (for theories without a screening mechanism).
In addition to the Einstein radius scaling, Eq. ( 14), we will consider three cases of interest: • n = 1: the physical scale is proportional to the total halo mass, much like the Schwarzschild radius.The rates are dominated by large masses and saturate at z s 1, as the more massive halos are exponentially suppressed at early times.This case captures the dependence of the time delay in a Horndeski model (Sec.IV B).
• n = 1/2: the scale has the same mass scaling as the Einstein radius and leads to rates independent of M .However, the overall redshift dependence is different, as R E depends also on D S , D LS .
• n = 1/3: this mass scaling favors lighter halos and thus grows very rapidly with redshift.It is motivated by the mass-dependence of the Vainshtein radius R V , i.e. the classical strong-coupling scale [57].For n = 1/3 the contribution from lighter halos diverges and a low mass cutoff needs to be included (we will take M > 10 7 M ).We will see this mass dependence when considering a binary merging near an active galactic nuclei in a Horndeski theory (Sec.IV C).
The optical depths for each of the cases as a function of the source redshift are plotted in Fig. 7.Note that these phenomenological models assume that the cross section is independent of ∆t 12 , and thus common for all the analyzed events.Dependence in the time delay can be included, e.g. by multiplying Eqs. ( 14), ( 15) by a factor (∆t 12 /10ms) −k .For the sake of simplicity, we will not include this dependence and instead interpret the obtained values of α X , R 12 (n) at the median 95% c.l. from all analyzed events.

III. RESULTS
In order to test our method and understand the observing capabilities, we first apply our pipeline to injections.We then proceed to analyse the latest GW catalog (GWTC-3).

A. Injections
We inject GW150914-like signals in simulated Gaussian noise with ∆t inj 12 ∈ {0, 1, 3, 10, 30} ms and φ inj lens = π/5 rad and recover them by running the parameter estimation routines under the GR and LIB hypothesis, as mentioned in Sec.II C.This allows us to compute the Bayes factors B LIB GR to compare the two hypothesis for each injection.
The injections are set to have SNR ∈ {10, 15, 20, 30, 40}.These SNRs are achieved by inversely scaling the luminosity distance (d L ) of the injections.Fig. 4 shows the violin plots and the log Bayes Factors for these injections.The posteriors on φ lens are uninformative in all the cases and hence not shown in the figure.LIB time delays (∆t 12 ) as small as 1 ms are recovered well with SNR 30 and 40 signals, whereas for SNR 10 signals time delays < 30 ms are not measurable.As one would expect, only with ∆t inj 12 = 0, i.e.GR injection the log B LIB GR < 0, i.e. consistent with the GR hypothesis, except for the SNR 10 case where log B LIB GR = 0.1 is within the intrinsic sampling error on the calculation of evidence.For ∆t inj 12 ∈ {1, 3, 10, 30} ms we find that log B LIB GR > 0, i.e. consistent with the LIB hypothesis for all the SNRs except 10.Hence, both model selection and sensitivity to measure the time delays improve with an increase in SNR.
We also note that the time delays are measurable up to a symmetry around ∆t 12 = 0.This is because LIB waveform, Eq. ( 3), is identical at (∆t 12 , φ lens , t c ) and (−∆t 12 , φ lens + π/4, t c + ∆t 12 ), which we also saw during mismatch studies with LIB injections, see right panel of Fig. 3.It is possible that for asymmetric and inclined binaries with significant HoMs a better measurement of φ lens could break the ∆t 12 parity as well, however, this needs to be investigated further and is left for future studies.
Overall, as the sensitivity of the detectors improve we shall be able to measure the birefringence time delays as small as 1 ms.On the other hand, in the absence of birefringence we expect to see ∆t 12 (s) posteriors which are consistent with the GR value, i.e. zero and bayes factors that favour the GR hypothesis.Most events in GWTC-3 have SNR < 30.The time delay posteriors are hence expected to be broad, however the Bayes factors should already indicate whether LIB is present or not.

B. GWTC-3 Events
We now analyse 43 CBC events from the GWTC-3, that have low detection false alarm rate, FAR 10 −3 yr −1 .These are also the events that are considered for other tests of GR performed previously [12][13][14][15].Fig. 5 shows the ∆t 12 posteriors and the log Bayes Factors for the real events.We find that for almost all the events the ∆t 12 posteriors are broad containing zero, i.e. consistent with GR.This is mostly due to the low SNRs of the events, as seen in our injection study.We also find the tightest 90% credible bounds on |∆t 12 | 0.51 ms coming from the event GW200311_115853 which has reasonably high SNR ( 17.8) and moderate redshift (z ∼ 0.23) as compared to other events.As expected φ lens posteriors are uninformative for almost all the events.38 out of 43 events resulted in log B LIB GR < 0, and hence consistent with the GR hypothesis.Only a few events showed preference to LIB hypothesis (log B LIB GR > 0), with highest one for GW190521 (3.21) and then GW190910_112807 (0.8), GW170823 (0.8), GW191109_010717 (0.7) & GW191129_134029 (0.1).
The Bayes factors are known to be prior dependent and its value does not signify the confidence in preferring one hypothesis over the other, but rather the preference of one hypothesis over the other given a set of prior assumptions.The model with extra parameters (LIB) could be either fitting the noise or the signal, therefore we take a frequentist approach to determine the significance by considering different realisations of noise.We focus on the event with the highest Bayes factors (GW190521) and estimate its significance.We generate the background distribution of Bayes factors by injecting GR signals in Gaussian noise using the power spectral density around the trigger time.To calculate the false alarm probability corresponding to the observed Bayes Factor for the event GW190521, we simulate a hundred GR injections, whose parameters are taken from the posteriors of GW190521 event for the GR hypothesis.Fig. 6 shows the background distribution of the Bayes factors and the corresponding false alarm probability (FAP).The FAP corresponding to each B LIB GR = κ is calculated as the fraction of the background events having B LIB GR > κ.We find that for the observed log B LIB GR = 3.2 for GW190521 is 0.48, i.e. its significance is less than 1σ.
It is to be noted that GW190521 is a remarkably loud but short (< 100 ms) signal, being easily fit by widely different hypotheses such as head on-collision of a boson star [58] or left-right (L-R), frequency-dependent birefringence [34].For the interested reader, in Appendix C (Fig. 10), we also plot posteriors and the waveforms corresponding to the maximum a posteriori parameters for both the hypothesis, over the whitened signals observed at each detector.The plots show that the LIB hypothesis is also fitting the noise, and might therefore give a high Bayes factor, B LIB GR .The other events with B LIB GR > 0 are also show similar behavior and as their preference for LIB is marginal, we conclude that none of the events have any significant Bayes factor and find no strong evidence for birefringence.

IV. IMPLICATIONS
In our analysis of the latest GW catalog, we have found that the majority of the events disfavor birefringence.For a subset of them (most notably GW190521) while the Bayesian inference prefers the LIB hypothesis, a followup background study indicates that most simulated GR signals give comparable Bayes factors.In the following, we present the implications of these results.First, we consider the implications for generic LIB.Then, we study the constraints on a specific scalar-tensor theory that predicts LIB.Finally, we entertain the possibility that GW190521 was emitted in an active galactic nucleus (AGN) and is displaying evidence of birefringence.

A. Constraints on generic LIB
From the non-observation of birefringence in the 43 events from GWTC-3 and using their median redshift values [8], we estimate the total optical depth for the LIB models discussed in Sec.II D. The non-observation of birefringence translates to constraints on the phenomenological model parameters, as summarized in Table I.For reference, we also show the constraints obtained from the full GWTC-3 (90 events).∝ M 1/2 R12 < 20 (12) kpc ∝ M 1/3 R12 < 12 (6.9)kpc TABLE I. Constraints on the phenomenological models Eq. ( 14) and Eq. ( 15), assuming no birefringence detected for analysed (all) GWTC-3 events.
The higher redshift events have higher optical depth.Non-observation of birefringence in distant sources leads to more stringent constraints, although the SNR scales with the inverse luminosity distance: hence some of the highest redshift events will not be considered because of our FAR threshold.The final results depend strongly on the model via source redshift and halo mass function.Figure 7 shows the redshift dependence of the optical depth for the parameterizations discussed, adopting the 95% c.l. values found by our analysis along with the observed GWTC-3 redshift distribution.
Future observations will increase in number of events and their SNRs, allowing better constrain the birefringence probabilities and ruling out more of the parameter space in the alternative theories of gravity.Higherredshift observations above our FAR threshold will be especially valuable to constrain α E and R 12 for n = 1/3, 1/2 (see Fig. 7).

NGW(z)
FIG. 7. Birefringence optical depth for the phenomenological models considered here, using the parameters correspond to the 95% c.l. limit compatible with the non-observation of LIB.The dark (light) gray shaded histograms show the binned redshift distribution of analysed (all) GWTC-3 events.See Sec.II D for details.

B. GW birefringence in Horndeski theories
Let us now use our results to a specific theory that predicts LIB.We will present the theory and translate the constraints of the phenomenological model (Table I) into fundamental theory parameters.In the next subsection we will interpret a tentative detection of LIB in GW190521 as an AGN binary within the same theory.We will focus on a particular scalar-tensor theory within the Horndeski class [59], whose LIB predictions have been analyzed in detail, cf.Sec.6 in Ref. [27].The model is described by two parameters describing couplings between the Ricci scalar (R) and the new field φ: a linear coupling p 4φ and a derivative coupling suppressed by an energy scale Λ 4 .The Lagrangian of this theory can be written as [60,61] where R is the Ricci scalar, G µν is the Einstein tensor, M P is the Planck mass in units of c = h = 1, and ∇ the covariant derivative.The GR limit corresponds to p 4φ → 0, Λ 4 → ∞ .The parameters of this model are stringently constrained by the speed of GWs on the homogeneous FRW metric [62][63][64][65] (see also [66][67][68]), as observed by the near-coincident arrival of GW170817 and its associated counterpart [69]: |c g /c − 1| 10 −15 .Compliance with this limit requires [27] While this constraint is extremely stringent, LIB allows comparable limits.Specifying a model allows one to derive concrete predictions.The dependence of the time delay contributions (Shapiro, geometric) with the lens and theory parameters is complex.Nonetheless, we observed that the time delay decreases monotonically with the impact parameter.Moreover, its slope changes and becomes very sharp beyond the Vainshtein radius r V represents the scale at which the scalar field has a strong self-coupling near a massive object [57] 4 .In many scalar-tensor theories this leads to screening: a suppression of scalar field fluctuations for r < r V , allowing the theory to approximately recover GR around massive bodies.However, screening is not necessary in this model given the stringent constraint from GW170817 (Eq.( 19)).In this case, the strong-interaction within r V represents a a large coupling between the scalar field and the Riemann tensor, the kind of interaction producing LIB.
For simplicity, we will focus on the Shapiro time delay.The geometric time delay is usually dominant for massive halos at intermediate distances.(Fig 12 in Ref. [27]).It is proportional to the Einstein radius, and it could thus be captured generalizing Eq. 14 to extended lenses.Neglecting the geometric time delay is conservative but reasonable, since our constraints involve events at relatively low redshift (z 0.6).
The LIB predictions have a simple dependence on the lens mass and theory parameters.We verified that ∆t 12 ∝ M Λ −4/3 4 . The proportionality to the mass stems from the scaling with the Vainstein radius, as well as ∆t 12 , the impact parameter and the time spent by the GW on the region of sizeable birefringence are all ∝ r V .It allows us to directly connect the theory parameters to R 12 with n = 1, as constrained in the phenomenological model (15).The scaling with Λ 4 allows us then to find R 12 by equating ∆t 12 (R 12 ) to the constrained value for different p 4φ , but keeping M = 10 12 M , Λ 4 fixed.For simplicity, we will take a sensitivity of ∆t 12 ∼ 10 ms to define R 12 .Using the actual posteriors on ∆t 12 for each of the GWTC-3 events analyzed will not qualitatively affect these constraints in any significant manner.
The excluded region is shown in Fig. 8, along with constraints from the GW speed on FRW and lunar laser ranging (no screening, p 4φ 1, see Sec.VBc in Ref. [55]).The change in slope at low Λ corresponds to a transition in which R 12 surpasses the Vainsthein radius (20).For Λ 4 H 0 the birefringence constraints approach those of the GW speed: this happens when r V is so large that most GWs are effectively behind a lens.Then the constraints are satisfied in the limit c GW → c, equivalent to Eq. (19).For the sensitivity of GWTC-3 this happens for Λ 4 H 0 , where LVK frequencies lie Horndeski theory [27] using the lens-induced birrefringence (LIB) test.Shaded regions are excluded according to GWTC-3 (this work, blue solid), GW170817 [62][63][64][65] (green dashed) and GW190521 assuming an AGN binary [70] (red dotted, see Fig. 9).The GR limit corresponds to p 4φ → 0, Λ4 → ∞, when the scalar field is decoupled from gravity and its derivative interactions suppressed.See sections IV B, IV C for details.If GW190521 is associated to an the upper shaded region improves the overall GWTC-3 constraints for Λ4 3H0.If we further assume a detection of LIB, then the bottom red shaded region excludes GR.For reference, we also indicate Solar System constraints (gray horizontal) and the region there the GW frequencies at LIGO-Virgo detectors are larger than the (non-linear) energy scale of the effective field theory (magenta vertical).
beyond the validity of our framework as a classical effective field theory [71].At increasing Λ 4 the constraints degrade, since probability becomes very suppressed, Eq. (20).For Λ 4  10 2 H 0 solar system constraints become more efficient than birefringence.

C. GW190521 as an AGN binary
Let us now discuss the implications of a possible birefringence detection associated to GW190521.Given the constraints from the speed of GWs (19) on our example Horndeski theory, the chances of birefringence being caused by a lens in the line of sight are very small.We will instead interpret our result, ∆t 12 9.5 ms as due to an environmental effect near the source.We will follow the scenario outlined in Ref. [70], where a candidate electromagnetic counterpart from an AGN J124942.3 + 344929, observed 34 days after the GW signal, suggests that the binary merged in the environment of a supermassive black hole (SMBH).Note that there are important uncertainties, both regarding the counterpart association (given large GW localization uncertainties [72]), and the significance of LIB detection (given our analysis of random noise realizations, Fig. 6).This discussion is therefore not a statement on the status of GR.Instead,  19).The horizontal line corresponds to the lower bound on ∆t12 = 9.5 ms from the analysis of GW190521.The region between the shaded areas encompasses 95% probability for a random observer.The lowest θ represents trajectories passing at 10 Schwarzschild radii of the SMBH.
it proves the potential of identifying environments of GW sources to test gravity theories.Following Ref. [70], we will assume an AGN binary scenario where the mass of the SMBH is M SMBH ∼ 10 8 M and the source is located in a migration trap at r ∼ 700GM SMBH .Then, using the framework of Ref. [27] allows us to compute the time delay as a function of the angle between the observer and the source, relative to the SMBH.The results are shown in Fig. 9 for p 4φ = 10 −8 , Λ 4 = 10H 0 , compatible GW170817 (19), and different distances to the SMBH (the dependence on M SMBH is less pronounced, see below).The birefringent time delay becomes very large as θ → 05 .Ultimately, the maximum time delay is limited by the existence of the horizon, θ s ≈ 2GM/r.The birefringence also vanishes as θ → π because of geometric cancellations in spherical symmetry.
We will translate these predictions into theory parameters and include the comparison to GW190521.We will take the values of M SMBH and the source radius fixed, and consider the credible intervals as being determined by the angle θ, cf.Eq. (B1) and Appendix B. As we do not know the emission angle, we will assume a flat prior on the sphere P (θ) = sin(θ), and take the upper/lower 95% c.l. values based on P (θ) (excluding the shaded regions in Fig. 9).Limits on the theory parameters can be derived by noting that ∆t 12 (θ) ∝ p including different assumptions about the SMBH mass.Note that M SMBH enters with a different scaling than the lens mass in Sec.IV B, due to the source being at a fixed distance from the SMBH and within its Vainsthein radius, rather than randomly located.
The implications of GW190521 for the example theory (18) are shown in Fig. 8.The orange regions are excluded if we assume the AGN scenario as discussed above.The lower region excludes the GR limit p 4φ → 0, Λ 4 → ∞ and relies on trusting the measured birefringence ∆t 12 9.5 ms to be due to new gravitational physics.Even if the result is interpreted as noise (e.g.Fig. 6), assuming the AGN scenario leads to exclusion of the upper orange region (assuming sensitivity to ∆t 12 9.5 ms).Because of the different scaling with the parameters, the detection of an AGN binary becomes even more constraining than GW170817 for high Λ 4 .The beyond GR interpretation can be further probed not only by AGN events but by high-redshift multi-messenger observations.In this case, the time delay between GWs and EM counterparts scales as ≈ 1s 10 8 p 4φ H0 Λ4 2 D 40Mpc and can be probed by distant neutron-star mergers.

V. SUMMARY AND OUTLOOK
In this paper, we explored LIB as a test of GR using observations of GWs.LIB produces a difference in the arrival times of the GW polarisations in signals from the binary mergers, predicted by some alternatives to the GR.Using the Bayesian model selection framework, not only we can identify the signatures of birefringence, but also measure the time delay between the arrival of both polarisations (∆t 12 ).We show that this difference can be measured with high accuracy, of order few milliseconds with existing events and is likely to improve in the future following detector upgrades.
Using the latest GW catalog, GWTC-3, we find no strong evidence for the observation of the birefringence, with the highest log B LIB GR = 3.21 for the heaviest binary black holes so far, GW190521.However, after simulating similar events under different noise realizations, we determine that there is a false alarm probability of 48%.This event has been associated with an AGN flare, possibly indicating that the merger occurred near a SMBH.This AGN scenario is especially favorable for the observation of LIB since the SMBH would act as a strong source of LIB.However, the AGN flare-GW association has been disputed, see e.g.[72].Moreover, the loudness and shortness of this event makes it susceptible to different astrophysical and fundamental physics interpretations.It has also been found to be violating many tests of GR and mimicking many exotic scenarios of compact binary such as head-on-collision of a boson star [58] or left-right (L-R), frequency-dependent birefringence [34].The latter effect is related to our flavour of LIB, with two impor-tant differences: first, L-R birefringence is defined in the basis of circularly polarized waves (left vs right, rather than + vs ×), and second, it depends on the GW frequency.Both features also appear in the Gravitational spin Hall effect in GR, although the L-R time-delay is very suppressed [32,33].
Of the 43 analyzed events, we find that the tightest bounds on the time delay between the two polarisations is ∆t 12 ∼ 0.51 ms at 90% credible intervals coming from the GW200311_115853 merger event, while the median is ∆t 12 80 ms.From the non-observation of LIB, we constrained the lensing optical depths in a phenomenological parameterization in which the lensing cross-section is proportional to the Einstein radius or a fixed physical radius with a power law scaling in the halo mass.
Our constraints can be translated to gravitational theories that predict LIB.As an example, we presented novel constraints on a Horndeski scalar-tensor theory featuring a new dynamical field and two free parameters.The theory is stringently constrained by the speed of GWs on the homogeneous FRW background following GW170817.Nevertheless, the lack of observed LIB places stringent bounds, which can be orders of magnitude better than Solar System tests and in some limits as tight as the GW speed bound.As a proof of principle of LIB due to a known inhomogeneity, we interpret GW190521 as an AGN binary (assuming that the signal originated in close proximity to a SMBH [70]) in terms of our example theory.Then, the large curvature is able to generate detectable LIB even when deviations from GR are minute.Our |∆t 12 | 9.5 ms results would then exclude GR, placing a minimum value of the theory parameters.When interpreting this result as a fluctuation and GR to be correct, the AGN hypothesis is still able to produce very stringent bounds, that can even overcome those of the GW speed on FRW.
In future, the methods we developed here can be useful for studying new classes of events.Of particular interest will be signals where the merger is either near an SMBH or is known to have a lensed counterpart due to strong lensing.In such cases, the information about the lens may improve the constraints substantially, along the lines of the AGN-scenario we discussed.The increase in detection rate and a growing chance of strongly lensed GW identification makes LIB test also relevant for future runs of LVK detectors and upcoming GW detectors such as Einstein Telescope, Cosmic Explorer and LISA [73][74][75][76].Lastly, the addition of ground-based detectors such as LIGO-India and KAGRA can allow us to measure extra linear combinations of the GW polarisations and construct a null-stream [77] to extract each of the polarisations individually.The extracted polarisations can then be used to test their consistency with GR or other theories of gravity directly.
Strongly lensed GW signals may allow us to measure additional linear combinations of the same GW polarisations and hence improve various tests of GR [78], including the one proposed here.Ultimately, developing LIB predictions for other alternative theories and generalizing the model-independent parameterizations presented here will allow our results to further test the landscape of theories beyond GR.from strong lensing probabilities, which are weighted by the Einstein radius, which vanishes when D L → D S (near the source) or D L → 0 (near the detector).In contrast, birefringence probabilities do not suffer such suppression and can be sizeable for objects near the source or the observer.Our optical-depth framework (Sec.II D) does not consider this possibility.
Source environment may play a role for LIB, as GW sources will generally be located in regions denser than the cosmic average.In this case, the host galaxy (or objects within it) would have a much larger density compared to the cosmological average used in, e.g.Eq. ( 16).In addition, the projected cross-section ∝ 1/D 2 will be larger for nearby objects, thus enhancing the probabilities.Given a distribution of GW sources near an object, the posterior on the theory parameters p can be obtained as P ( p) = drdθP s (r) sin(θ)P (∆t 12 (r, θ, p)) . (B1) Here we have assumed a symmetric r-dependent distribution.The θ dependence corresponds to a uniform prior on the sphere.This simple dependence could be used to model the effect of the source's galaxy or nearby objects.An extreme case of environmental enhancement is given by a binary merging in an AGN near a supermassive black hole (SMBH), as discussed in Sec.IV C, taking the multi-messenger scenario of GW190521 and its implications for the example Horndeski theory.Estimates for the rate of such events are uncertain.Nonetheless, in some cases it might be possible to associate an event with a SMBH thanks to an EM counterpart [70], multiple images due to strong lensing [79][80][81] or strong-field propagation effects [33].
Another potential to improve the quoted result is by correlating GW arrival direction with known lenses.relevant in cases where the Milky Way (or perhaps even the Sun) may imprint an observable birefringence.Adding information on the GW direction, relative to known objects, will allow better constraints on those scenarios more effectively than assuming randomly located lenses.For instance, if stellar-scale lenses are relevant in a given theory and the cross-section scales as the physical radius (allowing nearby lenses to contribute), sources behind the milky way can probe a much larger effective cross-section than given by Eq. 17.
Finally, any confident detection of a lensed GW can be used to refine constraints within a given model.This would follow either through the identification of several GW detections as images of the same underlying source or through waveform distortions (millilensing).Both cases allow information about the lens mass and impact parameter to be recovered, at least when assuming a lens model [76,82,83].That information can then place constraints within a specific theory of gravity.

FIG. 2 .
FIG.2.GR and LIB detector frame waveform amplitudes in frequency domain of GW150914-like CBC.The birefringence leads to additional frequency modulations and distorts the GR waveform.The magnitude of these distortions are however dependent on the two parameters: ∆t12 and φ lens .

Fig. 2 FIG. 3 .
Fig.2shows frequency domain LIB and GR waveforms for a GW150914-like CBC.The waveforms are generated using the approximant IMRPhenomXPHM[40], as implemented in the LALSimulation module of the

40 FIG. 4 .
FIG. 4. Signal-to-noise ratio (SNR) dependence of ∆t12(ms) posteriors and the log B LIB GR (upper-x axis) for the GW150914-like injections with different values of ∆t inj 12 (lower-x axis) and φ inj lens = π/5.Time delays (∆t12) as small as 1ms are recovered well with SNR 30 & 40 signals, and for SNR 10 signals time delays < 30ms are not measurable.Both model selection and time delay measurements (without the symmetry around ∆t12 = 0) improve with increase in SNR.

FIG. 6 .
FIG.6.Bayes factors distributions for GW190521-like CBC, calculated by doing PE with both the hypothesis, for ∼ 100 GR from the GW190521 posteriors in different realisations of gaussian noise.The false alarm probability for the observed log B LIB GR = 3.2 is found to be 0.48.

FIG. 8 .
FIG.8.95% c.l. constraints on the parameters of a quartic Horndeski theory[27] using the lens-induced birrefringence (LIB) test.Shaded regions are excluded according to GWTC-3 (this work, blue solid), GW170817[62][63][64][65] (green dashed) and GW190521 assuming an AGN binary[70] (red dotted, see Fig.9).The GR limit corresponds to p 4φ → 0, Λ4 → ∞, when the scalar field is decoupled from gravity and its derivative interactions suppressed.See sections IV B, IV C for details.If GW190521 is associated to an the upper shaded region improves the overall GWTC-3 constraints for Λ4 3H0.If we further assume a detection of LIB, then the bottom red shaded region excludes GR.For reference, we also indicate Solar System constraints (gray horizontal) and the region there the GW frequencies at LIGO-Virgo detectors are larger than the (non-linear) energy scale of the effective field theory (magenta vertical).

FIG. 9 .
FIG.9.Birefringent time delay for a source near a SMBH as a function of the angle of the observer, relative to the SMBH.Each line corresponds to a different source distance, for model parameters compatible with GW170817 (see.Eq.19).The horizontal line corresponds to the lower bound on ∆t12 = 9.5 ms from the analysis of GW190521.The region between the shaded areas encompasses 95% probability for a random observer.The lowest θ represents trajectories passing at 10 Schwarzschild radii of the SMBH.